fungi | backup02.ca-ymq-1.vexxhost is up to 90% used on /opt/backups-202010 again, i'll start a new prune | 00:17 |
fungi | in progress in a root screen session now | 00:17 |
fungi | #status log Pruned backups on backup02.ca-ymq-1.vexxhost reducing volume usage from 90% to 60% | 01:27 |
opendevstatus | fungi: finished logging | 01:28 |
*** jhorstmann is now known as Guest5975 | 07:51 | |
*** dmellado075539377 is now known as dmellado07553937 | 08:49 | |
*** ykarel_ is now known as ykarel | 13:43 | |
*** jhorstmann is now known as Guest5997 | 13:46 | |
clarkb | infra-root once I've caught up on my morning I'll try to test the db migration from paste01 to paste02 (paste02 seems to have deployed properly in the daily job run) and if that is quick and easy I'll probably proceed with doing it for real | 16:11 |
fungi | and i'm hopping in a car for the next several hours, but i'll try to check in at some point in my evening | 16:11 |
opendevreview | Clark Boylan proposed opendev/system-config master: Backup paste02 to the backup servers https://review.opendev.org/c/opendev/system-config/+/939126 | 16:13 |
opendevreview | Clark Boylan proposed opendev/system-config master: Retire paste01 backups on the smaller backup server https://review.opendev.org/c/opendev/system-config/+/939128 | 16:13 |
clarkb | that is just a rebase for some of the followup changes after we migrate the production server | 16:13 |
clarkb | db dump takes about 3 minutes (I'm just taking notes here, but it also helps others follow along) | 16:25 |
clarkb | and about 40 seconds to copy that dump from old server to new server | 16:30 |
clarkb | the gunzip takes a little bit too, maybe another 40 seconds | 16:38 |
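(For reference, the steps being timed above amount to roughly the following; this is a sketch only, with the container name, database name, and credential handling all assumed for illustration.)

```shell
# On paste01: dump the database and compress it (~3 minutes in the test above).
# "mariadb" and "lodgeit" are assumed names; real credentials/options omitted.
docker exec mariadb mysqldump --single-transaction lodgeit | gzip > /root/lodgeit.sql.gz

# Copy the compressed dump to the new server (~40 seconds).
scp /root/lodgeit.sql.gz root@paste02.opendev.org:/root/

# On paste02: decompress before restoring (~40 seconds).
gunzip /root/lodgeit.sql.gz
```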
opendevreview | James E. Blair proposed opendev/system-config master: Fix parentage of jaegertracing image https://review.opendev.org/c/opendev/system-config/+/939367 | 16:39 |
corvus | clarkb: fungi ^ quick oopsie fix | 16:41 |
corvus | that probably raced the refactoring | 16:42 |
clarkb | corvus: approved | 16:48 |
clarkb | the db restore is probably going to be the slowest step | 16:48 |
corvus | thx! | 16:48 |
clarkb | my rough plan is to put paste01 in the emergency file, announce a downtime via status bot, stop all but mariadb containers on paste01 and paste02, do a db dump, copy it from 01 to 02, restore db dump, start containers on 02, stop mariadb on 01, land dns CNAME update | 16:49 |
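(Roughly, that plan maps to commands like the ones below; a sketch under assumed service and database names, with the dump/copy steps as sketched earlier and the emergency-file and DNS review steps handled outside the shell.)

```shell
# On paste01 and paste02: stop everything except the database
# (the "lodgeit" service name is an assumption).
docker-compose stop lodgeit

# On paste02: restore the dump copied over from paste01
# (~7 minutes in the test run; credentials omitted).
docker exec -i mariadb mysql lodgeit < /root/lodgeit.sql

# Start the full stack on paste02, then stop the database on paste01.
docker-compose up -d          # on paste02
docker-compose stop mariadb   # on paste01

# Last step: land the zone change pointing paste.opendev.org at paste02.
```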
clarkb | db restore is still running so don't have an estimate on total downtime yet | 16:49 |
clarkb | 7 minutes and 17 seconds. So probably a total downtime of 15-20 minutes with me coordinating things. Let me double check paste02 is functional | 16:52 |
clarkb | yup I can load old pastes and make a new paste | 16:53 |
clarkb | https://paste.opendev.org/show/bjRihXVrWY0IBQyUHy4L/ this is the canary paste I've just made for the production switch | 16:54 |
clarkb | corvus: I suspect you are the only other person around right now. Any objection to me proceeding with my plan above? Let me know if you'd like to do your own testing or have things you'd like me to check | 16:55 |
clarkb | I'll send something like #status notice the Paste service at paste.opendev.org will have a short (15-20) minute outage momentarily to replace the underlying server. | 17:00 |
opendevreview | Merged opendev/system-config master: Fix parentage of jaegertracing image https://review.opendev.org/c/opendev/system-config/+/939367 | 17:00 |
corvus | clarkb: that sounds like a good plan, but i may not be around much this morning to pitch in if something goes awry | 17:01 |
clarkb | corvus: ack. I think the paste service is minor enough with enough alternatives that I'm ok with that. Worst case people can use gist or something temporarily | 17:01 |
corvus | ++ | 17:02 |
clarkb | #status notice The paste service at paste.opendev.org will have a short (15-20) minute outage momentarily to replace the underlying server. | 17:06 |
opendevstatus | clarkb: sending notice | 17:06 |
-opendevstatus- NOTICE: The paste service at paste.opendev.org will have a short (15-20) minute outage momentarily to replace the underlying server. | 17:06 | |
clarkb | not sure if impatient or if bot doesn't notify when done anymore | 17:10 |
opendevstatus | clarkb: finished sending notice | 17:10 |
clarkb | impatient confirmed | 17:10 |
clarkb | ok things seem to work, I'll land the dns update | 17:26 |
clarkb | https://review.opendev.org/c/opendev/zone-opendev.org/+/939289 that is this change for the record | 17:26 |
clarkb | heh backups failed on paste01 just now. I believe that is because I deleted the sqldump gzip file I made | 17:28 |
clarkb | oh or because mariadb is stopped there. Either way I believe this is anticipated and not indicative of a problem | 17:29 |
clarkb | the outage is going to be a little longer because I didn't factor in the time it would take to queue and gate the dns update... | 17:30 |
opendevreview | Merged opendev/zone-opendev.org master: Swap paste.o.o to paste02 https://review.opendev.org/c/opendev/zone-opendev.org/+/939289 | 17:34 |
clarkb | Wed Jan 15 17:28:36 UTC 2025 Streaming script /etc/borg-streams/mariadb failed! | 17:34 |
clarkb | confirmed the issue is that I stopped the database container. Not concerned about that | 17:34 |
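(For context: the streaming backup runs a script that writes a database dump to stdout for borg to archive, so with the mariadb container stopped the dump command fails and the run reports an error. Below is a minimal sketch of what such a stream script might look like; the real /etc/borg-streams/mariadb is not reproduced here, and the container name and options are assumptions.)

```shell
#!/bin/sh
# Hypothetical stream script: emit a full database dump on stdout for borg
# to capture. If the mariadb container is down, docker exec fails and the
# backup run reports a failure like the one quoted above.
exec docker exec mariadb mysqldump --all-databases --single-transaction
```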
clarkb | https://zuul.opendev.org/t/openstack/build/2f26e4c5bc9e4c0aab84224c1a936ca1 dns update reports success, now to test that | 17:37 |
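(A quick way to sanity check the record from the resolver side, independent of the Zuul result:)

```shell
# Confirm paste.opendev.org now points at the new server.
dig +short paste.opendev.org CNAME
dig +short paste.opendev.org A
```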
clarkb | seems to work for me at https://paste.opendev.org/show/bia8GZVgpVGidZOG96q2/ (new post migration paste) and https://paste.opendev.org/show/bjRihXVrWY0IBQyUHy4L/ pre migration paste used as a canary | 17:38 |
clarkb | this should conclude the outage aspect of the transition. I have a few followups I need to add to my already existing set of followups. I'll get to that shortly | 17:38 |
opendevreview | Clark Boylan proposed opendev/system-config master: Retire paste01 backups on the smaller backup server https://review.opendev.org/c/opendev/system-config/+/939128 | 17:42 |
opendevreview | Clark Boylan proposed opendev/system-config master: Remove paste01 from config management https://review.opendev.org/c/opendev/system-config/+/939378 | 17:42 |
opendevreview | Clark Boylan proposed opendev/zone-opendev.org master: Reset paste.o.o's TTL to the default (1 hour) https://review.opendev.org/c/opendev/zone-opendev.org/+/939379 | 17:43 |
clarkb | infra-root ^ 939126, 939128, 939378 and 939379 are all semi related followups to the paste migration to finalize things on the new server (start backing up new server, remove old server from inventory, retire old server backups, and set the ttl on the dns record back to default) | 17:45 |
clarkb | I don't think those are super urgent, but if you have a chance to review and/or land them after you've confirmed the new server is working for you that would be great | 17:45 |
clarkb | I just realized that I can start publishing the lodgeit image to quay again | 17:54 |
clarkb | probably won't bother with that until we're a bit more confident in the new setup. But that is exciting | 17:55 |
opendevreview | Clark Boylan proposed opendev/system-config master: Mirror uwsgi base image https://review.opendev.org/c/opendev/system-config/+/939383 | 18:47 |
opendevreview | Clark Boylan proposed opendev/lodgeit master: Reapply "Move lodgeit image publication to quay.io" https://review.opendev.org/c/opendev/lodgeit/+/939385 | 18:51 |
clarkb | the lodgeit change will fail until 939383 lands and mirrors the image. But I think this is what we need to do to publish to quay, then we can follow up with a change to consume the image from quay | 18:52 |
clarkb | lots of moving pieces, but I'm hoping that as we get more of these building blocks in place we'll need less and less of them for images later | 18:52 |
clarkb | I've just realized that the paste01 backups will continue to spam us with failure warning emails until we delete the server, since that is driven by cron on the server. Any preference between disabling the crons by hand or restarting mariadb so that backups continue to run? | 19:01 |
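(If the by-hand route were chosen, one option would be commenting out the relevant entry in root's crontab on paste01; a sketch, assuming the backup job lives in root's crontab and matches on "borg".)

```shell
# Inspect the backup entry, then comment it out in place.
crontab -l | grep -i borg
crontab -l | sed '/borg/ s/^#*/#/' | crontab -
```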
clarkb | I've made this rant before but I feel like it deserves another pass: Why does `ssh foo` show you one ssh host key hash type, `ssh-keyscan` shows you another ssh host key hash type, and finally `dig sshfp foo` shows you a third hash type? | 20:33 |
clarkb | the underlying data is the same, it's just being hashed three different ways so that you cannot easily compare between them | 20:33 |
clarkb | call me crazy, but if the intention is to avoid mitm attacks, making a hash type that humans can understand and check seems really important | 20:35 |
clarkb | apparently you can do things like ssh-keyscan -t ecdsa | ssh-keygen -lf - and that will get you the same format as what `ssh` emits | 20:37 |
clarkb | still seems like making it that difficult to do means no one will compare and double check. That's all, rant over | 20:37 |
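(For the record, the three views can be reconciled, it is just not obvious; a sketch using paste.opendev.org as the example host.)

```shell
# Same SHA256 base64 fingerprint that `ssh` prints on first connect:
ssh-keyscan -t ecdsa paste.opendev.org 2>/dev/null | ssh-keygen -lf -

# SSHFP-style records generated from the scanned key, for comparison with
# what DNS serves:
ssh-keyscan -t ecdsa paste.opendev.org 2>/dev/null > /tmp/hostkey
ssh-keygen -r paste.opendev.org -f /tmp/hostkey
dig +short SSHFP paste.opendev.org
```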
clarkb | I decided to restart the mariadb container on paste01 so that backups don't spam us for now. I left the actual web service in a down'd state | 20:43 |