fungi | backup02.ca-ymq-1.vexxhost is up to 90% used on /opt/backups-202010 again, i'll start a new prune | 00:17 |
fungi | in progress in a root screen session now | 00:17 |
fungi | #status log Pruned backups on backup02.ca-ymq-1.vexxhost reducing volume usage from 90% to 60% | 01:27 |
opendevstatus | fungi: finished logging | 01:28 |
*** jhorstmann is now known as Guest5975 | 07:51 | |
*** dmellado075539377 is now known as dmellado07553937 | 08:49 | |
*** ykarel_ is now known as ykarel | 13:43 | |
*** jhorstmann is now known as Guest5997 | 13:46 | |
clarkb | infra-root once I've caught up on my morning I'll try to test the db migration from paste01 to paste02 (paste02 seems to have deployed properly in the daily job run) and if that is quick and easy I'll probably proceed with doing it for real | 16:11 |
fungi | and i'm hopping in a car for the next several hours, but i'll try to check in at some point in my evening | 16:11 |
opendevreview | Clark Boylan proposed opendev/system-config master: Backup paste02 to the backup servers https://review.opendev.org/c/opendev/system-config/+/939126 | 16:13 |
opendevreview | Clark Boylan proposed opendev/system-config master: Retire paste01 backups on the smaller backup server https://review.opendev.org/c/opendev/system-config/+/939128 | 16:13 |
clarkb | that is just a rebase for some of the followup changes after we migrate the production server | 16:13 |
clarkb | db dump takes about 3 minutes (I'm just taking notes here, but it also helps others follow along) | 16:25 |
clarkb | and about 40 seconds to copy that dump from old server to new server | 16:30 |
clarkb | the gunzip takes a little bit too, maybe another 40 seconds | 16:38 |
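(For reference, the steps being timed above amount to roughly the following; this is a sketch only, with the container name, database name, and credential handling all assumed for illustration.)

```shell
# On paste01: dump the database and compress it (~3 minutes in the test above).
# "mariadb" and "lodgeit" are assumed names; real credentials/options omitted.
docker exec mariadb mysqldump --single-transaction lodgeit | gzip > /root/lodgeit.sql.gz

# Copy the compressed dump to the new server (~40 seconds).
scp /root/lodgeit.sql.gz root@paste02.opendev.org:/root/

# On paste02: decompress before restoring (~40 seconds).
gunzip /root/lodgeit.sql.gz
```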
opendevreview | James E. Blair proposed opendev/system-config master: Fix parentage of jaegertracing image https://review.opendev.org/c/opendev/system-config/+/939367 | 16:39 |
corvus | clarkb: fungi ^ quick oopsie fix | 16:41 |
corvus | that probably raced the refactoring | 16:42 |
clarkb | corvus: approved | 16:48 |
clarkb | the db restore is probably going to be the slowest step | 16:48 |
corvus | thx! | 16:48 |
clarkb | my rough plan is to put paste01 in the emergency file, announce a downtime via status bot, stop all but mariadb containers on paste01 and paste02, do a db dump, copy it from 01 to 02, restore db dump, start containers on 02, stop mariadb on 01, land dns CNAME update | 16:49 |
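(Roughly, that plan maps to commands like the ones below; a sketch under assumed service and database names, with the dump/copy steps as sketched earlier and the emergency-file and DNS review steps handled outside the shell.)

```shell
# On paste01 and paste02: stop everything except the database
# (the "lodgeit" service name is an assumption).
docker-compose stop lodgeit

# On paste02: restore the dump copied over from paste01
# (~7 minutes in the test run; credentials omitted).
docker exec -i mariadb mysql lodgeit < /root/lodgeit.sql

# Start the full stack on paste02, then stop the database on paste01.
docker-compose up -d          # on paste02
docker-compose stop mariadb   # on paste01

# Last step: land the zone change pointing paste.opendev.org at paste02.
```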
clarkb | db restore is still running so don't have an estimate on total downtime yet | 16:49 |
clarkb | 7 minutes and 17 seconds. So probably a total downtime of 15-20 minutes with me coordinating things. Let me double check paste02 is functional | 16:52 |
clarkb | yup I can load old pastes and make a new paste | 16:53 |
clarkb | https://paste.opendev.org/show/bjRihXVrWY0IBQyUHy4L/ this is the canary paste I've just made for the production switch | 16:54 |
clarkb | corvus: I suspect you are the only other person around right now. Any objection to me proceeding with my plan above? Let me know if you'd like to do your own testing or have things you'd like me to check | 16:55 |
clarkb | I'll send something like #status notice the Paste service at paste.opendev.org will have a short (15-20) minute outage momentarily to replace the underlying server. | 17:00 |
opendevreview | Merged opendev/system-config master: Fix parentage of jaegertracing image https://review.opendev.org/c/opendev/system-config/+/939367 | 17:00 |
corvus | clarkb: that sounds like a good plan, but i may not be around much this morning to pitch in if something goes awry | 17:01 |
clarkb | corvus: ack. I think the paste service is minor enough with enough alternatives that I'm ok with that. Worst case people can use gist or something temporarily | 17:01 |
corvus | ++ | 17:02 |
clarkb | #status notice The paste service at paste.opendev.org will have a short (15-20) minute outage momentarily to replace the underlying server. | 17:06 |
opendevstatus | clarkb: sending notice | 17:06 |
-opendevstatus- NOTICE: The paste service at paste.opendev.org will have a short (15-20) minute outage momentarily to replace the underlying server. | 17:06 | |
clarkb | not sure if impatient or if bot doesn't notify when done anymore | 17:10 |
opendevstatus | clarkb: finished sending notice | 17:10 |
clarkb | impatient confirmed | 17:10 |
clarkb | ok things seem to work, I'll land the dns update | 17:26 |
clarkb | https://review.opendev.org/c/opendev/zone-opendev.org/+/939289 that is this change for the record | 17:26 |
clarkb | heh backups failed on paste01 just now. I believe that is because I deleted the sqldump gzip file I made | 17:28 |
clarkb | oh or because mariadb is stopped there. Either way I believe this is anticipated and not indicative of a problem | 17:29 |
clarkb | the outage is going to be a little longer because I didn't factor in the time it would take to queue and gate the dns update... | 17:30 |
opendevreview | Merged opendev/zone-opendev.org master: Swap paste.o.o to paste02 https://review.opendev.org/c/opendev/zone-opendev.org/+/939289 | 17:34 |
clarkb | Wed Jan 15 17:28:36 UTC 2025 Streaming script /etc/borg-streams/mariadb failed! | 17:34 |
clarkb | confirmed the issue is that I stopped the database container. Not concerned about that | 17:34 |
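(For context: the streaming backup runs a script that writes a database dump to stdout for borg to archive, so with the mariadb container stopped the dump command fails and the run reports an error. Below is a minimal sketch of what such a stream script might look like; the real /etc/borg-streams/mariadb is not reproduced here, and the container name and options are assumptions.)

```shell
#!/bin/sh
# Hypothetical stream script: emit a full database dump on stdout for borg
# to capture. If the mariadb container is down, docker exec fails and the
# backup run reports a failure like the one quoted above.
exec docker exec mariadb mysqldump --all-databases --single-transaction
```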
clarkb | https://zuul.opendev.org/t/openstack/build/2f26e4c5bc9e4c0aab84224c1a936ca1 dns update reports success, now to test that | 17:37 |
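(A quick way to sanity check the record from the resolver side, independent of the Zuul result:)

```shell
# Confirm paste.opendev.org now points at the new server.
dig +short paste.opendev.org CNAME
dig +short paste.opendev.org A
```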
clarkb | seems to work for me at https://paste.opendev.org/show/bia8GZVgpVGidZOG96q2/ (new post migration paste) and https://paste.opendev.org/show/bjRihXVrWY0IBQyUHy4L/ pre migration paste used as a canary | 17:38 |
clarkb | this should conclude the outage aspect of the transition. I have a few followups I need to add to my already existing set of followups. I'll get to that shortly | 17:38 |
opendevreview | Clark Boylan proposed opendev/system-config master: Retire paste01 backups on the smaller backup server https://review.opendev.org/c/opendev/system-config/+/939128 | 17:42 |
opendevreview | Clark Boylan proposed opendev/system-config master: Remove paste01 from config management https://review.opendev.org/c/opendev/system-config/+/939378 | 17:42 |
opendevreview | Clark Boylan proposed opendev/zone-opendev.org master: Reset paste.o.o's TTL to the default (1 hour) https://review.opendev.org/c/opendev/zone-opendev.org/+/939379 | 17:43 |
clarkb | infra-root ^ 939126, 939128, 939378 and 939379 are all semi related followups to the paste migration to finalize things on the new server (start backing up new server, remove old server from inventory, retire old server backups, and set the ttl on the dns record back to default) | 17:45 |
clarkb | I don't think those are super urgent, but if you have a chance to review and/or land them after you've confirmed the new server is working for you that would be great | 17:45 |
clarkb | I just realized that I can start publishing the lodgeit image to quay again | 17:54 |
clarkb | probably won't bother with that until we're a bit more confident in the new setup. But that is exciting | 17:55 |
opendevreview | Clark Boylan proposed opendev/system-config master: Mirror uwsgi base image https://review.opendev.org/c/opendev/system-config/+/939383 | 18:47 |
opendevreview | Clark Boylan proposed opendev/lodgeit master: Reapply "Move lodgeit image publication to quay.io" https://review.opendev.org/c/opendev/lodgeit/+/939385 | 18:51 |
clarkb | the lodgeit change will fail until 939383 lands and mirrors the image. But I think this is what we need to do to publish to quay, then we can follow up with a change to consume the image from quay | 18:52 |
clarkb | lots of moving pieces, but I'm hoping that as we get more of these building blocks in place we'll need less and less of them for images later | 18:52 |
clarkb | I've just realized that the paste01 backups will continue to spam us with failure warning emails until we delete the server, since that is driven by cron on the server. Any preference between disabling the crons by hand or restarting mariadb so that backups continue to run? | 19:01 |
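(If the by-hand route were chosen, one option would be commenting out the relevant entry in root's crontab on paste01; a sketch, assuming the backup job lives in root's crontab and matches on "borg".)

```shell
# Inspect the backup entry, then comment it out in place.
crontab -l | grep -i borg
crontab -l | sed '/borg/ s/^#*/#/' | crontab -
```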
clarkb | I've made this rant before but I feel like it deserves another pass: Why does `ssh foo` show you one ssh host key hash type, `ssh-keyscan` shows you another ssh host key hash type, and finally `dig sshfp foo` shows you a third hash type? | 20:33 |
clarkb | the underlying data is the same, it's just being hashed three different ways so that you cannot easily compare between them | 20:33 |
clarkb | call me crazy, but if the intention is to avoid mitm attacks, making a hash type that humans can understand and check seems really important | 20:35 |
clarkb | apparently you can do things like ssh-keyscan -t ecdsa | ssh-keygen -lf - and that will get you the same format as what `ssh` emits | 20:37 |
clarkb | still seems like making it that difficult to do means no one will compare and double check. That's all, rant over | 20:37 |
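(For the record, the three views can be reconciled, it is just not obvious; a sketch using paste.opendev.org as the example host.)

```shell
# Same SHA256 base64 fingerprint that `ssh` prints on first connect:
ssh-keyscan -t ecdsa paste.opendev.org 2>/dev/null | ssh-keygen -lf -

# SSHFP-style records generated from the scanned key, for comparison with
# what DNS serves:
ssh-keyscan -t ecdsa paste.opendev.org 2>/dev/null > /tmp/hostkey
ssh-keygen -r paste.opendev.org -f /tmp/hostkey
dig +short SSHFP paste.opendev.org
```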
clarkb | I decided to restart the mariadb container on paste01 so that backups don't spam us for now. I left the actual web service in a down'd state | 20:43 |