Saturday, 2022-01-22

clarkbwe did not get an email about filling the disk on the backup server today00:30
clarkbpruning did its job. thank you fungi00:31
fungiindeed, it worked a treat00:31
clarkbOne thing that just occurred to me re gerrit upgrade is to compare the config files between 3.3 and 3.4 in the test systems to ensure that gerrit isn't adding any new data to the config (it likes to do that)00:37
clarkbif it does that isn't the end of the world but means ansible will fight gerrit over the content and we should try and make them match to avoid that. I can probably look into that over the weekend00:38
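
A minimal sketch of the comparison clarkb describes above: diff the config from the 3.3 test system against the one from 3.4 and see whether Gerrit added anything. The function name `config_drift`, the file labels, and the sample contents are all hypothetical; nothing here comes from the actual deployment.

```python
import difflib


def config_drift(before: str, after: str,
                 before_name: str = "gerrit-3.3/gerrit.config",
                 after_name: str = "gerrit-3.4/gerrit.config") -> list[str]:
    """Return unified-diff lines between two config file contents.

    An empty list means the two contents are identical.
    The default labels are placeholders, not real paths.
    """
    return list(difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile=before_name,
        tofile=after_name,
        lineterm="",
    ))


if __name__ == "__main__":
    # Made-up sample contents, only to show the shape of the output.
    old = "[example]\n\tkey = value\n"
    new = "[example]\n\tkey = value\n[added]\n\tnewKey = value\n"
    for line in config_drift(old, new):
        print(line)
```

Any line starting with `+` in the output would be a candidate for something Gerrit wrote on its own, and so something the Ansible-managed copy may need to include to avoid the two fighting over the file.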
clarkbfungi: has a question for you00:43
fungioh, thanks00:45
fungiclarkb: i don't see any new questions since my answer from thursday00:46
clarkboh sorry. I read them out of order00:47
clarkbmy brain is clearly fried enough for one week00:47
fungino worries, thought maybe gertty had swallowed my comments or something00:49
clarkbno, you didn't quote-reply to the question; you commented in the code where you would make the change, and I got things mixed up in my head00:52
fungiahh, yeah00:52
corvusi'm going to do a very relaxed rolling restart of zuul15:53
corvusi'll start with a graceful shutdown of half the executors15:53
corvuscomponents page shows ze01-ze06 paused15:56
corvusand actually, looks like ze04 has exited already15:57
fungithanks for the heads up16:13
fungialso i rebooted all the executors yesterday for a kernel upgrade, so they're probably fairly fresh processes anyway16:13
fungicorvus: any way we can work in reboots of the schedulers and mergers? or is it too far along?16:13
corvusfungi: oh sure, can do.  we're still at wait for first half of executors to stop.16:16
corvus3 to go16:16
fungicool, as i said the 12 executors are already good on reboots, so it's just the 8 mergers and 2 schedulers16:17
fungii figure we'll do rolling reboots of the zk cluster next week16:17
corvusze01-06 are stopped17:10
corvusi'm starting them back up now17:10
fungithat was, what, ~1.5 hours?17:10
fungigranted, it's a saturday17:10
corvusmaybe 1:1517:10
corvusze01-06 are back up; gracefully stopping 07-12 now17:12
corvusze07-12 show paused on the components page17:13
corvusi'm going to stop all the mergers and reboot their hosts now17:13
fungioh, yep i keep forgetting that page exists17:13
corvusfungi: just 'reboot' ?17:14
corvusno other prep needed for the update?17:14
fungiyeah, just reboot17:15
fungithey'll already have the latest kernels installed17:16
fungiLinux zm08 5.4.0-96-generic #109-Ubuntu SMP17:17
fungithat's the one we wanted17:17
corvuscool.  mergers are back up.17:18
corvuswill restart zuul01 now17:19
fungiand yeah, we discovered yesterday that the images we've been using recently explicitly disable unattended-upgrades, so i manually fixed the config on zuul01 and forced a package update there, but we still need to add some configuration management to solve that17:20
corvusoops :(17:25
corvusok, zuul01 is coming back up now17:25
fungiyep, it's running on the intended kernel now too, thanks17:26
corvuszuul01 is back online; restarting zuul02 now;  this will cause a web outage17:30
corvuszuul02 is coming up17:34
fungithanks, booted kernel looks right on 02 now too17:42
corvusit's fully up now.  only tasks remaining are to wait for ze07-12 to stop and restart them.  i may just check in later for that.  or if anyone else wants to run `docker-compose up -d` on them after they stop, feel free.17:46
fungii'll check on them periodically as well and mention in here17:51
fungi09 no longer has any processes owned by zuuld, but i'll wait until at least a few executors have stopped before starting them back up18:53
fungi07 has stopped too19:56
fungi07-09 are stopped now, so i'll go ahead and start those back up21:25
fungi10-12 have not stopped yet21:25
fungi shows ze01-ze09 running now, ze10-ze12 still paused21:28
fungiand those have finally stopped, so starting them up22:09
fungicorvus: components page shows all 12 executors running again now on the same 4.11.1.dev66 1ed18610 version22:10
fungiall components list a consistent version now22:11
corvus\o/ thanks!22:14
corvus#status log rolling restart of zuul onto commit 1ed186108956a1f7cc5fe34dc9d93731beaa56f622:14
opendevstatuscorvus: finished logging22:14
corvusi think that's our first complete+successful rolling restart :)22:15
fungiso smooth!22:18
fungithough the second half of the executors did take close to 5 hours to come to a full graceful stop, that's fine when we're not in a hurry22:20
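
A minimal sketch of the batching used in the restart above, where the executors were gracefully stopped and restarted in two halves (ze01-06, then ze07-12) so that half the capacity stayed available. The helper name `rolling_batches` is hypothetical, and it only computes the grouping; it does not talk to Zuul or the hosts.

```python
def rolling_batches(hosts: list[str], batches: int = 2) -> list[list[str]]:
    """Split hosts into consecutive groups for a rolling restart."""
    if not hosts:
        return []
    # Ceiling division so that an uneven list still fits in `batches` groups.
    size = -(-len(hosts) // batches)
    return [hosts[i:i + size] for i in range(0, len(hosts), size)]


if __name__ == "__main__":
    executors = [f"ze{n:02d}" for n in range(1, 13)]
    for group in rolling_batches(executors):
        # Each group would be stopped gracefully and restarted
        # before moving on to the next one.
        print(group[0], "to", group[-1])
```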
