19:01:24 #startmeeting infra
19:01:24 Meeting started Tue Aug 2 19:01:24 2022 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:24 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:24 The meeting name has been set to 'infra'
19:01:29 #link https://lists.opendev.org/pipermail/service-discuss/2022-August/000348.html Our Agenda
19:01:33 I am prepared with an agenda :)
19:01:39 #topic Announcements
19:02:04 The Service Coordinator nomination period has officially begun
19:02:18 o/
19:02:24 It started today and will run through August 16, 2022. I'll send a followup email to the thread I started last week warning people of this timeline :)
19:02:28 #link https://lists.opendev.org/pipermail/service-discuss/2022-July/000347.html
19:03:17 #topic Topics
19:03:30 #topic Improving OpenDev CD Throughput
19:03:55 I don't have anything new on this item. I'm thinking maybe we can pull it off the agenda until we've got new developments? Seems like we've got a lot of other stuff going on in the meantime
19:04:46 Any objections to that?
19:05:53 none from me
19:06:00 not really, it is a permanent todo :)
19:06:08 ok cool
19:06:16 #topic Updating Grafana Management Tooling
19:06:21 #link https://review.opendev.org/q/topic:grafana-json
19:06:39 ianw: ^ I think this stack is largely ready to go though I had some comments on it. Not sure if you want to respin or land as is and then improve in followups
19:07:08 sorry, i wanted to get back and make sure i responded to comments before merging
19:07:33 either approach is fine with me. I just didn't want anyone to feel my comments were necessary improvements.
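[Editorial aside: the grafana-json dashboard stack discussed above lends itself to a simple pre-merge sanity check. The sketch below only verifies that each dashboard file parses as valid JSON; the function name and layout are illustrative assumptions, not part of the actual tooling under review.]

```shell
# validate_dashboards DIR: check that every .json dashboard file in DIR
# parses as valid JSON. This is a minimal sketch of a pre-merge sanity
# check; the real grafana-json tooling in review does much more.
validate_dashboards() {
    rc=0
    for f in "$1"/*.json; do
        # Skip the literal glob when the directory has no .json files.
        [ -e "$f" ] || continue
        python3 -m json.tool "$f" >/dev/null 2>&1 || { echo "invalid: $f"; rc=1; }
    done
    return $rc
}
```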
I did +2 after all
19:08:09 i should have time to get to it soon
19:08:31 sounds good
19:08:42 #topic Bastion Host Updates
19:08:56 This item is largely a proxy for the zuul streaming log file cleanup work
19:09:02 at least for now
19:09:09 ianw: did those changes get included in the weekend upgrade of zuul?
19:09:44 If so I think we should manually clear out the remaining files on bridge (and static) and then we can monitor to see how many sneak through due to aborted jobs and similar situations with zuul
19:10:31 i haven't yet pushed the +w on those changes as i haven't responded to corvus' comments on the files potentially not being removed for aborted jobs
19:11:22 the suggestion was a background thread to remove them
19:11:34 got it. FWIW it was my impression that we can land the stack you've got as it is an improvement. Just that we will also want to look into a tmpreaper setup for the straggler files
19:12:20 i'm starting to think perhaps documenting the situation a bit better first, and we can see how much of an issue it is, and perhaps if we can land something that puts them in more of a reserved namespace i'd feel better about a generic cleaner
19:12:54 so i have a half-written doc change that i'll clean up ... very soon :)
19:12:55 before landing the current improvements?
19:13:08 i'll push that on top and feel ok about landing what's there, i think
19:13:16 got it
19:14:27 I just didn't want this to get forgotten as the creation of the tmpfiles will eventually bite us I think :)
19:14:36 this plan seems reasonable though. I think we can move on
19:14:51 #topic Upgrading Bionic Servers to Focal/Jammy
19:14:58 #link https://etherpad.opendev.org/p/opendev-bionic-server-upgrades Notes on the work that needs to be done.
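[Editorial aside: the tmpreaper-style cleanup floated above for straggler console log files could be sketched roughly as below. The file glob and the seven-day age threshold are assumptions for illustration; a reserved namespace for the files, as suggested in the discussion, would make such a cleaner much safer to run.]

```shell
# clean_stragglers DIR DAYS: delete leftover console-*.log files in DIR
# whose mtime is older than DAYS days. The glob and threshold are
# illustrative assumptions, not the deployed configuration.
clean_stragglers() {
    find "$1" -maxdepth 1 -name 'console-*.log' -type f -mtime "+$2" -delete
}
```

Run periodically (e.g. from cron), this would bound how long aborted-job leftovers accumulate, while leaving recent files and anything outside the glob untouched.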
19:15:19 I've been using the mailman3 work as a good exercise for checking that Jammy generally works with our config management
19:15:31 There are two changes related to improving Jammy support that can land today
19:15:36 #link https://review.opendev.org/c/opendev/system-config/+/851094/2 Run system-config-run-base against Jammy
19:15:41 #link https://review.opendev.org/c/opendev/system-config/+/851266/1 Fix install-docker role for Jammy
19:15:51 thanks, i'll make sure to check those out
19:16:09 Overall nothing really crazy about using Jammy yet which is a good thing
19:16:18 ahh, i already reviewed one of them ;)
19:16:27 But I haven't gotten to the apache config for mailman3 yet as I've been struggling with various mailman3 related things recently
19:16:53 I'm hoping we'll have functioning apache configs in the mailman3 context soon and that will hopefully expose whether that server has any new things we need to accommodate
19:17:16 If you are curious about the mailman3 work, it isn't for a bionic upgrade but it is updating some other old software
19:17:21 #link https://review.opendev.org/c/opendev/system-config/+/851248 WIP change to run mailman3 on Jammy
19:18:01 thanks, mm3 seems like something we have to do eventually :)
19:18:16 #link https://review.opendev.org/c/zuul/nodepool/+/849273 Dockerfile: move into separate group when running under cgroupsv2
19:18:31 is i guess related, and maybe invalidates your "nothing really crazy" bit :)
19:18:32 The latest thing is hacking around assumptions in the upstream docker image configs. And for some reason port 8000 doesn't show up as listening even though I've gotten the errors out of the service logs now as far as I can tell
19:18:49 oh ya cgroupsv2 is definitely something that falls under crazy :)
19:18:55 I'll take a look at that change today
19:19:16 Anyway there is progress here. Slow but it is happening :)
19:20:12 #topic Gitea 1.17 Upgrade
19:20:39 Gitea made their 1.17.0 release over the weekend.
Yesterday I updated my WIP change that was deploying and testing the gitea release candidates to this final release version
19:20:44 #link https://review.opendev.org/c/opendev/system-config/+/847204
19:21:12 I don't think we are in a rush to upgrade. I've tried to call out all of the breaking changes in my commit message and give details on why they do or do not affect us
19:21:51 The screenshots and our testing seem to show it generally works though. So if we are happy after reviews I think we can upgrade whenever we feel ready
19:23:18 Seems like we're making good time through the agenda
19:23:23 #topic Rocky Images Not Booting
19:23:39 Late last week it was pointed out that our Rocky Linux images were no longer booting
19:24:00 I pulled one up in a rescue instance and noticed there were no kernels in /boot and there were no entries in grub.cfg to boot
19:24:27 ianw: managed to trace this back to the machine ID missing in the image, which prevents kernels from being installed to /boot, which in turn prevents grub from updating its config to boot a kernel
19:24:52 The fix for this has landed in DIB. The next steps in addressing this are to make a DIB release and update nodepool to use that release.
19:25:14 However, there is one change that we'd like to get into the DIB release that hasn't merged yet which adds support for Rocky Linux 9
19:25:22 In any case we should expect this to be happy in the near future
19:26:15 ianw: out of curiosity how did you trace it back to the machine id?
19:27:40 i started by looking at the dib functional boot jobs; luckily we had one that did work with logs still so had a comparison point
19:27:59 eventually i realised that the upstream container had updated since that run, so that cracked open something to explore
19:28:31 comparing to the older version of the container, the new one didn't have a /boot directory ...
which i thought might be the problem at first
19:29:18 when that didn't pan out, i started wondering about how the kernel actually got into /boot, which led me to the rpm scripts run by the kernel-core package, which led to tracing /bin/kernel-install, which led to realising it was looking for a machine-id ...
19:30:16 just now though, weirdly, following this same thread on the rocky9 images, there doesn't seem to be a kernel installed either. but the jobs are booting ok there when built under dib. so i don't know what's going on with that
19:32:27 I also wonder if the rhel stuff needs similar fixes? Or are they likely shielded by not starting from a container image?
19:33:31 i do feel like we've been down a similar path with machine-ids and kernel installs on the other build types, in various incarnations
19:34:13 https://review.opendev.org/c/openstack/diskimage-builder/+/675056/ even
19:34:38 ha ya ok
19:35:02 oh god, how depressing
19:35:12 https://bugzilla.redhat.com/show_bug.cgi?id=1737355#c7
19:35:22 "Oh, and I actually debugged this once before 2 years ago and forgot about it! :)
19:35:22 https://review.opendev.org/#/c/504300/"
19:35:32 so now i've debugged it 3 times?!
19:36:47 that sounds familiar
19:38:46 #topic Open Discussion
19:38:56 Seems like the last topic was well covered :) and that was all on the agenda
19:39:55 could i ask for reviews on
19:39:59 #link https://review.opendev.org/q/topic:ansible-lint-update-6
19:40:10 oh, yep
19:40:27 there's a lot, but hopefully nothing too controversial.
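[Editorial aside: circling back to the kernel-install chain traced above (machine-id missing, so the kernel-core rpm scripts never install a kernel, so grub has nothing to boot), a post-build sanity check on an image tree could look like the sketch below. The specific checks are illustrative assumptions based on that failure mode, not part of the DIB fix itself.]

```shell
# image_boot_sanity ROOT: verify an image tree has the two things the
# Rocky debugging above showed were missing: a non-empty
# /etc/machine-id (which kernel-install needs) and at least one kernel
# under /boot. Illustrative only; paths and names are assumptions.
image_boot_sanity() {
    root="$1"
    [ -s "$root/etc/machine-id" ] || { echo "missing machine-id"; return 1; }
    # The unmatched glob stays literal and ls fails, which is what we want.
    ls "$root"/boot/vmlinuz-* >/dev/null 2>&1 || { echo "no kernel in /boot"; return 1; }
    echo "ok"
}
```

A check like this run in the dib functional jobs would have flagged the broken images at build time rather than at first boot.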
it should bring everything in sync
19:40:37 everything being our *-jobs repos
19:40:38 lots are already merged too
19:41:01 related to linting: the new rules about spaces after keywords hit zuul
19:41:08 I expect that will continue to crop up in places
19:41:52 related to the rocky troubleshooting, 851520 is probably safe to merge now
19:41:57 speaking of that, i proposed
19:42:18 #link https://review.opendev.org/c/openstack/releases/+/851273 hacking: release 5.0.0
19:42:48 i don't know who usually looks after that. that would bring in a flake8 that is 3.10 compatible
19:42:56 also on an entirely separate note, i have a wip behavior change proposed for git-review which could use some feedback as to whether it's desirable: https://review.opendev.org/850061
19:43:13 ianw: I think that openstack often updates those at the beginning of a cycle so it may be too late now and have to wait for ~end of October?
19:43:52 yeah, true, i guess it has potential to do more than just support 3.10
19:44:16 ianw: I think it belongs to qa, so kopecmartin
19:47:23 Anything else? Last call
19:47:35 i got nuthin
19:49:07 oh, though i don't expect to be around much thursday, just a heads up to everyone
19:49:31 thank you for the heads up.
19:49:39 Thanks everyone. We'll see you back here next week
19:49:44 #endmeeting