19:00:11 <clarkb> #startmeeting infra
19:00:11 <opendevmeet> Meeting started Tue Feb 11 19:00:11 2025 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:11 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:11 <opendevmeet> The meeting name has been set to 'infra'
19:00:54 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/2PCH5D4ULKXRVIFVGKMOVZ2L2MAXHPBB/ Our Agenda
19:01:16 <clarkb> #topic Announcements
19:02:00 <clarkb> I'm trying to find the archive link for the openinfra foundation stuff
19:02:05 <clarkb> and lists is being slow
19:02:17 <clarkb> #link https://lists.openinfra.org/archives/list/foundation@lists.openinfra.org/thread/3B7OWPRXB4KD2DVX7SYYSHYYRNCKVV46/
19:02:26 <fungi> you beat me to it
19:02:35 <clarkb> if you haven't seen it yet the foundation has an important announcement asking for feedback on shaping the foundation's future
19:02:57 <clarkb> if you haven't seen it yet that is probably worth a read then fungi or myself are happy to proxy questions or you can reach out directly to jbryce as indicated in the email
19:03:24 <clarkb> in general I would probably encourage people to reach out on the mailing list there if they are comfortable just to keep the feedback in one place as much as possible
19:03:57 <clarkb> anything else to announce?
19:05:02 <clarkb> #topic Zuul-launcher image builds
19:05:25 <clarkb> last week corvus added ubuntu jammy and noble images to zuul-launcher config so that zuul projects can start to dogfood the new launcher system
19:05:38 <clarkb> those images came with new 4gb, 8gb, and 16gb flavors/labels
19:05:56 <clarkb> at least two bugs were discovered.
The first had to do with handling of jobs without nodesets (including noop jobs); I believe this was fixed
19:06:11 <clarkb> the other is in decompressing zlib compressed images before uploading them to clouds
19:06:22 <clarkb> I don't know if that got fixed in time for a weekly reboot last friday/saturday
19:06:39 <clarkb> corvus: any news on that bugfix and progress with dogfooding?
19:08:16 <clarkb> I'll give it another minute or so but we can probably move on. I don't know that there is much to do on the opendev side
19:08:30 <clarkb> but good progress is being made (yes finding bugs is progress!)
19:09:08 <clarkb> #topic Unpinning our Grafana deployment
19:09:27 <clarkb> last week we moved our grafana deployment to a new noble host and in the process also updated the grafana version to the latest 10.x release
19:09:43 <clarkb> doing so now produces deprecation warnings on some graphs that they use angular and angular is deprecated
19:10:11 <clarkb> I pushed up a change to test and see what breaks with grafana 11 (expecting breakage due to the deprecated angular stuff) but everything actually seems to just work and the deprecation warnings are gone too
19:10:19 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/940997 Update to Grafana 11
19:10:44 <corvus> sorry i'm late
19:10:48 <clarkb> I updated that change's testinfra testing to grab the json definitions for each dashboard to compare against current production and they don't really differ in meaningful ways
19:11:03 <clarkb> corvus: that's ok we can switch back if you have any more to add on zuul-launcher after this topic
19:11:09 <corvus> ++
19:11:38 <clarkb> so I'm a bit stumped on whether or not it is a problem for us to update to grafana 11.
I suspect that the automatic conversion of old deprecated stuff to new stuff is what makes this all work happily but it isn't clear to me if we need to change anything to keep working in the future
19:12:04 <clarkb> we could just send it and upgrade to grafana 11 and figure it out later or we can hold a test node now and try to understand it better before taking that step. Curious if there was any input on those options
19:12:55 <fungi> if the held server seems to be working as-is with our current data sources, i'm fine just upgrading
19:13:14 <clarkb> note there isn't a current held server, just log collection of our system-config-run job
19:13:28 <clarkb> but yes the screenshots look ok and the json doesn't meaningfully change in those logs
19:14:05 <clarkb> don't need an answer now. Maybe drop your thoughts on the linked change
19:14:16 <clarkb> #topic Zuul-launcher image builds
19:14:20 <clarkb> corvus: ok back to this for your updates
19:14:35 <corvus> the bugfixes are all in and things are working better now
19:14:59 <corvus> we're running into quota errors (!) and those aren't handled well in the new system yet
19:15:21 <corvus> so we may need to implement that before doing much more dogfooding
19:15:35 <clarkb> that makes sense considering we effectively have nodepool and zuul launcher fighting for resources
19:15:49 <corvus> yep
19:15:55 <clarkb> you might be able to get a sense of how things work otherwise during the weekend when overall demand is lower. But addressing quota seems like a good thing
19:16:15 <corvus> ya. i'll recheck some time to see if we can sneak in some runs
19:16:32 <corvus> but otherwise, we'll probably need that before doing more regularly scheduled dogfooding
19:16:42 <corvus> i think that's about it.
19:16:52 <clarkb> thanks! and again this is great progress
19:17:02 <corvus> yw!
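The second zuul-launcher bug discussed above involved decompressing zlib-compressed images before uploading them to clouds. The following is only a minimal sketch of what streaming zlib decompression can look like in Python, not zuul-launcher's actual code. The function name decompress_stream and the chunk size are hypothetical; only the standard library zlib module is used.

```python
import io
import zlib

# Hypothetical chunk size; chosen only for illustration.
CHUNK_SIZE = 64 * 1024


def decompress_stream(src, dst, chunk_size=CHUNK_SIZE):
    """Read zlib-compressed bytes from src and write the decompressed
    bytes to dst in chunks, so the whole image never sits in memory.

    Returns the number of decompressed bytes written. Raises ValueError
    if the input ends before the zlib stream is complete.
    """
    decomp = zlib.decompressobj()
    written = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        data = decomp.decompress(chunk)
        dst.write(data)
        written += len(data)
    # Flush anything still buffered inside the decompressor.
    tail = decomp.flush()
    dst.write(tail)
    written += len(tail)
    # decomp.eof is True only once the end-of-stream marker was seen.
    if not decomp.eof:
        raise ValueError("truncated zlib stream")
    return written
```

In a real upload path src would be the compressed image file and dst a temporary file that is handed to the cloud upload; both are assumptions here, since the meeting does not describe how the launcher does this.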
19:17:06 <clarkb> #topic Upgrading old servers
19:17:22 <clarkb> I discovered a new podman issue yesterday replacing zuul-lb01 with a new noble node
19:17:57 <clarkb> tl;dr is that apparmor rules on noble don't allow docker compose or podman to send signals like hup to containers. Our haproxy config management uses sighup to request graceful config reloading
19:18:08 <clarkb> #link https://bugs.launchpad.net/ubuntu/+source/libpod/+bug/2089664
19:18:31 <clarkb> #link https://github.com/containers/common/issues/2321
19:18:57 <clarkb> first link is an ubuntu issue and the second is the issue I filed upstream. They both contain a link to a proposed fix from november that hasn't gone anywhere yet
19:19:05 <clarkb> I'm hoping that reviving the discussion a bit might get things to move along
19:19:24 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/941256 workaround podman signal problems
19:19:38 <clarkb> in the meantime I've rewritten our ansible to issue a sighup with kill(1)
19:19:45 <clarkb> this seems to work with manual testing on zuul-lb02
19:20:07 <clarkb> that change has a -1 due to docker quota limits but in #opendev I linked to the ara report showing it working for the zuul load balancer at least
19:20:30 <clarkb> I'll recheck that change after the meeting to see if we can get it to pass (later in the day my local time seems to be better for docker quota limits)
19:21:09 <clarkb> reviews welcome as well as any concerns with the general plan for moving to podman on noble. I think that we continue to be able to find reasonable workarounds for limitations though so I personally think we can keep going with our plan
19:22:26 <clarkb> tonyb: any updates from your end?
19:24:08 <tonyb> Nothing from me.
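The workaround described above sends SIGHUP from the host with kill(1) instead of asking docker compose or podman to deliver the signal. A rough Python equivalent of that idea is sketched below. It is not the ansible from the linked change: the helper names are hypothetical, and it assumes the container's main process is the one that should receive the signal and that the caller has permission to signal it.

```python
import os
import signal
import subprocess


def inspect_pid_command(container, runtime="podman"):
    """Build the command that prints the host PID of a container's
    main process. Both podman and docker accept this inspect format."""
    return [runtime, "inspect", "--format", "{{.State.Pid}}", container]


def container_pid(container, runtime="podman"):
    """Ask the container runtime for the host PID of the container."""
    result = subprocess.run(
        inspect_pid_command(container, runtime),
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())


def send_hup(pid):
    """Send SIGHUP directly to a host PID, as kill -HUP <pid> would."""
    os.kill(pid, signal.SIGHUP)
```

Usage would be something like send_hup(container_pid("haproxy")), where the container name "haproxy" is a placeholder rather than the name used in the real deployment.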
19:25:04 <clarkb> #topic Sprinting to Upgrade Servers to Noble
19:25:17 <clarkb> I forgot to edit the agenda and fix the typo in this topic but I fixed it here
19:25:24 <clarkb> #link https://etherpad.opendev.org/p/opendev-server-replacement-sprint
19:25:57 <clarkb> This etherpad captures a list of servers that should be upgraded in the nearish future. I then tried to categorize them by ease of upgrade with the idea being more servers on noble finds more problems before we eventually upgrade review
19:26:07 <clarkb> and "good" news the podman signal problem is a direct result of that
19:26:40 <clarkb> I've been working to replace zuul-lb and codesearch servers. The zuul-lb02 server is deployed and we can technically switch dns to point at it at this time if we like but we have to land that fix for signals if we want graceful config updates
19:27:02 <clarkb> codesearch is running more test jobs (that may not be necessary I guess) and has hit docker rate limits pretty consistently until my last recheck yesterday evening
19:27:21 <clarkb> I think those changes are all ready for review and I'm happy to approve them and watch them go in
19:27:44 <clarkb> the etherpad has links to the changes. I'm trying to keep the info there as that should enable others to jump in and do upgrades too and add their notes in one central location
19:28:23 <tonyb> Sounds good.
19:28:57 <clarkb> I'm hoping I can get these two done today or early tomorrow then grab another couple or so off the list
19:29:09 <clarkb> anyway help is welcome both in reviews and in more upgrades.
19:29:19 <clarkb> #topic Running certcheck on bridge
19:29:29 <clarkb> fungi: any movement on this one?
19:30:18 <fungi> ah, nope
19:30:29 <fungi> sorry, been distracted by other matters still
19:30:43 <clarkb> it has been a busy week
19:30:48 <clarkb> #topic Service Coordinator Election
19:31:19 <clarkb> A reminder that nominations for OpenDev Service Coordinator are open until EOD February 18, 2025 UTC time
19:31:35 <clarkb> that gives you about one week remaining to nominate yourself if interested
19:31:54 <clarkb> if we have more than one candidate we'll have an election from the 19th to 26th. Again all UTC time based
19:32:02 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/NGS2APEFQB45OCJCQ645P5N6XCH52BXW/
19:32:11 <clarkb> that thread has all the details in it if you need to refer back to them later
19:33:33 <clarkb> I'm happy to answer any questions you may have about the position either publicly or privately. Feel free to reach out over irc, matrix or email if that works best for you
19:33:42 <clarkb> #topic Working through our TODO list
19:33:47 <clarkb> #link https://etherpad.opendev.org/p/opendev-january-2025-meetup
19:34:27 <clarkb> and this is just another friendly reminder of the todo list that came out of our meetup last month. If you find yourself needing to find something new to do this is a good place to start. It can also be a good place for a new contributor to dig in though maybe reach out and let us know so we can provide guidance
19:35:21 <clarkb> #topic Open Discussion
19:35:23 <clarkb> anything else?
19:38:01 <tonyb> Nothing from me
19:38:22 <clarkb> we're expecting to get $snoworice storm thursday through friday
19:38:38 <clarkb> it doesn't look too bad though so I don't expect it to have an impact. But there is always the risk of power outage etc
19:39:25 <clarkb> and then I may need someone else to chair next week's meeting.
Trying to do passports for kids and they don't have school tuesday and it isn't a holiday so we can try a walk in location rather than an appointment next month
19:39:35 <clarkb> I'll know more when we get closer to that
19:41:26 <clarkb> I'll give it until 19:45 for anything else but I suspect we can end 15 minutes early today
19:41:28 <clarkb> thanks everyone!
19:43:28 <fungi> we just had sleet here a few minutes ago
19:45:08 <clarkb> #endmeeting