19:00:11 <clarkb> #startmeeting infra
19:00:11 <opendevmeet> Meeting started Tue Feb 11 19:00:11 2025 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:11 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:11 <opendevmeet> The meeting name has been set to 'infra'
19:00:54 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/2PCH5D4ULKXRVIFVGKMOVZ2L2MAXHPBB/ Our Agenda
19:01:16 <clarkb> #topic Announcements
19:02:00 <clarkb> I'm trying to find the archive link for the openinfra foundation stuff
19:02:05 <clarkb> and lists is being slow
19:02:17 <clarkb> #link https://lists.openinfra.org/archives/list/foundation@lists.openinfra.org/thread/3B7OWPRXB4KD2DVX7SYYSHYYRNCKVV46/
19:02:26 <fungi> you beat me to it
19:02:35 <clarkb> if you haven't seen it yet the foundation has an important announcement asking for feedback on shaping the foundation's future
19:02:57 <clarkb> if you haven't seen it yet that is probably worth a read then fungi or myself are happy to proxy questions or you can reach out directly to jbryce as indicated in the email
19:03:24 <clarkb> in general I would probably encourage people to reach out on the mailing list there if they are comfortable just to keep the feedback in one place as much as possible
19:03:57 <clarkb> anything else to announce?
19:05:02 <clarkb> #topic Zuul-launcher image builds
19:05:25 <clarkb> last week corvus added ubuntu jammy and noble images to zuul-launcher config so that zuul projects can start to dogfood the new launcher system
19:05:38 <clarkb> those images came with new 4gb, 8gb, and 16gb flavors/labels
19:05:56 <clarkb> at least two bugs were discovered. The first had to do with handling of jobs without nodesets (including noop jobs). I believe this was fixed
19:06:11 <clarkb> the other is in decompressing zlib compressed images before uploading them to clouds
19:06:22 <clarkb> I don't know if that got fixed in time for a weekly reboot last friday/saturday
19:06:39 <clarkb> corvus: any news on that bugfix and progress with dogfooding?
19:08:16 <clarkb> I'll give it another minute or so but we can probably move on. I don't know that there is much to do on the opendev side
19:08:30 <clarkb> but good progress is being made (yes finding bugs is progress!)
19:09:08 <clarkb> #topic Unpinning our Grafana deployment
19:09:27 <clarkb> last week we moved our grafana deployment to a new noble host and in the process also updated the grafana version to the latest 10.x release
19:09:43 <clarkb> doing so now produces deprecation warnings on some graphs that they use angular and angular is deprecated
19:10:11 <clarkb> I pushed up a change to test and see what breaks with grafana 11 (expecting breakage due to the deprecated angular stuff) but everything actually seems to just work and the deprecation warnings are gone too
19:10:19 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/940997 Update to Grafana 11
19:10:44 <corvus> sorry i'm late
19:10:48 <clarkb> I updated that change's testinfra testing to grab the json definitions for each dashboard to compare against current production and they don't really differ in meaningful ways
19:11:03 <clarkb> corvus: that's ok we can switch back if you have any more to add on zuul-launcher after this topic
19:11:09 <corvus> ++
19:11:38 <clarkb> so I'm a bit stumped on whether or not it is a problem for us to update to grafana 11. I suspect that automatic conversion of old deprecated stuff to new stuff is what makes this all work happily but it isn't clear to me if we need to change anything to keep working in the future
19:12:04 <clarkb> we could just send it and upgrade to grafana 11 and figure it out later or we can hold a test node now and try to understand it better before taking that step. Curious if there was any input on those options
19:12:55 <fungi> if the held server seems to be working as-is with our current data sources, i'm fine just upgrading
19:13:14 <clarkb> note there isn't a current held server, just log collection of our system-config-run job
19:13:28 <clarkb> but yes the screenshots look ok and the json doesn't meaningfully change in those logs
19:14:05 <clarkb> don't need an answer now. Maybe drop your thoughts on the linked change
19:14:16 <clarkb> #topic Zuul-launcher image builds
19:14:20 <clarkb> corvus: ok back to this for your updates
19:14:35 <corvus> the bugfixes are all in and things are working better now
19:14:59 <corvus> we're running into quota errors (!) and those aren't handled well in the new system yet
19:15:21 <corvus> so we may need to implement that before doing much more dogfooding
19:15:35 <clarkb> that makes sense considering we effectively have nodepool and zuul launcher fighting for resources
19:15:49 <corvus> yep
19:15:55 <clarkb> you might be able to get a sense of how things work otherwise during the weekend when overall demand is lower. But addressing quota seems like a good thing
19:16:15 <corvus> ya.  i'll recheck some time to see if we can sneak in some runs
19:16:32 <corvus> but otherwise, we'll probably need that before doing more regularly scheduled dogfooding
19:16:42 <corvus> i think that's about it.
19:16:52 <clarkb> thanks! and again this is great progress
19:17:02 <corvus> yw!
19:17:06 <clarkb> #topic Upgrading old servers
19:17:22 <clarkb> I discovered a new podman issue yesterday replacing zuul-lb01 with a new noble node
19:17:57 <clarkb> tl;dr is that apparmor rules on noble don't allow docker compose or podman to send signals like hup to containers. Our haproxy config management uses sighup to request graceful config reloading
19:18:08 <clarkb> #link https://bugs.launchpad.net/ubuntu/+source/libpod/+bug/2089664
19:18:31 <clarkb> #link https://github.com/containers/common/issues/2321
19:18:57 <clarkb> first link is an ubuntu issue and the second is the issue I filed upstream. They both contain a link to a proposed fix from november that hasn't gone anywhere yet
19:19:05 <clarkb> I'm hoping that reviving the discussion a bit might get things to move along
19:19:24 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/941256 workaround podman signal problems
19:19:38 <clarkb> in the meantime I've rewritten our ansible to issue a sighup with kill(1)
19:19:45 <clarkb> this seems to work with manual testing on zuul-lb02
19:20:07 <clarkb> that change has a -1 due to docker quota limits but in #opendev I linked to the ara report showing it working for the zuul load balancer at least
19:20:30 <clarkb> I'll recheck that change after the meeting to see if we can get it to pass (later in the day my local time seems to be better for docker quota limits)
19:21:09 <clarkb> reviews welcome as well as any concerns with the general plan for moving to podman on noble. I think that we continue to be able to find reasonable workarounds for limitations though so I personally think we can keep going with our plan
19:22:26 <clarkb> tonyb: any updates from your end?
19:24:08 <tonyb> Nothing from me.
19:25:04 <clarkb> #topic Sprinting to Upgrade Servers to Noble
19:25:17 <clarkb> I forgot to edit the agenda and fix the typo in this topic but I fixed it here
19:25:24 <clarkb> #link https://etherpad.opendev.org/p/opendev-server-replacement-sprint
19:25:57 <clarkb> This etherpad captures a list of servers that should be upgraded in the nearish future. I then tried to categorize them by ease of upgrade with the idea being more servers on noble finds more problems before we eventually upgrade review
19:26:07 <clarkb> and "good" news the podman signal problem is a direct result of that
19:26:40 <clarkb> I've been working to replace zuul-lb and codesearch servers. The zuul-lb02 server is deployed and we can technically switch dns to point at it at this time if we like but we have to land that fix for signals if we want graceful config updates
19:27:02 <clarkb> codesearch is running more test jobs (that may not be necessary I guess) and has hit docker rate limits pretty consistently until my last recheck yesterday evening
19:27:21 <clarkb> I think those changes are all ready for review and I'm happy to approve them and watch them go in
19:27:44 <clarkb> the etherpad has links to the changes. I'm trying to keep the info there as that should enable others to jump in and do upgrades too and add their notes in one central location
19:28:23 <tonyb> Sounds good.
19:28:57 <clarkb> I'm hoping I can get these two done today or early tomorrow then grab another couple or so off the list
19:29:09 <clarkb> anyway help is welcome both in reviews and in more upgrades.
19:29:19 <clarkb> #topic Running certcheck on bridge
19:29:29 <clarkb> fungi: any movement on this one?
19:30:18 <fungi> ah, nope
19:30:29 <fungi> sorry, been distracted by other matters still
19:30:43 <clarkb> it has been a busy week
19:30:48 <clarkb> #topic Service Coordinator Election
19:31:19 <clarkb> A reminder that nominations for OpenDev Service Coordinator are open until EOD February 18, 2025 UTC time
19:31:35 <clarkb> that gives you about one week remaining to nominate yourself if interested
19:31:54 <clarkb> if we have more than one candidate we'll have an election from the 19th to 26th. Again all UTC time based
19:32:02 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/NGS2APEFQB45OCJCQ645P5N6XCH52BXW/
19:32:11 <clarkb> that thread has all the details in it if you need to refer back to them later
19:33:33 <clarkb> I'm happy to answer any questions you have about the position either publicly or privately. Feel free to reach out over irc, matrix or email if that works best for you
19:33:42 <clarkb> #topic Working through our TODO list
19:33:47 <clarkb> #link https://etherpad.opendev.org/p/opendev-january-2025-meetup
19:34:27 <clarkb> and this is just another friendly reminder of the todo list that came out of our meetup last month. If you find yourself needing to find something new to do this is a good place to start. It can also be a good place for a new contributor to dig in though maybe reach out and let us know so we can provide guidance
19:35:21 <clarkb> #topic Open Discussion
19:35:23 <clarkb> anything else?
19:38:01 <tonyb> Nothing from me
19:38:22 <clarkb> we're expecting to get $snoworice storm thursday through friday
19:38:38 <clarkb> it doesn't look too bad though so I don't expect it to have an impact. But there is always the risk of power outage etc
19:39:25 <clarkb> and then I may need someone else to chair next week's meeting. Trying to do passports for kids and they don't have school tuesday and it isn't a holiday so we can try a walk in location rather than an appointment next month
19:39:35 <clarkb> I'll know more when we get closer to that
19:41:26 <clarkb> I'll give it until 19:45 for anything else but I suspect we can end 15 minutes early today
19:41:28 <clarkb> thanks everyone!
19:43:28 <fungi> we just had sleet here a few minutes ago
19:45:08 <clarkb> #endmeeting