19:00:18 <clarkb> #startmeeting infra
19:00:18 <opendevmeet> Meeting started Tue Jul  2 19:00:18 2024 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:18 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:18 <opendevmeet> The meeting name has been set to 'infra'
19:00:24 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/VV6IZNOMAO2KDKBBQ45R3VPNSRCOLWCG/ Our Agenda
19:00:28 <clarkb> #topic Announcements
19:00:45 <clarkb> A reminder that Thursday (and possibly Friday?) is a holiday for several of us
19:01:17 <tonyb> happy turkey day!
19:01:41 <clarkb> no, this is happy "please do your best to not set everything on fire with your fireworks" day :)
19:01:51 <corvus> i will celebrate with maple syrup!
19:02:20 <clarkb> might also be worth calling out that openstack has loaded up the CI system with some important changes and we should do our best to ensure we aren't impacting their ability to merge
19:02:35 <tonyb> oh right.  silly me
19:02:44 <clarkb> fungi: frickler: I saw rumblings of merge issues but haven't seen any of that tied back to infrastructure issues.
19:03:56 <clarkb> well we can get back to that at the end of the agenda if there are concerns
19:04:00 <clarkb> #topic Upgrading Old Servers
19:04:27 <clarkb> tonyb: I've not been able to pay as much attention to this as I would've liked over the last week. Plenty of other distractions. Anything new to share about the wiki work?
19:04:41 <tonyb> i can get the IP of the held node
19:05:11 <tonyb> I did a snapshot and import of the current wiki data and it seems good
19:05:20 <clarkb> tonyb: does that include the database content?
19:05:30 <tonyb> yup
19:05:57 <tonyb> it helped me discover some additional work to do but nothing significant
19:05:57 <clarkb> that is great. I guess that proves we can do the sideways migration (probably with a reasonable downtime to avoid deltas between the two sides)
19:06:26 <tonyb> yup I'd say the outage will be an hour tops
19:06:46 <tonyb> so I'd like y'all to look at the wiki on that node
19:07:03 <tonyb> to test for extension breakage etc
19:07:16 <tonyb> then I can publish changes for review
19:07:25 <clarkb> ++ I can #link a url if we have one (or do we need to edit /etc/hosts locally to have vhosts align?)
19:07:32 <fungi> are images working too? the separate non-database file storage in mediawiki required some careful handling last i did this
19:07:44 <tonyb> yup images work too
19:07:49 <fungi> awesome!
19:08:05 <tonyb> you'll need to edit hosts with the IP
19:08:27 <clarkb> do you have the IP? otherwise we have to go look in nodepool's hold list
19:08:28 <tonyb> I'll drop the details in #opendev when I get to my laptop
19:08:36 <clarkb> perfect thanks!
19:09:02 <clarkb> I also wanted to ask about booting noble nodes. Have we booted one yet? I know the stack to make that possible landed last week.
19:09:24 <tonyb> for noble I was going to try a mirror node first but that turned out to be complex
19:10:07 <tonyb> The vexxhost mirrors are currently boot from volume
19:10:25 <tonyb> the rax clouds don't have noble images
19:11:08 <tonyb> etc. Nothing super worrisome, but it means I'll probably stall for a bit while I figure out the right way forward
19:11:17 <clarkb> seems reasonable
19:11:42 <clarkb> I also wanted to note that debian fixed some openafs packaging so we may be able to flip some of those infra roles jobs that are non voting to voting at this point
19:11:57 <tonyb> 104.239.143.6	wiki99.opendev.org
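To poke at the held wiki node without editing /etc/hosts, one option is to send the request straight to the held IP while presenting the wiki99.opendev.org vhost name. A minimal sketch in Python, using only the IP and hostname tonyb pasted above (plain HTTP only; for HTTPS the /etc/hosts approach is simpler because of SNI):

    # Minimal sketch: hit the held node by IP while presenting the vhost name.
    # The IP and hostname are the ones pasted above; everything else is illustrative.
    import urllib.request

    HELD_IP = "104.239.143.6"
    VHOST = "wiki99.opendev.org"

    req = urllib.request.Request(f"http://{HELD_IP}/", headers={"Host": VHOST})
    with urllib.request.urlopen(req, timeout=30) as resp:
        print(resp.status, resp.headers.get("Content-Type"))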
19:12:25 <tonyb> I can look at that too
19:12:59 <tonyb> I wanted to check the next release and see if there is anything we need that isn't backported
19:12:59 <clarkb> I haven't tested the fixes myself, but flipping the jobs from non-voting to voting, running them, and then seeing what fails/succeeds seems reasonable
19:13:36 <clarkb> Anything else related to upgrading servers?
19:13:51 <tonyb> It's a bit of a tangent but I've been looking at kAFS also
19:14:11 <fungi> oh, i did confirm that this week's new openafs packages in debian fixed my dkms build problems
19:14:33 <fungi> speaking of afs
19:14:38 <tonyb> Nice
19:14:56 <clarkb> A good lead into the next topic
19:15:06 <clarkb> #topic Cleaning Up AFS Mirror Content
19:15:44 <clarkb> a number of the topic:drop-centos-8-stream changes have merged at this point, but what remains is currently stuck behind projects like glance (and tempest etc?) still trying to run fips jobs on centos 8 stream
19:16:18 <clarkb> I don't think we want to force merge cleanups this week due to the holiday and openstack being preoccupied with the security fix patches, but maybe we should consider force merging things next week?
19:17:01 <clarkb> Currently the jobs are broken and projects have just set things to non-voting, which is completely useless from our perspective. The jobs cannot succeed and need to be removed/replaced, and if that isn't happening more naturally (due to the non-voting workaround) I think we should be more forceful
19:17:51 <clarkb> any concerns with doing that next week? Maybe we want to see where the openstack patching stands on monday?
19:18:08 <tonyb> Sounds okay to me
19:18:19 <tonyb> we gave advance warning of that plan
19:19:14 <tonyb> What's the status of Xenial and Bionic?
19:19:43 <tonyb> What can I/we do to help get that content removed
19:20:00 <clarkb> tonyb: my efforts there stalled on trying to clean up centos 8 stream first (I prioritized that way since xenial jobs still function and centos 8 stream had fewer tendrils)
19:20:28 <clarkb> I do have a semi recent change up to system-config to remove our last uses of xenial with a warning that once we do that we are at higher risk of breaking things from an opendev perspective
19:20:45 <clarkb> that was always going to happen with our plan so it's a matter of timing. We could proceed with that if we think the risk is low enough
19:20:55 <clarkb> topic:drop-ubuntu-xenial should have that change. Let me see about a direct link
19:21:21 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/922680 Xenial CI Job removal from system-config
19:22:27 <tonyb> Thanks
19:22:45 <clarkb> #topic Gitea 1.22 Upgrade
19:23:39 <clarkb> Upstream still doesn't have a 1.22.1 release
19:24:15 <clarkb> so I haven't really looked at this much more. However, I did cycle out my held nodes as the previous ones were old enough to not have logs available in zuul related to them anymore
19:24:53 <clarkb> also a user pointed out yesterday on irc that tarball downloads for repos do not work currently. They were working not that long ago and the logs indicate 200 responses with 19 bytes of json saying "complete: false"
19:25:08 <clarkb> my plan is to test that with the 1.22 held nodes and see if the behavior persists as I suspect it may just be a gitea bug
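For reference, a rough way to reproduce the reported tarball behavior against the held nodes or the production frontends; the archive URL pattern and the example repo below are assumptions for illustration, not taken from the user's report:

    # Request a repo archive and inspect the status and body; the reported
    # failure was a 200 with a ~19 byte JSON body ("complete": false)
    # instead of an actual tarball.
    import urllib.request

    url = "https://opendev.org/opendev/system-config/archive/master.tar.gz"
    with urllib.request.urlopen(url, timeout=60) as resp:
        body = resp.read(64)
        # A healthy archive response is a gzip stream starting with b"\x1f\x8b".
        print(resp.status, resp.headers.get("Content-Type"), body[:32])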
19:25:53 <clarkb> #topic OpenMetal Cloud Rebuild
19:26:41 <clarkb> I haven't seen any response to my last email
19:27:20 <clarkb> The main concern is that I think configuring storage properly is something they should consider more generally for their product, and we're a good test case, which means we may want to avoid fixing it directly using frickler's kolla knowledge
19:28:14 <clarkb> basically trying to avoid stepping on toes and help provide some value back to the donating organization. With the holiday and it being summer for those of us in north america it may also just be a vacation problem. I'll try to follow up again after the holiday and see if we can get direction that avoids toe stepping
19:28:36 <tonyb> Sounds good.
19:28:38 <clarkb> #topic Testing Rackspace's New Cloud Offering
19:28:48 <clarkb> similarly with this one I haven't heard back on the email I sent
19:28:52 <tonyb> I guess we could also file an issue / ticket
19:29:05 <clarkb> tonyb: ya that might be another way to get attention
19:29:35 <clarkb> In the rackspace case I think I may lean on some folks at the foundation who may be in more regular contact with them and see if we can set something up
19:29:43 <clarkb> again with the holiday this week I don't expect it to move quickly though
19:30:00 <fungi> yeah, cloudnull seems too busy for our usual heckling ;)
19:31:28 <clarkb> #topic Nodepool in Zuul
19:31:50 <corvus> you may remember this zuul spec from a while back
19:32:28 <corvus> the main work of implementation on that has begun, so i think in the not too distant future, this may become more relevant to opendev
19:32:54 <clarkb> the main goals are to express image and node info directly in zuul configs as well as using the zuul runtime engine to process things like image builds
19:33:20 <corvus> yep, and a big part of that is being able to build images inside a zuul job
19:33:24 <clarkb> and reduce confusion over zuul and nodepool being different things for historical reasons despite being tightly coupled today
19:33:44 <fungi> and provide opportunities for things like acceptance testing of images, i suppose
19:33:55 <tonyb> Can you link to the spec so I can ask reasonable questions
19:33:57 <corvus> so in opendev, we will need to port our image building from nodepool-builder into zuul jobs
19:33:58 <corvus> fungi: yep
19:34:40 <corvus> #link nodepool in zuul spec https://zuul-ci.org/docs/zuul/latest/developer/specs/nodepool-in-zuul.html
19:34:51 <tonyb> Thank you
19:35:56 <clarkb> I suspect that we'll be able to port an image at a time as we sort out any unexpected items
19:36:06 <corvus> as for moving the image building into jobs -- there's a bit of work to do there, but i don't think it's going to be too bad, and we'll have help
19:36:30 <corvus> first, i expect that the image build jobs are basically going to be "run diskimage-builder with the same parameters we use today inside nodepool-builder"
19:36:57 <corvus> second, the folks at bmw already build their images this way, and are offering their existing ansible roles that execute DIB to zuul-jobs
19:37:18 <corvus> so a lot of the boilerplate for "run dib in a zuul job" should exist in some form
19:37:42 <corvus> we also have my old proof-of-concept patch: https://review.opendev.org/848792
19:37:55 <corvus> that just does it in a shell script, but the principle is the same
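As a sketch of what such a job's payload boils down to, the run step is essentially "invoke diskimage-builder with the parameters nodepool-builder uses today". The element list, environment variable, and output name below are placeholders for illustration, not the actual opendev configuration:

    # Hypothetical image-build payload: call disk-image-create directly, the
    # same way the nodepool-builder/proof-of-concept shell script does.
    import os
    import subprocess

    elements = ["ubuntu-minimal", "vm", "simple-init"]   # placeholder element list
    env = dict(os.environ, DIB_RELEASE="noble")          # placeholder DIB setting

    subprocess.run(
        ["disk-image-create", "-o", "ubuntu-noble", "-t", "qcow2", *elements],
        env=env,
        check=True,
    )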
19:38:10 <clarkb> I suspect that zuul will want to ship things that work out of the box for people migrating (though we're likely the test case for those)?
19:38:45 <corvus> so anyway, as clarkb says, i think we can start working on these image jobs one at a time, once zuul grows the ability to run them
19:39:35 <corvus> clarkb: yeah, i think there should be a straightforward path for anyone using nodepool-builder.
19:40:08 <corvus> the other thing opendev may see soon is a zuul-launcher host
19:40:30 <tonyb> Exciting :)
19:40:32 <clarkb> talking out loud here: but do we need another host? Could just run that on the existing launcher nodes?
19:40:53 <corvus> that will be the zuul component responsible for launching nodes, and at least at first, driving image build workflow
19:40:57 <clarkb> though maybe since nodepool goes away long term, not having things named nlXY is preferred
19:41:30 <corvus> clarkb: we... could, i think?  but unless we're very constrained, i think it would be good to have a new host for simplicity
19:41:56 <clarkb> I don't think we're constrained. Was more thinking it might speed up the conversion process to not launch new nodes too (though that isn't too big of a deal either)
19:42:13 <corvus> the process for doing all of this will be to effectively develop this in a shadow mode.  so zuul is going to grow a lot of features that are not enabled by default, and are not documented.
19:42:35 <corvus> and i think we're going to get all the way to the end and have the ability to run both systems in parallel for a while before we call it done.
19:43:04 <fungi> that sounds reasonable
19:43:18 <corvus> i'll take care of writing the deployment changes, and doing a lot of the (undocumented) job/workflow construction
19:43:36 <fungi> thanks!
19:43:46 <corvus> i will be leaning on other folks to help with the image build jobs themselves, because i am not as expert in that as others are :)
19:44:06 <clarkb> sounds like a plan
19:44:27 * tonyb is far from an expert but is happy to "drive" the opendev side of the image-building
19:44:54 <corvus> tonyb: ++ thanks!  and you will be soon!  :)
19:45:04 <clarkb> anything else?
19:45:07 <tonyb> \o/
19:45:12 <corvus> i think you can expect to see some deployment changes relatively soon....
19:45:50 <corvus> oh, and if anyone asks, i don't think opendev running zuul-launcher should be seen as a sign it's time for other folks to do so
19:46:10 <clarkb> probably want to point people at zuul release notes for signal that they are expected to migrate?
19:46:37 <corvus> obviously anyone is welcome to -- but for clarity, sometimes we point to ourselves as an example of how zuul should be run; but in this case i consider this more like part of the development process, and it's not a signal of maturity or production readiness.
19:46:40 <corvus> just want to be clear about that
19:46:54 <corvus> clarkb: exactly -- documentation and release notes will be a much better signal for that
19:47:09 <tonyb> Good to be clear
19:47:14 <corvus> i think that's about it; thanks
19:47:17 <tonyb> thanks corvus
19:47:40 <fungi> that is also how we handled the switch from jenkins to ansible builders in zuul 2.5.x
19:47:46 <clarkb> #topic Collating Backlog Items From the Group
19:48:02 <clarkb> tonyb: you added this item, do you want to drive or should I from the notes? (I know it is early for you)
19:48:10 <tonyb> I can
19:48:19 <clarkb> go for it
19:48:24 <tonyb> It's kind of a discussion point
19:48:50 <tonyb> As a group we're somewhat overcommitted and we all have a list (physical or not) of things to do
19:49:09 <fungi> i have a to do item to find where i put my to do list
19:49:22 <tonyb> I was thinking of coming up with a lightweight way to track these things
19:50:02 <fungi> we've used etherpads fairly well for scratch coordination in the past
19:50:15 <tonyb> My initial idea was to have a bot (#noteit) that we could use to add a topic and link to the IRC logs where an item was discussed
19:50:25 <clarkb> one idea I had was that maybe one of the super simple web kanban board things that have sprouted up could be worth trying. Downside to that is who knows how long those services will stay up and whether or not data is exportable
19:50:38 <clarkb> but https://gokanban.io/ for example is like etherpad for kanban I think
19:51:20 <clarkb> tonyb: I do like the idea of linking back to the irc logs for greater context
19:51:31 <corvus> i like the idea of #noteit so when i'm off doing something for a day or two and come back i can find the important bits in backlog;  seems similar to #status log which we might be able to abuse for that purpose too
19:51:33 <clarkb> so much of the context ends up in IRC and we often end up grepping/googling/searching irc logs
19:51:58 <clarkb> ya I'd be willing to try it
19:52:01 <tonyb> Any of those could work I mainly wanted a non-invasive way to keep track
19:52:22 <fungi> or we could add a new #status track subcommand, but sure, log and even the others could link back to the irc log url where they were called
19:52:29 <clarkb> (to be clear the kanban thing was an idea in addition to the noteit idea, not a replacement. It was just something that came to mind when reading this on the agenda yesterday as I put it together)
19:53:35 <clarkb> sounds like there are no objections if someone has time to implement the feature in the bot
19:53:43 <tonyb> I figured as much. #noteit would be the "quick add" to $whatever and then we'd edit that backend once it's done
19:54:03 <clarkb> makes sense
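A minimal sketch of the #noteit "quick add" idea: record a topic plus a link back to the channel log so the surrounding context can be found later. The storage (a flat JSON file) and the log URL layout are illustrative assumptions, not how statusbot actually works:

    # Hypothetical quick-add handler: store the topic with a pointer back to
    # the day's channel log.
    import json
    from datetime import datetime, timezone

    NOTES_FILE = "notes.json"

    def note_it(channel: str, nick: str, topic: str) -> dict:
        now = datetime.now(timezone.utc)
        note = {
            "topic": topic,
            "nick": nick,
            "channel": channel,
            "when": now.isoformat(),
            # Assumed log URL layout; the real archive layout may differ.
            "log": f"https://meetings.opendev.org/irclogs/%23{channel.lstrip('#')}/"
                   f"%23{channel.lstrip('#')}.{now:%Y-%m-%d}.log.html",
        }
        try:
            with open(NOTES_FILE) as f:
                notes = json.load(f)
        except FileNotFoundError:
            notes = []
        notes.append(note)
        with open(NOTES_FILE, "w") as f:
            json.dump(notes, f, indent=2)
        return note

    # e.g. note_it("#opendev", "tonyb", "look into kAFS as an openafs alternative")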
19:54:05 <tonyb> A related note: the specs repo is ... outdated
19:54:14 <fungi> yes indeedily-doodily
19:54:38 <clarkb> ya the main issue with the specs repo is as you note we're largely overcommitted with the existing stuff so finding time for new things or major changes is difficult and then that reflects back on the specs repo
19:54:56 <tonyb> If I start moving things around in there
19:54:59 <clarkb> but yes I think we should probably declare bankruptcy in the specs repo and carry over a small set of things we know we absolutely want to do
19:55:14 <fungi> in my tenure as ptl, pre-opendev, i tried to reframe the specs list as a help-wanted board
19:55:16 <clarkb> prometheus and the login improvements come to mind as things to carry over
19:55:16 <tonyb> for example mailman3 can be marked as done right?
19:55:20 <fungi> since that's more what it is
19:55:24 <clarkb> tonyb: yes that one can be marked done
19:55:29 <fungi> but yeah, some of those things can be crossed off the list now
19:55:59 <clarkb> tonyb: but ya I think patches to clean that up and maybe reflect the help-wanted-ness of the situation more explicitly would all be good
19:56:55 <tonyb> Okay.  I can push some of those and ask for reviews from time-to-time to make sure I'm on the right path
19:57:04 <clarkb> ++ sounds good
19:57:07 <fungi> thanks!
19:57:11 <clarkb> we have a few more minutes
19:57:15 <clarkb> #topic Open Discussion
19:57:25 <clarkb> anything else that didn't get captured in the agenda that we want to call out quickly?
19:57:30 <tonyb> So in the near term I'll add a new mode to the #status bot to do the note-it stuff
19:57:49 <fungi> the lists performance adjustment seems to have worked out, messages are coming through quite a lot faster now
19:58:57 <clarkb> ya I haven't noticed any major lags since the queuing stuff went in
19:59:31 <clarkb> thank you everyone for all your help running OpenDev and your time during the meeting!
19:59:34 <tonyb> Nice.  My mail is often laggy anyway so I didn't notice at all :/
19:59:52 <clarkb> We'll be back here next week at our regularly scheduled time, but as always feel free to start discussions on the mailing list or on irc if things are urgent
19:59:59 <clarkb> or if you just want quick feedback, it doesn't have to be urgent
20:00:06 <clarkb> #endmeeting