19:00:18 #startmeeting infra
19:00:18 Meeting started Tue Jul 2 19:00:18 2024 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:18 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:18 The meeting name has been set to 'infra'
19:00:24 #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/VV6IZNOMAO2KDKBBQ45R3VPNSRCOLWCG/ Our Agenda
19:00:28 #topic Announcements
19:00:45 A reminder that Thursday (and possibly Friday?) is a holiday for several of us
19:01:17 happy turkey day!
19:01:41 no, this is happy "please do your best to not set everything on fire with your fireworks" day :)
19:01:51 i will celebrate with maple syrup!
19:02:20 might also be worth calling out that openstack has loaded up the CI system with some important changes and we should do our best to ensure we aren't impacting their ability to merge
19:02:35 oh right. silly me
19:02:44 fungi: frickler: I saw rumblings of merge issues but haven't seen any of that tied back to infrastructure issues.
19:03:56 well we can get back to that at the end of the agenda if there are concerns
19:04:00 #topic Upgrading Old Servers
19:04:27 tonyb: I've not been able to pay as much attention to this as I would've liked over the last week. Plenty of other distractions. Anything new to share about the wiki work?
19:04:41 i can get the IP of the held node
19:05:11 I did a snapshot and import of the current wiki data and it seems good
19:05:20 tonyb: does that include the database content?
19:05:30 yup
19:05:57 it helped me discover some additional work to do but nothing significant
19:05:57 that is great. I guess that proves we can do the sideways migration (probably with a reasonable downtime to avoid deltas between the two sides)
19:06:26 yup I'd say the outage will be an hour tops
19:06:46 so I'd like y'all to look at the wiki on that node
19:07:03 to test for extension breakage etc
19:07:16 then I can publish changes for review
19:07:25 ++ I can #link a url if we have one (or do we need to edit /etc/hosts locally to have vhosts align?)
19:07:32 are images working too? the separate non-database file storage in mediawiki required some careful handling last i did this
19:07:44 yup images work too
19:07:49 awesome!
19:08:05 you'll need to edit hosts with the IP
19:08:27 do you have the IP? otherwise we have to go look in nodepool's hold list
19:08:28 I'll drop the details in #opendev when I get to my laptop
19:08:36 perfect thanks!
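For anyone wanting to poke at the held wiki node before editing /etc/hosts, here is a minimal smoke-test sketch. It assumes the IP and hostname tonyb shares later in this log (104.239.143.6, wiki99.opendev.org) and that the test vhost answers based on the Host header alone; if anything about the vhost setup differs, the /etc/hosts route described above remains the reliable option.

```python
#!/usr/bin/env python3
"""Quick smoke test of a held wiki node.

Sends a request to the held node's IP with the expected hostname in the
Host header, so /etc/hosts does not need to be edited just to peek at it.
The IP and hostname are the ones shared later in this meeting; adjust as
needed. TLS verification is disabled because the held node will not have a
certificate matching the name.
"""
import requests
import urllib3

HELD_IP = "104.239.143.6"        # held node IP from the meeting log
HOSTNAME = "wiki99.opendev.org"  # vhost name the held node expects

urllib3.disable_warnings()  # we knowingly skip cert verification below

resp = requests.get(
    f"https://{HELD_IP}/",
    headers={"Host": HOSTNAME},
    verify=False,
    timeout=30,
)
print(resp.status_code, resp.headers.get("Content-Type"))
# A MediaWiki landing page normally mentions MediaWiki in its HTML.
if "MediaWiki" in resp.text:
    print("looks like mediawiki is answering")
else:
    print("unexpected content; check the vhost setup")
```

Browser-based checks of extensions and image uploads still need the /etc/hosts entry, since that is what makes the normal URLs resolve to the held node.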
19:09:02 I also wanted to ask about booting noble nodes. Have we booted one yet? I know the stack to make that possible landed last week.
19:09:24 for noble I was going to try mirror mode but that was made complex
19:10:07 The vexxhost mirrors are currently boot from volume
19:10:25 the rax clouds don't have noble images
19:11:08 etc nothing super worrisome but it means I'll probably stall for a bit while I figure out the right way forward
19:11:17 seems reasonable
19:11:42 I also wanted to note that debian fixed some openafs packaging so we may be able to flip some of those infra roles jobs that are non voting to voting at this point
19:11:57 104.239.143.6 wiki99.opendev.org
19:12:25 I can look at that too
19:12:59 I wanted to check the next release and see if there is anything we need that isn't backported
19:12:59 I haven't tested the fixes myself but flipping from non voting to voting to run the jobs then see what fails/succeeds seems reasonable
19:13:36 Anything else related to upgrading servers?
19:13:51 It's a bit of a tangent but I've been looking at kAFS also
19:14:11 oh, i did confirm that this week's new openafs packages in debian fixed my dkms build problems
19:14:33 speaking of afs
19:14:38 Nice
19:14:56 A good lead into the next topic
19:15:06 #topic Cleaning Up AFS Mirror Content
19:15:44 a number of the topic:drop-centos-8-stream changes have merged at this point, but what remains is currently stuck behind projects like glance (and tempest etc?) still trying to run fips jobs on centos 8 stream
19:16:18 I don't think we want to force merge cleanups this week due to the holiday and openstack being preoccupied with the security fix patches, but maybe we should consider force merging things next week?
19:17:01 Currently the jobs are broken and projects have just set things to non voting which is completely useless from our perspective. The jobs cannot succeed and need to be removed/replaced and if that isn't happening more naturally (due to the non voting workaround) I think we should be more forceful
19:17:51 any concerns with doing that next week? Maybe we want to see where the openstack patching stands on monday?
19:18:08 Sounds okay to me
19:18:19 we pre-warned of that plan
19:19:14 What's the status of Xenial and Bionic?
19:19:43 What can I/we do to help get that content removed?
19:20:00 tonyb: my efforts there stalled on trying to clean up centos 8 stream first (I prioritized that way since xenial jobs still function and centos 8 stream had fewer tendrils)
19:20:28 I do have a semi recent change up to system-config to remove our last uses of xenial with a warning that once we do that we are at higher risk of breaking things from an opendev perspective
19:20:45 that was always going to happen with our plan so it's a matter of timing. We could proceed with that if we think the risk is low enough
19:20:55 topic:drop-ubuntu-xenial should have that change. Let me see about a direct link
19:21:21 #link https://review.opendev.org/c/opendev/system-config/+/922680 Xenial CI Job removal from system-config
19:22:27 Thanks
19:22:45 #topic Gitea 1.22 Upgrade
19:23:39 Upstream still doesn't have a 1.22.1 release
19:24:15 so I haven't really looked at this much more. However, I did cycle out my held nodes as the previous ones were old enough to not have logs available in zuul related to them anymore
19:24:53 also a user pointed out yesterday on irc that tarball downloads for repos do not work currently. They were working not that long ago and the logs indicate 200 responses with 19 bytes of json saying "complete: false"
19:25:08 my plan is to test that with the 1.22 held nodes and see if the behavior persists as I suspect it may just be a gitea bug
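To illustrate the tarball check planned above, here is a small sketch that fetches a repository archive from a gitea node and reports what comes back. The held node address is a placeholder and the repo/branch are arbitrary examples; the /owner/repo/archive/ref.tar.gz path is gitea's standard archive URL.

```python
#!/usr/bin/env python3
"""Check whether repo tarball downloads work on a (held) gitea node.

GITEA is a placeholder for the held node's address; opendev/system-config
and master are just example repo/branch values. A healthy response should be
a gzip'd tarball, not a tiny JSON body like {"complete": false}.
"""
import requests
import urllib3

GITEA = "https://203.0.113.10:3081"  # placeholder held-node address
URL = f"{GITEA}/opendev/system-config/archive/master.tar.gz"

urllib3.disable_warnings()  # held node cert will not match, skip verification
resp = requests.get(URL, verify=False, timeout=60)

print("status:", resp.status_code)
print("content-type:", resp.headers.get("Content-Type"))
print("length:", len(resp.content))
if resp.content[:2] == b"\x1f\x8b":  # gzip magic bytes
    print("looks like a real tarball")
else:
    print("unexpected body:", resp.content[:64])
```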
19:25:53 #topic OpenMetal Cloud Rebuild
19:26:41 I haven't seen any response to my last email
19:27:20 The main concern is that I think configuring storage properly is something that they should consider more generally for their product and we're a good test case, which means we may want to avoid fixing it directly using frickler's kolla knowledge
19:28:14 basically trying to avoid stepping on toes and help provide some value back to the donating organization. With the holiday and it being summer for those of us in North America it may also just be a vacation problem. I'll try to follow up again after the holiday and see if we can get direction that avoids toe stepping
19:28:36 Sounds good.
19:28:38 #topic Testing Rackspace's New Cloud Offering
19:28:48 similarly with this one I haven't heard back on the email I sent
19:28:52 I guess we could also file an issue / ticket
19:29:05 tonyb: ya that might be another way to get attention
19:29:35 In the rackspace case I think I may lean on some folks at the foundation who may be in more regular contact with them and see if we can set something up
19:29:43 again with the holiday this week I don't expect it to move quickly though
19:30:00 yeah, cloudnull seems to busy for our usual heckling ;)
19:30:05 er, too busy
19:31:28 #topic Nodepool in Zuul
19:31:50 you may remember this zuul spec from a while back
19:32:28 the main work of implementation on that has begun, so i think in the not too distant future, this may become more relevant to opendev
19:32:54 the main goals are to express image and node info directly in zuul configs as well as using the zuul runtime engine to process things like image builds
19:33:20 yep, and a big part of that is being able to build images inside a zuul job
19:33:24 and reduce confusion over zuul and nodepool being different things for historical reasons despite being tightly coupled today
19:33:44 and provide opportunities for things like acceptance testing of images, i suppose
19:33:55 Can you link to the spec so I can ask reasonable questions?
19:33:57 so in opendev, we will need to port our image building from nodepool-builder into zuul jobs
19:33:58 fungi: yep
19:34:40 #link nodepool in zuul spec https://zuul-ci.org/docs/zuul/latest/developer/specs/nodepool-in-zuul.html
19:34:51 Thank you
19:35:56 I suspect that we'll be able to port an image at a time as we sort out any unexpected items
19:36:06 as for moving the image building into jobs -- there's a bit of work to do there, but i don't think it's going to be too bad, and we'll have help
19:36:30 first, i expect that the image build jobs are basically going to be "run diskimage-builder with the same parameters we use today inside nodepool-builder"
19:36:57 second, the folks at bmw already build their images this way, and are offering their existing ansible roles that execute DIB to zuul-jobs
19:37:18 so a lot of the boilerplate for "run dib in a zuul job" should exist in some form
19:37:42 we also have my old proof-of-concept patch: https://review.opendev.org/848792
19:37:55 that just does it in a shell script, but the principle is the same
19:38:10 I suspect that zuul will want to ship things that work out of the box for people migrating (though we're likely the test case for those)?
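To make the "run diskimage-builder with the same parameters we use today" idea concrete, here is a rough sketch (not the BMW roles or the proof-of-concept change above) of the core invocation such a job would wrap. The element list and environment values are illustrative only; a real job would carry over the exact elements, env vars, and formats currently set in nodepool-builder's configuration.

```python
#!/usr/bin/env python3
"""Illustrative core of an image-build job: invoke diskimage-builder.

The elements and environment values below are made up for the example; a
real job would pull the exact settings from the existing nodepool-builder
configuration rather than hard-coding them here.
"""
import os
import subprocess

image_name = "ubuntu-noble"
elements = ["ubuntu-minimal", "vm", "simple-init"]  # example element list only
env = dict(os.environ, DIB_RELEASE="noble", DIB_CHECKSUM="1")

subprocess.run(
    [
        "disk-image-create",
        "-x",                 # trace what DIB is doing
        "-t", "qcow2,raw",    # image formats to produce
        "-o", image_name,     # output file name prefix
        *elements,
    ],
    check=True,
    env=env,
)
```

The invocation itself is the easy part; the interesting work is presumably wiring artifact upload and image metadata back into zuul, which is what the spec covers.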
19:38:45 so anyway, as clarkb says, i think we can start working on these image jobs one at a time, once zuul grows the ability to run them
19:39:35 clarkb: yeah, i think there should be a straightforward path for anyone using nodepool-builder.
19:40:08 the other thing opendev may see soon is a zuul-launcher host
19:40:30 Exciting :)
19:40:32 talking out loud here: but do we need another host? Could we just run that on the existing launcher nodes?
19:40:53 that will be the zuul component responsible for launching nodes, and at least at first, driving image build workflow
19:40:57 though maybe, since nodepool goes away long term, not having things named nlXY is preferred
19:41:30 clarkb: we... could, i think? but unless we're very constrained, i think it would be good to have a new host for simplicity
19:41:56 I don't think we're constrained. Was more thinking it might speed up the conversion process to not launch new nodes too (though that isn't too big of a deal either)
19:42:13 the process for doing all of this will be to effectively develop this in a shadow mode. so zuul is going to grow a lot of features that are not enabled by default, and are not documented.
19:42:35 and i think we're going to get all the way to the end and have the ability to run both systems in parallel for a while before we call it done.
19:43:04 that sounds reasonable
19:43:18 i'll take care of writing the deployment changes, and doing a lot of the (undocumented) job/workflow construction
19:43:36 thanks!
19:43:46 i will be leaning on other folks to help with the image build jobs themselves, because i am not as expert in that as others are :)
19:44:06 sounds like a plan
19:44:27 * tonyb is far from an expert but is happy to "drive" the opendev side of the image-building
19:44:54 tonyb: ++ thanks! and you will be soon! :)
19:45:04 anything else?
19:45:07 \o/
19:45:12 i think you can expect to see some deployment changes relatively soon....
19:45:50 oh, and if anyone asks, i don't think opendev running zuul-launcher should be seen as a sign it's time for other folks to do so
19:46:10 probably want to point people at zuul release notes for signal that they are expected to migrate?
19:46:37 obviously anyone is welcome to -- but for clarity, sometimes we point to ourselves as an example of how zuul should be run; but in this case i consider this more like part of the development process, and it's not a signal of maturity or production readiness.
19:46:40 just want to be clear on that
19:46:54 clarkb: exactly -- documentation and release notes will be a much better signal for that
19:47:09 Good to be clear
19:47:14 i think that's about it; thanks
19:47:17 thanks corvus
19:47:40 that is also how we handled the switch from jenkins to ansible builders in zuul 2.5.x
19:47:46 #topic Collating Backlog Items From the Group
19:48:02 tonyb: you added this item; do you want to drive or should I from the notes? (I know it is early for you)
19:48:10 I can
19:48:19 go for it
19:48:24 It's kind of a discussion point
19:48:50 As a group we're somewhat overcommitted and we all have a list (physical or not) of things to do
19:49:09 i have a to do item to find where i put my to do list
19:49:22 I was thinking of coming up with a lightweight way to track these things
19:50:02 we've used etherpads fairly well for scratch coordination in the past
19:50:15 My initial idea was to have a bot (#noteit) that we could use to add a topic and link to the IRC logs where an item was discussed
19:50:25 one idea I had was that maybe one of the super simple web kanban board things that have sprouted up could be worth trying. Downside to that is who knows how long those services will stay up and whether or not data is exportable
19:50:38 but https://gokanban.io/ for example is like etherpad for kanban I think
19:51:20 tonyb: I do like the idea of linking back to the irc logs for greater context
19:51:31 i like the idea of #noteit so when i'm off doing something for a day or two and come back i can find the important bits in backlog; seems similar to #status log which we might be able to abuse for that purpose too
19:51:33 so much of the context ends up in IRC and we often end up grepping/googling/searching irc logs
19:51:58 ya I'd be willing to try it
19:52:01 Any of those could work; I mainly wanted a non-invasive way to keep track
19:52:22 or add a new #status track command, but sure, #status log and even the others could link back to the irc log url where they were called
19:52:29 (to be clear the kanban thing was an idea in addition to the noteit idea not a replacement. It was just something that came to mind when reading this on the agenda yesterday as I put it together)
19:53:35 sounds like there are no objections if someone has time to implement the feature in the bot
19:53:43 I figured as much; #noteit would be the "quick add" to $whatever and then we'd edit that backend once it's done
19:54:03 makes sense
19:54:05 A related note is the specs repo is .... outdated
19:54:14 yes indeedily-doodily
19:54:38 ya the main issue with the specs repo is, as you note, we're largely overcommitted with the existing stuff so finding time for new things or major changes is difficult and then that reflects back on the specs repo
19:54:56 If I start moving things around in there
19:54:59 but yes I think we should probably declare bankruptcy in the specs repo and carry over a small set of things we know we absolutely want to do
19:55:14 in my tenure as ptl, pre-opendev, i tried to reframe the specs list as a help-wanted board
19:55:16 prometheus and the login improvements come to mind as things to carry over
19:55:16 for example mailman3 can be marked as done right?
19:55:20 since that's more what it is
19:55:24 tonyb: yes that one can be marked done
19:55:29 but yeah, some of those things can be crossed off the list now
19:55:59 tonyb: but ya I think patches to clean that up and maybe reflect the help-wanted-ness of the situation more explicitly would all be good
19:56:55 Okay. I can push some of those and ask for reviews from time-to-time to make sure I'm on the right path
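As a purely hypothetical sketch of the "#noteit" quick-add discussed above (the real thing would more likely be a feature added to the existing status bot), something along these lines would capture notes with enough context to find the surrounding IRC log later. The server, channel, nick, and output path are all made-up examples, and linking directly to the archived log URL is left out since the archive layout isn't assumed here.

```python
#!/usr/bin/env python3
"""Hypothetical "#noteit" backlog catcher.

Watches a channel and appends anything prefixed with "#noteit" to a file,
with a UTC timestamp so the surrounding IRC log can be located afterwards.
Uses the python "irc" library; the server, channel, nick, and output path
below are examples only.
"""
import datetime

import irc.bot


class NoteItBot(irc.bot.SingleServerIRCBot):
    def __init__(self):
        super().__init__([("irc.oftc.net", 6667)], "noteitbot", "noteitbot")
        self.channel = "#opendev"             # example channel
        self.notes_path = "/tmp/noteit.txt"   # example backlog file

    def on_welcome(self, connection, event):
        # Join the channel once the server accepts us.
        connection.join(self.channel)

    def on_pubmsg(self, connection, event):
        text = event.arguments[0]
        if not text.startswith("#noteit"):
            return
        stamp = datetime.datetime.now(datetime.timezone.utc).isoformat(timespec="seconds")
        nick = event.source.nick
        note = text[len("#noteit"):].strip()
        with open(self.notes_path, "a") as f:
            f.write(f"{stamp} {self.channel} <{nick}> {note}\n")
        connection.privmsg(self.channel, f"{nick}: noted")


if __name__ == "__main__":
    NoteItBot().start()
```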
19:57:04 ++ sounds good
19:57:07 thanks!
19:57:11 we have a few more minutes
19:57:15 #topic Open Discussion
19:57:25 anything else that didn't get captured in the agenda that we want to call out quickly?
19:57:30 So in the near term I'll add a new bot or a mode for #status to do the noteit stuff
19:57:49 the lists performance adjustment seems to have worked out, messages are coming through quite a lot faster now
19:58:57 ya I haven't noticed any major lags since the queuing stuff went in
19:59:31 thank you everyone for all your help running OpenDev and your time during the meeting!
19:59:34 Nice. My mail is often laggy anyway so I didn't notice at all :/
19:59:52 We'll be back here next week at our regularly scheduled time, but as always feel free to start discussions on the mailing list or on irc if things are urgent
19:59:59 or if you just want quick feedback; doesn't have to be urgent
20:00:06 #endmeeting