19:01:12 <clarkb> #startmeeting infra
19:01:12 <opendevmeet> Meeting started Tue May 24 19:01:12 2022 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:12 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:12 <opendevmeet> The meeting name has been set to 'infra'
19:01:19 <clarkb> #link https://lists.opendev.org/pipermail/service-discuss/2022-May/000337.html Our Agenda
19:01:23 <ianw> o/
19:01:28 <frickler> \o
19:01:28 <clarkb> #topic Announcements
19:01:41 <clarkb> A reminder that the Summit starts in 2 weeks. Running June 7-9
19:01:55 <clarkb> I think we should go ahead and cancel the meeting on June 7 (I don't anticipate being able to attend)
19:02:16 <fungi> i support that
19:02:23 <frickler> who is going there? I'm not
19:02:37 <clarkb> I think fungi and myself will be there
19:03:02 <fungi> if others want to hold a meeting that tuesday i'm not opposed, i'll just catch up from the meeting log
19:03:22 <clarkb> ya thats fine too. I won't be able to run it
19:04:15 <frickler> maybe we can decide next week?
19:04:19 <clarkb> sure
19:04:57 <clarkb> Also keep in mind the event when making changes to things like etherpad which will be used by the forum
19:05:49 <clarkb> #topic Actions from last meeting
19:05:54 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2022/infra.2022-05-17-19.01.txt minutes from last meeting
19:06:14 <clarkb> This wasn't recorded explicitly as an action but I do recall saying I would get it done this week. wiki.openstack.org has a new cert now. Thank you fungi for installing it
19:06:23 <clarkb> The email spam about that should go away now
19:06:29 <frickler> yay
19:06:35 <fungi> thanks for ordering it
19:07:25 <clarkb> #topic Improving CD throughput
19:07:54 <clarkb> With the summit and everything else going on recently (jammy and afs and pip and so on) this has taken a backseat for me when I meant to work on some related items
19:08:02 <clarkb> Hopefully next week I'll have made progress
19:08:27 <clarkb> Did anyone else have anything to mention on this topic?
19:09:54 <clarkb> Sounds like no. We can continue
19:09:59 <clarkb> #topic Container Maintenance
19:10:12 <clarkb> ianw did end up testing the mariadb upgrade as part of the gerrit 3.5 upgrade
19:10:39 <clarkb> ianw: I left comments on the etherpad but was curious if you tried the container image env var or if you decided not to use it for some reason?
19:10:59 <clarkb> Just trying to figure out if running the upgrade command by hand is preferable to using the automation built into the container image for some reason
19:11:44 <ianw> sorry haven't looped back to that yet.  i can check on the env var way of doing things
19:11:47 <clarkb> either way the process looked straightforward and while we aren't in a rush to do this due to mariadb support periods it is probably good hygiene
19:11:51 <clarkb> ianw: thanks
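For reference while ianw checks, a minimal sketch of the two approaches being compared, assuming the upstream mariadb image's auto-upgrade env var; the container and volume names here are hypothetical, not the real deployment's.

```shell
# Sketch only; container/volume names are hypothetical.

# Option 1: let the image drive the upgrade. The upstream mariadb image can
# run the upgrade step itself at startup when this env var is set:
docker run -d --name gerrit-mariadb \
  -e MARIADB_AUTO_UPGRADE=1 \
  -v mariadb-data:/var/lib/mysql \
  mariadb:10.6

# Option 2: run the upgrade command by hand inside the running container:
docker exec -it gerrit-mariadb mariadb-upgrade -u root -p
```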
19:13:35 <clarkb> #topic Support For Jammy Jellyfish
19:13:46 <clarkb> frickler has made some more progress on spinning up the wheel mirrors for jammy
19:13:57 <clarkb> #link https://review.opendev.org/c/openstack/project-config/+/842841 add jobs to publish to openafs
19:14:12 <clarkb> I think that is the last remaining piece? frickler do we need to create a new openafs volume for that or is that done already?
19:14:37 <frickler> I think I did that, let me check
19:15:13 <clarkb> I guess it is two volumes. One for each architecture
19:15:52 <frickler> seems I created the volumes, but didn't mount them
19:15:55 <ianw> doesn't seem mounted @ http://mirror.iad.rax.opendev.org/wheel/
19:17:13 <clarkb> maybe we review and +2 the change then you can +A when it is ready on the afs side?
19:17:13 <frickler> will need to check that later
19:17:35 <ianw> i can push it through today if you like to get it going
19:18:55 <clarkb> cool I'm sure we'll sort it out. Thank you for getting this together
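For the record, the missing mount step discussed above usually has the shape sketched below; the fileserver, partition, volume, and mount-point names are illustrative placeholders rather than the exact ones in use.

```shell
# Illustrative only; server/partition/volume/path names are placeholders.

# Create a volume per architecture (frickler indicated this part is done):
vos create afs01.example.org a mirror.wheel.jammy-x64 -maxquota 100000000

# Mount it into the wheel tree on the read/write path and open it for reads:
fs mkmount /afs/.openstack.org/mirror/wheel/ubuntu-22.04-x86_64 mirror.wheel.jammy-x64
fs setacl /afs/.openstack.org/mirror/wheel/ubuntu-22.04-x86_64 system:anyuser read

# Release the parent volume so the new mount point appears on the RO replicas:
vos release mirror.wheel
```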
19:19:03 <clarkb> #topic Gerrit 3.5 upgrade planning
19:19:11 <clarkb> #link http://lists.opendev.org/pipermail/service-announce/2022-May/000039.html Scheduled for 20:00 UTC June 19, 2022
19:19:18 <clarkb> the announcement for the upgrade went out.
19:19:22 <clarkb> #link https://etherpad.opendev.org/p/gerrit-upgrade-3.5
19:19:27 <clarkb> ianw also has notes on that etherpad.
19:19:42 <clarkb> I've reviewed them and tried to leave notes/questions/comments for stuff but this seems pretty straightforward
19:20:02 <clarkb> the main thing I think we're missing now is explicit config to enable collision checking. We should be able to land that on 3.4 and gerrit will ignore it until 3.5
19:20:08 <clarkb> ianw: ^ is there a change for that yet?
19:21:27 <ianw> no sorry, there's a couple of those todo items that need changes.  i will do those
19:21:47 <ianw> #link https://etherpad.opendev.org/p/gerrit-upgrade-3.5
19:21:50 <ianw> is what we're talking about
19:22:41 <clarkb> feel free to ping me when they're up. Happy to review that stuff and I tend to have just enough gerrit context paged in to do a decent job :)
19:23:04 <clarkb> Anything else to bring up on the gerrit upgrade?
19:24:34 <clarkb> #topic Move requirements propose-updates job to py3.8 + 3.9
19:24:55 <clarkb> frickler: want to fill us in on this one? I've reviewed at least one change but I think I'm lacking some context
19:26:29 <clarkb> I believe what is going on is openstack is trying to update requirements handling for newer python
19:26:45 <clarkb> and there were issues in the first attempt because all the needed python interpreters weren't installed
19:26:47 <clarkb> That should be fixed now
19:27:21 <clarkb> In the process frickler discovered that enqueueing periodic jobs manually is not straightforward and was asking how to do that properly
19:27:50 <clarkb> I want to say we have to use enqueue-ref of master for periodic jobs but periodic triggers have been weird. corvus is there anything else to know about that?
19:28:28 <corvus> clarkb: i don't know off hand either way, sorry
19:29:36 <clarkb> I'll give it a couple more minutes in case frickler is still around to add anything I missed
19:30:11 <ianw> i have déjà vu on starting periodic jobs
19:30:45 <ianw> http://lists.zuul-ci.org/pipermail/zuul-discuss/2019-May/000909.html
19:30:51 <clarkb> Thinking out loud the script to dump queues is probably a good one to cross reference
19:31:00 <ianw> i don't think i ever really found the right thing
19:31:02 <clarkb> since it will produce enqueue commands for the periodic pipelines
19:31:26 <frickler> the difference seems to be a "commit 00000" added to the manual trigger
19:31:40 <frickler> and then jobs end up in error
19:32:43 <ianw> yeah, i think, and from re-reading that message, you need to set newrev as the HEAD
19:32:56 <clarkb> sounds like this problem deserves more debugging. I wonder if we can use the zuul client testing or a similar setup to get testing going for it and then encode that in docs?
19:34:07 <clarkb> this might be a good topic to resurrect on the zuul mailing list if there continues to be confusion
19:34:12 <frickler> the other thing I noticed: the docs talk about adding the --trigger option
19:34:23 <frickler> but our zuul-client doesn't know that option
19:35:02 <clarkb> the old built in zuul enqueue commands did know trigger but I think it became unnecessary because each pipeline has a specific trigger already
19:35:10 <fungi> i think that must be cruft, i haven't needed to specify it. in theory zuul should just get it from the config
19:35:30 <frickler> o.k., so just a docs update needed then
19:35:34 <clarkb> right so zuul-client just dropped it as unnecessary where it was required with the old zuul command
19:35:35 <corvus> yeah, trigger is dead.  the other stuff certainly warrants more debugging and should be testable within zuul itself.
19:36:29 <frickler> I wanted to test on my local zuul but didn't get to that yet
19:37:28 <ianw> i'm having some more memories, based on http://lists.zuul-ci.org/pipermail/zuul-discuss/2019-May/000914.html
19:37:47 <ianw> which was all rpc client at the time, so things might have changed a bit
19:38:53 <ianw> but there wasn't a way to create an event that looked *exactly* like the timer trigger remotely, as that didn't fill in the oldrev/newrev
19:39:39 <ianw> but iirc setting the newrev value to the current HEAD was enough to make it work
19:39:55 <clarkb> makes sense. And if we have time to add a test for that in zuul somewhere that would be great
19:40:10 <ianw> but i never became convinced enough this was the real solution to update the docs
19:40:11 <frickler> thx for the pointer, I'll test that @home
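To make the workaround concrete for later testing, a hedged sketch of the enqueue invocation with newrev set to the branch HEAD as ianw described; the tenant, project, and pipeline names are illustrative.

```shell
# Sketch of a manual periodic re-enqueue; names are illustrative.
HEAD=$(git ls-remote https://opendev.org/openstack/requirements refs/heads/master | cut -f1)

zuul-client enqueue-ref \
  --tenant openstack \
  --pipeline periodic \
  --project openstack/requirements \
  --ref refs/heads/master \
  --newrev "$HEAD"

# Without --newrev the manually triggered event carries an all-zeros commit and
# the jobs error out, matching what frickler saw; --trigger is no longer needed.
```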
19:41:53 <clarkb> #topic Zuul changing default Ansible to v5
19:42:15 <clarkb> Zuul supports ansible v5 now. Eventually v5 will become the default ansible version and then sometime after that old EOL ansible versions will stop being supported by zuul
19:42:37 <clarkb> What that means is now is the time to start sorting out Ansiblev5 support so that we can hopefully uplift onto that version in opendev for our tenants.
19:43:03 <clarkb> we've found a number of issues doing that uplift so far. In particular the include task is no longer valid. You need to use include_* or import_*
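A quick, hedged way to find the affected spots in a repo (the paths are illustrative):

```shell
# Sketch: locate bare "include:" tasks that Ansible 5 rejects; paths illustrative.
grep -rnE '^\s*-?\s*include:' --include='*.yml' --include='*.yaml' playbooks/ roles/
# Each hit generally needs to become include_tasks:/import_tasks: (or
# include_role:/import_role:/import_playbook: where appropriate).
```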
19:43:40 <clarkb> We've also had issues where ansible running as zuul wants to become another unprivileged user and this fails because ansible can't setfacl or chown/chmod the temporary files it copies to do work on the remote
19:43:52 <clarkb> it is looking like ensuring that setfacl is installed may be the fix for this
19:44:14 <fungi> installed on the job nodes
19:44:21 <corvus> re setfacl: oy ... that's such a weird thing to have changed....
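For context on why the package matters: when becoming an unprivileged user, Ansible tries to grant that user access to its remote temp files with something roughly like the following; the path and username are illustrative.

```shell
# Illustrative only; path and user are made up. Roughly what Ansible attempts
# on the remote node when becoming an unprivileged user:
setfacl -m u:stack:r-x /var/lib/zuul/.ansible/tmp/ansible-tmp-12345/AnsiballZ_setup.py
# If the setfacl binary (package "acl" on most distros) is missing, Ansible 5
# appears to fail here where older releases got by, hence the job failures.
```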
19:45:11 <corvus> i've been working through some stuff for the zuul tenant, mostly around container jobs at this point.  not quite done yet.
19:45:46 <clarkb> ya reading the code diff it isn't clear to me how old ansible worked and new ansible fails if setfacl isn't installed
19:45:57 <corvus> it looks like we'll be dropping support for focal in the ensure-podman role, because there is no longer any place (that i know of) to get podman for focal
19:46:01 <clarkb> but installing it seems to make new ansible happy so that is probably a reasonable thing for us to add to our base images
19:46:12 <corvus> we previously relied on kubic for that, but it's now gone...
19:46:51 <corvus> that may be something to keep in mind for future potential uses of kubic...
19:47:01 <ianw> yeah, i just saw that they have restored that
19:47:08 <ianw> https://github.com/containers/podman/issues/14336
19:47:13 <clarkb> they note that it has known security bugs though
19:47:21 <clarkb> not sure we should set users up with that by default
19:47:23 <ianw> but an old version.  it's definitely a better idea to switch to jammy for that
19:47:29 <corvus> oh neat
19:47:44 <clarkb> (side note, this is why tools like docker end up being so popular. It's the solaris problem all over again... access and simplicity)
19:47:54 <ianw> i guess once we get the wheels sorted we will have a fully operational deathstar^w jammy environment
19:47:59 <corvus> well, the current version of my change is just to throw an error if you use ensure-podman on focal... that may still be the best approach....
19:48:37 <ianw> probably not a bad idea.  i'm not sure there's a lot of users
19:49:02 <clarkb> system-config-run jobs are happy with new ansible
19:49:36 <clarkb> I'm wondering if we should start forcing ansible v5 on in places like that and slowly just keep adding to the list. Or is that problematic because if v6 happens we have to go and update all those explicit entries?
19:50:07 <corvus> i really like keeping it out of job defs because of ^
19:51:14 <fungi> yeah, that just seems like a way of inflicting pain on our future selves
19:51:19 <clarkb> I guess the ideal then is to test what we can, set a date to update to v5 by default (we can do that in opendev by tenant before zuul changes) and then anything that breaks after gets fixed
19:51:20 <corvus> i think in the past, we've gotten like 90% confidence with targeted testing, then switch the default and folks can pin to the old version if something comes up...
19:51:39 <clarkb> ya ok. Maybe end of june is a good target for that?
19:51:44 <corvus> yeah, pin or fix depending on severity :)
19:52:18 <clarkb> (thinking we want it to be long enough for people to give testing a go before we switch and with summit going on that means some time, but also early enough we don't majorly impact say the openstack release cycle)
19:52:43 <corvus> sounds good.  i don't know what zuul's schedule will end up being, but probably not before end of june.  but it doesn't really matter because opendev can set the tenant defaults regardless of what zuul does -- just as long as we communicate and know when it's happening
19:52:53 <clarkb> yup
19:53:19 <clarkb> and for setfacl I think adding that to our base images is reasonable. It's a system utility that helps bootstrap ansible.
19:53:32 <corvus> agree
19:53:33 <clarkb> rather than force everyone to go and install the package themselves if they are using become
19:54:06 <corvus> (also, facls are awesome and should just be installed everywhere anyway)
19:54:36 <frickler> fyi devstack passed with acl installed https://zuul.opendev.org/t/openstack/build/113f568178fc4012b7cb862713a48130
19:55:26 <clarkb> yup I suspect if we update our base images this problem largely goes away
19:55:49 <clarkb> probably infra-package-needs is the thing to modify
19:56:01 <clarkb> I can try to collect all the different setfacl package names later today and push a change for that
19:56:05 <frickler> except maybe for some weird distro ... but we'll see
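As a head start on collecting the package names, the setfacl binary ships in a package called "acl" on the common distros we build; a hedged summary:

```shell
# Hedged summary; setfacl comes from the "acl" package on most distros:
apt-get install -y acl    # Debian, Ubuntu
dnf install -y acl        # Fedora, CentOS Stream
zypper install -y acl     # openSUSE
apk add acl               # Alpine, if relevant
```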
19:56:52 <clarkb> #topic Open Discussion
19:57:05 <clarkb> We are almost out of time so want to give any other business a quick opportunity
19:57:31 <corvus> [i plan to be at the summit]
19:58:37 <clarkb> cool. It will be an interesting experience to do a conference again.
19:59:03 <corvus> interesting is what i'm worried about! :)
19:59:31 * fungi hopes it remains safely boring
20:00:18 <clarkb> and we are at time
20:00:20 <clarkb> thanks everyone
20:00:20 <fungi> thanks clarkb!
20:00:22 <clarkb> #endmeeting