19:00:20 <clarkb> #startmeeting infra
19:00:20 <opendevmeet> Meeting started Tue Mar  5 19:00:20 2024 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:20 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:20 <opendevmeet> The meeting name has been set to 'infra'
19:00:27 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/UG2JFEL6XFFLDT5UYDHCBYNAJF72XXHZ/ Our Agenda
19:00:43 <clarkb> #topic Announcements
19:00:53 <fungi> hold bowl with one hand, chopsticks with second hand, type with third hand
19:01:33 <clarkb> small note that I'll be AFK through a good chunk of tomorrow. Taking advantage of a morning matinee and kids being in school to see Dune
19:02:16 <clarkb> #topic Server Upgrades
19:02:27 <clarkb> I haven't seen any new movement on this
19:02:55 <tonyb> nope.  I'll address the review feedback and boot the new servers today
19:03:04 <clarkb> Worth calling out that the announced rackspace mfa switch may impact our ability to run launch node. I've got notes to discuss that further at the tail end of the meeting
19:03:12 <clarkb> tonyb: ah if you boot today you should be fine
19:03:24 <clarkb> #topic MariaDB Upgrades
19:03:56 <clarkb> The paste db upgrade went as expected. It seems to have only touched system tables, and it backed those tables up first; that backup is less than 1MB, so it's reasonable to keep having the process do it
19:04:04 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/910999 Upgrade refstack mariadb to 10.11
19:04:09 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/911000 Upgrade etherpad mariadb to 10.11
19:05:03 <clarkb> I went ahead and pushed these two changes to upgrade refstack and etherpad's backing databases. I did have to make a small change to etherpad's test cases because the log output from 10.11 was updated to say mariadb is ready instead of mysql is ready
19:06:31 <clarkb> reviews welcome as well as any feedback on whether we're comfortable with docker-compose kicking the upgrade off automatically or if we'd prefer manual intervention for up to the minute backups
19:07:10 <clarkb> after these two gerrit, gitea, and mailman 3 are the remaining dbs that need upgrades. I'll try to continue to step through them
19:07:57 <clarkb> #topic AFS Mirror cleanups
19:08:14 <clarkb> OpenSUSE Leap and Debian Buster have been removed from afs mirroring as well as nodepool
19:08:30 <clarkb> Next up is CentOS 7, for which we've got some changes in progress under topic:drop-centos-7
19:09:04 <clarkb> I did realize that CentOS 7 had/has far more reach than the other two, so I decided to announce a removal date of March 15 in order to minimize impact to the openstack release process
19:09:24 <clarkb> the impact should still be minimal but there were enough places that centos 7 was still showing up that I didn't want to just blaze ahead like I did with the others
19:10:02 <clarkb> we're currently cleaning up project configs then late this week early next week I'll drop zuul-jobs testing of centos 7 and remove wheel caching for centos 7
19:10:13 <fungi> the custom nodeset definition in devstack is nearly done merging backports across 8 active branches
19:10:27 <clarkb> then we can do the actual nodeset and nodepool removal on the 15th and once that is done clean up afs
19:10:38 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/906013 Improve DKMS for CentOS OpenAFS testing/packaging
19:10:58 <fungi> though i expect whichever tries merging last to fail with errors we can then use to see what old branches of other projects are using the devstack nodeset
19:11:04 <clarkb> this change isn't directly related to the cleanup but involves centos and afs and I think will make it easier to understand failures with dkms on the platform
19:11:12 <clarkb> fungi: ya
19:11:25 <clarkb> fungi: keystone for example
19:12:03 <clarkb> slow but steady progress. And we've already freed up like 400GB of openafs consumption
19:12:20 <fungi> this cleanup effort is likely to be pretty vast, since copies of bits like custom nodesets and jobs are declared across many, many branches and only the last removal will actually tell you what was using it
19:12:33 <clarkb> I will note that last friday when I tried to clean up the buster mirror content afs01.dfw.openstack.org lost a "disk" and everything went sideways
19:12:56 <clarkb> it isn't clear to me if this was due to deleting a few hundred gigabytes of data or just coincidence
19:13:13 <clarkb> something we should be aware of when making other large changes to openafs. Rax addressed it quickly at least
19:13:40 <fungi> yeah, and other than retrying some vos releases there wasn't any lasting impact
19:13:40 <clarkb> fungi: yes, I mentioned it elsewhere but we really need openstack to clean up old stuff early in the branching process instead of at eol time
19:14:01 <clarkb> because we're ending up with ancient configs that make no sense in modern openstack that continue to be carried forward release after release, increasing the cleanup time/cost
19:14:34 <fungi> i think people define shared resources in branched repos without considering how zuul uses them
19:15:07 <fungi> and not realizing that even if you delete something out of your master branch, other projects will just keep using it from a branch 5 releases ago
19:15:09 <clarkb> Another issue that I ran into was that openafs doesn't load on debian bookworm arm64.
19:15:11 <clarkb> #link https://gerrit.openafs.org/#/c/15668/ Fix for openafs on arm with newer gcc
19:15:58 <clarkb> Once upstream merges this fix for it I'll submit a bug to debian to see if we can get that fixed (it doesn't work at all so should be a good candidate for a fixup)
19:16:44 <clarkb> Once we've chipped enough of this old stuff out we can add in new things :)
19:16:59 <clarkb> if anyone wants to get a headstart on that, a new dib job to start building Ubuntu 24.04 might be helpful
19:17:30 <frickler> is it coincidence that openafs is using gerrit and we are using openafs? /me just notices this
19:17:46 <clarkb> frickler: yes I think it is
19:19:15 <clarkb> #topic OpenDev Email Hosting
19:19:32 <clarkb> Don't think we have anything new to mention on this. But kept it on the agenda in case we had any stronger opinions
19:19:44 * clarkb will give everyone a couple minutes to chime in if so. Otherwise we can continue on
19:20:22 <frickler> I'd be fine with dropping it from the agenda and reviving once we consider it to be more urgent again
19:20:58 <clarkb> wfm I can do that
19:21:33 <clarkb> #topic Project Renames
19:21:53 <clarkb> This is mostly a reminder that we're planning to do renames after the openstack release on April 19
19:21:59 <clarkb> we can adjust this timing as necessary
19:22:08 <clarkb> so please say something if that timing is especially bad for some reason
19:22:50 <frickler> the release should happen earlier, the date is after the PTG
19:23:06 <clarkb> correct. Its basically release then ptg then the 19th
19:23:21 <clarkb> we didn't want to conflict with the ptg or the release so we're doing it late
19:23:55 <clarkb> which is a good lead into our next topic
19:24:03 <clarkb> #topic PTG Planning
19:24:09 <clarkb> #link https://ptg.opendev.org/ptg.html
19:24:26 <clarkb> I was hoping this schedule would be a bit more filled in before picking times but it is very empty
19:24:47 <clarkb> rather than wait for others to fill in I think we can go ahead and grab some time.
19:24:56 <clarkb> Something like Wednesday 0400-0600 and Thursday 1400-1600 UTC. Gives enough time between blocks to catch up on sleep.
19:25:49 <clarkb> monday and tuesday tend to be busy so I'm trying to accommodate that
19:26:22 <frickler> +1
19:26:34 <tonyb> Works for me.  I admit I'll only be attending APAC friendly meetings
19:26:38 <fungi> ever since the ptg organizers stopped trying to pre-schedule times for all registered teams, many teams tend to wait until the last week to book any slots
19:26:59 <tonyb> I was thinking of dropping into the openeuler session
19:27:11 <clarkb> tonyb: that doesn't conflict with the times I proposed does it?
19:27:28 <clarkb> no it is on friday so we're good there
19:27:43 <frickler> I wasn't even aware that the scheduling is already happening. seems it is only announced to PTLs/session leaders?
19:28:02 <tonyb> I don't think so.  the one I saw was Friday
19:28:05 <clarkb> frickler: yes emails did go out to the session leaders. Not sure if emails went out more broadly.
19:28:28 <clarkb> I can make a note that we may need to communicate this more widely
19:28:54 <tonyb> I think it only goes to session leaders
19:31:07 <clarkb> anyway I'll get us signed up for those two blocks later today
19:31:17 <clarkb> #topic Rax MFA Requirement
19:31:17 <fungi> sounds good
19:31:44 <clarkb> fungi received email today announcing that rax will require MFA for authenticating starting march 26, 2024
19:32:06 <fungi> they've also added a similar notice on the login page for their portal
19:32:06 <clarkb> enabling MFA breaks normal openstack api auth. We have to either use a rax api key or bearer token
19:32:35 <clarkb> this means all of our automation is impacted.
19:33:08 <clarkb> Since bearer tokens expire (relatively quickly too) we've decided to investigate using the api_key method. To do this we need to install rackspaceauth as a keystoneauth1 plugin in all the places we use the api
19:33:18 <clarkb> then we need to use the api key value instead of regular user auth
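19:33:25 <clarkb> for the logs, something along these lines is what I mean (a hypothetical clouds.yaml sketch, not our actual config; the plugin name "rackspace_apikey" and the option spellings are assumptions, so check the rackspaceauth docs for the exact values):

```yaml
# Hypothetical clouds.yaml entry using an api key instead of a password.
# Assumes the rackspaceauth package registers a keystoneauth1 auth plugin
# named "rackspace_apikey"; all names and values here are illustrative.
clouds:
  rax-example:
    auth_type: rackspace_apikey
    auth:
      auth_url: https://identity.api.rackspacecloud.com/v2.0/
      username: example-user
      api_key: example-api-key
    region_name: DFW
```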
19:33:51 <clarkb> the rough plan here is to test this with nodepool using a single region to start, that way we can check that launcher and builder operations work (or don't)
19:34:16 <frickler> do we know the lifetime for those api keys?
19:34:27 <clarkb> then when that works we can switch all rax nodepool providers over to the new system and update our control plane management to use the same api-key stuff. Then we can opt in to MFA when ready
19:34:44 <clarkb> fungi: ^ do you know the answer to frickler's question? You were testing this with your personal account any indication of a lifetime?
19:34:45 <fungi> frickler: i generated one for my personal rackspace account years ago and it's never changed
19:35:12 <fungi> from what i can tell it only changes if you click the "reset" button next to it in the account settings
19:35:32 <clarkb> If you'd like to help with reviews or pitch in pushing changes we're using topic:rackspace-mfa
19:35:48 <tonyb> I'm not really seeing how that helps with security at all?
19:35:59 <fungi> tonyb: it helps with security theater
19:36:10 <fungi> if you force people to make changes then you can't say you didn't do anything
19:36:24 <clarkb> fungi: for the system-config change we need to put new secrets in private vars. Is that done yet?
19:36:38 <clarkb> thinking about our next steps and I think it is roughly add the new private vars, land the system-config change, then update nodepool config
19:36:46 <fungi> yes, i left a comment on the change saying i did it too
19:37:28 <clarkb> then we can either land the nodepool change or try it out of the intermediate registry. Pull from the intermediate registry will only work for the launcher image I think since the builder is multiarch and docker isn't able to negotiate multiarch images out of the intermediate registry currently :/
19:37:31 <clarkb> fungi: thanks!
19:38:06 <clarkb> fungi: we should be able to push up a project-config update with a depends on system-config too if we haven't yet
19:38:16 <clarkb> but I think that is where we're at until a couple of things merge
19:38:17 <corvus> i think if it works for launcher that's good enough to land the nodepool change
19:38:50 <clarkb> as an alternative we can manually install the lib into the image if the launcher is multiarch too and can't be fetched out of testing
19:38:52 <fungi> clarkb: what needs changing in project-config? i can do that
19:38:59 <corvus> (i don't think we need to prove it works to land the nodepool change; it's pretty simple.  but still, it'd be nice to avoid churn or errors there since there's no real way to test it other than in prod)
19:39:23 <clarkb> fungi: we have to update the nodepool/nl01.opendev.org and nodepool/nodepool.yaml files to force one of the three rax providers to use your newly defined clouds.yaml entries
19:39:32 <fungi> oh, right that
19:39:34 <clarkb> corvus: ++
19:39:37 <fungi> yeah i'll get that proposed
19:39:51 <fungi> though probably not until after 21z
19:39:51 <clarkb> fungi: I would pick the rax region with the lowest capacity to reduce impact if it doesn't work
19:39:58 <fungi> good idea
19:40:38 <fungi> we have three weeks to get this working, which seems like plenty, but if we run into problems that time can disappear on us very quickly
19:41:03 <clarkb> agreed best to get as much info as we can as early as possible then adjust our plan as necessary
19:41:55 <frickler> what about log uploads, are these also affected or not? the earlier discussion in #opendev didn't seem conclusive to me
19:42:18 <fungi> we use swift account credentials for that, not keystone
19:42:25 <fungi> as i understand it
19:42:45 <fungi> those are separate accounts defined in swift itself and scoped to specific swift acls
19:42:46 <clarkb> ya so I don't think they will be affected but we should double check on that (check that we are using special creds and check that they aren't affected though i'm not sure how we do this second thing)
19:43:00 <clarkb> corvus: you may recall the details as I think you set that up?
19:43:57 <fungi> we can also, worst case, fall back to only uploading to ovh in the interim while we work it out
19:44:05 <clarkb> not ideal but ya that would work
19:44:41 <clarkb> as far as actual MFA implementation goes their docs refer to phone authenticator apps. Typically this means they are doing totp so we should be able to do that here as well
19:44:57 <clarkb> similar to how some of our other accounts have done totp
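19:45:10 <clarkb> for anyone unfamiliar, totp is simple enough to sketch with just the python stdlib. This is an illustrative RFC 6238/4226 implementation, not what any of our account tooling actually uses:

```python
import base64
import hmac
import struct
import time


def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP: HMAC-SHA1 over the counter, dynamically truncated."""
    mac = hmac.new(key, struct.pack(">Q", counter), "sha1").digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret_b32: str, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: HOTP over the current 30-second time window."""
    key = base64.b32decode(secret_b32.upper())
    return hotp(key, int(time.time()) // step, digits)
```

19:45:20 <clarkb> the provider and the authenticator app both derive the same 6-digit code from the shared secret and the current time, which is why clock skew matters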
19:46:12 <clarkb> Still a lot of unknowns for now but we've got a plan to learn more. Next week we can catch up and make sure there aren't any glaring issues we need to address
19:46:16 <clarkb> #topic Open Discussion
19:46:21 <clarkb> Anything else before we end the meeting?
19:47:00 <corvus> clarkb: i don't recall the details....
19:47:21 <clarkb> corvus: ack we should be able to log in to the swift stuff and check and/or look at our secrets in zuul
19:47:35 <corvus> yeah, probably worth looking into ahead of time
19:47:41 <corvus> because i agree, something is different about it
19:48:45 <clarkb> openstack is starting to get into release mode. Keep that in mind when making changes
19:48:57 <clarkb> and thats about all I had
19:51:50 <clarkb> sounds like that is everything for today. Thank you everyone for your time and effort operating and improving opendev
19:51:55 <clarkb> #endmeeting