19:04:42 <clarkb> #startmeeting infra
19:04:43 <openstack> Meeting started Tue Apr 27 19:04:42 2021 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:04:44 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:04:46 <openstack> The meeting name has been set to 'infra'
19:05:06 <ianw> o/
19:05:35 <clarkb> #link http://lists.opendev.org/pipermail/service-discuss/2021-April/000227.html Our Agenda
19:05:42 <clarkb> #topic Announcements
19:06:07 <clarkb> Airship and openstack have largely finished up their releases so I think we can stop holding off on things for that (like the zk cluster upgrade)
19:06:30 <clarkb> #topic Actions from last meeting
19:06:35 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-04-13-19.01.txt minutes from last meeting
19:06:54 <clarkb> ianw had an action to push changes up to retire pbx.o.o and fungi had an action to push changes to retire survey.o.o
19:07:01 <clarkb> ianw: fungi: any updates on those items?
19:07:08 <fungi> yeah, i haven't gotten to it yet
19:07:18 <clarkb> #action fungi push changes to retire survey.o.o
19:07:24 <fungi> maybe sometime this week, though i'm trying to not be around the computer much
19:07:30 <ianw> i totally forgot about pbx, sorry
19:07:33 <ianw> on todo list now
19:07:49 <clarkb> #action ianw push changes to retire pbx.o.o
19:07:53 <clarkb> thanks!
19:07:59 <clarkb> #topic Priority Efforts
19:08:03 <clarkb> #topic OpenDev
19:08:18 <clarkb> I've done more account cleanups. We are down to ~250 iirc with conflicts now
19:08:44 <fungi> excellent!
19:08:45 <clarkb> I also put together a list that is probably a bit more "dangerous" but if we start by disabling the accounts and waiting a week or two we can probably rule out issues that way
19:09:26 <clarkb> ~clarkb/gerrit_user_cleanups/notes/proposed-cleanups.20210416 has that list if you want to look it over
19:09:27 <fungi> sounds reasonable
19:09:42 <clarkb> ~clarkb/gerrit_user_cleanups/notes/audit-results.yaml.20210415 is the info that was used to produce that list
19:09:43 <fungi> how many for that batch?
19:09:54 <fungi> is that the remaining 250 or a subset?
19:09:59 <clarkb> ~180, it is a subset
19:10:29 <clarkb> there are some folks like donnyd and stackalytics who show up and I do want to reach out to them by email once we get the list down to a manageable set
19:11:18 <clarkb> a good chunk of this group are accounts that haven't been used in many years
19:11:25 <clarkb> I think I used ~2018 as a rough cut off
19:12:22 <fungi> so ~70 which will need an even higher level of care, got it
19:13:07 <clarkb> but ya if y'all can skim it and flag anything that stands out as a bad idea, I can go through it again
19:13:12 <clarkb> otherwise will try to proceed with it
19:13:38 <clarkb> In other gerrit news we upgraded to 3.2.8 and incorporated fungi's jeepyb improvements into our image
19:13:56 <clarkb> ianw noticed that gerrit lost the account lock again :/ it seems to do that after we update then we restart and it doesn't seem to happen much after
19:14:14 <clarkb> anyway we'll continue to keep an eye on it. Earlier today when I checked lslocks showed gerrit still had the lock
19:14:29 <clarkb> Any other OpenDev related discussion?
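(An aside on the lock check mentioned above: lslocks, from util-linux, lists active file locks by process, so verifying that Gerrit still holds its lock could look something like the sketch below; the grep pattern is a guess, since the exact lock file path isn't given here.)

    # list file locks and look for ones held by the gerrit java process;
    # the filter pattern is illustrative, not the exact one used
    sudo lslocks | grep -i -e java -e gerrit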
19:15:02 <clarkb> #topic Update Config Management
19:15:25 <clarkb> Between ptg and adding inmotion cloud and now zookeeper I've been pretty busy doing other things. Anyone have config management updates to call out?
19:16:20 <fungi> none i can recall
19:17:06 <ianw> not really, other things popping up
19:17:18 <clarkb> I have been meaning to follow up with the gerrit sql stuff but we can do that out of the meeting
19:17:21 <clarkb> #topic General Topics
19:17:31 <clarkb> #topic Server Upgrades
19:18:05 <clarkb> The zk cluster upgrade is happening as we speak. One of three hosts has been swapped out so far. Waiting on some changes to apply on bridge before starting on the second
19:18:10 <clarkb> #link https://etherpad.opendev.org/p/opendev-zookeeper-upgrade-2021 tracking progress there
19:18:35 <clarkb> had one small issue that needed fixing, but otherwise seems to be working about how I expected it to
19:19:46 <clarkb> Once zk is done I'll be looking at the zuul scheduler. I think the rough idea there is have ansible configure a new zuul02.opendev.org host, get everything LE'd and in place, copy the keys from old to new, prime repos, then schedule a cutover
19:19:57 <clarkb> My read of the ansible is that this should work because we don't start zuul automatically
19:20:12 <clarkb> but I don't want to get too ahead of myself while in the middle of the zk upgrade
19:20:22 <clarkb> #topic survey.openstack.org
19:20:40 <fungi> probably don't need this on the agenda with the action item
19:20:42 <clarkb> I added this to the agenda because we're getting the "your cert will expire" warnings from it. fungi is on the case but no progress
19:20:59 <clarkb> ya it was on the agenda so just quickly covering it, but agreed we can move on
19:21:10 <fungi> yeah, well i mean, if it stops serving a valid cert, it's as good as down ;)
19:21:12 <clarkb> #topic Debian Bullseye Images
19:21:51 <clarkb> I just wanted to call out the odd situation these are in. Bullseye has not released yet, but we're under some pressure to provide images for bullseye because openstack (specifically nova) dropped support for buster
19:22:16 <clarkb> We've run into at least one issue related to "bullseye hasn't been released yet" causing problems with ansible facts that ianw has a workaround in dib for
19:22:34 <fungi> that hasn't been approved yet though
19:22:35 <clarkb> the problem with that is we've had some persistent nova 500 errors when doing end to end functional testing of the dib changes
19:23:06 <ianw> yeah it looks like that got a +1 from zuul, but it took a few rounds
19:23:07 <clarkb> I've got a change up to nodepool to collect openstack logs to help us debug these problems but you have to depends-on that change from dib to get the logs. I don't think nodepool is interested in collecting all of those devstack logs
19:23:07 <fungi> yep, that's also been hampering another bullseye-related fix in dib
19:23:29 <clarkb> if we want we can split the depends-on for the nodepool change in dib out to another change and then try and land the dib stuff as is
19:23:35 <fungi> (the one to correct the security mirror path)
19:24:14 <clarkb> once zk things settle down I can look into that again, but I'm happy if others want to try and work past it for now
19:25:23 <clarkb> that was all I had on this, wanted to record why bullseye is important before it even releases
19:25:28 <clarkb> anything else to add?
19:25:56 <ianw> not really, we can keep rechecking, but hopefully we can just get the logs thing in to help debug
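(For context, the depends-on mechanism discussed above is Zuul's cross-project dependency feature: a Depends-On: footer in the git commit message pointing at another Gerrit change, here the nodepool logging change ianw links just below. A rough illustration of wiring it up from the dib side:)

    # in the diskimage-builder checkout, amend the commit under review
    # and add a footer line to its commit message, e.g.:
    #
    #   Depends-On: https://review.opendev.org/c/zuul/nodepool/+/788028
    #
    git commit --amend
    git review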
19:26:05 <fungi> which change was that again?
19:26:14 <ianw> https://review.opendev.org/c/zuul/nodepool/+/788028
19:26:17 <ianw> #link https://review.opendev.org/c/zuul/nodepool/+/788028
19:26:46 <fungi> aha, thanks
19:26:49 <clarkb> but I think corvus and others have expressed an opinion that they don't want those logs in there because we aren't testing devstack/openstack
19:27:05 <clarkb> (if it were me, having the debug info there makes sense since it does seem to cause a nonzero number of failures)
19:27:38 <clarkb> maybe bring it up for discussion in #zuul
19:28:54 <clarkb> #topic Minor git-review release to support --no-thin
19:29:14 <clarkb> I put this on the agenda because this is a new feature we added to git-review that users aren't likely to know they want until they ask us for help
19:29:41 <clarkb> if individuals come to us with problems pushing to gerrit related to missing trees/objects in packfiles you can ask them to update git-review (if necessary) then do git review --no-thin
19:29:45 <clarkb> that should work around the problem
19:30:33 <fungi> 2.1.0 is the minimum version needed to get that option
19:31:02 <fungi> (noted for posterity)
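(For reference, the workaround described above amounts to the following; the pip invocation is just one way to get a new enough git-review, and the --user flag is an assumption about the user's setup:)

    # upgrade git-review to at least 2.1.0, then retry the push
    # without the thin-pack optimization
    pip install --user --upgrade 'git-review>=2.1.0'
    git review --no-thin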
19:31:47 <clarkb> #topic openEuler patches
19:32:36 <clarkb> linaro is asking us to get openEuler test instances running. They discussed this with us at the PTG. I don't have any major concerns other than ensuring we don't become a de facto official mirror because we're pulling from the root with some magic rsync key
19:33:02 <clarkb> ianw: I don't think we need to get the TC involved. We can provide testing platforms in opendev that go beyond what openstack wants to test on
19:33:19 <ianw> ok, the first such change was
19:33:22 <ianw> #link https://review.opendev.org/c/opendev/system-config/+/784874
19:33:45 <clarkb> ya that's the one at the PTG that I mentioned we should pull anonymously and from another mirror (not the root) if necessary
19:33:58 <clarkb> so that we aren't any more special than anyone else pulling the packages
19:34:24 <clarkb> (they don't seem to have any mirrors outside of china currently so I want to avoid giving the impression that our mirrors are official)
19:34:48 <clarkb> ianw: at the PTG they agreed they would look for a different mirror to pull from that allows anonymous access (they thought at least one of the other mirrors may do that)
19:35:11 <ianw> ok; i just received a couple of personal pings, so while i'm happy for this to go ahead i just didn't want to be the de facto owner of this
19:35:49 <clarkb> ianw: definitely push them to the channel (#opendev) and the mailing list so that it doesn't become something you alone are negotiating
19:36:45 <ianw> ++, thanks
19:36:55 <clarkb> ianw: I also suggested that we ensure the functional end to end testing for the dib changes is working before we try to land those
19:37:02 <fungi> at a minimum, we need people with a vested interest in the platform hanging out in irc or reading our mailing lists in case we need something fixed
19:37:06 <clarkb> I think they are currently non voting so the changes get +1 but the tests for openeuler don't actually work
19:37:16 <clarkb> they thought this was due to the lack of a mirror (I haven't looked at the failures yet myself)
19:37:32 <clarkb> fungi: ya I also meant to check if it is even localized into english?
19:37:40 <clarkb> I assume it is as a fork of centos/rhel but who knows :)
19:37:52 <ianw> yeah, i think the other issue is that we haven't figured out a way to really do boot tests of arm64
19:37:52 <clarkb> but that didn't occur to me until later
19:38:13 <ianw> qemu binary translation, at least in linaro, is just too slow
19:38:23 <fungi> it was supposedly a fork of centos, i couldn't find any information on what they're doing wrt stream though
19:39:16 <ianw> it can get a cirros instance up in devstack but trying to boot a dib image like we do in the devstack+nodepool tests i could barely do after leaving it for literally hours
19:39:26 <fungi> why would we need qemu binary translation in linaro? figured we'd have to use it only when the architecture differs
19:40:08 <clarkb> fungi: no nested virt on arm64
19:40:14 <fungi> or is the problem that we want to test booting amd64 cirros images on arm64 devstack?
19:40:17 <ianw> this is for the nested case. so we build the arm64 image on the native host, then try to boot it
19:40:48 <ianw> i mean, in theory, we could make a multi-node devstack, have a compute node separate and boot our image on that
19:40:57 <fungi> and qemu needs binary translation to boot arm64 on arm64?
19:41:08 <ianw> which is probably the solution, but also a lot of work
19:42:36 <ianw> fungi: there's no nested virt, at least on linaro, so yeah it's going old-school. i haven't checked on osu
19:42:59 <clarkb> ianw: I think linaro said there isn't any nested virt support in kvm yet for arm64
19:43:23 <clarkb> fungi: basically if you don't have nested virt then it's always binary translation regardless of the targets
19:44:44 <fungi> interesting, i thought that was the point of paravirtualization vs not
19:45:15 <clarkb> fungi: there are optimizations depending on whether or not you have a common arch but in general you're still hitting expensive paths
19:45:16 <ianw> i think there is, there is some flag it says during boot, but as for end-to-end plumbing of everything from kvm/qemu -> nova -> userspace, i don't know
19:45:26 <clarkb> ianw: ah interesting
19:47:53 <clarkb> anything else on the openeuler topic?
19:48:06 <ianw> nope, thanks
19:49:33 <clarkb> #topic Open Discussion
19:49:59 <clarkb> That was everything on the agenda. I'm currently working through the zk upgrade as zk05 was added to the cluster a few minutes earlier than I expected (it's fine)
19:50:25 <clarkb> was there anything else to cover? Otherwise I'm going to dig into zk again
19:50:48 <fungi> as mentioned i'm not really around this week if i can help it, but will still make time if there's something urgent
19:52:12 <clarkb> enjoy the break
19:52:21 <fungi> oh, one thing not yet resolved... ansible fact gathering crashes the python interpreter for centos 8 on our arm64 nodes, i have one held, going to try to wrestle a core dump out of it but not sure how much that's going to tell us
19:52:21 <clarkb> sounds like that may be it so I'll stop the meeting here.
19:52:23 <clarkb> Thank you everyone
19:52:26 <clarkb> #endmeeting
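(A post-meeting aside on the crash fungi mentioned: one plausible way to wrestle a core dump out of the held node is sketched below; whether the dump is captured by systemd-coredump or left as a bare core file on that image is an assumption.)

    # on the held centos 8 arm64 node: allow core files, reproduce the
    # crash by running fact gathering locally, then inspect the dump
    ulimit -c unlimited
    ansible -i 'localhost,' -c local -m setup localhost
    coredumpctl gdb   # if systemd-coredump captured it
    # or: gdb /usr/libexec/platform-python core.<pid>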