19:01:16 <ianw> #startmeeting infra
19:01:17 <openstack> Meeting started Tue Mar 23 19:01:16 2021 UTC and is due to finish in 60 minutes. The chair is ianw. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:18 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:20 <openstack> The meeting name has been set to 'infra'
19:01:28 <ianw> #topic Annoucements
19:01:54 <fungi> this just in: clark takes a week off
19:02:15 <ianw> i also spelt that wrong
19:02:33 <fungi> nothing wrong with an annoucement or two
19:02:37 <ianw> anyway, yes, no other global announcements
19:02:49 <ianw> #topic Actions from last meeting
19:03:01 <ianw> #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-03-16-19.01.html minutes from last meeting
19:03:16 <ianw> we didn't seem to have any particular action items
19:03:42 <ianw> #topic Specs approval
19:04:09 <fungi> is the gerrit server replacement spec ready for consideration?
19:04:45 <ianw> not quite, i was going to start up a new server and then fill in some things from info from that
19:05:16 <fungi> cool. i'm good with what's there so far anyway
19:05:37 <ianw> #topic Priority Efforts
19:05:55 <ianw> #topic Update Config Management
19:06:08 <ianw> I think we will cover the active parts of this in other topics
19:06:33 <ianw> #topic Opendev
19:06:47 <ianw> the main work here is the Gerrit account inconsistencies
19:07:08 <ianw> this is really being driven by clarkb, but maybe fungi you have an update?
19:07:18 <fungi> nothing new this week, no
19:07:35 <fungi> in some belated afs news, the debian 10.9 stable point release will include the awaited openafs fix
19:07:56 <fungi> so we should be able to simplify our buster image builds next week
19:08:02 <fungi> release is scheduled for saturday
19:08:48 <ianw> cool. executors rely on AFS from outside the container though, right?
19:09:21 <fungi> yes, in our case i believe so
19:09:33 <fungi> but this was for testing it
19:09:42 <fungi> where we added the temporary workaround
19:10:04 <fungi> or at least that's the only lingering workaround i remember
19:10:38 <ianw> ++
19:10:51 <fungi> also i've just about hammered out getting zuul-jobs working with our gentoo images again, thanks to prometheanfire's help
19:12:09 <ianw> yeah i saw something fly by -- i feel like gentoo images are currently not building
19:12:25 <fungi> oh, again? i'll check that too
19:12:26 <ianw> that was something to do with iscsi and newer gcc
19:12:30 <fungi> they were working a few days ago
19:12:41 <fungi> ahh, right, that. i think he had a fix happening upstream there
19:13:21 <fungi> and for simplifying our gerrit all-projects acl, i looked into repurposing the openstack/openstack acl to contain the openstack release management bits, but ultimately determined that was a non-starter due to an exclusive setting in one section. so i've tentatively settled on making "openstack/meta-config" as the empty project for other openstack projects to inherit, but am not thrilled with the name
19:13:23 <fungi> (especially considering we may want to recommend this model to other namespaces)
19:13:30 <ianw> ok, i've noticed failures on some glean changes i've pushed, will have to look closer
19:14:29 <fungi> i'll push the change up for openstack/meta-config later today, folks can follow up there if they have good name suggestions
19:14:58 <ianw> ok, this is for release managers to remove old branches?
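[editor's note: a minimal sketch of the inheritance model fungi describes above. The group names and permission lines are illustrative assumptions, not taken from the actual opendev ACLs; in Gerrit, an empty parent project like the proposed "openstack/meta-config" would carry shared access rules in the project.config on its refs/meta/config branch, and child projects would point at it via their own access.inheritFrom setting.]

```ini
# Hypothetical project.config for the proposed "openstack/meta-config"
# parent (stored on its refs/meta/config branch). Group names below are
# made up for illustration.
[access "refs/tags/*"]
    createSignedTag = group openstack-release-managers
[access "refs/heads/*"]
    delete = group openstack-release-managers
```

[A child project would then carry `[access] inheritFrom = openstack/meta-config` in its own project.config, so release-manager rights apply only within the openstack namespace rather than globally via All-Projects.]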
19:15:59 <fungi> well, more generally, to get openstack release manager permissions out of our global config and into an openstack-only acl
19:16:22 <fungi> so that, e.g., openstack release managers can't accidentally push tags for airship
19:16:42 <ianw> ahh, right, got it
19:17:14 <fungi> but yes also so that they can't accidentally delete another project's branches
19:18:40 <ianw> there's also a note in the agenda about configuration tuning
19:18:53 <fungi> ahh, yup
19:18:58 <ianw> i'm not sure we've discussed that previously
19:19:22 <fungi> oh, also on the gerrit theme, i've pushed up some changes to partially restore launchpad in-progress integration
19:19:36 <fungi> as a stopgap until someone has time to write the replacement
19:19:58 <fungi> #link https://review.opendev.org/782538 Stop trying to assign Launchpad bugs
19:20:12 <fungi> #link https://review.opendev.org/782540 Run update-bug on patchset-created again
19:20:29 <ianw> ++ that seems like a good compromise
19:20:37 <fungi> the first one seems to have a job failure, likely bitrot for jeepyb
19:20:59 <fungi> i'll look at it shortly
19:21:22 <fungi> oh, and we're on a new version of zuul (4.1.0) but had to roll back off master temporarily
19:21:45 <fungi> corvus has fixed the bug we rolled back for, and we'll be restarting again on latest master shortly after this meeting
19:22:02 <corvus> and swest fixed the next bug we would have seen which avass found :)
19:22:41 <corvus> (2nd bug only affected github; we probably would have seen it eventually)
19:23:33 <fungi> i also revisited the gerrit upgrade fallout pad and tried to catch it up to current reality
19:23:40 <ianw> ok, will watch out for all that and any new behaviour
19:23:45 <fungi> #link https://etherpad.opendev.org/p/gerrit-3.2-post-upgrade-notes
19:23:49 <corvus> however, i think we had a "pretty good" run on 4.1.0 in that the openstack tenant was fully running with the event queues in zk, with, afaict, no appreciable change in performance or load on zk. so i'm not too worried about the switch back.
19:24:00 <fungi> if there's anything still in there which we've fixed or can stop worrying about, please mark it off the list
19:25:21 <ianw> #link https://grafana.opendev.org/d/5Imot6EMk/zuul-status?orgId=1
19:25:31 <ianw> for anyone who hasn't seen recent updates to add zookeeper stats in there
19:26:26 <fungi> oh, also there's a push to get debian-bullseye images added, starting with package mirroring. i think we'll need to evaluate quota usage on that volume as well as afs01.dfw overall
19:26:57 <ianw> i think i may still owe some cleanups on fedora
19:27:04 <fungi> checking out the volume utilization on our afs stats grafana dashboard, quite a few volumes are almost full, yeah
19:27:48 <ianw> #link https://grafana.opendev.org/d/T5zTt6PGk/afs?orgId=1
19:27:56 <fungi> i suggested seeing if we can drop debian-stretch mirroring, but a number of openstack projects are still testing with it on older stable branches
19:27:58 <ianw> the wheel release stats there are depressing. i'll have to look at that
19:28:30 <fungi> though related, we still have a node label named "debian-stable" aliasing stretch, when buster has been the current stable for a couple of years now
19:29:43 <fungi> we should probably encourage people to reevaluate their use of that, and either update or remove it
19:30:20 <ianw> we do have plenty of disk quota in rax dfw so adding a drive to vicepa might be the simplest thing
19:31:13 <fungi> yeah, though the more cinder volumes we attach the more precarious it becomes, as we saw with the old static.o.o
19:31:44 <fungi> we're basically multiplying the odds of catastrophic failure by the number of volumes
19:31:52 <ianw> or even afs01.dfw, when i rebooted it recently :)
19:32:21 <ianw> one thing i've been meaning to look at too, after that OVH region burnt down, is the redundancy of tarballs in particular
19:32:58 <fungi> in theory we replicate that, and can turn a read-only replica into the new read-write replica
19:33:11 <ianw> it's sort of related to the failure mode; when we have vos release failures and require full releases, we get tied up in days and days of copying
19:33:29 <fungi> yup
19:34:03 <fungi> as for the recent afs01.dfw boot failure, i'm almost certain it's because we created the pv on the raw volume block device and not a partition
19:34:16 <fungi> i have a feeling we could reproduce that if we wanted
19:34:27 <ianw> still, since we moved to running releases via ssh i think things have generally been more reliable
19:34:39 <fungi> yes, that's helped immensely
19:35:23 <ianw> we also spent quite a long time diagnosing and tuning rsync to stop touching every file for some updates, which helped too
19:36:05 <ianw> alright, i think let's move on
19:36:08 <ianw> #topic General Topics
19:36:15 <ianw> #topic Puppet/Ansible rewrites
19:36:35 <ianw> i think the news of the week here was the launchers all switched over to fresh opendev.org versions
19:36:56 <ianw> i think that leaves the zuul scheduler host as the only Xenial system in that ecosystem?
19:37:12 <ianw> executors, mergers, builders and launchers are all done now
19:38:05 <fungi> zk servers?
19:38:05 <ianw> i'm guessing with the pace of zuul development at the moment, we're better off waiting a little to tackle that host
19:38:33 <fungi> yeah, just double-checked, our zk servers are also still xenial
19:38:45 <fungi> we should be able to rolling-replace those live
19:39:12 <fungi> though as corvus observed, doing that will end in zuul only connecting to two out of the three until the next time the zuul services are restarted
19:40:01 <fungi> because it won't automatically redistribute connections, only reconnect as needed
19:40:27 <ianw> i'm willing to help out on that, a good way to become more familiar with zk operations
19:41:11 <ianw> corvus: ^ maybe reach out when it's a better time to consider this, i.e. not pending restarts for bug fix updates :)
19:41:33 <ianw> #link https://etherpad.opendev.org/p/infra-puppet-conversions-and-xenial-upgrades
19:41:41 <ianw> i had a quick pass through that
19:42:36 <fungi> i'm tempted to snapshot the wiki server and try an in-place ubuntu upgrade for now, as repugnant as that idea may be
19:43:22 <fungi> part of why the wiki server isn't listed there is that it's not running xenial. still on trusty :/
19:43:28 <corvus> i think we can upgrade zk any time; it's containerized, so we should already be running a recent release of the software; hopefully an os upgrade won't have too big of an impact
19:44:23 <corvus> ianw: i think if you want to go ahead and stage the patches to do that, it's probably okay to do so more or less any time
19:44:55 <ianw> corvus: ok, i'll take a look and see what i come up with
19:45:15 <ianw> i feel like clarkb might have already written a change to switch to focal in testing at least
19:45:27 <fungi> that does sound familiar
19:45:56 <ianw> one from that list was the asterisk server; i feel like retirement is probably the best idea there
19:46:25 <ianw> do we want a spec, or an announcement, or just changes we can vote on?
19:47:11 <fungi> announcement is probably in order, just in case anyone was using it
19:47:40 <fungi> ideally we'd work out how to move the current dial-in trunk's sip config to meetpad, but that's not absolutely necessary
19:47:52 <ianw> openstack-discuss or just the service list?
19:48:28 <fungi> i'd say service-announce
19:48:45 <ianw> ok, i'll give myself an action item to get that going
19:48:51 <fungi> thanks!
19:49:06 <ianw> #action ianw start retirement for asterisk
19:49:39 <ianw> there's nothing else on that list that is a surprise ... just a bunch of things we know we need to do :) but it is getting smaller
19:50:11 <ianw> #topic Refstack
19:50:22 <ianw> speaking of, i think this is almost ready to be dropped as a topic
19:50:30 <ianw> i have one outstanding bugfix review
19:50:30 <fungi> excellent
19:50:41 <ianw> #link https://review.opendev.org/c/opendev/system-config/+/781593
19:51:39 <ianw> i will put in a todo to clean up the old server in a few months just to be super safe
19:51:54 <ianw> otherwise, i'd say this one is done
19:52:04 <fungi> yay!
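[editor's note: a minimal sketch of the fix implied by the afs01.dfw boot-failure theory above (the LVM physical volume having been created on the raw cinder volume rather than on a partition). The device path, volume group name, and partition bounds are assumptions for illustration; this is a pattern, not the actual opendev procedure, and would need root on the target host.]

```shell
# Assumed cinder volume device; adjust for the real attachment point.
DEV=/dev/xvdb

# Put a partition table on the device first, rather than using the raw
# block device directly for LVM.
parted -s "$DEV" mklabel gpt
parted -s "$DEV" mkpart primary 1MiB 100%

# Create the PV on the partition, not on $DEV itself, so boot-time
# device scanning sees a recognizable partition table.
pvcreate "${DEV}1"

# Extend the (assumed) volume group backing /vicepa with the new PV.
vgextend main "${DEV}1"
```

[The usual rationale is that some tooling and boot-time scanners mishandle a device whose first sectors are LVM metadata instead of a partition table, which matches the reboot symptom discussed above.]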
19:52:12 <ianw> #topic PTG planning
19:52:33 <ianw> last week clarkb put out a call for suggestions on this, did we want dedicated times to talk, or a hackathon, etc
19:52:50 <ianw> tbh i feel like pretty much every day is a hackathon :)
19:53:31 <fungi> yeah, it's more like do we want a hackathon where we're all awake at the same relatively inconvenient time ;)
19:54:59 <ianw> i feel like the requests for times deadline was this thursday?
19:54:59 <fungi> anyway, i gave my loose suggestions last week, don't really have any new ideas personally
19:55:33 <fungi> yeah, maybe i'll double-check the ethercalc and see if he reserved anything
19:56:27 <fungi> amusing side-note, the ptg organizers forgot we run an ethercalc instance and created a spreadsheet on the ethercalc.org site instead, which has been going up and down and returning random errors to people
19:56:49 <ianw> ok, maybe i'll send a mail too. just in case anyone who doesn't hang out in meetings has an interest
19:56:59 <fungi> thanks
19:57:08 <ianw> it would certainly be worth it if we have a dedicated time to help onboard people who are interested, etc.
19:57:30 <ianw> #topic Open Discussion
19:57:35 <fungi> yes, especially new config reviewers
19:57:49 <fungi> but anybody really
19:58:02 <ianw> this is true, it is probably worth reserving a time dedicated for that, see who turns up
19:58:45 <ianw> there's been a bit of work on glean lately if anyone wants to look
19:58:48 <fungi> #link https://ethercalc.net/oz7q0gds9zfi PTG schedule spreadsheet
19:58:56 <fungi> i don't see opendev reserving any slots in there yet
19:59:19 <ianw> basically all open changes. ironic have some requirements there
19:59:49 <fungi> oh, following up on the gentoo image builds, prometheanfire has a dib change proposed to solve it
19:59:57 <fungi> see #opendev for details
20:00:07 <ianw> ok will look
20:00:12 <fungi> thanks for chairing, ianw!
20:00:22 <ianw> that's about time, see you all next time!
20:00:27 <ianw> #endmeeting