Tuesday, 2022-02-08

18:59 <clarkb> Our meeting will begin in a couple of minutes
19:01 <clarkb> #startmeeting infra
19:01 <opendevmeet> Meeting started Tue Feb  8 19:01:20 2022 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01 <opendevmeet> The meeting name has been set to 'infra'
19:01 <ianw> o/
19:01 <clarkb> #link https://lists.opendev.org/pipermail/service-discuss/2022-February/000317.html Our Agenda
19:01 <frickler> \o
19:01 <clarkb> #topic Announcements
19:01 <clarkb> OpenInfra Summit CFP needs your input: https://openinfra.dev/summit/ You have until tomorrow to get your ideas in.
19:02 <clarkb> I believe the deadline is something like 21:00 UTC February 9
19:02 <clarkb> it is based on US central time. If you plan to get something in last minute, definitely double check what the cutoff is
19:02 <clarkb> Service coordinator nominations run January 25, 2022 - February 8, 2022. Today is the last day.
19:03 <clarkb> two things about this. First, I've realized that I don't think I formally sent an announcement of this outside of our meeting agendas and the email planning for it last year
19:03 <clarkb> second, we don't have any volunteers yet
19:04 <clarkb> Do we think we should send a formal announcement in a dedicated thread and give it another week? Not doing that was on me. I think in my head I had done so because I put it on the agenda, but that probably wasn't sufficient
19:05 <fungi> oh, sure, i don't think delaying it a week will hurt
19:05 <frickler> I would be surprised if anyone outside this group would show up, so I'd be fine with you just continuing
19:05 <clarkb> I'm willing to volunteer again, but would be happy to have someone else do it too. I just don't want anyone to feel like I was being sneaky if I volunteer last minute after doing a poor job announcing this
19:05 <clarkb> frickler: agreed, but in that case I don't think waiting another week will hurt anything either.
19:05 <clarkb> And that way I can feel like this was done a bit more above board
19:06 <frickler> yeah, it won't change anything, true
19:06 <frickler> oh, while we're announcing, I'll be on PTO next monday and tuesday
19:06 <frickler> so I probably won't make it to the meeting either
19:06 <clarkb> If there are no objections I'll send an email today asking for nominations until 23:59 UTC February 15, 2022 just to make sure it is clear and anyone can speak up if they aren't reading meeting agendas or attending
19:07 <clarkb> frickler: hopefully you get to do something fun
19:07 <clarkb> alright, I'm not hearing objections so I'll proceed with that plan after the meeting
19:07 <clarkb> #topic Actions from last meeting
19:08 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2022/infra.2022-02-01-19.01.txt minutes from last meeting
19:08 <clarkb> We recorded two actions. The first was for clarkb to make a list of opendev projects to retire
19:08 <clarkb> #link https://etherpad.opendev.org/p/opendev-repo-retirements List of repos to retire. Please double check.
19:08 <clarkb> This is step 0 to cleaning up old reviews, as we can abandon any changes associated with those repos once they are retired
19:08 <clarkb> If you get time please take a look and cross out any that shouldn't be retired, or feel free to add projects that should be. I'm hoping to start batching those up later this week
19:09 <clarkb> And the other action was frickler was going to push a change to reenable gerrit mergeability checks
19:09 <clarkb> frickler: I looked for a change but didn't see one. Did I miss it or should we continue recording this as an action?
19:09 <frickler> yeah, I didn't get to that after having fun with cirros
19:09 <frickler> please continue
19:09 <clarkb> #action frickler Push change to reenable Gerrit mergeability checks
19:10 <clarkb> thanks! the cirros stuff helped us double check other important items :)
19:10 <clarkb> #topic Topics
19:10 <clarkb> #topic Improving OpenDev's CD throughput
19:10 <clarkb> I keep getting distracted and reviewing these changes falls lower on the list :(
19:10 <clarkb> ianw: anything new to call out for those of us that haven't been able to give this attention recently?
19:11 <ianw> no, they might make it to the top of the todo list later this week :)
19:11 <clarkb> #topic Container Maintenance
19:12 <clarkb> I haven't made recent progress on this, but feel like I am getting out of the hole of the zuul release and server upgrades and summit CFP to where I have time for this. jentoio: if an afternoon later this week works for you please reach out.
19:12 <clarkb> jentoio: I think we can get on a call together for an hour or two and work through a specific item together
19:13 <clarkb> I did take notes a week or two back that should help us identify a good candidate and take it from there
19:14 <clarkb> Anyway, let's sync up later today (or otherwise soon) and find a good time that works
19:14 <clarkb> #topic Nodepool image cleanups
19:15 <clarkb> CentOS 8 is now gone. We've removed the image from nodepool and the repo from our mirrors.
19:15 <clarkb> We accelerated this because upstream started removing bits from repos that broke things anyway
19:15 <clarkb> However, we ran into problems where projects were stuck in a chicken and egg situation, unable to remove centos-8 jobs because centos-8 was gone
19:15 <clarkb> To address this we added the nodeset back to base-jobs, but centos-8-stream provides the nodes
19:16 <clarkb> We should check periodically with projects on when we can remove that label to remove any remaining confusion over what centos-8 is. But we don't need to be in a huge rush for that.
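
For reference, the compatibility nodeset described above would look roughly like this in Zuul configuration (a sketch only; the node name is an assumption, not the actual base-jobs change):

    # Keep the old nodeset name resolvable so projects can merge changes
    # that drop their centos-8 jobs, while any nodes that actually launch
    # are CentOS 8 Stream.
    - nodeset:
        name: centos-8
        nodes:
          - name: primary
            label: centos-8-stream
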
19:17 <ianw> hrm, i thought we were doing that just to avoid zuul errors
19:17 <fungi> odds are some of those jobs may "just work" with stream, but they should be corrected anyway
19:17 <ianw> i guess it works ... but if they "just work" i think they'll probably never get their node-type fixed
19:17 <clarkb> ianw: ya, it was the circular zuul errors that prevented them from removing the centos-8 jobs
19:17 <fungi> ianw: we were doing it to avoid zuul errors which prevented those projects from merging the changes they needed to remove the nodes
19:17 <clarkb> they are still expected to remove those jobs and stop using the nodeset
19:18 <ianw> sure; i wonder if a "fake" node that doesn't cause errors, but doesn't run anything, could work
19:18 <fungi> the alternative was for gerrit admins to bypass testing and merge the removal changes for them since they were untestable
19:18 <frickler> can we somehow make those jobs fail instead of passing? that would increase the probability of fixing things
19:18 <fungi> but yeah, we could have pointed them at a different label too
19:18 <clarkb> ianw: frickler: that is a good idea, though I'm not sure how to make that happen
19:19 <fungi> i guess we'd need to add an intentionally broken distro
19:19 <ianw> if we had a special node that just always was broken
19:19 <clarkb> we might be able to do it by setting up a nodepool image+label where the username isn't valid
19:19 <clarkb> so nodepool would node_failure it
19:19 <fungi> and this becomes an existential debate into what constitutes "broken" ;)
19:19 <fungi> but yeah, that sounds pretty broken
19:19 <clarkb> basically reuse an existing image but tell nodepool to log in as zuulinvalid
19:20 <fungi> and would probably be doable without a special image
19:20 <clarkb> then we don't have another image around
19:20 <clarkb> fungi: ya, exactly
19:20 <fungi> wfm
19:20 <ianw> i wonder if it's generic enough to do something in nodepool itself
19:20 <frickler> and nodefailure is actually better than just failing the jobs
19:20 <ianw> maybe a no-op node
19:21 <frickler> because it gives a hint in the right direction
19:21 <clarkb> frickler: ++
19:21 <ianw> although, no-op tends to suggest passing, rather than failing
19:21 <ianw> i can take a look at some options
19:21 <clarkb> thanks!
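
In nodepool's OpenStack driver terms, the always-broken label idea might look like the sketch below. Every name here is an assumption, and "zuulinvalid" is the placeholder username from the discussion; the point is that launched nodes never pass their ssh ready check, so zuul reports NODE_FAILURE instead of running (and passing) the job:

    providers:
      - name: example-provider
        cloud: example-cloud
        cloud-images:
          # Reuse an image that already exists in the cloud, but advertise
          # a login user that doesn't exist on it; boots then never become
          # ready and the node request ends in NODE_FAILURE.
          - name: centos-8-broken
            image-name: centos-8-stream
            username: zuulinvalid
        pools:
          - name: main
            labels:
              - name: centos-8
                cloud-image: centos-8-broken
                min-ram: 8192
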
19:22 <clarkb> ianw: frickler: I've also noticed progress on getting things switched over to fedora 35
19:22 <clarkb> are we near being able to remove fedora 34 yet?
19:22 <frickler> yes, I merged ianw's work earlier today
19:23 <ianw> yeah, thanks for that, once devstack updates that should be the last user and we can remove that node type for f35
19:23 <frickler> f34
19:23 <clarkb> exciting
19:23 <ianw> #link https://review.opendev.org/c/openstack/diskimage-builder/+/827772
19:23 <ianw> is one minor one that stops a bunch of locale error messages, but will need a dib release
19:24 <frickler> another question that came up was how long do we want to keep running xenial images?
19:24 <clarkb> frickler: at this point I think they are largely there for our own needs with the last few puppeted things
19:24 <frickler> no, there are some py2 jobs and others afaict
19:25 <clarkb> frickler: I've got on my todo list to start cleaning up openstack health, subunit2sql, and logstash/elasticsearch stuff, which will be a huge chunk of that removed. Now that openstack is doing that with opensearch
19:25 <clarkb> frickler: oh interesting
19:25 <clarkb> frickler: I think we should push those ahead as much as possible. The fact xenial remains is an artifact of our puppetry and not because we think anyone should be using it anymore
19:26 <frickler> publish-openstack-sphinx-docs-base is one example
19:26 <clarkb> frickler: do you think that deserves a targeted email to the existing users?
19:26 <fungi> right, if we still want to test that our remaining puppet modules can deploy things, we need xenial servers until we decommission the last of them or update their configuration management to something post-puppet
19:27 <fungi> thankfully their numbers are continuing to dwindle
19:27 <frickler> I'd say I try to collect a more complete list of jobs that would be affected and we can discuss again next week
19:27 <clarkb> sounds good
19:27 <frickler> except I'm not there, but I'll try to prepare the list at least
19:27 <clarkb> frickler: ya, if you put the list on the agenda or somewhere else conspicuous I can bring it up
19:29 <ianw> on a probably similar note, i started looking at the broken translation proposal jobs
19:29 <frickler> but those were bionic iirc?
19:29 <ianw> these seemed to break when we moved ensure-sphinx to py3 only, so somehow it seems py2 is involved
19:30 <fungi> yeah, bionic
19:30 <ianw> but also, all the zanata stuff is stuck in puppet but has no clear path
19:30 <frickler> ianw: from what I saw, they were py3 before
19:30 <fungi> note that translation imports aren't the only jobs which were broken by that ensure-sphinx change, a bunch of pre-py3 openstack branches were still testing docs builds daily
19:30 <frickler> so rather the change from virtualenv to python3 -m venv
19:30 <ianw> mmm, yeah that could be it
19:31 <ianw> #link https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/828219
19:31 <clarkb> ianw: ya, I think we need to start a thread on the openstack discuss list (because zanata is openstack specific) and lay out the issues and try to get help with a plan forward
19:31 <ianw> better explains ^ ... if "python3 -m venv" has a lower version of pip by default than virtualenv
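
The suspected difference is easy to confirm locally (a sketch; the paths are arbitrary). "python3 -m venv" seeds whatever pip ensurepip bundled with that Python build, while a separately installed virtualenv seeds its own, usually newer, pip wheel:

    # Compare the pip each tool seeds into a fresh environment.
    python3 -m venv /tmp/venv-test
    /tmp/venv-test/bin/pip --version        # ensurepip's bundled (often older) pip

    virtualenv /tmp/virtualenv-test
    /tmp/virtualenv-test/bin/pip --version  # virtualenv's own (usually newer) pip
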
19:31 <clarkb> I'm not sure I'm fully clued in on all the new stuff. Maybe we should draft an email on an etherpad to make sure we cover all the important points?
19:32 <ianw> clarkb: i feel like there have been threads, but yeah, restarting the conversation won't hurt.  although i feel like, just like with the images, we might need to come to a "this will break at X" time-frame
19:33 <ianw> i can dig through my notes and come up with something
19:33 <clarkb> ianw: ya, I think that is fair. It is basically what we did with elasticsearch too. basically this is not currently maintainable and we don't have the bandwidth to make it maintainable. If people want to help us change that please reach out, otherwise we'll need to sunset at $time
19:34 <clarkb> and ya, there have been some threads, but I think they end up bitrotting in people's minds as zanata continues to work :)
19:35 <clarkb> thanks!
19:35 <clarkb> #topic Cleaning up Old Reviews
19:35 <clarkb> As mentioned previously, I came up with a list of old repos in opendev that we can probably retire
19:35 <clarkb> #link https://etherpad.opendev.org/p/opendev-repo-retirements List of repos to retire. Please double check
19:36 <clarkb> If we're happy with the list, running through retirements for them will largely be mechanical, and then we can abandon all related changes as phase one of the cleanup here
19:37 <fungi> i added one a few minutes ago
19:37 <clarkb> Then we can see where we are at and dive into system-config cleanups
19:37 <clarkb> fungi: thanks!
19:37 <fungi> abandoning the changes for those repos will also be necessary as a step in their retirement anyway
19:37 <clarkb> fungi: yup, the two processes are tied together, which is nice
19:38 <clarkb> The last item on the agenda was already covered (Gerrit mergeability checking)
19:39 <clarkb> #topic Open Discussion
19:39 <clarkb> I'm working on a gitea 1.16 upgrade. First change to do that with confidence is an update to our gitea testing to ensure we're exercising the ssh container: https://review.opendev.org/c/opendev/system-config/+/828203
19:40 <clarkb> The child change which actually upgrades to 1.16 should probably be WIP (I'll do that now) until we've had time to go over the changelog a bit more and hold a node to look for any problems
19:40 <clarkb> As a general rule the bug releases have been fine with gitea, but the feature releases have been more problematic and I don't mind taking some time to check stuff
19:41 <frickler> do we know how much quota we have for our log storage and how much of it we are using?
19:41 <frickler> the cirros artifacts, once I manage to collect them properly, are pretty large and I don't want to explode anything
19:42 <clarkb> frickler: I don't think quota was ever explicitly stated. Instead we maintained an expiry of 30 days, which is what we did prior to the move to swift
19:42 <clarkb> The swift apis in the clouds we use for this should be able to give us container size iirc
19:42 <clarkb> but I haven't used the swift tooling in a while so could be wrong about that
19:42 <frickler> o.k., so I can try to look into checking that manually, thx
19:43 <clarkb> note that we shard the containers too, so you might need a script to get it
19:43 <clarkb> but that should be doable
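
The manual check discussed here could look like the following with python-openstackclient (a sketch; the OS_CLOUD name is hypothetical, and the per-container numbers would need to be summed across the sharded log containers):

    # Account-wide totals (bytes, containers, objects) for the log store.
    export OS_CLOUD=opendev-logs   # hypothetical clouds.yaml entry name
    openstack object store account show

    # Per-container detail; sum the Bytes column across the shards.
    openstack container list --long
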
19:43 <fungi> we had a request come through backchannels at the foundation from a user who is looking to have an old ethercalc spreadsheet restored (it was defaced or corrupted at some point in the past). unfortunately the most they know is that the content was intact as of mid-2019. any idea if we even have backups stretching back that far (not to mention that they'd be in bup not borg)?
19:44 <ianw> hrm, i think the answer is no
19:44 <clarkb> fungi: I think if you log in to the two backup servers you can see what volumes are mounted and see if we have any of the old volumes, but I suspect ya that
19:44 <clarkb> your best bet is probably to get the oldest version in borg and spin up an ethercalc locally off of that and see if the content was corrupted then
19:45 <fungi> i was going to say that we have no easy way to restore the data regardless, but also if we don't even have the data still then that's pretty much the end of it
19:45 <clarkb> (which is unfortunately a lot of work)
19:45 <ianw> unless deletion is just a bit flip of "this is just deleted"
19:45 <ianw> you can quickly mount the borg backups on the host
19:45 <clarkb> ianw: it looks like ethercalc doesn't store history of edits like etherpad does, unfortunately
19:45 <clarkb> it is far more "ether" :)
19:45 <fungi> yeah, ethercalc itself can't unwind edits
19:46 <fungi> once a pad is defaced, the way you fix it is to reimport the backup export you made
19:46 <clarkb> but also it is redis, so there isn't an easy "just grab these rows from the db" backup
19:46 <fungi> in this case the user did make a backup, but onto a device whose hard drive apparently died recently
19:46 <clarkb> it's a whole db backup and I think you have to use it that way
19:47 <ianw> ethercalc02-filesystem-2021-02-28T05:51:02 is the earliest
19:47 <fungi> oh, thanks for checking!
19:47 <ianw> (/usr/local/bin/borg-mount backup02.vexxhost.opendev.org on the host)
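
For anyone following along, the generic borg commands behind that wrapper look roughly like this (a sketch; the repository location is an assumption, not the actual backup layout):

    borg list /path/to/ethercalc02-repo             # enumerate archives
    borg mount /path/to/ethercalc02-repo /mnt/borg  # browse contents read-only
    # ... inspect /mnt/borg, then clean up:
    borg umount /mnt/borg
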
19:48 <fungi> that's the borg backup. do we still have older bup backups?
19:48 <fungi> or did we delete the bup servers?
19:49 <clarkb> fungi: correct, the bup servers were deleted. Their volumes were kept for a time. Then also eventually deleted iirc
19:49 <clarkb> you could double check that the bup volumes are gone. it's possible they remained. But pretty sure the servers did not
19:49 <ianw> neither has bup backups mounted
19:49 <fungi> yeah, okay, so sounds like that's gone, thanks
19:50 <ianw> i have in my old todo notes "cleanup final bup backup volume 2021-07"
19:50 <clarkb> corvus made a thing that was really cool I wanted to call out https://twitter.com/acmegating/status/1490821104918618112
19:51 <ianw> but no note that i did that ... fungi i'll poke to make sure
19:51 <clarkb> shows how we can do testing of gerrit upstream in our downstream jobs with our configs
19:51 <clarkb> (using changes that fungi and I wrote as illustrations)
19:52 <ianw> nice!
19:52 <fungi> now if they'll only merge your fix so i can get on with plastering over gitiles
19:54 <clarkb> Sounds like that may be it. Thank you everyone!
19:54 <clarkb> we'll see you here next week
19:54 <clarkb> #endmeeting
19:54 <opendevmeet> Meeting ended Tue Feb  8 19:54:41 2022 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
19:54 <opendevmeet> Minutes:        https://meetings.opendev.org/meetings/infra/2022/infra.2022-02-08-19.01.html
19:54 <opendevmeet> Minutes (text): https://meetings.opendev.org/meetings/infra/2022/infra.2022-02-08-19.01.txt
19:54 <opendevmeet> Log:            https://meetings.opendev.org/meetings/infra/2022/infra.2022-02-08-19.01.log.html
19:55 <fungi> thanks clarkb!
