19:01:20 <clarkb> #startmeeting infra
19:01:20 <opendevmeet> Meeting started Tue Feb  8 19:01:20 2022 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:20 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:20 <opendevmeet> The meeting name has been set to 'infra'
19:01:31 <ianw> o/
19:01:31 <clarkb> #link https://lists.opendev.org/pipermail/service-discuss/2022-February/000317.html Our Agenda
19:01:38 <frickler> \o
19:01:42 <clarkb> #topic Announcements
19:01:48 <clarkb> OpenInfra Summit CFP needs your input: https://openinfra.dev/summit/ You have until tomorrow to get your ideas in.
19:02:11 <clarkb> I believe the deadline is just something like 21:00 UTC February 9
19:02:33 <clarkb> it is based on US central time. If you plan to get something in last minute definitely double check what the cut off is
19:02:45 <clarkb> Service coordinator nominations run January 25, 2022 - February 8, 2022. Today is the last day.
19:03:37 <clarkb> two things about this. First I've realized that I don't think I formally sent an announcement of this outside of our meeting agendas and the email planning for it last year
19:03:45 <clarkb> second we don't have any volunteers yet
19:04:22 <clarkb> Do we think we should send a formal announcement in a dedicated thread and give it another week? Not doing that was on me. I think in my head I had done so because I put it on the agenda but that probably wasn't sufficient
19:05:07 <fungi> oh, sure i don't think delaying it a week will hurt
19:05:17 <frickler> I would be surprised if anyone outside this group would show up, so I'd be fine with you just continuing
19:05:19 <clarkb> I'm willing to volunteer again, but would be happy to have someone else do it too. I just don't want anyone to feel like I was being sneaky if I volunteer last minute after doing a poor job announcing this
19:05:46 <clarkb> frickler: agreed, but in that case I don't think waiting another week will hurt anything either.
19:05:57 <clarkb> And that way I can feel like this was done a bit more above board
19:06:04 <frickler> yeah, it won't change anything, true
19:06:35 <frickler> oh, while we're announcing, I'll be on PTO next monday and tuesday
19:06:49 <frickler> so I probably won't make it to the meeting either
19:06:57 <clarkb> If there are no objections I'll send an email today asking for nominations until 23:59 UTC February 15, 2022 just to make sure it is clear and anyone can speak up if they aren't reading meeting agendas or attending
19:07:09 <clarkb> frickler: hopefully you get to do something fun
19:07:51 <clarkb> alright I'm not hearing objections so I'll proceed with that plan after the meeting
19:07:59 <clarkb> #topic Actions from last meeting
19:08:04 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2022/infra.2022-02-01-19.01.txt minutes from last meeting
19:08:15 <clarkb> We recorded two actions. The first was for clarkb to make a list of opendev projects to retire
19:08:20 <clarkb> #link https://etherpad.opendev.org/p/opendev-repo-retirements List of repos to retire. Please double check.
19:08:35 <clarkb> This is step 0 to cleaning up old reviews as we can abandon any changes associated with those repos once they are retired
19:08:59 <clarkb> If you get time please take a look and cross any out that shouldn't be retired or feel free to add projects that should be. I'm hoping to start batching those up later this week
19:09:19 <clarkb> And the other action was that frickler was going to push a change to reenable Gerrit mergeability checks
19:09:32 <clarkb> frickler: I looked for a change but didn't see one. Did I miss it or should we continue recording this as an action?
19:09:33 <frickler> yeah, I didn't get to that after having fun with cirros
19:09:39 <frickler> please continue
19:09:59 <clarkb> #action frickler Push change to reenable Gerrit mergeability checks
19:10:10 <clarkb> thanks! the cirros stuff helped us double check other important items :)
19:10:18 <clarkb> #topic Topics
19:10:24 <clarkb> #topic Improving OpenDev's CD throughput
19:10:34 <clarkb> I keep getting distracted and reviewing these changes falls lower on the list :(
19:10:45 <clarkb> ianw: anything new to call out for those of us that haven't been able to give this attention recently?
19:11:21 <ianw> no, they might make it to the top of the todo list later this week :)
19:11:59 <clarkb> #topic Container Maintenance
19:12:36 <clarkb> I haven't made recent progress on this, but feel like I am getting out of the hole of the zuul release, server upgrades, and summit CFP, where I have time for this. jentoio: if an afternoon later this week works for you please reach out.
19:12:54 <clarkb> jentoio: I think we can get on a call together for an hour or two and work through a specific item together
19:13:48 <clarkb> I did take notes a week or two back that should help us identify a good candidate and take it from there
19:14:45 <clarkb> Anyway let's sync up later today (or otherwise soon) and find a good time that works
19:14:51 <clarkb> #topic Nodepool image cleanups
19:15:08 <clarkb> CentOS 8 is now gone. We've removed the image from nodepool and the repo from our mirrors.
19:15:23 <clarkb> We accelerated this because upstream started removing bits from repos that broke things anyway
19:15:42 <clarkb> However, we ran into problems where projects were stuck in a chicken and egg situation, unable to remove centos 8 jobs because centos 8 was gone
19:15:55 <clarkb> To address this we added the nodeset back to base-jobs but centos-8-stream provides the nodes
19:16:19 <clarkb> We should check periodically with projects on when we can remove that label to remove any remaining confusion over what centos-8 is. But we don't need to be in a huge rush for that.
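[Editor's note: a minimal sketch of what that compatibility nodeset in opendev/base-jobs looks like conceptually; the node name shown is illustrative, not necessarily the exact definition that merged.]

    # Sketch only: keep a "centos-8" nodeset for compatibility, but have the
    # nodes supplied by the centos-8-stream label.
    - nodeset:
        name: centos-8
        nodes:
          - name: centos-8
            label: centos-8-stream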
19:17:06 <ianw> hrm, i thought we were doing that just to avoid zuul errors
19:17:09 <fungi> odds are some of those jobs may "just work" with stream, but they should be corrected anyway
19:17:28 <ianw> i guess it works ... but if they "just work" i think they'll probably never get their node-type fixed
19:17:31 <clarkb> ianw: ya it was the circular zuul errors that prevented them from removing the centos-8 jobs
19:17:37 <fungi> ianw: we were doing it to avoid zuul errors which prevented those projects from merging the changes they needed to remove the nodes
19:17:47 <clarkb> they are still expected to remove those jobs and stop using the nodeset
19:18:19 <ianw> sure; i wonder if a "fake" node that doesn't raise errors, but doesn't run anything, could work
19:18:25 <fungi> the alternative was for gerrit admins to bypass testing and merge the removal changes for them since they were untestable
19:18:34 <frickler> can we somehow make those jobs fail instead of passing? that would increase probability of fixing things
19:18:42 <fungi> but yeah, we could have pointed them at a different label too
19:18:47 <clarkb> ianw: frickler: that is a good idea, though I'm not sure how to make that happen
19:19:03 <fungi> i guess we'd need to add an intentionally broken distro
19:19:10 <ianw> if we had a special node that just always was broken
19:19:26 <clarkb> we might be able to do it by setting up a nodepool image+label where the username isn't valid
19:19:31 <clarkb> so nodepool would node_failure it
19:19:32 <fungi> and this becomes an existential debate into what constitutes "broken" ;)
19:19:49 <fungi> but yeah, that sounds pretty broken
19:19:52 <clarkb> basically reuse an existing image but tell nodepool to login as zuulinvalid
19:20:01 <fungi> and would probably be doable without a special image
19:20:02 <clarkb> then we don't have another image around
19:20:07 <clarkb> fungi: ya exactly
19:20:12 <fungi> wfm
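[Editor's note: a rough sketch of the invalid-username idea in nodepool provider configuration. Label, provider, and username values are hypothetical, and the exact attribute placement would need to be checked against the nodepool docs before use.]

    # Hypothetical sketch: reuse an existing image but have nodepool log in
    # with a user that doesn't exist on it, so every launch fails its ssh
    # check and Zuul reports NODE_FAILURE instead of quietly running the job.
    labels:
      - name: centos-8            # label the old jobs still reference
        min-ready: 0

    providers:
      - name: example-cloud       # hypothetical provider name
        driver: openstack
        cloud: example
        diskimages:
          - name: centos-8-stream
            username: zuul-invalid   # no such account on the image
        pools:
          - name: main
            labels:
              - name: centos-8
                diskimage: centos-8-stream
                flavor-name: example-flavor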
19:20:46 <ianw> i wonder if it's generic enough to do something in nodepool itself
19:20:48 <frickler> and nodefailure is actually better than just failing the jobs
19:20:56 <ianw> maybe a no-op node
19:21:03 <frickler> because it gives a hint into the right direction
19:21:06 <clarkb> frickler: ++
19:21:20 <ianw> although, no-op tends to suggest passing, rather than failing
19:21:54 <ianw> i can take a look at some options
19:21:59 <clarkb> thanks!
19:22:13 <clarkb> ianw: frickler: I've also noticed progress on getting things switched over to fedora 35
19:22:24 <clarkb> are we near being able to remove fedora 34 yet?
19:22:30 <frickler> yes, I merged ianw's work earlier today
19:23:06 <ianw> yeah, thanks for that, once devstack updates that should be the last user and we can remove that node type for f35
19:23:14 <frickler> f34
19:23:14 <clarkb> exciting
19:23:27 <ianw> #link https://review.opendev.org/c/openstack/diskimage-builder/+/827772
19:23:46 <ianw> is one minor one that stops a bunch of locale error messages, but will need a dib release
19:24:16 <frickler> another question that came up was how long do we want to keep running xenial images?
19:24:34 <clarkb> frickler: at this point I think they are largely there for our own needs with the last few puppeted things
19:24:51 <frickler> no, there are some py2 jobs and others afaict
19:25:10 <clarkb> frickler: I've got it on my todo list to start cleaning up the openstack health, subunit2sql, and logstash/elasticsearch stuff, which will remove a huge chunk of that now that openstack is doing that with opensearch
19:25:12 <clarkb> frickler: oh interesting
19:25:37 <clarkb> frickler: I think we should push those ahead as much as possible. The fact xenial remains is an artifact of our puppetry and not because we think anyone should be using it anymore
19:26:05 <frickler> publish-openstack-sphinx-docs-base is one example
19:26:08 <clarkb> frickler: do you think that deserves a targeted email to the existing users?
19:26:38 <fungi> right, if we still want to test that our remaining puppet modules can deploy things, we need xenial servers until we decommission the last of them or update their configuration management to something post-puppet
19:27:01 <fungi> thankfully their numbers are continuing to dwindle
19:27:21 <frickler> I'd say I try to collect a more complete list of jobs that would be affected and we can discuss again next week
19:27:39 <clarkb> sounds good
19:27:39 <frickler> except I'm not there, but I'll try to prepare the list at least
19:27:53 <clarkb> frickler: ya if you put the list on the agenda or somewhere else conspicuous I can bring it up
19:29:11 <ianw> on a probably similar note, i started looking at the broken translation proposal jobs
19:29:42 <frickler> but those were bionic iirc?
19:29:58 <ianw> these seemed to break when we moved ensure-sphinx to py3 only, so somehow it seems py2 is involved
19:30:00 <fungi> yeah, bionic
19:30:19 <ianw> but also, all the zanata stuff is stuck in puppet but has no clear path
19:30:22 <frickler> ianw: from what I saw, they were py3 before
19:30:35 <fungi> note that translation imports aren't the only jobs which were broken by that ensure-sphinx change, a bunch of pre-py3 openstack branches were still testing docs builds daily
19:30:36 <frickler> so rather the change from virtualenv to python3 -m venv
19:30:53 <ianw> mmm, yeah that could be it
19:31:12 <ianw> #link https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/828219
19:31:26 <clarkb> ianw: ya I think we need to start a thread on the openstack discuss list (because zanata is openstack specific) and lay out the issues and try to get help with a plan forward
19:31:32 <ianw> better explains ^ ... if "python3 -m venv" has a lower version of pip by default than virtualenv does
19:31:52 <clarkb> I'm not sure I'm fully clued in on all the new stuff. Maybe we should draft an email on an etherpad to make sure we cover all the important points?
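[Editor's note: a quick way to check the difference ianw describes on a given node type; the paths are examples and virtualenv must be installed.]

    # Compare the pip seeded by the stdlib venv module with the one virtualenv seeds.
    python3 -m venv /tmp/stdlib-venv
    /tmp/stdlib-venv/bin/pip --version       # pip bundled via ensurepip, often older

    virtualenv /tmp/virtualenv-env
    /tmp/virtualenv-env/bin/pip --version    # pip seeded by virtualenv, usually newer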
19:32:58 <ianw> clarkb: i feel like there have been threads, but yeah restarting the conversation won't hurt.  although i feel like, just like with the images, we might need to come to a "this will break at X" time-frame
19:33:23 <ianw> i can dig through my notes and come up with something
19:33:35 <clarkb> ianw: ya I think that is fair. It is basically what we did with elasticsearch too. basically this is not currently maintainable and we don't have the bandwidth to make it maintainable. If people want to help us change that please reach out; otherwise we'll need to sunset at $time
19:34:09 <clarkb> and ya there have been some threads, but I think they end up bitrotting in people's minds as zanata continues to work :)
19:35:15 <clarkb> thanks!
19:35:24 <clarkb> #topic Cleaning up Old Reviews
19:35:39 <clarkb> As mentioned previously I came up with a list of old repos in opendev that we can probably retire
19:35:46 <clarkb> #link https://etherpad.opendev.org/p/opendev-repo-retirements List of repos to retire. Please double check
19:36:09 <clarkb> If we're happy with the list, running through retirements for them will largely be mechanical and then we can abandon all related changes as phase one of the cleanup here
19:37:04 <fungi> i added one a few minutes ago
19:37:05 <clarkb> Then we can see where we are at and dive into system-config cleanups
19:37:08 <clarkb> fungi: thanks!
19:37:40 <fungi> abandoning the changes for those repos will also be necessary as a step in their retirement anyway
19:37:51 <clarkb> fungi: yup the two processes are tied together which is nice
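[Editor's note: for reference, a hedged sketch of how the bulk-abandon step could be scripted against Gerrit's SSH API. The project name and message are examples, and abandoning other people's changes requires admin/owner permissions.]

    HOST=review.opendev.org
    PROJECT=openstack/some-retired-repo        # example project name

    # List open changes for the project, then abandon each by its current revision.
    # The nested quotes on --message are the usual double-shell-evaluation quirk
    # of the gerrit ssh CLI; -n keeps ssh from consuming the loop's stdin.
    ssh -p 29418 "$HOST" gerrit query --format=JSON --current-patch-set \
        status:open project:"$PROJECT" |
      jq -r 'select(.currentPatchSet != null) | .currentPatchSet.revision' |
      while read -r rev; do
        ssh -n -p 29418 "$HOST" gerrit review --abandon \
            --message '"This repository is being retired; abandoning open changes."' "$rev"
      done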
19:38:56 <clarkb> The last item on the agenda was already covered (Gerrit mergability checking)
19:39:00 <clarkb> #topic Open Discussion
19:39:32 <clarkb> I'm working on a gitea 1.16 upgrade. First change to do that with confidence is an update to our gitea testing to ensure we're exercising the ssh container: https://review.opendev.org/c/opendev/system-config/+/828203
19:40:01 <clarkb> The child change which actually upgrades to 1.16 should probably be WIP (I'll do that now) until we've had time to go over the changelog a bit more and hold a node to look for any problems
19:40:24 <clarkb> As a general rule the bugfix releases have been fine with gitea but the feature releases have been more problematic and I don't mind taking some time to check stuff
19:41:17 <frickler> do we know how much quota we have for our log storage and how much of it we are using?
19:41:47 <frickler> the cirros artifacts, once I manage to collect them properly, are pretty large and I don't want to explode anything
19:42:00 <clarkb> frickler: I don't think quota was ever explicitly stated. Instead we maintained an expiry of 30 days which is what we did prior to the move to swift
19:42:17 <clarkb> The swift apis in the clouds we use for this should be able to give us container size iirc
19:42:33 <clarkb> but I haven't used the swift tooling in a while so could be wrong about that
19:42:45 <frickler> o.k., so I can try to look into checking that manually, thx
19:43:08 <clarkb> note that we shard the containers too so you might need a script to get it
19:43:11 <clarkb> but that should be doable
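[Editor's note: a possible sketch for totaling usage across the sharded log containers with the openstack CLI; the "logs" name prefix is an assumption about the sharding scheme, so adjust it to whatever the containers are actually called.]

    # Sum bytes used across log containers in one Swift region.
    total=0
    for container in $(openstack container list -f value -c Name | grep '^logs'); do
      bytes=$(openstack container show "$container" -f value -c bytes_used)
      total=$((total + bytes))
      printf '%s\t%s\n' "$container" "$bytes"
    done
    echo "total bytes: $total"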
19:43:22 <fungi> we had a request come through backchannels at the foundation from a user who is looking to have an old ethercalc spreadsheet restored (it was defaced or corrupted at some point in the past). unfortunately the most they know is that the content was intact as of mid-2019. any idea if we even have backups stretching back that far (not to mention that they'd be in bup not borg)?
19:44:24 <ianw> hrm, i think the answer is no
19:44:30 <clarkb> fungi: I think if you log in to the two backup servers you can see what volumes are mounted and see if we have any of the old volumes, but I suspect, ya, that they're gone
19:44:58 <clarkb> your best bet is probably to get the oldest version in borg and spin up an ethercalc locally off of that and see if the content was corrupted then
19:45:15 <fungi> i was going to say that we have no easy way to restore the data regardless, but also if we don't even have the data still then that's pretty much the end of it
19:45:17 <clarkb> (which is unfortunately a lot of work)
19:45:20 <ianw> unless deletion is just a bit flip of "this is just deleted"
19:45:39 <ianw> you can quickly mount the borg backups on the host
19:45:45 <clarkb> ianw: it looks like ethercalc doesn't store history of edits like etherpad does unfortunately
19:45:50 <clarkb> it is far more "ether" :)
19:45:59 <fungi> yeah, ethercalc itself can't unwind edits
19:46:22 <fungi> once a pad is defaced, the way you fix it is to reimport the backup export you made
19:46:43 <clarkb> but also it is redis so there isn't an easy "just grab these rows from the db backup" option
19:46:50 <fungi> in this case the user did make a backup, but onto a device whose hard drive apparently died recently
19:46:51 <clarkb> it's a whole db backup and I think you have to use it that way
19:47:16 <ianw> ethercalc02-filesystem-2021-02-28T05:51:02 is the earliest
19:47:28 <fungi> oh, thanks for checking!
19:47:51 <ianw> (/usr/local/bin/borg-mount backup02.vexxhost.opendev.org on the host)
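[Editor's note: for anyone doing this by hand, the generic borg equivalents look roughly like the following; the repository path and mountpoint are examples, and the borg-mount script mentioned above wraps this up on the backup server.]

    borg list /opt/backups/borg-ethercalc02          # enumerate archive names and dates
    borg mount /opt/backups/borg-ethercalc02::ethercalc02-filesystem-2021-02-28T05:51:02 /mnt/backup
    # then inspect /mnt/backup for the redis dump that ethercalc reads from
    borg umount /mnt/backup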
19:48:13 <fungi> that's the borg backup. do we still have older bup backups?
19:48:25 <fungi> or did we delete the bup servers?
19:49:01 <clarkb> fungi: correct the bup servers were deleted. Their volumes were kept for a time. Then also eventually deleted iirc
19:49:32 <clarkb> you could double check that the bup volumes are gone. its possible they remained. But pretty sure the servers did not
19:49:49 <ianw> neither has bup backups mounted
19:49:57 <fungi> yeah, okay, so sounds like that's gone, thanks
19:50:47 <ianw> i have in my old todo notes  "cleanup final bup backup volume 2021-07"
19:50:48 <clarkb> corvus made a thing that was really cool that I wanted to call out: https://twitter.com/acmegating/status/1490821104918618112
19:51:05 <ianw> but no note that i did that ... fungi i'll poke to make sure
19:51:18 <clarkb> shows how we can do testing of gerrit upstream in our downstream jobs with our configs
19:51:28 <clarkb> (using changes that fungi and I wrote as illustrations)
19:52:16 <ianw> nice!
19:52:20 <fungi> now if they'll only merge your fix so i can get on with plastering over gitiles
19:54:32 <clarkb> Sounds like that may be it. Thank you everyone!
19:54:36 <clarkb> we'll see you here next week
19:54:41 <clarkb> #endmeeting