19:01:20 <clarkb> #startmeeting infra
19:01:20 <opendevmeet> Meeting started Tue Feb 8 19:01:20 2022 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:20 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:20 <opendevmeet> The meeting name has been set to 'infra'
19:01:31 <ianw> o/
19:01:31 <clarkb> #link https://lists.opendev.org/pipermail/service-discuss/2022-February/000317.html Our Agenda
19:01:38 <frickler> \o
19:01:42 <clarkb> #topic Announcements
19:01:48 <clarkb> OpenInfra Summit CFP needs your input: https://openinfra.dev/summit/ You have until tomorrow to get your ideas in.
19:02:11 <clarkb> I believe the deadline is just something like 21:00 UTC February 9
19:02:33 <clarkb> it is based on US central time. If you plan to get something in last minute definitely double check what the cutoff is
19:02:45 <clarkb> Service coordinator nominations run January 25, 2022 - February 8, 2022. Today is the last day.
19:03:37 <clarkb> two things about this. First, I've realized that I don't think I formally sent an announcement of this outside of our meeting agendas and the email planning for it last year
19:03:45 <clarkb> second, we don't have any volunteers yet
19:04:22 <clarkb> Do we think we should send a formal announcement in a dedicated thread and give it another week? Not doing that was on me. I think in my head I had done so because I put it on the agenda, but that probably wasn't sufficient
19:05:07 <fungi> oh, sure, i don't think delaying it a week will hurt
19:05:17 <frickler> I would be surprised if anyone outside this group would show up, so I'd be fine with you just continuing
19:05:19 <clarkb> I'm willing to volunteer again, but would be happy to have someone else do it too. I just don't want anyone to feel like I was being sneaky if I volunteer last minute after doing a poor job announcing this
19:05:46 <clarkb> frickler: agreed, but in that case I don't think waiting another week will hurt anything either.
19:05:57 <clarkb> And that way I can feel like this was done a bit more above board
19:06:04 <frickler> yeah, it won't change anything, true
19:06:35 <frickler> oh, while we're announcing, I'll be on PTO next monday and tuesday
19:06:49 <frickler> so probably won't make it to the meeting either
19:06:57 <clarkb> If there are no objections I'll send an email today asking for nominations until 23:59 UTC February 15, 2022 just to make sure it is clear and anyone can speak up if they aren't reading meeting agendas or attending
19:07:09 <clarkb> frickler: hopefully you get to do something fun
19:07:51 <clarkb> alright, I'm not hearing objections so I'll proceed with that plan after the meeting
19:07:59 <clarkb> #topic Actions from last meeting
19:08:04 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2022/infra.2022-02-01-19.01.txt minutes from last meeting
19:08:15 <clarkb> We recorded two actions. The first was for clarkb to make a list of opendev projects to retire
19:08:20 <clarkb> #link https://etherpad.opendev.org/p/opendev-repo-retirements List of repos to retire. Please double check.
19:08:35 <clarkb> This is step 0 to cleaning up old reviews as we can abandon any changes associated with those repos once they are retired
19:08:59 <clarkb> If you get time please take a look and cross any out that shouldn't be retired, or feel free to add projects that should be. I'm hoping to start batching those up later this week
19:09:19 <clarkb> And the other action was frickler was going to push a change to reenable gerrit mergeability checks
19:09:32 <clarkb> frickler: I looked for a change but didn't see one. Did I miss it or should we continue recording this as an action?
19:09:33 <frickler> yeah, I didn't get to that after having fun with cirros
19:09:39 <frickler> please continue
19:09:59 <clarkb> #action frickler Push change to reenable Gerrit mergeability checks
19:10:10 <clarkb> thanks! the cirros stuff helped us double check other important items :)
19:10:18 <clarkb> #topic Topics
19:10:24 <clarkb> #topic Improving OpenDev's CD throughput
19:10:34 <clarkb> I keep getting distracted and reviewing these changes falls lower on the list :(
19:10:45 <clarkb> ianw: anything new to call out for those of us that haven't been able to give this attention recently?
19:11:21 <ianw> no, they might make it to the top of the todo list later this week :)
19:11:59 <clarkb> #topic Container Maintenance
19:12:36 <clarkb> I haven't made recent progress on this, but feel like I am getting out of the hole of the zuul release and server upgrades and summit CFP where I have time for this. jentoio if an afternoon later this week works for you please reach out.
19:12:54 <clarkb> jentoio: I think we can get on a call together for an hour or two and work through a specific item together
19:13:48 <clarkb> I did take notes a week or two back that should help us identify a good candidate and take it from there
19:14:45 <clarkb> Anyway let's sync up later today (or otherwise soon) and find a good time that works
19:14:51 <clarkb> #topic Nodepool image cleanups
19:15:08 <clarkb> CentOS 8 is now gone. We've removed the image from nodepool and the repo from our mirrors.
19:15:23 <clarkb> We accelerated this because upstream started removing bits from repos that broke things anyway
19:15:42 <clarkb> However, we ran into problems where projects were stuck in a chicken and egg situation, unable to remove centos 8 jobs because centos 8 was gone
19:15:55 <clarkb> To address this we added the nodeset back to base-jobs, but centos-8-stream provides the nodes
19:16:19 <clarkb> We should check periodically with projects on when we can remove that label to remove any remaining confusion over what centos-8 is. But we don't need to be in a huge rush for that.
19:17:06 <ianw> hrm, i thought we were doing that just to avoid zuul errors
19:17:09 <fungi> odds are some of those jobs may "just work" with stream, but they should be corrected anyway
19:17:28 <ianw> i guess it works ... but if they "just work" i think they'll probably never get their node-type fixed
19:17:31 <clarkb> ianw: ya it was the circular zuul errors that prevented them from removing the centos-8 jobs
19:17:37 <fungi> ianw: we were doing it to avoid zuul errors which prevented those projects from merging the changes they needed to remove the nodes
19:17:47 <clarkb> they are still expected to remove those jobs and stop using the nodeset
19:18:19 <ianw> sure; i wonder if a "fake" node that doesn't fail errors, but doesn't run anything, could work
19:18:25 <fungi> the alternative was for gerrit admins to bypass testing and merge the removal changes for them since they were untestable
19:18:34 <frickler> can we somehow make those jobs fail instead of passing? that would increase the probability of fixing things
19:18:42 <fungi> but yeah, we could have pointed them at a different label too
19:18:47 <clarkb> ianw: frickler: that is a good idea though I'm not sure how to make that happen
19:19:03 <fungi> i guess we'd need to add an intentionally broken distro
19:19:10 <ianw> if we had a special node that just always was broken
19:19:26 <clarkb> we might be able to do it by setting up a nodepool image+label where the username isn't valid
19:19:31 <clarkb> so nodepool would node_failure it
19:19:32 <fungi> and this becomes an existential debate into what constitutes "broken" ;)
19:19:49 <fungi> but yeah, that sounds pretty broken
19:19:52 <clarkb> basically reuse an existing image but tell nodepool to login as zuulinvalid
19:20:01 <fungi> and would probably be doable without a special image
19:20:02 <clarkb> then we don't have another image around
19:20:07 <clarkb> fungi: ya exactly
19:20:12 <fungi> wfm
19:20:46 <ianw> i wonder if it's generic enough to do something in nodepool itself
19:20:48 <frickler> and node_failure is actually better than just failing the jobs
19:20:56 <ianw> maybe a no-op node
19:21:03 <frickler> because it gives a hint in the right direction
19:21:06 <clarkb> frickler: ++
19:21:20 <ianw> although, no-op tends to suggest passing, rather than failing
19:21:54 <ianw> i can take a look at some options
19:21:59 <clarkb> thanks!
19:22:13 <clarkb> ianw: frickler: I've also noticed progress on getting things switched over to fedora 35
19:22:24 <clarkb> are we near being able to remove fedora 34 yet?
19:22:30 <frickler> yes, I merged ianw's work earlier today
19:23:06 <ianw> yeah, thanks for that, once devstack updates that should be the last user and we can remove that node type for f35
19:23:14 <frickler> f34
19:23:14 <clarkb> exciting
19:23:27 <ianw> #link https://review.opendev.org/c/openstack/diskimage-builder/+/827772
19:23:46 <ianw> is one minor one that stops a bunch of locale error messages, but will need a dib release
19:24:16 <frickler> another question that came up was how long do we want to keep running xenial images?
19:24:34 <clarkb> frickler: at this point I think they are largely there for our own needs with the last few puppeted things
19:24:51 <frickler> no, there are some py2 jobs and others afaict
19:25:10 <clarkb> frickler: I've got on my todo list to start cleaning up openstack health, subunit2sql, and logstash/elasticsearch stuff, which will remove a huge chunk of that. Now that openstack is doing that with opensearch
19:25:12 <clarkb> frickler: oh interesting
19:25:37 <clarkb> frickler: I think we should push those ahead as much as possible. The fact xenial remains is an artifact of our puppetry and not because we think anyone should be using it anymore
19:26:05 <frickler> publish-openstack-sphinx-docs-base is one example
19:26:08 <clarkb> frickler: do you think that deserves a targeted email to the existing users?
19:26:38 <fungi> right, if we still want to test that our remaining puppet modules can deploy things, we need xenial servers until we decommission the last of them or update their configuration management to something post-puppet
19:27:01 <fungi> thankfully their numbers are continuing to dwindle
19:27:21 <frickler> I'd say I'll try to collect a more complete list of jobs that would be affected and we can discuss again next week
19:27:39 <clarkb> sounds good
19:27:39 <frickler> except I'm not there, but I'll try to prepare the list at least
19:27:53 <clarkb> frickler: ya if you put the list on the agenda or somewhere else conspicuous I can bring it up
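
A minimal sketch of one way the list of affected jobs could be collected, assuming Zuul's REST API exposes per-job variant details including nodeset labels; the tenant and label names below are only examples, and because it makes one request per job it is slow:

    # Sketch: list Zuul jobs whose nodesets still request ubuntu-xenial nodes.
    # Assumes /api/tenant/<tenant>/jobs and /api/tenant/<tenant>/job/<name>
    # report each variant's nodeset; tenant and label names are illustrative.
    import json
    import urllib.parse
    import urllib.request

    API = "https://zuul.opendev.org/api/tenant/openstack"

    def fetch(url):
        # Zuul's web API returns plain JSON documents.
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)

    affected = []
    for job in fetch(f"{API}/jobs"):
        name = job["name"]
        # One request per job; each job can declare several variants.
        for variant in fetch(f"{API}/job/{urllib.parse.quote(name, safe='')}"):
            nodes = (variant.get("nodeset") or {}).get("nodes", [])
            if any(node.get("label") == "ubuntu-xenial" for node in nodes):
                affected.append(name)
                break

    print("\n".join(sorted(affected)))

The same kind of query could later confirm that nothing still asks for the temporary centos-8 alias either.
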
19:29:11 <ianw> on a probably similar note, i started looking at the broken translation proposal jobs
19:29:42 <frickler> but those were bionic iirc?
19:29:58 <ianw> these seemed to break when we moved ensure-sphinx to py3 only, so somehow it seems py2 is involved
19:30:00 <fungi> yeah, bionic
19:30:19 <ianw> but also, all the zanata stuff is stuck in puppet but has no clear path
19:30:22 <frickler> ianw: from what I saw, they were py3 before
19:30:35 <fungi> note that translation imports aren't the only jobs which were broken by that ensure-sphinx change, a bunch of pre-py3 openstack branches were still testing docs builds daily
19:30:36 <frickler> so rather the change from virtualenv to python3 -m venv
19:30:53 <ianw> mmm, yeah that could be it
19:31:12 <ianw> #link https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/828219
19:31:26 <clarkb> ianw: ya I think we need to start a thread on the openstack discuss list (because zanata is openstack specific) and lay out the issues and try to get help with a plan forward
19:31:32 <ianw> better explains ^ ... if "python3 -m venv" has a lower version of pip by default than the virtualenv
19:31:52 <clarkb> I'm not sure I'm fully clued in on all the new stuff. Maybe we should draft an email on an etherpad to make sure we cover all the important points?
19:32:58 <ianw> clarkb: i feel like there have been threads, but yeah restarting the conversation won't hurt. although i feel like, just like with the images, we might need to come to a "this will break at X" time-frame
19:33:23 <ianw> i can dig through my notes and come up with something
19:33:35 <clarkb> ianw: ya I think that is fair. It is basically what we did with elasticsearch too. basically this is not currently maintainable and we don't have the bandwidth to make it maintainable. If people want to help us change that please reach out, otherwise we'll need to sunset at $time
19:34:09 <clarkb> and ya there have been some threads, but I think they end up bitrotting in people's minds as zanata continues to work :)
19:35:15 <clarkb> thanks!
19:35:24 <clarkb> #topic Cleaning up Old Reviews
19:35:39 <clarkb> As mentioned previously I came up with a list of old repos in opendev that we can probably retire
19:35:46 <clarkb> #link https://etherpad.opendev.org/p/opendev-repo-retirements List of repos to retire. Please double check
19:36:09 <clarkb> If we're happy with the list, running through retirements for them will largely be mechanical and then we can abandon all related changes as phase one of the cleanup here
19:37:04 <fungi> i added one a few minutes ago
19:37:05 <clarkb> Then we can see where we are at and dive into system-config cleanups
19:37:08 <clarkb> fungi: thanks!
19:37:40 <fungi> abandoning the changes for those repos will also be necessary as a step in their retirement anyway
19:37:51 <clarkb> fungi: yup the two processes are tied together which is nice
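
A minimal sketch of the "abandon all related changes" step against Gerrit's change query and abandon REST endpoints, assuming an HTTP password for authentication; the credentials, project name, and message below are placeholders, and pagination of large result sets is ignored:

    # Sketch: abandon every open change in a repository that is being retired.
    import json
    import requests

    GERRIT = "https://review.opendev.org"
    AUTH = ("someadmin", "http-password")  # hypothetical credentials

    def gerrit_json(response):
        # Gerrit prefixes JSON bodies with ")]}'" to defeat XSSI; strip it.
        response.raise_for_status()
        return json.loads(response.text.split("\n", 1)[1])

    project = "opendev/some-retired-repo"  # placeholder project name

    changes = gerrit_json(requests.get(
        f"{GERRIT}/a/changes/",
        params={"q": f"project:{project} status:open", "n": 500},
        auth=AUTH))

    for change in changes:
        # Abandon each remaining open change with a short explanation.
        requests.post(
            f"{GERRIT}/a/changes/{change['id']}/abandon",
            json={"message": "This repository is being retired."},
            auth=AUTH).raise_for_status()

Running only the query step first makes it easy to sanity-check the list against the etherpad before anything is actually abandoned.
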
19:38:56 <clarkb> The last item on the agenda was already covered (Gerrit mergeability checking)
19:39:00 <clarkb> #topic Open Discussion
19:39:32 <clarkb> I'm working on a gitea 1.16 upgrade. First change to do that with confidence is an update to our gitea testing to ensure we're exercising the ssh container: https://review.opendev.org/c/opendev/system-config/+/828203
19:40:01 <clarkb> The child change which actually upgrades to 1.16 should probably be WIP (I'll do that now) until we've had time to go over the changelog a bit more and hold a node to look for any problems
19:40:24 <clarkb> As a general rule the bug releases have been fine with gitea but the feature releases have been more problematic, and I don't mind taking some time to check stuff
19:41:17 <frickler> do we know how much quota we have for our log storage and how much of it we are using?
19:41:47 <frickler> the cirros artifacts, once I manage to collect them properly, are pretty large and I don't want to explode anything
19:42:00 <clarkb> frickler: I don't think a quota was ever explicitly stated. Instead we maintained an expiry of 30 days, which is what we did prior to the move to swift
19:42:17 <clarkb> The swift apis in the clouds we use for this should be able to give us container sizes iirc
19:42:33 <clarkb> but I haven't used the swift tooling in a while so could be wrong about that
19:42:45 <frickler> o.k., so I can try to look into checking that manually, thx
19:43:08 <clarkb> note that we shard the containers too so you might need a script to get it
19:43:11 <clarkb> but that should be doable
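
A minimal sketch of such a script, assuming python-swiftclient plus an openstacksdk clouds.yaml entry for the cloud holding the logs; the cloud name and container prefix below are hypothetical:

    # Sketch: report how much log storage is in use by summing Swift
    # container sizes across the sharded log containers.
    import openstack
    import swiftclient.client

    # Reuse the keystone session from a clouds.yaml entry for the log cloud.
    cloud = openstack.connect(cloud="opendev-logs")  # hypothetical cloud name
    swift = swiftclient.client.Connection(session=cloud.session)

    # get_account() returns the account headers plus one entry per container,
    # each reporting its object count and bytes used.
    headers, containers = swift.get_account(full_listing=True)
    print("account total:", headers.get("x-account-bytes-used"), "bytes")

    total = sum(c["bytes"] for c in containers
                if c["name"].startswith("zuul_"))  # hypothetical shard prefix
    print("log container shards:", total, "bytes")

Comparing the shard total against the account total would also show how much space is taken by anything other than job logs.
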
19:43:22 <fungi> we had a request come through backchannels at the foundation from a user who is looking to have an old ethercalc spreadsheet restored (it was defaced or corrupted at some point in the past). unfortunately the most they know is that the content was intact as of mid-2019. any idea if we even have backups stretching back that far (not to mention that they'd be in bup not borg)?
19:44:24 <ianw> hrm, i think the answer is no
19:44:30 <clarkb> fungi: I think if you login to the two backup servers you can see what volumes are mounted and see if we have any of the old volumes, but I suspect ya that
19:44:58 <clarkb> your best bet is probably to get the oldest version in borg and spin up an ethercalc locally off of that and see if the content was corrupted then
19:45:15 <fungi> i was going to say that we have no easy way to restore the data regardless, but also if we don't even have the data still then that's pretty much the end of it
19:45:17 <clarkb> (which is unfortunately a lot of work)
19:45:20 <ianw> unless deletion is just a bit flip of "this is just deleted"
19:45:39 <ianw> you can quickly mount the borg backups on the host
19:45:45 <clarkb> ianw: it looks like ethercalc doesn't store history of edits like etherpad does, unfortunately
19:45:50 <clarkb> it is far more "ether" :)
19:45:59 <fungi> yeah, ethercalc itself can't unwind edits
19:46:22 <fungi> once a pad is defaced, the way you fix it is to reimport the backup export you made
19:46:43 <clarkb> but also it is redis so there isn't an easy "just grab these rows from the db backup"
19:46:50 <fungi> in this case the user did make a backup, but onto a device whose hard drive apparently died recently
19:46:51 <clarkb> it's a whole db backup and I think you have to use it that way
19:47:16 <ianw> ethercalc02-filesystem-2021-02-28T05:51:02 is the earliest
19:47:28 <fungi> oh, thanks for checking!
19:47:51 <ianw> (/usr/local/bin/borg-mount backup02.vexxhost.opendev.org on the host)
19:48:13 <fungi> that's the borg backup. do we still have older bup backups?
19:48:25 <fungi> or did we delete the bup servers?
19:49:01 <clarkb> fungi: correct, the bup servers were deleted. Their volumes were kept for a time. Then also eventually deleted iirc
19:49:32 <clarkb> you could double check that the bup volumes are gone. it's possible they remained. But pretty sure the servers did not
19:49:49 <ianw> neither has bup backups mounted
19:49:57 <fungi> yeah, okay, so sounds like that's gone, thanks
19:50:47 <ianw> i have in my old todo notes "cleanup final bup backup volume 2021-07"
19:50:48 <clarkb> corvus made a thing that was really cool I wanted to call out https://twitter.com/acmegating/status/1490821104918618112
19:51:05 <ianw> but no note that i did that ... fungi i'll poke to make sure
19:51:18 <clarkb> shows how we can do testing of gerrit upstream in our downstream jobs with our configs
19:51:28 <clarkb> (using changes that fungi and I wrote as illustrations)
19:52:16 <ianw> nice!
19:52:20 <fungi> now if they'll only merge your fix so i can get on with plastering over gitiles
19:54:32 <clarkb> Sounds like that may be it. Thank you everyone!
19:54:36 <clarkb> we'll see you here next week
19:54:41 <clarkb> #endmeeting