19:01:05 <clarkb> #startmeeting infra
19:01:05 <opendevmeet> Meeting started Tue Feb  1 19:01:05 2022 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:05 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:05 <opendevmeet> The meeting name has been set to 'infra'
19:01:29 <ianw> o/
19:01:30 <frickler> o/
19:01:57 <fungi> ohai
19:02:15 <clarkb> #link http://lists.opendev.org/pipermail/service-discuss/2022-January/000316.html Our Agenda
19:02:39 <clarkb> #topic Announcements
19:02:45 <clarkb> Service coordinator nominations run January 25, 2022 - February 8, 2022. You have another week :)
19:02:56 <clarkb> As always let me know if you have questions about that and I'd be happy to answer them
19:03:02 <clarkb> OpenInfra Summit CFP needs your input: https://openinfra.dev/summit/
19:03:28 <clarkb> If you'd like to talk at the open infra summit there is a ci/cd track as well as other tracks you may be interested in proposing towards. I think you have until february 9 for that
19:03:53 <clarkb> And finally Zuul v5 released today! The culmination of much long term planning and effort. Thank you everyone who helped make that possible
19:04:09 <clarkb> side note our zuul install still says v4.12.something but we're running the same commits that were tagged v5
19:04:52 <clarkb> #topic Actions from last meeting
19:05:04 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2022/infra.2022-01-25-19.01.txt minutes from last meeting
19:05:08 <clarkb> There were no actions recorded
19:05:25 <clarkb> #topic Topics
19:05:35 <clarkb> #topic Improving Opendev's CD throughput
19:05:41 <clarkb> #link https://review.opendev.org/c/opendev/infra-specs/+/821645 -- spec outlining some of the issues with secrets
19:05:47 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/821155 -- sample of secret writing; more info in changelog
19:06:05 <clarkb> Unfortunately the Gerrit upgrade and server patching had me far more distracted last week than I would've liked
19:06:14 <clarkb> I haven't had time to look at these yet. They are still on my todo list though...
19:06:23 <clarkb> maybe I should make an action for everyone to review those :)
19:06:49 <clarkb> #action infra-root Review OpenDev CD throughput related spec for secrets management: https://review.opendev.org/c/opendev/infra-specs/+/821645
19:06:58 <clarkb> ianw: is there anything else to add to this topic?
19:07:59 <ianw> no, no work has been done on this one
19:08:12 <clarkb> #topic Container Maintenance
19:08:17 <clarkb> #link https://etherpad.opendev.org/p/opendev-container-maintenance
19:08:26 <clarkb> My time for this last week was largely sidelined by server patching
19:08:40 <clarkb> I don't really have anything new to add to this unfortunately.
19:08:50 <clarkb> #topic Nodepool Image Cleanup
19:09:28 <clarkb> Changes to remove CentOS 8 have been pushed as promised by the end of January. However, at least one project (OSA) is still struggling with removing centos 8, so we can hold off until they are ready since they are actively working to correct this
19:09:41 <clarkb> Once projects like OSA are ready we can land the changes in this order:
19:09:46 <clarkb> #link https://review.opendev.org/c/opendev/base-jobs/+/827181
19:09:50 <clarkb> #link https://review.opendev.org/c/openstack/project-config/+/827184
19:09:54 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/827186
19:10:46 <ianw> so we found yesterday that the centos mirror infrastructure stopped returning links
19:10:50 <clarkb> It looks like centos 8 upstream is starting to archive itself which is causing some issues here and there and people will be motivated to start moving
19:10:55 <clarkb> ya that
19:11:20 <ianw> yeah, so things are really only working because we run in a little mirror bubble
19:11:31 <fungi> bless this bubble
19:12:24 <ianw> i've stopped the image builds (https://review.opendev.org/c/openstack/project-config/+/827195) because they will fail
19:13:01 <ianw> if upstream modifies their mirror bits and we rsync that on a run, then we will be totally broken
19:13:47 <ianw> so tbh i feel like we could probably pull the images now; if jobs fail, people need to switch them to 8-stream, make them non-voting if they don't work ootb, and fix them
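For projects that need to move, the switch ianw describes is usually a small Zuul job tweak; a rough sketch (the job names are illustrative, and the centos-8-stream nodeset name is assumed to match what opendev provides):

    - job:
        name: my-project-functional-centos
        parent: my-project-functional
        nodeset: centos-8-stream   # switched from centos-8
        voting: false              # temporarily non-voting until it works on 8-stream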
19:14:35 <fungi> the problem with dropping our images is that if something happens we can't upload them again
19:14:38 <clarkb> ianw: do we know why the rsyncing hasn't broken yet?
19:15:10 <clarkb> I think I'm ok with leaving this up until projects migrate and removing sooner if the upstream infrastructure is no longer tenable
19:15:17 <fungi> oh, i see what you mean, drop the images in our providers and stop providing centos-8 nodes, full stop
19:15:46 <ianw> clarkb: when i last checked, they hadn't moved the 8/ directories into vault.centos.org
19:16:06 <ianw> but that may happen at any time i guess, which would make them disappear from the mirror and we'd pull that
19:16:25 <fungi> not that we should have to handhold anyone, but i know we've been focused mostly on openstack's use of centos-8 nodes... does anyone happen to know if starlingx is also impacted? (or did they never finish moving off centos-7?)
19:16:32 <frickler> mirror.centos dropped like 90G in size at around 11:30 today
19:17:26 <fungi> and yeah, this is probably good to make folks on openstack-discuss aware of. centos-8 is going away even if we do nothing. your jobs are breaking today, sorry!
19:18:25 <clarkb> fungi: ++ maybe the thing to do is respond to my thread that warned people about the removal. Indicate that centos-8 doesn't work if you talk to upstream anymore and as a result we're going to remove things?
19:20:06 <clarkb> jrosser isn't here but was one who wanted to keep them up
19:20:12 <ianw> ahh, yeah, http://mirror.iad.rax.opendev.org/centos/8/os isn't there ...
19:20:22 <ianw> so that might have happened overnight
19:20:31 <clarkb> ianw: I assume os/ includes important packages :)
19:20:37 <ianw> (or today, depending on how you look at it :)
19:20:38 <clarkb> I definitely think we should accelerate the removal given ^
19:22:23 <ianw> looks like from the logs it largely cleared itself out @ 2022-02-01T10:43:44,980764868+00:00
19:23:41 <clarkb> anyone want to volunteer to respond to the thread? The changes should be ready to go once we're ready
19:24:38 <ianw> i can chase up on it, reply and merge those things through today
19:24:41 <clarkb> thank you
19:24:48 <fungi> i've got a few other deadlines looming so probably can't give it the immediate attention it deserves
19:24:52 <fungi> thanks ianw!
19:25:01 <clarkb> #topic Cleaning up old reviews
19:25:08 <clarkb> #action clarkb to produce a list of repos that can be retired. We can then retire the repos and abandon their open reviews as step 0
19:25:32 <clarkb> I'll go ahead and record this now as an explicit action for my todo list. I think we'll still start with ^ which should take out a chunk of reviews, then reevaluate when that is done
19:25:38 <clarkb> frickler: anything else to add to this topic?
19:25:53 <frickler> nope, didn't do anything on that yet
19:26:10 <clarkb> #topic Gerrit mergeability checking
19:26:42 <clarkb> When we upgraded to gerrit 3.4 we lost mergeability checking by default. Gerrit disabled this functionality by default as it can use a disproportionate amount of resources to calculate merge conflicts
19:26:54 <clarkb> The functionality is still in Gerrit though and we can opt into it via a config switch
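If memory serves, the opt-in is a single gerrit.config knob; a sketch of what re-enabling it would look like (option name and value taken from the Gerrit docs, not from this meeting):

    [change]
      mergeabilityComputationBehavior = API_REF_UPDATED_AND_CHANGE_REINDEX
      # the default became NEVER, which is why the 3.4 upgrade turned this off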
19:27:05 <clarkb> A few users have mentioned that the information was useful to them.
19:27:30 <fungi> particularly folks using it to omit unmergeable changes from their review dashboards, seems like
19:27:33 <clarkb> I'm not opposed to reenabling the functionality but do have some minor concerns. The biggest is that this will likely make reindexing projects take longer now. But we were ok with that in the past.
19:28:18 <clarkb> Other concerns are that gerrit has a tendency to remove functionality entirely after disabling it by default so we may have to accept it going away one day. But no automated tooling relies on this functionality so the cost remains with humans. If they remove it entirely we should be fine other than having some sad users
19:28:41 <clarkb> There is also the potential that we'd be exposing ourselves to bugs in that functionality since few other users are going to use it. If that happens we can always disable it again
19:29:06 <frickler> maybe we could send them (gerrit devs) feedback that we would like not to lose that functionality?
19:29:07 <clarkb> All that to say I think despite my concerns there are good solutions should the concerns become a problem which means I'm ok with reenabling this
19:29:24 <clarkb> frickler: ya that is another option. Basically "we're toggling to non default here please keep it working"
19:29:57 <fungi> i'm also fine with bringing it back if someone proposes a patch
19:30:09 <ianw> ++ i agree with having it on, i always found it useful, and also agree with sending some feedback that we're turning it on
19:30:33 <frickler> I can look at doing a patch, since I'm one of the users who like to have it
19:30:34 <ianw> it seems like we could probably create conflicting changes and push them during the testing at least
19:30:39 <clarkb> I guess there is the small matter of whether or not we need to offline reindex after enabling it. But I suspect that it will just start adding the info to new changes
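For reference, an offline reindex would be roughly the following (the site path is a placeholder), though as clarkb suspects it likely isn't needed just for this:

    java -jar gerrit.war reindex -d /path/to/review_site --index changes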
19:30:44 <clarkb> frickler: thank you!
19:31:01 <clarkb> and ya our testing should largely cover major concerns with enabling it
19:31:04 <ianw> it would be super cool to check that with selenium but probably just seeing in a screenshot is enough
19:31:50 <frickler> clarkb: can you #action me on that or can I do that myself?
19:31:58 <frickler> just so I don't forget it
19:31:59 <clarkb> frickler: you should be able to do it yourself.
19:32:17 <frickler> #action frickler propose patch to re-enable Gerrit mergeability checking
19:32:25 <clarkb> (the bot doesn't give a lot of feedback though so I guess we'll find out after the meeting is done)
19:32:57 <clarkb> #topic Gerrit issues we are tracking
19:33:35 <clarkb> First up is the regression with gerrit ignoring signed tag acls for pushing tags. My patch to fix this which I tested manually on a held test node landed upstream and we have restarted Gerrit with that code and removed our workaround
19:33:53 <clarkb> We are just waiting on someone to push a signed tag and confirm it is happy now. Once that is done I'll merge the 3.4 fix into gerrit 3.5 as well
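Confirming is just an ordinary signed-tag push; for example (tag and remote names are illustrative):

    git tag -s 1.2.3 -m "project 1.2.3 release"   # creates a GPG-signed tag
    git push gerrit 1.2.3                         # must be permitted by the signed-tag ACL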
19:34:25 <clarkb> Next is that url text substitution for gitweb links doesn't provide the hash value for ${commit} in all cases, and gitea needs that
19:34:30 <clarkb> #link https://bugs.chromium.org/p/gerrit/issues/detail?id=15589
19:34:42 <clarkb> fungi and I are working to test a fix that I pushed upstream. Still no review comments though
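For context, these links come from Gerrit's custom gitweb-style link configuration, roughly along these lines (option names from memory and values illustrative of a gitea backend; ${commit} is the substitution that isn't always filled in):

    [gitweb]
      type = custom
      url = https://opendev.org/
      project = ${project}
      revision = ${project}/commit/${commit}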
19:35:08 <clarkb> One neat thing we are trying to do with our testing though is depends-on against upstream gerrit and running that code in our test jobs
19:35:11 <fungi> also i'm somewhat blocked on zuul-client autohold's --change option not working as advertised
19:35:13 <clarkb> It seems to work from what I've seen so far
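The cross-system testing presumably relies on a normal Zuul Depends-On footer in the system-config change's commit message pointing at the upstream Gerrit review (the change number below is a placeholder):

    Depends-On: https://gerrit-review.googlesource.com/c/gerrit/+/999999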
19:35:58 <fungi> i'm testing a workaround with --ref instead, but we're probably going to need to fix zuul-client to be able to continue doing change-specific autoholds (or go back to the rpc client in the meantime)
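The --ref workaround fungi describes would look something like this (tenant, project, job, ref, and reason are illustrative placeholders):

    zuul-client autohold --tenant openstack --project opendev/system-config \
        --job system-config-run-review --ref refs/changes/55/815555/3 \
        --reason "debug gitweb link fix" --count 1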
19:36:29 <clarkb> And finally yesterday we noticed that git pulls over ssh can backlog in gerrit where the tcp connection is made and gerrit recognizes there is a pull waiting but the tasks remain in waiting and are not processed by a thread
19:36:43 <clarkb> If this happens long enough and the backlog grows eventually it leads to Zuul being very backlogged with its mergers
19:36:49 <clarkb> #link https://bugs.chromium.org/p/gerrit/issues/detail?id=15649
19:37:04 <clarkb> Upstream asked for a thread dump which we have. We just need to audit it for any over exposure of sensitive info
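For anyone following along, the waiting tasks show up in Gerrit's SSH admin interface, and the thread dump upstream asked for is a standard JVM dump; rough examples (user and pid are placeholders):

    ssh -p 29418 youruser@review.opendev.org gerrit show-queue --wide --by-queue
    jstack <gerrit-java-pid> > gerrit-threads.txt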
19:37:22 <clarkb> I'll try to work on that I guess since I've been working upstream with Gerrit more and more
19:38:10 <clarkb> #topic Open Discussion
19:38:12 <clarkb> Anything else?
19:39:00 <frickler> not sure where to discuss, but lp is nearing the 2000000 bug count
19:39:15 <frickler> so overlap with storyboard ids will happen
19:39:41 <clarkb> fun. I'm not sure I personally grasp the impact of that. fungi would probably know better
19:39:48 <fungi> that's a good reminder
19:40:03 <frickler> we will lose the option to migrate existing bugs keeping their ids
19:40:10 <fungi> it basically means we can no longer migrate projects from lp to sb and expect a 1:1 correlation between imported bug numbers
19:40:16 <fungi> yeah, exactly
19:40:31 <clarkb> I see
19:40:43 <clarkb> solvable but with degraded ease of migration
19:40:52 <clarkb> (since we would have to map to new numbers)
19:40:58 <frickler> but then I also don't see a tendency to further do migrations to sb
19:41:12 <fungi> we can probably do some logic to uprev any imported bugs in the 2M+ range and continue to import earlier reports the way we did in the past
19:41:15 <clarkb> ya thats a good point
19:41:31 <clarkb> If anything it seems like projects are looking at github issues more than anything else
19:41:52 <fungi> and asking us to turn on gitea's issues feature, yeah
19:42:07 <clarkb> thank you for calling that out
19:42:16 <fungi> (but that means fixing the clustering problem, account management, and a host of other challenges)
19:42:41 <clarkb> As a general heads up my availability over the next week may be spotty. I'm going to do my best to be around but not sure what my availability will be like.
19:42:46 <clarkb> fungi: yup not an easy task.
19:43:47 <ianw> have upstream fixed the clustering issues?
19:43:53 <ianw> or not so much fixed, but added?
19:44:57 <clarkb> yes I think the elasticsearch backend is there. Not sure if it will work with opensearch though
19:45:27 <clarkb> unfortunate that the gitea effort happened while elasticsearch became less open but at least in theory we could run gitea with an opensearch cluster, a mariadb cluster, and a shared cephfs fs
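A sketch of what the gitea side of that could look like in app.ini (setting names per the gitea docs; the opensearch endpoint is hypothetical, and whether opensearch is accepted is exactly the open question clarkb raises):

    [indexer]
    REPO_INDEXER_ENABLED = true
    REPO_INDEXER_TYPE = elasticsearch
    REPO_INDEXER_CONN_STR = http://opensearch01.example.org:9200
    ISSUE_INDEXER_TYPE = elasticsearch
    ISSUE_INDEXER_CONN_STR = http://opensearch01.example.org:9200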
19:45:57 <ianw> interesting ... another one for the todo list :)
19:46:18 <ianw> all that running together ... sounds like kubernetes might fit in ...
19:46:32 <fungi> that's what we tried to use the first time!
19:46:33 <clarkb> ya that was the original goal with gitea
19:46:52 <clarkb> but when we realized the indexes weren't distributed it wasn't tenable until that got fixed (and they addressed that with the elasticsearch backend option)
19:47:11 <fungi> i think that kubernetes cluster might still exist even, but we'd almost certainly want to rebuild it from scratch if we take it in that direction
19:47:52 <clarkb> ++
19:47:57 <fungi> if memory serves, we also ran into trouble with rook
19:48:05 <clarkb> figuring out how to manage a k8s cluster is probably step 0
19:48:06 <fungi> but that's almost certainly improved in the meantime
19:48:30 <fungi> yeah, managing ceph within kubernetes was a struggle back then
19:48:43 <clarkb> since there are many options and every option we tried previously had its downsides (magnum didn't do upgrades via the api and you couldn't upgrade directly because there wasn't enough disk to grab two copies of the k8s images)
19:49:16 <clarkb> We don't need to solve that in this meeting though. But if people want to investigate that again now might be a good time to start looking into it
19:50:51 <clarkb> Sounds like we may be winding down. I'll give it a couple more minutes for any last minute items then call it a meeting
19:52:27 <clarkb> Thank you everyone!
19:52:33 <clarkb> #endmeeting