19:01:05 #startmeeting infra
19:01:05 Meeting started Tue Feb 1 19:01:05 2022 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:05 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:05 The meeting name has been set to 'infra'
19:01:29 o/
19:01:30 o/
19:01:57 ohai
19:02:15 #link http://lists.opendev.org/pipermail/service-discuss/2022-January/000316.html Our Agenda
19:02:39 #topic Announcements
19:02:45 Service coordinator nominations run January 25, 2022 - February 8, 2022. You have another week :)
19:02:56 As always let me know if you have questions about that and I'd be happy to answer them
19:03:02 OpenInfra Summit CFP needs your input: https://openinfra.dev/summit/
19:03:28 If you'd like to talk at the open infra summit there is a ci/cd track as well as other tracks you may be interested in proposing towards. I think you have until february 9 for that
19:03:53 And finally, Zuul v5 released today! The culmination of much long term planning and effort. Thank you everyone who helped make that possible
19:04:09 side note: our zuul install still says v4.12.something but we're running the same commits that were tagged v5
19:04:52 #topic Actions from last meeting
19:05:04 #link http://eavesdrop.openstack.org/meetings/infra/2022/infra.2022-01-25-19.01.txt minutes from last meeting
19:05:08 There were no actions recorded
19:05:25 #topic Topics
19:05:35 #topic Improving Opendev's CD throughput
19:05:41 #link https://review.opendev.org/c/opendev/infra-specs/+/821645 -- spec outlining some of the issues with secrets
19:05:47 #link https://review.opendev.org/c/opendev/system-config/+/821155 -- sample of secret writing; more info in changelog
19:06:05 Unfortunately the Gerrit upgrade and server patching had me far more distracted last week than I would've liked
19:06:14 I haven't had time to look at these yet. They are still on my todo list though...
19:06:23 maybe I should make an action for everyone to review those :)
19:06:49 #action infra-root Review OpenDev CD throughput related spec for secrets management: https://review.opendev.org/c/opendev/infra-specs/+/821645
19:06:58 ianw: is there anything else to add to this topic?
19:07:59 no, no work has been done on this one
19:08:12 #topic Container Maintenance
19:08:17 #link https://etherpad.opendev.org/p/opendev-container-maintenance
19:08:26 My time for this last week was largely sidelined by server patching
19:08:40 I don't really have anything new to add to this unfortunately.
19:08:50 #topic Nodepool Image Cleanup
19:09:28 Changes to remove CentOS 8 have been pushed as promised by the end of January. However, at least one project (OSA) is still struggling with removing centos 8, so we can hold off until they are ready since they are actively working to correct this
19:09:41 Once projects like OSA are ready we can land the changes in this order:
19:09:46 #link https://review.opendev.org/c/opendev/base-jobs/+/827181
19:09:50 #link https://review.opendev.org/c/openstack/project-config/+/827184
19:09:54 #link https://review.opendev.org/c/opendev/system-config/+/827186
19:10:46 so we found yesterday that the centos mirror infrastructure stopped returning links
19:10:50 It looks like centos 8 upstream is starting to archive itself, which is causing some issues here and there, and people will be motivated to start moving
19:10:55 ya that
19:11:20 yeah, so things are really only working because we run in a little mirror bubble
19:11:31 bless this bubble
19:12:24 i've stopped the image builds (https://review.opendev.org/c/openstack/project-config/+/827195) because they will fail
19:13:01 if upstream modifies their mirror bits and we rsync that on a run, then we will be totally broken
19:13:47 so tbh i feel like we could probably pull the images now, and if jobs fail people need to switch them to 8-stream and make them non-voting if it doesn't work ootb and fix it
19:14:35 the problem with dropping our images is that if something happens we can't upload them again
19:14:38 ianw: do we know why the rsyncing hasn't broken yet?
19:15:10 I think I'm ok with leaving this up until projects migrate and removing sooner if the upstream infrastructure is no longer tenable
19:15:17 oh, i see what you mean, drop the images in our providers and stop providing centos-8 nodes, full stop
19:15:46 clarkb: when i last checked, they hadn't moved the 8/ directories into vault.centos.org
19:16:06 but that may happen at any time i guess, which would make them disappear from the mirror and we'd pull that
19:16:25 not that we should have to handhold anyone, but i know we've been focused mostly on openstack's use of centos-8 nodes... does anyone happen to know if starlingx is also impacted? (or did they never finish moving off centos-7?)
19:16:32 mirror.centos dropped like 90G in size at around 11:30 today
19:17:26 and yeah, this is probably good to make folks on openstack-discuss aware of. centos-8 is going away even if we do nothing. your jobs are breaking today, sorry!
19:18:25 fungi: ++ maybe the thing to do is respond to my thread that warned people about the removal. Indicate that centos-8 doesn't work if you talk to upstream anymore and as a result we're going to remove things?
19:20:06 jrosser isn't here but was one who wanted to keep them up
19:20:12 ahh, yeah, http://mirror.iad.rax.opendev.org/centos/8/os isn't there ...
19:20:22 so that might have happened overnight
19:20:31 ianw: I assume os/ includes important packages :)
19:20:37 (or today, depending on how you look at it :)
19:20:38 I definitely think we should accelerate the removal given ^
19:22:23 looks like from the logs it largely cleared itself out @ 2022-02-01T10:43:44,980764868+00:00
19:23:41 anyone want to volunteer to respond to the thread? The changes should be ready to go once we're ready
19:24:38 i can chase up on it, reply and merge those things through today
19:24:41 thank you
19:24:48 i've got a few other deadlines looming so probably can't give it the immediate attention it deserves
19:24:52 thanks ianw!
19:25:01 #topic Cleaning up old reviews
19:25:08 #action clarkb to produce a list of repos that can be retired. We can then retire the repos and abandon their open reviews as step 0
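For reference, a rough sketch of what the "abandon their open reviews" step could look like against Gerrit's SSH command-line interface. This is untested, the project name and abandon message are placeholders, and in practice the loop would be driven from the generated retirement list rather than typed by hand:

    # sketch only; opendev/some-retired-repo is a placeholder project name
    ssh -p 29418 review.opendev.org gerrit query --format=JSON --current-patch-set \
        status:open project:opendev/some-retired-repo |
      jq -r 'select(.currentPatchSet) | "\(.number),\(.currentPatchSet.number)"' |
      xargs -r -n1 ssh -p 29418 review.opendev.org gerrit review --abandon \
        --message '"Abandoning open changes as part of repository retirement"'

gerrit query emits one JSON object per change plus a trailing stats object; the jq filter drops the stats object and formats the change,patchset identifiers that gerrit review expects.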
19:25:32 I'll go ahead and record this now as an explicit action for my todo list. I think we still start with ^ which should take out a chunk of reviews and then reevaluate when this is done
19:25:38 frickler: anything else to add to this topic?
19:25:53 nope, didn't do anything on that yet
19:26:10 #topic Gerrit mergeability checking
19:26:42 When we upgraded to gerrit 3.4 we lost mergeability checking by default. Gerrit disabled this functionality by default as it can use a disproportionate amount of resources to calculate merge conflicts
19:26:54 The functionality is still in Gerrit though and we can opt into it via a config switch
19:27:05 A few users have mentioned that the information was useful to them.
19:27:30 particularly folks using it to omit unmergeable changes from their review dashboards, seems like
19:27:33 I'm not opposed to reenabling the functionality but do have some minor concerns. The biggest is that this will likely make reindexing projects take longer now. But we were ok with that in the past.
19:28:18 Other concerns are that gerrit has a tendency to remove functionality entirely after disabling it by default, so we may have to accept it going away one day. But no automated tooling relies on this functionality so the cost remains with humans. If they remove it entirely we should be fine other than having some sad users
19:28:41 There is also the potential that we'd be exposing ourselves to bugs in that functionality since few other users are going to use it. If that happens we can always disable it again
19:29:06 maybe we could send them (gerrit devs) feedback that we would like not to lose that functionality?
19:29:07 All that to say: I think despite my concerns there are good solutions should the concerns become a problem, which means I'm ok with reenabling this
19:29:24 frickler: ya that is another option. Basically "we're toggling to non default here please keep it working"
19:29:57 i'm also fine with bringing it back if someone proposes a patch
19:30:09 ++ i agree with having it on, i always found it useful, and also agree with sending some feedback that we're turning it on
19:30:33 I can look at doing a patch, since I'm one of the users who like to have it
19:30:34 it seems like we could probably create conflicting changes and push them during the testing at least
19:30:39 I guess there is the small matter of whether or not we need to offline reindex after enabling it. But I suspect that it will just start adding the info to new changes
19:30:44 frickler: thank you!
19:31:01 and ya our testing should largely cover major concerns with enabling it
19:31:04 it would be super cool to check that with selenium but probably just seeing it in a screenshot is enough
19:31:50 clarkb: can you #action me on that or can I do that myself?
19:31:58 just so I don't forget it
19:31:59 frickler: you should be able to do it yourself.
19:32:17 #action frickler propose patch to re-enable Gerrit mergeability checking
19:32:25 (the bot doesn't give a lot of feedback though so I guess we'll find out after the meeting is done)
19:32:57 #topic Gerrit issues we are tracking
19:33:35 First up is the regression with gerrit ignoring signed tag acls for pushing tags. My patch to fix this, which I tested manually on a held test node, landed upstream and we have restarted Gerrit with that code and removed our workaround
19:33:53 We are just waiting on someone to push a signed tag and confirm it is happy now. Once that is done I'll merge the 3.4 fix into gerrit 3.5 as well
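An aside on the mergeability checking topic above: the opt-in frickler plans to propose is a single gerrit.config toggle. A sketch only, assuming the option is still named mergeabilityComputationBehavior in Gerrit 3.4 and that API_REF_UPDATED_AND_CHANGE_INDEX is the value that turns computation back on; the names should be verified against the 3.4 docs before the patch goes up:

    # sketch -- verify option and value names against the Gerrit 3.4 documentation
    [change]
        mergeabilityComputationBehavior = API_REF_UPDATED_AND_CHANGE_INDEX

If the reindex cost or bug concerns discussed above materialize, the disable path should just be flipping the value back to NEVER, which is the 3.4 default.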
19:34:25 Next is that url text substitution for gitweb links doesn't provide the hash value for ${commit} in all cases, and gitea needs that
19:34:30 #link https://bugs.chromium.org/p/gerrit/issues/detail?id=15589
19:34:42 fungi and I are working to test a fix that I pushed upstream. Still no review comments though
19:35:08 One neat thing we are trying to do with our testing though is depends-on against upstream gerrit and running that code in our test jobs
19:35:11 also i'm somewhat blocked on zuul-client autohold's --change option not working as advertised
19:35:13 It seems to work from what I've seen so far
19:35:58 i'm testing a workaround with --ref instead, but we're probably going to need to fix zuul-client to be able to continue doing change-specific autoholds (or go back to the rpc client in the meantime)
19:36:29 And finally, yesterday we noticed that git pulls over ssh can backlog in gerrit where the tcp connection is made and gerrit recognizes there is a pull waiting, but the tasks remain in waiting and are not processed by a thread
19:36:43 If this happens long enough and the backlog grows, eventually it leads to Zuul being very backlogged with its mergers
19:36:49 #link https://bugs.chromium.org/p/gerrit/issues/detail?id=15649
19:37:04 Upstream asked for a thread dump which we have. We just need to audit it for any overexposure of sensitive info
19:37:22 I'll try to work on that I guess since I've been working upstream with Gerrit more and more
19:38:10 #topic Open Discussion
19:38:12 Anything else?
19:39:00 not sure where to discuss, but lp is nearing the 2000000 bug count
19:39:15 so overlap with storyboard ids will happen
19:39:41 fun. I'm not sure I personally grasp the impact of that. fungi would probably know better
19:39:48 that's a good reminder
19:40:03 we will lose the option to migrate existing bugs keeping their ids
19:40:10 it basically means we can no longer migrate projects from lp to sb and expect a 1:1 correlation between imported bug numbers
19:40:16 yeah, exactly
19:40:31 I see
19:40:43 solvable but with degraded ease of migration
19:40:52 (since we would have to map to new numbers)
19:40:58 but then I also don't see a tendency to further do migrations to sb
19:41:12 we can probably do some logic to uprev any imported bugs in the 20k+ range and continue to import earlier reports the way we did in the past
19:41:15 ya that's a good point
19:41:31 If anything it seems like projects are looking at github issues more than anything else
19:41:52 and asking us to turn on gitea's issues feature, yeah
19:42:07 thank you for calling that out
19:42:16 (but that means fixing the clustering problem, account management, and a host of other challenges)
19:42:41 As a general heads up, my availability over the next week may be spotty. I'm going to do my best to be around but not sure what my availability will be like.
19:42:46 fungi: yup, not an easy task.
19:43:47 have upstream fixed the clustering issues?
19:43:53 or not so much fixed, but added?
19:44:57 yes, I think the elasticsearch backend is there. Not sure if it will work with opensearch though
19:45:27 unfortunate that the gitea effort happened while elasticsearch became less open, but at least in theory we could run gitea with an opensearch cluster, a mariadb cluster, and a shared cephfs fs
19:45:57 interesting ... another one for the todo list :)
19:46:18 all that running together ... sounds like kubernetes might fit in ...
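Returning briefly to the zuul-client autohold problem mentioned under the Gerrit issues topic: the --ref workaround fungi described looks roughly like the following. Every value here is made up for illustration, and the option names assume the current zuul-client CLI; this is not the exact command being run:

    # hypothetical values throughout; 12345 is a placeholder change number
    # the --change form that is currently misbehaving:
    zuul-client autohold --tenant openstack --project opendev/system-config \
        --job some-gerrit-test-job --change 12345 \
        --reason "debugging gitweb commit link substitution" --count 1

    # the --ref workaround, matching any patchset of the same change
    # (the "45" shard directory is the last two digits of the change number):
    zuul-client autohold --tenant openstack --project opendev/system-config \
        --job some-gerrit-test-job --ref 'refs/changes/45/12345/.*' \
        --reason "debugging gitweb commit link substitution" --count 1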
19:46:32 that's what we tried to use the first time!
19:46:33 ya that was the original goal with gitea
19:46:52 but when we realized the indexes weren't distributed it wasn't tenable until that got fixed (and they addressed that with the elasticsearch backend option)
19:47:11 i think that kubernetes cluster might still exist even, but we'd almost certainly want to rebuild it from scratch if we take it in that direction
19:47:52 ++
19:47:57 if memory serves, we also ran into trouble with rook
19:48:05 figuring out how to manage a k8s cluster is probably step 0
19:48:06 but that's almost certainly improved in the meantime
19:48:30 yeah, managing ceph within kubernetes was a struggle back then
19:48:43 since there are many options, and every option we tried previously had its downsides (magnum didn't do upgrades via the api and you couldn't upgrade directly because there wasn't enough disk to grab two copies of the k8s images)
19:49:16 We don't need to solve that in this meeting though. But if people want to investigate that again, now might be a good time to start looking into it
19:50:51 Sounds like we may be winding down. I'll give it a couple more minutes for any last minute items and then call it a meeting
19:52:27 Thank you everyone!
19:52:33 #endmeeting