Tuesday, 2023-04-04

19:00 <clarkb> meeting time
19:01 <fungi> ahoy!
19:01 <clarkb> #startmeeting infra
19:01 <opendevmeet> Meeting started Tue Apr  4 19:01:27 2023 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01 <opendevmeet> The meeting name has been set to 'infra'
19:01 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/T2HXV6JPAKXGQREUBECMYYVGEQC2PYNY/ Our Agenda
19:02 <clarkb> #topic Announcements
19:02 <clarkb> I have no announcements other than PTG is over and the openstack release has happened. We should be clear to make changes regularly
19:03 <clarkb> #topic Topics
19:03 <clarkb> #topic Docker Hub team shutdown
19:03 <clarkb> Good news everyone! They have reverted the team shutdown and will no longer be making this change
19:03 <clarkb> That means our deadline 10 days from now is no longer present, but I think we are all in agreement we should move anyway
19:04 <clarkb> #link https://review.opendev.org/q/topic:tag-deletion Changes to handle publication to registries generically.
19:04 <corvus> they just took longer than the 24h we allocated for them to decide that ...
19:04 <clarkb> ya took them about a week
19:04 <clarkb> This stack of changes from ianw is the current set of work around being able to move to our generic container roles which we would point at quay
19:05 <clarkb> I need to re-review them but have had meetings all morning. I'll try to get to that soon
19:05 <fungi> it's not surprising how quickly people will abandon a platform when it's already burned most of their good will
19:06 <ianw> i think this gives us ultimate flexibility in the roles, which is pretty cool.  we can use a promote pipeline like we have now with tags; a little bit of work and we'll have the ability to upload from the intermediate registry
19:06 <ianw> both have different trade-offs which are documented in the changes
19:06 <clarkb> yup and I think zuul users may choose one or the other depending on their specific needs
19:06 <clarkb> for opendev I'd like us to try the intermediate registry approach first since that relies on no registry-specific features
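For illustration, a minimal sketch of the intermediate-registry promotion approach described here: copy the already-built image from the CI registry into quay with skopeo rather than relying on any registry-specific API. The registry hostname, namespace, and tag are examples, not the actual role implementation.

    # Hypothetical promote step: copy an image from the intermediate CI
    # registry to quay.io without rebuilding it. Hostnames/tags are examples.
    import subprocess

    SOURCE = "docker://insecure-ci-registry.example.org:5000/opendevorg/myimage:change_123456_latest"
    DEST = "docker://quay.io/opendevorg/myimage:latest"

    # skopeo copies the manifest and layers directly between registries;
    # --all keeps every architecture in a multi-arch manifest list.
    subprocess.run(["skopeo", "copy", "--all", SOURCE, DEST], check=True)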
19:07 <clarkb> (though we'd have the weird creation step either way for new images in quay)
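That "creation step" could look something like the sketch below, assuming Quay's v1 repository-creation endpoint and an application token with the create-repo scope; the namespace, image name, and token are placeholders.

    # Hedged sketch: pre-create a repository in quay.io before the first push.
    import requests

    resp = requests.post(
        "https://quay.io/api/v1/repository",
        headers={"Authorization": "Bearer EXAMPLE_TOKEN"},  # placeholder token
        json={
            "namespace": "opendevorg",      # example organization
            "repository": "myimage",        # example image name
            "visibility": "public",
            "description": "Published by OpenDev CI",
        },
        timeout=30,
    )
    resp.raise_for_status()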
19:07 <clarkb> anyway, reviews on that stack are the next step, and figuring out the intermediate registry promotion process is the step after that
19:07 <clarkb> Anything else to call out re docker hub?
19:08 <ianw> yep i personally would like to get that merged and try it with zuul-client, and then work on the promote from intermediate registry
19:08 <corvus> i plan on reviewing that today; fyi i'll be afk wed-fri this week.
19:08 <clarkb> sounds good I should be able to re-review today as well
19:08 <ianw> we can probably do similar and switch zuul-client to that as well, use it as a test.  because it pushes to real repos it is a bit hard to test 100% outside that
19:09 <corvus> i'm happy to use zuul-client as a test for both paths and can help with that
19:10 <clarkb> #topic Bastion Host Updates
19:10 <clarkb> I don't think there is really anything new here at this point? There was the launch env and rax rdns stuff but I've got that listed under booting new servers later
19:11 <ianw> no; i'm not sure the backup roles have really had full review, so i haven't done anything there
19:11 <clarkb> ack we've been plenty busy with other items
19:11 <clarkb> #topic Mailman 3
19:11 <fungi> with end of quarter, the openstack release and ptg finally in the rear view mirror i'm hoping to be able to resume work on this, but travelling for vacation all next week will likely delay things a little longer
19:11 <fungi> no new updates though
19:11 <clarkb> thanks
19:11 <fungi> we did have a brief related item
19:12 <fungi> i raised a question on the mm3-users ml
19:12 <fungi> #link https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/message/CYSMH4H2VC3P5JIOFJIPRJ32QKQNITJS/
19:12 <fungi> quick summary, there's a hyperkitty bug which currently prevents list owners from deleting posts
19:13 <fungi> if you log into hyperkitty with the admin creds from our private ansible hostvars, you can do it
19:14 <fungi> it's not intuitive, you basically need to go from the whole thread view to the single-message view before "delete this message" shows up as a button
19:14 <clarkb> but possible at least
19:14 <fungi> also the reply included a recommended alternative moderation workflow which may be worth consideration (basically moderate all users and then set them individually to unmoderated the first time they post something that isn't spam)
19:15 <clarkb> This is what mailman's lists use themselves if I remember what it was like going through the initial setup there
19:15 <fungi> it's similar to how we "patrol" edits on the wiki from new users
19:15 <fungi> well, i say "we" but i think it's probably just me patrolling the wiki these days
19:16 <clarkb> I think we can worry about that if the problem becomes more widespread
19:16 <clarkb> it's been one issue in a few months?
19:16 <fungi> yes, with mm3 having the option to post from the webui it seems likely that we may run into it more
19:16 <fungi> but i agree it's not critical for the moment
19:17 <fungi> also we could bulk-set all current subscribers to unmoderated on a list if we switched moderation workflow
19:17 <clarkb> oh that is good to know
19:17 <fungi> or i think they probably already will be, since the setting is that new subscribers are initially moderated
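As a rough sketch of the moderation workflow described above (hold everyone by default, then flip individual members to accept after their first good post), using the mailman3 REST API via mailmanclient; the REST URL, credentials, and list name are placeholders.

    from mailmanclient import Client

    # Credentials would come from the private Ansible hostvars in practice.
    client = Client("http://localhost:8001/3.1", "restadmin", "REST_PASSWORD")
    mlist = client.get_list("service-discuss@lists.opendev.org")

    # Hold posts from any member whose own moderation action is unset.
    settings = mlist.settings
    settings["default_member_action"] = "hold"
    settings.save()

    # Bulk-set existing subscribers to unmoderated so only new posters are held.
    for member in mlist.members:
        member.moderation_action = "accept"
        member.save()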
19:18 <fungi> anyway, just wanted to catch folks up on that development, i didn't have anything more on the topic
19:18 <clarkb> #topic Gerrit 3.7 Upgrade and Project Renames
19:18 <clarkb> #link https://etherpad.opendev.org/p/gerrit-upgrade-3.7
19:18 <clarkb> We plan to do a gerrit upgrade and project renames on April 6 starting at 22:00 UTC
19:18 <clarkb> There were a few items related to this I wanted to cover in the meeting to make sure we're happy with proceeding
19:19 <clarkb> First up: Do we have any strong opinions on renaming first or upgrading first? I think if we upgrade first the general process flow is easier with zuul and reindexing
19:19 <clarkb> ianw: I think your etherpad implies doing the upgrade first as well?
19:20 <clarkb> We have not done any project renames under gerrit 3.6 yet so I don't think we need to prefer doing renames first for this reason
19:20 <ianw> that was my thinking, seems like we can put everything in emergency and make sure manage-projects isn't running
19:20 <fungi> also it's easier to postpone the renames if things go sideways with the upgrade
19:21 <clarkb> ianw: ok keep in mind that the rename playbook requires things not be in emergency
19:21 <clarkb> but I think we can do the upgrade first, land the change to reflect that in configs, and have nothing in emergency then proceed with renames and be happy
19:22 <ianw> hrm, i'm not sure the current procedure reflects that
19:22 <clarkb> seems like upgrade then renames is the order of operations we are happy with.
19:22 <ianw> https://docs.opendev.org/opendev/system-config/latest/gerrit.html#renaming-a-project
19:22 <clarkb> ianw: I think it does. Step 19 on line 156 removes things from emergency before renaming
19:23 <clarkb> ianw: oh! we must not use !disabled in the rename playbook
19:23 <ianw> yeah, that's what i was thinking, but i didn't check
19:23 <clarkb> yup that's the case so what I said above is not correct
19:23 <clarkb> so I think we need to edit step 19 to keep things in emergency until we're completely done?
19:24 <ianw> so there's step "18.5" before 19 which is "step to rename procedure"
19:24 <ianw> so the idea was to do it with everything in emergency
19:24 <clarkb> sounds good
19:25 <ianw> https://opendev.org/opendev/system-config/src/branch/master/playbooks/rename_repos.yaml
19:25 <ianw> doesn't look at disabled, so i think the assumption is correct
19:26 <clarkb> Next up was comfort levels with landing three different rename changes after we are done and our in-order processing of those changes. This has the potential to recreate projects under their old names in gerrit and gitea
19:26 <clarkb> On the Gerrit side of things fungi came to the realization that our jeepyb cache file should prevent this from happening. I'm still not sure what prevents it on the gitea side
19:26 <clarkb> possibly because the redirects exist for those names gitea would fail to create the names?
19:27 <fungi> but also it's a warning to us not to invalidate jeepyb's cache during a rename ;)
19:27 <clarkb> I'm starting to lean towards squashing the rename changes together for simplicity; then we don't have to worry about it
19:28 <clarkb> but I wanted to see if we have a preference. I think if we stick to separate changes we would need to run down what the potential gitea behavior is and double check the jeepyb cache file has appropriate data in it for all involved projects
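A purely illustrative sketch of those two double-checks. How the gitea API treats a renamed project's old name, and the location and format of jeepyb's cache, are exactly the things still to be verified, so the backend URL, cache path, and JSON-keyed-by-name format below are all assumptions.

    import json
    import requests

    OLD_NAME = "namespace/old-project-name"          # placeholder
    CACHE_FILE = "/path/to/jeepyb/project.cache"     # hypothetical location
    GITEA = "https://gitea01.opendev.org:3081"       # example backend

    # 1) Is the old name still known to gitea (e.g. via a rename redirect)?
    resp = requests.get(f"{GITEA}/api/v1/repos/{OLD_NAME}", timeout=30)
    if resp.status_code == 404:
        print("old name is unclaimed; manage-projects could recreate it")
    else:
        print("old name resolves to", resp.json().get("full_name"))

    # 2) Is the old name recorded in jeepyb's cache so creation is skipped?
    with open(CACHE_FILE) as f:
        cache = json.load(f)
    print("present in jeepyb cache:", OLD_NAME in cache)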
19:28 <ianw> are we talking about the force-merge of the project-config changes?
19:28 <clarkb> ianw: yes
19:28 <clarkb> the other option is to leave everything in emergency until after those jobs completely finish then let hourly/daily jobs ensure we're in sync after landing the separate changes.
19:29 <ianw> step 7 of https://docs.opendev.org/opendev/system-config/latest/gerrit.html#renaming-a-project ?
19:29 <clarkb> So three options: 1) land as three changes and let things run normally if we understand why old project names won't be recreated 2) land as three changes but keep hosts in emergency preventing the jobs for the first two changes from running 3) squash into a single change and then the concerns go away
19:30 <clarkb> ianw: yes step 7
19:30 <ianw> is it three changes or two changes?
19:30 <clarkb> ianw: basically I think we've come to a bit of a realization that when we've landed more than one project-config rename change in the past we've gotten lucky that we haven't accidentally recreated projects
19:30 <clarkb> ianw: it's three now as of a couple of hours ago
19:30 <ianw> i've accounted for virtualpdu and xstatic-angular-something
19:31 <clarkb> ovn-bgp-agent is the latest
19:31 <ianw> oh right, ok, need to update for that
19:31 <fungi> yeah, they just added it today
19:31 <fungi> i tried to iterate with them on it quickly to give us time to work it in
19:32 <fungi> we likely need to decide on a cut-off for further additions
19:32 <fungi> which could be as soon as now, i suppose
19:32 <ianw> (that's fine, was just a bit confused :)
19:32 <clarkb> I think that maybe in the gitea case our checks for creating new projects may see the redirect and/or trying to create a project where a redirect exists is an error. But I don't know for sure and feel like I'm running out of time to check that. Maybe we should provisionally plan to do 2) or 3) and only do 1) if we manage to run down gitea and jeepyb cache state?
19:33 <ianw> i was working under the assumption of 2) ... basically force merge everything, and manage-projects only runs after all 3 are committed
19:33 <fungi> i'm good with any of those options
19:33 <clarkb> ianw: ok that would be option 2)
19:33 <ianw> but i guess the point is zuul is starting manage-projects and we're relying on it not working as things are in emergency right?
19:33 <clarkb> ianw: by default if we take things out of the emergency file manage-projects will run for each of them in order
19:34 <clarkb> yes exactly
19:34 <clarkb> but if we leave them all in emergency zuul can run the jobs and they will noop because they can't talk to the hosts.
19:34 <clarkb> I'm happy with that particularly if you were already planning to go with that
19:35 <ianw> yeah, i cargo-culted the checklist from the system-config docs, but it makes a bit more sense to me now.  i can update the checklist to be a bit more explicit
19:35 <clarkb> sounds good
19:35 <ianw> and i'll double check the manage-projects playbook to make sure things won't run
19:35 <clarkb> next up is calling out that there is a third rename request. I should've brought that up first :)
19:36 <fungi> i suppose we could hack around it by exiting from emergency mode between the penultimate deploy failing and submitting the ultimate rename change
19:36 <ianw> heh :)  i'll also add that in today
19:36 <clarkb> I think we've covered that and we can ensure all our notes and records change is updated
19:36 <clarkb> oh since we've decided to keep the changes separate we may want to rebase them in order to address any merge conflicts
19:36 <ianw> fungi: that could work
19:37 <ianw> clarkb: i can do that and make sure they stack
19:37 <clarkb> thanks!
19:37 <clarkb> And the last question was whether or not the revert path for 3.7 -> 3.6 has been tested
19:37 <clarkb> I think you tested this recently based on the plugins checking you did yesterday?
19:37 <ianw> yep, https://23.253.56.187/ is a reverted host
19:38 <ianw> basically some manual git fiddling to revert the index metadata, and then init and reindex everything
19:38 <ianw> as noted, that puts some initially worrying things into the logs that make you think you've got the wrong plugins
19:39 <ianw> but we decided that what's really happening is that if you've got multiple tags in a plugin, bazel must be choosing the highest one
19:39 <ianw> to stamp in as a version
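For reference, a minimal sketch of the "init and reindex everything" part of the revert, assuming the usual layout with the site and war file under /var/gerrit; the manual git fiddling to roll back the index metadata is not shown.

    import subprocess

    GERRIT_WAR = "/var/gerrit/bin/gerrit.war"   # assumed location
    SITE = "/var/gerrit"

    # Re-run init non-interactively, then rebuild all indexes offline.
    for args in (["init", "-d", SITE, "--batch", "--no-auto-start"],
                 ["reindex", "-d", SITE]):
        subprocess.run(["java", "-jar", GERRIT_WAR] + args, check=True)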
19:39 <clarkb> great I think from a process and planning perspective this is coming together. Really just need to update our notes and records change
19:39 <clarkb> and get the changes rebased so they can merge cleanly
19:39 <ianw> ++ will do
19:40 <clarkb> Oh I will note I did some due diligence around the xstatic rename to ensure we aren't hijacking a project like moin did with a different xstatic package than ours
19:40 <ianw> i'll add a note on the rebase too, so that when we copy the checklist for next time we have that
19:40 <clarkb> and there was a discussion on a github issue with them that basically boiled down to: there are packages moin cares about and there are packages horizon cares about, splitting them to make that clear is happening, and this is part of that
19:40 <clarkb> all that to say I think we are good to rename the xstatic repo
19:41 <clarkb> I'll try to take a look over everything again tomorrow too
19:41 <clarkb> and I think that may be it? Thank you everyone for helping put this together. Gerrit stuff is always fun :)
19:41 <ianw> this has been a big one!
19:42 <clarkb> #topic Upgrading Old Servers
19:42 <clarkb> #link https://etherpad.opendev.org/p/opendev-bionic-server-upgrades Notes
19:42 <clarkb> ianw: the nameserver changes have landed for the most part
19:42 <clarkb> #link https://etherpad.opendev.org/p/2023-opendev-dns
19:43 <clarkb> I suspect docker/etc/etc have made this go slowly (I ran into similar issues)
19:43 <clarkb> But is there anything to do on this next? maybe reconvene next week after gerrit is done?
19:43 <ianw> yeah i haven't got back to that, sorry.  i just need some clear air to think about it :)
19:43 <clarkb> understood
19:43 <ianw> but yeah, still very high on the todo list
19:43 <clarkb> Yesterday I picked up my todo list here and launched a replacement static server and a replacement etherpad server
19:43 <clarkb> #link https://review.opendev.org/q/topic:add-static02 static and etherpad replacements
19:44 <clarkb> reviews on these changes are very much appreciated. I think everything should be good to go except for reverse dns records for these two servers. Neither does email so that isn't critical and fixing that is in progress
19:44 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/879388 Fix rax rdns
19:45 <clarkb> The updated scripting to do rdns automatically was tied into launch node and this change fixes that. I think once this change lands we can run the rdns command for the new servers to have it update the records. Worst case I can do it through the web ui
19:45 <clarkb> I also discovered in this process that the launch env openstack client could not list rax volumes (though it could attach them)
19:45 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/879387 Reinstall launch env periodically
19:46 <clarkb> this change should address that as I think we decided that we didn't reinstall the env after we fixed the dep versions for listing volumes in rax?
19:46 <clarkb> fungi: ianw: ^ I'm still behind on that one if you have anything to add
19:46 <ianw> i think that's it; fungi did confirm a fresh venv seemed to work
19:46 <ianw> the other thing we can do is just rm it and start it again
19:46 <fungi> i think we can just move /user/launcher-venv aside and let deploy recreate it in place as a test
19:46 <fungi> or just merge the change
19:47 <clarkb> I need to review the change before I decide on that. But I suspect just merging the change is fine since we don't rely on this launch env for regular processing
19:47 <ianw> i'm happy to monitor it and make sure it's doing what i think it's going to do :)
19:48 <clarkb> And ya reviews on the new static server in particular would be good. I stacked things due to the dns updates and etherpad is more involved
19:48 <clarkb> #topic AFS volume quotas
19:48 <ianw> lgtm and i'll review the etherpad group stuff after a tea :)
19:48 <clarkb> thanks!
19:49 <clarkb> I don't have much new here other than to point out that the utilization is slowly climbing up over time :)
19:49 <clarkb> some of that may be all of the openstack release artifacts too so not just mirror volumes
19:49 <fungi> but seems like we might free up some space with the wheel cache cleanup too
19:49 <clarkb> But let's keep an eye on it so that we can intervene before it becomes an emergency
19:49 <clarkb> ++
19:50 <clarkb> #topic Gitea 1.19
19:50 <clarkb> moving along now as we are running out of time
19:50 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/877541 Upgrade opendev.org to 1.19.0
19:50 <clarkb> There is no 1.19.1 yet last I checked and in the past we have waited for the next bugfix release to land before upgrading gitea. I'm not in a hurry for this reason. But I think if we review the change now it is unlikely there will be a big delta when .1 arrives. Also if we like we can update to 1.19.0 before .1 releases
19:51 <ianw> maybe first thing after gerrit upgrade?  just in case?
19:51 <clarkb> The major update in this release is gitea actions, which is experimental and which we disable
19:51 <clarkb> ++
19:52 <clarkb> #topic Quo vadis Storyboard
19:52 <clarkb> As predicted a number of projects decided to move off of storyboard at the PTG. We may be asked to mark the projects read only in storyboard once they have moved
19:53 <clarkb> It does feel like the individual projects could do a bit more to work together to create a more consistent process but I don't think we can force them to do that or mandate anything. Just continue to encourage them to talk to one another I guess
19:53 <fungi> i have a stack of projects i need to switch to inactive and update descriptions on now
19:53 <fungi> just haven't gotten to that yet
19:54 <clarkb> #topic Open Discussion
19:54 <clarkb> I actually meant to put the wheel mirror stuff on the agenda then spaced it.
19:54 <clarkb> But I think we can talk about that now if there was anything important to bring up that isn't already on the mailing list
19:54 <ianw> yeah, i'm still trying to come up with a concrete plan
19:55 <fungi> your preliminary exploration seems promising though
19:55 <ianw> but i think the first thing we should probably action is double-checking the audit tool is right, and cleaning up
19:55 <fungi> i need to find time to reply on the ml thread
19:55 <clarkb> at the very least I think the cleanup can probably be run before any long term plan is committed to
19:55 <clarkb> ianw: ++
19:55 <ianw> #link https://review.opendev.org/c/opendev/system-config/+/879239
19:55 <ianw> that has some links to the output, etc.
19:55 <fungi> yeah, i feel like we know what's safe to delete now, even if we haven't settled on what's safe to stop building yet
19:56 <fungi> also, this is one of those times where vos backup may come in handy
19:56 <ianw> right, we can make sure we're not putting back anything we don't want (i.e. the prune is working)
19:56 <frickler> regarding afs, is there a way we could increase capacity further by adding more volumes or servers? or are we facing a hard limit there? just to know possible options
19:56 <ianw> and evaluate what clarkb was talking about, in that we don't really need to carry .whl's that don't build against libraries
19:57 <clarkb> frickler: yes we can add up to 14 cinder volumes of 1TB each to each server
19:57 <fungi> for some stuff it's pretty straightforward. if we decide to delete all pure python wheels on the premise that they're trivially rebuildable from sdist in jobs, we might want to vos backup so we can quickly switch the volume state back if our assumptions turn out to be incorrect
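A quick sketch of that safety net: refresh the .backup clone of each wheel volume before pruning, so the pre-cleanup state can be restored if the assumptions about pure python wheels turn out to be wrong. The volume names are examples and the command assumes localauth on a fileserver.

    import subprocess

    WHEEL_VOLUMES = ["mirror.wheel.focalx64", "mirror.wheel.jammyx64"]  # examples

    for vol in WHEEL_VOLUMES:
        # "vos backup" creates or refreshes the volume's .backup snapshot.
        subprocess.run(["vos", "backup", vol, "-localauth"], check=True)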
19:57 <clarkb> frickler: then we add them to the vicepa pool or add a vicepb or something. I would have to look at the existing server setup to see how the existing 3TB is organized
19:57 <corvus> and no known limit to the number of servers
19:57 <ianw> it's all in vicepa
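If we did need to grow capacity, a sketch of the sort of steps involved, wrapping the CLI commands; the server, volume, device, and VG/LV names (and the ext4 assumption) are guesses about the current layout rather than a documented procedure.

    import subprocess

    def run(cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # Attach another 1TB cinder volume to the fileserver (from the launch env).
    run(["openstack", "volume", "create", "--size", "1024", "afs01-dfw-vicepa14"])
    run(["openstack", "server", "add", "volume",
         "afs01.dfw.openstack.org", "afs01-dfw-vicepa14"])

    # Then, on the fileserver, grow the LVM pool and filesystem behind /vicepa.
    for cmd in (["pvcreate", "/dev/xvdo"],
                ["vgextend", "main", "/dev/xvdo"],
                ["lvextend", "-l", "+100%FREE", "/dev/main/vicepa"],
                ["resize2fs", "/dev/main/vicepa"]):
        run(cmd)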
19:58 <fungi> yeah, we discussed previously that adding more servers probably helps distribute the points of failure a bit, vs the situation we ended up in with the static file server that had 14 attached volumes and fell over if you sneezed
19:58 <ianw> i think we can definitely grow if we need to, but "more storage == more problems" :)
19:58 <ianw> heh, yeah
19:58 <clarkb> yup and we do have three servers, only one of which is under the massive pressure. We might want to double check if we can drop content from that server or rebalance somehow
19:59 <clarkb> (though I suspect with only three servers the number of organizations is small)
19:59 <clarkb> we definitely have options though which is a good thing
19:59 <fungi> i would want to add a server if we're rebalancing, because we probably need to be able to continue functioning when one dies
20:00 <clarkb> and we are officially at time
20:00 <clarkb> Thank you everyone!
20:00 <ianw> anyway, i'm happy to run the wheel cleanup, but yeah, after someone else has looked at it :)
20:00 <clarkb> #endmeeting
20:00 <opendevmeet> Meeting ended Tue Apr  4 20:00:29 2023 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
20:00 <opendevmeet> Minutes:        https://meetings.opendev.org/meetings/infra/2023/infra.2023-04-04-19.01.html
20:00 <opendevmeet> Minutes (text): https://meetings.opendev.org/meetings/infra/2023/infra.2023-04-04-19.01.txt
20:00 <opendevmeet> Log:            https://meetings.opendev.org/meetings/infra/2023/infra.2023-04-04-19.01.log.html
20:00 <fungi> thanks!
20:01 * clarkb hunts down lunch. I'll dig into the various reviews afterwards
20:01 <clarkb> actually let me do the container one now since it should go quickly
