clarkb | meeting time | 19:00 |
---|---|---|
fungi | ahoy! | 19:01 |
clarkb | #startmeeting infra | 19:01 |
opendevmeet | Meeting started Tue Apr 4 19:01:27 2023 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
opendevmeet | The meeting name has been set to 'infra' | 19:01 |
clarkb | #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/T2HXV6JPAKXGQREUBECMYYVGEQC2PYNY/ Our Agenda | 19:01 |
clarkb | #topic Announcements | 19:02 |
clarkb | I have no announcements other than PTG is over and openstack release has happened. We should be clear to make changes regularly | 19:02 |
clarkb | #topic Topics | 19:03 |
clarkb | #topic Docker Hub team shutdown | 19:03 |
clarkb | Good news everyone! They have reverted the team shutdown and will no longer be making this change | 19:03 |
clarkb | That means our deadline 10 days from now is no longer present, but I think we are all in agreement we should move anyway | 19:03 |
clarkb | #link https://review.opendev.org/q/topic:tag-deletion Changes to handle publication to registries generically. | 19:04 |
corvus | they just took longer than the 24h we allocated for them to decide that ... | 19:04 |
clarkb | ya took them about a week | 19:04 |
clarkb | This stack of changes from ianw is the current set of work around being able to move to our generic container roles which we would point at quay | 19:04 |
clarkb | I need to rereview them but have had meetings all morning. I'll try to get to that soon | 19:05 |
fungi | it's not surprising how quickly people will abandon a platform when it's already burned most of their good will | 19:05 |
ianw | i think this gives us ultimate flexibility in the roles, which is pretty cool. we can use a promote pipeline like we have now with tags; a little bit of work and we'll have the ability to upload from intermediate registry | 19:06 |
ianw | both have different trade-offs which are documented in the changes | 19:06 |
clarkb | yup and I think zuul users may choose one or the other depending on their specific needs | 19:06 |
clarkb | for opendev I'd like us to try the intermediate registry approach first since that relies on no registry specific features | 19:06 |
clarkb | (though we'd have the weird creation step either way for new images in quay) | 19:07 |
clarkb | anyway reviews on that stack are the next step and figuring out the intermediate registry promotion process the step after that | 19:07 |
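(As a rough illustration of the intermediate-registry approach mentioned above: promotion then amounts to copying the already-built image from the buildset/intermediate registry to the final registry, which is why it needs no registry-specific tag APIs. The sketch below is not the zuul-jobs role implementation under review; the registry hosts, image names, and TLS flag are placeholders/assumptions.)

```python
# Hypothetical sketch of an intermediate-registry promote step: copy the image
# that was pushed to the buildset/intermediate registry into the final registry.
# Registry hosts and repository names below are made up for illustration.
import subprocess

INTERMEDIATE = "intermediate-registry.example.org:5000/opendev/gitea:change-123456-latest"
FINAL = "quay.io/opendevorg/gitea:latest"

def promote(src: str, dst: str) -> None:
    # skopeo copies images between registries without a local docker daemon;
    # --src-tls-verify=false only matters for a self-signed intermediate registry.
    subprocess.run(
        ["skopeo", "copy", "--src-tls-verify=false",
         f"docker://{src}", f"docker://{dst}"],
        check=True,
    )

if __name__ == "__main__":
    promote(INTERMEDIATE, FINAL)
```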
clarkb | Anything else to call out re docker hub? | 19:07 |
ianw | yep i personally would like to get that merged and try it with zuul-client, and then work on the promote from intermediate registry | 19:08 |
corvus | i plan on reviewing that today; fyi i'll be afk wed-fri this week. | 19:08 |
clarkb | sounds good, I should be able to rereview today as well | 19:08 |
ianw | we can probably do similar and switch zuul-client to that as well, use it as a test. because it pushes to real repos it is a bit hard to test 100% outside that | 19:08 |
corvus | i'm happy to use zuul-client as a test for both paths and can help with that | 19:09 |
clarkb | #topic Bastion Host Updates | 19:10 |
clarkb | I don't think there is really anything new here at this point? There was the launch env and rax rdns stuff but I've got that listed under booting new servers later | 19:10 |
ianw | no; i'm not sure the backup roles have really had full review, so i haven't done anything there | 19:11 |
clarkb | ack we've been plenty busy with other items | 19:11 |
clarkb | #topic Mailman 3 | 19:11 |
fungi | with end of quarter, openstack release and ptg finally in the rear view mirror i'm hoping to be able to resume work on this, but travelling for vacation all next week will likely delay things a little longer | 19:11 |
fungi | no new updates though | 19:11 |
clarkb | thanks | 19:11 |
fungi | we did have a brief related item | 19:11 |
fungi | i raised a question on the mm3-users ml | 19:12 |
fungi | #link https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/message/CYSMH4H2VC3P5JIOFJIPRJ32QKQNITJS/ | 19:12 |
fungi | quick summary, there's a hyperkitty bug which currently prevents list owners from deleting posts | 19:12 |
fungi | if you log into hyperkitty with the admin creds from our private ansible hostvars, you can do it | 19:13 |
fungi | it's not intuitive, you basically need to go from the whole thread view to the single-message view before "delete this message" shows up as a button | 19:14 |
clarkb | but possible at least | 19:14 |
fungi | also the reply included a recommended alternative moderation workflow which may be worth consideration (basically moderate all users and then set them individually to unmoderated the first time they post something that isn't spam) | 19:14 |
clarkb | This is what mailman's own lists use, if I remember right from going through the initial setup there | 19:15 |
fungi | it's similar to how we "patrol" edits on the wiki from new users | 19:15 |
fungi | well, i say "we" but i think it's probably just me patrolling the wiki these days | 19:15 |
clarkb | I think we worry about that if the problem becomes more widespread | 19:16 |
clarkb | it's been one issue in a few months? | 19:16 |
fungi | yes, with mm3 having the option to post from the webui it seems likely that we may run into it more | 19:16 |
fungi | but i agree it's not critical for the moment | 19:16 |
fungi | also we could bulk-set all current subscribers to unmoderated on a list if we switched moderation workflow | 19:17 |
clarkb | oh that is good to know | 19:17 |
fungi | or i think they probably already will be, since the setting is that new subscribers are initially moderated | 19:17 |
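(For context on that suggestion: bulk-flipping existing subscribers to an unmoderated action could look roughly like the sketch below using mailmanclient, the Mailman 3 REST client. The REST URL, credentials, and list name are placeholders, and the exact attribute handling is an assumption about this workflow rather than a tested recipe.)

```python
# Rough sketch: set every current member of a list to an unmoderated action
# via the Mailman 3 REST API. URL, credentials, and list name are placeholders.
from mailmanclient import Client

client = Client("http://localhost:8001/3.1", "restadmin", "restpass")
mlist = client.get_list("service-discuss@lists.opendev.org")

for member in mlist.members:
    # "defer" means fall through to the normal (unmoderated) posting chain;
    # the list's default action for brand-new members would stay "moderate".
    member.moderation_action = "defer"
    member.save()
```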
fungi | anyway, just wanted to catch folks up on that development, i didn't have anything more on the topic | 19:18 |
clarkb | #topic Gerrit 3.7 Upgrade and Project Renames | 19:18 |
clarkb | #link https://etherpad.opendev.org/p/gerrit-upgrade-3.7 | 19:18 |
clarkb | We plan to do a gerrit upgrade and project renames on April 6 starting at 22:00 UTC | 19:18 |
clarkb | There were a few items related to this I wanted to cover in the meeting to make sure we're happy with proceeding | 19:18 |
clarkb | First up: Do we have any strong opinions on renaming first or upgrading first? I think if we upgrade first the general process flow is easier with zuul and reindexing | 19:19 |
clarkb | ianw: I think your etherpad implies doing the upgrade first as well? | 19:19 |
clarkb | We have not done any project renames under gerrit 3.6 yet so I don't think we need to prefer doing renames first for this reason | 19:20 |
ianw | that was my thinking, seems like we can put everything in emergency and make sure manage-projects isn't running | 19:20 |
fungi | also it's easier to postpone the renames if things go sideways with the upgrade | 19:20 |
clarkb | ianw: ok keep in mind that the rename playbook requires things not be in emergency | 19:21 |
clarkb | but I think we can do the upgrade first, land the change to reflect that in configs, and have nothing in emergency then proceed with renames and be happy | 19:21 |
ianw | hrm, i'm not sure the current procedure reflects that | 19:22 |
clarkb | seems like upgrade then renames is the order of operations we are happy with. | 19:22 |
ianw | https://docs.opendev.org/opendev/system-config/latest/gerrit.html#renaming-a-project | 19:22 |
clarkb | ianw: I think it does. Step 19 on line 156 removes things from emergency before renaming | 19:22 |
clarkb | ianw: oh! we must not use !disabled in the rename playbook | 19:23 |
ianw | yeah, that's what i was thinking, but i didn't check | 19:23 |
clarkb | yup that's the case so what I said above is not correct | 19:23 |
clarkb | so I think we need to edit step 19 to keep things in emergency until we're completely done? | 19:23 |
ianw | so there's step "18.5" before 19 which is "step to rename procedure" | 19:24 |
ianw | so the idea was to do it with everything in emergency | 19:24 |
clarkb | sounds good | 19:24 |
ianw | https://opendev.org/opendev/system-config/src/branch/master/playbooks/rename_repos.yaml | 19:25 |
ianw | doesn't look at disabled, so i think the assumption is correct | 19:25 |
clarkb | Next up was comfort levels with landing three different rename changes after we are done and our in-order processing of those changes. This has the potential to recreate projects under their old names in gerrit and gitea | 19:26 |
clarkb | On the Gerrit side of things fungi came to the realization that our jeepyb cache file should prevent this from happening. I'm still not sure what prevents it on the gitea side | 19:26 |
clarkb | possibly because the redirects exist for those names gitea would fail to create the names? | 19:26 |
fungi | but also it's a warning to us not to invalidate jeepyb's cache during a rename ;) | 19:27 |
clarkb | I'm starting to lean towards squashing the rename changes together for simplicity though then we don't have to worry about it | 19:27 |
clarkb | but I wanted to see if we have a preference. I think if we stick to separate changes we would need to run down what the potential gitea behavior is and double check the jeepyb cache file has appropriate data in it for all involved projects | 19:28 |
ianw | are we talking about the force-merge of the project-config changes? | 19:28 |
clarkb | ianw: yes | 19:28 |
clarkb | the other option is to leave everything in emergency until after those jobs completely finish then let hourly/daily jobs ensure we're in sync after landing the separate changes. | 19:28 |
ianw | step 7 of https://docs.opendev.org/opendev/system-config/latest/gerrit.html#renaming-a-project ? | 19:29 |
clarkb | So three options: 1) land as three changes and let things run normally if we understand why old project names won't be recreated 2) land as three changes but keep hosts in emergency preventing the jobs for first two changes from running 3) squash into single change and then concerns go away | 19:29 |
clarkb | ianw: yes step 7 | 19:30 |
ianw | is it three changes or two changes? | 19:30 |
clarkb | ianw: basically I think we've come to a bit of a realization that if we land more than one project-config rename change that we've gotten lucky in the past that we haven't accidentally recreated projects | 19:30 |
clarkb | ianw: it's three now as of a couple of hours ago | 19:30 |
ianw | i've accounted for virtualpdu and xstatic-angular-something | 19:30 |
clarkb | ovn-bgp-agent is the latest | 19:31 |
ianw | oh right, ok, need to update for that | 19:31 |
fungi | yeah, they just added it today | 19:31 |
fungi | i tried to iterate with them on it quickly to give us time to work it in | 19:31 |
fungi | we likely need to decide on a cut-off for further additions | 19:32 |
fungi | which could be as soon as now, i suppose | 19:32 |
ianw | (that's fine, was just a bit confused :) | 19:32 |
clarkb | I think that in the gitea case our checks for creating new projects may see the redirect, and/or trying to create a project where a redirect exists is an error. But I don't know for sure and feel like I'm running out of time to check that. Maybe we should provisionally plan to do 2) or 3) and only do 1) if we manage to run down gitea and jeepyb cache state? | 19:32 |
ianw | i was working under the assumption of 2) ... basically force merge everything, and manage-projects only runs after all 3 are committed | 19:33 |
fungi | i'm good with any of those options | 19:33 |
clarkb | ianw: ok that would be option 2) | 19:33 |
ianw | but i guess the point is zuul is starting manage-projects and we're relying on it not working as things are in emergency right? | 19:33 |
clarkb | ianw: by default if we take things out of the emergency file manage-projects will run for each of them in order | 19:33 |
clarkb | yes exactly | 19:34 |
clarkb | but if we leave them all in emergency zuul can run the jobs and they will noop because they can't talk to the hosts. | 19:34 |
clarkb | I'm happy with that particularly if you were already planning to go with that | 19:34 |
ianw | yeah, i cargo-culted the checklist from the system-config docs, but it makes a bit more sense to me now. i can update the checklist to be a bit more explicit | 19:35 |
clarkb | sounds good | 19:35 |
ianw | and i'll double check the manage-projects playbook to make sure things won't run | 19:35 |
clarkb | next up is calling out there is a third rename request. I should've brought that up first :) | 19:35 |
fungi | i suppose we could hack around it by exiting from emergency mode between the penultimate deploy failing and submitting the ultimate rename change | 19:36 |
ianw | heh :) i'll also add that in today | 19:36 |
clarkb | I think we've covered that and we can ensure all our notes and records change is updated | 19:36 |
clarkb | oh since we've decided to keep the changes separate we may want to rebase them in order to address any merge conflicts | 19:36 |
ianw | fungi: that could work | 19:36 |
ianw | clarkb: i can do that and make sure they stack | 19:37 |
clarkb | thanks! | 19:37 |
clarkb | And the last question was whether or not the revert path for 3.7 -> 3.6 has been tested | 19:37 |
clarkb | I think you tested this recently based on the plugins checking you did yesterday? | 19:37 |
ianw | yep, https://23.253.56.187/ is a reverted host | 19:37 |
ianw | basically some manual git fiddling to revert the index metadata, and then init and reindex everything | 19:38 |
ianw | as noted, that puts some initially worrying things into the logs that make you think you've got the wrong plugins | 19:38 |
ianw | but we decided that what's really happening is that if you've got multiple tags in a plugin, bazel must be choosing the highest one | 19:39 |
ianw | to stamp in as a version | 19:39 |
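(For the curious, the shape of that revert is roughly: revert the schema/index version metadata in the site's git data, then re-run init and a full offline reindex with the older war. The sketch below is hedged; the site and war paths are invented for illustration and this is not the recorded runbook from the test host.)

```python
# Very rough shape of a 3.7 -> 3.6 revert on a test host: after reverting the
# version metadata in the site's git data (the "manual git fiddling" above),
# re-run init and a full offline reindex with the older war.
# Paths are hypothetical placeholders.
import subprocess

SITE = "/home/gerrit2/review_site"    # hypothetical Gerrit site directory
WAR = "/home/gerrit2/gerrit-3.6.war"  # hypothetical 3.6 war

for args in (
    ["java", "-jar", WAR, "init", "-d", SITE, "--batch", "--no-auto-start"],
    ["java", "-jar", WAR, "reindex", "-d", SITE],
):
    subprocess.run(args, check=True)
```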
clarkb | great I think from a process and planning perspective this is coming together. Really just need to update our notes and records change | 19:39 |
clarkb | and get the changes rebased so they can merge cleanly | 19:39 |
ianw | ++ will do | 19:39 |
clarkb | Oh I will note I did some due diligence around the xstatic rename to ensure we aren't hijacking a project like moin did with a different xstatic package to us | 19:40 |
ianw | i'll add a note on the rebase too, so that when we copy the checklist for next time we have that | 19:40 |
clarkb | and there was a discussion on a github issue with them that basically boiled down to: there are packages moin cares about and packages horizon cares about, splitting them to make that clear is happening, and this is part of that | 19:40 |
clarkb | all that to say I think we are good to rename the xstatic repo | 19:40 |
clarkb | I'll try to take a look over everything again tomorrow too | 19:41 |
clarkb | and I think that may be it? Thank you everyone for helping put this together. Gerrit stuff is always fun :) | 19:41 |
ianw | this has been a big one! | 19:41 |
clarkb | #topic Upgrading Old Servers | 19:42 |
clarkb | #link https://etherpad.opendev.org/p/opendev-bionic-server-upgrades Notes | 19:42 |
clarkb | ianw: the nameserver changes have landed for the most part | 19:42 |
clarkb | #link https://etherpad.opendev.org/p/2023-opendev-dns | 19:42 |
clarkb | I suspect docker/etc/etc have made this go slowly (I ran into similar issues) | 19:43 |
clarkb | But is there anything to do on this next? maybe reconvene next week after gerrit is done? | 19:43 |
ianw | yeah i haven't got back to that, sorry. i just need some clear air to think about it :) | 19:43 |
clarkb | understood | 19:43 |
ianw | but yeah, still very high on the todo list | 19:43 |
clarkb | Yesterday I picked up my todo list here and launched a replacement static server and a replacement etherpad server | 19:43 |
clarkb | #link https://review.opendev.org/q/topic:add-static02 static and etherpad replacements | 19:43 |
clarkb | reviews on these changes are very much appreciated. I think everything should be good to go except for reverse dns records for these two servers. Neither does email so that isn't critical and fixing that is in progress | 19:44 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/879388 Fix rax rdns | 19:44 |
clarkb | The updated scripting to do rdns automatically was tied into launch node and this change fixes that. I think once this change lands we can run the rdns command for the new servers to have it update the records. Worst case I can do it through the web ui | 19:45 |
clarkb | I also discovered in this process that the launch env openstack client could not list rax volumes (though it could attach them) | 19:45 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/879387 Reinstall launch env periodically | 19:45 |
clarkb | this change should address that as I think we decided that we didn't reinstall the env after we fixed the dep versions for listing volumes in rax? | 19:46 |
clarkb | fungi: ianw: ^ I'm still behind on that one if you have anything to add | 19:46 |
ianw | i think that's it; fungi did confirm a fresh venv seemed to work | 19:46 |
ianw | the other thing we can do is just rm it and start it again | 19:46 |
fungi | i think we can just move /user/launcher-venv aside and let deploy recreate it in place as a test | 19:46 |
fungi | or just merge the change | 19:46 |
clarkb | I need to review the change before I decide on that. But I suspect just merging the change is fine since we don't rely on this launch env for regular processing | 19:47 |
ianw | i'm happy to monitor it and make sure it's doing what i think it's going to do :) | 19:47 |
clarkb | And ya reviews on the new static server in particular would be good. I stacked things due to the dns updates and etherpad is more involved | 19:48 |
clarkb | #topic AFS volume quotas | 19:48 |
ianw | lgtm and i'll review the etherpad group stuff after a tea :) | 19:48 |
clarkb | thanks! | 19:48 |
clarkb | I don't have much new here other than to point out that the utilization is slowly climbing up over time :) | 19:49 |
clarkb | some of that may be all of the openstack release artifacts too so not just mirror volumes | 19:49 |
fungi | but seems like we might free up some space with the wheel cache cleanup too | 19:49 |
clarkb | But let's keep an eye on it so that we can intervene before it becomes an emergency | 19:49 |
clarkb | ++ | 19:49 |
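(A minimal sketch of one low-tech way to watch utilization, for anyone following along: poll volume quotas with the OpenAFS fs tool. The paths are examples only and the parsing assumes the usual listquota column layout with a numeric quota, not "no limit".)

```python
# Hedged sketch: report AFS volume usage by parsing `fs listquota` output.
# Paths are examples; quota and usage figures are in 1K blocks.
import subprocess

PATHS = [
    "/afs/openstack.org/mirror/ubuntu",
    "/afs/openstack.org/mirror/wheel/ubuntu-22.04-x86_64",
]

for path in PATHS:
    out = subprocess.run(["fs", "listquota", path],
                         capture_output=True, text=True, check=True).stdout
    # second line looks like: "<volume> <quota> <used> <%used> <partition%>"
    name, quota, used, *_ = out.splitlines()[1].split()
    pct = 100 * int(used) / int(quota)
    print(f"{name}: {used}/{quota} KB ({pct:.0f}%)")
```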
clarkb | #topic Gitea 1.19 | 19:50 |
clarkb | moving along now as we are running out of time | 19:50 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/877541 Upgrade opendev.org to 1.19.0 | 19:50 |
clarkb | There is no 1.19.1 yet last I checked and in the past we have waited for the next bugfix release to land before upgrading gitea. I'm not in a hurry for this reason. But I think if we review the change now it is unlikely there will be a big delta when .1 arrives. Also if we like we can update to 1.19.0 before .1 releases | 19:50 |
ianw | maybe first thing after gerrit upgrade? just in case? | 19:51 |
clarkb | The major update to this release is gitea actions which is experimental and we disable | 19:51 |
clarkb | ++ | 19:51 |
clarkb | #topic Quo vadis Storyboard | 19:52 |
clarkb | As predicted a number of projects decided to move off of storyboard at the PTG. We may be asked to mark the projects read only in storyboard once they have moved | 19:52 |
clarkb | It does feel like the individual projects could do a bit more to work together to create a more consistent process but I don't think we can force them to do that or mandate anything. Just continue to encourage them to talk to one another I guess | 19:53 |
fungi | i have a stack of projects i need to switch to inactive and update descriptions on now | 19:53 |
fungi | just haven't gotten to that yet | 19:53 |
clarkb | #topic Open Discussion | 19:54 |
clarkb | I actually meant to put the wheel mirror stuff on the agenda then spaced it. | 19:54 |
clarkb | But I think we can talk about that now if there was anything important to bring up that isn't already on the mailing list | 19:54 |
ianw | yeah, i'm still trying to come up with a concrete plan | 19:54 |
fungi | your preliminary exploration seems promising though | 19:55 |
ianw | but i think the first thing we should probably action is double-checking the audit tool is right, and cleaning up | 19:55 |
fungi | i need to find time to reply on the ml thread | 19:55 |
clarkb | at the very least I think the cleanup can probably be run before any long term plan is committed to | 19:55 |
clarkb | ianw: ++ | 19:55 |
ianw | #link https://review.opendev.org/c/opendev/system-config/+/879239 | 19:55 |
ianw | that has some links to the output, etc. | 19:55 |
fungi | yeah, i feel like we know what's safe to delete now, even if we haven't settled on what's safe to stop building yet | 19:55 |
fungi | also, this is one of those times where vos backup may come in handy | 19:56 |
ianw | right, we can make sure we're not putting back anything we don't want (i.e. the prune is working) | 19:56 |
frickler | regarding afs, is there a way we could increase capacity further by adding more volumes or servers? or are we facing a hard limit there? just to know possible options | 19:56 |
ianw | and evaluate what clarkb was talking about in that we don't need to carry .whl's that don't build against libraries really | 19:56 |
clarkb | frickler: yes we can add up to 14 cinder volumes of 1TB each to each server | 19:57 |
fungi | for some stuff it's pretty straightforward. if we decide to delete all pure python wheels on the premise that they're trivially rebuildable from sdist in jobs, we might want to vos backup so we can quickly switch the volume state back if our assumptions turn out to be incorrect | 19:57 |
clarkb | frickler: then we add them to the vicepa pool or add a vicepb or something. I would have to look at the existing server setup to see how the existing 3TB is organized | 19:57 |
corvus | and no known limit to the number of servers | 19:57 |
ianw | it's all in vicepa | 19:57 |
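(To make the growth option concrete: the cloud-side half of adding capacity is roughly "create another 1TB cinder volume and attach it to the fileserver", sketched below with openstacksdk. The cloud, server, and volume names are placeholders; growing the LVM pool behind /vicepa and rebalancing AFS volumes with vos are host-side steps not shown here.)

```python
# Hedged sketch of adding capacity to an AFS fileserver: create a 1TB cinder
# volume and attach it. Cloud/server/volume names and size are placeholders.
# Follow-up host-side work (pvcreate/vgextend on the new device, growing the
# filesystem under /vicepa, then vos move/addsite to rebalance) is not shown.
import openstack

conn = openstack.connect(cloud="openstackci-rax")        # placeholder cloud name
server = conn.get_server("afs01.dfw.openstack.org")      # placeholder server
volume = conn.create_volume(size=1024, wait=True,
                            name="afs01.dfw.openstack.org/vicepa-extra")
conn.attach_volume(server, volume, wait=True)
print("attached", volume.id)
```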
fungi | yeah, we discussed previously that adding more servers probably helps distribute the points of failure a bit, vs the situation we ended up in with the static file server that had 14 attached volumes and fell over if you sneezed | 19:58 |
ianw | i think we can definitely grow if we need to, but "more storage == more problems" :) | 19:58 |
ianw | heh, yeah | 19:58 |
clarkb | yup and we do have three servers only one of which is under massive pressure. We might want to double check if we can drop content from that server or rebalance somehow | 19:58 |
clarkb | (though I suspect with only three servers the number of organizations is small) | 19:59 |
clarkb | we definitely have options though which is a good thing | 19:59 |
fungi | i would want to add a server if we're rebalancing, because we probably need to be able to continue functioning when one dies | 19:59 |
clarkb | and we are officially at time | 20:00 |
clarkb | Thank you everyone! | 20:00 |
ianw | anyway, i'm happy to run the wheel cleanup, but yeah, after someone else has looked at it :) | 20:00 |
clarkb | #endmeeting | 20:00 |
opendevmeet | Meeting ended Tue Apr 4 20:00:29 2023 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 20:00 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2023/infra.2023-04-04-19.01.html | 20:00 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2023/infra.2023-04-04-19.01.txt | 20:00 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2023/infra.2023-04-04-19.01.log.html | 20:00 |
fungi | thanks! | 20:00 |
* clarkb | hunts down lunch. I'll dig into the various reviews afterwards | 20:01 |
clarkb | actually let me do the container one now since it should go quickly | 20:01 |