19:01:01 #startmeeting infra
19:01:01 Meeting started Tue Nov 15 19:01:01 2022 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:01 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:01 The meeting name has been set to 'infra'
19:01:07 o/
19:01:07 #link https://lists.opendev.org/pipermail/service-discuss/2022-November/000379.html Our Agenda
19:01:13 #topic Announcements
19:01:23 I had no announcements
19:01:28 #topic Topics
19:01:28 there are a couple
19:01:30 #undo
19:01:30 Removing item from minutes: #topic Topics
19:01:38 go for it
19:01:53 nominations for the openinfra foundation board of directors are open
19:02:06 and the cfp for the openinfra summit in vancouver is now open as well
19:02:43 #link https://lists.openinfra.dev/pipermail/foundation/2022-November/003104.html 2023 Open Infrastructure Foundation Individual Director nominations are open
19:02:58 #link https://lists.openinfra.dev/pipermail/foundation/2022-November/003105.html The CFP for the OpenInfra Summit 2023 is open
19:03:12 that's all i can think of though
19:03:44 #topic Bastion Host Updates
19:03:51 #link https://review.opendev.org/q/topic:prod-bastion-group
19:03:55 #link https://review.opendev.org/q/topic:bridge-ansible-venv
19:04:19 looks like a few changes have merged since we last discussed this. ianw anything urgent or otherwise not captured by the two change topics that we should look at?
19:04:37 One idea I had was maybe we should consolidate to a single topic for review even if there are distinct trees of change happening?
19:05:15 yeah i can clean up; i think prod-bastion-group is really now about being in a position to run parallel jobs
19:05:37 which is basically "setup source in one place, then fire off jobs"
19:06:29 ah so maybe another topic for "things we need to do before turning off the old server" ?
19:06:49 the bridge-ansible-venv; one i'll get back to is us storing the host keys for our servers and deploying to /etc/ssh
19:07:02 fungi had some good points on making that better, so that's wip, but i'll get to that soon
19:07:47 (the idea being that when we start a new bridge, i'm trying to make it so we have as few manual steps as possible :)
19:08:06 ++
19:08:09 so writing down the manual steps has been a good way to try and think of ways to codify them :)
19:08:21 it's a good approach, but gets weird if you don't include all ip addresses along with the hostnames, and we have that info in the inventory already
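
(A minimal sketch of the point above, for context: a system-wide /etc/ssh/ssh_known_hosts entry only matches when the name or address ssh is given appears on that line, so generating the file from the inventory means listing the hostname together with every IP address. The inventory layout and values below are illustrative assumptions, not the actual system-config format.)

    # Hypothetical sketch: render /etc/ssh/ssh_known_hosts entries from inventory
    # data. The inventory layout and host values here are assumptions for
    # illustration only.
    inventory = {
        "example01.opendev.org": {
            "addresses": ["198.51.100.10", "2001:db8::10"],
            "host_key": "ssh-ed25519 AAAAC3NzaC1...exampleonly",
        },
    }

    def known_hosts_lines(inv):
        lines = []
        for hostname, info in inv.items():
            # One entry listing the name and every address, so ssh matches it
            # whether it is invoked with the hostname or a literal IP.
            hosts = ",".join([hostname] + info["addresses"])
            lines.append(f"{hosts} {info['host_key']}")
        return lines

    print("\n".join(known_hosts_lines(inventory)))
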
19:08:33 the only other one is
19:08:34 #link https://review.opendev.org/c/opendev/system-config/+/861284
19:08:53 which converts our launch-node into a small package installed in a venv on the bridge node
19:09:07 and that should address our paramiko needs?
19:09:12 I'll have to take a look at that
19:09:27 i'm going to need to launch a new server for mm3 this week probably, so will try to give that change a closer look
19:10:42 Great, I'll have need of that too for our next topic
19:10:51 #topic Upgrading old Servers
19:10:52 yep, it fixes that issue, and i think is a path to help with openstacksdk versions too
19:10:52 if we need two venvs with different versions -- well that's not great, but at least possible
19:11:05 #link https://etherpad.opendev.org/p/opendev-bionic-server-upgrades
19:11:49 later this week I'm hoping to put a dent into some of these. I think we've basically sorted out all of the logistical challenges for doing jammy things once 861284 has landed
19:11:57 so ya I'll try to review that and continue to make progress here
19:12:33 I don't think there is much else to say about this now. We've hit the point where we just need to start doing the upgrades.
19:13:06 #topic Mailman 3
19:13:25 fungi has been pushing on this again. We ran into some problems with updated images that I think should be happier now
19:13:40 in preparing to do a final round of import tests, i got stuck on upstream updates to the container image tag we were tracking which slightly broke our setup automation
19:13:55 and fixing those in turn broke our forked images
19:14:06 which i think i have sorted (waiting for zuul to confirm)
19:14:17 the good news is that this is giving us a sneak preview of the sorts of changes that need to be made for mm3 updates
19:14:28 I wouldn't say the story there is great, but it should at least be doable
19:14:59 but pending successful repeat of a final round of migration testing, i hope to boot a future mm3 prod server later this week and send announcements for migration maintenance maybe?
19:15:00 the main issue being the way mailman deals with dependencies and needing specific versions of stuff
19:15:08 fungi: ++
19:15:13 yeah, it's dependency hell, more or less
19:16:00 anyway, might be worth looking at the calendar for some tentative dates
19:16:25 i don't expect we need to provide a ton of advance notice for lists.opendev.org and lists.zuul-ci.org, but a couple of weeks heads-up might be nice
19:16:39 which puts us in early december at the soonest
19:16:48 that seems reasonable
19:16:55 ++
19:17:24 based on my tests so far, importing those two sites plus dns cut-over is all doable inside of an hour
19:17:49 lists.openstack.org will need a few hours minimum, but i won't want to tackle that until early next year
19:18:19 should i be looking at trying to do the initial sites over a weekend, or do we think a friday is probably acceptable?
19:18:41 I think weekdays should be fine. Both lists are quite low traffic
19:18:54 thinking about friday december 2, or that weekend (3..4)
19:19:04 the second is bad for me, but I won't let that stop you
19:19:13 assuming things look good by the end of this week i can send a two-week warning on friday
19:20:13 could shoot for friday december 9 instead, if that works better for folks, but that's getting closer to holidays
19:20:39 i expect to be travelling at the end of december, so won't be around a computer as much
19:20:49 i think almost no notice is required and you should feel free to do it at your convenience
19:21:24 yeah, from a sending and receiving messages standpoint it should be essentially transparent
19:21:27 (low traffic lists + smtp queueing)
19:22:00 for people doing list moderation, the webui will be down, and when it comes back for them it will need a new login (which they'll be able to get into their accounts for via password recovery steps)
19:23:33 that's really the biggest impact. that and some unknowns around how dkim signatures on some folks' messages may stop validating if the new mailman alters posts in ways that v2 didn't
19:24:37 maybe we should have a test ml where people concerned about their mail setup could test this?
19:24:52 frickler: unfortunately that would require setting up an entirely new domain
19:25:12 I think if we didn't have to do so much on a per-domain basis this would be easier to test and transition
19:25:28 but starting with low traffic domains is a reasonable stand-in
19:25:48 anyway I agree. I don't think a ton of notice is necessary. But sending some notice is probably a good idea as there will be user facing changes
19:25:48 not upfront, but add something like test@lists.opendev.org
19:25:52 would require a new domain before the migration, but could be a test ml on a migrated domain later if people want to test through it without bothering legitimate lists with noise posts
19:25:57 frickler: oh I see, ya not a bad idea
19:26:17 we've had a test ml in the past, but dropped it during some domain shuffles in recent years
19:26:24 i have no objection to adding one
19:26:25 so that people from the large lists could test before those get moved
19:27:17 anything else on this topic or should we move on?
19:27:34 if we're talking fridays, the next possible friday would be november 25, but since the 24th is a holiday around here i'm probably not going to be around much on the 25th
19:27:59 fungi: maybe we should consider a monday instead? and do the 7th or 28th?
19:29:05 it's easier for me to help on a monday, but also i don't imagine i'd be much help ;)
19:29:34 mondays mean smaller tolerance for extra downtime and more people actively trying to use stuff
19:29:49 but fridays mean subtle problems may not get spotted for days
19:29:54 so it's hard to say which is worse
19:29:56 right, but as corvus points out the risk there is minimal
19:30:11 for openstack that might change, but for opendev.org and zuul-ci.org I think it is fine?
19:30:36 i have a call at 17:00 utc on the 28th but am basically open otherwise
19:30:51 i can give it a shot. later in the day also means ianw is on hand
19:31:14 or earlier in the day for frickler
19:31:18 and we can always reschedule if we get closer and decide that timing is bad
19:32:03 yeah, i guess we can revisit in next week's meeting for a go/no-go
19:32:09 sounds good
19:32:14 no need to burn more time on today's agenda
19:32:18 #topic Updating python base images
19:32:24 #link https://review.opendev.org/c/opendev/system-config/+/862152
19:32:41 This change has the reviews it needs. Assuming nothing else comes up tomorrow I plan to merge it then rebuild some of our images as well
19:33:02 At this point this is mostly a heads up that image churn will occur but it is expected to largely be a noop
19:33:37 #topic Etherpad container log growth
19:33:42 #link https://review.opendev.org/c/opendev/system-config/+/864060
19:34:10 This change has been approved. I half expected we'd need to manually restart the container to have it change behavior though
19:34:23 I'll try to check on it after lunch today and can manually restart it at that point if necessary
19:34:34 This should make the etherpad service far more reliable which is great
19:35:20 #topic Quo vadis Storyboard
19:35:26 #link https://lists.opendev.org/pipermail/service-discuss/2022-October/000370.html
19:35:51 I did send that update to the mailing list last week explicitly asking for users that would like to keep using storyboard so that we can plan accordingly
19:35:57 (and maybe convince them to help maintain it :) )
19:36:08 I have not seen any movement on that thread since then though
19:36:25 i've added a highlight of that topic to this week's foundation newsletter too, just to get some added visibility
19:36:45 It does look like the openstack TC is not interested in making any broad openstack wide decisions either. Which means it is unlikely we'll get openstack pushing in any single direction.
19:37:10 I think we should keep the discussion open for another week then consider feedback collected and use that to make decisions on what we should be doing
19:37:59 Any other thoughts or concerns about storyboard?
19:38:23 i had a user request in #storyboard this morning, but it was fairly easily resolved
19:38:31 duplicate account, colliding e-mail address
19:38:50 deactivated the old account and removed the e-mail address from it, then account autocreation for the new openid worked
19:40:11 #topic Vexxhost server rescue behavior
19:40:18 I did more testing of this and learned a bit
19:40:33 For normally launched instances resolving the disk label collision does fix things.
19:41:00 For BFV instances melwitt pointed me at the tempest testing for bfv server rescues and in that testing they set very specific image disk type and bus options
19:41:32 I suspect that we need an image to be created with those properties that are properly set for the volume setup in vexxhost. Then theoretically this would work
19:42:06 In both cases I think that vexxhost should consider creating a dedicated rescue image. Possibly one for bfv and one for non bfv. But with labels set (or uuid used) and the appropriate flags
19:42:32 mnaser: ^ I don't think this is urgent, but it is also a nice feature to have. I'd be curious to know if you have any feedback on that as well
19:42:42 it sounds like bfv was something we previously needed but don't anymore; should we migrate to non-bfv?
19:43:11 corvus: that is probably worth considering as well. I did that with the newly deployed gitea load balancer
19:43:14 i suggest sticking to bfv, non-bfv means your data is sitting on local storage
19:43:36 risks are if the hv goes poof the data might be gone, so if it's a cattle then you're fine
19:43:45 but if it's a pet that might be a bit of a bad time
19:43:57 mnaser: I think for things like gitea and gerrit we would still mount a distinct data volume, but don't necessarily need the disk to be managed that way too. For the load balancer this is definitely a non issue
19:44:21 oh well in that case when you're factoring it in then you're good
19:44:26 yeah, we would just deploy a new load balancer in a matter of minutes, it has no persistent state whatsoever
19:45:04 but also I think these concerns are distinct. Server rescue should work particularly for a public cloud imo so that users can fix things up themselves.
19:45:13 then whether or not we boot bfv is something we should consider
19:45:46 definitely an interesting point to add to our launch node docs though? mostly things are "cattle" but with various degrees of how annoying it would be to restore i guess
19:46:02 In any case I just wanted to give an update on what I found. rescue can be made to work for non bfv instances on our end and possibly for bfv as well but I'm unsure what to set those image property values to for ceph volumes
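
(A hedged illustration of the image property idea above: nova's stable device rescue support consults the hw_rescue_device and hw_rescue_bus properties on the rescue image, and the tempest bfv rescue test sets them to disk/virtio. The openstacksdk calls below are a sketch only; the cloud, image, and server names are assumptions, and the right values for vexxhost's ceph-backed volumes are exactly the open question.)

    # Hedged sketch only: tag a rescue image with the properties nova's rescue
    # path looks at, then rescue a server with that image. The cloud, image,
    # and server names are assumptions, and the property values simply mirror
    # the tempest bfv rescue test; they may not be right for ceph volumes.
    import openstack

    conn = openstack.connect(cloud="vexxhost")

    image = conn.image.find_image("dedicated-rescue-image")  # hypothetical image
    conn.image.update_image(
        image,
        hw_rescue_device="disk",
        hw_rescue_bus="virtio",
    )

    server = conn.compute.find_server("example-server")  # hypothetical server
    conn.compute.rescue_server(server, image_ref=image.id)
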
19:46:58 #topic Replacing Twitter
19:47:09 we currently have a twitter account to post our status bot alerts to
19:47:34 frickler has put this topic up asking if we should consider a switch to fosstodon instead
19:47:51 yes, not urgent for anything, but we should at least prepare imo
19:48:00 I think that is a reasonable thing to do, but I have no idea what that requires in the status bot code. I assume we'd need a new driver since I doubt mastodon and twitter share an api
19:48:07 is it just a credential/url change for the api in statusbot's integration, or does it need a different api?
19:48:28 yeah, same thing i was wondering
19:48:29 definitely a different api
19:48:50 i had a quick look and not *too* hard
19:49:27 i guess if someone has time to write the necessary bindings for whatever library implements that, i'm okay with it, but i don't use either twitter or mastodon so can't speak to the exodus situation there
19:49:42 i don't feel like we need to abandon ship on twitter, but also adding mastodon seems like very little cost.
19:49:51 there isn't a technical reason to remove twitter support or stop posting there. (of course, there may be non-technical reasons, but there always have been). the twitter support itself is a tertiary output (after irc and wiki)
19:50:18 right we could post to both locations (as long as we can still login to twitter which has apparently become an issue for 2fa users)
19:50:18 that sums up my position on it as well, yes
19:50:27 and i agree that adding mastodon is desirable if there are folks who would like to receive info there
19:50:50 opendevinfra i think have chosen fosstodon as a host -- it seems reasonable. it broadly shares our philosophy, only caveat is that it is english only (for moderation purposes)
19:51:11 not that i think we send status in other languages ...
19:51:21 I would say if you are interested and have the time to do it then go for it :) This is unlikely to be a priority but also something that shouldn't take long I don't expect
19:51:21 fosstodon seems like an appropriate place, but note that they currently have a waitlist
19:52:31 yeah, personally i have an account and like it enough to pitch in some $ via the patreon. it seems like a more sustainable model to pay for things you like
19:53:03 maybe we can get on the waitlist now so that it is ready when we have the driver written?
19:53:11 but ya I agree fosstodon seems appropriate
19:53:17 it looks like waitlist processing is not slow
19:53:40 anecdote: https://fosstodon.org/@acmegating waitlisted and approved within a few hours
19:53:43 well i can quickly put in "opendevinfra" as a name, if we like
19:53:54 +1
19:54:07 that matches the twitter account we're using, right?
19:54:13 yes
19:54:13 yep
19:54:13 if so, sgtm
19:54:37 well i'll do that and add it to the usual places, and i think we can make statusbot talk to it pretty quick
19:54:44 sounds good
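
(For a sense of what "make statusbot talk to it" involves: posting to Mastodon is a single authenticated HTTP call to the instance's statuses endpoint, so a new driver alongside the existing twitter output mostly needs an instance URL and an access token in its config. The sketch below is not the statusbot implementation; the function name and token handling are assumptions.)

    # Rough sketch of the API call a statusbot mastodon driver would wrap; the
    # function name and configuration handling are assumptions, not the real
    # statusbot code.
    import requests

    def post_status(instance_url, access_token, text):
        # Mastodon's REST API: POST /api/v1/statuses with a bearer token.
        resp = requests.post(
            f"{instance_url}/api/v1/statuses",
            headers={"Authorization": f"Bearer {access_token}"},
            data={"status": text},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["url"]

    # Example (hypothetical token):
    # post_status("https://fosstodon.org", "TOKEN", "The Gerrit service is being restarted for an upgrade")
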
19:55:03 #topic Open Discussion
19:55:23 There were a few more things I was hoping to get to that weren't on the agenda. Let's see if we can cover them really quickly
19:55:34 I've got a change up to upgrade our Gerrit version
19:55:51 #link https://review.opendev.org/c/opendev/system-config/+/864217 Upgrade Gerrit to 3.5.4
19:56:00 This change needs its parent too in order to land
19:56:14 If that lands sometime this week I can be around to restart Gerrit to pick it up at some point
19:56:48 openstack/ansible-role-zookeeper was renamed to windmill/ansible-role-zookeeper and we've since created a new openstack/ansible-role-zookeeper
19:57:01 yeah, we recently created a new repository for openstack whose name collides with a redirect for an old repository we moved out of openstack (different project entirely, just happens to use the same name). i've got a held node and am going to poke at options
19:57:05 I didn't expect this to cause problems because we have a bunch of foo/project-config repos but because there was a prior rename we have redirects in place which this runs afoul of
19:57:25 in addition to fixing this in gitea one way or another we should look into updating our testing to call these problems out
19:58:09 And finally, nodepool needs newer openstacksdk in order to run under python3.11 (because old sdk uses pythonisms that were deprecated and removed in 3.11). However new openstacksdk previously didn't work with our nodepool and clouds
19:58:38 also there's an interesting git-review patch adding patch set descriptions, which looks useful to me https://review.opendev.org/c/opendev/git-review/+/864098 some concern on whether more sophisticated url mangling might be needed, maybe have a look if you're interested
19:58:40 corvus has a nodepool test script thing that I'm hoping to try and use to test this without doing a whole nodepool deployment to see if openstacksdk updates have made things better (and if not identify the problems)
19:58:43 heh, well if it can happen it will ... is the only problem really that apache is sending things the wrong way?
19:59:10 ianw: it's gitea itself redirecting, but ya I think that may be the only problem?
19:59:15 ianw: not apache but gitea
19:59:44 frickler: interesting, I'm not even sure I know what that feature does in gerrit. I'll have to take a look
19:59:50 and I'd also like to learn other roots' opinions on the Ubuntu FIPS token patch, if I'm in a minority I might be fine with getting outvoted
20:00:08 clarkb: you can see it in the patch, the submitter used their patch version
20:00:52 frickler: excellent that will help :)
20:00:59 https://review.opendev.org/c/openstack/project-config/+/861457 for fips
20:01:02 frickler: re FIPS I think that is more a question for openstack
20:01:10 yeah, the main issue with things like gerrit patchset descriptions is that we currently can't add regression tests for newer gerrit features unless we can get our git-review tests able to deploy newer gerrit versions
20:01:13 I don't think it runs afoul of our expectations from a hosting side
20:01:17 i would have expected a new project creation to invalidate the gitea redirects. regardless of why it didn't work out, the last time i looked, the redirects were a gitea db entry. probably can be fixed manually, but if so, then we should remember to record that in our yaml files for repo moves since we have assumed that we should be able to reconstruct the gitea redirect mappings from that data alone.
20:01:32 corvus: yup ++
20:01:47 corvus: yes, i plan to comment or comment out the relevant entry in opendev/project-config
20:01:52 frickler: from the hosting side of things this is a big part of why I don't think we should have fips specific images
20:01:58 after i finish playing with the held node to determine options
20:02:16 frickler: but we already allow jobs to interact with proprietary services (quay is/was for example)
20:02:42 We are at time now. Feel free to continue discussion in #opendev or on the mailing list. Thank you for your time everyone
20:03:02 Next week is a big US holiday week but I expect I'll be around through tuesday and probably most of wednesday
20:03:13 I don't expect to be around much thursday and friday
20:03:19 #endmeeting