19:00:17 #startmeeting infra
19:00:17 Meeting started Tue Apr 2 19:00:17 2024 UTC and is due to finish in 60 minutes. The chair is fungi. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:17 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:17 The meeting name has been set to 'infra'
19:00:52 i didn't send out an agenda to the ml yesterday, but will be following the one in the wiki
19:01:09 #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:01:24 #topic Announcements
19:02:01 #info OpenStack has a major release occurring tomorrow, please be slushy with configuration change approvals until it's done
19:02:14 lol
19:02:46 #info The PTG is occurring next week, and using our Meetpad platform by default, so please be mindful of changes that might impact Jitsi-Meet or Etherpad servers
19:03:20 #info Join our PTG sessions next week with your questions or whatever you need help with, see the schedule for time and room link
19:03:45 #link https://ptg.opendev.org/ptg.html PTG schedule
19:03:49 Noted
19:04:18 #info https://etherpad.opendev.org/p/apr2024-ptg-opendev PTG discussion topics
19:04:24 #undo
19:04:24 Removing item from minutes: #info https://etherpad.opendev.org/p/apr2024-ptg-opendev PTG discussion topics
19:04:28 #link https://etherpad.opendev.org/p/apr2024-ptg-opendev PTG discussion topics
19:04:47 #topic Upgrading Bionic servers to Focal/Jammy (clarkb 20230627)
19:05:06 #link https://etherpad.opendev.org/p/opendev-bionic-server-upgrades worklist for upgrades
19:05:19 #link https://review.opendev.org/q/topic:jitsi_meet-jammy-update outstanding changes for upgrades
19:05:42 i'm guessing there's been no new progress here in the past week
19:05:48 not that I've seen
19:05:49 Correct.
19:05:55 okay, cool. moving on!
19:06:05 #topic MariaDB Upgrades (clarkb 20240220)
19:06:19 #link https://review.opendev.org/c/opendev/system-config/+/911000 Upgrade etherpad mariadb to 10.11
19:06:34 as previously discussed, that's still waiting for the ptg to end
19:06:51 i also said i'd work on the mailman server and then promptly did zilch
19:07:18 #action fungi Propose a change to upgrade the MariaDB container for our Mailman deployment
19:07:37 we also talked about holding gerrit/gitea upgrades until we have more time after ptg week
19:07:43 any other updates since last meeting?
19:07:49 not from me
19:07:58 this is definitely in the slushy category of change
19:08:07 agreed
19:08:18 #info Being slushy, work will resume post-PTG
19:08:30 #topic AFS Mirror cleanups (clarkb 20240220)
19:08:47 also slushy
19:08:58 #info Being slushy, work will resume post-PTG
19:09:34 i suppose we could do a webserver log analysis as a group activity during the ptg, if we get bored?
19:09:43 i'll stick it on the pad
19:10:08 I also suggested that volunteers could add noble at this point
19:10:09 fungi: any particular goal for the log analysis?
19:10:21 which was one of the goals of the cleanup, to make room for new distros like noble
19:10:40 corvus1: this came up at our pre-ptg; the idea was to do log analysis to identify other afs content that can be removed because it is largely unused
19:10:42 corvus1: to see if there's stuff we're mirroring/proxying that nobody actually uses in jobs
19:10:49 like say possibly the wheel content
19:10:54 ah that one, got it, thx
19:11:21 anyway, i stuck it on the pad just now in case we run out of other activities
19:11:38 #topic Rebuilding Gerrit Images (clarkb 20240312)
19:11:48 this is also on slush-wait?
19:11:59 Last week I said I'd try to do this after the openstack release, so Thursday or Friday this week
19:12:08 I think that is still possible though Monday may end up being more likely at this point
19:12:33 #info Holding until the OpenStack release is out, may resume early next week
19:12:46 #topic Review02 had an oops last night (clarkb 20240326)
19:13:00 i haven't seen any rca/post-mortem on this
19:13:14 After our meeting last week I pinged mnaser and guillermesp asking if they had more info and didn't hear back
19:13:17 do we know anything new since last week? do we want to leave it on the agenda?
19:13:37 I think we should leave it on for now as a reminder to see if we can track that down. Avoiding gerrit shutdowns would be a good thing to do if we can
19:14:11 #info We'll see if we can get more details from the provider on the root cause
19:14:22 #topic Rackspace MFA Requirement (clarkb 20240312)
19:14:37 i didn't see any problems after we did it, nor past the deadline
19:14:52 in particular, job log uploads seem to have continued uninterrupted
19:14:54 ya assuming rax made the changes (and I have no reason to think they didn't) I think we're in the clear for now
19:15:04 we can probably drop this from the agenda at this point
19:15:33 i agree. we can always discuss it again if we see any issues we think might be related
19:15:44 #topic Project Renames (clarkb 20240227)
19:16:02 #link https://review.opendev.org/c/opendev/system-config/+/911622 Move gerrit replication queue aside during project renames
19:16:21 mostly just looking for extra reviews on that change
19:16:38 that's been open for a while with no additional reviews, yeah, but i guess there's no hurry as long as we approve it before the maintenance
19:16:39 when we get closer I'll compile the historical record changes that act as an input to the rename playbook
19:16:43 exactly
19:16:55 #info Penciled in for April 19, 2024; submit your rename requests now
19:17:19 #topic Nodepool image delete after upload (clarkb 20240319)
19:17:31 i haven't seen the change to configure this yet
19:17:40 I pushed one and corvus1 landed it last week
19:17:46 oh!
19:17:59 i was living under a rock for the past week, sorry about that
19:18:00 this actually exposed a bug in the implementation so we caught a real problem and corvus fixed that. It should now be in effect
19:18:15 very cool. have we observed a reduction in filesystem utilization on the builders?
19:18:31 i double checked the results and it works as expected now
19:18:37 yay!
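For context, a minimal sketch of the builder diskimage configuration this relies on; the option names (delete-after-upload, keep-formats) and the values below are recalled from the nodepool docs and should be double-checked there rather than copied as-is:

    # nodepool.yaml (builder side) - illustrative snippet only
    diskimages:
      - name: ubuntu-jammy
        formats:
          - vhd
          - qcow2
        # remove the local image copies once uploads to all providers
        # have succeeded, so large files stop accumulating under /opt
        delete-after-upload: true
        # optionally keep one format locally for debugging/inspection
        keep-formats:
          - qcow2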
19:18:49 (so no raw images kept on nb01/02)
19:19:24 does look like nb01 could use some cleanup unrelated to that change though (I see some orphaned intermediate vhd files for example)
19:19:41 yeah a bunch of old zero-byte orphans
19:19:44 yeah, i just pulled up the cacti graph for /opt and saw the same
19:20:03 looks like it will fill up in a few hours if we don't intervene
19:20:41 though i do see a drop in utilization back on thursday. i guess that was when the change went into effect?
19:20:57 yes that sounds right
19:21:40 ouch, nb02 looks like its /opt filled up in the past day as well
19:21:52 there must be some issue going on
19:22:18 yeah, the climb on both looks linear, and seems to start around the same time the image cleanup change took effect, so could be related i suppose
19:22:38 hrm, i'll look into that this afternoon
19:22:44 thanks corvus1!
19:22:59 i can try to find a few minutes to help with cleanup activities if needed
19:23:20 nodepool_dib is under 300GB on nb02. Maybe something to do with dib filling up dib_tmp instead
19:23:40 #info This went into effect, but it seems we've sprung a new leak, so we need to dig deeper
19:24:48 anything else we want to cover on this topic? we can probably do troubleshooting discussion in #opendev unless we want it recorded as part of the meeting log
19:24:59 ya I think we can troubleshoot later
19:25:07 even if the disks fill it isn't urgent for a little while
19:25:16 agreed
19:25:27 #topic Should the proposal-bot Gerrit account be usable by non-OpenStack jobs (fungi 20240402)
19:25:31 #link https://review.opendev.org/912904 "Implement periodic job to update Packstack constraints"
19:25:41 we discussed this a little in #opendev earlier today as well
19:26:23 for a little historical background, we used a single proposal-bot account back when everything was openstack
19:26:41 At a high level I think it would be ok to have a shared proposal bot since people should be reviewing changes pushed by the bot either way (however they may just trust the bot, in which case :( ). My bigger concern is that a global shared account implies managing the secret for that account centrally with the jobs that use the secret. For this reason I think it is better to
19:26:43 ask projects to have their own bot accounts if they need them.
19:27:07 now we have some non-openstack usage (opendev projects.yaml normalization changes) and the packstack project is asking to use it for proposing their own manifest updates as well
19:28:10 it does seem like in the packstack case there's only one repo it's going to propose changes to, so they could define the secret for their dedicated bot account there along with the job that uses it, from what i'm seeing
19:28:47 there might be some way to construct jobs that makes my concern moot as well but I haven't thought through that well enough yet
19:29:02 if, for example, starlingx wanted their own translation update jobs that work similarly to openstack's, i wonder how we'd recommend approaching that, since it would need to be in a trusted config repo to get shared, right?
19:29:08 like maybe we push via post-run tasks from the executor
19:30:04 fungi: ya I think that's another case where the ideal would be for starlingx to have their own central config repo and manage those credentials independent of however openstack is doing it
19:30:13 but that implies tenant moves etc so may not be straightforward
19:30:42 yeah, obviously for projects that use a dedicated zuul tenant it's easy enough to do
19:31:50 for older non-openstack projects sharing the openstack tenant (including our own opendev use), there are possible compromises to avoid moving to a new tenant
19:32:18 I'm not sure I understand the concern with sharing the one bot?
19:32:25 the secret is centrally managed and IIUC can't be exposed
19:32:44 tonyb: the problem is I don't want to be responsible for reviewing changes for how packstack wants to automatically update deps
19:32:44 it's not as much a security concern as a "we're on the hook to review their job configuration" concern
19:33:22 that is a packstack concern in my opinion and we should try as much as possible to use the tools we have to keep that off of our plates
19:33:30 we have similar challenges with some openstack projects that haven't moved their job definitions out of our config repo too
19:33:34 Ahh okay.
19:33:38 and in an ideal world we wouldn't care about openstack and starlingx either but for historical reasons we're not there yet
19:34:15 just to be clear this isn't specific to packstack, it's more about trying to leverage tooling to avoid becoming bottlenecks
19:34:35 i suppose our recommendations are 1. use a separate zuul tenant, 2. if #1 isn't feasible and you're only in need of proposals to a single project then put the secret and job definition in that project, 3. if #1 isn't feasible and you need to propose changes to multiple repos then we'll help you work something out
19:34:37 Can we make the job that generates the update live in one of their repos and depend on it in the pipeline, so it's just the actual proposal that we'd be on the hook for?
19:35:08 (also sorry my IRC client seems to keep disconnecting)
19:36:01 tonyb: ya it might be possible to define a standard interface for pushing stuff to gerrit in zuul jobs in such a way that we share the account but not the "figure out what to update in git" steps
19:36:19 that is what I was referring to above but I haven't given it enough thought to be confident in that approach one way or another
19:36:23 like return the generated patch as an artifact and then have a standardized proposal job push it to gerrit
19:36:42 Yeah something like that
19:36:56 I was trying to switch requirements to that model
19:37:07 yup I think it may be possible to do that
19:37:27 the afs publishing jobs may have some tricks relevant here (embedding restrictions in secrets, etc)
19:39:05 Is there any time pressure?
19:39:26 not from my side. packstack may want to get this done sooner than later though
19:39:54 i suppose we could tell them that the roll-your-own solution is available to them today, or they're welcome to wait
19:40:05 Okay.
19:40:10 or help with the central job idea path if they want to explore that
19:40:17 yes, that too definitely
19:41:17 also i was thinking about how a shared account might work across tenants, we obviously can't use the same key because a tenant owner could extract it with minimal effort, but we could assign per-tenant keys and add them all to the one account in gerrit if we want
19:41:43 ++
19:42:00 though i guess the key could be used to authenticate to the ssh api and make account changes, so maybe one account per tenant is still preferable
19:42:02 This has me thinking that gerrit having an anonymous coward code submission process would be neat
19:42:08 but probably full of foot guns
19:42:46 Yup totally agree
19:43:10 if gerrit allowed us to scope a key to specific permissions, it would be safer
19:43:56 also, if we wanted to switch to https pushes, you're restricted to one api key per account so it wouldn't work there at all
19:45:47 #agreed Let the PackStack maintainers know that they can implement this inside their x/packstack repository with a dedicated Gerrit account, but they're welcome to work with us on a refactor to better support shared account pushes
19:46:31 #info We're looking into a split job model where only the push-to-Gerrit tasks are defined in the config repo
19:46:43 do those two items capture everyone's takeaways from this discussion?
19:46:53 lgtm
19:46:59 and me
19:47:18 anything else on this topic?
19:47:32 not from me
19:48:13 #topic Open discussion
19:48:27 did anyone have something to discuss that wasn't covered in the agenda so far?
19:48:54 Not from me
19:49:04 I was just notified that multiple people have tested positive for covid after our family easter lunch saturday. I really hope I don't end up sick again, but fair warning that I may be useless again in the near future
19:49:40 that sounds even worse than ill-prepared egg salad
19:50:12 I definitely did not enjoy my time with covid last summer. Would not recommend. If it's anything like that again I probably would risk bad eggs :)
19:50:24 er, if I had the choice of replacing one with the other, you know what I mean
19:50:33 Hi, I just wanted to share this for visibility - translate.zanata.org sunsets in September 2024
19:50:35 https://lists.osci.io/hyperkitty/list/zanata-sunset@lists.osci.io/thread/6F2D6JRPFF6RRKYURB2WMCXSJ6C4AFBS/
19:50:35 Ergh. Good luck
19:50:46 LOL
19:51:15 i guess the good news is we don't use translate.zanata.org and zanata itself can't get any less-maintained than it already has been for years now
19:51:28 it's a race now to see who can shut down faster :)
19:51:31 I want to win this race
19:52:16 I think we all do
19:52:23 Oh, nice analogy, a race :p
19:52:37 yes, it would be nice if ours isn't the last zanata standing
19:53:38 I will try to put in the effort to win the race - thank u for all the help
19:53:48 and thank you for being on top of it
19:54:19 Thank u too!
19:57:06 seems like that's about it. i thank you all for your attention, and return the unused 3 minutes
19:57:09 #endmeeting
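For the minutes, a rough illustration of the split job model captured in the #info above, as it might look in Zuul configuration. Every job name, playbook path, and secret name here is invented for the sketch; nothing like this exists in our config repos yet:

    # Untrusted half, defined in the project's own repo: build the update
    # and return the resulting patch as an artifact via zuul_return.
    - job:
        name: generate-constraints-update
        run: playbooks/generate-update.yaml

    # Trusted half, defined in the central config repo: fetch the artifact
    # from the dependent job and push it to Gerrit using the shared
    # account's secret. It never executes project-supplied code.
    - job:
        name: upload-proposed-change
        run: playbooks/upload-proposed-change.yaml
        secrets:
          - proposal_bot_ssh_key

    # In the project's pipeline the push job depends on the generator, so
    # the only thing reviewed centrally is the generic upload logic.
    - project:
        periodic:
          jobs:
            - generate-constraints-update
            - upload-proposed-change:
                dependencies:
                  - generate-constraints-update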