Tuesday, 2023-11-14

clarkbalmost meeting time18:59
clarkb#startmeeting infra19:00
opendevmeetMeeting started Tue Nov 14 19:00:26 2023 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:00
opendevmeetThe meeting name has been set to 'infra'19:00
clarkb#link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/NIDXZX7JT4MQJOUS7GKI5PPRMDIIY6FI/ Our Agenda19:00
clarkbThe agenda went out late because I wasn't around yesterday, but we do have an agenda19:00
clarkb#topic Announcements19:01
clarkbNext week is a big US holiday. That said, I expect to be around for the beginning of the week and plan to host our weekly meeting Tuesday19:01
clarkbBut be aware that by Thursday I expect it to be very quiet19:01
clarkb#topic Mailman 319:02
clarkbfungi: I think you dug up some more info on the template file parse error? And basically mailman3 is missing some file that they need to add after django removed it from their library?19:02
fungithe bug we talked about yesterday turns out to be legitimate, yes19:03
fungier, last week i mean19:03
tonybtime flies19:03
clarkbto confirm we are running all of the versions of the software we expect, but a new bug has surfaced and we aren't seeing an old bug due to accidental use of old libraries19:03
fungiyeah, and this error really just means django isn't pre-compressing some html templates, so they're a little bigger on the wire to users19:04
clarkbin that case I guess we're probably going to ignore this until the next mm3 upgrade?19:05
fungi#link https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/thread/36U5NY725FNJSGRNELFOJLLEZQIS2L3Y/ mailman-web compress - Invalid template socialaccount/login_cancelled.html19:05
fungiyeah, it seems safe to just ignore and then we can plan to do a mid-release update when it gets fixed if we want, or wait until the next release19:05
clarkbshould we drop this agenda item from next week's meeting then?19:06
clarkbI believe this was the last open item for mm319:06
fungii think so, yes. we can add upgrades to the agenda as needed in the future19:06
clarkbsounds good. Thanks again for working through all of this for us19:06
clarkb#topic Server Upgrades19:06
fungithanks for your patience and help!19:06
clarkbwe added tonyb to the root list last week and promptly put him to work booting new servers :)19:07
tonyb\o/19:07
clarkbmirror01.ord.rax is being replaced with a new mirror02.ord.rax server courtesy of tonyb19:07
clarkb#link https://review.opendev.org/c/opendev/zone-opendev.org/+/90092219:07
clarkb#link https://review.opendev.org/c/opendev/system-config/+/90092319:07
clarkbThese changes should get the server all deployed, then we can confirm it is happy before updating DNS to flip over the mirror.ord.rax CNAMEs19:08
clarkbI think the plan is to work through this one first and then start doing others19:08
tonybAfter a good session booting mirror02 I managed to clip some of the longer strings and so the reviews took me longer to publish19:08
clarkbtonyb: I did run two variations of ssh-keyscan in order to double check the data19:08
tonybclarkb: Thanks19:08
clarkbI think it is correct and noted that in my reviews when I noticed the note about the copy paste problems19:08
clarkbfeel free to continue asking questions and poking for reviews. This is really helpful19:09
tonybI started writing a "standalone" tool for handling the volume setup as the mirrors nodes are a little different19:09
tonybYup I certainly will do.19:10
clarkbtonyb: ++ to having the mirror volumes a bit more automated19:10
fungiagreed, we have enough following that pattern that it could be worthwhile19:11
funginote that not all mirror servers get that treatment though, some have sufficiently large rootfs we just leave it as-is and don't create additional volumes19:11
tonybI think that's about it for the mirror nodes.  It's mostly carefully following the bouncing ball at this stage19:11
clarkbcool. I'm happy to do another runthrough too if we like. I feel like that was helpful for everyone as it made problems with cinder volume creation apparent and so on19:12
tonybfungi: Yup.  and as we can't always predict the device name in the guest it won't be fully automated or integrated; it's just to document/simplify the creation work we did on the meetpad19:13
fungii too am happy to do another onboarding call, maybe for other activities19:13
* tonyb too.19:13
clarkbanything else on this topic?19:14
tonybnot from me19:14
clarkb#topic Python Container Updates19:14
clarkbUnfortunately I haven't really had time to look at the failures here in more detail. I saw tonyb asking questions about them though, were you looking?19:15
clarkb#link https://review.opendev.org/c/zuul/zuul-operator/+/881245 Is the zuul-operator canary change19:15
clarkbspecifically we need that change to begin passing in zuul-operator before we can land the updates for the docker image in that repo19:15
tonybI am looking at it19:16
tonybI spoke to dpawlik about status and background19:16
corvusi suspect something has bitrotted with cert-manager; but with the switch in k8s setup away from docker, we don't have the right logs collected to see it, so that's probably the first task19:16
tonybNo substantial progress but I'm finding my feet there19:17
corvus(in other words, the old k8s setup got us all container logs via docker, but the new setup needs to get them from k8s and explicitly fetch from all namespaces)19:17
clarkbgotcha19:17
clarkbbecause we are no longer using docker under k8s19:17
corvusyep19:18
clarkbI agree, addressing log collection seems like a good next step19:18
tonybOkay that's good to know.19:18
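As a rough illustration of corvus's point that logs now have to be fetched from k8s explicitly and across all namespaces, here is a minimal Python sketch. It is not the zuul-operator job's actual collection code: the function name `log_commands` and the sample pod names are hypothetical, and it only builds the `kubectl logs` command lines from the JSON that `kubectl get pods --all-namespaces -o json` produces, without running anything.

```python
import json


def log_commands(pods_json: str) -> list[list[str]]:
    """Build one `kubectl logs` command per pod, covering every namespace.

    `pods_json` is assumed to be the output of
    `kubectl get pods --all-namespaces -o json`.
    """
    commands = []
    for pod in json.loads(pods_json)["items"]:
        meta = pod["metadata"]
        commands.append([
            "kubectl", "logs",
            "--namespace", meta["namespace"],
            meta["name"],
            "--all-containers=true",  # include every container in the pod
        ])
    return commands


# Hypothetical sample data standing in for real cluster output.
SAMPLE = json.dumps({"items": [
    {"metadata": {"namespace": "cert-manager", "name": "cert-manager-abc"}},
    {"metadata": {"namespace": "default", "name": "zuul-operator-xyz"}},
]})

for cmd in log_commands(SAMPLE):
    print(" ".join(cmd))
```

In a real job each command would be run and its output saved per namespace and pod, so that failures in namespaces such as cert-manager's show up in the collected logs.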
clarkb#topic Gitea 1.2119:19
clarkb1.21.0 has been released19:20
clarkb#link https://github.com/go-gitea/gitea/blob/v1.21.0/CHANGELOG.md we have a changelog19:20
fungi(and there was much rejoicing)19:20
clarkb#link https://review.opendev.org/c/opendev/system-config/+/897679 Upgrade change needs updating now that we have changelog info19:20
clarkbso ya the next step here is to go over the changelog and make sure our change is modified properly to handle their breaking changes19:20
clarkbI haven't even looked at the changelog yet19:21
clarkbbut doing so and modifying that change is on my todo19:21
clarkb*todo list19:21
clarkbIn the past we've often not upgraded until the .1 release anyway due to them very quickly releasing bugfixes19:21
funginobody ever wants to go first19:22
clarkbbetween that and the gerrit upgrade and then thanksgiving I'm not sure this is urgent, but also don't want it to get forgotten19:22
fungii agree that the next two weeks are probably not a great time to merge it, but i'll review at least19:22
clarkbsounds good. Should have something to look at in the next day or so19:23
fricklerI'm wondering about the key length thing, how much effort would it be to use longer keys?19:23
tonybFWIW I'll review it too and, probably, ask "why do we $x" questions ;P19:23
clarkbfrickler: we need to generate a new key, add it to the gerrit user in gitea (this step may be manual currently I think we only automate this at user creation time) and then add the key to gerrit and restart gerrit to pick it up19:24
clarkbfrickler: I suspect that if we switch to ed25519 then we can have it sit next to the existing rsa key in gerrit and we don't have to coordinate any moves19:24
clarkbif we replace shorter rsa key with longer rsa key then we'd need a bit more coordination19:24
fungiwell, we could have multiple rsa keys too, right?19:25
clarkbfungi: I don't think gerrit will find multiple rsa keys19:25
clarkbbut I'm not sure of that. We can test that on a held node I guess19:25
fungioh, right, local filename collision19:25
fungiwe can do two different keytypes because they use separate filenames19:26
clarkbyup19:26
clarkbI can look into that more closely as I page the gitea upgrade stuff back in19:26
fungii was thinking in the webui, not gerrit as a client19:26
fungiso yeah, i agree parallel keys probably makes the transition smoother than having to swap one out in a single step19:27
clarkbspeaking of Gerrit:19:27
clarkb#topic Gerrit 3.8 Upgrade19:27
fungithough i guess if we add the old and new keys to gitea first, then we could swap rsa for rsa on the gerrit side19:27
fungibut might need a restart19:27
clarkbit will need a restart of gerrit in all cases iirc19:27
clarkbbecause it reads the keys on startup19:27
clarkbFor the Gerrit upgrade I'm planning on going through the etherpad again tomorrow19:28
clarkb#link https://etherpad.opendev.org/p/gerrit-upgrade-3.819:28
clarkbI want to make sure I understand the screen logging magic a bit better19:28
clarkbbut also would appreciate reviews of that plan if you haven't read it yet19:28
fungialso for the sake of the minutes...19:28
fungi#link https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/thread/XT26HFG2FOZL3UHZVLXCCANDZ3TJZM7Q/ Upgrading review.opendev.org to Gerrit 3.8 on November 17, 202319:28
fungii figured you were going to include that in the announcements at the beginning19:28
clarkbas far as coordination goes on the day of I expect I can drive things, but maybe fungi you can do some of the earlier stuff like adding hosts to emergency files and sending #status notice notices19:29
clarkbI'll let you know if my expectations for that change, but I don't expect them to19:29
fungihappy to. i think i'm driving christine to an eye appointment, but can do basic stuff from my phone in the car19:30
fungi(also the appointment is about 2 minutes from the house)19:31
clarkbseems like we are in good shape. And I'll triple check myself before Friday anyway19:31
tonybI can potentially do some of the "non-destructive" early work19:31
clarkbtonyb: oh! we should add you to the statusbot acls19:31
fungiwe should add tonyb to statusbot19:31
clarkband generally irc acls 19:31
tonybbut that may make more work than doing it19:31
fungihah, jinx!19:31
tonybhehe19:32
fungitonyb: it's work that needs doing sometime anyway19:32
tonybso who owes who a soda?19:32
fungii can take care of it19:32
tonybkk19:32
fungii owe everyone soda anyway19:32
tonybLOL19:32
clarkbopenstack/project-config/accessbot/channels.yaml is one file that needs editing19:32
fungistill repaying from my ptl days19:32
tonybI can do that.19:33
clarkbI'm not actually sure where statusbot gets its user list. Does it just check for opers in the channel it is in?19:33
fungii'll look into it19:33
corvusi think it's a config file19:34
clarkbnope it's statusbot_auth_nicks in system-config/inventory/service/group_vars/eavesdrop.yaml19:34
clarkbtonyb: ^ so that file too19:34
fungithanks, i was almost there19:34
tonybgotcha19:34
clarkbanything else Gerrit upgrade related?19:34
fungii'm getting slow this afternoon, must be time to start on dinner19:35
clarkbit's basically lunch time. I'm starving19:35
tonybCoffee o'clock and then a run. ... and then lunch19:35
clarkbalright next up19:36
clarkb#topic Ironic Bug Dashboard19:36
clarkb#link https://github.com/dtantsur/ironic-bug-dashboard19:36
clarkbThe ironic team is asking if we would be willing to run an instance of their bug dashboard tool for them19:36
fungiJayF: you were going to speak to this one?19:36
JayFSo some context; this is an old bug dashboard. No auth needed. Simplest python app ever.19:36
fungiotherwise i can19:36
JayFWe've run it in various places we've just done custom-ly, before doing that again with our move to LP, we thought we'd ask about getting it a real home.19:37
JayFNo dependencies. Literally just needs a place to run, and I think dtantsur wrote a dockerfile for it the other day, too19:37
clarkbMy major concern is that running a service for a single project feels very inefficient from our side. If someone wanted to resurrect the openstack bug dashboard instead I feel like that might be a little different?19:37
fungiso options are for adding it to opendev officially (deployment via ansible/container image building and testinfra tests), or us booting a vm for them to manage themselves19:38
tonybThe docs show using podman etc so yeah I think that's been done19:38
clarkbadditionally teams like tripleo have had one tool and ironic has one apparently and so on. I think it is inefficient for the project teams too19:38
fungifor historical reference, "the openstack bug dashboard" was called "bugday"19:38
JayFclarkb: I talked to dtantsur; we are extremely willing to take patches (and will email the list about this existing again once we get it a home) if other teams want to use it19:38
fricklerJayF: so would you be willing to run this yourself if we give you a vm with a DNS record?19:40
JayFfungi: it's extremely likely if infra says no, and we host it out of band, we'd do something similar to the second option (just get a VM somewhere and run it manually)19:40
JayFfrickler: replace instances of "you" and "yourself" with ironic community as appropriate and the answer is "yes", with specific contacts being dtantsur and I to start19:40
JayFfrickler: if you all had no answer for us, nonzero chance this ended up as a container in my homelab :)19:41
fricklerthat would be an easy start and we could see how it develops19:41
clarkbso basically the idea behind openstack infra and now opendev was that we'd avoid doing stuff like this and instead create a commons where projects could work together to address common problems19:41
fungiyeah, when this came up yesterday in #openstack-ironic i mentioned the current situation with the opensearch-backed log ingestion service dpawlik set up19:42
clarkbwhere we've struggled is when projects do things like this specific tool and blaze their own trail. This takes away potential commons resources as well as multiplies effort required19:42
JayFFrom an infra standpoint; I'm with you.19:42
JayFThis is why it's an opportunistic ask with almost an expectation that "no" was a likely answer.19:42
clarkbI think that if we were to host it it would need to be a more generic tool for OpenDev users and not ironic specific. I have fewer concerns with handing over a VM19:42
JayFFrom a community standpoint; that was storyboard; we adopted it; it disappeared; we are trying to dig out from that mistake19:42
frickleriiuc the tool is open to be used by other projects, they just need to amend it accordingly19:43
fungii do think we want to encourage people to collaborate on infrastructure that supports their projects when there is a will to do so19:43
JayFand I do not want to burn more time trying to go down alternate "work together" paths in pursuit of that goal19:43
clarkbJayF: the problem is that all the cases of not working together are why we have massive debt19:43
clarkbironic is not the only project trying to deal with storyboard for example19:44
JayFclarkb: I have lots of examples of cases of us working together that also have massive debt; so I'm not sure I agree with all of the root causing, but I do understand what you're getting at and like I said, if the answer is no, it's no. 19:44
fungibasically the risk is that the opendev sysadmins are the last people standing when whoever put some service together disappears and there are still users19:44
clarkband despite my prodding very little collaboration between teams with the same problems has occurred as an example19:44
fungiso we get to be the ones who tell users "sorry, nobody's keeping this running any more"19:44
clarkbthe infra sig continues to field questions about how to set up LP19:45
clarkbstuff that should have ideally been far more coordinated among the groups moving19:45
fungii mostly just remind folks that we don't run launchpad, and it has documentation19:45
clarkband I can't shake the feeling that an ironic bug dashboard is just an extension of these problems and we'll end up being asked to run a different tool for nova and then a different one for sdks and so on19:46
JayFThis is off topic for the meeting, but the coordination is always the most difficult part ime; which is why for Ironic's LP migration it finally started moving when I stopped trying so hard to pull half of openstack with me. 19:46
clarkbwhen what we need as a group is rough agreement on what a tool should be and then run that. And as mentioned before this tool did exist19:46
clarkbbut it too ran into disrepair and was no longer maintained and we shut it off19:46
JayFIt sounds like consensus is no though; so for this topic you all can move on. I wouldn't want you all to host it unless everyone was onboard, anyway.19:47
clarkbI don't think we necessarily need to resurrect bugday the code base, but I think if opendev hosts something it should be bugday the spiritual successor tool and not an ironic specific tool19:47
fungii think it can be "not yet" instead of just "no"?19:47
fungialso i'm not opposed to booting a vm for them to run it on themselves, while they work on building consensus across other teams to possibly make it useful beyond ironic's use case19:48
JayFI just sent an email to the mailing list, last week, about how a cornerstone library to OpenStack is rotting and ~nobody noticed. I'm skeptical someone is going to take up the banner of uniting bug dashboards across openstack.19:48
JayFfungi: I do not commit to building such a consensus. I commit to being open to accepting patches.19:48
fungiwith the expectation that if opendev is going to officially take it on, then there will need to be more of a cross-project interest (and of course configuration management and tests)19:49
clarkbya I'm far less concerned with booting a VM and adding a DNS record19:49
JayFfungi: not trying to be harsh; just trying to set a reasonable expectation to be clear :)19:49
JayFmy plate is overflowing and I can't fit another ounce on it19:49
fungisure. and we've all been there more than once, i can assure you ;)19:49
fungiJayF: so there are some options and stipulations you can take back to the ironic team for further discussion, i guess19:50
JayFIf you want to give us a VM and a DNS name, that will work for us. If not, I'll go get equivalent from my downstream/personal resources and my next steps are the same either way19:51
corvusi'm not sure i'm a fan of the "boot a vm and hand it over" approach19:51
corvusif a vm is going to be handed over, i don't see why that's an opendev/infra team ask... i don't feel like we're here to hand out vms, we're here to help facilitate collaboration.  anyone can propose a patch to run a service if the service fits the mission.  so if it does fit the mission, that's how it should be run.  and if it doesn't, then it shouldn't be an opendev conversation.19:51
fungishould we not have provided the vm for the log ingestion system that loosely replaced the old logstash system? mistake in your opinion, or failed experiment, or...?19:53
corvusi thought that ran on aws or something19:53
clarkbthe opensearch cluster runs in aws, but there is a node that fetches logs and sends them to opensearch that dpawlik is managing19:54
fungithe backend does, but the custom log ingestion glue to zuul's interface is on a vm we booted for the systadmins19:54
fungier, s/systadmins/admins of that service/19:54
corvusi was unaware of that, and yeah, i think that's the wrong approach.  for one, the fact that i'm a root member unaware of it and it's not documented in https://docs.opendev.org/opendev/system-config/latest/ seems like a red flag.  :)19:54
corvusthat seems like something that fits the mission and should be run in the usual manner to me19:56
clarkbya better documentation of the exceptional node(s) is a good idea19:56
fungiand possibly also deciding as a group that exceptions are a bad idea19:56
corvusi think the wiki is an instructive example here too19:56
JayFOne thing I'll note that is a key difference about the service I proposed (and I suspect that logstash service) is their stateless nature.19:57
fungithe main takeaway we had from the wiki is that we made it clear we would not take responsibility for the services running the log search service19:57
JayFIt doesn't address the basic philosophical questions; but it does draw a different picture than something like the wiki does.19:57
fungiand that if the people maintaining it go away, we'll just turn it off with no notice19:57
corvusyeah, in both new cases running them is operationally dead simple19:58
clarkb(side note I think the original plan was to run the ingestion on the cluster itself but then realized that you can't really do that with the opensearch as a service)19:59
corvusi must have gotten the first version of the memo and not the update19:59
clarkbbecause they delete and replace servers or something for upgrades. It's basically an appliance20:00
clarkbwe are at time.20:00
clarkb#topic Upgrade Server Pruning20:00
clarkb#undo20:00
opendevmeetRemoving item from minutes: #topic Upgrade Server Pruning20:00
clarkb#topic Backup Server Backup Pruning20:00
clarkbreally quickly before we end I wanted to note that the rax backup server needs its backups pruned due to disk utilization20:01
clarkbMaybe that is something tonyb wants to do with another root (ianw set it up and documented and scripted it well so it's mostly a matter of going through the motions)20:01
tonybYup happy to.20:01
clarkb#topic Open Discussion20:01
fungii'm also happy to help tonyb if there are questions about backup pruning20:02
clarkbWe don't really have time for this but feel free to take discussion to #opendev or service-discuss@lists.opendev.org to bring up extra stuff and/or keep talking about the boot a VM and hand it over stuff20:02
tonybfungi: thanks.20:02
clarkband happy 1700000000 day20:02
fungiwoo!20:02
clarkbI think we are about 2 hours away?20:02
clarkbsomething like that20:02
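The "1700000000 day" remark refers to the Unix timestamp reaching 1,700,000,000 seconds. A small Python sketch of the arithmetic behind the "about 2 hours away" estimate follows; the variable names are illustrative, and the meeting end time is taken from the MeetBot line in this log.

```python
from datetime import datetime, timezone

# Unix timestamp being celebrated.
EPOCH_MILESTONE = 1_700_000_000

# Convert the timestamp to a timezone-aware UTC datetime.
milestone = datetime.fromtimestamp(EPOCH_MILESTONE, tz=timezone.utc)

# Meeting end time as recorded by MeetBot: Tue Nov 14 20:03:06 2023 UTC.
meeting_end = datetime(2023, 11, 14, 20, 3, 6, tzinfo=timezone.utc)

remaining = milestone - meeting_end

print(milestone.isoformat())  # 2023-11-14T22:13:20+00:00
print(remaining)              # 2:10:14
```

So the milestone fell at 22:13:20 UTC that day, a little over two hours after the meeting closed.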
clarkbthank you everyone for your time!20:03
clarkb#endmeeting20:03
opendevmeetMeeting ended Tue Nov 14 20:03:06 2023 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)20:03
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2023/infra.2023-11-14-19.00.html20:03
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2023/infra.2023-11-14-19.00.txt20:03
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2023/infra.2023-11-14-19.00.log.html20:03
