clarkb | almost meeting time | 18:59 |
---|---|---|
clarkb | #startmeeting infra | 19:00 |
opendevmeet | Meeting started Tue Nov 14 19:00:26 2023 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:00 |
opendevmeet | The meeting name has been set to 'infra' | 19:00 |
clarkb | #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/NIDXZX7JT4MQJOUS7GKI5PPRMDIIY6FI/ Our Agenda | 19:00 |
clarkb | The agenda went out late because I wasn't around yesterday, but we do have an agenda | 19:00 |
clarkb | #topic Announcements | 19:01 |
clarkb | Next week is a big US holiday. That said, I expect to be around for the beginning of the week and plan to host our weekly meeting Tuesday | 19:01 |
clarkb | But be aware that by Thursday I expect it to be very quiet | 19:01 |
clarkb | #topic Mailman 3 | 19:02 |
clarkb | fungi: I think you dug up some more info on the template file parse error? And basically mailman3 is missing some file that they need to add after django removed it from their library? | 19:02 |
fungi | the bug we talked about yesterday turns out to be legitimate, yes | 19:03 |
fungi | er, last week i mean | 19:03 |
tonyb | time flies | 19:03 |
clarkb | to confirm: we are running all of the versions of the software we expect, but a new bug has surfaced, and we aren't seeing an old bug due to accidental use of old libraries | 19:03 |
fungi | yeah, and this error really just means django isn't pre-compressing some html templates, so they're a little bigger on the wire to users | 19:04 |
clarkb | in that case I guess we're probably going to ignore this until the next mm3 upgrade? | 19:05 |
fungi | #link https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/thread/36U5NY725FNJSGRNELFOJLLEZQIS2L3Y/ mailman-web compress - Invalid template socialaccount/login_cancelled.html | 19:05 |
fungi | yeah, it seems safe to just ignore and then we can plan to do a mid-release update when it gets fixed if we want, or wait until the next release | 19:05 |
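For context, the failing step is django-compressor's template pre-compression, invoked through the `mailman-web` management wrapper; a minimal sketch of the invocation (the direct manage.py alternative is an assumption):

```shell
# Run django-compressor's offline compression via the mailman-web wrapper;
# the bug means this errors on one template, so affected templates are
# simply served uncompressed rather than broken.
mailman-web compress

# Equivalent direct Django invocation (path to manage.py assumed):
# python manage.py compress --force
```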
clarkb | should we drop this agenda item from next week's meeting then? | 19:06 |
clarkb | I believe this was the last open item for mm3 | 19:06 |
fungi | i think so, yes. we can add upgrades to the agenda as needed in the future | 19:06 |
clarkb | sounds good. Thanks again for working through all of this for us | 19:06 |
clarkb | #topic Server Upgrades | 19:06 |
fungi | thanks for your patience and help! | 19:06 |
clarkb | we added tonyb to the root list last week and promptly put him to work booting new servers :) | 19:07 |
tonyb | \o/ | 19:07 |
clarkb | mirror01.ord.rax is being replaced with a new mirror02.ord.rax server courtesy of tonyb | 19:07 |
clarkb | #link https://review.opendev.org/c/opendev/zone-opendev.org/+/900922 | 19:07 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/900923 | 19:07 |
clarkb | These changes should get the server all deployed, then we can confirm it is happy before updating DNS to flip over the mirror.ord.rax CNAMEs | 19:08 |
clarkb | I think the plan is to work through this one first and then start doing others | 19:08 |
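Once the new server is confirmed happy, the CNAME flip can be checked from any client; a minimal sketch, assuming the record names from the changes above:

```shell
# Verify mirror.ord.rax resolves to the replacement after the DNS change lands.
dig +short CNAME mirror.ord.rax.opendev.org
# Expected output once the flip lands (assumption): mirror02.ord.rax.opendev.org.
```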
tonyb | After a good session booting mirror02 I managed to clip some of the longer strings, and so the reviews took me longer to publish | 19:08 |
clarkb | tonyb: I did run two variations of ssh-keyscan in order to double check the data | 19:08 |
tonyb | clarkb: Thanks | 19:08 |
clarkb | I think it is correct and noted that in my reviews when I noticed the note about the copy paste problems | 19:08 |
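A sketch of the two-variation double check described above, assuming the new server's hostname:

```shell
# Variation 1: fetch the host keys directly for comparison against the review.
ssh-keyscan -t rsa,ecdsa,ed25519 mirror02.ord.rax.opendev.org

# Variation 2: pipe the same scan through ssh-keygen to compare fingerprints
# rather than full key material.
ssh-keyscan -t rsa,ecdsa,ed25519 mirror02.ord.rax.opendev.org | ssh-keygen -lf -
```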
clarkb | feel free to continue asking questions and poking for reviews. This is really helpful | 19:09 |
tonyb | I started writing a "standalone" tool for handling the volume setup as the mirrors nodes are a little different | 19:09 |
tonyb | Yup I certainly will do. | 19:10 |
clarkb | tonyb: ++ to having the mirror volumes a bit more automated | 19:10 |
fungi | agreed, we have enough following that pattern that it could be worthwhile | 19:11 |
fungi | note that not all mirror servers get that treatment though, some have a sufficiently large rootfs that we just leave as-is and don't create additional volumes | 19:11 |
tonyb | I think that's about it for the mirror nodes. It's mostly carefully following the bouncing ball at this stage | 19:11 |
clarkb | cool. I'm happy to do another run-through too if we like. I feel like that was helpful for everyone as it made problems with cinder volume creation apparent and so on | 19:12 |
tonyb | fungi: Yup. And as we can't always predict the device name in the guest, it won't be fully automated or integrated; it's just to document/simplify the creation work we did on the meetpad | 19:13 |
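The manual steps such a standalone tool would wrap look roughly like this (volume name, size, filesystem, and mount point are assumptions; as noted, the in-guest device name has to be confirmed by hand):

```shell
# Create a cinder volume and attach it to the new mirror (names/sizes assumed).
openstack volume create --size 200 mirror02-cache
openstack server add volume mirror02.ord.rax.opendev.org mirror02-cache

# On the guest, after confirming which device actually appeared:
lsblk                                  # identify the new device, e.g. /dev/xvdb
mkfs.ext4 -L mirror-cache /dev/xvdb
mount /dev/xvdb /var/cache/mirror
```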
fungi | i too am happy to do another onboarding call, maybe for other activities | 19:13 |
* tonyb | too. | 19:13 |
clarkb | anything else on this topic? | 19:14 |
tonyb | not from me | 19:14 |
clarkb | #topic Python Container Updates | 19:14 |
clarkb | Unfortunately I haven't really had time to look at the failures here in more detail. I saw tonyb asking questions about them though, were you looking? | 19:15 |
clarkb | #link https://review.opendev.org/c/zuul/zuul-operator/+/881245 Is the zuul-operator canary change | 19:15 |
clarkb | specifically we need that change to begin passing in zuul-operator before we can land the updates for the docker image in that repo | 19:15 |
tonyb | I am looking at it | 19:16 |
tonyb | I spoke to dpawlik about status and background | 19:16 |
corvus | i suspect something has bitrotted with cert-manager; but with the switch in k8s setup away from docker, we don't have the right logs collected to see it, so that's probably the first task | 19:16 |
tonyb | No substantial progress but I'm finding my feet there | 19:17 |
corvus | (in other words, the old k8s setup got us all container logs via docker, but the new setup needs to get them from k8s and explicitly fetch from all namespaces) | 19:17 |
clarkb | gotcha | 19:17 |
clarkb | because we are no longer using docker under k8s | 19:17 |
corvus | yep | 19:18 |
clarkb | I agree, addressing log collection seems like a good next step | 19:18 |
tonyb | Okay that's good to know. | 19:18 |
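A sketch of what the updated collection needs to do per corvus's description: enumerate every namespace and fetch container logs through the k8s API instead of docker (output path assumed):

```shell
# Gather logs from all containers in all namespaces, since they are no
# longer visible to docker on the host.
mkdir -p container-logs
for ns in $(kubectl get namespaces -o jsonpath='{.items[*].metadata.name}'); do
  for pod in $(kubectl get pods -n "$ns" -o name); do
    kubectl logs -n "$ns" "$pod" --all-containers --prefix \
      > "container-logs/${ns}-${pod#pod/}.txt"
  done
done
```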
clarkb | #topic Gitea 1.21 | 19:19 |
clarkb | 1.21.0 has been released | 19:20 |
clarkb | #link https://github.com/go-gitea/gitea/blob/v1.21.0/CHANGELOG.md we have a changelog | 19:20 |
fungi | (and there was much rejoicing) | 19:20 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/897679 Upgrade change needs updating now that we have changelog info | 19:20 |
clarkb | so ya the next step here is to go over the changelog and make sure our change is modified properly to handle their breaking changes | 19:20 |
clarkb | I haven't even looked at the changelog yet | 19:21 |
clarkb | but doing so and modifying that change is on my todo | 19:21 |
clarkb | *todo list | 19:21 |
clarkb | In the past we've often not upgraded until the .1 release anyway due to them very quickly releasing bugfixes | 19:21 |
fungi | nobody ever wants to go first | 19:22 |
clarkb | between that and the gerrit upgrade and then thanksgiving I'm not sure this is urgent, but also don't want it to get forgotten | 19:22 |
fungi | i agree that the next two weeks are probably not a great time to merge it, but i'll review at least | 19:22 |
clarkb | sounds good. Should have something to look at in the next day or so | 19:23 |
frickler | I'm wondering about the key length thing, how much effort would it be to use longer keys? | 19:23 |
tonyb | FWIW I'll review it too and, probably, ask "why do we $x" questions ;P | 19:23 |
clarkb | frickler: we need to generate a new key, add it to the gerrit user in gitea (this step may be manual; currently I think we only automate this at user creation time) and then add the key to gerrit and restart gerrit to pick it up | 19:24 |
clarkb | frickler: I suspect that if we switch to ed25519 then we can have it sit next to the existing rsa key in gerrit and we don't have to coordinate any moves | 19:24 |
clarkb | if we replace the shorter rsa key with a longer rsa key then we'd need a bit more coordination | 19:24 |
fungi | well, we could have multiple rsa keys too, right? | 19:25 |
clarkb | fungi: I don't think gerrit will find multiple rsa keys | 19:25 |
clarkb | but I'm not sure of that. We can test that on a held node I guess | 19:25 |
fungi | oh, right, local filename collision | 19:25 |
fungi | we can do two different keytypes because they use separate filenames | 19:26 |
clarkb | yup | 19:26 |
clarkb | I can look into that more closely as I page the gitea upgrade stuff back in | 19:26 |
fungi | i was thinking in the webui, not gerrit as a client | 19:26 |
fungi | so yeah, i agree parallel keys probably makes the transition smoother than having to swap one out in a single step | 19:27 |
clarkb | speaking of Gerrit: | 19:27 |
clarkb | #topic Gerrit 3.8 Upgrade | 19:27 |
fungi | though i guess if we add the old and new keys to gitea first, then we could swap rsa for rsa on the gerrit side | 19:27 |
fungi | but might need a restart | 19:27 |
clarkb | it will need a restart of gerrit in all cases iirc | 19:27 |
clarkb | because it reads the keys on startup | 19:27 |
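A sketch of the parallel-key approach under discussion: the two key types use distinct default filenames, so they can coexist for the replication user (key path and comment are assumptions; the real location depends on our replication config):

```shell
# Generate an ed25519 key alongside the existing RSA one; ssh clients try
# id_ed25519 and id_rsa by default, so there is no filename collision.
ssh-keygen -t ed25519 -f ~gerrit/.ssh/id_ed25519 -N '' -C 'gerrit replication'
# Then add the new public key to the gerrit user in gitea, and restart
# gerrit so it re-reads its keys on startup.
```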
clarkb | For the Gerrit upgrade I'm planning on going through the etherpad again tomorrow | 19:28 |
clarkb | #link https://etherpad.opendev.org/p/gerrit-upgrade-3.8 | 19:28 |
clarkb | I want to make sure I understand the screen logging magic a bit better | 19:28 |
clarkb | but also would appreciate reviews of that plan if you haven't read it yet | 19:28 |
fungi | also for the sake of the minutes... | 19:28 |
fungi | #link https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/thread/XT26HFG2FOZL3UHZVLXCCANDZ3TJZM7Q/ Upgrading review.opendev.org to Gerrit 3.8 on November 17, 2023 | 19:28 |
fungi | i figured you were going to include that in the announcements at the beginning | 19:28 |
clarkb | as far as coordination goes on the day of I expect I can drive things, but maybe fungi you can do some of the earlier stuff like adding hosts to emergency files and sending #status notice notices | 19:29 |
clarkb | I'll let you know if my expectations for that change, but I don't expect them to | 19:29 |
fungi | happy to. i think i'm driving christine to an eye appointment, but can do basic stuff from my phone in the car | 19:30 |
fungi | (also the appointment is about 2 minutes from the house) | 19:31 |
clarkb | seems like we are in good shape. And I'll triple check myself before Friday anyway | 19:31 |
tonyb | I can potentially do some of the "non-destructive" early work | 19:31 |
clarkb | tonyb: oh! we should add you to the statusbot acls | 19:31 |
fungi | we should add tonyb to statusbot | 19:31 |
clarkb | and generally irc acls | 19:31 |
tonyb | but that may make more work than doing it | 19:31 |
fungi | hah, jinx! | 19:31 |
tonyb | hehe | 19:32 |
fungi | tonyb: it's work that needs doing sometime anyway | 19:32 |
tonyb | so who owes who a soda? | 19:32 |
fungi | i can take care of it | 19:32 |
tonyb | kk | 19:32 |
fungi | i owe everyone soda anyway | 19:32 |
tonyb | LOL | 19:32 |
clarkb | openstack/project-config/accessbot/channels.yaml is one file that needs editing | 19:32 |
fungi | still repaying from my ptl days | 19:32 |
tonyb | I can do that. | 19:33 |
clarkb | I'm not actually sure where statusbot gets its user list. Does it just check for opers in the channel it is in? | 19:33 |
fungi | i'll look into it | 19:33 |
corvus | i think it's a config file | 19:34 |
clarkb | nope, it's statusbot_auth_nicks in system-config/inventory/service/group_vars/eavesdrop.yaml | 19:34 |
clarkb | tonyb: ^ so that file too | 19:34 |
fungi | thanks, i was almost there | 19:34 |
tonyb | gotcha | 19:34 |
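For reference, the shape of the edit being described; a hypothetical excerpt, as the real file contents differ:

```yaml
# system-config/inventory/service/group_vars/eavesdrop.yaml
statusbot_auth_nicks:
  # ...existing nicks...
  - tonyb
# openstack/project-config/accessbot/channels.yaml needs a matching
# operator entry (structure assumed).
```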
clarkb | anything else Gerrit upgrade related? | 19:34 |
fungi | i'm getting slow this afternoon, must be time to start on dinner | 19:35 |
clarkb | it's basically lunch time. I'm starving | 19:35 |
tonyb | Coffee o'clock and then a run. ... and then lunch | 19:35 |
clarkb | alright next up | 19:36 |
clarkb | #topic Ironic Bug Dashboard | 19:36 |
clarkb | #link https://github.com/dtantsur/ironic-bug-dashboard | 19:36 |
clarkb | The ironic team is asking if we would be willing to run an instance of their bug dashboard tool for them | 19:36 |
fungi | JayF: you were going to speak to this one? | 19:36 |
JayF | So, some context: this is an old bug dashboard. No auth needed. Simplest python app ever. | 19:36 |
fungi | otherwise i can | 19:36 |
JayF | We've run it in various places, always set up custom-ly; before doing that again with our move to LP, we thought we'd ask about getting it a real home. | 19:37 |
JayF | No dependencies. Literally just needs a place to run, and I think dtantsur wrote a dockerfile for it the other day, too | 19:37 |
clarkb | My major concern is that running a service for a single project feels very inefficient from our side. If someone wanted to resurrect the openstack bug dashboard instead I feel like that might be a little different? | 19:37 |
fungi | so options are for adding it to opendev officially (deployment via ansible/container image building and testinfra tests), or us booting a vm for them to manage themselves | 19:38 |
tonyb | The docs show using podman etc so yeah I think that's been done | 19:38 |
clarkb | additionally teams like tripleo have had one tool and ironic has one apparently and so on. I think it is inefficient for the project teams too | 19:38 |
fungi | for historical reference, "the openstack bug dashboard" was called "bugday" | 19:38 |
JayF | clarkb: I talked to dtantsur; we are extremely willing to take patches (and will email the list about this existing again once we get it a home) if other teams want to use it | 19:38 |
frickler | JayF: so would you be willing to run this yourself if we give you a vm with a DNS record? | 19:40 |
JayF | fungi: it's extremely likely if infra says no, and we host it out of band, we'd do something similar to the second option (just get a VM somewhere and run it manually) | 19:40 |
JayF | frickler: replace instances of "you" and "yourself" with ironic community as appropriate and the answer is "yes", with specific contacts being dtantsur and I to start | 19:40 |
JayF | frickler: if you all had no answer for us, nonzero chance this ended up as a container in my homelab :) | 19:41 |
frickler | that would be an easy start and we could see how it develops | 19:41 |
clarkb | so basically the idea behind openstack infra and now opendev was that we'd avoid doing stuff like this and instead create a commons where projects could work together to address common problems | 19:41 |
fungi | yeah, when this came up yesterday in #openstack-ironic i mentioned the current situation with the opensearch-backed log ingestion service dpawlik set up | 19:42 |
clarkb | where we've struggled is when projects do things like this specific tool and blaze their own trail. This takes away potential commons resources as well as multiplies effort required | 19:42 |
JayF | From an infra standpoint; I'm with you. | 19:42 |
JayF | This is why it's an opportunistic ask with almost an expectation that "no" was a likely answer. | 19:42 |
clarkb | I think that if we were to host it it would need to be a more generic tool for OpenDev users and not ironic specific. I have fewer concerns with handing over a VM | 19:42 |
JayF | From a community standpoint; that was storyboard; we adopted it; it disappeared; we are trying to dig out from that mistake | 19:42 |
frickler | iiuc the tool is open to be used by other projects, they just need to amend it accordingly | 19:43 |
fungi | i do think we want to encourage people to collaborate on infrastructure that supports their projects when there is a will to do so | 19:43 |
JayF | and I do not want to burn more time trying to go down alternate "work together" paths in pursuit of that goal | 19:43 |
clarkb | JayF: the problem is that all the cases of not working together are why we have massive debt | 19:43 |
clarkb | ironic is not the only project trying to deal with storyboard for example | 19:44 |
JayF | clarkb: I have lots of examples of cases of us working together that also have massive debt; so I'm not sure I agree with all of the root causing, but I do understand what you're getting at and like I said, if the answer is no, it's no. | 19:44 |
fungi | basically the risk is that the opendev sysadmins are the last people standing when whoever put some service together disappears and there are still users | 19:44 |
clarkb | and despite my prodding very little collaboration between teams with the same problems has occured as an example | 19:44 |
fungi | so we get to be the ones who tell users "sorry, nobody's keeping this running any more" | 19:44 |
clarkb | the infra sig continues to field questions about how to set up LP | 19:45 |
clarkb | stuff that should have ideally been far more coordinated among the groups moving | 19:45 |
fungi | i mostly just remind folks that we don't run launchpad, and it has documentation | 19:45 |
clarkb | and I can't shake the feeling that an ironic bug dashboard is just an extension of these problems and we'll end up being asked to run a different tool for nova and then a different one for sdks and so on | 19:46 |
JayF | This is off topic for the meeting, but the coordination is always the most difficult part ime; which is why for Ironic's LP migration it finally started moving when I stopped trying so hard to pull half of openstack with me. | 19:46 |
clarkb | when what we need as a group is rough agreement on what a tool should be and then run that. And as mentioned before this tool did exist | 19:46 |
clarkb | but it too ran into disrepair and was no longer maintained and we shut it off | 19:46 |
JayF | It sounds like consensus is no though; so for this topic you all can move on. I wouldn't want you all to host it unless everyone was onboard, anyway. | 19:47 |
clarkb | I don't think we necessarily need to resurrect bugday the code base, but I think if opendev hosts something it should be bugday the spiritual successor tool and not an ironic specific tool | 19:47 |
fungi | i think it can be "not yet" instead of just "no"? | 19:47 |
fungi | also i'm not opposed to booting a vm for them to run it on themselves, while they work on building consensus across other teams to possibly make it useful beyond ironic's use case | 19:48 |
JayF | I just sent an email to the mailing list, last week, about how a cornerstone library to OpenStack is rotting and ~nobody noticed. I'm skeptical someone is going to take up the banner of uniting bug dashboards across openstack. | 19:48 |
JayF | fungi: I do not commit to building such a consensus. I commit to being open to accepting patches. | 19:48 |
fungi | with the expectation that if opendev is going to officially take it on, then there will need to be more of a cross-project interest (and of course configuration management and tests) | 19:49 |
clarkb | ya I'm far less concerned with booting a VM and adding a DNS record | 19:49 |
JayF | fungi: not trying to be harsh; just trying to set a reasonable expectation to be clear :) | 19:49 |
JayF | my plate is overflowing and I can't fit another ounce on it | 19:49 |
fungi | sure. and we've all been there more than once, i can assure you ;) | 19:49 |
fungi | JayF: so there are some options and stipulations you can take back to the ironic team for further discussion, i guess | 19:50 |
JayF | If you want to give us a VM and a DNS name, that will work for us. If not, I'll go get equivalent from my downstream/personal resources and my next steps are the same either way | 19:51 |
corvus | i'm not sure i'm a fan of the "boot a vm and hand it over" approach | 19:51 |
corvus | if a vm is going to be handed over, i don't see why that's an opendev/infra team ask... i don't feel like we're here to hand out vms, we're here to help facilitate collaboration. anyone can propose a patch to run a service if the service fits the mission. so if it does fit the mission, that's how it should be run. and if it doesn't, then it shouldn't be an opendev conversation. | 19:51 |
fungi | should we not have provided the vm for the log ingestion system that loosely replaced the old logstash system? mistake in your opinion, or failed experiment, or...? | 19:53 |
corvus | i thought that ran on aws or something | 19:53 |
clarkb | the opensearch cluster runs in aws, but there is a node that fetches logs and sends them to opensearch that dpawlik is managing | 19:54 |
fungi | the backend does, but the custom log ingestion glue to zuul's interface is on a vm we booted for the systadmins | 19:54 |
fungi | er, s/systadmins/admins of that service/ | 19:54 |
corvus | i was unaware of that, and yeah, i think that's the wrong approach. for one, the fact that i'm a root member unaware of it and it's not documented in https://docs.opendev.org/opendev/system-config/latest/ seems like a red flag. :) | 19:54 |
corvus | that seems like something that fits the mission and should be run in the usual manner to me | 19:56 |
clarkb | ya better documentation of the exceptional node(s) is a good idea | 19:56 |
fungi | and possibly also deciding as a group that exceptions are a bad idea | 19:56 |
corvus | i think the wiki is an instructive example here too | 19:56 |
JayF | One thing I'll note that is a key difference about the service I proposed (and I suspect that logstash service) is their stateless nature. | 19:57 |
fungi | the main takeaway we had from the wiki is that we made it clear we would not take responsibility for the services running the log search service | 19:57 |
JayF | It doesn't address the basic philosophical questions; but it does draw a different picture than something like the wiki does. | 19:57 |
fungi | and that if the people maintaining it go away, we'll just turn it off with no notice | 19:57 |
corvus | yeah, in both new cases running them is operationally dead simple | 19:58 |
clarkb | (side note I think the original plan was to run the ingestion on the cluster itself but then realized that you can't really do that with opensearch as a service) | 19:59 |
corvus | i must have gotten the first version of the memo and not the update | 19:59 |
clarkb | because they delete and replace servers or something for upgrades. It's basically an appliance | 20:00 |
clarkb | we are at time. | 20:00 |
clarkb | #topic Upgrade Server Pruning | 20:00 |
clarkb | #undo | 20:00 |
opendevmeet | Removing item from minutes: #topic Upgrade Server Pruning | 20:00 |
clarkb | #topic Backup Server Backup Pruning | 20:00 |
clarkb | really quickly before we end I wanted to note that the rax backup server needs its backups pruned due to disk utilization | 20:01 |
clarkb | Maybe that is something tonyb wants to do with another root (ianw set it up and documented and scripted it well, so it's mostly a matter of going through the motions) | 20:01 |
tonyb | Yup happy to. | 20:01 |
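The underlying operation is borg's prune; the actual retention policy and wrapper script are ianw's and documented in system-config, so the values below are assumptions:

```shell
# Generic shape of a borg prune pass; real retention numbers and repo paths differ.
borg prune --stats \
  --keep-daily 7 --keep-weekly 4 --keep-monthly 6 \
  /opt/backups/example-server-borg-repo
```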
clarkb | #topic Open Discussion | 20:01 |
fungi | i'm also happy to help tonyb if there are questions about backup pruning | 20:02 |
clarkb | We don't really have time for this but feel free to take discussion to #opendev or service-discuss@lists.opendev.org to bring up extra stuff and/or keep talking about the boot a VM and hand it over stuff | 20:02 |
tonyb | fungi: thanks. | 20:02 |
clarkb | and happy 1700000000 day | 20:02 |
fungi | woo! | 20:02 |
clarkb | I think we are about 2 hours away? | 20:02 |
clarkb | something like that | 20:02 |
clarkb | thank you everyone for your time! | 20:03 |
clarkb | #endmeeting | 20:03 |
opendevmeet | Meeting ended Tue Nov 14 20:03:06 2023 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 20:03 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2023/infra.2023-11-14-19.00.html | 20:03 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2023/infra.2023-11-14-19.00.txt | 20:03 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2023/infra.2023-11-14-19.00.log.html | 20:03 |