nick | message | time |
---|---|---|
clarkb | #startmeeting infra | 19:00 |
opendevmeet | Meeting started Tue Jan 9 19:00:25 2024 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:00 |
opendevmeet | The meeting name has been set to 'infra' | 19:00 |
clarkb | #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/7IXDFVY34MYBW3WO2EEU3AIGOLAL6WRB/ Our Agenda | 19:00 |
clarkb | It's been a little while since we had these regularly | 19:00 |
clarkb | #topic Announcements | 19:01 |
clarkb | The OpenInfra Foundation Individual Board member election is happening now. Look for your ballot via email and vote. | 19:02 |
clarkb | This election also includes bylaw amendments to make the bylaws less openstack specific | 19:02 |
clarkb | If you expected to have a ballot and can't find it, please reach out. There may have been email delivery problems | 19:02 |
clarkb | Separately we're going to feature OpenDev on the OpenInfra Live stream/podcast/show (I'm not sure exactly how you'd classify it) | 19:03 |
clarkb | That will happen on January 18th at 1500 UTC? | 19:04 |
clarkb | I know the day is correct but not positive on the time. Feel free to tune in | 19:04 |
corvus | clarkb: i think the kids are calling it a "realplayer tv show" now ;) | 19:04 |
fungi | also some streaming platforms have the ability for you to heckle us and ask questions | 19:04 |
clarkb | #topic Topics | 19:06 |
clarkb | #topic Server Upgrades | 19:07 |
clarkb | I believe that tonyb has gotten all of the mirror nodes upgraded at this point | 19:07 |
clarkb | Not sure if tonyb is around for the meeting, but I think the plan was to look at meetpad servers next | 19:07 |
tonyb | Correct | 19:08 |
tonyb | I started looking at meetpad, One thing that worries me a little is I can't quite see how we add the jvb nodes to meetpad | 19:08 |
clarkb | tonyb: it should be automated via configuration somehow | 19:09 |
clarkb | tonyb: I can look into that after the meeting | 19:09 |
tonyb | it seems to just be "magic" and I don't want any new jvb nodes to auto-register with the existing meetpad | 19:09 |
tonyb | clarkb: Thanks | 19:09 |
clarkb | tonyb: yes it should be magic and it happens via xmpp iirc | 19:09 |
fungi | we've scaled up and down if you look at git history | 19:09 |
tonyb | Ah okay. | 19:10 |
clarkb | so ya one approach would be to have a new jvb join the old meetpad, then replace the old meetpad and have the new jvb join the new thing. Or update config management to allow two side by side installations then update dns | 19:10 |
clarkb | we'll need to sort out how the magic happens in order to make a decision on approach I think | 19:10 |
tonyb | That was my thinking | 19:10 |
corvus | (i think a rolling replacement sounds good, but i haven't thought about it deeply) | 19:12 |
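For context, the "magic" registration being discussed follows the usual docker-jitsi-meet pattern: each JVB authenticates to a shard's prosody over XMPP and joins an internal MUC, so which meetpad a JVB registers with is controlled entirely by its environment. A sketch, assuming a docker-jitsi-meet style compose file; the variable names follow that project's convention and may not match the actual config management:

```yaml
# hypothetical jvb service environment -- illustrative only
jvb:
  environment:
    - XMPP_SERVER=meetpad01.example.opendev.org  # which shard this JVB registers with
    - XMPP_AUTH_DOMAIN=auth.meet.jitsi
    - XMPP_INTERNAL_MUC_DOMAIN=internal-muc.meet.jitsi
    - JVB_AUTH_USER=jvb
    - JVB_AUTH_PASSWORD=changeme                 # must match the shard's prosody account
```

Under that assumption, pointing `XMPP_SERVER` (with matching credentials) at a new shard is what makes a fresh JVB auto-register there instead of with the existing meetpad, which is the knob either replacement approach would rely on.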
tonyb | I also looked at mediawiki and I'm reasonably close to starting that server. translate looks like we'll just turn it off when i18n are ready, but I'm trying to help them with new weblate tools | 19:12 |
corvus | (just mostly that since we're not changing any software versions, we'd expect it to work) | 19:12 |
tonyb | so that leaves cacti and storyboard to look at | 19:12 |
clarkb | tonyb: we've got a spec to add a prometheus and some agents on servers to replace cacti which is one option there | 19:12 |
clarkb | but maybe the easiest thing right now is to just uplift cacti? I don't know | 19:13 |
fungi | cacti was in theory going to be retired in favor of prometheus | 19:13 |
fungi | yeah that | 19:13 |
clarkb | I think the main issue with prometheus was figuring out the agent stuff. Running the service to collect the data is straightforward | 19:13 |
tonyb | Okay, I know ianw was thinking prometheus would be a good place for me to start so I'd be happy to look at that | 19:14 |
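As a sketch of the collection side mentioned above (hostnames are placeholders, not our inventory): prometheus pulls metrics from a node-exporter-style agent on each server, so the "straightforward" service piece is mostly a scrape target list:

```yaml
# minimal prometheus scrape configuration; targets are illustrative
scrape_configs:
  - job_name: node
    scrape_interval: 60s
    static_configs:
      - targets:
          - mirror01.example.opendev.org:9100  # node_exporter's default port
          - gitea09.example.opendev.org:9100
```

The open "agent stuff" question is then about getting an exporter deployed and firewalled on every server, not about this config.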
clarkb | alright let's move on, we have a fair number of things to discuss, and it sounds like we're continuing to make progress there. Thanks! | 19:14 |
clarkb | #topic Python container updates | 19:15 |
clarkb | The zuul registry service migrated to bookworm images so I've proposed a change to drop the bullseye images it was relying on | 19:15 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/905018 Drop Bullseye python3.11 images | 19:15 |
clarkb | That leaves us with zuul-operator on the bullseye python3.10 images as our last bullseye container images | 19:15 |
clarkb | #topic Upgrading Zuul's DB server | 19:16 |
clarkb | I realized while prepping for this meeting that I had completely spaced on this. | 19:16 |
tonyb | It happens at this time of year ;P | 19:16 |
clarkb | However, coincidentally hacker news had a post about postgres options recently | 19:16 |
clarkb | #link https://www.crunchydata.com/blog/an-overview-of-distributed-postgresql-architectures a recent rundown of postgresql options | 19:17 |
clarkb | I haven't read the article yet, but figured I should as a good next step on this item | 19:17 |
clarkb | did anyone else have new input to add? | 19:17 |
* tonyb shakes head | 19:18 |
clarkb | #topic EMS discontinuing legacy consumer hosting plans | 19:19 |
clarkb | fungi indicated that at the last meeting the general consensus was that we should investigate a switch to the newer plans | 19:19 |
clarkb | fungi: have we done any discussion about this on the foundation side yet? I'm guessing we need a general ack there then we can reach out to element about changing the deployment type? | 19:20 |
fungi | they indicated in the notice that they'd let folks on the old plan have a half-normal minimum user license | 19:20 |
fungi | i did some cursory talking to wes about it and it sounded like they'd be able to work it in for 2024 | 19:21 |
fungi | we would have to pay for a full year up front though | 19:21 |
clarkb | I don't expect we'll stop using matrix anytime soon | 19:21 |
clarkb | so that seems fine from a usage standpoint | 19:21 |
fungi | right, since we're supporting multiple openinfra projects with it, the cost is fairly easy to justify | 19:22 |
clarkb | fungi: in that case I guess we should reach out to Element. IIRC the email gave a contact for the conversion | 19:22 |
clarkb | maybe double check with wes that nothing has changed in the last few weeks before sending that email | 19:22 |
* clarkb scribbles a note to do this stuff | 19:22 |
fungi | will do | 19:22 |
tonyb | Also gives us this year to test self-hosting a homeserver | 19:23 |
fungi | we've still got about a month to sort it | 19:23 |
clarkb | right we have until February 7 | 19:23 |
frickler | do we really want to test self-hosting? also, would we get an export from element that would allow moving and keeping rooms and history? | 19:24 |
corvus | no export is needed; the system is fully distributed | 19:24 |
clarkb | they provided a link to a migration document in the email too | 19:24 |
clarkb | trying to find it | 19:25 |
fungi | but they do have a settings export we can use too | 19:25 |
clarkb | https://ems-docs.element.io/books/element-cloud-documentation/page/migrate-from-ems-to-self-hosted | 19:25 |
fungi | basically the homeserver config | 19:25 |
frickler | so you start a new homeserver with the same name and the rooms just magically migrate? | 19:25 |
tonyb | frickler: I think it's something to investigate during the year. Gives us more information for making a long term decision | 19:25 |
clarkb | we "own" the room names so ti would largely be history and room config to worry about aiui | 19:25 |
corvus | the rooms and their contents exist on all matrix servers involved in the federation (typically homeservers of users in those rooms) | 19:26 |
corvus | if the history is exported, cool, but in theory i think a replacement server should be able to grab the history from any other server | 19:27 |
clarkb | oh interesting. So if you stand up a new server and have the well known file say it is the :opendev.org homeserver then clients will talk to the new server. That new server will sync out of the federated state the history of its rooms | 19:28 |
corvus | that's what i'd expect. i have not tested it. | 19:28 |
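The delegation described above is the standard .well-known mechanism from the Matrix specification: a small file served from the base domain tells other servers where the opendev.org homeserver actually lives. The hostname below is illustrative:

```json
{ "m.server": "matrix.example.opendev.org:443" }
```

That document would be served at `https://opendev.org/.well-known/matrix/server` for server-to-server federation; the client-facing variant lives at `/.well-known/matrix/client` with an `m.homeserver`/`base_url` key. Repointing these files is what would switch traffic to a replacement server.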
clarkb | ack. Also looks like we can copy databases per the ems migration doc should that be necessary | 19:29 |
corvus | (you'd just need to use one of the other room ids initially) | 19:29 |
corvus | but i'm still in no rush to self-host. | 19:29 |
clarkb | in any case figuring that out is a next step. First up is figuring out a year of hosting | 19:29 |
clarkb | and if that is reasonable. Which I can help coordinate with fungi at the foundation and talking to element | 19:29 |
clarkb | #topic Followup on haproxy update being broken | 19:30 |
clarkb | There was a lot of info under this item but the two main points seem to be "should we be more explicit about the versions of docker images we consume" and "should we prune less aggressively" | 19:30 |
corvus | (like, i'm not looking at ems as an interim step based on our conversations so far -- but i agree that keeping aware of future options is good) | 19:30 |
clarkb | I think for haproxy in particular we can and should probably stick with their lts tag | 19:31 |
fungi | i think we mostly covered the haproxy topic at the last meeting, but happy to revisit since not everyone was present | 19:31 |
corvus | ++lts tag | 19:31 |
clarkb | fungi: ack. I wanted to bring up one thing primarily on pruning | 19:31 |
clarkb | One gotcha with pruning is that it seems to be based on the image build/creation time not when you started using the newer image(s) | 19:31 |
fungi | right, note that we hadn't actually pruned the old haproxy image we downgraded to, when i did the manual config change and pulled, it didn't need to retrieve the image | 19:32 |
clarkb | and so it is a bit of a clunky tool, but better than nothing for images like haproxy for example where we could easily revert | 19:32 |
clarkb | I'm happy for us to extend the time we keep images, but also be aware of this limitation with the pruning command | 19:32 |
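Concretely, the gotcha is that the prune filter compares against the image's build time, not the time we stopped using it. A sketch of the command involved (not our actual playbooks):

```shell
# removes unused local images whose *creation* time is older than 30 days.
# an image built months ago but replaced only yesterday is pruned immediately,
# so widening the window is the only safety margin this filter provides.
docker image prune --all --force --filter "until=720h"
```

That asymmetry is why extending the retention window helps, but can never guarantee the previously-running image is still on disk.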
corvus | i'm ambivalent about pruning because i'm not worried about not being able to pull an old version from a registry on demand | 19:33 |
fungi | the main thing it might offer is insurance against upstreams deleting their images | 19:33 |
fungi | but i don't think that's actually been an issue we've encountered yet? | 19:33 |
frickler | one concern of mine was being able to find out which last version it actually was that we were running | 19:33 |
corvus | i'm not eager to run an image that upstream has deleted either | 19:33 |
fungi | frickler: yes, if we could add some more verbosity around our image management, that could help | 19:34 |
clarkb | frickler: we could update our ansible runs to do something like a docker ps -a and docker image list | 19:34 |
clarkb | and record that in our deployment logs | 19:34 |
fungi | even if it's just something that periodically interrogates docker for image ids and logs them to a file | 19:34 |
fungi | or yeah that | 19:34 |
frickler | maybe even somewhere more persistent than zuul build logs would be good | 19:35 |
corvus | i agree with frickler that leaving an image sitting around for some number of days provides a good indication of what we were probably running before | 19:35 |
clarkb | ok so the outstanding need is better records of what docker images we ran during which timeframes | 19:36 |
corvus | (we could stick version numbers in prometheus; it's not great for that though, but it's okay as long as they don't change too often) | 19:36 |
clarkb | ya this will probably require a bit more brainstorming | 19:36 |
corvus | (the only way to do that with prometheus increases the cardinality of metrics with each new version number) | 19:36 |
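corvus's cardinality point in miniature (a pure-Python illustration, not prometheus code): a label value is part of a time series' identity, so every new version string mints a brand-new series, and the series count grows with each upgrade instead of staying constant:

```python
# each distinct (metric name, label set) pair is a separate time series
observed_versions = ["2.8.3", "2.8.4", "2.8.4", "2.9.0"]

series = set()
for version in observed_versions:
    # series identity = metric name + sorted label pairs
    series.add(("haproxy_version_info", (("host", "lb01"), ("version", version))))

# one logical metric, but three series: cardinality grows per version bump
print(len(series))
```

This is fine while versions change rarely, which is the "okay as long as they don't change too often" caveat above.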
clarkb | maybe start with the simple thing of having ansible record a bit more info then try and improve on that for longer term retention | 19:37 |
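The "simple thing" could look like the following (a sketch; the log path and exact format strings are assumptions about details the deployment would choose):

```shell
# append a timestamped inventory of running containers and local images,
# run from cron or at the end of each ansible deploy
{
  echo "=== $(date -u +%Y-%m-%dT%H:%M:%SZ) ==="
  docker ps --format '{{.Names}} {{.Image}} {{.ID}}'
  docker image ls --format '{{.Repository}}:{{.Tag}} {{.ID}} created={{.CreatedSince}}'
} >> /var/log/container-image-history.log
```

Grepping that log would answer frickler's "which version were we actually running last Tuesday" question without depending on zuul build log retention.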
clarkb | I'll continue on as we have a few more items to discuss | 19:38 |
clarkb | #topic Followup on haproxy update being broken | 19:38 |
clarkb | Similar to the last one I'm not sure if this reached a conclusion but two things worth mentioning have happened recently. First zuul's doc quota was increased | 19:38 |
frickler | that's the topic we just had? | 19:38 |
clarkb | bah yes | 19:39 |
clarkb | #undo | 19:39 |
opendevmeet | Removing item from minutes: #topic Followup on haproxy update being broken | 19:39 |
clarkb | #topic AFS Quota issues | 19:39 |
clarkb | copy and paste failure | 19:39 |
* fungi is now much less confused | 19:39 |
clarkb | Second is that there are some early discussions around having openeuler be more involved with opendev and possibly contributing some CI resources | 19:39 |
frickler | the zuul project quota was increased (not doc I think) | 19:39 |
clarkb | frickler: ya it hosts the zuul docs iirc | 19:40 |
clarkb | and website? | 19:40 |
frickler | IIUC the release artefacts | 19:40 |
clarkb | There may be an opportunity to leverage this interest in collaboration to clean up the openeuler mirrors and give feedback to them on the growth problems | 19:40 |
corvus | everything under zuul-ci.org is on one volume | 19:40 |
fungi | zuul's docs are part of its project website | 19:40 |
fungi | yeah that | 19:40 |
corvus | and i increased it to 5gb | 19:40 |
clarkb | ahah | 19:40 |
clarkb | essentially work with the interested parties to improve the situation around mirrors for openeuler and maybe our CI quotas | 19:41 |
clarkb | responding to their latest queries about the sizes of VMs and how many is on my todo list after meetings and lunch | 19:41 |
clarkb | (you know we write that stuff down in a document but 100% of the time the questions get asked anyway) | 19:42 |
frickler | do you have a reference to those openeuler discussions or are they private for now? | 19:42 |
corvus | they have an openstack cloud? | 19:42 |
clarkb | frickler: I think keeping the email discussion small while we sort out if it is even possible is good, but once we know if it will go somewhere we can do that more publicly | 19:43 |
clarkb | corvus: yes sounds like it? We tried to be explicit that what we need is an openstack api endpoint and accounts that can provision VMs | 19:43 |
frickler | yeah, I just wanted to know whether I missed something somewhere | 19:43 |
fungi | for transparency: openeuler representatives were in discussion with openinfra foundation staff members and offered to supply system resources, so the foundation staff are trying to put them in touch with us to determine more scope around it | 19:43 |
fungi | it's all been private discussions so far | 19:43 |
corvus | neat | 19:44 |
clarkb | were there other outstanding afs quota concerns to discuss? | 19:44 |
fungi | since openstack is a primary use case for their distro, they have a vested interest in helping test openstack upstream on it | 19:44 |
frickler | some other mirror volumes need watching | 19:45 |
clarkb | for centos stream I seem to recall digging around in those mirrors and we end up with lots of packages with many versions | 19:45 |
frickler | centos-stream and ubuntu-ports look very close to their limit | 19:46 |
clarkb | in theory we only need the newest 2 to avoid installation failures | 19:46 |
clarkb | we could potentially write a smarter syncing script that scanned through and deleted older versions | 19:46 |
clarkb | for ubuntu ports I had thought we were still syncing old versions of the distro that we could delete but we aren't so I'm not sure what we can do there | 19:46 |
clarkb | are we syncing more than arm64 packages maybe? like 32bit arm and or ppc? I think not | 19:47 |
clarkb | I don't think we have time to solve that in this meeting. Lets continue on as we have ~3 more topics to cover | 19:48 |
clarkb | #topic Broken wheel build issues | 19:48 |
frickler | I don't know, I just noticed these issues when checking whether we have room to mirror rocky | 19:48 |
clarkb | frickler: ack | 19:48 |
fungi | it's also possible that dropping old releases from our config isn't cleaning up the old packages associated with them | 19:49 |
clarkb | fungi: oh interesting. Worth double checking | 19:49 |
clarkb | for wheels I think we can stop building and mirroring them at any time because pip will prefer new sdists over old wheels right? so we don't even need to update the pip.conf in our test nodes | 19:49 |
fungi | correct | 19:49 |
clarkb | fungi: ^ you probably know off the top of your head if that is the case. But that would be my main concern is that we start testing older stuff accidentally if we stop building wheels | 19:50 |
fungi | unless you pass the pip option to prefer "binary" packages (wheels) | 19:50 |
clarkb | right | 19:50 |
fungi | but it's not on by default | 19:50 |
fungi | i'd treat that as a case of caveat emptor | 19:50 |
clarkb | in that case I think it is reasonable to send email to the service announce list indicating we plan to stop running those jobs in the future (say beginning of february) ask if anyone is interested in keeping them alive and if not jobs will fallback to building from source | 19:50 |
clarkb | the fallback is slower and may require some bindep file updates but it isn't going to hard stop anyone from getting work done on centos distros | 19:51 |
fungi | wfm | 19:51 |
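The pip behavior being relied on, sketched as commands (the package name is a placeholder): by default pip picks the best matching *version* first and only prefers a wheel within that version, so a newer sdist beats an older mirrored wheel; `--prefer-binary` flips that and is the non-default case fungi flags above.

```shell
# default resolution: the newest acceptable version wins,
# even if it is only available as an sdist
pip install somepackage

# prefer-binary: an older wheel is chosen over a newer sdist
pip install --prefer-binary somepackage

# the same knob in pip.conf form:
#   [install]
#   prefer-binary = true
```

Since the mirror's test nodes use the default behavior, retiring the wheel builds degrades gracefully to source builds rather than silently pinning jobs to stale versions.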
frickler | will we also clean out existing wheels at the same time? maybe keep the afs volume but not publish anymore? | 19:51 |
clarkb | frickler: I think we should keep the content for a bit as some of the existing wheels may be up to date for a while | 19:52 |
fungi | we could probably do it in phases | 19:52 |
frickler | ok | 19:52 |
clarkb | since pip's behavior is acceptable by default here we can still take advantage of the remaining benefit from the mirror for a bit | 19:52 |
clarkb | then maybe after 6-12 months clean it up | 19:52 |
clarkb | alright next topic | 19:53 |
clarkb | #topic Gitea repo-archives filling server disk | 19:53 |
fungi | fwiw, the python ecosystem has gotten a lot better about making cross-platform wheels for releases of things now, and in a more timely fashion | 19:53 |
fungi | so our pre-built wheels are far less necessary | 19:53 |
clarkb | when you ask gitea for a repo archive (tarball/zip/.bundle) it caches that on disk | 19:53 |
clarkb | then once a day it runs an internal cron task (using a go library implementation of cron, not system cron) to clean up any repo archives that are more than a day old | 19:54 |
fungi | oh, yeah this is a fun one. i'd somehow already pushed it to the back of my mind | 19:54 |
frickler | can we disable that functionality? we do have our own tarballs instead (at least for openstack)? | 19:54 |
corvus | i'm guessing people do that a lot to get releases even though like zero opendev projects make releases that way? | 19:54 |
corvus | what frickler said :) | 19:54 |
fungi | s/people/web crawlers/ i think | 19:55 |
clarkb | upstream indicated it could be web crawlers | 19:55 |
clarkb | so their suggestion was to update our robots.txt | 19:55 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/904868 update robots.txt on upstream's suggestion | 19:55 |
clarkb | and no we can't disable the feature | 19:55 |
clarkb | at least I haven't found a way to do that | 19:55 |
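Since gitea serves archives at paths like `/{org}/{repo}/archive/{ref}.tar.gz`, the robots.txt approach amounts to a wildcard disallow. A sketch only; the actual rules are in the linked review:

```
User-agent: *
Disallow: /*/*/archive/
```

Note the `*` path wildcard is an extension honored by major crawlers rather than part of the original robots.txt convention, so this reduces archive generation from well-behaved bots but is not an enforcement mechanism.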
clarkb | the problem is the daily cleanup isn't actually cleaning up everything more than a day old | 19:55 |
clarkb | I've spent a bit of time rtfs'ing and looking at the database and I can't figure out why it is broken but you can see on gitea12 that it falls about 4 hours behind each time it runs so we end up leaking and filling the disk | 19:56 |
clarkb | In addition to reducing the number of archives generated by asking bots to leave them alone we can also run a cron job that simply deletes all archives | 19:56 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/904874 Run weekly removal of all cached repo archives | 19:56 |
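A sketch of what such a weekly cleanup could look like (the archive path is an assumption about where the gitea data volume is mounted; the real implementation is in the linked review):

```shell
# weekend crontab entry: drop every cached repo archive. gitea regenerates
# archives on demand, so the worst case is one slow request after cleanup.
0 4 * * 6  find /var/gitea/data/repo-archive -type f -delete
```

Deleting everything sidesteps the broken age-based bookkeeping entirely, which is why it is safer than trying to patch the daily task.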
frickler | does gitea break if we make the cache non-writeable? | 19:56 |
clarkb | frickler: I haven't tested that but I would assume so. I would expect a 500 error when you request the archive | 19:57 |
frickler | which would also be like disabling it kind of | 19:57 |
fungi | i suppose it depends on your definition of "break" ;) | 19:57 |
clarkb | since we are already trying to delete archives more than a day old deleting all archives once a week on the weekend seems safe | 19:57 |
clarkb | and when you ask it to delete all archives it does successfully delete all archives | 19:57 |
clarkb | I would prefer we not intentionally create 500 errors | 19:58 |
clarkb | there are valid reasons to get repo archives | 19:58 |
clarkb | I also noticed when looking at the cron jobs that gitea has a phone home to check if it is running the latest release cron job | 19:58 |
corvus | the cron might have a small window of breakage, but should immediately work on a retry so lgtm | 19:59 |
clarkb | I pushed https://review.opendev.org/c/opendev/system-config/+/905020 to disable that cron job because I hate the idea of a phone home for that | 19:59 |
clarkb | our hour is up and I have to context switch to another meeting | 20:00 |
clarkb | #topic Service Coordinator Election | 20:00 |
clarkb | really quickly before I end the meeting I wanted to call out that we're approaching the service coordinator election timeframe. I need to dig up emails to determine when I said that would happen (I believe it is end of january early february) | 20:01 |
clarkb | nothing for anyone to do at this point other than consider if they wish to assume the role and nominate themselves. And I'll work to get things official via email | 20:01 |
tonyb | If it matches openstack PTL/TC elections then they'll start in Feb | 20:01 |
clarkb | tonyb: its slightly offset | 20:01 |
tonyb | okay | 20:01 |
clarkb | #topic Open Discussion | 20:01 |
clarkb | Anything else important before we call the meeting? | 20:02 |
tonyb | nope | 20:03 |
clarkb | sounds like no. Thank you everyone for your time and help running the opendev services! | 20:03 |
clarkb | we'll be back next week same time and location | 20:03 |
clarkb | #endmeeting | 20:03 |
opendevmeet | Meeting ended Tue Jan 9 20:03:27 2024 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 20:03 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2024/infra.2024-01-09-19.00.html | 20:03 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2024/infra.2024-01-09-19.00.txt | 20:03 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2024/infra.2024-01-09-19.00.log.html | 20:03 |
corvus | thanks clarkb ! | 20:03 |