clarkb | almost meeting time | 18:58 |
---|---|---|
clarkb | #startmeeting infra | 19:00 |
opendevmeet | Meeting started Tue Nov 28 19:00:15 2023 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:00 |
opendevmeet | The meeting name has been set to 'infra' | 19:00 |
clarkb | #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/3B75BMYDEBIQ56DW355IGF72ZH6JVVQI/ Our Agenda | 19:00 |
clarkb | #topic Announcements | 19:00 |
clarkb | I didn't have anything to announce. | 19:00 |
clarkb | OpenInfra foundation individual board member seat nominations are open now | 19:01 |
clarkb | if that interests you I'm sure we can point you in the right direction | 19:01 |
clarkb | I'll give it a couple more minutes before diving into the agenda | 19:02 |
clarkb | #topic Server Upgrades | 19:05 |
clarkb | tonyb continues to push this along | 19:05 |
clarkb | #link https://review.opendev.org/q/topic:%22mirror-distro-updates%22+status:open | 19:05 |
clarkb | there are three mirrors all booted and ready to be swapped in now. Just waiting on reviews | 19:05 |
clarkb | one thing tonyb and I discovered yesterday is that the launcher venv cannot create new volumes in rax. We had to use fungi's xyzzy env for that. The xyzzy env cannot attach the volume :/ | 19:06 |
clarkb | fungi: so maybe don't go cleaning up that env anytime soon :) | 19:06 |
clarkb | tonyb: once we get those servers swapped in we'll need to go through and clean out the old servers too. I'm happy to sit down for that and we can go over some other root topics as well | 19:07 |
fungi | yeah, a good quiet-time project for someone might be to do another round of bisecting sdk/cli versions to figure out what will actually work | 19:07 |
fungi | i think the launch venv might be usable for all those things? and we just didn't try it for volume creation | 19:08 |
fungi | but then ended up using it for volume attachment | 19:08 |
tonyb | clarkb: Yup. That'd probably be good to have extra eyes | 19:08 |
frickler | iiuc the intention for latest sdk/cli is still to support rax, so reporting bugs if things don't work would be an option, too | 19:09 |
clarkb | fungi: yes, the launch env worked for everything but volume creation. volume creation failed | 19:09 |
clarkb | frickler: ya we can also run with the --debug flag to see what calls are actually failing | 19:10 |
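[Editor's note: a minimal sketch of what that debugging could look like, assuming a clouds.yaml entry for the Rackspace account; the cloud name, volume size, and server name below are illustrative placeholders, and `openstack.enable_logging(debug=True)` is the openstacksdk equivalent of the CLI's global `--debug` flag.]

```python
# Hedged sketch: reproduce the failing volume create/attach calls with full
# request/response logging so the broken API call can be reported upstream.
# "rax-dfw", the size, and the server name are placeholders, not real values.
import openstack

openstack.enable_logging(debug=True)  # log every HTTP request/response
conn = openstack.connect(cloud="rax-dfw")

# The call that failed from the launch venv.
volume = conn.create_volume(size=100, name="mirror-test-volume", wait=True)

# The call that failed from the other (xyzzy) environment.
server = conn.compute.find_server("mirror-test-server")
conn.attach_volume(server, volume)
```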
tonyb | frickler: I think so. The challenge is the CLI/SDK team don't have easy access to testing (yet) | 19:10 |
clarkb | anyway reviews for the current set of nodes would be good so we can get them in place and then figure out cleanup of the old nodes | 19:11 |
clarkb | anything else related to this? | 19:11 |
frickler | tonyb: that's why feedback from us would be even more valuable | 19:11 |
tonyb | frickler: fair point | 19:12 |
tonyb | clarkb: not from me. | 19:12 |
clarkb | #topic Python Container Updates | 19:12 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/898756 And parent add python3.12 images | 19:12 |
clarkb | At this point I think adding python3.12 images is the only action we can take as we are still waiting on the zuul-operator fixups. I have not personally had time to look into that more closely | 19:13 |
clarkb | That said I don't think anything should stop us from adding those images | 19:13 |
tonyb | Neither have I. It's in the "top 5" items on my todo list | 19:13 |
clarkb | #topic Gitea 1.21 | 19:14 |
clarkb | Gitea just released a 1.20.6 bugfix release that we should upgrade to prior to upgrading to 1.21 | 19:14 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/902094 Upgrade gitea to 1.20.6 first | 19:15 |
clarkb | They also made a 1.21.1 release which I bumped our existing 1.21 change to | 19:15 |
clarkb | in #opendev earlier today we said we'd approve the 1.20.6 update after this meeting. I think that still works for me, though I will be popping out from about 2100-2230 UTC | 19:15 |
clarkb | My hope is that later this week (maybe thursday at this rate?) I'll be able to write a change for the gerrit half of the key rotation and then generate a new key and stash it in the appropriate locations | 19:16 |
tonyb | Sounds good. | 19:16 |
clarkb | That said the gitea side of key rotation is ready for review and landable as is: | 19:16 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/901082 Support gitea key rotation | 19:16 |
clarkb | The change there is set up to manage the single existing key and we can do a followup to add the new key | 19:17 |
clarkb | for clarity I think the rough plan here is 0) upgrade to 1.20.6 1) add gitea key rotation support 2) add gerrit key rotation support 3) add new key to gitea 4) add new key to gerrit 5) use new key in gerrit 6) remove old key from gitea (and gerrit?) 7) upgrade gitea | 19:18 |
clarkb | steps 0) and 1) should be good to go. | 19:18 |
tonyb | Seems like a plan, FWIW, I'll look again at 0 and 1 | 19:19 |
clarkb | #topic Upgrading Zuul's DB Server | 19:19 |
clarkb | #link https://etherpad.opendev.org/p/opendev-zuul-mysql-upgrade info gathering document | 19:19 |
clarkb | I haven't had time to dig into db cluster options yet | 19:20 |
frickler | I'm wondering whether we could reuse what kolla does for that | 19:20 |
clarkb | Looking at the document it seems like some conclusions can be made though. Backups are not currently critical, database size is about 18GB uncompressed so the server(s) don't need to be large, and the database should not be hosted on zuul nodes because we auto upgrade zuul nodes | 19:21 |
clarkb | frickler: that is an interesting idea. | 19:22 |
tonyb | Yup. Given it won't be on any zuul servers I guess the RAM requirements are less interesting | 19:22 |
tonyb | frickler: Can you drop some pointers? | 19:22 |
fungi | also we can resize the instances if we run into memory pressure | 19:22 |
frickler | I need to look up the pointers in the docs, but in general there is quite a bit of logic in there to make things like upgrades work without interruption | 19:23 |
fungi | at one point we had played around with percona replicating to a hot standby | 19:23 |
clarkb | yes you need a lot of explicit coordination unlike say zookeeper | 19:23 |
clarkb | and you have to run a proxy | 19:24 |
fungi | may have relied on ndb? | 19:24 |
frickler | kolla uses either haproxy or proxysql | 19:25 |
clarkb | I don't remember that. The zuul-operator uses percona xtradb cluster and I think kolla uses galera | 19:25 |
clarkb | which are very similar backends and then ya a proxy in front | 19:25 |
corvus | looks like kolla may use galera. that's one of the options (in addition to percona xtradb, and whatever postgres does for clustering these days) | 19:25 |
corvus | i don't think ndb is an option due to memory requirements | 19:26 |
fungi | i trust the former mysql contributors in these matters, i'm mostly database illiterate | 19:26 |
clarkb | one thing we should look at too is whether or not we can promote an existing mysql/mariadb to a galera/xtradb cluster, and similar with postgres | 19:27 |
corvus | (and the sort of archival nature of the data seems like not a great fit for ndb; though it is my favorite cluster tech just because of how wonderfully crazy it is) | 19:27 |
clarkb | then one option we may have is to start with a spof, which isn't a regression, and later add in the more complicated load balanced cluster | 19:27 |
fungi | in theory the trove instance is already a spof | 19:27 |
clarkb | yes that is why this isn't a regression | 19:28 |
corvus | clarkb: i think that's useful to know, but in all cases, a db migration for us won't be too burdensome | 19:28 |
corvus | worst case we're talking like an hour on a weekend for an outage if we want to completely change architecture | 19:28 |
clarkb | good point | 19:29 |
corvus | (so i agree, a good plan might look like "move to a spof mariadb and then make it better later" but also it's not the end of the world if we decide "move to a spof mariadb then move to a non-spof mariadb during a maint window") | 19:30 |
corvus | anyway, seems like a survey of HA options is still on the task list | 19:31 |
clarkb | fwiw it looks like postgres ha options are also fairly involved and require you to manage fault identification and failover | 19:31 |
clarkb | ++ let's defer any decision making until we have a bit more data. But I think we're leaning towards running our own system on dedicated machine(s) at the very least | 19:31 |
clarkb | #topic Annual Report Season | 19:32 |
clarkb | #link https://etherpad.opendev.org/p/2023-opendev-annual-report OpenDev draft report | 19:32 |
clarkb | I've done this for a number of years now. I'll be drafting a section of the openinfra foundation's annual report that covers opendev | 19:32 |
clarkb | I'm still in the "brainstorming, just get something started" phase but feel free to add items to the etherpad | 19:33 |
clarkb | Once I've actually written something I'll ask for feedback as well. I think they want them written by the 22nd of December, something like that | 19:33 |
fungi | i'll try to get preliminary 2023 engagement report data to you soon, though the mailing list measurements need to be reworked for mm3 | 19:33 |
tonyb | Okay so there is some time, but not lots of time | 19:34 |
clarkb | tonyb: ya it's a bit earlier than previous years too. Usually we have until the first week of January | 19:34 |
fungi | final counts of things get an exception beyond december for obvious reasons, but the prose needs to be ready with placeholders or preliminary numbers | 19:35 |
tonyb | clarkb: Hmm okay | 19:35 |
clarkb | it's a good opportunity to call out work you've been involved in :) | 19:36 |
clarkb | definitely add those items to the brainstorm list so I don't forget about them | 19:37 |
clarkb | #topic Open Discussion | 19:37 |
fungi | yeah, whatever you think we should be proud of | 19:37 |
clarkb | tonyb stuck the idea of making it possible for people to run unittest jobs using python containers under open discussion | 19:37 |
corvus | what's the use case that's motivating this? is it that someone wants to run jobs on containers instead of vms? or that they want an easier way to customize our vm images than using dib? | 19:37 |
tonyb | Yeah, I just wanted to get a feel for what's been tried. I guess there are potentially 2 "motivators" | 19:38 |
tonyb | 1) a possibly flawed assumption that we could do more unit tests in some form of container system as the startup/reset costs are lower? | 19:39 |
tonyb | 2) making it possible, if not easy, for the community to test newer pythons without the problems of chasing unstable distros | 19:39 |
fungi | where would those containers run? | 19:40 |
corvus | ok! for 1 there are a few things: | 19:40 |
tonyb | Well that'd be part of the discussion. | 19:40 |
corvus | - in opendev, we're usually not really limited by startup/recycle time. most of our clouds are fast. | 19:41 |
corvus | (and we have enough capacity we can gloss over the recycle time) | 19:41 |
clarkb | also worth noting that the last time I checked we utilize less than 30% of our total available resources on a long term basis | 19:41 |
tonyb | We could make zuul job templates like openstack-tox-* to set up a VM with an appropriate container-runtime and run tox in there | 19:42 |
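[Editor's note: a rough sketch of what the run step of such a template could boil down to, assuming Docker is available on the node; the image tag, tox env, and overall approach are illustrative assumptions, not an existing zuul-jobs role.]

```python
# Hedged sketch of a "tox in a container" job step: pull an upstream python
# image and run the project's tox env inside it with the source bind-mounted.
import os
import subprocess

src = os.getcwd()  # the checked-out project on the VM
subprocess.run(
    [
        "docker", "run", "--rm",
        "-v", f"{src}:/src", "-w", "/src",
        "python:3.12-bookworm",
        "bash", "-c", "pip install tox && tox -e py312",
    ],
    check=True,
)
```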
clarkb | from an efficiency standpoint we'd need to cut our resource usage down to about 1/3 if we use always-on/running container runners | 19:42 |
tonyb | but that would negate the 1st motivator | 19:42 |
corvus | - nevertheless, nodepool and zuul do support running jobs in containers via k8s or openshift. we ran a k8s cluster for a short time, but running it required a lot of work that no one had time for. only one of our clouds provides a k8s aas, so that doesn't meet our diversity requirements | 19:42 |
tonyb | Both fair points | 19:43 |
corvus | that ^ goes to fungi's point about where to run them. i don't think the answer has changed since then, sadly :( | 19:43 |
tonyb | Okay. | 19:44 |
corvus | yeah, we could write jobs/roles to pull in the image and run it, but if we do that a lot, that'll be slow and drive a lot of network traffic | 19:44 |
corvus | if the motivation is to expand python versions, we might want to consider new dib images with them? | 19:44 |
clarkb | part of the problem with that is dib images are extremely heavy weight | 19:44 |
corvus | i think there was talk of using stow to have a bunch of pythons on one image? | 19:44 |
clarkb | they are massive (each image is like 50GB * 2 of storage) and uploads are slow to certain clouds | 19:45 |
clarkb | ya so we could colocate instead. | 19:45 |
clarkb | My hesitancy here is that in the times where we've tried to make it easier for the projects to test with new stuff it's not gone anywhere because they have a hard time keeping up in general | 19:45 |
clarkb | tumbleweed and fedora are examples of this | 19:46 |
clarkb | but even today openstack isn't testing with python3.11 across the board yet (though it is close) | 19:46 |
clarkb | I think there is probably a balance in effort vs return and maybe containers are a good tool in balancing that out? | 19:46 |
tonyb | Yeah that's why I thought avoiding the DIB image side might be helpful | 19:47 |
clarkb | basically I don't expect all of openstack to run python3.12 jobs until well after ubuntu has packages for it anyway. But maybe a project like zuul would run python3.12 jobs and those are relatively infrequent compared to openstack | 19:47 |
clarkb | but also having a dib step install python3.12 on jammy is not a ton of work if we think this is largely a python problem | 19:48 |
clarkb | (I think generally it could be a nodejs, golang, rust, etc problem but many of those ecosystems make it a bit easier to get a random version) | 19:48 |
clarkb | corvus: does ensure-python with appropriate flags already know how to go to the internet to fetch a python version and build it? | 19:49 |
clarkb | I think it does? maybe we start there and see if there is usage and we can optimize from there? | 19:49 |
corvus | yeah, there's pyenv and stow | 19:50 |
corvus | in ensure-python | 19:50 |
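[Editor's note: for reference, a hedged sketch of what the pyenv path amounts to, assuming pyenv is present on the node; the version string and tox env are illustrative, and the exact variable names the ensure-python role exposes should be checked in the zuul-jobs docs.]

```python
# Hedged sketch: build a CPython the distro doesn't package yet with pyenv,
# then run tox against it by putting its bin dir first on PATH.
import os
import subprocess

subprocess.run(["pyenv", "install", "--skip-existing", "3.12.0"], check=True)
prefix = subprocess.run(
    ["pyenv", "prefix", "3.12.0"],
    check=True, capture_output=True, text=True,
).stdout.strip()

env = dict(os.environ, PATH=f"{prefix}/bin:{os.environ['PATH']}")
subprocess.run(["tox", "-e", "py312"], check=True, env=env)
```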
tonyb | Okay. | 19:51 |
tonyb | I think that was helpful, I'd be willing to look at the ensure-python part and see what works and doesn't | 19:52 |
tonyb | it seems like the idea of using a container runtime isn't justified right now. | 19:52 |
clarkb | if our clouds had first class container runtimes as a service it would be much easier to sell/experiment with. But without that there is a lot of bootstrapping overhead for the humans and networking | 19:53 |
clarkb | side note: dox is a thing that mordred experimented with for a while: https://pypi.org/project/dox/ | 19:54 |
clarkb | but ya let's start with the easy thing, which is trying ensure-python's existing support for getting a random python, and take what we learn from there | 19:55 |
clarkb | Anything else? We are just about at the end of our hour | 19:55 |
clarkb | Is everyone still comfortable merging that gitea 1.20.6 update even if I'm gone from 2100 to 2230? | 19:55 |
clarkb | if so I say someone should approve it :) | 19:55 |
fungi | i will, i can keep an eye on it | 19:56 |
clarkb | thanks! | 19:56 |
fungi | and done | 19:56 |
clarkb | I guess it's worth mentioning I think I'll miss our meeting on December 12. I'll be around for the very first part of the day but then I'm popping out | 19:56 |
* tonyb | will be back in AU by then | 19:57 |
clarkb | thank you for your time everyone | 19:57 |
clarkb | #endmeeting | 19:57 |
opendevmeet | Meeting ended Tue Nov 28 19:57:13 2023 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:57 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2023/infra.2023-11-28-19.00.html | 19:57 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2023/infra.2023-11-28-19.00.txt | 19:57 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2023/infra.2023-11-28-19.00.log.html | 19:57 |
fungi | things will probably be very quiet from that point on until the end of the year anyway | 19:57 |
clarkb | fungi: ya that's my hope | 19:57 |
corvus | we'll miss you too clarkb | 19:57 |
tonyb | Thanks all | 19:57 |