fungi | ahoy! | 19:00 |
---|---|---|
clarkb | hello | 19:00 |
clarkb | #startmeeting infra | 19:01 |
opendevmeet | Meeting started Tue Sep 14 19:01:13 2021 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
opendevmeet | The meeting name has been set to 'infra' | 19:01 |
ianw | o/ | 19:01 |
clarkb | #link http://lists.opendev.org/pipermail/service-discuss/2021-September/000283.html Our Agenda | 19:01 |
clarkb | #topic Announcements | 19:01 |
clarkb | I didn't have any announcements. Did anyone else have announcements to share? | 19:02 |
fungi | i don't think i did | 19:02 |
clarkb | #topic Actions from last meeting | 19:03 |
clarkb | #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-09-07-19.01.txt minutes from last meeting | 19:03 |
clarkb | There were no actions recorded last meeting | 19:03 |
mordred | * waves to lovely hoomans | 19:03 |
clarkb | #topic Specs | 19:03 |
clarkb | #link https://review.opendev.org/c/opendev/infra-specs/+/804122 Prometheus Cacti replacement | 19:03 |
clarkb | I updated the spec based on some of the feedback I got. Seems everyone is happy with the general plan but one specific thing has come up since I pushed the update | 19:04 |
clarkb | Basically corvus is pointing out we shouldn't try to do node exporter and snmp exporter as that will double our work we should commit to one or the other | 19:04 |
clarkb | I'll try to capture the pros/cons of each really quickly here, but I would appreciate it if y'all could take a look and leave your thoughts on this specific topic | 19:05 |
clarkb | For the SNMP exporter the upside is we already run and configure snmpd on all of our instances. This means the only change on our instance needed to collect snmp data is a firewall update to allow the new prometheus server to poll the data. | 19:05 |
clarkb | The snmp exporter downside is that we'll have to do a fair bit of configuration to tell the snmp exporter what snmp mibs (is that the right terminology?) to collect and where to map them into prometheus. Then we have to do a bunch of work to set up graphs for that data | 19:06 |
fungi | oids, technically | 19:06 |
clarkb | For node exporter the issue is we need to run a new service that doesn't exist in our distros (at least I'm fairly certain there aren't packages for it). We would instead use docker + docker-compose to run this service | 19:07 |
fungi | mibs are collections of oids | 19:07 |
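As a rough illustration of the configuration work clarkb describes, a snmp_exporter `generator.yml` might look something like this. This is a sketch only: the module name, the OID subtrees chosen, and the community string are assumptions for illustration, not our actual config. The generator tool would then expand these subtrees into the `snmp.yml` the exporter consumes.

```yaml
modules:
  # hypothetical module covering our Ubuntu hosts
  opendev_host:
    walk:
      - 1.3.6.1.2.1.25.2.3    # HOST-RESOURCES-MIB hrStorageTable (disk/memory)
      - 1.3.6.1.4.1.2021.10   # UCD-SNMP-MIB laTable (load averages)
      - 1.3.6.1.2.1.31.1.1    # IF-MIB ifXTable (interface counters)
    version: 2
    auth:
      community: public       # placeholder; real community string differs
```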
clarkb | This means we will need to add docker to a number of systems that don't currently run docker today. OpenAFS, DNS, mailman servers immediately come to mind. This is possible but a bit of work too. | 19:07 |
clarkb | The upside to using node exporter is we use something that is a bit more ready out of the box to collect server performance metrics and I'm sure there are preexisting grafana graphs we can borrow from somewhere too | 19:08 |
clarkb | That is the gist of it. Please leave your preferences on the spec and I'll followup on that | 19:08 |
fungi | i guess we'd just include the docker role in our base playbook | 19:08 |
fungi | right? | 19:09 |
clarkb | Personally I was leaning towards snmp simply because I thought we hadn't wanted to run docker in places like our dns servers | 19:09 |
clarkb | fungi: yup and set up docker-compose for node exporter there | 19:09 |
fungi | are there resource concerns with adding docker to some of those servers? | 19:09 |
frickler | do we really need docker to run node-exporter? | 19:10 |
clarkb | frickler: we do if we don't want to reinvent systems/tooling to deploy an up to date version of node exporter | 19:10 |
clarkb | there are alternatives but then you're doing a bunch of work to keep a binary blob up to date which is basically what docker does | 19:10 |
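For concreteness, the docker-compose deployment being discussed might look roughly like this, based on node_exporter's documented Docker invocation. The image tag and mount paths are assumptions, not a vetted config:

```yaml
version: '2'
services:
  node-exporter:
    image: prom/node-exporter:latest
    network_mode: host        # exporter listens directly on host port 9100
    pid: host                 # needed to see host-level process/system state
    restart: always
    volumes:
      - /:/host:ro,rslave     # read-only view of the host filesystem
    command:
      - '--path.rootfs=/host' # tell the exporter where the host root is mounted
```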
clarkb | I definitely don't have a strong opinion on this myself right now and will need to think about it a bit more | 19:11 |
frickler | yeah, I guess I'll need to do some research, too | 19:11 |
clarkb | fungi: that is probably the biggest reason to not do this if dockerd + node exporter consume a bunch of resources. I can probably deploy it locally on my fileserver and see what sort of memory and cpu consumption it has | 19:11 |
fungi | ubuntu has prometheus-node-exporter and prometheus-node-exporter-collectors packages, maybe that would be just as good? | 19:12 |
frickler | I'm also thinking whether I should add myself as volunteer, but let me sleep about that idea first | 19:12 |
clarkb | fungi: but not far enough back in time for our systems iirc | 19:12 |
clarkb | I thought I looked at that and decided docker was really the only way we could run it with our heterogeneous setup | 19:13 |
fungi | there's a prometheus-node-exporter on bionic | 19:13 |
clarkb | looks like focal does have it but the version is quite old (and focal's is quite old too) maybe that was the issue | 19:13 |
fungi | we're just about out of the xenial weeds | 19:13 |
clarkb | Ya let's look at this a bit more. Think it over and update the spec. I'm going to continue on in the meeting as we have other stuff to cover and are a quarter of the way through the hour | 19:14 |
clarkb | #topic Topics | 19:14 |
clarkb | #topic Mailman Ansible and Server Upgrades | 19:15 |
corvus | i don't have a strong opinion on which; i just feel like writing system-config changes for either basically negates the value of the other, so we should try to pick one early | 19:15 |
corvus | [eot from me; carry on] | 19:15 |
clarkb | On Sunday fungi and I upgraded lists.openstack.org and that was quite the adventure | 19:15 |
fungi | corvus also helped with that | 19:15 |
clarkb | oh right corvus helped out with the mailman stuff at the end | 19:15 |
clarkb | Everything went well until we tried to boot the Focal kernel on the ancient rax xen pv flavor | 19:16 |
corvus | very little; i made only a brief appearance; :) | 19:16 |
clarkb | it turns out that xen can't properly decompress the focal kernels because they are compressed with lz4 | 19:16 |
fungi | corvus: brief but crucial to the plot | 19:16 |
clarkb | We worked around the kernel issue by manually decompressing the kernel using the linux kernel's extract-vmlinux tool, installing grub-xen, then chainbooting to the /boot/xen/pvboot-x86_64.elf that it installs | 19:17 |
clarkb | What that did was tell xen how to find the kernel as well as supply a kernel to it that it doesn't have to decompress | 19:17 |
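For readers unfamiliar with the mechanism: Rackspace's pv-grub reads the guest's legacy `/boot/grub/menu.lst`, so the workaround boils down to pointing that file at the grub-xen ELF, which in turn reads the normal grub2 config and hands the (already decompressed, via `scripts/extract-vmlinux`) kernel to xen. The stanza below is a hedged reconstruction, not a copy of the actual file; the pvboot path comes from the grub-xen package as mentioned above.

```
# /boot/grub/menu.lst entry read by the hypervisor's pv-grub (illustrative)
title Chainload grub-xen
    root (hd0)
    kernel /boot/xen/pvboot-x86_64.elf
```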
clarkb | Then we had to fix up our exim, mailman, and apache configs to handle new mailman env var filtering | 19:17 |
clarkb | Where we are at right now is the host is out of the emergency file and ansible is successfully ansibling the new configs that we had to write | 19:18 |
clarkb | But the kernel situation is still all kinds of bad. We need to decide how we want to ensure that ubuntu isn't going to (re)install a compressed kernel. | 19:18 |
fungi | note that the kernel dance is purely because the lists.o.o server was created in 2013 and has been in-place upgraded continuously since ubuntu 12.04 lts | 19:18 |
fungi | so it's still running an otherwise unavailable pv flavor in rackspace | 19:19 |
clarkb | We can pin the kernel package. We can create a kernel postinst.d hook to decompress the kernel when the kernel updates. We can manually decompress the current kernel whenever we need to update (and use a rescue instance if the host reboots unexpectedly). | 19:19 |
fungi | pv xen loads the kernel from outside the guest domu, while pvhvm works more like a bootable virtual machine similar to kvm | 19:19 |
clarkb | In all cases I think we should begin working to replace the server, but there will be some period of time between now and when we are running with a new server where we want to have a working boot setup | 19:19 |
corvus | oh, the chainloaded kernel can't be compressed? | 19:20 |
corvus | (i thought maybe the chainloading could get around that) | 19:20 |
fungi | corvus: nope, because it still has to hand the kernel blob off to the pv xen hypervisor | 19:20 |
clarkb | corvus: we did some digging this morning and while we haven't tested it we found sufficient evidence on mailing lists and web forums that this doesn't work, so we didn't want to try it | 19:20 |
clarkb | really all the chain load is doing is finding the correct kernel to hand to xen I think | 19:20 |
clarkb | because it understands grub2 configs | 19:20 |
clarkb | https://unix.stackexchange.com/questions/583714/xen-pvgrub-with-lz4-compressed-kernels covers what is involved in auto decompressing the kernel if we want to do that | 19:21 |
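The postinst.d option mentioned above could look roughly like the following, modeled on the approach in that stackexchange answer. This is a hedged sketch: the hook name and the location of extract-vmlinux inside the matching linux-headers package are assumptions. `TARGET` defaults to a temp dir here so the sketch is safe to run as-is; on a real host it would be `/etc/kernel/postinst.d`.

```shell
# Install a kernel postinst hook that replaces a freshly installed compressed
# kernel with a decompressed copy that pv xen can load.
TARGET="${TARGET:-$(mktemp -d)}"
cat > "$TARGET/zz-extract-vmlinux" <<'EOF'
#!/bin/sh
# Debian kernel hooks are called with the new kernel version as $1.
set -e
ver="$1"
extract="/usr/src/linux-headers-$ver/scripts/extract-vmlinux"
[ -x "$extract" ] || exit 0   # headers not installed; nothing we can do
tmp="$(mktemp)"
"$extract" "/boot/vmlinuz-$ver" > "$tmp"
mv "$tmp" "/boot/vmlinuz-$ver"
EOF
chmod +x "$TARGET/zz-extract-vmlinux"
echo "hook written to $TARGET/zz-extract-vmlinux"
```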
fungi | yeah, it essentially communicates the offset where the kernel blob starts | 19:21 |
corvus | ok. then i agree, we're in a hole and we should get out of it with a new server | 19:21 |
fungi | with "new server" comes a number of questions, like should we take this opportunity to fold in lists.katacontainers.io? should we take this as an opportunity to migrate to mm3 on a new server? | 19:22 |
clarkb | yup I think we should just accept that is necessary now. Then decide what workaround for the kernel we want to use while we do that new server work | 19:22 |
ianw | pinning it as is so a power-off situation doesn't become fatal and working on a new server seems best to me | 19:22 |
clarkb | ianw: ya and if we really need to do a kernel update on the server we can do it manually and do the decompress step at the same time | 19:23 |
clarkb | I'm leaning towards an apt pin myself for this reason. It doesn't prevent us from updating but ensures we do so with care | 19:23 |
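A pin along those lines might be a file like the sketch below. The package glob is an assumption about which packages pull in new kernels on the server; a negative priority prevents apt from installing any candidate version while leaving the installed one alone. A simpler alternative would be `apt-mark hold` on the kernel meta-packages.

```
# /etc/apt/preferences.d/99-hold-kernel (sketch)
Package: linux-image-* linux-generic
Pin: release *
Pin-Priority: -1
```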
frickler | maybe too obvious a question, but resizing to a modern flavor isn't supported on rackspace? | 19:24 |
clarkb | frickler: ya iirc you could only resize within pv or pvhvm flavors but not across | 19:24 |
fungi | switching from pv to pvhvm isn't supported anyway | 19:24 |
clarkb | But I guess that is something we could ask? fungi maybe as a followup on the issue you opened? | 19:24 |
ianw | fungi: it seems sensible to make the migration also be a mm3 migration | 19:25 |
fungi | oh, that trouble ticket is already closed after we worked out how to boot | 19:25 |
fungi | i went back over the current state of semi-official mm3 containers, we'd basically need three containers for the basic components of mm3 (core, hyperkitty, postorius) plus apache and mysql. or we could use the distro packages in focal (it has mm 3.2.2 while latest is 3.3.4) | 19:25 |
fungi | also there are tools to import mm 2.1 configs to 3.x | 19:25 |
clarkb | fungi: I think we should confirm we can't switch from pv to pvhvm. I'm fairly certain our image would support both since the menu.lst is where we put the chainload and normal grub boot should ignore that | 19:25 |
fungi | and import old archives (with some caveats), though we can also serve old pipermail copies of the archives for backward compatibility with existing hyperlinks | 19:26 |
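The import tooling fungi mentions would be invoked roughly as follows. This is a command sketch with placeholder list, domain, and paths; the exact flags should be checked against the mailman core and hyperkitty docs before use.

```
# mm 2.1 -> 3.x migration sketch (placeholders throughout)
mailman import21 example-list@lists.example.org \
    /var/lib/mailman/lists/example-list/config.pck
django-admin hyperkitty_import -l example-list@lists.example.org \
    /var/lib/mailman/archives/private/example-list.mbox/example-list.mbox
```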
ianw | fungi: it could basically be stood up completely independently for validation right? the archives seem the thing that need importing | 19:26 |
clarkb | ianw: fungi: yes and we should be able to use zuul holds for that too | 19:26 |
fungi | clarkb: yeah, in theory the image we have now could work on a pvhvm flavor, if there's a way to switch it | 19:26 |
fungi | ianw: archives and list configs both need importing, but yes i expect we'd follow our test-based development pattern for building the new mm3 deployment and then just hold a test node | 19:27 |
clarkb | Let me try and summarize what we seem to be thinking: 1) pin the kernel package on lists.o.o so it doesn't break. Manually update the kernel and decompress if necessary. 2) Begin work to upgrade to mm3. 3 and 4) Determine if we can switch to a pvhvm flavor which boots reliably against modern kernels or replace the server | 19:28 |
clarkb | Are there any objections to 1)? Getting that sorted sooner rather than later is a good idea. | 19:28 |
ianw | ++ to all from me | 19:28 |
fungi | yeah, i'm good with all of the above | 19:28 |
fungi | i can set the kernel package hold once the meeting ends | 19:29 |
clarkb | fungi: thanks. I'd be happy to follow along since I always find those confusing and more experience with them would be good :) | 19:29 |
fungi | happily | 19:29 |
clarkb | I'll see if I can do any research into the pv to pvhvm question | 19:30 |
clarkb | and sounds like fungi has already been looking at 2) | 19:30 |
fungi | for years, but again this week yes | 19:30 |
clarkb | Anything else on this subject? Concerns or issues you've noticed since the upgrade outside of the above? | 19:31 |
fungi | aside from the kernel issue we also had some changes we needed to make to our tooling around envvars | 19:31 |
fungi | corvus managed to work out that newer mailman started filtering envvars | 19:32 |
fungi | so the one we made up for the site dir in our multi-site design was no longer making it through to the config script | 19:32 |
fungi | and we ended up needing to pivot to a specific envvar it wasn't filtering | 19:32 |
fungi | this meant refactoring the site hostname to directory mapping into the config script | 19:33 |
fungi | since we switched from using an envvar which conveyed the directory to one which conveyed the virtual hostname | 19:33 |
clarkb | right we could've theoretically set the site dir in the HOST env var but that would have been very confusing | 19:35 |
clarkb | and if mailman used the env var for anything else potentially broken | 19:35 |
fungi | worth noting, since mm3 properly supports distinct hostnames (you can have the same list localpart at multiple domains now and each is distinct) we'll be able to avoid all that complexity with a switch to mm3 | 19:35 |
clarkb | https://review.opendev.org/c/opendev/system-config/+/808570 has the details if you are interested | 19:36 |
clarkb | Alright let's move on. We have a few more things to discuss and more than half our time is gone. | 19:36 |
clarkb | #topic Improving CD throughput | 19:37 |
clarkb | ianw: ^ anything new on this subject since the realization we needed to update periodic pipelines? Sorry I haven't had time to look at this again in a while | 19:37 |
ianw | umm, things in progress but i got a little sidetracked | 19:38 |
ianw | i think we have the basis of the dependencies worked out, but i need to rework the changes | 19:38 |
ianw | so in short, no, nothing new | 19:38 |
clarkb | It might also be good to sketch out what the future of the semaphores looks like in WIP changes just so we can see the end result. But no rush lots to sort out on this stack | 19:38 |
ianw | yeah it's definitely a "make it work serially first" situation | 19:39 |
clarkb | #topic Gerrit Account Cleanups | 19:39 |
clarkb | I have written notes on proposed plans for each user in the comments of review02:~clarkb/gerrit_user_cleanups/audit-results-annotated.yaml | 19:40 |
clarkb | There are 33 of these conflicts remaining. If you get a chance to look at the notes I wrote that would be great. fungi has read them over and didn't seem concerned though | 19:40 |
clarkb | My intent was to start writing those emails this week and make fixups in a checkout of the repo on review02 but mailing lists and other things have distracted me | 19:40 |
clarkb | Other than checking the notes I don't really need anything other than time though. This is making decent progress when I get that time | 19:41 |
clarkb | #topic OpenDev Logo Hosting | 19:41 |
clarkb | at this point we just need to update paste and gerrit's themes to use the gitea in repo hosted files then we are cleaned up from a gitea upgrade perspective | 19:41 |
fungi | this seems to be working well so far | 19:42 |
clarkb | ianw: you said you would write those changes, are they up yet? | 19:42 |
clarkb | fungi: and I agree seems to be working for what we are doing with gitea itself | 19:42 |
ianw | ahh, no those changes aren't up yet. on my todo | 19:42 |
clarkb | feel free to ping me when they go up and I'll review them | 19:43 |
clarkb | #topic Expanding InMotion cloud deployment | 19:43 |
clarkb | It sounds like InMotion is able to give us a few more IPs in order to better utilize the cluster we have | 19:43 |
clarkb | I'll be working with them Friday morning to work through that. However, right now we are failing to boot instances there and I need to go look at it more in depth | 19:43 |
clarkb | apparently rabbitmq is fine? and it may be some nova quota mismatch problem | 19:44 |
fungi | neat | 19:44 |
clarkb | I'll probably go ahead and disable the cloud soon as it isn't booting stuff and I think network changes potentially mean we don't want nodepool running against it anyway | 19:44 |
clarkb | if anyone else wants to join let me know | 19:45 |
clarkb | sounds like it will be a conference call configuration meeting | 19:45 |
clarkb | #topic Scheduling Gerrit Project Renames | 19:46 |
clarkb | We've got a few project rename requests now. In addition to starting to think about a time to do those we have discovered some additional questions about the rename process | 19:46 |
clarkb | When we rename projects we should update all of their metadata in gitea so that the issues link and review links all line up | 19:46 |
clarkb | This should be doable but requires updates to the rename playbook. Good news is that is tested now :) | 19:47 |
clarkb | The other question I had was what do we do with orgs in gitea (and gerrit) when all projects are moved out of them. | 19:47 |
clarkb | In particular I'm concerned that deleting an org would break the redirects gitea has for things from foo/bar -> bar/foo if we delete foo/ | 19:47 |
clarkb | corvus: ^ you've looked at that code before do you have a sense for what might happen there? | 19:48 |
fungi | in addition to that, it might not be terrible to have a redirect mapping for storyboard.o.o, which could probably just be a flat list in a .htaccess file deployed on the server, built from the renames data we have in opendev/project-config (this could be a nice addition after the containerization work diablo_rojo is hacking on) | 19:48 |
corvus | clarkb: i don't recall for certain, but i think there is a reasonable chance that it may break as you suspect | 19:49 |
clarkb | ok something to test for sure then | 19:50 |
clarkb | As far as scheduling goes I'm wary of trying to do it before the openstack release which happens October 6th ish | 19:50 |
fungi | yeah, one of the proposed rename changes would empty the osf/ gerrit namespace and thus the osf org in gitea | 19:50 |
clarkb | But the week after: October 11 -15 might be a good time to do renames | 19:51 |
clarkb | I think that is the week before the ptg too? | 19:51 |
clarkb | Probably a good idea to avoid doing it during the ptg :) | 19:51 |
fungi | yeah, i'm good with that. i'm taking the friday before then off though | 19:52 |
fungi | (the 8th) | 19:52 |
clarkb | ok let's pencil in that week and decide on a specific day as we get closer. Also work on doing metadata updates and test org removals | 19:52 |
fungi | wfm | 19:53 |
clarkb | If orgs can't be removed safely that isn't the end of the world and we'll just keep them for redirects | 19:53 |
clarkb | #topic Open Discussion | 19:53 |
clarkb | Thank you for listening to me for the last hour :) Anything else? | 19:53 |
fungi | nothing immediately springs to mind. i'll try to whip up a mm3 spec though | 19:54 |
clarkb | I'll give it another minute | 19:55 |
clarkb | Thanks everyone! we'll see you here next week same time and location | 19:56 |
clarkb | #endmeeting | 19:56 |
opendevmeet | Meeting ended Tue Sep 14 19:56:36 2021 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:56 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2021/infra.2021-09-14-19.01.html | 19:56 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2021/infra.2021-09-14-19.01.txt | 19:56 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2021/infra.2021-09-14-19.01.log.html | 19:56 |
fungi | thanks clarkb! | 19:56 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!