fungi | ahoy! | 19:00 |
---|---|---|
clarkb | hello | 19:00 |
clarkb | #startmeeting infra | 19:01 |
opendevmeet | Meeting started Tue Sep 14 19:01:13 2021 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
opendevmeet | The meeting name has been set to 'infra' | 19:01 |
ianw | o/ | 19:01 |
clarkb | #link http://lists.opendev.org/pipermail/service-discuss/2021-September/000283.html Our Agenda | 19:01 |
clarkb | #topic Announcements | 19:01 |
clarkb | I didn't have any announcements. Did anyone else have announcements to share? | 19:02 |
fungi | i don't think i did | 19:02 |
clarkb | #topic Actions from last meeting | 19:03 |
clarkb | #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-09-07-19.01.txt minutes from last meeting | 19:03 |
clarkb | There were no actions recorded last meeting | 19:03 |
mordred | * waves to lovely hoomans | 19:03 |
clarkb | #topic Specs | 19:03 |
clarkb | #link https://review.opendev.org/c/opendev/infra-specs/+/804122 Prometheus Cacti replacement | 19:03 |
clarkb | I updated the spec based on some of the feedback I got. Seems everyone is happy with the general plan but one specific thing has come up since I pushed the update | 19:04 |
clarkb | Basically corvus is pointing out we shouldn't try to do node exporter and snmp exporter as that will double our work we should commit to one or the other | 19:04 |
clarkb | I'll try to capture the pros/cons of each really quickly here, but I would appreciate it if y'all could take a look and leave your thoughts on this specific topic | 19:05 |
clarkb | For the SNMP exporter the upside is we already run and configure snmpd on all of our instances. This means the only change on our instance needed to collect snmp data is a firewall update to allow the new prometheus server to poll the data. | 19:05 |
clarkb | The snmp exporter downside is that we'll have to do a fair bit of configuration to tell the snmp exporter what snmp mibs (is that the right terminology?) to collect and where to map them into prometheus. Then we have to do a bunch of work to set up graphs for that data | 19:06 |
fungi | oids, technically | 19:06 |
clarkb | For node exporter the issue is we need to run a new service that doesn't exist in our distros (at least I'm fairly certain there aren't packages for it). We would instead use docker + docker-compose to run this service | 19:07 |
fungi | mibs are collections of oids | 19:07 |
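As a rough illustration of the configuration work clarkb describes, a snmp_exporter `generator.yml` might look something like this. This is a sketch only: the module name, the OID subtrees chosen, and the community string are assumptions for illustration, not our actual config. The generator tool would then expand these subtrees into the `snmp.yml` the exporter consumes.

```yaml
modules:
  # hypothetical module covering our Ubuntu hosts
  opendev_host:
    walk:
      - 1.3.6.1.2.1.25.2.3    # HOST-RESOURCES-MIB hrStorageTable (disk/memory)
      - 1.3.6.1.4.1.2021.10   # UCD-SNMP-MIB laTable (load averages)
      - 1.3.6.1.2.1.31.1.1    # IF-MIB ifXTable (interface counters)
    version: 2
    auth:
      community: public       # placeholder; real community string differs
```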
clarkb | This means we will need to add docker to a number of systems that don't currently run docker today. OpenAFS, DNS, mailman servers immediately come to mind. This is possible but a bit of work too. | 19:07 |
clarkb | The upside to using node exporter is we use something that is a bit more ready out of the box to collect server performance metrics and I'm sure there are preexisting grafana graphs we can borrow from somewhere too | 19:08 |
clarkb | That is the gist of it. Please leave your preferences on the spec and I'll followup on that | 19:08 |
fungi | i guess we'd just include the docker role in our base playbook | 19:08 |
fungi | right? | 19:09 |
clarkb | Personally I was leaning towards snmp simply because I thought we hadn't wanted to run docker in places like our dns servers | 19:09 |
clarkb | fungi: yup and set up docker-compose for node exporter there | 19:09 |
fungi | are there resource concerns with adding docker to some of those servers? | 19:09 |
frickler | do we really need docker to run node-exporter? | 19:10 |
clarkb | frickler: we do if we don't want to reinvent systems/tooling to deploy an up to date version of node exporter | 19:10 |
clarkb | there are alternatives but then you're doing a bunch of work to keep a binary blob up to date which is basically what docker does | 19:10 |
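For concreteness, the docker-compose deployment being discussed might look roughly like this, based on node_exporter's documented Docker invocation. The image tag and mount paths are assumptions, not a vetted config:

```yaml
version: '2'
services:
  node-exporter:
    image: prom/node-exporter:latest
    network_mode: host        # exporter listens directly on host port 9100
    pid: host                 # needed to see host-level process/system state
    restart: always
    volumes:
      - /:/host:ro,rslave     # read-only view of the host filesystem
    command:
      - '--path.rootfs=/host' # tell the exporter where the host root is mounted
```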
clarkb | I definitely don't have a strong opinion on this myself right now and will need to think about it a bit more | 19:11 |
frickler | yeah, I guess I'll need to do some research, too | 19:11 |
clarkb | fungi: that is probably the biggest reason to not do this if dockerd + node exporter consume a bunch of resources. I can probably deploy it locally on my fileserver and see what sort of memory and cpu consumption it has | 19:11 |
fungi | ubuntu has prometheus-node-exporter and prometheus-node-exporter-collectors packages, maybe that would be just as good? | 19:12 |
frickler | I'm also thinking whether I should add myself as volunteer, but let me sleep about that idea first | 19:12 |
clarkb | fungi: but not far enough back in time for our systems iirc | 19:12 |
clarkb | I thought I looked at that and decided docker was really the only way we could run it with our heterogeneous setup | 19:13 |
fungi | there's a prometheus-node-exporter on bionic | 19:13 |
clarkb | looks like focal does have it but the version is quite old (and focal's is quite old too) maybe that was the issue | 19:13 |
fungi | we're just about out of the xenial weeds | 19:13 |
clarkb | Ya let's look at this a bit more. Think it over and update the spec. I'm going to continue on in the meeting as we have other stuff to cover and are a quarter of the way through the hour | 19:14 |
clarkb | #topic Topics | 19:14 |
clarkb | #topic Mailman Ansible and Server Upgrades | 19:15 |
corvus | i don't have a strong opinion on which; i just feel like writing system-config changes for either basically negates the value of the other, so we should try to pick one early | 19:15 |
corvus | [eot from me; carry on] | 19:15 |
clarkb | On Sunday fungi and I upgraded lists.openstack.org and that was quite the adventure | 19:15 |
fungi | corvus also helped with that | 19:15 |
clarkb | oh right corvus helped out with the mailman stuff at the end | 19:15 |
clarkb | Everything went well until we tried to boot the Focal kernel on the ancient rax xen pv flavor | 19:16 |
corvus | very little; i made only a brief appearance; :) | 19:16 |
clarkb | it turns out that xen can't properly decompress the focal kernels because they are compressed with lz4 | 19:16 |
fungi | corvus: brief but crucial to the plot | 19:16 |
clarkb | We worked around the kernel issue by manually decompressing the kernel using the linux kernel's extract-vmlinux tool, installing grub-xen, then chainbooting to the /boot/xen/pvboot-x86_64.elf that it installs | 19:17 |
clarkb | What that did was tell xen how to find the kernel as well as supply a kernel to it that it doesn't have to decompress | 19:17 |
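For readers unfamiliar with the mechanism: Rackspace's pv-grub reads the guest's legacy `/boot/grub/menu.lst`, so the workaround boils down to pointing that file at the grub-xen ELF, which in turn reads the normal grub2 config and hands the (already decompressed, via `scripts/extract-vmlinux`) kernel to xen. The stanza below is a hedged reconstruction, not a copy of the actual file; the pvboot path comes from the grub-xen package as mentioned above.

```
# /boot/grub/menu.lst entry read by the hypervisor's pv-grub (illustrative)
title Chainload grub-xen
    root (hd0)
    kernel /boot/xen/pvboot-x86_64.elf
```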
clarkb | Then we had to fix up our exim, mailman, and apache configs to handle new mailman env var filtering | 19:17 |
clarkb | Where we are at right now is the host is out of the emergency file and ansible is successfully ansibling the new configs that we had to write | 19:18 |
clarkb | But the kernel situation is still all kinds of bad. We need to decide how we want to ensure that ubuntu isn't going to (re)install a compressed kernel. | 19:18 |
fungi | note that the kernel dance is purely because the lists.o.o server was created in 2013 and has been in-place upgraded continuously since ubuntu 12.04 lts | 19:18 |
fungi | so it's still running an otherwise unavailable pv flavor in rackspace | 19:19 |
clarkb | We can pin the kernel package. We can create a kernel postinst.d hook to decompress the kernel when the kernel updates. We can manually decompress the current kernel whenever we need to update (and use a rescue instance if the host reboots unexpectedly). | 19:19 |
fungi | pv xen loads the kernel from outside the guest domu, while pvhvm works more like a bootable virtual machine similar to kvm | 19:19 |
clarkb | In all cases I think we should begin working to replace the server, but there will be some period of time between now and when we are running with a new server where we want to have a working boot setup | 19:19 |
corvus | oh, the chainloaded kernel can't be compressed? | 19:20 |
corvus | (i thought maybe the chainloading could get around that) | 19:20 |
fungi | corvus: nope, because it still has to hand the kernel blob off to the pv xen hypervisor | 19:20 |
clarkb | corvus: we did some digging this morning and while we haven't tested it we found sufficient evidence on mailing lists and web forums that this doesn't work, so we didn't want to try it | 19:20 |
clarkb | really all the chain load is doing is finding the correct kernel to hand to xen I think | 19:20 |
clarkb | because it understands grub2 configs | 19:20 |
clarkb | https://unix.stackexchange.com/questions/583714/xen-pvgrub-with-lz4-compressed-kernels covers what is involved in auto decompressing the kernel if we want to do that | 19:21 |
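The postinst.d option mentioned above could look roughly like the following, modeled on the approach in that stackexchange answer. This is a hedged sketch: the hook name and the location of extract-vmlinux inside the matching linux-headers package are assumptions. `TARGET` defaults to a temp dir here so the sketch is safe to run as-is; on a real host it would be `/etc/kernel/postinst.d`.

```shell
# Install a kernel postinst hook that replaces a freshly installed compressed
# kernel with a decompressed copy that pv xen can load.
TARGET="${TARGET:-$(mktemp -d)}"
cat > "$TARGET/zz-extract-vmlinux" <<'EOF'
#!/bin/sh
# Debian kernel hooks are called with the new kernel version as $1.
set -e
ver="$1"
extract="/usr/src/linux-headers-$ver/scripts/extract-vmlinux"
[ -x "$extract" ] || exit 0   # headers not installed; nothing we can do
tmp="$(mktemp)"
"$extract" "/boot/vmlinuz-$ver" > "$tmp"
mv "$tmp" "/boot/vmlinuz-$ver"
EOF
chmod +x "$TARGET/zz-extract-vmlinux"
echo "hook written to $TARGET/zz-extract-vmlinux"
```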
fungi | yeah, it essentially communicates the offset where the kernel blob starts | 19:21 |
corvus | ok. then i agree, we're in a hole and we should get out of it with a new server | 19:21 |
fungi | with "new server" comes a number of questions, like should we take this opportunity to fold in lists.katacontainers.io? should we take this as an opportunity to migrate to mm3 on a new server? | 19:22 |
clarkb | yup I think we should just accept that is necessary now. Then decide what workaround for the kernel we want to use while we do that new server work | 19:22 |
ianw | pinning it as is so a power-off situation doesn't become fatal and working on a new server seems best to me | 19:22 |
clarkb | ianw: ya and if we really need to do a kernel update on the server we can do it manually and do the decompress step at the same time | 19:23 |
clarkb | I'm leaning towards an apt pin myself for this reason. It doesn't prevent us from updating but ensures we do so with care | 19:23 |
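A pin along those lines might be a file like the sketch below. The package glob is an assumption about which packages pull in new kernels on the server; a negative priority prevents apt from installing any candidate version while leaving the installed one alone. A simpler alternative would be `apt-mark hold` on the kernel meta-packages.

```
# /etc/apt/preferences.d/99-hold-kernel (sketch)
Package: linux-image-* linux-generic
Pin: release *
Pin-Priority: -1
```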
frickler | maybe too obvious a question, but resizing to a modern flavor isn't supported on rackspace? | 19:24 |
clarkb | frickler: ya iirc you could only resize within pv or pvhvm flavors but not across | 19:24 |
fungi | switching from pv to pvhvm isn't supported anyway | 19:24 |
clarkb | But I guess that is something we could ask? fungi maybe as a followup on the issue you opened? | 19:24 |
ianw | fungi: it seems sensible to make the migration also be a mm3 migration | 19:25 |
fungi | oh, that trouble ticket is already closed after we worked out how to boot | 19:25 |
fungi | i went back over the current state of semi-official mm3 containers, we'd basically need three containers for the basic components of mm3 (core, hyperkitty, postorius) plus apache and mysql. or we could use the distro packages in focal (it has mm 3.2.2 while latest is 3.3.4) | 19:25 |
fungi | also there are tools to import mm 2.1 configs to 3.x | 19:25 |
clarkb | fungi: I think we should confirm we can't switch from pv to pvhvm. I'm fairly certain our image would support both since the menu.lst is where we put the chainload and normal grub boot should ignore that | 19:25 |
fungi | and import old archives (with some caveats), though we can also serve old pipermail copies of the archives for backward compatibility with existing hyperlinks | 19:26 |
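The import tooling fungi mentions would be invoked roughly as follows. This is a command sketch with placeholder list, domain, and paths; the exact flags should be checked against the mailman core and hyperkitty docs before use.

```
# mm 2.1 -> 3.x migration sketch (placeholders throughout)
mailman import21 example-list@lists.example.org \
    /var/lib/mailman/lists/example-list/config.pck
django-admin hyperkitty_import -l example-list@lists.example.org \
    /var/lib/mailman/archives/private/example-list.mbox/example-list.mbox
```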
ianw | fungi: it could basically be stood up completely independently for validation right? the archives seem the thing that need importing | 19:26 |
clarkb | ianw: fungi: yes and we should be able to use zuul holds for that too | 19:26 |
fungi | clarkb: yeah, in theory the image we have now could work on a pvhvm flavor, if there's a way to switch it | 19:26 |
fungi | ianw: archives and list configs both need importing, but yes i expect we'd follow our test-based development pattern for building the new mm3 deployment and then just hold a test node | 19:27 |
clarkb | Let me try and summarize what we seem to be thinking: 1) pin the kernel package on lists.o.o so it doesn't break. Manually update the kernel and decompress if necessary. 2) Begin work to upgrade to mm3. 3 and 4) Determine if we can switch to a pvhvm flavor which boots reliably against modern kernels or replace the server | 19:28 |
clarkb | Are there any objections to 1)? Getting that sorted sooner rather than later is a good idea. | 19:28 |
ianw | ++ to all from me | 19:28 |
fungi | yeah, i'm good with all of the above | 19:28 |
fungi | i can set the kernel package hold once the meeting ends | 19:29 |
clarkb | fungi: thanks. I'd be happy to follow along since I always find those confusing and more experience with them would be good :) | 19:29 |
fungi | happily | 19:29 |
clarkb | I'll see if I can do any research into the pv to pvhvm question | 19:30 |
clarkb | and sounds like fungi has already been looking at 2) | 19:30 |
fungi | for years, but again this week yes | 19:30 |
clarkb | Anything else on this subject? Concerns or issues you've noticed since the upgrade outside of the above? | 19:31 |
fungi | aside from the kernel issue we also had some changes we needed to make to our tooling around envvars | 19:31 |
fungi | corvus managed to work out that newer mailman started filtering envvars | 19:32 |
fungi | so the one we made up for the site dir in our multi-site design was no longer making it through to the config script | 19:32 |
fungi | and we ended up needing to pivot to a specific envvar it wasn't filtering | 19:32 |
fungi | this meant refactoring the site hostname to directory mapping into the config script | 19:33 |
fungi | since we switched from using an envvar which conveyed the directory to one which conveyed the virtual hostname | 19:33 |
clarkb | right we could've theoretically set the site dir in the HOST env var but that would have been very confusing | 19:35 |
clarkb | and if mailman used the env var for anything else potentially broken | 19:35 |
fungi | worth noting, since mm3 properly supports distinct hostnames (you can have the same list localpart at multiple domains now and each is distinct) we'll be able to avoid all that complexity with a switch to mm3 | 19:35 |
clarkb | https://review.opendev.org/c/opendev/system-config/+/808570 has the details if you are interested | 19:36 |
clarkb | Alright let's move on. We have a few more things to discuss and more than half our time is gone. | 19:36 |
clarkb | #topic Improving CD throughput | 19:37 |
clarkb | ianw: ^ anything new on this subject since the realization we needed to update periodic pipelines? Sorry I haven't had time to look at this again in a while | 19:37 |
ianw | umm, things in progress but i got a little sidetracked | 19:38 |
ianw | i think we have the basis of the dependencies worked out, but i need to rework the changes | 19:38 |
ianw | so in short, no, nothing new | 19:38 |
clarkb | It might also be good to sketch out what the future of the semaphores looks like in WIP changes just so we can see the end result. But no rush lots to sort out on this stack | 19:38 |
ianw | yeah it's definitely a "make it work serially first" situation | 19:39 |
clarkb | #topic Gerrit Account Cleanups | 19:39 |
clarkb | I have written notes on proposed plans for each user in the comments of review02:~clarkb/gerrit_user_cleanups/audit-results-annotated.yaml | 19:40 |
clarkb | There are 33 of these conflicts remaining. If you get a chance to look at the notes I wrote that would be great. fungi has read them over and didn't seem concerned though | 19:40 |
clarkb | My intent was to start writing those emails this week and make fixups in a checkout of the repo on review02 but mailing lists and other things have distracted me | 19:40 |
clarkb | Other than checking the notes I don't really need anything other than time though. This is making decent progress when I get that time | 19:41 |
clarkb | #topic OpenDev Logo Hosting | 19:41 |
clarkb | at this point we just need to update paste and gerrit's themes to use the gitea in repo hosted files then we are cleaned up from a gitea upgrade perspective | 19:41 |
fungi | this seems to be working well so far | 19:42 |
clarkb | ianw: you said you would write those changes, are they up yet? | 19:42 |
clarkb | fungi: and I agree seems to be working for what we are doing with gitea itself | 19:42 |
ianw | ahh, no those changes aren't up yet. on my todo | 19:42 |
clarkb | feel free to ping me when they go up and I'll review them | 19:43 |
clarkb | #topic Expanding InMotion cloud deployment | 19:43 |
clarkb | It sounds like InMotion is able to give us a few more IPs in order to better utilize the cluster we have | 19:43 |
clarkb | I'll be working with them Friday morning to work through that. However, right now we are failing to boot instances there and I need to go look at it more in depth | 19:43 |
clarkb | apparently rabbitmq is fine? and it may be some nova quota mismatch problem | 19:44 |
fungi | neat | 19:44 |
clarkb | I'll probably go ahead and disable the cloud soon as it isn't booting stuff and I think network changes potentially mean we don't want nodepool running against it anyway | 19:44 |
clarkb | if anyone else wants to join let me know | 19:45 |
clarkb | sounds like it will be a conference call configuration meeting | 19:45 |
clarkb | #topic Scheduling Gerrit Project Renames | 19:46 |
clarkb | We've got a few project rename requests now. In addition to starting to think about a time to do those we have discovered some additional questions about the rename process | 19:46 |
clarkb | When we rename projects we should update all of their metadata in gitea so that the issues link and review links all line up | 19:46 |
clarkb | This should be doable but requires updates to the rename playbook. Good news is that is tested now :) | 19:47 |
clarkb | The other question I had was what do we do with orgs in gitea (and gerrit) when all projects are moved out of them. | 19:47 |
clarkb | In particular I'm concerned that deleting an org would break the redirects gitea has for things from foo/bar -> bar/foo if we delete foo/ | 19:47 |
clarkb | corvus: ^ you've looked at that code before do you have a sense for what might happen there? | 19:48 |
fungi | in addition to that, it might not be terrible to have a redirect mapping for storyboard.o.o, which could probably just be a flat list in a .htaccess file deployed on the server, built from the renames data we have in opendev/project-config (this could be a nice addition after the containerization work diablo_rojo is hacking on) | 19:48 |
corvus | clarkb: i don't recall for certain, but i think there is a reasonable chance that it may break as you suspect | 19:49 |
clarkb | ok something to test for sure then | 19:50 |
clarkb | As far as scheduling goes I'm wary of trying to do it before the openstack release which happens October 6th ish | 19:50 |
fungi | yeah, one of the proposed rename changes would empty the osf/ gerrit namespace and thus the osf org in gitea | 19:50 |
clarkb | But the week after: October 11 -15 might be a good time to do renames | 19:51 |
clarkb | I think that is the week before the ptg too? | 19:51 |
clarkb | Probably a good idea to avoid doing it during the ptg :) | 19:51 |
fungi | yeah, i'm good with that. i'm taking the friday before then off though | 19:52 |
fungi | (the 8th) | 19:52 |
clarkb | ok let's pencil in that week and decide on a specific day as we get closer. Also work on doing metadata updates and test org removals | 19:52 |
fungi | wfm | 19:53 |
clarkb | If orgs can't be removed safely that isn't the end of the world and we'll just keep them for redirects | 19:53 |
clarkb | #topic Open Discussion | 19:53 |
clarkb | Thank you for listening to me for the last hour :) Anything else? | 19:53 |
fungi | nothing immediately springs to mind. i'll try to whip up a mm3 spec though | 19:54 |
clarkb | I'll give it another minute | 19:55 |
clarkb | Thanks everyone! we'll see you here next week same time and location | 19:56 |
clarkb | #endmeeting | 19:56 |
opendevmeet | Meeting ended Tue Sep 14 19:56:36 2021 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:56 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2021/infra.2021-09-14-19.01.html | 19:56 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2021/infra.2021-09-14-19.01.txt | 19:56 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2021/infra.2021-09-14-19.01.log.html | 19:56 |
fungi | thanks clarkb! | 19:56 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!