ianw | stephenfin: trying to get to the bottom of a stestr/stevedore/importlib-metadata/python3.7 horror combo issue -> https://github.com/mtreinish/stestr/issues/336 | 03:00 |
ianw | it looks like https://opendev.org/openstack/stevedore/commit/143a3e9f0716690be7343d4d083f65d7624b3d2e in stevedore 3.5.1 should be the fix; but stestr still isn't finding its commands | 03:02 |
ianw | real_groups with old importlib-metadata looks like -> https://paste.opendev.org/show/bZ7yPaO1pVF8aKT2uGof/ | 03:14 |
ianw | real_groups with stevedore 3.5.1 looks like -> https://paste.opendev.org/show/bpqS0OPic22XF284x5Dg/ | 03:14 |
ianw | ... i think they should look the same. it suggests to me the expansion maybe isn't quite right? | 03:15 |
ianw | https://review.opendev.org/c/openstack/stevedore/+/861695 | 03:57 |
ianw | that seems to fix stestr on buster/python 3.7 ... but does it introduce some other issue? i don't know | 03:58 |
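The incompatibility ianw is chasing comes down to the shape of the importlib entry-points API: the importlib_metadata backport used on Python 3.7 returns a plain dict of {group: [EntryPoint, ...]}, while newer releases return a selectable EntryPoints object. A minimal sketch of code that tolerates both shapes (an illustration of the API split, not stevedore's actual fix):

```python
# Minimal sketch of the API split under discussion; not stevedore's code.
try:
    import importlib.metadata as importlib_metadata  # stdlib, python >= 3.8
except ImportError:
    import importlib_metadata  # backport used on python 3.7

def entry_points_for(group):
    eps = importlib_metadata.entry_points()
    if hasattr(eps, 'select'):
        # Newer API: an EntryPoints object queried by keyword.
        return list(eps.select(group=group))
    # Older API: a plain dict keyed by group name.
    return list(eps.get(group, []))
```

If the two shapes get conflated, a consumer like stestr can end up looking for its command plugins in a group mapping that was never expanded, which would be consistent with the differing real_groups pastes above.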
opendevreview | Ian Wienand proposed zuul/zuul-jobs master: Pin sphinx to 5.2.3 https://review.opendev.org/c/zuul/zuul-jobs/+/861587 | 04:30 |
opendevreview | Ian Wienand proposed zuul/zuul-jobs master: linter: Use capitals for names https://review.opendev.org/c/zuul/zuul-jobs/+/854933 | 04:30 |
opendevreview | Ian Wienand proposed zuul/zuul-jobs master: Fix ansible-lint name[template] https://review.opendev.org/c/zuul/zuul-jobs/+/861559 | 04:30 |
opendevreview | Ian Wienand proposed zuul/zuul-jobs master: Add names to include tasks https://review.opendev.org/c/zuul/zuul-jobs/+/861560 | 04:30 |
opendevreview | Ian Wienand proposed zuul/zuul-jobs master: Standarise block/when ordering https://review.opendev.org/c/zuul/zuul-jobs/+/861562 | 04:30 |
opendevreview | Ian Wienand proposed zuul/zuul-jobs master: Update to ansible-lint 6.8.2 https://review.opendev.org/c/zuul/zuul-jobs/+/861563 | 04:30 |
opendevreview | Ian Wienand proposed zuul/zuul-jobs master: [wip] sphinx circular dependencies error https://review.opendev.org/c/zuul/zuul-jobs/+/861588 | 04:30 |
opendevreview | Ian Wienand proposed zuul/zuul-jobs master: Workaround stevedore/python3.7 issues https://review.opendev.org/c/zuul/zuul-jobs/+/861698 | 04:30 |
*** ysandeep|out is now known as ysandeep | 05:47 | |
ianw | chown: invalid group: ‘root:letsencyrpt’ | 06:20 |
ianw | ... ? | 06:20 |
ianw | https://7e827a77180c1e6e432f-3c4e8d8f712aba3e652b0cfd0c30a298.ssl.cf5.rackcdn.com/861138/12/check/system-config-run-letsencrypt/35bbcf9/letsencrypt01.opendev.org/acme.sh/acme.sh.log | 06:20 |
*** ramishra_ is now known as ramishra | 06:38 | |
*** jpena|off is now known as jpena | 07:17 | |
*** ysandeep is now known as ysandeep|lunch | 08:16 | |
*** marios is now known as marios|call | 09:00 | |
*** jpodivin__ is now known as jpodivin | 09:33 | |
*** ysandeep|lunch is now known as ysandeep | 10:37 | |
*** marios|call is now known as marios | 11:33 | |
*** dviroel|out is now known as dviroel | 11:41 | |
stephenfin | ianw: One comment on https://review.opendev.org/c/openstack/stevedore/+/861695 | 12:01 |
*** ysandeep is now known as ysandeep|away | 12:20 | |
frickler | anyone else seeing lags when loading etherpads currently? | 13:03 |
fungi | it was maybe a little slower than usual for me. i'll take a look at the system resource utilization | 13:35 |
*** dasm|off is now known as dasm | 13:49 | |
clarkb | memory and system load look fine. I do note that the root fs is a bit full | 14:09 |
clarkb | it looks like log rotate has rotated out (deleted) one of the old db backups in /var/backups/etherpad-mariadb | 14:10 |
clarkb | the bulk of the disk is consumed by the etherpad container's json log file | 14:14 |
clarkb | I think we should either directly truncate that under the container (ugh) or down then up the container later to restart the log collection with a new container | 14:16 |
clarkb | and then look at redirecting the logs to /var/log/containers in order to log rotate them there | 14:17 |
clarkb | also we should clear out the old backup at /var/backups/etherpad-mariadb/etherpad-mariadb.sql.gz.3 | 14:17 |
fungi | yeah, we can probably do it safely around 18:00 utc if we don't want to wait until the weekend | 14:17 |
clarkb | ya I think start with clearing the extra backup file | 14:19 |
clarkb | then down then up and that should get the disk usage into a happy enough spot where we can add the syslog redirects without being in a rush | 14:19 |
fungi | people will get disconnected from pads they've left up in their browsers, but since there shouldn't be any sessions running at that time it hopefully won't be too disruptive | 14:22 |
fungi | we can #status log it for a bit of added visibility | 14:22 |
clarkb | ++ | 14:22 |
fungi | i have no idea what would happen if we restarted etherpad while a meetpad call is running | 14:23 |
clarkb | it should just fail the document | 14:25 |
clarkb | the call will work (as was the case when we had the cross site domain stuff improperly set up) | 14:25 |
*** ysandeep|away is now known as ysandeep | 14:31 | |
fungi | yeah, just didn't know if it would be able to reconnect clients to the pad after | 14:31 |
*** dviroel is now known as dviroel|dr_appt | 14:52 | |
clarkb | one thing I noticed about the jammy cloud image I used for gitea-lb02 is that it has /boot/efi as a separate partition | 15:31 |
fungi | debian's official images are like that too | 15:31 |
clarkb | Does that imply it is also gpt instead of mbr? | 15:32 |
clarkb | vexxhost booted it just fine so I'm not really worried about it, but curious to see cloud images moving forward like that | 15:32 |
fungi | you should be able to inspect it easily, but i expect so yes | 15:32 |
fungi | `sudo fdisk -l /dev/vda` | 15:33 |
fungi | "Disklabel type: gpt" | 15:33 |
clarkb | is that from gitea-lb02? | 15:33 |
fungi | yes | 15:33 |
clarkb | neat | 15:33 |
fungi | partition types are Linux filesystem on /dev/vda1, BIOS boot on /dev/vda14, and EFI System on /dev/vda15 | 15:34 |
clarkb | oh interesting I guess that implies it can boot legacy or efi | 15:35 |
fungi | vda14 isn't mounted but vda15 is, so that's what i take from it yes | 15:35 |
clarkb | that makes sense for an official cloud image that might be used in many places | 15:35 |
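The "Disklabel type" check fungi ran can also be scripted; a small sketch wrapping the same fdisk invocation (assumes util-linux fdisk and root privileges):

```python
# Sketch wrapping the fdisk check quoted above; requires root.
import subprocess

def disklabel_type(device="/dev/vda"):
    out = subprocess.run(
        ["fdisk", "-l", device],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in out.splitlines():
        if line.startswith("Disklabel type:"):
            return line.split(":", 1)[1].strip()  # e.g. "gpt" or "dos"
    return None
```

A GPT image that carries both a BIOS boot partition and an EFI System partition, as seen on gitea-lb02, can be booted either way, which is why one image works across clouds.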
fungi | also fstab says it's configured to use swap on /swapfile | 15:36 |
fungi | so no separate swap partition | 15:36 |
clarkb | yup the swapfile is created by our launch scripts | 15:39 |
*** ysandeep is now known as ysandeep|out | 15:44 | |
fungi | ah okay | 15:47 |
clarkb | infra-root the ptg sessions I got up early for today are winding down and I should be around if we want to land https://review.opendev.org/c/opendev/zone-opendev.org/+/861229 | 15:59 |
clarkb | cacti is collecting info from the new server now too so we'll be able to watch and compare that data against the historical data for the old server | 16:00 |
*** marios is now known as marios|out | 16:02 | |
clarkb | and now breakfast since I managed to skip that | 16:03 |
fungi | yeah, i'm ready for 861229 whenever | 16:04 |
fungi | about to go pick up some lunch takeout, but i'll only be gone for like 10-15 minutes | 16:04 |
*** jpena is now known as jpena|off | 16:31 | |
clarkb | fungi: are you back? If so I'll go ahead and approve that change | 16:46 |
*** svinavel_ is now known as svinavel | 16:47 | |
*** dviroel|dr_appt is now known as dviroel | 16:49 | |
fungi | clarkb: yeah, sorry, had stepped away to eat | 17:07 |
fungi | but i'm around and ready to watch/fix stuff | 17:07 |
clarkb | ok approved | 17:08 |
clarkb | next up is looking at etherpad. Earlier I think you mentioned waiting until 1800 for that to be sure people had wound the ptg down? | 17:08 |
fungi | yeah, sessions officially ended at 17:00 but i expect some may run long | 17:09 |
clarkb | wfm | 17:09 |
fungi | but an hour after sessions have officially ended seems like plenty of buffer | 17:09 |
fungi | and the outage should be extremely brief | 17:10 |
clarkb | yup | 17:10 |
opendevreview | Merged opendev/zone-opendev.org master: Swap gitea-lb01 to gitea-lb02 for opendev.org https://review.opendev.org/c/opendev/zone-opendev.org/+/861229 | 17:10 |
clarkb | gitea-lb02 is in dns for opendev.org now | 17:48 |
clarkb | It seems to work for me but please say something if you notice anything odd or unexpected. We can monitor resource utilization via cacti as well and probably clean up gitea-lb01 towards the end of the week if nothing comes up | 17:49 |
fungi | i'm cloning nova now just as a cursory check | 17:51 |
clarkb | ++ | 17:53 |
fungi | that worked | 17:53 |
fungi | i also let #openinfra-events know about the impending etherpad restart | 17:54 |
clarkb | thanks. Before doing that etherpad down and up we should clean up the unneeded backup file if others agree that file is extra | 17:54 |
fungi | looking | 17:55 |
fungi | etherpad-mariadb.sql.gz.3 from 2022-10-11 or something else? | 17:55 |
clarkb | yes that one | 17:56 |
clarkb | it seems that the rotation failed. I suspect due to running out of disk space to move things around | 17:56 |
fungi | yes, it looks like there is currently insufficient space to create a new db backup even | 17:57 |
clarkb | I think we can manually delete it and then down then up -d the containers to clear out the json log file associated with the container. And that should free a bunch of space | 17:57 |
fungi | agreed | 17:57 |
fungi | i don't think .3 is necessarily "extra" but that it was probably created just before space got too tight to rotate any new backups | 17:58 |
clarkb | well I think we only keep like one extra day normally | 17:58 |
clarkb | which is why .1 and .2 don't exist now that .3 won't go away? | 17:58 |
fungi | and since then it's just been overwriting the primary backup file | 17:59 |
fungi | could be | 17:59 |
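The retention behavior being puzzled over matches logrotate-style numbered rotation; a toy sketch of the general scheme (assumption: this is not the actual backup cron's code):

```python
# Toy sketch of "rotate 3"-style numbered rotation; not the real script.
import os

def rotate(path, keep=3):
    # Drop the oldest copy, shift the rest up one slot, then free the
    # unsuffixed name so a fresh dump can be written.
    oldest = f"{path}.{keep}"
    if os.path.exists(oldest):
        os.remove(oldest)
    for n in range(keep - 1, 0, -1):
        src = f"{path}.{n}"
        if os.path.exists(src):
            os.replace(src, f"{path}.{n + 1}")
    if os.path.exists(path):
        os.replace(path, f"{path}.1")
```

If the dump step starts failing once space runs out, no fresh .1/.2 files get produced and the highest-numbered copy can outlive its siblings, which fits what clarkb observed.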
clarkb | fungi: were you planning to do the things or should I? | 18:00 |
clarkb | (I'm around to help either way) | 18:00 |
fungi | i can do the things, no prob | 18:00 |
fungi | working on that now | 18:00 |
clarkb | thanks! | 18:00 |
fungi | first deleting /var/backups/etherpad-mariadb/etherpad-mariadb.sql.gz.3 | 18:00 |
fungi | free space on the rootfs went from 2.9G to 6.4G | 18:01 |
fungi | when we stop and start the container, is there a log deletion step i need to do in between? | 18:02 |
clarkb | fungi: you have to use docker-compose down then docker-compose up -d which will cause docker-compose and docker to delete the containers and create new ones. The container deletion step will wipe the logs | 18:02 |
clarkb | if you stop then start the container process will stop and you'd have to manually delete the files behind docker's back which seems hackier | 18:03 |
fungi | perfect, so no extra step required | 18:03 |
fungi | done. available space on the rootfs is back up to 20G now | 18:04 |
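What makes "down then up" sufficient: each container's stdout/stderr is captured by Docker's json-file driver in a per-container log that is only removed when the container itself is deleted, which docker-compose down does. A sketch using the Docker SDK for Python to see where that space lives (an illustration; needs root so the log paths are readable):

```python
# Sketch: report per-container json log sizes; needs the docker SDK
# (pip install docker) and root to stat files under /var/lib/docker.
import os
import docker

client = docker.from_env()
for container in client.containers.list():
    log_path = container.attrs["LogPath"]  # .../containers/<id>/<id>-json.log
    size = os.path.getsize(log_path) if os.path.exists(log_path) else 0
    print(f"{container.name}: {size / 1024 ** 2:.1f} MiB in {log_path}")
```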
clarkb | heh we just got an email about backups failing. I checked the log and the reason is that we stopped the database server (the docker-compose down) while it was backing up remotely | 18:05 |
clarkb | I think that is fine, but if we are concerned we could manually retrigger the backup crontab entry in a root screen | 18:05 |
clarkb | I'm able to open a couple of etherpads so all looks well from that perspective | 18:06 |
clarkb | according to cacti the gitea-lb02 network traffic is picking up. System load and cpu usage both look to be happy so far | 18:10 |
fungi | yeah, etherpads seem to be working for me | 18:13 |
fungi | database backups kick off at 04:42z daily, so i expect if we interrupted that db backup then it was approximately hung | 18:14 |
clarkb | fungi: there are two different db backups | 18:15 |
clarkb | there are the borg driven backups which happen randomly for each server and stream to the borg backup servers (this is what we interrupted). And there is the local write-a-file-on-the-host db backup for ease of use, which is what we rm'd the stale file for | 18:15 |
fungi | oic | 18:15 |
fungi | it's the local mysqldump which happens at 04:42 | 18:16 |
clarkb | yup | 18:16 |
clarkb | fungi: https://review.opendev.org/q/topic:docker-cleanups+OR+topic:use-new-python is a set of docker image fixes and python modernization changes | 18:17 |
clarkb | if you've got time to take a look at those, a number of them are probably fairly safe to land | 18:17 |
fungi | and indeed, the borg backup to backup01.ord.rax.opendev.org started at 17:49z, about 15 minutes before i took the container down | 18:18 |
fungi | #status log Restarted the services on etherpad.opendev.org in order to free up some disk space | 18:25 |
opendevstatus | fungi: finished logging | 18:25 |
opendevreview | Merged opendev/system-config master: Fixup jinja-init image https://review.opendev.org/c/opendev/system-config/+/861473 | 18:41 |
clarkb | sounds like gitea 1.18 rc0 will be out in a few days | 18:50 |
clarkb | that will include the vendor file identification fix I wrote. I'll try to get a patch up testing it once the tag exists | 18:50 |
fungi | oh, nice! | 18:51 |
opendevreview | Merged openstack/project-config master: Move grafyaml check and gate jobs in repo https://review.opendev.org/c/openstack/project-config/+/861482 | 20:10 |
opendevreview | Merged opendev/grafyaml master: Run pep8 and unittest jobs out of in repo config https://review.opendev.org/c/opendev/grafyaml/+/861483 | 20:22 |
opendevreview | MICHAEL KELLY proposed zuul/zuul-jobs master: helm: Add job for linting helm charts https://review.opendev.org/c/zuul/zuul-jobs/+/861799 | 21:26 |
opendevreview | Clark Boylan proposed opendev/system-config master: Stop updating pip in our docker assemble script https://review.opendev.org/c/opendev/system-config/+/861800 | 21:35 |
*** dasm is now known as dasm|off | 22:15 | |
clarkb | hrm I think there is a chicken and egg in that change for the uwsgi image | 22:17 |
clarkb | the good news with that is it means we do actually test it. The bad news is I have to figure out how to unravel things | 22:17 |
opendevreview | MICHAEL KELLY proposed zuul/zuul-jobs master: helm: Add job for linting helm charts https://review.opendev.org/c/zuul/zuul-jobs/+/861799 | 22:18 |
clarkb | hrm so it does seem that the uwsgi builds properly did not update pip but we've still got the same problem | 22:24 |
clarkb | in this case with the uwsgi package instead of netifaces | 22:24 |
clarkb | maybe my reproduction case with netifaces locally was too trivial and there is something bigger happening | 22:27 |
clarkb | hrm and this doesn't fix nodepool either | 22:28 |
opendevreview | MICHAEL KELLY proposed zuul/zuul-jobs master: helm: Add job for linting helm charts https://review.opendev.org/c/zuul/zuul-jobs/+/861799 | 22:32 |
opendevreview | MICHAEL KELLY proposed zuul/zuul-jobs master: helm: Add job for linting helm charts https://review.opendev.org/c/zuul/zuul-jobs/+/861799 | 22:42 |
opendevreview | Ian Wienand proposed zuul/zuul-jobs master: Pin sphinx to 5.2.3 https://review.opendev.org/c/zuul/zuul-jobs/+/861587 | 23:21 |
opendevreview | Ian Wienand proposed zuul/zuul-jobs master: Workaround stevedore/python3.7 issues https://review.opendev.org/c/zuul/zuul-jobs/+/861698 | 23:25 |
opendevreview | Merged opendev/system-config master: infra-prod-bootstrap-bridge: run directly on bridge https://review.opendev.org/c/opendev/system-config/+/861138 | 23:34 |
opendevreview | Ian Wienand proposed opendev/system-config master: docs: Update force-merge docs for removing votes https://review.opendev.org/c/opendev/system-config/+/861802 | 23:41 |
opendevreview | Merged zuul/zuul-jobs master: Pin sphinx to 5.2.3 https://review.opendev.org/c/zuul/zuul-jobs/+/861587 | 23:48 |
opendevreview | Ian Wienand proposed zuul/zuul-jobs master: Workaround stevedore/python3.7 issues https://review.opendev.org/c/zuul/zuul-jobs/+/861698 | 23:51 |
opendevreview | Ian Wienand proposed zuul/zuul-jobs master: linter: Use capitals for names https://review.opendev.org/c/zuul/zuul-jobs/+/854933 | 23:51 |
opendevreview | Ian Wienand proposed zuul/zuul-jobs master: Fix ansible-lint name[template] https://review.opendev.org/c/zuul/zuul-jobs/+/861559 | 23:51 |
opendevreview | Ian Wienand proposed zuul/zuul-jobs master: Add names to include tasks https://review.opendev.org/c/zuul/zuul-jobs/+/861560 | 23:51 |
opendevreview | Ian Wienand proposed zuul/zuul-jobs master: Standarise block/when ordering https://review.opendev.org/c/zuul/zuul-jobs/+/861562 | 23:51 |
opendevreview | Ian Wienand proposed zuul/zuul-jobs master: Update to ansible-lint 6.8.2 https://review.opendev.org/c/zuul/zuul-jobs/+/861563 | 23:51 |
opendevreview | Ian Wienand proposed zuul/zuul-jobs master: [wip] sphinx circular dependencies error https://review.opendev.org/c/zuul/zuul-jobs/+/861588 | 23:51 |