*** stevebaker has joined #openstack-infra | 00:01 | |
*** mattw4 has quit IRC | 00:04 | |
*** dchen has joined #openstack-infra | 00:09 | |
openstackgerrit | Mohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s https://review.opendev.org/701764 | 00:20 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s https://review.opendev.org/701764 | 00:27 |
*** hrw has joined #openstack-infra | 00:33 | |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: collect-container-logs: add role https://review.opendev.org/701867 | 00:34 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s https://review.opendev.org/701764 | 00:36 |
*** zhurong has quit IRC | 00:36 | |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: collect-container-logs: add role https://review.opendev.org/701867 | 00:38 |
*** tetsuro has joined #openstack-infra | 00:39 | |
openstackgerrit | Mohammed Naser proposed zuul/zuul-registry master: Switch to collect-container-logs https://review.opendev.org/701868 | 00:39 |
openstackgerrit | Mohammed Naser proposed zuul/nodepool master: Switch to collect-container-logs https://review.opendev.org/701869 | 00:42 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-registry master: Switch to collect-container-logs https://review.opendev.org/701868 | 00:42 |
openstackgerrit | Mohammed Naser proposed opendev/system-config master: Switch to collect-container-logs https://review.opendev.org/701870 | 00:47 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: collect-container-logs: add role https://review.opendev.org/701867 | 00:52 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s https://review.opendev.org/701764 | 00:52 |
*** eandersson has joined #openstack-infra | 00:54 | |
eandersson | stackalytics.com cert expired? | 00:54 |
fungi | eandersson: so we've heard | 00:56 |
fungi | we don't run it, and no clue who to reach out to at mirantis | 00:56 |
fungi | (we offered to run it more than once in the past) | 00:57 |
eandersson | Hopefully someone that cares enough to fix it :p | 00:57 |
mnaser | infra-root: i think one of the executors might have issues with log streaming, as i'm seeing occasional "--- END OF STREAM ---" on jobs that are clearly running and eventually report a result | 00:58 |
mnaser | example: http://zuul.opendev.org/t/zuul/stream/bf5d120011d448c8baedcce26d0b31d0?logfile=console.log | 00:59 |
mnaser | according to the api, ze05 is the one running that job | 00:59 |
fungi | mnaser: it happens frequently that we go over memory on them and the oom-killer decides the output streamer would be a good thing to arbitrarily kill | 01:04 |
fungi | i'll take a look | 01:04 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: helm-template: Add role to run 'helm template' https://review.opendev.org/701871 | 01:06 |
fungi | the following executors need restarts to get their output streamers going again: 02,03,04,05,12 | 01:07 |
fungi | so almost half | 01:07 |
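The check fungi did by hand could be scripted roughly like this: probe each executor's finger log-streaming port (7900 is the zuul-executor default; the hostnames, domain, and exact fleet are assumptions for illustration).

```shell
# Sketch: flag executors whose log streamer is not accepting connections.
check_streamer() {
    # returns 0 if something is listening on host:port
    timeout 3 bash -c ">/dev/tcp/$1/$2" 2>/dev/null
}

for ze in ze01 ze02 ze03 ze04 ze05; do
    if ! check_streamer "${ze}.openstack.org" 7900; then
        echo "${ze}: log streamer down"
    fi
done
```

A "down" result here usually means the streamer process was killed (e.g. by the oom-killer, as discussed below) while the executor itself kept running.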
openstackgerrit | Mohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s https://review.opendev.org/701764 | 01:07 |
fungi | i'll poke at them for a bit | 01:07 |
fungi | things look pretty quiet, so i can just restart them all at the same time and let the other 7 handle the load in the interim | 01:09 |
*** zbr|rover has quit IRC | 01:11 | |
*** HenryG has quit IRC | 01:11 | |
*** HenryG has joined #openstack-infra | 01:12 | |
openstackgerrit | Mohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s https://review.opendev.org/701764 | 01:14 |
*** zhurong has joined #openstack-infra | 01:15 | |
*** roman_g has quit IRC | 01:15 | |
*** zbr has joined #openstack-infra | 01:19 | |
*** zbr has quit IRC | 01:24 | |
*** zbr has joined #openstack-infra | 01:25 | |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: apply-helm-charts: Job to apply Helm charts https://review.opendev.org/701874 | 01:29 |
clarkb | we have LE certs on zuul.o.o now | 01:31 |
clarkb | I'll merge the change to start using those certs first thing tomorrow morning | 01:31 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s https://review.opendev.org/701764 | 01:31 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: helm-template: Add role to run 'helm template' https://review.opendev.org/701871 | 01:40 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: apply-helm-charts: Job to apply Helm charts https://review.opendev.org/701874 | 01:40 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s https://review.opendev.org/701764 | 01:40 |
*** Lucas_Gray has quit IRC | 01:43 | |
openstackgerrit | Mohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s https://review.opendev.org/701764 | 01:47 |
*** ricolin_ has joined #openstack-infra | 01:50 | |
*** zbr_ has joined #openstack-infra | 02:15 | |
*** zbr has quit IRC | 02:16 | |
*** zbr_ has quit IRC | 02:17 | |
*** gyee has quit IRC | 02:23 | |
*** rh-jelabarre has joined #openstack-infra | 02:26 | |
*** zbr has joined #openstack-infra | 02:28 | |
*** zxiiro has quit IRC | 02:35 | |
fungi | #status log restarted zuul-executor service on ze02,03,04,05,12 to get log streamers running again after oom-killer got them; had to clear stale pidfile on zm04 | 02:41 |
openstackstatus | fungi: finished logging | 02:41 |
*** rh-jelabarre has quit IRC | 02:44 | |
*** rlandy has quit IRC | 02:55 | |
mnaser | thank you for taking care of it fungi | 02:56 |
fungi | no problem | 02:59 |
*** apetrich has quit IRC | 03:12 | |
*** ricolin_ has quit IRC | 03:18 | |
*** ricolin has joined #openstack-infra | 03:19 | |
*** psachin has joined #openstack-infra | 03:22 | |
*** armax has quit IRC | 03:41 | |
*** ykarel|away has joined #openstack-infra | 04:22 | |
*** hwoarang has quit IRC | 04:24 | |
*** hwoarang has joined #openstack-infra | 04:26 | |
*** stevebaker has quit IRC | 04:41 | |
*** tetsuro has quit IRC | 04:44 | |
*** tetsuro has joined #openstack-infra | 04:45 | |
*** tetsuro has quit IRC | 04:49 | |
hrw | fungi: thanks! | 04:54 |
*** surpatil has joined #openstack-infra | 05:00 | |
*** factor has quit IRC | 05:08 | |
*** factor has joined #openstack-infra | 05:08 | |
*** ykarel has joined #openstack-infra | 05:18 | |
*** ykarel|away has quit IRC | 05:20 | |
*** tkajinam has quit IRC | 05:26 | |
*** tkajinam has joined #openstack-infra | 05:29 | |
*** goldyfruit has quit IRC | 05:30 | |
*** goldyfruit has joined #openstack-infra | 05:30 | |
*** evrardjp has quit IRC | 05:33 | |
*** evrardjp has joined #openstack-infra | 05:34 | |
*** ykarel_ has joined #openstack-infra | 05:35 | |
*** kjackal has joined #openstack-infra | 05:35 | |
*** bdodd has joined #openstack-infra | 05:37 | |
*** ykarel has quit IRC | 05:38 | |
*** exsdev has quit IRC | 05:44 | |
*** tetsuro has joined #openstack-infra | 05:45 | |
*** exsdev has joined #openstack-infra | 05:46 | |
*** exsdev has quit IRC | 05:48 | |
*** tetsuro has quit IRC | 05:49 | |
*** tetsuro has joined #openstack-infra | 05:53 | |
*** tetsuro has quit IRC | 05:57 | |
*** tetsuro_ has joined #openstack-infra | 05:57 | |
*** tkajinam_ has joined #openstack-infra | 06:02 | |
*** exsdev has joined #openstack-infra | 06:02 | |
*** tkajinam has quit IRC | 06:04 | |
*** lpetrut has joined #openstack-infra | 06:08 | |
*** lpetrut has quit IRC | 06:09 | |
*** lpetrut has joined #openstack-infra | 06:10 | |
*** kjackal has quit IRC | 06:11 | |
*** lmiccini has joined #openstack-infra | 06:42 | |
*** ykarel_ is now known as ykarel | 07:03 | |
*** exsdev has quit IRC | 07:09 | |
*** rcernin has quit IRC | 07:15 | |
*** slaweq has joined #openstack-infra | 07:18 | |
*** exsdev has joined #openstack-infra | 07:23 | |
*** pgaxatte has joined #openstack-infra | 07:24 | |
*** dpawlik has joined #openstack-infra | 07:41 | |
*** iurygregory has joined #openstack-infra | 07:42 | |
*** rpittau|afk is now known as rpittau | 07:44 | |
*** kjackal has joined #openstack-infra | 07:45 | |
*** pcaruana has joined #openstack-infra | 07:53 | |
*** kjackal has quit IRC | 07:57 | |
*** ykarel is now known as ykarel|lunch | 07:57 | |
hrw | fungi: kolla-build-ubuntu-source-aarch64 SUCCESS in 1h 48m 13s (non-voting) | 07:58 |
hrw | fungi: thanks again | 07:58 |
*** jtomasek has joined #openstack-infra | 08:09 | |
*** tetsuro_ has quit IRC | 08:10 | |
*** gfidente|afk is now known as gfidente | 08:11 | |
*** tosky has joined #openstack-infra | 08:12 | |
*** dchen has quit IRC | 08:16 | |
*** iurygregory has quit IRC | 08:17 | |
*** tesseract has joined #openstack-infra | 08:22 | |
*** tkajinam_ has quit IRC | 08:22 | |
*** fdegir has quit IRC | 08:22 | |
*** fdegir has joined #openstack-infra | 08:23 | |
*** pkopec has joined #openstack-infra | 08:23 | |
*** pkopec has quit IRC | 08:23 | |
*** kjackal has joined #openstack-infra | 08:29 | |
*** iurygregory has joined #openstack-infra | 08:33 | |
*** dpawlik has quit IRC | 08:39 | |
*** dpawlik has joined #openstack-infra | 08:45 | |
*** pcaruana has quit IRC | 08:46 | |
*** harlowja has quit IRC | 08:48 | |
*** xek__ has joined #openstack-infra | 08:49 | |
*** factor has quit IRC | 08:50 | |
*** factor has joined #openstack-infra | 08:51 | |
*** harlowja has joined #openstack-infra | 08:51 | |
*** pcaruana has joined #openstack-infra | 08:53 | |
*** jpena|off is now known as jpena | 08:54 | |
*** ralonsoh has joined #openstack-infra | 08:56 | |
*** ykarel|lunch is now known as ykarel | 09:04 | |
*** gibi has left #openstack-infra | 09:06 | |
*** zbr is now known as zbr|rover | 09:19 | |
*** lucasagomes has joined #openstack-infra | 09:22 | |
*** apetrich has joined #openstack-infra | 09:28 | |
*** derekh has joined #openstack-infra | 09:35 | |
*** ociuhandu has joined #openstack-infra | 09:48 | |
*** apetrich has quit IRC | 10:04 | |
*** dtantsur|afk is now known as dtantsur | 10:05 | |
*** ykarel is now known as ykarel|afk | 10:11 | |
*** apetrich has joined #openstack-infra | 10:13 | |
*** ykarel|afk is now known as ykarel | 10:35 | |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: [WIP] Docker compose example: add keycloak authentication https://review.opendev.org/664813 | 10:41 |
*** hrw has left #openstack-infra | 10:52 | |
*** aedc has joined #openstack-infra | 10:58 | |
*** aedc has quit IRC | 11:03 | |
*** rpittau is now known as rpittau|bbl | 11:21 | |
*** Lucas_Gray has joined #openstack-infra | 11:48 | |
*** sshnaidm is now known as sshnaidm|off | 11:52 | |
*** Lucas_Gray has quit IRC | 11:53 | |
*** jpena is now known as jpena|lunch | 12:01 | |
*** ykarel is now known as ykarel|afk | 12:20 | |
*** pcaruana has quit IRC | 12:34 | |
*** surpatil has quit IRC | 12:34 | |
*** ykarel|afk is now known as ykarel | 12:36 | |
*** pcaruana has joined #openstack-infra | 12:38 | |
*** rpittau|bbl is now known as rpittau | 12:55 | |
*** ociuhandu has quit IRC | 12:55 | |
*** ociuhandu has joined #openstack-infra | 12:56 | |
*** jpena|lunch is now known as jpena | 12:57 | |
*** ociuhandu has quit IRC | 12:58 | |
*** ociuhandu has joined #openstack-infra | 12:58 | |
*** ykarel is now known as ykarel|afk | 13:03 | |
*** goldyfruit has quit IRC | 13:05 | |
*** rh-jelabarre has joined #openstack-infra | 13:05 | |
*** goldyfruit has joined #openstack-infra | 13:06 | |
*** psachin has quit IRC | 13:09 | |
*** ykarel|afk is now known as ykarel|away | 13:09 | |
openstackgerrit | Lee Yarwood proposed openstack/devstack-gate master: nova: Renable n-net on stable/queens|pike|ocata https://review.opendev.org/701957 | 13:12 |
*** trident has quit IRC | 13:13 | |
*** trident has joined #openstack-infra | 13:15 | |
*** ociuhandu has quit IRC | 13:22 | |
*** ociuhandu_ has joined #openstack-infra | 13:22 | |
*** rlandy has joined #openstack-infra | 13:22 | |
*** aedc has joined #openstack-infra | 13:35 | |
*** Goneri has joined #openstack-infra | 13:39 | |
openstackgerrit | Lee Yarwood proposed openstack/devstack-gate master: nova: Renable n-net on stable/rocky|queens|pike|ocata https://review.opendev.org/701957 | 13:47 |
*** aedc has quit IRC | 13:51 | |
*** gfidente has quit IRC | 13:59 | |
*** gfidente has joined #openstack-infra | 14:05 | |
openstackgerrit | Simon Westphahl proposed zuul/nodepool master: Always identify static nodes by node tuple https://review.opendev.org/701969 | 14:06 |
openstackgerrit | Simon Westphahl proposed zuul/nodepool master: Always identify static nodes by node tuple https://review.opendev.org/701969 | 14:10 |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: JWT drivers: Deprecate RS256withJWKS, introduce OpenIDConnect https://review.opendev.org/701972 | 14:20 |
*** liuyulong has joined #openstack-infra | 14:25 | |
*** dtantsur is now known as dtantsur|brb | 14:27 | |
*** ociuhandu has joined #openstack-infra | 14:46 | |
*** ociuhandu_ has quit IRC | 14:46 | |
openstackgerrit | David Shrewsbury proposed zuul/zuul master: Extract project config YAML into ref docs https://review.opendev.org/701977 | 14:47 |
*** eernst has joined #openstack-infra | 14:49 | |
*** ykarel|away is now known as ykarel | 14:53 | |
*** eernst has quit IRC | 14:56 | |
*** lmiccini has quit IRC | 14:57 | |
*** dave-mccowan has joined #openstack-infra | 15:06 | |
fungi | amotoki: tosky: AJaeger: looking at those stable tox failures, the log shows /usr/local/bin/tox is being run directly (not under an explicit interpreter) so it must be getting installed with python3. the job log also indicates the ensure-tox role found an existing tox executable so that suggests it's preinstalled in our images (i couldn't find any record of tox getting installed within the job). | 15:08 |
fungi | unfortunately our nodepool image build logs don't seem to be verbose enough to include confirmation that tox is being installed or how, so i'll need to dig into nodepool element sources | 15:08 |
AJaeger | fungi, thanks for digging into this | 15:09 |
frickler | fungi: where is this failing? I seem to remember that we had (and fixed) some similar issue a couple of weeks ago | 15:15 |
fungi | infra-root: nb01 has run out of tempspace and is no longer able to build images. i'll attempt to remedy | 15:15 |
*** jtomasek has quit IRC | 15:16 | |
*** armax has joined #openstack-infra | 15:16 | |
*** eharney has joined #openstack-infra | 15:17 | |
fungi | frickler: stable branch tox jobs for horizon at least. this example was a pep8 job for stable/pike: https://zuul.opendev.org/t/openstack/build/daaeaedb0a184e29a03eeaae59157c78/ | 15:18 |
fungi | it looks like probably a few weeks ago (mid-december) our ubuntu-xenial images started installing tox under python3 | 15:18 |
frickler | yes they did and iirc we said the fix was to set basepython=2.7 for those jobs that need that | 15:20 |
*** zul has joined #openstack-infra | 15:20 | |
fungi | the solution we suggested for ubuntu-bionic is still applicable in my opinion (stable/pike of horizon can't run `tox -e pep8` on python3 but its tox.ini doesn't actually indicate that) | 15:20 |
fungi | frickler: the ml thread from november was about the default python for tox changing on our ubuntu-bionic images | 15:21 |
fungi | some weeks later, something happened to make it the case on ubuntu-xenial images as well | 15:21 |
fungi | http://lists.openstack.org/pipermail/openstack-discuss/2019-November/010957.html | 15:22 |
openstackgerrit | Merged openstack/project-config master: Remove duplicated ACL files https://review.opendev.org/700913 | 15:22 |
*** ociuhandu has quit IRC | 15:23 | |
amotoki | it seems tox uses the interpreter where tox is installed as the default python when basepython is not specified. | 15:26 |
amotoki | in case of horizon, we have landed a workaround like https://review.opendev.org/#/c/701848/ | 15:27 |
amotoki | it turns out it affects all jobs without basepython.... horizon is okay now but I am afraid that at least horizon plugins are affected. | 15:28 |
fungi | amotoki: yes, the solution in ianw_pto's ml post from november 20 is still probably a good idea just so that the tox.ini is appropriately explicit about what major version of python it needs | 15:31 |
fungi | otherwise a developer who has installed tox with python3 on their machine will encounter the same problems when trying to run tests locally | 15:32 |
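amotoki's observation can be verified directly on any machine: the shebang on the tox entry point shows which interpreter tox was installed under, and that interpreter becomes the default for envs with no `basepython`. (The lookup path is an assumption; `TOX_BIN` can be overridden.)

```shell
# Which python will tox default to? Inspect the entry point's shebang.
TOX_BIN="${TOX_BIN:-$(command -v tox || true)}"
if [ -n "$TOX_BIN" ]; then
    head -1 "$TOX_BIN"    # e.g. "#!/usr/bin/python3"
else
    echo "tox not installed here"
fi
```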
frickler | fungi: amotoki: and that argument holds regardless of how we changed the default for xenial, so I'm not sure how much value there is in digging into that | 15:32 |
fungi | i'm still curious to know how it ended up changing for xenial images, but yes the answer likely doesn't change the recommendation | 15:33 |
openstackgerrit | Thierry Carrez proposed openstack/project-config master: Define check-release-approval executor job https://review.opendev.org/701982 | 15:33 |
amotoki | yeah, I agree that we suggest to have the interpreter explicitly in tox.ini, | 15:33 |
amotoki | on the other hand, I am confused as it happens in xenial. we would like to avoid a workaround for older stable branches. | 15:34 |
fungi | well, it's not a workaround, it's fixing a latent bug which simply hadn't surfaced in our ci jobs | 15:35 |
fungi | but it's a bug which could easily bite developers running tox locally, as i mentioned | 15:35 |
amotoki | exactly | 15:36 |
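The fix being discussed (make tox.ini explicit about the python it needs) looks roughly like this fragment; the env name and version are illustrative, not the actual horizon change:

```ini
[testenv:pep8]
basepython = python2.7
```

With this pin, the env behaves the same whether tox itself was installed under python2 or python3.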
openstackgerrit | Merged zuul/zuul-jobs master: Make pre-molecule tox playbook platform agnostic https://review.opendev.org/700452 | 15:38 |
*** ociuhandu has joined #openstack-infra | 15:44 | |
*** jpena is now known as jpena|brb | 15:45 | |
fungi | i wonder if https://review.opendev.org/697211 (merged to dib on december 12, released in 2.32.0 the next day, probably started influencing our image builds the day after that) is what changed | 15:46 |
AJaeger | fungi: if it worked on the 18th, then its still 5 days difference, isn't it? | 15:47 |
AJaeger | still, we might not have built images for 5 days... | 15:47 |
fungi | yeah, not sure. you're right the timing doesn't match up though | 15:48 |
fungi | our image build logs don't go back that far | 15:48 |
frickler | fungi: builds on nb02 also seem to be failing, does it need to be cleaned up, too? /me needs to leave now | 15:51 |
fungi | frickler: quite possibly, i'll take a look after i finish with nb01 | 15:52 |
*** rfolco has quit IRC | 15:52 | |
openstackgerrit | Thierry Carrez proposed openstack/project-config master: Define check-release-approval executor job https://review.opendev.org/701982 | 15:55 |
fungi | frickler: which image build failures were you seeing on nb02? its dib scratchspace is only 56% used right now | 15:57 |
*** ykarel is now known as ykarel|away | 16:01 | |
*** eernst has joined #openstack-infra | 16:04 | |
*** roman_g has joined #openstack-infra | 16:07 | |
*** dtantsur|brb is now known as dtantsur | 16:08 | |
clarkb | infra-root https://review.opendev.org/#/c/701821/ should be the last step in LE'ing zuul.opendev.org (the certs are in place now we have to consume them in apache) | 16:14 |
clarkb | mordred: corvus frickler ^ I'm able to watch that today if you approve it | 16:14 |
clarkb | also I've confirmed that zuul-ci.org no longer has a "your cert will expire soon" warning | 16:14 |
mordred | clarkb: +A | 16:15 |
clarkb | tyty | 16:15 |
*** jpena|brb is now known as jpena | 16:22 | |
*** mattw4 has joined #openstack-infra | 16:23 | |
*** rlandy is now known as rlandy|brb | 16:29 | |
openstackgerrit | Merged opendev/system-config master: Use zuul.opendev.org LE cert https://review.opendev.org/701821 | 16:34 |
fungi | what's the safest way to clean up a full /opt on a nodepool builder? on nb01 i see we have 0.9tb in /opt/nodepool_dib and 45gb in /opt/dib_cache | 16:35 |
fungi | i've stopped the nodepool-builder service on the server | 16:36 |
fungi | looks like there are a ton of kernel threads for loop and bioset handling, suggesting leaked devices in chroots? | 16:37 |
clarkb | fungi: usually I disable the service then reboot it to clear stale mounts | 16:37 |
fungi | do those need to be cleared out somehow too? | 16:37 |
clarkb | yes | 16:37 |
clarkb | then /opt/dib_tmp is typically what you clean up | 16:37 |
clarkb | having .9tb in nodepool_dib implies that the space is consumed by actual image builds | 16:38 |
clarkb | which may imply that nb02 isn't sharing the load | 16:38 |
fungi | yeah, /opt/dib_tmp is only 7mb | 16:38 |
clarkb | ya nb02 is out to lunch too | 16:38 |
fungi | so clearing that out won't do much | 16:38 |
clarkb | its got a ton of stale build processes from 2020 | 16:39 |
clarkb | er 2019 | 16:39 |
clarkb | I think the fix in this case is to have nb02 come back and take some of the image load off of nb01 | 16:39 |
clarkb | then clean up nb01 if necessary | 16:39 |
fungi | okay, but same cleanup process on both? | 16:39 |
clarkb | ya | 16:39 |
*** lucasagomes has quit IRC | 16:40 | |
clarkb | (this was why I was cleaning up old images a while back, to reduce the total number of images we had so that a single builder had a chance at building them. I think we cleaned up all the images we could clean up at the time though) | 16:40 |
fungi | /opt/dib_tmp on nb02 is definitely larger, waiting on du to tell me how much | 16:41 |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: web capabilities: remove unused job_history attribute https://review.opendev.org/702001 | 16:41 |
clarkb | we cleared out a couple fedora images and opensuse images iirc | 16:41 |
clarkb | I wonder if maybe the oldest debian can go too? | 16:41 |
fungi | likely, but we'd want to codesearch to see if it's in use before we pull it | 16:42 |
clarkb | yup that is what we did with the other images. Pushed up changes to remove jobs that use them if just old or update them to use newer options. Then remove the nodeset. Then remove the images | 16:43 |
clarkb | not a quick process, but this was a big part of the motivation for it. | 16:44 |
fungi | i wonder if it's time for another amd64 builder so losing one doesn't cause the other to fill up | 16:44 |
clarkb | or add more disk to the existing builders | 16:45 |
clarkb | another option would be to delete the image from local disk once uploaded to all the clouds (but then people won't be able to download them) | 16:45 |
*** pgaxatte has quit IRC | 16:45 | |
clarkb | we could potentially keep just the qcow2 compressed version and then convert to raw or vhd if necessary from there | 16:45 |
clarkb | Shrews: ^ as soon as I've said that I've realized that could be a really nice nodepool-builder feature | 16:46 |
clarkb | Shrews: basically keep a version of the image (qcow2 will almost always be smallest) for recovery purposes if necessary but delete the other versions once they have finished uploading | 16:46 |
clarkb | then we have 9GB * num images storage space instead of 60GB * num images storage space | 16:46 |
fungi | that does make automated reuploading of raw images harder i guess? | 16:48 |
fungi | or when adding a new provider (the builder would normally start uploading already-built images to it automatically as soon as the provider was added, right?) | 16:48 |
clarkb | fungi: you'd have to qemu-img convert them first | 16:48 |
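If only the compressed copy were kept, regenerating the other formats would be a `qemu-img convert` away; a guarded sketch (the filename is hypothetical, and `vpc` is qemu-img's name for the VHD format):

```shell
# Hypothetical recovery step: rebuild provider-specific formats from the
# one retained qcow2 copy.
img=ubuntu-xenial-0000012345
if command -v qemu-img >/dev/null 2>&1 && [ -f "${img}.qcow2" ]; then
    qemu-img convert -O raw "${img}.qcow2" "${img}.raw"
    qemu-img convert -O vpc "${img}.qcow2" "${img}.vhd"
fi
```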
clarkb | that is a good point about adding a new provider | 16:49 |
clarkb | we could force a new build at that point as a workaround but that isn't very user friendly | 16:49 |
fungi | hrm, not a lot more in /opt/dib_tmp on nb02 either... 50gb according to du | 16:50 |
clarkb | fungi: ya check the ps listings for disk-image-create though | 16:51 |
clarkb | nb02 seems stuck on a process problem not a disk problem | 16:51 |
fungi | right, just in terms of clearing out /opt/dib_tmp it's not really going to free up much is what i meant | 16:51 |
clarkb | ya but its got about 500GB free | 16:51 |
clarkb | which is normal | 16:51 |
clarkb | (and why, if we lose one, the other fills its 1TB of disk) | 16:52 |
*** lpetrut has quit IRC | 16:52 | |
fungi | how do you normally go about disabling nodepool-builder? the update-rc.d tool or rename the rc.2 symlinks from S to K or via systemctl disable or some other way? | 16:52 |
fungi | edit the initscript to exit 0? | 16:52 |
clarkb | systemctl disable nodepool-builder | 16:53 |
clarkb | it should give you a message about updating init script things | 16:53 |
fungi | cool, i didn't know that worked for sysv-compat | 16:56 |
clarkb | yup, the way systemd sysv compat works is it automatically adds a shim unit file for each sysv init script | 16:56 |
*** gyee has joined #openstack-infra | 16:56 | |
*** ociuhandu has quit IRC | 16:57 | |
clarkb | systemctl can then manage that shim as if it were any other unit | 16:57 |
fungi | thanks! | 16:57 |
clarkb | (this is why we have to daemon-reload systemd in our puppetry to have systemd figure out the service exists) | 16:57 |
*** dpawlik has quit IRC | 16:57 | |
fungi | got it | 16:57 |
fungi | so as far as bringing these back online after rebooting and clearing out /opt/dib_tmp, i should enable and start nodepool-builder on nb02 first and leave it stopped on nb01 until a full set of images is going? | 16:59 |
fungi | (so that nb01 doesn't try to build more images when it lacks disk space to write them?) | 16:59 |
clarkb | you'll need to leave it running on nb01 so that it can delete the images that nb02 builds new ones for | 16:59 |
clarkb | I think | 17:00 |
fungi | is it smart enough to know not to try to build any on nb01 until it deletes some? | 17:00 |
clarkb | no | 17:00 |
clarkb | it will fail to build images in that period | 17:00 |
fungi | because it's going to have maybe 90gb free here after i clear dib_tmp | 17:00 |
clarkb | this is where the auto rebuild aggressiveness makes it difficult to work with nodepool | 17:00 |
clarkb | because we could delete the older images of a pair to free up space but then it will immediately start trying to build that image | 17:01 |
fungi | is it safe to delete /opt/dib_tmp itself, or do i need to leave the directory and just remove contents? | 17:01 |
clarkb | you need to remove the contents or wait for puppet to run and put it back or put it back yourself | 17:01 |
clarkb | nodepool doesn't create that dir | 17:01 |
fungi | okay | 17:01 |
fungi | thanks | 17:01 |
clarkb | (typically dib would use /tmp) | 17:02 |
clarkb | another option is to pause all images in the nb01 config, then delete the older image of the pairs on it | 17:02 |
fungi | hrm, nb02 isn't reachable yet. maybe a periodic fsck was triggered | 17:02 |
clarkb | then only unpause the images in nb01's config once nb02 has picked up some slack | 17:03 |
clarkb | Its probably ok to simply leave it running and let some errors happen? | 17:03 |
clarkb | fungi: I think reboots may be slow there due to needing to clean up all those mounts and stuff | 17:03 |
fungi | yeah, that seems reasonable | 17:03 |
clarkb | systemd will immediately stop sshd but then other things are slower | 17:03 |
clarkb | maybe what we need is the ability to set image pausing outside of config | 17:04 |
*** rlandy|brb is now known as rlandy | 17:04 | |
clarkb | then we could say nodepool pause foo, nodepool delete foo-1, wait for nb02, nodepool unpause foo | 17:05 |
clarkb | and not bother with emergency files and config | 17:05 |
*** ociuhandu has joined #openstack-infra | 17:06 | |
clarkb | we had a similar problem in the past where the most recent image was the problem. I wanted to delete the most recent image and use the previous image but nodepool immediately started building a new image that would be broken | 17:06 |
clarkb | solution there is to pause then delete | 17:06 |
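For reference, the config-file pause being contrasted with a hypothetical `nodepool pause` command is a per-diskimage flag in the builder's nodepool.yaml; a minimal fragment (image name illustrative):

```yaml
# nodepool.yaml (fragment)
diskimages:
  - name: ubuntu-xenial
    pause: true   # stop rebuilding; existing uploads stay usable
```

With the image paused in config, an older build can be deleted without the builder immediately kicking off a replacement.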
*** rpittau is now known as rpittau|afk | 17:10 | |
fungi | nb01 cleanup is done but i'm not starting it just yet because nb02 is still unreachable | 17:12 |
fungi | i'll check the oob console | 17:12 |
*** tesseract has quit IRC | 17:19 | |
fungi | nb02 console just shows "Ubuntu 16.04" and a little spinner | 17:20 |
fungi | not sure if it's booting or stopping | 17:20 |
fungi | hiding boot/shutdown progress from the console display is an unpardonable sin. why would that be the default? | 17:21 |
*** eernst has quit IRC | 17:22 | |
fungi | and the `console log show` cli command is unsupported for rackspace | 17:23 |
clarkb | fungi: thats long been an issue on ubuntu (the hiding console output on servers problem) | 17:23 |
fungi | i guess our options are to wait, or try to force a(nother) reboot and hope it doesn't irrecoverably corrupt /opt | 17:23 |
clarkb | I seem to recall that is a symptom of fscking | 17:23 |
*** zxiiro has joined #openstack-infra | 17:24 | |
clarkb | because you get error messages if there was actually something wrong much more quickly | 17:24 |
fungi | ahh | 17:24 |
*** liuyulong_ has joined #openstack-infra | 17:25 | |
fungi | so maybe it did hit a scheduled fsck on boot and since /opt is 1tb (and maybe slow)... | 17:25 |
clarkb | puppet apply at about 1800UTC should update zuul.opendev.org cert | 17:27 |
*** liuyulong has quit IRC | 17:28 | |
*** dtantsur is now known as dtantsur|afk | 17:30 | |
*** eernst has joined #openstack-infra | 17:30 | |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: helm-template: Add role to run 'helm template' https://review.opendev.org/701871 | 17:31 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: apply-helm-charts: Job to apply Helm charts https://review.opendev.org/701874 | 17:31 |
openstackgerrit | Mohammed Naser proposed zuul/zuul-jobs master: apply-helm-charts: Job to apply Helm charts https://review.opendev.org/701874 | 17:31 |
*** ociuhandu has quit IRC | 17:33 | |
*** evrardjp has quit IRC | 17:33 | |
*** evrardjp has joined #openstack-infra | 17:34 | |
*** ociuhandu has joined #openstack-infra | 17:36 | |
openstackgerrit | Merged zuul/zuul-jobs master: install-go: bump version to 1.13.5 https://review.opendev.org/700467 | 17:42 |
fungi | okay, nb02 finally became reachable, cleaning it up now | 17:45 |
clarkb | infra-root: thoughts on adding gmann to devstack-gate core? seems like the changes going in now are for life support on old branches | 17:46 |
fungi | i'm in favor | 17:46 |
clarkb | gmann in particular seems to be helping to ensure those changes get in so having him be able to approve would be good I think | 17:46 |
fungi | and he's helping drive the openstack cycle goal to drop legacy jobs from master | 17:47 |
fungi | which should mean less use of d-g overall | 17:47 |
clarkb | gmann: ^ would you be interested in that? | 17:47 |
gmann | clarkb: fungi sure, that will be helpful. thanks | 17:48 |
tosky | oh, right, devstack-gate was originally part of infra and not QA | 17:48 |
tosky | (I guess it is still infra) | 17:49 |
*** iurygregory has quit IRC | 17:54 | |
*** derekh has quit IRC | 18:01 | |
smcginnis | zuul down? | 18:01 |
*** ykarel|away has quit IRC | 18:02 | |
clarkb | smcginnis: no, looks like the apache config for new ssl cert is unhappy :/ | 18:04 |
clarkb | zuul is still running though | 18:04 |
smcginnis | OK, I'm just getting a connection refused trying to access the status page. Good it's only that part. | 18:05 |
clarkb | ya I'm trying to sort it out | 18:05 |
smcginnis | Thanks! | 18:05 |
clarkb | ok I've put the old vhost config back in place | 18:08 |
clarkb | and put the host in the emergency file. This way the webserver is up and running while I sort this out | 18:08 |
smcginnis | Confirmed - at least loads for me now. | 18:08 |
clarkb | oh I know what is wrong ugh | 18:09 |
*** pcaruana has quit IRC | 18:09 | |
clarkb | ok, I made the (bad) assumption that having any content at all in /etc/letsencrypt-certs/zuul.opendev.org/ was a sign that things were happy there | 18:09 |
clarkb | they were not, I could not issue the certificate because zuul01.opendev.org does not have a acme delegation record | 18:10 |
clarkb | and the reason for that is we don't have a zuul01.opendev.org, just a zuul01.openstack.org | 18:11 |
clarkb | fix incoming | 18:11 |
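The failure clarkb describes (no delegation record for zuul01.opendev.org) can be spot-checked from outside. A minimal sketch, assuming the delegation is the conventional `_acme-challenge` CNAME used for DNS-01 validation (the record names here are illustrative):

```shell
# A present delegation answers with the target of the CNAME:
dig +short CNAME _acme-challenge.zuul.opendev.org
# An empty answer for the host's other name would explain the failed
# issuance described above:
dig +short CNAME _acme-challenge.zuul01.opendev.org
```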
*** ociuhandu has quit IRC | 18:12 | |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Don't issue cert for zuul01.opendev.org https://review.opendev.org/702020 | 18:14 |
clarkb | infra-root ^ that cleanup is necessary for zuul.opendev.org le happiness | 18:14 |
*** stevebaker has joined #openstack-infra | 18:15 | |
*** rfolco has joined #openstack-infra | 18:16 | |
*** eernst has quit IRC | 18:18 | |
*** gfidente is now known as gfidente|afk | 18:24 | |
*** jpena is now known as jpena|off | 18:25 | |
fungi | okay, nb02 is cleaned up and nodepool-builder service enabled and started on it, currently building debian-stretch-0000100801 for 6 minutes now | 18:29 |
fungi | per earlier discussion, i'll start the service on nb01 now and maybe it'll fail for a bit until nb02 builds enough replacements that 01 can delete some of its older images | 18:29 |
fungi | nb01 is now building (or at least trying to) gentoo-17-0-systemd-0000131965 | 18:31 |
clarkb | fungi: great. I expect things should start to settle down on the builders after a couple images manage to build and get their old versions cleaned up | 18:32 |
clarkb | ~3 hours away probably | 18:32 |
fungi | yup | 18:33 |
fungi | the /opt partition on nb01 has only 76gb to spare, so i do expect some failures | 18:34 |
*** liuyulong_ has quit IRC | 18:41 | |
*** kjackal has quit IRC | 18:45 | |
*** aedc has joined #openstack-infra | 18:51 | |
*** eharney has quit IRC | 18:54 | |
*** pcaruana has joined #openstack-infra | 18:57 | |
*** ralonsoh has quit IRC | 18:59 | |
openstackgerrit | Andreas Jaeger proposed openstack/project-config master: Remove retired x/js-* repos from gerritbot https://review.opendev.org/702028 | 19:05 |
yoctozepto | AJaeger: by the looks of it js-openstack-lib already spams in openstack-sdks | 19:06 |
yoctozepto | gerritbot config needs no update :-) | 19:06 |
AJaeger | yoctozepto: indeed, was surprised by that - so, that governance change somehow reflects reality ;) | 19:06 |
AJaeger | yoctozepto: 702028 is the change I did - thanks for reminding me of that one | 19:07 |
yoctozepto | AJaeger: well, it is far from infra | 19:07 |
yoctozepto | no problem, apply cleaning procedures before the weekend :-) | 19:07 |
AJaeger | ;) | 19:07 |
* AJaeger would just love to have the house cleaned up as easily :) | 19:08 | |
clarkb | AJaeger: ++ | 19:08 |
yoctozepto | I'm allergic to dust so I clean mine regularly... | 19:09 |
yoctozepto | AJaeger, clarkb: regarding https://review.opendev.org/#/admin/groups/1408,members <- how to propose change - I presume it should happen after governance change anyways :-) | 19:10 |
AJaeger | clarkb: regarding js-openstack-lib, I suggest you +1 as infra PTL the governance change https://review.opendev.org/#/c/701854/ | 19:11 |
clarkb | yoctozepto: mordred is sdk ptl so we'd give him access then he can edit the list as he wants | 19:11 |
AJaeger | yoctozepto: mordred as PTL and infra-core can take care of it | 19:11 |
clarkb | and ya he is already in there | 19:11 |
yoctozepto | clarkb, AJaeger: mhm, that makes sense | 19:12 |
clarkb | AJaeger: done | 19:12 |
AJaeger | thanks | 19:13 |
* AJaeger disappears to cycle for collecting his kids | 19:14 | |
*** kjackal has joined #openstack-infra | 19:18 | |
yoctozepto | AJaeger: healthy you! | 19:20 |
*** factor has quit IRC | 19:27 | |
*** dklyle has quit IRC | 19:28 | |
openstackgerrit | Radosław Piliszek proposed openstack/project-config master: Remove old openstack/js-openstack-lib jobs https://review.opendev.org/702030 | 19:29 |
*** lpetrut has joined #openstack-infra | 19:30 | |
*** dklyle has joined #openstack-infra | 19:37 | |
fungi | okay, nb02 finished the debian-stretch image and is now onto opensuse-15 as of 30 minutes ago | 19:43 |
fungi | nb01 is still building gentoo-17-0-systemd for over an hour, but will hopefully complete soon | 19:43 |
fungi | and it's still got 55gb worth of space left in /opt so maybe it won't fail to write | 19:44 |
fungi | i'm going to go out for a brief walk since we seem to have a spate of pleasant weather, but i'll be back in an hour-ish to check in on it | 19:45 |
openstackgerrit | Merged opendev/system-config master: Don't issue cert for zuul01.opendev.org https://review.opendev.org/702020 | 19:45 |
*** stevebaker has quit IRC | 19:50 | |
*** eharney has joined #openstack-infra | 19:52 | |
openstackgerrit | Merged openstack/devstack-gate master: nova: Renable n-net on stable/rocky|queens|pike|ocata https://review.opendev.org/701957 | 19:52 |
*** Goneri has quit IRC | 19:59 | |
clarkb | I have removed zuul01 from the emergency file and will keep an eye on it | 20:00 |
clarkb | not hearing any opposition I've added gmann to d-g core | 20:02 |
clarkb | I think that will help with the straggler changes that go in there to keep stable branches running | 20:03 |
clarkb | while checking where we are in ansible + puppet loop I discovered that logstash-worker05 was not responding to ssh | 20:05 |
clarkb | this has been the case for days according to logs. I will reboot it via the api | 20:06 |
clarkb | #status log Added gmann to devstack-gate-core to help support fixes necessary for stable branches there. | 20:07 |
openstackstatus | clarkb: finished logging | 20:07 |
clarkb | #status log Rebooted logstash-worker05 via nova api after discovering it has stopped responding to ssh for several days. | 20:07 |
openstackstatus | clarkb: finished logging | 20:07 |
clarkb | syslog doesn't show anything but it appears to have stopped on december 21, 2019 | 20:08 |
clarkb | fungi: looks like nb03 is also in a similar no disk state. I'm going to apply similar cleanup to it now | 20:13 |
clarkb | fungi: also I think part of our problem is we are holding much older copies of images possibly because we're failing to delete them from clouds (probably vexxhost because of the BFV "you can't delete this image because something is using it" problem) | 20:23 |
clarkb | fungi: I'm going through on nb01 and clearing out files in /opt/nodepool_dib that don't correspond to images reported by dib-image-list | 20:23 |
clarkb | as a first pass cleanup | 20:23 |
clarkb | anything that remains is still valid and possibly "stuck" | 20:23 |
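The first-pass rule clarkb applies on nb01 can be sketched as a small helper; the file and build names below are illustrative inputs following the naming pattern seen elsewhere in this log, not a real listing:

```python
def orphaned_files(disk_files, known_builds):
    """Return files under /opt/nodepool_dib whose names do not match
    any build reported by `nodepool dib-image-list`. A hypothetical
    sketch of the manual cleanup described above: inputs are plain
    lists of names, not real nodepool objects."""
    return [f for f in disk_files
            if not any(f.startswith(b) for b in known_builds)]
```

Anything this returns is safe to delete as a first pass; whatever matches a known build is kept and may still be "stuck" pending provider-side deletion.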
*** arif-ali has quit IRC | 20:25 | |
clarkb | fungi: ok nb01's /opt/nodepool_dib contents should reflect what is in nodepool dib-image-list now | 20:39 |
clarkb | we have an excess of bionic, buster, centos-7, and gentoo images which I think is related to not being able to delete them from cloud providers | 20:40 |
clarkb | nb03 /opt/dib_tmp cleanup is very slow | 20:41 |
clarkb | we have issued an LE cert properly for zuul.opendev.org now | 20:42 |
clarkb | just waiting for puppet to run and switch the apache config over | 20:42 |
clarkb | Failed to delete image with name or ID 'ubuntu-bionic-1573653999': 409 Conflict: Image c68d93eb-72ff-42ad-b5c8-63daace0286a could not be deleted because it is in use: The image cannot be deleted because it is in use through the backend store outside of Glance. (HTTP 409) | 20:46 |
clarkb | that confirms that for at least one of the images | 20:46 |
clarkb | and I've tracked one of the centos-7 image leaks to a volume that reports to be attached to a server that no longer exists | 20:51 |
clarkb | I think that means we want to start with a volume cleanup | 20:51 |
clarkb | then let nodepool cleanup images again then see if there is anything left | 20:51 |
clarkb | I expect that to be fairly involved and I want to finish up this zuul cert thing and find lunch first | 20:52 |
fungi | i wonder if nodepool could be adjusted to delete local copies of images it also wants to delete remotely, regardless of whether remote deletion fails | 20:54 |
fungi | if it's actively trying to delete those images from providers, there's probably no need to keep the local copy of them on disk any longer | 20:55 |
clarkb | ++ | 20:55 |
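fungi's proposed builder behavior is not something nodepool does here; as a hypothetical sketch of the rule (the `build` dict shape and the function name are invented for illustration), the idea is to drop the local file as soon as the record enters the deleting state, while keeping the record itself so deletion keeps being retried remotely:

```python
import os

def prune_local_copy(build, image_file):
    """Hypothetical rule from the discussion above: once a build's
    record is in the 'deleting' state the local file is no longer
    needed, even if provider-side deletion keeps failing. The record
    is untouched so the cleanup worker can keep retrying."""
    if build["state"] == "deleting" and os.path.exists(image_file):
        os.remove(image_file)
        return True
    return False
```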
clarkb | fwiw cleaning up nb03's dib_tmp freed like 16GB. I'm now doing the same cleanup to /opt/nodepool_dib there that I did on nb01 to see if we can free more space | 20:55 |
openstackgerrit | Andreas Jaeger proposed openstack/openstack-zuul-jobs master: Remove jobs and templates used by js-openstack-lib https://review.opendev.org/701510 | 20:55 |
fungi | i assume we'd still need to keep a record of the images since that's how it knows to keep trying to delete them? | 20:55 |
clarkb | nb03 poses a slightly different problem. We've got images associated with linaro-cn1 in zk and those will never delete because the cloud is gone. | 20:57 |
*** michael-beaver has joined #openstack-infra | 20:57 | |
clarkb | For there I'll delete them from disk then after lunch I can figure out how to surgery the zk db? | 20:57 |
clarkb | fungi: all of that is stored in zk and is the source of the problem for ^ | 20:58 |
clarkb | zk says that image must be deleted but it will never be deleted at this point because the cloud is gone so we need to edit the zk db | 20:58 |
clarkb | I'll start with simply removing them from disk as that is easy and frees space | 20:58 |
clarkb | oh except the ones for cn1 are not on disk? we've also got images that refuse to delete in london? | 20:59 |
fungi | that sounds like a royal mess | 21:00 |
fungi | i don't suppose zk has a convenient cli you can use to inspect and manipulate records? | 21:00 |
mordred | fungi: zkshell | 21:01 |
mordred | fungi: https://github.com/apache/zookeeper/blob/master/zookeeper-docs/src/main/resources/markdown/zookeeperCLI.md | 21:01 |
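For reference, a minimal session sketch for this kind of inspection; the PyPI package name and the host/path come from later in this log, and the commands assume zk-shell's standard `ls`/`get` syntax:

```shell
# Install and connect (the package is "zk-shell" on PyPI):
pip install zk-shell
zk-shell zk01.opendev.org:2181
# Inside the shell, list and inspect nodepool's image records:
#   ls /nodepool/images
#   ls /nodepool/images/ubuntu-bionic-arm64/builds
#   get /nodepool/images/ubuntu-bionic-arm64/builds/0000001978/providers/linaro-cn1/images/0000000001
```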
clarkb | I'm trying to manually image delete the leaked images in london now | 21:02 |
clarkb | to see if the error is useful | 21:02 |
clarkb | at the very least we should be able to apply fungi's new rule for deleting from disk when db record is set to deleting manually | 21:03 |
fungi | that's a good point | 21:03 |
clarkb | zuul.opendev.org is LE'd now | 21:04 |
clarkb | I'm going to go find lunch while i wait for this image delete to return | 21:04 |
clarkb | fungi: if you want to poke at the vexxhost image leaks via volume leaks I can poke at nb03 | 21:04 |
clarkb | I'm not doing anything with nb01 or nb02 right now so we won't be getting in each other's way | 21:05 |
fungi | cool, will do | 21:05 |
fungi | though i need to get started making dinner soon | 21:05 |
clarkb | fungi: what I noticed is that if you volume list sjc1 you'll get some volumes that say "attached to $name" and others are "attached to $uuid" | 21:05 |
fungi | will see if i can get through them quickly | 21:05 |
clarkb | the $uuid ones seem to not have names because those servers do not exist anymore and we have leaked those volumes | 21:06 |
clarkb | I think if we delete those volumes after confirming the servers do not exist then the images should be able to delete | 21:06 |
*** rfolco has quit IRC | 21:06 | |
clarkb | and then nodepool will automatically remove the files on disk | 21:06 |
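The leak heuristic clarkb describes can be sketched with plain data structures (the dict shapes below are illustrative, mimicking an API volume listing rather than real SDK objects): a volume whose attachment points at a server UUID that no longer exists has leaked.

```python
def find_leaked_volumes(volumes, server_ids):
    """Return ids of volumes attached to servers that no longer exist,
    per the discussion above. 'volumes' is a list of dicts with an
    'attachments' list; 'server_ids' is the set of live server UUIDs."""
    leaked = []
    for vol in volumes:
        for att in vol.get("attachments", []):
            if att["server_id"] not in server_ids:
                leaked.append(vol["id"])
    return leaked
```

Volumes with no attachments at all are left alone here; those may be legitimately detached.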
fungi | and there's a special way mordred worked out to possibly delete them? | 21:06 |
clarkb | fungi: ya you unattach them first | 21:06 |
fungi | (if still attached to nonexistent instance) | 21:06 |
clarkb | I don't know what the specific details for that are but its some api call to do an unattach | 21:07 |
fungi | right, i have a feeling there is no way to do it with osc, will need to use sdk or api | 21:07 |
clarkb | ah | 21:07 |
fungi | you can detach normally *if* the instance still exists | 21:08 |
fungi | if the instance was deleted but cinder still has an attachment record pointing to it, then you need an undocumented api call | 21:08 |
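A sketch of that call, using the os-force_detach volume action from the block-storage API reference; the volume UUID is the one from this exchange, while the endpoint and token variables are placeholders:

```shell
# POST an os-force_detach action directly against the volume.
# $CINDER_ENDPOINT and $OS_TOKEN are placeholders for a scoped
# block-storage endpoint and auth token. Note the later finding in
# this log that cinder's default policy restricts this action to
# admins (HTTP 403 otherwise).
curl -s -X POST \
  "$CINDER_ENDPOINT/volumes/0f91579c-c627-452b-aad4-67cdeae865c3/action" \
  -H "X-Auth-Token: $OS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"os-force_detach": {"attachment_id": null, "connector": null}}'
```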
*** hwoarang has quit IRC | 21:16 | |
*** hwoarang has joined #openstack-infra | 21:16 | |
corvus | clarkb: thanks for z.o.o! | 21:23 |
*** zxiiro has quit IRC | 21:23 | |
fungi | just reconfirmed, if i try `openstack server remove volume eb0cbf8e-16b5-4712-8274-c4989b1bf956 0f91579c-c627-452b-aad4-67cdeae865c3` i get No server with a name or ID of 'eb0cbf8e-16b5-4712-8274-c4989b1bf956' exists. | 21:24 |
smcginnis | fungi, clarkb: I believe mordred was going to look at doing something for that. | 21:24 |
smcginnis | We were talking about it the other day and he confirmed he can call the API needed to clean things up. | 21:25 |
fungi | smcginnis: yep, in the meantime i can probably use the api/sdk | 21:25 |
clarkb | my image delete against linaro-london hasn't returned yet | 21:25 |
clarkb | I think I'll go ahead and apply fungi's rule of deleting from disk when we start the delete process on nb03 | 21:26 |
smcginnis | It's a long way from Portland to London. | 21:26 |
*** rlandy has quit IRC | 21:26 | |
clarkb | this will give us room for normal operations while we sort out why those images aren't deleting | 21:26 |
openstackgerrit | James E. Blair proposed zuul/zuul-jobs master: Fix typo in helm role https://review.opendev.org/702046 | 21:27 |
*** kjackal has quit IRC | 21:27 | |
*** Goneri has joined #openstack-infra | 21:28 | |
clarkb | nb03 is running a builder again after that cleanup | 21:33 |
clarkb | fungi: is doing the zk surgery something you are interested in doing? I don't think that is urgent so fine if you want to give it a go next week | 21:33 |
fungi | i can, sure | 21:33 |
fungi | still looking at forced volume detachment | 21:34 |
clarkb | I've done it a few times in the past so can help, but figured if you hadn't done it before this might be a good time to try :) | 21:34 |
fungi | might be nice if osc grew a --force option to volume delete which did the os-force_detach action from https://docs.openstack.org/api-ref/block-storage/v3/#force-detach-a-volume | 21:34 |
fungi | oh, whaddya know! `openstack volume delete --force <uuid>` is a thing! | 21:36 |
fungi | --force Attempt forced removal of volume(s), regardless of state | 21:36 |
fungi | unfortunately, in vexxhost: | 21:36 |
fungi | "Policy doesn't allow volume_extension:volume_admin_actions:force_delete to be performed. (HTTP 403)" | 21:36 |
fungi | mnaser: do you happen to know if there's a (maybe safety-related) reason for that ^ ? | 21:37 |
fungi | i guess that's considered a protected admin-only function? | 21:38 |
mnaser | i believe that cinder by default has that as an admin-only policy thing | 21:38 |
mnaser | (we don't have custom policy fwiw) | 21:38 |
fungi | maybe force deleting volumes associated with an existing instance could crash hypervisors or something | 21:38 |
fungi | makes sense | 21:38 |
mnaser | https://github.com/openstack/cinder/blob/master/cinder/policies/volume_actions.py#L106 | 21:39 |
mnaser | yeah the cinder default is admin API | 21:39 |
mnaser | so i'll delegate that answer to them :-p | 21:39 |
fungi | if pabelanger still hung out in here i'd ask him for details on the environment where he was successfully using the os-force_detach action | 21:39 |
mnaser | fungi: i think he might have deleted the attachment in cinder first | 21:40 |
fungi | yeah, i was hoping that's what `openstack volume delete --force` was doing under the hood | 21:40 |
clarkb | fungi: and openstack server remove volume fails because the server does not exist anymore? | 21:45 |
clarkb | that is a command btw `openstack server remove volume` | 21:45 |
fungi | clarkb: yep, that's what i was trying first | 21:45 |
clarkb | still waiting to hear back on this image delete against linaro-london :/ | 21:45 |
clarkb | hrm | 21:45 |
fungi | the instance you specify must exist, because i guess it's asking nova to process the detachment and nova says "i have no idea what server that is" | 21:46 |
fungi | the actual error is "No server with a name or ID of '<uudi>' exists." | 21:47 |
fungi | s/uudi/uuid/ | 21:47 |
clarkb | fungi: well if you want to do the zk stuff instead I can try to pick this up if pabelanger responds in #zuul | 21:48 |
*** lbragstad has quit IRC | 21:54 | |
*** lbragstad has joined #openstack-infra | 21:54 | |
openstackgerrit | Merged zuul/zuul-jobs master: collect-container-logs: add role https://review.opendev.org/701867 | 21:56 |
*** ahosam has joined #openstack-infra | 21:57 | |
openstackgerrit | Clark Boylan proposed opendev/zone-opendev.org master: Manage insecure-ci-registry ssl with LE https://review.opendev.org/702050 | 22:14 |
openstackgerrit | Clark Boylan proposed opendev/system-config master: Manage insecure-ci-registry cert with LE https://review.opendev.org/702051 | 22:14 |
clarkb | infra-root ^ the dns change should be able to go in whenever it's reviewed and ready. But I'm thinking best to hold off on the switch until next week while we juggle this nodepool cleaning | 22:15 |
mordred | clarkb, fungi: moving here | 22:16 |
mordred | clarkb: what did you mean by volume size? | 22:16 |
clarkb | mordred: our mirror volume would be bigger than 80GB so you can check that attribute as another sanity check | 22:16 |
mordred | nod | 22:16 |
clarkb | 80GB is our standard size for our nodepool instances | 22:16 |
mordred | yes - that whole list is 80 | 22:17 |
mordred | clarkb, fungi: ready for me to try running it for real? | 22:19 |
clarkb | ya so I think worst case you might kill a running job or delete a held node | 22:19 |
clarkb | mordred: does your script handle the case where volume doesn't have a server because the server hasn't booted yet on initial create? | 22:20 |
clarkb | mordred: that would be the only other case I'd worry about | 22:20 |
clarkb | (I think checking that volume age > 1 hour would be sufficient to guard against that) | 22:20 |
mordred | uh. I have no idea what the race conditions there would be ... good call ... one sec | 22:20 |
openstackgerrit | Merged zuul/zuul master: Make files matcher match changes with no files https://review.opendev.org/678273 | 22:23 |
*** lastmikoi has quit IRC | 22:28 | |
*** arif-ali has joined #openstack-infra | 22:31 | |
openstackgerrit | James E. Blair proposed zuul/zuul-helm master: Add option to manage secrets outside of helm https://review.opendev.org/702052 | 22:32 |
mordred | clarkb: would you look at clean-volumes.py and tell me if my time delta code looks right? | 22:34 |
clarkb | is that on bridge? | 22:35 |
mordred | yeah | 22:35 |
mordred | clarkb: I did it by hand - but datetimes are so horrible in python I'd like a second set of eyes | 22:35 |
mordred | mnaser: are created_at values from volumes in vexxhost going to come back in UTC? | 22:36 |
clarkb | mordred: ya agreed on the horribleness | 22:37 |
mordred | clarkb: also - it's safe to run that script in its current form via the command in its first line | 22:37 |
mordred | if you want to run it and look at the output | 22:37 |
clarkb | mordred: what's with the truncation of created_at? | 22:38 |
clarkb | otherwise it looks right to me. Might also want to print the server uuid for volumes being deleted as that will be a breadcrumb for debugging if it doesn't do what we want | 22:38 |
mordred | clarkb: it has microseconds ... oh - you know - that was from when I was trying to parse with dateutil which doesn't grok those | 22:39 |
clarkb | mordred: ah ya that should be fine | 22:39 |
mordred | clarkb: ok. so - game for me to run that for real? | 22:39 |
clarkb | mordred: I think so. And maybe add in the server uuid logging if you want | 22:39 |
fungi | everything should just return utc epoch seconds, and then whatever you need is basic arithmetic | 22:41 |
mordred | ok. I'm going to run it and then I'll paste the output | 22:41 |
fungi | i mean, ideally we'd return planck units since the big bang within our relativistic frame of reference, but that's probably overengineering until we crack near-light-speed travel | 22:43 |
mordred | :) | 22:43 |
*** KeithMnemonic has quit IRC | 22:46 | |
mordred | http://paste.openstack.org/show/788262/ | 22:47 |
mordred | clarkb, fungi: ^^ | 22:47 |
mordred | (the output is a bit verbose - I should just print volume_id on the delete line I think) | 22:47 |
clarkb | we deleted ~23 volumes? | 22:48 |
mordred | next time someone runs it - it should be slightly less chatty | 22:48 |
mordred | yeah - I think so | 22:48 |
mordred | there's still likely one left around where I manually deleted the attachment from fungi's example earlier | 22:48 |
mordred | b/c I did not delete the volume itself | 22:48 |
*** dave-mccowan has quit IRC | 22:48 | |
openstackgerrit | James E. Blair proposed zuul/zuul-helm master: Add option to manage secrets outside of helm https://review.opendev.org/702052 | 22:49 |
mnaser | mordred: uh, i think so. | 22:49 |
clarkb | the image I tried deleting before is no longer there | 22:49 |
fungi | yeah, looks good | 22:50 |
openstackgerrit | James E. Blair proposed zuul/zuul-helm master: Change builder container name https://review.opendev.org/701793 | 22:50 |
openstackgerrit | James E. Blair proposed zuul/zuul-helm master: Add empty clouds value https://review.opendev.org/701865 | 22:50 |
openstackgerrit | James E. Blair proposed zuul/zuul-helm master: Add option to manage secrets outside of helm https://review.opendev.org/702052 | 22:50 |
clarkb | mordred: fungi there is one image left in vexxhost to delete | 22:50 |
clarkb | probably associated to that volume monty did not delete | 22:50 |
* clarkb looks to find it | 22:50 | |
fungi | it'll be 0f91579c-c627-452b-aad4-67cdeae865c3 | 22:51 |
clarkb | fungi: mordred 0f91579c-c627-452b-aad4-67cdeae865c3 I think it is that one | 22:51 |
clarkb | yup | 22:51 |
clarkb | should I go ahead and delete it | 22:51 |
fungi | go for it as far as i'm concerned | 22:51 |
clarkb | done | 22:53 |
clarkb | cool now only linaro images stuck in deleting | 22:54 |
clarkb | fungi: are you at the end of your week or do you want to do that now? | 22:54 |
fungi | i can take a look after dinner | 22:55 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Add quick script for cleaning boot from volume leaks https://review.opendev.org/702053 | 22:55 |
mordred | mnaser: ^^ there's a script that non-admin users can use to cleanup leaked BFV volumes on vexxhost | 22:55 |
mordred | mnaser: I'm going to generalize it a bit and put it into sdk / osc ... but for now, that seems to safely work | 22:56 |
clarkb | fungi: ok. basically what we want to do is `nodepool image-list | grep linaro-cn1` then for each of those records delete the zk nodes that correspond to them | 22:56 |
fungi | via zkshell | 22:56 |
mordred | mnaser: I mean - it should work on any modern openstack - it's just hardcoded for us to point at vexxhost since that's the only nodepool place we BFV | 22:56 |
clarkb | fungi: yup which you install via pip into a venv (I've got an install on zk01.o.o in my homedir) | 22:57 |
mnaser | mordred: neat | 22:58 |
clarkb | fungi: /nodepool/images/ubuntu-bionic-arm64/builds/0000001978/providers/linaro-cn1/images the content at that path up to linaro-cn1 should be deleted I think | 23:06 |
clarkb | It should be sufficient to simply delete the content below images/ though | 23:06 |
clarkb | I found a case for the other arm64 cloud that was removed and it still was listed under providers but had no image under providers/name/images/ | 23:06 |
fungi | is zkshell known by another name? pypi doesn't know it | 23:12 |
clarkb | fungi: zk-shell | 23:13 |
fungi | ahh | 23:13 |
fungi | yup, that's working | 23:13 |
clarkb | ya I think you only need to remove 0000000001 from /nodepool/images/ubuntu-bionic-arm64/builds/0000001978/providers/linaro-cn1/images | 23:14 |
clarkb | then do that for the other 6 linaro-cn1 images | 23:14 |
fungi | what about nodepool/images/ubuntu-bionic-arm64/builds/0000001978/providers/linaro-cn1/images/lock | 23:19 |
fungi | leave that there? | 23:19 |
clarkb | ya that node is still there for the nrt arm cloud | 23:20 |
clarkb | I think we can actually delete everything linaro-cn1 and below | 23:20 |
fungi | as in `rm nodepool/images/ubuntu-bionic-arm64/builds/0000001978/providers/linaro-cn1` | 23:21 |
clarkb | ya | 23:21 |
clarkb | I think you have to rm things below it first, there is no -r for this | 23:22 |
fungi | indeed: /nodepool/images/ubuntu-bionic-arm64/builds/0000001978/providers/linaro-cn1 is not empty. | 23:25 |
*** rfolco has joined #openstack-infra | 23:25 | |
fungi | okay, manually recursed | 23:25 |
fungi | i'll work through the others | 23:25 |
clarkb | and now image-list doesn't show that image anymore | 23:26 |
clarkb | really the key thing here is to avoid operating on zk when nodepool may be operating on those nodes, but we know nodepool won't do that because this cloud doesn't exist anymore | 23:26 |
*** mattw4 has quit IRC | 23:27 | |
*** lastmikoi has joined #openstack-infra | 23:27 | |
fungi | how did you identify the /nodepool/images/ubuntu-bionic-arm64/builds/0000001978 node? is that the build id reported by nodepool image-list? or the upload id, or something else entirely? | 23:28 |
clarkb | 1978 is the build id for that image name | 23:28 |
fungi | i'm looking at build id 0000012627 upload id 0000010921 for debian-stretch-arm64 in linaro-cn1 | 23:28 |
clarkb | then the 00000...1 you removed under provider is the upload id for the provider | 23:29 |
fungi | oh... i need to make the image name in the path match too | 23:29 |
clarkb | nodepool/images/debian-stretch-arm64/builds/0000012627/providers/linaro-cn1 | 23:29 |
* fungi smacks forehead | 23:29 | |
clarkb | yup | 23:29 |
clarkb | in old nodepool the build id was unique globally | 23:30 |
clarkb | but now it is per image name | 23:30 |
fungi | okay, it's working as expected | 23:30 |
fungi | one more down | 23:31 |
fungi | will work my way through the rest | 23:31 |
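The manual recursion fungi works through can be sketched as a zk-shell session; the paths and ids are the actual ones from this exchange, and leaves go first because, as seen above, a plain `rm` refuses non-empty nodes:

```shell
# Inside zk-shell, delete one leaked upload record bottom-up
# (the "lock" child may or may not exist, per the earlier question):
rm /nodepool/images/debian-stretch-arm64/builds/0000012627/providers/linaro-cn1/images/0000010921
rm /nodepool/images/debian-stretch-arm64/builds/0000012627/providers/linaro-cn1/images/lock
rm /nodepool/images/debian-stretch-arm64/builds/0000012627/providers/linaro-cn1/images
rm /nodepool/images/debian-stretch-arm64/builds/0000012627/providers/linaro-cn1
```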
openstackgerrit | James E. Blair proposed zuul/zuul-helm master: Add Zuul charts https://review.opendev.org/700460 | 23:31 |
fungi | okay, linaro-cn1 entries are entirely gone from image-list | 23:34 |
*** pcaruana has quit IRC | 23:35 | |
clarkb | confirmed | 23:37 |
clarkb | that leaves us with figuring out linaro-london image situation | 23:37 |
clarkb | my image delete is still sitting there | 23:37 |
*** rfolco has quit IRC | 23:40 | |
fungi | i still need to wash dishes, but can probably help once i'm done | 23:43 |
clarkb | I'm going to try manually deleting the other images that have leaked there and see if any act different than the random one I selected first | 23:44 |
clarkb | I can talk to the api because image show on that image name works | 23:46 |
clarkb | unless there is layer 7 firewalling we shouldn't be getting lost that way | 23:46 |
clarkb | adding --debug to the osc command shows it getting all the way to the delete request on the image uuid | 23:47 |
clarkb | so also not getting lost somewhere in between due to name lookups | 23:48 |
clarkb | that makes me think it is likely a cloud problem | 23:49 |
clarkb | kevinz: http://paste.openstack.org/show/788263/ is a list of images that nodepool has been trying to delete in the linaro london cloud. Manually attempting to delete them shows the commands getting as far as the DELETE http request but they seem to hang there | 23:52 |
clarkb | kevinz: nodepool not being able to clean up these images has meant it kept them around on disk which ended up filling the disk on our builder node. | 23:52 |
clarkb | kevinz: hrw: maybe I'm thinking this must be something on the cloud side as I am able to show those images just fine (implying api access is otherwise working) | 23:52 |
clarkb | is that something you can look into when you get a chance? | 23:53 |
fungi | in good news, /opt utilization on nb01 is falling rapidly | 23:55 |
clarkb | kevinz: hrw I realize it is likely your weekend now so no rush. We can pick this up next week | 23:56 |
*** michael-beaver has quit IRC | 23:57 | |
clarkb | fungi: ya I'm not sure there is much else we can do re deleting these images | 23:58 |
openstackgerrit | James E. Blair proposed zuul/zuul-helm master: Allow tenant config file to be managed externally https://review.opendev.org/702057 | 23:58 |
clarkb | lets see if kevinz can help on monday | 23:58 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!