Friday, 2020-01-10

*** stevebaker has joined #openstack-infra00:01
*** mattw4 has quit IRC00:04
*** dchen has joined #openstack-infra00:09
openstackgerritMohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s  https://review.opendev.org/70176400:20
openstackgerritMohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s  https://review.opendev.org/70176400:27
*** hrw has joined #openstack-infra00:33
openstackgerritMohammed Naser proposed zuul/zuul-jobs master: collect-container-logs: add role  https://review.opendev.org/70186700:34
openstackgerritMohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s  https://review.opendev.org/70176400:36
*** zhurong has quit IRC00:36
openstackgerritMohammed Naser proposed zuul/zuul-jobs master: collect-container-logs: add role  https://review.opendev.org/70186700:38
*** tetsuro has joined #openstack-infra00:39
openstackgerritMohammed Naser proposed zuul/zuul-registry master: Switch to collect-container-logs  https://review.opendev.org/70186800:39
openstackgerritMohammed Naser proposed zuul/nodepool master: Switch to collect-container-logs  https://review.opendev.org/70186900:42
openstackgerritMohammed Naser proposed zuul/zuul-registry master: Switch to collect-container-logs  https://review.opendev.org/70186800:42
openstackgerritMohammed Naser proposed opendev/system-config master: Switch to collect-container-logs  https://review.opendev.org/70187000:47
openstackgerritMohammed Naser proposed zuul/zuul-jobs master: collect-container-logs: add role  https://review.opendev.org/70186700:52
openstackgerritMohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s  https://review.opendev.org/70176400:52
*** eandersson has joined #openstack-infra00:54
eanderssonstackalytics.com cert expired?00:54
fungieandersson: so we've heard00:56
fungiwe don't run it, and no clue who to reach out to at mirantis00:56
fungi(we offered to run it more than once in the past)00:57
eanderssonHopefully someone that cares enough to fix it :p00:57
mnaserinfra-root: i think one of the executors might have issues with log streaming, as i'm seeing occasional "--- END OF STREAM ---" on jobs that are clearly running and eventually report a result00:58
mnaserexample: http://zuul.opendev.org/t/zuul/stream/bf5d120011d448c8baedcce26d0b31d0?logfile=console.log00:59
mnaseraccording to the api, ze05 is the one running that job00:59
fungimnaser: it happens frequently that we go over memory on them and the oom-killer decides the output streamer would be a good thing to arbitrarily kill01:04
fungii'll take a look01:04
openstackgerritMohammed Naser proposed zuul/zuul-jobs master: helm-template: Add role to run 'helm template'  https://review.opendev.org/70187101:06
fungithe following executors need restarts to get their output streamers going again: 02,03,04,05,1201:07
fungiso almost half01:07
openstackgerritMohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s  https://review.opendev.org/70176401:07
fungii'll poke at them for a bit01:07
fungithings look pretty quiet, so i can just restart them all at the same time and let the other 7 handle the load in the interim01:09
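
A minimal sketch of the check-and-restart sequence described above; the grep pattern and the service invocation are assumptions from context, not the exact commands used:

    # look for evidence that the oom-killer took out the executor's log streamer
    dmesg -T | grep -i 'out of memory'
    # restart the executor service so its log streamer comes back
    sudo service zuul-executor restart
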
*** zbr|rover has quit IRC01:11
*** HenryG has quit IRC01:11
*** HenryG has joined #openstack-infra01:12
openstackgerritMohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s  https://review.opendev.org/70176401:14
*** zhurong has joined #openstack-infra01:15
*** roman_g has quit IRC01:15
*** zbr has joined #openstack-infra01:19
*** zbr has quit IRC01:24
*** zbr has joined #openstack-infra01:25
openstackgerritMohammed Naser proposed zuul/zuul-jobs master: apply-helm-charts: Job to apply Helm charts  https://review.opendev.org/70187401:29
clarkbwe have LE certs on zuul.o.o now01:31
clarkbI'll merge the change to start using those certs first thing tomorrow morning01:31
openstackgerritMohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s  https://review.opendev.org/70176401:31
openstackgerritMohammed Naser proposed zuul/zuul-jobs master: helm-template: Add role to run 'helm template'  https://review.opendev.org/70187101:40
openstackgerritMohammed Naser proposed zuul/zuul-jobs master: apply-helm-charts: Job to apply Helm charts  https://review.opendev.org/70187401:40
openstackgerritMohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s  https://review.opendev.org/70176401:40
*** Lucas_Gray has quit IRC01:43
openstackgerritMohammed Naser proposed zuul/zuul-helm master: Test helm charts against k8s  https://review.opendev.org/70176401:47
*** ricolin_ has joined #openstack-infra01:50
*** zbr_ has joined #openstack-infra02:15
*** zbr has quit IRC02:16
*** zbr_ has quit IRC02:17
*** gyee has quit IRC02:23
*** rh-jelabarre has joined #openstack-infra02:26
*** zbr has joined #openstack-infra02:28
*** zxiiro has quit IRC02:35
fungi#status log restarted zuul-executor service on ze02,03,04,05,12 to get log streamers running again after oom-killer got them; had to clear stale pidfile on zm0402:41
openstackstatusfungi: finished logging02:41
*** rh-jelabarre has quit IRC02:44
*** rlandy has quit IRC02:55
mnaserthank you for taking care of it fungi02:56
fungino problem02:59
*** apetrich has quit IRC03:12
*** ricolin_ has quit IRC03:18
*** ricolin has joined #openstack-infra03:19
*** psachin has joined #openstack-infra03:22
*** armax has quit IRC03:41
*** ykarel|away has joined #openstack-infra04:22
*** hwoarang has quit IRC04:24
*** hwoarang has joined #openstack-infra04:26
*** stevebaker has quit IRC04:41
*** tetsuro has quit IRC04:44
*** tetsuro has joined #openstack-infra04:45
*** tetsuro has quit IRC04:49
hrwfungi: thanks!04:54
*** surpatil has joined #openstack-infra05:00
*** factor has quit IRC05:08
*** factor has joined #openstack-infra05:08
*** ykarel has joined #openstack-infra05:18
*** ykarel|away has quit IRC05:20
*** tkajinam has quit IRC05:26
*** tkajinam has joined #openstack-infra05:29
*** goldyfruit has quit IRC05:30
*** goldyfruit has joined #openstack-infra05:30
*** evrardjp has quit IRC05:33
*** evrardjp has joined #openstack-infra05:34
*** ykarel_ has joined #openstack-infra05:35
*** kjackal has joined #openstack-infra05:35
*** bdodd has joined #openstack-infra05:37
*** ykarel has quit IRC05:38
*** exsdev has quit IRC05:44
*** tetsuro has joined #openstack-infra05:45
*** exsdev has joined #openstack-infra05:46
*** exsdev has quit IRC05:48
*** tetsuro has quit IRC05:49
*** tetsuro has joined #openstack-infra05:53
*** tetsuro has quit IRC05:57
*** tetsuro_ has joined #openstack-infra05:57
*** tkajinam_ has joined #openstack-infra06:02
*** exsdev has joined #openstack-infra06:02
*** tkajinam has quit IRC06:04
*** lpetrut has joined #openstack-infra06:08
*** lpetrut has quit IRC06:09
*** lpetrut has joined #openstack-infra06:10
*** kjackal has quit IRC06:11
*** lmiccini has joined #openstack-infra06:42
*** ykarel_ is now known as ykarel07:03
*** exsdev has quit IRC07:09
*** rcernin has quit IRC07:15
*** slaweq has joined #openstack-infra07:18
*** exsdev has joined #openstack-infra07:23
*** pgaxatte has joined #openstack-infra07:24
*** dpawlik has joined #openstack-infra07:41
*** iurygregory has joined #openstack-infra07:42
*** rpittau|afk is now known as rpittau07:44
*** kjackal has joined #openstack-infra07:45
*** pcaruana has joined #openstack-infra07:53
*** kjackal has quit IRC07:57
*** ykarel is now known as ykarel|lunch07:57
hrwfungi: kolla-build-ubuntu-source-aarch64 SUCCESS in 1h 48m 13s (non-voting)07:58
hrwfungi: thanks again07:58
*** jtomasek has joined #openstack-infra08:09
*** tetsuro_ has quit IRC08:10
*** gfidente|afk is now known as gfidente08:11
*** tosky has joined #openstack-infra08:12
*** dchen has quit IRC08:16
*** iurygregory has quit IRC08:17
*** tesseract has joined #openstack-infra08:22
*** tkajinam_ has quit IRC08:22
*** fdegir has quit IRC08:22
*** fdegir has joined #openstack-infra08:23
*** pkopec has joined #openstack-infra08:23
*** pkopec has quit IRC08:23
*** kjackal has joined #openstack-infra08:29
*** iurygregory has joined #openstack-infra08:33
*** dpawlik has quit IRC08:39
*** dpawlik has joined #openstack-infra08:45
*** pcaruana has quit IRC08:46
*** harlowja has quit IRC08:48
*** xek__ has joined #openstack-infra08:49
*** factor has quit IRC08:50
*** factor has joined #openstack-infra08:51
*** harlowja has joined #openstack-infra08:51
*** pcaruana has joined #openstack-infra08:53
*** jpena|off is now known as jpena08:54
*** ralonsoh has joined #openstack-infra08:56
*** ykarel|lunch is now known as ykarel09:04
*** gibi has left #openstack-infra09:06
*** zbr is now known as zbr|rover09:19
*** lucasagomes has joined #openstack-infra09:22
*** apetrich has joined #openstack-infra09:28
*** derekh has joined #openstack-infra09:35
*** ociuhandu has joined #openstack-infra09:48
*** apetrich has quit IRC10:04
*** dtantsur|afk is now known as dtantsur10:05
*** ykarel is now known as ykarel|afk10:11
*** apetrich has joined #openstack-infra10:13
*** ykarel|afk is now known as ykarel10:35
openstackgerritMatthieu Huin proposed zuul/zuul master: [WIP] Docker compose example: add keycloak authentication  https://review.opendev.org/66481310:41
*** hrw has left #openstack-infra10:52
*** aedc has joined #openstack-infra10:58
*** aedc has quit IRC11:03
*** rpittau is now known as rpittau|bbl11:21
*** Lucas_Gray has joined #openstack-infra11:48
*** sshnaidm is now known as sshnaidm|off11:52
*** Lucas_Gray has quit IRC11:53
*** jpena is now known as jpena|lunch12:01
*** ykarel is now known as ykarel|afk12:20
*** pcaruana has quit IRC12:34
*** surpatil has quit IRC12:34
*** ykarel|afk is now known as ykarel12:36
*** pcaruana has joined #openstack-infra12:38
*** rpittau|bbl is now known as rpittau12:55
*** ociuhandu has quit IRC12:55
*** ociuhandu has joined #openstack-infra12:56
*** jpena|lunch is now known as jpena12:57
*** ociuhandu has quit IRC12:58
*** ociuhandu has joined #openstack-infra12:58
*** ykarel is now known as ykarel|afk13:03
*** goldyfruit has quit IRC13:05
*** rh-jelabarre has joined #openstack-infra13:05
*** goldyfruit has joined #openstack-infra13:06
*** psachin has quit IRC13:09
*** ykarel|afk is now known as ykarel|away13:09
openstackgerritLee Yarwood proposed openstack/devstack-gate master: nova: Renable n-net on stable/queens|pike|ocata  https://review.opendev.org/70195713:12
*** trident has quit IRC13:13
*** trident has joined #openstack-infra13:15
*** ociuhandu has quit IRC13:22
*** ociuhandu_ has joined #openstack-infra13:22
*** rlandy has joined #openstack-infra13:22
*** aedc has joined #openstack-infra13:35
*** Goneri has joined #openstack-infra13:39
openstackgerritLee Yarwood proposed openstack/devstack-gate master: nova: Renable n-net on stable/rocky|queens|pike|ocata  https://review.opendev.org/70195713:47
*** aedc has quit IRC13:51
*** gfidente has quit IRC13:59
*** gfidente has joined #openstack-infra14:05
openstackgerritSimon Westphahl proposed zuul/nodepool master: Always identify static nodes by node tuple  https://review.opendev.org/70196914:06
openstackgerritSimon Westphahl proposed zuul/nodepool master: Always identify static nodes by node tuple  https://review.opendev.org/70196914:10
openstackgerritMatthieu Huin proposed zuul/zuul master: JWT drivers: Deprecate RS256withJWKS, introduce OpenIDConnect  https://review.opendev.org/70197214:20
*** liuyulong has joined #openstack-infra14:25
*** dtantsur is now known as dtantsur|brb14:27
*** ociuhandu has joined #openstack-infra14:46
*** ociuhandu_ has quit IRC14:46
openstackgerritDavid Shrewsbury proposed zuul/zuul master: Extract project config YAML into ref docs  https://review.opendev.org/70197714:47
*** eernst has joined #openstack-infra14:49
*** ykarel|away is now known as ykarel14:53
*** eernst has quit IRC14:56
*** lmiccini has quit IRC14:57
*** dave-mccowan has joined #openstack-infra15:06
fungiamotoki: tosky: AJaeger: looking at those stable tox failures, the log shows /usr/local/bin/tox is being run directly (not under an explicit interpreter) so it must be getting installed with python3. the job log also indicates the ensure-tox role found an existing tox executable so that suggests it's preinstalled in our images (i couldn't find any record of tox getting installed within the job).15:08
fungiunfortunately our nodepool image build logs don't seem to be verbose enough to include confirmation that tox is being installed or how, so i'll need to dig into nodepool element sources15:08
AJaegerfungi, thanks for digging into this15:09
fricklerfungi: where is this failing? I seem to remember that we had (and fixed) some similar issue a couple of weeks ago15:15
fungiinfra-root: nb01 has run out of tempspace and is no longer able to build images. i'll attempt to remedy15:15
*** jtomasek has quit IRC15:16
*** armax has joined #openstack-infra15:16
*** eharney has joined #openstack-infra15:17
fungifrickler: stable branch tox jobs for horizon at least. this example was a pep8 job for stable/pike: https://zuul.opendev.org/t/openstack/build/daaeaedb0a184e29a03eeaae59157c78/15:18
fungiit looks like probably a few weeks ago (mid-december) our ubuntu-xenial images started installing tox under python315:18
frickleryes they did and iirc we said the fix was to set basepython=2.7 for those jobs that need that15:20
*** zul has joined #openstack-infra15:20
fungithe solution we suggested for ubuntu-bionic is still applicable in my opinion (stable/pike of horizon can't run `tox -e pep8` on python3 but its tox.ini doesn't actually indicate that)15:20
fungifrickler: the ml thread from november was about the default python for tox changing on our ubuntu-bionic images15:21
fungisome weeks later, something happened to make it the case on ubuntu-xenial images as well15:21
fungihttp://lists.openstack.org/pipermail/openstack-discuss/2019-November/010957.html15:22
openstackgerritMerged openstack/project-config master: Remove duplicated ACL files  https://review.opendev.org/70091315:22
*** ociuhandu has quit IRC15:23
amotokiit seems tox uses the interpreter it is installed under as the default python when basepython is not specified.15:26
amotokiin case of horizon, we have landed a workaround like https://review.opendev.org/#/c/701848/15:27
amotokiit turns out it affects all jobs without basepython.... horizon is okay now but I am afraid that at least horizon plugins are affected.15:28
fungiamotoki: yes, the solution in ianw_pto's ml post from november 20 is still probably a good idea just so that the tox.ini is appropriately explicit about what major version of python it needs15:31
fungiotherwise a developer who has installed tox with python3 on their machine will encounter the same problems when trying to run tests locally15:32
fricklerfungi: amotoki: and that argument holds regardless of how we changed the default for xenial, so I'm not sure how much value there is in trying to dig into that15:32
fungii'm still curious to know how it ended up changing for xenial images, but yes the answer likely doesn't change the recommendation15:33
openstackgerritThierry Carrez proposed openstack/project-config master: Define check-release-approval executor job  https://review.opendev.org/70198215:33
amotokiyeah, I agree that we suggest to have the interpreter explicitly in tox.ini,15:33
amotokion the other hand, I am confused that it happens on xenial. we would like to avoid a workaround for older stable branches.15:34
fungiwell, it's not a workaround, it's fixing a latent bug which simply hadn't surfaced in our ci jobs15:35
fungibut it's a bug which could easily bite developers running tox locally, as i mentioned15:35
amotokiexactly15:36
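
Two hedged ways to confirm which interpreter a preinstalled tox defaults to; the fix discussed above is declaring `basepython = python2.7` in the tox.ini of environments that still require python2:

    # the entry point's shebang shows which python tox runs under
    head -n1 "$(command -v tox)"
    # tox --version also reports the interpreter path it was imported from
    tox --version
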
openstackgerritMerged zuul/zuul-jobs master: Make pre-molecule tox playbook platform agnostic  https://review.opendev.org/70045215:38
*** ociuhandu has joined #openstack-infra15:44
*** jpena is now known as jpena|brb15:45
fungii wonder if https://review.opendev.org/697211 (merged to dib on december 12, released in 2.32.0 the next day, probably started influencing our image builds the day after that) is what changed15:46
AJaegerfungi: if it worked on the 18th, then it's still 5 days difference, isn't it?15:47
AJaegerstill, we might not have built images for 5 days...15:47
fungiyeah, not sure. you're right the timing doesn't match up though15:48
fungiour image build logs don't go back that far15:48
fricklerfungi: builds on nb02 also seem to be failing, does it need to be cleaned up, too? /me needs to leave now15:51
fungifrickler: quite possibly, i'll take a look after i finish with nb0115:52
*** rfolco has quit IRC15:52
openstackgerritThierry Carrez proposed openstack/project-config master: Define check-release-approval executor job  https://review.opendev.org/70198215:55
fungifrickler: which image build failures were you seeing on nb02? its dib scratchspace is only 56% used right now15:57
*** ykarel is now known as ykarel|away16:01
*** eernst has joined #openstack-infra16:04
*** roman_g has joined #openstack-infra16:07
*** dtantsur|brb is now known as dtantsur16:08
clarkbinfra-root https://review.opendev.org/#/c/701821/ should be the last step in LE'ing zuul.opendev.org (the certs are in place now we have to consume them in apache)16:14
clarkbmordred: corvus frickler ^ I'm able to watch that today if you approve it16:14
clarkbalso I've confirmed that zuul-ci.org no longer has a "your cert will expire soon" warning16:14
mordredclarkb: +A16:15
clarkbtyty16:15
*** jpena|brb is now known as jpena16:22
*** mattw4 has joined #openstack-infra16:23
*** rlandy is now known as rlandy|brb16:29
openstackgerritMerged opendev/system-config master: Use zuul.opendev.org LE cert  https://review.opendev.org/70182116:34
fungiwhat's the safest way to clean up a full /opt on a nodepool builder? on nb01 i see we have 0.9tb in /opt/nodepool_dib and 45gb in /opt/dib_cache16:35
fungii've stopped the nodepool-builder service on the server16:36
fungilooks like there are a ton of kernel threads for loop and bioset handling, suggesting leaked devices in chroots?16:37
clarkbfungi: usually I disable the service then reboot it to clear stale mounts16:37
fungido those need to be cleared out somehow too?16:37
clarkbyes16:37
clarkbthen /opt/dib_tmp is typically what you clean up16:37
clarkbhaving .9tb in nodepool_dib implies that the space is consumed by actual image builds16:38
clarkbwhich may imply that nb02 isn't sharing the load16:38
fungiyeah, /opt/dib_tmp is only 7mb16:38
clarkbya nb02 is out to lunch too16:38
fungiso clearing that out won't do much16:38
clarkbit's got a ton of stale build processes from 202016:39
clarkber 201916:39
clarkbI think the fix in this case is to have nb02 come back and take some of the image load off of nb0116:39
clarkbthen clean up nb01 if necessary16:39
fungiokay, but same cleanup process on both?16:39
clarkbya16:39
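
The cleanup sequence clarkb outlines, as a sketch; the service name and paths come from the conversation, and only the contents of /opt/dib_tmp are removed (per the later note that nodepool does not recreate the directory):

    sudo systemctl disable nodepool-builder   # keep it from starting on boot
    sudo reboot                               # clears the stale loop/bioset mounts
    # once the host is back:
    sudo rm -rf /opt/dib_tmp/*                # contents only, keep the directory
    sudo systemctl enable nodepool-builder
    sudo systemctl start nodepool-builder
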
*** lucasagomes has quit IRC16:40
clarkb(this was why I was cleaning up old images a while back, to reduce the total number of images we had so that a single builder had a chance at building them. I think we cleaned up all the images we could clean up at the time though)16:40
fungi/opt/dib_tmp on nb02 is definitely larger, waiting on du to tell me how much16:41
openstackgerritMatthieu Huin proposed zuul/zuul master: web capabilities: remove unused job_history attribute  https://review.opendev.org/70200116:41
clarkbwe cleared out a couple fedora images and opensuse images iirc16:41
clarkbI wonder if maybe the oldest debian can go too?16:41
fungilikely, but we'd want to codesearch to see if it's in use before we pull it16:42
clarkbyup that is what we did with the other images. Pushed up changes to remove jobs that use them if just old or update them to use newer options. Then remove the nodeset. Then remove the images16:43
clarkbnot a quick process, but this was a big part of the motivation for it.16:44
fungii wonder if it's time for another amd64 builder so losing one doesn't cause the other to fill up16:44
clarkbor add more disk to the existing builders16:45
clarkbanother option would be to delete the image from local disk once uploaded to all the clouds (but then people won't be able to download them)16:45
*** pgaxatte has quit IRC16:45
clarkbwe could potentially keep just the qcow2 compressed version and then convert to raw or vhd if necessary from there16:45
clarkbShrews: ^ as soon as I've said that I've realized that could be a really nice nodepool-builder feature16:46
clarkbShrews: basically keep a version of the image (qcow2 will almost always be smallest) for recovery purposes if necessary but delete the other versions once they have finished uploading16:46
clarkbthen we have 9GB * num images storage space instead of 60GB * num images storage space16:46
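
The conversion step that idea implies, sketched with qemu-img (filenames hypothetical; 'vpc' is qemu-img's name for the VHD format):

    qemu-img convert -O raw ubuntu-bionic.qcow2 ubuntu-bionic.raw
    qemu-img convert -O vpc ubuntu-bionic.qcow2 ubuntu-bionic.vhd
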
fungithat does make automated reuploading of raw images harder i guess?16:48
fungior when adding a new provider (the builder would normally start uploading already-built images to it automatically as soon as the provider was added, right?)16:48
clarkbfungi: you'd have to qemu-img convert them first16:48
clarkbthat is a good point about adding a new provider16:49
clarkbwe could force a new build at that point as a workaround but that isn't very user friendly16:49
fungihrm, not a lot more in /opt/dib_tmp on nb02 either... 50gb according to du16:50
clarkbfungi: ya check the ps listings for disk-image-create though16:51
clarkbnb02 seems stuck on a process problem not a disk problem16:51
fungiright, just in terms of clearing out /opt/dib_tmp it's not really going to free up much is what i meant16:51
clarkbya but it's got about 500GB free16:51
clarkbwhich is normal16:51
clarkb(and why, if we lose one, the other fills 1TB of disk)16:52
*** lpetrut has quit IRC16:52
fungihow do you normally go about disabling nodepool-builder? the update-rc.d tool or rename the rc.2 symlinks from S to K or via systemctl disable or some other way?16:52
fungiedit the initscript to exit 0?16:52
clarkbsystemctl disable nodepool-builder16:53
clarkbit should give you a mesage about updating init script things16:53
fungicool, i didn't know that worked for sysv-compat16:56
clarkbyup, the way systemd sysv compat works is it automatically adds a shim unit file for each sysv init script16:56
*** gyee has joined #openstack-infra16:56
*** ociuhandu has quit IRC16:57
clarkbsystemctl can then manage that shim as if it were any other unit16:57
fungithanks!16:57
clarkb(this is why we have to daemon-reload systemd in our puppetry to have systemd figure out the service exists)16:57
*** dpawlik has quit IRC16:57
fungigot it16:57
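
A short sketch of inspecting the generated shim unit; the generator output path is an assumption based on systemd's usual sysv-compat behavior:

    sudo systemctl daemon-reload          # regenerate shim units for init scripts
    systemctl status nodepool-builder     # the shim behaves like any other unit
    ls /run/systemd/generator.late/       # where generated sysv shims usually land
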
fungiso as far as bringing these back online after rebooting and clearing out /opt/dib_tmp, i should enable and start nodepool-builder on nb02 first and leave it stopped on nb01 until a full set of images is going?16:59
fungi(so that nb01 doesn't try to build more images when it lacks disk space to write them?)16:59
clarkbyou'll need to leave it running on nb01 so that it can delete the images that nb02 builds new ones for16:59
clarkbI think17:00
fungiis it smart enough to know not to try to build any on nb01 until it deletes some?17:00
clarkbno17:00
clarkbit will fail to build images in that period17:00
fungibecause it's going to have maybe 90gb free here after i clear dib_tmp17:00
clarkbthis is where the auto rebuild aggresiveness makes it difficult to work with nodepool17:00
clarkbbecause we could delete the older images of a pair to free up space but then it will immediately start trying to build that image17:01
fungiis it safe to delete /opt/dib_tmp itself, or do i need to leave the directory and just remove contents?17:01
clarkbyou need to remove the contents or wait for puppet to run and put it back or put it back yourself17:01
clarkbnodepool doesn't create that dir17:01
fungiokay17:01
fungithanks17:01
clarkb(typically dib would use /tmp)17:02
clarkbanother option is to pause all images in the nb01 config, then delete the older image of the pairs on it17:02
fungihrm, nb02 isn't reachable yet. maybe a periodic fsck was triggered17:02
clarkbthen only unpause the images in nb01's config once nb02 has picked up some slack17:03
clarkbIt's probably ok to simply leave it running and let some errors happen?17:03
clarkbfungi: I think reboots may be slow there due to needing to clean up all those mounts and stuff17:03
fungiyeah, that seems reasonable17:03
clarkbsystemd will immediately stop sshd but then other things are slower17:03
clarkbmaybe what we need is the ability to set image pausing outside of config17:04
*** rlandy|brb is now known as rlandy17:04
clarkbthen we could say nodepool pause foo, nodepool delete foo-1, wait for nb02, nodepool unpause foo17:05
clarkband not bother with emergency files and config17:05
*** ociuhandu has joined #openstack-infra17:06
clarkbwe had a similar problem in the past where the most recent image was the problem. I wanted to delete the most recent image and use the previous image but nodepool immediately started building a new image that would be broken17:06
clarkbsolution there is to pause then delete17:06
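
A sketch of the pause-then-delete workflow as it exists today: set pause: true on the diskimage entry in the builder config, then delete the stale build so it is not immediately rebuilt (the build id here is hypothetical):

    nodepool dib-image-list                            # find the build to remove
    nodepool dib-image-delete ubuntu-bionic-0000012345
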
*** rpittau is now known as rpittau|afk17:10
funginb01 cleanup is done but i'm not starting it just yet because nb02 is still unreachable17:12
fungii'll check the oob console17:12
*** tesseract has quit IRC17:19
funginb02 console just shows "Ubuntu 16.04" and a little spinner17:20
funginot sure if it's booting or stopping17:20
fungihiding boot/shutdown progress from the console display is an unpardonable sin. why would that be the default?17:21
*** eernst has quit IRC17:22
fungiand the `console log show` cli command is unsupported for rackspace17:23
clarkbfungi: that's long been an issue on ubuntu (the hiding console output on servers problem)17:23
fungii guess our options are to wait, or try to force a(nother) reboot and hope it doesn't irrecoverably corrupt /opt17:23
clarkbI seem to recall that is a symptom of fscking17:23
*** zxiiro has joined #openstack-infra17:24
clarkbbecause you get error messages if there was actually something wrong much more quickly17:24
fungiahh17:24
*** liuyulong_ has joined #openstack-infra17:25
fungiso maybe it did hit a scheduled fsck on boot and since /opt is 1tb (and maybe slow)...17:25
clarkbpuppet apply at about 1800UTC should update zuul.opendev.org cert17:27
*** liuyulong has quit IRC17:28
*** dtantsur is now known as dtantsur|afk17:30
*** eernst has joined #openstack-infra17:30
openstackgerritMohammed Naser proposed zuul/zuul-jobs master: helm-template: Add role to run 'helm template'  https://review.opendev.org/70187117:31
openstackgerritMohammed Naser proposed zuul/zuul-jobs master: apply-helm-charts: Job to apply Helm charts  https://review.opendev.org/70187417:31
openstackgerritMohammed Naser proposed zuul/zuul-jobs master: apply-helm-charts: Job to apply Helm charts  https://review.opendev.org/70187417:31
*** ociuhandu has quit IRC17:33
*** evrardjp has quit IRC17:33
*** evrardjp has joined #openstack-infra17:34
*** ociuhandu has joined #openstack-infra17:36
openstackgerritMerged zuul/zuul-jobs master: install-go: bump version to 1.13.5  https://review.opendev.org/70046717:42
fungiokay, nb02 finally became reachable, cleaning it up now17:45
clarkbinfra-root: thoughts on adding gmann to devstack-gate core? seems like the changes going in now are for life support on old branches17:46
fungii'm in favor17:46
clarkbgmann in particular seems to be helping to ensure those changes get in so having him be able to approve would be good I think17:46
fungiand he's helping drive the openstack cycle goal to drop legacy jobs from master17:47
fungiwhich should mean less use of d-g overall17:47
clarkbgmann: ^ would you be interested in that?17:47
gmannclarkb: fungi sure, that will be helpful. thanks17:48
toskyoh, right, devstack-gate was originally part of infra and not QA17:48
tosky(I guess it is still infra)17:49
*** iurygregory has quit IRC17:54
*** derekh has quit IRC18:01
smcginniszuul down?18:01
*** ykarel|away has quit IRC18:02
clarkbsmcginnis: no, looks like the apache config for new ssl cert is unhappy :/18:04
clarkbzuul is still running though18:04
smcginnisOK, I'm just getting a connection refused trying to access the status page. Good it's only that part.18:05
clarkbya I'm trying to sort it out18:05
smcginnisThanks!18:05
clarkbok I've put the old vhost config back in place18:08
clarkband put the host in the emergency file. This way the webserver is up and running while I sort this out18:08
smcginnisConfirmed - at least loads for me now.18:08
clarkboh I know what is wrong ugh18:09
*** pcaruana has quit IRC18:09
clarkbok, I made the (bad) assumption that having any content at all in /etc/letsencrypt-certs/zuul.opendev.org/ was a sign that things were happy there18:09
clarkbthey were not, I could not issue the certificate because zuul01.opendev.org does not have an acme delegation record18:10
clarkband the reason for that is we don't have a zuul01.opendev.org, just a zuul01.openstack.org18:11
clarkbfix incoming18:11
*** ociuhandu has quit IRC18:12
openstackgerritClark Boylan proposed opendev/system-config master: Don't issue cert for zuul01.opendev.org  https://review.opendev.org/70202018:14
clarkbinfra-root ^ that cleanup is necessary for zuul.opendev.org le happiness18:14
*** stevebaker has joined #openstack-infra18:15
*** rfolco has joined #openstack-infra18:16
*** eernst has quit IRC18:18
*** gfidente is now known as gfidente|afk18:24
*** jpena is now known as jpena|off18:25
fungiokay, nb02 is cleaned up and nodepool-builder service enabled and started on it, currently building debian-stretch-0000100801 for 6 minutes now18:29
fungiper earlier discussion, i'll start the service on nb01 now and maybe it'll fail for a bit until nb02 builds enough replacements that 01 can delete some of its older images18:29
funginb01 is now building (or at least trying to) gentoo-17-0-systemd-000013196518:31
clarkbfungi: great. I expect things should start to settle down on the builders after a couple images manage to build and get their old versions cleaned up18:32
clarkb~3 hours away probably18:32
fungiyup18:33
fungithe /opt partition on nb01 has only 76gb to spare, so i do expect some failures18:34
*** liuyulong_ has quit IRC18:41
*** kjackal has quit IRC18:45
*** aedc has joined #openstack-infra18:51
*** eharney has quit IRC18:54
*** pcaruana has joined #openstack-infra18:57
*** ralonsoh has quit IRC18:59
openstackgerritAndreas Jaeger proposed openstack/project-config master: Remove retired x/js-* repos from gerritbot  https://review.opendev.org/70202819:05
yoctozeptoAJaeger: by the looks of it js-openstack-lib already spams in openstack-sdks19:06
yoctozeptogerritbot config needs no update :-)19:06
AJaegeryoctozepto: indeed, was surprised by that - so, that governance change somehow reflects reality ;)19:06
AJaegeryoctozepto: 702028 is the change I did - thanks for reminding me of that one19:07
yoctozeptoAJaeger: well, it is far from infra19:07
yoctozeptono problem, apply cleaning procedures before the weekend :-)19:07
AJaeger;)19:07
* AJaeger would just love to have the house cleaned up as easily :)19:08
clarkbAJaeger: ++19:08
yoctozeptoI'm allergic to dust so I clean mine regularly...19:09
yoctozeptoAJaeger, clarkb: regarding https://review.opendev.org/#/admin/groups/1408,members <- how to propose change - I presume it should happen after governance change anyways :-)19:10
AJaegerclarkb: regarding js-openstack-lib, I suggest you +1 as infra PTL the governance change https://review.opendev.org/#/c/701854/19:11
clarkbyoctozepto: mordred is sdk ptl so we'd give him access then he can edit the list as he wants19:11
AJaegeryoctozepto: mordred as PTL and infra-core can take care of it19:11
clarkband ya he is already in there19:11
yoctozeptoclarkb, AJaeger: mhm, that makes sense19:12
clarkbAJaeger: done19:12
AJaegerthanks19:13
* AJaeger disappears to cycle for collecting his kids19:14
*** kjackal has joined #openstack-infra19:18
yoctozeptoAJaeger: healthy you!19:20
*** factor has quit IRC19:27
*** dklyle has quit IRC19:28
openstackgerritRadosław Piliszek proposed openstack/project-config master: Remove old openstack/js-openstack-lib jobs  https://review.opendev.org/70203019:29
*** lpetrut has joined #openstack-infra19:30
*** dklyle has joined #openstack-infra19:37
fungiokay, nb02 finished the debian-stretch image and is now onto opensuse-15 as of 30 minutes ago19:43
funginb01 is still building gentoo-17-0-systemd for over an hour, but will hopefully complete soon19:43
fungiand it's still got 55gb worth of space left in /opt so maybe it won't fail to write19:44
fungii'm going to go out for a brief walk since we seem to have a spate of pleasant weather, but i'll be back in an hour-ish to check in on it19:45
openstackgerritMerged opendev/system-config master: Don't issue cert for zuul01.opendev.org  https://review.opendev.org/70202019:45
*** stevebaker has quit IRC19:50
*** eharney has joined #openstack-infra19:52
openstackgerritMerged openstack/devstack-gate master: nova: Renable n-net on stable/rocky|queens|pike|ocata  https://review.opendev.org/70195719:52
*** Goneri has quit IRC19:59
clarkbI have removed zuul01 from the emergency file and will keep an eye on it20:00
clarkbnot hearing any opposition I've added gmann to d-g core20:02
clarkbI think that will help with the straggler changes that go in there to keep stable branches running20:03
clarkbwhile checking where we are in ansible + puppet loop I discovered that logstash-worker05 was not responding to ssh20:05
clarkbthis has been the case for days according to logs. I will reboot it via the api20:06
clarkb#status log Added gmann to devstack-gate-core to help support fixes necessary for stable branches there.20:07
openstackstatusclarkb: finished logging20:07
clarkb#status log Rebooted logstash-worker05 via nova api after discovering it has stopped responding to ssh for several days.20:07
openstackstatusclarkb: finished logging20:07
clarkbsyslog doesn't show anything but it appears to have stopped on december 21, 201920:08
clarkbfungi: looks like nb03 is also in a similar no disk state. I'm going to apply similar cleanup to it now20:13
clarkbfungi: also I think part of our problem is we are holding much older copies of images, possibly because we're failing to delete them from clouds (probably vexxhost, because of the BFV "you can't delete this image because something is using it" problem)20:23
clarkbfungi: I'm going through on nb01 and clearing out files in /opt/nodepool_dib that don't correspond to images reported by dib-image-list20:23
clarkbas a first pass cleanup20:23
clarkbanything that remains is still valid and possibly "stuck"20:23
*** arif-ali has quit IRC20:25
clarkbfungi: ok nb01's /opt/nodepool_dib contents should reflect what is in nodepool dib-image-list now20:39
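
The reconciliation pass just described, sketched; anything under /opt/nodepool_dib without a matching entry in the dib-image-list output is a leaked artifact:

    nodepool dib-image-list      # the builds nodepool is still tracking
    ls -lh /opt/nodepool_dib     # the artifacts actually on disk
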
clarkbwe have an excess of bionic, buster, centos-7, and gentoo images which I think is related to not being able to delete them from cloud providers20:40
clarkbnb03 /opt/dib_tmp cleanup is very slow20:41
clarkbwe have issued an LE cert properly for zuul.opendev.org now20:42
clarkbjust waiting for puppet to run and switch the apache config over20:42
clarkbFailed to delete image with name or ID 'ubuntu-bionic-1573653999': 409 Conflict: Image c68d93eb-72ff-42ad-b5c8-63daace0286a could not be deleted because it is in use: The image cannot be deleted because it is in use through the backend store outside of Glance. (HTTP 409)20:46
clarkbthat confirms that for at least one of the images20:46
clarkband I've tracked one of the centos-7 image leaks to a volume that reports to be attached to a server that no longer exists20:51
clarkbI think that means we want to start with a volume cleanup20:51
clarkbthen let nodepool cleanup images again then see if there is anything left20:51
clarkbI expect that to be fairly involved and I want to finish up this zuul cert thing and find lunch first20:52
fungii wonder if nodepool could be adjusted to delete local copies of images it also wants to delete remotely, regardless of whether remote deletion fails20:54
fungiif it's actively trying to delete those images from providers, there's probably no need to keep the local copy of them on disk any longer20:55
clarkb++20:55
clarkbfwiw cleaning up nb03's dib_tmp freed like 16GB. I'm now doing the same cleanup to /opt/nodepool_dib there that I did on nb01 to see if we can free more space20:55
openstackgerritAndreas Jaeger proposed openstack/openstack-zuul-jobs master: Remove jobs and templates used by js-openstack-lib  https://review.opendev.org/70151020:55
fungii assume we'd still need to keep a record of the images since that's how it knows to keep trying to delete them?20:55
clarkbnb03 poses a slightly different problem. We've got images associated with linaro-cn1 in zk and those will never delete because the cloud is gone.20:57
*** michael-beaver has joined #openstack-infra20:57
clarkbFor there I'll delete them from disk then after lunch I can figure out how to do surgery on the zk db?20:57
clarkbfungi: all of that is stored in zk and is the source of the problem for ^20:58
clarkbzk says that image must be deleted but it will never be deleted at this point because the cloud is gone so we need to edit the zk db20:58
clarkbI'll start with simply removing them from disk as that is easy and frees space20:58
clarkboh except the ones for cn1 are not on disk? we've also got images that refuse to delete in london?20:59
fungithat sounds like a royal mess21:00
fungii don't suppose zk has a convenient cli you can use to inspect and manipulate records?21:00
mordredfungi: zkshell21:01
mordredfungi: https://github.com/apache/zookeeper/blob/master/zookeeper-docs/src/main/resources/markdown/zookeeperCLI.md21:01
clarkbI'm trying to manually image delete the leaked images in london now21:02
clarkbto see if the error is useful21:02
clarkbat the very least we should be able to apply fungi's new rule for deleting from disk when db record is set to deleting manually21:03
fungithat's a good point21:03
clarkbzuul.opendev.org is LE'd now21:04
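
One hedged way to verify the switch from outside, reading the served certificate's issuer and validity dates:

    echo | openssl s_client -connect zuul.opendev.org:443 -servername zuul.opendev.org 2>/dev/null \
      | openssl x509 -noout -issuer -dates
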
clarkbI'm going to go find lunch while i wait for this image delete to return21:04
clarkbfungi: if you want to poke at the vexxhost image leaks via volume leaks I can poke at nb0321:04
clarkbI'm not doing anything with nb01 or nb02 right now so we won't be getting in each other's way21:05
fungicool, will do21:05
fungithough i need to get started making dinner soon21:05
clarkbfungi: what I noticed is that if you volume list sjc1 you'll get some volumes that say "attached to $name" and others are "attached to $uuid"21:05
fungiwill see if i can get through them quickly21:05
clarkbthe $uuid ones seem to not have names because those servers do not exist anymore and we have leaked those volumes21:06
clarkbI think if we delete those volumes after confirming the servers do not exist then the images should be able to delete21:06
*** rfolco has quit IRC21:06
clarkband then nodepool will automatically remove the files on disk21:06
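
A sketch of that triage: list volumes with their attachment info, then confirm the referenced server is really gone before deleting (the uuid is a placeholder):

    openstack volume list --long
    # for a volume attached to a bare uuid rather than a server name:
    openstack server show <server-uuid>   # errors out for the leaked ones
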
fungiand there's a special way mordred worked out to possibly delete them?21:06
clarkbfungi: ya you unattach them first21:06
fungi(if still attached to nonexistent instance)21:06
clarkbI don't know what the specific details for that are but its some api call to do an unattach21:07
fungiright, i have a feeling there is no way to do it with osc, will need to use sdk or api21:07
clarkbah21:07
fungiyou can detach normally *if* the instance still exists21:08
fungiif the instance was deleted but cinder still has an attachment record pointing to it, then you need an undocumented api call21:08
*** hwoarang has quit IRC21:16
*** hwoarang has joined #openstack-infra21:16
corvusclarkb: thanks for z.o.o!21:23
*** zxiiro has quit IRC21:23
fungijust reconfirmed, if i try `openstack server remove volume eb0cbf8e-16b5-4712-8274-c4989b1bf956 0f91579c-c627-452b-aad4-67cdeae865c3` i get No server with a name or ID of 'eb0cbf8e-16b5-4712-8274-c4989b1bf956' exists.21:24
smcginnisfungi, clarkb: I believe mordred was going to look at doing something for that.21:24
smcginnisWe were talking about it the other day and he confirmed he can call the API needed to clean things up.21:25
fungismcginnis: yep, in the meantime i can probably use the api/sdk21:25
clarkbmy image delete against linaro-london hasn't returned yet21:25
clarkbI think I'll go ahead and apply fungi's rule of deleting from disk when we start the delete process on nb0321:26
smcginnisIt's a long way from Portland to London.21:26
*** rlandy has quit IRC21:26
clarkbthis will give us room for normal operations while we sort out why those images aren't deleting21:26
openstackgerritJames E. Blair proposed zuul/zuul-jobs master: Fix typo in helm role  https://review.opendev.org/70204621:27
*** kjackal has quit IRC21:27
*** Goneri has joined #openstack-infra21:28
clarkbnb03 is running a builder again after that cleanup21:33
clarkbfungi: is doing the zk surgery something you are interested in doing? I don't think that is urgent so fine if you want to give it a go next week21:33
fungii can, sure21:33
fungistill looking at forced volume detachment21:34
clarkbI've done it a few times in the past so can help, but figured if you hadn't done it before this might be a good time to try :)21:34
fungimight be nice if osc grew a --force option to volume delete which did the os-force_detach action from https://docs.openstack.org/api-ref/block-storage/v3/#force-detach-a-volume21:34
fungioh, whaddya know! `openstack volume delete --force <uuid>` is a thing!21:36
fungi--force Attempt forced removal of volume(s), regardless of state21:36
fungiunfortunately, in vexxhost:21:36
fungi"Policy doesn't allow volume_extension:volume_admin_actions:force_delete to be performed. (HTTP 403)"21:36
fungimnaser: do you happen to know if there's a (maybe safety-related) reason for that ^ ?21:37
fungii guess that's considered a protected admin-only function?21:38
mnaseri believe that cinder by default has that as an admin-only policy thing21:38
mnaser(we don't have custom policy fwiw)21:38
fungimaybe force deleting volumes associated with an existing instance could crash hypervisors or something21:38
fungimakes sense21:38
mnaserhttps://github.com/openstack/cinder/blob/master/cinder/policies/volume_actions.py#L10621:39
mnaseryeah the cinder default is admin API21:39
mnaserso i'll delegate that answer to them :-p21:39
fungiif pabelanger still hung out in here i'd ask him for details on the environment where he was successfully using the os-force_detach action21:39
mnaserfungi: i think he might have deleted the attachment in cinder first21:40
fungiyeah, i was hoping that's what `openstack volume delete --force` was doing under the hood21:40
clarkbfungi: and openstack server remove volume fails because the server does not exist anymore?21:45
clarkbthat is a command btw `openstack server remove volume`21:45
fungiclarkb: yep, that's what i was trying first21:45
clarkbstill waiting to hear back on this image delete against linaro-london :/21:45
clarkbhrm21:45
fungithe instance you specify must exist, because i guess it's asking nova to process the detachment and nova says "i have no idea what server that is"21:46
fungithe actual error is "No server with a name or ID of '<uudi>' exists."21:47
fungis/uudi/uuid/21:47
clarkbfungi: well if you want to do the zk stuff instead I can try to pick this up if pabelanger responds in #zuul21:48
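
A sketch of the attachment-first route being speculated about, using cinderclient's attachment commands; whether the cloud's policy lets a non-admin delete attachments is untested here, and the ids are placeholders:

    cinder attachment-list --volume-id <volume-uuid>
    cinder attachment-delete <attachment-uuid>
    openstack volume delete <volume-uuid>
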
*** lbragstad has quit IRC21:54
*** lbragstad has joined #openstack-infra21:54
openstackgerritMerged zuul/zuul-jobs master: collect-container-logs: add role  https://review.opendev.org/70186721:56
*** ahosam has joined #openstack-infra21:57
openstackgerritClark Boylan proposed opendev/zone-opendev.org master: Manage insecure-ci-registry ssl with LE  https://review.opendev.org/70205022:14
openstackgerritClark Boylan proposed opendev/system-config master: Manage insecure-ci-registry cert with LE  https://review.opendev.org/70205122:14
clarkbinfra-root ^ the dns change should be able to go in whenever it's reviewed and ready. But I'm thinking best to hold off on the switch until next week while we juggle this nodepool cleaning22:15
mordredclarkb, fungi: moving here22:16
mordredclarkb: what did you mean by volume size?22:16
clarkbmordred: our mirror volume would be bigger than 80GB so you can check that attribute as another sanity check22:16
mordrednod22:16
clarkb80GB is our standard size for our nodepool instances22:16
mordredyes - that whole list is 8022:17
mordredclarkb, fungi: ready for me to try running it for real?22:19
clarkbya so I think worst case you might kill a running job or delete a held node22:19
clarkbmordred: does your script handle the case where volume doesn't have a server because the server hasn't booted yet on initial create?22:20
clarkbmordred: that would be the only other case I'd worry about22:20
clarkb(I think checking that volume age > 1 hour would be sufficient to guard against that)22:20
mordreduh. I have no idea what the race conditions there would be ... good call ... one sec22:20
openstackgerritMerged zuul/zuul master: Make files matcher match changes with no files  https://review.opendev.org/67827322:23
*** lastmikoi has quit IRC22:28
*** arif-ali has joined #openstack-infra22:31
openstackgerritJames E. Blair proposed zuul/zuul-helm master: Add option to manage secrets outside of helm  https://review.opendev.org/70205222:32
mordredclarkb: would you look at clean-volumes.py and tell me if my time delta code looks right?22:34
clarkbis that on bridge?22:35
mordredyeah22:35
mordredclarkb: I did it by hand - but datetimes are so horrible in python I'd like a second set of eyes22:35
mordredmnaser: are created_at values from volumes in vexxhost going to come back in UTC?22:36
clarkbmordred: ya agreed on the horribleness22:37
mordredclarkb: also - it's safe to run that script in its current form via the command in its first line22:37
mordredif you want to run it and look at the output22:37
clarkbmordred: what's with the truncation of created_at?22:38
clarkbotherwise it looks right to me. Might also want to print the server uuid for volumes being deleted as that will be a breadcrumb for debugging if it doesn't do what we want22:38
mordredclarkb: it has microseconds ... oh - you know - that was from when I was trying to parse with dateutil which doesn't grok those22:39
clarkbmordred: ah ya that should be fine22:39
mordredclarkb: ok. so - game for me to run that for real?22:39
clarkbmordred: I think so. And maybe add in the server uuid logging if you want22:39
fungieverything should just return utc epoch seconds, and then whatever you need is basic arithmetic22:41
mordredok. I'm going to run it and then I'll paste the output22:41
fungii mean, ideally we'd return planck units since the big bang within our relativistic frame of reference, but that's probably overengineering until we crack near-light-speed travel22:43
mordred:)22:43
*** KeithMnemonic has quit IRC22:46
mordredhttp://paste.openstack.org/show/788262/22:47
mordredclarkb, fungi: ^^22:47
mordred(the output is a bit verbose - I should just print volume_id on the delete line I think)22:47
clarkbwe deleted ~23 volumes?22:48
mordrednext time someone runs it - it should be slightly less chatty22:48
mordredyeah - I think so22:48
mordredthere's still likely one left around where I manually deleted the attachment from fungi's example earlier22:48
mordredb/c I did not delete the volume itself22:48
*** dave-mccowan has quit IRC22:48
openstackgerritJames E. Blair proposed zuul/zuul-helm master: Add option to manage secrets outside of helm  https://review.opendev.org/70205222:49
mnasermordred: uh, i think so.22:49
clarkbthe image I tried deleting before is no longer there22:49
fungiyeah, looks good22:50
openstackgerritJames E. Blair proposed zuul/zuul-helm master: Change builder container name  https://review.opendev.org/70179322:50
openstackgerritJames E. Blair proposed zuul/zuul-helm master: Add empty clouds value  https://review.opendev.org/70186522:50
openstackgerritJames E. Blair proposed zuul/zuul-helm master: Add option to manage secrets outside of helm  https://review.opendev.org/70205222:50
clarkbmordred: fungi there is one image left in vexxhost to delete22:50
clarkbprobably associated to that volume monty did not delete22:50
* clarkb looks to find it22:50
fungiit'll be 0f91579c-c627-452b-aad4-67cdeae865c322:51
clarkbfungi: mordred 0f91579c-c627-452b-aad4-67cdeae865c3 I think it is that one22:51
clarkbyup22:51
clarkbshould I go ahead and delete it22:51
fungigo for it as far as i'm concerned22:51
clarkbdone22:53
clarkbcool now only linaro images stuck in deleting22:54
clarkbfungi: are you at the end of your week or do you want to do that now?22:54
fungii can take a look after dinner22:55
openstackgerritMonty Taylor proposed opendev/system-config master: Add quick script for cleaning boot from volume leaks  https://review.opendev.org/70205322:55
mordredmnaser: ^^ there's a script that non-admin users can use to cleanup leaked BFV volumes on vexxhost22:55
mordredmnaser: I'm going to generalize it a bit and put it into sdk / osc ... but for now, that seems to safely work22:56
clarkbfungi: ok. basically what we want to do is `nodepool image-list | grep linaro-cn1` then for each of those records delete the zk nodes that correspond to them22:56
fungivia zkshell22:56
mordredmnaser: I mean - it should work on any modern openstack - it's just hardcoded for us to point at vexxhost since that's the only nodepool place we BFV22:56
clarkbfungi: yup which you install via pip into a venv (I've got an install on zk01.o.o in my homedir)22:57
mnasermordred: neat22:58
clarkbfungi: /nodepool/images/ubuntu-bionic-arm64/builds/0000001978/providers/linaro-cn1/images the content at that path up to linaro-cn1 should be deleted I think23:06
clarkbIt should be sufficient to simply delete the content below images/ though23:06
clarkbI found a case for the other arm64 cloud that was removed and it still was listed under providers but had no image under providers/name/images/23:06
fungiis zkshell known by another name? pypi doesn't know it23:12
clarkbfungi: zk-shell23:13
fungiahh23:13
fungiyup, that's working23:13
clarkbya I think you only need to remove 0000000001 from /nodepool/images/ubuntu-bionic-arm64/builds/0000001978/providers/linaro-cn1/images23:14
clarkbthen do that for the other 6 linaro-cn1 images23:14
fungiwhat about nodepool/images/ubuntu-bionic-arm64/builds/0000001978/providers/linaro-cn1/images/lock23:19
fungileave that there?23:19
clarkbya that node is still there for the nrt arm cloud23:20
clarkbI think we can actually delete everything linaro-cn1 and below23:20
fungias in `rm nodepool/images/ubuntu-bionic-arm64/builds/0000001978/providers/linaro-cn1`23:21
clarkbya23:21
clarkbI think you have to rm things below it first, there is no -r for this23:22
fungiindeed: /nodepool/images/ubuntu-bionic-arm64/builds/0000001978/providers/linaro-cn1 is not empty.23:25
*** rfolco has joined #openstack-infra23:25
fungiokay, manually recursed23:25
fungii'll work through the others23:25
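
The manual recursion fungi just did, sketched as zk-shell commands; children must go before their parent since this rm has no recursive flag:

    rm /nodepool/images/ubuntu-bionic-arm64/builds/0000001978/providers/linaro-cn1/images/0000000001
    rm /nodepool/images/ubuntu-bionic-arm64/builds/0000001978/providers/linaro-cn1/images/lock
    rm /nodepool/images/ubuntu-bionic-arm64/builds/0000001978/providers/linaro-cn1/images
    rm /nodepool/images/ubuntu-bionic-arm64/builds/0000001978/providers/linaro-cn1
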
clarkband now image-list doesn't show that image anymore23:26
clarkbreally the key thing here is to avoid operating on zk when nodepool may be operating on those nodes, but we know nodepool won't do that because this cloud doesn't exist anymore23:26
*** mattw4 has quit IRC23:27
*** lastmikoi has joined #openstack-infra23:27
fungihow did you identify the /nodepool/images/ubuntu-bionic-arm64/builds/0000001978 node? is that the build id reported by nodepool image-list? or the upload id, or something else entirely?23:28
clarkb1978 is the build id for that image name23:28
fungii'm looking at build id 0000012627 upload id 0000010921 for debian-stretch-arm64 in linaro-cn123:28
clarkbthen the 00000...1 you removed under provider is the upload id for the provider23:29
fungioh... i need to make the image name in the path match too23:29
clarkbnodepool/images/debian-stretch-arm64/builds/0000012627/providers/linaro-cn123:29
* fungi smacks forehead23:29
clarkbyup23:29
clarkbin old nodepool the build id was unique globally23:30
clarkbbut now it is per image name23:30
fungiokay, it's working as expected23:30
fungione more down23:31
fungiwill work my way through the rest23:31
openstackgerritJames E. Blair proposed zuul/zuul-helm master: Add Zuul charts  https://review.opendev.org/70046023:31
fungiokay, linaro-cn1 entries are entirely gone from image-list23:34
*** pcaruana has quit IRC23:35
clarkbconfirmed23:37
clarkbthat leaves us with figuring out linaro-london image situation23:37
clarkbmy image delete is still sitting there23:37
*** rfolco has quit IRC23:40
fungii still need to wash dishes, but can probably help once i'm done23:43
clarkbI'm going to try manually deleting the other images that have leaked there and see if any act different than the random one I selected first23:44
clarkbI can talk to the api because image show on that image name works23:46
clarkbunless there is layer 7 firewalling we shouldn't be getting lost that way23:46
clarkbadding --debug to the osc command shows it getting all the way to the delete request on the image uuid23:47
clarkbso also not getting lost somewhere in between due to name lookups23:48
clarkbthat makes me think it is likely a cloud problem23:49
clarkbkevinz: http://paste.openstack.org/show/788263/ is a list of images that nodepool has been trying to delete in the linaro london cloud. Manually attempting to delete them shows the commands getting as far as the DELETE http request but they seem to hang there23:52
clarkbkevinz: nodepool not being able to clean up these images has meant it kept them around on disk which ended up filling the disk on our builder node.23:52
clarkbkevinz: hrw: I'm thinking this must be something on the cloud side, as I am able to show those images just fine (implying api access is otherwise working)23:52
clarkbis that something you can look into when you get a chance?23:53
fungiin good news, /opt utilization on nb01 is falling rapidly23:55
clarkbkevinz: hrw I realize it is likely your weekend now so no rush. We can pick this up next week23:56
*** michael-beaver has quit IRC23:57
clarkbfungi: ya I'm not sure there is much else we can do re deleting these images23:58
openstackgerritJames E. Blair proposed zuul/zuul-helm master: Allow tenant config file to be managed externally  https://review.opendev.org/70205723:58
clarkblets see if kevinz can help on monday23:58
