Tuesday, 2023-05-23

00:42 <tonyb> clarkb: That's all totally fair.
00:43 <tonyb> If I understand the scrollback we can pause/revert the quay.io work, migrate the tooling to podman[1] and then resume the migration
00:44 <tonyb> [1] When people other than clarkb are able to push on it
00:44 <clarkb> fwiw I can work on it too. I just don't want the expectation to be clarkb is gonna get it all done in a few weeks for quay.io stuff :)
00:44 <clarkb> Ideally we can work on it together over time
00:53 <tonyb> Yeah.  definitely *not* a clarkb thing.
00:55 <tonyb> FWIW, whenever I make suggestions I'm always of the opinion that if I'm not going to do the work I only count as 0.5 of a vote.
01:01 <clarkb> it looks like the nodepool podman test is going to or has timed out because image builds weren't happening
01:02 <clarkb> I can't poke at that more today. Feel free to if interested
01:05 <tonyb> clarkb: totally interested, but also it isn't exactly in my "wheelhouse".
06:33 *** amoralej|off is now known as amoralej
09:14 *** mooynick is now known as yoctozepto
11:45 <opendevreview> Merged opendev/base-jobs master: buildset-registry: Always use Docker  https://review.opendev.org/c/opendev/base-jobs/+/883869
11:55 <fungi> yoctozepto: ^ should be able to recheck now
11:58 <yoctozepto> fungi: thanks, yeah, it went further: https://zuul.opendev.org/t/nebulous/build/a8d511bf7e6b487684e69210ef59d812 I just need to fix the references
12:12 <yoctozepto> and now it works :D
12:13 *** amoralej is now known as amoralej|lunch
13:05 *** amoralej|lunch is now known as amoralej
16:10 *** amoralej is now known as amoralej|off
16:18 <fungi> the rackspace tickets i opened yesterday have been acted on, reclaiming 118 nodes worth of capacity for jobs
16:20 <clarkb> any indication if we should expect the problem to recur?
16:21 <fungi> no clue, but it did cause rackspace support to ask who was opening the tickets since (with both of our accounts) the internal employee advocate is the one whose contact information appears on the account rather than ours
16:22 <fungi> apparently the current account contact there is don norton, who was surprised by the ticket
16:50 <fungi> it finally happened... https://blog.pypi.org/posts/2023-05-23-removing-pgp/
17:04 <opendevreview> Clark Boylan proposed openstack/diskimage-builder master: DNM testing if depends-on parent change works with dib  https://review.opendev.org/c/openstack/diskimage-builder/+/883958
17:45 <yoctozepto> have you seen this error using the opendev container image promoting job? https://zuul.opendev.org/t/nebulous/build/e2e20e8bf84d4fc9b9b500fd1dea6e0e
17:46 <yoctozepto> oh well, it means nothing was obtained from the api, strange
17:47 <yoctozepto> argh, a typo
17:47 <clarkb> yoctozepto: that means the job is looking for the gate job that built your image
17:47 <clarkb> and couldn't find it
17:47 <clarkb> (it uses that info to then find the artifact to fetch and promote)
17:47 <yoctozepto> off by one letter
17:47 <yoctozepto> yeah, that I figured :-)
17:47 <yoctozepto> but it's like
17:47 <yoctozepto> "huh, it should be there"
17:48 <yoctozepto> and then "oh well, one letter off"
19:14 <yoctozepto> nighty night!
19:54 <tonyb> I have a couple of "how does it work" questions, when people have time?
19:55 <fungi> i have time, hopefully even answers, and if you're lucky they'll even be correct
19:55 <tonyb> 1) For AFS utilization, is there a fine-grained way to see how much $something is using?  Currently I'm looking at https://grafana.opendev.org/d/9871b26303/afs?orgId=1 for a general sense, but if I wanted to see how much we'd reclaim if we removed $x from storage where should I go?
19:56 <fungi> if you want to install the openafs client locally, you can check quotas with fs subcommands
19:56 <clarkb> you can also look at the rsync logs, iirc they are in afs too
19:56 <clarkb> and rsync has size info
19:57 <tonyb> 2) this one is more basic: how do I find which jobs/builds are using the Ubuntu cloud archive?  I'm just using stable/$branch && ubuntu as a proxy but I don't know if that's valid
19:57 <tonyb> these both came up from me looking at: https://review.opendev.org/c/opendev/system-config/+/883468
19:57 <fungi> fungi@dhole:~$ fs listquota /afs/.openstack.org/docs
19:57 <fungi> Volume Name                    Quota       Used %Used   Partition
19:57 <fungi> docs                        50000000   30206816   60%         77%
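[Editor's note: fungi's `fs listquota` demo above can be scripted if you want to report on several volumes at once. A minimal parsing sketch follows; the column layout is assumed from the example output above, and OpenAFS reports the Quota/Used values in kilobytes.]

```python
def parse_listquota(output):
    """Parse `fs listquota` output into (volume, quota_kb, used_kb) tuples.

    Assumes the header+data layout shown in fungi's example above.
    """
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Skip shell prompt and header lines; data rows have a numeric quota.
        if len(fields) < 5 or not fields[1].isdigit():
            continue
        volume, quota_kb, used_kb = fields[0], int(fields[1]), int(fields[2])
        rows.append((volume, quota_kb, used_kb))
    return rows

sample = """Volume Name                    Quota       Used %Used   Partition
docs                        50000000   30206816   60%         77%"""
print(parse_listquota(sample))  # [('docs', 50000000, 30206816)]
```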
19:58 <fungi> tonyb: you might be able to query for a uca url in opensearch?
19:58 <tonyb> Okay.  I'll look at that.  Having OpenAFS locally is a little complex due to packaging on Fedora but I can make it work
19:59 <fungi> if you have a debian vm you could just apt install it
19:59 <tonyb> Okay, so opensearch .... I don't know about that, is that essentially the old logstash?
20:00 <tonyb> fungi: very true, I could just do that.
20:00 <fungi> essentially, except it's being run by openstack community volunteers
20:00 <fungi> the project team guide has details i think, or maybe the tact sig page on governance... checking
20:00 <tonyb> Ahhh okay, that's why I couldn't find it when I went poking in git
20:00 <tonyb> clarkb: rsync logs are .... https://static.opendev.org/mirror/logs/rsync-mirrors/ ?
20:01 <fungi> tonyb: https://governance.openstack.org/sigs/tact-sig.html#opensearch
20:02 <fungi> tonyb: yes
20:02 <fungi> for the mirror content rsyncing logs
20:02 <tonyb> fungi: Thanks x2
20:04 <fungi> you bet
20:04 <fungi> if you have any other questions, i'm happy to answer any time i'm awake
20:04 <tonyb> I don't see anything "ubuntu" in .../mirror/logs
20:04 <tonyb> fungi: thanks
20:04 <fungi> ubuntu (and debian) mirrors are not mirrored with rsync
20:04 <fungi> they use a tool called reprepro
20:05 <tonyb> These 6 months suck WRT tz overlap
20:05 <fungi> we may not be splatting or copying reprepro logs into afs
20:05 <fungi> but that's something we could add, i'm sure
20:05 <tonyb> Okay: https://static.opendev.org/mirror/logs/reprepro/
20:06 <tonyb> those logs don't help with the size thing
20:06 <fungi> that was quick!
20:06 <tonyb> It's next to rsync in the list ;P
20:08 <fungi> what's your size dilemma you need to answer?
20:10 <tonyb> fungi: It isn't a dilemma as such.  I was curious how much AFS we'd get back if we merged https://review.opendev.org/c/opendev/system-config/+/883468/ which would stop mirroring older UCA things
20:10 <fungi> oh, got it
20:11 <fungi> it's hard to know precisely because debian package repositories are often deduplicated in order to avoid carrying identical copies of packages which might be the same in more than one distribution release
20:11 <fungi> they don't use completely separate file trees like other distros tend to
20:12 <tonyb> Ahh of course, the "pool" concept.
20:12 <fungi> yes, exactly
20:12 <fungi> reprepro deletes any packages not still referenced in the indices
20:13 <fungi> so removing an index will free up the space needed by any packages which are only listed in that index, but packages which were also listed in other indices are retained
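[Editor's note: the pool accounting fungi describes amounts to simple set arithmetic. A sketch with made-up index contents (not real UCA data): dropping an index reclaims only the packages no surviving index still references.]

```python
# Hypothetical index contents: index name -> {package filename: size in bytes}
indices = {
    "bionic-updates/rocky": {"libfoo_1.deb": 1000, "shared_2.deb": 5000},
    "bionic-updates/stein": {"libbar_3.deb": 2000, "shared_2.deb": 5000},
}

def space_freed(drop, indices):
    """Bytes reprepro would reclaim if index `drop` were removed: packages
    still referenced by any other index stay in the pool."""
    kept = set()
    for name, pkgs in indices.items():
        if name != drop:
            kept.update(pkgs)
    return sum(size for pkg, size in indices[drop].items() if pkg not in kept)

print(space_freed("bionic-updates/rocky", indices))  # 1000; shared_2.deb survives
```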
20:13 <fungi> uca may trivially not pool its packages, so we might simply be able to du a subtree to get a good guess
20:14 <tonyb> Okay.  I understand.  This has been helpful.
20:14 <clarkb> ya I think UCA is pretty well segregated by openstack release and ubuntu release
20:14 <fungi> nevermind, uca is also pooled
20:15 <fungi> but we might be able to estimate it by parsing file sizes out of the indices
20:17 <clarkb> I restarted the merger on zm06
20:17 <fungi> the "size" fields in indices like /afs/.openstack.org/mirror/ubuntu-cloud-archive/dists/bionic-updates/rocky/main/binary-amd64/Packages.gz
20:17 <fungi> tonyb: ^ you could collect up all the relevant indices and then parse those files
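[Editor's note: summing those Size fields might look like the sketch below. The gzip-vs-plain detection is an assumption for convenience, and because of pooling the per-index totals are an upper bound on what removal would actually free.]

```python
import gzip

def total_size(path):
    """Sum the Size: fields from a Debian Packages index (.gz or plain)."""
    opener = gzip.open if path.endswith(".gz") else open
    total = 0
    with opener(path, "rt") as f:
        for line in f:
            # Each package stanza carries a "Size: <bytes>" field.
            if line.startswith("Size:"):
                total += int(line.split(":", 1)[1])
    return total
```

Running that over each binary-*/Packages.gz under the dists being removed, then subtracting packages still referenced by the remaining indices, would give the estimate fungi suggests.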
20:18 <clarkb> fungi: does du work against openafs mounted content?
20:18 <tonyb> I think so.
20:18 <clarkb> ya so du might work. Might also be a bit slow as it stats everything
20:20 <tonyb> It was mostly that I did one review and came up with a bunch of impacts from it and I couldn't really answer any of them, so I figured 1) my review wasn't super helpful; and 2) I needed to ask :)
20:21 <clarkb> tonyb: re fedora + afs you might get away with kafs, though I'm not sure if that gets you the userland support you might need
20:21 <clarkb> I tried kafs on opensuse a while back and it didn't work but that was a while ago
20:21 <tonyb> I don't know about the userland stuff. ianw suggested it, I tried it and immediately found it is non-functional ATM :/
20:23 <clarkb> at one point I had my fileserver doing openafs mounts for me because it is ubuntu based (with zfs!)
20:30 <fungi> clarkb: du won't work because, as i said, uca is pooled after all
20:30 <fungi> there's not separate subdirectories for each set of packages
20:31 <tonyb> Also, using opensearch answered my question of which (UCA) releases are still in use
20:47 <clarkb> fungi: I just mean generally. du doesn't work on some filesystems. btrfs in particular has tripped me up because it gives you some naive view
20:48 <clarkb> though I think that may have improved over time
20:48 <fungi> i've used it with afs in the past
20:49 <fungi> though it can take a while
20:50 <clarkb> ya lots of stats is slow iirc
20:53 <fungi> fungi@dhole:~$ du -s /afs/.openstack.org/mirror/ubuntu-cloud-archive
20:53 <fungi> 6583494 /afs/.openstack.org/mirror/ubuntu-cloud-archive
22:47 <opendevreview> Clark Boylan proposed openstack/diskimage-builder master: fedora: don't use CI mirrors  https://review.opendev.org/c/openstack/diskimage-builder/+/883798
22:48 <clarkb> fungi: ianw ^ I think that should fix the most recent error
22:48 <fungi> ah, cool!
22:50 <fungi> yep, so it was trying to use the mirrors which were no longer set
22:51 <clarkb> corvus: moving here because it's more opendev specific. I think our system-config-run-zuul jobs deploy and configure a zookeeper for ssl and all that right? we should be able to adapt that to the nodepool job and then maybe even have that job build an image to test things end to end?
22:52 <clarkb> I think the nodepool job currently doesn't do any workload because there is no zookeeper present
22:59 <corvus> clarkb: yeah, that could be done... but... two things: 1) that will take ages assuming a production image, and if we use a dummy image, i'm not sure that adds anything; 2) we CD nodepool, so that kind of breakage is more likely to come from the nodepool repo than system-config
22:59 <corvus> clarkb: also 3) it shouldn't be necessary once we move image building into jobs, so it may not be a great investment
23:03 <clarkb> that is a good point
23:07 <opendevreview> James E. Blair proposed opendev/system-config master: WIP: Test zuul on jammy  https://review.opendev.org/c/opendev/system-config/+/883986
23:11 <opendevreview> James E. Blair proposed opendev/system-config master: WIP: Test nodepool on jammy  https://review.opendev.org/c/opendev/system-config/+/883987
23:15 <corvus> clarkb: is there any current testing that would actually exercise that nested podman issue?
23:17 <clarkb> corvus: I think just what nodepool's testing was doing before the quay move broke speculative gating
23:17 <corvus> yeah, it looks like the container-release job should do that
23:18 <clarkb> we could do the workaround with skopeo and run it under docker instead of podman
23:18 <clarkb> and have a one-off job on the side sort of deal just to cover that case
23:18 <clarkb> that probably wouldn't be too terrible since we can isolate the job
23:18 <corvus> hrm? in the nodepool repo?  i don't think that's necessary...
23:19 <corvus> i just want to know if https://review.opendev.org/883952 means it really worked
23:20 <corvus> and it looks like it did... though i should probably update that to also remove your sudos
23:20 <clarkb> corvus: but that would be podman nested in podman
23:20 <corvus> right, which is what we want
23:21 <clarkb> ah I see. I was confused I think due to the concern about how opendev is still podman in docker
23:21 <corvus> i think the specific question was: with https://github.com/containers/podman/issues/14884 merged can we now remove the cgroup hack
23:21 <clarkb> but ya I think that shows podman in podman is fine. The original issue was podman in docker (not sure if it exhibited with podman in podman or not)
23:22 <clarkb> corvus: right, but the original issue was filed specifically about podman in docker being problematic. Unknown if the same issue existed as podman in podman
23:22 <clarkb> We can test the podman in docker case if we revert the podman change and use the skopeo hack, or just run a separate job instead of a revert that does that
23:22 <clarkb> anyway I think it is probably sufficient to land that and if it breaks opendev we can revert and tackle it with more robust testing
23:22 <clarkb> since the impact will be low
23:23 <corvus> yeah. i think if we can run podman-in-podman as a normal user without the cgroup hack in, oh, say about a month after debian releases, then i think we're in a good place.  i think that's the key thing that, from opendev's perspective, would weigh in on whether it's okay to start landing the podman changes in the zuul project.
23:24 <corvus> put another way, if we can clear out the cgroup hack, then i think we're good to land the podman switch for now (with the cgroup hack in place and the sudo workaround); ignore everything in opendev because nothing substantial is changing, then land the cgroup cleanup later.
23:25 <corvus> if the cgroup cleanup doesn't work in our desired end-state, then i think opendev should raise that with the zuul project as a reason to hold off/reconsider podman
23:26 <clarkb> I think the only thing that opendev really cares about is whether or not podman in docker would work. Everything else should be well covered.
23:26 <clarkb> And the only reason that is in question is we don't know what sort of testing podman upstream did when they fixed it
23:26 <clarkb> (it is possible they made changes they thought would fix things but for whatever reason are insufficient)
23:26 <clarkb> looks like https://zuul.opendev.org/t/openstack/build/65d8dd29a0de4c55ba12eba75156a522/log/logs/fedora_build-succeeds.FAIL.log#1078 is still finding the mirror for some reason
23:26 <clarkb> (separate thing)
23:27 <corvus> clarkb: well, i think at the meeting today we said opendev wants to run nodepool in podman, so i think opendev cares if nodepool-on-podman works
23:28 <clarkb> well that too, it will just take a bit more time to get there. But yes that is doable with an upgrade of builders to jammy and running nodepool-builder with podman
23:29 <clarkb> And in the scheme of things swapping out nodepool builders might be one of the simplest services because it is almost entirely backend and not user facing (so no one will notice if we take an outage to work out the transition)
23:29 <corvus> right.  since everything is containerized in the zuul system, node upgrades should be easy/fast.
23:29 <corvus> exactly that :)
23:30 <clarkb> the transition is the other concern I have and will need to start looking at. I think it is potentially going to lead to noticeable outages for user facing things because we have to stop the service, clean up content, then start it up after fetching images into the podman context
23:30 <clarkb> and that is mostly to avoid any unexpected interaction between docker-run services and podman-run services (since they will want the same ports and stuff)
23:31 <corvus> this is where the "don't always try to automatically start everything" approach is handy.  we can install both and then manually switch.
23:32 <clarkb> ya so maybe we have an update on a service-by-service basis that stops starting things automatically until that service is moved over or something
23:32 <clarkb> then a human can cut it over, land a cleanup change for docker, and have podman autostart...

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!