Friday, 2022-01-07

*** ysandeep|out is now known as ysandeep02:15
*** mazzy5098812929 is now known as mazzy50988129202:18
*** rlandy|ruck is now known as rlandy|out05:39
*** ysandeep is now known as ysandeep|afk06:40
*** ysandeep|afk is now known as ysandeep09:38
*** rlandy|out is now known as rlandy|ruck11:15
*** dviroel|afk is now known as dviroel11:25
opendevreviewGustavo Sanchez proposed openstack/project-config master: Add the cinder-solidfire charm to Openstack charms
opendevreviewGustavo Sanchez proposed openstack/project-config master: Add the cinder-solidfire charm to Openstack charms
fungiso looks like the letsencrypt certs which failed to renew are for the two vexxhost mirrors and the limestone mirror, i'll try to look through relevant logs in a bit14:24
*** ysandeep is now known as ysandeep|dinner14:45
*** ysandeep|dinner is now known as ysandeep15:10
*** dviroel is now known as dviroel|lunch15:37
*** ysandeep is now known as ysandeep|out15:49
slittle1_need help to delete a tag pushed in error15:58
clarkbslittle1_: we typically don't delete tags as they cannot be removed from downstream repositories that have already pulled the tag16:06
clarkbinstead we suggest that a subsequent tag be pushed to correct errors16:06
clarkb(importantly if we delete a tag from the repo then you tag some other commit with the same value any repo that hasn't undergone manual intervention will still see the old commit on that tag potentially creating very confusing problems)16:07
jpic_clarkb: indeed, that worked! thank you!16:20
clarkbjpic_: sorry for the trouble, but glad you got it sorted out. We've made one bug fix to gerrit to address one aspect of this, but there are a number of assumptions on the ID provider and gerrit sides that don't always align and working around all of them isn't always easy :/16:22
slittle1_my error, the tag is good.16:23
slittle1_I was using 'git rev-parse' to inspect the tag... forgot it was annotated16:24
clarkbgood to hear16:25
slittle1_git rev-parse <tag>^{commit} shows the correct sha16:25
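[editor's sketch] The confusion above comes from annotated tags being objects in their own right: a plain `git rev-parse <tag>` prints the id of the tag object, while `<tag>^{commit}` peels it down to the commit it points at. The toy model below only imitates git's object hashing and peeling in memory; it does not call git, and the author, tag name, and messages are made up for illustration.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Return the SHA-1 id git assigns to an object: sha1("<type> <len>\\0" + body)."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def peel_to_commit(objects: dict, oid: str) -> str:
    """Follow annotated-tag objects until a commit is reached (like <tag>^{commit})."""
    obj_type, body = objects[oid]
    while obj_type == "tag":
        # The first line of a tag object's body is "object <id>".
        first_line = body.split(b"\n", 1)[0]
        oid = first_line.split(b" ", 1)[1].decode()
        obj_type, body = objects[oid]
    if obj_type != "commit":
        raise ValueError(f"{oid} does not peel to a commit")
    return oid


# Hypothetical commit pointing at git's well-known empty tree.
commit_body = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author Example <example@example.org> 1641571200 +0000\n"
    b"committer Example <example@example.org> 1641571200 +0000\n"
    b"\n"
    b"Initial commit\n"
)
commit_id = git_object_id("commit", commit_body)

# Hypothetical annotated tag object that references the commit above.
tag_body = (
    f"object {commit_id}\n".encode()
    + b"type commit\n"
    + b"tag v1.0.0\n"
    + b"tagger Example <example@example.org> 1641571200 +0000\n"
    + b"\n"
    + b"Release 1.0.0\n"
)
tag_id = git_object_id("tag", tag_body)

objects = {commit_id: ("commit", commit_body), tag_id: ("tag", tag_body)}

print("rev-parse <tag>          ->", tag_id)
print("rev-parse <tag>^{commit} ->", peel_to_commit(objects, tag_id))
```

A lightweight tag would have no tag object at all, so both forms would print the same id.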
*** dviroel|lunch is now known as dviroel16:33
clarkbtimburke: just following up on yesterday, should I go ahead and delete the held node? I think yes?17:15
timburkeoh yeah! thanks again for the help17:15
clarkbnow to load ssh keys17:17
timburkefwiw i'm working on writing up a bug for eventlet about it -- i think it was caused by (i noticed there's been an uptick since moving from 0.32.0 -> 0.33.0)17:17
clarkbok hold should be cleaning itself up now17:23
dtantsurhi folks! has there been any demand for non-cirros images in the CI? we in Ironic need something that has a real grub.17:53
dtantsurI'm currently looking into making something out of but any ideas are welcome17:53
clarkbdtantsur: it's been discussed in the past, but not something that we would do directly I don't think. sean mooney was looking at an alpine image at one time17:54
dtantsuralpine is a good idea as well17:55
clarkbThe upside to alpine is it was designed for this use case (it existed before containers and is meant for small embedded devices and other places where size is important)17:55
clarkbNova used openwrt at one time and even has an image committed to its repo17:55
clarkb(openstack trivia time!)17:56
dtantsuroh fun :D17:56
clarkbI think that is the other place I would look for ideas or existing alternatives. Embedded distros like alpine/openwrt/etc17:56
dtantsura very good idea, thank you!17:57
dtantsurif we find something small, would it be reasonable to cache it on infra nodes?17:57
clarkbya I think something in the size range of cirros or maybe even a little bigger would be reasonable.17:57
clarkbEspecially if we can get away with not needing 5 versions :)17:57
dtantsurmmm, alpine offers ISOs for download.. and a root filesystem without kernel/bootloader17:58
clarkbya this is the main issue, iirc. So many things are stuck in 1999 with their cdroms17:58
clarkbLooks like debian .xz's their cloud images which significantly reduces their size. Another potential option is that we start with a reasonable small image and then xz it and cache that17:59
clarkbwould depend on what (de)compression time is like18:00
clarkb and xz that. /me tries locally18:01
clarkbor do something similar with dib18:02
clarkbthe upside to using a published image is it is easier for other people to reproduce18:02
fungiit's too bad the emdebian effort imploded some years back. debian blend focused on very small footprint use cases18:03
dtantsur hmm18:04
clarkbmy xz of that ubuntu image is not done so not very quick. I'm running it under time so will have timing data when it completes18:05
clarkbhrm it only compressed a few MB too18:06
clarkbI wonder what flags debian is using to get better compression. Or maybe debian's .img isn't compressed already18:06
dtantsurqcow2 is already pretty compressed18:06
dtantsur(for ubuntu)18:06
clarkbdtantsur: ya but debian's images go from 242MB to 151MB
clarkbbut maybe they aren't compressing their images upfront and ubuntu is18:07
dtantsurthe actual OS root partition is a bit larger for ubuntu:
clarkbthis is a mystery I could probably spend all day digging into but not sure there is enough value in that :) but it is curious18:12
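[editor's sketch] One way to put numbers on the xz question above is to time compression and decompression and compare sizes. The snippet below is a rough illustration using Python's standard `lzma` module; the `xz_report` helper is hypothetical, and the two buffers are stand-ins rather than real images: zero-filled bytes model a sparse raw image, and random bytes model a payload that is already compressed (as dtantsur suggests for the ubuntu qcow2).

```python
import lzma
import os
import time


def xz_report(data: bytes, preset: int = 6) -> dict:
    """Compress and decompress data with xz, returning sizes and timings."""
    start = time.perf_counter()
    packed = lzma.compress(data, format=lzma.FORMAT_XZ, preset=preset)
    compress_s = time.perf_counter() - start

    start = time.perf_counter()
    unpacked = lzma.decompress(packed)
    decompress_s = time.perf_counter() - start

    if unpacked != data:
        raise ValueError("xz round trip did not reproduce the input")

    return {
        "original_bytes": len(data),
        "xz_bytes": len(packed),
        "ratio": len(packed) / len(data),
        "compress_s": compress_s,
        "decompress_s": decompress_s,
    }


# Stand-in payloads (assumptions, not real cloud images).
raw_like = b"\0" * (1 << 20)            # highly compressible, like sparse disk space
compressed_like = os.urandom(1 << 20)   # incompressible, like already-compressed data

for name, payload in (("raw-like", raw_like), ("compressed-like", compressed_like)):
    report = xz_report(payload)
    print(name, report)
```

If an image's blocks are already compressed, the second case is the closer model, which would be consistent with xz saving only a few MB.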
dtantsurno worries, I'll keep experimenting with different things18:13
dtantsurthe actual file content for debian is 600M, meh..18:13
dtantsurworst case, we will download and process the image in the jobs that need it18:14
fungii think it would also be possible to include those images on our ci mirrors instead of in our node images, should that prove preferable18:15
fungiwe probably already cache larger docker images for jobs18:15
fungibut either way, the smaller the better18:16
dtantsurfor the record, we (used to?) have a job that does some conversion of a centos 7 image:
clarkbfungi: good point. We directly mirrored the fedora atomic images for magnum18:16
clarkbdtantsur: ime the centos images are very large18:16
clarkbactually I bet dib's container image thing could be useful here18:17
dtantsurmetalsmith has a low change rate so we could do that. probably a bit too much for ironic.18:17
clarkbwe might be able to use it to make an alpine image with grub18:17
dtantsurI need to catch up with the DIB's container image thing. Are there docs?18:17
clarkbbasically instead of starting a chroot with a tool like debootstrap or yum it grabs the distro's container image and unpacks that to disk and chroots into that18:17
clarkblet me see if I can find docs for it18:18
clarkbbut in theory you could do that with alpine to get the alpine image on the chroot without grub and a kernel. Then use alpine's package manager to install those extra bits18:18
clarkb hrm seems it might be more of a building block for eg the fedora element right now18:19
clarkbdtantsur: latest fedora image builds use it though so the fedora element may serve as a good example18:19
dtantsurcool, thank you! I think Fedora will work for us just as well, if the image is not large. We really only need it to be able to boot and run cloud-init.18:20
dtantsur(or glean)18:20
clarkbfungi: rax email says afsdb03 had a sad. it is up and running now and bos status for it shows it is happy. I think we're fine18:21
clarkbdtantsur: one issue with both of those is they rely on python which bloats images18:21
clarkbthis is one thing that makes cirros small, it has a simple init system (on top of all the other minimization techniques used)18:22
dtantsurdefinitely :( was it mordred who tried to rewrite glean in rust? :)18:22
clarkbhe started it, but I think the effort stalled. I poked at it a bit when learning some rust. The idea isn't terrible, the problem is more the amount of weird edge cases for every little distro difference that all have to be encoded before it is really viable18:22
clarkbbasically doable but needs effort18:23
dtantsurwell, Python is unavoidable in RH systems because of DNF18:23
dtantsurDebian may be somewhat better, but what to do with cloud-init..18:23
clarkbI think I noticed that it uses a number of outdated/replaced libraries for stuff like json/yaml too. But updating that should be straightforward. The bigger issues are related to getting the behavior aligned because each distro is different18:23
dtantsurI think more of them use either NetworkManager or systemd-networkwhatever?18:24
clarkbRHEL/Centos stream/Fedora are all networkmanager, except for centos < 8. Ubuntu uses netplan but glean configures it using the debian /etc/network/interfaces still. Gentoo and suse are systemd iirc18:25
dtantsurokay, there is enough diversity :)18:26
dtantsurfor our case, I guess, I could write a terrible bash script using jq that just sets up SSH keys and maybe basic networking18:26
dtantsurbut that won't work for anyone else18:26
clarkbthat is what cirros does I think18:26
clarkbit assumes dhcp and configures the ssh key. But not much else18:27
clarkbIt might do static config too of the network18:27
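[editor's sketch] The minimal "set up SSH keys" step discussed above could look roughly like the following. This is not the cirros implementation and not the bash/jq script dtantsur has in mind; it is a Python illustration that assumes the instance metadata is an OpenStack-style `meta_data.json` document with a `public_keys` mapping of key name to key text. The function names and the sample key are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def public_keys_from_metadata(meta_json: str) -> List[str]:
    """Extract SSH public keys from a meta_data.json document (assumed layout)."""
    meta = json.loads(meta_json)
    keys = meta.get("public_keys") or {}
    return [value.strip() for value in keys.values() if value and value.strip()]


def install_authorized_keys(keys: List[str], ssh_dir: Path) -> Path:
    """Append keys that are not already present to ssh_dir/authorized_keys."""
    ssh_dir.mkdir(mode=0o700, parents=True, exist_ok=True)
    target = ssh_dir / "authorized_keys"
    lines = target.read_text().splitlines() if target.exists() else []
    for key in keys:
        if key not in lines:
            lines.append(key)
    target.write_text("".join(line + "\n" for line in lines))
    target.chmod(0o600)
    return target


# Hypothetical metadata; a real job would read it from the config drive or
# metadata service instead.
sample = json.dumps({"public_keys": {"ci-key": "ssh-ed25519 AAAAexample ci@example\n"}})

with tempfile.TemporaryDirectory() as tmp:
    path = install_authorized_keys(public_keys_from_metadata(sample), Path(tmp) / ".ssh")
    print(path.read_text(), end="")
```

Network setup (DHCP or static) is deliberately left out, since that is where the per-distro differences mentioned above come in.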
dtantsuryeah, I remember diving into its bash scripts.. not the best memories18:27
dtantsuranyway. I've got enough food for thought, now time for literal food! have a good weekend o/18:30
clarkbyou too!18:31
*** rlandy|ruck is now known as rlandy|ruck|biab18:34
*** rlandy|ruck|biab is now known as rlandy|ruck19:13
*** dviroel is now known as dviroel|afk19:37
fungiupdate on the expiring certs for a few mirrors, looks like /var/log/ansible/letsencrypt.yaml.log hasn't been touched on bridge.o.o since monday, so something is probably blocking the job from running/succeeding21:38
fungii'll continue looking after dinner21:38
clarkbin the past that happened when we broke our actual zuul config so jobs didn't run21:39
fungiso it looks like infra-prod-base is failing22:34
fungibuild history says it last ran successfully on monday22:35
clarkbhopefully the logs for it indicate why it is failing22:35
fungifatal: []: UNREACHABLE!22:35
clarkbI wonder if it needs a hard reboot22:35
fungiyeah, checking on it now22:35
clarkband if that doesn't work we can put it in emergency and email kevinz_ about it22:36
fungiit responds to ping at least22:36
fungiconnection reset on 22/tcp though22:36
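[editor's sketch] The distinction fungi draws here (host answers ping, but 22/tcp resets) can be checked with a small TCP probe. The helper below is a hypothetical illustration built on Python's standard `socket` module, not part of any OpenDev tooling; the status strings are made up, and the host in the usage note is a placeholder.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify how a TCP port responds: banner, open-silent, reset, refused, ..."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            try:
                # An SSH server normally sends its version banner first.
                data = sock.recv(64)
            except socket.timeout:
                return "open-silent"
            except ConnectionResetError:
                return "reset"
            return "banner" if data else "closed-by-peer"
    except ConnectionRefusedError:
        return "refused"
    except ConnectionResetError:
        return "reset"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "unreachable"


if __name__ == "__main__":
    # Placeholder target; substitute the host being debugged.
    print(probe_tcp("127.0.0.1", 22, timeout=1.0))
```

A "reset" or "closed-by-peer" result with working ping would match the symptom described here: the network path is fine, but sshd (or something in front of it) is dropping the session.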
clarkbIn the past I had problems with it where I could even ssh in but it had a consistently high system load and that broke ssh timeouts for ansible22:36
clarkband a reboot fixed that22:36
fungi`openstack console log show` isn't responding to me22:37
fungino, it was just the name resolution problem in the container version22:39
fungirunning a venv osc i can see some dracut and systemd errors on the console22:39
fungithough they look like the usual dmesg spam caused by dib22:40
fungirebooting it now22:40
fungithere was nothing obvious on the console to explain the connection resets22:41
fungihrm, reboot returned an opaque 5xx error22:41
fungiserver show says the server is running/active though22:42
clarkbwas that a normal reboot or a hard reboot? I don't know that I've gotten 500 errors from either in the past but the normal acpi reboots are often ineffective22:42
fungihard, but trying normal i also get the same "Unknown Error (HTTP 504)"22:43
clarkbmaybe put it in the emergency file for now and send kevinz_  an email?22:44
clarkbif the APIs are failing then there isn't much we can do I don't think22:44
fungiadded to the emergency disable list22:50
