*** ysandeep|out is now known as ysandeep | 02:15 | |
*** mazzy5098812929 is now known as mazzy509881292 | 02:18 | |
*** rlandy|ruck is now known as rlandy|out | 05:39 | |
*** ysandeep is now known as ysandeep|afk | 06:40 | |
*** ysandeep|afk is now known as ysandeep | 09:38 | |
*** rlandy|out is now known as rlandy|ruck | 11:15 | |
*** dviroel|afk is now known as dviroel | 11:25 | |
opendevreview | Gustavo Sanchez proposed openstack/project-config master: Add the cinder-solidfire charm to Openstack charms https://review.opendev.org/c/openstack/project-config/+/823803 | 14:00 |
---|---|---|
opendevreview | Gustavo Sanchez proposed openstack/project-config master: Add the cinder-solidfire charm to Openstack charms https://review.opendev.org/c/openstack/project-config/+/823803 | 14:08 |
fungi | so looks like the letsencrypt certs which failed to renew are for the two vexxhost mirrors and the limestone mirror, i'll try to look through relevant logs in a bit | 14:24 |
*** ysandeep is now known as ysandeep|dinner | 14:45 | |
*** ysandeep|dinner is now known as ysandeep | 15:10 | |
*** dviroel is now known as dviroel|lunch | 15:37 | |
*** ysandeep is now known as ysandeep|out | 15:49 | |
slittle1_ | need help to delete a tag pushed in error | 15:58 |
clarkb | slittle1_: we typically don't delete tags as they cannot be removed from downstream repositories that have already pulled the tag | 16:06 |
clarkb | instead we suggest that a subsequent tag be pushed to correct errors | 16:06 |
clarkb | (importantly if we delete a tag from the repo then you tag some other commit with the same value any repo that hasn't undergone manual intervention will still see the old commit on that tag potentially creating very confusing problems) | 16:07 |
jpic_ | clarkb: indeed, that worked! thank you! | 16:20 |
clarkb | jpic_: sorry for the trouble, but glad you got it sorted out. We've made one bug fix to gerrit to address one aspect of this, but there are a number of assumptions on the ID provider and gerrit sides that don't always align and working around all of them isn't always easy :/ | 16:22 |
slittle1_ | my error, the tag is good. | 16:23 |
slittle1_ | I was using 'git rev-parse' to inspect the tag... forgot it was annotated | 16:24 |
clarkb | good to hear | 16:25 |
slittle1_ | git rev-parse <tag>^{commit} shows the correct sha | 16:25 |
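The mix-up above comes from annotated tags being their own git objects: a plain `git rev-parse <tag>` prints the hash of the tag object, while `<tag>^{commit}` dereferences it to the commit it points at. A minimal sketch of wrapping that lookup follows; the helper names `peel_tag_command` and `resolve_tag_commit` are hypothetical and not from the discussion, and it assumes `git` is on the PATH.

```python
import subprocess


def peel_tag_command(tag):
    """Build the git command that resolves a tag to the commit it points at."""
    # "<tag>^{commit}" asks git to dereference (peel) the tag until it reaches a commit,
    # so annotated and lightweight tags both yield a commit hash.
    return ["git", "rev-parse", "--verify", f"{tag}^{{commit}}"]


def resolve_tag_commit(tag, repo_dir="."):
    """Run the command in repo_dir and return the commit hash as a string."""
    result = subprocess.run(
        peel_tag_command(tag),
        cwd=repo_dir,
        check=True,           # raise CalledProcessError if the tag does not exist
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

Calling `resolve_tag_commit("<tag>")` inside a checkout should return the same hash that `git rev-parse <tag>^{commit}` prints.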
*** dviroel|lunch is now known as dviroel | 16:33 | |
clarkb | timburke: just following up on yesterday, should I go ahead and delete the held node? I think yes? | 17:15 |
timburke | oh yeah! thanks again for the help | 17:15 |
clarkb | now to load ssh keys | 17:17 |
timburke | fwiw i'm working on writing up a bug for eventlet about it -- i think it was caused by https://github.com/eventlet/eventlet/commit/3c25d0c (i noticed there's been an uptick since moving from 0.32.0 -> 0.33.0) | 17:17 |
clarkb | ok hold should be cleaning itself up now | 17:23 |
dtantsur | hi folks! has there been any demand for non-cirros images in the CI? we in Ironic need something that has a real grub. | 17:53 |
dtantsur | I'm currently looking into making something out of https://cloud-images.ubuntu.com/minimal/releases/focal/release/ but any ideas are welcome | 17:53 |
clarkb | dtantsur: it's been discussed in the past, but not something that we would do directly I don't think. sean mooney was looking at an alpine image at one time | 17:54 |
dtantsur | alpine is a good idea as well | 17:55 |
clarkb | The upside to alpine is it was designed for this use case (it existed before containers and is meant for small embedded devices and other places where size is important) | 17:55 |
clarkb | Nova used openwrt at one time and even has an image committed to its repo | 17:55 |
clarkb | (openstack trivia time!) | 17:56 |
dtantsur | oh fun :D | 17:56 |
clarkb | I think that is the other place I would look for ideas or existing alternatives. Embedded distros like alpine/openwrt/etc | 17:56 |
dtantsur | a very good idea, thank you! | 17:57 |
dtantsur | if we find something small, would it be reasonable to cache it on infra nodes? | 17:57 |
clarkb | ya I think something in the size range of cirros or maybe even a little bigger would be reasonable. | 17:57 |
clarkb | Especially if we can get away with not needing 5 versions :) | 17:57 |
dtantsur | heh | 17:58 |
dtantsur | mmm, alpine offers ISOs for download.. and a root filesystem without kernel/bootloader | 17:58 |
clarkb | ya this is the main issue, iirc. So many things are stuck in 1999 with their cdroms | 17:58 |
dtantsur | :( | 17:58 |
clarkb | Looks like debian .xz's their cloud images which significantly reduces their size. Another potential option is that we start with a reasonable small image and then xz it and cache that | 17:59 |
clarkb | would depend on what (de)compression time is like | 18:00 |
clarkb | https://cloud-images.ubuntu.com/minimal/releases/focal/release/ubuntu-20.04-minimal-cloudimg-amd64.img and xz that. /me tries locally | 18:01 |
clarkb | or do something similar with dib | 18:02 |
clarkb | the upside to using a published image is it is easier for other people to reproduce | 18:02 |
fungi | it's too bad the emdebian effort imploded some years back. debian blend focused on very small footprint use cases | 18:03 |
dtantsur | https://github.com/dermotbradley/create-alpine-disk-image hmm | 18:04 |
clarkb | my xz of that ubuntu image is not done so not very quick. I'm running it under time so will have timing data when it completes | 18:05 |
clarkb | real 1m49.338s | 18:06 |
clarkb | hrm it only compressed a few MB too | 18:06 |
clarkb | I wonder what flags debian is using to get better compression. Or maybe debian's .img isn't compressed already | 18:06 |
dtantsur | qcow2 is already pretty compressed | 18:06 |
dtantsur | (for ubuntu) | 18:06 |
clarkb | dtantsur: ya but debian's images go from 242MB to 151MB https://cloud.debian.org/images/cloud/bullseye/latest/ | 18:07 |
clarkb | but maybe they aren't compressing their images upfront and ubuntu is | 18:07 |
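The hypothesis here (xz gains little on an image that is already compressed, and a lot on one that is not) can be illustrated without downloading anything. The sketch below is only a stand-in experiment, not a measurement of the Ubuntu or Debian images: random bytes play the role of an already-compressed payload and a run of zeros plays the role of sparse, uncompressed disk space. It uses Python's standard `lzma` module, which writes the xz container format by default.

```python
import lzma
import os
import time


def xz_ratio(data):
    """Return compressed size divided by original size for one xz (LZMA) pass."""
    return len(lzma.compress(data)) / len(data)


# Stand-ins, not real images: random bytes behave like already-compressed data,
# while zeros behave like empty space in an uncompressed raw image.
already_compressed = os.urandom(1024 * 1024)
sparse_raw = bytes(1024 * 1024)

start = time.perf_counter()
ratio_compressed = xz_ratio(already_compressed)
ratio_raw = xz_ratio(sparse_raw)
elapsed = time.perf_counter() - start

print(f"already-compressed stand-in ratio: {ratio_compressed:.3f}")
print(f"sparse raw stand-in ratio: {ratio_raw:.3f}")
print(f"elapsed: {elapsed:.2f}s")
```

If a real image behaves like the first case, caching an xz copy would cost decompression time in every job for very little saved space.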
dtantsur | interesting | 18:08 |
dtantsur | the actual OS root partition is a bit larger for ubuntu: https://paste.opendev.org/show/811969/ | 18:10 |
clarkb | this is a mystery I could probably spend all day digging into but not sure there is enough value in that :) but it is curious | 18:12 |
dtantsur | no worries, I'll keep experimenting with different things | 18:13 |
dtantsur | the actual file content for debian is 600M, meh.. | 18:13 |
dtantsur | worst case, we will download and process the image in the jobs that need it | 18:14 |
fungi | i think it would also be possible to include those images on our ci mirrors instead of in our node images, should that prove preferable | 18:15 |
fungi | we probably already cache larger docker images for jobs | 18:15 |
dtantsur | fair | 18:15 |
fungi | but either way, the smaller the better | 18:16 |
dtantsur | for the record, we (used to?) have a job that does some conversion of a centos 7 image: https://opendev.org/openstack/metalsmith/src/branch/master/playbooks/integration/centos-image.yaml | 18:16 |
clarkb | fungi: good point. We directly mirrored the fedora atomic images for magnum | 18:16 |
clarkb | dtantsur: ime the centos images are very large | 18:16 |
dtantsur | yep | 18:16 |
clarkb | actually I bet dib's container image thing could be useful here | 18:17 |
dtantsur | metalsmith has a low change rate so we could do that. probably a bit too much for ironic. | 18:17 |
clarkb | we might be able to use it to make an alpine image with grub | 18:17 |
dtantsur | I need to catch up with the DIB's container image thing. Are there docs? | 18:17 |
clarkb | basically instead of starting a chroot with a tool like debootstrap or yum it grabs the distro's container image and unpacks that to disk and chroots into that | 18:17 |
clarkb | let me see if I can find docs for it | 18:18 |
clarkb | but in theory you could do that with alpine to get the alpine image on the chroot without grub and a kernel. Then use alpine's package manager to install those extra bits | 18:18 |
clarkb | https://opendev.org/openstack/diskimage-builder/src/branch/master/diskimage_builder/elements/containerfile hrm seems it might be more of a building block for eg the fedora element right now | 18:19 |
clarkb | dtantsur: latest fedora image builds use it though so the fedora element may serve as a good example | 18:19 |
dtantsur | cool, thank you! I think Fedora will work for us just as well, if the image is not large. We really only need it to be able to boot and run cloud-init. | 18:20 |
dtantsur | (or glean) | 18:20 |
clarkb | fungi: rax email says afsdb03 had a sad. it is up and running now and bos status for it shows it is happy. I think we're fine | 18:21 |
clarkb | dtantsur: one issue with both of those is they rely on python which bloats images | 18:21 |
clarkb | this is one thing that makes cirros small, it has a simple init system (on top of all the other minimization techniques used) | 18:22 |
dtantsur | definitely :( was it mordred who tried to rewrite glean in rust? :) | 18:22 |
clarkb | he started it, but I think the effort stalled. I poked at it a bit when learning some rust. The idea isn't terrible, the problem is more the amount of weird edge cases for every little distro difference that all have to be encoded before it is really viable | 18:22 |
clarkb | basically doable but needs effort | 18:23 |
dtantsur | yeah | 18:23 |
dtantsur | well, Python is unavoidable in RH systems because of DNF | 18:23 |
dtantsur | Debian may be somewhat better, but what to do with cloud-init.. | 18:23 |
clarkb | I think I noticed that it uses a number of outdated/replaced libraries for stuff like json/yaml too. But updating that should be straightforward. The bigger issues are related to getting the behavior aligned because each distro is different | 18:23 |
dtantsur | I think more of them use either NetworkManager or systemd-networkwhatever? | 18:24 |
dtantsur | nowadays? | 18:24 |
clarkb | RHEL/Centos stream/Fedora are all networkmanager, except for centos < 8. Ubuntu uses netplan but glean configures it using the debian /etc/network/interfaces still. Gentoo and suse are systemd iirc | 18:25 |
dtantsur | okay, there is enough diversity :) | 18:26 |
dtantsur | for our case, I guess, I could write a terrible bash script using jq that just sets up SSH keys and maybe basic networking | 18:26 |
dtantsur | but that won't work for anyone else | 18:26 |
clarkb | that is what cirros does I think | 18:26 |
clarkb | it assumes dhcp and configures the ssh key. But not much else | 18:27 |
clarkb | It might do static config too of the network | 18:27 |
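The "just set up SSH keys" idea can be sketched in a few lines. This is a hypothetical illustration in Python for readability only; the discussion's point is that a small image would likely do this in shell to avoid shipping Python, and this is not how cirros or glean actually implement it. It assumes a config-drive style `meta_data.json` whose `public_keys` entry maps key names to key text; the function name and all paths are made up for the demo.

```python
import json
import os
import tempfile


def install_ssh_keys(metadata_path, authorized_keys_path):
    """Copy every public key listed in the metadata file into authorized_keys."""
    with open(metadata_path) as handle:
        metadata = json.load(handle)
    # Assumed layout: "public_keys" maps a key name to the key text.
    keys = sorted(set((metadata.get("public_keys") or {}).values()))
    parent = os.path.dirname(authorized_keys_path)
    if parent:
        os.makedirs(parent, mode=0o700, exist_ok=True)
    with open(authorized_keys_path, "w") as handle:
        for key in keys:
            handle.write(key.strip() + "\n")
    os.chmod(authorized_keys_path, 0o600)  # sshd rejects overly permissive key files
    return len(keys)


# Demo against throwaway files; the key below is a placeholder, not a real key.
with tempfile.TemporaryDirectory() as workdir:
    metadata_file = os.path.join(workdir, "meta_data.json")
    with open(metadata_file, "w") as handle:
        json.dump({"public_keys": {"demo": "ssh-ed25519 AAAAPLACEHOLDER demo@example"}}, handle)
    target = os.path.join(workdir, "home", ".ssh", "authorized_keys")
    installed = install_ssh_keys(metadata_file, target)
    with open(target) as handle:
        written = handle.read()

print(installed, written.strip())
```

Networking is left out on purpose: as noted above, that is where the per-distro differences make a generic tool hard.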
dtantsur | yeah, I remember diving into its bash scripts.. not the best memories | 18:27 |
dtantsur | anyway. I've got enough food for thought, now time for literal food! have a good weekend o/ | 18:30 |
clarkb | you too! | 18:31 |
*** rlandy|ruck is now known as rlandy|ruck|biab | 18:34 | |
*** rlandy|ruck|biab is now known as rlandy|ruck | 19:13 | |
*** dviroel is now known as dviroel|afk | 19:37 | |
fungi | update on the expiring certs for a few mirrors, looks like /var/log/ansible/letsencrypt.yaml.log hasn't been touched on bridge.o.o since monday, so something is probably blocking the job from running/succeeding | 21:38 |
fungi | i'll continue looking after dinner | 21:38 |
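The check described here (noticing that a log file has not been touched for days) is easy to automate. The following is a minimal sketch under stated assumptions: the helper names and the one-day threshold are hypothetical, and the demo runs against a temporary file rather than the real log on the bridge host.

```python
import os
import tempfile
import time


def hours_since_modified(path, now=None):
    """Return how many hours ago the file at path was last modified."""
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) / 3600.0


def is_stale(path, max_age_hours=24.0, now=None):
    """Return True when the file is older than max_age_hours."""
    return hours_since_modified(path, now) > max_age_hours


# Demo: create a throwaway file and backdate its mtime by four days.
with tempfile.TemporaryDirectory() as workdir:
    log_path = os.path.join(workdir, "letsencrypt.yaml.log")
    open(log_path, "w").close()
    four_days_ago = time.time() - 4 * 24 * 3600
    os.utime(log_path, (four_days_ago, four_days_ago))
    stale_after_one_day = is_stale(log_path, max_age_hours=24)
    stale_after_one_week = is_stale(log_path, max_age_hours=24 * 7)

print(stale_after_one_day, stale_after_one_week)
```

A stale log only says the job has not written output recently; finding out why still means reading the build history.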
clarkb | in the past that happened when we broke our actual zuul config so jobs didn't run | 21:39 |
fungi | so it looks like infra-prod-base is failing | 22:34 |
fungi | build history says it last ran successfully on monday | 22:35 |
clarkb | hopefully the logs for it indicate why it is failing | 22:35 |
fungi | fatal: [nb03.opendev.org]: UNREACHABLE! | 22:35 |
clarkb | I wonder if it needs a hard reboot | 22:35 |
fungi | yeah, checking on it now | 22:35 |
clarkb | and if that doesn't work we can put it in emergency and email kevinz_ about it | 22:36 |
fungi | it responds to ping at least | 22:36 |
fungi | connection reset on 22/tcp though | 22:36 |
clarkb | In the past I had problems with it where I could even ssh in but it had a consistently high system load and that broke ssh timeouts for ansible | 22:36 |
clarkb | and a reboot fixed that | 22:36 |
fungi | `openstack console log show` isn't responding to me | 22:37 |
fungi | no, it was just the name resolution problem in the container version | 22:39 |
fungi | running a venv osc i can see some dracut and systemd errors on the console | 22:39 |
fungi | though they look like the usual dmesg spam caused by dib | 22:40 |
fungi | rebooting it now | 22:40 |
fungi | there was nothing obvious on the console to explain the connection resets | 22:41 |
fungi | hrm, reboot returned an opaque 5xx error | 22:41 |
fungi | server show says the server is running/active though | 22:42 |
clarkb | was that a normal reboot or a hard reboot? I don't know that I've gotten 500 errors from either in the past but the normal acpi reboots are often ineffective | 22:42 |
fungi | hard, but trying normal i also get the same "Unknown Error (HTTP 504)" | 22:43 |
clarkb | maybe put it in the emergency file for now and send kevinz_ an email? | 22:44 |
clarkb | if the APIs are failing then there isn't much we can do I don't think | 22:44 |
fungi | added to the emergency disable list | 22:50 |