Friday, 2022-01-07

*** ysandeep|out is now known as ysandeep		02:15
*** mazzy5098812929 is now known as mazzy509881292		02:18
*** rlandy|ruck is now known as rlandy|out		05:39
*** ysandeep is now known as ysandeep|afk		06:40
*** ysandeep|afk is now known as ysandeep		09:38
*** rlandy|out is now known as rlandy|ruck		11:15
*** dviroel|afk is now known as dviroel		11:25
opendevreview	Gustavo Sanchez proposed openstack/project-config master: Add the cinder-solidfire charm to Openstack charms https://review.opendev.org/c/openstack/project-config/+/823803	14:00
opendevreview	Gustavo Sanchez proposed openstack/project-config master: Add the cinder-solidfire charm to Openstack charms https://review.opendev.org/c/openstack/project-config/+/823803	14:08
fungi	so looks like the letsencrypt certs which failed to renew are for the two vexxhost mirrors and the limestone mirror, i'll try to look through relevant logs in a bit	14:24
*** ysandeep is now known as ysandeep|dinner		14:45
*** ysandeep|dinner is now known as ysandeep		15:10
*** dviroel is now known as dviroel|lunch		15:37
*** ysandeep is now known as ysandeep|out		15:49
slittle1_	need help to delete a tag pushed in error	15:58
clarkb	slittle1_: we typically don't delete tags as they cannot be removed from downstream repositories that have already pulled the tag	16:06
clarkb	instead we suggest that a subsequent tag be pushed to correct errors	16:06
clarkb	(importantly if we delete a tag from the repo then you tag some other commit with the same value any repo that hasn't undergone manual intervention will still see the old commit on that tag potentially creating very confusing problems)	16:07
jpic_	clarkb: indeed, that worked! thank you!	16:20
clarkb	jpic_: sorry for the trouble, but glad you got it sorted out. We've made one bug fix to gerrit to address one aspect of this, but there are a number of assumptions on the ID provider and gerrit sides that don't always align and working around all of them isn't always easy :/	16:22
slittle1_	my error, the tag is good.	16:23
slittle1_	I was using 'git rev-parse' to inspect the tag... forgot it was annotated	16:24
clarkb	good to hear	16:25
slittle1_	git rev-parse <tag>^{commit} shows the correct sha	16:25
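The annotated-tag confusion above can be reproduced in a throwaway repository. The following is a minimal sketch, assuming `git` is on the PATH; the helper names (`git`, `tag_shas`, `demo`), the tag name `v1.0`, and the demo identity are hypothetical and only for illustration. It shows that `git rev-parse <tag>` on an annotated tag returns the SHA of the tag object, while `<tag>^{commit}` peels it to the commit the tag points at.

```python
import os
import subprocess
import tempfile


def git(repo, *args):
    """Run a git command inside `repo` and return its stripped stdout."""
    # Hypothetical demo identity so commits work without any global git config.
    env = dict(
        os.environ,
        GIT_AUTHOR_NAME="demo", GIT_AUTHOR_EMAIL="demo@example.com",
        GIT_COMMITTER_NAME="demo", GIT_COMMITTER_EMAIL="demo@example.com",
    )
    cmd = ["git", "-c", "commit.gpgsign=false", "-c", "tag.gpgsign=false",
           "-C", repo, *args]
    result = subprocess.run(cmd, env=env, check=True,
                            capture_output=True, text=True)
    return result.stdout.strip()


def tag_shas(repo, tag):
    """Return the object type of `tag`, its own SHA, and the peeled commit SHA."""
    return {
        "type": git(repo, "cat-file", "-t", tag),            # "tag" if annotated
        "tag_object": git(repo, "rev-parse", tag),           # SHA of the tag object
        "commit": git(repo, "rev-parse", tag + "^{commit}"), # SHA of the tagged commit
    }


def demo():
    """Create a temporary repo with one commit and one annotated tag."""
    with tempfile.TemporaryDirectory() as repo:
        git(repo, "init", "--quiet")
        git(repo, "commit", "--quiet", "--allow-empty", "-m", "initial commit")
        git(repo, "tag", "-a", "v1.0", "-m", "annotated tag")
        head = git(repo, "rev-parse", "HEAD")
        return head, tag_shas(repo, "v1.0")


if __name__ == "__main__":
    head, shas = demo()
    print("HEAD:      ", head)
    print("tag object:", shas["tag_object"])
    print("peeled:    ", shas["commit"])
```

For a lightweight (non-annotated) tag both values would be the same commit SHA, which is why the difference is easy to forget.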
*** dviroel|lunch is now known as dviroel		16:33
clarkb	timburke: just following up on yesterday, should I go ahead and delete the held node? I think yes?	17:15
timburke	oh yeah! thanks again for the help	17:15
clarkb	now to load ssh keys	17:17
timburke	fwiw i'm working on writing up a bug for eventlet about it -- i think it was caused by https://github.com/eventlet/eventlet/commit/3c25d0c (i noticed there's been an uptick since moving from 0.32.0 -> 0.33.0)	17:17
clarkb	ok hold should be cleaning itself up now	17:23
dtantsur	hi folks! has there been any demand for non-cirros images in the CI? we in Ironic need something that has a real grub.	17:53
dtantsur	I'm currently looking into making something out of https://cloud-images.ubuntu.com/minimal/releases/focal/release/ but any ideas are welcome	17:53
clarkb	dtantsur: it's been discussed in the past, but not something that we would do directly I don't think. sean mooney was looking at an alpine image at one time	17:54
dtantsur	alpine is a good idea as well	17:55
clarkb	The upside to alpine is it was designed for this use case (it existed before containers and is meant for small embedded devices and other places where size is important)	17:55
clarkb	Nova used openwrt at one time and even has an image committed to its repo	17:55
clarkb	(openstack trivia time!)	17:56
dtantsur	oh fun :D	17:56
clarkb	I think that is the other place I would look for ideas or existing alternatives. Embedded distros like alpine/openwrt/etc	17:56
dtantsur	a very good idea, thank you!	17:57
dtantsur	if we find something small, would it be reasonable to cache it on infra nodes?	17:57
clarkb	ya I think something in the size range of cirros or maybe even a little bigger would be reasonable.	17:57
clarkb	Especially if we can get away with not needing 5 versions :)	17:57
dtantsur	heh	17:58
dtantsur	mmm, alpine offers ISOs for download.. and a root filesystem without kernel/bootloader	17:58
clarkb	ya this is the main issue, iirc. So many things are stuck in 1999 with their cdroms	17:58
dtantsur	:(	17:58
clarkb	Looks like debian .xz's their cloud images which significantly reduces their size. Another potential option is that we start with a reasonable small image and then xz it and cache that	17:59
clarkb	would depend on what (de)compression time is like	18:00
clarkb	https://cloud-images.ubuntu.com/minimal/releases/focal/release/ubuntu-20.04-minimal-cloudimg-amd64.img and xz that. /me tries locally	18:01
clarkb	or do something similar with dib	18:02
clarkb	the upside to using a published image is it is easier for other people to reproduce	18:02
fungi	it's too bad the emdebian effort imploded some years back. debian blend focused on very small footprint use cases	18:03
dtantsur	https://github.com/dermotbradley/create-alpine-disk-image hmm	18:04
clarkb	my xz of that ubuntu image is not done so not very quick. I'm running it under time so will have timing data when it completes	18:05
clarkb	real 1m49.338s	18:06
clarkb	hrm it only compressed a few MB too	18:06
clarkb	I wonder what flags debian is using to get better compression. Or maybe debian's .img isn't compressed already	18:06
dtantsur	qcow2 is already pretty compressed	18:06
dtantsur	(for ubuntu)	18:06
clarkb	dtantsur: ya but debian's images go from 242MB to 151MB https://cloud.debian.org/images/cloud/bullseye/latest/	18:07
clarkb	but maybe they aren't compressing their images upfront and ubuntu is	18:07
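The effect being discussed (xz gaining only a few MB on an image that is already compressed) can be illustrated without downloading either image. This is a rough sketch under stated assumptions: random bytes stand in for already-compressed qcow2 content and a repetitive buffer stands in for uncompressed content. Neither is a real cloud image, and `xz_ratio` is a hypothetical helper name.

```python
import lzma
import os
import time


def xz_ratio(data):
    """Compress `data` in xz format; return (compressed/original size, seconds)."""
    start = time.perf_counter()
    compressed = lzma.compress(data, format=lzma.FORMAT_XZ)
    elapsed = time.perf_counter() - start
    return len(compressed) / len(data), elapsed


# Stand-in for an uncompressed image: highly repetitive content.
compressible = b"ironic test image block\n" * 40000
# Stand-in for an already-compressed image: random bytes have no redundancy left.
already_dense = os.urandom(len(compressible))

if __name__ == "__main__":
    for label, payload in (("repetitive", compressible),
                           ("already dense", already_dense)):
        ratio, seconds = xz_ratio(payload)
        print(f"{label}: ratio={ratio:.3f} time={seconds:.2f}s")
```

If the Debian images are published without internal compression and the Ubuntu ones with it, that alone could explain the difference, though the log does not confirm this.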
dtantsur	interesting	18:08
dtantsur	the actual OS root partition is a bit larger for ubuntu: https://paste.opendev.org/show/811969/	18:10
clarkb	this is a mystery I could probably spend all day digging into but not sure there is enough value in that :) but it is curious	18:12
dtantsur	no worries, I'll keep experimenting with different things	18:13
dtantsur	the actual file content for debian is 600M, meh..	18:13
dtantsur	worst case, we will download and process the image in the jobs that need it	18:14
fungi	i think it would also be possible to include those images on our ci mirrors instead of in our node images, should that prove preferable	18:15
fungi	we probably already cache larger docker images for jobs	18:15
dtantsur	fair	18:15
fungi	but either way, the smaller the better	18:16
dtantsur	for the record, we (used to?) have a job that does some conversion of a centos 7 image: https://opendev.org/openstack/metalsmith/src/branch/master/playbooks/integration/centos-image.yaml	18:16
clarkb	fungi: good point. We directly mirrored the fedora atomic images for magnum	18:16
clarkb	dtantsur: ime the centos images are very large	18:16
dtantsur	yep	18:16
clarkb	actually I bet dib's container image thing could be useful here	18:17
dtantsur	metalsmith has a low change rate so we could do that. probably a bit too much for ironic.	18:17
clarkb	we might be able to use it to make an alpine image with grub	18:17
dtantsur	I need to catch up with the DIB's container image thing. Are there docs?	18:17
clarkb	basically instead of starting a chroot with a tool like debootstrap or yum it grabs the distro's container image and unpacks that to disk and chroots into that	18:17
clarkb	let me see if I can find docs for it	18:18
clarkb	but in theory you could do that with alpine to get the alpine image on the chroot without grub and a kernel. Then use alpine's package manager to install those extra bits	18:18
clarkb	https://opendev.org/openstack/diskimage-builder/src/branch/master/diskimage_builder/elements/containerfile hrm seems it might be more of a building block for eg the fedora element right now	18:19
clarkb	dtantsur: latest fedora image builds use it though so the fedora element may serve as a good example	18:19
dtantsur	cool, thank you! I think Fedora will work for us just as well, if the image is not large. We really only need it to be able to boot and run cloud-init.	18:20
dtantsur	(or glean)	18:20
clarkb	fungi: rax email says afsdb03 had a sad. it is up and running now and bos status for it shows it is happy. I think we're fine	18:21
clarkb	dtantsur: one issue with both of those is they rely on python which bloats images	18:21
clarkb	this is one thing that makes cirros small, it has a simple init system (on top of all the other minimization techniques used)	18:22
dtantsur	definitely :( was it mordred who tried to rewrite glean in rust? :)	18:22
clarkb	he started it, but I think the effort stalled. I poked at it a bit when learning some rust. The idea isn't terrible, the problem is more the amount of weird edge cases for every little distro difference that all have to be encoded before it is really viable	18:22
clarkb	basically doable but needs effort	18:23
dtantsur	yeah	18:23
dtantsur	well, Python is unavoidable in RH systems because of DNF	18:23
dtantsur	Debian may be somewhat better, but what to do with cloud-init..	18:23
clarkb	I think I noticed that it uses a number of outdated/replaced libraries for stuff like json/yaml too. But updating that should be straightforward. The bigger issues are related to getting the behavior aligned because each distro is different	18:23
dtantsur	I think more of them use either NetworkManager or systemd-networkwhatever?	18:24
dtantsur	nowadays?	18:24
clarkb	RHEL/Centos stream/Fedora are all networkmanager, except for centos < 8. Ubuntu uses netplan but glean configures it using the debian /etc/network/interfaces still. Gentoo and suse are systemd iirc	18:25
dtantsur	okay, there is enough diversity :)	18:26
dtantsur	for our case, I guess, I could write a terrible bash script using jq that just sets up SSH keys and maybe basic networking	18:26
dtantsur	but that won't work for anyone else	18:26
clarkb	that is what cirros does I think	18:26
clarkb	it assumes dhcp and configures the ssh key. But not much else	18:27
clarkb	It might do static config too of the network	18:27
dtantsur	yeah, I remember diving into its bash scripts.. not the best memories	18:27
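The minimal "just set up SSH keys" approach mentioned above might look roughly like the following. It is written in Python only for readability; the discussion is about a bash and jq script, and a Python dependency is exactly what the conversation wants to avoid in the image itself. It assumes an OpenStack config-drive style `meta_data.json` whose `public_keys` field maps key names to public key strings. The function name and the paths passed to it are hypothetical.

```python
import json
import os


def install_ssh_keys(meta_data_path, authorized_keys_path):
    """Copy public keys from a meta_data.json file into an authorized_keys file.

    Assumes `public_keys` is a mapping of key name to public key string.
    Returns the number of keys written.
    """
    with open(meta_data_path) as f:
        meta = json.load(f)

    keys = [key.strip() for key in meta.get("public_keys", {}).values()]

    # The .ssh directory and the key file must not be group/world accessible.
    os.makedirs(os.path.dirname(authorized_keys_path), mode=0o700, exist_ok=True)
    with open(authorized_keys_path, "w") as f:
        for key in keys:
            f.write(key + "\n")
    os.chmod(authorized_keys_path, 0o600)
    return len(keys)
```

Networking is left out on purpose: as noted in the conversation, assuming DHCP covers the simple case, and static configuration is where the per-distro differences start to matter.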
dtantsur	anyway. I've got enough food for thought, now time for literal food! have a good weekend o/	18:30
clarkb	you too!	18:31
*** rlandy|ruck is now known as rlandy|ruck|biab		18:34
*** rlandy|ruck|biab is now known as rlandy|ruck		19:13
*** dviroel is now known as dviroel|afk		19:37
fungi	update on the expiring certs for a few mirrors, looks like /var/log/ansible/letsencrypt.yaml.log hasn't been touched on bridge.o.o since monday, so something is probably blocking the job from running/succeeding	21:38
fungi	i'll continue looking after dinner	21:38
clarkb	in the past that happened when we broke our actual zuul config so jobs didn't run	21:39
fungi	so it looks like infra-prod-base is failing	22:34
fungi	build history says it last ran successfully on monday	22:35
clarkb	hopefully the logs for it indicate why it is failing	22:35
fungi	fatal: [nb03.opendev.org]: UNREACHABLE!	22:35
clarkb	I wonder if it needs a hard reboot	22:35
fungi	yeah, checking on it now	22:35
clarkb	and if that doesn't work we can put it in emergency and email kevinz_ about it	22:36
fungi	it responds to ping at least	22:36
fungi	connection reset on 22/tcp though	22:36
clarkb	In the past I had problems with it where I could even ssh in but it had a consistently high system load and that broke ssh timeouts for ansible	22:36
clarkb	and a reboot fixed that	22:36
fungi	`openstack console log show` isn't responding to me	22:37
fungi	no, it was just the name resolution problem in the container version	22:39
fungi	running a venv osc i can see some dracut and systemd errors on the console	22:39
fungi	though they look like the usual dmesg spam caused by dib	22:40
fungi	rebooting it now	22:40
fungi	there was nothing obvious on the console to explain the connection resets	22:41
fungi	hrm, reboot returned an opaque 5xx error	22:41
fungi	server show says the server is running/active though	22:42
clarkb	was that a normal reboot or a hard reboot? I don't know that I've gotten 500 errors from either in the past but the normal acpi reboots are often ineffective	22:42
fungi	hard, but trying normal i also get the same "Unknown Error (HTTP 504)"	22:43
clarkb	maybe put it in the emergency file for now and send kevinz_ an email?	22:44
clarkb	if the APIs are failing then there isn't much we can do I don't think	22:44
fungi	added to the emergency disable list	22:50
