clarkb | I've just discovered a problem with speculative gating of container images in quay using docker. tldr is that docker doesn't have an easy way to map quay.io/foo/bar to buildsetregistry/quay.io/foo/bar | 03:13 |
clarkb | so none of the transparent fetching of images is working | 03:13 |
clarkb | https://github.com/moby/moby/pull/34319 is the unfixed and closed upstream issue that would have addressed this | 03:13 |
clarkb | there are some potential workarounds. I'm going to need to dig into this more tomorrow. But I wanted to give a heads up on this | 03:14 |
clarkb | the lack of support for this makes me want to get off of docker hub even more quickly. Problem is it definitely makes it more difficult to do so | 03:15 |
clarkb | I really wish I had realized this sooner. I guess it's a credit to the skopeo/podman crew that they solved this a long, long time ago, well enough that we just assumed it would work elsewhere... | 03:26 |
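For context, a minimal sketch of the per-registry mirroring that skopeo/podman support via /etc/containers/registries.conf; the buildset registry hostname and port here are illustrative assumptions, not the actual deployment values:

    [[registry]]
    prefix = "quay.io"
    location = "quay.io"

    # mirrors are tried in order before falling back to quay.io itself
    [[registry.mirror]]
    location = "buildsetregistry:5000/quay.io"
    insecure = true

Docker's daemon has no equivalent per-registry mirror setting, which is the gap being described above.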
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-submodule-init role https://review.opendev.org/c/zuul/zuul-jobs/+/871539 | 03:50 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add ensure-git-lfs https://review.opendev.org/c/zuul/zuul-jobs/+/871679 | 03:50 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-lfs-init https://review.opendev.org/c/zuul/zuul-jobs/+/871680 | 03:50 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-submodule-init role https://review.opendev.org/c/zuul/zuul-jobs/+/871539 | 04:30 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add ensure-git-lfs https://review.opendev.org/c/zuul/zuul-jobs/+/871679 | 04:30 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-lfs-init https://review.opendev.org/c/zuul/zuul-jobs/+/871680 | 04:30 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-submodule-init role https://review.opendev.org/c/zuul/zuul-jobs/+/871539 | 04:36 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add ensure-git-lfs https://review.opendev.org/c/zuul/zuul-jobs/+/871679 | 04:36 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-lfs-init https://review.opendev.org/c/zuul/zuul-jobs/+/871680 | 04:36 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-submodule-init role https://review.opendev.org/c/zuul/zuul-jobs/+/871539 | 04:42 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add ensure-git-lfs https://review.opendev.org/c/zuul/zuul-jobs/+/871679 | 04:42 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-lfs-init https://review.opendev.org/c/zuul/zuul-jobs/+/871680 | 04:42 |
*** dmellado9 is now known as dmellado | 05:04 | |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-submodule-init role https://review.opendev.org/c/zuul/zuul-jobs/+/871539 | 05:17 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add ensure-git-lfs https://review.opendev.org/c/zuul/zuul-jobs/+/871679 | 05:17 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-lfs-init https://review.opendev.org/c/zuul/zuul-jobs/+/871680 | 05:17 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-submodule-init role https://review.opendev.org/c/zuul/zuul-jobs/+/871539 | 05:23 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add ensure-git-lfs https://review.opendev.org/c/zuul/zuul-jobs/+/871679 | 05:23 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-lfs-init https://review.opendev.org/c/zuul/zuul-jobs/+/871680 | 05:23 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-submodule-init role https://review.opendev.org/c/zuul/zuul-jobs/+/871539 | 05:37 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add ensure-git-lfs https://review.opendev.org/c/zuul/zuul-jobs/+/871679 | 05:37 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-lfs-init https://review.opendev.org/c/zuul/zuul-jobs/+/871680 | 05:37 |
*** atmark is now known as Guest1107 | 07:51 | |
*** amoralej is now known as amoralej|lunch | 12:58 | |
*** dviroel__ is now known as dviroel | 13:36 | |
*** amoralej|lunch is now known as amoralej | 14:12 | |
mnasiadka | Maybe a stupid question, but is there a mirror of cirros image on the OpenDev CI mirrors? | 15:22 |
fungi | taking advantage of nice spring weather to grab an early lunch at the biergarten while i can. back in an hour-ish | 15:22 |
fungi | mnasiadka: we bake copies of it into our node images | 15:22 |
fungi | mnasiadka: https://review.opendev.org/873735 is a pending change to update that, but probably needs to be refreshed since it's a few months old now | 15:23 |
fungi | anyway, biab | 15:23 |
opendevreview | Radosław Piliszek proposed openstack/project-config master: Add nebulous/component-template https://review.opendev.org/c/openstack/project-config/+/882975 | 15:24 |
clarkb | I've tried to summarize the docker speculative image problem with quay-hosted images here: https://etherpad.opendev.org/p/3anTDDTht91wLwohumzW feel free to add more info to that if you have ideas or clarifications | 15:40 |
opendevreview | Rodolfo Alonso proposed openstack/project-config master: Revert "Temporary disable nested-virt labels in vexxhost-ca-ymq-1" https://review.opendev.org/c/openstack/project-config/+/882787 | 15:44 |
opendevreview | Clark Boylan proposed opendev/system-config master: DNM testing sideloading container images https://review.opendev.org/c/opendev/system-config/+/882977 | 15:58 |
opendevreview | Merged openstack/project-config master: Revert "Temporary disable nested-virt labels in vexxhost-ca-ymq-1" https://review.opendev.org/c/openstack/project-config/+/882787 | 16:05 |
*** amoralej is now known as amoralej|off | 16:20 | |
stephenfin | fungi: clarkb: I assume you're aware of https://blog.pypi.org/posts/2023-04-23-introducing-pypi-organizations/ (context being management of OpenStack projects on PyPI) | 16:23 |
stephenfin | Example of org in use https://pypi.org/org/pallets/ | 16:24 |
stephenfin | Example of project belonging to an org (look at Owner in the left column) https://pypi.org/project/Flask/ | 16:25 |
clarkb | it's been mentioned a couple of times. I haven't looked into it yet | 16:42 |
fungi | okay, i'm back | 16:44 |
fungi | and yeah, i did register a pypi user called "opendev.org" a while back we can use for the org account | 16:45 |
clarkb | I don't think 882977 is working unfortunately so even the potentially simple hacky workaround needs work :/ | 16:54 |
clarkb | I suspect the reason is that running skopeo copy doesn't look at aliases since it is a specific copy from X to Y command | 16:55 |
clarkb | once the job finishes we should get more logs | 16:55 |
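A hand-run version of the sideload that 882977 is testing would look roughly like the following; the image name and registry address are illustrative placeholders, not taken from the change itself:

    # copy the speculative image out of the buildset registry into the local
    # docker daemon's image store, under the name docker-compose expects
    skopeo copy --src-tls-verify=false \
        docker://buildsetregistry:5000/quay.io/opendevorg/haproxy-statsd:latest \
        docker-daemon:quay.io/opendevorg/haproxy-statsd:latest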
opendevreview | Clark Boylan proposed opendev/system-config master: DNM testing sideloading container images https://review.opendev.org/c/opendev/system-config/+/882977 | 17:03 |
clarkb | yay for logs this may have just been pebcak. Hopefully it works but I'm still not getting my hopes up | 17:03 |
opendevreview | Clark Boylan proposed opendev/system-config master: DNM testing sideloading container images https://review.opendev.org/c/opendev/system-config/+/882977 | 17:48 |
clarkb | now it is my turn to step out and enjoy the weather before the heat wave arrives. I'm going to jump on the bike with no particular plan and be back when I'm back. I think this will be good for head clearing after debugging the docker stuff last night | 17:50 |
fungi | get out there! | 17:51 |
clarkb | fungi: have you had a chance to look over the etherpad I linked earlier? curious if you've got any feedback on what I've brainstormed so far | 21:35 |
clarkb | my hacky change (882977) appears to work actually. So that's one potential workaround (not a good workaround, but a workaround) | 21:35 |
fungi | no, sorry, i've been trying to dig out from under yesterday's culmination of an exceptional openstack security advisory; the resulting breakage and ripple effects are unfortunately ongoing | 21:38 |
clarkb | ack. When you get a chance it would probably be worth looking at. I've got one possible workaround in 882977 mocked up (currently only for jammy but theoretically possible to solve for older systems) | 21:40 |
fungi | this is the quay speculative container builds issue? | 21:44 |
clarkb | yes | 21:44 |
fungi | okay, found the link in scrollback | 21:44 |
opendevreview | Merged openstack/project-config master: Add Node Feature Discovery FluxCD app to StarlingX https://review.opendev.org/c/openstack/project-config/+/881883 | 21:56 |
ianw | mnasiadka: no, not in the mirrors. but we do actually pull a range of images into each node's on-disk cache via nodepool -> https://opendev.org/openstack/project-config/src/branch/master/nodepool/elements/cache-devstack/source-repository-images | 21:59 |
ianw | you can look at devstack which sets itself up to use these caches | 22:00 |
ianw | clarkb: urgh. yeah it's not a fair comparison because it's nested, but podman in the nodepool container has been ... interesting. | 22:08 |
ianw | i will say whenever we have raised a problem, they have been responsive. but the problem is having to raise the problem :) | 22:09 |
clarkb | ianw: ya I'm actually starting to come around to the workaround in 882977 as a short term thing at least. | 22:11 |
ianw | use-buildset-registry already pulls the images from changes ahead of it into the BR ... perhaps we need a "populate-from-br" role? | 22:11 |
ianw | that pulls the images from the BR locally first in a generic way? | 22:11 |
clarkb | ianw: the problem with that is there isn't sufficient info in the test system to know if it should pull or not | 22:11 |
clarkb | specifically you need to pull the images if they aren't in the BR too | 22:12 |
ianw | it would have to work the same as use-buildset-registry and ask zuul, right? | 22:12 |
clarkb | ianw: zuul doesn't have the info is the problem | 22:12 |
clarkb | because you would only have images in the BR if you were updating the images too | 22:12 |
clarkb | but sometimes you update only ansible and want to check that existing images work with new ansible for example | 22:13 |
ianw | so won't they pull as usual from quay? | 22:13 |
clarkb | I think this means we need to more tightly couple the image population to each specific service's ansible playbook/roles | 22:13 |
clarkb | ianw: well the issue is if you pull per usual you'll overwrite what is populated by the BR | 22:13 |
clarkb | you can only do one or the other | 22:13 |
clarkb | but I think we can do that by embedding a thing in each service ansible that checks if we are under testing and then populates the specific images it needs using skopeo, then let that come from the BR or quay using the aliases normally | 22:14 |
clarkb | I suppose we could do what you are suggesting and prevent docker-compose pull from running and feed a list of images into the population thing. | 22:14 |
ianw | hrm, if you pull from the BR when you know you have to, and tag it as :latest locally ... won't that satisfy docker-compose? | 22:15 |
clarkb | we just have to be very careful that we don't skopeo copy then docker-compose pull undoing what we did with skopeo | 22:15 |
clarkb | ianw: no, because when you do a docker-compose pull it will check quay proper and notice that latest is different on quay.io and pull what is on quay.io, overwriting what you did with skopeo | 22:15 |
ianw | huh, i would have thought it would trust the local first :/ | 22:16 |
clarkb | using my example of the haproxy-statsd stuff I think what we want is a role that runs after docker-compose pull and accepts a list of images. Then if we are running under CI it installs skopeo (however we do that for the distro version), then for each image it skopeo copies the image | 22:16 |
clarkb | ianw: its basically what happens today when we update :latest on docker hub | 22:17 |
clarkb | ianw: the local version says it is :latest but then when we pull it sees docker hub has a newer :latest and it fetches that | 22:17 |
clarkb | this is doable, we just have to embed it into the actual service ansible a bit more, which is unfortunate but not the end of the world. Particularly if we don't run any of the weirdness in prod | 22:17 |
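A rough shell sketch of the per-service hook being described; the BUILDSET_REGISTRY variable and the image list are hypothetical placeholders, not existing role variables:

    # run only in CI, after docker-compose pull, so the speculative copies win
    if [ -n "${BUILDSET_REGISTRY:-}" ]; then
        for image in quay.io/opendevorg/haproxy-statsd:latest; do
            # overwrite the freshly pulled tag with the buildset registry copy
            skopeo copy --src-tls-verify=false \
                "docker://${BUILDSET_REGISTRY}/${image}" \
                "docker-daemon:${image}"
        done
    fi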
ianw | fair enough, when you put it like that :0 | 22:18 |
ianw | :) | 22:18 |
ianw | could you change quay.io to the BR in /etc/hosts? | 22:19 |
clarkb | and I think we can probably start with that and see how ugly it turns out to be and use it in the short term. Then long term we can kill two birds with one stone and replace docker and docker-compose with their podman et al equivalents | 22:19 |
clarkb | ianw: that doesn't work because in the BR the image path is br:5000/quay.io/foo/bar | 22:19 |
clarkb | we might be able to do some fancy templating to make that work though | 22:19 |
clarkb | we should add that idea to the etherpad | 22:20 |
fungi | at one point we talked about making zuul-registry a pullthrough for anything it doesn't have locally... that never happened did it? | 22:20 |
clarkb | the two problems with it are the nonstandard port and the quay.io/ path prefix (and possibly the insecure registry settings) | 22:20 |
clarkb | fungi: it did | 22:20 |
clarkb | well not pull through | 22:21 |
clarkb | but it is happily hosting images for quay.io that we shove into it. The problem is the client also needs to be intelligent enough to fetch it from the registry | 22:21 |
fungi | but if we ask the br for something it doesn't have, it'll fetch it from the right public registry and serve that to the requester? | 22:21 |
clarkb | fungi: I think what you are describing is effectively the option on line 36 | 22:21 |
fungi | got it | 22:21 |
clarkb | fungi: no, today the zuul registry is set up as the first alias for an upstream with the upstream being the second alias. Then clients try them in order until they get a winner | 22:22 |
clarkb | fungi: the problem here is that docker only knows how to do this for containers on docker.io. It is effectively vendor lock in when you consider someone fixed it for them and they refused to merge the patch for 6 years | 22:22 |
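For comparison, the only mirroring knob dockerd exposes is registry-mirrors in /etc/docker/daemon.json, and it applies solely to docker.io pulls (the mirror address below is a placeholder):

    {
      "registry-mirrors": ["http://buildsetregistry:5000"]
    }

There is no way to attach a mirror to quay.io or any other registry there, which is the gap the closed moby PR above would have filled.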
* fungi puts on his "shocked" face | 22:23 | |
clarkb | fungi: what this means is that docker will always try to talk to quay.io directly when you list an image from quay.io. The only way around this from what I can tell is to run docker with an http_proxy set | 22:23 |
clarkb | Then you could run an http_proxy that was super smart for our needs and looked at the BR first and falls back to upstream if not in the BR | 22:23 |
clarkb | this is the option on line 36 | 22:24 |
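If the proxy route were pursued, dockerd picks up the standard proxy environment variables, typically set via a systemd drop-in; a sketch, with the proxy address standing in for the hypothetical BR-aware proxy that would have to be written:

    # /etc/systemd/system/docker.service.d/proxy.conf
    [Service]
    Environment="HTTPS_PROXY=http://127.0.0.1:3128/"
    Environment="NO_PROXY=localhost,127.0.0.1"
    # then: systemctl daemon-reload && systemctl restart docker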
clarkb | ya as much as I'm super frustrated by this whole thing it is also really good motivation and indication we are making the right high level choices in moving away from their stuff | 22:24 |
ianw | just throwing ideas, but perhaps instead of podman, do something like run minikube and run in a more k8s-ish way? that might be interesting and something that could keep services flexible for possibly moving to some sort of k8s | 22:24 |
clarkb | ianw: so podman actually suggests that for production workloads and I personally hate the idea :) I think it is ridiculous that you have to run an entire kubernetes to run a single container | 22:25 |
clarkb | I'd rather go back to puppet deploying deb packages or whatever before I do that | 22:25 |
clarkb | It feels like everyone has drank so much koolaid they never stopped to think if they stepped over into silly territory | 22:26 |
ianw | i do agree; but i guess the advantage is that in the future, you don't point it at the kubernetes you're running but that is being run elsewhere | 22:26 |
clarkb | ianw: the problem with that is we have nowhere to run a k8s that doesn't involve us doing it ourselves which is a ton of work | 22:26 |
clarkb | magnum is not production worthy | 22:26 |
clarkb | in particular it relies on short lived distros for the base OS so you stop getting security updates almost immediately, and even if you did you can't upgrade the k8s on magnum anyway | 22:27 |
clarkb | so ya bespoke minikubes on each host would theoretically work. It just feels like terrible design | 22:28 |
clarkb | we'll end up with far more complicated service management than the services we are running | 22:28 |
ianw | sure; i mean practically i'm thinking in the same way rax and vexxhost provide us openstack resources, someone/something provides us openshift resources, etc. in a way that is practical to consume | 22:28 |
ianw | anyway, just a thought. i didn't find minikube that hard to work with when i updated some of the zuul-jobs testing to use it as a test-backend for k8s things | 22:29 |
clarkb | I'm also firmly against openshift because you cannot test on it | 22:30 |
clarkb | and it needs like 12GB of memory just to run a single container. I mean I get that it is a possibility it just doesn't seem like a good one | 22:30 |
clarkb | (side note I really think openshift has dropped the ball on making their software consumable from a development standpoint. minikube shows this can be done pretty effectively for the kubernetes world) | 22:30 |
* fungi wonders where microk8s fits in that spectrum | 22:31 | |
clarkb | it is unfortunate because v3 made it possible but they broke all that with v4 | 22:31 |
ianw | yeah, it was 9gb when i looked into that stuff ^^^ which was a non-starter on our 8gb systems, and why minikube became the best option, plus the ubuntu integration | 22:31 |
clarkb | fungi: microk8s is basically equivalent to minikube | 22:31 |
ianw | sorry, microk8s ... yeah minikube wasn't an option for various reasons in the email i sent | 22:32 |
fungi | got it, so not appreciably bigger or smaller than minikube | 22:32 |
ianw | i forget but there was some work going on making a smaller openshift | 22:32 |
fungi | they should call that effort "downshift" | 22:33 |
clarkb | I think minikube/microk8s both need about 1GB memory | 22:33 |
ianw | but i guess the gist is that if you're deploying services with k8s, the backend should at least theoretically be fairly switchable | 22:33 |
clarkb | there is also kind but then you are back to docker :) | 22:33 |
clarkb | There's a thing going around about how some crypto company paid datadog $65 million for services in one quarter a year ago | 22:34 |
ianw | https://github.com/openshift/microshift was it | 22:34 |
clarkb | Feels like this sort of "just run an entire kubernetes for something that should be simple" falls under a similar problem space | 22:35 |
clarkb | yes it is doable and it works, but it is complete overkill and you don't really get any benefits from it | 22:35 |
ianw | this idea ... that you run a k8s on the "edge" as we'd effectively do, is a thing | 22:35 |
clarkb | ianw: it is, but again I think it is complete overkill. We'd have to double the size of a number of our servers | 22:36 |
clarkb | and we don't really get any benefits because most of these applications can't run in an active active manner with many redundant load balanced instances | 22:36 |
clarkb | etherpad can't, irc bots can't, gerrit can't without a whole lot of extra work, etc. Gitea and hound likely could. | 22:36 |
ianw | i think the load balancing is one thing, but essentially the abstraction about where the service is running is somewhat cool | 22:37 |
clarkb | you really only start to see benefits if you can centralize the k8s (theoretically possible but we'd be on our own today running a large service like that) and/or use it for redundancy and load balancing | 22:37 |
clarkb | If magnum was a viable option I think this would be far more likely | 22:38 |
clarkb | but we don't have a gke equivalent that we can click a button on and run with | 22:38 |
clarkb | to backup a bit. I don't think "move container image from this registry to that registry" should require us to completely change how we run just about every service we have | 22:39 |
ianw | right, it gives you the *option* to centralise; by basically having all the work on the deployment side close-to-done. is that worth it? dunno, i guess you'd say no, i'd say "maybe" | 22:39 |
clarkb | If we decide that is the road we want to take I would argue we undo the registry move and start with the complete rearchitect first. Then worry about image hosting | 22:39 |
ianw | that i agree with -- replacing docker.io has really shown how embedded it is | 22:40 |
clarkb | ianw: I think it is really only worth it if you don't have to suddenly run a very complex application alongside everything else. Most people use $cloud to do it for them and it's fine | 22:40 |
clarkb | and maybe this should be an indication that we've failed to provide adequate feedback to the magnum team over the years and we should start changing that | 22:42 |
clarkb | and step one in that process would be to deploy a new magnum k8s cluster so that we can sort out the current issues | 22:42 |
clarkb | however I'm 99% sure the k8s nodes are still fedora based which means we'll only get a few months out of each k8s cluster before it needs deletion and recreation | 22:43 |
clarkb | hopefully I'm not completely crazy thinking we've jumped from "containers are amazing use them everywhere" to "the only way to run a container now is with k8s" :/ | 22:44 |
ianw | no, sorry to derail things. just if there's a lot of work in switching to podman on ubuntu (that we know is probably not the most well trodden path), "edge kubernetes" might also be worth considering | 22:52 |
clarkb | I think it is good to throw ideas out there. I'm just really skeptical of anything that requires us to redo completely everything and run more services | 23:22 |