clarkb | I've just discovered a problem with speculative gating of container images in quay using docker. tldr is that docker doesn't have an easy way to map quay.io/foo/bar to buildsetregistry/quay.io/foo/bar | 03:13 |
clarkb | so none of the transparent fetching of images is working | 03:13 |
clarkb | https://github.com/moby/moby/pull/34319 is the unfixed and closed upstream issue that would have addressed this | 03:13 |
clarkb | there are some potential workarounds. I'm going to need to dig into this more tomorrow. But I wanted to give a heads up on this | 03:14 |
clarkb | the lack of support for this makes me want to get off of docker hub even more quickly. Problem is it definitely makes it more difficult to do so | 03:15 |
clarkb | I really wish I had realized this sooner. I guess it's a credit to the skopeo/podman crew that they solved this a long, long time ago, well enough that we just assumed it would work elsewhere... | 03:26 |
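For context, a minimal sketch of the per-registry mirroring that skopeo/podman support via /etc/containers/registries.conf; the buildset registry hostname and port here are illustrative assumptions, not the actual deployment values:

    [[registry]]
    prefix = "quay.io"
    location = "quay.io"

    # mirrors are tried in order before falling back to quay.io itself
    [[registry.mirror]]
    location = "buildsetregistry:5000/quay.io"
    insecure = true

Docker's daemon has no equivalent per-registry mirror setting, which is the gap being described above.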
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-submodule-init role https://review.opendev.org/c/zuul/zuul-jobs/+/871539 | 03:50 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add ensure-git-lfs https://review.opendev.org/c/zuul/zuul-jobs/+/871679 | 03:50 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-lfs-init https://review.opendev.org/c/zuul/zuul-jobs/+/871680 | 03:50 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-submodule-init role https://review.opendev.org/c/zuul/zuul-jobs/+/871539 | 04:30 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add ensure-git-lfs https://review.opendev.org/c/zuul/zuul-jobs/+/871679 | 04:30 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-lfs-init https://review.opendev.org/c/zuul/zuul-jobs/+/871680 | 04:30 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-submodule-init role https://review.opendev.org/c/zuul/zuul-jobs/+/871539 | 04:36 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add ensure-git-lfs https://review.opendev.org/c/zuul/zuul-jobs/+/871679 | 04:36 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-lfs-init https://review.opendev.org/c/zuul/zuul-jobs/+/871680 | 04:36 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-submodule-init role https://review.opendev.org/c/zuul/zuul-jobs/+/871539 | 04:42 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add ensure-git-lfs https://review.opendev.org/c/zuul/zuul-jobs/+/871679 | 04:42 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-lfs-init https://review.opendev.org/c/zuul/zuul-jobs/+/871680 | 04:42 |
*** dmellado9 is now known as dmellado | 05:04 | |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-submodule-init role https://review.opendev.org/c/zuul/zuul-jobs/+/871539 | 05:17 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add ensure-git-lfs https://review.opendev.org/c/zuul/zuul-jobs/+/871679 | 05:17 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-lfs-init https://review.opendev.org/c/zuul/zuul-jobs/+/871680 | 05:17 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-submodule-init role https://review.opendev.org/c/zuul/zuul-jobs/+/871539 | 05:23 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add ensure-git-lfs https://review.opendev.org/c/zuul/zuul-jobs/+/871679 | 05:23 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-lfs-init https://review.opendev.org/c/zuul/zuul-jobs/+/871680 | 05:23 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-submodule-init role https://review.opendev.org/c/zuul/zuul-jobs/+/871539 | 05:37 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add ensure-git-lfs https://review.opendev.org/c/zuul/zuul-jobs/+/871679 | 05:37 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-lfs-init https://review.opendev.org/c/zuul/zuul-jobs/+/871680 | 05:37 |
*** atmark is now known as Guest1107 | 07:51 | |
*** amoralej is now known as amoralej|lunch | 12:58 | |
*** dviroel__ is now known as dviroel | 13:36 | |
*** amoralej|lunch is now known as amoralej | 14:12 | |
mnasiadka | Maybe a stupid question, but is there a mirror of cirros image on the OpenDev CI mirrors? | 15:22 |
fungi | taking advantage of nice spring weather to grab an early lunch at the biergarten while i can. back in an hour-ish | 15:22 |
fungi | mnasiadka: we bake copies of it into our node images | 15:22 |
fungi | mnasiadka: https://review.opendev.org/873735 is a pending change to update that, but probably needs to be refreshed since it's a few months old now | 15:23 |
fungi | anyway, biab | 15:23 |
opendevreview | Radosław Piliszek proposed openstack/project-config master: Add nebulous/component-template https://review.opendev.org/c/openstack/project-config/+/882975 | 15:24 |
clarkb | I've tried to summarize the docker speculative image problem with quay-hosted images here: https://etherpad.opendev.org/p/3anTDDTht91wLwohumzW feel free to add more info to that if you have ideas or clarifications | 15:40 |
opendevreview | Rodolfo Alonso proposed openstack/project-config master: Revert "Temporary disable nested-virt labels in vexxhost-ca-ymq-1" https://review.opendev.org/c/openstack/project-config/+/882787 | 15:44 |
opendevreview | Clark Boylan proposed opendev/system-config master: DNM testing sideloading container images https://review.opendev.org/c/opendev/system-config/+/882977 | 15:58 |
opendevreview | Merged openstack/project-config master: Revert "Temporary disable nested-virt labels in vexxhost-ca-ymq-1" https://review.opendev.org/c/openstack/project-config/+/882787 | 16:05 |
*** amoralej is now known as amoralej|off | 16:20 | |
stephenfin | fungi: clarkb: I assume you're aware of https://blog.pypi.org/posts/2023-04-23-introducing-pypi-organizations/ (context being management of OpenStack projects on PyPI) | 16:23 |
stephenfin | Example of org in use https://pypi.org/org/pallets/ | 16:24 |
stephenfin | Example of project belonging to an org (look at Owner in the left column) https://pypi.org/project/Flask/ | 16:25 |
clarkb | it's been mentioned a couple of times. I haven't looked into it yet | 16:42 |
fungi | okay, i'm back | 16:44 |
fungi | and yeah, i did register a pypi user called "opendev.org" a while back we can use for the org account | 16:45 |
clarkb | I don't think 882977 is working unfortunately so even the potentially simple hacky workaround needs work :/ | 16:54 |
clarkb | I suspect the reason is that running skopeo copy doesn't look at aliases since it is a specific copy from X to Y command | 16:55 |
clarkb | once the job finishes we should get more logs | 16:55 |
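A hand-run version of the sideload that 882977 is testing would look roughly like the following; the image name and registry address are illustrative placeholders, not taken from the change itself:

    # copy the speculative image out of the buildset registry into the local
    # docker daemon's image store, under the name docker-compose expects
    skopeo copy --src-tls-verify=false \
        docker://buildsetregistry:5000/quay.io/opendevorg/haproxy-statsd:latest \
        docker-daemon:quay.io/opendevorg/haproxy-statsd:latest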
opendevreview | Clark Boylan proposed opendev/system-config master: DNM testing sideloading container images https://review.opendev.org/c/opendev/system-config/+/882977 | 17:03 |
clarkb | yay for logs this may have just been pebcak. Hopefully it works but I'm still not getting my hopes up | 17:03 |
opendevreview | Clark Boylan proposed opendev/system-config master: DNM testing sideloading container images https://review.opendev.org/c/opendev/system-config/+/882977 | 17:48 |
clarkb | now it is my turn to step out and enjoy the weather before the heat wave arrives. I'm going to jump on the bike with no particular plan and be back when I'm back. I think this will be good for head clearing after debugging the docker stuff last night | 17:50 |
fungi | get out there! | 17:51 |
clarkb | fungi: have you had a chance to look over the etherpad I linked earlier? curious if you've got any feedback on what I've brainstormed so far | 21:35 |
clarkb | my hacky change (882977) appears to work actually. So that's one potential workaround (not a good workaround, but a workaround) | 21:35 |
fungi | no, sorry, i've been trying to dig out from under yesterday's culmination of an exceptional openstack security advisory; the resulting breakage and ripple effects are unfortunately ongoing | 21:38 |
clarkb | ack. When you get a chance it would probably be worth looking at. I've got one possible workaround in 882977 mocked up (currently only for jammy but theoretically possible to solve for older systems) | 21:40 |
fungi | this is the quay speculative container builds issue? | 21:44 |
clarkb | yes | 21:44 |
fungi | okay, found the link in scrollback | 21:44 |
opendevreview | Merged openstack/project-config master: Add Node Feature Discovery FluxCD app to StarlingX https://review.opendev.org/c/openstack/project-config/+/881883 | 21:56 |
ianw | mnasiadka: no, not in the mirrors. but we do actually pull a range of images into each node's on-disk cache via nodepool -> https://opendev.org/openstack/project-config/src/branch/master/nodepool/elements/cache-devstack/source-repository-images | 21:59 |
ianw | you can look at devstack which sets itself up to use these caches | 22:00 |
ianw | clarkb: urgh. yeah it's not a fair comparison because it's nested, but podman in the nodepool container has been ... interesting. | 22:08 |
ianw | i will say whenever we have raised a problem, they have been responsive. but the problem is having to raise the problem :) | 22:09 |
clarkb | ianw: ya I'm actually starting to come around to the workaround in 882977 as a short term thing at least. | 22:11 |
ianw | use-buildset-registry already pulls the images from changes ahead of it into the BR ... perhaps we need a "populate-from-br" role? | 22:11 |
ianw | that pulls the images from the BR locally first in a generic way? | 22:11 |
clarkb | ianw: the problem with that is there isn't sufficient info in the test system to know if it should pull or not | 22:11 |
clarkb | specifically you need to pull the images if they aren't in the BR too | 22:12 |
ianw | it would have to work the same as use-buildset-registry and ask zuul, right? | 22:12 |
clarkb | ianw: zuul doesn't have the info is the problem | 22:12 |
clarkb | because you would only have images in the BR if you were updating the images too | 22:12 |
clarkb | but sometimes you update only ansible and want to check that existing images work with new ansible for example | 22:13 |
ianw | so won't they pull as usual from quay? | 22:13 |
clarkb | I think this means we need to more tightly couple the image population to each specific service's ansible playbook/roles | 22:13 |
clarkb | ianw: well the issue is if you pull per usual you'll overwrite what is populated by the BR | 22:13 |
clarkb | you can only do one or the other | 22:13 |
clarkb | but I think we can do that by embedding a thing in each service ansible that checks if we are under testing and then populates the specific images it needs using skopeo, then let that come from the BR or quay using the aliases normally | 22:14 |
clarkb | I suppose we could do what you are suggesting and prevent docker-compose pull from running and feed a list of images into the population thing. | 22:14 |
ianw | hrm, if you pull from the BR when you know you have to, and tag it as :latest locally ... won't that satisfy docker-compose? | 22:15 |
clarkb | we just have to be very careful that we don't skopeo copy then docker-compose pull undoing what we did with skopeo | 22:15 |
clarkb | ianw: no, because when you do a docker-compose pull it will check quay proper and notice that latest is different on quay.io and pull what is on quay.io, overwriting what you did with skopeo | 22:15 |
ianw | huh, i would have thought it would trust the local first :/ | 22:16 |
clarkb | using my example of the haproxy-statsd stuff I think what we want is a role that runs after docker-compose pull and accepts a list of images. Then if we are running under CI it installs skopeo (however we do that for the distro version), then for each image it skopeo copies the image | 22:16 |
clarkb | ianw: its basically what happens today when we update :latest on docker hub | 22:17 |
clarkb | ianw: the local version says it is :latest but then when we pull it sees docker hub has a newer :latest and it fetches that | 22:17 |
clarkb | this is doable, we just have to embed it into the actual service ansible a bit more, which is unfortunate but not the end of the world. Particularly if we don't run any of the weirdness in prod | 22:17 |
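A rough shell sketch of the per-service hook being described; the BUILDSET_REGISTRY variable and the image list are hypothetical placeholders, not existing role variables:

    # run only in CI, after docker-compose pull, so the speculative copies win
    if [ -n "${BUILDSET_REGISTRY:-}" ]; then
        for image in quay.io/opendevorg/haproxy-statsd:latest; do
            # overwrite the freshly pulled tag with the buildset registry copy
            skopeo copy --src-tls-verify=false \
                "docker://${BUILDSET_REGISTRY}/${image}" \
                "docker-daemon:${image}"
        done
    fi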
ianw | fair enough, when you put it like that :0 | 22:18 |
ianw | :) | 22:18 |
ianw | could you change quay.io to the BR in /etc/hosts? | 22:19 |
clarkb | and I think we can probably start with that and see how ugly it turns out to be and use it in the short term. Then long term we can kill two birds with one stone and replace docker and docker-compose with their podman et al equivalents | 22:19 |
clarkb | ianw: that doesn't work because in the BR the image path is br:5000/quay.io/foo/bar | 22:19 |
clarkb | we might be able to do some fancy templating to make that work though | 22:19 |
clarkb | we should add that idea to the etherpad | 22:20 |
fungi | at one point we talked about making zuul-registry a pullthrough for anything it doesn't have locally... that never happened did it? | 22:20 |
clarkb | the two problems with it are the nonstandard port and the quay.io/ path prefix (and possibly the insecure registry settings) | 22:20 |
clarkb | fungi: it did | 22:20 |
clarkb | well not pull through | 22:21 |
clarkb | but it is happily hosting images for quay.io that we shove into it. The problem is the client also needs to be intelligent enough to fetch it from the registry | 22:21 |
fungi | but if we ask the br for something it doesn't have, it'll fetch it from the right public registry and serve that to the requester? | 22:21 |
clarkb | fungi: I think what you are describing is effectively the option on line 36 | 22:21 |
fungi | got it | 22:21 |
clarkb | fungi: no, today the zuul registry is set up as the first alias for an upstream with the upstream being the second alias. Then clients try them in order until they get a winner | 22:22 |
clarkb | fungi: the problem here is that docker only knows how to do this for containers on docker.io. It is effectively vendor lock in when you consider someone fixed it for them and they refused to merge the patch for 6 years | 22:22 |
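For comparison, the only mirroring knob dockerd exposes is registry-mirrors in /etc/docker/daemon.json, and it applies solely to docker.io pulls (the mirror address below is a placeholder):

    {
      "registry-mirrors": ["http://buildsetregistry:5000"]
    }

There is no way to attach a mirror to quay.io or any other registry there, which is the gap the closed moby PR above would have filled.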
* fungi puts on his "shocked" face | 22:23 | |
clarkb | fungi: what this means is that docker will always try to talk to quay.io directly when you list an image from quay.io. The only way around this from what I can tell is to run docker with an http_proxy set | 22:23 |
clarkb | Then you could run an http_proxy that was super smart for our needs and looked at the BR first and falls back to upstream if not in the BR | 22:23 |
clarkb | this is the option on line 36 | 22:24 |
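If the proxy route were pursued, dockerd picks up the standard proxy environment variables, typically set via a systemd drop-in; a sketch, with the proxy address standing in for the hypothetical BR-aware proxy that would have to be written:

    # /etc/systemd/system/docker.service.d/proxy.conf
    [Service]
    Environment="HTTPS_PROXY=http://127.0.0.1:3128/"
    Environment="NO_PROXY=localhost,127.0.0.1"
    # then: systemctl daemon-reload && systemctl restart docker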
clarkb | ya as much as I'm super frustrated by this whole thing it is also really good motivation and indication we are making the right high level choices in moving away from their stuff | 22:24 |
ianw | just throwing ideas, but perhaps instead of podman, do something like run minikube and run in a more k8s-ish way? that might be interesting and something that could keep services flexible for possibly moving to some sort of k8s | 22:24 |
clarkb | ianw: so podman actually suggests that for production workloads and I personally hate the idea :) I think it is ridiculous that you have to run an entire kubernetes to run a single container | 22:25 |
clarkb | I'd rather go back to puppet deploying deb packages or whatever before I do that | 22:25 |
clarkb | It feels like everyone has drank so much koolaid they never stopped to think if they stepped over into silly territory | 22:26 |
ianw | i do agree; but i guess the advantage is that in the future, you don't point it at the kubernetes you're running but that is being run elsewhere | 22:26 |
clarkb | ianw: the problem with that is we have nowhere to run a k8s that doesn't involve us doing it ourselves which is a ton of work | 22:26 |
clarkb | magnum is not production worthy | 22:26 |
clarkb | in particular it relies on short lived distros for the base OS so you stop getting security updates almost immediately, and even if you did you can't upgrade the k8s on magnum anyway | 22:27 |
clarkb | so ya bespoke minikubes on each host would theoretically work. It just feels like terrible design | 22:28 |
clarkb | we'll end up with far more complicated service management than the services we are running | 22:28 |
ianw | sure; i mean practically i'm thinking in the same way rax and vexxhost provide us openstack resources, someone/something provides us openshift resources, etc. in a way that is practical to consume | 22:28 |
ianw | anyway, just a thought. i didn't find minikube that hard to work with when i updated some of the zuul-jobs testing to use it as a test-backend for k8s things | 22:29 |
clarkb | I'm also firmly against openshift because you cannot test on it | 22:30 |
clarkb | and it needs like 12GB of memory just to run a single container. I mean I get that it is a possibility it just doesn't seem like a good one | 22:30 |
clarkb | (side note I really think openshift has dropped the ball on making their software consumable from a development standpoint. minikube shows this can be done pretty effectively for the kubernetes world) | 22:30 |
* fungi wonders where microk8s fits in that spectrum | 22:31 | |
clarkb | it is unfortunate because v3 made it possible but they broke all that with v4 | 22:31 |
ianw | yeah, it was 9gb when i looked into that stuff ^^^ which was a non-starter on our 8gb systems, and why minikube became the best option, plus the ubuntu integration | 22:31 |
clarkb | fungi: microk8s is basically equivalent to minikube | 22:31 |
ianw | sorry, microk8s ... yeah minikube wasn't an option for various reasons in the email i sent | 22:32 |
fungi | got it, so not appreciably bigger or smaller than minikube | 22:32 |
ianw | i forget but there was some work going on making a smaller openshift | 22:32 |
fungi | they should call that effort "downshift" | 22:33 |
clarkb | I think minikube/microk8s both need about 1GB memory | 22:33 |
ianw | but i guess the gist is that if you're deploying services with k8s, the backend should at least theoretically be fairly switchable | 22:33 |
clarkb | there is also kind but then you are back to docker :) | 22:33 |
clarkb | There's a thing going around about how some crypto company paid datadog $65 million for services in one quarter a year ago | 22:34 |
ianw | https://github.com/openshift/microshift was it | 22:34 |
clarkb | Feels like this sort of "just run an entire kubernetes for something that should be simple" falls under a similar problem space | 22:35 |
clarkb | yes it is doable and it works, but it is complete overkill and you don't really get any benefits from it | 22:35 |
ianw | this idea ... that you run a k8s on the "edge" as we'd effectively do, is a thing | 22:35 |
clarkb | ianw: it is, but again I think it is complete overkill. We'd have to double the size of a number of our servers | 22:36 |
clarkb | and we don't really get any benefits because most of these applications can't run in an active active manner with many redundant load balanced instances | 22:36 |
clarkb | etherpad can't, irc bots can't, gerrit can't without a whole lot of extra work, etc. Gitea and hound likely could. | 22:36 |
ianw | i think the load balancing is one thing, but essentially the abstraction about where the service is running is somewhat cool | 22:37 |
clarkb | you really only start to see benefits if you can centralize the k8s (theoretically possible but we'd be on our own today running a large service like that) and/or use it for redundancy and load balancing | 22:37 |
clarkb | If magnum was a viable option I think this would be far more likely | 22:38 |
clarkb | but we don't have a gke equivalent that we can click a button on and run with | 22:38 |
clarkb | to backup a bit. I don't think "move container image from this registry to that registry" should require us to completely change how we run just about every service we have | 22:39 |
ianw | right, it gives you the *option* to centralise; by basically having all the work on the deployment side close-to-done. is that worth it? dunno, i guess you'd say no, i'd say "maybe" | 22:39 |
clarkb | If we decide that is the road we want to take I would argue we undo the registry move and start with the complete rearchitect first. Then worry about image hosting | 22:39 |
ianw | that i agree with -- replacing docker.io has really shown how embedded it is | 22:40 |
clarkb | ianw: I think it is really only worth it if you don't have to suddenly run a very complex application alongside everything else. Most people use $cloud to do it for them and it's fine | 22:40 |
clarkb | and maybe this should be an indication that we've failed to provide adequate feedback to the magnum team over the years and we should start changing that | 22:42 |
clarkb | and step one in that process would be to deploy a new magnum k8s cluster so that we can sort out the current issues | 22:42 |
clarkb | however I'm 99% sure the k8s nodes are still fedora based which means we'll only get a few months out of each k8s cluster before it needs deletion and recreation | 22:43 |
clarkb | hopefully I'm not completely crazy thinking we've jumped from "containers are amazing use them everywhere" to "the only way to run a container now is with k8s" :/ | 22:44 |
ianw | no, sorry to derail things. just if there's a lot of work in switching to podman on ubuntu (that we know is probably not the most well trodden path), "edge kubernetes" might also be worth considering | 22:52 |
clarkb | I think it is good to throw ideas out there. I'm just really skeptical of anything that requires us to redo completely everything and run more services | 23:22 |