| -@gerrit:opendev.org- Benjamin Schanzel proposed: [zuul/nodepool] 959679: Replace deprecated pkg_resources with importlib https://review.opendev.org/c/zuul/nodepool/+/959679 | 07:35 | |
| -@gerrit:opendev.org- Fabien Boucher proposed: [zuul/zuul] 959689: Fix the model changelog between version 34 and 35 https://review.opendev.org/c/zuul/zuul/+/959689 | 10:05 | |
| -@gerrit:opendev.org- Simon Westphahl proposed: [zuul/zuul] 959428: Fix provider quota calculation https://review.opendev.org/c/zuul/zuul/+/959428 | 10:23 | |
| @ckulla:matrix.org | Hi, today I'm seeing a regression when building docker images with the build-docker-job: networking is causing problems and DNS isn't working. It might be related to the recently merged change https://review.opendev.org/c/zuul/zuul-jobs/+/958783 as it happens only with buildx. When using 'docker buildx build --network=host' it works as expected. In the log I see that '--driver-opt network=host' is set in the 'Create builder' task, but it looks like this is not sufficient. | 12:03 |
|---|---|---|
| -@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed on behalf of Benjamin Schanzel: [zuul/nodepool] 959679: Replace deprecated pkg_resources with importlib https://review.opendev.org/c/zuul/nodepool/+/959679 | 13:35 | |
| @jim:acmegating.com | Christoph Kulla: that is surprising. thanks for the report. are you using it with a buildset registry and/or an intermediate registry, or standalone? | 13:36 |
| @ckulla:matrix.org | I'm using it standalone | 13:39 |
| @jim:acmegating.com | okay, i wonder if we lack a test for that, or if there's something different about the OS or DNS setup. | 13:41 |
| @clarkb:matrix.org | Fwiw after that landed I made a point of building a bunch of images in opendev and none failed. But they all would have used the build set registry so that could be a difference | 13:42 |
| @jim:acmegating.com | (these roles are very well tested, so i'm wondering what the difference is) | 13:42 |
| @clarkb:matrix.org | Christoph Kulla: I think docker defaults to using Google DNS resolvers if the host system's resolver is localhost (since localhost doesn't work in network namespaces). Is it possible that your environment uses a localhost resolver and Google DNS is not reachable? That might explain why an extra explicit host networking argument gets things working | 13:46 |
| @fungicide:matrix.org | do we rely on dns lookups for reaching the buildset registry, or raw ip address/injected hosts entries? | 13:46 |
| @clarkb:matrix.org | fungi: the role modifies /etc/hosts to point at the build set registry | 13:47 |
| @jim:acmegating.com | i think there's at least some injection, which may also be insulating the jobs from dns issues | 13:47 |
| @clarkb:matrix.org | corvus: yes or it could be that since we don't block access to 8.8.8.8 we're working anyway | 13:48 |
| @jim:acmegating.com | also, in the name of test resiliency, did we manage to completely remove references to real external images? so we might not even be exercising a lookup of, say, registry1.docker.io | 13:48 |
| @clarkb:matrix.org | Yes, but I don't think that is true of the opendev builds I ran yesterday | 13:49 |
| @clarkb:matrix.org | Nor of the old multiarch testing | 13:49 |
| @clarkb:matrix.org | I suspect but am not 100% confident that lookups of say docker hub do generally work hence my theory it could be the fallback dns is what is failing | 13:50 |
| @clarkb:matrix.org | https://review.opendev.org/c/opendev/system-config/+/957277/ the builds here should've pulled the python images from docker hub | 13:51 |
| @jim:acmegating.com | Christoph Kulla: in addition to Clark 's question above about dns resolution -- can you shed any light on what dns queries might be failing? ie, are you referencing a docker.io image in FROM, or maybe a local registry? if so, is that local registry only listed in a local dns server? any info like that could be useful. | 13:52 |
| @jim:acmegating.com | https://github.com/docker/buildx/issues/1823 maybe related | 13:55 |
| @clarkb:matrix.org | Fwiw if setting networking to host on the build command solves the problem I don't think there is an issue with doing that in the role as we already want to use host networking. But it would be good to understand why it is necessary (that issue seems like a good hint) | 13:56 |
| @jim:acmegating.com | https://github.com/docker/buildx/issues/1688 is a little confusing; i *think* they say that adding --network=host to buildx build doesn't fix that problem for them. but it's unclear | 13:56 |
| @jim:acmegating.com | Clark: yeah, i'm not coming up with a downside for --network=host on the build command. so we can maybe just add that, and if we find it's a problem, make an option for it. | 13:57 |
| @clarkb:matrix.org | The other thing it could be is ssl cert trust chains. But using host networking wouldn't affect which certs are trusted | 14:02 |
| @clarkb:matrix.org | ++ to just adding that and make it an option if necessary. I have to do school departure things for the next bit though so happy for someone else to write that | 14:03 |
| @jangutter:matrix.org | network=host in the build command is a layer of isolation gone bananas - the hypothetical exploit where you trick the builder into doing something nasty for you. | 14:35 |
| @jangutter:matrix.org | I mean _not_ using network=host in the build command, of course. need to engage brain before sarcasm.... | 14:36 |
| @jim:acmegating.com | yeah, i don't think we anticipate these roles being used in places where we think the node is at that kind of risk. | 14:37 |
| @jangutter:matrix.org | my theory about this is that the builder is just being treated at the default isolation level. Classic case of differing build and runtime requirements, and in this case, 99% of the time it makes no sense for this kind of isolation at build time. | 14:40 |
| @clarkb:matrix.org | ok I'm back and can write that change now | 14:46 |
| -@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 958470: Shutdown log stream socket instead of close https://review.opendev.org/c/zuul/zuul/+/958470 | 14:48 | |
| -@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul-jobs] 959863: Use network=host on docker buildx build https://review.opendev.org/c/zuul/zuul-jobs/+/959863 | 14:54 | |
| @clarkb:matrix.org | lets see what the CI jobs have to say about ^ | 14:54 |
| @ckulla:matrix.org | Not sure how to check this, /etc/resolv.conf gives nameserver 127.0.0.53, resolvectl status gives Current DNS Server: 168.63.129.16 - It's an Ubuntu Noble VM in Azure, does this answer the question? | 15:03 |
| @clarkb:matrix.org | Yes, I think that confirms you are using a localhost resolver (all of 128/8 is localhost on Linux at least) | 15:04 |
| @clarkb:matrix.org | Then if docker determines it cannot speak to that resolver it will fall back to Google DNS I think, and if that is blocked in your environment DNS may not work. Or if your azure env has env specific records that you need, they won't be on Google DNS either | 15:05 |
| @clarkb:matrix.org | Anyway I pushed a change to use host networking explicitly during the build and if the CI jobs pass I think we can land that and it should fix it for you | 15:05 |
| @ckulla:matrix.org | The image build is downloading wheels from https://files.pythonhosted.org/packages/…, in another it tries to install packages and gives "Temporary failure resolving 'deb.debian.org'" | 15:07 |
| @ckulla:matrix.org | Thanks a lot! | 15:08 |
| @jim:acmegating.com | thanks. those network operations are all things that happen in the images that Clark built too, so i think that reinforces the theory Clark described above. | 15:08 |
| @clarkb:matrix.org | Er 127/8 not 128/8 | 15:09 |
| @clarkb:matrix.org | https://review.opendev.org/c/zuul/zuul-jobs/+/959863 passed all jobs except the microk8s one on a failure that I think is due to snapcraft.io either rate limiting us or failing to serve the requests in the first place | 15:22 |
| @clarkb:matrix.org | I have rechecked it, but I think the initial indication is that this is working and we could approve it | 15:22 |
| @jim:acmegating.com | Clark: +2 from me, feel free to +3 if no one objects in a bit (i think we should merge it with some alacrity since there's a regression that's a blocker for at least some folks) | 15:36 |
| @clarkb:matrix.org | will do | 15:39 |
| @clarkb:matrix.org | I did approve the change but the microk8s job is consistently failing on `error: cannot install "microk8s": Post "https://api.snapcraft.io/v2/snaps/refresh": context canceled` | 16:32 |
| @clarkb:matrix.org | just browsing snapcraft.io is slow for me locally so I suspect this is a remote side problem with the service. Do we want to make that job nonvoting for now maybe? | 16:34 |
| @jim:acmegating.com | yeah, if it bounces out of gate let's do that | 16:56 |
| -@gerrit:opendev.org- Clark Boylan proposed: | 17:00 | |
| - [zuul/zuul-jobs] 959863: Use network=host on docker buildx build https://review.opendev.org/c/zuul/zuul-jobs/+/959863 | ||
| - [zuul/zuul-jobs] 959884: Make microk8s jobs non voting https://review.opendev.org/c/zuul/zuul-jobs/+/959884 | ||
| @clarkb:matrix.org | Hopefully I constructed the job updates in a way that makes the linter happy. I put it in a parent change so that we can more easily revert it later | 17:01 |
| @jim:acmegating.com | Clark: can you go ahead and push the revert so we have it? | 17:02 |
| @clarkb:matrix.org | yes one sec | 17:02 |
| @clarkb:matrix.org | I want to see the linter pass before I do that just so that i know I'm not going to need to rewrite it first | 17:02 |
| -@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul-jobs] 959887: Revert "Make microk8s jobs non voting" https://review.opendev.org/c/zuul/zuul-jobs/+/959887 | 17:11 | |
| @clarkb:matrix.org | linters passed and the revert is pushed ^ | 17:11 |
| @clarkb:matrix.org | I'm going to pop out now to try and get a bike ride in before the day gets hot. I'm happy to approve those when I get back and babysit, but feel free to approve while I'm out too | 17:12 |
| @clarkb:matrix.org | I have approved the changes now | 19:13 |
| -@gerrit:opendev.org- Zuul merged on behalf of Clark Boylan: [zuul/zuul-jobs] 959884: Make microk8s jobs non voting https://review.opendev.org/c/zuul/zuul-jobs/+/959884 | 19:25 | |
| -@gerrit:opendev.org- Zuul merged on behalf of Clark Boylan: [zuul/zuul-jobs] 959863: Use network=host on docker buildx build https://review.opendev.org/c/zuul/zuul-jobs/+/959863 | 19:34 | |
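
The theory discussed above hinges on whether the host's only configured resolver is a loopback address (anywhere in 127/8, such as the 127.0.0.53 stub reported from the Ubuntu Noble VM). The following is a minimal sketch of that check, not anything taken from the zuul-jobs roles or from Docker itself. The function names `parse_nameservers` and `only_loopback_resolvers` are hypothetical, and the sketch only inspects resolv.conf-style text. Whether Docker then falls back to public resolvers is the behavior suggested in the discussion and is not verified here.

```python
import ipaddress


def parse_nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines ('#' or ';').
        if not line or line.startswith(("#", ";")):
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def only_loopback_resolvers(resolv_conf_text):
    """Return True if at least one nameserver is listed and all are loopback.

    Hypothetical helper: an address that is not a valid IP raises ValueError.
    """
    servers = parse_nameservers(resolv_conf_text)
    if not servers:
        return False
    # Drop any IPv6 zone suffix (e.g. "%eth0") before parsing the address.
    return all(
        ipaddress.ip_address(server.split("%", 1)[0]).is_loopback
        for server in servers
    )


if __name__ == "__main__":
    # Sample text modelled on the report above, not read from a real host.
    sample = "nameserver 127.0.0.53\noptions edns0\n"
    print(only_loopback_resolvers(sample))
```

On a host where this returns True, a build running in its own network namespace cannot reach the host's stub resolver directly, which is consistent with explicit host networking on the build command making name resolution work again.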