*** Guest0 is now known as prometheanfire | 00:17 | |
*** mazzy50988129295808594944 is now known as mazzy5098812929580859494 | 00:47 | |
frickler | so ... some story about debootstrap. by default it indeed only uses the release pocket with too old ca-certs | 06:06 |
frickler | in theory one could add --extra-suites=focal-updates,focal-security, but then the fun begins | 06:07 |
frickler | that option is made to be able to add extra pkgs from other sources, it doesn't deal with pkgs appearing multiple times | 06:08 |
ianw | i think using http:// is a good workaround :) | 06:08 |
frickler | or put differently, debootstrap still uses the first version of a pkg it finds, there is no mechanism to select the most recent version like apt would do | 06:09 |
frickler | so I tried to turn things around, create a "focal-updates" image and have "--extra-suites=focal" as fallback for pkgs that didn't receive updates | 06:10 |
frickler | this uncovers a bug in the updated apt, which fails to install from scratch at all | 06:10 |
frickler | the only feasible solution I found would be to install an updated ca-certificates pkg via dpkg after debootstrap finishes | 06:11 |
frickler | ianw: seems a bit sad, but likely it is indeed. even jammy will have to fall back to that if the situation persists once the released ca-certs are no longer good enough | 06:12 |
ianw | yeah, i think the main mirror is cloudflare, which probably doesn't use LE certs? maybe a lot of people use https via that, hiding this even further? | 06:13 |
ianw | we're probably fairly unique in even using debootstrap, and also in pointing it at our own mirrors | 06:14 |
frickler | ack | 06:15 |
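For readers following along, a minimal shell sketch of the workaround frickler lands on above (bootstrap from the release pocket, then install an updated ca-certificates via dpkg after debootstrap finishes); the mirror URL, chroot path, and the ca-certificates_VERSION filename are placeholders, not the values used in the real image builds:

```sh
# Sketch only; mirror, chroot path and VERSION are placeholders.
# 1. Bootstrap from the release pocket over plain http, since https to the
#    mirror is exactly what the stale ca-certificates package breaks.
debootstrap focal /tmp/focal-chroot http://mirror.example.org/ubuntu

# 2. Fetch a current ca-certificates package from focal-updates and install
#    it into the chroot with dpkg, bypassing debootstrap's "first version
#    found wins" behaviour.
wget http://mirror.example.org/ubuntu/pool/main/c/ca-certificates/ca-certificates_VERSION_all.deb
cp ca-certificates_VERSION_all.deb /tmp/focal-chroot/tmp/
chroot /tmp/focal-chroot dpkg -i /tmp/ca-certificates_VERSION_all.deb
```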
frickler | ianw: did you see the dib failures at https://review.opendev.org/c/openstack/diskimage-builder/+/842856 ? I haven't found out what is breaking there, not sure if I should just keep rechecking | 06:16 |
ianw | umm, haven't checked dib queue today | 06:16 |
frickler | also please add https://review.opendev.org/c/openstack/project-config/+/842853 to your review list, not sure if my idea works and waiting for a periodic run to test is tedious | 06:17 |
ianw | 2022-05-22 14:00:00.016130 | LOOP [push-to-intermediate-registry : Push tag to intermediate registry] | 06:17 |
ianw | 2022-05-22 14:29:42.793690 | POST-RUN END RESULT_TIMED_OUT: [trusted : opendev.org/opendev/base-jobs/playbooks/buildset-registry/post.yaml@master] | 06:17 |
frickler | yes, I looked at the registry log, too, but didn't find out what is broken there | 06:18 |
ianw | that's weird | 06:18 |
ianw | yeah the ir logs seem to cut off at 14:00 | 06:19 |
ianw | https://7e29afca27547df970c8-f36a7ace61553ff461a9764933cd7ea3.ssl.cf5.rackcdn.com/842856/1/check/opendev-buildset-registry/1e94e9a/docker/buildset_registry.txt | 06:20 |
ianw | i dunno, probably rechecking and if it replicates we'll need to look more | 06:21 |
*** ysandeep|out is now known as ysandeep|rover | 06:25 | |
frickler | ianw: well this was two times in a row. the first time I just assumed some network or whatever hiccup, but twice the same I don't know. but ok, let's go for three to have more confidence | 06:25 |
*** frenzy_friday is now known as frenzyfriday|ruck | 06:34 | |
*** jpena|off is now known as jpena | 07:33 | |
*** elodilles is now known as elodilles_afk | 08:31 | |
*** ysandeep|rover is now known as ysandeep|rover|lunch | 08:38 | |
opendevreview | Dr. Jens Harbott proposed openstack/diskimage-builder master: DNM: Testing registry failures https://review.opendev.org/c/openstack/diskimage-builder/+/842928 | 09:16 |
frickler | ianw: failed the third time in a row at the same step. testing with ^^ now to make sure it is really independent. | 09:19 |
*** ysandeep|rover|lunch is now known as ysandeep|rover | 10:07 | |
mnasiadka | mgoddard, yoctozepto: the haproxy single frontend patch is ready for review - https://review.opendev.org/c/openstack/kolla-ansible/+/823395 (and CI in https://review.opendev.org/c/openstack/kolla-ansible/+/841239) if you have some time | 10:18 |
yoctozepto | mnasiadka: let's discuss on #openstack-kolla, not here | 10:20 |
mnasiadka | oops | 10:20 |
mnasiadka | makes sense :) | 10:20 |
yoctozepto | :-) | 10:20 |
*** rlandy|out is now known as rlandy | 10:21 | |
*** dviroel|out is now known as dviroel | 11:21 | |
frickler | weird post failure on a docs build in gate, maybe someone else sees more than I do? https://zuul.opendev.org/t/openstack/build/8246b40d474545b99f7e0dc5134fbad1 | 11:40 |
fungi | looks like the test node got unhappy in the post-run playbook after copying sphinx-build-pdf.log from work/logs/ and before or during the copy of whatever would have been in work/artifacts/ | 11:47 |
fungi | "kex_exchange_identification: Connection closed by remote host" is usually an indication that the sshd is unhappy | 11:48 |
fungi | though interestingly, that build has artifacts | 11:49 |
fungi | oh, that's uploaded by the fetch-sphinx-tarball role in the earlier post-run playbook | 11:52 |
fungi | not by the fetch-output role, which is what broke | 11:52 |
fungi | my guess is something unexpected (ECLOUD) happened to the node, causing ssh to start insta-closing new connections at that moment | 11:53 |
*** elodilles_afk is now known as elodilles | 12:16 | |
*** ysandeep|rover is now known as ysandeep|rover|brb | 12:21 | |
frickler | ah, right, I missed that, looks like just the usual cloud hiccup indeed | 12:30 |
*** ysandeep|rover|brb is now known as ysandeep|rover | 12:37 | |
frickler | zuul held the failing nodepool-build-image-siblings job, but not the registry job it depends on, that's not helpful for debugging | 13:14 |
fungi | might instead need to patch the job in the broken change to just force it to wait prior to the upload, and increase the timeout(s)? | 13:16 |
frickler | yeah, I guess for now I'll stick to hoping someone with more experience with this setup will pick things up ;) | 13:55 |
Clark[m] | Not really here yet but I would check file sizes for the image (did it explode in size causing the job to timeout uploading it?) And maybe check the intermediate registry logs (that's the insecure CI registry) | 13:59 |
opendevreview | Joseph Kostreva proposed zuul/zuul-jobs master: prepare-workspace: Add role variable prepare_workspace_delete_dest https://review.opendev.org/c/zuul/zuul-jobs/+/842723 | 14:36 |
opendevreview | Joseph Kostreva proposed zuul/zuul-jobs master: prepare-workspace: Add variable prepare_workspace_delete_dest https://review.opendev.org/c/zuul/zuul-jobs/+/842723 | 14:37 |
opendevreview | Joseph Kostreva proposed zuul/zuul-jobs master: prepare-workspace: Add variable prepare_workspace_delete_dest https://review.opendev.org/c/zuul/zuul-jobs/+/842723 | 14:44 |
opendevreview | Mohammed Naser proposed openstack/project-config master: neutron: add neutron-vpnaas-stable-maint https://review.opendev.org/c/openstack/project-config/+/842985 | 14:49 |
*** dviroel is now known as dviroel|lunch | 15:20 | |
*** mazzy50988129295808594948 is now known as mazzy5098812929580859494 | 15:43 | |
*** marios is now known as marios|out | 15:44 | |
*** mazzy50988129295808594940 is now known as mazzy5098812929580859494 | 15:57 | |
*** dviroel|lunch is now known as dviroel | 16:29 | |
*** ysandeep|rover is now known as ysandeep|out | 16:30 | |
corvus | 3 nodepool image builds in a row failed with post_failure; i checked the first and it timed out while pushing to the intermediate registry; i assume the same is true for the other 2 | 16:57 |
corvus | the failed job started pushing at 15:42 and timed out at 16:09 | 16:57 |
clarkb | corvus: I think frickler was looking at that but unsure of progress | 17:00 |
clarkb | looks like frickler was hoping someone else with more knowledge of the setup could look at it next | 17:01 |
corvus | i suspect all the worker threads may be stuck. i don't think we have metrics for that or a sigusr2 handler, so difficult to confirm. | 17:04 |
corvus | #status log restarted zuul-registry since it appeared to be stuck | 17:04 |
fungi | and its logs were basically silent? | 17:09 |
*** jpena is now known as jpena|off | 17:10 | |
corvus | fungi: yeah, a few ssl connection errors over the past few days | 17:34 |
corvus | not even the usual complement of entries from bots/crawlers/etc | 17:34 |
fungi | interesting | 17:37 |
*** rlandy is now known as rlandy|mtg | 18:14 | |
frickler | ah, those were actual nodepool builds failing. I had only been looking at dib failures. but it seems both are repaired now \o/ | 18:59 |
*** rlandy|mtg is now known as rlandy | 19:11 | |
johnsom | Does anyone know what is up with the "capture-performance-data" task "Unknown database 'stats'" ? | 19:20 |
johnsom | https://zuul.opendev.org/t/openstack/build/52a212790e0f4ce3b29b7ed3448b10a8/log/job-output.txt#7847 | 19:20 |
johnsom | Oh, that isn't a zuul task, it's devstack. I will go bug the qa channel. | 19:22 |
johnsom | Ah, that is a side effect of: ERROR: Could not find a version that satisfies the requirement os-brick===5.2.0 | 19:34 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Correct git config item name in mirror-workspace-git-repos https://review.opendev.org/c/zuul/zuul-jobs/+/843023 | 20:13 |
opendevreview | Merged zuul/zuul-jobs master: Correct git config item name in mirror-workspace-git-repos https://review.opendev.org/c/zuul/zuul-jobs/+/843023 | 20:36 |
corvus | infra-root: ^ heads up that touches every job (it passed a base-test cycle, so should be fine) | 20:42 |
fungi | yep, thanks! i was satisfied with the results of the base-test testing | 21:00 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Remove "include:" usage from multi-node-bridge https://review.opendev.org/c/zuul/zuul-jobs/+/843026 | 21:09 |
*** dviroel is now known as dviroel|out | 21:21 | |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Remove "include:" usage from multi-node-bridge https://review.opendev.org/c/zuul/zuul-jobs/+/843026 | 21:24 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Remove "include:" usage from multi-node-bridge https://review.opendev.org/c/zuul/zuul-jobs/+/843026 | 21:37 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Remove "include:" usage from multi-node-bridge https://review.opendev.org/c/zuul/zuul-jobs/+/843026 | 21:44 |
*** rlandy is now known as rlandy|biab | 22:01 | |
corvus | infra-root: i have piloted switching jobs for the zuul project to ansible 5 and fixed two issues that came up. it would probably be good for someone to do similar with some openstack jobs before opendev and/or zuul switches the default version | 22:15 |
clarkb | I've just added ^ this topic to the meeting agenda which I'll send out soon | 22:16 |
opendevreview | Clark Boylan proposed opendev/system-config master: Try system-config-run jobs with ansible v5 https://review.opendev.org/c/opendev/system-config/+/843032 | 22:29 |
opendevreview | James E. Blair proposed openstack/project-config master: Set "zuul" tenant default Ansible version to 5 https://review.opendev.org/c/openstack/project-config/+/843034 | 22:31 |
clarkb | I pushed a change to devstack and one to system-config to start collecting info for those sets of jobs. I expect we'll get decent coverage out of those to start | 22:32 |
corvus | clarkb: depends-on https://review.opendev.org/843026 yeah? | 22:36 |
clarkb | corvus: not yet, but just noticed I need to | 22:36 |
corvus | fungi: ^ there are replies to your question on that, if you feel like re-approving it | 22:36 |
corvus | clarkb: ++ | 22:36 |
clarkb | corvus: do we record the ansible version in the job log somewhere to be extra sure? (the failure indicates it is working though) | 22:37 |
corvus | that's in "multinode" so it's going to hit a lot | 22:37 |
opendevreview | Clark Boylan proposed opendev/system-config master: Try system-config-run jobs with ansible v5 https://review.opendev.org/c/opendev/system-config/+/843032 | 22:37 |
corvus | clarkb: heh, yeah so far "red" has been the easiest way to tell :) | 22:37 |
corvus | clarkb: 2022-05-23 22:33:27.907109 | Ansible Version: 2.9.27 | 22:39 |
corvus | clarkb: 2022-05-23 22:31:42.537820 | Ansible Version: 2.12.5 | 22:40 |
corvus | 2.12 == 5 | 22:40 |
corvus | looks like it's just that in job-output.txt; i don't see it in any of the other files | 22:42 |
clarkb | thanks | 22:42 |
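As a side note, since the version string only shows up in job-output.txt (per the two lines corvus quotes above), a quick grep is enough to confirm which Ansible a build ran with:

```sh
# Download job-output.txt for the build and grep for the banner Zuul emits;
# ansible-core 2.12.x corresponds to the Ansible 5 community package,
# 2.9.x to the old default.
grep "Ansible Version" job-output.txt
# 2022-05-23 22:31:42.537820 | Ansible Version: 2.12.5
```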
clarkb | corvus: devstack is hitting https://docs.ansible.com/ansible-core/2.12/user_guide/become.html#risks-of-becoming-an-unprivileged-user | 22:55 |
clarkb | at first glance I'm a bit worried that this is going to affect things more broadly | 22:55 |
clarkb | rc: 1, err: chmod: invalid mode: \u2018A+user:stack:rx:allow\u2019\nTry 'chmod --help' for more information.\n is the error | 22:56 |
clarkb | A+user:stack:rx:allow I'm not even sure how to process that | 22:57 |
clarkb | "POSIX-draft ACL specification. Solaris, maybe others." from the ansible source. Well we aren't solaris and it doesn't work | 23:01 |
clarkb | ok so the issue is we're unprivileged as zuul and trying to do the privileged task of chowning a file to another user? | 23:03 |
clarkb | For whatever reason they allow the solaris case to fall through and send you that error message regardless of the platform you are on :/ | 23:03 |
clarkb | but how did this ever work? | 23:04 |
clarkb | this is quickly feeling like a thread I don't want to pull on right now as it will likely end up like that pip install thread | 23:04 |
corvus | clarkb: i agree, neither of those things make sense (why would this be a problem now and not before; and why would adding the extra thing for solaris cause a failure?) | 23:09 |
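For context on why the error text looks so strange: per the docs page linked above, when becoming an unprivileged user Ansible first tries to grant that user access to its temp files with setfacl, and one of its fallbacks is a Solaris/POSIX-draft style chmod ACL, which GNU chmod on Linux rejects. A rough sketch of the two commands involved (the /tmp path and the stack user are illustrative, not the actual values from the failed job):

```sh
# What Ansible tries first on Linux: a POSIX ACL giving the become user
# (here "stack") read/execute on the temporary module file.
setfacl -m u:stack:rx /tmp/ansible-tmp-example/AnsiballZ_command.py

# The Solaris-style ACL fallback it also attempts; GNU chmod does not
# understand this syntax, hence "chmod: invalid mode: 'A+user:stack:rx:allow'".
chmod A+user:stack:rx:allow /tmp/ansible-tmp-example/AnsiballZ_command.py
```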
*** rlandy|biab is now known as rlandy | 23:18 | |
opendevreview | Merged zuul/zuul-jobs master: Remove "include:" usage from multi-node-bridge https://review.opendev.org/c/zuul/zuul-jobs/+/843026 | 23:47 |
*** rlandy is now known as rlandy|out | 23:50 |