clarkb | openstack.exceptions.ResourceTimeout: Timeout waiting for the server to come up. <- ya I think that the node request may be hitting that and we're just seeing cloudiness | 00:02 |
---|---|---|
clarkb | [e: 17c1c039c9f94bcbaeaaecf81ca119ec] [node_request: 199-0015701818] [node: 0026846253] Launch attempt 1/3 failed on nl01 | 00:02 |
clarkb | yup there it goes, second launch went ready and the job has started | 00:03 |
clarkb | I will attempt to be less impatient while sorting out dinner | 00:04 |
fungi | i can't figure out why this puppet deploy job failed: https://zuul.opendev.org/t/openstack/build/b0e2fecdbfb8477184ea0f7c833d24f8 | 00:22 |
fungi | log on bridge says puppet ended with an exit code of 6 so reported the task failed | 00:22 |
fungi | syslog on logstash01 doesn't say much about the ansible puppeting | 00:22 |
ianw | hrm indeed it does go very quiet | 01:14 |
opendevreview | Merged opendev/system-config master: Update ICLA to reference OpenInfra https://review.opendev.org/c/opendev/system-config/+/813055 | 01:14 |
opendevreview | Merged opendev/system-config master: Update gerritbot-matrix version to include change number in notifications https://review.opendev.org/c/opendev/system-config/+/813040 | 01:14 |
ianw | Error: /Stage[main]/Kibana::Js/Vcsrepo[/opt/kibana/v3.1.2]: Could not evaluate: Execution of '/usr/bin/git fetch --tags origin' returned 1: | 01:18 |
ianw | it looks like some on-disk git trees are not happy updating | 01:19 |
ianw | fsck reports "dangling commit 85ddfd6cfbf337ab6f5408bc23aa1faae93c37cf" | 01:23 |
ianw | error: cannot lock ref 'refs/remotes/origin/ilm/rollup-v2-action': 'refs/remotes/origin/ilm' exists; cannot create 'refs/remotes/origin/ilm/rollup-v2-action' | 01:24 |
ianw | looks like there was a branch called "ilm" and now there's one called "ilm/rollup-v2-action" | 01:25 |
ianw | i did a "git remote prune origin" and a lot of branches went, "ilm" included | 01:26 |
ianw | * [pruned] origin/ilm | 01:27 |
ianw | i feel like this might have fixed it. there is probably something systematic about the puppet and what it's cloning, but i doubt anyone wants to dive too deep into that | 01:27 |
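As context for the fix ianw describes, a minimal sketch of the failure and recovery (repo path taken from the puppet error above; the exact pruned branches will differ):

```bash
# The remote used to have a branch "ilm"; it is now "ilm/rollup-v2-action".
# The stale remote-tracking ref blocks the fetch, because "origin/ilm" cannot
# be both a ref and a directory of refs at the same time.
cd /opt/kibana/v3.1.2
git fetch --tags origin   # fails: cannot lock ref 'refs/remotes/origin/ilm/rollup-v2-action'

# Drop remote-tracking refs that no longer exist on the remote, then retry.
git remote prune origin   # reports "* [pruned] origin/ilm" among others
git fetch --tags origin   # succeeds
```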
fungi | ahh, thanks! i clearly failed to spot that in the log | 01:30 |
fungi | probably a sign it's too late at night for me to be trying to pick apart logs | 01:31 |
ianw | it really doesn't help that it's not prefixed. i feel like we had some sort of output filter change to help with that, but i can't remember | 01:38 |
opendevreview | Merged zuul/zuul-jobs master: ensure-rust: rework global install https://review.opendev.org/c/zuul/zuul-jobs/+/812272 | 01:40 |
*** ysandeep|out is now known as ysandeep | 03:27 | |
*** ysandeep is now known as ysandeep|afk | 04:08 | |
*** ykarel|away is now known as ykarel | 05:15 | |
*** ysandeep|afk is now known as ysandeep | 05:26 | |
opendevreview | Dong Zhang proposed zuul/zuul-jobs master: WIP: Implement role for limiting zuul log file size https://review.opendev.org/c/zuul/zuul-jobs/+/813034 | 05:33 |
opendevreview | Dong Zhang proposed zuul/zuul-jobs master: WIP: Implement role for limiting zuul log file size https://review.opendev.org/c/zuul/zuul-jobs/+/813034 | 05:46 |
opendevreview | Dong Zhang proposed zuul/zuul-jobs master: WIP: Implement role for limiting zuul log file size https://review.opendev.org/c/zuul/zuul-jobs/+/813034 | 05:51 |
opendevreview | Dong Zhang proposed zuul/zuul-jobs master: WIP: Implement role for limiting zuul log file size https://review.opendev.org/c/zuul/zuul-jobs/+/813034 | 06:07 |
frickler | infra-root: seems ubuntu images are failing to build with some cert issue. that would also explain why the devstack failure I'm seeing with local testing doesn't show up in CI yet | 06:27 |
frickler | E: Failed to fetch https://mirror.dfw.rax.opendev.org/ubuntu/dists/bionic/universe/binary-amd64/Packages Certificate verification failed: The certificate is NOT trusted. The certificate chain uses expired certificate. Could not handshake: Error in the certificate verification. [IP: 2001:4800:7819:105:be76:4eff:fe04:9b8a 443] | 06:28 |
frickler | likely some LE fallout still? | 06:28 |
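One way to inspect the served chain from a working host, to compare against what the chroot's older ca-certificates bundle rejects (hostname taken from the error above; a diagnostic sketch, not something the build itself runs):

```bash
openssl s_client -connect mirror.dfw.rax.opendev.org:443 \
    -servername mirror.dfw.rax.opendev.org </dev/null
# Look at the certificate chain and the "Verify return code" at the end.
# A host with a current ca-certificates bundle validates it; a chroot whose
# bundle predates the Let's Encrypt root change trips over the expired
# "DST Root CA X3" cross-sign, which is the failure debootstrap hits.
```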
frickler | yep, the images nodepool uses are 8d old | 06:32 |
frickler | (for bionic and focal, xenial seems unaffected) | 06:32 |
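The image age frickler mentions can be read straight off the nodepool CLI on a builder (a sketch; image name patterns are illustrative):

```bash
nodepool dib-image-list | grep -E 'ubuntu-(xenial|bionic|focal)'
nodepool image-list     | grep -E 'ubuntu-(bionic|focal)'
# The Age column sitting at ~8 days for bionic/focal while other images
# rotate daily points at the builds failing, not the uploads.
```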
opendevreview | OpenStack Proposal Bot proposed openstack/project-config master: Normalize projects.yaml https://review.opendev.org/c/openstack/project-config/+/813127 | 06:45 |
opendevreview | Dong Zhang proposed zuul/zuul-jobs master: WIP: Implement role for limiting zuul log file size https://review.opendev.org/c/zuul/zuul-jobs/+/813034 | 06:46 |
ysandeep | hey folks o/ to me it looks like it's taking longer to get nodes compared to other days.. https://zuul.openstack.org/status#tripleo | 06:47 |
opendevreview | Dong Zhang proposed zuul/zuul-jobs master: WIP: Implement role for limiting zuul log file size https://review.opendev.org/c/zuul/zuul-jobs/+/813034 | 06:57 |
ianw | frickler: hrm, that works for me. but there was something recently about LE changing part of its cert right ... | 07:12 |
ianw | IdenTrust DST Root CA X3 | 07:13 |
frickler | ianw: yes, seems to be something in dib or debootstrap not handling that correctly | 07:15 |
ianw | 2021-10-08 07:12:20.889 | I: Validating ca-certificates 20190110ubuntu1 | 07:15 |
ianw | i guess that debootstrap doesn't use the updates repo... | 07:16 |
ianw | wow that is super annoying | 07:18 |
frickler | ianw: maybe add something with "--extra-suites"? | 07:23 |
frickler | bionic certs are even older | 07:23 |
opendevreview | Ian Wienand proposed openstack/project-config master: nodepool: drop https for ubuntu https://review.opendev.org/c/openstack/project-config/+/813135 | 07:25 |
ianw | frickler: ^ i think that is the sanest solution. otherwise we'd have to do something like a wget of an updates package (that is sure to change anyway) | 07:26 |
opendevreview | Ian Wienand proposed openstack/project-config master: nodepool: drop https for ubuntu https://review.opendev.org/c/openstack/project-config/+/813135 | 07:27 |
ianw | usually it uses gpg signed repos and http. we don't have the signing | 07:29 |
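A rough illustration of the trade-off ianw is describing, not the exact invocation diskimage-builder performs (mirror URL from the error above; the keyring path assumes the stock ubuntu-keyring package):

```bash
# Against the upstream archive, debootstrap normally runs over plain http
# and relies on the gpg-signed Release file for integrity:
debootstrap --variant=minbase \
    --keyring=/usr/share/keyrings/ubuntu-archive-keyring.gpg \
    focal /tmp/chroot http://archive.ubuntu.com/ubuntu

# The region-local mirrors don't carry that signing, so https was doing the
# integrity work instead -- and https breaks while the fresh chroot still
# carries the 2019-era ca-certificates package:
debootstrap --variant=minbase --no-check-gpg \
    focal /tmp/chroot https://mirror.dfw.rax.opendev.org/ubuntu
```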
*** jpena|off is now known as jpena | 07:32 | |
opendevreview | Dong Zhang proposed zuul/zuul-jobs master: WIP: Implement role for limiting zuul log file size https://review.opendev.org/c/zuul/zuul-jobs/+/813034 | 07:32 |
*** odyssey4me is now known as Guest2184 | 07:34 | |
frickler | ianw: yeah, that looks better already. now let's see whether we trigger https://bugs.launchpad.net/cinder/+bug/1946340 with new images :-S | 07:40 |
opendevreview | Dong Zhang proposed zuul/zuul-jobs master: WIP: Implement role for limiting zuul log file size https://review.opendev.org/c/zuul/zuul-jobs/+/813034 | 07:42 |
*** ykarel is now known as ykarel|lunch | 07:44 | |
*** ysandeep is now known as ysandeep|away | 07:52 | |
opendevreview | Dong Zhang proposed zuul/zuul-jobs master: WIP: Implement role for limiting zuul log file size https://review.opendev.org/c/zuul/zuul-jobs/+/813034 | 07:57 |
opendevreview | Dong Zhang proposed zuul/zuul-jobs master: WIP: Implement role for limiting zuul log file size https://review.opendev.org/c/zuul/zuul-jobs/+/813034 | 08:08 |
opendevreview | Ananya proposed opendev/elastic-recheck rdo: Fix ER bot to report back to gerrit with bug/error report https://review.opendev.org/c/opendev/elastic-recheck/+/805638 | 08:12 |
opendevreview | Ananya proposed opendev/elastic-recheck rdo: Fix ER bot to report back to gerrit with bug/error report https://review.opendev.org/c/opendev/elastic-recheck/+/805638 | 08:24 |
opendevreview | Dong Zhang proposed zuul/zuul-jobs master: WIP: Implement role for limiting zuul log file size https://review.opendev.org/c/zuul/zuul-jobs/+/813034 | 08:26 |
opendevreview | Dong Zhang proposed zuul/zuul-jobs master: WIP: Implement role for limiting zuul log file size https://review.opendev.org/c/zuul/zuul-jobs/+/813034 | 08:35 |
opendevreview | Dong Zhang proposed zuul/zuul-jobs master: WIP: Implement role for limiting zuul log file size https://review.opendev.org/c/zuul/zuul-jobs/+/813034 | 08:55 |
frickler | 2021-10-08 08:55:32.531 | Build completed successfully | 08:57 |
opendevreview | Dong Zhang proposed zuul/zuul-jobs master: WIP: Implement role for limiting zuul log file size https://review.opendev.org/c/zuul/zuul-jobs/+/813034 | 09:01 |
fungi | ysandeep|away: daily periodic jobs start at ~06:25, roughly 20 minutes before you commented, and they tend to exhaust our available quota and create a bit of a backlog for node requests. see the graphs here: https://grafana.opendev.org/d/5Imot6EMk/zuul-status | 09:03 |
opendevreview | Dong Zhang proposed zuul/zuul-jobs master: WIP: Implement role for limiting zuul log file size https://review.opendev.org/c/zuul/zuul-jobs/+/813034 | 09:14 |
opendevreview | Dong Zhang proposed zuul/zuul-jobs master: WIP: Implement role for limiting zuul log file size https://review.opendev.org/c/zuul/zuul-jobs/+/813034 | 09:19 |
opendevreview | Dong Zhang proposed zuul/zuul-jobs master: WIP: Implement role for limiting zuul log file size https://review.opendev.org/c/zuul/zuul-jobs/+/813034 | 09:47 |
frickler | this also looks like fallout from the new ubuntu image, pip resolver taking ages trying to install things https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_7fd/813149/1/check/openstack-tox-docs/7fd2bee/job-output.txt | 09:49 |
*** ysandeep|away is now known as ysandeep | 09:54 | |
ysandeep | fungi: thanks! | 09:56 |
opendevreview | Dong Zhang proposed zuul/zuul-jobs master: WIP: Implement role for limiting zuul log file size https://review.opendev.org/c/zuul/zuul-jobs/+/813034 | 10:04 |
fungi | frickler: usually it's correctable by a constraints update | 10:19 |
fungi | does octavia maybe have some unconstrained dependencies? | 10:27 |
opendevreview | Dong Zhang proposed zuul/zuul-jobs master: WIP: Implement role for limiting zuul log file size https://review.opendev.org/c/zuul/zuul-jobs/+/813034 | 10:31 |
fungi | frickler: aha, looking, one of the delays there is bandit, which is intentionally unconstrained | 10:32 |
fungi | that seems to be the main one it complains about taking too long to satisfy | 10:32 |
fungi | oh, i think pylint is another cause there (surfacing through its dep on astroid), same situation though with being unconstrained | 10:34 |
fungi | anyway, the real problem is the kitchen sink approach to having one test-requirements.txt and trying to install it in every tox env even when most of it is unneeded | 10:37 |
fungi | you don't need tempest and bandit and pylint installed to do a docs build | 10:37 |
fungi | also don't need tempest installed to run linters | 10:38 |
fungi | and don't need linters installed to run tempest | 10:38 |
*** ykarel is now known as ykarel|afk | 10:39 | |
fungi | if those jobs used separate sets of deps, this would really be a non-issue because pip would not have to spend so much time rendering a dependency set which satisfies them all | 10:40 |
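A sketch of the split fungi is suggesting (file names are illustrative; exact layout varies per project):

```bash
# Today every tox env effectively does the kitchen-sink install, and the
# deliberately unconstrained entries (bandit, pylint/astroid) force pip's
# resolver into long backtracking:
pip install -c upper-constraints.txt -r requirements.txt -r test-requirements.txt

# Per-purpose requirement files keep each env's dependency set small:
pip install -c upper-constraints.txt -r doc/requirements.txt   # docs build
pip install -r lint-requirements.txt                           # pep8 / pylint / bandit
pip install -c upper-constraints.txt -r test-requirements.txt  # unit / tempest
# (each tox testenv would then point its deps at the matching file)
```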
*** dviroel|out is now known as dviroel | 11:10 | |
*** jpena is now known as jpena|lunch | 11:34 | |
opendevreview | Dong Zhang proposed zuul/zuul-jobs master: WIP: Implement role for limiting zuul log file size https://review.opendev.org/c/zuul/zuul-jobs/+/813034 | 11:36 |
opendevreview | Dong Zhang proposed zuul/zuul-jobs master: WIP: Implement role for limiting zuul log file size https://review.opendev.org/c/zuul/zuul-jobs/+/813034 | 11:45 |
opendevreview | Dong Zhang proposed zuul/zuul-jobs master: WIP: Implement role for limiting zuul log file size https://review.opendev.org/c/zuul/zuul-jobs/+/813034 | 11:57 |
opendevreview | Dong Zhang proposed zuul/zuul-jobs master: WIP: Implement role for limiting zuul log file size https://review.opendev.org/c/zuul/zuul-jobs/+/813034 | 12:09 |
opendevreview | Ananya proposed opendev/elastic-recheck rdo: Fix ER bot to report back to gerrit with bug/error report https://review.opendev.org/c/opendev/elastic-recheck/+/805638 | 12:10 |
*** ykarel|afk is now known as ykarel | 12:11 | |
*** sshnaidm is now known as sshnaidm|afk | 12:17 | |
*** ysandeep is now known as ysandeep|brb | 12:21 | |
*** jpena|lunch is now known as jpena | 12:21 | |
opendevreview | Yuriy Shyyan proposed openstack/project-config master: Disabling inmotion cloud scheduling for upgrades. https://review.opendev.org/c/openstack/project-config/+/813181 | 12:22 |
opendevreview | Dong Zhang proposed zuul/zuul-jobs master: Implement role for limiting zuul log file size https://review.opendev.org/c/zuul/zuul-jobs/+/813034 | 12:32 |
*** ysandeep|brb is now known as ysandeep|afk | 12:34 | |
opendevreview | Merged openstack/project-config master: Disabling inmotion cloud scheduling for upgrades. https://review.opendev.org/c/openstack/project-config/+/813181 | 12:41 |
opendevreview | Dong Zhang proposed zuul/zuul-jobs master: Implement role for limiting zuul log file size https://review.opendev.org/c/zuul/zuul-jobs/+/813034 | 12:58 |
opendevreview | Dong Zhang proposed zuul/zuul-jobs master: Implement role for limiting zuul log file size https://review.opendev.org/c/zuul/zuul-jobs/+/813034 | 13:12 |
*** ysandeep|afk is now known as ysandeep | 13:19 | |
opendevreview | Dong Zhang proposed zuul/zuul-jobs master: Implement role for limiting zuul log file size https://review.opendev.org/c/zuul/zuul-jobs/+/813034 | 13:23 |
opendevreview | Dong Zhang proposed zuul/zuul-jobs master: Implement role for limiting zuul log file size https://review.opendev.org/c/zuul/zuul-jobs/+/813034 | 13:46 |
*** lbragstad_ is now known as lbragstad | 13:51 | |
opendevreview | Dong Zhang proposed zuul/zuul-jobs master: Implement role for limiting zuul log file size https://review.opendev.org/c/zuul/zuul-jobs/+/813034 | 13:59 |
opendevreview | Dong Zhang proposed zuul/zuul-jobs master: Implement role for limiting zuul log file size https://review.opendev.org/c/zuul/zuul-jobs/+/813034 | 14:21 |
opendevreview | Dong Zhang proposed zuul/zuul-jobs master: Implement role for limiting zuul log file size https://review.opendev.org/c/zuul/zuul-jobs/+/813034 | 14:33 |
frickler | infra-root: I'd like to go over the nodes with exim paniclogs and clean them up to reduce the amount of daily mails, any objections to that? | 14:39 |
frickler | from spot checking most seem to stem from when the node was being installed. maybe we should automate some cleaning at the end of the installation | 14:39 |
Clark[m] | frickler: seems fine with me. If you can confirm the contents are from node creation adding a clean step to our launch node tooling is probably a good idea | 14:41 |
fungi | frickler: thanks, that would be great. i keep meaning to do it and keep getting distracted by other emergencies | 14:44 |
fungi | and i agree, when looking at the content of the paniclogs they have generally been about pathological states during server bootstrapping | 14:45 |
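A possible shape for both the one-off cleanup and the launch-time step (paths assume the Debian/Ubuntu exim4 layout; the inventory target is illustrative):

```bash
# One-off sweep across the fleet: show any non-empty paniclog, then remove it.
ansible all -b -m shell \
    -a 'test -s /var/log/exim4/paniclog && cat /var/log/exim4/paniclog; rm -f /var/log/exim4/paniclog'

# Equivalent clean step at the end of launch-node, once the content has been
# confirmed to be bootstrap-time noise:
rm -f /var/log/exim4/paniclog
```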
opendevreview | Dong Zhang proposed zuul/zuul-jobs master: Implement role for limiting zuul log file size https://review.opendev.org/c/zuul/zuul-jobs/+/813034 | 14:52 |
opendevreview | Gonéri Le Bouder proposed zuul/zuul-jobs master: build-container-image: improve the reliabilty https://review.opendev.org/c/zuul/zuul-jobs/+/813203 | 15:17 |
clarkb | Goneri: ^ is that related to the retries you were looking at in the zuul matrix room? | 15:19 |
clarkb | if so I worry that retrying aggressively like that will only make the network less stable | 15:19 |
clarkb | The buildset repository is local to the same cloud as the image builds. Those uploads should be on the most stable network segment we have | 15:20 |
clarkb | This is me thinking out loud here and wondering if we need to look at this from a different angle | 15:21 |
Goneri | 10 times is a bit aggressive indeed. But 2 or 3 would be enough to save us a couple of jobs every week. | 15:30 |
clarkb | right, but only if it doesn't make things worse | 15:31 |
clarkb | what has me concerned is that pushing to the buildset registry should be the most reliable thing we do on the network since it is on the local network | 15:31 |
clarkb | now maybe that means we have local network problems in some clouds or maybe the zuul registry cannot handle the level of writes that are happening etc | 15:31 |
clarkb | but identifying the underlying issue might be appropriate here rather than asking the system to do more. I believe the buildset registry jobs collect logs for the buildset registry, was there anything in there indicating the registry might have been at fault? | 15:32 |
Goneri | In this case, I checked today and we don't have anything in the logs. I understand your point. But it may also be useful to give a job another chance before cancelling everything. | 15:35 |
Goneri | I suggest moving from 10 retries to 3. | 15:35 |
clarkb | right I just want to make sure we're understanding the issue before we simply retry | 15:35 |
clarkb | if this was uploads to docker hub I'd say yes retry a bunch :) | 15:35 |
clarkb | but this is retrying to a local service on the same local network and should be very reliable | 15:36 |
Goneri | overall our failure rate is pretty low: https://dashboard.zuul.ansible.com/t/ansible/builds?job_name=network-ee-build-container-image-stable-2.9&project=ansible-collections/cisco.iosxr | 15:37 |
clarkb | and the buildset registry logs didn't indicate any unexpected trouble there? | 15:38 |
opendevreview | Gonéri Le Bouder proposed zuul/zuul-jobs master: build-container-image: improve the reliabilty https://review.opendev.org/c/zuul/zuul-jobs/+/813203 | 15:38 |
clarkb | if not I guess we can go with the retries | 15:38 |
Goneri | Each of the ~3 times I checked the registry log I didn't see anything suspicious. | 15:39 |
clarkb | ok +2'd | 15:39 |
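For reference, the behaviour agreed on above expressed as a shell sketch, not the role's actual Ansible implementation: a small bounded retry around the push to the buildset registry (image name is illustrative).

```bash
push_with_retry() {
    # 3 attempts with a short pause, instead of the original 10.
    local image=$1 attempts=3 delay=5 i
    for i in $(seq 1 "$attempts"); do
        if docker push "$image"; then
            return 0
        fi
        echo "push of $image failed (attempt $i/$attempts)" >&2
        [ "$i" -lt "$attempts" ] && sleep "$delay"
    done
    return 1
}
# push_with_retry buildset-registry.example:5000/myimage:latest
```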
*** ysandeep is now known as ysandeep|away | 15:43 | |
*** marios is now known as marios|out | 15:55 | |
*** ykarel is now known as ykarel|away | 16:19 | |
*** jpena is now known as jpena|off | 16:30 | |
opendevreview | Clark Boylan proposed opendev/infra-specs master: Spec to deploy Prometheus as a Cacti replacement https://review.opendev.org/c/opendev/infra-specs/+/804122 | 16:58 |
clarkb | now with more node-exporter | 16:58 |
clarkb | corvus: ianw: frickler: fyi since it was your feedback that drove a lot of that change of opinion in my head too | 16:58 |
corvus | clarkb: cool, one question inline; also the jobs failed | 17:18 |
clarkb | corvus: responded and looking into the test issue now | 17:27 |
opendevreview | Clark Boylan proposed opendev/infra-specs master: Spec to deploy Prometheus as a Cacti replacement https://review.opendev.org/c/opendev/infra-specs/+/804122 | 17:29 |
corvus | clarkb: one quick test: does it run on xenial | 17:31 |
corvus | or even trusty | 17:31 |
clarkb | ya that would be a good check, let me see on one of the logstash workers | 17:47 |
clarkb | it runs on xenial | 17:52 |
clarkb | corvus: from what I can see latest glibc's minimal kernel appears to still be 3.2 (which is what the blog post I linked stated it was a while back too) | 18:04 |
clarkb | I believe it will run on trusty as a result, but will work with fungi to double check when he gets back. I don't want to touch any trusty machines without him around :) | 18:04 |
corvus | cool, makes sense to me. | 18:05 |
clarkb | and it's a good thing to sanity check. Thank you for bringing that up as I hadn't even considered the possibility initially | 18:06 |
clarkb | also there is apparently a way to have golang build with its own internal glibc replacement and then you avoid these issues entirely. But reading node-exporter docs they seem to really want glibc proper (hence the glibc-static on rhel/centos requirement in the build section) | 18:07 |
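What "its own internal glibc replacement" most likely refers to is a cgo-disabled build, sketched below; whether node-exporter upstream supports building that way everywhere is exactly the open question, given its docs ask for glibc-static on RHEL/CentOS.

```bash
# A pure-Go (cgo-off) build statically links Go's own resolver and user
# lookup code instead of libc, so the binary carries no glibc version
# requirement at all (the Go runtime's own minimum kernel still applies).
CGO_ENABLED=0 go build -o node_exporter .
file node_exporter   # expect "statically linked"
ldd  node_exporter   # expect "not a dynamic executable"
```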
clarkb | is anyone else having trouble getting to gerrit? | 18:22 |
clarkb | seems i can get to it via both ssh ports but not https? | 18:22 |
clarkb | now it is there again | 18:23 |
clarkb | I'm going to assume local networking trouble if others didn't notice similar | 18:27 |
clarkb | fungi: the mailman3 spec lgtm but I did leave a few notes. Nothing urgent but wanted to make sure you saw them for next week | 18:49 |
frickler | clarkb: fungi: btw. https://review.opendev.org/c/openstack/project-config/+/813135 is what ianw and I tested on nb01 earlier, please have a look at whether you agree with that or would prefer a more secure, but likely more tedious solution | 18:58 |
clarkb | hrm, the reason we don't use proper mirrors for those builds is sometimes our mirror will be behind, then when we push up new images they are ahead and can't install any packages against our mirrors | 19:00 |
frickler | clarkb: iiuc the issue is not using mirrors or not, but not using -updates and thus only getting the ca-certs from the original release | 19:03 |
clarkb | frickler: ya I'm just pointing out why we couldn't point it at official mirrors then verify gpg signatures | 19:04 |
clarkb | its unfortunate | 19:04 |
clarkb | can we do --extra-suites updates? | 19:04 |
frickler | that's the idea I had earlier, before ianw came up with that simpler solution | 19:05 |
frickler | if you want to give that a try, I'd support it | 19:05 |
clarkb | it looks like dib has a DIB_DEBOOTSTRAP_EXTRA_ARGS var we could use to pass in --extra-suites I think | 19:06 |
* frickler won't do much more today | 19:06 | |
clarkb | frickler: ya enjoy your weekend. I'll push up a change that tries to use ^ and we can compare the two I guess | 19:06 |
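The alternative clarkb wants to compare would presumably look roughly like this (untested sketch; assumes the local debootstrap is new enough to know --extra-suites and that dib's debootstrap element forwards DIB_DEBOOTSTRAP_EXTRA_ARGS unchanged):

```bash
export DIB_RELEASE=focal
export DIB_DEBOOTSTRAP_EXTRA_ARGS="--extra-suites=focal-updates"
disk-image-create -o ubuntu-focal ubuntu-minimal vm
```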
clarkb | hrm actually ca-certificates is the same in both base bionic/focal and updates | 19:09 |
clarkb | I wonder instead if they need an updates openssl or gnutls | 19:09 |
clarkb | oh I see ianw notes in the commit message that you can't use the updates repo when doing the initial chroot. That is fun | 19:11 |
clarkb | I +2'd the change but didn't approve it in case there were some better ideas, but think we can approve it if we are stumped | 19:12 |
opendevreview | Clark Boylan proposed opendev/system-config master: Update gitea to 1.15.4 https://review.opendev.org/c/opendev/system-config/+/813243 | 19:13 |
opendevreview | Ananya proposed opendev/elastic-recheck rdo: Fix ER bot to report back to gerrit with bug/error report https://review.opendev.org/c/opendev/elastic-recheck/+/805638 | 20:03 |
opendevreview | Ananya proposed opendev/elastic-recheck rdo: WIP: ER bot with opensearch for upstream https://review.opendev.org/c/opendev/elastic-recheck/+/813250 | 20:10 |
opendevreview | Douglas Viroel proposed zuul/zuul-jobs master: WIP - Add FIPS enable role to multi-node job https://review.opendev.org/c/zuul/zuul-jobs/+/813253 | 20:36 |
ianw | clarkb / frickler: hrm, i may have been hitting https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=991625 ; perhaps that can work but will need some more testing | 21:07 |
ianw | clarkb: i also didn't see a gerrit point release | 21:08 |
*** dviroel is now known as dviroel|afk | 21:21 | |
clarkb | ianw: ya I'm subscribed to their mailing list and they usually post releases there | 22:30 |
clarkb | we'll be fine to deploy what we've got I guess, then we can sneak in a quick restart later to pick up the latest release if it doesn't show up soon | 22:30 |
clarkb | I've just checked gerrit.googlesource.com and no 3.3.7 tag there either | 22:31 |
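A quick way to keep an eye out for the tag without cloning (tag name assumed from the discussion; adjust to whatever actually gets pushed):

```bash
git ls-remote --tags https://gerrit.googlesource.com/gerrit 'refs/tags/v3.3.7*'
# Empty output means the point release hasn't been tagged yet.
```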
clarkb | https://review.opendev.org/c/opendev/system-config/+/813243 is a good one to land on monday to upgrade gitea. Looks like it passed testing happily today, but I'm not sure I'll have the focus at this point before the weekend to watch it should it have problems upgrading in prod | 23:01 |