fungi | gitea05 has recovered sufficiently that i'm re-adding it to the pool | 00:06 |
---|---|---|
fungi | also huge thanks to jralbert for helping us work out that this might be the fallout from pathological deployment failure scenarios with openstack-ansible. if it crops up again that may make it easier to work out the cause, and to design around that case | 00:07 |
fungi | per discussion in #openstack-ansible | 00:07 |
*** mlavalle has quit IRC | 00:16 | |
*** tosky has quit IRC | 00:26 | |
fungi | ianw: the other unusual (almost certainly unrelated) situation we had earlier was that four nodes booted in ovh-bhs1 around 03:30 utc which the launcher never returned as completed or rejected (three were fulfilled, though with some retries; the fourth failed three tries in a row), so the node requests were still locked by the launcher and the change they were for was blocking other changes in a gate queue for | 01:04 |
fungi | some 14 hours | 01:04 |
*** hamalq has quit IRC | 01:04 | |
fungi | restarting the nodepool-launcher container on nl04 released those locks and allowed the requests to be picked up by another launcher | 01:05 |
*** Eighth_Doctor is now known as Conan_Kudo | 01:32 | |
*** Conan_Kudo is now known as Eighth_Doctor | 01:34 | |
ianw | fungi: huh ... do we suspect zuul changes? | 01:38 |
*** bhagyashris has quit IRC | 01:39 | |
fungi | seems fairly unlikely, unless zuul somehow replaced the request and didn't register the completion/rejection | 01:45 |
fungi | the launcher never logged "Fulfilled node request" for them | 01:48 |
fungi | which i think means it stopped short for some reason | 01:48 |
fungi | grep 200-0013422532 /var/log/nodepool/launcher-debug.log.2021-03-24_20 | 01:48 |
fungi | you'll see the last thing it logs is "Node is ready" | 01:49 |
fungi | for normal fulfilled requests the launcher logs "Fulfilled node request" next | 01:49 |
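A quick way to check whether a given request made it all the way through, based on the grep and log messages quoted above (the request id and log path are the ones from this incident; paths on other launchers may differ):

    # on the launcher host (nl04 here); a healthy request's trail ends with "Fulfilled node request"
    grep 200-0013422532 /var/log/nodepool/launcher-debug.log.2021-03-24_20 | tail -n 3
    # if the last entry is only "Node is ready", the request likely stalled
    # and is still holding its lock, matching the case described above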
openstackgerrit | Ian Wienand proposed opendev/system-config master: gitea: switch to token auth for project creation https://review.opendev.org/c/opendev/system-config/+/782887 | 01:54 |
openstackgerrit | Merged opendev/zone-opendev.org master: Remove review-dev https://review.opendev.org/c/opendev/zone-opendev.org/+/782889 | 02:02 |
openstackgerrit | Merged opendev/zone-opendev.org master: Add review02.opendev.org https://review.opendev.org/c/opendev/zone-opendev.org/+/782893 | 02:03 |
*** hemanth_n has joined #opendev | 03:22 | |
*** brinzhang has joined #opendev | 03:52 | |
openstackgerrit | Ian Wienand proposed opendev/system-config master: Add review02.opendev.org https://review.opendev.org/c/opendev/system-config/+/783183 | 03:53 |
*** chandan_kumar has joined #opendev | 03:58 | |
*** chandan_kumar is now known as chkumar|ruck | 03:58 | |
*** ykarel has joined #opendev | 03:59 | |
*** tkajinam has quit IRC | 04:01 | |
*** tkajinam has joined #opendev | 04:01 | |
*** marios has joined #opendev | 06:09 | |
*** jralbert has quit IRC | 06:19 | |
*** sboyron has joined #opendev | 06:35 | |
*** elod is now known as elod_afk | 06:45 | |
*** rpittau|afk has quit IRC | 07:05 | |
*** rpittau|afk has joined #opendev | 07:06 | |
*** ricolin has joined #opendev | 07:17 | |
*** dpawlik6 has joined #opendev | 07:22 | |
*** gothicserpent has quit IRC | 07:26 | |
*** roman_g has joined #opendev | 07:39 | |
*** eolivare has joined #opendev | 07:43 | |
*** amoralej has joined #opendev | 07:45 | |
*** gothicserpent has joined #opendev | 07:56 | |
*** fressi has joined #opendev | 08:02 | |
*** ykarel is now known as ykarel|lunch | 08:09 | |
*** rpittau|afk is now known as rpittau | 08:09 | |
*** hashar has joined #opendev | 08:10 | |
*** lourot has joined #opendev | 08:18 | |
*** lpetrut has joined #opendev | 08:28 | |
*** bhagyash- has joined #opendev | 08:31 | |
*** bhagyash- is now known as bhagyashris | 08:32 | |
*** fressi has quit IRC | 08:41 | |
*** fressi has joined #opendev | 08:42 | |
*** ozzzo has joined #opendev | 08:45 | |
*** elod_afk is now known as elod | 08:52 | |
*** jpena|off is now known as jpena | 08:58 | |
*** andrewbonney has joined #opendev | 09:02 | |
*** ykarel|lunch is now known as ykarel | 09:02 | |
*** ysandeep is now known as ysandeep|lunch | 09:12 | |
*** dtantsur|afk is now known as dtantsur | 09:17 | |
openstackgerrit | Moshiur Rahman proposed openstack/diskimage-builder master: Fix: IPA image buidling with OpenSuse. https://review.opendev.org/c/openstack/diskimage-builder/+/778723 | 09:18 |
openstackgerrit | Moshiur Rahman proposed openstack/diskimage-builder master: Fix: IPA image buidling with OpenSuse. https://review.opendev.org/c/openstack/diskimage-builder/+/778723 | 09:22 |
*** ysandeep|lunch is now known as ysandeep | 09:25 | |
*** tosky has joined #opendev | 09:30 | |
*** eharney has quit IRC | 09:34 | |
*** whoami-rajat has joined #opendev | 09:44 | |
*** eharney has joined #opendev | 09:52 | |
*** ykarel_ has joined #opendev | 09:54 | |
*** ykarel has quit IRC | 09:57 | |
*** roman_g has quit IRC | 10:07 | |
fungi | i've got a morning full of errands and won't be around much before 15:00 utc | 11:29 |
*** owalsh has quit IRC | 11:35 | |
*** owalsh has joined #opendev | 11:55 | |
*** tbarron has joined #opendev | 11:58 | |
*** cloudnull has quit IRC | 12:14 | |
*** redrobot has joined #opendev | 12:16 | |
*** ykarel_ is now known as ykarel | 12:28 | |
*** cloudnull has joined #opendev | 12:31 | |
*** jpena is now known as jpena|lunch | 12:31 | |
*** arxcruz has joined #opendev | 12:39 | |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Clean up Gerrit image builds https://review.opendev.org/c/opendev/system-config/+/765577 | 12:42 |
*** hemanth_n has quit IRC | 12:43 | |
openstackgerrit | Jeremy Stanley proposed opendev/jeepyb master: Bump gerritlib requirement to 0.10.0 https://review.opendev.org/c/opendev/jeepyb/+/765357 | 12:44 |
*** jpena|lunch is now known as jpena | 13:31 | |
*** roman_g has joined #opendev | 13:34 | |
openstackgerrit | Moshiur Rahman proposed openstack/diskimage-builder master: Fix: IPA image buidling with OpenSuse. https://review.opendev.org/c/openstack/diskimage-builder/+/778723 | 13:43 |
openstackgerrit | Moshiur Rahman proposed openstack/diskimage-builder master: Fix: IPA image buidling with OpenSuse. https://review.opendev.org/c/openstack/diskimage-builder/+/778723 | 13:48 |
*** rpittau is now known as rpittau|afk | 13:50 | |
*** mlavalle has joined #opendev | 13:58 | |
openstackgerrit | Moshiur Rahman proposed openstack/diskimage-builder master: Fix: IPA image buidling with OpenSuse. https://review.opendev.org/c/openstack/diskimage-builder/+/778723 | 14:04 |
*** fressi has quit IRC | 14:10 | |
*** brinzhang has quit IRC | 14:14 | |
*** smcginnis has quit IRC | 14:24 | |
*** smcginnis has joined #opendev | 14:26 | |
*** amoralej is now known as amoralej|lunch | 14:26 | |
*** ysandeep is now known as ysandeep|dinner | 14:31 | |
openstackgerrit | Merged opendev/base-jobs master: This is to test the changes made in https://review.opendev.org/c/zuul/zuul-jobs/+/773474 https://review.opendev.org/c/opendev/base-jobs/+/782864 | 14:38 |
openstackgerrit | James E. Blair proposed ttygroup/gertty master: Highlight WIP state in change view https://review.opendev.org/c/ttygroup/gertty/+/783315 | 14:45 |
corvus | fungi: ^ that bit me too | 14:46 |
*** amoralej|lunch is now known as amoralej | 14:46 | |
fungi | ooh, thanks! | 14:47 |
fungi | i saw the wip filter change up, but it was also conflicting with one of the other gertty patches i was trying out (i forget which one now) | 14:48 |
corvus | i think that's an old one | 14:48 |
corvus | i don't think there's a patch for filtering wip-state changes | 14:49 |
fungi | oh | 14:49 |
corvus | however, you can change the default queries i think, so you should be able to do that in the config file if you want | 14:49 |
fungi | aha, yeah 190001 | 14:49 |
corvus | (i'd rather see them myself -- and then maybe add an indication in the listings that they're wip) | 14:49 |
fungi | and yes, i'd also rather see wip changes listed, but some way to see they're wip in the change list/query result screen could be cool | 14:50 |
openstackgerrit | Merged ttygroup/gertty master: Add support for searching for hashtags https://review.opendev.org/c/ttygroup/gertty/+/778088 | 14:50 |
corvus | i'm tempted to put in a 1-char wide column for state, but with states of N,M,W that will be hard to read. like, you couldn't pick 3 letters harder to differentiate at a glance. :) | 14:53 |
fungi | maybe a different kind of row highlighting, similar to how already-reviewed changes are dimmed? | 14:57 |
corvus | yeah, maybe brown? | 14:58 |
corvus | fungi: do you run in 80 chars? | 14:59 |
fungi | i do | 15:00 |
fungi | i have a full 16 colors though (if you count black)! | 15:00 |
corvus | k. then if i add a new multi-char column, it wouldn't show up for you (80 chars already drops the topic column) | 15:00 |
corvus | (and branch) | 15:01 |
corvus | i'll cipher on this some more | 15:01 |
fungi | yeah, i normally just see number, subject, owner, updated, and single-letter label columns | 15:01 |
fungi | which is plenty for me | 15:01 |
fungi | the only time the 80-column width becomes a challenge is when an insanely long patch series ends in changes where i can no longer see the subjects | 15:02 |
fungi | something like what *top tools have, where you can cycle through which columns are displayed, could be neat, but probably a lot of work | 15:03 |
*** cloudnull has quit IRC | 15:17 | |
*** lpetrut has quit IRC | 15:19 | |
johnsom | Hi OpenDev folks. We are seeing some DNS failures on jobs this morning. https://800680eabfb6e9ab62bb-b38def2a49f1e94bd62e4be171bb57bc.ssl.cf5.rackcdn.com/774157/4/check/octavia-v2-dsvm-spare-pool-stable-train/afbceef/job-output.txt | 15:24 |
johnsom | Complete output from command python setup.py egg_info:\n Download error on https://pypi.python.org/simple/pbr/: [Errno -3] Temporary failure in name resolution -- Some packages may not be found! | 15:24 |
johnsom | It's causing POST_FAILURE on the stackviz pip step | 15:25 |
fungi | johnsom: thanks for the heads up, digging into it now | 15:25 |
johnsom | Thank you | 15:26 |
johnsom | Also, another job in the same check run: 2021-03-26 14:26:59.119795 | controller | E: Failed to fetch https://mirror.regionone.limestone.opendev.org/ubuntu/pool/main/v/vlan/vlan_1.9-3.2ubuntu6_amd64.deb Unable to connect to mirror.regionone.limestone.opendev.org:https: | 15:27 |
johnsom | https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_3b1/774157/4/check/octavia-v2-dsvm-scenario-stable-ussuri/3b15d38/job-output.txt | 15:27 |
fungi | a few possibilities... the job is for some reason not querying the local unbound daemon on the node, unbound has died, or opendns/googledns to which unbound is forwarding isn't responding | 15:27 |
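A rough way to distinguish those three possibilities from a node in that state, assuming the usual tools (pgrep, dig) are present:

    pgrep -a unbound                                   # is the local resolver still running?
    cat /etc/resolv.conf                               # is the node actually pointed at 127.0.0.1?
    dig +short pypi.python.org @127.0.0.1              # does the local unbound answer?
    dig +short pypi.python.org @2606:4700:4700::1111   # does an upstream forwarder answer directly?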
fungi | that dns failure occurred in ovh-bhs1 whereas the mirror connection error was in limestone-regionone, so the two are probably unrelated | 15:29 |
johnsom | Another job just failed in that patch check pipeline: Err:170 https://mirror.regionone.limestone.opendev.org/ubuntu bionic-updates/universe amd64 osinfo-db all 0.20180929-1ubuntu0.1 Unable to connect to mirror.regionone.limestone.opendev.org:https: | 15:31 |
johnsom | https://7f0f8e0cf64675d9261c-2b273022a7511a658191230814353cfa.ssl.cf1.rackcdn.com/774157/4/check/octavia-v2-dsvm-scenario-stable-train/01f216f/job-output.txt | 15:31 |
johnsom | unbound shows "2606:4700:4700::1111#53" as the forwarded DNS server and it got no answer | 15:34 |
fungi | dns lookup on the first one (i'm still looking at that right now, i can only realistically investigate one failure at a time, sorry) happened between 15:18:16 and 15:18:59, and there's an entry in the unbound log for a similar lookup at 15:18:36 | 15:35 |
johnsom | So either cloudflare has an issue or IPv6 out of that limestone region is having troubles | 15:35 |
johnsom | Yeah, no worries, was just trying to share the information. | 15:35 |
*** cloudnull has joined #opendev | 15:35 | |
fungi | well, as i said, i'm still looking at the first one, and that didn't happen in limestone | 15:39 |
fungi | trying to get to the bottom of why the lookup through unbound failed. i'm seeing a number of "Verified that unsigned response is INSECURE" messages toward the end of the processing for that query | 15:40 |
fungi | related to records for dualstack.python.map.fastly.net | 15:40 |
*** chkumar|ruck is now known as raukadah | 15:41 | |
fungi | pypi.python.org is presently an alias for dualstack.python.map.fastly.net when i query it from home | 15:41 |
fungi | yeah, not finding any ds records for either domain | 15:48 |
fungi | so i expect that's normal | 15:49 |
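For reference, the absence of DS records (and hence why an INSECURE result is expected rather than a DNSSEC validation failure) can be confirmed with something like:

    dig +short DS pypi.python.org                   # empty output: no DS published
    dig +short DS dualstack.python.map.fastly.net   # likewise for the fastly name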
johnsom | fastly has a declared issue in Paris: https://status.fastly.com/ | 15:49 |
johnsom | Not sure if that is close/related | 15:49 |
fungi | interestingly, ovh bhs1 is in quebec so probably not | 15:49 |
fungi | but could be. fastly often doesn't direct traffic to nearby endpoints | 15:50 |
fungi | our mirror server in bhs1 is able to retrieve https://pypi.python.org/simple/pbr/ without error presently, but it's possible the issue is intermittent there too | 15:51 |
*** cloudnull has quit IRC | 15:52 | |
fungi | regardless, it does look like unbound in that first failure returned 151.101.248.223 and 2a04:4e42:46::223 | 15:52 |
fungi | different addresses than i get when performing a lookup from mirror01.bhs1.ovh.opendev.org right now | 15:53 |
fungi | unrelated, i wonder why the process-stackviz role is hitting pypi directly instead of going through our pypi proxy-caching "mirror" server there | 15:54 |
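For context, jobs that do go through the proxy typically have pip pointed at the regional mirror; a minimal sketch using the mirror name mentioned below (the /pypi/simple/ path is the usual convention but may differ):

    export PIP_INDEX_URL=https://mirror01.bhs1.ovh.opendev.org/pypi/simple/
    pip install pbr    # now resolved through the caching proxy instead of pypi.python.org directly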
corvus | infra-root: fyi i found a new gitea-based foss code hosting site and said hello: https://codeberg.org/Codeberg/build-deploy-gitea/pulls/59 | 15:55 |
*** cloudnull has joined #opendev | 15:55 | |
fungi | corvus: neat! glad to see we're not alone | 15:56 |
fungi | johnsom: traceroute from bhs1 to 151.101.248.223 shows it going to mae-east, so paris involvement is unlikely | 15:58 |
fungi | rtt is far too low to be anywhere farther | 15:59 |
fungi | and not enough hops past there, so doubtful it's taking the transatlantic line out of mae-east | 15:59 |
johnsom | lol, yeah, transatlantic penalty is pretty obvious | 16:00 |
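A convenient way to see the path and per-hop round-trip times in one report, assuming mtr is available on the mirror (RTTs in the tens of milliseconds rule out a transatlantic leg, which would add roughly 70ms or more):

    mtr -n -r -c 10 151.101.248.223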
fungi | so anyway, i'm a bit baffled. it looks like pip performed a host lookup to unbound which in turn queried external dns and returned valid records, so it seems like pip may be confused about the error (or somehow unbound didn't correctly return the results back through to libc) | 16:08 |
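One rough check to separate those two cases on an affected node is to compare the libc/NSS path (what pip uses) with a direct query to the resolver:

    getent hosts pypi.python.org            # resolves via libc/NSS, the same path pip takes
    dig +short pypi.python.org @127.0.0.1   # queries unbound directly, bypassing libc
    # if dig answers but getent does not, the problem sits between libc and unbound
    # rather than with the upstream forwarders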
*** dwilde has quit IRC | 16:09 | |
fungi | on the limestone errors, we've seen intermittent network connectivity issues there, particularly when connecting to the mirror instance which should be on an adjacent network. it's possible we've got more cases of bug 1844712 there | 16:11 |
openstack | bug 1844712 in OpenStack Security Advisory "RA Leak on tenant network" [Undecided,Incomplete] https://launchpad.net/bugs/1844712 | 16:11 |
fungi | but it wouldn't hurt to start collecting the neighbor table from the server periodically and seeing if that's showing new examples | 16:11 |
openstackgerrit | Gomathi Selvi Srinivasan proposed zuul/zuul-jobs master: Test changes made using base-test https://review.opendev.org/c/zuul/zuul-jobs/+/783378 | 16:15 |
openstackgerrit | Gomathi Selvi Srinivasan proposed zuul/zuul-jobs master: DNM : Test changes made using base-test https://review.opendev.org/c/zuul/zuul-jobs/+/783378 | 16:17 |
openstackgerrit | Gomathi Selvi Srinivasan proposed zuul/zuul-jobs master: Test changes made using base-test https://review.opendev.org/c/zuul/zuul-jobs/+/783378 | 16:19 |
*** ysandeep|dinner is now known as ysandeep | 16:24 | |
fungi | johnsom: i've got this running in a root screen session on the limestone mirror instance now: | 16:25 |
fungi | while :;do sleep 10;ip -6 ro sh|ts '%Y-%m-%dT%H:%M:%S'|tee gateways.log;done | 16:25 |
fungi | we can check that against observed connection failures and see if | 16:25 |
fungi | it had a stray invalid gateway in that timeframe | 16:25 |
johnsom | Ok, cool, thanks! | 16:26 |
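A variant of that loop which also snapshots the neighbor table (mentioned earlier) and appends to the file rather than keeping only the most recent snapshot, as a sketch:

    while :; do
      sleep 10
      { ip -6 route show; ip -6 neighbor show; } | ts '%Y-%m-%dT%H:%M:%S' >> gateways.log
    done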
openstackgerrit | Gomathi Selvi Srinivasan proposed zuul/zuul-jobs master: Test changes made using base-test https://review.opendev.org/c/zuul/zuul-jobs/+/783378 | 16:27 |
fungi | johnsom: yeah, that bug is pesky. would be good if we could figure out how to reliably reproduce it so neutron and/or openstack-ansible folks can finally have some hope of working on a fix | 16:29 |
fungi | i say openstack-ansible because both the clouds where we've observed it are deployed with it, no idea if it's actually involved or mere coincidence | 16:29 |
fungi | but without a better understanding of what causes it, it's hard to say whether it's an issue in the deployment, in what neutron configures, or even in one of the lower-level components neutron relies on at the operating system level | 16:30 |
johnsom | Yeah, I know there have been some conflict issues with neutron routes, etc. in the past. I don't know if there is a way to get changes in those tables logged somehow or not. Short of a script like yours | 16:32 |
openstackgerrit | Gomathi Selvi Srinivasan proposed zuul/zuul-jobs master: Test changes made using base-test https://review.opendev.org/c/zuul/zuul-jobs/+/783378 | 16:34 |
fungi | well, in this case it's more about filtering i think | 16:35 |
fungi | basically what we think is happening is that a job node starts errantly leaking route announcements onto the network, and they're arriving at the mirror server even though it's in a separate tenant, so the mirror dutifully adds a new default route to the job node, which not only can't actually route traffic for it, but also completely disappears shortly thereafter, leaving a bogus default route installed on | 16:37 |
fungi | the mirror until it expires | 16:37 |
*** ysandeep is now known as ysandeep|away | 16:37 | |
fungi | possible there's a race in port setup, for example, where during a brief window some packets which should be getting filtered are making it through | 16:37 |
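One common way to design around such leaks on the receiving side (not necessarily what was done here, and assuming the mirror's IPv6 address and default gateway are statically configured; eth0 is a placeholder for its interface name):

    sysctl -w net.ipv6.conf.eth0.accept_ra=0           # ignore router advertisements entirely
    # or, less drastic: keep processing RAs but never install a default route learned from them
    sysctl -w net.ipv6.conf.eth0.accept_ra_defrtr=0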
*** d34dh0r53 has joined #opendev | 16:41 | |
*** hamalq has joined #opendev | 16:43 | |
*** hamalq_ has joined #opendev | 16:44 | |
*** amoralej is now known as amoralej|off | 16:45 | |
*** hamalq has quit IRC | 16:47 | |
*** marios is now known as marios|out | 17:00 | |
*** ykarel has quit IRC | 17:00 | |
*** artom has quit IRC | 17:02 | |
*** marios|out has quit IRC | 17:09 | |
*** iurygregory has quit IRC | 17:12 | |
*** iurygregory has joined #opendev | 17:13 | |
*** artom has joined #opendev | 17:18 | |
*** dtantsur is now known as dtantsur|afk | 17:28 | |
*** eolivare has quit IRC | 17:39 | |
*** d34dh0r53 has quit IRC | 17:40 | |
*** d34dh0r53 has joined #opendev | 17:40 | |
*** yoctozepto has quit IRC | 17:41 | |
*** d34dh0r53 has quit IRC | 17:42 | |
*** d34dh0r53 has joined #opendev | 17:43 | |
*** yoctozepto has joined #opendev | 17:50 | |
*** jpena is now known as jpena|off | 17:56 | |
*** slaweq has quit IRC | 17:58 | |
*** slaweq has joined #opendev | 18:03 | |
*** slaweq has quit IRC | 18:09 | |
*** diablo_rojo_phon has joined #opendev | 18:33 | |
*** hashar has quit IRC | 18:38 | |
*** andrewbonney has quit IRC | 18:47 | |
*** ralonsoh has quit IRC | 19:04 | |
*** noonedeadpunk_ has joined #opendev | 19:17 | |
*** SWAT has quit IRC | 19:40 | |
*** tkajinam_ has joined #opendev | 19:40 | |
*** tkajinam has quit IRC | 19:41 | |
*** slaweq has joined #opendev | 19:43 | |
openstackgerrit | Gomathi Selvi Srinivasan proposed opendev/base-jobs master: Revert https://review.opendev.org/c/opendev/base-jobs/+/782864 https://review.opendev.org/c/opendev/base-jobs/+/783468 | 20:07 |
*** whoami-rajat has quit IRC | 21:30 | |
*** sboyron has quit IRC | 21:59 | |
*** tosky has quit IRC | 23:39 |