Friday, 2021-03-26

fungigitea05 has recovered sufficiently that i'm readding it to the pool00:06
fungialso huge thanks to jralbert for helping us work out that this might be the fallout from pathological deployment failure scenarios with openstack-ansible. if it crops up again that may make it easier to work out the cause, and to design around that case00:07
fungiper discussion in #openstack-ansible00:07
*** mlavalle has quit IRC00:16
*** tosky has quit IRC00:26
fungiianw: the other unusual (almost certainly unrelated) situation we had earlier was that four nodes booted in ovh-bhs1 around 03:30 utc which the launcher never returned as completed or rejected (three were filled though had some retries, the fourth failed three tries in a row), so the node requests were still locked by the launcher and the change they were for was blocking other changes in a gate queue for01:04
fungisome 14 hours01:04
*** hamalq has quit IRC01:04
fungirestarting the nodepool-launcher container on nl04 released those locks and allowed the requests to be picked up by another launcher01:05
*** Eighth_Doctor is now known as Conan_Kudo01:32
*** Conan_Kudo is now known as Eighth_Doctor01:34
ianwfungi: huh ... do we suspect zuul changes?01:38
*** bhagyashris has quit IRC01:39
fungiseems fairly unlikely, unless zuul somehow replaced the request and didn't register the completion/rejection01:45
fungithe launcher never logged "Fulfilled node request" for them01:48
fungiwhich i think means it stopped short for some reason01:48
fungigrep 200-0013422532 /var/log/nodepool/launcher-debug.log.2021-03-24_2001:48
fungiyou'll see the last thing it logs is "Node is ready"01:49
fungifor normal fulfilled requests the launcher logs "Fulfilled node request" next01:49
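The check described above (a request that logs "Node is ready" but never "Fulfilled node request") can be run over a whole log file rather than one grep per request. This is a minimal sketch, not part of nodepool: the function name is hypothetical, the request ID pattern is only inferred from the single ID quoted above, and the sample lines are made up to show the idea rather than copied from a real launcher log.

```python
import re

# Assumed shape of a request ID, inferred from the one quoted above
# (200-0013422532); real IDs may differ.
REQUEST_ID = re.compile(r"\b\d{3}-\d{10}\b")


def unfulfilled_requests(lines):
    """Return request IDs that logged "Node is ready" but never
    "Fulfilled node request"."""
    ready = set()
    fulfilled = set()
    for line in lines:
        match = REQUEST_ID.search(line)
        if match is None:
            continue
        request_id = match.group(0)
        if "Node is ready" in line:
            ready.add(request_id)
        elif "Fulfilled node request" in line:
            fulfilled.add(request_id)
    return sorted(ready - fulfilled)


if __name__ == "__main__":
    # Made-up lines; only the two quoted messages are taken from the log.
    sample = [
        "[200-0013422532] Node is ready",
        "[200-0013422533] Node is ready",
        "[200-0013422533] Fulfilled node request",
    ]
    print(unfulfilled_requests(sample))
```

To use it on a real file, pass an open file object, for example `unfulfilled_requests(open(path))`.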
openstackgerritIan Wienand proposed opendev/system-config master: gitea: switch to token auth for project creation  https://review.opendev.org/c/opendev/system-config/+/78288701:54
openstackgerritMerged opendev/zone-opendev.org master: Remove review-dev  https://review.opendev.org/c/opendev/zone-opendev.org/+/78288902:02
openstackgerritMerged opendev/zone-opendev.org master: Add review02.opendev.org  https://review.opendev.org/c/opendev/zone-opendev.org/+/78289302:03
*** hemanth_n has joined #opendev03:22
*** brinzhang has joined #opendev03:52
openstackgerritIan Wienand proposed opendev/system-config master: Add review02.opendev.org  https://review.opendev.org/c/opendev/system-config/+/78318303:53
*** chandan_kumar has joined #opendev03:58
*** chandan_kumar is now known as chkumar|ruck03:58
*** ykarel has joined #opendev03:59
*** tkajinam has quit IRC04:01
*** tkajinam has joined #opendev04:01
*** marios has joined #opendev06:09
*** jralbert has quit IRC06:19
*** sboyron has joined #opendev06:35
*** elod is now known as elod_afk06:45
*** rpittau|afk has quit IRC07:05
*** rpittau|afk has joined #opendev07:06
*** ricolin has joined #opendev07:17
*** dpawlik6 has joined #opendev07:22
*** gothicserpent has quit IRC07:26
*** roman_g has joined #opendev07:39
*** eolivare has joined #opendev07:43
*** amoralej has joined #opendev07:45
*** gothicserpent has joined #opendev07:56
*** fressi has joined #opendev08:02
*** ykarel is now known as ykarel|lunch08:09
*** rpittau|afk is now known as rpittau08:09
*** hashar has joined #opendev08:10
*** lourot has joined #opendev08:18
*** lpetrut has joined #opendev08:28
*** bhagyash- has joined #opendev08:31
*** bhagyash- is now known as bhagyashris08:32
*** fressi has quit IRC08:41
*** fressi has joined #opendev08:42
*** ozzzo has joined #opendev08:45
*** elod_afk is now known as elod08:52
*** jpena|off is now known as jpena08:58
*** andrewbonney has joined #opendev09:02
*** ykarel|lunch is now known as ykarel09:02
*** ysandeep is now known as ysandeep|lunch09:12
*** dtantsur|afk is now known as dtantsur09:17
openstackgerritMoshiur Rahman proposed openstack/diskimage-builder master: Fix: IPA image buidling with OpenSuse.  https://review.opendev.org/c/openstack/diskimage-builder/+/77872309:18
openstackgerritMoshiur Rahman proposed openstack/diskimage-builder master: Fix: IPA image buidling with OpenSuse.  https://review.opendev.org/c/openstack/diskimage-builder/+/77872309:22
*** ysandeep|lunch is now known as ysandeep09:25
*** tosky has joined #opendev09:30
*** eharney has quit IRC09:34
*** whoami-rajat has joined #opendev09:44
*** eharney has joined #opendev09:52
*** ykarel_ has joined #opendev09:54
*** ykarel has quit IRC09:57
*** roman_g has quit IRC10:07
fungi i've got a morning full of errands and won't be around much before 15:00 utc11:29
*** owalsh has quit IRC11:35
*** owalsh has joined #opendev11:55
*** tbarron has joined #opendev11:58
*** cloudnull has quit IRC12:14
*** redrobot has joined #opendev12:16
*** ykarel_ is now known as ykarel12:28
*** cloudnull has joined #opendev12:31
*** jpena is now known as jpena|lunch12:31
*** arxcruz has joined #opendev12:39
openstackgerritJeremy Stanley proposed opendev/system-config master: Clean up Gerrit image builds  https://review.opendev.org/c/opendev/system-config/+/76557712:42
*** hemanth_n has quit IRC12:43
openstackgerritJeremy Stanley proposed opendev/jeepyb master: Bump gerritlib requirement to 0.10.0  https://review.opendev.org/c/opendev/jeepyb/+/76535712:44
*** jpena|lunch is now known as jpena13:31
*** roman_g has joined #opendev13:34
openstackgerritMoshiur Rahman proposed openstack/diskimage-builder master: Fix: IPA image buidling with OpenSuse.  https://review.opendev.org/c/openstack/diskimage-builder/+/77872313:43
openstackgerritMoshiur Rahman proposed openstack/diskimage-builder master: Fix: IPA image buidling with OpenSuse.  https://review.opendev.org/c/openstack/diskimage-builder/+/77872313:48
*** rpittau is now known as rpittau|afk13:50
*** mlavalle has joined #opendev13:58
openstackgerritMoshiur Rahman proposed openstack/diskimage-builder master: Fix: IPA image buidling with OpenSuse.  https://review.opendev.org/c/openstack/diskimage-builder/+/77872314:04
*** fressi has quit IRC14:10
*** brinzhang has quit IRC14:14
*** smcginnis has quit IRC14:24
*** smcginnis has joined #opendev14:26
*** amoralej is now known as amoralej|lunch14:26
*** ysandeep is now known as ysandeep|dinner14:31
openstackgerritMerged opendev/base-jobs master: This is to test the changes made in https://review.opendev.org/c/zuul/zuul-jobs/+/773474  https://review.opendev.org/c/opendev/base-jobs/+/78286414:38
openstackgerritJames E. Blair proposed ttygroup/gertty master: Highlight WIP state in change view  https://review.opendev.org/c/ttygroup/gertty/+/78331514:45
corvusfungi: ^ that bit me too14:46
*** amoralej|lunch is now known as amoralej14:46
fungiooh, thanks!14:47
fungii saw the wip filter change up, but it was also conflicting with one of the other gertty patches i was trying out (i forget which one now)14:48
corvusi think that's an old one14:48
corvusi don't think there's a patch for filtering wip-state changes14:49
fungioh14:49
corvushowever, you can change the default queries i think, so you should be able to do that in the config file if you want14:49
fungiaha, yeah 19000114:49
corvus(i'd rather see them myself -- and then maybe add an indication in the listings that they're wip)14:49
fungiand yes, i'd also rather see wip changes listed, but some way to see they're wip in the change list/query result screen could be cool14:50
openstackgerritMerged ttygroup/gertty master: Add support for searching for hashtags  https://review.opendev.org/c/ttygroup/gertty/+/77808814:50
corvusi'm tempted to put in a 1-char wide column for state, but with states of N,M,W that will be hard to read.  like, you couldn't pick 3 letters harder to differentiate at a glance.  :)14:53
fungimaybe a different kind of row highlighting, similar to how already-reviewed changes are dimmed?14:57
corvusyeah, maybe brown?14:58
corvusfungi: do you run in 80 chars?14:59
fungii do15:00
fungii have a full 16 colors though (if you count black)!15:00
corvusk.  then if i add a new multi-char column, it wouldn't show up for you (80 chars already drops the topic column)15:00
corvus(and branch)15:01
corvusi'll cipher on this some more15:01
fungiyeah, i normally just see number, subject, owner, updated, and single-letter label columns15:01
fungiwhich is plenty for me15:01
fungithe only time the 80 columns becomes a challenge is when an insanely long patch series ends in changes where i can no longer see subjects15:02
fungisomething like how *top tools let you cycle which columns are displayed could be neat, but probably a lot of work15:03
*** cloudnull has quit IRC15:17
*** lpetrut has quit IRC15:19
johnsomHi OpenDev folks. We are seeing some DNS failures on jobs this morning. https://800680eabfb6e9ab62bb-b38def2a49f1e94bd62e4be171bb57bc.ssl.cf5.rackcdn.com/774157/4/check/octavia-v2-dsvm-spare-pool-stable-train/afbceef/job-output.txt15:24
johnsomComplete output from command python setup.py egg_info:\n    Download error on https://pypi.python.org/simple/pbr/: [Errno -3] Temporary failure in name resolution -- Some packages may not be found!15:24
johnsomIt's causing POST_FAILURE on the stackviz pip step15:25
fungijohnsom: thanks for the heads up, digging into it now15:25
johnsomThank you15:26
johnsomAlso, another job in the same check run: 2021-03-26 14:26:59.119795 | controller | E: Failed to fetch https://mirror.regionone.limestone.opendev.org/ubuntu/pool/main/v/vlan/vlan_1.9-3.2ubuntu6_amd64.deb  Unable to connect to mirror.regionone.limestone.opendev.org:https:15:27
johnsomhttps://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_3b1/774157/4/check/octavia-v2-dsvm-scenario-stable-ussuri/3b15d38/job-output.txt15:27
fungia few possibilities... the job is for some reason not querying the local unbound daemon on the node, unbound has died, or opendns/googledns to which unbound is forwarding isn't responding15:27
fungithat dns failure occurred in ovh-bhs1 whereas the mirror connection error was in limestone-regionone, so the two are probably unrelated15:29
johnsomAnother job just failed in that patch check pipeline: Err:170 https://mirror.regionone.limestone.opendev.org/ubuntu bionic-updates/universe amd64 osinfo-db all 0.20180929-1ubuntu0.1 Unable to connect to mirror.regionone.limestone.opendev.org:https:15:31
johnsomhttps://7f0f8e0cf64675d9261c-2b273022a7511a658191230814353cfa.ssl.cf1.rackcdn.com/774157/4/check/octavia-v2-dsvm-scenario-stable-train/01f216f/job-output.txt15:31
johnsomunbound shows "2606:4700:4700::1111#53" as the forwarded DNS server and it got no answer15:34
fungidns lookup on the first one (i'm still looking at that right now, i can only realistically investigate one failure at a time, sorry) happened between 15:18:16 and 15:18:59, and there's an entry in the unbound log for a similar lookup at 15:18:3615:35
johnsomSo either cloudflare has an issue or IPv6 out of that limestone region is having troubles15:35
johnsomYeah, no worries, was just trying to share the information.15:35
*** cloudnull has joined #opendev15:35
fungiwell, as i said, i'm still looking at the first one, and that didn't happen in limestone15:39
fungitrying to get to the bottom of why the lookup through unbound failed. i'm seeing a number of "Verified that unsigned response is INSECURE" messages toward the end of the processing for that query15:40
fungirelated to records for dualstack.python.map.fastly.net15:40
*** chkumar|ruck is now known as raukadah15:41
fungipypi.python.org is presently an alias for dualstack.python.map.fastly.net when i query it from home15:41
fungiyeah, not finding any ds records for either domain15:48
fungiso i expect that's normal15:49
johnsomfastly has a declared issue in Paris: https://status.fastly.com/15:49
johnsomNot sure if that is close/related15:49
fungiinterestingly, ovh bhs1 is in quebec so probably not15:49
fungibut could be. fastly often doesn't direct traffic to nearby endpoints15:50
fungiour mirror server in bhs1 is able to retrieve https://pypi.python.org/simple/pbr/ without error presently, but it's possible the issue is intermittent there too15:51
*** cloudnull has quit IRC15:52
fungiregardless, it does look like unbound in that first failure returned 151.101.248.223 and 2a04:4e42:46::22315:52
fungidifferent addresses than i get when performing a lookup from mirror01.bhs1.ovh.opendev.org right now15:53
fungiunrelated, i wonder why the process-stackviz role is hitting pypi directly instead of going through our pypi proxy-caching "mirror" server there15:54
corvusinfra-root: fyi i found a new gitea-based foss code hosting site and said hello: https://codeberg.org/Codeberg/build-deploy-gitea/pulls/5915:55
*** cloudnull has joined #opendev15:55
fungicorvus: neat! glad to see we're not alone15:56
fungijohnsom: traceroute from bhs1 to 151.101.248.223 shows it going to mae-east, so paris involvement is unlikely15:58
fungirtt is far too low to be anywhere farther15:59
fungiand not enough hops past there, so doubtful it's taking the transatlantic line out of mae-east15:59
johnsomlol,  yeah, transatlantic penalty is pretty obvious16:00
fungiso anyway, i'm a bit baffled. it looks like pip performed a host lookup to unbound which in turn queried external dns and returned valid records, so it seems like pip may be confused about the error (or somehow unbound didn't correctly return the results back through to libc)16:08
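One way to separate "unbound answered but libc did not pass it on" from "the lookup itself failed" is to repeat the lookup through the same libc path pip uses. This is a rough sketch under assumptions: `libc_lookup` is a hypothetical helper, and it uses `socket.getaddrinfo`, which goes through the system resolver. The "Temporary failure in name resolution" message in the job output corresponds to the resolver error `EAI_AGAIN`.

```python
import socket


def libc_lookup(host, port=443, lookup=socket.getaddrinfo):
    """Resolve host via the system resolver and classify the outcome.

    Returns ("ok", [addresses]) on success, ("temporary", message) for
    EAI_AGAIN, and ("failed", message) for any other resolver error.
    The lookup parameter exists only so the function can be tested
    without network access.
    """
    try:
        results = lookup(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        kind = "temporary" if exc.errno == socket.EAI_AGAIN else "failed"
        return kind, exc.strerror
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address for both IPv4 and IPv6.
    addresses = sorted({info[4][0] for info in results})
    return "ok", addresses


if __name__ == "__main__":
    print(libc_lookup("pypi.python.org"))
```

Running this on an affected node while the unbound log shows a successful answer for the same name would point at the path between unbound and libc rather than at upstream DNS.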
*** dwilde has quit IRC16:09
fungion the limestone errors, we've seen intermittent network connectivity issues there, particularly when connecting to the mirror instance which should be on an adjacent network. it's possible we've got more cases of bug 1844712 there16:11
openstackbug 1844712 in OpenStack Security Advisory "RA Leak on tenant network" [Undecided,Incomplete] https://launchpad.net/bugs/184471216:11
fungibut it wouldn't hurt to start collecting the neighbor table from the server periodically and seeing if that's showing new examples16:11
openstackgerritGomathi Selvi Srinivasan proposed zuul/zuul-jobs master: Test changes made using base-test  https://review.opendev.org/c/zuul/zuul-jobs/+/78337816:15
openstackgerritGomathi Selvi Srinivasan proposed zuul/zuul-jobs master: DNM : Test changes made using base-test  https://review.opendev.org/c/zuul/zuul-jobs/+/78337816:17
openstackgerritGomathi Selvi Srinivasan proposed zuul/zuul-jobs master: Test changes made using base-test  https://review.opendev.org/c/zuul/zuul-jobs/+/78337816:19
*** ysandeep|dinner is now known as ysandeep16:24
fungijohnsom: i've got this running in a root screen session on the limestone mirror instance now:16:25
fungiwhile :;do sleep 10;ip -6 ro sh|ts '%Y-%m-%dT%H:%M:%S'|tee -a gateways.log;done16:25
fungiwe can check that against observed connection failures and see if16:25
fungiit had a stray invalid gateway in that timeframe16:25
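Checking the log against a failure window could look roughly like this. It is a sketch under assumptions: gateways.log is assumed to accumulate snapshots (appended, not overwritten), each line is assumed to be the `ts` timestamp followed by one line of `ip -6 route show` output, and the function name and the addresses in the example are hypothetical.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%dT%H:%M:%S"  # same format string passed to ts above


def stray_gateways(lines, expected, start, end):
    """Return (timestamp, gateway) pairs for default routes seen between
    start and end whose gateway is not in the expected set."""
    hits = []
    for line in lines:
        fields = line.split()
        # After the timestamp, a default route line reads
        # "default via <gateway> dev <interface> ...".
        if len(fields) < 4 or fields[1:3] != ["default", "via"]:
            continue
        when = datetime.strptime(fields[0], TS_FORMAT)
        if start <= when <= end and fields[3] not in expected:
            hits.append((when, fields[3]))
    return hits


if __name__ == "__main__":
    # Made-up snapshot lines with placeholder link-local addresses.
    sample = [
        "2021-03-26T16:30:00 default via fe80::1 dev ens3 proto ra metric 100",
        "2021-03-26T16:30:10 default via fe80::dead dev ens3 proto ra metric 100",
    ]
    window = (datetime(2021, 3, 26, 16, 0), datetime(2021, 3, 26, 17, 0))
    print(stray_gateways(sample, {"fe80::1"}, *window))
```

The window would come from the timestamps of the failed apt or pip fetches in the job output; any hit inside it is a candidate stray route.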
johnsomOk, cool, thanks!16:26
openstackgerritGomathi Selvi Srinivasan proposed zuul/zuul-jobs master: Test changes made using base-test  https://review.opendev.org/c/zuul/zuul-jobs/+/78337816:27
fungijohnsom: yeah, that bug is pesky. would be good if we could figure out how to reliably reproduce it so neutron and/or openstack-ansible folks can finally have some hope of working on a fix16:29
fungii say openstack-ansible because both the clouds where we've observed it are deployed with it, no idea if it's actually involved or mere coincidence16:29
fungibut hard to say without a better understanding of what causes it as to whether it's an issue in the deployment, in what neutron configures, or even in one of the lower-level components neutron's relying on at the operating system level16:30
johnsomYeah, I know there have been some conflict issues with neutron routes, etc. in the past. I don't know if there is a way to get changes in those tables logged somehow or not. Short of a script like yours16:32
openstackgerritGomathi Selvi Srinivasan proposed zuul/zuul-jobs master: Test changes made using base-test  https://review.opendev.org/c/zuul/zuul-jobs/+/78337816:34
fungiwell, in this case it's more about filtering i think16:35
fungibasically what we think is happening is that a job node starts errantly leaking route announcements onto the network, and they're arriving at the mirror server even though it's in a separate tenant, so the mirror dutifully adds a new default route to the job node, which not only can't actually route traffic for it, but also completely disappears shortly thereafter, leaving a bogus default route installed on16:37
fungithe mirror until it expires16:37
*** ysandeep is now known as ysandeep|away16:37
fungipossible there's a race in port setup, for example, where during a brief window some packets which should be getting filtered are making it through16:37
*** d34dh0r53 has joined #opendev16:41
*** hamalq has joined #opendev16:43
*** hamalq_ has joined #opendev16:44
*** amoralej is now known as amoralej|off16:45
*** hamalq has quit IRC16:47
*** marios is now known as marios|out17:00
*** ykarel has quit IRC17:00
*** artom has quit IRC17:02
*** marios|out has quit IRC17:09
*** iurygregory has quit IRC17:12
*** iurygregory has joined #opendev17:13
*** artom has joined #opendev17:18
*** dtantsur is now known as dtantsur|afk17:28
*** eolivare has quit IRC17:39
*** d34dh0r53 has quit IRC17:40
*** d34dh0r53 has joined #opendev17:40
*** yoctozepto has quit IRC17:41
*** d34dh0r53 has quit IRC17:42
*** d34dh0r53 has joined #opendev17:43
*** yoctozepto has joined #opendev17:50
*** jpena is now known as jpena|off17:56
*** slaweq has quit IRC17:58
*** slaweq has joined #opendev18:03
*** slaweq has quit IRC18:09
*** diablo_rojo_phon has joined #opendev18:33
*** hashar has quit IRC18:38
*** andrewbonney has quit IRC18:47
*** ralonsoh has quit IRC19:04
*** noonedeadpunk_ has joined #opendev19:17
*** SWAT has quit IRC19:40
*** tkajinam_ has joined #opendev19:40
*** tkajinam has quit IRC19:41
*** slaweq has joined #opendev19:43
openstackgerritGomathi Selvi Srinivasan proposed opendev/base-jobs master: Revert https://review.opendev.org/c/opendev/base-jobs/+/782864  https://review.opendev.org/c/opendev/base-jobs/+/78346820:07
*** whoami-rajat has quit IRC21:30
*** sboyron has quit IRC21:59
*** tosky has quit IRC23:39
