ianw | there is something going on here | 00:07 |
---|---|---|
fungi | agreed, icmp !h is pretty much always a firewall rejecting something | 00:09 |
ianw | http://paste.openstack.org/show/794339/ | 00:09 |
ianw | that is traces of both sides between linaro-us mirror and afs01.dfw | 00:10 |
clarkb | these are udp packets right? one weird thing with udp that happens in some firewalls is it treats them as stateful connections but then if a ping isn't ponged recently enough that state goes away and the return traffic is killed | 00:10 |
ianw | so i can setup a ping <-> the two hosts and it won't drop a single packet | 00:10 |
fungi | not really, no | 00:16 |
ianw | PING 104.130.138.161 (104.130.138.161) 65500(65528) bytes of data. | 00:16 |
ianw | [1591229777.453607] 65508 bytes from 104.130.138.161: icmp_seq=1 ttl=54 time=6.12 ms | 00:16 |
fungi | firewalls usually set up a mock state based on source and destination port and address, but a separate ping won't force them to maintain that fake state | 00:17 |
ianw | i mean it doesn't seem to be mtu, those two hosts are fine with big packets and the mtu is only 1450 | 00:17 |
fungi | often it's overloaded state tracking tables choosing to sacrifice udp flows | 00:18 |
clarkb | fungi: sorry, not a literal ping. Just a response to a packet in the opposite direction | 00:18 |
ianw | 139.178.85.143 > afs01.dfw.openstack.org: ICMP host 139.178.85.143 unreachable - admin prohibited, length 102 | 00:18 |
ianw | 92.168.1.4 > 104.130.138.161: ICMP host 192.168.1.4 unreachable - admin prohibited, length 102 | 00:18 |
clarkb | if outbound happens and then inbound is 10 minutes later the stateful bits are gone | 00:18 |
clarkb | it's all tied to ip:port tuples | 00:19 |
fungi | or if the state table is under extreme pressure and expires that state way sooner | 00:19 |
*** _mlavalle_1 has joined #opendev | 00:19 | |
ianw | who is rejecting who? | 00:19 |
fungi | 92.168.1.4 is responding to 104.130.138.161 to say that 192.168.1.4 is unreachable. i guess that's a nat destination? | 00:20 |
openstackgerrit | Merged opendev/system-config master: Restart apache on graphite when LE updates certs https://review.opendev.org/733247 | 00:21 |
ianw | sorry, dropped a 1 there | 00:21 |
fungi | oh... | 00:22 |
ianw | yeah, 192.168.1.4 is the mirror ip address, which is a floating ip | 00:22 |
ianw | 139.178.85.143 | 00:23 |
fungi | in that case it's either a packet filter *on* 192.168.1.4 rejecting a packet from 104.130.138.161 with a !h response *or* it's something between 192.168.1.4 and 104.130.138.161 which has responded on behalf of 192.168.1.4 and spoofed its source address | 00:23 |
*** mlavalle has quit IRC | 00:23 | |
ianw | but we see the reject *on* the mirror node, right? | 00:23 |
fungi | both scenarios are common enough | 00:23 |
ianw | 00:02:05.099258 IP 192.168.1.4 > 104.130.138.161: ICMP host 192.168.1.4 unreachable - admin prohibited, length 102 | 00:24 |
fungi | oh, is this seen in a packet capture made on the mirror? | 00:24 |
ianw | that seen on mirror.regionone... says that *it* must have sent back the reject? | 00:25 |
fungi | i didn't catch all of the scrollback | 00:25 |
ianw | http://paste.openstack.org/show/794339/ has a trace from both sides | 00:25 |
fungi | and yeah, in that case i would take it to mean the mirror is indeed the origin of that rejection | 00:25 |
ianw | the firewall rules are the same afaics as every other mirror | 00:25 |
fungi | "from server to client" is a tcpdump run in a shell on the server? or on the client? | 00:26 |
ianw | server to client is tcpdump run on afs01.dfw with host of mirror.regionone.linaro-us | 00:27 |
ianw | tcpdump -i eth0 host 139.178.85.143 on afs01 | 00:28 |
clarkb | I think we log dropped packets in syslog | 00:29 |
fungi | sorry, i'm a bit braindead at this time of night and the trace would have been easier to follow if -n had been used. the mix of sometimes resolved sometimes not addresses is more than i can keep in my head right now | 00:30 |
ianw | i've run bidirectional pings for like 10+ minutes between them and not one dropped packet | 00:30 |
fungi | it's possible the icmp !h response from 192.168.1.4 is not coming from the kernel/iptables, it could be coming from the service listening for 51838/udp datagrams | 00:32 |
fungi | if it were just the kernel tcp/ip stack refusing the datagram because there was no longer anything to pass it to, i would have expected port unreachable rather than host unreachable | 00:33 |
fungi | but also the "admin prohibited" opt is pretty much only ever used by firewalling software | 00:35 |
fungi | and our iptables rules do that | 00:36 |
fungi | REJECT all -- anywhere anywhere reject-with icmp-host-prohibited | 00:36 |
fungi | so it *could* be we're somehow getting responses that iptables can't match to an existing tracked state | 00:36 |
ianw | http://paste.openstack.org/show/794340/ | 00:36 |
ianw | here it is with -n on both sides | 00:36 |
fungi | 51838 doesn't appear to be a udp port we're allowing, so it's only getting responses thanks to state tracking entries | 00:37 |
fungi | so in this case i feel like the icmp host unreachable admin prohibited packets are being generated in response to something which iptables lacks a rule or an existing state for | 00:37 |
ianw | i'll try adding some drop logging ... | 00:39 |
ianw | arrggghhhh damn it i seem to have cut myself off | 00:40 |
ianw | damn it, i'm going to have to reboot it | 00:41 |
ianw | http://paste.openstack.org/show/794341/ looked reasonable to me | 00:41 |
ianw | guess what ... after a reboot ... none of it is happening | 00:50 |
ianw | at least i have some traces if it happens again | 00:51 |
*** diablo_rojo has quit IRC | 00:58 | |
*** Meiyan has joined #opendev | 00:59 | |
*** xiaolin has joined #opendev | 01:25 | |
ianw | https://tarballs.opendev.org/openstack/openstack-zuul-jobs/aarch64/ ... interesting, it almost got there but not quite it seems | 01:36 |
ianw | oh ... i guess artifact retrieval isn't recursive for directories? | 01:43 |
*** _mlavalle_1 has quit IRC | 01:51 | |
*** xiaolin has quit IRC | 01:53 | |
ianw | do the executors have wget? | 02:27 |
ianw | i just wrote a recursive download but then realised it probably doesn't work :/ | 02:28 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: [wip] download-artifact : recursive download https://review.opendev.org/733425 | 02:38 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: [wip] download-artifact : recursive download https://review.opendev.org/733425 | 02:43 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: [wip] download-artifact : recursive download https://review.opendev.org/733425 | 02:45 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: [wip] download-artifact : recursive download https://review.opendev.org/733425 | 02:49 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: [wip] download-artifact : recursive download https://review.opendev.org/733425 | 02:50 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: [wip] download-artifact : recursive download https://review.opendev.org/733425 | 03:01 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: [wip] download-artifact : recursive download https://review.opendev.org/733425 | 03:09 |
*** auristor has joined #opendev | 03:18 | |
*** xiaolin has joined #opendev | 03:25 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: [wip] download-artifact : recursive download https://review.opendev.org/733425 | 03:39 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: [wip] download-artifact : recursive download https://review.opendev.org/733425 | 03:50 |
*** sgw has quit IRC | 04:00 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: [wip] download-artifact : recursive download https://review.opendev.org/733425 | 04:00 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: [wip] download-artifact : recursive download https://review.opendev.org/733425 | 04:17 |
*** ykarel|away is now known as ykarel | 04:31 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: [wip] download-artifact : recursive download https://review.opendev.org/733425 | 04:51 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: download-artifact : support recursive download https://review.opendev.org/733425 | 05:03 |
*** sgw has joined #opendev | 05:12 | |
AJaeger | ianw: working now? ;) | 05:14 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: download-artifact : support recursive download https://review.opendev.org/733425 | 05:14 |
AJaeger | ianw: could you push a change to remove project-config-build-openafs-centos* from openstack-zuul-jobs, please? It should depend on https://review.opendev.org/733049 | 05:15 |
ianw | AJaeger: that bit is almost, yes :) after all this i still haven't actually got the openafs bits published | 05:15 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: download-artifact : support recursive download https://review.opendev.org/733425 | 05:18 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: download-artifact : support recursive download https://review.opendev.org/733425 | 05:20 |
AJaeger | ianw: left some comments on an older patchset, you're fixing too quickly ;) | 05:20 |
AJaeger | ianw: I think https://review.opendev.org/#/c/733425/14 still applies, please check | 05:21 |
ianw | sorry yeah i *thought* it was no longer wip but then i started fiddling with it :) | 05:22 |
ianw | AJaeger: yeah it will expire; that was something i need to add a comment about. i'm not sure what we can do ... | 05:22 |
AJaeger | ianw: it's fine. | 05:24 |
ianw | it will basically mean that when the role gets updated (and the test runs) you'll need to choose some new artifact to pull to test against | 05:25 |
AJaeger | ianw: want to add the test a separate change stacked on top? Then we can discuss merging the first - and consider what to do with the expired content? | 05:25 |
*** ysandeep|away is now known as ysandeep | 05:25 | |
ianw | the test shouldn't run unless the actual bits of the download-artifacts role have changed (file matcher) ... so it's not like it will be gate breaking | 05:26 |
AJaeger | indeed | 05:26 |
AJaeger | config-core, please review https://review.opendev.org/732150 and https://review.opendev.org/731989 | 05:27 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: download-artifact : support recursive download https://review.opendev.org/733425 | 05:30 |
ianw | AJaeger: oh, 731989 was one i didn't quite understand | 05:33 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: download-artifact : support recursive download https://review.opendev.org/733425 | 05:42 |
AJaeger | ianw: 731989 is a partial revert | 05:43 |
openstackgerrit | Merged zuul/zuul-jobs master: Document twine_executable https://review.opendev.org/732150 | 05:43 |
AJaeger | ianw: I introduced the wheel bindep in https://review.opendev.org/#/c/731630/ to use it in the requirements repository's bindep.txt, but when I tried to add it, it did not work since the content is used for generate-constraints as well. So, I dropped that idea | 05:44 |
AJaeger | ianw: does that explanation help? | 05:45 |
ianw | ahh ok | 05:49 |
*** jaicaa has quit IRC | 05:51 | |
ianw | hrmm, really not sure why 733425 fails in post | 05:52 |
*** jaicaa has joined #opendev | 05:54 | |
AJaeger | that looks indeed odd | 05:54 |
hrw | morning | 06:00 |
hrw | mnaser: please add arm64 jobs and use check-arm64 pipeline. when we get close to the limit it will be a sign that it is used, so it will be easier to request more resources | 06:01 |
*** gtema has joined #opendev | 06:02 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: download-artifact : support recursive download https://review.opendev.org/733425 | 06:13 |
*** gtema has quit IRC | 06:14 | |
*** gtema has joined #opendev | 06:15 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: download-artifact : support recursive download https://review.opendev.org/733425 | 06:25 |
ianw | AJaeger: I think because i used base-minimal to reduce anything else running with the test | 06:32 |
*** redrobot has quit IRC | 06:39 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: download-artifact: cleanup long when statement https://review.opendev.org/733446 | 06:47 |
ianw | AJaeger: https://review.opendev.org/#/c/733425/ passes (and is not wip now :) and https://review.opendev.org/733446 follow-on with the cleanup you mentioned | 06:48 |
*** gtema has quit IRC | 06:57 | |
*** gtema has joined #opendev | 06:59 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: download-artifact: cleanup long when statement https://review.opendev.org/733446 | 07:00 |
*** hashar has joined #opendev | 07:08 | |
AJaeger | thanks, ianw ! | 07:09 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: download-artifact: cleanup long when statement https://review.opendev.org/733446 | 07:13 |
zbr | ianw: https://review.opendev.org/#/c/731591/2 please, thanks. | 07:41 |
*** tosky has joined #opendev | 07:46 | |
*** rpittau|afk is now known as rpittau | 07:50 | |
*** moppy has quit IRC | 08:01 | |
*** moppy has joined #opendev | 08:01 | |
openstackgerrit | Merged opendev/irc-meetings master: Fix the policy popup team meeting id https://review.opendev.org/733298 | 08:06 |
*** dtantsur|afk is now known as dtantsur | 08:13 | |
*** DSpider has joined #opendev | 08:19 | |
*** xiaolin has quit IRC | 08:20 | |
*** ysandeep is now known as ysandeep|lunch | 08:20 | |
*** roman_g has joined #opendev | 08:23 | |
openstackgerrit | Pierre Riteau proposed openstack/project-config master: Add Backport-Candidate label for Kolla deliverables https://review.opendev.org/733243 | 08:43 |
*** tkajinam has quit IRC | 08:48 | |
openstackgerrit | Pierre Riteau proposed openstack/project-config master: Add Backport-Candidate label for Kolla deliverables https://review.opendev.org/733243 | 08:54 |
*** priteau has joined #opendev | 08:57 | |
*** xiaolin has joined #opendev | 09:03 | |
*** ysandeep|lunch is now known as ysandeep | 09:03 | |
yoctozepto | morning folks | 09:05 |
yoctozepto | could anyone advise on https://review.opendev.org/705547 - it merged long time ago but it does not seem the buttons work :/ could it be that gerrit did not actually like the sequence gap in there? | 09:06 |
frickler | yoctozepto: humm, I'll take a look | 09:10 |
frickler | yoctozepto: all the documentation I can find talks about ranges for labels. as the actual value doesn't seem to really matter, I'd propose to rewrite that to use 0..1 instead of 0,2 | 09:27 |
yoctozepto | frickler: or +0..+2, we might want to use important from time to time (though likely rarely), we don't really need only -1 as it never happens (branch freeze) | 09:43 |
yoctozepto | frickler: wdyt? | 09:43 |
openstackgerrit | Radosław Piliszek proposed openstack/project-config master: Fix devstack's review-priority label https://review.opendev.org/733513 | 09:46 |
yoctozepto | frickler: done according to the docs ^ | 09:46 |
*** Meiyan has quit IRC | 09:49 | |
*** roman_g has quit IRC | 09:56 | |
*** jesusaur has quit IRC | 09:56 | |
*** dpawlik has quit IRC | 09:57 | |
*** kevinz has quit IRC | 09:57 | |
*** dpawlik3 has joined #opendev | 09:57 | |
*** jesusaur has joined #opendev | 09:57 | |
*** rpittau is now known as rpittau|bbl | 10:02 | |
*** dpawlik3 has quit IRC | 10:05 | |
*** dpawlik3 has joined #opendev | 10:05 | |
*** gtema has quit IRC | 10:15 | |
*** ysandeep is now known as ysandeep|afk | 10:51 | |
*** gtema has joined #opendev | 11:01 | |
*** ravsingh has joined #opendev | 11:21 | |
*** slaweq has joined #opendev | 11:24 | |
openstackgerrit | Felix Edel proposed zuul/zuul-jobs master: Return upload_results in upload-logs-swift role https://review.opendev.org/733564 | 11:24 |
slaweq | fungi: hi | 11:25 |
slaweq | fungi: I don't know if You remember but some time ago I was asking here about possibility of pinging some external resource in neutron tests | 11:25 |
slaweq | fungi: You pointed me to something which is in NODEPOOL_MIRROR_HOST | 11:26 |
slaweq | so I did patch https://review.opendev.org/#/c/730766/2 | 11:26 |
slaweq | but it don | 11:26 |
slaweq | but it doesn't work as expected | 11:26 |
slaweq | in all jobs I have error like: | 11:26 |
slaweq | Failed to ping IP: mirror.us-east.openedge.opendev.org via a ssh connection from: 172.24.5.112 | 11:26 |
slaweq | fungi: can You take a look and tell me if I choose wrong variable/host? or should I look for the issue somewhere in the test itself? | 11:27 |
*** donnyd_ has quit IRC | 11:33 | |
*** donnyd_ has joined #opendev | 11:33 | |
*** donnyd_ has quit IRC | 11:34 | |
*** rpittau|bbl is now known as rpittau | 11:35 | |
*** donnyd_ has joined #opendev | 11:35 | |
*** donnyd_ has quit IRC | 11:35 | |
*** donnyd_ has joined #opendev | 11:36 | |
*** donnyd_ has quit IRC | 11:36 | |
*** donnyd43 has joined #opendev | 11:39 | |
*** donnyd43 has left #opendev | 11:39 | |
openstackgerrit | Felix Edel proposed zuul/zuul-jobs master: Return upload_results in upload-logs-swift role https://review.opendev.org/733564 | 11:41 |
*** donnyd_ has joined #opendev | 11:42 | |
*** ysandeep|afk is now known as ysandeep | 11:43 | |
*** donnyd_ is now known as donnyd | 11:44 | |
frickler | slaweq: I don't think devstack usually sets up external connectivity for instances, why do you expect that to work? also like we should move this to either -qa or -neutron | 11:47 |
frickler | s/like/I think/ , /me wonders what my brain did there | 11:48 |
cgoncalves | is there a depth limit in which Zuul stops merging job dict variables? | 11:51 |
cgoncalves | I'm defining a job that should have merged its vars with vars in the parent's parent job | 11:51 |
cgoncalves | "An unhandled exception occurred while templating '{{ grenade_devstack_localrc.shared|combine(grenade_devstack_localrc.old) }}'. Error was a <class 'ansible.errors.AnsibleFilterError'>, original message: |combine expects dictionaries, got AnsibleUndefined" | 11:51 |
slaweq | frickler: we have in neutron-tempest-plugin a test that checks connectivity to the external resource but it's disabled on u/s gate for now | 11:52 |
slaweq | I would like to enable it as it was written due to some real bug which we had in the past with such kind of connectivity | 11:52 |
slaweq | so some time ago I asked here if that would be fine to e.g. ping 8.8.8.8 from our tests and fungi pointed me then to this zuul mirror | 11:53 |
frickler | slaweq: yeah, turns out my assumption about devstack was wrong. maybe DNS inside the instance isn't working. did you test with 8.8.8.8 before? or maybe 172.24.5.1 which should be the ip of the devstack host. if you trigger another check, let me know and I'll set up a hold in order to dig deeper | 12:00 |
mordred | cgoncalves: no, there shouldn't be | 12:24 |
openstackgerrit | Radosław Piliszek proposed openstack/project-config master: Fix devstack's review-priority label https://review.opendev.org/733513 | 12:25 |
cgoncalves | mordred, ok, that's good. would you be able to help me understand if that is not what is happening in https://review.opendev.org/#/c/733262/ ? | 12:25 |
cgoncalves | https://zuul.opendev.org/t/openstack/build/a72284e4e3334da9ab544596a85b59fa/log/job-output.txt#2207 | 12:26 |
cgoncalves | grenade_devstack_localrc.old is defined in https://opendev.org/openstack/grenade/src/branch/master/.zuul.yaml#L51 | 12:26 |
mordred | cgoncalves: that looks all correct to me ... | 12:29 |
mordred | cgoncalves: maybe clarkb or corvus will see something when they're up | 12:29 |
cgoncalves | mordred, thanks | 12:30 |
*** dpawlik6 has joined #opendev | 12:33 | |
frickler | cgoncalves: it looks like overriding grenade_devstack_localrc will mask any inheritance, so you'd need to define .old and .new yourself, too | 12:34 |
*** tobiash_ has joined #opendev | 12:35 | |
*** hrww has joined #opendev | 12:35 | |
*** owalsh_ has joined #opendev | 12:36 | |
cgoncalves | frickler, right, although Zuul documentation says Zuul merges dict variables through job inheritance: https://docs.openstack.org/devstack/latest/zuul_ci_jobs_migration.html#job-variables | 12:36 |
*** bolg has quit IRC | 12:36 | |
*** SotK has quit IRC | 12:36 | |
*** abhishekk has quit IRC | 12:36 | |
*** dpawlik3 has quit IRC | 12:36 | |
*** owalsh has quit IRC | 12:36 | |
*** tobiash has quit IRC | 12:36 | |
*** ysandeep has quit IRC | 12:36 | |
*** elod has quit IRC | 12:36 | |
*** paladox has quit IRC | 12:36 | |
*** hrw has quit IRC | 12:36 | |
*** hrww is now known as hrw | 12:36 | |
*** elod_ has joined #opendev | 12:37 | |
*** ysandeep has joined #opendev | 12:37 | |
*** SotK has joined #opendev | 12:37 | |
*** paladox has joined #opendev | 12:37 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Run iptables in service playbooks instead of base https://review.opendev.org/730999 | 12:44 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Rename service-letsencrypt to just letsencrypt https://review.opendev.org/731617 | 12:44 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Stop cloning drupal puppet modules https://review.opendev.org/731947 | 12:44 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Split inventory into multiple dirs and move hostvars https://review.opendev.org/730991 | 12:44 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Override bridge hostvars directly https://review.opendev.org/731258 | 12:44 |
mordred | cgoncalves: oh - I wonder if your hostvars are the thing breaking | 12:49 |
mordred | cgoncalves: "Host and group variables override variables with the same name defined as global variables." | 12:49 |
mordred | cgoncalves: if that's it, this might be a case where having old and new and shared as keys under grenade_devstack_localrc might be making it hard to use per-host overrides, and maybe we should get the grenade stuff updated to have grenade_devstack_localrc_old grenade_devstack_localrc_new and grenade_devstack_localrc_shared - so that you can do per-host overrides without breaking inheritance | 12:51 |
cgoncalves | mordred, good catch! | 12:51 |
cgoncalves | tosky, ^ | 12:52 |
tosky | ouch | 12:54 |
*** sshnaidm is now known as sshnaidm|mtg | 12:59 | |
*** tkajinam has joined #opendev | 13:02 | |
tosky | cgoncalves: wouldn't that break anyway, because you would not inherit from the base grenade_devstack_localrc_shared in your job? | 13:04 |
*** Guest10631 has joined #opendev | 13:04 | |
*** Guest10631 is now known as redrobot | 13:05 | |
tosky | in controller and controller2 | 13:05 |
*** mlavalle has joined #opendev | 13:06 | |
*** slaweq has quit IRC | 13:07 | |
*** slaweq_ has joined #opendev | 13:07 | |
*** roman_g has joined #opendev | 13:13 | |
*** elod_ is now known as elod | 13:17 | |
cgoncalves | tosky, yeah. could we consider different var names and then ansible | combine() them? | 13:18 |
tosky | different how? No matter how they are defined, if you override them in host-vars.controller, you would need to rewrite all the values anyway | 13:19 |
cgoncalves | tosky, if vars are named differently, it wouldn't be an override | 13:20 |
tosky | I don't follow, sorry | 13:21 |
tosky | if we switch to grenade_devstack_localrc_shared and grenade_devstack_localrc_old and ..._new, that instruction would be | 13:21 |
tosky | '{{ grenade_devstack_localrc_shared|combine(grenade_devstack_localrc_old) }}' | 13:22 |
tosky | you would have grenade_devstack_localrc_old, but you are overriding grenade_devstack_localrc_shared in host-vars.controller | 13:22 |
tosky | and if I understand it correctly, that means not using the base variable, without the values for, say, DATABASE_PASSWORD and so on | 13:23 |
cgoncalves | tosky, it's far from ideal but my thinking is: keep existing var names and introduce new ones like suggested (grenade_devstack_localrc_{shared,old,new}) so folks can use in {host,group}-vars and, when defined, combine in the grenade playbook | 13:23 |
*** rpittau is now known as rpittau|brb | 13:24 | |
tosky | so combine in addition? | 13:25 |
tosky | that make things a bit complicated: when do you override one or the other? | 13:26 |
tosky | I feel that it needs still a more clear solution | 13:26 |
cgoncalves | +1 | 13:26 |
*** slaweq_ is now known as slaweq | 13:28 | |
fungi | slaweq: the error message from that "ping test" indicates it's "pinging" with an ssh connection... does it expect to authenticate successfully? how/where is that test implemented? | 13:35 |
slaweq | fungi: that may be true, I will check that test | 13:41 |
slaweq | thx a lot | 13:41 |
fungi | i suspect it's designed for pinging other nested instances in devstack | 13:45 |
fungi | so may expect to have viable ssh keys and even maybe to be able to run some minimal/no-op command in a shell on them | 13:45 |
fungi | something using actual icmp echo ping may be a better fit for your purpose | 13:46 |
*** hashar has quit IRC | 13:56 | |
*** rpittau|brb is now known as rpittau | 14:01 | |
zbr | do we have an example of a single build stage Dockerfile for our python-builder? the two stage Dockerfile is a PITA for development. It is good for producing final images, but not while you are developing, it is very slow. | 14:03 |
mordred | zbr: that hasn't been my experience - docker does a really good job of caching the builds for me | 14:05 |
mordred | which one are you hacking on? | 14:05 |
mordred | zbr: (and no - the python-builder is explicitly designed for multi-stage build dockerfiles) | 14:05 |
corvus | cgoncalves, tosky, mordred: how about this? grenade_devstack_localrc, grenade_devstack_localrc_group, grenade_devstack_localrc_host. define those in zuul as global, group, and host vars respectively. then in ansible combine() them at the last minute. that lets you take advantage of zuul's internal combining and bypassing ansible's override. | 14:05 |
mordred | corvus: not a bad idea | 14:06 |
tosky | corvus: so keep the .shared, .new and .old items for each of them? | 14:06 |
corvus | tosky: yeah, i think they should continue to merge as expected | 14:06 |
corvus | (the way i look at it, they're really a second axis: old/new (where the first axis is controller/subnode)) | 14:07 |
mordred | yeah - so you put grenade_devstack_localrc_group in group_vars and grenade_devstack_localrc_host in hostvars | 14:07 |
mordred | yeah | 14:07 |
* mordred is just a "yeah" bot | 14:07 | |
tosky | ack - and you don't plan any other top-level variable type like host-vars and groups which may potentially override again the base variables, do you? | 14:07 |
tosky | :) | 14:07 |
mordred | tosky: not until ansible makes one :) | 14:07 |
tosky | oh, right, that comes from ansible | 14:08 |
tosky | thanks for the idea | 14:08 |
tosky | cgoncalves: just to know that it may take some time for me to do that | 14:08 |
*** gtema has quit IRC | 14:09 | |
*** gtema has joined #opendev | 14:10 | |
corvus | fwiw, i would probably start by setting up a local inventory (with localhost playing the part of 2 hosts) and verify that i can get all the ansible vars combining as i expected first | 14:10 |
corvus | (you can add the same host to the inventory twice with separate names for testing things like this) | 14:10 |
corvus | then move on to zuul | 14:10 |
*** tobiash_ is now known as tobiash | 14:23 | |
hrw | morning | 14:26 |
hrw | where can I see content of /etc/pip.conf provided for CI jobs? | 14:27 |
*** hashar has joined #opendev | 14:36 | |
clarkb | hrw: the template is in the configure mirrors role. If you want the actual per job content you can fetch it as part of your logs | 14:56 |
clarkb | hrw https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/configure-mirrors/templates/etc/pip.conf.j2 | 14:58 |
zbr | mordred: not sure from which infra project I took it but I was working on e-r trans to container. | 15:00 |
mordred | zbr: nod | 15:00 |
zbr | two step build works correctly, it's not its fault. it's just that it is inconvenient for local hacking | 15:01 |
mordred | nod. and yeah - the copy . /tmp/src is a bummer since it invalidates all of the build cache | 15:03 |
hrw | clarkb: thanks | 15:07 |
clarkb | slaweq: fungi also, was a second layer of NAT configured to make the ip routable external to the host? | 15:08 |
fungi | i do not know, but that's certainly worth double-checking | 15:09 |
*** ykarel is now known as ykarel|away | 15:37 | |
cgoncalves | corvus, that sounds good to me | 15:37 |
cgoncalves | tosky, sure, I understand. thank you | 15:37 |
*** lpetrut has joined #opendev | 15:48 | |
*** factor has quit IRC | 15:59 | |
*** lpetrut has quit IRC | 16:03 | |
*** tkajinam has quit IRC | 16:05 | |
openstackgerrit | James E. Blair proposed opendev/system-config master: Stop using backend hostname in zuul testinfra tests https://review.opendev.org/733409 | 16:14 |
openstackgerrit | James E. Blair proposed opendev/system-config master: Recombine zuul testinfra tests into one file https://review.opendev.org/733413 | 16:14 |
corvus | mordred: did you see the chat between clarkb and i about 730999 (iptables testing) yesterday? | 16:15 |
mordred | corvus: reading now | 16:16 |
mordred | corvus: ah - yeah. I like your reorg - although I'm not sure I fully follow how it helps avoid the fail-open | 16:19 |
mordred | since we could still just not match right? | 16:19 |
corvus | mordred: separately, i think you lost my fix for the ansible-lint error in 731258 when you pushed up the %ip fix; but it turns out my fix had a problem anyway (i used the wrong quotes -- now *that* is something i would expect a linter to catch, but it did not). i'm about to push up a new patchset which should fix it right. you might want to grab that locally. | 16:20 |
openstackgerrit | James E. Blair proposed opendev/system-config master: Override bridge hostvars directly https://review.opendev.org/731258 | 16:20 |
corvus | mordred: ^ that's the lint fix | 16:20 |
corvus | mordred: yeah (also, note that clarkb preferred the multiple files version) | 16:20 |
corvus | mordred: (so we might stop there and not do the recombine patch) | 16:20 |
mordred | corvus: ++ | 16:20 |
mordred | yeah - I think I like the multiple files version | 16:21 |
corvus | mordred: re not failing open -- agreed, the main thing is that with either of these, we will have a definitive result for each test. so if we're not matching, we'll see "SKIPPED" | 16:21 |
corvus | which isn't going to fail the change, but at least if we look, we'll know. | 16:21 |
corvus | right now, if we look at 730999, it just says it passed the test even though there's a part of the test that should have failed, and we can't tell if that part ran | 16:22 |
mordred | corvus: I DO NOT UNDERSTAND WHY THE LINTER DOES NOT LIKE WHAT YOU FIXED | 16:22 |
mordred | we set the fact above unconditionally :( | 16:23 |
corvus | mordred: agreed | 16:23 |
corvus | re 999: with the reorg, we'll either get SKIPPED in which case we know there's an issue with the hostname matching (did we typo the hostname?), FAILED in which case there's apparently just an issue with using host.backend.get_hostname() and we can carry on, or SUCCESS in which case something about the test is fundamentally broken | 16:24 |
mordred | ++ | 16:24 |
fungi | galaxy express 999? | 16:25 |
openstackgerrit | James E. Blair proposed opendev/system-config master: Stop running ansible-lint on this repo https://review.opendev.org/733406 | 16:25 |
corvus | fungi: will take you on a journey, a never ending journey | 16:25 |
fungi | i'm buying my ticket right now | 16:25 |
clarkb | mordred: corvus re the linter, I wonder if it's a parsing problem like unmatched ' ? | 16:29 |
clarkb | basically ansible applying some layer of parsing above yaml that causes it to be undefined somehow | 16:30 |
clarkb | are we using _'s? | 16:30 |
clarkb | iirc ansible has said no - or no _ I can't recall which is good and which is bad anymore though | 16:30 |
zbr | interesting to note the number of subjective arguments around the subject of linting | 16:30 |
clarkb | I think it's no more - because - is a symbol in python that means a thing | 16:30 |
*** rpittau is now known as rpittau|afk | 16:31 | |
*** dtantsur is now known as dtantsur|afk | 16:33 | |
zbr | my impression is that the complaints are related to https://github.com/ansible/ansible-lint/issues/776 which was fixed in the last version. | 16:34 |
mordred | it looks like it wasn't fixed in py3 | 16:39 |
mordred | according to the last comment | 16:39 |
fungi | but also, for this repository in particular, we wind up spending more time fighting that linter than we benefit from bugs it identifies which wouldn't be caught by our other tests | 16:40 |
fungi | (if we're committing ansible to the opendev/system-config repository which is not exercised, then that's a problem in itself) | 16:41 |
*** ravsingh has quit IRC | 16:49 | |
*** priteau has quit IRC | 16:49 | |
*** ysandeep is now known as ysandeep|away | 16:53 | |
*** sshnaidm|mtg is now known as sshnaidm|afk | 16:59 | |
*** gtema has quit IRC | 17:06 | |
*** hashar has quit IRC | 17:07 | |
*** priteau has joined #opendev | 17:11 | |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Stop using backend hostname in zuul testinfra tests https://review.opendev.org/733409 | 17:13 |
mordred | corvus: there was a missing import ^^ | 17:14 |
corvus | mordred: drat, thx | 17:14 |
corvus | mordred: i was just looking at 731258 which is green | 17:14 |
corvus | i wanted to triple check that it's working as expected | 17:14 |
mordred | corvus: so - in good news - the test did run on zuul01 - so we know that matching worked | 17:14 |
*** priteau has quit IRC | 17:15 | |
corvus | mordred: iiuc no "zuul" in https://98f17a7a28d5343bcff7-cb207f5accf30a643a07da583fd48b42.ssl.cf1.rackcdn.com/731258/20/check/system-config-run-base/7cb6206/bridge.openstack.org/ara-report/result/6cde835a-b4ac-469b-88fb-791b983b752b/ means success on 731258, right? | 17:17 |
mordred | corvus: yes. that is the goal of the extra_users: [] | 17:19 |
mordred | corvus: so I think that means \o/ | 17:20 |
corvus | cool, i think it'd be good to see 733409 fail for the right reason then succeed; then maybe we're good to push through the stack? | 17:22 |
mordred | corvus: I think so | 17:23 |
mordred | and then I think we're in a much more solid place | 17:24 |
*** gtema has joined #opendev | 17:37 | |
*** gtema has quit IRC | 17:41 | |
openstackgerrit | Sorin Sbarnea (zbr) proposed zuul/zuul-jobs master: test-playbooks: avoid warnings with shell/command https://review.opendev.org/731605 | 17:55 |
openstackgerrit | Jeremy Stanley proposed opendev/system-config master: Forward user-committee ML to openstack-discuss https://review.opendev.org/733673 | 18:11 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: WIP: Add terraform jobs https://review.opendev.org/733675 | 18:18 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: WIP: Add terraform roles https://review.opendev.org/733675 | 18:30 |
*** gtema has joined #opendev | 18:31 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: WIP: Add terraform roles https://review.opendev.org/733675 | 18:32 |
*** gtema has quit IRC | 18:35 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: WIP: Add terraform roles https://review.opendev.org/733675 | 18:40 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: WIP: Add terraform roles https://review.opendev.org/733675 | 18:48 |
*** roman_g has quit IRC | 18:50 | |
clarkb | yoctozepto: fyi I -1'd https://review.opendev.org/#/c/733243/4 because gerrit basically. Other than that I think the change is fine if yall want to respin it | 18:58 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: WIP: Add terraform roles https://review.opendev.org/733675 | 19:01 |
openstackgerrit | Merged openstack/project-config master: Stop to use the __future__ module. https://review.opendev.org/732911 | 19:07 |
openstackgerrit | Merged openstack/project-config master: Fix devstack's review-priority label https://review.opendev.org/733513 | 19:07 |
openstackgerrit | Merged openstack/project-config master: Remove periodic openafs jobs https://review.opendev.org/733049 | 19:10 |
clarkb | that ^ is removal of centos package builds for openafs (not our wheels or other periodic things) | 19:11 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: WIP: Add terraform roles https://review.opendev.org/733675 | 19:13 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: WIP: Add terraform roles https://review.opendev.org/733675 | 19:17 |
mordred | corvus: it does not seem to have failed properly | 19:38 |
mordred | corvus: my hunch is that util.get_ips is returning an empty list | 19:40 |
mordred | (or an empty set rather) | 19:43 |
*** hashar has joined #opendev | 19:44 | |
corvus | aha, that would do it | 19:45 |
mordred | corvus: but ... I cannot explain why it would do that | 19:46 |
mordred | we're doing multi-node-hosts-file | 19:46 |
mordred | so I expect ze01.opendev.org to be in /etc/hosts | 19:46 |
mordred | (we are not collecting the /etc/hosts file) | 19:47 |
corvus | does the lookup we're doing consult that, or only dns? | 19:47 |
corvus | hrm, should use /etc/hosts -- at least, it does locally | 19:48 |
mordred | yeah. same for me | 19:49 |
mordred | fwiw - https://zuul.opendev.org/t/openstack/build/d1db88a6ee0049ab86efeb7d75deaaf8/log/zuul01.openstack.org/rules.v4.txt#21 <-- we wouldn't match the rule correctly even if we _were_ running the test | 19:49 |
mordred | but we are setting the iptables - so that's at least good | 19:49 |
corvus | mordred: why wouldn't we match? | 19:50 |
mordred | the string we're searching for is not actually what the rule is | 19:50 |
mordred | it's close - but there is some difference | 19:50 |
corvus | i don't think we're actually searching that file | 19:50 |
mordred | I'd expect the rule content to be roughly the same though no? | 19:51 |
mordred | in any case - figuring out why we can't resolve that name is the more pressing concern | 19:51 |
mordred | should we collect /etc/hosts? | 19:51 |
corvus | iptables -t filter -S is the command it runs | 19:53 |
corvus | -A openstack-INPUT -s 23.253.245.60/32 -p tcp -m state --state NEW -m tcp --dport 4730 -j ACCEPT | 19:53 |
corvus | that's a sample output line from the real scheduler | 19:53 |
mordred | ah - ok | 19:53 |
corvus | but yeah, the hostname... hrm. yeah, i guess we should collect that? | 19:54 |
corvus | do we have a task where we write it...? | 19:55 |
mordred | corvus: yeah - | 19:55 |
mordred | corvus: https://opendev.org/opendev/system-config/src/branch/master/playbooks/zuul/run-base-pre.yaml#L5 | 19:56 |
corvus | oh wow, we have ze01.openstack and ze01.opendev hosts in this job? | 19:57 |
mordred | corvus: yeah | 19:57 |
mordred | corvus: one is on xenial and one is on focal | 19:58 |
corvus | this claims it's adding it to bridge: https://zuul.opendev.org/t/openstack/build/d1db88a6ee0049ab86efeb7d75deaaf8/console#1/0/23/bridge.openstack.org | 19:58 |
corvus | and bridge is where we run testinfra | 19:59 |
mordred | yeah | 20:00 |
mordred | corvus: in that diff output, what do after_header and before_header mean | 20:00 |
openstackgerrit | Merged zuul/zuul-jobs master: test-playbooks: improved syntax https://review.opendev.org/731591 | 20:00 |
corvus | mordred: i have no idea, the lack of info on lineinfile and template, etc, make me sad | 20:01 |
mordred | yeah | 20:01 |
mordred | I think the command looks reasonable though | 20:01 |
mordred | I'm going to push up a rev collecting /etc/hosts - and putting in an assert that we're getting ips, k? | 20:01 |
mordred | or should we put in a hold (or I guess we could do both) | 20:02 |
corvus | let's do both | 20:02 |
corvus | or all 3 even | 20:02 |
corvus | that assert should fail and trigger the hold | 20:02 |
mordred | yeah. I've got the patch for /etc/hosts and the assert ready to go | 20:03 |
mordred | you in a position to put in the hold or want me to? | 20:03 |
corvus | mordred: i'll do it, go ahead and push | 20:03 |
mordred | kk | 20:03 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Stop using backend hostname in zuul testinfra tests https://review.opendev.org/733409 | 20:03 |
corvus | all set | 20:04 |
corvus | mordred: i think since we've examined the iptables script, we can probably merge your changes | 20:04 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: WIP: Add terraform roles https://review.opendev.org/733675 | 20:04 |
mordred | corvus: yeah. and iterate on this to fix the tests | 20:05 |
corvus | clarkb: are you available to +3 https://review.opendev.org/730999 ? | 20:05 |
corvus | clarkb: and https://review.opendev.org/731258 | 20:05 |
corvus | i think those are the only 2 that need another +2 | 20:06 |
clarkb | corvus: are we ready on 730999? it sounded like we hadn't seen it fail then pass? | 20:06 |
clarkb | oh I see we're fixing the issue with interpolation there then cleaning up testing separately | 20:07 |
clarkb | +3'd 999 | 20:07 |
clarkb | looking at the other one now | 20:08 |
corvus | clarkb: yeah, and we've inspected the actual iptables script so the testing hole should be okay while we close it up. | 20:08 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: WIP: Add terraform roles https://review.opendev.org/733675 | 20:08 |
clarkb | +2'd the other. Seems like with this stack we may want to approve them somewhat sequentially to ensure each major step is working? | 20:09 |
mordred | clarkb: when's the last time a giant stack from me was ever not perfect? | 20:10 |
corvus | meh, i say bombs away and fix all the issues in parallel :) | 20:10 |
corvus | (i *think* we've been pretty good about checking up on the tests for these, so i have reasonable confidence in the gate results) | 20:10 |
clarkb | corvus: thats a good point (including the iptables checking) | 20:12 |
fungi | i'm free to help fix stuff again after i finish cooking dinner | 20:18 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: WIP: Add terraform roles https://review.opendev.org/733675 | 20:20 |
*** gtema has joined #opendev | 20:32 | |
*** gtema has quit IRC | 20:37 | |
corvus | okay we have a failure on the test change; checking it now | 20:58 |
corvus | mordred: your assert hit as expected | 20:58 |
*** sgw has quit IRC | 21:00 | |
corvus | the /etc/hosts files do not have the expected hosts | 21:04 |
corvus | https://zuul.opendev.org/t/openstack/build/70b4d930c6574d85b76e919bbc635751/log/ze01.opendev.org/hosts | 21:04 |
openstackgerrit | Merged opendev/system-config master: Run iptables in service playbooks instead of base https://review.opendev.org/730999 | 21:05 |
corvus | mordred: and that's because we run 'set-hostname' after we run the multi-node-known-hosts role: https://zuul.opendev.org/t/openstack/build/70b4d930c6574d85b76e919bbc635751/console#1/0/35/bridge.openstack.org | 21:09 |
corvus | https://opendev.org/opendev/system-config/src/branch/master/playbooks/zuul/run-base-pre.yaml#L8 | 21:10 |
mordred | corvus: hah | 21:11 |
mordred | corvus: well - multi-node-hosts-file is designed to append entries | 21:11 |
mordred | corvus: should we move it to after set-hostname? | 21:11 |
mordred | that way we'll get our /etc/hosts written and then just append ansible inventory files to it | 21:11 |
corvus | mordred: yeah -- i was just looking back through the change history to see if we're missing anything here, but i don't think we are. i think reordering should be fine | 21:12 |
mordred | k. lemme push that up. | 21:13 |
corvus | mordred: while you're at it | 21:13 |
corvus | mordred: can you add grabbing /etc/hosts from bridge too? | 21:13 |
corvus | i've released the hold | 21:13 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Stop using backend hostname in zuul testinfra tests https://review.opendev.org/733409 | 21:13 |
mordred | corvus: yes. crappit. one sec | 21:13 |
openstackgerrit | Monty Taylor proposed opendev/system-config master: Stop using backend hostname in zuul testinfra tests https://review.opendev.org/733409 | 21:14 |
mordred | corvus: done | 21:14 |
corvus | mordred: lg -- hopefully that should fail for realz now | 21:15 |
corvus | clarkb: got a min for one more re-review? https://review.opendev.org/731583 | 21:16 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: WIP: Add terraform roles https://review.opendev.org/733675 | 21:18 |
*** sgw has joined #opendev | 21:18 | |
clarkb | mordred: corvus minor thing on https://review.opendev.org/731583 I don't actually think it matters too much for this case so I +2'd anyway (but didn't approve in case it does matter more than I thought) | 21:19 |
openstackgerrit | Merged opendev/system-config master: Rename service-letsencrypt to just letsencrypt https://review.opendev.org/731617 | 21:19 |
openstackgerrit | Merged opendev/system-config master: Stop cloning drupal puppet modules https://review.opendev.org/731947 | 21:19 |
fungi | i've likely been living in a cave, but what does #{ syntax do? associative array lookup? | 21:19 |
fungi | or some special scoping? | 21:20 |
clarkb | fungi: in ruby that's string interpolation | 21:20 |
fungi | oh! ruby | 21:20 |
fungi | my eyes glazed right over at the .rb on the ends of those files | 21:21 |
fungi | i thought i was reviewing j2 for some reason. curly brace tunnelvision | 21:21 |
*** xiaolin has quit IRC | 21:25 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: WIP: Add terraform roles https://review.opendev.org/733675 | 21:25 |
mordred | clarkb: let's do that as a followup I think | 21:25 |
clarkb | mordred: ok I wrote a second comment to clarify something about the first but followup is fine | 21:26 |
openstackgerrit | Monty Taylor proposed opendev/puppet-openstack_infra_spec_helper master: Name gate inventory file gate-hosts https://review.opendev.org/733704 | 21:27 |
mordred | clarkb: ^^ followup | 21:27 |
*** _mlavalle_1 has joined #opendev | 21:30 | |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: WIP: Add terraform roles https://review.opendev.org/733675 | 21:31 |
*** mlavalle has quit IRC | 21:33 | |
openstackgerrit | Merged opendev/puppet-openstack_infra_spec_helper master: Install hosts and group files into service location https://review.opendev.org/731583 | 21:38 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: WIP: Add terraform roles https://review.opendev.org/733675 | 21:40 |
clarkb | mordred: approved thanks | 21:45 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: WIP: Add terraform roles https://review.opendev.org/733675 | 21:59 |
openstackgerrit | Albin Vass proposed zuul/zuul-jobs master: WIP: Add terraform roles https://review.opendev.org/733675 | 22:04 |
*** hashar has quit IRC | 22:05 | |
ianw | still getting pretty regular afs dropouts on arm64 mirror | 22:07 |
ianw | [Thu Jun 4 20:18:37 2020] afs: Lost contact with volume location server 104.130.136.20 in cell openstack.org (code -1) | 22:07 |
ianw | [Thu Jun 4 20:18:48 2020] afs: volume location server 104.130.136.20 in cell openstack.org is back up (code 0) | 22:07 |
clarkb | ianw: it returned after the reboot then | 22:07 |
clarkb | ianw: fungi did we decide it was the local server dropping the packets? | 22:07 |
ianw | i tried to dump the iptables logs and cut myself off | 22:08 |
ianw | FileLog:Thu Jun 4 00:39:41 2020 CB: ProbeUuid for host 00007FA26872E448 (139.178.85.143:54328) failed -1 | 22:10 |
*** _mlavalle_1 has quit IRC | 22:10 | |
ianw | that's the last one of those probeuuid failures | 22:10 |
*** mlavalle has joined #opendev | 22:11 | |
ianw | 22:11:03 up 21:24 ... so i must have rebooted it around then | 22:11 |
fungi | clarkb: it looked like something caused iptables to decide after a point that those datagrams no longer matched any established flow | 22:12 |
fungi | that could point to a conntrack bug for example, maybe an architecture-specific one even | 22:13 |
fungi | basically, ianw had packet captures which indicated that the client was emitting the precise sort of icmp errors which our iptables catch-all block rule creates | 22:15 |
fungi | packet captures taken on the client | 22:15 |
fungi | so no chance they were being spoofed by some middlebox elsewhere in the route | 22:15 |
fungi | but yeah, we didn't manage to confirm by correlating those with iptables reject logs | 22:16 |
fungi | that was going to be the next step when ianw inadvertently cut himself off from the instance, had to reboot, and that temporarily quelled the problem | 22:16 |
ianw | could i get a couple of eyes on https://review.opendev.org/#/c/733425/20 to do recursive downloads in download-artifact | 22:17 |
ianw | this is to avoid having to tarball up the openafs rpms and then untarball them to publish them on tarballs.opendev.org in the promote/publish job | 22:17 |
ianw | fungi: i've got a dump running now and tailing the kernel log, to see if i can capture packets around any reported dropout period | 22:18 |
clarkb | should we add '-A openstack-INPUT -j LOG' to the end of our iptables rules (but before the reject)? | 22:18 |
ianw | although it is not doing anything like this reject now | 22:19 |
ianw | s/this/that/ | 22:19 |
*** noonedeadpunk has quit IRC | 22:19 | |
*** noonedeadpunk has joined #opendev | 22:20 | |
fungi | ianw: on the recursive option to download-artifact, what webserver is serving those artifacts when they're being downloaded? | 22:21 |
fungi | is the build spinning up a web service on the job node? | 22:22 |
ianw | fungi: whatever is presenting us the log directories, which i guess varies by provider | 22:22 |
fungi | oh, we're retrieving these from log archival? | 22:22 |
fungi | neat, now this is making more sense, thanks ;) | 22:22 |
ianw | yeah; the artifact is just the log URL | 22:22 |
ianw | it is certainly a complex wget command and it took me quite a bit of fiddling to do | 22:23 |
fungi | i didn't know if there was some paused job providing a scratch space for file transfers similar to how the buildset registries work | 22:23 |
fungi | which seems like it would be more efficient, but if this is how single artifact publication is working now then i guess no need to redesign | 22:24 |
fungi | s/efficient/robust against internet flakiness/ | 22:24 |
ianw | yeah, i think it's the way the promote pipeline is designed | 22:26 |
ianw | i just had another thought that maybe that role could grab the manifest, and then download the urls that have the right prefix | 22:27 |
ianw | under the artifact's URL | 22:27 |
ianw | the result would be the same, but without the wget dependency ... and i actually wrote *most* of it for the download-logs.sh parsing bits | 22:29 |
fungi | oh, got it, we're relying on this to do publication of artifacts cross-pipeline (generated in gate, published in promote) so the paused job design isn't applicable anyway | 22:29 |
clarkb | ianw: left a note about what I think is a bug in the change | 22:29 |
fungi | i guess for our image publication we get away with not copying images around that way by uploading in gate and then flagging them in promote | 22:30 |
ianw | clarkb: oh doh, yes that is supposed to ignore | 22:30 |
fungi | it might be nice if we could come up with a similar model for arbitrary artifact publication | 22:30 |
fungi | but that's a much larger effort | 22:31 |
ianw | clarkb: yeah, i wasn't sure on mocking something out. whatever it is, it really has to exist for wget to download it. i figured the amount the job actually gets updated, just re-running something to get a downloadable artifact probably isn't too bad | 22:32 |
ianw | you can't really generate an artifact at the same time, because zuul has to report back before you can pick it up | 22:33 |
clarkb | ianw: ya I think it's more that it will be an unexpected road block after you push your first patchset, it fails, you figure out why then have to fix it | 22:33 |
*** gtema has joined #opendev | 22:33 | |
openstackgerrit | Merged opendev/system-config master: Split inventory into multiple dirs and move hostvars https://review.opendev.org/730991 | 22:33 |
openstackgerrit | Merged opendev/system-config master: Override bridge hostvars directly https://review.opendev.org/731258 | 22:33 |
openstackgerrit | Merged opendev/puppet-openstack_infra_spec_helper master: Name gate inventory file gate-hosts https://review.opendev.org/733704 | 22:33 |
clarkb | ianw: what we might be able to do is run python3 -m http.server --directory /dir and put a json manifest there and a file/dir the manifest points to? | 22:34 |
clarkb | that doesn't test we're speaking the zuul api properly though | 22:34 |
clarkb | and there is value in ^ too | 22:34 |
*** rajinir has quit IRC | 22:35 | |
*** jbryce has quit IRC | 22:35 | |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: download-artifact : support recursive download https://review.opendev.org/733425 | 22:35 |
openstackgerrit | Ian Wienand proposed zuul/zuul-jobs master: download-artifact: cleanup long when statement https://review.opendev.org/733446 | 22:35 |
*** knikolla has quit IRC | 22:36 | |
*** hillpd has quit IRC | 22:36 | |
*** rpittau|afk has quit IRC | 22:36 | |
*** mnaser has quit IRC | 22:36 | |
*** donnyd has quit IRC | 22:36 | |
*** larainema has quit IRC | 22:36 | |
*** Open10K8S has quit IRC | 22:37 | |
*** mnasiadka has quit IRC | 22:37 | |
*** wendallkaters has quit IRC | 22:37 | |
*** jentoio has quit IRC | 22:37 | |
*** jrosser has quit IRC | 22:37 | |
*** johnsom has quit IRC | 22:37 | |
*** vblando has quit IRC | 22:37 | |
*** seongsoocho has quit IRC | 22:37 | |
*** diablo_rojo_phon has quit IRC | 22:37 | |
*** seongsoocho_ has joined #opendev | 22:37 | |
*** rm_work has quit IRC | 22:38 | |
*** rajinir has joined #opendev | 22:38 | |
*** gtema has quit IRC | 22:38 | |
*** zbr has quit IRC | 22:38 | |
*** larainema has joined #opendev | 22:38 | |
*** knikolla has joined #opendev | 22:38 | |
fungi | yeah, as soon as we stop running openafs-rpm-package-build-centos-7-x86 that job will begin failing a month later when its last artifact is expired out of the swift container, right? | 22:39 |
clarkb | fungi: it's change specific so as soon as that change's artifacts expire | 22:39 |
ianw | fungi: yep, but that job will only run if you update the download-artifact role too | 22:39 |
*** Open10K8S has joined #opendev | 22:39 | |
*** vblando has joined #opendev | 22:39 | |
fungi | oh, indeed, i missed the change=733420 | 22:40 |
*** mnasiadka has joined #opendev | 22:40 | |
fungi | yeah, so i guess the idea is any time you update download-artifact you must also update the change= in its test | 22:40 |
*** wendallkaters has joined #opendev | 22:40 | |
fungi | (and possibly jobname) | 22:41 |
*** jbryce has joined #opendev | 22:41 | |
*** jrosser has joined #opendev | 22:41 | |
*** mnaser has joined #opendev | 22:41 | |
ianw | yeah, i would not argue it's ideal | 22:41 |
*** jentoio has joined #opendev | 22:41 | |
*** donnyd has joined #opendev | 22:41 | |
*** hillpd has joined #opendev | 22:42 | |
*** johnsom has joined #opendev | 22:43 | |
clarkb | ya I think it's going to come down to the balance of: is it more important to test that query of the zuul api functions or not update the query parameters when making other changes | 22:44 |
clarkb | I think I'm inclined to try the thing that tests it properly end to end before deciding it will be a headache | 22:44 |
ianw | yeah, i mean we could go back to the way it was too, which was not tested at all :) | 22:45 |
*** seongsoocho_ has quit IRC | 22:46 | |
*** seongsoocho has joined #opendev | 22:48 | |
*** zbr has joined #opendev | 22:51 | |
*** rm_work has joined #opendev | 22:51 | |
*** tkajinam has joined #opendev | 22:56 | |
*** rpittau|afk has joined #opendev | 23:04 | |
fungi | clarkb: ianw: those jobs are also going to be triggered any time job configuration changes, right? it doesn't only coincide with alterations to the role | 23:22 |
clarkb | fungi: ya if the job is modified in zuul config too | 23:23 |
*** slaweq has quit IRC | 23:25 | |
*** tosky has quit IRC | 23:35 | |
ianw | fungi: there is a comment @ https://review.opendev.org/#/c/733425/21/test-playbooks/download-artifact.yaml ... are you thinking somewhere else? | 23:41 |
ianw | i have entered on my todo list to do something else; i just don't want to get sidetracked into a zuul mock etc. while the wheel builds still don't work :) | 23:41 |
fungi | ianw: oh, looking again. it wasn't where i would have expected it i guess | 23:43 |
fungi | yep, i see it, sorry about that! +2 but as clarkb notes we should get some additional feedback on that choice | 23:46 |
ianw | alright i might just pass it via a tarball to get things moving | 23:48 |
fungi | oh, sorry, i missed this was blocking your work, we can always revert later if there are concerns | 23:49 |
ianw | thanks, i'll keep an eye on things :) | 23:52 |
fungi | ianw: since this is in zuul-jobs, maybe giving folks a brief heads up in #zuul would be polite too, since a lot of them don't pay attention in here (nor should we expect them to) | 23:52 |
ianw | yep, AJaeger kindly called it out there as well when he was looking | 23:52 |
fungi | oh, good | 23:52 |
fungi | i know we're all zuul maintainers, but that role might be getting used outside opendev | 23:53 |
fungi | so some more diverse acquiescence should be sought for such changes | 23:54 |
*** mlavalle has quit IRC | 23:57 | |
openstackgerrit | Merged zuul/zuul-jobs master: download-artifact : support recursive download https://review.opendev.org/733425 | 23:59 |
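
The firewall test discussed around 19:38 to 21:15 passed silently because the hostname lookup returned an empty set, so there was nothing to compare against the rules. A minimal Python sketch of that idea follows. It is not the code from the system-config change: the helper names `resolve_ips`, `expected_rule` and `missing_rules` are hypothetical, only IPv4 is handled, and the rule format is copied from the sample `iptables -t filter -S` line quoted in the log.

```python
import socket


def resolve_ips(hostname):
    """Return the set of addresses the local resolver reports for hostname.

    The system resolver normally consults /etc/hosts as well as DNS.
    """
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return set()
    return {info[4][0] for info in infos}


def expected_rule(ip, port):
    # Same shape as the sample line from the real scheduler quoted in the log.
    return (
        f"-A openstack-INPUT -s {ip}/32 -p tcp -m state --state NEW "
        f"-m tcp --dport {port} -j ACCEPT"
    )


def missing_rules(ips, port, rules_output):
    """Return the IPs that have no matching ACCEPT rule in rules_output."""
    # Fail loudly on an empty set; otherwise the check passes vacuously.
    if not ips:
        raise AssertionError("no addresses resolved; nothing was checked")
    present = {line.strip() for line in rules_output.splitlines()}
    return sorted(ip for ip in ips if expected_rule(ip, port) not in present)
```

In a real test the rules text would come from running `iptables -t filter -S` on the target host and the IP set from `resolve_ips()`; both are passed in here so the matching logic can be exercised without a network.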
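
ianw's alternative to the recursive wget (22:27 to 22:29) was to fetch the manifest and download only the URLs under the artifact's prefix. A rough Python sketch of the filtering step is below. The manifest layout is an assumption made for illustration (a top-level `tree` list of nodes, each with a `name` and, for directories, a `children` list), and `file_paths` and `urls_under` are hypothetical helpers, not part of the download-artifact role.

```python
def file_paths(nodes, parent=""):
    """Yield relative paths of all files in an assumed nested manifest tree."""
    for node in nodes:
        path = parent + node["name"]
        if "children" in node:
            # Assumption: only directory nodes carry a "children" list.
            yield from file_paths(node["children"], path + "/")
        else:
            yield path


def urls_under(manifest, base_url, prefix):
    """Return full URLs for every manifest file whose path starts with prefix."""
    base = base_url.rstrip("/") + "/"
    return [base + p for p in file_paths(manifest["tree"]) if p.startswith(prefix)]
```

The returned list could then be fetched one URL at a time, which gives the same result as the recursive download without depending on wget's handling of directory index pages.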