clarkb | which might mean we have to relax our rules around checking for carrier to support baremetal if we remove that ip link up call | 00:00 |
---|---|---|
ianw | ens3 <DOWN;broadcast,multicast> good start | 00:02 |
clarkb | ianw: thank you for helping to run this down | 00:02 |
ianw | all addresses up ... | 00:02 |
clarkb | ianw: if we can do ~10 nova boots in a row successfully chances are this is it | 00:02 |
ianw | ... i'm unreasonably excited we've found the problem ... we have thought that before though! | 00:02 |
clarkb | the computers are excellent trolls after all | 00:03 |
clarkb | I need to pop out nowish to start on dinner. The fishing trip was successful yesterday so I've got salmon to figure out cooking for | 00:03 |
ianw | WARNING:glean:Skipping system interface ens3 (fa:16:3e:bb:8b:23) | 00:05 |
ianw | hrm, just commenting isn't enough ... but i think we're on the right track; will keep at it | 00:05 |
clarkb | huh maybe the carrier link isn't set then | 00:06 |
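For context on the carrier discussion: on Linux the kernel exposes link carrier state in `/sys/class/net/<iface>/carrier`, and reading that file while the interface is administratively down fails with `EINVAL`. The minimal Python sketch below distinguishes "carrier", "no carrier" and "unknown (interface down)". It is not glean's actual code; the `has_carrier` name and its `sysfs_root` parameter (there only so it can be tested against a fake directory) are hypothetical.

```python
import errno
import os


def has_carrier(iface, sysfs_root="/sys/class/net"):
    """Return True/False for link carrier, or None if it cannot be read.

    Reading <iface>/carrier on an interface that is administratively down
    fails with EINVAL, so "down" is reported as None (unknown) rather than
    False. Without bringing the link up first, a carrier check cannot tell
    "no cable" from "not up yet".
    """
    path = os.path.join(sysfs_root, iface, "carrier")
    try:
        with open(path) as f:
            return f.read().strip() == "1"
    except OSError as e:
        if e.errno == errno.EINVAL:
            return None  # interface is down; carrier state unknown
        raise  # e.g. FileNotFoundError for a nonexistent interface
```

This is why dropping an `ip link up` call can interact with carrier checks: an interface that was never brought up can only ever report "unknown".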
ianw | clarkb: jealous! best i get around here is one from Costco :) | 00:06 |
*** hwoarang has quit IRC | 00:13 | |
*** hwoarang has joined #openstack-infra | 00:21 | |
*** diablo_rojo has quit IRC | 00:21 | |
*** Goneri has quit IRC | 00:39 | |
*** gyee has quit IRC | 00:40 | |
*** kaisers has quit IRC | 00:45 | |
*** kaisers has joined #openstack-infra | 00:57 | |
*** slaweq has joined #openstack-infra | 01:11 | |
*** zhurong has quit IRC | 01:11 | |
*** slaweq has quit IRC | 01:16 | |
*** rh-jelabarre has quit IRC | 01:17 | |
*** hwoarang has quit IRC | 01:25 | |
*** hwoarang has joined #openstack-infra | 01:25 | |
*** whoami-rajat has joined #openstack-infra | 01:32 | |
*** dklyle has quit IRC | 01:35 | |
*** dklyle has joined #openstack-infra | 01:50 | |
*** dychen has joined #openstack-infra | 02:08 | |
*** apetrich has quit IRC | 02:09 | |
*** dchen has quit IRC | 02:09 | |
*** markvoelker has joined #openstack-infra | 02:11 | |
*** markvoelker has quit IRC | 02:21 | |
*** markvoelker has joined #openstack-infra | 02:22 | |
*** markvoelker has quit IRC | 02:26 | |
*** hongbin has joined #openstack-infra | 02:37 | |
*** diablo_rojo has joined #openstack-infra | 02:59 | |
*** slaweq has joined #openstack-infra | 03:02 | |
*** slaweq has quit IRC | 03:06 | |
*** ramishra has joined #openstack-infra | 03:24 | |
*** hongbin has quit IRC | 03:24 | |
*** rlandy|bbl is now known as rlandy | 03:30 | |
*** lbragstad_ has joined #openstack-infra | 03:31 | |
*** lbragstad has quit IRC | 03:31 | |
*** ykarel|away has joined #openstack-infra | 03:33 | |
*** whoami-rajat has quit IRC | 03:41 | |
*** kjackal has joined #openstack-infra | 03:43 | |
*** rlandy has quit IRC | 03:49 | |
*** ociuhandu has joined #openstack-infra | 04:01 | |
*** ociuhandu has quit IRC | 04:06 | |
*** slaweq has joined #openstack-infra | 04:11 | |
*** rascasoft has quit IRC | 04:15 | |
*** slaweq has quit IRC | 04:16 | |
*** rascasoft has joined #openstack-infra | 04:16 | |
*** lbragstad has joined #openstack-infra | 04:25 | |
*** diablo_rojo has quit IRC | 04:26 | |
*** lbragstad_ has quit IRC | 04:28 | |
*** lbragstad_ has joined #openstack-infra | 04:33 | |
*** lbragstad has quit IRC | 04:34 | |
openstackgerrit | Ian Wienand proposed opendev/glean master: Do not bring up udev assigned interfaces https://review.opendev.org/688031 | 04:38 |
*** dave-mccowan has quit IRC | 04:39 | |
*** lbragstad has joined #openstack-infra | 04:40 | |
*** lbragstad_ has quit IRC | 04:41 | |
ianw | clarkb / tristanC / donnyd : ^ i do not like it ... but it's one of those things where you pull a thread and the whole thing starts to unravel :/ | 04:42 |
*** surpatil has joined #openstack-infra | 04:50 | |
*** jtomasek has joined #openstack-infra | 04:51 | |
*** surpatil has quit IRC | 04:53 | |
*** surpatil has joined #openstack-infra | 04:54 | |
*** pcaruana has joined #openstack-infra | 04:55 | |
*** gagehugo has joined #openstack-infra | 05:00 | |
openstackgerrit | Jan Kubovy proposed opendev/gear master: Add BSD/Darwin support. https://review.opendev.org/671674 | 05:00 |
*** tkajinam has quit IRC | 05:01 | |
*** tkajinam has joined #openstack-infra | 05:02 | |
*** tkajinam has quit IRC | 05:23 | |
*** tkajinam has joined #openstack-infra | 05:23 | |
openstackgerrit | Ian Wienand proposed opendev/glean master: Do not bring up udev assigned interfaces https://review.opendev.org/688031 | 05:28 |
*** kjackal has quit IRC | 05:28 | |
openstackgerrit | Ian Wienand proposed openstack/diskimage-builder master: Remove NM RA workaround https://review.opendev.org/688036 | 05:40 |
*** ykarel|away is now known as ykarel | 05:43 | |
*** roman_g has joined #openstack-infra | 05:43 | |
roman_g | Hello everyone. I've got a problem with the Rackspace CDN, whose S3-style storage we use for build information. | 05:45 |
roman_g | This is build logs URL: https://2365c1d014187c3ae706-2572cddac5187c7b669ab9398e41b48d.ssl.cf5.rackcdn.com/687536/4/check/openstack-tox-docs/eb49c40/ | 05:45 |
roman_g | When I try to view the contents of the docs/ directory (documentation preview), I get an error. | 05:46 |
*** jamespage_ has joined #openstack-infra | 05:47 | |
*** weshay_ has joined #openstack-infra | 05:47 | |
*** ykarel is now known as ykarel|afk | 05:48 | |
roman_g | It's compressed, but the right header is missing. | 05:48 |
*** evrardjp_ has joined #openstack-infra | 05:49 | |
*** dirk1 has joined #openstack-infra | 05:50 | |
*** brwyatt_ has joined #openstack-infra | 05:53 | |
*** ktsuyuzaki has joined #openstack-infra | 05:53 | |
*** cloudnull has joined #openstack-infra | 05:53 | |
roman_g | I might be mistaken, but 'Content-Encoding: deflate' is missing from Rackspace CDN. | 05:53 |
*** ykarel|afk has quit IRC | 05:54 | |
*** jamespage has quit IRC | 05:54 | |
*** tbarron has quit IRC | 05:54 | |
*** weshay has quit IRC | 05:55 | |
*** evrardjp has quit IRC | 05:55 | |
*** brwyatt has quit IRC | 05:55 | |
*** kota_ has quit IRC | 05:55 | |
*** cloudnull-afk has quit IRC | 05:55 | |
*** antonym has quit IRC | 05:55 | |
*** dirk has quit IRC | 05:55 | |
*** brwyatt_ is now known as brwyatt | 05:55 | |
*** jamespage_ is now known as jamespage | 05:55 | |
ianw | roman_g: hrm, https://2365c1d014187c3ae706-2572cddac5187c7b669ab9398e41b48d.ssl.cf5.rackcdn.com/687536/4/check/openstack-tox-docs/eb49c40/docs/ shows for me | 05:55 |
*** antonym has joined #openstack-infra | 05:55 | |
roman_g | ianw Safari says "cannot decode raw data" | 05:56 |
*** udesale has joined #openstack-infra | 05:56 | |
*** irclogbot_0 has quit IRC | 05:56 | |
roman_g | Yesterday I had a similar error with Firefox | 05:56 |
ianw | interesting, i'm using firefox | 05:56 |
*** irclogbot_0 has joined #openstack-infra | 05:57 | |
roman_g | Who should I talk to about adding HTTP headers? | 05:57 |
roman_g | If it's ever possible | 05:57 |
ianw | HTTP/1.1 200 OK | 05:58 |
ianw | Content-Encoding: deflate | 05:58 |
ianw | firefox is telling me that from its network console | 05:59 |
ianw | http://paste.openstack.org/show/782828/ ... it also has gzip, not sure what that means | 05:59 |
*** lbragstad_ has joined #openstack-infra | 06:00 | |
roman_g | ianw http://paste.openstack.org/show/782829/ | 06:00 |
roman_g | This is mine | 06:00 |
*** lbragstad has quit IRC | 06:01 | |
ianw | oh curl, i think you need "--compressed" | 06:02 |
*** lpetrut has joined #openstack-infra | 06:04 | |
*** prometheanfire has quit IRC | 06:05 | |
*** prometheanfire has joined #openstack-infra | 06:06 | |
roman_g | Now works with curl. But not with Safari | 06:08 |
roman_g | Will check with Firefox later today | 06:08 |
*** slaweq has joined #openstack-infra | 06:09 | |
*** xek_ has joined #openstack-infra | 06:09 | |
*** ykarel|afk has joined #openstack-infra | 06:10 | |
roman_g | I think I found the problem. | 06:13 |
roman_g | curl -v -H "Accept-Encoding: gzip, deflate, br" -o /dev/null https://2365c1d014187c3ae706-2572cddac5187c7b669ab9398e41b48d.ssl.cf5.rackcdn.com/687536/4/check/openstack-tox-docs/eb49c40/docs/ 2>&1 | grep Content-Encoding | 06:13 |
roman_g | curl -v -H "Accept-Encoding: gzip, deflate, br" -o /dev/null https://2365c1d014187c3ae706-2572cddac5187c7b669ab9398e41b48d.ssl.cf5.rackcdn.com/687536/4/check/openstack-tox-docs/eb49c40/ 2>&1 | grep Content-Encoding | 06:13 |
*** slaweq_ has joined #openstack-infra | 06:13 | |
*** ykarel|afk is now known as ykarel | 06:14 | |
openstackgerrit | OpenStack Proposal Bot proposed opendev/storyboard master: Imported Translations from Zanata https://review.opendev.org/684669 | 06:14 |
roman_g | For the docs/ subdirectory, Rackspace sends Content-Encoding twice: once with deflate and once with gzip. | 06:14 |
*** slaweq has quit IRC | 06:14 | |
roman_g | And Safari can't decide how to decode output. | 06:15 |
*** pgaxatte has joined #openstack-infra | 06:15 | |
roman_g | There should be only deflate. | 06:15 |
*** kopecmartin|off is now known as kopecmartin | 06:17 | |
*** kmarc has quit IRC | 06:22 | |
*** xenos76 has joined #openstack-infra | 06:27 | |
roman_g | https://tools.ietf.org/html/rfc7231#section-3.1.2.2 | 06:27 |
roman_g | There can be multiple encodings, but in this form: "Content-Encoding: value1, value2" | 06:29 |
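RFC 7231 requires the codings to be listed in the order they were applied, so a client has to undo them in reverse. A minimal Python sketch of that, assuming the deflate-then-gzip sequence described above (object uploaded deflate-encoded, proxy gzips it again); the `decode_body` helper is hypothetical and only handles the codings shown.

```python
import gzip
import zlib


def decode_body(body, content_encoding):
    """Undo a Content-Encoding list such as "deflate, gzip".

    Codings are listed in the order they were applied, so they are
    removed in reverse order.
    """
    codings = [c.strip().lower() for c in content_encoding.split(",") if c.strip()]
    for coding in reversed(codings):
        if coding == "gzip":
            body = gzip.decompress(body)
        elif coding == "deflate":
            # HTTP "deflate" means the zlib format (RFC 1950).
            body = zlib.decompress(body)
        elif coding == "identity":
            continue
        else:
            raise ValueError(f"unsupported content coding: {coding}")
    return body


# Simulate a deflate-encoded upload that a proxy then gzips again.
original = b"<html>docs preview</html>" * 10
stored = zlib.compress(original)   # what gets uploaded
served = gzip.compress(stored)     # what the proxy adds on top
assert decode_body(served, "deflate, gzip") == original
```

A client that honors only one of the two codings ends up with still-compressed bytes, which matches the "cannot decode raw data" symptom.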
*** kmarc has joined #openstack-infra | 06:30 | |
*** lathiat has quit IRC | 06:33 | |
*** lathiat has joined #openstack-infra | 06:34 | |
*** xenos76 has quit IRC | 06:38 | |
*** xenos76 has joined #openstack-infra | 06:40 | |
*** slaweq_ is now known as slaweq | 06:48 | |
*** jbadiapa has joined #openstack-infra | 06:51 | |
openstackgerrit | Albin Vass proposed zuul/nodepool master: Static provider defaults to ssh connection https://review.opendev.org/688043 | 06:53 |
*** trident has quit IRC | 06:53 | |
*** trident has joined #openstack-infra | 06:55 | |
*** pkopec has joined #openstack-infra | 07:03 | |
*** rcernin has quit IRC | 07:03 | |
*** tesseract has joined #openstack-infra | 07:03 | |
*** ccamacho has joined #openstack-infra | 07:04 | |
*** kjackal has joined #openstack-infra | 07:08 | |
*** xenos76 has quit IRC | 07:14 | |
*** lpetrut has quit IRC | 07:15 | |
*** gfidente has joined #openstack-infra | 07:18 | |
*** georgk has quit IRC | 07:19 | |
*** georgk has joined #openstack-infra | 07:20 | |
*** tobberydberg has quit IRC | 07:20 | |
*** xenos76 has joined #openstack-infra | 07:21 | |
*** whoami-rajat has joined #openstack-infra | 07:21 | |
*** jbadiapa has quit IRC | 07:25 | |
*** jbadiapa has joined #openstack-infra | 07:25 | |
*** tobberydberg has joined #openstack-infra | 07:26 | |
*** apetrich has joined #openstack-infra | 07:29 | |
*** lpetrut has joined #openstack-infra | 07:34 | |
*** jaosorior has joined #openstack-infra | 07:38 | |
*** odicha has joined #openstack-infra | 07:39 | |
*** jpena|off is now known as jpena | 07:41 | |
*** xenos76 has quit IRC | 07:48 | |
*** xenos76 has joined #openstack-infra | 07:50 | |
*** rpittau|afk is now known as rpittau | 07:53 | |
*** lpetrut has quit IRC | 07:53 | |
*** ralonsoh has joined #openstack-infra | 07:56 | |
*** ykarel is now known as ykarel|lunch | 07:56 | |
*** tkajinam has quit IRC | 07:58 | |
*** kjackal has quit IRC | 08:02 | |
*** lucasagomes has joined #openstack-infra | 08:04 | |
*** kjackal has joined #openstack-infra | 08:04 | |
*** roman_g has quit IRC | 08:09 | |
*** xenos76 has quit IRC | 08:23 | |
*** xenos76 has joined #openstack-infra | 08:24 | |
*** markvoelker has joined #openstack-infra | 08:25 | |
*** markvoelker has quit IRC | 08:30 | |
openstackgerrit | pengyuesheng proposed openstack/diskimage-builder master: Bump the openstackdocstheme extension to 1.20 https://review.opendev.org/688071 | 08:39 |
*** ykarel|lunch is now known as ykarel | 08:43 | |
*** xenos76 has quit IRC | 08:44 | |
*** takamatsu has joined #openstack-infra | 08:48 | |
*** kjackal has quit IRC | 09:07 | |
*** e0ne has joined #openstack-infra | 09:09 | |
*** kjackal has joined #openstack-infra | 09:09 | |
*** rpioso has quit IRC | 09:28 | |
*** rpioso has joined #openstack-infra | 09:29 | |
*** kjackal has quit IRC | 09:29 | |
*** whoami-rajat has quit IRC | 09:41 | |
openstackgerrit | Jens Harbott (frickler) proposed opendev/system-config master: Fix access to clouds on bridge https://review.opendev.org/615197 | 09:43 |
*** kjackal has joined #openstack-infra | 09:43 | |
*** ociuhandu has joined #openstack-infra | 09:49 | |
*** kaisers has quit IRC | 09:52 | |
*** kaisers has joined #openstack-infra | 09:56 | |
*** derekh has joined #openstack-infra | 09:58 | |
*** yamamoto has quit IRC | 10:06 | |
*** rcernin has joined #openstack-infra | 10:08 | |
*** rpittau is now known as rpittau|bbl | 10:15 | |
*** xenos76 has joined #openstack-infra | 10:17 | |
*** rcernin has quit IRC | 10:21 | |
*** ociuhandu has quit IRC | 10:25 | |
*** gfidente has quit IRC | 10:37 | |
*** yamamoto has joined #openstack-infra | 10:41 | |
donnyd | ianw: So it looks to me like this bug became more apparent when I changed the RA timer in neutron (shortened the interval between RAs). I think you are on track with 688031 | 10:43 |
*** yamamoto has quit IRC | 10:46 | |
*** pgaxatte has quit IRC | 10:54 | |
*** dave-mccowan has joined #openstack-infra | 10:57 | |
*** rfolco has joined #openstack-infra | 11:02 | |
*** ociuhandu has joined #openstack-infra | 11:04 | |
*** ociuhandu has quit IRC | 11:04 | |
*** rfolco is now known as rfolco|ruck | 11:09 | |
*** ociuhandu has joined #openstack-infra | 11:17 | |
*** jbadiapa has quit IRC | 11:21 | |
*** yamamoto has joined #openstack-infra | 11:24 | |
*** xek_ has quit IRC | 11:31 | |
*** Goneri has joined #openstack-infra | 11:31 | |
*** jpena is now known as jpena|lunch | 11:33 | |
*** yamamoto has quit IRC | 11:39 | |
*** gfidente has joined #openstack-infra | 11:55 | |
*** Tengu has quit IRC | 11:56 | |
*** pgaxatte has joined #openstack-infra | 11:59 | |
*** yamamoto has joined #openstack-infra | 12:09 | |
*** rh-jelabarre has joined #openstack-infra | 12:11 | |
*** Tengu has joined #openstack-infra | 12:12 | |
*** roman_g has joined #openstack-infra | 12:16 | |
openstackgerrit | Nate Johnston proposed openstack/project-config master: Update neutron-tempest-plugin grafana dashboard https://review.opendev.org/687686 | 12:25 |
*** markvoelker has joined #openstack-infra | 12:28 | |
openstackgerrit | Paul Belanger proposed zuul/zuul master: WIP: Support Ansible 2.9 https://review.opendev.org/674854 | 12:29 |
*** yamamoto has quit IRC | 12:29 | |
*** rlandy has joined #openstack-infra | 12:32 | |
*** markvoelker has quit IRC | 12:33 | |
*** rpittau|bbl is now known as rpittau | 12:33 | |
*** markvoelker has joined #openstack-infra | 12:34 | |
*** jpena|lunch is now known as jpena | 12:35 | |
*** ramishra has quit IRC | 12:42 | |
zbr | ianw: are the new centos-8 nodes usable? I tried to use one but got a weird failure from zuul, https://64fbca123e4b3879c213-47bc4821d6678036d17c4560af30ce98.ssl.cf2.rackcdn.com/688106/1/check/tripleo-tox-molecule/05a99a3/job-output.txt | 12:53 |
zbr | it is far from clear why it failed; the only hint I got was the line with "failed: 1", but I was not able to identify which task really failed. | 12:54 |
*** jbadiapa has joined #openstack-infra | 12:57 | |
*** jbadiapa has quit IRC | 12:58 | |
*** jbadiapa has joined #openstack-infra | 12:58 | |
*** yamamoto has joined #openstack-infra | 12:59 | |
openstackgerrit | Merged zuul/nodepool master: Static provider defaults to ssh connection https://review.opendev.org/688043 | 13:00 |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: configure-mirrors: use dnf when needed https://review.opendev.org/688118 | 13:01 |
*** yamamoto has quit IRC | 13:02 | |
frickler | zbr: I think that ^^ is the right fix. ftr you can find the corresponding error "msg": "[Errno 2] No such file or directory: 'yum': 'yum'" in the job-output.json file. | 13:05 |
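frickler's tip generalizes: the failing task's `msg` lives in `job-output.json`. A minimal sketch for pulling out every failed host result. It assumes the file is a list of playbooks, each holding `plays`, each play holding `tasks`, and each task holding a `hosts` mapping of hostname to Ansible result dict; that layout is an assumption here, and `failed_tasks` / `failed_tasks_from_file` are hypothetical helpers.

```python
import json


def failed_tasks(job_output):
    """Yield (playbook, task name, host, msg) for each failed host result.

    Assumes the layout: list of playbooks -> "plays" -> "tasks" ->
    "hosts" mapping hostname -> Ansible result dict.
    """
    for playbook in job_output:
        for play in playbook.get("plays", []):
            for task in play.get("tasks", []):
                name = task.get("task", {}).get("name", "<unnamed>")
                for host, result in task.get("hosts", {}).items():
                    if result.get("failed"):
                        yield playbook.get("playbook"), name, host, result.get("msg")


def failed_tasks_from_file(path):
    """Load a job-output.json file and list its failed task results."""
    with open(path) as f:
        return list(failed_tasks(json.load(f)))
```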
zbr | frickler: yeah, one of probably lots more, but I am working on it. | 13:05 |
*** dklyle has quit IRC | 13:11 | |
*** roman_g has quit IRC | 13:12 | |
AJaeger | zbr: looking at backscroll, there was quite some debugging going on, not sure what the outcome was... | 13:12 |
AJaeger | zbr: I would not merge a change that needs them | 13:12 |
zbr | AJaeger: i am on it; i need to wait for it because failures in pre-run trigger retries. | 13:12 |
mordred | morning all - I'm actually not in a meeting today! | 13:17 |
*** anteaya has joined #openstack-infra | 13:17 | |
*** lbragstad_ is now known as lbragstad | 13:18 | |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: configure-mirrors: use dnf when needed https://review.opendev.org/688118 | 13:21 |
zbr | pabelanger: ^ what is more interesting is that I failed to identify why https://zuul.opendev.org/t/zuul/build/63f92e0851494096b440c0f7368c49fb was marked as a failure. | 13:22 |
zbr | no task failure but final result is marked as failure | 13:23 |
pabelanger | zbr: I bet it has something to do with handler failing | 13:25 |
pabelanger | and not being logged properly | 13:25 |
*** dklyle has joined #openstack-infra | 13:26 | |
*** lbragstad has quit IRC | 13:26 | |
fungi | zbr: we only just got centos-8 nodes booting in opendev's nodepool late yesterday, so afaik no actual jobs have been run on them yet | 13:26 |
pabelanger | zbr: the run playbook didn't fire for some reason | 13:26 |
pabelanger | which, I think, means a syntax error some place | 13:27 |
zbr | fungi: i know, but i am here to help. | 13:27 |
pabelanger | and we don't bubble that up into logs | 13:27 |
* fungi cheers | 13:27 | |
pabelanger | zbr: maybe ask fungi to check executor logs, to see if ansible-playbook raised an error (sorry, I won't be able to look for it) | 13:28 |
fungi | but yeah, having jobs running on them reasonably well by the time train releases middle of next week and openstack focus shifts fully to ussuri would be great | 13:28 |
fungi | i can take a look in a sec | 13:28 |
pabelanger | Yah, currently working on getting centos-8 working for zuul.a.c, so suspect we might find some of the same issues :) | 13:29 |
zbr | there is one thing that concerns me: centos-8 support likely needs ansible 2.8 minimum in order to allow ansible to do proper interpreter detection. | 13:29 |
pabelanger | zbr: we should be okay; zuul does support ansible 2.8, and we also expose a way to select the right python interpreter in nodepool | 13:29 |
pabelanger | I think ianw set default to python3 in nodepool | 13:30 |
fungi | well, we can set specific jobs to use 2.8, but this could also be a good incentive to up the default for any tenants on <2.8 | 13:30 |
zbr | i think that sooner or later I will find some fixes that are not working well on our zuul's default ansible, which is still 2.7 | 13:30 |
mordred | yeah - but also setting jobs to use 2.8 for centos-8 jobs would likely be good | 13:30 |
fungi | i think we were just waiting for openstack release activity to die off before we went and did something possibly disruptive | 13:30 |
mordred | zbr: jobs can select their ansible version | 13:30 |
zbr | i know how to do it manually, probably I will do this if needed, but only for centos-8. | 13:30 |
mordred | ++ | 13:30 |
pabelanger | zbr: thanks for the reminder, I am going to send an email to zuul ML about this topic :) | 13:31 |
*** goldyfruit has joined #openstack-infra | 13:31 | |
zbr | for tripleo jobs I had changes to switch ansible to 2.8; not sure if they all merged yet. | 13:31 |
zbr | i would personally salute any attempt to bump default ansible to 2.8, anywhere. | 13:32 |
pabelanger | zbr: https://review.opendev.org/650431/ drops ansible 2.5 support, https://review.opendev.org/676695/ defaults to ansible 2.8 and deprecates ansible 2.6 | 13:32 |
clarkb | roman_g isn't the first person to have trouble with apple browsers and rax. However, according to the RFC it is apple at fault and not rax: https://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2 says you create the list from the multiple headers | 13:32 |
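The RFC rule clarkb cites can be shown with Python's standard library: `http.client.parse_headers` keeps each repeated field, and joining the values with commas, in the order received, gives the single-field equivalent (RFC 7230 section 3.2.2 restates the RFC 2616 section 4.2 rule). A minimal sketch; the `combined_header` helper is hypothetical.

```python
import http.client
import io


def combined_header(raw_headers, name):
    """Merge repeated header fields into one comma-separated value.

    A list-valued field sent several times is equivalent to a single
    field whose values are joined with ", " in the order received.
    Returns None if the field is absent. Name matching is
    case-insensitive.
    """
    msg = http.client.parse_headers(io.BytesIO(raw_headers))
    values = msg.get_all(name)
    return None if values is None else ", ".join(values)


raw = (b"Content-Type: text/html\r\n"
       b"Content-Encoding: deflate\r\n"
       b"Content-Encoding: gzip\r\n"
       b"\r\n")
print(combined_header(raw, "Content-Encoding"))
```

Here the two fields collapse to `deflate, gzip`, the same value as the single-header form roman_g quoted from RFC 7231.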
fungi | for some reason i thought we'd started listing executors in the build results table. i guess that change hasn't landed yet? | 13:32 |
pabelanger | zbr: and https://review.opendev.org/674854/ is ansible 2.9.0rc3 support :) | 13:32 |
clarkb | infra-root and config-core ^ fyi about safari being broken with our swift-hosted logs in rax | 13:33 |
clarkb | fungi: I suggested it, but it requires a db migration for a new field | 13:33 |
fungi | ahh | 13:33 |
clarkb | we do record that info in the console log though | 13:33 |
*** yamamoto has joined #openstack-infra | 13:34 | |
fungi | unfortunately, in this case there was no console log preserved | 13:34 |
fungi | (possible it never even got that far) | 13:34 |
fungi | and yeah, specifying the same http header multiple times with different values is totally legitimate | 13:36 |
clarkb | fwiw I'm still -0.5 on dropping ansible 2.5 | 13:36 |
fungi | as long as it's defined as a list-type | 13:36 |
clarkb | I don't think that really helps anyone and only causes problems | 13:36 |
fungi | (which content-encoding is) | 13:36 |
clarkb | fungi: yup, and for some reason safari fails to treat content-encoding that way. None of the other browsers I tested had trouble (I didn't test safari) | 13:37 |
fungi | amusing too since it uses webkit, which chrom*'s blink engine was forked from | 13:37 |
frickler | seems IE would also be broken according to https://noxxi.de/research/http-evader-explained-4-double-encoding.html | 13:38 |
frickler | it also mentions firewalls, so that may affect some corporate VPNs, too. likely we'd be on the safer side if we could avoid double encodings, then | 13:41 |
clarkb | frickler: I think the only way for us to do that is to stop compressing what we upload to rax | 13:42 |
clarkb | which seems less than ideal particularly since rax is following the rfc | 13:42 |
frickler | clarkb: or stop uploading to rax. I haven't followed yet why this is only happening there | 13:42 |
fungi | likely a nuance of their cdn or the vintage of swift they're still running or some local patches they're carrying | 13:43 |
clarkb | My understanding is that whatever web server sits in front of rax swift is gzip encoding everything | 13:43 |
clarkb | even stuff that was deflate encoded | 13:43 |
clarkb | it does so validly | 13:43 |
fungi | but yeah, sounds like cdn in that case | 13:43 |
clarkb | if we upload raw uncompressed data it should result in a single compression encoding from that proxy | 13:44 |
clarkb | but we upload deflate-encoded data to reduce disk and upload resource consumption | 13:45 |
frickler | so I think the main question now is: do we insist what we do is correct or do we want to adjust for users of non-RFC compliant browsing environments that they may not be able to influence? | 13:46 |
frickler | rax is one of how many sites we upload to? could we live with not using them for logs? | 13:47 |
clarkb | they are currently 1/3 clouds and 3/5 regions iirc | 13:48 |
mordred | yeah - and I think we also don't want to upload without deflate, for those reasons | 13:48 |
mordred | this is an annoying bug | 13:48 |
clarkb | I don't think we should turn off rax | 13:48 |
frickler | can we use gzip instead of deflate? that should not be compressed twice, then, should it? | 13:49 |
clarkb | I haven't tested that, but it may be possible | 13:49 |
*** ramishra has joined #openstack-infra | 13:50 | |
*** eharney has joined #openstack-infra | 13:52 | |
*** xek_ has joined #openstack-infra | 13:55 | |
*** ociuhandu has quit IRC | 13:57 | |
*** udesale has quit IRC | 13:57 | |
*** udesale has joined #openstack-infra | 13:58 | |
*** surpatil has quit IRC | 13:58 | |
*** surpatil has joined #openstack-infra | 13:59 | |
*** lbragstad has joined #openstack-infra | 13:59 | |
corvus | i *think* (digging into fuzzy memory) the only reason we deflate is that it's easy to do so in a streaming manner with the python libs | 14:01 |
corvus | so investigating switching that to gzip may be promising | 14:01 |
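corvus's point that deflate was chosen because it streams easily does not rule out gzip: Python's `zlib.compressobj` can stream either container, selected by `wbits` (15 for the zlib wrapper used by HTTP "deflate", 16 + 15 = 31 for gzip). A minimal sketch, assuming the uploader compresses an iterable of byte chunks; `gzip_stream` is a hypothetical generator, not the actual upload code, and whether the rax CDN then leaves a gzip-encoded object alone is still untested, as clarkb notes.

```python
import gzip
import zlib


def gzip_stream(chunks, level=6):
    """Compress an iterable of byte chunks into gzip format incrementally.

    wbits=31 makes zlib emit a gzip header and trailer instead of the
    zlib wrapper; memory use stays bounded just as with streaming
    deflate, because each chunk is fed through the same compressor.
    """
    comp = zlib.compressobj(level, zlib.DEFLATED, 31)
    for chunk in chunks:
        out = comp.compress(chunk)
        if out:
            yield out
    yield comp.flush()


data = [b"line %d\n" % i for i in range(1000)]
compressed = b"".join(gzip_stream(data))
assert gzip.decompress(compressed) == b"".join(data)
```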
mordred | ++ | 14:03 |
donnyd | clarkb: for instance right now http://grafana.openstack.org/d/3Bwpi5SZk/nodepool-fortnebula?orgId=1&from=now-3h&to=now | 14:05 |
donnyd | the 21 nodes it says it's deleting were all deleted within a few seconds on my end | 14:06 |
openstackgerrit | Merged opendev/project-config master: Add a third-party check pipeline to OpenDev https://review.opendev.org/682758 | 14:06 |
*** fdegir has quit IRC | 14:06 | |
*** georgk has quit IRC | 14:06 | |
*** georgk has joined #openstack-infra | 14:06 | |
*** fdegir has joined #openstack-infra | 14:06 | |
donnyd | https://www.irccloud.com/pastebin/xYumO2ij/ | 14:07 |
donnyd | There is nothing in BUILD or DELETE | 14:07 |
clarkb | donnyd: are there active nodes though? | 14:10 |
donnyd | 2019-10-11 14:05:13.758 544 INFO nova.compute.manager [req-2cd9bc7f-ea85-4039-8e15-3aa9fc4d9928 3c109a4413ca4b68b90560093ff2d79c e8fd161dc34c421a979a9e6421f823e9 - default default] [instance: e314f6a8-cd8e-4c8c-8fbc-f220fb8ddadc] Took 3.39 seconds to destroy the instance on the hypervisor. | 14:10 |
clarkb | nova could have refused or failed to delete those | 14:10 |
donnyd | I see none of that in the logs, but maybe I am not looking for the right thing | 14:10 |
clarkb | it's not the hypervisor that matters as much as the api | 14:10 |
clarkb | since nodepool only knows what the api tells it | 14:10 |
*** liuyulong has joined #openstack-infra | 14:12 | |
donnyd | welp this is not good | 14:13 |
donnyd | https://www.irccloud.com/pastebin/ckEGpfRE/ | 14:14 |
mordred | donnyd: that does, in fact, seem less than ideal | 14:14 |
donnyd | This doesn't help with the deleting thing | 14:14 |
*** ociuhandu has joined #openstack-infra | 14:14 | |
donnyd | but there are plenty of resources | 14:14 |
donnyd | gonna have to run that down | 14:14 |
donnyd | well that was from 4 hours ago... it makes more sense now because I was in fact out of resources | 14:16 |
donnyd | I may have automated the build of a big data platform that ate all of the resources | 14:17 |
donnyd | but that is fixed now | 14:17 |
fungi | frickler: i would hesitate to assume users may not be able to influence their browsing environments. there is always influence which can be exerted, the users may simply feel it's not worth the degree of effort involved in doing so | 14:17 |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: configure-mirrors: use dnf when needed https://review.opendev.org/688118 | 14:18 |
fungi | effort/risk | 14:19 |
donnyd | nodepool asked FN to build instances which happens pretty quickly as noted here | 14:24 |
donnyd | https://www.irccloud.com/pastebin/AM0pC7Rt/ | 14:24 |
donnyd | In the time it took to run the commands twice it went from BUILD to ACTIVE | 14:24 |
donnyd | and there is nothing in DELETE | 14:24 |
donnyd | so I can't understand where the API would report that they are still deleting | 14:25 |
clarkb | it might not report they are deleting | 14:25 |
clarkb | nodepool manages its own state which you see in grafana | 14:25 |
donnyd | oh | 14:25 |
clarkb | if a node goes to deleting in nodepools state it will then ask nova to delete the instance | 14:26 |
clarkb | and it will do so until the instance actually deletes | 14:26 |
donnyd | Well, why does nodepool take so long to register that those nodes were deleted? That was like 20 minutes ago | 14:26 |
clarkb | but the instance may remain active in nova during that time if there are problems deleting | 14:26 |
donnyd | but they aren't active in nova.. they are gone | 14:26 |
donnyd | and there aren't any problems deleting | 14:27 |
*** sreejithp has joined #openstack-infra | 14:27 | |
fungi | it's possible we need to pick one from the nodepool list output which seems to have been in a delete state for a while, and then trace it in the launcher debug log | 14:27 |
clarkb | if the api returns that result (they are gone) then possibly a bug in openstacksdk or nodepool | 14:27 |
*** whoami-rajat has joined #openstack-infra | 14:27 | |
fungi | donnyd can also correlate the nova logs for the same instance uuid with those timestamps | 14:28 |
donnyd | probably | 14:28 |
donnyd | I did finally get aggregated logging up | 14:28 |
*** ociuhandu has quit IRC | 14:28 | |
fungi | that should in theory tell us on what end the delay is happening | 14:28 |
*** odicha has quit IRC | 14:29 | |
donnyd | just so I can understand the logic, nodepool issues delete, then waits for it to be returned back as deleted | 14:30 |
*** pgaxatte has quit IRC | 14:30 | |
donnyd | but the server remains active until its told to delete | 14:30 |
clarkb | nodepool sends the deleterequest then polls for the instance to go away | 14:31 |
clarkb | what nova's state looks like in the interim is often a mystery to us. I think we've seen nova fail to process the request entirely, process the request but not delete, or attempt to delete, fail, and set the instance to an error state, etc. | 14:32 |
clarkb | as fungi says, we should find an instance and track that uuid | 14:32 |
fungi | the state in nodepool goes from boot/ready/used to delete, and nodepool issues a nova delete api call via openstacksdk. if the api reports failure or the call times out, it queues that delete and retries shortly. if the api call succeeds, it waits to see the state in nova transition to something other than active (or possibly to disappear altogether). if it doesn't disappear, regardless of state, i think it may continue to retry the delete? | 14:32 |
*** ykarel is now known as ykarel|away | 14:33 | |
donnyd | If it had been doing this for a while, then I would lean more towards something is busted in FN.. but it started happening a week or so ago | 14:33 |
fungi | at any rate, once the instance no longer appears in the server list from the nova api, nodepool cleans up the record of that node on its side | 14:33 |
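The delete path fungi and clarkb describe (issue the delete, retry on API failure, then poll until the server vanishes from the list) can be sketched as a simple loop. This is a simplified illustration, not nodepool's implementation: `delete_node`, `FakeCloud`, `delete_server` and `list_server_ids` are hypothetical names, and `FakeCloud` stands in for the cloud API so the loop can run offline.

```python
import time


class FakeCloud:
    """Hypothetical API stub: the first delete call fails, and a deleted
    server stays in the listing for `linger` more list calls."""

    def __init__(self, uuid, linger=2):
        self._servers = {uuid}
        self._linger = {}
        self._linger_polls = linger
        self._fail_next_delete = True

    def delete_server(self, uuid):
        if self._fail_next_delete:
            self._fail_next_delete = False
            raise RuntimeError("simulated API error")
        self._linger.setdefault(uuid, self._linger_polls)

    def list_server_ids(self):
        for uuid in list(self._linger):
            if self._linger[uuid] == 0:
                self._servers.discard(uuid)
                del self._linger[uuid]
            else:
                self._linger[uuid] -= 1
        return set(self._servers)


def delete_node(cloud, uuid, poll_interval=0.0, max_attempts=10):
    """Request deletion until the server disappears from the listing.

    Returns the attempt number on which the server was gone; raises
    TimeoutError if it is still listed after max_attempts.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            cloud.delete_server(uuid)
        except RuntimeError:
            time.sleep(poll_interval)
            continue  # API call failed: retry the delete shortly
        if uuid not in cloud.list_server_ids():
            return attempt  # safe to drop the node record now
        time.sleep(poll_interval)  # still listed: re-issue the delete
    raise TimeoutError(f"{uuid} still listed after {max_attempts} attempts")
```

In this model a node counts as "deleting" from the first loop iteration until `list_server_ids` stops returning it, which is why a stale deleting count can reflect the API listing rather than the hypervisor.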
donnyd | but also it only seems to happen in FN | 14:33 |
donnyd | so it can't be nodepool either | 14:33 |
*** ociuhandu has joined #openstack-infra | 14:34 | |
fungi | at any rate, the nodes "stuck" in delete according to nodepool could be anywhere in the spectrum from nodepool deciding the node should be deleted but hasn't issued the call to nova yet, to nodepool confirming the instance no longer appears in the nova server list and cleaning up the record on its end | 14:36 |
donnyd | so is it something I should just not worry about | 14:36 |
fungi | well, i think we start by catching it when there's a glut of stale deleting nodes and pick one which has been in that state for the longest and then collect the nodepool debug log entries for it | 14:37 |
fungi | then we can show you the timestamps we see for various api calls related to that instance uuid | 14:37 |
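Once an instance uuid is picked, correlating the nodepool launcher debug log with donnyd's nova logs is mostly a matter of filtering and sorting by timestamp. A minimal sketch; `uuid_timeline` is a hypothetical helper, and it assumes both logs start each relevant line with a `YYYY-MM-DD HH:MM:SS` timestamp plus milliseconds (as in the nova-compute excerpt above) in the same timezone. Python's logging module puts a comma before the milliseconds, so that is normalized too.

```python
from datetime import datetime


def uuid_timeline(lines, uuid):
    """Return (timestamp, line) pairs for log lines that mention `uuid`.

    Assumes each relevant line starts with "YYYY-MM-DD HH:MM:SS.mmm";
    a comma before the milliseconds is also accepted. Lines without a
    parseable timestamp are skipped. Results are sorted by time.
    """
    events = []
    for line in lines:
        if uuid not in line:
            continue
        stamp = line[:23].replace(",", ".")
        try:
            ts = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")
        except ValueError:
            continue
        events.append((ts, line.rstrip("\n")))
    events.sort(key=lambda event: event[0])
    return events
```

Feeding it the launcher and nova log lines together puts the delete request, nova's "Took N seconds to destroy the instance" line, and the point where nodepool drops the node on one timeline, showing which side the delay is on.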
donnyd | ok, that would be useful data to figure out what the dealio is with it | 14:38 |
fungi | may be best to identify an affected instance but wait until the system recovers to steady state before pulling logs, just for completeness | 14:38 |
*** ykarel|away has quit IRC | 14:38 | |
fungi | it's not super disruptive for us, it's just preventing us from making efficient use of our quota there | 14:39 |
donnyd | http://grafana.openstack.org/d/3Bwpi5SZk/nodepool-fortnebula?orgId=1&from=now-2w%2Fw&to=now-2w%2Fw | 14:39 |
donnyd | here is two weeks ago | 14:39 |
donnyd | if you isolate out just deleting nodes the graph makes sense | 14:39 |
donnyd | and then look here | 14:39 |
donnyd | http://grafana.openstack.org/d/3Bwpi5SZk/nodepool-fortnebula?orgId=1&from=now-7d&to=now | 14:39 |
donnyd | same deal, just show only deleting nodes | 14:40 |
*** diablo_rojo has joined #openstack-infra | 14:40 | |
*** michael-beaver has joined #openstack-infra | 14:41 | |
donnyd | and finally looking at yesterday it went on pretty much all day | 14:41 |
donnyd | http://grafana.openstack.org/d/3Bwpi5SZk/nodepool-fortnebula?orgId=1&from=1570694286711&to=1570720082216 | 14:41 |
donnyd | I also use FN daily for other things, so if there was an issue deleting something.. you would think I would have noticed something | 14:42 |
*** goldyfruit_ has joined #openstack-infra | 14:43 | |
clarkb | ya I haven't noticed issues with my boot and delete loops testing network manager stuff | 14:43 |
fungi | yeah, looks like we saw an instance of it today between 14:00 and 14:30 | 14:44 |
fungi | it could be something like a thread getting blocked or in livelock of some sort in nodepool-launcher | 14:44 |
donnyd | would it be appropriate to maybe bounce nl02? | 14:45 |
fungi | it's interesting we saw a spike of deleting nodes (~60) all at once at 1400, and then half of them were cleared out immediately and the other half stuck around | 14:45 |
*** goldyfruit has quit IRC | 14:46 | |
donnyd | but then other times it goes back to normal | 14:47 |
donnyd | building and deleting without issue | 14:47 |
fungi | the nodepool-launcher process on nl02 has been running for just over a month according to ps, so yeah it's possible it's gotten itself into a tizzy, but i think we'd rather collect details on the symptom first before potentially making it vanish for weeks without any information on what may have caused it | 14:48 |
donnyd | yea that makes sense to do | 14:48 |
*** pkopec has quit IRC | 14:49 | |
donnyd | I want to be clear I am not pointing the finger at nodepool... it could very well be FN.. just want to get to the bottom of it | 14:50 |
fungi | but if memory serves there's a separate deleter thread per provider so it's possible the symptom isn't actually being triggered by fortnebula-specific behavior and that thread is just experiencing some bug for unrelated reasons | 14:50 |
fungi | (as an explanation for why we don't see it in the graphs for other providers) | 14:51 |
fungi | though interestingly, we saw a glut of nodes in a deleting state today in ovh just before fn: http://grafana.openstack.org/d/BhcSH5Iiz/nodepool-ovh?orgId=1 | 14:52 |
*** e0ne has quit IRC | 14:53 | |
fungi | ovh-bhs1 i mean | 14:53 |
fungi | so i wouldn't assume just yet that it's only fn exhibiting this | 14:53 |
donnyd | here http://grafana.openstack.org/d/BhcSH5Iiz/nodepool-ovh?orgId=1&from=1570800147449&to=1570803601387 | 14:53 |
fungi | yep | 14:54 |
fungi | anyway, i need to go meet some folks for tacos, so will be disappearing for a bit | 14:54 |
donnyd | oh don't be missing tacos... for anything | 14:55 |
donnyd | it's not hurting anything other than lack of test nodes, so not critical to T/S right this second | 14:55 |
fungi | well, it's more that we don't have it happening at the moment, it'll be easier to pick a canary instance if we catch it in the act | 14:56 |
donnyd | ok cool | 14:56 |
clarkb | we can probably find an instance in the logs though? | 14:56 |
donnyd | I will keep my eyes peeled for the next time | 14:56 |
fungi | clarkb: yeah, i'm just lazy and that may require more hunting ;) | 14:57 |
donnyd | clarkb: I'm thinking you missed the tacos part | 14:57 |
*** KeithMnemonic has joined #openstack-infra | 14:57 | |
donnyd | LOL | 14:57 |
fungi | if it's activity-related then we may not see it again until next week what with friday and the weekend usually being slow periods for us | 14:57 |
donnyd | I will send up flares the next time i see it | 14:58 |
fungi | so going on a hunt for an example in the logs may be the better option for getting to the bottom of it | 14:58 |
clarkb | ya I can do it | 14:58 |
clarkb | I've already found a candidate, just trying to put relevant logs together | 14:59 |
fungi | well, if you're busy, and if nothing's on fire when i return, i can see what i find | 14:59 |
fungi | ahh, excellent. | 14:59 |
fungi | anyway, gotta go. bbiaw | 14:59 |
donnyd | fungi: wait back to the tacos thing.. do we all get them.. or just you? | 14:59 |
donnyd | clarkb: thanks for looking | 14:59 |
fungi | donnyd: they don't fit into a fax machine like pizza does, so just me i guess | 14:59 |
clarkb | donnyd: http://paste.openstack.org/show/783017/ | 15:00 |
clarkb | donnyd: nodepool reports that the nova api timed out when doing the first delete pass of that instance (there are others that exhibit this behavior in that same timeframe) | 15:00 |
clarkb | then it tries again and appears to succeed | 15:00 |
clarkb | I think we look at that instance to start and if necessary I can grab uuids for the others in that same block of time (these are UTC timestamps) | 15:00 |
donnyd | so I can just run down that instance id on my end and see what the issue is | 15:01 |
clarkb | donnyd: yup and if that doesn't clear things up we can try some other uuids | 15:01 |
*** surpatil has quit IRC | 15:01 | |
clarkb | donnyd: note that waitForNodeCleanup in that traceback is waiting for a nova list (or show against a uuid?) type api call to stop listing that instance uuid | 15:02 |
clarkb | ah ya it calls getServer | 15:02 |
clarkb | donnyd: basically what it is saying is that the server record didn't go away in 10 minutes, it doesn't tell us any info about what the server record's state was, just that it existed the whole time | 15:02 |
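The cleanup behaviour clarkb describes (keep looking the server up until it is no longer listed, give up after ten minutes) can be sketched roughly as below. This is not nodepool's actual `waitForNodeCleanup`: `get_server` is a hypothetical callable standing in for the SDK lookup (returning `None` once the server is gone), and the 5-second polling interval is an assumption.

```python
import time


class CleanupTimeout(Exception):
    """Raised when the server record is still listed after the timeout."""


def wait_for_server_gone(get_server, server_id, timeout=600, interval=5,
                         clock=time.monotonic, sleep=time.sleep):
    """Poll until get_server(server_id) returns None or the timeout expires.

    get_server is a hypothetical lookup returning a server record while the
    server is listed and None once it is gone. clock and sleep are injectable
    so the loop can be exercised without actually waiting.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        if get_server(server_id) is None:
            return
        sleep(interval)
    # As in the traceback: we only learn that the record never went away,
    # not which state it was in while it lingered.
    raise CleanupTimeout(
        f"server {server_id} still listed after {timeout} seconds")
```

Because the loop only checks for presence, a record that lingers in some terminal state is indistinguishable from a delete that never happened.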
*** FlorianFa has quit IRC | 15:03 | |
donnyd | I found it in the logs, now I am searching for where the delete request is at | 15:04 |
*** lpetrut has joined #openstack-infra | 15:04 | |
*** kjackal has quit IRC | 15:05 | |
donnyd | http://paste.openstack.org/show/783020/ | 15:07 |
pabelanger | fungi: zbr: yah, configure-mirrors is going to be broken on centos-8, we hard code some stuff in the repo files to centos-7 | 15:08 |
zbr | pabelanger: fungi: it seems that I need some help with https://review.opendev.org/#/c/688118/ -- fails on f29 without any trace of failure. | 15:08 |
pabelanger | I can push up a patch once I figure out new bits | 15:09 |
*** ociuhandu has quit IRC | 15:09 | |
donnyd | clarkb: | 15:10 |
donnyd | NP - 2019-10-11 14:18:32,732 INFO nodepool.DeletedNodeWorker: Deleting used instance 59cf655b-22e0-4d2f-ae7c-0610a911ada6 from fortnebula-regionone | 15:10 |
donnyd | FN - 2019-10-11 14:18:35.366 49335 INFO nova.compute.manager [req-738b906b-abfe-498a-83d0-a5d7922898b0 3c109a4413ca4b68b90560093ff2d79c e8fd161dc34c421a979a9e6421f823e9 - default default] [instance: 59cf655b-22e0-4d2f-ae7c-0610a911ada6] Terminating instance | 15:10 |
donnyd | so three seconds after the delete request was called, delete was issued | 15:11 |
*** kjackal has joined #openstack-infra | 15:12 | |
donnyd | and 5 seconds after that the instance was done | 15:12 |
donnyd | gone | 15:12 |
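Correlating the two sides is easier when both timestamp styles are parsed the same way: nodepool separates milliseconds with a comma, nova with a dot. A small sketch, assuming both hosts log in UTC (and, as donnyd notes, their clocks may differ by a few milliseconds):

```python
from datetime import datetime


def parse_log_ts(text):
    """Parse '2019-10-11 14:18:32,732' (nodepool) or '... 14:18:35.366' (nova)."""
    # Normalise the comma separator to a dot so one format string fits both.
    return datetime.strptime(text.replace(",", "."), "%Y-%m-%d %H:%M:%S.%f")


def seconds_between(earlier, later):
    return (parse_log_ts(later) - parse_log_ts(earlier)).total_seconds()


# nodepool "Deleting used instance" vs. nova "Terminating instance"
delta = seconds_between("2019-10-11 14:18:32,732", "2019-10-11 14:18:35.366")
print(f"{delta:.3f}")  # 2.634
```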
*** ykarel|away has joined #openstack-infra | 15:14 | |
clarkb | donnyd: but for 10 minutes the api continued to report back that instance | 15:15 |
clarkb | donnyd: do you see the GETs for the instance in the api log? | 15:15 |
clarkb | that might provide some clues as to what was being returned back (if anything, maybe there is a caching bug in sdk?) | 15:15 |
donnyd | no | 15:16 |
clarkb | mordred: ^ bug in sdk then? | 15:18 |
mordred | clarkb: reading | 15:18 |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: Enable zuul-jobs-test-base-roles-centos-8 as nv https://review.opendev.org/688146 | 15:18 |
mordred | clarkb: we do cache server list in nodepool's use of sdk | 15:19 |
mordred | but - 10 minutes would be the wrong amount of time to cache that | 15:19 |
mordred | clarkb: we wouldn't be making GETs for the instance uuid - we should be doing GET /servers/detail every X seconds | 15:20 |
*** ykarel|away is now known as ykarel | 15:21 | |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: configure-mirrors: use dnf when needed https://review.opendev.org/688118 | 15:21 |
donnyd | clarkb: I use ansible to launch/delete all of my instances from gitlab ci, and one would think I would have hit this at some point in time | 15:21 |
*** kopecmartin is now known as kopecmartin|off | 15:21 | |
clarkb | mordred: ok so getServer translates to /servers/detail and does a listing rather than a specific uuid get | 15:21 |
clarkb | donnyd: ansible likely doesn't check if things are actually deleted which is what nodepool is doing here | 15:22 |
clarkb | donnyd: basically nodepool doesn't trust nova. It waits for nova to actually remove the server instance record before accepting that the deletion is complete | 15:22 |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: configure-mirrors: use dnf when needed https://review.opendev.org/688118 | 15:22 |
clarkb | donnyd: are there api calls for /servers/detail in that time period? | 15:22 |
mordred | clarkb: it's possible it's being returned with a status DELETED or something like that and we're not handling that properly? | 15:23 |
*** udesale has quit IRC | 15:23 | |
clarkb | maybe? | 15:25 |
donnyd | 2019-10-11 14:28:34.782 43667 INFO nova.osapi_compute.wsgi.server [req-3d0d6252-e286-4ead-98f0-04a38868e6cc 17c3614b712a447f85c7b08e07b7ae93 5bdb8777971d40799563ceb726317f11 - default default] 10.0.10.240 "GET /v2.1/5bdb8777971d40799563ceb726317f11/servers/detail HTTP/1.1" status: 200 len: 3876 time: 2.3317320 | 15:27 |
EmilienM | pabelanger: hey good morning, what's the path forward https://review.opendev.org/#/c/686196/ ? | 15:27 |
donnyd | clocks might be a few ms off | 15:27 |
*** ociuhandu has joined #openstack-infra | 15:27 | |
donnyd | 2019-10-11 14:28:35,778 ERROR | 15:27 |
donnyd | ^^^ that is the NP timestamp | 15:28 |
pabelanger | EmilienM: maybe best to debug in #zuul, and see how to support it. For now, you'd need to remove your plugin filter from the commit | 15:29 |
*** eernst has joined #openstack-infra | 15:31 | |
donnyd | I can export the log data if that is of interest | 15:31 |
EmilienM | pabelanger: no way I need that code in | 15:33 |
EmilienM | pabelanger: or my role doesn't work | 15:33 |
clarkb | donnyd: no I think that confirms that servers/detail is being called (implying we aren't caching old values of it) | 15:35 |
clarkb | donnyd: mordred I guess we have to investigate if we aren't handling DELETED states then | 15:35 |
pabelanger | EmilienM: there should be other ways to do it, but you basically need to remove it from current location. Because zuul-executor thinks it is a security risk and won't load the role | 15:36 |
EmilienM | pabelanger: where should I put it then? | 15:38 |
*** auristor has quit IRC | 15:39 | |
*** jamesmcarthur has joined #openstack-infra | 15:39 | |
pabelanger | EmilienM: that's the trick, it could go anyplace, but before you can use the role in tripleo, you'd need to move the plugin into the correct location for the nested ansible-playbook. There are a few example projects using filter plugins, https://opendev.org/openstack/openstack-ansible-plugins comes to mind | 15:41 |
*** bnemec has quit IRC | 15:43 | |
*** rpittau is now known as rpittau|afk | 15:44 | |
*** bnemec has joined #openstack-infra | 15:44 | |
*** auristor has joined #openstack-infra | 15:45 | |
pabelanger | EmilienM: https://opendev.org/zuul/zuul/src/branch/master/zuul/executor/server.py#L1429 is the code path you are running into | 15:45 |
*** yamamoto has quit IRC | 15:48 | |
*** yamamoto has joined #openstack-infra | 15:49 | |
openstackgerrit | Clark Boylan proposed zuul/zuul-jobs master: Use gzip to compress files uploaded to swift https://review.opendev.org/688154 | 15:50 |
clarkb | corvus: frickler ^ that is totally untested and ya gzip is much more painful to work with in this context than zlib/compress | 15:50 |
mordred | clarkb: yeah. I think that's something something soft delete something perhaps? | 15:50 |
clarkb | corvus: frickler that said it is entirely possible I've missed something obvious and there is an easier way to work with gzipfile | 15:50 |
clarkb | like maybe if we just open a GzipFile that class will do the windowing of reads for us automagically? | 15:51 |
clarkb | I should actually test that I guess | 15:51 |
clarkb | oh no because with python2 you have to write to a "file" I expect it's a similar pain | 15:51 |
*** auristor has quit IRC | 15:51 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: URL quote username/password in gerrit https://review.opendev.org/688155 | 15:53 |
*** yamamoto has quit IRC | 15:54 | |
corvus | clarkb: i haven't looked at it deeply yet -- but is "send the gzip header followed by use zlib as we are now" a viable option? | 15:54 |
corvus | (this is brainstorm-level engineering -- i have no idea if what i said makes sense) | 15:54 |
*** jaosorior has quit IRC | 15:55 | |
*** ykarel is now known as ykarel|afk | 15:55 | |
*** auristor has joined #openstack-infra | 15:55 | |
clarkb | corvus: you know it probably is. In fact python gzip lib is implemented with zlib under the hood. Unfortunately they don't expose the "write a gzip" header functionality publicly | 15:56 |
clarkb | but it probably isn't too bad to write one ourselves | 15:56 |
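For reference, RFC 1952 lays out a gzip member as a 10-byte header, raw deflate data, and an 8-byte trailer holding the CRC32 and the uncompressed length modulo 2**32, so the checksum is only needed after all data has been written. A minimal sketch of writing that framing by hand around a raw deflate stream (`wbits=-15`), assuming Python 3 and input arriving as an iterable of byte chunks; `gzip_chunks` is a hypothetical helper, not code from the upload-logs-swift role. Python's `zlib.compressobj(wbits=31)` produces the same framing without the manual header and trailer.

```python
import struct
import zlib


def gzip_chunks(chunks, level=6):
    """Yield one gzip member, built incrementally from an iterable of bytes."""
    # Header: magic 1f 8b, method 8 (deflate), no flags, mtime 0,
    # no extra flags, OS 255 (unknown).
    yield b"\x1f\x8b\x08\x00" + struct.pack("<I", 0) + b"\x00\xff"
    # Negative wbits: raw deflate, with no zlib header or adler32 checksum.
    compressor = zlib.compressobj(level, zlib.DEFLATED, -zlib.MAX_WBITS)
    crc = 0
    size = 0
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)
        size += len(chunk)
        data = compressor.compress(chunk)
        if data:
            yield data
    yield compressor.flush()
    # Trailer: CRC32 and input size mod 2**32, both little-endian.
    yield struct.pack("<II", crc & 0xFFFFFFFF, size & 0xFFFFFFFF)
```

Only one input chunk plus the compressor state is held in memory at a time.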
AJaeger | clarkb: do we really need to support python2 for that file? I left a comment | 15:56 |
clarkb | AJaeger: we do as long as we continue to run jobs on centos7 | 15:56 |
corvus | s/we/a zuul user/ | 15:56 |
clarkb | corvus: ++ | 15:56 |
corvus | clarkb: then i think it may be worth doing, because i think "store entire compressed file in memory" is something we should avoid :) | 15:57 |
AJaeger | clarkb: then the python3 in line 1 is confusing ;( | 15:57 |
clarkb | AJaeger: it's meant to default to python3 but I think we run it under 2 on centos 7 | 15:57 |
corvus | clarkb: oh hrm.... actually.... | 15:58 |
corvus | clarkb: this runs only on the executor... | 15:58 |
corvus | do we know if ansible running on localhost uses python2 or 3? | 15:58 |
corvus | (this is run by the ansible process on the executor using the implicit localhost connection) | 15:59 |
*** auristor has quit IRC | 16:00 | |
*** lucasagomes has quit IRC | 16:01 | |
openstackgerrit | Clark Boylan proposed zuul/nodepool master: Handle case where nova server is in DELETED state https://review.opendev.org/688157 | 16:01 |
clarkb | corvus: oh I think it is python2 by default but can be python3 | 16:01 |
clarkb | mordred: donnyd ^ something like that maybe? | 16:01 |
clarkb | corvus: actually wait no | 16:02 |
clarkb | corvus: the remote python is python2 by default but can be python3. The local ansible will have been installed under a python3 venv iirc. Double checking | 16:03 |
*** prometheanfire has quit IRC | 16:03 | |
*** prometheanfire has joined #openstack-infra | 16:03 | |
mordred | clarkb: ++ | 16:03 |
corvus | yeah, that is sounding plausible to me... | 16:03 |
clarkb | corvus: ya the local ansible venvs are all python3 on ze01 | 16:04 |
mordred | clarkb: did we find any corroboration that DELETED is the status? | 16:04 |
corvus | clarkb, AJaeger: so maybe we can write this to be py3 only | 16:04 |
clarkb | mordred: no | 16:04 |
corvus | clarkb: does that make it easier? | 16:04 |
clarkb | donnyd: do you know if the instances are going into a DELETED state for an appreciable amount of time? | 16:04 |
corvus | and does it avoid the read everything into memory issue? | 16:04 |
clarkb | corvus: ya I think in that case we can just use gzip.compress() | 16:04 |
clarkb | corvus: and we may write some extra headers but the resulting file is still a valid gzip file from my testing | 16:04 |
*** lpetrut has quit IRC | 16:05 | |
clarkb | basically we'll have a header for every 16kb input bytes | 16:05 |
clarkb | but can avoid reading everything into memory easily that way | 16:05 |
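The blockwise option, compressing each input block as its own complete gzip member and concatenating the results, might look like this. RFC 1952 permits a file to contain several members, and Python 3's `gzip.decompress` reads them all back as one stream. The 16 KB block size mirrors the figure above and is otherwise arbitrary; `gzip_blockwise` is a hypothetical name.

```python
import gzip
import io

BLOCK_SIZE = 16 * 1024  # bytes of input per gzip member (assumed)


def gzip_blockwise(stream, block_size=BLOCK_SIZE):
    """Yield one complete gzip member per block read from a binary stream."""
    while True:
        block = stream.read(block_size)
        if not block:
            break
        # Each call emits its own header, deflate data and CRC32/size trailer.
        yield gzip.compress(block)


data = b"log line\n" * 10000
blob = b"".join(gzip_blockwise(io.BytesIO(data)))
assert gzip.decompress(blob) == data  # multi-member data decompresses in one go
```

The trade-off is a header and trailer per block, and clients that stop after the first member will see truncated output.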
mordred | clarkb: the nova docs say it can be available with no contract on amount of tie | 16:05 |
mordred | time | 16:05 |
mordred | clarkb: "In some circumstances deleted items will still be accessible via the backend database" | 16:06 |
clarkb | mordred: ya I think it is probably a good thing to check against anyway even if it isn't this specific issue | 16:06 |
clarkb | mordred: given that nova documentation | 16:06 |
corvus | clarkb: or we could just use a tempfile | 16:07 |
mordred | clarkb: yeah - I can't find a list of valid statuses - BUT - the docs for vm_state list ACTIVE And DELETED as two different states | 16:07 |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: configure-mirrors: use dnf when needed https://review.opendev.org/688118 | 16:07 |
clarkb | corvus: not sure I understand | 16:07 |
clarkb | mordred: https://docs.openstack.org/api-guide/compute/server_concepts.html is the list of states | 16:07 |
*** auristor has joined #openstack-infra | 16:08 | |
mordred | clarkb: oh - also - apparently we should add SOFT_DELETED too? | 16:08 |
clarkb | mordred: I thought about that but I don't think so. If the instance can be restored from a soft deleted state then it likely counts against your quota | 16:08 |
mordred | good point | 16:08 |
mordred | well - I mean - there's a question ... | 16:08 |
mordred | does a server in DELETED state count against quota? | 16:08 |
mordred | dansmith: ^^ ? | 16:08 |
dansmith | mordred: no, shouldn't | 16:09 |
mordred | cool | 16:09 |
mordred | clarkb: then yeah - I think you're patch is what we should do | 16:09 |
mordred | your | 16:09 |
clarkb | mordred: cool | 16:09 |
mordred | I mean - you ARE a patch | 16:09 |
mordred | but that's different | 16:09 |
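The idea behind the proposed nodepool change (688157), treating a server that is still listed but reports `DELETED` as gone, can be sketched as a small predicate. This is an illustration, not the code in that review: it assumes the server record is a dict with a `status` key (or `None` when absent), and it keeps `SOFT_DELETED` servers as present, following the discussion above about soft-deleted instances still being restorable and counting against quota.

```python
# Statuses treated as "gone for good" (assumed set based on the discussion
# above; SOFT_DELETED is excluded because such servers can be restored).
GONE_STATUSES = frozenset({"DELETED"})


def server_is_gone(server):
    """Return True if the record is absent or in a terminal deleted state."""
    if server is None:
        return True
    return str(server.get("status", "")).upper() in GONE_STATUSES
```

Used in place of a plain `is None` check inside a cleanup loop, this would stop a lingering `DELETED` record from holding up node cleanup.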
corvus | clarkb: what if we used gzip just to write the header for us, then switched to zlib? :) | 16:10 |
clarkb | hrm gzip.write('') then append writes for zlib after? | 16:11 |
clarkb | it is possible that produces a working file | 16:11 |
clarkb | I can test that | 16:11 |
corvus | clarkb: another idea: can we use the code you have now, but truncate the stringio after each iteration? | 16:12 |
clarkb | corvus: a naive s=gzip.write('') then s+=zlib.compressobj().write('stuff') does not result in stuff being uncompressed. It is treated as trailing garbage and we uncompress an empty file | 16:16 |
clarkb | I guess I need to read up on gzip headers | 16:16 |
clarkb | the crc32 may break us here? | 16:17 |
clarkb | and the size | 16:17 |
*** jamesmcarthur has quit IRC | 16:19 | |
fungi | tacos managed | 16:21 |
fungi | seeing what all i missed in the meantime | 16:21 |
*** eernst has quit IRC | 16:21 | |
pabelanger | yum | 16:21 |
*** eernst has joined #openstack-infra | 16:23 | |
*** jamesmcarthur has joined #openstack-infra | 16:23 | |
*** odicha has joined #openstack-infra | 16:23 | |
*** ociuhandu has quit IRC | 16:23 | |
*** e0ne has joined #openstack-infra | 16:23 | |
*** ramishra has quit IRC | 16:23 | |
*** eernst has quit IRC | 16:23 | |
clarkb | corvus: learning up on zlib headers vs gzip it seems that zlib appends the checksum to the end of the stream | 16:24 |
clarkb | gzip prepends it in the header | 16:25 |
clarkb | this is why zlib implements the partial encoding with compressobj but gzip requires reading the whole thing into memory or writing a new header for each block | 16:25 |
corvus | oy | 16:26 |
*** ociuhandu has joined #openstack-infra | 16:26 | |
*** e0ne has quit IRC | 16:26 | |
corvus | clarkb: so our only viable options are to compress blockwise, compress the whole thing in memory, or compress the whole thing on disk. | 16:26 |
*** yamamoto has joined #openstack-infra | 16:26 | |
clarkb | corvus: aiui yes | 16:26 |
corvus | clarkb: since this runs on the executors, which are memory-constrained, i think we should avoid 'whole file in memory'. i think we could make a tempfile though, and then hand the upload method a handle to that. | 16:29 |
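corvus's tempfile suggestion could look roughly like this: stream the source through `gzip.GzipFile` into an anonymous temporary file, then rewind it and hand the open handle to whatever performs the upload. A sketch assuming Python 3; `compress_to_tempfile` is a hypothetical helper and the upload step itself is left out.

```python
import gzip
import shutil
import tempfile


def compress_to_tempfile(src, chunk_size=64 * 1024):
    """Gzip a readable binary file object into a temporary file on disk.

    Only chunk_size bytes of input are held in memory at a time. The caller
    gets an open file positioned at the start; closing it deletes it.
    """
    tmp = tempfile.TemporaryFile()
    with gzip.GzipFile(fileobj=tmp, mode="wb") as gz:
        shutil.copyfileobj(src, gz, chunk_size)
    # Closing the GzipFile writes the CRC32/size trailer but leaves tmp open.
    tmp.seek(0)
    return tmp
```

This keeps memory flat at the cost of disk I/O on the executor, and the result is an ordinary single-member gzip file.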
corvus | if you think blockwise is better, that's fine -- i just worry that it might be too weird for some utility... | 16:30 |
corvus | like, i dunno, safari :) | 16:30 |
clarkb | ya blockwise could potentially trip up some clients | 16:30 |
*** yamamoto has quit IRC | 16:31 | |
clarkb | or we could just suggest everyone use a browser that works >_> | 16:31 |
corvus | clarkb: yeah, though this should have the benefit of making curl/wget better too, right? | 16:32 |
corvus | (in that curl|gunzip should work?) | 16:32 |
corvus | clarkb: actualy... | 16:33 |
clarkb | yes though curl | gunzip | some_deflate_utility should also already work | 16:33 |
corvus | clarkb: if you use tar with gzip, does it do blockwise headers? | 16:34 |
clarkb | (That was roughly how I tested the rax was nesting the compression properly according to the rfc last time this came up) | 16:34 |
corvus | clarkb: yeah, but i mean we don't need "some_deflate_handler" anymore -- more folks have gunzip in their muscle memory | 16:34 |
corvus | clarkb: re tar: i'm wondering if maybe blockwise headers aren't so exotic after all | 16:35 |
clarkb | I don't know if tar does it block (file?) wise | 16:35 |
*** xeno_os76_xyz has joined #openstack-infra | 16:35 | |
donnyd | clarkb: sorry.. .fungi got me thinking about tacos... so I had to do something about it | 16:35 |
clarkb | donnyd: ha | 16:36 |
corvus | now i'm thinking about tacos | 16:36 |
*** xenos76 has quit IRC | 16:36 | |
fungi | donnyd: corvus: sorry about that, but they were very good tacos | 16:36 |
fungi | clarkb: i believe tar does headers per file in sequence, but not within the stream for a single file | 16:37 |
clarkb | I don't have traditional taco makings but I do have some leftover chinook salmon I can pick the bones out of and put into a tortilla | 16:37 |
fungi | file inside the tar container i mean | 16:37 |
*** markvoelker has quit IRC | 16:38 | |
fungi | clarkb: salmon tacos are marvellous, you should go for it | 16:38 |
donnyd | clarkb: >do you know if the instances are going into a DELETED state for an appreciable amount of time? < I don't even know where I would look for that data | 16:38 |
fungi | donnyd: this is one of the things where if we catch it in the act we can inspect with the api too | 16:39 |
corvus | fungi: yeah, but what about the gzip headers? | 16:39 |
clarkb | donnyd: I think the next time we run into the spike of deleting nodes we can do a nova list/openstack server list and check | 16:39 |
corvus | fungi: if you 'tar cfz', do you get a tar.gz file with 1 gzip crc, or a bunch of them? | 16:39 |
fungi | corvus: in the case of tar+gzip there is only one file from gzip's perspective. i don't know if there are interspersed gzip headers mixed within the data stream of that file | 16:40 |
donnyd | clarkb: maybe I can setup a CI job that polls the API every so often and puts that into object storage | 16:40 |
donnyd | So that way could at least track the states over time | 16:40 |
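A polling job like the one donnyd proposes could reduce each API snapshot to status counts plus the IDs of non-ACTIVE servers, so individual instances can later be matched against nodepool's logs. A sketch, assuming the job feeds in the parsed output of `openstack server list -f json` (a list of objects with `ID` and `Status` fields); the snapshot layout, object naming and `summarize_states` helper are hypothetical.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def summarize_states(servers, now=None):
    """Return (object_name, json_text) for one snapshot of server states."""
    now = now or datetime.now(timezone.utc)
    counts = Counter(server.get("Status", "UNKNOWN") for server in servers)
    snapshot = {
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "counts": dict(sorted(counts.items())),
        # IDs of anything not ACTIVE, e.g. servers lingering in DELETED.
        "non_active": sorted(s["ID"] for s in servers
                             if s.get("Status") != "ACTIVE" and "ID" in s),
    }
    name = now.strftime("%Y%m%d-%H%M%S") + ".json"
    return name, json.dumps(snapshot, indent=2)
```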
corvus | clarkb: https://pypi.org/project/gzip-stream/ | 16:40 |
fungi | unfortunately i'm more familiar with tar's protocol than gzip's, owing to dealing with it as a tape stream | 16:41 |
corvus | clarkb: ha! it's the truncate method :) | 16:42 |
clarkb | corvus: yup | 16:42 |
clarkb | it's licensed in such a way that we can vendor it without any concern | 16:42 |
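The "truncate method" amounts to letting `gzip.GzipFile` write into an in-memory buffer and draining that buffer after every write, so output never accumulates. The sketch below illustrates the idea; it is not gzip-stream's code, and `iter_gzip` is a hypothetical name. Some iterations yield nothing because the compressor buffers internally.

```python
import gzip
import io


def iter_gzip(chunks, compresslevel=6):
    """Yield gzip-compressed bytes for an iterable of input byte chunks."""
    buf = io.BytesIO()
    gz = gzip.GzipFile(fileobj=buf, mode="wb", compresslevel=compresslevel,
                       mtime=0)

    def drain():
        data = buf.getvalue()
        # Rewind before truncating: truncate() alone keeps the position at
        # the old end, and the next write would be padded with zero bytes.
        buf.seek(0)
        buf.truncate()
        return data

    for chunk in chunks:
        gz.write(chunk)
        out = drain()
        if out:
            yield out
    gz.close()  # flushes the deflate stream and writes the CRC32/size trailer
    yield drain()
```

Unlike the blockwise approach, this produces a single gzip member with one header and one trailer.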
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: configure-mirrors: use dnf when needed https://review.opendev.org/688118 | 16:43 |
corvus | clarkb: yeah, that could make things easy | 16:44 |
*** xek_ has quit IRC | 16:44 | |
*** markvoelker has joined #openstack-infra | 16:48 | |
*** derekh has quit IRC | 16:48 | |
*** paladox has quit IRC | 16:49 | |
*** xeno_os76_xyz has quit IRC | 16:53 | |
*** xeno_os76_xyz has joined #openstack-infra | 16:53 | |
*** gyee has joined #openstack-infra | 16:53 | |
openstackgerrit | James E. Blair proposed zuul/zuul master: URL quote username/password in gerrit https://review.opendev.org/688155 | 16:54 |
donnyd | clarkb: is there a place I can post this stat in json? | 16:54 |
donnyd | have a job running | 16:55 |
*** roman_g has joined #openstack-infra | 16:55 | |
openstackgerrit | Clark Boylan proposed zuul/zuul-jobs master: Use gzip to compress files uploaded to swift https://review.opendev.org/688154 | 16:55 |
donnyd | or should I just ship it to somewhere local | 16:55 |
clarkb | corvus: ^ I think I did that correctly and did proper attribution in the vendoring (even though technically we don't have to because CC0) | 16:56 |
clarkb | donnyd: we don't really have somewhere to post to I don't think | 16:56 |
clarkb | donnyd: I guess you could make a new paste on paste.o.o every once in a while? | 16:56 |
*** bnemec has quit IRC | 16:57 | |
donnyd | I will just create a container in the zuul swift and put it there by timestamp | 16:57 |
*** paladox has joined #openstack-infra | 16:57 | |
corvus | clarkb: yeah, attribution looks good (and i agree we should have it there anyway) | 16:58 |
openstackgerrit | Clark Boylan proposed zuul/zuul-jobs master: Use gzip to compress files uploaded to swift https://review.opendev.org/688154 | 16:58 |
clarkb | corvus: ^ I missed an import which that ps fixes | 16:58 |
donnyd | that way if we ever get around to it we could consume it with something else and it will be public | 16:58 |
*** bnemec has joined #openstack-infra | 16:58 | |
*** eernst has joined #openstack-infra | 16:58 | |
*** eernst_ has joined #openstack-infra | 16:59 | |
*** eernst has quit IRC | 16:59 | |
openstackgerrit | Clark Boylan proposed zuul/zuul-jobs master: Use gzip to compress files uploaded to swift https://review.opendev.org/688154 | 17:04 |
openstackgerrit | Clark Boylan proposed zuul/zuul-jobs master: Test role for upload-logs-swift https://review.opendev.org/688177 | 17:04 |
*** jpena is now known as jpena|off | 17:05 | |
donnyd | clarkb: can you get to this https://openstack.fortnebula.com:13808/v1/AUTH_e8fd161dc34c421a979a9e6421f823e9/fortnebula-infra-logs/ | 17:06 |
openstackgerrit | Clark Boylan proposed opendev/base-jobs master: Test the upload-logs-swift test role in base-test https://review.opendev.org/688178 | 17:08 |
clarkb | corvus: ^ that updates base-test | 17:08 |
clarkb | donnyd: I can, it renders as xml listing which isn't the prettiest but was able to find the json file from it just fine | 17:08 |
donnyd | is there a way to change that? | 17:08 |
clarkb | donnyd: I think that is working | 17:08 |
clarkb | donnyd: you can set an attribute on the container to have swift render an index.html for you | 17:09 |
clarkb | donnyd: but I don't think it is strictly necessary | 17:09 |
donnyd | well it surely would make it more convenient | 17:09 |
clarkb | X-Container-Meta-Web-Listings set that to true I think | 17:09 |
clarkb | as a container header attribute | 17:10 |
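Setting that attribute is a POST against the container with the metadata header. For anonymous browsers to see the index, the container also needs a public read ACL; here that is assumed to be `.r:*,.rlistings` (referrer-based public read plus listing access). A standard-library sketch; the URL and token are placeholders, and the request is only built, not sent.

```python
import urllib.request


def web_listings_request(container_url, token, public=True):
    """Build (but do not send) a Swift container POST enabling HTML listings."""
    req = urllib.request.Request(container_url, method="POST")
    req.add_header("X-Auth-Token", token)
    # Read by Swift's staticweb middleware to render an index for the container.
    req.add_header("X-Container-Meta-Web-Listings", "true")
    if public:
        # Public read and listing access so anonymous browsers see the index.
        req.add_header("X-Container-Read", ".r:*,.rlistings")
    return req
```

With a valid token, `urllib.request.urlopen(req)` would send it. Note that `urllib` stores header names via `str.capitalize()`, so `get_header` expects forms like `X-container-read`.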
donnyd | ok, can you check again to see if its good on your end? | 17:11 |
clarkb | donnyd: yup I have an index.html now | 17:12 |
donnyd | ok, I will schedule that job to run every couple minutes | 17:12 |
clarkb | donnyd: maybe set a week long expiry on the files too | 17:13 |
clarkb | we only keep about a week of nodepool logs iirc | 17:14 |
donnyd | I am learning all kinds of the swifts today... not sure how to do that either | 17:14 |
*** dklyle has quit IRC | 17:14 | |
donnyd | will swift do that for me too | 17:15 |
clarkb | yup /me is finding docs | 17:15 |
clarkb | https://docs.openstack.org/ocata/user-guide/cli-swift-set-object-expiration.html | 17:15 |
clarkb | I would use X-Delete-After | 17:15 |
clarkb | then do the maths for 7 days in seconds as the value | 17:15 |
donnyd | so something like date -u "+%Y%m%d-%H%M%S" -d "+7 days" | 17:17 |
*** markvoelker has quit IRC | 17:18 | |
openstackgerrit | Sorin Sbarnea proposed zuul/zuul-jobs master: configure-mirrors: use dnf when needed https://review.opendev.org/688118 | 17:18 |
clarkb | donnyd: if using Delete-At yes | 17:18 |
stephenfin | nova references things e.g. the py36 bindep profile in .zuul.yaml. Could someone point me to where those are defined? | 17:19 |
clarkb | or set X-Delete-After to 604800 | 17:19 |
donnyd | oh I like that much mo betta | 17:19 |
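Both expiry headers take plain integers. `X-Delete-After` is a number of seconds from now, while `X-Delete-At` is an absolute Unix epoch timestamp (for example the output of `date -u +%s -d "+7 days"`), not a formatted date string. A sketch that computes both and attaches the relative one to an object upload; the URL, token and helper names are placeholders, and the request is only built, not sent.

```python
import time
import urllib.request

SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604800


def delete_at(days=7, now=None):
    """Absolute expiry as a Unix timestamp, suitable for X-Delete-At."""
    now = time.time() if now is None else now
    return int(now) + days * 24 * 60 * 60


def upload_request(object_url, token, body, keep_for=SECONDS_PER_WEEK):
    """Build an object PUT that Swift should expire keep_for seconds later."""
    req = urllib.request.Request(object_url, data=body, method="PUT")
    req.add_header("X-Auth-Token", token)
    req.add_header("X-Delete-After", str(keep_for))
    return req
```

The relative form avoids clock and timezone arithmetic on the client, which is why it is the simpler choice for a fixed retention window.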
*** roman_g has quit IRC | 17:19 | |
clarkb | stephenfin: the bindep profile should be called bindep.txt in the root of the nova repo | 17:19 |
clarkb | stephenfin: or rather the full set of rules are in that file then those with the py36 profile will have a py36 annotation on those lines | 17:20 |
stephenfin | that's what I was expecting but I don't see any such annotations | 17:20 |
stephenfin | $ cat bindep.txt | grep py36 | 17:20 |
stephenfin | $ | 17:20 |
stephenfin | we don't have a global bindep.txt any more, do we? | 17:20 |
stephenfin | so maybe the definition in zuul.yaml is useless | 17:21 |
clarkb | stephenfin: we do have a global bindep file but only in legacy jobs (which your python36 jobs shouldn't be based on) | 17:21 |
stephenfin | nope, those are based on openstack-tox | 17:21 |
clarkb | in that case you don't have any py36 specific rules | 17:21 |
stephenfin | (y) thanks for the help :) | 17:22 |
donnyd | clarkb: is there a way to check and make sure it's set on an object | 17:23 |
AJaeger | stephenfin: https://opendev.org/openstack/openstack-zuul-jobs/src/branch/master/zuul.d/jobs.yaml#L140 is the job definition, and line 153 shows the bindep profiles that get installed | 17:24 |
donnyd | oh i guess swift stat will tell me that data | 17:24 |
*** eernst_ has quit IRC | 17:25 | |
donnyd | man object storage is so handy to have | 17:25 |
stephenfin | AJaeger: I'd expected bindep to fail if there wasn't anything matching the py36 profile but clearly not | 17:25 |
stephenfin | I guess that's just in case we (nova) needed to install anything for that environment specifically | 17:26 |
donnyd | booya... so that job will run every 2 minutes.. do you think that is too much? | 17:26 |
AJaeger | stephenfin: bindep works fine with nothing to do ;) | 17:27 |
AJaeger | stephenfin: correct, only if nova needs anything additional | 17:27 |
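AJaeger's point can be illustrated with a deliberately simplified model of bindep profile selection: lines with no profile are always selected, lines with profiles only when one of them is requested, so asking for a profile that no line mentions selects nothing extra. This is not bindep's parser: it treats every `platform:` selector as matching, ignores `!` negation and version constraints, and the sample entries are hypothetical.

```python
import re


def select_packages(bindep_text, profiles):
    """Return package names selected for the requested profiles (simplified)."""
    wanted = set(profiles)
    selected = []
    for raw in bindep_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        name = line.split()[0]
        match = re.search(r"\[([^\]]*)\]", line)
        selectors = match.group(1).split() if match else []
        # Platform selectors are ignored here; only profiles are considered.
        line_profiles = {s for s in selectors if not s.startswith("platform:")}
        if not line_profiles or line_profiles & wanted:
            selected.append(name)
    return selected


SAMPLE = """\
gcc
libffi-dev [platform:dpkg test]
python3-devel [platform:rpm py37]
"""
print(select_packages(SAMPLE, ["py36"]))  # ['gcc']
```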
*** yamamoto has joined #openstack-infra | 17:27 | |
*** rascasoft has quit IRC | 17:28 | |
*** rascasoft has joined #openstack-infra | 17:28 | |
*** gfidente has quit IRC | 17:29 | |
openstackgerrit | Clark Boylan proposed zuul/zuul-jobs master: Use gzip to compress files uploaded to swift https://review.opendev.org/688154 | 17:29 |
*** yamamoto has quit IRC | 17:33 | |
openstackgerrit | Clark Boylan proposed zuul/zuul-jobs master: Test role for upload-logs-swift https://review.opendev.org/688177 | 17:34 |
openstackgerrit | Clark Boylan proposed zuul/zuul-jobs master: Use gzip to compress files uploaded to swift https://review.opendev.org/688154 | 17:34 |
*** eharney has quit IRC | 17:34 | |
*** ociuhandu_ has joined #openstack-infra | 17:34 | |
*** rlandy is now known as rlandy|brb | 17:36 | |
*** ociuhandu has quit IRC | 17:37 | |
*** smarcet has joined #openstack-infra | 17:41 | |
*** ociuhandu_ has quit IRC | 17:42 | |
*** ociuhandu has joined #openstack-infra | 17:43 | |
*** gfidente has joined #openstack-infra | 17:45 | |
donnyd | clarkb: also do you think I should be using json or cli outputs? | 17:47 |
*** ociuhandu has quit IRC | 17:48 | |
donnyd | json is handy if I connect up something else to ingest this data | 17:48 |
donnyd | but not so handy for the humans | 17:48 |
clarkb | donnyd: my browser renders json fine so I'm ok with it as json | 17:49 |
*** dklyle has joined #openstack-infra | 17:49 | |
pabelanger | Hmm | 17:50 |
donnyd | okie dokey | 17:50 |
pabelanger | I seem to be getting a segfault with latest DIB now: http://paste.openstack.org/show/783025/ | 17:50 |
pabelanger | that is for fedora-30 | 17:50 |
pabelanger | ianw: ^maybe ideas on how to debug | 17:51 |
pabelanger | using latest released dib | 17:51 |
*** smarcet has quit IRC | 17:51 | |
*** smarcet has joined #openstack-infra | 17:53 | |
*** dklyle has quit IRC | 17:56 | |
clarkb | pabelanger: looks like yum segfaulted? | 17:58 |
clarkb | pabelanger: you should be able to turn on keeping of core dumps with ulimit then load it in gdb to confirm | 17:58 |
clarkb | (where yum is $YUM and may be dnf) | 17:58 |
*** e0ne has joined #openstack-infra | 17:59 | |
pabelanger | Oct 11 17:56:19 nb01 kernel: [12804485.152760] traps: dnf[3238] general protection ip:7fe85b87811d sp:7ffdadb934c0 error:0 in libc-2.29.so[7fe85b815000+14d000] | 17:59 |
pabelanger | yah, looks like it | 18:00 |
*** slaweq has quit IRC | 18:00 | |
openstackgerrit | Clark Boylan proposed zuul/nodepool master: Handle case where nova server is in DELETED state https://review.opendev.org/688157 | 18:01 |
*** dklyle has joined #openstack-infra | 18:02 | |
openstackgerrit | Clark Boylan proposed zuul/zuul-jobs master: Test role for upload-logs-swift https://review.opendev.org/688177 | 18:04 |
openstackgerrit | Clark Boylan proposed zuul/zuul-jobs master: Use gzip to compress files uploaded to swift https://review.opendev.org/688154 | 18:04 |
*** yamamoto has joined #openstack-infra | 18:06 | |
pabelanger | clarkb: odd, stop / start of nodepool-builder seems to have cleared things up. wonder if I had some hung process some place | 18:07 |
pabelanger | nope, that is a lie. Just happened again | 18:07 |
*** rlandy|brb is now known as rlandy | 18:08 | |
clarkb | it is breaking in libc | 18:08 |
clarkb | is it possible that the libc in the chroot is unhappy with the kernel? That would be very odd though | 18:08 |
clarkb | pabelanger: loading the coredump and running bt against it to get a backtrace might help (though you may also need to install debug files) | 18:09 |
pabelanger | there must be something specific to that gettext install phase because stuff before seems to work okay | 18:09 |
pabelanger | yah, going to work on that | 18:09 |
*** yamamoto has quit IRC | 18:11 | |
openstackgerrit | Clark Boylan proposed zuul/zuul-jobs master: Test role for upload-logs-swift https://review.opendev.org/688177 | 18:15 |
openstackgerrit | Clark Boylan proposed zuul/zuul-jobs master: Use gzip to compress files uploaded to swift https://review.opendev.org/688154 | 18:15 |
clarkb | *, somearg isn't valid in python2 | 18:15 |
clarkb | er that was for #zuul | 18:15 |
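The bare `*` (keyword-only arguments) is Python 3 syntax only. A common Python 2 compatible spelling pulls the keyword out of `**kwargs` and rejects anything unexpected. In the sketch below, `upload` and `compress` are hypothetical names, not the zuul-jobs code.

```python
# Python 3 only: the bare * makes `compress` keyword-only.
#     def upload(path, *, compress=True): ...

def upload(path, **kwargs):
    # Python 2 compatible: accept `compress` only as a keyword argument
    # and raise for any other keyword, mimicking keyword-only behaviour.
    compress = kwargs.pop("compress", True)
    if kwargs:
        raise TypeError("unexpected keyword arguments: %s" % ", ".join(sorted(kwargs)))
    return path, compress
```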
*** eharney has joined #openstack-infra | 18:16 | |
*** e0ne has quit IRC | 18:25 | |
openstackgerrit | Clark Boylan proposed zuul/zuul-jobs master: Test role for upload-logs-swift https://review.opendev.org/688177 | 18:25 |
openstackgerrit | Clark Boylan proposed zuul/zuul-jobs master: Use gzip to compress files uploaded to swift https://review.opendev.org/688154 | 18:25 |
donnyd | so next week, if we have any more hiccups, we can correlate what the api says on my end and from zuul | 18:27 |
donnyd | I have some other work to get done before i pack it up for the weekend | 18:28 |
openstackgerrit | Clark Boylan proposed zuul/zuul-jobs master: Test role for upload-logs-swift https://review.opendev.org/688177 | 18:32 |
openstackgerrit | Clark Boylan proposed zuul/zuul-jobs master: Use gzip to compress files uploaded to swift https://review.opendev.org/688154 | 18:32 |
fungi | thanks donnyd! | 18:34 |
*** e0ne has joined #openstack-infra | 18:35 | |
openstackgerrit | Monty Taylor proposed zuul/zuul-registry master: Catch openstack exceptions instead of keystoneauth https://review.opendev.org/688185 | 18:37 |
*** rlandy has quit IRC | 18:38 | |
*** rlandy has joined #openstack-infra | 18:39 | |
*** smarcet has quit IRC | 18:42 | |
*** jrist has quit IRC | 18:43 | |
*** jbadiapa has quit IRC | 18:45 | |
*** jrist has joined #openstack-infra | 18:51 | |
*** e0ne has quit IRC | 18:53 | |
*** smarcet has joined #openstack-infra | 18:55 | |
*** pcaruana has quit IRC | 18:55 | |
openstackgerrit | Merged zuul/zuul-jobs master: Test role for upload-logs-swift https://review.opendev.org/688177 | 18:56 |
*** eernst has joined #openstack-infra | 18:58 | |
*** weshay_ is now known as weshay | 19:00 | |
openstackgerrit | Merged zuul/zuul master: URL quote username/password in gerrit https://review.opendev.org/688155 | 19:00 |
*** eernst has quit IRC | 19:02 | |
*** xek_ has joined #openstack-infra | 19:03 | |
*** eernst has joined #openstack-infra | 19:07 | |
openstackgerrit | Merged opendev/base-jobs master: Test the upload-logs-swift test role in base-test https://review.opendev.org/688178 | 19:09 |
*** eernst has quit IRC | 19:12 | |
pabelanger | clarkb: ianw: Hmm, it looks like the core is getting saved inside the chroot, which nodepool-builder then deletes before I can copy out :( | 19:13 |
pabelanger | ERROR: apport (pid 22642) Fri Oct 11 19:08:23 2019: writing core dump to /tmp/dib_build.UnFYbmGt/mnt/core (limit: -1) | 19:13 |
pabelanger | ERROR: apport (pid 22642) Fri Oct 11 19:08:23 2019: executable: /tmp/dib_build.UnFYbmGt/mnt/usr/bin/python3.7 (command line "/usr/bin/python3 /usr/bin/dnf -v -y install gettext") | 19:14 |
pabelanger | seems to be what is triggering it | 19:14 |
clarkb | pabelanger: you can add a bash command into the dib scripts as a sort of breakpoint | 19:15 |
clarkb | it will drop you into the chroot with that shell | 19:15 |
*** goldyfruit___ has joined #openstack-infra | 19:15 | |
*** eharney has quit IRC | 19:16 | |
pabelanger | yah, I think I may also be able to export break=after-error | 19:16 |
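diskimage-builder reads a `break` environment variable. `after-error` is the value pabelanger mentions, and it drops into a shell when a phase fails instead of letting the build exit and clean up. The sketch below sets that variable for a manual run. The output name and element list passed to `run_dib` are placeholders, since nodepool-builder normally assembles the real command line from its own config.

```python
import os
import subprocess

def dib_env(break_point="after-error"):
    # Copy the caller's environment and add dib's `break` hook so the
    # build drops into an interactive shell at that point.
    env = dict(os.environ)
    env["break"] = break_point
    return env

def run_dib(output_name, elements, break_point="after-error"):
    # output_name and elements are placeholders for illustration only.
    cmd = ["disk-image-create", "-o", output_name] + list(elements)
    return subprocess.run(cmd, env=dib_env(break_point), check=True)
```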
*** goldyfruit_ has quit IRC | 19:17 | |
*** markvoelker has joined #openstack-infra | 19:18 | |
*** smarcet has quit IRC | 19:18 | |
*** yamamoto has joined #openstack-infra | 19:20 | |
clarkb | https://c540761633a53fe4ef5d-0ee3eeb68aa256c74f1e35f60c262d61.ssl.cf1.rackcdn.com/680178/2/check/tox-py27/5193465/job-output.txt renders for me and only has gzip encoding | 19:24 |
*** yamamoto has quit IRC | 19:24 | |
clarkb | corvus: ^ I think that means it worked | 19:24 |
*** dchen has joined #openstack-infra | 19:26 | |
*** jamesmcarthur has quit IRC | 19:26 | |
clarkb | I'm trying to find a file I can compare against but for whatever reason we don't seem to double encode job-output.txt on jobs normally? | 19:26 |
clarkb | however they are definitely deflate before and gzip now | 19:26 |
clarkb | at the very least I don't think we'll have broken existing working clients | 19:27 |
*** dychen has quit IRC | 19:27 | |
*** gagehugo has quit IRC | 19:29 | |
*** markvoelker has quit IRC | 19:30 | |
mordred | clarkb: I can verify that url renders in safari | 19:30 |
corvus | what was double encoded before? | 19:30 |
clarkb | corvus: looks like roman_g's example was a docs/ subdir | 19:31 |
fungi | it was https://2365c1d014187c3ae706-2572cddac5187c7b669ab9398e41b48d.ssl.cf5.rackcdn.com/687536/4/check/openstack-tox-docs/eb49c40/docs/ | 19:31 |
clarkb | so maybe we can make a change to reparent zuul-jobs docs build | 19:31 |
* clarkb looks | 19:31 | |
fungi | specifically | 19:31 |
clarkb | it's possible that we have to trip over some html handling in the cdn or something | 19:32 |
mordred | clarkb: yeah. docs subdir | 19:32 |
mordred | and I can confirm in a safari that docs from that change fails to render | 19:32 |
mordred | so if we can reparent a docs change, I can verify whether it fixes the issue or not | 19:32 |
clarkb | mordred: k I'm just double checking the inheritance path but should have a change up to test that shortly | 19:33 |
*** markvoelker has joined #openstack-infra | 19:33 | |
mordred | cool | 19:33 |
clarkb | ok opendev-tox-docs doesn't parent to tox-docs | 19:35 |
clarkb | so I can't edit tox-docs in zuul-jobs | 19:35 |
clarkb | this might be tricky to test because of trusted repo stuff | 19:35 |
clarkb | I'm not quite sure what the best way to get non trivial html content out of an existing job we can speculatively execute is | 19:37 |
clarkb | I'm going to find lunch now before I forget | 19:40 |
clarkb | if someone else wants to figure out ^ go for it. You just need to reparent to base-test | 19:40 |
*** jamesmcarthur has joined #openstack-infra | 19:40 | |
corvus | ++ me eat too | 19:40 |
*** markvoelker has quit IRC | 19:41 | |
*** rh-jelabarre has quit IRC | 19:45 | |
*** rh-jelabarre has joined #openstack-infra | 19:46 | |
fungi | i guess the simplest approach is going to be to make a clone of opendev-tox-docs which is parented to base-test? i'll see if i can propose that and then we merge it and then try to run that job in another proposed change | 19:46 |
clarkb | hrm ya I guess that would do it | 19:47 |
clarkb | then once that is done I guess I can start testing ianw's glean change | 19:47 |
*** jtomasek has quit IRC | 19:49 | |
pabelanger | clarkb: looks like bash inside chroot is messed up | 19:50 |
*** ykarel|afk has quit IRC | 19:50 | |
pabelanger | 2019-10-11 19:50:05.089 | bash: line 1: $'t\024': command not found | 19:50 |
pabelanger | when it dropped to bash | 19:50 |
openstackgerrit | Jeremy Stanley proposed opendev/base-jobs master: Add temporary opendev-tox-docs clone for base-test https://review.opendev.org/688194 | 19:51 |
fungi | clarkb: corvus: ^ | 19:51 |
pabelanger | going to try and break before pre-install, and run command myself | 19:52 |
*** jamesmcarthur has quit IRC | 19:52 | |
*** yamamoto has joined #openstack-infra | 19:57 | |
pabelanger | clarkb: ianw: heh, https://nb01.openstack.org/fedora-30-0000000067.log is also failing | 19:59 |
*** yamamoto has quit IRC | 20:02 | |
pabelanger | clarkb: ianw: from what I see, bash doesn't appear to be set up correctly in the chroot. I have the core, but I'm looking for development headers | 20:03 |
pabelanger | I _think_ it is failing when the bash completion stuff is getting compiled | 20:03 |
clarkb | neat | 20:03 |
clarkb | so it hampers debugging too | 20:04 |
pabelanger | because inside chroot, I don't have proper bash prompt | 20:04 |
pabelanger | clarkb: ianw: okay, the fedora-30 src nodepool jobs are also failing, so trying to see when it broke | 20:08 |
corvus | i'll need to do a full restart of zuul for the gerrit urlquote fix... i'm ready for that, is there anything else we should make sure is merged first, or anything i should wait on? | 20:13 |
clarkb | I am not aware of anything | 20:17 |
*** kjackal has quit IRC | 20:17 | |
clarkb | maybe double check no releases are in flight | 20:17 |
openstackgerrit | Merged opendev/base-jobs master: Add temporary opendev-tox-docs clone for base-test https://review.opendev.org/688194 | 20:19 |
corvus | fungi: ^ | 20:23 |
fungi | thanks! working on the next change now | 20:25 |
*** eharney has joined #openstack-infra | 20:30 | |
openstackgerrit | Jeremy Stanley proposed zuul/zuul master: DNM: Test changes to base-test job https://review.opendev.org/688199 | 20:31 |
clarkb | fungi: ^ that job doesn't exist | 20:34 |
clarkb | maybe missed a git add? | 20:34 |
fungi | d'oh | 20:36 |
fungi | i probably got the name wrong | 20:36 |
fungi | should have copied and pasted | 20:36 |
*** odicha has quit IRC | 20:37 | |
fungi | oh, zuul-tox-docs is defined somewhere other than zuul | 20:42 |
fungi | maybe i can find a project which directly consumes opendev-tox-docs | 20:42 |
clarkb | fungi: looks like openstack-tox-docs parents to it | 20:44 |
clarkb | ozj does | 20:44 |
fungi | yeah, nothing uses it directly outside trusted config projects | 20:44 |
clarkb | is ozj trusted? | 20:44 |
clarkb | (I don't think it is) | 20:45 |
fungi | it doesn't run it, only defines it | 20:45 |
clarkb | fungi: https://review.opendev.org/#/c/685453/ says it runs it | 20:45 |
fungi | oh, though maybe it does run something which is parented to it | 20:45 |
fungi | huh. so it does | 20:45 |
fungi | ahh, probably via a project-template | 20:46 |
fungi | okay, i'll test it there. thanks! | 20:46 |
openstackgerrit | Merged opendev/system-config master: Remove read-only user from registry https://review.opendev.org/687423 | 20:48 |
openstackgerrit | Merged opendev/system-config master: Remove linaro-cn1 https://review.opendev.org/686770 | 20:48 |
*** markvoelker has joined #openstack-infra | 20:49 | |
*** sreejithp has quit IRC | 20:52 | |
corvus | restarting now | 20:52 |
openstackgerrit | Jeremy Stanley proposed openstack/openstack-zuul-jobs master: DNM: exercise base-test job https://review.opendev.org/688202 | 20:52 |
*** markvoelker has quit IRC | 20:53 | |
clarkb | ianw's glean fix is 6/6 centos 7 boots on fn and 6/6 fedora 29 boots on fn | 20:57 |
clarkb | I'm going to sort out ubuntu builds now (trusty in particular hits the other code path in his change) | 20:58 |
clarkb | donnyd: ^fyi it is looking good for ianw tracking down that issue | 20:58 |
donnyd | Yea, that was good work in running down what would seem to be the root of it | 20:58 |
corvus | fungi: you may need to recheck cthat change | 20:59 |
* corvus does so | 20:59 | |
corvus | oh, nm it was there | 20:59 |
fungi | i figured i would check if it got enqueued, but thanks! | 21:01 |
fungi | we'll presumably need to recheck it until we hit a rax log location | 21:02 |
clarkb | fungi: ya | 21:02 |
fungi | or i can propose multiples if we're in a hurry | 21:02 |
corvus | #status log restarted all of zuul on commit b768ece2c0ecd235c418fe910b84ff88f69860d6 | 21:02 |
openstackstatus | corvus: finished logging | 21:02 |
* donnyd appreciates everyone's hard work in figuring out the ipv6 issue so FN works correctly | 21:03 | |
clarkb | I'll see how far I get testing these images today. I'll leave a comment on the change with how much testing I've done so that ianw can pick it back up on his monday morning | 21:04 |
clarkb | but I think we need a coordinated release of glean and dib to fix it (dib update to remove ra solicit delay and glean to just configure interfaces if udev triggered us) | 21:05 |
*** yamamoto has joined #openstack-infra | 21:12 | |
*** yamamoto has quit IRC | 21:16 | |
*** rfolco|ruck has quit IRC | 21:17 | |
*** tesseract has quit IRC | 21:18 | |
clarkb | mordred: does https://1354a872ead15749011f-193839948e82df3f7b6031d1afea5d13.ssl.cf1.rackcdn.com/688202/1/check/opendev-tox-docs-temporary-test/4e52d83/docs/ work for you? | 21:19 |
clarkb | https://b07845379824cfa9e48f-ea6c0ca013bce2ed83ca9ffa0031d5bd.ssl.cf1.rackcdn.com/685453/1/gate/opendev-tox-docs/fdd525a/docs/ is a previous ozj docs build on rackcdn that is encoded with both deflate and gzip | 21:20 |
clarkb | the first link was from fungi's test change and it appears to only have gzip encoding | 21:20 |
clarkb | if that works for mordred and the second fails then I think we can consider this the fix and merge the change to the production role | 21:20 |
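A `Content-Encoding` header lists codings in the order they were applied, so a response marked with both deflate and gzip only decodes correctly if the client undoes them in reverse order. The sketch below shows that unwinding under the usual HTTP convention that "deflate" means the zlib format. It does not claim to show what Safari or the CDN does internally.

```python
import gzip
import zlib

def decode_body(body: bytes, content_encoding: str) -> bytes:
    # Codings are listed in application order; undo them last-to-first.
    codings = [c.strip().lower() for c in content_encoding.split(",") if c.strip()]
    for coding in reversed(codings):
        if coding in ("gzip", "x-gzip"):
            body = gzip.decompress(body)
        elif coding == "deflate":
            body = zlib.decompress(body)  # HTTP "deflate" is zlib-wrapped
        elif coding != "identity":
            raise ValueError("unsupported coding: " + coding)
    return body
```

For example, a page that was deflated and then gzipped arrives as `Content-Encoding: deflate, gzip`, and `decode_body(data, "deflate, gzip")` strips gzip first, then deflate.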
corvus | also beware the jabberwock | 21:22 |
fungi | i should have just said "fnord" there | 21:22 |
fungi | then you would be able to say "i see the fnord" | 21:23 |
clarkb | and then maybe all you Mac OS X owners can file a bug or complain on twitter or something about safari not following the rfc | 21:23 |
*** eharney has quit IRC | 21:25 | |
*** whoami-rajat has quit IRC | 21:27 | |
clarkb | Serious errors were found while checking the disk drive for /. <- from booting my trusty image | 21:33 |
clarkb | I'll have to sort that out I guess | 21:33 |
clarkb | bionic works, testing xenial nowish | 21:34 |
openstackgerrit | David Ames proposed openstack/project-config master: New charms & interfaces for MySQL8 https://review.opendev.org/688209 | 21:40 |
*** xeno_os76_xyz has quit IRC | 21:41 | |
mordred | clarkb: checking | 21:41 |
mordred | clarkb: yes - that link works in safari | 21:42 |
openstackgerrit | David Ames proposed openstack/project-config master: New charms & interfaces for MySQL8 https://review.opendev.org/688209 | 21:42 |
mordred | clarkb: and the second one fails | 21:42 |
clarkb | cool I think we can approve https://review.opendev.org/#/c/688154/10 as having been tested with the correct behavior | 21:43 |
clarkb | yall should obviously review it for other criteria though | 21:43 |
*** yamamoto has joined #openstack-infra | 21:44 | |
fungi | awesome. i'll review that and abandon my test change and get a revert for the docs test job pushed up | 21:46 |
*** yamamoto has quit IRC | 21:49 | |
fungi | clarkb: i'm guessing the addition of the test-upload-logs-swift role in zuul/zuul-jobs is something we want to keep, but its addition in the opendev/base-jobs base-test job definition should be reverted, right? | 21:53 |
clarkb | fungi: correct | 21:53 |
fungi | okay, i'll just squash a revert of the most recent two opendev/base-jobs changes in that case | 21:54 |
openstackgerrit | Jeremy Stanley proposed opendev/base-jobs master: Revert "Test the upload-logs-swift test role in base-test" https://review.opendev.org/688215 | 21:56 |
*** rlandy has quit IRC | 21:57 | |
ianw | pabelanger: yeah, i'll have to jump on that segfault on monday :/ | 21:59 |
ianw | clarkb: those boots sound good! i started ripping it all out, but quickly ran into all the corner cases i mentioned so it got a bit much for friday afternoon | 22:00 |
clarkb | ianw: I'm having trouble with trusty but I don't think it is due to your glean change | 22:00 |
clarkb | so far fedora 29, centos 7, xenial, and bionic all work though | 22:00 |
clarkb | I'm trying a rebuild of trusty to rule out a fluke in uploads or bit flips somewhere | 22:00 |
clarkb | being a lot more careful to check hashes | 22:00 |
ianw | pabelanger: so you're seeing that same segfault outside the gate? | 22:00 |
ianw | ok cool ... i'm not sure what trusty we have left, there only seemed to be a few things from a cursory codesearch | 22:01 |
openstackgerrit | Merged zuul/zuul-jobs master: Use gzip to compress files uploaded to swift https://review.opendev.org/688154 | 22:02 |
*** diablo_rojo has quit IRC | 22:03 | |
clarkb | ianw: the internet seems to say this is often due to grub config errors (and common on trusty :( ) | 22:08 |
clarkb | ianw: any of that familiar to you? | 22:08 |
*** goldyfruit___ has quit IRC | 22:08 | |
*** ralonsoh has quit IRC | 22:09 | |
*** iurygregory has quit IRC | 22:10 | |
ianw | no sorry ... i haven't touched trusty in ages | 22:15 |
clarkb | also my build host is bionic not xenial so I can't build opensuse :/ and maybe that is a difference with our prod builders | 22:16 |
clarkb | I'm fetching the image locally and will check if anything looks wrong | 22:17 |
ianw | it has booted a trusty with the change in https://zuul.opendev.org/t/openstack/build/8e26a210bb1e43b4bf063272a462ef78 | 22:17 |
clarkb | oh that is a good point | 22:25 |
*** diablo_rojo has joined #openstack-infra | 22:29 | |
openstackgerrit | Clark Boylan proposed openstack/diskimage-builder master: Remove RA solicit delay https://review.opendev.org/688218 | 22:30 |
clarkb | ianw: ^ fyi that is the companion change to your glean change | 22:30 |
clarkb | cool editing grub settings got it to work | 22:34 |
clarkb | ianw: see comments on https://review.opendev.org/#/c/688031/2 but I think it is good to go | 22:36 |
*** KeithMnemonic has quit IRC | 22:38 | |
*** rcernin has joined #openstack-infra | 22:41 | |
*** xek_ has quit IRC | 22:42 | |
*** xek_ has joined #openstack-infra | 22:45 | |
*** jamesmcarthur has joined #openstack-infra | 22:45 | |
clarkb | mordred: https://8e92ab9dbce64be02399-6e69b28122178bb298c7327890cc2dcb.ssl.cf2.rackcdn.com/679132/5/check/build-openstack-releasenotes/97952ec/docs/ if that works for you then I think the fix is now in production and working there | 22:45 |
clarkb | I've confirmed it only has a gzip encoding | 22:45 |
clarkb | also if others want to review https://review.opendev.org/#/c/688031/2 and https://review.opendev.org/688218 that might help ianw get to a position where he can roll that bit of fixes out on his monday | 22:46 |
clarkb | and finally we think https://review.opendev.org/#/c/688157/ may fix a bug in nodepool detecting deleted instances in nova | 22:46 |
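The change title suggests that nova can return a server record whose status is `DELETED` rather than returning no record at all. In that case, a deletion check that only looks for "not found" would keep waiting. The following is a minimal, hypothetical helper illustrating the idea. It is not nodepool's actual code, and it assumes `server` is a dict-like record or `None`.

```python
def server_is_gone(server):
    # Hypothetical helper: `server` is a dict-like record from the compute
    # API, or None when the lookup found nothing (e.g. a 404).
    # Treat an explicit DELETED status the same as a missing server.
    return server is None or server.get("status") == "DELETED"
```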
clarkb | if anyone else can review that that would be great though I likely won't restart launchers until monday because it is just late enough on a friday at this point | 22:47 |
mordred | clarkb: I can confirm that fix continues to work | 22:48 |
mordred | clarkb: so yay! | 22:48 |
*** xek_ has quit IRC | 22:57 | |
fungi | clarkb: should we update 688031 to remove the dead codepath, or propose that in a separate change? i'm leaning toward the former, since as noted the log call is unreachable | 23:03 |
*** yamamoto has joined #openstack-infra | 23:03 | |
fungi | i've got a new patchset for it ready to push, and am otherwise happy to approve that change | 23:06 |
clarkb | fungi: either way should be fine | 23:07 |
openstackgerrit | Jeremy Stanley proposed opendev/glean master: Do not bring up udev assigned interfaces https://review.opendev.org/688031 | 23:07 |
*** yamamoto has quit IRC | 23:07 | |
clarkb | as a sanity check I've confirmed we *should* be handling gzip in addition to deflate in the logstash gearman worker | 23:25 |
clarkb | which means the gzip switch shouldn't pose any problems for elastic-recheck | 23:25 |
clarkb | and ya the job-output.txt for the job that mordred just checked above is in logstash | 23:26 |
clarkb | implying it handled the gzip encoding just fine | 23:26 |
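A worker that must accept both codings, without trusting the response headers, can let zlib sniff the stream header itself. Passing `wbits=32 + zlib.MAX_WBITS` tells zlib to accept either a zlib or a gzip header. The sketch below falls back to the raw bytes for uncompressed input. It is an illustration, not the logstash gearman worker's actual implementation.

```python
import zlib

def decompress_log(data: bytes) -> bytes:
    # 32 + MAX_WBITS: auto-detect zlib ("deflate") or gzip headers.
    try:
        return zlib.decompress(data, 32 + zlib.MAX_WBITS)
    except zlib.error:
        # No recognisable compression header: treat as plain text.
        return data
```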
*** michael-beaver has quit IRC | 23:38 | |
*** yamamoto has joined #openstack-infra | 23:41 | |
*** yamamoto has quit IRC | 23:46 | |
*** jamesmcarthur has quit IRC | 23:53 | |
*** jamesmcarthur has joined #openstack-infra | 23:57 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!