*** wolverineav has quit IRC | 00:09 | |
*** wolverineav has joined #openstack-infra | 00:09 | |
*** slaweq has joined #openstack-infra | 00:11 | |
*** wolverineav has quit IRC | 00:14 | |
*** slaweq has quit IRC | 00:16 | |
*** woojay has quit IRC | 00:25 | |
*** sthussey has quit IRC | 00:34 | |
*** dtroyer has quit IRC | 00:34 | |
*** ssbarnea|rover has quit IRC | 00:34 | |
*** wolverineav has joined #openstack-infra | 00:34 | |
*** dtroyer has joined #openstack-infra | 00:35 | |
*** dtroyer has quit IRC | 00:36 | |
*** wolverineav has quit IRC | 00:36 | |
*** dtroyer has joined #openstack-infra | 00:37 | |
*** wolverineav has joined #openstack-infra | 00:38 | |
*** wolverineav has quit IRC | 00:38 | |
*** wolverineav has joined #openstack-infra | 00:38 | |
*** Swami has quit IRC | 00:49 | |
*** gyee has quit IRC | 00:58 | |
*** armax has joined #openstack-infra | 01:00 | |
*** yamamoto has quit IRC | 01:03 | |
*** slaweq has joined #openstack-infra | 01:10 | |
*** slaweq has quit IRC | 01:15 | |
*** psachin has joined #openstack-infra | 01:30 | |
zxiiro | Anyone else seeing "ImportError: cannot import name decorate" when using openstack client? | 01:34 |
zxiiro | I think dogpile.cache released a new version yesterday that's breaking. | 01:34 |
clarkb | zxiiro: I think the dogpile thing is a known issue but unaware of fix | 01:35 |
zxiiro | pinning it to 0.6.8 seems to help my build job at least. | 01:39 |
clarkb | kmalloc: Shrews might be worth email to the discuss list? | 01:43 |
*** d0ugal has quit IRC | 01:56 | |
*** mrsoul has quit IRC | 02:07 | |
*** d0ugal has joined #openstack-infra | 02:11 | |
*** rfolco has quit IRC | 02:30 | |
*** armax has quit IRC | 02:31 | |
*** dave-mccowan has joined #openstack-infra | 02:38 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add change status page https://review.openstack.org/599472 | 02:50 |
*** bhavikdbavishi has joined #openstack-infra | 02:51 | |
*** yamamoto has joined #openstack-infra | 03:02 | |
*** wolverineav has quit IRC | 03:03 | |
*** wolverineav has joined #openstack-infra | 03:04 | |
*** armax has joined #openstack-infra | 03:05 | |
*** wolverineav has quit IRC | 03:08 | |
*** slaweq has joined #openstack-infra | 03:11 | |
*** hongbin has joined #openstack-infra | 03:14 | |
*** apetrich has quit IRC | 03:15 | |
*** hongbin has quit IRC | 03:15 | |
*** slaweq has quit IRC | 03:15 | |
*** hongbin has joined #openstack-infra | 03:16 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add change status page https://review.openstack.org/599472 | 03:18 |
kmalloc | clarkb: i also think that openstacksdk is doing the wrong thing here | 03:22 |
kmalloc | clarkb: trying to figure out why we have written a wrapper that is insisting on passing a bound method into a decorator instead of just wrapping the methods like you normally would. | 03:23 |
kmalloc | zxiiro: set to <0.7.0 for now | 03:24 |
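The workaround zxiiro and kmalloc settle on is an upper-bound pin. A minimal stdlib sketch of the version logic behind it, using tuple comparison as a stand-in for real PEP 440 ordering (the version numbers are the ones from the conversation):

```python
# Sketch of the pin's logic: 0.6.8 is the known-good dogpile.cache release,
# 0.7.0 is where the ImportError appeared, so "<0.7.0" keeps the working
# series. Tuple comparison is a simplification of real PEP 440 parsing.
def parse(version):
    return tuple(int(part) for part in version.split("."))

upper_bound = parse("0.7.0")

assert parse("0.6.8") < upper_bound       # the pin still admits 0.6.8
assert parse("0.7.0") >= upper_bound      # and rejects the broken release
print("dogpile.cache<0.7.0 keeps the job on the known-good series")
```

In a requirements or constraints file the same pin would read `dogpile.cache<0.7.0`.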
kmalloc | clarkb: i'll write an email to the ML tomorrow/later tonight if someone else doesn't get to it first. | 03:24 |
*** lathiat has quit IRC | 03:36 | |
*** lathiat has joined #openstack-infra | 03:36 | |
*** yamamoto has quit IRC | 03:39 | |
ianw | kmalloc: i just dropped a mail now that we have everything lined up | 03:43 |
kmalloc | Thanks! | 03:49 |
*** ramishra has joined #openstack-infra | 03:49 | |
kmalloc | I think I have a fix for SDK, just need to poke at it a bit tomorrow. | 03:49 |
kmalloc | Should be straightforward actually | 03:49 |
*** lbragstad has joined #openstack-infra | 03:50 | |
*** lbragstad has quit IRC | 03:51 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add projects page https://review.openstack.org/604266 | 03:55 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: refactor change page to use a reducer https://review.openstack.org/625145 | 03:55 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: refactor projects page to use a reducer https://review.openstack.org/625146 | 03:55 |
*** dave-mccowan has quit IRC | 04:03 | |
*** udesale has joined #openstack-infra | 04:10 | |
*** psachin has quit IRC | 04:10 | |
*** slaweq has joined #openstack-infra | 04:11 | |
*** lbragstad has joined #openstack-infra | 04:13 | |
*** armax has quit IRC | 04:14 | |
*** slaweq has quit IRC | 04:16 | |
*** ykarel|away has joined #openstack-infra | 04:24 | |
*** psachin has joined #openstack-infra | 04:28 | |
*** hongbin has quit IRC | 04:34 | |
*** woojay has joined #openstack-infra | 04:41 | |
*** jamesmcarthur has joined #openstack-infra | 04:45 | |
*** jamesmcarthur has quit IRC | 04:49 | |
*** bhavikdbavishi has quit IRC | 04:50 | |
*** bhavikdbavishi has joined #openstack-infra | 04:51 | |
*** _alastor_ has quit IRC | 04:54 | |
*** yamamoto has joined #openstack-infra | 04:55 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool master: Implement an OpenShift resource provider https://review.openstack.org/570667 | 04:55 |
*** _alastor_ has joined #openstack-infra | 05:02 | |
*** wolverineav has joined #openstack-infra | 05:08 | |
*** slaweq has joined #openstack-infra | 05:11 | |
*** slaweq has quit IRC | 05:16 | |
*** lucasagomes has quit IRC | 05:17 | |
*** agopi has quit IRC | 05:17 | |
*** agopi has joined #openstack-infra | 05:25 | |
*** _alastor_ has quit IRC | 05:34 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: update status page layout based on screen size https://review.openstack.org/622010 | 05:37 |
*** dklyle has joined #openstack-infra | 05:38 | |
*** yamamoto has quit IRC | 05:40 | |
*** gecong has joined #openstack-infra | 05:44 | |
*** gecong has quit IRC | 05:50 | |
*** wolverineav has quit IRC | 05:51 | |
spsurya | whoami-rajat and I discussed this, so we think we can save our infra resources if we optimise and fix https://storyboard.openstack.org/#!/story/2004569 We're also checking how feasible this is; confirmation from the infra team would correct our understanding and approach | 05:58 |
spsurya | Thanks | 05:58 |
*** gengchc has joined #openstack-infra | 05:59 | |
whoami-rajat | fungi clarkb ^ Please provide your valuable inputs on the above query. Thanks! | 06:05 |
gengchc | hello EmilienM! There is a problem in freezer-api and freezer: the Elasticsearch server can't start. Could you please take a look at https://review.openstack.org/#/c/624867/ ? The error message is [pkg/elasticsearch.sh:_check_elasticsearch_ready:53 : die 53 'Maximum timeout reached. Could not connect to ElasticSearch'] | 06:05 |
openstackgerrit | OpenStack Proposal Bot proposed openstack-infra/project-config master: Normalize projects.yaml https://review.openstack.org/625149 | 06:06 |
*** bhavikdbavishi has quit IRC | 06:06 | |
*** ykarel|away is now known as ykarel | 06:11 | |
*** agopi has quit IRC | 06:13 | |
*** lpetrut has joined #openstack-infra | 06:13 | |
*** agopi has joined #openstack-infra | 06:18 | |
*** yamamoto has joined #openstack-infra | 06:20 | |
*** hwoarang has quit IRC | 06:20 | |
*** hwoarang has joined #openstack-infra | 06:21 | |
*** bhavikdbavishi has joined #openstack-infra | 06:24 | |
*** agopi has quit IRC | 06:25 | |
*** jtomasek has quit IRC | 06:52 | |
*** rcernin has quit IRC | 07:03 | |
*** jtomasek has joined #openstack-infra | 07:06 | |
*** quiquell|off is now known as quiquell | 07:11 | |
*** slaweq has joined #openstack-infra | 07:11 | |
*** pcaruana has joined #openstack-infra | 07:12 | |
*** slaweq has quit IRC | 07:16 | |
*** gengchc has quit IRC | 07:21 | |
*** aojea has joined #openstack-infra | 07:24 | |
*** bhavikdbavishi has quit IRC | 07:35 | |
*** pgaxatte has joined #openstack-infra | 07:36 | |
*** dpawlik has joined #openstack-infra | 07:38 | |
*** ssbarnea|rover has joined #openstack-infra | 07:40 | |
*** slaweq has joined #openstack-infra | 07:41 | |
*** slaweq has quit IRC | 07:47 | |
*** yamamoto has quit IRC | 07:48 | |
*** yamamoto has joined #openstack-infra | 07:48 | |
*** psachin has quit IRC | 07:48 | |
*** slaweq has joined #openstack-infra | 07:51 | |
*** ginopc has joined #openstack-infra | 07:56 | |
*** yamamoto has quit IRC | 07:57 | |
*** lpetrut has quit IRC | 07:58 | |
*** rpittau has joined #openstack-infra | 08:06 | |
*** apetrich has joined #openstack-infra | 08:09 | |
*** markvoelker has joined #openstack-infra | 08:16 | |
openstackgerrit | Merged openstack-infra/project-config master: Add 'Review-Priority' for Cinder repos https://review.openstack.org/620664 | 08:24 |
*** imacdonn has quit IRC | 08:24 | |
*** imacdonn has joined #openstack-infra | 08:24 | |
*** dkehn has quit IRC | 08:28 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add change status page https://review.openstack.org/599472 | 08:35 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: refactor change page to use a reducer https://review.openstack.org/625145 | 08:35 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add projects page https://review.openstack.org/604266 | 08:35 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: refactor projects page to use a reducer https://review.openstack.org/625146 | 08:35 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add labels page https://review.openstack.org/604682 | 08:35 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add nodes page https://review.openstack.org/604683 | 08:35 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: refactor build page to use a reducer https://review.openstack.org/624894 | 08:35 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: refactor build page using a container https://review.openstack.org/624895 | 08:35 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add errors from the job-output to the build page https://review.openstack.org/624896 | 08:35 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add jobs graph rendering https://review.openstack.org/537869 | 08:35 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add project page https://review.openstack.org/625177 | 08:35 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: refactor job page to use a reducer https://review.openstack.org/625178 | 08:35 |
*** yamamoto has joined #openstack-infra | 08:35 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: refactor labels page to use a reducer https://review.openstack.org/625179 | 08:35 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: refactor nodes page to use a reducer https://review.openstack.org/625180 | 08:35 |
*** priteau has joined #openstack-infra | 08:38 | |
*** bhavikdbavishi has joined #openstack-infra | 08:42 | |
*** jpena|off is now known as jpena | 08:48 | |
*** tosky has joined #openstack-infra | 08:53 | |
*** shardy has joined #openstack-infra | 09:01 | |
*** Emine has joined #openstack-infra | 09:05 | |
*** gfidente|afk is now known as gfidente | 09:14 | |
*** bhavikdbavishi has quit IRC | 09:17 | |
*** ccamacho has joined #openstack-infra | 09:20 | |
*** yamamoto has quit IRC | 09:37 | |
*** ykarel is now known as ykarel|lunch | 10:00 | |
*** bhavikdbavishi has joined #openstack-infra | 10:07 | |
*** pbourke has quit IRC | 10:29 | |
*** pbourke has joined #openstack-infra | 10:31 | |
*** agopi has joined #openstack-infra | 10:31 | |
*** ykarel|lunch is now known as ykarel | 10:32 | |
*** agopi has quit IRC | 10:36 | |
*** markvoelker has quit IRC | 10:36 | |
*** markvoelker has joined #openstack-infra | 10:37 | |
*** e0ne has joined #openstack-infra | 10:40 | |
*** bhavikdbavishi has quit IRC | 10:40 | |
*** markvoelker has quit IRC | 10:41 | |
*** bhavikdbavishi has joined #openstack-infra | 10:46 | |
*** electrofelix has joined #openstack-infra | 10:49 | |
*** rpittau is now known as rpittau|lunch | 11:10 | |
*** bhavikdbavishi has quit IRC | 11:10 | |
*** yamamoto has joined #openstack-infra | 11:15 | |
*** rfolco has joined #openstack-infra | 11:16 | |
*** markvoelker has joined #openstack-infra | 11:16 | |
*** derekh has joined #openstack-infra | 11:17 | |
*** bhavikdbavishi has joined #openstack-infra | 11:28 | |
dulek | Hey, any idea what might be using port 50036 on infra VMs? Or how do I check that? | 11:33 |
dulek | Our kuryr-daemon is unable to bind to it: http://logs.openstack.org/54/623554/6/check/kuryr-kubernetes-tempest-daemon-containerized-octavia-py36/aba6dc8/controller/logs/kubernetes/pod_logs/kube-system-kuryr-cni-ds-lb8xk-kuryr-cni.txt.gz#_2018-12-14_09_02_55_111 | 11:33 |
dulek | It's not 100% of the time, but from time to time the port is taken. | 11:34 |
*** bhavikdbavishi has quit IRC | 11:36 | |
*** gary_perkins has quit IRC | 11:37 | |
*** rossella_s has quit IRC | 11:44 | |
*** rossella_s has joined #openstack-infra | 11:44 | |
*** gary_perkins has joined #openstack-infra | 11:51 | |
*** udesale has quit IRC | 12:11 | |
*** rpittau|lunch is now known as rpittau | 12:13 | |
*** tpsilva has joined #openstack-infra | 12:14 | |
*** bhavikdbavishi has joined #openstack-infra | 12:17 | |
*** pcaruana has quit IRC | 12:21 | |
*** pcaruana has joined #openstack-infra | 12:22 | |
*** rh-jelabarre has joined #openstack-infra | 12:23 | |
*** bobh has quit IRC | 12:24 | |
*** pcaruana is now known as pcaruana|intw| | 12:25 | |
*** yamamoto has quit IRC | 12:28 | |
*** yamamoto has joined #openstack-infra | 12:30 | |
*** yamamoto has quit IRC | 12:30 | |
*** bobh has joined #openstack-infra | 12:30 | |
*** jpena is now known as jpena|lunch | 12:31 | |
openstackgerrit | Jeremy Stanley proposed openstack-infra/zone-opendev.org master: Add address records for lists.opendev.org https://review.openstack.org/625241 | 12:32 |
fungi | clarkb: jhesketh: corvus: ^ corresponding dns addition for lists.opendev.org | 12:33 |
*** _alastor_ has joined #openstack-infra | 12:35 | |
jhesketh | fungi: lgtm | 12:37 |
*** smarcet has joined #openstack-infra | 12:38 | |
*** smarcet has quit IRC | 12:39 | |
*** bobh has quit IRC | 12:41 | |
*** bobh has joined #openstack-infra | 12:44 | |
openstackgerrit | Merged openstack-infra/system-config master: Add lists.opendev.org to Mailman https://review.openstack.org/625096 | 12:53 |
*** bhavikdbavishi has quit IRC | 12:54 | |
*** boden has joined #openstack-infra | 12:55 | |
*** ykarel has quit IRC | 13:00 | |
*** ykarel has joined #openstack-infra | 13:01 | |
fungi | spsurya: whoami-rajat: it's come up many times in the past (as you can expect, lots of people propose this "simple" optimization), but it requires a lot of discussion because of the combination of whether we configure gerrit to clear or preserve verified votes on commit-message-only edits, whether some projects might want ci jobs which lint commit messages, and so on. can't hurt to discuss it again, | 13:02 |
fungi | but there's a lot more nuance to it than it might seem | 13:02 |
*** bhavikdbavishi has joined #openstack-infra | 13:04 | |
*** yamamoto has joined #openstack-infra | 13:05 | |
fungi | dulek: 50036 is well within the default ephemeral port range for the linux kernel (32768-61000) as well as iana's suggested range (49152-65535), so it could be, and most probably is, something random, and maybe a different process each time you hit it | 13:05 |
fungi | dulek: assigning a static listening port above 2^15 is a bad idea | 13:06 |
spsurya | fungi: thanks for update | 13:07 |
*** weshay_pto is now known as weshay | 13:07 | |
*** dave-mccowan has joined #openstack-infra | 13:09 | |
dulek | fungi: Okay, thanks! | 13:11 |
*** trown|outtypewww is now known as trown | 13:11 | |
fungi | dulek: in ci jobs, it's often more effective to use a method which chooses an available ephemeral port and then passes that information along to whatever routines will try connecting to it. this also allows you to have the same fixture start up multiple copies of a listening service without them conflicting over a single port and without having to manually configure individual ports for them | 13:13 |
*** quiquell is now known as quiquell|lunch | 13:13 | |
fungi | i forget the syscall, but you can basically ask the socket to bind to an unspecified ephemeral port and it will get assigned one and return the integer value on success | 13:14 |
fungi | if this is python, socket.socket() and friends probably have a parameter explicitly for this | 13:14 |
*** _alastor_ has quit IRC | 13:14 | |
*** bobh has quit IRC | 13:14 | |
dulek | fungi: That would be doable, but initially we simply used a file socket and stopped due to some issues with the requests lib. Maybe we should revisit that approach. | 13:15 |
*** dave-mccowan has quit IRC | 13:15 | |
fungi | sure, a named pipe/fifo for a unix socket is a useful alternative if you don't actually need it to be a real network connection | 13:15 |
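The unix-socket alternative fungi describes can be sketched in a few lines of stdlib Python; the rendezvous point is a filesystem path rather than a (host, port) pair, so TCP port collisions cannot happen (the socket path below is illustrative):

```python
import os
import socket
import tempfile

# An AF_UNIX stream socket: the server binds a filesystem path, the client
# connects to that same path. No network port is involved at all.
path = os.path.join(tempfile.mkdtemp(), "daemon.sock")
server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(path)
server.listen(1)

client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
client.connect(path)
conn, _ = server.accept()
client.sendall(b"ping")
assert conn.recv(4) == b"ping"   # round-trip over the unix socket works

client.close()
conn.close()
server.close()
os.unlink(path)
```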
*** yamamoto has quit IRC | 13:18 | |
*** yamamoto has joined #openstack-infra | 13:18 | |
*** EmilienM is now known as EvilienM | 13:20 | |
*** bobh has joined #openstack-infra | 13:20 | |
openstackgerrit | Jeremy Stanley proposed openstack-infra/system-config master: Add rust-vmm OpenDev ML https://review.openstack.org/625254 | 13:23 |
*** bobh has quit IRC | 13:25 | |
fungi | jhesketh: clarkb: corvus: ^ and the first mailing list anyone has requested us to host on lists.opendev.org | 13:26 |
*** ykarel is now known as ykarel|afk | 13:29 | |
*** weshay is now known as weshay1-1 | 13:34 | |
*** jpena|lunch is now known as jpena | 13:35 | |
*** bhavikdbavishi has quit IRC | 13:36 | |
*** rlandy has joined #openstack-infra | 13:37 | |
*** derekh has quit IRC | 13:47 | |
jhesketh | +2 | 13:51 |
*** kgiusti has joined #openstack-infra | 13:54 | |
*** mriedem has joined #openstack-infra | 13:56 | |
*** dkehn has joined #openstack-infra | 13:58 | |
Shrews | fungi: dulek: i think binding to port 0 picks a random, available port | 14:00 |
*** ykarel|afk is now known as ykarel | 14:00 | |
Shrews | iirc, we do that in nodepool tests a lot | 14:01 |
*** pcaruana|intw| has quit IRC | 14:05 | |
*** weshay1-1 is now known as weshay | 14:05 | |
fungi | oh, yep, that's the way | 14:06 |
fungi | for some reason i always forget port 0 is magic | 14:06 |
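The port-0 trick Shrews and fungi land on looks like this in stdlib Python: bind to port 0, let the kernel pick a free ephemeral port, then read the assignment back with `getsockname()` so it can be handed to whatever needs to connect (the loopback address is arbitrary):

```python
import socket

# Binding to port 0 asks the kernel for any free ephemeral port;
# getsockname() then reveals which port was actually assigned.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(("127.0.0.1", 0))
sock.listen(1)
port = sock.getsockname()[1]
print("listening on kernel-assigned ephemeral port", port)
sock.close()
```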
*** Emine has quit IRC | 14:16 | |
*** jamesmcarthur has joined #openstack-infra | 14:18 | |
*** derekh has joined #openstack-infra | 14:24 | |
*** bobh has joined #openstack-infra | 14:26 | |
*** udesale has joined #openstack-infra | 14:27 | |
*** dave-mccowan has joined #openstack-infra | 14:31 | |
*** pcaruana has joined #openstack-infra | 14:31 | |
*** bobh has quit IRC | 14:32 | |
*** bobh has joined #openstack-infra | 14:37 | |
*** dave-mccowan has quit IRC | 14:43 | |
dhellmann | gerrit-admin: could someone add me to the git-os-job-core and git-os-job-release groups, please, so I can complete the migration? https://review.openstack.org/#/admin/groups/1988,members and https://review.openstack.org/#/admin/groups/1989,members | 14:47 |
*** panda|off is now known as panda | 14:50 | |
*** psachin has joined #openstack-infra | 14:50 | |
dansmith | Shrews: I'm finger | grep'ing right now.. nifty trick | 14:51 |
*** Adri2000 has quit IRC | 14:51 | |
frickler | dhellmann: done | 14:51 |
openstackgerrit | Doug Hellmann proposed openstack-infra/project-config master: add release jobs for git-os-job https://review.openstack.org/625273 | 14:52 |
dhellmann | frickler : thanks! | 14:52 |
fungi | dhellmann: also if you don't feel like using the gerrit groups search form (i find it cumbersome) you can use urls like https://review.openstack.org/#/admin/groups/git-os-job-core | 14:53 |
dhellmann | oh, that's handy | 14:53 |
dhellmann | I couldn't remember the name of the group in this case, so I started from the git repo details page | 14:53 |
dhellmann | but in future... | 14:53 |
fungi | ahh, yep | 14:54 |
*** jamesmcarthur has quit IRC | 14:54 | |
fungi | i think i stumbled on that entirely by accident, so not sure where/whether it's actually documented | 14:54 |
*** Adri2000 has joined #openstack-infra | 14:55 | |
openstackgerrit | Hervé Beraud proposed openstack-dev/pbr master: Allow git-tags to be SemVer compliant https://review.openstack.org/618569 | 14:57 |
ssbarnea|rover | is there a way for zuul to finish a job with a WARNING result? I think I've seen some nice orange WARNING results in gerrit somewhere, but I'm not sure where. | 14:58 |
Shrews | dansmith: ++ | 14:58 |
openstackgerrit | Hervé Beraud proposed openstack-dev/pbr master: Allow git-tags to be SemVer compliant https://review.openstack.org/618569 | 14:59 |
*** jamesmcarthur has joined #openstack-infra | 15:00 | |
*** armstrong has joined #openstack-infra | 15:01 | |
*** zul has joined #openstack-infra | 15:04 | |
fungi | ssbarnea|rover: these are the job statuses documented as provided by zuul: https://zuul-ci.org/docs/zuul/user/jobs.html#build-status | 15:05 |
*** markvoelker has quit IRC | 15:06 | |
*** smarcet has joined #openstack-infra | 15:09 | |
*** quiquell|lunch is now known as quiquell | 15:12 | |
*** psachin has quit IRC | 15:21 | |
*** dpawlik has quit IRC | 15:24 | |
openstackgerrit | sebastian marcet proposed openstack-infra/puppet-openstackid master: Updated script to support PHP7 https://review.openstack.org/624957 | 15:28 |
*** psachin has joined #openstack-infra | 15:29 | |
*** jamesmcarthur has quit IRC | 15:31 | |
boden | hi, has anyone else reported a ContextualVersionConflict error cropping up in the last day or so that appears to be related to eventlet?? ex: http://logs.openstack.org/05/624205/2/check/vmware-tox-lower-constraints/8c25e0b/tox/lower-constraints-2.log | 15:31 |
*** jamesmcarthur has joined #openstack-infra | 15:31 | |
boden | I can't seem to figure out what's changed | 15:31 |
fungi | you've compared that install log with a previous passing run? | 15:33 |
fungi | looks like it's getting eventlet 0.24.1 (maybe via oslo.service?) when the constraint requests <0.21.0 | 15:34 |
*** jamesmcarthur has quit IRC | 15:36 | |
boden | fungi yeah I'm just trying to understand why/how... it just started breaking in the last 24hrs or so and I don't see any changes in requirements that would've affected it | 15:38 |
fungi | Collecting eventlet==0.24.1 (from -c /home/zuul/src/git.openstack.org/openstack/vmware-nsx/lower-constraints.txt (line 22)) | 15:39 |
fungi | http://logs.openstack.org/05/624205/2/check/vmware-tox-lower-constraints/8c25e0b/tox/lower-constraints-1.log | 15:39 |
openstackgerrit | Merged openstack-infra/storyboard master: Change openstack-dev to openstack-discuss https://review.openstack.org/622377 | 15:40 |
fungi | boden: https://git.openstack.org/cgit/openstack/vmware-nsx/tree/lower-constraints.txt#n22 | 15:40 |
boden | fungi yes, but lower constraints haven't changed there recently... so why starting to fail now | 15:40 |
fungi | i'm looking for the <0.21.0 | 15:41 |
*** jamesmcarthur has joined #openstack-infra | 15:42 | |
fungi | resorting to http://codesearch.openstack.org/?q=eventlet.*<0.21.0 since my hunches based on the error message didn't pan out | 15:43 |
fungi | none of those seem relevant either | 15:44 |
fungi | oh, i should check the tagged versions | 15:47 |
openstackgerrit | Sean McGinnis proposed openstack-infra/irc-meetings master: Switch release team meeting to Thursday 1600 https://review.openstack.org/625290 | 15:47 |
openstackgerrit | Doug Hellmann proposed openstack-infra/project-config master: import openstack-summit-counter repository https://review.openstack.org/625292 | 15:48 |
*** dpawlik has joined #openstack-infra | 15:48 | |
*** pgaxatte has quit IRC | 15:48 | |
fungi | boden: Collecting oslo.service==1.24.0 (from -c /home/zuul/src/git.openstack.org/openstack/vmware-nsx/lower-constraints.txt (line 80)) | 15:50 |
fungi | boden: so it's https://git.openstack.org/cgit/openstack/oslo.service/tree/requirements.txt?h=1.24.1#n6 | 15:50 |
fungi | er, https://git.openstack.org/cgit/openstack/oslo.service/tree/requirements.txt?h=1.24.0#n6 rather | 15:51 |
fungi | so your lower constraint for eventlet is set higher than what your lower constraint for oslo.service supports as its maximum eventlet version | 15:52 |
*** gfidente has quit IRC | 15:52 | |
fungi | that's the reason for the error | 15:52 |
fungi | now as to why it only just started happening, this will require more digging | 15:52 |
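The diagnosis above can be condensed into a small stdlib sketch. The `conflicts` helper is hypothetical, and tuple comparison stands in for real PEP 440 specifiers; the pins are the ones fungi quotes from vmware-nsx's lower-constraints.txt and oslo.service 1.24.0's requirements.txt:

```python
# Hypothetical conflict check: given a project's pinned versions and the
# upper bounds its pinned dependencies declare, flag any pin that violates
# a "<bound" requirement. Real resolvers do this with PEP 440 specifiers.
def parse(version):
    return tuple(int(part) for part in version.split("."))

def conflicts(pinned, upper_bounds):
    """Return (package, pin, origin, bound) tuples where pin >= '<bound'."""
    found = []
    for pkg, pin in pinned.items():
        for origin, bound in upper_bounds.get(pkg, []):
            if parse(pin) >= parse(bound):
                found.append((pkg, pin, origin, bound))
    return found

pins = {"eventlet": "0.24.1"}   # from vmware-nsx lower-constraints.txt
bounds = {"eventlet": [("oslo.service==1.24.0", "0.21.0")]}  # its requirements

for pkg, pin, origin, bound in conflicts(pins, bounds):
    print("%s==%s conflicts with %s (which needs <%s)" % (pkg, pin, origin, bound))
```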
*** adriancz has quit IRC | 15:52 | |
*** dpawlik has quit IRC | 15:52 | |
boden | fungi yeah I don't understand why it just cropped up... I'll have to dig more as to how we can resolve it | 15:53 |
*** armax has joined #openstack-infra | 15:56 | |
openstackgerrit | sebastian marcet proposed openstack-infra/puppet-openstackid master: Updated script to support PHP7 https://review.openstack.org/624957 | 15:57 |
*** bhavikdbavishi has joined #openstack-infra | 15:59 | |
dansmith | clarkb: so I've been trying to poke the lvm timeout stuff with a stick | 16:00 |
dansmith | clarkb: we tried serializing all lvm ops, which didn't seem to help | 16:00 |
dansmith | clarkb: I'm also wondering if having a couple few 24G loop devices is causing us to do some really long buffer flushes | 16:01 |
dansmith | clarkb: I dunno how much you know about how that works, but heavy writes to a loop device can OOM the system and the overhead is generally quite high, so I'm wondering if lvm ops occasionally cause a bunch of data to be flushed out and takes a really long time | 16:01 |
dansmith | clarkb: >= bionic has direct-io support for loop, which should help if that's the case.. making loop devices behave more like real block devices. so I have a patch up for devstack to enable that when available | 16:02 |
openstackgerrit | Merged openstack-dev/hacking master: Change openstack-dev to openstack-discuss https://review.openstack.org/622317 | 16:02 |
fungi | boden: https://review.openstack.org/605834 is when the eventlet lower constraint got bumped in vmware-nsx. that merged on october 4 | 16:04 |
fungi | the oslo.service lower constraint has remained unchanged since the job was added | 16:04 |
fungi | boden: this really hasn't been failing all the way back to october 4? | 16:04 |
boden | fungi: https://review.openstack.org/#/c/623609/ | 16:06 |
boden | see lower-constraints job | 16:06 |
fungi | boden: agreed, http://zuul.openstack.org/builds?project=openstack%2Fvmware-nsx&job_name=vmware-tox-lower-constraints&branch=master shows it succeeded as recently as 15 hours ago | 16:06 |
*** dklyle has quit IRC | 16:07 | |
*** dklyle has joined #openstack-infra | 16:07 | |
*** efried has joined #openstack-infra | 16:08 | |
smarcet | fungi: how are you doing? thanks for your advice on reviews. I'm having one issue with apt::update on xenial: it complains about an entry in /etc/apt/sources.list : deb cdrom | 16:13 |
*** dklyle has quit IRC | 16:13 | |
smarcet | fungi: if i remove that line by hand , the puppet runs ok | 16:13 |
smarcet | fungi: i am testing the puppet on xenial 16.04 LTS server | 16:14 |
smarcet | fungi: error seems to be _("/etc/apt/sources.list contains a cdrom source; not installing. Use 'allowcdrom' to override this failure.") | 16:15 |
*** fuentess has joined #openstack-infra | 16:16 | |
clarkb | dansmith: that is good to know, I can review the devstack change if you like. I think we are largely switched over to bionic for devstasck/tempest testing at this point so we should see a change if the direct io support helps | 16:16 |
ttx | Hi ! With some IRC meetings having moved to team channels, we have a lot more room in the "common" meeting rooms. To the point where openstack-meeting-5 is not used that much and we could easily consolidate to the other 4. Would that be desirable or overkill? | 16:16 |
dansmith | clarkb: https://review.openstack.org/#/c/625269/2 | 16:16 |
dansmith | clarkb: swift uses a loop as well, but via mount -o loop, which doesn't get directio turned on.. | 16:17 |
dansmith | clarkb: I figure if this seems to help I can refactor out some loop utilities and do the loop manually for the swift piece if we decide it's worth it | 16:17 |
ttx | with only 32 lurkers, meeting-5 fails to reach the "lurkers benefit too!" benefit | 16:17 |
ttx | we could also get rid of #openstack-meeting-cp. 31 lurkers, no meeting. | 16:18 |
*** bobh has quit IRC | 16:18 | |
* cmurphy had no idea we had a -5 | 16:19 | |
ttx | cmurphy: well only neutron-upgrades and helm uses it right now. And they could move to another room free at the same time | 16:19 |
*** pcaruana has quit IRC | 16:20 | |
openstackgerrit | Merged openstack-infra/git-review master: test_uploads_with_nondefault_rebase: fix git screen scraping https://review.openstack.org/623096 | 16:21 |
*** ginopc has quit IRC | 16:22 | |
*** quiquell is now known as quiquell|off | 16:23 | |
*** e0ne has quit IRC | 16:24 | |
*** dklyle has joined #openstack-infra | 16:24 | |
clarkb | whoami-rajat: spsurya: One gotcha with that is that projects have chosen in the past to enforce testing against their commit messages beyond simple rules like metadata for depends on | 16:25 |
openstackgerrit | Merged openstack-infra/zone-opendev.org master: Add address records for lists.opendev.org https://review.openstack.org/625241 | 16:25 |
clarkb | whoami-rajat: spsurya implementing a feature like that should likely go in zuul itself and be a per project flag if we want to try it. | 16:26 |
*** jamesmcarthur has quit IRC | 16:26 | |
*** bobh has joined #openstack-infra | 16:26 | |
clarkb | that said, as I have tried to point out elsewhere, the real cost for openstack infra is tied up in a small number of repos and really one extra large project. I am going to continue to push for fixing flaky tests and reducing the impact of those expensive projects over these smaller optimizations | 16:27 |
clarkb | What we get out of reliable testing is not just more efficient use of resources but better software too | 16:27 |
ttx | fungi: opinion on that? (IRC channels ^) | 16:27 |
clarkb | Shrews: yup port 0 will bind to an available high port and the python socket lib lets you ask the socket object for the port number it found | 16:28 |
clarkb | super useful in testing | 16:28 |
fungi | smarcet: you're seeing that error raised by our ci jobs, or on your local system? if the latter, i expect puppet just doesn't think you'll be running on a system installed from cd (or which gets its updates by cd anyway) and instead assumes you'll have removed the cdrom lines from your config already | 16:28 |
smarcet | the later | 16:29 |
smarcet | ok i will remove by hand then and re test | 16:29 |
smarcet | thx u1 | 16:29 |
smarcet | ! | 16:29 |
jbryce | Thanks for setting up the lists.opendev.org pieces. I think this is a simple but neat step toward getting more communities involved | 16:29 |
*** electrofelix has quit IRC | 16:29 | |
clarkb | jbryce: it's on its way. I've +2'd https://review.openstack.org/#/c/625254/1 but not approved it in case fungi would like more non-OSF input first. fungi, you've tended to be cautious on that front in the past; let me know if I should just go ahead and approve or if you want to | 16:30 |
fungi | ttx: i do not object to smashing meeting-5 and meeting-cp into the others if someone wants to reach out to those teams to ask them to consolidate. they likely need time to warn their regular attendees about the channel changes | 16:31 |
ttx | yes of course. Was just wanting to gut-check that was desirable before starting anything | 16:31 |
fungi | clarkb: well, we have jhesketh's blessing at least. but sure, if we can get an additional infra-root reviewer to weigh in i'm all for that as it is our first proposed mailing list on that new domain | 16:32 |
clarkb | fungi: any chance you have a moment to quickly review https://review.openstack.org/#/c/615968/3 and its parents. I think I can likely get through that portion of the stack today (so I can approve them in chunks today and babysit) | 16:34 |
fungi | trying to catch up, but sure i'll get it on my roster | 16:34 |
clarkb | thanks! | 16:35 |
clarkb | dansmith: fwiw my version of losetup says that direct-io=on is the default setting | 16:35 |
clarkb | dansmith: possible we are already enabling it on bionic. /me digs up a bionic manpage | 16:36 |
dansmith | clarkb: mine too, but I floated a test patch to confirm that's a lie | 16:36 |
clarkb | oh neat | 16:36 |
dansmith | clarkb: clarkb https://review.openstack.org/#/c/625268/ | 16:36 |
clarkb | ya bionic manpage says the same, so if it isn't actually set to on as claimed that's a fun bug | 16:36 |
dansmith | see the pastebin in there | 16:36 |
clarkb | certainly seems set to 0 | 16:37 |
dansmith | when I pass =on it goes to 1 | 16:37 |
dansmith | so yeah | 16:37 |
*** gyee has joined #openstack-infra | 16:37 | |
*** jamesmcarthur has joined #openstack-infra | 16:38 | |
*** jamesmcarthur has quit IRC | 16:38 | |
*** jamesmcarthur has joined #openstack-infra | 16:39 | |
*** dklyle has quit IRC | 16:41 | |
*** sthussey has joined #openstack-infra | 16:41 | |
clarkb | frickler: if you are still around https://review.openstack.org/#/c/625269/ is dansmiths change above that may help cinder test reliability | 16:44 |
*** wolverineav has joined #openstack-infra | 16:44 | |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Vendor the RDO repository configuration for installing OVS https://review.openstack.org/624817 | 16:44 |
clarkb | mwhahaha: ^ fyi that should improve reliability of the multinode setup | 16:45 |
mwhahaha | cool, in general it's been really stable lately | 16:45 |
clarkb | (it is worth noting that that was always failing in pre-run so should've been retried, but we should avoid retries as much as possible where we can too) | 16:45 |
clarkb | mwhahaha: ya I think http://status.openstack.org/elastic-recheck/gate.html#1708704 shows a worse picture than what we are seeing on gerrit because those failures will be retried | 16:46 |
clarkb | but cleaning that up and getting it out of the way on e-r will improve resource usage slightly and also fix a bug | 16:46 |
mwhahaha | we've had a few of those in our container update process where we were getting 503s from the mirrors | 16:46 |
clarkb | (and reshuffle e-r with the more important graphs at the top) | 16:46 |
mwhahaha | so it might have been accurate actually | 16:46 |
fungi | yeah, every retry_limit result you see probably means 6x as many jobs got aborted (since some may work on the second or third retry) | 16:47 |
clarkb | hrm oslo.policy crashing stestr subunit streams is a really weird interaction | 16:50 |
clarkb | ah infinite recursion that will do it | 16:50 |
fungi | clarkb: i gather it's likely due to creating massive amounts of stdout/stderr? | 16:51 |
fungi | and yeah, i suppose unbounded recursion could explain that case | 16:51 |
clarkb | fungi: possibly due to infinite recursion. https://review.openstack.org/#/c/625114/4/glance/quota/__init__.py seems to be the fix | 16:51 |
*** jamesmcarthur has quit IRC | 16:51 | |
openstackgerrit | sebastian marcet proposed openstack-infra/puppet-openstackid master: Updated script to support PHP7 https://review.openstack.org/624957 | 16:52 |
*** jamesmcarthur has joined #openstack-infra | 16:53 | |
*** ykarel is now known as ykarel|away | 16:55 | |
*** tosky has quit IRC | 16:56 | |
*** jamesmcarthur has quit IRC | 16:57 | |
*** shardy is now known as shardy_mtg | 16:58 | |
*** jamesmcarthur has joined #openstack-infra | 17:03 | |
*** udesale has quit IRC | 17:07 | |
*** wolverineav has quit IRC | 17:07 | |
openstackgerrit | sebastian marcet proposed openstack-infra/puppet-openstackid master: Updated script to support PHP7 https://review.openstack.org/624957 | 17:08 |
clarkb | ssbarnea|rover: was going to ask if you had any more insight into the possible du issue. I'm mostly curious to know what the cause of that was when you sort it out as it seems like it could be useful knowledge for the future :) | 17:09 |
*** sshnaidm|off has quit IRC | 17:10 | |
*** mriedem is now known as mriedem_lunch | 17:10 | |
ssbarnea|rover | clarkb: sure, I think i almost nailed it. I will add you to the review so you will be able to see it, ok? | 17:10 |
ssbarnea|rover | clarkb: mainly I am still on it! | 17:10 |
clarkb | ssbarnea|rover: thanks | 17:10 |
openstackgerrit | sebastian marcet proposed openstack-infra/puppet-openstackid master: Updated script to support PHP7 https://review.openstack.org/624957 | 17:10 |
clarkb | ssbarnea|rover: it's the sort of bug where knowing what causes zuul to exhibit that behavior is useful as a zuul operator :) | 17:11 |
*** dklyle has joined #openstack-infra | 17:11 | |
*** aojea has quit IRC | 17:11 | |
ssbarnea|rover | clarkb: sadly I am not sure but i suspect i found a workaround, see https://review.openstack.org/#/c/624381/13 | 17:11 |
ssbarnea|rover | using threshold param on du makes it 10x faster, probably because sort ends up doing much less work to sort. | 17:12 |
clarkb | interesting so du does still seem suspect | 17:12 |
clarkb | the threshold flag is probably a reasonable compromise there | 17:12 |
ssbarnea|rover | using timeout solves nothing; even with SIGKILL it does not help. | 17:13 |
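The speedup ssbarnea|rover found can be reproduced with a pipeline like the one below (a sketch; the path is a placeholder and `--threshold` needs GNU coreutils du):

```python
import subprocess

# --threshold=100K drops entries under 100 KiB before sort ever sees them,
# which is where most of the reported 10x speedup comes from: far fewer
# lines for sort (and tail) to process.
cmd = "du --threshold=100K /tmp 2>/dev/null | sort -n | tail -n 200"
result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
for line in result.stdout.splitlines():
    size_kib, path = line.split("\t", 1)  # du separates size and path with a tab
    print(size_kib, path)
```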
dmellado | hey clarkb are there any issues with zuul as of now? I have seen patches passing on the gate queue being stuck for a while and not getting merged.... | 17:13 |
*** e0ne has joined #openstack-infra | 17:13 | |
clarkb | dmellado: I'm not aware of any functional issues with zuul itself | 17:13 |
ssbarnea|rover | there is also another aspect, which could point to a possible bug related to std* redirections or buffering. | 17:14 |
clarkb | dmellado: kuryr-kubernetes is waiting for the top of its queue to pass tests so it can merge. kuryr-kubernetes-tempest-daemon-octavia is still running | 17:14 |
*** ginopc has joined #openstack-infra | 17:14 | |
ssbarnea|rover | if you remember we always had some warnings about closed pipes around du|sort|tail, something I was not able to reproduce outside zuul. | 17:14 |
openstackgerrit | sebastian marcet proposed openstack-infra/puppet-openstackid master: Updated script to support PHP7 https://review.openstack.org/624957 | 17:14 |
dmellado | clarkb: I've seen for example https://review.openstack.org/#/c/623554/ and if you check for openstack/kuryr-kubernetes on zuul.openstack.org some patches stuck on the gate queue for a while | 17:14 |
ssbarnea|rover | maybe these warnings were related to the blocking bug. | 17:15 |
clarkb | dmellado: yes the gate is a queue, the top/head of the queue must pass testing and merge before anything behind it can merge | 17:15 |
clarkb | dmellado: this is what ensures correctness of the resulting code (we remove the race of one change breaking another by merging out of sync) | 17:15 |
dmellado | d'oh | 17:15 |
dmellado | forget it | 17:15 |
dmellado | I had filtering enabled and didn't realize it | 17:15 |
dmellado | lol | 17:15 |
dmellado | I guess it's friday after all | 17:15 |
*** ginopc has quit IRC | 17:16 | |
clarkb | dmellado: no worries. I had a pretty d'oh moment yesterday thinking we had broken requirements | 17:17 |
dmellado | heh, glad that it didn't happen xD | 17:17 |
clarkb | (turns out it was a broken job on unmerged code, the system was working as intended protecting us from the broken :) ) | 17:17 |
dmellado | xD | 17:18 |
*** efried has quit IRC | 17:18 | |
*** rpittau has quit IRC | 17:22 | |
*** sshnaidm|off has joined #openstack-infra | 17:25 | |
*** tobiash has quit IRC | 17:29 | |
clarkb | ssbarnea|rover: thinking about the warnings more, perhaps also related to how zuul does logging? | 17:29 |
clarkb | ssbarnea|rover: could be there is a buffering bug lingering somewhere or similar | 17:30 |
clarkb | (would need more data to debug that likely) | 17:30 |
*** markvoelker has joined #openstack-infra | 17:31 | |
*** agopi has joined #openstack-infra | 17:32 | |
*** markvoelker has quit IRC | 17:35 | |
*** Emine has joined #openstack-infra | 17:36 | |
*** bnemec is now known as beekneemech | 17:37 | |
*** psachin has quit IRC | 17:39 | |
clarkb | mriedem_lunch: http://logs.openstack.org/74/624974/1/gate/tempest-slow/4ac5ef0/job-output.txt.gz#_2018-12-14_15_29_30_910274 timed out and lost a bunch of time in tempest. Any idea of what is going on there? I can dig in more if that is unfamiliar to you | 17:39 |
clarkb | it seems that tempest got incredibly unhappy and it just snowballed from there | 17:39 |
*** tobiash has joined #openstack-infra | 17:40 | |
*** e0ne has quit IRC | 17:41 | |
dansmith | clarkb: almost looks like something fundamental is stuck.. like keystone or apache itself | 17:42 |
*** e0ne has joined #openstack-infra | 17:43 | |
*** rkukura has quit IRC | 17:44 | |
clarkb | dansmith: http://logs.openstack.org/74/624974/1/gate/tempest-slow/4ac5ef0/controller/logs/apache/access_log.txt.gz shows that apache seems to process requests during the lost time. Many of them to placement | 17:44 |
dansmith | yep | 17:44 |
clarkb | There is the occasional identity request (reupping token?) | 17:44 |
dansmith | you know what I mean though, right? if everything after that is just timing out http calls.. | 17:45 |
clarkb | ya | 17:45 |
*** tobiash has quit IRC | 17:45 | |
*** derekh has quit IRC | 17:47 | |
dansmith | clarkb: http://logs.openstack.org/74/624974/1/gate/tempest-slow/4ac5ef0/controller/logs/screen-n-api.txt.gz#_Dec_14_15_02_19_898727 | 17:49 |
dansmith | rabbit goes down at some point it looks like | 17:49 |
*** dklyle has quit IRC | 17:51 | |
dansmith | although I don't really see any evidence in rabbit's log, | 17:51 |
dansmith | so maybe something networking-wise | 17:51 |
*** wolverineav has joined #openstack-infra | 17:52 | |
*** wolverineav has quit IRC | 17:52 | |
*** wolverineav has joined #openstack-infra | 17:52 | |
clarkb | there are a bunch of missed heartbeats but ya other than that rabbit doesn't seem to think something is wrong | 17:52 |
dansmith | we're connecting to the public ip, but that shouldn't really require that the network be up | 17:53 |
dansmith | so it'd have to be something like iptables blocking something, or just extreme scheduling lag or something like that | 17:53 |
dansmith | http://logs.openstack.org/74/624974/1/gate/tempest-slow/4ac5ef0/compute1/logs/screen-n-cpu.txt.gz?level=ERROR | 17:54 |
dansmith | the compute is way unhappy about rabbit | 17:54 |
dansmith | it also got 502 Proxy Error from cinder | 17:55 |
*** tobiash has joined #openstack-infra | 17:55 | |
dansmith | and neutron | 17:55 |
clarkb | dansmith: I've pulled up a render of the dstat csv and I think that explains why this happens | 17:56 |
clarkb | cpu wai > 50% for much of the job | 17:56 |
clarkb | and load skyrockets. Basically not enough cpu to go around | 17:57 |
dansmith | do you have some automated way to do that btw? | 17:57 |
dansmith | clarkb: cpu wai is not iowait right? | 17:57 |
clarkb | cpu wai is when the kernel is busy waiting on io iirc | 17:57 |
clarkb | so it can be iowait related | 17:57 |
clarkb | dansmith: I dump the csv file from the job into https://lamada.eu/dstat-graph/ | 17:57 |
dansmith | okay, iowait does not mean that there's not enough cpu to go around | 17:58 |
dansmith | ooh | 17:58 |
*** armax has quit IRC | 17:58 | |
dansmith | have to download it to drag/drop it I guess? | 17:58 |
clarkb | ya. There is probably a way to hack things in the js behind the scenes to load from http but I'm a browser noob | 17:59 |
*** graphene has joined #openstack-infra | 17:59 | |
*** dklyle has joined #openstack-infra | 18:00 | |
clarkb | that's a good point though. cpus are busy waiting on other things, not things the cpu can do itself | 18:00 |
dansmith | that cpu wai looks like iowait to me, | 18:01 |
dansmith | which would mean we're on a really io constrained node | 18:01 |
clarkb | we don't appear to be swapping either (though that graph doesn't actually render swap usage so I'll need to look more carefully at the raw data) | 18:01 |
dansmith | and io total is low until the end | 18:01 |
openstackgerrit | sebastian marcet proposed openstack-infra/puppet-openstackid master: Updated script to support PHP7 https://review.openstack.org/624957 | 18:01 |
dansmith | but, just because we're not doing any doesn't mean we're not trying | 18:01 |
dansmith | fwiw, this was next on my list of debugging cinder timeouts, rendering out this data to see if we see spikes around the time we hang for a while | 18:02 |
clarkb | ya, it is also curious that it seems to start around when we start tempest | 18:02 |
*** ykarel|away has quit IRC | 18:02 | |
clarkb | er no devstack? /me double checks timestamps | 18:02 |
*** trown is now known as trown|lunch | 18:03 | |
clarkb | http://logs.openstack.org/74/624974/1/gate/tempest-slow/4ac5ef0/job-output.txt.gz#_2018-12-14_14_50_51_551412 tempest correlates strongly | 18:03 |
dansmith | ah yeah, see, | 18:03 |
clarkb | so devstack + services isn't unhappy until it adds workload | 18:04 |
dansmith | the cpu idle value is like 50% most of the time | 18:04 |
dansmith | that means it has nothing to do, but is doing a lot of waiting | 18:04 |
dansmith | if you mouse over the graph you can see the actual cpu idle value, which is hard to see otherwise since it's white on my screen at least | 18:05 |
*** tobiash has quit IRC | 18:05 | |
*** jpena is now known as jpena|off | 18:06 | |
dansmith | disk traffic in MB/s is zero the whole time until the end and then writes spike, | 18:06 |
dansmith | but there are write iops the whole time | 18:06 |
smcginnis | So directio should help prevent that big flush from happening at the end. | 18:07 |
*** tobiash has joined #openstack-infra | 18:07 | |
dansmith | smcginnis: it's possible that that's what it is yeah, although I wouldn't expect the loop to hamstring the whole system this badly | 18:07 |
smcginnis | Yeah, seems like something else is compounding the situation. | 18:08 |
*** shardy_mtg has quit IRC | 18:08 | |
dansmith | there are literally zero read iops over the range I'm looking at, but some write iops all the time | 18:08 |
dansmith | which really sounds like thrashing with very little io bandwidth | 18:08 |
clarkb | what is odd is devstack does a bunch of io too | 18:12 |
dansmith | there is a big spike in writes and write iops early in the run, which is exactly when cinder runs lvchange -ay for the first volume | 18:12 |
clarkb | so we have io available during the start and end of the job. It isn't until we try to use the cloud that we find it unhappy | 18:12 |
dansmith | and load starts climbing right there and never recovers | 18:12 |
dansmith | let me get a screenshot, this is interesting | 18:12 |
clarkb | so could be a combo workload plus a bug or different io demands | 18:12 |
dansmith | https://imgur.com/a/0bvhVEB | 18:13 |
clarkb | (also we did just switch to bionic, possibly this is new bionic behavior and maybe direct-io would help) | 18:13 |
dansmith | right as that disk spike, load goes nuts | 18:13 |
dansmith | that spike is lvchange -ay $first_volume | 18:13 |
ssbarnea|rover | does anyone know a way to dump the SSL certificates when using an https proxy? i need the certs returned by the proxy for debugging purposes. | 18:14 |
clarkb | ssbarnea|rover: openssl s_client | 18:14 |
ssbarnea|rover | clarkb: i know how to use it to get certificate from a normal web server but not to make the request to a proxy | 18:14 |
dansmith | there's a net spike at the same time.. we don't have /opt mounted on nfs or something crazy do we? | 18:15 |
ssbarnea|rover | the HTTPS proxy would generate and sign a SSL cert using its own CA-cert. | 18:15 |
clarkb | ssbarnea|rover: ok so the proxy is a mitm | 18:15 |
fungi | ssbarnea|rover: right, make a request to the proxy and you'll get it | 18:15 |
clarkb | ssbarnea|rover: in that case I think s_client should still work | 18:15 |
clarkb | since that is the cert you see not the one on the backend | 18:16 |
ssbarnea|rover | clarkb: yep... me trying to debug why curl works and python requests (and pip) choke with the same cert bundle. | 18:16 |
clarkb | ssbarnea|rover: probably because python requests uses its own package of CAs to trust | 18:16 |
clarkb | so it isn't using your system set | 18:17 |
fungi | ssbarnea|rover: they don't use the same trust set. python (or requests if python is too old) bundles its own by default | 18:17 |
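The lookup order fungi describes can be sketched roughly like this (a simplified sketch of how requests resolves its trust store when `trust_env` is on; the real logic lives in requests' environment-merging code, and the path below is illustrative):

```python
import os

# requests consults REQUESTS_CA_BUNDLE (then CURL_CA_BUNDLE) before falling
# back to the certifi bundle it ships with, which is why exporting one of
# those variables lets a MITM proxy's CA be trusted without touching certifi.
def effective_ca_bundle(environ=os.environ):
    return (
        environ.get("REQUESTS_CA_BUNDLE")
        or environ.get("CURL_CA_BUNDLE")
        or "certifi bundled defaults"
    )

print(effective_ca_bundle({"REQUESTS_CA_BUNDLE": "/home/user/cacert.pem"}))
```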
ssbarnea|rover | clarkb: yep, i need the proxy cert, but the proxy cert returned when the request was made for that specific URL (cert will vary on each website) | 18:17 |
clarkb | ssbarnea|rover: not if you are mitm'd | 18:18 |
ssbarnea|rover | clarkb: I do have SSL_CERT_FILE=/Users/ssbarnea/cacert.pem and REQUESTS_CA_BUNDLE=/Users/ssbarnea/cacert.pem -- which worked well so far. | 18:18 |
fungi | clarkb: they'll still differ for each site if it's a "transparent" proxy which uses its own ca to generate new certs for those sites on the fly | 18:18 |
ssbarnea|rover | cacert.pem contains the root-CA from the proxy server. proof that it is ok is that both browsers and curl accept use of the proxy. | 18:19 |
clarkb | dansmith: maybe this is a case of getting your direct-io change in. Then looking to see if behavior changes or persists | 18:19 |
dansmith | it will definitely be interesting to see if and what changes yeah | 18:19 |
*** armax has joined #openstack-infra | 18:19 | |
dansmith | clarkb: side note.. we _have_ to automate this dstat graph thing right? :) | 18:19 |
clarkb | dansmith: there was support for it in the stackviz tool but that broke somewhere along the way | 18:20 |
fungi | ssbarnea|rover: pass verify='/path/to/public_key.pem' as a parameter into your requests methods | 18:20 |
clarkb | dansmith: but ya it would be nice t get this back into easy to consume format | 18:20 |
dansmith | yeah | 18:20 |
ssbarnea|rover | yep, my proxy is configured in transparent mode but traffic is not enforced, still need to tell clients to use the proxy. | 18:20 |
ssbarnea|rover | fungi: I did set verify, still fails. | 18:20 |
fungi | ssbarnea|rover: you can also export REQUESTS_CA_BUNDLE | 18:20 |
*** armax has quit IRC | 18:21 | |
ssbarnea|rover | see https://gist.github.com/ssbarnea/3d5067d41abc68c3788f1c9bc0ab4418#file-ssl-request-transparent-proxy-txt-L29 | 18:21 |
ssbarnea|rover | yes they are exported. | 18:21 |
*** armax has joined #openstack-infra | 18:22 | |
dansmith | clarkb: I picked another random tempest-full run from a few days ago and it doesn't have the same signature | 18:23 |
*** wolverineav has quit IRC | 18:23 | |
ssbarnea|rover | i suspect one of two issues: either requests fails to load the entire ca bundle (>250kb in size), or it fails to validate the entire chain because the MITM re-signing generates some intermediary certs if I remember well. probably requests fails to inherit the trust from the CA. | 18:23 |
dansmith | like, we do IO the whole time successfully and much more normally | 18:24 |
openstackgerrit | sebastian marcet proposed openstack-infra/puppet-openstackid master: Updated script to support PHP7 https://review.openstack.org/624957 | 18:24 |
ssbarnea|rover | obviously it makes no sense to export the certs from each visited site; trusting the CA should be enough. | 18:25 |
fungi | ssbarnea|rover: i can't fathom why there would be any intermediary chain required if you're putting the ca's signing cert into the bundle already | 18:25 |
clarkb | if it does end up being an issue we can tie back to ubuntu bionic we may want to see if coreycb is able to help | 18:25 |
*** dklyle has quit IRC | 18:25 | |
fungi | ssbarnea|rover: contents of the ca bundle are basically the end of the line. if those signed something, they're trusted | 18:26 |
ssbarnea|rover | fungi: exactly. i used this proxy to install fedora/centos without any problems once I trusted the CA. Only pip chokes on it, on mac... in fact i have an idea... | 18:26 |
fungi | er, of one of those signed something, the cert with that signature is trusted is what i meant to say | 18:26 |
*** Swami has joined #openstack-infra | 18:26 | |
*** wolverineav has joined #openstack-infra | 18:26 | |
clarkb | fwiw pip has historically had other issues with proxies too | 18:27 |
fungi | ssbarnea|rover: oh! this is pip on a mac? not pypi-installed pip in a ci job? | 18:27 |
ssbarnea|rover | only one issue? python requests is notorious for doing things its own way around SSL. | 18:27 |
fungi | i thought you were trying to work out how to dump and log ssl certs in a ci job. tunnel vision ;) | 18:27 |
Linkid | hi | 18:28 |
*** wolverineav has quit IRC | 18:28 | |
*** wolverineav has joined #openstack-infra | 18:28 | |
Linkid | I have a question about the spec I'm writing | 18:28 |
ssbarnea|rover | fungi: well, my long shot is to see if I can use a HTTPS proxy as a generic proxy, as a simple way to avoid configuring custom mirrors. | 18:28 |
Linkid | I saw that you are using puppet modules to install services | 18:28 |
Linkid | but I saw that there is a spec for using ansible + containers | 18:29 |
clarkb | Linkid: yes, we are now running two services with ansible alone (no puppet), and working out some bugs shown in testing to do containers (so no container based services yet) | 18:30 |
Linkid | so, I'm wondering what is the way I should speak of in the spec for a new service | 18:30 |
clarkb | Linkid: depending on your patience for tools: in general I would likely assume ansible if you are impatient, but you can assume containers if you are more patient | 18:31 |
fungi | ssbarnea|rover: you're sure your ca bundle is in pem format? | 18:31 |
clarkb | (unfortunately we are in the weird spot of figuring out what our migration looks like. I think you should be fine to pick one and go with it, and if we learn stuff that changes your spec we can help you update it) | 18:32 |
*** armax has quit IRC | 18:33 | |
ssbarnea|rover | fungi: if it were not, curl would choke because I defined SSL_CERT_FILE to point to the same file. | 18:33 |
openstackgerrit | Clark Boylan proposed openstack-infra/system-config master: Set iptables forward drop by default https://review.openstack.org/624501 | 18:33 |
openstackgerrit | Clark Boylan proposed openstack-infra/system-config master: Collect syslogs from nodes in ansible tests https://review.openstack.org/624827 | 18:33 |
openstackgerrit | Clark Boylan proposed openstack-infra/system-config master: Import install-docker role https://review.openstack.org/605585 | 18:33 |
fungi | ssbarnea|rover: is it possible curl supports SSL_CERT_FILE in other formats? rather than a bare cert | 18:34 |
fungi | er, like a bare cert | 18:34 |
clarkb | infra-root ^ I've removed the ipv6 setting from that change as we intend on using the system network namespace anyway. I expect that update will get the change into a mergeable state. Now why did it just push all three changes when I only updated the last one | 18:34 |
clarkb | OH! | 18:34 |
fungi | ssbarnea|rover: like could it be in der format maybe? | 18:34 |
ssbarnea|rover | fungi: I don't know but I will narrow it down, probably tomorrow as I am already getting tired. | 18:34 |
ssbarnea|rover | looks like PEM to me. | 18:34 |
clarkb | fungi: stephenfin ssbarnea|rover ^ this points at a git-review bug | 18:34 |
clarkb | I ran git-review in a dir that doesn't exist on master so it failed with Errors running git reset --hard 3496292b845d33be6c5649195a54ccbf76494050 | 18:35 |
openstackgerrit | sebastian marcet proposed openstack-infra/puppet-openstackid master: Updated script to support PHP7 https://review.openstack.org/624957 | 18:35 |
clarkb | I think it must've done the rebase by that point | 18:35 |
clarkb | and that error prevented it from undoing the rebase so when I pushed it pushed up the rebase too | 18:35 |
fungi | ssbarnea|rover: another workaround seems to be setting cert=/path/to/ca.crt in a [global] section within pip.conf | 18:35 |
clarkb | thankfully the diffs come out cleanly | 18:36 |
ssbarnea|rover | fungi: it does not help as it does the same thing as setting the variable, already did. the only way I was able to make it work was to set verify=False which is not really an option | 18:36 |
ssbarnea|rover | anyway, i will find a solution. | 18:37 |
fungi | clarkb: i expect we don't get enough testing of running git-review from random subdirs of a repo. maybe we should make sure it performs all its actions under a chdir | 18:37 |
clarkb | fungi: ya repo root should be more predictable | 18:37 |
fungi | ssbarnea|rover: this sounds like you should be filing a bug report against pip/requests or asking for help in their irc channels instead | 18:38 |
jrosser | ssbarnea|rover: i have a similar situation and add a custom CA to the system CA store | 18:38 |
jrosser | then using the requests env var to point to that and it is all good | 18:38 |
fungi | jrosser: that's working for pip install in particular? | 18:38 |
*** smarcet has quit IRC | 18:38 | |
fungi | or just requests-based python software in general? | 18:38 |
jrosser | i get an entire openstack-ansible deploy done in an environment like that | 18:39 |
jrosser | the big gotcha is that by default requests is setup to use certifi on ubuntu, but you can't add extra custom CA to that | 18:39 |
*** graphene has quit IRC | 18:39 | |
jrosser | so the env var is needed to point it at ca-certificates stuff instead | 18:40 |
jrosser | so in principle if all the certs are good then it should work | 18:41 |
dansmith | clarkb: does anyone try to do correlation between e-r failures and providers? | 18:42 |
dansmith | clarkb: it would be interesting to know if the cinder-related failures are almost always on one provider | 18:42 |
clarkb | dansmith: I had tried some of it with the ovh bhs1 slowness we saw | 18:42 |
clarkb | dansmith: there were ~6 bugs that went away when we turned off bhs1 the first time | 18:43 |
ssbarnea|rover | there is a patch for e-r that should add metadata about this, i think. | 18:43 |
dansmith | clarkb: nice | 18:43 |
ssbarnea|rover | it should provide info while you hover over the graph, once we merge it. | 18:43 |
fungi | dansmith: yes, fairly easy to do by following the logstash query link from the graphs page and then adding the node_provider column | 18:43 |
clarkb | dansmith: I haven't followed up since we turned off bhs1 again (and is still off) | 18:43 |
clarkb | at least in that specific case amorin found a memory leak on their end that would need fixing and we'll likely do artificial load testing with devstack and tempest outside of zuul before adding it back to the pool | 18:44 |
dansmith | fungi: ah thanks | 18:44 |
fungi | dansmith: usually if there is a strong correlation with node_provider in the results you don't really need to do much statistical analysis | 18:44 |
dansmith | fungi: yeah | 18:44 |
fungi | like, if it's a provider-specific issue 9 out of 10 results will be that one provider | 18:44 |
clarkb | dansmith: in this particular case it happened on inap which we weren't previously watching for io issues | 18:44 |
fungi | at least on the ones i've investigated in the past | 18:45 |
dansmith | yup I just didn't know I could click one box and get that in front of my face | 18:45 |
*** chandan_kumar has quit IRC | 18:46 | |
*** dklyle has joined #openstack-infra | 18:46 | |
fungi | on the failures related to job timeouts it's been a little more subtle, so i've resorted to turning off all columns except node provider, pasting the resulting list into a file and running it through sort|uniq -c | 18:46 |
fungi | using the little gear to the top-right of the results list to crank up the number of results per page also helps for that case | 18:47 |
fungi | so that you don't have to stich together multiple pages of results | 18:47 |
fungi | er, stitch | 18:47 |
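fungi's sort|uniq step amounts to a frequency count; the same tally in Python (the provider names below are a made-up sample, not real logstash results):

```python
from collections import Counter

# Equivalent of pasting the node_provider column into `sort | uniq -c`:
# count how often each provider shows up among the failed runs.
providers = [
    "inap-mtl01", "ovh-bhs1", "ovh-bhs1", "rax-dfw", "ovh-bhs1",
]  # hypothetical sample from the results column
for provider, count in Counter(providers).most_common():
    print(count, provider)
```

A strong skew toward one provider in this tally is the correlation signal discussed above.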
dansmith | yeah | 18:48 |
*** mriedem_lunch is now known as mriedem | 18:49 | |
Linkid | clarkb: ok, thanks :) | 18:49 |
clarkb | (thinking out loud with little hard evidence here) if we do find that we have general IO issues across clouds we may have to consider it is not cloud specific but potentially something in how nova runs compute or issues in the linux kernel. I think most of our clouds use local storage for our instances. The exception being vexxhost in sjc1 which is boot from ceph backed volumes | 18:50 |
mriedem | clarkb: nope don't think i've seen that | 18:50 |
ssbarnea|rover | clarkb: fungi dansmith : it may be worth checking https://review.openstack.org/#/c/260188/10 about e-r - i only rebased it to make it pass ci and tested CLI output. i didn't have time to test the graphs. | 18:50 |
*** mriedem has quit IRC | 18:52 | |
*** dklyle has quit IRC | 18:54 | |
ssbarnea|rover | clarkb: apparently my du patch seems to work well, but I will do two more rechecks on it to be sure it's not just luck. | 18:54 |
*** mriedem has joined #openstack-infra | 18:56 | |
fungi | ssbarnea|rover: what was the workaround there? | 18:58 |
ssbarnea|rover | fungi: --threshold=100K combined with a nohup ... & -- just to be sure. | 18:58 |
fungi | ssbarnea|rover: oh, you weren't using --summarize | 18:59 |
fungi | hence, tons of stdout | 19:00 |
ssbarnea|rover | likely i decided that i don't want to wait for du to finish. the irony is that the threshold already made it run in 0s. | 19:00 |
ssbarnea|rover | but the background part should be a safety measure: if it ever hangs again, the job will miss the report, but it will be a success. | 19:01 |
clarkb | as long as you still get a total so you can see if things move on the long tail I think it's fine | 19:01 |
ssbarnea|rover | this is why i want to have 2-3 rechecks to see if I spot one such job. | 19:01 |
*** fuentess has quit IRC | 19:02 | |
ssbarnea|rover | --threshold=100K was a huge improvement on my own machine: it reduces the report from 15s to 1s. | 19:02 |
*** jamesmcarthur has quit IRC | 19:02 | |
ssbarnea|rover | (not same data as build but clearly less data goes to sort, not like 1M files to sort just to keep top 200) | 19:03 |
clarkb | fungi: I promise to buy you all the beers in portland next month if you can review https://review.openstack.org/#/c/615968/ and children :) (I'm impatient but the turnaround time on those puppet related changes isn't the quickest) | 19:09 |
fungi | clarkb: i have them up in gertty, almost there now | 19:09 |
clarkb | yay | 19:09 |
fungi | also, i don't think i could drink all the beer in portland any more than i could eat all the rice in china | 19:10 |
clarkb | the trick is to be strategic about the beers you do drink then call it good | 19:10 |
fungi | sound logic | 19:10 |
fungi | i just now approved the one for logstash.openstack.org | 19:11 |
*** dklyle has joined #openstack-infra | 19:11 | |
fungi | do you want me to only +2 the others so you can approve at your preferred pace? | 19:11 |
clarkb | fungi: ya that would be good | 19:11 |
fungi | don't want to flood you with too many at once | 19:11 |
clarkb | I'll do subunit workers and elasticsearch with the logstash one as well. Then make sure those are happy before being a bit more cautious with the git servers | 19:12 |
*** e0ne has quit IRC | 19:13 | |
fungi | there are so many of these chained together that the threading in gertty's change display doesn't even give me the first letter of title on the last dozen or so | 19:13 |
fungi | er, changes list display i mean | 19:13 |
clarkb | hrm your reviews prompted me to hop on nodes and double check things and lists.o.o and wiki-dev don't have parser = future in their puppet.conf | 19:14 |
clarkb | so I don't think this is actually working as expected. | 19:14 |
clarkb | I think I won't approve any additional ones and instead try to figure out why they aren't futureparsing | 19:14 |
fungi | oh, are we failing to set it? | 19:14 |
*** bhavikdbavishi has quit IRC | 19:15 | |
*** bhavikdbavishi has joined #openstack-infra | 19:15 | |
openstackgerrit | Doug Hellmann proposed openstack-infra/project-config master: import openstack-summit-counter repository https://review.openstack.org/625292 | 19:16 |
clarkb | fungi: it looks that way though not obvious to me why that is the case | 19:16 |
fungi | what were some of the earlier ones we tried to turn on? | 19:18 |
clarkb | `ansible --list-hosts futureparser` shows wiki-dev01.o.o is in the group | 19:18 |
clarkb | fungi: eavesdrop is one that works | 19:19 |
clarkb | it was the host before kata containers lists | 19:19 |
clarkb | I haven't checked kata containers lists though | 19:19 |
*** armax has joined #openstack-infra | 19:19 | |
clarkb | fungi: neither lists server shows up in the --list-hosts output from above | 19:20 |
*** bhavikdbavishi has quit IRC | 19:20 | |
clarkb | I think there are at least two issues. The first is lists.* don't show up in the futureparser group at all. The second is hosts like wiki-dev01.o.o which is in the group not getting it. Oh that is because puppet is disabled on wiki-dev01 maybe? | 19:21 |
*** dklyle has quit IRC | 19:21 | |
clarkb | ya not seeing any puppeting happen there | 19:21 |
clarkb | in that case I think we wait for logstash to happen and see if it works properly as expected. Then figure out the lists server group membership issue | 19:21 |
clarkb | fungi: I think its a localized issue to the list servers now. Likely the glob is wrong for them | 19:22 |
clarkb | logstash.o.o should confirm | 19:22 |
clarkb | ya I see the problem now [0-9]* in glob means something different than in regex | 19:23 |
clarkb | in regex it means match zero or more digits, in glob it means always match one digit then match anything | 19:23 |
* clarkb scribbles a note to come back around and address that when we can watch the lists | 19:23 | |
*** wolverineav has quit IRC | 19:25 | |
*** wolverineav has joined #openstack-infra | 19:25 | |
fungi | aha | 19:26 |
fungi | yes indeedie | 19:26 |
*** Emine has quit IRC | 19:26 | |
fungi | i don't think there's a "zero or more" operator in shell globbing | 19:27 |
*** gagehugo has quit IRC | 19:28 | |
fungi | well, except for an any match at least | 19:29 |
*** trown|lunch is now known as trown | 19:29 | |
*** wolverineav has quit IRC | 19:30 | |
openstackgerrit | Clark Boylan proposed openstack-infra/system-config master: Import install-docker role https://review.openstack.org/605585 | 19:30 |
clarkb | this time git-review only pushed the one change | 19:31 |
clarkb | ya the * is a match nothing or anything which would work there actually | 19:31 |
clarkb | but I think we may want to list lists.o.o and lists[0-9]*.o.o separately and delete lists.o.o when we get that host converted over | 19:31 |
fungi | right. basically need two entries in the case of non-enumerated hostnames | 19:32 |
fungi | or to cover the case of | 19:33 |
clarkb | I'll try to get some of these changes merged before pushing updates to that file though :) | 19:33 |
clarkb | rebasing yesterday was an interesting experience | 19:33 |
fungi | seeing [0-9]* in there is definitely confusing as it means two distinctly different patterns depending on whether you intended a file glob or a regex | 19:34 |
clarkb | ya I've already had to fix a bunch of related bugs | 19:34 |
clarkb | + isn't valid in globbing for example | 19:34 |
fungi | one digit followed by anything (or nothing) vs zero or more digits | 19:35 |
clarkb | maybe a file header comment that says "this file uses shell globs not regexes" | 19:35 |
fungi | i really only see that assumption as a risk if we expect to have digits in the middle of the host portion of some server names | 19:35 |
fungi | if we can assume digits will always fall at the end, it's fine | 19:35 |
clarkb | ya so far that is true | 19:36 |
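For reference, the difference clarkb and fungi are describing can be sketched in Python, comparing `fnmatch` (shell-glob semantics, like the inventory group patterns) with `re`. The hostnames here are illustrative, not the actual inventory contents:

```python
# `[0-9]*` pitfall: in a regex it means "zero or more digits", so the bare
# lists.openstack.org matches; as a shell glob it means "exactly one digit,
# then anything", so only the enumerated hosts match.
import fnmatch
import re

hosts = ["lists.openstack.org", "lists01.openstack.org", "logstash.openstack.org"]

glob_pattern = "lists[0-9]*.openstack.org"
regex_pattern = r"lists[0-9]*\.openstack\.org$"

glob_matches = [h for h in hosts if fnmatch.fnmatch(h, glob_pattern)]
regex_matches = [h for h in hosts if re.match(regex_pattern, h)]

print(glob_matches)   # only the numbered lists host
print(regex_matches)  # both lists hosts
```

This is why the glob only matched enumerated hosts like lists01 and silently skipped the non-enumerated lists.o.o.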
*** wolverineav has joined #openstack-infra | 19:38 | |
*** dklyle has joined #openstack-infra | 19:42 | |
notmyname | I'm seeing a permission denied error on one of our test jobs. it doesn't seem to be something related to swift code, so I'm hoping someone here may be able to provide some insight. http://logs.openstack.org/16/625116/3/experimental/swift-multinode-rolling-upgrade-queens/fa6db38/job-output.txt.gz#_2018-12-14_19_23_50_362102 | 19:42 |
clarkb | notmyname: let me see | 19:42 |
notmyname | thanks | 19:43 |
openstackgerrit | Sean McGinnis proposed openstack-infra/irc-meetings master: Switch release team meeting to Thursday 1600 https://review.openstack.org/625290 | 19:45 |
clarkb | notmyname: I think pip is saying it can't read the git repo for swift. I believe that command is running as the zuul user so if something earlier in the job has updated or chowned that repo to another user that may explain it (digging in logs for any evidence of that) | 19:45 |
clarkb | notmyname: http://logs.openstack.org/16/625116/3/experimental/swift-multinode-rolling-upgrade-queens/fa6db38/ara-report/file/1ba7e97e-08df-4385-b6ca-32e4edce0d22/#line-28 I think that may do it | 19:47 |
clarkb | notmyname: that task is running python setup.py develop in the swift repo as root which will modify the build dir and package link stuff iirc. Then when tox tries to do the same as zuul user it fails to update those files | 19:48 |
clarkb | notmyname: I think there are a few options to fix that 1) only install swift external to tox (there is a tox setting to not install the source repo) 2) only install with tox and don't do it externally | 19:49 |
clarkb | if you are just using tox as a way to trigger the testsuite then the first option might make the most sense | 19:49 |
clarkb | there is a 3) which is have a subsequent task cleanup/chown things so that tox works | 19:50 |
notmyname | clarkb: ah, ok. thanks. I'll have to think on this and talk with timburke and tdasilva to see what the best option is | 19:50 |
clarkb | and 4) run tox as root | 19:50 |
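A minimal sketch of option 1 (hypothetical tox.ini fragment; the setting names are standard tox options, but this is not Swift's actual config):

```ini
[tox]
# Don't build/install an sdist of the repo under test...
skipsdist = True

[testenv]
# ...and don't pip-install the source tree into the venv either; the repo
# was already installed system-wide by the earlier (root) playbook task.
skip_install = True
# Let the venv see that system-wide install.
sitepackages = True
```

With this, tox is only a test runner and never touches the root-owned build artifacts.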
*** dklyle has quit IRC | 19:55 | |
mriedem | i just noticed that https://docs.openstack.org/nova/latest/admin/live-migration-usage.html hasn't been updated since september, but there have been changes to that doc since then - is that normal? | 19:57 |
fungi | mriedem: possible your doc publication jobs have broken, i suppose. looking now | 19:58 |
*** mtreinish has joined #openstack-infra | 19:58 | |
mtreinish | is there any way for us to check the load on the trove instance that runs the subunit2sql db | 19:58 |
mtreinish | I've had a couple queries going for >173min. and I'm wondering if the little trove node is too small for the size of the db now | 19:59 |
clarkb | mtreinish: I don't think we get system level access like that so unless mysql can provide that info I'm guessing no | 20:00 |
fungi | mriedem: http://logs.openstack.org/f6/f6996903d2ef0fdb40135b506c83ed6517b28e19/post/publish-openstack-tox-docs/e140ad1/job-output.txt.gz#_2018-12-14_15_29_25_026373 | 20:00 |
fungi | looks like it's getting built and included | 20:00 |
openstackgerrit | Merged openstack-infra/system-config master: Turn on the future parser for logstash.openstack.org https://review.openstack.org/615660 | 20:00 |
clarkb | but maybe rax collects that data for us? | 20:00 |
mtreinish | clarkb: hmm, that's what I expected the answer was gonna be :/ | 20:01 |
fungi | clarkb: mtreinish: yes, i think we can see it in the rackspace cloud dashboard | 20:01 |
mtreinish | well mriedem will just have to keep waiting for his list of slowest tempest tests | 20:01 |
fungi | heh | 20:01 |
mtreinish | it probably wouldn't hurt to check the dashboard though and think about upsizing the node | 20:03 |
clarkb | can we/should we trim the db? | 20:03 |
mtreinish | we trim at 6 months now | 20:03 |
clarkb | ah so its already bound, in that case ya maybe upsize is best if we see indication its too small | 20:03 |
* clarkb tries to figure out where that lives | 20:05 | |
mtreinish | well we were supposed to be running a cron or something to trim to six months, fungi and I set that up a long time ago | 20:06 |
mtreinish | but I just checked the oldest entries in the db and it's from june 2014 | 20:06 |
clarkb | nice | 20:06 |
mtreinish | oh, but theres only a few | 20:06 |
mtreinish | then 2016 | 20:06 |
mtreinish | 10/2016 | 20:06 |
mtreinish | then 11/2017 | 20:07 |
mtreinish | and then it's 6months and a lot more data | 20:07 |
mtreinish | I guess our triming job isn't perfect :p | 20:07 |
clarkb | might need to do multiple passes to get the new stuff | 20:08 |
clarkb | so I see a bunch of our DBs for services but not one for health | 20:08 |
mriedem | mtreinish: gibi already eyeballed it | 20:08 |
clarkb | which makes me think i am not looking in the right place | 20:08 |
mriedem | using good ol' gumption | 20:08 |
mriedem | see comments 11 and 12 https://bugs.launchpad.net/tempest/+bug/1783405 | 20:09 |
openstack | Launchpad bug 1783405 in tempest "Slow tests randomly timing out jobs (which aren't marked slow)" [High,In progress] - Assigned to Ghanshyam Mann (ghanshyammann) | 20:09 |
*** betherly has joined #openstack-infra | 20:09 | |
clarkb | fungi: any idea where the db is hiding? | 20:09 |
mtreinish | clarkb: it's probably called subunit2sql something | 20:09 |
mriedem | fungi: ok maybe my browser has that page cached.../me hard refreshes | 20:09 |
fungi | yeah, it'll be the subunit2sql db | 20:09 |
clarkb | mtreinish: bah there it is | 20:09 |
clarkb | ok I'm just blind in that case :) | 20:09 |
mtreinish | we turned it on around paris iirc :p | 20:09 |
*** bobh has quit IRC | 20:09 | |
mriedem | hmm, that didn't help | 20:10 |
clarkb | mtreinish: load average spiked to 15 and is on its way down now but still high | 20:10 |
mtreinish | mriedem: heh, ok | 20:10 |
mtreinish | mriedem: oh, you actually used openstack-health? | 20:10 |
*** armstrong has quit IRC | 20:10 | |
mtreinish | clarkb: hmm, that's probably not a good sign | 20:10 |
clarkb | memory usage is either happy or the gauge isn't working | 20:10 |
clarkb | (I don't see any memory usage) | 20:11 |
clarkb | disk usage looks ok | 20:11 |
*** e0ne has joined #openstack-infra | 20:11 | |
mtreinish | I'm pretty sure we used the smallest node size when we provisioned it | 20:11 |
*** e0ne has quit IRC | 20:11 | |
clarkb | its pretty big now at least for disk | 20:11 |
clarkb | I don't see an indication of the cpus we've got | 20:12 |
mriedem | mtreinish: i did, however, i still complained about its ux when i did (in here yesterday) | 20:12 |
mriedem | i can't use it w/o cursing its name first | 20:12 |
openstackgerrit | Merged openstack-infra/system-config master: Add rust-vmm OpenDev ML https://review.openstack.org/625254 | 20:13 |
clarkb | mtreinish: I take that back I misread this graph | 20:13 |
clarkb | cpu usage is 15% not a load average | 20:13 |
*** betherly has quit IRC | 20:13 | |
clarkb | load average is ~2 | 20:14 |
clarkb | and has been for ~3 hours | 20:14 |
mtreinish | heh, well that's when I started running my script | 20:14 |
clarkb | so it seems that we notice your script but it doesn't seem that the script has used all the available memory or cpu or disk | 20:15 |
clarkb | might also need to consider if the query is inefficient or can be improved? | 20:16 |
mtreinish | I was looking at the explain before; it didn't look that bad to me, but I'm hardly an expert | 20:17 |
clarkb | the 15% cpu usage implies that either we have a bunch of cpus and mysql can't use them for that query or we are waiting on io? | 20:17 |
mtreinish | http://paste.openstack.org/show/737336/ | 20:18 |
mtreinish | err I guess I misread it before, it's not that great | 20:19 |
mtreinish | I'm probably trying to grab too much data at once | 20:19 |
clarkb | the time scale on load average and cpu % are different so overlapping them in my head is hard | 20:19 |
mtreinish | mordred: ^^^ want to fix it for me :p | 20:20 |
clarkb | mtreinish: I'm definitely not a db expert :) | 20:20 |
clarkb | mtreinish: is the 500k rows for test ids unique runs or just unique tests | 20:20 |
clarkb | because ya, if it's then going through 500k unique tests to find all the unique run times, that could be quite expensive | 20:20 |
*** jento has quit IRC | 20:21 | |
clarkb | but if its 500k unique test runs I would expect that to be easy for it | 20:21 |
mtreinish | it's 500k unique test_ids (which I expect is that table's size) | 20:21 |
mtreinish | the tests table is the test name, total run counts, and a moving average of run times for that individual test across all runs | 20:22 |
*** armax has quit IRC | 20:22 | |
mtreinish | the query my script generated was: http://paste.openstack.org/show/737337/ (yeah paste not line wrapping) | 20:22 |
mtreinish | I just called: https://github.com/openstack-infra/subunit2sql/blob/master/subunit2sql/db/api.py#L1854 although looking at that it's grabbing more data than i actually need (and has an extra join for no benefit because of that) | 20:27 |
Shrews | oof, that explain doesn't look so great for that query | 20:31 |
*** rkukura has joined #openstack-infra | 20:31 | |
Shrews | you might try a combined index in 'runs' that contains both uuid and id, but i'm speculating what the table schema actually looks like | 20:31 |
Shrews | b/c that first table scan is likely what's hurting you | 20:32 |
Shrews | been a while since i did that type of stuff, too, so take with a grain of salt :) | 20:32 |
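Shrews' combined-index suggestion can be sketched self-contained with sqlite3 (the real database is MySQL on Trove, and this table layout is a guess, not subunit2sql's actual schema):

```python
# A combined index on (uuid, id) is "covering" for a lookup that filters on
# uuid and only needs id back: the query can be answered from the index
# alone, avoiding the full table scan the EXPLAIN showed.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (id INTEGER PRIMARY KEY, uuid TEXT, run_at TEXT)")
conn.execute("CREATE INDEX ix_runs_uuid_id ON runs (uuid, id)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM runs WHERE uuid = ?", ("some-uuid",)
).fetchall()
for row in plan:
    # sqlite reports a covering-index search rather than a table scan
    print(row[-1])
```

The same idea applies on MySQL (`ALTER TABLE runs ADD INDEX (uuid, id)`), modulo whatever the actual schema looks like.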
clarkb | fungi: as an fyi it appears lists.opendev.org records are in place | 20:35 |
fungi | excellent | 20:36 |
clarkb | fwiw logstash seems to have future parsered and broken its apache | 20:37 |
clarkb | which isn't a huge emergency but I'm sorting that out now | 20:37 |
*** priteau has quit IRC | 20:37 | |
fungi | #status log started the new opendev mailing list manager process with `sudo service mailman-opendev start` on lists.openstack.org | 20:39 |
openstackstatus | fungi: finished logging | 20:39 |
*** armax has joined #openstack-infra | 20:39 | |
*** diablo_rojo has joined #openstack-infra | 20:40 | |
*** wolverineav has quit IRC | 20:43 | |
*** wolverineav has joined #openstack-infra | 20:44 | |
mtreinish | Shrews: that seems correct, I'm trying to rewrite the script using a better query right now | 20:44 |
openstackgerrit | Clark Boylan proposed openstack-infra/puppet-kibana master: Set server admin var so that vhost works https://review.openstack.org/625344 | 20:47 |
clarkb | fungi: ^ I expect that to be the fix for logstash.o.o apache brokenness | 20:48 |
fungi | clarkb: for some reason the mm_domains variable addition in system-config change doesn't seem to have propagated to lists.o.o | 20:48 |
clarkb | and now I must lunch | 20:48 |
clarkb | oh hrm | 20:48 |
clarkb | did puppet actually run? | 20:48 |
fungi | it created the new mailing list | 20:48 |
fungi | it just didn't update exim configuration | 20:48 |
clarkb | yes puppet has run according to syslog | 20:48 |
clarkb | fungi: I can help look after lunch | 20:49 |
fungi | -r--r--r-- 1 root root 34445 Nov 13 04:15 /etc/exim4/exim4.conf | 20:49 |
fungi | i have to disappear to meet someone in ~25 minutes but may work it out before then | 20:49 |
*** wolverineav has quit IRC | 20:49 | |
*** betherly has joined #openstack-infra | 20:50 | |
clarkb | have a link to the line that sets it? | 20:52 |
fungi | as best i can tell roles/exim/templates/exim4.conf.j2 isn't getting applied by ansible | 20:54 |
*** betherly has quit IRC | 20:55 | |
clarkb | oh right exim is in ansible now | 20:55 |
fungi | the "Write Exim config file" task in roles/exim/tasks/main.yaml is what should be applying that | 20:56 |
clarkb | /var/log/ansible on bridge has the log files | 20:57 |
fungi | it sets dest: "{{ config_file }}" | 20:57 |
*** wolverineav has joined #openstack-infra | 20:58 | |
fungi | which roles/exim/vars/Debian.yaml sets to /etc/exim4/exim4.conf | 20:58 |
clarkb | I don't see mm_domains in the conf.j2 file | 21:00 |
fungi | i still feel uninformed about ansible... does it automatically replace files with a template task? | 21:00 |
clarkb | yes it should | 21:00 |
fungi | clarkb: it's used to set a couple values in playbooks/host_vars/lists.openstack.org.yaml | 21:01 |
fungi | not used in the role itself | 21:01 |
clarkb | ah its transitive | 21:01 |
fungi | one of them is exim_local_domains which is what i'm tracking down now | 21:01 |
fungi | that does then get included in the template | 21:01 |
fungi | but the template never seems to have been written out on lists.o.o once that was updated | 21:02 |
clarkb | the force parameter to template is defaulted to yes | 21:03 |
clarkb | so should update if contents differ | 21:03 |
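For context, the "Write Exim config file" task being discussed would look roughly like this; `force: yes` is the template module's default, so the destination is rewritten whenever the rendered content differs (arguments reconstructed from the conversation, not copied from the repo):

```yaml
- name: Write Exim config file
  template:
    src: exim4.conf.j2
    dest: "{{ config_file }}"   # /etc/exim4/exim4.conf per vars/Debian.yaml
    force: yes                  # default: overwrite when content differs
```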
dmsimard | unrelated question, I want to sync a feature branch with master with no regard to git history -- in the context of gerrit and zuul, should I send a merge commit for review or should I delete and re-create the branch ? | 21:04 |
dmsimard | I guess it's sort of the reverse of when we merged the zuulv3 branch of zuul back into master | 21:05 |
clarkb | fungi do we maybe overwrite mm_domains in the ansible var data on bridge? | 21:05 |
clarkb | and so old value supersedes your new value? | 21:05 |
clarkb | dmsimard: is the feature branch in gerrit? | 21:05 |
dmsimard | clarkb: it is | 21:06 |
clarkb | it already exists that is? I would merge master to feature then push that to gerrit | 21:06 |
dmsimard | openstack/ansible-role-ara has a feature/1.0 branch | 21:06 |
dmsimard | ok so something like git merge master feature/1.0 and then git push gerrit feature/1.0 ? or a git review ? | 21:06 |
mtreinish | mriedem: fwiw: http://paste.openstack.org/show/737340/ is the average run time of every test over the last 300 runs of tempest-full in gate | 21:08 |
clarkb | dmsimard: git review should work | 21:08 |
clarkb | dmsimard: it will push up a proposal change for merging the merge commit | 21:08 |
dmsimard | clarkb: ok, I don't think I've ever sent a merge commit for review -- I'll try that, thanks :D | 21:08 |
clarkb | fungi: its not overridden in private vars from what I see | 21:08 |
fungi | clarkb: on bridge.o.o `sudo grep -ir mm_domains /etc` only turns up hits in /etc/puppet/modules/exim/templates/exim4.conf.erb which is presumably vestigial | 21:09 |
clarkb | dmsimard: note that you may have to allow it in your acls file | 21:09 |
clarkb | fungi: ya that should be ignored because it's puppet | 21:09 |
mtreinish | mriedem: you can generate it in the future by just running: http://paste.openstack.org/show/737341/ | 21:09 |
clarkb | fungi: I'm currently looking for evidence that role ran at all | 21:11 |
*** rlandy has quit IRC | 21:12 | |
fungi | yeah, i was checking the logs on bridge.o.o and thinking the exim role isn't getting used? | 21:12 |
clarkb | fungi: its in playbooks/base.yaml | 21:13 |
clarkb | but ya I don't see evidence of it in the logs | 21:13 |
*** chandan_kumar has joined #openstack-infra | 21:13 | |
fungi | i agree, seems to be included for all of hosts: "!disabled" | 21:14 |
clarkb | base-server is running according to the logs | 21:15 |
fungi | clarkb: i don't see the iptables role getting applied either | 21:16 |
fungi | and it's in the same set in base | 21:16 |
clarkb | ya but base-server is which is really weird | 21:16 |
fungi | i need to run now. worst case we can append lists.opendev.org to the two hostlists in /etc/exim/exim4.conf and reload the exim4 service while digging deeper | 21:16 |
clarkb | its almost like ansible crashes | 21:17 |
clarkb | the logs go from base-server to "Base: configure OpenStackSDK on bridge" which is the next block of tasks | 21:18 |
fungi | i've manually appended that hostname to the hostlists in the exim config and reloaded | 21:18 |
fungi | and with that i'm disappearing for an hour or so but will check on this as soon as i get back | 21:18 |
clarkb | it looks like my rework of the debian handling for arm may be the cause? | 21:18 |
clarkb | at least it gets to that point then stops | 21:18 |
mriedem | mtreinish: ok 2-4 there in that first paste are the same things i identified from gibi's comments in the bug report | 21:19 |
mriedem | so that's good to know | 21:19 |
mriedem | tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern.test_volume_boot_pattern: 161.5385627070707 is the only difference, but that one shouldn't surprise me | 21:19 |
clarkb | dmsimard: is include_tasks as used at https://git.openstack.org/cgit/openstack-infra/system-config/tree/playbooks/roles/base-server/tasks/Ubuntu.xenial.aarch64.yaml#n6 not expected to work? | 21:20 |
mriedem | kind of surprised that isn't already marked as slow | 21:20 |
clarkb | dmsimard: http://paste.openstack.org/show/737342/ I'm seeing that include_tasks seemingly no-op then the entire playbook jumps to the next play https://git.openstack.org/cgit/openstack-infra/system-config/tree/playbooks/base.yaml#n12 any idea why that happens | 21:23 |
*** tpsilva has quit IRC | 21:23 | |
clarkb | also extra scary here is that ansible isn't failing (this is something that for all puppet's problems it was really good at, if something can't happen then be safe and stop) | 21:25 |
dmsimard | clarkb: I have no idea why that is | 21:25 |
dmsimard | hang on | 21:26 |
dmsimard | hum | 21:27 |
*** dklyle has joined #openstack-infra | 21:27 | |
clarkb | dmsimard: I'm tempted to just copy pasta that set of tasks from Debian.yaml into the arm64 task list for now | 21:28 |
clarkb | but am open to other ideas (like is it worth trying import_tasks instead of include_tasks?) | 21:29 |
dmsimard | I'm a bit lost in the Ansible transition between include and import, especially across versions | 21:29 |
clarkb | I'm sure I'm more lost :) | 21:29 |
dmsimard | My understanding is that include is parsed "at runtime" and is meant to be used when there are conditions attached | 21:29 |
clarkb | (this is a specific topic that could use better docs) | 21:29 |
dmsimard | While import is static | 21:30 |
clarkb | dmsimard: ya, but in this case its being called from a file that itself is conditionally loaded | 21:30 |
clarkb | so I'm guessing it has to be included not imported? or may not | 21:30 |
dmsimard | worth a try | 21:30 |
clarkb | https://git.openstack.org/cgit/openstack-infra/system-config/tree/playbooks/roles/base-server/tasks/main.yaml#n67 | 21:31 |
dmsimard | import_* came with Ansible >= 2.5 | 21:31 |
dmsimard | According to https://docs.ansible.com/ansible/latest/user_guide/playbooks_reuse_includes.html | 21:31 |
clarkb | maybe it can't handle multiple levels of include | 21:31 |
clarkb | (the docs say it can but could be buggy) | 21:31 |
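As a rough sketch of the include/import distinction being described (hypothetical task files; exact behavior varies across Ansible versions):

```yaml
- hosts: all
  tasks:
    # import_tasks is static: common.yaml is expanded at parse time, so a
    # missing file fails immediately and --list-tasks can see inside it.
    - import_tasks: common.yaml

    # include_tasks is dynamic: the file is loaded only when this task runs,
    # which allows runtime data in the name but defers load-time problems.
    - include_tasks: "{{ ansible_os_family }}.yaml"
```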
dmsimard | yeah, I'm not implying there's no bug or odd behavior at play here | 21:32 |
dmsimard | There's definitely something weird going on | 21:32 |
*** dklyle has quit IRC | 21:32 | |
*** dklyle has joined #openstack-infra | 21:33 | |
dmsimard | In unrelated news, I was looking at a system-config run through ara (to see what the base-server role did) and I'm confused why a job came back successful despite a failure http://logs.openstack.org/85/605585/20/check/system-config-run-base/b98ee49/hosts/bridge.openstack.org/ara-report/ | 21:33 |
dmsimard | (from https://review.openstack.org/#/c/605585/) | 21:33 |
clarkb | could be the same bug | 21:34 |
clarkb | ansible is apparently not tracking failures in both cases | 21:35 |
clarkb | we should double check we are checking return codes properly too | 21:35 |
*** woojay has quit IRC | 21:35 | |
dmsimard | The failure in that particular system-config run was http://logs.openstack.org/85/605585/20/check/system-config-run-base/b98ee49/hosts/bridge.openstack.org/ara-report/result/e3ced53a-5b94-484c-976d-868386826527/ | 21:35 |
dmsimard | But everything is green from the perspective of Zuul :/ http://logs.openstack.org/85/605585/20/check/system-config-run-base/b98ee49/ara-report/ | 21:36 |
*** woojay has joined #openstack-infra | 21:37 | |
dmsimard | ctrl+f for "root_rsa_key" is turning up empty in the full job log.. | 21:38 |
openstackgerrit | Clark Boylan proposed openstack-infra/system-config master: Copy pasta the debian base server bits, don't include them https://review.openstack.org/625350 | 21:38 |
clarkb | there is the naive make it work change (I hope that makes it work) | 21:38 |
clarkb | dmsimard: where does bridge.yaml run? I see the base.yaml in the surrounding ara report | 21:40 |
*** jamesmcarthur has joined #openstack-infra | 21:40 | |
dmsimard | clarkb: that's what I was trying to find out but I ended up being even more confused | 21:41 |
dmsimard | The failing task is "Write out ssh private key" which is "changed" in run-base.yaml (from the perspective of zuul) but it's "failed" when it later runs for bridge.yaml from the perspective of system-config ? | 21:42 |
dmsimard | http://logs.openstack.org/85/605585/20/check/system-config-run-base/b98ee49/job-output.txt.gz#_2018-12-14_19_41_53_387941 | 21:42 |
dmsimard | That's the only instance of "Write out ssh private key" in the job logs, I suppose the one from inside system-config is not in the stdout | 21:42 |
clarkb | dmsimard: its from run-base.yaml | 21:43 |
clarkb | which is the run playbook for the job | 21:43 |
*** EvilienM is now known as EmilienM | 21:44 | |
*** weshay is now known as weshay_pto | 21:44 | |
*** jamesmcarthur has quit IRC | 21:44 | |
*** jamesmcarthur has joined #openstack-infra | 21:44 | |
clarkb | is it possible the streams are getting crossed? | 21:44 |
clarkb | dmsimard: ok I think I get it now maybe | 21:46 |
clarkb | dmsimard: the job's run playbook runs bridge.yaml to set things up for the job. We pass in a ssh key value there http://logs.openstack.org/85/605585/20/check/system-config-run-base/b98ee49/ara-report/result/ee88cd4f-f3b3-4ea8-b0f6-d0fbc9f05bea/ | 21:47 |
dmsimard | yeah, that one works | 21:47 |
dmsimard | but the nested one doesn't | 21:47 |
clarkb | dmsimard: then what the job is testing is that we can run ansible against all the hosts in our inventory which happens to include bridge.o.o so it reruns bridge.yaml only this time we don't pass in the ssh key info | 21:47 |
dmsimard | what I don't understand is that there's no such thing as "Write out ssh keys" in the ansible-playbook command output, unless I'm not looking at the right one: http://logs.openstack.org/85/605585/20/check/system-config-run-base/b98ee49/ara-report/result/89704ccd-5067-4715-81c9-fa0dcee02e55/ | 21:48 |
clarkb | dmsimard: ya thats base.yaml which doesn't run bridge.yaml I don't think | 21:48 |
clarkb | run_all.sh runs bridge.yaml | 21:49 |
clarkb | which is where this gets weird | 21:49 |
clarkb | its the cron | 21:49 |
*** jtomasek has quit IRC | 21:49 | |
dmsimard | so that's why we wouldn't have the output anywhere then ? it's in the cron shell ? | 21:49 |
clarkb | so that run isn't part of the job except that the job installs a cron to run things every 15 minutes | 21:49 |
clarkb | yup | 21:49 |
clarkb | and that cron ansible is crossing the streams with your nested ara | 21:49 |
clarkb | I think the clean up for this is to disable the cron on test jobs | 21:50 |
dmsimard | that would explain why the job didn't fail as a result | 21:50 |
clarkb | yup | 21:50 |
dmsimard | ianw: ^ FYI, tl;dr is there was a failure in the nested bridge.o.o ara report and I was confused as to why Zuul hadn't failed the job: http://logs.openstack.org/85/605585/20/check/system-config-run-base/b98ee49/hosts/bridge.openstack.org/ara-report/ | 21:52 |
clarkb | dmsimard: fungi: https://review.openstack.org/#/c/625350/1 is I think the short term ansible fix (and we should watch it when it goes in to make sure iptables continues working as expected ugh, I really wish that ansible would fail safe when it crashes like this) | 21:53 |
clarkb | dmsimard: also thinking about that more I wonder if the issue is including a task list that is included elsewhere by other hosts in the same play | 21:53 |
clarkb | basically we were trying to dedup special tasks for arm by doing arm specific then generic tasks, but may be the reorg here is do all the generic tasks on debuntu then include only the arm specific tasks from there? | 21:54 |
clarkb | I dunno this is all incredibly cryptic to me and ansible not actually knowing it failed makes it harder to understand | 21:54 |
*** dklyle has quit IRC | 21:54 | |
*** jamesmcarthur has quit IRC | 21:54 | |
dmsimard | clarkb: where is that failure occurring ? not in zuul right ? | 21:55 |
clarkb | no this is to apply things to production. I don't think it happens in the zuul jobs because we don't use arm64 test nodes in the zuul jobs | 21:55 |
clarkb | we could add an arm64 node to the system-config inventory to further test things (but that seems out of scope for now) | 21:56 |
clarkb | (at least for me trying to make things happy before weekend) | 21:56 |
*** dklyle has joined #openstack-infra | 21:56 | |
dmsimard | clarkb: I don't fully recognize the output format of the paste you sent me, could a callback be eating some sort of trace or failure ? | 21:56 |
clarkb | maybe? I think we use default output for that | 21:57 |
dmsimard | where does "2018-12-14 20:54:28,515 p=11685 u=root |" come from ? is that journalctl ? | 21:57 |
clarkb | callback_whitelist=profile_tasks, timer | 21:57 |
clarkb | I think from the timer callback | 21:57 |
clarkb | or the profuiler? | 21:57 |
clarkb | I dunno | 21:58 |
clarkb | it does look like syslog/journalctl format though | 21:58 |
dmsimard | ah, it looks like it's the format provided by ansible from log_path=/var/log/ansible/ansible.log | 21:59 |
*** mriedem has quit IRC | 22:01 | |
openstackgerrit | Merged openstack-infra/puppet-kibana master: Set server admin var so that vhost works https://review.openstack.org/625344 | 22:02 |
*** dklyle has quit IRC | 22:02 | |
openstackgerrit | Clark Boylan proposed openstack-infra/system-config master: Stop running unnecessary tests on trusty https://review.openstack.org/625358 | 22:02 |
clarkb | and ^ is a small cleanup optimization I noticed we could make | 22:03 |
dmsimard | clarkb: my gut feeling is that the logging doesn't tell the whole story | 22:03 |
dmsimard | a bit like how the zuul callback munges some traces or errors | 22:03 |
clarkb | dmsimard: ya I'm beginning to think it must be a corner case of include_task for a file that is already included in the play | 22:03 |
dmsimard | sometimes* munges | 22:04 |
clarkb | its being included for a different set of nodes, but the way ansible queues things up is global iirc | 22:04 |
dmsimard | clarkb: if we have the ability to run that playbook in the foreground, we would certainly get a different output | 22:04 |
*** priteau has joined #openstack-infra | 22:05 | |
*** priteau has quit IRC | 22:05 | |
clarkb | dmsimard: probably the path forward here is merge the copy pasta (I assume that won't break things similarly) then make a revert that adds an arm64 test node to the inventory and tweak until it works | 22:06 |
clarkb | keeping in mind that ansible success doesn't mean it actually worked | 22:07 |
dmsimard | clarkb: hmmm, the output is the same in the run_all raw output http://paste.openstack.org/show/737344/ | 22:09 |
dmsimard | I'm out of ideas | 22:09 |
dmsimard | ¯\_(ツ)_/¯ | 22:09 |
clarkb | it definitely looks like ansible just goes "NOPE" | 22:09 |
clarkb | and continues with the next play | 22:09 |
clarkb | dmsimard: should I try to fabricobble a bug together on github for this? | 22:13 |
clarkb | I don't really have much info other than the setup and logs and ansible version info | 22:13 |
*** dklyle has joined #openstack-infra | 22:13 | |
clarkb | I guess I can start there and see if anyone in ansible land is able/willing to debug further | 22:13 |
dmsimard | We have a good case for a bug if we can come up with a generic reproducer | 22:13 |
*** betherly has joined #openstack-infra | 22:14 | |
*** boden has quit IRC | 22:14 | |
dmsimard | Probably worthwhile to check if there's already an issue about this too | 22:14 |
dmsimard | yikes, the answer in https://github.com/ansible/ansible/issues/41984 is basically "include_tasks is in tech preview, use at your own risks" | 22:16 |
*** trown is now known as trown|outtypewww | 22:16 | |
dmsimard | (that was a while ago, though) | 22:16 |
clarkb | re reproducing if I had to guess the trick there is having a play that includes foo.yaml and bar.yaml on different hosts, then have bar.yaml include_tasks: foo.yaml | 22:17 |
*** dklyle has quit IRC | 22:18 | |
dmsimard | you can probably fake it using add_host | 22:18 |
clarkb | play1 runs role1 and role2. role1 executes foo.yaml via include_tasks on some hosts and bar.yaml on others. Have bar.yaml also include_tasks foo.yaml | 22:18 |
dmsimard | (with ansible_connection: local) | 22:18 |
clarkb | then role2 never runs (and neither does play2) | 22:18 |
*** betherly has quit IRC | 22:18 | |
clarkb | or just use an inventory with a few connect local settings | 22:19 |
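The reproducer clarkb describes above can be sketched roughly as follows (file, role, and host names are illustrative, not taken from the actual repro; using `ansible_connection=local` means no real remote nodes are needed):

```yaml
# inventory.ini -- two fake hosts, both connecting locally
# [nodes]
# node1 ansible_connection=local
# node2 ansible_connection=local

# site.yaml -- play1 applies role1 then role2; a second play follows
- hosts: nodes
  roles:
    - role1
    - role2   # expected to run, but silently skipped when the bug triggers

- hosts: nodes
  tasks:
    - debug:
        msg: "play2 ran"   # never printed when the bug triggers

# roles/role1/tasks/main.yaml -- include a different task file per host
- include_tasks: foo.yaml
  when: inventory_hostname == 'node1'
- include_tasks: bar.yaml
  when: inventory_hostname == 'node2'

# roles/role1/tasks/bar.yaml -- nested include of the same file that was
# already included for the other host
- include_tasks: foo.yaml

# roles/role1/tasks/foo.yaml
- debug:
    msg: "foo ran"
```

Per the chat, the symptom is that role2 and the second play never run and Ansible prints no error. Replacing the nested `include_tasks` in bar.yaml with an inline copy of foo.yaml's tasks is what clarkb later confirms as the workaround.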
*** rfolco has quit IRC | 22:22 | |
*** e0ne has joined #openstack-infra | 22:23 | |
*** betherly has joined #openstack-infra | 22:24 | |
*** betherly has quit IRC | 22:29 | |
openstackgerrit | demosdemon proposed openstack-dev/pbr master: Resolve ``ValueError`` when mapping value contains a literal ``=``. https://review.openstack.org/625364 | 22:32 |
clarkb | dmsimard: I can reproduce | 22:32 |
clarkb | totally doesn't print any errors either as far as I can tell | 22:32 |
clarkb | I'm double checking that copying the tasks fixes it | 22:32 |
*** slaweq has quit IRC | 22:33 | |
clarkb | yup | 22:33 |
clarkb | that is amazing | 22:33 |
clarkb | I'm going to take a short break then will be back to file a bug report | 22:34 |
*** eernst has joined #openstack-infra | 22:34 | |
*** dklyle has joined #openstack-infra | 22:37 | |
*** e0ne has quit IRC | 22:40 | |
*** dklyle has quit IRC | 22:41 | |
*** dklyle has joined #openstack-infra | 22:42 | |
*** eernst has quit IRC | 22:48 | |
*** wolverineav has quit IRC | 22:51 | |
dmsimard | signing off for today, do share the link, I'm curious :) | 22:52 |
kmalloc | Shrews, ianw, mordred, clarkb: https://review.openstack.org/625370 <-- dogpile.cache and sdk fix | 22:52 |
kmalloc | i am unsure how to actually run this as a unit test... | 22:53 |
kmalloc | but that has been confirmed locally to fix the issue | 22:53 |
notmyname | FYI https://blade.tencent.com/magellan/index_en.html | 22:55 |
notmyname | there's a severe lack of detail (including a CVE), but it sounds scary. but maybe "just" upgrading will mitigate it? | 22:56 |
*** kgiusti has left #openstack-infra | 22:57 | |
*** dklyle has quit IRC | 22:57 | |
kmalloc | notmyname: gross. | 22:58 |
kmalloc | i think chromium is fixed with upgrade according to that. | 22:58 |
kmalloc | not sure if it's SQLite upgrade fixing =/ man so few details. | 22:59 |
*** wolverineav has joined #openstack-infra | 22:59 | |
*** jamesmcarthur has joined #openstack-infra | 22:59 | |
notmyname | ah, it's the one linked in https://news.ycombinator.com/item?id=18685516. seems related to WebSQL | 23:00 |
kmalloc | ahh | 23:00 |
notmyname | CVE TBD | 23:00 |
notmyname | well, that's at least the chromium issue. not sure if there are others | 23:02 |
*** wolverineav has quit IRC | 23:03 | |
*** jamesmcarthur has quit IRC | 23:03 | |
kmalloc | yeah | 23:04 |
*** wolverineav has joined #openstack-infra | 23:06 | |
*** lbragstad has quit IRC | 23:08 | |
clarkb | kmalloc: thanks and notmyname that looks like a fun one | 23:09 |
clarkb | I'm going to figure out how to write this bug report for ansible without making the person taking it on go crazy | 23:09 |
notmyname | heh | 23:09 |
notmyname | clarkb: looks like https://review.openstack.org/#/c/625361/ worked for the permissions error | 23:10 |
openstackgerrit | demosdemon proposed openstack-dev/pbr master: Resolve ``ValueError`` when mapping value contains a literal ``=``. https://review.openstack.org/625372 | 23:11 |
clarkb | notmyname: cool | 23:14 |
*** lbragstad has joined #openstack-infra | 23:17 | |
fungi | clarkb: okay, back from steak night and catching up | 23:20 |
fungi | sounds like you found a bug | 23:20 |
clarkb | fungi: ya, in the process of writing a bug | 23:22 |
*** lbragstad has quit IRC | 23:22 | |
clarkb | just pushed https://github.com/cboylan/ansible_include_tasks_crash to refer to in the bug | 23:22 |
clarkb | as it's easier to do that than to try and do all this in markdown I think | 23:22 |
*** eernst has joined #openstack-infra | 23:26 | |
fungi | sure | 23:26 |
fungi | i would have never figured that out myself, fwiw | 23:27 |
fungi | good job | 23:27 |
fungi | ansible is still so much blackbox to me | 23:27 |
scas | keeping ruby in wet memory for chef leaves ansible pretty opaque for me. it's a trade-off | 23:35 |
clarkb | fungi: dmsimard mordred Shrews https://github.com/ansible/ansible/issues/49969 | 23:36 |
clarkb | imo its a pretty big deal because ansible shouldn't fail and continue running like that | 23:37 |
clarkb | hrm logstash webserver still broken | 23:37 |
*** diablo_rojo has quit IRC | 23:37 | |
clarkb | arg I know why | 23:38 |
fungi | do share | 23:38 |
*** yamamoto has quit IRC | 23:39 | |
openstackgerrit | Clark Boylan proposed openstack-infra/puppet-kibana master: Use full lookup path for serveradmin in template https://review.openstack.org/625374 | 23:41 |
clarkb | fungi: ^ because I fail at puppet | 23:41 |
clarkb | the good news is I fail at ansible equally as evidenced by the include_tasks thing :) | 23:41 |
clarkb | fungi: https://review.openstack.org/625350 and https://review.openstack.org/625374 should fix the two outstanding issues we know we currently have | 23:41 |
clarkb | fungi: on the first one I have no idea if ansible will apply cleanly after not doing so for a few days | 23:42 |
fungi | maybe we just merge and fix any issues we spot over the weekend | 23:42 |
scas | speaking of opaque, what might i do with several cross-repo dependencies that all need each other to pass each build? | 23:43 |
kmalloc | zzzeek: i don't think we need to revert dogpile. what SDK is doing is very .. not normal. | 23:43 |
clarkb | scas: option A build in backward compat/future compat as necessary to get them all happy. option B make tests non voting as necessary to get over hump. | 23:44 |
clarkb | scas: there is an option C too, realize that these peices of software are tightly coupled and might be better off in a single repo | 23:44 |
scas | single repo is not exactly the easiest to manage, since it's configuration management | 23:45 |
scas | option A might be the path forward | 23:45 |
scas | making things non-voting would mean making everything non-voting, negating the testing mechanism | 23:45 |
clarkb | scas: ya B is a short term, thing | 23:46 |
clarkb | that comma is there for I don't know why reasons | 23:46 |
fungi | in openstack, we've held that option a is good engineering and generally downstream-friendly | 23:47 |
clarkb | ++ | 23:47 |
fungi | since at any point in time, the set of your stuff continues to work | 23:47 |
scas | yeah, that's where i'm leaning | 23:47 |
clarkb | after filing this ansible github bug I feel like I've done my good deed of the week | 23:48 |
clarkb | that was a fun one | 23:48 |
clarkb | fungi: fwiw if https://review.openstack.org/#/c/625350/ looks good to you I don't mind keeping one eyeball on irc/bridge logs this evening | 23:48 |
clarkb | if you want to approve it | 23:48 |
fungi | scas: it's more complexity and more iteration for sure, but has the up-side that if one of those changes gets ignored for weeks due to lack of reviewers or a wayward bus, stuff still runs | 23:49 |
clarkb | and I don't mind self approving the logstash fix since I already self approved the broken fix | 23:49 |
clarkb | (maybe this one is broken too) | 23:49 |
scas | i know force-merging is a less favorable option, but i'm not considering that an option at this point | 23:52 |
scas | i'd rather get them testing without having to rely on local testing alone saying all's well | 23:53 |
fungi | scas: at least here that requires cooperation of the admins for the repository hosting platform | 23:53 |
scas | absolutely, and i'm sure none would be too pleased with me asking | 23:54 |
fungi | not that we're an uncooperative bunch, we'll just spend a while telling you why it's a bad idea ;) | 23:54 |
scas | i'm familiar with how bad of an idea it can be. it had to be wielded in the recent months for something else unrelated, as it was the only option in that case | 23:56 |
fungi | yeah, it's not necessarily the worst idea, circumstances depending. though it is usually still a bad idea regardless | 23:57 |
fungi | sometimes all the other ideas are simply worse still | 23:57 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!