*** wolverineav has quit IRC | 00:09 | |
*** wolverineav has joined #openstack-infra | 00:09 | |
*** slaweq has joined #openstack-infra | 00:11 | |
*** wolverineav has quit IRC | 00:14 | |
*** slaweq has quit IRC | 00:16 | |
*** woojay has quit IRC | 00:25 | |
*** sthussey has quit IRC | 00:34 | |
*** dtroyer has quit IRC | 00:34 | |
*** ssbarnea|rover has quit IRC | 00:34 | |
*** wolverineav has joined #openstack-infra | 00:34 | |
*** dtroyer has joined #openstack-infra | 00:35 | |
*** dtroyer has quit IRC | 00:36 | |
*** wolverineav has quit IRC | 00:36 | |
*** dtroyer has joined #openstack-infra | 00:37 | |
*** wolverineav has joined #openstack-infra | 00:38 | |
*** wolverineav has quit IRC | 00:38 | |
*** wolverineav has joined #openstack-infra | 00:38 | |
*** Swami has quit IRC | 00:49 | |
*** gyee has quit IRC | 00:58 | |
*** armax has joined #openstack-infra | 01:00 | |
*** yamamoto has quit IRC | 01:03 | |
*** slaweq has joined #openstack-infra | 01:10 | |
*** slaweq has quit IRC | 01:15 | |
*** psachin has joined #openstack-infra | 01:30 | |
zxiiro | Anyone else seeing "ImportError: cannot import name decorate" when using openstack client? | 01:34 |
zxiiro | I think dogpile.cache released a new version yesterday that's breaking. | 01:34 |
clarkb | zxiiro: I think the dogpile thing is a known issue but unaware of fix | 01:35 |
zxiiro | pinning it to 0.6.8 seems to help my build job at least. | 01:39 |
clarkb | kmalloc: Shrews might be worth email to the discuss list? | 01:43 |
*** d0ugal has quit IRC | 01:56 | |
*** mrsoul has quit IRC | 02:07 | |
*** d0ugal has joined #openstack-infra | 02:11 | |
*** rfolco has quit IRC | 02:30 | |
*** armax has quit IRC | 02:31 | |
*** dave-mccowan has joined #openstack-infra | 02:38 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add change status page https://review.openstack.org/599472 | 02:50 |
*** bhavikdbavishi has joined #openstack-infra | 02:51 | |
*** yamamoto has joined #openstack-infra | 03:02 | |
*** wolverineav has quit IRC | 03:03 | |
*** wolverineav has joined #openstack-infra | 03:04 | |
*** armax has joined #openstack-infra | 03:05 | |
*** wolverineav has quit IRC | 03:08 | |
*** slaweq has joined #openstack-infra | 03:11 | |
*** hongbin has joined #openstack-infra | 03:14 | |
*** apetrich has quit IRC | 03:15 | |
*** hongbin has quit IRC | 03:15 | |
*** slaweq has quit IRC | 03:15 | |
*** hongbin has joined #openstack-infra | 03:16 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add change status page https://review.openstack.org/599472 | 03:18 |
kmalloc | clarkb: i also think that openstacksdk is doing the wrong thing here | 03:22 |
kmalloc | clarkb: trying to figure out why we have written a wrapper that is insisting on passing a bound method into a decorator instead of just wrapping the methods like you normally would. | 03:23 |
kmalloc | zxiiro: set to <0.7.0 for now | 03:24 |
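The workaround zxiiro and kmalloc settle on is an upper-bound pin. A minimal stdlib sketch of the version logic behind it, using tuple comparison as a stand-in for real PEP 440 ordering (the version numbers are the ones from the conversation):

```python
# Sketch of the pin's logic: 0.6.8 is the known-good dogpile.cache release,
# 0.7.0 is where the ImportError appeared, so "<0.7.0" keeps the working
# series. Tuple comparison is a simplification of real PEP 440 parsing.
def parse(version):
    return tuple(int(part) for part in version.split("."))

upper_bound = parse("0.7.0")

assert parse("0.6.8") < upper_bound       # the pin still admits 0.6.8
assert parse("0.7.0") >= upper_bound      # and rejects the broken release
print("dogpile.cache<0.7.0 keeps the job on the known-good series")
```

In a requirements or constraints file the same pin would read `dogpile.cache<0.7.0`.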
kmalloc | clarkb: i'll write an email to the ML tomorrow/later tonight if someone else doesn't get to it first. | 03:24 |
*** lathiat has quit IRC | 03:36 | |
*** lathiat has joined #openstack-infra | 03:36 | |
*** yamamoto has quit IRC | 03:39 | |
ianw | kmalloc: i just dropped a mail now that we have everything lined up | 03:43 |
kmalloc | Thanks! | 03:49 |
*** ramishra has joined #openstack-infra | 03:49 | |
kmalloc | I think I have a fix for SDK, just need to poke at it a bit tomorrow. | 03:49 |
kmalloc | Should be straightforward actually | 03:49 |
*** lbragstad has joined #openstack-infra | 03:50 | |
*** lbragstad has quit IRC | 03:51 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add projects page https://review.openstack.org/604266 | 03:55 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: refactor change page to use a reducer https://review.openstack.org/625145 | 03:55 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: refactor projects page to use a reducer https://review.openstack.org/625146 | 03:55 |
*** dave-mccowan has quit IRC | 04:03 | |
*** udesale has joined #openstack-infra | 04:10 | |
*** psachin has quit IRC | 04:10 | |
*** slaweq has joined #openstack-infra | 04:11 | |
*** lbragstad has joined #openstack-infra | 04:13 | |
*** armax has quit IRC | 04:14 | |
*** slaweq has quit IRC | 04:16 | |
*** ykarel|away has joined #openstack-infra | 04:24 | |
*** psachin has joined #openstack-infra | 04:28 | |
*** hongbin has quit IRC | 04:34 | |
*** woojay has joined #openstack-infra | 04:41 | |
*** jamesmcarthur has joined #openstack-infra | 04:45 | |
*** jamesmcarthur has quit IRC | 04:49 | |
*** bhavikdbavishi has quit IRC | 04:50 | |
*** bhavikdbavishi has joined #openstack-infra | 04:51 | |
*** _alastor_ has quit IRC | 04:54 | |
*** yamamoto has joined #openstack-infra | 04:55 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/nodepool master: Implement an OpenShift resource provider https://review.openstack.org/570667 | 04:55 |
*** _alastor_ has joined #openstack-infra | 05:02 | |
*** wolverineav has joined #openstack-infra | 05:08 | |
*** slaweq has joined #openstack-infra | 05:11 | |
*** slaweq has quit IRC | 05:16 | |
*** lucasagomes has quit IRC | 05:17 | |
*** agopi has quit IRC | 05:17 | |
*** agopi has joined #openstack-infra | 05:25 | |
*** _alastor_ has quit IRC | 05:34 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: update status page layout based on screen size https://review.openstack.org/622010 | 05:37 |
*** dklyle has joined #openstack-infra | 05:38 | |
*** yamamoto has quit IRC | 05:40 | |
*** gecong has joined #openstack-infra | 05:44 | |
*** gecong has quit IRC | 05:50 | |
*** wolverineav has quit IRC | 05:51 | |
spsurya | whoami-rajat and I discussed this, so we think we can save our infra resources if we optimise and fix https://storyboard.openstack.org/#!/story/2004569 We're also checking how feasible this is; confirmation from the infra team would correct our understanding and approach | 05:58 |
spsurya | Thanks | 05:58 |
*** gengchc has joined #openstack-infra | 05:59 | |
whoami-rajat | fungi clarkb ^ Please provide your valuable inputs on the above query. Thanks! | 06:05 |
gengchc | hello EmilienM! There is a problem in freezer-api and freezer: the Elasticsearch server can't start. Could you please take a look at https://review.openstack.org/#/c/624867/ ? The error message is [pkg/elasticsearch.sh:_check_elasticsearch_ready:53 : die 53 'Maximum timeout reached. Could not connect to ElasticSearch'] | 06:05 |
openstackgerrit | OpenStack Proposal Bot proposed openstack-infra/project-config master: Normalize projects.yaml https://review.openstack.org/625149 | 06:06 |
*** bhavikdbavishi has quit IRC | 06:06 | |
*** ykarel|away is now known as ykarel | 06:11 | |
*** agopi has quit IRC | 06:13 | |
*** lpetrut has joined #openstack-infra | 06:13 | |
*** agopi has joined #openstack-infra | 06:18 | |
*** yamamoto has joined #openstack-infra | 06:20 | |
*** hwoarang has quit IRC | 06:20 | |
*** hwoarang has joined #openstack-infra | 06:21 | |
*** bhavikdbavishi has joined #openstack-infra | 06:24 | |
*** agopi has quit IRC | 06:25 | |
*** jtomasek has quit IRC | 06:52 | |
*** rcernin has quit IRC | 07:03 | |
*** jtomasek has joined #openstack-infra | 07:06 | |
*** quiquell|off is now known as quiquell | 07:11 | |
*** slaweq has joined #openstack-infra | 07:11 | |
*** pcaruana has joined #openstack-infra | 07:12 | |
*** slaweq has quit IRC | 07:16 | |
*** gengchc has quit IRC | 07:21 | |
*** aojea has joined #openstack-infra | 07:24 | |
*** bhavikdbavishi has quit IRC | 07:35 | |
*** pgaxatte has joined #openstack-infra | 07:36 | |
*** dpawlik has joined #openstack-infra | 07:38 | |
*** ssbarnea|rover has joined #openstack-infra | 07:40 | |
*** slaweq has joined #openstack-infra | 07:41 | |
*** slaweq has quit IRC | 07:47 | |
*** yamamoto has quit IRC | 07:48 | |
*** yamamoto has joined #openstack-infra | 07:48 | |
*** psachin has quit IRC | 07:48 | |
*** slaweq has joined #openstack-infra | 07:51 | |
*** ginopc has joined #openstack-infra | 07:56 | |
*** yamamoto has quit IRC | 07:57 | |
*** lpetrut has quit IRC | 07:58 | |
*** rpittau has joined #openstack-infra | 08:06 | |
*** apetrich has joined #openstack-infra | 08:09 | |
*** markvoelker has joined #openstack-infra | 08:16 | |
openstackgerrit | Merged openstack-infra/project-config master: Add 'Review-Priority' for Cinder repos https://review.openstack.org/620664 | 08:24 |
*** imacdonn has quit IRC | 08:24 | |
*** imacdonn has joined #openstack-infra | 08:24 | |
*** dkehn has quit IRC | 08:28 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add change status page https://review.openstack.org/599472 | 08:35 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: refactor change page to use a reducer https://review.openstack.org/625145 | 08:35 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add projects page https://review.openstack.org/604266 | 08:35 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: refactor projects page to use a reducer https://review.openstack.org/625146 | 08:35 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add labels page https://review.openstack.org/604682 | 08:35 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add nodes page https://review.openstack.org/604683 | 08:35 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: refactor build page to use a reducer https://review.openstack.org/624894 | 08:35 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: refactor build page using a container https://review.openstack.org/624895 | 08:35 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add errors from the job-output to the build page https://review.openstack.org/624896 | 08:35 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add jobs graph rendering https://review.openstack.org/537869 | 08:35 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add project page https://review.openstack.org/625177 | 08:35 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: refactor job page to use a reducer https://review.openstack.org/625178 | 08:35 |
*** yamamoto has joined #openstack-infra | 08:35 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: refactor labels page to use a reducer https://review.openstack.org/625179 | 08:35 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: refactor nodes page to use a reducer https://review.openstack.org/625180 | 08:35 |
*** priteau has joined #openstack-infra | 08:38 | |
*** bhavikdbavishi has joined #openstack-infra | 08:42 | |
*** jpena|off is now known as jpena | 08:48 | |
*** tosky has joined #openstack-infra | 08:53 | |
*** shardy has joined #openstack-infra | 09:01 | |
*** Emine has joined #openstack-infra | 09:05 | |
*** gfidente|afk is now known as gfidente | 09:14 | |
*** bhavikdbavishi has quit IRC | 09:17 | |
*** ccamacho has joined #openstack-infra | 09:20 | |
*** yamamoto has quit IRC | 09:37 | |
*** ykarel is now known as ykarel|lunch | 10:00 | |
*** bhavikdbavishi has joined #openstack-infra | 10:07 | |
*** pbourke has quit IRC | 10:29 | |
*** pbourke has joined #openstack-infra | 10:31 | |
*** agopi has joined #openstack-infra | 10:31 | |
*** ykarel|lunch is now known as ykarel | 10:32 | |
*** agopi has quit IRC | 10:36 | |
*** markvoelker has quit IRC | 10:36 | |
*** markvoelker has joined #openstack-infra | 10:37 | |
*** e0ne has joined #openstack-infra | 10:40 | |
*** bhavikdbavishi has quit IRC | 10:40 | |
*** markvoelker has quit IRC | 10:41 | |
*** bhavikdbavishi has joined #openstack-infra | 10:46 | |
*** electrofelix has joined #openstack-infra | 10:49 | |
*** rpittau is now known as rpittau|lunch | 11:10 | |
*** bhavikdbavishi has quit IRC | 11:10 | |
*** yamamoto has joined #openstack-infra | 11:15 | |
*** rfolco has joined #openstack-infra | 11:16 | |
*** markvoelker has joined #openstack-infra | 11:16 | |
*** derekh has joined #openstack-infra | 11:17 | |
*** bhavikdbavishi has joined #openstack-infra | 11:28 | |
dulek | Hey, any idea what might be using port 50036 on infra VMs? Or how do I check that? | 11:33 |
dulek | Our kuryr-daemon is unable to bind to it: http://logs.openstack.org/54/623554/6/check/kuryr-kubernetes-tempest-daemon-containerized-octavia-py36/aba6dc8/controller/logs/kubernetes/pod_logs/kube-system-kuryr-cni-ds-lb8xk-kuryr-cni.txt.gz#_2018-12-14_09_02_55_111 | 11:33 |
dulek | It's not 100% of the time, but from time to time the port is taken. | 11:34 |
*** bhavikdbavishi has quit IRC | 11:36 | |
*** gary_perkins has quit IRC | 11:37 | |
*** rossella_s has quit IRC | 11:44 | |
*** rossella_s has joined #openstack-infra | 11:44 | |
*** gary_perkins has joined #openstack-infra | 11:51 | |
*** udesale has quit IRC | 12:11 | |
*** rpittau|lunch is now known as rpittau | 12:13 | |
*** tpsilva has joined #openstack-infra | 12:14 | |
*** bhavikdbavishi has joined #openstack-infra | 12:17 | |
*** pcaruana has quit IRC | 12:21 | |
*** pcaruana has joined #openstack-infra | 12:22 | |
*** rh-jelabarre has joined #openstack-infra | 12:23 | |
*** bobh has quit IRC | 12:24 | |
*** pcaruana is now known as pcaruana|intw| | 12:25 | |
*** yamamoto has quit IRC | 12:28 | |
*** yamamoto has joined #openstack-infra | 12:30 | |
*** yamamoto has quit IRC | 12:30 | |
*** bobh has joined #openstack-infra | 12:30 | |
*** jpena is now known as jpena|lunch | 12:31 | |
openstackgerrit | Jeremy Stanley proposed openstack-infra/zone-opendev.org master: Add address records for lists.opendev.org https://review.openstack.org/625241 | 12:32 |
fungi | clarkb: jhesketh: corvus: ^ corresponding dns addition for lists.opendev.org | 12:33 |
*** _alastor_ has joined #openstack-infra | 12:35 | |
jhesketh | fungi: lgtm | 12:37 |
*** smarcet has joined #openstack-infra | 12:38 | |
*** smarcet has quit IRC | 12:39 | |
*** bobh has quit IRC | 12:41 | |
*** bobh has joined #openstack-infra | 12:44 | |
openstackgerrit | Merged openstack-infra/system-config master: Add lists.opendev.org to Mailman https://review.openstack.org/625096 | 12:53 |
*** bhavikdbavishi has quit IRC | 12:54 | |
*** boden has joined #openstack-infra | 12:55 | |
*** ykarel has quit IRC | 13:00 | |
*** ykarel has joined #openstack-infra | 13:01 | |
fungi | spsurya: whoami-rajat: it's come up many times in the past (as you can expect, lots of people propose this "simple" optimization), but it requires a lot of discussion because of the combination of whether we configure gerrit to clear or preserve verified votes on commit-message-only edits, whether some projects might want ci jobs which lint commit messages, and so on. can't hurt to discuss it again, | 13:02 |
fungi | but there's a lot more nuance to it than it might seem | 13:02 |
*** bhavikdbavishi has joined #openstack-infra | 13:04 | |
*** yamamoto has joined #openstack-infra | 13:05 | |
fungi | dulek: 50036 is well within the default ephemeral port range for the linux kernel (32768-61000) as well as iana's suggested range (49152-65535), so it could be, and most probably is, something random, and maybe a different process each time you hit it | 13:05 |
fungi | dulek: assigning a static listening port above 2^15 is a bad idea | 13:06 |
spsurya | fungi: thanks for update | 13:07 |
*** weshay_pto is now known as weshay | 13:07 | |
*** dave-mccowan has joined #openstack-infra | 13:09 | |
dulek | fungi: Okay, thanks! | 13:11 |
*** trown|outtypewww is now known as trown | 13:11 | |
fungi | dulek: in ci jobs, it's often more effective to use a method which chooses an available ephemeral port and then passes that information along to whatever routines will try connecting to it. this also allows you to have the same fixture start up multiple copies of a listening service without them conflicting over a single port and without having to manually configure individual ports for them | 13:13 |
*** quiquell is now known as quiquell|lunch | 13:13 | |
fungi | i forget the syscall, but you can basically ask the socket to bind to an unspecified ephemeral port and it will get assigned one and return the integer value on success | 13:14 |
fungi | if this is python, socket.socket() and friends probably have a parameter explicitly for this | 13:14 |
*** _alastor_ has quit IRC | 13:14 | |
*** bobh has quit IRC | 13:14 | |
dulek | fungi: That would be doable, but initially we simply used a file socket and stopped due to some issues with the requests lib. Maybe we should revisit that approach. | 13:15 |
*** dave-mccowan has quit IRC | 13:15 | |
fungi | sure, a named pipe/fifo for a unix socket is a useful alternative if you don't actually need it to be a real network connection | 13:15 |
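The unix-socket alternative fungi describes can be sketched in a few lines of stdlib Python; the rendezvous point is a filesystem path rather than a (host, port) pair, so TCP port collisions cannot happen (the socket path below is illustrative):

```python
import os
import socket
import tempfile

# An AF_UNIX stream socket: the server binds a filesystem path, the client
# connects to that same path. No network port is involved at all.
path = os.path.join(tempfile.mkdtemp(), "daemon.sock")
server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(path)
server.listen(1)

client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
client.connect(path)
conn, _ = server.accept()
client.sendall(b"ping")
assert conn.recv(4) == b"ping"   # round-trip over the unix socket works

client.close()
conn.close()
server.close()
os.unlink(path)
```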
*** yamamoto has quit IRC | 13:18 | |
*** yamamoto has joined #openstack-infra | 13:18 | |
*** EmilienM is now known as EvilienM | 13:20 | |
*** bobh has joined #openstack-infra | 13:20 | |
openstackgerrit | Jeremy Stanley proposed openstack-infra/system-config master: Add rust-vmm OpenDev ML https://review.openstack.org/625254 | 13:23 |
*** bobh has quit IRC | 13:25 | |
fungi | jhesketh: clarkb: corvus: ^ and the first mailing list anyone has requested us to host on lists.opendev.org | 13:26 |
*** ykarel is now known as ykarel|afk | 13:29 | |
*** weshay is now known as weshay1-1 | 13:34 | |
*** jpena|lunch is now known as jpena | 13:35 | |
*** bhavikdbavishi has quit IRC | 13:36 | |
*** rlandy has joined #openstack-infra | 13:37 | |
*** derekh has quit IRC | 13:47 | |
jhesketh | +2 | 13:51 |
*** kgiusti has joined #openstack-infra | 13:54 | |
*** mriedem has joined #openstack-infra | 13:56 | |
*** dkehn has joined #openstack-infra | 13:58 | |
Shrews | fungi: dulek: i think binding to port 0 picks a random, available port | 14:00 |
*** ykarel|afk is now known as ykarel | 14:00 | |
Shrews | iirc, we do that in nodepool tests a lot | 14:01 |
*** pcaruana|intw| has quit IRC | 14:05 | |
*** weshay1-1 is now known as weshay | 14:05 | |
fungi | oh, yep, that's the way | 14:06 |
fungi | for some reason i always forget port 0 is magic | 14:06 |
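The port-0 trick Shrews and fungi land on looks like this in stdlib Python: bind to port 0, let the kernel pick a free ephemeral port, then read the assignment back with `getsockname()` so it can be handed to whatever needs to connect (the loopback address is arbitrary):

```python
import socket

# Binding to port 0 asks the kernel for any free ephemeral port;
# getsockname() then reveals which port was actually assigned.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(("127.0.0.1", 0))
sock.listen(1)
port = sock.getsockname()[1]
print("listening on kernel-assigned ephemeral port", port)
sock.close()
```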
*** Emine has quit IRC | 14:16 | |
*** jamesmcarthur has joined #openstack-infra | 14:18 | |
*** derekh has joined #openstack-infra | 14:24 | |
*** bobh has joined #openstack-infra | 14:26 | |
*** udesale has joined #openstack-infra | 14:27 | |
*** dave-mccowan has joined #openstack-infra | 14:31 | |
*** pcaruana has joined #openstack-infra | 14:31 | |
*** bobh has quit IRC | 14:32 | |
*** bobh has joined #openstack-infra | 14:37 | |
*** dave-mccowan has quit IRC | 14:43 | |
dhellmann | gerrit-admin: could someone add me to the git-os-job-core and git-os-job-release groups, please, so I can complete the migration? https://review.openstack.org/#/admin/groups/1988,members and https://review.openstack.org/#/admin/groups/1989,members | 14:47 |
*** panda|off is now known as panda | 14:50 | |
*** psachin has joined #openstack-infra | 14:50 | |
dansmith | Shrews: I'm finger | grep'ing right now.. nifty trick | 14:51 |
*** Adri2000 has quit IRC | 14:51 | |
frickler | dhellmann: done | 14:51 |
openstackgerrit | Doug Hellmann proposed openstack-infra/project-config master: add release jobs for git-os-job https://review.openstack.org/625273 | 14:52 |
dhellmann | frickler : thanks! | 14:52 |
fungi | dhellmann: also if you don't feel like using the gerrit groups search form (i find it cumbersome) you can use urls like https://review.openstack.org/#/admin/groups/git-os-job-core | 14:53 |
dhellmann | oh, that's handy | 14:53 |
dhellmann | I couldn't remember the name of the group in this case, so I started from the git repo details page | 14:53 |
dhellmann | but in future... | 14:53 |
fungi | ahh, yep | 14:54 |
*** jamesmcarthur has quit IRC | 14:54 | |
fungi | i think i stumbled on that entirely by accident, so not sure where/whether it's actually documented | 14:54 |
*** Adri2000 has joined #openstack-infra | 14:55 | |
openstackgerrit | Hervé Beraud proposed openstack-dev/pbr master: Allow git-tags to be SemVer compliant https://review.openstack.org/618569 | 14:57 |
ssbarnea|rover | is there a way for zuul to finish a job with a WARNING result? I think I've seen some nice orange WARNING results in gerrit somewhere, but I'm not sure where. | 14:58 |
Shrews | dansmith: ++ | 14:58 |
openstackgerrit | Hervé Beraud proposed openstack-dev/pbr master: Allow git-tags to be SemVer compliant https://review.openstack.org/618569 | 14:59 |
*** jamesmcarthur has joined #openstack-infra | 15:00 | |
*** armstrong has joined #openstack-infra | 15:01 | |
*** zul has joined #openstack-infra | 15:04 | |
fungi | ssbarnea|rover: these are the job statuses documented as provided by zuul: https://zuul-ci.org/docs/zuul/user/jobs.html#build-status | 15:05 |
*** markvoelker has quit IRC | 15:06 | |
*** smarcet has joined #openstack-infra | 15:09 | |
*** quiquell|lunch is now known as quiquell | 15:12 | |
*** psachin has quit IRC | 15:21 | |
*** dpawlik has quit IRC | 15:24 | |
openstackgerrit | sebastian marcet proposed openstack-infra/puppet-openstackid master: Updated script to support PHP7 https://review.openstack.org/624957 | 15:28 |
*** psachin has joined #openstack-infra | 15:29 | |
*** jamesmcarthur has quit IRC | 15:31 | |
boden | hi, has anyone else reported a ContextualVersionConflict error cropping up in the last day or so that appears to be related to eventlet?? ex: http://logs.openstack.org/05/624205/2/check/vmware-tox-lower-constraints/8c25e0b/tox/lower-constraints-2.log | 15:31 |
*** jamesmcarthur has joined #openstack-infra | 15:31 | |
boden | I can't seem to figure out what's changed | 15:31 |
fungi | you've compared that install log with a previous passing run? | 15:33 |
fungi | looks like it's getting eventlet 0.24.1 (maybe via oslo.service?) when the constraint requests <0.21.0 | 15:34 |
*** jamesmcarthur has quit IRC | 15:36 | |
boden | fungi yeah I'm just trying to understand why/how... it just started breaking in the last 24hrs or so and I don't see any changes in requirements that would've affected it | 15:38 |
fungi | Collecting eventlet==0.24.1 (from -c /home/zuul/src/git.openstack.org/openstack/vmware-nsx/lower-constraints.txt (line 22)) | 15:39 |
fungi | http://logs.openstack.org/05/624205/2/check/vmware-tox-lower-constraints/8c25e0b/tox/lower-constraints-1.log | 15:39 |
openstackgerrit | Merged openstack-infra/storyboard master: Change openstack-dev to openstack-discuss https://review.openstack.org/622377 | 15:40 |
fungi | boden: https://git.openstack.org/cgit/openstack/vmware-nsx/tree/lower-constraints.txt#n22 | 15:40 |
boden | fungi yes, but lower constraints haven't changed there recently... so why starting to fail now | 15:40 |
fungi | i'm looking for the <0.21.0 | 15:41 |
*** jamesmcarthur has joined #openstack-infra | 15:42 | |
fungi | resorting to http://codesearch.openstack.org/?q=eventlet.*<0.21.0 since my hunches based on the error message didn't pan out | 15:43 |
fungi | none of those seem relevant either | 15:44 |
fungi | oh, i should check the tagged versions | 15:47 |
openstackgerrit | Sean McGinnis proposed openstack-infra/irc-meetings master: Switch release team meeting to Thursday 1600 https://review.openstack.org/625290 | 15:47 |
openstackgerrit | Doug Hellmann proposed openstack-infra/project-config master: import openstack-summit-counter repository https://review.openstack.org/625292 | 15:48 |
*** dpawlik has joined #openstack-infra | 15:48 | |
*** pgaxatte has quit IRC | 15:48 | |
fungi | boden: Collecting oslo.service==1.24.0 (from -c /home/zuul/src/git.openstack.org/openstack/vmware-nsx/lower-constraints.txt (line 80)) | 15:50 |
fungi | boden: so it's https://git.openstack.org/cgit/openstack/oslo.service/tree/requirements.txt?h=1.24.1#n6 | 15:50 |
fungi | er, https://git.openstack.org/cgit/openstack/oslo.service/tree/requirements.txt?h=1.24.0#n6 rather | 15:51 |
fungi | so your lower constraint for eventlet is set higher than what your lower constraint for oslo.service supports as its maximum eventlet version | 15:52 |
*** gfidente has quit IRC | 15:52 | |
fungi | that's the reason for the error | 15:52 |
fungi | now as to why it only just started happening, this will require more digging | 15:52 |
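The diagnosis above can be condensed into a small stdlib sketch. The `conflicts` helper is hypothetical, and tuple comparison stands in for real PEP 440 specifiers; the pins are the ones fungi quotes from vmware-nsx's lower-constraints.txt and oslo.service 1.24.0's requirements.txt:

```python
# Hypothetical conflict check: given a project's pinned versions and the
# upper bounds its pinned dependencies declare, flag any pin that violates
# a "<bound" requirement. Real resolvers do this with PEP 440 specifiers.
def parse(version):
    return tuple(int(part) for part in version.split("."))

def conflicts(pinned, upper_bounds):
    """Return (package, pin, origin, bound) tuples where pin >= '<bound'."""
    found = []
    for pkg, pin in pinned.items():
        for origin, bound in upper_bounds.get(pkg, []):
            if parse(pin) >= parse(bound):
                found.append((pkg, pin, origin, bound))
    return found

pins = {"eventlet": "0.24.1"}   # from vmware-nsx lower-constraints.txt
bounds = {"eventlet": [("oslo.service==1.24.0", "0.21.0")]}  # its requirements

for pkg, pin, origin, bound in conflicts(pins, bounds):
    print("%s==%s conflicts with %s (which needs <%s)" % (pkg, pin, origin, bound))
```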
*** adriancz has quit IRC | 15:52 | |
*** dpawlik has quit IRC | 15:52 | |
boden | fungi yeah I don't understand why it just cropped up... I'll have to dig more as to how we can resolve it | 15:53 |
*** armax has joined #openstack-infra | 15:56 | |
openstackgerrit | sebastian marcet proposed openstack-infra/puppet-openstackid master: Updated script to support PHP7 https://review.openstack.org/624957 | 15:57 |
*** bhavikdbavishi has joined #openstack-infra | 15:59 | |
dansmith | clarkb: so I've been trying to poke the lvm timeout stuff with a stick | 16:00 |
dansmith | clarkb: we tried serializing all lvm ops, which didn't seem to help | 16:00 |
dansmith | clarkb: I'm also wondering if having a couple few 24G loop devices is causing us to do some really long buffer flushes | 16:01 |
dansmith | clarkb: I dunno how much you know about how that works, but heavy writes to a loop device can OOM the system and the overhead is generally quite high, so I'm wondering if lvm ops occasionally cause a bunch of data to be flushed out and takes a really long time | 16:01 |
dansmith | clarkb: >= bionic has direct-io support for loop, which should help if that's the case.. making loop devices behave more like real block devices. so I have a patch up for devstack to enable that when available | 16:02 |
openstackgerrit | Merged openstack-dev/hacking master: Change openstack-dev to openstack-discuss https://review.openstack.org/622317 | 16:02 |
fungi | boden: https://review.openstack.org/605834 is when the eventlet lower constraint got bumped in vmware-nsx. that merged on october 4 | 16:04 |
fungi | the oslo.service lower constraint has remained unchanged since the job was added | 16:04 |
fungi | boden: this really hasn't been failing all the way back to october 4? | 16:04 |
boden | fungi: https://review.openstack.org/#/c/623609/ | 16:06 |
boden | see lower-constraints job | 16:06 |
fungi | boden: agreed, http://zuul.openstack.org/builds?project=openstack%2Fvmware-nsx&job_name=vmware-tox-lower-constraints&branch=master shows it succeeded as recently as 15 hours ago | 16:06 |
*** dklyle has quit IRC | 16:07 | |
*** dklyle has joined #openstack-infra | 16:07 | |
*** efried has joined #openstack-infra | 16:08 | |
smarcet | fungi: how are you doing? thanks for your advice on reviews. I'm having one issue with apt::update on xenial: it complains about an entry in /etc/apt/sources.list : deb cdrom | 16:13 |
*** dklyle has quit IRC | 16:13 | |
smarcet | fungi: if i remove that line by hand , the puppet runs ok | 16:13 |
smarcet | fungi: i am testing the puppet on xenial 16.04 LTS server | 16:14 |
smarcet | fungi: error seems to be _("/etc/apt/sources.list contains a cdrom source; not installing. Use 'allowcdrom' to override this failure.") | 16:15 |
*** fuentess has joined #openstack-infra | 16:16 | |
clarkb | dansmith: that is good to know, I can review the devstack change if you like. I think we are largely switched over to bionic for devstasck/tempest testing at this point so we should see a change if the direct io support helps | 16:16 |
ttx | Hi ! With some IRC meetings having moved to team channels, we have a lot more room in the "common" meeting rooms. To the point where openstack-meeting-5 is not used that much and we could easily consolidate to the other 4. Would that be desirable or overkill? | 16:16 |
dansmith | clarkb: https://review.openstack.org/#/c/625269/2 | 16:16 |
dansmith | clarkb: swift uses a loop as well, but via mount -o loop, which doesn't get directio turned on.. | 16:17 |
dansmith | clarkb: I figure if this seems to help I can refactor out some loop utilities and do the loop manually for the swift piece if we decide it's worth it | 16:17 |
ttx | with only 32 lurkers, meeting-5 fails to reach the "lurkers benefit too!" benefit | 16:17 |
ttx | we could also get rid of #openstack-meeting-cp. 31 lurkers, no meeting. | 16:18 |
*** bobh has quit IRC | 16:18 | |
* cmurphy had no idea we had a -5 | 16:19 | |
ttx | cmurphy: well only neutron-upgrades and helm uses it right now. And they could move to another room free at the same time | 16:19 |
*** pcaruana has quit IRC | 16:20 | |
openstackgerrit | Merged openstack-infra/git-review master: test_uploads_with_nondefault_rebase: fix git screen scraping https://review.openstack.org/623096 | 16:21 |
*** ginopc has quit IRC | 16:22 | |
*** quiquell is now known as quiquell|off | 16:23 | |
*** e0ne has quit IRC | 16:24 | |
*** dklyle has joined #openstack-infra | 16:24 | |
clarkb | whoami-rajat: spsurya: One gotcha with that is that projects have chosen in the past to enforce testing against their commit messages beyond simple rules like metadata for depends on | 16:25 |
openstackgerrit | Merged openstack-infra/zone-opendev.org master: Add address records for lists.opendev.org https://review.openstack.org/625241 | 16:25 |
clarkb | whoami-rajat: spsurya implementing a feature like that should likely go in zuul itself and be a per project flag if we want to try it. | 16:26 |
*** jamesmcarthur has quit IRC | 16:26 | |
*** bobh has joined #openstack-infra | 16:26 | |
clarkb | that said, as I have tried to point out elsewhere, the real cost for openstack infra is tied up in a small number of repos and really one extra large project. I am going to continue to push for fixing flaky tests and reducing the impact of those expensive projects over these smaller optimizations | 16:27 |
clarkb | What we get out of reliable testing is not just more efficient use of resources but better software too | 16:27 |
ttx | fungi: opinion on that? (IRC channels ^) | 16:27 |
clarkb | Shrews: yup port 0 will bind to an available high port and the python socket lib lets you ask the socket object for the port number it found | 16:28 |
clarkb | super useful in testing | 16:28 |
fungi | smarcet: you're seeing that error raised by our ci jobs, or on your local system? if the latter, i expect puppet just doesn't think you'll be running on a system installed from cd (or which gets its updates by cd anyway) and instead assumes you'll have removed the cdrom lines from your config already | 16:28 |
smarcet | the later | 16:29 |
smarcet | ok i will remove by hand then and re test | 16:29 |
smarcet | thx u1 | 16:29 |
smarcet | ! | 16:29 |
jbryce | Thanks for setting up the lists.opendev.org pieces. I think this is a simple but neat step toward getting more communities involved | 16:29 |
*** electrofelix has quit IRC | 16:29 | |
clarkb | jbryce: it's on its way. I've +2'd https://review.openstack.org/#/c/625254/1 but not approved it in case fungi would like more non-OSF input first. fungi, you've tended to be cautious on that front in the past; let me know if I should just go ahead and approve or if you want to | 16:30 |
fungi | ttx: i do not object to smashing meeting-5 and meeting-cp into the others if someone wants to reach out to those teams to ask them to consolidate. they likely need time to warn their regular attendees about the channel changes | 16:31 |
ttx | yes of course. Was just wanting to gut-check that was desirable before starting anything | 16:31 |
fungi | clarkb: well, we have jhesketh's blessing at least. but sure, if we can get an additional infra-root reviewer to weigh in i'm all for that as it is our first proposed mailing list on that new domain | 16:32 |
clarkb | fungi: any chance you have a moment to quickly review https://review.openstack.org/#/c/615968/3 and its parents. I think I can likely get through that portion of the stack today (so I can approve them in chunks today and babysit) | 16:34 |
fungi | trying to catch up, but sure i'll get it on my roster | 16:34 |
clarkb | thanks! | 16:35 |
clarkb | dansmith: fwiw my version of losetup says that direct-io=on is the default setting | 16:35 |
clarkb | dansmith: possible we are already enabling it on bionic. /me digs up a bionic manpage | 16:36 |
dansmith | clarkb: mine too, but I floated a test patch to confirm that's a lie | 16:36 |
clarkb | oh neat | 16:36 |
dansmith | clarkb: clarkb https://review.openstack.org/#/c/625268/ | 16:36 |
clarkb | ya bionic manpage says the same, so if it isn't actually set to on as claimed that's a fun bug | 16:36 |
dansmith | see the pastebin in there | 16:36 |
clarkb | certainly seems set to 0 | 16:37 |
dansmith | when I pass =on it goes to 1 | 16:37 |
dansmith | so yeah | 16:37 |
*** gyee has joined #openstack-infra | 16:37 | |
*** jamesmcarthur has joined #openstack-infra | 16:38 | |
*** jamesmcarthur has quit IRC | 16:38 | |
*** jamesmcarthur has joined #openstack-infra | 16:39 | |
*** dklyle has quit IRC | 16:41 | |
*** sthussey has joined #openstack-infra | 16:41 | |
clarkb | frickler: if you are still around https://review.openstack.org/#/c/625269/ is dansmiths change above that may help cinder test reliability | 16:44 |
*** wolverineav has joined #openstack-infra | 16:44 | |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Vendor the RDO repository configuration for installing OVS https://review.openstack.org/624817 | 16:44 |
clarkb | mwhahaha: ^ fyi that should improve reliability of the multinode setup | 16:45 |
mwhahaha | cool, in general it's been really stable lately | 16:45 |
clarkb | (it is worth noting that that was always failing in pre-run so should've been retried, but we should avoid retries as much as possible where we can too) | 16:45 |
clarkb | mwhahaha: ya I think http://status.openstack.org/elastic-recheck/gate.html#1708704 shows a worse picture than what we are seeing on gerrit because those failures will be retried | 16:46 |
clarkb | but cleaning that up and getting it out of the way on e-r will improve resource usage slightly and also fix a bug | 16:46 |
mwhahaha | we've had a few of those in our container update process where we were getting 503s from the mirrors | 16:46 |
clarkb | (and reshuffle e-r with the more important graphs at the top) | 16:46 |
mwhahaha | so it might have been accurate actually | 16:46 |
fungi | yeah, every retry_limit result you see probably means 6x as many jobs got aborted (since some may work on the second or third retry) | 16:47 |
clarkb | hrm oslo.policy crashing stestr subunit streams is a really weird interaction | 16:50 |
clarkb | ah infinite recursion that will do it | 16:50 |
fungi | clarkb: i gather it's likely due to creating massive amounts of stdout/stderr? | 16:51 |
fungi | and yeah, i suppose unbounded recursion could explain that case | 16:51 |
clarkb | fungi: possibly due to infinite recursion. https://review.openstack.org/#/c/625114/4/glance/quota/__init__.py seems to be the fix | 16:51 |
*** jamesmcarthur has quit IRC | 16:51 | |
openstackgerrit | sebastian marcet proposed openstack-infra/puppet-openstackid master: Updated script to support PHP7 https://review.openstack.org/624957 | 16:52 |
*** jamesmcarthur has joined #openstack-infra | 16:53 | |
*** ykarel is now known as ykarel|away | 16:55 | |
*** tosky has quit IRC | 16:56 | |
*** jamesmcarthur has quit IRC | 16:57 | |
*** shardy is now known as shardy_mtg | 16:58 | |
*** jamesmcarthur has joined #openstack-infra | 17:03 | |
*** udesale has quit IRC | 17:07 | |
*** wolverineav has quit IRC | 17:07 | |
openstackgerrit | sebastian marcet proposed openstack-infra/puppet-openstackid master: Updated script to support PHP7 https://review.openstack.org/624957 | 17:08 |
clarkb | ssbarnea|rover: was going to ask if you had any more insight into the possible du issue. I'm mostly curious to know what the cause of that was when you sort it out as it seems like it could be useful knowledge for the future :) | 17:09 |
*** sshnaidm|off has quit IRC | 17:10 | |
*** mriedem is now known as mriedem_lunch | 17:10 | |
ssbarnea|rover | clarkb: sure, I think i almost nailed it. I will add you to the review so you will be able to see it, ok? | 17:10 |
ssbarnea|rover | clarkb: mainly I am still on it! | 17:10 |
clarkb | ssbarnea|rover: thanks | 17:10 |
openstackgerrit | sebastian marcet proposed openstack-infra/puppet-openstackid master: Updated script to support PHP7 https://review.openstack.org/624957 | 17:10 |
clarkb | ssbarnea|rover: it's the sort of bug where knowing what causes zuul to exhibit that behavior is useful as a zuul operator :) | 17:11 |
*** dklyle has joined #openstack-infra | 17:11 | |
*** aojea has quit IRC | 17:11 | |
ssbarnea|rover | clarkb: sadly I am not sure but i suspect i found a workaround, see https://review.openstack.org/#/c/624381/13 | 17:11 |
ssbarnea|rover | using threshold param on du makes it 10x faster, probably because sort ends up doing much less work to sort. | 17:12 |
clarkb | interesting so du does still seem suspect | 17:12 |
clarkb | the threshold flag is probably a reasonable compromise there | 17:12 |
ssbarnea|rover | using timeout solves nothing; even with SIGKILL it does not help. | 17:13 |
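The speedup ssbarnea|rover found can be reproduced with a pipeline like the one below (a sketch; the path is a placeholder and `--threshold` needs GNU coreutils du):

```python
import subprocess

# --threshold=100K drops entries under 100 KiB before sort ever sees them,
# which is where most of the reported 10x speedup comes from: far fewer
# lines for sort (and tail) to process.
cmd = "du --threshold=100K /tmp 2>/dev/null | sort -n | tail -n 200"
result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
for line in result.stdout.splitlines():
    size_kib, path = line.split("\t", 1)  # du separates size and path with a tab
    print(size_kib, path)
```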
dmellado | hey clarkb are there any issues with zuul as of now? I have seen patches passing on the gate queue being stuck for a while and not getting merged.... | 17:13 |
*** e0ne has joined #openstack-infra | 17:13 | |
clarkb | dmellado: I'm not aware of any functional issues with zuul itself | 17:13 |
ssbarnea|rover | there is also another aspect, which could point to a possible bug related to std* redirections or buffering. | 17:14 |
clarkb | dmellado: kuryr-kubernetes is waiting for the top of its queue to pass tests so it can merge. kuryr-kubernetes-tempest-daemon-octavia is still running | 17:14 |
*** ginopc has joined #openstack-infra | 17:14 | |
ssbarnea|rover | if you remember we always had some warnings about closed pipes around du|sort|tail, something I was not able to reproduce outside zuul. | 17:14 |
openstackgerrit | sebastian marcet proposed openstack-infra/puppet-openstackid master: Updated script to support PHP7 https://review.openstack.org/624957 | 17:14 |
dmellado | clarkb: I've seen for example https://review.openstack.org/#/c/623554/ and if you check for openstack/kuryr-kubernetes on zuul.openstack.org some patches stuck on the gate queue for a while | 17:14 |
ssbarnea|rover | maybe these warnings were related to the blocking bug. | 17:15 |
clarkb | dmellado: yes the gate is a queue, the top/head of the queue must pass testing and merge before anything behind it can merge | 17:15 |
clarkb | dmellado: this is what ensures correctness of the resulting code (we remove the race of one change breaking another by merging out of sync) | 17:15 |
dmellado | d'oh | 17:15 |
dmellado | forget it | 17:15 |
dmellado | I had filtering enabled and didn't realize it | 17:15 |
dmellado | lol | 17:15 |
dmellado | I guess it's friday after all | 17:15 |
*** ginopc has quit IRC | 17:16 | |
clarkb | dmellado: no worries. I had a pretty d'oh moment yesterday thinking we had broken requirements | 17:17 |
dmellado | heh, glad that it didn't happen xD | 17:17 |
clarkb | (turns out it was a broken job on unmerged code, the system was working as intended protecting us from the broken :) ) | 17:17 |
dmellado | xD | 17:18 |
*** efried has quit IRC | 17:18 | |
*** rpittau has quit IRC | 17:22 | |
*** sshnaidm|off has joined #openstack-infra | 17:25 | |
*** tobiash has quit IRC | 17:29 | |
clarkb | ssbarnea|rover: thinking about the warnings more, perhaps also related to how zuul does logging? | 17:29 |
clarkb | ssbarnea|rover: could be there is a buffering bug lingering somewhere or similar | 17:30 |
clarkb | (would need more data to debug that likely) | 17:30 |
*** markvoelker has joined #openstack-infra | 17:31 | |
*** agopi has joined #openstack-infra | 17:32 | |
*** markvoelker has quit IRC | 17:35 | |
*** Emine has joined #openstack-infra | 17:36 | |
*** bnemec is now known as beekneemech | 17:37 | |
*** psachin has quit IRC | 17:39 | |
clarkb | mriedem_lunch: http://logs.openstack.org/74/624974/1/gate/tempest-slow/4ac5ef0/job-output.txt.gz#_2018-12-14_15_29_30_910274 timed out and lost a bunch of time in tempest. Any idea of what is going on there? I can dig in more if that is unfamiliar to you | 17:39 |
clarkb | it seems that tempest got incredibly unhappy and it just snowballed from there | 17:39 |
*** tobiash has joined #openstack-infra | 17:40 | |
*** e0ne has quit IRC | 17:41 | |
dansmith | clarkb: almost looks like something fundamental is stuck.. like keystone or apache itself | 17:42 |
*** e0ne has joined #openstack-infra | 17:43 | |
*** rkukura has quit IRC | 17:44 | |
clarkb | dansmith: http://logs.openstack.org/74/624974/1/gate/tempest-slow/4ac5ef0/controller/logs/apache/access_log.txt.gz shows that apache seems to process requests during the lost time. Many of them to placement | 17:44 |
dansmith | yep | 17:44 |
clarkb | There is the occasional identity request (reupping token?) | 17:44 |
dansmith | you know what I mean though, right? if everything after that is just timing out http calls.. | 17:45 |
clarkb | ya | 17:45 |
*** tobiash has quit IRC | 17:45 | |
*** derekh has quit IRC | 17:47 | |
dansmith | clarkb: http://logs.openstack.org/74/624974/1/gate/tempest-slow/4ac5ef0/controller/logs/screen-n-api.txt.gz#_Dec_14_15_02_19_898727 | 17:49 |
dansmith | rabbit goes down at some point it looks like | 17:49 |
*** dklyle has quit IRC | 17:51 | |
dansmith | although I don't really see any evidence in rabbit's log, | 17:51 |
dansmith | so maybe something networking-wise | 17:51 |
*** wolverineav has joined #openstack-infra | 17:52 | |
*** wolverineav has quit IRC | 17:52 | |
*** wolverineav has joined #openstack-infra | 17:52 | |
clarkb | there are a bunch of missed heartbeats but ya other than that rabbit doesn't seem to think something is wrong | 17:52 |
dansmith | we're connecting to the public ip, but that shouldn't really require that the network be up | 17:53 |
dansmith | so it'd have to be something like iptables blocking something, or just extreme scheduling lag or something like that | 17:53 |
dansmith | http://logs.openstack.org/74/624974/1/gate/tempest-slow/4ac5ef0/compute1/logs/screen-n-cpu.txt.gz?level=ERROR | 17:54 |
dansmith | the compute is way unhappy about rabbit | 17:54 |
dansmith | it also got 502 Proxy Error from cinder | 17:55 |
*** tobiash has joined #openstack-infra | 17:55 | |
dansmith | and neutron | 17:55 |
clarkb | dansmith: I've pulled up a render of the dstat csv and I think that explains why this happens | 17:56 |
clarkb | cpu wai > 50% for much of the job | 17:56 |
clarkb | and load skyrockets. Basically not enough cpu to go around | 17:57 |
dansmith | do you have some automated way to do that btw? | 17:57 |
dansmith | clarkb: cpu wai is not iowait right? | 17:57 |
clarkb | cpu wai is when the kernel is busy waiting on io iirc | 17:57 |
clarkb | so it can be iowait related | 17:57 |
clarkb | dansmith: I dump the csv file from the job into https://lamada.eu/dstat-graph/ | 17:57 |
dansmith | okay, iowait does not mean that there's not enough cpu to go around | 17:58 |
dansmith | ooh | 17:58 |
*** armax has quit IRC | 17:58 | |
dansmith | have to download it to drag/drop it I guess? | 17:58 |
clarkb | ya. There is probably a way to hack things in the js behind the scenes to load from http but I'm a browser noob | 17:59 |
*** graphene has joined #openstack-infra | 17:59 | |
*** dklyle has joined #openstack-infra | 18:00 | |
clarkb | that's a good point though. cpus are busy waiting on other things, not things the cpu can do itself | 18:00 |
dansmith | that cpu wai looks like iowait to me, | 18:01 |
dansmith | which would mean we're on a really io constrained node | 18:01 |
clarkb | we don't appear to be swapping either (though that graph doesn't actually render swap usage so I'll need to look more carefully at the raw data) | 18:01 |
dansmith | and io total is low until the end | 18:01 |
openstackgerrit | sebastian marcet proposed openstack-infra/puppet-openstackid master: Updated script to support PHP7 https://review.openstack.org/624957 | 18:01 |
dansmith | but, just because we're not doing any doesn't mean we're not trying | 18:01 |
dansmith | fwiw, this was next on my list of debugging cinder timeouts, rendering out this data to see if we see spikes around the time we hang for a while | 18:02 |
clarkb | ya, it is also curious that it seems to start around when we start tempest | 18:02 |
*** ykarel|away has quit IRC | 18:02 | |
clarkb | er no devstack? /me double checks timestamps | 18:02 |
*** trown is now known as trown|lunch | 18:03 | |
clarkb | http://logs.openstack.org/74/624974/1/gate/tempest-slow/4ac5ef0/job-output.txt.gz#_2018-12-14_14_50_51_551412 tempest correlates strongly | 18:03 |
dansmith | ah yeah, see, | 18:03 |
clarkb | so devstack + services isn't unhappy until it adds workload | 18:04 |
dansmith | the cpu idle value is like 50% most of the time | 18:04 |
dansmith | that means it has nothing to do, but is doing a lot of waiting | 18:04 |
dansmith | if you mouse over the graph you can see the actual cpu idle value, which is hard to see otherwise since it's white on my screen at least | 18:05 |
*** tobiash has quit IRC | 18:05 | |
*** jpena is now known as jpena|off | 18:06 | |
dansmith | disk traffic in MB/s is zero the whole time until the end and then writes spike, | 18:06 |
dansmith | but there are write iops the whole time | 18:06 |
smcginnis | So directio should help prevent that big flush from happening at the end. | 18:07 |
*** tobiash has joined #openstack-infra | 18:07 | |
dansmith | smcginnis: it's possible that that's what it is yeah, although I wouldn't expect the loop to hamstring the whole system this badly | 18:07 |
smcginnis | Yeah, seems like something else is compounding the situation. | 18:08 |
*** shardy_mtg has quit IRC | 18:08 | |
dansmith | there are literally zero read iops over the range I'm looking at, but some write iops all the time | 18:08 |
dansmith | which really sounds like thrashing with very little io bandwidth | 18:08 |
clarkb | what is odd is devstack does a bunch of io too | 18:12 |
dansmith | there is a big spike in writes and write iops early in the run, which is exactly when cinder runs lvchange -ay for the first volume | 18:12 |
clarkb | so we have io available during the start and end of the job. It isn't until we try to use the cloud that we find it unhappy | 18:12 |
dansmith | and load starts climbing right there and never recovers | 18:12 |
dansmith | let me get a screenshot, this is interesting | 18:12 |
clarkb | so could be a combo workload plus a bug or different io demands | 18:12 |
dansmith | https://imgur.com/a/0bvhVEB | 18:13 |
clarkb | (also we did just switch to bionic, possibly this is new bionic behavior and maybe direct-io would help) | 18:13 |
dansmith | right as that disk spike, load goes nuts | 18:13 |
dansmith | that spike is lvchange -ay $first_volume | 18:13 |
ssbarnea|rover | does anyone know a way to dump the SSL certificates when using an https proxy? i need the certs returned by the proxy for debugging purposes. | 18:14 |
clarkb | ssbarnea|rover: openssl s_client | 18:14 |
ssbarnea|rover | clarkb: i know how to use it to get certificate from a normal web server but not to make the request to a proxy | 18:14 |
dansmith | there's a net spike at the same time.. we don't have /opt mounted on nfs or something crazy do we? | 18:15 |
ssbarnea|rover | the HTTPS proxy would generate and sign a SSL cert using its own CA-cert. | 18:15 |
clarkb | ssbarnea|rover: ok so the proxy is a mitm | 18:15 |
fungi | ssbarnea|rover: right, make a request to the proxy and you'll get it | 18:15 |
clarkb | ssbarnea|rover: in that case I think s_client should still work | 18:15 |
clarkb | since that is the cert you see not the one on the backend | 18:16 |
ssbarnea|rover | clarkb: yep... me trying to debug why curl works and python requests (and pip) choke with the same cert bundle. | 18:16 |
clarkb | ssbarnea|rover: probably because python requests uses its own package of CAs to trust | 18:16 |
clarkb | so it isn't using your system set | 18:17 |
fungi | ssbarnea|rover: they don't use the same trust set. python (or requests if python is too old) bundles its own by default | 18:17 |
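The lookup order fungi describes can be sketched roughly like this (a simplified sketch of how requests resolves its trust store when `trust_env` is on; the real logic lives in requests' environment-merging code, and the path below is illustrative):

```python
import os

# requests consults REQUESTS_CA_BUNDLE (then CURL_CA_BUNDLE) before falling
# back to the certifi bundle it ships with, which is why exporting one of
# those variables lets a MITM proxy's CA be trusted without touching certifi.
def effective_ca_bundle(environ=os.environ):
    return (
        environ.get("REQUESTS_CA_BUNDLE")
        or environ.get("CURL_CA_BUNDLE")
        or "certifi bundled defaults"
    )

print(effective_ca_bundle({"REQUESTS_CA_BUNDLE": "/home/user/cacert.pem"}))
```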
ssbarnea|rover | clarkb: yep, i need the proxy cert, but the proxy cert returned when the request was made for that specific URL (cert will vary on each website) | 18:17 |
clarkb | ssbarnea|rover: not if you are mitm'd | 18:18 |
ssbarnea|rover | clarkb: I do have SSL_CERT_FILE=/Users/ssbarnea/cacert.pem and REQUESTS_CA_BUNDLE=/Users/ssbarnea/cacert.pem -- which worked well so far. | 18:18 |
fungi | clarkb: they'll still differ for each site if it's a "transparent" proxy which uses its own ca to generate new certs for those sites on the fly | 18:18 |
ssbarnea|rover | cacert.pem contains the root-CA from the proxy server. proof that it is ok is that both browsers and curl accept use of the proxy. | 18:19 |
clarkb | dansmith: maybe this is a case of getting your direct-io change in. Then looking to see if behavior changes or persists | 18:19 |
dansmith | it will definitely be interesting to see if and what changes yeah | 18:19 |
*** armax has joined #openstack-infra | 18:19 | |
dansmith | clarkb: side note.. we _have_ to automate this dstat graph thing right? :) | 18:19 |
clarkb | dansmith: there was support for it in the stackviz tool but that broke somewhere along the way | 18:20 |
fungi | ssbarnea|rover: pass verify='/path/to/public_key.pem' as a parameter into your requests methods | 18:20 |
clarkb | dansmith: but ya it would be nice t get this back into easy to consume format | 18:20 |
dansmith | yeah | 18:20 |
ssbarnea|rover | yep, my proxy is configured in transparent mode but traffic is not enforced, still need to tell clients to use the proxy. | 18:20 |
ssbarnea|rover | fungi: I did set verify, still fails. | 18:20 |
fungi | ssbarnea|rover: you can also export REQUESTS_CA_BUNDLE | 18:20 |
*** armax has quit IRC | 18:21 | |
ssbarnea|rover | see https://gist.github.com/ssbarnea/3d5067d41abc68c3788f1c9bc0ab4418#file-ssl-request-transparent-proxy-txt-L29 | 18:21 |
ssbarnea|rover | yes they are exported. | 18:21 |
*** armax has joined #openstack-infra | 18:22 | |
dansmith | clarkb: I picked another random tempest-full run from a few days ago and it doesn't have the same signature | 18:23 |
*** wolverineav has quit IRC | 18:23 | |
ssbarnea|rover | i suspect one of two issues: either requests fails to load the entire ca bundle (>250kb in size), or it fails to validate the entire chain because the MITM re-signing generates some intermediary certs if I remember well. probably requests fails to inherit the trust from the CA. | 18:23 |
dansmith | like, we do IO the whole time successfully and much more normally | 18:24 |
openstackgerrit | sebastian marcet proposed openstack-infra/puppet-openstackid master: Updated script to support PHP7 https://review.openstack.org/624957 | 18:24 |
ssbarnea|rover | obviously it makes no sense to export the certs from each visited site; trusting the CA should be enough. | 18:25 |
fungi | ssbarnea|rover: i can't fathom why there would be any intermediary chain required if you're putting the ca's signing cert into the bundle already | 18:25 |
clarkb | if it does end up being an issue we can tie back to ubuntu bionic we may want to see if coreycb is able to help | 18:25 |
*** dklyle has quit IRC | 18:25 | |
fungi | ssbarnea|rover: contents of the ca bundle are basically the end of the line. if those signed something, they're trusted | 18:26 |
ssbarnea|rover | fungi: exactly. i used this proxy to install fedora/centos without any problems once I trusted the CA. Only pip chokes on it, on mac... in fact i have an idea... | 18:26 |
fungi | er, of one of those signed something, the cert with that signature is trusted is what i meant to say | 18:26 |
*** Swami has joined #openstack-infra | 18:26 | |
*** wolverineav has joined #openstack-infra | 18:26 | |
clarkb | fwiw pip has historically had other issues with proxies too | 18:27 |
fungi | ssbarnea|rover: oh! this is pip on a mac? not pypi-installed pip in a ci job? | 18:27 |
ssbarnea|rover | only one issue? python requests is notorious for doing things its own way around SSL. | 18:27 |
fungi | i thought you were trying to work out how to dump and log ssl certs in a ci job. tunnel vision ;) | 18:27 |
Linkid | hi | 18:28 |
*** wolverineav has quit IRC | 18:28 | |
*** wolverineav has joined #openstack-infra | 18:28 | |
Linkid | I have a question about the spec I'm writing | 18:28 |
ssbarnea|rover | fungi: well, my long shot is to see if I can use a HTTPS proxy as a generic proxy, as a simple way to avoid configuring custom mirrors. | 18:28 |
Linkid | I saw that you are using puppet modules to install services | 18:28 |
Linkid | but I saw that there is a spec for using ansible + containers | 18:29 |
clarkb | Linkid: yes, we are now running two services with ansible alone (no puppet), and working out some bugs shown in testing to do containers (so no container based services yet) | 18:30 |
Linkid | so, I'm wondering what is the way I should speak of in the spec for a new service | 18:30 |
clarkb | Linkid: depending on your patience for tools: in general I would likely assume ansible if you are impatient, but you can assume containers if you are more patient | 18:31 |
fungi | ssbarnea|rover: you're sure your ca bundle is in pem format? | 18:31 |
clarkb | (unfortunately we are in the weird spot of figuring out what our migration looks like. I think you should be fine to pick one and go with it, and if we learn stuff that changes your spec we can help you update it) | 18:32 |
*** armax has quit IRC | 18:33 | |
ssbarnea|rover | fungi: if it were not, curl would choke because I defined SSL_CERT_FILE to point to the same file. | 18:33 |
openstackgerrit | Clark Boylan proposed openstack-infra/system-config master: Set iptables forward drop by default https://review.openstack.org/624501 | 18:33 |
openstackgerrit | Clark Boylan proposed openstack-infra/system-config master: Collect syslogs from nodes in ansible tests https://review.openstack.org/624827 | 18:33 |
openstackgerrit | Clark Boylan proposed openstack-infra/system-config master: Import install-docker role https://review.openstack.org/605585 | 18:33 |
fungi | ssbarnea|rover: is it possible curl supports SSL_CERT_FILE in other formats? rather than a bare cert | 18:34 |
fungi | er, like a bare cert | 18:34 |
clarkb | infra-root ^ I've removed the ipv6 setting from that change as we intend on using the system network namespace anyway. I expect that update will get the change into a mergeable state. Now why did it just push all three changes when I only updated the last one | 18:34 |
clarkb | OH! | 18:34 |
fungi | ssbarnea|rover: like could it be in der format maybe? | 18:34 |
ssbarnea|rover | fungi: I don't know but I will narrow it down, probably tomorrow as I am already getting tired. | 18:34 |
ssbarnea|rover | looks like PEM to me. | 18:34 |
clarkb | fungi: stephenfin ssbarnea|rover ^ this points at a git-review bug | 18:34 |
clarkb | I ran git-review in a dir that doesn't exist on master so it failed with Errors running git reset --hard 3496292b845d33be6c5649195a54ccbf76494050 | 18:35 |
openstackgerrit | sebastian marcet proposed openstack-infra/puppet-openstackid master: Updated script to support PHP7 https://review.openstack.org/624957 | 18:35 |
clarkb | I think it must've done the rebase by that point | 18:35 |
clarkb | and that error prevented it from undoing the rebase so when I pushed it pushed up the rebase too | 18:35 |
fungi | ssbarnea|rover: another workaround seems to be setting cert=/path/to/ca.crt in a [global] section within pip.conf | 18:35 |
clarkb | thankfully the diffs come out cleanly | 18:36 |
ssbarnea|rover | fungi: it does not help as it does the same thing as setting the variable, already did. the only way I was able to make it work was to set verify=False which is not really an option | 18:36 |
ssbarnea|rover | anyway, i will find a solution. | 18:37 |
fungi | clarkb: i expect we don't get enough testing of running git-review from random subdirs of a repo. maybe we should make sure it performs all its actions under a chdir | 18:37 |
clarkb | fungi: ya repo root should be more predictable | 18:37 |
fungi | ssbarnea|rover: this sounds like you should be filing a bug report against pip/requests or asking for help in their irc channels instead | 18:38 |
jrosser | ssbarnea|rover: i have a similar situation and add a custom CA to the system CA store | 18:38 |
jrosser | then using the requests env var to point to that and it is all good | 18:38 |
fungi | jrosser: that's working for pip install in particular? | 18:38 |
*** smarcet has quit IRC | 18:38 | |
fungi | or just requests-based python software in general? | 18:38 |
jrosser | i get an entire openstack-ansible deploy done in an environment like that | 18:39 |
jrosser | the big gotcha is that by default requests is setup to use certifi on ubuntu, but you can't add extra custom CA to that | 18:39 |
*** graphene has quit IRC | 18:39 | |
jrosser | so the env var is needed to point it at ca-certificates stuff instead | 18:40 |
jrosser | so in principle if all the certs are good then it should work | 18:41 |
dansmith | clarkb: does anyone try to do correlation between e-r failures and providers? | 18:42 |
dansmith | clarkb: it would be interesting to know if the cinder-related failures are almost always on one provider | 18:42 |
clarkb | dansmith: I had tried some of it with the ovh bhs1 slowness we saw | 18:42 |
clarkb | dansmith: there were ~6 bugs that went away when we turned off bhs1 the first time | 18:43 |
ssbarnea|rover | there is a patch for e-r that should add metadata about this, i think. | 18:43 |
dansmith | clarkb: nice | 18:43 |
ssbarnea|rover | it should provide info while you hover over the graph, once we merge it. | 18:43 |
fungi | dansmith: yes, fairly easy to do by following the logstash query link from the graphs page and then adding the node_provider column | 18:43 |
clarkb | dansmith: I haven't followed up since we turned off bhs1 again (and is still off) | 18:43 |
clarkb | at least in that specific case amorin found a memory leak on their end that would need fixing and we'll likely do artificial load testing with devstack and tempest outside of zuul before adding it back to the pool | 18:44 |
dansmith | fungi: ah thanks | 18:44 |
fungi | dansmith: usually if there is a strong correlation with node_provider in the results you don't really need to do much statistical analysis | 18:44 |
dansmith | fungi: yeah | 18:44 |
fungi | like, if it's a provider-specific issue 9 out of 10 results will be that one provider | 18:44 |
clarkb | dansmith: in this particular case it happened on inap which we weren't previously watching for io issues | 18:44 |
fungi | at least on the ones i've investigated in the past | 18:45 |
dansmith | yup I just didn't know I could click one box and get that in front of my face | 18:45 |
*** chandan_kumar has quit IRC | 18:46 | |
*** dklyle has joined #openstack-infra | 18:46 | |
fungi | on the failures related to job timeouts it's been a little more subtle, so i've resorted to turning off all columns except node provider, pasting the resulting list into a file and running it through sort|uniq -c | 18:46 |
fungi | using the little gear to the top-right of the results list to crank up the number of results per page also helps for that case | 18:47 |
fungi | so that you don't have to stich together multiple pages of results | 18:47 |
fungi | er, stitch | 18:47 |
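fungi's sort|uniq step amounts to a frequency count; the same tally in Python (the provider names below are a made-up sample, not real logstash results):

```python
from collections import Counter

# Equivalent of pasting the node_provider column into `sort | uniq -c`:
# count how often each provider shows up among the failed runs.
providers = [
    "inap-mtl01", "ovh-bhs1", "ovh-bhs1", "rax-dfw", "ovh-bhs1",
]  # hypothetical sample from the results column
for provider, count in Counter(providers).most_common():
    print(count, provider)
```

A strong skew toward one provider in this tally is the correlation signal discussed above.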
dansmith | yeah | 18:48 |
*** mriedem_lunch is now known as mriedem | 18:49 | |
Linkid | clarkb: ok, thanks :) | 18:49 |
clarkb | (thinking out loud with little hard evidence here) if we do find that we have general IO issues across clouds we may have to consider it is not cloud specific but potentially something in how nova runs compute or issues in the linux kernel. I think most of our clouds use local storage for our instances. The exception being vexxhost in sjc1 which is boot from ceph backed volumes | 18:50 |
mriedem | clarkb: nope don't think i've seen that | 18:50 |
ssbarnea|rover | clarkb: fungi dansmith : it may be worth checking https://review.openstack.org/#/c/260188/10 about e-r - i only rebased it to make it pass ci and tested CLI output. i didn't have time to test the graphs. | 18:50 |
*** mriedem has quit IRC | 18:52 | |
*** dklyle has quit IRC | 18:54 | |
ssbarnea|rover | clarkb: apparently my du patch seems to work well, but I will do two more rechecks on it to be sure it's not just luck. | 18:54 |
*** mriedem has joined #openstack-infra | 18:56 | |
fungi | ssbarnea|rover: what was the workaround there? | 18:58 |
ssbarnea|rover | fungi: --threshold=100K combined with a nohup ... & -- just to be sure. | 18:58 |
fungi | ssbarnea|rover: oh, you weren't using --summarize | 18:59 |
fungi | hence, tons of stdout | 19:00 |
ssbarnea|rover | likely i decided that i don't want to wait for du to finish. the irony is that the threshold already made it run in 0s. | 19:00 |
ssbarnea|rover | but the background part should be a safety measure: if it ever hangs again, the job will miss the report, but it will be a success. | 19:01 |
clarkb | as long as you still get a total so you can see if things move on the long tail I think it's fine | 19:01 |
ssbarnea|rover | this is why i want to have 2-3 rechecks to see if I spot one such job. | 19:01 |
*** fuentess has quit IRC | 19:02 | |
ssbarnea|rover | --threshold=100K was a huge improvement on my own machine: it reduces the report from 15s to 1s. | 19:02 |
*** jamesmcarthur has quit IRC | 19:02 | |
ssbarnea|rover | (not same data as build but clearly less data goes to sort, not like 1M files to sort just to keep top 200) | 19:03 |
clarkb | fungi: I promise to buy you all the beers in portland next month if you can review https://review.openstack.org/#/c/615968/ and children :) (I'm impatient but the turnaround time on those puppet related changes isn't the quickest) | 19:09 |
fungi | clarkb: i have them up in gertty, almost there now | 19:09 |
clarkb | yay | 19:09 |
fungi | also, i don't think i could drink all the beer in portland any more than i could eat all the rice in china | 19:10 |
clarkb | the trick is to be strategic about the beers you do drink then call it good | 19:10 |
fungi | sound logic | 19:10 |
fungi | i just now approved the one for logstash.openstack.org | 19:11 |
*** dklyle has joined #openstack-infra | 19:11 | |
fungi | do you want me to only +2 the others so you can approve at your preferred pace? | 19:11 |
clarkb | fungi: ya that would be good | 19:11 |
fungi | don't want to flood you with too many at once | 19:11 |
clarkb | I'll do subunit workers and elasticsearch with the logstash one as well. Then make sure those are happy before being a bit more cautious with the git servers | 19:12 |
*** e0ne has quit IRC | 19:13 | |
fungi | there are so many of these chained together that the threading in gertty's change display doesn't even give me the first letter of title on the last dozen or so | 19:13 |
fungi | er, changes list display i mean | 19:13 |
clarkb | hrm your reviews prompted me to hop on nodes and double check things and lists.o.o and wiki-dev don't have parser = future in their puppet.conf | 19:14 |
clarkb | so I don't think this is actually working as expected. | 19:14 |
clarkb | I think I won't approve any additional ones and instead try to figure out why they aren't futureparsing | 19:14 |
fungi | oh, are we failing to set it? | 19:14 |
*** bhavikdbavishi has quit IRC | 19:15 | |
*** bhavikdbavishi has joined #openstack-infra | 19:15 | |
openstackgerrit | Doug Hellmann proposed openstack-infra/project-config master: import openstack-summit-counter repository https://review.openstack.org/625292 | 19:16 |
clarkb | fungi: it looks that way though not obvious to me why that is the case | 19:16 |
fungi | what were some of the earlier ones we tried to turn on? | 19:18 |
clarkb | `ansible --list-hosts futureparser` shows wiki-dev01.o.o is in the group | 19:18 |
clarkb | fungi: eavesdrop is one that works | 19:19 |
clarkb | it was the host before kata containers lists | 19:19 |
clarkb | I haven't checked kata containers lists though | 19:19 |
*** armax has joined #openstack-infra | 19:19 | |
clarkb | fungi: neither lists server shows up in the --list-hosts output from above | 19:20 |
*** bhavikdbavishi has quit IRC | 19:20 | |
clarkb | I think there are at least two issues. The first is lists.* don't show up in the futureparser group at all. The second is hosts like wiki-dev01.o.o which is in the group not getting it. Oh that is because puppet is disabled on wiki-dev01 maybe? | 19:21 |
*** dklyle has quit IRC | 19:21 | |
clarkb | ya not seeing any puppeting happen there | 19:21 |
clarkb | in that case I think we wait for logstash to happen and see if it works properly as expected. Then figure out the lists server group membership issue | 19:21 |
clarkb | fungi: I think its a localized issue to the list servers now. Likely the glob is wrong for them | 19:22 |
clarkb | logstash.o.o should confirm | 19:22 |
clarkb | ya I see the problem now [0-9]* in glob means something different than in regex | 19:23 |
clarkb | in regex it means match zero or more digits, in glob it means always match one digit then match anything | 19:23 |
* clarkb scribbles a note to come back around and address that when we can watch the lists | 19:23 | |
*** wolverineav has quit IRC | 19:25 | |
*** wolverineav has joined #openstack-infra | 19:25 | |
fungi | aha | 19:26 |
fungi | yes indeedie | 19:26 |
*** Emine has quit IRC | 19:26 | |
fungi | i don't think there's a "zero or more" operator in shell globbing | 19:27 |
*** gagehugo has quit IRC | 19:28 | |
fungi | well, except for an any match at least | 19:29 |
*** trown|lunch is now known as trown | 19:29 | |
*** wolverineav has quit IRC | 19:30 | |
openstackgerrit | Clark Boylan proposed openstack-infra/system-config master: Import install-docker role https://review.openstack.org/605585 | 19:30 |
clarkb | this time git-review only pushed the one change | 19:31 |
clarkb | ya the * is a match nothing or anything which would work there actually | 19:31 |
clarkb | but I think we may want to list lists.o.o and lists[0-9]*.o.o separately and delete lists.o.o when we get that host converted over | 19:31 |
fungi | right. basically need two entries in the case of non-enumerated hostnames | 19:32 |
fungi | or to cover the case of | 19:33 |
clarkb | I'll try to get some of these changes merged before pushing updates to that file though :) | 19:33 |
clarkb | rebasing yesterday was an interesting experience | 19:33 |
fungi | seeing [0-9]* in there is definitely confusing as it means two distinctly different patterns depending on whether you intended a file glob or a regex | 19:34 |
clarkb | ya I've already had to fix a bunch of related bugs | 19:34 |
clarkb | + isn't valid in globbing for example | 19:34 |
fungi | one digit followed by anything (or nothing) vs zero or more digits | 19:35 |
clarkb | maybe a file header comment that says "this file uses shell globs not regexes" | 19:35 |
fungi | i really only see that assumption as a risk if we expect to have digits in the middle of the host portion of some server names | 19:35 |
fungi | if we can assume digits will always fall at the end, it's fine | 19:35 |
clarkb | ya so far that is true | 19:36 |
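For reference, the difference clarkb and fungi are describing can be sketched in Python, comparing `fnmatch` (shell-glob semantics, like the inventory group patterns) with `re`. The hostnames here are illustrative, not the actual inventory contents:

```python
# `[0-9]*` pitfall: in a regex it means "zero or more digits", so the bare
# lists.openstack.org matches; as a shell glob it means "exactly one digit,
# then anything", so only the enumerated hosts match.
import fnmatch
import re

hosts = ["lists.openstack.org", "lists01.openstack.org", "logstash.openstack.org"]

glob_pattern = "lists[0-9]*.openstack.org"
regex_pattern = r"lists[0-9]*\.openstack\.org$"

glob_matches = [h for h in hosts if fnmatch.fnmatch(h, glob_pattern)]
regex_matches = [h for h in hosts if re.match(regex_pattern, h)]

print(glob_matches)   # only the numbered lists host
print(regex_matches)  # both lists hosts
```

This is why the glob only matched enumerated hosts like lists01 and silently skipped the non-enumerated lists.o.o.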
*** wolverineav has joined #openstack-infra | 19:38 | |
*** dklyle has joined #openstack-infra | 19:42 | |
notmyname | I'm seeing a permission denied error on one of our test jobs. it doesn't seem to be something related to swift code, so I'm hoping someone here may be able to provide some insight. http://logs.openstack.org/16/625116/3/experimental/swift-multinode-rolling-upgrade-queens/fa6db38/job-output.txt.gz#_2018-12-14_19_23_50_362102 | 19:42 |
clarkb | notmyname: let me see | 19:42 |
notmyname | thanks | 19:43 |
openstackgerrit | Sean McGinnis proposed openstack-infra/irc-meetings master: Switch release team meeting to Thursday 1600 https://review.openstack.org/625290 | 19:45 |
clarkb | notmyname: I think pip is saying it can't read the git repo for swift. I believe that command is running as the zuul user so if something earlier in the job has updated or chowned that repo to another user that may explain it (digging in logs for any evidence of that) | 19:45 |
clarkb | notmyname: http://logs.openstack.org/16/625116/3/experimental/swift-multinode-rolling-upgrade-queens/fa6db38/ara-report/file/1ba7e97e-08df-4385-b6ca-32e4edce0d22/#line-28 I think that may do it | 19:47 |
clarkb | notmyname: that task is running python setup.py develop in the swift repo as root which will modify the build dir and package link stuff iirc. Then when tox tries to do the same as zuul user it fails to update those files | 19:48 |
clarkb | notmyname: I think there are a few options to fix that 1) only install swift external to tox (there is a tox setting to not install the source repo) 2) only install with tox and don't do it externally | 19:49 |
clarkb | if you are just using tox as a way to trigger the testsuite then the first option might make the most sense | 19:49 |
clarkb | there is a 3) which is have a subsequent task cleanup/chown things so that tox works | 19:50 |
notmyname | clarkb: ah, ok. thanks. I'll have to think on this and talk with timburke and tdasilva to see what the best option is | 19:50 |
clarkb | and 4) run tox as root | 19:50 |
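A minimal sketch of option 1 (hypothetical tox.ini fragment; the setting names are standard tox options, but this is not Swift's actual config):

```ini
[tox]
# Don't build/install an sdist of the repo under test...
skipsdist = True

[testenv]
# ...and don't pip-install the source tree into the venv either; the repo
# was already installed system-wide by the earlier (root) playbook task.
skip_install = True
# Let the venv see that system-wide install.
sitepackages = True
```

With this, tox is only a test runner and never touches the root-owned build artifacts.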
*** dklyle has quit IRC | 19:55 | |
mriedem | i just noticed that https://docs.openstack.org/nova/latest/admin/live-migration-usage.html hasn't been updated since september, but there have been changes to that doc since then - is that normal? | 19:57 |
fungi | mriedem: possible your doc publication jobs have broken, i suppose. looking now | 19:58 |
*** mtreinish has joined #openstack-infra | 19:58 | |
mtreinish | is there any way for us to check the load on the trove instance that runs the subunit2sql db | 19:58 |
mtreinish | I've had a couple queries going for >173min. and I'm wondering if the little trove node is too small for the size of the db now | 19:59 |
clarkb | mtreinish: I don't think we get system level access like that so unless mysql can provide that info I'm guessing no | 20:00 |
fungi | mriedem: http://logs.openstack.org/f6/f6996903d2ef0fdb40135b506c83ed6517b28e19/post/publish-openstack-tox-docs/e140ad1/job-output.txt.gz#_2018-12-14_15_29_25_026373 | 20:00 |
fungi | looks like it's getting built and included | 20:00 |
openstackgerrit | Merged openstack-infra/system-config master: Turn on the future parser for logstash.openstack.org https://review.openstack.org/615660 | 20:00 |
clarkb | but maybe rax collects that data for us? | 20:00 |
mtreinish | clarkb: hmm, that's what I expected the answer was gonna be :/ | 20:01 |
fungi | clarkb: mtreinish: yes, i think we can see it in the rackspace cloud dashboard | 20:01 |
mtreinish | well mriedem will just have to keep waiting for his list of slowest tempest tests | 20:01 |
fungi | heh | 20:01 |
mtreinish | it probably wouldn't hurt to check the dashboard though and think about upsizing the node | 20:03 |
clarkb | can we/should we trim the db? | 20:03 |
mtreinish | we trim at 6 months now | 20:03 |
clarkb | ah so its already bound, in that case ya maybe upsize is best if we see indication its too small | 20:03 |
* clarkb tries to figure out where that lives | 20:05 | |
mtreinish | well we were supposed to be running a cron or something to trim to six months, fungi and I set that up a long time ago | 20:06 |
mtreinish | but I just checked the oldest entries in the db and it's from june 2014 | 20:06 |
clarkb | nice | 20:06 |
mtreinish | oh, but theres only a few | 20:06 |
mtreinish | then 2016 | 20:06 |
mtreinish | 10/2016 | 20:06 |
mtreinish | then 11/2017 | 20:07 |
mtreinish | and then it's 6months and a lot more data | 20:07 |
mtreinish | I guess our triming job isn't perfect :p | 20:07 |
clarkb | might need to do multiple passes to get the new stuff | 20:08 |
clarkb | so I see a bunch of our DBs for services but not one for health | 20:08 |
mriedem | mtreinish: gibi already eyeballed it | 20:08 |
clarkb | which makes me think i am not looking in the right place | 20:08 |
mriedem | using good ol' gumption | 20:08 |
mriedem | see comments 11 and 12 https://bugs.launchpad.net/tempest/+bug/1783405 | 20:09 |
openstack | Launchpad bug 1783405 in tempest "Slow tests randomly timing out jobs (which aren't marked slow)" [High,In progress] - Assigned to Ghanshyam Mann (ghanshyammann) | 20:09 |
*** betherly has joined #openstack-infra | 20:09 | |
clarkb | fungi: any idea where the db is hiding? | 20:09 |
mtreinish | clarkb: it's probably called subunit2sql something | 20:09 |
mriedem | fungi: ok maybe my browser has that page cached.../me hard refreshes | 20:09 |
fungi | yeah, it'll be the subunit2sql db | 20:09 |
clarkb | mtreinish: bah there it is | 20:09 |
clarkb | ok I'm just blind in that case :) | 20:09 |
mtreinish | we turned it on around paris iirc :p | 20:09 |
*** bobh has quit IRC | 20:09 | |
mriedem | hmm, that didn't help | 20:10 |
clarkb | mtreinish: load average spiked to 15 and is on its way down now but still high | 20:10 |
mtreinish | mriedem: heh, ok | 20:10 |
mtreinish | mriedem: oh, you actually used openstack-health? | 20:10 |
*** armstrong has quit IRC | 20:10 | |
mtreinish | clarkb: hmm, that's probably not a good sign | 20:10 |
clarkb | memory usage is either happy or the gauge isn't working | 20:10 |
clarkb | (I don't see any memory usage) | 20:11 |
clarkb | disk usage looks ok | 20:11 |
*** e0ne has joined #openstack-infra | 20:11 | |
mtreinish | I'm pretty sure we used the smallest node size when we provisioned it | 20:11 |
*** e0ne has quit IRC | 20:11 | |
clarkb | its pretty big now at least for disk | 20:11 |
clarkb | I don't see an indication of the cpus we've got | 20:12 |
mriedem | mtreinish: i did, however, i still complained about its ux when i did (in here yesterday) | 20:12 |
mriedem | i can't use it w/o cursing its name first | 20:12 |
openstackgerrit | Merged openstack-infra/system-config master: Add rust-vmm OpenDev ML https://review.openstack.org/625254 | 20:13 |
clarkb | mtreinish: I take that back I misread this graph | 20:13 |
clarkb | cpu usage is 15% not a load average | 20:13 |
*** betherly has quit IRC | 20:13 | |
clarkb | load average is ~2 | 20:14 |
clarkb | and has been for ~3 hours | 20:14 |
mtreinish | heh, well that's when I started running my script | 20:14 |
clarkb | so it seems that we notice your script but it doesn't seem that the script has used all the available memory or cpu or disk | 20:15 |
clarkb | might also need to consider if the query is inefficient or can be improved? | 20:16 |
mtreinish | I was looking at the explain before; it didn't look that bad to me, but I'm hardly an expert | 20:17 |
clarkb | the 15% cpu usage implies that either we have a bunch of cpus and mysql can't use them for that query or we are waiting on io? | 20:17 |
mtreinish | http://paste.openstack.org/show/737336/ | 20:18 |
mtreinish | err I guess I misread it before, it's not that great | 20:19 |
mtreinish | I'm probably trying to grab too much data at once | 20:19 |
clarkb | the time scale on load average and cpu % are different so overlapping them in my head is hard | 20:19 |
mtreinish | mordred: ^^^ want to fix it for me :p | 20:20 |
clarkb | mtreinish: I'm definitely not a db expert :) | 20:20 |
clarkb | mtreinish: is the 500k rows for test ids unique runs or just unique tests | 20:20 |
clarkb | because ya, if it's then going through 500k unique tests to find all the unique run times, that could be quite expensive | 20:20 |
*** jento has quit IRC | 20:21 | |
clarkb | but if its 500k unique test runs I would expect that to be easy for it | 20:21 |
mtreinish | it's 500k unique test_ids (which I expect is that table's size) | 20:21 |
mtreinish | the tests table is the test name, total run counts, and a moving average of run times for that individual test across all runs | 20:22 |
*** armax has quit IRC | 20:22 | |
mtreinish | the query my script generated was: http://paste.openstack.org/show/737337/ (yeah paste not line wrapping) | 20:22 |
mtreinish | I just called: https://github.com/openstack-infra/subunit2sql/blob/master/subunit2sql/db/api.py#L1854 although looking at that it's grabbing more data than i actually need (and has an extra join for no benefit because of that) | 20:27 |
Shrews | oof, that explain doesn't look so great for that query | 20:31 |
*** rkukura has joined #openstack-infra | 20:31 | |
Shrews | you might try a combined index in 'runs' that contains both uuid and id, but i'm speculating what the table schema actually looks like | 20:31 |
Shrews | b/c that first table scan is likely what's hurting you | 20:32 |
Shrews | been a while since i did that type of stuff, too, so take with a grain of salt :) | 20:32 |
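Shrews' combined-index suggestion can be sketched self-contained with sqlite3 (the real database is MySQL on Trove, and this table layout is a guess, not subunit2sql's actual schema):

```python
# A combined index on (uuid, id) is "covering" for a lookup that filters on
# uuid and only needs id back: the query can be answered from the index
# alone, avoiding the full table scan the EXPLAIN showed.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (id INTEGER PRIMARY KEY, uuid TEXT, run_at TEXT)")
conn.execute("CREATE INDEX ix_runs_uuid_id ON runs (uuid, id)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM runs WHERE uuid = ?", ("some-uuid",)
).fetchall()
for row in plan:
    # sqlite reports a covering-index search rather than a table scan
    print(row[-1])
```

The same idea applies on MySQL (`ALTER TABLE runs ADD INDEX (uuid, id)`), modulo whatever the actual schema looks like.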
clarkb | fungi: as an fyi it appears lists.opendev.org records are in place | 20:35 |
fungi | excellent | 20:36 |
clarkb | fwiw logstash seems to have future parsered and broken its apache | 20:37 |
clarkb | which isn't a huge emergency but I'm sorting that out now | 20:37 |
*** priteau has quit IRC | 20:37 | |
fungi | #status log started the new opendev mailing list manager process with `sudo service mailman-opendev start` on lists.openstack.org | 20:39 |
openstackstatus | fungi: finished logging | 20:39 |
*** armax has joined #openstack-infra | 20:39 | |
*** diablo_rojo has joined #openstack-infra | 20:40 | |
*** wolverineav has quit IRC | 20:43 | |
*** wolverineav has joined #openstack-infra | 20:44 | |
mtreinish | Shrews: that seems correct, I'm trying to rewrite the script using a better query right now | 20:44 |
openstackgerrit | Clark Boylan proposed openstack-infra/puppet-kibana master: Set server admin var so that vhost works https://review.openstack.org/625344 | 20:47 |
clarkb | fungi: ^ I expect that to be the fix for logstash.o.o apache brokenness | 20:48 |
fungi | clarkb: for some reason the mm_domains variable addition in system-config change doesn't seem to have propagated to lists.o.o | 20:48 |
clarkb | and now I must lunch | 20:48 |
clarkb | oh hrm | 20:48 |
clarkb | did puppet actually run? | 20:48 |
fungi | it created the new mailing list | 20:48 |
fungi | it just didn't update exim configuration | 20:48 |
clarkb | yes puppet has run according to syslog | 20:48 |
clarkb | fungi: I can help look after lunch | 20:49 |
fungi | -r--r--r-- 1 root root 34445 Nov 13 04:15 /etc/exim4/exim4.conf | 20:49 |
fungi | i have to disappear to meet someone in ~25 minutes but may work it out before then | 20:49 |
*** wolverineav has quit IRC | 20:49 | |
*** betherly has joined #openstack-infra | 20:50 | |
clarkb | have a link to the line that sets it? | 20:52 |
fungi | as best i can tell roles/exim/templates/exim4.conf.j2 isn't getting applied by ansible | 20:54 |
*** betherly has quit IRC | 20:55 | |
clarkb | oh right exim is in ansible now | 20:55 |
fungi | the "Write Exim config file" task in roles/exim/tasks/main.yaml is what should be applying that | 20:56 |
clarkb | /var/log/ansible on bridge has the log files | 20:57 |
fungi | it sets dest: "{{ config_file }}" | 20:57 |
*** wolverineav has joined #openstack-infra | 20:58 | |
fungi | which roles/exim/vars/Debian.yaml sets to /etc/exim4/exim4.conf | 20:58 |
clarkb | I don't see mm_domains in the conf.j2 file | 21:00 |
fungi | i still feel uninformed about ansible... does it automatically replace files with a template task? | 21:00 |
clarkb | yes it should | 21:00 |
fungi | clarkb: it's used to set a couple values in playbooks/host_vars/lists.openstack.org.yaml | 21:01 |
fungi | not used in the role itself | 21:01 |
clarkb | ah its transitive | 21:01 |
fungi | one of them is exim_local_domains which is what i'm tracking down now | 21:01 |
fungi | that does then get included in the template | 21:01 |
fungi | but the template never seems to have been written out on lists.o.o once that was updated | 21:02 |
clarkb | the force parameter to template is defaulted to yes | 21:03 |
clarkb | so should update if contents differ | 21:03 |
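For context, the "Write Exim config file" task being discussed would look roughly like this; `force: yes` is the template module's default, so the destination is rewritten whenever the rendered content differs (arguments reconstructed from the conversation, not copied from the repo):

```yaml
- name: Write Exim config file
  template:
    src: exim4.conf.j2
    dest: "{{ config_file }}"   # /etc/exim4/exim4.conf per vars/Debian.yaml
    force: yes                  # default: overwrite when content differs
```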
dmsimard | unrelated question, I want to sync a feature branch with master with no regard to git history -- in the context of gerrit and zuul, should I send a merge commit for review or should I delete and re-create the branch ? | 21:04 |
dmsimard | I guess it's sort of the reverse of when we merged the zuulv3 branch of zuul back into master | 21:05 |
clarkb | fungi do we maybe overwrite mm_domains in the ansible var data on bridge? | 21:05 |
clarkb | and so old value supersedes your new value? | 21:05 |
clarkb | dmsimard: is the feature branch in gerrit? | 21:05 |
dmsimard | clarkb: it is | 21:06 |
clarkb | it already exists that is? I would merge master to feature then push that to gerrit | 21:06 |
dmsimard | openstack/ansible-role-ara has a feature/1.0 branch | 21:06 |
dmsimard | ok so something like git merge master feature/1.0 and then git push gerrit feature/1.0 ? or a git review ? | 21:06 |
mtreinish | mriedem: fwiw: http://paste.openstack.org/show/737340/ is the average run time of every test over the last 300 runs of tempest-full in gate | 21:08 |
clarkb | dmsimard: git review should work | 21:08 |
clarkb | dmsimard: it will push up a proposal change for merging the merge commit | 21:08 |
dmsimard | clarkb: ok, I don't think I've ever sent a merge commit for review -- I'll try that, thanks :D | 21:08 |
clarkb | fungi: its not overridden in private vars from what I see | 21:08 |
fungi | clarkb: on bridge.o.o `sudo grep -ir mm_domains /etc` only turns up hits in /etc/puppet/modules/exim/templates/exim4.conf.erb which is presumably vestigial | 21:09 |
clarkb | dmsimard: note that you may have to allow it in your acls file | 21:09 |
clarkb | fungi: ya that should be ignored because it's puppet | 21:09 |
mtreinish | mriedem: you can generate it in the future by just running: http://paste.openstack.org/show/737341/ | 21:09 |
clarkb | fungi: I'm currently looking for evidence that role ran at all | 21:11 |
*** rlandy has quit IRC | 21:12 | |
fungi | yeah, i was checking the logs on bridge.o.o and thinking the exim role isn't getting used? | 21:12 |
clarkb | fungi: its in playbooks/base.yaml | 21:13 |
clarkb | but ya I don't see evidence of it in the logs | 21:13 |
*** chandan_kumar has joined #openstack-infra | 21:13 | |
fungi | i agree, seems to be included for all of hosts: "!disabled" | 21:14 |
clarkb | base-server is running according to the logs | 21:15 |
fungi | clarkb: i don't see the iptables role getting applied either | 21:16 |
fungi | and it's in the same set in base | 21:16 |
clarkb | ya but base-server is which is really weird | 21:16 |
fungi | i need to run now. worst case we can append lists.opendev.org to the two hostlists in /etc/exim/exim4.conf and reload the exim4 service while digging deeper | 21:16 |
clarkb | its almost like ansible crashes | 21:17 |
clarkb | the logs go from base-server to "Base: configure OpenStackSDK on bridge" which is the next block of tasks | 21:18 |
fungi | i've manually appended that hostname to the hostlists in the exim config and reloaded | 21:18 |
fungi | and with that i'm disappearing for an hour or so but will check on this as soon as i get back | 21:18 |
clarkb | it looks like my rework of the debian handling for arm may be the cause? | 21:18 |
clarkb | at least it gets to that point then stops | 21:18 |
mriedem | mtreinish: ok 2-4 there in that first paste are the same things i identified from gibi's comments in the bug report | 21:19 |
mriedem | so that's good to know | 21:19 |
mriedem | tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern.test_volume_boot_pattern: 161.5385627070707 is the only difference, but that one shouldn't surprise me | 21:19 |
clarkb | dmsimard: is include_tasks as used at https://git.openstack.org/cgit/openstack-infra/system-config/tree/playbooks/roles/base-server/tasks/Ubuntu.xenial.aarch64.yaml#n6 not expected to work? | 21:20 |
mriedem | kind of surprised that isn't already marked as slow | 21:20 |
clarkb | dmsimard: http://paste.openstack.org/show/737342/ I'm seeing that include_tasks seemingly no-op then the entire playbook jumps to the next play https://git.openstack.org/cgit/openstack-infra/system-config/tree/playbooks/base.yaml#n12 any idea why that happens | 21:23 |
*** tpsilva has quit IRC | 21:23 | |
clarkb | also extra scary here is that ansible isn't failing (this is something that for all puppet's problems it was really good at, if something can't happen then be safe and stop) | 21:25 |
dmsimard | clarkb: I have no idea why that is | 21:25 |
dmsimard | hang on | 21:26 |
dmsimard | hum | 21:27 |
*** dklyle has joined #openstack-infra | 21:27 | |
clarkb | dmsimard: I'm tempted to just copy pasta that set of tasks from Debian.yaml into the arm64 task list for now | 21:28 |
clarkb | but am open to other ideas (like is it worth trying import_tasks instead of include_tasks?) | 21:29 |
dmsimard | I'm a bit lost in the Ansible transition between include and import, especially across versions | 21:29 |
clarkb | I'm sure I'm more lost :) | 21:29 |
dmsimard | My understanding is that include is parsed "at runtime" and is meant to be used when there are conditions attached | 21:29 |
clarkb | (this is a specific topic that could use better docs) | 21:29 |
dmsimard | While import is static | 21:30 |
clarkb | dmsimard: ya, but in this case its being called from a file that itself is conditionally loaded | 21:30 |
clarkb | so I'm guessing it has to be included not imported? or may not | 21:30 |
dmsimard | worth a try | 21:30 |
clarkb | https://git.openstack.org/cgit/openstack-infra/system-config/tree/playbooks/roles/base-server/tasks/main.yaml#n67 | 21:31 |
dmsimard | import_* came with Ansible >= 2.5 | 21:31 |
dmsimard | According to https://docs.ansible.com/ansible/latest/user_guide/playbooks_reuse_includes.html | 21:31 |
clarkb | maybe it can't handle multiple levels of include | 21:31 |
clarkb | (the docs say it can but could be buggy) | 21:31 |
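As a rough sketch of the include/import distinction being described (hypothetical task files; exact behavior varies across Ansible versions):

```yaml
- hosts: all
  tasks:
    # import_tasks is static: common.yaml is expanded at parse time, so a
    # missing file fails immediately and --list-tasks can see inside it.
    - import_tasks: common.yaml

    # include_tasks is dynamic: the file is loaded only when this task runs,
    # which allows runtime data in the name but defers load-time problems.
    - include_tasks: "{{ ansible_os_family }}.yaml"
```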
dmsimard | yeah, I'm not implying there's no bug or odd behavior at play here | 21:32 |
dmsimard | There's definitely something weird going on | 21:32 |
*** dklyle has quit IRC | 21:32 | |
*** dklyle has joined #openstack-infra | 21:33 | |
dmsimard | In unrelated news, I was looking at a system-config run through ara (to see what the base-server role did) and I'm confused why a job came back successful despite a failure http://logs.openstack.org/85/605585/20/check/system-config-run-base/b98ee49/hosts/bridge.openstack.org/ara-report/ | 21:33 |
dmsimard | (from https://review.openstack.org/#/c/605585/) | 21:33 |
clarkb | could be the same bug | 21:34 |
clarkb | ansible is apparently not tracking failures in both cases | 21:35 |
clarkb | we should double check we are checking return codes properly too | 21:35 |
*** woojay has quit IRC | 21:35 | |
dmsimard | The failure in that particular system-config run was http://logs.openstack.org/85/605585/20/check/system-config-run-base/b98ee49/hosts/bridge.openstack.org/ara-report/result/e3ced53a-5b94-484c-976d-868386826527/ | 21:35 |
dmsimard | But everything is green from the perspective of Zuul :/ http://logs.openstack.org/85/605585/20/check/system-config-run-base/b98ee49/ara-report/ | 21:36 |
*** woojay has joined #openstack-infra | 21:37 | |
dmsimard | ctrl+f for "root_rsa_key" is turning up empty in the full job log.. | 21:38 |
openstackgerrit | Clark Boylan proposed openstack-infra/system-config master: Copy pasta the debian base server bits, don't include them https://review.openstack.org/625350 | 21:38 |
clarkb | there is the naive make it work change (I hope that makes it work) | 21:38 |
clarkb | dmsimard: where does bridge.yaml run? I see the base.yaml in the surrounding ara report | 21:40 |
*** jamesmcarthur has joined #openstack-infra | 21:40 | |
dmsimard | clarkb: that's what I was trying to find out but I ended up being even more confused | 21:41 |
dmsimard | The failing task is "Write out ssh private key" which is "changed" in run-base.yaml (from the perspective of zuul) but it's "failed" when it later runs for bridge.yaml from the perspective of system-config ? | 21:42 |
dmsimard | http://logs.openstack.org/85/605585/20/check/system-config-run-base/b98ee49/job-output.txt.gz#_2018-12-14_19_41_53_387941 | 21:42 |
dmsimard | That's the only instance of "Write out ssh private key" in the job logs, I suppose the one from inside system-config is not in the stdout | 21:42 |
clarkb | dmsimard: its from run-base.yaml | 21:43 |
clarkb | which is the run playbook for the job | 21:43 |
*** EvilienM is now known as EmilienM | 21:44 | |
*** weshay is now known as weshay_pto | 21:44 | |
*** jamesmcarthur has quit IRC | 21:44 | |
*** jamesmcarthur has joined #openstack-infra | 21:44 | |
clarkb | is it possible the streams are getting crossed? | 21:44 |
clarkb | dmsimard: ok I think I get it now maybe | 21:46 |
clarkb | dmsimard: the job's run playbook runs bridge.yaml to set things up for the job. We pass in a ssh key value there http://logs.openstack.org/85/605585/20/check/system-config-run-base/b98ee49/ara-report/result/ee88cd4f-f3b3-4ea8-b0f6-d0fbc9f05bea/ | 21:47 |
dmsimard | yeah, that one works | 21:47 |
dmsimard | but the nested one doesn't | 21:47 |
clarkb | dmsimard: then what the job is testing is that we can run ansible against all the hosts in our inventory which happens to include bridge.o.o so it reruns bridge.yaml only this time we don't pass in the ssh key info | 21:47 |
dmsimard | what I don't understand is that there's no such thing as "Write out ssh keys" in the ansible-playbook command output, unless I'm not looking at the right one: http://logs.openstack.org/85/605585/20/check/system-config-run-base/b98ee49/ara-report/result/89704ccd-5067-4715-81c9-fa0dcee02e55/ | 21:48 |
clarkb | dmsimard: ya thats base.yaml which doesn't run bridge.yaml I don't think | 21:48 |
clarkb | run_all.sh runs bridge.yaml | 21:49 |
clarkb | which is where this gets weird | 21:49 |
clarkb | its the cron | 21:49 |
*** jtomasek has quit IRC | 21:49 | |
dmsimard | so that's why we wouldn't have the output anywhere then ? it's in the cron shell ? | 21:49 |
clarkb | so that run isn't part of the job except that the job installs a cron to run things every 15 minutes | 21:49 |
clarkb | yup | 21:49 |
clarkb | and that cron ansible is crossing the streams with your nested ara | 21:49 |
clarkb | I think the clean up for this is to disable the cron on test jobs | 21:50 |
dmsimard | that would explain why the job didn't fail as a result | 21:50 |
clarkb | yup | 21:50 |
dmsimard | ianw: ^ FYI, tl;dr is there was a failure in the nested bridge.o.o ara report and I was confused as to why Zuul hadn't failed the job: http://logs.openstack.org/85/605585/20/check/system-config-run-base/b98ee49/hosts/bridge.openstack.org/ara-report/ | 21:52 |
clarkb | dmsimard: fungi: https://review.openstack.org/#/c/625350/1 is I think the short term ansible fix (and we should watch it when it goes in to make sure iptables continues working as expected ugh, I really wish that ansible would fail safe when it crashes like this) | 21:53 |
clarkb | dmsimard: also thinking about that more I wonder if the issue is including a task list that is included elsewhere by other hosts in the same play | 21:53 |
clarkb | basically we were trying to dedup special tasks for arm by doing arm specific then generic tasks, but may be the reorg here is do all the generic tasks on debuntu then include only the arm specific tasks from there? | 21:54 |
clarkb | I dunno this is all incredibly cryptic to me and ansible not actually knowing it failed makes it harder to understand | 21:54 |
*** dklyle has quit IRC | 21:54 | |
*** jamesmcarthur has quit IRC | 21:54 | |
dmsimard | clarkb: where is that failure occurring ? not in zuul right ? | 21:55 |
clarkb | no this is to apply things to production. I don't think it happens in the zuul jobs because we don't use arm64 test nodes in the zuul jobs | 21:55 |
clarkb | we could add an arm64 node to the system-config inventory to further test things (but that seems out of scope for now) | 21:56 |
clarkb | (at least for me trying to make things happy before weekend) | 21:56 |
*** dklyle has joined #openstack-infra | 21:56 | |
dmsimard | clarkb: I don't fully recognize the output format of the paste you sent me, could a callback be eating some sort of trace or failure ? | 21:56 |
clarkb | maybe? I think we use default output for that | 21:57 |
dmsimard | where does "2018-12-14 20:54:28,515 p=11685 u=root |" come from ? is that journalctl ? | 21:57 |
clarkb | callback_whitelist=profile_tasks, timer | 21:57 |
clarkb | I think from the timer callback | 21:57 |
clarkb | or the profuiler? | 21:57 |
clarkb | I dunno | 21:58 |
clarkb | it does look like syslog/journalctl format though | 21:58 |
dmsimard | ah, it looks like it's the format provided by ansible from log_path=/var/log/ansible/ansible.log | 21:59 |
*** mriedem has quit IRC | 22:01 | |
openstackgerrit | Merged openstack-infra/puppet-kibana master: Set server admin var so that vhost works https://review.openstack.org/625344 | 22:02 |
*** dklyle has quit IRC | 22:02 | |
openstackgerrit | Clark Boylan proposed openstack-infra/system-config master: Stop running unnecessary tests on trusty https://review.openstack.org/625358 | 22:02 |
clarkb | and ^ is a small cleanup optimization I noticed we could make | 22:03 |
dmsimard | clarkb: my gut feeling is that the logging doesn't tell the whole story | 22:03 |
dmsimard | a bit like how the zuul callback munges some traces or errors | 22:03 |
clarkb | dmsimard: ya I'm beginning to think it must be a corner case of include_task for a file that is already included in the play | 22:03 |
dmsimard | sometimes* munges | 22:04 |
clarkb | its being included for a different set of nodes, but the way ansible queues things up is global iirc | 22:04 |
dmsimard | clarkb: if we have the ability to run that playbook in the foreground, we would certainly get a different output | 22:04 |
*** priteau has joined #openstack-infra | 22:05 | |
*** priteau has quit IRC | 22:05 | |
clarkb | dmsimard: probably the path forward here is merge the copy pasta (I assume that won't break things similarly) then make a revert that adds an arm64 test node to the inventory and tweak until it works | 22:06 |
clarkb | keeping in mind that ansible success doesn't mean it actually worked | 22:07 |
dmsimard | clarkb: hmmm, the output is the same in the run_all raw output http://paste.openstack.org/show/737344/ | 22:09 |
dmsimard | I'm out of ideas | 22:09 |
dmsimard | ¯\_(ツ)_/¯ | 22:09 |
clarkb | it definitely looks like ansible just goes "NOPE" | 22:09 |
clarkb | and continues with the next play | 22:09 |
clarkb | dmsimard: should I try to fabricobble a bug together on github for this? | 22:13 |
clarkb | I don't really have much info other than the setup and logs and ansible version info | 22:13 |
*** dklyle has joined #openstack-infra | 22:13 | |
clarkb | I guess I can start there and see if anyone in ansible land is able/willing to debug further | 22:13 |
dmsimard | We have a good case for a bug if we can come up with a generic reproducer | 22:13 |
*** betherly has joined #openstack-infra | 22:14 | |
*** boden has quit IRC | 22:14 | |
dmsimard | Probably worthwhile to check if there's already an issue about this too | 22:14 |
dmsimard | yikes, the answer in https://github.com/ansible/ansible/issues/41984 is basically "include_tasks is in tech preview, use at your own risks" | 22:16 |
*** trown is now known as trown|outtypewww | 22:16 | |
dmsimard | (that was a while ago, though) | 22:16 |
clarkb | re reproducing if I had to guess the trick there is having a play that includes foo.yaml and bar.yaml on different hosts, then have bar.yaml include_tasks: foo.yaml | 22:17 |
*** dklyle has quit IRC | 22:18 | |
dmsimard | you can probably fake it using add_host | 22:18 |
clarkb | play1 runs role1 and role2. role1 executes foo.yaml via include_tasks on some hosts and bar.yaml on others. Have bar.yaml also include_tasks foo.yaml | 22:18 |
dmsimard | (with ansible_connection: local) | 22:18 |
clarkb | then role2 never runs (and neither does play2) | 22:18 |
*** betherly has quit IRC | 22:18 | |
clarkb | or just use an inventory with a few connect local settings | 22:19 |
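The reproducer clarkb describes above can be sketched roughly as follows (file, role, and host names are illustrative, not taken from the actual repro; using `ansible_connection=local` means no real remote nodes are needed):

```yaml
# inventory.ini -- two fake hosts, both connecting locally
# [nodes]
# node1 ansible_connection=local
# node2 ansible_connection=local

# site.yaml -- play1 applies role1 then role2; a second play follows
- hosts: nodes
  roles:
    - role1
    - role2   # expected to run, but silently skipped when the bug triggers

- hosts: nodes
  tasks:
    - debug:
        msg: "play2 ran"   # never printed when the bug triggers

# roles/role1/tasks/main.yaml -- include a different task file per host
- include_tasks: foo.yaml
  when: inventory_hostname == 'node1'
- include_tasks: bar.yaml
  when: inventory_hostname == 'node2'

# roles/role1/tasks/bar.yaml -- nested include of the same file that was
# already included for the other host
- include_tasks: foo.yaml

# roles/role1/tasks/foo.yaml
- debug:
    msg: "foo ran"
```

Per the chat, the symptom is that role2 and the second play never run and Ansible prints no error. Replacing the nested `include_tasks` in bar.yaml with an inline copy of foo.yaml's tasks is what clarkb later confirms as the workaround.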
*** rfolco has quit IRC | 22:22 | |
*** e0ne has joined #openstack-infra | 22:23 | |
*** betherly has joined #openstack-infra | 22:24 | |
*** betherly has quit IRC | 22:29 | |
openstackgerrit | demosdemon proposed openstack-dev/pbr master: Resolve ``ValueError`` when mapping value contains a literal ``=``. https://review.openstack.org/625364 | 22:32 |
clarkb | dmsimard: I can reproduce | 22:32 |
clarkb | totally doesn't print any errors either as far as I can tell | 22:32 |
clarkb | I'm double checking that copying the tasks fixes it | 22:32 |
*** slaweq has quit IRC | 22:33 | |
clarkb | yup | 22:33 |
clarkb | that is amazing | 22:33 |
clarkb | I'm going to take a short break then will be back to file a bug report | 22:34 |
*** eernst has joined #openstack-infra | 22:34 | |
*** dklyle has joined #openstack-infra | 22:37 | |
*** e0ne has quit IRC | 22:40 | |
*** dklyle has quit IRC | 22:41 | |
*** dklyle has joined #openstack-infra | 22:42 | |
*** eernst has quit IRC | 22:48 | |
*** wolverineav has quit IRC | 22:51 | |
dmsimard | signing off for today, do share the link, I'm curious :) | 22:52 |
kmalloc | Shrews, ianw, mordred, clarkb: https://review.openstack.org/625370 <-- dogpile.cache and sdk fix | 22:52 |
kmalloc | i am unsure how to actually run this as a unit test... | 22:53 |
kmalloc | but that has been confirmed locally to fix the issue | 22:53 |
notmyname | FYI https://blade.tencent.com/magellan/index_en.html | 22:55 |
notmyname | there's a severe lack of detail (including a CVE), but it sounds scary. but maybe "just" upgrading will mitigate it? | 22:56 |
*** kgiusti has left #openstack-infra | 22:57 | |
*** dklyle has quit IRC | 22:57 | |
kmalloc | notmyname: gross. | 22:58 |
kmalloc | i think chromium is fixed with upgrade according to that. | 22:58 |
kmalloc | not sure if it's SQLite upgrade fixing =/ man so few details. | 22:59 |
*** wolverineav has joined #openstack-infra | 22:59 | |
*** jamesmcarthur has joined #openstack-infra | 22:59 | |
notmyname | ah, it's the one linked in https://news.ycombinator.com/item?id=18685516. seems related to WebSQL | 23:00 |
kmalloc | ahh | 23:00 |
notmyname | CVE TBD | 23:00 |
notmyname | well, that's at least the chromium issue. not sure if there are others | 23:02 |
*** wolverineav has quit IRC | 23:03 | |
*** jamesmcarthur has quit IRC | 23:03 | |
kmalloc | yeah | 23:04 |
*** wolverineav has joined #openstack-infra | 23:06 | |
*** lbragstad has quit IRC | 23:08 | |
clarkb | kmalloc: thanks and notmyname that looks like a fun one | 23:09 |
clarkb | I'm going to figure out how to write this bug report for ansible without making the person taking it on go crazy | 23:09 |
notmyname | heh | 23:09 |
notmyname | clarkb: looks like https://review.openstack.org/#/c/625361/ worked for the permissions error | 23:10 |
openstackgerrit | demosdemon proposed openstack-dev/pbr master: Resolve ``ValueError`` when mapping value contains a literal ``=``. https://review.openstack.org/625372 | 23:11 |
clarkb | notmyname: cool | 23:14 |
*** lbragstad has joined #openstack-infra | 23:17 | |
fungi | clarkb: okay, back from steak night and catching up | 23:20 |
fungi | sounds like you found a bug | 23:20 |
clarkb | fungi: ya, in the process of writing a bug | 23:22 |
*** lbragstad has quit IRC | 23:22 | |
clarkb | just pushed https://github.com/cboylan/ansible_include_tasks_crash to refer to in the bug | 23:22 |
clarkb | as it's easier to do that than to try and do all this in markdown I think | 23:22 |
*** eernst has joined #openstack-infra | 23:26 | |
fungi | sure | 23:26 |
fungi | i would have never figured that out myself, fwiw | 23:27 |
fungi | good job | 23:27 |
fungi | ansible is still so much blackbox to me | 23:27 |
scas | keeping ruby in wet memory for chef leaves ansible pretty opaque for me. it's a trade-off | 23:35 |
clarkb | fungi: dmsimard mordred Shrews https://github.com/ansible/ansible/issues/49969 | 23:36 |
clarkb | imo its a pretty big deal because ansible shouldn't fail and continue running like that | 23:37 |
clarkb | hrm logstash webserver still broken | 23:37 |
*** diablo_rojo has quit IRC | 23:37 | |
clarkb | arg I know why | 23:38 |
fungi | do share | 23:38 |
*** yamamoto has quit IRC | 23:39 | |
openstackgerrit | Clark Boylan proposed openstack-infra/puppet-kibana master: Use full lookup path for serveradmin in template https://review.openstack.org/625374 | 23:41 |
clarkb | fungi: ^ because I fail at puppet | 23:41 |
clarkb | the good news is I fail at ansible equally as evidenced by the include_tasks thing :) | 23:41 |
clarkb | fungi: https://review.openstack.org/625350 and https://review.openstack.org/625374 should fix the two outstanding issues we know we currently have | 23:41 |
clarkb | fungi: on the first one I have no idea if ansible will apply cleanly after not doing so for a few days | 23:42 |
fungi | maybe we just merge and fix any issues we spot over the weekend | 23:42 |
scas | speaking of opaque, what might i do with several cross-repo dependencies that all need each other to pass each build? | 23:43 |
kmalloc | zzzeek: i don't think we need to revert dogpile. what SDK is doing is very .. not normal. | 23:43 |
clarkb | scas: option A build in backward compat/future compat as necessary to get them all happy. option B make tests non voting as necessary to get over hump. | 23:44 |
clarkb | scas: there is an option C too, realize that these peices of software are tightly coupled and might be better off in a single repo | 23:44 |
scas | single repo is not exactly the easiest to manage, since it's configuration management | 23:45 |
scas | option A might be the path forward | 23:45 |
scas | making things non-voting would mean making everything non-voting, negating the testing mechanism | 23:45 |
clarkb | scas: ya B is a short term, thing | 23:46 |
clarkb | that comma is there for I don't know why reasons | 23:46 |
fungi | in openstack, we've held that option a is good engineering and generally downstream-friendly | 23:47 |
clarkb | ++ | 23:47 |
fungi | since at any point in time, the set of your stuff continues to work | 23:47 |
scas | yeah, that's where i'm leaning | 23:47 |
clarkb | after filing this ansible github bug I feel like I've done my good deed of the week | 23:48 |
clarkb | that was a fun one | 23:48 |
clarkb | fungi: fwiw if https://review.openstack.org/#/c/625350/ looks good to you I don't mind keeping one eyeball on irc/bridge logs this evening | 23:48 |
clarkb | if you want to approve it | 23:48 |
fungi | scas: it's more complexity and more iteration for sure, but has the up-side that if one of those changes gets ignored for weeks due to lack of reviewers or a wayward bus, stuff still runs | 23:49 |
clarkb | and I don't mind self approving the logstash fix since I already self approved the broken fix | 23:49 |
clarkb | (maybe this one is broken too) | 23:49 |
scas | i know force-merging is a less favorable option, but i'm not considering that an option at this point | 23:52 |
scas | i'd rather get them testing without having to rely on local testing alone saying all's well | 23:53 |
fungi | scas: at least here that requires cooperation of the admins for the repository hosting platform | 23:53 |
scas | absolutely, and i'm sure none would be too pleased with me asking | 23:54 |
fungi | not that we're an uncooperative bunch, we'll just spend a while telling you why it's a bad idea ;) | 23:54 |
scas | i'm familiar with how bad of an idea it can be. it had to be wielded in the recent months for something else unrelated, as it was the only option in that case | 23:56 |
fungi | yeah, it's not necessarily the worst idea, circumstances depending. though it is usually still a bad idea regardless | 23:57 |
fungi | sometimes all the other ideas are simply worse still | 23:57 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!