*** markvoelker has joined #openstack-infra | 00:00 | |
*** tkelsey has joined #openstack-infra | 00:00 | |
*** jimbaker has joined #openstack-infra | 00:00 | |
*** jimbaker has quit IRC | 00:00 | |
*** jimbaker has joined #openstack-infra | 00:00 | |
cloudnull | pabelanger mordred is there a way we could store the console log from a vm if it's marked error by nodepool? | 00:01 |
pabelanger | cloudnull: Ya, we don't have a way today to keep a node online with ready-script failure. Maybe jeblair has some thoughts on that | 00:01 |
cloudnull | i.e. boot; if it fails, store the console log, then delete? | 00:02 |
clarkb | now that's interesting, ubuntu does run an ntpdate on if-up | 00:02 |
clarkb | I wonder if it's failing to resolve DNS at that point due to unbound not being up? | 00:02 |
pabelanger | we could make our configure_mirror.sh script smarter | 00:02 |
*** Goneri has quit IRC | 00:03 | |
*** spzala has joined #openstack-infra | 00:03 | |
cloudnull | I guess I could enable deferred delete for a while and try to trap the log of instances. | 00:04 |
openstackgerrit | Ian Wienand proposed openstack-infra/project-config: Further F24 kernel update https://review.openstack.org/353783 | 00:04 |
clarkb | it's tempting to just go back to ntpdate, deprecation or not; there don't seem to be any other sane tools to do this | 00:04 |
fungi | clarkb: not all of our providers (/me glares) provide nova console log access, so we haven't relied on it in nodepool previously | 00:05 |
fungi | er, cloudnull ^ | 00:05 |
fungi | sorry, clarkb | 00:05 |
* fungi failx0rz at teh tabcompletes | 00:05 | |
*** tkelsey has quit IRC | 00:05 | |
* cloudnull knows who to glare at... | 00:05 | |
mordred | cloudnull: we have the ability to hold nodes on error in nodepool - but it currently only works on job names | 00:06 |
fungi | so, yes, it's possible obviously. nodepool calls openstack apis, nodepool logs things, nodepool could call another api method and log the results | 00:06 |
fungi | it's "just" a matter of code, as they say | 00:06 |
* mordred muses about a feature to be able to grab an error node from a provider rather than a job | 00:06 | |
*** piet_ has joined #openstack-infra | 00:07 | |
*** baoli has joined #openstack-infra | 00:07 | |
cloudnull | I think for now i'll set the reclaim_instance_interval w/in the nova.conf to something like an hour or so. | 00:07 |
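An illustrative nova.conf snippet for what cloudnull describes; the value is an example rather than his exact setting:

```
[DEFAULT]
# Soft-delete instances and reclaim (really delete) them after about an hour,
# leaving a window to inspect failed nodes.
reclaim_instance_interval = 3600
```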
mordred | fungi: it would not be difficult "just code" to attempt a console log grab on node boot error | 00:08 |
fungi | agreed | 00:08 |
cloudnull | then next time we have an ssh timeout let me know and I can go look at the things. | 00:08 |
mordred | fungi: I'm cooking meat now - but I can make that patch tomorrow | 00:08 |
fungi | yeah, was going to say, as long as it ends up on someone's "just code" list, that's the tricky bit | 00:09 |
clarkb | ianw: pabelanger so I am open to ideas, but even using eg chrony on centos/fedora and ntp on ubuntu/debian isn't going to fix this for us I don't think | 00:09 |
*** tqtran has quit IRC | 00:09 | |
clarkb | ianw: pabelanger since we will continue to run into the problem of slowly skewing time rather than making a step at boot to avoid that | 00:09 |
fungi | speaking of "just code" third batch of contributor registration discount codes for barcelona just finished going out. 270 in ~3 weeks | 00:10 |
*** PalTale has joined #openstack-infra | 00:10 | |
ianw | clarkb: yes, i don't really see ntpdate actually being deprecated, despite what it says. the RH maintainer tells people it's about the only sane way to start ntp | 00:10 |
fungi | used latest state of 263971 for that (i also did lots of additional validation of results against older data to make sure it did what was expected of it) | 00:10 |
fungi | ianw: though the rh maintainer also said not relying on ntp was even saner on rh-derivatives since it's no longer default | 00:11 |
clarkb | ianw: even using chronyd you have to do non-default things to make it actually step, from my reading | 00:11 |
clarkb | basically the time sync services as implemented by these distros don't solve this problem | 00:12 |
clarkb | which is annoying | 00:12 |
harlowja | mordred ' You might even come to the conclusion that my personal preferences | 00:12 |
harlowja | or needs are not the most important thing. I' | 00:12 |
harlowja | but they are! | 00:12 |
harlowja | ha | 00:12 |
mordred | harlowja: :) | 00:12 |
harlowja | as long as your preferences are my preferences | 00:12 |
harlowja | lol | 00:12 |
*** spzala has quit IRC | 00:12 | |
openstackgerrit | Chris Krelle proposed openstack/diskimage-builder: WIP: A hardware burn-in element. https://review.openstack.org/355675 | 00:13 |
ianw | fungi: yes, that too | 00:13 |
mordred | harlowja: listen - you are entitled to your own wrong opinion | 00:13 |
fungi | clarkb: i wonder if the openntpd package for debian/ubuntu has a config option to start with -s | 00:13 |
harlowja | mordred not if donald gets elected, lol | 00:13 |
mordred | harlowja: I believe it's a basic human right | 00:13 |
* mordred steps away from election talk ... | 00:13 | |
harlowja | hahahahha | 00:13 |
ianw | clarkb: i believe "chronyc makestep" is the ntpdate equivalent | 00:14 |
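For context, the command ianw is referring to, run against an already-running chronyd:

```
# Hedged example: force an immediate clock step instead of slewing slowly.
chronyc makestep
```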
* harlowja goes right into the deep end | 00:14 | |
openstackgerrit | Abhishek Raut proposed openstack-infra/project-config: Use python-db-jobs for tap-as-a-service https://review.openstack.org/355670 | 00:14 |
clarkb | ianw: yup but that doesn't happen for us on boot | 00:15 |
clarkb | ianw: so we would have to write our own service to do it or otherwise hack it in | 00:15 |
openstackgerrit | Merged openstack-infra/system-config: Fix firehose hostname on cacti hiera https://review.openstack.org/355671 | 00:19 |
ianw | clarkb: the config is | 00:22 |
ianw | # In first three updates step the system clock instead of slew | 00:22 |
ianw | # if the adjustment is larger than 1 second. | 00:22 |
ianw | makestep 1.0 3 | 00:22 |
clarkb | ianw: that's the chronyd default config on centos/fedora? | 00:23 |
*** Hal has quit IRC | 00:23 | |
ianw | clarkb: yes | 00:23 |
clarkb | ah ok | 00:23 |
ianw | so, of course systemd is in the mix here | 00:23 |
clarkb | yes systemd has its own service to do syncing | 00:24 |
clarkb | but it has almost no docs | 00:24 |
ianw | oh, that's not in use, but i think chrony does have network detection service bits | 00:24 |
clarkb | ianw: it's in use on ubuntu xenial :/ | 00:24 |
ianw | particularly http://pkgs.fedoraproject.org/cgit/rpms/chrony.git/tree/chrony-dnssrv@.service | 00:24 |
*** Swami has quit IRC | 00:24 | |
clarkb | by default | 00:24 |
openstackgerrit | YAMAMOTO Takashi proposed openstack-infra/project-config: networking-midonet: switch to python-db-jobs https://review.openstack.org/335551 | 00:25 |
*** gildub has joined #openstack-infra | 00:26 | |
ianw | clarkb: ah ... so now we have 3 methods to set the time | 00:27 |
clarkb | ianw: indeed :( | 00:27 |
*** fitoduarte has quit IRC | 00:27 | |
clarkb | ianw: though I am somewhat partial to just using one across the board if it can be made to work sanely | 00:27 |
clarkb | chronyd seems fine except ubuntu doesn't seem to have that makestep setup that centos/fedora do | 00:27 |
*** thorst_ has joined #openstack-infra | 00:28 | |
clarkb | I wonder if we can configure that via the default file somehow | 00:28 |
*** piet_ has quit IRC | 00:29 | |
*** woodster_ has quit IRC | 00:29 | |
*** signed8bit is now known as signed8bit_Zzz | 00:30 | |
ianw | clarkb: just looking at the deb packaging now... | 00:31 |
clarkb | ianw: I did confirm that an ubuntu-minimal build of xenial boots up with the systemd service running | 00:31 |
clarkb | and trusty doesn't have anything | 00:32 |
*** gildub_ has joined #openstack-infra | 00:32 | |
clarkb | ianw: I think the ideal situation would be to have, on each distro we run, something that does the equivalent of ntpdate first and then ntpd, then completely remove the ntp munging from devstack-gate. Sounds like centos/fedora do this with chrony, so we need to figure out an ubuntu/debian option that works | 00:33 |
*** xarses has quit IRC | 00:34 | |
clarkb | my reading of the ntp setup on ubuntu/debian is that it will try to run ntpdate on if-up, but that doesn't seem to be working for us? Maybe because of a race between unbound and networking coming up when resolving the ntp servers | 00:34 |
*** gongysh has joined #openstack-infra | 00:36 | |
clarkb | ianw: and the chrony package on ubuntu will do a burst but not a step from my reading of scripts | 00:38 |
ianw | clarkb: yeah, so config in https://launchpad.net/ubuntu/+archive/primary/+files/chrony_2.1.1-1.debian.tar.xz doesn't specify makestep as you say. to me, a bug saying "redhat does it, it would be nice to be consistent and it's probably what you want anyway" might be ok | 00:38 |
ianw | but time people also seem very, ahh, set in their ways | 00:38 |
ianw | so i expect that might also be closed with a flame to boot | 00:39 |
* fungi always boots in flames | 00:39 | |
fungi | and in flaming boots | 00:39 |
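A minimal sketch of carrying the Fedora-style step behaviour over to Ubuntu, assuming the Debian packaging's default config path and that nothing else already sets makestep:

```
# Append the step directive ianw quoted above to Ubuntu's chrony config and restart.
cat >> /etc/chrony/chrony.conf <<'EOF'
# Step the system clock in the first three updates if the offset exceeds 1 second
makestep 1.0 3
EOF
service chrony restart
```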
clarkb | might be worth attempting to trace the normal ntp boot up and see if ntpdate is in fact running and if it is failing due to other deps not being there at boot | 00:41 |
clarkb | I have some local VMs I can use to try and attempt that but need to go to dinner now | 00:41 |
*** rbuzatu has joined #openstack-infra | 00:41 | |
ianw | clarkb: i have an osic vm i've been pottering on for f24. let me rebuild that with a ubuntu image and see if anything pops out | 00:42 |
cloudnull | clarkb mordred pabelanger I have deferred deletes enabled now. if at all possible I'd love to know the next time an instance has ssh issues so I can go hunt down the specific failures and such. | 00:45 |
pabelanger | cloudnull: sure, I can check now | 00:46 |
*** rbuzatu has quit IRC | 00:46 | |
*** amotoki has joined #openstack-infra | 00:47 | |
pabelanger | cloudnull: 8098e5c0-125f-4fda-9887-496f8f7fdf7d | 00:48 |
pabelanger | cloudnull: just failed | 00:48 |
*** jamielennox is now known as jamielennox|away | 00:48 | |
cloudnull | ok | 00:48 |
ianw | clarkb: ok, so with ntpdate not in the base image, it's not starting on boot for sure | 00:49 |
*** spzala has joined #openstack-infra | 00:49 | |
*** jamielennox|away is now known as jamielennox | 00:49 | |
*** tonytan4ever has joined #openstack-infra | 00:50 | |
pabelanger | cloudnull: 7b3e102d-3f32-4a76-9e44-abf0b42dad4d is another | 00:50 |
ianw | clarkb: and when it is there, it is called in the network ifup scripts, but -> Aug 16 00:49:35 iwienand-f24-test ntpdate[816]: Can't find host 3.debian.pool.ntp.org: Name or service not known (-2) | 00:50 |
*** csmart has quit IRC | 00:52 | |
*** csmart has joined #openstack-infra | 00:53 | |
cloudnull | pabelanger: idk if it's related but both of those instances are 16.04? do we generally see these ssh failures more on 16.04 than not? | 00:54 |
cloudnull | or is 16.04 just what's more common now? | 00:55 |
pabelanger | cloudnull: let me check, I have logs. we are doing more and more xenial | 00:55 |
cloudnull | also both are using config_drive, is that the default? | 00:56 |
cloudnull | i'd like to spin up lots of tests to reproduce this issue without continuing to bother you :) | 00:56 |
*** fguillot_ has quit IRC | 00:56 | |
openstackgerrit | fumihiko kakuma proposed openstack-infra/devstack-gate: Enable to add sudo permission to tempest user https://review.openstack.org/355682 | 00:57 |
*** gyee has quit IRC | 00:59 | |
*** fguillot_ has joined #openstack-infra | 00:59 | |
pabelanger | cloudnull: you are correct, it looks to be only xenial failing | 01:00 |
pabelanger | cloudnull: let me manually launch one and see why | 01:00 |
*** rbuzatu has joined #openstack-infra | 01:04 | |
*** aeng has quit IRC | 01:04 | |
*** gongysh has quit IRC | 01:05 | |
*** aeng has joined #openstack-infra | 01:05 | |
*** zhurong has joined #openstack-infra | 01:05 | |
ianw | clarkb: so here's how i think it goes on trusty. ./network/if-up.d/ntpdate gets called by ifup ... but dhclient is still working at that point. that's ok, because ./dhcp/dhclient-exit-hooks.d/ntpdate will be called when we actually have network | 01:05 |
ianw | clarkb: none of this happens on boot of our trusty images, because ntpdate isn't installed | 01:05 |
*** esberglu has joined #openstack-infra | 01:06 | |
ianw | which is probably the fault of puppet-ntp ... i don't think ntpdate is really an optional component | 01:07 |
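A quick hedged check for the situation ianw describes, using the hook paths mentioned above:

```
# Is ntpdate installed, and are its network hooks present on the node?
dpkg -l ntpdate
ls -l /etc/network/if-up.d/ntpdate /etc/dhcp/dhclient-exit-hooks.d/ntpdate
```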
*** pahuang has joined #openstack-infra | 01:07 | |
*** rbuzatu has quit IRC | 01:08 | |
*** tqtran has joined #openstack-infra | 01:09 | |
clarkb | aha! | 01:09 |
*** baoli has quit IRC | 01:10 | |
*** julim has joined #openstack-infra | 01:10 | |
pabelanger | clarkb: cloudnull: okay, reproduced the failure of host git.openstack.org in osic-cloud1. I think we have a race condition, if I ran the command 1min later, it worked | 01:11 |
*** adrian_otto has quit IRC | 01:11 | |
clarkb | or maybe a NAT issue? | 01:11 |
pabelanger | possible | 01:12 |
pabelanger | let me force ipv6 and reboot | 01:12 |
*** baoli has joined #openstack-infra | 01:12 | |
pabelanger | going to also check that sshd depends on unbound too | 01:12 |
*** tqtran has quit IRC | 01:13 | |
cloudnull | maybe we can add something like this to the script http://cdn.pasteraw.com/cs48x75pis3n67r63j5mgc0a3fsscur ? | 01:14 |
pabelanger | ya, unbound is taking a while to start | 01:14 |
cloudnull | then it can try for a min or two before failing ? | 01:14 |
*** weshay has quit IRC | 01:15 | |
pabelanger | http://paste.openstack.org/show/557770/ | 01:16 |
pabelanger | unbound is taking about 1 min to start | 01:16 |
pabelanger | err | 01:16 |
pabelanger | yes, 1 min | 01:16 |
*** esberglu has quit IRC | 01:17 | |
*** gildub_ has quit IRC | 01:18 | |
*** gildub has quit IRC | 01:18 | |
*** jimbaker has quit IRC | 01:18 | |
*** gildub has joined #openstack-infra | 01:19 | |
cloudnull | pabelanger: rather... http://cdn.pasteraw.com/n57rvu8vw6w3q8mzd9s0hiua5i8v677 -- forgot an import loop there ;) | 01:19 |
*** Apoorva_ has joined #openstack-infra | 01:19 | |
pabelanger | cloudnull: Ya, we could try polling a few times. Let me see why unbound is taking 1 min to start | 01:20 |
*** asettle has joined #openstack-infra | 01:22 | |
*** jimbaker has joined #openstack-infra | 01:22 | |
*** jimbaker has quit IRC | 01:22 | |
*** jimbaker has joined #openstack-infra | 01:22 | |
*** Apoorva has quit IRC | 01:23 | |
*** Apoorva_ has quit IRC | 01:24 | |
cloudnull | going to grab a bite, back in a while. | 01:24 |
*** rajinir has quit IRC | 01:25 | |
*** spzala has quit IRC | 01:29 | |
*** spzala has joined #openstack-infra | 01:30 | |
pabelanger | cloudnull: clarkb: So, I think unbound is blocking on key generation: http://paste.openstack.org/show/557771/ waiting for randomness from the kernel | 01:31 |
pabelanger | cloudnull: clarkb: so, we can either make configure_mirror.sh smarter by polling the unbound service status every 30 seconds, up to 10 times: http://paste.openstack.org/show/557773/ | 01:32 |
pabelanger | cloudnull: clarkb: see if we can preseed the key, or disable the key | 01:32 |
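A minimal sketch of the polling option; the pasted scripts above are not archived here, so this is an assumption about their shape rather than a copy:

```
# Poll the local resolver until it answers, giving unbound time to finish starting;
# give up after 10 attempts spaced 30 seconds apart.
for attempt in $(seq 1 10); do
    if host git.openstack.org >/dev/null 2>&1; then
        break
    fi
    sleep 30
done
```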
*** asettle has quit IRC | 01:32 | |
*** aeng has quit IRC | 01:33 | |
*** spzala has quit IRC | 01:34 | |
*** dkehn_ has quit IRC | 01:34 | |
*** dkehn has quit IRC | 01:34 | |
*** thorst_ has quit IRC | 01:38 | |
*** thorst_ has joined #openstack-infra | 01:39 | |
*** rfolco has quit IRC | 01:39 | |
*** hparekh has quit IRC | 01:40 | |
*** nwkarsten has joined #openstack-infra | 01:43 | |
*** baoli has quit IRC | 01:43 | |
*** amotoki has quit IRC | 01:43 | |
*** Sukhdev has quit IRC | 01:44 | |
*** gongysh has joined #openstack-infra | 01:45 | |
*** elo has quit IRC | 01:45 | |
*** dkehn has joined #openstack-infra | 01:47 | |
*** dkehn_ has joined #openstack-infra | 01:47 | |
*** thorst_ has quit IRC | 01:48 | |
*** yanyanhu has joined #openstack-infra | 01:48 | |
*** larainema has quit IRC | 01:49 | |
openstackgerrit | Tim Burke proposed openstack-dev/hacking: Add optional H203 to check that assertIs(Not)None is used https://review.openstack.org/276517 | 01:50 |
*** baoli has joined #openstack-infra | 01:51 | |
*** vinaypotluri has quit IRC | 01:51 | |
*** hparekh has joined #openstack-infra | 01:51 | |
*** gongysh has quit IRC | 01:55 | |
*** tkelsey has joined #openstack-infra | 02:02 | |
*** thorst_ has joined #openstack-infra | 02:02 | |
*** thorst_ has quit IRC | 02:03 | |
*** inc0 has joined #openstack-infra | 02:03 | |
openstackgerrit | James Polley proposed openstack-dev/pbr: Fix handling of old git log output https://review.openstack.org/339392 | 02:03 |
*** gongysh has joined #openstack-infra | 02:04 | |
*** dimtruck is now known as zz_dimtruck | 02:05 | |
*** rbuzatu has joined #openstack-infra | 02:05 | |
*** zz_dimtruck is now known as dimtruck | 02:05 | |
openstackgerrit | zhangyanxian proposed openstack-infra/project-config: Fix typo in the Pypi-extract-name.py https://review.openstack.org/355692 | 02:05 |
*** tkelsey has quit IRC | 02:06 | |
*** jamielennox is now known as jamielennox|away | 02:07 | |
*** xarses has joined #openstack-infra | 02:07 | |
openstackgerrit | zhangyanxian proposed openstack-infra/project-config: Fix typo in the Pypi-extract-name.py https://review.openstack.org/355692 | 02:09 |
openstackgerrit | zhangyanxian proposed openstack-infra/project-config: Fix typo in the pypi-extract-name.py https://review.openstack.org/355692 | 02:09 |
*** rbuzatu has quit IRC | 02:10 | |
openstackgerrit | Merged openstack-infra/project-config: Further F24 kernel update https://review.openstack.org/353783 | 02:10 |
openstackgerrit | James Polley proposed openstack-dev/pbr: Fix handling of old git log output https://review.openstack.org/339392 | 02:11 |
*** pradk has quit IRC | 02:12 | |
openstackgerrit | James Polley proposed openstack-dev/pbr: Fix handling of old git log output https://review.openstack.org/339392 | 02:18 |
*** aeng has joined #openstack-infra | 02:19 | |
*** gongysh has quit IRC | 02:20 | |
openstackgerrit | Timothy R. Chavez proposed openstack-infra/jenkins-job-builder: Use xml_jobs not jobs https://review.openstack.org/355694 | 02:20 |
*** baoli has quit IRC | 02:21 | |
*** elo has joined #openstack-infra | 02:22 | |
openstackgerrit | Paul Belanger proposed openstack-infra/project-config: Add smarter dns checking for configure_mirror.sh https://review.openstack.org/355695 | 02:22 |
pabelanger | cloudnull: clarkb: fungi: So, that should fix our launch node errors around DNS not working ^. In the case of osic-cloud1 and ubuntu-xenial, we are SSHing into the node and running host git.openstack.org before unbound has finished starting | 02:24 |
*** raunak has quit IRC | 02:25 | |
*** jamielennox|away is now known as jamielennox | 02:26 | |
timrc | zxiiro: Hi... it looks like 80aa5266166dfcc84be765060cae7c6eac363ecd caused a regression. See: https://review.openstack.org/#/c/355694/ | 02:27 |
*** mriedem is now known as mriedem_away | 02:27 | |
timrc | zxiiro: Use of --delete-old with commit 80aa5266166dfcc84be765060cae7c6eac363ecd will delete every job. | 02:27 |
fungi | pabelanger: what are the odds that we're not preinstalling haveged on our nodes, resulting in entropy starvation? | 02:29 |
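A hedged way to check for that on a booted node; whether haveged is actually missing from the images is exactly what fungi is asking:

```
# Low values here suggest the kernel entropy pool is starved shortly after boot.
cat /proc/sys/kernel/random/entropy_avail
# If haveged really is absent, installing it keeps the pool topped up.
apt-get install -y haveged
```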
fungi | timrc: i thought they fixed that last week? | 02:29 |
zxiiro | i thought we fixed it too. I've been using it on my systems with no issue | 02:30 |
zxiiro | i'm not sure what the difference between passing xml_jobs instead of jobs is. Jobs is what is returned from jenkins as the list of all jobs that were updated, hence they shouldn't be deleted. xml_jobs should be the same too? | 02:31 |
timrc | Not what I'm seeing... | 02:31 |
zxiiro | timrc: how are you running your command? jenkins-jobs update --delete-old jjbs/ ? | 02:33 |
*** asettle has joined #openstack-infra | 02:33 | |
timrc | zxiiro: Essentially, e.g. jenkins-jobs --conf /etc/jenkins_jobs/jenkins_jobs.ini update ./jjb-jobs/servers/`hostname` --delete-old | 02:34 |
*** netsin has quit IRC | 02:34 | |
*** signed8bit_Zzz is now known as signed8bit | 02:37 | |
timrc | zxiiro: From my console, running the script that runs whenever our jobs repo changes... http://paste.openstack.org/show/557860/ | 02:38 |
zxiiro | timrc: well let me test it real quick and if it works for me I'll merge it | 02:39 |
*** mdrabe has joined #openstack-infra | 02:39 | |
*** asettle has quit IRC | 02:40 | |
*** tphummel has quit IRC | 02:40 | |
*** vinaypotluri has joined #openstack-infra | 02:42 | |
timrc | zxiiro: I think the jobs list that gets returned by update_jobs is just the list of jobs that changed.. so if no jobs changed, for example, it returns []. That empty list gets passed as the "keeps" list. Since no jobs are in that list, they all get removed. | 02:44 |
*** hongbin has joined #openstack-infra | 02:44 | |
*** bin_ has quit IRC | 02:44 | |
timrc | If we use xml_jobs the "keeps" list will always be every job in config, regardless of whether it changed or not. | 02:45 |
timrc | Which is exactly what we want, I think. | 02:45 |
*** zhenguo has joined #openstack-infra | 02:46 | |
timrc | --delete-old should presumably just delete the jobs which are no longer in config. | 02:46 |
zxiiro | timrc: yeah i'm testing that theory now. I want to make sure we understand the difference between the 2 | 02:48 |
zxiiro | timrc: i suspect i didn't catch it in testing because i run my system with ignore_cache=True | 02:48 |
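To illustrate the failure mode and zxiiro's cache point with concrete invocations (paths are examples, and the workaround is a suggestion rather than something tested here):

```
# With a warm cache and no job changes, update returns an empty "changed" list,
# so --delete-old treats every job on the master as deletable:
jenkins-jobs --conf /etc/jenkins_jobs/jenkins_jobs.ini update ./jjb-jobs --delete-old
# Bypassing the cache (equivalent to ignore_cache=True) avoids tripping the bug:
jenkins-jobs --ignore-cache --conf /etc/jenkins_jobs/jenkins_jobs.ini update ./jjb-jobs --delete-old
```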
*** gongysh_ has joined #openstack-infra | 02:48 | |
*** yuanying has quit IRC | 02:49 | |
zxiiro | timrc: Ok I just confirmed it | 02:49 |
zxiiro | timrc: you're right. jobs returns only updated jobs, so if you cached and your jobs didn't update they won't be in the list. xml_jobs is the right thing to use | 02:50 |
zxiiro | timrc: can you update the commit message to explain that? | 02:50 |
zxiiro | timrc: I'll approve the change right away once you do that | 02:50 |
*** elo has quit IRC | 02:51 | |
*** yuanying has joined #openstack-infra | 02:52 | |
*** jimbaker has quit IRC | 02:53 | |
openstackgerrit | Jeremy Stanley proposed openstack-infra/system-config: Add a script to list change owner statistics https://review.openstack.org/263971 | 02:53 |
*** yamahata has quit IRC | 02:54 | |
*** inc0 has quit IRC | 02:55 | |
*** elo has joined #openstack-infra | 02:57 | |
*** jimbaker has joined #openstack-infra | 02:57 | |
*** jimbaker has quit IRC | 02:57 | |
*** jimbaker has joined #openstack-infra | 02:57 | |
*** gongysh_ has quit IRC | 02:57 | |
ianw | is it possible there's something up with the nodepool builder? | 02:58 |
*** signed8bit is now known as signed8bit_Zzz | 03:00 | |
pabelanger | fungi: I am not sure, I'd have to check. I've never used haveged before either | 03:00 |
pabelanger | ianw: I kicked off a build an hour or so ago | 03:01 |
pabelanger | looks like ubuntu-xenial is just finishing up | 03:01 |
pabelanger | actually, done now | 03:01 |
*** baoli has joined #openstack-infra | 03:01 | |
ianw | pabelanger: ahh, yeah, sorry should have checked the debug log | 03:01 |
*** krtaylor has joined #openstack-infra | 03:02 | |
pabelanger | fungi: it is installed for ubuntu-xenial | 03:02 |
*** thorst_ has joined #openstack-infra | 03:03 | |
*** yamahata has joined #openstack-infra | 03:03 | |
*** dimtruck is now known as zz_dimtruck | 03:07 | |
ianw | pabelanger: what's up with that -> OpenStackCloudException: Image creation failed: delete() takes exactly 2 arguments (1 given) | 03:07 |
*** raunak has joined #openstack-infra | 03:08 | |
*** signed8bit_Zzz is now known as signed8bit | 03:09 | |
pabelanger | ianw: never seen that | 03:10 |
pabelanger | ianw: I did delete some old DIB images from nodepool.o.o tonight however | 03:10 |
*** apetrich has joined #openstack-infra | 03:10 | |
*** netsin has joined #openstack-infra | 03:10 | |
*** nwkarste_ has joined #openstack-infra | 03:11 | |
pabelanger | ianw: looks like a bug in shade | 03:11 |
pabelanger | mordred: ^ | 03:11 |
ianw | pabelanger: yeah ... odd traceback | 03:11 |
ianw | http://paste.openstack.org/show/557882/ | 03:11 |
*** thorst_ has quit IRC | 03:11 | |
*** elo has quit IRC | 03:12 | |
zxiiro | timrc: looks like you're not here. I'll update the commit message | 03:12 |
*** nwkarsten has quit IRC | 03:13 | |
*** fguillot_ has quit IRC | 03:14 | |
openstackgerrit | Thanh Ha proposed openstack-infra/jenkins-job-builder: Use xml_jobs not jobs https://review.openstack.org/355694 | 03:14 |
*** elo has joined #openstack-infra | 03:17 | |
*** nwkarste_ has quit IRC | 03:18 | |
ianw | pabelanger: that tb really makes no sense ... i get the feeling the builder process might not be running the same code as on disk... | 03:19 |
pabelanger | ianw: possible, you can restart it if you want, I am done for the night | 03:20 |
*** raunak has quit IRC | 03:20 | |
ianw | pabelanger: ok, no worries, i'll see, numbers might make sense on an older release | 03:21 |
*** raunak has joined #openstack-infra | 03:21 | |
timrc | zxiiro: Sorry, was putting my daughter to sleep. Reading up | 03:22 |
zxiiro | timrc: no worries. once jenkins returns I will merge it | 03:24 |
*** psilvad has quit IRC | 03:25 | |
timrc | zxiiro: Excellent. Thanks! | 03:25 |
zxiiro | timrc: no, thank you for reporting and fixing the issue! | 03:25 |
*** baoli has quit IRC | 03:26 | |
*** rbuzatu has joined #openstack-infra | 03:26 | |
ianw | pabelanger: to answer my own question, the shade .py files are from the 13th, and the builder was started on the 12th. so yeah, the numbers don't line up in the tb | 03:27 |
*** shashank_hegde has joined #openstack-infra | 03:30 | |
*** rbuzatu has quit IRC | 03:31 | |
ianw | yep, 1.9.0 makes much more sense | 03:32 |
*** signed8bit has quit IRC | 03:34 | |
*** signed8bit has joined #openstack-infra | 03:34 | |
*** shashank_hegde has quit IRC | 03:36 | |
beagles | meh, still have really weird issues with zuul ansible on ubuntu. "async task produced unparseable results" shows up in the ansible log and the job fails | 03:38 |
*** signed8bit has quit IRC | 03:38 | |
*** signed8b_ has joined #openstack-infra | 03:39 | |
*** julim has quit IRC | 03:42 | |
*** vikrant has joined #openstack-infra | 03:42 | |
*** yamahata has quit IRC | 03:43 | |
*** roxanaghe has joined #openstack-infra | 03:43 | |
*** roxanaghe has quit IRC | 03:43 | |
*** hongbin has quit IRC | 03:45 | |
*** ramishra has quit IRC | 03:45 | |
*** rajinir has joined #openstack-infra | 03:45 | |
*** nwkarsten has joined #openstack-infra | 03:46 | |
beagles | pabelanger, you still around - if so, should that possible fix to ^^^ have propagated through to where it'd get picked up on a recheck? | 03:47 |
ianw | beagles: we were having issues with that on fedora, which had to do with locales on the host and it outputting error messages that got things confused | 03:47 |
beagles | ouch. how did you resolve it? | 03:48 |
clarkb | beagles: ianw my understanding is jeblair kicked off some restarts to pick up new ansible today | 03:48 |
*** yuanying has quit IRC | 03:48 | |
*** roxanaghe has joined #openstack-infra | 03:48 | |
clarkb | the ansible fix merged but is not yet released | 03:48 |
jeblair | yeah, should be in place. we may need to hold a node to debug further. (i can't do that now) | 03:48 |
beagles | clarkb, awwww okay | 03:48 |
ianw | beagles: fixed the locales in the image build :) but yeah, ansible did fix it in a later release | 03:48 |
beagles | clarkb, I had sifted through IRC backlog and misunderstood - thought it was "in the mix" | 03:49 |
openstackgerrit | Merged openstack-infra/jenkins-job-builder: Use xml_jobs not jobs https://review.openstack.org/355694 | 03:49 |
jeblair | beagles: it is in place -- we are running unreleased ansible to get it | 03:49 |
*** ramishra has joined #openstack-infra | 03:51 | |
*** yuanying has joined #openstack-infra | 03:51 | |
beagles | jeblair, okay nice.. how long ago would it have been available? Just want to confirm these jobs were launched before they would've gotten the fix | 03:51 |
jeblair | beagles: i think i status logged it... 1 sec | 03:52 |
jeblair | beagles: https://wiki.openstack.org/wiki/Infrastructure_Status says | 03:52 |
jeblair | 2016-08-15 20:34:14 UTC Installed ansible stable-2.1 branch on zuul launchers to pick up https://github.com/ansible/ansible/commit/d35377dac78a8fcc6e8acf0ffd92f47f44d70946 | 03:52 |
*** nwkarsten has quit IRC | 03:52 | |
*** nwkarsten has joined #openstack-infra | 03:53 | |
beagles | jeblair, crap.. then unless I'm missing something it should've been picked up.. 1s | 03:54 |
beagles | jeblair, is there something in the ansible logs, etc. I can spot to check what version was being used? | 03:55 |
*** signed8bit has joined #openstack-infra | 03:56 | |
*** asettle has joined #openstack-infra | 03:56 | |
*** nwkarsten has quit IRC | 03:57 | |
*** signed8b_ has quit IRC | 03:58 | |
*** winggundamth has quit IRC | 03:59 | |
*** asettle has quit IRC | 04:01 | |
prometheanfire | think I may have found a bug in git-review/gerrit | 04:06 |
prometheanfire | maybe | 04:06 |
prometheanfire | can you git-review to the same change-id but a different branch? | 04:07 |
prometheanfire | huh, you can | 04:07 |
prometheanfire | nvm then lol | 04:07 |
prometheanfire | https://review.openstack.org/#/q/I67d7a5000bfe0c98717d3e29d23edc9c6117e765,n,z | 04:07 |
*** thorst_ has joined #openstack-infra | 04:10 | |
*** tqtran has joined #openstack-infra | 04:10 | |
beagles | jeblair, actually .. what I'm looking at looks like a timeout... wow | 04:10 |
clarkb | prometheanfire: yes change ids are not unique | 04:12 |
clarkb | prometheanfire: the unique tuple is project, branch, change id | 04:13 |
*** hichihara has joined #openstack-infra | 04:13 | |
prometheanfire | just realized that :D | 04:13 |
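For example (branch names are placeholders), the same Change-Id can be proposed to two branches, and gerrit keys each change on (project, branch, Change-Id):

```
# Push the same commit (same Change-Id footer) to two branches of one project.
git review master
git review stable/mitaka
```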
*** winggundamth has joined #openstack-infra | 04:14 | |
prometheanfire | toabctl: for you to watch I guess https://review.openstack.org/353349 https://review.openstack.org/355711 | 04:14 |
*** tqtran has quit IRC | 04:14 | |
prometheanfire | bah | 04:14 |
prometheanfire | toabctl: sorry, mistype | 04:14 |
prometheanfire | tonyb: for you to watch I guess https://review.openstack.org/353349 https://review.openstack.org/355711 | 04:15 |
prometheanfire | tonyb: though you might be done working | 04:15 |
openstackgerrit | kyle liu proposed openstack-infra/project-config: Add new project networking-zte https://review.openstack.org/355278 | 04:16 |
prometheanfire | also, if someone has some time to review... https://review.openstack.org/#/c/310865/ | 04:17 |
*** thorst_ has quit IRC | 04:17 | |
*** rlandy has quit IRC | 04:19 | |
*** jimbaker has quit IRC | 04:20 | |
*** links has joined #openstack-infra | 04:20 | |
*** sflanigan has joined #openstack-infra | 04:22 | |
*** sflanigan has joined #openstack-infra | 04:22 | |
*** raunak has quit IRC | 04:22 | |
*** jimbaker has joined #openstack-infra | 04:23 | |
*** jimbaker has quit IRC | 04:23 | |
*** jimbaker has joined #openstack-infra | 04:23 | |
*** raunak has joined #openstack-infra | 04:24 | |
*** javeriak has joined #openstack-infra | 04:26 | |
openstackgerrit | Ian Wienand proposed openstack-infra/shade: Use "image" as argument for Glance V1 upload error path https://review.openstack.org/355715 | 04:27 |
*** tonytan4ever has quit IRC | 04:27 | |
ianw | pabelanger: ^ re that error. | 04:27 |
ianw | that fixed, i'll restart the builder now since it's quiet and so it's running the same code that's actually on disk :) | 04:28 |
*** javeriak has quit IRC | 04:31 | |
*** kzaitsev_mb has joined #openstack-infra | 04:38 | |
ianw | i wonder why "nodepool image-build fedora-24" gets stuck? | 04:42 |
*** Sukhdev has joined #openstack-infra | 04:43 | |
*** javeriak has joined #openstack-infra | 04:45 | |
*** rbuzatu has joined #openstack-infra | 04:48 | |
*** pgadiya has joined #openstack-infra | 04:48 | |
*** sarob has joined #openstack-infra | 04:49 | |
*** signed8bit has quit IRC | 04:52 | |
*** sarob has quit IRC | 04:53 | |
*** mdrabe has quit IRC | 04:54 | |
*** rbuzatu has quit IRC | 04:54 | |
*** psachin has joined #openstack-infra | 04:59 | |
*** arnewiebalck has quit IRC | 05:00 | |
*** jimbaker has quit IRC | 05:00 | |
*** tonytan4ever has joined #openstack-infra | 05:03 | |
*** kzaitsev_mb has quit IRC | 05:03 | |
*** jimbaker has joined #openstack-infra | 05:04 | |
*** jimbaker has quit IRC | 05:04 | |
*** jimbaker has joined #openstack-infra | 05:04 | |
*** elo has quit IRC | 05:04 | |
*** raunak has quit IRC | 05:05 | |
*** raunak has joined #openstack-infra | 05:06 | |
*** thorst_ has joined #openstack-infra | 05:15 | |
*** senk_ has joined #openstack-infra | 05:16 | |
*** _nadya_ has joined #openstack-infra | 05:19 | |
*** raunak has quit IRC | 05:20 | |
*** raunak has joined #openstack-infra | 05:21 | |
*** thorst_ has quit IRC | 05:22 | |
*** _nadya_ has quit IRC | 05:24 | |
*** Sukhdev has quit IRC | 05:26 | |
*** kushal has joined #openstack-infra | 05:29 | |
*** jaosorior has joined #openstack-infra | 05:30 | |
*** raunak has quit IRC | 05:35 | |
*** hichihara has quit IRC | 05:36 | |
*** baoli has joined #openstack-infra | 05:38 | |
*** rbuzatu has joined #openstack-infra | 05:39 | |
*** ccamacho has joined #openstack-infra | 05:40 | |
*** shashank_hegde has joined #openstack-infra | 05:42 | |
*** baoli has quit IRC | 05:42 | |
*** M-docaedo_vector has quit IRC | 05:43 | |
*** raunak has joined #openstack-infra | 05:43 | |
*** senk_ has quit IRC | 05:45 | |
*** roxanaghe has quit IRC | 05:45 | |
*** r-mibu has quit IRC | 05:46 | |
*** tonytan4ever has quit IRC | 05:46 | |
beagles | is it possible to log in to a node and see what's going on if it looks like jobs are hung? | 05:47 |
openstackgerrit | guo yunxian proposed openstack/os-testr: Add support for Python versions https://review.openstack.org/355730 | 05:48 |
*** dkehn_ has quit IRC | 05:48 | |
*** dkehn has quit IRC | 05:49 | |
*** shashank_hegde has quit IRC | 05:49 | |
*** raunak has quit IRC | 05:50 | |
*** markusry has joined #openstack-infra | 05:50 | |
openstackgerrit | guo yunxian proposed openstack/os-testr: Add support for Python versions https://review.openstack.org/355730 | 05:51 |
*** tonytan4ever has joined #openstack-infra | 05:54 | |
*** rajinir has quit IRC | 05:55 | |
*** raunak has joined #openstack-infra | 05:55 | |
*** dkehn has joined #openstack-infra | 05:55 | |
ianw | beagles: yes, we can hold a node and give you a login, but it's a manual process | 05:56 |
*** slaweq_ has joined #openstack-infra | 05:57 | |
*** oanson has joined #openstack-infra | 05:58 | |
*** markvoelker has quit IRC | 05:58 | |
*** dkehn_ has joined #openstack-infra | 06:01 | |
openstackgerrit | Ian Wienand proposed openstack-infra/system-config: Pre-install python2-requests package for Fedora https://review.openstack.org/355731 | 06:01 |
*** sandanar has joined #openstack-infra | 06:02 | |
*** pabelanger has quit IRC | 06:02 | |
*** pabelanger has joined #openstack-infra | 06:03 | |
*** ccamacho is now known as ccamacho|afk | 06:04 | |
*** tonytan4ever has quit IRC | 06:04 | |
*** tkelsey has joined #openstack-infra | 06:05 | |
*** r-mibu has joined #openstack-infra | 06:06 | |
*** raunak has quit IRC | 06:09 | |
*** florianf has joined #openstack-infra | 06:09 | |
*** tkelsey has quit IRC | 06:09 | |
*** M-docaedo_vector has joined #openstack-infra | 06:10 | |
*** tqtran has joined #openstack-infra | 06:11 | |
*** markusry has quit IRC | 06:11 | |
*** jimbaker has quit IRC | 06:13 | |
*** rcernin has joined #openstack-infra | 06:14 | |
*** tqtran has quit IRC | 06:15 | |
*** elo has joined #openstack-infra | 06:16 | |
*** jimbaker has joined #openstack-infra | 06:17 | |
*** raunak has joined #openstack-infra | 06:17 | |
*** jimbaker has quit IRC | 06:17 | |
*** jimbaker has joined #openstack-infra | 06:17 | |
*** javeriak has quit IRC | 06:18 | |
yolanda | good morning | 06:19 |
*** thorst_ has joined #openstack-infra | 06:20 | |
*** raunak has quit IRC | 06:21 | |
*** raunak has joined #openstack-infra | 06:25 | |
*** shashank_hegde has joined #openstack-infra | 06:26 | |
*** kzaitsev_mb has joined #openstack-infra | 06:27 | |
*** elo has quit IRC | 06:27 | |
*** elo has joined #openstack-infra | 06:27 | |
*** thorst_ has quit IRC | 06:27 | |
*** csomerville has quit IRC | 06:29 | |
*** cody-somerville has joined #openstack-infra | 06:30 | |
*** cody-somerville has joined #openstack-infra | 06:30 | |
*** Jeffrey4l has joined #openstack-infra | 06:30 | |
*** liusheng has quit IRC | 06:30 | |
*** spzala has joined #openstack-infra | 06:31 | |
*** liusheng has joined #openstack-infra | 06:31 | |
*** spzala has quit IRC | 06:35 | |
*** raunak has quit IRC | 06:35 | |
*** savihou has joined #openstack-infra | 06:36 | |
*** gildub has quit IRC | 06:37 | |
*** kushal has quit IRC | 06:39 | |
*** vsaienko has quit IRC | 06:42 | |
*** markusry has joined #openstack-infra | 06:46 | |
*** raunak has joined #openstack-infra | 06:47 | |
*** ihrachys has joined #openstack-infra | 06:47 | |
yolanda | ianw, around? care reviewing https://review.openstack.org/353994 ? | 06:49 |
*** martinkopec has joined #openstack-infra | 06:50 | |
*** raunak has quit IRC | 06:50 | |
*** markvoelker has joined #openstack-infra | 06:51 | |
*** markusry has quit IRC | 06:52 | |
*** yamahata has joined #openstack-infra | 06:53 | |
*** tkelsey has joined #openstack-infra | 06:54 | |
*** rbuzatu has quit IRC | 06:57 | |
*** rbuzatu has joined #openstack-infra | 06:58 | |
*** jtomasek|afk is now known as jtomasek | 07:00 | |
openstackgerrit | Vitaly Gridnev proposed openstack-infra/project-config: don't run tempest tests in sahara grenade https://review.openstack.org/354700 | 07:01 |
*** yamahata has quit IRC | 07:02 | |
*** savihou has quit IRC | 07:07 | |
*** thorongil has joined #openstack-infra | 07:10 | |
*** jpich has joined #openstack-infra | 07:11 | |
*** ccamacho|afk is now known as ccamacho | 07:13 | |
openstackgerrit | Merged openstack-infra/project-config: fix typo in comment https://review.openstack.org/355153 | 07:14 |
*** shashank_hegde has quit IRC | 07:18 | |
openstackgerrit | Merged openstack-infra/project-config: Fix syntax error in ironic-python-agent post job https://review.openstack.org/355487 | 07:18 |
*** dizquierdo has joined #openstack-infra | 07:19 | |
*** tonytan4ever has joined #openstack-infra | 07:22 | |
*** nmagnezi has joined #openstack-infra | 07:23 | |
*** tonytan4ever has quit IRC | 07:26 | |
*** e0ne has joined #openstack-infra | 07:28 | |
*** hichihara has joined #openstack-infra | 07:28 | |
*** thorst_ has joined #openstack-infra | 07:28 | |
*** raunak has joined #openstack-infra | 07:30 | |
*** thorst_ has quit IRC | 07:32 | |
akscram | Guys, I want to add the puppet-check-jobs group and make it non-voting but I do not know how to do it properly: https://review.openstack.org/#/c/355265/ | 07:33 |
akscram | Could someone advise me on how to enable it? | 07:34 |
*** raunak has quit IRC | 07:37 | |
*** bauzas_off is now known as bauzas | 07:38 | |
*** ifarkas_afk is now known as ifarkas | 07:40 | |
*** javeriak has joined #openstack-infra | 07:42 | |
*** dkehn has quit IRC | 07:43 | |
*** dkehn_ has quit IRC | 07:43 | |
*** raunak has joined #openstack-infra | 07:45 | |
*** savihou has joined #openstack-infra | 07:45 | |
openstackgerrit | Changcheng Intel proposed openstack-infra/jenkins-job-builder: add compress-log option to compress log https://review.openstack.org/354138 | 07:49 |
*** dkehn has joined #openstack-infra | 07:50 | |
*** matthewbodkin has joined #openstack-infra | 07:50 | |
*** baoli has joined #openstack-infra | 07:50 | |
*** chlong has quit IRC | 07:50 | |
*** raunak has quit IRC | 07:52 | |
*** kzaitsev_mb has quit IRC | 07:52 | |
*** yanyanhu has quit IRC | 07:52 | |
openstackgerrit | Changcheng Intel proposed openstack-infra/jenkins-job-builder: add post-send script option https://review.openstack.org/355135 | 07:53 |
*** hwoarang has joined #openstack-infra | 07:53 | |
*** baoli has quit IRC | 07:54 | |
openstackgerrit | Changcheng Intel proposed openstack-infra/jenkins-job-builder: use base_email_create to customize email flexible https://review.openstack.org/355139 | 07:54 |
*** sshnaidm|afk is now known as sshnaidm | 07:55 | |
*** kzaitsev_mb has joined #openstack-infra | 07:55 | |
*** zzzeek has quit IRC | 08:00 | |
*** zzzeek has joined #openstack-infra | 08:00 | |
*** pilgrimstack has joined #openstack-infra | 08:01 | |
*** dkehn_ has joined #openstack-infra | 08:01 | |
*** markvoelker has quit IRC | 08:01 | |
*** Mmike has quit IRC | 08:02 | |
*** Mmike has joined #openstack-infra | 08:02 | |
*** pilgrimstack has quit IRC | 08:05 | |
*** afred312 has quit IRC | 08:05 | |
*** raunak has joined #openstack-infra | 08:06 | |
*** afred312 has joined #openstack-infra | 08:06 | |
*** pilgrimstack has joined #openstack-infra | 08:07 | |
*** raunak has quit IRC | 08:11 | |
*** esikachev has joined #openstack-infra | 08:13 | |
*** matrohon has joined #openstack-infra | 08:19 | |
*** yanyanhu has joined #openstack-infra | 08:20 | |
*** asettle has joined #openstack-infra | 08:20 | |
*** sshnaidm has quit IRC | 08:21 | |
*** lucas-dinner is now known as lucasagomes | 08:21 | |
*** sshnaidm has joined #openstack-infra | 08:21 | |
*** tonytan4ever has joined #openstack-infra | 08:23 | |
openstackgerrit | Matthew Bodkin proposed openstack-infra/storyboard-webclient: Make side bar the same length as navbar https://review.openstack.org/355554 | 08:26 |
*** Goneri has joined #openstack-infra | 08:27 | |
*** Na3iL has joined #openstack-infra | 08:27 | |
*** tonytan4ever has quit IRC | 08:28 | |
*** chem has joined #openstack-infra | 08:29 | |
*** thorst_ has joined #openstack-infra | 08:30 | |
*** electrofelix has joined #openstack-infra | 08:33 | |
*** bethwhite_ has joined #openstack-infra | 08:33 | |
*** kzaitsev_mb has quit IRC | 08:34 | |
*** sandanar_ has joined #openstack-infra | 08:34 | |
*** thorst_ has quit IRC | 08:37 | |
*** sandanar has quit IRC | 08:38 | |
*** tkelsey has quit IRC | 08:39 | |
*** yaume has joined #openstack-infra | 08:40 | |
openstackgerrit | Ivan Udovichenko proposed openstack-infra/project-config: Add new/update existing projects https://review.openstack.org/347047 | 08:40 |
*** mhickey has joined #openstack-infra | 08:41 | |
*** yamamoto has quit IRC | 08:44 | |
*** acoles_ is now known as acoles | 08:49 | |
*** sarob has joined #openstack-infra | 08:51 | |
*** dkehn_ has quit IRC | 08:51 | |
*** dkehn has quit IRC | 08:51 | |
*** bethwhite__ has joined #openstack-infra | 08:53 | |
*** sarob has quit IRC | 08:55 | |
*** Na3iL has quit IRC | 08:56 | |
*** dkehn has joined #openstack-infra | 08:58 | |
*** Julien-zte has joined #openstack-infra | 08:59 | |
*** Goneri has quit IRC | 09:01 | |
*** Goneri has joined #openstack-infra | 09:01 | |
*** markvoelker has joined #openstack-infra | 09:02 | |
*** derekh has joined #openstack-infra | 09:03 | |
*** dkehn_ has joined #openstack-infra | 09:04 | |
*** markvoelker has quit IRC | 09:07 | |
*** sambetts|afk is now known as sambetts | 09:07 | |
*** Na3iL has joined #openstack-infra | 09:11 | |
*** vinaypotluri has quit IRC | 09:11 | |
openstackgerrit | Aleksey Zvyagintsev proposed openstack-infra/project-config: Add repo for murano-pkg-check. Murano package validator tool. https://review.openstack.org/353861 | 09:18 |
*** eranrom has quit IRC | 09:20 | |
*** markmcd has joined #openstack-infra | 09:20 | |
*** _nadya_ has joined #openstack-infra | 09:22 | |
*** _nadya_ has quit IRC | 09:22 | |
*** _nadya_ has joined #openstack-infra | 09:22 | |
*** infra-red has joined #openstack-infra | 09:23 | |
openstackgerrit | Merged openstack/diskimage-builder: Allow to skip kernel cleanup https://review.openstack.org/353994 | 09:24 |
*** dtardivel has joined #openstack-infra | 09:28 | |
*** eranrom has joined #openstack-infra | 09:30 | |
*** yamamoto has joined #openstack-infra | 09:31 | |
*** thorst_ has joined #openstack-infra | 09:35 | |
*** ociuhandu has joined #openstack-infra | 09:40 | |
*** nwkarsten has joined #openstack-infra | 09:40 | |
*** dtantsur|afk is now known as dtantsur | 09:40 | |
*** thorst_ has quit IRC | 09:41 | |
*** yamamoto has quit IRC | 09:41 | |
*** ramishra has quit IRC | 09:42 | |
*** ramishra has joined #openstack-infra | 09:44 | |
*** nwkarsten has quit IRC | 09:44 | |
*** yamamoto has joined #openstack-infra | 09:46 | |
*** yamamoto has quit IRC | 09:46 | |
*** dmellado has quit IRC | 09:46 | |
*** amoralej has quit IRC | 09:46 | |
*** geguileo has quit IRC | 09:46 | |
*** kzaitsev_mb has joined #openstack-infra | 09:48 | |
*** yamamoto has joined #openstack-infra | 09:48 | |
*** tosky has joined #openstack-infra | 09:49 | |
openstackgerrit | Ilya Shakhat proposed openstack-infra/project-config: Add new project "os-failures" https://review.openstack.org/355819 | 09:53 |
openstackgerrit | James Slagle proposed openstack-infra/tripleo-ci: Add temporary pin to last known good current tripleo repo https://review.openstack.org/354481 | 09:54 |
*** hichihara has quit IRC | 09:54 | |
openstackgerrit | James Slagle proposed openstack-infra/tripleo-ci: Add temporary pin to last known good current tripleo repo https://review.openstack.org/354481 | 09:55 |
*** dmellado has joined #openstack-infra | 09:56 | |
*** ihrachys has quit IRC | 09:58 | |
*** javeriak has quit IRC | 10:00 | |
*** zhurong has quit IRC | 10:01 | |
*** markvoelker has joined #openstack-infra | 10:03 | |
*** jed56 has joined #openstack-infra | 10:03 | |
*** sandanar__ has joined #openstack-infra | 10:03 | |
*** sandanar_ has quit IRC | 10:07 | |
openstackgerrit | Julien Danjou proposed openstack-infra/project-config: Teach some Telemetry jobs about Gnocchi stable/2.2 branch https://review.openstack.org/355828 | 10:07 |
*** markvoelker has quit IRC | 10:08 | |
*** kushal has joined #openstack-infra | 10:09 | |
*** tqtran has joined #openstack-infra | 10:12 | |
*** ihrachys has joined #openstack-infra | 10:14 | |
*** pt_15 has quit IRC | 10:16 | |
*** Julien-zte has quit IRC | 10:17 | |
*** tqtran has quit IRC | 10:17 | |
*** _degorenko|afk is now known as degorenko | 10:18 | |
sshnaidm | do you know why in some projects when I set "closes-bug" it doesn't affect bugs in launchpad? Does something special need to be configured for this feature? | 10:18 |
*** asettle has quit IRC | 10:22 | |
*** sdague has joined #openstack-infra | 10:23 | |
*** ihrachys has quit IRC | 10:23 | |
*** yanyanhu has quit IRC | 10:24 | |
*** tonytan4ever has joined #openstack-infra | 10:24 | |
*** yamamoto has quit IRC | 10:25 | |
*** mhickey has quit IRC | 10:25 | |
*** kushal has quit IRC | 10:26 | |
*** kushal has joined #openstack-infra | 10:27 | |
*** tonytan4ever has quit IRC | 10:28 | |
*** javeriak has joined #openstack-infra | 10:29 | |
*** rbuzatu has quit IRC | 10:29 | |
*** cdent has joined #openstack-infra | 10:30 | |
cdent | I'm trying to figure out how to integrate the api-wg gerrit with the (newer) launchpad bugs collection it has. I can see from the docs that jeepyb does it, but I'm missing the bit on what to change in config to turn it on. halp? | 10:31 |
*** spzala has joined #openstack-infra | 10:31 | |
*** boogibugs has joined #openstack-infra | 10:32 | |
*** florianf has quit IRC | 10:33 | |
openstackgerrit | Darragh Bailey proposed openstack-infra/jenkins-job-builder: Adding support for Manual Build Trigger https://review.openstack.org/202543 | 10:34 |
openstackgerrit | Darragh Bailey proposed openstack-infra/jenkins-job-builder: Consolidate trigger-manual and trigger-parameterized-builds https://review.openstack.org/314108 | 10:34 |
*** spzala has quit IRC | 10:36 | |
*** boogibugs has quit IRC | 10:36 | |
*** boogibugs has joined #openstack-infra | 10:36 | |
*** Na3iL has quit IRC | 10:38 | |
*** florianf has joined #openstack-infra | 10:38 | |
*** narayrak has joined #openstack-infra | 10:39 | |
openstackgerrit | Ricardo Carrillo Cruz proposed openstack-infra/system-config: Add provisioning and public IP addresses for compute00[0-1].vanilla https://review.openstack.org/355839 | 10:40 |
*** thorst_ has joined #openstack-infra | 10:41 | |
openstackgerrit | Ricardo Carrillo Cruz proposed openstack-infra/system-config: Add provisioning and public IP addresses for compute00[0-1].vanilla https://review.openstack.org/355839 | 10:44 |
openstackgerrit | Ricardo Carrillo Cruz proposed openstack-infra/system-config: Correct public IP for baremetal00 https://review.openstack.org/355841 | 10:46 |
*** thorst_ has quit IRC | 10:47 | |
*** bethwhite_ has quit IRC | 10:48 | |
electrofelix | zxiiro waynr: given TOX_TESTENV_PASSENV works for https://review.openstack.org/271244, perhaps I should just change that review to update documentation when testing and add a comment instead of explicitly allowing proxy variables to be passed through? | 10:48 |
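A hedged example of the documentation-only approach being floated, assuming the repo's tox.ini interpolates TOX_TESTENV_PASSENV into passenv (which is what review 271244 relies on, per the message above):

```
# Pass proxy settings through to the tox test environments without patching tox.ini.
TOX_TESTENV_PASSENV="http_proxy https_proxy no_proxy" tox -e py27
```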
*** rhallisey has joined #openstack-infra | 10:49 | |
*** sarob has joined #openstack-infra | 10:52 | |
openstackgerrit | yolanda.robla proposed openstack-infra/puppet-infracloud: Fix bridge creation when no vlan is involved https://review.openstack.org/355845 | 10:54 |
*** sarob has quit IRC | 10:56 | |
openstackgerrit | Benny Kopilov proposed openstack-infra/devstack-gate: Enable support for cinder multi-backend in tempest https://review.openstack.org/355846 | 11:01 |
openstackgerrit | Benny Kopilov proposed openstack-infra/devstack-gate: Enable support for cinder multi-backend in tempest https://review.openstack.org/355846 | 11:03 |
*** yamamoto has joined #openstack-infra | 11:03 | |
*** dmsimard is now known as dmsimard|afk | 11:03 | |
*** azvyagintsev_h has joined #openstack-infra | 11:03 | |
*** markvoelker has joined #openstack-infra | 11:04 | |
*** markvoelker has quit IRC | 11:08 | |
*** asettle has joined #openstack-infra | 11:09 | |
*** Na3iL has joined #openstack-infra | 11:09 | |
*** locust has joined #openstack-infra | 11:12 | |
*** baoli has joined #openstack-infra | 11:14 | |
openstackgerrit | Ryan Hallisey proposed openstack-infra/project-config: Few changed to the kolla-kubernetes job https://review.openstack.org/355199 | 11:15 |
*** florianf has quit IRC | 11:17 | |
*** baoli has quit IRC | 11:19 | |
*** ociuhandu has quit IRC | 11:20 | |
*** florianf has joined #openstack-infra | 11:21 | |
openstackgerrit | Sean Dague proposed openstack-infra/project-config: Prime pip cache https://review.openstack.org/355854 | 11:22 |
*** jkilpatr has joined #openstack-infra | 11:23 | |
*** dizquierdo is now known as dizquierdo_afk | 11:29 | |
*** rbuzatu has joined #openstack-infra | 11:29 | |
*** asettle has quit IRC | 11:30 | |
*** ramishra has quit IRC | 11:30 | |
openstackgerrit | Sagi Shnaidman proposed openstack-infra/tripleo-ci: DONT MERGE: test periodic job https://review.openstack.org/355859 | 11:31 |
*** ccamacho is now known as ccamacho|lunch | 11:31 | |
cdent | sdague: since you appear to be awake maybe you know the answer to my question above: "I'm trying to figure out how to integrate the api-wg gerrit with the (newer) launchpad bugs collection it has. I can see from the docs that jeepyb does it, but I'm missing the bit on what to change in config to turn it on. halp?" | 11:31 |
*** ramishra has joined #openstack-infra | 11:32 | |
*** pbourke has joined #openstack-infra | 11:32 | |
pbourke | hi, wondering if the repos at http://mirror.ord.rax.openstack.org/ubuntu/dists/xenial/ are signed, and if so, where can I find the key? | 11:33 |
openstackgerrit | Fathi Boudra proposed openstack-infra/jenkins-job-builder: builders: add 'publish over ssh' support as a build step https://review.openstack.org/98437 | 11:34 |
*** rbuzatu has quit IRC | 11:34 | |
*** thorst_ has joined #openstack-infra | 11:35 | |
*** jaosorior has quit IRC | 11:35 | |
*** jaosorior has joined #openstack-infra | 11:36 | |
*** sdake has joined #openstack-infra | 11:36 | |
*** berendt has joined #openstack-infra | 11:39 | |
*** rfolco has joined #openstack-infra | 11:41 | |
openstackgerrit | Merged openstack-infra/system-config: Add provisioning and public IP addresses for compute00[0-1].vanilla https://review.openstack.org/355839 | 11:41 |
*** sfinucan has quit IRC | 11:41 | |
*** tpsilva has joined #openstack-infra | 11:41 | |
*** asettle has joined #openstack-infra | 11:44 | |
*** sfinucan has joined #openstack-infra | 11:44 | |
dtantsur | hi folks! could you please merge https://review.openstack.org/#/c/354608/ ? it's blocking Ironic stable gate | 11:45 |
openstackgerrit | Darragh Bailey proposed openstack-infra/jenkins-job-builder: Allow using lockfile per jenkins master https://review.openstack.org/293631 | 11:46 |
sdague | cdent: what is the old bug group, and what is the new one? | 11:47 |
*** matbu is now known as matbu|lunch | 11:47 | |
sdague | dtantsur: +A | 11:47 |
cdent | sdague: there was no previous association with launchpad. The new launchpad is: https://bugs.launchpad.net/openstack-api-wg https://launchpad.net/~openstack-api-wg-drivers | 11:48 |
odyssey4me | yolanda if you have a moment, reviews of https://review.openstack.org/355434 & https://review.openstack.org/355491 would be appreciated | 11:48 |
*** sarob has joined #openstack-infra | 11:50 | |
dtantsur | sdague, thanks! | 11:50 |
sdague | cdent: I think it's the 'groups' field | 11:51 |
*** rodrigods has quit IRC | 11:51 | |
*** asettle has quit IRC | 11:51 | |
*** rodrigods has joined #openstack-infra | 11:51 | |
sdague | https://github.com/openstack-infra/project-config/blob/c5ed5d0c03c337c8834cb153de78459f4d802dda/gerrit/projects.yaml#L4220 | 11:51 |
sdague | anteaya, is that right? ^^^ | 11:51 |
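A guess at the shape of the projects.yaml change being discussed; the real entry lands in review 355885, and the project/group names below are assumptions based on the launchpad links cdent gave:

```
# gerrit/projects.yaml sketch: jeepyb uses the "groups" list to decide which
# Launchpad project(s) to update from Closes-Bug style commit footers.
- project: openstack/api-wg
  groups:
    - openstack-api-wg
```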
*** asettle has joined #openstack-infra | 11:52 | |
*** sshnaidm is now known as sshnaidm|lnch | 11:52 | |
sdague | are we really imbalanced on xenial nodes? | 11:53 |
*** baoli has joined #openstack-infra | 11:53 | |
*** baoli_ has joined #openstack-infra | 11:54 | |
*** sarob has quit IRC | 11:54 | |
*** tonytan4ever has joined #openstack-infra | 11:55 | |
openstackgerrit | James Slagle proposed openstack-infra/tripleo-ci: Add temporary pin to last known good current tripleo repo https://review.openstack.org/354481 | 11:56 |
*** acabot has quit IRC | 11:57 | |
*** baoli has quit IRC | 11:58 | |
*** rbuzatu has joined #openstack-infra | 11:58 | |
*** tonytan4ever has quit IRC | 11:59 | |
openstackgerrit | Merged openstack-infra/project-config: Ensure we alway build old Ironic ramdisk https://review.openstack.org/354608 | 12:00 |
openstackgerrit | James Slagle proposed openstack-infra/tripleo-ci: Add temporary pin to last known good current tripleo repo https://review.openstack.org/354481 | 12:00 |
beagles | ianw: sorry I ran off on you... had to catch some Zzz's | 12:00 |
beagles | ianw, these jobs seem to be largely hanging while cloning repos... if not hanging, then at least slowing wwaaaayyyyyy down | 12:00 |
openstackgerrit | Jim Rollenhagen proposed openstack-infra/project-config: Ironic: multitenant job should not run on stable https://review.openstack.org/355880 | 12:01 |
openstackgerrit | Merged openstack-infra/project-config: Implement Swift pypy experimental check https://review.openstack.org/355491 | 12:02 |
openstackgerrit | Sam Betts proposed openstack-infra/project-config: Prevent Ironic multitenancy job running on old versions https://review.openstack.org/355881 | 12:02 |
jroll | sambetts: you're too slow :) | 12:02 |
sambetts | jroll: apparently so :-P | 12:02 |
*** dprince has joined #openstack-infra | 12:03 | |
*** ldnunes has joined #openstack-infra | 12:03 | |
*** markvoelker has joined #openstack-infra | 12:05 | |
*** sigmavirus|away is now known as sigmavirus | 12:05 | |
*** lucasagomes is now known as lucas-hungry | 12:06 | |
*** mriedem_away has quit IRC | 12:07 | |
*** markvoelker has quit IRC | 12:09 | |
*** kgiusti has joined #openstack-infra | 12:09 | |
openstackgerrit | Julia Kreger proposed openstack-infra/project-config: Rename bifrost integration test job https://review.openstack.org/355652 | 12:09 |
openstackgerrit | Chris Dent proposed openstack-infra/project-config: Set the launchpad name for api-wg https://review.openstack.org/355885 | 12:09 |
openstackgerrit | Matthew Bodkin proposed openstack-infra/storyboard: Fixing docs so it is easy to understand https://review.openstack.org/355886 | 12:10 |
*** acabot has joined #openstack-infra | 12:10 | |
*** psachin has quit IRC | 12:10 | |
azvyagintsev_h | Folks, could you please suggest how I should fix the templates for https://review.openstack.org/#/c/353861/17..18/zuul/layout.yaml ? if I remove the check/gate section the tests fail ;( | 12:11 |
*** vrovachev has joined #openstack-infra | 12:11 | |
vrovachev | Hello all, please take a look at https://review.openstack.org/#/c/355382/ | 12:12 |
*** rbuzatu has quit IRC | 12:13 | |
*** yaume has quit IRC | 12:13 | |
*** rbuzatu has joined #openstack-infra | 12:14 | |
*** narayrak has quit IRC | 12:15 | |
*** locust has quit IRC | 12:17 | |
*** weshay has joined #openstack-infra | 12:17 | |
*** javeriak has quit IRC | 12:21 | |
*** matbu|lunch is now known as matbu | 12:21 | |
openstackgerrit | Dmitry Tantsur proposed openstack-infra/project-config: Make the grenade job voting on ironic-inspector https://review.openstack.org/355894 | 12:22 |
*** javeriak has joined #openstack-infra | 12:24 | |
*** gordc has joined #openstack-infra | 12:24 | |
beagles | ianw: is there something particular with these jobs (osic cloud jobs?) that could slow down stuff like git clone operations | 12:25 |
EmilienM | to give a bit more precision than beagles, we are seeing a persistent problem when cloning repositories with zuul-cloner, when running ubuntu nodes on osic-cloud1 | 12:25 |
beagles | yeah, what he said | 12:25 |
beagles | :) | 12:25 |
EmilienM | are we aware about any downtime on osic ? | 12:25 |
*** markvoelker has joined #openstack-infra | 12:26 | |
*** pradk has joined #openstack-infra | 12:26 | |
*** burgerk has joined #openstack-infra | 12:27 | |
*** mdrabe has joined #openstack-infra | 12:29 | |
*** gouthamr has joined #openstack-infra | 12:32 | |
*** apetrich has quit IRC | 12:32 | |
pleia2 | mtreinish: so this time it really was getting stuck on the fact that the new/ directory existed and immediately failing, I manually removed it and let it run at :20, other.html now exists: http://status.openstack.org/elastic-recheck/data/other.html | 12:33 |
*** yamamoto has quit IRC | 12:33 | |
pleia2 | mtreinish: should probably sort out the naming though :) http://status.openstack.org/elastic-recheck/ links to other.html and that exists, but it's inconsistent with our others.html template | 12:34 |
odyssey4me | EmilienM afaik it's running well... but you may need to know that it's running IPv6 and that its DNS resolver is configured to use 127.0.0.1 to point at a locally running unbound service... so your tests may appear to have dns resolution errors | 12:34 |
odyssey4me | EmilienM also, if your tests can't use IPv6 for external connectivity, then that may also be an issue | 12:34 |
EmilienM | odyssey4me: zuul-cloner takes forever | 12:35 |
EmilienM | odyssey4me: 355612 | 12:35 |
EmilienM | err | 12:35 |
EmilienM | http://logs.openstack.org/35/355235/1/check/gate-puppet-openstacklib-puppet-beaker-rspec-ubuntu-trusty/730b053/console.html#_2016-08-16_10_38_40_231581 | 12:35 |
odyssey4me | EmilienM yeah, that could relate to DNS resolution... we've seen slowness in odd places too | 12:36 |
openstackgerrit | Benny Kopilov proposed openstack-infra/devstack-gate: Enable support for cinder multi-backend in tempest https://review.openstack.org/355846 | 12:36 |
odyssey4me | basically OSIC is configured to use unbound, RAX has something in place which overwrites the nodepool config and uses the RAX DNS... | 12:37 |
*** apetrich has joined #openstack-infra | 12:37 | |
odyssey4me | so we're seeing inconsistencies and odd slowness here and there too | 12:37 |
*** ccamacho|lunch is now known as ccamacho | 12:38 | |
*** yamamoto has joined #openstack-infra | 12:39 | |
*** sandanar__ has quit IRC | 12:39 | |
openstackgerrit | Brad P. Crochet proposed openstack-infra/tripleo-ci: Use tripleo-build-images for CI https://review.openstack.org/336312 | 12:41 |
mordred | EmilienM, odyssey4me: for the slow cloning ... is there any chance that there is some weird routing which is causing routing between OSIC and RAX to go strange? the git mirrors are all in RAX | 12:42 |
odyssey4me | mordred hmm, good question - not one I have the answer to, but that would explain how slow the cloning is | 12:43 |
*** yamamoto has quit IRC | 12:43 | |
odyssey4me | I'm surprised that we don't have regional git endpoints too. :) | 12:44 |
odyssey4me | perhaps cloudnull can provide some insight when he comes online | 12:44 |
mordred | yah - well, so far it hasn't been an issue :) | 12:44 |
sdague | mordred: I did some poking around on my devstack | 12:44 |
mordred | yeah? | 12:44 |
odyssey4me | mordred ah of course, the local git cache is useful to speed things up | 12:44 |
sdague | the pip cache used by devstack is actually the one owned by the root user | 12:45 |
sdague | because sudo | 12:45 |
sdague | so https://review.openstack.org/#/c/355854/ might be all that we need | 12:45 |
*** raildo has joined #openstack-infra | 12:45 | |
sdague | I don't know how one actually validates a thing like that before it goes into production | 12:45 |
*** rlandy has joined #openstack-infra | 12:45 | |
* mordred looks | 12:46 | |
mordred | sdague: yesterday, I noticed in this change: http://logs.openstack.org/05/351905/7/check/check-osc-plugins/71038e2/console.html#_2016-08-15_17_51_18_401054 | 12:46 |
mordred | (which does happen to be on OSIC) | 12:47 |
mordred | that every remote update action took 4 seconds | 12:47 |
mordred | sdague: root owns a pip cache? | 12:48 |
sdague | sudo pip install foo | 12:48 |
dtantsur | folks, jroll, the check-osc-plugin seems broken for ironic: http://logs.openstack.org/55/328955/15/check/check-osc-plugins/177080d/. is it something known? | 12:48 |
*** jheroux has joined #openstack-infra | 12:48 | |
sdague | will put that content into ~/.cache/pip | 12:48 |
sdague | for root | 12:48 |
sdague | /root/.cache/pip | 12:49 |
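A minimal sketch of how to confirm which cache devstack is actually filling (the package name is illustrative; the paths are pip's defaults on these images):

    # devstack runs "sudo pip install ...", so the cache that matters is root's:
    sudo -H pip install oslo.config
    sudo find /root/.cache/pip -type f | head   # populated by the sudo install
    find ~/.cache/pip -type f | head            # the unprivileged user's cache stays untouched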
kgiusti | folks: the oslo.messaging team is experiencing frequent failures of the same 3 tempest tests: http://status.openstack.org/openstack-health/#/job/gate-oslo.messaging-src-dsvm-full-zmq | 12:49 |
sdague | mordred: 4 seconds for a git operation does not seem completely out of bounds | 12:49 |
*** devkulkarni has joined #openstack-infra | 12:49 | |
kgiusti | similarish to bug: https://bugs.launchpad.net/openstack-gate/+bug/1449136 | 12:49 |
openstack | Launchpad bug 1449136 in OpenStack-Gate "OpenStack pypi mirrors disconnecting connections" [Undecided,New] | 12:49 |
*** matt-borland has joined #openstack-infra | 12:49 | |
kgiusti | same failures, but not against pypi host but against localhost http server | 12:50 |
kgiusti | known issue? | 12:50 |
*** bswartz has joined #openstack-infra | 12:50 | |
*** ociuhandu has joined #openstack-infra | 12:50 | |
*** itisha has quit IRC | 12:50 | |
sdague | kgiusti: I think we're feeding it the icon on git.openstack.org for the http image registration | 12:51 |
sdague | so that really means git.openstack.org is dropping requests | 12:51 |
*** devkulkarni has quit IRC | 12:53 | |
*** devkulkarni has joined #openstack-infra | 12:54 | |
mordred | sdague: I think a consistent 4 seconds to check whether there are any new refs to pull in repos that should be no more than a day out of date is exceptionally long | 12:54 |
mordred | sdague: that said - I have verified that the root pip caching works - so neat | 12:54 |
kgiusti | sdague: are the three failing tests the only ones that query git.openstack.org? I ask because only those three tests consistently fail - all others have passed without incident. | 12:55 |
*** asettle has quit IRC | 12:55 | |
mordred | sdague: I don't think your patch is going to work, because install is going to want to build them, and we don't have the bindep depends installed at that point | 12:55 |
mordred | sdague: if we want to prime the cache, using pip download I think may be better? but now I need to check if that also does cache things ... | 12:56 |
mordred | yah. it does (just checked) | 12:57 |
*** ociuhandu has quit IRC | 12:57 | |
jroll | dtantsur: that's new to me | 12:57 |
rcarrillocruz | o/ | 12:58 |
rcarrillocruz | i'm around today (yesterday was bank holiday in Spain) | 12:58 |
*** asettle has joined #openstack-infra | 12:58 | |
sdague | mordred: pip download won't prime the cache | 12:58 |
mordred | I just tested that it will | 12:58 |
sdague | I got a wildly smaller cache with it locally | 12:58 |
*** vikrant has quit IRC | 12:59 | |
sdague | mordred: it will only try to build if the wheels aren't there, right? | 12:59 |
sdague | we're hitting the wheel mirror with this, right? | 12:59 |
mordred | mordred@camelot:~/src/openstack-infra/nodepool$ sudo -H pip install -d . paramz | 12:59 |
mordred | Collecting paramz | 12:59 |
rcarrillocruz | doh | 12:59 |
mordred | Using cached paramz-0.6.1.tar.gz | 12:59 |
rcarrillocruz | yolanda , mordred , pabelanger : http://paste.openstack.org/show/558377/ | 12:59 |
mordred | that was the second time I ran it, after deleting the tarball from the local dir | 12:59 |
*** julim has joined #openstack-infra | 12:59 | |
rcarrillocruz | glean cruft on writing interfaces file | 12:59 |
rcarrillocruz | but yeah, i can deploy servers with bifrost | 13:00 |
rcarrillocruz | i'll see what's up with glean | 13:00 |
Zara | hm, should gerrit search autocomplete for stories and tasks now? aiui we need config to enable gerrit-updating-storyboard per project, as per the commit message here: https://review.openstack.org/#/c/347486/ but are tasks and stories now indexed in gerrit search? | 13:01 |
mordred | sdague: http://paste.openstack.org/show/558378/ | 13:01 |
rcarrillocruz | huh | 13:01 |
rcarrillocruz | also, the interface is set to dhcp, but should not | 13:01 |
*** apetrich has quit IRC | 13:02 | |
mordred | Zara: I'm not sure they autocomplete - but https://review.openstack.org/#/q/bug:2000522 works | 13:02 |
openstackgerrit | Emmet Hikory proposed openstack-infra/storyboard-webclient: Add Worklists and Boards to About Page https://review.openstack.org/355912 | 13:02 |
mordred | Zara: so adding a bug:2000522 to the search finds the thing by story id | 13:03 |
*** xyang1 has joined #openstack-infra | 13:03 | |
*** _ari_ has joined #openstack-infra | 13:03 | |
sdague | mordred: can you rm -rf ~/.cache/pip and try that again? | 13:04 |
*** javeriak has quit IRC | 13:04 | |
*** woodster_ has joined #openstack-infra | 13:04 | |
*** kbaegis has joined #openstack-infra | 13:05 | |
Zara | mordred: oh, aha. I thought it needed 'story:2000522' but that was probably just me misinterpreting the expected behaviour. found the docs now and they do say 'bug:' and 'tr:' so whoops. | 13:06 |
*** javeriak has joined #openstack-infra | 13:06 | |
sdague | because when I use -d, my pip cache remains empty | 13:06 |
mordred | sdague: sure | 13:06 |
sdague | with pip 8.1.2 | 13:06 |
*** yamamoto has joined #openstack-infra | 13:07 | |
tosky | sdague: now that devstack switched to neutron by default, how to enable nova-network in gate jobs (for a poor old Sahara job that I'd like to kill sooner than later)? | 13:07 |
*** yamamoto has quit IRC | 13:07 | |
sdague | tosky: the gate doesn't really change, it's always had explicit service lists | 13:07 |
odyssey4me | yolanda if you have a moment, a review of https://review.openstack.org/355434 would be appreciated | 13:07 |
*** andymaier has joined #openstack-infra | 13:08 | |
mordred | sdague: yes. it works | 13:08 |
yolanda | odyssey4me, back from lunch, i'll take a look in a while | 13:08 |
*** ociuhandu has joined #openstack-infra | 13:08 | |
Zara | (yes, bug:$task_id will also find storyboard tasks, ace) | 13:09 |
*** sshnaidm|lnch is now known as sshnaidm | 13:09 | |
*** devkulkarni has quit IRC | 13:10 | |
*** lucas-hungry is now known as lucasagomes | 13:10 | |
mordred | sdague: http://paste.openstack.org/show/558383/ | 13:10 |
penguinolog | Hello! Could anybody help with https://review.openstack.org/#/c/355382/ - it's blocker for the parallel team | 13:10 |
*** edmondsw has joined #openstack-infra | 13:11 | |
*** lifeless has quit IRC | 13:11 | |
persia | Zara: Do we run any risk of collision between LP bug# and SB task#? There's a gap in SB stories to avoid LP bugs, but I don't think there is one for tasks. | 13:11 |
*** mriedem has joined #openstack-infra | 13:12 | |
*** Julien-zte has joined #openstack-infra | 13:12 | |
*** andymaier has quit IRC | 13:13 | |
mordred | sdague, odyssey4me: I tested git remote operations on an osic node and they all took less than a second as expected ( doing git remote update origin pointed at git.o.o) | 13:13 |
mordred | so it doesn't seem to be routing issues | 13:13 |
*** javeriak has quit IRC | 13:14 | |
Zara | persia: yes, I think so, though just when searching for them. so if two commits pop up when someone searches, it should be fairly quick to find the right one since I'd imagine they'd be about totally different things. | 13:14 |
tosky | sdague: I see, thanks | 13:15 |
*** nmagnezi has quit IRC | 13:15 | |
*** apetrich has joined #openstack-infra | 13:15 | |
persia | Zara: I was worried more about comments being posted to unrelated stories that might trigger email as a result of subscriptions. Maybe I lack context. | 13:15 |
sdague | mordred: pip --version? | 13:15 |
Zara | persia: ah, that's a separate thing. this is just for searching things in gerrit. the plugin should use storyboard-specific syntax in the commit message | 13:18 |
azvyagintsev_h | Folks, could you please suggest how i should fix the templates for https://review.openstack.org/#/c/353861/17..18/zuul/layout.yaml ? if i remove the check/gate section, the tests fail ;( | 13:18 |
persia | Ah, cool. I was missing context :) | 13:18 |
mordred | sdague: pip 8.1.2 from /usr/lib/python2.7/site-packages (python 2.7) | 13:19 |
*** spzala_ has joined #openstack-infra | 13:19 | |
sdague | mordred: ok, well | 13:19 |
mordred | sdague: I'm not sure why it's not working for you - but I think it has the better chance of working, since we know install won't work | 13:20 |
*** ianychoi has quit IRC | 13:20 | |
Zara | (so 'closes-bug: $id' will close a lp bug, 'task: $id' will affect sb task status; both are searchable in gerrit with 'bug:$id'. so in practice I think the tricky bit will be that we'll probably see people using lp notation to try to change sb task status, but that's one for the future) | 13:20 |
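A hedged sketch pulling together the notations Zara describes (the bug and story numbers are reused from examples elsewhere in this log; the task id is hypothetical):

    Commit message footers:
        Closes-Bug: #1613749    (closes the Launchpad bug when the change merges)
        Task: 4567              (updates the StoryBoard task's status)
    Gerrit search, which matches both LP bug numbers and SB story/task ids:
        bug:1613749
        status:open bug:2000522 project:openstack-infra/storyboard-webclient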
*** zhurong has joined #openstack-infra | 13:21 | |
*** cdent has left #openstack-infra | 13:22 | |
sdague | mordred: this is what I get - http://paste.openstack.org/show/558386/ | 13:22 |
mordred | sdague: you need to do find /root/.cache/pip | 13:22 |
mordred | not /home/sdague | 13:23 |
sdague | I'm not running as root | 13:23 |
*** kushal has quit IRC | 13:23 | |
mordred | hrm. weird. try as root and see if you get my behavior? | 13:23 |
mordred | (since that's the important one for this) | 13:23 |
mordred | sdague: got it | 13:25 |
mordred | sdague: adding the index prevents the cache | 13:25 |
openstackgerrit | Oleksii Zamiatin proposed openstack-infra/project-config: Remove n-net related gates https://review.openstack.org/355919 | 13:25 |
sdague | mordred: gah, really? | 13:25 |
mordred | sdague: yup | 13:25 |
openstackgerrit | Merged openstack-infra/project-config: Implement LXD hypervisor experimental check https://review.openstack.org/355434 | 13:25 |
sdague | so that means this doesn't work at all because we're using alternative indexes? | 13:25 |
mordred | it will neither download to the cache nor use things in the cache | 13:25 |
mordred | yah | 13:25 |
mordred | at least, according to my test just now | 13:26 |
mordred | I haven't poked more extensively | 13:26 |
openstackgerrit | Merged openstack-infra/project-config: fuel-qa: stable-mu branches for maintenance and stable for upgrades https://review.openstack.org/355382 | 13:26 |
sdague | so... we know that's not entirely true during runs, because we definitely only download each package once | 13:27 |
mordred | weird | 13:27 |
mordred | well, local testing with your command line resulted in nothing being cached | 13:27 |
sdague | yeh, install with index still builds the cache | 13:28 |
sdague | it's just download that doesn't | 13:28 |
mordred | sigh | 13:28 |
mordred | that seems like a pip bug | 13:28 |
mordred | oh - so - this is going to run during image build | 13:28 |
*** lifeless has joined #openstack-infra | 13:28 | |
*** rbuzatu_ has joined #openstack-infra | 13:28 | |
mordred | which means it should be hitting pypi, not pip mirrors | 13:29 |
mordred | yeah? | 13:29 |
yolanda | rcarrillocruz, can it be a race? i ran glean several times on my environment and i get good results | 13:29 |
mordred | or do we set it to use the dfw mirror during image buids (/me can't remember) | 13:29 |
sdague | mordred: that actually will defeat the purpose of the patch if it does | 13:29 |
mordred | why? | 13:29 |
mordred | it's during image build - it'll download and cache the things using download. then, during devstack run, the cache will be populated and the intsall command will be using install so it should read the cache | 13:30 |
sdague | because if we hit pypi and download, then we'll get numpy as source | 13:30 |
sdague | which means we have to spend 4 minutes compiling it on the node | 13:30 |
mordred | oh. right. bother | 13:30 |
mordred | so - I guess we just have to get download honoring caches | 13:30 |
*** rlandy is now known as rlandy|mtg | 13:31 | |
mordred | dstufft: ^^ whence you awaken ... tl;dr pip download with -i option for an alternate index does not populate or consume cache. pip install with -i option does | 13:32 |
*** andymaier has joined #openstack-infra | 13:32 | |
*** ianychoi has joined #openstack-infra | 13:32 | |
mordred | that said - I do not believe we point at pip mirrors until the image boots | 13:32 |
*** rbuzatu has quit IRC | 13:33 | |
mordred | so we'd also want to explore setting a mirror location during image build | 13:33 |
fungi | mordred: does changing the mirror url after boot still pose a problem? | 13:34 |
fungi | if so, we're sort of stuck unless we want to build different images for every provider/region | 13:34 |
*** inc0 has joined #openstack-infra | 13:35 | |
mordred | fungi: no, I do not believe it does | 13:35 |
fungi | so if we, say, set it to the dfw pypi mirror before caching packages, then we can update it to a different mirror later and it'll still use the cache? | 13:36 |
*** esberglu has joined #openstack-infra | 13:36 | |
jroll | mordred: when you have a sec, this looks like a similar thing you were looking at yesterday, is it just a timeout or something else? http://logs.openstack.org/55/328955/15/check/check-osc-plugins/177080d/_zuul_ansible/ansible_log.txt . no errors in the console log http://logs.openstack.org/55/328955/15/check/check-osc-plugins/177080d/console.html | 13:37 |
sdague | fungi: ug, you might be right | 13:38 |
*** rbuzatu_ has quit IRC | 13:38 | |
mordred | jroll: yah - 2016-08-16 11:33:55,481 p=6961 u=zuul | fatal: [node]: FAILED! => {"async_result": {"ansible_job_id": "344516947230.6884", "changed": false, "finished": 0, "invocation": {"module_args": {"jid": "344516947230.6884", "mode": "status"}, "module_name": "async_status"}, "started": 1}, "changed": false, "failed": true, "msg": "async task produced unparseable results"} | 13:39 |
fungi | sdague: mordred: i mean, maybe we can "transform" the cache when we reset the mirror url, as an alternative. though that's getting into implementation details of pip's cache that probably aren't a guaranteed stable api | 13:39 |
mordred | jroll: that just ran a couple of hours ago, didn't it? | 13:39 |
jroll | mordred: looks like it, yeah | 13:40 |
mordred | fungi: I think it's caching by content hash, not by name | 13:40 |
*** nwkarsten has joined #openstack-infra | 13:40 | |
sdague | mordred: I'm not so sure | 13:40 |
fungi | oh, so if the content hash is consistent (which it would be across our mirrors unless there's an update) then we might be fine | 13:40 |
fungi | but if it mixes other data into that hash, like the url or something, then that gets tricky | 13:41 |
mordred | sdague: pip install with a -i doesn't cache for me with install either | 13:41 |
*** hichihara has joined #openstack-infra | 13:41 | |
fungi | huh. apparently crowbar is still under active development? just saw a cve request to the oss-security ml because they were setting a known default admin account password in it | 13:42 |
mordred | fungi: yah - it's the basis of rob's current company | 13:43 |
fungi | oic | 13:43 |
*** zhurong has quit IRC | 13:43 | |
*** markusry has joined #openstack-infra | 13:43 | |
sdague | anyway, I need to get back to release things. Once dstufft is up he can probably just tell us all our silliness instead of us guessing | 13:43 |
jroll | mordred: so "yah" meaning "yah that is similar" or "yah that is a timeout" or? :) looking for something actionable I can do here | 13:43 |
Shrews | mordred: that ansible error... looks like a genuine timeout | 13:43 |
mordred | sdague: http://paste.openstack.org/show/558391/ | 13:43 |
sdague | my quick git grepping in pip source isn't finding payload | 13:43 |
Shrews | mordred: TASK [zuul_runner with 1547 second timeout] | 13:43 |
jroll | oh wait, timestamps | 13:44 |
* jroll feels dumb | 13:44 | |
*** rbuzatu has joined #openstack-infra | 13:44 | |
fungi | Shrews: so the behavior i was seeing in various job logs yesterday were legitimate timeouts ending with an ansible json parse failure | 13:44 |
*** zhurong has joined #openstack-infra | 13:44 | |
*** ramishra has quit IRC | 13:44 | |
mordred | fungi: yah. that's what we fixed yesterday | 13:44 |
sdague | mordred: you delete the venv | 13:44 |
*** dizquierdo_afk is now known as dizquierdo | 13:44 | |
Shrews | fungi: yeah, i can't explain why a timeout causes that | 13:44 |
mordred | sdague: I do - then I re-make it | 13:44 |
sdague | ah, right | 13:44 |
*** ramishra has joined #openstack-infra | 13:44 | |
jroll | Shrews: so this is just pip being super slow, I guess | 13:44 |
fungi | as if one of the things ansible was failing to parse was the json coming from jobs that timed out, not that ansible was responsible for the timeout | 13:44 |
Shrews | jroll: probably? | 13:45 |
sdague | mordred: can you do that without the ^C? | 13:45 |
mordred | sdague: sure | 13:45 |
jroll | Shrews: all that job does, if you look at the console, is install a bunch of OSC plugins | 13:45 |
jroll | :) | 13:45 |
Shrews | fungi: yeah, i suspect nothing is written to the async file if the job doesn't finish, thus the unparseable | 13:45 |
fungi | empty != json | 13:45 |
fungi | indeed | 13:45 |
fungi | jroll: did that run in rax-ord? | 13:46 |
jroll | fungi: nope, osic http://logs.openstack.org/55/328955/15/check/check-osc-plugins/177080d/console.html | 13:46 |
mordred | Shrews, fungi: hrm. that would be annoying | 13:46 |
pabelanger | cloudnull: I updated ubuntu-xenial in osic-cloud1 and confirmed DNS is running on ipv6. Other images will be updated today | 13:46 |
fungi | jroll: oh, okay. we did just up the quota significantly there... lemme check a few things | 13:46 |
Shrews | mordred: we *might* be able to recognize the timeout in ansible and write empty json to solve that | 13:46 |
mordred | Shrews: there is an if/else case that I thought was related to timeout | 13:47 |
* Shrews looks | 13:47 | |
jroll | fungi: cool, I'm going to recheck that unless you think there's reason not to | 13:47 |
jroll | I guess we had two in a row, though | 13:47 |
fungi | unfortunately https://review.openstack.org/355580 hasn't merged yet, so we're going to have a relatively hard time figuring out if we're taxing that mirror | 13:48 |
mordred | Shrews: line 603 in lib/ansible/executor/task_executor.py | 13:48 |
jroll | both osic cloud | 13:48 |
Shrews | mordred: ah, it still depends on 'parsed' being there | 13:48 |
Shrews | which it won't be if it didn't actually finish | 13:48 |
fungi | jroll: might only be coincidence, but i'll get our mirror there into cacti in moments and see what else i can find in the meantime | 13:48 |
*** kushal has joined #openstack-infra | 13:49 | |
jroll | fungi: cool, thank you :) | 13:49 |
mordred | Shrews: why not though? the async_runner should be the thing writing the status to the file | 13:49 |
Shrews | mordred: apparently it isn't. 'parsed' is not in the output you just pasted in channel | 13:49 |
rcarrillocruz | so yeah | 13:52 |
rcarrillocruz | fungi: we are bifrosting | 13:52 |
rcarrillocruz | i just redeployed a server with paul via screen session now | 13:53 |
fungi | rcarrillocruz: rock on! that's awesome news | 13:53 |
*** tonytan4ever has joined #openstack-infra | 13:53 | |
*** permalac has joined #openstack-infra | 13:53 | |
mordred | \o/ | 13:54 |
rcarrillocruz | we needed several bifrost fixes | 13:54 |
rcarrillocruz | and i spotted a couple glean things | 13:54 |
rcarrillocruz | we'll go thru in a bit | 13:54 |
pabelanger | mordred: fungi: clarkb: Would love some feedback on: https://review.openstack.org/#/c/355695/ to fix some launch node failures with ubuntu-xenial | 13:54 |
* rcarrillocruz goes for coffee now | 13:54 | |
*** burgerk has quit IRC | 13:54 | |
*** vikrant has joined #openstack-infra | 13:55 | |
*** infra-red has quit IRC | 13:55 | |
azvyagintsev_h | fungi craige Folks, could you please suggest how i should fix the templates for https://review.openstack.org/#/c/353861/17..18/zuul/layout.yaml ? if i remove the check/gate section, the tests fail ;( | 13:55 |
*** yamahata has joined #openstack-infra | 13:55 | |
mordred | pabelanger: wow | 13:55 |
*** infra-red has joined #openstack-infra | 13:56 | |
mordred | pabelanger: just out of idle curiosity - (patch looks fine) - I wonder if we could get the ssh daemon to not start until unbound is started | 13:56 |
pabelanger | mordred: yes, I thought of that too. I haven't looked into that yet | 13:56 |
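One way that could look on a systemd (xenial) node, as a hedged sketch of mordred's idea rather than anything actually deployed here: a drop-in that orders ssh after unbound.

    # illustrative drop-in; file path and contents are an assumption, not deployed config
    sudo mkdir -p /etc/systemd/system/ssh.service.d
    printf '[Unit]\nAfter=unbound.service\nWants=unbound.service\n' | \
        sudo tee /etc/systemd/system/ssh.service.d/wait-for-unbound.conf
    sudo systemctl daemon-reload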
fungi | pabelanger: speaking of xenial, snmpd won't start on firehose01... i suspect we need to tweak our config for it | 13:57 |
pabelanger | mordred: as for the problem: http://paste.openstack.org/show/557771/ I _think_ unbound is waiting for random to initialize in the kernel, before doing things with its root.key | 13:57 |
*** vikrant has quit IRC | 13:57 | |
*** markusry has quit IRC | 13:57 | |
*** markusry has joined #openstack-infra | 13:57 | |
pabelanger | fungi: sounds like we need to get an etherpad to track our xenial issues | 13:57 |
*** thorongil has quit IRC | 13:58 | |
fungi | rcarrillocruz: looks (from all the sudospam i've received) that baremetal00 is having trouble resolving its own hostname. may need /etc/hosts fixed? | 13:58 |
*** rbrndt has joined #openstack-infra | 13:59 | |
fungi | rcarrillocruz: or maybe you already fixed that... last entry i have was 07:47:53 utc | 13:59 |
*** pgadiya has quit IRC | 14:00 | |
*** jimbaker has quit IRC | 14:00 | |
*** nmagnezi has joined #openstack-infra | 14:01 | |
*** markusry has quit IRC | 14:02 | |
*** andymaier has quit IRC | 14:02 | |
*** bin_ has joined #openstack-infra | 14:02 | |
*** jistr is now known as jistr|debug | 14:03 | |
*** rlandy|mtg is now known as rlandy | 14:04 | |
*** jimbaker has joined #openstack-infra | 14:04 | |
*** jimbaker has quit IRC | 14:04 | |
*** jimbaker has joined #openstack-infra | 14:04 | |
rcarrillocruz | fungi: https://review.openstack.org/#/c/355778/ | 14:05 |
*** zhurong has quit IRC | 14:06 | |
rcarrillocruz | fungi: essentially, the install playbook on bifrost hardcodes /etc/hostname on 127.0.0.1 | 14:06 |
*** zhurong has joined #openstack-infra | 14:06 | |
rcarrillocruz | which breaks fqdn resolution | 14:06 |
rcarrillocruz | and breaks puppet apply runs | 14:07 |
rcarrillocruz | other thing i've noticed is that puppet sets /etc/resolv.conf to nameserver 127.0.0.1, not sure if that's some unbound thing on the node declaration | 14:07 |
rcarrillocruz | pabelanger: ^ | 14:07 |
*** yamamoto has joined #openstack-infra | 14:07 | |
*** hichihara has quit IRC | 14:08 | |
fungi | rcarrillocruz: yeah, that's because we run unbound on all our servers to provide a local resolver cache | 14:08 |
rcarrillocruz | that's a problem, since baremetal00 runs dnsmasq itself | 14:09 |
fungi | oh, so port conflict i guess | 14:09 |
rcarrillocruz | possibly, although i see unbound set to false on the node declaration | 14:09 |
rcarrillocruz | O_O | 14:09 |
rcarrillocruz | i'll wait for paul, i remember he changed the unbound setting on this node for a reason | 14:10 |
fungi | rcarrillocruz: likely we need to specify a remote resolved if we're not installing unbound | 14:10 |
fungi | er, remote resolver | 14:10 |
rcarrillocruz | i set by hand /etc/resolv.conf to 8.8.8.8 :/ | 14:10 |
*** armax has joined #openstack-infra | 14:10 | |
fungi | pabelanger: digging into the snmpd issue, the commands in our initscript seem fine, but apparently it's not used because there's a systemd unit for it which takes precedence | 14:10 |
* fungi blames lennart | 14:10 | |
rcarrillocruz | but yeah, conflicts with puppet, since that changes it back to 127.0.0.1 which breaks install playbook that pulls things from IntarWeb | 14:11 |
rcarrillocruz | it can't resolve | 14:11 |
mtreinish | pleia2: I thought I moved everything to use others.html now | 14:11 |
pleia2 | mtreinish: shrug | 14:11 |
*** _ari_ has quit IRC | 14:11 | |
* rcarrillocruz is procrastinating learning of systemd | 14:11 | |
mtreinish | pleia2: ugh, no the template for integrated gate and the output file look like it's other.html still | 14:13 |
*** yamamoto has quit IRC | 14:13 | |
pleia2 | mtreinish: well, at least it generates now, just some final tweaks to tidy this up then | 14:14 |
*** tqtran has joined #openstack-infra | 14:14 | |
jeblair | rcarrillocruz: why is baremetal00 running dnsmasq as a nameserver? | 14:15 |
rcarrillocruz | jeblair: it's what it uses to pxe boot servers | 14:15 |
*** pgadiya has joined #openstack-infra | 14:16 | |
rcarrillocruz | bifrost that is | 14:16 |
*** xarses has quit IRC | 14:16 | |
jeblair | rcarrillocruz: why is a name server needed for that? | 14:16 |
rcarrillocruz | it's a bifrost dependency | 14:16 |
rcarrillocruz | it's also used as a pxe/tftp server, not just a nameserver | 14:17 |
*** jaosorior has quit IRC | 14:17 | |
jeblair | yeah, the parts that are not a dns resolver make sense. i'm just wondering why it's also configured as a dns resolver | 14:17 |
rcarrillocruz | if you want historical reasons why it was decided to be used , TheJulia may be best to answer that | 14:17 |
jeblair | i'm not sure i'm stating the question in a way that is conveying my meaning | 14:17 |
fungi | is it possible to use it without having it serve as a dns resolver? | 14:17 |
TheJulia | it is, the configuration just needs to be disabled for dns resolution | 14:18 |
*** asettle has quit IRC | 14:18 | |
TheJulia | that is in what is put in place for dnsmasq's main config file | 14:18 |
fungi | that way it wouldn't conflict with the local resolver cache service we want to run on the same machine | 14:18 |
*** tqtran has quit IRC | 14:18 | |
*** asettle has joined #openstack-infra | 14:18 | |
jeblair | dnsmasq is a server which supports lots of protocols. do we need to use its dns resolver as opposed to just the other (pxe/tftp) bits? | 14:18 |
rcarrillocruz | is there a flag available to disable it ? | 14:19 |
rcarrillocruz | TheJulia: ^ | 14:19 |
rcarrillocruz | the dns part | 14:19 |
TheJulia | rcarrillocruz: in bifrost, not presently | 14:19 |
rcarrillocruz | what i thought | 14:19 |
rcarrillocruz | i mean, it wouldn't be complex to push | 14:19 |
*** edtubill has joined #openstack-infra | 14:19 | |
rcarrillocruz | s/push/patch | 14:19 |
TheJulia | no, it should be extremely simple | 14:19 |
rcarrillocruz | TheJulia: do nodes need any dns resolving from the bifrost controller during the IPA loading etc | 14:20 |
rcarrillocruz | ? | 14:20 |
*** jcoufal has joined #openstack-infra | 14:20 | |
rcarrillocruz | if DNS was never needed, i'm curious why it wasn't just disabled from the beginning | 14:20 |
*** asselin has joined #openstack-infra | 14:21 | |
TheJulia | rcarrillocruz: if someone decides to use names in the config handed to ironic in terms of URLs, then dns resolution is required | 14:21 |
TheJulia | but if only IPs are used, then it is not required | 14:21 |
rcarrillocruz | ah | 14:21 |
*** gongysh has joined #openstack-infra | 14:21 | |
rcarrillocruz | hmmm | 14:21 |
jeblair | we *do* have a dns resolver :) | 14:21 |
TheJulia | Well, there you go, the correct dns resolver just needs to be offered out for dhcp requests then | 14:22 |
*** devkulkarni has joined #openstack-infra | 14:22 | |
jeblair | {% if disable_dnsmasq_dns %} | 14:22 |
jeblair | bifrost may already have the option :) | 14:22 |
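A minimal sketch of what that toggle presumably renders into dnsmasq's config (these are standard dnsmasq options; the DHCP range and resolver address are placeholders, not the real deployment values):

    # dnsmasq.conf fragment, illustrative only
    port=0                                      # turn off dnsmasq's DNS server entirely
    enable-tftp                                 # keep the PXE/TFTP pieces bifrost needs
    dhcp-range=192.0.2.10,192.0.2.100           # deployment-specific
    dhcp-option=option:dns-server,192.0.2.1     # hand clients a real resolver, per TheJulia's note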
openstackgerrit | Matthew Treinish proposed openstack-infra/elastic-recheck: Make everything plural https://review.openstack.org/355967 | 14:23 |
rcarrillocruz | doing ironic node-show blah, i only see IPs there | 14:23 |
rcarrillocruz | so i think we should be good | 14:23 |
*** asettle has quit IRC | 14:23 | |
*** edtubill has quit IRC | 14:23 | |
*** _ari_ has joined #openstack-infra | 14:23 | |
*** hongbin has joined #openstack-infra | 14:25 | |
fungi | anybody happen to know how to get systemd to tell you the location on disk of the unit it's using for a particular service? | 14:28 |
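For reference, a couple of standard systemctl invocations that answer this (not from the channel discussion):

    systemctl show -p FragmentPath snmpd.service   # prints the path of the loaded unit file
    systemctl cat snmpd.service                    # prints the unit file(s), path included
    systemctl status snmpd.service                 # the "Loaded:" line also shows the path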
openstackgerrit | Ricardo Carrillo Cruz proposed openstack-infra/puppet-infracloud: Disable DNS resolver on Bifrost dnsmasq server https://review.openstack.org/355973 | 14:29 |
rcarrillocruz | jeblair: ^ | 14:30 |
rcarrillocruz | fungi: ^ | 14:30 |
*** tosky has quit IRC | 14:30 | |
*** edtubill has joined #openstack-infra | 14:31 | |
*** zz_dimtruck is now known as dimtruck | 14:31 | |
openstackgerrit | Ivan Udovichenko proposed openstack-infra/project-config: Add new/update existing projects https://review.openstack.org/347047 | 14:32 |
*** tosky has joined #openstack-infra | 14:33 | |
azvyagintsev_h | fungi will you have some time to help me with https://review.openstack.org/#/c/353861 ? i cannot figure out what i'm missing there.. :( | 14:34 |
fungi | just a heads up, we're discussing some job failures in #openstack-qa that look like they're a result of remote http(s) calls out of bluebox are failing with some consistency | 14:34 |
*** rajinir has joined #openstack-infra | 14:34 | |
openstackgerrit | Ivan Udovichenko proposed openstack-infra/project-config: Add new/update existing projects https://review.openstack.org/347047 | 14:34 |
fungi | azvyagintsev_h: no clue what you're talking about, or why you're asking me directly. can you elaborate? | 14:34 |
*** mdrabe has quit IRC | 14:34 | |
fungi | azvyagintsev_h: is it related to something i was already working on? | 14:34 |
*** mdrabe has joined #openstack-infra | 14:35 | |
rajinir | fungi: The cell patch was reverted https://review.openstack.org/#/c/355599/1 | 14:35 |
*** jed56 has quit IRC | 14:35 | |
*** asettle has joined #openstack-infra | 14:35 | |
fungi | rajinir: the patch which was causing the failure you were seeing, right? | 14:35 |
rajinir | fungi: yes | 14:35 |
fungi | i didn't follow that very closely, just saw it was also severely impacting nova cells tests in the upstream ci as well | 14:36 |
*** pgadiya has quit IRC | 14:36 | |
*** burgerk has joined #openstack-infra | 14:37 | |
azvyagintsev_h | fungi i guess not :) should i directly ask/wait for Craig or someone else? (i'm asking you just because you are the guru) | 14:37 |
*** amitgandhinz has quit IRC | 14:38 | |
*** links has quit IRC | 14:39 | |
*** amitgandhinz has joined #openstack-infra | 14:39 | |
*** kushal has quit IRC | 14:40 | |
openstackgerrit | Ilya Shakhat proposed openstack-infra/project-config: Add new project "os-failures" https://review.openstack.org/355819 | 14:40 |
betherly | hi there! getting ready to release ironic-ui. i have a patch for openstack-releases but do i also need to tag the release? | 14:40 |
*** kushal has joined #openstack-infra | 14:41 | |
betherly | the route for releasing eslint was quite different from what i did with the ironic-ui last time so got a bit confused re what i need to do this time round | 14:41 |
fungi | betherly: you probably want to ask in #openstack-release but i believe for projects under release management you submit a patch to the releases repo and then a release manager runs a script after it merges and pushes a tag for you | 14:41 |
*** xarses has joined #openstack-infra | 14:42 | |
betherly | ah sorry fungi!! | 14:42 |
betherly | that would make sense! thank you so much :) | 14:42 |
fungi | betherly: you're welcome | 14:42 |
*** florianf has quit IRC | 14:44 | |
jeblair | fungi: 355580 was killed by the problem it attempts to debug | 14:44 |
Shrews | fungi: so, hopefully this change will make those ansible timeouts actually be reported as timeouts and not unparseable: https://github.com/ansible/ansible/pull/17104 | 14:44 |
openstackgerrit | Matthew Treinish proposed openstack-infra/elastic-recheck: Add query for bug 1613749 https://review.openstack.org/355988 | 14:44 |
openstack | bug 1613749 in OpenStack-Gate "Git timeouts from bluebox" [Undecided,New] https://launchpad.net/bugs/1613749 | 14:44 |
beagles | rcarrillocruz, is https://review.openstack.org/#/c/355973/ supposed to help with some of the jobs failing because of stuff like slow git repo cloning, etc? | 14:45 |
beagles | just seeking clarification as the issue is atm near and dear to my heart :) | 14:45 |
jeblair | beagles: no; but the ansible fix from yesterday was not sufficient | 14:45 |
jeblair | beagles: https://github.com/ansible/ansible/pull/17104 | 14:46 |
rcarrillocruz | beagles: no, that change is unrelated | 14:46 |
beagles | rcarrillocruz, thanks | 14:46 |
beagles | also jeblair thanks | 14:46 |
jeblair | beagles: so the next iteration of that fix is in progress. it hasn't landed in ansible yet, but when it does, we'll redeploy | 14:47 |
jeblair | beagles: however, it's looking like most of the instances of this error are actually timeouts -- did you say you were thinking that was the case with your job? | 14:47 |
rcarrillocruz | that change is about infracloud, a pool of servers we'll manage to run a cloud for CI | 14:47 |
Shrews | jeblair: to be fair, that PR will just (hopefully) get timeouts reported as timeouts. it doesn't actually fix the timeout | 14:47 |
jeblair | Shrews: right :) | 14:47 |
beagles | jeblair, yeah, I was just going to say.. the parsing thing is just how the info was represented - it's the timeouts I'm wondering about | 14:48 |
beagles | not the timeouts themselves actually, but *why* those particular things are taking so long :) | 14:49 |
beagles | jeblair, ultimately, I want the ansible fix to be unnecessary :) | 14:49 |
fungi | azvyagintsev_h: explaining in channel what you're trying to figure out and what potential issues you've eliminated already is usually a faster way to get help, rather than just pasting a link. i've skimmed the change and it seems you're proposing creation of a new project/repo but are having trouble with the layout job. the console log from it indicates you're trying to configure zuul to run jobs you | 14:49 |
fungi | haven't defined (e.g., gate-murano-pkg-check-python27-ubuntu-trusty). i see a typo which i've marked inline on your change that would account for it | 14:49 |
*** dprince has quit IRC | 14:49 | |
*** pt_15 has joined #openstack-infra | 14:50 | |
*** jistr|debug is now known as jistr | 14:50 | |
jeblair | mordred: your 4-minute git thing was on osic? | 14:51 |
sdague | jeblair: 4 seconds, right? | 14:52 |
jeblair | sdague: those are different than minutes? :) yeah, 4-something. i guess i'm asking what that value is too. i may be confused because i'm staring at a log that took 5 minutes for each remote update. | 14:53 |
jeblair | on osic. | 14:54 |
sdague | jeblair: I thought he said seconds | 14:54 |
sdague | 4 minutes would be an issue, I agree | 14:54 |
jeblair | sdague: a full clone of git://git.openstack.org/openstack/python-aodhclient took 1 second, so even 4 seconds is :( | 14:55 |
*** tongli has joined #openstack-infra | 14:56 | |
*** vinaypotluri has joined #openstack-infra | 14:56 | |
*** Julien-zte has quit IRC | 14:57 | |
* jeblair fixes he.net tunnel | 14:57 | |
*** permalac has quit IRC | 14:58 | |
*** florianf has joined #openstack-infra | 14:58 | |
*** jimbaker has quit IRC | 14:59 | |
jeblair | telnet 2001:4800:1ae1:18:f816:3eff:fe13:4660 1885 | 14:59 |
jeblair | ga | 14:59 |
jeblair | that's even the wrong port | 14:59 |
mordred | jeblair: yes. 4 minutes | 14:59 |
jeblair | mordred: not seconds? :) | 14:59 |
mordred | :) | 14:59 |
*** yaume has joined #openstack-infra | 15:00 | |
mordred | nope. 4 minutes :) | 15:00 |
jeblair | mordred: osic? | 15:00 |
openstackgerrit | Aleksey Zvyagintsev proposed openstack-infra/project-config: Add repo for murano-pkg-check. Murano package validator tool. https://review.openstack.org/353861 | 15:00 |
openstackgerrit | Darragh Bailey proposed openstack-infra/jenkins-job-builder: Allow using lockfile per jenkins master https://review.openstack.org/293631 | 15:00 |
mordred | jeblair: yup | 15:01 |
openstackgerrit | Aleksey Zvyagintsev proposed openstack-infra/project-config: Add repo for murano-pkg-check. Murano package validator tool. https://review.openstack.org/353861 | 15:01 |
*** amitgandhinz has quit IRC | 15:02 | |
*** amitgandhinz has joined #openstack-infra | 15:02 | |
wznoinsk | hi infra | 15:03 |
wznoinsk | et al | 15:03 |
*** jimbaker has joined #openstack-infra | 15:03 | |
*** jimbaker has quit IRC | 15:03 | |
*** jimbaker has joined #openstack-infra | 15:03 | |
openstackgerrit | Aleksey Zvyagintsev proposed openstack-infra/project-config: Add repo for murano-pkg-check. Murano package validator tool. https://review.openstack.org/353861 | 15:03 |
wznoinsk | did anyone see a situation where static-network-up is emitted before all the interfaces get their IPs and their /run/network/ifup.* files get created? | 15:04 |
jeblair | mordred, sdague: oh, huh, it's not every job on osic. i just watched one breeze right through a git clone. | 15:04 |
*** ifarkas is now known as ifarkas_afk | 15:04 | |
wznoinsk | that's ubuntu 14.04, troubleshooting cloud-init init kicking off too early (before the network is actually up) | 15:04 |
*** dizquierdo is now known as dizquierdo_afk | 15:05 | |
mordred | jeblair: yah - I jumped on an osic node earlier and tried some manual updates and they worked as expected | 15:05 |
mordred | jeblair: I have not yet been able to find the pattern | 15:05 |
jeblair | grr. | 15:06 |
pabelanger | mordred: jeblair: fungi: So, here is the boot process on ubuntu-xenial visualized: http://imgh.us/filename_3.svg check out unbound | 15:06 |
pabelanger | 1min 155ms to start | 15:06 |
pabelanger | I don't know why yet | 15:06 |
*** martinkopec has quit IRC | 15:06 | |
openstackgerrit | Emmet Hikory proposed openstack-infra/storyboard-webclient: Add Worklists and Boards to About Page https://review.openstack.org/355912 | 15:06 |
jeblair | cloudnull: if we collect instance ids from jobs which have very slow interactions with our git farm, can you correlate those and see if there is a host/network pattern on the cloud side? | 15:07 |
pabelanger | it does look like it is waiting for random | 15:07 |
fungi | speaking of osic, it looks like we also have devstack jobs failing there because glance isn't responding on 127.0.0.1:9292 when told to listen on 0.0.0.0:9292 (baffling) | 15:07 |
fungi | and i notice traceroute6 out from job nodes there to git.o.o coming back blank | 15:07 |
jeblair | pabelanger: is that first or second boot? | 15:08 |
*** mhickey has joined #openstack-infra | 15:08 | |
*** itisha has joined #openstack-infra | 15:09 | |
jeblair | fungi: i just did 'traceroute6 git.openstack.org' from a node (which ran a job where git worked fine) and got data | 15:09 |
jeblair | fungi: so maybe on the nodes where git takes 4+ minutes for each operation, traceroute6 git.o.o also fails? | 15:09 |
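A rough way to reproduce the comparison from a held node, using the same repo and hostnames discussed here:

    time git clone git://git.openstack.org/openstack/python-aodhclient   # ~1s on a healthy node
    cd python-aodhclient
    time git remote update origin      # anywhere from ~4s to several minutes on the suspect nodes
    traceroute6 git.openstack.org      # reportedly came back blank on some of those nodes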
*** hockeynut has joined #openstack-infra | 15:10 | |
openstackgerrit | Aleksey Zvyagintsev proposed openstack-infra/project-config: Add repo for murano-pkg-check. Murano package validator tool. https://review.openstack.org/353861 | 15:10 |
pabelanger | jeblair: in this case, 2nd boot (I disabled the puppet service on boot). I can redo on first boot if needed | 15:10 |
fungi | jeblair: perhaps. here's one log we were looking at with the localhost glance weirdness http://logs.openstack.org/55/352455/3/gate/gate-tempest-dsvm-cells/7881266/console.html#_2016-08-16_12_27_19_830897 | 15:10 |
jeblair | pabelanger: nah, that's okay. 2nd is more interesting to me. | 15:10 |
*** vhosakot has joined #openstack-infra | 15:13 | |
*** devkulkarni has quit IRC | 15:13 | |
*** jcoufal has quit IRC | 15:13 | |
*** devkulkarni has joined #openstack-infra | 15:13 | |
openstackgerrit | Paul Belanger proposed openstack-infra/system-config: Disable puppet service on boot https://review.openstack.org/356004 | 15:13 |
pabelanger | disabled puppet service on boot^ | 15:14 |
*** hockeynu_ has joined #openstack-infra | 15:15 | |
pabelanger | jeblair: I've updated our configure_mirror.sh (355695) to better handle the delayed dns on ubuntu-xenial, since we have a large number of launch failures because of it. This could also explain why the ubuntu-trusty ready node count is much higher than ubuntu-xenial's during the day | 15:15 |
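Not necessarily what 355695 does, but one common shape for making a script like configure_mirror.sh tolerate a slow-starting resolver is a bounded wait before anything needs DNS; a sketch:

    # wait up to ~60 seconds for name resolution to start working
    for i in $(seq 1 12); do
        getent hosts git.openstack.org >/dev/null && break
        sleep 5
    done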
*** devkulkarni has quit IRC | 15:16 | |
*** devkulkarni has joined #openstack-infra | 15:16 | |
jeblair | pabelanger: does that always happen, or just sometimes? | 15:16 |
jeblair | pabelanger: i wonder if it's the same problem as git. | 15:17 |
*** Goneri has quit IRC | 15:17 | |
pabelanger | jeblair: Yes, I've also seen it on multiple clouds, osic-cloud1 and bluebox, in sampling ubuntu-xenial syslogs | 15:17 |
*** dprince has joined #openstack-infra | 15:18 | |
jeblair | pabelanger: oh, so not just osic | 15:18 |
pabelanger | let me check others quickly | 15:18 |
pabelanger | jeblair: right | 15:18 |
*** jcoufal has joined #openstack-infra | 15:18 | |
*** hockeynut has quit IRC | 15:18 | |
clarkb | pabelanger: and that is with the pre-ipv6-resolver config, right? (that change hasn't gotten onto our images yet) | 15:18 |
*** Goneri has joined #openstack-infra | 15:18 | |
jeblair | cloudnull: is there any commonality between instances a4b575fe-b043-4775-9d8e-286c04f03a9f and 9b3bf68f-b08b-4851-a22a-d2f6a5247982 ? | 15:19 |
pabelanger | clarkb: no, I build and uploaded a ubuntu-xenial image to osic last night, same issue | 15:19 |
jeblair | cloudnull, pabelanger: i don't think we actually needed the ipv6 resolver change -- osic has a 6 to 4 gateway | 15:19 |
jeblair | clarkb: ^ | 15:19 |
*** markusry has joined #openstack-infra | 15:19 | |
jeblair | shouldn't hurt | 15:20 |
clarkb | jeblair: correct we don't need it for stuff to function properly. Just trying to make sure this isn't somehow a regression related to that change | 15:20 |
*** markusry has quit IRC | 15:20 | |
fungi | yeah, we don't _need_ it for osic, but we do need it in case we end up with a provider with no ipv4 routing at all for our job nodes in the future | 15:20 |
fungi | so not urgent, but not entirely useless | 15:20 |
*** mtanino has joined #openstack-infra | 15:20 | |
mtreinish | jeblair: fwiw, we're tracking 2 failures on the tempest glance tests. One on osic and the other on bluebox | 15:20 |
mtreinish | it's the same tests, but they manifest a little differently | 15:21 |
pabelanger | clarkb: jeblair: that is from internap http://imgh.us/filename_4.svg | 15:22 |
pabelanger | I have a change out to disable puppet on boot | 15:23 |
pabelanger | 356004 | 15:23 |
*** derekh has quit IRC | 15:24 | |
clarkb | pabelanger: how long does it take if you stop the unbound service then start it? | 15:24 |
*** oanson has quit IRC | 15:24 | |
clarkb | is this only present on boot or any time the service starts? | 15:24 |
openstackgerrit | Ricardo Carrillo Cruz proposed openstack-infra/puppet-infracloud: Switch to infra-cloud-bridge element https://review.openstack.org/356009 | 15:24 |
pabelanger | clarkb: after it has started properly, restarts are instant | 15:24 |
fungi | i have a feeling it's generating a local key for dnssec at first start | 15:24 |
*** ccamacho is now known as ccamacho|out | 15:24 | |
pabelanger | right, I think that too | 15:25 |
fungi | though why it takes that long to do so is worth asking | 15:25 |
openstackgerrit | Merged openstack-infra/system-config: Add mirror.regionone.osic-cloud1.o.o to cacti https://review.openstack.org/355580 | 15:25 |
fungi | haveged starts well before udevd according to that visualization | 15:25 |
rcarrillocruz | some entropy delay ^ | 15:25 |
pabelanger | clarkb: http://paste.openstack.org/show/557771/ is always the order when unbound starts processing | 15:25 |
clarkb | "With cache restoration turned on, my system reboot would take forever, because of unbound hanging/processing a maybe corrupt cache-file." is from a random pfsense forum post | 15:25 |
pabelanger | rcarrillocruz: likely ^ | 15:25 |
fungi | er, i mean haveged starts well before unbound | 15:26 |
fungi | oh, that's worth checking | 15:26 |
pabelanger | clarkb: So.... | 15:26 |
pabelanger | there is some chroot logic in unbound too | 15:26 |
clarkb | pabelanger: apparently we can turn off cache restoration which should be fine for our single use nodes | 15:26 |
clarkb | (if that is indeed related) | 15:26 |
pabelanger | okay, I can try that | 15:27 |
*** tongli has quit IRC | 15:27 | |
pabelanger | any docs on how to do that? | 15:27 |
*** tongli has joined #openstack-infra | 15:27 | |
clarkb | not seeing it in the unbound.conf man page | 15:28 |
* clarkb keeps digging | 15:28 | |
*** esikachev has quit IRC | 15:29 | |
fungi | yeah, i've been through the manpages for unbound, unbound.conf and unbound-control so far, to no avail | 15:30 |
pabelanger | I think it is a manual process | 15:30 |
jeblair | fungi, pabelanger: i thought this graph was second boot? | 15:31 |
clarkb | pabelanger: ya looks like its part of unbound-control so would probably be part of the unit files if being done on ubuntu | 15:31 |
pabelanger | jeblair: first svg was 2nd boot, second svg was first boot | 15:32 |
jeblair | pabelanger: either way, it's a long startup both times, ya? | 15:32 |
*** matrohon has quit IRC | 15:32 | |
jeblair | pabelanger: do you have a node where this is slow? | 15:33 |
jeblair | i just restarted unbound on a xenial node and it was fast | 15:33 |
pabelanger | jeblair: yes, 2001:4800:1ae1:18:f816:3eff:fe8e:9a3e | 15:33 |
pabelanger | I manually launched that is osic-cloud1 | 15:34 |
pabelanger | jeblair: feel free to reboot if needed | 15:34 |
jeblair | pabelanger: cool, thanks | 15:34 |
clarkb | I guess the other thing to check is logs? is unbound logging to journald here? | 15:35 |
jeblair | unbound restarts instantly there. i'll reboot | 15:35 |
pabelanger | jeblair: right on first boot check the status of the service | 15:36 |
pabelanger | I have not stopped and started right after a boot | 15:36 |
pabelanger | clarkb: we'd have to enable debugging, which I can | 15:36 |
jeblair | pabelanger: yeah, there's no delay when doing a stop/start | 15:36 |
jeblair | there is some logging to syslog | 15:36 |
clarkb | Aug 16 15:07:56 ubuntu unbound-anchor: fail: the anchor is NOT ok and could not be fixed | 15:36 |
pabelanger | jeblair: on first boot? | 15:37 |
*** tosky_ has joined #openstack-infra | 15:37 | |
clarkb | though on that random host I am looking at syslog for it seems to have started in about 2 seconds | 15:37 |
*** zhurong has quit IRC | 15:37 | |
clarkb | Aug 16 15:07:55 ubuntu systemd[1]: Starting unbound.service... to Aug 16 15:07:56 ubuntu systemd[1]: Started unbound.service. | 15:37 |
*** _nadya_ has quit IRC | 15:38 | |
pabelanger | clarkb: which host is that? | 15:38 |
clarkb | ubuntu-xenial-rax-ord-3521279 | 15:38 |
clarkb | just a random one I grabbed out of the nodepool list | 15:38 |
jeblair | strace -p 1852 | 15:38 |
jeblair | strace: Process 1852 attached | 15:38 |
jeblair | getrandom( | 15:38 |
clarkb | so this isn't consistent | 15:38 |
*** edtubill has quit IRC | 15:38 | |
jeblair | so yes, waiting for getrandom | 15:38 |
fungi | cat /proc/sys/kernel/random/entropy_avail | 15:39 |
pabelanger | jeblair: neat | 15:39 |
jeblair | 2374 | 15:39 |
jeblair | i'll reboot again and repeat | 15:39 |
fungi | also does ps suggest haveged is running? | 15:39 |
clarkb | there is a haveged on my host that did it quickly | 15:39 |
fungi | haveged should be keeping the entropy pool nice and full | 15:39 |
*** tosky has quit IRC | 15:40 | |
pabelanger | fungi: I see it running | 15:40 |
*** amotoki has joined #openstack-infra | 15:40 | |
openstackgerrit | Ilya Shakhat proposed openstack-infra/project-config: Add new project "os-failures" https://review.openstack.org/355819 | 15:41 |
pabelanger | root 700 0.2 0.0 12204 6584 ? Ss 15:39 0:00 /usr/sbin/haveged --Foreground --verbose=1 -w 1024 | 15:41 |
dstufft | mordred: sdague I am awake now, what's up? | 15:41 |
jeblair | hrm. haveged was running while unbound-anchor was waiting. the pool had 2496 | 15:41 |
*** andreykurilin has quit IRC | 15:41 | |
jeblair | now that unbound-anchor completed the pool is at 2369 | 15:41 |
fungi | jeblair: yeah, that's a ton of available entropy | 15:41 |
mordred | dstufft: pip download with an alternate index does not store into or retreive from cache. pip download without an alternate index does. pip install writes to and reads from cache in both cases | 15:42 |
mordred | dstufft: should I file a bug for that? | 15:42 |
*** rbuzatu has quit IRC | 15:42 | |
*** amotoki has quit IRC | 15:43 | |
*** amotoki has joined #openstack-infra | 15:43 | |
dstufft | mordred: is this alternate index available publically? can I repro it on my desktop? | 15:43 |
mordred | dstufft: yup! | 15:43 |
clarkb | jeblair: fungi pabelanger I wonder if /usr/share/dns/root.key's key is just old and stale? the unbound-anchor manpage warns against this | 15:43 |
mordred | dstufft: pip install --trusted-host mirror.gra1.ovh.openstack.org -i http://mirror.gra1.ovh.openstack.org/pypi/simple paramz | 15:44 |
mordred | dstufft: is what we've been using | 15:44 |
mordred | dstufft: (obviously in the various different combinations) | 15:44 |
clarkb | it then does an update because that file is not valid | 15:44 |
openstackgerrit | Emmet Hikory proposed openstack-infra/storyboard: Describe Storyboard in more detail https://review.openstack.org/356021 | 15:44 |
*** rcernin has quit IRC | 15:46 | |
clarkb | it does fetch things from the internet in that case | 15:46 |
*** rbuzatu has joined #openstack-infra | 15:46 | |
pabelanger | clarkb: but once the file is updated once, shouldn't the next reboot be good? | 15:47 |
rcarrillocruz | TheJulia, cinerama : are we good to land https://review.openstack.org/#/c/353990/ and https://review.openstack.org/#/c/354615/ ? | 15:48 |
*** e0ne has quit IRC | 15:49 | |
clarkb | pabelanger: maybe? Probably not if it copies the bad one over again | 15:50 |
jeblair | clarkb, pabelanger: when i strace unbound-anchor at boot, it's sitting at getrandom, and stays there until the kernel says: Aug 16 15:49:39 ubuntu kernel: [ 62.801497] random: nonblocking pool is initialized | 15:50 |
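The diagnosis above, consolidated into the commands used (the PID is whatever pgrep reports on the node):

    pgrep -a unbound-anchor                        # find the helper holding up unbound
    sudo strace -p <pid>                           # shows it parked in getrandom(...)
    cat /proc/sys/kernel/random/entropy_avail      # plenty of entropy thanks to haveged...
    journalctl -k | grep 'random:'                 # ...but getrandom() only returns once the kernel
                                                   # logs "random: nonblocking pool is initialized"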
dstufft | mordred: Hmm, well pip install --trusted-host doesn't populate the cache at all for me here, and I think that's by design (trying to remember back to when we implemented it). The comments in the code suggest we purposely only cache valid HTTPS to prevent semi persistent poisoning of the cache and requiring manual eviction, also http://mirror.gra1.ovh.openstack.org/pypi/simple/paramz/ doesn't have cache control headers so even if you | 15:51 |
dstufft | had valid HTTPS it wouldn't do a no-network cache hit (it does have an ETag header, so it'll do a conditional GET though) (same is true for the files themselves) | 15:51 |
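A quick way to see the behaviour dstufft describes, using the same mirror and package mordred tested with (cache paths are pip 8.x defaults):

    rm -rf ~/.cache/pip
    pip install --trusted-host mirror.gra1.ovh.openstack.org \
        -i http://mirror.gra1.ovh.openstack.org/pypi/simple paramz
    find ~/.cache/pip/http -type f | wc -l    # 0: plain-HTTP responses are not cached
    pip uninstall -y paramz
    pip install paramz                        # default HTTPS index (pypi)
    find ~/.cache/pip/http -type f | wc -l    # non-zero: valid HTTPS responses are cached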
pabelanger | jeblair: I'd like to try something quickly, can we set ROOT_TRUST_ANCHOR_UPDATE=false in /etc/default/unbound and restart? | 15:52 |
pabelanger | jeblair: then run strace | 15:52 |
pabelanger | # Whether to automatically update the root trust anchor file. | 15:52 |
pabelanger | ROOT_TRUST_ANCHOR_UPDATE=true | 15:52 |
jeblair | pabelanger: go for it | 15:52 |
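The experiment pabelanger proposes, spelled out (the boot charts above look like systemd-analyze output, but that is an assumption):

    sudo sed -i 's/^ROOT_TRUST_ANCHOR_UPDATE=.*/ROOT_TRUST_ANCHOR_UPDATE=false/' /etc/default/unbound
    sudo reboot
    # after the node comes back up:
    systemd-analyze blame | grep unbound     # unbound.service drops from ~1min to ~138ms
    systemd-analyze plot > boot.svg          # regenerate the visualization, if desired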
mordred | dstufft: ah. you're right - I must have done one of the combinations wrong :( | 15:52 |
mordred | dstufft: if we did a download without the alternate index just pointing to normal pypi | 15:53 |
mordred | dstufft: and then did a subsequent install with the trusted index ... should we expect it to read from the cache? | 15:53 |
dstufft | mordred: No, cache keys are full URLs | 15:53 |
mordred | gotcha | 15:53 |
openstackgerrit | Andrea Frittoli proposed openstack-infra/subunit2sql: Fix typo in test_attr_list handling https://review.openstack.org/355385 | 15:53 |
openstackgerrit | Andrea Frittoli proposed openstack-infra/subunit2sql: Remove the test_attr_prefix before injecting https://review.openstack.org/355393 | 15:53 |
jeblair | $ reboot | 15:53 |
jeblair | Failed to connect to bus: No such file or directory | 15:53 |
*** ganesan has joined #openstack-infra | 15:53 | |
dstufft | mordred: so my recommendation would be A) Throw letsencrypt on the mirror B) setup cache-control | 15:54 |
jeblair | love it | 15:54 |
clarkb | jeblair: need more sudo | 15:54 |
jeblair | clarkb: indeed | 15:54 |
openstackgerrit | Beth Elwell proposed openstack-infra/project-config: Add release notes jobs for ironic-ui https://review.openstack.org/356029 | 15:54 |
mordred | dstufft: nod. cool. thanks! | 15:54 |
fungi | clarkb: sure, but that error message is beyond vague | 15:54 |
mordred | fungi: ^^ see convo with dstufft | 15:54 |
clarkb | fungi: ya its systemctl failing to talk to systemd due to perms | 15:54 |
*** hieulq_ has joined #openstack-infra | 15:55 | |
fungi | mordred: but if cache keys are the full urls, then we're still not going to end up being able to do much to prepopulate the cache | 15:55 |
mordred | fungi: agree | 15:55 |
fungi | since we need a different mirror url in each provider | 15:55 |
mordred | yup | 15:55 |
dstufft | mordred: fungi if this is a systemd using machine I have a couple of systemd unit files and a cron job you can use to keep LE up to date | 15:55 |
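dstufft's unit files aren't pasted into the channel, but the usual shape of unattended Let's Encrypt renewal is a periodic renew plus a webserver reload. A minimal cron sketch follows; the client name, schedule, and reload command are illustrative assumptions, not what dstufft actually uses:

```bash
# /etc/cron.d/letsencrypt-renew (sketch)
# certbot only renews certificates that are close to expiry, so running twice
# a day is cheap; reload apache afterwards so it picks up the new cert.
0 */12 * * * root certbot renew --quiet --post-hook "service apache2 reload"
```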
pabelanger | jeblair: clarkb: okay, that booted a little faster, jeblair I missed the strace, do you mind trying? | 15:55 |
dstufft | ah yea | 15:56 |
dstufft | that's harder | 15:56 |
mordred | yah. I think that's the real issue | 15:56 |
fungi | i suppose i can buy certs for the mirrors... i'm hesitant to have letsencrypt breaking our mirrors at random when it tries to renew certs | 15:56 |
mordred | fungi: I don;'t think it'll get us anywhere | 15:56 |
dstufft | it's a proper HTTP cache, so it treats different URLs as distinct | 15:56 |
fungi | dstufft: is there a good way to transform the cache? i suppose you're using a one-way hash so we can't reverse it to update the urls? | 15:57 |
mordred | actually - how is this working on normal devstack runs then? | 15:57 |
mordred | sdague said earlier that we do see only one download of a given thing in our devstack jobs | 15:57 |
openstackgerrit | Emmet Hikory proposed openstack-infra/storyboard-webclient: Add Worklists and Boards to About Page https://review.openstack.org/355912 | 15:57 |
mordred | but if install is not supposed to cache when we have trusted-host set | 15:57 |
mordred | those two things are potentially at odds | 15:57 |
clarkb | its wheel caching in devstack iirc | 15:58 |
sdague | clarkb: it's just pip | 15:58 |
dstufft | mordred: it's possible that wheel caching didn't get the same treatment | 15:58 |
mordred | got it. because those are locally built wheels | 15:58 |
mordred | or whatnot | 15:58 |
sdague | mordred: except they aren't | 15:58 |
pabelanger | clarkb: jeblair: fungi: Oh, ya. Way faster now: http://imgh.us/filename_5.svg That is with ROOT_TRUST_ANCHOR_UPDATE=false | 15:58 |
sdague | we're mostly downloading wheels | 15:58 |
pabelanger | unbound.service (138ms) | 15:59 |
mordred | so wheel cache potentially not caching the same as tarballs is the thing saving us there | 15:59 |
*** matthewbodkin has quit IRC | 16:00 | |
sdague | clarkb / fungi - I have suspicions that some of our odd fails in the last day are related to this - https://review.openstack.org/#/c/356010/ | 16:00 |
dstufft | fungi: it is a one way hash, I'm still kinda asleep (woo waking up at 11am), but off the top of my head it might be reasonable to implement some sort of aliasing thing. a la "treat domain x, y, z as domain a" | 16:00 |
sdague | which increased keystone debug logs by 2 orders of magnitude | 16:00 |
dstufft | when it comes to caching | 16:00 |
sdague | that's the revert | 16:00 |
sdague | any chance we could pop it into the top of gate? | 16:00 |
openstackgerrit | Emmet Hikory proposed openstack-infra/storyboard-webclient: Add Worklists and Boards to About Page https://review.openstack.org/355912 | 16:00 |
*** yaume has quit IRC | 16:00 | |
*** Sukhdev has joined #openstack-infra | 16:01 | |
sdague | I've definitely seen a bunch of odd keystone token lookup fails since that merged | 16:01 |
dstufft | we should probably make the wheel cache and the http cache consistent though | 16:01 |
dstufft | it's weird that it's not | 16:01 |
*** xarses has quit IRC | 16:01 | |
sdague | plus, until it's reverted, keystone logs are about ~1G uncompressed | 16:01 |
*** xarses has joined #openstack-infra | 16:01 | |
dstufft | either skip the lack of caching on http or make wheel cache not cache on http | 16:01 |
fungi | sdague: ouch | 16:01 |
pabelanger | jeblair: clarkb: maybe not. still takes 1min for host git.openstack.org to resolve | 16:01 |
jeblair | pabelanger: can you put that host back so i can continue to debug? | 16:01 |
sdague | fungi: it was an attempt to narrow some issues in the revoke code, I think the full extent of the fallout wasn't anticipated | 16:02 |
pabelanger | jeblair: yes, just rebooted back to original settings | 16:02 |
mordred | dstufft: I agree on making them consistent | 16:03 |
jeblair | pabelanger, clarkb: looking at another host which is not slow to boot, i see: | 16:03 |
jeblair | Aug 16 15:55:19 ubuntu-xenial-osic-cloud1-3521020 kernel: [ 3.906606] random: nonblocking pool is initialized | 16:03 |
*** jcoufal_ has joined #openstack-infra | 16:03 | |
jeblair | note that's at 3.9 seconds from boot, as opposed to 62 seconds on pabelanger's host | 16:03 |
fungi | sdague: openstack/keystone 356010,1 is at the top of the integrated gate change queue now | 16:03 |
fungi | stevemar: dstanek: ^ | 16:04 |
*** edtubill has joined #openstack-infra | 16:04 | |
stevemar | fungi: thank you | 16:04 |
stevemar | sdague is padding his stats with reverts again :) | 16:05 |
*** Sukhdev has quit IRC | 16:05 | |
zaro | morning | 16:05 |
jeblair | pabelanger, clarkb: putting everything together so far -- it's deciding to fetch a new anchor file, and is using openssl for that which needs some random which is taking 60+ seconds | 16:05 |
clarkb | jeblair: that sounds correct to me | 16:05 |
*** xarses has quit IRC | 16:05 | |
mordred | that also sounds correct to me | 16:05 |
mordred | based on reading | 16:05 |
*** xarses has joined #openstack-infra | 16:06 | |
jeblair | pabelanger: i can't log into 2001:4800:1ae1:18:f816:3eff:fe8e:9a3e | 16:06 |
*** jcoufal has quit IRC | 16:06 | |
jeblair | pabelanger: nm | 16:06 |
*** aaltman has joined #openstack-infra | 16:07 | |
aaltman | Hey guys, I had a quick question about nodepool if anyone has a moment | 16:07 |
*** xarses has quit IRC | 16:07 | |
*** xarses has joined #openstack-infra | 16:07 | |
pabelanger | I still don't understand why it gets a new root.key on reboot | 16:07 |
mordred | aaltman: just shoot - we'll respond in time | 16:07 |
*** amotoki has quit IRC | 16:07 | |
pabelanger | I would expect that to persist | 16:07 |
clarkb | pabelanger: it's copying the bad one; if it does that unconditionally, the fixed one will be overwritten. would be easy to check that in syslog | 16:08 |
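Checking that theory is just a grep on an affected node; the exact message text isn't known here, so the pattern below is deliberately loose:

```bash
# Look for evidence of the anchor file being copied or updated during boot.
sudo grep -iE 'root\.key|unbound-anchor|trust anchor' /var/log/syslog | tail -n 20
```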
*** infra-red has quit IRC | 16:09 | |
*** sdake has quit IRC | 16:09 | |
*** xyang1 has quit IRC | 16:09 | |
aaltman | okay cool: so when I boot a vm w/ nodepool and have it configured w/ Jenkins - what is the expectation for those two to connect? Who is performing the registration? I thought it happened over Gearman, but does the jenkins ssh key and username need to be enabled on the vm or can it use something like cloud-user | 16:09 |
*** hockeynu_ has quit IRC | 16:10 | |
*** sshnaidm is now known as sshnaidm|afk | 16:11 | |
*** matbu is now known as matbu|afk | 16:11 | |
mgagne | aaltman: looks to be done via Jenkins API: https://github.com/openstack-infra/nodepool/blob/master/nodepool/myjenkins.py#L132-L133 | 16:12 |
pabelanger | clarkb: Yup, I see that on first boot. root.key copied, follow up reboots, key has content | 16:12 |
mordred | aaltman: the expectation is that once nodepool spins up a node it'll have account/public-keys on it such that jenkins can connect to it - we bake those in as part of our base-image build process | 16:12 |
mordred | and then yes, as mgagne says, nodepool uses the Jenkins API to attach the slave to jenkins | 16:12 |
aaltman | mgagne: okay great. I think I can replicate that and see what's going on. May be an SSL issue w/ our jenkins since it's self signed for dev. | 16:13 |
*** asettle has quit IRC | 16:13 | |
sdague | stevemar: hey, it counts as commit in keystone, so I get to vote for ptl again :) | 16:14 |
stevemar | sdague: haha, uh oh ... :) | 16:14 |
jeblair | aaltman: do the nodes show up in jenkins at all? if not, the nodepool log may have information as to why | 16:14 |
dstanek | sdague: now i see your game :-) | 16:14 |
aaltman | mordred: okay. So that may be missing as well, we are generating a key, uploading to openstack on container entry, and nodepool can access, but Jenkins shouldn't be able to w/ that model | 16:14 |
aaltman | jetblair: they don't | 16:15 |
aaltman | Jetblair: we checked the logs thoroughly and don't see anything suspicious | 16:15 |
*** dizquierdo_afk is now known as dizquierdo | 16:15 | |
aaltman | jetblair: there's auth exec related to root/fedora/ubuntu logins until it hits cloud-user, which works fine and finishes out the setup | 16:15 |
*** dtantsur is now known as dtantsur|afk | 16:18 | |
*** xyang1 has joined #openstack-infra | 16:18 | |
jeblair | aaltman: what you describe sounds like the snapshot image build process, where nodepool boots a node from a base image, customizes it, then takes a snapshot of it. the actual test nodes are built from the snapshot. | 16:18 |
jeblair | aaltman: nodepool and jenkins both need to have the private ssh key for the same account which should be installed on the snapshot. nodepool uses it to log in immediately after a node boots to make sure that it worked, then it attaches it to jenkins | 16:19 |
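One way to sanity-check the setup jeblair describes is to attempt the same login Jenkins would make, using whatever account and key were baked into the snapshot; the username, key path, and node address below are placeholders, not values from aaltman's deployment:

```bash
# If this manual login fails, Jenkins will not be able to attach the node either.
ssh -i ~/.ssh/jenkins_id_rsa -o StrictHostKeyChecking=no jenkins@NODE_IP true \
  && echo "jenkins account login ok"
```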
openstackgerrit | Matt Riedemann proposed openstack-infra/project-config: Make gate-tempest-dsvm-multinode-live-migration gating for nova https://review.openstack.org/356043 | 16:20 |
aaltman | jetblair: Okay, so the boot process seems* to go fine, it's the handoff to Jenkins, which I suspect is an SSL cert issue that I currently do not see in the log and then in addition matching the keys | 16:20 |
aaltman | That should be enough information to go off of | 16:21 |
*** adrian_otto has joined #openstack-infra | 16:21 | |
aaltman | I'll give those two things a try. Thanks for the help! | 16:21 |
fungi | aaltman: worth noting, back when we still used jenkins we had self-signed certs on our jenkins masters | 16:21 |
fungi | i don't recall if we had to do anything special to "trust" those, or were simply relying on older python 2.7 not actually validating server certs | 16:22 |
aaltman | fungi: hmmm that's interesting | 16:22 |
*** markusry has joined #openstack-infra | 16:22 | |
kgiusti | fungi: just fyi oslo.messaging is hitting the exact same tempest failures as the gate-tempest-dsvm-cells you mentioned | 16:22 |
pabelanger | Aug 16 16:07:16 ubuntu kernel: [ 15.415094] random: nonblocking pool is initialized | 16:22 |
*** pilgrimstack has quit IRC | 16:23 | |
kgiusti | fungi: but we never see the issue running the same test on the centos box FWIW | 16:23 |
*** Sukhdev has joined #openstack-infra | 16:23 | |
*** yamahata has quit IRC | 16:23 | |
fungi | kgiusti: the ones where it fails to reach glance on 127.0.0.1:9292? | 16:23 |
pabelanger | jeblair: that was 2 reboots ago, did you make a change on the node? | 16:23 |
pabelanger | ^ | 16:23 |
*** piet_ has joined #openstack-infra | 16:23 | |
pabelanger | only time random has started below 60 seconds | 16:23 |
kgiusti | fungi: http://logs.openstack.org/90/349290/3/check/gate-oslo.messaging-src-dsvm-full-zmq/dd1de25/console.html#_2016-08-16_09_59_04_230462 | 16:24 |
fungi | kgiusti: urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=9292): Max retries exceeded with url: /v1/images (Caused by ReadTimeoutError("HTTPConnectionPool(host='127.0.0.1', port=9292): Read timed out. (read timeout=60)",)) | 16:24 |
fungi | so, yep | 16:24 |
jeblair | pabelanger: no | 16:25 |
kgiusti | fungi: yay it's not just me! :) | 16:25 |
fungi | kgiusti: and in osic, so this pattern seems consistent | 16:25 |
cloudnull | afternoons. sorry have been mostly AFK so far today. | 16:26 |
kgiusti | fungi: agreed. | 16:26 |
* cloudnull reading back | 16:26 | |
fungi | cloudnull: have fun, you have several nick highlights in here ;) | 16:26 |
*** savihou has quit IRC | 16:28 | |
cloudnull | mordred jeblair if we can get a list of instance id's I can go and track them down and see if there are specific issues with a given host. | 16:28 |
openstackgerrit | Matthew Treinish proposed openstack-infra/elastic-recheck: Add query for bug 1613749 https://review.openstack.org/355988 | 16:28 |
openstack | bug 1613749 in OpenStack-Gate "Git timeouts from bluebox" [Undecided,New] https://launchpad.net/bugs/1613749 | 16:28 |
mtreinish | fungi: ^^^ there is the bug and e-r query we're using to track it | 16:29 |
cloudnull | pabelanger: do we think that the DNS resolver issues are what is causing the slowdown folks have been mentioning? | 16:30 |
*** jpich has quit IRC | 16:30 | |
cloudnull | mordred: whats with the 4 min to resolve git.openstack.org? is that something on the OSIC side that is causing that slowdown or is that a known routing issue? | 16:31 |
*** gongysh has quit IRC | 16:31 | |
*** baoli_ has quit IRC | 16:31 | |
pabelanger | cloudnull: we've had ipv6 dns on ubuntu-xenial since last night | 16:32 |
cloudnull | still not happy ? | 16:32 |
pabelanger | cloudnull: however, I haven't followed the git issue much this morning | 16:32 |
cloudnull | ok | 16:32 |
jeblair | cloudnull: i believe we're working on 2 simultaneous issues, only one of which is osic-specific | 16:33 |
cloudnull | did the unbound start issues get resolved? | 16:33 |
jeblair | (the other is xenial specific) | 16:33 |
cloudnull | jeblair: which one is the osic specific issue? -- sorry likely missed the message in scroll back | 16:33 |
jeblair | cloudnull: the 'git' issue is that it takes 4 minutes to perform git operations from osic to git.openstack.org | 16:33 |
jeblair | cloudnull: and that's the one where i sent you two instance ids to see if there is any correlation | 16:33 |
cloudnull | well the message is likely there, i just missed reading it | 16:34 |
cloudnull | :) | 16:34 |
* cloudnull looking into those instances now | 16:34 | |
greghaynes | jeblair: Random depends-on question - in the case of there being multiple changesets which match a change-id in a depends-on (in two different projects) does zuul depend on both? | 16:34 |
jeblair | cloudnull: the other issue is that unbound sometimes takes a while to start, but i don't think that's xenial related | 16:34 |
jeblair | greghaynes: yes | 16:34 |
greghaynes | good deal :) | 16:34 |
jeblair | greghaynes: whew! :) | 16:34 |
cloudnull | I saw the patch from pabelanger last night regarding that issue giving the process start and g.o.o resolv a wait. | 16:35 |
cloudnull | does that fix the unbound problem ? | 16:35 |
jeblair | cloudnull: i don't know about that. pabelanger ? | 16:37 |
pabelanger | jeblair: cloudnull: 355695 will just make our configure_mirror.sh script more robust, it doesn't address the actual unbound delay issue | 16:38 |
cloudnull | yes that one, https://review.openstack.org/#/c/355695/ -- do we think thats an entropy issue? | 16:39 |
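The contents of 355695 aren't shown here, but "more robust" in this context generally means waiting for the local resolver to come up instead of failing on the first lookup. A sketch of that kind of retry loop (the timeout and hostname are arbitrary choices for illustration):

```bash
# Wait up to ~60 seconds for the local unbound resolver to answer before
# letting configure_mirror.sh proceed (or fail loudly).
for i in $(seq 1 60); do
    host git.openstack.org >/dev/null 2>&1 && break
    sleep 1
done
```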
openstackgerrit | Matt Riedemann proposed openstack-infra/elastic-recheck: Add query for bug 1613749 https://review.openstack.org/355988 | 16:39 |
openstack | bug 1613749 in OpenStack-Gate "Timeouts when requesting a glance image created with a remote image from git.o.o" [Undecided,New] https://launchpad.net/bugs/1613749 | 16:39 |
jeblair | grr. all of the google results about "nonblocking pool is initialized" are related to the fact that it's a late kernel message, so it's what people see when their systems are borked and hang | 16:39 |
jeblair | i can't actually find what prints it | 16:39 |
* fungi wishes linux would just switch to a continuously-seeded high-quality nonblocking prng as its /dev/random backend, like all the *bsds have done for years | 16:40 | |
*** florianf has quit IRC | 16:40 | |
*** hockeynut has joined #openstack-infra | 16:40 | |
persia | Isn't /dev/urandom very close to that? | 16:40 |
jeblair | fungi: i'm still really confused since haveged is running and the pool has entropy. | 16:40 |
fungi | jeblair: agreed | 16:41 |
fungi | no clue why it thinks it needs to wait still. unless aslr is grabbing priority over the available entropy pool soon after boot while other stuff is being loaded into memory? | 16:41 |
mgagne | pabelanger: I'm considering contributing to grafyaml. I found that Grafana 3.x supports more features but also changes the syntax of some options. How should grafyaml be updated so it doesn't break the world of 2.x? | 16:42 |
fungi | persia: yeah, except they continue to claim /dev/urandom is not secure for things like key generation. consensus among cryptographers is that you don't really "use up" entropy you accumulate, and so the linux entropy pool design is a bit of a fiction | 16:42 |
fungi | you should be able to reuse the same entropy once you have it, as long as it's presented through an appropriately turbulent prng algorithm | 16:43 |
persia | Claiming /dev/urandom isn't secure is just FUD. It's usually at least as good as using the HWRNG on a TPM module, or similar. | 16:44 |
*** berendt has quit IRC | 16:44 | |
*** tosky_ has quit IRC | 16:44 | |
fungi | agreed | 16:44 |
*** rbuzatu has quit IRC | 16:45 | |
persia | Well, if you need to generate a one-time pad vs. an adversary with unlimited resources, /dev/random *might* be better, if you have good sources of true entropy, but in that situation, you probably shouldn't be using an operating system you didn't hand-code from scratch... | 16:45 |
*** florianf has joined #openstack-infra | 16:45 | |
fungi | there's a bit of stockholm syndrome going on with linux's /dev/random though. other operating systems have moved past that thinking | 16:45 |
cloudnull | jeblair: regarding those two instances, nothing stands out between the two nodes, they landed on different compute nodes w/in different cabinets and both compute nodes have other vms running on them which are functional. | 16:46 |
fungi | jeblair: cloudnull: we do have a second (suspected) osic-specific issue, which could be related but also could be distinct: specific tests timing out trying to connect to a service listening on 127.0.0.1. we're only seeing this failure manifest in osic (so far anyway) | 16:46 |
jeblair | cloudnull: gr. | 16:46 |
*** piet_ has quit IRC | 16:46 | |
cloudnull | fungi: hum... | 16:47 |
*** kcobb has quit IRC | 16:47 | |
*** kcobb has joined #openstack-infra | 16:48 | |
fungi | example failure is http://logs.openstack.org/55/352455/3/gate/gate-tempest-dsvm-cells/7881266/console.html#_2016-08-16_13_38_04_769628 | 16:48 |
cloudnull | fungi: anything strange or did something change in the hosts file making it not resolve? | 16:48 |
fungi | cloudnull: not entirely sure yet, though our network diagnostics at the start of the log also show traceroute6 timing out trying to get to git.o.o | 16:49 |
fungi | it resolves via dns correctly, but sees no icmp responses from any hop | 16:49 |
jeblair | fungi, cloudnull: the two instances of slow git operations also have traceroute6 git.o.o timeouts | 16:49 |
kgiusti | fungi: cloudnull: fyi: https://bugs.launchpad.net/openstack-gate/+bug/1613749 | 16:51 |
openstack | Launchpad bug 1613749 in OpenStack-Gate "Timeouts when requesting a glance image created with a remote image from git.o.o" [Undecided,New] | 16:51 |
fungi | kgiusti: i think that's separate | 16:52 |
fungi | oh, maybe it's not | 16:52 |
fungi | the bluebox one might want to be separated out though since the symptoms are distinct | 16:53 |
*** sambetts is now known as sambetts|afk | 16:53 | |
*** jerryz has joined #openstack-infra | 16:54 | |
kgiusti | mtreinish: ^^^ | 16:54 |
fungi | but thinking through that test, _if_ glance thinks it's serving the image from a remote location on git.o.o, then that could account for the timeout for the test's api calls to 127.0.0.1:9292 | 16:54 |
*** infra-red has joined #openstack-infra | 16:54 | |
cloudnull | kgiusti fungi jeblair: We have an SSD-specific AZ; if we think that the speed of writes is what's causing that issue, we could switch to using that AZ to see if more iops fixes it? | 16:54 |
*** nwkarsten has quit IRC | 16:54 | |
*** bin_ has quit IRC | 16:54 | |
mtreinish | fungi: the working theory right now is it actually might not be network related. we're waiting on sdague's revert to see if the load being generated by keystone logging constantly was causing these issues | 16:55 |
mtreinish | because there are some keystone token errors in the glance logs before things start getting weird | 16:55 |
fungi | mtreinish: still strange we would only see it manifest that way in osic | 16:55 |
mordred | cloudnull: I think sorting out the network issue first is more likely to be a win | 16:55 |
cloudnull | ++ | 16:55 |
cloudnull | fungi: maybe we're seeing it in the osic more due to it now having more tests run within the cloud ? | 16:56 |
mtreinish | fungi: well if it's load related then the hardware and/or cloud config comes into play more | 16:56 |
*** javeriak has joined #openstack-infra | 16:57 | |
fungi | cloudnull: well, our osic quota is still a minority of our overall aggregate quota, so we should be seeing it in other providers besides just osic. so far i haven't found any though | 16:58 |
fungi | mtreinish: agreed | 16:58 |
*** nwkarste_ has joined #openstack-infra | 16:58 | |
*** infra-red has quit IRC | 16:58 | |
*** infra-red has joined #openstack-infra | 16:58 | |
openstackgerrit | Merged openstack-infra/elastic-recheck: Make everything plural https://review.openstack.org/355967 | 16:58 |
*** edtubill has quit IRC | 17:00 | |
*** lucasagomes is now known as lucas-dinner | 17:02 | |
*** javeriak_ has joined #openstack-infra | 17:02 | |
*** javeriak has quit IRC | 17:02 | |
krotscheck | Any infra-core around to add a +A to https://review.openstack.org/#/c/346130/ ? I already have 2 +2's. pabelanger, in particular, as you can verify that the bindep changes have landed. | 17:02 |
*** nwkarste_ has quit IRC | 17:03 | |
*** hockeynut has quit IRC | 17:03 | |
krotscheck | Also, I'm trying to get our JS DSVM job landed... https://review.openstack.org/#/c/348056/8 | 17:03 |
*** xarses has quit IRC | 17:04 | |
*** javeriak has joined #openstack-infra | 17:04 | |
fungi | cloudnull: jeblair: picking jobs running at random in osic, `traceroute6 git.openstack.org` seems to be broken in all of them at the start of jobs. so maybe we have an early race with something in the network there if it's working later on? | 17:05 |
fungi | also i just found one i can't connect to the console for | 17:05 |
*** yamahata has joined #openstack-infra | 17:05 | |
pabelanger | fungi: jeblair: sudo apt-get install rng-tools | 17:06 |
pabelanger | fungi: jeblair: I do not know why yet, but that is making random start faster | 17:06 |
fungi | `nc 2001:4800:1ae1:18:f816:3eff:fe3b:53ef 19885` is just dead for me. should be running gate-tempest-dsvm-neutron-full-ubuntu-xenial | 17:06 |
*** devkulkarni has quit IRC | 17:07 | |
pabelanger | Aug 16 17:06:38 ubuntu kernel: [ 24.471992] random: nonblocking pool is initialized | 17:07 |
pabelanger | lowest was: Aug 16 17:04:20 ubuntu kernel: [ 7.680334] random: nonblocking pool is initialized | 17:07 |
*** javeriak_ has quit IRC | 17:07 | |
jeblair | pabelanger: what image are you using for your tests? | 17:07 |
jeblair | pabelanger: i notice that apparmor is not installed | 17:07 |
zxiiro | electrofelix: I agree with you. I think we should just make it documented and a comment in the tox file | 17:07 |
pabelanger | jeblair: template-ubuntu-xenial-1471316598 | 17:08 |
*** oanson has joined #openstack-infra | 17:09 | |
*** _nadya_ has joined #openstack-infra | 17:09 | |
fungi | pabelanger: rng-tools _can_ be configured to feed /dev/urandom in as a mock hardware rng. probably worth double-checking its config but that might be what it's doing. otherwise it's likely getting passthrough entropy from the hypervisor host | 17:09 |
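For reference, the knob fungi mentions is the rngd device setting on Debian/Ubuntu; pointing it at /dev/urandom is possible but largely defeats the purpose, whereas a virtio-rng device normally shows up as /dev/hwrng. Paths and variable names below are the usual package defaults, not verified against these images:

```bash
sudo apt-get install -y rng-tools
# Which source is rngd configured to feed from? (unset means it probes /dev/hwrng)
grep -i '^HRNGDEVICE' /etc/default/rng-tools || echo "HRNGDEVICE not set"
# And the kernel's current entropy estimate.
cat /proc/sys/kernel/random/entropy_avail
```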
*** david-lyle_ has joined #openstack-infra | 17:09 | |
*** infra-red has quit IRC | 17:10 | |
*** tonytan4ever has quit IRC | 17:11 | |
*** e0ne has joined #openstack-infra | 17:11 | |
pabelanger | jeblair: I didn't know we installed apparmor explicitly | 17:11 |
fungi | cloudnull: yeah, 2001:4800:1ae1:18:f816:3eff:fe3b:53ef is just plain unreachable, but should still be up. uuid is c6dd7d7d-2797-47a5-b7b1-ed3cc917a4cc according to nodepool | 17:11 |
*** ansmith has joined #openstack-infra | 17:12 | |
cloudnull | fungi: looking now | 17:13 |
*** david-lyle has quit IRC | 17:13 | |
*** david-lyle_ is now known as david-lyle | 17:13 | |
fungi | openstack server list isn't showing that uuid as existing at all for me though | 17:13 |
fungi | oh, now nodepool's deleted it too | 17:14 |
cloudnull | yup deleted. | 17:15 |
cloudnull | :'( | 17:15 |
cloudnull | sorry i was too slow | 17:15 |
jeblair | fungi, pabelanger: unbound (via openssl) may actually be using urandom. the getrandom(2) call reads that by default, but even the urandom pool needs to be initialized, and getrandom will block until urandom has been initialized. | 17:15 |
jeblair | fungi, pabelanger: the fact that it waits until the kernel prints 'nonblocking pool is initialized' reinforces that for me | 17:16 |
jeblair | though i have not checked the openssl code to verify the flags | 17:16 |
fungi | jeblair: seems a likely explanation | 17:16 |
*** _nadya_ has quit IRC | 17:16 | |
jeblair | fungi, pabelanger: finally found the print statement: http://lxr.free-electrons.com/source/drivers/char/random.c?v=4.4#L684 | 17:17 |
pabelanger | fungi: jeblair: nice work on finding the reason | 17:17 |
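On a slow node the sequence jeblair pieced together can be confirmed just by lining up timestamps; the log strings below are the ones quoted earlier in this channel, so treat them as assumptions on other images:

```bash
# When did the kernel declare the nonblocking pool initialized?
dmesg | grep 'random: nonblocking pool is initialized'
# And when did unbound (whose startup runs unbound-anchor) actually come up?
sudo journalctl -b -u unbound --no-pager | head -n 20
```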
openstackgerrit | Eddie Ramirez proposed openstack-infra/project-config: Add craton-dashboard repository (Horizon Plugin) https://review.openstack.org/354274 | 17:18 |
*** rbuzatu has joined #openstack-infra | 17:18 | |
*** javeriak has quit IRC | 17:19 | |
*** tqtran has joined #openstack-infra | 17:19 | |
*** javeriak has joined #openstack-infra | 17:19 | |
jeblair | i don't know why it's taking so long to initialize with haveged running and, according to the kernel, 2300+ bits of entropy | 17:20 |
jeblair | when apparently 128 bits is needed to call it initialized | 17:20 |
fungi | cloudnull: jeblair: okay, some random spot checking turned up an example where a job in osic is successfully doing a traceroute6 to git.openstack.org, so this certainly seems inconsistent (could still be a startup race i suppose?) | 17:20 |
jeblair | fungi: yeah, if you're thinking a startup race, it could be affected by how long the node sat idle before launching the job | 17:21 |
cloudnull | fungi: the only way I'm able to reproduce this issue is to break the resolvers. | 17:21 |
cloudnull | :( | 17:21 |
fungi | right, exactly what i'm wondering | 17:21 |
*** _nadya_ has joined #openstack-infra | 17:21 | |
fungi | cloudnull: strangely, the example logs i have, dns resolution of git.openstack.org is fine, but traceroute responses aren't coming in | 17:21 |
fungi | owing in part, i think, to the fact that nodepool ready scripts do a dns lookup of that name before ever declaring the node fit for use, so it should have resolution already cached | 17:22 |
fungi | however, also dns lookups are happening via ipv4, so wouldn't be broken by ipv6 routing issues | 17:23 |
jeblair | until that change lands | 17:23 |
jeblair | (though it will still fall back on v4) | 17:23 |
*** fguillot has quit IRC | 17:24 | |
*** piet_ has joined #openstack-infra | 17:25 | |
cloudnull | maybe this is an issue with the neutron router for IPv4 traffic? The v6 network is dual stack in the OSIC and the v4 interface is part of a neutron router. potentially, we're pushing the router farther than it wants to be pushed or its slow to be programmed which is causing the various timeouts? | 17:25 |
*** tonytan4ever has joined #openstack-infra | 17:26 | |
fungi | cloudnull: so what i find particularly strange is that when traceroute6 works we get a response back from what appears to be the global address of the default gateway (2001:4800:1ae1:18::3), but when traceroute6 doesn't work, i don't even get a response from that one indicating an issue with neutron, or neighbor discovery (even though the fe80::def linklocal for that gateway is showing up as having a | 17:27 |
fungi | valid hw address like 00:05:73:a0:00:06), or the local layer 2 maybe? | 17:27 |
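The checks fungi is doing by hand could be folded into the job's network diagnostics for more data points; the gateway addresses referenced are the OSIC-specific ones quoted above, so the grep is only illustrative:

```bash
traceroute6 -n git.openstack.org
# Is there a default v6 route, and does the gateway's linklocal neighbor entry
# have a valid hardware address?
ip -6 route show default
ip -6 neigh show | grep -i 'fe80::def' || ip -6 neigh show
```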
*** aaltman has quit IRC | 17:27 | |
*** gomarivera has joined #openstack-infra | 17:27 | |
pabelanger | fungi: jeblair: if rng-tools is a hardware based generator, doesn't it make more sense to use that? I admit, haveged and rng-tools are new to me | 17:27 |
fungi | pabelanger: the "hardware" based entropy sources supported by rng-tools may include things that are not actual hardware (especially on virtual machines). but regardless i'm fine with using it | 17:29 |
*** hieulq_ has quit IRC | 17:29 | |
jeblair | pabelanger: i don't know (yet). pabelanger, fungi: i'm still digging, and i have found that /proc/sys/kernel/random/entropy_avail is the amount of entropy in the input pool, which i believe feeds the urandom pool, which is what we're waiting on being initialized. so that at least partially explains how the value in proc is high while we're still waiting for initialization. it doesn't explain *why*. | 17:30 |
*** gyee has joined #openstack-infra | 17:30 | |
fungi | pabelanger: haveged provides a nice fallback when there are no rng devices available, since it attempts to extract entropy from other timing-related sources | 17:30 |
openstackgerrit | Jim Rollenhagen proposed openstack-infra/project-config: Make ironic job non-voting on Neutron https://review.openstack.org/356072 | 17:31 |
openstackgerrit | Sai Sindhur Malleni proposed openstack-infra/project-config: Adding Ansible jobs for Browbeat https://review.openstack.org/356073 | 17:32 |
jroll | ^ 356072 is a fairly easy review so we don't end up blocking neutron this close to release | 17:32 |
*** baoli has joined #openstack-infra | 17:33 | |
pabelanger | fungi: okay thanks for the info. | 17:33 |
*** mhickey has quit IRC | 17:34 | |
pabelanger | But it does seem to be related to which cloud we start ubuntu-xenial on | 17:34 |
pabelanger | Aug 16 17:21:31 ubuntu kernel: [ 3.385322] random: nonblocking pool is initialized | 17:34 |
pabelanger | that is from rackspace | 17:34 |
pabelanger | nice and fast | 17:34 |
jeblair | pabelanger: we don't have a lot of these log lines in logstash, but there are some | 17:34 |
jeblair | pabelanger: i see 30 seconds in ovh, 60 seconds in bluebox | 17:35 |
*** florianf has quit IRC | 17:35 | |
fungi | jeblair: here's another fun anecdote related to urandom initialization times http://haypo-notes.readthedocs.io/summary_python_random_issue.html | 17:35 |
pabelanger | internap is 30sec too | 17:35 |
fungi | seems consistent with what we're suspecting | 17:35 |
*** hieulq_ has joined #openstack-infra | 17:36 | |
*** acoles is now known as acoles_ | 17:36 | |
pabelanger | fungi: oh, nice | 17:36 |
*** florianf has joined #openstack-infra | 17:36 | |
*** thorongil has joined #openstack-infra | 17:36 | |
pabelanger | http://bugs.python.org/issue26839#msg264121 | 17:36 |
*** electrofelix has quit IRC | 17:37 | |
*** ccamacho|out has quit IRC | 17:38 | |
*** thorongil has quit IRC | 17:38 | |
fungi | at least on some platforms, some pseudo-random data gets written out on shutdown and then read in at startup to quickly seed /dev/urandom. we might be able to dump something into /var/lib/random-seed in our job node images | 17:38 |
*** tphummel has joined #openstack-infra | 17:38 | |
*** thorongil has joined #openstack-infra | 17:38 | |
fungi | ahh, that's an rh-ism. debian derivatives use /var/lib/urandom/random-seed to the same ends however | 17:39 |
*** thorongil has quit IRC | 17:40 | |
*** thorongil has joined #openstack-infra | 17:40 | |
*** thorongil has quit IRC | 17:41 | |
*** thorongil has joined #openstack-infra | 17:42 | |
*** shashank_hegde has joined #openstack-infra | 17:43 | |
*** thorongil has quit IRC | 17:43 | |
*** thorongil has joined #openstack-infra | 17:44 | |
*** thorongil has quit IRC | 17:45 | |
*** nwkarste_ has joined #openstack-infra | 17:45 | |
*** thorongil has joined #openstack-infra | 17:45 | |
*** nwkarst__ has joined #openstack-infra | 17:46 | |
*** thorongil has quit IRC | 17:47 | |
*** thorongil has joined #openstack-infra | 17:47 | |
*** thorongil has quit IRC | 17:48 | |
*** devkulkarni has joined #openstack-infra | 17:49 | |
*** nwkarste_ has quit IRC | 17:50 | |
*** hieulq_ has quit IRC | 17:50 | |
fungi | pabelanger: that makes sense, so starting in linux 3.17 we're getting that behavior, which explains why it's impacting xenial and not trusty or centos 7 | 17:51 |
fungi | i would guess recent fedoras are impacted as well | 17:52 |
*** oanson has quit IRC | 17:52 | |
fungi | debian jessie is one kernel rev too old to see it | 17:52 |
pabelanger | ya, we can check fedora-24 | 17:52 |
pabelanger | we have a node online | 17:52 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool: [WIP] Add scheduling thread to nodepool builder https://review.openstack.org/356079 | 17:52 |
*** Sukhdev has quit IRC | 17:53 | |
pabelanger | fungi: so we have a few workarounds for now: rng-tools, smarter configure_mirror.sh. I'll wait until jeblair is finished before moving forward on that front | 17:54 |
SpamapS | Hey. I just wanted to offer some public praise. Thanks for all the hard work everyone in infra has put in on zuul and nodepool. :-D http://zuul.cloud-ci.ibmcis.com/ | 17:54 |
fungi | pabelanger: jeblair: i guess we could also pick haypo's brain in #openstack-oslo about this since he seems to have dig into it quite a bit | 17:54 |
mordred | SpamapS: woot! | 17:54 |
tlbr | infra-core could you please review merge https://review.openstack.org/#/c/347047/ ? | 17:54 |
pabelanger | SpamapS: yay | 17:54 |
fungi | er, dug | 17:55 |
fungi | SpamapS: thanks! | 17:55 |
SpamapS | We're pipelining and jobbing and really just happy as clams to have CI that works like upstream. :-D | 17:55 |
*** sdake has joined #openstack-infra | 17:55 | |
*** tqtran has quit IRC | 17:55 | |
Shrews | mmm, clams | 17:55 |
kgiusti | +1 what SpamapS said - thanks muchly! | 17:56 |
*** baoli has quit IRC | 17:56 | |
*** ccamacho has joined #openstack-infra | 17:57 | |
*** baoli has joined #openstack-infra | 17:57 | |
SpamapS | Shrews: oh heck yeah, clams would be great | 17:57 |
SpamapS | steamed in a little white wine sauce. :) | 17:57 |
Shrews | oh yeah | 17:58 |
tlbr | mordred, could you please also review https://review.openstack.org/#/c/347047/ ? We want to start work on this projects as soon as possible :) | 17:58 |
*** tqtran has joined #openstack-infra | 17:58 | |
Shrews | mordred: jeblair: notmorgan: look, pretty diagrams https://review.openstack.org/#/c/356079/1/doc/source/devguide.rst | 17:58 |
*** andrey-mp has joined #openstack-infra | 17:58 | |
*** gomarivera has quit IRC | 17:58 | |
*** tonytan4ever has quit IRC | 17:59 | |
mordred | Shrews: woot | 18:00 |
jeblair | SpamapS: thanks! | 18:00 |
jeblair | Shrews: nice! | 18:00 |
*** dmsimard|afk is now known as dmsimard | 18:01 | |
*** ganesan has quit IRC | 18:01 | |
*** rcernin has joined #openstack-infra | 18:01 | |
*** tqtran has quit IRC | 18:03 | |
*** rbrndt has quit IRC | 18:04 | |
*** kzaitsev_mb has quit IRC | 18:06 | |
rcarrillocruz | nice :-) | 18:07 |
*** dprince has quit IRC | 18:07 | |
*** ociuhandu has quit IRC | 18:07 | |
openstackgerrit | Merged openstack-infra/puppet-infracloud: Switch to infra-cloud-bridge element https://review.openstack.org/356009 | 18:07 |
rcarrillocruz | \o/ ^ | 18:08 |
*** ccamacho has quit IRC | 18:08 | |
jeblair | pabelanger, fungi: i booted the machine without haveged and verified that unbound will continue to sit there waiting because there is no entropy. so i know that we are getting entropy from haveged. i then ran haveged in the foreground which immediately (<1s) provided entropy to the pool. yet it still took 95 seconds for the pool to be initialized | 18:08 |
jeblair | er, sorry, it took an additional 30 seconds to be initialized | 18:08 |
jeblair | (i waited 60 seconds to start) | 18:08 |
rcarrillocruz | pabelanger: i'm going to wipe the deploy dib image to get the bridge element in | 18:09 |
rcarrillocruz | in case you want to run it to see how it goes | 18:09 |
rcarrillocruz | ? | 18:09 |
rcarrillocruz | well nm, it seems you all are hooked with the entropy thing, sorry for the noise | 18:09 |
fungi | jeblair: what about with haveged removed but rng-tools installed? in theory (if this is qemu-based at least) there'll be a virt-rng it uses to get extra entropy | 18:11 |
*** tqtran has joined #openstack-infra | 18:12 | |
jeblair | fungi: i don't know, but i'm not quite ready to try that yet; still trying to understand the sequence with haveged | 18:12 |
mordred | also - rackspace isn't qemu based | 18:12 |
jeblair | i'm running on osic | 18:12 |
jeblair | i think :) | 18:12 |
fungi | mordred: yeah, not sure how this will vary from provider to provider | 18:13 |
mordred | I know - I was just responding to fungi in that we have to make sure that fixing osic doesn't break rax | 18:13 |
mordred | fungi: ++ | 18:13 |
jeblair | ya | 18:13 |
*** xarses has joined #openstack-infra | 18:13 | |
fungi | we already know there's a significant timing variance for this across providers. seems to block longer on some than others | 18:13 |
*** degorenko is now known as _degorenko|afk | 18:13 | |
jeblair | fungi's question is a good one -- essentially in my mind as "if haveged is doing its thing quickly, which seems to be the case, why does rng-tools appear to be faster" | 18:14 |
jeblair | i just think i have a bit more data i can pull out of this configuration before i start to examine the delta with that one | 18:14 |
fungi | i'm also curious if it's got a /var/lib/urandom/random-seed it's reading in at boot | 18:15 |
*** _nadya_ has quit IRC | 18:15 | |
fungi | if we're seeing this delay on successive reboots then the on-disk seed likely isn't going to help | 18:15 |
jeblair | cat: /var/lib/urandom/random-seed: No such file or directory | 18:15 |
sdague | mtreinish: http://logs.openstack.org/10/352610/4/gate/gate-tempest-dsvm-cells/1d2248f/console.html still failing even after the keystone revert, so I think osic issues are still a real thing | 18:16 |
fungi | jeblair: oh, i wonder if it's not saving one for some reason, or if it's been moved | 18:16 |
jeblair | fungi: /var/lib/systemd/random-seed exists | 18:16 |
fungi | aha, now all restaurants are taco bell^W^Wsystemd | 18:17 |
*** fguillot has joined #openstack-infra | 18:17 | |
*** javeriak_ has joined #openstack-infra | 18:17 | |
jeblair | fungi: there is a /lib/systemd/system/systemd-random-seed.service | 18:18 |
jeblair | Description=Load/Save Random Seed | 18:18 |
*** javeriak has quit IRC | 18:18 | |
fungi | yeah, so sounds like it's there and doesn't help speed up urandom initialization at boot | 18:18 |
jeblair | i would like to verify it's working | 18:18 |
*** tqtran has quit IRC | 18:19 | |
mordred | jeblair: is it enabled? | 18:19 |
*** pvaneck has joined #openstack-infra | 18:19 | |
jeblair | # service random-seed status | 18:19 |
jeblair | ● random-seed.service Loaded: not-found (Reason: No such file or directory) Active: inactive (dead) | 18:19 |
mordred | well, there we go | 18:19 |
jeblair | mordred: that looks somewhat negative | 18:20 |
mordred | yah | 18:20 |
mordred | is /var/lib/urandom present? | 18:20 |
openstackgerrit | Abhishek Raut proposed openstack-infra/project-config: Use python-db-jobs for tap-as-a-service https://review.openstack.org/355670 | 18:20 |
jeblair | yes | 18:20 |
mordred | nod | 18:20 |
fungi | huh | 18:20 |
fungi | that's odd | 18:20 |
*** jaosorior has joined #openstack-infra | 18:21 | |
Shrews | i like the "active-inactive-dead" text there. not confusing at all | 18:21 |
fungi | ELENNART | 18:21 |
sdague | does someone have a summary of the current best theory around the osic issues? | 18:21 |
jeblair | Shrews: 2 out of 3 are negative | 18:22 |
fungi | mtreinish: i found the root of our snmpd issue on xenial... https://review.openstack.org/10112 says "This can go away after everything is upgraded to precise..." | 18:22 |
sdague | as I'm trying to ponder if there is a short term mitigation on the qa side? | 18:22 |
Shrews | jeblair: ah, it's a proportional failure message. got it :) | 18:22 |
jeblair | sdague: i think you're referring to the 'glance' issue? | 18:23 |
*** gomarivera has joined #openstack-infra | 18:23 | |
sdague | yeh | 18:23 |
jeblair | sdague: or do you mean the 'git' issue? | 18:23 |
mordred | jeblair: the internet tells me the active: inactive (dead) thing may be the result of getting the name wrong in the status command | 18:23 |
sdague | well... the glance issue is coupled to the git issue, right? | 18:23 |
andrey-mp | hi! is there a document about the gate/integrated queue? I want to understand how it works... the job for changeset 352455 has been going for about 10 hours. it stops at the end and begins again... | 18:24 |
fungi | sdague: that's unclear. the glance issue in bluebox does seem to be related to being unable to directly reach git.openstack.org because it's being treated as a "fake" glance remote location | 18:24 |
mordred | jeblair: I think you want "systemctl status systemd-random-seed.service" | 18:24 |
jeblair | mordred: aha you and the internet are right | 18:24 |
mordred | woot! | 18:24 |
pabelanger | rcarrillocruz: okay | 18:25 |
fungi | sdague: the glance issue we're seeing in osic seems to be that calls to the local glance service on the job node time out (so maybe behind the scenes, glance is acting as a sort of proxy to that remote file on git.o.o still?) | 18:25 |
mordred | although amusingly on my laptop I get a different error when I do that wrong | 18:25 |
jeblair | mordred: Active: active (exited) since Tue 2016-08-16 18:06:16 UTC; 18min ago | 18:25 |
jeblair | that seems more better | 18:25 |
mordred | jeblair: that's good | 18:25 |
*** jaosorior has quit IRC | 18:25 | |
jeblair | now i wonder if there are any logs | 18:25 |
jeblair | cause i would like to have a timestamp for when it ran/exited | 18:25 |
pabelanger | nice work | 18:26 |
jeblair | journalctl -u systemd-random-seed.service | 18:27 |
jeblair | Aug 16 18:06:16 ubuntu systemd[1]: Started Load/Save Random Seed. | 18:27 |
*** dprince has joined #openstack-infra | 18:27 | |
*** gomarivera has quit IRC | 18:27 | |
jeblair | that's about t+0 seconds for this boot | 18:27 |
jeblair | so it seems to have run as expected | 18:28 |
*** berendt has joined #openstack-infra | 18:28 | |
*** berendt has quit IRC | 18:28 | |
*** baoli has quit IRC | 18:29 | |
*** baoli has joined #openstack-infra | 18:29 | |
*** cody-somerville has quit IRC | 18:29 | |
*** cody-somerville has joined #openstack-infra | 18:29 | |
*** csomerville has joined #openstack-infra | 18:30 | |
openstackgerrit | Jeremy Stanley proposed openstack-infra/puppet-snmpd: Remove initscript https://review.openstack.org/356090 | 18:30 |
openstackgerrit | Henry Gessau proposed openstack-infra/project-config: Use python-db-jobs for networking-sfc https://review.openstack.org/354358 | 18:30 |
fungi | mtreinish: pabelanger: ^ | 18:31 |
*** Jeffrey4l has quit IRC | 18:31 | |
*** rbrndt has joined #openstack-infra | 18:32 | |
*** ociuhandu has joined #openstack-infra | 18:32 | |
*** tqtran has joined #openstack-infra | 18:32 | |
*** cody-somerville has quit IRC | 18:34 | |
*** abregman has joined #openstack-infra | 18:36 | |
*** chem` has joined #openstack-infra | 18:37 | |
*** chem has quit IRC | 18:38 | |
*** _nadya_ has joined #openstack-infra | 18:40 | |
*** amotoki has joined #openstack-infra | 18:45 | |
mtreinish | fungi: heh, that would do it | 18:45 |
openstackgerrit | Paul Belanger proposed openstack-infra/project-config: Add ansible-role-jobs for browbeat https://review.openstack.org/356093 | 18:45 |
openstackgerrit | Vasyl Saienko proposed openstack-infra/devstack-gate: DO NOT REVIEW https://review.openstack.org/356094 | 18:46 |
*** _nadya_ has quit IRC | 18:48 | |
*** amotoki has quit IRC | 18:48 | |
mtreinish | sdague: ok, sure. At least we know now | 18:49 |
mtreinish | fungi: especially since, with the whole systemd thing on xenial, an init script is even less useful there :) | 18:49 |
mtreinish | fungi: do we need an equiv systemd unit file for xenial or does it come with the package? | 18:50 |
openstackgerrit | Isaku Yamahata proposed openstack-infra/project-config: Add networking-odl for grafana dashboard https://review.openstack.org/353226 | 18:51 |
fungi | mtreinish: the snmpd package on xenial ships with an initscript still | 18:51 |
mtreinish | heh, ok | 18:52 |
*** abregman has quit IRC | 18:52 | |
*** ryanpetrello has quit IRC | 18:54 | |
*** ryanpetrello has joined #openstack-infra | 18:55 | |
*** chem` has quit IRC | 18:55 | |
*** Goneri has quit IRC | 18:55 | |
*** tonytan4ever has joined #openstack-infra | 18:58 | |
*** devkulkarni1 has joined #openstack-infra | 18:59 | |
*** Sukhdev has joined #openstack-infra | 18:59 | |
fungi | it's that (weekly infra team meeting) time again! find us in #openstack-meeting for the next hour | 19:00 |
*** Na3iL has quit IRC | 19:00 | |
*** e0ne has quit IRC | 19:01 | |
*** Goneri has joined #openstack-infra | 19:01 | |
*** andrey-mp has left #openstack-infra | 19:01 | |
*** devkulkarni has quit IRC | 19:01 | |
*** baoli has quit IRC | 19:01 | |
*** edtubill has joined #openstack-infra | 19:02 | |
openstackgerrit | Isaku Yamahata proposed openstack-infra/project-config: networking-odl: cover more combinations of version https://review.openstack.org/347045 | 19:02 |
*** camunoz has joined #openstack-infra | 19:04 | |
*** edtubill has quit IRC | 19:04 | |
openstackgerrit | Joost van der Griendt proposed openstack-infra/jenkins-job-builder: Add support for stash-pullrequest-builder plugin Although the application has now been renamed/merge to BitBucket, it is still sensible to keep the Stash name for now. As there are already plugins named BitBucket, which are purely targeting the cloud solu https://review.openstack.org/355211 | 19:05 |
*** fifieldt has quit IRC | 19:06 | |
*** gomarivera has joined #openstack-infra | 19:08 | |
openstackgerrit | Isaku Yamahata proposed openstack-infra/project-config: Add networking-odl for grafana dashboard https://review.openstack.org/353226 | 19:11 |
*** edtubill has joined #openstack-infra | 19:13 | |
*** sdake_ has joined #openstack-infra | 19:14 | |
*** asettle has joined #openstack-infra | 19:14 | |
*** sdake has quit IRC | 19:14 | |
*** Apsu has left #openstack-infra | 19:15 | |
*** sdake_ has quit IRC | 19:15 | |
*** docaedo has quit IRC | 19:16 | |
*** sdake has joined #openstack-infra | 19:16 | |
*** dtardivel has quit IRC | 19:17 | |
openstackgerrit | Merged openstack-infra/system-config: Pre-install python2-requests package for Fedora https://review.openstack.org/355731 | 19:17 |
*** edtubill has quit IRC | 19:18 | |
*** fifieldt has joined #openstack-infra | 19:19 | |
*** _nadya_ has joined #openstack-infra | 19:20 | |
*** martinkopec has joined #openstack-infra | 19:20 | |
*** martinkopec has quit IRC | 19:21 | |
anteaya | sdague: I'm at a loss about http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2016-08-16.log.html#t2016-08-16T11:51:36 | 19:21 |
*** asettle has quit IRC | 19:21 | |
anteaya | sdague: you link to a line that says groups: | 19:21 |
anteaya | - labs | 19:21 |
sdague | anteaya: I assumed you'd be up at that timezone, and it was a project-config question | 19:21 |
anteaya | and ask about xenial nodes | 19:21 |
anteaya | did you get an answer? | 19:22 |
anteaya | I have been offline most of today | 19:22 |
sdague | about how to set the launchpad bug project page | 19:22 |
sdague | I did not, but cdent is really the one that needs to know | 19:22 |
anteaya | yes, that is the way to set a launchpad group for bugs | 19:22 |
anteaya | this is the best documentation for setting up launchpad: http://docs.openstack.org/infra/manual/creators.html#set-up-launchpad | 19:23 |
anteaya | as the group and who owns the group is important | 19:23 |
*** iurygregory has joined #openstack-infra | 19:25 | |
*** Hal has joined #openstack-infra | 19:26 | |
*** edtubill has joined #openstack-infra | 19:27 | |
*** edtubill has quit IRC | 19:28 | |
*** xyang1 has quit IRC | 19:29 | |
*** Hal has quit IRC | 19:30 | |
*** edtubill has joined #openstack-infra | 19:30 | |
*** tqtran has quit IRC | 19:34 | |
*** tongli has quit IRC | 19:35 | |
*** Goneri has quit IRC | 19:36 | |
karthikp_ | clarkb: afazekas: sdague, ianw Please could you help me review these changes to the infra in your free time? we need this to test the multinode grenade job for Cinder | 19:37 |
karthikp_ | Thanks in advance | 19:37 |
*** gomarivera has quit IRC | 19:40 | |
karthikp_ | https://review.openstack.org/#/c/355678/ | 19:41 |
sdague | clarkb: is there a patch up already to move cells & ceph jobs to xenial? | 19:41 |
*** _nadya_ has quit IRC | 19:42 | |
openstackgerrit | Joost van der Griendt proposed openstack-infra/jenkins-job-builder: Adding support for Hidden parameter plugin https://review.openstack.org/355209 | 19:42 |
*** docaedo has joined #openstack-infra | 19:44 | |
*** markusry has quit IRC | 19:44 | |
*** tqtran has joined #openstack-infra | 19:47 | |
*** florianf has quit IRC | 19:48 | |
*** markusry has joined #openstack-infra | 19:48 | |
tlbr | infra-team could you please merge https://review.openstack.org/#/c/347047/ ? | 19:48 |
*** hockeynut has joined #openstack-infra | 19:48 | |
openstackgerrit | yolanda.robla proposed openstack-infra/system-config: Bump version of rabbitmq module https://review.openstack.org/356117 | 19:52 |
*** hockeynut has quit IRC | 19:53 | |
*** kzaitsev_mb has joined #openstack-infra | 19:54 | |
*** hockeynut has joined #openstack-infra | 19:54 | |
*** Apoorva has joined #openstack-infra | 19:55 | |
mtreinish | jeblair, fungi: how difficult would it be to get the node type into the metadata we pass to logstash and subunit2sql? | 19:56 |
*** nwkarst__ has quit IRC | 19:57 | |
*** nwkarsten has joined #openstack-infra | 19:57 | |
*** asettle has joined #openstack-infra | 19:59 | |
*** asettle has quit IRC | 19:59 | |
*** tqtran has quit IRC | 19:59 | |
*** camunoz has quit IRC | 19:59 | |
*** annegentle has joined #openstack-infra | 19:59 | |
*** nwkarste_ has joined #openstack-infra | 20:00 | |
jpmaxman | Krenair: I think your config is more correct - keep in mind this was patched together going from older distribution / apache. I'm assuming you started fresh with trusty / apache 2.4 | 20:01 |
*** nwkarsten has quit IRC | 20:02 | |
Krenair | Fresh Trusty, then I applied puppet which gave me apache etc. | 20:02 |
fungi | Krenair: yeah, this was me porting jpmaxman's apache config changes to production. i didn't really try to whittle them down. what you have in your change is likely sufficient | 20:02 |
*** julim has quit IRC | 20:03 | |
fungi | that diff was simply between what we had on the production server and what i found on the upgrade test server. so it's known-working, but almost certainly could be improved/tightened | 20:04 |
yolanda | hi, so no time in the meeting... i wanted to raise the topic of the mid-cycle sprint. OPNFV people are interested in adding a slot to the agenda, they requested it in the etherpad | 20:04 |
*** e0ne has joined #openstack-infra | 20:05 | |
jeblair | mtreinish: possible; have the ansible launch server return it in the zmq event it sends | 20:05 |
*** tqtran has joined #openstack-infra | 20:05 | |
mtreinish | jeblair: ok do you have a link to where to start looking? That way I can take a detailed look after the tc meeting | 20:06 |
fungi | yolanda: opnfv people from the qa team? or we have opnfv people on the infra team? | 20:07 |
jeblair | mtreinish: yep, right here: http://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/launcher/ansiblelaunchserver.py#n857 | 20:07 |
mtreinish | jeblair: cool, thanks | 20:07 |
*** edtubill has quit IRC | 20:07 | |
fungi | mtreinish: so you mean some different node type than what we already record in logstash? | 20:08 |
yolanda | opnfv people from infra. Actually Fatih is interested in coming | 20:08 |
yolanda | i'm collaborating with them in infracloud deployment efforts on opnfv | 20:09 |
fungi | yolanda: cool, i didn't know we had people in infra helping with that | 20:09 |
jeblair | fungi, pabelanger: i believe i have found that restoring entropy data from a file in the manner of systemd (or init scripts that use dd) does put entropy into the pool, but does *not* update the entropy count. | 20:09 |
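That matches how the kernel's write path works: bytes written to /dev/urandom are mixed into the pool, but the entropy estimate is only credited through the RNDADDENTROPY ioctl (which rngd uses and a plain file write does not). The init-script style restore jeblair refers to looks roughly like this, using the Debian path fungi mentioned earlier:

```bash
# Mixes the saved seed back in, but does not credit the entropy count --
# exactly the behavior described above.
dd if=/var/lib/urandom/random-seed of=/dev/urandom bs=512 count=1 2>/dev/null
```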
fungi | yolanda: so it's really less an opnfv topic, and more a making infra-cloud reconsumable downstream topic? | 20:09 |
*** matrohon has joined #openstack-infra | 20:09 | |
anteaya | yolanda: what is Fatih's irc nick? | 20:10 |
mtreinish | fungi: we have the build_node and the node_provider today. I just want like trusty, or xenial-2-node or something like that | 20:10 |
*** vhosakot has quit IRC | 20:10 | |
anteaya | I've been bemoaning the lack of new women lately | 20:10 |
yolanda | they have interest in infra-cloud, they need some specific features, being more reconsumable, having ha, some more network configs | 20:10 |
anteaya | great to see more | 20:10 |
yolanda | but also a specific opnfv topic, about how we can collaborate better | 20:10 |
mtreinish | fungi: so we can more easily see if a failure is isolated to a specific distro or something like that | 20:10 |
yolanda | fungi, Fatih nick is fdegir | 20:10 |
anteaya | yolanda: thanks | 20:11 |
openstackgerrit | Merged openstack-infra/elastic-recheck: Add query for bug 1613749 https://review.openstack.org/355988 | 20:11 |
openstack | bug 1613749 in OpenStack-Gate "Timeouts when requesting a glance image created with a remote image from git.o.o" [Undecided,New] https://launchpad.net/bugs/1613749 | 20:11 |
jpmaxman | so Krenair Fungi I'm not super familiar with Puppet - so really with your changes as far as I can tell they look good. I'd be more capable of judging by looking at a resulting server that was spun up from these puppet scripts. I was actually hoping to do that myself, but it's a little silly to hold this up for that. I'm hopeful to get more familiar with | 20:11 |
jpmaxman | puppet in general and be able to be more helpful with that side of things moving forward. | 20:11 |
*** vhosakot has joined #openstack-infra | 20:11 | |
jeblair | mtreinish: 'node_image' is what i would recommend for naming that with specificity | 20:11 |
fungi | mtreinish: oh, indeed, for some reason i thought we had the base node label as a parameter there already | 20:11 |
fungi | but on inspection i see it's definitely noy | 20:12 |
fungi | not | 20:12 |
mtreinish | I prefer noy :) | 20:12 |
jeblair | jpmaxman: cool -- if it's at all helpful, there's a bit of a walkthrough here about how to run infra puppet on a vm: http://docs.openstack.org/infra/system-config/sysadmin.html#making-a-change-in-puppet | 20:13 |
yolanda | fungi, anteaya, so well, i wanted to raise the attention on the etherpad, requesting that slot to be added if there is time, so Fatih can come to the mid-cycle if there is interest on it | 20:13 |
fungi | yolanda: testing additional deployments of our infra-cloud manifests sounds like something some of the attendees might find interesting, but i would avoid spinning it as the infra team helping opnfv deploy a cloud | 20:16 |
Krenair | There's a story somewhere about having a wiki-dev server | 20:16 |
Krenair | maybe the puppet changes could be applied there and tested properly? | 20:16 |
*** tqtran has quit IRC | 20:16 | |
Krenair | It could be that my puppet changes don't cover everywhere and there's still some things to do that I didn't find | 20:17 |
*** gomarivera has joined #openstack-infra | 20:17 | |
Krenair | e.g. you made some changes for ReCaptcha I think? | 20:17 |
fungi | Krenair: yep, i expect that we'll do that as the next step after we merge those changes. puppet is currently disabled for the production server since the upgrade | 20:17 |
*** e0ne has quit IRC | 20:18 | |
fungi | Krenair: http://paste.openstack.org/show/558529 was the change i applied to Settings.php (with credentials redacted) | 20:18 |
*** yaume has joined #openstack-infra | 20:18 | |
yolanda | fungi, i would not say "infra team helping them", but propose as some ways to collaborate or join efforts | 20:19 |
Krenair | fungi, yeah it seems we're going to have quite a few extra things to puppetise | 20:19 |
fungi | Krenair: again, just directly ported from the upgrade test server jpmaxman worked on, with some whitespace cleanup to reduce the diff as much as possible | 20:19 |
Krenair | why was MF removed? | 20:20 |
fungi | Krenair: it allowed account creation outside openid previously | 20:20 |
fungi | it's possible in 1.27 that's no longer the case | 20:20 |
Krenair | okay but isn't disabling that a separate patch? why was it included in a wiki-upgrade change? | 20:21 |
yolanda | anyway, i have to leave, i'll try to attend the next infra meeting and propose an item for the agenda, to see whether there is interest in having a slot for it or not | 20:22 |
jpmaxman | Krenair: also when I enabled it the wiki error'd out | 20:22 |
jpmaxman | I didn't dig into it too deep | 20:23 |
fungi | yeah, we discussed the possibility of reenabling it again once we work out what's needed | 20:23 |
fungi | this was fairly rushed as we're still scrambling to get the spam problem under control | 20:23 |
*** baoli has joined #openstack-infra | 20:24 | |
fungi | so having a wiki with limited incoming spam was prioritized over some previous features we had | 20:24 |
*** pfallenop has quit IRC | 20:24 | |
fungi | similar for file uploads | 20:24 |
Krenair | okay well | 20:24 |
pabelanger | jeblair: good to know, thanks for the update | 20:25 |
Krenair | there's no safe way you can just send these commits through and apply puppet in prod; it's going to have to go through a wiki-dev server | 20:25 |
fungi | Krenair: yes, that's what i'm expecting | 20:25 |
Krenair | too many unknowns created by working on servers without using puppet | 20:26 |
fungi | Krenair: we have puppet entirely disabled for the production server for now so we can work through massive refactoring of that puppet module in safety on a dev deployment | 20:26 |
cloudnull | fungi pabelanger mordred jeblair: Just as an update, we've found that the VLAN that was supposed to be running on all of our compute nodes wasn't trunked to all of the required switch ports. so that is likely a major part of the recent raft of failures. I **Believe** this is fixed now. we're rerunning some tests and I'll let you know what I find out. | 20:26 |
Krenair | okay. what will it take to get a -dev server? | 20:26 |
Krenair | I assume these servers are all just instances in a cloud somewhere, right? you don't have to procure hardware for this | 20:26 |
jpmaxman | right - I think dev-wiki is the next step: get that where we want it to be with the functionality we want | 20:27 |
*** jordanP has joined #openstack-infra | 20:27 | |
jpmaxman | Krenair: correct | 20:27 |
krotscheck | mordred: I'm going through these cloud-config things here- what's the point of having the API version in clouds.config? Shouldn't the SDK be the thing that knows what language it can talk? | 20:27 |
pabelanger | cloudnull: Nice, thanks for the update | 20:27 |
fungi | Krenair: i (or another of our ~dozen root admins) needs to launch one. this is a priority for me, but it's competing with a number of other priorities so i can't promise it in the next 24-48 hours | 20:27 |
*** inc0 has quit IRC | 20:27 | |
krotscheck | Any infra cores around that can +A this patch? I've got 2x+2's, Ajaeger is on vacation, and I don't really want to sit on this for the next two weeks. | 20:28 |
Krenair | okay well, don't let me rush you :) | 20:28 |
krotscheck | https://review.openstack.org/#/c/346130/ | 20:28 |
*** piet_ has quit IRC | 20:28 | |
fungi | krotscheck: specifically we'll need to create a server instance for it, a trove instance to hold its database, a cinder volume for the file content mounted in the appropriate place on the fs, add some dns records, and we're probably at a minimum also lacking some glue in the system-config repo to instantiate the mediawiki module for that new server name | 20:28 |
fungi | er, Krenair ^ | 20:29 |
fungi | sorry krotscheck | 20:29 |
*** piet has joined #openstack-infra | 20:29 | |
* krotscheck lays claim on the tab-completion scope of the letter K! | 20:29 | |
Zara | :) | 20:29 |
* fungi is now known as krugerand | 20:30 | |
fungi | oh, i guess it has two r's | 20:30 |
fungi | well, three in total | 20:30 |
*** tqtran has joined #openstack-infra | 20:31 | |
*** kgiusti has left #openstack-infra | 20:31 | |
*** gouthamr has quit IRC | 20:33 | |
*** tqtran has quit IRC | 20:34 | |
openstackgerrit | James Slagle proposed openstack-infra/tripleo-ci: Deploy minimal services in multinode job https://review.openstack.org/355097 | 20:34 |
*** pfallenop has joined #openstack-infra | 20:35 | |
*** xyang1 has joined #openstack-infra | 20:36 | |
ianw | krotscheck: it's waiting on depends-on's ? | 20:36 |
krotscheck | ianw: Hrm. | 20:37 |
krotscheck | ianw: Ah, right. So https://review.openstack.org/#/c/334873/ is a review that I don't have any other cores on | 20:37 |
*** rbuzatu has quit IRC | 20:37 | |
Krenair | fungi, okay, well, let me know when it's up? | 20:37 |
Krenair | I'm on holiday next week but other than that I should be available | 20:38 |
*** javeriak has joined #openstack-infra | 20:40 | |
ianw | krotscheck: is that list in 334873 curated in any way, or just grabbed from somewhere? i mean i don't mind just putting it in as is, the only problem would be that it does too much | 20:40 |
krotscheck | ianw: You'd have to ask AJaeger, I think it's a list of default dependencies from project ¯\_(ツ)_/¯ | 20:41 |
*** pfallenop has quit IRC | 20:41 | |
*** tqtran has joined #openstack-infra | 20:42 | |
*** piet has quit IRC | 20:44 | |
*** javeriak_ has quit IRC | 20:44 | |
*** armax has quit IRC | 20:45 | |
*** jheroux has quit IRC | 20:45 | |
*** e0ne has joined #openstack-infra | 20:45 | |
fungi | Krenair: will do. your work on this so far is much appreciated too! | 20:47 |
*** ansmith has quit IRC | 20:48 | |
*** pfallenop has joined #openstack-infra | 20:48 | |
*** tqtran has quit IRC | 20:48 | |
*** kbaegis has quit IRC | 20:48 | |
*** jheroux has joined #openstack-infra | 20:49 | |
jpmaxman | yes Krenair thank you! | 20:51 |
*** jordanP has quit IRC | 20:52 | |
*** edtubill has joined #openstack-infra | 20:52 | |
*** yaume has quit IRC | 20:54 | |
*** Apoorva_ has joined #openstack-infra | 20:54 | |
*** kbaegis has joined #openstack-infra | 20:55 | |
*** matrohon has quit IRC | 20:55 | |
*** piet has joined #openstack-infra | 20:56 | |
*** javeriak has quit IRC | 20:57 | |
*** Apoorva has quit IRC | 20:57 | |
*** rbrndt has quit IRC | 20:59 | |
*** tonytan4ever has quit IRC | 20:59 | |
*** dprince has quit IRC | 21:00 | |
*** raildo has quit IRC | 21:01 | |
cloudnull | fungi pabelanger mordred jeblair: So I've now built a VM on every compute node using the V6 network and pinged it. Additionally I've added user data to the VMs to install and run traceroute(6) against git.o.o, and from a spot check of many of the instances' console logs they're all able to get there. so I **hope** this "resolves" the issue with instances + busted v6 networks. In testing, I've found a few misbehaving hosts | 21:01 |
cloudnull | and have pulled them from the available pool. | 21:01 |
fungi | cloudnull: thanks! | 21:02 |
*** thorst_ has quit IRC | 21:02 | |
mtreinish | cloudnull: did you check it from a trusty vm by any chance? | 21:02 |
fungi | mtreinish: sdague: ^ keep an eye out for continued hits | 21:02 |
cloudnull | IDK if that makes the localhost routing thing happy, but getting there. | 21:02 |
cloudnull | mtreinish: no, i did it w/ xenial | 21:02 |
mtreinish | cloudnull: because that was another side of the equation we saw. The failures were only happening on trusty jobs | 21:03 |
cloudnull | I can do it w/ trusty | 21:03 |
cloudnull | mtreinish: the localhost failures were on trusty ? | 21:04 |
mtreinish | cloudnull: yep | 21:04 |
cloudnull | ok. i'll give that a go too | 21:04 |
*** javeriak has joined #openstack-infra | 21:04 | |
*** sdague has quit IRC | 21:04 | |
*** tqtran has joined #openstack-infra | 21:07 | |
*** yamamoto has joined #openstack-infra | 21:08 | |
openstackgerrit | Merged openstack-infra/project-config: Added documentation draft jobs for nodejs-based projects https://review.openstack.org/346130 | 21:09 |
*** gomarivera has quit IRC | 21:10 | |
fungi | cloudnull: `nc 2001:4800:1ae1:18:f816:3eff:fed4:f536 198851` to see a log of an instance which showed failing traceroute6 as recently as 10 minutes ago. uuid is 553a91ef-fe3d-4c14-965a-419cf93acbba | 21:10 |
openstackgerrit | Merged openstack/python-jenkins: Remove discover from test-requirements https://review.openstack.org/345764 | 21:10 |
*** Apoorva_ has quit IRC | 21:11 | |
cloudnull | I can ping that node . | 21:12 |
fungi | cloudnull: i ssh'd into it, and `traceroute6 git.openstack.org` continues to fail for me there | 21:12 |
*** Apoorva has joined #openstack-infra | 21:12 | |
fungi | want me to hold it? | 21:12 |
cloudnull | hum. are the routes set? | 21:12 |
cloudnull | if you could | 21:12 |
fungi | okay, it's held | 21:13 |
cloudnull | does ``host git.openstack.org`` work? | 21:13 |
*** dizquierdo has quit IRC | 21:13 | |
fungi | yeah, and it also resolved it correctly for the traceroute6 | 21:13 |
fungi | just gets no responses back to its datagram probes | 21:14 |
fungi | default via fe80::def dev eth0 proto ra metric 1024 expires 1787sec hoplimit 64 | 21:14 |
fungi | which i take it is the linklocal of the next hop | 21:14 |
cloudnull | hum. | 21:14 |
fungi | fe80::def dev eth0 lladdr 00:05:73:a0:00:06 router REACHABLE | 21:14 |
* cloudnull looking at the compute node | 21:14 | |
*** yamamoto has quit IRC | 21:15 | |
fungi | `ping6 git.openstack.org` from it works fine | 21:16 |
*** Hal has joined #openstack-infra | 21:16 | |
cloudnull | well thats odd. | 21:16 |
fungi | it's possible that these failing v6 traceroutes are correlated to "trusty in osic" and the job failures are also correlated to "trusty in osic" but that the two behaviors are unrelated | 21:18 |
*** matrohon has joined #openstack-infra | 21:18 | |
cloudnull | fungi: its missing all the hops ? or just fails all together? | 21:18 |
*** gyee has quit IRC | 21:20 | |
*** edtubill has quit IRC | 21:20 | |
cloudnull | also does cloning from git.o.o work and _not_ take the 4 some odd minutes. | 21:20 |
*** sarob has joined #openstack-infra | 21:20 | |
*** gomarivera has joined #openstack-infra | 21:20 | |
*** e0ne has quit IRC | 21:20 | |
fungi | cloudnull: missing all hops | 21:21 |
*** jcoufal_ has quit IRC | 21:22 | |
fungi | we have trusty servers elsewhere with working ipv6 and the basic ip6tables -L output matches | 21:22 |
fungi | and i can successfully traceroute6 to stuff from those | 21:22 |
*** jordanP has joined #openstack-infra | 21:23 | |
*** jkilpatr has quit IRC | 21:23 | |
*** spzala_ has quit IRC | 21:23 | |
cloudnull | does it miss all of the hops w/ something else, like google.com ? | 21:23 |
fungi | so it doesn't seem to be a misconfigured firewall rule or trusty-specific bug | 21:23 |
*** spzala has joined #openstack-infra | 21:23 | |
*** gyee has joined #openstack-infra | 21:23 | |
fungi | yep | 21:24 |
cloudnull | and from the sounds of it, everything is working? | 21:24 |
cloudnull | besides the traceroute that is | 21:24 |
fungi | right, so i think the traceroute errors we're getting in the logs may be unrelated to the slow git clones and to the glance-related errors in devstack | 21:25 |
mtreinish | fungi: heh, that'd be too much of a coincidence for me to have 2 separate issues with trusty + osic involving talking to git.o.o | 21:25 |
*** sarob has quit IRC | 21:25 | |
fungi | it's something we can (and should) dig into, but i'm unconvinced it's a marker for the other issues | 21:25 |
*** sarob has joined #openstack-infra | 21:26 | |
fungi | mtreinish: well, i have a node held where traceroute6 to git.o.o times out, but pin6 to it works fine and git cloning from it works fine | 21:26 |
fungi | s/pin6/ping6/ | 21:26 |
*** rcernin has quit IRC | 21:27 | |
fungi | i'm going to try to find more examples of traceroute6 _working_ from osic, and see if any of them are on trusty | 21:27 |
*** javeriak has quit IRC | 21:28 | |
*** spzala has quit IRC | 21:28 | |
cloudnull | it seems suspect, but im going to rope in some of our network folks to see whats what, | 21:28 |
cloudnull | maybe a misconfiguration somewhere in the path. | 21:29 |
fungi | i have to step away for a bit though and eat dinner. bbiab | 21:30 |
cloudnull | kk, ttyl | 21:32 |
cloudnull | enjoy dinner. | 21:32 |
*** jordanP has quit IRC | 21:32 | |
*** matrohon has quit IRC | 21:32 | |
*** annegentle has quit IRC | 21:32 | |
*** baoli has quit IRC | 21:33 | |
jeblair | fungi, pabelanger: i'm still not quite at the bottom of the rabbit hole, but i think i'm getting close. neither systemd nor haveged alone is sufficient to initialize urandom. systemd's entropy is not counted at all. during the initialization phase, all system entropy goes to the urandom pool, *unless* it comes in via ioctl, which is what haveged does. in that case, it goes straight to the input pool, and either none of it, or at ... | 21:33 |
jeblair | ... least not enough of it spills over (really, this is a thing) into the nonblocking (urandom) pool for it to be initialized. eventually, it's regular system entropy which pushes it over the 128 bit threshold. | 21:33 |
jeblair | i have good news though | 21:34 |
jeblair | ted ts'o ripped all of this out last month: https://git.kernel.org/cgit/linux/kernel/git/tytso/random.git/commit/?h=dev&id=e192be9d9a30555aae2ca1dc3aad37cba484cd4a | 21:35 |
jeblair | so it's going to get better | 21:35 |
jeblair | i have one more kernel recompile i want to do, then i think i'll be ready to try the experiment with the other generator | 21:36 |
*** tqtran has quit IRC | 21:37 | |
*** tqtran has joined #openstack-infra | 21:37 | |
cloudnull | fungi: when you get back, if you would not mind, ``traceroute -6 -T git.openstack.org`` which forces TCP instead of the assumed UDP | 21:38 |
cloudnull | also same for -I | 21:38 |
*** jheroux has quit IRC | 21:38 | |
cloudnull | which is forcing ICMP, maybe the UDP packets are getting dropped/deprioritized in the path? | 21:38 |
cloudnull | I'd be curious if that too fails | 21:39 |
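A small helper in the spirit of that suggestion (a hedged sketch: it assumes the Linux traceroute that understands -6/-T/-I, and TCP/ICMP probes generally need root): run the same destination with UDP, TCP, and ICMP probes and count how many hops answered, to tell protocol-specific filtering apart from a genuinely dead path.

```python
# Hedged sketch: compare UDP (default), TCP (-T) and ICMP (-I) traceroute
# probes to the same host. -T and -I usually require root.
import subprocess

def answered_hops(output):
    # A hop that never answered prints only asterisks after its hop number.
    hops = 0
    for line in output.splitlines():
        parts = line.split()
        if len(parts) > 1 and parts[0].isdigit() and set(parts[1:]) != {"*"}:
            hops += 1
    return hops

def compare(host="git.openstack.org"):
    for label, extra in (("udp", []), ("tcp", ["-T"]), ("icmp", ["-I"])):
        proc = subprocess.run(["traceroute", "-6"] + extra + [host],
                              capture_output=True, text=True, timeout=300)
        print("%s probes: %d hops answered" % (label, answered_hops(proc.stdout)))

if __name__ == "__main__":
    compare()
```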
*** thorst_ has joined #openstack-infra | 21:39 | |
*** gomarivera has quit IRC | 21:40 | |
*** rhallisey has quit IRC | 21:41 | |
*** tqtran has quit IRC | 21:42 | |
dmsimard | o/ I'm trying to find where the Cirros image gets pre-cached in the nodepool images. I searched for "cirros-0.3.4-x86_64-disk.img" in project-config and system-config but no luck :( | 21:42 |
dmsimard | I know it ends up in '~/cache/files' but I want to know how. | 21:43 |
*** sarob has quit IRC | 21:43 | |
*** thorst_ has quit IRC | 21:43 | |
*** ldnunes has quit IRC | 21:43 | |
jeblair | dmsimard: devstack i think | 21:44 |
mtreinish | fungi, rcarrillocruz: https://github.com/eclipse/mosquitto/commit/ba2de8879008f6df90a0d6af5902926483051124 the mosquitto bug got fixed | 21:44 |
*** sarob has joined #openstack-infra | 21:44 | |
mtreinish | jeblair: yeah, devstack has a command which exports a list of images to precache and the nodepool image scripts call that | 21:44 |
dmsimard | jeblair: ah, found it, ty https://github.com/openstack-dev/devstack/blob/06f3639a70dc5884107a4045bef5a9de1fb725a5/stackrc#L645 | 21:44 |
*** nmagnezi has quit IRC | 21:45 | |
beagles | the irony would be this network thing being MTU and neutron related | 21:45 |
mtreinish | jeblair, dmsimard: http://git.openstack.org/cgit/openstack-dev/devstack/tree/tools/image_list.sh | 21:45 |
mtreinish | dmsimard: and http://git.openstack.org/cgit/openstack-infra/project-config/tree/nodepool/scripts/cache_devstack.py | 21:46 |
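Roughly how that pre-cache step fits together, as a minimal sketch rather than the real cache_devstack.py: ask devstack's tools/image_list.sh for the URLs it would download (cirros included) and stash each file under ~/cache/files. The devstack checkout path and the exact output format of image_list.sh are assumptions here.

```python
# Minimal sketch (not the real cache_devstack.py): pre-download the images
# devstack says it needs into ~/cache/files so jobs can use local copies.
import os
import subprocess
import urllib.request

CACHE_DIR = os.path.expanduser("~/cache/files")
DEVSTACK = "/opt/git/openstack-dev/devstack"   # placeholder checkout path

def image_urls():
    # tools/image_list.sh prints the image URLs devstack would fetch;
    # split on whitespace/commas in case the list is comma-separated.
    out = subprocess.check_output(
        [os.path.join(DEVSTACK, "tools", "image_list.sh")])
    return [u for u in out.decode().replace(",", "\n").split()
            if u.startswith("http")]

def cache_images():
    os.makedirs(CACHE_DIR, exist_ok=True)
    for url in image_urls():
        dest = os.path.join(CACHE_DIR, os.path.basename(url))
        if not os.path.exists(dest):
            urllib.request.urlretrieve(url, dest)

if __name__ == "__main__":
    cache_images()
```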
*** admcleod_ has joined #openstack-infra | 21:46 | |
openstackgerrit | James Slagle proposed openstack-infra/tripleo-ci: Deploy minimal services in multinode job https://review.openstack.org/355097 | 21:46 |
*** rbrndt has joined #openstack-infra | 21:46 | |
*** matrohon has joined #openstack-infra | 21:46 | |
*** weshay has quit IRC | 21:47 | |
*** nwkarste_ has quit IRC | 21:48 | |
*** nwkarsten has joined #openstack-infra | 21:48 | |
*** njohnston has joined #openstack-infra | 21:48 | |
cloudnull | beagles: you may be onto something there. we're using an MTU of 9000 on the hosts, maybe something is off on nodes that are showing signs of failure. | 21:48 |
*** admcleod has quit IRC | 21:49 | |
* beagles facepalm | 21:49 | |
*** fguillot has quit IRC | 21:50 | |
njohnston | Hi, I have a quick question about change 339246 - it has been sitting in the zuul UI in the check queue, all tests having completed, for over an hour now I believe. Will it ever post its results so the change can move on to the gate queue? | 21:50 |
*** matt-borland has quit IRC | 21:51 | |
*** gomarivera has joined #openstack-infra | 21:51 | |
*** annegentle has joined #openstack-infra | 21:51 | |
cloudnull | beagles: sadly not the problem | 21:51 |
cloudnull | :'( | 21:52 |
*** ggillies has joined #openstack-infra | 21:52 | |
cloudnull | i kinda wish it was it would've been simple to fix... | 21:52 |
*** nwkarsten has quit IRC | 21:52 | |
beagles | :( | 21:52 |
fungi | cloudnull: yeah, same behavior with traceroute6 -T as with the default method. however slightly different behavior with -I... first attempt none of the hops gave a response except git.openstack.org and it only responded to the second probe, but then rerunning with -I a second time worked correctly (-T and default protocols still do not however) | 21:53 |
*** thorst_ has joined #openstack-infra | 21:54 | |
cloudnull | fungi: traceroute6 or traceroute -6 ? | 21:54 |
*** tqtran has joined #openstack-infra | 21:54 | |
fungi | cloudnull: traceroute6 | 21:55 |
fungi | trying now with traceroute -6 and various options (i didn't know traditional traceroute grew a -6 option) | 21:55 |
cloudnull | yea, that was news to me today too :) | 21:56 |
pabelanger | jeblair: wow, that is a rabbit hole | 21:56 |
openstackgerrit | Merged openstack-infra/system-config: Disable puppet service on boot https://review.openstack.org/356004 | 21:56 |
*** annegentle has quit IRC | 21:58 | |
fungi | jeblair: yeah, i'm aware ted ts'o has been heavily revamping the entropy gathering and rng stuff kernel-side. very excited for that to finally land | 21:58 |
*** sarob has quit IRC | 21:58 | |
*** edmondsw has quit IRC | 21:58 | |
*** thiagop has quit IRC | 21:58 | |
*** amitgandhinz has quit IRC | 21:58 | |
fungi | it's been all abuzz on the post-cypherpunks crypto lists | 21:59 |
*** tqtran has quit IRC | 21:59 | |
*** piet has quit IRC | 22:00 | |
jeblair | fungi, pabelanger: on boot, the first pull from urandom results in a transfer of 0 bits of entropy from the input pool to the nonblocking pool. | 22:00 |
*** spzala has joined #openstack-infra | 22:00 | |
fungi | that brings a tear to my eye | 22:00 |
jeblair | fungi, pabelanger: that transfer of 0 bits causes a timer to start which protects urandom from draining the input pool too quickly. | 22:00 |
jeblair | fungi, pabelanger: which means that later, after haveged dumps 4096 bits of entropy into the input pool, the system waits 60 seconds before it will allow a transfer from input to nonblocking for urandom to reseed | 22:01 |
*** nwkarsten has joined #openstack-infra | 22:01 | |
jeblair | which is why we were seeing an almost exactly 60 second delay | 22:01 |
jeblair | and when i turned off haveged, the 90 seconds was just how long it took to naturally accumulate entropy from interrupts one bit at a time | 22:02 |
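That sequence can be sanity-checked with a toy model (not kernel code, just the behaviour described above): the first urandom read arms a 60-second reseed timer even though it transfers 0 bits, so a large credited contribution at ~2s sits in the input pool until the timer expires.

```python
# Toy model of the pre-4.8 behaviour described above; deliberately simplified
# and not the actual drivers/char/random.c logic.
RESEED_INTERVAL = 60   # seconds between allowed input->nonblocking pulls
INIT_THRESHOLD = 128   # bits the nonblocking pool needs to report "initialized"

class ToyPools:
    def __init__(self):
        self.input_bits = 0
        self.nonblocking_bits = 0
        self.last_pull = None

    def ioctl_add_entropy(self, bits):
        # RNDADDENTROPY (haveged/rngd) credits the *input* pool.
        self.input_bits += bits

    def read_urandom(self, now):
        # A read pulls from the input pool only if the interval has passed;
        # crucially, even a 0-bit pull arms the timer.
        if self.last_pull is None or now - self.last_pull >= RESEED_INTERVAL:
            self.nonblocking_bits += self.input_bits
            self.input_bits = 0
            self.last_pull = now
        return self.nonblocking_bits >= INIT_THRESHOLD

pools = ToyPools()
pools.read_urandom(0)            # early-boot read: 0 bits moved, timer armed
pools.ioctl_add_entropy(4096)    # haveged dumps entropy at ~2s
print(pools.read_urandom(30))    # False: still inside the 60s window
print(pools.read_urandom(61))    # True: pull allowed, pool "initializes"
```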
fungi | cloudnull: confirmed, traceroute -6 $* gives me identical behavior to traceroute6 $* | 22:02 |
cloudnull | bummer. | 22:02 |
fungi | right down to the strangeness with -I | 22:02 |
cloudnull | off to the next rabbit hole | 22:02 |
*** gomarivera has quit IRC | 22:03 | |
bkero | jeblair: Trying to gather entropy inside a VM? | 22:03 |
*** mriedem has quit IRC | 22:04 | |
*** gomarivera has joined #openstack-infra | 22:05 | |
fungi | bkero: specifically, trying to get unbound to not wait 60 seconds from boot before it can start, since that causes all other services starting and trying to resolve names via dns to bomb | 22:05 |
fungi | and unbound wants a working /dev/urandom to be able to do stuff for dnssec | 22:06 |
*** nwkarsten has quit IRC | 22:06 | |
fungi | and the kernel makes /dev/urandom basically useless for a full minute after boot starting with linux 3.17 | 22:06 |
anteaya | njohnston: all tests are not complete on 339246, | 22:07 |
anteaya | njohnston: all tests are not complete on 339246, | 22:07 |
anteaya | njohnston: all tests are not complete on 339246, | 22:08 |
*** devkulkarni1 has quit IRC | 22:08 | |
anteaya | njohnston: all tests are not complete on 339246, | 22:08 |
* fungi thinks anteaya is caught in a loop | 22:08 | |
anteaya | njohnston: all tests are not complete on 339246, one test | 22:08 |
bkero | fungi: weird. i would have assumed that urandom would be (as the name says) unblocking | 22:08 |
anteaya | njohnston: all tests are not complete on 339246, one test is waiting for a node: | 22:08 |
anteaya | njohnston: all tests are not complete on 339246, one test is waiting for a node: | 22:08 |
*** rbuzatu has joined #openstack-infra | 22:08 | |
anteaya | gate-tempest-dsvm-neutron-full-ubuntu-xenial | 22:08 |
anteaya | sorry for the multiple spam | 22:09 |
anteaya | my laptop was doing something weird with pasting | 22:09 |
anteaya | and I had scrolled up | 22:09 |
anteaya | my apologies | 22:09 |
fungi | bkero: yep, the kernel wants it to be safely seeded so processes don't rely on it before it's sufficiently entropic | 22:09 |
*** tqtran has joined #openstack-infra | 22:10 | |
fungi | and manages that by blocking on reads during that time | 22:10 |
bkero | fungi: Huh, man urandom has a little shell script to carry that randomness between reboots. Cute. | 22:11 |
jeblair | bkero: read scrollback from me today to understand why that doesn't help | 22:11 |
openstackgerrit | Merged openstack-infra/tripleo-ci: Use geard with keepalives https://review.openstack.org/352566 | 22:12 |
anteaya | fungi: and you had pinged me yesterday that the patch merged to allow anyone to compose electoral rolls, thank you for all your work on that | 22:12 |
anteaya | and zaro too | 22:12 |
*** rbuzatu has quit IRC | 22:13 | |
fungi | anteaya: yw! that and also the patch to expose submitted date via the rest api in change details are both in production, so the script is a good bit simpler now | 22:14 |
anteaya | yay simpler scripts! | 22:15 |
*** esberglu has quit IRC | 22:16 | |
bkero | jeblair: read scrollback. That's just unfun. | 22:16 |
jeblair | bkero: okay, well, i mean, i've been digging into a seriously complex subject all day. i'm not sure if you're trying to help or not. | 22:16 |
fungi | cloudnull: so, more spot checking, every trusty node i've found in osic has broken traceroute6, every xenial node i've found in osic seems to have a working traceroute6 | 22:16 |
*** mdrabe has quit IRC | 22:17 | |
beagles | is there a way to get a packet trace on hosts that are doing the 4 minute clone thing | 22:17 |
bkero | jeblair: ignore me, just sympathies | 22:17 |
beagles | mmm | 22:17 |
bkero | If you're at the point of recompiling kernels to add printk()s I'm not going to be much help. | 22:17 |
beagles | actually that probably wouldn't help - a retransmit doesn't tell you why | 22:17 |
notmorgan | bkero: oh god | 22:17 |
fungi | beagles: what's a packet trace? do you mean route trace or a packet capture? | 22:17 |
jeblair | bkero: if you would like to help, i'm happy to have it, just not sure to what degree i should invest in bootstrapping you -- not reading scrollback suggests you may not be very invested. :) | 22:18 |
notmorgan | bkero: recompiling kernels.... i... nooooooo | 22:18 |
fungi | beagles: sounds like you meant a packet capture | 22:18 |
beagles | fungi, I was referring to capture | 22:18 |
beagles | yeah | 22:18 |
bkero | jeblair: I meant I did read scrollback and was offering sympathies | 22:18 |
jeblair | bkero: ah, thanks on all accounts then :) | 22:18 |
fungi | beagles: we'd need to catch one of those instances while the job was running, since nodepool deletes them immediately on failure | 22:18 |
jeblair | bkero: i read your 'read' as 'read' when you meant 'read. | 22:19 |
jeblair | bkero: more like "i just read scrollback" and less like "what? me read scrollback" :) | 22:19 |
bkero | yeah | 22:19 |
bkero | My phrasing could have been better | 22:19 |
beagles | fungi, I could probably point you in the right direction there.. I don't know if a packet trace will help or not, but it might provide some kind of clue as to what the "profile" of the poor connection is | 22:20 |
fungi | beagles: we're running down alternate theories involving other anomalous symptoms we're able to observe, in hopes that they're related enough to provide an indicator | 22:20 |
beagles | fungi, ack | 22:20 |
*** tqtran has quit IRC | 22:20 | |
fungi | beagles: we've also got glance doing something odd in certain devstack jobs only on trusty nodes in osic, and traceroute6 not working correctly on trusty in osic (while xenial seems to be doing fine) | 22:21 |
jeblair | fungi, bkero, pabelanger: i've moved on to investigating why rng-tools makes this better -- somehow it has tickled a code path where entropy is transferred from the input pool to the nonblocking pool more often than once per 60 seconds | 22:21 |
jeblair | so i may not understand that timer fully... | 22:21 |
jeblair | (also, that timer value can be set in proc, but that's not the solution i'd like to take) | 22:22 |
*** gordc has quit IRC | 22:22 | |
bkero | Hahaha, wow. The char/random.c is copyright Matt Mackall of Mercurial fame. | 22:22 |
fungi | jeblair: there was an ubuntu bug that talked some about that. lemme see if i can dig it back up. something about the kernel not allowing userspace to advance the entropy pool directly, but the method rng-tools uses bypasses that | 22:22 |
fungi | the idea is that less privileged processes should be able to add to entropy, while still not trusted to actually provide good quality entropy | 22:24 |
jeblair | fungi: hrm, it *looks* like it's using the same ioctl that haveged is using... but yeah, that may be helpful | 22:24 |
jhesketh | Morning | 22:25 |
* bkero reading random.c, looks like adding entropy might not necessarily trigger 'crediting' the pool size. That might have to be done manually depending on the method used to add it. | 22:25 | |
jeblair | bkero: yes, that's why the 'save script' doesn't work. but haveged (and presumably rng tools) use the ioctl which does credit | 22:26 |
jeblair | bkero: sarch for RNDADDENTROPY: | 22:26 |
jeblair | bkero: hower, that goes to the *input* pool instead of the nonblocking (urandom) pool. so the thing that's missing is triggering a transfer from input to nonblocking | 22:26 |
bkero | Ahhh okay | 22:26 |
jeblair | sarch=search; hower=however | 22:27 |
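For reference, this is the general shape of that ioctl from userspace (a hedged sketch: the request value is _IOW('R', 0x03, int[2]) as it works out on x86-64, it needs CAP_SYS_ADMIN, and feeding os.urandom back in is only for illustration). Unlike a plain write to /dev/random, RNDADDENTROPY both mixes the bytes in and credits the input pool's entropy count.

```python
# Hedged sketch of the RNDADDENTROPY ioctl haveged/rngd use: mixes bytes in
# *and* credits the input pool, unlike a bare write to /dev/random.
import fcntl
import os
import struct

RNDADDENTROPY = 0x40085203   # _IOW('R', 0x03, int[2]) on x86-64 (assumption)

def add_credited_entropy(data, entropy_bits):
    # struct rand_pool_info { int entropy_count; int buf_size; __u32 buf[]; }
    payload = struct.pack("ii", entropy_bits, len(data)) + data
    with open("/dev/random", "wb") as dev:
        fcntl.ioctl(dev, RNDADDENTROPY, payload)

if __name__ == "__main__":
    # Needs CAP_SYS_ADMIN; credits 4096 bits in one call, roughly what the
    # discussion above describes haveged doing.
    add_credited_entropy(os.urandom(512), 4096)
```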
*** tqtran has joined #openstack-infra | 22:27 | |
rcarrillocruz | mtreinish: nice! | 22:27 |
*** sdake has quit IRC | 22:27 | |
*** rbuzatu has joined #openstack-infra | 22:28 | |
*** sdake has joined #openstack-infra | 22:28 | |
*** rbuzatu has joined #openstack-infra | 22:29 | |
anteaya | morning jhesketh | 22:29 |
*** dimtruck is now known as zz_dimtruck | 22:29 | |
*** zz_dimtruck is now known as dimtruck | 22:29 | |
jeblair | bkero, fungi, pabelanger: oh -- i think pulls can only happen once per 60 seconds, but i think if you add entropy with the ioctl, and the input pool is full, then it can schedule a transfer from input to nonblocking pools | 22:31 |
*** yamahata has quit IRC | 22:31 | |
jeblair | bkero, fungi, pabelanger: it's looking like rng-tools does multiple ioctls to add entropy -- along with, on my test system, haveged adding one of its own | 22:31 |
jeblair | so that's how adding rng-tools makes initialization happen faster | 22:31 |
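Taking that hypothesis at face value, a tiny model (again not kernel code) shows why many small credited adds could behave differently from one big one: once the input pool is full, the excess from each further ioctl would spill into the nonblocking pool and cross the 128-bit init threshold without waiting out the pull timer.

```python
# Toy illustration of the rngd-vs-haveged hypothesis above; not real kernel
# logic, just "overflow from a full input pool spills to the urandom pool".
INPUT_CAPACITY = 4096
INIT_THRESHOLD = 128

def initializes(ioctl_credits):
    input_bits, nonblocking_bits = 0, 0
    for bits in ioctl_credits:
        room = INPUT_CAPACITY - input_bits
        input_bits += min(bits, room)
        nonblocking_bits += max(bits - room, 0)   # spillover past a full pool
        if nonblocking_bits >= INIT_THRESHOLD:
            return True
    return False

print(initializes([4096]))        # one big haveged-style dump: no spill -> False
print(initializes([512] * 10))    # repeated rngd-style adds: spills -> True
```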
bkero | hm, ok | 22:32 |
* bkero looking what nonblocking_pool.initialized does | 22:32 | |
jeblair | presumably, convincing haveged to push more entropy than required (possibly via multiple ioctls) may do the same | 22:32 |
fungi | sounds like something worth testing | 22:33 |
jeblair | bkero: credit_entropy_bits has both the part i just described as well as the initialization threshold ("> 128") | 22:33 |
*** annegentle has joined #openstack-infra | 22:34 | |
*** tqtran has quit IRC | 22:36 | |
openstackgerrit | Varun Gadiraju proposed openstack-infra/project-config: Step 1 patch to project-config from bug #1609573 https://review.openstack.org/354344 | 22:36 |
openstack | bug 1609573 in Ironic "Ironic gate jobs should not pass configs through devstack-gate when possible" [Undecided,New] https://launchpad.net/bugs/1609573 - Assigned to Varun Gadiraju (varun-gadiraju) | 22:36 |
*** adrian_otto has quit IRC | 22:38 | |
*** tqtran has joined #openstack-infra | 22:39 | |
*** dimtruck is now known as zz_dimtruck | 22:39 | |
*** gouthamr has joined #openstack-infra | 22:39 | |
*** gouthamr_ has joined #openstack-infra | 22:40 | |
*** matrohon has quit IRC | 22:40 | |
*** tqtran has quit IRC | 22:43 | |
*** gouthamr has quit IRC | 22:44 | |
craige | o/ | 22:44 |
*** gouthamr_ is now known as gouthamr | 22:44 | |
*** yamamoto has joined #openstack-infra | 22:46 | |
openstackgerrit | Sai Sindhur Malleni proposed openstack-infra/project-config: Adding Ansible jobs for Browbeat https://review.openstack.org/356073 | 22:47 |
*** Apsu has joined #openstack-infra | 22:48 | |
pabelanger | jeblair: great info, thanks | 22:48 |
*** thorst_ is now known as thorst | 22:49 | |
*** yamahata has joined #openstack-infra | 22:51 | |
bkero | jeblair: could always ping mpm on freenode :) he wrote the code | 22:51 |
*** burgerk has quit IRC | 22:51 | |
bkero | jeblair: Could it be initialized, but maybe prandom_reseed_late() is being set too high? | 22:53 |
bkero | jeblair: I'm curious what prandom_seed_full_state() and prandom_bytes_state() would return | 22:54 |
jeblair | bkero: well, we see the "random: nonblocking pool is initialized" line in the logs right around 63 seconds, then urandom starts working. | 22:54 |
*** hockeynut has quit IRC | 22:55 | |
bkero | jeblair: That print happens after the timer is set | 22:55 |
bkero | I'm tracing prandom_reseed_late(); in random.c line 682 in v4.7 | 22:55 |
jeblair | bkero: oh! look at 4.4 | 22:56 |
jeblair | bkero: all this gets way better after https://git.kernel.org/cgit/linux/kernel/git/tytso/random.git/commit/?h=dev&id=e192be9d9a30555aae2ca1dc3aad37cba484cd4a | 22:56 |
*** vhosakot has quit IRC | 22:56 | |
jeblair | bkero: but that's not what we're running :( | 22:56 |
bkero | jeblair: Still the same in 4.4 | 22:56 |
bkero | line 681 | 22:56 |
jeblair | bkero: that's the part where it initializes the urandom rng, right? | 22:57 |
bkero | jeblair: despite the name it looks like it actually seeds for the first time, so seed + reseed | 22:58 |
jeblair | bkero: you were asking if it could be initialized -- but we don't see the initialized line until 60+ seconds in (when the pull timer has expired) | 22:58 |
jeblair | so prandom_reseed_late isn't going to be called until then | 22:59 |
bkero | jeblair: I'm assuming this thing spins with the wake_up_all() on line 683 until rng is initialized, then prints out the message | 22:59 |
jeblair | bkero: i don't think this function spins at all | 23:00 |
bkero | That credit_entropy_bits codeblock section does: prandom_reseed_late(), process_random_ready_list(), wake_up_all(), then prints the message | 23:00 |
*** spzala has quit IRC | 23:01 | |
bkero | __prandom_reseed has a spinlock | 23:01 |
*** spzala has joined #openstack-infra | 23:01 | |
jeblair | bkero: that only happens after nonblocking pool gets 128 bits | 23:01 |
bkero | Yeah, I'm assuming that's happened. Maybe that's a false assumption. | 23:02 |
jeblair | bkero: it hasn't happened, because the only thing that can feed the nonblocking pool is entropy from interrupts (~ one bit per second) or a transfer from the input pool. | 23:03 |
bkero | I'd think the timers would be adding it too. http://lxr.free-electrons.com/source/drivers/char/random.c?v=4.4#L804 | 23:04 |
*** asettle has joined #openstack-infra | 23:05 | |
openstackgerrit | Matthew Treinish proposed openstack-infra/devstack-gate: SUPER WIP: Use new tempest run workflow https://review.openstack.org/355666 | 23:05 |
jeblair | bkero: not in practice on this system. but theoretically yes. | 23:05 |
*** xarses has quit IRC | 23:05 | |
*** hongbin has quit IRC | 23:05 | |
*** tqtran has joined #openstack-infra | 23:05 | |
*** spzala has quit IRC | 23:06 | |
cloudnull | jeblair fungi: i did some more tests of the traceroute issues using vanilla trusty 14.04 -- I built 127 vms, passed user data to them to install and use traceroute(6), and from the looks of it, it all works. | 23:06 |
cloudnull | VMS: http://cdn.pasteraw.com/rdq27ar1tcxag4zjal72vciufu2r28c | 23:06 |
cloudnull | oops, that console data shows the traceroute | 23:07 |
cloudnull | VMS http://cdn.pasteraw.com/7pgwxkqhmfvzc8pz4y3uwk9y6v6z58b | 23:07 |
cloudnull | all using trusty | 23:07 |
cloudnull | mtreinish: -cc ^ | 23:07 |
*** Hal has quit IRC | 23:08 | |
*** tpsilva has quit IRC | 23:08 | |
cloudnull | simple userdata passed in http://cdn.pasteraw.com/7n0b3yeculm5zl5w4g5y5indfn4izhx | 23:08 |
*** rbrndt has quit IRC | 23:09 | |
*** asettle has quit IRC | 23:09 | |
*** Hal has joined #openstack-infra | 23:09 | |
cloudnull | I also made sure all of the VMs were built on different compute nodes. | 23:09 |
fungi | cloudnull: yeah, this could be something odd with our image. i'm starting to dig around with tcpdump | 23:09 |
cloudnull | I have another battery of tests to run using our various AZs and other networks just to make sure everything is on the up and up , but im kinda at a loss... :'( | 23:11 |
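For anyone wanting to repeat that kind of sweep, here is a hedged sketch of the idea using python-novaclient: boot one probe VM per hypervisor with user-data that writes a traceroute6 to the console, then read each console log back. The credentials, image/flavor IDs, and user-data script are placeholders, and host pinning via the "nova:<host>" availability zone form needs admin rights.

```python
# Hedged sketch of a per-hypervisor IPv6 probe sweep; all credentials and
# IDs below are placeholders, not real infra values.
from keystoneauth1 import loading, session
from novaclient import client as nova_client

USERDATA = """#!/bin/bash
apt-get update && apt-get -y install traceroute
traceroute6 git.openstack.org > /dev/console 2>&1
"""

def make_nova():
    loader = loading.get_plugin_loader("password")
    auth = loader.load_from_options(
        auth_url="https://keystone.example.com/v3",   # placeholder
        username="admin", password="secret", project_name="admin",
        user_domain_name="Default", project_domain_name="Default")
    return nova_client.Client("2", session=session.Session(auth=auth))

def sweep(image_id, flavor_id):
    # One small VM per hypervisor, pinned with the admin-only nova:<host> form.
    nova = make_nova()
    servers = []
    for hyp in nova.hypervisors.list():
        servers.append(nova.servers.create(
            name="v6-probe-%s" % hyp.hypervisor_hostname,
            image=image_id, flavor=flavor_id, userdata=USERDATA,
            availability_zone="nova:%s" % hyp.hypervisor_hostname))
    return servers

def show_consoles(servers):
    # Read back the console output that the user-data script wrote.
    nova = make_nova()
    for srv in servers:
        print(srv.name)
        print(nova.servers.get(srv.id).get_console_output(length=40))
```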
bkero | jeblair: have you measured how many interrupts are being thrown at the boot of the system? Maybe it's not that many. | 23:11 |
*** xarses has joined #openstack-infra | 23:13 | |
cloudnull | fungi: this is the image i've been using http://cdn.pasteraw.com/eey9gn9gggaxyimu64dd79xwu0vjble | 23:14 |
jeblair | bkero: it's about 1 per second | 23:14 |
*** xyang1 has quit IRC | 23:14 | |
bkero | jeblair: If the only entropy source is interrupts, and add_interrupt_randomness triggered for each, that would only add 60 bits of randomness per minute | 23:15 |
bkero | add_interrupt_randomness() sets credit=0, and calls credit_entropy_bits(r, credit + 1). Since credit is never set except for on seed generators (PPC only) it's always 1. | 23:16 |
bkero | where 1 = 1 bit | 23:17 |
jeblair | bkero: yep. i'd expect it to initialize after 128 seconds. in practice, i saw it initialize after 90 seconds with no help. i can't immediately account for the 30 second discrepancy, but could reboot with both rng and haveged disabled to find out if it might be important. | 23:17 |
*** thorst has quit IRC | 23:17 | |
cloudnull | I've got to relocate home. bbl | 23:17 |
*** thorst has joined #openstack-infra | 23:17 | |
bkero | jeblair: I'm curious why add_timer_randomness() isn't working too | 23:17 |
bkero | NO_HZ? | 23:18 |
bkero | Maybe try nohz=off in the cmdline? | 23:18 |
fungi | cloudnull: fyi, our current suspect image is b9cb5844-82a6-4034-9d09-d651ec019c7b | 23:18 |
jeblair | bkero: possibly state->dont_count_entropy is true? | 23:19 |
jeblair | bkero: i only instrumented the entropy credit, not the mix_pool_bytes func | 23:19 |
*** tqtran has quit IRC | 23:19 | |
jeblair | bkero: so i know that it's not crediting, but i don't know whether the add_timer_randomness func is being called | 23:20 |
jeblair | bkero: here's the most recent boot: http://paste.openstack.org/show/558614/ this is with rng-tools rngd adding entropy starting around 19s | 23:21 |
jeblair | fungi, pabelanger, bkero: i strongly suspect the difference between rngd and haveged is that rngd writes data in smaller chunks which allows the overflow routine to happen to push the nonblocking pool over the limit and initialize | 23:24 |
*** adriant has joined #openstack-infra | 23:24 | |
bkero | jeblair: I think it would help a lot to print which entropy pool was being credited | 23:24 |
bkero | Can just print the memory address, there should only be a few. | 23:24 |
jeblair | bkero: credit pool nonblocking nbits 1 | 23:24 |
jeblair | bkero: nonblocking is a variable sub there, it's the name of the pool | 23:25 |
bkero | Hmm, what is entropy_count? | 23:25 |
jeblair | "credit from interrupt / credit pool nonblocking nbits 1 / credit entropy_count 6 / credit entropy_total 64" are all one event, and that's the order to read them in | 23:26 |
*** thorst has quit IRC | 23:26 | |
bkero | Aug 16 23:06:32 ubuntu kernel: [ 2.026124] random: write nonblocking pool 512 <-- doesn't look like that's getting credited. | 23:26 |
bkero | I'm betting that's systemd's seed thing | 23:27 |
jeblair | bkero: entropy_count is the value of that local variable before "if (unlikely(entropy_count < 0)) {" | 23:27 |
*** fguillot has joined #openstack-infra | 23:27 | |
jeblair | bkero: that's it exactly | 23:27 |
jeblair | bkero: that's a write to /dev/random rather than an ioctl, so it's added but not credited | 23:27 |
openstackgerrit | Abhishek Raut proposed openstack-infra/project-config: Use python-db-jobs for tap-as-a-service https://review.openstack.org/355670 | 23:28 |
bkero | hrm | 23:28 |
bkero | That sounds like a bug | 23:28 |
bkero | Or maybe they don't want to credit userspace additions as a matter of security | 23:28 |
bkero | If that were the case I'd hope they would leave a comment though. | 23:28 |
jeblair | bkero: there are some comments that allude to that | 23:28 |
jeblair | considering it's systemd, i think it could have used the ioctl, but all these bugs are fixed in newer kernels anyway :) | 23:29 |
bkero | jeblair: RNDADDENTROPY should be crediting it unless write_pool()s return value is 0, but according to your log it's 0. | 23:31 |
* bkero reads the systemd source | 23:32 | |
jeblair | bkero: RNDADDENTROPY is the ioctl (haveged and rngd use it). random_write is the entry point for "cat > /dev/random" which is what systemd does. those are the write_pool calls that i have wrapped with those debug lines (the ones that print return codes). | 23:33 |
*** sarob has joined #openstack-infra | 23:33 | |
bkero | Yeah, systemd v229 just does: r = loop_write(random_fd, buf, (size_t) k, false); | 23:35 |
bkero | blah | 23:36 |
* bkero disappears for a bit | 23:38 | |
bkero | jeblair: Good luck figuring it out :( | 23:38 |
jeblair | bkero: thanks :) | 23:38 |
jeblair | mordred, fungi, pabelanger: rngd doesn't seem to mind if there is no hardware rng. it prints some error lines and continues. | 23:38 |
bkero | jeblair: maybe make a systemd unit file to call the ioctl with a few bytes to do things correctly? | 23:39 |
*** jklare has quit IRC | 23:39 | |
*** zz_dimtruck is now known as dimtruck | 23:40 | |
fungi | jeblair: yeah, in some cases it may consume from virt-rng i think. i'm not super familiar with what happens if that's not available either | 23:40 |
jeblair | bkero: yeah, possibly with the help of rngd or haveged | 23:41 |
jeblair | fungi: do you know how to tell if it's doing that? | 23:41 |
fungi | jeblair: i do not, no | 23:42 |
*** csomerville has quit IRC | 23:42 | |
*** aviau has quit IRC | 23:43 | |
*** aviau has joined #openstack-infra | 23:43 | |
jeblair | fungi: it seems to behave the same in rax as on osic | 23:43 |
*** moravec has quit IRC | 23:44 | |
*** cody-somerville has joined #openstack-infra | 23:45 | |
*** tqtran has joined #openstack-infra | 23:46 | |
fungi | virtio-rng i guess | 23:48 |
*** zhurong has joined #openstack-infra | 23:49 | |
*** moravec has joined #openstack-infra | 23:49 | |
fungi | qemu/kvm passthrough... i though xen had something similar | 23:49 |
fungi | thought | 23:49 |
*** bswartz has quit IRC | 23:49 | |
*** annegentle has quit IRC | 23:50 | |
*** ihrachys has joined #openstack-infra | 23:50 | |
*** tqtran has quit IRC | 23:51 | |
*** jerryz has quit IRC | 23:51 | |
*** tqtran has joined #openstack-infra | 23:54 | |
*** kbaegis has quit IRC | 23:55 | |
*** apetrich has quit IRC | 23:55 | |
*** kbaegis has joined #openstack-infra | 23:56 | |
mtreinish | jeblair: so I'm looking at the zuul snippet you pointed me to, and do you know if there is an example of what the job.arguments dict looks like or just the job object that gets passed to launch() | 23:56 |
mtreinish | because I'm not exactly sure what I have to work with for adding the node_image to the metadata there | 23:57 |
jeblair | mtreinish: i may be in a better position to help tomorrow; i don't think i can context switch right now, sorry. | 23:58 |
mtreinish | jeblair: ok, no worries | 23:58 |
*** jklare has joined #openstack-infra | 23:58 | |
*** amitgandhinz has joined #openstack-infra | 23:59 |
Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!