*** NikitaKonovalov_ has quit IRC | 00:00 | |
fungi | right. i'm deleting nodes associated with each master as i restart it, because i can't be assured they were properly marked offline | 00:00 |
---|---|---|
jeblair | fungi: also, it takes a _really_ long time to retrieve 40k keys | 00:00 |
jeblair | so this is a memory and performance problem | 00:00 |
lifeless | jeblair: 40k keys? | 00:00 |
lifeless | jeblair: unique per image? | 00:00 |
lifeless | jeblair: erm, node ? | 00:01 |
jeblair | lifeless: overall in the account | 00:01 |
*** tjones has quit IRC | 00:01 | |
jeblair | for that region | 00:01 |
*** NikitaKonovalov_ has joined #openstack-infra | 00:01 | |
*** NikitaKonovalov_ is now known as NikitaKonovalov | 00:01 | |
jeblair | lifeless: oh, yes, i think that's per node | 00:01 |
jeblair | lifeless: unique per node | 00:01 |
lifeless | hmm, if so we should make it per image | 00:02 |
*** rfolco has joined #openstack-infra | 00:02 | |
jeblair | lifeless: actually, i think it could be per-provider | 00:02 |
jeblair | lifeless: it's only used to bootstrap the image creation | 00:03 |
lifeless | its per image | 00:03 |
lifeless | updateImage ... manager.addKeypair | 00:03 |
jog0 | mordred: https://bitbucket.org/hpk42/tox/issue/116/new-pypi-override-breaks-people-who | 00:03 |
jeblair | lifeless: yeah, that makes sense. probably got so many due to image creation loops | 00:03 |
lifeless | jeblair: but making it per provider would avoid running into provider quotas when lots of images are in play | 00:03 |
lifeless | jeblair: and avoid this issue entirely | 00:04 |
jeblair | lifeless: yep. and less work for nodepool overall | 00:04 |
fungi | okay, jenkins01 is definitely getting lots of nodes now | 00:04 |
*** CaptTofu has joined #openstack-infra | 00:05 | |
jog0 | mordred: ahh I have tox 1.6 | 00:06 |
jog0 | mordred: I am always scared at the bugs you find in python dev workflows | 00:06 |
jog0 | mordred: tox 1.6.1 works \o/ | 00:07 |
* jeblair deletes keypairs | 00:07 | |
*** tjones has joined #openstack-infra | 00:07 | |
*** changbl has quit IRC | 00:07 | |
*** gokrokve_ has quit IRC | 00:07 | |
*** gokrokve has joined #openstack-infra | 00:08 | |
*** jhesketh__ has joined #openstack-infra | 00:09 | |
*** jhesketh__ has quit IRC | 00:09 | |
*** jhesketh has joined #openstack-infra | 00:09 | |
*** jhesketh__ has joined #openstack-infra | 00:09 | |
fungi | jenkins03 is up and running again | 00:09 |
jeblair | does anyone know if you can bulk-delete keypairs? | 00:11 |
jeblair | the nova api docs don't look promising in this regard... | 00:11 |
jog0 | jeblair: AFAIK I don't think you can | 00:11 |
*** dims has quit IRC | 00:11 | |
*** tjones has quit IRC | 00:12 | |
*** gokrokve has quit IRC | 00:12 | |
mordred | jeblair: I do forloops | 00:12 |
mordred | sadly | 00:12 |
mordred | jeblair: I support changing where keypairs happen, btw | 00:13 |
jeblair | mordred: yeah, that's probably faster than asking hpcloud for a new account. but barely. it could take ~10 hours | 00:13 |
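The cleanup being discussed is just a serial loop over the account's keypair list, since the compute API has no bulk delete. A minimal sketch of that approach, assuming the python-novaclient of that era; the credentials and auth URL are placeholders, and at the roughly one-delete-per-second pace these estimates imply, 40k keys does work out to around 10-11 hours:

```python
# Serial keypair cleanup -- one API call per key, no bulk delete available.
# Credentials and auth URL below are placeholders, not infra's real values.
from novaclient import client

nova = client.Client("2", "USERNAME", "API_KEY", "TENANT_NAME",
                     "https://identity.example.com/v2.0/")

for kp in nova.keypairs.list():
    # In practice you would filter on whatever naming scheme nodepool uses
    # for its generated keys before deleting anything.
    try:
        nova.keypairs.delete(kp)
    except Exception as exc:
        print("failed to delete %s: %s" % (kp.name, exc))
```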
stevebaker | hey, it looks like the tarballs job is having an issue in my heatclient release https://jenkins06.openstack.org/job/python-heatclient-tarball/11/console | 00:13 |
stevebaker | Connecting to tarballs.openstack.org | 00:13 |
stevebaker | 2014-02-19 00:08:47.995 | ERROR: Failed to upload files | 00:13 |
*** prad has quit IRC | 00:14 | |
jeblair | mordred: any chance of increasing the rate limits for our hpcloud account? | 00:15 |
fungi | jenkins01 finally shows a nodepool node in its webui | 00:17 |
fungi | two | 00:17 |
fungi | they're running jobs | 00:17 |
fungi | this is a good sign | 00:17 |
fungi | stevebaker: https://jenkins01.openstack.org/job/python-heatclient-tarball/2/console | 00:18 |
*** prad has joined #openstack-infra | 00:18 | |
fungi | worked | 00:18 |
fungi | " | 00:18 |
fungi | "Offline due to Gearman request" | 00:19 |
stevebaker | fungi: yay | 00:19 |
fungi | for the corresponding node which ran it too | 00:19 |
fungi | so i think we're on the right track now | 00:19 |
jeblair | fungi: awesome | 00:19 |
mordred | jeblair: probably - I could also see if they can bulk-delete keypairs behind the scenes | 00:19 |
*** sarob has quit IRC | 00:19 | |
jeblair | mordred: both of those would be helpful (the rate limit thing is helpful even aside from this) | 00:20 |
*** sarob has joined #openstack-infra | 00:20 | |
*** rcleere has quit IRC | 00:22 | |
*** matsuhashi has joined #openstack-infra | 00:22 | |
jeblair | az2 only has 22k. az3 has 48k. | 00:23 |
*** cadenzajon_ has quit IRC | 00:23 | |
mordred | jeblair: asking | 00:24 |
jeblair | that's 13 hours to delete | 00:24 |
*** sarob has quit IRC | 00:24 | |
*** yamahata has joined #openstack-infra | 00:25 | |
jeblair | mordred: and in case they can: it's okay to delete all keypairs in all regions from the account | 00:25 |
*** miguelzuniga has quit IRC | 00:26 | |
*** mgagne has quit IRC | 00:26 | |
*** ryanpetrello has joined #openstack-infra | 00:26 | |
*** dims has joined #openstack-infra | 00:27 | |
*** sandywalsh has quit IRC | 00:27 | |
fungi | jenkins05 is back up | 00:28 |
*** banix has quit IRC | 00:29 | |
*** talluri has joined #openstack-infra | 00:30 | |
*** hogepodge has quit IRC | 00:30 | |
*** nati_ueno has joined #openstack-infra | 00:32 | |
*** matsuhashi has quit IRC | 00:32 | |
mordred | jeblair: I have put in a few questions - the support team does not have a bulk-delete option, but they pointed me to the nova team, and I'm asking them | 00:34 |
*** talluri has quit IRC | 00:34 | |
*** matsuhas_ has joined #openstack-infra | 00:34 | |
* clarkb is back | 00:34 | |
mordred | jeblair: I have not yet asked about rate limits - I'll need to file a ticket for that | 00:34 |
fungi | yay clarkb! | 00:34 |
lifeless | 'phil, please delete ma stuff'! | 00:34 |
fungi | (so you don't have to read scrollback, just note that we're breaking everything) | 00:35 |
*** eharney has quit IRC | 00:35 | |
*** nati_uen_ has quit IRC | 00:35 | |
clarkb | now I want to read sb | 00:36 |
fungi | clarkb: main current issues are dns resolution broken from review.o.o querying rackspace recursive resolvers in dfw (worked around by pointing at iad), nodepool memory leak appears to be related to nearly 100 thousand crufty keypairs in hpcloud, and jenkins 1.511 changed the offline api call | 00:36 |
clarkb | fungi: wow re keypairs | 00:37 |
fungi | jeblair's deleting keypairs, i'm downgrading jenkinses to lts | 00:37 |
jog0 | fungi jeblair: can you file a bug with nov about bulk keypair | 00:37 |
clarkb | fungi: are we upgrading zmq plugin when jenkinses are downgraded? | 00:37 |
clarkb | also is the bug in jenkins or nodepool? | 00:37 |
clarkb | and why is it only biting us now? | 00:38 |
fungi | clarkb: i already upgraded the zmq plugin earlier when i upgraded 1.511 | 00:38 |
jeblair | clarkb: jenkins changed something about the internal offline node api that gearman-plugin uses | 00:38 |
fungi | downgrading now to 1.532.2 (lts) which seems to solve current concerns | 00:38 |
anteaya | I think crufty keypairs would be a great username | 00:38 |
jeblair | clarkb: so we need to (later) update gearman-plugin to fix that | 00:38 |
anteaya | like nifty lettuce | 00:39 |
clarkb | fungi: jeblair: wait I am confused if lts is 1.532 how does it help to downgrade to it if 1.511 introduced the problem? | 00:39 |
fungi | 1.551 | 00:40 |
fungi | i mistyped | 00:40 |
clarkb | ah ok it makes a lot more sense now thanks | 00:40 |
fungi | earlier i upgraded from 1.525/1.543 to 1.551, now i'm downgrading to 1.532.2 | 00:40 |
*** dangers is now known as dangers_away | 00:41 | |
fungi | which supposedly also has the same security fixes backported to it | 00:41 |
clarkb | note that that lts version may have a different offline node bug | 00:41 |
jeblair | jog0: https://bugs.launchpad.net/nova/+bug/1281853 | 00:41 |
clarkb | the one that we are trying to work around with single use nodes | 00:41 |
uvirtbot | Launchpad bug 1281853 in nova "Add method to bulk delete keypairs" [Undecided,New] | 00:41 |
*** sabari has quit IRC | 00:41 | |
fungi | ooh! uvirtbot came back too while i wasn't looking, huh? | 00:41 |
*** yamahata has quit IRC | 00:42 | |
*** yamahata has joined #openstack-infra | 00:43 | |
jog0 | jeblair: thanks | 00:44 |
fungi | okay, jenkins07 is online again | 00:44 |
clarkb | jeblair: fungi: ok I think I grok the current state of fun. ANything I can jump onto to help? | 00:44 |
clarkb | looks like DNS is better now courtesy of google | 00:44 |
jog0 | jeblair: do you want to be able to delete all keypairs? | 00:44 |
clarkb | and jenkinses are being downgraded | 00:44 |
jeblair | clarkb: no we switched to iad dns | 00:44 |
clarkb | ah iad dns | 00:44 |
jeblair | clarkb: do you think we will have a problem with the lts release? | 00:45 |
fungi | i'm going to start in on the even numbered masters, but more slowly while the odd numbered masters get more nodes assigned | 00:45 |
fungi | since nodepool is on a go-slow | 00:45 |
clarkb | jeblair: let me dig into that more, my hunch is single use nodes will mitigate it if so | 00:45 |
jeblair | jog0: well, at this moment, yes. but in general being able to provide a list of things to delete would be nice | 00:45 |
jog0 | jeblair: makes sense although listing 10k things in a single request seems excessive | 00:47 |
fungi | jog0: xargs man, xargs | 00:47 |
jeblair | jog0: everything about openstack-infra is excessive. haven't you noticed? ;) | 00:48 |
jog0 | jeblair: :) | 00:48 |
fungi | or being able to go in a for loop and delete 10 keys per call would at least speed up the situation by a factor of 10 | 00:48 |
jog0 | I have | 00:48 |
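fungi's xargs remark amounts to fanning those deletes out in parallel. A hedged sketch of the same idea, reusing the illustrative `nova` client from the sketch above; whether one client object can safely be shared across threads, and how many workers the provider's rate limits tolerate, are both assumptions here:

```python
# Fan the deletes out across a few worker threads, xargs-style.
from concurrent.futures import ThreadPoolExecutor

def delete_keypair(kp):
    try:
        nova.keypairs.delete(kp)
    except Exception as exc:
        print("failed to delete %s: %s" % (kp.name, exc))

with ThreadPoolExecutor(max_workers=10) as pool:
    # ten workers is arbitrary; tune it against the provider's rate limits
    list(pool.map(delete_keypair, nova.keypairs.list()))
```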
geekinutah | so sad, success but still requeued :-( https://jenkins06.openstack.org/job/gate-nova-python27/847/console | 00:48 |
*** tjones has joined #openstack-infra | 00:48 | |
fungi | geekinutah: we're continuing to downgrade jenkins masters | 00:48 |
geekinutah | yeah, I've been watching pass on downgraded guys | 00:49 |
fungi | geekinutah: i've got about half of them done, and am getting started shutting the other half down, so it should be fixed up soonish | 00:49 |
*** rfolco has quit IRC | 00:49 | |
jeblair | geekinutah: as fungi works through the jenkins downgrade, your chances of completion are going up! :) | 00:50 |
clarkb | https://issues.jenkins-ci.org/browse/JENKINS-19453 is the upstream bug | 00:50 |
*** sarob has joined #openstack-infra | 00:50 | |
*** wenlock has quit IRC | 00:50 | |
clarkb | sorting out if that made it into the lts | 00:51 |
clarkb | looks like it may have been backported | 00:51 |
geekinutah | fungi, jeblair: don't mind me, you guys are doing great, really appreciate it | 00:51 |
clarkb | jeblair: fungi: the fix for 19453 was backported into stable and is in 1.532.2's log | 00:54 |
clarkb | we should be fine | 00:54 |
fungi | clarkb: all's the better. thanks for checking! | 00:54 |
*** geekinutah has left #openstack-infra | 00:54 | |
openstackgerrit | Derek Higgins proposed a change to openstack-infra/nodepool: Add fedora support https://review.openstack.org/74529 | 00:56 |
openstackgerrit | Derek Higgins proposed a change to openstack-infra/nodepool: Catch key problems in ssh_connect https://review.openstack.org/74528 | 00:56 |
*** david-lyle has quit IRC | 00:57 | |
openstackgerrit | James E. Blair proposed a change to openstack-infra/nodepool: Make jenkins get info task synchronous https://review.openstack.org/74545 | 00:57 |
clarkb | fungi: do any more jenkinses need downgrading? | 00:58 |
clarkb | I can hand hold some of those if it helps | 00:58 |
jeblair | clarkb, fungi: ^ maybe let's merge that soon and i think that will reduce the 40 minute main-loop cycle in nodepool | 00:58 |
clarkb | jeblair: rgr will review | 00:58 |
mordred | derekh: re: your key problems patch - what happens if the node never comes online/ | 00:59 |
mordred | ? | 00:59 |
fungi | clarkb: not really. it's mostly just stretching the process out so that i don't completely starve us, but i'm shutting down the other evens here momentarily | 00:59 |
mordred | jeblair: lookin | 00:59 |
clarkb | fungi: ok | 00:59 |
clarkb | jeblair: that change is nice and small +2 | 01:00 |
clarkb | jeblair: is there any concern that there are mixed async and sync calls? | 01:00 |
derekh | mordred: it should timeout like it always did, the exception I'm catching gets thrown if ssh comes up but the key doesn't work | 01:00 |
fungi | okay, 2/4/6 are in shutdown now, and 02 is close to me being able to downgrade it. i'm evacuating all its 100+ ready nodes so that the good masters will start to pick up steam | 01:00 |
mordred | ok. the commit message said something about continuing to try - I just wanted to make sure we weren't introducing a possibly endless loop | 01:01 |
anteaya | like we are in now | 01:01 |
mordred | derekh: yup. duh. I read it properly now. thanks | 01:01 |
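A rough illustration of the behaviour derekh describes: keep retrying ssh until the launch timeout, and treat an authentication failure (sshd answering but the key not installed yet) the same as the host not being up at all, so the overall deadline still bounds the loop. This assumes paramiko and the function shape is illustrative, not nodepool's actual ssh_connect:

```python
import socket
import time

import paramiko

def ssh_connect(ip, username, key_file, timeout=300):
    deadline = time.time() + timeout
    while time.time() < deadline:
        client = paramiko.SSHClient()
        client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        try:
            client.connect(ip, username=username,
                           key_filename=key_file, timeout=10)
            return client
        except (socket.error, paramiko.AuthenticationException,
                paramiko.SSHException):
            # "key doesn't work yet" is treated like "not booted yet";
            # the deadline above still prevents an endless loop.
            time.sleep(5)
    raise Exception("timed out waiting for ssh on %s" % ip)
```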
jeblair | clarkb: i don't think so; it should just be some simple urllib2 calls; i think the jenkins object is thread safe | 01:02 |
jeblair | clarkb: yeah, it just stores some strings and that's it | 01:02 |
clarkb | jeblair: great | 01:03 |
*** dcramer__ has joined #openstack-infra | 01:03 | |
*** mdenny has quit IRC | 01:04 | |
*** russellb has quit IRC | 01:05 | |
mordred | jeblair: while I'm reviewing that one, I'm reviewing the other nodepool changes that are up - there's one from BobBall that looks very safe and has 2 +2s already (it just adds matching for image hex strings) | 01:05 |
mordred | should I avoid landing extra things on principle? | 01:06 |
jeblair | mordred: should be ok | 01:06 |
clarkb | yeah BobBalls change is pretty safe iirc | 01:06 |
mordred | jeblair: k. (I've been heads-down in nodepool today, so I also feel fairly competent on what it's doing) | 01:06 |
jeblair | i have manually installed nodepool with my change on nodepool.o.o | 01:07 |
jeblair | (because it's going to be forever before it actually merges) | 01:07 |
jeblair | fungi: what's the current jenkins state? i'm trying to figure out when it would be best to restart np | 01:08 |
jeblair | fungi: (not only to pick that up, but also because it's about time to free memory) | 01:08 |
fungi | jeblair: jenkins01,3,5,7 are online but none have nodes assigned (well not entirely true, there are a few dozen in nodepool ready state on 01 but not showing in the webui yet) | 01:09 |
fungi | i'm nodepool deleting ready nodes from the even masters while they finish up their remaining jobs | 01:09 |
fungi | in hopes nodepool will soon start adding fresh nodes to the active masters | 01:10 |
*** markmcclain has quit IRC | 01:10 | |
jeblair | fungi: the evens are in shutdown mode? | 01:10 |
fungi | jeblair: yes | 01:10 |
jeblair | fungi: now might be the best time to restart then | 01:10 |
fungi | works for me | 01:11 |
jeblair | fungi: i think it may have oomed while we were talking about it | 01:13 |
clarkb | jeblair: did we identify why keypairs are leaking? and maybe we should switch to using a specific keypair instead? | 01:14 |
*** ryanpetrello has quit IRC | 01:14 | |
*** tjones has quit IRC | 01:14 | |
*** tjones has joined #openstack-infra | 01:14 | |
jeblair | clarkb: my guess is they leaked during image creation loops. and yes, i think we should have one keypair per provider. | 01:14 |
*** atiwari has quit IRC | 01:15 | |
*** tjones has quit IRC | 01:16 | |
openstackgerrit | A change was merged to openstack-infra/config: Add single-use py3k-precise nodes https://review.openstack.org/73846 | 01:17 |
clarkb | jeblair: should be ok to merge change to nodepools config yaml too? | 01:18 |
jeblair | a change merged! | 01:19 |
jeblair | clarkb: yeah | 01:19 |
clarkb | oh gah, I really need to figure out why gerrit doesn't show my commit message first | 01:20 |
jeblair | it looks like the nodepool main loop now runs every ~13 seconds | 01:20 |
jeblair | so it should be much less spiky now | 01:20 |
clarkb | I think it happens when I jump to different changes via the dependency links | 01:20 |
fungi | boy howdy | 01:20 |
fungi | and the good masters are running mucho jobs now | 01:21 |
*** jergerber has joined #openstack-infra | 01:21 | |
*** nati_uen_ has joined #openstack-infra | 01:21 | |
mordred | jeblair, clarkb: I have locally observed keypairs leaking - best I can tell, if an image fails at creation, one is left with a keypair | 01:21 |
clarkb | mordred: so I think we should just use a single keypair per provider and call it good | 01:21 |
openstackgerrit | A change was merged to openstack-infra/config: Fix Climate jobs https://review.openstack.org/71317 | 01:21 |
clarkb | which jeblair agrees with | 01:21 |
*** tjones has joined #openstack-infra | 01:22 | |
mordred | clarkb: yup | 01:22 |
jeblair | clarkb: i assume that means nodepool will need to create it and stash the private half locally in /var. shouldn't be a big deal though. | 01:22 |
clarkb | jeblair: correct | 01:23 |
*** sarob has quit IRC | 01:23 | |
clarkb | jeblair: ideally it will store both halves :) you only put one half on zuul-dev which meant I had to dig in DBs for the public half which is no fun :) | 01:23 |
*** nati_ueno has quit IRC | 01:23 | |
mordred | clarkb: you can construct a public key from a private one | 01:23 |
mordred | clarkb: I always have to go re-learn the command though | 01:23 |
*** mestery has quit IRC | 01:24 | |
clarkb | mordred: oh are both in the encrypted file? | 01:24 |
jeblair | clarkb: actually, nodepool really only has to store the public half, come to think of it. | 01:24 |
*** talluri has joined #openstack-infra | 01:24 | |
clarkb | jeblair: it sshs which needs the private side right? | 01:24 |
*** banix has joined #openstack-infra | 01:24 | |
*** derekh has quit IRC | 01:24 | |
jeblair | clarkb: right, it only needs the private half. :) | 01:24 |
*** harlowja_away has quit IRC | 01:25 | |
clarkb | mordred: I don't know why I never knew that, I guess I assumed that they were distinct (you can't get one from the other with maths) | 01:25 |
*** tjones has quit IRC | 01:25 | |
fungi | jenkins02 is downgraded and back online now | 01:25 |
mordred | clarkb: you can go in one direction, just not the other | 01:26 |
clarkb | mordred: right but only because the public key is in the private key file | 01:26 |
clarkb | not due to maths | 01:26 |
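The per-provider scheme being agreed on here would look roughly like the sketch below: create one long-lived keypair per provider, keep the private half on disk, and remember that the public half can always be regenerated from it (`ssh-keygen -y -f <keyfile>` is the command mordred keeps having to re-learn). The `nova` client object, key name, and paths are illustrative, not what nodepool actually ended up doing:

```python
import os

KEY_NAME = "nodepool-provider-key"            # illustrative name
KEY_PATH = "/var/lib/nodepool/%s.pem" % KEY_NAME

# With no public key supplied, nova generates the pair and returns both
# halves; strictly, only the private half is needed for ssh later.
kp = nova.keypairs.create(KEY_NAME)
with open(KEY_PATH, "w") as f:
    f.write(kp.private_key)
os.chmod(KEY_PATH, 0o600)
with open(KEY_PATH + ".pub", "w") as f:
    f.write(kp.public_key)
```

One nice side effect of a single long-lived key per provider: a failed image build no longer leaves a stray keypair behind, which is the leak being cleaned up above.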
*** mrodden has quit IRC | 01:26 | |
jeblair | there were 138 nodes in the building state while np was stopped; i'm deleting them now. | 01:26 |
anteaya | look at all that yellow and green in the graph | 01:26 |
anteaya | that would get rid of the yellow I guess | 01:27 |
anteaya | I wonder how much of the green is actual usable available nodes | 01:27 |
clarkb | anteaya: I think very little of it due to jeblair's fire and brimstone approach | 01:28 |
*** tjones has joined #openstack-infra | 01:28 | |
anteaya | k | 01:28 |
jeblair | i think nodes that have been ready for >1h are suspicious and should be deleted | 01:28 |
anteaya | ah | 01:28 |
anteaya | goodbye nodes | 01:28 |
anteaya | take your crufty keys with you | 01:28 |
jeblair | that's another 101 nodes | 01:28 |
*** talluri has quit IRC | 01:28 | |
jeblair | though of course we're running into rate limits with so much going on | 01:29 |
fungi | jeblair: agreed. if they're on jenkins04 or 06 though they're explainable. i'm in the process of deleting them already | 01:29 |
anteaya | of course | 01:29 |
anteaya | a fire would be a fire without some throttling | 01:29 |
anteaya | wouldn't | 01:30 |
*** mestery has joined #openstack-infra | 01:30 | |
fungi | i've nearly got 04 cleared out. 06 still has a couple jobs running but they should be wrapped up by the time i get to it | 01:31 |
anteaya | they will just loop back round for another go on a different jenkins | 01:31 |
*** tjones has quit IRC | 01:32 | |
clarkb | fungi: out of curiosity is there a reason we limited bare-centos to rax? py3k-precise as well | 01:33 |
fungi | clarkb: i think because that's where they'd previously run | 01:34 |
fungi | and we maybe hadn't tested puppeting up hpcloud's base centos images? | 01:34 |
fungi | we can certainly add a change to spin up images in those too and see how they fare | 01:35 |
jeblair | yep | 01:35 |
*** balar has quit IRC | 01:35 | |
fungi | jenkins04 is back online now | 01:35 |
clarkb | fungi: cool, just checking that there wasn't a specific reason for that | 01:36 |
clarkb | like image didn't work or some such | 01:36 |
*** nosnos has joined #openstack-infra | 01:36 | |
fungi | ph33r of the unknown (and a black hat) | 01:36 |
anteaya | was jenkins 06 the last one to come down? | 01:36 |
fungi | anteaya: yes, i'm clearing it out now | 01:36 |
anteaya | k | 01:36 |
anteaya | some jobs just finished on 06 and have started up again on other nodes on a patch I am watching | 01:37 |
anteaya | I hope this is the last round | 01:37 |
clarkb | zaro: are you about? looks like a bug was fixed for the envinject thing. Did we chase that down? | 01:37 |
clarkb | zaro: or are we just calling it a derp and moving on? | 01:37 |
clarkb | zaro: the bug wasn't clear to me | 01:37 |
fungi | clarkb: for the zmq plugin tarball job? i merely retriggered the job and it worked the second time around | 01:38 |
clarkb | fungi: right, but zaro marked that bug fixed I think | 01:39 |
*** banix has quit IRC | 01:39 | |
fungi | ahh | 01:39 |
fungi | "fixed" | 01:40 |
*** banix has joined #openstack-infra | 01:41 | |
jeblair | the sparklines for both check and gate have a downtick | 01:41 |
*** mgagne has joined #openstack-infra | 01:41 | |
clarkb | jeblair: fungi: all jenkins are downgraded and all cruft nodepool nodes are in the process of being deleted? | 01:42 |
fungi | jenkins06 is online again now and downgraded | 01:42 |
*** tjones has joined #openstack-infra | 01:42 | |
clarkb | fungi: in other news the logstash build_master data is populated which is pretty awesome | 01:42 |
fungi | clarkb: i still have a couple nodepool delete loops taking their time, but only a handful of remaining nodes each between them | 01:42 |
clarkb | dims: ^ | 01:43 |
fungi | i'll try to check back in later and mass delete any nodepool nodes which have been in any state at all for >3 hours | 01:44 |
*** mgagne1 has joined #openstack-infra | 01:44 | |
fungi | just in case we miss a few | 01:44 |
clarkb | fungi: ping me then I may be around and can assist | 01:44 |
fungi | k | 01:44 |
jeblair | fungi: cool. i just started some deletes for nodes that have been in state for > 1hr | 01:44 |
openstackgerrit | K Jonathan Harker proposed a change to openstack-infra/config: Parameterize the status page urls https://review.openstack.org/74557 | 01:44 |
fungi | in amusing news, the zuul jph graph topped out at 3000 earlier :/ | 01:45 |
*** tjones has quit IRC | 01:45 | |
jeblair | fungi: yowza. | 01:45 |
fungi | i guess we road tested the new patches | 01:45 |
*** prad has quit IRC | 01:45 | |
jeblair | http://graphite.openstack.org/render/?from=-24hours&height=600&until=now&width=800&bgcolor=ffffff&fgcolor=000000&areaMode=stacked&target=color%28alias%28sumSeries%28stats.gauges.nodepool.target.*.*.*.building%29,%20%27Building%27%29,%20%27ffbf52%27%29&target=color%28alias%28sumSeries%28stats.gauges.nodepool.target.*.*.*.ready%29,%20%27Available%27%29,%20%2700c868%27%29&target=color%28alias%28sumSeries%28stats.gauges.nodepool.target.*.*.*.us | 01:45 |
fungi | i think because so many jobs were cycling and getting reset | 01:46 |
clarkb | fungi: cycling? meaning 72 hour timeout? | 01:46 |
*** mgagne has quit IRC | 01:46 | |
jeblair | oops, too long. | 01:46 |
dims | clarkb, nice! | 01:46 |
*** prad has joined #openstack-infra | 01:46 | |
fungi | clarkb: the jobs were getting nodes offlined out from under them and restarted over and over | 01:46 |
clarkb | fungi: oh right jenkins bug | 01:47 |
jeblair | clarkb: the jenkins/gearman-plugin bug manifested as a null-result build to zuul | 01:47 |
clarkb | or gearman plugin | 01:47 |
jeblair | clarkb: so the 'restart job on jenkins derp' logic kicked in and zuul has been restarting these jobs for several hours | 01:47 |
openstackgerrit | A change was merged to openstack-infra/nodepool: Make jenkins get info task synchronous https://review.openstack.org/74545 | 01:47 |
fungi | which is the main reason for the current pile-up | 01:47 |
openstackgerrit | A change was merged to openstack-infra/nodepool: Allow useage of server IDs as well as names. https://review.openstack.org/69424 | 01:47 |
jeblair | clarkb: so the nice thing is that they aren't reporting negative results | 01:48 |
*** melwitt1 has quit IRC | 01:48 | |
clarkb | as things settle any chance I can get another core review on https://review.openstack.org/#/c/72509/1 | 01:48 |
clarkb | jeblair: ya that would make the fun right now a bit more chaotic | 01:48 |
jeblair | clarkb: you don't want the random sleep? | 01:48 |
clarkb | jeblair: I don't think it is necessary as the jobs are being split apart by 20 minutes already | 01:49 |
jeblair | k. should be fine as long as other things aren't hitting it at those times | 01:49 |
*** yaguang has joined #openstack-infra | 01:50 | |
fungi | jenkins06 is getting nodepool nodes and running jobs now | 01:50 |
clarkb | jeblair: it should be an improvement over the current situation, which is 12 large query sets per hour instead of 3 | 01:50 |
clarkb | (I think 12) | 01:50 |
*** banix has quit IRC | 01:54 | |
fungi | clarkb: are we holding off upgrading the gearman plugin to 0.0.5 in production (i see it's only on -dev) | 01:55 |
fungi | jeblair: ^ | 01:55 |
fungi | i don't recall what the situation was with that | 01:55 |
clarkb | fungi: I don't think we need to, but it does require a zuul restart | 01:55 |
clarkb | I can do the logstash server during a less hectic time | 01:56 |
*** weshay has quit IRC | 01:57 | |
*** banix has joined #openstack-infra | 01:57 | |
*** dkehn has quit IRC | 01:57 | |
anteaya | yay, jobs are finished and staying finished | 01:57 |
*** dkehn has joined #openstack-infra | 01:57 | |
fungi | i've gone back through the jenkins masters and confirmed they're all on the correct versions of jenkins and important plugins | 01:58 |
fungi | nice and consistent for the first time in a while | 01:58 |
anteaya | well now we are on stable jenkins heading into ff | 01:59 |
fungi | we hope, anyway | 02:00 |
anteaya | we do | 02:00 |
*** talluri has joined #openstack-infra | 02:00 | |
fungi | it's not like we've had great luck with running a different jenkins release and not finding new issues | 02:00 |
fungi | and just like the one i downgraded from, this is also a version we haven't run before | 02:01 |
*** yamahata has quit IRC | 02:01 | |
anteaya | more fun coming up | 02:01 |
anteaya | no idea how it will manifest | 02:02 |
*** khyati_ has quit IRC | 02:02 | |
*** pcrews has quit IRC | 02:02 | |
anteaya | fungi: have you eaten lately? | 02:02 |
anteaya | the fire fighting has been going on 3+ hours | 02:03 |
mordred | wait- I thought fungi didn't get to eat until after FF | 02:03 |
fungi | heh | 02:03 |
anteaya | just using up the camel hump I guess | 02:03 |
*** mgagne has joined #openstack-infra | 02:06 | |
*** mgagne1 has quit IRC | 02:07 | |
*** mrodden has joined #openstack-infra | 02:07 | |
fungi | getting back to rerunning jenkins-jobs update on the jenkins masters. my bare-precise change touched a whole lot of jobs, so the puppet exec timeout was way too short to handle it | 02:07 |
*** ryanpetrello has joined #openstack-infra | 02:07 | |
*** hdd_ has joined #openstack-infra | 02:08 | |
fungi | i had already gotten through jenkins.o.o and 01-04, so now it's running on 05-07 | 02:08 |
jeblair | fungi: oh nice catch; that's probably responsible for some stuck jobs too | 02:08 |
jeblair | fungi: as i imagine that our static nodes may be marked offline at this point... | 02:08 |
*** mgagne1 has joined #openstack-infra | 02:08 | |
fungi | well, they were all on 01 and 02, which is why i got those out of the way early | 02:09 |
fungi | they were mostly done before we got into any of the real fun | 02:09 |
*** nati_ueno has joined #openstack-infra | 02:09 | |
*** mgagne has quit IRC | 02:10 | |
*** dstanek has joined #openstack-infra | 02:10 | |
*** talluri has quit IRC | 02:12 | |
*** nati_uen_ has quit IRC | 02:12 | |
*** talluri has joined #openstack-infra | 02:12 | |
fungi | i had already confirmed no new jobs were getting assigned to the static precise slaves, then offlined them and removed them from the masters a while later (i can add them back fairly easily if need be, since they're not deleted at the provider yet) | 02:13 |
jeblair | fungi: there are a lot of ready nodes, and most of them are attached to jenkins02... | 02:15 |
jeblair | fungi: do you think one of your scripts missed something, or could those be casualties of the nodepool downtime and restart? | 02:15 |
fungi | the latter. nodepool list counted zero when i deleted them initially with jenkins02 shutdown | 02:16 |
clarkb | jeblair: I want to say I have seen nodepool do wild swings like that while nodes are down | 02:16 |
clarkb | then it settles out again once everything is back up for a full iteration through the single use nodes | 02:16 |
fungi | it's possible it wasn't the nodepool restart, but that nodepool tried to add nodes to 02 too quickly and most didn't register in jenkins | 02:16 |
jeblair | clarkb: oh, i'm not thinking that jenkins02 is overloaded, i mean to say that they are not really ready nodes | 02:17 |
clarkb | oh that is different then | 02:17 |
jeblair | clarkb: the ones i have spot checked were already deleted from jenkins | 02:17 |
*** talluri has quit IRC | 02:17 | |
clarkb | jeblair: so they are in nodepool but not reflected in jenkins | 02:17 |
clarkb | gotcha | 02:17 |
morganfainberg | i am guessing some excitement for the day has abated since i see jobs making their way through check/gate | 02:17 |
fungi | there was an initial glut of 100+ nodes built for 02 and i only saw a fraction of them show up in the webui. i suspect jenkins never added them to its slave list | 02:18 |
lifeless | 'maybe' :P | 02:18 |
morganfainberg | lifeless :) | 02:18 |
anteaya | morganfainberg: aye | 02:18 |
anteaya | we hope they continue along the abatement route | 02:18 |
anteaya | abating? | 02:18 |
morganfainberg | well... if they are resolving...and there is a little bandwidth to do something that keystone-core will <3 you (ok ok, i still lie, I will <3 you guys) for https://review.openstack.org/#/c/74472/ - it'll keep us from monopolizing openstack-dev. (just eavesdrop bot stuff) | 02:20 |
* morganfainberg thinks if that can be rephrased in a creepy stalker-ish way... | 02:20 | |
morganfainberg | >.> | 02:20 |
morganfainberg | nah | 02:20 |
*** sarob has joined #openstack-infra | 02:20 | |
anteaya | morganfainberg: please don't do that | 02:21 |
jeblair | fungi: i think i'll delete ready nodes that are > 0.1 hours | 02:21 |
anteaya | I have such high regard for you | 02:21 |
fungi | jeblair: sounds safe | 02:21 |
morganfainberg | anteaya, hehe, i don't think i could actually think of a way to rephrase it. | 02:21 |
anteaya | good | 02:21 |
morganfainberg | anteaya, just doesn't come naturally to me. | 02:21 |
anteaya | glad to hear it | 02:21 |
anteaya | again, so happy to hear that | 02:21 |
morganfainberg | besides, i actually genuinely like -infra folks | 02:22 |
anteaya | well there's that too | 02:22 |
anteaya | so thanks | 02:22 |
anteaya | back at 'ya | 02:22 |
morganfainberg | anyway. just relaying keystone desires :) thanks in advance | 02:22 |
anteaya | already +1'd | 02:22 |
morganfainberg | anteaya, i know :) you're awesome. | 02:23 |
anteaya | jenkins is +1, 6 other +1 and a +2 on it | 02:23 |
jeblair | okay there's another 89 nodes i hope | 02:23 |
anteaya | morganfainberg: nah, I just review the easy patches | 02:23 |
morganfainberg | anteaya, yep. it's why i hopped over. | 02:23 |
morganfainberg | lol | 02:23 |
morganfainberg | anteaya, one of these days i'm going to have time to be really more involved with infra stuff | 02:24 |
morganfainberg | anteaya, one of these days... | 02:24 |
dolphm | fallacy & | 02:24 |
anteaya | morganfainberg: one of these days | 02:24 |
jeblair | morganfainberg: one of these days i hope i'll have time too. :) | 02:24 |
anteaya | doesn't sound like today | 02:24 |
morganfainberg | jeblair, lol :) | 02:24 |
morganfainberg | anteaya, nah, dolphm will just find more stuff to be done. | 02:24 |
anteaya | he is like that, dolphm is | 02:25 |
* anteaya gestures pushing that patch out of the gate | 02:26 | |
*** UtahDave has quit IRC | 02:26 | |
jeblair | oh, the top check change is running its missing job! | 02:28 |
clarkb | and the downtick on the sparklines continues | 02:29 |
anteaya | yay for both | 02:29 |
anteaya | yay that job finished success | 02:31 |
*** ryanpetrello has quit IRC | 02:31 | |
anteaya | out out out | 02:31 |
anteaya | look at the gate shrink | 02:32 |
anteaya | 6 | 02:32 |
anteaya | 12 in post | 02:35 |
*** gokrokve has joined #openstack-infra | 02:35 | |
fungi | nibalizer: what size vm does this puppetdb need to start out? 2gb ram? | 02:36 |
openstackgerrit | A change was merged to openstack-infra/config: Run fewer es queries with elastic_recheck. https://review.openstack.org/72509 | 02:39 |
clarkb | fungi: http://docs.puppetlabs.com/puppetdb/latest/scaling_recommendations.html | 02:41 |
clarkb | fungi: basically we have two major processes, puppetdb itself (which is java and needs heap) and postgresql | 02:41 |
jeblair | clarkb: puppetdb is java? not ruby? | 02:42 |
clarkb | fungi: it doesn't look like we need a very large puppetdb java process because we are using postgresql. Which leaves us with accommodating postgresql | 02:42 |
clarkb | jeblair: it's jvm, it might be jruby or similar | 02:42 |
*** jergerber has quit IRC | 02:42 | |
*** dcramer__ has quit IRC | 02:42 | |
jeblair | clarkb: ah. | 02:43 |
clarkb | looks like clojure | 02:43 |
clarkb | https://github.com/puppetlabs/puppetdb/tree/master/src/com/puppetlabs | 02:43 |
fungi | the language schizophrenia of the puppet ecosystem amuses me | 02:43 |
clarkb | I have a hunch 2GB is plenty | 02:44 |
clarkb | but nibalizer should know more | 02:44 |
openstackgerrit | Davanum Srinivas (dims) proposed a change to openstack/requirements: Sync requirements to oslo.vmware https://review.openstack.org/74569 | 02:45 |
clarkb | oslo.vmware | 02:45 |
clarkb | I think that is my queue for dinner | 02:45 |
clarkb | cue? | 02:46 |
clarkb | silly english | 02:46 |
fungi | oh, fair warning i'll be disappearing around 21:00 utc tomorrow for our monthly local osug | 02:46 |
dims | i will probably need help fixing a bad requirements.txt in oslo.vmware :) | 02:46 |
fungi | dims: you're going to need help fixing vmware? i think that's out of my league, sorry ;) | 02:47 |
clarkb | dims: why do we need vendor specific oslo libs? | 02:47 |
fungi | sounds like the different virtual resources in vmwareland need some common interaction from more than one component of openstack | 02:48 |
dims | fungi, just need to add a \n in the requirements.txt | 02:50 |
dims | clarkb, fungi - yea, same code in cinder, nova etc | 02:50 |
clarkb | dims: why wouldn't that live in vmware land? | 02:50 |
clarkb | (just trying to sort out why this lives in openstack and not the vendor space) | 02:50 |
fungi | vmware python sdk | 02:51 |
dims | clarkb, the code is pretty specific to openstack and not usable outside of openstack | 02:51 |
anteaya | is it opensource? | 02:51 |
jerryz | hi guys, i have a problem with nodepool used by gerrit-triggered jenkins. when a patch is updated, the on-going job will be aborted but the slave hasn't been deleted yet and the new patch is tested on the used slave. later on that slave will be deleted by nodepool and the job fails. | 02:51 |
*** sarob_ has joined #openstack-infra | 02:52 | |
clarkb | anteaya: yes | 02:52 |
clarkb | jerryz: I don't think you can mix the two | 02:52 |
dims | anteaya, it's existing code in nova/cinder that's getting moved out so all projects can use the same code base | 02:52 |
clarkb | jerryz: you need to use the offline slave functionality in gearman plugin with nodepool | 02:52 |
anteaya | yeah, I'm with clarkb not sure why we have to maintain vendor code, regardless of how specific it is to openstack | 02:52 |
clarkb | anteaya: I think I understand now | 02:53 |
clarkb | it is openstack specific bits for interacting with vmware | 02:53 |
clarkb | and if it needs to go in multiple projects oslo is the place for it | 02:53 |
dims | anteaya, clarkb, fungi - https://blueprints.launchpad.net/oslo/+spec/vmware-api | 02:53 |
clarkb | it just feels wrong | 02:53 |
anteaya | it does | 02:53 |
anteaya | I have vendor prickly radar | 02:53 |
jerryz | clarkb: i will have a try | 02:54 |
jerryz | thanks | 02:54 |
clarkb | jerryz: or if the gerrit plugin can offline nodes when jobs are started that will work too | 02:54 |
*** sarob has quit IRC | 02:55 | |
*** nati_uen_ has joined #openstack-infra | 02:57 | |
jerryz | clarkb: is the gearman plugin with offline slave function a snapshot version? | 02:58 |
clarkb | jerryz: I think you may need a snapshot for the latest bug fixes | 02:58 |
clarkb | jerryz: http://tarballs.openstack.org/ci/gearman-plugin/ | 02:59 |
clarkb | no 0.0.6 looks new enough | 02:59 |
*** yamahata has joined #openstack-infra | 02:59 | |
jerryz | clarkb: thanks | 02:59 |
*** nati_ueno has quit IRC | 03:00 | |
*** julim has quit IRC | 03:02 | |
*** simonmcc has quit IRC | 03:08 | |
*** gokrokve has quit IRC | 03:08 | |
*** gokrokve has joined #openstack-infra | 03:09 | |
*** simonmcc has joined #openstack-infra | 03:10 | |
*** gokrokve has quit IRC | 03:13 | |
*** gokrokve has joined #openstack-infra | 03:15 | |
*** sarob_ has quit IRC | 03:24 | |
*** CaptTofu has quit IRC | 03:26 | |
openstackgerrit | Cyril Roelandt proposed a change to openstack-infra/pypi-mirror: Do not download wheels when running "pip install" https://review.openstack.org/74579 | 03:28 |
*** matsuhas_ has quit IRC | 03:29 | |
mordred | hrm | 03:30 |
mordred | that's pretty much the opposite direction we'd like that to go | 03:30 |
mordred | clarkb: ^^ unless there is a direction or issue we're seeing I don't know about? | 03:31 |
clarkb | ? | 03:34 |
clarkb | 74579? | 03:35 |
mordred | yeah | 03:37 |
morganfainberg | anteaya, can you explain something to me... | 03:39 |
morganfainberg | anteaya, why are recruiters obnoxious? :P | 03:40 |
morganfainberg | ok ok enough of that | 03:40 |
mordred | morganfainberg: because of the reasons | 03:40 |
mordred | morganfainberg: also, because they need a bunch of contract java programmers in new jersey apparently | 03:40 |
dstufft | mordred: ideally you'd download both sdist and Wheel | 03:41 |
dstufft | but pip isn't really designed for mirroring :[ | 03:41 |
morganfainberg | mordred, esp when they look at a resume and think "Oh open source developer, he'd like to work on proprietary java internal close source insanity" | 03:41 |
fungi | wow... a java programming gig in new jersey? can't say i'm sure which part is worse | 03:41 |
dstufft | fungi: I was just thinking that | 03:42 |
mordred | morganfainberg: ++ | 03:42 |
dstufft | there's literally nothing about that which sounds appealing | 03:42 |
lifeless | mordred: btw, if you have 70G available, bandersnatch++ | 03:42 |
mordred | I don't know what a bandersnatch is | 03:42 |
lifeless | mordred: its the official pypi mirror tool | 03:42 |
dstufft | it does a full mirror of PyPI | 03:42 |
lifeless | s/the/a/ | 03:42 |
dstufft | I think there was a reason why openstack didn't want that | 03:42 |
dstufft | because i'm pretty sure i suggested that before | 03:42 |
clarkb | which isn't what we want but is slowly getting there | 03:42 |
lifeless | mordred: efficiently | 03:42 |
anteaya | mordred: they don't know any better | 03:43 |
morganfainberg | fungi, or worse, java + "git architect". wait... what is a git architect job really? and why does that need to be a full time job. I could help do that and have more fun/wider range of things just by working with -infra | 03:43 |
clarkb | review.o.o isn't working on my phone again | 03:43 |
clarkb | :/ | 03:43 |
lifeless | clarkb: get a new phone? :) | 03:43 |
morganfainberg | lifeless, bandersnatch is cool. i've been looking at that for some internal stuff | 03:43 |
clarkb | its a chrome js + caching problem I think | 03:43 |
morganfainberg | it's the 70G that is a challenge for me to sell, but i like the project | 03:43 |
mordred | lifeless, dstufft I BELIEVE last time we looked at it it didn't work yet | 03:43 |
mordred | 70G is a piece of cake | 03:44 |
clarkb | mordred its a mirror | 03:44 |
lifeless | morganfainberg: 70G - mem. | 03:44 |
lifeless | morganfainberg: meh I mean. :) | 03:44 |
clarkb | which we dont want | 03:44 |
dstufft | oh | 03:44 |
mordred | lifeless: does it follow external links? | 03:44 |
anteaya | my sister's niece is a recruiter | 03:44 |
dstufft | no it doesn't | 03:44 |
clarkb | because external links | 03:44 |
morganfainberg | oh oh. i meant to ask...is there some strange chrome issue w/ gerrit? | 03:44 |
lifeless | mordred: no | 03:44 |
mordred | clarkb: you used to argue the opposed | 03:44 |
mordred | opposite | 03:44 |
mordred | lifeless: yea - that's why | 03:44 |
anteaya | pretty and knows zip and doesn't want to know | 03:44 |
mordred | we want it to suck down external links | 03:44 |
dstufft | kill all your external links imo | 03:44 |
dstufft | :D | 03:44 |
morganfainberg | my chrome (desktop) browser jumps around when i click. | 03:44 |
mordred | because those are what kill us | 03:44 |
morganfainberg | lifeless, pshaw, ram is free right? | 03:44 |
anteaya | one ugly christmas dinner was enough for me | 03:44 |
lifeless | mordred: we want them to die :) | 03:44 |
mordred | we do | 03:44 |
lifeless | morganfainberg: it doesn't use much ram | 03:44 |
mordred | but they are not yet dead | 03:44 |
clarkb | mordred well a mirror that pulls external links is what I want :) | 03:44 |
mordred | clarkb: yes | 03:45 |
mordred | if it pulled external links, I'd get rid of pypi-mirror and just use it | 03:45 |
morganfainberg | lifeless, also network is free, right? | 03:45 |
dstufft | it's conceivable that bandersnatch would grow the option to pull in external links | 03:45 |
morganfainberg | lifeless, >.> | 03:45 |
lifeless | morganfainberg: I may have exceeded my quota this week :) | 03:45 |
morganfainberg | lifeless, hehehe | 03:45 |
dstufft | although it wouldn't match the output of pip install exactly | 03:45 |
dstufft | because people can update the external links | 03:45 |
lifeless | morganfainberg: since I setup an Ubuntu mirror (100Gish) + bandersnatch(70G) | 03:45 |
*** nati_ueno has joined #openstack-infra | 03:45 | |
morganfainberg | lifeless, oh dear! | 03:45 |
dstufft | without updating the pypi listing at all | 03:45 |
dstufft | and then bandersnatch won't know to download :[[ | 03:45 |
lifeless | morganfainberg: I have a 500G quota | 03:46 |
lifeless | morganfainberg: so I'm probably ok. | 03:46 |
morganfainberg | lifeless, that would blow out my bandwidth cap. (I only get 250) | 03:46 |
* clarkb is unlimited \o/ | 03:46 | |
*** nati_ueno has quit IRC | 03:46 | |
morganfainberg | clarkb, i could use my cellphone i have "unlimited data"* | 03:46 |
anteaya | I too am unlimited | 03:46 |
morganfainberg | anteaya, :( /jealous | 03:47 |
anteaya | if you want | 03:47 |
anteaya | Canadian telco monopoly is pretty bad | 03:47 |
anteaya | I'm jealous of the dude in SF with 271mb up and down | 03:48 |
anteaya | don't know what he has for data cap | 03:48 |
*** nati_uen_ has quit IRC | 03:48 | |
dstufft | unlimited ftw | 03:48 |
clarkb | dstufft right which is why we pip | 03:48 |
anteaya | okay so that was the last patch we were waiting for | 03:49 |
anteaya | lifeless: when markmcclain shows up again, he can release a neutronclient, the last patch he was waiting on (yours) has merged: https://review.openstack.org/#/c/69110/ | 03:50 |
anteaya | he said he would check in after he had dinner | 03:50 |
anteaya | and I am off to bed | 03:50 |
anteaya | nighty-night | 03:51 |
clarkb | mordred: so we don't use latest pip | 03:51 |
lifeless | anteaya: thanks for the update | 03:51 |
anteaya | np | 03:51 |
mordred | clarkb: no? | 03:51 |
clarkb | morganfainberg: we use pip 1.4.X because 1.5 is broken | 03:51 |
clarkb | mordred: ^ | 03:51 |
mordred | clarkb: I thought 1.5.1 was out | 03:51 |
clarkb | as is latest virtualenv and tox | 03:51 |
mordred | which fixed it | 03:51 |
mordred | we just haven't unpinned yet | 03:51 |
mordred | because of FF | 03:51 |
clarkb | mordred: it might, if it did we didn't unpin | 03:51 |
clarkb | mordred: that said | 03:51 |
morganfainberg | clarkb, i know i know i should use a different name in irc, cause i make you type 4 characters instead of 3 | 03:51 |
clarkb | mordred: we might want to set no wheels there because we do two passes right? | 03:51 |
*** sarob has joined #openstack-infra | 03:51 | |
mordred | well... yeah. I could see that | 03:52 |
clarkb | mordred: I think what cyril is saying is that the mirror builder will only find wheels if they are available | 03:52 |
mordred | we do the tarball pass, and then we build wheels from the tarballs | 03:52 |
clarkb | mordred: so the tarball pass needs to not get any wheels then the wheel pass needs to get all wheels | 03:52 |
mordred | so - ok. I can buy that | 03:52 |
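The two-pass pypi-mirror build mordred describes amounts to: first fetch sdists only, then build wheels out of those tarballs, so the mirror ends up serving both formats. A hedged sketch of the two passes with the pip of that era; `--no-use-wheel` is the 1.4/1.5-era spelling (newer pips use `--no-binary :all:`), and the paths are placeholders:

```python
import subprocess

DOWNLOADS = "/srv/mirror/downloads"
WHEELHOUSE = "/srv/mirror/wheelhouse"

# Pass 1: grab sdists only, so the mirror always has tarballs that work
# for consumers without wheel support.
subprocess.check_call([
    "pip", "install", "--no-use-wheel",
    "--download", DOWNLOADS,
    "-r", "requirements.txt",
])

# Pass 2: build wheels from those downloaded tarballs (not from PyPI)
# so consumers that do enable wheels get pre-built packages.
subprocess.check_call([
    "pip", "wheel",
    "--wheel-dir", WHEELHOUSE,
    "--no-index", "--find-links", DOWNLOADS,
    "-r", "requirements.txt",
])
```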
*** rcleere has joined #openstack-infra | 03:53 | |
clarkb | also pinging holger tomorrow is on my list of things to do | 03:53 |
clarkb | I would really like working tox | 03:53 |
mordred | yeah | 03:53 |
mordred | that would be awesome | 03:53 |
* mordred is so happy that dstufft hangs out in here | 03:53 | |
mordred | it's made pip so much better for us | 03:53 |
mordred | also, he's cool and all | 03:54 |
clarkb | I am going to hop over to #tox tomorrow when I feel patient :) | 03:54 |
dstufft | ;) | 03:54 |
dstufft | um | 03:54 |
dstufft | i'm going to be releasing a 1.5.3 either tonight or tomorrow | 03:55 |
dstufft | I don't know how hard it would be to unpin and try things out | 03:55 |
mordred | well, right now we've got tox pinned | 03:55 |
dstufft | ok | 03:55 |
mordred | beacuse of $other_bug | 03:55 |
mordred | so realistically we won't try for another 2 weeks at best, because of our feature freeze cycle of death | 03:55 |
dstufft | just saying, if it's easy to unpin then it would be a good time to sneak something into 1.5.3 | 03:56 |
dstufft | if something else came up | 03:56 |
dstufft | :] | 03:56 |
mordred | nod | 03:56 |
mordred | I trust that pip is perfect at this point | 03:56 |
dstufft | ok :) | 03:58 |
Steap_ | clarkb: yep, that was my idea (I'm Cyril), looking at the code, I think we'll get the wheels anyway | 03:58 |
Steap_ | so I wanted to keep the tarballs since we currently cannot do without them | 03:58 |
clarkb | yup | 03:59 |
*** gyee has quit IRC | 04:00 | |
mordred | Steap_: yup. I grok now | 04:04 |
mordred | and thanks- I believe you're quite right | 04:04 |
Steap_ | mordred: honestly, I don't really get everything :) | 04:06 |
Steap_ | I've just learnt that there was yet another way of installing Python packages | 04:07 |
mordred | :) | 04:07 |
mordred | wheels are awesome | 04:07 |
mordred | we need to get on them | 04:07 |
Steap_ | and it prevents some packages from being updated :/ | 04:07 |
mordred | but we're not there yet | 04:07 |
Steap_ | mordred: yeah, probably | 04:07 |
Steap_ | but I've known easy_install, pip, setuptools, distutils... Yesterday I had to install a Ruby package, had to learn about gems... That's sort of a pain in the end :) | 04:08 |
Steap_ | I miss ./configure && make && make install :) | 04:08 |
clarkb | mordred I am 95% sure we can turn on wheels now | 04:08 |
Steap_ | mordred: do you have a link explaining how wheels are awesome ? | 04:08 |
clarkb | i did a bunch of local testing against the mirror and it seemed to work | 04:08 |
mordred | Steap_: one reason - they don't run python setup.py to install | 04:08 |
Steap_ | clarkb: what about the old pip used in the gates ? | 04:08 |
mordred | Steap_: they are pre-built/binary | 04:09 |
Steap_ | mordred: ok | 04:09 |
mordred | Steap_: so you don't need dev libs or c compilers or anything | 04:09 |
mordred | I mean, you still need the c libs | 04:09 |
clarkb | Steap_ not a problem, I used the same version of pip | 04:09 |
mordred | but MUCH MUCH more efficient | 04:09 |
Steap_ | clarkb: well, how do you explain the failures in the gates, then ? | 04:09 |
mordred | Steap_: I miss configure ; make ; make install too | 04:09 |
Steap_ | mordred: things were simpler :) | 04:09 |
clarkb | Steap_ which failures? | 04:09 |
mordred | Steap_: on my last project, I was the automake/autoconf person | 04:09 |
Steap_ | mordred: the main issue is that you need to learn a different way of installing packages for every language you might have to use | 04:10 |
mordred | we could move to autoconf for our python stuff ... | 04:10 |
mordred | Steap_: yeah | 04:10 |
Steap_ | and it changes every 5 years | 04:10 |
Steap_ | so in the end, as a user, it's a pain | 04:10 |
Steap_ | if it's not packaged in the distrib, it can keep me busy for a long time | 04:10 |
clarkb | I think the failures we saw last time wheels were enabled were due to having wheels but telling pip to not use them | 04:11 |
Steap_ | clarkb: well, when only wheels are available for a given package, the gates fail to install it | 04:11 |
clarkb | need to go the other way around | 04:11 |
Steap_ | clarkb: oh | 04:11 |
Steap_ | sure | 04:11 |
clarkb | Steap_ we don't use wheels today | 04:11 |
clarkb | so that fails | 04:11 |
clarkb | wheels are not enabled in tests now so a package that is only wheeled won't install | 04:12 |
Steap_ | clarkb: ok | 04:12 |
Steap_ | yes, that's what happens with six | 04:12 |
*** david-lyle has joined #openstack-infra | 04:13 | |
Steap_ | so, maybe we should discard my patch and just enable wheels in tests | 04:13 |
Steap_ | shouldn't we ? | 04:13 |
clarkb | no your patch is good | 04:13 |
clarkb | then we enable wheels again | 04:13 |
Steap_ | why would we still need the tarballs ? :) | 04:13 |
Steap_ | if wheels are awesome | 04:13 |
clarkb | because not everyone will have wheels enabled | 04:13 |
Steap_ | ok | 04:13 |
clarkb | tarballs work everywhere | 04:13 |
Steap_ | yes | 04:14 |
Steap_ | that's the good thing about them :) | 04:14 |
clarkb | wheels are system specific | 04:14 |
dolphm | can anyone take a glance at this very short log and tell me it's not normal? http://logs.openstack.org/37/69137/5/check/check-tempest-dsvm-postgres-full/3cf2f41/logs/devstack-gate-setup-host.txt | 04:14 |
dolphm | "useradd: user 'stack' already exists" etc | 04:15 |
fungi | dolphm: sounds like a host got reused during a test. likely a casualty of jenkins 1.551 (which we just downgraded away from a few hours ago) | 04:19 |
dolphm | fungi: i just filed a bug report for it -- is it a dupe of something? | 04:19 |
clarkb | fungi timestamps are newer. does d-g create the user before devstack | 04:19 |
dolphm | https://bugs.launchpad.net/devstack/+bug/1281902 | 04:19 |
uvirtbot | Launchpad bug 1281902 in openstack-ci "/opt/stack/new/devstack/functions-common:1128 Keystone fail to get token" [Undecided,New] | 04:19 |
fungi | oh, i was going off the "user 'stack' already exists" | 04:20 |
*** markmcclain has joined #openstack-infra | 04:20 | |
fungi | i believe it does, because it needs to chown some things | 04:20 |
lifeless | Steap_: you need the tarballs to create wheels for platforms | 04:21 |
dolphm | fungi: host re-use makes sense, given the other failures with git | 04:21 |
clarkb | still possible reuse happened | 04:21 |
*** markmcclain has quit IRC | 04:22 | |
*** changbl has joined #openstack-infra | 04:23 | |
*** lcheng has joined #openstack-infra | 04:23 | |
*** sarob has quit IRC | 04:23 | |
clarkb | fungi swap didn't need fixing I think you are right node was reused | 04:24 |
*** coolsvap has joined #openstack-infra | 04:25 | |
*** gokrokve has quit IRC | 04:27 | |
*** gokrokve has joined #openstack-infra | 04:28 | |
*** jeckersb is now known as jeckersb_gone | 04:29 | |
*** dkliban has joined #openstack-infra | 04:32 | |
*** gokrokve has quit IRC | 04:32 | |
*** ryanpetrello has joined #openstack-infra | 04:35 | |
*** matsuhashi has joined #openstack-infra | 04:35 | |
*** tian has quit IRC | 04:50 | |
*** masayukig has joined #openstack-infra | 04:51 | |
*** sarob has joined #openstack-infra | 04:51 | |
*** yamahata_ has quit IRC | 04:51 | |
*** tian has joined #openstack-infra | 04:52 | |
*** yamahata_ has joined #openstack-infra | 04:56 | |
*** jaypipes has quit IRC | 05:00 | |
zaro | clarkb, fungi : was there a bug for jenkins zmq job? | 05:04 |
*** rcleere has quit IRC | 05:05 | |
*** lcheng has quit IRC | 05:09 | |
*** khyati has joined #openstack-infra | 05:11 | |
clarkb | zaro I thought so but I may have misread | 05:12 |
*** vogxn has joined #openstack-infra | 05:15 | |
*** miqui has quit IRC | 05:15 | |
zaro | clarkb: i don't think so, the only bug i marked fixed today was 1276180 | 05:17 |
zaro | bug 1276180 | 05:17 |
uvirtbot | Launchpad bug 1276180 in openstack-ci "Gerrit hook scripts failing with IndexError exceptions" [High,Fix committed] https://launchpad.net/bugs/1276180 | 05:17 |
clarkb | I must've misparsed mail on my phone then | 05:18 |
zaro | clarkb: you have time to take a quick look at change https://review.openstack.org/#/c/60348 ? | 05:19 |
zaro | real quick look, i promise. | 05:20 |
*** markmcclain has joined #openstack-infra | 05:23 | |
*** nicedice has quit IRC | 05:23 | |
*** lcheng has joined #openstack-infra | 05:24 | |
*** sarob has quit IRC | 05:24 | |
clarkb | but house of cards | 05:25 |
*** chandan_kumar has joined #openstack-infra | 05:26 | |
*** nati_ueno has joined #openstack-infra | 05:27 | |
*** markmcclain has quit IRC | 05:27 | |
*** CaptTofu has joined #openstack-infra | 05:27 | |
*** nati_ueno has quit IRC | 05:28 | |
*** nati_ueno has joined #openstack-infra | 05:29 | |
*** mfisch has quit IRC | 05:31 | |
*** CaptTofu has quit IRC | 05:32 | |
zaro | clarkb: huh? what does that mean? | 05:34 |
clarkb | its a show on netflix. quite good. | 05:35 |
*** mfisch has joined #openstack-infra | 05:35 | |
*** mfisch has joined #openstack-infra | 05:35 | |
openstackgerrit | Shawn Hartsock proposed a change to openstack/requirements: add pyvmomi library https://review.openstack.org/69964 | 05:36 |
*** nati_uen_ has joined #openstack-infra | 05:37 | |
*** DinaBelova_ is now known as DinaBelova | 05:37 | |
*** amotoki has quit IRC | 05:39 | |
*** nati_ueno has quit IRC | 05:41 | |
*** dstanek has quit IRC | 05:48 | |
*** dstanek has joined #openstack-infra | 05:49 | |
*** sarob has joined #openstack-infra | 05:51 | |
*** wenlock has joined #openstack-infra | 05:57 | |
openstackgerrit | A change was merged to openstack/requirements: Update hp3parclient low version number https://review.openstack.org/73727 | 05:58 |
nibalizer | clarkb: fungi sure 2GB sounds fine to start with | 05:59 |
nibalizer | i'd also want a chunk of disk too, at least 20GB for it to write stuff down in | 05:59 |
*** coolsvap1 has joined #openstack-infra | 05:59 | |
nibalizer | (/var) | 05:59 |
*** gokrokve has joined #openstack-infra | 06:01 | |
*** e0ne has joined #openstack-infra | 06:02 | |
*** coolsvap has quit IRC | 06:03 | |
*** gokrokve has quit IRC | 06:06 | |
*** vkozhukalov has joined #openstack-infra | 06:10 | |
*** hdd_ has quit IRC | 06:14 | |
fungi | nibalizer: yeah, it has at least that much, but we can attach volumes too | 06:17 |
*** cadenzajon has joined #openstack-infra | 06:20 | |
fungi | anyway, it puppeted fine and is up and in dns now | 06:20 |
*** markmcclain has joined #openstack-infra | 06:23 | |
*** amotoki has joined #openstack-infra | 06:23 | |
*** gokrokve has joined #openstack-infra | 06:24 | |
*** sarob has quit IRC | 06:24 | |
fungi | deleting 70 nodes that have been in their current state for over 3 hours | 06:25 |
*** amotoki has quit IRC | 06:25 | |
*** yolanda has joined #openstack-infra | 06:25 | |
*** amotoki has joined #openstack-infra | 06:26 | |
*** cadenzajon has quit IRC | 06:26 | |
*** banix has quit IRC | 06:26 | |
*** markmcclain has quit IRC | 06:28 | |
*** e0ne has quit IRC | 06:28 | |
*** markmcclain has joined #openstack-infra | 06:30 | |
nibalizer | fungi: sweeeet! | 06:30 |
jeblair | fungi: the hpcloud nodepool providers haven't done anything since around 5:30 | 06:33 |
*** ryanpetrello has quit IRC | 06:33 | |
fungi | oh/ | 06:34 |
fungi | ? | 06:34 |
jeblair | fungi: i think they're waiting for network data | 06:34 |
*** markmcclain has quit IRC | 06:34 | |
*** matsuhashi has quit IRC | 06:35 | |
jeblair | fungi: yes, they are all sitting in return self._sslobj.read(len) | 06:36 |
jeblair | (inside of ssl, called from urllib | 06:36 |
jeblair | fungi: suggest we just restart nodepool | 06:36 |
clarkb | ++ | 06:37 |
fungi | wfm | 06:37 |
*** matsuhashi has joined #openstack-infra | 06:37 | |
jeblair | done | 06:37 |
jeblair | i'll delete nodes that were 'building' and 'delete' while it was stopped | 06:38 |
*** lcheng has quit IRC | 06:39 | |
jeblair | okay, that's started; 560 nodes | 06:39 |
jeblair | 2 of my scripts that were deleting keypairs similarly stopped | 06:39 |
jeblair | i restarted them | 06:39 |
fungi | thanks... i'm operating at a reduced capacity at this point and may just grab a nap in preparation for whatever fresh challenges await us tomorrow | 06:39 |
fungi | wondering if hp had network maintenance or something | 06:40 |
jeblair | fungi: good question. | 06:40 |
openstackgerrit | Spencer Krum proposed a change to openstack-infra/config: Enable puppetdb from puppetmaster https://review.openstack.org/74612 | 06:40 |
jeblair | i also wonder if there's a way we can protect against that; basically that was a novaclient call that just never returned. | 06:40 |
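One blanket protection against a call that never returns is a process-wide default socket timeout, so a silent SSL read eventually raises instead of wedging the provider thread. A minimal sketch, assuming it is acceptable for every socket in the process to inherit the timeout (that blanket behaviour, the placeholder URL and the 300-second value are assumptions here):

    import socket
    import urllib2  # Python 2, matching the _sslobj.read() hang noted above

    # Any socket created after this call gives up after 300s of silence,
    # raising socket.timeout instead of blocking in the SSL read forever.
    socket.setdefaulttimeout(300)

    try:
        data = urllib2.urlopen('https://example.com/').read()  # placeholder URL
    except socket.timeout:
        data = None  # let the surrounding task machinery retry or fail cleanly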
nibalizer | fungi: there is your follow up ^^ | 06:41 |
fungi | nibalizer: thanks! | 06:41 |
nibalizer | fungi: can you check /var/log/puppetdb/puppetdb.log for any errors or warnings? | 06:42 |
*** gokrokve_ has joined #openstack-infra | 06:45 | |
*** gokrokve has quit IRC | 06:48 | |
*** khyati has quit IRC | 06:49 | |
*** gokrokve_ has quit IRC | 06:49 | |
fungi | nibalizer: info lines filtered out for brevity... http://paste.openstack.org/show/67166 | 06:51 |
*** sarob has joined #openstack-infra | 06:51 | |
*** lcheng has joined #openstack-infra | 06:52 | |
*** lttrl has joined #openstack-infra | 06:53 | |
fungi | looks like it wants ~150g in /var/lib/puppetdb | 06:53 |
nibalizer | fungi: okay i was more or less expecting that message | 06:54 |
nibalizer | im not sure what the economics of adding more storage are | 06:54 |
fungi | so we'll either need to tune it down or add a volume there | 06:54 |
fungi | free (for us) | 06:54 |
nibalizer | okay well if its not expensive lets just add disk | 06:55 |
fungi | but it will need to wait for tomorrow unless someone else wants to take over. i'm officially out of steam (2am here) | 06:56 |
nibalizer | thats fine with me | 06:56 |
clarkb | fungi: sleep | 06:56 |
nibalizer | im just trying to make sure you all aren't blocked on me | 06:56 |
fungi | nibalizer: not at all--thanks for the help! | 06:57 |
* fungi is just blocked on not enough hours in the day | 06:57 | |
nibalizer | for context, my company charges something like $8/gig/month for storage between teams, so i wondered if 150 gig would be a problem | 06:57 |
fungi | we have very generous sponsors | 06:58 |
lifeless | 8/GBM - wow | 06:58 |
*** e0ne has joined #openstack-infra | 06:59 | |
lifeless | thats some fancy pants storage at that rate | 06:59 |
*** jhesketh has quit IRC | 06:59 | |
*** jhesketh__ has quit IRC | 06:59 | |
fungi | engraved on golden platters | 07:00 |
*** e0ne has quit IRC | 07:00 | |
*** e0ne has joined #openstack-infra | 07:01 | |
*** thomasbiege has joined #openstack-infra | 07:01 | |
*** lcheng has quit IRC | 07:04 | |
*** rlandy has joined #openstack-infra | 07:04 | |
*** morganfainberg is now known as morganfainberg_Z | 07:05 | |
*** e0ne has quit IRC | 07:06 | |
*** mgagne has joined #openstack-infra | 07:07 | |
nibalizer | yea... if my company would openstack... that would be great | 07:10 |
nibalizer | (on commodity hardware) | 07:10 |
*** mgagne1 has quit IRC | 07:10 | |
lifeless | nibalizer: so what does it want 150GB for | 07:10 |
lifeless | thats like 2 copies of PyPI | 07:10 |
clarkb | logs | 07:12 |
nibalizer | it doesn't need that much space | 07:12 |
nibalizer | in my experience | 07:12 |
clarkb | for every puppet run done every 10 minutes on every server. now with single use slaves its probably not terrible | 07:12 |
nibalizer | are the single use slaves the ones that run puppet apply? | 07:13 |
nibalizer | because those will probably never hit puppetdb | 07:13 |
nibalizer | at my work we have 300+ nodes and 20K resources on a 17GB disk with plenty of space to go, given a 2 week retention policy | 07:14 |
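A rough back-of-envelope for how run frequency and retention drive the disk estimate, using the figures from this discussion (the per-report size is an assumed illustrative number, not a measurement):

    # Rough sizing sketch: reports accumulated under a retention window.
    nodes = 300                   # agents checking in (example figure above)
    runs_per_day = 24 * 60 / 10   # one puppet run every 10 minutes
    retention_days = 14           # two-week retention policy
    report_kb = 25                # assumed average report size, purely illustrative

    reports = nodes * runs_per_day * retention_days
    print("%d reports, roughly %.1f GB"
          % (reports, reports * report_kb / 1024.0 / 1024.0))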
*** mrda is now known as mrda_away | 07:14 | |
clarkb | right so no impact from them | 07:15 |
nibalizer | how many nodes are there checking in? | 07:15 |
*** basha has joined #openstack-infra | 07:16 | |
*** jcooley_ has quit IRC | 07:17 | |
*** jcooley_ has joined #openstack-infra | 07:17 | |
*** chandan_kumar has quit IRC | 07:18 | |
*** dstanek has quit IRC | 07:18 | |
*** basha has quit IRC | 07:20 | |
*** denis_makogon has joined #openstack-infra | 07:21 | |
*** bhuvan has joined #openstack-infra | 07:21 | |
*** jcooley_ has quit IRC | 07:22 | |
*** e0ne has joined #openstack-infra | 07:24 | |
*** sarob has quit IRC | 07:25 | |
*** chandan_kumar has joined #openstack-infra | 07:25 | |
*** markwash has quit IRC | 07:26 | |
*** vogxn has quit IRC | 07:27 | |
*** e0ne has quit IRC | 07:28 | |
*** CaptTofu has joined #openstack-infra | 07:29 | |
*** dstanek has joined #openstack-infra | 07:29 | |
*** saju_m has joined #openstack-infra | 07:30 | |
*** markmcclain has joined #openstack-infra | 07:31 | |
*** CaptTofu has quit IRC | 07:33 | |
*** markmcclain has quit IRC | 07:35 | |
*** vishy has quit IRC | 07:40 | |
*** cyeoh has quit IRC | 07:40 | |
*** DinaBelova is now known as DinaBelova_ | 07:40 | |
*** cyeoh has joined #openstack-infra | 07:41 | |
*** vishy has joined #openstack-infra | 07:43 | |
*** mrmartin has joined #openstack-infra | 07:47 | |
*** nati_uen_ has quit IRC | 07:49 | |
*** nati_ueno has joined #openstack-infra | 07:49 | |
*** sarob has joined #openstack-infra | 07:51 | |
*** e0ne has joined #openstack-infra | 07:56 | |
*** daniil has quit IRC | 07:56 | |
*** luqas has joined #openstack-infra | 07:58 | |
*** e0ne has quit IRC | 08:00 | |
*** skraynev_afk has quit IRC | 08:01 | |
*** sandywalsh has joined #openstack-infra | 08:03 | |
*** basha has joined #openstack-infra | 08:08 | |
*** denis_makogon has quit IRC | 08:16 | |
*** DinaBelova_ is now known as DinaBelova | 08:20 | |
*** saju_m has quit IRC | 08:21 | |
*** sarob has quit IRC | 08:24 | |
*** jgallard has joined #openstack-infra | 08:26 | |
*** thomasbiege has quit IRC | 08:26 | |
*** DinaBelova is now known as DinaBelova_ | 08:28 | |
*** DinaBelova_ is now known as DinaBelova | 08:28 | |
*** vogxn has joined #openstack-infra | 08:28 | |
*** markmcclain has joined #openstack-infra | 08:32 | |
*** vogxn has quit IRC | 08:33 | |
*** openstack has joined #openstack-infra | 08:42 | |
-dickson.freenode.net- [freenode-info] if you're at a conference and other people are having trouble connecting, please mention it to staff: http://freenode.net/faq.shtml#gettinghelp | 08:42 | |
*** asadoughi has joined #openstack-infra | 08:43 | |
*** nati_uen_ has joined #openstack-infra | 08:43 | |
*** chandan_kumar has joined #openstack-infra | 08:44 | |
*** jog0 has joined #openstack-infra | 08:45 | |
*** matrohon has joined #openstack-infra | 08:46 | |
*** nati_ueno has quit IRC | 08:47 | |
*** jroovers|afk has joined #openstack-infra | 08:49 | |
*** koolhead17 has joined #openstack-infra | 08:50 | |
*** basha has joined #openstack-infra | 08:50 | |
*** sarob has joined #openstack-infra | 08:51 | |
*** jcoufal has joined #openstack-infra | 08:52 | |
*** rossella-s has joined #openstack-infra | 08:53 | |
*** jpich has joined #openstack-infra | 08:54 | |
*** lttrl has quit IRC | 08:54 | |
*** markmcclain has joined #openstack-infra | 08:59 | |
*** nosnos_ has joined #openstack-infra | 08:59 | |
*** nosnos has quit IRC | 08:59 | |
*** markmcclain1 has joined #openstack-infra | 09:00 | |
*** markmcclain has quit IRC | 09:01 | |
*** afazekas has joined #openstack-infra | 09:01 | |
*** e0ne has joined #openstack-infra | 09:04 | |
*** markmcclain1 has quit IRC | 09:05 | |
*** coolsvap1 has quit IRC | 09:06 | |
*** dstanek has quit IRC | 09:09 | |
*** derekh has joined #openstack-infra | 09:13 | |
*** yassine has joined #openstack-infra | 09:14 | |
*** flaper87|afk is now known as flaper87 | 09:16 | |
*** hashar has joined #openstack-infra | 09:18 | |
*** hashar_ has joined #openstack-infra | 09:19 | |
*** hashar has quit IRC | 09:22 | |
*** chandan_kumar has quit IRC | 09:23 | |
*** sarob has quit IRC | 09:25 | |
*** NikitaKonovalov is now known as NikitaKonovalov_ | 09:29 | |
*** CaptTofu has joined #openstack-infra | 09:29 | |
*** CaptTofu has quit IRC | 09:34 | |
*** coolsvap has joined #openstack-infra | 09:34 | |
*** DinaBelova is now known as DinaBelova_ | 09:35 | |
*** yaguang has quit IRC | 09:36 | |
*** fbo_away is now known as fbo | 09:37 | |
*** dpyzhov has joined #openstack-infra | 09:37 | |
*** mrmartin has quit IRC | 09:39 | |
*** luqas has quit IRC | 09:40 | |
*** dpyzhov has joined #openstack-infra | 09:41 | |
*** chandan_kumar has joined #openstack-infra | 09:41 | |
*** jp_at_hp has joined #openstack-infra | 09:42 | |
*** chandankumar_ has joined #openstack-infra | 09:42 | |
*** chandan_kumar has quit IRC | 09:46 | |
*** matsuhashi has quit IRC | 09:46 | |
*** nosnos_ has quit IRC | 09:46 | |
*** nosnos has joined #openstack-infra | 09:47 | |
*** DinaBelova_ is now known as DinaBelova | 09:47 | |
*** matsuhashi has joined #openstack-infra | 09:47 | |
*** pblaho has joined #openstack-infra | 09:50 | |
*** gilliard has quit IRC | 09:51 | |
*** sarob has joined #openstack-infra | 09:51 | |
*** saju_m has joined #openstack-infra | 09:58 | |
*** jdurgin has quit IRC | 09:59 | |
*** noorul has joined #openstack-infra | 10:00 | |
noorul | http://logs.openstack.org/98/69498/31/check/gate-solum-docs/9de63ba/console.html#_2014-02-19_03_44_50_840 | 10:00 |
noorul | Any idea why that is happening? | 10:00 |
*** markmcclain has joined #openstack-infra | 10:01 | |
*** NikitaKonovalov_ is now known as NikitaKonovalov | 10:03 | |
*** julienvey has joined #openstack-infra | 10:04 | |
*** markmcclain has quit IRC | 10:06 | |
*** nati_uen_ has quit IRC | 10:07 | |
*** pblaho has quit IRC | 10:07 | |
*** pblaho has joined #openstack-infra | 10:08 | |
*** jdurgin has joined #openstack-infra | 10:12 | |
*** luqas has joined #openstack-infra | 10:20 | |
*** hashar_ is now known as hashar | 10:20 | |
*** sarob has quit IRC | 10:24 | |
*** masayukig has quit IRC | 10:25 | |
*** ociuhandu has joined #openstack-infra | 10:29 | |
*** chandankumar_ has quit IRC | 10:32 | |
*** ArxCruz has joined #openstack-infra | 10:33 | |
*** alexpilotti has joined #openstack-infra | 10:34 | |
*** dpyzhov has quit IRC | 10:35 | |
*** dpyzhov has joined #openstack-infra | 10:37 | |
*** dizquierdo has joined #openstack-infra | 10:38 | |
*** dpyzhov has quit IRC | 10:38 | |
*** flaper87 is now known as flaper87|afk | 10:38 | |
kiall | Seems the gerrit bot is MIA | 10:40 |
*** flaper87|afk is now known as flaper87 | 10:42 | |
BobBall | jeblair: I'm using novaclient 2.15.0 with nodepool - my problem was NOVA_RAX_AUTH was set to 1, which is an environment variable read explicitly by novaclient and used to set auth_system thus introducing the dependency on the RAX authentication method. | 10:44 |
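Given that behaviour, a small guard is to clear the variable from the environment before nodepool constructs its clients; this sketch relies only on what is reported above (that novaclient reads NOVA_RAX_AUTH), nothing about novaclient internals beyond it:

    import os

    # Drop NOVA_RAX_AUTH so novaclient never switches auth_system to the
    # rackspace plugin as a side effect of an inherited environment.
    os.environ.pop('NOVA_RAX_AUTH', None)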
*** chandankumar_ has joined #openstack-infra | 10:47 | |
*** nati_ueno has joined #openstack-infra | 10:49 | |
*** flaper87 is now known as flaper87|afk | 10:50 | |
*** flaper87|afk is now known as flaper87 | 10:50 | |
*** sarob has joined #openstack-infra | 10:51 | |
*** nati_ueno has quit IRC | 10:53 | |
*** che-arne has joined #openstack-infra | 10:55 | |
*** dpyzhov has joined #openstack-infra | 10:57 | |
*** dpyzhov has quit IRC | 10:59 | |
*** dpyzhov has joined #openstack-infra | 11:00 | |
*** markmcclain has joined #openstack-infra | 11:02 | |
*** markmcclain has quit IRC | 11:06 | |
*** basha has quit IRC | 11:11 | |
*** jgallard has quit IRC | 11:12 | |
*** wenlock has quit IRC | 11:14 | |
*** chandankumar_ has quit IRC | 11:14 | |
*** chandan_kumar has joined #openstack-infra | 11:15 | |
*** NikitaKonovalov is now known as NikitaKonovalov_ | 11:17 | |
*** NikitaKonovalov_ is now known as NikitaKonovalov | 11:19 | |
*** heyongli has joined #openstack-infra | 11:20 | |
*** CaptTofu has joined #openstack-infra | 11:22 | |
*** ociuhandu has quit IRC | 11:23 | |
*** mrmartin has joined #openstack-infra | 11:24 | |
*** johnthetubaguy has joined #openstack-infra | 11:24 | |
*** andreaf has joined #openstack-infra | 11:25 | |
*** sarob has quit IRC | 11:25 | |
*** CaptTofu has quit IRC | 11:26 | |
*** matsuhashi has quit IRC | 11:28 | |
enikanorov_ | hi. does anyone know what's up with the check queue? looks like it's stuck | 11:31 |
*** matsuhashi has joined #openstack-infra | 11:31 | |
ilyashakhat_ | enikanorov_: maybe SergeyLukjanov ? | 11:32 |
enikanorov_ | SergeyLukjanov: ping | 11:32 |
SergeyLukjanov | enikanorov_, ilyashakhat_, pong | 11:32 |
SergeyLukjanov | looking on it | 11:32 |
SergeyLukjanov | heh, I'm afraid that we have no free devstack-precise slaves | 11:34 |
ilyashakhat_ | is it ok to have such a large 'Deleting' area on Job Stats? | 11:36 |
*** CaptTofu has joined #openstack-infra | 11:36 | |
*** mrmartin has quit IRC | 11:36 | |
SergeyLukjanov | ilyashakhat_, nope | 11:36 |
SergeyLukjanov | ilyashakhat_, I don't see that any jobs are running now | 11:38 |
SergeyLukjanov | and almost all slaves are offline | 11:38 |
*** jroovers|afk is now known as jroovers | 11:39 | |
*** ociuhandu has joined #openstack-infra | 11:42 | |
*** noorul has left #openstack-infra | 11:42 | |
SergeyLukjanov | sdague, I've checked all jenkins nodes and we have only several online slaves on jenkins and jenkins01 | 11:42 |
sdague | yeh, it looks like something went all bonkers again | 11:43 |
sdague | it looks like, from scrollback, they were working on it last night | 11:44 |
sdague | so I think it's just a wait for fungi thing, because this is the class of things where you need root to go fix I think | 11:45 |
SergeyLukjanov | sdague, yup | 11:45 |
SergeyLukjanov | I think that we have tons of 'deleting' slaves that are already offline on jenkins | 11:45 |
SergeyLukjanov | probably, nodepool is dead :( | 11:45 |
*** che-arne has quit IRC | 11:45 | |
sdague | yeh, probably | 11:46 |
*** hashar is now known as hasharAW | 11:47 | |
*** sarob has joined #openstack-infra | 11:51 | |
SergeyLukjanov | hm, one more idea is that it's related to the fact that gate-noop is now running on single use nodes | 11:56 |
*** coolsvap has quit IRC | 11:57 | |
*** hasharAW has quit IRC | 11:58 | |
*** lcostantino has joined #openstack-infra | 12:01 | |
*** markmcclain has joined #openstack-infra | 12:03 | |
*** rfolco has joined #openstack-infra | 12:07 | |
*** markmcclain has quit IRC | 12:07 | |
*** ArxCruz has quit IRC | 12:08 | |
*** lcostantino has quit IRC | 12:08 | |
*** max_lobur has joined #openstack-infra | 12:09 | |
*** lcostantino has joined #openstack-infra | 12:09 | |
*** mrmartin has joined #openstack-infra | 12:10 | |
*** ArxCruz has joined #openstack-infra | 12:10 | |
mrmartin | re | 12:12 |
*** banix has joined #openstack-infra | 12:13 | |
*** dpyzhov has quit IRC | 12:13 | |
*** lcostantino has quit IRC | 12:14 | |
mrmartin | something is wrong with the gating jobs, some tasks were started more than 9 hours ago | 12:14 |
sdague | mrmartin: yep, no one that can fix it is currently awake | 12:14 |
*** ArxCruz has quit IRC | 12:16 | |
*** matsuhashi has quit IRC | 12:20 | |
kiall | Guess it's time for someone to give the gate a kick ;) Totally and utterly wedged. | 12:20 |
*** NikitaKonovalov is now known as NikitaKonovalov_ | 12:20 | |
*** ArxCruz has joined #openstack-infra | 12:20 | |
*** sarob has quit IRC | 12:24 | |
*** matsuhashi has joined #openstack-infra | 12:25 | |
*** lcostantino has joined #openstack-infra | 12:30 | |
*** luqas has quit IRC | 12:32 | |
*** yamahata has quit IRC | 12:36 | |
*** lcostantino has quit IRC | 12:37 | |
*** Nikolay_St has quit IRC | 12:38 | |
*** hashar has joined #openstack-infra | 12:42 | |
*** dpyzhov has joined #openstack-infra | 12:45 | |
*** jgallard has joined #openstack-infra | 12:46 | |
*** che-arne has joined #openstack-infra | 12:49 | |
*** sarob has joined #openstack-infra | 12:51 | |
*** smarcet has joined #openstack-infra | 12:52 | |
*** banix has quit IRC | 12:54 | |
*** CaptTofu has quit IRC | 12:55 | |
*** CaptTofu has joined #openstack-infra | 12:56 | |
*** nosnos has quit IRC | 12:56 | |
*** CaptTofu has quit IRC | 13:00 | |
*** NikitaKonovalov_ is now known as NikitaKonovalov | 13:01 | |
*** lcostantino has joined #openstack-infra | 13:01 | |
*** david-lyle has quit IRC | 13:01 | |
*** banix has joined #openstack-infra | 13:01 | |
*** koolhead17 has quit IRC | 13:01 | |
*** koolhead17 has joined #openstack-infra | 13:01 | |
*** markmcclain has joined #openstack-infra | 13:04 | |
*** matsuhashi has quit IRC | 13:08 | |
*** matsuhashi has joined #openstack-infra | 13:08 | |
*** matsuhashi has quit IRC | 13:08 | |
*** markmcclain has quit IRC | 13:08 | |
*** dkranz has quit IRC | 13:09 | |
ekarlso | did someone throw a banana peel into the gate or? | 13:11 |
SergeyLukjanov | fungi, clarkb, jeblair, mordred, gate is dead // just want to be sure that you'll see it :) | 13:13 |
*** zhiyan_ is now known as zhiyan | 13:14 | |
*** luqas has joined #openstack-infra | 13:17 | |
*** ken1ohmichi has quit IRC | 13:18 | |
*** dprince has joined #openstack-infra | 13:19 | |
*** mrmartin has quit IRC | 13:21 | |
*** thomasbiege has joined #openstack-infra | 13:21 | |
*** pdmars has joined #openstack-infra | 13:22 | |
*** sarob has quit IRC | 13:24 | |
*** pdmars has quit IRC | 13:25 | |
*** weshay has joined #openstack-infra | 13:33 | |
*** luqas has quit IRC | 13:33 | |
*** dolphm has quit IRC | 13:35 | |
*** dolphm has joined #openstack-infra | 13:35 | |
*** dcramer__ has joined #openstack-infra | 13:36 | |
*** lcostantino has quit IRC | 13:36 | |
*** dolphm has quit IRC | 13:37 | |
*** dolphm has joined #openstack-infra | 13:38 | |
*** sandywalsh has quit IRC | 13:39 | |
*** mrmartin has joined #openstack-infra | 13:40 | |
*** CaptTofu has joined #openstack-infra | 13:45 | |
*** dizquierdo has quit IRC | 13:46 | |
*** hashar has quit IRC | 13:47 | |
*** sarob has joined #openstack-infra | 13:51 | |
*** sandywalsh has joined #openstack-infra | 13:53 | |
*** thomasem has joined #openstack-infra | 13:53 | |
*** mfer has joined #openstack-infra | 13:55 | |
*** dkehn_ has quit IRC | 13:55 | |
*** dolphm has quit IRC | 13:55 | |
*** CaptTofu has quit IRC | 13:56 | |
*** dolphm has joined #openstack-infra | 13:57 | |
*** dpyzhov has quit IRC | 13:57 | |
*** ryanpetrello has joined #openstack-infra | 13:57 | |
*** luqas has joined #openstack-infra | 13:57 | |
*** dpyzhov has joined #openstack-infra | 13:57 | |
*** salv-orlando has quit IRC | 13:58 | |
*** CaptTofu has joined #openstack-infra | 13:58 | |
*** dolphm has quit IRC | 13:58 | |
*** lcostantino has joined #openstack-infra | 13:59 | |
*** dolphm has joined #openstack-infra | 13:59 | |
*** gordc has joined #openstack-infra | 13:59 | |
*** hashar has joined #openstack-infra | 14:01 | |
*** sandywalsh_ has joined #openstack-infra | 14:02 | |
*** markmcclain has joined #openstack-infra | 14:04 | |
*** prad has quit IRC | 14:05 | |
*** banix has quit IRC | 14:05 | |
*** sandywalsh has quit IRC | 14:06 | |
*** heyongli has quit IRC | 14:06 | |
*** dcramer__ has quit IRC | 14:07 | |
*** markmcclain has quit IRC | 14:08 | |
*** lcostantino has quit IRC | 14:09 | |
*** salv-orlando has joined #openstack-infra | 14:09 | |
*** markmc has joined #openstack-infra | 14:10 | |
*** saju_m has quit IRC | 14:11 | |
*** dkranz has joined #openstack-infra | 14:11 | |
*** dpyzhov has quit IRC | 14:12 | |
*** salv-orlando has quit IRC | 14:12 | |
*** andreaf has quit IRC | 14:14 | |
*** smarcet has left #openstack-infra | 14:17 | |
*** smarcet has joined #openstack-infra | 14:17 | |
*** pafuent has joined #openstack-infra | 14:18 | |
*** yamahata has joined #openstack-infra | 14:19 | |
fungi | looking now | 14:22 |
fungi | looks like we've piled up fake ready nodes again | 14:22 |
*** e0ne has quit IRC | 14:23 | |
*** julim has joined #openstack-infra | 14:23 | |
sdague | fungi: yeh, it seems to have started happening about the same time the infra team was signing off last night | 14:23 |
sdague | so I'm curious if a late change caused issues | 14:23 |
*** dolphm has quit IRC | 14:24 | |
fungi | not sure. i'll have to read scrollback. we were still stabilizing things coming out of the jenkins upgrade and redowngrade | 14:24 |
*** sarob has quit IRC | 14:24 | |
*** lcostantino has joined #openstack-infra | 14:24 | |
fungi | by the time i passed out | 14:25 |
sdague | yep, that's fair | 14:25 |
*** oubiwann has joined #openstack-infra | 14:25 | |
ttx | SergeyLukjanov: I thought you'd magically get things fixed while everyone else sleeps | 14:26 |
fungi | hopefully this isn't an issue with jenkins 1.632.2, because if so we've basically ruled out being able to use any of the jenkins releases which aren't riddled with known security holes | 14:26 |
*** dpyzhov has joined #openstack-infra | 14:26 | |
fungi | er, 1.532.2 | 14:26 |
SergeyLukjanov | ttx, I have no root access and it looks like the problem is about non-code stuff :( | 14:26 |
fungi | which was what drove me to upgrade us to 1.551 yesterday. huge list of security fixes, some in bits we expose to the public | 14:27 |
ttx | SergeyLukjanov: just trolling you, ignore me :) | 14:27 |
SergeyLukjanov | ttx :) | 14:27 |
SergeyLukjanov | fungi, can I help you somehow? | 14:28 |
fungi | i've got 10 loops running in parallel right now deleting any nodepool nodes which are >3 hours in any state | 14:28 |
fungi | this should get things moving again, i think | 14:28 |
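One way such a cleanup could be scripted (not necessarily the loops actually in use here), assuming the nodepool CLI's list/delete subcommands and that the node id is the first column of the listing and its age in hours the last; those column positions are an assumption:

    import subprocess

    MAX_AGE_HOURS = 3.0

    out = subprocess.check_output(['nodepool', 'list'])
    for line in out.decode().splitlines():
        fields = [f.strip() for f in line.split('|') if f.strip()]
        if len(fields) < 2 or not fields[0].isdigit():
            continue  # skip header and separator rows
        node_id, age = fields[0], fields[-1]
        try:
            if float(age) > MAX_AGE_HOURS:
                subprocess.call(['nodepool', 'delete', node_id])
        except ValueError:
            continue  # age column didn't parse; leave the node alone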
fungi | checking scrollback to see if they left us any other breadcrumbs | 14:29 |
ttx | When I saw 400 checks piled up I thought: "we should really have those tripleo checks appear in a separate display".. then I looked again | 14:30 |
fungi | yah | 14:30 |
SergeyLukjanov | heh | 14:31 |
*** e0ne has joined #openstack-infra | 14:31 | |
*** dims has quit IRC | 14:32 | |
*** wenlock has joined #openstack-infra | 14:33 | |
fungi | so as best i can tell, nodepool thinks it piled about 600 nodes onto jenkins04, nearly a couple hundred of them in a ready state, but jenkins04's interface shows only offline nodes | 14:34 |
*** jeckersb_gone is now known as jeckersb | 14:34 | |
fungi | hard to tell how many, but it *could* be in the hundreds | 14:34 |
*** dims has joined #openstack-infra | 14:35 | |
*** thomasbiege has quit IRC | 14:35 | |
fungi | looks like we're down gerritbot and statusbot too... | 14:37 |
fungi | 2014-02-19 08:37:31 <-- openstackgerrit (~openstack@review.openstack.org) has quit (Ping timeout: 260 seconds) | 14:37 |
fungi | 2014-02-19 08:38:01 <-- openstackstatus (~openstack@eavesdrop.openstack.org) has quit (Ping timeout: 272 seconds) | 14:37 |
sdague | fungi: the deletes are already helping, things started to move again in gate queue | 14:38 |
fungi | i'll get the bots restarted and see whether something happened in raxland around 6 hours ago (which could also coincide with when this started to go south) | 14:39 |
*** yamahata has quit IRC | 14:39 | |
*** dstanek has joined #openstack-infra | 14:39 | |
*** yamahata has joined #openstack-infra | 14:39 | |
fungi | sort of odd to see bots from two different servers fall off from a ping timeout at the same moment. and those servers are in the same rax region as the nodepool server and jenkins masters | 14:40 |
*** dkliban has quit IRC | 14:40 | |
* fungi makes the obligatory "clouds" sigh and finds a second cup of coffee | 14:40 | |
SergeyLukjanov | fungi, it sounds like rax network outage could be the reason | 14:41 |
*** ildikov_ has joined #openstack-infra | 14:41 | |
fungi | "Our engineers have received reports of a brief network disruption that impacted a portion of our DFW2 data center starting at approximately 02:36 CST. The team engaged has stabilized the issue at approximately 02:41 CST and will continue to monitor for further impact. " | 14:42 |
ArxCruz | lifeless: hey, can you give me your blessing here https://review.openstack.org/#/c/70152/ ? | 14:42 |
ArxCruz | :) | 14:42 |
fungi | https://status.rackspace.com/ | 14:42 |
fungi | 02:36 CST is 08:36 UTC for the timezone-impaired | 14:43 |
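The conversion is a fixed six-hour offset in February (CST = UTC-6, no daylight saving in effect), easy to sanity-check:

    from datetime import datetime, timedelta

    cst = datetime(2014, 2, 19, 2, 36)   # 02:36 CST from the status page
    print(cst + timedelta(hours=6))      # 2014-02-19 08:36:00, i.e. 08:36 UTC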
fungi | well, we already know that nodepool behaves terribly in the face of provider outages, and thankfully lifeless and derekh have patches proposed which should help that | 14:44 |
*** jp_at_hp has quit IRC | 14:47 | |
fungi | that tempest change failing near the head of the gate hit connectivity issues to pypi.python.org trying to download pip around 14:38, just a few minutes ago, from a rax-dfw slave too | 14:47 |
SergeyLukjanov | fungi, it's good | 14:47 |
*** russellb has joined #openstack-infra | 14:48 | |
SergeyLukjanov | fungi, is pip installation the only external op? | 14:48 |
fungi | (separate note, we still need to neuter get-pip so that it installs from a local cache on these systems) | 14:48 |
fungi | SergeyLukjanov: nah, jobs also need to look up dns records, retrieve git updates and zuul refs, upload logs/artifacts and stream data back to the jenkins master too | 14:49 |
*** dhellmann has quit IRC | 14:49 | |
*** jp_at_hp has joined #openstack-infra | 14:49 | |
fungi | anyway, my pint was that whatever this is happening in dfw, it might be ongoing | 14:49 |
*** dhellmann has joined #openstack-infra | 14:49 | |
SergeyLukjanov | fungi, bad wording from my side, I mean outside of our infra | 14:49 |
*** dizquierdo has joined #openstack-infra | 14:49 | |
fungi | s/pint/point/ (though now i feel like i need a pint too) | 14:50 |
SergeyLukjanov | fungi, dns and pip | 14:50 |
*** dhellmann has quit IRC | 14:50 | |
fungi | SergeyLukjanov: possibly... some less common jobs also download other sorts of things from the internet too | 14:50 |
SergeyLukjanov | fungi, "whatever this is happening in dfw, it might be ongoing" :( | 14:50 |
fungi | well, just noting that was a connectivity issue from a few minutes ago, and when i checked the slave's location, it was in that same region which had the outage earlier | 14:51 |
*** sarob has joined #openstack-infra | 14:51 | |
sdague | how easy would it be to pull the whole region? | 14:51 |
fungi | but it could also just be an unfortunate coincidence. i'm still casting my net wide here | 14:51 |
*** dkliban has joined #openstack-infra | 14:51 | |
*** dhellmann has joined #openstack-infra | 14:51 | |
fungi | sdague: not easy... for historical reasons we have most of our static infrastructure deployed in rax-dfw... we'd need to rebuild a lot of longer-lived servers | 14:52 |
sdague | so, yeh, spot checking additional fails | 14:52 |
sdague | they all look like dfw | 14:52 |
sdague | and all because of connectivitiy | 14:52 |
fungi | (pretty much any infra service you can think of, aside from nodepool slaves, backups and some experimental systems, is in dfw) | 14:53 |
anteaya | ah | 14:53 |
sdague | oof | 14:53 |
anteaya | sounds like some waves of movement might be a good idea | 14:53 |
fungi | so the *good* news here is that we could recover from complete loss of dfw, but it's not a move to be undertaken on a whim | 14:54 |
anteaya | no not a whim | 14:54 |
anteaya | but perhaps a slow migration | 14:54 |
*** miqui has joined #openstack-infra | 14:54 | |
*** jnoller has joined #openstack-infra | 14:54 | |
* anteaya looks for a land bridge over the glacier | 14:54 | |
anteaya | which server would be the easiest to migrate? | 14:55 |
*** luqas has quit IRC | 14:55 | |
anteaya | follow up question, which would be the most important? | 14:55 |
fungi | just about any of them would be roughly similarly easy to migrate, with a few exceptions, but there's just a lot of systems | 14:55 |
*** dkliban has quit IRC | 14:56 | |
*** mwagner_lap has joined #openstack-infra | 14:56 | |
anteaya | right | 14:56 |
fungi | think back to the several easels of marker-smeared paper we had diagramming them from a high level at the bootcamp... then mentally add a bunch more we've brought online since then | 14:56 |
anteaya | yes | 14:56 |
*** luqas has joined #openstack-infra | 14:57 | |
anteaya | several easels worth | 14:57 |
anteaya | if I started up an etherpad to list them, would this help the conversation/migration? | 14:57 |
anteaya | even if the conclusion is not to migrate? | 14:57 |
*** prad has joined #openstack-infra | 14:57 | |
anteaya | https://etherpad.openstack.org/p/migrate-all-the-things | 14:59 |
*** mgagne has quit IRC | 14:59 | |
anteaya | do join me | 14:59 |
*** wenlock_ has joined #openstack-infra | 15:00 | |
*** CaptTofu has quit IRC | 15:01 | |
fungi | i think that's premature. what we need is a group discussion about ways to spread systems out to reduce the impact of provider outages, which probably means some redundancy... or we remind ourselves that as we've previously stated we're operating at the mercy of providers donating these resources, and they're up most of the time, and when they're not, we should just go out for a drink and clean up | 15:01 |
fungi | the mess later | 15:01 |
*** CaptTofu has joined #openstack-infra | 15:01 | |
anteaya | okay | 15:01 |
fungi | but right now i need to focus on stabilizing this and see what else might be left broken from the earlier incident | 15:02 |
anteaya | well while waiting for the others I don't mind having a place to copy/paste | 15:02 |
anteaya | and I can abandon the etherpad later if need be | 15:02 |
anteaya | right | 15:02 |
anteaya | and I need something to do because I can't help you with that | 15:02 |
*** CaptTofu has quit IRC | 15:02 | |
*** CaptTofu has joined #openstack-infra | 15:02 | |
fungi | if we continue to see any new gate failures (besides the ones there) which are network connectivity problems and are on nodepool nodes in dfw, i'll temporarily scale nodepool off that region to buy us a little more stability | 15:03 |
*** e0ne_ has joined #openstack-infra | 15:03 | |
*** protux has joined #openstack-infra | 15:04 | |
*** jaypipes has joined #openstack-infra | 15:05 | |
*** markmcclain has joined #openstack-infra | 15:05 | |
fungi | though at the cost of 132 nodes of capacity | 15:06 |
anteaya | :( | 15:07 |
fungi | yeah, it's a balancing act | 15:07 |
*** e0ne has quit IRC | 15:07 | |
*** dpyzhov has quit IRC | 15:08 | |
*** openstackstatus has joined #openstack-infra | 15:08 | |
*** wenlock_ has quit IRC | 15:08 | |
*** openstackgerrit has joined #openstack-infra | 15:09 | |
*** dpyzhov has joined #openstack-infra | 15:09 | |
fungi | okay, we've got openstackstatus and openstackgerrit back | 15:10 |
*** markmcclain has quit IRC | 15:10 | |
anteaya | yay | 15:10 |
*** Ajaeger has joined #openstack-infra | 15:10 | |
anteaya | what server are they on? | 15:10 |
fungi | one is on eavesdrop and the other is on review | 15:10 |
jeblair | fungi: good morning | 15:10 |
fungi | jeblair: i hope so! | 15:10 |
dims | :) | 15:11 |
*** eharney has joined #openstack-infra | 15:11 | |
fungi | jeblair: quick summary, "brief" outage in rax-dfw around 08:30 utc (but maybe with lingering effects, jury's still deliberating) | 15:11 |
*** NikitaKonovalov is now known as NikitaKonovalov_ | 15:12 | |
jeblair | fungi: so... i was reading the scrollback and talk of a mass migration from dfw and imagined something rather serious... | 15:12 |
jeblair | fungi: anything i should kick or check? | 15:13 |
fungi | not especially serious, no | 15:13 |
*** bknudson has quit IRC | 15:13 | |
fungi | jeblair: a deeper health check on nodepoold would be great. it does seem to be adding replacement nodes as i delete the stale ones, but curious whether it warrants restarting | 15:13 |
jeblair | ack | 15:14 |
sdague | one other oddity | 15:14 |
sdague | I can't seem to find a single functioning py26 node in the currently running list | 15:15 |
jeblair | sdague: ack | 15:15 |
sdague | so that might be a parallel thing to look into, because even if the nodepool recovers, that will hold us up | 15:15 |
*** jeckersb is now known as jeckersb_gone | 15:16 | |
*** jergerber has joined #openstack-infra | 15:16 | |
fungi | sdague: i'll make a second pass to delete the py26 nodes which aren't in use, to speed that along | 15:16 |
*** DinaBelova is now known as DinaBelova_ | 15:16 | |
sdague | fungi: cool | 15:17 |
fungi | thanks for spotting it | 15:17 |
*** jeckersb_gone is now known as jeckersb | 15:17 | |
anteaya | I count 63 servers listed in cacti | 15:19 |
anteaya | which are now listed in the etherpad | 15:19 |
jeblair | fungi: hpcloud seems operational; az1 and az3 are idle because they are at capacity, likely with false-ready nodes | 15:19 |
*** e0ne_ has quit IRC | 15:19 | |
jeblair | fungi: you have deletes going on that will catch those? | 15:19 |
fungi | jeblair: yes | 15:19 |
*** malini has joined #openstack-infra | 15:20 | |
*** e0ne has joined #openstack-infra | 15:20 | |
fungi | 10 loops going in parallel right now | 15:20 |
fungi | though it looks like nodepoold is adding new ready nodes which aren't picking up jobs either... i'm starting to suspect it's having trouble adding them to jenkins masters successfully | 15:20 |
fungi | none of the nodes currently in a ready state are >1hr there | 15:21 |
jeblair | anteaya: keep in mind that it's generally better to have single-point-of-failure servers that interact with each other in the same data center; it's likely that if we spread out some of our services to 2 data centers, we would be subject to twice the number of service interruptions | 15:21 |
jeblair | anteaya: the exception to that is if we can make services truly ha; that is difficult for many of the more important things we run. | 15:22 |
*** gokrokve has joined #openstack-infra | 15:22 | |
fungi | and looks like all the centos6 nodes which have been ready for over 0.2 hours are on jenkins04 for some reason | 15:22 |
fungi | which is the same master which had most of the nodes earlier. i'm checking it out now | 15:22 |
fungi | maybe it's having issues | 15:22 |
fungi | yeah, jenkins04 currently has a handful of offline devstack slaves from rax-dfw assigned to it, and nothing else | 15:23 |
*** sarob has quit IRC | 15:23 | |
anteaya | jeblair: ah a good point, I did not know that | 15:23 |
fungi | so whatever nodepool thinks is going on, it's wrong | 15:23 |
jeblair | fungi: seeing if the jenkins provider for 04 is stuck... | 15:25 |
jeblair | File "/usr/lib/python2.7/ssl.py", line 160, in read | 15:25 |
jeblair | return self._sslobj.read(len) | 15:25 |
jeblair | hey same place novaclient was stuck yesterday | 15:25 |
fungi | is that via requests, or direct socket? | 15:26 |
*** ihrachys has quit IRC | 15:26 | |
jeblair | 2014-02-19 08:33:40,851 DEBUG nodepool.JenkinsManager: Manager jenkins04 running task <nodepool.jenkins_manager.CreateNodeTask object at 0x7f3fddc7be10> | 15:26 |
*** sandywalsh_ has quit IRC | 15:26 | |
jeblair | stuck since then | 15:26 |
fungi | yep, that's when all the trouble began | 15:26 |
jeblair | fungi: its via File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen | 15:27 |
fungi | ahh, okay | 15:27 |
jeblair | fungi: so i think we're going to need a restart | 15:27 |
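For the urllib2 path specifically, the call can also be bounded per request; urlopen has accepted a timeout argument since Python 2.6, so a hung read raises instead of pinning the provider thread. A sketch (the URL and the 60-second value are placeholders):

    import socket
    import urllib2

    try:
        # Without a timeout this read can block indefinitely, which is the
        # state the jenkins04 provider thread was stuck in since 08:33.
        body = urllib2.urlopen('https://jenkins.example.org/api/json',
                               timeout=60).read()
    except (socket.timeout, urllib2.URLError):
        body = None  # surface the failure so the task can be retried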
*** dmsimard has joined #openstack-infra | 15:27 | |
fungi | okay, i'll take care of it if you're done debugging its present state | 15:28 |
*** atiwari has joined #openstack-infra | 15:28 | |
jeblair | i am; go for it | 15:28 |
dmsimard | Hi guys, I think a merge glitched but I wanted to ask you to make sure not to re-try the merge if there's problems right now.. https://review.openstack.org/#/c/74082/ | 15:28 |
Shrews | jeblair, fungi: not really following the issue too closely, but when I've seen network errors like that (getting stuck on reads after cloud outages), having keepalive enabled on the sockets usually helps prevent the "stuck" | 15:28 |
*** zhiyan is now known as zhiyan_ | 15:29 | |
*** bknudson has joined #openstack-infra | 15:29 | |
*** dkliban has joined #openstack-infra | 15:29 | |
jeblair | Shrews: good idea; hopefully we can get that, or something, passed through all the layers we need | 15:29 |
jeblair | (novaclient, requests, urllib[123456789], ssl, socket) | 15:30 |
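At the bottom of that stack, what Shrews suggests is a handful of setsockopt calls; the TCP_KEEP* constants below are Linux-specific, and plumbing the option through to the sockets novaclient actually opens is the hard part being alluded to. A sketch of just the socket-level piece:

    import socket

    def enable_keepalive(sock, idle=60, interval=10, count=6):
        """Turn on TCP keepalive so a dead peer is eventually detected
        instead of a read blocking forever (TCP_KEEP* options are Linux-only)."""
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)

    sock = socket.create_connection(('example.com', 443))
    enable_keepalive(sock)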
openstackgerrit | Ruslan Kamaldinov proposed a change to openstack-infra/storyboard: Update developer documentation https://review.openstack.org/74713 | 15:30 |
*** ihrachys has joined #openstack-infra | 15:30 | |
jeblair | dmsimard: thanks for pointing that out; that looks like a new kind of failure we have only seen a couple of times. it suggests something wrong with the git repos that are cached on the slave images | 15:32 |
jeblair | dmsimard: unfortunately that slave is gone now; but you should be able to just try again | 15:32 |
annegentle | where's the one true Jenkins to find out if something built? www-01.jenkins.openstack.org? | 15:32 |
*** coolsvap has joined #openstack-infra | 15:33 | |
dmsimard | jeblair: I checked the recheck bugs but haven't found anything that seemed to be like that. Should I recheck with a specific bug # ? | 15:33 |
fungi | blasting out the ready nodes from prior to the restart, so that we get back some momentum. then i'll do the building and delete lists i saved from it | 15:33 |
fungi | annegentle: what specifically are you looking for? | 15:33 |
annegentle | Jenkins where are you? | 15:33 |
jeblair | dmsimard: i don't think there is one; could you please file one on openstack-ci, link to that job, and paste the bug here? | 15:33 |
annegentle | fungi: why the source for this training manuals page http://git.openstack.org/cgit/openstack/openstack-manuals/tree/doc/training-guides/lab001-control-node.xml isn't being published to http://docs.openstack.org/training-guides/content/lab001-control-node.xml.html | 15:34 |
dmsimard | jeblair: Will do and recheck against that bug. Thanks. | 15:34 |
*** mgagne has joined #openstack-infra | 15:34 | |
annegentle | fungi: specifically apt-get dist-update is showing on published, apt-get dist-upgrade is what's in the source | 15:34 |
fungi | annegentle: logs.openstack.org is going to be your best bet, but you need to know how to build the url. i'll get you an example for that one specifically | 15:34 |
annegentle | fungi: nice, a worked example | 15:34 |
*** bhuvan has quit IRC | 15:35 | |
fungi | annegentle: though before i jump into that, keep in mind that we've had a bit of a setback this morning and jobs are just now starting to catch up... if it's for a merged change listed in the post pipeline at http://status.openstack.org/zuul/ then it possibly hasn't been finished yet | 15:36 |
fungi | i see about a dozen changes for openstack-manuals which haven't finished in post yet | 15:37 |
fungi | still awaiting worker assignments | 15:37 |
Ajaeger | annegentle: that's strange, published last on the 1st of February... | 15:37 |
fungi | yeah, so sounds like it's been broken for longer. i'll pick a commit which merged a few days ago to be assured i can get you a good example | 15:38 |
fungi | rather than one which might still be pending completion | 15:38 |
*** david-lyle has joined #openstack-infra | 15:39 | |
*** esker has joined #openstack-infra | 15:39 | |
Ajaeger | Fungi, annegentle: Go to http://docs.openstack.org/training-guides/content/ and compare the list of chapters with http://docs.openstack.org/training-guides/content/lab001-control-node.xml.html | 15:40 |
*** jgrimm has joined #openstack-infra | 15:40 | |
openstackgerrit | Petr Blaho proposed a change to openstack-infra/config: Adds gate-tuskar-docs job to zuul. https://review.openstack.org/74756 | 15:40 |
openstackgerrit | Petr Blaho proposed a change to openstack-infra/config: Adds gate-python-tuskarclient-docs job to zuul. https://review.openstack.org/74757 | 15:40 |
mordred | annegentle: there is no one-true-jenkins. we have 8 masters in a pool behind gearman | 15:40 |
Ajaeger | annegentle: this one http://docs.openstack.org/training-guides/content/bk001-associate-training-guide.html shows "Architect Training Guide" as the last section | 15:40 |
*** dpyzhov has quit IRC | 15:41 | |
Ajaeger | But the other one contains "Introduction to OpenStack" and further chapters afterwards | 15:41 |
Ajaeger | annegentle: so, this looks like a problem on the openstack-manuals side, nothing fungi can help with. | 15:41 |
*** sandywalsh_ has joined #openstack-infra | 15:42 | |
*** guitarzan has joined #openstack-infra | 15:42 | |
*** afazekas has quit IRC | 15:42 | |
fungi | annegentle: so, as a working example, if you wanted to see the post jobs for http://git.openstack.org/cgit/openstack/openstack-manuals/commit/?id=254befa4824ef2b3f34be2e54eddcfabf082a6d3 | 15:42 |
fungi | annegentle: the log url for that is http://logs.openstack.org/25/254befa4824ef2b3f34be2e54eddcfabf082a6d3/ | 15:42 |
*** esker has quit IRC | 15:43 | |
annegentle | fungi: Ajaeger: helpful! So, figure out what patch would have fixed it, work from there | 15:43 |
fungi | specifically the training-guide build/publication for 254befa post-merge is http://logs.openstack.org/25/254befa4824ef2b3f34be2e54eddcfabf082a6d3/post/openstack-training-guides/6692343/console.html | 15:43 |
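The URL in that example is mechanical: the first path component is the first two characters of the commit sha, then the full sha, the pipeline, the job name, and the per-build directory (passed in below rather than derived, since it is assigned per run). A small helper mirroring the worked example:

    def post_log_url(sha, job, build):
        """Build a logs.openstack.org URL for a post-pipeline job run."""
        return ("http://logs.openstack.org/%s/%s/post/%s/%s/"
                % (sha[:2], sha, job, build))

    print(post_log_url("254befa4824ef2b3f34be2e54eddcfabf082a6d3",
                       "openstack-training-guides", "6692343"))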
Ajaeger | annegentle: see this change: https://review.openstack.org/#/c/70499/ | 15:43 |
Ajaeger | The chapters you are missing are not published anymore... | 15:44 |
Ajaeger | We really need to remove old files from the server! | 15:44 |
annegentle | Ajaeger: oh yes we do without me doing it manually! Arghhh | 15:44 |
*** wenlock_ has joined #openstack-infra | 15:44 | |
*** markwash has joined #openstack-infra | 15:44 | |
Ajaeger | annegentle: did you talk with clarkb or others about regularly removing old files? | 15:45 |
fungi | Ajaeger: annegentle: unless all your jobs can be fixed to publish into completely separate subdirectories so that they can delete and recreate that entire tree on publication, there's not much which can really be done about having old files | 15:45 |
annegentle | we have in the past, never got a good solution (yep what fungi says) | 15:45 |
Ajaeger | fungi, we publish in completely separate subdirectories. | 15:46 |
fungi | Ajaeger: at least previously it was a "too many cooks in the kitchen" problem (multiple jobs writing to common locations) | 15:46 |
Ajaeger | But how do you want to do it - upload to subdirectory.new, then mv subdirectory to subdirectory.old etc. | 15:46 |
Ajaeger | fungi: That problem shouldn't be there anymore at all. | 15:47 |
*** esker has joined #openstack-infra | 15:47 | |
jgriffith | fungi: jeblair should we hold off on +2/A patches for now? | 15:47 |
fungi | if they no longer do, i think there's an option to the ftp publisher to completely remove the target directory when it runs... though you also get a brief outage for that content on every update i think | 15:47 |
jgriffith | fungi: jeblair or does it not matter | 15:47 |
Ajaeger | But I might be wrong and be overlooking something ;) | 15:47 |
Ajaeger | and that brief outage is the problem. | 15:47 |
*** esker has quit IRC | 15:47 | |
anteaya | there is a neutron patch failing in the gate | 15:47 |
Ajaeger | Uploading takes a minute or more - that's too long IMO. | 15:47 |
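The subdirectory.new-then-mv idea can be made effectively outage-free if the live path is a symlink: sync the new build into a fresh directory, then atomically repoint the symlink. A sketch assuming rsync plus shell access on (or a local mount of) the destination -- exactly what ftp-only publishing doesn't provide -- and that the published name is already a symlink or absent:

    import os
    import subprocess
    import tempfile

    def publish(build_dir, docroot, name):
        """Sync a freshly built doc tree beside the live one, then swap a
        symlink so readers never see a half-uploaded directory."""
        target = tempfile.mkdtemp(prefix=name + '.', dir=docroot)
        subprocess.check_call(['rsync', '-a', '--delete',
                               build_dir + '/', target + '/'])
        tmp_link = os.path.join(docroot, name + '.swap')
        if os.path.lexists(tmp_link):
            os.remove(tmp_link)
        os.symlink(target, tmp_link)
        # rename() replaces an existing symlink atomically; old build
        # directories still need to be pruned separately.
        os.rename(tmp_link, os.path.join(docroot, name))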
*** esker has joined #openstack-infra | 15:47 | |
fungi | jgriffith: approving stuff won't hurt. we're backlogged, but the systems can queue things up just fine | 15:47 |
anteaya | it appears it will be removed without a gate reset | 15:47 |
anteaya | or I can remove it | 15:48 |
*** markwash has quit IRC | 15:48 | |
jgriffith | fungi: ok, thanks | 15:48 |
annegentle | yeah the brief outage is a stopper | 15:48 |
dmsimard | jeblair: https://bugs.launchpad.net/openstack-ci/+bug/1282136 | 15:48 |
uvirtbot | Launchpad bug 1282136 in openstack-ci "Git problem: "Failed to resolve 'HEAD' as a valid ref."" [Undecided,New] | 15:48 |
jgriffith | just wanted to make sure I don't compound issues | 15:48 |
annegentle | fungi: jeblair: mordred: if the docs.openstack.org site goes to Jekyll or some such do you imagine this particular problem would go away | 15:48 |
fungi | annegentle: Ajaeger: getting a location to publish things without being limited to ftp access (so that we could rsync) would make that easier | 15:49 |
jeblair | Ajaeger, annegentle: after i3 i'd like to switch to scp copying | 15:49 |
*** rcleere has joined #openstack-infra | 15:49 | |
*** jaypipes has quit IRC | 15:49 | |
annegentle | fungi: jeblair: mordred: Todd Morey has ideas for that, with the overall vision being Docbook source > built to html > built with jekyll | 15:49 |
Ajaeger | fungi, regarding approving - could you approve https://review.openstack.org/73690 - and remember to delete openstack-api-ref since we go from maven to freestyle. | 15:49 |
jeblair | or rsync if we can swing it | 15:49 |
mordred | annegentle: uhm. - I have no idea what a jekyll is | 15:49 |
*** fbo is now known as fbo_away | 15:50 | |
Ajaeger | jeblair: can we rsync over ssh? | 15:50 |
fungi | a release name candidate which unfairly lost the poll | 15:50 |
annegentle | mordred: http://jekyllrb.com/docs/usage/ | 15:50 |
*** gpocentek has joined #openstack-infra | 15:50 | |
mordred | annegentle: ok. so it's just like doing it with the maven build from what I can see | 15:50 |
jeblair | annegentle: i don't think the rendering systems affects the underlying problem. | 15:50 |
annegentle | mordred: yeah | 15:50 |
mordred | yeah. what jeblair said | 15:50 |
annegentle | jeblair: that's true | 15:50 |
fungi | Ajaeger: we can rsync over ssh but need a functional shell for that (just scp/sftp access won't help) and the destination needs rsync installed | 15:50 |
Ajaeger | fungi: Yeah, indeed. | 15:51 |
mordred | from our side, we don't really care if you use maven or jekyll - other than wanting to make sure that jekyll is installable and not just hipster crap | 15:51 |
annegentle | fungi: Ajaeger: moving to jekyll would be a good reason to get off Cloud Sites, which would give us shell access | 15:51 |
jeblair | mordred: ++ | 15:51 |
Ajaeger | What is Jekyll` | 15:51 |
*** sarob has joined #openstack-infra | 15:51 | |
mordred | annegentle: if jekyll incentivizes moving off of cloud sites, I'm all for it | 15:51 |
annegentle | Ajaeger: http://jekyllrb.com/docs/usage/ | 15:51 |
jeblair | annegentle: we can move off of cloud sites and switch to scp or rsync independent of when/if you switch to jekyll | 15:51 |
mordred | but also, what jeblair said | 15:51 |
annegentle | mordred: jeblair: get Todd Morey to get the design done :) | 15:51 |
anteaya | Ajaeger: it is a templating language made up by tom of github | 15:51 |
fungi | rb is an abbreviation for hipster crap in japanese, right? | 15:52 |
anteaya | Ajaeger: used widely in the ruby community | 15:52 |
mordred | fungi: ++ | 15:52 |
fungi | ;) | 15:52 |
annegentle | jeblair: not to me, the two are directly related because I don't get a marked-enough improvement | 15:52 |
jeblair | annegentle: no, i'm saying we don't need to wait for that. we have other reasons we need to change how the docs are published | 15:52 |
annegentle | jeblair: in other words, I'm not willing to risk all the changes without a killer redesign | 15:52 |
annegentle | jeblair: Don't wanna. :) | 15:52 |
jeblair | annegentle: i'm sorry, we need to move and it can't wait for todd. we need to make the publishing pipeline better. :) | 15:53 |
annegentle | Ajaeger: jeblair: we can of course revisit at the summit and get a game plan for moving off of Cloud Sites, but right now there's not enough incentive | 15:53 |
Ajaeger | Thanks for the explanations about Jekyll - let's see how this integrates with our XML publishing | 15:53 |
anteaya | Ajaeger: I'm betting it won't | 15:53 |
anteaya | ruby doesn't integrate | 15:53 |
annegentle | Ajaeger: to me it lets us stop publishing "webhelp" and publish plain html (or xhtml) | 15:53 |
anteaya | that is a point of pride for ruby | 15:53 |
Ajaeger | annegentle: We still have the option of remotely running a job that deletes old files. | 15:53 |
annegentle | Ajaeger: yeah | 15:53 |
*** sandywalsh_ has quit IRC | 15:53 | |
mordred | I think we've confused about four different conversations here | 15:53 |
annegentle | jeblair: I'm fine with not waiting on todd but need more incentive | 15:53 |
Ajaeger | annegentle: something to discuss in Atlanta I guess | 15:54 |
annegentle | mordred: that's four more fun! | 15:54 |
mordred | the incentive is that docs publication is a special pony right now | 15:54 |
* anteaya would like to focus on the fire fighting in the gate | 15:54 | |
mordred | and cloud sites are a bit of a pita to deal with | 15:54 |
annegentle | mordred: not enough incentive | 15:54 |
*** tjones has joined #openstack-infra | 15:54 | |
annegentle | mordred: not with a month and a week before an rc | 15:54 |
annegentle | mordred: mostly it's timing | 15:54 |
jeblair | annegentle: for starters, we have the problem we're talking about now where you have to delete things; but moreover, we need to stop using jenkins, and the ftp publishing is not really compatible with what we're moving to | 15:54 |
jeblair | annegentle: we're not doing it now! :) | 15:54 |
mordred | what jeblair said | 15:55 |
annegentle | jeblair: yes then timing is all I'm concerned with. | 15:55 |
mordred | god no. not this instant | 15:55 |
Ajaeger | mordred: documentation will always be special ;) But yeah, let's make it less special :) | 15:55 |
annegentle | jeblair: when does jenkins go away | 15:55 |
mordred | ok. that makes more sense | 15:55 |
*** dcramer_ has joined #openstack-infra | 15:55 | |
mordred | annegentle: as soon as we can make it go away, which means we need to get rid of a few things, like ftp publishing | 15:55 |
annegentle | a redesign is HIGH priority too because of translation, versioning, old files, all that | 15:55 |
mordred | but - when I say "as soon" - I mean without affecting things like FF | 15:56 |
*** vrovachev has joined #openstack-infra | 15:56 | |
Ajaeger | jeblair: so, one part of moving off Jenkins - and fixing a bug with image api publishing - is getting https://review.openstack.org/73690 in ;) | 15:56 |
annegentle | not trying to conflate redesign with building, but to me they're tightly tied due to what all a redesign can also fix | 15:56 |
jeblair | annegentle: don't worry, the process will be working and tested and in use and in production before we move the docs | 15:56 |
annegentle | jeblair: you know I trust you guys, just trying to make sure you know the importance of a redesign (since you work at the Foundation I tell you these things too) | 15:56 |
jeblair | annegentle: we should keep these things in mind so that the two projects don't make incompatible decisions, but for the most part, they really are separate and we shouldn't tie one to the other -- it could just slow both down | 15:57 |
jeblair | annegentle: i wish i could make todd go faster. :) | 15:57 |
*** amotoki_ has joined #openstack-infra | 15:58 | |
jeblair | annegentle: but he's constantly getting sucked into side projects, and so the larger project of "improve how the website (all of it) is published" seems to move slowly :( | 15:58 |
jeblair | annegentle: believe me, i'm as interested in todd completing this kind of work as you are. | 15:58 |
annegentle | jeblair: don't we all :) Yes, it's a tough rock/hard place position | 15:58 |
*** banix has joined #openstack-infra | 15:59 | |
annegentle | jeblair: and I'm happy to be convinced of a 2-phase approach, pulling too many levers at once is probably folly | 15:59 |
annegentle | phase 1: un-jenkins phase 2: remove webhelp output | 15:59 |
*** salv-orlando has joined #openstack-infra | 16:00 | |
jeblair | annegentle: cool; we're about 2 years into taking baby steps to remove jenkins and getting near the end. we _try_ to not bite off more than we can chew. | 16:01 |
fungi | if you can dislocate your jaw like a snake, it helps too | 16:02 |
fungi | no chewing required | 16:03 |
persia | Digestion takes longer that way, though | 16:03 |
jeblair | fungi: the graph suggests we have used nodes! | 16:03 |
*** salv-orlando_ has joined #openstack-infra | 16:03 | |
*** salv-orlando has quit IRC | 16:04 | |
*** salv-orlando_ is now known as salv-orlando | 16:04 | |
fungi | jeblair: yep. i'm still churning through the ready ones, but should start blowing through the old building/delete nodes here shortly | 16:04 |
*** dpyzhov has joined #openstack-infra | 16:05 | |
*** amcrn has joined #openstack-infra | 16:05 | |
fungi | the gate sparkline has dropped sharply and check has at least plateaued now | 16:05 |
fungi | so we've regained forward momentum | 16:05 |
anteaya | will that neutron failure cause a gate reset? | 16:06 |
anteaya | I can remove it if so | 16:06 |
anteaya | I'm thinking it won't | 16:06 |
jeblair | anteaya: zuul already did that for you | 16:06 |
anteaya | great | 16:07 |
fungi | it caused a gate reset a while ago, but the benefit of nnfi is that it shields us somewhat from the pain of resets as long as there aren't multiple failing changes causing the broken ones further down to get retried repeatedly | 16:07 |
*** sandywalsh_ has joined #openstack-infra | 16:07 | |
fungi | the state of that change is "failed assuming everything in front of it succeeds" | 16:07 |
anteaya | yes | 16:07 |
openstackgerrit | Petr Blaho proposed a change to openstack-infra/config: Adds gate-python-tuskarclient-docs job to zuul. https://review.openstack.org/74757 | 16:08 |
openstackgerrit | Petr Blaho proposed a change to openstack-infra/config: Adds gate-tuskar-docs job to zuul. https://review.openstack.org/74756 | 16:08 |
anteaya | and the failure was FAIL: process-returncode | 16:08 |
jeblair | only about 2k more keypairs to delete from hpcloud | 16:08 |
fungi | jeblair: nearly done! | 16:08 |
*** roz has joined #openstack-infra | 16:08 | |
anteaya | which we have seen before and which clarkb has said in an email is due to sys.exit being used in the tests | 16:08 |
anteaya | it is the line between what zuul can do for me and what I need to do manually that I am trying to get better at figuring out | 16:09 |
anteaya | now the one ahead of it is failing | 16:09 |
anteaya | so two of them | 16:09 |
fungi | yep, but it depends on that one, so it won't get retried unless something ahead of those also fails | 16:10 |
fungi | but that has caused the two cinder changes in the gate to get tests restarted without the neutron changes in line | 16:10 |
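A toy illustration of that queue behaviour (not Zuul's actual implementation): each change is tested on top of the nearest non-failing changes ahead of it, so known failures are simply skipped when rebuilding the speculative state for the changes behind them.

    def speculative_bases(queue, failed):
        """For each change, the changes ahead of it that its test run
        assumes will merge; failing changes are skipped over."""
        bases, good_ahead = {}, []
        for change in queue:
            bases[change] = list(good_ahead)
            if change not in failed:
                good_ahead.append(change)
        return bases

    queue = ['neutron-1', 'neutron-2', 'cinder-1', 'cinder-2']
    print(speculative_bases(queue, failed={'neutron-1', 'neutron-2'}))
    # the cinder changes are retested without either neutron change ahead of them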
fungi | i think i have time today to try and get the py3k-precise nodepool nodes into operation | 16:11 |
fungi | if things don't get bad again | 16:12 |
roz | quick question: can only the owner of a change mark it as WIP? I am working on a change where I am not the original author and I'd like to submit a patch as WIP but I can't do it. Options: submit it as DRAFT, or submit the patch and put a note in a comment "This is a WIP"? any other suggestions? | 16:12 |
*** salv-orlando has quit IRC | 16:12 | |
anteaya | roz: don't use draft | 16:12 |
fungi | roz: there's an acl which controls that. for most openstack projects the core reviewer group on that project also has wip control | 16:12 |
*** yassine has quit IRC | 16:13 | |
*** yassine has joined #openstack-infra | 16:13 | |
fungi | roz: but i think the heart of the issue here is that gerrit leaves the change owner as the original patchset submitter rather than the most recent patchset submitter. i'm curious to see whether that's configurable in latest gerrit releases | 16:14 |
fungi | zaro: ^ ? (when you're around) | 16:14 |
*** Sukhdev has joined #openstack-infra | 16:14 | |
*** DinaBelova_ is now known as DinaBelova | 16:15 | |
roz | thanks for the replies. When you say a core reviewer can control the WIP, do you mean they can mark a change as WIP or that they can add me as "WIP controller" for that specific change? | 16:15 |
anteaya | they can mark the change WIP | 16:16 |
anteaya | they can't change what permissions you have | 16:16 |
fungi | roz: or un-wip a wip change too | 16:16 |
anteaya | unless they make you core | 16:16 |
roz | thanks, now it's clear. | 16:16 |
fungi | roz: right, it's an acl covering wip control for an entire project--can't be assigned on a per-change basis except by modifying the owner of the change (which i don't think our current gerrit release has a feature to make that easy) | 16:17 |
*** pcrews has joined #openstack-infra | 16:19 | |
fungi | most of the old ready nodes are gone, and i've started some processes deleting old building nodes next | 16:19 |
fungi | oh, also, i was wrong about disappearing at 21:00 today for the osug monthly... i also have a tax appointment prior to that, so will actually be mostly offline starting at 19:00 utc | 16:20 |
anteaya | happy tax appointment | 16:20 |
fungi | so a little over a couple hours from now | 16:20 |
anteaya | I hope you exit smiling | 16:20 |
*** rossella has joined #openstack-infra | 16:21 | |
anteaya | will clarkb be back today? | 16:21 |
*** mrmartin has quit IRC | 16:21 | |
fungi | anteaya: i believe he was back in seattle last night | 16:21 |
*** tjones has quit IRC | 16:21 | |
anteaya | yes | 16:21 |
*** rossella has quit IRC | 16:22 | |
anteaya | I'm also gone for the day soon | 16:22 |
anteaya | another appointment to fix my back/neck/head | 16:22 |
anteaya | hopefully this should wrap it up | 16:22 |
*** rossella-s has quit IRC | 16:23 | |
vrovachev | hi guys, please, review me: https://review.openstack.org/#/c/74342/ | 16:24 |
*** afazekas has joined #openstack-infra | 16:24 | |
*** david_lyle_ has joined #openstack-infra | 16:24 | |
anteaya | does the post job upstream-translation-update require a specific kind of node? lots of post jobs waiting for that job to get a node | 16:25 |
*** sarob has quit IRC | 16:25 | |
anteaya | the only specific node I am aware of is centos for python26 jobs | 16:26 |
fungi | anteaya: yes, there is a trusted static node named "proposal" assigned to jenkins.o.o which runs those, in order, one at a time | 16:26 |
fungi | so they have a tendency to queue up | 16:26 |
anteaya | ah | 16:26 |
fungi | oh! and it got marked offline | 16:26 |
anteaya | I don't see any of them running | 16:26 |
anteaya | okay | 16:27 |
fungi | looks like i need to patch the regex for the jobs which run on it. checking to see what it ran last | 16:27 |
anteaya | k | 16:27 |
*** rossella-s has joined #openstack-infra | 16:27 | |
*** jcoufal has quit IRC | 16:28 | |
*** david-lyle has quit IRC | 16:28 | |
*** yassine has quit IRC | 16:28 | |
*** yassine has joined #openstack-infra | 16:29 | |
*** chuck__ has joined #openstack-infra | 16:29 | |
fungi | seems i missed setting propose-requirements-updates to add the reusable_node parameter function. i've re-onlined the slave so it should burn through those fairly quickly unless it hits another propose-requirements-update job before we merge the fix | 16:29 |
fungi | er, propose-requirements-updates | 16:30 |
anteaya | okay | 16:30 |
anteaya | I'll watch it | 16:30 |
anteaya | one down | 16:31 |
*** Ajaeger has quit IRC | 16:31 | |
*** esker has quit IRC | 16:32 | |
*** esker has joined #openstack-infra | 16:32 | |
anteaya | fungi: how long does it need between jobs? | 16:35 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/nodepool: Check server status in batch https://review.openstack.org/74773 | 16:35 |
openstackgerrit | Jeremy Stanley proposed a change to openstack-infra/config: Don't offline after propose-requirements-updates https://review.openstack.org/74774 | 16:35 |
jeblair | fungi, clarkb, mordred: https://review.openstack.org/74773 is another fairly small nodepool change that should make a huge difference | 16:36 |
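(The gain from 74773 comes from replacing per-node status lookups with a single listing call. A minimal sketch of that batching idea, assuming a python-novaclient `Client` object; the function names are illustrative and not nodepool's actual code:)

```python
def server_statuses(nova):
    # One GET /servers/detail returns every instance in the account, so
    # checking N nodes costs one API round trip instead of N.
    return {server.id: server.status for server in nova.servers.list()}

def check_nodes(nova, node_ids):
    statuses = server_statuses(nova)
    # Anything missing from the listing has presumably been deleted already.
    return {node_id: statuses.get(node_id, 'DELETED') for node_id in node_ids}
```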
anteaya | fungi: it is not currently running | 16:36 |
fungi | anteaya: looks like it takes about 1-1.5 minutes per job... https://jenkins.openstack.org/computer/proposal.slave.openstack.org/ | 16:36 |
anteaya | to finish the job | 16:36 |
anteaya | but to move from one to the other? | 16:36 |
fungi | anteaya: a few seconds | 16:36 |
*** sandywalsh_ has quit IRC | 16:36 | |
anteaya | okay, can you check it again | 16:36 |
anteaya | it isn't running, been at least 20 seconds | 16:37 |
*** esker has quit IRC | 16:37 | |
fungi | anteaya: it's running | 16:37 |
fungi | i just watched it complete a glance translation update and start a ceilometer one | 16:37 |
*** tjones has joined #openstack-infra | 16:38 | |
anteaya | I don't even see a glance patch in post | 16:38 |
anteaya | but at least it is running | 16:38 |
*** gokrokve has quit IRC | 16:38 | |
anteaya | thanks for the link | 16:38 |
fungi | anteaya: it might not have been in the post queue | 16:38 |
anteaya | oh | 16:38 |
anteaya | I had just been watching the post queue | 16:38 |
*** smarcet has quit IRC | 16:38 | |
fungi | it was in the periodic pipeline... https://jenkins.openstack.org/job/glance-propose-translation-update/325/parameters/ | 16:38 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/nodepool: Check server status in batch https://review.openstack.org/74773 | 16:39 |
*** gokrokve has joined #openstack-infra | 16:39 | |
dmsimard | jeblair: Got another merge fail on that same git bug again, is there another affected slave ? https://review.openstack.org/#/c/74082/ | 16:39 |
anteaya | ah, okay thanks | 16:40 |
SergeyLukjanov | jeblair, fwiw #2 lgtm (https://review.openstack.org/74773) | 16:40 |
fungi | dmsimard: looks like it might be a broken repository on one of the git farm servers. i'll double-check that | 16:40 |
dmsimard | fungi: Thanks, I submitted https://bugs.launchpad.net/openstack-ci/+bug/1282136 FYI | 16:40 |
uvirtbot | Launchpad bug 1282136 in openstack-ci "Git problem: "Failed to resolve 'HEAD' as a valid ref."" [Undecided,New] | 16:40 |
*** ravikumar_hp has joined #openstack-infra | 16:41 | |
ravikumar_hp | quick question - What is Jenkins URL that runs nightly jobs | 16:42 |
anteaya | ravikumar_hp: we have 7 jenkins | 16:42 |
anteaya | they all run jobs | 16:42 |
SergeyLukjanov | anteaya, 8 | 16:42 |
SergeyLukjanov | :) | 16:43 |
*** gyee has joined #openstack-infra | 16:43 | |
fungi | ravikumar_hp: http://logs.openstack.org/periodic/ | 16:43 |
ravikumar_hp | ok. Jenkins that runs Tempest nightly jobs | 16:43 |
anteaya | yes 8 | 16:43 |
*** gokrokve has quit IRC | 16:43 | |
fungi | anteaya: SergeyLukjanov: technically 9 if you also count jenkins-dev | 16:43 |
anteaya | yes | 16:43 |
SergeyLukjanov | yeah ^) | 16:43 |
fungi | though for tempest periodic jobs, only 7 of them run those | 16:44 |
anteaya | I was going to move to figuring out what ravikumar_at_mothership wanted | 16:44 |
*** andreaf has joined #openstack-infra | 16:44 | |
anteaya | but you beat me to it | 16:44 |
ravikumar_hp | anteaya: i am trying to find out if there is a Tempest job that runs every day other than the gated tests | 16:45 |
jeblair | dmsimard: thanks! i was quiet because i was quickly sshing into that slave, which hadn't been deleted | 16:45 |
jeblair | dmsimard: it does indeed look like the cached git repo for puppet-swift on that node was bad; i saved a copy of it | 16:46 |
dmsimard | jeblair: We ran into the same issue for puppet-neutron, I did a recheck and it worked - I linked it to the bug | 16:46 |
fungi | jeblair: dmsimard: yes, my eye jumped to the remote update, but the git farm's copies of that repo seem fine | 16:46 |
fungi | and 'git clone file:///opt/git/stackforge/puppet-swift' was the local source of the issue | 16:47 |
fungi | wonder if it's bad on the image in that provider region | 16:47 |
jeblair | fungi: yeah, i'm going to check that next | 16:47 |
*** beagles has quit IRC | 16:47 | |
*** dkliban is now known as dkliban_afk | 16:47 | |
*** virmitio has joined #openstack-infra | 16:48 | |
anteaya | ravikumar_hp: I'm looking here: http://git.openstack.org/cgit/openstack-infra/config/tree/modules/openstack_project/files/zuul/layout.yaml | 16:48 |
*** tjones has quit IRC | 16:48 | |
anteaya | I don't have the answer yet, but you are welcome to join me | 16:48 |
jeblair | dmsimard, fungi: both failures were puppet-swift in az2; | 16:48 |
jeblair | dmsimard: did you say you saw a puppet-neutron as well? | 16:48 |
fungi | jeblair: sounds like a strong correlation | 16:48 |
fungi | sample size 2 ;) | 16:49 |
*** sandywalsh_ has joined #openstack-infra | 16:49 | |
*** oubiwann has quit IRC | 16:49 | |
dmsimard | jeblair: Yeah, I linked it, it's on http://status.openstack.org/rechecks/ - review is https://review.openstack.org/#/c/74709/ | 16:49 |
jeblair | also az2 | 16:49 |
ravikumar_hp | anteaya: ok. Thanks | 16:49 |
anteaya | ravikumar_hp: and in the periodic logs that fungi linked you to: http://logs.openstack.org/periodic/ | 16:50 |
anteaya | you can see all the periodic tempest job logs | 16:50 |
ravikumar_hp | anteaya: ok | 16:50 |
*** jaypipes has joined #openstack-infra | 16:50 | |
fungi | jeblair: also, last successful build of that image was 181.65 hours ago | 16:50 |
anteaya | ravikumar_hp: did you have more to your question or does that give you the information you need? | 16:50 |
fungi | hpcloud-az2 really does not like to build images | 16:51 |
jeblair | no it doesn't | 16:51 |
anteaya | I need to change tasks and don't want to leave you hanging | 16:51 |
*** tjones has joined #openstack-infra | 16:51 | |
*** b3nt_pin has joined #openstack-infra | 16:51 | |
ravikumar_hp | anteaya: i got the information. That's it .Thanks . | 16:51 |
*** sarob has joined #openstack-infra | 16:51 | |
anteaya | ravikumar_hp: great | 16:51 |
*** b3nt_pin is now known as beagles | 16:51 | |
*** beagles is now known as beagles_brb | 16:52 | |
*** hemnafk is now known as hemna_ | 16:53 | |
*** smarcet has joined #openstack-infra | 16:53 | |
*** sarob_ has joined #openstack-infra | 16:53 | |
*** tjones has quit IRC | 16:54 | |
*** markmcclain has joined #openstack-infra | 16:54 | |
*** tjones has joined #openstack-infra | 16:55 | |
jeblair | fungi: the git repos with the latest timestamps are all bad. perhaps we didn't call sync three times while spinning in a circle. | 16:55 |
*** esker has joined #openstack-infra | 16:55 | |
dmsimard | lol ? | 16:55 |
fungi | that takes me back | 16:56 |
*** sarob has quit IRC | 16:56 | |
medieval1 | XYZZY | 16:56 |
*** esker has quit IRC | 16:56 | |
fungi | sync ; sleep 10 ; sync ; sleep 10; sync; sleep 10 ; shutdown -h now | 16:56 |
jeblair | fungi: prepare_devstack has a sync, but not prepare_node, which is where the clones are | 16:56 |
fungi | PLUGH | 16:56 |
*** sabari_ has joined #openstack-infra | 16:57 | |
jeblair | (and you can bet the sync is in prepare_devstack because it's needed) | 16:57 |
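(The actual prepare scripts are shell, but the shape of the fix proposed in 74780 is simple: flush filesystem buffers after populating the git cache, before the provider snapshots the image. A hedged Python rendering with illustrative paths, not the real script:)

```python
import os
import subprocess

def prepare_git_cache(repos, cache_root='/opt/git'):
    for repo in repos:
        subprocess.check_call(
            ['git', 'clone', 'https://git.openstack.org/%s' % repo,
             os.path.join(cache_root, repo)])
    # Without this, a snapshot taken right after the clones can capture
    # half-written pack files -- the "Failed to resolve 'HEAD'" symptom above.
    subprocess.check_call(['sync'])
```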
annegentle | fungi: so I know you said the merge would look funny, but https://review.openstack.org/#/c/74777/ is the result of my trying to merge openstack/operations-guide feature/edits with master | 16:57 |
annegentle | fungi: seems to be blank (no changes, just a commit message) | 16:57 |
*** sabari_ is now known as sabari | 16:57 | |
fungi | annegentle: sometimes it's funnier. but it's never a laugh riot or anything | 16:57 |
annegentle | fungi: *snort* | 16:57 |
*** e0ne has quit IRC | 16:58 | |
dmsimard | jeblair, fungi: You guys let me know when to try a reverify :) | 16:58 |
annegentle | fungi: I followed the steps in https://wiki.openstack.org/wiki/GerritJenkinsGit#Merge_Commits with git checkout -b oreilly/71943 remotes/origin/feature/edits as my first step | 16:58 |
*** ociuhandu has quit IRC | 16:58 | |
fungi | annegentle: anyway, if your ops guide jobs include draft building, you should be able to preview the result there from the check run before approving | 16:58 |
annegentle | fungi: it really is supposed to look like that? | 16:59 |
* annegentle is freaked out :) | 16:59 | |
fungi | annegentle: usually, yes | 16:59 |
annegentle | fungi: no way. Okay! | 16:59 |
annegentle | I'll wait for it to build then! Nice | 16:59 |
*** jcooley_ has joined #openstack-infra | 16:59 | |
fungi | annegentle: the critical part is the "parent(s)" field there... you can see it lists the commits you're merging | 16:59 |
jeblair | dmsimard: at this point, i think the lack of sync is the problem in the image build. i'll fix it but it'll take a few hours to work through the system; you can play the odds if you like, or come back to it later in the day for better odds. | 17:00 |
jeblair | dmsimard: considering the state of the backlog, if you can do the latter, that would probably be best | 17:00 |
fungi | annegentle: and also the "branch" field which tells you which branch you're merging them on | 17:00 |
jeblair | dmsimard: i'll update the bug in a minute; thank you very much for catching this and pointing me at a live server! | 17:00 |
fungi | annegentle: presumably the two parents are one from each branch you're trying to merge | 17:00 |
annegentle | fungi: okay, I see parents now. | 17:00 |
*** dkliban_afk is now known as dkliban | 17:00 | |
annegentle | fungi: so I've got a spreadsheet with all the patches I need to go to feature/edits at https://docs.google.com/spreadsheet/ccc?key=0AhXvn1h0dcyYdGtiRXo5ODFMbkhRZkVROGdTY3RjWVE#gid=0 and I'll just go through the list from oldest to newest | 17:01 |
*** dpyzhov has quit IRC | 17:01 | |
annegentle | fungi: and I think that helps me figure out parentage | 17:01 |
annegentle | fungi: woops, gotta get on a call, thanks for the help! | 17:01 |
dmsimard | jeblair: np, I appreciate you fixing the issue - i'm the one thanking you, here :p | 17:01 |
*** jaypipes has quit IRC | 17:02 | |
BobBall | I've managed to break my environment around pip and pbr while playing with nodepool... I may have deleted something I shouldn't have done. http://paste.openstack.org/show/67315/ Can anyone suggest how I can uninstall / reinstall PBR in a sensible way? | 17:03 |
*** thomasbiege has joined #openstack-infra | 17:03 | |
*** derekh has quit IRC | 17:03 | |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Add sync calls to all prepare scripts https://review.openstack.org/74780 | 17:03 |
jeblair | oh wait let me attach the bug to that | 17:04 |
*** jaypipes has joined #openstack-infra | 17:04 | |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Add sync calls to all prepare scripts https://review.openstack.org/74780 | 17:05 |
*** jcooley_ has quit IRC | 17:05 | |
*** markmc has quit IRC | 17:06 | |
jeblair | BobBall: delete the virtualenv and start over? | 17:07 |
BobBall | I wish I had done this in a virtual environment... :P | 17:07 |
jeblair | oh, i thought /usr/workspace/scratch/openstack/citrix/nodepool/easy-install.pth was in a venv | 17:07 |
BobBall | It might have been at one point - I'm still getting used to using venvs by default, and so perhaps the issue is I might have installed it in a venv and then done something else outside the venv that broke it or similar | 17:08 |
BobBall | that file doesn't exist | 17:08 |
*** zul has quit IRC | 17:08 | |
*** chuck__ has quit IRC | 17:08 | |
*** zul has joined #openstack-infra | 17:09 | |
BobBall | the venv environment I have works great - but I'm trying to fix my system so I don't have to be in a venv to use novaclient :P | 17:09 |
jeblair | BobBall: then at this point i usually go mucking about and try to remove things manually | 17:09 |
jeblair | BobBall: it's possible mordred may have better advice | 17:09 |
jeblair | mordred: btw, do know the relative merits of https://review.openstack.org/#/c/74521/ vs https://review.openstack.org/#/c/74523/ ? | 17:10 |
SergeyLukjanov | jeblair, all scripts are based on prepare_node, is it 'as designed' to sync twice? | 17:10 |
fungi | jeblair: so... there are jobs running, but not very many. looking at the jenkins masters' webuis, some have no assigned nodes at all, some have nodes but they're all marked offline, some have nodes running jobs but none have a bunch | 17:10 |
*** jcooley_ has joined #openstack-infra | 17:10 | |
jeblair | SergeyLukjanov: yes, so that we don't have to think about it. :) | 17:11 |
SergeyLukjanov | jeblair, can't disagree ;) | 17:11 |
fungi | nodepoold even after restarting and deleting everything, seems to have 574 nodes on jenkins04 | 17:12 |
*** sandywalsh_ has quit IRC | 17:12 | |
jeblair | fungi: oh, but those were false-ready nodes | 17:12 |
jeblair | fungi: and probably need to be deleted | 17:12 |
jeblair | fungi: i think nodepool marks them ready _before_ adding them to jenkins | 17:12 |
fungi | jeblair: i deleted any nodes which were marked ready at the time of the restart | 17:12 |
jeblair | oh | 17:13 |
fungi | so these are all new since the restart | 17:13 |
*** sarob_ has quit IRC | 17:13 | |
*** cadenzajon has joined #openstack-infra | 17:13 | |
*** sarob has joined #openstack-infra | 17:13 | |
*** amotoki_ has quit IRC | 17:14 | |
fungi | http://paste.openstack.org/show/67316/ is the current breakdown for jenkins04 according to nodepool | 17:14 |
fungi | skimming its webui, i believe the used and delete counts, but not the ready | 17:15 |
jeblair | fungi: i think the jenkins04 manager is stuck again waiting for a response | 17:18 |
jeblair | fungi: nodepool reports the connection ESTABLISHED but it doesn't show up on jenkins04 | 17:18 |
*** sarob has quit IRC | 17:18 | |
fungi | maybe jenkins04 is struggling? i can put it into shutdown and see what happens when i restart nodepoold | 17:19 |
jeblair | fungi: hrm, i wouldn't expect a half-closed connection as a result of that | 17:20 |
fungi | or maybe this is ongoing network issues in dfw | 17:20 |
jeblair | fungi: seems more likely that we're losing fin packets | 17:20 |
*** mrmartin has joined #openstack-infra | 17:21 | |
openstackgerrit | A change was merged to openstack/requirements: Sync requirements to oslo.vmware https://review.openstack.org/74569 | 17:22 |
jeblair | fungi: nm, jenkins04 is moving | 17:23 |
fungi | this looks decidedly non-graceful... http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=1411&rra_id=all | 17:23 |
*** chandankumar_ has joined #openstack-infra | 17:24 | |
fungi | checking to see whether there's anything obvious from the javamelody side | 17:25 |
jeblair | fungi: there are a lot of offline nodes on jenkins04 | 17:25 |
fungi | https://jenkins04.openstack.org/monitoring?part=graph&graph=fileDescriptors | 17:25 |
fungi | jeblair: yeah, it seems like nodepool may be having trouble adding or deleting nodes from jenkins04 | 17:25 |
*** khyati has joined #openstack-infra | 17:26 | |
*** nicedice has joined #openstack-infra | 17:26 | |
openstackgerrit | A change was merged to openstack-infra/config: Don't offline after propose-requirements-updates https://review.openstack.org/74774 | 17:26 |
fungi | i wonder if it's hitting an open file descriptors limit | 17:26 |
jeblair | fungi: nodepool's interactions with it are _very_ slow | 17:26 |
jeblair | fungi: so maybe it is struggling | 17:26 |
*** chandankumar_ has quit IRC | 17:27 | |
jeblair | 5-10 seconds between api calls | 17:27 |
fungi | and the number of open files flatlining at 4k for long periods seems suspiciously like a max | 17:28 |
*** sandywalsh_ has joined #openstack-infra | 17:28 | |
jeblair | fungi: jenkins04 has 413 slaves attached to it which is considerably more than our intent | 17:28 |
fungi | right. just wondering what caused it to get so many new slaves assigned after the nodepool restart | 17:28 |
fungi | it was similarly the one with most of the slaves before the nodepool restart (though i deleted those). could the predictive assignment in nodepool be misinterpreting that? | 17:30 |
fungi | thinking it wants to run that many jobs? | 17:30 |
jeblair | fungi: it tries to balance across all providers that are up. that does mean that if a provider comes up with 0 nodes, it's going to try to catch it up to the others quickly | 17:31 |
fungi | that sounds like the reverse of what we see here then | 17:32 |
*** gokrokve has joined #openstack-infra | 17:32 | |
fungi | so maybe it's not a feedback loop problem | 17:32 |
*** gyee has quit IRC | 17:33 | |
*** oubiwann has joined #openstack-infra | 17:33 | |
jeblair | fungi: which restart are you thinking of? it never did a mass-allocation to 04 around 15:30 | 17:34 |
fungi | no, i think it builds back up on 04 | 17:34 |
*** beagles_brb is now known as beagles | 17:35 | |
fungi | the established tcp connections and open file descriptor graphs look like they might be proportional to the number of connected slaves | 17:35 |
jeblair | fungi: it seems that even nodepool thinks 04 has all the slaves | 17:35 |
fungi | they start to ramp up almost linearly from the time of the nodepool restart | 17:35 |
jeblair | just about | 17:36 |
*** jcoufal has joined #openstack-infra | 17:36 | |
markmcclain | with the current jenkins issue, is it expected to lose a release job? | 17:37 |
markmcclain | I pushed this tag: http://git.openstack.org/cgit/openstack/python-neutronclient/tag/?id=2.3.4 | 17:37 |
markmcclain | but the release job disappeared from zuul | 17:37 |
jeblair | fungi: okay, i'm going to load the db locally and debug the allocator. in the mean time, why don't you put 04 into shutdown and see if it redistributes after that. | 17:40 |
fungi | markmcclain: looks like jenkins04 ate it... http://logs.openstack.org/59/5931316dd7cddd6834eed6bd9665bd5ef7adbffc/release/python-neutronclient-tarball/0f27e48/console.html | 17:40 |
fungi | jeblair: will do. that was going to be my next suggestion | 17:40 |
fungi | markmcclain: i'll retrigger it once we get this going again | 17:41 |
*** pblaho has quit IRC | 17:41 | |
markmcclain | fungi: thanks | 17:41 |
fungi | jeblair: manually deleting the ready/building nodes assigned to jenkins04 now that it's in shutdown | 17:42 |
*** luqas has quit IRC | 17:43 | |
jeblair | fungi: so a lot of those actually still have active threads trying to add them | 17:43 |
fungi | i can hold off if you like | 17:43 |
jeblair | fungi: it's probably okay. i think it will cause a lot of errors in the daemon, but it should be ok. | 17:44 |
jeblair | carry on | 17:44 |
fungi | proceeding in that case | 17:44 |
*** esker has joined #openstack-infra | 17:44 | |
*** wenlock has quit IRC | 17:45 | |
openstackgerrit | Devananda van der Veen proposed a change to openstack-infra/config: Let infra manage pyghmi releases https://review.openstack.org/74499 | 17:45 |
*** sandywalsh_ has quit IRC | 17:46 | |
*** sarob has joined #openstack-infra | 17:46 | |
clarkb | morning | 17:47 |
*** basha has joined #openstack-infra | 17:48 | |
*** hashar has quit IRC | 17:48 | |
*** packet has joined #openstack-infra | 17:48 | |
*** Ryan_Lane has quit IRC | 17:49 | |
*** max_lobur is now known as max_lobur_afk | 17:50 | |
fungi | clarkb: welcome to the continuation of "what can possibly break next?" | 17:51 |
clarkb | jenkins04 is in trouble? | 17:51 |
jeblair | clarkb: nodepool is being mean to it | 17:52 |
fungi | as software goes, nodepool really can be a bit of a bully | 17:52 |
*** markwash has joined #openstack-infra | 17:58 | |
anteaya | morning clarkb | 17:58 |
*** mrmartin has quit IRC | 17:59 | |
*** sandywalsh_ has joined #openstack-infra | 17:59 | |
*** dangers_away is now known as dangers | 18:00 | |
*** rossella-s has quit IRC | 18:00 | |
openstackgerrit | Henry Gessau proposed a change to openstack-infra/config: Incompatible chrome extension has been fixed https://review.openstack.org/74796 | 18:00 |
*** jpich has quit IRC | 18:02 | |
*** thomasbiege has quit IRC | 18:05 | |
*** hogepodge has joined #openstack-infra | 18:05 | |
clarkb | fungi: to answer your question, elasticsearch. The cluster fell over around 0842UTC today | 18:07 |
clarkb | I have restarted elasticsearch6, which was the only node not back in the cluster at this point, and ES is recovering shards to go back to all green | 18:08 |
fungi | clarkb: that was not a good time for, well, anything running in dfw i suspect | 18:08 |
*** david_lyle_ is now known as david_lyle | 18:08 | |
clarkb | oh did dfw have a bad time? | 18:08 |
clarkb | I am still trying to catch up on everything, but ES is on its way to being happy again so I can move onto the next thing | 18:09 |
fungi | ahh, you probably haven't had time to read scrollback | 18:09 |
clarkb | nope | 18:09 |
anteaya | dfw had a bad time yesterday | 18:09 |
fungi | yes, rax-dfw network outage | 18:09 |
anteaya | which you came in towards the end of | 18:09 |
*** chris_johnson has joined #openstack-infra | 18:09 | |
fungi | today utc though | 18:09 |
anteaya | then we went to bed, except Sergey - credit to him for not doing anything drastic | 18:09 |
anteaya | and then dfw had problems again today | 18:10 |
anteaya | cleanup is underway | 18:10 |
fungi | apparently rax-dfw problem was just after 08:30 utc | 18:10 |
clarkb | oh that explains why my weechat derped | 18:10 |
anteaya | and debugging to see what we can do since dfw might have more problems | 18:10 |
clarkb | fungi: that lines up perfectly with ES cluster issues, I won't dig into them too deeply then. I may increase the wait for master timeout though | 18:11 |
*** ildikov_ has quit IRC | 18:11 | |
fungi | clarkb: yeah, i don't know what the exact duration was, but we can guess from gaps in cacti graphs | 18:11 |
*** chandan_kumar has quit IRC | 18:12 | |
anteaya | oh and fungi is afk for a good portion of the afternoon | 18:13 |
*** basha has quit IRC | 18:13 | |
anteaya | and so am I | 18:13 |
*** Sukhdev has quit IRC | 18:14 | |
fungi | yeah, i need to vaporize in about 45 minutes | 18:15 |
lifeless | fungi: speaking of said patches, i haven't looked at reviews yet; are either of them acceptable? | 18:16 |
fungi | lifeless: i basically haven't reviewed anything in the past 24 hours which was > 1 line long unless it was addressing an in-progress firefight | 18:17 |
lifeless | ack | 18:17 |
*** esker has quit IRC | 18:18 | |
openstackgerrit | A change was merged to openstack-infra/config: Add sync calls to all prepare scripts https://review.openstack.org/74780 | 18:18 |
fungi | okay, retriggered markmcclain's tarball job, only to discover that the authentication error it failed on the first time doesn't seem to be related to jenkins04 issues after all... got the same on jenkins05 now: http://logs.openstack.org/59/5931316dd7cddd6834eed6bd9665bd5ef7adbffc/release/python-neutronclient-tarball/0f27e48,1/console.html | 18:19 |
*** jcooley_ has quit IRC | 18:20 | |
fungi | checking logs on static.o.o | 18:20 |
*** nati_ueno has joined #openstack-infra | 18:21 | |
*** johnthetubaguy has quit IRC | 18:21 | |
clarkb | is it trying to use the credential store for scp now as well? | 18:22 |
clarkb | might explain oddness in scp'ing to tarballs if the credentials stuff changed there | 18:22 |
jeblair | lifeless: you and derekh both seemed to propose a patch that does similar things; is that correct? should we choose one or the other? | 18:22 |
fungi | nice! "Feb 19 18:09:10 static sshd[32104]: Invalid user hudson from 162.242.149.179" | 18:22 |
fungi | apparently the jenkins upgrade/downgrade has mucked with credentials | 18:23 |
jeblair | lifeless: i haven't reviewed yet, but knowing what to do with those two might help | 18:23 |
lifeless | jeblair: derekh and I independently approached the problem, now you get to choose | 18:23 |
lifeless | jeblair: I will review his; I think on his description that both approaches are valid | 18:24 |
jeblair | lifeless: ok, thanks. that will help. | 18:24 |
lifeless | jeblair: we probably can do both at the same time in fact | 18:24 |
lifeless | jeblair: though I don't know if that would be needed | 18:24 |
jeblair | belt and braces and a rope and some duct tape too? :) | 18:24 |
lifeless | yes | 18:25 |
lifeless | superglue as well | 18:25 |
ArxCruz | jeblair: regarding https://review.openstack.org/#/c/69715/ which paramiko version are you guys using? because I've tested in fedora19 and it fails because there's no get_tty argument on sshclient | 18:25 |
fungi | yep, so it definitely has the username as "hudson" in the scp publisher for tarballs.o.o on at least two of the masters so far, probably more | 18:25 |
fungi | i'll correct them | 18:26 |
*** jroovers has quit IRC | 18:26 | |
*** chris_johnson is now known as wchrisj|away | 18:26 | |
jeblair | fungi: that's very weird. | 18:26 |
*** jroovers has joined #openstack-infra | 18:26 | |
clarkb | fungi: jeblair: wouldn't be surprised if older jenkins read the config files differently | 18:26 |
fungi | jeblair: i'm making sure nothing else about that publisher got reverted. i think hudson was the name it used back before we folded it onto static.o.o | 18:27 |
openstackgerrit | A change was merged to openstack-infra/zuul: Log components starts in Zuul.Server https://review.openstack.org/66939 | 18:28 |
*** wchrisj|away is now known as chris_johnson | 18:30 | |
lifeless | ArxCruz: +1'd - I am not core in -infra in general, only in pbr | 18:30 |
ArxCruz | lifeless: ;) thanks | 18:31 |
*** chris_johnson has quit IRC | 18:32 | |
*** wchrisj has joined #openstack-infra | 18:32 | |
fungi | it also changed the target directory from /srv/static to /srv (for some reason it didn't alter any of that publisher on jenkins.o.o) | 18:32 |
*** krtaylor has quit IRC | 18:34 | |
*** krtaylor has joined #openstack-infra | 18:36 | |
*** mriedem has quit IRC | 18:37 | |
fungi | yeah, it seems to have only happened on 04-07 so maybe something to do with the way we copied in the configs for those when we built them? | 18:37 |
*** jgallard has quit IRC | 18:37 | |
clarkb | possible since they were created all at once iirc | 18:37 |
fungi | er, 03-07 | 18:37 |
*** jgrimm has quit IRC | 18:38 | |
*** jgrimm has joined #openstack-infra | 18:38 | |
clarkb | oh 03 was before 04-07 so maybe? | 18:38 |
fungi | i thought we created 01+02 at one time, 03+04 together and then 05-07 together | 18:40 |
clarkb | could be, my memory is fuzzy | 18:41 |
clarkb | that was around LCA when a bunch of stuff was happening | 18:41 |
*** mriedem has joined #openstack-infra | 18:41 | |
*** jp_at_hp has quit IRC | 18:42 | |
clarkb | I thought mordred spun up 4 new masters | 18:42 |
fungi | proposal.slave got offlined by another reqs update job before the layout.yaml change made it onto zuul, so i've brought it back online again | 18:42 |
anteaya | we had 3 before lca, and 2 more during lca | 18:42 |
fungi | oh, i guess puppet agent is still disabled on zuul anyway? | 18:42 |
*** morganfainberg_Z is now known as morganfainberg | 18:42 | |
anteaya | then mordred brought up 3 more after that | 18:42 |
*** esker has joined #openstack-infra | 18:42 | |
clarkb | fungi: must be since I thought your change merged | 18:43 |
anteaya | we had jenkins, 01 and 02 before | 18:43 |
anteaya | and 03 and 04 during lca | 18:43 |
*** dizquierdo has quit IRC | 18:43 | |
*** esker has quit IRC | 18:44 | |
anteaya | I remember since my graphic was current on the monday and stale on the tuesday | 18:44 |
*** e0ne has joined #openstack-infra | 18:44 | |
*** esker has joined #openstack-infra | 18:45 | |
*** dcramer_ has quit IRC | 18:46 | |
fungi | markmcclain: https://pypi.python.org/pypi/python-neutronclient/2.3.4 | 18:46 |
markmcclain | fungi: awesome.. thanks | 18:47 |
anteaya | lifeless: ^ | 18:47 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/nodepool: Fix typo in allocation https://review.openstack.org/74803 | 18:47 |
jeblair | fungi: ^ | 18:47 |
jeblair | fungi, clarkb: we're going to want to restart with that soon. understanding that bug leads me to believe that the distribution is currently piling up on a different jenkins | 18:48 |
fungi | jeblair: gah! | 18:48 |
fungi | and yes, i think so | 18:48 |
fungi | i however won't be around for that bit of fun, i suspect | 18:48 |
jeblair | fungi, clarkb: the behavior change that triggered that is the addition of the py3k nodes, which happened to be the last ones in the loop. since there are few of them, the distribution is rather skewed. | 18:48 |
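(For context, the allocator aims for a roughly proportional split of demand across targets, with rounding leftovers landing on whatever comes last in the loop. A toy sketch of that behaviour, illustrative only; the actual fix in 74803 is a one-character typo in nodepool's allocation code, not this function:)

```python
def split_demand(demand, capacity):
    """Split `demand` nodes across targets in proportion to capacity."""
    total = sum(capacity.values())
    if total == 0:
        return dict.fromkeys(capacity, 0)
    allocation = {}
    remaining = demand
    for name in sorted(capacity):
        share = min(remaining, capacity[name],
                    int(round(demand * capacity[name] / float(total))))
        allocation[name] = share
        remaining -= share
    # Rounding leftovers go to whoever still has room, in loop order -- which
    # is how a small target added at the end of the list (the py3k nodes) can
    # skew the distribution badly when the math is slightly off.
    for name in sorted(capacity):
        if remaining <= 0:
            break
        extra = min(remaining, capacity[name] - allocation[name])
        allocation[name] += extra
        remaining -= extra
    return allocation
```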
clarkb | reviewing now | 18:49 |
*** jroovers has quit IRC | 18:49 | |
openstackgerrit | K Jonathan Harker proposed a change to openstack-infra/config: Parameterize the status page urls https://review.openstack.org/74557 | 18:50 |
fungi | for some reason i seem to be unable to bring proposal.slave back online in the jenkins.o.o webui this time... after i click the button it just sits | 18:50 |
clarkb | jeblair: that is a fun typo | 18:51 |
fungi | now adding to the fun, i can't even get the login link on jenkins.o.o to work after logging out and trying to log back in | 18:52 |
fungi | doesn't *seem* to be the dns issue review.o.o was having along those lines yesterday though | 18:52 |
jeblair | fungi: ok if i restart nodepool? (i manually installed that) | 18:52 |
clarkb | elasticsearch is doing a slow recovery :( this is going to be like last week for ES I think | 18:52 |
fungi | jeblair: sure | 18:53 |
clarkb | fungi: I am giving jenkins.o.o and proposal a shot | 18:53 |
clarkb | but logging in seems to be unresponsive for me too | 18:53 |
clarkb | nothing in the jenkins log about it though | 18:53 |
jeblair | restarting and running deletes for nodes in building/delete state | 18:54 |
openstackgerrit | Justin Lund proposed a change to openstack/requirements: Update neutron-client minimum to 2.3.4 https://review.openstack.org/74805 | 18:55 |
*** malini has left #openstack-infra | 18:55 | |
openstackgerrit | Justin Lund proposed a change to openstack/requirements: Update neutron-client minimum to 2.3.4 https://review.openstack.org/74805 | 18:56 |
*** melwitt has joined #openstack-infra | 18:56 | |
*** oubiwann has quit IRC | 18:56 | |
jeblair | all the keypairs are deleted | 18:58 |
anteaya | yay | 18:58 |
anteaya | did it take 13 hours? | 18:59 |
jeblair | anteaya: about | 18:59 |
anteaya | well now we have that datapoint | 18:59 |
clarkb | fungi: apache is throwing proxy timeout errors when trying to log in | 18:59 |
*** dcramer_ has joined #openstack-infra | 18:59 | |
fungi | okay, i'm headed out. i'll get online from when/where i can over the next ~6 hours, and then get some more stuff done later when i'm home again | 18:59 |
jeblair | fungi: have fun | 19:00 |
fungi | jeblair: thanks | 19:00 |
* anteaya leaves too | 19:00 | |
clarkb | trying to read from jenkins's securityRealm/commenceLogin, which I assume does the openid dance | 19:00 |
*** e0ne has quit IRC | 19:00 | |
*** tjones has quit IRC | 19:04 | |
*** dkehn_ has joined #openstack-infra | 19:05 | |
clarkb | jeblair: any ideas on where else to look for jenkins.o.o login issues? jenkins.log is pretty much empty | 19:05 |
clarkb | I am tempest to restart the server since it isn't doing anything at the moment | 19:05 |
lifeless | clarkb: lol | 19:06 |
clarkb | wow | 19:06 |
lifeless | clarkb: your fingers failed you | 19:06 |
clarkb | *tempted | 19:06 |
clarkb | my print drivers cache common words | 19:06 |
lifeless | clarkb: have you seen that fax encoding bug ? | 19:06 |
jeblair | clarkb: not without logging in. :) i vote you restart | 19:06 |
clarkb | lifeless: no | 19:06 |
lifeless | clarkb: so there's a compression driver for some faxes that takes a bitmap from the page - say a 0 | 19:07 |
jeblair | jenkins 04 has a lot of nodes attached to it that don't exist. i'm going to stop it and manually remove the configs | 19:07 |
lifeless | clarkb: and then applies it everywhere there are 0's | 19:07 |
clarkb | jeblair: ok | 19:07 |
*** mriedem has quit IRC | 19:07 | |
lifeless | clarkb: the algorithm is tunable for noise etc | 19:07 |
lifeless | clarkb: if you don't tune it *just right* you end up with numbers - e.g. payroll data, cheques, bank accounts - totally messed up | 19:08 |
jeblair | starting jenkins04 | 19:10 |
clarkb | jenkins.o.o is dead. it can't getRootDir. Investigating now :/ | 19:11 |
*** wchrisj has quit IRC | 19:11 | |
*** chris_johnson has joined #openstack-infra | 19:12 | |
*** dstanek is now known as dstanek_afk | 19:12 | |
jeblair | jenkins04 is up, getting slaves added, and running jobs | 19:13 |
clarkb | hrm now it is up, maybe that is a false alarm | 19:13 |
lifeless | clarkb: http://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres_are_switching_written_numbers_when_scanning | 19:14 |
*** mriedem has joined #openstack-infra | 19:14 | |
clarkb | and proposal.slave.o.o is running jobs again | 19:14 |
lifeless | clarkb: have a read of that and weep | 19:15 |
*** julim has quit IRC | 19:15 | |
clarkb | I will :) | 19:15 |
lifeless | also, don't buy xerox :) | 19:15 |
jeblair | xerox laser printers are great. i print books on them. | 19:15 |
lifeless | 'They indeed implemented a software bug, eight years ago, and indeed, numbers could be mangled across all compression modes. They have to roll out a patch for hundreds of thousands of devices world-wide.' | 19:16 |
lifeless | jeblair: It was meant in humour; single mistakes don't blacklist a vendor - mistakes happen | 19:16 |
lifeless | jeblair: I've purchased some pretty large xerox kit at firms in the past | 19:17 |
jeblair | *nod* | 19:17 |
clarkb | now to figure out what fungi needed to run on the proposal slave. to the scrollback | 19:17 |
*** protux has quit IRC | 19:18 | |
jeblair | clarkb: i think it was just that it kept going offline because the regex was wrong; i don't think anything needs to be re-run | 19:18 |
clarkb | jeblair: oh right because zuul needs new functions | 19:18 |
jeblair | clarkb: the only thing that needed re-running was the tarball job due to the scp thing | 19:18 |
jeblair | clarkb: i think that change merged so we should be set now wrt proposal | 19:19 |
clarkb | jeblair: great, I will look at retriggering tarball job now | 19:19 |
*** e0ne has joined #openstack-infra | 19:21 | |
*** thomasbiege has joined #openstack-infra | 19:21 | |
clarkb | jeblair: there are a bunch of offline nodes on 05, not sure if that is just nodepool catching up though | 19:23 |
clarkb | markmcclain: you had tagged a release right? | 19:24 |
clarkb | markmcclain: I will make sure that the whole pipeline happens for that | 19:24 |
*** nati_ueno has quit IRC | 19:24 | |
jeblair | clarkb: it could be a similar situation to 04; i'll check it out | 19:25 |
clarkb | looks like fungi may have retriggered already, I am hunting this down | 19:25 |
*** nati_ueno has joined #openstack-infra | 19:25 | |
markmcclain | clarkb: yes and everything looks to have been published now | 19:25 |
clarkb | markmcclain: yup I see it, I think fungi must've triggered everything then the jobs ran once I brought the slave back online | 19:26 |
markmcclain | ah | 19:26 |
jeblair | clarkb: it's moving; i think i'll leave it be and see if it catches up. | 19:27 |
clarkb | jeblair: ok | 19:27 |
openstackgerrit | A change was merged to openstack-infra/nodepool: Fix typo in allocation https://review.openstack.org/74803 | 19:28 |
*** dstanek_afk has quit IRC | 19:28 | |
*** salv-orlando has joined #openstack-infra | 19:29 | |
jeblair | 02 has that problem too. the others are ok | 19:29 |
clarkb | https://issues.jenkins-ci.org/browse/JENKINS-16239 is what I saw on jenkins.o.o | 19:29 |
clarkb | I think an update of the envinject plugin will fix it | 19:30 |
clarkb | but it doesn't appear to be as serious as I first thought | 19:30 |
*** mrmartin has joined #openstack-infra | 19:30 | |
openstackgerrit | K Jonathan Harker proposed a change to openstack-infra/config: Parameterize the status page urls https://review.openstack.org/74557 | 19:32 |
*** thomasbiege has quit IRC | 19:40 | |
*** jcooley_ has joined #openstack-infra | 19:47 | |
*** afazekas has quit IRC | 19:50 | |
*** mfisch has quit IRC | 19:50 | |
*** salv-orlando has quit IRC | 19:53 | |
*** salv-orlando has joined #openstack-infra | 19:53 | |
*** mrmartin has quit IRC | 19:54 | |
*** dstanek_afk has joined #openstack-infra | 19:54 | |
*** yassine has quit IRC | 19:55 | |
*** yassine has joined #openstack-infra | 19:55 | |
*** sandywalsh_ has quit IRC | 19:57 | |
*** salv-orlando has quit IRC | 19:57 | |
*** dstanek_afk has quit IRC | 19:59 | |
*** julim has joined #openstack-infra | 20:01 | |
clarkb | ES recovery is really slow, I am going to stop my indexers to give the cluster a chance to finish recovering | 20:02 |
jog0 | 343 patches in check? | 20:04 |
clarkb | welcome to the jungle | 20:05 |
*** dcramer_ has quit IRC | 20:05 | |
jog0 | is this the recheck 24 thing? | 20:05 |
jeblair | jog0: no, this is a rax network outage + ffp load + the check thing | 20:05 |
jog0 | ffp? | 20:06 |
jeblair | feature freeze proposal | 20:06 |
jeblair | er | 20:06 |
jeblair | feature proposal freeze? | 20:06 |
jeblair | some combination of those words. :) | 20:06 |
jog0 | ack | 20:06 |
clarkb | jeblair: indexers are stopped. I think indexing and recovery was slow because it was doing both at the same time which meant everything had to be extremely synchronous | 20:06 |
clarkb | going to watch it now and see if those last 4 shards recover more quickly | 20:07 |
jog0 | wow this is pretty scary | 20:07 |
* jog0 finds lunch | 20:07 | |
jeblair | jog0: ha | 20:07 |
*** ociuhandu has joined #openstack-infra | 20:08 | |
*** jcooley_ has quit IRC | 20:08 | |
*** hashar has joined #openstack-infra | 20:09 | |
*** jcooley_ has joined #openstack-infra | 20:09 | |
*** markmcclain has quit IRC | 20:12 | |
jeblair | clarkb: looks like the ready node count is now small (as it should be under load) | 20:12 |
openstackgerrit | Zane Bitter proposed a change to openstack-infra/config: Fix ChangeId links https://review.openstack.org/74821 | 20:13 |
*** sandywalsh_ has joined #openstack-infra | 20:13 | |
*** jcooley_ has quit IRC | 20:13 | |
*** oubiwann has joined #openstack-infra | 20:14 | |
*** jamespage_ has joined #openstack-infra | 20:17 | |
*** oubiwann has quit IRC | 20:18 | |
BobBall | nodepool question... deleteNode can sometimes timeout in RAX causing nodepool to bail at http://git.openstack.org/cgit/openstack-infra/nodepool/tree/nodepool/nodepool.py#n1112 - but the node is eventually cleaned from RAX. What would the advice be here? Extend timeout? ignore all exceptions and carry on with the nodepool stuff? | 20:20 |
*** ociuhandu has quit IRC | 20:20 | |
openstackgerrit | A change was merged to openstack-infra/git-review: Retrieve remote pushurl independently of user's locale https://review.openstack.org/64307 | 20:20 |
openstackgerrit | Dan Prince proposed a change to openstack-infra/nodepool: Retry ssh connections on auth failure. https://review.openstack.org/74825 | 20:21 |
jeblair | BobBall: the cleanup thread is supposed to clean up the nodepool db in that case. i think we should extend the rax timeout so it hits less often. | 20:21 |
jeblair | BobBall: lifeless was working on a patch series that tackles that from a different perspective, but it's not ready yet | 20:22 |
BobBall | Ah, OK. | 20:22 |
BobBall | You hit it in the gate too with rax nodes? | 20:22 |
jeblair | BobBall: yep | 20:22 |
jeblair | they eventually get cleaned up, just slower than they should | 20:23 |
BobBall | kay. Wonder why it hits them. Might have a chat with Ant/John about that. | 20:23 |
*** ociuhandu has joined #openstack-infra | 20:23 | |
zaro | clarkb: i think this question is meant for you.. https://review.openstack.org/#/c/61321 | 20:23 |
BobBall | Will increase the timeout, and rely on the cleaning thread ;) | 20:23 |
clarkb | we seem to actually be recovering indexes now. I stopped all indexers and cleared the caches on es nodes | 20:24 |
jeblair | BobBall: if you decide on a good value, let me know | 20:24 |
clarkb | once it is green again I will turn on indexers | 20:24 |
clarkb | (I think we are beginning to get into "our nodes are too small for the data thrown at them" territory again) | 20:25 |
openstackgerrit | Andreas Jaeger proposed a change to openstack/requirements: Update openstack-doc-tools to 0.7.1 https://review.openstack.org/74827 | 20:25 |
BobBall | Just to understand jeblair - why do you want it to wait for the server to have gone on deletion? a quota thing? Could just add to a list of nodes that are being deleted and poll them in the cleanup thread, rather than block? | 20:25 |
*** cadenzajon has quit IRC | 20:26 | |
lifeless | jeblair: the delete refactoring stuff ? | 20:26 |
clarkb | you don't want to account the node as gone before it is gone | 20:26 |
clarkb | quota is part of that but more importantly the allocation of nodes across providers | 20:27 |
BobBall | OK | 20:27 |
BobBall | 10 minutes just seems like a long time to block so I'm hesitant about making it even longer :P | 20:27 |
jeblair | BobBall: answering in order: nodepool needs to know how many servers there actually are in order to do math about how many it should spin up correctly. yes. there's lots of ways you could do it; this one is not that problematic, it just needs tuning; lifeless has another way. | 20:27 |
jeblair | BobBall: it's not blocking anything | 20:27 |
jeblair | BobBall: the current nodepool design has lots of threads all fighting to get their work done, mediated by the provider managers (so they don't starve each other or run over rate limits) | 20:28 |
jeblair | BobBall: so that one thread is blocking, but it isn't slowing anything else down. | 20:29 |
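(A minimal sketch of the pattern under discussion -- block only the node's own thread while waiting for the provider to delete, and fall back to the periodic cleanup pass on timeout. Assumes python-novaclient; the timeout value and helper name are illustrative, not nodepool's actual code:)

```python
import time
from novaclient import exceptions

def wait_for_delete(nova, server_id, timeout=600):
    # This blocks only the thread handling this one node; other node threads
    # keep working through the provider manager in the meantime.
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            nova.servers.get(server_id)
        except exceptions.NotFound:
            return True  # the provider has really forgotten the server
        time.sleep(5)
    # Give up for now; the periodic cleanup thread will see the node still
    # marked "delete" in the database and retry it later.
    return False
```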
jeblair | lifeless: yes | 20:29 |
BobBall | Understood. | 20:29 |
*** markmcclain has joined #openstack-infra | 20:30 | |
*** mrmartin has joined #openstack-infra | 20:30 | |
*** markmcclain1 has joined #openstack-infra | 20:31 | |
BobBall | I've got a python script trying to use nodepool - so my script polls for nodes, holds them then deletes them. This is what's blocking for me, but I can re-work the blocking there so a longer timeout is fine. | 20:31 |
*** jgrimm has quit IRC | 20:32 | |
*** markmcclain1 has quit IRC | 20:32 | |
*** jamespage_ has quit IRC | 20:32 | |
mtreinish | fungi, clarkb: have you guys seen this failure before/is there a bug for it?: http://logs.openstack.org/57/73457/1/check/check-tempest-dsvm-postgres-full/595b00c/console.html | 20:32 |
*** dstanek has joined #openstack-infra | 20:32 | |
*** mrda_away is now known as mrda | 20:33 | |
*** denis_makogon_ has joined #openstack-infra | 20:34 | |
clarkb | mtreinish: haven't seen that before. looks like adding a region failed | 20:34 |
*** markmcclain has quit IRC | 20:34 | |
clarkb | but I don't see keystone logs | 20:34 |
openstackgerrit | Ryan Petrello proposed a change to openstack/requirements: Update pecan >= 0.4.5 in global requirements. https://review.openstack.org/74830 | 20:34 |
*** dprince has quit IRC | 20:35 | |
jeblair | 2014-02-19 06:37:40.894 | 2014-02-19 06:37:40 /opt/stack/new/devstack/functions-common: line 997: /opt/stack/new/devstack/stack-screenrc: Permission denied | 20:35 |
jeblair | is that the actual error? | 20:35 |
clarkb | and syncing requirements failed | 20:35 |
clarkb | jeblair: looks like permissions trouble in the /opt dirs | 20:35 |
*** ryanpetrello has left #openstack-infra | 20:36 | |
HenryG | Hi, I am unable to find an existing bug for this gate-neutron-python27 failure: http://logs.openstack.org/33/68833/3/gate/gate-neutron-python27/42c2370 | 20:36 |
*** ryanpetrello has joined #openstack-infra | 20:36 | |
HenryG | Any clues/hints would be appreciated. | 20:37 |
clarkb | HenryG: looks like a greenlet failure | 20:38 |
clarkb | I would ask neutron folks | 20:38 |
HenryG | clarkb: thanks, will do | 20:38 |
mtreinish | clarkb, jeblair: ok I was just thrown by what looked like ps output interspersed in the log messages | 20:39 |
*** jcooley_ has joined #openstack-infra | 20:39 | |
mtreinish | but yeah it definitely looks like permissions issue, should I open it against devstack or ci? | 20:39 |
clarkb | mtreinish: not sure, does that change change permissions in a weird way? | 20:41 |
*** smarcet has left #openstack-infra | 20:41 | |
jeblair | right before running devstack, devstack-gate does: "sudo chown -R stack:stack $BASE" | 20:42 |
*** yolanda has quit IRC | 20:42 | |
jeblair | so it's hard to say what the problem could be. did that fail? or did something in devstack change it? | 20:42 |
mtreinish | jeblair: it looks like everything was working fine until: http://logs.openstack.org/57/73457/1/check/check-tempest-dsvm-postgres-full/595b00c/console.html#_2014-02-19_06_37_13_730 | 20:43 |
mtreinish | when it went to sync the requirements for horizon | 20:43 |
*** jgrimm has joined #openstack-infra | 20:44 | |
clarkb | devstack is also doing safe_chown-ing of its own | 20:45 |
clarkb | so yeah I think it could be in a number of places | 20:45 |
clarkb | sudo chown -R jenkins:jenkins /opt/stack/new happens in workspace new setup | 20:46 |
clarkb | should it be stack:stack instead? | 20:46 |
jeblair | clarkb: not unless we want to 'sudo stack' before every command | 20:46 |
clarkb | doesn't devstack run as the stack user though? | 20:47 |
clarkb | I guess it gets root as necessary though | 20:47 |
lifeless | you could sudo stack exec :) | 20:47 |
jeblair | clarkb: yes, which is why devstack-gate does "sudo chown -R stack:stack $BASE" | 20:47 |
jeblair | right before running devstack | 20:47 |
jeblair | lifeless: then we couldn't go back. | 20:47 |
clarkb | oh gotcha | 20:47 |
jeblair | lifeless: jenkins has sudo, stack drops sudo | 20:47 |
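(A hedged Python rendering of the handoff being described; the real code is shell in devstack-gate and devstack, and the paths and helper name here are illustrative:)

```python
import subprocess

BASE = '/opt/stack/new'

def prepare_then_hand_off():
    # Workspace setup runs as the jenkins user, which still has sudo.
    subprocess.check_call(['sudo', 'chown', '-R', 'jenkins:jenkins', BASE])
    # ... clone repos and write configs as jenkins ...
    # Right before stack.sh runs, ownership flips to the unprivileged stack
    # user; anything stack needs to write must already live under BASE,
    # because stack has no sudo to fix it up afterwards.
    subprocess.check_call(['sudo', 'chown', '-R', 'stack:stack', BASE])
```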
lifeless | ah | 20:48 |
*** dcramer_ has joined #openstack-infra | 20:48 | |
*** jcooley_ has quit IRC | 20:49 | |
jeblair | that seems to have happened to 2 builds in the last 24h, in dfw and iad. | 20:51 |
jeblair | according to logstash | 20:51 |
*** mwagner_lap has quit IRC | 20:53 | |
mtreinish | jeblair: I guess I'm really lucky then :) | 20:54 |
*** smarcet has joined #openstack-infra | 20:54 | |
*** jcooley_ has joined #openstack-infra | 20:55 | |
jeblair | mtreinish: i think we'll either need to catch a live node or add some debugging | 20:56 |
mtreinish | jeblair: ok, should I open a bug about it then? | 20:58 |
mtreinish | yeah the logs don't really show what happened | 20:58 |
*** DinaBelova is now known as DinaBelova_ | 20:58 | |
jeblair | mtreinish: sure; target ci and devstack until we know what's up i guess | 20:58 |
*** khyati has quit IRC | 21:01 | |
*** sabari has quit IRC | 21:01 | |
*** khyati has joined #openstack-infra | 21:02 | |
mtreinish | jeblair: https://bugs.launchpad.net/devstack/+bug/1282262 | 21:02 |
uvirtbot | Launchpad bug 1282262 in openstack-ci "Permission denied errors on /opt during devstack" [Undecided,New] | 21:02 |
*** khyati has quit IRC | 21:04 | |
clarkb | jeblair: I am thinking we may want to add another ES node so that losing one node doesn't cause the others to run into GC trouble (will need to bump the number of shards slightly too though that may be less necessary) | 21:05 |
clarkb | jeblair: but I think this can happen after FF | 21:05 |
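(Two operational notes behind that plan, sketched against the Elasticsearch REST API assuming the `requests` library: replica counts can be changed on a live index, while shard counts are fixed at index creation and so a bump only affects the next daily logstash index. The endpoint names are standard ES; the host and index names are illustrative:)

```python
import json
import requests

ES = 'http://localhost:9200'

def cluster_health():
    # green/yellow/red plus counts of initializing and relocating shards.
    return requests.get(ES + '/_cluster/health').json()

def set_replicas(index, replicas):
    # Replicas can be adjusted on the fly; shard count cannot.
    return requests.put(
        ES + '/%s/_settings' % index,
        data=json.dumps({'index': {'number_of_replicas': replicas}}))
```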
*** CaptTofu has quit IRC | 21:06 | |
jeblair | clarkb: whew | 21:06 |
*** CaptTofu has joined #openstack-infra | 21:06 | |
*** jamespage_ has joined #openstack-infra | 21:07 | |
openstackgerrit | Dan Prince proposed a change to openstack-infra/nodepool: Retry ssh connections on auth failure. https://review.openstack.org/74825 | 21:07 |
*** rfolco has quit IRC | 21:08 | |
ArxCruz | jeblair: clarkb are you guys having problems with jenkins and nodepool? I have a few VM's ready, but a lot of jobs in the build queue | 21:09 |
*** thomasbiege has joined #openstack-infra | 21:09 | |
* ArxCruz blame zuul changes :@ | 21:10 | |
*** jamespage_ has quit IRC | 21:10 | |
openstackgerrit | Ivan Melnikov proposed a change to openstack-infra/config: Add documentation jobs for taskflow https://review.openstack.org/74837 | 21:10 |
*** pafuent has left #openstack-infra | 21:10 | |
openstackgerrit | Matthew Treinish proposed a change to openstack-infra/devstack-gate: Start compressing config files too https://review.openstack.org/74838 | 21:10 |
*** CaptTofu has quit IRC | 21:11 | |
*** alexpilotti has quit IRC | 21:11 | |
*** alexpilotti_ has joined #openstack-infra | 21:11 | |
HenryG | clarkb: there does not seem to be a bug tracking this yet, but it looks like trouble may be brewing: http://logstash.openstack.org/index.html#eyJzZWFyY2giOiJtZXNzYWdlOlwiZ3JlZW5sZXQuR3JlZW5sZXRFeGl0XCIgQU5EIGZpbGVuYW1lOlwiY29uc29sZS5odG1sXCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6IjYwNDgwMCIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwic3RhbXAiOjEzOTI4NDQyMDk0OTV9 | 21:11 |
jeblair | ArxCruz: actually, zuul seems to have dealt rather well with 400 changes in queue.... | 21:12 |
jeblair | ArxCruz: the bulk of our problems stem from a rax network outage this morning | 21:12 |
*** weshay has quit IRC | 21:12 | |
ArxCruz | jeblair: that's really weird, I have only one zuul and zuul-merger, and nodepool latest version | 21:12 |
HenryG | Any tips on how to track down the culprit? | 21:12 |
mattoliverau | Morning! | 21:12 |
ArxCruz | jeblair: right now I have a lot of vm's idle and a lot of jobs in build queue | 21:12 |
jeblair | mattoliverau: good morning; things are busy here. | 21:13 |
*** jamespage_ has joined #openstack-infra | 21:13 | |
jeblair | ArxCruz: oh, you're talking about your own thing. | 21:13 |
*** mrmartin has quit IRC | 21:13 | |
ArxCruz | jeblair: hehe, yup | 21:13 |
jeblair | ArxCruz: you asked about us. | 21:13 |
ArxCruz | wondering if is something I did or if there's something wrong with yours too | 21:13 |
ArxCruz | sorry, bad english | 21:13 |
*** cadenzajon has joined #openstack-infra | 21:13 | |
jeblair | ArxCruz: see what the state of the nodes are in jenkins. we upgraded jenkins and found that the latest version didn't work with the gearman plugin, so we're currently running the lts version | 21:14 |
ArxCruz | oh boy... | 21:15 |
ArxCruz | which jenkins version are you guys using ? | 21:15 |
ArxCruz | and which gearman plugin ? | 21:15 |
ArxCruz | :/ | 21:15 |
ArxCruz | jeblair: ^ | 21:16 |
*** jcooley_ has quit IRC | 21:17 | |
jeblair | ArxCruz: you can check the version # at the bottom of the page; the gearman plugin is something recent but shouldn't matter too much. | 21:17 |
*** jcooley_ has joined #openstack-infra | 21:18 | |
ArxCruz | jeblair: thanks, sorry for the confusion :) | 21:19 |
*** tjones has joined #openstack-infra | 21:21 | |
*** jcooley_ has quit IRC | 21:22 | |
*** jroovers has joined #openstack-infra | 21:26 | |
openstackgerrit | Sergey Lukjanov proposed a change to openstack-infra/config: Enable docs for python-savannaclient https://review.openstack.org/74470 | 21:27 |
*** markmcclain has joined #openstack-infra | 21:28 | |
*** sabari has joined #openstack-infra | 21:28 | |
*** e0ne has quit IRC | 21:28 | |
*** andreaf has quit IRC | 21:28 | |
*** jroovers has quit IRC | 21:30 | |
*** jcooley_ has joined #openstack-infra | 21:30 | |
dhellmann | dstufft: fyi, I'm very close to giving up on namespace packages for oslo libraries :-| | 21:30 |
*** thomasbiege has quit IRC | 21:33 | |
*** fbo_away is now known as fbo | 21:34 | |
*** jamielennox is now known as jamielennox|away | 21:36 | |
*** hashar has quit IRC | 21:38 | |
*** jhesketh_ has joined #openstack-infra | 21:39 | |
*** protux has joined #openstack-infra | 21:39 | |
jhesketh_ | Morning | 21:39 |
*** ok_delta has joined #openstack-infra | 21:40 | |
*** sabari_ has joined #openstack-infra | 21:40 | |
*** sabari has quit IRC | 21:41 | |
jeblair | jhesketh_: good morning | 21:42 |
jhesketh_ | hey jeblair, how's things? | 21:43 |
jeblair | jhesketh_: could be better. :) | 21:43 |
jog0 | jeblair: how much of the check queue is from the outage vs recheck | 21:43 |
jeblair | jhesketh_: there was a rax network outage this morning; that's the flat line in the nodepool graph | 21:43 |
jeblair | jog0: i'm not sure how i would determine the answer to that | 21:43 |
jhesketh_ | :-( | 21:43 |
jeblair | jhesketh_: that set us back a bit | 21:43 |
jhesketh_ | right, let me know if I can help with anything | 21:44 |
*** salv-orlando has joined #openstack-infra | 21:44 | |
jeblair | jog0: the trend in queue length has been solidly downward since we got everything unstuck, so at current in/out rates, we're not getting worse. that suggests that under normal circumstances we can more than handle the current patchset test load. | 21:46 |
jeblair | jog0: (extrapolating from less than 1 days worth of data which is potentially dangerous) | 21:46 |
*** oubiwann has joined #openstack-infra | 21:47 | |
*** oubiwann has quit IRC | 21:47 | |
jeblair | jhesketh_: i have a puzzle for you if you're interested -- during the network outage, both the jenkins manager in nodepool as well as novaclient itself were stuck in the same ssl read function. | 21:48 |
jeblair | jhesketh_: Shrews suggested that setting keepalive on the socket might help prevent that sort of situation in the future | 21:48 |
*** markmcclain has quit IRC | 21:48 | |
dims | puzzled...requirements/projects.txt seems to be outdated for a brand new docs run. any ideas? http://logs.openstack.org/74/74474/17/check/gate-oslo.vmware-docs/33f359e/console.html | 21:48 |
jeblair | jhesketh_: are you interested in seeing if something like that is possible? it might involve some novaclient, urllib, or ssl library deep diving | 21:49 |
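(A sketch of the keepalive idea using only standard socket options; the hard part, as noted, is reaching the socket underneath novaclient's HTTP connection objects, and this is not something novaclient does today:)

```python
import socket

def enable_keepalive(sock, idle=60, interval=10, count=6):
    # With keepalives on, a peer that silently vanished in a network outage
    # makes the blocked recv()/SSL read fail after roughly
    # idle + interval * count seconds instead of hanging forever.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-probe tuning knobs are Linux-specific, hence the guards.
    if hasattr(socket, 'TCP_KEEPIDLE'):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, 'TCP_KEEPINTVL'):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, 'TCP_KEEPCNT'):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
```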
*** sabari_ has quit IRC | 21:50 | |
jeblair | dims: that will update during the next image build, which won't be for a while | 21:50 |
*** smarcet has quit IRC | 21:51 | |
jhesketh_ | jeblair: not sure I have enough knowledge of those systems to actually achieve much there to be honest | 21:51 |
dims | jeblair, i see. thx | 21:51 |
jeblair | mordred: ^ might need to do something about stale requirements repos | 21:52 |
jeblair | jhesketh_: no prob | 21:52 |
jhesketh_ | jeblair: what was the read function they were stuck in/error they saw | 21:52 |
jeblair | jhesketh_: | 21:53 |
jeblair | http://paste.openstack.org/show/67382/ | 21:53 |
*** wenlock_ has quit IRC | 21:54 | |
openstackgerrit | Davanum Srinivas (dims) proposed a change to openstack-infra/config: Mark a few oslo.vmware jobs as non-voting https://review.openstack.org/74669 | 21:54 |
*** skraynev is now known as skraynev_afk | 21:55 | |
mordred | jeblair: reading scrollback | 21:55 |
*** wenlock has joined #openstack-infra | 21:56 | |
*** prad has quit IRC | 21:56 | |
jeblair | mordred: /opt/requirements is now updated daily at most. in the case of hpcloud-az2, it was last updated feb 12. | 21:57 |
*** julim has quit IRC | 21:58 | |
openstackgerrit | Davanum Srinivas (dims) proposed a change to openstack-infra/config: Temporary : Mark a few oslo.vmware jobs as non-voting https://review.openstack.org/74669 | 21:59 |
mordred | jeblair: so - we might need to "cd /opt/requirements ; git pull --ff-only" (or something similar) | 21:59 |
mordred | ? | 21:59 |
mordred | oh STALE requirements. I thought you were saying stable requirements | 21:59 |
jeblair | mordred: yes, though that may require sudo access unless we change the owner of those repos to jenkins | 21:59 |
jeblair | mordred: since all the slaves are single use, i think we can do that now | 22:00 |
mordred | jeblair: shouldn't the repo prep be setting requirements to master? | 22:00 |
mordred | like, since requirements is part of the integration set? | 22:00 |
jeblair | mordred: not devstack | 22:00 |
jeblair | mordred: unit test, etc, jobs | 22:00 |
mordred | oh. but why does /opt/requirements matter for unittests - they're all in tox? | 22:00 |
jeblair | mordred: see the original question from dims and 22:03 < jeblair> mordred: not devstack | 22:01 |
jeblair | 22:03 < jeblair> mordred: unit test, etc, jobs | 22:01 |
jeblair | gah | 22:01 |
jeblair | mordred: and http://logs.openstack.org/74/74474/17/check/gate-oslo.vmware-docs/33f359e/console.html | 22:01 |
mordred | k. reading | 22:01 |
mordred | jeblair: GOTCHA. thank you | 22:02 |
mordred | yeah - I think we fetch /opt/requirements as a pre-test sudo operation | 22:03 |
jeblair | mordred: can't sudo, not yet at least. | 22:03 |
mordred | or, rather, change it to jenkins owner | 22:03 |
mordred | sorry - misspoke | 22:03 |
jeblair | mordred: can sudo after this merges: https://review.openstack.org/#/q/status:open+project:openstack-infra/config+branch:master+topic:sudoers,n,z | 22:03 |
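A rough sketch of the pre-test refresh being discussed, assuming the cached copy lives at /opt/requirements and that sudo (or jenkins ownership of the repo) is available as per the change linked above; the helper name is made up for illustration.

    import subprocess

    def refresh_cached_repo(path="/opt/requirements", use_sudo=True):
        # Fast-forward the cached requirements checkout before a job uses
        # it, so jobs stop seeing a copy that is days out of date.
        cmd = ["git", "pull", "--ff-only"]
        if use_sudo:
            # Only needed while the repo is still owned by root rather
            # than the jenkins user.
            cmd = ["sudo", "-H"] + cmd
        subprocess.check_call(cmd, cwd=path)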
annegentle | fungi: what's an appropriate gerrit ref to point oreilly to for a pointer to HEAD:openstack/operations-guide/feature/edits (is that right?) | 22:03 |
clarkb | ok back from lunch | 22:04 |
annegentle | fungi: right now they are pointing at a fork of openstack/operations-guide, but I think jeblair mentioned they could push to a gerrit ref | 22:05 |
annegentle | clarkb: welcome back! | 22:05 |
clarkb | I am going to turn indexers back on | 22:06 |
clarkb | annegentle: fungi is AFk for a while. let me kick ES then I will look at your question | 22:06 |
annegentle | clarkb: okie | 22:06 |
*** amcrn has quit IRC | 22:07 | |
*** ok_delta has quit IRC | 22:07 | |
*** virmitio has quit IRC | 22:08 | |
*** dkliban is now known as dkliban_afk | 22:09 | |
*** cadenzajon has quit IRC | 22:09 | |
*** CaptTofu has joined #openstack-infra | 22:10 | |
clarkb | ok ES and logstash are "UP" it is relocating shards but indexing is happening at a reasonable speed. I am a bit worried that we might run into memory trouble so will keep an eye on it | 22:11 |
*** oubiwann has joined #openstack-infra | 22:11 | |
clarkb | annegentle: now for oreilly. What is it that oreilly needs to do? just push their edits upstream? | 22:11 |
*** jamielennox|away is now known as jamielennox | 22:12 | |
*** vkozhukalov has quit IRC | 22:12 | |
*** cadenzajon has joined #openstack-infra | 22:13 | |
*** ArxCruz has quit IRC | 22:13 | |
annegentle | clarkb: so we created a branch so that oreilly's edits are less intrusive on our master | 22:13 |
clarkb | yup | 22:13 |
annegentle | clarkb: we can happily keep editing while they make it production ready | 22:13 |
*** lcostantino has quit IRC | 22:13 | |
annegentle | clarkb: we're still changing master and then I keep delivering changes to feature/edits | 22:14 |
*** khyati has joined #openstack-infra | 22:14 | |
annegentle | clarkb: they just want to know what we want :) very accommodating | 22:14 |
*** ArxCruz has joined #openstack-infra | 22:15 | |
zaro | roz: you cannot replace the change owner and that's not configurable in gerrit. however it looks like there might be a workaround which fungi has powers to do.. https://groups.google.com/forum/#!topic/repo-discuss/aqNgmuiCtyk | 22:15 |
clarkb | annegentle: what would you like them to do ? | 22:15 |
*** sarob has quit IRC | 22:16 | |
jeblair | zaro, roz: we're not going to do that. what's the problem? | 22:16 |
annegentle | clarkb: ideally they'll push to feature/edits | 22:16 |
annegentle | clarkb: so what do I tell them to push to? | 22:16 |
*** sarob has joined #openstack-infra | 22:16 | |
clarkb | dims: still around? we need to test the -proposed version of libvirt 1.1.1 on precise before it will end up in cloud archive. I think the easiest way to do that is with a devstack change that enables -proposed for the libvirt package. Is that something you are already testing? | 22:16 |
*** thedodd has joined #openstack-infra | 22:16 | |
clarkb | annegentle: and these edits would go into review right? | 22:17 |
annegentle | push to the appropriate gerrit ref (HEAD:refs/for/branchname) | 22:17 |
annegentle | clarkb: jeblair originally had that in an email ^^ | 22:17 |
annegentle | clarkb: so helping their production staff get the pointer right | 22:17 |
clarkb | git push ssh://username@review.openstack.org:29418 HEAD:refs/for/feature/edits <- that will push them up for review | 22:17 |
*** banix has quit IRC | 22:18 | |
clarkb | can also just use git review on that branch if the .gitreview branch is set correctly | 22:18 |
annegentle | refs/for? really? | 22:18 |
clarkb | annegentle: refs/for is the magical gerrit reference prefix | 22:18 |
annegentle | clarkb: (not that I'm doubting)! | 22:18 |
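Spelled out with placeholders, the push clarkb describes looks roughly like the following; the username is hypothetical and the project path comes from the operations-guide discussion above.

    import subprocess

    user = "oreilly-editor"  # hypothetical Gerrit username for their staff
    project = "openstack/operations-guide"
    remote = "ssh://%s@review.openstack.org:29418/%s" % (user, project)

    # Push the current HEAD as a new change for review on feature/edits.
    subprocess.check_call(["git", "push", remote, "HEAD:refs/for/feature/edits"])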
annegentle | clarkb: do you think it makes sense to give them one username that can push directly? or did we decide that was bad | 22:19 |
*** bknudson has quit IRC | 22:19 | |
annegentle | clarkb: I'm okay with walking one of their production staff through cla but wanting to be sure it's required | 22:20 |
clarkb | annegentle: personally I think that is bad. It isn't how openstack accepts commits. But the relationship here is new and special and may not require review | 22:20 |
clarkb | jeblair: ^ | 22:20 |
jeblair | i'd like to try having them push things for review | 22:20 |
annegentle | jeblair: clarkb: okay I'll keep pushing them | 22:20 |
*** sarob has quit IRC | 22:21 | |
*** dolphm has joined #openstack-infra | 22:21 | |
dolphm | is zuul waiting for a check job to complete before moving approved changes into the gate? | 22:22 |
clarkb | dolphm: if the check results are more than 24 hours old yes | 22:22 |
dolphm | YAY! | 22:22 |
annegentle | jeblair: if I give them git push ssh://username@review.openstack.org:29418 HEAD:refs/for/feature/edits and they go through the CLA and all, what will those patches look like to me on review.openstack.org? | 22:22 |
*** mfer has quit IRC | 22:22 | |
dstufft | dhellmann: dooo it | 22:22 |
dstufft | dhellmann: namespace packages are bad for you | 22:23 |
clarkb | dolphm: and it will recheck if comments happen (not just approvals) and the check tests are more than 72 hours old | 22:23 |
annegentle | jeblair: right now I'm porting from master to feature/edits | 22:23 |
dstufft | at least until python 3.whatever is the baseline and you can use the built in form of namepsace packages | 22:23 |
dstufft | maybe someone can backport that to 2.x, I dunno | 22:23 |
dolphm | clarkb: ha, that's awesome | 22:23 |
*** esker has quit IRC | 22:23 | |
clarkb | dolphm: idea behind that is test results stay fresh as review happens | 22:23 |
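As a sketch of the freshness rule clarkb is describing, with the 24-hour and 72-hour thresholds from this exchange; this illustrates the stated behaviour, not zuul's actual implementation.

    import time

    CHECK_MAX_AGE_ON_APPROVAL = 24 * 3600  # approved changes need results < 24h old
    CHECK_MAX_AGE_ON_COMMENT = 72 * 3600   # plain comments re-trigger past 72h

    def needs_recheck(last_check_time, event_is_approval, now=None):
        # Return True if the existing check results are too stale to trust
        # for this event, so the check jobs should run again first.
        now = now if now is not None else time.time()
        age = now - last_check_time
        limit = (CHECK_MAX_AGE_ON_APPROVAL if event_is_approval
                 else CHECK_MAX_AGE_ON_COMMENT)
        return age > limit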
fungi | clarkb: the neutronclient release was not related to the offlined proposal slave... just different broken things i was trying to fix | 22:23 |
fungi | but looks like you figured that out | 22:24 |
jeblair | annegentle: they'll show up like normal but the branch column will be different | 22:24 |
dolphm | clarkb: that's great -- it should help catch merge conflicts earlier too, which will be super useful all by itself | 22:24 |
*** dstanek has quit IRC | 22:24 | |
clarkb | fungi: yup thanks, go back to being AFK :P | 22:24 |
annegentle | jeblair: ok like stable/havana. | 22:24 |
jeblair | annegentle: exactly | 22:24 |
zaro | jeblair: roz wants to make himself the owner of a change so he can set it to WIP Status. | 22:24 |
dmsimard | jeblair: Sorry to bother you with that again, when did you say https://review.openstack.org/#/c/74780/ was going to be effective ? | 22:24 |
jeblair | annegentle: so you'll want to watch out for that | 22:24 |
dims | clarkb, pong. yes i can help with that | 22:24 |
clarkb | dims: awesome thanks. let me collect the relevant data really quickly | 22:25 |
jeblair | zaro, roz: remove the changeid from the commit message and git-review it again to make a new change in gerrit. abandon the old one. | 22:25 |
dims | clarkb, i ended up building the libvirt from their git and running it in our gate | 22:25 |
clarkb | dims: see https://bugs.launchpad.net/nova/+bug/1228977/ comment from Brian Murray. Lifeless already updated the impact and risk stuff for us | 22:26 |
uvirtbot | Launchpad bug 1228977 in nova "n-cpu seems to crash when running with libvirt 1.1.1 from ubuntu cloud archive" [High,Confirmed] | 22:26 |
clarkb | dims: so now we need to test it | 22:26 |
jeblair | dmsimard: i'll try kicking off an image build now | 22:27 |
clarkb | dims: we need a change to devstack that enables ubuntu -proposed https://wiki.ubuntu.com/Testing/EnableProposed and changes the name of the libvirt package to libvirt/precise-proposed in devstack, which we can WIP | 22:27 |
clarkb | dims: that should install libvirt from proposed and test that the patched libvirt works as expected | 22:27 |
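For reference, a hedged sketch of what that throwaway devstack change amounts to, following the EnableProposed wiki approach; the libvirt-bin package name and the add-apt-repository invocation are assumptions for precise, not the actual patch.

    import subprocess

    def enable_proposed_and_install_libvirt():
        # Enable the Ubuntu precise-proposed pocket and pull libvirt from
        # it, so the gate exercises the candidate package before it lands
        # in the cloud archive.
        proposed = ("deb http://archive.ubuntu.com/ubuntu/ "
                    "precise-proposed main restricted universe multiverse")
        subprocess.check_call(["sudo", "add-apt-repository", "-y", proposed])
        subprocess.check_call(["sudo", "apt-get", "update"])
        # apt's pkg/release syntax selects the version from that release.
        subprocess.check_call(
            ["sudo", "apt-get", "install", "-y", "libvirt-bin/precise-proposed"])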
dims | clarkb, am on it after i wrap up a couple of things in a few | 22:27 |
*** jamespage_ has quit IRC | 22:28 | |
clarkb | dims: with that info we should be able to get the package updated in cloud archive and hopefully switch all tests to new libvirt | 22:28 |
*** oubiwann has quit IRC | 22:28 | |
dims | clarkb, yep. sounds good. | 22:28 |
clarkb | dims: so basically this is a throw away change to show ubuntu that the fix is safe | 22:28 |
clarkb | dims: awesome thank you | 22:28 |
dims | clarkb, yep. | 22:28 |
*** jomara has quit IRC | 22:29 | |
*** prad has joined #openstack-infra | 22:31 | |
*** jcooley_ has quit IRC | 22:31 | |
*** jcooley_ has joined #openstack-infra | 22:31 | |
dmsimard | jeblair: Thanks, appreciate it. Let me know what happens :) | 22:33 |
*** mrda is now known as mrda_away | 22:33 | |
*** dcramer_ has quit IRC | 22:34 | |
jeblair | clarkb: http://paste.openstack.org/show/67391/ | 22:34 |
jeblair | clarkb: az2 consistently fails image creation with that | 22:34 |
clarkb | looking | 22:35 |
*** dolphm is now known as dolphm_503 | 22:35 | |
clarkb | FYI gearman for logstash is 164k events behind but slowly catching up | 22:35 |
clarkb | jeblair: is the remote side killing our connection? | 22:35 |
*** jcooley_ has quit IRC | 22:35 | |
jeblair | clarkb: i have no idea | 22:36 |
jeblair | clarkb: i tried it from my workstation at home and it works. :/ | 22:36 |
*** jcooley_ has joined #openstack-infra | 22:36 | |
*** VijayT has joined #openstack-infra | 22:37 | |
*** mriedem has quit IRC | 22:37 | |
*** jcooley_ has quit IRC | 22:37 | |
*** jeckersb is now known as jeckersb_gone | 22:37 | |
*** jcooley_ has joined #openstack-infra | 22:38 | |
*** thomasem has quit IRC | 22:39 | |
*** e0ne has joined #openstack-infra | 22:39 | |
*** rcleere has quit IRC | 22:39 | |
clarkb | is CONNECT_TIMEOUT being hit? | 22:40 |
* clarkb reads more code | 22:40 | |
clarkb | doesn't look like it | 22:41 |
*** miqui has quit IRC | 22:42 | |
*** jcooley_ has quit IRC | 22:42 | |
clarkb | jeblair: I think the nodeutils ssh_connect may need to catch a wider net of exceptions possibly | 22:42 |
clarkb | right now it only catches socket.error | 22:42 |
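A hedged sketch of the wider net being suggested for nodepool's ssh_connect, assuming a paramiko-style client underneath; the retry loop and the exact exception list are illustrative rather than the real nodeutils code.

    import socket
    import time

    import paramiko

    def ssh_connect(host, username, timeout=60, attempts=30):
        # Retry until sshd answers, treating refused, reset, and abruptly
        # closed connections as "not ready yet" instead of crashing.
        for _ in range(attempts):
            client = paramiko.SSHClient()
            client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
            try:
                client.connect(host, username=username, timeout=timeout)
                return client
            except (socket.error, EOFError, paramiko.SSHException):
                # "Connection closed by <host>" during boot tends to show
                # up as an EOFError/SSHException, not a plain socket.error.
                time.sleep(2)
        raise Exception("Timed out waiting for ssh on %s" % host)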
jeblair | clarkb: interesting that this is new and only happens on az2 | 22:42 |
*** e0ne has quit IRC | 22:42 | |
*** dstanek has joined #openstack-infra | 22:42 | |
clarkb | I agree | 22:43 |
jeblair | clarkb: i'm trying some manual tests with 'nova boot' | 22:43 |
clarkb | k | 22:43 |
fungi | clarkb: thanks. brad topol is giving a great keystone overview to the group. Shrews is here too | 22:44 |
clarkb | so #tox is the channel for the skype replacement, not the python test tool... | 22:44 |
lifeless | hahahahaha | 22:45 |
lifeless | clarkb: #python-testing | 22:45 |
jeblair | clarkb: so just manually sshing, for a while i got ssh: connect to host 15.185.190.118 port 22: Connection refused | 22:45 |
*** ArxCruz has quit IRC | 22:45 | |
jeblair | clarkb: now i get Connection closed by 15.185.190.118 | 22:45 |
clarkb | jeblair: which should cause an EOFError right? | 22:46 |
dims | clarkb, i see zul may have updated "precise-proposed/icehouse" to libvirt 1.2.1 with the changes we need (https://launchpad.net/~ubuntu-cloud-archive/+archive/icehouse-staging/+sourcepub/3889570/+listing-archive-extra) - we will have to try that | 22:46 |
clarkb | lifeless: thanks | 22:46 |
*** sarob has joined #openstack-infra | 22:46 | |
clarkb | dims: that should work | 22:47 |
morganfainberg | fungi, give topol a hard time for me ;) | 22:47 |
dims | clarkb, will report back tomorrow. | 22:47 |
morganfainberg | fungi, (or at least wave enthusiastically at him for me) | 22:47 |
jeblair | clarkb: i've never looked at a console log for an hpcs vm before, but this doesn't look great to me: http://paste.openstack.org/show/67397/ | 22:47 |
clarkb | dims: awesome thank you for the help (I had hoped to get to it eventually but so much other stuff is going on) | 22:48 |
zul | dims/clarkb: we should be uploading a new version of libvirt next week | 22:48 |
clarkb | zul: does that mean you don't need us to test it? | 22:48 |
clarkb | zul: https://bugs.launchpad.net/nova/+bug/1228977/ started the conversation | 22:48 |
uvirtbot | Launchpad bug 1228977 in nova "n-cpu seems to crash when running with libvirt 1.1.1 from ubuntu cloud archive" [High,Confirmed] | 22:48 |
jaypipes | quick question... anybody know which config file the periodic QA jobs are defined in? | 22:49 |
clarkb | we need to test it anyways, but it is easier to do that once in cloud archive | 22:49 |
*** bknudson has joined #openstack-infra | 22:49 | |
clarkb | however getting ahead of it is probably best so that if it doesn't work we can hopefully fix it before the update | 22:49 |
fungi | morganfainberg: will do. i'm a well-practiced heckler | 22:49 |
morganfainberg | fungi, ++ | 22:49 |
morganfainberg | :) | 22:49 |
clarkb | jeblair: that looks like unhappy metadata server which is bad times | 22:50 |
lifeless | zul: as we understand it you need it tested, so we're aiming to do that :) | 22:51 |
clarkb | jaypipes: most of them should be templates now and we specify which branch to test in the projects.yaml file for JJB when we instantiate the template | 22:51 |
mordred | clarkb: its #pylib | 22:51 |
*** rlandy has quit IRC | 22:52 | |
jeblair | clarkb: still getting eof on ssh to that host. spinning up another one in az1 to compare console log. | 22:52 |
jaypipes | clarkb: yeah, am looking in that file now.. unless I am mistaken, all the periodic jobs are run against "devstack-precise" single use nodes. Is that correct? | 22:52 |
*** dkranz has quit IRC | 22:52 | |
clarkb | jaypipes: all of the tempest periodic tests yes | 22:53 |
clarkb | the unittest periodic jobs are run on bare-precise and bare-centos now | 22:53 |
jeblair | clarkb: yeah, the output looks much less error-like on az1 | 22:53 |
jaypipes | clarkb: gotcha. thx man. | 22:54 |
jeblair | clarkb: i think this may be hpcs ticket-worthy | 22:54 |
clarkb | jeblair: I agree, though we may just be told to stop using az2 which is :( | 22:54 |
jeblair | clarkb: not much we can do about that, we can't use it now anyway | 22:55 |
clarkb | yup | 22:55 |
*** ryanpetrello has quit IRC | 22:55 | |
jeblair | clarkb: would you please do the honors? | 22:56 |
clarkb | oh you want me to do it :P yes I will file it | 22:56 |
zul | lifeless: thats for srus | 22:57 |
lifeless | zul: so, UCA doesn't need as much testing as SRUs ? | 22:58 |
lifeless | zul: anyhow, we want it in saucy directly too | 22:58 |
*** esker has joined #openstack-infra | 22:59 | |
*** thedodd has quit IRC | 23:00 | |
*** esker has quit IRC | 23:00 | |
*** esker has joined #openstack-infra | 23:00 | |
*** mrda_away is now known as mrda | 23:01 | |
zul | lifeless: to get it into saucy it needs an SRU; for UCA, it gets updated when trusty gets updated | 23:01 |
lifeless | zul: ok, so - tripleo wants it in saucy ;) | 23:01 |
zul | lifeless: thats nice for tripleo, that takes a bit longer then :) | 23:02 |
*** fbo is now known as fbo_away | 23:02 | |
*** markmcclain has joined #openstack-infra | 23:03 | |
*** markmcclain1 has joined #openstack-infra | 23:05 | |
clarkb | jeblair: ticket sent, I cc'd you | 23:05 |
*** markmcclain has quit IRC | 23:07 | |
*** julim has joined #openstack-infra | 23:08 | |
*** ayoung has joined #openstack-infra | 23:08 | |
*** khyati has quit IRC | 23:09 | |
*** jnoller has quit IRC | 23:10 | |
*** sarob has quit IRC | 23:11 | |
openstackgerrit | Mat Lowery proposed a change to openstack-infra/config: Enable list item bullets in CSS except for Jenkins https://review.openstack.org/71752 | 23:12 |
ayoung | jeblair, whom do we bug about enabling eavesdrop for #openstack-keystone? I feel like we are coding without git right now | 23:15 |
jeblair | ayoung: sorry, it's lost in the infra review backlog | 23:15 |
ayoung | of course | 23:15 |
clarkb | hsa the change been proposed? | 23:16 |
jeblair | ayoung: well, not lost, but it's there. | 23:16 |
clarkb | I see it | 23:16 |
*** yassine has quit IRC | 23:16 | |
jeblair | i can't really prioritize reviewing irc-related changes right now. sorry. | 23:16 |
clarkb | I will approve, I don't think there are any meetings for the next 45 minutes | 23:17 |
ayoung | heh | 23:17 |
*** gordc has quit IRC | 23:17 | |
ayoung | sorry to be a noodge | 23:17 |
clarkb | ayoung: out of curiosity why vacate -dev? | 23:17 |
dhellmann | dstufft: the problem is the amount of pain to rename the packages we already have :-/ | 23:17 |
jeblair | ayoung: apparently clarkb is the answer. he's nicer than i am. maybe i can convince him to review some of my changes. ;) | 23:17 |
dmsimard | jeblair: Leaving the office, i'll let you know if I still see the issue tomorrow | 23:17 |
ayoung | clarkb, so many people were complaining about the keystone devs crowding out the room | 23:17 |
clarkb | ayoung: thats the point | 23:18 |
dstufft | dhellmann: sufficient pain to teach you the error of your ways ;) | 23:18 |
dstufft | (yes it sucks :( ) | 23:18 |
clarkb | ayoung: eg that is a good thing | 23:18 |
ayoung | clarkb, think we should stay in -dev? | 23:18 |
clarkb | oh well | 23:18 |
dstufft | dhellmann: (true talk, basically this pain is why I'm anti namespaces, because i know this feel) | 23:19 |
clarkb | ayoung: not necessarily. I definitely seem to have a different idea of how irc should work than most | 23:19 |
clarkb | ayoung: I expect folks to use clients that don't suck :) | 23:19 |
dhellmann | dstufft: well, it's pain on the packagers, not on me | 23:19 |
dhellmann | the same pain applies for renaming anything | 23:19 |
ayoung | clarkb, I preferred being in -dev as it meant I was paying attention there and tended to answer General Purpose questions, too | 23:19 |
morganfainberg | clarkb, ++ on clients that don't suck | 23:20 |
*** oubiwann has joined #openstack-infra | 23:20 | |
zul | wth are we renaming now? | 23:20 |
morganfainberg | clarkb, and i agree w/ ayoung, but if there is a real push for us to be elsewhere, I'm ok with it. | 23:20 |
clarkb | zul: everything | 23:20 |
zul | awesome | 23:21 |
* zul goes jump off a cliff | 23:21 | |
morganfainberg | clarkb, *shrug* it's why i hang out here as well, good convos, and sometimes even unrelated to -infra stuffs | 23:21 |
*** yamahata has quit IRC | 23:21 | |
fungi | morganfainberg: we fish you in with good conversation and then try to put you to work on infra tasks ;) | 23:21 |
morganfainberg | fungi, LOL someday when dolphm_503 hasn't swamped us keystone folks w/ work, I'll be contributing more to infra :) | 23:22 |
morganfainberg | fungi, actually... it is on my "I will be more involved in this" list for Juno | 23:22 |
clarkb | lol logstash gearman backlog isn't falling | 23:22 |
*** flaper87 is now known as flaper87|afk | 23:23 | |
*** dmsimard has quit IRC | 23:23 | |
*** CaptTofu has quit IRC | 23:24 | |
jeblair | fungi: have the static slaves been deleted and nodepool config adjusted? | 23:24 |
jeblair | no | 23:24 |
jeblair | https://review.openstack.org/#/q/status:open+project:openstack-infra/config+branch:master+topic:single-use,n,z | 23:24 |
*** CaptTofu has joined #openstack-infra | 23:24 | |
openstackgerrit | A change was merged to openstack-infra/config: Add Eavesdrop bot to #openstack-keystone https://review.openstack.org/74472 | 23:27 |
jeblair | fungi, clarkb: i approved the next change in that series; we have a node ready and it has python3 and pypy installed | 23:27 |
jeblair | (i'm thinking 60 more nodes would be helpful now) | 23:27 |
clarkb | I am going to temporarily increase the number of logstash workers to 3 per host while I am watching it. Hopefully that drops the backlog | 23:27 |
morganfainberg | clarkb, ++ thanks for approving that | 23:27 |
clarkb | jeblair: sounds good | 23:28 |
SergeyLukjanov | jeblair, agreed | 23:28 |
*** chris_johnson has quit IRC | 23:28 | |
*** CaptTofu has quit IRC | 23:29 | |
SergeyLukjanov | heh, just realized that it's already 3:30am in my tz while reading scrollback... | 23:30 |
fungi | jeblair: yeah, that sounds good. i removed the static slaves (except py3k) from jenkins01 and 02 but didn't press forward yet with everything else going on | 23:30 |
fungi | we should be safe to delete the static centos6 and precise slaves from rax now | 23:31 |
fungi | i've seen no failures which seem to stem from the precise->bare-precise shift | 23:31 |
SergeyLukjanov | is there any way to see the gate backlog? | 23:31 |
clarkb | SergeyLukjanov: zuul status? | 23:32 |
*** dstanek has quit IRC | 23:32 | |
*** CaptTofu has joined #openstack-infra | 23:32 | |
*** sarob has joined #openstack-infra | 23:33 | |
*** dkliban_afk has quit IRC | 23:33 | |
SergeyLukjanov | clarkb, it shows only 20 for each queue that are now in progress | 23:33 |
*** openstack has joined #openstack-infra | 23:34 | |
clarkb | SergeyLukjanov: right, 20 is the floor and it will grow as long as there aren't failures | 23:36 |
clarkb | SergeyLukjanov: it adds 2 to the queue for each successful merge and halves with a floor of 20 for each failed merge | 23:36 |
clarkb | SergeyLukjanov: the actual value is in the json blob | 23:36 |
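A small sketch of the window behaviour clarkb describes, i.e. grow by two on a successful merge and halve with a floor of 20 on a failure; this illustrates the stated rule, not zuul's exact code. The live value itself can be read out of the status.json blob mentioned above.

    def next_window(window, merge_succeeded, floor=20, increase=2):
        # Dependent-pipeline window: reward successful merges, back off
        # quickly (but never below the floor) when a merge fails.
        if merge_succeeded:
            return window + increase
        return max(floor, window // 2)

    # Example: two failures in a row shrink a window of 100 down to 25.
    w = 100
    w = next_window(w, merge_succeeded=False)  # 50
    w = next_window(w, merge_succeeded=False)  # 25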
SergeyLukjanov | clarkb, yup, I know, looks like I should sleep a bit to be able to ask correctly :) | 23:37 |
clarkb | SergeyLukjanov: you should sleep more | 23:37 |
clarkb | SergeyLukjanov: compared to you and fungi I think I get more sleep than the both of you combined | 23:37 |
clarkb | >_> | 23:37 |
SergeyLukjanov | clarkb, oh, thanks for the tip about json | 23:37 |
*** hemna_ is now known as hemnafk | 23:38 | |
SergeyLukjanov | clarkb, :) | 23:38 |
*** oubiwann has quit IRC | 23:38 | |
clarkb | jeblair: ok, I think I just need to leave es and logstash be for a while and see how they do over a larger time sample | 23:39 |
clarkb | jeblair: anything in particular you think needs attention re feature proposal freeze? | 23:39 |
clarkb | if not I am going to go through review backlogs | 23:39 |
clarkb | jeblair: we have a response from hpcloud, it happens every time right? and we are booting precise images there? | 23:41 |
* clarkb pokes at nodepool for info | 23:41 | |
morganfainberg | out of curiosity who do you tell that the link for the hotel block at the omni is now raising a 404 (ATL summit)? | 23:42 |
clarkb | morganfainberg: the foundation | 23:42 |
clarkb | reed would be a good one but is afk this week | 23:42 |
morganfainberg | clarkb, hm, ok i'll hunt down some email on that front. | 23:42 |
morganfainberg | clarkb, k thnks :) | 23:42 |
morganfainberg | clarkb, Infra, they know everything | 23:43 |
morganfainberg | yes.. everything | 23:43 |
morganfainberg | ;) | 23:43 |
*** jgrimm has quit IRC | 23:43 | |
* anteaya finishes reading backscroll | 23:44 | |
*** jerryz has quit IRC | 23:44 | |
*** protux has quit IRC | 23:44 | |
*** dstanek has joined #openstack-infra | 23:44 | |
*** denis_makogon_ has quit IRC | 23:46 | |
*** jergerber has quit IRC | 23:46 | |
*** jerryz has joined #openstack-infra | 23:46 | |
openstackgerrit | Cyril Roelandt proposed a change to openstack-infra/config: python-ceilometerclient: make the py33 gate voting https://review.openstack.org/74875 | 23:46 |
*** esker has quit IRC | 23:46 | |
*** alexpilotti_ has quit IRC | 23:47 | |
openstackgerrit | A change was merged to openstack-infra/devstack-gate: Add change in README file according to changes in code https://review.openstack.org/74342 | 23:56 |
openstackgerrit | Cyril Roelandt proposed a change to openstack-infra/config: python-ceilometerclient: make the py33 gate voting https://review.openstack.org/74875 | 23:58 |