*** sarob_ has joined #openstack-infra | 00:01 | |
jeblair | that didn't quite tell me what i needed; restarted with more logging | 00:01 |
*** nati_uen_ has joined #openstack-infra | 00:02 | |
anteaya | :( | 00:02 |
anteaya | yay restart | 00:02 |
*** sarob has quit IRC | 00:04 | |
*** nati_ueno has quit IRC | 00:05 | |
*** michchap has joined #openstack-infra | 00:05 | |
*** weshay has quit IRC | 00:05 | |
clarkb | jeblair: I feel like trying to debug this git thing is taking too much time. Everything works but https clone from centos clients | 00:06 |
clarkb | jeblair: we can either make http available too, clone from /cgit, or use git:// on centos nodes | 00:07 |
clarkb | cloning from /cgit appears to use the non smart protocol | 00:07 |
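For context, the smart-vs-dumb distinction is visible from the client side; a minimal check might look like the following sketch (the URL is a placeholder, not the real server):

    # With curl tracing on, a smart-HTTP clone asks for
    # /info/refs?service=git-upload-pack up front; the dumb protocol instead
    # walks loose refs and pack files one request at a time.
    GIT_CURL_VERBOSE=1 git clone https://git.example.org/openstack/nova 2>&1 | grep 'GET '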
clarkb | I am going to see if pcrews is still in the office to see if he has any ideas | 00:08 |
pcrews | clarkb: /me is at home today :) | 00:08 |
jeblair | clarkb: ok. let's make http available as a backup but go back to git:// | 00:08 |
jeblair | clarkb: (with enough capacity to handle it this time) | 00:08 |
*** UtahDave has quit IRC | 00:09 | |
clarkb | jeblair: sounds good. I will push one more patchset to enable http then we should be ready to start spinning up nodes | 00:11 |
clarkb | pcrews: git clone https://foo through haproxy takes a really long time then fails. doing the same clone to the backend server takes a really long time but does not fail | 00:12 |
clarkb | pcrews: wondering what might cause that and it is only when using https not http and only on centos | 00:12 |
pcrews | ? not a clue | 00:14 |
*** senk has joined #openstack-infra | 00:16 | |
clarkb | jeblair: do you want to try doing this tonight? | 00:17 |
clarkb | fwiw /cgit isn't that much slower. about 2 minutes to clone over https | 00:18 |
jeblair | clarkb: up to you; i will need to continue to focus on zuul tonight | 00:18 |
jeblair | clarkb: but what is git:// ? | 00:18 |
clarkb | jeblair: 34 seconds or so | 00:18 |
jeblair | clarkb: that's kinda why i was thinking we should use it, but have http as a backup | 00:19 |
clarkb | ++ | 00:19 |
clarkb | jeblair: I think I would prefer at least one extra set of eyes when we make the switch as there are a lot of moving parts | 00:19 |
anteaya | clarkb: who do you have in mind? I'm no help there | 00:20 |
clarkb | anteaya: jeblair :) | 00:20 |
*** wenlock has quit IRC | 00:20 | |
jeblair | clarkb: how many servers do you want, and what size? | 00:20 |
anteaya | clarkb: ah okay, sorry thought you were talking about a 3rd person | 00:20 |
clarkb | jeblair: I think you have a better feel for that than I do | 00:21 |
jeblair | aha, i think i found the zuul problem | 00:21 |
clarkb | \o/ | 00:21 |
clarkb | jeblair: one additional thing to keep in mind with lots of small servers is the extra work gerrit will need to do replicating. Not sure if that is a big deal | 00:22 |
anteaya | yay | 00:22 |
comstud | btw, i appreciate the work all of you are doing... despite all of the cursing that I'm doing. :) | 00:22 |
jeblair | clarkb: good point; maybe we should go with several 8g servers then? | 00:22 |
anteaya | I vote we let jeblair try to patch zuul first | 00:22 |
anteaya | thanks comstud | 00:22 |
anteaya | I think we are doing our share of cursing too | 00:23 |
comstud | i figure so | 00:23 |
clarkb | jeblair: that sounds reasonable | 00:23 |
clarkb | jeblair: start with ~4 then we can add more if needed? | 00:23 |
*** alexpilotti has quit IRC | 00:23 | |
jeblair | clarkb: yeah | 00:24 |
clarkb | lifeless: does the haproxy source balance type completely break if your sources are in the same /24 subnet? | 00:24 |
clarkb | lifeless: I am slightly worried that replication delays will cause problems with git if it hits 5 different servers at once (which by default it can do) | 00:25 |
*** jjmb has joined #openstack-infra | 00:25 | |
clarkb | at least with the http protocol. I think git:// is one connection | 00:25 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 00:28 |
lifeless | clarkb: git is one connection yes | 00:28 |
clarkb | jeblair: ^ that makes http a viable fallback option | 00:28 |
*** dina_belova has joined #openstack-infra | 00:28 | |
lifeless | clarkb: multiple http requests can go to different servers | 00:28 |
anteaya | 10 lovely patches in the gate, currently passing tests ha ha ha ha *flash of lightning* | 00:28 |
clarkb | lifeless: typically, however we will be replicating to 5 different servers and potentially at different rates | 00:28 |
*** changbl has joined #openstack-infra | 00:28 | |
anteaya | the 11th one has a LOST | 00:28 |
lifeless | clarkb: if the first server hits its max_queue, it goes down | 00:29 |
lifeless | clarkb: yes, I get the case ;) | 00:29 |
lifeless | clarkb: have you read the docs for source - when you add servers, you'll shuffle 1/new-num-servers of the http clients onto new servers | 00:29 |
clarkb | lifeless: using the default round robin it dynamically weighs them | 00:30 |
lifeless | clarkb: right | 00:30 |
clarkb | oh looks like that would happen with source as there is a division of the hash | 00:31 |
lifeless | clarkb: there isn't a mode where you can avoid http requests going to different servers; you only get to choose whether it happens all the time or when you have servers going down/up/added. | 00:31 |
clarkb | I think I am less worried about that case and more generally worried about it when everything is going smoothly and gerrit happens to update one server more slowly than the others | 00:31 |
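For reference, the two balance modes being weighed map to a couple of lines of haproxy configuration; a minimal sketch (hostnames and ports are illustrative, not the proposed change):

    backend git_http
        balance source        # hash of the client IP divided across the server
                              # count, so adding a server reshuffles roughly
                              # 1/new-count of the clients
        # balance roundrobin  # alternative: dynamic weighting, but successive
                              # requests from one client can hit different backends
        server git01 git01.example.org:8080 check
        server git02 git02.example.org:8080 check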
lifeless | are you running git-http-backend, or plain-ol-HTTP ? | 00:32 |
harlowja | qq, is there going to be an update to say the mailing list when jenkins is ok again? | 00:32 |
clarkb | git-http-backend | 00:32 |
*** dina_belova has quit IRC | 00:33 | |
anteaya | harlowja: you will hear the party happening when jenkins is okay again | 00:33 |
harlowja | sounds great :) | 00:33 |
anteaya | and yes, we can do an update to the ml too | 00:33 |
anteaya | thanks | 00:33 |
harlowja | thx for your guys hardwork | 00:33 |
harlowja | *and gals | 00:33 |
anteaya | thanks harlowja | 00:34 |
anteaya | :D | 00:34 |
lifeless | clarkb: so, why do you want HTTP ? | 00:34 |
lifeless | clarkb: you should read http://git-scm.com/book/en/Git-Internals-Transfer-Protocols | 00:34 |
lifeless | thats why the http git CDN terrifies me :) | 00:36 |
*** chmouel has quit IRC | 00:36 | |
jeblair | stopped zuul again; it hit the bug and i have more logs | 00:36 |
*** westmaas has quit IRC | 00:36 | |
*** westmaas has joined #openstack-infra | 00:36 | |
*** chmouel has joined #openstack-infra | 00:37 | |
*** GheRivero has quit IRC | 00:37 | |
*** dtroyer has quit IRC | 00:37 | |
*** juice has quit IRC | 00:37 | |
*** GheRivero has joined #openstack-infra | 00:37 | |
anteaya | jeblair: yay | 00:37 |
*** dtroyer has joined #openstack-infra | 00:37 | |
anteaya | let's hope the secret is in the logs | 00:37 |
*** jpeeler has quit IRC | 00:37 | |
*** juice has joined #openstack-infra | 00:37 | |
*** jpeeler has joined #openstack-infra | 00:38 | |
*** jjmb1 has joined #openstack-infra | 00:39 | |
clarkb | lifeless: for a couple reasons. 1. Apache is generally good about helping us not shoot ourselves in the foot unlike git daemon 2. $RANDOM people can usually hit port 443 3. with https you have a reasonable amount of trust in who the remote end is | 00:39 |
*** jjmb has quit IRC | 00:40 | |
clarkb | lifeless: I think we are getting better at item 1 with haproxy but items 2 and 3 aren't really solved with git daemon + haproxy | 00:40 |
*** melwitt has quit IRC | 00:40 | |
clarkb | 2 and 3 aren't really gate issues | 00:42 |
clarkb | jeblair: I have ten nova git clones over git protocol in a while true loop on the 30g host cloning from the 15g host through haproxy | 00:44 |
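Presumably that load test amounts to something like the following sketch (the IP is the 15g test host named later in the log; the exact commands are an assumption):

    # ten parallel clone loops over git:// against the haproxy-fronted host;
    # stop with Ctrl-C
    for i in $(seq 1 10); do
      while true; do
        rm -rf "nova-$i" && git clone "git://162.209.12.127/openstack/nova" "nova-$i"
      done &
    done
    wait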
lifeless | clarkb: http://www.anchor.com.au/blog/2009/10/load-balancing-at-github-why-ldirectord/ might be an interesting read when you have time | 00:45 |
lifeless | clarkb: so I'm reasonably sure smart http will still do multiple requests. | 00:46 |
lifeless | clarkb: smart https will be totally fine. it's only http that will suck. | 00:46 |
lifeless | clarkb: my suggestion, use roundrobin, but https and git ports only | 00:46 |
clarkb | lifeless: how is https different than http in this scenario? | 00:47 |
lifeless | clarkb: [and https in tcp mode so you just get a tunnel] | 00:47 |
mgagne | lifeless: where are their puppet manifests =) | 00:48 |
lifeless | clarkb: I'm fairly sure git will use one tcp connection for https | 00:48 |
lifeless | clarkb: because everyone knows how slow https handshakes are | 00:48 |
clarkb | lifeless: I wonder if that could be why it fails so hard on centos | 00:49 |
lifeless | clarkb: and intermediaries can't mess you up, whereas for http there are intercepting proxies all over the damn place | 00:49 |
lifeless | clarkb: https intercepting proxies are rarer | 00:49 |
clarkb | there is a long delay when starting an https clone. At first I thought it may be related to handshaking but cloning from /cgit over https does not have the same delay and they share the same ssl setup | 00:50 |
*** sarob_ has quit IRC | 00:51 | |
clarkb | and tcpdump showed the client reporting a zero window size frequently | 00:52 |
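The zero-window observation can be reproduced with a capture filter on the TCP window field; a sketch (interface name is illustrative, not necessarily the command that was run):

    # bytes 14-15 of the TCP header carry the advertised window; this shows only
    # segments advertising a zero window on the https port
    tcpdump -nn -i eth0 'tcp port 443 and tcp[14:2] = 0'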
*** jhesketh has quit IRC | 00:53 | |
*** jhesketh has joined #openstack-infra | 00:53 | |
*** nati_uen_ has quit IRC | 00:55 | |
clarkb | jeblair: I have taken the load on fake git.o.o up to 23ish and haven't had any clones fall over yet | 00:55 |
*** nati_ueno has joined #openstack-infra | 00:57 | |
openstackgerrit | benley proposed a change to openstack-infra/jenkins-job-builder: Add display-name job property. https://review.openstack.org/41828 | 01:01 |
openstackgerrit | Angus Salkeld proposed a change to openstack/requirements: Add some more filters to the .gitignore https://review.openstack.org/43216 | 01:01 |
openstackgerrit | Angus Salkeld proposed a change to openstack/requirements: Bump python-ceilometerclient to 1.0.3 https://review.openstack.org/43217 | 01:01 |
jeblair | #status alert Zuul is offline for troubleshooting | 01:02 |
openstackstatus | NOTICE: Zuul is offline for troubleshooting | 01:02 |
*** ChanServ changes topic to "Zuul is offline for troubleshooting" | 01:02 | |
clarkb | jeblair: I am going to grab dinner soon | 01:05 |
jeblair | clarkb: k | 01:05 |
*** reed has quit IRC | 01:06 | |
clarkb | jeblair: we will need to spin up those 4 nodes tomorrow then write puppet changes to replicate to them and add them to haproxy then we can put everything in | 01:06 |
clarkb | oh and changes to update the clone urls | 01:06 |
jeblair | clarkb: are you happy with the haproxy config? | 01:06 |
*** nati_ueno has quit IRC | 01:07 | |
clarkb | jeblair: I think so. I hammered the git:// relatively hard with some for loops | 01:07 |
*** markmcclain has quit IRC | 01:07 | |
*** fifieldt_ has joined #openstack-infra | 01:20 | |
*** UtahDave has joined #openstack-infra | 01:20 | |
openstackgerrit | James E. Blair proposed a change to openstack-infra/zuul: Make updateChange actually update the change https://review.openstack.org/43220 | 01:23 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/zuul: Add some log lines https://review.openstack.org/43221 | 01:23 |
jeblair | let's hope that's it. | 01:23 |
jeblair | i will install that and restart zuul | 01:23 |
* clarkb reviews really quickly | 01:25 | |
jeblair | clarkb: i will wait to start until you have reviewed | 01:25 |
jeblair | clarkb: specifically it was the 'needed_by_changes' that was the problem here | 01:25 |
jeblair | and a patch series of like 20 changes | 01:26 |
clarkb | ok interesting thing about the list of files | 01:26 |
clarkb | jeblair: yup lgtm. nice catch | 01:27 |
jeblair | i just did that because it looked like it could be wrong too (just keep appending files) | 01:27 |
clarkb | ya I agree | 01:27 |
clarkb | ok running off to find dinner and fungi | 01:27 |
jeblair | ok starting zuul | 01:27 |
*** dina_belova has joined #openstack-infra | 01:28 | |
*** dina_belova has quit IRC | 01:33 | |
*** gyee has quit IRC | 01:35 | |
openstackgerrit | Mathieu Gagné proposed a change to openstack-infra/config: Add commit-filter for cgit https://review.openstack.org/43222 | 01:36 |
*** huangtianhua has joined #openstack-infra | 01:37 | |
jeblair | #status ok Zuul is running again | 01:38 |
openstackstatus | NOTICE: Zuul is running again | 01:38 |
*** ChanServ changes topic to "Discussion of OpenStack Developer Infrastructure | docs http://ci.openstack.org | bugs https://launchpad.net/openstack-ci/+milestone/grizzly | https://github.com/openstack-infra/config" | 01:38 | |
anteaya | yay | 01:38 |
anteaya | let's see what happens now | 01:38 |
anteaya | way to go jeblair! | 01:39 |
jeblair | may want to wait about an hour before you cheer :) | 01:39 |
anteaya | I'll cheer again in an hour | 01:40 |
*** mriedem has joined #openstack-infra | 01:40 | |
anteaya | hopefully soon I will learn enough to help you | 01:40 |
*** dhellmann is now known as dhellmann_ | 01:48 | |
*** xchu has joined #openstack-infra | 01:52 | |
*** yaguang has joined #openstack-infra | 02:02 | |
*** roaet has joined #openstack-infra | 02:17 | |
roaet | Alright. Read through that scroll back. I'll try to pay attention now. :) | 02:17 |
anteaya | roaet: welcome | 02:19 |
anteaya | 66 in the gate, 25 in the check | 02:20 |
anteaya | any free nodes are going to check now, the gate queue seems to be loaded | 02:20 |
anteaya | it took almost an hour to load those 91 patches | 02:21 |
anteaya | and I expect there are at least 90 check patches to come into the check queue so I would look for your patch in the list in about an hour roaet | 02:22 |
roaet | anteaya: thanks. I will do so. trying to wrap my mind around all the information. | 02:22 |
anteaya | it is a lot | 02:22 |
anteaya | I suggest take a small piece | 02:22 |
anteaya | if you know jenkins plugins already, start there | 02:22 |
anteaya | ask questions, don't worry how dumb | 02:23 |
anteaya | I'll do my best to answer or help you find the answer | 02:23 |
roaet | Thanks. Look forward to working with you all. | 02:23 |
*** DennyZhang has joined #openstack-infra | 02:23 | |
anteaya | thanks looking forward to working with you too | 02:23 |
anteaya | what time zone are you in? | 02:24 |
anteaya | I'm in Eastern | 02:24 |
roaet | Central | 02:24 |
jeblair | anteaya: i don't have a full list of everything that needs to be rechecked/reverified; only the list from when i stopped it the first time | 02:25 |
anteaya | jeblair: great | 02:25 |
jeblair | anteaya: i'm slowly leaving recheck comments on those, to avoid the thundering herd | 02:25 |
anteaya | I have been encouraging those to wait for the queue to populate and then recheck if they don't see theirs | 02:25 |
anteaya | great | 02:25 |
jeblair | anteaya: but anything added since about 4 hours or so ago i won't have | 02:26 |
anteaya | I am encouraging the thundering herd to let us build up slowly | 02:26 |
anteaya | ah okay | 02:26 |
anteaya | I'll use that as a marker | 02:26 |
jeblair | (if i've been leaving recheck comments though, my script will get to those again) | 02:26 |
roaet | jeblair: I'm assuming if you hit my change then it was there (i see your recheck there) | 02:26 |
jeblair | roaet: probably, which number? | 02:26 |
roaet | 42242 | 02:27 |
jeblair | roaet: yeah, it's about 30-something down the list, so probably any time now | 02:27 |
roaet | Thanks a lot. I'll try to help however I can in the future. Don't want to mythical man month you at the moment. But I'll try my best. | 02:28 |
lifeless | ttx: jeblair: so - nova baremetal is broken at the moment | 02:28 |
lifeless | does that impact anything release mgmt wise right now ? | 02:28 |
*** dina_belova has joined #openstack-infra | 02:29 | |
jeblair | lifeless: i don't think so; we're just around feature freeze (we're actually only at feature proposal freeze) | 02:29 |
jeblair | lifeless: https://wiki.openstack.org/wiki/Havana_Release_Schedule | 02:30 |
jeblair | lifeless: h3 milestone release is sep 6 | 02:30 |
jeblair | \o/ 3 changes just merged | 02:33 |
*** dina_belova has quit IRC | 02:33 | |
morganfainberg | jeblair: just looking at https://review.openstack.org/#/c/39899/ i see zuul posted on it about 15 minutes ago, but don't see it in the queue | 02:34 |
morganfainberg | oh wait nvm | 02:34 |
morganfainberg | Misreading time | 02:34 |
morganfainberg | crap 2hrs ago | 02:34 |
anteaya | ah okay | 02:34 |
morganfainberg | i obviously can't think. | 02:34 |
anteaya | no worries | 02:34 |
jeblair | morganfainberg: yeah, you'll want to reverify that then, sorry | 02:35 |
anteaya | we are all tired | 02:35 |
morganfainberg | jeblair: not a worry man, just trying to make sure i get these important ones in the queue | 02:35 |
*** gordc has joined #openstack-infra | 02:35 | |
anteaya | 4 successful patches in the gate | 02:35 |
anteaya | yay | 02:35 |
morganfainberg | jeblair: is it going to take a while to pickup reverifies since it's still slowly reconsituting the queues? | 02:36 |
*** rcleere has joined #openstack-infra | 02:36 | |
jeblair | morganfainberg: it's about 70 gerrit events behind right now, so if you add your reverify, it'll go onto the end of that queue first before it shows up in the gate queue | 02:37 |
morganfainberg | great thats that i wanted to know. | 02:38 |
morganfainberg | thanks! | 02:38 |
*** ftcjeff has joined #openstack-infra | 02:38 | |
jeblair | 6 more changes merged | 02:38 |
openstackgerrit | Steve Baker proposed a change to openstack-infra/config: Generate heat docs on check and gate https://review.openstack.org/43234 | 02:38 |
morganfainberg | woot. | 02:38 |
anteaya | yay | 02:39 |
morganfainberg | merged is good! | 02:39 |
anteaya | yay merged | 02:39 |
jeblair | the git server is still going to be a big problem; a lot of tests are going to fail because of that. | 02:40 |
anteaya | :( | 02:41 |
anteaya | https://tinyurl.com/m9gcyjp | 02:41 |
anteaya | the graph seems to be in UTC | 02:41 |
anteaya | hopefully zuul can stay up for the rest of the night (insert appropriate time of day for yourself, dear reader) | 02:42 |
anteaya | jeblair: so the plan is to address git changes tomorrow? | 02:51 |
anteaya | yay 3 jobs in post | 02:51 |
jeblair | anteaya: yes. just like today. | 02:52 |
anteaya | sorry I thought today was benchmarking and tomorrow is making the changes | 02:52 |
jeblair | anteaya: nope, benchmarking wasn't on the agenda until after we rolled it out. it just took a while for haproxy to get set up. | 02:53 |
anteaya | ah | 02:53 |
morganfainberg | anteaya: ok, my 2 changesets that are needed arrived in the gate queue, thanks again for keeping me posted on what was up over here earlier on. | 02:54 |
anteaya | sorry I missed that point | 02:54 |
anteaya | yay | 02:54 |
anteaya | you are welcome, morganfainberg, thanks for your patience | 02:54 |
gordc | sweet, finally got a jenkins result back. big thanks jeblair and anyone else working on the issues. | 02:55 |
anteaya | yay gordc | 02:58 |
anteaya | congratulations on your jenkins result | 02:58 |
anteaya | jeblair has been working hard on it | 02:58 |
gordc | small victories :) yep, i've seen his name all over the rechecks. | 02:59 |
*** markmcclain has joined #openstack-infra | 03:00 | |
anteaya | :D | 03:00 |
anteaya | gotta celebrate them when they occur | 03:00 |
*** mriedem has quit IRC | 03:01 | |
*** tjones has joined #openstack-infra | 03:02 | |
*** tjones has left #openstack-infra | 03:02 | |
*** blamar has quit IRC | 03:17 | |
anteaya | zuul has been up for an hour and 40 minutes, how is it looking jeblair? | 03:19 |
anteaya | can I cheer again? | 03:19 |
anteaya | everything I can see looks good | 03:20 |
anteaya | jobs are finishing, others are starting | 03:20 |
anteaya | roaet your patch is being tested as we speak | 03:22 |
*** dina_belova has joined #openstack-infra | 03:29 | |
*** jfriedly has quit IRC | 03:32 | |
anteaya | there are two patches, a cinder and a neutron patch that have been in the post queue for a while | 03:34 |
*** dina_belova has quit IRC | 03:34 | |
anteaya | the translation-update job passed for both but the other three jobs: tarball, coverage and docs are queued and have been for a while | 03:34 |
anteaya | I will watch them and see if they move along | 03:35 |
anteaya | gate 72, check 153, post 2 | 03:35 |
Alex_Gaynor | need more workers :) | 03:35 |
anteaya | yeah | 03:36 |
anteaya | that is what jeblair and clarkb talked about creating | 03:36 |
anteaya | I think it is on tomorrow's agenda | 03:36 |
Alex_Gaynor | "need more cloud" :) | 03:36 |
anteaya | moar cloud | 03:37 |
anteaya | yeah, I hear that | 03:37 |
Alex_Gaynor | did we land either the git mirroring or the zuul fix, or are we just flying on luck? | 03:37 |
anteaya | okay those post patches have jobs running | 03:37 |
anteaya | zuul fix landed | 03:37 |
anteaya | been up for two hours with the new zuul fix | 03:38 |
Alex_Gaynor | oh, awesome, so now it should be at least smooth sailing (but slow) | 03:38 |
anteaya | so far, so good from what I can see | 03:38 |
anteaya | smooth but slow would be great | 03:38 |
anteaya | hanging out to check on the smooth part | 03:38 |
Alex_Gaynor | hmm, was the commit in the zuul repo? I don't see any new commits | 03:38 |
anteaya | not yet | 03:38 |
anteaya | let me dig it up | 03:39 |
Alex_Gaynor | I thought I read everything in the backlog, I must have missed it | 03:39 |
anteaya | https://review.openstack.org/#/c/43220/ | 03:39 |
anteaya | https://review.openstack.org/#/c/43221/ | 03:39 |
anteaya | everything I understand has me believing that jeblair made these changes before the last zuul restart | 03:40 |
anteaya | I rely on jeblair to correct me if I am wrong | 03:40 |
anteaya | in this regard | 03:40 |
Alex_Gaynor | thanks | 03:40 |
anteaya | np | 03:41 |
anteaya | thanks for asking | 03:41 |
anteaya | you actually know more about what is going on than I do | 03:41 |
Alex_Gaynor | I seriously doubt it :) | 03:42 |
anteaya | ha ha ha | 03:42 |
*** yaguang has quit IRC | 03:42 | |
anteaya | well you know a lot | 03:42 |
*** nati_ueno has joined #openstack-infra | 03:42 | |
anteaya | grateful for you input | 03:42 |
*** dims has quit IRC | 03:43 | |
anteaya | s/you/your | 03:44 |
*** nati_ueno has quit IRC | 03:44 | |
*** yaguang has joined #openstack-infra | 03:45 | |
anteaya | yay Queue lengths: 0 events, 0 results. | 03:51 |
anteaya | gate 71, post 1, check 152 | 03:51 |
anteaya | it is deleting a bunch of servers and starting a bunch more jobs | 03:52 |
*** HenryG_ has joined #openstack-infra | 03:54 | |
*** dstufft_ has joined #openstack-infra | 03:54 | |
*** DennyZhang has quit IRC | 03:54 | |
*** cyeoh has quit IRC | 03:54 | |
*** soren has quit IRC | 03:54 | |
*** dstufft has quit IRC | 03:55 | |
*** soren has joined #openstack-infra | 03:55 | |
*** DennyZhang has joined #openstack-infra | 03:55 | |
*** cyeoh has joined #openstack-infra | 03:55 | |
*** HenryG has quit IRC | 03:57 | |
*** vogxn has joined #openstack-infra | 03:57 | |
anteaya | slow and smooth seems to characterize what I am seeing right now | 03:58 |
Alex_Gaynor | uh oh, we've got a failure coming up in the gate pipeline :( | 03:58 |
anteaya | :( | 03:58 |
anteaya | yeah that always makes me sad too | 03:59 |
anteaya | the third patch | 03:59 |
anteaya | so two have a chance of getting in, then reset | 03:59 |
* Alex_Gaynor shaves 45 minutes off his life | 03:59 | |
anteaya | when I see 6 or 8 passing in the gate, I do a little happy dance in my chair | 04:00 |
anteaya | yup | 04:00 |
anteaya | http://tinyurl.com/kmotmns | 04:00 |
anteaya | here is a url for the test node graph | 04:00 |
anteaya | refreshing the page updates the graph | 04:00 |
anteaya | I have to turn in | 04:00 |
anteaya | what patch are you waiting on Alex_Gaynor? | 04:01 |
Alex_Gaynor | Nothing in particular, I just like watching the patches flow through the system | 04:01 |
anteaya | cool | 04:01 |
anteaya | I hear that | 04:02 |
anteaya | 3 in post, yay! | 04:02 |
anteaya | okay I'm done | 04:02 |
anteaya | have a good night Alex_Gaynor | 04:03 |
Alex_Gaynor | you too! | 04:03 |
anteaya | thanks | 04:03 |
*** anteaya has quit IRC | 04:03 | |
*** eharney has joined #openstack-infra | 04:03 | |
*** dkliban has joined #openstack-infra | 04:04 | |
Alex_Gaynor | uh oh, the results seem to have started to build up again | 04:07 |
Alex_Gaynor | there's several changesets in check that should have been processed already | 04:08 |
Alex_Gaynor | maybe not, coming back down | 04:12 |
*** huangtianhua has quit IRC | 04:15 | |
*** dklyle has joined #openstack-infra | 04:23 | |
*** dtroyer has quit IRC | 04:23 | |
*** dtroyer has joined #openstack-infra | 04:23 | |
*** retr0h has quit IRC | 04:23 | |
*** david-lyle has quit IRC | 04:23 | |
*** samalba has quit IRC | 04:23 | |
*** comstud has quit IRC | 04:23 | |
*** mberwanger has joined #openstack-infra | 04:24 | |
*** retr0h has joined #openstack-infra | 04:25 | |
*** retr0h has joined #openstack-infra | 04:25 | |
*** comstud has joined #openstack-infra | 04:25 | |
openstackgerrit | A change was merged to openstack-infra/jenkins-job-builder: Add support for parameter filters in copyartifact https://review.openstack.org/41582 | 04:26 |
openstackgerrit | A change was merged to openstack-infra/jenkins-job-builder: Fixed timeout wrapper https://review.openstack.org/42348 | 04:26 |
*** samalba has joined #openstack-infra | 04:27 | |
*** gordc has quit IRC | 04:29 | |
*** dina_belova has joined #openstack-infra | 04:30 | |
*** rcleere has quit IRC | 04:32 | |
*** markmcclain has quit IRC | 04:33 | |
*** markmcclain has joined #openstack-infra | 04:33 | |
*** dina_belova has quit IRC | 04:35 | |
clarkb | Alex_Gaynor: seems ok right now | 04:35 |
clarkb | Alex_Gaynor: have you seen any more oddness? | 04:35 |
Alex_Gaynor | clarkb: yeah, must have just been a temporary blip | 04:35 |
Alex_Gaynor | Also, damn you centos for being the only thing with py26 | 04:36 |
clarkb | Alex_Gaynor: I agree | 04:36 |
clarkb | centos git makes me so sad | 04:36 |
*** boris-42 has joined #openstack-infra | 04:36 | |
Alex_Gaynor | clarkb: I don't think so, the only other thing I've noticed is that sometimes the SCP step takes an abnormally long time, way longer than I remember it taking in previous weeks | 04:37 |
*** DennyZhang has quit IRC | 04:37 | |
clarkb | Alex_Gaynor: there may be contention on the log server | 04:37 |
*** eharney has quit IRC | 04:37 | |
Alex_Gaynor | makes sense | 04:37 |
Alex_Gaynor | most insane CI infrastructure I've ever been a part of | 04:37 |
clarkb | http://cacti.openstack.org/cacti/graph_view.php?action=tree&tree_id=1&leaf_id=13&page=2 | 04:38 |
clarkb | its possible the finds that cleanup things slow stuff down when they run | 04:38 |
clarkb | Alex_Gaynor: the big CPU blips you see on that page are a result of find running and deleting old logs, compressing new things and so on | 04:39 |
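Those cleanup passes are essentially periodic find runs over the log tree; a hedged sketch of the general shape (paths and retention periods are illustrative, not the actual cron jobs):

    # compress logs older than a week, then expire very old ones
    find /srv/static/logs -type f -name '*.txt' -mtime +7 -exec gzip {} +
    find /srv/static/logs -type f -mtime +180 -delete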
*** Anju has left #openstack-infra | 04:39 | |
clarkb | Alex_Gaynor: I will make a note to look at that once git is sorted | 04:39 |
Alex_Gaynor | clarkb: yeah, definitely a low priority item :) | 04:39 |
clarkb | jeblair: my git plan. 1. spin up new servers 2. replicate from gerrit to new servers. 3. merge change to use git:// in g-g-p 4. merge haproxy change 5. merge change to add haproxy nodes | 04:40 |
clarkb | jeblair: 4 and 5 may end up being squashed together or at least merged together | 04:40 |
clarkb | then at some point we can use git:// in d-g but d-g should continue to happily use https so we can make sure nothing is falling over before doing that | 04:41 |
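Step 2 of that plan is driven by remote sections in gerrit's replication config; a sketch of the shape of one entry (the user, path, and options shown are assumptions, not the actual change):

    [remote "git01"]
        url = cgit@git01.openstack.org:/var/lib/git/${name}.git
        mirror = true
        threads = 3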
*** dstufft_ is now known as dstufft | 04:46 | |
*** ftcjeff has quit IRC | 04:49 | |
* Alex_Gaynor wonders if there's merit in installing a ppa for 2.6 on some of the other nodes | 04:50 | |
clarkb | maybe? there is value in testing on centos | 04:51 |
clarkb | now we know that you don't want to deploy openstack with git on centos :) | 04:52 |
clarkb | at least not with https and git-http-backend | 04:52 |
* Alex_Gaynor doesn't want to deploy much of anything on centos | 04:52 | |
* dstufft concurs with Alex_Gaynor | 04:52 | |
Alex_Gaynor | basically the entire check queue is bottlenecked on 2.6 :{ | 04:53 |
*** SergeyLukjanov has joined #openstack-infra | 04:53 | |
clarkb | Alex_Gaynor: yeah made worse by the gate monopolizing those resources | 04:53 |
Alex_Gaynor | clarkb: probably better this way, as long as we don't get a gate reset | 04:54 |
clarkb | ya, making the gate a higher priority was done with reason | 04:55 |
clarkb | in part to remove barriers to merging security related fixes | 04:55 |
Alex_Gaynor | the right way to address starvation in check is to just add more workers, not mess with the algorithms, IMO | 04:55 |
clarkb | yup | 04:55 |
*** nati_ueno has joined #openstack-infra | 04:56 | |
clarkb | Alex_Gaynor: we definitely want to use nodepool to dynamically add slaves that hang around longer | 04:56 |
clarkb | Alex_Gaynor: mordred has even hacked up kexec machinery that might be useful in having single use slaves that aren't as expensive to use as today's single use slaves | 04:56 |
Alex_Gaynor | neat | 04:57 |
clarkb | Alex_Gaynor: the tricky bit there is we have single use slaves like we do today because tests get root and can really hose stuff | 04:57 |
clarkb | Alex_Gaynor: making sure that kexec can reboot into a good state without having been hosed by a test is a bit of work | 04:57 |
Alex_Gaynor | I wonder if there's any prior art | 04:58 |
clarkb | Alex_Gaynor: jeblair had stuff to do it when the tests ran on hardware | 04:58 |
clarkb | but I am not sure how worried they were of root abuse (intentional or not) at the time | 04:58 |
*** dkliban has quit IRC | 05:00 | |
fungi | clarkb: still reading scrollback but for future reference i think you can pass git ipv6 address literals using standard square-bracket notation (git clone http://[2001:4800:7812:514:3bc | 05:03 |
pleia2 | ah! good to know | 05:04 |
*** Dr01d has joined #openstack-infra | 05:04 | |
clarkb | fungi: thanks. I wonder why it doesn't split on the right side | 05:05 |
clarkb | seems like that should work just fine. I guess if you leave the port off you won't know if it is a port or part of the address | 05:05 |
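An illustrative form of the bracket notation fungi describes (using the documentation prefix rather than the real address):

    # brackets mark where the address ends, so an optional :port is unambiguous
    git clone 'http://[2001:db8::10]:8080/openstack/nova'
    git clone 'http://[2001:db8::10]/openstack/nova'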
clarkb | fungi: if you want to poke at the centos git cloning the 15g server jeblair listed in scroll back 162.209.12.127 iirc (I remembered that somehow) is the haproxy + apache + git server and has openstack/nova on it | 05:07 |
clarkb | fungi: the 30g server 198.something was where I was running the client | 05:07 |
clarkb | fungi: haproxy is listening on 80, 443, and 9418 and apache is on 8080, 4443. git-daemon is on 29418 | 05:09 |
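A minimal haproxy sketch consistent with that port layout (section names are illustrative; this is not the actual proposed configuration):

    listen git_daemon 0.0.0.0:9418
        mode tcp
        balance source
        server local 127.0.0.1:29418 check

    listen git_https 0.0.0.0:443
        mode tcp                     # TLS passes through; apache terminates it on 4443
        balance source
        server local 127.0.0.1:4443 check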
*** Ryan_Lane has quit IRC | 05:10 | |
*** nicedice_ has quit IRC | 05:13 | |
fungi | clarkb: cool. i'll see if i can spot any major differences in window scaling defaults in the kernel tcp/ip settings vs on ubuntu precise | 05:15 |
clarkb | ty | 05:15 |
*** mberwanger has quit IRC | 05:16 | |
*** primeministerp has quit IRC | 05:22 | |
*** primeministerp has joined #openstack-infra | 05:29 | |
*** dina_belova has joined #openstack-infra | 05:30 | |
*** sridevi has joined #openstack-infra | 05:32 | |
*** dina_belova has quit IRC | 05:35 | |
openstackgerrit | A change was merged to openstack/requirements: Bump python-swiftclient requirement to >=1.5 https://review.openstack.org/43092 | 05:37 |
openstackgerrit | A change was merged to openstack-infra/jenkins-job-builder: Fixing override-votes for gerrit trigger https://review.openstack.org/42341 | 05:45 |
*** morganfainberg is now known as morganfainberg|a | 05:56 | |
*** dmakogon_ has joined #openstack-infra | 05:58 | |
*** UtahDave has quit IRC | 05:58 | |
*** xchu has quit IRC | 06:07 | |
*** SlickNik has quit IRC | 06:09 | |
*** SlickNik has joined #openstack-infra | 06:09 | |
*** morganfainberg|a is now known as morganfainberg | 06:14 | |
*** SergeyLukjanov has quit IRC | 06:15 | |
*** markmc has joined #openstack-infra | 06:20 | |
*** dguitarbite has quit IRC | 06:21 | |
markmc | clarkb, fwiw, https://bugs.launchpad.net/openstack-ci/+bug/1215290 | 06:21 |
uvirtbot | Launchpad bug 1215290 in openstack-ci "git https clones failing on centos slaves" [Undecided,New] | 06:21 |
markmc | clarkb, just trying to get the info in one place | 06:21 |
markmc | clarkb, I'm gonna see what the story is with git being rebased in RHEL6 | 06:22 |
markmc | clarkb, happy to help build a newer git RPM, though, if you'd use that | 06:22 |
*** sridevi has quit IRC | 06:23 | |
clarkb | markmc: thanks. I would be open to building newer git rpms but jeblair was understandably hesitant | 06:23 |
markmc | clarkb, ok | 06:24 |
*** xchu has joined #openstack-infra | 06:24 | |
openstackgerrit | @Spazm proposed a change to openstack-infra/git-review: fixes behavior when port=None https://review.openstack.org/43248 | 06:26 |
*** HenryG_ has quit IRC | 06:30 | |
*** mikal has quit IRC | 06:30 | |
*** dina_belova has joined #openstack-infra | 06:31 | |
*** p5ntangle has joined #openstack-infra | 06:31 | |
*** mikal has joined #openstack-infra | 06:32 | |
*** Dr01d has quit IRC | 06:34 | |
*** dina_belova has quit IRC | 06:36 | |
*** markmc has quit IRC | 06:38 | |
*** nayward has joined #openstack-infra | 06:42 | |
openstackgerrit | @Spazm proposed a change to openstack-infra/git-review: fixes behavior when port=None https://review.openstack.org/43248 | 06:42 |
*** p5ntangle has quit IRC | 06:42 | |
*** AJaeger has joined #openstack-infra | 06:57 | |
*** nati_ueno has quit IRC | 06:58 | |
*** Dr01d has joined #openstack-infra | 07:01 | |
*** AJaeger has quit IRC | 07:05 | |
*** SergeyLukjanov has joined #openstack-infra | 07:05 | |
*** pblaho has joined #openstack-infra | 07:05 | |
*** odyssey4me4 has joined #openstack-infra | 07:14 | |
*** fbo_away is now known as fbo | 07:16 | |
*** markmcclain has quit IRC | 07:18 | |
*** yolanda has joined #openstack-infra | 07:21 | |
*** sridevi has joined #openstack-infra | 07:24 | |
*** Anju has joined #openstack-infra | 07:25 | |
*** sridevi has quit IRC | 07:28 | |
*** vogxn has quit IRC | 07:28 | |
openstackgerrit | @Spazm proposed a change to openstack-infra/git-review: fixes behavior when port=None https://review.openstack.org/43248 | 07:29 |
*** dina_belova has joined #openstack-infra | 07:32 | |
openstackgerrit | @Spazm proposed a change to openstack-infra/git-review: fixes behavior when port=None https://review.openstack.org/43248 | 07:33 |
*** dina_belova has quit IRC | 07:36 | |
*** jpich has joined #openstack-infra | 07:37 | |
*** mkerrin has joined #openstack-infra | 07:39 | |
*** SergeyLukjanov has quit IRC | 07:40 | |
*** p5ntangle has joined #openstack-infra | 07:43 | |
*** afazekas has joined #openstack-infra | 07:52 | |
*** boris-42 has quit IRC | 07:58 | |
*** p5ntangle has quit IRC | 08:01 | |
*** AJaeger has joined #openstack-infra | 08:01 | |
*** AJaeger has joined #openstack-infra | 08:01 | |
*** p5ntangle has joined #openstack-infra | 08:02 | |
*** xchu has quit IRC | 08:04 | |
*** AJaeger has quit IRC | 08:06 | |
*** AJaeger has joined #openstack-infra | 08:12 | |
*** AJaeger has joined #openstack-infra | 08:12 | |
*** fifieldt_ has quit IRC | 08:15 | |
*** xchu has joined #openstack-infra | 08:16 | |
*** p5ntangl_ has joined #openstack-infra | 08:19 | |
*** vogxn has joined #openstack-infra | 08:20 | |
*** AJaeger has quit IRC | 08:20 | |
*** cthulhup has joined #openstack-infra | 08:21 | |
*** markmc has joined #openstack-infra | 08:22 | |
*** p5ntangle has quit IRC | 08:22 | |
*** michchap_ has joined #openstack-infra | 08:25 | |
*** cthulhup has quit IRC | 08:25 | |
*** dmakogon_ has quit IRC | 08:26 | |
*** michchap has quit IRC | 08:27 | |
*** ruhe has joined #openstack-infra | 08:30 | |
*** dina_belova has joined #openstack-infra | 08:32 | |
*** dina_belova has quit IRC | 08:37 | |
*** koobs` has quit IRC | 08:45 | |
*** koobs` has joined #openstack-infra | 08:45 | |
*** koobs` is now known as koobs | 08:45 | |
*** AJaeger has joined #openstack-infra | 08:50 | |
*** AJaeger has quit IRC | 08:50 | |
*** AJaeger has joined #openstack-infra | 08:50 | |
*** p5ntangl_ has quit IRC | 08:54 | |
*** AJaeger has quit IRC | 08:55 | |
*** p5ntangle has joined #openstack-infra | 08:55 | |
*** sridevi has joined #openstack-infra | 08:57 | |
*** xBsd has joined #openstack-infra | 09:02 | |
*** sridevi has quit IRC | 09:03 | |
Anju | cyeoh : in neutron cli there is an optional argument of json and xml | 09:05 |
markmc | clarkb, jeblair, there are git 1.7.12.4 packages available for centos, signed with the centos testing key: https://bugs.launchpad.net/openstack-ci/+bug/1215290/comments/3 | 09:12 |
uvirtbot | Launchpad bug 1215290 in openstack-ci "git https clones failing on centos slaves" [Undecided,New] | 09:12 |
*** cthulhup has joined #openstack-infra | 09:15 | |
*** cthulhup has quit IRC | 09:20 | |
openstackgerrit | Julien Danjou proposed a change to openstack-infra/statusbot: Handle topic via a configuration file https://review.openstack.org/43263 | 09:30 |
*** michchap_ has quit IRC | 09:30 | |
*** dina_belova has joined #openstack-infra | 09:33 | |
openstackgerrit | Serg Melikyan proposed a change to openstack-infra/config: Fix ACL for Murano projects https://review.openstack.org/41650 | 09:35 |
openstackgerrit | Serg Melikyan proposed a change to openstack-infra/config: Fix ACL for Murano projects https://review.openstack.org/41650 | 09:36 |
openstackgerrit | Serg Melikyan proposed a change to openstack-infra/config: Fix ACL for Murano projects https://review.openstack.org/41650 | 09:37 |
*** dina_belova has quit IRC | 09:37 | |
*** AJaeger has joined #openstack-infra | 09:37 | |
*** AJaeger has joined #openstack-infra | 09:37 | |
*** boris-42 has joined #openstack-infra | 09:37 | |
*** enikanorov_ has joined #openstack-infra | 09:47 | |
*** AJaeger has quit IRC | 09:49 | |
*** BobBallAway is now known as BobBall | 09:57 | |
*** xchu has quit IRC | 09:58 | |
*** afazekas has quit IRC | 10:07 | |
*** thomasbiege has joined #openstack-infra | 10:10 | |
*** ruhe has quit IRC | 10:15 | |
*** morganfainberg is now known as morganfainberg|a | 10:21 | |
*** ruhe has joined #openstack-infra | 10:26 | |
*** p5ntangl_ has joined #openstack-infra | 10:27 | |
*** thomasbiege has quit IRC | 10:28 | |
*** weshay has joined #openstack-infra | 10:29 | |
*** p5ntangle has quit IRC | 10:30 | |
*** ruhe has quit IRC | 10:30 | |
*** dina_belova has joined #openstack-infra | 10:33 | |
*** vogxn has quit IRC | 10:35 | |
*** vogxn has joined #openstack-infra | 10:37 | |
*** thomasbiege has joined #openstack-infra | 10:38 | |
*** vogxn has quit IRC | 10:38 | |
*** dina_belova has quit IRC | 10:38 | |
*** vogxn has joined #openstack-infra | 10:38 | |
*** vogxn has left #openstack-infra | 10:42 | |
*** vogxn has joined #openstack-infra | 10:42 | |
*** openstack has joined #openstack-infra | 15:12 | |
markmc | oh look, everything got requeued | 15:12 |
markmc | dark magic at work | 15:12 |
anteaya | except for 41070 at the bottom | 15:13 |
anteaya | it is still running | 15:13 |
*** reed has joined #openstack-infra | 15:13 | |
markmc | 41070 is the first patch | 15:14 |
markmc | none of the rest can merge without them | 15:14 |
markmc | wth? | 15:14 |
*** p5ntangle has quit IRC | 15:14 | |
jeblair | markmc: it just figured that out | 15:15 |
jeblair | 2013-08-22 15:14:42,294 INFO zuul.DependentPipelineManager: Dequeuing change <Change 0x7faf68327050 42433,7> because it can no longer merge | 15:15 |
markmc | and 4 have disappeared | 15:15 |
markmc | DRAMATIC SCENES UNFOLDING HERE | 15:15 |
*** ruhe has quit IRC | 15:17 | |
jeblair | the reason it's slow is because it's building up proposed states of the git repo _before_ it checks that. in retrospect, that does seem like a sub-optimal ordering. | 15:17 |
jd__ | just for my personal culture, the slowness is a problem with zuul or lack of resource to run the jobs? | 15:17 |
anteaya | yes | 15:17 |
jd__ | 'cause I saw a lot of checks waiting for python26 only | 15:18 |
anteaya | that is a git issue | 15:18 |
markmc | speaking of python26 | 15:18 |
anteaya | proxying problems | 15:18 |
anteaya | we are trying to address git today | 15:18 |
markmc | jeblair, saw my message about newer centos6 git rpms? | 15:18 |
anteaya | jd__: so more than one issue | 15:18 |
jd__ | anteaya: is there a trace of that I can read about? | 15:18 |
anteaya | hopefully the problem with zuul has been addressed | 15:18 |
anteaya | just the log for the last 3-4 days | 15:19 |
anteaya | it has slowly built | 15:19 |
markmc | jd__, https://bugs.launchpad.net/openstack-ci/+bug/1215290 | 15:19 |
uvirtbot | Launchpad bug 1215290 in openstack-ci "git https clones failing on centos slaves" [Undecided,New] | 15:19 |
anteaya | the tl;dr version is that zuul had a bug which was hard to trace but jeblair found it yesterday | 15:19 |
jeblair | markmc: yes, thanks; i'm not sure if we should try to do that now or stick with the current tentative plan and switch to the git protocol, which is faster even with the version in centos6 | 15:19 |
jeblair | (after we scale out the git server) | 15:20 |
markmc | jeblair, cool | 15:20 |
anteaya | the git issue I have a weak grasp of, but it is about not having enough git repos available to clone/download and we are having timeouts | 15:20 |
jeblair | anteaya: the git server is overloaded if we run all of the jobs we have at once | 15:20 |
anteaya | there we go, thanks jeblair | 15:20 |
anteaya | we are working to better load balance the git server | 15:21 |
jeblair | which is why we haven't added more centos slaves, because at this point, adding more slaves will only make that worse | 15:21 |
anteaya | hoping to make progress on that today | 15:21 |
*** rfolco has joined #openstack-infra | 15:21 | |
anteaya | right, it just increases the overload on the git server | 15:21 |
* anteaya feels understanding is starting to fall into place | 15:21 | |
jeblair | zaro: when you're up and have a minute; i have no idea why this happened: http://paste.openstack.org/show/44899/ | 15:22 |
markmc | anteaya, if you're up for it, it might be cool to file bugs about ongoing stuff like this and update the bug as progress is made | 15:22 |
*** thomasbiege has quit IRC | 15:22 | |
anteaya | I can make an attempt on it | 15:22 |
jeblair | zaro: oops, better paste here: http://paste.openstack.org/show/44900/ | 15:22 |
markmc | that'd be awesome | 15:22 |
anteaya | I welcome your direction on it as I go along, markmc, thanks | 15:22 |
jeblair | anteaya: ++ | 15:22 |
BobBall | recheck no bug doesn't seem to be working? | 15:23 |
anteaya | I will be afk for about 10 minutes and then will get started on bug reports | 15:23 |
BobBall | I added a few on my changes and the check queue is still empty? | 15:23 |
jeblair | BobBall: zuul has a backlog of gerrit events right now, it should get to it | 15:23 |
BobBall | I'll be patient then :) | 15:23 |
*** dims has joined #openstack-infra | 15:23 | |
jeblair | BobBall: "Queue lengths: 126 events" is the operative thing | 15:23 |
*** thomasbiege has joined #openstack-infra | 15:24 | |
jeblair | zaro: anyway, it looks like jenkins said it was taking the node offline, but it apparently wasn't offline when the functions were registered, so it ran a job anyway | 15:24 |
BobBall | ahhhh I see | 15:24 |
*** vogxn has quit IRC | 15:25 | |
anteaya | BobBall: we just restarted zuul, and we have the queue in a staggered start | 15:26 |
anteaya | once the queue length is 0 events - will probably take about 90 minutes | 15:27 |
anteaya | if you don't see your patch, then recheck | 15:27 |
BobBall | Makes sense | 15:27 |
BobBall | stop it overloading | 15:28 |
BobBall | Might be worth adding the "wait 90 minutes" in the topic? I'm sure I won't be the only person asking this | 15:28 |
anteaya | right | 15:28 |
*** jjmb has quit IRC | 15:28 | |
anteaya | I am going to work on some bugs as communication tools | 15:29 |
jeblair | zaro: i think it's because when that happens, jenkins disconnects the node asynchronously; so it may not actually be offline for a while | 15:29 |
anteaya | the wait will change as time passes, so the message will get stale quickly | 15:29 |
anteaya | I can answer questions and folks are good about reading logs | 15:29 |
*** ruhe has joined #openstack-infra | 15:30 | |
*** thomasbiege has quit IRC | 15:32 | |
*** pblaho has quit IRC | 15:33 | |
*** dina_belova has joined #openstack-infra | 15:35 | |
reed | hello folks | 15:35 |
jeblair | reed: hello | 15:35 |
*** CaptTofu has quit IRC | 15:36 | |
fungi | jeblair: clarkb: so i tried adjusting some tcp settings on git-test-15 but cloning nova from it via https was still taking ~8 minutes with nothing else going on | 15:40 |
fungi | plus a lot of errors like... | 15:40 |
fungi | error: Unable to get pack index https://162.209.12.127/openstack/nova/objects/pack/pack-38faaee3478b9e659b67b5f59b7ecb1e77552a93.idx | 15:40 |
*** Ryan_Lane has joined #openstack-infra | 15:40 | |
fungi | error: Unable to find 8f47cb63996d34ce3d8fcaf9f449b400ce033c70 under https://162.209.12.127/openstack/nova | 15:40 |
fungi | Cannot obtain needed object 8f47cb63996d34ce3d8fcaf9f449b400ce033c70 | 15:40 |
fungi | et cetera | 15:40 |
jeblair | fungi: well that's what happens when it falls back on the dumb http protocol | 15:41 |
*** vogxn has joined #openstack-infra | 15:41 | |
fungi | as opposed to git and http protocol which averaged a snappy 40 seconds | 15:41 |
fungi | so yeah, i suspect there is something terribly wrong on centos as pertains to the git cgi backend and https but still no clue what | 15:41 |
* fungi has to dash out to meet some people but will check back in later | 15:42 | |
clarkb | fungi I actually think it is a client side issue | 15:42 |
*** gyee has joined #openstack-infra | 15:42 | |
fungi | clarkb: marvellous | 15:42 |
clarkb | other git clients clone from that host just fine. centos 1.7.1 git does not | 15:42 |
clarkb | jeblair I am going to run to the office in a minute then will begin the load balance git process | 15:43 |
jeblair | clarkb: cool, i should be ready to pitch in then | 15:43 |
pleia2 | clarkb: want me to do some time tests with 1.7.1 and the rpmforge 1.7.11 so at least we have a data point? | 15:43 |
clarkb | pleia2 yes testing newer git clients on centos would at least help confirm it is client side | 15:44 |
pleia2 | k, will do | 15:44 |
markmc | pleia2, I provided a link to a repo containing 1.7.12.4 for centos, maintained by a centos maintainer | 15:46 |
* markmc digs it up again | 15:46 | |
pleia2 | markmc: saw the lp link, I can use that | 15:47 |
jeblair | https://bugs.launchpad.net/openstack-ci/+bug/1215290 | 15:47 |
uvirtbot | Launchpad bug 1215290 in openstack-ci "git https clones failing on centos slaves" [Undecided,New] | 15:47 |
markmc | pleia2, ok, great | 15:47 |
markmc | pleia2, just wasn't sure from "rpmforge" | 15:47 |
pleia2 | markmc: the packages on rpmforge seem to be the most common way folks install newer versions on centos | 15:48 |
markmc | pleia2, who maintains those? | 15:48 |
pleia2 | markmc: I don't know | 15:49 |
markmc | pleia2, right :) | 15:49 |
*** wu_wenxiang has joined #openstack-infra | 15:52 | |
wu_wenxiang | https://review.openstack.org/#/c/43138/, I tried "recheck no bug" twice, however didn't start check process | 15:53 |
*** pabelanger_ has joined #openstack-infra | 15:55 | |
*** pabelanger_ has quit IRC | 15:56 | |
*** pabelanger_ has joined #openstack-infra | 15:56 | |
anteaya | here is the LOST test logs bug: https://bugs.launchpad.net/openstack-ci/+bug/1215511 | 15:57 |
uvirtbot | Launchpad bug 1215511 in openstack-ci "LOST test logs" [Undecided,New] | 15:57 |
*** CaptTofu has joined #openstack-infra | 15:57 | |
*** dina_belova has quit IRC | 15:58 | |
*** pabelanger has quit IRC | 16:00 | |
*** pabelanger_ is now known as pabelanger | 16:00 | |
markmc | anteaya, nice | 16:00 |
*** pabelanger_ has joined #openstack-infra | 16:00 | |
anteaya | thanks | 16:01 |
*** CaptTofu_ has joined #openstack-infra | 16:01 | |
*** CaptTofu has quit IRC | 16:02 | |
wu_wenxiang | https://review.openstack.org/#/c/43138/, I tried "recheck no bug" twice, however didn't start check process, Could anyone help? Thanks | 16:03 |
pleia2 | wu_wenxiang: someone should be able to take a look soon, it's been a bit of a crazy week | 16:05 |
jeblair | wu_wenxiang: it's probably in the backlog of gerrit events | 16:05 |
*** markmc has quit IRC | 16:05 | |
jeblair | wu_wenxiang: "Queue lengths: 106 events" on the status page; it should get to it soon | 16:05 |
wu_wenxiang | pleia2: jeblair: Thanks | 16:06 |
wu_wenxiang | pleia2: crazy week means too many commits? | 16:07 |
*** dkranz has quit IRC | 16:07 | |
*** cthulhup has joined #openstack-infra | 16:09 | |
*** datsun180b has quit IRC | 16:09 | |
*** datsun180b has joined #openstack-infra | 16:09 | |
*** alexpilotti_ has joined #openstack-infra | 16:11 | |
*** dklyle is now known as david-lyle | 16:11 | |
*** ruhe has quit IRC | 16:11 | |
*** pabelanger has quit IRC | 16:11 | |
*** zul has quit IRC | 16:12 | |
*** cppcabrera has left #openstack-infra | 16:12 | |
anteaya | here is the "my rechecked/reverified patch isn't in the queue" bug: https://bugs.launchpad.net/openstack-ci/+bug/1215522 | 16:13 |
uvirtbot | Launchpad bug 1215522 in openstack-ci "my recheck/reverify patch isn't showing up in status.openstack.org/zuul" [Undecided,New] | 16:13 |
*** alexpilotti has quit IRC | 16:13 | |
*** alexpilotti_ is now known as alexpilotti | 16:13 | |
anteaya | going to grab a bite to eat | 16:14 |
*** wu_wenxiang has quit IRC | 16:14 | |
*** SergeyLukjanov has quit IRC | 16:14 | |
jd__ | anteaya: ah, your bug is exactly the question I was going to ask! | 16:14 |
anteaya | yay | 16:15 |
anteaya | my first customer | 16:15 |
anteaya | jd__: add comments if I left anything out | 16:15 |
*** jfriedly has joined #openstack-infra | 16:17 | |
*** ^d has joined #openstack-infra | 16:19 | |
*** ^d has joined #openstack-infra | 16:19 | |
*** ruhe has joined #openstack-infra | 16:19 | |
*** ruhe has quit IRC | 16:19 | |
clarkb | jeblair: I am in front of the big monitor now | 16:20 |
clarkb | jeblair: I am going to spin up git01 through git04 on the ci account as 8GB centos nodes | 16:21 |
clarkb | jeblair: and will point them at the puppet development env so that they get all of the cgit stuff | 16:21 |
*** arezadr has quit IRC | 16:21 | |
zaro | morning | 16:22 |
clarkb | Then I will propose a change to replicate gerrit to them and update the existing change to balance across them. Once gerrit replication has caught up merge the haproxy and g-g-p changes | 16:22 |
zaro | jeblair: do we need a double check to make sure slave is offline in StartJobWorker? | 16:22 |
jeblair | zaro: that's the thing, there's a check in registerfunctions; and since it registered 46 functions, it must have been online | 16:23 |
jeblair | zaro: oh, i see | 16:23 |
jeblair | zaro: a check right before we accept a job | 16:23 |
*** arezadr has joined #openstack-infra | 16:24 | |
jeblair | zaro: yeah i think if we wanted to do that, maybe put it in the gearmanworkerimpl right before we do a grab_job? | 16:24 |
zaro | jeblair: that would probably work. i was thinking another one after setting slave offline? | 16:25 |
jeblair | zaro: so we don't get it from gearman (once we get the job from gearman, it doesn't matter, we have to run it) | 16:25 |
*** dkranz has joined #openstack-infra | 16:25 | |
jeblair | zaro: what do you mean about setting the slave offline? | 16:25 |
jeblair | zaro: and here's an idea we should have thought about earlier -- why don't we have the gearman plugin always return work_complete if the jenkins job finishes (regardless of the outcome); but have it return work_fail if it grabs a job and finds that it can't run it... | 16:27 |
jeblair | zaro: it already returns work_exception if there is a problem running it; i should have zuul catch that case and re-run the job | 16:27 |
jeblair | (that would help with some of the strange exceptions we've been seeing) | 16:28 |
jeblair | zaro: and then later, if we do the thing with work_fail, we could have zuul do the same thing (re-run the job) | 16:28 |
*** SergeyLukjanov has joined #openstack-infra | 16:28 | |
jeblair | clarkb: great, i'll be with you in just a min. | 16:28 |
*** dina_belova has joined #openstack-infra | 16:29 | |
*** datsun180b has quit IRC | 16:29 | |
zaro | jeblair: i don't see a problem with that off the top of my head. | 16:30 |
*** mrodden has quit IRC | 16:35 | |
HenryG | How do I search for gerrit reviews containing a specific string in the commit message? | 16:36 |
*** saper has quit IRC | 16:37 | |
*** saper has joined #openstack-infra | 16:37 | |
clarkb | HenryG: there may be a way to do it with grep and the ssh query interface, but the gerrit ui does not offer that functionality | 16:38 |
clarkb | HenryG: upstream gerrit has played with using lucene to index that stuff but it gets expensive quickly | 16:38 |
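A hedged example of the grep-over-ssh-query approach clarkb mentions (project and search string are illustrative; --commit-message is assumed to be available in this Gerrit version):

    # dump matching changes as JSON, then grep the commit messages client-side
    ssh -p 29418 review.openstack.org gerrit query --format=JSON \
        --commit-message status:open project:openstack/nova | grep -i 'some string'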
clarkb | jeblair: rax gave me a build error on the first host (I was going to build one before the others to smooth out any additional stuff). Have you seen BUILD 0 then ERROR before? | 16:39 |
HenryG | clarkb: thanks. :( | 16:40 |
*** datsun180b has joined #openstack-infra | 16:40 | |
openstackgerrit | James E. Blair proposed a change to openstack-infra/nodepool: Add option to test jenkins node before use https://review.openstack.org/43313 | 16:40 |
clarkb | jeblair: I am going to try a second host and see if this is transient | 16:41 |
*** zul has joined #openstack-infra | 16:41 | |
jeblair | clarkb: all the time, yeah, just try again | 16:41 |
*** cthulhup has quit IRC | 16:41 | |
jeblair | clarkb: ^ that patch is untested; no rush -- but something to think about in the back of your head for after the git server. | 16:41 |
*** pcrews has quit IRC | 16:41 | |
clarkb | ok | 16:42 |
jeblair | clarkb: the idea is we can have nodepool run a very simple test job before actually putting each node into service | 16:42 |
jeblair | clarkb: it might be useful for some of the weird errors we've been seeing from jenkins | 16:42 |
jeblair | clarkb: (though it would mean quite a bit more work for jenkins) | 16:43 |
clarkb | I like the idea. Possibly try to find better performing nodes if we can test that quickly and have a decent understanding of what to look at | 16:43 |
jeblair | clarkb: yeah, could put anything in the test. though i was thinking "echo ok" for now. | 16:43 |
*** nicedice_ has joined #openstack-infra | 16:44 | |
*** morganfainberg|a is now known as morganfainberg | 16:44 | |
clarkb | ya performance testing probably won't happen any time soon | 16:44 |
jeblair | clarkb: anyway, how may i help? | 16:44 |
clarkb | jeblair: want to get a change ready to switch g-g-p to using git:// again? | 16:45 |
pleia2 | clarkb: time output is in the etherpad (1.7.12 is faster) | 16:46 |
*** Dr01d has quit IRC | 16:46 | |
HenryG | clarkb: Googling "<text> site:review.openstack.org" turned up some useable results for carefully chosen <text>. YMMV. | 16:46 |
jeblair | clarkb: ack | 16:46 |
clarkb | pleia2: cool. want to try cloning from https://162.209.12.127/openstack/nova using both client versions of git? | 16:47 |
clarkb | pleia2: I expect 1.7.1 to fail | 16:47 |
pleia2 | on it | 16:48 |
jeblair | 04:41 < clarkb> jeblair: my git plan. 1. spin up new servers 2. replicate from gerrit to new servers. 3. merge change to use git:// in g-g-p 4. merge haproxy change 5. merge change to add haproxy nodes | 16:48 |
*** vogxn has left #openstack-infra | 16:48 | |
*** mrodden has joined #openstack-infra | 16:49 | |
clarkb | jeblair: that is still the plan, though at this point I expect 4 and 5 to be one change | 16:49 |
jeblair | clarkb: why not do 3 last? | 16:49 |
clarkb | jeblair: I was thinking of propogation delay of the JJB update | 16:49 |
clarkb | jeblair: it can be done last | 16:49 |
pleia2 | clarkb: ssl errors, how were you getting around this? | 16:49 |
clarkb | s/propogation delay/time to run/ | 16:49 |
*** jpich has quit IRC | 16:50 | |
*** krtaylor has quit IRC | 16:50 | |
clarkb | pleia2: you have to tell git to ignore ssl errors /me looks in hsitory for the flag | 16:50 |
clarkb | pleia2: GIT_SSL_NO_VERIFY=true | 16:50 |
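i.e. something along the lines of (the IP is the test server given above; skipping verification is only reasonable against a self-signed test host):

    GIT_SSL_NO_VERIFY=true git clone https://162.209.12.127/openstack/nova
    # roughly equivalent per-repo setting: git config http.sslVerify false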
jeblair | clarkb: ok, original plan wfm | 16:50 |
pleia2 | clarkb: thanks | 16:51 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Switch ggp to use git:// https://review.openstack.org/43315 | 16:51 |
jeblair | clarkb: I'm updating the etherpad with the plan and changes | 16:52 |
*** bingbu has quit IRC | 16:53 | |
*** svarnau has joined #openstack-infra | 16:54 | |
jeblair | clarkb: is there a replication change? | 16:54 |
clarkb | jeblair: there isn't a replication change yet. I suppose that doesn't need IP addresses. jeblair can you write that one too and put it on the bottom of the current stack of 2 changes? | 16:55 |
clarkb | (there will be a conflict because I put a todo in my change to do it) | 16:55 |
*** fbo is now known as fbo_away | 16:56 | |
jeblair | clarkb: oh, right, you said 4+5 are one change... hang on | 16:56 |
*** dina_belova has quit IRC | 16:57 | |
clarkb | jeblair: ya the haproxy stuff needs IP addresses so will happen after the nodes are all spun up and replicated to. But gerrit replication doesn't need IP addresses so you can get that change ready and merge it as soon as those hosts have DNS records | 16:57 |
clarkb | jeblair: what was the pyyaml workaround? | 16:57 |
pleia2 | clarkb: yeah, after ~6 minutes it fails on 1.7.1, but 1.7.12 works (added to pad) | 16:57 |
jeblair | clarkb: oh, that's what you meant by bottom. ok, i think i'm caught up now | 16:57 |
jeblair | clarkb: 'pip uninstall pyyaml'; re run puppet | 16:58 |
clarkb | pleia2: awesome. I think that confirms it is client side and version related | 16:58 |
clarkb | jeblair: thanks | 16:58 |
jeblair | clarkb: https://review.openstack.org/#/c/43012/3 | 16:58 |
jeblair | https://review.openstack.org/#/c/42784/ | 16:58 |
jeblair | clarkb: those are the 2 changes you're talking about, right? | 16:59 |
jeblair | (haproxy and xinetd) | 16:59 |
anteaya | etherpad link, for those viewers at home: https://etherpad.openstack.org/git-lb | 16:59 |
clarkb | jeblair: yes | 16:59 |
*** SergeyLukjanov has quit IRC | 17:00 | |
clarkb | git02 is happy now. I will add its DNS record then do the other three in one batch | 17:00 |
*** dkranz has quit IRC | 17:01 | |
*** dkranz has joined #openstack-infra | 17:01 | |
*** dims has quit IRC | 17:01 | |
*** nati_ueno has joined #openstack-infra | 17:02 | |
*** dims has joined #openstack-infra | 17:02 | |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Replicate to git01-git04 https://review.openstack.org/43316 | 17:03 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 17:03 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Swap git daemon in xinetd for service https://review.openstack.org/43012 | 17:03 |
*** morganfainberg is now known as morganfainberg|a | 17:05 | |
*** BobBall is now known as BobBallAway | 17:05 | |
*** SergeyLukjanov has joined #openstack-infra | 17:05 | |
*** nayward has quit IRC | 17:06 | |
clarkb | jeblair: that looks right | 17:07 |
clarkb | jeblair: pleia2's test indicates upgrading git would help in the https case should we need to go down that route | 17:07 |
jeblair | clarkb: excellent. i love plan b's. and c's. and d's. | 17:08 |
*** thomasbiege has joined #openstack-infra | 17:08 | |
clarkb | jeblair: eventually we will have the whole alphabet | 17:09 |
jeblair | sometimes i put j at the end, that could be confusing. | 17:09 |
anteaya | right now this is on the zuul status page: Queue lengths: 50 events, 84 results. What results are being referenced here? | 17:10 |
*** svarnau has quit IRC | 17:10 | |
jeblair | anteaya: results from jenkins | 17:10 |
anteaya | like logs? | 17:10 |
jeblair | anteaya: just information as to whether the job succeeded | 17:10 |
anteaya | ah okay | 17:10 |
anteaya | success, failure, lost | 17:11 |
openstackgerrit | A change was merged to openstack-infra/jenkins-job-builder: Adding support for the Warnings plugin https://review.openstack.org/40621 | 17:11 |
clarkb | oh shiny I have two git01's because of the error | 17:11 |
jeblair | anteaya: when they pile up like that, it's usually because zuul either started or stopped a bunch of jobs. | 17:11 |
clarkb | jeblair: do I need to explicitly delete the one in ERROR state? | 17:11 |
jeblair | clarkb: yes | 17:11 |
*** lcestari has joined #openstack-infra | 17:11 | |
anteaya | ah okay, I didn't know the Jenkins results were queued as well | 17:11 |
*** svarnau has joined #openstack-infra | 17:12 | |
jeblair | anteaya: zuul is almost to the point where we can get rid of that. | 17:12 |
anteaya | yay | 17:12 |
jeblair | anteaya: it looks like it was a gate reset, so those were probably abort results | 17:12 |
anteaya | ah okay | 17:12 |
*** svarnau has quit IRC | 17:12 | |
*** svarnau has joined #openstack-infra | 17:13 | |
*** wenlock has joined #openstack-infra | 17:13 | |
anteaya | I think I can see the gate reset in this graph: https://tinyurl.com/kmotmns | 17:13 |
anteaya | looks like it happened 15 or 20 minutes ago | 17:14 |
jeblair | possibly | 17:14 |
* anteaya nods | 17:15 | |
anteaya | 282 results that are queued right now, I am going with no action is required from us | 17:16 |
jeblair | clarkb: $::ipaddress is a puppet fact? | 17:17 |
clarkb | jeblair: yes | 17:18 |
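For anyone following along, a quick way to see what that fact evaluates to on a node is to ask facter directly (facter ships with puppet); the manifest-side name is $::ipaddress:

```bash
# Print the fact that puppet exposes as $::ipaddress in manifests
facter ipaddress
```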
clarkb | git01 has dns records and is puppet happy | 17:18 |
clarkb | still waiting for the error state node to go away | 17:18 |
clarkb | git04 errord as well and git03 will be ready as soon as the reboot completes | 17:18 |
*** alexpilotti has quit IRC | 17:18 | |
jeblair | clarkb: don't hold your breath | 17:18 |
jeblair | anteaya: zuul is done launching all the jobs from the gate reset and is back processing the event and result queues again | 17:19 |
clarkb | launching a new git04. errored git04 went away faster than git01 | 17:20 |
*** thomasbiege has quit IRC | 17:20 | |
anteaya | jeblair: grand thank you | 17:20 |
clarkb | 1 through 3 should have DNS records and are puppet happy now. Just waiting on git04 | 17:21 |
*** SergeyLukjanov has quit IRC | 17:22 | |
jeblair | btw, the new image in az2 looks good (no java segfault), but i haven't deleted the old nodes in jenkins which are preventing its use | 17:23 |
clarkb | jeblair: ok | 17:24 |
jeblair | (as a mechanism to slow nodepool) | 17:24 |
clarkb | jeblair: note that I am running all puppet on these new nodes out of the development env so that when we do merge the proposed changes the diff puppet has to deal with should be minimal or nil | 17:24 |
*** boris-42 has joined #openstack-infra | 17:25 | |
jeblair | clarkb: ack | 17:25 |
clarkb | the exciting puppet run will be on git.o.o though :) | 17:25 |
jeblair | #status ok | 17:25 |
*** ChanServ changes topic to "Discussion of OpenStack Developer Infrastructure | docs http://ci.openstack.org | bugs https://launchpad.net/openstack-ci/+milestone/grizzly | https://github.com/openstack-infra/config" | 17:25 | |
*** pcm_ has quit IRC | 17:28 | |
anteaya | yay back to status ok | 17:28 |
openstackgerrit | @Spazm proposed a change to openstack-infra/git-review: fixes behavior when port=None https://review.openstack.org/43248 | 17:28 |
*** pcrews has joined #openstack-infra | 17:28 | |
*** pabelanger has joined #openstack-infra | 17:29 | |
*** morganfainberg|a is now known as morganfainberg | 17:29 | |
*** svarnau has quit IRC | 17:29 | |
*** svarnau has joined #openstack-infra | 17:30 | |
jswarren | Any thoughts on why the python26 jobs appear to be significantly slower than the python27 jobs? | 17:31 |
clarkb | jswarren: there are a couple related things but the biggest factor is we have fewer slaves capable of running python26 jobs | 17:32 |
clarkb | jeblair: git04 is almost ready | 17:32 |
openstackgerrit | @Spazm proposed a change to openstack-infra/git-review: fixes behavior when port=None https://review.openstack.org/43248 | 17:32 |
jswarren | OK. | 17:32 |
jswarren | Thanks. | 17:32 |
*** pcm_ has joined #openstack-infra | 17:32 | |
jeblair | Alex_Gaynor: mind if i quote you in my slides next time i give a presentation? :) | 17:32 |
Alex_Gaynor | jeblair: sure, what'd I say? | 17:33 |
*** markmcclain has quit IRC | 17:33 | |
*** thomasbiege has joined #openstack-infra | 17:33 | |
clarkb | jswarren: it also doesn't help that the python26 jobs do tend to take a little longer as they run on hosts with older slow git and I think running many of our tests on python26 just takes longer | 17:33 |
*** xBsd has quit IRC | 17:34 | |
jeblair | 04:38 < Alex_Gaynor> most insane CI infrastructure I've ever been a part of | 17:36 |
Alex_Gaynor | jeblair: oh, absolutely :D | 17:36 |
*** morganfainberg has left #openstack-infra | 17:36 | |
clarkb | git04 is happy with puppet now | 17:37 |
*** morganfainberg has joined #openstack-infra | 17:37 | |
clarkb | waiting for DNS records to resolve then I think we can prepare to replicate | 17:37 |
clarkb | jeblair: ^ does approving the replication change automatically restart gerrit? if not I think we should go ahead and merge | 17:37 |
jeblair | clarkb: i don't _think_ anything restarts gerrit except an upgrade | 17:38 |
*** fbo_away is now known as fbo | 17:38 | |
*** jbjohnso has quit IRC | 17:38 | |
jeblair | clarkb: yeah, looking at the puppet, i think we're fine. | 17:39 |
clarkb | jeblair: the haproxy change failed puppet lint but I can fix that when I add the balancermembers | 17:39 |
*** svarnau has quit IRC | 17:39 | |
* anteaya gets ready to applaud | 17:40 | |
clarkb | anteaya: we are still a little ways out | 17:40 |
*** svarnau has joined #openstack-infra | 17:40 | |
anteaya | I'll applaud all I can | 17:40 |
clarkb | going to wait for replication to happen completely before moving to the next step | 17:40 |
clarkb | jeblair: is it not possible to SIGHUP gerrit and have it pick up those changes? | 17:40 |
clarkb | iirc gerrit can pick up some config and project changes on the fly but I never remember which ones | 17:41 |
clarkb | once replicated we can do a quick set of tests to make sure 8080, 4443, and 29418 all answer to git operations | 17:42 |
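A minimal sketch of those checks, assuming 8080, 4443, and 29418 map to http, https, and git:// on the backends as listed above (the hostname is just one of the new servers; any of git01-git04 would do):

```bash
H=git01.openstack.org
git ls-remote "http://$H:8080/openstack/nova" HEAD
GIT_SSL_NO_VERIFY=true git ls-remote "https://$H:4443/openstack/nova" HEAD   # cert name won't match the backend
git ls-remote "git://$H:29418/openstack/nova" HEAD
```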
jeblair | clarkb: i think i read in a stackoverflow question yesterday it needed a restart | 17:42 |
jeblair | clarkb: gerrit restarts are fairly fast, i don't think it's a big deal | 17:42 |
clarkb | jeblair: ok | 17:42 |
*** svarnau has quit IRC | 17:42 | |
openstackgerrit | A change was merged to openstack-infra/config: Replicate to git01-git04 https://review.openstack.org/43316 | 17:45 |
*** dina_belova has joined #openstack-infra | 17:45 | |
*** svarnau has joined #openstack-infra | 17:45 | |
jeblair | yay gate priority | 17:45 |
anteaya | :D | 17:46 |
*** SergeyLukjanov has joined #openstack-infra | 17:46 | |
clarkb | jeblair: do you want to kick gerrit when you think it is safe? I am going to fix the haproxy change and add the balancermembers | 17:47 |
*** thomasbiege2 has joined #openstack-infra | 17:47 | |
*** ruhe has joined #openstack-infra | 17:48 | |
*** svarnau has quit IRC | 17:48 | |
anteaya | do we need a channel status update for the gerrit reset? | 17:48 |
clarkb | anteaya: maybe not. as jeblair mentioned it goes really fast though occasionally people do notice | 17:49 |
anteaya | okay | 17:49 |
anteaya | I will stand by to field inquiries | 17:49 |
anteaya | though folks have been really patient and supportive | 17:49 |
anteaya | thanks everyone | 17:50 |
jeblair | clarkb: i will handle the gerrit restart | 17:50 |
*** changbl has joined #openstack-infra | 17:51 | |
*** thomasbiege has quit IRC | 17:51 | |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 17:52 |
clarkb | that should pass lint and it adds the balancer members | 17:52 |
pleia2 | great | 17:53 |
*** ^demon has joined #openstack-infra | 17:54 | |
jeblair | #status notice restarting gerrit to pick up a configuration change | 17:55 |
openstackstatus | NOTICE: restarting gerrit to pick up a configuration change | 17:55 |
^demon | jeblair: I wasn't paying attention to what channel I was in and I freaked out for a moment. | 17:56 |
^demon | I was like "who's making config changes and I don't know?" :) | 17:56 |
*** ^d has quit IRC | 17:56 | |
jeblair | ^demon: haha! | 17:57 |
uvirtbot | jeblair: Error: "demon:" is not a valid command. | 17:57 |
jeblair | wow, uvirtbot makes it really fun to talk to you ^demon :) | 17:57 |
pleia2 | hehe | 17:58 |
jeblair | need to get gerrit to accept the new hostkeys | 17:58 |
*** thomasbiege2 is now known as thomasbiege | 17:58 | |
clarkb | jeblair: pleia2: Is that puppetted or do we just do it by hand? | 17:59 |
jeblair | clarkb: i don't think it's puppeted | 17:59 |
*** AJaeger has joined #openstack-infra | 18:00 | |
*** AJaeger has joined #openstack-infra | 18:00 | |
clarkb | jeblair: ya I don't see it in the site.pp node for review.o.o | 18:00 |
pleia2 | there is an open bug for sorting out gerrit's keys | 18:00 |
pleia2 | (I opened it recently) | 18:00 |
anteaya | so zuul and jenkins are still working on what they had, but since gerrit is down nothing new is being queued | 18:00 |
anteaya | now I see | 18:00 |
jeblair | i think i may need to restart gerrit again? | 18:01 |
jeblair | anteaya: gerrit is not down | 18:01 |
anteaya | oh sorry | 18:01 |
anteaya | restarted | 18:01 |
clarkb | jeblair: maybe? java likes to cache a lot of stuff including perhaps the known hosts file | 18:01 |
jeblair | i'm going to restart gerrit again and see if it picks up the known hosts changes | 18:01 |
pleia2 | https://bugs.launchpad.net/openstack-ci/+bug/1209464 for when someone is bored ;) | 18:02 |
uvirtbot | Launchpad bug 1209464 in openstack-ci "Start managing ~gerrit2/.ssh/ contents in puppet" [Undecided,New] | 18:02 |
jeblair | pleia2: ++ | 18:02 |
jeblair | [2013-08-22 18:03:29,807] ERROR com.google.gerrit.server.git.PushReplication : Cannot replicate to file:///var/lib/git/stackforge/python-ipmi.git; repository not found | 18:03 |
jeblair | that's slightly disturbing | 18:03 |
clarkb | jeblair: that was one of the projects that got renamed | 18:04 |
jeblair | both python-ipmi and pyghmi exist in gerrit's git repo dir | 18:04 |
jeblair | <sigh> | 18:04 |
clarkb | :/ | 18:04 |
*** p5ntangle has joined #openstack-infra | 18:06 | |
jeblair | ok, the db has no python-ipmi entries | 18:08 |
clarkb | so monty must've done a cp instead of a mv | 18:09 |
jeblair | there doesn't seem to be anything new in python-ipmi.... | 18:09 |
jeblair | wait, i wonder if manage_projects put it back | 18:10 |
jeblair | because it's actually quite old | 18:10 |
mtreinish | jeblair: quick question: do I need to do another reverify on: https://review.openstack.org/#/c/43175/ | 18:10 |
*** svarnau has joined #openstack-infra | 18:11 | |
mtreinish | because I don't see it in the gate pipeline | 18:11 |
clarkb | mtreinish: yes I think so | 18:11 |
jeblair | clarkb: nah, projects.yaml looks right; probably a cp then. so i'll stop gerrit and mv it out of the way | 18:11 |
clarkb | jeblair: ok | 18:11 |
mtreinish | clarkb: ok thanks | 18:11 |
jeblair | #status notice stopping gerrit to correct a stackforge project rename error | 18:12 |
openstackstatus | NOTICE: stopping gerrit to correct a stackforge project rename error | 18:12 |
*** dmakogon_ has joined #openstack-infra | 18:13 | |
jeblair | this may make zuul unhappy, it's in the middle of a gate reset | 18:13 |
clarkb | jeblair: up to you if you want to wait | 18:13 |
clarkb | replication does appear to be happening for everything else | 18:13 |
jeblair | it is done | 18:14 |
*** mrodden has quit IRC | 18:14 | |
clarkb | http and git:// seem to be working on git01 but not https. looking into that now | 18:16 |
clarkb | pleia2: jeblair did you guys want to try cloning from the other hosts? | 18:16 |
jeblair | clarkb: can do | 18:16 |
jeblair | clarkb, pleia2: gerrit replication is still running | 18:17 |
ttx | lifeless: as long as it gets fixed sometimes in the next two months (and stay fixed), we should be good | 18:17 |
jeblair | we might want to wait until that finishes | 18:17 |
clarkb | jeblair: ok | 18:17 |
mtreinish | clarkb: do it again now, because I did it right before the gerrit restart? | 18:18 |
clarkb | mtreinish: gerrit restart shouldn't affect you (zuul should get that event quick enough) | 18:18 |
mtreinish | clarkb: ok | 18:18 |
anteaya | 33 events in the zuul queue | 18:18 |
ttx | jeblair: the gate looks calmer today. Anything special you've done? Just arrived | 18:19 |
anteaya | mtreinish: when zuul has 0 events, your patch should show up on the status page | 18:19 |
anteaya | ttx kept zuul running overnight | 18:19 |
mtreinish | anteaya: ok | 18:19 |
anteaya | zuul had a bug which jeblair fixed last night | 18:20 |
jeblair | ttx: i fixed a zuul bug last night (which was causing us to restart zuul a lot with nothing merging) | 18:20 |
* ttx is still trying to understand the patterns that govern gate load | 18:20 | |
*** melwitt has joined #openstack-infra | 18:20 | |
* anteaya too | 18:20 | |
jeblair | ttx: we're working on load balancing git.o.o so that we can serve git repos to all the jobs we need to run | 18:20 |
clarkb | error: gnutls_handshake() failed: A TLS warning alert has been received. while accessing https://git01.openstack.org:4443/openstack/nova/info/refs is what I got speaking https to git01 | 18:20 |
anteaya | ttx can't hurt to read this: https://bugs.launchpad.net/openstack-ci/+bug/1215522 | 18:21 |
ttx | jeblair: is that the new bottleneck ? | 18:21 |
uvirtbot | Launchpad bug 1215522 in openstack-ci "my recheck/reverify patch isn't showing up in status.openstack.org/zuul" [Undecided,New] | 18:21 |
ttx | anteaya: thx for the pointer | 18:21 |
anteaya | np | 18:21 |
*** zul has quit IRC | 18:21 | |
jeblair | ttx: yes; we're actually keeping our slave count artificially low to try to stress it less (but we still get occasional errors) | 18:21 |
jeblair | ttx: once that's scaled out, we should be able to run a lot more tests at once, which should help with backlogs | 18:22 |
jeblair | ttx: (there are a few jenkins errors we've encountered as well that we need to work around; that's next up) | 18:22 |
*** mrodden has joined #openstack-infra | 18:22 | |
ttx | jeblair: thx for the executive summary :) | 18:22 |
jeblair | ttx: np | 18:22 |
clarkb | jeblair: I think it is related to the hostname and the cert. GIT_SSL_NO_VERIFY isn't letting it through though as it happens in the handshake. Speaking directly to the ip works | 18:24 |
clarkb | I am going to test with a hacked up /etc/hosts | 18:24 |
*** erfanian has quit IRC | 18:24 | |
clarkb | hacked up /etc/hosts makes it better | 18:28 |
*** sarob has joined #openstack-infra | 18:29 | |
*** sarob has quit IRC | 18:31 | |
*** alexpilotti has joined #openstack-infra | 18:31 | |
*** sarob has joined #openstack-infra | 18:31 | |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Add git01-git04 to cacti https://review.openstack.org/43325 | 18:32 |
clarkb | jeblair: and git.o.o too | 18:32 |
jeblair | clarkb: fungi did that yesterday (or the day before) | 18:32 |
clarkb | cool I missed that | 18:33 |
jeblair | clarkb: why don't you push that one through real quick? | 18:33 |
clarkb | sure | 18:33 |
jeblair | clarkb: yeah, it's telling | 18:33 |
clarkb | done | 18:33 |
jeblair | clarkb: it's how we knew we were cpu bound and not io or network | 18:33 |
clarkb | git01 is serving files over all three protocols. I just have to give it an IP address for https otherwise tls handshaking complains | 18:34 |
clarkb | jeblair: gotcha | 18:34 |
clarkb | I am going to test git02 now | 18:34 |
jeblair | ok, i'll test 3 and 4 with the ip | 18:34 |
*** mrmartin has joined #openstack-infra | 18:34 | |
jeblair | clarkb: you mean with etc hosts, right? | 18:35 |
jeblair | oh, or use the ip but set the no verify var? | 18:35 |
jeblair | yeah, that seems to work | 18:35 |
*** sarob has quit IRC | 18:35 | |
clarkb | jeblair: IP and no verify or you can verify and put the ip address in /etc/hosts for git.o.o | 18:37 |
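A sketch of the second option (keep verification, but pin git.o.o to one backend); the IP below is illustrative, not one of the real servers, and 4443 is the backend https port seen earlier:

```bash
# Map git.openstack.org to a backend address so the certificate name matches
echo "203.0.113.11  git.openstack.org" | sudo tee -a /etc/hosts
git ls-remote https://git.openstack.org:4443/openstack/nova HEAD
```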
*** krtaylor has joined #openstack-infra | 18:37 | |
jeblair | clarkb: a nova clone took real 2m33.517s over https | 18:38 |
anteaya | 7 jobs in post, yay! | 18:38 |
clarkb | I got one Timeout waiting for output from CGI script /usr/libexec/git-core/git-http-backend on git02 | 18:39 |
clarkb | is that timeout something we can extend? | 18:39 |
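The question doesn't get answered in-channel; if that timeout comes from Apache's core Timeout directive (a guess, since the smart-http vhost config isn't shown here), extending it would look roughly like this on the CentOS backends:

```bash
# Assumption: git-http-backend runs as a CGI under Apache and the 300s default
# Timeout is what fires; the fix would be something like
#   Timeout 600
# in the vhost/httpd.conf, then a graceful reload:
sudo service httpd reload
```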
jeblair | clarkb: why did you get a timeout from a server under no load? | 18:39 |
clarkb | jeblair: I am cloning http https and git:// concurrently so it has some load | 18:39 |
anteaya | look at all the merges in the last hour: https://tinyurl.com/m2skvhg | 18:40 |
anteaya | yay | 18:40 |
openstackgerrit | A change was merged to openstack-infra/config: Add git01-git04 to cacti https://review.openstack.org/43325 | 18:40 |
*** pabelanger has quit IRC | 18:40 | |
jeblair | oh, i was cloning from 02, sorry; i guess that could explain the time | 18:40 |
*** pblaho has joined #openstack-infra | 18:40 | |
jeblair | but no, 03 and 04 are taking forever too | 18:42 |
clarkb | jeblair: there is some delay as git has to do pack files and things | 18:43 |
clarkb | load doesn't look terrible on 03 | 18:43 |
jeblair | 03 took 2m15.276s | 18:43 |
jeblair | clarkb: i just started another clone | 18:43 |
jeblair | clarkb: i believe we were shooting for <1 min, yeah? | 18:44 |
clarkb | jeblair: yeah, but really only for git:// | 18:44 |
jeblair | clarkb: i did all of my tests with https; and this is on a precise node | 18:44 |
clarkb | oh I see | 18:45 |
jeblair | i think the refs aren't packed at all | 18:45 |
clarkb | jeblair: ya | 18:45 |
jeblair | so somehow the git.o.o repos ended up with packed refs, but not these. i'm testing if that's the diff. | 18:45 |
* anteaya tries to pick the best time for her 1 hour afternoon walk | 18:46 | |
clarkb | 02 is 1:42 git clone nova over git protocol | 18:47 |
clarkb | nova repo on 02 has one pack file and a bunch of loose files | 18:47 |
clarkb | I think you are onto something with that theory | 18:48 |
jeblair | clarkb: i'm looking at refs, not objects | 18:48 |
clarkb | ah | 18:48 |
*** thomasbiege has quit IRC | 18:48 | |
jeblair | clarkb: after a 'git gc' (which did both objects and refs), it's real 0m52.021s | 18:49 |
clarkb | jeblair: should we add a daily/weekly cronjob to git gc? | 18:51 |
jeblair | clarkb, pleia2: does cgit do repo maintenance, or do we have a cron defined? | 18:51 |
*** nayward has joined #openstack-infra | 18:51 | |
pleia2 | jeblair: it does not | 18:51 |
Alex_Gaynor | so in prepare_devstack.sh is there a reason we don't use --depth 1 | 18:51 |
jeblair | how did we end up with a packed repo state? | 18:51 |
pleia2 | jeblair: it's really just a web interface that accesses the repo, doesn't do much else | 18:51 |
pleia2 | jeblair: maybe that's how it's replicated? | 18:51 |
clarkb | Alex_Gaynor: there is a reason and I always forget what it is | 18:52 |
jeblair | Alex_Gaynor: that's used to build an image, then the full repo is available at basically no cost (which is useful because tests can run on any branch) | 18:52 |
Alex_Gaynor | jeblair: ah ok, so it's in an image, that was the missing bit in my mind | 18:52 |
jeblair | Alex_Gaynor: yep. mordred was pointing out that in devstack-gate itself (in the wrap script) we could possibly be doing something smarter than 'git remote update' | 18:53 |
jeblair | Alex_Gaynor: but we need to be careful that whatever we change there doesn't transfer load to the zuul server (where the actual test refs are served) | 18:53 |
Alex_Gaynor | right | 18:54 |
jeblair | pleia2: the repos that were just replicated look just like the gerrit repos | 18:54 |
pleia2 | ah, hrm | 18:54 |
clarkb | maybe the cgit package comes with a cron to do it? | 18:55 |
jeblair | btw, https clone from review.o.o is real 1m0.056s | 18:56 |
jeblair | (using the local mirror, not gerrit) | 18:56 |
clarkb | so we are on par with that | 18:57 |
jeblair | clarkb: _if_ we pack refs | 18:57 |
jeblair | on the mirror | 18:57 |
clarkb | ya | 18:57 |
jeblair | packed refs only (not a gc): real 0m46.005s | 18:58 |
*** hartsocks has joined #openstack-infra | 18:58 | |
jeblair | that's actually faster than the gc | 18:58 |
bnemec | I'm seeing a couple of changes that have no Jenkins score and aren't showing up on the status page. | 18:59 |
anteaya | the git-fetch-pack command allows you to specify <refs>: http://git-scm.com/docs/git-fetch-pack/1.8.3 | 18:59 |
bnemec | Should I go ahead and recheck them? | 18:59 |
anteaya | ah shoot - that is 1.8.3 | 18:59 |
anteaya | :( | 18:59 |
jeblair | bnemec: yep | 19:00 |
bnemec | jeblair: Okay, thanks. I didn't want to drive any extra load unnecessarily. | 19:00 |
anteaya | bnemec: more explanation here: https://bugs.launchpad.net/openstack-ci/+bug/1215522 | 19:00 |
uvirtbot | Launchpad bug 1215522 in openstack-ci "my recheck/reverify patch isn't showing up in status.openstack.org/zuul" [Undecided,New] | 19:00 |
clarkb | jeblair: speaking of zuul. Does the zuul process that is currently running catch SIGUSR2 properly? | 19:00 |
jeblair | clarkb: so i think we should 'git pack-refs --all' nightly on the mirrors | 19:00 |
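The actual puppet change shows up below as review 43331; roughly, the nightly job has to do something like the following on each mirror (repo layout assumed from the /var/lib/git paths in the replication errors above):

```bash
# Pack loose refs in every mirrored repo; objects are left alone, so this is
# much cheaper than a full git gc
for repo in /var/lib/git/*/*.git; do
    git --git-dir="$repo" pack-refs --all
done
```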
dansmith | jeblair: I haven't been rechecking things much since a lot of things seem to be failing all tests due to package fetch timeouts or something like that | 19:00 |
jeblair | clarkb: yes, i restarted with both of those changes | 19:00 |
dansmith | jeblair: is that just my imagination? | 19:01 |
bnemec | jeblair: Oh, that's embarrassing. I even saw that link earlier. | 19:01 |
clarkb | jeblair: I agree | 19:01 |
jeblair | dansmith: nope, we're working on that now | 19:01 |
anteaya | dansmith: no, that is the git issue we are working on | 19:01 |
anteaya | dansmith: not your imagination | 19:01 |
anteaya | bnemec: no worries | 19:01 |
jeblair | clarkb: i'll write that change real quick? | 19:01 |
dansmith | okay, I figured, but also figured more rechecks weren't likely to help :) | 19:02 |
clarkb | jeblair: go for it | 19:02 |
*** sarob has joined #openstack-infra | 19:02 | |
clarkb | jeblair: base it atop my haproxy change | 19:02 |
*** lbragstad has left #openstack-infra | 19:02 | |
anteaya | dansmith: not right now, but you are free to spin the wheel and take your chances like everyone else | 19:02 |
clarkb | jeblair: so that we can continue using the development env until we actually turn haproxy on | 19:02 |
dansmith | anteaya: hah, okay :P | 19:02 |
anteaya | :D | 19:02 |
jeblair | clarkb: i think later we may want to swing back around and look into using a newer git on these servers | 19:02 |
clarkb | jeblair: ++ | 19:02 |
jeblair | clarkb: because perhaps the newer git can deal with unpacked refs better | 19:02 |
jeblair | clarkb: but i think i'm fine with packed refs in a mirror | 19:03 |
clarkb | jeblair: any idea how packed refs like that will affect fetches of a few refs? | 19:03 |
clarkb | does git unpack them and give you just what you want? | 19:03 |
jeblair | clarkb: it's just the list of refs | 19:03 |
clarkb | oh right you are packing refs. I keeping thinking objects | 19:04 |
jeblair | clarkb: for use when git advertises what refs it has | 19:04 |
clarkb | objects != refs and I need to beat that into my brain | 19:04 |
clarkb | I am going to find some food really quick. I smell it so I won't be gone long | 19:04 |
anteaya | happy food clarkb | 19:04 |
pleia2 | yes, lunch | 19:05 |
reed | get good food | 19:05 |
anteaya | happy lunch pleia2 | 19:05 |
mrmartin | jeblair: if you have some free minutes, please review https://review.openstack.org/#/c/42608/ it is blocking task in the groups portal. thnx! | 19:05 |
reed | what's the current estimate for this patch to land somewhere? https://bugs.launchpad.net/horizon/+bug/1179526 | 19:05 |
uvirtbot | Launchpad bug 1179526 in horizon "source_lang in Horizon repo is overwritten by Transifex" [High,Confirmed] | 19:05 |
reed | no, not that | 19:05 |
reed | this one https://review.openstack.org/#/c/42608/ | 19:06 |
anteaya | is it in the queue, reed? | 19:06 |
reed | anteaya, waiting for review | 19:06 |
anteaya | sorry, no it isn't - I'm focused on queue questions, sorry | 19:06 |
jeblair | mrmartin: this week is very unusual -- we're having a lot of load problems because we have a feature freeze this week, and we only have 2 infra team members working | 19:07 |
jeblair | mrmartin: as soon as we have things working reliably again, i will review that patch and reed's as well | 19:07 |
mrmartin | jeblair: ok, maybe on monday? | 19:07 |
anteaya | both linked to the same patch | 19:07 |
jeblair | mrmartin: certainly by monday | 19:08 |
jeblair | mrmartin: did you see the instructions for running that locally on a test server? | 19:08 |
mrmartin | jeblair, I tested it in a local vm | 19:08 |
jeblair | mrmartin: even if we haven't merged that and launched the real server yet, i wanted to make sure you can work on it locally | 19:08 |
jeblair | mrmartin: okay, great | 19:08 |
mrmartin | was working in the test env, but you know, it doesn't mean that everything will be perfect on prod :D | 19:09 |
anteaya | true, but it is a very good start | 19:09 |
jeblair | yep :) | 19:09 |
*** CaptTofu_ has quit IRC | 19:09 | |
*** CaptTofu has joined #openstack-infra | 19:10 | |
anteaya | zuul reports 0 events, yay | 19:10 |
*** ^demon is now known as ^demon|lunch | 19:10 | |
*** sarob has quit IRC | 19:10 | |
anteaya | 24 gate, 1 post, 53 check | 19:10 |
anteaya | almost manageable again | 19:11 |
*** wenlock has quit IRC | 19:11 | |
*** CaptTofu has quit IRC | 19:14 | |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Add a mirror repack cron to git servers https://review.openstack.org/43331 | 19:15 |
jeblair | I'm going to run the repack on all of the git servers, then eat. | 19:15 |
hartsocks | Hi. I think my account on review.openstack.org is screwed up. Is this the correct channel for that? | 19:16 |
clarkb | hartsocks: yes, what problem are you seeing | 19:17 |
clarkb | the food is a lie. I need to wait a little longer on it | 19:17 |
anteaya | jeblair: sounds good | 19:17 |
anteaya | clarkb: k | 19:17 |
hartsocks | clarkb: Example: https://review.openstack.org/#/c/43266/ | 19:18 |
hartsocks | The review system has decided I don't own about half my patches. Sometimes I show up as "hartsock" and other times as "hartsocks" I don't know why. Who do I ask about this? I've tried to get help on the mailing list in the past. | 19:18 |
anteaya | clarkb: does the db have both a hartsock and a hartsocks? | 19:18 |
clarkb | anteaya: checking | 19:18 |
hartsocks | my preference would be to fold everything into 'hartsocks' | 19:19 |
anteaya | hartsocks: great | 19:19 |
clarkb | hartsocks: yeah you have two accounts | 19:20 |
hartsocks | can you just fold everything to hartsocks? | 19:20 |
anteaya | hartsocks do you have any repos with connections to gerrit? you might need to delete the remote branch and create a new remote branch to gerrit with `git review -s` | 19:21 |
anteaya | to ensure you don't have any headed for hartsock | 19:21 |
hartsocks | okay. | 19:21 |
reed | stupid mailman | 19:21 |
*** thomasbiege has joined #openstack-infra | 19:21 | |
clarkb | hartsocks: first a little background on why this appears to have happened | 19:21 |
clarkb | hartsocks: you have logged into gerrit with two different launchpad accounts | 19:22 |
anteaya | reed an app called mailman or the human being holding your mail? | 19:22 |
clarkb | hartsocks: and if you push code with two different usernames changes will be attached to two different accounts | 19:22 |
hartsocks | clarkb: whoops :-/ | 19:22 |
clarkb | hartsocks: if you want to be hartsocks you should login with your vmware launchpad account | 19:22 |
reed | anteaya, the python code that delivers email | 19:22 |
hartsocks | clarkb: The only actions have been git actions that seem to screw up. | 19:23 |
clarkb | actually I take that back, both accounts see acm and vmware email | 19:23 |
anteaya | reed: ah okay, stupid python code that delivers email | 19:23 |
*** SergeyLukjanov has quit IRC | 19:23 | |
hartsocks | clarkb: I must have a git repo that was set up 'hartsock' | 19:23 |
clarkb | hartsocks: that will do it | 19:23 |
hartsocks | clarkb: I will go through them all and make sure they are 'hartsocks' | 19:23 |
clarkb | hartsocks: you can set gitreview.username in your global git config to set it globally | 19:23 |
clarkb | hartsocks: then make sure you don't have any local overrides | 19:24 |
hartsocks | clarkb: I'm guessing that's in .git/config locally | 19:24 |
anteaya | hartsocks: yes | 19:24 |
clarkb | hartsocks: ~/.gitconfig but setting it with the git config command is preferred. `git config --global gitreview.username hartsocks` | 19:24 |
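Putting those two steps together, roughly:

```bash
# Set the gerrit username once, globally...
git config --global gitreview.username hartsocks
# ...then, inside each existing clone, check for a stale per-repo override
git config --local --get gitreview.username   # empty output means no local override
```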
reed | pleia2, mordred, jeblair: when you approve my message to infra mlist please whitelist also stefano+infra@openstack as allowed email | 19:25 |
hartsocks | clarkb: thanks | 19:25 |
pleia2 | reed: will do, sec | 19:25 |
clarkb | hartsocks: rolling stuff under the other name into hartsocks is probably possible, but this is a busy week and if you can live with those being wrong until they get merged or die that would probably be easiest | 19:25 |
clarkb | I am also not sure if we have updated changes and the like in the past | 19:26 |
clarkb | may not be possible | 19:26 |
hartsocks | clarkb: now that I know what's going on I can live with that for a while. | 19:26 |
hartsocks | clarkb: just want my karma points that's all :-) | 19:26 |
anteaya | my local .git changes are in .git/config and I put them there with the git config command | 19:26 |
*** thomasbiege has quit IRC | 19:27 | |
hartsocks | clarkb: (I know the points don't matter.) | 19:27 |
anteaya | just like on Whose Line | 19:27 |
*** ruhe has quit IRC | 19:27 | |
*** xBsd has joined #openstack-infra | 19:28 | |
bnemec | Dibs on being OpenStack's Ryan Stiles. | 19:29 |
bnemec | I've even got the requisite height. ;-) | 19:30 |
anteaya | perfect | 19:30 |
anteaya | as a Canadian I'd like to try for Colin Mochrie | 19:30 |
anteaya | but my gender might be a hindrance | 19:30 |
anteaya | and I'm not bald | 19:30 |
bnemec | It's all good - half the time they had him playing female characters anyway. ;-) | 19:31 |
bnemec | Although the inability to make bald jokes would definitely be a problem. :-D | 19:31 |
anteaya | I can wear a swim cap | 19:32 |
*** vipul is now known as vipul-away | 19:32 | |
anteaya | I'm out sick for the richard simmons episode though | 19:32 |
bnemec | Bah, what fun is that? :-P | 19:33 |
anteaya | it's all you Ryan | 19:33 |
anteaya | pleia2: are you still lunching? | 19:34 |
anteaya | I'm trying to find a space for some exercise | 19:34 |
pleia2 | anteaya: I'm back-ish :) | 19:35 |
anteaya | I can wait | 19:35 |
anteaya | let me know when you are back | 19:35 |
*** xBsd has quit IRC | 19:35 | |
*** HenryG has quit IRC | 19:36 | |
pleia2 | anteaya: I'm back | 19:36 |
anteaya | okay great | 19:36 |
*** sarob has joined #openstack-infra | 19:37 | |
anteaya | thanks, off for a walk I expect to be back in an hour | 19:37 |
pleia2 | enjoy | 19:37 |
*** yolanda has quit IRC | 19:38 | |
*** sdake_ has quit IRC | 19:39 | |
jeblair | okay, pack-refs has completed on all the git servers | 19:40 |
*** sarob has quit IRC | 19:41 | |
jeblair | real 0m40.868s | 19:42 |
jeblair | clone time for nova on 03 | 19:42 |
*** emagana has joined #openstack-infra | 19:43 | |
openstackgerrit | @Spazm proposed a change to openstack-infra/git-review: fixes behavior when port=None https://review.openstack.org/43248 | 19:43 |
clarkb | jeblair: nice | 19:43 |
* clarkb reviews the change to put that in place everywhere | 19:43 | |
jeblair | clarkb: are you back and ready to proceed, or killing time around lunch? | 19:44 |
clarkb | jeblair: I should technically kill more time around lunch because the food I smell hasn't made it to the scavenging grounds yet | 19:45 |
clarkb | jeblair: but I am also impatient. I think we should continue if you don't need more time for food | 19:46 |
*** wenlock has joined #openstack-infra | 19:47 | |
clarkb | jeblair: I am fetching your cron change into the puppet development env | 19:47 |
jeblair | clarkb: we can wait, i think we're getting to the point where we don't want to be interrupted | 19:47 |
clarkb | jeblair: ok | 19:47 |
clarkb | I will pull the change into that repo and possibly just find a sandwich | 19:48 |
clarkb | to speed things along | 19:48 |
*** p5ntangle has quit IRC | 19:48 | |
clarkb | jeblair: two things to note before I afk for a few minutes. The haproxy git:// queue and conn numbers may need changing and we may need to change the default balance type to source to accommodate lag in replication across the different servers | 19:49 |
*** p5ntangle has joined #openstack-infra | 19:49 | |
clarkb | jeblair: the current balance method is round robin and git http by default can open up to five connections. | 19:49 |
*** SergeyLukjanov has joined #openstack-infra | 19:51 | |
jeblair | k | 19:52 |
jeblair | clarkb: we have cacti graphs for 01-04 | 19:53 |
*** gyee has quit IRC | 19:57 | |
*** CaptTofu has joined #openstack-infra | 19:58 | |
*** sarob has joined #openstack-infra | 20:00 | |
*** pcm_ has quit IRC | 20:00 | |
*** nayward has quit IRC | 20:01 | |
*** dina_belova has quit IRC | 20:03 | |
*** dina_belova has joined #openstack-infra | 20:04 | |
*** sdake_ has joined #openstack-infra | 20:06 | |
*** sdake_ has quit IRC | 20:06 | |
*** sdake_ has joined #openstack-infra | 20:06 | |
*** nati_uen_ has joined #openstack-infra | 20:09 | |
clarkb | jeblair: woot. | 20:09 |
clarkb | jeblair: sandwich was good. ready whenever you are | 20:09 |
jeblair | 1 sec | 20:09 |
jeblair | k | 20:10 |
jeblair | so shall we merge the git:// change now? | 20:11 |
*** p5ntangle has quit IRC | 20:11 | |
jeblair | clarkb: i'll let you do that since you haven't reviewed it | 20:11 |
jeblair | https://review.openstack.org/#/c/43315/ | 20:11 |
clarkb | jeblair: the gate is undergoing a reset. should we wait a little bit for that? | 20:11 |
clarkb | or just power through? | 20:11 |
*** nati_ueno has quit IRC | 20:12 | |
jeblair | clarkb: power through | 20:13 |
jeblair | it'll be done by the time that gets merged | 20:13 |
clarkb | ok merging 43315 now | 20:13 |
clarkb | s/merging/approving/ | 20:13 |
*** ^demon|lunch is now known as ^d | 20:13 | |
clarkb | the zuul results queue is large again | 20:15 |
clarkb | but that may just be a side effect of cancelling a bunch of stuff | 20:15 |
jeblair | clarkb: yep | 20:15 |
*** lcestari has quit IRC | 20:22 | |
*** dina_belova has quit IRC | 20:25 | |
*** SergeyLukjanov has quit IRC | 20:25 | |
*** vipul-away is now known as vipul | 20:25 | |
*** jbjohnso has joined #openstack-infra | 20:26 | |
*** nati_uen_ has quit IRC | 20:26 | |
*** danger_fo_away is now known as danger_fo | 20:27 | |
clarkb | jeblair: still waiting to get queued. Should I go ahead and force submit the change? | 20:27 |
jeblair | clarkb: yeah, it's about 2/3 through reconfiguring the reset. let's not wait. | 20:28 |
anteaya | back | 20:28 |
openstackgerrit | A change was merged to openstack-infra/config: Switch ggp to use git:// https://review.openstack.org/43315 | 20:29 |
clarkb | jeblair: I am going to run a puppet agent --noop | 20:29 |
*** sarob_ has joined #openstack-infra | 20:29 | |
clarkb | as a quick sanity check but then we should be ready to apply the haproxy stuff to git.o.o | 20:30 |
clarkb | jeblair: it looks clean to me. should I go ahead and run puppet for real? are you ready? | 20:32 |
*** sarob_ has quit IRC | 20:32 | |
*** sarob_ has joined #openstack-infra | 20:32 | |
*** sarob has quit IRC | 20:33 | |
jeblair | clarkb: yep | 20:33 |
* clarkb pushes the go button | 20:34 | |
clarkb | jeblair: can I have you check the ip6tables rules after puppet is done on git.o.o? I noticed some weirdness there yesterday and think our iptables module may not be completely happy on centos | 20:34 |
jeblair | k | 20:34 |
clarkb | puppet is still running. I will let you know when to check | 20:35 |
jeblair | clarkb: what was weird? | 20:37 |
clarkb | jeblair: it didn't pick up the new 4443 29418 and 8080 rules. but I kicked it by hand and that seemed to work | 20:37 |
jeblair | that seems to be the case again. probably a puppet bug | 20:37 |
jeblair | clarkb: how's that puppet run? | 20:38 |
jeblair | we're starting to fail jobs | 20:38 |
clarkb | puppet is done running. haproxy is up | 20:39 |
* clarkb checks a local clone really fast | 20:39 | |
clarkb | local clone of nova via git:// works | 20:39 |
anteaya | yay | 20:39 |
jeblair | i just did an https clone from home | 20:39 |
jeblair | worked | 20:39 |
* clarkb looks in the haproxy log for anything crazy looking | 20:40 | |
jeblair | (i'm cloning zuul, not nova though so i don't impact the server) | 20:40 |
jeblair | git and http work too | 20:40 |
jeblair | clarkb: how do we examine haproxy state? | 20:41 |
clarkb | jeblair: /var/log/haproxy.log | 20:41 |
clarkb | jeblair: is the log | 20:41 |
pleia2 | git is still running from xinetd, right? | 20:41 |
clarkb | I think it opens a socket somewhere that you can talk directly to asw well /me finds that | 20:41 |
jeblair | any way to see the current connection count, distributions | 20:41 |
clarkb | pleia2: no this includes your daemon change | 20:41 |
pleia2 | clarkb: oh ok, great | 20:41 |
clarkb | jeblair: good question. I am looking for that socket now | 20:42 |
*** pblaho has quit IRC | 20:42 | |
clarkb | jeblair: http://code.google.com/p/haproxy-docs/wiki/UnixSocketCommands on /var/lib/haproxy/stats | 20:43 |
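For reference, that socket speaks a small command set (the socat package gets installed a few lines below); for example:

```bash
echo "show info" | sudo socat stdio /var/lib/haproxy/stats   # process-wide info, including current connections
echo "show stat" | sudo socat stdio /var/lib/haproxy/stats   # per-frontend/backend/server stats as CSV
```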
jeblair | pleia2: would you mind writing a change to add the 'socat' package to the git servers? | 20:44 |
*** woodspa has quit IRC | 20:44 | |
jeblair | i installed it manually on git.o.o | 20:44 |
pleia2 | jeblair: sure, on it | 20:44 |
clarkb | jeblair: out of curiosity what command(s) are you running against that socket? | 20:44 |
*** danger_fo is now known as danger_fo_away | 20:45 | |
jeblair | clarkb: https://etherpad.openstack.org/SIzEkjfC1R | 20:45 |
jeblair | clarkb: this looks useful http://joeandmotorboat.com/2009/08/20/haproxy-stats-socket-and-fun-with-socat/ | 20:46 |
jeblair | (i've pasted in more output into the etherpad) | 20:47 |
clarkb | thanks | 20:47 |
psedlak | hi, how is the issue with python-*client dependency collisions solved for stable/grizzly branch? i've tried to get similar env at my machine and nova failed to start at all due to wrong versions of keystoneclient ... :/ | 20:49 |
psedlak | *similar env as gate-devstack-tempest-vm-full for stable/grizzly | 20:49 |
*** jjmb has joined #openstack-infra | 20:49 | |
openstackgerrit | Elizabeth Krumbach Joseph proposed a change to openstack-infra/config: Add socat package to cgit servers https://review.openstack.org/43354 | 20:50 |
anteaya | psedlak: we are doing a little internal work now, and the ones able to answer your question need to focus on their fix right now | 20:51 |
*** dkranz has quit IRC | 20:51 | |
anteaya | psedlak: if you have a link to a bug report or patch I can look at it, if you want | 20:51 |
clarkb | jeblair: are you seeing any rampant failure? | 20:51 |
clarkb | jeblair: best as I can tell we are mostly up | 20:51 |
psedlak | anteaya: you mean it's not the best time for it now ... should i ask later/tomorrow? | 20:52 |
anteaya | you can try | 20:52 |
anteaya | have you tried in -dev or -nova yet? | 20:52 |
anteaya | the keystone folks hang out in -dev | 20:52 |
jeblair | clarkb: nope, afaict, we seem to be distributing across all servers | 20:52 |
jeblair | echo "show errors" |socat stdio /var/lib/haproxy/stats | 20:52 |
jeblair | is empty | 20:52 |
*** jjmb1 has joined #openstack-infra | 20:53 | |
psedlak | anteaya: no, not yet as on gate it reinstalls (at least keystoneclient, but maybe also others) multiple times during setup (devstack) ... and there are clearly incompatible reqs http://bpaste.net/show/vnYioO66WaD27IC7C1dh/ | 20:53 |
jeblair | clarkb: since we broke the gate queue during the hup, that actually stopped the gate reset | 20:54 |
jeblair | psedlak: in master, we are now forcing the requirements specified in openstack/requirements to be installed | 20:54 |
clarkb | jeblair: there are a handful of "could not get file" errors due to the lack of a no-.git to .git translation but far fewer than we seemed to have in the past | 20:54 |
anteaya | psedlak: yeah, let's move to -dev and see if some -qa folks are around | 20:54 |
*** jjmb has quit IRC | 20:54 | |
jeblair | psedlak: i believe devstack has code to do that; i'm not sure if all of that has been backported to grizzly yet, but it's under consideration at least (if it hasn't been done) | 20:54 |
jeblair | psedlak: you might ask dtroyer | 20:55 |
jeblair | clarkb: it's possible those errors are due to a smart http request failing during the hup | 20:55 |
psedlak | jeblair: ok, thanks | 20:55 |
clarkb | jeblair: that is possible | 20:56 |
jeblair | clarkb: maybe give it a few mins and if they continue, start to worry? :) | 20:56 |
* jeblair adds a new tree to cacti | 20:56 | |
clarkb | jeblair: can you add one for logstash + elasticsearch if you are collapsing things together? | 20:57 |
*** thomasbiege has joined #openstack-infra | 20:57 | |
jeblair | clarkb: let me do that later; i'm just going to add a quick collection of graphs for git now; later i'll add a single graph that graphs multiple hosts; i'll do logstash then too | 20:57 |
clarkb | ok wfm | 20:57 |
*** thomasbiege has quit IRC | 20:57 | |
*** mrmartin has quit IRC | 20:58 | |
*** apcruz has quit IRC | 20:59 | |
jeblair | http://cacti.openstack.org/cacti/graph_view.php?action=tree&tree_id=2 | 21:00 |
anteaya | wow | 21:01 |
anteaya | a few cpus finally taking a smoke break | 21:01 |
*** gyee has joined #openstack-infra | 21:01 | |
jeblair | anteaya: heh, ie, smoking less? | 21:01 |
anteaya | yeah, taking a break from smoking | 21:02 |
*** hartsocks has left #openstack-infra | 21:02 | |
anteaya | what a difference | 21:02 |
anteaya | the greens are so similar, they aren't even in nice, they went right to idle | 21:03 |
jeblair | anteaya: you should be able to tell the difference on the graph; if you look at zuul, youll see nice time. | 21:03 |
*** CaptTofu has quit IRC | 21:04 | |
jeblair | but yeah, nothing is nice here | 21:04 |
clarkb | I think the gate reset is part of it | 21:04 |
*** jjmb1 has quit IRC | 21:04 | |
anteaya | yeah look at those idle numbers | 21:04 |
jeblair | clarkb: yes, git is basically idle now | 21:04 |
anteaya | clarkb: okay, I'll see if I can see what happens on a gate reset | 21:04 |
anteaya | w00t | 21:04 |
*** CaptTofu has joined #openstack-infra | 21:04 | |
jeblair | we're back at one job at a time until the next reset | 21:05 |
anteaya | one job? one what kind of job - one git clone job? | 21:06 |
anteaya | looks like a gate reset coming up | 21:07 |
jeblair | anteaya: well, we don't usually clone things, but yes, since all nodes are now occupied, they will each just pick up a new jenkins job (which will perform some git action) one at a time as they finish | 21:07 |
anteaya | ah okay, I think I understand | 21:07 |
jeblair | anteaya: without a new error, we're 17 minutes away from a gate reset | 21:07 |
*** sarob_ has quit IRC | 21:08 | |
jeblair | openstack/nova 42435,7 is the first change with a failed job in the gate (at the moment) | 21:08 |
anteaya | okay, can I see that on status.openstack.org/zuul? | 21:08 |
anteaya | ah okay | 21:08 |
*** sarob has joined #openstack-infra | 21:08 | |
*** CaptTofu has quit IRC | 21:08 | |
anteaya | right, a failed voting job | 21:09 |
anteaya | I see it | 21:09 |
*** changbl has quit IRC | 21:10 | |
anteaya | what is expected to happen at the next gate reset? | 21:10 |
jeblair | anteaya: zuul will cancel any running jobs in the gate queue which will free many jenkins slaves at once to immediately start running new gate jobs which will stress the git server | 21:11 |
anteaya | ah ha | 21:11 |
anteaya | then we will see what happens | 21:11 |
jeblair | then we'll see how the load-balanced server performs under our current load | 21:11 |
jeblair | if it performs well, we can add more nodes; if it does not, we can add more git servers | 21:11 |
anteaya | okay | 21:11 |
anteaya | so 13 minutes of downtime for you | 21:12 |
anteaya | or maybe a smoke break? | 21:12 |
anteaya | or maybe not quite the stress load | 21:12 |
*** sarob_ has joined #openstack-infra | 21:12 | |
*** sarob has quit IRC | 21:13 | |
anteaya | 8 minutes | 21:13 |
*** sarob_ has quit IRC | 21:13 | |
jeblair | well, perhaps a few minutes to switch to the other desktop and check in on the nodepool change i'm working on | 21:13 |
*** sarob has joined #openstack-infra | 21:13 | |
clarkb | we might also need to tune the maxconn settings for git:// | 21:13 |
anteaya | jeblair: :D | 21:13 |
jeblair | clarkb: this is one of those times i miss gerritbot reading merges in dev | 21:13 |
clarkb | jeblair: ya | 21:13 |
anteaya | like a mini vacation | 21:14 |
clarkb | jeblair: I have been watching the post queue for that info now | 21:14 |
jeblair | clarkb: we just merged 13 changes in the past 8 minutes | 21:14 |
clarkb | nice | 21:14 |
anteaya | here is a graph of merged changes: http://graphite.openstack.org/graphlot/?width=586&height=308&_salt=1377178709.576&target=stats.gerrit.event.change-merged | 21:14 |
clarkb | fatal: git upload-pack: not our ref 39f1e9314ee28eed74cdaf3c447fc32a64e76f45 multi_ack_detailed side-band-64k thin-pack no-progress include-tag ofs-delta | 21:15 |
clarkb | I think ^ may be related to non atomic mirror replication | 21:15 |
* clarkb looks in the error log of the other servers | 21:15 | |
jeblair | clarkb: ya, where'd you see it? | 21:15 |
clarkb | jeblair: that is on git.o.o and git02 has a couple as well | 21:16 |
clarkb | git01 is clean | 21:16 |
clarkb | 03 is clean | 21:16 |
clarkb | 04 as well | 21:16 |
clarkb | so not common at least not under heavy load | 21:17 |
anteaya | 3 minutes to gate reset | 21:17 |
clarkb | jeblair: we can try switching to source balancing which may suck with the d-g slaves as they are all in similar network space, or add retries to our git stuff | 21:17 |
jeblair | clarkb: what's the mask on source balancing? | 21:18 |
jeblair | clarkb: why not go with the full 32? | 21:18 |
clarkb | jeblair: I don't know that you can provide the mask | 21:19 |
clarkb | I will look into it more closely | 21:19 |
anteaya | gate is resetting | 21:19 |
*** AJaeger has quit IRC | 21:19 | |
clarkb | jeblair: also note that reload in the haproxy init script should be mostly invisible to the clients | 21:19 |
jeblair | clarkb: excellent | 21:20 |
*** vipul is now known as vipul-away | 21:22 | |
*** nati_ueno has joined #openstack-infra | 21:22 | |
*** boris-42 has quit IRC | 21:23 | |
openstackgerrit | A change was merged to openstack/requirements: Allow use of oslo.messaging 1.2.0a10 https://review.openstack.org/43060 | 21:23 |
openstackgerrit | @Spazm proposed a change to openstack-infra/git-review: fixes behavior when port=None https://review.openstack.org/43248 | 21:23 |
jeblair | clarkb: http://code.google.com/p/haproxy-docs/wiki/balance makes it look like it considers the whole ip | 21:23 |
clarkb | jeblair: yeah I am beginning to think that too. Looking in the source they use a hash over 32bit space with good distribution | 21:24 |
clarkb | (according to the comments anyways) | 21:25 |
clarkb | let me see if we can make the change with puppet (depends on whether or not it uses reload vs restart) | 21:25 |
*** dina_belova has joined #openstack-infra | 21:25 | |
jeblair | gate just reset | 21:26 |
clarkb | might be a little while before we reenable puppet though so I am open to doing it by hand if you want to get it in | 21:26 |
jeblair | clarkb: may as well see how this reset goes, no rush | 21:26 |
clarkb | ok | 21:26 |
anteaya | when the patches in the gate pipeline change to unknown - that is the indicator that the gate is reset? | 21:27 |
jeblair | oh, this is still markmc's chain, so it has to kick a bunch of changes out first before it actually starts jobs | 21:27 |
jeblair | anteaya: there are no running jobs currently, it has canceled everything and is recomputing the new proposed series to merge | 21:27 |
anteaya | okay, how do I see that using the status page, cacti and graphite? | 21:28 |
anteaya | or can I? | 21:28 |
*** dina_belova has quit IRC | 21:28 | |
jeblair | anteaya: the status page; if you look at the gate queue, you should see that nothing has started running yet | 21:28 |
anteaya | right, but all the old jobs with any logs are no longer in the queue | 21:29 |
anteaya | so that can be my indicator | 21:29 |
*** dprince has quit IRC | 21:30 | |
*** krtaylor has quit IRC | 21:31 | |
*** krtaylor has joined #openstack-infra | 21:32 | |
jeblair | ok it's starting jobs now | 21:35 |
anteaya | yes I see that | 21:35 |
*** dina_belova has joined #openstack-infra | 21:35 | |
anteaya | and cpu usage for user on the git server is 1.7 | 21:36 |
anteaya | I don't see a spike | 21:36 |
jeblair | anteaya: there's a 5 minute polling interval on graphite | 21:36 |
anteaya | ah ha | 21:36 |
jeblair | s/graphite/cacti/ | 21:36 |
anteaya | I'll check back in 5+ minutes | 21:36 |
*** mriedem has quit IRC | 21:36 | |
anteaya | time for toast | 21:37 |
*** dina_belova has quit IRC | 21:40 | |
anteaya | so the jobs that stress the git server are any job with devstack in it, correct? | 21:41 |
clarkb | anteaya: they are the worst offenders | 21:41 |
anteaya | ah okay | 21:41 |
*** vipul-away is now known as vipul | 21:42 | |
anteaya | so far on my cacti graph user is up to 4 | 21:42 |
anteaya | with idle at 92.8 | 21:43 |
anteaya | nice ratio | 21:43 |
clarkb | I am not seeing terrible load average on the individual servers | 21:43 |
anteaya | yay | 21:43 |
anteaya | any numbers for the etherpad? | 21:43 |
clarkb | not yet. I am not sure that the full wave has hit us yet | 21:44 |
*** pblaho has joined #openstack-infra | 21:44 | |
anteaya | okay | 21:44 |
anteaya | but good early results | 21:44 |
clarkb | load average: 0.39, 0.45, 0.43 on git.o.o these numbers are on cacti too | 21:44 |
* anteaya scrolls down | 21:45 | |
*** ftcjeff has quit IRC | 21:45 | |
anteaya | toast | 21:45 |
*** Ryan_Lane has quit IRC | 21:47 | |
*** Ryan_Lane has joined #openstack-infra | 21:47 | |
jeblair | clarkb: i have a disturbing thought; what if nova was the only repo on git.o.o that had packed refs? | 21:48 |
clarkb | oh | 21:48 |
clarkb | jeblair: hahahahaha | 21:48 |
clarkb | well it seems happy now in any case :) | 21:49 |
jeblair | clarkb: if that graph holds, then the inflection point of load dropping on git.o.o is much closer to the point where i ran pack-refs than when we started the other servers | 21:49 |
clarkb | jeblair: makes sense | 21:49 |
clarkb | if that is the case we can always scale back the additonal nodes | 21:50 |
jeblair | well, if so, maybe we can just throw more load at it sooner. :) | 21:50 |
clarkb | or that | 21:50 |
jeblair | i see a lot of graphs on the status page that should have passed the point of git errors by now; and there are basically no git connections | 21:51 |
jeblair | so i think we've seen as much 'rush' from this reset as we're going to | 21:51 |
anteaya | yay | 21:51 |
clarkb | jeblair: I agree. I do however think we should switch to source balance method | 21:51 |
jeblair | clarkb: yep, let's do it now before the next rush? | 21:52 |
jeblair | clarkb: and then perhaps unstick az2 and give nodepool its reins again? | 21:52 |
clarkb | ya I am checking puppet now and if puppet is sane will do it with puppet and if it isn't sane will do it by hand and update puppet | 21:52 |
clarkb | jeblair: ++ | 21:52 |
clarkb | looks like it will use restart | 21:53 |
clarkb | https://github.com/puppetlabs/puppetlabs-haproxy/blob/0.3.0/manifests/init.pp#L132 | 21:53 |
jeblair | grumble | 21:53 |
clarkb | I will edit the file by hand, reload haproxy then do puppet so puppet doesn't see the change | 21:54 |
jeblair | is it haproxy.cfg? | 21:54 |
clarkb | jeblair: yes | 21:54 |
clarkb | in /etc/haproxy/ | 21:54 |
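The by-hand edit being described, as a sketch (the matching puppet change, 43359, is proposed just below; the listener name here is illustrative):

```bash
# In /etc/haproxy/haproxy.cfg, change each git listener's balance line, e.g.
#   listen balance_git_daemon            # illustrative stanza name
#       balance  source                  # was: balance  roundrobin
# then reload, which keeps existing client connections alive (unlike restart):
sudo /etc/init.d/haproxy reload
```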
*** dmakogon_ has quit IRC | 21:54 | |
jeblair | mgagne: want to write a puppet patch? | 21:55 |
mgagne | jeblair: go on | 21:55 |
*** ^d has quit IRC | 21:55 | |
jeblair | mgagne: it would be cool if changes to haproxy.cfg could run '/etc/init.d/haproxy reload' instead of 'restart' in the puppetlabs haproxy module https://github.com/puppetlabs/puppetlabs-haproxy/blob/0.3.0/manifests/init.pp#L132 | 21:55 |
jeblair | clarkb: do you think that makes sense? or are the times when you'd want to reload vs restart significant enough that there isn't a clear winner? | 21:56 |
*** jjmb has joined #openstack-infra | 21:56 | |
anteaya | currently no failures in the gate queue/pipeline | 21:57 |
jeblair | mgagne: restart is disruptive to clients, reload is not, and you do things like 'add new backend servers' by editing that file | 21:57 |
mgagne | jeblair: are you suggesting setting $manage_service to false and handling the definition of the haproxy service within the node manifest? | 21:57 |
*** mrodden has quit IRC | 21:58 | |
clarkb | I think that makes sense. But I don't know enough about haproxy to know if one is preferred over the other in some instances | 21:58 |
jeblair | mgagne: well, either that, or make the puppetlabs module better; clarkb what do you think? | 21:59 |
*** sdake_ has quit IRC | 21:59 | |
mgagne | jeblair: according to my coworker, reload is preferred. If restart is used and config contains an error, you are screwed, haproxy won't restart. restart will kill all the connections, reload won't. | 22:00 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Use the haproxy source balance method. https://review.openstack.org/43359 | 22:00 |
clarkb | mgagne: yeah that is why we want reload. it should be much more invisible to end users | 22:01 |
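The reason reload is the safer default can be made concrete: haproxy can validate a candidate config before the running process is asked to pick it up, so a syntax error never takes the proxy down, and existing client connections survive. A minimal sketch of that check-then-reload step, assuming the stock init script and config path discussed above, might look like:

    #!/usr/bin/env python
    # Sketch: only reload haproxy if the new config parses cleanly.
    # Paths and commands assume a standard package install; adjust as needed.
    import subprocess
    import sys

    CONFIG = '/etc/haproxy/haproxy.cfg'

    def safe_reload():
        # 'haproxy -c -f <file>' parses the config and exits non-zero on errors.
        if subprocess.call(['haproxy', '-c', '-f', CONFIG]) != 0:
            sys.exit('haproxy.cfg failed validation; not reloading')
        # reload keeps existing connections; restart would drop them.
        subprocess.check_call(['/etc/init.d/haproxy', 'reload'])

    if __name__ == '__main__':
        safe_reload()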
clarkb | I wrote 43359 so I can see what the puppet concat diff looks like before modifying the file | 22:01 |
clarkb | jeblair: once I have reloaded haproxy we should merge these puppet changes | 22:01 |
mgagne | jeblair: depends on your urgency: designing and proposing a patch, having it accepted, releasing on forge won't happen in one day | 22:01 |
*** prad_ has quit IRC | 22:02 | |
clarkb | mgagne: understood. we will work around it now. But it is something that will probably end up being desirable to us and others | 22:02 |
clarkb | at the very least I suppose I should open a bug with puppetlabs | 22:02 |
mgagne | clarkb: yes, bodepd could use his contact to fast-forward the patch ;) | 22:02 |
jeblair | clarkb: also, hunner looks like he's involved in that | 22:03 |
mgagne | clarkb: it will be useful to us too as we are dealing with haproxy tuning atm | 22:03 |
clarkb | oh I could just bug hunner | 22:03 |
anteaya | the entire gate queue/pipeline has some test jobs running, so far no failures | 22:03 |
mgagne | clarkb: yes, hunner is the man | 22:03 |
clarkb | mgagne: are you puppetconfing? | 22:03 |
anteaya | first failure is on the last (27th) patch | 22:04 |
mgagne | clarkb: could you make the scope of your question smaller? :D | 22:04 |
jeblair | anteaya: and it's a real test question | 22:04 |
jeblair | anteaya: and it's a real test failure | 22:04 |
* jeblair just writes what he reads | 22:04 | |
anteaya | yes, a voting job | 22:04 |
mtreinish | anteaya: does that include the testr-full jobs too? | 22:04 |
clarkb | mgagne: are you at the conference? | 22:04 |
clarkb | a bunch of folks are there | 22:05 |
mgagne | clarkb: not me =) | 22:05 |
clarkb | just curious if you were part of the bunch | 22:05 |
anteaya | mtreinish: testr jobs are running | 22:05 |
pleia2 | I have a dr appt to run off to, bbiab | 22:05 |
mtreinish | anteaya: yeah, but they're not voting. I was curious if you've seen random failures there. (since they wouldn't trigger a gate reset) | 22:06 |
clarkb | the way puppet concat works is weird. I am not entirely sure that merging that puppet change won't cause an haproxy restart | 22:06 |
mgagne | clarkb: I don't use puppet for client products, only internal stuff, mainly openstack. So they sent the ones designing products with puppet =) | 22:06 |
jeblair | grenade test failed: https://jenkins02.openstack.org/job/gate-grenade-devstack-vm/3624/console | 22:06 |
clarkb | jeblair: but I figure I should write the change locally, reload haproxy then worry about the restart later | 22:06 |
anteaya | at 33 minutes gate-grenade-devstack-vm failed | 22:06 |
anteaya | see ya pleia2 | 22:06 |
jeblair | but that's also a real test failure, not an infra failure | 22:06 |
anteaya | mtreinish: yes, so far testr jobs are running, not results back yet in the grouping | 22:06 |
mtreinish | anteaya: ok thanks | 22:07 |
anteaya | np | 22:07 |
jeblair | i really need to mask aborted test results in zuul | 22:07 |
clarkb | jeblair: reloading haproxy now | 22:07 |
mgagne | clarkb: haproxy will be "notified" if haproxy.cfg is regenerated: https://github.com/puppetlabs/puppetlabs-haproxy/blob/0.3.0/manifests/init.pp#L79-L87 | 22:07 |
anteaya | jeblair: yay | 22:07 |
clarkb | mgagne: yeah and I think concat ends up building it from scratch | 22:07 |
clarkb | mgagne: but I think it checks a diff maybe | 22:08 |
clarkb | haproxy reloaded | 22:08 |
anteaya | jeblair: now I understand your prior question, I don't know how to open test logs reporting failure when the patch is still in the queue | 22:08 |
anteaya | it takes me to jenkins and then I can't get to the log itself | 22:09 |
mgagne | clarkb: it concats a bunch of fragments using a bash script: https://github.com/puppetlabs/puppetlabs-concat/blob/master/files/concatfragments.sh#L22-24 | 22:09 |
clarkb | anteaya: click on "console log" on the left hand side in jenkins | 22:09 |
anteaya | clarkb: thanks | 22:09 |
clarkb | jeblair: If you are happy with that stack of changes I think you can approve them now | 22:10 |
clarkb | then we can reenable puppet on the servers | 22:10 |
anteaya | here is a python26 error and it looks like a real error, not a git timeout: https://jenkins02.openstack.org/job/gate-nova-python26/1261/console | 22:10 |
jeblair | clarkb: including source? | 22:10 |
*** weshay has quit IRC | 22:10 | |
clarkb | jeblair: yes including source | 22:10 |
clarkb | jeblair: I will just be careful when I start puppet again... I am not sure there is much we can do there | 22:11 |
clarkb | I could move the init script aside :) | 22:11 |
anteaya | mtreinish: here is a testr failure for a swift patch: https://jenkins01.openstack.org/job/gate-tempest-devstack-vm-testr-full/3500/console | 22:11 |
mtreinish | anteaya: thanks I was just looking at it. It looks like one I've seen before where all the server creates in nova go to an error state | 22:12 |
anteaya | okay | 22:12 |
anteaya | hmmmm | 22:12 |
clarkb | you can definitely see it is no longer round-robinning requests if you tail the log | 22:14 |
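The stickiness clarkb sees in the log is the defining property of the source algorithm: haproxy hashes the client address onto the set of healthy backends, so a given client keeps landing on the same server between reloads instead of rotating. A rough Python illustration of that mapping follows; it is not haproxy's actual hash, and the backend names are invented for the example.

    # Illustration of source-style balancing: hash the client IP onto a
    # fixed backend list to show why one client stops rotating across backends.
    import hashlib

    BACKENDS = ['git01', 'git02', 'git03', 'git04']  # assumed backend names

    def pick_backend(client_ip, backends=BACKENDS):
        digest = hashlib.md5(client_ip.encode()).hexdigest()
        return backends[int(digest, 16) % len(backends)]

    for ip in ('10.0.0.5', '10.0.0.5', '10.0.0.6'):
        print(ip, '->', pick_backend(ip))
    # Both requests from 10.0.0.5 land on the same backend.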
anteaya | have a nova patch failing both 26 and 27, look like real failures - 23 minutes until it finishes | 22:14 |
clarkb | anteaya: link to py26 | 22:14 |
anteaya | py26: https://jenkins02.openstack.org/job/gate-nova-python26/1261/console | 22:15 |
anteaya | py27: https://jenkins02.openstack.org/job/gate-nova-python27/1560/console | 22:15 |
anteaya | the patch passed both in the check queue | 22:15 |
clarkb | yup real failure | 22:16 |
anteaya | I see those as being actual python failures, not git timeouts | 22:16 |
anteaya | yay, my log parsing skills are getting better | 22:16 |
anteaya | funny they passed in check | 22:16 |
*** burt has quit IRC | 22:16 | |
jeblair | clarkb: did you confirm whether smart http client is one connection? if so, do you want to round-robin it? or shelve this topic until we have more graphs for 'source'? | 22:18 |
clarkb | pleia2 mind checking cgit? | 22:18 |
jeblair | clarkb: i think she's afk | 22:18 |
clarkb | jeblair I think shelve | 22:18 |
jeblair | clarkb: wfm | 22:18 |
clarkb | jeblair thanks | 22:18 |
anteaya | at 14 minutes we have a postgress failure: https://jenkins01.openstack.org/job/gate-tempest-devstack-vm-postgres-full/3919/console | 22:20 |
anteaya | she is at a dr appointment | 22:20 |
anteaya | s/postgress/postgres | 22:21 |
mgagne | clarkb: trying to see if puppet service resource supports reload. But I'm always finding puppet bugs that have been opened for years without patch or conclusion... | 22:21 |
clarkb | mgagne: I think you have to give it a restart command or something like that | 22:21 |
clarkb | where puppet intends to `restart` but you have told it to do something else | 22:22 |
mgagne | clarkb: yes, which (IMO) is suboptimal | 22:22 |
clarkb | mgagne: I agree | 22:22 |
*** pblaho has quit IRC | 22:22 | |
anteaya | the postgres error is from nova patch 28819,3 | 22:22 |
jeblair | clarkb: shall i unstick az2 nodepool now? | 22:23 |
jeblair | anteaya: it's not an infra error | 22:23 |
anteaya | jeblair: yay | 22:23 |
anteaya | so far, no infra errors in the gate | 22:23 |
mgagne | clarkb: is haproxy actually restarted when the config is updated? | 22:23 |
clarkb | jeblair: yes I think we can open the flood gates | 22:23 |
clarkb | mgagne: --noop says the service will be restarted | 22:23 |
clarkb | mgagne: let me get the exact log line | 22:23 |
mgagne | clarkb: service resource has the "refreshable" feature | 22:23 |
anteaya | patch which will spark a gate reset to be finished in 4 minutes | 22:24 |
clarkb | mgagne: notice: /Stage[main]/Haproxy/Service[haproxy]: Would have triggered 'refresh' from 1 events | 22:24 |
jeblair | clarkb: done; az2 nodes should start showing up in a few mins | 22:24 |
anteaya | 8 in post, hopefully 6 more to join them | 22:24 |
mgagne | clarkb: we can only hope the service provider detects that the haproxy service can actually be reloaded. | 22:25 |
mgagne | clarkb: I don't see any trace of reload in that file: https://github.com/puppet/puppet/blob/master/lib/puppet/provider/service/init.rb | 22:26 |
anteaya | this patch has the nova py26 and py27 errors: https://review.openstack.org/#/c/40565/ it is going to remain in the queue after reset, I guess there is nothing we can do about that | 22:27 |
anteaya | it needs the logs from the failure attached to the patch and it won't get them otherwise | 22:27 |
clarkb | anteaya: yeah that is normal | 22:28 |
anteaya | okay | 22:28 |
jeblair | clarkb: the first new az2 node is in use, it appears to be running a job | 22:28 |
clarkb | mgagne: ok, I think I will just try starting puppet again on that server when jenkins is quiet | 22:28 |
clarkb | mgagne: that way if it restarts it doesn't hurt a lot of stuff and we know about it. Otherwise \o/ | 22:28 |
jeblair | clarkb: you mean in november? :) | 22:28 |
clarkb | jeblair: Friday afternoons are usually sanish | 22:28 |
clarkb | of course this is no normal week | 22:28 |
anteaya | heh | 22:29 |
mgagne | clarkb: thanks for asking about reload, now I have to fix haproxy to reload with my setup =) | 22:29 |
anteaya | that git.o.o cacti graph just looks beautiful | 22:30 |
openstackgerrit | A change was merged to openstack/requirements: Allow pyflakes 0.7.3 https://review.openstack.org/35804 | 22:31 |
openstackgerrit | A change was merged to openstack-infra/config: Swap git daemon in xinetd for service https://review.openstack.org/43012 | 22:31 |
anteaya | 10, 10 pretty patches in post ah ha ha ha *lightning flash* | 22:32 |
clarkb | It feels like we are moving again | 22:33 |
anteaya | w00t | 22:33 |
anteaya | look at that graph of test nodes climb | 22:34 |
anteaya | https://tinyurl.com/kmotmns | 22:34 |
openstackgerrit | A change was merged to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 22:35 |
*** dina_belova has joined #openstack-infra | 22:36 | |
openstackgerrit | A change was merged to openstack-infra/config: Add a mirror repack cron to git servers https://review.openstack.org/43331 | 22:37 |
clarkb | jeblair: the time remaining numbers when you hover over the progress bars on the status page don't add hours properly | 22:39 |
clarkb | jeblair: you can see that now if you look at the gate tempest jobs. I intend on taking a look at that when things are not so busy if no one else beats me to it | 22:39 |
openstackgerrit | A change was merged to openstack-infra/config: Use the haproxy source balance method. https://review.openstack.org/43359 | 22:39 |
*** dina_belova has quit IRC | 22:41 | |
jeblair | clarkb: thx; yeah, i _think_ the bug is in status.js | 22:41 |
anteaya | clarkb: just seems to be the ones in the gate, check and post seem reasonable | 22:41 |
jeblair | clarkb: also, it needs to round better; anything < 60 seconds is 0min | 22:42 |
clarkb | anteaya: yeah it has to do with jobs that roll over an hour in length | 22:42 |
clarkb | anteaya: we keep the hour set to 00 | 22:42 |
anteaya | ah | 22:42 |
anteaya | okay | 22:42 |
anteaya | what happens if you just go with minutes and get rid of hours | 22:42 |
anteaya | 90 minutes rather than 1 hour 30 minutes | 22:43 |
clarkb | anteaya: humans don't like reading timestamps like that | 22:43 |
anteaya | I can live with it | 22:43 |
anteaya | but other humans, okay | 22:43 |
anteaya | movie running times are all like that | 22:43 |
anteaya | 120 minutes | 22:43 |
anteaya | 200 minutes | 22:43 |
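The status-page bug described above is just an hour carry: once the remaining time crosses 60 minutes, the hours field has to come from integer division instead of staying at 0, and anything under a minute rounds to 0 min. A sketch of the intended formatting, in Python purely to show the arithmetic (the real fix lives in zuul's status.js):

    # Sketch of the hours/minutes carry the status page needs.
    def format_remaining(ms):
        total_minutes = ms // 60000
        hours, minutes = divmod(total_minutes, 60)
        if hours:
            return '%d hr %d min' % (hours, minutes)
        return '%d min' % minutes  # anything under a minute shows as 0 min

    print(format_remaining(90 * 60000))   # 1 hr 30 min
    print(format_remaining(45 * 60000))   # 45 min
    print(format_remaining(30 * 1000))    # 0 min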
anteaya | gate reset | 22:44 |
anteaya | 12 in post! | 22:44 |
anteaya | look at the test node numbers climb | 22:44 |
mgagne | feels like a sport commentator =) | 22:44 |
anteaya | I have to do something | 22:45 |
anteaya | don't know enough to write any scripts to do any helpful changes | 22:45 |
anteaya | I would have to ask questions, slows them down | 22:45 |
mgagne | =) | 22:46 |
anteaya | :D | 22:46 |
anteaya | I'll learn more when it is quieter | 22:46 |
wenlock | hey guys, grats on getting your current challenge fixed... was wondering if i could ask a few questions ... ive been working on trying to understand puppet and using wiki | 22:47 |
jeblair | clarkb: a noticable bump in the git cpu graphs | 22:49 |
jeblair | wenlock: what's your question? | 22:49 |
clarkb | jeblair: we still seem to be under control though | 22:51 |
jeblair | clarkb: yep, seems well within capability atm | 22:51 |
anteaya | clarkb: here is a bug report for you: https://bugs.launchpad.net/openstack-ci/+bug/1215659 | 22:52 |
clarkb | jeblair: I am going to enable puppet on 01-04 since all of the outstanding changes that affect them have merged | 22:52 |
uvirtbot | Launchpad bug 1215659 in openstack-ci "zuul status bars hover box "time remaining" fails after 61 minutes" [Undecided,New] | 22:52 |
clarkb | jeblair: I will hold off on git.o.o until I can do it semi safely | 22:52 |
jeblair | clarkb: ok | 22:53 |
*** mrodden has joined #openstack-infra | 22:54 | |
clarkb | in other news I think the kicking out of changes that may not merge is the greatest thing ever | 22:54 |
jeblair | This change was unable to be automatically merged with the current state of the repository and the following changes which were enqueued ahead of it: 31061, 41723, 42430, 42431, 43088, 42751, 42746, 42744, 42743, 41070, 42745, 42765, 42747, 42432, 42433, 42434, 42435, 42436, 42437, 42748, 42749, 42750, 42752, 40845, 37465, 38601. Please rebase your change and upload a new patchset. | 22:55 |
jeblair | clarkb: ^ you mean like that? :) | 22:55 |
jeblair | there's a merge conflict in there! somewhere! | 22:55 |
clarkb | jeblair: ya :) | 22:55 |
clarkb | I think the choice to sacrifice the few for the many was the correct one | 22:56 |
wenlock | i setup wiki on a private server, using the wiki.pp module it installed ok, but seems only mysql is started | 22:56 |
wenlock | is there some additional modules that control started state? | 22:57 |
clarkb | gate throughput is much higher now in the best case scenario | 22:57 |
wenlock | or should i have expected to see a running server on port 80? | 22:57 |
jeblair | clarkb: the needs of the many outweigh the needs of the few (or the one). | 22:57 |
anteaya | look at all those recent merges: http://graphite.openstack.org/graphlot/?width=586&height=308&_salt=1377178709.576&target=stats.gerrit.event.change-merged | 22:57 |
jeblair | wenlock: unfortunately, some parts of the wiki servers aren't in puppet :( | 22:57 |
jeblair | wenlock: i believe Ryan_Lane is planning on working on that when he gets a chance | 22:58 |
jeblair | wenlock: but i think at least some of the config is just on-host | 22:58 |
Ryan_Lane | very little of it is just on-host | 22:58 |
jeblair | wenlock: however, we do have some documentation about how upgrades are manually performed | 22:58 |
jeblair | wenlock: http://ci.openstack.org/wiki.html | 22:58 |
Ryan_Lane | just the mediawiki software and its config | 22:58 |
Ryan_Lane | everything else is in the module | 22:58 |
jeblair | Ryan_Lane: ah ok | 22:59 |
jeblair | wenlock: that upgrade documentation might be able to serve as install configuration too | 22:59 |
jeblair | act | 22:59 |
jeblair | wenlock: that upgrade documentation might be able to serve as install documentation too | 22:59 |
wenlock | ok, cool... thats making a little more sense now :D | 23:00 |
*** datsun180b has quit IRC | 23:00 | |
clarkb | puppet is running on 01-04 | 23:01 |
clarkb | now I will check cgit | 23:01 |
clarkb | cgit seems happy | 23:02 |
anteaya | i don't see any failures in the gate queue/pipeline yet | 23:02 |
clarkb | jeblair: http://git.openstack.org/cgit/openstack-infra/config/stats/ you write a lot of commits apparently | 23:03 |
*** rnirmal has quit IRC | 23:03 | |
mgagne | at least I'm on the list ^^' | 23:03 |
*** notmyname has quit IRC | 23:04 | |
clarkb | I wonder if that is counting patchsets | 23:04 |
*** notmyname has joined #openstack-infra | 23:04 | |
anteaya | mgagne: I'm just hoping I am in other somewhere | 23:04 |
anteaya | :D | 23:04 |
* clarkb looks in status.js to focus on something different for a bit | 23:05 | |
mgagne | anteaya: http://git.openstack.org/cgit/openstack-infra/config/stats/?period=q&ofs=25 | 23:06 |
*** pcrews has quit IRC | 23:06 | |
mgagne | clarkb: how about upgrading apache puppet module to latest version :D /jk | 23:06 |
clarkb | mgagne: you are funny | 23:06 |
*** wenlock has quit IRC | 23:07 | |
anteaya | mgagne: yay I'm on the list, thanks | 23:07 |
anteaya | clarkb: did you see this? https://bugs.launchpad.net/openstack-ci/+bug/1215659 | 23:07 |
uvirtbot | Launchpad bug 1215659 in openstack-ci "zuul status bars hover box "time remaining" fails after 61 minutes" [Undecided,New] | 23:07 |
anteaya | or did it get lost in the blur? | 23:07 |
*** pabelanger_ has quit IRC | 23:07 | |
clarkb | anteaya: I did thanks | 23:07 |
anteaya | np | 23:07 |
clarkb | it popped up in my email which is what prompted me to look a tit | 23:08 |
clarkb | that is an unfortunate typo | 23:08 |
anteaya | cool | 23:08 |
anteaya | let it pass | 23:08 |
*** pabelanger has joined #openstack-infra | 23:08 | |
anteaya | did you ever get real food, clarkb? | 23:08 |
*** notmyname has quit IRC | 23:08 | |
anteaya | or are you still running on sandwich? | 23:09 |
clarkb | anteaya: sandwiches are real food | 23:09 |
*** notmyname has joined #openstack-infra | 23:09 | |
anteaya | that they are yes, I was referring to the aromatic food that was cooking earlier | 23:09 |
clarkb | jeblair: I think I see the bug in status.js | 23:09 |
* anteaya lives on sandwiches herself | 23:09 | |
*** jhesketh has quit IRC | 23:11 | |
*** sdake_ has joined #openstack-infra | 23:12 | |
*** jhesketh has joined #openstack-infra | 23:14 | |
*** _TheDodd_ has quit IRC | 23:14 | |
pleia2 | clarkb: back, lmk if you still need tests | 23:15 |
clarkb | pleia2: I think we are good | 23:16 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Fix zuul status hours display. https://review.openstack.org/43375 | 23:16 |
clarkb | jeblair: anteaya ^ | 23:16 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/nodepool: Fix error with stats for de-configured resources https://review.openstack.org/43376 | 23:17 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/nodepool: Make jenkins username and private key path configurable https://review.openstack.org/43377 | 23:17 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/nodepool: Move setup scripts destination https://review.openstack.org/43033 | 23:17 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/nodepool: Change credentials-id parameter in config file https://review.openstack.org/43016 | 23:17 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/nodepool: Reduce timeout when waiting for server deletion https://review.openstack.org/43017 | 23:17 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/nodepool: Add option to test jenkins node before use https://review.openstack.org/43313 | 23:17 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/nodepool: Add JenkinsManager https://review.openstack.org/43014 | 23:17 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/nodepool: Add an ssh check periodic task https://review.openstack.org/43015 | 23:17 |
jeblair | clarkb: something about a lot of patches? | 23:17 |
clarkb | jeblair: ya you write them :) | 23:17 |
pleia2 | whee :) | 23:17 |
*** mriedem has joined #openstack-infra | 23:18 | |
jeblair | clarkb: so that adds the node test feature; it's completely optional, and i'm not sure i want to use it, but i figured it'd be good to get that lever in place in case we want to pull it | 23:18 |
clarkb | ++ | 23:18 |
jeblair | clarkb: i'm actually more leaning toward thinking that getting zuul to re-run jobs that come back with jenkins exceptions is the way to go, and i think we can do that without a change to the gearman plugin | 23:18 |
clarkb | ooh | 23:19 |
jeblair | clarkb: but i'll go ahead and write up the jjb change to populate the node test job so it'll be there if we want it | 23:19 |
*** mrodden1 has joined #openstack-infra | 23:19 | |
*** mrodden has quit IRC | 23:20 | |
clarkb | sounds good. I may take a break shortly to do something other than type in a terminal. But plan to do some code review after that | 23:23 |
anteaya | if we are re-running jobs that return with exceptions do we have some form of counter so it doesn't loop endlessly? | 23:23 |
clarkb | I have found that code review at night is nice because there are few distractions | 23:23 |
clarkb | anteaya: In this case it may be ok to loop endlessly as the failure is on the jenkins side | 23:23 |
anteaya | okay | 23:24 |
jeblair | anteaya, clarkb: i think i would use a counter; if jenkins goes crazy i don't want everything stuck in zuul | 23:24 |
anteaya | makes sense | 23:24 |
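jeblair's counter amounts to bounding retries per build: relaunch when the failure is on the Jenkins side, but give up after a fixed number of attempts so a sick Jenkins cannot wedge everything in zuul. A hedged Python sketch of that policy, with names invented for illustration rather than taken from zuul's actual API:

    # Sketch of bounded retry for builds that fail on the Jenkins side.
    # launch_build and JenkinsException are placeholders, not zuul's real API.
    MAX_ATTEMPTS = 3

    class JenkinsException(Exception):
        """Stand-in for a Jenkins-side error, as opposed to a test failure."""

    def run_with_retries(launch_build, max_attempts=MAX_ATTEMPTS):
        for attempt in range(1, max_attempts + 1):
            try:
                return launch_build()
            except JenkinsException:
                if attempt == max_attempts:
                    raise  # report the job LOST rather than looping forever
                # otherwise fall through and relaunch the job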
jeblair | clarkb: do you think we're in a good place to add, say, 8 more centos nodes? | 23:24 |
jeblair | oh,... | 23:25 |
jeblair | actually, we should re-evaluate now that they should be using the git protocol | 23:25 |
jeblair | they may not be as far behind now | 23:25 |
clarkb | jeblair: they are using git protocol and a spot check showed that it sped up ggp tremendously for them | 23:25 |
*** Adri2000 has quit IRC | 23:25 | |
anteaya | we have a LOST in the gate, 40833, 10: https://jenkins02.openstack.org/job/gate-grenade-devstack-vm/3669/console | 23:26 |
jeblair | anteaya: yeah, that's the situation that either the nodepool change or (hopefully, still looking into it) a zuul change would fix | 23:26 |
anteaya | okay | 23:26 |
anteaya | what more does zuul need fixed? | 23:27 |
anteaya | trying to keep up | 23:27 |
clarkb | I think the LOST jobs are the last major outstanding item | 23:27 |
jeblair | anteaya: the change we were just talking about with exceptions | 23:27 |
anteaya | yay, we finally got there | 23:27 |
anteaya | sorry, I will re-read | 23:27 |
clarkb | which means I need to get into code review mode soon | 23:27 |
anteaya | oh yeah, coming back from jenkins with an exception | 23:27 |
clarkb | if it makes everyone feel better about this week NASDAQ halted trading today due to a technical issue | 23:28 |
anteaya | you are kidding | 23:29 |
clarkb | nope. for 3 hours today they shut it down | 23:29 |
jeblair | clarkb: you know, the sun's magnetic polarity is reversing. just sayin. | 23:29 |
anteaya | can't imagine what it would be like on the NASDAQ tech team | 23:29 |
anteaya | ha ha ha | 23:29 |
anteaya | it happens every 11 years | 23:29 |
anteaya | but yeah, 11 years ago we didn't have the reliance on tech we have today | 23:30 |
anteaya | that is for sure | 23:30 |
clarkb | jeblair: I joked in a different channel that their ops team must be at puppetconf | 23:31 |
clarkb | anteaya: ^ | 23:32 |
anteaya | ha ha ha | 23:32 |
pleia2 | hah | 23:32 |
jeblair | clarkb: hrm, it looks like that error came back as a regular work_fail, just without a result | 23:32 |
jeblair | clarkb: so not quite as nice as a work_exception, but that might still be actionable | 23:32 |
clarkb | jeblair: hmm. I think jenkins is catching that and bottling it up before gearman plugin sees it | 23:33 |
clarkb | jeblair: so it becomes a failed test with no result | 23:33 |
*** Adri2000 has joined #openstack-infra | 23:33 | |
clarkb | there is just not enough data in the return from the job future | 23:33 |
jeblair | clarkb: possibly; but i'm also double checking that either gearman-plugin or java-gearman isn't turning that into work_fail | 23:34 |
*** jhesketh has quit IRC | 23:35 | |
clarkb | jeblair: does gearman plugin break the timeout plugin? there are a few jobs that seem to have run much longer than is allowed | 23:36 |
clarkb | back when git was slow | 23:36 |
*** dina_belova has joined #openstack-infra | 23:36 | |
jeblair | clarkb: yeah, i think you're right; if gearman-plugin gets an exception, it should return work_exception | 23:37 |
clarkb | jeblair: there may be info returned by the future that can be examined | 23:38 |
clarkb | jeblair: you may have to grep through the console log which seems dirty | 23:38 |
clarkb | or treat a failure with no result as a jenkins exception | 23:39 |
jeblair | it seems weird that the result would be null | 23:39 |
clarkb | ya | 23:39 |
*** jhesketh has joined #openstack-infra | 23:39 | |
jeblair | it seems accurate enough; i'm willing to do it, but it also seems tenuous | 23:39 |
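Treating "failed with no result" as a Jenkins exception can be expressed as a small classification step over whatever data the gearman job returns; a sketch, with the result field name assumed for illustration rather than taken from the gearman plugin:

    # Sketch: distinguish a real test failure from a Jenkins-side loss by
    # whether the returned build data carries a result at all.
    def classify(build_data):
        result = build_data.get('result')
        if result is None:
            return 'RETRY'      # no result at all: likely a Jenkins exception
        if result == 'SUCCESS':
            return 'SUCCESS'
        return 'FAILURE'        # a real test failure; do not retry

    assert classify({}) == 'RETRY'
    assert classify({'result': 'FAILURE'}) == 'FAILURE'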
clarkb | to slightly change the subject, I think we should release a new zuul version if last night's bug fix holds up | 23:40 |
clarkb | though that bug was only in unreleased zuul so it may not be very urgent | 23:40 |
*** dina_belova has quit IRC | 23:41 | |
jeblair | i think it's probably time for me to write a mailing list update | 23:41 |
clarkb | ++ | 23:42 |
jeblair | along the lines of 'mostly better' still working on a few things. | 23:42 |
*** shardy is now known as shardy_afk | 23:42 | |
jeblair | and i guess an announcement of git.o.o (not the way i expected it to be announced) | 23:43 |
clarkb | ya | 23:43 |
clarkb | these things happen | 23:43 |
pleia2 | jeblair: including git.o.o in the same post? (I don't mind writing a separate one, I was thinking about blogging about it too) | 23:44 |
*** sdake_ has quit IRC | 23:44 | |
jeblair | i think it actually deserves its own post, so i think i'll mention it, but i think pleia2 should also write an email about it | 23:44 |
anteaya | jeblair: I think there would be many happy people if there was a ml update | 23:44 |
jeblair | i think i should mention it as i describe what we're doing to handle the load | 23:44 |
jeblair | but i also want people to really learn about git.o.o and how cool it is | 23:44 |
clarkb | jeblair: ++ | 23:45 |
jeblair | and that should be its own topic/post | 23:45 |
jeblair | pleia2: how does that sound? | 23:45 |
anteaya | yes, I agree | 23:45 |
pleia2 | jeblair: wfm | 23:45 |
clarkb | I think if you mention it in passing to explain the mitigation of test failures that leaves the door open to give it a proper writeup | 23:45 |
pleia2 | I'll update the ci.o.o/git docs real quick first | 23:45 |
pleia2 | (I'll need clarkb to review) | 23:45 |
clarkb | oh ya I completely neglected to write docs on the haproxy stuff >_> | 23:46 |
*** fbo is now known as fbo_away | 23:46 | |
jeblair | pleia2: cool, so you'll handle the git.o.o post then, at your leisure, and i'll mention it in passing and that you'll be sending a real announcement | 23:46 |
jeblair | clarkb: i haven't written nodepool docs yet either | 23:46 |
pleia2 | clarkb: no worries, I'm on it | 23:46 |
jeblair | speaking of which... | 23:46 |
jeblair | fungi hasn't disappeared yet, has he? | 23:46 |
* clarkb waits for gerritbot to announce new change adding nodepool docs :) | 23:47 | |
jeblair | clarkb: ha | 23:47 |
clarkb | jeblair: it sounded like today was going to be busy for him | 23:47 |
clarkb | and that he would try to be on this evening | 23:47 |
jeblair | ok, but he's not on a boat yet, so he might catch this... | 23:47 |
clarkb | jeblair: correct. boat is tomorrow morning | 23:47 |
jeblair | fungi: for the 'run your own devstack-gate node' thing -- i need to delete all the node launching stuff from d-g.... | 23:47 |
jeblair | fungi: the shell scripts to actually do all the work are fairly well split out now... | 23:48 |
jeblair | fungi: so there are two approaches for migrating that | 23:48 |
clarkb | I am going to run home really quick so that I can do code review on the couch | 23:48 |
clarkb | s/code/docs/ as appropriate | 23:48 |
jeblair | fungi: 1) instruct people on how to run those scripts on a node (sort of a one-off "make this a devstack-gate node" process) | 23:49 |
openstackgerrit | A change was merged to openstack-infra/zuul: Make updateChange actually update the change https://review.openstack.org/43220 | 23:49 |
jeblair | fungi: or 2) how to set up a local nodepool (more complicated, but you can spin up replacement nodes easily) | 23:49 |
jeblair | fungi: (#2 is more or less palatable depending on whether nodepool still works with sqlite in low-volume; that's unknown at this point) | 23:50 |
*** michchap has joined #openstack-infra | 23:52 | |
*** Adri2000 has quit IRC | 23:53 | |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Add node-test job https://review.openstack.org/43381 | 23:53 |
*** rcleere has quit IRC | 23:55 |