mgagne | how are repositories named on the filesystem? with or without .git suffix? | 00:00 |
---|---|---|
clarkb | mgagne: with | 00:00 |
clarkb | jeblair: pleia2 I can fetch that neutron ref now | 00:02 |
clarkb | should I go ahead and add symlinks for all the things? | 00:02 |
clarkb | or should we focus on actual solution now that we know that is sufficient | 00:02 |
*** datsun180b has quit IRC | 00:02 | |
*** reed has quit IRC | 00:03 | |
jeblair | clarkb: i'd add the symlinks for the existing projects | 00:04 |
mgagne | Who is therefore returning a URL without .git in it? | 00:04 |
jeblair | mgagne: exactly, that's the question; is it a difference in the git version? | 00:05 |
mgagne | jeblair: could it be the git client? | 00:05 |
clarkb | well our test scripts are hard coded to use paths without .git | 00:05 |
jeblair | clarkb: yes, but our scripts don't fetch packfiles, git does | 00:06 |
clarkb | so either something is adding it when we talk to review.o.o or our rewrite and aliasmatch stuff is munging it or git version makes a difference | 00:06 |
jeblair | clarkb: so either the git client or git server is doing something unexpected | 00:06 |
jeblair | clarkb: remember, almost no pack files were retrieved from review.o.o | 00:06 |
jeblair | clarkb: i am not certain this is a difference | 00:06 |
mgagne | if it's related to the git client, I believe that both should be supported (w/ and w/o .git) to avoid frustration and issues with the enduser/devs | 00:07 |
jeblair | 2013-08-20 23:57:07.094 | error: Failed connect to git.openstack.org:443; Connection refused while accessing https://git.openstack.org/openstack/tempest/info/refs | 00:08 |
mgagne | Should it be handled by Apache or by the filesystem? I don't know which is best. | 00:08 |
fungi | yeah, pretty sure no amount of filesystem or cgi adjustments are going to solve a connection refusal from apache | 00:12 |
jeblair | fungi: different problem | 00:12 |
fungi | granted | 00:16 |
fungi | but suspect we could be hitting overall connection limits too | 00:17 |
jeblair | fungi: yep | 00:18 |
fungi | git.o.o is acting a lot more hammered than review.o.o was, even though the load average isn't nearly as high | 00:19 |
SpamapS | do we run devstack on a py26 system in the gate? | 00:19 |
SpamapS | or just unit tests? | 00:19 |
fungi | SpamapS: just unit tests | 00:19 |
SpamapS | I think python-novaclient may be uninstallable in py26 | 00:19 |
SpamapS | AttributeError: 'module' object has no attribute '__getstate__' | 00:19 |
*** dkliban has quit IRC | 00:19 | |
SpamapS | File "/usr/local/lib/python2.6/dist-packages/setuptools/sandbox.py", line 58, in run_setup | 00:21 |
SpamapS | hm, that's actually in ye-olde distribute | 00:21 |
jeblair | clarkb, pleia2: think the pack thing may be a tiny bit of a red herring | 00:21 |
jeblair | mgagne: ^ | 00:21 |
mordred | jeblair: oh piddle :) | 00:21 |
fungi | ping rtt to git.o.o is averaging 1600ms for me right now, as opposed to review.o.o which is around 55ms | 00:21 |
jeblair | clarkb, pleia2: i _suspect_ that those files are only retrieved directly by the _dumb_ http client | 00:21 |
SpamapS | ahh have to remove python-pkg-resources | 00:21 |
clarkb | jeblair: interesting | 00:22 |
jeblair | that job fell back on the dumb client because: | 00:22 |
jeblair | 2013-08-20 22:50:00.379 | error: The requested URL returned error: 504 while accessing https://git.openstack.org/openstack/neutron/info/refs | 00:22 |
pleia2 | ah | 00:22 |
jeblair | it thought the smart http server wasn't available | 00:22 |
clarkb | jeblair: so our rewrites are not working properly | 00:22 |
jeblair | clarkb: correct, they're just plain wrong but pretty much never used (my hypothesis) | 00:23 |
clarkb | also writing a script to make these symlinks that is idempotent and not insane is taking too much time | 00:23 |
jeblair | clarkb: i would consider abandoning that and deleting the symlinks at this point | 00:23 |
mordred | SpamapS: you need to pip install -U pip before installing anything via pip currently | 00:23 |
mordred | SpamapS: if you want to be safe | 00:23 |
jeblair | and add a medium-priority todo to fix the rewrites | 00:23 |
SpamapS | mordred: did that, had to apt-get remove python-pkg-resources | 00:24 |
mordred | SpamapS: you will pip install -U setuptools | 00:24 |
mordred | SpamapS: that mainly means that something borked something first | 00:24 |
SpamapS | mordred: and apt-get remove python-setuptools | 00:24 |
mordred | that should not be necessary | 00:24 |
mordred | but if something pip installed something first | 00:24 |
mordred | you'll need to do that to recover | 00:24 |
SpamapS | first two things I did were exactly that, pip install -U pip, and then pip install -U setuptools | 00:24 |
jeblair | fungi: yeah. my interactive shell is very slow too. | 00:24 |
mordred | SpamapS: wow. really? | 00:24 |
mordred | sigh | 00:24 |
mordred | SpamapS: this is on precise? | 00:24 |
mordred | SpamapS: or? | 00:25 |
SpamapS | yeah.. had to apt-get remove setuptools and then re-do pip install -U setuptools to recover :-/ | 00:25 |
SpamapS | mordred: lucid | 00:25 |
mordred | oh. jeez | 00:25 |
SpamapS | mordred: trying to test py26 | 00:25 |
mordred | sorry. I have done zero testing of lucid | 00:25 |
fungi | and yet load average on git.o.o is in the single digits, not >200 like we saw on review.o.o | 00:25 |
mordred | god only knows how broken it is | 00:25 |
clarkb | jeblair: ok | 00:25 |
mordred | SpamapS: we have workarounds for that in devstack, which involve wget-ing things | 00:25 |
clarkb | jeblair: in the mean time a bunch of jobs will fail for random things | 00:26 |
mordred | SpamapS: the situation is pretty messed up | 00:26 |
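
The recovery sequence SpamapS and mordred work out above boils down to removing the distro-packaged setuptools bits and re-bootstrapping from pip. A rough sketch of that sequence on a lucid/py26 box, using the package names mentioned above (an illustration of the conversation, not an official procedure; exact order may vary per system):

```sh
# Sketch of the lucid/py26 recovery discussed above.
sudo apt-get remove python-setuptools python-pkg-resources  # drop the distro copies
sudo pip install -U pip          # get a modern pip first
sudo pip install -U setuptools   # then replace distribute/setuptools via pip
```
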
clarkb | jeblair: do we maybe want to point everything at /cgit for now? | 00:26 |
jeblair | clarkb: no, we need to make git.o.o responsive | 00:26 |
jeblair | moving it around to a different unresponsive thing isn't going to make us happy | 00:26 |
jeblair | clarkb: that timeout could have happened just as easily talking to cgit | 00:27 |
mgagne | what is git.kernel.org using to serve requests over http? | 00:28 |
*** jog0-away is now known as jog0 | 00:28 | |
jeblair | so how about we go ahead and load balance it, even though we don't have a good config, and we can come back and make it sane later? | 00:28 |
jeblair | start throwing hardware at the problem | 00:28 |
mordred | jeblair: ++ | 00:28 |
pleia2 | we're close to a good config | 00:28 |
pleia2 | at least, to limping | 00:29 |
fungi | i thought we had git.o.o in cacti, but i guess not | 00:29 |
pleia2 | https://review.openstack.org/43012 switches us over to service git:// then we bring in clarkb's haproxy patch https://review.openstack.org/#/c/42784/ (will need some edits after my patch) | 00:29 |
jeblair | pleia2: i meant a config that is correctly tuned for the tradeoffs we have chosen (based on those things we talked about in the meeting) | 00:29 |
pleia2 | jeblair: oh, right | 00:30 |
*** nati_ueno has quit IRC | 00:30 | |
jeblair | pleia2: but those are definitely steps in the right direction | 00:30 |
clarkb | jeblair: so I agree that the pack thing isn't the only issue, but until we fix that redirect or have the symlinks every single one of those fetches will fail | 00:30 |
jeblair | clarkb: but they should never happen unless there has already been an error | 00:30 |
jeblair | clarkb: i'm trying to say we've already lost by the time that fetch happens, we need to make sure it never happens | 00:31 |
clarkb | is that what that means? I was clearly focusing way too hard on symlinks of all things | 00:31 |
jeblair | clarkb: yeah, that's what i was trying to say earlier -- the smart http client should never fetch those | 00:32 |
jeblair | clarkb: it only did it because it thought the smart http server wasn't available | 00:32 |
jeblair | clarkb: supporting those urls only means that if the smart http client fails, our jobs will suck even _more_ data from git.o.o using the dumb client | 00:32 |
clarkb | yeah | 00:32 |
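
The failure mode jeblair is describing is the standard git smart-HTTP fallback: the client first requests `GET <repo>/info/refs?service=git-upload-pack`, and only if that fails (here with a 504) does it retry the dumb protocol and fetch `info/refs`, loose objects, and packfiles as plain files, which is where the `.git`-less paths break. A minimal sketch of the kind of Apache routing that keeps requests on the smart path; the paths, the CGI location, and the `.git` normalization are illustrative assumptions, not the actual git.o.o vhost:

```apache
# Illustrative sketch only -- not the real openstack-infra config.
SetEnv GIT_PROJECT_ROOT /var/lib/git     # repos stored on disk as <name>.git
SetEnv GIT_HTTP_EXPORT_ALL

# Send info/refs and the upload/receive-pack endpoints to git-http-backend,
# normalizing URLs with or without the .git suffix to the on-disk name,
# so the smart client never falls back to dumb packfile fetches.
ScriptAliasMatch "^/(.*?)(\.git)?/(info/refs|git-(upload|receive)-pack)$" \
    /usr/libexec/git-core/git-http-backend/$1.git/$3
```
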
clarkb | if we are going to go multinode, I wonder if it is worth investigating using precise for the git-http-backend serving | 00:33 |
openstackgerrit | Jeremy Stanley proposed a change to openstack-infra/config: Start trending git.o.o performance with cacti https://review.openstack.org/43023 | 00:33 |
clarkb | since as fungi pointed out git.o.o is feeling the load a lot more than review.o.o | 00:33 |
* SpamapS considers snapshotting this lucid box for the next time a python2.6 thing is needed.. so much pip.. so little time | 00:34 | |
clarkb | then serve /cgit from centos and everything else from ubuntu nodes | 00:34 |
mordred | jeblair: what about tuning apache to do a much smaller number of active connections and tcp backlog most of them? | 00:34 |
* clarkb goes back to cleaning up symlinks | 00:34 | |
mordred | jeblair: to prevent the overloaded/timeout situation? | 00:34 |
jeblair | the current situation that fungi pointed out is weird. | 00:34 |
jeblair | the load average is low, cpu is mostly idle | 00:35 |
jeblair | i'm wondering if it's hypervisor host system load | 00:35 |
fungi | some other sort of starvation here. perhaps interrupt handling? | 00:35 |
jeblair | or network bottleneck | 00:35 |
fungi | or, yes, something on the host compute node maybe | 00:35 |
SpamapS | I've only ever seen phantom hypervisor load on xen | 00:35 |
mordred | hypervisor host system load sounds like an interesting cause for timeouts in that situation | 00:35 |
SpamapS | kvm has served me well and reported it as "stolen" CPU | 00:36 |
jeblair | SpamapS: i meant other vms starving the hardware | 00:36 |
SpamapS | Yeah, that should be the same thing. | 00:36 |
SpamapS | "CPU time that I should have gotten went somewhere else" | 00:36 |
SpamapS | that is supposed to be "steal%" | 00:36 |
jeblair | SpamapS: ah, well, it says its low, 0.0-0.2; are you saying it's unreliable under xen? | 00:37 |
SpamapS | I have seen it be totally unreliable in xen | 00:38 |
SpamapS | Most famously in the phantom load seen on Ubuntu 10.04 hosts on ec2 | 00:38 |
jeblair | mordred: i think that tuning strategy would be good if we knew how many git operations we could handle at once | 00:38 |
SpamapS | (which has mostly cleared up as they've upgraded their xen) | 00:38 |
mordred | jeblair: indeed. I was going to suggest starting with number of 'cores' | 00:39 |
mordred | jeblair: but that might take too long to chase | 00:39 |
jeblair | clarkb: i like your idea of serving git and cgit separately... because i think we may want to tune them separately | 00:39 |
mordred | jeblair: if we're going to do that ^^ | 00:39 |
jeblair | mordred: that's reasonable with lbaas? | 00:39 |
mordred | jeblair: I guess? OR - how about for now we just spin up haproxy so that we can actually control it | 00:39 |
fungi | in the past when we've been dos'd by our neighbors chewing up resources on the compute node, we've usually observed significant packet loss. in this case we're only seeing very high rtt, which suggests the kernel is being slow about processing the packets | 00:39 |
mordred | jeblair: and later we can engineer it into lbaas | 00:40 |
mordred | just so that our learning curve is lower | 00:40 |
* mordred assumes the people in this room can probably tweak an haproxy machine pretty quickly | 00:40 | |
jeblair | oh, well, i guess we're talking about two https services, so that's trickier | 00:40 |
mordred | oh yeah. good point | 00:41 |
clarkb | symlinks all cleaned up | 00:41 |
mordred | jeblair: three-tier? | 00:41 |
fungi | you'd have to terminate ssl/tls on the haproxy and then do stream munging/rewriting on the plaintext http stream, which would get ugly | 00:41 |
jeblair | i think the thing we can do quickly is spin up more copies of what we have | 00:42 |
mordred | jeblair: haproxy in front of a couple of apache nodes with mod_proxy that do termination that proxy to different git serving machines? | 00:42 |
mordred | jeblair: but yes. I think that's the quickest direct route to try first | 00:42 |
jeblair | so maybe we ought to do that, stick something (haproxy or lbaas) in front of it | 00:42 |
mordred | and we can further optimize by splitting in fancy ways later | 00:42 |
openstackgerrit | A change was merged to openstack-infra/config: Start trending git.o.o performance with cacti https://review.openstack.org/43023 | 00:42 |
jeblair | and then come back for another pass... yeah ^ | 00:42 |
mordred | jeblair: also - is it worth spinning up a copy on centos, and one on precise just to see if the backends perform differently? or too much work due to how our cgit module is written? | 00:43 |
jeblair | mordred: we have to figure out how to install cgit on precise 1st | 00:43 |
pleia2 | mordred: there is no cgit package for ubuntu (which is why we went with centos) | 00:43 |
mordred | jeblair: yeah. good point. later | 00:43 |
jeblair | mordred: it's _definitely_ worth it, but later, i think. | 00:43 |
mordred | jeblair: last stupid question from me - given the xen theory from earlier - is it worth trying to spin up a centos node at hpcloud to see if kvm gives us more love? | 00:44 |
jeblair | okay, so working plan so far: spin up git01 and git02.o.o, and front them with (haproxy on git.o.o) or (lbaas) ? | 00:44 |
mordred | these boxes don't need email really | 00:44 |
mordred | (for now) | 00:44 |
jeblair | mordred: i don't think it's a xen problem as much as a bad neighbor problem | 00:44 |
mordred | nod | 00:44 |
pleia2 | mordred: well, might be worthwhile just so we have one on rackspace and one on hpcloud | 00:44 |
*** ^d has joined #openstack-infra | 00:44 | |
*** ^d has joined #openstack-infra | 00:44 | |
jeblair | mordred: i hear hpcloud has a particularly bad tenant. i'd hate to be stuck near us. | 00:44 |
mordred | jeblair: hahaha | 00:45 |
clarkb | jeblair: hahahahah | 00:45 |
pleia2 | hah | 00:45 |
mordred | jeblair: working plan sounds good | 00:45 |
fungi | we are the bad neighbor | 00:45 |
fungi | nice | 00:45 |
mordred | we need to replicate to both of them from gerrit, yeah? | 00:45 |
*** woodspa has joined #openstack-infra | 00:45 | |
mordred | so we're going to have to bounce gerrit | 00:45 |
clarkb | like a bad neighbor openstack infra is there | 00:45 |
clarkb | the jingle doesn't quite work but I laughed inside | 00:45 |
jeblair | mordred: i'm still worried about hpcloud deleting nodes. you got an email that they deleted one the other day, right? let's put git03 in hpcloud if we want to try that. | 00:45 |
jeblair | mordred: yep | 00:45 |
jeblair | replication | 00:45 |
mordred | jeblair: I am too - but if we're actually going to do elastic throwaway nodes | 00:46 |
jeblair | so which do we want to do, our own haproxy on git, or lbaas? | 00:46 |
openstackgerrit | Joshua Hesketh proposed a change to openstack-infra/zuul: Move gerrit specific result actions under reporter https://review.openstack.org/42644 | 00:46 |
openstackgerrit | Joshua Hesketh proposed a change to openstack-infra/zuul: Add support for emailing results via SMTP https://review.openstack.org/42645 | 00:46 |
openstackgerrit | Joshua Hesketh proposed a change to openstack-infra/zuul: Separate reporters from triggers https://review.openstack.org/42643 | 00:46 |
* Shrews sees lots of familiar words being thrown around. | 00:46 | |
mordred | jeblair: I think haproxy on git is the path of least resistance right now | 00:46 |
mordred | although I think if it helps, we should definitely re-work to use lbaas | 00:47 |
clarkb | ++ | 00:47 |
fungi | rework to use Shrews | 00:47 |
mordred | jeblair: if we do lbaas right now, we'll have to do a dns swap and whatnot | 00:47 |
clarkb | fungi: make Shrews do it FTFY | 00:47 |
jeblair | k, one more question -- should we spin up 30g nodes, or try shrinking them a bit? | 00:47 |
* Shrews doesn't work. Try again. | 00:47 | |
jeblair | (i lean toward sticking with 30g until we benchmark) | 00:47 |
mordred | yes. I agree | 00:47 |
mordred | 30g | 00:47 |
clarkb | jeblair: ya, and we can go small easily once the lb is happy | 00:48 |
mordred | how much would it kill the cloud for us to snapshot git.o.o and then spin up git1 and git2 using that? | 00:48 |
jeblair | we are using almost none of the memory, but we don't really understand the cpu or network requirements yet | 00:48 |
mordred | (so that we don't have to do initial clones right now) | 00:48 |
fungi | the plan seems sound | 00:48 |
jeblair | mordred: faster to spin up from scratch; gerrit is lightly loaded, the push won't be bad. | 00:48 |
mordred | ok | 00:48 |
clarkb | now that we have a plan. Will everyone hate me if I duck out to bother fungi while he is on this side of the continent? | 00:49 |
jeblair | clarkb: can you stick around for a sec? | 00:49 |
clarkb | sure | 00:50 |
jeblair | in order to get there, we need some puppet work.... | 00:50 |
clarkb | ah | 00:50 |
jeblair | we need git\d+.o.o defined to be a cgit/git server | 00:50 |
SpamapS | dumb-but-performant-lbaas-->two modest layer 7 routing boxes-->appropriate target pools is not a terrible meme. | 00:50 |
SpamapS | if lbaas does ssl, win, otherwise let the layer 7 boxes do it. | 00:50 |
jeblair | SpamapS: yeah, we may come back and do l7 in the next pass | 00:50 |
jeblair | and we need git.o.o defined to be an haproxy server pointing to them | 00:51 |
SpamapS | Oh I thought you were scaling different urls differently and that was why this was complicated? | 00:51 |
SpamapS | also.. has git->swift come up? | 00:51 |
jeblair | SpamapS: that was the idea, but we're punting because it's complicated | 00:51 |
jeblair | SpamapS: you aren't helping | 00:51 |
jeblair | :) | 00:51 |
SpamapS | :) | 00:51 |
*** dkliban has joined #openstack-infra | 00:52 | |
jeblair | does that puppet description make sense? | 00:52 |
jeblair | and do we want the service and haproxy changes on each of the worker nodes as well? | 00:52 |
fungi | so just splatter https connections round-robin to the pool members? | 00:52 |
mgagne | Could it be apache not being the right tool for such a use case? And I don't believe an out-of-the-box apache config is appropriate for such a setup. | 00:53 |
jeblair | fungi: i think that's the idea, or whatever haproxy does (maybe it counts sockets?) | 00:53 |
mgagne | I could be mistaken | 00:53 |
jeblair | mgagne: we know it is not correct, someone needs to actually benchmark it and get good numbers | 00:53 |
jeblair | mgagne: and we're planning on using haproxy to make the git server behave better too | 00:53 |
clarkb | jeblair: as long as the node def allows us to have nodes with digits that don't haproxy and the one without digits to haproxy we should be good | 00:54 |
*** lbragstad has joined #openstack-infra | 00:54 | |
fungi | yeah, seems like two node defs to me | 00:54 |
clarkb | fungi: yurp | 00:54 |
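
A sketch of the two node definitions being agreed on here, using the `cgit` class and the `balance_git` parameter mentioned later in the thread; the regex form and parameter wiring are illustrative guesses, not the merged change:

```puppet
# Hypothetical sketch only.
# Numbered backends serve cgit/git but run no load balancer themselves.
node /^git\d+\.openstack\.org$/ {
  class { 'cgit':
    balance_git => false,
  }
}

# The bare name runs the same serving stack plus the haproxy front end.
node 'git.openstack.org' {
  class { 'cgit':
    balance_git => true,
  }
}
```
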
clarkb | jeblair: should I start hacking something up? | 00:55 |
clarkb | or are you ahead of me and looking for reviews? | 00:55 |
jeblair | clarkb: no, i think we're at the point of 'looking for volunteers' | 00:55 |
mgagne | jeblair: I do understand the benefit of haproxy. I would however reduce keepalive timeout of apache and increase MaxClients if the server can handle it. Serving static files shouldn't put a server on its knees like that. | 00:56 |
clarkb | jeblair: ok I can start writing the change | 00:56 |
jswarren | Looks like there are problems with grenade. | 00:56 |
mordred | mgagne: well, right now we don't know what the server can handle | 00:56 |
mgagne | jeblair: but these are only blind suggestions as I don't have much info on what's really going on on the server | 00:56 |
fungi | mgagne: well, a lot of this is cgi backend, not flat file serving | 00:57 |
jeblair | mgagne: hopefully we'll have performance monitoring soon | 00:57 |
clarkb | jeblair: can you or someone else diagram what they want it to look like as we have talked about a bunch of different layouts and I am not 100% sure of what we settled on | 00:57 |
mordred | mgagne: and it's actually not about serving static files - it's the not-static that are a problem | 00:57 |
jeblair | fungi: what's your schedule like, are you working at all this evening? | 00:57 |
mordred | jswarren: we're working through some things. I do not know if that's related | 00:57 |
jeblair | (i'm getting pretty close to burnout point again myself, so will probably have to pick up tomorrow) | 00:57 |
mordred | jswarren: do you have a link | 00:57 |
*** ryanpetrello has joined #openstack-infra | 00:57 | |
fungi | jeblair: i can come back and work after dinner. christine is about to bite my head off if i don't take her out to dinner and sight seeing. she's getting bored of sitting in the hotel room | 00:57 |
jswarren | http://logs.openstack.org/98/31898/49/check/gate-grenade-devstack-vm/f68a47e/console.html | 00:58 |
jswarren | Seen a couple like that. | 00:58 |
mgagne | fungi: which CGI processes ? cgit or git-http-backend? | 00:58 |
jeblair | fungi: no pressure | 00:58 |
fungi | mgagne: git-http-backend | 00:58 |
mordred | jswarren: yes | 00:58 |
mordred | http://logs.openstack.org/98/31898/49/check/gate-grenade-devstack-vm/f68a47e/logs/devstack-gate-setup-workspace-new.txt | 00:58 |
fungi | mgagne: more specifically, hundreds of git-upload-pack | 00:58 |
fungi | i think | 00:58 |
mordred | oh. wait | 00:59 |
mgagne | fungi: could it be URL without .git being served by git-http-backend instead of hitting the filesystem? | 00:59 |
mordred | jeblair: | 00:59 |
*** zul has joined #openstack-infra | 00:59 | |
mordred | fatal: Couldn't find remote ref refs/zuul/master/Z10fb39f7b5984e1283445238278973f5 | 00:59 |
mordred | Unexpected end of command stream | 00:59 |
mordred | jeblair: is zuul also having problems? or is that a consequence of git.o.o having issues? | 00:59 |
jeblair | clarkb, fungi: https://etherpad.openstack.org/git-lb | 00:59 |
fungi | mgagne: it could be just about anything right now. with the server misbehaving potentially causing fallback behaviors in the clients it's tough to know what the real problem is and what the secondary effects are | 01:00 |
jeblair | diagram ^ | 01:00 |
jeblair | mordred: was that an error? | 01:00 |
mordred | jeblair: yes. in http://logs.openstack.org/98/31898/49/check/gate-grenade-devstack-vm/f68a47e/logs/devstack-gate-setup-workspace-new.txt | 01:00 |
jeblair | mordred: i think that's a perfectly normal error | 01:00 |
clarkb | jeblair: where does ssl terminate in that diagram? | 01:00 |
mordred | ok. | 01:00 |
mordred | good! | 01:00 |
jeblair | clarkb: my understanding of haproxy is that it proxies tcp connections, | 01:01 |
jeblair | clarkb: so i think ssl terminates at the workers | 01:01 |
fungi | clarkb: my assumption is ssl terminates on the individual nodes and we do at best layer 4 redirection | 01:01 |
mordred | yes to ^^ | 01:01 |
mordred | so git1 and git2 apache should each think that they are git.o.o | 01:01 |
clarkb | jeblair: fungi ok. I think we can have it terminate in haproxy but that makes it more complicated /me goes with terminating on the individual nodes | 01:01 |
mordred | which means that the apache module is likely going to need to change, or the puppet | 01:01 |
Shrews | haproxy does ssl pass thru, but the dev version is supposed to support ssl termination | 01:01 |
Shrews | just fyi | 01:01 |
jeblair | sounds like that's the right choice for now then. :) | 01:02 |
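
In haproxy terms, "terminating on the workers" just means the front end proxies port 443 in TCP mode and never touches the TLS stream. A minimal sketch, with the backend names and addresses made up:

```
# Sketch: layer-4 pass-through; certificates live only on the workers.
frontend git-https
    bind *:443
    mode tcp
    default_backend git-https-workers

backend git-https-workers
    mode tcp
    balance roundrobin
    server git01 git01.openstack.org:443 check
    server git02 git02.openstack.org:443 check
```
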
mordred | otherwise it's going to spin up apache on git1 as git1.o.o and the vhost info will be wrong - unless I'm wrong? | 01:02 |
jeblair | mordred: i believe that just means "there is no zuul ref for this project" | 01:02 |
mgagne | jeblair: haproxy should support both mode | 01:03 |
fungi | mordred: the apache module is perfectly capable of serving sites with different names than the node's name | 01:03 |
mordred | jeblair: great! I was worried that there was all of a sudden another issue | 01:03 |
mordred | jeblair: quick stupid suggestion - what if we stopped doing the git remote update | 01:03 |
fungi | mordred: unless i misunderstood the question | 01:04 |
mordred | jeblair: and instead just let it do the git fetch from zuul? | 01:04 |
mordred | which is a much more specific request for information | 01:04 |
jeblair | mordred: increase the load on zuul? rather not. :) | 01:04 |
mordred | jeblair: well, I'm just saying - it's already doing the fetch from zuul, and we're already starting from repos that are pretty close | 01:05 |
jeblair | mordred: i'm not sure we'd end up with tags, etc... | 01:05 |
mordred | ah. k. there it is. tags for sure | 01:05 |
jeblair | mordred: whatever it gets from git.o.o now it would have to get from zuul | 01:05 |
mordred | yeah. k. let's call it another thing to think about later when we have more time to think | 01:05 |
jeblair | mordred: yes; that would need some testing. | 01:06 |
mordred | as in - rethink the flow of the states of the refs in the repos and see if we can avoid the blanket 'git remote update' | 01:06 |
*** weshay has joined #openstack-infra | 01:06 | |
* mordred stops brainstorming | 01:06 | |
*** rfolco has quit IRC | 01:06 | |
mgagne | are exported resources supported by puppet on infra? | 01:06 |
mgagne | it requires storeconfigs | 01:07 |
mordred | mgagne: we've never used them | 01:07 |
mordred | mgagne: all of our stuff currently works via puppet apply as well as puppet agent | 01:07 |
mgagne | mordred: we will forget it for now I guess =) | 01:07 |
clarkb | mgagne: they are not | 01:08 |
clarkb | exported resources are kind of annoying to work with iirc. | 01:08 |
clarkb | because you need mutliple passes | 01:08 |
mordred | once we get to that level of complexity, I think we'll be happier with heat driving puppet and handing it the needed metadata | 01:08 |
mgagne | clarkb: true when bootstrapping an infra, order becomes important | 01:09 |
SpamapS | hm, is there a recheck bug for https://review.openstack.org/#/c/42995/ .. looks like just timeouts during git clones or something | 01:09 |
SpamapS | (is that what is being discussed right here right now?;) | 01:09 |
jeblair | clarkb, mordred: i can't run 'nova list' for any of the hpcloud azs | 01:09 |
mordred | SpamapS: yes | 01:10 |
mordred | jeblair: are you getting the 400 error? | 01:10 |
jeblair | yes | 01:10 |
mordred | that's what I was getting ealier | 01:10 |
*** anteaya has quit IRC | 01:11 | |
mordred | jeblair: I'm asking hp people | 01:11 |
mordred | "We have a couple of P1 incidents still ongoing.  We're on it." | 01:12 |
mordred | man, when it rains, it pours | 01:12 |
jeblair | mordred: ok, thanks. nodepool isn't going to be able to delete all those nodes until that's fixed. | 01:12 |
mordred | ossum | 01:12 |
jeblair | mordred: which means it is constrained in what it can spin up | 01:12 |
fungi | i get a list out of openstackjenkins2-project1 on az-1.region-a.geo-1 | 01:13 |
fungi | also out of az2 and az3 | 01:14 |
jeblair | weird, i do not. | 01:14 |
jeblair | this is as root on ci-puppetmaster | 01:15 |
fungi | be sure you're sourcing the openstackjenkins2 creds and not the old openstackjenkins creds? | 01:15 |
jeblair | fungi: yep; i'm in a terminal i've been using for days now | 01:15 |
jeblair | (screen session) | 01:15 |
*** mriedem has quit IRC | 01:16 | |
fungi | huh, yep | 01:16 |
fungi | i get the 400 from the puppet master | 01:16 |
*** tian has joined #openstack-infra | 01:17 | |
mordred | fungi: what's the network range of the machine you do not get 400 from | 01:17 |
mordred | ? | 01:17 |
mordred | and what's the puppetmaster IP? | 01:17 |
fungi | mordred: working from 66.26.81.51 and failing from 198.101.208.204 | 01:17 |
mordred | fungi: stellar | 01:18 |
fungi | oh, though on the working system i left out some of the params we define on the puppetmaster one. let me see if it's one of those | 01:19 |
mgagne | untested haproxy puppet manifest: http://paste.openstack.org/show/44704/ | 01:19 |
jeblair | a bunch of jobs are stuck trying to clone or update from earlier (about 1.25 hours ago) | 01:19 |
jeblair | i'm aborting them | 01:19 |
fungi | nope, using precisely the same creds on the puppetmaster as on my working system, i get a 400 error | 01:20 |
fungi | how do you get novaclient to tell you what version it is? | 01:21 |
fungi | my working system is running 2.14.1 from a virtualenv | 01:21 |
fungi | the puppet master is running $something_older i guess | 01:21 |
mordred | fungi: it works on my laptop using those creds | 01:22 |
fungi | so might be something specific to the way the api calls are being made by newer vs older novaclient | 01:22 |
clarkb | I will have a first draft of a change shortly | 01:22 |
mordred | wait. those were old creds. lemme try new ones | 01:22 |
mordred | yes. openstackjenkins2 works with the creds from puppetmaster on my laptop | 01:23 |
mordred | mordred@camelot:~/src/openstack-infra/gear$ nova --version | 01:24 |
mordred | 2.13.0.108 | 01:24 |
*** ryanpetrello has quit IRC | 01:24 | |
jeblair | i think the devstack jobs are cloning repos | 01:24 |
fungi | rather than using the cached copies? | 01:24 |
mordred | jeblair: full clones? | 01:24 |
jeblair | i think so | 01:25 |
fungi | that would explain the sudden explosion in git load | 01:25 |
mordred | wow. well, that would explain the amount of traffic | 01:25 |
jeblair | i will work on that after dinner, and build new images if needed. | 01:26 |
mordred | jeblair: I'll look at that too | 01:28 |
mordred | jeblair: btw - nova on ci-puppetmaster is 2012.1 | 01:29 |
mordred | so _very_ old | 01:29 |
mordred | and it was working earlier | 01:29 |
mordred | then got flaky | 01:29 |
mordred | now is dead | 01:29 |
mordred | so I'm asking if they did any deploys today | 01:29 |
mordred | because they may have broken compat with 2012.1 novaclient | 01:29 |
*** gyee has quit IRC | 01:30 | |
fungi | that would be so awesome^Wunfortunate | 01:30 |
jeblair | mordred: ERROR: HTTPSConnectionPool(host='region-a.geo-1.identity.hpcloudsvc.com', port=35357): Max retries exceeded with url: /v2.0/tokens (Caused by <class 'socket.gaierror'>: [Errno -3] Temporary failure in name resolution) | 01:31 |
jeblair | mordred: i ran that with a newer novaclient on the same system | 01:31 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 01:31 |
fungi | huh, that's freaky | 01:32 |
mordred | jeblair, fungi: host region-a.geo-1.identity.hpcloudsvc.com worked, but took a _while_ | 01:32 |
jeblair | mordred: yeah, i got a partial timeout trying that too | 01:33 |
* fungi ducks out to dinner but will check back in later | 01:33 | |
clarkb | assuming 42784 doesn't have any syntax errors or typos I actually expect that to work | 01:33 |
clarkb | it will only load balance across a single node of localhost right now | 01:33 |
*** zul has quit IRC | 01:36 | |
*** jjmb has joined #openstack-infra | 01:38 | |
clarkb | mordred: whatever happened to IAD? | 01:39 |
mordred | clarkb: I don't believe we did anything with it yet | 01:39 |
mordred | clarkb: I mean, I'm not sure that patch even landed | 01:39 |
jeblair | mordred: it did not land | 01:39 |
jeblair | i'm not running puppet on nodepool because it's too touch and go | 01:40 |
mordred | ++ | 01:40 |
mordred | jeblair, clarkb: troy thinks IAD may be faster - worth spinning up git1 and git2 in IAD? (also, probably fewer neighbors right now) | 01:40 |
jeblair | mordred: we'd be pushing updates across data centers | 01:41 |
mordred | hrm. good point | 01:41 |
jeblair | mordred: (not to mention pulling from them) | 01:41 |
lifeless | IAD? | 01:41 |
lifeless | Is that like the younger version of an IED? | 01:41 |
mordred | IAD is the airport code for the Washington Dulles airport | 01:42 |
mordred | lifeless: and is a new not-quite-rolled out region in rax cloud | 01:42 |
lifeless | ah | 01:42 |
mordred | lifeless: I don't know if we've mentioned before, but all of our important servers run in rax, because hp is too ephemeral and also blocks email ports | 01:43 |
lifeless | the email thing I knew | 01:43 |
lifeless | I didn't know about the ephemeral aspect; do you mean flaky? | 01:43 |
mordred | they also have not taken our feedback about how this makes them not suitable for our usecase to heart | 01:43 |
mordred | yes | 01:43 |
lifeless | is there a trouble ticket open on it? | 01:43 |
mordred | nodes get deleted from time to time | 01:43 |
lifeless | That seems like something we should do. | 01:44 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 01:44 |
mgagne | too fast, can't comment =( | 01:44 |
clarkb | I believe that patchset will pass tests and it has had some additional cleanup done to it | 01:44 |
clarkb | mgagne: you can comment on the older patchset | 01:44 |
clarkb | mgagne: I will look for your comments there | 01:44 |
jeblair | mordred: i believe my new scripts put all the git repos in ~ubuntu | 01:44 |
mordred | jeblair: oh poop. that's not where devstack looks for them | 01:45 |
jeblair | mordred: it's not the usual place, no. | 01:45 |
mgagne | sup with 29418 and 19418 ? | 01:46 |
clarkb | mgagne: 29418 is where the actual git-daemon will listen. Then each is fronted by an haproxy to do queueing that haproxy listens on 19418. | 01:47 |
clarkb | mgagne: all so that 9418 is free on git.o.o for the world. It's a bit ugly, yes | 01:47 |
clarkb | but I figure the haproxy at the front of everything shouldn't worry about queueing | 01:48 |
clarkb | I could be completely wrong | 01:48 |
mgagne | clarkb: I understand now, I was wondering if it was legitimate or you were typing with boxing gloves on | 01:49 |
clarkb | mgagne: gotcha, thank you for checking | 01:49 |
mgagne | clarkb: 4443? | 01:51 |
clarkb | mgagne: again to not conflict with 443 on the frontend haproxy, because the frontend haproxy is sharing space with apache | 01:52 |
mordred | jeblair: are you respinning/fixing? or would you like me to so you can get dinner? I'm happy to squeeze in being mildly helpful before going away | 01:52 |
clarkb | oh I missed the gerrit replication stuff /me adds that | 01:52 |
jeblair | mordred: i have a local patch i'm about to let it test while eating, so i think i got it. | 01:52 |
mordred | jeblair: ok | 01:53 |
jeblair | mordred: probably applying your expertise to reviewing the haproxy thing would be helpful | 01:53 |
mgagne | clarkb: my mistake, sorry | 01:53 |
*** ftcjeff has joined #openstack-infra | 01:53 | |
lifeless | jeblair: url? [I have haproxy knowledge, for my sins] | 01:54 |
lifeless | clarkb: one of the most useful things haproxy can do for maintaining consistent response time is to cap the concurrent backend work that is being permitted | 01:54 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 01:54 |
lifeless | clarkb: so it totally should be managing the queue | 01:54 |
clarkb | lifeless: yeah that is what we are using it for | 01:54 |
lifeless | clarkb: then 13:48 < clarkb> but I figure the haproxy at the front of everything shouldn't worry about queueing | 01:55 |
lifeless | clarkb: has me confused | 01:55 |
clarkb | lifeless: just not the frontend haproxy. We have three layers for git-daemon. The middle haproxy worries about the queue | 01:55 |
clarkb | lifeless: haproxy 9418 -> haproxy 19418 -> git-daemon 29418 | 01:55 |
lifeless | clarkb: cross-meshed across two machines ? | 01:55 |
clarkb | lifeless: https://review.openstack.org/#/c/42784/ | 01:55 |
clarkb | lifeless: no I am not worrying about the ha part of ha proxy right now | 01:56 |
clarkb | lifeless: https://review.openstack.org/#/c/42784/6/modules/cgit/manifests/init.pp has the most interesting bits in it | 01:56 |
lifeless | clarkb: ok, so whats the front end haproxy for ? | 01:56 |
clarkb | lifeless: load balancing | 01:56 |
lifeless | clarkb: what are the middle ones for then ? | 01:57 |
clarkb | lifeless: queueing | 01:57 |
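
Spelled out, the chain clarkb describes for git:// traffic is a front-end balancer on the public port plus a small queueing proxy sitting next to each git-daemon. The port numbers are the ones from the review above; the host names and the maxconn value are illustrative:

```
# On git.o.o: front-end load balancing on the public git port.
listen git-daemon-lb
    bind *:9418
    mode tcp
    balance roundrobin
    server git01 git01.openstack.org:19418 check
    server git02 git02.openstack.org:19418 check

# On each worker: a local haproxy that queues connections so git-daemon
# (bound to 29418) only ever sees a bounded number of them at once.
listen git-daemon-queue
    bind *:19418
    mode tcp
    server local-git 127.0.0.1:29418 maxconn 32
```
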
openstackgerrit | Kui Shi proposed a change to openstack-dev/hacking: Improve H202 to cover more cases https://review.openstack.org/43029 | 01:57 |
lifeless | clarkb: that doesn't make sense to me | 01:57 |
clarkb | lifeless: git-daemon needs queueing otherwise it just goes nuts. A simple haproxy -> gitdaemon gives us that | 01:58 |
*** ryanpetrello has joined #openstack-infra | 01:58 | |
lifeless | clarkb: ok; so why isn't that the front haproxy ? | 01:58 |
clarkb | lifeless: reason #1 is to make it easier to consume lbaas | 01:58 |
clarkb | the thing that the simple haproxy in front of gitdaemon is doing is something that our lbaas providers cannot do | 01:58 |
lifeless | the lbaas apis don't expose the full capabilities of haproxy like queue depth limits etc? | 01:59 |
clarkb | but everything in the frontend haproxy is something that could be replaced with lbaas | 01:59 |
clarkb | lifeless: they do not | 01:59 |
lifeless | sadface | 01:59 |
clarkb | you can set per host throttles | 01:59 |
clarkb | that is it | 01:59 |
*** lbragstad has quit IRC | 02:00 | |
lifeless | clarkb: so frankly, I wouldn't use lbaas then; you want queuing handled at the front end, and ha in the middle tier | 02:00 |
*** wenlock has joined #openstack-infra | 02:00 | |
lifeless | clarkb: but - this is your teams call; I'm just coming from my running-busy-site-with-haproxy-squid-etc-etc background | 02:00 |
mordred | lifeless: the main thing we're trying to get with this is just _some_ headroom without reengineering the whole thing yet | 02:01 |
mordred | we'd like to do a better/deeper re-architecture | 02:01 |
clarkb | lifeless: thanks, it is good to know. And yes we do plan on actually testing and engineering this stuff | 02:02 |
clarkb | but right now we need a thing that works | 02:02 |
mordred | but we need to actually analyze what's going on and what our capacity is etc - get real numbers/baselines | 02:02 |
lifeless | clarkb: whats deployed right now? | 02:02 |
mordred | yah. what he said | 02:02 |
lifeless | clarkb: all three layers? | 02:02 |
mordred | lifeless: a single apache server serving git | 02:02 |
lifeless | ok | 02:02 |
clarkb | lifeless: and xinetd in front of git-daemon | 02:02 |
clarkb | which is bonghits | 02:03 |
lifeless | so two layers of haproxy will work, but if you want to keep it simpler - which I encourage - I'd just deploy a single haproxy frontend | 02:03 |
SpamapS | simple is for the weak | 02:03 |
clarkb | I am going to manually install puppet-haproxy on the puppet master so that we can use dev envs with this change | 02:03 |
lifeless | and ignore lbaas for now, because what you want right now is breathing room. | 02:03 |
mordred | clarkb: ++ | 02:03 |
mordred | lifeless: yes. we are ignoring lbaas for now for sure | 02:04 |
SpamapS | http://terrorobe.soup.io/post/132401460/Downtime-is-sexy-Josh-Berkus-of-PostgreSQL | 02:04 |
SpamapS | :) | 02:04 |
clarkb | mordred: of course if I can't ssh to that server I might not install puppetlabs-haproxy | 02:04 |
clarkb | mordred: are you able to get in? | 02:04 |
mordred | ci-puppetmaster? | 02:05 |
clarkb | oh now it accepts my connection | 02:05 |
clarkb | mordred: yeah | 02:05 |
*** zul has joined #openstack-infra | 02:05 | |
mordred | clarkb: # TODO add additional git servers here. | 02:05 |
clarkb | mordred: you like that? | 02:06 |
*** markmcclain has quit IRC | 02:06 | |
mordred | clarkb: so, if I'm reading this right... | 02:06 |
clarkb | mordred: I think this may be a case of getting everything going on git.o.o first. Then building the new hosts and kicking everything | 02:06 |
lifeless | clarkb: looking at this I really think one haproxy is better | 02:06 |
mordred | clarkb: yes. so deploy the haproxy on git.o.o that haproxies localhost | 02:06 |
mordred | clarkb: right? | 02:07 |
lifeless | clarkb: your configuration could give you terrible latency as it stands | 02:07 |
clarkb | mordred: yup | 02:07 |
lifeless | clarkb: in overload situations | 02:07 |
mordred | and then add the additional git servers to it | 02:07 |
mordred | lifeless, clarkb: lifeless suggestion should be easy enough to test- set balance_git to false on git1 and git2 | 02:07 |
clarkb | lifeless: it could. the git-daemon stuff won't actually be heavily used immediately so we can work it out | 02:08 |
clarkb | lifeless: the http(s) stuff is the immediate concern | 02:08 |
clarkb | ci-puppetmaster seems to have network trouble too | 02:08 |
mordred | yeah. | 02:08 |
clarkb | I can't git fetch my change | 02:08 |
mordred | clarkb: check load | 02:08 |
clarkb | mordred: it's 1.5-ish | 02:09 |
mordred | clarkb: is salt running again? | 02:09 |
mordred | it should be 0 | 02:09 |
clarkb | hmm it is salt master again. I am going to kill that thing with fire | 02:09 |
mordred | yup. salt-master | 02:09 |
mordred | I believe puppet is going to re-launch him for us | 02:09 |
clarkb | heh all better now | 02:09 |
clarkb | mordred: ugh | 02:09 |
mordred | clarkb: might be worth disabling puppet agent on puppetmaster for a sec | 02:09 |
clarkb | mordred: ok I will do that. mordred do you want to write a puppet change to disable it? | 02:10 |
mordred | yes | 02:10 |
mordred | on it | 02:10 |
clarkb | *to disable salt master | 02:10 |
lifeless | clarkb: nearly finished | 02:12 |
openstackgerrit | Monty Taylor proposed a change to openstack-infra/config: Disable salt master and minions globally https://review.openstack.org/43030 | 02:12 |
mordred | clarkb: I hit it with a stick | 02:13 |
mordred | clarkb: our salt class wasn't really written with disabling in mind | 02:13 |
mordred | and I didn't want to run the risk of deleting the key info | 02:13 |
lifeless | clarkb: ok, reviewed. | 02:13 |
clarkb | I have stopped puppet on git.o.o as well. I am going to run puppet agent --test --environment development --noop there | 02:13 |
clarkb | lifeless: thank you for looking | 02:14 |
lifeless | clarkb: tl;dr - one haproxy, set a backlog of 200 or so, make sure you have maxconn and maxqueue set for each backend | 02:14 |
lifeless | clarkb: the backlog affects when clients get an error rather than a long pause during overload; the maxconn prevents overloading a backend, and the maxqueue is about signaling overload and errors early | 02:15 |
clarkb | lifeless: so maxqueue is different than the conn backlog? | 02:16 |
clarkb | lifeless: also, for whatever it is worth we seem to be very bursty eg after a gate reset | 02:16 |
mordred | yeah, I think we're quite ok with things sitting in backlog wait for a while | 02:16 |
clarkb | lifeless: so having a longer backlog where things wait their turn is better than failing a bunch of tests | 02:16 |
clarkb | lifeless: or at least that was the theory | 02:17 |
lifeless | clarkb: yes, they are different. | 02:17 |
lifeless | clarkb: so I suggest get it up and working and then tune the numbers up | 02:17 |
clarkb | that is the plan | 02:17 |
lifeless | clarkb: backlog holds things in SYN without SYN-ACK | 02:17 |
clarkb | does maxqueue hold things after a handshake? | 02:18 |
lifeless | yes | 02:18 |
lifeless | there is a TCP timeout on backlog | 02:19 |
lifeless | so you really don't want it too long | 02:19 |
lifeless | let me dig that up | 02:19 |
lifeless | http://www.ietf.org/mail-archive/web/tcpm/current/msg07472.html | 02:19 |
lifeless | 0,3 etc seconds | 02:19 |
lifeless | and folk are talking about reducing it | 02:19 |
clarkb | lifeless: I don't see a maxqueue in the keyword list at http://code.google.com/p/haproxy-docs/wiki/balance | 02:20 |
lifeless | you really can't sanely avoid errors by making the backlog high | 02:20 |
lifeless | clarkb: huh, ignore the wiki, useless. | 02:20 |
lifeless | clarkb: http://haproxy.1wt.eu/download/1.3/doc/configuration.txt | 02:20 |
lifeless | maxqueue <maxqueue> | 02:20 |
lifeless | The "maxqueue" parameter specifies the maximal number of connections which | 02:20 |
lifeless | will wait in the queue for this server. If this limit is reached, next | 02:20 |
lifeless | ... | 02:20 |
clarkb | lifeless: https://github.com/puppetlabs/puppetlabs-haproxy/blob/0.3.0/manifests/params.pp#L10-L34 are the default that we would use if we don't explicitly set them | 02:20 |
*** cwj has left #openstack-infra | 02:20 | |
clarkb | not sure why there are two different maxconn values | 02:21 |
lifeless | clarkb: one is on the server, one is on frontend | 02:21 |
lifeless | clarkb: they are wholly different and its terrible it has the same name | 02:21 |
clarkb | 8k is server wide and 4k is frontend specific? | 02:22 |
lifeless | clarkb: not sure about the puppet mapping; sorry - that maxconn mentionI made was about the global thing vs server backend limits | 02:23 |
lifeless | those defaults look non-terrible to me. | 02:23 |
lifeless | clarkb: anyhow, backlog has to be less than 3s - (max RTT/2) to avoid retransmits of SYN | 02:24 |
lifeless | clarkb: which would just add overhead. | 02:24 |
lifeless | clarkb: so yeah, way lower than you have it. | 02:24 |
lifeless | clarkb: use the queue timeout value and maxqueue to control how long something can be queued, and how many things can be queued for a server. | 02:25 |
lifeless | clarkb: HTH, I need to run for a bit; ping here and I will happily review again - or if you can get me a rendered haproxy config I'm very happy climbing through those | 02:25 |
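
Translated into haproxy configuration, lifeless's suggestions map to a handful of directives. The numbers below are placeholders showing where each knob lives, not tuned values (as noted above, real tuning needs benchmarks):

```
listen git-daemon-lb
    bind *:9418
    mode tcp
    backlog 200          # pending SYNs; beyond this, clients fail fast
    timeout queue 30s    # how long a connection may wait for a free slot
    balance roundrobin
    # maxconn caps concurrent work per backend; maxqueue caps how many
    # connections may wait specifically for that backend.
    server git01 git01.openstack.org:19418 check maxconn 32 maxqueue 200
    server git02 git02.openstack.org:19418 check maxconn 32 maxqueue 200
```
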
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 02:26 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Swap git daemon in xinetd for service https://review.openstack.org/43012 | 02:26 |
clarkb | mordred: lifeless ^ that fixes a bug the noop puppet run found in pleia2's change | 02:26 |
clarkb | lifeless: thank you. I think I am going to try ramming this in with the http stuff then fixup the gitdaemon stuff in a subsequent change | 02:27 |
clarkb | though maybe that is more work than it is worth | 02:27 |
clarkb | I think I am going to take this as an opportunity to head home | 02:28 |
clarkb | rerunning noop puppet really quickly with the latest patchset | 02:29 |
*** jerryz has quit IRC | 02:33 | |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 02:34 |
clarkb | puppet noop is being slow so I went ahead and fiddled with lifeless' suggestions | 02:34 |
*** yaguang has joined #openstack-infra | 02:34 | |
jeblair | hpcloud seems to be working better now, and nodepool seems to be doing a better job deleting nodes now | 02:35 |
*** morganfainberg is now known as morganfainberg|a | 02:36 | |
clarkb | jeblair: mordred: the current noop run looks mostly clean. There is one error but I think it is related to puppet not copying files locally because it is in noop mode | 02:36 |
clarkb | jeblair: mordred do we want to attempt applying it? | 02:37 |
clarkb | I think rolling back will involve stopping haproxy, and reapplying old puppet to get the old apache configs back | 02:37 |
jeblair | clarkb: i'm about to check out for the night (i'm past my point of uselessness), so i'd say: your call | 02:37 |
*** adalbas has quit IRC | 02:37 | |
clarkb | jeblair: I am feeling a bit like that too | 02:37 |
jeblair | clarkb: i'm mostly sticking around to fix the image thing (which should reduce the criticality of the git thing) | 02:38 |
clarkb | probably best to hold off on git for now | 02:38 |
jeblair | clarkb: sounds like the way to go. | 02:38 |
jeblair | clarkb: see how i'm talking? "image thing" "git thing"? | 02:38 |
jeblair | useless | 02:38 |
*** xchu has joined #openstack-infra | 02:39 | |
clarkb | :) I am beat | 02:39 |
* clarkb heads home. Tomorrow we can hit this thing with a giant stick | 02:39 | |
lifeless | ooh, stick. | 02:39 |
clarkb | lifeless: are my numbers at https://review.openstack.org/#/c/42784/8/modules/cgit/manifests/init.pp any better? | 02:40 |
lifeless | clarkb: btw - http://code.google.com/p/haproxy-docs/wiki/ServerOptions is where maxqueue is covered | 02:40 |
*** rcleere has joined #openstack-infra | 02:40 | |
lifeless | clarkb: it looks like that wiki is just machine-processed from the docs | 02:40 |
lifeless | clarkb: I don't know if maxqueue there will end up in the right place; but the literal numbers are saner yes. | 02:41 |
mordred | jeblair: you're sounding like me! | 02:42 |
clarkb | lifeless: ya I was worried about it not ending up in the right place after reading the maxqueue doc | 02:42 |
lifeless | clarkb: I'm not confident they are right in any shape, but then I have a different model for how failures should go in my head :) | 02:42 |
jeblair | mordred: i just put "sudo -u ubuntu" in a script and wondered why it didn't run as jenkins. | 02:42 |
lifeless | clarkb: and now is not the time to run through that given tired + fire drill | 02:42 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 02:45 |
clarkb | lifeless: ^ that puts maxqueue in the correct spot and now I am getting off of IRC | 02:45 |
clarkb | mordred: don't have too much fun on the playa. it will only make feature freeze less enjoyable for the rest of us :) | 02:46 |
mordred | clarkb: well, you can also look at it the other way... | 02:46 |
mordred | clarkb: I will be hitting in the extreme weather conditions of the high desert in an arid desert with an abnormally basic pH | 02:46 |
mordred | clarkb: where the only running water, food, electricity or trash service are the ones I bring myself | 02:47 |
mordred | clarkb: surrounded by 60k people who are all in various stages of mind alteration who are walking around with things on fire | 02:47 |
clarkb | mordred: good point. You have basically described why I would have a hard time doing it myself :) | 02:47 |
mordred | :) | 02:48 |
jeblair | mordred: how is that different than feature freeze? | 02:48 |
lifeless | jeblair: more dust? | 02:49 |
jeblair | lifeless: that must be it | 02:49 |
*** jhesketh has quit IRC | 02:52 | |
*** melwitt1 has quit IRC | 02:52 | |
*** jhesketh has joined #openstack-infra | 02:57 | |
openstackgerrit | James E. Blair proposed a change to openstack-infra/nodepool: Move setup scripts destination https://review.openstack.org/43033 | 02:57 |
jeblair | mordred: around? part 1 of my fix ^ | 03:00 |
mordred | jeblair: looking | 03:00 |
*** dims has quit IRC | 03:00 | |
mordred | jeblair: and the difference is the alkalinity | 03:00 |
jeblair | mordred: i don't actually need to merge that one (i can just run it in place) | 03:00 |
jeblair | mordred: the next one i will need to merge | 03:01 |
mordred | jeblair: +2 anyway | 03:01 |
SpamapS | I am so jealous of you guys | 03:02 |
SpamapS | I haven't scaled anything in years. :-P | 03:03 |
mordred | SpamapS: you're always welcome in here | 03:04 |
jeblair | SpamapS: it's all yours if you want it. :) | 03:04 |
mordred | SpamapS: there's plenty to go around | 03:04 |
mordred | SpamapS: also, just wait until we start using heat for some of this | 03:04 |
SpamapS | uh err, no I'm busy with my theoretical scaling things in Heat. | 03:04 |
jeblair | SpamapS: the team scales horizontally too | 03:04 |
mordred | SpamapS: we'll have excellent real-world feedback for you | 03:04 |
SpamapS | I actually can't wait | 03:04 |
*** jog0 is now known as jog0-away | 03:04 | |
mordred | you say that now... | 03:04 |
jeblair | SpamapS: it comes in the form of mordred yelling | 03:05 |
mordred | nothing is ever quite so fun as watching the thundering herd of feature freeze come your way | 03:05 |
SpamapS | You guys will thank me that I got this one done: https://bugs.launchpad.net/heat/+bug/1214580 :) | 03:05 |
uvirtbot | Launchpad bug 1214580 in heat "Heat does not make use of the C libyaml parser." [High,In progress] | 03:05 |
jeblair | that's some serious scaling | 03:05 |
mordred | SpamapS: is libyaml web-scale? | 03:05 |
SpamapS | mordred: its not. It doesn't use /dev/null. | 03:05 |
mordred | SpamapS: I mean, I've heard that /dev/null processes yaml faster | 03:05 |
mordred | dammit | 03:06 |
mordred | you were quicker | 03:06 |
SpamapS | Need to tackle the ORM insanity though.. https://bugs.launchpad.net/heat/+bug/1214602 | 03:06 |
uvirtbot | Launchpad bug 1214602 in heat "stack_list loads all resource from the database via the ORM" [Medium,Triaged] | 03:06 |
*** woodspa has quit IRC | 03:06 | |
mordred | SpamapS: oh, have fun with that | 03:06 |
SpamapS | 100 stacks, 10 resources each == 1000 sql queries to do 'heat stack-list' | 03:06 |
SpamapS | actually probably 1100 sql queries | 03:06 |
mordred | SpamapS: now that IS web-scale | 03:08 |
SpamapS | https://bugs.launchpad.net/heat/+bug/1214239 | 03:10 |
uvirtbot | Launchpad bug 1214239 in heat "Infinitely recursing stacks reach python's maximum recursion depth" [Medium,Triaged] | 03:10 |
SpamapS | mordred: ^^ thats what I'm working on right now | 03:10 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Fix nodepool setup scripts https://review.openstack.org/43037 | 03:13 |
jeblair | mordred: review+aprv ^ ? | 03:13 |
mordred | jeblair: wow. you might actually almost enjoy the weather on playa this year: http://www.weather.com/weather/tenday/Gerlach+NV+USNV0033 | 03:13 |
mordred | jeblair: looking | 03:13 |
jeblair | mordred: wow, not bad. i could dig that. | 03:14 |
jeblair | mordred: maybe i'll go to the desert next door. | 03:14 |
mordred | jeblair: why get rid of pushd/popd? (curious) | 03:14 |
jeblair | mordred: don't care about the current dir anymore; there's an explicit cd to the script dir at the bottom | 03:15 |
mordred | jeblair: ah. see it | 03:15 |
jeblair | mordred: (cwd should now be ~jenkins instead of the script dir) | 03:15 |
jeblair | mordred: (because of the sudo) | 03:15 |
mordred | +2 - want me to aprv? | 03:15 |
jeblair | mordred: pls | 03:16 |
mordred | done | 03:16 |
jeblair | my local instance is spinning up a host from an image from that now; i'll double check it's sane and then apply | 03:16 |
mordred | ++ | 03:17 |
jeblair | mordred: then i'll set the image state to delete for one of the providers, which should automatically build a new one | 03:18 |
jeblair | and do that one at a time | 03:18 |
openstackgerrit | A change was merged to openstack-infra/config: Fix nodepool setup scripts https://review.openstack.org/43037 | 03:19 |
jeblair | mordred: even though i set one image to deleted, it's going to rebuild all of them. | 03:26 |
jeblair | so, um, hopefully it will work. :) | 03:26 |
jeblair | (the old ones will still be there, so we can roll back if we need to; it's just going to be a little less incremental than i'd hoped) | 03:27 |
jeblair | mordred: i think the post jobs need to fetch from review.o.o; the replication to git.o.o isn't fast enough | 03:31 |
jeblair | mordred: (or else, we should catch that error and retry in g-g-p) | 03:32 |
jeblair | mordred: https://jenkins02.openstack.org/job/openstack-admin-manual-netconn/47/console | 03:32 |
*** blamar has quit IRC | 03:32 | |
*** mberwanger has joined #openstack-infra | 03:34 | |
*** blamar has joined #openstack-infra | 03:34 | |
*** ^d has quit IRC | 03:44 | |
jeblair | mordred: that image seems to be good; it's no longer cloning repos | 03:46 |
mordred | jeblair: awesome | 03:47 |
jeblair | unfortunately, the image build process for az2 was disconnected, so it's still using the old one | 03:47 |
mordred | jeblair: and yes re: post fetch from review.o.o | 03:47 |
jeblair | i'll kick off another image build, hopefully az2 will succeed this time | 03:47 |
mordred | sigh | 03:47 |
jeblair | mordred: the good news is that at this point, it will keep making new nodes from az1 and az3 | 03:48 |
jeblair | and will only start using az2 again if that image update succeeds | 03:48 |
jeblair | mordred: so starting from right now, no new nodes should be created from the old images | 03:48 |
*** nayward has joined #openstack-infra | 03:57 | |
openstackgerrit | Jason Meridth proposed a change to openstack-dev/hacking: Adds ability to ignore hacking validations with noqa https://review.openstack.org/41713 | 03:58 |
jeblair | mordred: az2 failed again | 04:01 |
*** ftcjeff has quit IRC | 04:01 | |
jeblair | i'm going to leave it as is, maybe it'll get better overnight. | 04:01 |
mordred | kk | 04:06 |
*** afazekas has joined #openstack-infra | 04:09 | |
*** vogxn has joined #openstack-infra | 04:09 | |
mgagne | wow, git.o.o interface is blazing fast now | 04:13 |
mordred | mgagne: whee! | 04:17 |
mordred | mgagne: it helps when it's not being pummelled to death by zuul jobs | 04:17 |
mgagne | mordred: well, we can say it was a great benchmark | 04:18 |
mordred | mgagne: we always learn MANY MANY things during feature freeze | 04:18 |
mgagne | mordred: haha, I was questioning the timing of such an update =) | 04:18 |
mordred | mgagne: we knew the rush was coming, we've been trying to get enough new tech in place to handle it | 04:19 |
mordred | mgagne: part of this rush was that we removed one of the bottlenecks from last time by making that part of the system better | 04:19 |
mordred | mgagne: and have thus found the next piece in the puzzle :) | 04:19 |
mgagne | mordred: =) | 04:19 |
*** senk has quit IRC | 04:20 | |
*** sridevi has joined #openstack-infra | 04:22 | |
*** sridevi has left #openstack-infra | 04:22 | |
*** sridevi has joined #openstack-infra | 04:23 | |
*** wenlock has quit IRC | 04:23 | |
*** sridevi has quit IRC | 04:32 | |
jeblair | mordred: we now have a full set of images for all the providers | 04:33 |
*** mberwanger has quit IRC | 04:34 | |
*** morganfainberg|a is now known as morganfainberg | 04:35 | |
*** mberwanger has joined #openstack-infra | 04:38 | |
* fungi is caught back up and reviewing the outstanding bits. glad the source of the pummeling was figured out | 04:38 | |
fungi | even with the performance issues we had, the graph says we still spiked up to 600jph today | 04:40 |
jeblair | mordred, clarkb: there's a boat load of hpcloud servers stuck in "ACTIVE(deleting)" state; we may need to open a trouble ticket if they're still around tomorrow | 04:40 |
jeblair | the neutron job seems to be flakey :( | 04:42 |
*** nati_ueno has joined #openstack-infra | 04:42 | |
*** nayward has quit IRC | 04:46 | |
*** Anju has joined #openstack-infra | 04:54 | |
*** ladquin is now known as ladquin_afk | 04:55 | |
*** thomasbiege1 has joined #openstack-infra | 05:01 | |
*** thomasbiege1 has quit IRC | 05:01 | |
*** mirrorbox has quit IRC | 05:05 | |
*** mberwanger has quit IRC | 05:06 | |
*** ogelbukh has quit IRC | 05:06 | |
*** enikanorov-w has quit IRC | 05:08 | |
*** enikanorov-w has joined #openstack-infra | 05:10 | |
*** sridevi has joined #openstack-infra | 05:15 | |
*** rcleere has quit IRC | 05:23 | |
sridevi | Hi, can anyone help me debug the jenkins' failure in https://review.openstack.org/#/c/34801/ | 05:28 |
sridevi | anyone? | 05:29 |
sridevi | around? | 05:29 |
*** DennyZhang has joined #openstack-infra | 05:33 | |
*** nicedice_ has quit IRC | 05:37 | |
*** UtahDave has joined #openstack-infra | 05:47 | |
*** DennyZhang has quit IRC | 05:55 | |
openstackgerrit | A change was merged to openstack/requirements: Remove upper bounds on lifeless test libraries https://review.openstack.org/42515 | 05:55 |
*** vogxn has quit IRC | 05:57 | |
*** cody-somerville has quit IRC | 05:57 | |
*** sridevi has quit IRC | 05:57 | |
openstackgerrit | A change was merged to openstack/requirements: Add dogpile.cache>=0.5.0 to requirements https://review.openstack.org/42455 | 05:58 |
*** vogxn has joined #openstack-infra | 05:58 | |
*** w_ has joined #openstack-infra | 06:02 | |
*** olaph has quit IRC | 06:05 | |
*** ryanpetrello has quit IRC | 06:11 | |
*** vogxn has quit IRC | 06:11 | |
*** cody-somerville has joined #openstack-infra | 06:13 | |
*** nayward has joined #openstack-infra | 06:17 | |
*** vogxn has joined #openstack-infra | 06:20 | |
*** Dr0id has joined #openstack-infra | 06:20 | |
*** dmakogon_ has joined #openstack-infra | 06:24 | |
*** Dr0id has quit IRC | 06:25 | |
*** annegentle has quit IRC | 06:25 | |
*** odyssey4me4 has joined #openstack-infra | 06:25 | |
*** psedlak has joined #openstack-infra | 06:30 | |
*** annegentle has joined #openstack-infra | 06:30 | |
*** AJaeger has joined #openstack-infra | 06:33 | |
*** sridevi has joined #openstack-infra | 06:34 | |
*** afazekas has quit IRC | 06:44 | |
*** jinkoo has joined #openstack-infra | 06:51 | |
*** ruhe has joined #openstack-infra | 06:52 | |
*** Guest75819 has quit IRC | 06:56 | |
openstackgerrit | Mark McLoughlin proposed a change to openstack/requirements: Allow use of oslo.messaging 1.2.0a10 https://review.openstack.org/43060 | 07:04 |
*** lillie has joined #openstack-infra | 07:06 | |
*** lillie is now known as Guest16331 | 07:06 | |
*** stevebaker has quit IRC | 07:07 | |
*** Dr01d has joined #openstack-infra | 07:10 | |
*** stevebaker has joined #openstack-infra | 07:12 | |
sridevi | anyone around? | 07:14 |
sridevi | I'm having trouble debugging the devstack neutron failures | 07:15 |
*** stevebaker has quit IRC | 07:18 | |
*** thomasbiege1 has joined #openstack-infra | 07:18 | |
*** jinkoo has quit IRC | 07:19 | |
*** yonglihe_ has joined #openstack-infra | 07:19 | |
yonglihe_ | hello, it seems a Jenkins build machine had a problem, | 07:20 |
yonglihe_ | 2013-08-21 06:11:42.794 | Started by user anonymous | 07:20 |
yonglihe_ | 2013-08-21 06:11:42.797 | [EnvInject] - Loading node environment variables. | 07:20 |
yonglihe_ | 2013-08-21 06:11:42.833 | Building remotely on centos6-7 in workspace /home/jenkins/workspace/gate-nova-python26 | 07:20 |
yonglihe_ | 2013-08-21 06:11:42.866 | [gate-nova-python26] $ /bin/bash -xe /tmp/hudson2665365283182338716.sh | 07:20 |
yonglihe_ | 2013-08-21 06:11:42.873 | + /usr/local/jenkins/slave_scripts/gerrit-git-prep.sh https://review.openstack.org http://zuul.openstack.org https://git.openstack.org | 07:20 |
yonglihe_ | 2013-08-21 06:11:42.877 | Triggered by: https://review.openstack.org/35074 | 07:20 |
yonglihe_ | 2013-08-21 06:11:42.877 | + [[ ! -e .git ]] | 07:20 |
yonglihe_ | 2013-08-21 06:11:42.878 | + git remote set-url origin https://git.openstack.org/openstack/nova | 07:20 |
yonglihe_ | 2013-08-21 06:11:42.882 | + git remote update | 07:20 |
yonglihe_ | 2013-08-21 06:11:42.889 | Fetching origin | 07:20 |
yonglihe_ | 2013-08-21 06:51:42.842 | Build timed out (after 40 minutes). Marking the build as failed. | 07:21 |
yonglihe_ | 2013-08-21 06:51:42.934 | fatal: The remote end hung up unexpectedly | 07:21 |
yonglihe_ | 2013-08-21 06:51:42.939 | error: Could not fetch origin | 07:21 |
yonglihe_ | 2013-08-21 06:51:42.941 | + git remote update | 07:21 |
yonglihe_ | 2013-08-21 06:51:42.949 | Fetching origin | 07:21 |
yonglihe_ | http://logs.openstack.org/74/35074/24/check/gate-nova-python26/5227601/console.html | 07:21 |
*** pblaho has joined #openstack-infra | 07:21 | |
yonglihe_ | sorry for the long log | 07:21 |
*** stevebaker has joined #openstack-infra | 07:21 | |
*** sridevi has quit IRC | 07:24 | |
morganfainberg | yonglihe_: i'm sure not a worry, but next time (to avoid the long log) use a paste (e.g. http://paste.openstack.org/ ) | 07:25 |
morganfainberg | (that way you can reference it again if needed as well w/o having to hunt for it) | 07:25 |
*** kspear has quit IRC | 07:25 | |
*** xBsd has joined #openstack-infra | 07:27 | |
*** DennyZhang has joined #openstack-infra | 07:28 | |
*** dmakogon_ has quit IRC | 07:28 | |
*** shardy_afk is now known as shardy | 07:29 | |
*** michchap has quit IRC | 07:29 | |
*** GheRivero has quit IRC | 07:30 | |
*** kspear has joined #openstack-infra | 07:30 | |
*** thomasbiege1 has quit IRC | 07:33 | |
*** GheRivero has joined #openstack-infra | 07:35 | |
*** dmakogon_ has joined #openstack-infra | 07:37 | |
*** kspear has quit IRC | 07:40 | |
yonglihe_ | thanks morganfainberg, i got it | 07:40 |
*** boris-42 has joined #openstack-infra | 07:42 | |
*** jpich has joined #openstack-infra | 07:51 | |
*** nati_uen_ has joined #openstack-infra | 07:54 | |
*** michchap has joined #openstack-infra | 07:54 | |
yonglihe_ | http://paste.openstack.org/show/44724/ | 07:54 |
*** michchap has quit IRC | 07:54 | |
yonglihe_ | seems something was lost, but i cannot find which machine this is. | 07:55 |
*** michchap has joined #openstack-infra | 07:55 | |
*** fbo_away is now known as fbo | 07:55 | |
*** nati_ueno has quit IRC | 07:57 | |
*** vogxn has quit IRC | 07:57 | |
*** mikal has joined #openstack-infra | 07:59 | |
*** GheRivero has quit IRC | 08:02 | |
*** GheRivero has joined #openstack-infra | 08:02 | |
*** michchap has quit IRC | 08:03 | |
*** GheRivero has quit IRC | 08:03 | |
*** GheRivero has joined #openstack-infra | 08:04 | |
*** xchu has quit IRC | 08:05 | |
*** nayward has quit IRC | 08:10 | |
*** nayward has joined #openstack-infra | 08:11 | |
*** dmakogon_ has quit IRC | 08:11 | |
*** moted has quit IRC | 08:11 | |
*** EntropyWorks has quit IRC | 08:11 | |
*** soren has quit IRC | 08:11 | |
*** mindjiver has quit IRC | 08:11 | |
*** clarkb has quit IRC | 08:11 | |
*** rockstar has quit IRC | 08:11 | |
*** echohead has quit IRC | 08:11 | |
*** jeblair has quit IRC | 08:11 | |
*** echohead has joined #openstack-infra | 08:12 | |
*** mindjiver has joined #openstack-infra | 08:12 | |
*** jeblair has joined #openstack-infra | 08:12 | |
*** clarkb has joined #openstack-infra | 08:12 | |
*** EntropyWorks has joined #openstack-infra | 08:12 | |
*** soren has joined #openstack-infra | 08:12 | |
*** soren has quit IRC | 08:12 | |
*** soren has joined #openstack-infra | 08:12 | |
*** moted has joined #openstack-infra | 08:12 | |
*** Kiall has quit IRC | 08:13 | |
*** rockstar has joined #openstack-infra | 08:13 | |
*** AJaeger has quit IRC | 08:13 | |
*** kiall has joined #openstack-infra | 08:15 | |
*** vogxn has joined #openstack-infra | 08:20 | |
*** GheRivero has quit IRC | 08:20 | |
*** xchu has joined #openstack-infra | 08:21 | |
*** GheRivero has joined #openstack-infra | 08:21 | |
*** xBsd has quit IRC | 08:22 | |
*** GheRivero has quit IRC | 08:22 | |
*** GheRivero has joined #openstack-infra | 08:22 | |
*** GheRivero has quit IRC | 08:29 | |
*** GheRivero has joined #openstack-infra | 08:29 | |
*** xBsd has joined #openstack-infra | 08:32 | |
*** michchap has joined #openstack-infra | 08:34 | |
*** michchap has quit IRC | 08:42 | |
*** UtahDave has quit IRC | 08:45 | |
*** Dr01d has quit IRC | 08:45 | |
*** Dr01d has joined #openstack-infra | 08:46 | |
*** DennyZha` has joined #openstack-infra | 08:53 | |
*** DennyZhang has quit IRC | 08:55 | |
*** xBsd has quit IRC | 08:55 | |
*** jpich has quit IRC | 08:57 | |
*** jpich has joined #openstack-infra | 08:59 | |
*** BobBall_Away is now known as BobBall | 09:06 | |
*** yaguang has quit IRC | 09:07 | |
*** xchu has quit IRC | 09:07 | |
*** yaguang has joined #openstack-infra | 09:09 | |
*** yaguang has quit IRC | 09:14 | |
*** ruhe has quit IRC | 09:16 | |
*** ruhe has joined #openstack-infra | 09:18 | |
*** xchu has joined #openstack-infra | 09:20 | |
*** yaguang has joined #openstack-infra | 09:27 | |
*** nayward has quit IRC | 09:28 | |
*** yaguang has quit IRC | 09:35 | |
*** ruhe has quit IRC | 09:37 | |
*** kspear has joined #openstack-infra | 09:37 | |
*** xBsd has joined #openstack-infra | 09:39 | |
*** yaguang has joined #openstack-infra | 09:42 | |
*** nayward has joined #openstack-infra | 09:52 | |
*** markmc has joined #openstack-infra | 09:54 | |
*** DennyZha` has quit IRC | 10:01 | |
*** pcm_ has joined #openstack-infra | 10:04 | |
*** pcm_ has quit IRC | 10:06 | |
*** pcm_ has joined #openstack-infra | 10:06 | |
*** boris-42 has quit IRC | 10:09 | |
*** ruhe has joined #openstack-infra | 10:16 | |
*** xchu has quit IRC | 10:19 | |
*** Shrews has quit IRC | 10:27 | |
*** Shrews has joined #openstack-infra | 10:36 | |
*** nati_uen_ has quit IRC | 10:39 | |
*** markmcclain has joined #openstack-infra | 10:39 | |
*** xBsd has quit IRC | 10:39 | |
*** xBsd has joined #openstack-infra | 10:40 | |
markmc | anyone seeing zuul miss events ? | 10:49 |
* markmc just pushed ~30 nova patches and there's only 10 in the check queue | 10:49 | |
markmc | maybe it's just catching up | 10:50 |
markmc | ah, yeah | 10:50 |
markmc | 1 added every 30 seconds or so | 10:50 |
*** vogxn has quit IRC | 10:50 | |
*** SergeyLukjanov has joined #openstack-infra | 10:50 | |
markmc | oh, god, I shouldn't watch the zuul dashboard | 10:52 |
markmc | this failure: https://jenkins02.openstack.org/job/gate-swift-devstack-vm-functional/94/console | 10:52 |
markmc | just aborted 18 changes in the gate queue | 10:53 |
markmc | tragic | 10:53 |
*** SergeyLukjanov has quit IRC | 10:54 | |
openstackgerrit | Stuart McLaren proposed a change to openstack/requirements: Bump python-swiftclient requirement to >=1.5 https://review.openstack.org/43092 | 10:54 |
*** ruhe has quit IRC | 10:59 | |
*** yaguang has quit IRC | 11:00 | |
*** ruhe has joined #openstack-infra | 11:01 | |
*** SergeyLukjanov has joined #openstack-infra | 11:03 | |
*** whayutin_ has joined #openstack-infra | 11:04 | |
mordred | markmc: morning | 11:04 |
markmc | mordred, howdy | 11:05 |
mordred | markmc: at oscon we discussed an idea about how to speculatively deal with the scenario you tweeted about | 11:05 |
*** dina_belova has joined #openstack-infra | 11:05 | |
*** weshay has quit IRC | 11:05 | |
markmc | mordred, the "no! no! no! zuul! don't do it! nooooo!" scenario? :) | 11:06 |
mordred | markmc: it gets complex, so it's not going to happen this cycle, but there is a way we could use WAY more resources to start a new virtual queue based on the now-presumptive state of the world | 11:06 |
mordred | markmc: yeah | 11:06 |
mordred | the reason we leave those jobs aborted currently is that we don't know if changes 1 and 2 will fail or not - so we wait for the queue head of the aborted jobs to resolve | 11:06 |
mordred | if we restarted them currently, we'd essentially need to start building a tree rather than a plain queue | 11:07 |
mordred | but - we had a good chat about it | 11:07 |
mordred | :) | 11:07 |
markmc | not sure I follow, but definitely an interesting subject :) | 11:07 |
markmc | now go have fun offline :) | 11:08 |
mordred | I think now that we have gearman, multi-jenkins and the new nodepool code - we'll be set nicely to think about things like that next cycle | 11:08 |
mordred | markmc: I have 5 hours of plane flights before I get to do that | 11:08 |
markmc | ah, lovely | 11:08 |
mordred | yah. | 11:08 |
markmc | use that time wisely | 11:08 |
* mordred is hoping that he can provide _some_ usefulness after how hectic the past two days have been | 11:09 | |
markmc | like replying to all your linkedin recruiter spam | 11:09 |
mordred | jeez | 11:09 |
mordred | that's not possible | 11:09 |
mordred | although, I've learned that there is a Java Opportunity in Studio City, CA | 11:09 |
markmc | I contemplated hacking on gerrit's topic review support briefly yesterday | 11:09 |
markmc | very briefly | 11:09 |
mordred | hahahaha | 11:10 |
*** whayutin_ is now known as weshay | 11:11 | |
mordred | git.o.o is running warm, but not dying currently: | 11:14 |
mordred | http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=854&rra_id=all | 11:14 |
mordred | given the current length of the queue, i'm going to take that as a good thing | 11:15 |
mordred | load average of 11 - all cpus at around 80% | 11:16 |
mordred | WOW | 11:18 |
*** boris-42 has joined #openstack-infra | 11:19 | |
mordred | swift change 28892 has been in the gate queue for 12H | 11:19 |
mordred | but it MIGHT merge in 11 minutes | 11:20 |
mordred | after which point we will have a gate reset event :) | 11:20 |
mordred | and everyone can watch the thundering herd clone from git.o.o | 11:21 |
*** xBsd has quit IRC | 11:26 | |
*** xBsd has joined #openstack-infra | 11:31 | |
*** BobBall has quit IRC | 11:37 | |
*** lcestari has joined #openstack-infra | 11:41 | |
*** zehicle_at_dell has joined #openstack-infra | 11:41 | |
*** nayward has quit IRC | 11:45 | |
*** dina_belova has quit IRC | 11:45 | |
*** xBsd has quit IRC | 11:45 | |
*** AJaeger has joined #openstack-infra | 11:47 | |
*** xBsd has joined #openstack-infra | 11:49 | |
*** dims has joined #openstack-infra | 11:52 | |
*** apcruz has joined #openstack-infra | 11:54 | |
*** zehicle_at_dell has quit IRC | 11:55 | |
*** AJaeger has quit IRC | 11:59 | |
*** ruhe has quit IRC | 12:00 | |
*** AJaeger has joined #openstack-infra | 12:02 | |
*** BobBall has joined #openstack-infra | 12:03 | |
*** dprince has joined #openstack-infra | 12:03 | |
*** psedlak has quit IRC | 12:04 | |
*** michchap has joined #openstack-infra | 12:06 | |
*** michchap has joined #openstack-infra | 12:07 | |
*** AJaeger has quit IRC | 12:11 | |
*** SergeyLukjanov has quit IRC | 12:11 | |
*** zehicle_at_dell has joined #openstack-infra | 12:12 | |
openstackgerrit | Julien Danjou proposed a change to openstack-infra/config: Add py33 jobs for WSME https://review.openstack.org/43112 | 12:16 |
*** jungleboyj has quit IRC | 12:17 | |
*** jungleboyj has joined #openstack-infra | 12:18 | |
*** zehicle_at_dell has quit IRC | 12:24 | |
*** dkliban has quit IRC | 12:27 | |
*** jjmb has quit IRC | 12:35 | |
*** dkranz has joined #openstack-infra | 12:36 | |
*** dims has quit IRC | 12:40 | |
*** dkranz has quit IRC | 12:41 | |
*** dims has joined #openstack-infra | 12:42 | |
openstackgerrit | A change was merged to openstack/requirements: assign a min version to pycadf https://review.openstack.org/42923 | 12:47 |
*** pabelanger has quit IRC | 12:55 | |
*** cppcabrera has joined #openstack-infra | 12:57 | |
*** adalbas has joined #openstack-infra | 12:57 | |
*** alexpilotti has joined #openstack-infra | 13:01 | |
*** ruhe has joined #openstack-infra | 13:04 | |
*** weshay has quit IRC | 13:05 | |
*** zehicle_at_dell has joined #openstack-infra | 13:07 | |
*** mriedem has joined #openstack-infra | 13:09 | |
markmc | seeing a lot of these | 13:11 |
markmc | https://jenkins02.openstack.org/job/gate-grenade-devstack-vm/2834/console | 13:11 |
markmc | anyone know what the cause is? | 13:11 |
*** cppcabrera has left #openstack-infra | 13:11 | |
mordred | markmc: looking | 13:11 |
*** dina_belova has joined #openstack-infra | 13:11 | |
*** dina_belova has quit IRC | 13:12 | |
*** dina_belova has joined #openstack-infra | 13:12 | |
mordred | markmc: http://logs.openstack.org/56/42756/5/check/gate-grenade-devstack-vm/c05ec42/logs/devstack-gate-setup-workspace-old.txt | 13:13 |
mordred | markmc: | 13:13 |
mordred | + timeout -k 1m 5m git remote update | 13:13 |
mordred | Fetching origin | 13:13 |
mordred | error: RPC failed; result=52, HTTP code = 0 | 13:14 |
openstackgerrit | will soula proposed a change to openstack-infra/jenkins-job-builder: Adding AnsiColor Support https://review.openstack.org/43121 | 13:14 |
mordred | fatal: The remote end hung up unexpectedly | 13:14 |
mordred | error: Could not fetch origin | 13:14 |
mordred | clarkb, jeblair, fungi ^^ looks like we're still slamming git.o.o | 13:14 |
markmc | ok | 13:15 |
*** dina_belova has quit IRC | 13:17 | |
*** acabrera has joined #openstack-infra | 13:17 | |
*** acabrera has left #openstack-infra | 13:18 | |
*** vogxn has joined #openstack-infra | 13:19 | |
*** weshay has joined #openstack-infra | 13:19 | |
*** anteaya has joined #openstack-infra | 13:25 | |
*** jjmb has joined #openstack-infra | 13:25 | |
mordred | markmc: wanna hear something funny? | 13:29 |
markmc | mordred, perhaps :) | 13:29 |
mordred | hacking can't pass unittest in python 3.3 because of its python 3 compatibility checks | 13:29 |
mordred | because the good/bad strings throw different errors :) | 13:30 |
markmc | nice | 13:30 |
mordred | yah | 13:30 |
*** afazekas has joined #openstack-infra | 13:30 | |
*** afazekas has quit IRC | 13:31 | |
*** lbragstad has joined #openstack-infra | 13:32 | |
*** sandywalsh has quit IRC | 13:43 | |
*** jjmb has quit IRC | 13:46 | |
*** changbl has quit IRC | 13:46 | |
*** ftcjeff has joined #openstack-infra | 13:48 | |
*** prad_ has joined #openstack-infra | 13:53 | |
jeblair | things would be a lot better if the neutron job weren't flakey | 13:54 |
anteaya | morning jeblair | 13:57 |
anteaya | I might be wrong, but am I seeing we have 10 devstack precise nodes available? | 13:57 |
anteaya | when we normally have about twice as many | 13:57 |
jeblair | anteaya: http://tinyurl.com/kmotmns | 13:58 |
anteaya | ah so the chart at the very bottom of the long check queue on zuul status page is just saying we have very few free, since it is entitled "available test nodes" | 13:59 |
jeblair | yep | 13:59 |
anteaya | I knew my interpretation didn't make sense | 13:59 |
anteaya | thanks | 13:59 |
* anteaya is refraining from acknowledging mordred since he is on vacation | 14:00 | |
mordred | morning anteaya | 14:00 |
anteaya | Wednesday 8am, the timeline has got to be Central time, for some reason | 14:00 |
mordred | jeblair: remember the tox 1.6 issue where it stopped using our mirror? | 14:01 |
anteaya | which is weird since I know you are on Pacific time jeblair | 14:01 |
anteaya | morning mordred | 14:01 |
mordred | jeblair: I filed a bug and hpk said that $HOME thing should not have merged/been in 1.6 | 14:01 |
mordred | he's working on a 1.6.1 that reverts that change | 14:01 |
*** burt has joined #openstack-infra | 14:01 | |
mordred | and I've just tested it and it works well | 14:02 |
jeblair | yay | 14:02 |
jeblair | mordred: did we discuss using afs to share the git repos across several git servers? | 14:03 |
mordred | jeblair: we did not - but I think it's an excellent idea | 14:03 |
mordred | jeblair: because, honestly, it's not file io that's a problem - it's the cpu cost associated with calculating what's needed | 14:03 |
jeblair | mordred: it is seriously worth considering; local caching + invalidation would be good; we'd just need to make sure the locking model works | 14:04 |
mordred | jeblair: so, quite honestly, if all of our nodes were afs clients and read from /afs/infra.openstack.org/git/$project | 14:04 |
jeblair | mordred: (all this as opposed to having gerrit star-replicate to n workers) | 14:04 |
mordred | yes | 14:04 |
mordred | jeblair: oh, were you thinking afs to get the repos to the gitX servers? | 14:05 |
jeblair | mordred: oh, heh, well, afs can be somewhat bandwidth inefficient; so i'm not sure how well having everything use it would work; i was just thinking of a pool of git servers. | 14:05 |
jeblair | mordred: yeah | 14:05 |
mordred | gotcha | 14:05 |
mordred | well, here's the thing | 14:05 |
mordred | we can start with that | 14:05 |
mordred | and it'll either work or not | 14:05 |
mordred | and then if that is set up - then we can look at whether access via /afs on slaves is better or worse | 14:06 |
mordred | pretty easily | 14:06 |
jeblair | yep. though by start you mean 'start looking into after we implement our current plan', right? :) | 14:06 |
*** dkliban has joined #openstack-infra | 14:06 | |
mordred | jeblair: god yes | 14:06 |
jeblair | so we have some real data now | 14:06 |
jeblair | i mean, it's only like 2 data points | 14:07 |
mordred | jeblair: did the az2 image update work? | 14:07 |
mordred | it looked to me like it did from looking at nova image base information | 14:07 |
jeblair | but we know that if 100 clients hit git.o.o, we push 20-25Mbit and peg user cpu time | 14:07 |
jeblair | mordred: do you get the idea that top was lying to us? | 14:08 |
*** _TheDodd_ has joined #openstack-infra | 14:08 | |
mordred | tough to say, honestly | 14:08 |
jeblair | mordred: there's like no i/o. | 14:09 |
mordred | jeblair: that doesn't really surprise me | 14:09 |
mordred | there's tons of ram on the boxes | 14:09 |
jeblair | so it's all cpu (and possibly file locking; not sure how that would show up) | 14:09 |
mordred | I'm pretty sure it's all in the fs cache layer | 14:09 |
mordred | file locking I _think_ would show up in sys wait time | 14:10 |
jeblair | that's what i'd expect, unless git is doing something on its own | 14:10 |
*** ryanpetrello has joined #openstack-infra | 14:11 | |
*** vogxn has quit IRC | 14:12 | |
*** michchap has quit IRC | 14:12 | |
*** dina_belova has joined #openstack-infra | 14:13 | |
jeblair | mordred: i'm reading about git's lockfile usage (to understand current behavior); i note that it _is_ compatible with afs. | 14:16 |
mordred | neat | 14:16 |
mordred | you know - afs client caching may make total access not ridiculous | 14:16 |
mordred | since most of the pack files should wind up cached client side | 14:16 |
jeblair | mordred: it's the initial population i'm worried about; though, i suppose if the devstack nodes have a fully populated afs cache from image creation... maybe not so bad. | 14:17 |
*** dina_belova has quit IRC | 14:17 | |
*** ruhe has quit IRC | 14:17 | |
mordred | jeblair: yah. that's what I was thining | 14:17 |
mordred | thinking | 14:17 |
jeblair | mordred: i've uh, never used an afs client that was cloned from another afs client. | 14:18 |
jeblair | mordred: those two worlds have not collided for me. :) | 14:18 |
mordred | love it | 14:18 |
anteaya | mordred: this was the jenkins failure on your disable salt globally patch: http://logs.openstack.org/30/43030/1/check/gate-ci-docs/1cdc607/console.html.gz can I do "recheck no bug"? | 14:20 |
mordred | anteaya: yes. | 14:20 |
mordred | the failure is a git clone failure | 14:20 |
dhellmann | good morning | 14:20 |
*** vogxn has joined #openstack-infra | 14:21 | |
Alex_Gaynor | dhellmann: morning (I assume you're not at home?) | 14:21 |
jeblair | well, crap; it looks like zuul is stuck again | 14:21 |
mordred | morning dhellmann ! | 14:21 |
anteaya | mordred: that was what I thought, thanks for confirmation | 14:21 |
dhellmann | Alex_Gaynor: it's still morning here at home :-) | 14:21 |
anteaya | morning dhellmann Alex_Gaynor | 14:21 |
anteaya | jeblair: :( | 14:21 |
*** datsun180b has joined #openstack-infra | 14:22 | |
openstackgerrit | A change was merged to openstack-infra/zuul: SIGUSR2 logs stack traces for active threads. https://review.openstack.org/42959 | 14:22 |
mordred | Alex_Gaynor: there's a little bit of pushback from clayg on syncing with global requirements - I responded that it's not urgent and that perhaps sdague and I should chat with him when we both get back from vacation | 14:22 |
mordred | Alex_Gaynor: but then I just realized that you have a foot in both worlds | 14:22 |
jeblair | i forced that ^ so it's in place after the restart | 14:22 |
mordred | jeblair: great. I support you in that | 14:22 |
Alex_Gaynor | mordred: Ok, I can take a look at trying to push that along, I need to take a bit and figure out what the most effective advocacy strategy is going to be | 14:23 |
anteaya | jeblair: do we need to change channel status do you think? | 14:23 |
Alex_Gaynor | I think so, zuul seems totally stalled | 14:24 |
mordred | Alex_Gaynor: yeah - I think we might need to articulate better the reasons we want it | 14:24 |
dhellmann | so I'd like to set up WSME on launchpad so bugs are updated when things happen in gerrit. IIRC, to do that for ceilometer I added a user (or group?) to our Drivers group. Is that right? | 14:24 |
mordred | Alex_Gaynor: also, I think we have a little bit of the traditional push-back against 'openstack is one project' (I don't mean that to be nasty, just that there are remaining pockets of resistance to that decision, and I think they color openstack-centric tasks at times, which means extra care needs to be taken with justification) | 14:25 |
*** dguitarbite has joined #openstack-infra | 14:25 | |
jeblair | i'm restarting zuul | 14:26 |
Alex_Gaynor | This is going to cause us to lose all current pipelines? Are there any thoughts about putting that state somewhere persistent? | 14:27 |
jeblair | Alex_Gaynor: i've saved a copy | 14:27 |
Alex_Gaynor | jeblair: oh, cool | 14:27 |
*** ladquin_afk is now known as ladquin | 14:29 | |
jeblair | i'm adding them back with a 30 second delay between each. | 14:30 |
mordred | jeblair: nice | 14:31 |
mordred | jeblair: you know - I wonder - when zuul re-queues things after a gate reset - perhaps it should put a delay between each gearman request? mitigate the herd a little bit? | 14:31 |
*** jungleboyj has quit IRC | 14:33 | |
jeblair | mordred: yeah, i was suggesting that to clarkb yesterday as something to explore; we need to be careful that we don't get too backed up | 14:34 |
mordred | yah | 14:34 |
jeblair | that's the thing with queuing systems; if you can't keep up with the throughput, you can get into situations where you never recover | 14:34 |
jeblair | so i'm much more focused on making sure we can keep up | 14:34 |
*** yolanda has joined #openstack-infra | 14:34 | |
jeblair | mordred: that '30 second delay' i'm doing? that's 15 minutes before the gate queue is populated again. | 14:35 |
Alex_Gaynor | So is zuul CPU bound, or something else? | 14:35 |
yolanda | hi, i'm trying to deploy zuul using an apache frontend that is on another machine, but i'm having a problem with serving git repos, has anyone done something similar? | 14:35 |
yolanda | the problem i have is with aliasmatch; it refers to AliasMatch ^/p/(.*/objects/[0-9a-f]{2}/[0-9a-f]{38})$ /var/lib/zuul/git/$1, and that path is on the zuul machine so it cannot be accessed from apache | 14:36 |
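(One possible answer, sketched here rather than taken from this log: if the apache frontend cannot see /var/lib/zuul/git locally, it can proxy the /p/ URLs back to the zuul host instead of using a filesystem AliasMatch. The hostname is a placeholder, and this assumes the zuul machine serves its merger repos over HTTP itself.)

    # on the apache frontend, with mod_proxy and mod_proxy_http enabled
    ProxyPass        /p/ http://zuul.example.org/p/
    ProxyPassReverse /p/ http://zuul.example.org/p/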
jeblair | Alex_Gaynor: zuul is not operating near its limits | 14:36 |
Alex_Gaynor | jeblair: so it's git / gearman / gerrit ? | 14:36 |
jeblair | Alex_Gaynor: but if it were, it would be cpu bound | 14:36 |
jeblair | Alex_Gaynor: the current problem is we can't serve git repos fast enough for all the test jobs | 14:37 |
Alex_Gaynor | jeblair: ok, so we're sure it's that | 14:37 |
*** thomasbiege1 has joined #openstack-infra | 14:37 | |
jeblair | Alex_Gaynor: which is why today's project is load-balancing that across multiple servers | 14:37 |
Alex_Gaynor | jeblair: surely someone has had this problem before, right? We can't be the first people to be git-bound | 14:37 |
jeblair | Alex_Gaynor: zuul's problem is that it has a bug that we haven't been able to identify due to inadequate logging and lack of ability to get a stacktrace | 14:38 |
jeblair | Alex_Gaynor: http://cacti.openstack.org/cacti/graph_view.php?action=tree&tree_id=1&leaf_id=23 | 14:38 |
jeblair | that's zuul ^ | 14:38 |
jeblair | Alex_Gaynor: http://cacti.openstack.org/cacti/graph_view.php?action=tree&tree_id=1&leaf_id=43 | 14:38 |
jeblair | that's git ^ | 14:38 |
Alex_Gaynor | consistent 10-15MBps, that's pretty cool | 14:39 |
*** thomasbiege1 has quit IRC | 14:39 | |
jeblair | mordred, clarkb, zaro: when the gearman server restarts, i think the executorworkerthread dies, which means the offline-on-complete feature fails | 14:42 |
*** xBsd has quit IRC | 14:42 | |
jeblair | mordred, clarkb, zaro: which is why a lot of jobs are showing up as lost right now -- they are re-running on hosts that should have been offlined | 14:42 |
*** michchap has joined #openstack-infra | 14:43 | |
jeblair | so for the moment, if we stop zuul, we need to delete all the slaves | 14:44 |
anteaya | ouch | 14:45 |
*** pblaho has quit IRC | 14:45 | |
*** gordc has joined #openstack-infra | 14:46 | |
*** changbl has joined #openstack-infra | 14:46 | |
*** pabelanger has joined #openstack-infra | 14:46 | |
*** AJaeger has joined #openstack-infra | 14:47 | |
openstackgerrit | Ryan Petrello proposed a change to openstack-infra/config: Provide a more generic run-tox.sh. https://review.openstack.org/43145 | 14:48 |
*** jungleboyj has joined #openstack-infra | 14:48 | |
mgagne | mordred: what was your gerrit search filter you sent a couple of weeks ago? | 14:49 |
jeblair | mordred: so none of the az2 nodes are launching jenkins slaves. | 14:50 |
jeblair | i spot-checked one and got this: | 14:50 |
jeblair | $ java -version | 14:50 |
jeblair | Segmentation fault (core dumped) | 14:50 |
*** michchap has quit IRC | 14:51 | |
Alex_Gaynor | awesome. | 14:51 |
ttx | mordred: late pong | 14:51 |
jeblair | 3 makes a pattern, right? | 14:51 |
anteaya | ttx: since you are here: https://review.openstack.org/#/c/43002/ | 14:53 |
*** rnirmal has joined #openstack-infra | 14:53 | |
*** kspear has quit IRC | 14:54 | |
*** kspear has joined #openstack-infra | 14:54 | |
*** ruhe has joined #openstack-infra | 14:57 | |
*** _TheDodd_ has quit IRC | 14:59 | |
*** _TheDodd_ has joined #openstack-infra | 15:01 | |
*** w_ is now known as olaph | 15:02 | |
ttx | anteaya: looking | 15:04 |
anteaya | thanks | 15:04 |
Alex_Gaynor | jeblair: can you link the review for load balanced git? | 15:07 |
*** UtahDave has joined #openstack-infra | 15:07 | |
*** mrodden has quit IRC | 15:08 | |
jeblair | Alex_Gaynor: https://review.openstack.org/#/c/42784/ | 15:08 |
jeblair | Alex_Gaynor: I think we're also going to do this https://review.openstack.org/#/c/43012/ | 15:08 |
ttx | anteaya: reviewed | 15:08 |
anteaya | ttx thank you | 15:08 |
*** vogxn has quit IRC | 15:08 | |
*** rnirmal has quit IRC | 15:10 | |
openstackgerrit | Anita Kuno proposed a change to openstack-infra/config: Creating/adding the openstack/governance repository https://review.openstack.org/43002 | 15:13 |
*** dina_belova has joined #openstack-infra | 15:13 | |
jeblair | pleia2, clarkb, Alex_Gaynor: I'm going to spin up a few copies of git.o.o of different sizes (8, 15, 30) for testing. | 15:15 |
jeblair | pleia2, clarkb, Alex_Gaynor: if we are cpu bound, it looks like the 8gb machines (4vcpus) might be the sweet spot (half the cpus with 1/4 the ram of a 30gb vm) | 15:15 |
anteaya | mordred can I get your feedback on the openstack/governance name, please? | 15:16 |
anteaya | if you don't like it, can I get a better suggestion? | 15:16 |
*** dina_belova has quit IRC | 15:18 | |
*** mrodden has joined #openstack-infra | 15:19 | |
*** mkerrin has quit IRC | 15:20 | |
*** mkerrin has joined #openstack-infra | 15:20 | |
openstackgerrit | Anita Kuno proposed a change to openstack-infra/config: Creating/adding the openstack/governance repository https://review.openstack.org/43002 | 15:20 |
*** mkerrin has quit IRC | 15:23 | |
mordred | anteaya: openstack/governance sounds great | 15:24 |
mordred | jeblair: wow. segfault. nice | 15:24 |
mordred | mgagne: gerrit search filter ... for things I should review? | 15:25 |
jeblair | mordred: yeah, i'm going to leave that and assume it's an image problem | 15:25 |
mgagne | yes | 15:25 |
mordred | jeblair: k | 15:25 |
mordred | jeblair: gosh, do we need to make the ssh check an "ssh and run java --version" check? | 15:25 |
jeblair | slowing down the rate of adding new nodes is mildly helpful atm anyway. | 15:25 |
mordred | mgagne: I do this: https://review.openstack.org/#/q/watchedby:mordred%2540inaugust.com+-label:CodeReview%253C%253D-1+-label:Verified%253C%253D-1+-label:Approved%253E%253D1++-status:workinprogress+-status:draft+-is:starred+-owner:mordred%2540inaugust.com,n,z | 15:26 |
jeblair | mordred: hrm, i wonder if the template host was broken, or if the image created from the template host was broken. | 15:26 |
mordred | jeblair: good question - template host still around? | 15:26 |
jeblair | it would be difficult to find out, since the template host is deleted almost immediately | 15:26 |
mordred | yeah. I was afraid of that | 15:26 |
mordred | mgagne: and I scan that list, and star things that I need to review, then I do: https://review.openstack.org/#/q/is:starred+-label:CodeReview%253C%253D-1+-label:Verified%253C%253D-1,n,z | 15:27 |
mordred | and unstar things when I'm done with them | 15:27 |
mordred | I'm doing that every morning when I wake up now | 15:27 |
mordred | it's helping | 15:27 |
mordred | (although getting the list under control and then reviewing every morning also helped) | 15:27 |
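(URL-decoded, the two gerrit queries mordred pasted are, roughly:)

    watchedby:mordred@inaugust.com -label:CodeReview<=-1 -label:Verified<=-1
        -label:Approved>=1 -status:workinprogress -status:draft -is:starred
        -owner:mordred@inaugust.com

    is:starred -label:CodeReview<=-1 -label:Verified<=-1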
mordred | jeblair: az2 was the one having issues yesterday though | 15:28 |
mordred | jeblair: mark the image for delete and see if it can generate a real one today? | 15:28 |
mgagne | mordred: thanks! | 15:29 |
*** mkerrin has joined #openstack-infra | 15:30 | |
jeblair | mordred: yeah, but we're doing ok on the other 2 for now, this will help with the git.o.o load (a little) so i'm in no rush to fix | 15:30 |
mrodden | anyone seen tox failing with "no such option: --pre" on the pip install step? | 15:31 |
mordred | anteaya: reviewed | 15:31 |
mordred | jeblair: ok | 15:31 |
mrodden | apparently tox 1.6.0 is on virtualenv 1.9.1 which has pip 1.3.1 embedded which doesn't support --pre | 15:31 |
mordred | mrodden: link? | 15:31 |
markmc | *sob* my change approved 8 hours ago is now 34th in the gate queue *sob* | 15:31 |
mrodden | not sure why i am hitting it all of a sudden | 15:31 |
markmc | where's my violin? | 15:31 |
mordred | mrodden: that sounds like the glanceclient issue from the other day that I expect to be fixed | 15:32 |
mrodden | mordred: its in my local env | 15:32 |
mrodden | oh | 15:32 |
mordred | mrodden: update your glanceclient | 15:32 |
mordred | mrodden: evil happened | 15:32 |
mrodden | lol | 15:32 |
mrodden | will do | 15:32 |
mrodden | thanks | 15:32 |
* dansmith feels sorry for zuul today | 15:33 | |
mordred | dansmith: it likes it | 15:33 |
dansmith | mordred: oh, a little masochistic, is it? | 15:34 |
mordred | heck yes | 15:34 |
dansmith | kinky. | 15:34 |
jeblair | markmc, dansmith: current major issues: we can't serve git repos fast enough for all the tests we're running; the neutron job appears flakey. | 15:36 |
dansmith | dammit neutron! | 15:37 |
markmc | jeblair, yeah, was following along | 15:37 |
mriedem | i know one guy in here that likes to give out punishment, might be a good match for zuul :) | 15:37 |
markmc | don't mind the whining from the cheap seats | 15:37 |
*** nati_ueno has joined #openstack-infra | 15:37 | |
jeblair | markmc, dansmith: minor issues: zuul has a bug that causes it to stop occasionally; one of our test images has a java that segfaults | 15:37 |
jeblair | and a few more minor than that | 15:37 |
markmc | heh, "minor issues" | 15:37 |
dansmith | nice, I saw the big reset this morning | 15:38 |
mordred | markmc: gotta love feature freeze, when the two of those are 'minor' | 15:38 |
jeblair | markmc: yeah when "zuul stops working" is a minor issue, you know we're having fun. :) | 15:38 |
mordred | oh - corollary to that issue - debugging a hung python program is apparently not easy | 15:38 |
markmc | jeblair, not whining honestly, but how did https://review.openstack.org/#/c/43060/ end up at the bottom after the restart ? | 15:38 |
markmc | jeblair, shoulda been near the top, no? | 15:39 |
jeblair | markmc: erm, it's worse than that. :( it was at the top, but due to a recently discovered very minor issue, when i restarted zuul, several of the test nodes were not off-lined as they should have been | 15:39 |
* markmc puts it down to karma for approving his own change | 15:39 | |
jeblair | markmc: so it got dequeued due to an erroneously failing test | 15:40 |
markmc | jeblair, ok | 15:40 |
jeblair | markmc: sorry :( | 15:40 |
* markmc shrugs | 15:40 | |
openstackgerrit | Anita Kuno proposed a change to openstack-infra/config: Creating/adding the openstack/governance repository https://review.openstack.org/43002 | 15:40 |
jeblair | markmc: we now know that if that happens again we need to clean up the test nodes until we can automate that case | 15:40 |
anteaya | mordred: thanks | 15:40 |
markmc | jeblair, cool | 15:40 |
jeblair | markmc: that's what most of the "LOST" jobs on the screen are | 15:41 |
*** reed has joined #openstack-infra | 15:41 | |
markmc | jeblair, ok, thanks | 15:41 |
markmc | jeblair, that's a particularly sad name for a status | 15:41 |
markmc | LOST,LONELY | 15:41 |
markmc | now that would be sad | 15:41 |
jeblair | markmc: or just "SAD" | 15:42 |
anteaya | :( | 15:42 |
markmc | jeblair, indeed :) | 15:42 |
openstackgerrit | Andreas Jaeger proposed a change to openstack-infra/config: Build Basic Install Guide for openSUSE https://review.openstack.org/42988 | 15:44 |
*** dkranz has joined #openstack-infra | 15:46 | |
*** nayward has joined #openstack-infra | 15:49 | |
*** SergeyLukjanov has joined #openstack-infra | 15:49 | |
*** senk has joined #openstack-infra | 15:51 | |
chmouel | so for the LOST thing should I just do a recheck no bugs? | 15:51 |
*** nati_ueno has quit IRC | 15:51 | |
Alex_Gaynor | chmouel: yup | 15:51 |
chmouel | Alex_Gaynor: tks | 15:52 |
* chmouel didn't feel like reading the full scrollback :-p | 15:52 | |
*** rfolco has joined #openstack-infra | 15:53 | |
*** dina_belova has joined #openstack-infra | 15:54 | |
*** vogxn has joined #openstack-infra | 15:56 | |
*** pcm_ has quit IRC | 15:56 | |
*** boris-42 has quit IRC | 15:57 | |
*** mkerrin has quit IRC | 15:59 | |
*** mkerrin has joined #openstack-infra | 15:59 | |
jeblair | pleia2, fungi, clarkb: the git puppet manifest has some problems; an selinux command failed during the first run, and i think there may be an rpm/pip conflict on the pyyaml package | 16:00 |
*** mkerrin has quit IRC | 16:01 | |
*** mkerrin has joined #openstack-infra | 16:01 | |
*** mkerrin has quit IRC | 16:02 | |
clarkb | :( | 16:02 |
clarkb | jeblair It needs a firewall update too | 16:02 |
clarkb | jeblair was that run on a new host or the existing? | 16:03 |
jeblair | that was easy enough to fix (pip uninstall pyyaml) | 16:04 |
jeblair | clarkb: new hosts -- i'm spinning up test hosts for benchmarking | 16:04 |
clarkb | cool. let me know if you catch other puppet things I will update that manifest soon | 16:04 |
fungi | jeblair: ah, yes i believe i pointed out the selinux thing to pleia2 before. i think the issue is that enabling selinux requires a reboot, and the command to adjust selinux won't work until it's activated | 16:05 |
*** ruhe has quit IRC | 16:05 | |
fungi | i believe it was an oversight caused by hpcloud enabling selinux by default and rackspace not | 16:05 |
jeblair | yay | 16:05 |
clarkb | fungi so activate; reboot; puppet? | 16:05 |
*** sridevi has joined #openstack-infra | 16:05 | |
*** jungleboyj has left #openstack-infra | 16:06 | |
fungi | clarkb: i think just reboot, but may need to manually activate selinux before doing so (though i think the puppet selinux module has already set it to be active after a reboot) | 16:06 |
sridevi | Hi, can someone help me with this jenkins' failure. | 16:06 |
*** AJaeger has quit IRC | 16:06 | |
sridevi | https://review.openstack.org/#/c/34801/ | 16:06 |
sridevi | anyone? | 16:07 |
anteaya | thanks markmc | 16:07 |
markmc | anteaya, thank you | 16:08 |
anteaya | :D | 16:08 |
anteaya | sridevi: I'll take a look | 16:09 |
sridevi | thanks anteaya | 16:09 |
sridevi | http://logs.openstack.org/01/34801/21/check/gate-tempest-devstack-vm-neutron/3076bcb/console.html.gz | 16:09 |
jeblair | sridevi: that appears to be a real failure; it happens consistently for every test run for days now. | 16:09 |
sridevi | real failure, you mean some bug in the patch? jeblair | 16:10 |
jeblair | sridevi: yes | 16:10 |
sridevi | okay. | 16:10 |
jeblair | sridevi: i'd recommend setting up a devstack environment and testing it locally there | 16:11 |
sridevi | jeblair: Hmm. But I don't see any error other than "ERROR:root:Could not find any typelib for GnomeKeyring" | 16:11 |
anteaya | Process leaked file descriptors. | 16:11 |
anteaya | it is in every failure log | 16:12 |
jeblair | anteaya: that's harmless | 16:12 |
anteaya | jeblair: ah okay | 16:12 |
jeblair | sridevi: it looks like the patch broke devstack, from the way the devstack log ends. | 16:12 |
sridevi | Hmm | 16:13 |
jeblair | sridevi: last line of this file: http://logs.openstack.org/01/34801/21/check/gate-tempest-devstack-vm-full/984c01f/logs/devstacklog.txt.gz | 16:13 |
*** nayward has quit IRC | 16:13 | |
pleia2 | hm, what brought in pyyaml? | 16:13 |
jeblair | pleia2: jeepyb | 16:13 |
pleia2 | jeblair: via pip? | 16:14 |
pleia2 | (looking now) | 16:15 |
jeblair | pleia2: i think the sequencing is off; it installed jeepyb first which would have easy_installed it using python setup.py install, then it tried to install the rpm | 16:15 |
jeblair | pleia2: i think either we want to make jeepyb require-> the package, or else remove the package and let easy install do its thing | 16:16 |
*** SergeyLukjanov has quit IRC | 16:16 | |
sridevi | jeblair: what is in the last line: "services=s-container"? | 16:16 |
sridevi | ? | 16:16 |
reed | hi guys, how are things going today? | 16:16 |
pleia2 | jeblair: I see, thanks | 16:16 |
* fungi is going to be out at the space needle and the science museum for a little while, but will be back on later this afternoon | 16:17 | |
pleia2 | fungi: enjoy :) | 16:17 |
fungi | thanks pleia2 | 16:17 |
anteaya | fungi have fun at the space needle | 16:17 |
reed | fungi, enjoy... and in your free time comment on https://review.openstack.org/#/c/42998/ :) | 16:17 |
anteaya | reed: about the same as yesterday, zuul got stuck again this morning | 16:18 |
jeblair | reed: not terribly well, i think we have at least a full day ahead of us | 16:18 |
reed | :( | 16:18 |
reed | not terribly well is hard to parse | 16:18 |
anteaya | sridevi: yes, that is the last line that ran in devstack, after that it broke | 16:18 |
*** rnirmal has joined #openstack-infra | 16:18 | |
jeblair | reed: heh, that seems appropriate somehow. anyway, 'poorly'. :) | 16:19 |
reed | not terribly is a double negation, right? makes it a positive... well is positive ... double positive is bad? :) | 16:19 |
sridevi | anteaya: hmm | 16:19 |
anteaya | sridevi: the fact that devstack didn't finish is an indication that the patch affected the devstack installation | 16:19 |
reed | jeblair, trying to assess how long it will take for https://review.openstack.org/#/c/42998/ to be evaluated and go through... two days? | 16:20 |
reed | (it's my request for a staging server) | 16:20 |
anteaya | sridevi: so your patch affects swift and the swift container service couldn't install properly | 16:21 |
sridevi | okay | 16:21 |
*** ruhe has joined #openstack-infra | 16:21 | |
anteaya | sridevi: here is the screen log for the swift container: http://logs.openstack.org/01/34801/21/check/gate-tempest-devstack-vm-full/984c01f/logs/screen-s-container.txt.gz | 16:22 |
jeblair | reed: i hope so; but this is a very exceptional time; we have unprecedented test load, several systems that need upgrading to deal with it, and only two core developers full-time (though i believe we are more than full-time at the moment) | 16:22 |
koolhead17 | hi all | 16:22 |
anteaya | hi koolhead17 | 16:23 |
koolhead17 | anteaya: how have you been | 16:23 |
anteaya | koolhead17: good thanks, trying to be helpful without getting in the way | 16:23 |
anteaya | busy time right now | 16:23 |
jeblair | reed: as soon as things are not on fire, i will review your and mrmartin's patches | 16:23 |
koolhead17 | reed: hi there | 16:23 |
koolhead17 | anteaya: what patch are we discussing? | 16:24 |
clarkb | jeblair: just a little more than full time :) | 16:24 |
clarkb | jeblair: I am finally in a chair where I can focus. Is there anything I should look at first/immediately? | 16:24 |
jeblair | clarkb: get the git.o.o load balanced stuff ready to go | 16:24 |
clarkb | ok | 16:25 |
reed | jeblair, thanks | 16:25 |
jeblair | clarkb: i'm working on some simple benchmarking (but obviously even simple benchmarking is going to take a bit) | 16:25 |
anteaya | koolhead17: well, I was helping sridevi with his patch https://review.openstack.org/#/c/34801/ I have a patch up: https://review.openstack.org/#/c/43002/4 and two patches are under consideration hoping they will help the current jenkins/zuul/git issues: https://review.openstack.org/#/c/42784/ https://review.openstack.org/#/c/43012/ | 16:26 |
anteaya | so we have a few to choose from, koolhead17 :D | 16:26 |
anteaya | jeblair clarkb I don't think I know enough to be of use and don't want to slow you down, if there is something you think I can do to help, please tell me | 16:27 |
jeblair | anteaya: thanks; fielding questions like that ^ is _very_ helpful | 16:28 |
anteaya | jeblair: very good, I shall endeavour to do my best | 16:28 |
*** SergeyLukjanov has joined #openstack-infra | 16:28 | |
clarkb | jeblair: any interest in updating the git.pp to possibly run on precise sans cgit? | 16:29 |
*** cthulhup has joined #openstack-infra | 16:29 | |
clarkb | jeblair: not sure if you are interested in testing that, but I think it would be a small change | 16:29 |
jd__ | I've a LOST job here https://review.openstack.org/#/c/42642/ should I open a bug? | 16:30 |
anteaya | jd__: yes | 16:30 |
anteaya | no, no bug | 16:30 |
anteaya | it is a result of a zuul restart this morning | 16:30 |
anteaya | the gearman server lost a thread | 16:30 |
jd__ | anteaya: define "morning"? :) | 16:30 |
anteaya | and as a result there were lost jobs | 16:30 |
anteaya | sorry yes, you are right | 16:30 |
jd__ | ack, I'll recheck no bug then | 16:30 |
anteaya | about 3 hours ago | 16:30 |
anteaya | yes, recheck no bug | 16:31 |
anteaya | thanks | 16:31 |
jeblair | clarkb: no cgit that way | 16:31 |
jd__ | thanks anteaya | 16:31 |
anteaya | :D | 16:31 |
clarkb | jeblair: correct, it would just be a repo mirror | 16:31 |
*** markmc has quit IRC | 16:31 | |
jd__ | btw I wonder, what/where is openstackstatus used? | 16:31 |
jeblair | clarkb: haven't we started using the cgit server? | 16:31 |
anteaya | jd__: where do you see openstackstatus? | 16:31 |
anteaya | I'm on help desk as the fires are being fought | 16:32 |
jeblair | #status alert LOST jobs are due to a known bug; use "recheck no bug" | 16:32 |
openstackstatus | NOTICE: LOST jobs are due to a known bug; use "recheck no bug" | 16:32 |
*** ChanServ changes topic to "LOST jobs are due to a known bug; use "recheck no bug"" | 16:32 | |
*** dina_belova has quit IRC | 16:32 | |
clarkb | jeblair: a little yes. we would probably end up needing to do an additional set of proxying for cgit back to the centos servers. now that I think about it nevermind | 16:32 |
jeblair | clarkb: yeah, i think that's why we decided to just throw hardware at it for now | 16:33 |
jd__ | anteaya: I meant the bot, but now I see it changes the topic :) | 16:33 |
anteaya | jd__: ah okay | 16:33 |
jeblair | jd__: it needs some work; it's not very reliable yet | 16:33 |
jeblair | jd__: eventually we'd like it in all the channels and to have it update web pages | 16:33 |
jeblair | jd__: it's been a while since we've had time to hack on that | 16:34 |
clarkb | jeblair: I am not finding python-yaml or pyyaml in our puppet manifest for cgit. It looks like jeepyb installs it and something on centos is installing it globally? And since centos doesn't do site-packages they interfere? | 16:34 |
jd__ | jeblair: what's its Git repository? | 16:34 |
*** jpich has quit IRC | 16:34 | |
jeblair | jd__: openstack-infra/statusbot | 16:34 |
clarkb | jeblair: I think I am going to ignore that for now as you have a work around | 16:34 |
jeblair | jd__: one of the pre-reqs for all channels is this bug (let me fetch it) | 16:34 |
*** gyee has joined #openstack-infra | 16:34 | |
*** sridevi has quit IRC | 16:35 | |
jeblair | jd__: https://bugs.launchpad.net/openstack-ci/+bug/1190296 | 16:35 |
uvirtbot | Launchpad bug 1190296 in openstack-ci "IRC bot to manage official channel settings" [Medium,Triaged] | 16:35 |
jeblair | (i don't want to add it to 30 channels manually) | 16:35 |
jeblair | jd__: and then it has problems reconnecting on netsplits | 16:35 |
jeblair | don't know if there's a bug for that | 16:36 |
jeblair | clarkb: sounds good | 16:36 |
jd__ | jeblair: ack | 16:36 |
*** dina_belova has joined #openstack-infra | 16:37 | |
mrodden | wow that is dirty... | 16:38 |
anteaya | mrodden: what are you referencing? | 16:38 |
mrodden | when you pip install virtualenv it drops the latest version it can find of pip into $SITE_PACKAGES/virtual_env/ | 16:38 |
mrodden | and it never updates it from then on | 16:39 |
mrodden | and that is what it uses when it creates a new virtualenv | 16:39 |
mrodden | so my virtualenvs were all stuck at pip 1.2.1 | 16:39 |
mrodden | sorry $SITE_PACKAGES/virtualenv_support/ | 16:39 |
mgagne | mordred: since you are the gerrit search master to me, how can you exclude changes which have been reviewed by yourself? | 16:40 |
clarkb | mrodden: correct because virtualenv vendors pip and setuptools and distribute | 16:40 |
mrodden | clarkb: yeah but for some reason it had pip 1.2.1 and also pip 1.4.1 and was only using 1.2.1 | 16:41 |
mrodden | it doesn't enforce that it copies the correct version from that spot | 16:41 |
mrodden | :( | 16:41 |
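(A quick way to see the vendoring behaviour mrodden describes; a sketch only, since the exact paths vary by install. Upgrading virtualenv itself is what refreshes the pip/setuptools copies it seeds into new environments.)

    # list the pip/setuptools versions virtualenv seeds new envs with
    ls "$(python -c 'import virtualenv, os; print(os.path.dirname(virtualenv.__file__))')/virtualenv_support/"
    # upgrading virtualenv refreshes those vendored copies
    pip install -U virtualenv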
anteaya | mgagne: I'm not sure of his status, he was last here an hour ago | 16:41 |
*** Dr01d has quit IRC | 16:41 | |
mgagne | anteaya: thanks, I can wait =) | 16:42 |
anteaya | k | 16:42 |
anteaya | I'm sure once he is on planes he will pop in again | 16:42 |
anteaya | I am afk for about 30 minutes, I have to give a hand to a family member | 16:43 |
*** dkranz has quit IRC | 16:45 | |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 16:45 |
clarkb | jeblair: ^ I believe that is in a reviewable state. I am going to --noop apply it to git.o.o now | 16:45 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 16:47 |
clarkb | and that addressed one more review comment | 16:47 |
*** AJaeger has joined #openstack-infra | 16:47 | |
jeblair | clarkb: the switch could be hairy; if it doesn't work, we end up with a lot of failed jenkins jobs | 16:49 |
clarkb | jeblair: yup | 16:49 |
clarkb | jeblair: how do you feel about putting jenkins* into shutdown mode while we do it? | 16:49 |
jeblair | clarkb: may want shut down puppet and apply it to a test node first | 16:49 |
*** cthulhup has quit IRC | 16:49 | |
jeblair | clarkb: at the current rate, you'd still have to wait like 30 minutes for the git processes to finish | 16:50 |
*** vogxn has quit IRC | 16:50 | |
clarkb | jeblair: just the git processes? | 16:50 |
clarkb | wow | 16:50 |
jeblair | clarkb: last i looked, the devstack-gate prep steps were taking a looong time | 16:50 |
jeblair | clarkb: i have 3 test nodes we can run it on. :) | 16:51 |
jeblair | 8 192.237.168.226 | 16:51 |
jeblair | 15 162.209.12.127 | 16:51 |
jeblair | 30 198.101.151.5 | 16:51 |
jeblair | clarkb: ^ | 16:52 |
jeblair | clarkb: (first column is memory) | 16:52 |
clarkb | jeblair: ok I can hijack one of them and change its certname so that it gets the haproxy stuff | 16:52 |
jeblair | clarkb: please; i ran 'puppet apply --test --certname git.openstack.org' | 16:52 |
clarkb | jeblair: also, this is a multistep process. The change above will only add haproxy and move the apache vhosts and git daemon to offset ports | 16:53 |
jeblair | clarkb: take the 15g one | 16:53 |
clarkb | jeblair: it won't do load balancing until we get another change or two in to replicate to the other hosts and balance across them with haproxy | 16:53 |
clarkb | jeblair: ok | 16:53 |
jeblair | clarkb: yeah, i like the process; it's just the port move that i'm worried about | 16:53 |
clarkb | jeblair: should I be running a bunch of clones against the 15g node while I apply puppet? | 16:54 |
jeblair | clarkb: er, were their firewall changes? | 16:54 |
jeblair | there even | 16:54 |
clarkb | jeblair: ya my latest patchset adds firewall changes | 16:54 |
clarkb | to allow 4443 and 8080 and 29418 | 16:55 |
jeblair | ah, i see it now. | 16:55 |
clarkb | I am not restricting access to those ports as they are all read only anyways | 16:55 |
jeblair | clarkb: honestly, i wouldn't worry about it. if there's a blip, we can deal. it's more that if it's actually offline for more than 30 seconds we would be very unhappy | 16:56 |
clarkb | starting with a --noop on the 15g node | 16:56 |
jeblair | clarkb: we can also do the jenkins shutdown idea, to reduce the impact | 16:56 |
clarkb | jeblair: the port change for apache didn't go in so haproxy wouldn't start. Looking into that now | 16:57 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 16:59 |
clarkb | that should do it, testing | 16:59 |
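(For reference, the port layout clarkb describes would look roughly like the haproxy sketch below: haproxy owns 443/80/9418 and hands off to apache and git-daemon on the offset ports. This is an assumed illustration, not the contents of the change under review.)

    listen git-https
        bind 0.0.0.0:443
        mode tcp
        server local-apache 127.0.0.1:4443

    listen git-http
        bind 0.0.0.0:80
        mode tcp
        server local-apache 127.0.0.1:8080

    listen git-daemon
        bind 0.0.0.0:9418
        mode tcp
        server local-git-daemon 127.0.0.1:29418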
*** nati_ueno has joined #openstack-infra | 16:59 | |
*** ruhe has quit IRC | 17:04 | |
clarkb | apache isn't letting go of 443 and 80. Looks to be set to listen on those ports in the default configs | 17:04 |
pleia2 | clarkb: yeah, /etc/httpd/conf/httpd.conf has Listen 80 (looking around for https) | 17:06 |
pleia2 | might have a /etc/httpd/conf.d/ssl.conf too | 17:06 |
clarkb | pleia2: yup | 17:06 |
*** portante has joined #openstack-infra | 17:06 | |
clarkb | pleia2: we are not managing those with puppet are we? | 17:06 |
pleia2 | clarkb: nope | 17:06 |
portante | clarkb: ran into a swift tox issue, http://paste.openstack.org/show/44776/ | 17:07 |
*** nicedice_ has joined #openstack-infra | 17:07 | |
*** ^d has joined #openstack-infra | 17:07 | |
portante | do you know what I should do to fix this? | 17:07 |
clarkb | jeblair: appropriate to just copy what we have there now into a puppet template and toggle the ports? | 17:07 |
clarkb | jeblair: any better ideas? | 17:07 |
portante | clarkb: that is a swift tox issue related to missing "pbr" package | 17:07 |
clarkb | portante: it looks like you have an old version of pbr installed. can you try tox -re pep8? | 17:08 |
*** david-lyle has quit IRC | 17:08 | |
*** ftcjeff_ has quit IRC | 17:08 | |
*** ftcjeff has quit IRC | 17:08 | |
portante | k | 17:08 |
jeblair | clarkb: apache module doesn't deal with it? | 17:08 |
clarkb | jeblair: oh maybe /me looks | 17:08 |
*** david-lyle has joined #openstack-infra | 17:08 | |
*** ftcjeff has joined #openstack-infra | 17:08 | |
*** ftcjeff_ has joined #openstack-infra | 17:09 | |
*** SergeyLukjanov has quit IRC | 17:09 | |
*** UtahDave has quit IRC | 17:09 | |
*** dina_belova has quit IRC | 17:10 | |
portante | clarkb: weird, old version in /usr/lib but why should that affect tox? | 17:11 |
clarkb | portante: if you have site packages enabled in tox it will use your site packages | 17:12 |
clarkb | portante: site packages should probably be disabled if it is enabled (I believe the only project that needs it is nova for libvirt) | 17:12 |
*** dina_belova has joined #openstack-infra | 17:12 | |
clarkb | jeblair: ssl.conf is already vendored by us (and not by puppetlabs-apache). I will just do the same with httpd.conf and set the ports dynamically | 17:13 |
*** fbo is now known as fbo_away | 17:13 | |
jeblair | clarkb: sounds good | 17:13 |
BobBall | wow the gate is queued up a lot! I hadn't been watching! | 17:14 |
pleia2 | clarkb: right, sorry, I did use the ssl one for our certificates (I should not rely on memory!) | 17:14 |
*** dina_belova has quit IRC | 17:15 | |
burt | speaking of the gate: will 38697,2 automatically get restarted, or should I do a reverify no bug ? | 17:15 |
burt | (looks like the python27 job was killed in the middle, https://jenkins01.openstack.org/job/gate-nova-python27/1231/console) | 17:16 |
*** lifeless has quit IRC | 17:16 | |
portante | clarkb: I believe tox's default is to NOT use global packages, and I can't find anything in our tox.ini file that sets it to true | 17:16 |
clarkb | portante: correct the default should be to not use it. The way to toggle it is with sitepackages = true iirc | 17:17 |
*** lifeless has joined #openstack-infra | 17:17 | |
clarkb | portante: however, I think your virtualenvs may be stale as well | 17:17 |
portante | I removed my entire .tox tree | 17:17 |
clarkb | portante: if you do a .tox/pep8/bin/pip freeze do you see pbr | 17:17 |
clarkb | portante: oh. Do you still see the error? | 17:17 |
*** zaro has quit IRC | 17:17 | |
portante | not now, because I removed the /usr/lib/python2.7/site-packages/pbr* directory in order to make progress | 17:18 |
portante | clarkb: and yes, now I do see the correct pbr version in the freeze output | 17:19 |
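The tox option under discussion is a one-line toggle in tox.ini; a minimal sketch (not swift's actual file), shown with the default value since, as noted above, projects normally leave it off unless they need system packages such as the libvirt bindings:

    [testenv]
    sitepackages = False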
*** ryanpetrello has quit IRC | 17:19 | |
*** ryanpetrello has joined #openstack-infra | 17:20 | |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 17:20 |
anteaya | back | 17:20 |
clarkb | portante: this is for swift? I am going to take a quick peek at the tox.ini | 17:21 |
*** mordred has quit IRC | 17:22 | |
portante | clarkb: yes, thanks | 17:22 |
*** dmakogon_ has joined #openstack-infra | 17:22 | |
anteaya | BobBall: yes, large queue; much work happening to address it | 17:23 |
*** rcleere has joined #openstack-infra | 17:23 | |
clarkb | pleia2: any idea of how to make selinux allow apache to listen on ports 8080 and 4443? | 17:23 |
jeblair | clarkb: semanage port -a -t http_port_t -p tcp 8080 | 17:24 |
clarkb | jeblair: we should puppet that :) | 17:24 |
anteaya | burt: right now my best advice is to reverify | 17:24 |
anteaya | if I am wrong it is on me | 17:24 |
* clarkb looks at puppet selinux docs | 17:25 | |
pleia2 | clarkb: I'll poke around the puppet module | 17:25 |
jeblair | clarkb: shouldn't be hard (if it isn't already) semanage lets you query and add | 17:25 |
*** yolanda has quit IRC | 17:25 | |
*** jpeeler has quit IRC | 17:25 | |
burt | anteaya: thanks, will do | 17:25 |
*** ruhe has joined #openstack-infra | 17:26 | |
anteaya | burt welcome | 17:26 |
clarkb | jeblair: I suppose I can add a couple execs if nothing else | 17:26 |
pleia2 | clarkb: actually, puppet module won't do this, we'll probably need to do something like I did with restorecons | 17:27 |
jeblair | what _does_ the module do? :) | 17:27 |
pleia2 | turns it on and off, loads more modules | 17:27 |
pleia2 | it's pretty simple | 17:27 |
jeblair | everything except managing selinux :) | 17:27 |
pleia2 | yeah, there is at least one managing one out there but it wasn't very good | 17:28 |
*** BobBall is now known as BobBallAway | 17:28 | |
jeblair | :( | 17:28 |
*** mordred has joined #openstack-infra | 17:30 | |
jeblair | gah, one of my test worker nodes is a dud; takes 1:40 to clone nova alone (standard is 0:22) | 17:31 |
jeblair | (i find i'm benchmarking the clients before i can benchmark the server) | 17:31 |
pleia2 | clarkb: we'll also need to add the policycoreutils-python package (that's what has semanage) | 17:32 |
clarkb | pleia2: ya just discovered that | 17:32 |
clarkb | jeblair: :( | 17:32 |
jeblair | the other 9 are ok though. :) | 17:33 |
anteaya | mordred: I think this was the only comment I saw directed at you since you were last here: <mgagne> mordred: since you are the gerrit search master to me, how can you exclude changes which have been reviewed by yourself? | 17:33 |
*** jpeeler has joined #openstack-infra | 17:33 | |
*** arezadr has quit IRC | 17:35 | |
pleia2 | clarkb: looks like selinux already gave 8080 away: http_cache_port_t tcp 3128, 8080, 8118, 8123, 10001-10010 | 17:36 |
pleia2 | get a "/usr/sbin/semanage: Port tcp/8080 already defined" error when trying to set it again | 17:36 |
*** morganfainberg is now known as morganfainberg|a | 17:38 | |
pleia2 | ah: semanage port -m -t http_port_t -p tcp 8080 (-m to modify, rather than -a to add port def) | 17:38 |
clarkb | pleia2: I am going to brute force it to allow other potential ports. I think I can do this with the onlyif exec clause | 17:38 |
clarkb | or I can use -m thanks | 17:38 |
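Put together, the add-or-modify approach sketched above amounts to something like this on the git servers (a shell sketch; the second port and the grep pattern are illustrative):

    # define (or redefine) the offset ports as http_port_t so Apache may bind them
    semanage port -a -t http_port_t -p tcp 8080 || semanage port -m -t http_port_t -p tcp 8080
    semanage port -a -t http_port_t -p tcp 4443 || semanage port -m -t http_port_t -p tcp 4443
    # semanage also lets you query the current definitions
    semanage port -l | grep http_port_t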
Alex_Gaynor | :/ we really need fewer failures in the gate pipeline | 17:39 |
mordred | mgagne: you can't | 17:41 |
mordred | mgagne: that's the reason I do the two passes with the star | 17:41 |
mgagne | mordred: sad panda. Sad that you can't replicate the behaviour of the "Previously Reviewed By" section | 17:42 |
*** cthulhup has joined #openstack-infra | 17:42 | |
mgagne | mordred: is this section openstack specific? | 17:42 |
*** rnirmal has quit IRC | 17:43 | |
*** SergeyLukjanov has joined #openstack-infra | 17:43 | |
mordred | mgagne: yes. we had to write java to get that | 17:44 |
mordred | Alex_Gaynor: if anyone ever says that testing code as it's uploaded rather than doing the work we do to test as it would land is sufficient, they should watch our gate resets | 17:46 |
*** cthulhup has quit IRC | 17:47 | |
Alex_Gaynor | mordred: Seriously. A decent portion of recent failures are from flaky tests or people approving patches before their jenkins run happens though. We really need to cut down on this | 17:47 |
mordred | every time something fails in the gate pipeline, it's a testament to just how complex this openstack thing we're testing really is. oy | 17:47 |
Alex_Gaynor | s/this/those/ | 17:47 |
Alex_Gaynor | each one costs us like an hour | 17:47 |
mordred | Alex_Gaynor: yes. we really do | 17:47 |
mordred | and _seriously_ ? people are approving in this climant before the check job finishes? | 17:47 |
mordred | s/climant/climate/ | 17:48 |
mtreinish | Alex_Gaynor: https://review.openstack.org/#/c/41797/ that will drop that reset time down | 17:48 |
Alex_Gaynor | maybe not today, but I've definitely seen it before | 17:48 |
mtreinish | but at the cost of a bit more flakiness | 17:48 |
Alex_Gaynor | mtreinish: only one way to find out! | 17:48 |
Alex_Gaynor | (if it's worth it) | 17:48 |
*** afazekas has joined #openstack-infra | 17:48 | |
Alex_Gaynor | mtreinish: we going to land that once check passes? | 17:48 |
mtreinish | Alex_Gaynor: we can, but I wasn't planning on doing it until 2 race fixes get through the gate (we can just stack it on the end) | 17:49 |
mtreinish | here's the graphs I've been watching https://tinyurl.com/kmwsvob | 17:50 |
Alex_Gaynor | mtreinish: probably best to wait for those to be fully landed, given how the gate is right now :/ | 17:50 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 17:51 |
mtreinish | Alex_Gaynor: yeah, the other problem is those 2 reviews don't fix the 3 most common flaky parallel fails I've been seeing in the gate pipeline | 17:51 |
clarkb | pleia2: ^ I think that should work. you can't -m an existing thing so I do -a and if that fails -m | 17:51 |
pleia2 | clarkb: "can't -m an non-existing thing" I think you mean, but yes, good call | 17:52 |
* pleia2 reviews | 17:53 | |
clarkb | pleia2: yah non-existing. I can type I swear | 17:53 |
clarkb | woot dependency cycle | 17:53 |
*** dina_belova has joined #openstack-infra | 17:54 | |
pleia2 | clarkb: how are we handling git daemon's port? | 17:55 |
pleia2 | my patch? | 17:57 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 17:57 |
clarkb | pleia2: ya | 17:57 |
pleia2 | ok cool | 17:58 |
clarkb | which is working fine best I can tell | 17:58 |
*** ruhe has quit IRC | 17:58 | |
mordred | Alex_Gaynor, pleia2, clarkb can I get a read on this before I send it to the dev list? | 17:59 |
Alex_Gaynor | mordred: what's "this"? | 18:00 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 18:00 |
mordred | haha | 18:00 |
mordred | how about I paste the link | 18:00 |
pleia2 | :) | 18:00 |
mordred | http://paste.openstack.org/show/44785/ | 18:00 |
Alex_Gaynor | wouldn't hurt :) | 18:00 |
mordred | I want to be clear, not too bitchy or accusing, and also not indicate panic | 18:00 |
clarkb | I am going to kill this dependency cycle darnit | 18:00 |
mordred | clarkb: I believe in you | 18:01 |
pleia2 | mordred: looks good to me | 18:01 |
Alex_Gaynor | mordred: looks good to me | 18:01 |
mordred | thanks | 18:01 |
jeblair | does anyone want to become (even more of) a git expert? | 18:02 |
mordred | jeblair: sure | 18:02 |
*** zehicle_at_dell has quit IRC | 18:02 | |
jeblair | i think we need to get a handle on the refs/changes issue | 18:02 |
*** AJaeger has quit IRC | 18:03 | |
jeblair | because a very simple test (cloning nova with and without refs/changes) is about a 2x difference in speed | 18:03 |
mordred | as in, how that affects a remote update? | 18:03 |
jeblair | but it's _complicated_ | 18:03 |
anteaya | mordred: I would reiterate your tl;dr before you sign off | 18:03 |
anteaya | just in case they love your prose so much, they forget the point | 18:03 |
jeblair | so i don't want a simple "oh, let's just not replicate refs/changes" before we _understand_ it | 18:03 |
mordred | jeblair: can you give a summary of what's complicated? | 18:04 |
jeblair | things that may impact the issue are whether the refs are in the repo at all, whether they are there and packed, and whether our clients or servers are appropriately (not) advertising them on initial connect | 18:04 |
jeblair | see this thread: http://thread.gmane.org/gmane.comp.version-control.git/126797/focus=127059 | 18:04 |
jeblair | i don't know if that landed, or what | 18:05 |
jeblair | anyway, i will get around to understanding that, but i don't want that to distract from our work on 'just add more mirrors of what we have' for now | 18:05 |
jeblair | so if anyone makes some headway into that before we get to that optimization point, it would be useful | 18:06 |
*** fbo_away is now known as fbo | 18:06 | |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 18:06 |
mordred | jeblair: I will read that and other things and see if I can drop some knowledge | 18:07 |
jeblair | mordred: awesome, thx | 18:08 |
clarkb | that last patchset makes me really sad | 18:08 |
clarkb | I am running bash in a puppet exec so that I can easily negate the return code of a command in the onlyif | 18:08 |
clarkb | of course I probably forgot to update the path and it will fai | 18:08 |
*** rnirmal has joined #openstack-infra | 18:08 | |
* anteaya hands clarkb an "l" | 18:09 | |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 18:09 |
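The onlyif trick clarkb describes is to wrap the check in bash so its exit status can be negated; roughly (the grep pattern here is illustrative, not the patchset's exact command):

    # succeed (and so let the exec run) only when the port is NOT yet labelled http_port_t
    bash -c '! semanage port -l | grep -E "^http_port_t.*\b8080\b"'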
*** datsun180b has quit IRC | 18:11 | |
mordred | jeblair: jumping to thoughts - what if we make our remote refspecs on the build slaves more specific | 18:12 |
*** cthulhup has joined #openstack-infra | 18:13 | |
jeblair | mordred: maybe; i'd want to understand what it's doing now though (what does git remote update do? how does it relate to the (non-)advertisement of refs?) | 18:14 |
*** marun has quit IRC | 18:14 | |
anteaya | mordred jeblair not sure if this helps or not, but in this post I did as an intro to git I posted the changes to refs and logs/refs as I went along: http://anteaya.info/blog/2013/02/26/the-structure-and-habits-of-git/ | 18:14 |
clarkb | jeblair: that latest patchset mostly works. I am not entirely convinced it will restart apache before attempting to start haproxy, but we can do multiple passes really quickly if we need it | 18:14 |
clarkb | jeblair: its tricky to get that right because I kept running into dependency cycles. | 18:14 |
clarkb | jeblair: but the 15g node is now running apache and git-daemon behind haproxy | 18:14 |
jeblair | clarkb: you should be able to clone nova | 18:15 |
anteaya | but I didn't create or track remote branches or refs, so I don't answer that question | 18:15 |
jeblair | (the others don't exist) | 18:15 |
clarkb | jeblair: ok testing | 18:15 |
clarkb | jeblair: git clone git://162.209.12.127/openstack/nova works | 18:16 |
adalbas | hi! some jobs in the gate (looking at devstack-testr-vm-full) are showing this error 'ERROR:root:Could not find any typelib for GnomeKeyring'. Anyone noticed that and know what this is about? | 18:16 |
*** xBsd has joined #openstack-infra | 18:17 | |
jeblair | clarkb: http? use GIT_SSL_NO_VERIFY=true | 18:17 |
clarkb | jeblair: and https is failing because the development hiera does not have the ssl cert | 18:17 |
anteaya | adalbas: that is a bug | 18:17 |
jeblair | clarkb: are you sure? i thought dev hiera was prod hiera? | 18:17 |
clarkb | jeblair: I don't think it is, but I will double check | 18:18 |
anteaya | it shouldn't affect the outcome of the tests adalbas | 18:18 |
adalbas | anteaya, yeah, i realized that. Is there a bug opened for that anyway? | 18:18 |
clarkb | jeblair: nevermind it is a symlink. I will look into this more closely | 18:18 |
anteaya | adalbas: looking | 18:18 |
*** fbo is now known as fbo_away | 18:18 | |
jeblair | clarkb: it _should_ install the cert for git, which you should be able to ignore with that env var | 18:19 |
*** marun has joined #openstack-infra | 18:19 | |
adalbas | anteaya, i found this one: https://bugs.launchpad.net/devstack/+bug/1193164 | 18:19 |
uvirtbot | Launchpad bug 1193164 in devstack "GnomeKeyring errors when installing devstack" [Undecided,New] | 18:19 |
*** boris-42 has joined #openstack-infra | 18:19 | |
clarkb | jeblair: it isn't installing the cert at all so we can't ignore the error (I think apache is failing to do anything at that point) | 18:19 |
anteaya | adalbas: that's the one | 18:20 |
adalbas | anteaya, tks! | 18:20 |
anteaya | adalbas: np | 18:20 |
clarkb | jeblair: error does change when using the GIT_SSL_NO_VERIFY flag | 18:20 |
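The environment variable jeblair suggests is used like this when cloning from a test node whose certificate doesn't match (IP taken from the discussion above):

    GIT_SSL_NO_VERIFY=true git clone https://162.209.12.127/openstack/nova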
ttx | jeblair: about mordred's suggestion of not approving before checks are run... is it something we could enforce ? I can see benefits for it even outside of the FF craze. | 18:21 |
*** xBsd has quit IRC | 18:21 | |
jeblair | ttx: probably; occasionally it's useful. worth thinking about | 18:22 |
Alex_Gaynor | ttx: So, FWIW when I first got involved in OpenStack, the way I thought it worked was that there wasn't an explicit "Approve" state, that instead stuff was approved when jenkins passed and it had the needed +2s. Such a model might be interesting to explore. | 18:22 |
clarkb | jeblair: https://162.209.12.127/openstack/nova/info/refs not found falling back on the dumb client? | 18:23 |
mordred | Alex_Gaynor: that's where we started, actually | 18:23 |
ttx | Alex_Gaynor: we kinda want the APRV because sometimes there is a timing constraint. So you can have two +2s but waiting for something to happen before hitting APRV | 18:23 |
*** AJaeger has joined #openstack-infra | 18:23 | |
mordred | that too. but the effect on the gate would be largely the same if we triggered a gate run directly on the second +2 | 18:24 |
Alex_Gaynor | how many builders do we have right now for non-devstack builds? | 18:24 |
mordred | which is that the second +2 could jump the initial vrfy and trigger the gate testing anyway | 18:24 |
ttx | mordred: it wouldn't be completely insane to require that check tests pass before adding something to the gate queue. At least for some pipes | 18:25 |
jeblair | (also, why would you never want more than 2 core reviewers to review something?) | 18:25 |
clarkb | oh I know. I need to put git.openstack.org in the request. /me edits /etc/hosts locally | 18:25 |
*** fbo_away is now known as fbo | 18:25 | |
*** datsun180b has joined #openstack-infra | 18:25 | |
*** xBsd has joined #openstack-infra | 18:27 | |
*** woodspa has joined #openstack-infra | 18:28 | |
jeblair | clarkb: why? | 18:31 |
jeblair | clarkb: (the other servers don't require that) | 18:31 |
clarkb | jeblair: because the 4443 vhost is for git.openstack.org otherwise you get the default vhost | 18:32 |
jeblair | clarkb: why don't we make the 4443 accept all hostnames? | 18:33 |
clarkb | jeblair: we can do that as well. Remove the default vhost and put a * in the git.openstack.org vhost | 18:33 |
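The vhost change being described would look roughly like this in the Apache configuration (a sketch only; the surrounding vhost body and file layout are assumptions):

    Listen 4443
    # no default vhost on 4443; the git vhost answers for any hostname
    <VirtualHost *:4443>
        ServerName git.openstack.org
        ServerAlias *
        # SSL and cgit/git-http-backend configuration omitted
    </VirtualHost>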
clarkb | but now I appear to have haproxy logging issues. It wants to log to rsyslog via udp | 18:34 |
clarkb | error: gnutls_handshake() failed: A TLS warning alert has been received. is the current error | 18:34 |
mordred | jeblair: ok. I think I have learned new things | 18:41 |
reed | bbl | 18:41 |
*** reed has quit IRC | 18:41 | |
jeblair | clarkb, mordred: https://etherpad.openstack.org/git-lb | 18:42 |
jeblair | dinky benchmarks | 18:42 |
jeblair | i think we should use 8g nodes instead of 30g; and lots of them. | 18:43 |
jeblair | mordred: what have you learned? | 18:44 |
clarkb | wow those numbers are very close to each other | 18:44 |
sdake_ | is the gate broken ? | 18:44 |
mordred | jeblair: ah. nope. | 18:45 |
mordred | jeblair: I did not learn something | 18:45 |
anteaya | sdake_: what are you seeing that prompts the question? | 18:46 |
anteaya | the gate is very very slow but it should still be running | 18:46 |
jeblair | mordred: (i have learned that git.o.o has a partial packed-refs file; i suspect it has something to do with how it was created (maybe an initial git clone --mirror or something)) | 18:46 |
jeblair | mordred: 28k refs are in packed refs, 9k are loose | 18:47 |
mordred | jeblair: interesting | 18:47 |
jeblair | mordred: review.o.o is all unpacked | 18:47 |
anteaya | the check queue however is filled with unknown rather than a time | 18:47 |
mordred | jeblair: I'm breaking down and asking spearce questions directly | 18:47 |
jeblair | anteaya: waiting on centos nodes for py26 tests | 18:47 |
sdake_ | anteaya apparently heat gate jobs are going slowly | 18:47 |
sdake_ | but they appear to make progress according to devs in the heat channel - but thanks for responding | 18:47 |
anteaya | sdake_: yes all gate jobs are going slowly | 18:48 |
anteaya | yes absolutely | 18:48 |
jeblair | which probably means we should add more centos nodes | 18:48 |
anteaya | jeblair: great thanks | 18:48 |
anteaya | go go centos nodes | 18:48 |
jeblair | clarkb: they are actually close enough that i want to spin up a 4g and 2g node (they both have 2vcpus; half of 8g's 4vcpu) | 18:50 |
*** cthulhup has quit IRC | 18:51 | |
clarkb | jeblair: good idea | 18:52 |
clarkb | I am going to stop using haproxy for the http to https redirect. I don't think that works with the tcp mode | 18:53 |
mordred | jeblair: best I can tell, the patch did not land, nor any patches like it | 18:53 |
*** danger_fo_away is now known as danger_fo | 18:54 | |
jeblair | mordred: :( | 18:54 |
clarkb | I had a hunch this would be the case which is why I kept the 8080 vhost | 18:54 |
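The reason the redirect has to stay in Apache: in TCP mode haproxy only relays bytes and never speaks HTTP, so HTTP-mode features such as redirect rules cannot apply. A minimal sketch of the kind of frontend in question (names and ports are illustrative):

    frontend git-https
        mode tcp
        bind :443
        default_backend git-https
        # no HTTP-level rules (e.g. a redirect to https) are possible in tcp mode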
mordred | jeblair: I'm continuing to dig though | 18:54 |
jeblair | those are launching now; i need to get exercise and lunch; should be back in about 1 hour | 18:54 |
*** sarob has joined #openstack-infra | 18:59 | |
clarkb | removing the default virtualhost and matching * on the git.o.o vhost makes things work for some reason. I am not complaining; patchset incoming | 19:04 |
mordred | jeblair: uploadpack.hiderefs | 19:08 |
mordred | jeblair: it's in 1.8.2 | 19:09 |
mordred | which means we'd almost certainly want the fetch-from repos to be on precise so that we could install latest git from the git ppa | 19:10 |
mordred | clarkb: ^^ | 19:10 |
clarkb | mordred: ugh | 19:10 |
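For reference, the option mordred found is set through git config, per repository or system-wide, and takes a ref prefix to hide from advertisement; it only exists in git 1.8.2 and later, hence the packaging discussion that follows. A sketch (the repository path is an assumption):

    git --git-dir=/var/lib/git/openstack/nova.git config uploadpack.hiderefs refs/changes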
*** gordc has quit IRC | 19:10 | |
clarkb | I think we either get cgit on precise or new git on centos | 19:11 |
mordred | oh yeah? | 19:11 |
clarkb | because those seem less painful than a complicated proxy mess to send cgit to centos boxes and everything else to precise boxes | 19:11 |
mordred | nod | 19:11 |
mordred | how awful is getting git >=1.8.2 on centos alongside our cgit install? | 19:12 |
mordred | pleia2: ^^ ? | 19:12 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 19:13 |
pleia2 | mordred: pretty awful | 19:13 |
mordred | SWEET | 19:13 |
pleia2 | would have to load up a 3rd party rpm, which makes me :( | 19:13 |
mordred | where are we getting cgit from? | 19:14 |
pleia2 | epel | 19:14 |
mordred | epel has cgit and not git >=1.8 ? | 19:14 |
pleia2 | as I understand it, epel is just "other stuff" not so much backports | 19:14 |
clarkb | ugh I just derped and started a git fetch of 42784 into the hiera repo... | 19:14 |
* clarkb makes a note to clean up that repo when this is all done | 19:15 | |
clarkb | hiera itself should be fine as I didn't check out anything in that repo | 19:15 |
mordred | pleia2: I understood the opposite - that epel is backports of current fedora for old centos/rhel | 19:15 |
mordred | but - I REALLY don't understand | 19:15 |
*** AJaeger has quit IRC | 19:15 | |
clarkb | mordred: maybe you can take a look at that repo to make sure I didn't hose anything? second set of eyes and all that | 19:15 |
mordred | clarkb: all you did was fetch? | 19:16 |
pleia2 | Does EPEL replace packages provided within Red Hat Enterprise Linux or layered products? | 19:16 |
pleia2 | No. EPEL is purely a complementary repository that provide add-on packages. | 19:16 |
clarkb | mordred: yes | 19:16 |
*** zaro has joined #openstack-infra | 19:17 | |
anteaya | hey zaro | 19:17 |
mordred | ok. then I do not have a good answer | 19:17 |
clarkb | mordred: http://paste.openstack.org/show/44788/ I ^C'd before the checkout | 19:17 |
pleia2 | I mean, we can just use an rpm | 19:17 |
mordred | well, cgit is compiled against git | 19:18 |
pleia2 | oh, that | 19:18 |
mordred | isn't it? so wouldn't that screw the cgit install too? | 19:18 |
pleia2 | I'm not sure | 19:18 |
mordred | or - wait - no, they do static linking | 19:18 |
mordred | that's why it's not in ubuntu | 19:18 |
pleia2 | er, hooray for static linking? | 19:18 |
pleia2 | :) | 19:18 |
mordred | clarkb: yeah. you're fine | 19:18 |
pleia2 | I can find a nice looking rpm and install it on my test system | 19:19 |
clarkb | mordred: ok, is that something we should git gc? | 19:19 |
mordred | clarkb: not this week | 19:19 |
mordred | :) | 19:19 |
*** beagles has quit IRC | 19:20 | |
clarkb | ya I am not terribly worried about it, but I should probably clean that up at some point. I will write a note on the whiteboard | 19:20 |
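Cleaning up the stray fetch later is straightforward: the fetched objects are referenced only by FETCH_HEAD, so something like the following, run inside the hiera repo, would drop them (a sketch, not a step anyone committed to in the log):

    rm -f .git/FETCH_HEAD
    git gc --prune=now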
clarkb | jeblair: ^ | 19:20 |
*** sarob has quit IRC | 19:21 | |
*** pblaho has joined #openstack-infra | 19:21 | |
*** sarob has joined #openstack-infra | 19:21 | |
zaro | anteaya: hello! | 19:22 |
anteaya | welcome to the party | 19:23 |
anteaya | jeblair noticed that when restarting zuul a gearman thread dropped resulting in slaves sticking around and tests running on them, but they were orphaned | 19:23 |
anteaya | so the logs from the tests were lost | 19:24 |
anteaya | <jeblair> mordred, clarkb, zaro: when the gearman server restarts, i think the executorworkerthread dies, which means the offline-on-complete feature fails | 19:25 |
anteaya | * xBsd has quit (Quit: xBsd) | 19:25 |
anteaya | <jeblair> mordred, clarkb, zaro: which is why a lot of jobs are showing up as lost right now -- they are re-running on hosts that should have been offlined | 19:25 |
anteaya | * michchap (~michchap@60-242-111-85.tpgi.com.au) has joined #openstack-infra | 19:25 |
anteaya | <jeblair> so for the moment, if we stop zuul, we need to delete all the slaves | 19:25 |
*** AJaeger has joined #openstack-infra | 19:25 | |
*** AJaeger has joined #openstack-infra | 19:25 | |
anteaya | zaro from about 4.5 hours ago | 19:25 |
*** beagles has joined #openstack-infra | 19:25 | |
*** sarob has quit IRC | 19:25 | |
clarkb | ok lunch time back shortly | 19:28 |
zaro | anteaya: sorry i missed it all. i was deep in gerrit. | 19:28 |
anteaya | clarkb: happy lunch | 19:28 |
anteaya | zaro: understandable | 19:28 |
zaro | anteaya: lunching with clarkb so will think about it after food. | 19:28 |
anteaya | zaro: happy food | 19:28 |
ttx | anteaya: the "available test nodes" graph at bottom of zuul status page looks a bit funny. Since you've been following the action, is it considered normal ? | 19:29 |
pleia2 | tsk, RPMForge is popular but for centos they only have up to git 1.7.11 | 19:29 |
ttx | 10 is the new 0 | 19:29 |
anteaya | ttx: I asked the same question at the start of my day today | 19:30 |
anteaya | it means that we are using all available nodes, really | 19:31 |
anteaya | so yes 10 is the new 0 | 19:31 |
anteaya | ttx when I asked jeblair the same question this morning he responded with this image: http://graphite.openstack.org/render/?from=-24hours&fgcolor=000000&title=Test%20Nodes&_t=0.8664466904279092&height=308&bgcolor=ffffff&width=586&until=now&showTarget=color%28alias%28sumSeries%28stats.gauges.nodepool.target.*.devstack-precise.*.ready%29%2C%20%27devstack-precise%27%29%2C%20%27green%27%29&_salt=1376751567.43&target=alias%28sum | 19:31 |
anteaya | Series%28stats.gauges.nodepool.target.*.devstack-precise.*.building%29%2C%20%27Building%27%29&target=alias%28sumSeries%28stats.gauges.nodepool.target.*.devstack-precise.*.ready%29%2C%20%27Ready%27%29&target=alias%28sumSeries%28stats.gauges.nodepool.target.*.devstack-precise.*.used%29%2C%20%27Used%27%29&target=alias%28sumSeries%28stats.gauges.nodepool.target.*.devstack-precise.*.delete%29%2C%20%27Delete%27%29&areaMode=stacked | 19:31 |
anteaya | oh goodness sorry about that | 19:32 |
anteaya | ick | 19:32 |
ttx | ok, was expecting something like this. "free node" graphs are always a bit funny in a dynamic allocation system | 19:32 |
anteaya | yes | 19:32 |
ttx | anteaya: could you tinyurl that for me ? | 19:32 |
anteaya | ttx: https://tinyurl.com/kmotmns | 19:33 |
anteaya | better | 19:33 |
mordred | ok. plane landing. I _may_ get on for a minute at the hotel tonight, but in general I'm switching to driving large trucks and building steel structures in the hot sun | 19:34 |
mordred | and, you know, burning the man | 19:35 |
anteaya | happy sand, mordred | 19:35 |
pleia2 | mordred: have fun! (or whatever you're supposed to have at burning man :)) | 19:36 |
anteaya | whatever it is, it doesn't include water or shade | 19:36 |
*** vipul is now known as vipul-away | 19:36 | |
*** vipul-away is now known as vipul | 19:36 | |
*** boris-42 has quit IRC | 19:37 | |
ttx | Interesting thing... FeatureProposalFreezes should overflow the checks, not the gate pipeline. FeatureFreeze will overflow the gate pipeline. That should really be fun | 19:38 |
ttx | (i.e. people are supposed to propose stuff, not so much approve them) | 19:39 |
anteaya | when is the date for FeatureProposalFreezes? | 19:40 |
mgagne | anteaya: August 21 for nova and cinder -> https://wiki.openstack.org/wiki/Havana_Release_Schedule | 19:41 |
mgagne | anteaya: today =) | 19:41 |
anteaya | mgagne: thank you | 19:41 |
anteaya | ah ha | 19:42 |
anteaya | funny it has been neutron and heat we have heard from today | 19:42 |
anteaya | cinder and nova have been relatively quiet in this channel | 19:43 |
*** melwitt has joined #openstack-infra | 19:43 | |
pleia2 | clarkb: so the only reasonable, new git rpms that people use are from http://pkgs.repoforge.org/git/ (might find some random ones on some-person's-blog if I search more, but I haven't yet, and even then...), repoforge only goes up to 7.11, the other option is installing from source :\ | 19:44 |
pleia2 | er, 1.7.11 | 19:44 |
*** pblaho has quit IRC | 19:45 | |
* pleia2 lunch | 19:46 | |
anteaya | happy lunch | 19:46 |
anteaya | guess it is just me right now | 19:46 |
anteaya | ttx are stackforge projects affected by feature freeze? like savanna and murano? | 19:47 |
ttx | no, only the integrated projects | 19:47 |
ttx | i.e. the ones that do a common release | 19:47 |
*** arezadr has joined #openstack-infra | 19:48 | |
anteaya | do you think there would be offence taken if stackforge projects were asked to submit patches on a critical basis only right now? | 19:49 |
anteaya | then if something is non-critical it could wait until after the rush | 19:49 |
ttx | that's not really the concept that was sold to them, and unfortunately we are far from the activity peak | 19:50 |
ttx | ie. Feature Freeze is actually two weeks away. | 19:50 |
ttx | We can't ask them to hold for two weeks. | 19:51 |
anteaya | I'm seeing a lot of nova/heat/cinder/neutron patches so that is as expected | 19:51 |
jeblair | anteaya: out of the 200 changes in zuul, ~40 are stackforge, and they run simple/fast jobs. i don't think it's worth it. | 19:51 |
anteaya | ttx fair enough | 19:51 |
anteaya | jeblair: ah stats thank you | 19:51 |
*** vipul is now known as vipul-away | 19:51 | |
anteaya | the question just floated through my head so I thought I would give it voice | 19:51 |
anteaya | jeblair: mordred found a git fix but it requires git 1.8.2 which requires installing a third party rpm for cgit and even then it appears the package is not available | 19:52 |
*** wenlock has joined #openstack-infra | 19:52 | |
wenlock | hi all | 19:53 |
*** vipul-away is now known as vipul | 19:53 | |
wenlock | question about hiera config, was looking for a sample... finally got some time to dig back into this today | 19:54 |
jeblair | anteaya: i saw | 19:54 |
*** thomasbiege has joined #openstack-infra | 19:55 | |
anteaya | k | 19:55 |
anteaya | wenlock: hello what is the question? | 19:55 |
wenlock | maybe enough to get me started with wiki | 19:55 |
*** thomasbiege has quit IRC | 19:57 | |
*** cyeoh has quit IRC | 19:57 | |
*** chuckieb|2 has quit IRC | 19:58 | |
*** koobs` has joined #openstack-infra | 19:58 | |
jeblair | it actually merged 11 changes in the past hour; i think it just got 11 more added to the gate queue. | 19:58 |
*** cyeoh has joined #openstack-infra | 19:58 | |
*** koobs has quit IRC | 19:58 | |
*** jhesketh has quit IRC | 19:59 | |
anteaya | how does 11 merges in the last hour compare with prior hours? | 19:59 |
anteaya | are we getting better or staying the same? | 19:59 |
*** jhesketh has joined #openstack-infra | 20:00 | |
jeblair | anteaya: we haven't done anything to make it better yet so it's not worth looking. i mostly wanted to see if it was functioning at all, and it is. so it's back to scaling git.o.o now. | 20:01 |
anteaya | ah okay | 20:01 |
jeblair | anteaya: (it's in graphite if you wanted to play with it; i don't have a link, i was grepping logs because i was looking for errors) | 20:02 |
*** cthulhup has joined #openstack-infra | 20:02 | |
anteaya | jeblair: I have forgotten how I get to graphite | 20:02 |
jeblair | anteaya: graphite.openstack.org | 20:03 |
anteaya | that would be it, thanks | 20:03 |
*** thomasbiege has joined #openstack-infra | 20:03 | |
*** linggao has joined #openstack-infra | 20:03 | |
jeblair | clarkb: we should consider using the private interfaces for git haproxy (but that only works within a DC; and we should also test to see which is actually faster) | 20:04 |
*** mrodden has quit IRC | 20:05 | |
*** hartsocks has joined #openstack-infra | 20:05 | |
*** thomasbiege has quit IRC | 20:06 | |
*** cthulhup has quit IRC | 20:06 | |
linggao | Hi clarkb, I accidentally added a patch 10 to someone's code in review. I meant only to depend on his code. | 20:09 |
linggao | clarkb, how do I remove patch 10 in https://review.openstack.org/#/c/40844/ ? | 20:09 |
linggao | clarkb, NobodyCam told me to ask you about it. | 20:10 |
jeblair | #status ok | 20:11 |
*** ChanServ changes topic to "Discussion of OpenStack Developer Infrastructure | docs http://ci.openstack.org | bugs https://launchpad.net/openstack-ci/+milestone/grizzly | https://github.com/openstack-infra/config" | 20:11 | |
*** afazekas has quit IRC | 20:13 | |
*** yolanda has joined #openstack-infra | 20:13 | |
*** rnirmal has quit IRC | 20:18 | |
clarkb | jeblair: did you catch my hiera data repo derp in scrollback? don't let me forget to clean that up at some point when things are quieter | 20:18 |
clarkb | jeblair: tl;dr I fetched a ref from openstack-infra/config into that repo http://paste.openstack.org/show/44788/ because I was in the wrong PWD when running that command | 20:19 |
clarkb | linggao: there is no way to remove patch 10. You can only push a patch 11 that restores patchset 9 | 20:19 |
jeblair | clarkb: yep | 20:19 |
clarkb | jeblair: I am reading up on private interfaces now. 162.209.12.127 has the latest patchset of my change applied to it and is working fine | 20:20 |
*** morganfainberg|a is now known as morganfainberg | 20:20 | |
linggao | clarkb: thanks. I'll do that to repair the damage. | 20:20 |
*** dkliban has quit IRC | 20:20 | |
clarkb | jeblair: oh you mean the rax private interfaces | 20:20 |
jeblair | clarkb: yeah | 20:20 |
clarkb | jeblair: do our firewall rules apply to both interfaces? if so it is just a matter of putting those IPs into the balance member IP list | 20:21 |
jeblair | clarkb: well, we need to decide if we want to use them first | 20:22 |
jeblair | clarkb: you want to run a quick benchmark beetween the 15 and 30 g test nodes i set up? | 20:22 |
jeblair | clarkb: (note, they are in ORD, not DFW) | 20:22 |
clarkb | jeblair: I will set up 162.209.12.127 to balance across that node and the 30G node on their private interfaces then switch to public | 20:22 |
jeblair | clarkb: ok, i was just thinking do a quick git clone from one to the other to see if you notice a diff | 20:24 |
clarkb | jeblair: without haproxy? | 20:24 |
clarkb | I can do that too | 20:24 |
jeblair | clarkb: updated https://etherpad.openstack.org/git-lb | 20:26 |
jeblair | 2g has the highest 'clients served per gb' ratio | 20:26 |
*** hashar has joined #openstack-infra | 20:28 | |
jeblair | and overall it correlates very closely to 1/1 client/cpu (with 8g being able to serve 1.5 clients per cpu) | 20:28 |
*** hashar has left #openstack-infra | 20:28 | |
jeblair | (but slowly) | 20:28 |
Alex_Gaynor | event/result queues on zuul seem to be rising | 20:30 |
jeblair | Alex_Gaynor: thanks, i'll take a look | 20:30 |
*** rfolco has quit IRC | 20:31 | |
zaro | jeblair: i can't seem to repro executorworkers stopping when gearman server restarts. are you still seeing this? | 20:34 |
*** yolanda has quit IRC | 20:34 | |
jeblair | zaro: the problem is that the node was not taken offline | 20:34 |
jeblair | zaro: the rest is speculation | 20:35 |
*** hartsocks has left #openstack-infra | 20:35 | |
zaro | jeblair: node not taken offline due to restarting gearman server? | 20:35 |
odyssey4me4 | join #chef | 20:35 |
odyssey4me4 | hahaha, oops | 20:35 |
jeblair | zaro: when the gearman server was taken offline, nodes that were running jobs were not set offline | 20:36 |
*** p5ntangle has joined #openstack-infra | 20:36 | |
zaro | jeblair: ahh, ok. | 20:36 |
*** rnirmal has joined #openstack-infra | 20:38 | |
*** UtahDave has joined #openstack-infra | 20:38 | |
jeblair | clarkb: i'm going to try your signal patch now, but i expect it to kill the gearman server | 20:41 |
jeblair | clarkb: which means we'll get one thread dump and then we get to restart zuul | 20:42 |
clarkb | jeblair: ok | 20:42 |
clarkb | also :( | 20:42 |
*** p5ntangle has quit IRC | 20:44 | |
anteaya | jeblair: should we have a status update for the channels? | 20:44 |
clarkb | anteaya: I think we can do that once the gearman server falls over | 20:44 |
clarkb | anteaya: if we don't recover cleanly | 20:44 |
anteaya | okay | 20:44 |
*** AJaeger has quit IRC | 20:46 | |
*** odyssey4me4 has quit IRC | 20:46 | |
jeblair | clarkb: i don't think it fell over | 20:47 |
clarkb | jeblair: did you get a stack dump? | 20:48 |
jeblair | clarkb: i have 2 of them, so far; slightly different, and useful | 20:48 |
clarkb | jeblair: best I can tell there isn't a real difference between private or public interfaces on those boxes | 20:49 |
clarkb | I updated the etherpad | 20:49 |
*** thomasbiege has joined #openstack-infra | 20:49 | |
*** thomasbiege has quit IRC | 20:50 | |
jeblair | clarkb: i'm switching to intense zuul hacking; i'd lean toward going with public and proceeding with the plan | 20:50 |
clarkb | ok, I am going to test ipv6 now. as I noticed that wasn't working | 20:51 |
clarkb | and git.o.o has a AAAA record so it should be made to work | 20:51 |
jeblair | clarkb: but we can haproxy over v4, yeah? | 20:51 |
clarkb | jeblair: yeah this is just for the frontend listen directives | 20:51 |
jeblair | clarkb: (i mean, if somethings broke, probably worth looking into) | 20:51 |
jeblair | ok, yeh | 20:51 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 20:52 |
*** apcruz has quit IRC | 20:54 | |
*** lbragstad has quit IRC | 20:55 | |
jeblair | clarkb: ok, i think i have what i need to hack on the zuul problem | 20:56 |
jeblair | clarkb: i don't believe it's going to get better (it might, after a long time, increment through the loop again) | 20:56 |
jeblair | clarkb: so we should go ahead and stop it, which as we learned, means some cleanup work. | 20:56 |
jeblair | clarkb: up for helping? | 20:56 |
clarkb | jeblair: sure | 20:57 |
*** dkliban has joined #openstack-infra | 20:57 | |
clarkb | jeblair: I think 42784 is just about ready. Need to test that that works over ipv6 now but git clone doesn't like ipv6 addresses in its url | 20:57 |
clarkb | it splits on ':' | 20:57 |
jeblair | ah | 20:57 |
clarkb | *splits on ':' and treats the right hand side as the port | 20:57 |
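One common workaround for the ':' splitting clarkb describes is to bracket the IPv6 literal in the URL; whether the git builds in use at the time accept this was not verified here, so treat it as a hedged note (the address is a placeholder):

    git clone "https://[2001:db8::1]/openstack/nova"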
jeblair | clarkb: will you log into jenkins02? | 20:57 |
clarkb | jeblair: I am in | 20:58 |
jeblair | clarkb: abort all jobs. :) | 20:58 |
clarkb | oh you mean through web ui? | 20:58 |
jeblair | clarkb: yep | 20:58 |
clarkb | I ssh'd in >_> | 20:58 |
clarkb | jeblair: should I put it in shutdown mode first to prevent more jobs from starting? | 20:58 |
jeblair | clarkb: i stopped zuul | 20:59 |
clarkb | ok | 20:59 |
clarkb | I am aborting jobs now | 20:59 |
jeblair | clarkb: with any luck, nodepool should clean most of those up | 21:01 |
*** cody-somerville has quit IRC | 21:01 | |
clarkb | I am not sure if I should wait after clicking the red button or if I can just spam that. I assume that it is just making a rest call back to jenkins | 21:02 |
jeblair | clarkb: spam it | 21:02 |
*** fbo is now known as fbo_away | 21:02 | |
jeblair | clarkb: when you're done; double check that of all the on-line devstack nodes, none of them has a build history | 21:03 |
clarkb | jeblair: FYI https://jenkins02.openstack.org/job/gate-neutron-python26/275/ won't die and has been running for hours | 21:05 |
jeblair | clarkb: i'll look into it | 21:06 |
clarkb | I am waiting for nodepool to cleanup the nodes now | 21:06 |
jeblair | clarkb: (it'll add them too, you should end up with 10 online and 5 offline nodes [thanks to az2]) | 21:07 |
jeblair | clarkb: that's nasty; i think we should restart that jenkins master | 21:08 |
jeblair | (when nodepool finishes | 21:08 |
*** dkranz has joined #openstack-infra | 21:08 | |
jeblair | clarkb: (i killed and relaunched the slave and that build is still stuck) | 21:08 |
clarkb | jeblair: restarting the master wfm | 21:09 |
clarkb | jeblair: I assume we will wait for nodepool to settle first | 21:09 |
*** dprince has quit IRC | 21:09 | |
jeblair | i think we're there... | 21:09 |
clarkb | yup I am checking build history now | 21:09 |
jeblair | clarkb: k; you can restart it at will | 21:10 |
jeblair | #status alert Restarting zuul, changes should be automatically re-enqueued | 21:10 |
openstackstatus | NOTICE: Restarting zuul, changes should be automatically re-enqueued | 21:10 |
*** ChanServ changes topic to "Restarting zuul, changes should be automatically re-enqueued" | 21:11 | |
clarkb | jeblair: build history is all empty. restarting jenkins now | 21:11 |
*** mrodden has joined #openstack-infra | 21:12 | |
jeblair | clarkb: ready for me to start zuul? | 21:13 |
clarkb | jeblair: ya, jenkins is back up | 21:13 |
jeblair | zuul is up; i've started the reverifies and rechecks (with a 30s delay as earlier) | 21:14 |
jeblair | though perhaps i should have done 60s, knowing what we know about git.o.o now | 21:14 |
clarkb | I just ran into the cannot fetch idx thing cloning from the 15g test node on the 30g test node... | 21:15 |
clarkb | this was over ipv6 | 21:15 |
clarkb | through haproxy | 21:15 |
*** cody-somerville has joined #openstack-infra | 21:15 | |
clarkb | are we not able to pack up all of the refs before the http timeout? | 21:15 |
jeblair | i have not seen that in isolated testing | 21:16 |
clarkb | you know, I wonder if the centos git slowness has anything to do with ipv6 | 21:16 |
*** dina_belova has quit IRC | 21:16 | |
clarkb | because it is being really slow too | 21:16 |
*** eharney has joined #openstack-infra | 21:16 | |
clarkb | cloning with git:// over ipv6 worked fine | 21:17 |
jeblair | i'm going to switch to zuul hacking to try to squash this bug before the next time we have to restart it | 21:17 |
clarkb | ok | 21:17 |
clarkb | pleia2: are you around? | 21:17 |
clarkb | pleia2: any chance you can try and corroborate that git cloning on centos is slow when using ipv6 but not when using ipv4? | 21:18 |
pleia2 | clarkb: hey | 21:18 |
clarkb | pleia2: cloning against review.o.o should be sufficient to test that | 21:18 |
pleia2 | clarkb: ok, will do | 21:18 |
*** gordc has joined #openstack-infra | 21:19 | |
clarkb | thank you | 21:20 |
clarkb | pleia2: fwiw it is consistent on these test boxes | 21:20 |
clarkb | I am testing ipv4 again to make sure it isn't some other external thing being weird | 21:21 |
pleia2 | clarkb: wait, running git clone *on* centos or to a git server on centos? | 21:23 |
*** dkranz has quit IRC | 21:23 | |
clarkb | pleia2: git clone on centos | 21:24 |
clarkb | pleia2: as our centos slaves are slow cloning from review.o.o | 21:24 |
pleia2 | clarkb: only have an hpcloud account, no ipv6 | 21:24 |
clarkb | oh | 21:24 |
*** pabelanger has quit IRC | 21:24 | |
clarkb | I am seeing the same slowness against ipv4 now. I am going to test cloning from my local box now | 21:24 |
pleia2 | I have several hosts that do have ipv6, but all debian and ubuntu | 21:25 |
*** reed has joined #openstack-infra | 21:25 | |
clarkb | wow this is so weird. On the rax test centos box ipv4 clone timed out too then did the cannot find idx pack file thing | 21:26 |
clarkb | but I run the same clone on my local precise box and clone all of nova in ~45 seconds | 21:27 |
clarkb | git:// works just fine on centos though | 21:27 |
*** xBsd has quit IRC | 21:28 | |
jeblair | clarkb: remember that i was able to do 6 simultaneous clones over v4 https to the 8g box | 21:28 |
jeblair | (without error) | 21:28 |
clarkb | I am going to bypass haproxy now to see if that is tickling the issue | 21:28 |
clarkb | jeblair: were you running the clones on centos? | 21:28 |
jeblair | clarkb: no, on precise | 21:29 |
clarkb | jeblair: I think this is the centos slowness remanifesting itself | 21:29 |
jeblair | interesting | 21:29 |
clarkb | because my precise box is fine | 21:29 |
jeblair | clarkb: er, is the issue that centos git does not speak the smart http protocol? | 21:30 |
clarkb | jeblair: no, centos git does speak smart http protocol | 21:30 |
*** fbo_away is now known as fbo | 21:30 | |
clarkb | it is 1.7.1 iirc and smart http went in 1.6.something | 21:30 |
clarkb | I will double check that though | 21:30 |
jeblair | maybe it doesn't speak it well. | 21:30 |
clarkb | could be | 21:30 |
*** dkranz has joined #openstack-infra | 21:31 | |
*** vipul is now known as vipul-away | 21:32 | |
pleia2 | it takes over 2 minutes to clone nova over http from review.o.o in a couple places I tested (a debian linode - ipv4&6 and centos hpcloud ipv4) | 21:34 |
clarkb | jeblair: I can clone directly over ipv4 to apache. It is slow, but it works. I think haproxy must be amplifying some latency | 21:35 |
clarkb | I am trying to test with ipv6 but our iptables puppet stuff doesn't work correctly on centos for ipv6 | 21:35 |
*** lbragstad has joined #openstack-infra | 21:35 | |
*** vipul-away is now known as vipul | 21:35 | |
*** mriedem has quit IRC | 21:37 | |
*** danger_fo is now known as danger_fo_away | 21:38 | |
*** dkranz has quit IRC | 21:43 | |
*** SergeyLukjanov has quit IRC | 21:45 | |
*** lcestari has quit IRC | 21:47 | |
clarkb | direct ipv6 is also slow but works eventually | 21:48 |
clarkb | pleia2: where are those newer versions of git? I am half tempted to try one of them to see if the slowness goes away | 21:48 |
pleia2 | clarkb: debian is 1.7.10 | 21:49 |
clarkb | pleia2: the ones you found for centos | 21:49 |
pleia2 | clarkb: ah, newest one for centos is 1.7.11 | 21:49 |
anteaya | I'm on 1.8.1.2 - ubuntu quantal | 21:49 |
*** changbl has quit IRC | 21:49 | |
jeblair | clarkb: does it matter? i mean, is that the way we're going to solve this? | 21:49 |
anteaya | not sure if that helps or creates jealousy | 21:49 |
pleia2 | git-daemon is a separate package though, realized I'd need to find a package for that too if it's not included | 21:50 |
jeblair | (i mean, maybe it'll tell you something, but if the end result of this is 'it might work if we upgrade all the slaves' i think we're digging a bigger hole) | 21:51 |
*** fbo is now known as fbo_away | 21:52 | |
* anteaya heads out for a walk | 21:54 | |
*** dkranz has joined #openstack-infra | 21:55 | |
pleia2 | oh, http://repoforge.org/ is the place for them though | 21:55 |
jeblair | clarkb: ^ | 21:55 |
pleia2 | (I tend to agree with jeblair though) | 21:55 |
pleia2 | usage details on centos6: http://wiki.centos.org/AdditionalResources/Repositories/RPMForge#head-f0c3ecee3dbb407e4eed79a56ec0ae92d1398e01 | 21:56 |
*** linggao has quit IRC | 21:56 | |
*** dkliban has quit IRC | 21:58 | |
*** ^d has quit IRC | 21:59 | |
*** hashar has joined #openstack-infra | 22:00 | |
*** hashar has left #openstack-infra | 22:00 | |
clarkb | jeblair: I agree in general too. I am scanning git release notes; there are a few things that pop out as possibly being the cause | 22:04 |
*** burt has quit IRC | 22:04 | |
*** mrodden has quit IRC | 22:04 | |
*** ryanpetrello has quit IRC | 22:05 | |
*** dkranz has quit IRC | 22:06 | |
clarkb | jeblair: pleia2 the two items with HTTP in them at https://git.kernel.org/cgit/git/git.git/tree/Documentation/RelNotes/1.7.5.txt seem like possible culprits | 22:09 |
*** _TheDodd_ has quit IRC | 22:11 | |
*** dmakogon_ has left #openstack-infra | 22:15 | |
pleia2 | clarkb: first seems like it would be trivial for big clones, not so sure about 2nd, they mention many tags but not big repo (nova does have a fair number of tags, I don't know how many "many" is) | 22:15 |
clarkb | pleia2: neither do I. I am tcpdumping now and will have to look at this closer | 22:15 |
jeblair | i don't expect the second to affect a clone | 22:15 |
clarkb | because this will need to be sorted before we can point any of the centos slaves at haproxy | 22:16 |
*** dina_belova has joined #openstack-infra | 22:17 | |
jeblair | i think zuul is stuck again; but i haven't been able to repro the problem locally yet | 22:19 |
*** dina_belova has quit IRC | 22:21 | |
*** mrodden has joined #openstack-infra | 22:23 | |
jeblair | i'm going to restart it with some cowboy logging | 22:26 |
*** dina_belova has joined #openstack-infra | 22:27 | |
*** dina_belova has quit IRC | 22:32 | |
clarkb | ok | 22:33 |
*** prad_ has quit IRC | 22:33 | |
*** rcleere has quit IRC | 22:37 | |
*** sarob has joined #openstack-infra | 22:38 | |
clarkb | comparing tcpdump taken locally and tcpdump on centos, the centos client ends up with a window size of 0 frequently, which does not happen locally | 22:41 |
clarkb | I think the client is unable to accept more data for some reason | 22:42 |
jeblair | is that with haproxy in both cases? | 22:43 |
clarkb | jeblair: without in the centos case. with locally I should retry locally without haproxy | 22:45 |
clarkb | I should learn to use punctuation too | 22:46 |
anteaya | back | 22:46 |
*** ftcjeff has quit IRC | 22:51 | |
clarkb | it is related to https somehow | 22:55 |
clarkb | I gave the http vhost the same git stuff as the https vhost and it is much faster | 22:55 |
jeblair | clarkb: haproxy? | 22:55 |
*** dims has quit IRC | 22:58 | |
clarkb | jeblair: https is slow both through haproxy and not through haproxy, but worse through haproxy, so definitely possible | 23:01 |
clarkb | http seemed to be much faster through both | 23:01 |
*** woodspa has quit IRC | 23:01 | |
jeblair | clarkb: wasn't suggesting a cause, just trying to understand the variables in your experiment | 23:01 |
clarkb | http + haproxy = fast, http = fast, https = slow but does not fail, https + haproxy = even slower and causes git clone to fail | 23:05 |
*** rnirmal has quit IRC | 23:05 | |
clarkb | and git clone fails because it cannot get the idx files. Looks like the same issue as before where it falls back on non-smart http and the lack of .git in the dir name breaks it | 23:06 |
*** senk has quit IRC | 23:08 | |
jeblair | right, so that first connection fails | 23:08 |
jeblair | on the refs front, i believe the refs on git.o.o are as packed as they are going to be; the items in refs/ are all _directories_ (up to the last component of the path); the refs themselves are in a packed-refs file. | 23:09 |
*** dims has joined #openstack-infra | 23:11 | |
*** mberwanger has joined #openstack-infra | 23:13 | |
mgagne | clarkb: cgit config remove-suffix = 1. This will allow repositories on the filesystem to have a .git suffix but still show without it in the interface and in generated URLs. | 23:15 |
jeblair | mgagne: i don't think cgit was the problem | 23:15 |
jeblair | mgagne: the problem was that the rewrite rules needed to support the dumb http protocol don't work | 23:16 |
jeblair | mgagne: but since we're never supposed to use the dumb http protocol, we're not worrying about fixing it for now (instead, making it reliable enough that the smart http protocol doesn't fail) | 23:17 |
mgagne | jeblair: that's what I'm trying to find out and stumble on this config | 23:17 |
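The cgit option mgagne points at is a one-line cgitrc directive (shown here as a sketch of the global section, not the deployed configuration):

    # /etc/cgitrc
    remove-suffix=1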
*** jjmb has joined #openstack-infra | 23:18 | |
*** jhesketh has quit IRC | 23:18 | |
*** jhesketh has joined #openstack-infra | 23:20 | |
*** mberwanger has quit IRC | 23:21 | |
* HenryG wonders if jeblair has created a script to automate "recheck|reverify no bug" submissions. :D | 23:24 |
*** datsun180b has quit IRC | 23:27 | |
*** dina_belova has joined #openstack-infra | 23:28 | |
*** pabelanger has joined #openstack-infra | 23:28 | |
*** eharney has quit IRC | 23:30 | |
sarob | whats the pipeline ETA? | 23:31 |
Alex_Gaynor | hmm, so there's stuff that has all its builds complete, but is still hanging out at the top of the pipeline, is that because of git? | 23:32 |
*** dina_belova has quit IRC | 23:32 | |
*** gordc has quit IRC | 23:33 | |
anteaya | sarob: no ETA | 23:35 |
anteaya | if it makes it through we are happy | 23:35 |
anteaya | we have had to restart zuul three times today | 23:35 |
anteaya | not our best day | 23:35 |
jeblair | Alex_Gaynor: no, i believe we are reliably reproducing the zuul bug now | 23:37 |
Alex_Gaynor | jeblair: ah! | 23:37 |
anteaya | jeblair: yay, did you cowboy logging result in more bug information? | 23:38 |
anteaya | s/you/your | 23:38 |
*** jjmb has quit IRC | 23:38 | |
jeblair | anteaya: a bit too much, i'm afraid | 23:38 |
jeblair | i've stopped zuul again | 23:39 |
anteaya | :( | 23:39 |
jeblair | it will take me a few mins to process this and figure out what's going on | 23:40 |
*** jcooley has quit IRC | 23:41 | |
anteaya | k | 23:42 |
anteaya | well that seems like time well spent because this zuul bug keeps showing up | 23:43 |
*** jcooley has joined #openstack-infra | 23:44 | |
*** zul has quit IRC | 23:47 | |
sarob | no sweat guys. you do an awesome job of keeping stuff humming | 23:52 |
anteaya | thanks sarob, jeblair may have found our elusive zuul bug | 23:52 |
sarob | sweet. good luck. | 23:52 |
anteaya | so hopefully we can keep zuul up longer on the next restart | 23:53 |
anteaya | thanks | 23:53 |
anteaya | :D | 23:53 |
*** zul has joined #openstack-infra | 23:53 |