mgagne | how are repositories named on the filesystem? with or without .git suffix? | 00:00 |
---|---|---|
clarkb | mgagne: with | 00:00 |
clarkb | jeblair: pleia2 I can fetch that neutron ref now | 00:02 |
clarkb | should I go ahead and add symlinks for all the things? | 00:02 |
clarkb | or should we focus on actual solution now that we know that is sufficient | 00:02 |
*** datsun180b has quit IRC | 00:02 | |
*** reed has quit IRC | 00:03 | |
jeblair | clarkb: i'd add the symlinks for the existing projects | 00:04 |
mgagne | Who is therefore returning a URL without .git in it? | 00:04 |
jeblair | mgagne: exactly, that's the question; is it a difference in the git version? | 00:05 |
mgagne | jeblair: could it be the git client? | 00:05 |
clarkb | well our test scripts are hard coded to use paths without .git | 00:05 |
jeblair | clarkb: yes, but our scripts don't fetch packfiles, git does | 00:06 |
clarkb | so either something is adding it when we talk to review.o.o or our rewrite and aliasmatch stuff is munging it or git version makes a difference | 00:06 |
jeblair | clarkb: so either the git client or git server is doing something unexpected | 00:06 |
jeblair | clarkb: remember, almost no pack files were retrieved from review.o.o | 00:06 |
jeblair | clarkb: i am not certain this is a difference | 00:06 |
mgagne | if it's related to the git client, I believe that both should be supported (w/ and w/o .git) to avoid frustration and issues with the enduser/devs | 00:07 |
jeblair | 2013-08-20 23:57:07.094 | error: Failed connect to git.openstack.org:443; Connection refused while accessing https://git.openstack.org/openstack/tempest/info/refs | 00:08 |
mgagne | Should it be handled by Apache or by the filesystem? I don't know which is best. | 00:08 |
fungi | yeah, pretty sure no amount of filesystem or cgi adjustments are going to solve a connection refusal from apache | 00:12 |
jeblair | fungi: different problem | 00:12 |
fungi | granted | 00:16 |
fungi | but suspect we could be hitting overall connection limits too | 00:17 |
jeblair | fungi: yep | 00:18 |
fungi | git.o.o is acting a lot more hammered than review.o.o was, even though the load average isn't nearly as high | 00:19 |
SpamapS | do we run devstack on a py26 system in the gate? | 00:19 |
SpamapS | or just unit tests? | 00:19 |
fungi | SpamapS: just unit tests | 00:19 |
SpamapS | I think python-novaclient may be uninstallable in py26 | 00:19 |
SpamapS | AttributeError: 'module' object has no attribute '__getstate__' | 00:19 |
*** dkliban has quit IRC | 00:19 | |
SpamapS | File "/usr/local/lib/python2.6/dist-packages/setuptools/sandbox.py", line 58, in run_setup | 00:21 |
SpamapS | hm, that's actually in ye-olde distribute | 00:21 |
jeblair | clarkb, pleia2: think the pack thing may be a tiny bit of a red herring | 00:21 |
jeblair | mgagne: ^ | 00:21 |
mordred | jeblair: oh piddle :) | 00:21 |
fungi | ping rtt to git.o.o is averaging 1600ms for me right now, as opposed to review.o.o which is around 55ms | 00:21 |
jeblair | clarkb, pleia2: i _suspect_ that those files are only retrieved directly by the _dumb_ http client | 00:21 |
SpamapS | ahh have to remove python-pkg-resources | 00:21 |
clarkb | jeblair: interesting | 00:22 |
jeblair | that job fell back on the dumb client because: | 00:22 |
jeblair | 2013-08-20 22:50:00.379 | error: The requested URL returned error: 504 while accessing https://git.openstack.org/openstack/neutron/info/refs | 00:22 |
pleia2 | ah | 00:22 |
jeblair | it thought the smart http server wasn't available | 00:22 |
clarkb | jeblair: so our rewrites are not working properly | 00:22 |
jeblair | clarkb: correct, they're just plain wrong but pretty much never used (my hypothesis) | 00:23 |
clarkb | also writing a script to make these symlinks that is idempotent and not insane is taking too much time | 00:23 |
jeblair | clarkb: i would consider abandoning that and deleting the symlinks at this point | 00:23 |
mordred | SpamapS: you need to pip install -U pip before installing anything via pip currently | 00:23 |
mordred | SpamapS: if you want to be safe | 00:23 |
jeblair | and add a medium-priority todo to fix the rewrites | 00:23 |
SpamapS | mordred: did that, had to apt-get remove python-pkg-resources | 00:24 |
mordred | SpamapS: you will pip install -U setuptools | 00:24 |
mordred | SpamapS: that mainly means that something borked something first | 00:24 |
SpamapS | mordred: and apt-get remove python-setuptools | 00:24 |
mordred | that should not be necessary | 00:24 |
mordred | but if something pip installed something first | 00:24 |
mordred | you'll need to do that to recover | 00:24 |
SpamapS | first two things I did were exactly that, pip install -U pip, and then pip install -U setuptools | 00:24 |
jeblair | fungi: yeah. my interactive shell is very slow too. | 00:24 |
mordred | SpamapS: wow. really? | 00:24 |
mordred | sigh | 00:24 |
mordred | SpamapS: this is on precise? | 00:24 |
mordred | SpamapS: or? | 00:25 |
SpamapS | yeah.. had to apt-get remove setuptools and then re-do pip install -U setuptools to recover :-/ | 00:25 |
SpamapS | mordred: lucid | 00:25 |
mordred | oh. jeez | 00:25 |
SpamapS | mordred: trying to test py26 | 00:25 |
mordred | sorry. I have done zero testing of lucid | 00:25 |
fungi | and yet load average on git.o.o is in the single digits, not >200 like we saw on review.o.o | 00:25 |
mordred | god only knows how broken it is | 00:25 |
clarkb | jeblair: ok | 00:25 |
mordred | SpamapS: we have workarounds for that in devstack, which involve wget-ing things | 00:25 |
clarkb | jeblair: in the mean time a bunch of jobs will fail for random things | 00:26 |
mordred | SpamapS: the situation is pretty messed up | 00:26 |
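
The recovery sequence SpamapS and mordred work out above boils down to removing the distro-packaged setuptools bits and re-bootstrapping from pip. A rough sketch of that sequence on a lucid/py26 box, using the package names mentioned above (an illustration of the conversation, not an official procedure; exact order may vary per system):

```sh
# Sketch of the lucid/py26 recovery discussed above.
sudo apt-get remove python-setuptools python-pkg-resources  # drop the distro copies
sudo pip install -U pip          # get a modern pip first
sudo pip install -U setuptools   # then replace distribute/setuptools via pip
```
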
clarkb | jeblair: do we maybe want to point everything at /cgit for now? | 00:26 |
jeblair | clarkb: no, we need to make git.o.o responsive | 00:26 |
jeblair | moving it around to a different unresponsive thing isn't going to make us happy | 00:26 |
jeblair | clarkb: that timeout could have happened just as easily talking to cgit | 00:27 |
mgagne | what is git.kernel.org using to serve requests over http? | 00:28 |
*** jog0-away is now known as jog0 | 00:28 | |
jeblair | so how about we go ahead and load balance it, even though we don't have a good config, and we can come back and make it sane later? | 00:28 |
jeblair | start throwing hardware at the problem | 00:28 |
mordred | jeblair: ++ | 00:28 |
pleia2 | we're close to a good config | 00:28 |
pleia2 | at least, to limping | 00:29 |
fungi | i thought we had git.o.o in cacti, but i guess not | 00:29 |
pleia2 | https://review.openstack.org/43012 switches us over to service git:// then we bring in clarkb's haproxy patch https://review.openstack.org/#/c/42784/ (will need some edits after my patch) | 00:29 |
jeblair | pleia2: i meant a config that is correctly tuned for the tradeoffs we have chosen (based on those things we talked about in the meeting) | 00:29 |
pleia2 | jeblair: oh, right | 00:30 |
*** nati_ueno has quit IRC | 00:30 | |
jeblair | pleia2: but those are definitely steps in the right direction | 00:30 |
clarkb | jeblair: so I agree that the pack thing isn't the only issue, but until we fix that redirect or have the symlinks every single one of those fetches will fail | 00:30 |
jeblair | clarkb: but they should never happen unless there has already been an error | 00:30 |
jeblair | clarkb: i'm trying to say we've already lost by the time that fetch happens, we need to make sure it never happens | 00:31 |
clarkb | is that what that means? I was clearly focusing way too hard on symlinks of all things | 00:31 |
jeblair | clarkb: yeah, that's what i was trying to say earlier -- the smart http client should never fetch those | 00:32 |
jeblair | clarkb: it only did it because it thought the smart http server wasn't available | 00:32 |
jeblair | clarkb: supporting those urls only means that if the smart http client fails, our jobs will suck even _more_ data from git.o.o using the dumb client | 00:32 |
clarkb | yeah | 00:32 |
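
The failure mode jeblair is describing is the standard git smart-HTTP fallback: the client first requests `GET <repo>/info/refs?service=git-upload-pack`, and only if that fails (here with a 504) does it retry the dumb protocol and fetch `info/refs`, loose objects, and packfiles as plain files, which is where the `.git`-less paths break. A minimal sketch of the kind of Apache routing that keeps requests on the smart path; the paths, the CGI location, and the `.git` normalization are illustrative assumptions, not the actual git.o.o vhost:

```apache
# Illustrative sketch only -- not the real openstack-infra config.
SetEnv GIT_PROJECT_ROOT /var/lib/git     # repos stored on disk as <name>.git
SetEnv GIT_HTTP_EXPORT_ALL

# Send info/refs and the upload/receive-pack endpoints to git-http-backend,
# normalizing URLs with or without the .git suffix to the on-disk name,
# so the smart client never falls back to dumb packfile fetches.
ScriptAliasMatch "^/(.*?)(\.git)?/(info/refs|git-(upload|receive)-pack)$" \
    /usr/libexec/git-core/git-http-backend/$1.git/$3
```
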
clarkb | if we are going to go multinode, I wonder if it is worth investigating using precise for the git-http-backend serving | 00:33 |
openstackgerrit | Jeremy Stanley proposed a change to openstack-infra/config: Start trending git.o.o performance with cacti https://review.openstack.org/43023 | 00:33 |
clarkb | since as fungi pointed out git.o.o is feeling the load a lot more than review.o.o | 00:33 |
* SpamapS considers snapshotting this lucid box for the next time a python2.6 thing is needed.. so much pip.. so little time | 00:34 | |
clarkb | then serve /cgit from centos and everything else from ubuntu nodes | 00:34 |
mordred | jeblair: what about tuning apache to do a much smaller number of active connections and tcp backlog most of them? | 00:34 |
* clarkb goes back to cleaning up symlinks | 00:34 | |
mordred | jeblair: to prevent the overloaded/timeout situation? | 00:34 |
jeblair | the current situation that fungi pointed out is weird. | 00:34 |
jeblair | the load average is low, cpu is mostly idle | 00:35 |
jeblair | i'm wondering if it's hypervisor host system load | 00:35 |
fungi | some other sort of starvation here. perhaps interrupt handling? | 00:35 |
jeblair | or network bottleneck | 00:35 |
fungi | or, yes, something on the host compute node maybe | 00:35 |
SpamapS | I've only ever seen phantom hypervisor load on xen | 00:35 |
mordred | hypervisor host system load sounds like an interesting cause for timeouts in that situation | 00:35 |
SpamapS | kvm has served me well and reported it as "stolen" CPU | 00:36 |
jeblair | SpamapS: i meant other vms starving the hardware | 00:36 |
SpamapS | Yeah, that should be the same thing. | 00:36 |
SpamapS | "CPU time that I should have gotten went somewhere else" | 00:36 |
SpamapS | that is supposed to be "steal%" | 00:36 |
jeblair | SpamapS: ah, well, it says its low, 0.0-0.2; are you saying it's unreliable under xen? | 00:37 |
SpamapS | I have seen it be totally unreliable in xen | 00:38 |
SpamapS | Most famously in the phantom load seen on Ubuntu 10.04 hosts on ec2 | 00:38 |
jeblair | mordred: i think that tuning strategy would be good if we knew how many git operations we could handle at once | 00:38 |
SpamapS | (which has mostly cleared up as they've upgraded their xen) | 00:38 |
mordred | jeblair: indeed. I was going to suggest starting with number of 'cores' | 00:39 |
mordred | jeblair: but that might take too long to chase | 00:39 |
jeblair | clarkb: i like your idea of serving git and cgit separately... because i think we may want to tune them separately | 00:39 |
mordred | jeblair: if we're going to do that ^^ | 00:39 |
jeblair | mordred: that's reasonable with lbaas? | 00:39 |
mordred | jeblair: I guess? OR - how about for now we just spin up haproxy so that we can actually control it | 00:39 |
fungi | in the past when we've been dos'd by our neighbors chewing up resources on the compute node, we've usually observed significant packet loss. in this case we're only seeing very high rtt, which suggests the kernel is being slow about processing the packets | 00:39 |
mordred | jeblair: and later we can engineer it into lbaas | 00:40 |
mordred | just so that our learning curve is lower | 00:40 |
* mordred assumes the people in this room can probably tweak an haproxy machine pretty quickly | 00:40 | |
jeblair | oh, well, i guess we're talking about two https services, so that's trickier | 00:40 |
mordred | oh yeah. good point | 00:41 |
clarkb | symlinks all cleaned up | 00:41 |
mordred | jeblair: three-tier? | 00:41 |
fungi | you'd have to terminate ssl/tls on the haproxy and then do stream munging/rewriting on the plaintext http stream, which would get ugly | 00:41 |
jeblair | i think the thing we can do quickly is spin up more copies of what we have | 00:42 |
mordred | jeblair: haproxy in front of a couple of apache nodes with mod_proxy that do termination that proxy to different git serving machines? | 00:42 |
mordred | jeblair: but yes. I think that's the quickest direct route to try first | 00:42 |
jeblair | so maybe we ought to do that, stick something (haproxy or lbaas) in front of it | 00:42 |
mordred | and we can further optimize by splitting in fancy ways later | 00:42 |
openstackgerrit | A change was merged to openstack-infra/config: Start trending git.o.o performance with cacti https://review.openstack.org/43023 | 00:42 |
jeblair | and then come back for another pass... yeah ^ | 00:42 |
mordred | jeblair: also - is it worth spinning up a copy on centos, and one on precise just to see if the backends perform differently? or too much work due to how our cgit module is written? | 00:43 |
jeblair | mordred: we have to figure out how to install cgit on precise 1st | 00:43 |
pleia2 | mordred: there is no cgit package for ubuntu (which is why we went with centos) | 00:43 |
mordred | jeblair: yeah. good point. later | 00:43 |
jeblair | mordred: it's _definitely_ worth it, but later, i think. | 00:43 |
mordred | jeblair: last stupid question from me - given the xen theory from earlier - is it worth trying to spin up a centos node at hpcloud to see if kvm gives us more love? | 00:44 |
jeblair | okay, so working plan so far: spin up git01 and git02.o.o, and front them with (haproxy on git.o.o) or (lbaas) ? | 00:44 |
mordred | these boxes don't need email really | 00:44 |
mordred | (for now) | 00:44 |
jeblair | mordred: i don't think it's a xen problem as much as a bad neighbor problem | 00:44 |
mordred | nod | 00:44 |
pleia2 | mordred: well, might be worthwhile just so we have one on rackspace and one on hpcloud | 00:44 |
*** ^d has joined #openstack-infra | 00:44 | |
*** ^d has joined #openstack-infra | 00:44 | |
jeblair | mordred: i hear hpcloud has a particularly bad tenant. i'd hate to be stuck near us. | 00:44 |
mordred | jeblair: hahaha | 00:45 |
clarkb | jeblair: hahahahah | 00:45 |
pleia2 | hah | 00:45 |
mordred | jeblair: working plan sounds good | 00:45 |
fungi | we are the bad neighbor | 00:45 |
fungi | nice | 00:45 |
mordred | we need to replicate to both of them from gerrit, yeah? | 00:45 |
*** woodspa has joined #openstack-infra | 00:45 | |
mordred | so we're going to have to bounce gerrit | 00:45 |
clarkb | like a bad neighbor openstack infra is there | 00:45 |
clarkb | the jingle doesn't quite work but I laughed inside | 00:45 |
jeblair | mordred: i'm still worried about hpcloud deleting nodes. you got an email that they deleted one the other day, right? let's put git03 in hpcloud if we want to try that. | 00:45 |
jeblair | mordred: yep | 00:45 |
jeblair | replication | 00:45 |
mordred | jeblair: I am too - but if we're actually going to do elastic throwaway nodes | 00:46 |
jeblair | so which do we want to do, our own haproxy on git, or lbaas? | 00:46 |
openstackgerrit | Joshua Hesketh proposed a change to openstack-infra/zuul: Move gerrit specific result actions under reporter https://review.openstack.org/42644 | 00:46 |
openstackgerrit | Joshua Hesketh proposed a change to openstack-infra/zuul: Add support for emailing results via SMTP https://review.openstack.org/42645 | 00:46 |
openstackgerrit | Joshua Hesketh proposed a change to openstack-infra/zuul: Separate reporters from triggers https://review.openstack.org/42643 | 00:46 |
* Shrews sees lots of familiar words being thrown around. | 00:46 | |
mordred | jeblair: I think haproxy on git is the path of least resistance right now | 00:46 |
mordred | although I think if it helps, we should definitely re-work to use lbaas | 00:47 |
clarkb | ++ | 00:47 |
fungi | rework to use Shrews | 00:47 |
mordred | jeblair: if we do lbaas right now, we'll have to do a dns swap and whatnot | 00:47 |
clarkb | fungi: make Shrews do it FTFY | 00:47 |
jeblair | k, one more question -- should we spin up 30g nodes, or try shrinking them a bit? | 00:47 |
* Shrews doesn't work. Try again. | 00:47 | |
jeblair | (i lean toward sticking with 30g until we benchmark) | 00:47 |
mordred | yes. I agree | 00:47 |
mordred | 30g | 00:47 |
clarkb | jeblair: ya, and we can go small easily once the lb is happy | 00:48 |
mordred | how much would it kill the cloud for us to snapshot git.o.o and then spin up git1 and git2 using that? | 00:48 |
jeblair | we are using almost none of the memory, but we don't really understand the cpu or network requirements yet | 00:48 |
mordred | (so that we don't have to do initial clones right now) | 00:48 |
fungi | the plan seems sound | 00:48 |
jeblair | mordred: faster to spin up from scratch; gerrit is lightly loaded, the push won't be bad. | 00:48 |
mordred | ok | 00:48 |
clarkb | now that we have a plan. Will everyone hate me if I duck out to bother fungi while he is on this side of the continent? | 00:49 |
jeblair | clarkb: can you stick around for a sec? | 00:49 |
clarkb | sure | 00:50 |
jeblair | in order to get there, we need some puppet work.... | 00:50 |
clarkb | ah | 00:50 |
jeblair | we need git\d+.o.o defined to be a cgit/git server | 00:50 |
SpamapS | dumb-but-performant-lbaas-->two modest layer 7 routing boxes-->appropriate target pools is not a terrible meme. | 00:50 |
SpamapS | if lbaas does ssl, win, otherwise let the layer 7 boxes do it. | 00:50 |
jeblair | SpamapS: yeah, we may come back and do l7 in the next pass | 00:50 |
jeblair | and we need git.o.o defined to be an haproxy server pointing to them | 00:51 |
SpamapS | Oh I thought you were scaling different urls differently and that was why this was complicated? | 00:51 |
SpamapS | also.. has git->swift come up? | 00:51 |
jeblair | SpamapS: that was the idea, but we're punting because it's complicated | 00:51 |
jeblair | SpamapS: you aren't helping | 00:51 |
jeblair | :) | 00:51 |
SpamapS | :) | 00:51 |
*** dkliban has joined #openstack-infra | 00:52 | |
jeblair | does that puppet description make sense? | 00:52 |
jeblair | and do we want the service and haproxy changes on each of the worker nodes as well? | 00:52 |
fungi | so just splatter https connections round-robin to the pool members? | 00:52 |
mgagne | Could it be apache not being the right tool for such a use case? And I don't believe an out-of-the-box apache config is appropriate for such a setup. | 00:53 |
jeblair | fungi: i think that's the idea, or whatever haproxy does (maybe it counts sockets?) | 00:53 |
mgagne | I could be mistaken | 00:53 |
jeblair | mgagne: we know it is not correct, someone needs to actually benchmark it and get good numbers | 00:53 |
jeblair | mgagne: and we're planning on using haproxy to make the git server behave better too | 00:53 |
clarkb | jeblair: as long as the node def allows us to have nodes with digits that don't haproxy and the one without digits to haproxy we should be good | 00:54 |
*** lbragstad has joined #openstack-infra | 00:54 | |
fungi | yeah, seems like two node defs to me | 00:54 |
clarkb | fungi: yurp | 00:54 |
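
A sketch of the two node definitions being agreed on here, using the `cgit` class and the `balance_git` parameter mentioned later in the thread; the regex form and parameter wiring are illustrative guesses, not the merged change:

```puppet
# Hypothetical sketch only.
# Numbered backends serve cgit/git but run no load balancer themselves.
node /^git\d+\.openstack\.org$/ {
  class { 'cgit':
    balance_git => false,
  }
}

# The bare name runs the same serving stack plus the haproxy front end.
node 'git.openstack.org' {
  class { 'cgit':
    balance_git => true,
  }
}
```
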
clarkb | jeblair: should I start hacking something up? | 00:55 |
clarkb | or are you ahead of me and looking for reviews? | 00:55 |
jeblair | clarkb: no, i think we're at the point of 'looking for volunteers' | 00:55 |
mgagne | jeblair: I do understand the benefit of haproxy. I would however reduce keepalive timeout of apache and increase MaxClients if the server can handle it. Serving static files shouldn't put a server on its knees like that. | 00:56 |
clarkb | jeblair: ok I can start writing the change | 00:56 |
jswarren | Looks like there are problems with grenade. | 00:56 |
mordred | mgagne: well, right now we don't know what the server can handle | 00:56 |
mgagne | jeblair: but these are only blind suggestions as I don't have much info on what's really going on on the server | 00:56 |
fungi | mgagne: well, a lot of this is cgi backend, not flat file serving | 00:57 |
jeblair | mgagne: hopefully we'll have performance monitoring soon | 00:57 |
clarkb | jeblair: can you or someone else diagram what they want it to look like as we have talked about a bunch of different layouts and I am not 100% sure of what we settled on | 00:57 |
mordred | mgagne: and it's actually not about serving static files - it's the not-static that are a problem | 00:57 |
jeblair | fungi: what's your schedule like, are you working at all this evening? | 00:57 |
mordred | jswarren: we're working through some things. I do not know if that's related | 00:57 |
jeblair | (i'm getting pretty close to burnout point again myself, so will probably have to pick up tomorrow) | 00:57 |
mordred | jswarren: do you have a link | 00:57 |
*** ryanpetrello has joined #openstack-infra | 00:57 | |
fungi | jeblair: i can come back and work after dinner. christine is about to bite my head off if i don't take her out to dinner and sight seeing. she's getting bored of sitting in the hotel room | 00:57 |
jswarren | http://logs.openstack.org/98/31898/49/check/gate-grenade-devstack-vm/f68a47e/console.html | 00:58 |
jswarren | Seen a couple like that. | 00:58 |
mgagne | fungi: which CGI processes ? cgit or git-http-backend? | 00:58 |
jeblair | fungi: no pressure | 00:58 |
fungi | mgagne: git-http-backend | 00:58 |
mordred | jswarren: yes | 00:58 |
mordred | http://logs.openstack.org/98/31898/49/check/gate-grenade-devstack-vm/f68a47e/logs/devstack-gate-setup-workspace-new.txt | 00:58 |
fungi | mgagne: more specifically, hundreds of git-upload-pack | 00:58 |
fungi | i think | 00:58 |
mordred | oh. wait | 00:59 |
mgagne | fungi: could it be URL without .git being served by git-http-backend instead of hitting the filesystem? | 00:59 |
mordred | jeblair: | 00:59 |
*** zul has joined #openstack-infra | 00:59 | |
mordred | fatal: Couldn't find remote ref refs/zuul/master/Z10fb39f7b5984e1283445238278973f5 | 00:59 |
mordred | Unexpected end of command stream | 00:59 |
mordred | jeblair: is zuul also having problems? or is that a consequence of git.o.o having issues? | 00:59 |
jeblair | clarkb, fungi: https://etherpad.openstack.org/git-lb | 00:59 |
fungi | mgagne: it could be just about anything right now. with the server misbehaving potentially causing fallback behaviors in the clients it's tough to know what the real problem is and what the secondary effects are | 01:00 |
jeblair | diagram ^ | 01:00 |
jeblair | mordred: was that an error? | 01:00 |
mordred | jeblair: yes. in http://logs.openstack.org/98/31898/49/check/gate-grenade-devstack-vm/f68a47e/logs/devstack-gate-setup-workspace-new.txt | 01:00 |
jeblair | mordred: i think that's a perfectly normal error | 01:00 |
clarkb | jeblair: where does ssl terminate in that diagram? | 01:00 |
mordred | ok. | 01:00 |
mordred | good! | 01:00 |
jeblair | clarkb: my understanding of haproxy is that it proxies tcp connections, | 01:01 |
jeblair | clarkb: so i think ssl terminates at the workers | 01:01 |
fungi | clarkb: my assumption is ssl terminates on the individual nodes and we do at best layer 4 redirection | 01:01 |
mordred | yes to ^^ | 01:01 |
mordred | so git1 and git2 apache should each think that they are git.o.o | 01:01 |
clarkb | jeblair: fungi ok. I think we can have it terminate in haproxy but that makes it more complicated /me goes with terminating on the individual nodes | 01:01 |
mordred | which means that the apache module is likely going to need to change, or the puppet | 01:01 |
Shrews | haproxy does ssl pass thru, but the dev version is supposed to support ssl termination | 01:01 |
Shrews | just fyi | 01:01 |
jeblair | sounds like that's the right choice for now then. :) | 01:02 |
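
In haproxy terms, "terminating on the workers" just means the front end proxies port 443 in TCP mode and never touches the TLS stream. A minimal sketch, with the backend names and addresses made up:

```
# Sketch: layer-4 pass-through; certificates live only on the workers.
frontend git-https
    bind *:443
    mode tcp
    default_backend git-https-workers

backend git-https-workers
    mode tcp
    balance roundrobin
    server git01 git01.openstack.org:443 check
    server git02 git02.openstack.org:443 check
```
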
mordred | otherwise it's going to spin up apache on git1 as git1.o.o and the vhost info will be wrong - unless I'm wrong? | 01:02 |
jeblair | mordred: i believe that just means "there is no zuul ref for this project" | 01:02 |
mgagne | jeblair: haproxy should support both mode | 01:03 |
fungi | mordred: the apache module is perfectly capable of serving sites with different names than the node's name | 01:03 |
mordred | jeblair: great! I was worried that there was all of a sudden another issue | 01:03 |
mordred | jeblair: quick stupid suggestion - what if we stopped doing the git remote update | 01:03 |
fungi | mordred: unless i misunderstood the question | 01:04 |
mordred | jeblair: and instead just let it do the git fetch from zuul? | 01:04 |
mordred | which is a much more specific request for information | 01:04 |
jeblair | mordred: increase the load on zuul? rather not. :) | 01:04 |
mordred | jeblair: well, I'm just saying - it's already doing the fetch from zuul, and we're already starting from repos that are pretty close | 01:05 |
jeblair | mordred: i'm not sure we'd end up with tags, etc... | 01:05 |
mordred | ah. k. there it is. tags for sure | 01:05 |
jeblair | mordred: whatever it gets from git.o.o now it would have to get from zuul | 01:05 |
mordred | yeah. k. let's call it another thing to think about later when we have more time to think | 01:05 |
jeblair | mordred: yes; that would need some testing. | 01:06 |
mordred | as in - rethink the flow of the states of the refs in the repos and see if we can avoid the blanket 'git remote update' | 01:06 |
*** weshay has joined #openstack-infra | 01:06 | |
* mordred stops brainstorming | 01:06 | |
*** rfolco has quit IRC | 01:06 | |
mgagne | are exported resources supported by puppet on infra? | 01:06 |
mgagne | it requires storeconfigs | 01:07 |
mordred | mgagne: we've never used them | 01:07 |
mordred | mgagne: all of our stuff currently works via puppet apply as well as puppet agent | 01:07 |
mgagne | mordred: we will forget it for now I guess =) | 01:07 |
clarkb | mgagne: they are not | 01:08 |
clarkb | exported resources are kind of annoying to work with iirc. | 01:08 |
clarkb | because you need mutliple passes | 01:08 |
mordred | once we get to that level of complexity, I think we'll be happier with heat driving puppet and handing it the needed metadata | 01:08 |
mgagne | clarkb: true when bootstrapping an infra, order becomes important | 01:09 |
SpamapS | hm, is there a recheck bug for https://review.openstack.org/#/c/42995/ .. looks like just timeouts during git clones or something | 01:09 |
SpamapS | (is that what is being discussed right here right now?;) | 01:09 |
jeblair | clarkb, mordred: i can't run 'nova list' for any of the hpcloud azs | 01:09 |
mordred | SpamapS: yes | 01:10 |
mordred | jeblair: are you getting the 400 error? | 01:10 |
jeblair | yes | 01:10 |
mordred | that's what I was getting ealier | 01:10 |
*** anteaya has quit IRC | 01:11 | |
mordred | jeblair: I'm asking hp people | 01:11 |
mordred | "We have a couple of P1 incidents still ongoing.  We're on it." | 01:12 |
mordred | man, when it rains, it pours | 01:12 |
jeblair | mordred: ok, thanks. nodepool isn't going to be able to delete all those nodes until that's fixed. | 01:12 |
mordred | ossum | 01:12 |
jeblair | mordred: which means it is constrained in what it can spin up | 01:12 |
fungi | i get a list out of openstackjenkins2-project1 on az-1.region-a.geo-1 | 01:13 |
fungi | also out of az2 and az3 | 01:14 |
jeblair | weird, i do not. | 01:14 |
jeblair | this is as root on ci-puppetmaster | 01:15 |
fungi | be sure you're sourcing the openstackjenkins2 creds and not the old openstackjenkins creds? | 01:15 |
jeblair | fungi: yep; i'm in a terminal i've been using for days now | 01:15 |
jeblair | (screen session) | 01:15 |
*** mriedem has quit IRC | 01:16 | |
fungi | huh, yep | 01:16 |
fungi | i get the 400 from the puppet master | 01:16 |
*** tian has joined #openstack-infra | 01:17 | |
mordred | fungi: what's the network range of the machine you do not get 400 from | 01:17 |
mordred | ? | 01:17 |
mordred | and what's the puppetmaster IP? | 01:17 |
fungi | mordred: working from 66.26.81.51 and failing from 198.101.208.204 | 01:17 |
mordred | fungi: stellar | 01:18 |
fungi | oh, though on the working system i left out some of the params we define on the puppetmaster one. let me see if it's one of those | 01:19 |
mgagne | untested haproxy puppet manifest: http://paste.openstack.org/show/44704/ | 01:19 |
jeblair | a bunch of jobs are stuck trying to clone or update from earlier (about 1.25 hours ago) | 01:19 |
jeblair | i'm aborting them | 01:19 |
fungi | nope, using precisely the same creds on the puppetmaster as on my working system, i get a 400 error | 01:20 |
fungi | how do you get novaclient to tell you what version it is? | 01:21 |
fungi | my working system is running 2.14.1 from a virtualenv | 01:21 |
fungi | the puppet master is running $something_older i guess | 01:21 |
mordred | fungi: it works on my laptop using those creds | 01:22 |
fungi | so might be something specific to the way the api calls are being made by newer vs older novaclient | 01:22 |
clarkb | I will have a first draft of a change shortly | 01:22 |
mordred | wait. those were old creds. lemme try new ones | 01:22 |
mordred | yes. openstackjenkins2 works with the creds from puppetmaster on my laptop | 01:23 |
mordred | mordred@camelot:~/src/openstack-infra/gear$ nova --version | 01:24 |
mordred | 2.13.0.108 | 01:24 |
*** ryanpetrello has quit IRC | 01:24 | |
jeblair | i think the devstack jobs are cloning repos | 01:24 |
fungi | rather than using the cached copies? | 01:24 |
mordred | jeblair: full clones? | 01:24 |
jeblair | i think so | 01:25 |
fungi | that would explain the sudden explosion in git load | 01:25 |
mordred | wow. well, that would explain the amount of traffic | 01:25 |
jeblair | i will work on that after dinner, and build new images if needed. | 01:26 |
mordred | jeblair: I'll look at that too | 01:28 |
mordred | jeblair: btw - nova on ci-puppetmaster is 2012.1 | 01:29 |
mordred | so _very_ old | 01:29 |
mordred | and it was working earlier | 01:29 |
mordred | then got flaky | 01:29 |
mordred | now is dead | 01:29 |
mordred | so I'm asking if they did any deploys today | 01:29 |
mordred | because they may have broken compat with 2012.1 novaclient | 01:29 |
*** gyee has quit IRC | 01:30 | |
fungi | that would be so awesome^Wunfortunate | 01:30 |
jeblair | mordred: ERROR: HTTPSConnectionPool(host='region-a.geo-1.identity.hpcloudsvc.com', port=35357): Max retries exceeded with url: /v2.0/tokens (Caused by <class 'socket.gaierror'>: [Errno -3] Temporary failure in name resolution) | 01:31 |
jeblair | mordred: i ran that with a newer novaclient on the same system | 01:31 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 01:31 |
fungi | huh, that's freaky | 01:32 |
mordred | jeblair, fungi: host region-a.geo-1.identity.hpcloudsvc.com worked, but took a _while_ | 01:32 |
jeblair | mordred: yeah, i got a partial timeout trying that too | 01:33 |
* fungi ducks out to dinner but will check back in later | 01:33 | |
clarkb | assuming 42784 doesn't have any syntax errors or typos I actually expect that to work | 01:33 |
clarkb | it will only load balance across a single node of localhost right now | 01:33 |
*** zul has quit IRC | 01:36 | |
*** jjmb has joined #openstack-infra | 01:38 | |
clarkb | mordred: whatever happened to IAD? | 01:39 |
mordred | clarkb: I don't believe we did anything with it yet | 01:39 |
mordred | clarkb: I mean, I'm not sure that patch even landed | 01:39 |
jeblair | mordred: it did not land | 01:39 |
jeblair | i'm not running puppet on nodepool because it's too touch and go | 01:40 |
mordred | ++ | 01:40 |
mordred | jeblair, clarkb: troy thinks IAD may be faster - worth spinning up git1 and git2 in IAD? (also, probably fewer neighbors right now) | 01:40 |
jeblair | mordred: we'd be pushing updates across data centers | 01:41 |
mordred | hrm. good point | 01:41 |
jeblair | mordred: (not to mention pulling from them) | 01:41 |
lifeless | IAD? | 01:41 |
lifeless | Is that like the younger version of an IED? | 01:41 |
mordred | IAD is the airport code for the Washington Dulles airport | 01:42 |
mordred | lifeless: and is a new not-quite-rolled out region in rax cloud | 01:42 |
lifeless | ah | 01:42 |
mordred | lifeless: I don't know if we've mentioned before, but all of our important servers run in rax, because hp is too ephemeral and also blocks email ports | 01:43 |
lifeless | the email thing I knew | 01:43 |
lifeless | I didn't know about the ephemeral aspect; do you mean flaky? | 01:43 |
mordred | they also have not taken our feedback about how this makes them not suitable for our usecase to heart | 01:43 |
mordred | yes | 01:43 |
lifeless | is there a trouble ticket open on it? | 01:43 |
mordred | nodes get deleted from time to time | 01:43 |
lifeless | That seems like something we should do. | 01:44 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 01:44 |
mgagne | too fast, can't comment =( | 01:44 |
clarkb | I believe that patchset will pass tests and it has had some additional cleanup done to it | 01:44 |
clarkb | mgagne: you can comment on the older patchset | 01:44 |
clarkb | mgagne: I will look for your comments there | 01:44 |
jeblair | mordred: i believe my new scripts put all the git repos in ~ubuntu | 01:44 |
mordred | jeblair: oh poop. that's not where devstack looks for them | 01:45 |
jeblair | mordred: it's not the usual place, no. | 01:45 |
mgagne | sup with 29418 and 19418 ? | 01:46 |
clarkb | mgagne: 29418 is where the actual git-daemon will listen. Then each is fronted by an haproxy to do queueing that haproxy listens on 19418. | 01:47 |
clarkb | mgagne: all so that 9418 is free on git.o.o for the world. It's a bit ugly, yes | 01:47 |
clarkb | but I figure the haproxy at the front of everything shouldn't worry about queueing | 01:48 |
clarkb | I could be completely wrong | 01:48 |
mgagne | clarkb: I understand now, I was wondering if it was legitimate or you were typing with boxing gloves on | 01:49 |
clarkb | mgagne: gotcha, thank you for checking | 01:49 |
mgagne | clarkb: 4443? | 01:51 |
clarkb | mgagne: again to not conflict with 443 on the frontend haproxy, because the frontend haproxy is sharing space with apache | 01:52 |
mordred | jeblair: are you respinning/fixing? or would you like me to so you can get dinner? I'm happy to squeeze in being mildly helpful before going away | 01:52 |
clarkb | oh I missed the gerrit replication stuff /me adds that | 01:52 |
jeblair | mordred: i have a local patch i'm about to let it test while eating, so i think i got it. | 01:52 |
mordred | jeblair: ok | 01:53 |
jeblair | mordred: probably applying your expertise to reviewing the haproxy thing would be helpful | 01:53 |
mgagne | clarkb: my mistake, sorry | 01:53 |
*** ftcjeff has joined #openstack-infra | 01:53 | |
lifeless | jeblair: url? [I have haproxy knowledge, for my sins] | 01:54 |
lifeless | clarkb: one of the most useful things haproxy can do for maintaining consistent response time is to cap the concurrent backend work that is being permitted | 01:54 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 01:54 |
lifeless | clarkb: so it totally should be managing the queue | 01:54 |
clarkb | lifeless: yeah that is what we are using it for | 01:54 |
lifeless | clarkb: then 13:48 < clarkb> but I figure the haproxy at the front of everything shouldn't worry about queueing | 01:55 |
lifeless | clarkb: has me confused | 01:55 |
clarkb | lifeless: just not the frontend haproxy. We have three layers for git-daemon. The middle haproxy worries about the queue | 01:55 |
clarkb | lifeless: haproxy 9418 -> haproxy 19418 -> git-daemon 29418 | 01:55 |
lifeless | clarkb: cross-meshed across two machines ? | 01:55 |
clarkb | lifeless: https://review.openstack.org/#/c/42784/ | 01:55 |
clarkb | lifeless: no I am not worrying about the ha part of ha proxy right now | 01:56 |
clarkb | lifeless: https://review.openstack.org/#/c/42784/6/modules/cgit/manifests/init.pp has the most interesting bits in it | 01:56 |
lifeless | clarkb: ok, so whats the front end haproxy for ? | 01:56 |
clarkb | lifeless: load balancing | 01:56 |
lifeless | clarkb: what are the middle ones for then ? | 01:57 |
clarkb | lifeless: queueing | 01:57 |
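
Spelled out, the chain clarkb describes for git:// traffic is a front-end balancer on the public port plus a small queueing proxy sitting next to each git-daemon. The port numbers are the ones from the review above; the host names and the maxconn value are illustrative:

```
# On git.o.o: front-end load balancing on the public git port.
listen git-daemon-lb
    bind *:9418
    mode tcp
    balance roundrobin
    server git01 git01.openstack.org:19418 check
    server git02 git02.openstack.org:19418 check

# On each worker: a local haproxy that queues connections so git-daemon
# (bound to 29418) only ever sees a bounded number of them at once.
listen git-daemon-queue
    bind *:19418
    mode tcp
    server local-git 127.0.0.1:29418 maxconn 32
```
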
openstackgerrit | Kui Shi proposed a change to openstack-dev/hacking: Improve H202 to cover more cases https://review.openstack.org/43029 | 01:57 |
lifeless | clarkb: that doesn't make sense to me | 01:57 |
clarkb | lifeless: git-daemon needs queueing otherwise it just goes nuts. A simple haproxy -> gitdaemon gives us that | 01:58 |
*** ryanpetrello has joined #openstack-infra | 01:58 | |
lifeless | clarkb: ok; so why isn't that the front haproxy ? | 01:58 |
clarkb | lifeless: reason #1 is to make it easier to consume lbaas | 01:58 |
clarkb | the thing that the simple haproxy in front of gitdaemon is doing is something that our lbaas providers cannot do | 01:58 |
lifeless | the lbaas apis don't expose the full capabilities of haproxy like queue depth limits etc? | 01:59 |
clarkb | but everything in the frontend haproxy is something that could be replaced with lbaas | 01:59 |
clarkb | lifeless: they do not | 01:59 |
lifeless | sadface | 01:59 |
clarkb | you can set per host throttles | 01:59 |
clarkb | that is it | 01:59 |
*** lbragstad has quit IRC | 02:00 | |
lifeless | clarkb: so frankly, I wouldn't use lbaas then; you want queuing handled at the front end, and ha in the middle tier | 02:00 |
*** wenlock has joined #openstack-infra | 02:00 | |
lifeless | clarkb: but - this is your teams call; I'm just coming from my running-busy-site-with-haproxy-squid-etc-etc background | 02:00 |
mordred | lifeless: the main thing we're trying to get with this is just _some_ headroom without reengineering the whole thing yet | 02:01 |
mordred | we'd like to do a better/deeper re-architecture | 02:01 |
clarkb | lifeless: thanks, it is good to know. And yes we do plan on actually testing and engineering this stuff | 02:02 |
clarkb | but right now we need a thing that works | 02:02 |
mordred | but we need to actually analyze what's going on and what our capacity is etc - get real numbers/baselines | 02:02 |
lifeless | clarkb: whats deployed right now? | 02:02 |
mordred | yah. what he said | 02:02 |
lifeless | clarkb: all three layers? | 02:02 |
mordred | lifeless: a single apache server serving git | 02:02 |
lifeless | ok | 02:02 |
clarkb | lifeless: and xinetd in front of git-daemon | 02:02 |
clarkb | which is bonghits | 02:03 |
lifeless | so two layers of haproxy will work, but if you want to keep it simpler - which I encourage - I'd just deploy a single haproxy frontend | 02:03 |
SpamapS | simple is for the weak | 02:03 |
clarkb | I am going to manually install puppet-haproxy on the puppet master so that we can use dev envs with this change | 02:03 |
lifeless | and ignore lbaas for now, because what you want right now is breathing room. | 02:03 |
mordred | clarkb: ++ | 02:03 |
mordred | lifeless: yes. we are ignoring lbaas for now for sure | 02:04 |
SpamapS | http://terrorobe.soup.io/post/132401460/Downtime-is-sexy-Josh-Berkus-of-PostgreSQL | 02:04 |
SpamapS | :) | 02:04 |
clarkb | mordred: of course if I can't ssh to that server I might not install puppetlabs-haproxy | 02:04 |
clarkb | mordred: are you able to get in? | 02:04 |
mordred | ci-puppetmaster? | 02:05 |
clarkb | oh now it accepts my connection | 02:05 |
clarkb | mordred: yeah | 02:05 |
*** zul has joined #openstack-infra | 02:05 | |
mordred | clarkb: # TODO add additional git servers here. | 02:05 |
clarkb | mordred: you like that? | 02:06 |
*** markmcclain has quit IRC | 02:06 | |
mordred | clarkb: so, if I'm reading this right... | 02:06 |
clarkb | mordred: I think this may be a case of getting everything going on git.o.o first. Then building the new hosts and kicking everything | 02:06 |
lifeless | clarkb: looking at this I really think one haproxy is better | 02:06 |
mordred | clarkb: yes. so deploy the haproxy on git.o.o that haproxies localhost | 02:06 |
mordred | clarkb: right? | 02:07 |
lifeless | clarkb: your configuration could give you terrible latency as it stands | 02:07 |
clarkb | mordred: yup | 02:07 |
lifeless | clarkb: in overload situations | 02:07 |
mordred | and then add the additional git servers to it | 02:07 |
mordred | lifeless, clarkb: lifeless suggestion should be easy enough to test- set balance_git to false on git1 and git2 | 02:07 |
clarkb | lifeless: it could. the git-daemon stuff won't actually be heavily used immediately so we can work it out | 02:08 |
clarkb | lifeless: the http(s) stuff is the immediate concern | 02:08 |
clarkb | ci-puppetmaster seems to have network trouble too | 02:08 |
mordred | yeah. | 02:08 |
clarkb | I can't git fetch my change | 02:08 |
mordred | clarkb: check load | 02:08 |
clarkb | mordred: it's 1.5-ish | 02:09 |
mordred | clarkb: is salt running again? | 02:09 |
mordred | it should be 0 | 02:09 |
clarkb | hmm it is salt master again. I am going to kill that thing with fire | 02:09 |
mordred | yup. salt-master | 02:09 |
mordred | I believe puppet is going to re-launch him for us | 02:09 |
clarkb | heh all better now | 02:09 |
clarkb | mordred: ugh | 02:09 |
mordred | clarkb: might be worth disabling puppet agent on puppetmaster for a sec | 02:09 |
clarkb | mordred: ok I will do that. mordred do you want to write a puppet change to disable it? | 02:10 |
mordred | yes | 02:10 |
mordred | on it | 02:10 |
clarkb | *to disable salt master | 02:10 |
lifeless | clarkb: nearly finished | 02:12 |
openstackgerrit | Monty Taylor proposed a change to openstack-infra/config: Disable salt master and minions globally https://review.openstack.org/43030 | 02:12 |
mordred | clarkb: I hit it with a stick | 02:13 |
mordred | clarkb: our salt class wasn't really written with disabling in mind | 02:13 |
mordred | and I didn't want to run the risk of deleting the key info | 02:13 |
lifeless | clarkb: ok, reviewed. | 02:13 |
clarkb | I have stopped puppet on git.o.o as well. I am going to run puppet agent --test --environment development --noop there | 02:13 |
clarkb | lifeless: thank you for looking | 02:14 |
lifeless | clarkb: tl;dr - one haproxy, set a backlog of 200 or so, make sure you have maxconn and maxqueue set for each backend | 02:14 |
lifeless | clarkb: the backlog affects when clients get an error rather than a long pause during overload; the maxconn prevents overloading a backend, and the maxqueue is about signaling overload and errors early | 02:15 |
clarkb | lifeless: so maxqueue is different than the conn backlog? | 02:16 |
clarkb | lifeless: also, for whatever it is worth we seem to be very bursty eg after a gate reset | 02:16 |
mordred | yeah, I think we're quite ok with things sitting in backlog wait for a while | 02:16 |
clarkb | lifeless: so having a longer backlog where things wait their turn is better than failing a bunch of tests | 02:16 |
clarkb | lifeless: or at least that was the theory | 02:17 |
lifeless | clarkb: yes, they are different. | 02:17 |
lifeless | clarkb: so I suggest get it up and working and then tune the numbers up | 02:17 |
clarkb | that is the plan | 02:17 |
lifeless | clarkb: backlog holds things in SYN without SYN-ACK | 02:17 |
clarkb | does maxqueue hold things after a handshake? | 02:18 |
lifeless | yes | 02:18 |
lifeless | there is a TCP timeout on backlog | 02:19 |
lifeless | so you really don't want it too long | 02:19 |
lifeless | let me dig that up | 02:19 |
lifeless | http://www.ietf.org/mail-archive/web/tcpm/current/msg07472.html | 02:19 |
lifeless | 0,3 etc seconds | 02:19 |
lifeless | and folk are talking about reducing it | 02:19 |
clarkb | lifeless: I don't see a maxqueue in the keyword list at http://code.google.com/p/haproxy-docs/wiki/balance | 02:20 |
lifeless | you really can't sanely avoid errors by making the backlog high | 02:20 |
lifeless | clarkb: huh, ignore the wiki, useless. | 02:20 |
lifeless | clarkb: http://haproxy.1wt.eu/download/1.3/doc/configuration.txt | 02:20 |
lifeless | maxqueue <maxqueue> | 02:20 |
lifeless | The "maxqueue" parameter specifies the maximal number of connections which | 02:20 |
lifeless | will wait in the queue for this server. If this limit is reached, next | 02:20 |
lifeless | ... | 02:20 |
clarkb | lifeless: https://github.com/puppetlabs/puppetlabs-haproxy/blob/0.3.0/manifests/params.pp#L10-L34 are the default that we would use if we don't explicitly set them | 02:20 |
*** cwj has left #openstack-infra | 02:20 | |
clarkb | not sure why there are two different maxconn values | 02:21 |
lifeless | clarkb: one is on the server, one is on frontend | 02:21 |
lifeless | clarkb: they are wholly different and its terrible it has the same name | 02:21 |
clarkb | 8k is server wide and 4k is frontend specific? | 02:22 |
lifeless | clarkb: not sure about the puppet mapping; sorry - that maxconn mentionI made was about the global thing vs server backend limits | 02:23 |
lifeless | those defaults look non-terrible to me. | 02:23 |
lifeless | clarkb: anyhow, backlog has to be less than 3s - (max RTT/2) to avoid retransmits of SYN | 02:24 |
lifeless | clarkb: which would just add overhead. | 02:24 |
lifeless | clarkb: so yeah, way lower than you have it. | 02:24 |
lifeless | clarkb: use the queue timeout value and maxqueue to control how long something can be queued, and how many things can be queued for a server. | 02:25 |
lifeless | clarkb: HTH, I need to run for a bit; ping here and I will happily review again - or if you can get me a rendered haproxy config I'm very happy climbing through those | 02:25 |
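
Translated into haproxy configuration, lifeless's suggestions map to a handful of directives. The numbers below are placeholders showing where each knob lives, not tuned values (as noted above, real tuning needs benchmarks):

```
listen git-daemon-lb
    bind *:9418
    mode tcp
    backlog 200          # pending SYNs; beyond this, clients fail fast
    timeout queue 30s    # how long a connection may wait for a free slot
    balance roundrobin
    # maxconn caps concurrent work per backend; maxqueue caps how many
    # connections may wait specifically for that backend.
    server git01 git01.openstack.org:19418 check maxconn 32 maxqueue 200
    server git02 git02.openstack.org:19418 check maxconn 32 maxqueue 200
```
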
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 02:26 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Swap git daemon in xinetd for service https://review.openstack.org/43012 | 02:26 |
clarkb | mordred: lifeless ^ that fixes a bug the noop puppet run found in pleia2's change | 02:26 |
clarkb | lifeless: thank you. I think I am going to try ramming this in with the http stuff then fixup the gitdaemon stuff in a subsequent change | 02:27 |
clarkb | though maybe that is more work than it is worth | 02:27 |
clarkb | I think I am going to take this as an opportunity to head home | 02:28 |
clarkb | rerunning noop puppet really quickly with the latest patchset | 02:29 |
*** jerryz has quit IRC | 02:33 | |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 02:34 |
clarkb | puppet noop is being slow so I went ahead and fiddled with lifeless' suggestions | 02:34 |
*** yaguang has joined #openstack-infra | 02:34 | |
jeblair | hpcloud seems to be working better now, and nodepool seems to be doing a better job deleting nodes now | 02:35 |
*** morganfainberg is now known as morganfainberg|a | 02:36 | |
clarkb | jeblair: mordred: the current noop run looks mostly clean. There is one error but I think it is related to puppet not copying files locally because it is in noop mode | 02:36 |
clarkb | jeblair: mordred do we want to attempt applying it? | 02:37 |
clarkb | I think rolling back will involve stopping haproxy, and reapplying old puppet to get the old apache configs back | 02:37 |
jeblair | clarkb: i'm about to check out for the night (i'm past my point of uselessness), so i'd say: your call | 02:37 |
*** adalbas has quit IRC | 02:37 | |
clarkb | jeblair: I am feeling a bit like that too | 02:37 |
jeblair | clarkb: i'm mostly sticking around to fix the image thing (which should reduce the criticality of the git thing) | 02:38 |
clarkb | probably best to hold off on git for now | 02:38 |
jeblair | clarkb: sounds like the way to go. | 02:38 |
jeblair | clarkb: see how i'm talking? "image thing" "git thing"? | 02:38 |
jeblair | useless | 02:38 |
*** xchu has joined #openstack-infra | 02:39 | |
clarkb | :) I am beat | 02:39 |
* clarkb heads home. Tomorrow we can hit this thing with a giant stick | 02:39 | |
lifeless | ooh, stick. | 02:39 |
clarkb | lifeless: are my numbers at https://review.openstack.org/#/c/42784/8/modules/cgit/manifests/init.pp any better? | 02:40 |
lifeless | clarkb: btw - http://code.google.com/p/haproxy-docs/wiki/ServerOptions is where maxqueue is covered | 02:40 |
*** rcleere has joined #openstack-infra | 02:40 | |
lifeless | clarkb: it looks like that wiki is just machine-processed from the docs | 02:40 |
lifeless | clarkb: I don't know if maxqueue there will end up in the right place; but the literal numbers are saner yes. | 02:41 |
mordred | jeblair: you're sounding like me! | 02:42 |
clarkb | lifeless: ya I was worried about it not ending up in the right place after reading the maxqueue doc | 02:42 |
lifeless | clarkb: I'm not confident they are right in any shape, but then I have a different model for how failures should go in my head :) | 02:42 |
jeblair | mordred: i just put "sudo -u ubuntu" in a script and wondered why it didn't run as jenkins. | 02:42 |
lifeless | clarkb: and now is not the time to run through that given tired + fire drill | 02:42 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 02:45 |
clarkb | lifeless: ^ that puts maxqueue in the correct spot and now I am getting off of IRC | 02:45 |
clarkb | mordred: don't have too much fun on the playa. it will only make feature freeze less enjoyable for the rest of us :) | 02:46 |
mordred | clarkb: well, you can also look at it the other way... | 02:46 |
mordred | clarkb: I will be hitting in the extreme weather conditions of the high desert in an arid desert with an abnormally basic pH | 02:46 |
mordred | clarkb: where the only running water, food, electricity or trash service are the ones I bring myself | 02:47 |
mordred | clarkb: surrounded by 60k people who are all in various stages of mind alteration who are walking around with things on fire | 02:47 |
clarkb | mordred: good point. You have basically described why I would have a hard time doing it myself :) | 02:47 |
mordred | :) | 02:48 |
jeblair | mordred: how is that different than feature freeze? | 02:48 |
lifeless | jeblair: more dust? | 02:49 |
jeblair | lifeless: that must be it | 02:49 |
*** jhesketh has quit IRC | 02:52 | |
*** melwitt1 has quit IRC | 02:52 | |
*** jhesketh has joined #openstack-infra | 02:57 | |
openstackgerrit | James E. Blair proposed a change to openstack-infra/nodepool: Move setup scripts destination https://review.openstack.org/43033 | 02:57 |
jeblair | mordred: around? part 1 of my fix ^ | 03:00 |
mordred | jeblair: looking | 03:00 |
*** dims has quit IRC | 03:00 | |
mordred | jeblair: and the difference is the alkalinity | 03:00 |
jeblair | mordred: i don't actually need to merge that one (i can just run it in place) | 03:00 |
jeblair | mordred: the next one i will need to merge | 03:01 |
mordred | jeblair: +2 anyway | 03:01 |
SpamapS | I am so jealous of you guys | 03:02 |
SpamapS | I haven't scaled anything in years. :-P | 03:03 |
mordred | SpamapS: you're always welcome in here | 03:04 |
jeblair | SpamapS: it's all yours if you want it. :) | 03:04 |
mordred | SpamapS: there's plenty to go around | 03:04 |
mordred | SpamapS: also, just wait until we start using heat for some of this | 03:04 |
SpamapS | uh err, no I'm busy with my theoretical scaling things in Heat. | 03:04 |
jeblair | SpamapS: the team scales horizontally too | 03:04 |
mordred | SpamapS: we'll have excellent real-world feedback for you | 03:04 |
SpamapS | I actually can't wait | 03:04 |
*** jog0 is now known as jog0-away | 03:04 | |
mordred | you say that now... | 03:04 |
jeblair | SpamapS: it comes in the form of mordred yelling | 03:05 |
mordred | nothing is ever quite so fun as watching the thundering herd of feature freeze come your way | 03:05 |
SpamapS | You guys will thank me that I got this one done: https://bugs.launchpad.net/heat/+bug/1214580 :) | 03:05 |
uvirtbot | Launchpad bug 1214580 in heat "Heat does not make use of the C libyaml parser." [High,In progress] | 03:05 |
jeblair | that's some serious scaling | 03:05 |
mordred | SpamapS: is libyaml web-scale? | 03:05 |
SpamapS | mordred: its not. It doesn't use /dev/null. | 03:05 |
mordred | SpamapS: I mean, I've heard that /dev/null processes yaml faster | 03:05 |
mordred | dammit | 03:06 |
mordred | you were quicker | 03:06 |
SpamapS | Need to tackle the ORM insanity though.. https://bugs.launchpad.net/heat/+bug/1214602 | 03:06 |
uvirtbot | Launchpad bug 1214602 in heat "stack_list loads all resource from the database via the ORM" [Medium,Triaged] | 03:06 |
*** woodspa has quit IRC | 03:06 | |
mordred | SpamapS: oh, have fun with that | 03:06 |
SpamapS | 100 stacks, 10 resources each == 1000 sql queries to do 'heat stack-list' | 03:06 |
SpamapS | actually probably 1100 sql queries | 03:06 |
mordred | SpamapS: now that IS web-scale | 03:08 |
SpamapS | https://bugs.launchpad.net/heat/+bug/1214239 | 03:10 |
uvirtbot | Launchpad bug 1214239 in heat "Infinitely recursing stacks reach python's maximum recursion depth" [Medium,Triaged] | 03:10 |
SpamapS | mordred: ^^ thats what I'm working on right now | 03:10 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Fix nodepool setup scripts https://review.openstack.org/43037 | 03:13 |
jeblair | mordred: review+aprv ^ ? | 03:13 |
mordred | jeblair: wow. you might actually almost enjoy the weather on playa this year: http://www.weather.com/weather/tenday/Gerlach+NV+USNV0033 | 03:13 |
mordred | jeblair: looking | 03:13 |
jeblair | mordred: wow, not bad. i could dig that. | 03:14 |
jeblair | mordred: maybe i'll go to the desert next door. | 03:14 |
mordred | jeblair: why get rid of pushd/popd? (curious) | 03:14 |
jeblair | mordred: don't care about the current dir anymore; there's an explicit cd to the script dir at the bottom | 03:15 |
mordred | jeblair: ah. see it | 03:15 |
jeblair | mordred: (cwd should now be ~jenkins instead of the script dir) | 03:15 |
jeblair | mordred: (because of the sudo) | 03:15 |
mordred | +2 - want me to aprv? | 03:15 |
jeblair | mordred: pls | 03:16 |
mordred | done | 03:16 |
jeblair | my local instance is spinning up a host from an image from that now; i'll double check it's sane and then apply | 03:16 |
mordred | ++ | 03:17 |
jeblair | mordred: then i'll set the image state to delete for one of the providers, which should automatically build a new one | 03:18 |
jeblair | and do that one at a time | 03:18 |
openstackgerrit | A change was merged to openstack-infra/config: Fix nodepool setup scripts https://review.openstack.org/43037 | 03:19 |
jeblair | mordred: even though i set one image to deleted, it's going to rebuild all of them. | 03:26 |
jeblair | so, um, hopefully it will work. :) | 03:26 |
jeblair | (the old ones will still be there, so we can roll back if we need to; it's just going to be a little less incremental than i'd hoped) | 03:27 |
jeblair | mordred: i think the post jobs need to fetch from review.o.o; the replication to git.o.o isn't fast enough | 03:31 |
jeblair | mordred: (or else, we should catch that error and retry in g-g-p) | 03:32 |
jeblair | mordred: https://jenkins02.openstack.org/job/openstack-admin-manual-netconn/47/console | 03:32 |
*** blamar has quit IRC | 03:32 | |
*** mberwanger has joined #openstack-infra | 03:34 | |
*** blamar has joined #openstack-infra | 03:34 | |
*** ^d has quit IRC | 03:44 | |
jeblair | mordred: that image seems to be good; it's no longer cloning repos | 03:46 |
mordred | jeblair: awesome | 03:47 |
jeblair | unfortunately, the image build process for az2 was disconnected, so it's still using the old one | 03:47 |
mordred | jeblair: and yes re: post fetch from review.o.o | 03:47 |
jeblair | i'll kick off another image build, hopefully az2 will succeed this time | 03:47 |
mordred | sigh | 03:47 |
jeblair | mordred: the good news is that at this point, it will keep making new nodes from az1 and az3 | 03:48 |
jeblair | and will only start using az2 again if that image update succeeds | 03:48 |
jeblair | mordred: so starting from right now, no new nodes should be created from the old images | 03:48 |
*** nayward has joined #openstack-infra | 03:57 | |
openstackgerrit | Jason Meridth proposed a change to openstack-dev/hacking: Adds ability to ignore hacking validations with noqa https://review.openstack.org/41713 | 03:58 |
jeblair | mordred: az2 failed again | 04:01 |
*** ftcjeff has quit IRC | 04:01 | |
jeblair | i'm going to leave it as is, maybe it'll get better overnight. | 04:01 |
mordred | kk | 04:06 |
*** afazekas has joined #openstack-infra | 04:09 | |
*** vogxn has joined #openstack-infra | 04:09 | |
mgagne | wow, git.o.o interface is blazing fast now | 04:13 |
mordred | mgagne: whee! | 04:17 |
mordred | mgagne: it helps when it's not being pummelled to death by zuul jobs | 04:17 |
mgagne | mordred: well, we can say it was a great benchmark | 04:18 |
mordred | mgagne: we always learn MANY MANY things during feature freeze | 04:18 |
mgagne | mordred: haha, I was questioning the timing of such an update =) | 04:18 |
mordred | mgagne: we knew the rush was coming, we've been trying to get enough new tech in place to handle it | 04:19 |
mordred | mgagne: part of this rush was that we removed one of the bottlenecks from last time by making that part of the system better | 04:19 |
mordred | mgagne: and have thus found the next piece in the puzzle :) | 04:19 |
mgagne | mordred: =) | 04:19 |
*** senk has quit IRC | 04:20 | |
*** sridevi has joined #openstack-infra | 04:22 | |
*** sridevi has left #openstack-infra | 04:22 | |
*** sridevi has joined #openstack-infra | 04:23 | |
*** wenlock has quit IRC | 04:23 | |
*** sridevi has quit IRC | 04:32 | |
jeblair | mordred: we now have a full set of images for all the providers | 04:33 |
*** mberwanger has quit IRC | 04:34 | |
*** morganfainberg|a is now known as morganfainberg | 04:35 | |
*** mberwanger has joined #openstack-infra | 04:38 | |
* fungi is caught back up and reviewing the outstanding bits. glad the source of the pummeling was figured out | 04:38 | |
fungi | even with the performance issues we had, the graph says we still spiked up to 600jph today | 04:40 |
jeblair | mordred, clarkb: there's a boat load of hpcloud servers stuck in "ACTIVE(deleting)" state; we may need to open a trouble ticket if they're still around tomorrow | 04:40 |
jeblair | the neutron job seems to be flakey :( | 04:42 |
*** nati_ueno has joined #openstack-infra | 04:42 | |
*** nayward has quit IRC | 04:46 | |
*** Anju has joined #openstack-infra | 04:54 | |
*** ladquin is now known as ladquin_afk | 04:55 | |
*** thomasbiege1 has joined #openstack-infra | 05:01 | |
*** thomasbiege1 has quit IRC | 05:01 | |
*** mirrorbox has quit IRC | 05:05 | |
*** mberwanger has quit IRC | 05:06 | |
*** ogelbukh has quit IRC | 05:06 | |
*** enikanorov-w has quit IRC | 05:08 | |
*** enikanorov-w has joined #openstack-infra | 05:10 | |
*** sridevi has joined #openstack-infra | 05:15 | |
*** rcleere has quit IRC | 05:23 | |
sridevi | Hi, can anyone help me debug the jenkins' failure in https://review.openstack.org/#/c/34801/ | 05:28 |
sridevi | anyone? | 05:29 |
sridevi | around? | 05:29 |
*** DennyZhang has joined #openstack-infra | 05:33 | |
*** nicedice_ has quit IRC | 05:37 | |
*** UtahDave has joined #openstack-infra | 05:47 | |
*** DennyZhang has quit IRC | 05:55 | |
openstackgerrit | A change was merged to openstack/requirements: Remove upper bounds on lifeless test libraries https://review.openstack.org/42515 | 05:55 |
*** vogxn has quit IRC | 05:57 | |
*** cody-somerville has quit IRC | 05:57 | |
*** sridevi has quit IRC | 05:57 | |
openstackgerrit | A change was merged to openstack/requirements: Add dogpile.cache>=0.5.0 to requirements https://review.openstack.org/42455 | 05:58 |
*** vogxn has joined #openstack-infra | 05:58 | |
*** w_ has joined #openstack-infra | 06:02 | |
*** olaph has quit IRC | 06:05 | |
*** ryanpetrello has quit IRC | 06:11 | |
*** vogxn has quit IRC | 06:11 | |
*** cody-somerville has joined #openstack-infra | 06:13 | |
*** nayward has joined #openstack-infra | 06:17 | |
*** vogxn has joined #openstack-infra | 06:20 | |
*** Dr0id has joined #openstack-infra | 06:20 | |
*** dmakogon_ has joined #openstack-infra | 06:24 | |
*** Dr0id has quit IRC | 06:25 | |
*** annegentle has quit IRC | 06:25 | |
*** odyssey4me4 has joined #openstack-infra | 06:25 | |
*** psedlak has joined #openstack-infra | 06:30 | |
*** annegentle has joined #openstack-infra | 06:30 | |
*** AJaeger has joined #openstack-infra | 06:33 | |
*** sridevi has joined #openstack-infra | 06:34 | |
*** afazekas has quit IRC | 06:44 | |
*** jinkoo has joined #openstack-infra | 06:51 | |
*** ruhe has joined #openstack-infra | 06:52 | |
*** Guest75819 has quit IRC | 06:56 | |
openstackgerrit | Mark McLoughlin proposed a change to openstack/requirements: Allow use of oslo.messaging 1.2.0a10 https://review.openstack.org/43060 | 07:04 |
*** lillie has joined #openstack-infra | 07:06 | |
*** lillie is now known as Guest16331 | 07:06 | |
*** stevebaker has quit IRC | 07:07 | |
*** Dr01d has joined #openstack-infra | 07:10 | |
*** stevebaker has joined #openstack-infra | 07:12 | |
sridevi | anyone around? | 07:14 |
sridevi | I'm having trouble debugging the devstack neutron failures | 07:15 |
*** stevebaker has quit IRC | 07:18 | |
*** thomasbiege1 has joined #openstack-infra | 07:18 | |
*** jinkoo has quit IRC | 07:19 | |
*** yonglihe_ has joined #openstack-infra | 07:19 | |
yonglihe_ | hello, it seems a Jenkins build machine had a problem, | 07:20 |
yonglihe_ | 2013-08-21 06:11:42.794 | Started by user anonymous | 07:20 |
yonglihe_ | 2013-08-21 06:11:42.797 | [EnvInject] - Loading node environment variables. | 07:20 |
yonglihe_ | 2013-08-21 06:11:42.833 | Building remotely on centos6-7 in workspace /home/jenkins/workspace/gate-nova-python26 | 07:20 |
yonglihe_ | 2013-08-21 06:11:42.866 | [gate-nova-python26] $ /bin/bash -xe /tmp/hudson2665365283182338716.sh | 07:20 |
yonglihe_ | 2013-08-21 06:11:42.873 | + /usr/local/jenkins/slave_scripts/gerrit-git-prep.sh https://review.openstack.org http://zuul.openstack.org https://git.openstack.org | 07:20 |
yonglihe_ | 2013-08-21 06:11:42.877 | Triggered by: https://review.openstack.org/35074 | 07:20 |
yonglihe_ | 2013-08-21 06:11:42.877 | + [[ ! -e .git ]] | 07:20 |
yonglihe_ | 2013-08-21 06:11:42.878 | + git remote set-url origin https://git.openstack.org/openstack/nova | 07:20 |
yonglihe_ | 2013-08-21 06:11:42.882 | + git remote update | 07:20 |
yonglihe_ | 2013-08-21 06:11:42.889 | Fetching origin | 07:20 |
yonglihe_ | 2013-08-21 06:51:42.842 | Build timed out (after 40 minutes). Marking the build as failed. | 07:21 |
yonglihe_ | 2013-08-21 06:51:42.934 | fatal: The remote end hung up unexpectedly | 07:21 |
yonglihe_ | 2013-08-21 06:51:42.939 | error: Could not fetch origin | 07:21 |
yonglihe_ | 2013-08-21 06:51:42.941 | + git remote update | 07:21 |
yonglihe_ | 2013-08-21 06:51:42.949 | Fetching origin | 07:21 |
yonglihe_ | http://logs.openstack.org/74/35074/24/check/gate-nova-python26/5227601/console.html | 07:21 |
*** pblaho has joined #openstack-infra | 07:21 | |
yonglihe_ | sorry for the long log | 07:21 |
*** stevebaker has joined #openstack-infra | 07:21 | |
*** sridevi has quit IRC | 07:24 | |
morganfainberg | yonglihe_: i'm sure not a worry, but next time (to avoid the long log) use a paste (e.g. http://paste.openstack.org/ ) | 07:25 |
morganfainberg | (that way you can reference it again if needed as well w/o having to hunt for it) | 07:25 |
*** kspear has quit IRC | 07:25 | |
*** xBsd has joined #openstack-infra | 07:27 | |
*** DennyZhang has joined #openstack-infra | 07:28 | |
*** dmakogon_ has quit IRC | 07:28 | |
*** shardy_afk is now known as shardy | 07:29 | |
*** michchap has quit IRC | 07:29 | |
*** GheRivero has quit IRC | 07:30 | |
*** kspear has joined #openstack-infra | 07:30 | |
*** thomasbiege1 has quit IRC | 07:33 | |
*** GheRivero has joined #openstack-infra | 07:35 | |
*** dmakogon_ has joined #openstack-infra | 07:37 | |
*** kspear has quit IRC | 07:40 | |
yonglihe_ | thanks morganfainberg, i got it | 07:40 |
*** boris-42 has joined #openstack-infra | 07:42 | |
*** jpich has joined #openstack-infra | 07:51 | |
*** nati_uen_ has joined #openstack-infra | 07:54 | |
*** michchap has joined #openstack-infra | 07:54 | |
yonglihe_ | http://paste.openstack.org/show/44724/ | 07:54 |
*** michchap has quit IRC | 07:54 | |
yonglihe_ | seems something was lost, but i cannot find which machine this is. | 07:55 |
*** michchap has joined #openstack-infra | 07:55 | |
*** fbo_away is now known as fbo | 07:55 | |
*** nati_ueno has quit IRC | 07:57 | |
*** vogxn has quit IRC | 07:57 | |
*** mikal has joined #openstack-infra | 07:59 | |
*** GheRivero has quit IRC | 08:02 | |
*** GheRivero has joined #openstack-infra | 08:02 | |
*** michchap has quit IRC | 08:03 | |
*** GheRivero has quit IRC | 08:03 | |
*** GheRivero has joined #openstack-infra | 08:04 | |
*** xchu has quit IRC | 08:05 | |
*** nayward has quit IRC | 08:10 | |
*** nayward has joined #openstack-infra | 08:11 | |
*** dmakogon_ has quit IRC | 08:11 | |
*** moted has quit IRC | 08:11 | |
*** EntropyWorks has quit IRC | 08:11 | |
*** soren has quit IRC | 08:11 | |
*** mindjiver has quit IRC | 08:11 | |
*** clarkb has quit IRC | 08:11 | |
*** rockstar has quit IRC | 08:11 | |
*** echohead has quit IRC | 08:11 | |
*** jeblair has quit IRC | 08:11 | |
*** echohead has joined #openstack-infra | 08:12 | |
*** mindjiver has joined #openstack-infra | 08:12 | |
*** jeblair has joined #openstack-infra | 08:12 | |
*** clarkb has joined #openstack-infra | 08:12 | |
*** EntropyWorks has joined #openstack-infra | 08:12 | |
*** soren has joined #openstack-infra | 08:12 | |
*** soren has quit IRC | 08:12 | |
*** soren has joined #openstack-infra | 08:12 | |
*** moted has joined #openstack-infra | 08:12 | |
*** Kiall has quit IRC | 08:13 | |
*** rockstar has joined #openstack-infra | 08:13 | |
*** AJaeger has quit IRC | 08:13 | |
*** kiall has joined #openstack-infra | 08:15 | |
*** vogxn has joined #openstack-infra | 08:20 | |
*** GheRivero has quit IRC | 08:20 | |
*** xchu has joined #openstack-infra | 08:21 | |
*** GheRivero has joined #openstack-infra | 08:21 | |
*** xBsd has quit IRC | 08:22 | |
*** GheRivero has quit IRC | 08:22 | |
*** GheRivero has joined #openstack-infra | 08:22 | |
*** GheRivero has quit IRC | 08:29 | |
*** GheRivero has joined #openstack-infra | 08:29 | |
*** xBsd has joined #openstack-infra | 08:32 | |
*** michchap has joined #openstack-infra | 08:34 | |
*** michchap has quit IRC | 08:42 | |
*** UtahDave has quit IRC | 08:45 | |
*** Dr01d has quit IRC | 08:45 | |
*** Dr01d has joined #openstack-infra | 08:46 | |
*** DennyZha` has joined #openstack-infra | 08:53 | |
*** DennyZhang has quit IRC | 08:55 | |
*** xBsd has quit IRC | 08:55 | |
*** jpich has quit IRC | 08:57 | |
*** jpich has joined #openstack-infra | 08:59 | |
*** BobBall_Away is now known as BobBall | 09:06 | |
*** yaguang has quit IRC | 09:07 | |
*** xchu has quit IRC | 09:07 | |
*** yaguang has joined #openstack-infra | 09:09 | |
*** yaguang has quit IRC | 09:14 | |
*** ruhe has quit IRC | 09:16 | |
*** ruhe has joined #openstack-infra | 09:18 | |
*** xchu has joined #openstack-infra | 09:20 | |
*** yaguang has joined #openstack-infra | 09:27 | |
*** nayward has quit IRC | 09:28 | |
*** yaguang has quit IRC | 09:35 | |
*** ruhe has quit IRC | 09:37 | |
*** kspear has joined #openstack-infra | 09:37 | |
*** xBsd has joined #openstack-infra | 09:39 | |
*** yaguang has joined #openstack-infra | 09:42 | |
*** nayward has joined #openstack-infra | 09:52 | |
*** markmc has joined #openstack-infra | 09:54 | |
*** DennyZha` has quit IRC | 10:01 | |
*** pcm_ has joined #openstack-infra | 10:04 | |
*** pcm_ has quit IRC | 10:06 | |
*** pcm_ has joined #openstack-infra | 10:06 | |
*** boris-42 has quit IRC | 10:09 | |
*** ruhe has joined #openstack-infra | 10:16 | |
*** xchu has quit IRC | 10:19 | |
*** Shrews has quit IRC | 10:27 | |
*** Shrews has joined #openstack-infra | 10:36 | |
*** nati_uen_ has quit IRC | 10:39 | |
*** markmcclain has joined #openstack-infra | 10:39 | |
*** xBsd has quit IRC | 10:39 | |
*** xBsd has joined #openstack-infra | 10:40 | |
markmc | anyone seeing zuul miss events ? | 10:49 |
* markmc just pushed ~30 nova patches and there's only 10 in the check queue | 10:49 | |
markmc | maybe it's just catching up | 10:50 |
markmc | ah, yeah | 10:50 |
markmc | 1 added every 30 seconds or so | 10:50 |
*** vogxn has quit IRC | 10:50 | |
*** SergeyLukjanov has joined #openstack-infra | 10:50 | |
markmc | oh, god, I shouldn't watch the zuul dashboard | 10:52 |
markmc | this failure: https://jenkins02.openstack.org/job/gate-swift-devstack-vm-functional/94/console | 10:52 |
markmc | just aborted 18 changes in the gate queue | 10:53 |
markmc | tragic | 10:53 |
*** SergeyLukjanov has quit IRC | 10:54 | |
openstackgerrit | Stuart McLaren proposed a change to openstack/requirements: Bump python-swiftclient requirement to >=1.5 https://review.openstack.org/43092 | 10:54 |
*** ruhe has quit IRC | 10:59 | |
*** yaguang has quit IRC | 11:00 | |
*** ruhe has joined #openstack-infra | 11:01 | |
*** SergeyLukjanov has joined #openstack-infra | 11:03 | |
*** whayutin_ has joined #openstack-infra | 11:04 | |
mordred | markmc: morning | 11:04 |
markmc | mordred, howdy | 11:05 |
mordred | markmc: at oscon we discussed an idea about how to speculatively deal with the scenario you tweeted about | 11:05 |
*** dina_belova has joined #openstack-infra | 11:05 | |
*** weshay has quit IRC | 11:05 | |
markmc | mordred, the "no! no! no! zuul! don't do it! nooooo!" scenario? :) | 11:06 |
mordred | markmc: it gets complex, so it's not going to happen this cycle, but there is a way we could use WAY more resources to start a new virtual queue based on the now-presumptive state of the world | 11:06 |
mordred | markmc: yeah | 11:06 |
mordred | the reason we leave those jobs aborted currently is that we don't know if changes 1 and 2 will fail or not - so we wait for the queue head of the aborted jobs to resolve | 11:06 |
mordred | if we restarted them currently, we'd essentially need to start building a tree rather than a plain queue | 11:07 |
mordred | but - we had a good chat about it | 11:07 |
mordred | :) | 11:07 |
markmc | not sure I follow, but definitely an interesting subject :) | 11:07 |
markmc | now go have fun offline :) | 11:08 |
mordred | I think now that we have gearman, multi-jenkins and the new nodepool code - we'll be set nicely to think about things like that next cycle | 11:08 |
mordred | markmc: I have 5 hours of plane flights before I get to do that | 11:08 |
markmc | ah, lovely | 11:08 |
mordred | yah. | 11:08 |
markmc | use that time wisely | 11:08 |
* mordred is hoping that he can provide _some_ usefulness after how hectic the past two days have been | 11:09 | |
markmc | like replying to all your linkedin recruiter spam | 11:09 |
mordred | jeez | 11:09 |
mordred | that's not possible | 11:09 |
mordred | although, I've learned that there is a Java Opportunity in Studio City, CA | 11:09 |
markmc | I contemplated hacking on gerrit's topic review support briefly yesterday | 11:09 |
markmc | very briefly | 11:09 |
mordred | hahahaha | 11:10 |
*** whayutin_ is now known as weshay | 11:11 | |
mordred | git.o.o is running warm, but not dying currently: | 11:14 |
mordred | http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=854&rra_id=all | 11:14 |
mordred | given the current length of the queue, i'm going to take that as a good thing | 11:15 |
mordred | load average of 11 - all cpus at around 80% | 11:16 |
mordred | WOW | 11:18 |
*** boris-42 has joined #openstack-infra | 11:19 | |
mordred | swift change 28892 has been in the gate queue for 12H | 11:19 |
mordred | but it MIGHT merge in 11 minutes | 11:20 |
mordred | after which point we will have a gate reset event :) | 11:20 |
mordred | and everyone can watch the thundering herd clone from git.o.o | 11:21 |
*** xBsd has quit IRC | 11:26 | |
*** xBsd has joined #openstack-infra | 11:31 | |
*** BobBall has quit IRC | 11:37 | |
*** lcestari has joined #openstack-infra | 11:41 | |
*** zehicle_at_dell has joined #openstack-infra | 11:41 | |
*** nayward has quit IRC | 11:45 | |
*** dina_belova has quit IRC | 11:45 | |
*** xBsd has quit IRC | 11:45 | |
*** AJaeger has joined #openstack-infra | 11:47 | |
*** xBsd has joined #openstack-infra | 11:49 | |
*** dims has joined #openstack-infra | 11:52 | |
*** apcruz has joined #openstack-infra | 11:54 | |
*** zehicle_at_dell has quit IRC | 11:55 | |
*** AJaeger has quit IRC | 11:59 | |
*** ruhe has quit IRC | 12:00 | |
*** AJaeger has joined #openstack-infra | 12:02 | |
*** BobBall has joined #openstack-infra | 12:03 | |
*** dprince has joined #openstack-infra | 12:03 | |
*** psedlak has quit IRC | 12:04 | |
*** michchap has joined #openstack-infra | 12:06 | |
*** michchap has joined #openstack-infra | 12:07 | |
*** AJaeger has quit IRC | 12:11 | |
*** SergeyLukjanov has quit IRC | 12:11 | |
*** zehicle_at_dell has joined #openstack-infra | 12:12 | |
openstackgerrit | Julien Danjou proposed a change to openstack-infra/config: Add py33 jobs for WSME https://review.openstack.org/43112 | 12:16 |
*** jungleboyj has quit IRC | 12:17 | |
*** jungleboyj has joined #openstack-infra | 12:18 | |
*** zehicle_at_dell has quit IRC | 12:24 | |
*** dkliban has quit IRC | 12:27 | |
*** jjmb has quit IRC | 12:35 | |
*** dkranz has joined #openstack-infra | 12:36 | |
*** dims has quit IRC | 12:40 | |
*** dkranz has quit IRC | 12:41 | |
*** dims has joined #openstack-infra | 12:42 | |
openstackgerrit | A change was merged to openstack/requirements: assign a min version to pycadf https://review.openstack.org/42923 | 12:47 |
*** pabelanger has quit IRC | 12:55 | |
*** cppcabrera has joined #openstack-infra | 12:57 | |
*** adalbas has joined #openstack-infra | 12:57 | |
*** alexpilotti has joined #openstack-infra | 13:01 | |
*** ruhe has joined #openstack-infra | 13:04 | |
*** weshay has quit IRC | 13:05 | |
*** zehicle_at_dell has joined #openstack-infra | 13:07 | |
*** mriedem has joined #openstack-infra | 13:09 | |
markmc | seeing a lot of these | 13:11 |
markmc | https://jenkins02.openstack.org/job/gate-grenade-devstack-vm/2834/console | 13:11 |
markmc | anyone know what the cause is? | 13:11 |
*** cppcabrera has left #openstack-infra | 13:11 | |
mordred | markmc: looking | 13:11 |
*** dina_belova has joined #openstack-infra | 13:11 | |
*** dina_belova has quit IRC | 13:12 | |
*** dina_belova has joined #openstack-infra | 13:12 | |
mordred | markmc: http://logs.openstack.org/56/42756/5/check/gate-grenade-devstack-vm/c05ec42/logs/devstack-gate-setup-workspace-old.txt | 13:13 |
mordred | markmc: | 13:13 |
mordred | + timeout -k 1m 5m git remote update | 13:13 |
mordred | Fetching origin | 13:13 |
mordred | error: RPC failed; result=52, HTTP code = 0 | 13:14 |
openstackgerrit | will soula proposed a change to openstack-infra/jenkins-job-builder: Adding AnsiColor Support https://review.openstack.org/43121 | 13:14 |
mordred | fatal: The remote end hung up unexpectedly | 13:14 |
mordred | error: Could not fetch origin | 13:14 |
mordred | clarkb, jeblair, fungi ^^ looks like we're still slamming git.o.o | 13:14 |
markmc | ok | 13:15 |
*** dina_belova has quit IRC | 13:17 | |
*** acabrera has joined #openstack-infra | 13:17 | |
*** acabrera has left #openstack-infra | 13:18 | |
*** vogxn has joined #openstack-infra | 13:19 | |
*** weshay has joined #openstack-infra | 13:19 | |
*** anteaya has joined #openstack-infra | 13:25 | |
*** jjmb has joined #openstack-infra | 13:25 | |
mordred | markmc: wanna hear something funny? | 13:29 |
markmc | mordred, perhaps :) | 13:29 |
mordred | hacking can't pass unittest in python 3.3 because of its python 3 compatibility checks | 13:29 |
mordred | because the good/bad strings throw different errors :) | 13:30 |
markmc | nice | 13:30 |
mordred | yah | 13:30 |
*** afazekas has joined #openstack-infra | 13:30 | |
*** afazekas has quit IRC | 13:31 | |
*** lbragstad has joined #openstack-infra | 13:32 | |
*** sandywalsh has quit IRC | 13:43 | |
*** jjmb has quit IRC | 13:46 | |
*** changbl has quit IRC | 13:46 | |
*** ftcjeff has joined #openstack-infra | 13:48 | |
*** prad_ has joined #openstack-infra | 13:53 | |
jeblair | things would be a lot better if the neutron job weren't flakey | 13:54 |
anteaya | morning jeblair | 13:57 |
anteaya | I might be wrong, but am I seeing we have 10 devstack precise nodes available? | 13:57 |
anteaya | when we normally have about twice as many | 13:57 |
jeblair | anteaya: http://tinyurl.com/kmotmns | 13:58 |
anteaya | ah so the chart at the very bottom of the long check queue on zuul status page is just saying we have very few free, since it is entitled "available test nodes" | 13:59 |
jeblair | yep | 13:59 |
anteaya | I knew my interpretation didn't make sense | 13:59 |
anteaya | thanks | 13:59 |
* anteaya is refraining from acknowledging mordred since he is on vacation | 14:00 | |
mordred | morning anteaya | 14:00 |
anteaya | Wednesday 8am, the timeline has got to be Central time, for some reason | 14:00 |
mordred | jeblair: remember the tox 1.6 issue where it stopped using our mirror? | 14:01 |
anteaya | which is weird since I know you are on Pacific time jeblair | 14:01 |
anteaya | morning mordred | 14:01 |
mordred | jeblair: I filed a bug and hpk said that $HOME thing should not have merged/been in 1.6 | 14:01 |
mordred | he's working on a 1.6.1 that reverts that change | 14:01 |
*** burt has joined #openstack-infra | 14:01 | |
mordred | and I've just tested it and it works well | 14:02 |
jeblair | yay | 14:02 |
jeblair | mordred: did we discuss using afs to share the git repos across several git servers? | 14:03 |
mordred | jeblair: we did not - but I think it's an excellent idea | 14:03 |
mordred | jeblair: because, honestly, it's not file io that's a problem - it's the cpu cost associated with calculating what's needed | 14:03 |
jeblair | mordred: it is seriously worth considering; local caching + invalidation would be good; we'd just need to make sure the locking model works | 14:04 |
mordred | jeblair: so, quite honestly, if all of our nodes were afs clients and read from /afs/infra.openstack.org/git/$project | 14:04 |
jeblair | mordred: (all this as opposed to having gerrit star-replicate to n workers) | 14:04 |
mordred | yes | 14:04 |
mordred | jeblair: oh, were you thinking afs to get the repos to the gitX servers? | 14:05 |
jeblair | mordred: oh, heh, well, afs can be somewhat bandwidth inefficient; so i'm not sure how well having everything use it would work; i was just thinking of a pool of git servers. | 14:05 |
jeblair | mordred: yeah | 14:05 |
mordred | gotcha | 14:05 |
mordred | well, here's the thing | 14:05 |
mordred | we can start with that | 14:05 |
mordred | and it'll either work or not | 14:05 |
mordred | and then if that is set up - then we can look at whether access via /afs on slaves is better or worse | 14:06 |
mordred | pretty easily | 14:06 |
jeblair | yep. though by start you mean 'start looking into after we implement our current plan', right? :) | 14:06 |
*** dkliban has joined #openstack-infra | 14:06 | |
mordred | jeblair: god yes | 14:06 |
jeblair | so we have some real data now | 14:06 |
jeblair | i mean, it's only like 2 data points | 14:07 |
mordred | jeblair: did the az2 image update work? | 14:07 |
mordred | it looked to me like it did from looking at nova image base information | 14:07 |
jeblair | but we know that if 100 clients hit git.o.o, we push 20-25Mbit and peg user cpu time | 14:07 |
jeblair | mordred: do you get the idea that top was lying to us? | 14:08 |
*** _TheDodd_ has joined #openstack-infra | 14:08 | |
mordred | tough to say, honestly | 14:08 |
jeblair | mordred: there's like no i/o. | 14:09 |
mordred | jeblair: that doesn't really surprise me | 14:09 |
mordred | there's tons of ram on the boxes | 14:09 |
jeblair | so it's all cpu (and possibly file locking; not sure how that would show up) | 14:09 |
mordred | I'm pretty sure it's all in the fs cache layer | 14:09 |
mordred | file locking I _think_ would show up in sys wait time | 14:10 |
jeblair | that's what i'd expect, unless git is doing something on its own | 14:10 |
*** ryanpetrello has joined #openstack-infra | 14:11 | |
*** vogxn has quit IRC | 14:12 | |
*** michchap has quit IRC | 14:12 | |
*** dina_belova has joined #openstack-infra | 14:13 | |
jeblair | mordred: i'm reading about git's lockfile usage (to understand current behavior); i note that it _is_ compatible with afs. | 14:16 |
mordred | neat | 14:16 |
mordred | you know - afs client caching may make total access not ridiculous | 14:16 |
mordred | since most of the pack files should wind up cached client side | 14:16 |
jeblair | mordred: it's the initial population i'm worried about; though, i suppose if the devstack nodes have a fully populated afs cache from image creation... maybe not so bad. | 14:17 |
*** dina_belova has quit IRC | 14:17 | |
*** ruhe has quit IRC | 14:17 | |
mordred | jeblair: yah. that's what I was thining | 14:17 |
mordred | thinking | 14:17 |
jeblair | mordred: i've uh, never used an afs client that was cloned from another afs client. | 14:18 |
jeblair | mordred: those two worlds have not collided for me. :) | 14:18 |
mordred | love it | 14:18 |
anteaya | mordred: this was the jenkins failure on your disable salt globally patch: http://logs.openstack.org/30/43030/1/check/gate-ci-docs/1cdc607/console.html.gz can I do "recheck no bug"? | 14:20 |
mordred | anteaya: yes. | 14:20 |
mordred | the failure is a git clone failure | 14:20 |
dhellmann | good morning | 14:20 |
*** vogxn has joined #openstack-infra | 14:21 | |
Alex_Gaynor | dhellmann: morning (I assume you're not at home?) | 14:21 |
jeblair | well, crap; it looks like zuul is stuck again | 14:21 |
mordred | morning dhellmann ! | 14:21 |
anteaya | mordred: that was what I thought, thanks for confirmation | 14:21 |
dhellmann | Alex_Gaynor: it's still morning here at home :-) | 14:21 |
anteaya | morning dhellmann Alex_Gaynor | 14:21 |
anteaya | jeblair: :( | 14:21 |
*** datsun180b has joined #openstack-infra | 14:22 | |
openstackgerrit | A change was merged to openstack-infra/zuul: SIGUSR2 logs stack traces for active threads. https://review.openstack.org/42959 | 14:22 |
mordred | Alex_Gaynor: there's a little bit of pushback from clayg on syncing with global requirements - I responded that it's not urgent and that perhaps sdague and I should chat with him when we both get back from vacation | 14:22 |
mordred | Alex_Gaynor: but then I just realized that you have a foot in both worlds | 14:22 |
jeblair | i forced that ^ so it's in place after the restart | 14:22 |
mordred | jeblair: great. I support you in that | 14:22 |
Alex_Gaynor | mordred: Ok, I can take a look at trying to push that along, I need to take a bit and figure out what the most effective advocacy strategy is going to be | 14:23 |
anteaya | jeblair: do we need to change channel status do you think? | 14:23 |
Alex_Gaynor | I think so, zuul seems totally stalled | 14:24 |
mordred | Alex_Gaynor: yeah - I think we might need to articulate better the reasons we want it | 14:24 |
dhellmann | so I'd like to set up WSME on launchpad so bugs are updated when things happen in gerrit. IIRC, to do that for ceilometer I added a user (or group?) to our Drivers group. Is that right? | 14:24 |
mordred | Alex_Gaynor: also, I think we have a little bit of the traditional push-back against 'openstack is one project' (I don't mean that to be nasty, just that there are remaining pockets of resistance to that decision, and I think they color openstack-centric tasks at times, which means extra care needs to be taken with justification) | 14:25 |
*** dguitarbite has joined #openstack-infra | 14:25 | |
jeblair | i'm restarting zuul | 14:26 |
Alex_Gaynor | This is going to cause us to lose all current pipelines? Are there any thoughts about putting that state somewhere persistent? | 14:27 |
jeblair | Alex_Gaynor: i've saved a copy | 14:27 |
Alex_Gaynor | jeblair: oh, cool | 14:27 |
*** ladquin_afk is now known as ladquin | 14:29 | |
jeblair | i'm adding them back with a 30 second delay between each. | 14:30 |
mordred | jeblair: nice | 14:31 |
mordred | jeblair: you know - I wonder - when zuul re-queues things after a gate reset - perhaps it should put a delay between each gearman request? mitigate the herd a little bit? | 14:31 |
*** jungleboyj has quit IRC | 14:33 | |
jeblair | mordred: yeah, i was suggesting that to clarkb yesterday as something to explore; we need to be careful that we don't get too backed up | 14:34 |
mordred | yah | 14:34 |
jeblair | that's the thing with queuing systems; if you can't keep up with the throughput, you can get into situations where you never recover | 14:34 |
jeblair | so i'm much more focused on making sure we can keep up | 14:34 |
*** yolanda has joined #openstack-infra | 14:34 | |
jeblair | mordred: that '30 second delay' i'm doing? that's 15 minutes before the gate queue is populated again. | 14:35 |
Alex_Gaynor | So is zuul CPU bound, or something else? | 14:35 |
yolanda | hi, i'm trying to deploy zuul using an apache frontend that is on another machine, but i'm having a problem with serving git repos, has anyone done something similar? | 14:35 |
yolanda | the problem i have is with aliasmatch; it refers to AliasMatch ^/p/(.*/objects/[0-9a-f]{2}/[0-9a-f]{38})$ /var/lib/zuul/git/$1, and that path is on the zuul machine so it cannot be accessed from apache | 14:36 |
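(One possible answer, sketched here rather than taken from this log: if the apache frontend cannot see /var/lib/zuul/git locally, it can proxy the /p/ URLs back to the zuul host instead of using a filesystem AliasMatch. The hostname is a placeholder, and this assumes the zuul machine serves its merger repos over HTTP itself.)

    # on the apache frontend, with mod_proxy and mod_proxy_http enabled
    ProxyPass        /p/ http://zuul.example.org/p/
    ProxyPassReverse /p/ http://zuul.example.org/p/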
jeblair | Alex_Gaynor: zuul is not operating near its limits | 14:36 |
Alex_Gaynor | jeblair: so it's git / gearman / gerrit ? | 14:36 |
jeblair | Alex_Gaynor: but if it were, it would be cpu bound | 14:36 |
jeblair | Alex_Gaynor: the current problem is we can't serve git repos fast enough for all the test jobs | 14:37 |
Alex_Gaynor | jeblair: ok, so we're sure it's that | 14:37 |
*** thomasbiege1 has joined #openstack-infra | 14:37 | |
jeblair | Alex_Gaynor: which is why today's project is load-balancing that across multiple servers | 14:37 |
Alex_Gaynor | jeblair: surely someone has had this problem before, right? We can't be the first people to be git-bound | 14:37 |
jeblair | Alex_Gaynor: zuul's problem is that it has a bug that we haven't been able to identify due to inadequate logging and lack of ability to get a stacktrace | 14:38 |
jeblair | Alex_Gaynor: http://cacti.openstack.org/cacti/graph_view.php?action=tree&tree_id=1&leaf_id=23 | 14:38 |
jeblair | that's zuul ^ | 14:38 |
jeblair | Alex_Gaynor: http://cacti.openstack.org/cacti/graph_view.php?action=tree&tree_id=1&leaf_id=43 | 14:38 |
jeblair | that's git ^ | 14:38 |
Alex_Gaynor | consistent 10-15MBps, that's pretty cool | 14:39 |
*** thomasbiege1 has quit IRC | 14:39 | |
jeblair | mordred, clarkb, zaro: when the gearman server restarts, i think the executorworkerthread dies, which means the offline-on-complete feature fails | 14:42 |
*** xBsd has quit IRC | 14:42 | |
jeblair | mordred, clarkb, zaro: which is why a lot of jobs are showing up as lost right now -- they are re-running on hosts that should have been offlined | 14:42 |
*** michchap has joined #openstack-infra | 14:43 | |
jeblair | so for the moment, if we stop zuul, we need to delete all the slaves | 14:44 |
anteaya | ouch | 14:45 |
*** pblaho has quit IRC | 14:45 | |
*** gordc has joined #openstack-infra | 14:46 | |
*** changbl has joined #openstack-infra | 14:46 | |
*** pabelanger has joined #openstack-infra | 14:46 | |
*** AJaeger has joined #openstack-infra | 14:47 | |
openstackgerrit | Ryan Petrello proposed a change to openstack-infra/config: Provide a more generic run-tox.sh. https://review.openstack.org/43145 | 14:48 |
*** jungleboyj has joined #openstack-infra | 14:48 | |
mgagne | mordred: what was your gerrit search filter you sent a couple of weeks ago? | 14:49 |
jeblair | mordred: so none of the az2 nodes are launching jenkins slaves. | 14:50 |
jeblair | i spot-checked one and got this: | 14:50 |
jeblair | $ java -version | 14:50 |
jeblair | Segmentation fault (core dumped) | 14:50 |
*** michchap has quit IRC | 14:51 | |
Alex_Gaynor | awesome. | 14:51 |
ttx | mordred: late pong | 14:51 |
jeblair | 3 makes a pattern, right? | 14:51 |
anteaya | ttx: since you are here: https://review.openstack.org/#/c/43002/ | 14:53 |
*** rnirmal has joined #openstack-infra | 14:53 | |
*** kspear has quit IRC | 14:54 | |
*** kspear has joined #openstack-infra | 14:54 | |
*** ruhe has joined #openstack-infra | 14:57 | |
*** _TheDodd_ has quit IRC | 14:59 | |
*** _TheDodd_ has joined #openstack-infra | 15:01 | |
*** w_ is now known as olaph | 15:02 | |
ttx | anteaya: looking | 15:04 |
anteaya | thanks | 15:04 |
Alex_Gaynor | jeblair: can you link the review for load balanced git? | 15:07 |
*** UtahDave has joined #openstack-infra | 15:07 | |
*** mrodden has quit IRC | 15:08 | |
jeblair | Alex_Gaynor: https://review.openstack.org/#/c/42784/ | 15:08 |
jeblair | Alex_Gaynor: I think we're also going to do this https://review.openstack.org/#/c/43012/ | 15:08 |
ttx | anteaya: reviewed | 15:08 |
anteaya | ttx thank you | 15:08 |
*** vogxn has quit IRC | 15:08 | |
*** rnirmal has quit IRC | 15:10 | |
openstackgerrit | Anita Kuno proposed a change to openstack-infra/config: Creating/adding the openstack/governance repository https://review.openstack.org/43002 | 15:13 |
*** dina_belova has joined #openstack-infra | 15:13 | |
jeblair | pleia2, clarkb, Alex_Gaynor: I'm going to spin up a few copies of git.o.o of different sizes (8, 15, 30) for testing. | 15:15 |
jeblair | pleia2, clarkb, Alex_Gaynor: if we are cpu bound, it looks like the 8gb machines (4vcpus) might be the sweet spot (half the cpus with 1/4 the ram of a 30gb vm) | 15:15 |
anteaya | mordred can I get your feedback on the openstack/governance name, please? | 15:16 |
anteaya | if you don't like it, can I get a better suggestion? | 15:16 |
*** dina_belova has quit IRC | 15:18 | |
*** mrodden has joined #openstack-infra | 15:19 | |
*** mkerrin has quit IRC | 15:20 | |
*** mkerrin has joined #openstack-infra | 15:20 | |
openstackgerrit | Anita Kuno proposed a change to openstack-infra/config: Creating/adding the openstack/governance repository https://review.openstack.org/43002 | 15:20 |
*** mkerrin has quit IRC | 15:23 | |
mordred | anteaya: openstack/governance sounds great | 15:24 |
mordred | jeblair: wow. segfault. nice | 15:24 |
mordred | mgagne: gerrit search filter ... for things I should review? | 15:25 |
jeblair | mordred: yeah, i'm going to leave that and assume it's an image problem | 15:25 |
mgagne | yes | 15:25 |
mordred | jeblair: k | 15:25 |
mordred | jeblair: gosh, do we need to make the ssh check an "ssh and run java --version" check? | 15:25 |
jeblair | slowing down the rate of adding new nodes is mildly helpful atm anyway. | 15:25 |
mordred | mgagne: I do this: https://review.openstack.org/#/q/watchedby:mordred%2540inaugust.com+-label:CodeReview%253C%253D-1+-label:Verified%253C%253D-1+-label:Approved%253E%253D1++-status:workinprogress+-status:draft+-is:starred+-owner:mordred%2540inaugust.com,n,z | 15:26 |
jeblair | mordred: hrm, i wonder if the template host was broken, or if the image created from the template host was broken. | 15:26 |
mordred | jeblair: good question - template host still around? | 15:26 |
jeblair | it would be difficult to find out, since the template host is deleted almost immediately | 15:26 |
mordred | yeah. I was afraid of that | 15:26 |
mordred | mgagne: and I scan that list, and star things that I need to review, then I do: https://review.openstack.org/#/q/is:starred+-label:CodeReview%253C%253D-1+-label:Verified%253C%253D-1,n,z | 15:27 |
mordred | and unstar things when I'm done with them | 15:27 |
mordred | I'm doing that every morning when I wake up now | 15:27 |
mordred | it's helping | 15:27 |
mordred | (although getting the list under control and then reviewing every morning also helped) | 15:27 |
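(URL-decoded, the two gerrit queries mordred pasted are, roughly:)

    watchedby:mordred@inaugust.com -label:CodeReview<=-1 -label:Verified<=-1
        -label:Approved>=1 -status:workinprogress -status:draft -is:starred
        -owner:mordred@inaugust.com

    is:starred -label:CodeReview<=-1 -label:Verified<=-1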
mordred | jeblair: az2 was the one having issues yesterday though | 15:28 |
mordred | jeblair: mark the image for delete and see if it can generate a real one today? | 15:28 |
mgagne | mordred: thanks! | 15:29 |
*** mkerrin has joined #openstack-infra | 15:30 | |
jeblair | mordred: yeah, but we're doing ok on the other 2 for now, this will help with the git.o.o load (a little) so i'm in no rush to fix | 15:30 |
mrodden | anyone seen tox failing with "no such option: --pre" on the pip install step? | 15:31 |
mordred | anteaya: reviewed | 15:31 |
mordred | jeblair: ok | 15:31 |
mrodden | apparently tox 1.6.0 is on virtualenv 1.9.1 which has pip 1.3.1 embedded which doesn't support --pre | 15:31 |
mordred | mrodden: link? | 15:31 |
markmc | *sob* my change approved 8 hours ago is now 34th in the gate queue *sob* | 15:31 |
mrodden | not sure why i am hitting it all of a sudden | 15:31 |
markmc | where's my violin? | 15:31 |
mordred | mrodden: that sounds like the glanceclient issue from the other day that I expect to be fixed | 15:32 |
mrodden | mordred: its in my local env | 15:32 |
mrodden | oh | 15:32 |
mordred | mrodden: update your glanceclient | 15:32 |
mordred | mrodden: evil happened | 15:32 |
mrodden | lol | 15:32 |
mrodden | will do | 15:32 |
mrodden | thanks | 15:32 |
* dansmith feels sorry for zuul today | 15:33 | |
mordred | dansmith: it likes it | 15:33 |
dansmith | mordred: oh, a little masochistic, is it? | 15:34 |
mordred | heck yes | 15:34 |
dansmith | kinky. | 15:34 |
jeblair | markmc, dansmith: current major issues: we can't serve git repos fast enough for all the tests we're running; the neutron job appears flakey. | 15:36 |
dansmith | dammit neutron! | 15:37 |
markmc | jeblair, yeah, was following along | 15:37 |
mriedem | i know one guy in here that likes to give out punishment, might be a good match for zuul :) | 15:37 |
markmc | don't mind the whining from the cheap seats | 15:37 |
*** nati_ueno has joined #openstack-infra | 15:37 | |
jeblair | markmc, dansmith: minor issues: zuul has a bug that causes it to stop occasionally; one of our test images has a java that segfaults | 15:37 |
jeblair | and a few more minor than that | 15:37 |
markmc | heh, "minor issues" | 15:37 |
dansmith | nice, I saw the big reset this morning | 15:38 |
mordred | markmc: gotta love feature freeze, when the two of those are 'minor' | 15:38 |
jeblair | markmc: yeah when "zuul stops working" is a minor issue, you know we're having fun. :) | 15:38 |
mordred | oh - corollary to that issue - debugging a hung python program is apparently not easy | 15:38 |
markmc | jeblair, not whining honestly, but how did https://review.openstack.org/#/c/43060/ end up at the bottom after the restart ? | 15:38 |
markmc | jeblair, shoulda been near the top, no? | 15:39 |
jeblair | markmc: erm, it's worse than that. :( it was at the top, but due to a recently discovered very minor issue, when i restarted zuul, several of the test nodes were not off-lined as they should have been | 15:39 |
* markmc puts it down to karma for approving his own change | 15:39 | |
jeblair | markmc: so it got dequeued due to an erroneously failing test | 15:40 |
markmc | jeblair, ok | 15:40 |
jeblair | markmc: sorry :( | 15:40 |
* markmc shrugs | 15:40 | |
openstackgerrit | Anita Kuno proposed a change to openstack-infra/config: Creating/adding the openstack/governance repository https://review.openstack.org/43002 | 15:40 |
jeblair | markmc: we now know that if that happens again we need to clean up the test nodes until we can automate that case | 15:40 |
anteaya | mordred: thanks | 15:40 |
markmc | jeblair, cool | 15:40 |
jeblair | markmc: that's what most of the "LOST" jobs on the screen are | 15:41 |
*** reed has joined #openstack-infra | 15:41 | |
markmc | jeblair, ok, thanks | 15:41 |
markmc | jeblair, that's a particularly sad name for a status | 15:41 |
markmc | LOST,LONELY | 15:41 |
markmc | now that would be sad | 15:41 |
jeblair | markmc: or just "SAD" | 15:42 |
anteaya | :( | 15:42 |
markmc | jeblair, indeed :) | 15:42 |
openstackgerrit | Andreas Jaeger proposed a change to openstack-infra/config: Build Basic Install Guide for openSUSE https://review.openstack.org/42988 | 15:44 |
*** dkranz has joined #openstack-infra | 15:46 | |
*** nayward has joined #openstack-infra | 15:49 | |
*** SergeyLukjanov has joined #openstack-infra | 15:49 | |
*** senk has joined #openstack-infra | 15:51 | |
chmouel | so for the LOST thing should I just do a recheck no bugs? | 15:51 |
*** nati_ueno has quit IRC | 15:51 | |
Alex_Gaynor | chmouel: yup | 15:51 |
chmouel | Alex_Gaynor: tks | 15:52 |
* chmouel didn't feel like reading the full scrollback :-p | 15:52 | |
*** rfolco has joined #openstack-infra | 15:53 | |
*** dina_belova has joined #openstack-infra | 15:54 | |
*** vogxn has joined #openstack-infra | 15:56 | |
*** pcm_ has quit IRC | 15:56 | |
*** boris-42 has quit IRC | 15:57 | |
*** mkerrin has quit IRC | 15:59 | |
*** mkerrin has joined #openstack-infra | 15:59 | |
jeblair | pleia2, fungi, clarkb: the git puppet manifest has some problems; an selinux command failed during the first run, and i think there may be an rpm/pip conflict on the pyyaml package | 16:00 |
*** mkerrin has quit IRC | 16:01 | |
*** mkerrin has joined #openstack-infra | 16:01 | |
*** mkerrin has quit IRC | 16:02 | |
clarkb | :( | 16:02 |
clarkb | jeblair It needs a firewall update too | 16:02 |
clarkb | jeblair was that run on a new host or the existing? | 16:03 |
jeblair | that was easy enough to fix (pip uninstall pyyaml) | 16:04 |
jeblair | clarkb: new hosts -- i'm spinning up test hosts for benchmarking | 16:04 |
clarkb | cool. let me know if you catch other puppet things I will update that manifest soon | 16:04 |
fungi | jeblair: ah, yes i believe i pointed out the selinux thing to pleia2 before. i think the issue is that enabling selinux requires a reboot, and the command to adjust selinux won't work until it's activated | 16:05 |
*** ruhe has quit IRC | 16:05 | |
fungi | i believe it was an oversight caused by hpcloud enabling selinux by default and rackspace not | 16:05 |
jeblair | yay | 16:05 |
clarkb | fungi so activate; reboot; puppet? | 16:05 |
*** sridevi has joined #openstack-infra | 16:05 | |
*** jungleboyj has left #openstack-infra | 16:06 | |
fungi | clarkb: i think just reboot, but may need to manually activate selinux before doing so (though i think the puppet selinux module has already set it to be active after a reboot) | 16:06 |
sridevi | Hi, can someone help me with this jenkins' failure. | 16:06 |
*** AJaeger has quit IRC | 16:06 | |
sridevi | https://review.openstack.org/#/c/34801/ | 16:06 |
sridevi | anyone? | 16:07 |
anteaya | thanks markmc | 16:07 |
markmc | anteaya, thank you | 16:08 |
anteaya | :D | 16:08 |
anteaya | sridevi: I'll take a look | 16:09 |
sridevi | thanks anteaya | 16:09 |
sridevi | http://logs.openstack.org/01/34801/21/check/gate-tempest-devstack-vm-neutron/3076bcb/console.html.gz | 16:09 |
jeblair | sridevi: that appears to be a real failure; it happens consistently for every test run for days now. | 16:09 |
sridevi | real failure, you mean some bug in the patch? jeblair | 16:10 |
jeblair | sridevi: yes | 16:10 |
sridevi | okay. | 16:10 |
jeblair | sridevi: i'd recommend setting up a devstack environment and testing it locally there | 16:11 |
sridevi | jeblair: Hmm. But I don't see any error other than "ERROR:root:Could not find any typelib for GnomeKeyring" | 16:11 |
anteaya | Process leaked file descriptors. | 16:11 |
anteaya | it is in every failure log | 16:12 |
jeblair | anteaya: that's harmless | 16:12 |
anteaya | jeblair: ah okay | 16:12 |
jeblair | sridevi: it looks like the patch broke devstack, from the way the devstack log ends. | 16:12 |
sridevi | Hmm | 16:13 |
jeblair | sridevi: last line of this file: http://logs.openstack.org/01/34801/21/check/gate-tempest-devstack-vm-full/984c01f/logs/devstacklog.txt.gz | 16:13 |
*** nayward has quit IRC | 16:13 | |
pleia2 | hm, what brought in pyyaml? | 16:13 |
jeblair | pleia2: jeepyb | 16:13 |
pleia2 | jeblair: via pip? | 16:14 |
pleia2 | (looking now) | 16:15 |
jeblair | pleia2: i think the sequencing is off; it installed jeepyb first which would have easy_installed it using python setup.py install, then it tried to install the rpm | 16:15 |
jeblair | pleia2: i think either we want to make jeepyb require-> the package, or else remove the package and let easy install do its thing | 16:16 |
*** SergeyLukjanov has quit IRC | 16:16 | |
sridevi | jeblair: what is in the last line: "services=s-container"? | 16:16 |
sridevi | ? | 16:16 |
reed | hi guys, how are things going today? | 16:16 |
pleia2 | jeblair: I see, thanks | 16:16 |
* fungi is going to be out at the space needle and the science museum for a little while, but will be back on later this afternoon | 16:17 | |
pleia2 | fungi: enjoy :) | 16:17 |
fungi | thanks pleia2 | 16:17 |
anteaya | fungi have fun at the space needle | 16:17 |
reed | fungi, enjoy... and in your free time comment on https://review.openstack.org/#/c/42998/ :) | 16:17 |
anteaya | reed: about the same as yesterday, zuul got stuck again this morning | 16:18 |
jeblair | reed: not terribly well, i think we have at least a full day ahead of us | 16:18 |
reed | :( | 16:18 |
reed | not terribly well is hard to parse | 16:18 |
anteaya | sridevi: yes, that is the last line that ran in devstack, after that it broke | 16:18 |
*** rnirmal has joined #openstack-infra | 16:18 | |
jeblair | reed: heh, that seems appropriate somehow. anyway, 'poorly'. :) | 16:19 |
reed | not terribly is a double negation, right? makes it a positive... well is positive ... double positive is bad? :) | 16:19 |
sridevi | anteaya: hmm | 16:19 |
anteaya | sridevi: the fact that devstack didn't finish is an indication that the patch affected the devstack installation | 16:19 |
reed | jeblair, trying to assess how long it will take for https://review.openstack.org/#/c/42998/ to be evaluated and go through... two days? | 16:20 |
reed | (it's my request for a staging server) | 16:20 |
anteaya | sridevi: so your patch affects swift and the swift container service couldn't install properly | 16:21 |
sridevi | okay | 16:21 |
*** ruhe has joined #openstack-infra | 16:21 | |
anteaya | sridevi: here is the screen log for the swift container: http://logs.openstack.org/01/34801/21/check/gate-tempest-devstack-vm-full/984c01f/logs/screen-s-container.txt.gz | 16:22 |
jeblair | reed: i hope so; but this is a very exceptional time; we have unprecedented test load, several systems that need upgrading to deal with it, and only two core developers full-time (though i believe we are more than full-time at the moment) | 16:22 |
koolhead17 | hi all | 16:22 |
anteaya | hi koolhead17 | 16:23 |
koolhead17 | anteaya: how have you been | 16:23 |
anteaya | koolhead17: good thanks, trying to be helpful without getting in the way | 16:23 |
anteaya | busy time right now | 16:23 |
jeblair | reed: as soon as things are not on fire, i will review your and mrmartin's patches | 16:23 |
koolhead17 | reed: hi there | 16:23 |
koolhead17 | anteaya: what patch are we discussing? | 16:24 |
clarkb | jeblair: just a little more than full time :) | 16:24 |
clarkb | jeblair: I am finally in a chair where I can focus. Is there anything I should look at first/immediately? | 16:24 |
jeblair | clarkb: get the git.o.o load balanced stuff ready to go | 16:24 |
clarkb | ok | 16:25 |
reed | jeblair, thanks | 16:25 |
jeblair | clarkb: i'm working on some simple benchmarking (but obviously even simple benchmarking is going to take a bit) | 16:25 |
anteaya | koolhead17: well, I was helping sridevi with his patch https://review.openstack.org/#/c/34801/ I have a patch up: https://review.openstack.org/#/c/43002/4 and two patches are under consideration hoping they will help the current jenkins/zuul/git issues: https://review.openstack.org/#/c/42784/ https://review.openstack.org/#/c/43012/ | 16:26 |
anteaya | so we have a few to choose from, koolhead17 :D | 16:26 |
anteaya | jeblair clarkb I don't think I know enough to be of use and don't want to slow you down, if there is something you think I can do to help, please tell me | 16:27 |
jeblair | anteaya: thanks; fielding questions like that ^ is _very_ helpful | 16:28 |
anteaya | jeblair: very good, I shall endeavour to do my best | 16:28 |
*** SergeyLukjanov has joined #openstack-infra | 16:28 | |
clarkb | jeblair: any interest in updating the git.pp to possibly run on precise sans cgit? | 16:29 |
*** cthulhup has joined #openstack-infra | 16:29 | |
clarkb | jeblair: not sure if you are interested in testing that, but I think it would be a small change | 16:29 |
jd__ | I've a LOST job here https://review.openstack.org/#/c/42642/ should I open a bug? | 16:30 |
anteaya | jd__: yes | 16:30 |
anteaya | no, no bug | 16:30 |
anteaya | it is a result of a zuul restart this morning | 16:30 |
anteaya | the gearman server lost a thread | 16:30 |
jd__ | anteaya: define "morning"? :) | 16:30 |
anteaya | and as a result there were lost jobs | 16:30 |
anteaya | sorry yes, you are right | 16:30 |
jd__ | ack, I'll recheck no bug then | 16:30 |
anteaya | about 3 hours ago | 16:30 |
anteaya | yes, recheck no bug | 16:31 |
anteaya | thanks | 16:31 |
jeblair | clarkb: no cgit that way | 16:31 |
jd__ | thanks anteaya | 16:31 |
anteaya | :D | 16:31 |
clarkb | jeblair: correct, it would just be a repo mirror | 16:31 |
*** markmc has quit IRC | 16:31 | |
jd__ | btw I wonder, what/where is openstackstatus used? | 16:31 |
jeblair | clarkb: haven't we started using the cgit server? | 16:31 |
anteaya | jd__: where do you see openstackstatus? | 16:31 |
anteaya | I'm on help desk as the fires are being fought | 16:32 |
jeblair | #status alert LOST jobs are due to a known bug; use "recheck no bug" | 16:32 |
openstackstatus | NOTICE: LOST jobs are due to a known bug; use "recheck no bug" | 16:32 |
*** ChanServ changes topic to "LOST jobs are due to a known bug; use "recheck no bug"" | 16:32 | |
*** dina_belova has quit IRC | 16:32 | |
clarkb | jeblair: a little yes. we would probably end up needing to do an additional set of proxying for cgit back to the centos servers. now that I think about it nevermind | 16:32 |
jeblair | clarkb: yeah, i think that's why we decided to just throw hardware at it for now | 16:33 |
jd__ | anteaya: I meant the bot, but now I see it changes the topic :) | 16:33 |
anteaya | jd__: ah okay | 16:33 |
jeblair | jd__: it needs some work; it's not very reliable yet | 16:33 |
jeblair | jd__: eventually we'd like it in all the channels and to have it update web pages | 16:33 |
jeblair | jd__: it's been a while since we've had time to hack on that | 16:34 |
clarkb | jeblair: I am not finding python-yaml or pyyaml in our puppet manifest for cgit. It looks like jeepyb installs it and something on centos is installing it globally? And since centos doesn't do site-packages they interfere? | 16:34 |
jd__ | jeblair: what's its Git repository? | 16:34 |
*** jpich has quit IRC | 16:34 | |
jeblair | jd__: openstack-infra/statusbot | 16:34 |
clarkb | jeblair: I think I am going to ignore that for now as you have a work around | 16:34 |
jeblair | jd__: one of the pre-reqs for all channels is this bug (let me fetch it) | 16:34 |
*** gyee has joined #openstack-infra | 16:34 | |
*** sridevi has quit IRC | 16:35 | |
jeblair | jd__: https://bugs.launchpad.net/openstack-ci/+bug/1190296 | 16:35 |
uvirtbot | Launchpad bug 1190296 in openstack-ci "IRC bot to manage official channel settings" [Medium,Triaged] | 16:35 |
jeblair | (i don't want to add it to 30 channels manually) | 16:35 |
jeblair | jd__: and then it has problems reconnecting on netsplits | 16:35 |
jeblair | don't know if there's a bug for that | 16:36 |
jeblair | clarkb: sounds good | 16:36 |
jd__ | jeblair: ack | 16:36 |
*** dina_belova has joined #openstack-infra | 16:37 | |
mrodden | wow that is dirty... | 16:38 |
anteaya | mrodden: what are you referencing? | 16:38 |
mrodden | when you pip install virtualenv it drops the latest version it can find of pip into $SITE_PACKAGES/virtual_env/ | 16:38 |
mrodden | and it never updates it from then on | 16:39 |
mrodden | and that is what it uses when it creates a new virtualenv | 16:39 |
mrodden | so my virtualenvs were all stuck at pip 1.2.1 | 16:39 |
mrodden | sorry $SITE_PACKAGES/virtualenv_support/ | 16:39 |
mgagne | mordred: since you are the gerrit search master to me, how can you exclude changes which have been reviewed by yourself? | 16:40 |
clarkb | mrodden: correct because virtualenv vendors pip and setuptools and distribute | 16:40 |
mrodden | clarkb: yeah but for some reason it had pip 1.2.1 and also pip 1.4.1 and was only using 1.2.1 | 16:41 |
mrodden | it doesn't enforce that it copies the correct version from that spot | 16:41 |
mrodden | :( | 16:41 |
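(A quick way to see the vendoring behaviour mrodden describes; a sketch only, since the exact paths vary by install. Upgrading virtualenv itself is what refreshes the pip/setuptools copies it seeds into new environments.)

    # list the pip/setuptools versions virtualenv seeds new envs with
    ls "$(python -c 'import virtualenv, os; print(os.path.dirname(virtualenv.__file__))')/virtualenv_support/"
    # upgrading virtualenv refreshes those vendored copies
    pip install -U virtualenv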
anteaya | mgagne: I'm not sure of his status, he was last here an hour ago | 16:41 |
*** Dr01d has quit IRC | 16:41 | |
mgagne | anteaya: thanks, I can wait =) | 16:42 |
anteaya | k | 16:42 |
anteaya | I'm sure once he is on planes he will pop in again | 16:42 |
anteaya | I am afk for about 30 minutes, I have to give a hand to a family member | 16:43 |
*** dkranz has quit IRC | 16:45 | |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 16:45 |
clarkb | jeblair: ^ I believe that is in a reviewable state. I am going to --noop apply it to git.o.o now | 16:45 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 16:47 |
clarkb | and that addressed one more review comment | 16:47 |
*** AJaeger has joined #openstack-infra | 16:47 | |
jeblair | clarkb: the switch could be hairy; if it doesn't work, we end up with a lot of failed jenkins jobs | 16:49 |
clarkb | jeblair: yup | 16:49 |
clarkb | jeblair: how do you feel about putting jenkins* into shutdown mode while we do it? | 16:49 |
jeblair | clarkb: may want shut down puppet and apply it to a test node first | 16:49 |
*** cthulhup has quit IRC | 16:49 | |
jeblair | clarkb: at the current rate, you'd still have to wait like 30 minutes for the git processes to finish | 16:50 |
*** vogxn has quit IRC | 16:50 | |
clarkb | jeblair: just the git processes? | 16:50 |
clarkb | wow | 16:50 |
jeblair | clarkb: last i looked, the devstack-gate prep steps were taking a looong time | 16:50 |
jeblair | clarkb: i have 3 test nodes we can run it on. :) | 16:51 |
jeblair | 8 192.237.168.226 | 16:51 |
jeblair | 15 162.209.12.127 | 16:51 |
jeblair | 30 198.101.151.5 | 16:51 |
jeblair | clarkb: ^ | 16:52 |
jeblair | clarkb: (first column is memory) | 16:52 |
clarkb | jeblair: ok I can hijack one of them and change its certname so that it gets the haproxy stuff | 16:52 |
jeblair | clarkb: please; i ran 'puppet apply --test --certname git.openstack.org' | 16:52 |
clarkb | jeblair: also, this is a multistep process. The change above will only add haproxy and move the apache vhosts and git daemon to offset ports | 16:53 |
jeblair | clarkb: take the 15g one | 16:53 |
clarkb | jeblair: it won't do load balancing until we get another change or two in to replicate to the other hosts and balance across them with haproxy | 16:53 |
clarkb | jeblair: ok | 16:53 |
jeblair | clarkb: yeah, i like the process; it's just the port move that i'm worried about | 16:53 |
clarkb | jeblair: should I be running a bunch of clones against the 15g node while I apply puppet? | 16:54 |
jeblair | clarkb: er, were their firewall changes? | 16:54 |
jeblair | there even | 16:54 |
clarkb | jeblair: ya my latest patchset adds firewall changes | 16:54 |
clarkb | to allow 4443 and 8080 and 29418 | 16:55 |
jeblair | ah, i see it now. | 16:55 |
clarkb | I am not restricting access to those ports as they are all read only anyways | 16:55 |
jeblair | clarkb: honestly, i wouldn't worry about it. if there's a blip, we can deal. it's more that if it's actually offline for more than 30 seconds we would be very unhappy | 16:56 |
clarkb | starting with a --noop on the 15g node | 16:56 |
jeblair | clarkb: we can also do the jenkins shutdown idea, to reduce the impact | 16:56 |
clarkb | jeblair: the port change for apache didn't go in so haproxy wouldn't start. Looking into that now | 16:57 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 16:59 |
clarkb | that should do it, testing | 16:59 |
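(For reference, the port layout clarkb describes would look roughly like the haproxy sketch below: haproxy owns 443/80/9418 and hands off to apache and git-daemon on the offset ports. This is an assumed illustration, not the contents of the change under review.)

    listen git-https
        bind 0.0.0.0:443
        mode tcp
        server local-apache 127.0.0.1:4443

    listen git-http
        bind 0.0.0.0:80
        mode tcp
        server local-apache 127.0.0.1:8080

    listen git-daemon
        bind 0.0.0.0:9418
        mode tcp
        server local-git-daemon 127.0.0.1:29418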
*** nati_ueno has joined #openstack-infra | 16:59 | |
*** ruhe has quit IRC | 17:04 | |
clarkb | apache isn't letting go of 443 and 80. Looks to be set to listen on those ports in the default configs | 17:04 |
pleia2 | clarkb: yeah, /etc/httpd/conf/httpd.conf has Listen 80 (looking around for https) | 17:06 |
pleia2 | might have a /etc/httpd/conf.d/ssl.conf too | 17:06 |
clarkb | pleia2: yup | 17:06 |
*** portante has joined #openstack-infra | 17:06 | |
clarkb | pleia2: we are not managing those with puppet are we? | 17:06 |
pleia2 | clarkb: nope | 17:06 |
portante | clarkb: ran into a swift tox issue, http://paste.openstack.org/show/44776/ | 17:07 |
*** nicedice_ has joined #openstack-infra | 17:07 | |
*** ^d has joined #openstack-infra | 17:07 | |
portante | do you know what I should do to fix this? | 17:07 |
clarkb | jeblair: appropriate to just copy what we have there now into a puppet template and toggle the ports? | 17:07 |
clarkb | jeblair: any better ideas? | 17:07 |
portante | clarkb: that is a swift tox issue related to missing "pbr" package | 17:07 |
clarkb | portante: it looks like you have an old version of pbr installed. can you try tox -re pep8? | 17:08 |
*** david-lyle has quit IRC | 17:08 | |
*** ftcjeff_ has quit IRC | 17:08 | |
*** ftcjeff has quit IRC | 17:08 | |
portante | k | 17:08 |
jeblair | clarkb: apache module doesn't deal with it? | 17:08 |
clarkb | jeblair: oh maybe /me looks | 17:08 |
*** david-lyle has joined #openstack-infra | 17:08 | |
*** ftcjeff has joined #openstack-infra | 17:08 | |
*** ftcjeff_ has joined #openstack-infra | 17:09 | |
*** SergeyLukjanov has quit IRC | 17:09 | |
*** UtahDave has quit IRC | 17:09 | |
*** dina_belova has quit IRC | 17:10 | |
portante | clarkb: weird, old version in /usr/lib but why should that affect tox? | 17:11 |
clarkb | portante: if you have site packages enabled in tox it will use your site packages | 17:12 |
clarkb | portante: site packages should probably be disabled if it is enabled (I believe the only project that needs it is nova for libvirt) | 17:12 |
*** dina_belova has joined #openstack-infra | 17:12 | |
clarkb | jeblair: ssl.conf is already vendored by us (and not by puppetlabs-apache). I will just do the same with httpd.conf and set the ports dynamically | 17:13 |
*** fbo is now known as fbo_away | 17:13 | |
jeblair | clarkb: sounds good | 17:13 |
BobBall | wow the gate is queued up a lot! I hadn't been watching! | 17:14 |
pleia2 | clarkb: right, sorry, I did use the ssl one for our certificates (I should not rely on memory!) | 17:14 |
*** dina_belova has quit IRC | 17:15 | |
burt | speaking of the gate: will 38697,2 automatically get restarted, or should I do a reverify no bug ? | 17:15 |
burt | (looks like the python27 job was killed in the middle, https://jenkins01.openstack.org/job/gate-nova-python27/1231/console) | 17:16 |
*** lifeless has quit IRC | 17:16 | |
portante | clarkb: I believe tox's default is to NOT use global packages, and I can't find anything in our tox.ini file that sets it to true | 17:16 |
clarkb | portante: correct the default should be to not use it. The way to toggle it is with sitepackages = true iirc | 17:17 |
*** lifeless has joined #openstack-infra | 17:17 | |
clarkb | portante: however, I think your virtualenvs may be stale as well | 17:17 |
portante | I removed my entire .tox tree | 17:17 |
clarkb | portante: if you do a .tox/pep8/bin/pip freeze do you see pbr | 17:17 |
clarkb | portante: oh. Do you still see the error? | 17:17 |
*** zaro has quit IRC | 17:17 | |
portante | not now, because I removed the /usr/lib/python2.7/site-packages/pbr* directory in order to make progress | 17:18 |
portante | clarkb: and yes, now I do see the correct pbr version in the freeze output | 17:19 |
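The tox option under discussion is a one-line toggle in tox.ini; a minimal sketch (not swift's actual file), shown with the default value since, as noted above, projects normally leave it off unless they need system packages such as the libvirt bindings:

    [testenv]
    sitepackages = False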
*** ryanpetrello has quit IRC | 17:19 | |
*** ryanpetrello has joined #openstack-infra | 17:20 | |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 17:20 |
anteaya | back | 17:20 |
clarkb | portante: this is for swift? I am going to take a quick peek at the tox.ini | 17:21 |
*** mordred has quit IRC | 17:22 | |
portante | clarkb: yes, thanks | 17:22 |
*** dmakogon_ has joined #openstack-infra | 17:22 | |
anteaya | BobBall: yes, large queue; much work happening to address it | 17:23 |
*** rcleere has joined #openstack-infra | 17:23 | |
clarkb | pleia2: any idea of how to make selinux allow apache to listen on ports 8080 and 4443? | 17:23 |
jeblair | clarkb: semanage port -a -t http_port_t -p tcp 8080 | 17:24 |
clarkb | jeblair: we should puppet that :) | 17:24 |
anteaya | burt: right now my best advice is to reverify | 17:24 |
anteaya | if I am wrong it is on me | 17:24 |
* clarkb looks at puppet selinux docs | 17:25 | |
pleia2 | clarkb: I'll poke around the puppet module | 17:25 |
jeblair | clarkb: shouldn't be hard (if it isn't already) semanage lets you query and add | 17:25 |
*** yolanda has quit IRC | 17:25 | |
*** jpeeler has quit IRC | 17:25 | |
burt | anteaya: thanks, will do | 17:25 |
*** ruhe has joined #openstack-infra | 17:26 | |
anteaya | burt welcome | 17:26 |
clarkb | jeblair: I suppose I can add a couple execs if nothing else | 17:26 |
pleia2 | clarkb: actually, puppet module won't do this, we'll probably need to do something like I did with restorecons | 17:27 |
jeblair | what _does_ the module do? :) | 17:27 |
pleia2 | turns it on and off, loads more modules | 17:27 |
pleia2 | it's pretty simple | 17:27 |
jeblair | everything except managing selinux :) | 17:27 |
pleia2 | yeah, there is at least one managing one out there but it wasn't very good | 17:28 |
*** BobBall is now known as BobBallAway | 17:28 | |
jeblair | :( | 17:28 |
*** mordred has joined #openstack-infra | 17:30 | |
jeblair | gah, one of my test worker nodes is a dud; takes 1:40 to clone nova alone (standard is 0:22) | 17:31 |
jeblair | (i find i'm benchmarking the clients before i can benchmark the server) | 17:31 |
pleia2 | clarkb: we'll also need to add the policycoreutils-python package (that's what has semanage) | 17:32 |
clarkb | pleia2: ya just discovered that | 17:32 |
clarkb | jeblair: :( | 17:32 |
jeblair | the other 9 are ok though. :) | 17:33 |
anteaya | mordred: I think this was the only comment I saw directed at you since you were last here: <mgagne> mordred: since you are the gerrit search master to me, how can you exclude changes which have been reviewed by yourself? | 17:33 |
*** jpeeler has joined #openstack-infra | 17:33 | |
*** arezadr has quit IRC | 17:35 | |
pleia2 | clarkb: looks like selinux already gave 8080 away: http_cache_port_t tcp 3128, 8080, 8118, 8123, 10001-10010 | 17:36 |
pleia2 | get a "/usr/sbin/semanage: Port tcp/8080 already defined" error when trying to set it again | 17:36 |
*** morganfainberg is now known as morganfainberg|a | 17:38 | |
pleia2 | ah: semanage port -m -t http_port_t -p tcp 8080 (-m to modify, rather than -a to add port def) | 17:38 |
clarkb | pleia2: I am going to brute force it to allow other potential ports. I think I can do this with the onlyif exec clause | 17:38 |
clarkb | or I can use -m thanks | 17:38 |
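Put together, the add-or-modify approach sketched above amounts to something like this on the git servers (a shell sketch; the second port and the grep pattern are illustrative):

    # define (or redefine) the offset ports as http_port_t so Apache may bind them
    semanage port -a -t http_port_t -p tcp 8080 || semanage port -m -t http_port_t -p tcp 8080
    semanage port -a -t http_port_t -p tcp 4443 || semanage port -m -t http_port_t -p tcp 4443
    # semanage also lets you query the current definitions
    semanage port -l | grep http_port_t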
Alex_Gaynor | :/ we really need fewer failures in the gate pipeline | 17:39 |
mordred | mgagne: you can't | 17:41 |
mordred | mgagne: that's the reason I do the two passes with the star | 17:41 |
mgagne | mordred: sad panda. Sad that you can't replicate the behaviour of the "Previously Reviewed By" section | 17:42 |
*** cthulhup has joined #openstack-infra | 17:42 | |
mgagne | mordred: is this section openstack specific? | 17:42 |
*** rnirmal has quit IRC | 17:43 | |
*** SergeyLukjanov has joined #openstack-infra | 17:43 | |
mordred | mgagne: yes. we had to write java to get that | 17:44 |
mordred | Alex_Gaynor: if anyone ever says that testing code as it's uploaded rather than doing the work we do to test as it would land is sufficient, they should watch our gate resets | 17:46 |
*** cthulhup has quit IRC | 17:47 | |
Alex_Gaynor | mordred: Seriously. A decent portion of recent failures are from flaky tests or people approving patches before their jenkins run happens though. We really need to cut down on this | 17:47 |
mordred | every time something fails in the gate pipeline, it's a testament to just how complex this openstack thing we're testing really is. oy | 17:47 |
Alex_Gaynor | s/this/those/ | 17:47 |
Alex_Gaynor | each one costs us like an hour | 17:47 |
mordred | Alex_Gaynor: yes. we really do | 17:47 |
mordred | and _seriously_ ? people are approving in this climant before the check job finishes? | 17:47 |
mordred | s/climant/climate/ | 17:48 |
mtreinish | Alex_Gaynor: https://review.openstack.org/#/c/41797/ that will drop that reset time down | 17:48 |
Alex_Gaynor | maybe not today, but I've definitely seen it before | 17:48 |
mtreinish | but at the cost of a bit more flakiness | 17:48 |
Alex_Gaynor | mtreinish: only one way to find out! | 17:48 |
Alex_Gaynor | (if it's worth it) | 17:48 |
*** afazekas has joined #openstack-infra | 17:48 | |
Alex_Gaynor | mtreinish: we going to land that once check passes? | 17:48 |
mtreinish | Alex_Gaynor: we can, but I wasn't planning on doing it until 2 race fixes get through the gate (we can just stack it on the end) | 17:49 |
mtreinish | here's the graphs I've been watching https://tinyurl.com/kmwsvob | 17:50 |
Alex_Gaynor | mtreinish: probably best to wait for those to be fully landed, given how the gate is right now :/ | 17:50 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 17:51 |
mtreinish | Alex_Gaynor: yeah, the other problem is those 2 reviews don't fix the 3 most common flaky parallel fails I've been seeing in the gate pipeline | 17:51 |
clarkb | pleia2: ^ I think that should work. you can't -m an existing thing so I do -a and if that fails -m | 17:51 |
pleia2 | clarkb: "can't -m an non-existing thing" I think you mean, but yes, good call | 17:52 |
* pleia2 reviews | 17:53 | |
clarkb | pleia2: yah non-existing. I can type I swear | 17:53 |
clarkb | woot dependency cycle | 17:53 |
*** dina_belova has joined #openstack-infra | 17:54 | |
pleia2 | clarkb: how are we handling git daemon's port? | 17:55 |
pleia2 | my patch? | 17:57 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 17:57 |
clarkb | pleia2: ya | 17:57 |
pleia2 | ok cool | 17:58 |
clarkb | which is working fine best I can tell | 17:58 |
*** ruhe has quit IRC | 17:58 | |
mordred | Alex_Gaynor, pleia2, clarkb can I get a read on this before I send it to the dev list? | 17:59 |
Alex_Gaynor | mordred: what's "this"? | 18:00 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 18:00 |
mordred | haha | 18:00 |
mordred | how about I paste the link | 18:00 |
pleia2 | :) | 18:00 |
mordred | http://paste.openstack.org/show/44785/ | 18:00 |
Alex_Gaynor | wouldn't hurt :) | 18:00 |
mordred | I want to be clear, not too bitchy or accusing, and also not indicate panic | 18:00 |
clarkb | I am going to kill this dependency cycle darnit | 18:00 |
mordred | clarkb: I believe in you | 18:01 |
pleia2 | mordred: looks good to me | 18:01 |
Alex_Gaynor | mordred: looks good to me | 18:01 |
mordred | thanks | 18:01 |
jeblair | does anyone want to become (even more of) a git expert? | 18:02 |
mordred | jeblair: sure | 18:02 |
*** zehicle_at_dell has quit IRC | 18:02 | |
jeblair | i think we need to get a handle on the refs/changes issue | 18:02 |
*** AJaeger has quit IRC | 18:03 | |
jeblair | because a very simple test (cloning nova with and without refs/changes) is about a 2x difference in speed | 18:03 |
mordred | as in, how that affects a remote update? | 18:03 |
jeblair | but it's _complicated_ | 18:03 |
anteaya | mordred: I would reiterate your tl;dr before you sign off | 18:03 |
anteaya | just in case they love your prose so much, they forget the point | 18:03 |
jeblair | so i don't want a simple "oh, let's just not replicate refs/changes" before we _understand_ it | 18:03 |
mordred | jeblair: can you give a summary of what's complicated? | 18:04 |
jeblair | things that may impact the issue are whether the refs are in the repo at all, whether they are there and packed, and whether our clients or servers are appropriately (not) advertising them on initial connect | 18:04 |
jeblair | see this thread: http://thread.gmane.org/gmane.comp.version-control.git/126797/focus=127059 | 18:04 |
jeblair | i don't know if that landed, or what | 18:05 |
jeblair | anyway, i will get around to understanding that, but i don't want that to distract from our work on 'just add more mirrors of what we have' for now | 18:05 |
jeblair | so if anyone makes some headway into that before we get to that optimization point, it would be useful | 18:06 |
*** fbo_away is now known as fbo | 18:06 | |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 18:06 |
mordred | jeblair: I will read that and other things and see if I can drop some knowledge | 18:07 |
jeblair | mordred: awesome, thx | 18:08 |
clarkb | that last patchset makes me really sad | 18:08 |
clarkb | I am running bash in a puppet exec so that I can easily negate the return code of a command in the onlyif | 18:08 |
clarkb | of course I probably forgot to update the path and it will fai | 18:08 |
*** rnirmal has joined #openstack-infra | 18:08 | |
* anteaya hands clarkb an "l" | 18:09 | |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 18:09 |
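The onlyif trick clarkb describes is to wrap the check in bash so its exit status can be negated; roughly (the grep pattern here is illustrative, not the patchset's exact command):

    # succeed (and so let the exec run) only when the port is NOT yet labelled http_port_t
    bash -c '! semanage port -l | grep -E "^http_port_t.*\b8080\b"'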
*** datsun180b has quit IRC | 18:11 | |
mordred | jeblair: jumping to thoughts - what if we make our remote refspecs on the build slaves more specific | 18:12 |
*** cthulhup has joined #openstack-infra | 18:13 | |
jeblair | mordred: maybe; i'd want to understand what it's doing now though (what does git remote update do? how does it relate to the (non-)advertisement of refs?) | 18:14 |
*** marun has quit IRC | 18:14 | |
anteaya | mordred jeblair not sure if this helps or not, but in this post I did as an intro to git I posted the changes to refs and logs/refs as I went along: http://anteaya.info/blog/2013/02/26/the-structure-and-habits-of-git/ | 18:14 |
clarkb | jeblair: that latest patchset mostly works. I am not entirely convinced it will restart apache before attempting to start haproxy, but we can do multiple passes really quickly if we need it | 18:14 |
clarkb | jeblair: its tricky to get that right because I kept running into dependency cycles. | 18:14 |
clarkb | jeblair: but the 15g node is now running apache and git-daemon behind haproxy | 18:14 |
jeblair | clarkb: you should be able to clone nova | 18:15 |
anteaya | but I didn't create or track remote branches or refs, so I don't answer that question | 18:15 |
jeblair | (the others don't exist) | 18:15 |
clarkb | jeblair: ok testing | 18:15 |
clarkb | jeblair: git clone git://162.209.12.127/openstack/nova works | 18:16 |
adalbas | hi! some jobs in the gate (looking at devstack-testr-vm-full) are showing this error 'ERROR:root:Could not find any typelib for GnomeKeyring'. Anyone noticed that and know what this is about? | 18:16 |
*** xBsd has joined #openstack-infra | 18:17 | |
jeblair | clarkb: http? use GIT_SSL_NO_VERIFY=true | 18:17 |
clarkb | jeblair: and https is failing because the development hiera does not have the ssl cert | 18:17 |
anteaya | adalbas: that is a bug | 18:17 |
jeblair | clarkb: are you sure? i thought dev hiera was prod hiera? | 18:17 |
clarkb | jeblair: I don't think it is, but I will double check | 18:18 |
anteaya | it shouldn't affect the outcome of the tests adalbas | 18:18 |
adalbas | anteaya, yeah, i realized that. Is there a bug opened for that anyway? | 18:18 |
clarkb | jeblair: nevermind it is a symlink. I will look into this more closely | 18:18 |
anteaya | adalbas: looking | 18:18 |
*** fbo is now known as fbo_away | 18:18 | |
jeblair | clarkb: it _should_ install the cert for git, which you should be able to ignore with that env var | 18:19 |
*** marun has joined #openstack-infra | 18:19 | |
adalbas | anteaya, i found this one: https://bugs.launchpad.net/devstack/+bug/1193164 | 18:19 |
uvirtbot | Launchpad bug 1193164 in devstack "GnomeKeyring errors when installing devstack" [Undecided,New] | 18:19 |
*** boris-42 has joined #openstack-infra | 18:19 | |
clarkb | jeblair: it isn't installing the cert at all so we can't ignore the error (I think apache is failing to do anything at that point) | 18:19 |
anteaya | adalbas: that's the one | 18:20 |
adalbas | anteaya, tks! | 18:20 |
anteaya | adalbas: np | 18:20 |
clarkb | jeblair: error does change when using the GIT_SSL_NO_VERIFY flag | 18:20 |
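The environment variable jeblair suggests is used like this when cloning from a test node whose certificate doesn't match (IP taken from the discussion above):

    GIT_SSL_NO_VERIFY=true git clone https://162.209.12.127/openstack/nova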
ttx | jeblair: about mordred's suggestion of not approving before checks are run... is it something we could enforce ? I can see benefits for it even outside of the FF craze. | 18:21 |
*** xBsd has quit IRC | 18:21 | |
jeblair | ttx: probably; occasionally it's useful. worth thinking about | 18:22 |
Alex_Gaynor | ttx: So, FWIW when I first got involved in OpenStack, the way I thought it worked was that there wasn't an explicit "Approve" state, that instead stuff was approved when jenkins passed and it had the needed +2s. Such a model might be interesting to explore. | 18:22 |
clarkb | jeblair: https://162.209.12.127/openstack/nova/info/refs not found falling back on the dumb client? | 18:23 |
mordred | Alex_Gaynor: that's where we started, actually | 18:23 |
ttx | Alex_Gaynor: we kinda want the APRV because sometimes there is a timing constraint. So you can have two +2s but waiting for something to happen before hitting APRV | 18:23 |
*** AJaeger has joined #openstack-infra | 18:23 | |
mordred | that too. but the effect on the gate would be largely the same if we triggered a gate run directly on the second +2 | 18:24 |
Alex_Gaynor | how many builders do we have right now for non-devstack builds? | 18:24 |
mordred | which is that the second +2 could jump the initial vrfy and trigger the gate testing anyway | 18:24 |
ttx | mordred: it wouldn't be completely insane to require that check tests pass before adding something to the gate queue. At least for some pipes | 18:25 |
jeblair | (also, why would you never want more than 2 core reviewers to review something?) | 18:25 |
clarkb | oh I know. I need to put git.openstack.org in the request. /me edits /etc/hosts locally | 18:25 |
*** fbo_away is now known as fbo | 18:25 | |
*** datsun180b has joined #openstack-infra | 18:25 | |
*** xBsd has joined #openstack-infra | 18:27 | |
*** woodspa has joined #openstack-infra | 18:28 | |
jeblair | clarkb: why? | 18:31 |
jeblair | clarkb: (the other servers don't require that) | 18:31 |
clarkb | jeblair: because the 4443 vhost is for git.openstack.org otherwise you get the default vhost | 18:32 |
jeblair | clarkb: why don't we make the 4443 accept all hostnames? | 18:33 |
clarkb | jeblair: we can do that as well. Remove the default vhost and put a * in the git.openstack.org vhost | 18:33 |
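The vhost change being described would look roughly like this in the Apache configuration (a sketch only; the surrounding vhost body and file layout are assumptions):

    Listen 4443
    # no default vhost on 4443; the git vhost answers for any hostname
    <VirtualHost *:4443>
        ServerName git.openstack.org
        ServerAlias *
        # SSL and cgit/git-http-backend configuration omitted
    </VirtualHost>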
clarkb | but now I appear to have haproxy logging issues. It wants to log to rsyslog via udp | 18:34 |
clarkb | error: gnutls_handshake() failed: A TLS warning alert has been received. is the current error | 18:34 |
mordred | jeblair: ok. I think I have learned new things | 18:41 |
reed | bbl | 18:41 |
*** reed has quit IRC | 18:41 | |
jeblair | clarkb, mordred: https://etherpad.openstack.org/git-lb | 18:42 |
jeblair | dinky benchmarks | 18:42 |
jeblair | i think we should use 8g nodes instead of 30g; and lots of them. | 18:43 |
jeblair | mordred: what have you learned? | 18:44 |
clarkb | wow those numbers are very close to each other | 18:44 |
sdake_ | is the gate broken ? | 18:44 |
mordred | jeblair: ah. nope. | 18:45 |
mordred | jeblair: I did not learn something | 18:45 |
anteaya | sdake_: what are you seeing that prompts the question? | 18:46 |
anteaya | the gate is very very slow but it should still be running | 18:46 |
jeblair | mordred: (i have learned that git.o.o has a partial packed-refs file; i suspect it has something to do with how it was created (maybe an initial git clone --mirror or something)) | 18:46 |
jeblair | mordred: 28k refs are in packed refs, 9k are loose | 18:47 |
mordred | jeblair: interesting | 18:47 |
jeblair | mordred: review.o.o is all unpacked | 18:47 |
anteaya | the check queue however is filled with unknown rather than a time | 18:47 |
mordred | jeblair: I'm breaking down and asking spearce questions directly | 18:47 |
jeblair | anteaya: waiting on centos nodes for py26 tests | 18:47 |
sdake_ | anteaya apparently heat gate jobs are going slowly | 18:47 |
sdake_ | but they appear to make progress according to devs in the heat channel - but thanks for responding | 18:47 |
anteaya | sdake_: yes all gate jobs are going slowly | 18:48 |
anteaya | yes absolutely | 18:48 |
jeblair | which probably means we should add more centos nodes | 18:48 |
anteaya | jeblair: great thanks | 18:48 |
anteaya | go go centos nodes | 18:48 |
jeblair | clarkb: they are actually close enough that i want to spin up a 4g and 2g node (they both have 2vcpus; half of 8g's 4vcpu) | 18:50 |
*** cthulhup has quit IRC | 18:51 | |
clarkb | jeblair: good idea | 18:52 |
clarkb | I am going to stop using haproxy for the http to https redirect. I don't think that works with the tcp mode | 18:53 |
mordred | jeblair: best I can tell, the patch did not land, nor any patches like it | 18:53 |
*** danger_fo_away is now known as danger_fo | 18:54 | |
jeblair | mordred: :( | 18:54 |
clarkb | I had a hunch this would be the case which is why I kept the 8080 vhost | 18:54 |
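The reason the redirect has to stay in Apache: in TCP mode haproxy only relays bytes and never speaks HTTP, so HTTP-mode features such as redirect rules cannot apply. A minimal sketch of the kind of frontend in question (names and ports are illustrative):

    frontend git-https
        mode tcp
        bind :443
        default_backend git-https
        # no HTTP-level rules (e.g. a redirect to https) are possible in tcp mode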
mordred | jeblair: I'm continuing to dig though | 18:54 |
jeblair | those are launching now; i need to get exercise and lunch; should be back in about 1 hour | 18:54 |
*** sarob has joined #openstack-infra | 18:59 | |
clarkb | removing the default virtualhost and matching * on the git.o.o vhost makes things work for some reason. I am not complaining; patchset incoming | 19:04 |
mordred | jeblair: uploadpack.hiderefs | 19:08 |
mordred | jeblair: it's in 1.8.2 | 19:09 |
mordred | which means we'd almost certainly want the fetch-from repos to be on precise so that we could install latest git from the git ppa | 19:10 |
mordred | clarkb: ^^ | 19:10 |
clarkb | mordred: ugh | 19:10 |
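For reference, the option mordred found is set through git config, per repository or system-wide, and takes a ref prefix to hide from advertisement; it only exists in git 1.8.2 and later, hence the packaging discussion that follows. A sketch (the repository path is an assumption):

    git --git-dir=/var/lib/git/openstack/nova.git config uploadpack.hiderefs refs/changes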
*** gordc has quit IRC | 19:10 | |
clarkb | I think we either get cgit on precise or new git on centos | 19:11 |
mordred | oh yeah? | 19:11 |
clarkb | because those seem less painful than a complicated proxy mess to send cgit to centos boxes and everything else to precise boxes | 19:11 |
mordred | nod | 19:11 |
mordred | how awful is getting git >=1.8.2 on centos alongside our cgit install? | 19:12 |
mordred | pleia2: ^^ ? | 19:12 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 19:13 |
pleia2 | mordred: pretty awful | 19:13 |
mordred | SWEET | 19:13 |
pleia2 | would have to load up a 3rd party rpm, which makes me :( | 19:13 |
mordred | where are we getting cgit from? | 19:14 |
pleia2 | epel | 19:14 |
mordred | epel has cgit and not git >=1.8 ? | 19:14 |
pleia2 | as I understand it, epel is just "other stuff" not so much backports | 19:14 |
clarkb | ugh I just derped and started a git fetch of 42784 into the hiera repo... | 19:14 |
* clarkb makes a note to clean up that repo when this is all done | 19:15 | |
clarkb | hiera itself should be fine as I didn't check out anything in that repo | 19:15 |
mordred | pleia2: I understood the opposite - that epel is backports of current fedora for old centos/rhel | 19:15 |
mordred | but - I REALLY don't understand | 19:15 |
*** AJaeger has quit IRC | 19:15 | |
clarkb | mordred: maybe you can take a look at that repo to make sure I didn't hose anything? second set of eyes and all that | 19:15 |
mordred | clarkb: all you did was fetch? | 19:16 |
pleia2 | Does EPEL replace packages provided within Red Hat Enterprise Linux or layered products? | 19:16 |
pleia2 | No. EPEL is purely a complementary repository that provide add-on packages. | 19:16 |
clarkb | mordred: yes | 19:16 |
*** zaro has joined #openstack-infra | 19:17 | |
anteaya | hey zaro | 19:17 |
mordred | ok. then I do not have a good answer | 19:17 |
clarkb | mordred: http://paste.openstack.org/show/44788/ I ^C'd before the checkout | 19:17 |
pleia2 | I mean, we can just use an rpm | 19:17 |
mordred | well, cgit is compiled against git | 19:18 |
pleia2 | oh, that | 19:18 |
mordred | isn't it? so wouldn't that screw the cgit install too? | 19:18 |
pleia2 | I'm not sure | 19:18 |
mordred | or - wait - no, they do static linking | 19:18 |
mordred | that's why it's not in ubuntu | 19:18 |
pleia2 | er, hooray for static linking? | 19:18 |
pleia2 | :) | 19:18 |
mordred | clarkb: yeah. you're fine | 19:18 |
pleia2 | I can find a nice looking rpm and install it on my test system | 19:19 |
clarkb | mordred: ok, is that something we should git gc? | 19:19 |
mordred | clarkb: not this week | 19:19 |
mordred | :) | 19:19 |
*** beagles has quit IRC | 19:20 | |
clarkb | ya I am not terribly worried about it, but I should probably clean that up at some point. I will write a note on the whiteboard | 19:20 |
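Cleaning up the stray fetch later is straightforward: the fetched objects are referenced only by FETCH_HEAD, so something like the following, run inside the hiera repo, would drop them (a sketch, not a step anyone committed to in the log):

    rm -f .git/FETCH_HEAD
    git gc --prune=now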
clarkb | jeblair: ^ | 19:20 |
*** sarob has quit IRC | 19:21 | |
*** pblaho has joined #openstack-infra | 19:21 | |
*** sarob has joined #openstack-infra | 19:21 | |
zaro | anteaya: hello! | 19:22 |
anteaya | welcome to the party | 19:23 |
anteaya | jeblair noticed that when restarting zuul a gearman thread dropped resulting in slaves sticking around and tests running on them, but they were orphaned | 19:23 |
anteaya | so the logs from the tests were lost | 19:24 |
anteaya | <jeblair> mordred, clarkb, zaro: when the gearman server restarts, i think the executorworkerthread dies, which means the offline-on-complete feature fails | 19:25 |
anteaya | * xBsd has quit (Quit: xBsd) | 19:25 |
anteaya | <jeblair> mordred, clarkb, zaro: which is why a lot of jobs are showing up as lost right now -- they are re-running on hosts that should have been offlined | 19:25 |
anteaya | * michchap (~michchap@60-242-111-85.tpgi.com.au) has joined #openstack-infra | 19:25 |
anteaya | <jeblair> so for the moment, if we stop zuul, we need to delete all the slaves | 19:25 |
*** AJaeger has joined #openstack-infra | 19:25 | |
*** AJaeger has joined #openstack-infra | 19:25 | |
anteaya | zaro from about 4.5 hours ago | 19:25 |
*** beagles has joined #openstack-infra | 19:25 | |
*** sarob has quit IRC | 19:25 | |
clarkb | ok lunch time back shortly | 19:28 |
zaro | anteaya: sorry i missed it all. i was deep in gerrit. | 19:28 |
anteaya | clarkb: happy lunch | 19:28 |
anteaya | zaro: understandable | 19:28 |
zaro | anteaya: lunching with clarkb so will think about it after food. | 19:28 |
anteaya | zaro: happy food | 19:28 |
ttx | anteaya: the "available test nodes" graph at bottom of zuul status page looks a bit funny. Since you've been following the action, is it considered normal ? | 19:29 |
pleia2 | tsk, RPMForge is popular but for centos they only have up to git 1.7.11 | 19:29 |
ttx | 10 is the new 0 | 19:29 |
anteaya | ttx: I asked the same question at the start of my day today | 19:30 |
anteaya | it means that we are using all available nodes, really | 19:31 |
anteaya | so yes 10 is the new 0 | 19:31 |
anteaya | ttx when I asked jeblair the same question this morning he responded with this image: http://graphite.openstack.org/render/?from=-24hours&fgcolor=000000&title=Test%20Nodes&_t=0.8664466904279092&height=308&bgcolor=ffffff&width=586&until=now&showTarget=color%28alias%28sumSeries%28stats.gauges.nodepool.target.*.devstack-precise.*.ready%29%2C%20%27devstack-precise%27%29%2C%20%27green%27%29&_salt=1376751567.43&target=alias%28sum | 19:31 |
anteaya | Series%28stats.gauges.nodepool.target.*.devstack-precise.*.building%29%2C%20%27Building%27%29&target=alias%28sumSeries%28stats.gauges.nodepool.target.*.devstack-precise.*.ready%29%2C%20%27Ready%27%29&target=alias%28sumSeries%28stats.gauges.nodepool.target.*.devstack-precise.*.used%29%2C%20%27Used%27%29&target=alias%28sumSeries%28stats.gauges.nodepool.target.*.devstack-precise.*.delete%29%2C%20%27Delete%27%29&areaMode=stacked | 19:31 |
anteaya | oh goodness sorry about that | 19:32 |
anteaya | ick | 19:32 |
ttx | ok, was expecting something like this. "free node" graphs are always a bit funny in a dynamic allocation system | 19:32 |
anteaya | yes | 19:32 |
ttx | anteaya: could you tinyurl that for me ? | 19:32 |
anteaya | ttx: https://tinyurl.com/kmotmns | 19:33 |
anteaya | better | 19:33 |
mordred | ok. plane landing. I _may_ get on for a minute at the hotel tonight, but in general I'm switching to driving large trucks and building steel structures in the hot sun | 19:34 |
mordred | and, you know, burning the man | 19:35 |
anteaya | happy sand, mordred | 19:35 |
pleia2 | mordred: have fun! (or whatever you're supposed to have at burning man :)) | 19:36 |
anteaya | whatever it is, it doesn't include water or shade | 19:36 |
*** vipul is now known as vipul-away | 19:36 | |
*** vipul-away is now known as vipul | 19:36 | |
*** boris-42 has quit IRC | 19:37 | |
ttx | Interesting thing... FeatureProposalFreezes should overflow the checks, not the gate pipeline. FeatureFreeze will overflow the gate pipeline. That should really be fun | 19:38 |
ttx | (i.e. people are supposed to propose stuff, not so much approve them) | 19:39 |
anteaya | when is the date for FeatureProposalFreezes? | 19:40 |
mgagne | anteaya: August 21 for nova and cinder -> https://wiki.openstack.org/wiki/Havana_Release_Schedule | 19:41 |
mgagne | anteaya: today =) | 19:41 |
anteaya | mgagne: thank you | 19:41 |
anteaya | ah ha | 19:42 |
anteaya | funny it has been neutron and heat we have heard from today | 19:42 |
anteaya | cinder and nova have been relatively quiet in this channel | 19:43 |
*** melwitt has joined #openstack-infra | 19:43 | |
pleia2 | clarkb: so the only reasonable, new git rpms that people use are from http://pkgs.repoforge.org/git/ (might find some random ones on some-person's-blog if I search more, but I haven't yet, and even then...), repoforge only goes up to 7.11, the other option is installing from source :\ | 19:44 |
pleia2 | er, 1.7.11 | 19:44 |
*** pblaho has quit IRC | 19:45 | |
* pleia2 lunch | 19:46 | |
anteaya | happy lunch | 19:46 |
anteaya | guess it is just me right now | 19:46 |
anteaya | ttx are stackforge projects affected by feature freeze? like savanna and murano? | 19:47 |
ttx | no, only the integrated projects | 19:47 |
ttx | i.e. the ones that do a common release | 19:47 |
*** arezadr has joined #openstack-infra | 19:48 | |
anteaya | do you think there would be offence taken if stackforge projects were asked to submit patches on a critical basis only right now? | 19:49 |
anteaya | then if something is non-critical it could wait until after the rush | 19:49 |
ttx | that's not really the concept that was sold to them, and unfortunately we are far from the activity peak | 19:50 |
ttx | ie. Feature Freeze is actually two weeks away. | 19:50 |
ttx | We can't ask them to hold for two weeks. | 19:51 |
anteaya | I'm seeing a lot of nova/heat/cinder/neutron patches so that is as expected | 19:51 |
jeblair | anteaya: out of the 200 changes in zuul, ~40 are stackforge, and they run simple/fast jobs. i don't think it's worth it. | 19:51 |
anteaya | ttx fair enough | 19:51 |
anteaya | jeblair: ah stats thank you | 19:51 |
*** vipul is now known as vipul-away | 19:51 | |
anteaya | the question just floated through my head so I thought I would give it voice | 19:51 |
anteaya | jeblair: mordred found a git fix but it requires git 1.8.2 which requires installing a third party rpm for cgit and even then it appears the package is not available | 19:52 |
*** wenlock has joined #openstack-infra | 19:52 | |
wenlock | hi all | 19:53 |
*** vipul-away is now known as vipul | 19:53 | |
wenlock | question about hiera config, was looking for a sample... finally got some time to dig back into this today | 19:54 |
jeblair | anteaya: i saw | 19:54 |
*** thomasbiege has joined #openstack-infra | 19:55 | |
anteaya | k | 19:55 |
anteaya | wenlock: hello what is the question? | 19:55 |
wenlock | maybe enough to get me started with wiki | 19:55 |
*** thomasbiege has quit IRC | 19:57 | |
*** cyeoh has quit IRC | 19:57 | |
*** chuckieb|2 has quit IRC | 19:58 | |
*** koobs` has joined #openstack-infra | 19:58 | |
jeblair | it actually merged 11 changes in the past hour; i think it just got 11 more added to the gate queue. | 19:58 |
*** cyeoh has joined #openstack-infra | 19:58 | |
*** koobs has quit IRC | 19:58 | |
*** jhesketh has quit IRC | 19:59 | |
anteaya | how does 11 merges in the last hour compare with prior hours? | 19:59 |
anteaya | are we getting better or staying the same? | 19:59 |
*** jhesketh has joined #openstack-infra | 20:00 | |
jeblair | anteaya: we haven't done anything to make it better yet so it's not worth looking. i mostly wanted to see if it was functioning at all, and it is. so it's back to scaling git.o.o now. | 20:01 |
anteaya | ah okay | 20:01 |
jeblair | anteaya: (it's in graphite if you wanted to play with it; i don't have a link, i was grepping logs because i was looking for errors) | 20:02 |
*** cthulhup has joined #openstack-infra | 20:02 | |
anteaya | jeblair: I have forgotten how I get to graphite | 20:02 |
jeblair | anteaya: graphite.openstack.org | 20:03 |
anteaya | that would be it, thanks | 20:03 |
*** thomasbiege has joined #openstack-infra | 20:03 | |
*** linggao has joined #openstack-infra | 20:03 | |
jeblair | clarkb: we should consider using the private interfaces for git haproxy (but that only works within a DC; and we should also test to see which is actually faster) | 20:04 |
*** mrodden has quit IRC | 20:05 | |
*** hartsocks has joined #openstack-infra | 20:05 | |
*** thomasbiege has quit IRC | 20:06 | |
*** cthulhup has quit IRC | 20:06 | |
linggao | Hi clarkb, I accidentally added a patch 10 to someone's code in review. I meant only to depend on his code. | 20:09 |
linggao | clarkb, how do I remove patch 10 in https://review.openstack.org/#/c/40844/ ? | 20:09 |
linggao | clarkb, NobodyCam told me to ask you about it. | 20:10 |
jeblair | #status ok | 20:11 |
*** ChanServ changes topic to "Discussion of OpenStack Developer Infrastructure | docs http://ci.openstack.org | bugs https://launchpad.net/openstack-ci/+milestone/grizzly | https://github.com/openstack-infra/config" | 20:11 | |
*** afazekas has quit IRC | 20:13 | |
*** yolanda has joined #openstack-infra | 20:13 | |
*** rnirmal has quit IRC | 20:18 | |
clarkb | jeblair: did you catch my hiera data repo derp in scrollback? don't let me forget to clean that up at some point when things are quieter | 20:18 |
clarkb | jeblair: tl;dr I fetched a ref from openstack-infra/config into that repo http://paste.openstack.org/show/44788/ because I was in the wrong PWD when running that command | 20:19 |
clarkb | linggao: there is no way to remove patch 10. You can only push a patch 11 that restores patchset 9 | 20:19 |
jeblair | clarkb: yep | 20:19 |
clarkb | jeblair: I am reading up on private interfaces now. 162.209.12.127 has the latest patchset of my change applied to it and is working fine | 20:20 |
*** morganfainberg|a is now known as morganfainberg | 20:20 | |
linggao | clarkb: thanks. I'll do that to repair the damage. | 20:20 |
*** dkliban has quit IRC | 20:20 | |
clarkb | jeblair: oh you mean the rax private interfaces | 20:20 |
jeblair | clarkb: yeah | 20:20 |
clarkb | jeblair: do our firewall rules apply to both interfaces? if so it is just a matter of putting those IPs into the balance member IP list | 20:21 |
jeblair | clarkb: well, we need to decide if we want to use them first | 20:22 |
jeblair | clarkb: you want to run a quick benchmark beetween the 15 and 30 g test nodes i set up? | 20:22 |
jeblair | clarkb: (note, they are in ORD, not DFW) | 20:22 |
clarkb | jeblair: I will set up 162.209.12.127 to balance across that node and the 30G node on their private interfaces then switch to public | 20:22 |
jeblair | clarkb: ok, i was just thinking do a quick git clone from one to the other to see if you notice a diff | 20:24 |
clarkb | jeblair: without haproxy? | 20:24 |
clarkb | I can do that too | 20:24 |
jeblair | clarkb: updated https://etherpad.openstack.org/git-lb | 20:26 |
jeblair | 2g has the highest 'clients served per gb' ratio | 20:26 |
*** hashar has joined #openstack-infra | 20:28 | |
jeblair | and overall it correlates very closely to 1/1 client/cpu (with 8g being able to serve 1.5 clients per cpu) | 20:28 |
*** hashar has left #openstack-infra | 20:28 | |
jeblair | (but slowly) | 20:28 |
Alex_Gaynor | event/result queues on zuul seem to be rising | 20:30 |
jeblair | Alex_Gaynor: thanks, i'll take a look | 20:30 |
*** rfolco has quit IRC | 20:31 | |
zaro | jeblair: i can't seem to repro executorworkers stopping when gearman server restarts. are you still seeing this? | 20:34 |
*** yolanda has quit IRC | 20:34 | |
jeblair | zaro: the problem is that the node was not taken offline | 20:34 |
jeblair | zaro: the rest is speculation | 20:35 |
*** hartsocks has left #openstack-infra | 20:35 | |
zaro | jeblair: node not taken offline due to restarting gearman server? | 20:35 |
odyssey4me4 | join #chef | 20:35 |
odyssey4me4 | hahaha, oops | 20:35 |
jeblair | zaro: when the gearman server was taken offline, nodes that were running jobs were not set offline | 20:36 |
*** p5ntangle has joined #openstack-infra | 20:36 | |
zaro | jeblair: ahh, ok. | 20:36 |
*** rnirmal has joined #openstack-infra | 20:38 | |
*** UtahDave has joined #openstack-infra | 20:38 | |
jeblair | clarkb: i'm going to try your signal patch now, but i expect it to kill the gearman server | 20:41 |
jeblair | clarkb: which means we'll get one thread dump and then we get to restart zuul | 20:42 |
clarkb | jeblair: ok | 20:42 |
clarkb | also :( | 20:42 |
*** p5ntangle has quit IRC | 20:44 | |
anteaya | jeblair: should we have a status update for the channels? | 20:44 |
clarkb | anteaya: I think we can do that once the gearman server falls over | 20:44 |
clarkb | anteaya: if we don't recover cleanly | 20:44 |
anteaya | okay | 20:44 |
*** AJaeger has quit IRC | 20:46 | |
*** odyssey4me4 has quit IRC | 20:46 | |
jeblair | clarkb: i don't think it fell over | 20:47 |
clarkb | jeblair: did you get a stack dump? | 20:48 |
jeblair | clarkb: i have 2 of them, so far; slightly different, and useful | 20:48 |
clarkb | jeblair: best I can tell there isn't a real difference between private or public interfaces on those boxes | 20:49 |
clarkb | I updated the etherpad | 20:49 |
*** thomasbiege has joined #openstack-infra | 20:49 | |
*** thomasbiege has quit IRC | 20:50 | |
jeblair | clarkb: i'm switching to intense zuul hacking; i'd lean toward going with public and proceeding with the plan | 20:50 |
clarkb | ok, I am going to test ipv6 now. as I noticed that wasn't working | 20:51 |
clarkb | and git.o.o has a AAAA record so it should be made to work | 20:51 |
jeblair | clarkb: but we can haproxy over v4, yeah? | 20:51 |
clarkb | jeblair: yeah this is just for the frontend listen directives | 20:51 |
jeblair | clarkb: (i mean, if somethings broke, probably worth looking into) | 20:51 |
jeblair | ok, yeh | 20:51 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Load balance git requests. https://review.openstack.org/42784 | 20:52 |
*** apcruz has quit IRC | 20:54 | |
*** lbragstad has quit IRC | 20:55 | |
jeblair | clarkb: ok, i think i have what i need to hack on the zuul problem | 20:56 |
jeblair | clarkb: i don't believe it's going to get better (it might, after a long time, increment through the loop again) | 20:56 |
jeblair | clarkb: so we should go ahead and stop it, which as we learned, means some cleanup work. | 20:56 |
jeblair | clarkb: up for helping? | 20:56 |
clarkb | jeblair: sure | 20:57 |
*** dkliban has joined #openstack-infra | 20:57 | |
clarkb | jeblair: I think 42784 is just about ready. Need to test that that works over ipv6 now but git clone doesn't like ipv6 addresses in its url | 20:57 |
clarkb | it splits on ':' | 20:57 |
jeblair | ah | 20:57 |
clarkb | *splits on ':' and treats the right hand side as the port | 20:57 |
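One common workaround for the ':' splitting clarkb describes is to bracket the IPv6 literal in the URL; whether the git builds in use at the time accept this was not verified here, so treat it as a hedged note (the address is a placeholder):

    git clone "https://[2001:db8::1]/openstack/nova"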
jeblair | clarkb: will you log into jenkins02? | 20:57 |
clarkb | jeblair: I am in | 20:58 |
jeblair | clarkb: abort all jobs. :) | 20:58 |
clarkb | oh you mean through web ui? | 20:58 |
jeblair | clarkb: yep | 20:58 |
clarkb | I ssh'd in >_> | 20:58 |
clarkb | jeblair: should I put it in shutdown mode first to prevent more jobs from starting? | 20:58 |
jeblair | clarkb: i stopped zuul | 20:59 |
clarkb | ok | 20:59 |
clarkb | I am aborting jobs now | 20:59 |
jeblair | clarkb: with any luck, nodepool should clean most of those up | 21:01 |
*** cody-somerville has quit IRC | 21:01 | |
clarkb | I am not sure if I should wait after clicking the red button or if I can just spam that. I assume that it is just making a rest call back to jenkins | 21:02 |
jeblair | clarkb: spam it | 21:02 |
*** fbo is now known as fbo_away | 21:02 | |
jeblair | clarkb: when you're done; double check that of all the on-line devstack nodes, none of them has a build history | 21:03 |
clarkb | jeblair: FYI https://jenkins02.openstack.org/job/gate-neutron-python26/275/ won't die and has been running for hours | 21:05 |
jeblair | clarkb: i'll look into it | 21:06 |
clarkb | I am waiting for nodepool to cleanup the nodes now | 21:06 |
jeblair | clarkb: (it'll add them too, you should end up with 10 online and 5 offline nodes [thanks to az2]) | 21:07 |
jeblair | clarkb: that's nasty; i think we should restart that jenkins master | 21:08 |
jeblair | (when nodepool finishes | 21:08 |
*** dkranz has joined #openstack-infra | 21:08 | |
jeblair | clarkb: (i killed and relaunched the slave and that build is still stuck) | 21:08 |
clarkb | jeblair: restarting the master wfm | 21:09 |
clarkb | jeblair: I assume we will wait for nodepool to settle first | 21:09 |
*** dprince has quit IRC | 21:09 | |
jeblair | i think we're there... | 21:09 |
clarkb | yup I am checking build history now | 21:09 |
jeblair | clarkb: k; you can restart it at will | 21:10 |
jeblair | #status alert Restarting zuul, changes should be automatically re-enqueued | 21:10 |
openstackstatus | NOTICE: Restarting zuul, changes should be automatically re-enqueued | 21:10 |
*** ChanServ changes topic to "Restarting zuul, changes should be automatically re-enqueued" | 21:11 | |
clarkb | jeblair: build history is all empty. restarting jenkins now | 21:11 |
*** mrodden has joined #openstack-infra | 21:12 | |
jeblair | clarkb: ready for me to start zuul? | 21:13 |
clarkb | jeblair: ya, jenkins is back up | 21:13 |
jeblair | zuul is up; i've started the reverifies and rechecks (with a 30s delay as earlier) | 21:14 |
jeblair | though perhaps i should have done 60s, knowing what we know about git.o.o now | 21:14 |
clarkb | I just ran into the cannot fetch idx thing cloning from the 15g test node on the 30g test node... | 21:15 |
clarkb | this was over ipv6 | 21:15 |
clarkb | through haproxy | 21:15 |
*** cody-somerville has joined #openstack-infra | 21:15 | |
clarkb | are we not able to pack up all of the refs before the http timeout? | 21:15 |
jeblair | i have not seen that in isolated testing | 21:16 |
clarkb | you know, I wonder if the centos git slowness has anything to do with ipv6 | 21:16 |
*** dina_belova has quit IRC | 21:16 | |
clarkb | because it is being really slow too | 21:16 |
*** eharney has joined #openstack-infra | 21:16 | |
clarkb | cloning with git:// over ipv6 worked fine | 21:17 |
jeblair | i'm going to switch to zuul hacking to try to squash this bug before the next time we have to restart it | 21:17 |
clarkb | ok | 21:17 |
clarkb | pleia2: are you around? | 21:17 |
clarkb | pleia2: any chance you can try and corroborate that git cloning on centos is slow when using ipv6 but not when using ipv4? | 21:18 |
pleia2 | clarkb: hey | 21:18 |
clarkb | pleia2: cloning against review.o.o should be sufficient to test that | 21:18 |
pleia2 | clarkb: ok, will do | 21:18 |
*** gordc has joined #openstack-infra | 21:19 | |
clarkb | thank you | 21:20 |
clarkb | pleia2: fwiw it is consistent on these test boxes | 21:20 |
clarkb | I am testing ipv4 again to make sure it isn't some other external thing being weird | 21:21 |
pleia2 | clarkb: wait, running git clone *on* centos or to a git server on centos? | 21:23 |
*** dkranz has quit IRC | 21:23 | |
clarkb | pleia2: git clone on centos | 21:24 |
clarkb | pleia2: as our centos slaves are slow cloning from review.o.o | 21:24 |
pleia2 | clarkb: only have an hpcloud account, no ipv6 | 21:24 |
clarkb | oh | 21:24 |
*** pabelanger has quit IRC | 21:24 | |
clarkb | I am seeing the same slowness against ipv4 now. I am going to test cloning from my local box now | 21:24 |
pleia2 | I have several hosts that do have ipv6, but all debian and ubuntu | 21:25 |
*** reed has joined #openstack-infra | 21:25 | |
clarkb | wow this is so weird. On the rax test centos box ipv4 clone timed out too then did the cannot find idx pack file thing | 21:26 |
clarkb | but I run the same clone on my local precise box and clone all of nova in ~45 seconds | 21:27 |
clarkb | git:// works just fine on centos though | 21:27 |
*** xBsd has quit IRC | 21:28 | |
jeblair | clarkb: remember that i was able to do 6 simultaneous clones over v4 https to the 8g box | 21:28 |
jeblair | (without error) | 21:28 |
clarkb | I am going to bypass haproxy now to see if that is tickling the issue | 21:28 |
clarkb | jeblair: were you running the clones on centos? | 21:28 |
jeblair | clarkb: no, on precise | 21:29 |
clarkb | jeblair: I think this is the centos slowness remanifesting itself | 21:29 |
jeblair | interesting | 21:29 |
clarkb | because my precise box is fine | 21:29 |
jeblair | clarkb: er, is the issue that centos git does not speak the smart http protocol? | 21:30 |
clarkb | jeblair: no, centos git does speak smart http protocol | 21:30 |
*** fbo_away is now known as fbo | 21:30 | |
clarkb | it is 1.7.1 iirc and smart http went in 1.6.something | 21:30 |
clarkb | I will double check that though | 21:30 |
jeblair | maybe it doesn't speak it well. | 21:30 |
clarkb | could be | 21:30 |
*** dkranz has joined #openstack-infra | 21:31 | |
*** vipul is now known as vipul-away | 21:32 | |
pleia2 | it takes over 2 minutes to clone nova over http from review.o.o in a couple places I tested (a debian linode - ipv4&6 and centos hpcloud ipv4) | 21:34 |
clarkb | jeblair: I can clone directly over ipv4 to apache. It is slow, but it works. I think haproxy must be amplifying some latency | 21:35 |
clarkb | I am trying to test with ipv6 but our iptables puppet stuff doesn't work correctly on centos for ipv6 | 21:35 |
*** lbragstad has joined #openstack-infra | 21:35 | |
*** vipul-away is now known as vipul | 21:35 | |
*** mriedem has quit IRC | 21:37 | |
*** danger_fo is now known as danger_fo_away | 21:38 | |
*** dkranz has quit IRC | 21:43 | |
*** SergeyLukjanov has quit IRC | 21:45 | |
*** lcestari has quit IRC | 21:47 | |
clarkb | direct ipv6 is also slow but works eventually | 21:48 |
clarkb | pleia2: where are those newer versions of git? I am half tempted to try one of them to see if the slowness goes away | 21:48 |
pleia2 | clarkb: debian is 1.7.10 | 21:49 |
clarkb | pleia2: the ones you found for centos | 21:49 |
pleia2 | clarkb: ah, newest one for centos is 1.7.11 | 21:49 |
anteaya | I'm on 1.8.1.2 - ubuntu quantal | 21:49 |
*** changbl has quit IRC | 21:49 | |
jeblair | clarkb: does it matter? i mean, is that the way we're going to solve this? | 21:49 |
anteaya | not sure if that helps or creates jealousy | 21:49 |
pleia2 | git-daemon is a separate package though, realized I'd need to find a package for that too if it's not included | 21:50 |
jeblair | (i mean, maybe it'll tell you something, but if the end result of this is 'it might work if we upgrade all the slaves' i think we're digging a bigger hole) | 21:51 |
*** fbo is now known as fbo_away | 21:52 | |
* anteaya heads out for a walk | 21:54 | |
*** dkranz has joined #openstack-infra | 21:55 | |
pleia2 | oh, http://repoforge.org/ is the place for them though | 21:55 |
jeblair | clarkb: ^ | 21:55 |
pleia2 | (I tend to agree with jeblair though) | 21:55 |
pleia2 | usage details on centos6: http://wiki.centos.org/AdditionalResources/Repositories/RPMForge#head-f0c3ecee3dbb407e4eed79a56ec0ae92d1398e01 | 21:56 |
*** linggao has quit IRC | 21:56 | |
*** dkliban has quit IRC | 21:58 | |
*** ^d has quit IRC | 21:59 | |
*** hashar has joined #openstack-infra | 22:00 | |
*** hashar has left #openstack-infra | 22:00 | |
clarkb | jeblair: I agree in general too. I am scanning git release notes; there are a few things that pop out as possibly being the cause | 22:04 |
*** burt has quit IRC | 22:04 | |
*** mrodden has quit IRC | 22:04 | |
*** ryanpetrello has quit IRC | 22:05 | |
*** dkranz has quit IRC | 22:06 | |
clarkb | jeblair: pleia2 the two items with HTTP in them at https://git.kernel.org/cgit/git/git.git/tree/Documentation/RelNotes/1.7.5.txt seem like possible culprits | 22:09 |
*** _TheDodd_ has quit IRC | 22:11 | |
*** dmakogon_ has left #openstack-infra | 22:15 | |
pleia2 | clarkb: first seems like it would be trivial for big clones, not so sure about 2nd, they mention many tags but not big repo (nova does have a fair number of tags, I don't know how many "many" is) | 22:15 |
clarkb | pleia2: neither do I. I am tcpdumping now and will have to look at this closer | 22:15 |
jeblair | i don't expect the second to affect a clone | 22:15 |
clarkb | because this will need to be sorted before we can point any of the centos slaves at haproxy | 22:16 |
*** dina_belova has joined #openstack-infra | 22:17 | |
jeblair | i think zuul is stuck again; but i haven't been able to repro the problem locally yet | 22:19 |
*** dina_belova has quit IRC | 22:21 | |
*** mrodden has joined #openstack-infra | 22:23 | |
jeblair | i'm going to restart it with some cowboy logging | 22:26 |
*** dina_belova has joined #openstack-infra | 22:27 | |
*** dina_belova has quit IRC | 22:32 | |
clarkb | ok | 22:33 |
*** prad_ has quit IRC | 22:33 | |
*** rcleere has quit IRC | 22:37 | |
*** sarob has joined #openstack-infra | 22:38 | |
clarkb | comparing tcpdump taken locally and tcpdump on centos, the centos client ends up with a window size of 0 frequently, which does not happen locally | 22:41 |
clarkb | I think the client is unable to accept more data for some reason | 22:42 |
jeblair | is that with haproxy in both cases? | 22:43 |
clarkb | jeblair: without in the centos case. with locally I should retry locally without haproxy | 22:45 |
clarkb | I should learn to use punctuation too | 22:46 |
anteaya | back | 22:46 |
*** ftcjeff has quit IRC | 22:51 | |
clarkb | it is related to https somehow | 22:55 |
clarkb | I gave the http vhost the same git stuff as the https vhost and it is much faster | 22:55 |
jeblair | clarkb: haproxy? | 22:55 |
*** dims has quit IRC | 22:58 | |
clarkb | jeblair: https is slow both through haproxy and not through haproxy, but worse through haproxy, so definitely possible | 23:01 |
clarkb | http seemed to be much faster through both | 23:01 |
*** woodspa has quit IRC | 23:01 | |
jeblair | clarkb: wasn't suggesting a cause, just trying to understand the variables in your experiment | 23:01 |
clarkb | http + haproxy = fast, http = fast, https = slow but does not fail, https + haproxy = even slower and causes git clone to fail | 23:05 |
*** rnirmal has quit IRC | 23:05 | |
clarkb | and git clone fails because it cannot get the idx files. Looks like the same issue as before where it falls back on non-smart http and the lack of .git in the dir name breaks it | 23:06 |
*** senk has quit IRC | 23:08 | |
jeblair | right, so that first connection fails | 23:08 |
jeblair | on the refs front, i believe the refs on git.o.o are as packed as they are going to be; the items in refs/ are all _directories_ (up to the last component of the path); the refs themselves are in a packed-refs file. | 23:09 |
*** dims has joined #openstack-infra | 23:11 | |
*** mberwanger has joined #openstack-infra | 23:13 | |
mgagne | clarkb: cgit config remove-suffix = 1. This will allow repositories on the filesystem to have a .git suffix but still show without it in the interface and in generated URLs. | 23:15 |
jeblair | mgagne: i don't think cgit was the problem | 23:15 |
jeblair | mgagne: the problem was that the rewrite rules needed to support the dumb http protocol don't work | 23:16 |
jeblair | mgagne: but since we're never supposed to use the dumb http protocol, we're not worrying about fixing it for now (instead, making it reliable enough that the smart http protocol doesn't fail) | 23:17 |
mgagne | jeblair: that's what I'm trying to find out and stumble on this config | 23:17 |
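The cgit option mgagne points at is a one-line cgitrc directive (shown here as a sketch of the global section, not the deployed configuration):

    # /etc/cgitrc
    remove-suffix=1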
*** jjmb has joined #openstack-infra | 23:18 | |
*** jhesketh has quit IRC | 23:18 | |
*** jhesketh has joined #openstack-infra | 23:20 | |
*** mberwanger has quit IRC | 23:21 | |
* HenryG wonders if jeblair has created a script to automate "recheck|reverify no bug" submissions. :D | 23:24 |
*** datsun180b has quit IRC | 23:27 | |
*** dina_belova has joined #openstack-infra | 23:28 | |
*** pabelanger has joined #openstack-infra | 23:28 | |
*** eharney has quit IRC | 23:30 | |
sarob | whats the pipeline ETA? | 23:31 |
Alex_Gaynor | hmm, so there's stuff that has all its builds complete, but is still hanging out at the top of the pipeline, is that because of git? | 23:32 |
*** dina_belova has quit IRC | 23:32 | |
*** gordc has quit IRC | 23:33 | |
anteaya | sarob: no ETA | 23:35 |
anteaya | if it makes it through we are happy | 23:35 |
anteaya | we have had to restart zuul three times today | 23:35 |
anteaya | not our best day | 23:35 |
jeblair | Alex_Gaynor: no, i believe we are reliably reproducing the zuul bug now | 23:37 |
Alex_Gaynor | jeblair: ah! | 23:37 |
anteaya | jeblair: yay, did you cowboy logging result in more bug information? | 23:38 |
anteaya | s/you/your | 23:38 |
*** jjmb has quit IRC | 23:38 | |
jeblair | anteaya: a bit too much, i'm afraid | 23:38 |
jeblair | i've stopped zuul again | 23:39 |
anteaya | :( | 23:39 |
jeblair | it will take me a few mins to process this and figure out what's going on | 23:40 |
*** jcooley has quit IRC | 23:41 | |
anteaya | k | 23:42 |
anteaya | well that seems like time well spent because this zuul bug keeps showing up | 23:43 |
*** jcooley has joined #openstack-infra | 23:44 | |
*** zul has quit IRC | 23:47 | |
sarob | no sweat guys. you do an awesome job of keeping stuff humming | 23:52 |
anteaya | thanks sarob, jeblair may have found our elusive zuul bug | 23:52 |
sarob | sweet. good luck. | 23:52 |
anteaya | so hopefully we can keep zuul up longer on the next restart | 23:53 |
anteaya | thanks | 23:53 |
anteaya | :D | 23:53 |
*** zul has joined #openstack-infra | 23:53 |