*** NikitaKonovalov_ has quit IRC | 00:00 | |
fungi | right. i'm deleting nodes associated with each master as i restart it, because i can't be assured they were properly marked offline | 00:00 |
---|---|---|
jeblair | fungi: also, it takes a _really_ long time to retrieve 40k keys | 00:00 |
jeblair | so this is a memory and performance problem | 00:00 |
lifeless | jeblair: 40k keys? | 00:00 |
lifeless | jeblair: unique per image? | 00:00 |
lifeless | jeblair: erm, node ? | 00:01 |
jeblair | lifeless: overall in the account | 00:01 |
*** tjones has quit IRC | 00:01 | |
jeblair | for that region | 00:01 |
*** NikitaKonovalov_ has joined #openstack-infra | 00:01 | |
*** NikitaKonovalov_ is now known as NikitaKonovalov | 00:01 | |
jeblair | lifeless: oh, yes, i think that's per node | 00:01 |
jeblair | lifeless: unique per node | 00:01 |
lifeless | hmm, if so we should make it per image | 00:02 |
*** rfolco has joined #openstack-infra | 00:02 | |
jeblair | lifeless: actually, i think it could be per-provider | 00:02 |
jeblair | lifeless: it's only used to bootstrap the image creation | 00:03 |
lifeless | its per image | 00:03 |
lifeless | updateImage ... manager.addKeypair | 00:03 |
jog0 | mordred: https://bitbucket.org/hpk42/tox/issue/116/new-pypi-override-breaks-people-who | 00:03 |
jeblair | lifeless: yeah, that makes sense. probably got so many due to image creation loops | 00:03 |
lifeless | jeblair: but making it per provider would avoid running into provider quotas when lots of images are in play | 00:03 |
lifeless | jeblair: and avoid this issue entirely | 00:04 |
jeblair | lifeless: yep. and less work for nodepool overall | 00:04 |
fungi | okay, jenkins01 is definitely getting lots of nodes now | 00:04 |
*** CaptTofu has joined #openstack-infra | 00:05 | |
jog0 | mordred: ahh I have tox 1.6 | 00:06 |
jog0 | mordred: I am always scared at the bugs you find in python dev workflows | 00:06 |
jog0 | mordred: tox 1.6.1 works \o/ | 00:07 |
* jeblair deletes keypairs | 00:07 | |
*** tjones has joined #openstack-infra | 00:07 | |
*** changbl has quit IRC | 00:07 | |
*** gokrokve_ has quit IRC | 00:07 | |
*** gokrokve has joined #openstack-infra | 00:08 | |
*** jhesketh__ has joined #openstack-infra | 00:09 | |
*** jhesketh__ has quit IRC | 00:09 | |
*** jhesketh has joined #openstack-infra | 00:09 | |
*** jhesketh__ has joined #openstack-infra | 00:09 | |
fungi | jenkins03 is up and running again | 00:09 |
jeblair | does anyone know if you can bulk-delete keypairs? | 00:11 |
jeblair | the nova api docs don't look promising in this regard... | 00:11 |
jog0 | jeblair: AFAIK I don't think you can | 00:11 |
*** dims has quit IRC | 00:11 | |
*** tjones has quit IRC | 00:12 | |
*** gokrokve has quit IRC | 00:12 | |
mordred | jeblair: I do forloops | 00:12 |
mordred | sadly | 00:12 |
mordred | jeblair: I support changing where keypairs happen, btw | 00:13 |
jeblair | mordred: yeah, that's probably faster than asking hpcloud for a new account. but barely. it could take ~10 hours | 00:13 |
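The cleanup being discussed is just a serial loop over the account's keypair list, since the compute API has no bulk delete. A minimal sketch of that approach, assuming the python-novaclient of that era; the credentials and auth URL are placeholders, and at the roughly one-delete-per-second pace these estimates imply, 40k keys does work out to around 10-11 hours:

```python
# Serial keypair cleanup -- one API call per key, no bulk delete available.
# Credentials and auth URL below are placeholders, not infra's real values.
from novaclient import client

nova = client.Client("2", "USERNAME", "API_KEY", "TENANT_NAME",
                     "https://identity.example.com/v2.0/")

for kp in nova.keypairs.list():
    # In practice you would filter on whatever naming scheme nodepool uses
    # for its generated keys before deleting anything.
    try:
        nova.keypairs.delete(kp)
    except Exception as exc:
        print("failed to delete %s: %s" % (kp.name, exc))
```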
stevebaker | hey, it looks like the tarballs job is having an issue in my heatclient release https://jenkins06.openstack.org/job/python-heatclient-tarball/11/console | 00:13 |
stevebaker | Connecting to tarballs.openstack.org | 00:13 |
stevebaker | 2014-02-19 00:08:47.995 | ERROR: Failed to upload files | 00:13 |
*** prad has quit IRC | 00:14 | |
jeblair | mordred: any chance of increasing the rate limits for our hpcloud account? | 00:15 |
fungi | jenkins01 finally shows a nodepool node in its webui | 00:17 |
fungi | two | 00:17 |
fungi | they're running jobs | 00:17 |
fungi | this is a good sign | 00:17 |
fungi | stevebaker: https://jenkins01.openstack.org/job/python-heatclient-tarball/2/console | 00:18 |
*** prad has joined #openstack-infra | 00:18 | |
fungi | worked | 00:18 |
fungi | " | 00:18 |
fungi | "Offline due to Gearman request" | 00:19 |
stevebaker | fungi: yay | 00:19 |
fungi | for the corresponding node which ran it too | 00:19 |
fungi | so i think we're on the right track now | 00:19 |
jeblair | fungi: awesome | 00:19 |
mordred | jeblair: probably - I could also see if they can bulk-delete keypairs behind the scenes | 00:19 |
*** sarob has quit IRC | 00:19 | |
jeblair | mordred: both of those would be helpful (the rate limit thing is helpful even aside from this) | 00:20 |
*** sarob has joined #openstack-infra | 00:20 | |
*** rcleere has quit IRC | 00:22 | |
*** matsuhashi has joined #openstack-infra | 00:22 | |
jeblair | az2 only has 22k. az3 has 48k. | 00:23 |
*** cadenzajon_ has quit IRC | 00:23 | |
mordred | jeblair: asking | 00:24 |
jeblair | that's 13 hours to delete | 00:24 |
*** sarob has quit IRC | 00:24 | |
*** yamahata has joined #openstack-infra | 00:25 | |
jeblair | mordred: and in case they can: it's okay to delete all keypairs in all regions from the account | 00:25 |
*** miguelzuniga has quit IRC | 00:26 | |
*** mgagne has quit IRC | 00:26 | |
*** ryanpetrello has joined #openstack-infra | 00:26 | |
*** dims has joined #openstack-infra | 00:27 | |
*** sandywalsh has quit IRC | 00:27 | |
fungi | jenkins05 is back up | 00:28 |
*** banix has quit IRC | 00:29 | |
*** talluri has joined #openstack-infra | 00:30 | |
*** hogepodge has quit IRC | 00:30 | |
*** nati_ueno has joined #openstack-infra | 00:32 | |
*** matsuhashi has quit IRC | 00:32 | |
mordred | jeblair: I have put in a few questions - the support team does not have a bulk-delete option, but they pointed me to the nova team, and I'm asking them | 00:34 |
*** talluri has quit IRC | 00:34 | |
*** matsuhas_ has joined #openstack-infra | 00:34 | |
* clarkb is back | 00:34 | |
mordred | jeblair: I have not yet asked about rate limits - I'll need to file a ticket for that | 00:34 |
fungi | yay clarkb! | 00:34 |
lifeless | 'phil, please delete ma stuff'! | 00:34 |
fungi | (so you don't have to read scrollback, just note that we're breaking everything) | 00:35 |
*** eharney has quit IRC | 00:35 | |
*** nati_uen_ has quit IRC | 00:35 | |
clarkb | now I want to read sb | 00:36 |
fungi | clarkb: main current issues are dns resolution broken from review.o.o querying rackspace recursive resolvers in dfw (worked around by pointing at iad), nodepool memory leak appears to be related to nearly 100 thousand crufty keypairs in hpcloud, and jenkins 1.511 changed the offline api call | 00:36 |
clarkb | fungi: wow re keypairs | 00:37 |
fungi | jeblair's deleting keypairs, i'm downgrading jenkinses to lts | 00:37 |
jog0 | fungi jeblair: can you file a bug with nov about bulk keypair | 00:37 |
clarkb | fungi: are we upgrading zmq plugin when jenkinses are downgraded? | 00:37 |
clarkb | also is the bug in jenkins or nodepool? | 00:37 |
clarkb | and why is it only biting us now? | 00:38 |
fungi | clarkb: i already upgraded the zmq plugin earlier when i upgraded 1.511 | 00:38 |
jeblair | clarkb: jenkins changed something about the internal offline node api that gearman-plugin uses | 00:38 |
fungi | downgrading now to 1.532.2 (lts) which seems to solve current concerns | 00:38 |
anteaya | I think crufty keypairs would be a great username | 00:38 |
jeblair | clarkb: so we need to (later) update gearman-plugin to fix that | 00:38 |
anteaya | like nifty lettuce | 00:39 |
clarkb | fungi: jeblair: wait I am confused if lts is 1.532 how does it help to downgrade to it if 1.511 introduced the problem? | 00:39 |
fungi | 1.551 | 00:40 |
fungi | i mistyped | 00:40 |
clarkb | ah ok it makes a lot more sense now thanks | 00:40 |
fungi | earlier i upgraded from 1.525/1.543 to 1.551, now i'm downgrading to 1.532.2 | 00:40 |
*** dangers is now known as dangers_away | 00:41 | |
fungi | which supposedly also has the same security fixes backported to it | 00:41 |
clarkb | note that that lts version may have a different offline node bug | 00:41 |
jeblair | jog0: https://bugs.launchpad.net/nova/+bug/1281853 | 00:41 |
clarkb | the one that we are trying to work around with single use nodes | 00:41 |
uvirtbot | Launchpad bug 1281853 in nova "Add method to bulk delete keypairs" [Undecided,New] | 00:41 |
*** sabari has quit IRC | 00:41 | |
fungi | ooh! uvirtbot came back too while i wasn't looking, huh? | 00:41 |
*** yamahata has quit IRC | 00:42 | |
*** yamahata has joined #openstack-infra | 00:43 | |
jog0 | jeblair: thanks | 00:44 |
fungi | okay, jenkins07 is online again | 00:44 |
clarkb | jeblair: fungi: ok I think I grok the current state of fun. ANything I can jump onto to help? | 00:44 |
clarkb | looks like DNS is better now courtesy of google | 00:44 |
jog0 | jeblair: do you want to be able to delete all keypairs? | 00:44 |
clarkb | and jenkinses are being downgraded | 00:44 |
jeblair | clarkb: no we switched to iad dns | 00:44 |
clarkb | ah iad dns | 00:44 |
jeblair | clarkb: do you think we will have a problem with the lts release? | 00:45 |
fungi | i'm going to start in on the even numbered masters, but more slowly while the odd numbered masters get more nodes assigned | 00:45 |
fungi | since nodepool is on a go-slow | 00:45 |
clarkb | jeblair: let me dig into that more, my hunch is single use nodes will mitigate it if so | 00:45 |
jeblair | jog0: well, at this moment, yes. but in general being able to provide a list of things to delete would be nice | 00:45 |
jog0 | jeblair: makes sense although listing 10k things in a single request seems excessive | 00:47 |
fungi | jog0: xargs man, xargs | 00:47 |
jeblair | jog0: everything about openstack-infra is excessive. haven't you noticed? ;) | 00:48 |
jog0 | jeblair: :) | 00:48 |
fungi | or being able to go in a for loop and delete 10 keys per call would at least speed up the situation by a factor of 10 | 00:48 |
jog0 | I have | 00:48 |
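fungi's xargs remark amounts to fanning those deletes out in parallel. A hedged sketch of the same idea, reusing the illustrative `nova` client from the sketch above; whether one client object can safely be shared across threads, and how many workers the provider's rate limits tolerate, are both assumptions here:

```python
# Fan the deletes out across a few worker threads, xargs-style.
from concurrent.futures import ThreadPoolExecutor

def delete_keypair(kp):
    try:
        nova.keypairs.delete(kp)
    except Exception as exc:
        print("failed to delete %s: %s" % (kp.name, exc))

with ThreadPoolExecutor(max_workers=10) as pool:
    # ten workers is arbitrary; tune it against the provider's rate limits
    list(pool.map(delete_keypair, nova.keypairs.list()))
```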
geekinutah | so sad, success but still requeued :-( https://jenkins06.openstack.org/job/gate-nova-python27/847/console | 00:48 |
*** tjones has joined #openstack-infra | 00:48 | |
fungi | geekinutah: we're continuing to downgrade jenkins masters | 00:48 |
geekinutah | yeah, I've been watching pass on downgraded guys | 00:49 |
fungi | geekinutah: i've got about half of them done, and am getting started shutting the other half down, so it should be fixed up soonish | 00:49 |
*** rfolco has quit IRC | 00:49 | |
jeblair | geekinutah: as fungi works through the jenkins downgrade, your chances of completion are going up! :) | 00:50 |
clarkb | https://issues.jenkins-ci.org/browse/JENKINS-19453 is the upstream bug | 00:50 |
*** sarob has joined #openstack-infra | 00:50 | |
*** wenlock has quit IRC | 00:50 | |
clarkb | sorting out if that made it into the lts | 00:51 |
clarkb | looks like it may have been backported | 00:51 |
geekinutah | fungi, jeblair: don't mind me, you guys are doing great, really appreciate it | 00:51 |
clarkb | jeblair: fungi: the fix for 19453 was backported into stable and is in 1.532.2's log | 00:54 |
clarkb | we should be fine | 00:54 |
fungi | clarkb: all's the better. thanks for checking! | 00:54 |
*** geekinutah has left #openstack-infra | 00:54 | |
openstackgerrit | Derek Higgins proposed a change to openstack-infra/nodepool: Add fedora support https://review.openstack.org/74529 | 00:56 |
openstackgerrit | Derek Higgins proposed a change to openstack-infra/nodepool: Catch key problems in ssh_connect https://review.openstack.org/74528 | 00:56 |
*** david-lyle has quit IRC | 00:57 | |
openstackgerrit | James E. Blair proposed a change to openstack-infra/nodepool: Make jenkins get info task synchronous https://review.openstack.org/74545 | 00:57 |
clarkb | fungi: do any more jenkinses need downgrading? | 00:58 |
clarkb | I can hand hold some of those if it helps | 00:58 |
jeblair | clarkb, fungi: ^ maybe let's merge that soon and i think that will reduce the 40 minute main-loop cycle in nodepool | 00:58 |
clarkb | jeblair: rgr will review | 00:58 |
mordred | derekh: re: your key problems patch - what happens if the node never comes online/ | 00:59 |
mordred | ? | 00:59 |
fungi | clarkb: not really. it's mostly just stretching the process out so that i don't completely starve us, but i'm shutting down the other evens here momentarily | 00:59 |
mordred | jeblair: lookin | 00:59 |
clarkb | fungi: ok | 00:59 |
clarkb | jeblair: that change is nice and small +2 | 01:00 |
clarkb | jeblair: is there any concern that there are mixed async and sync calls? | 01:00 |
derekh | mordred: it should timeout like it always did, the exception I'm catching gets thrown if ssh comes up but the key doesn't work | 01:00 |
fungi | okay, 2/4/6 are in shutdown now, and 02 is close to me being able to downgrade it. i'm evacuating all its 100+ ready nodes so that the good masters will start to pick up steam | 01:00 |
mordred | ok. the commit message said something about continuing to try - I just wanted to make sure we weren't introducing a possibly endless loop | 01:01 |
anteaya | like we are in now | 01:01 |
mordred | derekh: yup. duh. I read it properly now. thanks | 01:01 |
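A rough illustration of the behaviour derekh describes: keep retrying ssh until the launch timeout, and treat an authentication failure (sshd answering but the key not installed yet) the same as the host not being up at all, so the overall deadline still bounds the loop. This assumes paramiko and the function shape is illustrative, not nodepool's actual ssh_connect:

```python
import socket
import time

import paramiko

def ssh_connect(ip, username, key_file, timeout=300):
    deadline = time.time() + timeout
    while time.time() < deadline:
        client = paramiko.SSHClient()
        client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        try:
            client.connect(ip, username=username,
                           key_filename=key_file, timeout=10)
            return client
        except (socket.error, paramiko.AuthenticationException,
                paramiko.SSHException):
            # "key doesn't work yet" is treated like "not booted yet";
            # the deadline above still prevents an endless loop.
            time.sleep(5)
    raise Exception("timed out waiting for ssh on %s" % ip)
```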
jeblair | clarkb: i don't think so; it should just be some simple urllib2 calls; i think the jenkins object is thread safe | 01:02 |
jeblair | clarkb: yeah, it just stores some strings and that's it | 01:02 |
clarkb | jeblair: great | 01:03 |
*** dcramer__ has joined #openstack-infra | 01:03 | |
*** mdenny has quit IRC | 01:04 | |
*** russellb has quit IRC | 01:05 | |
mordred | jeblair: while I'm reviewing that one, I'm reviewing the other nodepool changes that are up - there's one from BobBall that looks very safe and has 2 +2s already (it just adds matching for image hex strings) | 01:05 |
mordred | should I avoid landing extra things on principle? | 01:06 |
jeblair | mordred: should be ok | 01:06 |
clarkb | yeah BobBalls change is pretty safe iirc | 01:06 |
mordred | jeblair: k. (I've been heads-down in nodepool today, so I also feel fairly competent on what it's doing) | 01:06 |
jeblair | i have manually installed nodepool with my change on nodepool.o.o | 01:07 |
jeblair | (because it's going to be forever before it actually merges) | 01:07 |
jeblair | fungi: what's the current jenkins state? i'm trying to figure out when it would be best to restart np | 01:08 |
jeblair | fungi: (not only to pick that up, but also because it's about time to free memory) | 01:08 |
fungi | jeblair: jenkins01,3,5,7 are online but none have nodes assigned (well not entirely true, there are a few dozen in nodepool ready state on 01 but not showing in the webui yet) | 01:09 |
fungi | i'm nodepool deleting ready nodes from the even masters while they finish up their remaining jobs | 01:09 |
fungi | in hopes nodepool will soon start adding fresh nodes to the active masters | 01:10 |
*** markmcclain has quit IRC | 01:10 | |
jeblair | fungi: the evens are in shutdown mode? | 01:10 |
fungi | jeblair: yes | 01:10 |
jeblair | fungi: now might be the best time to restart then | 01:10 |
fungi | works for me | 01:11 |
jeblair | fungi: i think it may have oomed while we were talking about it | 01:13 |
clarkb | jeblair: did we identify why keypairs are leaking? and maybe we should switch to using a specific keypair instead? | 01:14 |
*** ryanpetrello has quit IRC | 01:14 | |
*** tjones has quit IRC | 01:14 | |
*** tjones has joined #openstack-infra | 01:14 | |
jeblair | clarkb: my guess is they leaked during image creation loops. and yes, i think we should have one keypair per provider. | 01:14 |
*** atiwari has quit IRC | 01:15 | |
*** tjones has quit IRC | 01:16 | |
openstackgerrit | A change was merged to openstack-infra/config: Add single-use py3k-precise nodes https://review.openstack.org/73846 | 01:17 |
clarkb | jeblair: should be ok to merge change to nodepools config yaml too? | 01:18 |
jeblair | a change merged! | 01:19 |
jeblair | clarkb: yeah | 01:19 |
clarkb | oh gah, I really need to figure out why gerrit doesn't show my commit message first | 01:20 |
jeblair | it looks like the nodepool main loop now runs every ~13 seconds | 01:20 |
jeblair | so it should be much less spiky now | 01:20 |
clarkb | I think it happens when I jump to different changes via the dependency links | 01:20 |
fungi | boy howdy | 01:20 |
fungi | and the good masters are running mucho jobs now | 01:21 |
*** jergerber has joined #openstack-infra | 01:21 | |
*** nati_uen_ has joined #openstack-infra | 01:21 | |
mordred | jeblair, clarkb: I have locally observed keypairs leaking - best I can tell, if an image fails at creation, one is left with a keypair | 01:21 |
clarkb | mordred: so I think we should just use a single keypair per provider and call it good | 01:21 |
openstackgerrit | A change was merged to openstack-infra/config: Fix Climate jobs https://review.openstack.org/71317 | 01:21 |
clarkb | which jeblair agrees with | 01:21 |
*** tjones has joined #openstack-infra | 01:22 | |
mordred | clarkb: yup | 01:22 |
jeblair | clarkb: i assume that means nodepool will need to create it and stash the private half locally in /var. shouldn't be a big deal though. | 01:22 |
clarkb | jeblair: correct | 01:23 |
*** sarob has quit IRC | 01:23 | |
clarkb | jeblair: ideally it will store both halves :) you only put one half on zuul-dev which meant I had to dig in DBs for the public half which is no fun :) | 01:23 |
*** nati_ueno has quit IRC | 01:23 | |
mordred | clarkb: you can construct a public key from a private one | 01:23 |
mordred | clarkb: I always have to go re-learn the command though | 01:23 |
*** mestery has quit IRC | 01:24 | |
clarkb | mordred: oh are both in the encrypted file? | 01:24 |
jeblair | clarkb: actually, nodepool really only has to store the public half, come to think of it. | 01:24 |
*** talluri has joined #openstack-infra | 01:24 | |
clarkb | jeblair: it sshs which needs the private side right? | 01:24 |
*** banix has joined #openstack-infra | 01:24 | |
*** derekh has quit IRC | 01:24 | |
jeblair | clarkb: right, it only needs the private half. :) | 01:24 |
*** harlowja_away has quit IRC | 01:25 | |
clarkb | mordred: I don't know why I never knew that, I guess I assumed that they were distinct (you can't get one from the other with maths) | 01:25 |
*** tjones has quit IRC | 01:25 | |
fungi | jenkins02 is downgraded and back online now | 01:25 |
mordred | clarkb: you can go in one direction, just not the other | 01:26 |
clarkb | mordred: right but only because the public key is in the private key file | 01:26 |
clarkb | not due to maths | 01:26 |
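The per-provider scheme being agreed on here would look roughly like the sketch below: create one long-lived keypair per provider, keep the private half on disk, and remember that the public half can always be regenerated from it (`ssh-keygen -y -f <keyfile>` is the command mordred keeps having to re-learn). The `nova` client object, key name, and paths are illustrative, not what nodepool actually ended up doing:

```python
import os

KEY_NAME = "nodepool-provider-key"            # illustrative name
KEY_PATH = "/var/lib/nodepool/%s.pem" % KEY_NAME

# With no public key supplied, nova generates the pair and returns both
# halves; strictly, only the private half is needed for ssh later.
kp = nova.keypairs.create(KEY_NAME)
with open(KEY_PATH, "w") as f:
    f.write(kp.private_key)
os.chmod(KEY_PATH, 0o600)
with open(KEY_PATH + ".pub", "w") as f:
    f.write(kp.public_key)
```

One nice side effect of a single long-lived key per provider: a failed image build no longer leaves a stray keypair behind, which is the leak being cleaned up above.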
*** mrodden has quit IRC | 01:26 | |
jeblair | there were 138 nodes in the building state while np was stopped; i'm deleting them now. | 01:26 |
anteaya | look at all that yellow and green in the graph | 01:26 |
anteaya | that would get rid of the yellow I guess | 01:27 |
anteaya | I wonder how much of the green is actual usable available nodes | 01:27 |
clarkb | anteaya: I think very little of it due to jeblair's fire and brimstone approach | 01:28 |
*** tjones has joined #openstack-infra | 01:28 | |
anteaya | k | 01:28 |
jeblair | i think nodes that have been ready for >1h are suspicious and should be deleted | 01:28 |
anteaya | ah | 01:28 |
anteaya | goodbye nodes | 01:28 |
anteaya | take your crufty keys with you | 01:28 |
jeblair | that's another 101 nodes | 01:28 |
*** talluri has quit IRC | 01:28 | |
jeblair | though of course we're running into rate limits with so much going on | 01:29 |
fungi | jeblair: agreed. if they're on jenkins04 or 06 though they're explainable. i'm in the process of deleting them already | 01:29 |
anteaya | of course | 01:29 |
anteaya | a fire would be a fire without some throttling | 01:29 |
anteaya | wouldn't | 01:30 |
*** mestery has joined #openstack-infra | 01:30 | |
fungi | i've nearly got 04 cleared out. 06 still has a couple jobs running but they should be wrapped up by the time i get to it | 01:31 |
anteaya | they will just loop back round for another go on a different jenkins | 01:31 |
*** tjones has quit IRC | 01:32 | |
clarkb | fungi: out of curiosity is there a reason we limited bare-centos to rax? py3k-precise as well | 01:33 |
fungi | clarkb: i think because that's where they'd previously run | 01:34 |
fungi | and we maybe hadn't tested puppeting up hpcloud's base centos images? | 01:34 |
fungi | we can certainly add a change to spin up images in those too and see how they fare | 01:35 |
jeblair | yep | 01:35 |
*** balar has quit IRC | 01:35 | |
fungi | jenkins04 is back online now | 01:35 |
clarkb | fungi: cool, just checking that there wasn't a specific reason for that | 01:36 |
clarkb | like image didn't work or some such | 01:36 |
*** nosnos has joined #openstack-infra | 01:36 | |
fungi | ph33r of the unknown (and a black hat) | 01:36 |
anteaya | was jenkins 06 the last one to come down? | 01:36 |
fungi | anteaya: yes, i'm clearing it out now | 01:36 |
anteaya | k | 01:36 |
anteaya | some jobs just finished on 06 and have started up again on other nodes on a patch I am watching | 01:37 |
anteaya | I hope this is the last round | 01:37 |
clarkb | zaro: are you about? looks like a bug was fixed for the envinject thing. Did we chase that down? | 01:37 |
clarkb | zaro: or are we just calling it a derp and moving on? | 01:37 |
clarkb | zaro: the bug wasn't clear to me | 01:37 |
fungi | clarkb: for the zmq plugin tarball job? i merely retriggered the job and it worked the second time around | 01:38 |
clarkb | fungi: right, but zaro marked that bug fixed I think | 01:39 |
*** banix has quit IRC | 01:39 | |
fungi | ahh | 01:39 |
fungi | "fixed" | 01:40 |
*** banix has joined #openstack-infra | 01:41 | |
jeblair | the sparklines for both check and gate have a downtick | 01:41 |
*** mgagne has joined #openstack-infra | 01:41 | |
clarkb | jeblair: fungi: all jenkins are downgraded and all cruft nodepool nodes are in the process of being deleted? | 01:42 |
fungi | jenkins06 is online again now and downgraded | 01:42 |
*** tjones has joined #openstack-infra | 01:42 | |
clarkb | fungi: in other news the logstash build_master data is populated which is pretty awesome | 01:42 |
fungi | clarkb: i still have a couple nodepool delete loops taking their time, but only a handful of remaining nodes each between them | 01:42 |
clarkb | dims: ^ | 01:43 |
fungi | i'll try to check back in later and mass delete any nodepool nodes which have been in any state at all for >3 hours | 01:44 |
*** mgagne1 has joined #openstack-infra | 01:44 | |
fungi | just in case we miss a few | 01:44 |
clarkb | fungi: ping me then I may be around and can assist | 01:44 |
fungi | k | 01:44 |
jeblair | fungi: cool. i just started some deletes for nodes that have been in state for > 1hr | 01:44 |
openstackgerrit | K Jonathan Harker proposed a change to openstack-infra/config: Parameterize the status page urls https://review.openstack.org/74557 | 01:44 |
fungi | in amusing news, the zuul jph graph topped out at 3000 earlier :/ | 01:45 |
*** tjones has quit IRC | 01:45 | |
jeblair | fungi: yowza. | 01:45 |
fungi | i guess we road tested the new patches | 01:45 |
*** prad has quit IRC | 01:45 | |
jeblair | http://graphite.openstack.org/render/?from=-24hours&height=600&until=now&width=800&bgcolor=ffffff&fgcolor=000000&areaMode=stacked&target=color%28alias%28sumSeries%28stats.gauges.nodepool.target.*.*.*.building%29,%20%27Building%27%29,%20%27ffbf52%27%29&target=color%28alias%28sumSeries%28stats.gauges.nodepool.target.*.*.*.ready%29,%20%27Available%27%29,%20%2700c868%27%29&target=color%28alias%28sumSeries%28stats.gauges.nodepool.target.*.*.*.us | 01:45 |
fungi | i think because so many jobs were cycling and getting reset | 01:46 |
clarkb | fungi: cycling? meaning 72 hour timeout? | 01:46 |
*** mgagne has quit IRC | 01:46 | |
jeblair | oops, too long. | 01:46 |
dims | clarkb, nice! | 01:46 |
*** prad has joined #openstack-infra | 01:46 | |
fungi | clarkb: the jobs were getting nodes offlined out from under them and restarted over and over | 01:46 |
clarkb | fungi: oh right jenkins bug | 01:47 |
jeblair | clarkb: the jenkins/gearman-plugin bug manifested as a null-result build to zuul | 01:47 |
clarkb | or gearman plugin | 01:47 |
jeblair | clarkb: so the 'restart job on jenkins derp' logic kicked in and zuul has been restarting these jobs for several hours | 01:47 |
openstackgerrit | A change was merged to openstack-infra/nodepool: Make jenkins get info task synchronous https://review.openstack.org/74545 | 01:47 |
fungi | which is the main reason for the current pile-up | 01:47 |
openstackgerrit | A change was merged to openstack-infra/nodepool: Allow useage of server IDs as well as names. https://review.openstack.org/69424 | 01:47 |
jeblair | clarkb: so the nice thing is that they aren't reporting negative results | 01:48 |
*** melwitt1 has quit IRC | 01:48 | |
clarkb | as things settle any chance I can get another core review on https://review.openstack.org/#/c/72509/1 | 01:48 |
clarkb | jeblair: ya that would make the fun right now a bit more chaotic | 01:48 |
jeblair | clarkb: you don't want the random sleep? | 01:48 |
clarkb | jeblair: I don't think it is necessary as the jobs are being split apart by 20 minutes already | 01:49 |
jeblair | k. should be fine as long as other things aren't hitting it at those times | 01:49 |
*** yaguang has joined #openstack-infra | 01:50 | |
fungi | jenkins06 is getting nodepool nodes and running jobs now | 01:50 |
clarkb | jeblair: it should be an improvement over the current situation, which is 12 large query sets per hour instead of 3 | 01:50 |
clarkb | (I think 12) | 01:50 |
*** banix has quit IRC | 01:54 | |
fungi | clarkb: are we holding off upgrading the gearman plugin to 0.0.5 in production (i see it's only on -dev) | 01:55 |
fungi | jeblair: ^ | 01:55 |
fungi | i don't recall what the situation was with that | 01:55 |
clarkb | fungi: I don't think we need to, but it does require a zuul restart | 01:55 |
clarkb | I can do the logstash server during a less hectic time | 01:56 |
*** weshay has quit IRC | 01:57 | |
*** banix has joined #openstack-infra | 01:57 | |
*** dkehn has quit IRC | 01:57 | |
anteaya | yay, jobs are finished and staying finished | 01:57 |
*** dkehn has joined #openstack-infra | 01:57 | |
fungi | i've gone back through the jenkins masters and confirmed they're all on the correct versions of jenkins and important plugins | 01:58 |
fungi | nice and consistent for the first time in a while | 01:58 |
anteaya | well now we are on stable jenkins heading into ff | 01:59 |
fungi | we hope, anyway | 02:00 |
anteaya | we do | 02:00 |
*** talluri has joined #openstack-infra | 02:00 | |
fungi | it's not like we've had great luck with running a different jenkins release and not finding new issues | 02:00 |
fungi | and just like the one i downgraded from, this is also a version we haven't run before | 02:01 |
*** yamahata has quit IRC | 02:01 | |
anteaya | more fun coming up | 02:01 |
anteaya | no idea how it will manifest | 02:02 |
*** khyati_ has quit IRC | 02:02 | |
*** pcrews has quit IRC | 02:02 | |
anteaya | fungi: have you eaten lately? | 02:02 |
anteaya | the fire fighting has been going on 3+ hours | 02:03 |
mordred | wait- I thought fungi didn't get to eat until after FF | 02:03 |
fungi | heh | 02:03 |
anteaya | just using up the camel hump I guess | 02:03 |
*** mgagne has joined #openstack-infra | 02:06 | |
*** mgagne1 has quit IRC | 02:07 | |
*** mrodden has joined #openstack-infra | 02:07 | |
fungi | getting back to rerunning jenkins-jobs update on the jenkins masters. my bare-precise change touched a whole lot of jobs, so the puppet exec timeout was way too short to handle it | 02:07 |
*** ryanpetrello has joined #openstack-infra | 02:07 | |
*** hdd_ has joined #openstack-infra | 02:08 | |
fungi | i had already gotten through jenkins.o.o and 01-04, so now it's running on 05-07 | 02:08 |
jeblair | fungi: oh nice catch; that's probably responsible for some stuck jobs too | 02:08 |
jeblair | fungi: as i imagine that our static nodes may be marked offline at this point... | 02:08 |
*** mgagne1 has joined #openstack-infra | 02:08 | |
fungi | well, they were all on 01 and 02, which is why i got those out of the way early | 02:09 |
fungi | they were mostly done before we got into any of the real fun | 02:09 |
*** nati_ueno has joined #openstack-infra | 02:09 | |
*** mgagne has quit IRC | 02:10 | |
*** dstanek has joined #openstack-infra | 02:10 | |
*** talluri has quit IRC | 02:12 | |
*** nati_uen_ has quit IRC | 02:12 | |
*** talluri has joined #openstack-infra | 02:12 | |
fungi | i had already confirmed no new jobs were getting assigned to the static precise slaves, then offlined them and removed them from the masters a while later (i can add them back fairly easily if need be, since they're not deleted at the provider yet) | 02:13 |
jeblair | fungi: there are a lot of ready nodes, and most of them are attached to jenkins02... | 02:15 |
jeblair | fungi: do you think one of your scripts missed something, or could those be casualties of the nodepool downtime and restart? | 02:15 |
fungi | the latter. nodepool list counted zero when i deleted them initially with jenkins02 shutdown | 02:16 |
clarkb | jeblair: I want to say I have seen nodepool do wild swings like that while nodes are down | 02:16 |
clarkb | then it settles out again once everything is back up for a full iteration through the single use nodes | 02:16 |
fungi | it's possible it wasn't the nodepool restart, but that nodepool tried to add nodes to 02 too quickly and most didn't register in jenkins | 02:16 |
jeblair | clarkb: oh, i'm not thinking that jenkins02 is overloaded, i mean to say that they are not really ready nodes | 02:17 |
clarkb | oh that is different then | 02:17 |
jeblair | clarkb: the ones i have spot checked were already deleted from jenkins | 02:17 |
*** talluri has quit IRC | 02:17 | |
clarkb | jeblair: so they are in nodepool but not reflected in jenkins | 02:17 |
clarkb | gotcha | 02:17 |
morganfainberg | i am guessing some excitement for the day has abated since i see jobs making their way through check/gate | 02:17 |
fungi | there was an initial glut of 100+ nodes built for 02 and i only saw a fraction of them show up in the webui. i suspect jenkins never added them to its slave list | 02:18 |
lifeless | 'maybe' :P | 02:18 |
morganfainberg | lifeless :) | 02:18 |
anteaya | morganfainberg: aye | 02:18 |
anteaya | we hope they continue along the abatement route | 02:18 |
anteaya | abating? | 02:18 |
morganfainberg | well... if they are resolving...and there is a little bandwidth to do something that keystone-core will <3 you (ok ok, i still lie, I will <3 you guys) for https://review.openstack.org/#/c/74472/ - it'll keep us from monopolizing openstack-dev. (just eavesdrop bot stuff) | 02:20 |
* morganfainberg thinks if that can be rephrased in a creepy stalker-ish way... | 02:20 | |
morganfainberg | >.> | 02:20 |
morganfainberg | nah | 02:20 |
*** sarob has joined #openstack-infra | 02:20 | |
anteaya | morganfainberg: please don't do that | 02:21 |
jeblair | fungi: i think i'll delete ready nodes that are > 0.1 hours | 02:21 |
anteaya | I have such high regard for you | 02:21 |
fungi | jeblair: sounds safe | 02:21 |
morganfainberg | anteaya, hehe, i don't think i could actually think of a way to rephrase it. | 02:21 |
anteaya | good | 02:21 |
morganfainberg | anteaya, just doesn't come naturally to me. | 02:21 |
anteaya | glad to hear it | 02:21 |
anteaya | again, so happy to hear that | 02:21 |
morganfainberg | besides, i actually genuinely like -infra folks | 02:22 |
anteaya | well there's that too | 02:22 |
anteaya | so thanks | 02:22 |
anteaya | back at 'ya | 02:22 |
morganfainberg | anyway. just relaying keystone desires :) thanks in advance | 02:22 |
anteaya | already +1'd | 02:22 |
morganfainberg | anteaya, i know :) you're awesome. | 02:23 |
anteaya | jenkins is +1, 6 other +1 and a +2 on it | 02:23 |
jeblair | okay there's another 89 nodes i hope | 02:23 |
anteaya | morganfainberg: nah, I just review the easy patches | 02:23 |
morganfainberg | anteaya, yep. it's why i hopped over. | 02:23 |
morganfainberg | lol | 02:23 |
morganfainberg | anteaya, one of these days i'm going to have time to be really more involved with infra stuff | 02:24 |
morganfainberg | anteaya, one of these days... | 02:24 |
dolphm | fallacy & | 02:24 |
anteaya | morganfainberg: one of these days | 02:24 |
jeblair | morganfainberg: one of these days i hope i'll have time too. :) | 02:24 |
anteaya | doesn't sound like today | 02:24 |
morganfainberg | jeblair, lol :) | 02:24 |
morganfainberg | anteaya, nah, dolphm will just find more stuff to be done. | 02:24 |
anteaya | he is like that, dolphm is | 02:25 |
* anteaya gestures pushing that patch out of the gate | 02:26 | |
*** UtahDave has quit IRC | 02:26 | |
jeblair | oh, the top check change is running its missing job! | 02:28 |
clarkb | and the downtick on the sparklines continues | 02:29 |
anteaya | yay for both | 02:29 |
anteaya | yay that job finished success | 02:31 |
*** ryanpetrello has quit IRC | 02:31 | |
anteaya | out out out | 02:31 |
anteaya | look at the gate shrink | 02:32 |
anteaya | 6 | 02:32 |
anteaya | 12 in post | 02:35 |
*** gokrokve has joined #openstack-infra | 02:35 | |
fungi | nibalizer: what size vm does this puppetdb need to start out? 2gb ram? | 02:36 |
openstackgerrit | A change was merged to openstack-infra/config: Run fewer es queries with elastic_recheck. https://review.openstack.org/72509 | 02:39 |
clarkb | fungi: http://docs.puppetlabs.com/puppetdb/latest/scaling_recommendations.html | 02:41 |
clarkb | fungi: basically we have two major processes, puppetdb itself (which is java and needs heap) and postgresql | 02:41 |
jeblair | clarkb: puppetdb is java? not ruby? | 02:42 |
clarkb | fungi: it doesn't look like we need a very large puppetdb java process because we are using postgresql. Which leaves us with accommodating postgresql | 02:42 |
clarkb | jeblair: it's jvm, it might be jruby or similar | 02:42 |
*** jergerber has quit IRC | 02:42 | |
*** dcramer__ has quit IRC | 02:42 | |
jeblair | clarkb: ah. | 02:43 |
clarkb | looks like clojure | 02:43 |
clarkb | https://github.com/puppetlabs/puppetdb/tree/master/src/com/puppetlabs | 02:43 |
fungi | the language schizophrenia of the puppet ecosystem amuses me | 02:43 |
clarkb | I have a hunch 2GB is plenty | 02:44 |
clarkb | but nibalizer should know more | 02:44 |
openstackgerrit | Davanum Srinivas (dims) proposed a change to openstack/requirements: Sync requirements to oslo.vmware https://review.openstack.org/74569 | 02:45 |
clarkb | oslo.vmware | 02:45 |
clarkb | I think that is my queue for dinner | 02:45 |
clarkb | cue? | 02:46 |
clarkb | silly english | 02:46 |
fungi | oh, fair warning i'll be disappearing around 21:00 utc tomorrow for our monthly local osug | 02:46 |
dims | i will probably need help fixing a bad requirements.txt in oslo.vmware :) | 02:46 |
fungi | dims: you're going to need help fixing vmware? i think that's out of my league, sorry ;) | 02:47 |
clarkb | dims: why do we need vendor specific oslo libs? | 02:47 |
fungi | sounds like the different virtual resources in vmwareland need some common interaction from more than one component of openstack | 02:48 |
dims | fungi, just need to add a \n in the requirements.txt | 02:50 |
dims | clarkb, fungi - yea, same code in cinder, nova etc | 02:50 |
clarkb | dims: why wouldn't that live in vmware land? | 02:50 |
clarkb | (just trying to sort out why this lives in openstack and not the vendor space) | 02:50 |
fungi | vmware python sdk | 02:51 |
dims | clarkb, the code is pretty specific to openstack and not usable outside of openstack | 02:51 |
anteaya | is it opensource? | 02:51 |
jerryz | hi guys, i have a problem with nodepool used by gerrit-triggered jenkins. when a patch is updated, the on-going job will be aborted but the slave hasn't been deleted yet and the new patch is tested on the used slave. later on that slave will be deleted by nodepool and the job fails. | 02:51 |
*** sarob_ has joined #openstack-infra | 02:52 | |
clarkb | anteaya: yes | 02:52 |
clarkb | jerryz: I don't think you can mix the two | 02:52 |
dims | anteaya, it's existing code in nova/cinder that's getting moved out so all projects can use the same code base | 02:52 |
clarkb | jerryz: you need to use the offline slave functionality in gearman plugin with nodepool | 02:52 |
anteaya | yeah, I'm with clarkb not sure why we have to maintain vendor code, regardless of how specific it is to openstack | 02:52 |
clarkb | anteaya: I think I understand now | 02:53 |
clarkb | it is openstack specific bits for interacting with vmware | 02:53 |
clarkb | and if it needs to go in multiple projects oslo is the place for it | 02:53 |
dims | anteaya, clarkb, fungi - https://blueprints.launchpad.net/oslo/+spec/vmware-api | 02:53 |
clarkb | it just feels wrong | 02:53 |
anteaya | it does | 02:53 |
anteaya | I have vendor prickly radar | 02:53 |
jerryz | clarkb: i will have a try | 02:54 |
jerryz | thanks | 02:54 |
clarkb | jerryz: or if the gerrit plugin can offline nodes when jobs are started that will work too | 02:54 |
*** sarob has quit IRC | 02:55 | |
*** nati_uen_ has joined #openstack-infra | 02:57 | |
jerryz | clarkb: is the gearman plugin with offline slave function a snapshot version? | 02:58 |
clarkb | jerryz: I think you may need a snapshot for the latest bug fixes | 02:58 |
clarkb | jerryz: http://tarballs.openstack.org/ci/gearman-plugin/ | 02:59 |
clarkb | no 0.0.6 looks new enough | 02:59 |
*** yamahata has joined #openstack-infra | 02:59 | |
jerryz | clarkb: thanks | 02:59 |
*** nati_ueno has quit IRC | 03:00 | |
*** julim has quit IRC | 03:02 | |
*** simonmcc has quit IRC | 03:08 | |
*** gokrokve has quit IRC | 03:08 | |
*** gokrokve has joined #openstack-infra | 03:09 | |
*** simonmcc has joined #openstack-infra | 03:10 | |
*** gokrokve has quit IRC | 03:13 | |
*** gokrokve has joined #openstack-infra | 03:15 | |
*** sarob_ has quit IRC | 03:24 | |
*** CaptTofu has quit IRC | 03:26 | |
openstackgerrit | Cyril Roelandt proposed a change to openstack-infra/pypi-mirror: Do not download wheels when running "pip install" https://review.openstack.org/74579 | 03:28 |
*** matsuhas_ has quit IRC | 03:29 | |
mordred | hrm | 03:30 |
mordred | that's pretty much the opposite direction we'd like that to go | 03:30 |
mordred | clarkb: ^^ unless there is a direction or issue we're seeing I don't know about? | 03:31 |
clarkb | ? | 03:34 |
clarkb | 74579? | 03:35 |
mordred | yeah | 03:37 |
morganfainberg | anteaya, can you explain something to me... | 03:39 |
morganfainberg | anteaya, why are recruiters obnoxious? :P | 03:40 |
morganfainberg | ok ok enough of that | 03:40 |
mordred | morganfainberg: because of the reasons | 03:40 |
mordred | morganfainberg: also, because they need a bunch of contract java programmers in new jersey apparently | 03:40 |
dstufft | mordred: ideally you'd download both sdist and Wheel | 03:41 |
dstufft | but pip isn't really designed for mirroring :[ | 03:41 |
morganfainberg | mordred, esp when they look at a resume and think "Oh open source developer, he'd like to work on proprietary java internal close source insanity" | 03:41 |
fungi | wow... a java programming gig in new jersey? can't say i'm sure which part is worse | 03:41 |
dstufft | fungi: I was just thinking that | 03:42 |
mordred | morganfainberg: ++ | 03:42 |
dstufft | there's literally nothing about that which sounds appealing | 03:42 |
lifeless | mordred: btw, if you have 70G available, bandersnatch++ | 03:42 |
mordred | I don't know what a bandersnatch is | 03:42 |
lifeless | mordred: its the official pypi mirror tool | 03:42 |
dstufft | it does a full mirror of PyPI | 03:42 |
lifeless | s/the/a/ | 03:42 |
dstufft | I think there was a reason why openstack didn't want that | 03:42 |
dstufft | because i'm pretty sure i suggested that before | 03:42 |
clarkb | which isn't what we want but is slowly getting there | 03:42 |
lifeless | mordred: efficiently | 03:42 |
anteaya | mordred: they don't know any better | 03:43 |
morganfainberg | fungi, or worse, java + "git architect". wait... what is a git architect job really? and why does that need to be a full time job. I could help do that and have more fun/wider range of things just by working with -infra | 03:43 |
clarkb | review.o.o isn't working on my phone again | 03:43 |
clarkb | :/ | 03:43 |
lifeless | clarkb: get a new phone? :) | 03:43 |
morganfainberg | lifeless, bandersnatch is cool. i've been looking at that for some internal stuff | 03:43 |
clarkb | its a chrome js + caching problem I think | 03:43 |
morganfainberg | it's the 70G that is a challenge for me to sell, but i like the project | 03:43 |
mordred | lifeless, dstufft I BELIEVE last time we looked at it it didn't work yet | 03:43 |
mordred | 70G is a piece of cake | 03:44 |
clarkb | mordred its a mirror | 03:44 |
lifeless | morganfainberg: 70G - mem. | 03:44 |
lifeless | morganfainberg: meh I mean. :) | 03:44 |
clarkb | which we dont want | 03:44 |
dstufft | oh | 03:44 |
mordred | lifeless: does it follow external links? | 03:44 |
anteaya | my sister's niece is a recruiter | 03:44 |
dstufft | no it doesn't | 03:44 |
clarkb | because external links | 03:44 |
morganfainberg | oh oh. i meant to ask...is there some strange chrome issue w/ gerrit? | 03:44 |
lifeless | mordred: no | 03:44 |
mordred | clarkb: you used to argue the opposed | 03:44 |
mordred | opposite | 03:44 |
mordred | lifeless: yea - that's why | 03:44 |
anteaya | pretty and knows zip and doesn't want to know | 03:44 |
mordred | we want it to suck down external links | 03:44 |
dstufft | kill all your external links imo | 03:44 |
dstufft | :D | 03:44 |
morganfainberg | my chrome (desktop) browser jumps around when i click. | 03:44 |
mordred | because those are what kill us | 03:44 |
morganfainberg | lifeless, pshaw, ram is free right? | 03:44 |
anteaya | one ugly christmas dinner was enough for me | 03:44 |
lifeless | mordred: we want them to die :) | 03:44 |
mordred | we do | 03:44 |
lifeless | morganfainberg: it doesn't use much ram | 03:44 |
mordred | but they are not yet dead | 03:44 |
clarkb | mordred well a mirror that pulls external links is what I want :) | 03:44 |
mordred | clarkb: yes | 03:45 |
mordred | if it pulled external links, I'd get rid of pypi-mirror and just use it | 03:45 |
morganfainberg | lifeless, also network is free, right? | 03:45 |
dstufft | it's conceivable that bandersnatch would grow the option to pull in external links | 03:45 |
morganfainberg | lifeless, >.> | 03:45 |
lifeless | morganfainberg: I may have exceeded my quota this week :) | 03:45 |
morganfainberg | lifeless, hehehe | 03:45 |
dstufft | although it wouldn't match the output of pip install exactly | 03:45 |
dstufft | because people can update the external links | 03:45 |
lifeless | morganfainberg: since I setup an Ubuntu mirror (100Gish) + bandersnatch(70G) | 03:45 |
*** nati_ueno has joined #openstack-infra | 03:45 | |
morganfainberg | lifeless, oh dear! | 03:45 |
dstufft | without updating the pypi listing at all | 03:45 |
dstufft | and then bandersnatch won't know to download :[[ | 03:45 |
lifeless | morganfainberg: I have a 500G quota | 03:46 |
lifeless | morganfainberg: so I'm probably ok. | 03:46 |
morganfainberg | lifeless, that would blow out my bandwidth cap. (I only get 250) | 03:46 |
* clarkb is unlimited \o/ | 03:46 | |
*** nati_ueno has quit IRC | 03:46 | |
morganfainberg | clarkb, i could use my cellphone i have "unlimited data"* | 03:46 |
anteaya | I too am unlimited | 03:46 |
morganfainberg | anteaya, :( /jealous | 03:47 |
anteaya | if you want | 03:47 |
anteaya | Canadian telco monopoly is pretty bad | 03:47 |
anteaya | I'm jealous of the dude in SF with 271mb up and down | 03:48 |
anteaya | don't know what he has for data cap | 03:48 |
*** nati_uen_ has quit IRC | 03:48 | |
dstufft | unlimited ftw | 03:48 |
clarkb | dstufft right which is why we pip | 03:48 |
anteaya | okay so that was the last patch we were waiting for | 03:49 |
anteaya | lifeless: when markmcclain shows up again, he can release a neutronclient, the last patch he was waiting on (yours) has merged: https://review.openstack.org/#/c/69110/ | 03:50 |
anteaya | he said he would check in after he had dinner | 03:50 |
anteaya | and I am off to bed | 03:50 |
anteaya | nighty-night | 03:51 |
clarkb | mordred: so we don't use latest pip | 03:51 |
lifeless | anteaya: thanks for the update | 03:51 |
anteaya | np | 03:51 |
mordred | clarkb: no? | 03:51 |
clarkb | morganfainberg: we use pip 1.4.X because 1.5 is broken | 03:51 |
clarkb | mordred: ^ | 03:51 |
mordred | clarkb: I thought 1.5.1 was out | 03:51 |
clarkb | as is latest virtualenv and tox | 03:51 |
mordred | which fixed it | 03:51 |
mordred | we just haven't unpinned yet | 03:51 |
mordred | because of FF | 03:51 |
clarkb | mordred: it might, if it did we didn't unpin | 03:51 |
clarkb | mordred: that said | 03:51 |
morganfainberg | clarkb, i know i know i should use a different name in irc, cause i make you type 4 characters instead of 3 | 03:51 |
clarkb | mordred: we might want to set no wheels there because we do two passes right? | 03:51 |
*** sarob has joined #openstack-infra | 03:51 | |
mordred | well... yeah. I could see that | 03:52 |
clarkb | mordred: I think what cyril is saying is that the mirror builder will only find wheels if they are available | 03:52 |
mordred | we do the tarball pass, and then we build wheels from the tarballs | 03:52 |
clarkb | mordred: so the tarball pass needs to not get any wheels then the wheel pass needs to get all wheels | 03:52 |
mordred | so - ok. I can buy that | 03:52 |
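The two-pass pypi-mirror build mordred describes amounts to: first fetch sdists only, then build wheels out of those tarballs, so the mirror ends up serving both formats. A hedged sketch of the two passes with the pip of that era; `--no-use-wheel` is the 1.4/1.5-era spelling (newer pips use `--no-binary :all:`), and the paths are placeholders:

```python
import subprocess

DOWNLOADS = "/srv/mirror/downloads"
WHEELHOUSE = "/srv/mirror/wheelhouse"

# Pass 1: grab sdists only, so the mirror always has tarballs that work
# for consumers without wheel support.
subprocess.check_call([
    "pip", "install", "--no-use-wheel",
    "--download", DOWNLOADS,
    "-r", "requirements.txt",
])

# Pass 2: build wheels from those downloaded tarballs (not from PyPI)
# so consumers that do enable wheels get pre-built packages.
subprocess.check_call([
    "pip", "wheel",
    "--wheel-dir", WHEELHOUSE,
    "--no-index", "--find-links", DOWNLOADS,
    "-r", "requirements.txt",
])
```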
*** rcleere has joined #openstack-infra | 03:53 | |
clarkb | also pinging holger tomorrow is on my list of things to do | 03:53 |
clarkb | I would really like working tox | 03:53 |
mordred | yeah | 03:53 |
mordred | that would be awesome | 03:53 |
* mordred is so happy that dstufft hangs out in here | 03:53 | |
mordred | it's made pip so much better for us | 03:53 |
mordred | also, he's cool and all | 03:54 |
clarkb | I am going to hop over to #tox tomorrow when I feel patient :) | 03:54 |
dstufft | ;) | 03:54 |
dstufft | um | 03:54 |
dstufft | i'm going to be releasing a 1.5.3 either tonight or tomorrow | 03:55 |
dstufft | I don't know how hard it would be to unpin and try things out | 03:55 |
mordred | well, right now we've got tox pinned | 03:55 |
dstufft | ok | 03:55 |
mordred | beacuse of $other_bug | 03:55 |
mordred | so realistically we won't try for another 2 weeks at best, because of our feature freeze cycle of death | 03:55 |
dstufft | just saying, if it's easy to unpin then it would be a good time to sneak something into 1.5.3 | 03:56 |
dstufft | if something else came up | 03:56 |
dstufft | :] | 03:56 |
mordred | nod | 03:56 |
mordred | I trust that pip is perfect at this point | 03:56 |
dstufft | ok :) | 03:58 |
Steap_ | clarkb: yep, that was my idea (I'm Cyril), looking at the code, I think we'll get the wheels anyway | 03:58 |
Steap_ | so I wanted to keep the tarballs since we currently cannot do without them | 03:58 |
clarkb | yup | 03:59 |
*** gyee has quit IRC | 04:00 | |
mordred | Steap_: yup. I grok now | 04:04 |
mordred | and thanks- I believe you're quite right | 04:04 |
Steap_ | mordred: honestly, I don't really get everything :) | 04:06 |
Steap_ | I've just learnt that there was yet another way of installing Python packages | 04:07 |
mordred | :) | 04:07 |
mordred | wheels are awesome | 04:07 |
mordred | we need to get on them | 04:07 |
Steap_ | and it prevents some packages from being updated :/ | 04:07 |
mordred | but we're not there yet | 04:07 |
Steap_ | mordred: yeah, probably | 04:07 |
Steap_ | but I've known easy_install, pip, setuptools, distutils... Yesterday I had to install a Ruby package, had to learn about gems... That's sort of a pain in the end :) | 04:08 |
Steap_ | I miss ./configure && make && make install :) | 04:08 |
clarkb | mordred I am 95% sure we can turn on wheels now | 04:08 |
Steap_ | mordred: do you have a link explaining how wheels are awesome ? | 04:08 |
clarkb | i did a bunch of local testing against the mirror and it seemed to work | 04:08 |
mordred | Steap_: one reason - they don't run python setup.py to install | 04:08 |
Steap_ | clarkb: what about the old pip used in the gates ? | 04:08 |
mordred | Steap_: they are pre-built/binary | 04:09 |
Steap_ | mordred: ok | 04:09 |
mordred | Steap_: so you don't need dev libs or c compilers or anything | 04:09 |
mordred | I mean, you still need the c libs | 04:09 |
clarkb | Steap_ not a problem, I used the same version of pip | 04:09 |
mordred | but MUCH MUCH more efficient | 04:09 |
Steap_ | clarkb: well, how do you explain the failures in the gates, then ? | 04:09 |
mordred | Steap_: I miss configure ; make ; make install too | 04:09 |
Steap_ | mordred: things were simpler :) | 04:09 |
clarkb | Steap_ which failures? | 04:09 |
mordred | Steap_: on my last project, I was the automake/autoconf person | 04:09 |
Steap_ | mordred: the main issue is that you need to learn a different way of installing packages for every language you might have to use | 04:10 |
mordred | we could move to autoconf for our python stuff ... | 04:10 |
mordred | Steap_: yeah | 04:10 |
Steap_ | and it changes every 5 years | 04:10 |
Steap_ | so in the end, as a user, it's a pain | 04:10 |
Steap_ | if it's not packaged in the distrib, it can keep me busy for a long time | 04:10 |
clarkb | I think the failures we saw last time wheels were enabled were due to having wheels but telling pip to not use them | 04:11 |
Steap_ | clarkb: well, when only wheels are available for a given package, the gates fail to install it | 04:11 |
clarkb | need to go the other way around | 04:11 |
Steap_ | clarkb: oh | 04:11 |
Steap_ | sure | 04:11 |
clarkb | Steap_ we don't use wheels today | 04:11 |
clarkb | so that fails | 04:11 |
clarkb | wheels are not enabled in tests now so a package that is only wheeled won't install | 04:12 |
Steap_ | clarkb: ok | 04:12 |
Steap_ | yes, that's what happens with six | 04:12 |
*** david-lyle has joined #openstack-infra | 04:13 | |
Steap_ | so, maybe we should discard my patch and just enable wheels in tests | 04:13 |
Steap_ | shouldn't we ? | 04:13 |
clarkb | no your patch is good | 04:13 |
clarkb | then we enable wheels again | 04:13 |
Steap_ | why would we still need the tarballs ? :) | 04:13 |
Steap_ | if wheels are awesome | 04:13 |
clarkb | because not everyone will have wheels enabled | 04:13 |
Steap_ | ok | 04:13 |
clarkb | tarballs work everywhere | 04:13 |
Steap_ | yes | 04:14 |
Steap_ | that's the good thing about them :) | 04:14 |
clarkb | wheels are system specific | 04:14 |
dolphm | can anyone take a glance at this very short log and tell me it's not normal? http://logs.openstack.org/37/69137/5/check/check-tempest-dsvm-postgres-full/3cf2f41/logs/devstack-gate-setup-host.txt | 04:14 |
dolphm | "useradd: user 'stack' already exists" etc | 04:15 |
fungi | dolphm: sounds like a host got reused during a test. likely a casualty of jenkins 1.551 (which we just downgraded away from a few hours ago) | 04:19 |
dolphm | fungi: i just filed a bug report for it -- is it a dupe of something? | 04:19 |
clarkb | fungi timestamps are newer. does d-g create the user before devstack | 04:19 |
dolphm | https://bugs.launchpad.net/devstack/+bug/1281902 | 04:19 |
uvirtbot | Launchpad bug 1281902 in openstack-ci "/opt/stack/new/devstack/functions-common:1128 Keystone fail to get token" [Undecided,New] | 04:19 |
fungi | oh, i was going off the "user 'stack' already exists" | 04:20 |
*** markmcclain has joined #openstack-infra | 04:20 | |
fungi | i believe it does, because it needs to chown some things | 04:20 |
lifeless | Steap_: you need the tarballs to create wheels for platforms | 04:21 |
dolphm | fungi: host re-use makes sense, given the other failures with git | 04:21 |
clarkb | still possible reuse happened | 04:21 |
*** markmcclain has quit IRC | 04:22 | |
*** changbl has joined #openstack-infra | 04:23 | |
*** lcheng has joined #openstack-infra | 04:23 | |
*** sarob has quit IRC | 04:23 | |
clarkb | fungi swap didn't need fixing I think you are right node was reused | 04:24 |
*** coolsvap has joined #openstack-infra | 04:25 | |
*** gokrokve has quit IRC | 04:27 | |
*** gokrokve has joined #openstack-infra | 04:28 | |
*** jeckersb is now known as jeckersb_gone | 04:29 | |
*** dkliban has joined #openstack-infra | 04:32 | |
*** gokrokve has quit IRC | 04:32 | |
*** ryanpetrello has joined #openstack-infra | 04:35 | |
*** matsuhashi has joined #openstack-infra | 04:35 | |
*** tian has quit IRC | 04:50 | |
*** masayukig has joined #openstack-infra | 04:51 | |
*** sarob has joined #openstack-infra | 04:51 | |
*** yamahata_ has quit IRC | 04:51 | |
*** tian has joined #openstack-infra | 04:52 | |
*** yamahata_ has joined #openstack-infra | 04:56 | |
*** jaypipes has quit IRC | 05:00 | |
zaro | clarkb, fungi : was there a bug for jenkins zmq job? | 05:04 |
*** rcleere has quit IRC | 05:05 | |
*** lcheng has quit IRC | 05:09 | |
*** khyati has joined #openstack-infra | 05:11 | |
clarkb | zaro I thought so but I may have misread | 05:12 |
*** vogxn has joined #openstack-infra | 05:15 | |
*** miqui has quit IRC | 05:15 | |
zaro | clarkb: i don't think so, the only bug i marked fixed today was 1276180 | 05:17 |
zaro | bug 1276180 | 05:17 |
uvirtbot | Launchpad bug 1276180 in openstack-ci "Gerrit hook scripts failing with IndexError exceptions" [High,Fix committed] https://launchpad.net/bugs/1276180 | 05:17 |
clarkb | I must've misparsed mail on my phone then | 05:18 |
zaro | clarkb: you have time to take a quick look at change https://review.openstack.org/#/c/60348 ? | 05:19 |
zaro | real quick look, i promise. | 05:20 |
*** markmcclain has joined #openstack-infra | 05:23 | |
*** nicedice has quit IRC | 05:23 | |
*** lcheng has joined #openstack-infra | 05:24 | |
*** sarob has quit IRC | 05:24 | |
clarkb | but house of cards | 05:25 |
*** chandan_kumar has joined #openstack-infra | 05:26 | |
*** nati_ueno has joined #openstack-infra | 05:27 | |
*** markmcclain has quit IRC | 05:27 | |
*** CaptTofu has joined #openstack-infra | 05:27 | |
*** nati_ueno has quit IRC | 05:28 | |
*** nati_ueno has joined #openstack-infra | 05:29 | |
*** mfisch has quit IRC | 05:31 | |
*** CaptTofu has quit IRC | 05:32 | |
zaro | clarkb: huh? what does that mean? | 05:34 |
clarkb | its a show on netflix. quite good. | 05:35 |
*** mfisch has joined #openstack-infra | 05:35 | |
*** mfisch has joined #openstack-infra | 05:35 | |
openstackgerrit | Shawn Hartsock proposed a change to openstack/requirements: add pyvmomi library https://review.openstack.org/69964 | 05:36 |
*** nati_uen_ has joined #openstack-infra | 05:37 | |
*** DinaBelova_ is now known as DinaBelova | 05:37 | |
*** amotoki has quit IRC | 05:39 | |
*** nati_ueno has quit IRC | 05:41 | |
*** dstanek has quit IRC | 05:48 | |
*** dstanek has joined #openstack-infra | 05:49 | |
*** sarob has joined #openstack-infra | 05:51 | |
*** wenlock has joined #openstack-infra | 05:57 | |
openstackgerrit | A change was merged to openstack/requirements: Update hp3parclient low version number https://review.openstack.org/73727 | 05:58 |
nibalizer | clarkb: fungi sure 2GB sounds fine to start with | 05:59 |
nibalizer | i'd also want a chunk of disk too, at least 20GB for it to write stuff down in | 05:59 |
*** coolsvap1 has joined #openstack-infra | 05:59 | |
nibalizer | (/var) | 05:59 |
*** gokrokve has joined #openstack-infra | 06:01 | |
*** e0ne has joined #openstack-infra | 06:02 | |
*** coolsvap has quit IRC | 06:03 | |
*** gokrokve has quit IRC | 06:06 | |
*** vkozhukalov has joined #openstack-infra | 06:10 | |
*** hdd_ has quit IRC | 06:14 | |
fungi | nibalizer: yeah, it has at least that much, but we can attach volumes too | 06:17 |
*** cadenzajon has joined #openstack-infra | 06:20 | |
fungi | anyway, it puppeted fine and is up and in dns now | 06:20 |
*** markmcclain has joined #openstack-infra | 06:23 | |
*** amotoki has joined #openstack-infra | 06:23 | |
*** gokrokve has joined #openstack-infra | 06:24 | |
*** sarob has quit IRC | 06:24 | |
fungi | deleting 70 nodes that have been in their current state for over 3 hours | 06:25 |
*** amotoki has quit IRC | 06:25 | |
*** yolanda has joined #openstack-infra | 06:25 | |
*** amotoki has joined #openstack-infra | 06:26 | |
*** cadenzajon has quit IRC | 06:26 | |
*** banix has quit IRC | 06:26 | |
*** markmcclain has quit IRC | 06:28 | |
*** e0ne has quit IRC | 06:28 | |
*** markmcclain has joined #openstack-infra | 06:30 | |
nibalizer | fungi: sweeeet! | 06:30 |
jeblair | fungi: the hpcloud nodepool providers haven't done anything since around 5:30 | 06:33 |
*** ryanpetrello has quit IRC | 06:33 | |
fungi | oh/ | 06:34 |
fungi | ? | 06:34 |
jeblair | fungi: i think they're waiting for network data | 06:34 |
*** markmcclain has quit IRC | 06:34 | |
*** matsuhashi has quit IRC | 06:35 | |
jeblair | fungi: yes, they are all sitting in return self._sslobj.read(len) | 06:36 |
jeblair | (inside of ssl, called from urllib | 06:36 |
jeblair | fungi: suggest we just restart nodepool | 06:36 |
clarkb | ++ | 06:37 |
fungi | wfm | 06:37 |
*** matsuhashi has joined #openstack-infra | 06:37 | |
jeblair | done | 06:37 |
jeblair | i'll delete nodes that were 'building' and 'delete' while it was stopped | 06:38 |
*** lcheng has quit IRC | 06:39 | |
jeblair | okay, that's started; 560 nodes | 06:39 |
jeblair | 2 of my scripts that were deleting keypairs similarly stopped | 06:39 |
jeblair | i restarted them | 06:39 |
fungi | thanks... i'm operating at a reduced capacity at this point and may just grab a nap in preparation for whatever fresh challenges await us tomorrow | 06:39 |
fungi | wondering if hp had network maintenance or something | 06:40 |
jeblair | fungi: good question. | 06:40 |
openstackgerrit | Spencer Krum proposed a change to openstack-infra/config: Enable puppetdb from puppetmaster https://review.openstack.org/74612 | 06:40 |
jeblair | i also wonder if there's a way we can protect against that; basically that was a novaclient call that just never returned. | 06:40 |
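One blanket protection against a call that never returns is a process-wide default socket timeout, so a silent SSL read eventually raises instead of wedging the provider thread. A minimal sketch, assuming it is acceptable for every socket in the process to inherit the timeout (that blanket behaviour, the placeholder URL and the 300-second value are assumptions here):

    import socket
    import urllib2  # Python 2, matching the _sslobj.read() hang noted above

    # Any socket created after this call gives up after 300s of silence,
    # raising socket.timeout instead of blocking in the SSL read forever.
    socket.setdefaulttimeout(300)

    try:
        data = urllib2.urlopen('https://example.com/').read()  # placeholder URL
    except socket.timeout:
        data = None  # let the surrounding task machinery retry or fail cleanly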
nibalizer | fungi: there is your follow up ^^ | 06:41 |
fungi | nibalizer: thanks! | 06:41 |
nibalizer | fungi: can you check /var/log/puppetdb/puppetdb.log for any errors or warnings? | 06:42 |
*** gokrokve_ has joined #openstack-infra | 06:45 | |
*** gokrokve has quit IRC | 06:48 | |
*** khyati has quit IRC | 06:49 | |
*** gokrokve_ has quit IRC | 06:49 | |
fungi | nibalizer: info lines filtered out for brevity... http://paste.openstack.org/show/67166 | 06:51 |
*** sarob has joined #openstack-infra | 06:51 | |
*** lcheng has joined #openstack-infra | 06:52 | |
*** lttrl has joined #openstack-infra | 06:53 | |
fungi | looks like it wants ~150g in /var/lib/puppetdb | 06:53 |
nibalizer | fungi: okay i was more or less expecting that message | 06:54 |
nibalizer | im not sure what the economics of adding more storage are | 06:54 |
fungi | so we'll either need to tune it down or add a volume there | 06:54 |
fungi | free (for us) | 06:54 |
nibalizer | okay well if its not expensive lets just add disk | 06:55 |
fungi | but it will need to wait for tomorrow unless someone else wants to take over. i'm officially out of steam (2am here) | 06:56 |
nibalizer | thats fine with me | 06:56 |
clarkb | fungi: sleep | 06:56 |
nibalizer | im just trying to make sure you all aren't blocked on me | 06:56 |
fungi | nibalizer: not at all--thanks for the help! | 06:57 |
* fungi is just blocked on not enough hours in the day | 06:57 | |
nibalizer | for context, my company charges something like $8/gig/month for storage between teams, so i wondered if 150 gig would be a problem | 06:57 |
fungi | we have very generous sponsors | 06:58 |
lifeless | 8/GBM - wow | 06:58 |
*** e0ne has joined #openstack-infra | 06:59 | |
lifeless | thats some fancy pants storage at that rate | 06:59 |
*** jhesketh has quit IRC | 06:59 | |
*** jhesketh__ has quit IRC | 06:59 | |
fungi | engraved on golden platters | 07:00 |
*** e0ne has quit IRC | 07:00 | |
*** e0ne has joined #openstack-infra | 07:01 | |
*** thomasbiege has joined #openstack-infra | 07:01 | |
*** lcheng has quit IRC | 07:04 | |
*** rlandy has joined #openstack-infra | 07:04 | |
*** morganfainberg is now known as morganfainberg_Z | 07:05 | |
*** e0ne has quit IRC | 07:06 | |
*** mgagne has joined #openstack-infra | 07:07 | |
nibalizer | yea... if my company would openstack... that would be great | 07:10 |
nibalizer | (on commodity hardware) | 07:10 |
*** mgagne1 has quit IRC | 07:10 | |
lifeless | nibalizer: so what does it want 150GB for | 07:10 |
lifeless | thats like 2 copies of PyPI | 07:10 |
clarkb | logs | 07:12 |
nibalizer | it doesn't need that much space | 07:12 |
nibalizer | in my experience | 07:12 |
clarkb | for every puppet run done every 10 minutes on every server. now with single use slaves its probably not terrible | 07:12 |
nibalizer | are the single use slaves the ones that run puppet apply? | 07:13 |
nibalizer | because those will probably never hit puppetdb | 07:13 |
nibalizer | at my work we have 300+ nodes and 20K resources on a 17GB disk with plenty of space to go, given a 2 week retention policy | 07:14 |
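A rough back-of-envelope for how run frequency and retention drive the disk estimate, using the figures from this discussion (the per-report size is an assumed illustrative number, not a measurement):

    # Rough sizing sketch: reports accumulated under a retention window.
    nodes = 300                   # agents checking in (example figure above)
    runs_per_day = 24 * 60 / 10   # one puppet run every 10 minutes
    retention_days = 14           # two-week retention policy
    report_kb = 25                # assumed average report size, purely illustrative

    reports = nodes * runs_per_day * retention_days
    print("%d reports, roughly %.1f GB"
          % (reports, reports * report_kb / 1024.0 / 1024.0))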
*** mrda is now known as mrda_away | 07:14 | |
clarkb | right so no impact from them | 07:15 |
nibalizer | how many nodes are there checking in? | 07:15 |
*** basha has joined #openstack-infra | 07:16 | |
*** jcooley_ has quit IRC | 07:17 | |
*** jcooley_ has joined #openstack-infra | 07:17 | |
*** chandan_kumar has quit IRC | 07:18 | |
*** dstanek has quit IRC | 07:18 | |
*** basha has quit IRC | 07:20 | |
*** denis_makogon has joined #openstack-infra | 07:21 | |
*** bhuvan has joined #openstack-infra | 07:21 | |
*** jcooley_ has quit IRC | 07:22 | |
*** e0ne has joined #openstack-infra | 07:24 | |
*** sarob has quit IRC | 07:25 | |
*** chandan_kumar has joined #openstack-infra | 07:25 | |
*** markwash has quit IRC | 07:26 | |
*** vogxn has quit IRC | 07:27 | |
*** e0ne has quit IRC | 07:28 | |
*** CaptTofu has joined #openstack-infra | 07:29 | |
*** dstanek has joined #openstack-infra | 07:29 | |
*** saju_m has joined #openstack-infra | 07:30 | |
*** markmcclain has joined #openstack-infra | 07:31 | |
*** CaptTofu has quit IRC | 07:33 | |
*** markmcclain has quit IRC | 07:35 | |
*** vishy has quit IRC | 07:40 | |
*** cyeoh has quit IRC | 07:40 | |
*** DinaBelova is now known as DinaBelova_ | 07:40 | |
*** cyeoh has joined #openstack-infra | 07:41 | |
*** vishy has joined #openstack-infra | 07:43 | |
*** mrmartin has joined #openstack-infra | 07:47 | |
*** nati_uen_ has quit IRC | 07:49 | |
*** nati_ueno has joined #openstack-infra | 07:49 | |
*** sarob has joined #openstack-infra | 07:51 | |
*** e0ne has joined #openstack-infra | 07:56 | |
*** daniil has quit IRC | 07:56 | |
*** luqas has joined #openstack-infra | 07:58 | |
*** e0ne has quit IRC | 08:00 | |
*** skraynev_afk has quit IRC | 08:01 | |
*** sandywalsh has joined #openstack-infra | 08:03 | |
*** basha has joined #openstack-infra | 08:08 | |
*** denis_makogon has quit IRC | 08:16 | |
*** DinaBelova_ is now known as DinaBelova | 08:20 | |
*** saju_m has quit IRC | 08:21 | |
*** sarob has quit IRC | 08:24 | |
*** jgallard has joined #openstack-infra | 08:26 | |
*** thomasbiege has quit IRC | 08:26 | |
*** DinaBelova is now known as DinaBelova_ | 08:28 | |
*** DinaBelova_ is now known as DinaBelova | 08:28 | |
*** vogxn has joined #openstack-infra | 08:28 | |
*** markmcclain has joined #openstack-infra | 08:32 | |
*** vogxn has quit IRC | 08:33 | |
*** openstack has joined #openstack-infra | 08:42 | |
-dickson.freenode.net- [freenode-info] if you're at a conference and other people are having trouble connecting, please mention it to staff: http://freenode.net/faq.shtml#gettinghelp | 08:42 | |
*** asadoughi has joined #openstack-infra | 08:43 | |
*** nati_uen_ has joined #openstack-infra | 08:43 | |
*** chandan_kumar has joined #openstack-infra | 08:44 | |
*** jog0 has joined #openstack-infra | 08:45 | |
*** matrohon has joined #openstack-infra | 08:46 | |
*** nati_ueno has quit IRC | 08:47 | |
*** jroovers|afk has joined #openstack-infra | 08:49 | |
*** koolhead17 has joined #openstack-infra | 08:50 | |
*** basha has joined #openstack-infra | 08:50 | |
*** sarob has joined #openstack-infra | 08:51 | |
*** jcoufal has joined #openstack-infra | 08:52 | |
*** rossella-s has joined #openstack-infra | 08:53 | |
*** jpich has joined #openstack-infra | 08:54 | |
*** lttrl has quit IRC | 08:54 | |
*** markmcclain has joined #openstack-infra | 08:59 | |
*** nosnos_ has joined #openstack-infra | 08:59 | |
*** nosnos has quit IRC | 08:59 | |
*** markmcclain1 has joined #openstack-infra | 09:00 | |
*** markmcclain has quit IRC | 09:01 | |
*** afazekas has joined #openstack-infra | 09:01 | |
*** e0ne has joined #openstack-infra | 09:04 | |
*** markmcclain1 has quit IRC | 09:05 | |
*** coolsvap1 has quit IRC | 09:06 | |
*** dstanek has quit IRC | 09:09 | |
*** derekh has joined #openstack-infra | 09:13 | |
*** yassine has joined #openstack-infra | 09:14 | |
*** flaper87|afk is now known as flaper87 | 09:16 | |
*** hashar has joined #openstack-infra | 09:18 | |
*** hashar_ has joined #openstack-infra | 09:19 | |
*** hashar has quit IRC | 09:22 | |
*** chandan_kumar has quit IRC | 09:23 | |
*** sarob has quit IRC | 09:25 | |
*** NikitaKonovalov is now known as NikitaKonovalov_ | 09:29 | |
*** CaptTofu has joined #openstack-infra | 09:29 | |
*** CaptTofu has quit IRC | 09:34 | |
*** coolsvap has joined #openstack-infra | 09:34 | |
*** DinaBelova is now known as DinaBelova_ | 09:35 | |
*** yaguang has quit IRC | 09:36 | |
*** fbo_away is now known as fbo | 09:37 | |
*** dpyzhov has joined #openstack-infra | 09:37 | |
*** mrmartin has quit IRC | 09:39 | |
*** luqas has quit IRC | 09:40 | |
*** dpyzhov has joined #openstack-infra | 09:41 | |
*** chandan_kumar has joined #openstack-infra | 09:41 | |
*** jp_at_hp has joined #openstack-infra | 09:42 | |
*** chandankumar_ has joined #openstack-infra | 09:42 | |
*** chandan_kumar has quit IRC | 09:46 | |
*** matsuhashi has quit IRC | 09:46 | |
*** nosnos_ has quit IRC | 09:46 | |
*** nosnos has joined #openstack-infra | 09:47 | |
*** DinaBelova_ is now known as DinaBelova | 09:47 | |
*** matsuhashi has joined #openstack-infra | 09:47 | |
*** pblaho has joined #openstack-infra | 09:50 | |
*** gilliard has quit IRC | 09:51 | |
*** sarob has joined #openstack-infra | 09:51 | |
*** saju_m has joined #openstack-infra | 09:58 | |
*** jdurgin has quit IRC | 09:59 | |
*** noorul has joined #openstack-infra | 10:00 | |
noorul | http://logs.openstack.org/98/69498/31/check/gate-solum-docs/9de63ba/console.html#_2014-02-19_03_44_50_840 | 10:00 |
noorul | Any idea why that is happening? | 10:00 |
*** markmcclain has joined #openstack-infra | 10:01 | |
*** NikitaKonovalov_ is now known as NikitaKonovalov | 10:03 | |
*** julienvey has joined #openstack-infra | 10:04 | |
*** markmcclain has quit IRC | 10:06 | |
*** nati_uen_ has quit IRC | 10:07 | |
*** pblaho has quit IRC | 10:07 | |
*** pblaho has joined #openstack-infra | 10:08 | |
*** jdurgin has joined #openstack-infra | 10:12 | |
*** luqas has joined #openstack-infra | 10:20 | |
*** hashar_ is now known as hashar | 10:20 | |
*** sarob has quit IRC | 10:24 | |
*** masayukig has quit IRC | 10:25 | |
*** ociuhandu has joined #openstack-infra | 10:29 | |
*** chandankumar_ has quit IRC | 10:32 | |
*** ArxCruz has joined #openstack-infra | 10:33 | |
*** alexpilotti has joined #openstack-infra | 10:34 | |
*** dpyzhov has quit IRC | 10:35 | |
*** dpyzhov has joined #openstack-infra | 10:37 | |
*** dizquierdo has joined #openstack-infra | 10:38 | |
*** dpyzhov has quit IRC | 10:38 | |
*** flaper87 is now known as flaper87|afk | 10:38 | |
kiall | Seems the gerrit bot is MIA | 10:40 |
*** flaper87|afk is now known as flaper87 | 10:42 | |
BobBall | jeblair: I'm using novaclient 2.15.0 with nodepool - my problem was NOVA_RAX_AUTH was set to 1, which is an environment variable read explicitly by novaclient and used to set auth_system thus introducing the dependency on the RAX authentication method. | 10:44 |
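Given that behaviour, a small guard is to clear the variable from the environment before nodepool constructs its clients; this sketch relies only on what is reported above (that novaclient reads NOVA_RAX_AUTH), nothing about novaclient internals beyond it:

    import os

    # Drop NOVA_RAX_AUTH so novaclient never switches auth_system to the
    # rackspace plugin as a side effect of an inherited environment.
    os.environ.pop('NOVA_RAX_AUTH', None)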
*** chandankumar_ has joined #openstack-infra | 10:47 | |
*** nati_ueno has joined #openstack-infra | 10:49 | |
*** flaper87 is now known as flaper87|afk | 10:50 | |
*** flaper87|afk is now known as flaper87 | 10:50 | |
*** sarob has joined #openstack-infra | 10:51 | |
*** nati_ueno has quit IRC | 10:53 | |
*** che-arne has joined #openstack-infra | 10:55 | |
*** dpyzhov has joined #openstack-infra | 10:57 | |
*** dpyzhov has quit IRC | 10:59 | |
*** dpyzhov has joined #openstack-infra | 11:00 | |
*** markmcclain has joined #openstack-infra | 11:02 | |
*** markmcclain has quit IRC | 11:06 | |
*** basha has quit IRC | 11:11 | |
*** jgallard has quit IRC | 11:12 | |
*** wenlock has quit IRC | 11:14 | |
*** chandankumar_ has quit IRC | 11:14 | |
*** chandan_kumar has joined #openstack-infra | 11:15 | |
*** NikitaKonovalov is now known as NikitaKonovalov_ | 11:17 | |
*** NikitaKonovalov_ is now known as NikitaKonovalov | 11:19 | |
*** heyongli has joined #openstack-infra | 11:20 | |
*** CaptTofu has joined #openstack-infra | 11:22 | |
*** ociuhandu has quit IRC | 11:23 | |
*** mrmartin has joined #openstack-infra | 11:24 | |
*** johnthetubaguy has joined #openstack-infra | 11:24 | |
*** andreaf has joined #openstack-infra | 11:25 | |
*** sarob has quit IRC | 11:25 | |
*** CaptTofu has quit IRC | 11:26 | |
*** matsuhashi has quit IRC | 11:28 | |
enikanorov_ | hi. does anyone know what's up with the check queue? looks like it's stuck | 11:31 |
*** matsuhashi has joined #openstack-infra | 11:31 | |
ilyashakhat_ | enikanorov_: maybe SergeyLukjanov ? | 11:32 |
enikanorov_ | SergeyLukjanov: ping | 11:32 |
SergeyLukjanov | enikanorov_, ilyashakhat_, pong | 11:32 |
SergeyLukjanov | looking on it | 11:32 |
SergeyLukjanov | heh, I'm afraid that we have no free devstack-precise slaves | 11:34 |
ilyashakhat_ | is it ok to have such a large 'Deleting' area on Job Stats? | 11:36 |
*** CaptTofu has joined #openstack-infra | 11:36 | |
*** mrmartin has quit IRC | 11:36 | |
SergeyLukjanov | ilyashakhat_, nope | 11:36 |
SergeyLukjanov | ilyashakhat_, I don't see that any jobs are running now | 11:38 |
SergeyLukjanov | and almost all slaves are offline | 11:38 |
*** jroovers|afk is now known as jroovers | 11:39 | |
*** ociuhandu has joined #openstack-infra | 11:42 | |
*** noorul has left #openstack-infra | 11:42 | |
SergeyLukjanov | sdague, I've checked all jenkins nodes and we have only several online slaves on jenkins and jenkins01 | 11:42 |
sdague | yeh, it looks like something went all bonkers again | 11:43 |
sdague | it looks like, from scrollback, they were working on it last night | 11:44 |
sdague | so I think it's just a wait for fungi thing, because this is the class of things where you need root to go fix I think | 11:45 |
SergeyLukjanov | sdague, yup | 11:45 |
SergeyLukjanov | I think that we have tons of 'deleting' slaves that are already offline on jenkins | 11:45 |
SergeyLukjanov | probably, nodepool is dead :( | 11:45 |
*** che-arne has quit IRC | 11:45 | |
sdague | yeh, probably | 11:46 |
*** hashar is now known as hasharAW | 11:47 | |
*** sarob has joined #openstack-infra | 11:51 | |
SergeyLukjanov | hm, one more idea is that it's related to the fact that gate-noop is now running on single use nodes | 11:56 |
*** coolsvap has quit IRC | 11:57 | |
*** hasharAW has quit IRC | 11:58 | |
*** lcostantino has joined #openstack-infra | 12:01 | |
*** markmcclain has joined #openstack-infra | 12:03 | |
*** rfolco has joined #openstack-infra | 12:07 | |
*** markmcclain has quit IRC | 12:07 | |
*** ArxCruz has quit IRC | 12:08 | |
*** lcostantino has quit IRC | 12:08 | |
*** max_lobur has joined #openstack-infra | 12:09 | |
*** lcostantino has joined #openstack-infra | 12:09 | |
*** mrmartin has joined #openstack-infra | 12:10 | |
*** ArxCruz has joined #openstack-infra | 12:10 | |
mrmartin | re | 12:12 |
*** banix has joined #openstack-infra | 12:13 | |
*** dpyzhov has quit IRC | 12:13 | |
*** lcostantino has quit IRC | 12:14 | |
mrmartin | something is wrong with the gating jobs, some tasks were started more than 9 hours ago | 12:14 |
sdague | mrmartin: yep, no one that can fix it is currently awake | 12:14 |
*** ArxCruz has quit IRC | 12:16 | |
*** matsuhashi has quit IRC | 12:20 | |
kiall | Guess it's time for someone to give the gate a kick ;) Totally and utterly wedged. | 12:20 |
*** NikitaKonovalov is now known as NikitaKonovalov_ | 12:20 | |
*** ArxCruz has joined #openstack-infra | 12:20 | |
*** sarob has quit IRC | 12:24 | |
*** matsuhashi has joined #openstack-infra | 12:25 | |
*** lcostantino has joined #openstack-infra | 12:30 | |
*** luqas has quit IRC | 12:32 | |
*** yamahata has quit IRC | 12:36 | |
*** lcostantino has quit IRC | 12:37 | |
*** Nikolay_St has quit IRC | 12:38 | |
*** hashar has joined #openstack-infra | 12:42 | |
*** dpyzhov has joined #openstack-infra | 12:45 | |
*** jgallard has joined #openstack-infra | 12:46 | |
*** che-arne has joined #openstack-infra | 12:49 | |
*** sarob has joined #openstack-infra | 12:51 | |
*** smarcet has joined #openstack-infra | 12:52 | |
*** banix has quit IRC | 12:54 | |
*** CaptTofu has quit IRC | 12:55 | |
*** CaptTofu has joined #openstack-infra | 12:56 | |
*** nosnos has quit IRC | 12:56 | |
*** CaptTofu has quit IRC | 13:00 | |
*** NikitaKonovalov_ is now known as NikitaKonovalov | 13:01 | |
*** lcostantino has joined #openstack-infra | 13:01 | |
*** david-lyle has quit IRC | 13:01 | |
*** banix has joined #openstack-infra | 13:01 | |
*** koolhead17 has quit IRC | 13:01 | |
*** koolhead17 has joined #openstack-infra | 13:01 | |
*** markmcclain has joined #openstack-infra | 13:04 | |
*** matsuhashi has quit IRC | 13:08 | |
*** matsuhashi has joined #openstack-infra | 13:08 | |
*** matsuhashi has quit IRC | 13:08 | |
*** markmcclain has quit IRC | 13:08 | |
*** dkranz has quit IRC | 13:09 | |
ekarlso | did someone throw a banana peel into the gate or? | 13:11 |
SergeyLukjanov | fungi, clarkb, jeblair, mordred, gate is dead // just want to be sure that you'll see it :) | 13:13 |
*** zhiyan_ is now known as zhiyan | 13:14 | |
*** luqas has joined #openstack-infra | 13:17 | |
*** ken1ohmichi has quit IRC | 13:18 | |
*** dprince has joined #openstack-infra | 13:19 | |
*** mrmartin has quit IRC | 13:21 | |
*** thomasbiege has joined #openstack-infra | 13:21 | |
*** pdmars has joined #openstack-infra | 13:22 | |
*** sarob has quit IRC | 13:24 | |
*** pdmars has quit IRC | 13:25 | |
*** weshay has joined #openstack-infra | 13:33 | |
*** luqas has quit IRC | 13:33 | |
*** dolphm has quit IRC | 13:35 | |
*** dolphm has joined #openstack-infra | 13:35 | |
*** dcramer__ has joined #openstack-infra | 13:36 | |
*** lcostantino has quit IRC | 13:36 | |
*** dolphm has quit IRC | 13:37 | |
*** dolphm has joined #openstack-infra | 13:38 | |
*** sandywalsh has quit IRC | 13:39 | |
*** mrmartin has joined #openstack-infra | 13:40 | |
*** CaptTofu has joined #openstack-infra | 13:45 | |
*** dizquierdo has quit IRC | 13:46 | |
*** hashar has quit IRC | 13:47 | |
*** sarob has joined #openstack-infra | 13:51 | |
*** sandywalsh has joined #openstack-infra | 13:53 | |
*** thomasem has joined #openstack-infra | 13:53 | |
*** mfer has joined #openstack-infra | 13:55 | |
*** dkehn_ has quit IRC | 13:55 | |
*** dolphm has quit IRC | 13:55 | |
*** CaptTofu has quit IRC | 13:56 | |
*** dolphm has joined #openstack-infra | 13:57 | |
*** dpyzhov has quit IRC | 13:57 | |
*** ryanpetrello has joined #openstack-infra | 13:57 | |
*** luqas has joined #openstack-infra | 13:57 | |
*** dpyzhov has joined #openstack-infra | 13:57 | |
*** salv-orlando has quit IRC | 13:58 | |
*** CaptTofu has joined #openstack-infra | 13:58 | |
*** dolphm has quit IRC | 13:58 | |
*** lcostantino has joined #openstack-infra | 13:59 | |
*** dolphm has joined #openstack-infra | 13:59 | |
*** gordc has joined #openstack-infra | 13:59 | |
*** hashar has joined #openstack-infra | 14:01 | |
*** sandywalsh_ has joined #openstack-infra | 14:02 | |
*** markmcclain has joined #openstack-infra | 14:04 | |
*** prad has quit IRC | 14:05 | |
*** banix has quit IRC | 14:05 | |
*** sandywalsh has quit IRC | 14:06 | |
*** heyongli has quit IRC | 14:06 | |
*** dcramer__ has quit IRC | 14:07 | |
*** markmcclain has quit IRC | 14:08 | |
*** lcostantino has quit IRC | 14:09 | |
*** salv-orlando has joined #openstack-infra | 14:09 | |
*** markmc has joined #openstack-infra | 14:10 | |
*** saju_m has quit IRC | 14:11 | |
*** dkranz has joined #openstack-infra | 14:11 | |
*** dpyzhov has quit IRC | 14:12 | |
*** salv-orlando has quit IRC | 14:12 | |
*** andreaf has quit IRC | 14:14 | |
*** smarcet has left #openstack-infra | 14:17 | |
*** smarcet has joined #openstack-infra | 14:17 | |
*** pafuent has joined #openstack-infra | 14:18 | |
*** yamahata has joined #openstack-infra | 14:19 | |
fungi | looking now | 14:22 |
fungi | looks like we've piled up fake ready nodes again | 14:22 |
*** e0ne has quit IRC | 14:23 | |
*** julim has joined #openstack-infra | 14:23 | |
sdague | fungi: yeh, it seems to have started happening about the same time the infra team was signing off last night | 14:23 |
sdague | so I'm curious if a late change caused issues | 14:23 |
*** dolphm has quit IRC | 14:24 | |
fungi | not sure. i'll have to read scrollback. we were still stabilizing things coming out of the jenkins upgrade and redowngrade | 14:24 |
*** sarob has quit IRC | 14:24 | |
*** lcostantino has joined #openstack-infra | 14:24 | |
fungi | by the time i passed out | 14:25 |
sdague | yep, that's fair | 14:25 |
*** oubiwann has joined #openstack-infra | 14:25 | |
ttx | SergeyLukjanov: I thought you'd magically get things fixed while everyone else sleeps | 14:26 |
fungi | hopefully this isn't an issue with jenkins 1.632.2, because if so we've basically ruled out being able to use any of the jenkins releases which aren't riddled with known security holes | 14:26 |
*** dpyzhov has joined #openstack-infra | 14:26 | |
fungi | er, 1.532.2 | 14:26 |
SergeyLukjanov | ttx, I have no root access and it looks like the problem is about non-code stuff :( | 14:26 |
fungi | which was what drove me to upgrade us to 1.551 yesterday. huge list of security fixes, some in bits we expose to the public | 14:27 |
ttx | SergeyLukjanov: just trolling you, ignore me :) | 14:27 |
SergeyLukjanov | ttx :) | 14:27 |
SergeyLukjanov | fungi, can I help you somehow? | 14:28 |
fungi | i've got 10 loops running in parallel right now deleting any nodepool nodes which are >3 hours in any state | 14:28 |
fungi | this should get things moving again, i think | 14:28 |
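One way such a cleanup could be scripted (not necessarily the loops actually in use here), assuming the nodepool CLI's list/delete subcommands and that the node id is the first column of the listing and its age in hours the last; those column positions are an assumption:

    import subprocess

    MAX_AGE_HOURS = 3.0

    out = subprocess.check_output(['nodepool', 'list'])
    for line in out.decode().splitlines():
        fields = [f.strip() for f in line.split('|') if f.strip()]
        if len(fields) < 2 or not fields[0].isdigit():
            continue  # skip header and separator rows
        node_id, age = fields[0], fields[-1]
        try:
            if float(age) > MAX_AGE_HOURS:
                subprocess.call(['nodepool', 'delete', node_id])
        except ValueError:
            continue  # age column didn't parse; leave the node alone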
fungi | checking scrollback to see if they left us any other breadcrumbs | 14:29 |
ttx | When I saw 400 checks piled up I thought: "we should really have those tripleo checks appear in a separate display".. then I looked again | 14:30 |
fungi | yah | 14:30 |
SergeyLukjanov | heh | 14:31 |
*** e0ne has joined #openstack-infra | 14:31 | |
*** dims has quit IRC | 14:32 | |
*** wenlock has joined #openstack-infra | 14:33 | |
fungi | so as best i can tell, nodepool thinks it piled about 600 nodes onto jenkins04, nearly a couple hundred of them in a ready state, but jenkins04's interface shows only offline nodes | 14:34 |
*** jeckersb_gone is now known as jeckersb | 14:34 | |
fungi | hard to tell how many, but it *could* be in the hundreds | 14:34 |
*** dims has joined #openstack-infra | 14:35 | |
*** thomasbiege has quit IRC | 14:35 | |
fungi | looks like we're down gerritbot and statusbot too... | 14:37 |
fungi | 2014-02-19 08:37:31 <-- openstackgerrit (~openstack@review.openstack.org) has quit (Ping timeout: 260 seconds) | 14:37 |
fungi | 2014-02-19 08:38:01 <-- openstackstatus (~openstack@eavesdrop.openstack.org) has quit (Ping timeout: 272 seconds) | 14:37 |
sdague | fungi: the deletes are already helping, things started to move again in gate queue | 14:38 |
fungi | i'll get the bots restarted and see whether something happened in raxland around 6 hours ago (which could also coincide with when this started to go south) | 14:39 |
*** yamahata has quit IRC | 14:39 | |
*** dstanek has joined #openstack-infra | 14:39 | |
*** yamahata has joined #openstack-infra | 14:39 | |
fungi | sort of odd to see bots from two different servers fall off from a ping timeout at the same moment. and those servers are in the same rax region as the nodepool server and jenkins masters | 14:40 |
*** dkliban has quit IRC | 14:40 | |
* fungi makes the obligatory "clouds" sigh and finds a second cup of coffee | 14:40 | |
SergeyLukjanov | fungi, it sounds like rax network outage could be the reason | 14:41 |
*** ildikov_ has joined #openstack-infra | 14:41 | |
fungi | "Our engineers have received reports of a brief network disruption that impacted a portion of our DFW2 data center starting at approximately 02:36 CST. The team engaged has stabilized the issue at approximately 02:41 CST and will continue to monitor for further impact. " | 14:42 |
ArxCruz | lifeless: hey, can you give me your blessing here https://review.openstack.org/#/c/70152/ ? | 14:42 |
ArxCruz | :) | 14:42 |
fungi | https://status.rackspace.com/ | 14:42 |
fungi | 02:36 CST is 08:36 UTC for the timezone-impaired | 14:43 |
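The conversion is a fixed six-hour offset in February (CST = UTC-6, no daylight saving in effect), easy to sanity-check:

    from datetime import datetime, timedelta

    cst = datetime(2014, 2, 19, 2, 36)   # 02:36 CST from the status page
    print(cst + timedelta(hours=6))      # 2014-02-19 08:36:00, i.e. 08:36 UTC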
fungi | well, we already know that nodepool behaves terribly in the face of provider outages, and thankfully lifeless and derekh have patches proposed which should help that | 14:44 |
*** jp_at_hp has quit IRC | 14:47 | |
fungi | that tempest change failing near the head of the gate hit connectivity issues to pypi.python.org trying to download pip around 14:38, just a few minutes ago, from a rax-dfw slave too | 14:47 |
SergeyLukjanov | fungi, it's good | 14:47 |
*** russellb has joined #openstack-infra | 14:48 | |
SergeyLukjanov | fungi, is pip installation the only external op? | 14:48 |
fungi | (separate note, we still need to neuter get-pip so that it installs from a local cache on these systems) | 14:48 |
fungi | SergeyLukjanov: nah, jobs also need to look up dns records, retrieve git updates and zuul refs, upload logs/artifacts and stream data back to the jenkins master too | 14:49 |
*** dhellmann has quit IRC | 14:49 | |
*** jp_at_hp has joined #openstack-infra | 14:49 | |
fungi | anyway, my pint was that whatever this is happening in dfw, it might be ongoing | 14:49 |
*** dhellmann has joined #openstack-infra | 14:49 | |
SergeyLukjanov | fungi, bad wording from my side, I mean outside of our infra | 14:49 |
*** dizquierdo has joined #openstack-infra | 14:49 | |
fungi | s/pint/point/ (though now i feel like i need a pint too) | 14:50 |
SergeyLukjanov | fungi, dns and pip | 14:50 |
*** dhellmann has quit IRC | 14:50 | |
fungi | SergeyLukjanov: possibly... some less common jobs also download other sorts of things from the internet too | 14:50 |
SergeyLukjanov | fungi, "whatever this is happening in dfw, it might be ongoing" :( | 14:50 |
fungi | well, just noting that was a connectivity issue from a few minutes ago, and when i checked the slave's location, it was in that same region which had the outage earlier | 14:51 |
*** sarob has joined #openstack-infra | 14:51 | |
sdague | how easy would it be to pull the whole region? | 14:51 |
fungi | but it could also just be an unfortunate coincidence. i'm still casting my net wide here | 14:51 |
*** dkliban has joined #openstack-infra | 14:51 | |
*** dhellmann has joined #openstack-infra | 14:51 | |
fungi | sdague: not easy... for historical reasons we have most of our static infrastructure deployed in rax-dfw... we'd need to rebuild a lot of longer-lived servers | 14:52 |
sdague | so, yeh, spot checking additional fails | 14:52 |
sdague | they all look like dfw | 14:52 |
sdague | and all because of connectivitiy | 14:52 |
fungi | (pretty much any infra service you can think of, aside from nodepool slaves, backups and some experimental systems, is in dfw) | 14:53 |
anteaya | ah | 14:53 |
sdague | oof | 14:53 |
anteaya | sounds like some waves of movement might be a good idea | 14:53 |
fungi | so the *good* news here is that we could recover from complete loss of dfw, but it's not a move to be undertaken on a whim | 14:54 |
anteaya | no not a whim | 14:54 |
anteaya | but perhaps a slow migration | 14:54 |
*** miqui has joined #openstack-infra | 14:54 | |
*** jnoller has joined #openstack-infra | 14:54 | |
* anteaya looks for a land bridge over the glacier | 14:54 | |
anteaya | which server would be the easiest to migrate? | 14:55 |
*** luqas has quit IRC | 14:55 | |
anteaya | follow up question, which would be the most important? | 14:55 |
fungi | just about any of them would be roughly similarly easy to migrate, with a few exceptions, but there's just a lot of systems | 14:55 |
*** dkliban has quit IRC | 14:56 | |
*** mwagner_lap has joined #openstack-infra | 14:56 | |
anteaya | right | 14:56 |
fungi | think back to the several easels of marker-smeared paper we had diagramming them from a high level at the bootcamp... then mentally add a bunch more we've brought online since then | 14:56 |
anteaya | yes | 14:56 |
*** luqas has joined #openstack-infra | 14:57 | |
anteaya | several easels worth | 14:57 |
anteaya | if I started up an etherpad to list them, would this help the conversation/migration? | 14:57 |
anteaya | even if the conclusion is not to migrate? | 14:57 |
*** prad has joined #openstack-infra | 14:57 | |
anteaya | https://etherpad.openstack.org/p/migrate-all-the-things | 14:59 |
*** mgagne has quit IRC | 14:59 | |
anteaya | do join me | 14:59 |
*** wenlock_ has joined #openstack-infra | 15:00 | |
*** CaptTofu has quit IRC | 15:01 | |
fungi | i think that's premature. what we need is a group discussion about ways to spread systems out to reduce the impact of provider outages, which probably means some redundancy... or we remind ourselves that as we've previously stated we're operating at the mercy of providers donating these resources, and they're up most of the time, and when they're not, we should just go out for a drink and clean up | 15:01 |
fungi | the mess later | 15:01 |
*** CaptTofu has joined #openstack-infra | 15:01 | |
anteaya | okay | 15:01 |
fungi | but right now i need to focus on stabilizing this and see what else might be left broken from the earlier incident | 15:02 |
anteaya | well while waiting for the others I don't mind having a place to copy/paste | 15:02 |
anteaya | and I can abandon the etherpad later if need be | 15:02 |
anteaya | right | 15:02 |
anteaya | and I need something to do because I can't help you with that | 15:02 |
*** CaptTofu has quit IRC | 15:02 | |
*** CaptTofu has joined #openstack-infra | 15:02 | |
fungi | if we continue to see any new gate failures (besides the ones there) which are network connectivity problems and are on nodepool nodes in dfw, i'll temporarily scale nodepool off that region to buy us a little more stability | 15:03 |
*** e0ne_ has joined #openstack-infra | 15:03 | |
*** protux has joined #openstack-infra | 15:04 | |
*** jaypipes has joined #openstack-infra | 15:05 | |
*** markmcclain has joined #openstack-infra | 15:05 | |
fungi | though at the cost of 132 nodes of capacity | 15:06 |
anteaya | :( | 15:07 |
fungi | yeah, it's a balancing act | 15:07 |
*** e0ne has quit IRC | 15:07 | |
*** dpyzhov has quit IRC | 15:08 | |
*** openstackstatus has joined #openstack-infra | 15:08 | |
*** wenlock_ has quit IRC | 15:08 | |
*** openstackgerrit has joined #openstack-infra | 15:09 | |
*** dpyzhov has joined #openstack-infra | 15:09 | |
fungi | okay, we've got openstackstatus and openstackgerrit back | 15:10 |
*** markmcclain has quit IRC | 15:10 | |
anteaya | yay | 15:10 |
*** Ajaeger has joined #openstack-infra | 15:10 | |
anteaya | what server are they on? | 15:10 |
fungi | one is on eavesdrop and the other is on review | 15:10 |
jeblair | fungi: good morning | 15:10 |
fungi | jeblair: i hope so! | 15:10 |
dims | :) | 15:11 |
*** eharney has joined #openstack-infra | 15:11 | |
fungi | jeblair: quick summary, "brief" outage in rax-dfw around 08:30 utc (but maybe with lingering effects, jury's still deliberating) | 15:11 |
*** NikitaKonovalov is now known as NikitaKonovalov_ | 15:12 | |
jeblair | fungi: so... i was reading the scrollback and talk of a mass migration from dfw and imagined something rather serious... | 15:12 |
jeblair | fungi: anything i should kick or check? | 15:13 |
fungi | not especially serious, no | 15:13 |
*** bknudson has quit IRC | 15:13 | |
fungi | jeblair: a deeper health check on nodepoold would be great. it does seem to be adding replacement nodes as i delete the stale ones, but curious whether it warrants restarting | 15:13 |
jeblair | ack | 15:14 |
sdague | one other oddity | 15:14 |
sdague | I can't seem to find a single functioning py26 node in the currently running list | 15:15 |
jeblair | sdague: ack | 15:15 |
sdague | so that might be a parallel thing to look into, because even if the nodepool recovers, that will hold us up | 15:15 |
*** jeckersb is now known as jeckersb_gone | 15:16 | |
*** jergerber has joined #openstack-infra | 15:16 | |
fungi | sdague: i'll make a second pass to delete the py26 nodes which aren't in use, to speed that along | 15:16 |
*** DinaBelova is now known as DinaBelova_ | 15:16 | |
sdague | fungi: cool | 15:17 |
fungi | thanks for spotting it | 15:17 |
*** jeckersb_gone is now known as jeckersb | 15:17 | |
anteaya | I count 63 servers listed in cacti | 15:19 |
anteaya | which are now listed in the etherpad | 15:19 |
jeblair | fungi: hpcloud seems operational; az1 and az3 are idle because they are at capacity, likely with false-ready nodes | 15:19 |
*** e0ne_ has quit IRC | 15:19 | |
jeblair | fungi: you have deletes going on that will catch those? | 15:19 |
fungi | jeblair: yes | 15:19 |
*** malini has joined #openstack-infra | 15:20 | |
*** e0ne has joined #openstack-infra | 15:20 | |
fungi | 10 loops going in parallel right now | 15:20 |
fungi | though it looks like nodepoold is adding new ready nodes which aren't picking up jobs either... i'm starting to suspect it's having trouble adding them to jenkins masters successfully | 15:20 |
fungi | none of the nodes currently in a ready state are >1hr there | 15:21 |
jeblair | anteaya: keep in mind that it's generally better to have single-point-of-failure servers that interact with each other in the same data center; it's likely that if we spread out some of our services to 2 data centers, we would be subject to twice the number of service interruptions | 15:21 |
jeblair | anteaya: the exception to that is if we can make services truly ha; that is difficult for many of the more important things we run. | 15:22 |
*** gokrokve has joined #openstack-infra | 15:22 | |
fungi | and looks like all the centos6 nodes which have been ready for over 0.2 hours are on jenkins04 for some reason | 15:22 |
fungi | which is the same master which had most of the nodes earlier. i'm checking it out now | 15:22 |
fungi | maybe it's having issues | 15:22 |
fungi | yeah, jenkins04 currently has a handful of offline devstack slaves from rax-dfw assigned to it, and nothing else | 15:23 |
*** sarob has quit IRC | 15:23 | |
anteaya | jeblair: ah a good point, I did not know that | 15:23 |
fungi | so whatever nodepool thinks is going on, it's wrong | 15:23 |
jeblair | fungi: seeing if the jenkins provider for 04 is stuck... | 15:25 |
jeblair | File "/usr/lib/python2.7/ssl.py", line 160, in read | 15:25 |
jeblair | return self._sslobj.read(len) | 15:25 |
jeblair | hey same place novaclient was stuck yesterday | 15:25 |
fungi | is that via requests, or direct socket? | 15:26 |
*** ihrachys has quit IRC | 15:26 | |
jeblair | 2014-02-19 08:33:40,851 DEBUG nodepool.JenkinsManager: Manager jenkins04 running task <nodepool.jenkins_manager.CreateNodeTask object at 0x7f3fddc7be10> | 15:26 |
*** sandywalsh_ has quit IRC | 15:26 | |
jeblair | stuck since then | 15:26 |
fungi | yep, that's when all the trouble began | 15:26 |
jeblair | fungi: its via File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen | 15:27 |
fungi | ahh, okay | 15:27 |
jeblair | fungi: so i think we're going to need a restart | 15:27 |
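For the urllib2 path specifically, the call can also be bounded per request; urlopen has accepted a timeout argument since Python 2.6, so a hung read raises instead of pinning the provider thread. A sketch (the URL and the 60-second value are placeholders):

    import socket
    import urllib2

    try:
        # Without a timeout this read can block indefinitely, which is the
        # state the jenkins04 provider thread was stuck in since 08:33.
        body = urllib2.urlopen('https://jenkins.example.org/api/json',
                               timeout=60).read()
    except (socket.timeout, urllib2.URLError):
        body = None  # surface the failure so the task can be retried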
*** dmsimard has joined #openstack-infra | 15:27 | |
fungi | okay, i'll take care of it if you're done debugging its present state | 15:28 |
*** atiwari has joined #openstack-infra | 15:28 | |
jeblair | i am; go for it | 15:28 |
dmsimard | Hi guys, I think a merge glitched but I wanted to ask you to make sure not to re-try the merge if there's problems right now.. https://review.openstack.org/#/c/74082/ | 15:28 |
Shrews | jeblair, fungi: not really following the issue too closely, but when I've seen network errors like that (getting stuck on reads after cloud outages), having keepalive enabled on the sockets usually helps prevent the "stuck" | 15:28 |
*** zhiyan is now known as zhiyan_ | 15:29 | |
*** bknudson has joined #openstack-infra | 15:29 | |
*** dkliban has joined #openstack-infra | 15:29 | |
jeblair | Shrews: good idea; hopefully we can get that, or something, passed through all the layers we need | 15:29 |
jeblair | (novaclient, requests, urllib[123456789], ssl, socket) | 15:30 |
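At the bottom of that stack, what Shrews suggests is a handful of setsockopt calls; the TCP_KEEP* constants below are Linux-specific, and plumbing the option through to the sockets novaclient actually opens is the hard part being alluded to. A sketch of just the socket-level piece:

    import socket

    def enable_keepalive(sock, idle=60, interval=10, count=6):
        """Turn on TCP keepalive so a dead peer is eventually detected
        instead of a read blocking forever (TCP_KEEP* options are Linux-only)."""
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)

    sock = socket.create_connection(('example.com', 443))
    enable_keepalive(sock)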
openstackgerrit | Ruslan Kamaldinov proposed a change to openstack-infra/storyboard: Update developer documentation https://review.openstack.org/74713 | 15:30 |
*** ihrachys has joined #openstack-infra | 15:30 | |
jeblair | dmsimard: thanks for pointing that out; that looks like a new kind of failure we have only seen a couple of times. it suggests something wrong with the git repos that are cached on the slave images | 15:32 |
jeblair | dmsimard: unfortunately that slave is gone now; but you should be able to just try again | 15:32 |
annegentle | where's the one true Jenkins to find out if something built? www-01.jenkins.openstack.org? | 15:32 |
*** coolsvap has joined #openstack-infra | 15:33 | |
dmsimard | jeblair: I checked the recheck bugs but haven't found anything that seemed to be like that. Should I recheck with a specific bug # ? | 15:33 |
fungi | blasting out the ready nodes from prior to the restart, so that we get back some momentum. then i'll do the building and delete lists i saved from it | 15:33 |
fungi | annegentle: what specifically are you looking for? | 15:33 |
annegentle | Jenkins where are you? | 15:33 |
jeblair | dmsimard: i don't think there is one; could you please file one on openstack-ci, link to that job, and paste the bug here? | 15:33 |
annegentle | fungi: why the source for this training manuals page http://git.openstack.org/cgit/openstack/openstack-manuals/tree/doc/training-guides/lab001-control-node.xml isn't being published to http://docs.openstack.org/training-guides/content/lab001-control-node.xml.html | 15:34 |
dmsimard | jeblair: Will do and recheck against that bug. Thanks. | 15:34 |
*** mgagne has joined #openstack-infra | 15:34 | |
annegentle | fungi: specifically apt-get dist-update is showing on published, apt-get dist-upgrade is what's in the source | 15:34 |
fungi | annegentle: logs.openstack.org is going to be your best bet, but you need to know how to build the url. i'll get you an example for that one specifically | 15:34 |
annegentle | fungi: nice, a worked example | 15:34 |
*** bhuvan has quit IRC | 15:35 | |
fungi | annegentle: though before i jump into that, keep in mind that we've had a bit of a setback this morning and jobs are just now starting to catch up... if it's for a merged change listed in the post pipeline at http://status.openstack.org/zuul/ then it possibly hasn't been finished yet | 15:36 |
fungi | i see about a dozen changes for openstack-manuals which haven't finished in post yet | 15:37 |
fungi | still awaiting worker assignments | 15:37 |
Ajaeger | annegentle: that's strange, published last on the 1st of February... | 15:37 |
fungi | yeah, so sounds like it's been broken for longer. i'll pick a commit which merged a few days ago to be assured i can get you a good example | 15:38 |
fungi | rather than one which might still be pending completion | 15:38 |
*** david-lyle has joined #openstack-infra | 15:39 | |
*** esker has joined #openstack-infra | 15:39 | |
Ajaeger | Fungi, annegentle: Go to http://docs.openstack.org/training-guides/content/ and compare the list of chapters with http://docs.openstack.org/training-guides/content/lab001-control-node.xml.html | 15:40 |
*** jgrimm has joined #openstack-infra | 15:40 | |
openstackgerrit | Petr Blaho proposed a change to openstack-infra/config: Adds gate-tuskar-docs job to zuul. https://review.openstack.org/74756 | 15:40 |
openstackgerrit | Petr Blaho proposed a change to openstack-infra/config: Adds gate-python-tuskarclient-docs job to zuul. https://review.openstack.org/74757 | 15:40 |
mordred | annegentle: there is no one-true-jenkins. we have 8 masters in a pool behind gearman | 15:40 |
Ajaeger | annegentle: this one http://docs.openstack.org/training-guides/content/bk001-associate-training-guide.html shows "Architect Training Guide" as the last section | 15:40 |
*** dpyzhov has quit IRC | 15:41 | |
Ajaeger | But the other one contains "Introduction to OpenStack" and further chapters afterwards | 15:41 |
Ajaeger | annegentle: so, this looks like a problem on the openstack-manuals side, nothing fungi can help with. | 15:41 |
*** sandywalsh_ has joined #openstack-infra | 15:42 | |
*** guitarzan has joined #openstack-infra | 15:42 | |
*** afazekas has quit IRC | 15:42 | |
fungi | annegentle: so, as a working example, if you wanted to see the post jobs for http://git.openstack.org/cgit/openstack/openstack-manuals/commit/?id=254befa4824ef2b3f34be2e54eddcfabf082a6d3 | 15:42 |
fungi | annegentle: the log url for that is http://logs.openstack.org/25/254befa4824ef2b3f34be2e54eddcfabf082a6d3/ | 15:42 |
*** esker has quit IRC | 15:43 | |
annegentle | fungi: Ajaeger: helpful! So, figure out what patch would have fixed it, work from there | 15:43 |
fungi | specifically the training-guide build/publication for 254befa post-merge is http://logs.openstack.org/25/254befa4824ef2b3f34be2e54eddcfabf082a6d3/post/openstack-training-guides/6692343/console.html | 15:43 |
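The URL in that example is mechanical: the first path component is the first two characters of the commit sha, then the full sha, the pipeline, the job name, and the per-build directory (passed in below rather than derived, since it is assigned per run). A small helper mirroring the worked example:

    def post_log_url(sha, job, build):
        """Build a logs.openstack.org URL for a post-pipeline job run."""
        return ("http://logs.openstack.org/%s/%s/post/%s/%s/"
                % (sha[:2], sha, job, build))

    print(post_log_url("254befa4824ef2b3f34be2e54eddcfabf082a6d3",
                       "openstack-training-guides", "6692343"))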
Ajaeger | annegentle: see this change: https://review.openstack.org/#/c/70499/ | 15:43 |
Ajaeger | The chapters you are missing are not published anymore... | 15:44 |
Ajaeger | We really need to remove old files from the server! | 15:44 |
annegentle | Ajaeger: oh yes we do without me doing it manually! Arghhh | 15:44 |
*** wenlock_ has joined #openstack-infra | 15:44 | |
*** markwash has joined #openstack-infra | 15:44 | |
Ajaeger | annegentle: did you talk with clarkb or others about regularly removing old files? | 15:45 |
fungi | Ajaeger: annegentle: unless all your jobs can be fixed to publish into completely separate subdirectories so that they can delete and recreate that entire tree on publication, there's not much which can really be done about having old files | 15:45 |
annegentle | we have in the past, never got a good solution (yep what fungi says) | 15:45 |
Ajaeger | fungi, we publish in completely separate subdirectories. | 15:46 |
fungi | Ajaeger: at least previously it was a "too many cooks in the kitchen" problem (multiple jobs writing to common locations) | 15:46 |
Ajaeger | But how do you want to do it - upload to subdirectory.new, then mv subdirectory to subdirectory.old etc. | 15:46 |
Ajaeger | fungi: That problem shouldn't be there anymore at all. | 15:47 |
*** esker has joined #openstack-infra | 15:47 | |
jgriffith | fungi: jeblair should we hold off on +2/A patches for now? | 15:47 |
fungi | if they no longer do, i think there's an option to the ftp publisher to completely remove the target directory when it runs... though you also get a brief outage for that content on every update i think | 15:47 |
jgriffith | fungi: jeblair or does it not matter | 15:47 |
Ajaeger | But I might be wrong and be overlooking something ;) | 15:47 |
Ajaeger | and that brief outage is the problem. | 15:47 |
*** esker has quit IRC | 15:47 | |
anteaya | there is a neutron patch failing in the gate | 15:47 |
Ajaeger | Uploading takes a minute or more - that's too long IMO. | 15:47 |
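The subdirectory.new-then-mv idea can be made effectively outage-free if the live path is a symlink: sync the new build into a fresh directory, then atomically repoint the symlink. A sketch assuming rsync plus shell access on (or a local mount of) the destination -- exactly what ftp-only publishing doesn't provide -- and that the published name is already a symlink or absent:

    import os
    import subprocess
    import tempfile

    def publish(build_dir, docroot, name):
        """Sync a freshly built doc tree beside the live one, then swap a
        symlink so readers never see a half-uploaded directory."""
        target = tempfile.mkdtemp(prefix=name + '.', dir=docroot)
        subprocess.check_call(['rsync', '-a', '--delete',
                               build_dir + '/', target + '/'])
        tmp_link = os.path.join(docroot, name + '.swap')
        if os.path.lexists(tmp_link):
            os.remove(tmp_link)
        os.symlink(target, tmp_link)
        # rename() replaces an existing symlink atomically; old build
        # directories still need to be pruned separately.
        os.rename(tmp_link, os.path.join(docroot, name))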
*** esker has joined #openstack-infra | 15:47 | |
fungi | jgriffith: approving stuff won't hurt. we're backlogged, but the systems can queue things up just fine | 15:47 |
anteaya | it appears it will be removed without a gate reset | 15:47 |
anteaya | or I can remove it | 15:48 |
*** markwash has quit IRC | 15:48 | |
jgriffith | fungi: ok, thanks | 15:48 |
annegentle | yeah the brief outage is a stopper | 15:48 |
dmsimard | jeblair: https://bugs.launchpad.net/openstack-ci/+bug/1282136 | 15:48 |
uvirtbot | Launchpad bug 1282136 in openstack-ci "Git problem: "Failed to resolve 'HEAD' as a valid ref."" [Undecided,New] | 15:48 |
jgriffith | just wanted to make sure I don't compound issues | 15:48 |
annegentle | fungi: jeblair: mordred: if the docs.openstack.org site goes to Jekyll or some such do you imagine this particular problem would go away | 15:48 |
fungi | annegentle: Ajaeger: getting a location to publish things without being limited to ftp access (so that we could rsync) would make that easier | 15:49 |
jeblair | Ajaeger, annegentle: after i3 i'd like to switch to scp copying | 15:49 |
*** rcleere has joined #openstack-infra | 15:49 | |
*** jaypipes has quit IRC | 15:49 | |
annegentle | fungi: jeblair: mordred: Todd Morey has ideas for that, with the overall vision being Docbook source > built to html > built with jekyll | 15:49 |
Ajaeger | fungi, regarding approving - could you approve https://review.openstack.org/73690 - and remember to delete openstack-api-ref since we go from maven to freestyle. | 15:49 |
jeblair | or rsync if we can swing it | 15:49 |
mordred | annegentle: uhm. - I have no idea what a jekyll is | 15:49 |
*** fbo is now known as fbo_away | 15:50 | |
Ajaeger | jeblair: can we rsync over ssh? | 15:50 |
fungi | a release name candidate which unfairly lost the poll | 15:50 |
annegentle | mordred: http://jekyllrb.com/docs/usage/ | 15:50 |
*** gpocentek has joined #openstack-infra | 15:50 | |
mordred | annegentle: ok. so it's just like doing it with the maven build from what I can see | 15:50 |
jeblair | annegentle: i don't think the rendering systems affects the underlying problem. | 15:50 |
annegentle | mordred: yeah | 15:50 |
mordred | yeah. what jeblair said | 15:50 |
annegentle | jeblair: that's true | 15:50 |
fungi | Ajaeger: we can rsync over ssh but need a functional shell for that (just scp/sftp access won't help) and the destination needs rsync installed | 15:50 |
Ajaeger | fungi: Yeah, indeed. | 15:51 |
mordred | from our side, we don't really care if you use maven or jekyll - other than wanting to make sure that jekyll is installable and not just hipster crap | 15:51 |
annegentle | fungi: Ajaeger: moving to jekyll would be a good reason to get off Cloud Sites, which would give us shell access | 15:51 |
jeblair | mordred: ++ | 15:51 |
Ajaeger | What is Jekyll` | 15:51 |
*** sarob has joined #openstack-infra | 15:51 | |
mordred | annegentle: if jekyll incentivizes moving off of cloud sites, I'm all for it | 15:51 |
annegentle | Ajaeger: http://jekyllrb.com/docs/usage/ | 15:51 |
jeblair | annegentle: we can move off of cloud sites and switch to scp or rsync independent of when/if you switch to jekyll | 15:51 |
mordred | but also, what jeblair said | 15:51 |
annegentle | mordred: jeblair: get Todd Morey to get the design done :) | 15:51 |
anteaya | Ajaeger: it is a templating language made up by tom of github | 15:51 |
fungi | rb is an abbreviation for hipster crap in japanese, right? | 15:52 |
anteaya | Ajaeger: used widely in the ruby community | 15:52 |
mordred | fungi: ++ | 15:52 |
fungi | ;) | 15:52 |
annegentle | jeblair: not to me, the two are directly related because I don't get a marked-enough improvement | 15:52 |
jeblair | annegentle: no, i'm saying we don't need to wait for that. we have other reasons we need to change how the docs are published | 15:52 |
annegentle | jeblair: in other words, I'm not willing to risk all the changes without a killer redesign | 15:52 |
annegentle | jeblair: Don't wanna. :) | 15:52 |
jeblair | annegentle: i'm sorry, we need to move and it can't wait for todd. we need to make the publishing pipeline better. :) | 15:53 |
annegentle | Ajaeger: jeblair: we can of course revisit at the summit and get a game plan for moving off of Cloud Sites, but right now there's not enough incentive | 15:53 |
Ajaeger | Thanks for the explanations about Jekyll - let's see how this integrates with our XML publishing | 15:53 |
anteaya | Ajaeger: I'm betting it won't | 15:53 |
anteaya | ruby doesn't integrate | 15:53 |
annegentle | Ajaeger: to me it lets us stop publishing "webhelp" and publish plain html (or xhtml) | 15:53 |
anteaya | that is a point of pride for ruby | 15:53 |
Ajaeger | annegentle: We still have the option of remotely running a job that deletes old files. | 15:53 |
annegentle | Ajaeger: yeah | 15:53 |
*** sandywalsh_ has quit IRC | 15:53 | |
mordred | I think we've confused about four different conversations here | 15:53 |
annegentle | jeblair: I'm fine with not waiting on todd but need more incentive | 15:53 |
Ajaeger | annegentle: something to discuss in Atlanta I guess | 15:54 |
annegentle | mordred: that's four more fun! | 15:54 |
mordred | the incentive is that docs publication is a special pony right now | 15:54 |
* anteaya would like to focus on the fire fighting in the gate | 15:54 | |
mordred | and cloud sites are a bit of a pita to deal with | 15:54 |
annegentle | mordred: not enough incentive | 15:54 |
*** tjones has joined #openstack-infra | 15:54 | |
annegentle | mordred: not with a month and a week before an rc | 15:54 |
annegentle | mordred: mostly it's timing | 15:54 |
jeblair | annegentle: for starters, we have the problem we're talking about now where you have to delete things; but moreover, we need to stop using jenkins, and the ftp publishing is not really compatible with what we're moving to | 15:54 |
jeblair | annegentle: we're not doing it now! :) | 15:54 |
mordred | what jeblair said | 15:55 |
annegentle | jeblair: yes then timing is all I'm concerned with. | 15:55 |
mordred | god no. not this instant | 15:55 |
Ajaeger | mordred: documentation will always be special ;) But yeah, let's make it less special :) | 15:55 |
annegentle | jeblair: when does jenkins go away | 15:55 |
mordred | ok. that makes more sense | 15:55 |
*** dcramer_ has joined #openstack-infra | 15:55 | |
mordred | annegentle: as soon as we can make it go away, which means we need to get rid of a few things, like ftp publishing | 15:55 |
annegentle | a redesign is HIGH priority too because of translation, versioning, old files, all that | 15:55 |
mordred | but - when I say "as soon" - I mean without affecting things like FF | 15:56 |
*** vrovachev has joined #openstack-infra | 15:56 | |
Ajaeger | jeblair: so, one part of moving off Jenkins - and fixing a bug with image api publishing - is getting https://review.openstack.org/73690 in ;) | 15:56 |
annegentle | not trying to conflate redesign with building, but to me they're tightly tied due to what all a redesign can also fix | 15:56 |
jeblair | annegentle: don't worry, the process will be working and tested and in use and in production before we move the docs | 15:56 |
annegentle | jeblair: you know I trust you guys, just trying to make sure you know the importance of a redesign (since you work at the Foundation I tell you these things too) | 15:56 |
jeblair | annegentle: we should keep these things in mind so that the two projects don't make incompatible decisions, but for the most part, they really are separate and we shouldn't tie one to the other -- it could just slow both down | 15:57 |
jeblair | annegentle: i wish i could make todd go faster. :) | 15:57 |
*** amotoki_ has joined #openstack-infra | 15:58 | |
jeblair | annegentle: but he's constantly getting sucked into side projects, and so the larger project of "improve how the website (all of it) is published" seems to move slowly :( | 15:58 |
jeblair | annegentle: believe me, i'm as interested in todd completing this kind of work as you are. | 15:58 |
annegentle | jeblair: don't we all :) Yes, it's a tough rock/hard place position | 15:58 |
*** banix has joined #openstack-infra | 15:59 | |
annegentle | jeblair: and I'm happy to be convinced of a 2-phase approach, pulling too many levers at once is probably folly | 15:59 |
annegentle | phase 1: un-jenkins phase 2: remove webhelp output | 15:59 |
*** salv-orlando has joined #openstack-infra | 16:00 | |
jeblair | annegentle: cool; we're about 2 years into taking baby steps to remove jenkins and getting near the end. we _try_ to not bite off more than we can chew. | 16:01 |
fungi | if you can dislocate your jaw like a snake, it helps too | 16:02 |
fungi | no chewing required | 16:03 |
persia | Digestion takes longer that way, though | 16:03 |
jeblair | fungi: the graph suggests we have used nodes! | 16:03 |
*** salv-orlando_ has joined #openstack-infra | 16:03 | |
*** salv-orlando has quit IRC | 16:04 | |
*** salv-orlando_ is now known as salv-orlando | 16:04 | |
fungi | jeblair: yep. i'm still churning through the ready ones, but should start blowing through the old building/delete nodes here shortly | 16:04 |
*** dpyzhov has joined #openstack-infra | 16:05 | |
*** amcrn has joined #openstack-infra | 16:05 | |
fungi | the gate sparkline has dropped sharply and check has at least plateaued now | 16:05 |
fungi | so we've regained forward momentum | 16:05 |
anteaya | will that neutron failure cause a gate reset? | 16:06 |
anteaya | I can remove it if so | 16:06 |
anteaya | I'm thinking it won't | 16:06 |
jeblair | anteaya: zuul already did that for you | 16:06 |
anteaya | great | 16:07 |
fungi | it caused a gate reset a while ago, but the benefit of nnfi is that it shields us somewhat from the pain of resets as long as there aren't multiple failing changes causing the broken ones further down to get retried repeatedly | 16:07 |
*** sandywalsh_ has joined #openstack-infra | 16:07 | |
fungi | the state of that change is "failed assuming everything in front of it succeeds" | 16:07 |
anteaya | yes | 16:07 |
openstackgerrit | Petr Blaho proposed a change to openstack-infra/config: Adds gate-python-tuskarclient-docs job to zuul. https://review.openstack.org/74757 | 16:08 |
openstackgerrit | Petr Blaho proposed a change to openstack-infra/config: Adds gate-tuskar-docs job to zuul. https://review.openstack.org/74756 | 16:08 |
anteaya | and the failure was FAIL: process-returncode | 16:08 |
jeblair | only about 2k more keypairs to delete from hpcloud | 16:08 |
fungi | jeblair: nearly done! | 16:08 |
*** roz has joined #openstack-infra | 16:08 | |
anteaya | which we have seen before and which clarkb has said in an email is due to sys.exit being used in the tests | 16:08 |
anteaya | it is the line between what zuul can do for me and what I need to do manually that I am trying to get better at figuring out | 16:09 |
anteaya | now the one ahead of it is failing | 16:09 |
anteaya | so two of them | 16:09 |
fungi | yep, but it depends on that one, so it won't get retried unless something ahead of those also fails | 16:10 |
fungi | but that has caused the two cinder changes in the gate to get tests restarted without the neutron changes in line | 16:10 |
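A toy illustration of that queue behaviour (not Zuul's actual implementation): each change is tested on top of the nearest non-failing changes ahead of it, so known failures are simply skipped when rebuilding the speculative state for the changes behind them.

    def speculative_bases(queue, failed):
        """For each change, the changes ahead of it that its test run
        assumes will merge; failing changes are skipped over."""
        bases, good_ahead = {}, []
        for change in queue:
            bases[change] = list(good_ahead)
            if change not in failed:
                good_ahead.append(change)
        return bases

    queue = ['neutron-1', 'neutron-2', 'cinder-1', 'cinder-2']
    print(speculative_bases(queue, failed={'neutron-1', 'neutron-2'}))
    # the cinder changes are retested without either neutron change ahead of them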
fungi | i think i have time today to try and get the py3k-precise nodepool nodes into operation | 16:11 |
fungi | if things don't get bad again | 16:12 |
roz | quick question: can only the owner of a change mark it as WIP? I am working on a change where I am not the original author and I'd like to submit a patch as WIP but I can't do it. Options: submit it as DRAFT, or submit the patch and put a note in a comment "This is a WIP"? any other suggestions? | 16:12 |
*** salv-orlando has quit IRC | 16:12 | |
anteaya | roz: don't use draft | 16:12 |
fungi | roz: there's an acl which controls that. for most openstack projects the core reviewer group on that project also has wip control | 16:12 |
*** yassine has quit IRC | 16:13 | |
*** yassine has joined #openstack-infra | 16:13 | |
fungi | roz: but i think the heart of the issue here is that gerrit leaves the change owner as the original patchset submitter rather than the most recent patchset submitter. i'm curious to see whether that's configurable in latest gerrit releases | 16:14 |
fungi | zaro: ^ ? (when you're around) | 16:14 |
*** Sukhdev has joined #openstack-infra | 16:14 | |
*** DinaBelova_ is now known as DinaBelova | 16:15 | |
roz | thanks for the replies. When you say a core reviewer can control the WIP, do you mean they can mark a change as WIP or that they can add me as "WIP controller" for that specific change? | 16:15 |
anteaya | they can mark the change WIP | 16:16 |
anteaya | they can't change what permissions you have | 16:16 |
fungi | roz: or un-wip a wip change too | 16:16 |
anteaya | unless they make you core | 16:16 |
roz | thanks, now it's clear. | 16:16 |
fungi | roz: right, it's an acl covering wip control for an entire project--can't be assigned on a per-change basis except by modifying the owner of the change (which i don't think our current gerrit release has a feature to make that easy) | 16:17 |
*** pcrews has joined #openstack-infra | 16:19 | |
fungi | most of the old ready nodes are gone, and i've started some processes deleting old building nodes next | 16:19 |
fungi | oh, also, i was wrong about disappearing at 21:00 today for the osug monthly... i also have a tax appointment prior to that, so will actually be mostly offline starting at 19:00 utc | 16:20 |
anteaya | happy tax appointment | 16:20 |
fungi | so a little over a couple hours from now | 16:20 |
anteaya | I hope you exit smiling | 16:20 |
*** rossella has joined #openstack-infra | 16:21 | |
anteaya | will clarkb be back today? | 16:21 |
*** mrmartin has quit IRC | 16:21 | |
fungi | anteaya: i believe he was back in seattle last night | 16:21 |
*** tjones has quit IRC | 16:21 | |
anteaya | yes | 16:21 |
*** rossella has quit IRC | 16:22 | |
anteaya | I'm also gone for the day soon | 16:22 |
anteaya | another appointment to fix my back/neck/head | 16:22 |
anteaya | hopefully this should wrap it up | 16:22 |
*** rossella-s has quit IRC | 16:23 | |
vrovachev | hi guys, please, review me: https://review.openstack.org/#/c/74342/ | 16:24 |
*** afazekas has joined #openstack-infra | 16:24 | |
*** david_lyle_ has joined #openstack-infra | 16:24 | |
anteaya | does the post job upstream-translation-update require a specific kind of node? lots of post jobs waiting for that job to get a node | 16:25 |
*** sarob has quit IRC | 16:25 | |
anteaya | the only specific node I am aware of is centos for python26 jobs | 16:26 |
fungi | anteaya: yes, there is a trusted static node named "proposal" assigned to jenkins.o.o which runs those, in order, one at a time | 16:26 |
fungi | so they have a tendency to queue up | 16:26 |
anteaya | ah | 16:26 |
fungi | oh! and it got marked offline | 16:26 |
anteaya | I don't see any of them running | 16:26 |
anteaya | okay | 16:27 |
fungi | looks like i need to patch the regex for the jobs which run on it. checking to see what it ran last | 16:27 |
anteaya | k | 16:27 |
*** rossella-s has joined #openstack-infra | 16:27 | |
*** jcoufal has quit IRC | 16:28 | |
*** david-lyle has quit IRC | 16:28 | |
*** yassine has quit IRC | 16:28 | |
*** yassine has joined #openstack-infra | 16:29 | |
*** chuck__ has joined #openstack-infra | 16:29 | |
fungi | seems i missed setting propose-requirements-updates to add the reusable_node parameter function. i've re-onlined the slave so it should burn through those fairly quickly unless it hits another propose-requirements-update job before we merge the fix | 16:29 |
fungi | er, propose-requirements-updates | 16:30 |
anteaya | okay | 16:30 |
anteaya | I'll watch it | 16:30 |
anteaya | one down | 16:31 |
*** Ajaeger has quit IRC | 16:31 | |
*** esker has quit IRC | 16:32 | |
*** esker has joined #openstack-infra | 16:32 | |
anteaya | fungi: how long does it need between jobs? | 16:35 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/nodepool: Check server status in batch https://review.openstack.org/74773 | 16:35 |
openstackgerrit | Jeremy Stanley proposed a change to openstack-infra/config: Don't offline after propose-requirements-updates https://review.openstack.org/74774 | 16:35 |
jeblair | fungi, clarkb, mordred: https://review.openstack.org/74773 is another fairly small nodepool change that should make a huge difference | 16:36 |
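(The gain from 74773 comes from replacing per-node status lookups with a single listing call. A minimal sketch of that batching idea, assuming a python-novaclient `Client` object; the function names are illustrative and not nodepool's actual code:)

```python
def server_statuses(nova):
    # One GET /servers/detail returns every instance in the account, so
    # checking N nodes costs one API round trip instead of N.
    return {server.id: server.status for server in nova.servers.list()}

def check_nodes(nova, node_ids):
    statuses = server_statuses(nova)
    # Anything missing from the listing has presumably been deleted already.
    return {node_id: statuses.get(node_id, 'DELETED') for node_id in node_ids}
```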
anteaya | fungi: it is not currently running | 16:36 |
fungi | anteaya: looks like it takes about 1-1.5 minutes per job... https://jenkins.openstack.org/computer/proposal.slave.openstack.org/ | 16:36 |
anteaya | to finish the job | 16:36 |
anteaya | but to move from one to the other? | 16:36 |
fungi | anteaya: a few seconds | 16:36 |
*** sandywalsh_ has quit IRC | 16:36 | |
anteaya | okay, can you check it again | 16:36 |
anteaya | it isn't running, been at least 20 seconds | 16:37 |
*** esker has quit IRC | 16:37 | |
fungi | anteaya: it's running | 16:37 |
fungi | i just watched it complete a glance translation update and start a ceilometer one | 16:37 |
*** tjones has joined #openstack-infra | 16:38 | |
anteaya | I don't even see a glance patch in post | 16:38 |
anteaya | but at least it is running | 16:38 |
*** gokrokve has quit IRC | 16:38 | |
anteaya | thanks for the link | 16:38 |
fungi | anteaya: it might not have been in the post queue | 16:38 |
anteaya | oh | 16:38 |
anteaya | I had just been watching the post queue | 16:38 |
*** smarcet has quit IRC | 16:38 | |
fungi | it was in the periodic pipeline... https://jenkins.openstack.org/job/glance-propose-translation-update/325/parameters/ | 16:38 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/nodepool: Check server status in batch https://review.openstack.org/74773 | 16:39 |
*** gokrokve has joined #openstack-infra | 16:39 | |
dmsimard | jeblair: Got another merge fail on that same git bug again, is there another affected slave ? https://review.openstack.org/#/c/74082/ | 16:39 |
anteaya | ah, okay thanks | 16:40 |
SergeyLukjanov | jeblair, fwiw #2 lgtm (https://review.openstack.org/74773) | 16:40 |
fungi | dmsimard: looks like it might be a broken repository on one of the git farm servers. i'll double-check that | 16:40 |
dmsimard | fungi: Thanks, I submitted https://bugs.launchpad.net/openstack-ci/+bug/1282136 FYI | 16:40 |
uvirtbot | Launchpad bug 1282136 in openstack-ci "Git problem: "Failed to resolve 'HEAD' as a valid ref."" [Undecided,New] | 16:40 |
*** ravikumar_hp has joined #openstack-infra | 16:41 | |
ravikumar_hp | quick question - What is Jenkins URL that runs nightly jobs | 16:42 |
anteaya | ravikumar_hp: we have 7 jenkins | 16:42 |
anteaya | they all run jobs | 16:42 |
SergeyLukjanov | anteaya, 8 | 16:42 |
SergeyLukjanov | :) | 16:43 |
*** gyee has joined #openstack-infra | 16:43 | |
fungi | ravikumar_hp: http://logs.openstack.org/periodic/ | 16:43 |
ravikumar_hp | ok. Jenkins that runs Tempest nightly jobs | 16:43 |
anteaya | yes 8 | 16:43 |
*** gokrokve has quit IRC | 16:43 | |
fungi | anteaya: SergeyLukjanov: technically 9 if you also count jenkins-dev | 16:43 |
anteaya | yes | 16:43 |
SergeyLukjanov | yeah ^) | 16:43 |
fungi | though for tempest periodic jobs, only 7 of them run those | 16:44 |
anteaya | I was going to move to figuring out what ravikumar_at_mothership wanted | 16:44 |
*** andreaf has joined #openstack-infra | 16:44 | |
anteaya | but you beat me to it | 16:44 |
ravikumar_hp | anteaya: i am trying to find out if there is a Tempest job that runs every day other than the gated tests | 16:45 |
jeblair | dmsimard: thanks! i was quiet because i was quickly sshing into that slave, which hadn't been deleted | 16:45 |
jeblair | dmsimard: it does indeed look like the cached git repo for puppet-swift on that node was bad; i saved a copy of it | 16:46 |
dmsimard | jeblair: We ran into the same issue for puppet-neutron, I did a recheck and it worked - I linked it to the bug | 16:46 |
fungi | jeblair: dmsimard: yes, my eye jumped to the remote update, but the git farm's copies of that repo seem fine | 16:46 |
fungi | and 'git clone file:///opt/git/stackforge/puppet-swift' was the local source of the issue | 16:47 |
fungi | wonder if it's bad on the image in that provider region | 16:47 |
jeblair | fungi: yeah, i'm going to check that next | 16:47 |
*** beagles has quit IRC | 16:47 | |
*** dkliban is now known as dkliban_afk | 16:47 | |
*** virmitio has joined #openstack-infra | 16:48 | |
anteaya | ravikumar_hp: I'm looking here: http://git.openstack.org/cgit/openstack-infra/config/tree/modules/openstack_project/files/zuul/layout.yaml | 16:48 |
*** tjones has quit IRC | 16:48 | |
anteaya | I don't have the answer yet, but you are welcome to join me | 16:48 |
jeblair | dmsimard, fungi: both failures were puppet-swift in az2; | 16:48 |
jeblair | dmsimard: did you say you saw a puppet-neutron as well? | 16:48 |
fungi | jeblair: sounds like a strong correlation | 16:48 |
fungi | sample size 2 ;) | 16:49 |
*** sandywalsh_ has joined #openstack-infra | 16:49 | |
*** oubiwann has quit IRC | 16:49 | |
dmsimard | jeblair: Yeah, I linked it, it's on http://status.openstack.org/rechecks/ - review is https://review.openstack.org/#/c/74709/ | 16:49 |
jeblair | also az2 | 16:49 |
ravikumar_hp | anteaya: ok. Thanks | 16:49 |
anteaya | ravikumar_hp: and in the periodic logs that fungi linked you to: http://logs.openstack.org/periodic/ | 16:50 |
anteaya | you can see all the periodic tempest job logs | 16:50 |
ravikumar_hp | anteaya: ok | 16:50 |
*** jaypipes has joined #openstack-infra | 16:50 | |
fungi | jeblair: also, last successful build of that image was 181.65 hours ago | 16:50 |
anteaya | ravikumar_hp: did you have more to your question or does that give you the information you need? | 16:50 |
fungi | hpcloud-az2 really does not like to build images | 16:51 |
jeblair | no it doesn't | 16:51 |
anteaya | I need to change tasks and don't want to leave you hanging | 16:51 |
*** tjones has joined #openstack-infra | 16:51 | |
*** b3nt_pin has joined #openstack-infra | 16:51 | |
ravikumar_hp | anteaya: i got the information. That's it .Thanks . | 16:51 |
*** sarob has joined #openstack-infra | 16:51 | |
anteaya | ravikumar_hp: great | 16:51 |
*** b3nt_pin is now known as beagles | 16:51 | |
*** beagles is now known as beagles_brb | 16:52 | |
*** hemnafk is now known as hemna_ | 16:53 | |
*** smarcet has joined #openstack-infra | 16:53 | |
*** sarob_ has joined #openstack-infra | 16:53 | |
*** tjones has quit IRC | 16:54 | |
*** markmcclain has joined #openstack-infra | 16:54 | |
*** tjones has joined #openstack-infra | 16:55 | |
jeblair | fungi: the git repos with the latest timestamps are all bad. perhaps we didn't call sync three times while spinning in a circle. | 16:55 |
*** esker has joined #openstack-infra | 16:55 | |
dmsimard | lol ? | 16:55 |
fungi | that takes me back | 16:56 |
*** sarob has quit IRC | 16:56 | |
medieval1 | XYZZY | 16:56 |
*** esker has quit IRC | 16:56 | |
fungi | sync ; sleep 10 ; sync ; sleep 10; sync; sleep 10 ; shutdown -h now | 16:56 |
jeblair | fungi: prepare_devstack has a sync, but not prepare_node, which is where the clones are | 16:56 |
fungi | PLUGH | 16:56 |
*** sabari_ has joined #openstack-infra | 16:57 | |
jeblair | (and you can bet the sync is in prepare_devstack because it's needed) | 16:57 |
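(The actual prepare scripts are shell, but the shape of the fix proposed in 74780 is simple: flush filesystem buffers after populating the git cache, before the provider snapshots the image. A hedged Python rendering with illustrative paths, not the real script:)

```python
import os
import subprocess

def prepare_git_cache(repos, cache_root='/opt/git'):
    for repo in repos:
        subprocess.check_call(
            ['git', 'clone', 'https://git.openstack.org/%s' % repo,
             os.path.join(cache_root, repo)])
    # Without this, a snapshot taken right after the clones can capture
    # half-written pack files -- the "Failed to resolve 'HEAD'" symptom above.
    subprocess.check_call(['sync'])
```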
annegentle | fungi: so I know you said the merge would look funny, but https://review.openstack.org/#/c/74777/ is the result of my trying to merge openstack/operations-guide feature/edits with master | 16:57 |
annegentle | fungi: seems to be blank (no changes, just a commit message) | 16:57 |
*** sabari_ is now known as sabari | 16:57 | |
fungi | annegentle: sometimes it's funnier. but it's never a laugh riot or anything | 16:57 |
annegentle | fungi: *snort* | 16:57 |
*** e0ne has quit IRC | 16:58 | |
dmsimard | jeblair, fungi: You guys let me know when to try a reverify :) | 16:58 |
annegentle | fungi: I followed the steps in https://wiki.openstack.org/wiki/GerritJenkinsGit#Merge_Commits with git checkout -b oreilly/71943 remotes/origin/feature/edits as my first step | 16:58 |
*** ociuhandu has quit IRC | 16:58 | |
fungi | annegentle: anyway, if your ops guide jobs include draft building, you should be able to preview the result there from the check run before approving | 16:58 |
annegentle | fungi: it really is supposed to look like that? | 16:59 |
* annegentle is freaked out :) | 16:59 | |
fungi | annegentle: usually, yes | 16:59 |
annegentle | fungi: no way. Okay! | 16:59 |
annegentle | I'll wait for it to build then! Nice | 16:59 |
*** jcooley_ has joined #openstack-infra | 16:59 | |
fungi | annegentle: the critical part is the "parent(s)" field there... you can see it lists the commits you're merging | 16:59 |
jeblair | dmsimard: at this point, i think the lack of sync is the problem in the image build. i'll fix it but it'll take a few hours to work through the system; you can play the odds if you like, or come back to it later in the day for better odds. | 17:00 |
jeblair | dmsimard: considering the state of the backlog, if you can do the latter, that would probably be best | 17:00 |
fungi | annegentle: and also the "branch" field which tells you which branch you're merging them on | 17:00 |
jeblair | dmsimard: i'll update the bug in a minute; thank you very much for catching this and pointing me at a live server! | 17:00 |
fungi | annegentle: presumably the two parents are one from each branch you're trying to merge | 17:00 |
annegentle | fungi: okay, I see parents now. | 17:00 |
*** dkliban_afk is now known as dkliban | 17:00 | |
annegentle | fungi: so I've got a spreadsheet with all the patches I need to go to feature/edits at https://docs.google.com/spreadsheet/ccc?key=0AhXvn1h0dcyYdGtiRXo5ODFMbkhRZkVROGdTY3RjWVE#gid=0 and I'll just go through the list from oldest to newest | 17:01 |
*** dpyzhov has quit IRC | 17:01 | |
annegentle | fungi: and I think that helps me figure out parentage | 17:01 |
annegentle | fungi: woops, gotta get on a call, thanks for the help! | 17:01 |
dmsimard | jeblair: np, I appreciate you fixing the issue - i'm the one thanking you, here :p | 17:01 |
*** jaypipes has quit IRC | 17:02 | |
BobBall | I've managed to break my environment around pip and pbr while playing with nodepool... I may have deleted something I shouldn't have done. http://paste.openstack.org/show/67315/ Can anyone suggest how I can uninstall / reinstall PBR in a sensible way? | 17:03 |
*** thomasbiege has joined #openstack-infra | 17:03 | |
*** derekh has quit IRC | 17:03 | |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Add sync calls to all prepare scripts https://review.openstack.org/74780 | 17:03 |
jeblair | oh wait let me attach the bug to that | 17:04 |
*** jaypipes has joined #openstack-infra | 17:04 | |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Add sync calls to all prepare scripts https://review.openstack.org/74780 | 17:05 |
*** jcooley_ has quit IRC | 17:05 | |
*** markmc has quit IRC | 17:06 | |
jeblair | BobBall: delete the virtualenv and start over? | 17:07 |
BobBall | I wish I had done this in a virtual environment... :P | 17:07 |
jeblair | oh, i thought /usr/workspace/scratch/openstack/citrix/nodepool/easy-install.pth was in a venv | 17:07 |
BobBall | It might have been at one point - I'm still getting used to using venvs by default, and so perhaps the issue is I might have installed it in a venv and then done something else outside the venv that broke it or similar | 17:08 |
BobBall | that file doesn't exist | 17:08 |
*** zul has quit IRC | 17:08 | |
*** chuck__ has quit IRC | 17:08 | |
*** zul has joined #openstack-infra | 17:09 | |
BobBall | the venv environment I have works great - but I'm trying to fix my system so I don't have to be in a venv to use novaclient :P | 17:09 |
jeblair | BobBall: then at this point i usually go mucking about and try to remove things manually | 17:09 |
jeblair | BobBall: it's possible mordred may have better advice | 17:09 |
jeblair | mordred: btw, do know the relative merits of https://review.openstack.org/#/c/74521/ vs https://review.openstack.org/#/c/74523/ ? | 17:10 |
SergeyLukjanov | jeblair, all scripts are based on prepare_node, is it 'as designed' to sync twice? | 17:10 |
fungi | jeblair: so... there are jobs running, but not very many. looking at the jenkins masters' webuis, some have no assigned nodes at all, some have nodes but they're all marked offline, some have nodes running jobs but none have a bunch | 17:10 |
*** jcooley_ has joined #openstack-infra | 17:10 | |
jeblair | SergeyLukjanov: yes, so that we don't have to think about it. :) | 17:11 |
SergeyLukjanov | jeblair, can't disagree ;) | 17:11 |
fungi | nodepoold even after restarting and deleting everything, seems to have 574 nodes on jenkins04 | 17:12 |
*** sandywalsh_ has quit IRC | 17:12 | |
jeblair | fungi: oh, but those were false-ready nodes | 17:12 |
jeblair | fungi: and probably need to be deleted | 17:12 |
jeblair | fungi: i think nodepool marks them ready _before_ adding them to jenkins | 17:12 |
fungi | jeblair: i deleted any nodes which were marked ready at the time of the restart | 17:12 |
jeblair | oh | 17:13 |
fungi | so these are all new since the restart | 17:13 |
*** sarob_ has quit IRC | 17:13 | |
*** cadenzajon has joined #openstack-infra | 17:13 | |
*** sarob has joined #openstack-infra | 17:13 | |
*** amotoki_ has quit IRC | 17:14 | |
fungi | http://paste.openstack.org/show/67316/ is the current breakdown for jenkins04 according to nodepool | 17:14 |
fungi | skimming its webui, i believe the used and delete counts, but not the ready | 17:15 |
jeblair | fungi: i think the jenkins04 manager is stuck again waiting for a response | 17:18 |
jeblair | fungi: nodepool reports the connection ESTABLISHED but it doesn't show up on jenkins04 | 17:18 |
*** sarob has quit IRC | 17:18 | |
fungi | maybe jenkins04 is struggling? i can put it into shutdown and see what happens when i restart nodepoold | 17:19 |
jeblair | fungi: hrm, i wouldn't expect a half-closed connection as a result of that | 17:20 |
fungi | or maybe this is ongoing network issues in dfw | 17:20 |
jeblair | fungi: seems more likely that we're losing fin packets | 17:20 |
*** mrmartin has joined #openstack-infra | 17:21 | |
openstackgerrit | A change was merged to openstack/requirements: Sync requirements to oslo.vmware https://review.openstack.org/74569 | 17:22 |
jeblair | fungi: nm, jenkins04 is moving | 17:23 |
fungi | this looks decidedly non-graceful... http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=1411&rra_id=all | 17:23 |
*** chandankumar_ has joined #openstack-infra | 17:24 | |
fungi | checking to see whether there's anything obvious from the javamelody side | 17:25 |
jeblair | fungi: there are a lot of offline nodes on jenkins04 | 17:25 |
fungi | https://jenkins04.openstack.org/monitoring?part=graph&graph=fileDescriptors | 17:25 |
fungi | jeblair: yeah, it seems like nodepool may be having trouble adding or deleting nodes from jenkins04 | 17:25 |
*** khyati has joined #openstack-infra | 17:26 | |
*** nicedice has joined #openstack-infra | 17:26 | |
openstackgerrit | A change was merged to openstack-infra/config: Don't offline after propose-requirements-updates https://review.openstack.org/74774 | 17:26 |
fungi | i wonder if it's hitting an open file descriptors limit | 17:26 |
jeblair | fungi: nodepool's interactions with it are _very_ slow | 17:26 |
jeblair | fungi: so maybe it is struggling | 17:26 |
*** chandankumar_ has quit IRC | 17:27 | |
jeblair | 5-10 seconds between api calls | 17:27 |
fungi | and the number of open files flatlining at 4k for long periods seems suspiciously like a max | 17:28 |
*** sandywalsh_ has joined #openstack-infra | 17:28 | |
jeblair | fungi: jenkins04 has 413 slaves attached to it which is considerably more than our intent | 17:28 |
fungi | right. just wondering what caused it to get so many new slaves assigned after the nodepool restart | 17:28 |
fungi | it was similarly the one with most of the slaves before the nodepool restart (though i deleted those). could the predictive assignment in nodepool be misinterpreting that? | 17:30 |
fungi | thinking it wants to run that many jobs? | 17:30 |
jeblair | fungi: it tries to balance across all providers that are up. that does mean that if a provider comes up with 0 nodes, it's going to try to catch it up to the others quickly | 17:31 |
fungi | that sounds like the reverse of what we see here then | 17:32 |
*** gokrokve has joined #openstack-infra | 17:32 | |
fungi | so maybe it's not a feedback loop problem | 17:32 |
*** gyee has quit IRC | 17:33 | |
*** oubiwann has joined #openstack-infra | 17:33 | |
jeblair | fungi: which restart are you thinking of? it never did a mass-allocation to 04 around 15:30 | 17:34 |
fungi | no, i think it builds back up on 04 | 17:34 |
*** beagles_brb is now known as beagles | 17:35 | |
fungi | the established tcp connections and open file descriptor graphs look like they might be proportional to the number of connected slaves | 17:35 |
jeblair | fungi: it seems that even nodepool thinks 04 has all the slaves | 17:35 |
fungi | they start to ramp up almost linearly from the time of the nodepool restart | 17:35 |
jeblair | just about | 17:36 |
*** jcoufal has joined #openstack-infra | 17:36 | |
markmcclain | with the current jenkins issue, is it expected to lose a release job? | 17:37 |
markmcclain | I pushed this tag: http://git.openstack.org/cgit/openstack/python-neutronclient/tag/?id=2.3.4 | 17:37 |
markmcclain | but the release job disappeared from zuul | 17:37 |
jeblair | fungi: okay, i'm going to load the db locally and debug the allocator. in the mean time, why don't you put 04 into shutdown and see if it redistributes after that. | 17:40 |
fungi | markmcclain: looks like jenkins04 ate it... http://logs.openstack.org/59/5931316dd7cddd6834eed6bd9665bd5ef7adbffc/release/python-neutronclient-tarball/0f27e48/console.html | 17:40 |
fungi | jeblair: will do. that was going to be my next suggestion | 17:40 |
fungi | markmcclain: i'll retrigger it once we get this going again | 17:41 |
*** pblaho has quit IRC | 17:41 | |
markmcclain | fungi: thanks | 17:41 |
fungi | jeblair: manually deleting the ready/building nodes assigned to jenkins04 now that it's in shutdown | 17:42 |
*** luqas has quit IRC | 17:43 | |
jeblair | fungi: so a lot of those actually still have active threads trying to add them | 17:43 |
fungi | i can hold off if you like | 17:43 |
jeblair | fungi: it's probably okay. i think it will cause a lot of errors in the daemon, but it should be ok. | 17:44 |
jeblair | carry on | 17:44 |
fungi | proceeding in that case | 17:44 |
*** esker has joined #openstack-infra | 17:44 | |
*** wenlock has quit IRC | 17:45 | |
openstackgerrit | Devananda van der Veen proposed a change to openstack-infra/config: Let infra manage pyghmi releases https://review.openstack.org/74499 | 17:45 |
*** sandywalsh_ has quit IRC | 17:46 | |
*** sarob has joined #openstack-infra | 17:46 | |
clarkb | morning | 17:47 |
*** basha has joined #openstack-infra | 17:48 | |
*** hashar has quit IRC | 17:48 | |
*** packet has joined #openstack-infra | 17:48 | |
*** Ryan_Lane has quit IRC | 17:49 | |
*** max_lobur is now known as max_lobur_afk | 17:50 | |
fungi | clarkb: welcome to the continuation of "what can possibly break next?" | 17:51 |
clarkb | jenkins04 is in trouble? | 17:51 |
jeblair | clarkb: nodepool is being mean to it | 17:52 |
fungi | as software goes, nodepool really can be a bit of a bully | 17:52 |
*** markwash has joined #openstack-infra | 17:58 | |
anteaya | morning clarkb | 17:58 |
*** mrmartin has quit IRC | 17:59 | |
*** sandywalsh_ has joined #openstack-infra | 17:59 | |
*** dangers_away is now known as dangers | 18:00 | |
*** rossella-s has quit IRC | 18:00 | |
openstackgerrit | Henry Gessau proposed a change to openstack-infra/config: Incompatible chrome extension has been fixed https://review.openstack.org/74796 | 18:00 |
*** jpich has quit IRC | 18:02 | |
*** thomasbiege has quit IRC | 18:05 | |
*** hogepodge has joined #openstack-infra | 18:05 | |
clarkb | fungi: to answer your question, elasticsearch. The cluster fell over around 0842UTC today | 18:07 |
clarkb | I have restarted elasticsearch6, which was the only node not back in the cluster at this point, and ES is recovering shards to go back to all green | 18:08 |
fungi | clarkb: that was not a good time for, well, anything running in dfw i suspect | 18:08 |
*** david_lyle_ is now known as david_lyle | 18:08 | |
clarkb | oh did dfw have a bad time? | 18:08 |
clarkb | I am still trying to catch up on everything, but ES is on its way to being happy again so I can move onto the next thing | 18:09 |
fungi | ahh, you probably haven't had time to read scrollback | 18:09 |
clarkb | nope | 18:09 |
anteaya | dfw had a bad time yesterday | 18:09 |
fungi | yes, rax-dfw network outage | 18:09 |
anteaya | which you came in towards the end of | 18:09 |
*** chris_johnson has joined #openstack-infra | 18:09 | |
fungi | today utc though | 18:09 |
anteaya | then we went to bed, except Sergey - credit to him for not doing anything drastic | 18:09 |
anteaya | and then dfw had problems again today | 18:10 |
anteaya | cleanup is underway | 18:10 |
fungi | apparently rax-dfw problem was just after 08:30 utc | 18:10 |
clarkb | oh that explains why my weechat derped | 18:10 |
anteaya | and debugging to see what we can do since dfw might have more problems | 18:10 |
clarkb | fungi: that lines up perfectly with ES cluster issues, I won't dig into them too deeply then. I may increase the wait for master timeout though | 18:11 |
*** ildikov_ has quit IRC | 18:11 | |
fungi | clarkb: yeah, i don't know what the exact duration was, but we can guess from gaps in cacti graphs | 18:11 |
*** chandan_kumar has quit IRC | 18:12 | |
anteaya | oh and fungi is afk for a good portion of the afternoon | 18:13 |
*** basha has quit IRC | 18:13 | |
anteaya | and so am I | 18:13 |
*** Sukhdev has quit IRC | 18:14 | |
fungi | yeah, i need to vaporize in about 45 minutes | 18:15 |
lifeless | fungi: speaking of said patches, i haven't looked at reviews yet; are either of them acceptable? | 18:16 |
fungi | lifeless: i basically haven't reviewed anything in the past 24 hours which was > 1 line long unless it was addressing an in-progress firefight | 18:17 |
lifeless | ack | 18:17 |
*** esker has quit IRC | 18:18 | |
openstackgerrit | A change was merged to openstack-infra/config: Add sync calls to all prepare scripts https://review.openstack.org/74780 | 18:18 |
fungi | okay, retriggered markmcclain's tarball job, only to discover that the authentication error it failed on the first time doesn't seem to be related to jenkins04 issues after all... got the same on jenkins05 now: http://logs.openstack.org/59/5931316dd7cddd6834eed6bd9665bd5ef7adbffc/release/python-neutronclient-tarball/0f27e48,1/console.html | 18:19 |
*** jcooley_ has quit IRC | 18:20 | |
fungi | checking logs on static.o.o | 18:20 |
*** nati_ueno has joined #openstack-infra | 18:21 | |
*** johnthetubaguy has quit IRC | 18:21 | |
clarkb | is it trying to use the credential store for scp now as well? | 18:22 |
clarkb | might explain oddness in scp'ing to tarballs if the credentials stuff changed there | 18:22 |
jeblair | lifeless: you and derekh both seemed to propose a patch that does similar things; is that correct? should we choose one or the other? | 18:22 |
fungi | nice! "Feb 19 18:09:10 static sshd[32104]: Invalid user hudson from 162.242.149.179" | 18:22 |
fungi | apparently the jenkins upgrade/downgrade has mucked with credentials | 18:23 |
jeblair | lifeless: i haven't reviewed yet, but knowing what to do with those two might help | 18:23 |
lifeless | jeblair: derekh and I independently approached the problem, now you get to choose | 18:23 |
lifeless | jeblair: I will review his; I think on his description that both approaches are valid | 18:24 |
jeblair | lifeless: ok, thanks. that will help. | 18:24 |
lifeless | jeblair: we probably can do both at the same time in fact | 18:24 |
lifeless | jeblair: though I don't know if that would be needed | 18:24 |
jeblair | belt and braces and a rope and some duct tape too? :) | 18:24 |
lifeless | yes | 18:25 |
lifeless | superglue as well | 18:25 |
ArxCruz | jeblair: regarding https://review.openstack.org/#/c/69715/ which paramiko version are you guys using? because I've tested in fedora19 and it fails because there's no get_tty argument on sshclient | 18:25 |
fungi | yep, so it definitely has the username as "hudson" in the scp publisher for tarballs.o.o on at least two of the masters so far, probably more | 18:25 |
fungi | i'll correct them | 18:26 |
*** jroovers has quit IRC | 18:26 | |
*** chris_johnson is now known as wchrisj|away | 18:26 | |
jeblair | fungi: that's very weird. | 18:26 |
*** jroovers has joined #openstack-infra | 18:26 | |
clarkb | fungi: jeblair: wouldn't be surprised if older jenkins read the config files differently | 18:26 |
fungi | jeblair: i'm making sure nothing else about that publisher got reverted. i think hudson was the name it used back before we folded it onto static.o.o | 18:27 |
openstackgerrit | A change was merged to openstack-infra/zuul: Log components starts in Zuul.Server https://review.openstack.org/66939 | 18:28 |
*** wchrisj|away is now known as chris_johnson | 18:30 | |
lifeless | ArxCruz: +1'd - I am not core in -infra in general, only in pbr | 18:30 |
ArxCruz | lifeless: ;) thanks | 18:31 |
*** chris_johnson has quit IRC | 18:32 | |
*** wchrisj has joined #openstack-infra | 18:32 | |
fungi | it also changed the target directory from /srv/static to /srv (for some reason it didn't alter any of that publisher on jenkins.o.o) | 18:32 |
*** krtaylor has quit IRC | 18:34 | |
*** krtaylor has joined #openstack-infra | 18:36 | |
*** mriedem has quit IRC | 18:37 | |
fungi | yeah, it seems to have only happened on 04-07 so maybe something to do with the way we copied in the configs for those when we built them? | 18:37 |
*** jgallard has quit IRC | 18:37 | |
clarkb | possible since they were created all at once iirc | 18:37 |
fungi | er, 03-07 | 18:37 |
*** jgrimm has quit IRC | 18:38 | |
*** jgrimm has joined #openstack-infra | 18:38 | |
clarkb | oh 03 was before 04-07 so maybe? | 18:38 |
fungi | i thought we created 01+02 at one time, 03+04 together and then 05-07 together | 18:40 |
clarkb | could be, my memory is fuzzy | 18:41 |
clarkb | that was around LCA when a bunch of stuff was happening | 18:41 |
*** mriedem has joined #openstack-infra | 18:41 | |
*** jp_at_hp has quit IRC | 18:42 | |
clarkb | I thought mordred spun up 4 new masters | 18:42 |
fungi | proposal.slave got offlined by another reqs update job before the layout.yaml change made it onto zuul, so i've brought it back online again | 18:42 |
anteaya | we had 3 before lca, and 2 more during lca | 18:42 |
fungi | oh, i guess puppet agent is still disabled on zuul anyway? | 18:42 |
*** morganfainberg_Z is now known as morganfainberg | 18:42 | |
anteaya | then mordred brought up 3 more after that | 18:42 |
*** esker has joined #openstack-infra | 18:42 | |
clarkb | fungi: must be since I thought your change merged | 18:43 |
anteaya | we had jenkins, 01 and 02 before | 18:43 |
anteaya | and 03 and 04 during lca | 18:43 |
*** dizquierdo has quit IRC | 18:43 | |
*** esker has quit IRC | 18:44 | |
anteaya | I remember since my graphic was current on the monday and stale on the tuesday | 18:44 |
*** e0ne has joined #openstack-infra | 18:44 | |
*** esker has joined #openstack-infra | 18:45 | |
*** dcramer_ has quit IRC | 18:46 | |
fungi | markmcclain: https://pypi.python.org/pypi/python-neutronclient/2.3.4 | 18:46 |
markmcclain | fungi: awesome.. thanks | 18:47 |
anteaya | lifeless: ^ | 18:47 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/nodepool: Fix typo in allocation https://review.openstack.org/74803 | 18:47 |
jeblair | fungi: ^ | 18:47 |
jeblair | fungi, clarkb: we're going to want to restart with that soon. understanding that bug leads me to believe that the distribution is currently piling up on a different jenkins | 18:48 |
fungi | jeblair: gah! | 18:48 |
fungi | and yes, i think so | 18:48 |
fungi | i however won't be around for that bit of fun, i suspect | 18:48 |
jeblair | fungi, clarkb: the behavior change that triggered that is the addition of the py3k nodes, which happened to be the last ones in the loop. since there are few of them, the distribution is rather skewed. | 18:48 |
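(For context, the allocator aims for a roughly proportional split of demand across targets, with rounding leftovers landing on whatever comes last in the loop. A toy sketch of that behaviour, illustrative only; the actual fix in 74803 is a one-character typo in nodepool's allocation code, not this function:)

```python
def split_demand(demand, capacity):
    """Split `demand` nodes across targets in proportion to capacity."""
    total = sum(capacity.values())
    if total == 0:
        return dict.fromkeys(capacity, 0)
    allocation = {}
    remaining = demand
    for name in sorted(capacity):
        share = min(remaining, capacity[name],
                    int(round(demand * capacity[name] / float(total))))
        allocation[name] = share
        remaining -= share
    # Rounding leftovers go to whoever still has room, in loop order -- which
    # is how a small target added at the end of the list (the py3k nodes) can
    # skew the distribution badly when the math is slightly off.
    for name in sorted(capacity):
        if remaining <= 0:
            break
        extra = min(remaining, capacity[name] - allocation[name])
        allocation[name] += extra
        remaining -= extra
    return allocation
```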
clarkb | reviewing now | 18:49 |
*** jroovers has quit IRC | 18:49 | |
openstackgerrit | K Jonathan Harker proposed a change to openstack-infra/config: Parameterize the status page urls https://review.openstack.org/74557 | 18:50 |
fungi | for some reason i seem to be unable to bring proposal.slave back online in the jenkins.o.o webui this time... after i click the button it just sits | 18:50 |
clarkb | jeblair: that is a fun typo | 18:51 |
fungi | now adding to the fun, i can't even get the login link on jenkins.o.o to work after logging out and trying to log back in | 18:52 |
fungi | doesn't *seem* to be the dns issue review.o.o was having along those lines yesterday though | 18:52 |
jeblair | fungi: ok if i restart nodepool? (i manually installed that) | 18:52 |
clarkb | elasticsearch is doing a slow recovery :( this is going to be like last week for ES I think | 18:52 |
fungi | jeblair: sure | 18:53 |
clarkb | fungi: I am giving jenkins.o.o and proposal a shot | 18:53 |
clarkb | but logging in seems to be unresponsive for me too | 18:53 |
clarkb | nothing in the jenkins log about it though | 18:53 |
jeblair | restarting and running deletes for nodes in building/delete state | 18:54 |
openstackgerrit | Justin Lund proposed a change to openstack/requirements: Update neutron-client minimum to 2.3.4 https://review.openstack.org/74805 | 18:55 |
*** malini has left #openstack-infra | 18:55 | |
openstackgerrit | Justin Lund proposed a change to openstack/requirements: Update neutron-client minimum to 2.3.4 https://review.openstack.org/74805 | 18:56 |
*** melwitt has joined #openstack-infra | 18:56 | |
*** oubiwann has quit IRC | 18:56 | |
jeblair | all the keypairs are deleted | 18:58 |
anteaya | yay | 18:58 |
anteaya | did it take 13 hours? | 18:59 |
jeblair | anteaya: about | 18:59 |
anteaya | well now we have that datapoint | 18:59 |
clarkb | fungi: apache is throwing proxy timeout errors when trying to log in | 18:59 |
*** dcramer_ has joined #openstack-infra | 18:59 | |
fungi | okay, i'm headed out. i'll get online from when/where i can over the next ~6 hours, and then get some more stuff done later when i'm home again | 18:59 |
jeblair | fungi: have fun | 19:00 |
fungi | jeblair: thanks | 19:00 |
* anteaya leaves too | 19:00 | |
clarkb | trying to read from jenkins's securityRealm/commenceLogin, which I assume does the openid dance | 19:00 |
*** e0ne has quit IRC | 19:00 | |
*** tjones has quit IRC | 19:04 | |
*** dkehn_ has joined #openstack-infra | 19:05 | |
clarkb | jeblair: any ideas on where else to look for jenkins.o.o login issues? jenkins.log is pretty much empty | 19:05 |
clarkb | I am tempest to restart the server since it isn't doing anything at the moment | 19:05 |
lifeless | clarkb: lol | 19:06 |
clarkb | wow | 19:06 |
lifeless | clarkb: your fingers failed you | 19:06 |
clarkb | *tempted | 19:06 |
clarkb | my print drivers cache common words | 19:06 |
lifeless | clarkb: have you seen that fax encoding bug ? | 19:06 |
jeblair | clarkb: not without logging in. :) i vote you restart | 19:06 |
clarkb | lifeless: no | 19:06 |
lifeless | clarkb: so there's a compression driver for some faxes that takes a bitmap from the page - say a 0 | 19:07 |
jeblair | jenkins 04 has a lot of nodes attached to it that don't exist. i'm going to stop it and manually remove the configs | 19:07 |
lifeless | clarkb: and then applies it everywhere there are 0's | 19:07 |
clarkb | jeblair: ok | 19:07 |
*** mriedem has quit IRC | 19:07 | |
lifeless | clarkb: the algorithm is tunable for noise etc | 19:07 |
lifeless | clarkb: if you don't tune it *just right* you end up with numbers - e.g. payroll data, cheques, bank accounts - totally messed up | 19:08 |
jeblair | starting jenkins04 | 19:10 |
clarkb | jenkins.o.o is dead. it can't getRootDir. Investigating now :/ | 19:11 |
*** wchrisj has quit IRC | 19:11 | |
*** chris_johnson has joined #openstack-infra | 19:12 | |
*** dstanek is now known as dstanek_afk | 19:12 | |
jeblair | jenkins04 is up, getting slaves added, and running jobs | 19:13 |
clarkb | hrm now it is up, maybe that is a false alarm | 19:13 |
lifeless | clarkb: http://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres_are_switching_written_numbers_when_scanning | 19:14 |
*** mriedem has joined #openstack-infra | 19:14 | |
clarkb | and proposal.slave.o.o is running jobs again | 19:14 |
lifeless | clarkb: have a read of that and weep | 19:15 |
*** julim has quit IRC | 19:15 | |
clarkb | I will :) | 19:15 |
lifeless | also, don't buy xerox :) | 19:15 |
jeblair | xerox laser printers are great. i print books on them. | 19:15 |
lifeless | 'They indeed implemented a software bug, eight years ago, and indeed, numbers could be mangled across all compression modes. They have to roll out a patch for hundreds of thousands of devices world-wide.' | 19:16 |
lifeless | jeblair: It was meant in humour; single mistakes don't blacklist a vendor - mistakes happen | 19:16 |
lifeless | jeblair: I've purchased some pretty large xerox kit at firms in the past | 19:17 |
jeblair | *nod* | 19:17 |
clarkb | now to figure out what fungi needed to run on the proposal slave. to the scrollback | 19:17 |
*** protux has quit IRC | 19:18 | |
jeblair | clarkb: i think it was just that it kept going offline because the regex was wrong; i don't think anything needs to be re-run | 19:18 |
clarkb | jeblair: oh right because zuul needs new functions | 19:18 |
jeblair | clarkb: the only thing that needed re-running was the tarball job due to the scp thing | 19:18 |
jeblair | clarkb: i think that change merged so we should be set now wrt proposal | 19:19 |
clarkb | jeblair: great, I will look at retriggering tarball job now | 19:19 |
*** e0ne has joined #openstack-infra | 19:21 | |
*** thomasbiege has joined #openstack-infra | 19:21 | |
clarkb | jeblair: there are a bunch of offline nodes on 05, not sure if that is just nodepool catching up though | 19:23 |
clarkb | markmcclain: you had tagged a release right? | 19:24 |
clarkb | markmcclain: I will make sure that the whole pipeline happens for that | 19:24 |
*** nati_ueno has quit IRC | 19:24 | |
jeblair | clarkb: it could be a similar situation to 04; i'll check it out | 19:25 |
clarkb | looks like fungi may have retriggered already, I am hunting this down | 19:25 |
*** nati_ueno has joined #openstack-infra | 19:25 | |
markmcclain | clarkb: yes and everything looks to have been published now | 19:25 |
clarkb | markmcclain: yup I see it, I think fungi must've triggered everything then the jobs ran once I brought the slave back online | 19:26 |
markmcclain | ah | 19:26 |
jeblair | clarkb: it's moving; i think i'll leave it be and see if it catches up. | 19:27 |
clarkb | jeblair: ok | 19:27 |
openstackgerrit | A change was merged to openstack-infra/nodepool: Fix typo in allocation https://review.openstack.org/74803 | 19:28 |
*** dstanek_afk has quit IRC | 19:28 | |
*** salv-orlando has joined #openstack-infra | 19:29 | |
jeblair | 02 has that problem too. the others are ok | 19:29 |
clarkb | https://issues.jenkins-ci.org/browse/JENKINS-16239 is what I saw on jenkins.o.o | 19:29 |
clarkb | I think an update of the envinject plugin will fix it | 19:30 |
clarkb | but it doesn't appear to be as serious as I first thought | 19:30 |
*** mrmartin has joined #openstack-infra | 19:30 | |
openstackgerrit | K Jonathan Harker proposed a change to openstack-infra/config: Parameterize the status page urls https://review.openstack.org/74557 | 19:32 |
*** thomasbiege has quit IRC | 19:40 | |
*** jcooley_ has joined #openstack-infra | 19:47 | |
*** afazekas has quit IRC | 19:50 | |
*** mfisch has quit IRC | 19:50 | |
*** salv-orlando has quit IRC | 19:53 | |
*** salv-orlando has joined #openstack-infra | 19:53 | |
*** mrmartin has quit IRC | 19:54 | |
*** dstanek_afk has joined #openstack-infra | 19:54 | |
*** yassine has quit IRC | 19:55 | |
*** yassine has joined #openstack-infra | 19:55 | |
*** sandywalsh_ has quit IRC | 19:57 | |
*** salv-orlando has quit IRC | 19:57 | |
*** dstanek_afk has quit IRC | 19:59 | |
*** julim has joined #openstack-infra | 20:01 | |
clarkb | ES recovery is really slow, I am going to stop my indexers to give the cluster a chance to finish recovering | 20:02 |
jog0 | 343 patches in check? | 20:04 |
clarkb | welcome to the jungle | 20:05 |
*** dcramer_ has quit IRC | 20:05 | |
jog0 | is this the recheck 24 thing? | 20:05 |
jeblair | jog0: no, this is a rax network outage + ffp load + the check thing | 20:05 |
jog0 | ffp? | 20:06 |
jeblair | feature freeze proposal | 20:06 |
jeblair | er | 20:06 |
jeblair | feature proposal freeze? | 20:06 |
jeblair | some combination of those words. :) | 20:06 |
jog0 | ack | 20:06 |
clarkb | jeblair: indexers are stopped. I think indexing and recovery was slow because it was doing both at the same time which meant everything had to be extremely synchronous | 20:06 |
clarkb | going to watch it now and see if those last 4 shards recover more quickly | 20:07 |
jog0 | wow this is pretty scary | 20:07 |
* jog0 finds lunch | 20:07 | |
jeblair | jog0: ha | 20:07 |
*** ociuhandu has joined #openstack-infra | 20:08 | |
*** jcooley_ has quit IRC | 20:08 | |
*** hashar has joined #openstack-infra | 20:09 | |
*** jcooley_ has joined #openstack-infra | 20:09 | |
*** markmcclain has quit IRC | 20:12 | |
jeblair | clarkb: looks like the ready node count is now small (as it should be under load) | 20:12 |
openstackgerrit | Zane Bitter proposed a change to openstack-infra/config: Fix ChangeId links https://review.openstack.org/74821 | 20:13 |
*** sandywalsh_ has joined #openstack-infra | 20:13 | |
*** jcooley_ has quit IRC | 20:13 | |
*** oubiwann has joined #openstack-infra | 20:14 | |
*** jamespage_ has joined #openstack-infra | 20:17 | |
*** oubiwann has quit IRC | 20:18 | |
BobBall | nodepool question... deleteNode can sometimes timeout in RAX causing nodepool to bail at http://git.openstack.org/cgit/openstack-infra/nodepool/tree/nodepool/nodepool.py#n1112 - but the node is eventually cleaned from RAX. What would the advice be here? Extend timeout? ignore all exceptions and carry on with the nodepool stuff? | 20:20 |
*** ociuhandu has quit IRC | 20:20 | |
openstackgerrit | A change was merged to openstack-infra/git-review: Retrieve remote pushurl independently of user's locale https://review.openstack.org/64307 | 20:20 |
openstackgerrit | Dan Prince proposed a change to openstack-infra/nodepool: Retry ssh connections on auth failure. https://review.openstack.org/74825 | 20:21 |
jeblair | BobBall: the cleanup thread is supposed to clean up the nodepool db in that case. i think we should extend the rax timeout so it hits less often. | 20:21 |
jeblair | BobBall: lifeless was working on a patch series that tackles that from a different perspective, but it's not ready yet | 20:22 |
BobBall | Ah, OK. | 20:22 |
BobBall | You hit it in the gate too with rax nodes? | 20:22 |
jeblair | BobBall: yep | 20:22 |
jeblair | they eventually get cleaned up, just slower than they should | 20:23 |
BobBall | kay. Wonder why it hits them. Might have a chat with Ant/John about that. | 20:23 |
*** ociuhandu has joined #openstack-infra | 20:23 | |
zaro | clarkb: i think this question is meant for you.. https://review.openstack.org/#/c/61321 | 20:23 |
BobBall | Will increase the timeout, and rely on the cleaning thread ;) | 20:23 |
clarkb | we seem to actually be recovering indexes now. I stopped all indexers and cleared the caches on es nodes | 20:24 |
jeblair | BobBall: if you decide on a good value, let me know | 20:24 |
clarkb | once it is green again I will turn on indexers | 20:24 |
clarkb | (I think we are beginning to get into "our nodes are too small for the data thrown at them" territory again) | 20:25 |
openstackgerrit | Andreas Jaeger proposed a change to openstack/requirements: Update openstack-doc-tools to 0.7.1 https://review.openstack.org/74827 | 20:25 |
BobBall | Just to understand jeblair - why do you want it to wait for the server to have gone on deletion? a quota thing? Could just add to a list of nodes that are being deleted and poll them in the cleanup thread, rather than block? | 20:25 |
*** cadenzajon has quit IRC | 20:26 | |
lifeless | jeblair: the delete refactoring stuff ? | 20:26 |
clarkb | you don't want to account the node as gone before it is gone | 20:26 |
clarkb | quota is part of that but more importantly the allocation of nodes across providers | 20:27 |
BobBall | OK | 20:27 |
BobBall | 10 minutes just seems like a long time to block so I'm hesitant about making it even longer :P | 20:27 |
jeblair | BobBall: answering in order: nodepool needs to know how many servers there actually are in order to do math about how many it should spin up correctly. yes. there's lots of ways you could do it; this one is not that problematic, it just needs tuning; lifeless has another way. | 20:27 |
jeblair | BobBall: it's not blocking anything | 20:27 |
jeblair | BobBall: the current nodepool design has lots of threads all fighting to get their work done, mediated by the provider managers (so they don't starve each other or run over rate limits) | 20:28 |
jeblair | BobBall: so that one thread is blocking, but it isn't slowing anything else down. | 20:29 |
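(A minimal sketch of the pattern under discussion -- block only the node's own thread while waiting for the provider to delete, and fall back to the periodic cleanup pass on timeout. Assumes python-novaclient; the timeout value and helper name are illustrative, not nodepool's actual code:)

```python
import time
from novaclient import exceptions

def wait_for_delete(nova, server_id, timeout=600):
    # This blocks only the thread handling this one node; other node threads
    # keep working through the provider manager in the meantime.
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            nova.servers.get(server_id)
        except exceptions.NotFound:
            return True  # the provider has really forgotten the server
        time.sleep(5)
    # Give up for now; the periodic cleanup thread will see the node still
    # marked "delete" in the database and retry it later.
    return False
```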
jeblair | lifeless: yes | 20:29 |
BobBall | Understood. | 20:29 |
*** markmcclain has joined #openstack-infra | 20:30 | |
*** mrmartin has joined #openstack-infra | 20:30 | |
*** markmcclain1 has joined #openstack-infra | 20:31 | |
BobBall | I've got a python script trying to use nodepool - so my script polls for nodes, holds them then deletes them. This is what's blocking for me, but I can re-work the blocking there so a longer timeout is fine. | 20:31 |
*** jgrimm has quit IRC | 20:32 | |
*** markmcclain1 has quit IRC | 20:32 | |
*** jamespage_ has quit IRC | 20:32 | |
mtreinish | fungi, clarkb: have you guys seen this failure before/is there a bug for it?: http://logs.openstack.org/57/73457/1/check/check-tempest-dsvm-postgres-full/595b00c/console.html | 20:32 |
*** dstanek has joined #openstack-infra | 20:32 | |
*** mrda_away is now known as mrda | 20:33 | |
*** denis_makogon_ has joined #openstack-infra | 20:34 | |
clarkb | mtreinish: haven't seen that before. looks like adding a region failed | 20:34 |
*** markmcclain has quit IRC | 20:34 | |
clarkb | but I don't see keystone logs | 20:34 |
openstackgerrit | Ryan Petrello proposed a change to openstack/requirements: Update pecan >= 0.4.5 in global requirements. https://review.openstack.org/74830 | 20:34 |
*** dprince has quit IRC | 20:35 | |
jeblair | 2014-02-19 06:37:40.894 | 2014-02-19 06:37:40 /opt/stack/new/devstack/functions-common: line 997: /opt/stack/new/devstack/stack-screenrc: Permission denied | 20:35 |
jeblair | is that the actual error? | 20:35 |
clarkb | and syncing requirements failed | 20:35 |
clarkb | jeblair: looks like permissions trouble in the /opt dirs | 20:35 |
*** ryanpetrello has left #openstack-infra | 20:36 | |
HenryG | Hi, I am unable to find an existing bug for this gate-neutron-python27 failure: http://logs.openstack.org/33/68833/3/gate/gate-neutron-python27/42c2370 | 20:36 |
*** ryanpetrello has joined #openstack-infra | 20:36 | |
HenryG | Any clues/hints would be appreciated. | 20:37 |
clarkb | HenryG: looks like a greenlet failure | 20:38 |
clarkb | I would ask neutron folks | 20:38 |
HenryG | clarkb: thanks, will do | 20:38 |
mtreinish | clarkb, jeblair: ok I was just thrown by what looked like ps output interspersed in the log messages | 20:39 |
*** jcooley_ has joined #openstack-infra | 20:39 | |
mtreinish | but yeah it definitely looks like permissions issue, should I open it against devstack or ci? | 20:39 |
clarkb | mtreinish: not sure, does that change change permissions in a weird way? | 20:41 |
*** smarcet has left #openstack-infra | 20:41 | |
jeblair | right before running devstack, devstack-gate does: "sudo chown -R stack:stack $BASE" | 20:42 |
*** yolanda has quit IRC | 20:42 | |
jeblair | so it's hard to say what the problem could be. did that fail? or did something in devstack change it? | 20:42 |
mtreinish | jeblair: it looks like everything was working fine until: http://logs.openstack.org/57/73457/1/check/check-tempest-dsvm-postgres-full/595b00c/console.html#_2014-02-19_06_37_13_730 | 20:43 |
mtreinish | when it went to sync the requirements for horizon | 20:43 |
*** jgrimm has joined #openstack-infra | 20:44 | |
clarkb | devstack is also doing safe_chown-ing of its own | 20:45 |
clarkb | so yeah I think it could be in a number of places | 20:45 |
clarkb | sudo chown -R jenkins:jenkins /opt/stack/new happens in workspace new setup | 20:46 |
clarkb | should it be stack:stack instead? | 20:46 |
jeblair | clarkb: not unless we want to 'sudo stack' before every command | 20:46 |
clarkb | doesn't devstack run as the stack user though? | 20:47 |
clarkb | I guess it gets root as necessary though | 20:47 |
lifeless | you could sudo stack exec :) | 20:47 |
jeblair | clarkb: yes, which is why devstack-gate does "sudo chown -R stack:stack $BASE" | 20:47 |
jeblair | right before running devstack | 20:47 |
jeblair | lifeless: then we couldn't go back. | 20:47 |
clarkb | oh gotcha | 20:47 |
jeblair | lifeless: jenkins has sudo, stack drops sudo | 20:47 |
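(A hedged Python rendering of the handoff being described; the real code is shell in devstack-gate and devstack, and the paths and helper name here are illustrative:)

```python
import subprocess

BASE = '/opt/stack/new'

def prepare_then_hand_off():
    # Workspace setup runs as the jenkins user, which still has sudo.
    subprocess.check_call(['sudo', 'chown', '-R', 'jenkins:jenkins', BASE])
    # ... clone repos and write configs as jenkins ...
    # Right before stack.sh runs, ownership flips to the unprivileged stack
    # user; anything stack needs to write must already live under BASE,
    # because stack has no sudo to fix it up afterwards.
    subprocess.check_call(['sudo', 'chown', '-R', 'stack:stack', BASE])
```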
lifeless | ah | 20:48 |
*** dcramer_ has joined #openstack-infra | 20:48 | |
*** jcooley_ has quit IRC | 20:49 | |
jeblair | that seems to have happened to 2 builds in the last 24h, in dfw and iad. | 20:51 |
jeblair | according to logstash | 20:51 |
*** mwagner_lap has quit IRC | 20:53 | |
mtreinish | jeblair: I guess I'm really lucky then :) | 20:54 |
*** smarcet has joined #openstack-infra | 20:54 | |
*** jcooley_ has joined #openstack-infra | 20:55 | |
jeblair | mtreinish: i think we'll either need to catch a live node or add some debugging | 20:56 |
mtreinish | jeblair: ok, should I open a bug about it then? | 20:58 |
mtreinish | yeah the logs don't really show what happened | 20:58 |
*** DinaBelova is now known as DinaBelova_ | 20:58 | |
jeblair | mtreinish: sure; target ci and devstack until we know what's up i guess | 20:58 |
*** khyati has quit IRC | 21:01 | |
*** sabari has quit IRC | 21:01 | |
*** khyati has joined #openstack-infra | 21:02 | |
mtreinish | jeblair: https://bugs.launchpad.net/devstack/+bug/1282262 | 21:02 |
uvirtbot | Launchpad bug 1282262 in openstack-ci "Permission denied errors on /opt during devstack" [Undecided,New] | 21:02 |
*** khyati has quit IRC | 21:04 | |
clarkb | jeblair: I am thinking we may want to add another ES node so that losing one node doesn't cause the others to run into GC trouble (will need to bump the number of shards slightly too though that may be less necessary) | 21:05 |
clarkb | jeblair: but I think this can happen after FF | 21:05 |
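(Two operational notes behind that plan, sketched against the Elasticsearch REST API assuming the `requests` library: replica counts can be changed on a live index, while shard counts are fixed at index creation and so a bump only affects the next daily logstash index. The endpoint names are standard ES; the host and index names are illustrative:)

```python
import json
import requests

ES = 'http://localhost:9200'

def cluster_health():
    # green/yellow/red plus counts of initializing and relocating shards.
    return requests.get(ES + '/_cluster/health').json()

def set_replicas(index, replicas):
    # Replicas can be adjusted on the fly; shard count cannot.
    return requests.put(
        ES + '/%s/_settings' % index,
        data=json.dumps({'index': {'number_of_replicas': replicas}}))
```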
*** CaptTofu has quit IRC | 21:06 | |
jeblair | clarkb: whew | 21:06 |
*** CaptTofu has joined #openstack-infra | 21:06 | |
*** jamespage_ has joined #openstack-infra | 21:07 | |
openstackgerrit | Dan Prince proposed a change to openstack-infra/nodepool: Retry ssh connections on auth failure. https://review.openstack.org/74825 | 21:07 |
*** rfolco has quit IRC | 21:08 | |
ArxCruz | jeblair: clarkb are you guys having problems with jenkins and nodepool? I have a few VM's ready, but a lot of jobs in the build queue | 21:09 |
*** thomasbiege has joined #openstack-infra | 21:09 | |
* ArxCruz blame zuul changes :@ | 21:10 | |
*** jamespage_ has quit IRC | 21:10 | |
openstackgerrit | Ivan Melnikov proposed a change to openstack-infra/config: Add documentation jobs for taskflow https://review.openstack.org/74837 | 21:10 |
*** pafuent has left #openstack-infra | 21:10 | |
openstackgerrit | Matthew Treinish proposed a change to openstack-infra/devstack-gate: Start compressing config files too https://review.openstack.org/74838 | 21:10 |
*** CaptTofu has quit IRC | 21:11 | |
*** alexpilotti has quit IRC | 21:11 | |
*** alexpilotti_ has joined #openstack-infra | 21:11 | |
HenryG | clarkb: there does not seem to be a bug tracking this yet, but it looks like trouble may be brewing: http://logstash.openstack.org/index.html#eyJzZWFyY2giOiJtZXNzYWdlOlwiZ3JlZW5sZXQuR3JlZW5sZXRFeGl0XCIgQU5EIGZpbGVuYW1lOlwiY29uc29sZS5odG1sXCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6IjYwNDgwMCIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwic3RhbXAiOjEzOTI4NDQyMDk0OTV9 | 21:11 |
jeblair | ArxCruz: actually, zuul seems to have dealt rather well with 400 changes in queue.... | 21:12 |
jeblair | ArxCruz: the bulk of our problems stem from a rax network outage this morning | 21:12 |
*** weshay has quit IRC | 21:12 | |
ArxCruz | jeblair: that's really weird, I have only one zuul and zuul-merger, and nodepool latest version | 21:12 |
HenryG | Any tips on how to track down the culprit? | 21:12 |
mattoliverau | Morning! | 21:12 |
ArxCruz | jeblair: right now I have a lot of vm's idle and a lot of jobs in build queue | 21:12 |
jeblair | mattoliverau: good morning; things are busy here. | 21:13 |
*** jamespage_ has joined #openstack-infra | 21:13 | |
jeblair | ArxCruz: oh, you're talking about your own thing. | 21:13 |
*** mrmartin has quit IRC | 21:13 | |
ArxCruz | jeblair: hehe, yup | 21:13 |
jeblair | ArxCruz: you asked about us. | 21:13 |
ArxCruz | wondering if is something I did or if there's something wrong with yours too | 21:13 |
ArxCruz | sorry, bad english | 21:13 |
*** cadenzajon has joined #openstack-infra | 21:13 | |
jeblair | ArxCruz: see what the state of the nodes are in jenkins. we upgraded jenkins and found that the latest version didn't work with the gearman plugin, so we're currently running the lts version | 21:14 |
ArxCruz | oh boy... | 21:15 |
ArxCruz | which jenkins version are you guys using ? | 21:15 |
ArxCruz | and which gearman plugin ? | 21:15 |
ArxCruz | :/ | 21:15 |
ArxCruz | jeblair: ^ | 21:16 |
*** jcooley_ has quit IRC | 21:17 | |
jeblair | ArxCruz: you can check the version # at the bottom of the page; the gearman plugin is something recent but shouldn't matter too much. | 21:17 |
*** jcooley_ has joined #openstack-infra | 21:18 | |
ArxCruz | jeblair: thanks, sorry for the confusion :) | 21:19 |
*** tjones has joined #openstack-infra | 21:21 | |
*** jcooley_ has quit IRC | 21:22 | |
*** jroovers has joined #openstack-infra | 21:26 | |
openstackgerrit | Sergey Lukjanov proposed a change to openstack-infra/config: Enable docs for python-savannaclient https://review.openstack.org/74470 | 21:27 |
*** markmcclain has joined #openstack-infra | 21:28 | |
*** sabari has joined #openstack-infra | 21:28 | |
*** e0ne has quit IRC | 21:28 | |
*** andreaf has quit IRC | 21:28 | |
*** jroovers has quit IRC | 21:30 | |
*** jcooley_ has joined #openstack-infra | 21:30 | |
dhellmann | dstufft: fyi, I'm very close to giving up on namespace packages for oslo libraries :-| | 21:30 |
*** thomasbiege has quit IRC | 21:33 | |
*** fbo_away is now known as fbo | 21:34 | |
*** jamielennox is now known as jamielennox|away | 21:36 | |
*** hashar has quit IRC | 21:38 | |
*** jhesketh_ has joined #openstack-infra | 21:39 | |
*** protux has joined #openstack-infra | 21:39 | |
jhesketh_ | Morning | 21:39 |
*** ok_delta has joined #openstack-infra | 21:40 | |
*** sabari_ has joined #openstack-infra | 21:40 | |
*** sabari has quit IRC | 21:41 | |
jeblair | jhesketh_: good morning | 21:42 |
jhesketh_ | hey jeblair, how's things? | 21:43 |
jeblair | jhesketh_: could be better. :) | 21:43 |
jog0 | jeblair: how much of the check queue is from the outage vs recheck | 21:43 |
jeblair | jhesketh_: there was a rax network outage this morning; that's the flat line in the nodepool graph | 21:43 |
jeblair | jog0: i'm not sure how i would determine the answer to that | 21:43 |
jhesketh_ | :-( | 21:43 |
jeblair | jhesketh_: that set us back a bit | 21:43 |
jhesketh_ | right, let me know if I can help with anything | 21:44 |
*** salv-orlando has joined #openstack-infra | 21:44 | |
jeblair | jog0: the trend in queue length has been solidly downward since we got everything unstuck, so at current in/out rates, we're not getting worse. that suggests that under normal circumstances we can more than handle the current patchset test load. | 21:46 |
jeblair | jog0: (extrapolating from less than 1 days worth of data which is potentially dangerous) | 21:46 |
*** oubiwann has joined #openstack-infra | 21:47 | |
*** oubiwann has quit IRC | 21:47 | |
jeblair | jhesketh_: i have a puzzle for you if you're interested -- during the network outage, both the jenkins manager in nodepool as well as novaclient itself were stuck in the same ssl read function. | 21:48 |
jeblair | jhesketh_: Shrews suggested that setting keepalive on the socket might help prevent that sort of situation in the future | 21:48 |
*** markmcclain has quit IRC | 21:48 | |
dims | puzzled...requirements/projects.txt seems to be outdated for a brand new docs run. any ideas? http://logs.openstack.org/74/74474/17/check/gate-oslo.vmware-docs/33f359e/console.html | 21:48 |
jeblair | jhesketh_: are you interested in seeing if something like that is possible? it might involve some novaclient, urllib, or ssl library deep diving | 21:49 |
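(A sketch of the keepalive idea using only standard socket options; the hard part, as noted, is reaching the socket underneath novaclient's HTTP connection objects, and this is not something novaclient does today:)

```python
import socket

def enable_keepalive(sock, idle=60, interval=10, count=6):
    # With keepalives on, a peer that silently vanished in a network outage
    # makes the blocked recv()/SSL read fail after roughly
    # idle + interval * count seconds instead of hanging forever.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-probe tuning knobs are Linux-specific, hence the guards.
    if hasattr(socket, 'TCP_KEEPIDLE'):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, 'TCP_KEEPINTVL'):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, 'TCP_KEEPCNT'):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
```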
*** sabari_ has quit IRC | 21:50 | |
jeblair | dims: that will update during the next image build, which won't be for a while | 21:50 |
*** smarcet has quit IRC | 21:51 | |
jhesketh_ | jeblair: not sure I have enough knowledge of those systems to actually achieve much there to be honest | 21:51 |
dims | jeblair, i see. thx | 21:51 |
jeblair | mordred: ^ might need to do something about stale requirements repos | 21:52 |
jeblair | jhesketh_: no prob | 21:52 |
jhesketh_ | jeblair: what was the read function they were stuck in/error they saw | 21:52 |
jeblair | jhesketh_: | 21:53 |
jeblair | http://paste.openstack.org/show/67382/ | 21:53 |
*** wenlock_ has quit IRC | 21:54 | |
openstackgerrit | Davanum Srinivas (dims) proposed a change to openstack-infra/config: Mark a few oslo.vmware jobs as non-voting https://review.openstack.org/74669 | 21:54 |
*** skraynev is now known as skraynev_afk | 21:55 | |
mordred | jeblair: reading scrollback | 21:55 |
*** wenlock has joined #openstack-infra | 21:56 | |
*** prad has quit IRC | 21:56 | |
jeblair | mordred: /opt/requirements is now updated daily at most. in the case of hpcloud-az2, it was last updated feb 12. | 21:57 |
*** julim has quit IRC | 21:58 | |
openstackgerrit | Davanum Srinivas (dims) proposed a change to openstack-infra/config: Temporary : Mark a few oslo.vmware jobs as non-voting https://review.openstack.org/74669 | 21:59 |
mordred | jeblair: so - we might need to "cd /opt/requirements ; git pull --ff-only" (or something similar) | 21:59 |
mordred | ? | 21:59 |
mordred | oh STALE requirements. I thought you were saying stable requirements | 21:59 |
jeblair | mordred: yes, though that may require sudo access unless we change the owner of those repos to jenkins | 21:59 |
jeblair | mordred: since all the slaves are single use, i think we can do that now | 22:00 |
mordred | jeblair: shouldn't the repo prep be setting requirements to master? | 22:00 |
mordred | like, since requirements is part of the integration set? | 22:00 |
jeblair | mordred: not devstack | 22:00 |
jeblair | mordred: unit test, etc, jobs | 22:00 |
mordred | oh. but why does /opt/requirements matter for unittests - they're all in tox? | 22:00 |
jeblair | mordred: see the original question from dims and 22:03 < jeblair> mordred: not devstack | 22:01 |
jeblair | 22:03 < jeblair> mordred: unit test, etc, jobs | 22:01 |
jeblair | gah | 22:01 |
jeblair | mordred: and http://logs.openstack.org/74/74474/17/check/gate-oslo.vmware-docs/33f359e/console.html | 22:01 |
mordred | k. reading | 22:01 |
mordred | jeblair: GOTCHA. thank you | 22:02 |
mordred | yeah - I think we fetch /opt/requirements as a pre-test sudo operation | 22:03 |
jeblair | mordred: can't sudo, not yet at least. | 22:03 |
mordred | or, rather, change it to jenkins owner | 22:03 |
mordred | sorry - misspoke | 22:03 |
jeblair | mordred: can sudo after this merges: https://review.openstack.org/#/q/status:open+project:openstack-infra/config+branch:master+topic:sudoers,n,z | 22:03 |
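A rough sketch of the pre-test refresh being discussed, assuming the cached copy lives at /opt/requirements and that sudo (or jenkins ownership of the repo) is available as per the change linked above; the helper name is made up for illustration.

    import subprocess

    def refresh_cached_repo(path="/opt/requirements", use_sudo=True):
        # Fast-forward the cached requirements checkout before a job uses
        # it, so jobs stop seeing a copy that is days out of date.
        cmd = ["git", "pull", "--ff-only"]
        if use_sudo:
            # Only needed while the repo is still owned by root rather
            # than the jenkins user.
            cmd = ["sudo", "-H"] + cmd
        subprocess.check_call(cmd, cwd=path)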
annegentle | fungi: what's an appropriate gerrit ref to point oreilly to for a pointer to HEAD:openstack/operations-guide/feature/edits (is that right?) | 22:03 |
clarkb | ok back from lunch | 22:04 |
annegentle | fungi: right now they are pointing at a fork of openstack/operations-guide, but I think jeblair mentioned they could push to a gerrit ref | 22:05 |
annegentle | clarkb: welcome back! | 22:05 |
clarkb | I am going to turn indexers back on | 22:06 |
clarkb | annegentle: fungi is AFk for a while. let me kick ES then I will look at your question | 22:06 |
annegentle | clarkb: okie | 22:06 |
*** amcrn has quit IRC | 22:07 | |
*** ok_delta has quit IRC | 22:07 | |
*** virmitio has quit IRC | 22:08 | |
*** dkliban is now known as dkliban_afk | 22:09 | |
*** cadenzajon has quit IRC | 22:09 | |
*** CaptTofu has joined #openstack-infra | 22:10 | |
clarkb | ok ES and logstash are "UP" it is relocating shards but indexing is happening at a reasonable speed. I am a bit worried that we might run into memory trouble so will keep an eye on it | 22:11 |
*** oubiwann has joined #openstack-infra | 22:11 | |
clarkb | annegentle: now for oreilly. What is it that oreilly needs to do? just push their edits upstream? | 22:11 |
*** jamielennox|away is now known as jamielennox | 22:12 | |
*** vkozhukalov has quit IRC | 22:12 | |
*** cadenzajon has joined #openstack-infra | 22:13 | |
*** ArxCruz has quit IRC | 22:13 | |
annegentle | clarkb: so we created a branch so that oreilly's edits are less intrusive on our master | 22:13 |
clarkb | yup | 22:13 |
annegentle | clarkb: we can happily keep editing while they make it production ready | 22:13 |
*** lcostantino has quit IRC | 22:13 | |
annegentle | clarkb: we're still changing master and then I keep delivering changes to feature/edits | 22:14 |
*** khyati has joined #openstack-infra | 22:14 | |
annegentle | clarkb: they just want to know what we want :) very accommodating | 22:14 |
*** ArxCruz has joined #openstack-infra | 22:15 | |
zaro | roz: you cannot replace the change owner and that's not configurable in gerrit. however it looks like there might be a workaround which fungi has powers to do.. https://groups.google.com/forum/#!topic/repo-discuss/aqNgmuiCtyk | 22:15 |
clarkb | annegentle: what would you like them to do ? | 22:15 |
*** sarob has quit IRC | 22:16 | |
jeblair | zaro, roz: we're not going to do that. what's the problem? | 22:16 |
annegentle | clarkb: ideally they'll push to feature/edits | 22:16 |
annegentle | clarkb: so what do I tell them to push to? | 22:16 |
*** sarob has joined #openstack-infra | 22:16 | |
clarkb | dims: still around? we need to test the -proposed version of libvirt 1.1.1 on precise before it will end up in cloud archive. I think the easiest way to do that is with a devstack change that enables -proposed for the libvirt package. Is that something you are already testing? | 22:16 |
*** thedodd has joined #openstack-infra | 22:16 | |
clarkb | annegentle: and these edits would go into review right? | 22:17 |
annegentle | push to the appropriate gerrit ref (HEAD:refs/for/branchname) | 22:17 |
annegentle | clarkb: jeblair originally had that in an email ^^ | 22:17 |
annegentle | clarkb: so helping their production staff get the pointer right | 22:17 |
clarkb | git push ssh://username@review.openstack.org:29418 HEAD:refs/for/feature/edits <- that will push them up for review | 22:17 |
*** banix has quit IRC | 22:18 | |
clarkb | can also just use git review on that branch if the .gitreview branch is set correctly | 22:18 |
annegentle | refs/for? really? | 22:18 |
clarkb | annegentle: refs/for is the magical gerrit reference prefix | 22:18 |
annegentle | clarkb: (not that I'm doubting)! | 22:18 |
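Spelled out with placeholders, the push clarkb describes looks roughly like the following; the username is hypothetical and the project path comes from the operations-guide discussion above.

    import subprocess

    user = "oreilly-editor"  # hypothetical Gerrit username for their staff
    project = "openstack/operations-guide"
    remote = "ssh://%s@review.openstack.org:29418/%s" % (user, project)

    # Push the current HEAD as a new change for review on feature/edits.
    subprocess.check_call(["git", "push", remote, "HEAD:refs/for/feature/edits"])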
annegentle | clarkb: do you think it makes sense to give them one username that can push directly? or did we decide that was bad | 22:19 |
*** bknudson has quit IRC | 22:19 | |
annegentle | clarkb: I'm okay with walking one of their production staff through cla but wanting to be sure it's required | 22:20 |
clarkb | annegentle: personally I think that is bad. It isn't how openstack accepts commits. But the relationship here is new and special and may not require review | 22:20 |
clarkb | jeblair: ^ | 22:20 |
jeblair | i'd like to try having them push things for review | 22:20 |
annegentle | jeblair: clarkb: okay I'll keep pushing them | 22:20 |
*** sarob has quit IRC | 22:21 | |
*** dolphm has joined #openstack-infra | 22:21 | |
dolphm | is zuul waiting for a check job to complete before moving approved changes into the gate? | 22:22 |
clarkb | dolphm: if the check results are more than 24 hours old yes | 22:22 |
dolphm | YAY! | 22:22 |
annegentle | jeblair: if I give them git push ssh://username@review.openstack.org:29418 HEAD:refs/for/feature/edits and they go through the CLA and all, what will those patches look like to me on review.openstack.org? | 22:22 |
*** mfer has quit IRC | 22:22 | |
dstufft | dhellmann: dooo it | 22:22 |
dstufft | dhellmann: namespace packages are bad for you | 22:23 |
clarkb | dolphm: and it will recheck if comments happen (not just approvals) and the check tests are more than 72 hours old | 22:23 |
annegentle | jeblair: right now I'm porting from master to feature/edits | 22:23 |
dstufft | at least until python 3.whatever is the baseline and you can use the built in form of namepsace packages | 22:23 |
dstufft | maybe someone can backport that to 2.x, I dunno | 22:23 |
dolphm | clarkb: ha, that's awesome | 22:23 |
*** esker has quit IRC | 22:23 | |
clarkb | dolphm: idea behind that is test results stay fresh as review happens | 22:23 |
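As a sketch of the freshness rule clarkb is describing, with the 24-hour and 72-hour thresholds from this exchange; this illustrates the stated behaviour, not zuul's actual implementation.

    import time

    CHECK_MAX_AGE_ON_APPROVAL = 24 * 3600  # approved changes need results < 24h old
    CHECK_MAX_AGE_ON_COMMENT = 72 * 3600   # plain comments re-trigger past 72h

    def needs_recheck(last_check_time, event_is_approval, now=None):
        # Return True if the existing check results are too stale to trust
        # for this event, so the check jobs should run again first.
        now = now if now is not None else time.time()
        age = now - last_check_time
        limit = (CHECK_MAX_AGE_ON_APPROVAL if event_is_approval
                 else CHECK_MAX_AGE_ON_COMMENT)
        return age > limit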
fungi | clarkb: the neutronclient release was not related to the offlined proposal slave... just different broken things i was trying to fix | 22:23 |
fungi | but looks like you figured that out | 22:24 |
jeblair | annegentle: they'll show up like normal but the branch column will be different | 22:24 |
dolphm | clarkb: that's great -- it should help catch merge conflicts earlier too, which will be super useful all by itself | 22:24 |
*** dstanek has quit IRC | 22:24 | |
clarkb | fungi: yup thanks, go back to being AFK :P | 22:24 |
annegentle | jeblair: ok like stable/havana. | 22:24 |
jeblair | annegentle: exactly | 22:24 |
zaro | jeblair: roz wants to make himself the owner of a change so he can set it to WIP Status. | 22:24 |
dmsimard | jeblair: Sorry to bother you with that again, when did you say https://review.openstack.org/#/c/74780/ was going to be effective ? | 22:24 |
jeblair | annegentle: so you'll want to watch out for that | 22:24 |
dims | clarkb, pong. yes i can help with that | 22:24 |
clarkb | dims: awesome thanks. let me collect the relevant data really quickly | 22:25 |
jeblair | zaro, roz: remove the changeid from the commit message and git-review it again to make a new change in gerrit. abandon the old one. | 22:25 |
dims | clarkb, i ended up building the libvirt from their git and running it in our gate | 22:25 |
clarkb | dims: see https://bugs.launchpad.net/nova/+bug/1228977/ comment from Brian Murray. Lifeless already updated the impact and risk stuff for us | 22:26 |
uvirtbot | Launchpad bug 1228977 in nova "n-cpu seems to crash when running with libvirt 1.1.1 from ubuntu cloud archive" [High,Confirmed] | 22:26 |
clarkb | dims: so now we need to test it | 22:26 |
jeblair | dmsimard: i'll try kicking off an image build now | 22:27 |
clarkb | dims: we need a change to devstack that enables ubuntu -proposed https://wiki.ubuntu.com/Testing/EnableProposed and changes the name of the libvirt package to libvirt/precise-proposed in devstack, which we can WIP | 22:27 |
clarkb | dims: that should install libvirt from proposed and test that the patched libvirt works as expected | 22:27 |
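For reference, a hedged sketch of what that throwaway devstack change amounts to, following the EnableProposed wiki approach; the libvirt-bin package name and the add-apt-repository invocation are assumptions for precise, not the actual patch.

    import subprocess

    def enable_proposed_and_install_libvirt():
        # Enable the Ubuntu precise-proposed pocket and pull libvirt from
        # it, so the gate exercises the candidate package before it lands
        # in the cloud archive.
        proposed = ("deb http://archive.ubuntu.com/ubuntu/ "
                    "precise-proposed main restricted universe multiverse")
        subprocess.check_call(["sudo", "add-apt-repository", "-y", proposed])
        subprocess.check_call(["sudo", "apt-get", "update"])
        # apt's pkg/release syntax selects the version from that release.
        subprocess.check_call(
            ["sudo", "apt-get", "install", "-y", "libvirt-bin/precise-proposed"])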
dims | clarkb, am on it after i wrap up a couple of things in a few | 22:27 |
*** jamespage_ has quit IRC | 22:28 | |
clarkb | dims: with that info we should be able to get the package updated in cloud archive and hopefully switch all tests to new libvirt | 22:28 |
*** oubiwann has quit IRC | 22:28 | |
dims | clarkb, yep. sounds good. | 22:28 |
clarkb | dims: so basically this is a throw away change to show ubuntu that the fix is safe | 22:28 |
clarkb | dims: awesome thank you | 22:28 |
dims | clarkb, yep. | 22:28 |
*** jomara has quit IRC | 22:29 | |
*** prad has joined #openstack-infra | 22:31 | |
*** jcooley_ has quit IRC | 22:31 | |
*** jcooley_ has joined #openstack-infra | 22:31 | |
dmsimard | jeblair: Thanks, appreciate it. Let me know what happens :) | 22:33 |
*** mrda is now known as mrda_away | 22:33 | |
*** dcramer_ has quit IRC | 22:34 | |
jeblair | clarkb: http://paste.openstack.org/show/67391/ | 22:34 |
jeblair | clarkb: az2 consistently fails image creation with that | 22:34 |
clarkb | looking | 22:35 |
*** dolphm is now known as dolphm_503 | 22:35 | |
clarkb | FYI gearman for logstash is 164k events behind but slowly catching up | 22:35 |
clarkb | jeblair: is the remote side killing our connection? | 22:35 |
*** jcooley_ has quit IRC | 22:35 | |
jeblair | clarkb: i have no idea | 22:36 |
jeblair | clarkb: i tried it from my workstation at home and it works. :/ | 22:36 |
*** jcooley_ has joined #openstack-infra | 22:36 | |
*** VijayT has joined #openstack-infra | 22:37 | |
*** mriedem has quit IRC | 22:37 | |
*** jcooley_ has quit IRC | 22:37 | |
*** jeckersb is now known as jeckersb_gone | 22:37 | |
*** jcooley_ has joined #openstack-infra | 22:38 | |
*** thomasem has quit IRC | 22:39 | |
*** e0ne has joined #openstack-infra | 22:39 | |
*** rcleere has quit IRC | 22:39 | |
clarkb | is CONNECT_TIMEOUT being hit? | 22:40 |
* clarkb reads more code | 22:40 | |
clarkb | doesn't look like it | 22:41 |
*** miqui has quit IRC | 22:42 | |
*** jcooley_ has quit IRC | 22:42 | |
clarkb | jeblair: I think the nodeutils ssh_connect may need to catch a wider net of exceptions possibly | 22:42 |
clarkb | right now it only catches socket.error | 22:42 |
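A hedged sketch of the wider net being suggested for nodepool's ssh_connect, assuming a paramiko-style client underneath; the retry loop and the exact exception list are illustrative rather than the real nodeutils code.

    import socket
    import time

    import paramiko

    def ssh_connect(host, username, timeout=60, attempts=30):
        # Retry until sshd answers, treating refused, reset, and abruptly
        # closed connections as "not ready yet" instead of crashing.
        for _ in range(attempts):
            client = paramiko.SSHClient()
            client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
            try:
                client.connect(host, username=username, timeout=timeout)
                return client
            except (socket.error, EOFError, paramiko.SSHException):
                # "Connection closed by <host>" during boot tends to show
                # up as an EOFError/SSHException, not a plain socket.error.
                time.sleep(2)
        raise Exception("Timed out waiting for ssh on %s" % host)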
jeblair | clarkb: interesting that this is new and only happens on az2 | 22:42 |
*** e0ne has quit IRC | 22:42 | |
*** dstanek has joined #openstack-infra | 22:42 | |
clarkb | I agree | 22:43 |
jeblair | clarkb: i'm trying some manual tests with 'nova boot' | 22:43 |
clarkb | k | 22:43 |
fungi | clarkb: thanks. brad topol is giving a great keystone overview to the group. Shrews is here too | 22:44 |
clarkb | so #tox is the channel for the skype replacement, not the python test tool... | 22:44 |
lifeless | hahahahaha | 22:45 |
lifeless | clarkb: #python-testing | 22:45 |
jeblair | clarkb: so just manually sshing, for a while i got ssh: connect to host 15.185.190.118 port 22: Connection refused | 22:45 |
*** ArxCruz has quit IRC | 22:45 | |
jeblair | clarkb: now i get Connection closed by 15.185.190.118 | 22:45 |
clarkb | jeblair: which should cause an EOFError right? | 22:46 |
dims | clarkb, i see zul may have updated "precise-proposed/icehouse" to libvirt 1.2.1 with the changes we need (https://launchpad.net/~ubuntu-cloud-archive/+archive/icehouse-staging/+sourcepub/3889570/+listing-archive-extra) - we will have to try that | 22:46 |
clarkb | lifeless: thanks | 22:46 |
*** sarob has joined #openstack-infra | 22:46 | |
clarkb | dims: that should work | 22:47 |
morganfainberg | fungi, give topol a hard time for me ;) | 22:47 |
dims | clarkb, will report back tomorrow. | 22:47 |
morganfainberg | fungi, (or at least wave enthusiastically at him for me) | 22:47 |
jeblair | clarkb: i've never looked at a console log for an hpcs vm before, but this doesn't look great to me: http://paste.openstack.org/show/67397/ | 22:47 |
clarkb | dims: awesome thank you for the help (I had hoped to get to it eventually but so much other stuff is going on) | 22:48 |
zul | dims/clarkb: we should be uploading a new version of libvirt next week | 22:48 |
clarkb | zul: does that mean you don't need us to test it? | 22:48 |
clarkb | zul: https://bugs.launchpad.net/nova/+bug/1228977/ started the conversation | 22:48 |
uvirtbot | Launchpad bug 1228977 in nova "n-cpu seems to crash when running with libvirt 1.1.1 from ubuntu cloud archive" [High,Confirmed] | 22:48 |
jaypipes | quick question... anybody know which config file the periodic QA jobs are defined in? | 22:49 |
clarkb | we need to test it anyways, but it is easier to do that once in cloud archive | 22:49 |
*** bknudson has joined #openstack-infra | 22:49 | |
clarkb | however getting ahead of it is probably best so that if it doesn't work we can hopefully fix it before the update | 22:49 |
fungi | morganfainberg: will do. i'm a well-practiced heckler | 22:49 |
morganfainberg | fungi, ++ | 22:49 |
morganfainberg | :) | 22:49 |
clarkb | jeblair: that looks like unhappy metadata server which is bad times | 22:50 |
lifeless | zul: as we understand it you need it tested, so we're aiming to do that :) | 22:51 |
clarkb | jaypipes: most of them should be templates now and we specify which branch to test in the projects.yaml file for JJB when we instantiate the template | 22:51 |
mordred | clarkb: its #pylib | 22:51 |
*** rlandy has quit IRC | 22:52 | |
jeblair | clarkb: still getting eof on ssh to that host. spinning up another one in az1 to compare console log. | 22:52 |
jaypipes | clarkb: yeah, am looking in that file now.. unless I am mistaken, all the periodic jobs are run against "devstack-precise" single use nodes. Is that correct? | 22:52 |
*** dkranz has quit IRC | 22:52 | |
clarkb | jaypipes: all of the tempest periodic tests yes | 22:53 |
clarkb | the unittest periodic jobs are run on bare-precise and bare-centos now | 22:53 |
jeblair | clarkb: yeah, the output looks much less error-like on az1 | 22:53 |
jaypipes | clarkb: gotcha. thx man. | 22:54 |
jeblair | clarkb: i think this may be hpcs ticket-worthy | 22:54 |
clarkb | jeblair: I agree, though we may just be told to stop using az2 which is :( | 22:54 |
jeblair | clarkb: not much we can do about that, we can't use it now anyway | 22:55 |
clarkb | yup | 22:55 |
*** ryanpetrello has quit IRC | 22:55 | |
jeblair | clarkb: would you please do the honors? | 22:56 |
clarkb | oh you want me to do it :P yes I will file it | 22:56 |
zul | lifeless: thats for srus | 22:57 |
lifeless | zul: so, UCA doesn't need as much testing as SRUs ? | 22:58 |
lifeless | zul: anyhow, we want it in saucy directly too | 22:58 |
*** esker has joined #openstack-infra | 22:59 | |
*** thedodd has quit IRC | 23:00 | |
*** esker has quit IRC | 23:00 | |
*** esker has joined #openstack-infra | 23:00 | |
*** mrda_away is now known as mrda | 23:01 | |
zul | lifeless: to get it into saucy it needs an SRU; for UCA, it gets updated when trusty gets updated | 23:01 |
lifeless | zul: ok, so - tripleo wants it in saucy ;) | 23:01 |
zul | lifeless: thats nice for tripleo, that takes a bit longer then :) | 23:02 |
*** fbo is now known as fbo_away | 23:02 | |
*** markmcclain has joined #openstack-infra | 23:03 | |
*** markmcclain1 has joined #openstack-infra | 23:05 | |
clarkb | jeblair: ticket sent, I cc'd you | 23:05 |
*** markmcclain has quit IRC | 23:07 | |
*** julim has joined #openstack-infra | 23:08 | |
*** ayoung has joined #openstack-infra | 23:08 | |
*** khyati has quit IRC | 23:09 | |
*** jnoller has quit IRC | 23:10 | |
*** sarob has quit IRC | 23:11 | |
openstackgerrit | Mat Lowery proposed a change to openstack-infra/config: Enable list item bullets in CSS except for Jenkins https://review.openstack.org/71752 | 23:12 |
ayoung | jeblair, whom do we bug about enabling eavesdrop for #openstack-keystone? I feel like we are coding without git right now | 23:15 |
jeblair | ayoung: sorry, it's lost in the infra review backlog | 23:15 |
ayoung | of course | 23:15 |
clarkb | hsa the change been proposed? | 23:16 |
jeblair | ayoung: well, not lost, but it's there. | 23:16 |
clarkb | I see it | 23:16 |
*** yassine has quit IRC | 23:16 | |
jeblair | i can't really prioritize reviewing irc-related changes right now. sorry. | 23:16 |
clarkb | I will approve, I don't think there are any meetings for the next 45 minutes | 23:17 |
ayoung | heh | 23:17 |
*** gordc has quit IRC | 23:17 | |
ayoung | sorry to be a noodge | 23:17 |
clarkb | ayoung: out of curiosity why vacate -dev? | 23:17 |
dhellmann | dstufft: the problem is the amount of pain to rename the packages we already have :-/ | 23:17 |
jeblair | ayoung: apparently clarkb is the answer. he's nicer than i am. maybe i can convince him to review some of my changes. ;) | 23:17 |
dmsimard | jeblair: Leaving the office, i'll let you know if I still see the issue tomorrow | 23:17 |
ayoung | clarkb, so many people were complaining about the keystone devs crowding out the room | 23:17 |
clarkb | ayoung: thats the point | 23:18 |
dstufft | dhellmann: sufficient pain to teach you the error of your ways ;) | 23:18 |
dstufft | (yes it sucks :( ) | 23:18 |
clarkb | ayoung: eg that is a good thing | 23:18 |
ayoung | clarkb, think we should stay in -dev? | 23:18 |
clarkb | oh well | 23:18 |
dstufft | dhellmann: (true talk, basically this pain is why I'm anti namespaces, because i know this feel) | 23:19 |
clarkb | ayoung: not necessarily. I definitely seem to have a different idea of how irc should work than most | 23:19 |
clarkb | ayoung: I expect folks to use clients that don't suck :) | 23:19 |
dhellmann | dstufft: well, it's pain on the packagers, not on me | 23:19 |
dhellmann | the same pain applies for renaming anything | 23:19 |
ayoung | clarkb, I preferred being in -dev as it meant I was paying attention there and tended to answer General Purpose questions, too | 23:19 |
morganfainberg | clarkb, ++ on clients that don't suck | 23:20 |
*** oubiwann has joined #openstack-infra | 23:20 | |
zul | wth are we renaming now? | 23:20 |
morganfainberg | clarkb, and i agree w/ ayoung, but if there is a real push for us to be elsewhere, I'm ok with it. | 23:20 |
clarkb | zul: everything | 23:20 |
zul | awesome | 23:21 |
* zul goes jump off a cliff | 23:21 | |
morganfainberg | clarkb, *shrug* it's why i hang out here as well, good convos, and sometimes even unrelated to -infra stuffs | 23:21 |
*** yamahata has quit IRC | 23:21 | |
fungi | morganfainberg: we fish you in with good conversation and then try to put you to work on infra tasks ;) | 23:21 |
morganfainberg | fungi, LOL someday when dolphm_503 hasn't swamped us keystone folks w/ work, I'll be contributing more to infra :) | 23:22 |
morganfainberg | fungi, actually... it is on my "I will be more involved in this" list for Juno | 23:22 |
clarkb | lol logstash gearman backlog isn't falling | 23:22 |
*** flaper87 is now known as flaper87|afk | 23:23 | |
*** dmsimard has quit IRC | 23:23 | |
*** CaptTofu has quit IRC | 23:24 | |
jeblair | fungi: have the static slaves been deleted and nodepool config adjusted? | 23:24 |
jeblair | no | 23:24 |
jeblair | https://review.openstack.org/#/q/status:open+project:openstack-infra/config+branch:master+topic:single-use,n,z | 23:24 |
*** CaptTofu has joined #openstack-infra | 23:24 | |
openstackgerrit | A change was merged to openstack-infra/config: Add Eavesdrop bot to #openstack-keystone https://review.openstack.org/74472 | 23:27 |
jeblair | fungi, clarkb: i approved the next change in that series; we have a node ready and it has python3 and pypy installed | 23:27 |
jeblair | (i'm thinking 60 more nodes would be helpful now) | 23:27 |
clarkb | I am going to temporarily increase the number of logstash workers to 3 per host while I am watching it. Hopefully that drops the backlog | 23:27 |
morganfainberg | clarkb, ++ thanks for approving that | 23:27 |
clarkb | jeblair: sounds good | 23:28 |
SergeyLukjanov | jeblair, agreed | 23:28 |
*** chris_johnson has quit IRC | 23:28 | |
*** CaptTofu has quit IRC | 23:29 | |
SergeyLukjanov | heh, just realized that it's already 3:30am in my tz while reading scrollback... | 23:30 |
fungi | jeblair: yeah, that sounds good. i removed the static slaves (except py3k) from jenkins01 and 02 but didn't press forward yet with everything else going on | 23:30 |
fungi | we should be safe to delete the static centos6 and precise slaves from rax now | 23:31 |
fungi | i've seen no failures which seem to stem from the precise->bare-precise shift | 23:31 |
SergeyLukjanov | is there any way to see the gate backlog? | 23:31 |
clarkb | SergeyLukjanov: zuul status? | 23:32 |
*** dstanek has quit IRC | 23:32 | |
*** CaptTofu has joined #openstack-infra | 23:32 | |
*** sarob has joined #openstack-infra | 23:33 | |
*** dkliban_afk has quit IRC | 23:33 | |
SergeyLukjanov | clarkb, it shows only 20 for each queue that are now in progress | 23:33 |
*** openstack has joined #openstack-infra | 23:34 | |
clarkb | SergeyLukjanov: right, 20 is the floor and it will grow as long as there aren't failures | 23:36 |
clarkb | SergeyLukjanov: it adds 2 to the queue for each successful merge and halves with a floor of 20 for each failed merge | 23:36 |
clarkb | SergeyLukjanov: the actual value is in the json blob | 23:36 |
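A small sketch of the window behaviour clarkb describes, i.e. grow by two on a successful merge and halve with a floor of 20 on a failure; this illustrates the stated rule, not zuul's exact code. The live value itself can be read out of the status.json blob mentioned above.

    def next_window(window, merge_succeeded, floor=20, increase=2):
        # Dependent-pipeline window: reward successful merges, back off
        # quickly (but never below the floor) when a merge fails.
        if merge_succeeded:
            return window + increase
        return max(floor, window // 2)

    # Example: two failures in a row shrink a window of 100 down to 25.
    w = 100
    w = next_window(w, merge_succeeded=False)  # 50
    w = next_window(w, merge_succeeded=False)  # 25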
SergeyLukjanov | clarkb, yup, I know, looks like I should sleep a bit to be able to ask correctly :) | 23:37 |
clarkb | SergeyLukjanov: you should sleep more | 23:37 |
clarkb | SergeyLukjanov: compared to you and fungi I think I get more sleep than the both of you combined | 23:37 |
clarkb | >_> | 23:37 |
SergeyLukjanov | clarkb, oh, thanks for the tip about json | 23:37 |
*** hemna_ is now known as hemnafk | 23:38 | |
SergeyLukjanov | clarkb, :) | 23:38 |
*** oubiwann has quit IRC | 23:38 | |
clarkb | jeblair: ok, I think I just need to leave es and logstash be for a while and see how they do over a larger time sample | 23:39 |
clarkb | jeblair: anything in particular you think needs attention re feature proposal freeze? | 23:39 |
clarkb | if not I am going to go through review backlogs | 23:39 |
clarkb | jeblair: we have a response from hpcloud, it happens every time right? and we are booting precise images there? | 23:41 |
* clarkb pokes at nodepool for info | 23:41 | |
morganfainberg | out of curiosity who do you tell that the link for the hotel block at the omni is now raising a 404 (ATL summit)? | 23:42 |
clarkb | morganfainberg: the foundation | 23:42 |
clarkb | reed would be a good one but is afk this week | 23:42 |
morganfainberg | clarkb, hm, ok i'll hunt down some email on that front. | 23:42 |
morganfainberg | clarkb, k thnks :) | 23:42 |
morganfainberg | clarkb, Infra, they know everything | 23:43 |
morganfainberg | yes.. everything | 23:43 |
morganfainberg | ;) | 23:43 |
*** jgrimm has quit IRC | 23:43 | |
* anteaya finishes reading backscroll | 23:44 | |
*** jerryz has quit IRC | 23:44 | |
*** protux has quit IRC | 23:44 | |
*** dstanek has joined #openstack-infra | 23:44 | |
*** denis_makogon_ has quit IRC | 23:46 | |
*** jergerber has quit IRC | 23:46 | |
*** jerryz has joined #openstack-infra | 23:46 | |
openstackgerrit | Cyril Roelandt proposed a change to openstack-infra/config: python-ceilometerclient: make the py33 gate voting https://review.openstack.org/74875 | 23:46 |
*** esker has quit IRC | 23:46 | |
*** alexpilotti_ has quit IRC | 23:47 | |
openstackgerrit | A change was merged to openstack-infra/devstack-gate: Add change in README file according to changes in code https://review.openstack.org/74342 | 23:56 |
openstackgerrit | Cyril Roelandt proposed a change to openstack-infra/config: python-ceilometerclient: make the py33 gate voting https://review.openstack.org/74875 | 23:58 |