sdague | fungi: hey, so what just happened with nodepool, just saw a huge drop in nodes | 00:03 |
---|---|---|
fungi | sdague: i'm restarting it to try the aggressive-delete patch and see if that gets us back some of those deleted nodes faster | 00:03 |
sdague | cool | 00:04 |
sdague | +1 | 00:04 |
fungi | but it has to quiesce node creation/deletion activity before a graceful restart | 00:04 |
fungi | almost there | 00:04 |
lifeless | fungi: ctrl-C :P | 00:04 |
*** jhesketh_ has joined #openstack-infra | 00:05 | |
sdague | fungi: once you are done, promoting - 67480 is probably a good idea. It will help give us a console on some of the network tests that are racing | 00:09 |
fungi | will do | 00:10 |
*** salv-orlando has quit IRC | 00:11 | |
*** dcramer_ has quit IRC | 00:12 | |
sdague | thanks | 00:12 |
openstackgerrit | lifeless proposed a change to openstack-infra/config: Clamp MTU in TripleO instances https://review.openstack.org/67740 | 00:14 |
openstackgerrit | lifeless proposed a change to openstack-infra/config: Update the geard server for tripleo-gate. https://review.openstack.org/67680 | 00:14 |
openstackgerrit | lifeless proposed a change to openstack-infra/config: Configure eth1 for DHCP in tripleo-gate instances https://review.openstack.org/67260 | 00:14 |
fungi | sdague: the promote did interesting things to the enqueued times for a few changes in the gate | 00:15 |
lifeless | yay | 00:21 |
lifeless | | 3e251d4c-377d-4fa4-9b6a-4eff78f86cd7 | precise-1390175363.template.openstack.org | ACTIVE | image_pending_upload | Running | default-net=10.0.0.7, 138.35.77.21; tripleo-bm-test=192.168.1.12 | | 00:21 |
lifeless | significant progress | 00:21 |
lifeless | now if I can just get someone to take all my patches ;) | 00:21 |
openstackgerrit | lifeless proposed a change to openstack-infra/nodepool: Don't load system host keys. https://review.openstack.org/67738 | 00:23 |
openstackgerrit | lifeless proposed a change to openstack-infra/nodepool: Ignore vim editor backup and swap files. https://review.openstack.org/67651 | 00:23 |
openstackgerrit | lifeless proposed a change to openstack-infra/nodepool: Only attempt to copy files when bootstrapping. https://review.openstack.org/67678 | 00:23 |
openstackgerrit | lifeless proposed a change to openstack-infra/nodepool: Document that fake.yaml isn't usable. https://review.openstack.org/67679 | 00:23 |
lifeless | and -woo- | 02394c2d-a200-4e9b-83a4-ca2d87b411f1 | precise-ci-overcloud-1.slave.openstack.org | BUILD | spawning | NOSTATE | | 00:24 |
lifeless | ci-overcloud open for business, minions! | 00:24 |
fungi | booyah | 00:25 |
lifeless | fungi: so with all my patches applied, it should be good. We're now blocked on this :( | 00:29 |
lifeless | fungi: how can I move it forward? | 00:29 |
mattoliverau | Been reading a weekend's (including US Friday) worth of scrollback. Looks like the new new new zuul migration worked out well. glad to hear it! | 00:29 |
fungi | lifeless: use toothpicks to hold clarkb's eyelids open while he reviews all of it ;) | 00:30 |
lifeless | clarkb: Hi. You need toothpicks? | 00:30 |
lifeless | fungi: can mordred review this, if I can distract him sufficiently? | 00:31 |
fungi | mattoliverau: well enough. the stumbling blocks we did hit counted as learning experiences/bugs worth fixing | 00:31 |
fungi | lifeless: probably. i can too if i free myself up sufficiently, but there's a lot of other stuff we all need to review too | 00:31 |
lifeless | fungi: I know :( | 00:32 |
sdague | fungi: it did? | 00:34 |
lifeless | mordred: it would be a great help if you could review everything from me in infra/config and infra/nodepool | 00:35 |
sdague | hmmm... yeh, so it definitely reset some of those | 00:35 |
sdague | interesting | 00:35 |
fungi | sdague: i bet it's gerrit dependencies | 00:35 |
fungi | look at the pattern | 00:36 |
sdague | yeh, could be | 00:36 |
sdague | so the items only include the roots? | 00:36 |
fungi | seems to always be items reset immediately following other items from the same project. zuul looks for and pulls in any approved dependencies, so however that's being accomplished may be creating new objects | 00:37 |
fungi | rather than reusing the existing ones | 00:37 |
sdague | well, it's recreating all of them | 00:37 |
sdague | my patch just provided a way of setting the enqueue time | 00:38 |
sdague | but I guess the children are a little different | 00:38 |
fungi | got it. so i guess that code path isn't hit for dependent changes | 00:38 |
*** DennyZhang has joined #openstack-infra | 00:39 | |
sdague | yep | 00:39 |
sdague | so I think once this lands - https://review.openstack.org/#/c/67739/ stable/havana will work again | 00:39 |
sdague | at least that's the current blocker | 00:39 |
*** sarob has joined #openstack-infra | 00:43 | |
mordred | lifeless: what? | 00:49 |
lifeless | mordred: I've kindof patchbombed a bunch of stuff to get tripleo-ci functional (the infra/config and nodepool bits we need) | 00:49 |
mordred | lifeless: yeah - I saw that - I'll go read | 00:50 |
mordred | once those land, you believe that ci-overcloud is good for business? | 00:50 |
mordred | lifeless: also, how much capacity does it have? can we also use it for normal gate nodes? | 00:50 |
lifeless | mordred: derekh and I have been pushing hard on actually, you know, having it all work and we're now (running manually) in end to end fine tuning | 00:50 |
mordred | :) | 00:50 |
lifeless | mordred: so status with these patches: | 00:50 |
lifeless | - we should be able to run 24 cores of jenkins slaves | 00:51 |
lifeless | - and right now uhm 10 test environments | 00:51 |
fungi | 24 cores of jenkins slaves meaning 6 slaves? | 00:51 |
fungi | (at 4x vcpu each) | 00:51 |
lifeless | fungi: dunno, depends on the size we choose. remember it's not running devstack-gate | 00:51 |
fungi | ahh, yeah | 00:52 |
mordred | right. I was just asking about its capacity for also running d-g - mainly because I'm asking everyone that right now. the answer can be "nope" | 00:52 |
lifeless | we have another 40+ machines we can start scaling up into | 00:52 |
lifeless | plus the RH cloud coming along | 00:52 |
lifeless | I'm trying to highlight that 'good for business' is nuanced :) | 00:53 |
lifeless | the silent queue runs on everything but doesn't vote, right ? | 00:53 |
fungi | lifeless: the silent queue doesn't report to the change at all, just uploads logs and sends stats to graphite | 00:54 |
lifeless | fungi: is there something that reports but won't vote ? | 00:54 |
fungi | lifeless: use a filter in the jobs section of layout.yaml to set voting: false on a job or job name pattern | 00:55 |
lifeless | oh right | 00:55 |
fungi | same place you filter which jobs run on what branch name patterns | 00:55 |
fungi | then it will report back to the change, but its result won't be taken into account for the verify score | 00:56 |
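The filter fungi describes lives in the `jobs` section of zuul's layout.yaml. A minimal sketch, using a hypothetical job name (the real entry lifeless later proposes is in change 67743):

```yaml
# Sketch of a layout.yaml jobs entry; job name and branch pattern
# here are illustrative, not copied from the actual change.
jobs:
  - name: gate-tripleo-deploy
    branch: ^master$
    voting: false
```

With `voting: false`, zuul still runs the job and reports its result on the change, but the result is excluded from the verify score.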
lifeless | mordred: oh, running actual devstack-gate jobs, the ones rh and hp run today? | 00:58 |
lifeless | mordred: I think we should layer that in only after everything else is working | 00:59 |
mordred | kk | 00:59 |
lifeless | mordred: not so much a capacity issue (though there is that) but rather what benefit we get | 00:59 |
lifeless | d-g is running elsewhere | 00:59 |
lifeless | tripleo-gate isn't | 00:59 |
mordred | d-g is running elsewhere, but the gate is under pretty massive duress atm | 00:59 |
lifeless | once tripleo-gate is running and heading up the path to being a symmetric gate with everything else | 01:00 |
mordred | although maybe rax will bump our quota | 01:00 |
mordred | lifeless: ++ | 01:00 |
lifeless | then adding more d-g nodes in excess capacity would be a great thing to do | 01:00 |
openstackgerrit | lifeless proposed a change to openstack-infra/config: Don't vote with gate-tripleo-deploy yet. https://review.openstack.org/67743 | 01:03 |
mordred | lifeless: ok. your config changes are +2/+A - they all seem pretty directly only touching tripleo at the moment | 01:14 |
lifeless | mordred: yeah, we're not in the collective gate yet | 01:15 |
*** sarob has quit IRC | 01:17 | |
*** DennyZhang has quit IRC | 01:17 | |
*** sarob has joined #openstack-infra | 01:20 | |
fungi | sdague: the promoted change failed on a tempest test with "The resource could not be found." | 01:22 |
*** nosnos has joined #openstack-infra | 01:24 | |
sdague | bummer, link? | 01:27 |
jog0 | so it looks like the console logs are still not in elasticSearch is that correct | 01:28 |
fungi | sdague: https://jenkins04.openstack.org/job/gate-tempest-dsvm-full/3488/ | 01:28 |
jog0 | fungi: ^ | 01:28 |
fungi | jog0: they should be for any jobs run through jenkins01 and jenkins02 | 01:28 |
mordred | jog0: we're rolling out the new plugin version one jenkins at a time | 01:29 |
mordred | btw - the fact that we have 5 jenkins masters is still kinda amazing to me | 01:29 |
jog0 | mordred: yeah last I checked it was 3 | 01:30 |
*** Guest52195 is now known as maelfius | 01:30 | |
jog0 | ahh, I'll manually check for jenkins01 logs in elasticSearch | 01:30 |
*** maelfius is now known as Guest62585 | 01:31 | |
jog0 | this data missing means we are running partially blind in elastic-search | 01:31 |
mordred | jog0: that's what you'll get for being sick for a period of time | 01:31 |
mordred | yup | 01:31 |
mordred | jog0: sdague was talking about that earlier | 01:31 |
fungi | jog0: i think dims has a patch proposed for adding the name of the jenkins master as a metadata field so it can be searched/summarized | 01:31 |
jog0 | fungi: yeah that will really help | 01:32 |
*** Guest62585 is now known as needscoffee | 01:32 | |
*** needscoffee has joined #openstack-infra | 01:32 | |
*** needscoffee is now known as morganfainberg | 01:32 | |
*** morganfainberg is now known as morganfainberg|z | 01:32 | |
jog0 | touchdown seahawks | 01:32 |
*** morganfainberg|z is now known as morganfainberg | 01:32 | |
mordred | jog0: that was a RUN | 01:33 |
mordred | clarkb: are you at the stadium? | 01:33 |
openstackgerrit | A change was merged to openstack-infra/nodepool: Permit specifying instance networks to use. https://review.openstack.org/66394 | 01:33 |
*** cyeoh has quit IRC | 01:33 | |
sdague | fungi: sigh, yeh, that's unrelated. It was on my monday fix list | 01:34 |
openstackgerrit | A change was merged to openstack-infra/nodepool: Permit using a known keypair when bootstrapping. https://review.openstack.org/67649 | 01:34 |
openstackgerrit | A change was merged to openstack-infra/nodepool: Add some debugging around image checking. https://review.openstack.org/67650 | 01:34 |
openstackgerrit | A change was merged to openstack-infra/nodepool: Only attempt to copy files when bootstrapping. https://review.openstack.org/67678 | 01:34 |
openstackgerrit | A change was merged to openstack-infra/nodepool: Document that fake.yaml isn't usable. https://review.openstack.org/67679 | 01:34 |
openstackgerrit | A change was merged to openstack-infra/nodepool: Don't load system host keys. https://review.openstack.org/67738 | 01:34 |
*** dcramer_ has joined #openstack-infra | 01:34 | |
openstackgerrit | A change was merged to openstack-infra/nodepool: Ignore vim editor backup and swap files. https://review.openstack.org/67651 | 01:34 |
jog0 | fungi: confirmed that jenkins01 logs are in elasticSearch | 01:34 |
jog0 | at least for a passing job | 01:35 |
fungi | jog0: and for jenkins02 that should be the case as well, as of about 6 hours ago (rough estimate) | 01:35 |
lifeless | fungi: where is your branch updating the nodepool definition for ci-overcloud ? I have tweaks | 01:35 |
jog0 | fungi: cool | 01:35 |
*** sarob has quit IRC | 01:35 | |
fungi | lifeless: https://review.openstack.org/#/q/status:open+project:openstack-infra/config+branch:master+topic:tripleo-ci,n,z | 01:36 |
fungi | it's really just https://review.openstack.org/66491 though | 01:36 |
openstackgerrit | lifeless proposed a change to openstack-infra/config: Update TripleO Cloud API endpoint for Nodepool https://review.openstack.org/66491 | 01:39 |
lifeless | mordred: ^ needed too | 01:39 |
lifeless | then I think we can turn it on and start debugging the actual test scripts | 01:39 |
openstackgerrit | A change was merged to openstack-infra/config: Improve tripleo nodepool image build efficiency. https://review.openstack.org/67255 | 01:43 |
lifeless | I think then I need to look into how all the zuul ref stuff works so that we can make sure we run the code being merged not the code in trunk | 01:43 |
openstackgerrit | A change was merged to openstack-infra/config: Configure eth1 for DHCP in tripleo-gate instances https://review.openstack.org/67260 | 01:43 |
openstackgerrit | A change was merged to openstack-infra/config: Update the geard server for tripleo-gate. https://review.openstack.org/67680 | 01:44 |
sdague | fungi: so yah, that's the big giant stack trace in pci | 01:45 |
mordred | lifeless: it's actuallu pretty straightforward - zuul sends you a refspec and you use that | 01:46 |
lifeless | mordred: yeah, I know but ... | 01:46 |
lifeless | mordred: we need to translate that to our various refs | 01:46 |
lifeless | etc | 01:46 |
lifeless | it's not that it's hard, it's that we need to do it | 01:46 |
mordred | lifeless: wait - what do you mean by "our various refs" ? | 01:48 |
mordred | why would your refs be different? | 01:48 |
lifeless | we have one set of variables - git url, branch, commitish - per source repository | 01:48 |
lifeless | we don't consult ZUUL_REF | 01:48 |
mordred | well, if you don't consult ZUUL_REF, you're going to have a very hard time getting the right commit | 01:49 |
lifeless | thus my point | 01:49 |
lifeless | just like devstack doesn't consult ZUUL_REF but devstack_gate arranges it so things DTRT we need to do the same | 01:50 |
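The arrangement lifeless is referring to boils down to fetching the speculative ref zuul prepared for the change instead of the branch tip. A rough sketch of that consumer pattern; the default values below are placeholders, not real job values:

```shell
# Sketch of the usual ZUUL_REF checkout step. In a real job these
# variables are exported by zuul; the defaults here are fake.
ZUUL_URL=${ZUUL_URL:-http://zuul.openstack.org/p}
ZUUL_PROJECT=${ZUUL_PROJECT:-openstack/nova}
ZUUL_REF=${ZUUL_REF:-refs/zuul/master/Zdeadbeef}

# In a real job the fetch runs inside the already-cloned repo; we
# just print the command here rather than touching a repository.
echo "git fetch ${ZUUL_URL}/${ZUUL_PROJECT} ${ZUUL_REF} && git checkout FETCH_HEAD"
```

devstack-gate wraps this per source repository, which is the translation to "our various refs" that tripleo's scripts still need.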
mordred | I do not understand your southern hemisphere english | 01:50 |
*** dkranz has quit IRC | 01:51 | |
StevenK | mordred: You need to read it upside down | 01:54 |
mordred | StevenK: DOH | 01:54 |
*** zhiwei has joined #openstack-infra | 01:54 | |
lifeless | mordred: anyhow, nvm - I know we have more to do, and I know how zuul works it, and I know our plumbing which you perhaps don't know as much as you could :) | 01:58 |
jog0 | given a failed job how do I now which jenkins server it ran on? | 02:00 |
jog0 | http://logs.openstack.org/92/64592/4/check/check-tempest-dsvm-neutron-isolated/c6cda8d/ | 02:00 |
fungi | jog0: easiest way is to look at the hyperlink embedded in the first few lines to the slave hostname | 02:01 |
mordred | lifeless: I usually hire a plumber to deal with plumbing issues... | 02:01 |
jog0 | fungi: oh nice | 02:02 |
jog0 | jenkins01, so this should get console logs | 02:02 |
lifeless | mordred: you did | 02:03 |
mordred | lifeless: that's what I'm saying | 02:04 |
fungi | sdague: i just noticed where the subway can use another color... changes cancelled because they depend on another change which is failing or hitting a merge conflict (right now those show up red) | 02:04 |
openstackgerrit | ChangBo Guo proposed a change to openstack-dev/hacking: Add check for removed modules in Python 3 https://review.openstack.org/61049 | 02:05 |
lifeless | mordred: are you offering to do the work for tripleo? | 02:05 |
lifeless | mordred: or am I just horribly confused | 02:05 |
*** pcrews has quit IRC | 02:05 | |
mordred | lifeless: let's go with confused | 02:06 |
*** senk has joined #openstack-infra | 02:07 | |
sdague | fungi: sure | 02:08 |
mordred | jog0: wow. that was a throw right there | 02:08 |
jog0 | mordred: didn't see it got distracted by work | 02:09 |
jog0 | but I did hear yelling from the bar down the street | 02:09 |
mordred | jog0: oh my. it was a 50+ yard throw 4th down conversion for a TD | 02:09 |
*** senk has quit IRC | 02:09 | |
jog0 | ouch | 02:10 |
jog0 | tie game | 02:10 |
mordred | it was the type of throw which makes me worry for property damage in downtown sf | 02:10 |
mordred | jog0: nope. seattle is in the lead by 3 now | 02:10 |
jog0 | ahh the online score is outdated | 02:10 |
jog0 | I was downtown on new years and it looked like something out of mad max | 02:11 |
mordred | I try to not be in places like that | 02:11 |
mordred | of course, we've been booking our stuff for mardi gras, so I'm actually full of shit :) | 02:11 |
sdague | jog0: you didn't manage to classify this one yet, did you - https://bugs.launchpad.net/nova/+bug/1270680 | 02:11 |
jog0 | I can't imagine what sf would do in this case | 02:11 |
jog0 | sdague: no sorting out some kinks on my e-r patch | 02:12 |
sdague | jog0: ok, cool, I just didn't want to dupe something you'd gotten | 02:13 |
*** sarob has joined #openstack-infra | 02:13 | |
jog0 | sdague: btw I think it would be interesting to plot commits to openstack/openstack and zuul gate queue | 02:13 |
sdague | jog0: sure. I'm trying to keep a balance between making the problem visible and fixing it | 02:14 |
sdague | because visibility is only seeming to work so much | 02:14 |
jog0 | sdague: heh yeah, well that would tell us if things are getting worse or better merge rate wise | 02:14 |
jog0 | so I don't know the answer to the following question: did concurrency=2 in gate make things better or worse | 02:15 |
fungi | ugh... https://jenkins04.openstack.org/job/gate-swift-dsvm-functional/728/consoleText | 02:15 |
jog0 | and merge rate *may* shine a *little* insight into that | 02:15 |
fungi | fail on an hpcloud-az2 slave failing to connect via ipv6 to git.openstack.org. wtf? | 02:15 |
fungi | Building remotely on devstack-precise-hpcloud-az2-1143800 [...] fatal: unable to connect to git.openstack.org: git.openstack.org[0: 2001:4800:7813:516:3bc3:d7f6:ff04:aacb]: errno=Network is unreachable | 02:16 |
fungi | uh, yeah, hpcloud az2 has no ipv6. why did you try to use it? | 02:16 |
sdague | sweet fumble! | 02:16 |
* fungi is apparently missing some very enthralling sportball | 02:17 | |
sdague | yes | 02:17 |
sdague | especially if you don't like SF :) | 02:17 |
StevenK | I am too, but australia only shows american football on pay TV | 02:17 |
lifeless | fungi: you might have local ipv6 connectivity | 02:18 |
clarkb | I am missing it :( | 02:18 |
fungi | lifeless: well, somehow that slave thought it had a global ipv6 address assigned | 02:18 |
*** yaguang has joined #openstack-infra | 02:18 | |
clarkb | saw the kearse td. did seattle just recover a fumble? | 02:18 |
clarkb | sdague ^ | 02:18 |
*** sarob has quit IRC | 02:18 | |
lifeless | fungi: clearly it *did* | 02:18 |
lifeless | fungi: just not a working one... | 02:19 |
fungi | indeed | 02:19 |
openstackgerrit | Sean Dague proposed a change to openstack-infra/elastic-recheck: add hit for bug 1270680 https://review.openstack.org/67751 | 02:19 |
fungi | freakish. first time i've seen an hp vm do that | 02:19 |
sdague | clarkb: yes | 02:19 |
sdague | jog0: can you look at that er fingerprint? | 02:19 |
lifeless | oh wow we clone all of stackforge too... | 02:20 |
jog0 | sdague: looking | 02:20 |
lifeless | I wonder if we made one mega git repo | 02:20 |
lifeless | and sucked *everything* into it | 02:20 |
lifeless | and then made branches it would be faster | 02:20 |
mordred | lifeless: I've been meaning to get a grokmirror thing set up - the kernel guys say it helps | 02:24 |
jog0 | sdague: message:"TRACE nova.api.openstack" AND message:"pci.py" AND message:"InstanceNotFound: Instance" AND filename:"logs/screen-n-api.txt" | 02:24 |
jog0 | message:"TRACE nova.api.openstack" AND message:"InstanceNotFound: Instance" AND filename:"logs/screen-n-api.txt" | 02:24 |
jog0 | those have very different hit counts | 02:24 |
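For context, an elastic-recheck fingerprint like the ones jog0 is comparing is committed as a small yaml file wrapping the query; roughly the following shape, with a hypothetical filename (query files are named after the bug number):

```yaml
# queries/1270680.yaml -- illustrative path, not the merged file.
query: >
  message:"TRACE nova.api.openstack" AND
  message:"InstanceNotFound: Instance" AND
  filename:"logs/screen-n-api.txt"
```

Dropping the `message:"pci.py"` clause is what widens the second query to every extension hitting the race, hence the very different hit counts.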
*** senk has joined #openstack-infra | 02:24 | |
sdague | they do, it's not limited to that extension | 02:24 |
mordred | clarkb: how are you MISSING the sportsball? | 02:24 |
sdague | at least from what I can tell | 02:24 |
mordred | clarkb: it's one of teh best games of sports I've seen in a while | 02:24 |
openstackgerrit | ChangBo Guo proposed a change to openstack-dev/hacking: Add check for removed modules in Python 3 https://review.openstack.org/61049 | 02:25 |
clarkb | mordred: I have friends that dont sports ball. about to be at a house party will try watching from there | 02:25 |
mordred | clarkb: 8:33 left in the 4th | 02:25 |
sdague | jog0: I actually think this is one of the new ones that is biting us hard | 02:25 |
clarkb | mordred we still winning | 02:26 |
clarkb | ? | 02:26 |
sdague | wow, worst handoff ever | 02:26 |
sdague | clarkb: yes, but refumbled | 02:26 |
mordred | clarkb: yeah. but by 3 - and just lost it on downs | 02:26 |
jog0 | sdague: so this has rougly equal hits for FAILURE and SUCCESS | 02:26 |
jog0 | which is actually not a horrible query | 02:26 |
mordred | clarkb: be VERY glad you didn't see the knee break though | 02:26 |
sdague | jog0: yes, you read the log message right | 02:26 |
sdague | even on success, we are doing bad things, because we're going to be leaking resources | 02:26 |
sdague | as those success versions are on tempest compute deletes | 02:27 |
jog0 | agreed | 02:27 |
jog0 | so LGTM | 02:27 |
*** nati_uen_ has quit IRC | 02:27 | |
* jog0 +As sdague's patch | 02:27 | |
sdague | woot | 02:27 |
sdague | this quarter is just ridiculous | 02:30 |
clarkb | I need play by play :) | 02:30 |
openstackgerrit | A change was merged to openstack-infra/elastic-recheck: add hit for bug 1270680 https://review.openstack.org/67751 | 02:31 |
StevenK | clarkb: Aren't there apps for that? | 02:31 |
sdague | clarkb: seatle just intercepted | 02:31 |
mordred | clarkb: seattle just intercepted again | 02:31 |
lifeless | mordred: going to +A too? https://review.openstack.org/#/c/66491/ [it's passed everything except the yaml order check, which it doesn't affect] | 02:32 |
sdague | which means we just had: fumble SF, fumble (but not called) SEA, fumble (and self recovery) on 4th down by SEA, interception by SF | 02:33 |
sdague | in about 8 downs | 02:33 |
mordred | yeah - fungi, you ok with https://review.openstack.org/#/c/66491/ going in? | 02:33 |
sdague | the only thing that would make this better is snow :) | 02:34 |
mordred | sdague: or a giant earthquake | 02:34 |
fungi | mordred: sure, it won't take effect automatically anyway because it's the reason puppet's still disabled on nodepool.o.o | 02:34 |
mordred | fungi: k. awesome | 02:35 |
fungi | mordred: see https://review.openstack.org/66958 and accompanying bug 1269001 for details | 02:35 |
sdague | fungi: when you get a chance can you see if I goofed this up too badly - https://review.openstack.org/#/c/67591/ | 02:35 |
sdague | that will give us the uncategorized jobs list | 02:36 |
notmyname | how do I deal with the error on the 2nd job in the gate right now? logs: https://jenkins04.openstack.org/job/gate-swift-dsvm-functional/728/console | 02:37 |
notmyname | error connecting to git | 02:37 |
openstackgerrit | A change was merged to openstack-infra/config: Clamp MTU in TripleO instances https://review.openstack.org/67740 | 02:38 |
notmyname | if the top one fails will it stay in? or is it too late? any chance it can be retried right there so as not to wait another 40+ hours? | 02:38 |
*** senk has quit IRC | 02:38 | |
openstackgerrit | A change was merged to openstack-infra/config: Don't vote with gate-tripleo-deploy yet. https://review.openstack.org/67743 | 02:38 |
notmyname | patch set 65604,3 | 02:39 |
fungi | notmyname: i'm stumped on that one--was looking at it earlier. hpcloud west doesn't provide global ipv6 to tenant networks, so why it thought it had one is a real enigma | 02:39 |
mordred | clarkb: field-goal. seahawks up by 6 | 02:39 |
*** mrda has quit IRC | 02:39 | |
notmyname | fungi: any hope for it going in? looks like zuul already recalculated it so it has to go to the bottom of the queue with a manual reverify? | 02:39 |
fungi | notmyname: i think the very top of that diagram gets it wrong | 02:40 |
fungi | notmyname: if you look at https://jenkins04.openstack.org/job/gate-swift-dsvm-functional/728/ it says Other changes tested concurrently with this change: 65255,1 | 02:40 |
notmyname | fungi: so my only hope is that the top one fails? | 02:40 |
fungi | notmyname: yeah, if the change running ahead of it fails, it will be retested on the branch tip | 02:41 |
*** mrda has joined #openstack-infra | 02:42 | |
sdague | fungi: so we've seen this creep up a couple times before - http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiMjAwMTo0ODAwOjc4MTM6NTE2OjNiYzM6ZDdmNjpmZjA0OmFhY2JcIiIsImZpZWxkcyI6W10sIm9mZnNldCI6MCwidGltZWZyYW1lIjoiYWxsIiwiZ3JhcGhtb2RlIjoiY291bnQiLCJ0aW1lIjp7InVzZXJfaW50ZXJ2YWwiOjB9LCJzdGFtcCI6MTM5MDE4NTgyOTgxNX0= | 02:45 |
sdague | http://logs.openstack.org/19/65019/2/gate/gate-grenade-dsvm/d5c5219/console.html.gz | 02:46 |
sdague | fungi: so maybe we should register a bug and a fingerprint for it | 02:46 |
*** gokrokve has quit IRC | 02:48 | |
sdague | clarkb: interception in the end zone | 02:50 |
mordred | clarkb: INTERCEPTED | 02:50 |
sdague | by SEA | 02:50 |
jog0 | wow SF fail | 02:50 |
mordred | like, wow | 02:50 |
clarkb | mordred sdague you guys are awesome thank you | 02:50 |
sdague | final score SF: 17, SEA 23 | 02:53 |
clarkb | \o/ | 02:53 |
jog0 | ☹ | 02:54 |
mordred | clarkb: when is supersportsball? | 02:54 |
mordred | clarkb: next week or in 2 weeks? | 02:55 |
clarkb | 2 weeks | 02:55 |
clarkb | feb 2nd | 02:55 |
mordred | I'll be in Brussels | 02:56 |
mordred | I'm going to need to find a place with the game | 02:56 |
mordred | because broncos seahawks is going to be interesting | 02:57 |
*** gokrokve has joined #openstack-infra | 02:57 | |
openstackgerrit | Joe Gordon proposed a change to openstack-infra/elastic-recheck: Mark resolved bugs https://review.openstack.org/67752 | 02:57 |
sdague | jog0: so look at - http://status.openstack.org/elastic-recheck/ | 02:58 |
jog0 | sdague: looking | 02:58 |
sdague | I actually think Bug 1270680 - v3 extensions api inherently racey wrt instances - might be one of our biggest new gate issues | 02:58 |
sdague | and the reason we're getting killed right now | 02:59 |
jog0 | sdague: yeah I agree | 02:59 |
jog0 | 1270680 has a pretty graph lol | 02:59 |
jog0 | so many colors | 02:59 |
sdague | heh, yeh | 02:59 |
sdague | so I marked it as critical for nova | 02:59 |
jog0 | sdague: cool | 03:00 |
sdague | I'll dive on it tomorrow | 03:00 |
jog0 | it looks like it's time to do some git log and git bisect | 03:00 |
jog0 | since we know when it started | 03:00 |
sdague | actually, it's when we actually started testing it | 03:00 |
sdague | that code's been in nova since oct | 03:00 |
jog0 | :( | 03:00 |
sdague | maybe something else changed wrt to it | 03:00 |
sdague | also, we're kind of mostly blind for the last couple of weeks | 03:01 |
jog0 | hmm so logstash.o.o doesn't show the same graph | 03:01 |
jog0 | there are hits before jan 16th there | 03:01 |
*** gokrokve has quit IRC | 03:02 | |
openstackgerrit | A change was merged to openstack-infra/config: Update TripleO Cloud API endpoint for Nodepool https://review.openstack.org/66491 | 03:02 |
*** cyeoh has joined #openstack-infra | 03:02 | |
*** AaronGr_Zzz is now known as AaronGr | 03:03 | |
sdague | jog0: maybe hot data issue | 03:03 |
sdague | let's see if the query fills out next go around | 03:03 |
jog0 | yeah | 03:03 |
openstackgerrit | A change was merged to openstack-infra/elastic-recheck: Mark resolved bugs https://review.openstack.org/67752 | 03:04 |
notmyname | fungi: what bug number should I use for a recheck? | 03:05 |
openstackgerrit | A change was merged to openstack-infra/elastic-recheck: Add check for bug 1270608 https://review.openstack.org/67713 | 03:05 |
*** praneshp_ has joined #openstack-infra | 03:05 | |
notmyname | I know you guys are aware of it, but I want this to be in the logs (ie on the record). the patch that is about to fail (because the test node couldn't connect to git.o.o) has been in the queue for at least 50 hours and been rechecked 19 times due to gate resets | 03:06 |
fungi | notmyname: i don't think we have one for it--not that i've seen at any rate | 03:06 |
*** praneshp has quit IRC | 03:06 | |
*** praneshp_ is now known as praneshp | 03:06 | |
sdague | notmyname: I think it's worth reporting one against openstack-ci and we can build an er query for it | 03:08 |
sdague | there were 2 logstash hits back on the 8th | 03:08 |
sdague | so it happens from time to time | 03:08 |
openstackgerrit | Joe Gordon proposed a change to openstack-infra/elastic-recheck: Remove remaining cases of '@message' https://review.openstack.org/67754 | 03:08 |
sdague | oh, finally, I was going to get around to that | 03:08 |
notmyname | sdague: could you please do that and give me a bug number? | 03:08 |
sdague | notmyname: you can't register a bug? | 03:09 |
*** sarob has joined #openstack-infra | 03:09 | |
notmyname | sdague: I'm not particularly in the mood to file a bug against the gate and make it polite or charitable | 03:10 |
notmyname | see the above number for why | 03:10 |
openstackgerrit | Joe Gordon proposed a change to openstack-infra/elastic-recheck: Remove remaining cases of '@message' https://review.openstack.org/67754 | 03:10 |
*** AaronGr is now known as AaronGr_Zzz | 03:11 | |
*** sarob_ has joined #openstack-infra | 03:13 | |
*** sarob has quit IRC | 03:14 | |
notmyname | the job ahead actually failed!!!!!!!! | 03:15 |
fungi | seems that way | 03:15 |
notmyname | and now with another 60+ minutes to check the status, I'm stepping away for a bit | 03:16 |
*** sarob_ has quit IRC | 03:18 | |
clarkb | is something broken? | 03:19 |
clarkb | sorry superbowl is happening | 03:19 |
*** gokrokve has joined #openstack-infra | 03:19 | |
StevenK | Not for another two weeks? :-P | 03:19 |
fungi | clarkb: nothing new is broken, to my knowledge. do you ask for any particular reason, or just checking in? | 03:20 |
clarkb | fungi the bug number questions | 03:22 |
fungi | clarkb: oh, apparently we saw an hpcloud vm in az2 fail a job because it tried to connect to the ipv6 address of git.o.o and (unsurprisingly) got a network unreachable response | 03:23 |
fungi | which means it must have somehow gotten a global-scope address from somewhere | 03:24 |
openstackgerrit | Joe Gordon proposed a change to openstack-infra/elastic-recheck: Clarify required parameters in query_builder https://review.openstack.org/67756 | 03:24 |
openstackgerrit | Joe Gordon proposed a change to openstack-infra/elastic-recheck: Use short build_uuids in elasticSearch queries https://review.openstack.org/67596 | 03:24 |
*** gokrokve has quit IRC | 03:24 | |
fungi | clarkb: all i can guess is maybe another client in the same ethernet broadcast domain had radvd running or was otherwise generating router advertisements for some reason | 03:25 |
clarkb | awesome | 03:25 |
clarkb | dont we firewall that? | 03:27 |
clarkb | or not because no ipv6 typically | 03:27 |
fungi | ipv6 icmp type for ra? probably not explicitly | 03:27 |
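If the team did decide to filter rogue router advertisements explicitly, the rule fungi means would look something like the following; this is an untested sketch (printed rather than applied, since installing it needs root):

```shell
# Sketch: drop inbound IPv6 router advertisements (ICMPv6 type 134)
# so a rogue radvd on the same broadcast domain can't hand slaves a
# bogus global-scope address. Illustrative only, not a deployed rule.
RULE="ip6tables -A INPUT -p icmpv6 --icmpv6-type router-advertisement -j DROP"
echo "$RULE"
```

The trade-off is that any provider region which legitimately uses RA/SLAAC for addressing would break, so such a rule would have to be scoped per provider.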
*** vkozhukalov has joined #openstack-infra | 03:28 | |
lifeless | fungi: so, if its all landed/ing we can reenable puppet? | 03:35 |
*** nati_ueno has joined #openstack-infra | 03:41 | |
jog0 | https://review.openstack.org/#/c/67485/ | 03:41 |
jog0 | that should help with resource issues ever so slightly | 03:41 |
jog0 | sdague: ^ | 03:41 |
jog0 | thats to get better classification numbers | 03:41 |
*** nati_ueno has quit IRC | 03:43 | |
*** nati_ueno has joined #openstack-infra | 03:44 | |
fungi | lifeless: possibly. i'm not sure if this is a good week to be experimenting with (and potentially destabilizing) nodepool, but i won't really be around to troubleshoot it much during the week so i'll defer to clarkb and mordred if they're going to be in a position to keep an eye on it | 03:48 |
jog0 | sdague: can you review https://review.openstack.org/#/c/67596/2 | 03:54 |
jog0 | still waiting for some more data to finish testing | 03:54 |
jog0 | but it looks like its working | 03:54 |
jog0 | had a failed-to-classify failure | 03:54 |
jog0 | when the current e-r had an incorrect classification | 03:55 |
jog0 | waiting for a successful classification | 03:55 |
*** uriststk has joined #openstack-infra | 03:56 | |
*** slong has quit IRC | 04:06 | |
*** slong_ has joined #openstack-infra | 04:06 | |
*** sarob has joined #openstack-infra | 04:13 | |
*** uriststk has quit IRC | 04:14 | |
fungi | sdague: jog0: that latest gate reset looks like nova v3 api problems again. does v3 testing just need to be disabled again? | 04:17 |
*** sarob has quit IRC | 04:18 | |
cyeoh | fungi: do you have a link to that failure? | 04:19 |
*** coolsvap has joined #openstack-infra | 04:19 | |
fungi | cyeoh: https://jenkins03.openstack.org/job/gate-tempest-dsvm-neutron/2524/consoleText | 04:19 |
cyeoh | fungi: thx | 04:20 |
*** gokrokve has joined #openstack-infra | 04:20 | |
mattoliverau | cyeoh: are you breaking the v3 api again :P I see you survived the Adelaide heat wave, Melbourne had it pretty bad as well, damn thing followed us back from LCA ;P | 04:23 |
StevenK | mattoliverau: I think the heatwave was on your flight | 04:25 |
cyeoh | mattoliverau: between Perth and Adelaide I ended up with 7 days in a row >40C and three in a row >44C | 04:25 |
cyeoh | I don't think the v3 API is broken but am looking now just to check :-) | 04:25 |
*** gokrokve has quit IRC | 04:25 | |
mattoliverau | StevenK: maybe it hid in my bags :P | 04:25 |
*** dcramer_ has quit IRC | 04:26 | |
fungi | cyeoh: see the earlier discussion, sdague asserts "bug 1270680 - v3 extensions api inherently racey wrt instances - might be one of our biggest new gate issues" | 04:26 |
StevenK | mattoliverau: Haha | 04:26 |
notmyname | bug 1264972 for it? | 04:26 |
notmyname | fungi: ah, bug 1270680 instead? | 04:26 |
cyeoh | fungi: thanks, will look into it now | 04:27 |
fungi | notmyname: i'm not sure--i just keep the lights on. i defer to nova devs like cyeoh and sdague on these sorts of things | 04:27 |
* fungi compares error messages | 04:28 | |
cyeoh | I guess sdague is asleep by now... | 04:28 |
fungi | eh, it's not even midnight in our tz yet. he's probably just distracted by sportball (assuming the game is still going anyway) | 04:29 |
StevenK | No, game finished | 04:29 |
fungi | notmyname: 1264972 looks more searchable anyway | 04:31 |
*** nati_uen_ has joined #openstack-infra | 04:32 | |
cyeoh | fungi: oh yes, 1270680 is definitely a problem. I think there's lighter weight things we can do than disable the v3 api testing though | 04:32 |
*** nati_uen_ has quit IRC | 04:33 | |
*** nati_uen_ has joined #openstack-infra | 04:33 | |
fungi | cyeoh: if it's something which will significantly reduce spurious tempest test failures, i'll gladly shove it to the head of the gate so fewer changes get kicked out needlessly | 04:35 |
notmyname | what magic is behind the zuul queue having patches that have been in the queue for 4 hours ahead of patches that have been around for 35 hours? | 04:35 |
cyeoh | fungi: cool - am just looking now at how to fix it. I think a proper fix should be pretty straight forward | 04:36 |
*** nati_ueno has quit IRC | 04:36 | |
fungi | notmyname: sdague's change to carry over the enqueue time isn't actually used on changes which are gerrit dependencies (you'll note the offenders follow changes with sane-looking enqueue times for the same project) | 04:36 |
notmyname | fungi: ok | 04:36 |
fungi | so when those dependent changes get reenqueued, they end up with their enqueue times reset apparently. just noticed that myself a few hours ago | 04:37 |
notmyname | fungi: I figured it had something to do with dependencies. so the patches with shorter times have been around just as long, or those were put up front because of the git logic? | 04:37 |
notmyname | fungi: ah ok | 04:37 |
fungi | yeah, they've been in there as long as the others | 04:38 |
fungi | it's just lying | 04:38 |
fungi | cosmetic bug | 04:38 |
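The cosmetic bug fungi describes amounts to re-enqueued dependent changes getting a fresh timestamp. A minimal Python sketch of the fix, carrying the original enqueue time over when an item is re-enqueued; the class and attribute names here are hypothetical, not zuul's actual code:

```python
import time


class QueueItem:
    """A change sitting in a zuul-like pipeline (illustrative only)."""

    def __init__(self, change):
        self.change = change
        # Set when the item first enters the pipeline.
        self.enqueue_time = time.time()


def reenqueue(old_item):
    """Re-add a change after a gate reset without resetting its clock."""
    new_item = QueueItem(old_item.change)
    # Carry the original timestamp over so the status page keeps
    # showing how long the change has really been waiting.
    new_item.enqueue_time = old_item.enqueue_time
    return new_item
```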
jog0 | fungi: I haven't dug into the v3 work enough to know if disabling is the right move | 04:38 |
fungi | jog0: cyeoh seems to have lighter-weight ideas there | 04:39 |
*** dcramer_ has joined #openstack-infra | 04:39 | |
jog0 | fungi: cool | 04:40 |
cyeoh | fungi, jog0: so I think we have this potential racey failure mode all over both v2 and v3 APIs | 04:44 |
cyeoh | I guess we've just been lucky in the past (or we haven't noticed it anyway) | 04:44 |
jog0 | cyeoh: agreed | 04:46 |
jog0 | its v2 and v3 | 04:46 |
cyeoh | jog0: so I guess there's two ways to fix this. Cache a whole lot more information in the resp_obj or fail gracefully in extensions if the instance is not found | 04:49 |
jog0 | failing when we don't need to is a bad idea | 04:50 |
cyeoh | I think I prefer the latter - not including a bit of information about an instance which has just been deleted anyway seems okayish to me | 04:50 |
jog0 | as in if the data is in the DB but we can't find it ... that's bad | 04:50 |
cyeoh | yea, in this case its because the data has just been deleted. | 04:51 |
jog0 | cyeoh: ahh | 04:51 |
cyeoh | so we can just not append the data we can't to anymore (because of the race) | 04:51 |
cyeoh | "can't get to" I mean | 04:51 |
jog0 | cyeoh: TBH I haven't looked at this enough to have enough of an understanding of the issue | 04:51 |
jog0 | cyeoh: so its your call. I am distracted by elastic-recheck stuff at the moment | 04:52 |
cyeoh | jog0: ok, np. I'll see if its by luck just hitting a specific extension which we can fix quickly now, or if we need to fix all of them to make a difference | 04:52 |
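The "fail gracefully" option cyeoh prefers can be sketched in a few lines of Python: when the instance was deleted between the main lookup and the extension running, skip the extra data instead of erroring out. All names here are illustrative; this is not nova's real extension API:

```python
class InstanceNotFound(Exception):
    """Raised when an instance has vanished (stand-in for nova's own)."""


def get_extra_data(db, instance_id):
    # Stand-in for the per-instance lookup an extension performs.
    try:
        return db[instance_id]
    except KeyError:
        raise InstanceNotFound(instance_id)


def extend_response(db, servers):
    """Append extension data, silently skipping deleted instances."""
    out = []
    for server in servers:
        entry = dict(server)
        try:
            entry["extra"] = get_extra_data(db, server["id"])
        except InstanceNotFound:
            # The instance was deleted between the main DB query and
            # this extension running: omit the field rather than fail.
            pass
        out.append(entry)
    return out
```

With this shape, a racing delete costs only a missing field in one response instead of a failed request.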
*** amotoki has joined #openstack-infra | 04:55 | |
jog0 | cyeoh: thanks | 04:55 |
*** gokrokve has joined #openstack-infra | 04:57 | |
jog0 | cyeoh: can you update that bug with your comments | 04:57 |
cyeoh | jog0: just did | 04:57 |
jog0 | cyeoh: excellent | 04:58 |
cyeoh | hrm and looking through logstash for occurrences of it just found another bug in the v2 api ;-) | 04:59 |
*** senk has joined #openstack-infra | 05:00 | |
jog0 | :/ | 05:02 |
*** slong has joined #openstack-infra | 05:07 | |
*** slong_ has quit IRC | 05:07 | |
*** katyafervent has quit IRC | 05:11 | |
*** katyafervent has joined #openstack-infra | 05:11 | |
*** sarob has joined #openstack-infra | 05:13 | |
*** nicedice has quit IRC | 05:15 | |
*** sarob has quit IRC | 05:18 | |
jog0 | fungi: how far are we from getting all the jenkins masters to have the console.html=>elasticSearch fix | 05:19 |
fungi | jog0: clarkb and zaro were monitoring the plugin upgrade on jenkins02 before applying it on the others | 05:20 |
*** senk has quit IRC | 05:20 | |
jog0 | fungi: thanks, from what I see the fix is definitely helping | 05:21 |
*** nati_ueno has joined #openstack-infra | 05:21 | |
*** nati_ueno has quit IRC | 05:21 | |
jog0 | haven't seen a missing console on any jenkins 01 and 02 nodes | 05:22 |
*** nati_ueno has joined #openstack-infra | 05:22 | |
*** krtaylor has joined #openstack-infra | 05:25 | |
*** nati_uen_ has quit IRC | 05:25 | |
lifeless | fungi: ah, so its less about experimenting with nodepool and more about getting jobs running for us; its rather critical path | 05:25 |
lifeless | fungi: we'll obviously stand ready to support any issues it might cause | 05:25 |
lifeless | fungi: could we run a separate nodepool in fact, avoid whatever bugs might lurk in nodepool? | 05:26 |
*** chandankumar_ has joined #openstack-infra | 05:26 | |
openstackgerrit | Joe Gordon proposed a change to openstack-infra/elastic-recheck: Sort uncategorized fails by time https://review.openstack.org/67761 | 05:26 |
lifeless | mordred: clarkb: ^ you may be more awake :P | 05:26 |
clarkb | whats wrong with the existing nodepool? | 05:27 |
clarkb | I'm not sure the nodepool cli is built for two nodepools | 05:27 |
clarkb | or the db stuff | 05:27 |
* fungi is awake, but not *very* awake | 05:30 | |
lifeless | clarkb: fungi is worried that turning tripleo-test-cloud on will cause issues w/nodepool when the gate is already fragile | 05:31 |
lifeless | clarkb: I was suggesting to mitigate that by running an entirely separate nodepool that is connected to the same geard | 05:32 |
fungi | nothing's necessarily wrong with the the existing nodepool. i just don't want to greenlight all the tripleo-ci-supporting patches for it and the config by reenabling puppet on the server when i'm not going to necessarily be around to troubleshoot it | 05:32 |
fungi | so leaving that call to those who will be around | 05:33 |
jog0 | is there a bug filed for: http://logs.openstack.org/23/66223/1/gate/gate-python-heatclient-pypy/9950fd5/console.html#_2014-01-19_01_08_37_063 | 05:35 |
jog0 | pip timeouts | 05:35 |
clarkb | fungi I see | 05:37 |
*** michchap has quit IRC | 05:37 | |
*** michchap has joined #openstack-infra | 05:37 | |
clarkb | jog0 I think if you search under openstack-ci there may be | 05:37 |
jog0 | all I found was https://bugs.launchpad.net/openstack-ci/+bug/1254167 | 05:38 |
jog0 | which is a little different | 05:38 |
jog0 | this is the fingerprint I am using: filename:"console.html" AND message:"download.py\", line 495" | 05:38 |
jog0 | there aren't many occurrences thankfully | 05:38 |
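For context, elastic-recheck keeps fingerprints like the one jog0 pastes in per-bug YAML query files. A sketch of what such a file might look like; the bug number and exact layout are assumptions, not verified against the repo:

```yaml
# queries/1270710.yaml -- bug number illustrative
query: >-
  filename:"console.html" AND message:"download.py\", line 495"
```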
lifeless | clarkb: back in the states? | 05:40 |
*** DinaBelova_ is now known as DinaBelova | 05:40 | |
*** SergeyLukjanov_ is now known as SergeyLukjanov | 05:40 | |
clarkb | lifeless: yes, mostly over jetlag now | 05:41 |
lifeless | clarkb: \o/ | 05:42 |
openstackgerrit | lifeless proposed a change to openstack-infra/config: Tripleo-gate needs the gear library. https://review.openstack.org/67762 | 05:42 |
fungi | the jetlag's not gone, just lulling you into a false sense of security | 05:42 |
StevenK | Haha | 05:42 |
lifeless | mordred: more ^ fodder | 05:42 |
lifeless | mordred: we could install that at runtime, but its really part of base setup | 05:42 |
*** carl_baldwin has joined #openstack-infra | 05:45 | |
clarkb | I will change scp plugins tomorrow on 03 and 04 then resume holidaying | 05:45 |
jog0 | clarkb: thanks | 05:46 |
*** nosnos has quit IRC | 05:52 | |
*** nosnos_ has joined #openstack-infra | 05:52 | |
fungi | oh, right, tomorrow is a usa holiday | 05:55 |
lifeless | nuts :( | 05:55 |
* fungi has lost track of which days are weekends much less holidays | 05:55 | |
fungi | and yes, we're all nuts here | 05:56 |
*** oubiwann_ has quit IRC | 05:56 | |
StevenK | Oh, MLK day | 05:58 |
jog0 | clarkb: how do I add grenade logs to elasticSearch | 05:59 |
jog0 | actually, since this is supposed to be the weekend, never mind | 05:59 |
openstackgerrit | Joe Gordon proposed a change to openstack-infra/elastic-recheck: Add query for bug 1270710 https://review.openstack.org/67764 | 06:01 |
clarkb | jog0 add the files to the list of files | 06:01 |
jog0 | clarkb: where is that? | 06:02 |
clarkb | jog0 though ideally we list the files without paths and recursively look them up | 06:02 |
jog0 | clarkb: so in tempest the files are under logs/ | 06:03 |
clarkb | modules/openstack_project/files/logstash/somethingclient.yaml | 06:03 |
jog0 | but in grenade they are under new/logs | 06:03 |
clarkb | jog0 right. today logstash needs full paths | 06:03 |
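Because logstash currently matches on full paths, the grenade copies of the same service logs have to be listed separately from the tempest ones. A hypothetical fragment of the client YAML clarkb points at; the schema shown is illustrative, only the path difference is taken from the discussion:

```yaml
source-files:
  # tempest jobs write service logs at the job root
  - name: logs/screen-n-api.txt
  # grenade jobs write the post-upgrade copies under new/
  - name: new/logs/screen-n-api.txt
```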
fungi | okay, i swear i'm really going to try to take a nap now | 06:04 |
jog0 | clarkb: lets pick this up on tuesday | 06:04 |
clarkb | jog0 ok. sherlock is on now :) | 06:04 |
*** nati_uen_ has joined #openstack-infra | 06:10 | |
clarkb | jog0 is the pip fail downloading pip installer from github during devstack? | 06:12 |
clarkb | that is arguably a devstack bug | 06:12 |
jog0 | clarkb: http://logs.openstack.org/23/66223/1/gate/gate-python-heatclient-pypy/9950fd5/console.html#_2014-01-19_01_08_37_063 | 06:13 |
*** sarob has joined #openstack-infra | 06:13 | |
clarkb | ah no different problem | 06:13 |
*** nati_ueno has quit IRC | 06:14 | |
*** gokrokve has quit IRC | 06:16 | |
*** sarob has quit IRC | 06:18 | |
*** rahmu has quit IRC | 06:27 | |
*** DinaBelova has quit IRC | 06:28 | |
*** carl_baldwin has quit IRC | 06:29 | |
*** rahmu has joined #openstack-infra | 06:29 | |
*** DinaBelova has joined #openstack-infra | 06:31 | |
*** mrda has quit IRC | 06:41 | |
*** vkozhukalov has quit IRC | 06:44 | |
*** bookwar has joined #openstack-infra | 06:46 | |
*** gokrokve has joined #openstack-infra | 06:47 | |
*** gokrokve_ has joined #openstack-infra | 06:49 | |
*** jhesketh_ has quit IRC | 06:50 | |
*** yamahata has joined #openstack-infra | 06:52 | |
*** gokrokve has quit IRC | 06:52 | |
*** nosnos_ has quit IRC | 06:52 | |
*** nosnos has joined #openstack-infra | 06:53 | |
*** gokrokve_ has quit IRC | 06:54 | |
*** jhesketh has quit IRC | 06:56 | |
amotoki | hi, I would like to request gerrit account for external testing. | 06:57 |
*** pblaho has joined #openstack-infra | 06:57 | |
amotoki | I am now working on neutron third party testing. Is this a right place to request an account? | 06:57 |
*** gokrokve has joined #openstack-infra | 06:58 | |
*** nati_ueno has joined #openstack-infra | 06:59 | |
clarkb | amotoki please see the document at http://ci.openstack.org | 06:59 |
*** SergeyLukjanov is now known as SergeyLukjanov_a | 06:59 | |
*** SergeyLukjanov_a is now known as SergeyLukjanov_ | 07:00 | |
*** nati_uen_ has quit IRC | 07:01 | |
amotoki | clarkb: I saw http://ci.openstack.org/third_party.html and there are several ways: #openstack-infra , ML, bug report. Can I request it in this channel? | 07:02 |
*** gokrokve has quit IRC | 07:03 | |
*** nati_ueno has quit IRC | 07:04 | |
*** nati_ueno has joined #openstack-infra | 07:04 | |
*** SergeyLukjanov_ is now known as SergeyLukjanov | 07:08 | |
*** mrda has joined #openstack-infra | 07:09 | |
clarkb | amotoki: you can but it is sunday night before a US holiday. a better bet is the mail list | 07:10 |
amotoki | clarkb: ah.... thanks.. I will request it via the list. | 07:12 |
*** sarob has joined #openstack-infra | 07:13 | |
*** yolanda has joined #openstack-infra | 07:14 | |
*** jhesketh_ has joined #openstack-infra | 07:15 | |
*** sarob has quit IRC | 07:17 | |
*** SergeyLukjanov is now known as SergeyLukjanov_ | 07:18 | |
*** morganfainberg is now known as morganfainberg|z | 07:19 | |
*** mrda has quit IRC | 07:20 | |
*** mayu has joined #openstack-infra | 07:29 | |
*** jcoufal has joined #openstack-infra | 07:29 | |
*** crank has quit IRC | 07:30 | |
*** mayu has quit IRC | 07:34 | |
*** NikitaKonovalov_ is now known as NikitaKonovalov | 07:43 | |
*** crank has joined #openstack-infra | 07:44 | |
*** afazekas_ has joined #openstack-infra | 07:52 | |
*** SergeyLukjanov_ is now known as SergeyLukjanov | 07:53 | |
*** nati_uen_ has joined #openstack-infra | 07:53 | |
*** morganfainberg|z is now known as morganfainberg | 07:54 | |
*** nati_ueno has quit IRC | 07:56 | |
*** gokrokve has joined #openstack-infra | 07:59 | |
ttx | FTR I'm traveling all day, mostly on a non-wifi transatlantic plane | 08:02 |
*** mrda has joined #openstack-infra | 08:02 | |
*** gokrokve has quit IRC | 08:04 | |
*** yolanda has quit IRC | 08:05 | |
*** jamielennox is now known as jamielennox|away | 08:08 | |
*** crank has quit IRC | 08:09 | |
*** mrda has quit IRC | 08:09 | |
*** crank has joined #openstack-infra | 08:09 | |
*** zhiwei has quit IRC | 08:09 | |
*** zhiwei has joined #openstack-infra | 08:09 | |
*** hashar has joined #openstack-infra | 08:12 | |
*** sarob has joined #openstack-infra | 08:13 | |
*** sarob has quit IRC | 08:18 | |
*** flaper87|afk is now known as flaper87 | 08:18 | |
*** crank has quit IRC | 08:21 | |
*** crank has joined #openstack-infra | 08:22 | |
openstackgerrit | Nikita Konovalov proposed a change to openstack-infra/storyboard: Minor migration fix https://review.openstack.org/67789 | 08:25 |
*** yolanda has joined #openstack-infra | 08:25 | |
*** luqas has joined #openstack-infra | 08:26 | |
openstackgerrit | Nikita Konovalov proposed a change to openstack-infra/storyboard: Introducing basic REST API https://review.openstack.org/63118 | 08:27 |
*** vkozhukalov has joined #openstack-infra | 08:28 | |
*** vkozhukalov has quit IRC | 08:34 | |
*** matrohon has joined #openstack-infra | 08:36 | |
*** luqas has quit IRC | 08:36 | |
*** nati_ueno has joined #openstack-infra | 08:41 | |
*** nati_ueno has quit IRC | 08:41 | |
*** nati_ueno has joined #openstack-infra | 08:42 | |
*** praneshp has quit IRC | 08:42 | |
*** hashar has quit IRC | 08:42 | |
*** praneshp has joined #openstack-infra | 08:44 | |
*** nati_uen_ has quit IRC | 08:44 | |
*** mrda has joined #openstack-infra | 08:44 | |
*** SergeyLukjanov is now known as SergeyLukjanov_ | 08:47 | |
*** DinaBelova is now known as DinaBelova_ | 08:48 | |
*** jcoufal has quit IRC | 08:49 | |
*** vkozhukalov has joined #openstack-infra | 08:51 | |
*** fbo_away is now known as fbo | 08:52 | |
*** zhiwei has quit IRC | 08:54 | |
*** senk has joined #openstack-infra | 08:56 | |
*** senk has quit IRC | 08:57 | |
*** BobBallAway is now known as BobBall | 08:58 | |
*** gokrokve has joined #openstack-infra | 09:00 | |
*** gokrokve has quit IRC | 09:05 | |
*** nati_ueno has quit IRC | 09:07 | |
*** mancdaz_away is now known as mancdaz | 09:07 | |
*** nati_ueno has joined #openstack-infra | 09:07 | |
*** mancdaz is now known as mancdaz_away | 09:07 | |
*** luqas has joined #openstack-infra | 09:12 | |
*** nati_ueno has quit IRC | 09:12 | |
*** jcoufal has joined #openstack-infra | 09:12 | |
*** sarob has joined #openstack-infra | 09:13 | |
*** derekh has joined #openstack-infra | 09:15 | |
*** yassine has joined #openstack-infra | 09:16 | |
*** sarob has quit IRC | 09:18 | |
*** markmc has joined #openstack-infra | 09:18 | |
*** dpyzhov has joined #openstack-infra | 09:18 | |
*** yassine has quit IRC | 09:18 | |
*** yassine has joined #openstack-infra | 09:18 | |
*** jpich has joined #openstack-infra | 09:23 | |
*** zhiwei has joined #openstack-infra | 09:25 | |
*** praneshp has quit IRC | 09:29 | |
*** dizquierdo has joined #openstack-infra | 09:35 | |
*** dpyzhov has quit IRC | 09:35 | |
*** dpyzhov has joined #openstack-infra | 09:36 | |
*** SergeyLukjanov_ is now known as SergeyLukjanov | 09:43 | |
*** jamielennox|away is now known as jamielennox | 09:48 | |
*** SergeyLukjanov is now known as SergeyLukjanov_ | 09:48 | |
*** IvanBerezovskiy has joined #openstack-infra | 09:49 | |
*** jp_at_hp has joined #openstack-infra | 09:52 | |
*** mancdaz_away is now known as mancdaz | 09:54 | |
*** morganfainberg is now known as morganfainberg|z | 09:57 | |
*** derekh is now known as derekh_afk | 09:59 | |
*** gokrokve has joined #openstack-infra | 10:01 | |
*** rwsu has joined #openstack-infra | 10:03 | |
*** vkozhukalov has quit IRC | 10:04 | |
*** gokrokve has quit IRC | 10:06 | |
*** Ryan_Lane has quit IRC | 10:08 | |
*** johnthetubaguy has joined #openstack-infra | 10:08 | |
*** amotoki has quit IRC | 10:08 | |
*** sarob has joined #openstack-infra | 10:13 | |
*** vkozhukalov has joined #openstack-infra | 10:16 | |
*** dpyzhov has quit IRC | 10:16 | |
*** sarob has quit IRC | 10:18 | |
*** max_lobur_afk is now known as max_lobur | 10:18 | |
*** zhiwei has quit IRC | 10:38 | |
*** mrda has quit IRC | 10:39 | |
*** _ruhe is now known as ruhe | 10:42 | |
*** zhiwei has joined #openstack-infra | 10:43 | |
*** mrda has joined #openstack-infra | 10:46 | |
*** yassine has quit IRC | 10:46 | |
*** dpyzhov has joined #openstack-infra | 10:51 | |
*** zhiwei has quit IRC | 10:55 | |
*** iv_m has joined #openstack-infra | 10:59 | |
*** ArxCruz has joined #openstack-infra | 11:01 | |
*** gokrokve has joined #openstack-infra | 11:01 | |
*** markvoelker has quit IRC | 11:04 | |
*** gokrokve has quit IRC | 11:06 | |
*** sarob has joined #openstack-infra | 11:13 | |
*** sarob has quit IRC | 11:18 | |
*** rfolco has joined #openstack-infra | 11:27 | |
*** boris-42 has quit IRC | 11:31 | |
*** derekh_afk is now known as derekh | 11:38 | |
*** boris-42 has joined #openstack-infra | 11:41 | |
*** pblaho has quit IRC | 11:47 | |
*** jhesketh_ has quit IRC | 11:51 | |
openstackgerrit | Nikita Konovalov proposed a change to openstack-infra/storyboard: Fix the intial db migration https://review.openstack.org/67592 | 11:51 |
openstackgerrit | Nikita Konovalov proposed a change to openstack-infra/storyboard: Introducing basic REST API https://review.openstack.org/63118 | 11:52 |
openstackgerrit | Nikita Konovalov proposed a change to openstack-infra/storyboard: Introducing basic REST API https://review.openstack.org/63118 | 11:54 |
*** mrda has quit IRC | 11:56 | |
sdague | fungi: when you wake up, cyeoh has a fix for that new bug | 11:57 |
*** jamielennox is now known as jamielennox|away | 11:58 | |
max_lobur | Somebody from the requirements core group, could you please review/approve the patch https://review.openstack.org/#/c/66349/3? It already has one +1 from a core reviewer | 12:00 |
*** gokrokve has joined #openstack-infra | 12:02 | |
*** gokrokve has quit IRC | 12:07 | |
*** coolsvap has quit IRC | 12:07 | |
*** ruhe is now known as _ruhe | 12:09 | |
*** gsamfira has joined #openstack-infra | 12:10 | |
*** gsamfira has joined #openstack-infra | 12:11 | |
*** yassine has joined #openstack-infra | 12:11 | |
*** sarob has joined #openstack-infra | 12:13 | |
*** sarob has quit IRC | 12:18 | |
*** CaptTofu has joined #openstack-infra | 12:25 | |
*** _ruhe is now known as ruhe | 12:30 | |
*** dims has quit IRC | 12:34 | |
*** dpyzhov has quit IRC | 12:34 | |
*** yaguang has quit IRC | 12:34 | |
*** yassine has quit IRC | 12:39 | |
*** dims has joined #openstack-infra | 12:39 | |
*** yassine has joined #openstack-infra | 12:40 | |
*** markmc has quit IRC | 12:41 | |
*** markmc has joined #openstack-infra | 12:44 | |
*** CaptTofu has quit IRC | 12:46 | |
*** senk has joined #openstack-infra | 12:50 | |
*** pblaho has joined #openstack-infra | 12:50 | |
*** senk has quit IRC | 12:51 | |
*** dkranz has joined #openstack-infra | 12:57 | |
*** david-lyle_ has quit IRC | 12:58 | |
*** SergeyLukjanov_ is now known as SergeyLukjanov | 12:58 | |
*** DinaBelova_ is now known as DinaBelova | 12:58 | |
*** AJaeger has joined #openstack-infra | 13:01 | |
*** smarcet has joined #openstack-infra | 13:03 | |
*** gokrokve has joined #openstack-infra | 13:03 | |
*** heyongli has joined #openstack-infra | 13:06 | |
*** markmc has quit IRC | 13:07 | |
*** gokrokve has quit IRC | 13:08 | |
*** ruhe is now known as _ruhe | 13:11 | |
*** sarob has joined #openstack-infra | 13:13 | |
*** markmc has joined #openstack-infra | 13:15 | |
*** sarob has quit IRC | 13:17 | |
*** mriedem has joined #openstack-infra | 13:19 | |
*** max_lobur is now known as max_lobur_afk | 13:26 | |
*** SergeyLukjanov is now known as SergeyLukjanov_ | 13:26 | |
matel | Hi, I would like to have some recommendations on what is the proper development process for the devstack-gate project | 13:26 |
*** _ruhe is now known as ruhe | 13:29 | |
sdague | matel: can you be more specific for what you are looking for? | 13:29 |
*** alexpilotti has joined #openstack-infra | 13:30 | |
*** thomasem has joined #openstack-infra | 13:37 | |
*** flaper87 is now known as flaper87|afk | 13:37 | |
fungi | sdague: i saw the discussion in #nova... assuming it's https://review.openstack.org/67767 we seem to still need an approver | 13:41 |
sdague | fungi: yep | 13:41 |
sdague | and test results | 13:42 |
fungi | well, yeah, that | 13:42 |
sdague | so once we get activity on nova channel, and I get a +A, I'll ping you | 13:42 |
fungi | sounds good | 13:43 |
matel | sdague: I want to test some changes in devstack-gate. | 13:44 |
matel | sdague: I already have an "emulated" node. | 13:44 |
matel | sdague: ./safe-devstack-vm-gate-wrap.sh seems to use the master. | 13:45 |
AJaeger | infra team, fungi: I would love to see the other api projects gated the same way as api-sites (right now they use gate-noop), do you have time for a review, please? https://review.openstack.org/#/c/67394/ | 13:45 |
matel | the master of devstack-gate | 13:45 |
matel | sdague: I have this script: https://github.com/matelakat/xenapi-os-testing/blob/start-devstack/launch-node.sh | 13:46 |
matel | sdague: on line 66, I am checking out the branch that I want to try out. | 13:47 |
sdague | yeh, honestly, we don't have a good model for testing that outside of the gate itself right now | 13:48 |
sdague | honestly, when I am making changes I usually use the gate to test them | 13:49 |
*** iv_m has quit IRC | 13:49 | |
*** Ng_ has joined #openstack-infra | 13:49 | |
matel | sdague: How does that work? The issue in my case, is that it requires a xenserver node. | 13:50 |
matel | Which does not exist in nodepool yet. | 13:50 |
sdague | matel: well we haven't had that situation before | 13:50 |
matel | sdague: I see. | 13:51 |
*** Ng_ has quit IRC | 13:51 | |
*** Ng_ has joined #openstack-infra | 13:51 | |
matel | sdague: So I would like to modify: https://github.com/openstack-infra/devstack-gate/blob/master/devstack-vm-gate.sh so that it can work with xenserver as well (I need to adjust the localrc basically) | 13:52 |
matel | sdague: Maybe checking out my branch to a location, and set SKIP_DEVSTACK_GATE_PROJECT ? | 13:53 |
sdague | matel: yeh that might work | 13:53 |
sdague | that's in place to test d-g changes actually, so it won't recursively keep checking itself out | 13:54 |
matel | My idea is that I'm gonna launch my node, check out d-g to the location(I need to look at it), and see if that works. | 13:55 |
matel | I need to check where does the checked-out repos live. | 13:56 |
matel | I guess it will live in $BASE/new | 13:57 |
matel | which is /opt/stack/new. | 13:57 |
matel | Okay, I give it a try. | 13:58 |
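matel's plan boils down to pre-seeding the node with a local devstack-gate branch and telling the wrapper not to re-check-out master over it. A rough shell sketch under those assumptions; the repo URL and branch name are placeholders, and the paths are taken from the discussion rather than a documented recipe:

```shell
# Pre-seed the checkout the gate scripts expect, then stop the wrapper
# from re-cloning master over it via SKIP_DEVSTACK_GATE_PROJECT.
BASE=/opt/stack
export SKIP_DEVSTACK_GATE_PROJECT=1

# On a real node this would be followed by something like:
#   git clone -b my-xenserver-branch \
#       https://github.com/example/devstack-gate "$BASE/new/devstack-gate"
#   "$BASE/new/devstack-gate/safe-devstack-vm-gate-wrap.sh"
echo "devstack-gate checkout expected at $BASE/new/devstack-gate"
```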
*** dstanek has joined #openstack-infra | 14:01 | |
*** heyongli has quit IRC | 14:02 | |
*** gokrokve has joined #openstack-infra | 14:04 | |
*** dcramer_ has quit IRC | 14:04 | |
*** SergeyLukjanov_ is now known as SergeyLukjanov | 14:04 | |
*** b3nt_pin has joined #openstack-infra | 14:08 | |
*** Ng_ has quit IRC | 14:08 | |
*** gokrokve has quit IRC | 14:09 | |
*** Ng_ has joined #openstack-infra | 14:12 | |
*** Ng has quit IRC | 14:13 | |
*** Ng_ is now known as Ng | 14:13 | |
*** sarob has joined #openstack-infra | 14:13 | |
*** b3nt_pin is now known as beagles | 14:15 | |
sdague | fungi: so that patch failed jenkins on an unrelated race. I still think it should be promoted. | 14:17 |
*** sarob has quit IRC | 14:18 | |
sdague | https://bugs.launchpad.net/nova/+bug/1270608 is the other new issue that showed up last week | 14:19 |
*** dprince has joined #openstack-infra | 14:27 | |
*** alexpilotti has quit IRC | 14:29 | |
*** alexpilotti has joined #openstack-infra | 14:29 | |
BobBall | sdague: what's the recommended way to run a single test in tempest these days? | 14:29 |
*** mrodden1 has quit IRC | 14:30 | |
*** pblaho1 has joined #openstack-infra | 14:31 | |
mriedem | https://review.openstack.org/#/c/67767/ is +A'ed, but needs to pass jenkins | 14:31 |
*** pblaho has quit IRC | 14:33 | |
sdague | BobBall: tox -eall testname | 14:33 |
*** damnsmith is now known as dansmith | 14:33 | |
BobBall | heh... | 14:33 |
BobBall | sorry | 14:33 |
sdague | fungi: please promote 67767 when you can | 14:33 |
BobBall | that should have been one of the combinations I tried. | 14:33 |
*** max_lobur_afk is now known as max_lobur | 14:33 | |
*** eharney has joined #openstack-infra | 14:34 | |
sdague | fungi: actually abort on that | 14:37 |
fungi | holding off | 14:38 |
*** senk has joined #openstack-infra | 14:38 | |
*** coolsvap has joined #openstack-infra | 14:41 | |
*** oubiwann_ has joined #openstack-infra | 14:45 | |
*** ryanpetrello has joined #openstack-infra | 14:45 | |
fungi | unfortunate... 67767,2 seems to have a merge conflict with some change ahead of it | 14:47 |
*** mrodden has joined #openstack-infra | 14:48 | |
*** SergeyLukjanov is now known as SergeyLukjanov_a | 14:51 | |
*** pblaho1 has quit IRC | 14:52 | |
*** SergeyLukjanov_a is now known as SergeyLukjanov_ | 14:52 | |
*** dcramer_ has joined #openstack-infra | 14:53 | |
*** pblaho has joined #openstack-infra | 14:55 | |
*** malini has joined #openstack-infra | 14:55 | |
malini | Good Morning!! | 14:56 |
malini | I have a couple of patches outstanding for adding MArconi support | 14:56 |
malini | Can I get some reviews please? | 14:56 |
malini | https://review.openstack.org/#/c/65145/ | 14:56 |
malini | https://review.openstack.org/#/c/65140/ | 14:57 |
malini | I need these merged before I can get my patch to tempest merged | 14:57 |
*** malini is now known as malini_afk | 15:00 | |
*** oubiwann_ has quit IRC | 15:00 | |
*** malini_afk is now known as malini | 15:02 | |
*** senk has quit IRC | 15:03 | |
*** oubiwann_ has joined #openstack-infra | 15:04 | |
*** gokrokve has joined #openstack-infra | 15:04 | |
*** afazekas_ has quit IRC | 15:05 | |
*** nosnos has quit IRC | 15:06 | |
*** annegent_ has joined #openstack-infra | 15:06 | |
*** senk has joined #openstack-infra | 15:07 | |
*** senk1 has joined #openstack-infra | 15:08 | |
*** gokrokve has quit IRC | 15:09 | |
sdague | fungi: yeh, we're still discussing 67767 | 15:11 |
*** senk has quit IRC | 15:12 | |
*** sarob has joined #openstack-infra | 15:13 | |
*** annegent_ has quit IRC | 15:13 | |
*** DinaBelova is now known as DinaBelova_ | 15:16 | |
*** afazekas_ has joined #openstack-infra | 15:16 | |
*** sarob has quit IRC | 15:18 | |
*** SergeyLukjanov_ is now known as SergeyLukjanov | 15:19 | |
*** DinaBelova_ is now known as DinaBelova | 15:20 | |
openstackgerrit | Zang MingJie proposed a change to openstack-infra/zuul: Supply authentication to zuul's gerrit baseurl https://review.openstack.org/67858 | 15:20 |
*** dims has quit IRC | 15:21 | |
*** rakhmerov has quit IRC | 15:22 | |
*** ryanpetrello has quit IRC | 15:22 | |
*** rakhmerov has joined #openstack-infra | 15:22 | |
openstackgerrit | Zang MingJie proposed a change to openstack-infra/zuul: Supply authentication to zuul's gerrit baseurl https://review.openstack.org/67858 | 15:23 |
*** mrmartin has joined #openstack-infra | 15:25 | |
openstackgerrit | Nikita Konovalov proposed a change to openstack-infra/storyboard: Load projects from yaml file https://review.openstack.org/66280 | 15:25 |
*** talluri has joined #openstack-infra | 15:30 | |
max_lobur | Somebody from the requirements core group, could you please review/approve the patch https://review.openstack.org/#/c/66349/3? It already has one +1 from a core reviewer | 15:30 |
*** nprivalova is now known as nadya_ | 15:31 | |
*** dmitkuzn has joined #openstack-infra | 15:32 | |
*** jgrimm has joined #openstack-infra | 15:34 | |
*** vkozhukalov has quit IRC | 15:34 | |
*** dims has joined #openstack-infra | 15:35 | |
*** gokrokve has joined #openstack-infra | 15:37 | |
*** gokrokve has joined #openstack-infra | 15:37 | |
*** rcleere has joined #openstack-infra | 15:40 | |
*** johnthetubaguy has quit IRC | 15:40 | |
*** DennyZhang has joined #openstack-infra | 15:40 | |
*** johnthetubaguy has joined #openstack-infra | 15:41 | |
openstackgerrit | Arx Cruz proposed a change to openstack-infra/config: Change mysql-devel to community-mysql-devel in Fedora https://review.openstack.org/62739 | 15:43 |
*** afazekas_ has quit IRC | 15:44 | |
*** juliashem has joined #openstack-infra | 15:46 | |
*** NikitaKonovalov is now known as NikitaKonovalov_ | 15:47 | |
*** mrmartin has quit IRC | 15:49 | |
*** annegent_ has joined #openstack-infra | 15:50 | |
*** dmitkuzn has quit IRC | 15:51 | |
*** senk1 has quit IRC | 15:51 | |
*** juliashem has quit IRC | 15:51 | |
*** annegent_ has quit IRC | 15:54 | |
*** ryanpetrello has joined #openstack-infra | 15:55 | |
*** ryanpetrello has quit IRC | 15:55 | |
*** marun has joined #openstack-infra | 15:57 | |
fungi | the merge rate seems to be getting substantially worse. we're on track to merge 3 or 4 changes to openstack/openstack in a 24-hour period | 15:57 |
fungi | with the load from check pileup putting zuul into a pendulum between pipelines, we're merging or kicking out (more often kicking out) one change from the gate every couple hours, yet we're approving a dozen an hour | 15:59 |
*** talluri has quit IRC | 16:01 | |
notmyname | fungi: how are you tracking that number? | 16:01 |
fungi | notmyname: looked at http://git.openstack.org/cgit/openstack/openstack/log/ | 16:01 |
fungi | 3 changes merged in the past 18 hours | 16:02 |
notmyname | fungi: thanks | 16:02 |
*** johnthetubaguy has quit IRC | 16:02 | |
fungi | and the cinder change at the head of the gate just failed a grenade job, which means now we get to service the 50 or so changes waiting for nodes in the check pipeline before we restart testing on the change which was behind it in the gate | 16:03 |
*** johnthetubaguy has joined #openstack-infra | 16:05 | |
fungi | granted that off-the-cuff metric misses changes to stable release branches, but right now those are broken anyway so we wouldn't be merging any changes to them regardless | 16:05 |
*** david-lyle_ has joined #openstack-infra | 16:05 | |
*** SergeyLukjanov is now known as SergeyLukjanov_ | 16:09 | |
openstackgerrit | Arx Cruz proposed a change to openstack-infra/config: Change mysql-devel to community-mysql-devel in Fedora https://review.openstack.org/62739 | 16:11 |
*** afazekas_ has joined #openstack-infra | 16:11 | |
*** nicedice has joined #openstack-infra | 16:13 | |
*** sarob has joined #openstack-infra | 16:13 | |
*** salv-orlando has joined #openstack-infra | 16:13 | |
*** jcoufal has quit IRC | 16:15 | |
*** DinaBelova is now known as DinaBelova_ | 16:17 | |
*** sarob has quit IRC | 16:18 | |
*** nati_ueno has joined #openstack-infra | 16:19 | |
*** marun has quit IRC | 16:20 | |
*** thuc has joined #openstack-infra | 16:20 | |
*** marun has joined #openstack-infra | 16:22 | |
*** johnthetubaguy has quit IRC | 16:22 | |
*** johnthetubaguy has joined #openstack-infra | 16:22 | |
*** dizquierdo has quit IRC | 16:27 | |
*** NikitaKonovalov_ is now known as NikitaKonovalov | 16:32 | |
sdague | mordred: any word on quota bump? | 16:32 |
fungi | i think it must be freudian that i've started mistyping "gate" as "hate" | 16:34 |
fungi | sdague: it looks like https://review.openstack.org/67371 could use an approval vote | 16:35 |
sdague | fungi: doh | 16:36 |
fungi | otherwise pretty much all of the tempest changes from last week's sprint have merged (except for a couple which are in the gate currently) | 16:36 |
sdague | fungi: where is it in the queue? | 16:36 |
*** mancdaz is now known as mancdaz_away | 16:37 | |
fungi | sdague: it isn't. it already passed all the way through but failed to merge because dkranz revoked his approval | 16:37 |
*** AaronGr_Zzz is now known as AaronGr | 16:37 | |
*** UtahDave has joined #openstack-infra | 16:38 | |
*** markmcclain has joined #openstack-infra | 16:38 | |
sdague | fungi: can we promote or ninja merge? that will actually take some of the load off the neutron tests | 16:39 |
*** yamahata has quit IRC | 16:39 | |
sdague | which should increase their pass rate | 16:39 |
fungi | sdague: should be safe. looks like it would have made it were it not for the missing approval vote when it was done | 16:39 |
*** DinaBelova_ is now known as DinaBelova | 16:39 | |
fungi | i'll merge it | 16:40 |
*** mancdaz_away is now known as mancdaz | 16:41 | |
fungi | it's merged now | 16:41 |
*** jpich has quit IRC | 16:43 | |
*** afazekas_ has quit IRC | 16:45 | |
mtreinish | fungi: heh, I don't think I've actually seen that before | 16:45 |
fungi | mtreinish: that's the behavior if the vrfy/cdrv/aprv votes are missing or there's a -2 vote on it when it comes time to merge | 16:46 |
fungi | generally happens when they're manually unset while it's in the gate | 16:46 |
mordred | sdague: nope. just pinged back again | 16:46 |
mtreinish | fungi: yeah it looks like dkranz removed his +A after the gate tests started on it | 16:46 |
fungi | yep | 16:46 |
fungi | which won't kick it out of the gate at the moment, but will prevent it from merging once it makes its way through | 16:47 |
sdague | fungi: so we might want to trigger a gate dequeue on removing A | 16:47 |
sdague | because otherwise it's kind of useless | 16:47 |
fungi | sdague: i believe there is intent to make that happen (along with on -2 as well), but is still on the to do list | 16:48 |
sdague | yep | 16:49 |
sdague | did the early pep8 on check ever get merged? | 16:49 |
fungi | sdague: mordred wanted to rework it. it wouldn't have bought us much in its original form | 16:50 |
sdague | ok, cool | 16:50 |
sdague | just checking | 16:50 |
fungi | all it would have preempted was python26/27 and docs checks | 16:50 |
mordred | yeah. I'm not sure it's possible to express with the current template setup | 16:50 |
*** elasticio has joined #openstack-infra | 16:50 | |
sdague | ok | 16:50 |
*** mgagne has joined #openstack-infra | 16:51 | |
*** GheRiver1 has joined #openstack-infra | 16:53 | |
*** GheRiver1 has quit IRC | 16:53 | |
*** MarkAtwood has joined #openstack-infra | 16:54 | |
*** pblaho has quit IRC | 16:57 | |
*** AaronGr is now known as AaronGr_Zzz | 16:58 | |
sdague | fungi: so given that we're not really moving code anyway, what are the odds we could fix logs on the other jenkinses | 16:58 |
*** alexpilotti has quit IRC | 16:58 | |
*** sarob has joined #openstack-infra | 16:59 | |
*** ruhe is now known as _ruhe | 16:59 | |
*** krotscheck has joined #openstack-infra | 17:00 | |
fungi | sdague: pretty good. would be easier when clarkb is around since he knows how he was obtaining the patched plugin build to upload into them | 17:00 |
sdague | sure | 17:01 |
sdague | that's fair, hopefully he'll be back on soon | 17:01 |
*** nati_ueno has quit IRC | 17:02 | |
*** pblaho has joined #openstack-infra | 17:04 | |
*** pblaho has quit IRC | 17:04 | |
mgagne | zaro: ping | 17:06 |
*** vkozhukalov has joined #openstack-infra | 17:08 | |
*** senk1 has joined #openstack-infra | 17:09 | |
*** Ryan_Lane has joined #openstack-infra | 17:10 | |
*** Ryan_Lane has quit IRC | 17:11 | |
sdague | mordred: if you feel like reviewing something that can merge - https://review.openstack.org/#/q/status:open+project:openstack-infra/config+branch:master+topic:gatestatus,n,z | 17:12 |
sdague | then I can get that off the elastic recheck page | 17:13 |
mordred | usdlooking | 17:13 |
mordred | gah | 17:13 |
mordred | sdague: looking | 17:13 |
*** gokrokve has quit IRC | 17:13 | |
*** gokrokve has joined #openstack-infra | 17:13 | |
*** aburaschi has joined #openstack-infra | 17:15 | |
sdague | also, where is that framework patch for status again? | 17:16 |
sdague | I want to look at redoing the er stuff like that before I add more logic to the existing page | 17:16 |
*** gokrokve has quit IRC | 17:18 | |
*** jaypipes has joined #openstack-infra | 17:19 | |
*** mancdaz is now known as mancdaz_away | 17:19 | |
*** moted has quit IRC | 17:20 | |
*** moted has joined #openstack-infra | 17:20 | |
aburaschi | Hello, newbie quick question: if I want to reverify a patch in jenkins, and I identify that two bugs are associated to that failure, which is the correct way to proceed? | 17:20 |
aburaschi | a) put: | 17:20 |
aburaschi | reverify bug 1 | 17:20 |
aburaschi | reverify bug 2 | 17:20 |
aburaschi | or b) select just one and go with that one? | 17:20 |
fungi | aburaschi: best would be to leave two reverify comments, one for each bug which resulted in a failure (don't leave them in the same comment though or it won't work) | 17:22 |
*** SumitNaiksatam has quit IRC | 17:25 | |
aburaschi | Excellent, thank you very much, fungi. | 17:25 |
fungi | you're welcome | 17:26 |
*** DennyZhang has quit IRC | 17:26 | |
*** yassine has quit IRC | 17:27 | |
*** AaronGr has joined #openstack-infra | 17:29 | |
fungi | ...thinking aloud, i wonder whether giving the check pipeline priority over the gate would break the pendulum swing and improve gating performance | 17:29 |
*** AaronGr has quit IRC | 17:30 | |
*** AaronGr_Zzz is now known as AaronGr | 17:30 | |
fungi | we'd dribble nodes into the gate jobs in sequence as the check pipeline no longer needs them. as a result, we'd be testing fewer gate changes at a time, meaning a smaller rush of nodes to reclaim on the inevitable gate reset | 17:31 |
fungi | would have the effect of spreading nodepool delete and build operations out more evenly | 17:31 |
*** thuc has quit IRC | 17:36 | |
*** afazekas_ has joined #openstack-infra | 17:37 | |
*** thuc has joined #openstack-infra | 17:37 | |
*** marun has quit IRC | 17:40 | |
*** fbo is now known as fbo_away | 17:40 | |
*** pballand has joined #openstack-infra | 17:40 | |
*** marun has joined #openstack-infra | 17:41 | |
*** thuc has quit IRC | 17:41 | |
*** chandankumar_ has quit IRC | 17:42 | |
*** luqas has quit IRC | 17:43 | |
*** senk1 has quit IRC | 17:43 | |
sdague | fungi: do we ever hit a point where check doesn't need them right now? | 17:43 |
sdague | I also thought both queues were equal priority | 17:44 |
fungi | sdague: if we were servicing it first, we probably would | 17:44 |
sdague | fungi: I'm not convinced :) | 17:44 |
fungi | they are equal priority right now, which is what causes the swing | 17:44 |
sdague | it's at 102 | 17:44 |
clarkb | morning | 17:44 |
sdague | and given the build delays, I think we'd just completely starve the gate | 17:44 |
sdague | if we had more nodes, I'd agree | 17:44 |
sdague | ok, going to pop out for lunch | 17:45 |
clarkb | fungi: I grabbed the scp.jpi file from jenkins-dev | 17:45 |
*** _ruhe is now known as ruhe | 17:45 | |
clarkb | fungi you can grab it from there or 02 | 17:45 |
fungi | sdague: possibly. part of it is that right now, we're applying every new node to gate changes (because there's more than we can service) and then once a gate reset happens, we start handing every available node to the check pipeline changes which piled up while we were previously handing them all to the gate | 17:46 |
sdague | yep, swapping not fun | 17:46 |
fungi | but given the gate reset frequency, most of the nodes burned on gate pipeline changes were wasted because their results were never needed | 17:46 |
*** dstufft has quit IRC | 17:46 | |
sdague | right | 17:47 |
*** nati_ueno has joined #openstack-infra | 17:47 | |
*** dstufft has joined #openstack-infra | 17:47 | |
fungi | at most the first few dozen nodes applied to the gate had any real effect at all, and the rest were just resources which could have gone to clearing out the check pipeline instead | 17:47 |
*** jasondotstar has joined #openstack-infra | 17:47 | |
sdague | the smart way to do it would be to calculate out the percentage chances for each successive piece of the gate to get through from its current position, then define a cutoff | 17:47 |
sdague | and not schedule past that point | 17:47 |
sdague | that requires a lot more logic though | 17:48 |
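sdague's cutoff idea can be sketched numerically: if each change independently passes with probability p, the change in gate position k only merges without a reset when everything ahead of it passes too, so its survival chance is roughly p^k. A toy calculation (the 0.7 pass rate and 5% threshold are invented for illustration, not measured gate data):

```shell
# Toy model of sdague's scheduling cutoff: stop allocating test nodes to
# gate positions whose survival probability p^k drops below a threshold.
# p=0.7 and cutoff=0.05 are made-up numbers for illustration.
awk -v p=0.7 -v cutoff=0.05 'BEGIN {
    for (k = 1; k <= 20; k++) {
        q = p ^ k                     # chance positions 1..k all pass
        printf("position %2d: %.3f\n", k, q)
        if (q < cutoff) {
            printf("stop scheduling past position %d\n", k)
            exit
        }
    }
}'
```

With these example numbers the cutoff lands around position 9, which lines up with fungi's earlier observation that only the first few dozen nodes applied to the gate had any real effect.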
*** nati_ueno has quit IRC | 17:48 | |
dkranz | fungi, sdague : Did I do something bad? | 17:48 |
sdague | dkranz: yeh, but we fixed it | 17:48 |
mordred | sdague: we can segregate the pools more though | 17:48 |
mordred | we have precise and check-precise or whatever it's called | 17:48 |
dkranz | sdague: For future reference, what am I not supposed to do? | 17:48 |
sdague | dkranz: don't remove +A from a change in the gate | 17:49 |
sdague | the behavior isn't what you actually want | 17:49 |
mordred | we could change the nodepool config to put fewer nodes into devstack-precise and more into devstack-precise-check | 17:49 |
fungi | mordred: we actually got rid of precise-check nodes a few weeks ago. now all dsvm and bare nodes are available for either check or gate | 17:49 |
mordred | to achieve with a baseball bat the thing you were talking about above | 17:49 |
mordred | oh. well | 17:49 |
sdague | ok... really, leaving for lunch | 17:49 |
dkranz | sdague: ok. But I was not trying to stop it and didn't realize it was in the gate. | 17:49 |
dkranz | sdague: I just saw from other comments that it should not have been approved. | 17:50 |
dkranz | But I won't do it again | 17:50 |
*** rakhmerov has quit IRC | 17:51 | |
clarkb | fungi mordred should we bump scp on 03 now? | 17:51 |
openstackgerrit | A change was merged to openstack-infra/config: Additional jobs for python-rallyclient https://review.openstack.org/66929 | 17:51 |
mordred | clarkb: yeah | 17:51 |
fungi | dkranz: probably could have just left it approved at that point and waited to see whether the check results came back green. then if not, upload a new patchset to knock the previous broken one out of the gate | 17:52 |
openstackgerrit | A change was merged to openstack-infra/config: Add an experimental functional job for neutron. https://review.openstack.org/66967 | 17:52 |
fungi | clarkb: i believe that would be good | 17:52 |
*** SergeyLukjanov_ is now known as SergeyLukjanov | 17:52 | |
*** sarob has quit IRC | 17:53 | |
*** sarob has joined #openstack-infra | 17:54 | |
*** bnemec_ is now known as bnemec | 17:54 | |
clarkb | ok putting 03 in shutdown mode | 17:56 |
*** ruhe is now known as _ruhe | 17:57 | |
openstackgerrit | A change was merged to openstack-infra/storyboard: Fix the intial db migration https://review.openstack.org/67592 | 17:59 |
*** MarkAtwood has quit IRC | 18:00 | |
*** boris-42 has quit IRC | 18:00 | |
clarkb | fungi: mordred: the scp.jpi file is on 03 and 04 under ~clarkb/plugins/scp/fixed | 18:01 |
fungi | k | 18:02 |
fungi | and you're just using the upload screen in the webui to upgrade it? | 18:03 |
*** derekh has quit IRC | 18:03 | |
*** chandankumar_ has joined #openstack-infra | 18:03 | |
*** boris-42 has joined #openstack-infra | 18:04 | |
*** nati_ueno has joined #openstack-infra | 18:04 | |
clarkb | fungi: no, I am actually stopping the server, putting the scp.jpi in /var/lib/jenkins/plugins then starting jenkins | 18:05 |
*** NikitaKonovalov is now known as NikitaKonovalov_ | 18:05 | |
clarkb | fungi: you can use the webui instead, it is how zaro put it on -dev | 18:05 |
clarkb | I feel like there is more control doing it by hand on disk | 18:05 |
*** CaptTofu has joined #openstack-infra | 18:06 | |
clarkb | because I don't know what magic jenkins is doing under the hood to do restartless upgrades (which don't work) and so on | 18:06 |
*** gokrokve has joined #openstack-infra | 18:09 | |
*** zz_ewindisch is now known as ewindisch | 18:09 | |
*** sarob has quit IRC | 18:09 | |
*** markmcclain has quit IRC | 18:10 | |
fungi | ahh, okay. the last time i did it from the fs it was because jenkins wouldn't start otherwise, and i wasn't sure how many of the accompanying files needed to be copied into place too or whether some of those were ephemeral | 18:10 |
*** markmcclain has joined #openstack-infra | 18:10 | |
clarkb | fungi: the scp/ dir that is created is made by expanding the jpi archive I think | 18:11 |
radix | can someone help me understand what's going on in http://logs.openstack.org/06/67006/4/check/check-tempest-dsvm-full/5fa3d8a/ ? It seems to be some kind of network failure | 18:11 |
clarkb | fungi: the only thing you need is the .jpi or .hpi | 18:11 |
*** rakhmerov has joined #openstack-infra | 18:11 | |
*** afazekas_ has quit IRC | 18:12 | |
clarkb | radix: http://logs.openstack.org/06/67006/4/check/check-tempest-dsvm-full/5fa3d8a/logs/devstack-gate-setup-workspace-new.txt an hpcloud node is trying to clone a repo over ipv6 | 18:13 |
clarkb | hpcloud doesn't have an ipv6 stack | 18:13 |
*** johnthetubaguy has quit IRC | 18:13 | |
clarkb | fungi: did we determine anything more about that problem? | 18:13 |
radix | clarkb: this came up in my heat change, pretty sure it's unrelated, and I'm not sure what to do about it | 18:13 |
clarkb | radix: I am not sure either, I think fungi has investigated it | 18:14 |
radix | oh ok :) | 18:14 |
clarkb | one job left on 03, I will stop it and start it with new scp plugin as soon as that job clears out | 18:14 |
clarkb | which is now | 18:15 |
openstackgerrit | Brant Knudson proposed a change to openstack/requirements: Update oauthlib requirement to at least 0.6 https://review.openstack.org/67900 | 18:15 |
*** jaypipes has quit IRC | 18:16 | |
fungi | clarkb: only speculation... the ip configuration output in the console log only shows ipv4 (not even any linklocal v6), which makes me think we're doing "ip -4 ad sh" explicitly or something. i'll have a look and see how we might get more diagnostics for this on future runs | 18:16 |
clarkb | 03 is back up with new plugin | 18:17 |
clarkb | fungi: mordred: should I put 04 in shutdown mode now? | 18:18 |
radix | hmm, looks like this: https://bugs.launchpad.net/openstack-ci/+bug/1266616 | 18:18 |
radix | I guess I'll run a recheck on that | 18:18 |
fungi | clarkb: go for it | 18:18 |
*** vkozhukalov has quit IRC | 18:19 | |
clarkb | fungi: once 04 is done the remaining nodes will be 01 and jenkins.o.o which can get the correct version when we update their jenkins version | 18:19 |
*** fifieldt has joined #openstack-infra | 18:19 | |
fungi | radix: that looks like it, yeah | 18:19 |
clarkb | I am going to take advantage of the wait to return to my regularly scheduled morning | 18:19 |
clarkb | will pop back in in a bit to finish 04 | 18:20 |
fungi | radix: current suspicion is that some other tenant in hpcloud is generating router advertisements, but adding some extra debugging around address assignments there may help enlighten us as to the cause | 18:20 |
radix | yikes | 18:21 |
*** yamahata has joined #openstack-infra | 18:21 | |
clarkb | fungi: we can update the iptables rules right? | 18:21 |
clarkb | needs to be conditional for hpcloud only though | 18:22 |
*** pballand has quit IRC | 18:23 | |
*** SergeyLukjanov is now known as SergeyLukjanov_a | 18:24 | |
fungi | clarkb: well, if that's the cause then yes, but if so there's every chance the same could happen in rackspace and then we'd need to be able to keep filters updated for their router linklocal addresses | 18:24 |
*** SergeyLukjanov_a is now known as SergeyLukjanov_ | 18:25 | |
fungi | clarkb: radix: for details, see https://launchpad.net/bugs/1262759 | 18:26 |
*** afazekas_ has joined #openstack-infra | 18:26 | |
fungi | it's apparently blocked *if* you're doing openstack ipv6 networking, but given the way in which rackspace has implemented their ipv6 vm connectivity i have no idea whether that also holds true for them | 18:29 |
*** afazekas_ has quit IRC | 18:30 | |
*** dcramer_ has quit IRC | 18:31 | |
*** afazekas_ has joined #openstack-infra | 18:32 | |
*** marun has quit IRC | 18:32 | |
*** marun has joined #openstack-infra | 18:33 | |
*** ewindisch is now known as zz_ewindisch | 18:36 | |
*** elasticio has quit IRC | 18:36 | |
*** praneshp has joined #openstack-infra | 18:36 | |
*** zz_ewindisch is now known as ewindisch | 18:37 | |
*** jaypipes has joined #openstack-infra | 18:37 | |
*** senk1 has joined #openstack-infra | 18:38 | |
*** ewindisch is now known as zz_ewindisch | 18:40 | |
*** marun has quit IRC | 18:40 | |
*** zz_ewindisch is now known as ewindisch | 18:41 | |
*** marun has joined #openstack-infra | 18:41 | |
clarkb | eta on 04 is 30 minutes | 18:43 |
*** chandankumar_ has quit IRC | 18:44 | |
*** yamahata has quit IRC | 18:48 | |
*** jasondotstar has quit IRC | 18:49 | |
openstackgerrit | Michael Krotscheck proposed a change to openstack-infra/storyboard-webclient: Add tox.ini file to run things via tox https://review.openstack.org/67721 | 18:49 |
clarkb | fungi: do we think we should submit a ticket to hpcloud about the possible bad 'router'? | 18:50 |
*** ewindisch is now known as zz_ewindisch | 18:50 | |
openstackgerrit | Jeremy Stanley proposed a change to openstack-infra/devstack-gate: Also print IPv6 address details https://review.openstack.org/67911 | 18:52 |
fungi | clarkb: maybe we start with ^ and have a look at the next one which hits logstash | 18:52 |
*** CaptTofu has quit IRC | 18:52 | |
*** mindjiver has quit IRC | 18:52 | |
*** zz_ewindisch is now known as ewindisch | 18:52 | |
* clarkb looks | 18:53 | |
*** nati_uen_ has joined #openstack-infra | 18:53 | |
fungi | do you think an ip route show along with that would also be in order? | 18:53 |
krotscheck | clarkb: The run-selenium script seems to depend on having run_tests.sh in the project. Do you have a strong opinion on whether A) I can remove that, or B) I should create an xvfb builder macro that just executes tox? | 18:53 |
*** markmcclain has quit IRC | 18:53 | |
fungi | clarkb: oh, though for that you also need ip -6 route show. maybe add an ip {,-6} neighbor show too | 18:54 |
clarkb | krotscheck: I would love it if we can remove the dependency on run_tests.sh, but horizon is a thing | 18:54 |
clarkb | krotscheck: maybe we can feed run-selenium a command to execute a test with selenium bits prestaged | 18:54 |
clarkb | krotscheck: then feed a different command to horizon and storyboard | 18:55 |
clarkb | fungi: sounds good to me | 18:55 |
krotscheck | clarkb: I dunno, that feels a bit like overparameterizing a command | 18:55 |
clarkb | krotscheck: not really, its creating a specific test environment to run tests within | 18:55 |
krotscheck | clarkb: BTW- so there's a python module called nodeenv that will drop a nodejs runtime into your virtualenv for you. | 18:55 |
clarkb | the tests you want to run within it don't need to be identical | 18:55 |
krotscheck | clarkb: So mordred and I are working on just having storyboard use tox. | 18:56 |
clarkb | fungi: want to update the existing change or do that in a different one? | 18:56 |
*** nati_ueno has quit IRC | 18:56 | |
fungi | clarkb: i'm updating it now | 18:56 |
openstackgerrit | Jeremy Stanley proposed a change to openstack-infra/devstack-gate: More network debugging detail https://review.openstack.org/67911 | 18:58 |
fungi | clarkb: ^ updated | 18:58 |
*** markmcclain has joined #openstack-infra | 18:59 | |
fungi | clarkb: turns out ip neighbor show gets you both the arp and nd table entries together, so it's just ip route show which needs a separate -6 variant | 18:59 |
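The diagnostics fungi describes boil down to a handful of read-only iproute2 commands; a sketch of what the devstack-gate change presumably collects (exact set assumed from this conversation, not copied from the patch):

```shell
# Network state dump along the lines of fungi's devstack-gate patch:
# addresses, both routing tables, and the combined ARP/ND neighbor table.
ip addr show        # v4 and v6 addresses, including any SLAAC-assigned globals
ip route show       # IPv4 routing table
ip -6 route show    # IPv6 routing table; a rogue RA shows up as a default route
ip neighbor show    # ARP and IPv6 neighbor-discovery entries together, which
                    # exposes the ethernet address of the advertised "router"
```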
*** Ajaeger1 has joined #openstack-infra | 19:01 | |
*** SergeyLukjanov_ is now known as SergeyLukjanov | 19:02 | |
*** amotoki has joined #openstack-infra | 19:03 | |
fungi | clarkb: anyway, with that it should give us enough info to spot the ethernet address of the "router" if it really is someone testing radvd or a broken switch/router in the distribution layer or something at fault | 19:03 |
jog0 | sdague: ping | 19:03 |
sdague | jog0: yo | 19:03 |
*** afazekas_ has quit IRC | 19:03 | |
*** amotoki has quit IRC | 19:03 | |
jog0 | sdague: https://review.openstack.org/#/c/67596/ works can you review it | 19:03 |
jog0 | mtreinish: if your around | 19:03 |
jog0 | sdague: that will give us more accurate e-r comments | 19:04 |
*** CaptTofu has joined #openstack-infra | 19:05 | |
jog0 | which is why I want to get this in as soon as possible | 19:06 |
sdague | jog0: so I have one suggested change, inline | 19:08 |
jog0 | sdague: sounds like a good idea to me, thanks | 19:09 |
*** mrodden has quit IRC | 19:10 | |
jog0 | so actually we use a lot of data from the gerrit event | 19:11 |
jog0 | and its all over the place right now | 19:12 |
jog0 | sdague: so I would prefer to do that refactor separately | 19:12 |
*** yamahata has joined #openstack-infra | 19:13 | |
*** markmcclain has quit IRC | 19:13 | |
openstackgerrit | lifeless proposed a change to openstack-infra/config: Add some dependencies required by toci https://review.openstack.org/67685 | 19:13 |
lifeless | clarkb: fungi: if we can get ^ landed and then turn on the tripleo nodepool config, that would be the awesome | 19:14 |
mriedem | did anything change with the backing cinder volume store on the test nodes around 1/17? | 19:14 |
*** jasondotstar has joined #openstack-infra | 19:14 | |
sdague | jog0: can you introduce the event object under this one | 19:14 |
sdague | I really hate having to clean these up later | 19:14 |
clarkb | 04 is idle now, updating scp plugin now | 19:15 |
*** mrodden has joined #openstack-infra | 19:15 | |
sdague | great | 19:15 |
sdague | I can already see us timing out a lot less in the channel | 19:15 |
*** markmc has quit IRC | 19:16 | |
openstackgerrit | Michael Krotscheck proposed a change to openstack-infra/storyboard-webclient: Add tox.ini file to run things via tox https://review.openstack.org/67721 | 19:16 |
fungi | mriedem: what backing cinder volume store? you mean the one devstack creates when it starts up? | 19:16 |
mriedem | fungi: yes | 19:16 |
mriedem | anything using iscsu | 19:16 |
mriedem | *iscsi | 19:16 |
jgriffith | mriedem: fungi aren't those still just loopback files created by devstack? | 19:17 |
fungi | jgriffith: as far as i know, yes. so any changes would be changes in devstack or *maybe* devstack-gate repositories | 19:17 |
jgriffith | fungi: or cinder :/ | 19:17 |
jog0 | sdague: normally I would say sure, but I am not even supposed to be working today, just stopped in for an hour or so | 19:17 |
jgriffith | mriedem: what are you seeing? | 19:17 |
fungi | jgriffith: well, yeah, or cinder ;) | 19:18 |
mriedem | jgriffith: digging into this https://bugs.launchpad.net/nova/+bug/1270608 | 19:18 |
jog0 | I agree it needs cleanup but I don't think its worth holding this up for that | 19:18 |
clarkb | 04 seems up | 19:18 |
mriedem | i might be looking at a red herring in the nova code that changed on 1/17 which is when that bug started showing up | 19:18 |
fungi | clarkb: agreed. looks like it's running jobs already | 19:18 |
mriedem | i'll see what changed in cinder and devstack on 1/17 | 19:18 |
ewindisch | irt a conversation I've been having with dtroyer in #openstack-dev.... | 19:19 |
ewindisch | what are the thoughts toward gating another nova hypervisor in openstack-infra? | 19:19 |
jgriffith | mriedem: I seem to recall this may be a dup of another nova item we looked at a while back | 19:19 |
sdague | jog0: so I don't want to unwind this when we could do the event object first | 19:20 |
ewindisch | Dean seems to worry about having enough resources for the extra gate | 19:20 |
sdague | as it makes more work | 19:20 |
sdague | ewindisch: -1 | 19:20 |
sdague | revisit at Juno summit | 19:20 |
ewindisch | sdague: at the root of this is russell REQUIRING a (non-voting) gate to keep hypervisors in Nova | 19:21 |
jog0 | sdague: you want to take a whack at the event object? I am trying to not work today | 19:21 |
*** annegent_ has joined #openstack-infra | 19:21 | |
sdague | jog0: yep, I will | 19:21 |
sdague | ewindisch: yep, do what everyone else is doing, and bring up a 3rd party system | 19:21 |
jog0 | sdague: thanks | 19:21 |
jog0 | ! | 19:21 |
fungi | ewindisch: as it stands there's still a whole stack of patches against nodepool, devstack-gate and infra/config to get xenserver testing working. we haven't even had time to look at them as far as i'm aware (much to the displeasure of the xenserver devs) | 19:21 |
clarkb | fungi: did they ever respond to the first round of review on those? | 19:22 |
fungi | clarkb: i believe so, but i've been too busy to look through them again | 19:22 |
*** nati_uen_ has quit IRC | 19:22 | |
clarkb | fungi: I was really curious what the feedback would be but the changes sat idle and were auto-abandoned | 19:22 |
mriedem | jgriffith: this is iscsi related and 1/17: https://github.com/openstack/cinder/commit/a9267644ee09591e2d642d6c1204d94a9fdd8c82 | 19:22 |
*** annegent_ has quit IRC | 19:22 | |
*** markmcclain has joined #openstack-infra | 19:23 | |
ewindisch | sdague: everyone else being "VMware" and "Citrix" i.e. https://www.google.com/finance?q=ctxs and https://www.google.com/finance?q=vmw | 19:23 |
jgriffith | mriedem: eeek | 19:24 |
*** jp_at_hp has quit IRC | 19:24 | |
jog0 | ewindisch: even if we wanted to we don't have the resources right now | 19:24 |
mriedem | jgriffith: i'm not familiar with that code, but does that look like it could cause races? like premature return from snapshot from volume when it's not ready? | 19:24 |
ewindisch | I understand, but I'm going to have to sync with russellb and samalba about this. | 19:25 |
jgriffith | mriedem: indeed, I believe it could | 19:25 |
jgriffith | mriedem: looking now | 19:25 |
jgriffith | mriedem: I believe you're correct | 19:28 |
jog0 | so I have a question that I am not sure how to answer: do we think dropping tempest concurrency down to 2 increased the number of patches we are able to merge into openstack/openstack in a given amount of time | 19:28 |
jgriffith | mriedem: I'll spin it up here and take a look after I finish what I'm in the middle of now | 19:28 |
*** thomasem has quit IRC | 19:28 | |
clarkb | jog0: no | 19:28 |
clarkb | I think it significantly impacted the backlog | 19:28 |
jog0 | I count 7 patches in last 24 hours | 19:29 |
jog0 | clarkb: perhaps we should consider reverting the patch | 19:29 |
clarkb | in the opposite direction, but I have no hard data to support that | 19:29 |
clarkb | because tests are taking up to 1.33 hours now instead of 0.70 hours or wherever they were before | 19:29 |
mriedem | jgriffith: cool, thanks | 19:30 |
*** yolanda has quit IRC | 19:30 | |
russellb | taking 1.33 hours more reliably is better than 0.7 hours with random failures all over the place due to pegging the CPU the entire time | 19:30 |
fungi | as discussed in #nova, i'm going to promote https://review.openstack.org/67914 to the head of the gate pipeline. the result will be that everything in the check pipeline as of now will get new nodes first, and then that change will get a shot at fixing a substantial percentage of our gate resets | 19:30 |
sdague | so I actually think the concurrency did make things better | 19:30 |
russellb | it's really just a non-starter to run the tests with CPU over the top | 19:30 |
clarkb | russellb: it isn't more reliable though | 19:30 |
sdague | clarkb: sure | 19:31 |
*** SergeyLukjanov is now known as SergeyLukjanov_a | 19:31 | |
russellb | the failures are just other things right now | 19:31 |
sdague | but it's more reliable | 19:31 |
sdague | so I'm -1 to going back to 4x | 19:31 |
russellb | it eliminates a whole class of failures | 19:31 |
sdague | agree with russellb | 19:31 |
clarkb | were those failures just masking all of these failures? | 19:31 |
sdague | clarkb: possibly | 19:31 |
clarkb | we are still essentially worst cases the gate queue which is where we were before | 19:32 |
clarkb | so the gate queue isn't more reliable | 19:32 |
*** SergeyLukjanov_a is now known as SergeyLukjanov_ | 19:32 | |
sdague | we were also in deep gate queue | 19:32 |
jog0 | clarkb: http://status.openstack.org/elastic-recheck/ the graph at the top looks very wrong | 19:32 |
sdague | so we're basically driving a rover on mars | 19:32 |
clarkb | I think we had what 30 changes merge over a day recently | 19:32 |
clarkb | jog0: looks like graphite problems | 19:32 |
jog0 | clarkb: yeah | 19:32 |
jog0 | so merge rates: http://paste.openstack.org/show/61594/ | 19:32 |
sdague | clarkb: yeh, friday -> sat was about 30 in 24hrs | 19:32 |
sdague | I also expect what happened is that in dropping concurrency we had some tests move around | 19:33 |
*** _david_ has joined #openstack-infra | 19:33 | |
sdague | so we go new overlaps | 19:33 |
sdague | got new overlaps | 19:33 |
fungi | http://git.openstack.org/cgit/openstack/openstack/log/ shows 4 commits in the past 22 hours, one of which i force-merged without putting through the gate | 19:33 |
sdague | which exposed a few new issues | 19:34 |
clarkb | I need to run back to regularly scheduled holiday programming | 19:34 |
fungi | k | 19:35 |
*** SergeyLukjanov_ is now known as SergeyLukjanov | 19:35 | |
jog0 | spot checking shows these numbers appear to be common | 19:36 |
jog0 | merges per day is below 45 | 19:37 |
*** emagana has joined #openstack-infra | 19:37 | |
sdague | jog0: you need to only count merge commits | 19:37 |
sdague | otherwise the timing is off | 19:37 |
sdague | filter by author jenkins | 19:37 |
jog0 | https://github.com/openstack/openstack/graphs/commit-activity | 19:38 |
sdague | jog0: right, but we have 2 commits per commit | 19:38 |
lifeless | sdague: so 3 in total? | 19:39 |
*** markmcclain has quit IRC | 19:39 | |
sdague | :P | 19:39 |
*** HenryG has quit IRC | 19:39 | |
sdague | jog0: anyway if you add --author=jenkins to your git commands it will be close | 19:39 |
sdague | it will double count translations | 19:39 |
sdague | but that's pretty minor | 19:39 |
jog0 | I don't see any doubles and translations are merges | 19:40 |
jog0 | anyway | 19:40 |
sdague | jog0: oh, github is filtering merges | 19:41 |
sdague | but on your pastebin | 19:42 |
jog0 | sdague: yeah I forgot about github, they have pretty pictures | 19:42 |
jog0 | anyway data looks inconclusive to me | 19:44 |
jog0 | do we know why deletes take so long in nodepool btw? | 19:44 |
clarkb | because cloud. deletes are expensive | 19:45 |
fungi | jog0: in particular, rackspace likes to ignore them | 19:45 |
fungi | so we keep spamming them with delete calls until they finally free up the node | 19:45 |
jog0 | fungi: ahh | 19:46 |
fungi | hpcloud doesn't ignore them as much, just takes a long time to act on them | 19:46 |
*** praneshp_ has joined #openstack-infra | 19:46 | |
jog0 | are deletes slow in openstack in general? | 19:46 |
fungi | i suspect it depends on the load in your cloud | 19:46 |
jog0 | can we complain to RAX and HP cloud about it? | 19:46 |
*** CaptTofu has quit IRC | 19:46 | |
fungi | i have this running on nodepool.o.o for the past 18 hours or so, but it hasn't seemed to make any difference: https://review.openstack.org/67723 | 19:47 |
sdague | jog0: they'll probably just complain back to us to clean up nova :) | 19:47 |
jog0 | we can do that | 19:47 |
jog0 | but the nodepool plots are just sad | 19:48 |
fungi | clarkb: i was wanting to ask on 67723, does that need a yield in the outer loop too? | 19:48 |
lifeless | hah, devstack-gate really wants a lot of variables and node state :/ | 19:48 |
*** praneshp has quit IRC | 19:48 | |
*** praneshp_ is now known as praneshp | 19:48 | |
sdague | jog0: honestly, that's what swapping looks like. We've just got a working set far too large for our resources, so now we're swapping | 19:50 |
lifeless | fungi: no, its just broken | 19:50 |
lifeless | fungi: reviewing it now | 19:50 |
fungi | sdague: well, the providers also do take waaaay too long to act on delete calls from us | 19:50 |
fungi | lifeless: okay, thanks. it's a bit over my head i'm afraid | 19:50 |
jog0 | mordred: ^ can you look into the HP side of this | 19:50 |
*** markmcclain has joined #openstack-infra | 19:51 | |
sdague | jog0: I think that's a good long term conversation, I don't see that helping us over the hump | 19:51 |
*** gokrokve has quit IRC | 19:53 | |
openstackgerrit | Michael Krotscheck proposed a change to openstack-infra/config: Use nodeenv via tox to do javascript testing https://review.openstack.org/67729 | 19:53 |
*** gokrokve has joined #openstack-infra | 19:53 | |
jog0 | sdague: agreed | 19:53 |
*** fifieldt has quit IRC | 19:53 | |
*** yolanda has joined #openstack-infra | 19:54 | |
openstackgerrit | Michael Krotscheck proposed a change to openstack-infra/config: Use nodeenv via tox to do javascript testing https://review.openstack.org/67729 | 19:55 |
*** rnirmal has joined #openstack-infra | 19:55 | |
lifeless | fungi: so nodepool is regular python | 19:55 |
lifeless | fungi: threads, not eventlet | 19:55 |
*** marun has quit IRC | 19:55 | |
*** westmau5 is now known as westmaas | 19:55 | |
fungi | lifeless: thanks! i'm far more used to hacking on single-threaded applications | 19:55 |
lifeless | fungi: at least, AFAICT | 19:55 |
lifeless | fungi: anyhow, have a look at task_manager.py - you can see that run() is single threaded | 19:55 |
lifeless | fungi: it pulls a work item off of a queue, processes it, and continues. | 19:56 |
lifeless | fungi: it's not using a thread *pool*, so making the time to process a single item longer (e.g. up to 10 minutes!) will delay operating /all/ the tasks in the queue | 19:56 |
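The single-threaded queue behaviour lifeless describes can be sketched roughly as follows (a simplified illustration of the pattern, not nodepool's actual task_manager.py; class and method names here are illustrative):

```python
import queue
import threading

class Task:
    """A unit of work; the submitting thread blocks on wait() until it runs."""
    def __init__(self, func):
        self.func = func
        self._done = threading.Event()
        self.result = None

    def run(self):
        self.result = self.func()
        self._done.set()

    def wait(self):
        self._done.wait()
        return self.result

class TaskManager(threading.Thread):
    """A single worker thread draining a queue: it is not a thread *pool*,
    so one slow task delays every task queued behind it."""
    def __init__(self):
        super().__init__(daemon=True)
        self.queue = queue.Queue()

    def submit_task(self, task):
        self.queue.put(task)
        return task.wait()  # caller blocks until the lone worker reaches it

    def run(self):
        while True:
            task = self.queue.get()  # pull one work item off the queue
            task.run()               # process it; nothing else runs meanwhile
```

This is why stretching a single item's processing time to ten minutes stalls the whole queue: there is exactly one consumer.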
*** HenryG has joined #openstack-infra | 19:57 | |
lifeless | fungi: I'll work up an alternative patch for you | 19:57 |
fungi | lifeless: well, it was 10 minutes before too, but having the outer loop be 10 minutes rather than the iterate_timeout() loop may make it less like what i meant, agreed | 19:57 |
lifeless | fungi: I think you're missing my point :(. All deletes occur in a single thread. | 19:58 |
*** gokrokve has quit IRC | 19:58 | |
fungi | i pondered running two layers of iterate_timeout() inside each other there | 19:58 |
lifeless | fungi: waiting in that thread for a delete to occur makes all other deletes slower. | 19:58 |
*** _ruhe is now known as ruhe | 19:59 | |
fungi | lifeless: you mean originally, or only with that patch | 19:59 |
*** AaronGr is now known as Aarongr_afk | 19:59 | |
lifeless | fungi: in both cases its all single threaded | 19:59 |
fungi | got it | 19:59 |
lifeless | fungi: because its in the JenkinsManager TaskManager queue | 19:59 |
lifeless | fungi: your patch increases how long a specific delete takes, but does so by not deleting anything else for that period... because it's single threaded | 20:00 |
fungi | okay, so the yield in iterate_timeout() doesn't really allow anything helpful anyway | 20:00 |
lifeless | the yield in iterate_timeout is an entirely separate discussion | 20:00 |
lifeless | its because its a generator, so its needed | 20:00 |
fungi | oh, right | 20:00 |
* fungi sighs at his absent-mindedness | 20:01 | |
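The yield lifeless refers to is just what makes iterate_timeout a generator: it hands control back to the caller's loop body on each iteration and has nothing to do with threading. A simplified sketch of the shape (not nodepool's exact code):

```python
import time

def iterate_timeout(max_seconds, purpose, interval=2):
    """Yield an attempt counter until the deadline passes, then raise.

    The yield is required simply because this is a generator; the
    caller's loop body runs between yields, e.g.:

        for count in iterate_timeout(600, "server deletion"):
            if server_is_gone():
                break
    """
    start = time.time()
    count = 0
    while time.time() - start < max_seconds:
        count += 1
        yield count
        time.sleep(interval)
    raise RuntimeError("Timeout waiting for %s" % purpose)
```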
lifeless | there's also a 1 second gap between tasks by default | 20:03 |
lifeless | I'm not at all sure that makes sense | 20:03 |
lifeless | if you have more than 60 actions a minute, it will backlog | 20:03 |
*** zanins has joined #openstack-infra | 20:03 | |
* lifeless makes a mental note to ask jeblair about that | 20:04 | |
lifeless | it may be working around broken API ratelimits on small clouds | 20:04 |
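The arithmetic behind lifeless's backlog observation: a mandatory pause between tasks caps the single worker's throughput, and any arrival rate above that cap accumulates as backlog. A toy illustration (hypothetical helper, not nodepool code):

```python
def backlog_growth(arrivals_per_minute, rate_seconds=1.0):
    """Tasks queued per minute minus what one worker can drain.

    With a fixed rate_seconds gap between tasks, the worker handles at
    most 60 / rate_seconds tasks per minute; the excess is backlog.
    """
    capacity = 60.0 / rate_seconds
    return max(0.0, arrivals_per_minute - capacity)
```

With the default 1-second gap, 90 tasks a minute grows the backlog by 30 per minute; halving the gap to 0.5 seconds absorbs that load entirely, which is why tuning the rate setting is one of the simple knobs discussed below.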
jog0 | lifeless: that may be why deletes are so slow right now | 20:04 |
fungi | jog0: well, they were equally as slow before i tried that | 20:04 |
lifeless | so one simple thing to try would be to set rate to 0.5 or something | 20:05 |
jog0 | fungi: the 1 second gap? | 20:05 |
fungi | jog0: oh, i thought you meant the extra loop | 20:05 |
russellb | it would probably backlog earlier than 60 per minute | 20:06 |
*** marun has joined #openstack-infra | 20:06 | |
*** bermut has joined #openstack-infra | 20:06 | |
jog0 | well we definitely have more than 60 nodes in nodepool and many are in delete | 20:07 |
russellb | so i wonder if we should just put a hard limit on how many changes are tested in parallel in the gate queue | 20:07 |
russellb | that would help node thrashing on resets | 20:07 |
openstackgerrit | lifeless proposed a change to openstack-infra/nodepool: Provide diagnostics when task rate limiting. https://review.openstack.org/67924 | 20:07 |
lifeless | fungi: so I'd back the aggressive delete out, and apply ^ | 20:07 |
lifeless | fungi: I haven't /tested/ that patch yet, however | 20:07 |
fungi | however the 600-second timeout in the cleanup method was being hit fairly regularly, which was similarly backing up the other delete actions from what i saw before | 20:08 |
jog0 | russellb: looking at status.o.o/zuul we don't test that many in parallel in the gate | 20:08 |
lifeless | fungi: that has to go too | 20:08 |
lifeless | fungi: I bet thats an attempt to avoid quota overuse | 20:08 |
russellb | jog0: right now yeah ... | 20:08 |
jog0 | because we are resource starved, in fact the top of the gate isn't getting run | 20:08 |
jog0 | russellb: right now yeah | 20:08 |
openstackgerrit | Michael Krotscheck proposed a change to openstack-infra/config: Genericize javascript release artifact creation https://review.openstack.org/67731 | 20:09 |
lifeless | oh, wow | 20:09 |
lifeless | so this code doesn't clearly signal what is within a task and what is not | 20:09 |
lifeless | that 600 second wait actually does a cross-thread block | 20:09 |
jeblair | fungi, lifeless: easy thing to help with deletes is to increase the 600 second delete timeout (maybe 1 hour) | 20:09 |
*** gokrokve has joined #openstack-infra | 20:09 | |
lifeless | jeblair: *increase* it ? | 20:10 |
lifeless | jeblair: what is the 600 second timeout for ? | 20:10 |
jeblair | fungi, lifeless: because it turns deletes from parallel operations into serial ones | 20:10 |
jog0 | right now we have 150 nodes in deleting state or so | 20:10 |
jeblair | lifeless: to avoid having lots of threads waiting around "forever" for something that isn't going to happen | 20:11 |
lifeless | jeblair: I mean, why wait at all ? | 20:11 |
jeblair | lifeless: occasionally cloud providers never delete nodes | 20:11 |
lifeless | jeblair: the code doesn't take any action if it's not deleted. | 20:11 |
lifeless | jeblair: other than raising an exception | 20:12 |
fungi | yeah, we now have two stuck in a active(deleting) state in hpcloud-az2 which i have manually cleaned out of nodepool so that it doesn't keep trying and failing to delete those | 20:12 |
jeblair | lifeless: good point; it should probably delete, wait 5-10 minutes, then delete again | 20:12 |
openstackgerrit | Davanum Srinivas (dims) proposed a change to openstack-infra/devstack-gate: Temporary HACK : Enable UCA https://review.openstack.org/67564 | 20:12 |
jeblair | lifeless: oh, but it does set the node state, right? | 20:12 |
lifeless | jeblair: not that I can see, I'm just tracing the code atm | 20:13 |
jeblair | lifeless: there is an action if it does succeed -- it deletes the node from the db | 20:13 |
*** bermut has quit IRC | 20:13 | |
jeblair | lifeless: so that's what it's waiting for | 20:14 |
lifeless | jeblair: so I think we should decouple those things | 20:15 |
lifeless | jeblair: not wait, instead set a state DELETING | 20:15 |
*** jasondotstar has quit IRC | 20:15 | |
lifeless | jeblair: and in the periodic check if the server is gone, delete from db, if its not submit a delete again | 20:15 |
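lifeless's proposal decouples marking from waiting: the handler just sets the node's state, and a periodic pass either forgets confirmed-gone nodes or resubmits the delete. A sketch of that pass with toy stand-ins (nodedb does have a DELETE state per the later discussion; everything else here is illustrative):

```python
DELETE = "delete"  # mirrors the existing nodedb state; the rest is a sketch

class FakeNode:
    """Toy db record: an external server id plus a state."""
    def __init__(self, external_id, state=DELETE):
        self.external_id, self.state = external_id, state

class FakeDB:
    """Toy node database."""
    def __init__(self, nodes):
        self.nodes = list(nodes)
    def nodes_in_state(self, state):
        return [n for n in self.nodes if n.state == state]
    def remove(self, node):
        self.nodes.remove(node)

class FakeCloud:
    """Toy provider: servers in 'existing' ignore deletes (just recorded)."""
    def __init__(self, existing):
        self.existing = set(existing)
        self.delete_calls = []
    def exists(self, server_id):
        return server_id in self.existing
    def delete(self, server_id):
        self.delete_calls.append(server_id)

def periodic_cleanup(db, cloud):
    """One non-blocking pass over nodes marked DELETE: if the provider
    has finished, drop the db record; otherwise resubmit the delete
    call and move on -- no per-node waiting in this thread."""
    for node in db.nodes_in_state(DELETE):
        if not cloud.exists(node.external_id):
            db.remove(node)                 # really gone: forget it
        else:
            cloud.delete(node.external_id)  # nudge the provider again
```

Because no iteration waits, the pass can be rerun frequently without any single stuck node holding up the rest.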
sdague | welcome back jeblair | 20:16 |
dstufft | offtopic, but I need to ask someone a pbr question and i don't see a pbr specific channel ;P anyone mind if I PM them? Or tell me if there's a better channel :D (sorry to bother y'all) | 20:16 |
*** thomasem has joined #openstack-infra | 20:17 | |
dansmith | dstufft: it's easy. pull the tab to open the spout, chug it, recycle the can when done | 20:17 |
lifeless | jeblair: in fact, nodedb.DELETE appears to be for this already, just the surrounding code isn't quite aligned | 20:17 |
dstufft | dansmith: :D | 20:17 |
jeblair | lifeless: i think the behavior you described is the problem we're trying to fix | 20:18 |
jeblair | so the thing we want to deal with is that rackspace (apparently) ignores deletes and takes a long time for them to run | 20:18 |
lifeless | jeblair: yes, exactly | 20:18 |
lifeless | jeblair: or are we talking at cross purposes ;) | 20:18 |
*** jcoufal has joined #openstack-infra | 20:18 | |
jeblair | deleting nodes is parallel normally, but after the 10 minute timeout, the parallel thread exits and the only chance for it to be deleted is the serialized periodic task | 20:19 |
jeblair | so overall, i would expect that process to be slower. | 20:19 |
jeblair | the peroidic task should not be where the bulk of work happens, it should be where the stuff that falls through the cracks eventually gets cleaned up | 20:19 |
jeblair | so i think we need to change nodepool to match what's actually happening with clouds | 20:19 |
jeblair | which is that deletes can take longer than 10 minutes normally | 20:20 |
jeblair | so step 1 is to increase the 10 minute timeout for deletes | 20:20 |
lifeless | jeblair: I may be misunderstanding someting here, is deleteNode where the parallel deletes come in? | 20:20 |
jeblair | and if we think that rax is ignoring delete api calls, then we should have it send more of them (step 2) | 20:20 |
lifeless | jeblair: so the theory is that we're stuck on the quota because rax aren't deleting ? | 20:21 |
jeblair | lifeless: yeah, or deleting very slowly | 20:21 |
jeblair | lifeless, fungi: if we're hitting the 10 minute delete timeout and then later the periodic task is successfully deleting rax nodes, then what i've described is accurate | 20:22 |
lifeless | ok, so I see | 20:22 |
lifeless | NodeCompleteThread | 20:22 |
jeblair | fungi: i haven't checked the logs recently, is that the case? | 20:22 |
lifeless | is started per-node | 20:22 |
jeblair | lifeless: right | 20:22 |
*** SergeyLukjanov is now known as SergeyLukjanov_ | 20:22 | |
fungi | jeblair: yes, that's what i've been seeing. mostly in ord | 20:23 |
lifeless | jeblair: so what I want to do is remove the 10m block, let the node complete wrap up quickly and let the periodic check also run quickly | 20:23 |
lifeless | then run the periodic check more often | 20:23 |
jeblair | lifeless: it's not a block | 20:23 |
jeblair | lifeless: because it's a thread-per-node, it doesn't block anything else | 20:24 |
lifeless | jeblair: Clearly I'm misunderstanding the code; I see deleteNode -> cleanupServer -> getServer -> submitTask | 20:24 |
lifeless | jeblair: the periodic code also calls cleanupServer, so it blocks that thread | 20:25 |
lifeless | jeblair: no ? | 20:25 |
mikal | Morning | 20:25 |
jeblair | lifeless: all the manager tasks are fast | 20:26 |
jeblair | lifeless: they are just nova api calls | 20:26 |
lifeless | jeblair: except cleanupServer | 20:26 |
jeblair | lifeless: serialized across multiple threads | 20:26 |
lifeless | jeblair: not for the periodic cleanup | 20:27 |
lifeless | jeblair: unless I've misunderstood TaskManager | 20:27 |
jeblair | lifeless: (periodic cleanup is just one of the threads submitting tasks) | 20:27 |
jeblair | lifeless: the cleanupServer method is slow, but it doesn't block anything else | 20:27 |
jeblair | lifeless: it submits a series of tasks to the manager | 20:27 |
mikal | What does a check time-in-queue time of 4 hours 18 minutes mean? That there weren't enough workers to start running the test immediately that it was enqueued? | 20:27 |
*** senk1 has quit IRC | 20:28 | |
jeblair | lifeless: it's a sort of convenience wrapper around the series of tasks needed to delete a server | 20:28 |
lifeless | jeblair: and the manager is a single thread with a Queue.Queue | 20:28 |
notmyname | mikal: not just workers, but also patches previous to it failing that cause a flush of the gate | 20:29 |
lifeless | I see one JenkinsManager per target jenkins | 20:29 |
fungi | mikal: yes, currently when a gate reset happens, gate pipeline changes go to the back of the line for resource allocation and any pending check pipeline changes are getting available nodes assigned until they catch up to whatever was pending there at the time of the gate reset | 20:29 |
jeblair | lifeless: so cleanupServer isn't what is run by that manager, but rather 'removeFloatingIP' 'deleteFloatingIP' 'deleteKeypair' 'deleteServer' are the actual serialized tasks | 20:29 |
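So per jeblair, the serialized units are the individual fast API calls, while the wrapper stringing them together (and any waiting between them) runs in the caller's thread. A sketch of that wrapper shape with a toy recording manager (task names follow the IRC discussion; everything else is illustrative):

```python
class RecordingManager:
    """Toy stand-in for the task manager: records submitted tasks in order."""
    def __init__(self):
        self.tasks = []
    def submit_task(self, task):
        self.tasks.append(task)

class Server:
    """Toy server record."""
    def __init__(self, server_id, floating_ips, key_name):
        self.id, self.floating_ips, self.key_name = server_id, floating_ips, key_name

def cleanup_server(manager, server):
    """Convenience wrapper around the series of tasks needed to delete a
    server. Each submitted task is short and serialized by the manager;
    the wrapper itself runs (and blocks) only in the calling thread."""
    for ip in server.floating_ips:
        manager.submit_task(("removeFloatingIP", server.id, ip))
        manager.submit_task(("deleteFloatingIP", ip))
    manager.submit_task(("deleteKeypair", server.key_name))
    manager.submit_task(("deleteServer", server.id))
```

This is the distinction jeblair is drawing: cleanupServer as a whole is slow, but only its small constituent tasks ever occupy the manager's single worker.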
*** ryanpetrello has joined #openstack-infra | 20:29 | |
mikal | notmyname: this is check though, I thought that was the IndependentPipelineManager | 20:29 |
mattoliverau | Morning all | 20:30 |
notmyname | mikal: ah. so just what fungi said, then :-) | 20:30 |
notmyname | heh. Australia has woken up ;-) | 20:30 |
mikal | fungi: I am having trouble parsing that... | 20:30 |
fungi | notmyname: the pipeline is independent, but node allocation is on a first-come, first served basis | 20:30 |
fungi | er, mikal ^ | 20:30 |
mikal | Oh, so a gate flush eats all the nodes that check would use? | 20:30 |
mikal | So check starves for a while? | 20:30 |
fungi | mikal: more or less. when there are enough nodes to go around you don't see this. when we run out of available nodes we get into a situation where the pipelines take turns | 20:31 |
mikal | Ok, fair enough | 20:31 |
mgagne | zaro: ping | 20:31 |
fungi | and it escalates, because each pipeline is accumulating new changes faster than it can serve them | 20:31 |
mikal | So... Should I go to the node shop and bring you back some more quota? | 20:32 |
fungi | s/serve/service/ | 20:32 |
fungi | mikal: yes, a thousand standard.large would do nicely ;) | 20:32 |
jeblair | lifeless: so the actual blocking parts of the manager are the methods that do 'self.submitTask(something)' | 20:32 |
lifeless | jeblair: I'm not sure I believe you. periodicCleanup->cleanupOneNode->deleteNode->manager.cleanupServer | 20:32 |
mikal | fungi: this is actually a serious question... Would asking rackspace for more test node quota actually get you out of trouble? | 20:32 |
jeblair | lifeless: cleanupServer as a whole is not blocking | 20:33 |
jeblair | lifeless: there's no thread lock around it or anything | 20:33 |
lifeless | jeblair: it won't return until the server is deleted | 20:33 |
fungi | mikal: i got the impression mordred was already asking rackspace for more quota, so might want to confirm with him (and reinforce as needed) | 20:33 |
*** ociuhandu has joined #openstack-infra | 20:33 | |
jeblair | lifeless: that is correct | 20:33 |
lifeless | jeblair: because getServer does a wait on the task | 20:33 |
mordred | jeblair: yay! | 20:33 |
*** gsamfira has quit IRC | 20:33 | |
*** rfolco has quit IRC | 20:33 | |
lifeless | jeblair: periodicCleanup will be blocked | 20:33 |
jeblair | mordred: don't be too happy | 20:33 |
mikal | mordred: you chasing rackspace for more quota? | 20:33 |
mordred | sorry , that should have been "yay, it's jeblair" | 20:33 |
mordred | mikal: yes | 20:33 |
jeblair | mordred: i'm quite sick | 20:33 |
mordred | jeblair: oh no! | 20:33 |
mordred | jeblair: you need me to bring you soup? I can do that now ... | 20:34 |
*** gokrokve has quit IRC | 20:34 | |
fungi | jeblair: you brought something more than your luggage back from perth, i take it? | 20:34 |
lifeless | jeblair: I *think* you might be saying 'node deletes when jobs finish will still be attempted' - and sure, I agree. | 20:34 |
jeblair | mordred: thanks! but i don't want you to get sick | 20:34 |
lifeless | jeblair: I'm talking about making the whole set of cleanup things accommodate rax better | 20:34 |
*** gokrokve has joined #openstack-infra | 20:34 | |
*** andreaf has joined #openstack-infra | 20:34 | |
*** ociuhandu has quit IRC | 20:34 | |
jeblair | lifeless: so am i. | 20:34 |
lifeless | jeblair: but I want to be sure I understand the code; and when you say 'wont be blocked' while I'm specifically talking about the periodic cleanup, I'm thoroughly confused. | 20:35 |
jeblair | lifeless: oh yes, the periodic cleanup _will_ be blocked. | 20:35 |
*** gokrokve_ has joined #openstack-infra | 20:35 | |
jeblair | lifeless: you're quite right there, and i think you understand correctly. | 20:35 |
lifeless | jeblair: so my point about this was that if we *stop waiting* in the nodecomplete handler | 20:35 |
lifeless | jeblair: *and* stop waiting in the periodic cleanup | 20:35 |
lifeless | jeblair: *then* we can retry across all the pending deletes more often | 20:36 |
*** malini has left #openstack-infra | 20:36 | |
lifeless | jeblair: without adding a raft of new threads or anything | 20:36 |
fungi | without a raft, i'll never get off this island | 20:36 |
jeblair | lifeless: if you don't wait at all then you only give the provider 1 second to delete a node before you ask it to again. | 20:36 |
fungi | though palm trees might do a sight better than threads | 20:36 |
*** markwash has joined #openstack-infra | 20:36 | |
lifeless | jeblair: We do periodic deletes 1/ second ? | 20:36 |
jeblair | lifeless: not at the moment | 20:37 |
jeblair | lifeless: are you suggesting that we leave the periodic interval as-is, every 5 minutes? | 20:38 |
jeblair | lifeless: then minimum time to delete a node will be 5 mins | 20:38 |
*** talluri has joined #openstack-infra | 20:38 | |
lifeless | jeblair: lets say we set it to 30 seconds | 20:38 |
lifeless | jeblair: then if the cloud deletes on request, it will be deleted by nodecompletehandler | 20:38 |
jeblair | lifeless: nodepool won't notice it until the next periodic run though since you aren't waiting for it | 20:39 |
lifeless | jeblair: if the cloud doesn't delete it on the first request, up to 30 seconds later we will try from periodic, and every 30s thereafter | 20:39 |
*** gokrokve has quit IRC | 20:39 | |
lifeless | jeblair: I don't mean 'don't try' I mean 'don't block if it does not go away immediately. | 20:39 |
jeblair | lifeless: it never goes away immediately | 20:39 |
jeblair | lifeless: even the fastest cloud provider takes a little while (many seconds-minutes) to delete a node | 20:40 |
lifeless | sure | 20:40 |
fungi | on a good day, novaclient reports my hpcloud vms gone after 10 seconds and rackspace after more like 60 | 20:40 |
lifeless | do nodes in state DELETE count towards the max-servers count ? | 20:40 |
jeblair | lifeless: yes | 20:40 |
lifeless | ah | 20:40 |
lifeless | jeblair: so is 30 seconds a reasonable time to wait to find out if the cloud deleted the node ? | 20:40 |
jeblair | lifeless: apparently 10 minutes isn't long enough | 20:41 |
lifeless | jeblair: I know, but I'm looking at the nodepool state changes from what I'm proposing | 20:41 |
fungi | i don't think any rackspace deletes will work in a 30-second timeframe. maybe one on occasion, but unlikely | 20:41 |
lifeless | they seem to be that *if* a cloud reacts quickly, we change from finding out at 2/4/6/8 seconds (iterate_timeout) | 20:41 |
lifeless | to finding out at 30+ seconds | 20:41 |
lifeless | in fact, right now we do nodes in state DELETE /2 API checks a second | 20:42 |
lifeless | so we could make the periodiccheck run every 2 seconds | 20:42 |
jeblair | lifeless: i think there are two ways of fixing this: i propose that we adjust the parallel delete strategy to match current reality, you propose going to all-serial. | 20:42 |
lifeless | and it would be the exact same API traffic | 20:42 |
*** gbrugnago has joined #openstack-infra | 20:42 | |
*** dcramer_ has joined #openstack-infra | 20:42 | |
lifeless | jeblair: yes; though actually I wasn't intending to block there; I was more aiming at a centralised view | 20:43 |
lifeless | jeblair: s/block/stop/ | 20:43 |
lifeless | jeblair: anyhow, now I understand more of the design - thanks - I can see why increasing the timeout will help - *as long as nodepool isn't restarted* | 20:43 |
lifeless | jeblair: but when it's restarted everything will become dependent on the periodic cleanup, so I think making that much more effective is important | 20:44 |
*** NikitaKonovalov_ is now known as NikitaKonovalov | 20:45 | |
fungi | under present volume, i've had to resort to ungracefully restarting nodepool and cleaning up the mess | 20:45 |
jeblair | lifeless: agreed; perhaps adjusting the timeout for parallel operation and reducing it for periodic cleanup would be best | 20:45 |
jeblair | s/adjusting/increasing/ | 20:45 |
*** markmcclain has quit IRC | 20:46 | |
lifeless | jeblair: so, what about eliminating the timeout, going all serial as I proposed, but then introducing concurrency in the periodic cleanup - e.g. worker threads there to scatter-gather at some defined concurrency | 20:47 |
lifeless | jeblair: this would get the same performance for live deletes and make after restart better too, without needing two different codepaths | 20:47 |
*** dcramer_ has quit IRC | 20:47 | |
lifeless | jeblair: oh, I just had a possible insight | 20:48 |
lifeless | jeblair: one form of rate limiting is to discard requests that are over the threshold | 20:48 |
lifeless | jeblair: how many nodes do we try to delete at once at peak ? | 20:48 |
*** derekh has joined #openstack-infra | 20:48 | |
lifeless | I'm guessing hundreds | 20:48 |
jeblair | lifeless: yes. sometimes the entire quota. | 20:49 |
lifeless | so what if our basically random api calls result in basically random things being actioned and the rest dropped | 20:49 |
lifeless | being non-blocking-serial (e.g. one api call to delete each server before we probe for any of them, then probe all once, then delete all once, in a loop) | 20:49 |
lifeless | would give *much* better behaviour with such rate limiters | 20:50 |
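The non-blocking-serial loop lifeless sketches, with a toy rate-limiting cloud to show why it behaves better when the provider drops requests (a sketch under the assumptions in the discussion, not nodepool code):

```python
import time

def drive_deletes(client, pending_ids, poll_interval=30, max_rounds=None):
    """One delete call per server, then one probe per server, then loop.

    No server gets a private blocking wait, so a cloud that rate-limits
    away some delete calls still sees every server re-requested on the
    next round instead of one server monopolizing the thread.
    Returns the set of ids confirmed gone."""
    remaining = set(pending_ids)
    gone = set()
    rounds = 0
    while remaining and (max_rounds is None or rounds < max_rounds):
        for sid in list(remaining):
            client.delete(sid)            # fire and forget; may be ignored
        for sid in list(remaining):
            if not client.exists(sid):    # single probe, no waiting loop
                remaining.discard(sid)
                gone.add(sid)
        rounds += 1
        if remaining:
            time.sleep(poll_interval)
    return gone

class RateLimitedCloud:
    """Toy provider honouring only every second delete call, mimicking a
    rate limiter that discards requests over its threshold."""
    def __init__(self, ids, calls_needed=2):
        self.calls = {i: 0 for i in ids}
        self.calls_needed = calls_needed
    def delete(self, sid):
        self.calls[sid] += 1
    def exists(self, sid):
        return self.calls[sid] < self.calls_needed
```

Every pending server gets retried on every round, so dropped calls cost one poll interval rather than a per-server timeout.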
lifeless | -> doctors | 20:50 |
mordred | mikal: I have a thread going with pvo | 20:50 |
lifeless | jeblair: I will prepare a patch after my dr visit so we can discuss code | 20:51 |
jeblair | lifeless: i'm going to be semi-responsive for a while | 20:51 |
jeblair | due to illness and other schedule issues | 20:52 |
openstackgerrit | Michael Krotscheck proposed a change to openstack-infra/storyboard-webclient: Add tox.ini file to run things via tox https://review.openstack.org/67721 | 20:52 |
jeblair | fungi: is there anything else urgent i can help with? otherwise i'm going to go sleep | 20:52 |
*** mrda has joined #openstack-infra | 20:53 | |
fungi | jeblair: go to sleep | 20:53 |
russellb | jeblair: hope you feel better soon! health more important :) | 20:53 |
jeblair | russellb: thanks | 20:53 |
fungi | we're handling it. really most of the issues are volume+openstack bugs | 20:53 |
sdague | jeblair: yeh, hope you feel better soon | 20:54 |
jeblair | sdague: thanks | 20:54 |
fungi | definitely. the sooner you're well, the more we'll get accomplished | 20:54 |
jeblair | fungi: i don't think i'm well enough to go to utah, i'll try to join in by phone or something | 20:55 |
fungi | jeblair: my flight's through baltimore tomorrow, so there's every chance i could get stuck in maryland instead ;) | 20:55 |
jeblair | but hopefully that will give me a chance to get better and pitch in later this week, and hopefully still go to brussels | 20:55 |
mordred | jeblair: ++ | 20:55 |
mordred | jeblair: and seriously- I'm sure you're covered, but let me know if I can be helpful | 20:56 |
*** yolanda has quit IRC | 20:56 | |
jeblair | mordred: cool, thanks | 20:56 |
*** rnirmal has quit IRC | 20:56 | |
*** ociuhandu has joined #openstack-infra | 20:57 | |
*** zanins has quit IRC | 20:57 | |
*** aburaschi has quit IRC | 20:59 | |
sdague | mordred: hey, tox question, because I'm abusing it for doing something unnatural | 20:59 |
sdague | is there an easy way to catch and pass a ^C through tox to the underlying thing it was running? | 21:00 |
mordred | hrm | 21:00 |
mordred | sdague: no idea | 21:00 |
*** DinaBelova is now known as DinaBelova_ | 21:00 | |
sdague | ok, no big deal | 21:00 |
fungi | russellb: sdague: the nova fix is getting nodes now | 21:02 |
russellb | yeah just saw that | 21:02 |
fungi | ~1 hour to results | 21:02 |
*** dcramer_ has joined #openstack-infra | 21:03 | |
openstackgerrit | Sean Dague proposed a change to openstack-infra/elastic-recheck: objectify the gerrit event for our purposes https://review.openstack.org/67941 | 21:03 |
sdague | sweet | 21:03 |
sdague | fingers crossed | 21:03 |
sdague | if you watch on -qa you can see that 680 is coming back a lot | 21:04 |
*** ociuhandu has quit IRC | 21:05 | |
*** CaptTofu has joined #openstack-infra | 21:05 | |
sdague | now let's hope it doesn't fail on one of the other races | 21:05 |
*** misskitty has joined #openstack-infra | 21:05 | |
fungi | clarkb: prelim results from 67911... http://paste.openstack.org/show/61596/ (seems to work as intended) | 21:07 |
fungi | if there's any ipv6 ra monkeybusiness at that point in time, we should be able to identify it now | 21:08 |
fungi | (...and knowing's half the battle) | 21:08 |
clarkb | ++ | 21:09 |
fungi | once check results come back, i say we just approve it into the gate normally and then can promote it or force-merge as necessary if the frequency increases substantially | 21:10 |
*** dprince has quit IRC | 21:10 | |
*** max_lobur is now known as max_lobur_afk | 21:10 | |
fungi | otherwise just let the gate take its course | 21:10 |
clarkb | sounds good | 21:11 |
fungi | i haven't seen enough of these yet to suggest it's killing us | 21:11 |
clarkb | ya | 21:11 |
*** jaypipes has quit IRC | 21:12 | |
*** talluri has quit IRC | 21:14 | |
*** misskitty has quit IRC | 21:14 | |
openstackgerrit | Derek Higgins proposed a change to openstack-infra/config: Enable precise-backports on tripleo test nodes https://review.openstack.org/67958 | 21:16 |
*** gbrugnago has quit IRC | 21:17 | |
*** kirukhin has joined #openstack-infra | 21:17 | |
ewindisch | russellb: it seems to me that vmware is only complying with the "group b" functional testing requirement on changes that affect their driver directly... is that okay? | 21:17 |
*** dcramer_ has quit IRC | 21:17 | |
*** senk has joined #openstack-infra | 21:17 | |
*** salv-orlando has quit IRC | 21:18 | |
*** salv-orlando has joined #openstack-infra | 21:18 | |
*** smarcet has left #openstack-infra | 21:21 | |
dansmith | ewindisch: have you read the guidelines? | 21:24 |
ewindisch | dansmith: which? I've read https://wiki.openstack.org/wiki/HypervisorSupportMatrix | 21:24 |
dansmith | ewindisch: https://wiki.openstack.org/wiki/HypervisorSupportMatrix/DeprecationPlan | 21:24 |
dansmith | ewindisch: and the bit on the matrix page says "group c will be deprecated" | 21:25 |
openstackgerrit | Sean Dague proposed a change to openstack-infra/elastic-recheck: objectify the gerrit event for our purposes https://review.openstack.org/67941 | 21:25 |
ewindisch | dansmith: yes, I know that... which is why I'm trying to get into group B ;-) | 21:25 |
ewindisch | dansmith: I need to re-review the click-through for DeprecationPlan | 21:25 |
russellb | right, A and B are fine | 21:26 |
russellb | i expect most to end up in B | 21:26 |
dansmith | (for now) | 21:26 |
russellb | A is ideal | 21:27 |
russellb | B acceptable | 21:27 |
ewindisch | russellb / dansmith: the problem is that running our own gating infrastructure for every change is quite an undertaking. I had originally thought this could run in upstream CI | 21:27 |
*** CaptTofu has quit IRC | 21:28 | |
dims | ewindisch, the folks working on the vmware driver are on #openstack-vmware channel if you have questions for them as well - fyi | 21:28 |
dansmith | ewindisch: yeah, that's why most people got started early | 21:28 |
ewindisch | russellb: well, it sounds like A -- which is what I'd prefer to implement - is not acceptable by the openstack-infra team, based on conversations earlier today | 21:28 |
russellb | yeah, this has been set since before the driver was merged | 21:28 |
russellb | well ... it's just that the timing is bad | 21:28 |
dansmith | ewindisch: most of them can't run in upstream infra, so you have a major advantage that you can even do that | 21:28 |
ewindisch | dims: the question was more to russell, "does vmware qualify as B considering it doesn't run on every proposed change to nova"? | 21:28 |
russellb | it *will* be running on every change | 21:29 |
dansmith | ewindisch: they're working on that | 21:29 |
sdague | ewindisch: you can't come to infra at i2 and ask for implementing additional hypervisor in upstream ci | 21:29 |
russellb | that's their plan | 21:29 |
ewindisch | russellb: gotcha | 21:29 |
sdague | if we'd had a session at icehouse summit, it would be something worth discussing | 21:29 |
sdague | which is why I said -1, bring to juno summit | 21:29 |
russellb | fwiw, supporting docker in existing CI is way easier than anything else | 21:29 |
*** dcramer_ has joined #openstack-infra | 21:29 | |
sdague | russellb: agreed | 21:30 |
ewindisch | russellb: agreed. | 21:30 |
ewindisch | sdague: is it about human resources or hardware resources? | 21:30 |
russellb | but yeah, have to be sensitive to infra priorities based on the status of things | 21:30 |
sdague | ewindisch: right now, both | 21:30 |
dims | ewindisch, right. i was responding to "running our own gating infrastructure". you can get an idea from them if you wanted to :) | 21:31 |
ewindisch | dims: ah | 21:31 |
*** jhesketh_ has joined #openstack-infra | 21:33 | |
openstackgerrit | Andreas Jaeger proposed a change to openstack-infra/config: Early abort documentation builds https://review.openstack.org/67722 | 21:33 |
*** ruhe is now known as _ruhe | 21:34 | |
ewindisch | sdague: I've worked on gate stuff before, I don't know if it will require that much human capital besides my own effort and perhaps some inquiries here on irc -- but I could be wrong. | 21:34 |
ewindisch | sdague: hardware is something we might be able to help with, TBD | 21:35 |
mattoliverau | lifeless: Regarding speeding up the cleanup/deletion of nodes: I don't know if this is possible yet, I've started playing, but what if we only had to build servers once (each day)? That is, build a server with a main LXC container using prepare_node.sh etc. Then every time we need a new "server" for running a test/build, the create is as simple as creating an ephemeral LXC container (off | 21:35 |
mattoliverau | an existing one). This is a container that only lasts until it's turned off, so run the tests, and then the delete and cleanup of a node is as simple as stopping a container. Containers run almost as fast as the machine they run on, as they use the same kernel. So as long as the tests/devstack can run inside one, of course (I could be missing something here), wouldn't this speed up | 21:35 |
mattoliverau | subsequent rebuilds and deletes of each node? Just my 2 cents. But again I'm new to the project and have a huge gap in my knowledge of the environment etc. | 21:35 |
*** yamahata has quit IRC | 21:36 | |
ewindisch | sdague: one of my concerns is that pulling from the gerrit eventstream, we don't get the advantages of things like zuul and "speculative testing" that are done upstream. | 21:36 |
clarkb | mattoliverau: containers are insufficient for our needs. pleia2 has a list of issues iirc | 21:36 |
*** nati_ueno has joined #openstack-infra | 21:37 | |
sdague | ewindisch: you only need to vote on check | 21:37 |
*** jhesketh__ has joined #openstack-infra | 21:37 | |
clarkb | there are things that are not namespaced that openstack touches | 21:37 |
mattoliverau | clarkb: ok, just a thought :) | 21:37 |
sdague | and I agree, it's not quite the same | 21:37 |
jhesketh__ | Morning | 21:37 |
clarkb | mattoliverau: we wish they would work :) | 21:37 |
sdague | ewindisch: to be pragmatic. 1) don't plan on this happening in icehouse. 2) start working on how to do it in juno, get prelim work started now 3) be prepared to do summit session on it in Atlanta | 21:38 |
*** yamahata has joined #openstack-infra | 21:38 | |
russellb | sdague: it's really not that complicated ... not sure a summit session should block anything | 21:39 |
ewindisch | sdague: meanwhile, unless we invest in external infrastructure, our driver is removed from Nova | 21:39 |
sdague | russellb: it's a socializing thing about what the new matrix looks like | 21:39 |
russellb | we can only socialize every 6 months? | 21:39 |
clarkb | so I spoke to devananda about ironic testing too. can we have nova talk to libvirt qemu, ironic, and docker and run one test | 21:40 |
sdague | russellb: there is *so* much to be done to get us to a functioning i3 at this point given the current gate, and there are very few people to get it done | 21:40 |
ewindisch | investing in external infrastructure which is not only expensive, distracts us from making progress on getting into "group A" | 21:40 |
mikal | So, as a data point turbo hipster runs on every nova commit and isn't _that_ big | 21:40 |
*** Shrews has quit IRC | 21:41 | |
mikal | (21 instances, about $2,000 a month in public cloud costs) | 21:41 |
russellb | mikal: cool data point | 21:41 |
dansmith | yeah, awesome | 21:41 |
mikal | It sometimes gets behind, but that's mostly when dansmith does a thing | 21:42 |
russellb | sdague: yeah, i get that, it's kinda late to be getting started trying to get something running, given that infra is only going to get more busy | 21:42 |
fungi | and reuses a lot of upstream ci tooling | 21:42 |
mikal | And it catches up | 21:42 |
dansmith | $6k per cycle for CI testing | 21:42 |
russellb | well ... 12k | 21:42 |
mikal | Ball park | 21:42 |
russellb | 6 months :-) | 21:42 |
*** Shrews has joined #openstack-infra | 21:42 | |
mikal | If we catch one production db problem a cycle, then that's easily paid for itself | 21:42 |
dansmith | russellb: only if you feel the need for the numbers to be right | 21:42 |
russellb | mikal: how long do your runs take? | 21:42 |
dansmith | russellb: er, yeah, 12 :) | 21:42 |
mikal | Heh | 21:42 |
mikal | Ummm, about 20 minutes... | 21:42 |
russellb | we're asking for a full tempest run | 21:43 |
mikal | So a _lot_ faster than infra's CI at this point | 21:43 |
russellb | right | 21:43 |
mikal | So, what's a tempest run these days? An hour? | 21:43 |
russellb | yeah | 21:43 |
dansmith | certainly a tempest docker run would be way faster than kvm, no? | 21:43 |
mikal | So I guess multiply those numbers by three | 21:43 |
russellb | dansmith: yes | 21:43 |
russellb | because a tempest config that works with docker would be a small subset | 21:43 |
mikal | But yeah, I would expect containers to be a lot faster to start than vms | 21:43 |
russellb | that too | 21:43 |
russellb | but also, docker driver supports a small subset of the API | 21:43 |
dansmith | yeah, both of those things | 21:43 |
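A quick aside on the arithmetic quoted above — a minimal sketch of the ballpark figures (mikal's ~$2,000/month and the 20-minute vs 1-hour run times are rough numbers from the conversation, not measurements):

```python
# Ballpark CI cost math from the conversation above (figures are the
# rough numbers quoted in-channel, not authoritative measurements).
monthly_cost = 2000        # mikal: ~$2,000/month in public cloud costs
cycle_months = 6           # a release cycle is roughly 6 months
cost_per_cycle = monthly_cost * cycle_months
print(cost_per_cycle)      # 12000 -- russellb's correction of the "6k" figure

# A ~20-minute turbo-hipster run vs a ~1-hour full tempest run is
# mikal's "multiply those numbers by three" scaling.
tempest_scaled = cost_per_cycle * (60 // 20)
print(tempest_scaled)      # 36000
```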
ewindisch | dansmith: yeah, and I've thought about doing "docker in docker" so we can avoid putting any of it into VMs at all (or gating multiple tests on a single VM in parallel) | 21:44 |
openstackgerrit | Matt Ray proposed a change to openstack-infra/config: Chef style testing enablement and minor speed cleanup starting w/block-storage https://review.openstack.org/67964 | 21:44 |
russellb | ewindisch: sure, whatever works ... | 21:44 |
russellb | just ... full tempest run on every patch :) | 21:45 |
sdague | anyway, with what's on the infra plate at this point, I think this is too late. Especially from a team that hasn't contributed to anything besides their corner of the world. So start helping on generic infra so we can free up some resources, and then it becomes part of the conversation | 21:45 |
russellb | where "full" is a bit loose | 21:45 |
sdague | every new feature has a cost, and i2 is the wrong place to be bringing this forward | 21:45 |
dansmith | russellb: well, I think the definition is "full, for everything you support, and show your config" :) | 21:46 |
russellb | dansmith: yeah | 21:46 |
sdague | I'll let jeblair contradict me when he is well, but until then, I'll play bad cop :) | 21:46 |
dansmith | I would think "we're not testing anything else until we can test what we already test" would be a reasonable answer until we get out of the current mess anyway, almost regardless of what it is | 21:47 |
russellb | +1 to that | 21:48 |
ewindisch | sdague: we're a startup, putting a team onto openstack-ci work is really a non-starter. I've personally worked with openstack-ci stuff in the past, although admittedly as it improved my "own little corner of the world", but I'm not entirely fresh on this. | 21:48 |
boris-42 | mikal russellb sdague sorry, this is probably off topic, but we are teaching Rally to do deployment at scale | 21:48 |
* russellb stares down the top nova change in the queue | 21:48 | |
boris-42 | I mean for 30 minutes we got 128 compute nodes | 21:48 |
russellb | boris-42: huh? | 21:48 |
boris-42 | russellb yep, we are working on Rally | 21:48 |
boris-42 | russellb a thing that makes benchmarking simple | 21:48 |
boris-42 | russellb so the latest results: simulating a compute_node (running it in LXC) requires 150MB RAM | 21:49 |
boris-42 | russellb and instead of deploying it, we are actually copy-pasting it | 21:49 |
boris-42 | russellb it will probably be interesting for catching Rabbit/NovaNetwork/Scheduler/DB bottlenecks | 21:50 |
boris-42 | without having tons of resources | 21:50 |
boris-42 | and a lot of $$$ | 21:50 |
sdague | ewindisch: again, it's about timing. you can't show up at i2, when we are under huge strains in the existing system, and say "hey guys, I want you all to pivot out the test matrix and test our hypervisor" | 21:51 |
dansmith | it's not like this nova requirement is new, or anything | 21:51 |
*** NikitaKonovalov is now known as NikitaKonovalov_ | 21:52 | |
mikal | I think you could argue as well that our obligation to existing driver users is greater than our obligation to new drivers. | 21:52 |
fungi | russellb: this is not the failure mode your new change is trying to fix, right? https://jenkins01.openstack.org/job/gate-grenade-dsvm/4786/consoleText | 21:52 |
mikal | We have a duty of care to the users we currently have | 21:52 |
*** jaypipes has joined #openstack-infra | 21:52 | |
dansmith | fungi: no | 21:53 |
fungi | okay, good. because that cropped up with the proposed fix in place | 21:53 |
dansmith | fungi: I think that's "the other one" | 21:53 |
ewindisch | sdague: I understand that. We're conflating two issues here of human and hardware resources. I acknowledge we might need to help with both, however. | 21:53 |
*** kirukhin has quit IRC | 21:53 | |
dansmith | fungi: i.e. switch the 8 and 0 | 21:53 |
fungi | dansmith: it's definitely a common one, because i've hit it on several changes today | 21:54 |
dansmith | fungi: yar | 21:54 |
russellb | yeah, not sure what that one is yet | 21:54 |
sdague | and we've got the other issue which is why would we play favorites on containers and pick docker instead of libvirt lxc | 21:54 |
ewindisch | sdague: presuming we could help with hardware, are the human-side strains still too hard? | 21:55 |
russellb | sdague: well ... someone is actually trying to do the work for docker, heh | 21:55 |
sdague | which is why I think this is a summit conversation | 21:55 |
fungi | dansmith: ahh, yep, the cinderclient change behind it is also failing on that | 21:55 |
sdague | ewindisch: yes | 21:55 |
*** _david_ has quit IRC | 21:55 | |
sdague | the infra team is massively strained at this point | 21:55 |
fungi | sdague: it's not *that* bad. i did actually sleep a few hours last night | 21:56 |
sdague | and we're probably going to need to do some heads down things to get the gate to a good state for i3 | 21:56 |
mikal | sdague: don't forget Canonical's lxc specific driver, which has been in review for a while | 21:56 |
sdague | yep | 21:56 |
ewindisch | sdague / dansmith: and we haven't ignored those requirements-- Docker acknowledged that the gating work had to be done and resourced the effort -- which is in part what I've been hired to accomplish. | 21:57 |
portante | sdague, fungi, clarkb: FWIW, I think you guys are doing a great job, and rely on your commitment and knowledge tremendously | 21:57 |
mikal | sdague: there's at least three container options at the moment | 21:57 |
fungi | portante: thanks! | 21:57 |
*** thuc has joined #openstack-infra | 21:57 | |
sdague | portante: thanks | 21:57 |
russellb | mikal: well ... 2 in tree | 21:58 |
russellb | mikal: the other one didn't even have a blueprint last i saw it | 21:58 |
fungi | ewindisch: a related datapoint, note that there are a stack of changes proposed to support xenserver in upstream infra, started a while back, and still being hashed over | 21:58 |
russellb | so, pretty far from even needing code review IMO | 21:58 |
sdague | mikal: right, which is why I said this is a summit conversation. Because I think containers in gate is a good idea, and I think it's a community conversation we should have. It's just not a now good idea. | 21:58 |
mikal | russellb: that's true, but it exists | 21:58 |
russellb | for some definition of exists | 21:58 |
russellb | not really relevant for this discussion of driver CI right now | 21:59 |
fungi | russellb: sdague: IT LIVES | 21:59 |
sdague | fungi: sweet! | 21:59 |
russellb | merged? | 21:59 |
mikal | I wonder how broken a tempest run with lxc containers turned on is? | 21:59 |
ewindisch | fungi: I'd have to look at those changes, but my perspective is that I'd target the docker gate to have no more impact than, say, adding a postgres gate as opposed to mysql | 21:59 |
fungi | russellb: well, it *will* merge once zuul wakes up and processes the result it has there | 21:59 |
sdague | russellb: passed everything | 21:59 |
russellb | mikal: well first you'd have to come up with a tempest config that only hits what it supports | 21:59 |
russellb | fungi: yay | 21:59 |
dansmith | woo! | 22:00 |
russellb | now, that other damn bug ... | 22:00 |
russellb | mriedem: have you fixed it yet? :-p | 22:00 |
sdague | some times you do get the bear | 22:00 |
sdague | on a day like today, a win like that is a good one | 22:00 |
*** rnirmal has joined #openstack-infra | 22:00 | |
fungi | ewindisch: right, their work involved needing separate test node configurations entirely (they have to reboot for new kernels and other stuff), so conceivably less involved for docker | 22:00 |
mriedem | russellb: nope, was thinking about pushing a test patch to increase the sleep in the libvirt volume module to see if it hits after a few rechecks, but i'm open to suggestion/help | 22:01 |
russellb | mriedem: was mostly kidding of course :)P | 22:01 |
mriedem | jsbryant said he looked at it a bit and nothing jumped out at him from the cinder changes | 22:01 |
ewindisch | fungi: we just need to install a userland package and run a daemon. We no longer have any special kernel requirements (there used to be a requirement on AUFS which required a newish vanilla kernel) | 22:01 |
fungi | sdague: well, it's a win, but it'll be the first change to merge through normal gating in 8 hours (per the openstack/openstack commit log) | 22:02 |
sdague | fungi: I'll take anything today | 22:02 |
ewindisch | fungi: the only special requirement we have right now is that our package isn't in precise-backports, only trusty (14.04)... I recognize it will be easier if we can use upstream ubuntu packages that work in Precise, so I'm pressing to get a package into precise-backports ASAP | 22:03 |
fungi | ewindisch: or ubuntu cloud archive for precise, assuming we can work out why it's still breaking tempest runs and nova unit tests | 22:03 |
*** nati_ueno has quit IRC | 22:04 | |
ewindisch | fungi: at present, we have our own packages for precise that live in our own private repo (signed with our own key). I recognize that's troublesome in a few ways ;-) | 22:04 |
*** ArxCruz has quit IRC | 22:06 | |
*** dizquierdo has joined #openstack-infra | 22:06 | |
fungi | ewindisch: yes, i know you definitely understand that ;) | 22:07 |
*** beagles has quit IRC | 22:07 | |
*** ArxCruz has joined #openstack-infra | 22:09 | |
*** marun has quit IRC | 22:14 | |
*** nati_ueno has joined #openstack-infra | 22:14 | |
*** nati_ueno has quit IRC | 22:14 | |
dansmith | russellb: merged | 22:14 |
russellb | \o/ | 22:15 |
russellb | good thing every patch isn't that hard to land | 22:15 |
russellb | ... usually | 22:15 |
dansmith | what's with the big gap in the failure rates graph on the e-r page? | 22:15 |
*** nati_ueno has joined #openstack-infra | 22:16 | |
*** ewindisch is now known as zz_ewindisch | 22:16 | |
*** zz_ewindisch is now known as ewindisch | 22:17 | |
*** Ajaeger1 has quit IRC | 22:18 | |
*** ewindisch is now known as zz_ewindisch | 22:20 | |
*** zz_ewindisch is now known as ewindisch | 22:21 | |
*** nati_ueno has quit IRC | 22:23 | |
openstackgerrit | Davanum Srinivas (dims) proposed a change to openstack-infra/devstack-gate: Temporary HACK : Enable UCA https://review.openstack.org/67564 | 22:26 |
*** jerryz has joined #openstack-infra | 22:27 | |
*** yamahata has quit IRC | 22:29 | |
*** michchap has quit IRC | 22:30 | |
*** michchap has joined #openstack-infra | 22:31 | |
*** senk has quit IRC | 22:32 | |
jerryz | fungi: ping | 22:32 |
*** dcramer_ has quit IRC | 22:33 | |
*** thomasem has quit IRC | 22:33 | |
fungi | jerryz: hi there | 22:34 |
*** nati_ueno has joined #openstack-infra | 22:35 | |
*** jcoufal has quit IRC | 22:36 | |
*** sandywalsh has joined #openstack-infra | 22:36 | |
jerryz | fungi: i have a question about third party testing. if a gerrit trigger is configured for a project, will every single create patch event of the project trigger a third party test ? | 22:37 |
fungi | jerryz: yes, in a normal configuration, it will | 22:37 |
jerryz | fungi: even if the patch may not have anything to do with the plugin | 22:37 |
*** yamahata has joined #openstack-infra | 22:38 | |
ewindisch | mikal: any idea how many patchsets per day on nova? | 22:38 |
*** dims has quit IRC | 22:39 | |
mikal | About 100 last I looked | 22:39 |
ewindisch | thanks | 22:39 |
mikal | Obviously around deadlines that spikes | 22:39 |
*** thuc has quit IRC | 22:39 | |
fungi | jerryz: i'm not familiar enough with the gerrit-trigger plugin for jenkins to know whether it can filter on changes matching only specific file patterns. but as far as whether the desired result is to test on every patch, that's more of a question for the ptl who's insisting on test results (i don't know whether requirements are differing between nova, neutron and cinder driver testing) | 22:39 |
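As a hedged illustration of the filtering fungi mentions: even if the gerrit-trigger plugin can't filter on file patterns itself, a third-party CI consuming the event stream could do its own path check before launching a job. Everything below (file names, patterns, the helper function) is invented for illustration, not actual nova or gerrit-trigger configuration:

```python
from fnmatch import fnmatch

# Hypothetical list of files touched by a patchset, e.g. fetched with a
# follow-up gerrit query on the change (all names here are invented).
changed_files = [
    "nova/virt/docker/driver.py",
    "nova/tests/virt/docker/test_driver.py",
]

# Only trigger the third-party job when the change touches paths the
# driver under test actually cares about.
patterns = ["nova/virt/docker/*", "nova/tests/virt/docker/*"]

def should_trigger(files, pats):
    """Return True if any changed file matches any interesting pattern."""
    return any(fnmatch(f, p) for f in files for p in pats)

print(should_trigger(changed_files, patterns))                  # True
print(should_trigger(["nova/scheduler/manager.py"], patterns))  # False
```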
*** thuc has joined #openstack-infra | 22:40 | |
*** thuc has quit IRC | 22:40 | |
*** dizquierdo has quit IRC | 22:40 | |
jerryz | fungi: do you know what the final decision was on whether to keep the third party testing +1 privilege enabled? | 22:43 |
*** senk1 has joined #openstack-infra | 22:43 | |
mikal | ewindisch: I am lying it seems, it's closer to 200 | 22:43 |
* mikal is making a graph now | 22:43 | |
jerryz | fungi: if a third party testing account will post vote on any patch regarding the project, that would indeed require the third party ci infra to be stable. | 22:44 |
*** dkranz has quit IRC | 22:44 | |
ewindisch | mikal: thanks | 22:44 |
mriedem | ewindisch: http://russellbryant.net/openstack-stats/nova-reviewers-30.txt | 22:44 |
mriedem | New patch sets in the last 30 days: 2564 (85.5/day) | 22:44 |
jerryz | fungi: i mean -1 privilege | 22:45 |
*** dcramer_ has joined #openstack-infra | 22:46 | |
fungi | jerryz: it's mostly consensus from the project it's voting on. there are some clarifications to the guidelines being proposed at https://review.openstack.org/63478 | 22:47 |
*** jasondotstar has joined #openstack-infra | 22:50 | |
*** carl_baldwin has joined #openstack-infra | 22:50 | |
*** nati_ueno has quit IRC | 22:52 | |
*** nati_ueno has joined #openstack-infra | 22:54 | |
*** dims has joined #openstack-infra | 22:54 | |
lifeless | ok back | 22:57 |
lifeless | fungi: clarkb: where are we at with exhaustion ? | 22:57 |
sdague | dansmith: on top? graphite fell over | 22:58 |
fungi | lifeless: i've reverted to running my manual auxiliary nodepool delete loops from the cli to keep the stale deletes minimized | 22:58 |
dansmith | sdague: ah, okay | 22:59 |
sdague | because, you know, we didn't have enough things breaking today :) | 22:59 |
fungi | i missed the graphite outage. who wound up fixing that? | 23:00 |
dansmith | sdague: just sucks to be able to see the change, if any, from the recent merge, which is why I was asking | 23:00 |
*** jasondotstar has quit IRC | 23:00 | |
sdague | yeh | 23:01 |
sdague | honestly, it takes a while to build up data anyway | 23:01 |
*** carl_baldwin has quit IRC | 23:01 | |
sdague | and I'm less trusting of the graphite numbers after I found that some of our interrupts get reported as fails | 23:02 |
fungi | that's something i think would have to be addressed in jenkins itself too | 23:02 |
*** carl_baldwin has joined #openstack-infra | 23:02 | |
*** nati_ueno has quit IRC | 23:03 | |
lifeless | fungi: ahahahaha | 23:04 |
lifeless | fungi: I found a 15m latency on periodic cleanup as well | 23:04 |
fungi | lifeless: ooh! | 23:04 |
*** nati_ueno has joined #openstack-infra | 23:04 | |
sdague | lifeless: nice | 23:04 |
openstackgerrit | lifeless proposed a change to openstack-infra/nodepool: Cleanup nodes in state DELETE immediately. https://review.openstack.org/67979 | 23:05 |
lifeless | I may be misunderstanding state_time | 23:06 |
*** mrodden has quit IRC | 23:07 | |
lifeless | actually, I think that code block is entirely broken | 23:07 |
*** miqui has joined #openstack-infra | 23:07 | |
* lifeless revisits | 23:07 | |
lifeless | yeah, it's missing a now - | 23:08 |
*** senk1 has quit IRC | 23:09 | |
*** miqui has quit IRC | 23:09 | |
*** miqui has joined #openstack-infra | 23:09 | |
*** miqui has quit IRC | 23:10 | |
lifeless | there | 23:11 |
openstackgerrit | lifeless proposed a change to openstack-infra/nodepool: Fix early-exit on recently-set-state in deleteNode https://review.openstack.org/67980 | 23:11 |
openstackgerrit | lifeless proposed a change to openstack-infra/nodepool: Cleanup nodes in state DELETE immediately. https://review.openstack.org/67979 | 23:11 |
openstackgerrit | lifeless proposed a change to openstack-infra/nodepool: Cleanup nodes in state DELETE immediately. https://review.openstack.org/67979 | 23:12 |
openstackgerrit | lifeless proposed a change to openstack-infra/nodepool: Fix early-exit in cleanupOneNode https://review.openstack.org/67980 | 23:12 |
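The class of bug lifeless describes ("missing a now -") can be sketched hypothetically — the names and delay value below are made up for illustration and are not nodepool's actual code:

```python
import time

DELETE_DELAY = 900  # hypothetical grace period (seconds) before cleanup

def changed_recently_buggy(state_time, now):
    # Missing "now -": a raw epoch timestamp is compared to a small
    # delay, so the answer never depends on how long ago the state
    # actually changed.
    return state_time < DELETE_DELAY

def changed_recently_fixed(state_time, now):
    # Compare the *age* of the state change against the delay.
    return now - state_time < DELETE_DELAY

now = time.time()
old = now - 3600   # state set an hour ago: safe to clean up
new = now - 10     # state set 10 seconds ago: leave alone for now

print(changed_recently_fixed(old, now))   # False
print(changed_recently_fixed(new, now))   # True
# The buggy form gives the same (useless) answer for both nodes:
print(changed_recently_buggy(old, now) == changed_recently_buggy(new, now))  # True
```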
lifeless | sorry for spam :) | 23:13 |
fungi | grrr... my flight tomorrow just got cancelled | 23:13 |
*** sarob has joined #openstack-infra | 23:13 | |
russellb | fungi: :( weather? | 23:16 |
openstackgerrit | lifeless proposed a change to openstack-infra/nodepool: Log how long nodes have been in DELETE state. https://review.openstack.org/67982 | 23:17 |
openstackgerrit | lifeless proposed a change to openstack-infra/nodepool: Consolidate duplicate logging messages. https://review.openstack.org/67983 | 23:17 |
*** dcramer_ has quit IRC | 23:20 | |
fungi | russellb: yeah, my layover was going to be in baltimore, which is now on lockdown for tomorrow (noaa/nws winter storm warning all day) | 23:21 |
russellb | bummer | 23:21 |
fungi | just rebooked through vegas instead, but i think the long leg will end up being without wifi as a result | 23:22 |
*** sarob has quit IRC | 23:22 | |
russellb | vegas is a good choice. much worse places to be stuck than there, just in case | 23:22 |
clarkb | fungi: :( its ok I will be back to full focus tomorrow | 23:23 |
fungi | yeah, i figured it's slightly less likely to get buried under ice and snow | 23:23 |
fungi | clarkb: yay! | 23:23 |
russellb | i'm out for today ... on the volume bug, only candidate we have is https://review.openstack.org/#/c/67973/ | 23:23 |
russellb | just going to watch that through some rechecks while we keep digging | 23:23 |
fungi | russellb: thanks for the heads up | 23:24 |
russellb | that's https://bugs.launchpad.net/nova/+bug/1270608 | 23:24 |
jgriffith | russellb: agreed | 23:24 |
russellb | jgriffith: mriedem thanks again | 23:25 |
mriedem | np, fun first day back :) | 23:25 |
sdague | clarkb: if you have a little focus now, the config change with the e-r uncategorized list would be handy to help us figure out what other unknown bugs are in the reset pile | 23:26 |
sdague | it was very good gamification for jog0 to try to drive up our classification rate | 23:27 |
*** eharney has quit IRC | 23:28 | |
*** jamielennox|away is now known as jamielennox | 23:29 | |
*** derekh has quit IRC | 23:30 | |
*** dcramer_ has joined #openstack-infra | 23:32 | |
*** gokrokve_ has quit IRC | 23:33 | |
*** gokrokve has joined #openstack-infra | 23:34 | |
lifeless | fungi: cron timing | 23:36 |
lifeless | fungi: in nodepool | 23:36 |
openstackgerrit | lifeless proposed a change to openstack-infra/nodepool: Make cleanupServer optionally nonblocking. https://review.openstack.org/67985 | 23:37 |
*** gokrokve has quit IRC | 23:38 | |
openstackgerrit | lifeless proposed a change to openstack-infra/config: Cleanup old servers every minute. https://review.openstack.org/67986 | 23:39 |
lifeless | fungi: clarkb: would love https://review.openstack.org/#/c/67685 to be reviewed please | 23:39 |
*** carl_baldwin has quit IRC | 23:39 | |
lifeless | jeblair: I've pushed a stack that will do what I propose to nodepool; I'm giving it a basic test now | 23:40 |
jog0 | wow 3 patches in openstack/openstack in 8 hours :/ | 23:49 |
*** rcleere has quit IRC | 23:53 | |
lifeless | yah, messed up | 23:53 |
lifeless | did you see jay's note that passlib isn't installed properly? | 23:53 |
*** rcleere has joined #openstack-infra | 23:53 | |
lifeless | > https://review.openstack.org/#/c/66670/ | 23:54 |
lifeless | That second patch has the gate-tempest-dsvm-neutron-isolated job failing | 23:54 |
lifeless | trying to run keystone-manage pki-setup: | 23:54 |
lifeless | ImportError: No module named passlib.hash | 23:54 |
*** reed has joined #openstack-infra | 23:55 | |
*** rcleere has quit IRC | 23:58 | |
lifeless | jog0: ^ | 23:59 |
jog0 | lifeless: I did | 23:59 |
Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!