dansmith | sorry if I missed it in the scrollback, but things are wedged right now, yes? | 00:00 |
---|---|---|
*** sarob has quit IRC | 00:01 | |
sdague | clarkb: awesome | 00:02 |
fungi | dansmith: not that anyone's said until now... | 00:04 |
sdague | dansmith: stable/grizzly is still a problem | 00:04 |
dansmith | fungi: the top thing in the check queue looks to have been there for five hours | 00:04 |
sdague | but master should be fine | 00:04 |
dansmith | my thing queued for master has been sitting in check for 2+ hours | 00:05 |
clarkb | dansmith: I think what is happening there is we have enough jobs in the gate queue that we are starving the check queue | 00:06 |
clarkb | dansmith: as gate queue jobs get dibs on slaves first | 00:06 |
dansmith | clarkb: really? 36 in the gate right? | 00:06 |
clarkb | dansmith: yes | 00:06 |
clarkb | dansmith: but the new NNFI causes a lot more thrashing. Less time in between for check to catch up | 00:06 |
dansmith | it's been much higher than that in the not too distant past | 00:06 |
dansmith | ah | 00:06 |
clarkb | tl;dr we need to fix flakiness | 00:07 |
dansmith | that's some pretty bad starvation.. 5h with no progress.. | 00:07 |
*** gyee has quit IRC | 00:07 | |
dansmith | okay | 00:07 |
jeblair | also, more cloud servers | 00:07 |
jeblair | but mostly flakiness | 00:07 |
*** sarob_ has quit IRC | 00:09 | |
sdague | jeblair: can we burst some more nodes? getting to rc1 is going to be tough if stuff is hanging in check that long | 00:10 |
*** ArxCruz has joined #openstack-infra | 00:10 | |
dansmith | cha | 00:10 |
*** sarob has joined #openstack-infra | 00:10 | |
sdague | also, we should probably drop large-ops from gate, non voting on the gate just burns time | 00:11 |
fungi | we'd need to get hp to raise our quotas, right? | 00:11 |
sdague | or put the rack nodes back in rotation | 00:11 |
sdague | slow on check wouldn't be that big a deal | 00:11 |
fungi | true | 00:11 |
*** dims has joined #openstack-infra | 00:11 | |
*** sarob has quit IRC | 00:12 | |
*** sarob has joined #openstack-infra | 00:13 | |
jeblair | sdague: that's what i've been working on. :) | 00:13 |
*** dcramer_ has joined #openstack-infra | 00:14 | |
jeblair | sdague, fungi: zuul is able to thrash nodes faster than nodepool can keep up, so i'm working on getting nodepool to be able to more or less instantly burst to capacity | 00:14 |
jeblair | we are, however, at the moment pretty close to capacity. | 00:14 |
jeblair | (we've worked up to it over a while) | 00:14 |
* fungi nods | 00:15 | |
*** reed_ has quit IRC | 00:17 | |
jeblair | node selection by pipeline is possible. we could reserve rackspace nodes for that purpose. we're going to run into unit test node starvation too, which is the next thing i'm going to work on. of course we can spin up more static nodes for now. | 00:17 |
openstackgerrit | Sean Dague proposed a change to openstack-infra/config: drop large-ops from gate (it's non voting) https://review.openstack.org/48545 | 00:17 |
jog0 | was logs.openstack.org down for a split second for sdague's new patch? | 00:17 |
clarkb | jog0: apache may have been restarted momentarily | 00:17 |
sdague | so that will help a little | 00:17 |
*** adalbas has quit IRC | 00:17 | |
jeblair | jog0: are you ready to make large-ops voting or should we consider https://review.openstack.org/48545 ? | 00:17 |
jog0 | clarkb: that explains what i saw thanks | 00:18 |
jog0 | jeblair: I am ready | 00:18 |
jeblair | jog0: then can you propose a change to do that | 00:19 |
sdague | so the neutron job looks like it has < 50% pass rate right now - https://jenkins01.openstack.org/job/gate-tempest-devstack-vm-neutron/ | 00:22 |
clarkb | it would be cool if gearman priority could be weighted so that as things aged in check they would get more priority and could flop positions with gate | 00:22 |
jeblair | sdague: that includes check jobs | 00:22 |
sdague | it does | 00:22 |
sdague | but I watched 2 neutron based resets in the last 4 minutes | 00:22 |
jeblair | i'm going to be busy with the nodepool bursting change, if someone else wants to take making rackspace nodes available for check jobs | 00:22 |
clarkb | jeblair: I can take a quick stab at it. There is a usergroup thing at 6 that I plan on going to though | 00:24 |
jeblair | sdague: that's complex. i'd rather throw more machines at the problem. | 00:24 |
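The age-weighted priority idea clarkb floats above can be sketched roughly as follows. This is only an illustration under stated assumptions, not how Zuul or Gearman behave: the `QueuedJob` type, the `BASE_PRIORITY` values and the `aging_per_minute` knob are all hypothetical names invented here. Each queued job starts from its pipeline's base priority and earns a credit for every minute it has waited, so a check job that has waited long enough can sort ahead of a fresh gate job.

```python
from dataclasses import dataclass

# Hypothetical base priorities; a lower number is served first.
BASE_PRIORITY = {"gate": 0.0, "check": 100.0}


@dataclass
class QueuedJob:
    name: str
    pipeline: str
    waited_minutes: float


def effective_priority(job, aging_per_minute=0.5):
    """Base priority minus an aging credit; lower values sort first."""
    return BASE_PRIORITY[job.pipeline] - aging_per_minute * job.waited_minutes


def dispatch_order(jobs, aging_per_minute=0.5):
    """Return the jobs in the order a node would be handed to them."""
    return sorted(jobs, key=lambda j: effective_priority(j, aging_per_minute))
```

With these made-up numbers a check job needs to wait a bit over three hours before it can swap places with a newly queued gate job; picking the aging rate is the hard part, which may be why the idea was called complex.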
clarkb | jeblair: how would we make it so those nodes are only used for check? new label and new jobs? | 00:24 |
*** UtahDave has quit IRC | 00:24 | |
clarkb | or use a zuul function? | 00:24 |
jeblair | clarkb: new label and zuul parameter function that sets the node to that label | 00:25 |
clarkb | got it. | 00:25 |
openstackgerrit | Joe Gordon proposed a change to openstack-infra/config: Make gate-tempest-devstack-vm-large-ops voting https://review.openstack.org/48547 | 00:25 |
jog0 | jeblair: done | 00:25 |
*** matsuhashi has joined #openstack-infra | 00:28 | |
*** colinmcnamara has joined #openstack-infra | 00:28 | |
*** MarkAtwood2 has quit IRC | 00:29 | |
*** colinmcnamara has quit IRC | 00:35 | |
*** rockyg has quit IRC | 00:38 | |
*** nosnos has joined #openstack-infra | 00:38 | |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Use rackspace for tempest check tests. https://review.openstack.org/48549 | 00:38 |
clarkb | jeblair: fungi mordred ^ I am really sure that is wrong as gearman node selection doesn't happen with NODE_LABEL iirc | 00:39 |
clarkb | and I need to head to the user group thing, but that should jumpstart the process, feel free to push better patchsets | 00:39 |
*** jhesketh has joined #openstack-infra | 00:39 | |
*** rnirmal has quit IRC | 00:39 | |
*** senk has joined #openstack-infra | 00:40 | |
*** kong has joined #openstack-infra | 00:41 | |
jhesketh | jeblair: What do you think about introducing conditional reporting into zuul. For example, since we'll be running our own zuul to report back to gerrit we don't want it to report on merge failures. In fact, we probably only need it to report in certain cases. For example, when our tests fail we always want to report FAILURE but we only need to report SUCCESS when there is a new migration introduced. | 00:41 |
*** weshay has quit IRC | 00:41 | |
*** CaptTofu_ has quit IRC | 00:43 | |
jog0 | clarkb sdague logstash is only 7 hours behind now! | 00:44 |
sdague | nice | 00:44 |
*** julim has joined #openstack-infra | 00:44 | |
jog0 | looks promising, hopefully it's not just related to people's workday | 00:44 |
sdague | yeh, we'll find out tomorrow | 00:45 |
jog0 | sdague: saw your new patch in action, will make gate on stacktrace easy | 00:45 |
clarkb | it's not. job queue fell by 100k in about an hour | 00:46 |
jog0 | \0/ | 00:47 |
clarkb | change definitely helped | 00:47 |
*** julim has quit IRC | 00:48 | |
*** senk has quit IRC | 00:49 | |
jog0 | that should have been Obama's catch phrase for his second term | 00:51 |
*** senk has joined #openstack-infra | 00:53 | |
*** senk has quit IRC | 00:53 | |
*** senk has joined #openstack-infra | 00:54 | |
mordred | sdague: I did not see your patch. tell me about it! | 00:59 |
mriedem | sdague: do you have any ideas about this quantumclient issue in the stable/grizzly gate? https://review.openstack.org/#/c/48299/ | 01:03 |
*** portante|afk is now known as portante | 01:03 | |
*** xchu has joined #openstack-infra | 01:04 | |
Alex_Gaynor | Hmm, so we probably have the ability to compute what %age of gate jobs are passing? | 01:04 |
jog0 | Alex_Gaynor: there is a way but I forget how; it uses graphite.openstack.org | 01:05 |
Alex_Gaynor | jog0: trying to analyze if my feeling that the fail rate has been crazy high for the last 1-2 days is accurate | 01:05 |
jog0 | http://graphite.openstack.org/graphlot/?width=586&height=308&_salt=1380244013.092&target=stats.zuul.pipeline.gate.job.gate-tempest-devstack-vm-full.FAILURE&target=stats.zuul.pipeline.gate.job.gate-tempest-devstack-vm-full.SUCCESS&target=stats.zuul.pipeline.gate.job.gate-tempest-devstack-vm-neutron.SUCCESS&target=stats.zuul.pipeline.gate.job.gate-tempest-devstack-vm-neutron.FAILURE&from=00%3A00_20130926&until=23%3A59_20130926 | 01:07 |
Alex_Gaynor | So going back two weeks leads me to believe that yes, failure rates are up | 01:09 |
jeblair | jog0: what's the attraction of graphlot? | 01:09 |
*** sodabrew has quit IRC | 01:09 | |
jeblair | as opposed to composer | 01:10 |
jeblair | i find composer easier to use for finding metrics, changing time windows, and applying functions... | 01:11 |
jog0 | Alex_Gaynor: http://graphite.openstack.org/graphlot/?width=586&from=00%3A00_20130919&_salt=1380244287.508&height=308&target=summarize(stats_counts.zuul.pipeline.gate.job.gate-tempest-devstack-vm-neutron.FAILURE%2C%2224h%22)&target=summarize(stats_counts.zuul.pipeline.gate.job.gate-tempest-devstack-vm-neutron.SUCCESS%2C%2224h%22)&until=23%3A59_20130926&lineMode=staircase | 01:11 |
Alex_Gaynor | jog0: cool, science confirms my intuition! | 01:12 |
*** jrgarciahp has quit IRC | 01:12 | |
jog0 | jeblair: that was the link that I found first | 01:12 |
jog0 | Alex_Gaynor: I can point to the bug too | 01:13 |
jeblair | jog0: please do; i'd like to see who is assigned | 01:13 |
*** senk has quit IRC | 01:13 | |
Alex_Gaynor | jog0: my impression was there was a handful of bugs causing this? | 01:14 |
jog0 | http://logstash.openstack.org/#eyJzZWFyY2giOiIgQG1lc3NhZ2U6XCJBc3NlcnRpb25FcnJvcjogU3RhdGUgY2hhbmdlIHRpbWVvdXQgZXhjZWVkZWQhXCIgQU5EIEBmaWVsZHMuYnVpbGRfc3RhdHVzOlwiRkFJTFVSRVwiIEFORCBAZmllbGRzLmZpbGVuYW1lOlwiY29uc29sZS5odG1sXCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6IjYwNDgwMCIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwic3RhbXAiOjEzODAyNDQ0MzM2NzZ9 | 01:14 |
jog0 | Alex_Gaynor: at least one bug | 01:15 |
jog0 | jeblair: no one because I noticed it today | 01:15 |
jog0 | I can't even find a stacktrace that caused it | 01:15 |
Alex_Gaynor | jog0: my impression was that it was the boto and the test_volume_boot_pattern ones? | 01:15 |
jeblair | jog0: thank you for that. | 01:15 |
jog0 | https://bugs.launchpad.net/tempest/+bug/1230407 | 01:16 |
uvirtbot | Launchpad bug 1230407 in neutron "State change timeout exceeded" [Undecided,Confirmed] | 01:16 |
jeblair | also, i'm becoming more and more keen on the idea that we should run the neutron test 10 times for every neutron change | 01:16 |
jog0 | jeblair: hahaha | 01:16 |
jog0 | by that I mean yes! | 01:16 |
*** thomasm has quit IRC | 01:18 | |
jog0 | http://logstash.openstack.org/#eyJzZWFyY2giOiJAbWVzc2FnZTpcIk5vdmFFeGNlcHRpb246IGlTQ1NJIGRldmljZSBub3QgZm91bmQgYXRcIiBBTkQgQGZpZWxkcy5idWlsZF9zdGF0dXM6XCJGQUlMVVJFXCIgQU5EIEBmaWVsZHMuZmlsZW5hbWU6XCJsb2dzL3NjcmVlbi1uLWNwdS50eHRcIiIsImZpZWxkcyI6W10sIm9mZnNldCI6MCwidGltZWZyYW1lIjoiNjA0ODAwIiwiZ3JhcGhtb2RlIjoiY291bnQiLCJ0aW1lIjp7InVzZXJfaW50ZXJ2YWwiOjB9LCJzdGFtcCI6MTM4MDI0NDY2ODQ5Nn0= | 01:18 |
jog0 | https://bugs.launchpad.net/tempest/+bug/1226337 | 01:18 |
jog0 | boot pattern | 01:18 |
uvirtbot | Launchpad bug 1226337 in tempest "tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern flake failure" [High,Triaged] | 01:18 |
jog0 | anyway you get the idea | 01:18 |
*** wenlock has quit IRC | 01:19 | |
jog0 | anyone want to send those links to the openstack-dev ML? | 01:19 |
jog0 | shaming people for destabilizing during stabilization | 01:19 |
jeblair | jog0: do you not want to? in the past, sdague has started a thread naming specific critical bugs for gate failures and it has helped to focus attention | 01:20 |
jog0 | I will go ahead and do it | 01:21 |
jog0 | should be fun | 01:21 |
jog0 | unless sdague wants to | 01:21 |
*** dkliban has joined #openstack-infra | 01:22 | |
*** mriedem has quit IRC | 01:23 | |
* jog0 starts drafting a fun email | 01:25 | |
*** kong has quit IRC | 01:28 | |
*** jerryz has quit IRC | 01:28 | |
*** jerryz has joined #openstack-infra | 01:29 | |
*** ojacques has quit IRC | 01:35 | |
*** melwitt has quit IRC | 01:37 | |
*** rfolco has quit IRC | 01:39 | |
*** CaptTofu has joined #openstack-infra | 01:40 | |
jog0 | sent | 01:42 |
jog0 | that should be fun | 01:42 |
*** CaptTofu_ has joined #openstack-infra | 01:45 | |
*** CaptTofu has quit IRC | 01:47 | |
jog0 | Alex_Gaynor: I can account for 200 failures in last 24 hours with just two bugs | 01:48 |
Alex_Gaynor | jog0: :/ | 01:48 |
*** ArxCruz has quit IRC | 01:49 | |
jog0 | out of 305 | 01:49 |
jog0 | or so | 01:49 |
clarkb | wow | 01:52 |
morganfainberg | jog0, that's crazy. | 01:54 |
lifeless | morganfainberg: pretty common | 01:59 |
lifeless | morganfainberg: you get a long tail effect | 01:59 |
morganfainberg | lifeless, aye, still. i know i've had my fair share of rechecks on the bootpattern one | 01:59 |
morganfainberg | lifeless, just didn't realize how _much_ it affected everything | 02:00 |
mordred | I think, as much as I don't like it in theory, that I'd like to skip those two tests in the normal runs | 02:01 |
mordred | but run an extra job for neutron with them on | 02:01 |
mordred | and loop them 10x | 02:01 |
lifeless | morganfainberg: when we first got similar stuff in place for Launchpad, we had something like 80% explained by the first 4 bugs. | 02:01 |
mordred | because those numbers above are crazy | 02:01 |
lifeless | morganfainberg: and then 80% of the remainder from 4 more bugs, and so on. | 02:01 |
morganfainberg | lifeless, lol | 02:01 |
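The long-tail arithmetic in this exchange can be worked through in a few lines. This is a back-of-the-envelope sketch only: the 80% share per round is lifeless's recollection from Launchpad, the 200 and 305 are jog0's rough counts from above, and the function name is made up for illustration.

```python
def cumulative_explained(rounds, share_per_round=0.8):
    """Fraction of failures explained after each round of top bugs,
    assuming every round explains the same share of what remains."""
    remaining = 1.0
    explained = []
    for _ in range(rounds):
        remaining *= 1.0 - share_per_round
        explained.append(1.0 - remaining)
    return explained


# jog0's figures from the chat: two bugs account for about 200 of 305 failures.
top_two_share = 200 / 305
```

Under that assumption, two rounds of four bugs each would cover roughly 96% of failures, while the two bugs jog0 counted cover roughly two thirds of the last 24 hours.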
mordred | jog0, sdague: it's a little bitchy, but what do you think? | 02:02 |
*** ericw has quit IRC | 02:02 | |
*** dkliban has quit IRC | 02:02 | |
dims | jog0, which two tests specifically? | 02:03 |
mordred | dims: jog0 just sent a mail to the -dev list with the deets | 02:03 |
*** ericw has joined #openstack-infra | 02:05 | |
*** yaguang has joined #openstack-infra | 02:05 | |
dims | mordred, thx | 02:06 |
lifeless | mordred: I think it's a decent accommodation *if* the problem is test-side, not service side. | 02:06 |
lifeless | mordred: if neutron is actually buggy - and I've seen stuff with tripleo these last few days that makes me think it's service side. | 02:07 |
lifeless | mordred: then the gate is doing its job and we need to fix the damn things before release. | 02:07 |
*** dkliban has joined #openstack-infra | 02:07 | |
mordred | lifeless: yes. I completely agree that we should fix the damn things before the release. I agree that the gate is doing its job | 02:07 |
*** senk has joined #openstack-infra | 02:08 | |
mordred | lifeless: I think I'm more brainstorming on how we can better place the onus to fix near where it could be fixed | 02:08 |
lifeless | mordred: Ah, so that's interesting. | 02:08 |
lifeless | mordred: From one sense, having it widespread gets more folk onboard faster. | 02:08 |
mordred | yah. that's the original theory | 02:09 |
lifeless | mordred: in fact, stopping other things changing while we fix brain damage helps prevent slippage: this is exactly the concern you and jeblair have about 'turn off bare metal gating if it breaks'. | 02:09 |
mordred | yes | 02:09 |
lifeless | mordred: OTOH if slippage is a low risk, you are basically breaking everyone else's brains until the thing is fixed. | 02:09 |
mordred | yeah. especially since the thing that is breaking is flaky, so the gate breakage isn't preventing slippage in this case | 02:10 |
mordred | which is where the "take flaky tests and run a job which runs them 10x" idea comes in | 02:10 |
mordred | if we can cause them to be _more_ breaking - but in a targeted manner | 02:11 |
lifeless | maybe we should just run everything N* where N gets us some confidence interval of 'very reliable' | 02:11 |
lifeless | e.g. 10* -> 90% reliable. | 02:12 |
mordred | yah. I could see that as a general strategy once we get past these | 02:12 |
lifeless | run 10 tempest jobs in parallel for every gate. | 02:12 |
mordred | yup | 02:12 |
mordred | the overall machine cost might still be lower than all the gate resets | 02:12 |
mordred | if it helps us not let flaky things in | 02:12 |
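One way to read the "run it N times" idea is as a detection probability. The sketch below assumes each run of a flaky test fails independently with the same probability `p_fail`, which real gate jobs may well not satisfy; the function names are illustrative and not part of any tool mentioned in the channel.

```python
import math


def detection_probability(p_fail, runs):
    """Chance that at least one of `runs` independent runs fails."""
    return 1.0 - (1.0 - p_fail) ** runs


def runs_needed(p_fail, confidence):
    """Smallest number of runs that catches the flake with at least
    the requested confidence."""
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))
```

Under these assumptions, ten runs catch a change that fails 20% of the time about 89% of the time, so the "10x gives roughly 90%" intuition holds for flakes of about that severity. Rarer flakes need many more runs, which is where the machine cost mordred mentions comes in.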
lifeless | jog0: do we have an identified bad commit ? | 02:12 |
lifeless | jog0: like 'never before X' ? | 02:13 |
lifeless | can we revert the thing? | 02:13 |
*** reed_ has joined #openstack-infra | 02:15 | |
*** senk has quit IRC | 02:18 | |
jeblair | lifeless: according to http://logstash.openstack.org/#eyJzZWFyY2giOiIgQG1lc3NhZ2U6XCJBc3NlcnRpb25FcnJvcjogU3RhdGUgY2hhbmdlIHRpbWVvdXQgZXhjZWVkZWQhXCIgQU5EIEBmaWVsZHMuYnVpbGRfc3RhdHVzOlwiRkFJTFVSRVwiIEFORCBAZmllbGRzLmZpbGVuYW1lOlwiY29uc29sZS5odG1sXCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjEwMCwidGltZWZyYW1lIjoiNjA0ODAwIiwiZ3JhcGhtb2RlIjoiY291bnQiLCJ0aW1lIjp7InVzZXJfaW50ZXJ2YWwiOjB9LCJzdGFtcCI6MTM4MDI0NDQzMzY3Nn0= | 02:18 |
jeblair | lifeless: never before 2013-09-20T23:37:40.000 but the major problem started at 2013-09-25T02:09:44.000 | 02:19 |
jeblair | lifeless: you'll see what i mean if you look at the graph | 02:20 |
lifeless | yeah | 02:20 |
lifeless | so a commit before 2013-09-25T02:09:44.000 | 02:20 |
lifeless | and not far before | 02:20 |
*** CaptTofu_ has quit IRC | 02:24 | |
*** dguitarbite has joined #openstack-infra | 02:25 | |
*** CaptTofu has joined #openstack-infra | 02:26 | |
lifeless | does openstack have a secure document store | 02:27 |
lifeless | where e.g. I can store a bunch of passwords and give them out to selected tripleo folk ? | 02:27 |
lifeless | for context, I want to make getting access to the machines that will host the proposed baremetal test cluster something we can document and delegate. | 02:28 |
lifeless | one test I'm considering is 'tripleo ptl + delegates' | 02:28 |
jeblair | lifeless: no; anteaya is looking into owncloud for the board of directors; we've considered expanding its use if it works out for that. | 02:33 |
lifeless | ok, I'll do something icky for now, but please consider us interested. | 02:34 |
jeblair | lifeless: related: there are plans forming for a keysigning event at the summit | 02:34 |
lifeless | yeah, I need to do a key migration thing | 02:34 |
anteaya | would we have an owncloud separate from the one the board of directors is using? | 02:34 |
lifeless | my gpg key is long in the tooth | 02:34 |
anteaya | or everyone on one owncloud? | 02:34 |
jerryz | Hi everyone, got a version conflict error from oslo.config on my own devstack while starting nova-api, http://paste.openstack.org/show/47585/ need help , thanks | 02:34 |
mordred | anteaya: unsure. I think we'll have to learn a little more about group permissions, management and users in owncloud | 02:34 |
anteaya | very good | 02:35 |
jeblair | yeah, and no need to get ahead; we can do baby steps. | 02:35 |
anteaya | owncloud is up after puppet-dashboard starts processing reports | 02:35 |
anteaya | up meaning next in line for my attention | 02:35 |
mordred | jerryz: awesome! that's just great | 02:35 |
anteaya | jeblair: k | 02:35 |
jeblair | lifeless: yeah, about a year ago i finally decided that having a 1024 bit key from 1996 was a liability, not a badge of honor. :) | 02:36 |
mordred | how did we manage to land that change? | 02:36 |
jeblair | jerryz: can you link to the change? | 02:37 |
mordred | oh! wait | 02:37 |
jerryz | no change here. Just sync the upstream and trigger a tempest test on my own devstack | 02:37 |
mordred | jerryz: you may need to do something | 02:38 |
mordred | jerryz: cd /opt/stack/new/oslo.config | 02:38 |
mordred | rm -rf *.egg-info | 02:39 |
mordred | git pull --ff-only | 02:39 |
mordred | sudo pip install -e . | 02:39 |
mordred | jeblair: you know the one gotcha in the way we're calculating versions? that a setup.py develop'd install is not going to ever pick up a new version? | 02:39 |
mordred | jeblair: I believe that may be what has happened here | 02:40 |
mordred | sdague, dtroyer ^^ we may want to put something in to restack to clean out egg-info files | 02:40 |
mordred | so that git updates will re-gen versions properly across tag boundaries (where it might be important) | 02:41 |
mordred | clarkb: if you get bored: https://review.openstack.org/#/c/41945/ I think is FINALLY actually ready | 02:48 |
*** anteaya has quit IRC | 02:50 | |
jerryz | mordred: thanks. but why the d-g test on o.o does not have this issue? what is the circumstance for it to happen? | 02:59 |
*** dims has quit IRC | 02:59 | |
mordred | jerryz: d-g test starts with a completely clean vm each time | 02:59 |
mordred | your vm had some unaccounted for state from previous versions of your git repo | 02:59 |
mordred | jerryz: there is something that could be added to devstack to deal with this, and I'll add that to my todo list | 03:00 |
mordred | but you're lucky enough to have hit a strange corner case | 03:00 |
jerryz | mine is managed by nodepool, i believe it will clean up used ones | 03:00 |
mordred | oh! well that's a whole other thing | 03:03 |
jerryz | mordred: any more info needed to debug this? | 03:07 |
*** dkliban has quit IRC | 03:07 | |
mordred | jerryz: honestly, I'm kinda stumped as to how that could happen if that is a completely fresh node | 03:08 |
mordred | jerryz: and it's 11pm here, so I'm probably not going to dig in too much right now | 03:08 |
mordred | jerryz: I'll try to figure out what's going on when I wake up | 03:08 |
jerryz | mordred: ok. thanks. night | 03:09 |
*** sarob has quit IRC | 03:11 | |
*** sarob has joined #openstack-infra | 03:12 | |
*** matsuhashi has quit IRC | 03:15 | |
*** sarob has quit IRC | 03:16 | |
*** dkranz has joined #openstack-infra | 03:29 | |
*** marun has quit IRC | 03:37 | |
*** marun has joined #openstack-infra | 03:38 | |
*** nati_ueno has quit IRC | 03:38 | |
*** dguitarbite has quit IRC | 03:42 | |
*** ryanpetrello has joined #openstack-infra | 03:42 | |
clarkb | http://justin.abrah.ms/misc/state_of_githubs_code_review.html | 03:45 |
pleia2 | hey, look at that, they link to our gerrit :) | 03:47 |
clarkb | yup :) | 03:48 |
*** Ryan_Lane has quit IRC | 03:48 | |
clarkb | those of you that are twittery should twitter the benefits of gerrit | 03:49 |
Alex_Gaynor | grumble, the rate of gate resets is resulting in starving the check pipeline | 03:49 |
clarkb | Alex_Gaynor: yup | 03:49 |
clarkb | Alex_Gaynor: https://review.openstack.org/#/c/48549/ should help | 03:50 |
*** marun has quit IRC | 03:50 | |
clarkb | I won't get to fixing it tonight, anyone else is welcome to | 03:50 |
clarkb | (basically run tests in check on the other cloud) | 03:50 |
Alex_Gaynor | clarkb: redundant array of independent clouds! | 03:51 |
*** marun has joined #openstack-infra | 03:51 | |
hub_cap | mordred: promise im making progress on the new cli tool. ive got maybe ~2 days of work to go | 03:52 |
*** marun has quit IRC | 03:56 | |
*** marun has joined #openstack-infra | 03:56 | |
*** matsuhashi has joined #openstack-infra | 03:56 | |
*** basha has joined #openstack-infra | 04:06 | |
lifeless | clarkb: hey, how do you get uber receipts into HP's system ? | 04:13 |
clarkb | lifeless: I have never had to do it for HP... I use it in seattle for personal things | 04:13 |
pleia2 | lifeless: I save the email receipt as pdf | 04:13 |
*** jerryz has quit IRC | 04:14 | |
lifeless | pleia2: ah yeah, print-to-pdf | 04:15 |
pleia2 | yeah | 04:16 |
*** AlexF has joined #openstack-infra | 04:16 | |
*** CaptTofu has quit IRC | 04:16 | |
*** CaptTofu has joined #openstack-infra | 04:17 | |
*** AlexF has quit IRC | 04:21 | |
*** SergeyLukjanov has joined #openstack-infra | 04:31 | |
*** AlexF has joined #openstack-infra | 04:31 | |
*** basha has quit IRC | 04:32 | |
*** reed_ has quit IRC | 04:37 | |
*** basha has joined #openstack-infra | 04:38 | |
*** sarob has joined #openstack-infra | 04:38 | |
*** AlexF has quit IRC | 04:42 | |
*** AlexF has joined #openstack-infra | 04:43 | |
*** sarob has quit IRC | 04:44 | |
*** ericw has quit IRC | 04:45 | |
*** jerryz has joined #openstack-infra | 04:46 | |
*** ericw has joined #openstack-infra | 04:48 | |
*** odyssey4me has joined #openstack-infra | 04:50 | |
*** basha has quit IRC | 04:52 | |
*** boris-42 has joined #openstack-infra | 04:53 | |
*** odyssey4me has quit IRC | 04:54 | |
*** odyssey4me has joined #openstack-infra | 04:55 | |
mordred | clarkb: nice! | 04:55 |
*** DennyZhang has joined #openstack-infra | 04:56 | |
*** sarob has joined #openstack-infra | 04:57 | |
Alex_Gaynor | watching the gate today has been so sad | 04:59 |
Alex_Gaynor | Head of the gate was approved 10.5 hours ago :( | 04:59 |
mordred | Alex_Gaynor: yeah. it's been a bad couple of days for that | 05:02 |
Alex_Gaynor | mordred: sadly I can't think of any sane approach to improving it besides "fix the bugs in tempest / <projects>" | 05:02 |
mordred | Alex_Gaynor: yeah. well, did you see my terrible idea earlier (or combo of ideas) | 05:02 |
Alex_Gaynor | mordred: No, I missed it | 05:02 |
mordred | Alex_Gaynor: disable the two bad tests in the normal runs, make a run that does run those tests - and on every neutron change, run 10 copies of that | 05:03 |
mordred | that way, most of the gate is fine, but neutron has to fix the bugs before anything else will land for them | 05:03 |
Alex_Gaynor | mordred: I... I kind of love it (assuming we're sure neutron is at fault) | 05:03 |
*** sarob has quit IRC | 05:04 | |
mordred | the bad ones only happen when neutron is enabled | 05:04 |
Alex_Gaynor | mordred: probably the neutron core reviewers should also stop approving other patches | 05:04 |
mordred | then - once we've cleaned up the top reset offenders | 05:04 |
mordred | add a fanout run to every change which runs 5 copies of the neutron tests for everybody | 05:05 |
mordred | it would explode node usage a bit, but I'm _guessing_ not as bad as all the resets | 05:05 |
Alex_Gaynor | Possibly we need to think of a more general approach to dealing with non-determinism in tests. | 05:06 |
mordred | only systemic way I can think of is running tests multiple times | 05:06 |
mordred | to try to increase the odds of tripping non-deterministic things on their way in | 05:06 |
Alex_Gaynor | The other issue is that non-determinism sometimes doesn't look like it's caused by a patch, even if it is, so people just recheck until it manages to land, even though it's exacerbating a problem | 05:07 |
Alex_Gaynor | I don't know how to address that. | 05:07 |
mordred | well, recheck itself is a bandaid | 05:07 |
clarkb | ya that's a big problem I think | 05:07 |
clarkb | push until it goes in just adds more badness | 05:08 |
mordred | that's there to deal with non-deterministic tests | 05:08 |
clarkb | right but it feeds it too | 05:08 |
mordred | yup | 05:08 |
Alex_Gaynor | maybe the system should handle reverifies with exponential backoff, to prevent a patch that really almost never passes from landing. or something. | 05:09 |
mordred | if we could figure out a better way to block flaky tests (such as parallel copies, or something better) | 05:09 |
mordred | then we could make recheck/reverify go away | 05:09 |
Alex_Gaynor | right, making them go away would be ideal | 05:09 |
mordred | and save that feature for only things that infra triggers, such as "the internet exploded" | 05:09 |
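The exponential backoff Alex_Gaynor suggests for reverifies could look something like this. It is purely a sketch of the idea: nothing like it is described in the channel as existing, and the base delay, growth factor and cap are invented defaults.

```python
def reverify_delay_minutes(attempt, base_minutes=5.0, factor=2.0,
                           cap_minutes=240.0):
    """Delay before the Nth reverify of the same patch is honoured.

    attempt is 1 for the first reverify. The delay grows geometrically
    and is capped so a patch is never locked out indefinitely.
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    return min(cap_minutes, base_minutes * factor ** (attempt - 1))
```

With these defaults the first three reverifies wait 5, 10 and 20 minutes, and from the seventh attempt onward the wait stays at the four-hour cap. A patch that almost never passes would spend most of its time waiting instead of triggering gate resets.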
Alex_Gaynor | I wonder if the number of nodes we're spawning and shutting down produce a noticeable blip for people at RS/HP observing. Probably not I guess | 05:11 |
*** afazekas_zz has quit IRC | 05:20 | |
*** AlexF has quit IRC | 05:21 | |
* mordred likes to think that both clouds have dedicated ops teams who just watch our activity and marvel | 05:22 | |
jerryz | mordred: could you tell me how package version number is calculated? i got variations of version numbers for oslo.config when doing pip install -e . locally | 05:22 |
mordred | jerryz: yes, it's very similar to how git describe works | 05:23 |
mordred | if the current commit is tagged, then that is the version | 05:23 |
*** nicedice has quit IRC | 05:23 | |
mordred | if the current commit is not tagged, then the version is $next_version.a$number_of_commits_since_last_tag.g$git_short_sha | 05:23 |
mordred | where next_version is the version in setup.cfg | 05:24 |
mordred | this is how the version is calculated for the server repos and for the oslo code | 05:24 |
mordred | for library code, it's different (and slightly easier) | 05:24 |
mordred | jerryz: so _currently_ oslo.config master should be showing you: | 05:25 |
mordred | mordred@camelot:~/src/openstack/oslo.config$ python setup.py --version | 05:25 |
mordred | 1.2.1 | 05:25 |
mordred | if you're not seeing that, then my guess would be perhaps you're not fetching tags? | 05:26 |
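The rule mordred describes above can be written down as a tiny function. This is a sketch of the rule as stated in the chat, not the real packaging code: the function name and arguments are hypothetical, and the inputs would in practice come from git and setup.cfg.

```python
def describe_version(next_version, commits_since_tag, short_sha, tag=None):
    """Version string following the rule from the chat.

    If the current commit is tagged, the tag is the version. Otherwise
    the version is built from the next version declared in setup.cfg,
    the number of commits since the last tag, and the short git sha.
    """
    if tag is not None:
        return tag
    return "%s.a%d.g%s" % (next_version, commits_since_tag, short_sha)
```

For a tagged commit this yields something like `1.2.1`; for an untagged one, something of the form `1.2.0.a7.gabc1234`, where the commit count and sha here are invented values. A clone that is missing tags would fall into the second branch, which would be consistent with seeing `1.2.0.` followed by a suffix instead of `1.2.1`.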
jerryz | if my oslo.config code base is synced from upstream , which is review.o.o or github, the tag 1.2.1 should be already in the code | 05:27 |
jerryz | why i still get 1.2.0.**** if i install from a git clone from my private repo that is synced with upstream | 05:27 |
*** cthulhup has joined #openstack-infra | 05:28 | |
*** SergeyLukjanov has quit IRC | 05:29 | |
*** cthulhup has quit IRC | 05:29 | |
mordred | the only other thing is - if the repo was used before, the version calculation is cached in the egg-info dir | 05:31 |
mordred | when you say "if i install from a git clone from my private repo that is synced with upstream" - how are you syncing your private repo? | 05:31 |
mordred | jerryz: actually, funny story - look at the most recent commit to oslo.config | 05:33 |
mordred | and the commit message | 05:33 |
mordred | it seems this was a problem for us back on Sunday | 05:33 |
*** ericw has quit IRC | 05:33 | |
*** odyssey4me has quit IRC | 05:36 | |
jerryz | mordred: it seems that when syncing the upstream to private repo, i didn't push tags | 05:37 |
mordred | phew. well, that at least explains it! | 05:37 |
*** afazekas has joined #openstack-infra | 05:41 | |
*** SergeyLukjanov has joined #openstack-infra | 05:42 | |
*** SergeyLukjanov has quit IRC | 05:44 | |
*** ryanpetrello has quit IRC | 05:44 | |
*** ryanpetrello has joined #openstack-infra | 05:45 | |
*** Ryan_Lane has joined #openstack-infra | 05:46 | |
*** Ryan_Lane has quit IRC | 05:46 | |
*** nati_ueno has joined #openstack-infra | 05:56 | |
*** DennyZhang has quit IRC | 06:03 | |
*** marun has quit IRC | 06:06 | |
*** davidhadas_ has quit IRC | 06:06 | |
*** amotoki has joined #openstack-infra | 06:15 | |
*** yolanda has joined #openstack-infra | 06:15 | |
*** afazekas_ has joined #openstack-infra | 06:16 | |
*** afazekas_ has quit IRC | 06:17 | |
*** jhesketh has quit IRC | 06:20 | |
*** jhesketh__ has quit IRC | 06:20 | |
*** jhesketh_ has joined #openstack-infra | 06:20 | |
*** yongli_away is now known as yongli | 06:26 | |
*** slong has quit IRC | 06:29 | |
*** jhesketh has joined #openstack-infra | 06:34 | |
*** shardy_afk is now known as shardy | 06:38 | |
*** odyssey4me has joined #openstack-infra | 06:55 | |
*** Ryan_Lane has joined #openstack-infra | 06:57 | |
*** Ryan_Lane has quit IRC | 07:01 | |
openstackgerrit | Rongze Zhu proposed a change to openstack-infra/gitdm: Add two employees to UnitedStack https://review.openstack.org/48597 | 07:11 |
*** hashar has joined #openstack-infra | 07:20 | |
*** Ryan_Lane has joined #openstack-infra | 07:21 | |
ttx | fungi: (to solve exclusionary reqs) if you except pep8 those seem to come from ceilometer and swift, but those two projects weren't in the gate in stable/folsom times, so i'm not sure why we would consider them ? | 07:22 |
*** hashar_ has joined #openstack-infra | 07:25 | |
*** hashar has quit IRC | 07:25 | |
*** hashar_ is now known as hashar | 07:25 | |
*** fbo_away is now known as fbo | 07:25 | |
*** hashar has quit IRC | 07:25 | |
*** hashar has joined #openstack-infra | 07:26 | |
*** Ryan_Lane has quit IRC | 07:29 | |
*** flaper87|afk is now known as flaper87 | 07:32 | |
*** mrda has quit IRC | 07:42 | |
*** tvb|afk has joined #openstack-infra | 07:43 | |
*** jcoufal has joined #openstack-infra | 07:45 | |
*** yassine has joined #openstack-infra | 07:47 | |
*** basha has joined #openstack-infra | 07:47 | |
*** mrda has joined #openstack-infra | 07:49 | |
*** basha has quit IRC | 07:49 | |
*** jcoufal has quit IRC | 07:49 | |
*** boris-42 has quit IRC | 07:50 | |
*** SergeyLukjanov has joined #openstack-infra | 07:53 | |
*** Ryan_Lane has joined #openstack-infra | 07:56 | |
*** Ryan_Lane has quit IRC | 08:01 | |
*** mrda has quit IRC | 08:07 | |
*** SergeyLukjanov has quit IRC | 08:09 | |
*** dizquierdo has joined #openstack-infra | 08:10 | |
*** jcoufal has joined #openstack-infra | 08:13 | |
*** SergeyLukjanov has joined #openstack-infra | 08:13 | |
*** thomasbiege1 has joined #openstack-infra | 08:16 | |
*** thomasbiege1 has quit IRC | 08:19 | |
*** DinaBelova has joined #openstack-infra | 08:22 | |
*** Ryan_Lane has joined #openstack-infra | 08:27 | |
*** nati_ueno has quit IRC | 08:28 | |
*** Ryan_Lane has quit IRC | 08:31 | |
*** johnthetubaguy has joined #openstack-infra | 08:31 | |
*** mancdaz has quit IRC | 08:33 | |
*** dizquierdo has quit IRC | 08:33 | |
*** derekh has joined #openstack-infra | 08:34 | |
*** mancdaz has joined #openstack-infra | 08:35 | |
*** jerryz has quit IRC | 08:41 | |
*** DinaBelova has quit IRC | 08:43 | |
*** tvb|afk has quit IRC | 08:44 | |
*** tvb|afk has joined #openstack-infra | 08:44 | |
*** tvb|afk is now known as tvb | 08:44 | |
*** locke105 has quit IRC | 08:49 | |
*** locke105 has joined #openstack-infra | 08:50 | |
openstackgerrit | Pavel Sedlák proposed a change to openstack-infra/jenkins-job-builder: KeepLongStdio argument for JUnit publisher https://review.openstack.org/48431 | 08:51 |
*** samalba has quit IRC | 08:52 | |
*** samalba has joined #openstack-infra | 08:53 | |
*** jcoufal has quit IRC | 08:55 | |
*** Ryan_Lane has joined #openstack-infra | 08:57 | |
*** Ryan_Lane has quit IRC | 09:02 | |
*** boris-42 has joined #openstack-infra | 09:05 | |
*** tvb is now known as Tristan_ | 09:10 | |
*** Tristan_ is now known as Guest77656 | 09:11 | |
*** Guest77656 is now known as tvb | 09:11 | |
*** dizquierdo has joined #openstack-infra | 09:15 | |
*** Ryan_Lane has joined #openstack-infra | 09:27 | |
*** Ryan_Lane has quit IRC | 09:32 | |
openstackgerrit | Jaroslav Henner proposed a change to openstack-infra/jenkins-job-builder: Add dynamic string and choice params. https://review.openstack.org/48506 | 09:40 |
*** Ryan_Lane has joined #openstack-infra | 09:58 | |
*** Ryan_Lane has quit IRC | 10:02 | |
*** hashar has quit IRC | 10:04 | |
*** hashar has joined #openstack-infra | 10:10 | |
*** hashar has quit IRC | 10:14 | |
*** AlexF has joined #openstack-infra | 10:16 | |
*** kmartin has quit IRC | 10:17 | |
*** fifieldt has quit IRC | 10:28 | |
*** tvb has quit IRC | 10:28 | |
*** Ryan_Lane has joined #openstack-infra | 10:29 | |
*** DinaBelova has joined #openstack-infra | 10:30 | |
*** dkehn_ has joined #openstack-infra | 10:31 | |
*** dkehn has quit IRC | 10:31 | |
*** hashar has joined #openstack-infra | 10:31 | |
*** Ryan_Lane has quit IRC | 10:33 | |
*** DinaBelova has quit IRC | 10:33 | |
*** hashar has quit IRC | 10:36 | |
*** thomasbiege1 has joined #openstack-infra | 10:40 | |
*** matsuhashi has quit IRC | 10:52 | |
*** yaguang has quit IRC | 10:56 | |
*** tvb has joined #openstack-infra | 10:59 | |
*** tvb has quit IRC | 10:59 | |
*** tvb has joined #openstack-infra | 10:59 | |
*** Ryan_Lane has joined #openstack-infra | 10:59 | |
*** Ryan_Lane has quit IRC | 11:03 | |
*** tvb has quit IRC | 11:07 | |
*** thomasbiege1 has quit IRC | 11:09 | |
*** johnthetubaguy has quit IRC | 11:10 | |
*** AlexF has quit IRC | 11:10 | |
openstackgerrit | Jaroslav Henner proposed a change to openstack-infra/jenkins-job-builder: Add dynamic string and choice params. https://review.openstack.org/48506 | 11:14 |
BobBall | mordred: when you're around could you let me know? I want to understand what sort of stats you think would be useful to show that smokestack's -1's are stable to feed into the discussion of whether they can be upgraded to -2? | 11:14 |
*** AlexF has joined #openstack-infra | 11:14 | |
*** tvb has joined #openstack-infra | 11:20 | |
*** tvb has quit IRC | 11:20 | |
*** tvb has joined #openstack-infra | 11:20 | |
*** thomasbiege1 has joined #openstack-infra | 11:24 | |
*** thomasbiege3 has joined #openstack-infra | 11:27 | |
*** thomasbiege1 has quit IRC | 11:27 | |
*** tvb has quit IRC | 11:28 | |
*** Ryan_Lane has joined #openstack-infra | 11:30 | |
*** tvb has joined #openstack-infra | 11:30 | |
sdague | BobBall: if it's not run by CI team, it really can't be -2 | 11:30 |
*** giulivo has joined #openstack-infra | 11:31 | |
sdague | we can't have an external entity have the ability to have an infrastructure fail then break the gate for everyone, we've got enough challenges with infrastructure we control doing that | 11:31 |
*** shardy is now known as shardy_afk | 11:32 | |
BobBall | I'm referring to the discussion which finished with http://lists.openstack.org/pipermail/openstack-infra/2013-August/000196.html - of course, the infra team needs the ultimate authority and the revocation of -2 privs easily solves that | 11:32 |
BobBall | just like the "ultimate" sanction of moving a job from voting to non-voting | 11:33 |
BobBall | doesn't really need any work from the infra team to fix it, but ensures that the team responsible for the job/etc will fix it before being considered for the privilege again | 11:33 |
*** thomasbiege3 has quit IRC | 11:33 | |
sdague | ok, sorry, different thread I was thinking about | 11:34 |
*** Ryan_Lane has quit IRC | 11:34 | |
BobBall | I think it's the same thread - but my starting suggestion was unworkable and I completely understand why that was now! | 11:34 |
sdague | so I think the stat mordred actually wants there is how often is someone ignoring a -1 from smokestack | 11:35 |
BobBall | Basically what I think would be useful is for SS to run in parallel to the gate and post a -2 vote if it completes its testing and finds a failure in the tests (we've specifically only included test-failures in voting - so if a packaging failure occurs, it doesn't post) | 11:36 |
BobBall | if the gate finishes first, then tough, SS doesn't get a chance to say whether it thinks a patch works or not | 11:36 |
BobBall | *nod* - I've got those stats | 11:36 |
BobBall | but I want to get more details because I think there are other useful things | 11:36 |
sdague | https://review.openstack.org/#/q/status:merged+Verified-1+project:openstack/nova,n,z | 11:37 |
BobBall | ohhh useful query | 11:37 |
BobBall | I was doing it through SSH | 11:37 |
*** thomasbiege has joined #openstack-infra | 11:37 | |
sdague | so it's only happened twice this year on nova | 11:37 |
BobBall | I'll have to look into those two | 11:38 |
BobBall | but they were way before the automatic posting / packaging fixes that changed the SS workflow | 11:38 |
sdague | the first one, smokestack was broken (January) | 11:38 |
sdague | BobBall: sure | 11:38 |
sdague | but that's even more indication that there is no need for SS to have -2 | 11:38 |
BobBall | but I'm also interested in the stats about how regularly SS had posted before jenkins returned | 11:38 |
openstackgerrit | Darragh Bailey proposed a change to openstack-infra/jenkins-job-builder: Add repo scm https://review.openstack.org/45165 | 11:39 |
sdague | right, but it will still post a -1 even if we went to merge | 11:39 |
BobBall | think so, yes | 11:39 |
sdague | unless you did something very magical | 11:40 |
BobBall | heh :) | 11:40 |
*** CaptTofu has quit IRC | 11:40 | |
sdague | we get jenkins check results after we're in the gate | 11:40 |
sdague | sometimes | 11:40 |
*** CaptTofu has joined #openstack-infra | 11:40 | |
sdague | https://review.openstack.org/#/c/42361/ is the only override in the last 6 months | 11:40 |
BobBall | What do you mean by override? | 11:41 |
sdague | the only time we merged a change that SmokeStack had a -1 on | 11:41 |
BobBall | oh, yes | 11:41 |
sdague | so I think you are trying to solve a problem that doesn't exist :) | 11:42 |
openstackgerrit | Darragh Bailey proposed a change to openstack-infra/jenkins-job-builder: Add repo scm https://review.openstack.org/45165 | 11:42 |
BobBall | depends on what the problem really is :) | 11:42 |
sdague | ok, what do you think the problem is? maybe I don't understand | 11:42 |
BobBall | From my perspective we've got a system that can be used to gate changes to prevent breakages to the XenAPI driver | 11:43 |
BobBall | that's the criteria for being a "Group A" hypervisor | 11:43 |
sdague | but it's already doing it | 11:43 |
sdague | we only had 1 override in the last 6 months | 11:44 |
BobBall | and while I'm working hard on getting XenServer tested in the gate properly, there have been lots of hiccups along the way | 11:44 |
BobBall | nah, it's already "functional testing provided by an external system that does not gate commits" | 11:44 |
*** matsuhashi has joined #openstack-infra | 11:44 | |
sdague | so this is really just about moving from B -> A state? | 11:44 |
*** thomasbiege has quit IRC | 11:44 | |
sdague | not actually about keeping the breaks out of the tree? | 11:45 |
BobBall | Group A is about a system that ensures the breaks are kept out - rather than relying on the reviewers | 11:45 |
sdague | from a code perspective, the problem is already solved | 11:45 |
BobBall | agreed | 11:45 |
sdague | we rely on reviewers for all sorts of things, especially as we don't have 400% test coverage | 11:46 |
sdague | and the reviewers aren't failing us here | 11:46 |
sdague | 1 override in 6 months is not a real failure rate | 11:47 |
sdague | so you are trying to fix a problem that doesn't exist | 11:47 |
sdague | if the override rate was twice a day, I'd agree with you | 11:47 |
BobBall | So is your view that group A and B should really be considered the same thing because an automated process and manual process-that-works are as good as each other | 11:47 |
sdague | they are different, because group B isn't being run by the project. So if entity X that is running external CI stops, the project can do nothing about it. | 11:48 |
BobBall | So you think that A needs to be integrated with the gate and B is external irrespective of whether it "gates" or not | 11:49 |
sdague | realize "has -2" requires that it is "run by the CI team" | 11:49 |
sdague | I think that's where the definition might not have been clear. To be group A I really think it needs to be run by infrastructure team for OpenStack. I don't see another way we could do that. | 11:50 |
BobBall | I thought the discussion we had last month suggested that the -2 privs could be given to an external system because it's easy enough for the CI team to revoke those privs if they ever break the gate | 11:50 |
sdague | I didn't think that was suggested | 11:51 |
sdague | I'm -2 on the idea of non infra run systems having -2 on integrated projects | 11:51 |
*** dims has joined #openstack-infra | 11:51 | |
BobBall | *grin* That was my suggestion, but I thought mordred's suggestion to talk about it again when SS was proving its stability with automated -1's meant that possibility was open :) | 11:53 |
*** AlexF has quit IRC | 11:53 | |
sdague | my reading of that is that wanting to see how often the override was a problem was meant to make it clear there was nothing wrong with being only a -1 job | 11:54 |
sdague | because the -1 has been respected 99.99% of the time | 11:54 |
*** SergeyLukjanov has quit IRC | 11:55 | |
sdague | I'll let him speak for himself when he gets up though :) | 11:55 |
sdague | but that's my take | 11:55 |
BobBall | understood | 11:55 |
*** pcm_ has joined #openstack-infra | 11:56 | |
sdague | fungi when you get up, I had a question on job definition | 11:59 |
sdague | mostly around neutron jobs | 11:59 |
*** Ryan_Lane has joined #openstack-infra | 12:00 | |
*** matsuhashi has quit IRC | 12:01 | |
*** afazekas is now known as afazekas_food | 12:01 | |
*** AlexF has joined #openstack-infra | 12:02 | |
*** SergeyLukjanov has joined #openstack-infra | 12:05 | |
*** adalbas has joined #openstack-infra | 12:05 | |
dims | hi, looking at zuul page none of the "check" jobs seem to have a progress bar. they are marked "queued" . Do the gate jobs take precedence and check jobs will wait for their turn? or is there some other problem? | 12:07 |
openstackgerrit | Sean Dague proposed a change to openstack-infra/config: add gate-tempest-devstack-vm-neutron-pg job https://review.openstack.org/48635 | 12:07 |
sdague | dims: gate takes priority | 12:07 |
sdague | so yes, check queue is starved right now | 12:07 |
*** Ryan_Lane has quit IRC | 12:07 | |
dims | sdague, thanks! | 12:08 |
sdague | basically before nnfi the gate would be sitting in a hold until the gate failure was resolved, so the check jobs would run in and grab all the devstack nodes | 12:08 |
sdague | but now because the gate throughput is up, they are grabbing every resource | 12:08 |
dims | makes sense | 12:08 |
sdague | and because of the neutron race, which is killing most jobs, that's kind of problematic | 12:09 |
sdague | fungi / jeblair: check queue is now > 150, so bursting would be nice :) | 12:09 |
*** matsuhashi has joined #openstack-infra | 12:10 | |
*** matsuhashi has quit IRC | 12:11 | |
*** flaper87 is now known as flaper87|afk | 12:12 | |
*** AlexF has quit IRC | 12:14 | |
*** thomasm has joined #openstack-infra | 12:17 | |
*** thomasbiege has joined #openstack-infra | 12:20 | |
*** AlexF has joined #openstack-infra | 12:20 | |
*** hashar has joined #openstack-infra | 12:21 | |
*** ArxCruz has joined #openstack-infra | 12:21 | |
*** thomasbiege has quit IRC | 12:22 | |
*** flaper87|afk is now known as flaper87 | 12:22 | |
*** dims has quit IRC | 12:22 | |
*** dims has joined #openstack-infra | 12:23 | |
*** weshay has joined #openstack-infra | 12:28 | |
*** acabrera has joined #openstack-infra | 12:29 | |
*** acabrera is now known as alcabrera | 12:29 | |
*** tvb has quit IRC | 12:30 | |
*** matsuhashi has joined #openstack-infra | 12:35 | |
*** tvb has joined #openstack-infra | 12:35 | |
*** Ryan_Lane has joined #openstack-infra | 12:36 | |
*** dkliban has joined #openstack-infra | 12:36 | |
*** jhesketh has quit IRC | 12:37 | |
*** jhesketh_ has quit IRC | 12:37 | |
*** Ryan_Lane has quit IRC | 12:40 | |
ttx | fungi: finally fixed bug 1160277 | 12:44 |
uvirtbot | Launchpad bug 1160277 in openstack-ci "Groups have similar names in LP and gerrit but are no longer synced" [Medium,Fix released] https://launchpad.net/bugs/1160277 | 12:44 |
ttx | fungi: while looking at the groups list in gerrit though, I found a few groups that are probably useless and should be removed: | 12:44 |
ttx | fungi: empty copy of the LP "heat" group: https://review.openstack.org/#/admin/groups/92,members | 12:44 |
*** afazekas_food has quit IRC | 12:45 | |
ttx | hmm, that's all. | 12:45 |
*** johnthetubaguy has joined #openstack-infra | 12:47 | |
*** basic` has joined #openstack-infra | 12:47 | |
fungi | sdague: what's your job definition question? | 12:48 |
fungi | ttx: yeah, i try to empty and set unused groups non-visible | 12:48 |
fungi | gerrit doesn't have a "delete group" feature | 12:48 |
sdague | fungi: can we specify the same job twice on a zuul run | 12:48 |
ttx | fungi: ah. ah. | 12:48 |
fungi | ttx: eventually i'll get around to determining how to construct a query which identifies an empty group and removes all traces of it from the various tables it might appear in | 12:49 |
fungi | sdague: i don't think we've tried, so not entirely sure | 12:49 |
fungi | back to the "run neutron tempest 10x for neutron jobs" idea presumably | 12:50 |
*** jhesketh has joined #openstack-infra | 12:50 | |
fungi | er, for neutron changes | 12:50 |
*** jhesketh_ has joined #openstack-infra | 12:50 | |
fungi | lemme see if a duplicate entry horks up the layout.yaml parser at least | 12:50 |
fungi | ttx: i didn't get as far as the nova requirements sync in folsom yesterday, ran into some more corner cases, but did get the patches for openstack/requirements on folsom and grizzly with the capped list including all transitive dependencies for all integrated projects on that branch... https://review.openstack.org/#/q/topic:bug/1172418,n,z | 12:52 |
fungi | ttx: steps i'm following are described at https://etherpad.openstack.org/XpIFEzhkgY along with some details on manual conflict resolution between some of the projects' requirements lists | 12:53 |
fungi | the changes to the requirements project may need some more massaging since i crudely backported a couple changes from master to rename/combine the lists there | 12:54 |
*** rfolco has joined #openstack-infra | 12:55 | |
ttx | fungi: did you see my questions above about the need to care about ceilometer in stable/folsom at all ? | 12:59 |
*** crank has quit IRC | 13:01 | |
fungi | ttx: haven't hit the scrollback yet, but will look | 13:02 |
ttx | (that was answering your question on how to solve conflicting reqs) | 13:03 |
fungi | looks like removing them will solve the anyjson conflict at least | 13:03 |
*** zul has quit IRC | 13:03 | |
ttx | fungi: also was wondering about swift since they were not in the gate in those ancient folsom times | 13:04 |
ttx | ignoring both would solve all conflicts | 13:04 |
ttx | except pep8 | 13:04 |
*** dkehn_ is now known as dkehn | 13:04 | |
fungi | so it would | 13:04 |
*** julim has joined #openstack-infra | 13:04 | |
*** ericw has joined #openstack-infra | 13:05 | |
*** tizzo has joined #openstack-infra | 13:06 | |
*** Ryan_Lane has joined #openstack-infra | 13:06 | |
*** davidhadas_ has joined #openstack-infra | 13:06 | |
*** dprince has joined #openstack-infra | 13:07 | |
*** zul has joined #openstack-infra | 13:07 | |
fungi | though the versions i settled on to resolve those other conflicts are basically still the right one after factoring swift out of folsom | 13:07 |
ttx | ok then :) | 13:08 |
*** dizquierdo has left #openstack-infra | 13:09 | |
*** Ryan_Lane has quit IRC | 13:11 | |
*** HenryG has joined #openstack-infra | 13:11 | |
*** xchu has quit IRC | 13:11 | |
sdague | fungi: well at least run neutron more than once | 13:11 |
sdague | right now it's way too easy for a race to come through | 13:12 |
ekarlso | any of you familiar with disk image builder ? | 13:12 |
sdague | so running 2x neutron and 2x neutron-pg would make it closer to other projects in how easy it is to slip a change through | 13:12 |
sdague | ekarlso: you probably want #tripleo | 13:13 |
*** nosnos has quit IRC | 13:15 | |
*** mriedem has joined #openstack-infra | 13:18 | |
*** HenryG has quit IRC | 13:19 | |
*** HenryG has joined #openstack-infra | 13:19 | |
*** salv-orlando has joined #openstack-infra | 13:20 | |
*** crank has joined #openstack-infra | 13:20 | |
*** prad_ has joined #openstack-infra | 13:23 | |
sdague | fungi: so check queue is at 170 and growing because of the gate starvation, which is actually making folks jump the check queue, hence making the gate worse (at least a couple non Jenkins +1ed changes over in there) | 13:25 |
sdague | any idea how we can alleviate this? | 13:25 |
dansmith | yeah, my thing from yesterday still hasn't run check, after 15h | 13:25 |
*** afazekas has joined #openstack-infra | 13:26 | |
ttx | sdague: needs a slightly smarter prioritization algorithm, I fear | 13:27 |
sdague | ttx: the reality is we'll just move the pain around | 13:27 |
sdague | ttx: but I agree | 13:27 |
sdague | clarkb and jeblair were working on this last night, but I guess no progress, and I don't think they realized quite how bad it was | 13:28 |
dansmith | yeah, my thing from yesterday is critical, so it just got +A'd since jenkins never voted on it | 13:28 |
ttx | sdague: at some point going faster just makes you go slower. This is a complex system :) | 13:28 |
*** matty_dubs|gone is now known as matty_dubs | 13:28 | |
fungi | it looks like we're starved on devstack slaves, so adding more unit test slaves isn't going to help | 13:28 |
sdague | fungi: yeh, this is all devstack starvation | 13:29 |
sdague | also, given that stable/grizzly is bust, that's not helping either | 13:30 |
*** bnemec_ is now known as beekneemech | 13:30 | |
sdague | as those are guaranteed resets right now | 13:30 |
sdague | that's how we just lost the gate | 13:31 |
*** yassine has quit IRC | 13:31 | |
fungi | someone approved a grizzly change? | 13:31 |
*** yassine has joined #openstack-infra | 13:31 | |
sdague | yes | 13:31 |
fungi | the list of people able to do stable branch approvals is small--we should at least tell those people to cut it out until grizzly is fixed | 13:31 |
sdague | well, 8hrs ago they did | 13:32 |
sdague | https://review.openstack.org/#/c/47080/ | 13:32 |
sdague | it took 8hrs for that to get to the top of the gate, fwiw | 13:32 |
fungi | https://review.openstack.org/#/admin/groups/120,members plus https://review.openstack.org/#/admin/groups/11,members | 13:35 |
*** Ryan_Lane has joined #openstack-infra | 13:36 | |
*** johnthetubaguy1 has joined #openstack-infra | 13:39 | |
*** johnthetubaguy has quit IRC | 13:40 | |
*** Ryan_Lane has quit IRC | 13:41 | |
*** CaptTofu has quit IRC | 13:44 | |
sdague | fungi: any idea where the scheduling config is in zuul, so we could at least unstarve check? | 13:44 |
*** CaptTofu has joined #openstack-infra | 13:44 | |
*** dcramer_ has quit IRC | 13:45 | |
*** guohliu has joined #openstack-infra | 13:46 | |
fungi | sdague: in zuul's layout.yaml, within entries in the pipelines section there are precedence parameters | 13:46 |
fungi | we could, for example, put gate and check back on equal footing that way | 13:47 |
fungi | so that the gate will take 2-3x as long to clear as it does now | 13:47 |
fungi | we can't currently set proportional shares or anything though (to say 75% of available resources go to gate jobs and 25% go to check jobs) | 13:48 |
openstackgerrit | Sean Dague proposed a change to openstack-infra/config: make check queue high priority https://review.openstack.org/48657 | 13:49 |
sdague | fungi: yeh, equal priority I think would be the right call | 13:49 |
sdague | the gate's really not merging much code right now anyway because of the resets | 13:50 |
sdague | and debug fixes to get to the bottom of those issues, are blocked on check, and not getting feedback | 13:50 |
dansmith | +1 | 13:51 |
fungi | as to your earlier question about multiple instances of the same job for a given project+pipeline, i did confirm that doesn't fail the layout parsing check but still no idea what zuul would do with it | 13:52 |
*** CaptTofu has quit IRC | 13:52 | |
sdague | fungi: ok, well we can ponder that one later :) | 13:52 |
*** CaptTofu has joined #openstack-infra | 13:52 | |
sdague | so what do you think about leveling the queues? per - https://review.openstack.org/48657 | 13:52 |
*** yassine has quit IRC | 13:53 | |
*** yassine has joined #openstack-infra | 13:53 | |
openstackgerrit | Jeremy Stanley proposed a change to openstack-infra/config: Temporarily raise check pipeline precedence https://review.openstack.org/48659 | 13:55 |
fungi | oh, you wrote one already | 13:56 |
sdague | fungi: yeh :) | 13:56 |
dhellmann | good morning | 13:56 |
sdague | morning | 13:57 |
dhellmann | sdague: it sounds like there are still issues with stable/grizzly because of the cliff change and quantumclient. I'm thinking of just releasing a cliff that doesn't use pyparsing at all, to remove the conflict. | 13:58 |
soren | Hm. I'm trying to use jenkins-job-builder, but my Jenkins has CSRF enabled and python-jenkins doesn't seem to support that. How have you worked around it for the OpenStack Jenkins? | 13:58 |
*** julim has quit IRC | 13:58 | |
fungi | sdague: abandoned mine, +2'd yours. i expect jeblair will be waking up any time so let's get his input on it | 13:58 |
sdague | dhellmann: that would be awesome | 13:58 |
dhellmann | sdague: ok, I'll get back to work on that, then. | 13:59 |
sdague | fungi: ok | 13:59 |
*** julim has joined #openstack-infra | 13:59 | |
fungi | soren: good question... where is the csrf option in jenkins? i'll check whether we set it (we don't really use the webui enough to worry about that) | 14:00 |
soren | fungi: I just found http://ci.openstack.org/jenkins.html | 14:00 |
soren | fungi: ...which says not to enable CSRF. | 14:00 |
soren | Scary. | 14:00 |
fungi | i suppose that would do it | 14:00 |
fungi | well, again, if you treat its http interface as an api endpoint only and don't use it for browsery clicky-clicky things, it's not particularly scary | 14:01 |
*** shardy_afk is now known as shardy | 14:02 | |
fungi | your api client is not going to be following links from other sites (one would hope) | 14:02 |
fungi | this mostly underscores the need for jenkins to separate its web interface and its api endpoint | 14:03 |
*** anteaya has joined #openstack-infra | 14:03 | |
fungi | also, when i do need to connect into any sort of web interface as an admin, i use an entirely separate browser to log into that and only that, but thankfully most of the stuff we administer doesn't require a webui | 14:05 |
*** Ryan_Lane has joined #openstack-infra | 14:07 | |
fungi | i wonder if http://javadoc.jenkins-ci.org/hudson/security/csrf/CrumbExclusion.html could be leveraged for that more recently | 14:08 |
openstackgerrit | Felipe Reyes proposed a change to openstack-infra/jenkins-job-builder: Added support for Git shallow clone parameter https://review.openstack.org/48661 | 14:09 |
*** mrodden has joined #openstack-infra | 14:10 | |
*** Ryan_Lane has quit IRC | 14:11 | |
*** rnirmal has joined #openstack-infra | 14:13 | |
dhellmann | sdague: I'm trying to think of a plan for testing a new cliff release without actually releasing it and potentially causing more things to break. Any ideas? | 14:15 |
sdague | if we had spare gate time, I would. But as that is all starved... I don't know | 14:16 |
sdague | we could make a requirements proposed change with a tarball link | 14:16 |
dhellmann | I can run tests locally, I'm just trying to reason through what I would need to do | 14:16 |
dhellmann | oh, that's interesting | 14:16 |
sdague | that would at least test master | 14:16 |
*** dizquierdo has joined #openstack-infra | 14:17 | |
dhellmann | I'm assuming if I remove the pyparsing requirement from cliff, the one in stable/grizzly will be useless but not have a conflict | 14:17 |
dhellmann | so stable/grizzly will think it needs a version of pyparsing that nothing will import | 14:17 |
sdague | right, so it won't wedge in stable/grizzly | 14:17 |
dhellmann | right | 14:17 |
sdague | I think that's right | 14:17 |
sdague | honestly, I'm only about 1/2 way down the rabbit hole on that one, as I thought others were working it | 14:18 |
dhellmann | can I point the requirements file at a git URL? that would make it easy for me to test locally | 14:18 |
dhellmann | me, too | 14:18 |
dhellmann | I thought it was just a matter of removing that dependency, but apparently it's hard to get to the quantumclient part of the repo and do a release or something | 14:18 |
sdague | dhellmann: yeh, you can change the repos for devstack | 14:18 |
sdague | in localrc | 14:18 |
soren | fungi: csrf isn't about how *you* use the web ui, after all. | 14:18 |
sdague | either alt url, or alt branch | 14:18 |
dhellmann | sdague: no, I mean have the global requirements point to git for cliff | 14:19 |
soren | fungi: It's about how your browser can be tricked into using it. | 14:19 |
sdague | dhellmann: I don't remember if it can point to a git | 14:19 |
sdague | but it can do a tarball, like oslo does | 14:19 |
dhellmann | ok, I can make a local sdist | 14:19 |
soren | sdague: You can point pip at a git url. | 14:19 |
fungi | soren: yep. not logging authenticating to the jenkins administrative webui with your browser is a great way to thwart that | 14:19 |
soren | sdague: git+https://github.com/blah | 14:19 |
fungi | er, not authenticating | 14:20 |
*** tvb has quit IRC | 14:20 | |
sdague | soren: ok, except I'm not sure we propagate those via our global requirements sync | 14:20 |
sdague | I know we do the oslo tar case | 14:20 |
soren | sdague: Sorry, I replied entirely out of context. :) | 14:20 |
*** KennethWilke has joined #openstack-infra | 14:20 | |
sdague | yep, no worries :) | 14:20 |
soren | fungi: Jenkins seems less useful if you never look at it :) | 14:21 |
sdague | it's good to know though, probably something worth looking to add to our reqs sync | 14:21 |
fungi | soren: but yeah, having an automation-friendly means of authenticating to the api endpoint entirely separate from browser handling | 14:21 |
fungi | something it lacks | 14:21 |
fungi | soren: probably the other reason we don't need to authenticate to it often is that we have it set up with anonymous read access enabled, so as long as you're not changing things through the webui you don't need to log into it | 14:22 |
jd__ | huhu, today ETA for a Ceilometer patch merge seems to be around 8 hours, FWIW | 14:23 |
fungi | jd__: yeah, we're proposing slowing that down further ;) | 14:23 |
*** datsun180b has joined #openstack-infra | 14:23 | |
jd__ | if that improves quality even further I wouldn't mind | 14:23 |
jd__ | I prefer to wait 8 hours for a merge than spending my days doing rechecks :-) | 14:24 |
sdague | jd__: the gate's at about 8 hrs merge time right now because of all the resets | 14:24 |
soren | fungi: Ah, good point. Mine's set up to always require authentication. | 14:24 |
sdague | however, the check queue is currently starved, so nothing's moved there for the last 15 hrs | 14:24 |
jd__ | sdague: ah I didn't know there has been reset, cool then | 14:24 |
*** wchrisj_ has joined #openstack-infra | 14:24 | |
sdague | jd__: not a zuul reset | 14:24 |
sdague | fails by stuff in the gate | 14:24 |
jd__ | oh I see | 14:25 |
sdague | the gate failure rate is really high | 14:25 |
jd__ | the new tree stuff ? | 14:25 |
sdague | no, bugs in openstack | 14:25 |
*** amotoki has quit IRC | 14:26 | |
fungi | shush. openstack has no bugs. you're dreaming | 14:26 |
jd__ | sdague: bugs in new patchset being tested you mean, or existing bugs (rechecks)? | 14:26 |
dims | lol | 14:26 |
sdague | http://lists.openstack.org/pipermail/openstack-dev/2013-September/015743.html | 14:26 |
sdague | existing bugs | 14:26 |
jd__ | ok :) | 14:26 |
*** adalbas has quit IRC | 14:28 | |
*** wchrisj_ has quit IRC | 14:29 | |
dansmith | http://img819.imageshack.us/img819/3070/6exn.png | 14:30 |
sdague | what is definitely interesting is the Test Nodes graphic at the bottom of the page has a very distinctive look when we are in reset land | 14:30 |
sdague | the peaks going up and down | 14:30 |
dansmith | it's pretty amazing how small gnome-terminal will go, so at least I can see all of the nova stuff block-wise :) | 14:30 |
sdague | heh | 14:30 |
*** dcramer_ has joined #openstack-infra | 14:32 | |
*** tvb has joined #openstack-infra | 14:32 | |
*** mrodden has quit IRC | 14:33 | |
*** markmcclain has joined #openstack-infra | 14:33 | |
mordred | morning all | 14:34 |
Alex_Gaynor | morning mordred | 14:34 |
mordred | soren: we kinda think Jenkins is less useful in general, and thus never really look at it :) | 14:35 |
sdague | mordred: how do you feel about rebalancing the queues? :) | 14:35 |
sdague | mordred: https://review.openstack.org/#/c/48657/ | 14:35 |
*** senk has joined #openstack-infra | 14:36 | |
sdague | we have stuff that entered the check queue yesterday afternoon, and still hasn't gotten access to devstack nodes | 14:37 |
mordred | sdague: done | 14:37 |
sdague | mordred: thank you | 14:37 |
*** Ryan_Lane has joined #openstack-infra | 14:37 | |
*** MoXxXoM has quit IRC | 14:38 | |
Alex_Gaynor | So, maybe ridiculous question, but could we be launching more devstack nodes? | 14:39 |
*** MoXxXoM has joined #openstack-infra | 14:39 | |
sdague | Alex_Gaynor: my understanding is we were basically at quota with HP | 14:41 |
sdague | maybe mordred knows more | 14:41 |
*** Ryan_Lane has quit IRC | 14:41 | |
mordred | we are - and we could request a quota increase... but | 14:43 |
mordred | I don't know that I'm convinced that would help, given the resets | 14:44 |
*** adalbas has joined #openstack-infra | 14:44 | |
mordred | the gate queue isn't slow due to starvation | 14:44 |
sdague | it would help with the starvation on check | 14:44 |
sdague | correct | 14:44 |
mordred | well, we've also got a change in flight to move the check queue to a separate pool of machines | 14:44 |
mordred | https://review.openstack.org/#/c/48549/ | 14:45 |
sdague | sure, it's just going to take until tomorrow afternoon to clear the check queue at this rate | 14:45 |
mordred | yah. I'm just saying, I think that finishing the above patch and landing it will get us _way_ further (and be quicker) than trying to increase quota size | 14:46 |
sdague | yeh, sure | 14:46 |
openstackgerrit | A change was merged to openstack-infra/config: make check queue high priority https://review.openstack.org/48657 | 14:46 |
mordred | I'll work on trying to get that patch finished as soon as I've found coffee | 14:46 |
jeblair | i think i would have made them both normal | 14:46 |
jeblair | now post will starve | 14:46 |
jeblair | i apparently missed reviewing that by 2 minutes | 14:47 |
sdague | I thought post was high? | 14:47 |
jeblair | normal | 14:47 |
sdague | ah | 14:47 |
*** alcabrera is now known as gerrit2 | 14:48 | |
*** gerrit2 is now known as alcabrera | 14:48 | |
sdague | is normal a keyword? or just the default? | 14:48 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Make check, high, post normal precedence. https://review.openstack.org/48668 | 14:48 |
jeblair | both | 14:48 |
mordred | jeblair: nod. +2 | 14:49 |
sdague | I think you want to update commit message :) | 14:49 |
sdague | s/high/gate/ | 14:49 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Make check, gate, post normal precedence. https://review.openstack.org/48668 | 14:49 |
jeblair | make word word word | 14:49 |
ryanpetrello | so I've just tagged a stackforge project (pecan) for release, and watched it go through on zuul; | 14:49 |
ryanpetrello | I've never done this before now - how long does it take for the sdist to show up on pypi? | 14:49 |
sdague | heh | 14:49 |
ryanpetrello | (not in a rush, just want to make sure I didn't goof it up :D) | 14:50 |
*** kgriffs has joined #openstack-infra | 14:51 | |
ryanpetrello | http://logs.openstack.org/bf/bf841be3933fd297b534ca235bcbae0c13bf6202/release/pecan-tarball/0b9ffb8/console.html | 14:51 |
ryanpetrello | looks like it failed? | 14:51 |
*** rcleere has joined #openstack-infra | 14:51 | |
*** matsuhashi has quit IRC | 14:51 | |
jeblair | mordred, sdague: there are also things we can tune to get nodepool a little more responsive now, i'll work on that while mordred finishes the rax-check stuff | 14:51 |
kgriffs | guys, got a question re paste.openstack.org | 14:52 |
sdague | jeblair: cool | 14:52 |
kgriffs | I noticed it is based on lodgeit, and I found this: https://github.com/openstack-infra/lodgeit | 14:52 |
kgriffs | is that repo independent of the original lodgeit? | 14:53 |
fungi | ryanpetrello: yeah, looks like you're missing a [testenv:venv] section in your tox.ini which run-tarball.sh expects to find | 14:53 |
fungi | kgriffs: it's a fork | 14:53 |
sdague | jeblair: so are queue priorities changed as soon as the config lands? | 14:53 |
fungi | kgriffs: the original lodgeit is abandoned upstream last i checked | 14:53 |
kgriffs | oh, ok | 14:53 |
jeblair | sdague: yes | 14:53 |
sdague | check is still going in the wrong direction, and it's only going to get worse as the PST folks wake up | 14:53 |
kgriffs | so we are sort of keeping it on life support? | 14:53 |
fungi | kgriffs: i think pocoo stopped using it and ceased maintaining it | 14:53 |
jeblair | sdague: for new jobs | 14:53 |
kgriffs | fungi: ok, I suspected as much | 14:54 |
jeblair | sdague: which isn't going to help many of the jobs currently in check | 14:54 |
sdague | jeblair: ok, so the 190 check jobs that are in there won't make any progress? | 14:54 |
fungi | kgriffs: basically, i think. part of the problem is that unauthenticated sites allowing you to post arbitrary text are an attractive nuisance and often abused to the point of being unmaintainable | 14:54 |
mordred | kgriffs: yes. clarkb and I found a pastebin that was more similar to gist a little while ago, but we haven't gotten to the point where working on paste has been important enough :) | 14:54 |
*** marun has joined #openstack-infra | 14:55 | |
*** jswarren has joined #openstack-infra | 14:55 | |
jeblair | sdague: indeed it seems likely to make it worse | 14:55 |
kgriffs | mordred, fungi: I would like to create a "pastebin" for images to share screenshots and stuff, and was wondering whether it should be a standalone thing or try to integrate with something already out there | 14:55 |
*** jswarren has quit IRC | 14:55 | |
jeblair | sdague: perhaps we should _lower_ gate to low until it clears out | 14:55 |
fungi | kgriffs: yikes. i think you don't want to do that | 14:55 |
fungi | kgriffs: it's called 4chan ;) | 14:55 |
kgriffs | heh | 14:55 |
sdague | jeblair: yeh, that seems reasonable, then on the next reset they'll start getting resources | 14:56 |
mordred | kgriffs: we have an open item to have better support for this from the horizon folks too | 14:56 |
*** jswarren has joined #openstack-infra | 14:56 | |
mordred | kgriffs: and some preliminary plans, but similarly that hasn't hit high enough on the queue yet | 14:56 |
kgriffs | mordred what is the alternative project you found? | 14:56 |
sdague | jeblair: you want to respin your patch for that? or I can do it | 14:56 |
kgriffs | mordred: (the gist-like thing you mentioned) | 14:57 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Make check, gate, post low precedence https://review.openstack.org/48668 | 14:57 |
jeblair | sdague: ^ | 14:57 |
*** Ajaeger has joined #openstack-infra | 14:57 | |
mordred | kgriffs: https://github.com/justinvh/gitpaste | 14:57 |
sdague | fungi, mordred: ^^^ | 14:57 |
mordred | +2 | 14:57 |
kgriffs | ah, nice | 14:57 |
kgriffs | thanks - I'll check it out. | 14:58 |
sdague | ok, hopefully that will get things running though | 14:58 |
Alex_Gaynor | kgriffs, fungi: I can confirm that pocoo upstream no longer maintains lodgeit, their install (paste.pocoo) was being used for various illegal and highly offensive stuff so it was too much of a hassle | 14:58 |
kgriffs | gtk | 14:59 |
openstackgerrit | Monty Taylor proposed a change to openstack-infra/config: Use rackspace for tempest check tests. https://review.openstack.org/48549 | 15:00 |
jswarren | Hello. Maybe this has already been brought up, but I'm noticing on zuul that python26 jobs are stuck on queued, with evidently none in progress. | 15:00 |
mordred | jeblair: I think that does it | 15:00 |
mordred | jswarren: yup. big-time gate issues right now | 15:01 |
mordred | jswarren: http://lists.openstack.org/pipermail/openstack-dev/2013-September/015743.html | 15:01 |
jeblair | mordred: you didn't split it into 2 changes | 15:01 |
mordred | jeblair: ah. sorry. didn't see that note (still pre-coffee) one sec | 15:02 |
*** mrodden has joined #openstack-infra | 15:03 | |
openstackgerrit | A change was merged to openstack-infra/config: Make check, gate, post low precedence https://review.openstack.org/48668 | 15:03 |
Alex_Gaynor | So the priority updates, does that require a zuul restart? | 15:04 |
jeblair | Alex_Gaynor: no, nothing to the zuul layout.yaml requires a restart, only a reload (which puppet will do automatically); queue contents don't change | 15:04 |
Alex_Gaynor | jeblair: thank god | 15:05 |
openstackgerrit | Monty Taylor proposed a change to openstack-infra/config: Use rackspace for tempest check tests https://review.openstack.org/48672 | 15:05 |
openstackgerrit | Monty Taylor proposed a change to openstack-infra/config: Set up new images on rackspace for check tests https://review.openstack.org/48549 | 15:05 |
jeblair | mordred: dfw has 18/60 slots available (the rest are static slaves); ord is pretty much open (i can delete some test servers there), iad only has 8 slots | 15:06 |
jeblair | mordred: i think we need to leave headroom in dfw. i'm not sure we should use it much, if at all. | 15:06 |
mordred | jeblair: agree. lemme modify the first patch | 15:07 |
jeblair | mordred: hang on | 15:07 |
mordred | I'm also going to send pvo and troy an email seeing if we can get IAD to match | 15:07 |
*** Ryan_Lane has joined #openstack-infra | 15:08 | |
*** tvb has quit IRC | 15:08 | |
Alex_Gaynor | mordred: if you need me to ask people to up our limit, let me know, I can start sending emails | 15:08 |
mordred | Alex_Gaynor: I just emailed troy and pvo, but if you know other folks, what I requested was "Can you up our quota on the openstackjenkins account in IAD to match DFW and ORD?" | 15:09 |
openstackgerrit | Anne Gentle proposed a change to openstack-infra/config: Removes openstack-api-programming doc build https://review.openstack.org/48674 | 15:09 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Tune nodepool https://review.openstack.org/48675 | 15:09 |
Alex_Gaynor | mordred: k, will start firing emails | 15:10 |
jeblair | mordred: i will modify your patch | 15:10 |
fungi | jeblair: ttx: reed: noticed a small freshness problem with http://git.openstack.org/cgit/openstack-infra/config/tree/tools/atc/email-stats.sh . what's the best way to confirm which repositories should be listed in there to count toward atc? should everything in openstack/ openstack-dev/ and openstack-infra/ get added to it? | 15:12 |
mordred | Alex_Gaynor, jeblair: pvo has acknowledged my email | 15:12 |
*** Ryan_Lane has quit IRC | 15:12 | |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Set up new images on rackspace for check tests https://review.openstack.org/48549 | 15:13 |
jeblair | mordred: ^ | 15:13 |
*** DinaBelova has joined #openstack-infra | 15:13 | |
ttx | fungi: you shouldn't need ATC right now, just APC | 15:13 |
jeblair | what's an apc? | 15:14 |
ttx | Active pro(ject/gram) Contributor | 15:14 |
mordred | jeblair: yup | 15:14 |
openstackgerrit | David Caro proposed a change to openstack-infra/jenkins-job-builder: Added the possibility to specify source files https://review.openstack.org/48677 | 15:14 |
fungi | ttx: that's the list of projects we're building stats on, so for example openstack/django_openstack_auth is not represented (yet) | 15:14 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Use rackspace for tempest check tests https://review.openstack.org/48672 | 15:15 |
jeblair | rebase ^ | 15:15 |
ttx | fungi: we don't have the precise program/project map yet, but I can go through the list of projects and get that for you | 15:15 |
fungi | ttx: i'll add it since you say it's part of horizon's program, but just trying to figure out what else we may be missing more generally | 15:15 |
jeblair | fungi: aprv https://review.openstack.org/48675 ? | 15:15 |
mordred | jeblair: all three are +2 from me | 15:16 |
ttx | fungi: i'll fix that list for you before we run the ATC voters lists | 15:16 |
fungi | ttx: k, thanks | 15:16 |
jeblair | fungi: and then https://review.openstack.org/48549 as well | 15:16 |
jeblair | i wip'd the 3rd change to keep it from going in prematurely | 15:16 |
ttx | fungi: i added django_openstack_auth because that's arguably part of the horizon program | 15:17 |
*** CaptTofu has quit IRC | 15:17 | |
ttx | fungi: but it's a bit of a grey area right now, until programs all submit their lists | 15:17 |
ttx | but i can't get them to publish a mission statement, so projects lists... | 15:17 |
jeblair | ttx: i believe that's the understanding we came to with gabrielhurley | 15:17 |
ttx | jeblair: agreed, but it just won't be completely clear cut until we get the program/projects maps in the governance repo | 15:19 |
* anteaya observes | 15:19 | |
ttx | until then we'll continue to use the old "sounds about right" recipe we've been using for ATCs until now :) | 15:19 |
fungi | wfm | 15:19 |
anteaya | okay | 15:20 |
ttx | fungi: everyone will just blame anteaya anyway | 15:20 |
ttx | that's what we need election officials for, after all | 15:20 |
anteaya | blame me | 15:20 |
fungi | i know i do ;) | 15:20 |
anteaya | :D | 15:20 |
* fungi kids | 15:20 | |
anteaya | it is the fun that comes with that particular hat | 15:20 |
anteaya | knew it when I volunteered | 15:20 |
*** tvb has joined #openstack-infra | 15:21 | |
*** tvb has quit IRC | 15:21 | |
*** tvb has joined #openstack-infra | 15:21 | |
ttx | anteaya: note that I decided to share the blame for the TC election. Just couldn't for this one :) | 15:21 |
anteaya | understood | 15:21 |
anteaya | and yeah the TC election promises to be a whole lot of fun | 15:21 |
anteaya | get ready for the deluge of +1 emails | 15:21 |
mordred | ttx: you might want to poke the TC folks who still haven't voted on the governance repo - I believe your reminder slipped in at the end of the meeting last time | 15:21 |
mordred | so they may not be noticing that they need to do that | 15:22 |
ttx | mordred: will do | 15:22 |
jgriffith | sdague: ummm... just curious why you think this: https://bugs.launchpad.net/tempest/+bug/1226337 is a tgt issue? | 15:22 |
uvirtbot | Launchpad bug 1226337 in tempest "tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern flake failure" [High,Triaged] | 15:22 |
jgriffith | particularly since the specific example here is that the server never booted? | 15:22 |
*** CaptTofu has joined #openstack-infra | 15:22 | |
fungi | ttx: the main reason i was asking as far as updating that list is that it potentially affects the set of qualifying atcs i gave reed for summit passes | 15:23 |
openstackgerrit | David Caro proposed a change to openstack-infra/jenkins-job-builder: Added the possibility to specify source files https://review.openstack.org/48677 | 15:23 |
ttx | mordred: actually we have 8 +2s there. Which is enough to pass. | 15:23 |
ttx | mordred: i'll still ping them for a last-minute objection though | 15:23 |
jgriffith | jog0: ping | 15:24 |
*** Ajaeger has quit IRC | 15:24 | |
jgriffith | OH... never mind that Nikola | 15:24 |
*** freyes has joined #openstack-infra | 15:25 | |
*** reed_ has joined #openstack-infra | 15:27 | |
*** CaptTofu_ has joined #openstack-infra | 15:27 | |
openstackgerrit | A change was merged to openstack-infra/config: Tune nodepool https://review.openstack.org/48675 | 15:28 |
openstackgerrit | A change was merged to openstack-infra/config: Set up new images on rackspace for check tests https://review.openstack.org/48549 | 15:28 |
*** CaptTofu_ has quit IRC | 15:30 | |
sdague | jgriffith: because the issue looks like the iscsi device can't be found from compute | 15:31 |
*** rpodolyaka has left #openstack-infra | 15:31 | |
jgriffith | sdague: Ummmm | 15:32 |
sdague | it's a boot from volume, and on the 3rd time to boot from a volume the iscsi device never shows up on n-cpu | 15:32 |
jgriffith | http://logs.openstack.org/29/45029/6/check/gate-tempest-devstack-vm-full/80dd62e/logs/screen-n-cpu.txt.gz | 15:32 |
openstackgerrit | Jaroslav Henner proposed a change to openstack-infra/jenkins-job-builder: Add dynamic string and choice params. https://review.openstack.org/48506 | 15:32 |
jgriffith | sdague: afraid I think these multiple things going on here | 15:32 |
jgriffith | s/these/there's/ | 15:32 |
sdague | http://logs.openstack.org/87/47487/4/gate/gate-tempest-devstack-vm-postgres-full/247d81e/logs/screen-n-cpu.txt.gz#_2013-09-27_10_03_40_640 | 15:33 |
*** tvb|afk has joined #openstack-infra | 15:33 | |
*** tvb|afk has quit IRC | 15:33 | |
*** tvb|afk has joined #openstack-infra | 15:33 | |
sdague | jgriffith: ok, well more eyes appreciated | 15:33 |
sdague | this is as far as we got on -qa this morning trying to figure things out | 15:33 |
sdague | there's some scrollback there if you are on it | 15:34 |
jgriffith | sdague: I'm looking, If I can find a clean example of the target issue I can dig in on the cinder side | 15:34 |
jgriffith | sdague: checking... | 15:34 |
*** tvb has quit IRC | 15:34 | |
openstackgerrit | James E. Blair proposed a change to openstack-infra/zuul: Allow multiple invocations of the same job https://review.openstack.org/48684 | 15:35 |
jeblair | sdague, fungi: ^ sadly, I think that answers that question in the negative. but we should be able to have that feature in place over the weekend. | 15:35 |
*** Ryan_Lane has joined #openstack-infra | 15:38 | |
mgagne | When Rackspace updates their images, does the image ID change? Does the image disappear for a brief moment or are there 2 images with the same name for a couple of seconds? | 15:39 |
*** AlexF has quit IRC | 15:40 | |
*** tvb|afk has quit IRC | 15:41 | |
mordred | jeblair: pvo says our IAD quota should be increased | 15:42 |
jeblair | mgagne: i don't know | 15:42 |
*** tvb has joined #openstack-infra | 15:42 | |
*** tvb has quit IRC | 15:42 | |
*** tvb has joined #openstack-infra | 15:42 | |
jeblair | mgagne: it is! | 15:42 |
*** DinaBelova has quit IRC | 15:42 | |
jeblair | mordred: it is! | 15:42 |
jeblair | mgagne: sorry | 15:42 |
*** Ryan_Lane has quit IRC | 15:43 | |
jeblair | mordred: i'll update nodepool conf | 15:43 |
*** tizzo has quit IRC | 15:43 | |
*** DennyZhang has joined #openstack-infra | 15:43 | |
*** AlexF has joined #openstack-infra | 15:44 | |
*** UtahDave has joined #openstack-infra | 15:45 | |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Increase IAD nodepool limits https://review.openstack.org/48687 | 15:45 |
openstackgerrit | David Caro proposed a change to openstack-infra/jenkins-job-builder: Added globbed parameters to the job specification https://review.openstack.org/48688 | 15:45 |
jeblair | mordred: check images are building | 15:45 |
mordred | jeblair: woot | 15:46 |
*** yassine has quit IRC | 15:47 | |
jeblair | i'm deleting the old test nodes/images | 15:47 |
giulivo | jgriffith, what I found is that cinder seems to receive an okay from tgt-admin about the update so the volume is moved into available state | 15:48 |
*** DennyZhang has quit IRC | 15:48 | |
giulivo | but later iscsiadm on nova can't find the volume | 15:48 |
*** DinaBelova has joined #openstack-infra | 15:49 | |
giulivo | so following sdague's suggestion I've put this up on devstack https://review.openstack.org/#/c/48626/ | 15:49 |
*** DennyZhang has joined #openstack-infra | 15:49 | |
jgriffith | giulivo: is it iscsiadm can't discover? Cuz it looks like the discover works and it *thinks* it attached it | 15:49 |
jgriffith | giulivo: but that the actual problem is that the attach was no good | 15:49 |
giulivo | jgriffith, I found three attempts to rediscover | 15:49 |
jgriffith | giulivo: but I'm just trying to catch up so I could be wrong | 15:49 |
giulivo | lasting like secs | 15:49 |
jgriffith | giulivo: what do you mean by that? | 15:50 |
jgriffith | giulivo: ie... can you point to the logs? | 15:50 |
jgriffith | giulivo: You mean sendtargets command? | 15:50 |
giulivo | wait a sec so I can post the relevant log | 15:51 |
jgriffith | giulivo: cool | 15:51 |
giulivo | :P | 15:51 |
jgriffith | giulivo: like I said, be patient with me I'm just catching up with you guys here :) | 15:51 |
jgriffith | giulivo: Hoping I can help | 15:51 |
*** tizzo has joined #openstack-infra | 15:51 | |
*** mkerrin has quit IRC | 15:52 | |
giulivo | oh c'mon so the logs I was looking at are http://logs.openstack.org/64/47264/2/gate/gate-tempest-devstack-vm-full/dced339/logs/screen-c-vol.txt.gz for cinder and http://logs.openstack.org/64/47264/2/gate/gate-tempest-devstack-vm-full/dced339/logs/screen-n-cpu.txt.gz for nova | 15:52 |
giulivo | the problem is with volume 4020e0dd-24a0-453b-985d-e50cb2dd0de1 | 15:53 |
giulivo | the nova exception is here http://logs.openstack.org/64/47264/2/gate/gate-tempest-devstack-vm-full/dced339/logs/screen-n-cpu.txt.gz#_2013-09-24_04_44_35_186 | 15:54 |
jeblair | mordred, fungi: https://review.openstack.org/#/c/48687/ | 15:54 |
jeblair | all the rax check images are now ready | 15:55 |
jgriffith | giulivo: yeah, so that's what I was wondering.... | 15:56 |
fungi | jeblair: does that mean 48672 is safe to un-wip/approve now? | 15:57 |
jgriffith | giulivo: Login was successful indicating the target was there | 15:57 |
jeblair | fungi: not just yet, it's launching the nodes | 15:57 |
jgriffith | giulivo: 2013-09-24 04:44:17.515 | 15:57 |
giulivo | login succeeds true, but not the volume? | 15:57 |
fungi | ahh, okay | 15:57 |
jgriffith | BUT | 15:57 |
jgriffith | the attach/mount as /dev/vda is the crux of the issue | 15:57 |
jgriffith | I *think* | 15:57 |
jgriffith | giulivo: The fact that the login to the target was successful is why I had moved past that point | 15:58 |
jgriffith | giulivo: sadly, no logging in between there :( | 15:58 |
*** thomasbiege has joined #openstack-infra | 15:59 | |
*** CaptTofu has quit IRC | 15:59 | |
giulivo | so the three attempts to rediscover which are failing are "okay" ? | 15:59 |
giulivo | rediscover the volume, after logging in | 15:59 |
giulivo | like this http://logs.openstack.org/64/47264/2/gate/gate-tempest-devstack-vm-full/dced339/logs/screen-n-cpu.txt.gz#_2013-09-24_04_44_22_710 | 16:00 |
jgriffith | giulivo: well... | 16:00 |
jgriffith | giulivo: so "discover" can mean different things with iscsi | 16:00 |
jgriffith | giulivo: "discover" in terms of iscsi target discovery appears to have succeeded without issue | 16:00 |
jgriffith | giulivo: what you're referring to though is the attachment | 16:00 |
giulivo | yeah it's not the sendtargets sorry, I should say rescan but that is just the argument passed to iscsiadm | 16:00 |
jgriffith | *I think* | 16:01 |
jgriffith | giulivo: got ya | 16:01 |
jgriffith | giulivo: so what's failing is the attach | 16:01 |
jgriffith | giulivo: the target *appears* to be vlie | 16:01 |
jgriffith | valid | 16:01 |
*** thomasbiege has quit IRC | 16:01 | |
SpamapS | Anybody know a way to specify a different set of things to ignore for flake8 per-directory? | 16:01 |
jgriffith | giulivo: but it's the attach that is hosed | 16:01 |
jgriffith | and whatever's been done with the logging isn't overly helpful IMO | 16:02 |
*** matty_dubs is now known as matty_dubs|lunch | 16:03 | |
*** tizzo has quit IRC | 16:04 | |
openstackgerrit | David Caro proposed a change to openstack-infra/jenkins-job-builder: Added the possibility to specify source files https://review.openstack.org/48677 | 16:04 |
jeblair | mordred, sdague, fungi: our first rax nodes are ready, from IAD, they took 16 minutes to build | 16:05 |
jeblair | (dfw and ord are still building) | 16:06 |
fungi | eek | 16:06 |
fungi | what's build time like for hp? | 16:06 |
jeblair | fungi: 2 mins | 16:06 |
fungi | i guess ~15 minutes is what i recall from standing up puppetish servery things in rackspace previously though | 16:07 |
openstackgerrit | A change was merged to openstack-infra/config: Increase IAD nodepool limits https://review.openstack.org/48687 | 16:07 |
guitarzan | giulivo: can you tell if the iscsi device shows up eventually? | 16:07 |
fungi | taking package installs/upgrades and whatnot into account | 16:07 |
jeblair | fungi: that's not necessary for this though -- this is a straight launch from image -- but it's a custom image, which means it may not be local to the compute node | 16:07 |
fungi | oh, ew | 16:08 |
fungi | right, image is already updated and such | 16:08 |
jeblair | i don't know how it works in rax though -- perhaps continued use warms caches on compute nodes. | 16:08 |
fungi | we'll be warming those up really quickly if that's the case ;) | 16:08 |
jeblair | we can somewhat mitigate this by increasing min-ready even more | 16:09 |
giulivo | guitarzan, so I think the problem is exactly that the block device never shows up | 16:09 |
giulivo | there is nothing from the kernel messages about the newer volume (from iscsiadm) | 16:10 |
giulivo | not that I can see at least | 16:10 |
guitarzan | hmm, how is the network between the two machines? | 16:11 |
giulivo | loopback | 16:11 |
guitarzan | ah, and above someone said the discovery was fine | 16:11 |
giulivo | yeah the login on the portal works | 16:11 |
jgriffith | guitarzan: network sucks | 16:12 |
jgriffith | guitarzan: the target is discovered BTW | 16:12 |
jgriffith | guitarzan: it's the iscsiadm attach that doesn't seem to work | 16:13 |
*** flaper87 is now known as flaperboon | 16:13 | |
guitarzan | well, he also said the login worked | 16:13 |
guitarzan | so that's definitely confusing | 16:14 |
dims | jgriffith, giulivo - i don't even see iscsiadm commands being run - looking at logstash using query - (@message:"4020e0dd-24a0-453b-985d-e50cb2dd0de1" OR @message:"iscsiadm") AND @fields.build_uuid:"dced339fa65543fd9e752d2581bc5cae" | 16:14 |
jgriffith | guitarzan: http://logs.openstack.org/64/47264/2/gate/gate-tempest-devstack-vm-full/dced339/logs/screen-n-cpu.txt.gz | 16:14 |
jgriffith | dims: I've given up on logstash for the time being | 16:14 |
guitarzan | jgriffith: yeah, I'm looking at that too | 16:14 |
jgriffith | dims: checkout the link above to the nova log | 16:14 |
jgriffith | 2013-09-24 04:44:17.515 | 16:15 |
dims | i see it | 16:15 |
dims | looks like we are losing information in logstash sigh. | 16:15 |
jgriffith | dims: that's what I concluded but thought maybe my queries just sucked ;) | 16:16 |
*** alcabrera is now known as alcabrera_afk | 16:16 | |
jeblair | jgriffith: http://logs.openstack.org/64/47264/2/gate/gate-tempest-devstack-vm-full/dced339/logs/screen-n-cpu.txt.gz#_2013-09-24_04_44_17_515 | 16:17 |
*** tvb has quit IRC | 16:17 | |
jeblair | jgriffith: the timestamps are hyperlinks to per-line targets | 16:17 |
jeblair | jgriffith: (so you can more easily share a link to a line) | 16:17 |
clarkb | morning | 16:17 |
jgriffith | jeblair: Nice!!! | 16:17 |
jeblair | jgriffith, guitarzan: sdague made a change yesterday that removes DEBUG lines from logstash | 16:17 |
jgriffith | jeblair: thank you! | 16:17 |
jeblair | jgriffith: sdague did the line-hyperlink too | 16:18 |
jgriffith | jeblair: Ahhhh, so it's not that I can't write a decent query to save my life ;) | 16:18 |
dims | :) | 16:18 |
jeblair | fungi: some rax nodes are going on 0.43 hours in building state :( | 16:19 |
* clarkb catches up on the state of things | 16:19 | |
fungi | wow | 16:19 |
jeblair | clarkb: there's a lot; short version, we're throwing levers to deal with check starvation; nothing needs immediate attention there | 16:20 |
dims | giulivo, jgriffith, 04:44:17.577 first try and exception is at 04:44:31.806 - maybe it just needs more time? | 16:20 |
giulivo | I don't know if there is nova folks around but after the latest iscsiadm --rescan attempt http://logs.openstack.org/64/47264/2/gate/gate-tempest-devstack-vm-full/dced339/logs/screen-n-cpu.txt.gz#_2013-09-24_04_44_22_710 we have 10 seconds of almost no logging before the stack trace | 16:20 |
*** gyee has joined #openstack-infra | 16:20 | |
openstackgerrit | A change was merged to openstack-infra/config: Use rackspace for tempest check tests https://review.openstack.org/48672 | 16:20 |
guitarzan | giulivo: 3**2 seconds maybe? :) | 16:21 |
clarkb | jeblair: does gearman honor NODE_LABEL? that was the biggest thing I was fuzzy on last night? | 16:21 |
jeblair | clarkb: zuul translates that into the job_name:label syntax for gearman | 16:21 |
*** odyssey4me has quit IRC | 16:21 | |
giulivo | it's 10 seconds after the last attempt | 16:21 |
jeblair | clarkb: we've never used that, so that's going to be exciting! | 16:22 |
clarkb | jgriffith: dims: we are removing DEBUG for a couple reasons the biggest being it adds an order of magnitude to the size of our indexes (2 weeks is ~600GB now but was ~5TB with DEBUG) but also DEBUG is largely useless noise | 16:22 |
clarkb | jgriffith: dims: also if there is information that pinpoints a bug and does not have anything logged at a higher level I would consider that to be a bug as well (if we fail it should be logged at something higher than INFO) | 16:23 |
clarkb | at least WARN imo | 16:23 |
jgriffith | clarkb: sure, don't get me wrong wasn't complaining | 16:23 |
clarkb | jeblair: cool | 16:23 |
jgriffith | clarkb: just pointing out that my queries never worked, and now I know why :) | 16:23 |
clarkb | jgriffith: I know, just trying to point out how we got here. It isn't perfect but is definitely more usable overall | 16:24 |
jgriffith | clarkb: I would agree WRT bumping up some of the log levels | 16:24 |
giulivo | jgriffith, in nova it looks like the iscsiadm --rescan is only attempted three times so I think this just never finds the volume after logging in | 16:24 |
jgriffith | clarkb: agreed | 16:24 |
dims | clarkb, thanks, understood | 16:24 |
jgriffith | giulivo: sorry... I was looking at something else, going back to something here | 16:24 |
guitarzan | giulivo: if it hasn't happened in 14 seconds, maybe it isn't going to happen? | 16:25 |
giulivo | https://github.com/openstack/nova/blob/master/nova/virt/libvirt/volume.py#L275 | 16:25 |
guitarzan | giulivo: you say there was never anything in kern.log about a new disk showing up? | 16:25 |
giulivo | guitarzan, ^^ yep | 16:25 |
giulivo | I think logging on the portal works but the volume is never found and as per nova code, after three failed attempts it reports failure | 16:25 |
giulivo | that explains why there isn't anything in the kernel log about the new block device | 16:26 |
dims | giulivo, so trying a few more times may help? | 16:27 |
giulivo | it is either the iscsiadm failing at --rescan | 16:27 |
jeblair | clarkb, fungi, mordred: look at 48423,2 on the status page | 16:27 |
jeblair | clarkb, fungi, mordred: mouseover the red dot | 16:27 |
giulivo | or the tgtd returning an okay to cinder before the lun is actually made available | 16:27 |
mordred | jeblair: yah | 16:28 |
jeblair | clarkb, fungi, mordred: you'll see the 'needed dependency is failing' logic in action | 16:28 |
clarkb | jeblair: awesome | 16:28 |
mordred | ++ | 16:28 |
fungi | nice, dependency failure | 16:28 |
clarkb | I mean not that it is failing but that the representation of it works :) | 16:28 |
giulivo | so based on that, I think this could help https://review.openstack.org/#/c/48626/ as we get tgtd in debug mode and can try to figure what it is doing when cinder provides it with the new volume | 16:28 |
fungi | i was hoping to eventually spot one of those in the wild with the new visualization | 16:28 |
fungi | also, holy test nodes graph batman | 16:29 |
clarkb | jog0: logstash is all caught up and appears to be keeping up | 16:30 |
pabelanger | fungi, I was about to say that... that is awesome! | 16:30 |
clarkb | jog0: so elastic-recheck probably doesn't need any fancy backoff stuff | 16:30 |
jeblair | here's an embiggened version: http://tinyurl.com/pj3kpj9 | 16:30 |
jeblair | (you have to remember to reload that one occasionally) | 16:30 |
jeblair | the orange peak near the end is the rackspace spinup | 16:31 |
jeblair | (and most of the ready nodes are rackspace) | 16:32 |
pabelanger | jeblair, what's the amount of time to actually spin up a node? Is that tracked some place? | 16:33 |
jeblair | pabelanger: it's in graphite (nodepool.launch.*), but i can tell you offhand we're seeing about 2 mins for hp and 16 for rackspace atm. | 16:35 |
dhellmann | sdague: https://github.com/dhellmann/cliff/tree/remove-cmd2 if you want to give it a spin | 16:35 |
*** Ryan_Lane has joined #openstack-infra | 16:35 | |
jog0 | mordred: it was a little bitchy, I was going for a public shaming. | 16:36 |
jog0 | clarkb: woot! | 16:37 |
clarkb | jeblair: chatted with zaro briefly over the wall (shame on us for not doing it here) to better understand the NODE_LABEL stuff and I am not entirely sure it will work as expected | 16:40 |
jeblair | clarkb: we're about to find out? | 16:40 |
clarkb | jeblair: because our project configs don't use the label devstack-precise-check there won't be any jobs for that project:label name in the gearman server | 16:40 |
clarkb | s/jobs/workers/ | 16:41 |
jeblair | clarkb: ah, yes, that label needs to be added | 16:41 |
*** boris-42 has quit IRC | 16:41 | |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Revert "Use rackspace for tempest check tests" https://review.openstack.org/48698 | 16:42 |
clarkb | jeblair: but we can't do that safely without another job | 16:42 |
jeblair | clarkb: i think we can. the param func should set the label in all cases | 16:42 |
giulivo | dims, guitarzan, jgriffith, sdague I'm sorry I've to leave but FWIW I'm of the idea that iscsiadm --rescan is failing at finding the volume after it logs in on the portal, the nova code checks for the device path 3 times but it never pops up so it raises, see https://github.com/openstack/nova/blob/master/nova/virt/libvirt/volume.py#L275 so I think putting tgtd on debug on the other side could help figure out what is going on (at both creation time and attach) https://review.openstack.org/#/c/48626/ | 16:43 |
clarkb | jeblair: so we need to have an else in that function that sets it to devstack-precise? that should work | 16:43 |
jeblair | clarkb: yeah, though to do it safely, i think we need to start by setting it to devstack-precise always, then change the job labels, then add the conditional | 16:43 |
jeblair | clarkb: it's getting complicated enough that we should re-evaluate adding jobs... | 16:44 |
clarkb | ++ | 16:44 |
jeblair | clarkb: the advantage of adding jobs is that we can say check jobs can run on either, which is a little bit of a release valve if rackspace can't keep up. | 16:45 |
*** odyssey4me has joined #openstack-infra | 16:45 | |
jeblair | clarkb: the disadvantage, obviously, is that the devstack jobs are a huge mess right now and we'd be making twice as many of them. | 16:45 |
*** dcramer_ has quit IRC | 16:45 | |
clarkb | yeah. What if we didn't treat them differently (rackspace runs the jobs in about as much time as hpcloud did them serially) | 16:46 |
clarkb | (just throwing ideas out there) | 16:46 |
jeblair | clarkb: rackspace runs them in about 1.5x the time, so we're looking at 60 minutes instead of 40. | 16:47 |
*** wchrisj_ has joined #openstack-infra | 16:47 | |
*** afazekas is now known as afazekas_zz | 16:47 | |
*** dcramer_ has joined #openstack-infra | 16:48 | |
*** giulivo has quit IRC | 16:48 | |
notmyname | clarkb: I'm just getting caught up this morning. status of gates? good to go, or still waiting? | 16:48 |
jeblair | clarkb: that might be the best approach. | 16:49 |
clarkb | notmyname: still in a bit of flux, but we are actively sorting it out | 16:49 |
notmyname | clarkb: kk, thanks | 16:49 |
jeblair | clarkb: what are we sorting out? | 16:49 |
clarkb | jeblair: node starvation? | 16:49 |
clarkb | oh talking about gate in particular | 16:49 |
openstackgerrit | A change was merged to openstack-infra/config: Revert "Use rackspace for tempest check tests" https://review.openstack.org/48698 | 16:50 |
*** beekneemech has quit IRC | 16:50 | |
jeblair | clarkb: i don't think notmyname needs to take any particular action other than not approving stable/grizzly changes, which he so rarely does anyway. :) | 16:50 |
clarkb | jeblair: gotcha | 16:50 |
clarkb | notmyname: ^ | 16:50 |
notmyname | clarkb: jeblair: ok, thanks :-) | 16:50 |
*** gyee has quit IRC | 16:51 | |
*** AlexF has quit IRC | 16:52 | |
*** dcramer_ has quit IRC | 16:52 | |
jog0 | clarkb: some files are missing from logstash | 16:52 |
jeblair | mordred, fungi, sdague: so clarkb and i were chatting, and we either need to (a) do like 3 more steps to set up the check jobs to use rackspace, (b) double the number of devstack jobs so the check ones use rackspace, or (c) say screw it and just throw rackspace nodes into the general pool (occasionally jobs will take 60 instead of 40 mins) | 16:53 |
jog0 | http://logstash.openstack.org/#eyJzZWFyY2giOiIgQGZpZWxkcy5idWlsZF9jaGFuZ2U6XCIzOTYyMVwiIEFORCBAZmllbGRzLmJ1aWxkX3BhdGNoc2V0OlwiMTJcIiBBTkQgQGZpZWxkcy5idWlsZF9uYW1lOlwiZ2F0ZS10ZW1wZXN0LWRldnN0YWNrLXZtLXBvc3RncmVzLWZ1bGxcIiBBTkQgQGZpZWxkcy5idWlsZF91dWlkOlwiZWUyZjI2OTMyNDVhNGRmYmFjNDA4YmY3YmEyNDZmNmVcIiIsImZpZWxkcyI6W10sIm9mZnNldCI6MCwidGltZWZyYW1lIjoiOTAwIiwiZ3JhcGhtb2RlIjoiY291bnQiLCJ0aW1lIjp7InVzZXJfaW50ZXJ2YWwiOjB9LCJtb2RlIjoidGVybXMiLCJhbmFseXplX2ZpZWx | 16:53 |
jeblair | mordred, fungi, sdague: thoughts? | 16:53 |
*** jerryz has joined #openstack-infra | 16:53 | |
jog0 | no screen-key isn't there | 16:54 |
clarkb | keystone should be there | 16:54 |
clarkb | we are missing ceilometer and one of the swift files (because the format of the swift file isn't conducive to indexing) | 16:54 |
* clarkb looks closer | 16:55 | |
* zaro says option c | 16:55 | |
jog0 | keystone is only missing sometimes | 16:55 |
mordred | jeblair: damn | 16:56 |
mordred | jeblair: I'm not convinced just more nodes in the pool will help - but you have just made excellent points | 16:56 |
jeblair | mordred: why wouldn't more nodes in the pool help? | 16:56 |
jeblair | mordred: that's pretty much what starvation means.... | 16:56 |
jog0 | oh and elasticSearch is really caught up, you weren't exaggerating. | 16:57 |
clarkb | jog0: http://logs.openstack.org/93/37893/11/check/gate-tempest-devstack-vm-neutron/303633a/logs/screen-key.txt.gz?level=INFO that is why | 16:57 |
jog0 | sdague: thanks!!! | 16:57 |
clarkb | jog0: basically no non DEBUG log lines according to apache | 16:57 |
mordred | jeblair: 2 things - slower nodes in the pool will increase the latency before resets potentially | 16:57 |
clarkb | jog0: but there are INFO lines in there so we have a bug | 16:57 |
jog0 | clarkb: oh :( | 16:57 |
jeblair | mordred: yes, slowing resets down mitigates starvation but slows gate throughput | 16:57 |
jog0 | turns out I don't need keystone yet so it's not a blocker | 16:58 |
clarkb | jog0: I know what is going on | 16:58 |
clarkb | jog0: I think keystone uses its special snowflake format and we don't handle that properly on the apache side | 16:58 |
clarkb | sdague: ^ | 16:58 |
jeblair | mordred: other thing? | 16:58 |
mordred | jeblair: nope. I think that was the thing. I was wrong about there being 2 | 16:58 |
*** dmakogon__ has joined #openstack-infra | 16:59 | |
jeblair | mordred: the steps in (a) aren't difficult, (b) is just a lot of typing, and (c) needs reconfiguration as well. i think all 3 choices will take about the same amount of time. | 16:59 |
jeblair | we get to choose on merits. | 17:00 |
mordred | jeblair: I like the end state of having check jobs running in rackspace | 17:00 |
mordred | because the slowness doesn't have a pile-on effect there | 17:00 |
*** DinaBelova has quit IRC | 17:00 | |
clarkb | jog0: sdague: I suddenly remember why logstash is so slow :) the number of cases you have to account for is a bit ridiculous | 17:00 |
*** matty_dubs|lunch is now known as matty_dubs | 17:00 | |
*** gyee has joined #openstack-infra | 17:01 | |
*** dstufft has quit IRC | 17:01 | |
clarkb | I think right now we only handle oslo format properly so swift and keystone aren't working | 17:01 |
jog0 | clarkb: yeah ... | 17:01 |
*** odyssey4me has quit IRC | 17:01 | |
clarkb | a quick fix would be to make the level configurable in the workers and only have >DEBUG on oslo formatted things | 17:01 |
clarkb | or sort it out in the wsgi app | 17:02 |
jeblair | okay, so the choice is between (a) run _only_ on rackspace, or (b) run on rackspace and hp, more or less at random according to the proportion of available nodes | 17:02 |
clarkb | sdague: ^ do you have an opinion on that? | 17:02 |
jog0 | clarkb: makes sense to me, but that may blow ElasticSearch way back again | 17:02 |
*** hashar has quit IRC | 17:02 | |
*** dstufft has joined #openstack-infra | 17:02 | |
clarkb | jog0: it shouldn't be too horrible. keystone and swift logs are smaller than the others | 17:02 |
jog0 | clarkb: hopefully | 17:02 |
*** MarkAtwood2 has joined #openstack-infra | 17:03 | |
jeblair | clarkb: i think it's only a partial regex to get the level anyway, so it may not be too complex to do in the app. | 17:03 |
jog0 | clarkb: on a related front I want to go ahead and make the elastic-search gerrit user | 17:03 |
jog0 | anything special to do that? | 17:03 |
*** SergeyLukjanov has quit IRC | 17:03 | |
jeblair | mordred: what are your feelings on a/b ? | 17:03 |
clarkb | jog0: one of the Gerrit admins (openstack-infra-core) needs to run a command | 17:03 |
markmcclain | jeblair: any update on manually pushing that quantumclient branch pypi? | 17:03 |
jeblair | markmcclain: did you ask us to? | 17:03 |
clarkb | jog0: probably get consensus on the name first (since it will potentially comment on lots of changes) | 17:04 |
jog0 | elastic-recheck? | 17:04 |
jeblair | why is it called recheck? | 17:04 |
jog0 | ala the recheck page we have | 17:04 |
jog0 | so use elasticSearch to make rechecks easier | 17:05 |
clarkb | hmm is it time to test asterisk? | 17:05 |
jeblair | clarkb: yes it is | 17:05 |
mordred | jeblair: I think b sounds richer long term | 17:05 |
jeblair | i was hoping we could at least reach a consensus on which of a/b/c to do about nodes... | 17:05 |
jeblair | mordred: yeah, so that means doubling the number of devstack jobs so there are check and gate versions | 17:06 |
mordred | jeblair: yeah. that's the least appealing part of b | 17:06 |
clarkb | maybe we can template those jobs and it won't be so horrible? | 17:07 |
jeblair | i mean, there may be opportunities for templating | 17:07 |
jeblair | so who wants to work on that? clarkb, zaro, mordred? | 17:07 |
mordred | jeblair: I am on the phone for the next 2 hours. | 17:08 |
clarkb | I can stab at it | 17:08 |
jeblair | mordred: i'm guessing that's a no, but i'm not sure ;) | 17:08 |
jeblair | clarkb: ok, thanks | 17:08 |
* mordred trying to bilk hp out of more headcount for us - so it's at least useful... | 17:08 | |
jeblair | russellb, pabelanger: around? | 17:08 |
fungi | eek, more scrollback | 17:09 |
jog0 | mtreinish: ping | 17:09 |
markmcclain | jeblair: I thought so, but I might not have made it clear | 17:09 |
*** odyssey4me has joined #openstack-infra | 17:09 | |
jeblair | mordred: can you release the quantumclient branch to pypi? | 17:10 |
mordred | jeblair: sure | 17:10 |
pabelanger | jeblair, indeed | 17:10 |
markmcclain | that review is going to require a manual merge first | 17:10 |
markmcclain | because that branch won't clear the gate | 17:10 |
jeblair | mordred: oh, so, er, can you force merge the review markmcclain is about to link for you, and then manually release it? :) | 17:10 |
markmcclain | mordred: https://review.openstack.org/#/c/48364/ | 17:10 |
clarkb | sdague: if you get a free moment it would be great if you could stab at making the wsgi app regex more flexible to handle keystone and in the case of swift probably just pass it all through | 17:11 |
mordred | jeblair: yes | 17:11 |
clarkb | since swift doesn't do log levels... | 17:11 |
notmyname | ???? | 17:11 |
fungi | i think a phased approach with rackspace nodes dumped into the general pool for starters makes sense, then take time to be able to separate pipelines to different providers in the ways which will make jenkins happy longer-term. i'm not super-keen on doubling the devstack job definitions, but maybe that's just unfounded ocd on my part | 17:11 |
clarkb | notmyname: http://logs.openstack.org/83/42283/37/check/gate-tempest-devstack-vm-full/4465ed4/logs/screen-s-account.txt.gz?level=DEBUG we are doing level based filtering of logs | 17:11 |
*** DennyZhang has quit IRC | 17:11 | |
clarkb | notmyname: but since swift doesn't have level based logs the filtering derps and removes everything | 17:11 |
mordred | markmcclain: do we want to tag that as a particular version? | 17:12 |
notmyname | clarkb: all swift processes support syslog facilities and log level filters: https://github.com/openstack/swift/blob/master/etc/proxy-server.conf-sample#L24 | 17:12 |
*** yolanda has quit IRC | 17:12 | |
clarkb | notmyname: but that only works with syslog? | 17:13 |
mordred | markmcclain: like, what version should be released to pypi? | 17:13 |
clarkb | notmyname: syslog doesn't like us when we run devstack; it falls over pretty spectacularly | 17:13 |
jeblair | fungi: bummer, loss of consensus. i actually think that (b) is the safest from the pov that it's least likely to break things if rackspace can't keep up (or we decide to reduce its node supply). | 17:13 |
jog0 | clarkb: so for the elastic-search gerrit user .. now that ElasticSearch is blazingly fast I want to get the bot up, on my own RAX server | 17:14 |
markmcclain | mordred: 2.2.4 | 17:14 |
jeblair | fungi: it sucks that it adds so many jobs, but maybe templating will help | 17:14 |
* fungi is still reading the last 20 minutes of scrollback, which will take about 20 minutes, at which point there will be another 20 minutes of scrollback | 17:14 | |
*** odyssey4me has quit IRC | 17:14 | |
mordred | markmcclain: can't do that - neutronclient already has that tag :) | 17:14 |
mordred | markmcclain: how about 2.2.4.1 ? | 17:14 |
jeblair | clarkb, fungi, mordred: can we go ahead and merge these changes before clarkb starts? https://review.openstack.org/#/c/48547/ https://review.openstack.org/#/c/48635/ | 17:14 |
jeblair | pabelanger: i'm available to dial in | 17:15 |
jeblair | anteaya, zaro, fungi, clarkb: are you available for conferencing? | 17:15 |
jeblair | anyone else? | 17:15 |
anteaya | jeblair: oh yeah | 17:15 |
markmcclain | that will work | 17:15 |
mordred | jeblair: done | 17:16 |
clarkb | jeblair: yes, will be slightly distracted by job config stuff though | 17:16 |
* zaro is available | 17:17 | |
jeblair | pabelanger: let us know when you have pbx.o.o configured the way you want | 17:17 |
*** derekh has quit IRC | 17:17 | |
*** MarkAtwood has joined #openstack-infra | 17:17 | |
mordred | markmcclain: released | 17:18 |
mordred | jeblair: the jobs should fail, but I think I should push the tag back to gerrit anyway, what do you think? | 17:18 |
fungi | jeblair: regarding loss of consensus, i'm still catching up on what the consensus was | 17:18 |
jeblair | mordred: yes | 17:18 |
mordred | done | 17:18 |
markmcclain | mordred: thanks | 17:19 |
jeblair | fungi: (b) the one you didn't like because it adds lots of jobs | 17:19 |
*** kgriffs has left #openstack-infra | 17:19 | |
jeblair | fungi: i mean, none of us like it because it adds lots of jobs | 17:19 |
clarkb | sdague: if we set the default starting sev to ERROR that should handle the swift case but will make the screen lines always show up... | 17:19 |
*** bnemec has joined #openstack-infra | 17:20 | |
fungi | jeblair: yeah i can switch rooms and jump into the pbx in a bit. just trying to finish reading the discussion in here first | 17:20 |
*** odyssey4me has joined #openstack-infra | 17:20 | |
*** reed_ is now known as reed | 17:22 | |
*** senk has quit IRC | 17:22 | |
openstackgerrit | A change was merged to openstack-infra/config: Make gate-tempest-devstack-vm-large-ops voting https://review.openstack.org/48547 | 17:22 |
fungi | jeblair: clarkb: sdague: mordred: if adding duplicate jobs is the safest and most pragmatic solution, then i agree it makes sense to take that route (no need to add features to support that) | 17:22 |
*** johnthetubaguy1 has quit IRC | 17:22 | |
*** reed has quit IRC | 17:22 | |
*** reed has joined #openstack-infra | 17:22 | |
pabelanger | jeblair, sure, give me a minute, trying to fix some errors on the pbx | 17:24 |
*** ryanpetrello has quit IRC | 17:25 | |
openstackgerrit | A change was merged to openstack-infra/config: add gate-tempest-devstack-vm-neutron-pg job https://review.openstack.org/48635 | 17:26 |
harlowja | qq for ya'll | 17:31 |
harlowja | if anybody has some free secs | 17:31 |
*** alcabrera_afk is now known as alcabrera | 17:31 | |
pabelanger | jeblair, is multiple asterisk boxes still up? | 17:33 |
*** wchrisj_ has quit IRC | 17:33 | |
pabelanger | okay pbx.o.o is fixes | 17:33 |
pabelanger | fixed* | 17:33 |
jeblair | pabelanger: maybe? i can check, but i think voipms is configured for pbx.o.o | 17:34 |
jeblair | pabelanger: yeah, the others are still around if we need them. | 17:34 |
anteaya | I'm in | 17:35 |
*** MarkAtwood2 has quit IRC | 17:35 | |
*** MarkAtwood2 has joined #openstack-infra | 17:35 | |
fungi | yeah, they keep e-mailing me about pending updates/needed reboots but since they don't have a domain configured they don't match my cronspam filters and land in my inbox instead | 17:36 |
*** hemnafk is now known as hemna_ | 17:36 | |
fungi | so pretty sure they're still up | 17:36 |
pabelanger | jeblair, okay, seems to be working now | 17:36 |
* zaro is in conference | 17:36 | |
harlowja | so just a question that the taskflow team is having, we'd like to run our tests against a real mysql instance (or maybe even postgres) instead of just sqlite (especially the migration part) and was wondering if there is any standard process to go through to make that happen? | 17:37 |
anteaya | my skype crashed, back now | 17:37 |
anteaya | and I am out again, my skype keeps crashing | 17:38 |
harlowja | :( | 17:38 |
anteaya | new laptop just installed it | 17:38 |
anteaya | sigh | 17:38 |
jeblair | i only hear silence now | 17:41 |
pabelanger | I am tweaking the time while you are talking to see if there is any noticeable impact | 17:41 |
pabelanger | so, there might be some chop | 17:42 |
pabelanger | I increased the threshold | 17:42 |
jeblair | it came back | 17:42 |
pabelanger | yup | 17:42 |
pabelanger | lowering it again | 17:42 |
pabelanger | okay | 17:42 |
pabelanger | back to 1000ms | 17:42 |
pabelanger | (the sweet spot, so far) | 17:42 |
jog0 | clarkb: until we sort out the gerrit user for elastic-recheck just using my own user | 17:44 |
pabelanger | wow | 17:45 |
*** reed has quit IRC | 17:48 | |
*** SergeyLukjanov has joined #openstack-infra | 17:50 | |
*** dizquierdo has left #openstack-infra | 17:50 | |
anteaya | my skype died | 17:51 |
pabelanger | anteaya, okay | 17:52 |
*** sarob has joined #openstack-infra | 17:52 | |
anteaya | I'm pm'ing fungi for the rest | 17:52 |
*** melwitt has joined #openstack-infra | 17:52 | |
*** nati_ueno has joined #openstack-infra | 17:52 | |
*** odyssey4me has quit IRC | 17:54 | |
*** boris-42 has joined #openstack-infra | 17:58 | |
*** Ajaeger has joined #openstack-infra | 18:01 | |
pabelanger | back to 1000ms for silence | 18:02 |
pleia2 | if it would be helpful to have me join the call too let me know, I got distracted by my baremetal testing strace finally working (hooray) | 18:02 |
pleia2 | well, the failure appearing so I could strace it anyway :) | 18:02 |
*** odyssey4me has joined #openstack-infra | 18:03 | |
jog0 | it's scary watching the elastic-recheck bot in openstack-qa | 18:05 |
* fungi is afraid to look | 18:06 | |
jog0 | sdague: ping | 18:07 |
jog0 | for bug 1230407 | 18:07 |
uvirtbot | Launchpad bug 1230407 in neutron "VMs can't progress through state changes because Neutron is deadlocking on it's database queries, and thus leaving networks in inconsistent states" [Critical,Confirmed] https://launchpad.net/bugs/1230407 | 18:07 |
jog0 | what would be a better query to use for that one | 18:07 |
*** dcramer_ has joined #openstack-infra | 18:07 | |
jog0 | something like @message:"Lock wait timeout exceeded" AND @fields.filename:"logs/screen-q-svc.txt" AND @fields.build_status:"FAILURE" ? | 18:09 |
*** DinaBelova has joined #openstack-infra | 18:09 | |
devananda | wsme seems to be broken? | 18:09 |
*** julim has quit IRC | 18:10 | |
devananda | clarkb: what's the interface to do searches on recent jenkins failures? | 18:12 |
jog0 | devananda: logstash.openstack.org | 18:13 |
fungi | devananda: comes to us from the distant past of monday | 18:14 |
fungi | with news of wsme issues | 18:14 |
*** dmakogon__ has quit IRC | 18:14 | |
*** dmakogon_ has joined #openstack-infra | 18:15 | |
jeblair | clarkb, fungi, mordred, jhesketh: https://review.openstack.org/48684 | 18:16 |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Make devstack jobs templates and create check jobs https://review.openstack.org/48714 | 18:16 |
jeblair | clarkb, fungi, mordred, jhesketh: if we merge that soonish, we can probably manage a zuul restart over the weekend to pick it up | 18:16 |
*** alexpilotti has quit IRC | 18:16 | |
devananda | fungi: wait. wsme's been broken since monday? | 18:17 |
*** odyssey4me has quit IRC | 18:20 | |
sdague | jog0: i'd actually narrow the message to - "Lock wait timeout exceeded; try restarting transaction" | 18:20 |
anteaya | pleia2: hooray | 18:20 |
*** dcramer_ has quit IRC | 18:21 | |
devananda | fungi: logstash suggests that it broke ~4hr ago with the new upload of pecan | 18:21 |
devananda | http://bit.ly/1606Cmj | 18:21 |
* sdague just got back from lunch + bike ride, scrolling back | 18:22 | |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Make devstack jobs templates and create check jobs https://review.openstack.org/48714 | 18:22 |
jog0 | sdague: awesome thanks | 18:22 |
jog0 | @message:"Lock wait timeout exceeded; try restarting transaction" AND @fields.filename:"logs/screen-q-svc.txt" AND @fields.build_status:"FAILURE" looks good | 18:22 |
clarkb | sdague: tl;dr is the wsgi log filter doesn't handle swift and keystone | 18:22 |
clarkb | sdague: because they are not oslo format | 18:23 |
clarkb | sdague: for swift I think we just want to let them pass through and for keystone we may need a slightly more forgiving regex | 18:24 |
*** dmakogon_ has quit IRC | 18:24 | |
clarkb | but let me know what you think | 18:24 |
*** dmakogon_ has joined #openstack-infra | 18:24 | |
sdague | clarkb: sure. So what I should actually do is get some unit testing for this in tree so we can dump in a bunch of sample logs and make sure it works | 18:25 |
clarkb | ++ | 18:25 |
sdague | clarkb: is there a pattern already for unit testing things in the config tree? | 18:25 |
clarkb | nope | 18:25 |
clarkb | sdague: but we do have a tox.ini | 18:26 |
clarkb | sdague: so you should be able to make use of that | 18:26 |
clarkb | or | 18:26 |
*** nati_ueno has quit IRC | 18:26 | |
clarkb | we could split this into a proper project | 18:26 |
sdague | yeh, I'm mixed on that, it seems like more trees end up just being more complexity | 18:26 |
clarkb | ya there are tradeoffs | 18:27 |
*** nati_ueno has joined #openstack-infra | 18:27 | |
*** odyssey4me has joined #openstack-infra | 18:27 | |
*** zaro is now known as list | 18:27 | |
*** list is now known as zaro | 18:27 | |
dims | jog0, i added some notes, basically 4 SQL statements hit this | 18:28 |
sdague | dims: they are all basically the same fail though right? | 18:28 |
sdague | I think that neutron fail moves around | 18:28 |
dims | sdague, all 4 SQL's end up with "Lock wait timeout exceeded; try restarting transaction" - yes. | 18:28 |
*** ryanpetrello has joined #openstack-infra | 18:29 | |
jog0 | fun | 18:29 |
sdague | jog0: well it's a database deadlock | 18:29 |
sdague | so that's kind of expected | 18:29 |
sdague | as whoever gets there last loses, and that's going to change | 18:29 |
*** rockyg has joined #openstack-infra | 18:30 | |
dims | sdague, y | 18:30 |
sdague | jeblair: you around? I want to get your opinion of trying to bring unit testing into config vs. breaking out to a separate project | 18:30 |
sdague | clarkb, jeblair: on the rax nodes, I'd say general pool. they should be running in 45min max I think (they were about 40% slower for devstack runs) | 18:31 |
sdague | at least short term | 18:31 |
sdague | check queue down to 20, nice. Much better than 190 | 18:32 |
jeblair | sdague: well, we sort of settled on plan (b) which was still to just use rax nodes exclusively for check, but to also allow hp nodes to contribute to check. clarkb just finished the change here: https://review.openstack.org/#/c/48714/ | 18:33 |
sdague | ok, that's cool too | 18:33 |
jeblair | sdague: since the hard part is done, we might as well keep going with it, for now at least. can always change later. :) | 18:33 |
sdague | yep | 18:33 |
Ajaeger | clarkb: do you have a few minutes to discuss https://review.openstack.org/#/c/47691/ ? I'd like to know whether and how to rename the manual jenkins jobs | 18:34 |
jeblair | sdague: i'm fine either way on testing, but i feel like by the time something needs a unit test, that's one of the signals that it's probably time for it to be its own project. we have high hopes for this thing anyway. i think splitting is a good idea, but am not opposed to more 'incubation' if you're not quite ready. | 18:35 |
sdague | sure, though I do think all the python in the config tree should have tests anyway :) solving a framework to make that easier would be good at some point. | 18:36 |
sdague | but I expect we'll use some of the log parsing for other things here, so let me split this out | 18:37 |
*** reed has joined #openstack-infra | 18:37 | |
dansmith | wow, that monster check queue dumped pretty quick :) | 18:37 |
jeblair | sdague: i think the thing is that mostly we don't think there should be very much python in the config tree. a quick look suggests we're pretty close to that. | 18:38 |
sdague | dansmith: you can thank mtreinish and tempest testr for that. We can actually chew through it pretty quick when not starved :) | 18:39 |
dansmith | sdague: I know why it's faster, I'm just saying I would have expected it to take longer than a couple hours given how huge it was | 18:40 |
jeblair | dansmith: we threw 300 machines at it. | 18:40 |
sdague | nice :) | 18:41 |
dansmith | jeblair: ah | 18:41 |
dansmith | I try not to throw my machines around, personally, but.. thanks anyway :) | 18:41 |
sdague | oh, hey, yeh I didn't see the bottom graph | 18:41 |
sdague | that's pretty awesome | 18:41 |
jeblair | dansmith: that's how we roll here | 18:41 |
clarkb | Ajaeger: yes, actually something similar to what I have done to sort out devstack-gate stuff may help | 18:41 |
dansmith | jeblair: props, yo. | 18:41 |
clarkb | Ajaeger: but basically have a single project entry called openstack-manuals that covers all of the various subsets | 18:41 |
Ajaeger | clarkb, let me check devstack-gate in projects.yaml | 18:42 |
clarkb | Ajaeger: https://review.openstack.org/#/c/48714/2/modules/openstack_project/files/jenkins_job_builder/config/projects.yaml the section starting on line 917 then splits out the subsets | 18:42 |
clarkb | Ajaeger: ^ is where you should look | 18:42 |
Ajaeger | clarkb: thanks for the reference | 18:43 |
*** _david_ has joined #openstack-infra | 18:44 | |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Make devstack jobs templates and create check jobs https://review.openstack.org/48714 | 18:45 |
clarkb | I am hopeful that ^ will actually compile correctly | 18:46 |
jeblair | sdague: https://jenkins01.openstack.org/job/gate-tempest-devstack-vm-neutron-pg/ | 18:47 |
_david_ | clarkb, mordred, jeblair i am working on WIP Gerrit-Plugin against Gerrit master (upcoming 2.8 release) and hope to have something working in a few days | 18:47 |
clarkb | jeblair: fungi mordred ^ that passes. I think it is ready if you are, but I am going to lunch shortly | 18:47 |
jeblair | sdague: https://jenkins02.openstack.org/job/gate-tempest-devstack-vm-neutron-pg/ | 18:47 |
clarkb | _david_: oh | 18:47 |
clarkb | _david_: you should've told us earlier :) | 18:47 |
clarkb | zaro: ^ | 18:48 |
_david_ | i did | 18:48 |
jeblair | clarkb: i believe _david_ is up to date on our efforts | 18:48 |
*** MarkAtwood has quit IRC | 18:48 | |
clarkb | ah cool | 18:48 |
clarkb | it is I who is behind | 18:48 |
jeblair | clarkb: i think _david_ has a different risk profile with respect to working on gerrit and contributing upstream. :) | 18:48 |
_david_ | with recent changes it is actually trivial thing to do | 18:48 |
_david_ | jeblair, ;-) | 18:49 |
jeblair | _david_: neat. do you think it will be an in-tree plugin, or a separate project? | 18:49 |
Ajaeger | clarkb: so, something like this: http://paste.openstack.org/show/47615/ ? | 18:50 |
_david_ | jeblair, what exactly do you mean by in-tree plugin? | 18:50 |
jeblair | _david_: will it be in the gerrit repository, or a different one? | 18:51 |
zaro | _david_: hi, did you comment on https://gerrit-review.googlesource.com/#/c/48254 | 18:51 |
jeblair | _david_: (sorry, i haven't ever used a gerrit with plugins, i don't really know how they are maintained) | 18:51 |
_david_ | jeblair, that's a good question | 18:51 |
_david_ | zaro, yes, it was /me | 18:51 |
*** mrodden has quit IRC | 18:51 | |
*** dkliban has quit IRC | 18:51 | |
fungi | devananda: ahh, new breakage then. that i think was the latest version trying to get us out of dependency hell in grizzly | 18:52 |
_david_ | jeblair, the only problem i see (may be we have more) that Change.State.WORKINPROGRESS and DashboardAccount should be extended in core and can be influenced by plugin, | 18:52 |
_david_ | well at least not yet. | 18:52 |
fungi | devananda: dhellmann would probably be interested in your logstash link there | 18:53 |
_david_ | So here is my prototype for WorkInProgressAction (against Master): | 18:53 |
_david_ | http://pastebin.com/rZ9b8YCZ | 18:53 |
zaro | _david_: ohh ok. looks like difference of opinion going on. hope it gets resolved soon. | 18:53 |
_david_ | jeblair, concerning place: we have two options | 18:53 |
_david_ | on gerrit-review or on openstack, right? | 18:53 |
dhellmann | fungi, devananda : ryan is working on the problem | 18:54 |
_david_ | maybe we would still need a very small core patch to make it work, | 18:54 |
dhellmann | fungi, devananda : but any debugging details you have may help | 18:54 |
fungi | awesome, thanks dhellmann and ryan! | 18:54 |
jog0 | sdague: for bug https://bugs.launchpad.net/bugs/1230407 | 18:54 |
uvirtbot | Launchpad bug 1230407 in neutron "VMs can't progress through state changes because Neutron is deadlocking on it's database queries, and thus leaving networks in inconsistent states" [Critical,Confirmed] | 18:54 |
_david_ | but i hope to convince the guys to make it work against upstream gerrit | 18:54 |
jog0 | http://logs.openstack.org/70/44670/3/gate/gate-tempest-devstack-vm-neutron/02d68e3/logs/screen-q-svc.txt.gz?level=TRACE | 18:54 |
ryanpetrello | yep, seems to be some sort of issue introduced in the pecan/wsme plugin w/ today's pecan release | 18:55 |
ryanpetrello | debugging... | 18:55 |
*** julim has joined #openstack-infra | 18:55 | |
dhellmann | if we're seeing gate blockages, I can propose a change to pin pecan for now | 18:55 |
zaro | _david_: what is the difference between your WIP plugin and my patch to upstream? | 18:56 |
ryanpetrello | +1 | 18:56 |
*** sodabrew has joined #openstack-infra | 18:56 | |
_david_ | zaro, i don't understand that question | 18:56 |
*** sarob_ has joined #openstack-infra | 18:56 | |
zaro | _david_: ohh, i implemented that patch so we can create a custom WIP vote and you are creating a WIP plugin. so i'm just asking what would be the difference? | 18:57 |
_david_ | zaro, wip plugin is a 1 to 1 migration of Shrews's change https://gerrit-review.googlesource.com/36091 against latest master with maybe 10 lines of upstream patch (for now) | 18:58 |
*** dkliban has joined #openstack-infra | 18:58 | |
*** dcramer_ has joined #openstack-infra | 18:58 | |
zaro | _david_: ahh i see. thx for the clarification. | 18:59 |
sdague | jog0: what's the question? | 18:59 |
sdague | sorry so many pings | 18:59 |
openstackgerrit | Doug Hellmann proposed a change to openstack/requirements: Pin pecan to avoid the latest release https://review.openstack.org/48722 | 18:59 |
_david_ | zaro, in the handling: you don't vote with a label, you just mark it, as in Shrews's change, directly on the change screen | 18:59 |
dhellmann | fungi, devananda : I opened https://bugs.launchpad.net/pecan/+bug/1232199 for tracking rechecks and the real fix | 18:59 |
uvirtbot | Launchpad bug 1232199 in pecan "release 0.4 breaks some operations with WSME" [Undecided,New] | 18:59 |
Shrews | ugh, don't remind me of that horrific coding experience | 19:00 |
jog0 | sdague: I'll move this to the qa room where it's a little less noisy | 19:00 |
_david_ | Shrews, why? ,-) | 19:00 |
*** sarob has quit IRC | 19:00 | |
clarkb | Ajaeger: yes. would need to check the output to be sure though | 19:01 |
*** sarob_ has quit IRC | 19:01 | |
devananda | dhellmann: thanks! judging by logstash, i suspect ceilometer and ironic are blocked on this, but nothing else is showing up yet | 19:02 |
fungi | Shrews: pick a time and i'll join you while you drown your memories of that project in a few pints. olaph too | 19:02 |
dhellmann | devananda: ok, good. see that changeset a few lines back for a requirements pin to work around it for now | 19:02 |
Shrews | fungi: yes, we should make that happen | 19:02 |
fungi | Shrews: olaph: the lynnwood grill next door to me just started a brewery recently, and now have several kinds ready for consumption on premises | 19:03 |
_david_ | Shrews, i wonder about that comment in your code: WORKINPROGRESS ... It implies that there is more work to be done, but the change will not show up in any review lists until a new patch set is pushed. | 19:03 |
*** vipul is now known as vipul-away | 19:03 | |
Ajaeger | clarkb: sure, this was untested, just wanted to know whether I'm on the right track. | 19:04 |
Shrews | _david_: Where? Which comment? | 19:05 |
_david_ | git push converts it? Is that true? Or does the change owner have to explicitly convert it to Status.NEW? | 19:05 |
_david_ | https://gerrit-review.googlesource.com/#/c/36091/1/gerrit-reviewdb/src/main/java/com/google/gerrit/reviewdb/client/Change.java | 19:05 |
_david_ | line 285 | 19:05 |
*** yolanda has joined #openstack-infra | 19:06 | |
Shrews | _david_: The intent is that, along with clarkb's patch, any WIP review would not show up in a reviewer's list. Once a new patchset is pushed to a WIP review, it becomes "Ready for review" again. | 19:07 |
Shrews | does that answer your question? not sure exactly what you're looking for | 19:07 |
jeblair | clarkb: i spot checked the output of your change locally, lgtm | 19:08 |
clarkb | its basically a public draft | 19:08 |
_david_ | Shrews, Can you point me to where that conversion takes place? | 19:08 |
_david_ | I thought you have two buttons: WIP and Ready for review? | 19:08 |
Shrews | _david_: i don't think it was recorded. it was mainly discussed in this channel | 19:08 |
jeblair | Shrews: 'conversion' not 'conversation' | 19:09 |
jeblair | Shrews: (i did the same thing, finally read it right the 3rd time) | 19:09 |
Shrews | oh, duh | 19:09 |
fungi | conservation | 19:09 |
_david_ | 1/ git push => Status.NEW | 19:09 |
jeblair | the conversion conversation was not conserved | 19:09 |
_david_ | 2/ i click on WIP button => Status.WIP | 19:09 |
_david_ | my question: how am i supposed to get back to Status.NEW again? | 19:10 |
_david_ | All use cases please ;-) | 19:10 |
jeblair | fungi, mordred: https://review.openstack.org/48714 | 19:10 |
fungi | jeblair: yep, almost through reading that one | 19:10 |
Shrews | _david_: case 1) new patchset uploaded, case 2) press R4R button. fin | 19:10 |
Shrews | _david_: I don't remember the code well enough to point you to specific areas | 19:11 |
notmyname | mordred: FYI https://review.openstack.org/#/c/48724/ | 19:11 |
_david_ | Shrews, i didn't find where case 1) is in the code. can you pint me? | 19:11 |
_david_ | point | 19:11 |
mordred | 18:00:36 hub_cap | one of my beefs is that i scream, fucking SCREAM at people internally | 19:11 |
jeblair | hub_cap: i hear your screams from here | 19:12 |
mordred | notmyname: responded | 19:13 |
mordred | notmyname: swear swear swear grumble grumble swear swear | 19:13 |
mordred | notmyname: I kept my comment short, to keep the swearing out, fwiw | 19:13 |
Shrews | _david_: https://gerrit-review.googlesource.com/#/c/36091/1/gerrit-httpd/src/main/java/com/google/gerrit/httpd/rpc/changedetail/ChangeDetailFactory.java | 19:13 |
*** basha has joined #openstack-infra | 19:13 | |
Shrews | _david_: I *think*. Like I said, I really can't remember the code too well | 19:14 |
notmyname | mordred: and I'm the one who has to play the diplomat standing between the 2 of you ;-) | 19:14 |
mordred | notmyname: lovely | 19:14 |
mordred | well, his patch is completely non-functional | 19:14 |
mordred | like, it's not even close to being functional. it looks like a patch made in anger with absolutely no thought | 19:14 |
jeblair | notmyname: do you know if michael barton is planning on submitting a similar patch to the other 56 openstack projects? | 19:15 |
notmyname | mordred: try not to review in that way ;-) | 19:15 |
_david_ | Shrews, i don't think so, that's where you set whether the button on Views should be enabled or not | 19:15 |
mordred | notmyname: I will not | 19:15 |
*** mrodden has joined #openstack-infra | 19:15 | |
mordred | notmyname: I am not, in fact, going to review it further | 19:15 |
jeblair | (because if not, it may not be as well thought out as the patch that added in pbr) | 19:15 |
notmyname | mordred: jeblair: and, like I said, FYI. | 19:15 |
Shrews | _david_: well, i don't remember then | 19:15 |
mordred | notmyname: I believe "all of the openstack projects use it and it plays a key role in release management" should be clear enough | 19:15 |
fungi | clarkb: no need for a check-tempest-devstack-vm-heat-slow since gate-blah is only in the experimental pipeline? | 19:15 |
*** jcoufal has joined #openstack-infra | 19:16 | |
notmyname | mordred: yes, but "the way things are" is not a compelling argument for most people. /me being a diplomat | 19:16 |
clarkb | fungi right | 19:16 |
_david_ | Shrews, and you are absolutely sure that it is implemented? | 19:16 |
mordred | notmyname: I understand. but sometimes here, with as many projects as we have, I cannot make 56 different long-form arguments to everyone who would just happen to have chosen to solve the problem differently | 19:17 |
Shrews | _david_: If it isn't then I don't know how review.o.o has been working that way for the last umpteen months | 19:17 |
notmyname | mordred: agreed | 19:17 |
*** DinaBelova has quit IRC | 19:17 | |
jeblair | notmyname: i, and i'm sure many others, agree with you. something about the fact that he chose to propose that patch without even trying to understand why things are the way they are rankles a bit. | 19:17 |
mordred | notmyname: thank you, btw, for diplomating here | 19:17 |
* lifeless is curious about which patch is being discussed; couldn't find the start of the conversation | 19:17 | |
mordred | lifeless: https://review.openstack.org/#/c/48724/1 | 19:18 |
*** MarkAtwood2 has quit IRC | 19:18 | |
notmyname | jeblair: yes, but from the opposite perspective, pbr is making his day-to-day life more difficult without offering any perceived benefit (ie he now has to repackage the library himself instead of using something on pypi, and it includes more dependencies that may also need to be repackaged too) | 19:19 |
notmyname | jeblair: note I'm not arguing against pbr here | 19:20 |
jeblair | notmyname: yep. pbr makes some things more and some things less difficult, no argument there. attempting to delete it is a strange way of learning about what those are and what solutions there may be to his problems. | 19:22 |
notmyname | jeblair: we don't need to rehash long-form arguments about pbr here or now. I'll see what can be done | 19:23 |
lifeless | mordred: notmyname: huh, what I find interesting is the lack of attempt to understand - did he file a bug on pbr and the situation it fails in? | 19:24 |
mordred | notmyname: http://paste.openstack.org/show/47621/ | 19:25 |
mordred | that is what I would like to respond | 19:25 |
mordred | notmyname: he does not have to repackage the library himself | 19:25 |
openstackgerrit | A change was merged to openstack-infra/config: Make devstack jobs templates and create check jobs https://review.openstack.org/48714 | 19:25 |
notmyname | mordred: thanks. gotta run to a lunch meeting... | 19:26 |
mordred | notmyname: if he would read the documentation put together for packagers, he would see that he has to set an env var | 19:26 |
mordred | notmyname: thank you! | 19:26 |
*** basha has quit IRC | 19:27 | |
hub_cap | lol mordred | 19:28 |
hub_cap | jeblair: u might be able to hear those screams | 19:28 |
jeblair | fungi, mordred, clarkb: Make devstack jobs templates and create check jobs just merged; exciting things should be happening soon | 19:28 |
clarkb | fingers are crossed | 19:28 |
* mordred waits | 19:28 | |
*** odyssey4me has quit IRC | 19:28 | |
jeblair | hub_cap: i have been hearing a lot of sirens recently; do you have something to do with that? | 19:28 |
openstackgerrit | Jeremy Stanley proposed a change to openstack-infra/config: Determine the package name when uploading to PyPI https://review.openstack.org/46805 | 19:29 |
hub_cap | nope. it could be the band of gypsies that have set up shop on dwight... a big bus of em, and some sleeping in cars in the area | 19:29 |
openstackgerrit | A change was merged to openstack-infra/config: Determine the package name when uploading to PyPI https://review.openstack.org/46805 | 19:31 |
fungi | i'm not sure exciting is what i want out of my evening... here's hoping it's exciting in a good way and not in the usual way | 19:31 |
jeblair | i'm going to run puppet on jenkins masters manually to make that happen a bit faster and smoother | 19:33 |
jeblair | (to minimize the time that the check jobs don't exist before zuul reloads and starts using them) | 19:34 |
*** odyssey4me has joined #openstack-infra | 19:36 | |
jswarren | bnemec, if you're not busy fighting neutron or any other component, https://review.openstack.org/#/c/46553 seems to have settled down a bit in case you're up for another look. I seem to have a talent for finding problems to work on that are not straightforward to explain and whose solutions are not easy to justify concisely. Just lucky, I guess. | 19:40 |
jeblair | zuul change is going in now | 19:41 |
jswarren | oops, wrong channel. | 19:41 |
*** wchrisj_ has joined #openstack-infra | 19:46 | |
ryanpetrello | FYI, I have a review open for pecan which will resolve the WSME issue | 19:46 |
*** CaptTofu has joined #openstack-infra | 19:47 | |
fungi | mordred: is the current thinking that pbr should only be a setup_requires a la https://git.openstack.org/cgit/openstack-infra/git-review/tree/setup.py#n20 | 19:47 |
*** jswarren has quit IRC | 19:47 | |
fungi | mordred: because basically all of the clients still have it listed in their requirements.txt as if it were a runtime requirement | 19:48 |
fungi | which i can see potentially confusing downstream/distro package maintainers | 19:48 |
jgriffith | jeblair: clarkb interested in changing the settings in nova.conf in the gate..... not sure what repo/where the best place to do that is? | 19:51 |
jgriffith | jeblair: clarkb I'd like to bump CONF.num_iscsi_scan_tries | 19:51 |
mordred | fungi: it depends on whether or not they use it at runtime | 19:52 |
fungi | ahh | 19:52 |
mordred | fungi: for version processing | 19:52 |
fungi | mordred: got it... the bits which are in the process of being moved to oslo | 19:52 |
fungi | jgriffith: for devstack-gate jobs? if it makes sense to be adjusted as a default behavior for devstack, then in devstack. if it's really very specific to how we're testing things and not generally helpful (or potentially harmful) to other devstack use cases, then overriding in devstack-gate would be appropriate | 19:53 |
fungi | but we try to keep devstack-gate from changing devstack defaults if at all possible, so that we don't "test with devstack" using configurations dissimilar to the way other people run devstack in general | 19:55 |
*** rfolco has quit IRC | 19:56 | |
jgriffith | fungi: hmm... ok | 19:56 |
jgriffith | fungi: there's an awful lot of "added" changes from devstack in the gate configs which is why I asked but cool by me | 19:56 |
jeblair | jgriffith: we hate all of them | 19:57 |
fungi | we've been moving those out as we can | 19:57 |
jgriffith | jeblair: haha... Ok, now that makes more sense :) | 19:57 |
*** MarkAtwood has joined #openstack-infra | 19:58 | |
*** SergeyLukjanov has quit IRC | 19:58 | |
*** ryanpetrello has quit IRC | 19:59 | |
*** vipul-away is now known as vipul | 20:01 | |
mordred | fungi: that's right | 20:01 |
jeblair | zuul is now using the check jobs | 20:02 |
*** _david_ has quit IRC | 20:03 | |
fungi | so it should be safe to re-diversify the pipeline precedence settings again? | 20:03 |
*** ryanpetrello has joined #openstack-infra | 20:03 | |
jeblair | fungi: yes, if we're okay with the possibility of starving check of the unit test runners. so all told, i'm leaning toward leaving it for now. | 20:04 |
fungi | k | 20:04 |
openstackgerrit | Dirk Mueller proposed a change to openstack/requirements: Raise Babel requirements to >= 1.1 https://review.openstack.org/48739 | 20:05 |
openstackgerrit | Andreas Jaeger proposed a change to openstack-infra/config: Use Jenkins templates for old manual jobs https://review.openstack.org/47691 | 20:06 |
Ajaeger | clarkb: your suggested change worked fine for me, I've updated the patch, see ^^ | 20:06 |
*** alcabrera has quit IRC | 20:07 | |
clarkb | Ajaeger: cool, I will take a look | 20:07 |
*** sarob has joined #openstack-infra | 20:07 | |
Ajaeger | clarkb: thanks. If you have further ideas, just comment on it and I'll fix in the following days. For now I'm calling it a day. | 20:08 |
* Ajaeger waves good-bye | 20:08 | |
clarkb | have a good weekend | 20:08 |
*** alcabrera has joined #openstack-infra | 20:08 | |
Ajaeger | clarkb: thanks, same to all of you! | 20:09 |
*** yolanda has quit IRC | 20:09 | |
*** Ajaeger has quit IRC | 20:09 | |
clarkb | jeblair: which zuul change did you want reviewed? | 20:11 |
jeblair | https://review.openstack.org/#/c/48684/ | 20:11 |
*** basha has joined #openstack-infra | 20:11 | |
*** sarob has quit IRC | 20:13 | |
clarkb | jeblair: we should also get https://review.openstack.org/#/c/46869/ in | 20:13 |
clarkb | jeblair: I didn't approve due to the -1, but figure you can decide if that is worth overriding | 20:13 |
*** prad_ has quit IRC | 20:14 | |
clarkb | 48684 lgtm | 20:14 |
*** dprince has quit IRC | 20:14 | |
mordred | 48684 has now been reviewed by all of us | 20:14 |
Alex_Gaynor | jeblair: want to review https://review.openstack.org/#/c/47953/ while you're in that area? (thanks!) | 20:14 |
jeblair | Alex_Gaynor: nice catch, thanks | 20:15 |
*** CaptTofu has quit IRC | 20:16 | |
*** basha has quit IRC | 20:16 | |
*** prad has joined #openstack-infra | 20:16 | |
*** prad has quit IRC | 20:16 | |
*** rockyg has quit IRC | 20:18 | |
*** rockyg has joined #openstack-infra | 20:18 | |
jeblair | mordred: https://jenkins02.openstack.org/computer/precise38/builds | 20:20 |
*** dmakogon_ has quit IRC | 20:20 | |
jeblair | that host was producing this error as fast as it could: https://jenkins02.openstack.org/job/gate-glance-pep8/619/console | 20:20 |
jeblair | i disconnected/reconnected it | 20:21 |
jeblair | i hate jenkins | 20:21 |
jeblair | precise10 is doing it as well | 20:21 |
*** CaptTofu has joined #openstack-infra | 20:22 | |
clarkb | jeblair: could that be related to the increase in slaves? | 20:22 |
clarkb | jenkins does seem to have an upper bound on the number of slaves it can handle before it starts failing to keep them connected | 20:23 |
jeblair | clarkb: beats me. do you understand that traceback? | 20:23 |
clarkb | I don't | 20:24 |
clarkb | it is trying to run a remote connection | 20:24 |
jeblair | clarkb: want to spin up jenkins03? | 20:25 |
fungi | i'm happy to start firing up a jenkins or two if you want to keep troubleshooting | 20:26 |
clarkb | jeblair: we can try it | 20:26 |
clarkb | I don't have much time to do that though, I need to finish prepping for next week | 20:27 |
fungi | looks like we used a 30gb flavor? | 20:27 |
jeblair | clarkb: to be clear, i wasn't suggesting it as much as asking if that was your suggestion. ;) | 20:27 |
fungi | 8x vcpu with load average hovering a little over 5, slightly more than 50% of ram in active use (not buffers/cache). looks like it's sized appropriately--would be struggling a little on the next flavor down | 20:29 |
*** wchrisj_ has quit IRC | 20:29 | |
clarkb | jeblair: ah, yes. So in the grizzly cycle with one jenkins we ran into similar problems as we added more and more slaves | 20:30 |
jeblair | clarkb: oh, did we see that error? | 20:30 |
fungi | i was looking at jenkins02, which is interestingly a little more heavily-loaded than jenkins01 for some reason | 20:30 |
clarkb | jeblair: I don't remember if it was this specific error, but it happened in a similar way. Immediately when starting jobs jenkins threw an exception indicating that something in the communication had failed | 20:31 |
clarkb | fungi: oh maybe | 20:31 |
jeblair | fungi: well, that was the jenkins to which those two slaves were attached | 20:31 |
clarkb | fungi: maybe we are running into that issue with the threads hanging around again | 20:31 |
fungi | mmm | 20:31 |
*** flaperboon is now known as flaper87|afk | 20:31 | |
fungi | could be, just catching it in the early stages so symptoms aren't nearly as pronounced yet | 20:31 |
fungi | checking | 20:32 |
jeblair | precise12 just threw the same error | 20:33 |
jeblair | (also jenkins02) | 20:33 |
fungi | 1.5m threads | 20:34 |
clarkb | hahahahahaha | 20:34 |
fungi | Threads on jenkins02.openstack.org@166.78.48.99: Number = 1,628, Maximum = 2,152, Total started = 1,512,727 | 20:34 |
clarkb | sorry, I probably shouldn't find that so funny | 20:34 |
clarkb | oh | 20:34 |
fungi | oh, wait, wrong counter | 20:34 |
openstackgerrit | David Peraza proposed a change to openstack/requirements: Adding sqlalchemy db2 dialect dependencies https://review.openstack.org/48745 | 20:34 |
fungi | so no, not anywhere near as high as that last time | 20:34 |
clarkb | yeah the Number value is what you want and that doesn't look too terrible | 20:34 |
fungi | pulling up 01 for a spot comparison | 20:35 |
*** ryanpetrello has quit IRC | 20:35 | |
jeblair | i just checked the rest of the precise nodes on jenkins02, they're not failing jobs with that error (yet) | 20:35 |
fungi | Threads on jenkins01.openstack.org@166.78.188.99: Number = 1,276, Maximum = 1,474, Total started = 862,538 | 20:36 |
*** ryanpetrello has joined #openstack-infra | 20:36 | |
fungi | so 02 is definitely higher, but only by about 30% | 20:36 |
jeblair | btw, the status page, starting with 48516, is interesting -- that's what happens when changes behind a single change fail in succession | 20:36 |
jeblair | (and yeah, the top is broken; i'll fix that next week) | 20:36 |
*** sarob has joined #openstack-infra | 20:37 | |
fungi | wow, that's a great indication that the tempest change at 48516 is causing the trouble not for itself but for changes which follow | 20:38 |
fungi | oh, except those failures aren't in tempest tests (yet) | 20:39 |
jeblair | fungi: yeah, that would be the interpretation except that the actual problem is that all of those changes happened to hit our bad jenkins nodes | 20:39 |
fungi | so csincidence | 20:39 |
fungi | coincidence | 20:39 |
fungi | should we put jenkins02 in shutdown and restart it to limp through before adding more masters (if we think we're bumping up against an inherent slave tracking limitation)? | 20:40 |
fungi | and also scale down nodepool's per-master max setting? | 20:40 |
jeblair | fungi: i reconnected those slaves, and they seem better at the moment; i think we can leave 02 as is for now; i don't really want to lose its capacity | 20:41 |
fungi | k | 20:41 |
*** jcoufal has quit IRC | 20:42 | |
fungi | so back to the earlier question... go ahead and start building more masters? or hold off until we're more certain it's warranted? | 20:42 |
jeblair | i wasn't expecting problems until we had more slaves, but perhaps 200/master is the mark. | 20:44 |
*** odyssey4me has quit IRC | 20:44 | |
pleia2 | anteaya: gave owncloud a spin in win7 with IE9, all works as expected | 20:44 |
*** flaper87|afk is now known as flaper87 | 20:44 | |
anteaya | woohoo | 20:44 |
anteaya | thanks pleia2 | 20:44 |
pleia2 | sure :) | 20:45 |
* pleia2 logs out of windows before she gets dirty | 20:45 | |
jeblair | i think we peaked at around 186 slaves total | 20:45 |
anteaya | no kidding | 20:45 |
anteaya | that's a lot of slaves | 20:45 |
jeblair | per master, including unit test workers | 20:45 |
jeblair | https://bugs.launchpad.net/openstack-ci/+bug/1148900 | 20:46 |
uvirtbot | Launchpad bug 1148900 in openstack-ci "Could not initialize class jenkins.model.Jenkins$MasterComputer" [High,Fix released] | 20:46 |
jeblair | blast from the past | 20:46 |
fungi | nodepool is reinventing jclouds failure modes ;) | 20:47 |
fungi | except not really, because these are static slaves which have been connected and running jobs just fine | 20:48 |
jeblair | fungi: except these are long running nodes | 20:48 |
* fungi nods | 20:48 | |
jeblair | fungi: i am leaning toward not spinning up another master | 20:49 |
fungi | k | 20:49 |
jeblair | i favor: if it happens again, restart that jenkins master, and if it happens again after that, add a new master. | 20:49 |
zaro | pleia2: did you try map drive using webdav protocol? | 20:49 |
pleia2 | zaro: no, that's a good idea | 20:49 |
fungi | i like that having multiple masters, we can restart them now without any downtime for other systems, merely temporary loss of capacity | 20:50 |
jog0 | are you running jobs on rax yet? | 20:50 |
jog0 | I think that may be breaking the large-ops test | 20:50 |
jeblair | jog0: yes | 20:50 |
jog0 | :( | 20:50 |
jeblair | jog0: link? | 20:50 |
zaro | pleia2: i had problems with that last time i tried. | 20:50 |
jog0 | so large http://logs.openstack.org/27/48727/1/check/check-tempest-devstack-vm-large-ops/a3e7745/ | 20:51 |
jeblair | jog0: yeah, it looks like the only successful runs of check-tempest-devstack-vm-large-ops have been on hpcloud | 20:51 |
jeblair | jog0: any ideas? | 20:51 |
fungi | threadcount on jenkins01 and 02 is equalizing a bit now as well | 20:51 |
openstackgerrit | A change was merged to openstack-infra/config: Handle when `id` is null. https://review.openstack.org/47953 | 20:51 |
jog0 | jeblair: we would have to tweak the large-ops number for rax | 20:52 |
openstackgerrit | A change was merged to openstack-infra/zuul: On null changes serialize the id as null https://review.openstack.org/46869 | 20:52 |
openstackgerrit | A change was merged to openstack-infra/zuul: Allow multiple invocations of the same job https://review.openstack.org/48684 | 20:52 |
jeblair | jog0: why? | 20:52 |
jog0 | because it was tuned to work for hpcloud | 20:52 |
jeblair | tuned? | 20:52 |
jog0 | the test checks to see if it can boot x VMs using the fake virt driver, where a common error is something timing out | 20:52 |
fungi | seems a bit inexact | 20:53 |
jeblair | jog0: so why would that need to be different? | 20:53 |
*** MarkAtwood has quit IRC | 20:53 | |
jog0 | so rax cloud is running slower so timeouts happen with fewer VMs | 20:53 |
jeblair | BuildErrorException: Server %(server_id)s failed to build and is in ERROR status | 20:53 |
fungi | basically it's performance-testing the cloud provider, it seems | 20:53 |
jeblair | jog0: a server being in error state is a result of that? | 20:53 |
jog0 | fungi: yeah and our code too | 20:53 |
jog0 | jeblair: yup | 20:54 |
jog0 | nova-net times out | 20:54 |
openstackgerrit | David Peraza proposed a change to openstack/requirements: Adding sqlalchemy db2 dialect dependencies https://review.openstack.org/48745 | 20:54 |
jog0 | when all cloud resources were equal, the test just performance tested our code. but with two very different clouds ... :( | 20:55 |
jeblair | jog0: it was an illusion that all cloud resources were equal, i'm afraid | 20:55 |
jeblair | even hpcloud has significant variance | 20:55 |
jog0 | some are more equal than others? | 20:55 |
jeblair | especially when we approach release deadlines. :) | 20:55 |
jog0 | jeblair: yeah the number I picked before seemed pretty stable | 20:55 |
jog0 | across all HP cloud | 20:55 |
jeblair | so these aren't really designed to be performance tests -- ideally these should work on developers laptops too... | 20:56 |
jog0 | never got fails like this with HP cloud, at least extremely rarely (I never found one) | 20:56 |
jog0 | jeblair: it does, you just have to pick one param | 20:56 |
jeblair | jog0: ideally the test would be structured to be more tolerant of the environment it's running in. but for our immediate problem, would you like to adjust the parameter or remove the test? | 20:57 |
jog0 | jeblair: let's just remove it; due to the nature of the gate right now I think it's safe to say this shouldn't get priority at this juncture | 20:58 |
jog0 | and revisit post havana | 20:58 |
*** julim has quit IRC | 20:58 | |
jeblair | jog0: shame to lose a test. :( | 20:59 |
jog0 | yeah ... | 20:59 |
jog0 | I think the answer in the future will be to have two numbers, one for hpcloud and one for rax | 20:59 |
jog0 | that will take at least a day of testing and whatnot to get right | 21:00 |
jeblair | jog0: and one for the next provider we get, and one for the one after that? | 21:00 |
jog0 | have to run recheck a dozen times or so to be sure I am right | 21:00 |
jog0 | we can maybe find a CPU perf metric to correlate with a number | 21:00 |
jog0 | once we get two datapoints | 21:00 |
fungi | unfortunately, those will also probably have to be retuned even for existing providers as their performance characteristics change over time | 21:01 |
jog0 | so if CPU A is 30% slower than CPU B, number should be 30 percent lower too | 21:01 |
*** freyes has quit IRC | 21:02 | |
jog0 | fungi: perhaps, the test is there to detect order-of-magnitude slowdowns | 21:02 |
*** matty_dubs is now known as matty_dubs|gone | 21:02 | |
jog0 | and I would hope a cloud wouldn't have that fluctuation | 21:03 |
jeblair | jog0: i used to hope that | 21:03 |
*** sodabrew_ has joined #openstack-infra | 21:04 | |
jog0 | jeblair: let's talk about a smarter way to do this in Edinburgh | 21:04 |
jog0 | or HK | 21:04 |
fungi | we've definitely been in situations where new vms ended up on compute nodes with very resource-hungry neighbors | 21:04 |
jeblair | jog0: we have seen some of the metrics we care about change up to 3x over time; including both cloud providers. | 21:04 |
* fungi needs to disappear and do a bit of cooking... bbl | 21:05 | |
jog0 | jeblair: ouch | 21:05 |
*** sodabrew has quit IRC | 21:06 | |
jog0 | well if we collect those numbers today ... we can make something adjust to that | 21:06 |
jeblair | jog0: so i think we can probably live with running the large-ops test only on hp for now, as long as we definitely plan to improve it later. | 21:06 |
jog0 | that would be awesome | 21:06 |
jog0 | that test came out of the issues with rootwrap | 21:07 |
*** ArxCruz has quit IRC | 21:07 | |
*** tjones has joined #openstack-infra | 21:07 | |
jog0 | jeblair: didn't realize that was an option to put it on one cloud only | 21:08 |
*** julim has joined #openstack-infra | 21:08 | |
jeblair | jog0: it's not a good option -- it's working against how we're trying to manage resources. and if we have further problems, it'll be the first thing to go. but we can try it. :) | 21:08 |
*** senk has joined #openstack-infra | 21:09 | |
*** jcoufal has joined #openstack-infra | 21:10 | |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Run large-ops test only on hp nodes https://review.openstack.org/48748 | 21:10 |
*** rnirmal has quit IRC | 21:11 | |
jog0 | fair enough | 21:11 |
jog0 | yeah we need to revisit this in the near future | 21:11 |
jeblair | jog0: so while you're around... other than Zhi Kun ZK Liu being on vacation, do you know if work on those 2 bugs is progressing? | 21:12 |
jog0 | jeblair: a little sdague and jgriffith and dims are doing stuff | 21:13 |
jog0 | jeblair: see -qa | 21:13 |
jeblair | jog0: thx | 21:14 |
jog0 | my call to arms / public shaming worked a little | 21:14 |
jgriffith | jog0: /window 25 | 21:15 |
jgriffith | crap | 21:15 |
*** vipul is now known as vipul-away | 21:15 | |
*** senk has quit IRC | 21:17 | |
*** julim has quit IRC | 21:19 | |
*** tjones has quit IRC | 21:20 | |
jeblair | i just saw some more of those errors | 21:20 |
jeblair | i've put jenkins02 in shutdown | 21:20 |
*** markmcclain1 has joined #openstack-infra | 21:21 | |
*** markmcclain has quit IRC | 21:22 | |
*** markmcclain has joined #openstack-infra | 21:22 | |
*** markmcclain has quit IRC | 21:24 | |
*** markmcclain has joined #openstack-infra | 21:24 | |
*** markmcclain1 has quit IRC | 21:26 | |
jeblair | clarkb: ping | 21:27 |
jeblair | clarkb: i need https://review.openstack.org/#/c/45348/ to be merged but it depends on https://review.openstack.org/#/c/45347/1 | 21:27 |
*** alcabrera has quit IRC | 21:27 | |
*** vipul-away is now known as vipul | 21:28 | |
*** anteaya has quit IRC | 21:28 | |
dims | k i'll be back in a few hours | 21:29 |
*** markmcclain1 has joined #openstack-infra | 21:29 | |
jeblair | lacking that, i have manually executed "set global max_connections=1024;" in mysql on nodepool | 21:29 |
*** mriedem has quit IRC | 21:30 | |
*** markmcclain has quit IRC | 21:30 | |
ryanpetrello | okay, a new version of pecan (0.4.2) has been released that resolved the wsme breakage | 21:36 |
jeblair | oh nevermind, 0.6.1 doesn't have it either | 21:36 |
jeblair | clarkb: ^ | 21:36 |
clarkb | jeblair: looking | 21:36 |
dhellmann | jeblair, fungi: we'd like to land https://review.openstack.org/#/c/43145/ so we can set up cross-check jobs to gate pecan and WSME. The change has 2 +2 but isn't approved. Is there something else we need? | 21:36 |
jeblair | clarkb: i'm trying to add max_connections; i don't think it's supported even in 0.6.1. i may have to add a /etc/mysql/conf.d/ file | 21:38 |
clarkb | jeblair: we could potentially go to an even newer version. 0.6.1 was chosen to minimize delta while getting the desired results | 21:38 |
mgagne | jeblair: looks to be only supported in 1.0.0. adding a custom conf file looks to be the solution atm. I have the same problem with my setup. | 21:39 |
jeblair | dhellmann: i think we're afraid to merge that at the moment (if it goes wrong everything breaks), and there's quite a bit of excitement already. | 21:39 |
dhellmann | jeblair: fair enough :-) | 21:39 |
dhellmann | jeblair: we'll work on setting up the tests, and come back when things settle down to configure the gate jobs | 21:39 |
mgagne | jeblair: 0.9.0 supports it https://github.com/puppetlabs/puppetlabs-mysql/blob/0.9.0/manifests/config.pp#L117 | 21:39 |
jeblair | dhellmann: ok. feel free to ping us when you think it might be a good time (in case it slips our minds) | 21:40 |
dhellmann | jeblair: count on it! ;-) | 21:40 |
jeblair | mgagne: oh, that might work. it has both config_hash and max_connections. | 21:41 |
dkranz | This recent failure looks like some infra issue but I haven't seen it before http://logs.openstack.org/45/41345/8/check/gate-tempest-devstack-vm-neutron/e91142b/console.html | 21:41 |
mgagne | jeblair: 0.8.0 looks to be the first version to support the parameter. | 21:41 |
jeblair | dkranz: in what way? | 21:41 |
dkranz | jeblair: It seems to just stop during setup of tempest | 21:42 |
*** pabelanger has quit IRC | 21:46 | |
jeblair | dkranz: it looks like it stopped while running devstack. but i don't think it's an infra problem -- the node continued to run, including doing all of the cleanup work and copying the log files | 21:47 |
dkranz | jeblair: So what kind of problem do you think it is? Should I just recheck no bug? | 21:48 |
dkranz | jeblair: I've been trying not to do that. | 21:48 |
jeblair | dkranz: i'd start with the idea that it's a bug in devstack. note that lots of services are running and devstack has been doing work to set up images, etc... so it at least got that far. | 21:51 |
dkranz | jeblair: OK, I'll check there and file a bug if I don't turn up anything. Thanks. | 21:52 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Set mysql max_connections to 1024 on nodepool https://review.openstack.org/48755 | 21:52 |
*** bnemec_ has joined #openstack-infra | 21:53 | |
jog0 | dkranz: I have seen things like this before so opening a bug may be a good idea | 21:55 |
dkranz | jog0: I asked Jim and he suggested starting with the idea that it is a devstack bug | 21:55 |
dkranz | jog0: I will file a bug there if there isn't one already | 21:55 |
*** pcm_ has quit IRC | 21:56 | |
dkranz | jog0: Because the job does finish but just stops in the middle of devstack running | 21:56 |
dkranz | jog0: presumably returning non-zero exit code | 21:56 |
jog0 | sigh yet another racy bug | 21:56 |
*** bnemec has quit IRC | 21:57 | |
fungi | we don't have enough of those yet | 21:58 |
openstackgerrit | A change was merged to openstack-infra/config: Run large-ops test only on hp nodes https://review.openstack.org/48748 | 21:59 |
*** flaper87 is now known as flaper87|afk | 22:02 | |
*** pabelanger has joined #openstack-infra | 22:02 | |
mordred | morning all. I'm back on line - anything I can jump on? | 22:03 |
jeblair | i'm about to restart jenkins02 because of the errors we saw earlier (check scrollback) | 22:04 |
clarkb | mordred: puppet-mysql has come up again | 22:04 |
mordred | clarkb: ugh. what now? | 22:04 |
clarkb | mordred: that's not super urgent though | 22:04 |
clarkb | mordred: jeblair needs to limit the number of connections for nodepool and the version of the module we have doesn't do that | 22:04 |
clarkb | mordred: newer versions do | 22:04 |
jeblair | clarkb: _raise_ the limit | 22:04 |
lifeless | anyone seen | 22:05 |
lifeless | File "/opt/stack/venvs/heat/local/lib/python2.7/site-packages/pip/backwardcompat/__init__.py", line 90, in fwrite | 22:05 |
mordred | ah. interesting | 22:05 |
clarkb | jeblair: ah | 22:05 |
lifeless | f.write(s) | 22:05 |
lifeless | ValueError: I/O operation on closed file | 22:05 |
lifeless | before ? | 22:05 |
mordred | jeblair: not doubting - but are you sure that's what you want to do? | 22:05 |
jeblair | mordred: yes. please read the commit message and let me know if you think otherwise. | 22:05 |
mordred | jeblair: increasing max_connections often has less positive effects than you might want (if you are sure, then fine, just checking) | 22:05 |
mordred | jeblair: ok. cool. | 22:05 |
mordred | looking | 22:05 |
jeblair | mordred: i'm not running a php script in apache, which is more or less what the default is tuned for. :) | 22:06 |
mordred | ah. ok. so, each thread/connection should essentially be performing like a quick query | 22:06 |
*** thomasm has quit IRC | 22:06 | |
mordred | the patch looks good- I potentially agree with fungi's comment - but I haven't really used conf.d files in anger | 22:07 |
jeblair | yes, except it might be a couple of queries separated by like 10 minutes, but each only looking at one row. | 22:07 |
fungi | jeblair: i think it's evidence nodepool should have been written in php | 22:07 |
mordred | lifeless: yes. but I cannot for the life of me remember why or what it was trying to do wrong | 22:07 |
jeblair | mordred: if you could answer fungi's comment-question, that would be swell. | 22:08 |
*** dcramer_ has quit IRC | 22:08 | |
mordred | ah - answer is "yes" | 22:08 |
jog0 | clarkb: can you make the elastic-recheck gerrit user | 22:08 |
mordred | it needs to be in [server] | 22:08 |
jog0 | so I don't have to keep using my own account | 22:08 |
mordred | or mysqld | 22:09 |
mordred | either will work | 22:09 |
openstackgerrit | James E. Blair proposed a change to openstack-infra/config: Set mysql max_connections to 1024 on nodepool https://review.openstack.org/48755 | 22:09 |
morganfainberg | jog0, using your own account just makes you look like you're looking at everyone's changes ;) | 22:10 |
*** sodabrew_ has quit IRC | 22:10 | |
jog0 | morganfainberg: but it sends me too many emails | 22:10 |
jeblair | i'm going to upgrade the gearman plugin on jenkins02 since i'm restarting it anyway | 22:10 |
morganfainberg | jog0, hehe. i bet. | 22:10 |
fungi | jog0: i can do it after i stop cramming food in my mouth hole. need an ssh key and, if possible, a dedicated contact e-mail address (not shared with any other gerrit user since gerrit has issues with duplicate e-mail addresses) and a display name you want it using in comments if different from the ssh username (can include spaces and whatnot) | 22:10 |
*** tjones has joined #openstack-infra | 22:11 | |
jeblair | fungi: this is an infra account | 22:11 |
jeblair | it's going to be run on the logstash host | 22:11 |
fungi | oh | 22:11 |
jeblair | fungi: so i think we should create it ourselves and stick it in hiera | 22:11 |
fungi | so we'll want to puppet the keys in and whatnot | 22:11 |
jeblair | https://review.openstack.org/#/c/47497/ | 22:11 |
jog0 | jeblair: I was hoping at first I could run it on my box for debugging and whatnot | 22:11 |
jog0 | if not I can work around that too | 22:12 |
clarkb | why don't I fix my review really fast | 22:12 |
clarkb | then maybe we can just deploy it on logstash.o.o and debug there | 22:12 |
*** sarob has quit IRC | 22:12 | |
jog0 | clarkb: works for me | 22:12 |
*** AlexF has joined #openstack-infra | 22:12 | |
*** sarob has joined #openstack-infra | 22:13 | |
*** alexpilotti has joined #openstack-infra | 22:14 | |
openstackgerrit | Clark Boylan proposed a change to openstack-infra/config: Deploy elastic-recheck on logstash.openstack.org. https://review.openstack.org/47497 | 22:15 |
*** flaper87|afk is now known as flaper87 | 22:15 | |
clarkb | jog0: fungi jeblair ^ there we go | 22:15 |
*** sarob has quit IRC | 22:17 | |
jog0 | clarkb: so I don't think elastic-recheck is wired up to pip yet | 22:18 |
jog0 | not really sure what's needed to put on pypi | 22:18 |
clarkb | jog0: we don't need it on pypi | 22:19 |
clarkb | jog0: we will CD it from git | 22:19 |
clarkb | jog0: we just need it to be python setup.py installable | 22:19 |
jog0 | even better | 22:19 |
jog0 | ohh haven't tried that heh | 22:20 |
clarkb | eventually we may want to pypi it, but for now this is good | 22:20 |
jeblair | restarting jenkins02 | 22:20 |
*** datsun180b has quit IRC | 22:22 | |
jeblair | the thing i love about the gearman plugin is how it starts running jobs before jenkins webui is even up. | 22:24 |
*** jcoufal has quit IRC | 22:25 | |
mordred | jeblair: ++ | 22:26 |
jeblair | even before the nodes themselves are ready. | 22:27 |
mordred | well, that's less exciting, but still fun | 22:28 |
*** justinabrahms has joined #openstack-infra | 22:28 | |
jeblair | well, after failing 100 jobs or so, it seems to be a bit better now. | 22:30 |
sdague | clarkb: where in the tree are the logstash parsing rules? | 22:30 |
clarkb | sdague: modules/openstack_project/templates/logstash/indexersomething | 22:31 |
*** _david_ has joined #openstack-infra | 22:32 | |
clarkb | sdague: http://git.openstack.org/cgit/openstack-infra/config/tree/modules/openstack_project/templates/logstash/indexer.conf.erb | 22:32 |
_david_ | clarkb, jeblair, mordred done ;-) | 22:32 |
_david_ | WIP plugin (on top of Gerrit 2.8): https://github.com/davido/gerrit-wip-plugin | 22:32 |
_david_ | Even with screen cast, you can see it in action on new and shiny change screen 2 | 22:33 |
*** flaper87 is now known as flaper87|afk | 22:33 | |
sdague | clarkb: cool | 22:33 |
_david_ | And this is the patch upstream that is still needed for that to work: https://gerrit-review.googlesource.com/50250 | 22:34 |
clarkb | _david_: are there any ACLs around it? | 22:34 |
_david_ | clarkb, sure ;-) | 22:34 |
_david_ | let me point you to that: | 22:34 |
clarkb | _david_: that is where zaro's patch comes in, being able to allow change owners permissions to do things to a change that not everyone else may be able to do | 22:34 |
clarkb | _david_: awesome | 22:35 |
_david_ | clarkb, take a look at the pictures | 22:35 |
_david_ | in Gerrit 2.8 i introduced so-called plugin owned capabilities (old permissions): | 22:36 |
_david_ | https://github.com/davido/gerrit-wip-plugin/blob/master/src/main/java/com/googlesource/gerrit/plugins/wip/WorkInProgressCapability.java | 22:36 |
_david_ | so you can just annotate REST endpoints: | 22:36 |
_david_ | https://github.com/davido/gerrit-wip-plugin/blob/master/src/main/java/com/googlesource/gerrit/plugins/wip/WorkInProgressAction.java#L40 | 22:36 |
clarkb | _david_: then in your ACL config you would give that capability to groups? | 22:37 |
* _david_ solved ACL in another patch already: | 22:37 | |
*** che-arne has joined #openstack-infra | 22:38 | |
jeblair | clarkb, mordred, fungi: i had to disconnect/reconnect some slaves from jenkins02 because they couldn't find their workspace | 22:38 |
mordred | jeblair: k. that's weird | 22:38 |
_david_ | clarkb, https://gerrit-review.googlesource.com/#/c/46970/ | 22:38 |
jeblair | i think it's because gearman plugin started using them too early | 22:38 |
jeblair | and they don't seem to be able to fix themselves | 22:38 |
*** tjones has quit IRC | 22:39 | |
_david_ | clarkb, exactly, Capabilities are global permissions (exactly like in Shrews change). | 22:39 |
mordred | jgriffith: just catching up - are you making progress anywhere with the CONF.num_iscsi_retries ? | 22:40 |
clarkb | _david_: perfect | 22:41 |
*** CaptTofu has quit IRC | 22:42 | |
jgriffith | mordred: just started running it through gates | 22:44 |
mordred | jgriffith: awesome. here's hoping it helps! | 22:44 |
jgriffith | mordred: https://review.openstack.org/#/c/48752/ | 22:44 |
jgriffith | ditto... although at this rate it will take forever to have any good data | 22:44 |
jeblair | i just disconnected all of the precise slaves from jenkins02 | 22:45 |
jeblair | that was a lot of clicking | 22:45 |
jeblair | i think the restart process needs to be: | 22:45 |
jeblair | enter shutdown mode; wait; disable gearman plugin; stop; start; wait; enable gearman plugin | 22:45 |
mordred | jeblair: yes. I agree | 22:46 |
jeblair | clarkb, fungi: ^ fyi | 22:46 |
*** dcramer_ has joined #openstack-infra | 22:48 | |
*** _david_ has quit IRC | 22:51 | |
fungi | makes sense to me | 22:52 |
clarkb | we didnt have problems with the last restart | 22:52 |
clarkb | but being defensive can't hurt | 22:53 |
fungi | we probably need something somewhere which can tell whether the slaves are ready and waits for them to settle before jenkins starts accepting jobs on their behalf | 22:53 |
sdague | where is that cookie cutter repo again? | 22:53 |
fungi | or maybe it just waits for us to start connecting slaves directly to the gearman server | 22:53 |
fungi | sdague: openstack-dev/cookiecutter | 22:54 |
sdague | jgriffith: it seems to have hit the same issue again | 22:54 |
jgriffith | anybody else noticed the errors spewing everywhere | 22:59 |
*** nicedice has joined #openstack-infra | 22:59 | |
* fungi checks his faucet | 23:02 | |
fungi | jgriffith: which errors? and i assume spewing in job failure console logs, but... example? | 23:03 |
jgriffith | fungi: http://logs.openstack.org/52/48752/2/check/check-tempest-devstack-vm-postgres-full/b0e6a41/logs/screen-n-cpu.txt.gz | 23:03 |
jgriffith | fungi: just step through a search on error or trace | 23:03 |
*** rcleere has quit IRC | 23:04 | |
jgriffith | fungi: I'm also confused by the xen volumes mounted in this test output | 23:04 |
*** AlexF has quit IRC | 23:05 | |
jgriffith | xen-vdb-51744-part1 etc | 23:05 |
fungi | grr. i'm clearly on the wrong evening computer. it's hanging up my browser | 23:05 |
*** sodabrew has joined #openstack-infra | 23:05 | |
* jgriffith wants diff computers for diff times of day :) | 23:06 | |
jgriffith | jeblair: sdague well that didn't tell us much except that upping the retry count isn't going to help us | 23:07 |
jgriffith | what's bothersome about this is if you look at syslog, it appears that we connected over IET successfully | 23:09 |
*** boris-42 has quit IRC | 23:11 | |
fungi | eek, clicking trace on that log oom'd firefox, but took this poor netbook with it for several minutes while it did so | 23:11 |
*** gyee has quit IRC | 23:11 | |
fungi | 512mb ram used to seem like a lot | 23:12 |
jgriffith | fungi: hehe | 23:12 |
* jgriffith takes back his earlier comment about wanting multiple computers like fungi | 23:12 | |
jgriffith | :) | 23:12 |
fungi | yeah, you don't want these | 23:12 |
* fungi has random linux thinnish-clients scattered around the house | 23:13 | |
jog0 | clarkb: python setup.py install works for elastic-recheck | 23:13 |
jog0 | just doesn't install any binaries | 23:13 |
clarkb | jog0: awesome. I think the puppet is mostly ready then (it is missing an init script, but we can run it manually until we get one) | 23:14 |
jog0 | cool | 23:14 |
clarkb | fungi: yes manually running it was the intention until we had time to do it proper like | 23:14 |
clarkb | fungi: did you still want to create the system account and put it into hiera? I am being distracted by Fridayness | 23:15 |
clarkb | eg end of week fried brain | 23:15 |
openstackgerrit | Salvatore Orlando proposed a change to openstack-infra/devstack-gate: Revert "Enable q-vpn service" https://review.openstack.org/48767 | 23:16 |
jgriffith | hey wait... | 23:17 |
jgriffith | is it just me or is that SID not correct? | 23:17 |
clarkb | SID? | 23:18 |
jgriffith | SCSI ID | 23:19 |
jgriffith | something's not aligning correctly in the logs | 23:19 |
jgriffith | so notice in the nova logs we try to open/connect around 22:44:17 | 23:20 |
jgriffith | and the scsi ID is 6 | 23:20 |
jgriffith | then check the syslog, and at that time you see a connection made for a target ID 5 | 23:21 |
jgriffith | Ohhhhh | 23:22 |
jgriffith | hmmmm | 23:22 |
sdague | any idea why https://review.openstack.org/#/c/48626/ didn't collect logs after timeout | 23:24 |
clarkb | sdague: it didn't get a chance to run the cleanup function in devstack-gate | 23:26 |
clarkb | that is an annoying problem | 23:26 |
sdague | ok | 23:26 |
mordred | something about this: "jgriffith | hmmmm" terrifies me | 23:26 |
sdague | he didn't say muhahaha | 23:26 |
jgriffith | nahh, was wondering if there's something bad happening with iscsi mixing up targets | 23:26 |
mordred | jgriffith: I blame shuttleworth | 23:27 |
clarkb | sdague: not sure how we can handle that better. couple things come to mind like run a post build shell action that does the copying or trapping SIGINT and running cleanup then (assuming that is how jenkins is killing the test) | 23:27 |
jgriffith | mordred: ha! I've been doing that for a year! | 23:30 |
sdague | mordred: that's always your answer, at least on fridays | 23:30 |
mordred | sdague: also on the other days that end in y | 23:31 |
*** ryanpetrello has quit IRC | 23:32 | |
*** che-arne has quit IRC | 23:34 | |
jeblair | so who wants to restart jenkins01? :) | 23:36 |
jeblair | it's not exhibiting problems, but i think it would be a good idea, possibly as a preventive measure, and also to upgrade the gearman plugin | 23:36 |
jgriffith | K, on a hunch that there's a target collision I'm adding a show targets message to the output | 23:37 |
jeblair | (i've uploaded the plugin, so it will take effect on restart) | 23:37 |
jgriffith | I'm likely not going to be around for a bit but I'll check it out when I get back to a computer | 23:37 |
openstackgerrit | A change was merged to openstack-infra/jenkins-job-builder: Add publisher for Git Publisher support https://review.openstack.org/46417 | 23:38 |
jeblair | (i also uploaded it to jenkins.o.o) | 23:38 |
*** alexpilotti has quit IRC | 23:42 | |
mordred | jeblair: the process is "put into shutdown; wait; disable gearman plugin; wait; stop; start; enable gearman plugin" | 23:43 |
mordred | jeblair: right? | 23:43 |
jeblair | mordred: yes | 23:43 |
mordred | putting jenkins01 into shutdown mode | 23:48 |
jeblair | i'm heading out | 23:49 |
jeblair | mordred: thanks for taking care of 01 | 23:49 |
mordred | k. sure thing! thanks for taking care of all of infra! | 23:49 |
clarkb | ++ jeblair is a good keeper of the gate keeper | 23:50 |
*** mgagne has quit IRC | 23:51 | |
jeblair | mordred: if you want to do jenkins.o.o at the same time it's ready (and should be easy, can probably do it while you're waiting on 01) | 23:52 |
*** KennethWilke has quit IRC | 23:53 | |
*** sodabrew has quit IRC | 23:53 | |
*** UtahDave has quit IRC | 23:54 | |
mordred | jenkins is in shutdown mode | 23:55 |
*** sodabrew has joined #openstack-infra | 23:57 | |
*** sodabrew has quit IRC | 23:58 |
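
The restart order jeblair proposed at 22:45 (enter shutdown mode; wait; disable gearman plugin; stop; start; wait; enable gearman plugin) can be sketched as an ordered checklist runner. This is only an illustrative sketch: the step names are copied from the log, but `run_restart` and the callables passed to it are hypothetical placeholders, and no real Jenkins or gearman plugin API is called.

```python
# Step names are taken from the 22:45 message in the log. Everything else
# (the function name, the placeholder actions) is a hypothetical illustration.
RESTART_STEPS = [
    "enter shutdown mode",
    "wait",
    "disable gearman plugin",
    "stop",
    "start",
    "wait",
    "enable gearman plugin",
]


def run_restart(actions, steps=RESTART_STEPS):
    """Call actions[step]() for each step in order; return the completed steps.

    An exception raised by any action propagates immediately, so later steps
    (for example re-enabling the gearman plugin) never run out of order.
    """
    completed = []
    for step in steps:
        actions[step]()
        completed.append(step)
    return completed


if __name__ == "__main__":
    # Placeholder actions that only print; a real operator would perform
    # each step by hand or with their own tooling.
    placeholder = {
        step: (lambda step=step: print("would do:", step))
        for step in set(RESTART_STEPS)
    }
    run_restart(placeholder)
```

Keeping the sequence as data makes the ordering explicit: the gearman plugin is disabled before the stop and only re-enabled after the second wait, which is the point of the proposal (slaves should be ready before jobs are accepted).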
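
For the max_connections discussion in the log (a custom file under /etc/mysql/conf.d/, with the option placed in [server] or mysqld), a minimal sketch of rendering such a fragment follows. The value 1024 comes from the title of the proposed change; the function name and the idea of generating the file from Python are assumptions for illustration and are not what the actual change does.

```python
import configparser
import io


def render_mysql_override(max_connections=1024, section="mysqld"):
    """Return the text of an option-file fragment that sets max_connections.

    The section defaults to "mysqld", one of the two sections named in the
    log as acceptable. The function name is hypothetical.
    """
    parser = configparser.ConfigParser()
    parser[section] = {"max_connections": str(max_connections)}
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Print the fragment; where it is written to disk is left to the operator.
    print(render_mysql_override(), end="")
```

Rendering to a string rather than writing a file keeps the sketch side-effect free; the target path under /etc/mysql/conf.d/ is deliberately not chosen here because the log does not name one.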