*** masayukig has joined #openstack-gate | 01:29 | |
*** markmcclain has joined #openstack-gate | 02:31 | |
*** dims has quit IRC | 02:50 | |
*** markmcclain has quit IRC | 04:05 | |
*** SergeyLukjanov_ is now known as SergeyLukjanov | 06:04 | |
*** SergeyLukjanov is now known as SergeyLukjanov_ | 06:31 | |
*** SergeyLukjanov_ is now known as SergeyLukjanov | 06:57 | |
*** SergeyLukjanov is now known as SergeyLukjanov_ | 07:20 | |
*** SergeyLukjanov_ is now known as SergeyLukjanov | 07:48 | |
*** SergeyLukjanov is now known as SergeyLukjanov_ | 08:03 | |
*** SergeyLukjanov_ is now known as SergeyLukjanov | 08:03 | |
*** SergeyLukjanov is now known as SergeyLukjanov_ | 08:17 | |
*** jpich has joined #openstack-gate | 09:19 | |
*** frankbutt has joined #openstack-gate | 11:10 | |
*** frankbutt has left #openstack-gate | 11:10 | |
*** SergeyLukjanov_ is now known as SergeyLukjanov | 11:17 | |
sdague | morning folks | 11:23 |
*** SergeyLukjanov is now known as SergeyLukjanov_ | 11:33 | |
*** shardy has joined #openstack-gate | 12:15 | |
*** Alexei_987 has joined #openstack-gate | 12:17 | |
*** werebutt has joined #openstack-gate | 12:18 | |
*** werebutt has left #openstack-gate | 12:18 | |
*** obondarev has joined #openstack-gate | 12:21 | |
*** cyeoh has joined #openstack-gate | 12:21 | |
*** Ajaeger has joined #openstack-gate | 12:21 | |
*** dims has joined #openstack-gate | 12:23 | |
*** therve has joined #openstack-gate | 12:27 | |
*** flaper87 has joined #openstack-gate | 12:28 | |
flaper87 | \o/ | 12:28 |
portante | o/ | 12:32 |
portante | mornin' | 12:32 |
*** gsamfira has joined #openstack-gate | 12:36 | |
*** koofoss has joined #openstack-gate | 12:37 | |
dims | o/ | 12:37 |
*** masayukig has quit IRC | 12:37 | |
flaper87 | portante: morning :) | 12:38 |
portante | :) | 12:38 |
*** salv-orlando has joined #openstack-gate | 12:48 | |
salv-orlando | aloha | 12:49 |
flaper87 | I prepared a patch that would keep the config files of the gate. https://review.openstack.org/#/c/69344/ (In case you guys think it's useful) | 12:51 |
salv-orlando | I joined the room just 5 minutes ago. Do we have somebody actively working on bug 1254890? | 12:52 |
salv-orlando | I think at least for neutron jobs the hang is occurring because of kernel crashes, so I'd like to discuss how we should go about it | 12:53 |
salv-orlando | on the other hand, for bug 1253896, Darragh's patch for increasing the ping timeout in tempest merged | 12:56 |
salv-orlando | this will solve the missed DHCPDISCOVER problem | 12:56 |
salv-orlando | There is another neutron patch which is still blocked because of the other gate failures, which are mostly bug 1254890 and bug 1270212 | 12:57 |
salv-orlando | For the latter the neutron patch is under review as well: https://review.openstack.org/#/c/67537/ | 12:57 |
*** koofoss has left #openstack-gate | 12:57 | |
salv-orlando | promotion might not help however since there's still a high failure rate because of kernel crashes | 12:58 |
sdague | salv-orlando: so any idea why we are triggering kernel crashes now? | 13:01 |
sdague | that seems like a relatively new situation | 13:01 |
salv-orlando | I looked at neutron changes and nothing would justify this. The problem is that the crashes are triggered by the same operations which usually worked fine before. | 13:02 |
salv-orlando | And we have pretty much no logging for this from neutron, since the crash is usually triggered by the metadata proxy, whose log is stashed into the l3 agent log by redirecting the stream. | 13:02 |
salv-orlando | yucky. | 13:02 |
salv-orlando | sdague: So I was thinking that, assuming I can have metadata proxies (there is one for each namespace) logging into their own file, how hard would it be to have gate jobs store an additional log file? | 13:04 |
sdague | collecting them isn't hard | 13:04 |
salv-orlando | the other issue is that with the current gate situation I would hardly be able to merge the required neutron change. | 13:04 |
sdague | so we can always bypass if we have a critical debug issue like this | 13:05 |
*** jd__ has joined #openstack-gate | 13:07 | |
salv-orlando | k, so I'll work on that as this might give us the information we need to assess whether we need a neutron fix or whether we need to change something in the system where gate tests are executed | 13:07 |
sdague | yeh | 13:09 |
*** ociuhandu has joined #openstack-gate | 13:15 | |
sdague | https://etherpad.openstack.org/p/gate-bugs - if people wanted to update what they are working on there | 13:20 |
*** alexpilotti has joined #openstack-gate | 13:21 | |
chmouel | do you know how to update the status.openstack.org website? i'd like to add http://status.openstack.org/elastic-recheck/ to it (on the top bar) | 13:21 |
Ajaeger | chmouel: repo openstack-infra/config - and then check the directory ./modules/openstack_project/files/status/ | 13:24 |
chmouel | Ajaeger: cool cheers | 13:24 |
Ajaeger | chmouel: Great idea to do this - will you patch it? | 13:25 |
chmouel | Ajaeger: yes, i am doing that now | 13:25 |
Ajaeger | thanks, chmouel ! | 13:26 |
dims | chmouel, there is/was a plan to merge elastic-recheck and rechecks page, so my prev review was -1'd | 13:27 |
chmouel | dims: ah but i guess in the meantime we can just add that page to the top bar until it's merged? | 13:27 |
chmouel | dims: as ppl are not aware of it | 13:28 |
dims | chmouel, that was exactly in my review :) | 13:28 |
chmouel | dims: so no -1 if i submit that changes ? :) | 13:28 |
Ajaeger | dims: do you have a link to your patch? | 13:29 |
chmouel | dims: ah i misunderstood, you submitted a change beforehand, cool, let's see if we can chat with the ppl who -1'd you (if you can send the link) | 13:31 |
*** SergeyLukjanov_ is now known as SergeyLukjanov | 13:31 | |
sdague | chmouel: there isn't one yet, though the gate status page has been stalled since Jan 9, so I hadn't bothered yet | 13:32 |
sdague | I'm also trying to sort out something on our unclassified rate, as the numbers don't look right to me | 13:32 |
anteaya | morning | 13:42 |
anteaya | I'm here for a bit and then have to move to airport wifi | 13:42 |
anteaya | just reading up on things atm | 13:42 |
*** dhellmann has joined #openstack-gate | 13:47 | |
sdague | anteaya: where you headed today? | 13:48 |
anteaya | I have to fly to Salt Lake City tonight for SaltConf | 13:48 |
anteaya | I am in Toronto today for hotel wifi rather than spending the day driving to the airport | 13:48 |
*** ttx has joined #openstack-gate | 13:48 | |
sdague | ah, cool | 13:49 |
salv-orlando | does anyone have info regarding this error in n-cpu logs? http://logs.openstack.org/84/52884/7/check/check-tempest-dsvm-neutron-isolated/b0230ed/logs/screen-n-cpu.txt.gz?level=INFO#_2014-01-27_08_14_12_231 | 14:03 |
salv-orlando | does not seem fatal, however I've never seen it before. | 14:03 |
sdague | salv-orlando: yeh, cyeoh has some patches up to catch those, but I think there is a more systematic approach needed | 14:08 |
salv-orlando | sdague: thanks | 14:08 |
anteaya | my goal for myself today is to learn how to build elastic recheck matches | 14:09 |
anteaya | I feel I have seen enough around to get a sense of it, I just need to focus on it | 14:09 |
anteaya | will ask silly questions if I hit a wall | 14:09 |
sdague | sounds great | 14:11 |
*** rustlebee has joined #openstack-gate | 14:12 | |
*** rustlebee is now known as russellb | 14:12 | |
sdague | morning russellb | 14:12 |
russellb | morning | 14:12 |
russellb | so yeah, top bug, i think we should split it in 2 ... i can work on doing that | 14:13 |
russellb | (once i finish checking for fires in email backlog) | 14:13 |
sdague | sure, sounds good | 14:13 |
*** dims has quit IRC | 14:13 | |
*** dims has joined #openstack-gate | 14:15 | |
russellb | sdague: are you offended by Python code in a shell script? heh ... https://review.openstack.org/#/c/69256/ | 14:15 |
sdague | ummmmm | 14:17 |
sdague | we should probably try to do that in bash, honestly, for when it goes up for real | 14:17 |
sdague | NUMCPU=`cat /proc/cpuinfo | grep processor | wc -l` | 14:18 |
chmouel | nproc | 14:20 |
chmouel | would work as well | 14:20 |
russellb | ah, nproc is in coreutils, so that should be fine | 14:21 |
sdague | chmouel: cool | 14:21 |
sdague | actually | 14:21 |
sdague | nproc --ignore=2 | 14:21 |
sdague | give you the answer you want | 14:21 |
anteaya | for filing bugs against elastic-recheck, does it have its own launchpad account or does it go under infra? | 14:25 |
russellb | ah true | 14:25 |
russellb | i actually changed it from - 2 to / 2 | 14:25 |
russellb | as a safer initial increase | 14:25 |
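A rough sketch of the worker-count arithmetic being debated here, in Python since the patch in question literally embeds Python in a shell script (hypothetical illustration, not the actual change in review 69256):

```python
import multiprocessing

ncpu = multiprocessing.cpu_count()  # shell equivalent: nproc
reserve_two = max(1, ncpu - 2)      # the 'nproc --ignore=2' approach
halve = max(1, ncpu // 2)           # russellb's safer '/ 2' variant
print(reserve_two, halve)
```

`nproc --ignore=2` never reports fewer than one processing unit, which is what the `max(1, ...)` guard mimics.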
chmouel | are we doing the "tabs are evil" thing in shell scripts? | 14:25 |
chmouel | https://review.openstack.org/#/c/69256/3/devstack-vm-gate-wrap.sh | 14:25 |
russellb | yes | 14:25 |
russellb | i just screwed it up | 14:25 |
russellb | i love doing a bunch of iterations on a trivial patch, heh | 14:26 |
anteaya | a great start to the morning | 14:26 |
*** jeckersb has joined #openstack-gate | 14:27 | |
sdague | anteaya: honestly, we're not really using a tracker for er | 14:29 |
anteaya | okay | 14:30 |
anteaya | I found a bug in the docs | 14:30 |
anteaya | http://docs.openstack.org/infra/elastic-recheck/readme.html | 14:30 |
anteaya | contains a link: http://docs.openstack.org/developer/elastic-recheck which is broken | 14:31 |
anteaya | and I found 4 e-r bugs filed against infra's launchpad, fyi | 14:31 |
anteaya | do you want me to just offer an e-r docs patch that removes the broken link for now? | 14:32 |
sdague | yeh, I'm not surprised, we're not really using it :) | 14:32 |
sdague | anteaya: yes please | 14:32 |
anteaya | :D | 14:32 |
anteaya | can do | 14:32 |
*** mriedem has joined #openstack-gate | 14:37 | |
*** mriedem has left #openstack-gate | 14:37 | |
*** mriedem has joined #openstack-gate | 14:37 | |
mriedem | what did i miss? | 14:37 |
anteaya | mriedem: we waited for you | 14:38 |
mriedem | well that's nice :) | 14:39 |
anteaya | :D | 14:39 |
anteaya | also logs: http://eavesdrop.openstack.org/irclogs/%23openstack-gate/ | 14:39 |
russellb | mriedem: you missed everything | 14:40 |
mriedem | i saw the logs, that's a lie | 14:41 |
mriedem | something about shell scripts within python... | 14:41 |
mriedem | or vice-versa | 14:41 |
mriedem | i got to the parking lot at work today and then decided my car might not start tonight, so went back home | 14:41 |
*** licostan has joined #openstack-gate | 14:41 | |
russellb | mriedem: from the cold? | 14:42 |
mriedem | yeah, -15 real temp, -40 wind chill | 14:42 |
russellb | eep | 14:42 |
mriedem | not too bad | 14:42 |
russellb | i'm in the deep south, and we may get 2" of snow this week | 14:42 |
mriedem | hells bells | 14:42 |
mriedem | call off school | 14:42 |
russellb | 25 years ago was the last time i remember real snow on the ground here, heh | 14:42 |
russellb | pretty much :) | 14:42 |
russellb | milk and bread will be gone | 14:42 |
russellb | panic everywhere, shut the city down | 14:43 |
mriedem | don't forget bullets and booze | 14:43 |
russellb | though they do that even at the threat of snow/ice | 14:43 |
mriedem | you know it's bad when you step on dog shit in the back yard and think it was a rock | 14:43 |
anteaya | mriedem: where are you? | 14:43 |
mriedem | anteaya: rochester, minnesota | 14:44 |
anteaya | ah nice | 14:44 |
anteaya | almost canada | 14:44 |
mriedem | sort of, not really, it could be worse - north end of the state near the canadian border is always -40 | 14:44 |
mriedem | god's country | 14:45 |
mriedem | iron mines and bearded women :) | 14:45 |
*** licostan has left #openstack-gate | 14:45 | |
mriedem | alright, back to fingerprinting nova bugs - lots of these are old/fixed by now i'm finding | 14:46 |
anteaya | I am trying my hand at finding fingerprints for neutron unit test failures. Here is my first attempt at a fingerprint: http://bit.ly/1esexsm for this failure: http://logs.openstack.org/71/60571/3/gate/gate-neutron-python27/eb0985e/ | 14:53 |
anteaya | sdague: any feedback? | 14:53 |
sdague | anteaya: looking | 14:54 |
sdague | anteaya: lgtm | 14:55 |
anteaya | thanks, I'll do up a patch with this query | 14:56 |
anteaya | oh I guess I need to file a bug first, if there isn't one | 14:56 |
sdague | yep | 14:57 |
mriedem | anteaya: you're missing a colon after 'message' | 15:00 |
mriedem | should be: message:"delete_port() got an unexpected keyword argument 'l3_port_check'" AND filename:"console.html" | 15:00 |
mriedem | seems to be hitting though...maybe message is implied | 15:00 |
mriedem | you could further restrict the build_name to only the neutron unit test jobs | 15:01 |
anteaya | I can add the colon | 15:04 |
mriedem | russellb: sdague: seems we should handle ec2 failure responses at a level higher than debug? http://logs.openstack.org/87/44787/16/check/check-tempest-devstack-vm-neutron/d2ede4d/logs/screen-n-api.txt.gz?#_2013-10-25_18_06_26_217 | 15:05 |
anteaya | I could restrict the build_name too | 15:05 |
mriedem | sdague: because logstash doesn't index on debug level messages right? only INFO and higher? | 15:05 |
sdague | mriedem: sure | 15:05 |
sdague | you did get a 404 on the request line | 15:06 |
anteaya | it seems filename:"console.html" works as does filename:console.html | 15:06 |
sdague | anteaya: yes | 15:06 |
mriedem | anteaya: yeah, i think the quotes are only if there are spaces | 15:06 |
anteaya | ah | 15:07 |
sdague | mriedem: so are you sure that not found is actually an issue? | 15:07 |
mriedem | anteaya: also, wildcards will work in logstash (kibana) but not elastic-recheck | 15:07 |
mriedem | sdague: yeah, well at one point it caused a check failure | 15:07 |
mriedem | in a tempest boto test | 15:07 |
mriedem | sdague: https://bugs.launchpad.net/nova/+bug/1244762 | 15:07 |
mriedem | however, that test hasn't failed in the gate in the last 2 weeks | 15:07 |
anteaya | mriedem: yes, was reading that | 15:08 |
mriedem | anteaya: i need to get the INFO level indexing restriction into the e-r readme also | 15:08 |
anteaya | okay | 15:08 |
anteaya | I will vote on that patch when you have it up | 15:09 |
sdague | mriedem: yeh, if we haven't seen it in the gate in the last 2 weeks, I wouldn't worry about it | 15:09 |
anteaya | I don't know about the INFO level indexing restriction | 15:09 |
mriedem | anteaya: here is the query for that UT fail i came up with: http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiZGVsZXRlX3BvcnQoKSBnb3QgYW4gdW5leHBlY3RlZCBrZXl3b3JkIGFyZ3VtZW50ICdsM19wb3J0X2NoZWNrJ1wiIEFORCBmaWxlbmFtZTpcImNvbnNvbGUuaHRtbFwiIEFORCAoYnVpbGRfbmFtZTpcImdhdGUtbmV1dHJvbi1weXRob24yNlwiIE9SIGJ1aWxkX25hbWU6XCJnYXRlLW5ldXRyb24tcHl0aG9uMjdcIikiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6ImFsbCIsImdyYXBobW9k | 15:09 |
sdague | anteaya: for throughput reasons we are only logging at INFO level and up | 15:09 |
mriedem | anteaya: i had some e-r notes in a wiki here but need to move anything novel in there into the e-r readme: https://wiki.openstack.org/wiki/ElasticRecheck | 15:09 |
sdague | otherwise we overwhelm the search cluster | 15:09 |
mriedem | i'll get an e-r docs patch up for the INFO level restriction | 15:10 |
sdague | mriedem: thanks | 15:10 |
anteaya | sdague: ah | 15:11 |
anteaya | mriedem: can I get the query link as a shorter url? | 15:11 |
mriedem | sdague: here is an e-r query i wrote for a nova bug last night, i couldn't find anything better in the logs since it's a timeout: https://review.openstack.org/#/c/69242/ | 15:11 |
mriedem | anteaya: sure, sec | 15:11 |
anteaya | the weechat doesn't do well with multiline links | 15:11 |
anteaya | thanks | 15:11 |
sdague | mriedem: is that the one that russellb wanted to split in half? | 15:12 |
mriedem | sdague: there aren't any comments on it | 15:12 |
mriedem | anteaya: http://goo.gl/wFbs73 | 15:13 |
* mriedem needs more coffee | 15:14 | |
anteaya | mriedem: thanks | 15:14 |
anteaya | ah you went the long route for build name since there are no wild cards, thanks for showing me that, I was wondering how to address it | 15:15 |
russellb | that looks different i believe | 15:15 |
anteaya | will use that query | 15:15 |
mriedem | anteaya: yeah, i found out the hard way about no wildcard support in the e-r queries | 15:18 |
mriedem | wildcards are disabled by default in ElasticSearch | 15:18 |
mriedem | for performance reasons | 15:18 |
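Pulling the query rules from this thread together: a sketch of anteaya's fingerprint assembled as a Python string, purely so each rule can be annotated (illustrative only; real e-r queries live in YAML files such as queries/1272511.yaml):

```python
query = (
    "message:\"delete_port() got an unexpected keyword argument "
    "'l3_port_check'\""                         # 'message' needs the trailing colon
    ' AND filename:"console.html"'              # quotes are optional when the value has no spaces
    ' AND (build_name:"gate-neutron-python26"'  # e-r rejects wildcards, so job names
    ' OR build_name:"gate-neutron-python27")'   # are OR'd together explicitly
)
print(query)
```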
*** dansmith has joined #openstack-gate | 15:18 | |
*** dtroyer has joined #openstack-gate | 15:23 | |
anteaya | makes sense | 15:26 |
ttx | sdague: I now have 90min to dedicate to the bugday (sorry, was late due to my credit card being abused), anything specific I could jump in on? | 15:30 |
sdague | ttx: sure, pick a job off of the list - http://status.openstack.org/elastic-recheck/data/uncategorized.html and try to build a bug & fingerprint | 15:31 |
ttx | sdague: ok | 15:32 |
*** markmcclain has joined #openstack-gate | 15:34 | |
russellb | does anyone remember what the blockers are for getting us moved to cloud-archive? | 15:34 |
russellb | in particular, we need to run on a newer libvirt | 15:34 |
russellb | i'd like to do that as one of the next steps on the libvirt related bugs we're still seeing | 15:35 |
mriedem | russellb: wasn't there an etherpad with the patches around that? | 15:35 |
mriedem | few weeks ago | 15:35 |
mriedem | top 4 fails or something at the time? | 15:35 |
russellb | could be, trying to remember / dig it back up | 15:35 |
mriedem | me too | 15:35 |
mriedem | russellb: https://etherpad.openstack.org/p/nova-gate-issue-tracking | 15:36 |
russellb | ha i started that etherpad ... | 15:37 |
*** mtreinish has joined #openstack-gate | 15:37 | |
mriedem | https://bugs.launchpad.net/nova/+bug/1228977 | 15:37 |
mriedem | yeah :) | 15:37 |
*** ndipanov has joined #openstack-gate | 15:38 | |
mriedem | russellb: sounds like danpb is working on a fix | 15:38 |
*** SergeyLukjanov is now known as SergeyLukjanov_ | 15:38 | |
*** markmcclain has quit IRC | 15:38 | |
russellb | ndipanov: hey, so we were just talking about the blockers for newer libvirt ... some notes on https://etherpad.openstack.org/p/nova-gate-issue-tracking | 15:38 |
russellb | ndipanov: in particular, take a look at the notes under the libvirt bug | 15:38 |
ndipanov | russellb, thanks - will take a look now | 15:39 |
russellb | ndipanov: current known blocker is https://bugs.launchpad.net/nova/+bug/1228977 | 15:40 |
ndipanov | russellb, thanks - will read up as soon as I get off this call | 15:42 |
russellb | k | 15:43 |
ttx | sdague: OK, filed a bug and a fingerprint, any way to mark that bug as done? | 15:43 |
ttx | sdague: https://bugs.launchpad.net/openstack-ci/+bug/1273283 | 15:43 |
mriedem | ttx: that fingerprint seems pretty loose | 15:44 |
*** markmcclain has joined #openstack-gate | 15:44 | |
mriedem | thought we already had some like that | 15:44 |
ttx | mriedem: there were other bugs on jenkins exceptions, but they matched other exceptions | 15:45 |
ttx | like "Interrupted" | 15:45 |
mriedem | yeah, seeing that now | 15:45 |
mriedem | and init failure on MasterComputer | 15:45 |
ttx | mriedem: am open to suggestions on making it less loose, but so far it catches the right stuff | 15:45 |
ttx | I suspect it's just a transient issue when things fall apart for other reasons (like a restart) | 15:46 |
ttx | but better have those 7 out of the other lists | 15:46 |
mriedem | ttx: yeah, looks sane | 15:46 |
mriedem | good thing openstack isn't written in java :) | 15:46 |
ttx | mriedem: so, is there a way to prevent those hits from appearing as uncategorized ? Or will the list autorefresh at some point ? | 15:47 |
mriedem | ttx: i'm not sure how often that list is updated, sdague or jog0 would know | 15:47 |
mriedem | ttx: as for closing the bug, it'll just be a placeholder, right? like the other 2 jenkins fail bugs we already track for random env issues. | 15:48 |
mriedem | ttx: want me to push up the e-r query patch for it? | 15:48 |
ttx | mriedem: probably useless in that case -- just pushed a fingerprint to do as instructed | 15:49 |
mriedem | ttx: well having the bug and e-r query will/should prevent duplicate bugs when someone hits this hiccup | 15:49 |
mriedem | so they can recheck and then maybe move to another node | 15:49 |
mriedem | i'd say this is worthwhile | 15:50 |
*** roaet has joined #openstack-gate | 15:52 | |
*** mestery has joined #openstack-gate | 15:53 | |
ttx | mriedem: is there anything else I should do to get the one I debunked off the list? | 15:53 |
ttx | (i suspect posting a bug with a fingerprint is not enough) | 15:54 |
mriedem | ttx: write a query in elastic-recheck | 15:55 |
mriedem | ttx: see anteaya's one from this morning as an example: https://review.openstack.org/#/c/69386/ | 15:55 |
mriedem | ttx: if you don't have the time i can write it up | 15:55 |
ttx | mriedem: ok, thanks! | 15:55 |
ttx | mriedem: well, the idea is that I take the time to help you rather than the other way around :) | 15:56 |
ttx | will push | 15:56 |
*** ndipanov has quit IRC | 15:57 | |
*** SergeyLukjanov_ is now known as SergeyLukjanov | 15:58 | |
russellb | an e-r update -- https://review.openstack.org/69391 | 15:58 |
russellb | actually ... going to tweak it a touch more | 15:59 |
*** dhellmann is now known as dhellmann_ | 16:00 | |
sdague | ttx: we're updating the uncategorized list every 60 minutes (or on code merge to e-r); top bugs list is every 15 mins IIRC | 16:01 |
ttx | sdague: ok, I'll just propose an e-r query | 16:01 |
mriedem | russellb: while you're tweaking, i made a comment | 16:02 |
russellb | heh | 16:02 |
russellb | i just updated | 16:02 |
russellb | will look | 16:02 |
russellb | mriedem: i think my update makes the AND comment no longer relevant | 16:03 |
russellb | at least for that one | 16:03 |
mriedem | russellb: yeah, but it's in your other one now | 16:04 |
russellb | yep, fixed now | 16:04 |
mriedem | +1 | 16:05 |
russellb | thanks | 16:05 |
russellb | seems the instances case may be due to a kernel bug | 16:06 |
*** HenryG has quit IRC | 16:06 | |
mriedem | russellb: is there something more specific in the n-cpu logs? | 16:07 |
russellb | mriedem: that was my next step (at least for the volumes bug i just filed) | 16:08 |
*** ndipanov has joined #openstack-gate | 16:11 | |
russellb | salv-orlando: have you talked to anyone about getting a kernel upgrade on our test nodes? | 16:11 |
ndipanov | russellb, If I read this right (wrt to libvirt 1.0.6 being used in the gate) it's a libvirt bug that is being worked on | 16:11 |
russellb | ndipanov: wasn't sure if it was a libvirt bug or a nova bug | 16:11 |
ndipanov | russellb, as per danpb - it's a libvirt bug... | 16:12 |
russellb | ah, ok | 16:12 |
sdague | ndipanov: so the version of libvirt in cloud archive is actually 1.1.1 | 16:12 |
russellb | so may be a while before we can update then | 16:12 |
sdague | so nova would need to work with that | 16:12 |
sdague | then we could get that in the gate | 16:13 |
ndipanov | sdague, and it doesn't? | 16:13 |
russellb | blew up last time we tried | 16:13 |
russellb | though it was actually the unit test problem that was the main issue | 16:13 |
russellb | and that is now resolved | 16:13 |
russellb | not sure if this other bug is a blocker for upgrading or not? | 16:13 |
sdague | dims had an experimental patch out there, it was failing | 16:14 |
mtreinish | russellb: on https://review.openstack.org/#/c/69391 can you add a related-bug line to the commit message then I'll push it through | 16:14 |
russellb | mtreinish: yes | 16:14 |
ndipanov | sdague, any chance you have a link to the review? | 16:14 |
russellb | mtreinish: done | 16:15 |
sdague | yeh, let me find it | 16:15 |
mtreinish | russellb: ok approved | 16:16 |
russellb | mtreinish: thanks! | 16:16 |
russellb | sdague: i think we need to try to get kernel upgraded on our nodes for https://bugs.launchpad.net/nova/+bug/1254890 | 16:16 |
sdague | https://review.openstack.org/#/c/67564/ | 16:16 |
ttx | Just pushed https://review.openstack.org/#/c/69398/ | 16:16 |
russellb | i just rechecked that one, see if it has improved with patches merged in the last week | 16:17 |
sdague | cool | 16:17 |
russellb | fungi: see my comment above to sdague | 16:17 |
dims | sdague, russellb bad lockup in libvirt - https://bugzilla.redhat.com/show_bug.cgi?id=929412 | 16:17 |
mtreinish | mriedem: on: https://review.openstack.org/#/c/69242/1 that sounds like something tempest would try during a negative test | 16:17 |
mtreinish | that query doesn't cause false positives | 16:17 |
dims | sdague, russellb - we can't upgrade to 1.1.1 | 16:18 |
dims | of libvirt | 16:18 |
russellb | dims: ah thanks for the link! ndipanov ^^^^ | 16:18 |
fungi | dims: unless the reason libvirt is breaking is *because* it needs a newer kernel | 16:18 |
mriedem | ttx: wildcards don't work in e-r queries :( | 16:18 |
ndipanov | russellb, yeah - it's linked in the LP bug | 16:18 |
ndipanov | I saw that | 16:18 |
russellb | k | 16:18 |
fungi | but sounds like not | 16:18 |
russellb | fungi: so my kernel comment was related to this bug where salv-orlando is seeing kernel crashes related to network namespace operations | 16:19 |
russellb | and i've been told there have been fixes in that code since the kernel we're using | 16:19 |
ttx | mriedem: you mean the '*' I added in my build_name ? | 16:19 |
mriedem | ttx: yup | 16:19 |
mriedem | just commented in your review | 16:19 |
ttx | mriedem: copied it from queries/1272511.yaml | 16:20 |
fungi | russellb: right. i was just commenting that it was also suggested that some of our issues with newer libvirt may also be related to running too old of a kernel | 16:20 |
ttx | mriedem: is that one bad too? | 16:20 |
russellb | fungi: oh ok | 16:20 |
mriedem | ttx: maybe.. | 16:20 |
russellb | fungi: what can I do to help with a kernel upgrade? | 16:20 |
mriedem | ttx: although 1272511 does show up here: http://status.openstack.org/elastic-recheck/ | 16:20 |
ttx | mriedem: there are 5 queries using * in build_name, fwiw | 16:20 |
fungi | russellb: which jobs is it impacting? i'm waist deep in wrangling a nodepool/jenkins issue at the moment so little time to dig through scrollback | 16:21 |
mriedem | ttx: hmmm, well now my world is collapsing | 16:21 |
russellb | fungi: sorry. devstack-gate basically | 16:21 |
russellb | fungi: in particular, the ones using neutron | 16:21 |
ttx | mriedem: sorry about that ;) | 16:21 |
fungi | russellb: okay, bit of a challenge there. newer kernels mean reboots. we don't currently have a reboot phase on nodepool node creation (though maybe upgrading the kernel on the image build will be sufficient to cause launched nodes to use a newer kernel, in which case just having devstack require a suitable kernel deb ought to be fine?) | 16:23 |
mriedem | mtreinish: for https://review.openstack.org/#/c/69242/ - yeah, maybe, but in logstash for the query it's all fails | 16:24 |
mriedem | mtreinish: not sure if we have negative rebuild tests? | 16:24 |
mtreinish | mriedem: neither do I, let me check | 16:24 |
sdague | fungi: so I think we know we have a kernel bug, but we don't know what the fix is | 16:24 |
russellb | fungi: yeah, i guess i was thinking just upgrading the kernel on the base image used for the dsvm nodes ... | 16:25 |
russellb | sdague: well, there are known fixes in this kernel code | 16:25 |
russellb | sdague: so i think first step we just need to see what kind of upgrade we can do without much pain | 16:25 |
russellb | i think ubuntu has newer kernels available for LTS for hardware enablement, so we just need to use one of them | 16:25 |
sdague | yeh, that's true | 16:26 |
russellb | "just need to" .. i say it like it's simple since i don't know how to do it | 16:26 |
sdague | ok, I can investigate this afternoon. | 16:26 |
russellb | k, i'm happy to help out | 16:27 |
mtreinish | mriedem: from what I can see it's just test_rebuild_reboot_deleted_server and test_rebuild_non_existent_server | 16:27 |
mtreinish | on the negative test side of things | 16:27 |
dims | fungi, Daniel Berrange has confirmed a code issue in libvirt and we have logs to prove it | 16:27 |
sdague | dims: so do we know a fix strategy? | 16:28 |
mriedem | mtreinish: hmm, ok, open to suggestions - unfortunately there are several test_rebuild_server* tests failing with timeouts but there isn't really anything great to fingerprint on, besides that one n-api log message | 16:28 |
russellb | sdague: he's working on it | 16:28 |
mriedem | mtreinish: they timeout while waiting for the instance to rebuild | 16:29 |
mriedem | so nova is doing its thing, but apparently not quick enough | 16:29 |
fungi | dims: ahh, yes, now i recall he finally responded on that bug | 16:29 |
*** SergeyLukjanov is now known as SergeyLukjanov_ | 16:30 | |
dims | sdague, not yet. will have to ping Daniel | 16:31 |
mtreinish | mriedem: no that query is probably fine | 16:31 |
mtreinish | I was just worried that we had a test trying to do that | 16:31 |
mtreinish | but we don't | 16:31 |
*** jgriffith has joined #openstack-gate | 16:36 | |
*** coolsvap has joined #openstack-gate | 16:37 | |
anteaya | all 24 remaining neutron unit test unclassified failures should be addressed once this is merged: https://review.openstack.org/#/c/69400/1 | 16:39 |
anteaya | anyone working on gate-grenade-dsvm unclassified failures yet? | 16:40 |
ttx | anteaya: i'm on it but will stop soon | 16:41 |
mriedem | ttx: there is a 33% success rate in the last 7 days with this: https://review.openstack.org/#/c/69398/ | 16:41 |
anteaya | ttx I can switch to another category | 16:41 |
anteaya | ttx and let me know when you change focus | 16:41 |
ttx | mriedem: looking | 16:42 |
anteaya | I'll work on this list for now: gate-tempest-dsvm-neutron | 16:42 |
ttx | mriedem: there seems to be one case where that failure is not propagated, yes. Probably best to leave it out then | 16:44 |
ttx | oh. ah. | 16:46 |
ttx | I think I understand where this bug comes from though | 16:48 |
ttx | haha. | 16:50 |
*** SergeyLukjanov_ is now known as SergeyLukjanov | 16:54 | |
anteaya | sdague: do you know which of cyeoh's patches address salv-orlando's earlier question regarding Info cache for instance <instance #> could not be found | 17:01 |
anteaya | https://review.openstack.org/#/dashboard/5292 | 17:01 |
anteaya | I'm seeing the same error in an unclassified log and am trying to find the correct bug for it | 17:02 |
mriedem | anteaya: that info cache one is fixed | 17:03 |
mriedem | sec | 17:03 |
*** SergeyLukjanov is now known as SergeyLukjanov_ | 17:03 | |
mriedem | anteaya: fixed with this: https://review.openstack.org/#/c/65374/ | 17:03 |
mriedem | cyeoh: ^ | 17:03 |
anteaya | found it https://bugs.launchpad.net/nova/+bug/1256182 | 17:03 |
mriedem | you'll see that bug drops off elastic-recheck once that was merged | 17:03 |
ttx | OK, submitted one sig and debunked one bug. Got to go get the kids from school. Sorry I couldn't contribute more :) | 17:04 |
mriedem | ttx: thanks for helping | 17:04 |
ttx | mriedem: I abandoned that second sig, it's actually all bug 1097592 now | 17:05 |
anteaya | ttx thanks, see you later | 17:05 |
anteaya | mriedem: okay 65374 was merged on the 25th, but the fingerprint I have for that failure is still collecting some failures, including on the 27th: http://bit.ly/1b1bpTo | 17:09 |
anteaya | now all the failures I have are from neutron | 17:09 |
mriedem | anteaya: i seem to remember seeing a nova bug in triage last night that was for an info_cache not found failure that was novel | 17:10 |
mriedem | will dig in a sec | 17:10 |
anteaya | thanks | 17:10 |
mriedem | sdague: mtreinish: russellb: another e-r query for libvirt connection reset: https://review.openstack.org/#/c/69415/ | 17:13 |
mriedem | slightly different than the one we see more often | 17:13 |
anteaya | mriedem: might this be the one? https://bugs.launchpad.net/nova/+bug/1072014 | 17:14 |
mriedem | anteaya: not the one i saw | 17:15 |
* mriedem looks now | 17:15 | |
anteaya | no sorry that was from November 27th, 2012 | 17:16 |
*** gsamfira has quit IRC | 17:17 | |
mriedem | anteaya: maybe this? https://bugs.launchpad.net/nova/+bug/1249065 | 17:18 |
mriedem | there are actually 17 hits when searching launchpad for nova bugs with info_cache | 17:18 |
anteaya | yes, I have been wandering among them | 17:19 |
anteaya | that was where i found the dusty one from 2012 | 17:19 |
mriedem | anteaya: yeah, http://goo.gl/92G9U2 | 17:19 |
anteaya | it looks like a good candidate | 17:21 |
anteaya | I have to change locations, pay for a cab, get boarding passes and have my privacy violated by the TSA | 17:21 |
anteaya | I'll be back in a bit | 17:21 |
mriedem | enjoy | 17:23 |
*** HenryG has joined #openstack-gate | 17:24 | |
*** SergeyLukjanov_ is now known as SergeyLukjanov | 17:26 | |
*** jog0 has joined #openstack-gate | 17:30 | |
mriedem | yet another libvirt connection fail query: https://review.openstack.org/#/c/69418/ | 17:31 |
*** markmcclain has quit IRC | 17:31 | |
*** markmcclain has joined #openstack-gate | 17:32 | |
mriedem | anteaya: this could also be a large ops race fail related to info cache not found: https://bugs.launchpad.net/nova/+bug/1227143 | 17:33 |
*** markmcclain has quit IRC | 17:33 | |
*** markmcclain has joined #openstack-gate | 17:33 | |
mriedem | although that's grizzly...so probably nevermind | 17:33 |
*** jpich has quit IRC | 17:36 | |
*** SergeyLukjanov is now known as SergeyLukjanov_ | 17:41 | |
jgriffith | russellb: I'm thinking of proposing a bump up on the num_scan_tries for that bug | 17:42 |
jgriffith | russellb: if nothing else to see if we can impact it | 17:42 |
jgriffith | russellb: trying some other setups to see if that's rational or not | 17:43 |
russellb | OK | 17:43 |
jgriffith | russellb: the attach makes it down, everything looks "ok" and we connect successfully | 17:43 |
jgriffith | russellb: we just don't get the device mapped via libvirt | 17:43 |
jgriffith | err... open-iscsi | 17:44 |
russellb | just one of those being impatient issues? | 17:44 |
jgriffith | russellb: not fully convinced yet, but possibly | 17:44 |
jgriffith | russellb: I mean there's something not right in the time it takes, but I'm checking to see if we're borderline in good cases | 17:44 |
jgriffith | russellb: ie logstash on retries for that op | 17:44 |
russellb | jgriffith: note that i think someone said earlier that you can't query debug in logstash | 17:45 |
russellb | not sure if you get an INFO or higher message for that, haven't looked | 17:45 |
mriedem | russellb: you can query against INFO+ | 17:46 |
*** Alexei_987 has quit IRC | 17:46 | |
russellb | mriedem: thanks | 17:46 |
mriedem | https://review.openstack.org/#/c/69388/ | 17:46 |
jgriffith | russellb: there's a warning for it so that *should* work | 17:48 |
russellb | cool | 17:49 |
jgriffith | interesting: message:"ISCSI volume not yet found at" over last 7 days = 310 | 17:55 |
jgriffith | quite a distribution between how many tries it takes, not what I expected | 17:56 |
russellb | interesting | 17:56 |
russellb | so that supports your theory | 17:56 |
jgriffith | russellb: seems to, but I'm curious why we have the variance | 17:57 |
* russellb blames the cloud | 17:57 | |
jgriffith | russellb: indeed | 17:57 |
russellb | honestly, i've seen tons of variance in how long things take causing failures | 17:57 |
jgriffith | russellb: at any rate I'll look at a sane adjustment for the retries, either by upping the default count or adjusting localrc | 17:57 |
jgriffith | russellb: well, we are supposed to expect that eh? :) | 17:58 |
russellb | and it seemed to be worse before turning down tempest concurrency | 17:58 |
jgriffith | russellb: for sure | 17:58 |
jgriffith | russellb: huge drop after the 21st | 17:58 |
russellb | that sounds about right for the concurrency merge | 17:58 |
jgriffith | russellb: in that query alone | 17:58 |
jgriffith | russellb: alright, I'll play with some things and bump the default in nova's conf | 17:59 |
russellb | jgriffith: k, ping me if/when you need a review | 17:59 |
jgriffith | russellb: will do | 17:59 |
jgriffith | russellb: thanks | 17:59 |
russellb | thank _you_ | 17:59 |
sdague | the concurrency merge was the 16th | 18:02 |
* jgriffith is covered by sdague 's wet blanket | 18:03 | |
jog0 | wow 95% classification rate! | 18:03 |
sdague | Bug 1270608 ("n-cpu 'iSCSI device not found' log causes gate-tempest-dsvm-*-full to fail") went away when I disabled a tempest test | 18:03 |
jog0 | and 56 bugs :( | 18:04 |
sdague | jgriffith: https://review.openstack.org/#/c/67991/ | 18:04 |
sdague | so that test is really good at exposing that bug | 18:04 |
sdague | it could be a test bug | 18:04 |
jgriffith | sdague: yes, I remember that one now, and yes, completely explains the log query | 18:04 |
jgriffith | sdague: Yup, and it'll give me some data on the timing assuming it's the same | 18:05 |
sdague | it would be good to figure out if this is a real cinder issue | 18:05 |
jgriffith | sdague: it's not | 18:05 |
jgriffith | sdague: it's load on the compute node | 18:05 |
jgriffith | sdague: at least the case I'm looking at now | 18:05 |
jgriffith | sdague: has NOTHING to do with Cinder at all | 18:05 |
jgriffith | sdague: or nova for that matter | 18:05 |
jgriffith | sdague: strictly slow iscsi connect | 18:05 |
sdague | jgriffith: so under load, how do we fall over? | 18:06 |
jgriffith | we give it 8 seconds, and sometimes that's not enough | 18:06 |
sdague | ok | 18:06 |
sdague | so can we adjust that to a more real timeout that makes sense? | 18:06 |
russellb | yeah that sounds aggressive for these loaded test nodes | 18:06 |
jgriffith | sdague: it looks like on average in our gate runs we're going past 4 seconds anyway | 18:06 |
jgriffith | sdague: yes | 18:06 |
russellb | dang, yeah then 8 isn't enough :) | 18:06 |
jgriffith | sdague: I'm just trying to decide what's most sane | 18:06 |
sdague | jgriffith: ok, cool | 18:06 |
jgriffith | either change the default or change the sleep factor | 18:07 |
jgriffith | sleep factor is currently two, considering bumping it to 4 | 18:07 |
russellb | jgriffith: which code is this | 18:07 |
russellb | nm found it | 18:08 |
jgriffith | russellb: nova.virt.libvirt.py:L#317 | 18:08 |
russellb | libvirt/volume.py right? | 18:08 |
jgriffith | russellb: yes sir | 18:08 |
russellb | so we'll loop 3 times right? so we'll sleep 2 seconds, 4 seconds, then 8 seconds ... so 14 seconds total i think | 18:10 |
jgriffith | russellb: well it's ** | 18:10 |
jgriffith | so 1, 4, 9 | 18:10 |
russellb | do'oh, meant 9 :-p | 18:11 |
* russellb can't do math apparently | 18:11 | |
jgriffith | alright I did the same thing when I first viewed it | 18:11 |
russellb | so, i don't think we should do ** 4 ... | 18:11 |
russellb | because it's a hard sleep | 18:11 |
russellb | if we changed the default retries to 4, we'd double the time we wait | 18:12 |
russellb | maybe just change retries to 5 from 3? | 18:12 |
jgriffith | russellb: I'm leaning towards upping the sleep factor | 18:13 |
jgriffith | russellb: so that way we don't have to mess with changing config | 18:13 |
russellb | it's just a config default though, that's not a huge deal | 18:13 |
sdague | yeh, I'd go with more loops | 18:13 |
jgriffith | russellb: tru'dat | 18:13 |
jgriffith | russellb: sdague fair | 18:14 |
sdague | because then on fast envs it will be fast, and on slow envs it will not blow up | 18:14 |
russellb | if we did **4 ... it'd be 1, 16, 81 sleeps | 18:14 |
*** SpamapS has joined #openstack-gate | 18:14 | |
russellb | seems a bit aggressive on the ramp up | 18:14 |
sdague | yeh | 18:15 |
jgriffith | russellb: I'll bump that up then, my only point was that we already hit retries on a regular basis | 18:15 |
russellb | 1, 4, 9, 16, 25 | 18:15 |
russellb | seems better | 18:15 |
jgriffith | russellb: why have a warning that we're retrying when we know we're going to hit them? | 18:15 |
jgriffith | just sec.. phone call | 18:15 |
russellb | jgriffith: dunno, seems like we should only warn if we give up | 18:15 |
russellb | maybe debug on retry | 18:15 |
jog0 | jgriffith: are we hitting retries outside of the gate, in production too, or just in the gate? | 18:15 |
jog0 | due to load etc | 18:15 |
jgriffith | russellb: then we'd be screwed | 18:15 |
russellb | orly | 18:16 |
jgriffith | russellb: ie like today trying to get info on this | 18:16 |
russellb | i think the early retries are reasonable for production machines though | 18:16 |
jgriffith | russellb: personally I like the warning, gives an indication that you need to bump the value up | 18:16 |
russellb | shouldn't optimize for our loaded test env | 18:16 |
russellb | that's fine, we can leave it | 18:16 |
jgriffith | russellb: You're the nova folks, totally your call | 18:17 |
russellb | k :) | 18:17 |
russellb | let's just bump retries | 18:17 |
jgriffith | russellb: perfecto | 18:17 |
jgriffith | russellb: doing it now | 18:17 |
russellb | great | 18:17 |
russellb | == 5 gives us a total timeout of almost 1 minute | 18:18 |
russellb | which seems sane ... | 18:18 |
jog0 | it does? | 18:18 |
jgriffith | russellb: I agree fully | 18:18 |
jog0 | note: not saying its insane | 18:18 |
russellb | jog0: 5 retries will result in sleeps of 1, 4, 9, 16, 25 seconds | 18:18 |
russellb | === 55 seconds | 18:18 |
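A minimal sketch of the backoff arithmetic here, assuming the scan loop sleeps tries ** 2 seconds between attempts as described (the helper name and signature are hypothetical, not nova's actual code):

```python
def total_wait(num_scan_tries, exponent=2):
    # Sum of the per-attempt sleeps: t ** exponent for each retry.
    return sum(t ** exponent for t in range(1, num_scan_tries + 1))

assert total_wait(3) == 1 + 4 + 9                # 14s with the current default of 3
assert total_wait(5) == 1 + 4 + 9 + 16 + 25      # 55s, the "almost 1 minute" above
assert total_wait(3, exponent=4) == 1 + 16 + 81  # the rejected ** 4 ramp-up
```

Bumping the retry count rather than the exponent keeps the early retries cheap on fast nodes while stretching the total budget on loaded ones, which is the trade-off sdague describes above.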
jgriffith | russellb: sdague I'll run this on the ssh tests and see how things go there | 18:18 |
jog0 | ohh exponential backoff yeah that seems sane | 18:19 |
mriedem | is this the libvirt iscsi connection fail we're talking about? | 18:19 |
jgriffith | russellb: sdague maybe we'll kill two bugs with one patch | 18:19 |
russellb | nice | 18:19 |
mriedem | 1270608? | 18:19 |
jgriffith | mriedem: yes | 18:19 |
russellb | love it when "wait longer" is the solution, heh | 18:19 |
jgriffith | russellb: :P | 18:19 |
mriedem | ha, bknudson suggested waiting longer a couple weeks ago but we didn't think that would fly :) | 18:20 |
russellb | mriedem: d'oh | 18:20 |
mriedem | oh well | 18:20 |
russellb | sometimes it is the right answer. | 18:20 |
jgriffith | kudos to bknudson | 18:21 |
jog0 | sdague: you have a patch up to make http://status.openstack.org/elastic-recheck/data/uncategorized.html discoverable from http://status.openstack.org/elastic-recheck/ ? | 18:22 |
*** dhellmann_ is now known as dhellmann | 18:23 | |
*** dhellmann is now known as dhellmann_ | 18:23 | |
sdague | jog0: no, not yet | 18:24 |
russellb | so what builds the image used for dsvm nodes | 18:27 |
sdague | it's using devstack under the covers | 18:27 |
sdague | fungi or jeblair could probably explain | 18:28 |
russellb | so, nodepool does it as a periodic task ... got that far :) | 18:30 |
fungi | nodepool prep scripts (in the config repo) look in devstack's source to determine what to preinstall/cache | 18:30 |
russellb | ok cool | 18:31 |
fungi | and also apply configuration from the puppet manifests which configure them to be consistent with other jenkins slaves | 18:31 |
jog0 | russellb: for bug 1254890 we can add a syslog based fingerprint | 18:38 |
jog0 | or open a new bug with a syslog fingerprint | 18:38 |
russellb | sure, that'd be helpful | 18:38 |
russellb | the more targeted the queries/bugs the better i think | 18:38 |
salv-orlando | jog0: both options are equivalent for me. It should be easy to handle the overlap | 18:40 |
*** ndipanov is now known as ndipanov_gone | 18:40 | |
jog0 | salv-orlando: you want to write the query? | 18:42 |
jog0 | as you have dug into this bug more than me | 18:42 |
salv-orlando | sure jog0… so just to make sure, shall we add the query to a new bug or still to 1254890? | 18:47 |
jgriffith | russellb: mriedem sdague https://review.openstack.org/#/c/69443/ | 18:53 |
*** SergeyLukjanov_ is now known as SergeyLukjanov | 18:53 | |
jgriffith | mriedem: sdague might be worth trying to put boot_from_vol test back in after this lands | 18:53 |
russellb | jgriffith: +2 | 18:54 |
russellb | jgriffith: will chase an approval after check finishes | 18:54 |
jgriffith | russellb: sounds good | 18:55 |
mriedem | jgriffith: russellb: sdague: ok, we just need to restore this: https://review.openstack.org/#/c/69203/ | 18:57 |
sdague | jgriffith: sounds good | 19:01 |
sdague | mriedem: can you restore that now? | 19:01 |
sdague | and we'll run recheck on it a few times after the nova code lands | 19:01 |
jgriffith | sdague: excellent | 19:01 |
sdague | jgriffith: thanks for diving on this | 19:01 |
russellb | so, the kernel upgrade thing ... rev1 -- https://review.openstack.org/69445 | 19:01 |
russellb | totally untested | 19:02 |
mriedem | sdague: sure | 19:02 |
jog0 | salv-orlando: your call on new bug or not | 19:02 |
salv-orlando | jog0: ok thanks | 19:02 |
russellb | salv-orlando: see my patch above ... trying to figure out how to get kernel upgraded for the crashes you're seeing | 19:03 |
jog0 | russellb: nice, we should look into getting new libvirt in as well | 19:03 |
jog0 | I think dims was working on that | 19:03 |
jog0 | dims: ^ | 19:03 |
russellb | yeah dims is on that | 19:03 |
russellb | it's blocked by a libvirt bug | 19:03 |
jog0 | russellb: which one? | 19:03 |
jgriffith | russellb: running on my box now | 19:03 |
dims | right | 19:03 |
russellb | jog0: notes on https://etherpad.openstack.org/p/nova-gate-issue-tracking | 19:04 |
salv-orlando | russellb: let's hope that helps, otherwise we'll have to isolate the failure and find a way to work around it | 19:04 |
dims | jog0, https://bugzilla.redhat.com/show_bug.cgi?id=929412 | 19:04 |
sdague | russellb: yeh, that looks vaguely sane | 19:04 |
jog0 | dims: thanks | 19:04 |
russellb | sdague: i'll take that :) | 19:04 |
salv-orlando | in other news, I haven't heard any complaints from the canonical team… so perhaps their internal testing is not failing ;) | 19:04 |
salv-orlando | in which case, it might be worth finding some canonical guy and asking them which kernel version they are running. | 19:05 |
jog0 | russellb dims: do we have a launchpad BP where we are tracking libvirt 1.x? | 19:05 |
russellb | jog0: not that i know of | 19:05 |
jog0 | dims: you want the honors of creating one | 19:06 |
jog0 | so we can track this | 19:06 |
*** Ajaeger has quit IRC | 19:07 | |
dims | jog0, don't see one. looking for something to model on | 19:11 |
jog0 | dims: it could be something like: support libvirt 1.x | 19:17 |
jog0 | and target for icehouse | 19:18 |
flaper87 | sdague: PS 12 seems to have passed all tests, PS 11 didn't | 19:19 |
flaper87 | I'll check that one | 19:19 |
dims | jog0, will do | 19:20 |
jog0 | dims: thanks. | 19:21 |
sdague | flaper87: yeh, I expect that this is a more deep-seated race in the glance unit tests | 19:21 |
flaper87 | sdague: indeed. I was looking at those tests the other day. Not sure where the race is but I don't think those asserts need to be there to begin with | 19:21 |
flaper87 | anyway, I'll figure this out | 19:22 |
anteaya | back | 19:25 |
*** jmeridth has joined #openstack-gate | 19:26 | |
sdague | flaper87: thanks! | 19:26 |
*** coolsvap has quit IRC | 19:33 | |
russellb | eep @ http://status.openstack.org/elastic-recheck/ bug 1257626 | 19:50 |
russellb | all the sudden blowing up again? | 19:50 |
russellb | i did not approve of this | 19:50 |
sdague | russellb: so that's in check queue | 19:51 |
sdague | there is some big nova patch series that just pushed that blew up pretty universally on that test | 19:52 |
mtreinish | sdague: so that means dansmith is at fault? :) | 19:52 |
sdague | probably :) | 19:52 |
russellb | 1328 hits in the last 4 hours | 19:52 |
jog0 | message:"kernel BUG at /build/buildd/linux-3.2.0/fs/buffer.c:2917" AND filename:"logs/syslog.txt" | 19:52 |
jog0 | thats a lot of kernel bug hits | 19:52 |
dansmith | it worked for me on my machine :) | 19:53 |
russellb | dansmith: but yeah, looks like those are all on your patch series | 19:54 |
dansmith | russellb: only on mine? | 19:54 |
russellb | dansmith: well ... so far that's what i see | 19:54 |
dansmith | hmm | 19:54 |
jog0 | russellb: 1257626 looks like it legitimately came back | 19:54 |
jog0 | not all of them | 19:54 |
jog0 | 67694 | 19:54 |
jog0 | http://logstash.openstack.org/#eyJmaWVsZHMiOltdLCJzZWFyY2giOiJtZXNzYWdlOlwibm92YS5jb21wdXRlLm1hbmFnZXIgVGltZW91dDogVGltZW91dCB3aGlsZSB3YWl0aW5nIG9uIFJQQyByZXNwb25zZSAtIHRvcGljOiBcXFwibmV0d29ya1xcXCIsIFJQQyBtZXRob2Q6IFxcXCJhbGxvY2F0ZV9mb3JfaW5zdGFuY2VcXFwiXCIgQU5EIGZpbGVuYW1lOlwibG9ncy9zY3JlZW4tbi1jcHUudHh0XCJcbiIsInRpbWVmcmFtZSI6IjYwNDgwMCIsImdyYXBobW9kZSI6ImNvdW50Iiwib2Zmc2V0IjowLCJ0aW1lIjp7InVzZXJfaW50ZXJ2YWwiOjB9LCJtb2RlIjoidGVybXMiLCJhbmFseXplX2ZpZWxk | 19:55 |
russellb | dansmith: the first 67694 is still technically the same patch series | 19:55 |
russellb | it's based on one of dan's patches | 19:55 |
jog0 | russellb: ahh | 19:55 |
jog0 | russellb: so all of these faulires are in the check queue | 19:56 |
* russellb nods | 19:56 | |
jog0 | so looks like it hasn't hit gate yet | 19:56 |
russellb | dansmith just rebased his patch series today (bunch of patches) | 19:56 |
russellb | so that would explain the sudden huge appearance of those if it was something in there | 19:56 |
* dansmith is confused | 19:57 | |
jog0 | russellb: https://review.openstack.org/#/c/69448/ | 19:57 |
russellb | cool | 19:58 |
jog0 | dansmith: https://jenkins03.openstack.org/job/gate-tempest-dsvm-large-ops/4499/console | 19:59 |
russellb | here's a count of the errors per patch -- http://logstash.openstack.org/#eyJmaWVsZHMiOltdLCJzZWFyY2giOiJtZXNzYWdlOlwibm92YS5jb21wdXRlLm1hbmFnZXIgVGltZW91dDogVGltZW91dCB3aGlsZSB3YWl0aW5nIG9uIFJQQyByZXNwb25zZSAtIHRvcGljOiBcXFwibmV0d29ya1xcXCIsIFJQQyBtZXRob2Q6IFxcXCJhbGxvY2F0ZV9mb3JfaW5zdGFuY2VcXFwiXCIgQU5EIGZpbGVuYW1lOlwibG9ncy9zY3JlZW4tbi1jcHUudHh0XCIiLCJ0aW1lZnJhbWUiOiIxNDQwMCIsImdyYXBobW9kZSI6ImNvdW50Iiwib2Zmc2V0IjowLCJ0aW1lIjp7InVzZXJfaW50ZX | 19:59 |
russellb | J2YWwiOjB9LCJzdGFtcCI6MTM5MDg1MjMxNzkzNCwibW9kZSI6InNjb3JlIiwiYW5hbHl6ZV9maWVsZCI6ImJ1aWxkX2NoYW5nZSJ9 | 19:59 |
russellb | hrm | 19:59 |
russellb | shorter: http://goo.gl/Uvw30f | 19:59 |
anteaya | thanks | 19:59 |
russellb | dansmith: errors start occurring on this patch: https://review.openstack.org/#/c/66634/10 | 20:00 |
russellb | according to logstash *shrug* | 20:00 |
jog0 | http://logs.openstack.org/50/67550/5/check/gate-tempest-dsvm-large-ops/60a7a43/logs/screen-n-net.txt.gz?level=INFO | 20:00 |
dansmith | I guess I have to wait for n-net logs, eh? | 20:01 |
jog0 | russellb: for https://review.openstack.org/#/c/69448/ I wanted to make sure you think adding that bug makes sense | 20:01 |
jog0 | dansmith: see n-net link above ^ | 20:01 |
russellb | so, nova-network timed out because it was blocking on conductor ... | 20:03 |
russellb | jog0: sure, yeah | 20:05 |
dansmith | russellb: this is the bug sdague pointed me at before I left | 20:05 |
dansmith | which I don't understand | 20:05 |
dansmith | seems like a race during worker startup, but I don't know why conductor hits it and not n-api | 20:05 |
jog0 | russellb: cool thanks | 20:05 |
*** david-lyle has joined #openstack-gate | 20:06 | |
russellb | could use a Related-bug: tag in the commit msg | 20:07 |
jog0 | russellb: can you add that to the review | 20:07 |
russellb | done | 20:07 |
russellb | jog0: you see https://review.openstack.org/#/c/69445/ ? | 20:08 |
russellb | added it to that bug too | 20:09 |
jog0 | russellb: nice | 20:10 |
jog0 | if this works we should add something to the release notes about this bug | 20:11 |
russellb | yeah | 20:11 |
jog0 | saying which kernel works | 20:11 |
jog0 | in fact maybe we should document in release notes what kernel we gate on | 20:11 |
russellb | wish there was a good way to do a test deploy of this change | 20:11 |
russellb | if this merges, it gets applied the next time nodepool rebuilds its base image, and that's for *everything* | 20:12 |
jog0 | fungi: ^ thoughts | 20:12 |
jgriffith | sdague: mriedem FWIW I'm convinced that the boot-from-volume test failures were due to the same issue (virt attach timeout) | 20:14 |
jgriffith | jog0: +1000 on publishing kernel | 20:14 |
mriedem | jgriffith: cool, this should also hopefully help the DOS on cinder from tempest: https://review.openstack.org/#/c/69455/ | 20:14 |
sdague | jgriffith: nice | 20:14 |
jgriffith | jog0: should be in release notes/docs at a min, consider adding as a pre-req in install guide | 20:14 |
jog0 | jgriffith: lets file a bug about adding this for the docs team | 20:15 |
jgriffith | mriedem: interesting | 20:15 |
fungi | jog0: well, it's worth noting that this change isn't going to fix the problem for anybody besides infra ci tests | 20:15 |
jgriffith | jog0: sure, I'll do that now, assuming we're moving forward? Or is there some testing data we want first? | 20:16 |
sdague | mriedem: well, it's really a possible quota overrun | 20:16 |
mriedem | jgriffith: that test was creating 2 volumes for each of its 7 test cases, and only 2 of the tests actually needed a volume, separately | 20:16 |
jgriffith | fungi: I'm still not comfortable testing one way and deploying another | 20:16 |
fungi | jog0: i really think that if devstack doesn't work properly with the current precise kernels, then either ubuntu needs to update those kernels with a patch or devstack should make sure the correct kernel is installed | 20:16 |
mriedem | and never waiting for deletes | 20:16 |
jog0 | jgriffith: we want to document what version of kernel we test on, but the specific version doesn't matter for the bug | 20:16 |
jog0 | (docs bug) | 20:16 |
fungi | jgriffith: agreed | 20:16 |
sdague | fungi: so the issue is mostly about testing it | 20:17 |
jgriffith | jog0: so just a statement in docs that we use kernel version X for now, maybe something else later | 20:17 |
fungi | sdague: okay, then devstack sounds like the right place to fix it | 20:17 |
russellb | and it's not even devstack | 20:17 |
russellb | it's a neutron issue | 20:17 |
jog0 | jgriffith: ideally docs can have a tool that checks what kernel we test with | 20:17 |
jgriffith | russellb: good point :) | 20:17 |
sdague | fungi: also... ubuntu precise, without cloud archive, is not supported on icehouse :) | 20:17 |
sdague | so our current config.... really isn't reality | 20:17 |
fungi | russellb: well, if devstack is configured with neutron then devstack needs to make sure there's a suitable kernel for neutron | 20:18 |
sdague | fungi: sure | 20:18 |
jog0 | jgriffith: I figure docs would say two things: these kernels are known to have issues, and we gate on kernel x | 20:18 |
russellb | fungi: i don't think devstack should be in the business of installing kernels | 20:18 |
sdague | fungi: the real question is how do we get test data on it | 20:18 |
russellb | but we could add a safety check | 20:18 |
jgriffith | jog0: sounds like the best approach | 20:18 |
sdague | russellb: it installs kernels on centos | 20:18 |
jgriffith | jog0: I'll log it and maybe add it here later if I have a minute | 20:18 |
jog0 | jgriffith: cool, thanks for filing the bug | 20:18 |
russellb | sdague: >_< | 20:18 |
fungi | russellb: well, devstack is in the business of installing all sorts of other system-wide packages, and it's where we get the list of what devstack needs to be able to run tempest tests when we're building nodepool nodes | 20:18 |
sdague | otherwise you *can't* use neutron | 20:18 |
sdague | at all | 20:19 |
russellb | with network namespaces, you're right | 20:19 |
russellb | fungi: hrm .... ok. let me see what i can do here | 20:19 |
sdague | fungi: so how would we try this to see if it solved things? because we don't have a step where we could reboot to take the new kernel? | 20:20 |
sdague | as that today is a nodepool prep step | 20:20 |
fungi | sdague: since we currently can't reboot slaves while they're running jobs, i don't think there's a good way to self-test that change... however we could define a new node type which uses that kernel and then set up an experimental job which uses only that node type | 20:20 |
sdague | fungi: ok, lets do that | 20:20 |
fungi | that way it gets its own nodepool image which won't affect other running jobs | 20:21 |
sdague | russellb and I can work on putting a flag in devstack so this is optional | 20:21 |
sdague | fungi: yep, that would be great | 20:21 |
russellb | sounds like a sane way forward | 20:22 |
fungi | so in that case i think we need a separate nodepool prep script which does the thing in https://review.openstack.org/#/c/69445/1/modules/openstack_project/files/nodepool/scripts/prepare_devstack.sh but otherwise just wraps prepare_devstack.sh, and then specify that as the build script for our new node type in the nodepool configuration | 20:23 |
russellb | ok, i can do that easily enough | 20:24 |
fungi | and keep in mind that we want to rip it back out again and switch to figuring out the kernel package we want from devstack once we're sure this is sane | 20:24 |
jgriffith | jog0: FWIW https://bugs.launchpad.net/openstack-manuals/+bug/1273412 | 20:25 |
jgriffith | let's see how things go and I'm happy to augment the docs, or maybe get someone from neutron to give more in-depth info | 20:25 |
sdague | russellb: ok, you working on the nodepool side? want me to take the devstack side? or you already have something in process? | 20:25 |
russellb | sdague: i'm doing the nodepool change right now | 20:26 |
sdague | cool | 20:26 |
russellb | i started looking at devstack, but then came back to nodepool after this plan came up | 20:26 |
*** ociuhandu has quit IRC | 20:28 | |
fungi | worth noting, there are a metric ton of nova changes in the check pipeline failing large-ops jobs (but i see one passing in the gate so it's hopefully not a real epidemic) | 20:28 |
sdague | fungi: yep | 20:29 |
russellb | fungi: yeah we're on that ... it's a giant nova patch series | 20:30 |
russellb | sdague: fungi nodepool update - https://review.openstack.org/#/c/69445/ | 20:30 |
jog0 | fungi: it's all in the check queue | 20:30 |
*** marun has joined #openstack-gate | 20:31 | |
fungi | russellb: awesome. added some initial comments | 20:36 |
sdague | russellb: https://review.openstack.org/69464 | 20:39 |
russellb | fungi: i think wrapping is ok ... the package installed includes headers, too. also installing headers for the old but currently running kernel won't hurt anything | 20:40 |
fungi | russellb: oh, linux-generic-lts-saucy includes an equivalent of linux-headers-`uname -r` ? | 20:44 |
*** SergeyLukjanov is now known as SergeyLukjanov_ | 20:44 | |
* fungi checks | 20:44 | |
russellb | linux-generic-lts-saucy - Generic Linux kernel image and headers | 20:44 |
russellb | according to the package description anyway | 20:44 |
fungi | ah, yep, depends on linux-headers-generic-lts-saucy | 20:45 |
russellb | cool | 20:45 |
fungi | so it should get pulled in fine | 20:45 |
russellb | sdague: so, this looks fine, but won't work for the gate right? | 20:46 |
sdague | russellb: right | 20:46 |
russellb | k | 20:46 |
sdague | sorry, I guess I got the split wrong here. I was thinking we'd put it in devstack, have nodepool hit it with a different variable | 20:47 |
sdague | no worries | 20:47 |
russellb | yeah, don't think nodepool actually runs stack.sh | 20:47 |
russellb | it just pulls the package lists | 20:47 |
fungi | correct | 20:47 |
sdague | gotcha | 20:48 |
russellb | caching all the packages it would have downloaded for every devstack run | 20:48 |
russellb | (but doesn't install them yet) | 20:48 |
russellb | AFAICT | 20:48 |
sdague | right | 20:49 |
russellb | $ vim modules/openstack_project/templates/nodepool/nodepool.yaml.erb | 20:49 |
russellb | Vim: Caught deadly signal SEGV | 20:49 |
russellb | Vim: Finished. | 20:49 |
russellb | Segmentation fault (core dumped) | 20:49 |
russellb | ... | 20:49 |
sdague | heh | 20:49 |
russellb | it is seriously seg faulting *every* time i try to open that file | 20:49 |
russellb | infra is too l33t for vim | 20:50 |
fungi | ohhhh... right. so we definitely still need to do something out of band in the production equivalent of this to make sure the new kernel package is actually installed onto the image and not just cached | 20:50 |
fungi | which will mean a permanent addition to the nodepool prep script i guess | 20:50 |
russellb | fungi: yeah, though the command is telling it to install | 20:50 |
anteaya | jog0: check the first link under gate-tempest-dsvm-neutron: 6 Uncategorized Fails. 97.5% Classification Rate (240 Total Fails) | 20:51 |
*** ociuhandu has joined #openstack-gate | 20:51 | |
anteaya | context: https://review.openstack.org/#/c/69458/ | 20:51 |
fungi | russellb: right, for production we'll want something similar, but would still be nice to figure it out from devstack. not convinced there's a sane way for that though | 20:52 |
anteaya | if the current fingerprint were catching all the fails, that log wouldn't be in uncategorized | 20:52 |
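[Editor's note: anteaya's reasoning rests on how the uncategorized page is built: elastic-recheck counts a failed run as classified once any fingerprint query matches its logs, and everything left over is uncategorized, so the classification rate is just classified over total. A toy Python sketch, with substring matching standing in for real Lucene queries and entirely made-up runs and fingerprints:]

```python
# Toy model of the classification-rate numbers on the uncategorized page:
# a failed run is "classified" when any fingerprint matches its logs.
# Substring matching stands in for elastic-recheck's real Lucene queries,
# and the runs/fingerprints below are invented.
failures = {
    "run-1": "TypeError: _mock_load_plugins() takes exactly 4 arguments (5 given)",
    "run-2": "libvirtError: Unable to write to monitor: Broken pipe",
    "run-3": "some failure nobody has written a fingerprint for yet",
}
fingerprints = {
    "bug-A": "takes exactly 4 arguments (5 given)",
    "bug-B": "Unable to write to monitor: Broken pipe",
}

uncategorized = [run for run, log in failures.items()
                 if not any(fp in log for fp in fingerprints.values())]
rate = 100.0 * (len(failures) - len(uncategorized)) / len(failures)

print(uncategorized)    # ['run-3']
print("%.1f%%" % rate)  # 66.7% here; 97.5% in anteaya's example
```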
russellb | fungi: yeah, i dunno ... not sure how much it's worth trying to make a generic solution. it seems like a one-off hack | 20:52 |
russellb | fungi: think i should set up this node type for all providers? or think 1 should be enough for the experimental job? | 20:54 |
fungi | russellb: i would just do one for now | 20:54 |
russellb | fungi: k, does it matter which? | 20:55 |
fungi | nah | 20:55 |
russellb | k, *picks top of the providers list* | 20:55 |
fungi | well, not the tripleo cloud provider, but the rest should be fine | 20:55 |
russellb | ha | 20:55 |
russellb | right. | 20:55 |
jog0 | anteaya: what's an example of a hit from message:"No nw_info cache associated with instance" | 20:57 |
jog0 | anteaya: in http://status.openstack.org/elastic-recheck/data/uncategorized.html#gate-tempest-dsvm-neutron | 20:57 |
fungi | eek, swift devstack exercises for grizzly failing in a grenade job for a stable/havana nova change in the gate | 20:58 |
fungi | new bitrot or known nondeterministic condition? | 20:59 |
mriedem | russellb: jgriffith: we good to go on this? https://review.openstack.org/#/c/69443/ | 20:59 |
mriedem | i am | 20:59 |
russellb | yeah approved | 21:00 |
jgriffith | mriedem: russellb awesome | 21:00 |
russellb | candidate for promoting if there's a gate reset, or it should be in by tomorrow | 21:00 |
russellb | updated https://review.openstack.org/#/c/69445 ... | 21:02 |
russellb | now i guess i need a new job defined, and then have it added as an experimental job for nova and neutron or something | 21:03 |
* russellb learning all this infra amazingness slowly but surely | 21:03 | |
fungi | russellb: yep, that would be good to add in the same change. also i spotted an issue with the nodepool config | 21:06 |
fungi | (see review comment) | 21:06 |
russellb | thanks :) | 21:06 |
russellb | ok, will mark WIP while I get the rest in place | 21:06 |
fungi | will be nice if this can all be added as one config change (i believe it's possible) so that it will be easier to revert once we're done testing it out | 21:07 |
*** gsamfira has joined #openstack-gate | 21:07 | |
russellb | fungi: works for me | 21:07 |
portante | fungi: need some help with swift stuff? | 21:09 |
fungi | portante: spotted this a few moments ago... https://jenkins02.openstack.org/job/gate-grenade-dsvm/5046/consoleText | 21:10 |
fungi | failure of swift devstack exercises in stable/grizzly | 21:11 |
fungi | e-r says it might be bug 1209086 or 1240256 | 21:12 |
fungi | http://logs.openstack.org/24/61924/1/gate/gate-grenade-dsvm/a40c7a4/ | 21:12 |
* portante looks | 21:13 | |
fungi | since we have e-r patterns for it, probably not something new | 21:14 |
*** dhellmann_ is now known as dhellmann | 21:15 | |
*** yjiang5_1 has joined #openstack-gate | 21:15 | |
portante | fungi: looks like the "new" code account server did not start | 21:17 |
portante | yes, that is 1209086 | 21:18 |
fungi | portante: okay, thanks for the confirmation | 21:19 |
russellb | seems like jjb files have changed since i last looked ... more templatey | 21:20 |
gsamfira | hey guys. I am going through the uncategorized bugs now. Where can I find the already created categories? Would like to add some. | 21:21 |
jog0 | gsamfira: ? | 21:23 |
fungi | gsamfira: uncategorized failures just mean there are no patterns to positively match them in elastic-recheck | 21:25 |
gsamfira | looking through the bugs here: http://status.openstack.org/elastic-recheck/data/uncategorized.html . Is there a list of already-open bugs where I can add some of these, if they match? | 21:25 |
gsamfira | gotcha | 21:25 |
fungi | russellb: yeah, clarkb and jeblair templated-up the devstack jobs to make them less redundant | 21:26 |
fungi | so now jobs can run in check/gate/periodic and on multiple branches without needing separate definitions | 21:27 |
russellb | my brain is tired. | 21:35 |
salv-orlando | russellb: your tired brain is my brain at its best. | 21:36 |
russellb | ha, whatever | 21:36 |
fungi | ummm... did mock move beneath us? there are nova and ceilo changes in the gate that are spontaneously but consistently failing all their unit tests with "TypeError: _load_plugins() takes exactly 4 arguments (5 given)" | 21:39 |
russellb | fungi: it's a stevedore release | 21:40 |
fungi | oh. ugh | 21:40 |
russellb | fungi: there's a patch or 2 up for nova as of a few minutes ago ... | 21:40 |
fungi | okay, known issue then. silencing my personal alarm | 21:40 |
dhellmann | several test suites are mocking a private method in stevedore | 21:40 |
fungi | got it. makes sense | 21:40 |
russellb | dhellmann: what could go wrong? | 21:40 |
dhellmann | russellb: indeed | 21:40 |
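[Editor's note: the failure mode dhellmann describes reproduces in a few lines: a test replaces a private method with a stub written against the old argument count, then the library starts passing one more argument. The class and names below are illustrative, not stevedore's actual internals.]

```python
# Toy reproduction of the breakage: a test stubs a private method with the
# old arity, then the library passes an extra argument internally.
import unittest
from unittest import mock


class Manager(object):
    def __init__(self, namespace):
        # the library now passes a fourth argument...
        self.extensions = self._load_plugins(namespace, True, True, None)

    def _load_plugins(self, namespace, invoke, verify, extra):
        return []  # real loading elided


class TestStaleStub(unittest.TestCase):
    def test_stale_stub(self):
        # ...but the stub still has the old shape (self + 3 args), so the
        # call above fails; on Python 2 the message reads exactly like the
        # gate one: "TypeError: _stub() takes exactly 4 arguments (5 given)".
        def _stub(self, namespace, invoke, verify):
            return []

        with mock.patch.object(Manager, '_load_plugins', _stub):
            self.assertRaises(TypeError, Manager, 'my.namespace')


if __name__ == '__main__':
    unittest.main()
```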
mriedem | jog0: another e-r query patch for you to look at while i'm working out the fix in tempest: https://review.openstack.org/#/c/69441/ | 21:41 |
russellb | sdague: dhellmann do we need both https://review.openstack.org/#/c/69476 and https://review.openstack.org/#/c/69475 ? | 21:41 |
sdague | russellb: no we were racing on the solution | 21:41 |
dhellmann | russellb: no, I just abandoned mine | 21:41 |
russellb | k | 21:41 |
dhellmann | sdague: please just add that comment about the right way to use stevedore's test API to yours | 21:42 |
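[Editor's note: the test API dhellmann means is presumably stevedore's ExtensionManager.make_test_instance(), which builds a manager from hand-made Extension objects and never touches the private loading path. A sketch; FakePlugin and the extension name are invented:]

```python
# The supported route instead of patching _load_plugins: build the manager
# from hand-made Extension objects with make_test_instance().
from stevedore.extension import Extension, ExtensionManager


class FakePlugin(object):
    def run(self):
        return 'ok'


fake = FakePlugin()
ext = Extension('fake', None, FakePlugin, fake)  # entry_point unused here
manager = ExtensionManager.make_test_instance([ext])

assert manager.names() == ['fake']
assert manager['fake'].obj.run() == 'ok'
```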
mriedem | anteaya: i think you were looking at this earlier: https://bugs.launchpad.net/nova/+bug/1256182 | 21:43 |
mriedem | anteaya: i think that one is probably a difference in how nova-network and neutronv2 api handle the bad request, i'm looking but have to leave soon | 21:43 |
mriedem | that wouldn't be uncommon though, we've had that before with tempest | 21:43 |
mriedem | like how security groups are handled | 21:43 |
sdague | dhellmann: so, honestly, I think a low hanging fruit bug would be more fruitful than that comment | 21:45 |
dhellmann | sdague: ok, I can open that | 21:45 |
*** alexpilotti has quit IRC | 21:47 | |
dhellmann | sdague: https://bugs.launchpad.net/nova/+bug/1273451 | 21:47 |
*** alexpilotti_ has joined #openstack-gate | 21:48 | |
jog0 | do we have a bug filed for the stevedore issue? | 21:48 |
jog0 | ahh ^, I'll add an e-r fingerprint for it | 21:48 |
jog0 | actually hmm, not sure if we should wait or not on that | 21:49 |
jog0 | sdague: thoughts^ | 21:49 |
sdague | jog0: we don't, it would be worthwhile | 21:50 |
sdague | at least nova and ceilometer need to solve it | 21:51 |
sdague | I don't know who else has mocks that need it | 21:51 |
jog0 | sdague: looks like dhellmann just opened bug 1273451 | 21:51 |
sdague | right, yep | 21:51 |
sdague | jog0: you want to write the fingerprint? | 21:51 |
jog0 | sdague: sure | 21:52 |
jog0 | not sure if that makes sense though | 21:52 |
jog0 | hopefully people will read the bug and see if its fixed or not | 21:52 |
sdague | jog0: so actually, dhellmann's bug is a longer term one | 21:52 |
sdague | I'd actually file a different one for this | 21:52 |
jog0 | sdague: ahh what bug should I put a fingerprint under? | 21:52 |
sdague | jog0: there isn't one yet | 21:52 |
sdague | let me file one quick | 21:53 |
jog0 | query: message:" TypeError: _mock_load_plugins() takes exactly 4 arguments (5 given)" AND filename:"console.html" | 21:53 |
jog0 | only works for nova though | 21:53 |
russellb | the fun never ends. | 21:53 |
jog0 | russellb: heh! | 21:54 |
sdague | jog0: https://bugs.launchpad.net/ceilometer/+bug/1273455 | 21:55 |
sdague | jog0: ceilometer may not have gotten to ES yet | 21:55 |
jog0 | ceilometer | 21:55 |
sdague | it *literally* just happened | 21:55 |
sdague | jog0: the bug is on both | 21:55 |
*** bnemec has joined #openstack-gate | 21:56 | |
dhellmann | sdague: I'll take the ceilometer side of the fix | 21:56 |
sdague | oh, the function name in ceilometer is different | 21:56 |
sdague | jog0: so your signature would need to account for that | 21:56 |
jog0 | sdague: just surprised, that's all | 21:56 |
jog0 | (message:" TypeError: _mock_load_plugins() takes exactly 4 arguments (5 given)" OR message:" TypeError: _load_plugins() takes exactly 4 arguments (5 given)" ) AND filename:"console.html" | 21:58 |
russellb | unit tests passed | 21:58 |
jog0 | any other projects? | 21:58 |
sdague | jog0: http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiIHRha2VzIGV4YWN0bHkgNCBhcmd1bWVudHMgKDUgZ2l2ZW4pXCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6IjE3MjgwMCIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwic3RhbXAiOjEzOTA4NTk4NDkxMzB9 | 21:58 |
russellb | https://review.openstack.org/#/c/69476/ | 21:58 |
russellb | if anyone wants to hit +A on that | 21:58 |
jog0 | ohh, oslo.messaging | 21:58 |
jog0 | russellb: +Aed | 21:59 |
russellb | jog0: thank ya | 21:59 |
sdague | ok, added oslo.messaging to the bug | 22:00 |
*** dims has quit IRC | 22:00 | |
*** dims has joined #openstack-gate | 22:02 | |
jog0 | sdague: how about this message:"site-packages/stevedore/extension.py" | 22:02 |
sdague | "_plugins() takes exactly 4 arguments (5 given)" | 22:02 |
sdague | seems like everyone named it close | 22:02 |
jog0 | mine will hit any stacktrace with stevedore in it | 22:03 |
sdague | yeh, those might actually be different issues | 22:03 |
mriedem | anyone remember libvirt fails like this causing scheduler fails in the gate? "libvirtError: Unable to write to monitor: Broken pipe" | 22:04 |
mriedem | it shows up in a lot of successful builds too, so it can't be the reason the compute host goes down | 22:04 |
jog0 | sdague: message:"_plugins() takes exactly 4 arguments (5 given)" doesn't work | 22:05 |
jog0 | mriedem: rings a bell | 22:05 |
sdague | oh.... right, word boundaries | 22:05 |
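[Editor's note: sdague's "word boundaries" realization in one runnable bite: Lucene-style phrase queries match whole tokens in sequence, never substrings of a token, and "_mock_load_plugins" is indexed as a single token, so a phrase starting with "_plugins" can never line up. The tokenizer below is a crude stand-in for the real analyzer, which differs in detail but also keeps the underscored name together.]

```python
# Why message:"_plugins() takes exactly 4 arguments (5 given)" misses:
# term matching is exact per token, so a suffix of a token never matches.
import re


def tokenize(text):
    # keep word characters (underscore included) together, drop the rest
    return re.findall(r"[\w.]+", text)


indexed = tokenize("TypeError: _mock_load_plugins() takes exactly 4 arguments (5 given)")
query = tokenize("_plugins() takes exactly 4 arguments (5 given)")

print(indexed)  # ['TypeError', '_mock_load_plugins', 'takes', ...]
print(query)    # ['_plugins', 'takes', ...]

# '_plugins' is a substring of '_mock_load_plugins' but not a token of the
# document, so the whole phrase query misses.
assert "_plugins" not in indexed
```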
mriedem | jog0: yeah, there are lots of libvirt error bugs, trying to figure out if this is a dupe...which it probably is | 22:06 |
mriedem | so many to choose from | 22:06 |
jog0 | mriedem: dims may have more insight | 22:06 |
jog0 | sdague: https://review.openstack.org/69483 | 22:08 |
dims | mriedem, only in ceilometer whitelist. haven't run across that myself | 22:09 |
dims | jog0, what broke in oslo.messaging for stevedore? | 22:09 |
sdague | dims: the signature on a private method changed, and it turns out nova, ceilometer, and oslo.messaging had mocked it | 22:10 |
dhellmann | sdague: the issue in ceilometer actually points to a bug in stevedore | 22:10 |
dhellmann | do we have a log for the issue in oslo.messaging? | 22:10 |
dhellmann | ceilometer was properly using an API that I broke because there wasn't a test for it | 22:11 |
sdague | http://logs.openstack.org/1d/1d25c5ae20cdd8a9faf5a7f7dc2195a46e5be861/post/oslo.messaging-coverage/abe1a94/console.html | 22:11 |
dhellmann | I think that's the same issue | 22:11 |
dims | thx sdague ! | 22:11 |
dhellmann | ceilometer's tests won't run because it wants pytidylib and that's not on pypi, does someone know the magic incantation to allow that one to be downloaded remotely? | 22:12 |
dhellmann | nevermind, I think I found it | 22:15 |
dhellmann | sdague: https://review.openstack.org/69485 | 22:23 |
SpamapS | I am here to lend my eyes/code/w'ever to the effort for the next 2 hours. | 22:27 |
SpamapS | What's a good starting point? | 22:27 |
dims | dhellmann, so i don't need to do anything in oslo.messaging? | 22:29 |
dhellmann | dims, I'm going to work on a stevedore patch | 22:29 |
dhellmann | I have one that makes the tests pass now | 22:29 |
dims | cool. thx | 22:30 |
dhellmann | I just need to clean up the commit message and tie it to the bug | 22:30 |
dims | ok. back in a bit. | 22:30 |
sdague | dhellmann: cool | 22:32 |
sdague | so that will fix the ceilometer issue? | 22:32 |
dhellmann | sdague: yeah | 22:34 |
dhellmann | patch is merging now, and then I'll tag 0.14.1 | 22:34 |
dhellmann | tag pushed | 22:35 |
dhellmann | release on pypi | 22:35 |
*** jeckersb is now known as jeckersb_gone | 22:38 | |
*** flaper87 is now known as flaper87|afk | 22:50 | |
sdague | dhellmann: does stevedore now rely on a new non-pypi package? | 22:56 |
sdague | or was the tidylib thing just ceilometer? | 22:57 |
dhellmann | sdague: the tidylib thing is something in ceilometer | 22:58 |
sdague | ok cool | 22:58 |
sdague | just realized we're not doing stevedore tempest runs until it hits the mirror tonight | 22:58 |
sdague | and wanted to head off any issues there | 22:58 |
dhellmann | ok | 22:58 |
dhellmann | I also submitted 2 patches to remove the use of the broken stevedore class -- it was deprecated anyway | 22:59 |
dhellmann | I need to run out and buy bread and milk like all the other southerners in case it actually snows here tomorrow | 23:00 |
dhellmann | I'll be back online in an hour or two, in case there's more breakage | 23:00 |
*** dhellmann is now known as dhellmann_ | 23:01 | |
*** jeckersb_gone is now known as jeckersb | 23:09 | |
*** masayukig has joined #openstack-gate | 23:12 | |
* cyeoh waves | 23:26 | |
* jog0 waves back | 23:27 | |
*** sdague has quit IRC | 23:27 | |
dims | hey cyeoh | 23:28 |
cyeoh | dims: hi! | 23:28 |
*** sdague has joined #openstack-gate | 23:28 | |
cyeoh | any gate bug in particular getting debugged at the moment? (I'm just back from vacation). Gate queue looks really good right now | 23:29 |
jog0 | cyeoh: gate is moving pretty well actually | 23:30 |
jog0 | but we do have 66 gate bugs we are tracking http://status.openstack.org/elastic-recheck/ | 23:30 |
sdague | dims: https://review.openstack.org/#/c/69492/ - actually has a pep8 fail in it | 23:31 |
cyeoh | jog0: cool - I'll have a look through. | 23:31 |
jog0 | cyeoh: actually one thing that would be useful is to go through that entire list and make sure the bugs are properly triaged | 23:32 |
cyeoh | jog0: ok I'll check as I look at them | 23:33 |
*** dims has quit IRC | 23:33 | |
jog0 | cyeoh: we also now have http://status.openstack.org/elastic-recheck/data/uncategorized.html | 23:34 |
jog0 | for tracking unclassified failures | 23:34 |
cyeoh | jog0: ah I didn't know about that page | 23:34 |
jog0 | cyeoh: it didn't exist 2 weeks ago | 23:35 |
cyeoh | jog0: heh | 23:35 |
jog0 | so good news is: we have a good grasp on why the gate is failing | 23:35 |
jog0 | bad news is we identified 66 potential issues, with many of them still open | 23:35 |
cyeoh | yea some of them look really rare so it's good we're at least tracking them now | 23:36 |
jog0 | cyeoh: yeah | 23:37 |
sdague | cyeoh: yeh, categorization is good | 23:37 |
sdague | because it's actually really interesting to see that rare events show up a few times for us | 23:37 |
cyeoh | sdague: yea agreed. | 23:38 |
*** david-lyle has quit IRC | 23:40 | |
jog0 | sdague: btw it would be really slick if there was an option on http://status.openstack.org/elastic-recheck/ to specify gate, check or all queues | 23:40 |
mriedem | cyeoh: could use another set of eyes on the n-cpu/n-sch logs for this bug to see why the compute host goes down, which causes the build error | 23:40 |
mriedem | https://bugs.launchpad.net/nova/+bug/1257799 | 23:40 |
jog0 | or something like that | 23:40 |
mriedem | cyeoh: libvirt is breaking at some point i think but couldn't find anything good to fingerprint that didn't also hit a lot of successful runs | 23:41 |
jog0 | mriedem: thats the libvirt issues | 23:41 |
jog0 | we want to try libvirt 1.x but there are bugs in it | 23:41 |
sdague | jog0: yeh, there's lots of good things that we could do. Honestly, I think I've spent about as much of my time as I can this cycle on the e-r graphics side. | 23:41 |
jog0 | sdague: heh yeah you did spend a lot | 23:42 |
cyeoh | mriedem, jog0: yea it does look like the libvirt problems we've seen before | 23:42 |
jog0 | sdague: thoughts on 68304 | 23:42 |
sdague | jog0: no more thoughts for the day. Time to call it a night :) | 23:42 |
jog0 | dims was working on the libvirt stuff too | 23:42 |
jog0 | sdague: o/ | 23:43 |
*** dtroyer_zz has joined #openstack-gate | 23:45 | |
*** jd__` has joined #openstack-gate | 23:47 | |
*** dims has joined #openstack-gate | 23:48 | |
mriedem | jog0: might have a fingerprint on this one too: https://bugs.launchpad.net/nova/+bug/1269204 | 23:48 |
mriedem | see last comment | 23:48 |
*** dtroyer has quit IRC | 23:48 | |
*** jd__ has quit IRC | 23:48 | |
mriedem | could be the whole 'timed out waiting for thing' bug though | 23:48 |
*** jd__` is now known as jd__ | 23:48 | |
jog0 | mriedem: nice | 23:54 |
mriedem | jog0: basically anything failing in test_server_rescue.py is becoming suspect to me | 23:55 |