Tuesday, 2017-08-08

00:00 *** thorst has joined #openstack-powervm
00:02 *** thorst has quit IRC
00:05 *** thorst has joined #openstack-powervm
00:13 *** apearson has joined #openstack-powervm
00:23 *** thorst has quit IRC
00:23 *** thorst has joined #openstack-powervm
00:27 *** thorst has quit IRC
00:38 *** thorst has joined #openstack-powervm
00:42 *** edmondsw has joined #openstack-powervm
00:47 *** edmondsw has quit IRC
01:09 *** svenkat has quit IRC
01:12 *** svenkat has joined #openstack-powervm
01:19 *** svenkat has quit IRC
01:24 *** thorst has quit IRC
01:30 *** apearson has quit IRC
01:30 *** apearson has joined #openstack-powervm
02:30 *** edmondsw has joined #openstack-powervm
02:35 *** edmondsw has quit IRC
02:41 *** apearson has joined #openstack-powervm
03:21 *** apearson has quit IRC
03:22 *** apearson has joined #openstack-powervm
03:23 *** apearson has quit IRC
03:26 *** apearson has joined #openstack-powervm
03:59 *** esberglu has quit IRC
04:25 *** thorst has joined #openstack-powervm
04:30 *** thorst has quit IRC
04:49 *** esberglu has joined #openstack-powervm
04:53 *** esberglu has quit IRC
05:00 *** apearson has quit IRC
05:01 *** apearson has joined #openstack-powervm
05:01 *** apearson has quit IRC
05:01 *** apearson has joined #openstack-powervm
05:02 *** apearson has quit IRC
05:03 *** apearson has joined #openstack-powervm
05:03 *** apearson has quit IRC
05:58 *** thorst has joined #openstack-powervm
06:03 *** thorst has quit IRC
06:06 *** edmondsw has joined #openstack-powervm
06:11 *** edmondsw has quit IRC
07:54 *** edmondsw has joined #openstack-powervm
07:59 *** edmondsw has quit IRC
07:59 *** thorst has joined #openstack-powervm
08:04 *** thorst has quit IRC
08:28 *** esberglu has joined #openstack-powervm
08:32 *** esberglu has quit IRC
09:43 *** edmondsw has joined #openstack-powervm
09:47 *** edmondsw has quit IRC
10:00 *** thorst has joined #openstack-powervm
10:05 *** thorst has quit IRC
10:18 *** esberglu has joined #openstack-powervm
10:21 *** esberglu has quit IRC
10:42 *** thorst has joined #openstack-powervm
10:54 *** thorst has quit IRC
10:54 *** thorst has joined #openstack-powervm
10:59 *** thorst has quit IRC
11:40 *** svenkat has joined #openstack-powervm
11:44 *** svenkat_ has joined #openstack-powervm
11:44 *** svenkat has quit IRC
11:44 *** svenkat_ is now known as svenkat
11:46 *** smatzek has joined #openstack-powervm
11:48 *** smatzek_ has joined #openstack-powervm
11:48 *** smatzek has quit IRC
11:51 *** thorst has joined #openstack-powervm
11:57 *** edmondsw has joined #openstack-powervm
12:04 *** esberglu has joined #openstack-powervm
12:09 *** esberglu has quit IRC
12:58 *** esberglu has joined #openstack-powervm
13:01 <esberglu> #startmeeting powervm_driver_meeting
13:01 <openstack> Meeting started Tue Aug  8 13:01:02 2017 UTC and is due to finish in 60 minutes.  The chair is esberglu. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:01 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:01 <openstack> The meeting name has been set to 'powervm_driver_meeting'
13:01 <mdrabe> o/
13:01 <edmondsw> o/
13:02 <esberglu> #topic In Tree Driver
13:02 <esberglu> #link https://etherpad.openstack.org/p/powervm-in-tree-todos
13:03 <esberglu> I don't think there is anything new IT
13:03 <edmondsw> right
13:03 <esberglu> #topic Out Of Tree Driver
13:04 <edmondsw> thorst please check 5645
13:05 <thorst> edmondsw: yes sir.
13:05 <edmondsw> I think that's all we've got going OOT at the moment
13:06 <esberglu> #topic PCI Passthrough
13:06 <esberglu> Anything new here?
13:07 <edmondsw> I don't think we've made any progress here yet. efried is finishing up some auth work and then we can start to make progress
13:07 <efried> o/
13:07 <efried> Yeah, what edmondsw said.
13:08 <esberglu> #topic PowerVM CI
13:09 <esberglu> Tested the devstack gen. tempest.conf one last time for all runs last night, all looked good
13:09 <edmondsw> great
13:09 <esberglu> Got the +2 from edmondsw, anyone else want to look before I merge?
13:10 <esberglu> Tempest bugs are getting worked through
13:10 <edmondsw> do we need to be opening a LP bug about those 2 tests having the same id?
13:10 <efried> esberglu I don't need to look again.
13:10 <esberglu> edmondsw: I think that it is intentional for those 2
13:10 <efried> If it's tested and edmondsw is happy, I'm happy.
13:10 <esberglu> They are the same test, just different microversions
13:11 <edmondsw> I'd rather we weren't having to skip a couple new tests, but that seems a small price to pay to get this in
13:11 <edmondsw> I hope there's a todo to figure that out and get those unskipped?
13:11 <esberglu> edmondsw: Yeah I was going to add it to the list once I merged
13:11 <edmondsw> yeah, I know it's kinda the same test... still thought they should probably have different ids but maybe not
13:12 <edmondsw> esberglu I'd go ahead and add it just to make sure we don't forget :)
13:12 <esberglu> I can disable the 2.48 version of the tests by setting the max_microversion
13:12 <edmondsw> I'd rather not
13:12 <esberglu> But I'm not familiar enough with compute microversions to know if that's really what we want
13:12 <esberglu> I didn't think so either
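[Editor's note: the option esberglu mentions is tempest's per-service microversion cap. A sketch of what that tempest.conf change would look like — the value is illustrative, and the team decided against doing this:]

```ini
# tempest.conf -- illustrative only; the meeting decided NOT to do this.
# Capping max_microversion below 2.48 would deselect the v2.48 variant of
# the diagnostics test, at the cost of losing all 2.48+ coverage.
[compute]
max_microversion = 2.47
```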
13:12 <efried> Can I get some background here?
13:13 <efried> Two different tests testing the same thing over different microversions of the API ought to have different UUIDs.  I very much doubt that was intentional.
13:13 <efried> And we should be able to handle both microversions in our env.  If we can't, and that's passing in the world at large, it's our bug.
13:14 <edmondsw> efried check 5598
13:14 <esberglu> https://github.com/openstack/tempest/blob/master/tempest/api/compute/admin/test_server_diagnostics.py
13:14 <esberglu> I'm guessing whoever made the V248 test there just copied the original test case and didn't change the ID
13:14 <edmondsw> efried I expect efried is right, but I didn't look at how the test is actually written... is it one method, so one id, but run twice somehow?
13:15 <edmondsw> esberglu ah in that case it does sound like a bug
13:15 <efried> esberglu I suspect that's what happened.
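[Editor's note: a toy sketch of the suspected copy-paste. Tempest tags each test with an `idempotent_id` decorator; if the v2.48 class was copied without regenerating the UUID, a simple registry scan flags the duplicate, roughly what tempest's check-uuid tooling does. The class names, decorator internals, and UUID below are all made up for illustration:]

```python
def idempotent_id(uid):
    """Toy stand-in for tempest.lib.decorators.idempotent_id."""
    def decorator(func):
        func.uuid = uid  # record the id on the test method
        return func
    return decorator

class ServerDiagnosticsTest:
    @idempotent_id('12345678-1234-5678-1234-567812345678')
    def test_get_server_diagnostics(self):
        pass

class ServerDiagnosticsV248Test:  # copied class kept the parent's UUID
    @idempotent_id('12345678-1234-5678-1234-567812345678')
    def test_get_server_diagnostics(self):
        pass

def find_duplicate_ids(classes):
    """Map each UUID to its tests; return only UUIDs used more than once."""
    by_id = {}
    for cls in classes:
        for name, attr in vars(cls).items():
            uid = getattr(attr, 'uuid', None)
            if uid is not None:
                by_id.setdefault(uid, []).append('%s.%s' % (cls.__name__, name))
    return {uid: tests for uid, tests in by_id.items() if len(tests) > 1}

dups = find_duplicate_ids([ServerDiagnosticsTest, ServerDiagnosticsV248Test])
print(dups)  # the shared UUID maps to both test methods
```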
13:15 <esberglu> Anyways I can look into it
13:15 <edmondsw> esberglu open the LP bug... worst case they reject it
13:15 <edmondsw> tx
13:15 <esberglu> Yep
13:15 <esberglu> Other bugs...
13:16 <esberglu> There was a bug in tempest where the REST requests would timeout
13:16 <esberglu> efried made a loop to see if it was permanent or temporary
13:16 <esberglu> https://review.openstack.org/#/c/491003/
13:16 <esberglu> With that getting patched in we no longer are seeing that timeout
13:17 <esberglu> But we still need to find out what's causing the timeout and make a long term solution
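[Editor's note: the stopgap being discussed can be sketched as a generic retry wrapper — this is not the actual patch in review 491003; the exception class, limits, and names are invented. The key point, which efried raises a few lines below, is that each timeout is logged so the logs show how often it hits and whether the very next attempt succeeds:]

```python
import logging
import time

log = logging.getLogger(__name__)

class RequestTimeout(Exception):
    """Stand-in for the REST client's timeout exception."""

def with_retries(call, attempts=3, delay=1.0):
    """Retry a flaky REST call, logging each timeout so hits can be
    counted per test and one-and-done behavior confirmed."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except RequestTimeout:
            log.warning('REST request timed out (attempt %d/%d)',
                        attempt, attempts)
            if attempt == attempts:
                raise  # permanent failure: give up after the last attempt
            time.sleep(delay)

# Usage sketch: a call that times out once, then succeeds on retry.
state = {'calls': 0}
def flaky():
    state['calls'] += 1
    if state['calls'] == 1:
        raise RequestTimeout()
    return 'ok'

print(with_retries(flaky, delay=0.0))  # prints: ok
```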
13:17 <edmondsw> ++
13:17 <esberglu> hsien got to the bottom of the internal server error 500's
13:17 <efried> oh, do tell
13:18 <edmondsw> sweet
13:18 <edmondsw> 5657
13:18 <esberglu> There was an issue with the vios busy rc not being honored and retrying
13:19 <efried> btw, that loop fixup should have logged a message when we hit it.  We should look for that log message and see how many times it hits per test.  I suspect the very next try went through.  Which probably means it's a threading problem at the server side of that call.
13:19 <esberglu> efried: Will do
13:20 <efried> esberglu Another experiment that might be worthwhile is knocking our threading level down.  It's possible we're just timing out due to load.
13:21 <efried> Though... it seems like it would always hit on one or more of the same three or four tests, nah?
13:21 <esberglu> efried: Yeah same handful of tests
13:22 <edmondsw> esberglu you also had something about discover_hosts on the agenda?
13:22 <edmondsw> did we get that all straight?
13:22 <edmondsw> looks like the CI has been better
13:22 <esberglu> edmondsw: Was just going to say that our fix is working there
13:22 <edmondsw> awesome
13:22 <esberglu> Yep with that and efried's retry loop success rates are up
13:23 <esberglu> hsien's fix is +2 so should be in soon, then I will update the systems
13:23 <efried> edmondsw It needs to be noted that the retry loop is in tempest code, not our code.
13:23 <efried> So it's not a long-term fix (unless we can make the case that it should be submitted to tempest itself).
13:24 <edmondsw> efried right, we need to figure out what's going on there and how to fix it permanently
13:24 <efried> Yeah, cause I don't think it's a good idea for us to be running long-term with a tempest patch.
13:24 <edmondsw> ++
13:24 <esberglu> ++
13:24 <edmondsw> that on the todo list, esberglu?
13:25 <edmondsw> at the top? :)
13:25 <esberglu> edmondsw: I need to do an update of the list after the meeting but yeah it will be
13:25 <edmondsw> cool
13:25 <edmondsw> I was going to ask about http://184.172.12.213/92/474892/6/check/nova-in-tree-pvm/2922a78/
13:26 <edmondsw> I'm pretty sure I've seen that kind of failure before... but can't remember where it ended up
13:26 <esberglu> edmondsw: Yeah I saw that. I think when I removed a bunch of tests from the skip list with the networking api extension change some may have introduced new issues
13:27 <esberglu> I know we have had those before, can't remember what our solution was
13:27 <edmondsw> ok, that makes sense. cuz I thought we'd fixed that, but it was probably with a skip
13:28 <esberglu> edmondsw: IIRC its an issue with tests interfering with each other
13:29 <esberglu> That's all for CI
13:29 <esberglu> #topic Driver Testing
13:29 <esberglu> Any progress?
13:30 <edmondsw> I opened RTC stories for testing
13:30 <edmondsw> I ordered them such that we'd validate vSCSI, FC, and LPM with the OOT driver before coming back to iSCSI
13:30 <edmondsw> give us some time to do the dev work on iSCSI
13:31 <edmondsw> don't see jay1_ on to discuss further
13:31 <edmondsw> chhavi fyi ^
13:33 <esberglu> #topic Open Discussion
13:33 <esberglu> Any last words?
13:33 <edmondsw> I finally got devstack working! ;)
13:33 <edmondsw> so there are a bunch of additions to https://etherpad.openstack.org/p/powervm_stacking_issues
13:33 <esberglu> Woohoo!
13:34 <edmondsw> that last one was really weird... hope that's really the fix, and it wasn't just coincidence that it worked after that
13:34 <edmondsw> I'm pretty sure it's legit
13:35 <edmondsw> that's it from me
13:35 <esberglu> Thanks for joining
13:35 <esberglu> #endmeeting
13:35 <openstack> Meeting ended Tue Aug  8 13:35:32 2017 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
13:35 <openstack> Minutes:        http://eavesdrop.openstack.org/meetings/powervm_driver_meeting/2017/powervm_driver_meeting.2017-08-08-13.01.html
13:35 <openstack> Minutes (text): http://eavesdrop.openstack.org/meetings/powervm_driver_meeting/2017/powervm_driver_meeting.2017-08-08-13.01.txt
13:35 <openstack> Log:            http://eavesdrop.openstack.org/meetings/powervm_driver_meeting/2017/powervm_driver_meeting.2017-08-08-13.01.log.html
13:43 *** smatzek_ has quit IRC
14:01 *** smatzek has joined #openstack-powervm
14:05 *** smatzek has quit IRC
14:06 *** smatzek has joined #openstack-powervm
14:07 *** tjakobs has joined #openstack-powervm
14:46 *** smatzek has quit IRC
14:56 <thorst> efried: 5620 - I'm going to merge that
14:56 <efried> thorst Do it.
15:22 *** smatzek has joined #openstack-powervm
15:37 <esberglu> CI environment is updated with hsien's vios busy fix. Should hopefully see a decent chunk of failures disappear
15:59 <efried> esberglu Cool.  Confirm that one-and-done behavior of that timeout yet?
15:59 *** smatzek has quit IRC
16:00 *** smatzek has joined #openstack-powervm
16:29 <edmondsw> esberglu efried timeouts don't seem to be a thing of the past yet: http://184.172.12.213/94/490994/3/check/nova-in-tree-pvm/5ce0e6c/
16:32 <esberglu> efried: Nope not yet
16:32 <efried> edmondsw esberglu I do believe those are *real* timeouts.
16:32 <efried> actual test timeouts.
16:32 <esberglu> edmondsw: efried: Yep. Those are timeouts waiting for servers to reach a certain status
16:33 <esberglu> Where as the other timeouts were REST request timeouts
16:33 <edmondsw> i.e. indicating there is a problem with the patch being proposed, or something for us to dig into?
16:34 <efried> But it's those same effin tests still.
16:34 <esberglu> Not diagnosed fully yet. Definitely something we need to dig into
16:34 <efried> edmondsw The patch isn't being proposed :)  At least not yet.  It's a stopgap to help us identify whether there's an actual problem that needs to be solved.
16:34 <edmondsw> efried not what I meant
16:35 <edmondsw> I meant the patch that the CI is testing, not the patch that is supposed to help us avoid timeouts
16:35 <efried> gotcha.
16:35 <efried> Well, the fact that it's these same bloody tests...
16:35 <edmondsw> yeah, it's not a good sign
16:35 <efried> esberglu Tried bumping the concurrency down yet?
16:36 <esberglu> efried: We did that at some point in the past with no success
16:36 <efried> Wellllll
16:36 <efried> That would have been a different timeout.
16:37 <esberglu> efried: No we tried it for the actual server timeouts
16:38 <efried> I'll grant that we MAY still need to figure out why those REST requests are timing out.  But the fact that the SAME TESTS are still hitting overall test case timeouts pretty much indicates that, for the function these tests are hitting, our performance sucks.
16:38 <efried> That would be corroborated if we don't see the same failures when we reduce the concurrency and/or increase the test timeout.
16:39 <efried> And those actions would have been inconclusive before because we were hitting that REST request timeout instead, which was masking this 'un.
16:40 <esberglu> The rest timeout wasn't masking the other timeouts
16:42 <esberglu> I agree with everything else though
16:45 <efried> esberglu The exception we were seeing without the retry loop was in the REST request.  It wasn't getting to the overall test timeout, was it?
16:46 <esberglu> efried: If a test hit the REST timeout it wouldn't get to the overall timeout. But tests would still get to the overall timeout and not hit the REST timeout
16:47 <efried> esberglu Okay, if you're sure which one we were seeing under which configuration.
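[Editor's note: the two experiments efried proposes map to concrete knobs. The flags and variables below are the usual tempest ones, but the values and test regex are placeholders, and this CI's actual invocation may differ:]

```shell
# 1) Knock the threading level down: run fewer tempest workers in parallel
#    to rule out load-induced timeouts (default is one worker per CPU).
tempest run --regex '<suite-regex>' --concurrency 2

# 2) Raise the overall per-test timeout (seconds) before the run;
#    1200 (20 min) is the value quoted a few lines below.
export OS_TEST_TIMEOUT=2400
```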
16:53 *** smatzek has quit IRC
16:55 <efried> esberglu What's the overall single-test timeout?
16:56 <esberglu> 1200
16:56 <esberglu> 20 min
17:04 *** kylek3h has joined #openstack-powervm
17:17 *** smatzek has joined #openstack-powervm
17:33 *** chhavi has quit IRC
18:28 <esberglu> efried: edmondsw: I think the fix from hsien is causing problems
18:28 <efried> oh goodie.  What kind of problems?
18:28 <efried> We getting extra VIOS_BUSYs?
18:28 <esberglu> efried: Not sure. Just got back from lunch and seeing a bunch of runs like this
18:28 <esberglu> http://184.172.12.213/66/491866/1/check/nova-out-of-tree-pvm/d069a7f/powervm_os_ci.html
18:29 <esberglu> Could be something else and just coincidental timing, diving into the logs now
18:30 <edmondsw> lovely
18:31 <esberglu> Looks more like what we would see with a cells/placement issue
18:31 <efried> esberglu Yeah, compute didn't even start.
18:34 <esberglu> efried: Looks like nothing started
18:34 <efried> yeah
18:34 <efried> though stack appears to have succeeded.
18:35 <esberglu> Well this should be fun to debug
18:37 <esberglu> efried: I have a test node up for something else that hit this
18:37 <esberglu> "Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable."
18:37 <esberglu> Might just not be logging properly
18:38 <efried> uck
18:41 <edmondsw> esberglu I was seeing log rotation while I was trying to stack the other day. Very annoying
18:41 <edmondsw> hit an issue, go away for a few hours and then when you come back to it the logs aren't there anymore
18:42 <edmondsw> I'm not a fan of journalctl so far...
18:42 <esberglu> Me neither
18:45 <edmondsw> hopefully there's a way to get it to save gzipped log files?
18:45 <esberglu> Get what to save gzipped log files?
18:46 <esberglu> This journalctl issue is hitting the passing runs too
18:46 <esberglu> So we don't have CI logs for the time being
18:47 <edmondsw> journald
18:47 <efried> There must be a way to set the rotate size
18:47 <esberglu> /etc/systemd/journald.conf
18:48 <esberglu> That's the conf for it, looking at the options now
18:49 <edmondsw> I don't see anything about rotation, just retention
18:50 <edmondsw> which probably makes sense... if it's managing everything for you without files, there aren't files to rotate
18:51 <esberglu> Looks like it saves the journals to /run/log/journal
18:51 <esberglu> I wonder if we are filling that up
18:52 <edmondsw> look at the "MaxFile" ones
18:54 <edmondsw> journalctl does say that "Output is interleaved from all accessible journal files, whether they are rotated or currently being written", so it's not that it's only reading from the current file
18:56 <edmondsw> esberglu it looks to me like we should unset SystemMaxFiles and RuntimeMaxFiles and instead use SystemMaxFileSize and RuntimeMaxFileSize
18:57 <edmondsw> oh, nm... you need both
18:58 <edmondsw> but one or the other probably needs to be increased
19:00 <esberglu> edmondsw: Spawning a system to give it a try
19:10 *** dwayne has quit IRC
21:01 <esberglu> edmondsw: Tried using the RuntimeMaxFileSize with no luck. There is another option to save the logs to /var/log/journal
21:01 <esberglu> instead of /run/log/journal
21:01 <esberglu> Which seems to have worked
21:01 <edmondsw> cool
21:02 <esberglu> Trying it out on a fresh system for a full run now
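[Editor's note: the options discussed above are real journald.conf settings, but the sizes below are illustrative, not the CI's actual values. Setting Storage=persistent moves journals off the volatile /run/log/journal tmpfs into /var/log/journal, which is the option that ended up working:]

```ini
# /etc/systemd/journald.conf -- illustrative values
[Journal]
# persistent => keep journals in /var/log/journal instead of /run/log/journal,
# so they survive rotation pressure on the small tmpfs (and reboots)
Storage=persistent
# Size-based rotation caps, the knobs tried first
SystemMaxFileSize=200M
RuntimeMaxFileSize=200M
```

journald has to be restarted to pick this up, e.g. `sudo systemctl restart systemd-journald`.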
21:04 *** svenkat has quit IRC
21:37 *** thorst has quit IRC
21:38 *** edmondsw has quit IRC
22:03 *** smatzek has quit IRC
22:03 *** esberglu has quit IRC
22:04 *** esberglu has joined #openstack-powervm
22:08 *** esberglu has quit IRC
22:16 *** esberglu has joined #openstack-powervm
22:34 *** tjakobs has quit IRC
22:38 *** thorst has joined #openstack-powervm
22:43 *** thorst has quit IRC
23:32 *** apearson has joined #openstack-powervm
23:40 *** apearson has quit IRC
