Monday, 2016-11-07

*** thorst_ has joined #openstack-powervm00:38
*** thorst_ has quit IRC00:43
*** thorst_ has joined #openstack-powervm02:12
*** thorst_ has quit IRC02:17
*** thorst_ has joined #openstack-powervm02:17
*** thorst_ has quit IRC02:28
*** thorst_ has joined #openstack-powervm02:28
*** thorst_ has quit IRC02:37
*** thorst_ has joined #openstack-powervm03:26
*** thorst_ has quit IRC03:28
*** thorst_ has joined #openstack-powervm03:28
*** thorst_ has quit IRC03:37
*** thorst_ has joined #openstack-powervm04:35
*** thorst_ has quit IRC04:42
*** thorst_ has joined #openstack-powervm05:40
*** thorst_ has quit IRC05:47
*** thorst_ has joined #openstack-powervm06:46
*** thorst_ has quit IRC06:53
*** thorst_ has joined #openstack-powervm07:50
*** thorst_ has quit IRC07:57
*** k0da has joined #openstack-powervm08:28
*** k0da has quit IRC08:38
-openstackstatus- NOTICE: Gerrit is going to be restarted due to slowness and proxy errors08:46
*** openstackgerrit has quit IRC08:48
*** openstackgerrit has joined #openstack-powervm08:48
*** thorst_ has joined #openstack-powervm08:55
*** thorst_ has quit IRC09:02
*** kotra03 has joined #openstack-powervm09:12
*** thorst_ has joined #openstack-powervm10:00
*** k0da has joined #openstack-powervm10:02
*** thorst_ has quit IRC10:07
*** thorst_ has joined #openstack-powervm11:05
*** thorst_ has quit IRC11:12
*** thorst_ has joined #openstack-powervm12:10
*** thorst_ has quit IRC12:17
*** thorst_ has joined #openstack-powervm12:50
*** thorst_ has quit IRC12:50
*** thorst_ has joined #openstack-powervm12:52
*** kylek3h has quit IRC12:57
*** tblakes has joined #openstack-powervm13:02
*** edmondsw has joined #openstack-powervm13:17
*** apearson has joined #openstack-powervm13:23
*** jwcroppe has quit IRC13:30
*** kylek3h has joined #openstack-powervm13:30
*** kylek3h has quit IRC13:30
*** jwcroppe has joined #openstack-powervm13:30
*** kylek3h has joined #openstack-powervm13:30
*** jwcroppe has quit IRC13:35
*** mdrabe has joined #openstack-powervm13:46
*** jwcroppe has joined #openstack-powervm13:55
*** dwayne_ has quit IRC14:10
*** kotra03 has quit IRC14:17
*** seroyer has joined #openstack-powervm14:40
openstackgerritShyama proposed openstack/nova-powervm: SSP Volume Adapter  https://review.openstack.org/372254 14:42
*** esberglu has joined #openstack-powervm14:49
*** kriskend has joined #openstack-powervm14:50
*** dwayne_ has joined #openstack-powervm14:50
*** k0da has quit IRC15:25
*** apearson has quit IRC15:30
*** apearson has joined #openstack-powervm15:37
*** apearson_ has joined #openstack-powervm15:40
*** adi___ has quit IRC15:41
*** tjakobs has joined #openstack-powervm15:41
*** apearson has quit IRC15:42
*** toan has quit IRC15:43
*** toan has joined #openstack-powervm15:47
*** adi___ has joined #openstack-powervm15:54
*** apearson_ has quit IRC15:59
*** apearson_ has joined #openstack-powervm16:02
*** adi___ has quit IRC16:06
*** adi___ has joined #openstack-powervm16:09
thorst_efried: asked in openstack-nova for them to take another look at the powervm blueprint.  I think they will this week.16:23
adreznecthorst_: We can bring it up in the open floor part of the nova meeting as well16:32
thorst_yeah, we'll see.  I was going to do that last meeting but that was like a 30 second window to get it in16:37
thorst_and I missed it16:37
thorst_:-)16:37
thorst_esberglu: if we got you a patch to test in the CI...could you apply it?16:56
thorst_it's in addition to the local2remote16:56
esbergluYeah16:56
thorst_try 4458 16:57
thorst_I guess efried will give you the go ahead before doing it16:57
esbergluThe LU thing is hitting the undercloud again17:02
*** smatzek has joined #openstack-powervm17:13
efriedesberglu: Try 4458 now.17:55
efriedesberglu: want me to look at "The LU thing"?17:56
*** jwcroppe has quit IRC18:00
*** jwcroppe has joined #openstack-powervm18:01
*** jwcroppe_ has joined #openstack-powervm18:04
*** jwcroppe has quit IRC18:05
*** jwcroppe_ has quit IRC18:06
*** jwcroppe has joined #openstack-powervm18:07
*** jwcroppe has quit IRC18:11
thorst_esberglu: seems like efried is close to having this ready18:26
*** jwcroppe has joined #openstack-powervm18:42
*** jwcroppe has quit IRC18:46
*** jwcroppe has joined #openstack-powervm18:46
*** jwcroppe has quit IRC18:51
esbergluefried: Sorry for the slow reply, was out at lunch.18:57
esbergluSo nodes are failing to delete again, but only from 2 of the 5 SSP groups18:57
esberglu*2 of the 4 SSP groups18:58
adreznecInteresting18:59
esbergluSeeing this in the logs18:59
esbergluhttp://paste.openstack.org/show/588307/ 19:00
adreznecthorst_: all the SSPs are from the same SAN, right?19:00
thorst_adreznec: yep19:00
thorst_but usually when we reinstall the VIOSes we have to remove the luns on the storage for the host19:00
efriedesberglu, that looks like yet another VIOS flake.19:00
adreznecThat's... weird19:00
adreznecYeah19:00
thorst_so it doesn't install to the wrong thing19:00
adreznecSure19:00
adreznecBut I don't think esberglu has reinstalled anything here19:00
adreznecJust the VIOS upgrades recently, right?19:01
esbergluYeah19:01
efriedYeah, exactly.19:01
efriedDo we need another REST-side fix from apearson_ to retry when VIOS effs up?19:01
efriedWhat's the point of a clustered file system if you can't operate on it in a distributed fashion?19:02
efried^^ Captain Obvious ^^19:02
adreznecefried: Asking the tough questions19:04
efriedadreznec thorst_ Should we open a VIOS defect and chase it around for a few weeks before they tell us they "just provide building blocks" and we have to synchronize ourselves?19:06
adreznecSigh19:11
adreznecprobably19:11
adreznecThat was the point of being on the latest VIOS19:11
adreznecesberglu: is this recreatable pretty easily?19:11
esbergluIf I leave the CI on for long enough it seems to hit it. Not sure what steps it would take to recreate on purpose19:13
efriedesberglu, how long ago did the error occur?19:13
efriedI.e. if we snap the offending VIOS(es), would that give the VIOS team enough information to nail down exactly what operation caused the bounce?19:14
esbergluThat error happened yesterday afternoon at some point19:17
efriedboo, that's probably not recent enough.19:18
efriedesberglu, see mriedem's comment in #openstack-nova19:23
efriedIt's going to blow up our skip list, potentially, cause I sure would like to keep the test names in comments.19:23
esbergluYep19:23
esbergluefried: Okay the error just popped again on neo819:29
efriedsame error?19:29
esbergluYeah19:29
efriedQuick, run a snap on the offending VIOS19:29
efriedAnd open a VIOS defect.19:29
esbergluHow do I do that?19:30
efriedWhich?19:30
adreznecRun snap?19:30
esbergluYeah19:30
seroyersnap19:30
efriedhm, on another glance, this actually looks like it may be much lower down in the stack - an actual ODM lock timeout.  That's trickier.19:31
adreznecYeah...19:32
adreznecthat's an AIX bug19:32
efriedesberglu: snap -a19:33
esberglu-a option flag is not valid19:35
efriedTry it as root.19:37
efriedoem_setup_env19:37
seroyerYou can snap as padmin19:37
seroyerCan and should19:37
esbergluHmm it isn't letting me specify the -a flag as padmin, but it does for root19:38
efriedokay, do what seroyer says then.19:38
efriedEither way.19:39
esbergluseroyer: Should I just do it without the -a as padmin? Or with -a as root?19:39
seroyerI don't know what the -a does as root, so I'm not sure how to answer that.  snap -help should give you a list of supported options.19:40
efriedBut if we're seeing an ODM lock problem, I kinda doubt we're going to convince VIOS to implement the retry on their end.  Or convert it to a "VIOS busy" error.19:40
seroyerBut I've always just run it without any args.19:40
seroyerefried, +119:40
efriedSo it's likely we'll have to have apearson_ do the retry magic keyed off that error code.19:41
efriedAlthough we have to hope that VIOS at least has atomicity/transactionalism around whatever that operation is.19:41
efriedI.e. it's not doing part of an operation and then failing.19:41
efriedAll of that needs to be ferreted out via the defect before we take action in REST.19:41
esbergluOkay so what actions do I need to take for this?19:48
efriedThe snap command should have printed the location of a tarball at the end.19:49
efriedOpen a CQ defect against VIOS and attach that snap.19:50
efriedadreznec seroyer thorst_ - I can never remember how all the routing info goes for these defects - y'all have that?19:50
thorst_efried: I do not...19:51
efriedesberglu, once done, give me the defect number and I'll put in my rant.19:51
efriedesberglu, this won't be the last VIOS defect you open, so keep notes on the process for future reference ;-)19:51
thorst_efried: yeah, let's ask hsien and apearson to do the magic retry thing19:51
thorst_we'll get a quick fix then19:52
thorst_(sorry  - I read 80% of that thread...so hopefully that was enough for me to go off of)19:52
efriedthorst_, want to do that right away?  I think we should at least make sure VIOS is cleaning up properly first.19:52
thorst_efried: I just want to do the quickest thing to unwedge us19:52
thorst_we've got to get CI unwedged ASAP...19:53
efriedThis is intermittent.19:53
efriedesberglu, when it happens, does it bring the whole CI crashing down around us?19:53
thorst_even if it kills a test run...that's kinda bad.  Leads to an inconsistent CI...19:55
thorst_I'm assuming we can't do the retry in nova-powervm.19:55
thorst_too high level?19:55
esbergluEventually yeah. And as the nodes hanging on delete start accumulating, it messes nodepool up19:56
esbergluRight now the CI is still "running" though19:57
efriedthorst_, we could start scraping the error message of every 500 we receive for a set of known error codes (in this case 0514-516) and spoof those as retries.  That would be a pypowervm thing.  But that doesn't seem like The Right Thing.20:04
thorst_efried: agree... let's get this to hsien and apearson to fast path in.20:07
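
[Editor's sketch] The client-side idea efried describes (check each HTTP 500 for a known error code such as 0514-516 and treat a match as a retry) might look roughly like the following. This is only an illustration under stated assumptions: RestError, is_retryable, and call_with_retry are hypothetical names, not pypowervm APIs, and the sketch assumes the failed operation is safe to repeat, which the discussion says still needs to be confirmed with the VIOS team.

```python
import time

# Assumption: error codes treated as transient. 0514-516 is the only code
# named in the discussion; the tuple itself is hypothetical.
RETRYABLE_CODES = ("0514-516",)


class RestError(Exception):
    """Hypothetical stand-in for a REST failure carrying status and body."""

    def __init__(self, status, body):
        super().__init__("HTTP %s: %s" % (status, body))
        self.status = status
        self.body = body


def is_retryable(err):
    """Return True only for a 500 whose body mentions a known transient code."""
    return err.status == 500 and any(code in err.body for code in RETRYABLE_CODES)


def call_with_retry(func, max_attempts=3, delay=1.0, sleep=time.sleep):
    """Call func(), retrying on retryable RestErrors with a linear backoff.

    Non-retryable errors, and the error from the final attempt, propagate
    to the caller unchanged.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except RestError as err:
            if attempt == max_attempts or not is_retryable(err):
                raise
            # Wait a little longer after each failed attempt.
            sleep(delay * attempt)
```

A caller would wrap the REST operation in a zero-argument callable, for example call_with_retry(lambda: delete_lu(lu_id)), where delete_lu is again a hypothetical name. Matching on message text is fragile, which is part of why the channel prefers a fix on the REST side.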
seroyeradreznec, I have the defect routing info.  I can pm to the one who needs it.20:12
adreznecefried: esberglu ^^20:12
esbergluseroyer: I need it20:13
efriedthanks seroyer20:13
*** jwcroppe has joined #openstack-powervm20:27
efriedadreznec thorst_, turns out these VIOSes were running 1614_61e.  According to seroyer, that's actually the latest GAed version - but it's still pretty old - like from 1Q16.  Do we want to be on a recent 61k?  Apparently there's a GOLD level at 1640E_61k.20:31
adreznecBlah20:34
adreznecProbably20:34
adreznecThe VIOS team will tell us to recreate on that anyway I'm sure20:34
seroyerNo, what this was hit on was the latest GA level.  They have to support it.20:37
thorst_ahhh21:02
thorst_nice pt seroyer21:02
*** smatzek has quit IRC21:23
thorst_esberglu: did we get a test today with that new patch?21:36
esbergluI was waiting for efried to give me the okay21:37
*** tblakes has quit IRC21:37
efried(11:55:50 AM) Eric F.: esberglu: Try 4458 now.21:37
*** tblakes has joined #openstack-powervm21:38
*** kriskend_ has joined #openstack-powervm22:02
*** kriskend has quit IRC22:04
*** thorst_ has quit IRC22:10
esbergluthorst_: efried: I just redeployed the CI management node and it should have both pypowervm patches applied.22:10
esbergluSo runs should be going through at some point this evening with that22:11
*** edmondsw has quit IRC22:15
*** tblakes has quit IRC22:28
efriedesberglu: both?22:37
esbergluThere was 1 that got applied already. So just the 1 new one22:37
esbergluThat local2remote one that qinq wu is working on changing and then this one22:38
efriedokay.22:40
*** mdrabe has quit IRC22:42
*** kylek3h has quit IRC22:51
*** kriskend_ has quit IRC23:04
*** tjakobs has quit IRC23:05
*** esberglu has quit IRC23:12
*** esberglu has joined #openstack-powervm23:13
*** esberglu has quit IRC23:18
*** esberglu has joined #openstack-powervm23:26
*** apearson_ has quit IRC23:29
*** esberglu has quit IRC23:30
*** seroyer has quit IRC23:40
*** thorst_ has joined #openstack-powervm23:47
*** esberglu has joined #openstack-powervm23:49
