Wednesday, 2017-06-07

*** thorst_afk has joined #openstack-powervm  [01:06]
*** thorst_afk has quit IRC  [01:11]
*** thorst_afk has joined #openstack-powervm  [01:11]
*** thorst_afk has quit IRC  [01:20]
*** edmondsw has quit IRC  [01:36]
*** svenkat has quit IRC  [01:45]
*** thorst_afk has joined #openstack-powervm  [01:56]
*** YuYangWang has joined #openstack-powervm  [02:06]
*** thorst_afk has quit IRC  [02:32]
*** thorst_afk has joined #openstack-powervm  [02:33]
*** thorst_afk has quit IRC  [02:37]
*** edmondsw has joined #openstack-powervm  [02:46]
*** edmondsw has quit IRC  [02:51]
*** thorst_afk has joined #openstack-powervm  [03:43]
*** thorst_afk has quit IRC  [04:02]
*** chhavi has joined #openstack-powervm  [04:03]
*** edmondsw has joined #openstack-powervm  [04:35]
*** edmondsw has quit IRC  [04:39]
*** thorst_afk has joined #openstack-powervm  [05:59]
*** thorst_afk has quit IRC  [06:04]
*** YuYangWang has quit IRC  [06:10]
*** edmondsw has joined #openstack-powervm  [06:23]
*** edmondsw has quit IRC  [06:28]
*** thorst_afk has joined #openstack-powervm  [07:01]
*** thorst_afk has quit IRC  [07:05]
*** k0da has joined #openstack-powervm  [07:13]
*** thorst_afk has joined #openstack-powervm  [08:01]
*** edmondsw has joined #openstack-powervm  [08:11]
*** edmondsw has quit IRC  [08:16]
*** thorst_afk has quit IRC  [08:21]
*** thorst_afk has joined #openstack-powervm  [09:18]
*** thorst_afk has quit IRC  [09:22]
*** edmondsw has joined #openstack-powervm  [09:59]
*** chhavi has quit IRC  [10:01]
*** chhavi has joined #openstack-powervm  [10:01]
*** edmondsw has quit IRC  [10:04]
*** smatzek has joined #openstack-powervm  [10:39]
*** smatzek has quit IRC  [10:43]
*** chhavi has quit IRC  [11:07]
*** smatzek has joined #openstack-powervm  [11:20]
*** edmondsw has joined #openstack-powervm  [11:27]
*** svenkat has joined #openstack-powervm  [11:40]
*** thorst_afk has joined #openstack-powervm  [11:49]
*** jpasqualetto has joined #openstack-powervm  [12:11]
*** jpasqualetto has quit IRC  [12:46]
*** chhavi has joined #openstack-powervm  [13:02]
*** mdrabe has joined #openstack-powervm  [13:13]
*** jpasqualetto has joined #openstack-powervm  [13:26]
<edmondsw> efried please take a look at https://review.openstack.org/#/c/471773/ and let me know if that's what you were thinking  [13:27]
<edmondsw> efried then we need to talk about how to test it, and I need to write UTs  [13:27]
<efried> edmondsw Will do.  [13:27]
*** smatzek has quit IRC  [13:40]
*** jwcroppe has joined #openstack-powervm  [13:45]
*** smatzek has joined #openstack-powervm  [14:03]
<esberglu> efried: edmondsw: thorst_afk: Looking into the timeout errors. So far I've seen it on all of the systems of 1 of the SSP groups. And haven't seen it on any other systems  [14:10]
<esberglu> So it could be some sort of issue with the SSP  [14:10]
<edmondsw> esberglu interesting  [14:11]
<edmondsw> we all know SSP never has issues... ;)  [14:11]
<esberglu> I'm looking through some more failures to confirm the above  [14:11]
<edmondsw> esberglu one minor reword on 5406, but looking good overall, tx for the changes  [14:14]
<esberglu> edmondsw: And you're cool with the "none" thing?  [14:15]
<edmondsw> yeah, it's not ideal but it's not a deal breaker. Bigger fish to fry  [14:16]
<esberglu> edmondsw: Yeah I'm confident it's an SSP issue. 25+ timeouts among neo19, neo21, neo24, neo25  [14:19]
<esberglu> vs. 1 on neo39 (which could be a different issue) and no other systems  [14:19]
<esberglu> And those 4 are in the same cluster  [14:20]
<edmondsw> esberglu let's look at that one issue on neo39... link?  [14:27]
*** jwcroppe has quit IRC  [14:28]
<esberglu> edmondsw: http://184.172.12.213/37/459737/6/check/nova-out-of-tree-pvm/de5b4e8/  [14:28]
<edmondsw> esberglu that looks like the same issue to me  [14:29]
<esberglu> edmondsw: I wasn't saying it couldn't be the same issue. Just that other things could cause a build to timeout  [14:31]
<edmondsw> esberglu could also be one SSP cluster is more reliable than another but still not 100% reliable  [14:31]
<esberglu> edmondsw: Any idea where to start debugging SSP issues?  [14:32]
<edmondsw> esberglu other than pinging that team, no  [14:32]
<edmondsw> efried any suggestions?  [14:33]
<edmondsw> esberglu are the 4 tempest tests in your CI TODO etherpad still the only places we've hit this?  [14:35]
*** jwcroppe has joined #openstack-powervm  [14:36]
<efried> Well, the first thing to do is grep the logs for how long SSP-related ops are taking.  LU creation, upload, etc.  [14:40]
<efried> Compare the results on the good cluster vs the bad, see what the pattern is.  [14:41]
<efried> Is it just LU creation?  Just upload?  Or both?  [14:41]
<efried> Is it every time, or intermittent?  [14:41]
*** tjakobs has joined #openstack-powervm  [14:42]
<efried> Then probably sanity check the config of the SSPs.  Are they backed by the same SAN?  Do they have the same kind & number of disks?  Same number of VIOSes with the same kind of resources?  [14:43]
*** mdrabe has quit IRC  [14:44]
<efried> I believe it also may be possible for one bad VIOS to affect the performance of the whole cluster; so maybe try removing one VIOS at a time in turn and see if it gets better.  [14:44]
<esberglu> edmondsw: afaik  [14:45]
<efried> We can engage the VIOS team as soon as we have hard evidence (timings from the log) that prove the pattern and point to the specific op(s) taking too long.  [14:45]
<efried> Yeah, also that - dig into those four tests and see what's different about them.  They shouldn't be doing any specific op that's unique to those tests, but maybe they're doing a particular op multiple times where other tests just do it once.  [14:46]
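
A rough sketch of the log scraping efried describes, for comparing SSP operation times across clusters. It assumes taskflow-style "It took flow '<name>' NN.NN seconds to finish" messages in the per-host n-cpu logs (the same kind of line edmondsw quotes later at 16:06); the file names are illustrative only:

    import re
    from collections import defaultdict

    # Illustrative paths: one n-cpu log per host, spanning both SSP clusters.
    LOGS = {'neo19': 'neo19-n-cpu.log', 'neo39': 'neo39-n-cpu.log'}

    # Matches taskflow timing lines, e.g. "It took flow 'spawn' 110.22 seconds to finish."
    FLOW_RE = re.compile(r"It took flow '(?P<flow>[\w-]+)' (?P<secs>[\d.]+) seconds")

    durations = defaultdict(list)
    for host, path in LOGS.items():
        with open(path) as f:
            for line in f:
                m = FLOW_RE.search(line)
                if m:
                    durations[(host, m.group('flow'))].append(float(m.group('secs')))

    for (host, flow), secs in sorted(durations.items()):
        print('%s %-10s count=%d max=%.1fs avg=%.1fs'
              % (host, flow, len(secs), max(secs), sum(secs) / len(secs)))
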
<edmondsw> efried the log color work y'all did a couple weeks ago.. how do I get vim to interpret that and give me colors instead of those nasty codes?  [14:48]
<efried> vim, no idea, would have to RTFM  [14:48]
<efried> Why do you need to edit logs?  [14:48]
<efried> edmondsw https://stackoverflow.com/questions/10592715/ansi-color-codes-in-vim (first google hit)  [14:49]
<edmondsw> efried I don't need to edit them... what do you use to view them?  [14:49]
<efried> edmondsw I just use terminal tools like less, grep, etc.  Those will interpret the codes by default.  [14:50]
<efried> If not, use less -R  [14:50]
<esberglu> edmondsw: mac or windows?  [14:50]
<edmondsw> mac  [14:50]
<edmondsw> efried ah, -R  [14:51]
<edmondsw> tx  [14:51]
<efried> sweet  [14:51]
<esberglu> The console application has some nice filtering capabilities but 0 color support  [14:51]
<esberglu> But if I want color I just use less -R  [14:51]
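
If a viewer can't render the ANSI color codes at all, a small filter can strip them before viewing; this is just an illustrative snippet, not part of the CI tooling:

    import re
    import sys

    # ANSI SGR sequences such as "\x1b[01;31m" ... "\x1b[00m" (the color
    # codes oslo.log writes when color logging is enabled).
    ANSI_RE = re.compile(r'\x1b\[[0-9;]*m')

    for line in sys.stdin:
        sys.stdout.write(ANSI_RE.sub('', line))

(e.g. python strip_ansi.py < n-cpu.log | less, for the cases where less -R isn't an option.)
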
<efried> esberglu btw, don't make any sudden moves on the prep_devstack.sh changes - I'm in mid-review.  [14:52]
<esberglu> efried: k  [14:52]
*** mdrabe has joined #openstack-powervm  [14:54]
<esberglu> efried: edmondsw: Seeing this in the logs  [15:12]
<esberglu> http://paste.openstack.org/show/611732/  [15:12]
<esberglu> So the test in question is trying to rebuild the server and times out waiting for the server to become active  [15:13]
<esberglu> But for some reason it is being deleted here before it finishes rebuilding  [15:13]
<esberglu> Searching for why it's getting deleted  [15:14]
<efried> Possibly this is just the manifestation of that condition.  It fails to build, so it deletes it, then the test proceeds and that's just how it fails.  [15:14]
<esberglu> efried: Yeah  [15:15]
<mdrabe> efried: Is there a get_instance_wrapper_from_uuid API in OOT or should I make one?  [15:46]
<efried> mdrabe From nova UUID or pvm UUID?  [15:46]
<mdrabe> nova  [15:46]
<efried> You're looking to get the LPAR wrapper, not the instance obj, right?  [15:47]
<mdrabe> yup  [15:47]
<efried> nova_powervm.virt.powervm.vm.get_instance_wrapper takes an instance.  You could update it so the 'instance' arg can be either a UUID or an instance obj.  [15:48]
<efried> Which ultimately probably just means making nova_powervm.virt.powervm.vm.get_pvm_uuid accept either  [15:49]
<efried> (but update docstrings for things that call it)  [15:49]
<mdrabe> K will do  [15:49]
<efried> coo  [15:49]
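
A minimal sketch of what efried suggests: letting nova_powervm.virt.powervm.vm.get_pvm_uuid take either an Instance object or a bare nova UUID string, so get_instance_wrapper (which just feeds its 'instance' argument through get_pvm_uuid) can be driven from a UUID too. The conversion call mirrors the existing helper; treat the details as an assumption rather than the final patch:

    import six
    from pypowervm.utils import uuid as pvm_uuid


    def get_pvm_uuid(instance):
        """Derive the pypowervm UUID of an instance.

        :param instance: nova Instance object *or* nova instance UUID string
                         (today the method only accepts the object).
        """
        nova_uuid = (instance if isinstance(instance, six.string_types)
                     else instance.uuid)
        # Same conversion the existing code performs on instance.uuid.
        return pvm_uuid.convert_uuid_to_pvm(nova_uuid).upper()
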
<tjakobs> efried thorst: can you take another look at https://review.openstack.org/#/c/462248/ when you get a chance (file-backed ephemeral)  [15:53]
*** k0da has quit IRC  [15:55]
<esberglu> efried: Seeing a bunch of 412 errors before the previous paste  [15:59]
<esberglu> http://paste.openstack.org/show/611736/  [15:59]
<efried> esberglu Very much expected, especially with a tempest run doing stuff in parallel.  [16:01]
<edmondsw> esberglu I think 412s may be normal. It does appear to recover from those and continue on. But they could be a symptom of delays  [16:01]
<efried> If those retry counts get above three or four, may be worth looking into, but I don't expect that to be related to the issue at hand.  [16:02]
<edmondsw> yeah, I think I saw it 2-3 times  [16:02]
<efried> tjakobs Ack, sorry, don't know why that didn't float to the top when you responded.  [16:03]
<edmondsw> efried is it normal to see a whole lot of NotImplementedErrors, like "It took flow 'destroy' 110.22 seconds to finish.: NotImplementedError" ?  [16:06]
<esberglu> efried: edmondsw: Another thing I noticed in the logs  [16:08]
<esberglu> http://paste.openstack.org/show/611739/  [16:08]
<edmondsw> esberglu yeah, that's similar to what I'm looking at  [16:08]
<efried> edmondsw That's new, not our code, probably related to dhellman's recent oslo.log changes to print exception contexts.  He's still working through it.  [16:10]
<efried> If it's not blowing things up, don't worry about it.  [16:11]
<efried> esberglu What's the distribution of neos to SSPs?  Do all SSPs have the same number of neos, or is this one "heavy"?  [16:12]
<esberglu> All of the SSPs have 4 systems except one which has 2  [16:12]
<efried> esberglu Are the SSPs all the same size (number & size of disks) and backed by the same SAN?  [16:15]
<esberglu> Yeah. Same SAN. All have 4 250G disks  [16:16]
<efried> edmondsw Have you devstacked your victim neo yet?  [16:25]
<edmondsw> efried no, I spun up a VM on which I can do that, but waiting for you to tell me what I may need to do in my local.conf  [16:26]
<efried> spun up a vm....  [16:27]
<edmondsw> looking through logs on timeouts (between meetings and interruptions)  [16:27]
<edmondsw> efried gotta run devstack somewhere...  [16:27]
<efried> edmondsw Normally on the neo.  [16:27]
<edmondsw> oh really...  [16:27]
<efried> Were you planning to set up remote/proxy?  [16:28]
<edmondsw> efried no... I assumed you'd have devstack setup on a VM for nova api, etc. and then just run the compute stuff on the neo... but I guess it would be nice to just run it all on the neo...  [16:28]
<edmondsw> wasn't thinking through it  [16:29]
<edmondsw> still thinking like a PowerVC developer :)  [16:29]
<efried> edmondsw I suppose you could split it up, though I can't really think of any benefit to that.  [16:29]
<edmondsw> yeah  [16:29]
<efried> So the best thing to do, local.conf-wise, is to grab the one from the CI.  [16:29]
<efried> Or I suppose you can grab mine from neo40:/opt/stack/devstack/local.conf - it was known to work within the past week ;-)  [16:30]
<efried> and you wouldn't have to edit it as much.  [16:30]
<efried> I think  [16:30]
<efried> edmondsw Can we talk through the disable-compute-service thing at some point?  [16:40]
<edmondsw> efried yes please  [16:40]
<efried> Now good?  [16:40]
<edmondsw> I pulled down both local confs and started diffing them, but I can do that later... now sounds good  [16:40]
<efried> ^^ make sure you get the .aio one from the CI.  [16:41]
<edmondsw> now called "intree"  [16:41]
<edmondsw> yes  [16:41]
<efried> edmondsw No, you'll want the OOT one.  [16:41]
<edmondsw> oh... even though I'm working on an intree change?  [16:41]
<efried> oh, right.  Well, I find it easiest to stack with OOT, cause then you can flip to in-tree just by changing the driver in nova.conf and restarting compute.  [16:42]
<efried> And you're gonna want to port this change back to OOT when done anyway.  [16:42]
<edmondsw> oh, if that's the case, makes sense  [16:42]
<efried> yeah, you can't go the other way (flip in-tree to OOT if you stacked in-tree only) cause nova-powervm and networking-powervm will be missing.  [16:43]
<efried> Anyway, back to disable-compute-service.  [16:43]
<efried> From the top, we have two scenarios we want to be able to cover: init_host and periodic task.  [16:44]
<edmondsw> yep  [16:44]
<efried> Starting with init_host, I'm not actually sure we can get anywhere if this fails.  [16:45]
<efried> It runs only once, when the compute service starts.  I don't believe it retries if it fails - it just bails.  [16:45]
<edmondsw> efried the periodic task will retry the guts of it  [16:45]
<edmondsw> default runs that periodic task every 60s IIRC  [16:46]
<efried> Only if the driver can successfully report get_available_nodes  [16:46]
<efried> Which relies on self.host_wrapper  [16:46]
<efried> Which doesn't exist if we never initted properly.  [16:46]
<efried> So we would have to change what that guy reports, to something generic that we can ascertain without talking over the adapter.  [16:47]
<edmondsw> efried what get_available_nodes call are you referencing?  [16:48]
<efried> nova.virt.powervm.driver.PowerVMDriver#get_available_nodes  [16:48]
<edmondsw> efried oh, I see it  [16:48]
<edmondsw> https://github.com/openstack/nova/blob/5d95cb9dbca403790db4e9680919e6716fa5cb76/nova/compute/manager.py#L6630  [16:49]
<efried> Kinda raises the question as to whether 'available' ~ 'enabled' in this context.  I'm gonna assert 'no', precisely because we want this to work.  [16:49]
<efried> Anyway, as far as what that returns, we can probably get away with using the neo's hostname, gleaned by local cmd rather than pvm API.  [16:50]
<efried> Course then we give esberglu another log scrubbing task :)  [16:50]
<efried> Unless we use the shortname, which would probably be aiight.  But not sure how globally unique these have to be.  [16:51]
<edmondsw> what does get_available_nodes return today when working?  [16:51]
<efried> MTMS string  [16:52]
<efried> side topic, what IDE do you use?  [16:53]
<edmondsw> efried can we get that from a cli call?  [16:53]
<efried> hm, possibly, or something like it, through RMC.  [16:53]
<edmondsw> efried do we have to worry about RMC being down?  [16:54]
<edmondsw> I would guess yes :)  [16:54]
<efried> Well, yeah, but if it is, I think we can safely declare ourselves dead.  [16:54]
<efried> maybe  [16:54]
<edmondsw> does it possibly take time to get RMC working after restart?  [16:55]
<efried> Yeah, that's one of the main things we're trying to account for here - I don't know how quickly RMC comes up relative to other stuff.  [16:55]
<edmondsw> right  [16:55]
<edmondsw> me either  [16:55]
<mdrabe> Only for the VIOSes should RMC matter I think, but idk about the SDE case  [16:55]
<mdrabe> NL in VIOS environment doesn't have RMC  [16:56]
<efried> It may not be an actual RMC command we need.  The MTMS (or something like it) may be in a text file somewhere.  [16:56]
<efried> (edmondsw, talking it through on slack fyi)  [17:02]
<edmondsw> efried yeah, I'm following  [17:02]
<efried> k  [17:02]
*** smatzek has quit IRC  [17:04]
<efried> mdrabe What we're working towards is allowing init_host to complete fairly quickly, even if the NL services aren't up yet, in which case it will mark the compute host as disabled.  [17:05]
<efried> But then recheck when the periodic task for get_available_resource runs.  [17:06]
<efried> and if NL is now responsive, enable the compute host (and make sure all the init_host stuff is really done)  [17:06]
<efried> So the problem is that the periodic task calls get_available_nodes  [17:07]
<mdrabe> That seems a little weird to me  [17:07]
<efried> which today uses the managed system wrapper to generate a name.  [17:07]
<efried> which part?  [17:08]
<mdrabe> I guess it just makes more sense to me to have init_host wait for services  [17:08]
<mdrabe> Why have compute do periodic tasks and the like before then?  [17:09]
<efried> Yeah, so the whole point here is not to hold the compute service hostage waiting for NL services.  We want to be able to say "alive, but disabled" fairly quickly, and then enable later when things smooth out.  [17:11]
<efried> The general case is to do that if the NL services go pear-shaped during the normal course of events.  [17:11]
<efried> But accounting for the reboot scenario is the kinda tricky edge case we're working on.  [17:12]
<mdrabe> It doesn't have to be tricky is all I'm sayin  [17:14]
<efried> mdrabe How long do we wait?  [17:15]
<efried> That's what raised nova core eyebrows to begin with.  [17:15]
<mdrabe> It's pypowervm right?  [17:15]
<mdrabe> 300 tries every 2 seconds?  [17:15]
<mdrabe> 1 try every 2 seconds for a maximum of up to 300 ***  [17:16]
<efried> Which works out to 10 minutes?  Michael Still and Matt Riedemann both independently balked at that.  [17:16]
<efried> Swhat prompted this whole project.  [17:17]
<mdrabe> I do have some concerns, but I don't wanna hold anything up  [17:19]
<efried> Thing is, we can make this work if we can resolve the get_available_nodes thing.  [17:20]
<edmondsw> efried is this only about getting initialized, or did we also want to handle the NL services and/or VIOS going down after some period of working?  [17:21]
<efried> edmondsw Both.  [17:22]
<edmondsw> the current impl only handles the former  [17:22]
<efried> Yup, we'll get to that next :)  [17:22]
<edmondsw> vmware didn't handle the latter, just initialization  [17:23]
<edmondsw> but I do think that would be a good idea  [17:23]
<edmondsw> and we won't have a get_available_nodes problem there  [17:23]
<edmondsw> because we can cache that from the time that it was working  [17:23]
<edmondsw> I'm thinking it may be best to just use hostname for get_available_nodes  [17:24]
<edmondsw> doesn't nova refer to nodes by hostname?  [17:24]
<efried> The docstring even says you can use hypervisor_hostname  [17:25]
<efried> So yeah, I guess if that's the case, it's probably workable.  [17:25]
<edmondsw> efried https://developer.openstack.org/api-ref/compute/#show-host-details  [17:26]
<edmondsw> then again, that's deprecated...  [17:26]
<edmondsw> mdrabe, did you see that they've deprecated os-hosts APIs?  [17:26]
<edmondsw> and specifically say there will be no replacement for maintenance_mode  [17:26]
<edmondsw> :(  [17:26]
<efried> edmondsw So  [17:35]
<efried> Just use socket.gethostname()  [17:35]
<edmondsw> efried do we need to get esberglu to add some log scraping before I push that up?  [17:36]
<efried> Nope, that yields the short hostname.  [17:36]
<efried> and is what nova uses to default the ComputeManager.host / Service.host / etc.  [17:36]
<edmondsw> efried k  [17:37]
<edmondsw> mdrabe is this change to get_available_nodes going to affect PowerVC?  [17:37]
<edmondsw> efried or other things on upgrade?  [17:37]
<efried> So the logs will now have things like "Compute_service record updated for neo40:neo40" instead of "Compute_service record updated for neo40:8247-21L*215439A"  [17:37]
<efried> I'm really hoping pvc doesn't use get_available_nodes anywhere.  That would be... silly.  [17:38]
<efried> Hah - esberglu we just figured out another way you could figure out what neo you're running against in the CI.  [17:39]
<efried> ...until this change drops, that is.  [17:39]
*** smatzek has joined #openstack-powervm  [17:39]
<efried> Okay, so now I *think* that'll sort us out for init_host.  [17:41]
<efried> Now edmondsw, what you were saying before about VCenter not supporting dynamic enable/disable during runtime - they do.  [17:42]
<edmondsw> efried not really... they start returning blank data, but they don't disable the service  [17:42]
<edmondsw> efried oh, my bad... you're right  [17:43]
<efried> Their get_available_resource calls self._vc_state.get_host_stats(refresh=True), which calls update_status, which calls _set_host_enabled(False) if they get an exception trying to discover stuff.  [17:43]
<edmondsw> I overlooked L85, thought the only call was L108  [17:43]
<efried> Which is roughly what we might do.  But before we settle on that, I'd like to bring up another possibility.  [17:44]
<efried> This methodology relies on get_available_resource.  [17:44]
<efried> Which means a couple of things:  [17:44]
<efried> We're at the mercy of whatever interval the consumer configured for that guy to run.  [17:44]
<efried> If something goes wrong in between hits, we won't disable until the next hit.  [17:45]
<efried> Which could result in more failures than necessary.  [17:46]
<efried> The other thing is that get_available_resource is gonna disappear pretty soon, so we'll probably have to move the logic to get_inventory() - but will need to make sure that's running in the same kind of periodic task.  [17:46]
<edmondsw> efried right, I assumed that if we need to disable when NL services go down, that will be done differently, on failures wherever they occur  [17:46]
<efried> So yeah, that's an option.  [17:47]
<efried> In fact, there's ways of doing that where we could avoid get_available_resource altogether.  [17:47]
<efried> One way to do that would be to add a helper to our Adapter.  [17:47]
<efried> Those guys get wrapped around the low-level requests, so they can trap specific HTTP error codes and whatnot, which is kinda what we want to be on the lookout for.  [17:48]
<efried> We get a "service unavailable", we disable right then.  [17:48]
<efried> The question then becomes: how do we switch back on?  [17:48]
<efried> We could keep that logic in get_available_resource.  [17:49]
<efried> Or (go with me for a sec here) we could spawn a thread that polls the service and re-enables when it comes back to life.  [17:49]
*** chhavi has quit IRC  [17:51]
<edmondsw> efried if we spawned a thread, we could do the same during init and not have to change get_available_nodes  [17:56]
<efried> uh  [17:56]
<edmondsw> efried no? am I missing something?  [17:57]
<efried> Well, we would still have to account for the fact that get_available_resource could run before init_host has set up the host_wrapper.  [17:57]
<edmondsw> efried you mean get_available_nodes?  [17:57]
<efried> yeah, called by get_available_resource  [17:57]
<edmondsw> yes, we'd have to add some error handling logic there but we could just return []  [17:58]
<efried> hm, that could actually work.  [17:58]
<edmondsw> efried no, called by update_available_resource  [17:58]
<edmondsw> I kinda like returning [] there until nodes are actually available :)  [17:59]
<efried> shit, that could just work anyway.  [17:59]
<efried> yeah  [17:59]
<efried> As long as the thing that re-enables the service is running NOT at the behest of update_available_resource  [18:00]
<edmondsw> actually... no, I think we'd still want to change get_available_resource to hostname  [18:01]
<edmondsw> I think we're getting hung up on the word available  [18:01]
<edmondsw> it's available in the sense that there is a compute service setup there. It's not available in the sense that the service is disabled  [18:01]
<edmondsw> I think nova means the former, though  [18:01]
<edmondsw> they seem to have coded that way, as in update_available_resource  [18:02]
<efried> Right  [18:02]
<edmondsw> I'm still a little concerned that switching from MTMS to hostname is going to break something somewhere  [18:03]
<efried> Anyway, let's table that and come back to it.  [18:03]
<efried> I'm 95% sure that is an opaque string that nobody cares about unless there's more than one on a compute node (which I think only applies to ironic)  [18:04]
<efried> We could return 'foo' and it would be okay.  [18:04]
<efried> But let's table it for now.  [18:04]
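
For reference, the shape of get_available_nodes being discussed might look something like this: report no nodes until init_host has populated the host wrapper, and otherwise return an opaque hostname-based identifier instead of the MTMS. This is a sketch only; the attribute name and the exact identifier (shortname vs. CONF.host) are the open questions above:

    import socket

    from nova import conf

    CONF = conf.CONF  # CONF.host defaults to socket.gethostname()


    def get_available_nodes(self, refresh=False):
        """Sketch of the driver method (would live on PowerVMDriver)."""
        # Until init_host has successfully talked to the PowerVM REST API
        # there is no host_wrapper; report nothing and let the next
        # update_available_resource periodic pass try again.
        if getattr(self, 'host_wrapper', None) is None:
            return []
        # Opaque, stable node name that needs no pvm API call.
        return [CONF.host or socket.gethostname()]
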
<efried> So I'm not a big fan of threads in general.  [18:05]
<efried> The get_available_resource periodic task thing is probably "good enough".  [18:05]
<efried> It's also worth noting that a recent change was made that automatically disables a compute service on which some number of consecutive deploys failed.  [18:06]
<efried> 10 by default, conf-able.  [18:06]
<edmondsw> yeah, would be nice to be automatically re-enabling after they've done that  [18:08]
<efried> Well, the stated design there is that the admin has to re-enable manually.  [18:13]
<efried> There are many reasons beyond nvl services that deploys could be failing.  [18:13]
<efried> edmondsw So the other thing I want to see done - possibly in a preceding change set - is factoring out the service enable/disable code, which is now used in at least four places I know of, including this one.  [18:17]
<edmondsw> efried sure  [18:19]
<efried> Okay, so to disable, we could a) rely on get_available_resource as the trigger; and/or b) add a helper to the adapter to disable on certain conditions  [18:21]
<efried> To re-enable, we could a) rely on get_available_resource as the trigger; and/or b) spawn a thread when we disable (however that is) to poll for live-ness.  [18:21]
<edmondsw> I'm working on (a) atm  [18:22]
<edmondsw> for both  [18:22]
<efried> Why I like relying on get_available_resource: It's freakin simple.  Why I don't like it: we're at the mercy of periodic_task_interval/update_resources_interval  [18:25]
<efried> I'm gonna declare that's okay for now.  If we get complaints (I can almost guarantee we never will) we can look into swapping it around.  [18:26]
<efried> I'd like there to be a record of the design alternatives, though.  May be too heavyhanded to do it in code comments, but at least in the commit message.  [18:28]
<efried> ...so our posterity can git blame their way to finding it.  [18:28]
<efried> You can refer to this IRC log - we're on eavesdrop.  [18:28]
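
As one concrete record of option (a), a hedged sketch of how the periodic get_available_resource pass could drive the disable/enable toggle, patterned on the way other drivers flip their service record. The Service object calls are standard nova; the _get_host_resources helper name is made up for illustration:

    from nova import conf
    from nova import context as ctx
    from nova import objects
    from oslo_log import log as logging

    CONF = conf.CONF
    LOG = logging.getLogger(__name__)


    def _set_host_enabled(self, enabled, reason=None):
        """Enable/disable this host's compute service record (sketch)."""
        context = ctx.get_admin_context()
        service = objects.Service.get_by_compute_host(context, CONF.host)
        if service.disabled == (not enabled):
            return  # already in the desired state
        service.disabled = not enabled
        service.disabled_reason = (
            None if enabled else reason or 'PowerVM REST API unreachable')
        service.save()


    def get_available_resource(self, nodename):
        """Periodic-task entry point (sketch of the trigger logic)."""
        try:
            # Any cheap REST call works as a liveness probe; refreshing the
            # managed-system wrapper is the natural candidate.
            self.host_wrapper = self.host_wrapper.refresh()
        except Exception as exc:
            LOG.warning('PowerVM API not responding; disabling host: %s', exc)
            self._set_host_enabled(False)
            return {}
        self._set_host_enabled(True)
        # Hand off to the existing resource-reporting logic (hypothetical name).
        return self._get_host_resources()
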
<efried> edmondsw In other news, I found out how to get the MTMS from /proc ;-)  [18:29]
<edmondsw> efried oh really?  [18:31]
<efried> yup, gimme a sec.  [18:31]
<mdrabe> edmondsw os-hosts being deprecated is gonna cause some problems  [18:45]
<mdrabe> And about this get_available_nodes business, can it just return CONF.host?  [18:46]
<mdrabe> What is CONF.host set to by default?  [18:46]
<edmondsw> mdrabe yeah, I'm worried about the os-hosts thing  [18:46]
<edmondsw> chatted with cjvolzka about that a bit, and will send a note  [18:46]
<edmondsw> mdrabe I was just looking for CONF.host... so stay tuned  [18:47]
<efried> edmondsw http://paste.openstack.org/show/611760/  [18:49]
<efried> From what I can tell, if you don't set CONF.host, things that want it use socket.gethostname()  [18:50]
<esberglu> efried: edmondsw: thorst: Thinking about reworking the patching logic in prep_devstack  [18:51]
<esberglu> My idea is that instead of passing the patch lists through on the command line we provide a file that will have a project per line  [18:51]
<esberglu> <project1> <patch_list_1>  [18:51]
<esberglu> <project2> <patch_list_2>  [18:51]
<esberglu> So we can patch multiple projects in the same run. And keep this file in powervm-ci so we can live update super easily  [18:51]
<esberglu> Thoughts?  [18:51]
<edmondsw> mdrabe https://github.com/openstack/nova/blob/master/nova/conf/netconf.py#L55  [18:51]
<edmondsw> default for CONF.host is socket.gethostname()  [18:52]
<edmondsw> efried that's what we'd talked about changing it to anyway  [18:52]
<efried> esberglu Yes.  Make the format project:branch:patch_list  [18:52]
<mdrabe> CONF.host is what pvc does  [18:52]
<edmondsw> turns out PowerVC is already overwriting the pvm driver to use CONF.host  [18:52]
<edmondsw> mdrabe right  [18:52]
<edmondsw> so I'm thinking we just change the driver to use CONF.host and PowerVC can just stop overwriting that  [18:53]
<edmondsw> efried agreed?  [18:53]
<esberglu> efried: Ok. Will wait until the external prep_devstack is working to get started on that  [18:53]
<mdrabe> +1  [18:53]
<efried> Shrug.  [18:54]
<efried> Sure.  [18:54]
<edmondsw> esberglu +1  [18:54]
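
A sketch of reading the proposed patch file using the project:branch:patch_list format efried suggests; the file name, and the fact that the real consumer would be prep_devstack's shell logic rather than Python, are assumptions:

    # Assumed file format, one project per line, e.g.:
    #   nova:master:471773,459737
    #   nova-powervm:master:471926

    def read_patch_lists(path):
        """Return {(project, branch): [gerrit change numbers]}."""
        patches = {}
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith('#'):
                    continue
                project, branch, changes = line.split(':', 2)
                patches[(project, branch)] = [c for c in changes.split(',') if c]
        return patches


    if __name__ == '__main__':
        for (project, branch), changes in read_patch_lists('local_patches.txt').items():
            print('would cherry-pick into %s (%s): %s'
                  % (project, branch, ', '.join(changes)))
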
*** k0da has joined #openstack-powervm  [19:02]
*** thorst_afk has quit IRC  [19:11]
*** thorst_afk has joined #openstack-powervm  [19:15]
<edmondsw> efried how can I stop pvm-rest to simulate that going down?  [19:18]
*** thorst_afk has quit IRC  [19:20]
<mdrabe> edmondsw try `service pvm-rest stop` maybe  [19:20]
<edmondsw> mdrabe nope  [19:20]
<mdrabe> sudo?  [19:21]
<edmondsw> mdrabe yep... duh...  [19:24]
*** thorst_afk has joined #openstack-powervm  [19:29]
<edmondsw> efried I just found an infinite loop condition in validate_vios_ready...  [19:37]
<edmondsw> I'll propose something  [19:37]
<efried> edmondsw Well, we haven't hit it yet...  [19:38]
<edmondsw> efried not surprising, actually  [19:38]
<edmondsw> efried that method is currently called right after you setup the adapter, so it should be good to go, and you'd only hit this if the adapter wasn't working  [19:39]
<efried> edmondsw Okay, so if the adapter went belly up right after it was initialized successfully?  [19:40]
<edmondsw> in current usage... but we're going to start calling this differently, so it'd be more likely  [19:40]
<edmondsw> efried https://github.com/powervm/pypowervm/blob/master/pypowervm/tasks/partition.py#L263  [19:41]
<edmondsw> if you always hit the Exception in L275, you never break out  [19:41]
<edmondsw> max_wait_time is ignored  [19:42]
<edmondsw> because rmc_down_vioses will always evaluate False in the if that can break out based on max_wait_time  [19:42]
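
A simplified illustration of the loop pattern edmondsw is describing (not the actual pypowervm code): when the adapter call itself keeps failing, rmc_down_vioses never gets populated, so a timeout check guarded by it is never reached and the loop spins forever. Honoring max_wait_time on the failure path as well closes the hole:

    import time


    def validate_vios_ready(adapter, max_wait_time=600):
        """Simplified sketch of the problematic pattern and a possible fix."""
        start = time.time()
        while True:
            try:
                # Hypothetical stand-in for the adapter/VIOS query that is failing.
                rmc_down_vioses = _get_rmc_down_vioses(adapter)
            except Exception:
                # The bug: if this branch only sleeps and retries, a
                # persistently broken adapter never reaches the timeout
                # check below.  Fix: check max_wait_time here too.
                if time.time() - start > max_wait_time:
                    raise
                time.sleep(2)
                continue
            if not rmc_down_vioses:
                return  # all VIOSes are up; nothing to wait for
            if time.time() - start > max_wait_time:
                raise RuntimeError('VIOSes still not ready: %s' % rmc_down_vioses)
            time.sleep(2)
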
<esberglu> efried: Can I get a final review on 5406?  [19:42]
<efried> on it now  [19:43]
<esberglu> I want to deploy staging with that this afternoon  [19:43]
<esberglu> efried: thanks  [19:43]
<efried> esberglu done  [19:43]
<efried> esberglu Is 5405 just WIP because it's waiting for the other to merge?  [19:44]
<esberglu> efried: Yeah pretty much. And if i hit anything unexpected when testing on staging  [19:56]
-openstackstatus- NOTICE: The Gerrit service on review.openstack.org is being restarted now to clear some excessive connection counts while we debug the intermittent request failures reported over the past few minutes  [20:06]
*** smatzek has quit IRC  [20:08]
<efried> esberglu Heads up, the nova project just went bananas with spurious merge conflicts on every pending change set.  [21:09]
<efried> So dozens of rebases are gonna choke CIs everywhere.  [21:09]
<efried> It'll be a good scale test for us, I suppose.  Have we ever hit full capacity with a wait queue before?  [21:09]
<efried> I count 38 in the last hour  [21:11]
<esberglu> efried: Yeah we have hit our max nodes before. There are 86 runs in the queue right now (current max is 50). We can up the max probably  [21:11]
<efried> Not sure if that's necessary, assuming they don't just drop off at some point.  [21:12]
<efried> We're not likely to be the last CI posting results after this glut.  [21:12]
<efried> The max in this context is the number of nodes running simultaneously?  [21:13]
<esberglu> efried: Yep. We very rarely get up to 50 under normal circumstances but it does happen every now and then  [21:13]
<efried> Increasing the number of nodes would put load on.. what?  [21:13]
<efried> The systems those VMs live on?  Are they shared proc/mem?  [21:13]
<efried> And presumably the SSPs backing them.  [21:14]
<esberglu> efried: I don't know enough about the performance side of things to say  [21:14]
<efried> k.  If you're saying it's really rare to hit the max, let's leave it alone.  This is an anomaly, for sure.  [21:14]
<esberglu> I think I have seen it hit the max 2 (maybe 3) other times ever. So yeah very rare  [21:15]
*** edmondsw_ has joined #openstack-powervm  [21:22]
*** zerick_ has joined #openstack-powervm  [21:25]
*** jpasqualetto has quit IRC  [21:27]
*** adi_____ has quit IRC  [21:29]
*** edmondsw has quit IRC  [21:29]
*** zerick has quit IRC  [21:29]
*** zerick_ is now known as zerick  [21:30]
<openstackgerrit> Matt Rabe proposed openstack/nova-powervm master: Change NVRAM manager store to use uuid instead of instance object  https://review.openstack.org/471926  [21:30]
*** esberglu has quit IRC  [21:34]
*** esberglu has joined #openstack-powervm  [21:35]
*** mdrabe has quit IRC  [21:38]
*** esberglu has quit IRC  [21:39]
*** esberglu has joined #openstack-powervm  [21:48]
*** esberglu has quit IRC  [21:53]
*** thorst_afk has quit IRC  [21:53]
*** edmondsw_ has quit IRC  [21:58]
*** svenkat has quit IRC  [22:12]
*** tjakobs has quit IRC  [22:22]
*** thorst_afk has joined #openstack-powervm  [22:28]
*** thorst_afk has quit IRC  [22:32]
*** jwcroppe has quit IRC  [22:36]
*** jwcroppe has joined #openstack-powervm  [22:36]
*** jwcroppe has quit IRC  [22:36]
*** jwcroppe has joined #openstack-powervm  [22:37]
*** jwcroppe has quit IRC  [22:41]
*** thorst_afk has joined #openstack-powervm  [23:02]
*** openstack has joined #openstack-powervm  [23:14]
*** thorst_afk has quit IRC  [23:17]
*** edmondsw has joined #openstack-powervm  [23:18]
*** adi_____ has joined #openstack-powervm  [23:20]
*** edmondsw has quit IRC  [23:23]
*** svenkat has joined #openstack-powervm  [23:24]
*** k0da has quit IRC  [23:46]
