*** thorst_afk has joined #openstack-powervm | 01:06 | |
*** thorst_afk has quit IRC | 01:11 | |
*** thorst_afk has joined #openstack-powervm | 01:11 | |
*** thorst_afk has quit IRC | 01:20 | |
*** edmondsw has quit IRC | 01:36 | |
*** svenkat has quit IRC | 01:45 | |
*** thorst_afk has joined #openstack-powervm | 01:56 | |
*** YuYangWang has joined #openstack-powervm | 02:06 | |
*** thorst_afk has quit IRC | 02:32 | |
*** thorst_afk has joined #openstack-powervm | 02:33 | |
*** thorst_afk has quit IRC | 02:37 | |
*** edmondsw has joined #openstack-powervm | 02:46 | |
*** edmondsw has quit IRC | 02:51 | |
*** thorst_afk has joined #openstack-powervm | 03:43 | |
*** thorst_afk has quit IRC | 04:02 | |
*** chhavi has joined #openstack-powervm | 04:03 | |
*** edmondsw has joined #openstack-powervm | 04:35 | |
*** edmondsw has quit IRC | 04:39 | |
*** thorst_afk has joined #openstack-powervm | 05:59 | |
*** thorst_afk has quit IRC | 06:04 | |
*** YuYangWang has quit IRC | 06:10 | |
*** edmondsw has joined #openstack-powervm | 06:23 | |
*** edmondsw has quit IRC | 06:28 | |
*** thorst_afk has joined #openstack-powervm | 07:01 | |
*** thorst_afk has quit IRC | 07:05 | |
*** k0da has joined #openstack-powervm | 07:13 | |
*** thorst_afk has joined #openstack-powervm | 08:01 | |
*** edmondsw has joined #openstack-powervm | 08:11 | |
*** edmondsw has quit IRC | 08:16 | |
*** thorst_afk has quit IRC | 08:21 | |
*** thorst_afk has joined #openstack-powervm | 09:18 | |
*** thorst_afk has quit IRC | 09:22 | |
*** edmondsw has joined #openstack-powervm | 09:59 | |
*** chhavi has quit IRC | 10:01 | |
*** chhavi has joined #openstack-powervm | 10:01 | |
*** edmondsw has quit IRC | 10:04 | |
*** smatzek has joined #openstack-powervm | 10:39 | |
*** smatzek has quit IRC | 10:43 | |
*** chhavi has quit IRC | 11:07 | |
*** smatzek has joined #openstack-powervm | 11:20 | |
*** edmondsw has joined #openstack-powervm | 11:27 | |
*** svenkat has joined #openstack-powervm | 11:40 | |
*** thorst_afk has joined #openstack-powervm | 11:49 | |
*** jpasqualetto has joined #openstack-powervm | 12:11 | |
*** jpasqualetto has quit IRC | 12:46 | |
*** chhavi has joined #openstack-powervm | 13:02 | |
*** mdrabe has joined #openstack-powervm | 13:13 | |
*** jpasqualetto has joined #openstack-powervm | 13:26 | |
edmondsw | efried please take a look at https://review.openstack.org/#/c/471773/ and let me know if that's what you were thinking | 13:27 |
edmondsw | efried then we need to talk about how to test it, and I need to write UTs | 13:27 |
efried | edmondsw Will do. | 13:27 |
*** smatzek has quit IRC | 13:40 | |
*** jwcroppe has joined #openstack-powervm | 13:45 | |
*** smatzek has joined #openstack-powervm | 14:03 | |
esberglu | efried: edmondsw: thorst_afk: Looking into the timeout errors. So far I've seen it on all of the systems of 1 of the SSP groups. And haven't seen it on any other systems | 14:10 |
esberglu | So it could be some sort of issue with the SSP | 14:10 |
edmondsw | esberglu interesting | 14:11 |
edmondsw | we all know SSP never has issues... ;) | 14:11 |
esberglu | I'm looking through some more failures to confirm the above | 14:11 |
edmondsw | esberglu one minor reword on 5406, but looking good overall, tx for the changes | 14:14 |
esberglu | edmondsw: And you're cool with the "none" thing? | 14:15 |
edmondsw | yeah, it's not ideal but it's not a deal breaker. Bigger fish to fry | 14:16 |
esberglu | edmondsw: Yeah I'm confident it's an SSP issue. 25+ timeouts among neo19, neo21, neo24, neo25 | 14:19 |
esberglu | vs. 1 on neo39 (which could be a different issue) and no other systems | 14:19 |
esberglu | And those 4 are in the same cluster | 14:20 |
edmondsw | esberglu let's look at that one issue on neo39... link? | 14:27 |
*** jwcroppe has quit IRC | 14:28 | |
esberglu | edmondsw: http://184.172.12.213/37/459737/6/check/nova-out-of-tree-pvm/de5b4e8/ | 14:28 |
edmondsw | esberglu that looks like the same issue to me | 14:29 |
esberglu | edmondsw: I wasn't saying it couldn't be the same issue. Just that other things could cause a build to timeout | 14:31 |
edmondsw | esberglu could also be one SSP cluster is more reliable than another but still not 100% reliable | 14:31 |
esberglu | edmondsw: Any idea where to start debugging SSP issues? | 14:32 |
edmondsw | esberglu other than pinging that team, no | 14:32 |
edmondsw | efried any suggestions? | 14:33 |
edmondsw | esberglu are the 4 tempest tests in your CI TODO etherpad still the only places we've hit this? | 14:35 |
*** jwcroppe has joined #openstack-powervm | 14:36 | |
efried | Well, the first thing to do is grep the logs for how long SSP-related ops are taking. LU creation, upload, etc. | 14:40 |
efried | Compare the results on the good cluster vs the bad, see what the pattern is. | 14:41 |
efried | Is it just LU creation? Just upload? Or both? | 14:41 |
efried | Is it every time, or intermittent? | 14:41 |
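A rough illustration of the log-timing pass efried suggests: scan a compute log for the start and end of SSP-related operations and print how long each took, then compare the numbers between the good and the bad cluster. The timestamp regex and the operation markers below are assumptions, not the real log strings.

```python
#!/usr/bin/env python
"""Print how long SSP-related operations take, per compute log file.

The timestamp format and the start/end markers are assumptions -- replace
them with whatever the LU-create and upload log lines actually say.
"""
import re
import sys
from datetime import datetime

TS_RE = re.compile(r'^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)')
MARKERS = {
    'lu_create': ('Creating logical unit', 'Logical unit created'),
    'upload': ('Starting upload', 'Upload complete'),
}


def parse_ts(line):
    m = TS_RE.match(line)
    return datetime.strptime(m.group(1), '%Y-%m-%d %H:%M:%S.%f') if m else None


def main(path):
    starts = {}
    for line in open(path):
        ts = parse_ts(line)
        if ts is None:
            continue
        for op, (start_marker, end_marker) in MARKERS.items():
            if start_marker in line:
                starts[op] = ts
            elif end_marker in line and op in starts:
                elapsed = (ts - starts.pop(op)).total_seconds()
                print('%s: %s took %.1fs' % (path, op, elapsed))


if __name__ == '__main__':
    main(sys.argv[1])
```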
*** tjakobs has joined #openstack-powervm | 14:42 | |
efried | Then probably sanity check the config of the SSPs. Are they backed by the same SAN? Do they have the same kind & number of disks? Same number of VIOSes with the same kind of resources? | 14:43 |
*** mdrabe has quit IRC | 14:44 | |
efried | I believe it also may be possible for one bad VIOS to affect the performance of the whole cluster; so maybe try removing one VIOS at a time in turn and see if it gets better. | 14:44 |
esberglu | edmondsw: afaik | 14:45 |
efried | We can engage the VIOS team as soon as we have hard evidence (timings from the log) that prove the pattern and point to the specific op(s) taking too long. | 14:45 |
efried | Yeah, also that - dig into those four tests and see what's different about them. They shouldn't be doing any specific op that's unique to those tests, but maybe they're doing a particular op multiple times where other tests just do it once. | 14:46 |
edmondsw | efried the log color work y'all did a couple weeks ago.. how do I get vim to interpret that and give me colors instead of those nasty codes? | 14:48 |
efried | vim, no idea, would have to RTFM | 14:48 |
efried | Why do you need to edit logs? | 14:48 |
efried | edmondsw https://stackoverflow.com/questions/10592715/ansi-color-codes-in-vim (first google hit) | 14:49 |
edmondsw | efried I don't need to edit them... what do you use to view them? | 14:49 |
efried | edmondsw I just use terminal tools like less, grep, etc. Those will interpret the codes by default. | 14:50 |
efried | If not, use less -R | 14:50 |
esberglu | edmondsw: mac or windows? | 14:50 |
edmondsw | mac | 14:50 |
edmondsw | efried ah, -R | 14:51 |
edmondsw | tx | 14:51 |
efried | sweet | 14:51 |
esberglu | The console application has some nice filtering capabilities but 0 color support | 14:51 |
esberglu | But if I want color I just use less -R | 14:51 |
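If a viewer can't render the escape sequences at all, they can simply be stripped instead. A minimal sketch of such a filter (hypothetical script, not part of any project mentioned here):

```python
#!/usr/bin/env python
"""Strip ANSI color codes from a log: strip_ansi.py < colored.log > plain.log"""
import re
import sys

# CSI color sequences such as "\x1b[31m" (set color) or "\x1b[0m" (reset).
ANSI_RE = re.compile(r'\x1b\[[0-9;]*m')

for line in sys.stdin:
    sys.stdout.write(ANSI_RE.sub('', line))
```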
efried | esberglu btw, don't make any sudden moves on the prep_devstack.sh changes - I'm in mid-review. | 14:52 |
esberglu | efried: k | 14:52 |
*** mdrabe has joined #openstack-powervm | 14:54 | |
esberglu | efried: edmondsw: Seeing this in the logs | 15:12 |
esberglu | http://paste.openstack.org/show/611732/ | 15:12 |
esberglu | So the test in question is trying to rebuild the server and times out waiting for the server to become active | 15:13 |
esberglu | But for some reason it is being deleted here before it finishes rebuilding | 15:13 |
esberglu | Searching for why it's getting deleted | 15:14 |
efried | Possibly this is just the manifestation of that condition. It fails to build, so it deletes it, then the test proceeds and that's just how it fails. | 15:14 |
esberglu | efried: Yeah | 15:15 |
mdrabe | efried: Is there a get_instance_wrapper_from_uuid API in OOT or should I make one? | 15:46 |
efried | mdrabe From nova UUID or pvm UUID? | 15:46 |
mdrabe | nova | 15:46 |
efried | You're looking to get the LPAR wrapper, not the instance obj, right? | 15:47 |
mdrabe | yup | 15:47 |
efried | nova_powervm.virt.powervm.vm.get_instance_wrapper takes an instance. You could update it so the 'instance' arg can be either a UUID or an instance obj. | 15:48 |
efried | Which ultimately probably just means making nova_powervm.virt.powervm.vm.get_pvm_uuid accept either | 15:49 |
efried | (but update docstrings for things that call it) | 15:49 |
mdrabe | K will do | 15:49 |
efried | coo | 15:49 |
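A rough sketch of the change agreed on above: let the vm-level helper accept either a nova instance object or a bare nova UUID string. The conversion call (pvm_uuid.convert_uuid_to_pvm) is from recollection of the pypowervm helper and should be verified; the rest is illustrative, not the actual nova_powervm.virt.powervm.vm source.

```python
# Sketch only -- not the actual nova_powervm.virt.powervm.vm code.
from pypowervm.utils import uuid as pvm_uuid  # recollection; verify the import


def get_pvm_uuid(instance):
    """Return the PowerVM UUID for a nova instance object *or* a nova UUID string."""
    # If it quacks like an instance, use its uuid; otherwise assume we were
    # handed the nova UUID string directly.
    nova_uuid = getattr(instance, 'uuid', instance)
    return pvm_uuid.convert_uuid_to_pvm(nova_uuid).upper()
```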
tjakobs | efried thorst: can you take another look at https://review.openstack.org/#/c/462248/ when you get a chance (file-backed ephemeral) | 15:53 |
*** k0da has quit IRC | 15:55 | |
esberglu | efried: Seeing a bunch of 412 errors before the previous paste | 15:59 |
esberglu | http://paste.openstack.org/show/611736/ | 15:59 |
efried | esberglu Very much expected, especially with a tempest run doing stuff in parallel. | 16:01 |
edmondsw | esberglu I think 412s may be normal. It does appear to recover from those and continue on. But they could be a symptom of delays | 16:01 |
efried | If those retry counts get above three or four, may be worth looking into, but I don't expect that to be related to the issue at hand. | 16:02 |
edmondsw | yeah, I think I saw it 2-3 times | 16:02 |
efried | tjakobs Ack, sorry, don't know why that didn't float to the top when you responded. | 16:03 |
edmondsw | efried is it normal to see a whole lot of NotImplementedErrors, like "It took flow 'destroy' 110.22 seconds to finish.: NotImplementedError" ? | 16:06 |
esberglu | efried: edmondsw: Another thing I noticed in the logs | 16:08 |
esberglu | http://paste.openstack.org/show/611739/ | 16:08 |
edmondsw | esberglu yeah, that's similar to what I'm looking at | 16:08 |
efried | edmondsw That's new, not our code, probably related to dhellman's recent oslo.log changes to print exception contexts. He's still working through it. | 16:10 |
efried | If it's not blowing things up, don't worry about it. | 16:11 |
efried | esberglu What's the distribution of neos to SSPs? Do all SSPs have the same number of neos, or is this one "heavy"? | 16:12 |
esberglu | All of the SSPs have 4 systems except one which has 2 | 16:12 |
efried | esberglu Are the SSPs all the same size (number & size of disks) and backed by the same SAN? | 16:15 |
esberglu | Yeah. Same SAN. All have 4 250G disks | 16:16 |
efried | edmondsw Have you devstacked your victim neo yet? | 16:25 |
edmondsw | efried no, I spun up a VM on which I can do that, but waiting for you to tell me what I may need to do in my local.conf | 16:26 |
efried | spun up a vm.... | 16:27 |
edmondsw | looking through logs on timeouts (between meetings and interruptions) | 16:27 |
edmondsw | efried gotta run devstack somewhere... | 16:27 |
efried | edmondsw Normally on the neo. | 16:27 |
edmondsw | oh really... | 16:27 |
efried | Were you planning to set up remote/proxy? | 16:28 |
edmondsw | efried no... I assumed you'd have devstack setup on a VM for nova api, etc. and then just run the compute stuff on the neo... but I guess it would be nice to just run it all on the neo... | 16:28 |
edmondsw | wasn't thinking through it | 16:29 |
edmondsw | still thinking like a PowerVC developer :) | 16:29 |
efried | edmondsw I suppose you could split it up, though I can't really think of any benefit to that. | 16:29 |
edmondsw | yeah | 16:29 |
efried | So the best thing to do, local.conf-wise, is to grab the one from the CI. | 16:29 |
efried | Or I suppose you can grab mine from neo40:/opt/stack/devstack/local.conf - it was known to work within the past week ;-) | 16:30 |
efried | and you wouldn't have to edit it as much. | 16:30 |
efried | I think | 16:30 |
efried | edmondsw Can we talk through the disable-compute-service thing at some point? | 16:40 |
edmondsw | efried yes please | 16:40 |
efried | Now good? | 16:40 |
edmondsw | I pulled down both local confs and started diffing them, but I can do that later... now sounds good | 16:40 |
efried | ^^ make sure you get the .aio one from the CI. | 16:41 |
edmondsw | now called "intree" | 16:41 |
edmondsw | yes | 16:41 |
efried | edmondsw No, you'll want the OOT one. | 16:41 |
edmondsw | oh... even though I'm working on an intree change? | 16:41 |
efried | oh, right. Well, I find it easiest to stack with OOT, cause then you can flip to in-tree just by changing the driver in nova.conf and restarting compute. | 16:42 |
efried | And you're gonna want to port this change back to OOT when done anyway. | 16:42 |
edmondsw | oh, if that's the case, makes sense | 16:42 |
efried | yeah, you can't go the other way (flip in-tree to OOT if you stacked in-tree only) cause nova-powervm and networking-powervm will be missing. | 16:43 |
efried | Anyway, back to disable-compute-service. | 16:43 |
efried | From the top, we have two scenarios we want to be able to cover: init_host and periodic task. | 16:44 |
edmondsw | yep | 16:44 |
efried | Starting with init_host, I'm not actually sure we can get anywhere if this fails. | 16:45 |
efried | It runs only once, when the compute service starts. I don't believe it retries if it fails - it just bails. | 16:45 |
edmondsw | efried the periodic task will retry the guts of it | 16:45 |
edmondsw | default runs that periodic task ever 60s IIRC | 16:46 |
efried | Only if the driver can successfully report get_available_nodes | 16:46 |
efried | Which relies on self.host_wrapper | 16:46 |
efried | Which doesn't exist if we never initted properly. | 16:46 |
efried | So we would have to change what that guy reports, to something generic that we can ascertain without talking over the adapter. | 16:47 |
edmondsw | efried what get_available_nodes call are you referencing? | 16:48 |
efried | nova.virt.powervm.driver.PowerVMDriver#get_available_nodes | 16:48 |
edmondsw | efried oh, I see it | 16:48 |
edmondsw | https://github.com/openstack/nova/blob/5d95cb9dbca403790db4e9680919e6716fa5cb76/nova/compute/manager.py#L6630 | 16:49 |
efried | Kinda raises the question as to whether 'available' ~ 'enabled' in this context. I'm gonna assert 'no', precisely because we want this to work. | 16:49 |
efried | Anyway, as far as what that returns, we can probably get away with using the neo's hostname, gleaned by local cmd rather than pvm API. | 16:50 |
efried | Course then we give esberglu another log scrubbing task :) | 16:50 |
efried | Unless we use the shortname, which would probably be aiight. But not sure how globally unique these have to be. | 16:51 |
edmondsw | what does get_available_nodes return today when working? | 16:51 |
efried | MTMS string | 16:52 |
efried | side topic, what IDE do you use? | 16:53 |
edmondsw | efried can we get that from a cli call? | 16:53 |
efried | hm, possibly, or something like it, through RMC. | 16:53 |
edmondsw | efried do we have to worry about RMC being down? | 16:54 |
edmondsw | I would guess yes :) | 16:54 |
efried | Well, yeah, but if it is, I think we can safely declare ourselves dead. | 16:54 |
efried | maybe | 16:54 |
edmondsw | does it possibly take time to get RMC working after restart? | 16:55 |
efried | Yeah, that's one of the main things we're trying to account for here - I don't know how quickly RMC comes up relative to other stuff. | 16:55 |
edmondsw | right | 16:55 |
edmondsw | me either | 16:55 |
mdrabe | Only for the VIOSes should RMC matter I think, but idk about the SDE case | 16:55 |
mdrabe | NL in VIOS environment doesn't have RMC | 16:56 |
efried | It may not be an actual RMC command we need. The MTMS (or something like it) may be in a text file somewhere. | 16:56 |
efried | (edmondsw, talking it through on slack fyi) | 17:02 |
edmondsw | efried yeah, I'm following | 17:02 |
efried | k | 17:02 |
*** smatzek has quit IRC | 17:04 | |
efried | mdrabe What we're working towards is allowing init_host to complete fairly quickly, even if the NL services aren't up yet, in which case it will mark the compute host as disabled. | 17:05 |
efried | But then recheck when the periodic task for get_available_resource runs. | 17:06 |
efried | and if NL is now responsive, enable the compute host (and make sure all the init_host stuff is really done) | 17:06 |
efried | So the problem is that the periodic task calls get_available_nodes | 17:07 |
mdrabe | That seems a little weird to me | 17:07 |
efried | which today uses the managed system wrapper to generate a name. | 17:07 |
efried | which part? | 17:08 |
mdrabe | I guess it just makes more sense to me to have init_host wait for services | 17:08 |
mdrabe | Why have compute do periodic tasks and the like before then? | 17:09 |
efried | Yeah, so the whole point here is not to hold the compute service hostage waiting for NL services. We want to be able to say "alive, but disabled" fairly quickly, and then enable later when things smooth out. | 17:11 |
efried | The general case is to do that if the NL services go pear-shaped during the normal course of events. | 17:11 |
efried | But accounting for the reboot scenario is the kinda tricky edge case we're working on. | 17:12 |
mdrabe | It doesn't have to be tricky is all I'm sayin | 17:14 |
efried | mdrabe How long do we wait? | 17:15 |
efried | That's what raised nova core eyebrows to begin with. | 17:15 |
mdrabe | It's pypowervm right? | 17:15 |
mdrabe | 300 tries every 2 seconds? | 17:15 |
mdrabe | 1 try every 2 seconds for a maximum of up to 300 *** | 17:16 |
efried | Which works out to 10 minutes? Michael Still and Matt Riedemann both independently balked at that. | 17:16 |
efried | Swhat prompted this whole project. | 17:17 |
mdrabe | I do have some concerns, but I don't wanna hold anything up | 17:19 |
efried | Thing is, we can make this work if we can resolve the get_available_nodes thing. | 17:20 |
edmondsw | efried is this only about getting initialized, or did we also want to handle the NL services and/or VIOS going down after some period of working? | 17:21 |
efried | edmondsw Both. | 17:22 |
edmondsw | the current impl only handles the former | 17:22 |
efried | Yup, we'll get to that next :) | 17:22 |
edmondsw | vmware didn't handle the latter, just initialization | 17:23 |
edmondsw | but I do think that would be a good idea | 17:23 |
edmondsw | and we won't have a get_available_nodes problem there | 17:23 |
edmondsw | because we can cache that from the time that it was working | 17:23 |
edmondsw | I'm thinking it may be best to just use hostname for get_available_nodes | 17:24 |
edmondsw | doesn't nova refer to nodes by hostname? | 17:24 |
efried | The docstring even says you can use hypervisor_hostname | 17:25 |
efried | So yeah, I guess if that's the case, it's probably workable. | 17:25 |
edmondsw | efried https://developer.openstack.org/api-ref/compute/#show-host-details | 17:26 |
edmondsw | then again, that's deprecated... | 17:26 |
edmondsw | mdrabe, did you see that they've deprecated os-hosts APIs? | 17:26 |
edmondsw | and specifically say there will be no replacement for maintenance_mode | 17:26 |
edmondsw | :( | 17:26 |
efried | edmondsw So | 17:35 |
efried | Just use socket.gethostname() | 17:35 |
edmondsw | efried do we need to get esberglu to add some log scraping before I push that up? | 17:36 |
efried | Nope, that yields the short hostname. | 17:36 |
efried | and is what nova uses to default the ComputeManager.host / Service.host / etc. | 17:36 |
edmondsw | efried k | 17:37 |
edmondsw | mdrabe is this change to get_available_nodes going to affect PowerVC? | 17:37 |
edmondsw | efried or other things on upgrade? | 17:37 |
efried | So the logs will now have things like "Compute_service record updated for neo40:neo40" instead of "Compute_service record updated for neo40:8247-21L*215439A" | 17:37 |
efried | I'm really hoping pvc doesn't use get_available_nodes anywhere. That would be... silly. | 17:38 |
efried | Hah - esberglu we just figured out another way you could figure out what neo you're running against in the CI. | 17:39 |
efried | ...until this change drops, that is. | 17:39 |
*** smatzek has joined #openstack-powervm | 17:39 | |
efried | Okay, so now I *think* that'll sort us out for init_host. | 17:41 |
efried | Now edmondsw, what you were saying before about VCenter not supporting dynamic enable/disable during runtime - they do. | 17:42 |
edmondsw | efried not really... they start returning blank data, but they don't disable the service | 17:42 |
edmondsw | efried oh, my bad... you're right | 17:43 |
efried | Their get_available_resource calls self._vc_state.get_host_stats(refresh=True), which calls update_status, which calls _set_host_enabled(False) if they get an exception trying to discover stuff. | 17:43 |
edmondsw | I overlooked L85, thought the only call was L108 | 17:43 |
efried | Which is roughly what we might do. But before we settle on that, I'd like to bring up another possibility. | 17:44 |
efried | This methodology relies on get_available_resource. | 17:44 |
efried | Which means a couple of things: | 17:44 |
efried | We're at the mercy of whatever interval the consumer configured for that guy to run. | 17:44 |
efried | If something goes wrong in between hits, we won't disable until the next hit. | 17:45 |
efried | Which could result in more failures than necessary. | 17:46 |
efried | The other thing is that get_available_resource is gonna disappear pretty soon, so we'll probably have to move the logic to get_inventory() - but will need to make sure that's running in the same kind of periodic task. | 17:46 |
edmondsw | efried right, I assumed that if we need to disable when NL services go down, that will be done differently, on failures wherever they occur | 17:46 |
efried | So yeah, that's an option. | 17:47 |
efried | In fact, there's ways of doing that where we could avoid get_available_resource altogether. | 17:47 |
efried | One way to do that would be to add a helper to our Adapter. | 17:47 |
efried | Those guys get wrapped around the low-level requests, so they can trap specific HTTP error codes and whatnot, which is kinda what we want to be on the lookout for. | 17:48 |
efried | We get a "service unavailable", we disable right then. | 17:48 |
efried | The question then becomes: how do we switch back on? | 17:48 |
efried | We could keep that logic in get_available_resource. | 17:49 |
efried | Or (go with me for a sec here) we could spawn a thread that polls the service and re-enables when it comes back to life. | 17:49 |
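A minimal sketch of that adapter-helper option: wrap each low-level request and disable the compute service as soon as the REST layer looks unavailable. The helper shape (a callable wrapping the next request function) follows the pattern described here; the unavailability check and the disable_service() callback are placeholders, not real APIs.

```python
# Sketch: all names and the status check below are assumptions.

def _looks_unavailable(exc):
    # Placeholder: a real version would check the specific pypowervm
    # exception type / HTTP status (e.g. 503) rather than a generic attribute.
    return getattr(exc, 'status', None) == 503


def make_unavailable_helper(disable_service):
    def helper(func):
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                if _looks_unavailable(exc):
                    # Disable right now instead of waiting for the next
                    # periodic task; something else re-enables later.
                    disable_service()
                raise
        return wrapper
    return helper
```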
*** chhavi has quit IRC | 17:51 | |
edmondsw | efried if we spawned a thread, we could do the same during init and not have to change get_available_nodes | 17:56 |
efried | uh | 17:56 |
edmondsw | efried no? am I missing something? | 17:57 |
efried | Well, we would still have to account for the fact that get_available_resource could run before init_host has set up the host_wrapper. | 17:57 |
edmondsw | efried you mean get_available_nodes? | 17:57 |
efried | yeah, called by get_available_resource | 17:57 |
edmondsw | yes, we'd have to add some error handling logic there but we could just return [] | 17:58 |
efried | hm, that could actually work. | 17:58 |
edmondsw | efried no, called by update_available_resource | 17:58 |
edmondsw | I kinda like returning [] there until nodes are actually available :) | 17:59 |
efried | shit, that could just work anyway. | 17:59 |
efried | yeah | 17:59 |
efried | As long as the thing that re-enables the service is running NOT at the behest of update_available_resource | 18:00 |
edmondsw | actually... no, I think we'd still want to change get_available_resource to hostname | 18:01 |
edmondsw | I think we're getting hung up on the word available | 18:01 |
edmondsw | it's available in the sense that there is a compute service setup there. It's not available in the sense that the service is disabled | 18:01 |
edmondsw | I think nova means the former, though | 18:01 |
edmondsw | they seem to have coded that way, as in update_available_resource | 18:02 |
efried | Right | 18:02 |
edmondsw | I'm still a little concerned that switching from MTMS to hostname is going to break something somewhere | 18:03 |
efried | Anyway, let's table that and come back to it. | 18:03 |
efried | I'm 95% sure that is an opaque string that nobody cares about unless there's more than one on a compute node (which I think only applies to ironic) | 18:04 |
efried | We could return 'foo' and it would be okay. | 18:04 |
efried | But let's table it for now. | 18:04 |
efried | So I'm not a big fan of threads in general. | 18:05 |
efried | The get_available_resource periodic task thing is probably "good enough". | 18:05 |
efried | It's also worth noting that a recent change was made that automatically disables a compute service on which some number of consecutive deploys failed. | 18:06 |
efried | 10 by default, conf-able. | 18:06 |
edmondsw | yeah, would be nice to be automatically re-enabling after they've done that | 18:08 |
efried | Well, the stated design there is that the admin has to re-enable manually. | 18:13 |
efried | There are many reasons beyond nvl services that deploys could be failing. | 18:13 |
efried | edmondsw So the other thing I want to see done - possibly in a preceding change set - is factoring out the service enable/disable code, which is now used in at least four places I know of, including this one. | 18:17 |
edmondsw | efried sure | 18:19 |
efried | Okay, so to disable, we could a) rely on get_available_resource as the trigger; and/or b) add a helper to the adapter to disable on certain conditions | 18:21 |
efried | To re-enable, we could a) rely on get_available_resource as the trigger; and/or b) spawn a thread when we disable (however that is) to poll for live-ness. | 18:21 |
edmondsw | I'm working on (a) atm | 18:22 |
edmondsw | for both | 18:22 |
efried | Why I like relying on get_available_resource: It's freakin simple. Why I don't like it: we're at the mercy of periodic_task_interval/update_resources_interval | 18:25 |
efried | I'm gonna declare that's okay for now. If we get complaints (I can almost guarantee we never will) we can look into swapping it around. | 18:26 |
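A rough sketch of option (a) as declared "okay for now": the get_available_resource periodic task is the single trigger that both disables the service (on any REST failure) and re-enables it (on recovery). Every attribute and helper in the toy class below is a placeholder, not the real driver code.

```python
class _DriverSketch(object):
    """Toy stand-in for the driver; every helper below is a placeholder."""

    def __init__(self):
        self._initted = False
        self.enabled = True

    def _complete_deferred_init(self):
        # Whatever init_host could not do while pvm-rest was down.
        pass

    def _build_resource_stats(self, nodename):
        # Stands in for the real inventory/stats collection over the adapter.
        return {'hypervisor_hostname': nodename}

    def _set_host_enabled(self, enabled):
        # Stands in for the shared enable/disable helper discussed above.
        self.enabled = enabled

    def get_available_resource(self, nodename):
        # Option (a): this periodic task is the only disable/enable trigger,
        # so reaction time is bounded by the task interval.
        try:
            if not self._initted:
                self._complete_deferred_init()
                self._initted = True
            stats = self._build_resource_stats(nodename)
        except Exception:
            self._set_host_enabled(False)
            return {}
        self._set_host_enabled(True)
        return stats
```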
efried | I'd like there to be a record of the design alternatives, though. May be too heavyhanded to do it in code comments, but at least in the commit message. | 18:28 |
efried | ...so our posterity can git blame their way to finding it. | 18:28 |
efried | You can refer to this IRC log - we're on eavesdrop. | 18:28 |
efried | edmondsw In other news, I found out how to get the MTMS from /proc ;-) | 18:29 |
edmondsw | efried oh really? | 18:31 |
efried | yup, gimme a sec. | 18:31 |
mdrabe | edmondsw os-hosts being deprecated is gonna cause some problems | 18:45 |
mdrabe | And about this get_available_nodes business, can it just return CONF.host? | 18:46 |
mdrabe | What is CONF.host set to by default? | 18:46 |
edmondsw | mdrabe yeah, I'm worried about the os-hosts thing | 18:46 |
edmondsw | chatted with cjvolzka about that a bit, and will send a note | 18:46 |
edmondsw | mdrabe I was just looking for CONF.host... so stay tuned | 18:47 |
efried | edmondsw http://paste.openstack.org/show/611760/ | 18:49 |
efried | From what I can tell, if you don't set CONF.host, things that want it use socket.gethostname() | 18:50 |
esberglu | efried: edmondsw: thorst: Thinking about reworking the patching logic in prep_devstack | 18:51 |
esberglu | My idea is that instead of passing the patch lists through on the command line we provide a file that will have a project per line | 18:51 |
esberglu | <project1> <patch_list_1> | 18:51 |
esberglu | <project2> <patch_list_2> | 18:51 |
esberglu | So we can patch multiple projects in the same run. And keep this file in powervm-ci so we can live update super easily | 18:51 |
esberglu | Thoughts? | 18:51 |
edmondsw | mdrabe https://github.com/openstack/nova/blob/master/nova/conf/netconf.py#L55 | 18:51 |
edmondsw | default for CONF.host is socket.gethostname() | 18:52 |
edmondsw | efried that's what we'd talked about changing it to anyway | 18:52 |
efried | esberglu Yes. Make the format project:branch:patch_list | 18:52 |
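A sketch of parsing the proposed file in efried's project:branch:patch_list format; the file layout details (comment lines, comma-separated patch list) are assumptions.

```python
#!/usr/bin/env python
"""Parse a patch file with one "project:branch:patch_list" entry per line."""
import sys


def read_patch_file(path):
    entries = []
    for raw in open(path):
        line = raw.strip()
        if not line or line.startswith('#'):
            continue
        project, branch, patch_list = line.split(':', 2)
        entries.append((project, branch, patch_list.split(',')))
    return entries


if __name__ == '__main__':
    for project, branch, patches in read_patch_file(sys.argv[1]):
        print('%s (%s): %s' % (project, branch, ' '.join(patches)))
```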
mdrabe | CONF.host is what pvc does | 18:52 |
edmondsw | turns out PowerVC is already overwriting the pvm driver to use CONF.host | 18:52 |
edmondsw | mdrabe right | 18:52 |
edmondsw | so I'm thinking we just change the driver to use CONF.host and PowerVC can just stop overwriting that | 18:53 |
edmondsw | efried agreed? | 18:53 |
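What that proposal might look like in the driver, as a hedged sketch rather than the actual patch: report the node by CONF.host (which defaults to socket.gethostname(), per the netconf link above) instead of deriving the MTMS from the managed system wrapper.

```python
import nova.conf

CONF = nova.conf.CONF


class PowerVMDriver(object):  # stand-in; only this method is sketched
    def get_available_nodes(self, refresh=False):
        # CONF.host defaults to socket.gethostname(), so no PowerVM REST
        # call (and no host_wrapper) is needed to answer this.
        return [CONF.host]
```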
esberglu | efried: Ok. Will wait until the external prep_devstack is working to get started on that | 18:53 |
mdrabe | +1 | 18:53 |
efried | Shrug. | 18:54 |
efried | Sure. | 18:54 |
edmondsw | esberglu +1 | 18:54 |
*** k0da has joined #openstack-powervm | 19:02 | |
*** thorst_afk has quit IRC | 19:11 | |
*** thorst_afk has joined #openstack-powervm | 19:15 | |
edmondsw | efried how can I stop pvm-rest to simulate that going down? | 19:18 |
*** thorst_afk has quit IRC | 19:20 | |
mdrabe | edmondsw try `service pvm-rest stop` maybe | 19:20 |
edmondsw | mdrabe nope | 19:20 |
mdrabe | sudo? | 19:21 |
edmondsw | mdrabe yep... duh... | 19:24 |
*** thorst_afk has joined #openstack-powervm | 19:29 | |
edmondsw | efried I just found an infinite loop condition in validate_vios_ready... | 19:37 |
edmondsw | I'll propose something | 19:37 |
efried | edmondsw Well, we haven't hit it yet... | 19:38 |
edmondsw | efried not surprising, actually | 19:38 |
edmondsw | efried that method is currently called right after you setup the adapter, so it should be good to go, and you'd only hit this if the adapter wasn't working | 19:39 |
efried | edmondsw Okay, so if the adapter went belly up right after it was initialized successfully? | 19:40 |
edmondsw | in current usage... but we're going to start calling this differently, so it'd be more likely | 19:40 |
edmondsw | efried https://github.com/powervm/pypowervm/blob/master/pypowervm/tasks/partition.py#L263 | 19:41 |
edmondsw | if you always hit the Exception in L275, you never break out | 19:41 |
edmondsw | max_wait_time is ignored | 19:42 |
edmondsw | because rmc_down_vioses will always evaluate False in the if that can break out based on max_wait_time | 19:42 |
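A simplified reconstruction of the loop shape being described (not the actual pypowervm source), showing why the exception path can spin forever and where a time-based bound would fix it; _vioses_with_rmc_down is a hypothetical stand-in for the real query.

```python
import time


def _vioses_with_rmc_down(adapter):
    """Hypothetical stand-in for the real VIOS/RMC query."""
    raise NotImplementedError


def validate_vios_ready(adapter, max_wait_time=600, sleep_time=5):
    # Reconstruction of the described shape: the only exit that honors
    # max_wait_time is guarded by rmc_down_vioses, so if the query raises
    # on every pass the list stays empty and the loop never ends.
    waited = 0
    while True:
        rmc_down_vioses = []
        try:
            rmc_down_vioses = _vioses_with_rmc_down(adapter)
            if not rmc_down_vioses:
                return  # all VIOSes report RMC up
        except Exception:
            # Broken adapter: rmc_down_vioses stays [], making the bounded
            # exit below unreachable.
            pass
        if rmc_down_vioses and waited >= max_wait_time:
            return  # give up, some VIOSes still down
        # Fix sketch: also bail on elapsed time alone, i.e. check
        # "waited >= max_wait_time" without the rmc_down_vioses guard.
        time.sleep(sleep_time)
        waited += sleep_time
```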
esberglu | efried: Can I get a final review on 5406? | 19:42 |
efried | on it now | 19:43 |
esberglu | I want to deploy staging with that this afternoon | 19:43 |
esberglu | efried: thanks | 19:43 |
efried | esberglu done | 19:43 |
efried | esberglu Is 5405 just WIP because it's waiting for the other to merge? | 19:44 |
esberglu | efried: Yeah pretty much. And if i hit anything unexpected when testing on staging | 19:56 |
-openstackstatus- NOTICE: The Gerrit service on review.openstack.org is being restarted now to clear some excessive connection counts while we debug the intermittent request failures reported over the past few minutes | 20:06 | |
*** smatzek has quit IRC | 20:08 | |
efried | esberglu Heads up, the nova project just went bananas with spurious merge conflicts on every pending change set. | 21:09 |
efried | So dozens of rebases are gonna choke CIs everywhere. | 21:09 |
efried | It'll be a good scale test for us, I suppose. Have we ever hit full capacity with a wait queue before? | 21:09 |
efried | I count 38 in the last hour | 21:11 |
esberglu | efried: Yeah we have hit our max nodes before. There are 86 runs in the queue right now (current max is 50). We can up the max probably | 21:11 |
efried | Not sure if that's necessary, assuming they don't just drop off at some point. | 21:12 |
efried | We're not likely to be the last CI posting results after this glut. | 21:12 |
efried | The max in this context is the number of nodes running simultaneously? | 21:13 |
esberglu | efried: Yep. We very rarely get up to 50 under normal circumstances but it does happen every now and then | 21:13 |
efried | Increasing the number of nodes would put load on.. what? | 21:13 |
efried | The systems those VMs live on? Are they shared proc/mem? | 21:13 |
efried | And presumably the SSPs backing them. | 21:14 |
esberglu | efried: I don't know enough about the performance side of things to say | 21:14 |
efried | k. If you're saying it's really rare to hit the max, let's leave it alone. This is an anomaly, for sure. | 21:14 |
esberglu | I think I have seen it hit the max 2 (maybe 3) other times ever. So yeah very rare | 21:15 |
*** edmondsw_ has joined #openstack-powervm | 21:22 | |
*** zerick_ has joined #openstack-powervm | 21:25 | |
*** jpasqualetto has quit IRC | 21:27 | |
*** adi_____ has quit IRC | 21:29 | |
*** edmondsw has quit IRC | 21:29 | |
*** zerick has quit IRC | 21:29 | |
*** zerick_ is now known as zerick | 21:30 | |
openstackgerrit | Matt Rabe proposed openstack/nova-powervm master: Change NVRAM manager store to use uuid instead of instance object https://review.openstack.org/471926 | 21:30 |
*** esberglu has quit IRC | 21:34 | |
*** esberglu has joined #openstack-powervm | 21:35 | |
*** mdrabe has quit IRC | 21:38 | |
*** esberglu has quit IRC | 21:39 | |
*** esberglu has joined #openstack-powervm | 21:48 | |
*** esberglu has quit IRC | 21:53 | |
*** thorst_afk has quit IRC | 21:53 | |
*** edmondsw_ has quit IRC | 21:58 | |
*** svenkat has quit IRC | 22:12 | |
*** tjakobs has quit IRC | 22:22 | |
*** thorst_afk has joined #openstack-powervm | 22:28 | |
*** thorst_afk has quit IRC | 22:32 | |
*** jwcroppe has quit IRC | 22:36 | |
*** jwcroppe has joined #openstack-powervm | 22:36 | |
*** jwcroppe has quit IRC | 22:36 | |
*** jwcroppe has joined #openstack-powervm | 22:37 | |
*** jwcroppe has quit IRC | 22:41 | |
*** thorst_afk has joined #openstack-powervm | 23:02 | |
*** openstack has joined #openstack-powervm | 23:14 | |
*** thorst_afk has quit IRC | 23:17 | |
*** edmondsw has joined #openstack-powervm | 23:18 | |
*** adi_____ has joined #openstack-powervm | 23:20 | |
*** edmondsw has quit IRC | 23:23 | |
*** svenkat has joined #openstack-powervm | 23:24 | |
*** k0da has quit IRC | 23:46 |