Monday, 2019-08-12

*** e0ne has joined #openstack-placement		06:16
*** e0ne has quit IRC		06:17
*** e0ne has joined #openstack-placement		07:49
*** helenafm has joined #openstack-placement		07:55
*** cdent has joined #openstack-placement		08:12
*** cdent has quit IRC		08:33
*** cdent has joined #openstack-placement		08:33
*** openstackgerrit has quit IRC		08:45
*** irclogbot_3 has quit IRC		08:49
*** irclogbot_1 has joined #openstack-placement		08:52
*** tssurya has joined #openstack-placement		09:40
*** artom has joined #openstack-placement		10:17
cdent	gibi, stephenfin : two more performance fixes	10:55
cdent	easy : https://review.opendev.org/675648	10:55
cdent	harder: https://review.opendev.org/675606	10:56
cdent	that second one has a rather significant impact	10:56
*** vdrok has quit IRC		12:14
*** vdrok has joined #openstack-placement		12:14
*** edleafe has joined #openstack-placement		12:29
*** jroll has quit IRC		13:01
*** jroll has joined #openstack-placement		13:02
*** efried_pto is now known as efried		13:09
*** mriedem has joined #openstack-placement		13:18
*** belmoreira has joined #openstack-placement		14:53
*** tssurya has quit IRC		15:18
cdent	efried: when you get a moment, this might blow off the top of your head: https://review.opendev.org/675606	15:21
*** tssurya has joined #openstack-placement		15:21
efried	cdent: on my list	15:21
cdent	cool, thanks	15:21
efried	brain still grinding into gear today. Been a bit... off lately.	15:21
efried	and I know that one's going to require brain.	15:21
cdent	i know how it can be when brain being off	15:22
*** openstackgerrit has joined #openstack-placement		15:30
openstackgerrit	Merged openstack/placement master: Use expanding bindparam in get_traits_by_provider_tree https://review.opendev.org/675648	15:30
*** belmoreira has quit IRC		15:44
*** helenafm has quit IRC		15:53
*** e0ne has quit IRC		16:01
gibi	cdent: left comments in https://review.opendev.org/#/c/675606	16:07
cdent	thanks	16:09
openstackgerrit	Chris Dent proposed openstack/placement master: Avoid duplicate ProviderSummary in _merge_candidates https://review.opendev.org/675606	16:28
cdent	gibi: fixed ^	16:28
*** dklyle_ is now known as dklyle		16:34
efried	cdent: How can I run one gabbi test and have it print out the UUIDs from os.environ?	16:36
efried	(in a way that I can tell which keys belong to which values)	16:36
cdent	efried: need slightly more context	16:37
efried	Okay	16:37
efried	I think https://review.opendev.org/#/c/675606/3/placement/objects/allocation_candidate.py@934 is very wrong, and I'm trying to prove it	16:37
efried	so I reverted it, and it fails with KeyError, which I'm sure is why you changed it	16:38
cdent	yes	16:38
efried	so when I get there I want to print out some of the provider UUIDs to prove that that key belongs to a not-root provider, which is what we're assuming it is when it's not found in the dict.	16:38
efried	just thought of another way to do that...	16:39
cdent	yeah, I wouldn't bother with gabbi for htat	16:39
efried	(from the code, not from the gabbi	16:39
efried	)	16:39
efried	um, still	16:40
cdent	the different here is that parent_uuid_by_rp_uuid has different stuff in it now that I've changed the algorithm	16:41
cdent	less stuff	16:41
efried	Yes, I understand that.	16:41
cdent	it used to have _all_ the stuff	16:41
cdent	(even though it shouldn't have)	16:41
efried	In the cases you were getting KeyError, it's because you're no longer processing cn1	16:41
efried	And you're assuming numa0 and numa1 are roots	16:42
efried	I just need to prove it	16:42
cdent	that may b	16:43
cdent	if so, we need a test for it, because all the tests currently pass	16:43
cdent	and we need to make sure that dict gets the right stuff	16:43
efried	tests do not pass	16:43
efried	unless you do that .get()	16:43
efried	which is bypassing the bug	16:43
efried	The test cases that cover that path, albeit somewhat incidentally, are those where we return results that don't involve the root provider.	16:44
cdent	what i mean is that we need a more explicit test	16:44
cdent	it may be as simple populating the parent_uuid thing somewhere else in the loop	16:45
cdent	or more	16:45
cdent	you accept the rest of the problems, yes, that too much data was involved?	16:45
cdent	also, my change to use get is still returning the same actual value, just not a literal value in the dict	16:46
cdent	so I'm not sure...	16:46
efried	say what??	16:47
efried	"returning the same actual value" - it's returning None when the key is absent.	16:47
efried	there is no "same actual value"	16:47
efried	I must be misunderstanding you.	16:47
efried	Yes, I accept that we needn't process psums that aren't involved in the request - as long as we make sure to process all the psums in all the trees that have providers involved in the request.	16:48
cdent	I _thought_, but maybe not, that parent_uuid is None	16:48
efried	parent_uuid is None only for roots.	16:49
efried	or, should be.	16:49
cdent	yes	16:49
efried	otherwise it should have a value	16:49
cdent	yes	16:49
efried	swhy I'm trying to print out cn1's uuid	16:49
* cdent nods		16:49
cdent	however	16:49
efried	printing uuids for cn1 and cn2 to prove that the one we're getting KeyError on isn't one of them.	16:49
cdent	if the tests pass when I force None there, that means that the tests already expected None there	16:50
cdent	that is they expected the recursion to hit its base case	16:50
cdent	so what's happening is that parent_uuid_by_rp_uuid is sparse	16:50
* cdent is speculating		16:51
* cdent we will wait while efried doesn't further exploration.		16:51
efried	Right, I'll accept that we should have an existing same_subtree test go deeper to confirm that the result changes when we're assuming numa0/1 is the root, which ought to be the case.	16:51
cdent	I'm in the midst of a tedious email, will rejoin this in a moment	16:52
efried	because that ought to be the impact. We're using that recursive thingy to calculate same_subtree-ness	16:52
efried	so if we're getting too short a tree, we should get a different result.	16:52
efried	but whatever test(s) we're doing would happen to get the same result either way.	16:52
efried	which probably means we need a test with a deeper tree, where the subtree root is at level 3 instead of level 2.	16:52
cdent	i reckon root of the subtree needs to be resourcelesss with a resourceless gap between it and anything providing resources in the same tree	16:56
cdent	a parent without resources having a parent without resources	16:56
efried	f, summary.resource_provider.name is empty	16:58
efried	not	16:58
efried	helpful	16:58
efried	I feel really stupid that I can't f'ing figure out how to printf wtf I need here.	16:59
cdent	maybe you're trying to be too focused	16:59
cdent	just print all the stuff	16:59
efried	that's what I thought I was doing, but without 'name' it's pretty hard for me to tell which provider is which.	17:00
cdent	line 538 the name isn't being set because we don't have it	17:02
cdent	(i guess to save space)	17:03
efried	fwiw it's not being set on the arr either	17:04
efried	(not sure if those are coming from the same place, prolly are)	17:04
cdent	i think it all comes from creating provider_ids, which was optimized (by jay I think) for space	17:04
cdent	you can LOG.debug when you start the fixtures with the data you care about, printing a mapping of name to uuid	17:05
efried	I tried that	17:06
efried	the log didn't show up	17:06
cdent	because OS_DEBUG isn't true and a test didn't fail	17:07
cdent	you can either change it to warning or some other high level, or force the test to fail	17:07
efried	I failed the test, and printed it as info.	17:07
efried	"when you start the fixtures" -> I was doing it at the end of the fixture setup, is that what you meant?	17:08
efried	I added name to the psum	17:08
efried	so now I can see that the KeyError is indeed coming off of cn1's UUID	17:09
efried	which means this will work until we have a candidate that doesn't involve the root or first tier down.	17:10
cdent	we should see about adding that as a test	17:10
cdent	because we don't know for certain that what you said is true, only that something that doesn't have a parent is not in the dict	17:11
cdent	you're likely right	17:11
cdent	that is: it not having a parent, in this instance, is correct, so we need a setup where we see the problem where it does	17:12
efried	yes. I'm trying to formulate that in words on the review. Then we get to decide who writes that test.	17:14
efried	Because I'm pretty loaded down, so I would like it to be not-me if possible.	17:14
efried	Just need to make someone understand what the test needs to look like.	17:14
cdent	i can probably make it happen if you do that ^	17:14
efried	cdent: I left the review. I haven't codified precisely what that test needs to look like, because I don't really know yet. Did I give enough information for someone to experiment until found?	17:17
cdent	I _think_ so	17:19
cdent	it at least gives me enough to mess about with and see what I can break	17:19
efried	cdent: all that said...	17:21
efried	I'm not sure that part of the optimization is the necessary part to reduce from 21000 to 7000	17:21
cdent	it was necessary to remove carrying around ids	17:22
cdent	(or psums)	17:22
cdent	you have to look at something to get rp ids	17:22
efried	I suspect it would reduce from, like, 7000 to 6000 or something, by eliminating the summaries for providers not involved in the request	17:22
efried	which, as noted, might not be the right thing to do	17:22
efried	(Are we still returning all summaries for the whole tree around providers involved in the request?)	17:23
cdent	we have tests for that, yes?	17:23
cdent	I _absolutely_ rely on the tests when doing this stuff. If we have gaps, we need to fill them.	17:24
cdent	s/absolutely/solely/	17:24
efried	we should, yeah	17:24
efried	anyway, what I'm saying is...	17:24
efried	I think the reduction from 21000 to 7000 happened by virtue of removing the summaries as part of the return values you're slinging around between methods.	17:24
cdent	yes, that's a given	17:25
efried	...and relying solely on the cache in rw_ctx	17:25
cdent	the chunk of code that is setting the parent_by dict is a result of that change, not a helper fo rit	17:25
efried	well then, a) let's split that change out, because I can get behind that being right, and b) then we can rationalize whether this other thing actually saves us anything, and maybe we don't need to "fix" this "problem" at all.	17:25
cdent	we do	17:26
cdent	I already established that somewhere in the process	17:26
cdent	not that I can recall it today	17:26
efried	Based on some of the comment residue, it looks like you went through some stages in the middle	17:26
efried	and I don't doubt that this may have been necessary in one of those stages.	17:27
efried	But I don't think it's necessary right now.	17:27
cdent	hmmm	17:27
cdent	i'll look, but I'm not convinced	17:27
efried	I think if you cut L737 back to previous algorithm, but use psums from rw_ctx instead of from candidates (cause they're no longer there)	17:27
efried	...then you'll have achieved the 21->7 reduction	17:27
cdent	rw_ctx has all provider summaries	17:28
efried	yes	17:28
cdent	we are in one of multiple calls to merge	17:28
efried	yes	17:28
cdent	so that's not good enought	17:28
cdent	i mean, if you want to go that way just so we can merge stuff in stages, I suppose that's an option, but it's far from ideal	17:31
efried	I think it's L383 and 453 that effects the 21->7 reduction.	17:31
efried	How hard is it for you to prove the 21000 vs 7000 bit?	17:31
efried	I mean, can you set that test up pretty easily?	17:31
cdent	it's not simply about proving the 21000 vs 7000 bit	17:31
cdent	that was an offshoot of trying to fix that merge_candidates was doing too much work	17:32
cdent	the goal is to make merge_candidates do less work	17:32
cdent	if it happened to be less expensive to mess with 21000 redundant objects I'd be happy with that	17:32
cdent	but it's not	17:32
cdent	but there's more to it than that	17:32
cdent	further, without the change I made we wouldn't have discovered that we've got a testing gap	17:34
cdent	(thigns would have worked by accident of data structure, not intent)	17:34
cdent	in any case: I will work on this some more and attempt to make it better on several dimensions	17:34
efried	so	17:34
cdent	but unlikely tonight	17:35
efried	what we want to do is	17:36
efried	make _merge_candidates only iterate over the providers involved in the request	17:36
efried	but we're pretty sure that's going to have to include tree members, or at least ancestors, if we want to be building parent_uuid_by_rp_uuid	17:36
efried	but	17:36
efried	what we could do is build that dict ^ just once, outside of _merge_candidates, based on the (whole) psums dict from rw_ctx.	17:37
efried	and then the reduction of the loop in _merge_candidates makes total sense.	17:37
cdent	yes, that's effectively what I saw above when I said something like "we could build the dict somewhere else"	17:37
efried	because we're only relying on it to do capacity checks	17:37
cdent	s/saw/said/	17:37
efried	which can be legitimately restricted to single providers.	17:38
efried	okay, neat.	17:38
efried	If we did that, we wouldn't need that .get(), and I would be happy to acknowledge the testing gap with a TODO.	17:39
cdent	I think it's really a case of: come up with the broken test	17:39
cdent	fix it	17:39
cdent	no	17:39
cdent	much better to fill the gap	17:39
cdent	and fix it	17:39
efried	as you wish. I'm just saying if the code were structured that way, I would be able to convince myself by inspection that it's still correct.	17:40
cdent	right, but I can't structure code the "right way" without having a test that shows me the wrong way	17:40
efried	because we would have effectively not changed the algorithm.	17:40
cdent	because as far as the tests are cocncerned, right now, I've done it the right way. even the "best way"	17:41
cdent	but you've found a hole	17:41
cdent	that's the important part	17:41
efried	okay.	17:41
cdent	convincing by inspection gives me shivers	17:42
*** tssurya has quit IRC		17:42
* cdent dines		17:49
*** cdent has quit IRC		17:49
*** e0ne has joined #openstack-placement		20:26
*** efried has quit IRC		20:39
*** efried has joined #openstack-placement		20:40
*** artom has quit IRC		20:46
*** gryf has quit IRC		20:58
*** e0ne has quit IRC		21:49
*** mriedem is now known as mriedem_afk		22:03
*** spatel has joined #openstack-placement		22:14
*** spatel has quit IRC		22:37
*** mriedem_afk has quit IRC		23:19
*** takashin has joined #openstack-placement		23:38
*** spatel has joined #openstack-placement		23:54
*** alex_xu has quit IRC		23:57
*** spatel has quit IRC		23:59
*** alex_xu has joined #openstack-placement		23:59

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!