*** e0ne has joined #openstack-placement | 06:16 | |
*** e0ne has quit IRC | 06:17 | |
*** e0ne has joined #openstack-placement | 07:49 | |
*** helenafm has joined #openstack-placement | 07:55 | |
*** cdent has joined #openstack-placement | 08:12 | |
*** cdent has quit IRC | 08:33 | |
*** cdent has joined #openstack-placement | 08:33 | |
*** openstackgerrit has quit IRC | 08:45 | |
*** irclogbot_3 has quit IRC | 08:49 | |
*** irclogbot_1 has joined #openstack-placement | 08:52 | |
*** tssurya has joined #openstack-placement | 09:40 | |
*** artom has joined #openstack-placement | 10:17 | |
cdent | gibi, stephenfin : two more performance fixes | 10:55 |
---|---|---|
cdent | easy : https://review.opendev.org/675648 | 10:55 |
cdent | harder: https://review.opendev.org/675606 | 10:56 |
cdent | that second one has a rather significant impact | 10:56 |
*** vdrok has quit IRC | 12:14 | |
*** vdrok has joined #openstack-placement | 12:14 | |
*** edleafe has joined #openstack-placement | 12:29 | |
*** jroll has quit IRC | 13:01 | |
*** jroll has joined #openstack-placement | 13:02 | |
*** efried_pto is now known as efried | 13:09 | |
*** mriedem has joined #openstack-placement | 13:18 | |
*** belmoreira has joined #openstack-placement | 14:53 | |
*** tssurya has quit IRC | 15:18 | |
cdent | efried: when you get a moment, this might blow off the top of your head: https://review.opendev.org/675606 | 15:21 |
*** tssurya has joined #openstack-placement | 15:21 | |
efried | cdent: on my list | 15:21 |
cdent | cool, thanks | 15:21 |
efried | brain still grinding into gear today. Been a bit... off lately. | 15:21 |
efried | and I know that one's going to require brain. | 15:21 |
cdent | i know how it can be when brain being off | 15:22 |
*** openstackgerrit has joined #openstack-placement | 15:30 | |
openstackgerrit | Merged openstack/placement master: Use expanding bindparam in get_traits_by_provider_tree https://review.opendev.org/675648 | 15:30 |
*** belmoreira has quit IRC | 15:44 | |
*** helenafm has quit IRC | 15:53 | |
*** e0ne has quit IRC | 16:01 | |
gibi | cdent: left comments in https://review.opendev.org/#/c/675606 | 16:07 |
cdent | thanks | 16:09 |
openstackgerrit | Chris Dent proposed openstack/placement master: Avoid duplicate ProviderSummary in _merge_candidates https://review.opendev.org/675606 | 16:28 |
cdent | gibi: fixed ^ | 16:28 |
*** dklyle_ is now known as dklyle | 16:34 | |
efried | cdent: How can I run one gabbi test and have it print out the UUIDs from os.environ? | 16:36 |
efried | (in a way that I can tell which keys belong to which values) | 16:36 |
cdent | efried: need slightly more context | 16:37 |
efried | Okay | 16:37 |
efried | I think https://review.opendev.org/#/c/675606/3/placement/objects/allocation_candidate.py@934 is very wrong, and I'm trying to prove it | 16:37 |
efried | so I reverted it, and it fails with KeyError, which I'm sure is why you changed it | 16:38 |
cdent | yes | 16:38 |
efried | so when I get there I want to print out some of the provider UUIDs to prove that that key belongs to a not-root provider, which is what we're assuming it is when it's not found in the dict. | 16:38 |
efried | just thought of another way to do that... | 16:39 |
cdent | yeah, I wouldn't bother with gabbi for htat | 16:39 |
efried | (from the code, not from the gabbi | 16:39 |
efried | ) | 16:39 |
efried | um, still | 16:40 |
cdent | the different here is that parent_uuid_by_rp_uuid has different stuff in it now that I've changed the algorithm | 16:41 |
cdent | less stuff | 16:41 |
efried | Yes, I understand that. | 16:41 |
cdent | it used to have _all_ the stuff | 16:41 |
cdent | (even though it shouldn't have) | 16:41 |
efried | In the cases you were getting KeyError, it's because you're no longer processing cn1 | 16:41 |
efried | And you're assuming numa0 and numa1 are roots | 16:42 |
efried | I just need to prove it | 16:42 |
cdent | that may b | 16:43 |
cdent | if so, we need a test for it, because all the tests currently pass | 16:43 |
cdent | and we need to make sure that dict gets the right stuff | 16:43 |
efried | tests do *not* pass | 16:43 |
efried | unless you do that .get() | 16:43 |
efried | which is bypassing the bug | 16:43 |
efried | The test cases that cover that path, albeit somewhat incidentally, are those where we return results that don't involve the root provider. | 16:44 |
cdent | what i mean is that we need a more explicit test | 16:44 |
cdent | it may be as simple populating the parent_uuid thing somewhere else in the loop | 16:45 |
cdent | or more | 16:45 |
cdent | you accept the rest of the problems, yes, that too much data was involved? | 16:45 |
cdent | also, my change to use get is still returning the same actual value, just not a literal value in the dict | 16:46 |
cdent | so I'm not sure... | 16:46 |
efried | say what?? | 16:47 |
efried | "returning the same actual value" - it's returning None when the key is absent. | 16:47 |
efried | there is no "same actual value" | 16:47 |
efried | I must be misunderstanding you. | 16:47 |
efried | Yes, I accept that we needn't process psums that aren't involved in the request - as long as we make sure to process all the psums in all the *trees* that have providers involved in the request. | 16:48 |
cdent | I _thought_, but maybe not, that parent_uuid is None | 16:48 |
efried | parent_uuid is None only for roots. | 16:49 |
efried | or, should be. | 16:49 |
cdent | yes | 16:49 |
efried | otherwise it should have a value | 16:49 |
cdent | yes | 16:49 |
efried | swhy I'm trying to print out cn1's uuid | 16:49 |
* cdent nods | 16:49 | |
cdent | however | 16:49 |
efried | printing uuids for cn1 and cn2 to prove that the one we're getting KeyError on isn't one of them. | 16:49 |
cdent | if the tests pass when I force None there, that means that the tests already expected None there | 16:50 |
cdent | that is they expected the recursion to hit its base case | 16:50 |
cdent | so what's happening is that parent_uuid_by_rp_uuid is sparse | 16:50 |
* cdent is speculating | 16:51 | |
* cdent we will wait while efried doesn't further exploration. | 16:51 | |
efried | Right, I'll accept that we should have an existing same_subtree test go deeper to confirm that the result changes when we're assuming numa0/1 is the root, which ought to be the case. | 16:51 |
cdent | I'm in the midst of a tedious email, will rejoin this in a moment | 16:52 |
efried | because that ought to be the impact. We're using that recursive thingy to calculate same_subtree-ness | 16:52 |
efried | so if we're getting too short a tree, we should get a different result. | 16:52 |
efried | but whatever test(s) we're doing would happen to get the same result either way. | 16:52 |
efried | which probably means we need a test with a deeper tree, where the subtree root is at level 3 instead of level 2. | 16:52 |
cdent | i reckon root of the subtree needs to be resourcelesss with a resourceless gap between it and anything providing resources in the same tree | 16:56 |
cdent | a parent without resources having a parent without resources | 16:56 |
efried | f, summary.resource_provider.name is empty | 16:58 |
efried | not | 16:58 |
efried | helpful | 16:58 |
efried | I feel really stupid that I can't f'ing figure out how to printf wtf I need here. | 16:59 |
cdent | maybe you're trying to be too focused | 16:59 |
cdent | just print all the stuff | 16:59 |
efried | that's what I thought I was doing, but without 'name' it's pretty hard for me to tell which provider is which. | 17:00 |
cdent | line 538 the name isn't being set because we don't have it | 17:02 |
cdent | (i guess to save space) | 17:03 |
efried | fwiw it's not being set on the arr either | 17:04 |
efried | (not sure if those are coming from the same place, prolly are) | 17:04 |
cdent | i think it all comes from creating provider_ids, which was optimized (by jay I think) for space | 17:04 |
cdent | you can LOG.debug when you start the fixtures with the data you care about, printing a mapping of name to uuid | 17:05 |
efried | I tried that | 17:06 |
efried | the log didn't show up | 17:06 |
cdent | because OS_DEBUG isn't true and a test didn't fail | 17:07 |
cdent | you can either change it to warning or some other high level, or force the test to fail | 17:07 |
efried | I failed the test, and printed it as info. | 17:07 |
efried | "when you start the fixtures" -> I was doing it at the end of the fixture setup, is that what you meant? | 17:08 |
efried | I added name to the psum | 17:08 |
efried | so now I can see that the KeyError is indeed coming off of cn1's UUID | 17:09 |
efried | which means this will work until we have a candidate that doesn't involve the root or first tier down. | 17:10 |
cdent | we should see about adding that as a test | 17:10 |
cdent | because we don't know for certain that what you said is true, only that something that doesn't have a parent is not in the dict | 17:11 |
cdent | you're likely right | 17:11 |
cdent | that is: it not having a parent, in this instance, is correct, so we need a setup where we see the problem where it does | 17:12 |
efried | yes. I'm trying to formulate that in words on the review. Then we get to decide who writes that test. | 17:14 |
efried | Because I'm pretty loaded down, so I would like it to be not-me if possible. | 17:14 |
efried | Just need to make someone understand what the test needs to look like. | 17:14 |
cdent | i can probably make it happen if you do that ^ | 17:14 |
efried | cdent: I left the review. I haven't codified precisely what that test needs to look like, because I don't really know yet. Did I give enough information for someone to experiment until found? | 17:17 |
cdent | I _think_ so | 17:19 |
cdent | it at least gives me enough to mess about with and see what I can break | 17:19 |
efried | cdent: all that said... | 17:21 |
efried | I'm not sure that part of the optimization is the necessary part to reduce from 21000 to 7000 | 17:21 |
cdent | it was necessary to remove carrying around ids | 17:22 |
cdent | (or psums) | 17:22 |
cdent | you have to look at something to get rp ids | 17:22 |
efried | I suspect it would reduce from, like, 7000 to 6000 or something, by eliminating the summaries for providers not involved in the request | 17:22 |
efried | which, as noted, might not be the right thing to do | 17:22 |
efried | (Are we still returning all summaries for the whole tree around providers involved in the request?) | 17:23 |
cdent | we have tests for that, yes? | 17:23 |
cdent | I _absolutely_ rely on the tests when doing this stuff. If we have gaps, we need to fill them. | 17:24 |
cdent | s/absolutely/solely/ | 17:24 |
efried | we should, yeah | 17:24 |
efried | anyway, what I'm saying is... | 17:24 |
efried | I think the reduction from 21000 to 7000 happened by virtue of removing the summaries as part of the return values you're slinging around between methods. | 17:24 |
cdent | yes, that's a given | 17:25 |
efried | ...and relying solely on the cache in rw_ctx | 17:25 |
cdent | the chunk of code that is setting the parent_by dict is a result of that change, not a helper fo rit | 17:25 |
efried | well then, a) let's split that change out, because I can get behind that being right, and b) then we can rationalize whether this other thing actually saves us anything, and maybe we don't need to "fix" this "problem" at all. | 17:25 |
cdent | we do | 17:26 |
cdent | I already established that somewhere in the process | 17:26 |
cdent | not that I can recall it today | 17:26 |
efried | Based on some of the comment residue, it looks like you went through some stages in the middle | 17:26 |
efried | and I don't doubt that this may have been necessary in one of those stages. | 17:27 |
efried | But I don't think it's necessary right now. | 17:27 |
cdent | hmmm | 17:27 |
cdent | i'll look, but I'm not convinced | 17:27 |
efried | I think if you cut L737 back to previous algorithm, but use psums from rw_ctx instead of from candidates (cause they're no longer there) | 17:27 |
efried | ...then you'll have achieved the 21->7 reduction | 17:27 |
cdent | rw_ctx has all provider summaries | 17:28 |
efried | yes | 17:28 |
cdent | we are in one of multiple calls to merge | 17:28 |
efried | yes | 17:28 |
cdent | so that's not good enought | 17:28 |
cdent | i mean, if you want to go that way just so we can merge stuff in stages, I suppose that's an option, but it's far from ideal | 17:31 |
efried | I think it's L383 and 453 that effects the 21->7 reduction. | 17:31 |
efried | How hard is it for you to prove the 21000 vs 7000 bit? | 17:31 |
efried | I mean, can you set that test up pretty easily? | 17:31 |
cdent | it's not simply about proving the 21000 vs 7000 bit | 17:31 |
cdent | that was an offshoot of trying to fix that merge_candidates was doing too much work | 17:32 |
cdent | the goal is to make merge_candidates do less work | 17:32 |
cdent | if it happened to be less expensive to mess with 21000 redundant objects I'd be happy with that | 17:32 |
cdent | but it's not | 17:32 |
cdent | but there's more to it than that | 17:32 |
cdent | further, without the change I made we wouldn't have discovered that we've got a testing gap | 17:34 |
cdent | (thigns would have worked by accident of data structure, not intent) | 17:34 |
cdent | in any case: I will work on this some more and attempt to make it better on several dimensions | 17:34 |
efried | so | 17:34 |
cdent | but unlikely tonight | 17:35 |
efried | what we want to do is | 17:36 |
efried | make _merge_candidates only iterate over the providers involved in the request | 17:36 |
efried | but we're pretty sure that's going to have to include tree members, or at least ancestors, if we want to be building parent_uuid_by_rp_uuid | 17:36 |
efried | but | 17:36 |
efried | what we could do is build that dict ^ just once, outside of _merge_candidates, based on the (whole) psums dict from rw_ctx. | 17:37 |
efried | and then the reduction of the loop in _merge_candidates makes total sense. | 17:37 |
cdent | yes, that's effectively what I saw above when I said something like "we could build the dict somewhere else" | 17:37 |
efried | because we're only relying on it to do capacity checks | 17:37 |
cdent | s/saw/said/ | 17:37 |
efried | which can be legitimately restricted to single providers. | 17:38 |
efried | okay, neat. | 17:38 |
efried | If we did that, we wouldn't need that .get(), and I would be happy to acknowledge the testing gap with a TODO. | 17:39 |
cdent | I think it's really a case of: come up with the broken test | 17:39 |
cdent | fix it | 17:39 |
cdent | no | 17:39 |
cdent | much better to fill the gap | 17:39 |
cdent | and fix it | 17:39 |
efried | as you wish. I'm just saying if the code were structured that way, I would be able to convince myself by inspection that it's still correct. | 17:40 |
cdent | right, but I can't structure code the "right way" without having a test that shows me the wrong way | 17:40 |
efried | because we would have effectively not changed the algorithm. | 17:40 |
cdent | because as far as the tests are cocncerned, right now, I've done it the right way. even the "best way" | 17:41 |
cdent | but you've found a hole | 17:41 |
cdent | that's the important part | 17:41 |
efried | okay. | 17:41 |
cdent | convincing by inspection gives me shivers | 17:42 |
*** tssurya has quit IRC | 17:42 | |
* cdent dines | 17:49 | |
*** cdent has quit IRC | 17:49 | |
*** e0ne has joined #openstack-placement | 20:26 | |
*** efried has quit IRC | 20:39 | |
*** efried has joined #openstack-placement | 20:40 | |
*** artom has quit IRC | 20:46 | |
*** gryf has quit IRC | 20:58 | |
*** e0ne has quit IRC | 21:49 | |
*** mriedem is now known as mriedem_afk | 22:03 | |
*** spatel has joined #openstack-placement | 22:14 | |
*** spatel has quit IRC | 22:37 | |
*** mriedem_afk has quit IRC | 23:19 | |
*** takashin has joined #openstack-placement | 23:38 | |
*** spatel has joined #openstack-placement | 23:54 | |
*** alex_xu has quit IRC | 23:57 | |
*** spatel has quit IRC | 23:59 | |
*** alex_xu has joined #openstack-placement | 23:59 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!