| *** mhen_ is now known as mhen | 02:04 | |
| opendevreview | Julien LE JEUNE proposed openstack/nova master: Update start_service() function in test https://review.opendev.org/c/openstack/nova/+/963156 | 07:46 |
|---|---|---|
| opendevreview | Julien LE JEUNE proposed openstack/nova master: Update start_service() function in test https://review.opendev.org/c/openstack/nova/+/963156 | 07:49 |
| nicolairuckel | I'm working on this patch to persist the NVRAM: https://review.opendev.org/c/openstack/nova/+/959682 So far it works for all migrations but I habve to manually add read permissions to `instance-0000003_VARS.fd`. The file for `disk` already has those. In the libvirt IRC they told me that not having global read permissions is intentional and I'm wondering why the disk in Nova has those permissions and how I | 07:56 |
| nicolairuckel | can set them for the NVRAM file as well. I asked this in the libvirt channel and was told that this might be a better place for this question. | 07:56 |
| opendevreview | Julien LE JEUNE proposed openstack/nova master: Update start_service() function in test https://review.opendev.org/c/openstack/nova/+/963156 | 08:48 |
| opendevreview | Julien Le Jeune proposed openstack/nova master: Update start_service() function in test https://review.opendev.org/c/openstack/nova/+/963156 | 09:37 |
| sean-k-mooney | gibi: i dont know if you read this comment https://review.opendev.org/c/openstack/placement/+/962776/comment/f68ed8e5_0803b616/ but when i was running the tests locally i noticed that changing the request_limit paramter had no perforncem imporment | 10:30 |
| sean-k-mooney | """ min(conf.max_allocation_candidates, requst_limit) """ | 10:30 |
| sean-k-mooney | i think we are missing ^ somewhere | 10:30 |
| gibi_ | sean-k-mooney: we intentionally not changed the meaning of the limit parameter when max_allocation_candidates was introduced due to | 11:16 |
| gibi_ | Placement generates these candidates fully before applying the limit parameter provided in the allocation candidate query to be able do a random sampling if [placement]randomize_allocation_candidates is True. | 11:16 |
| gibi_ | the limit parameter is applied last | 11:16 |
| gibi_ | if we change that then that changes how randomize_allocation_candidates behave | 11:17 |
| gibi_ | so we did not changed limit | 11:17 |
| *** gibi_ is now known as gibi | 11:17 | |
| sean-k-mooney | hum i see | 11:35 |
| sean-k-mooney | that is unfortunate | 11:35 |
| sean-k-mooney | but i guess that the legacy we are stuck with | 11:36 |
| sean-k-mooney | there are two cases i guess to consdier request_limit < max_allocation_candidates | 11:37 |
| sean-k-mooney | i was hoping in this case we coudl treadt it as if max_allocation_candidates == request_limit | 11:37 |
| sean-k-mooney | in the other case its already going to be enforced | 11:37 |
| sean-k-mooney | i wonder if we coudl implemetn the randomize_allocation_candiates diffently | 11:39 |
| sean-k-mooney | we currently have depthfirst or breathfirst search | 11:39 |
| sean-k-mooney | but couldl we instead do a random search for that usecase. | 11:39 |
| sean-k-mooney | it would not be exactuly the same behavior but the randomisation was really there because of the orginial depth first approch to generating candiates | 11:40 |
| sean-k-mooney | if we are using breathfirst its much less required | 11:41 |
| sean-k-mooney | that a sepreate poblem then the one that your trying to fix but i wonder fi we should conitnue to supprot that if we cant apply optimisation based on teh request limit | 11:41 |
| sean-k-mooney | in otherwords i think we could provide randomness in the generation fo the candiate without needign to generate all of them and doing a random choice at the end | 11:42 |
| sean-k-mooney | that would fulfil the same usecase but with a lot less wasted effort | 11:43 |
| sean-k-mooney | maybe another topic for the ptg dicussion? | 11:43 |
| sean-k-mooney | gibi: basically im not sure that genrating all possible candiate is part of the api contract fo that parmater so we shoudl be free to evolve its meaning over tiem to be more effeictent provided it still has the same overall effect | 11:46 |
| gibi | I'm OK to discuss this on the PTG. But for me this is a small issue that limit and max_allocation_candidates has two different meaning today | 11:57 |
| sean-k-mooney | ya its minor but it means the clinet cant contol the perfoamce of the request by asking for less cnadiates in the responce | 11:58 |
| sean-k-mooney | to me it undermines why the request_limit parmater exists in the api in the first place | 11:59 |
| gibi | that parameter from the start only limited the size of the response nothing lese | 11:59 |
| gibi | else | 11:59 |
| sean-k-mooney | sure but i woudl expect it to do more then that as an end user and it prevent nova for example form being smart and say (i knwo this query might generate a lot of candiate so im going to use a lower limit dynicmicly) | 12:00 |
| sean-k-mooney | not that nova has the intelegents to know when it shoudl do that | 12:01 |
| sean-k-mooney | so its not a blocker but i just find it surprising | 12:01 |
| sean-k-mooney | gibi: i noticed you commented on your inlining patches that they dont seam to provide a benifit | 12:02 |
| sean-k-mooney | gibi: do you think that is a dead end or are you plannign to do more expermients in that direction | 12:02 |
| gibi | unfortunately the randomize_allocation_candidates option defined as | 12:03 |
| gibi | If True, when limiting allocation candidate results, the results will be a random sampling of the full result set. | 12:03 |
| gibi | *full result set* :/ | 12:03 |
| gibi | sean-k-mooney: the first inlining helps, the second not much | 12:03 |
| sean-k-mooney | right but it does not define what the full result set is | 12:04 |
| gibi | this helps If True, when limiting allocation candidate results, the results will be a random sampling of the full result set. | 12:04 |
| gibi | sorry | 12:04 |
| gibi | wrong buffer | 12:04 |
| gibi | this helps https://review.opendev.org/c/openstack/placement/+/963052/1/placement/tests/functional/test_allocation_candidates.py#265 | 12:04 |
| gibi | this does not really help https://review.opendev.org/c/openstack/placement/+/963053/2/placement/tests/functional/test_allocation_candidates.py | 12:05 |
| sean-k-mooney | gibi: so inlining _consolidate_allocation_requests helps but not exceeds_capacity | 12:05 |
| gibi | yep | 12:05 |
| sean-k-mooney | i see form the comment you have been trying to use some profiles to help with this, have you found that useful | 12:06 |
| gibi | I will not trying to inline more, the profiler shows that the time is spent on hashing and eqality checking the building blocks but I don't see easy moves there | 12:06 |
| gibi | I used just the built in cProfiler | 12:07 |
| gibi | as you can see in the commented out code | 12:07 |
| sean-k-mooney | ya i was just wondering if the output was useful | 12:07 |
| sean-k-mooney | i have never used that before hence the question | 12:07 |
| gibi | it tells you what called how much and what time it took so it shows you where your spending your execution time | 12:08 |
| gibi | it showed that consolidate is more expensive than exeeced_capacity | 12:08 |
| sean-k-mooney | ack, i have used flamegraph profilers in other lanaguges in the past just never really used those types of tools much with openstack because they hate eventlets | 12:08 |
| sean-k-mooney | placement fortunetly does not use eventlet so that is not an issue | 12:09 |
| gibi | yeah, I just enabled it in athe _merge_candidate call so no parallism was involved | 12:09 |
| sean-k-mooney | but i tried to use them on nova once on the schduler and they were not really that useful when used as external tools | 12:09 |
| gibi | I'm now providing test coverage for the prune patch and we can decide together when dansmith is up that which inlining patch if any we want to take in | 12:10 |
| sean-k-mooney | how do you feel about this overall approch for the current bug. are you happy that it gives us enough breathing room for the current reproted edgecase to move forward with for now? | 12:10 |
| gibi | I think the prune patch already give us time to re-thing the modeling. I think the inlining of the _consoliodate call is still have a good ROI on code complexity. I would at least merge these two optimization. I will then go and update our backlog space about the problem with the new learning and we can discuss on the PTG. My attached customer case probably closable with only the prune patch in | 12:12 |
| gibi | s/space/spec/ | 12:12 |
| sean-k-mooney | im kind of suprised that manual function inlineing has such an effect in a dynmic language like python. what python version were you testing with out of interest? | 12:12 |
| gibi | Python 3.12.10 (main, Apr 9 2025, 04:44:59) [GCC 14.2.0] on linux | 12:13 |
| sean-k-mooney | ack i wonder if the impact is large on say 3.10 or 3.9 | 12:13 |
| gibi | note that it wasn't pure inlining, it was actually using the commulative nature of the call to save actual steps | 12:13 |
| gibi | cumulative | 12:14 |
| sean-k-mooney | ah ok i was actully trying to pull up the orginal code to comare whiel we were chatting but i have not fond that yet | 12:14 |
| sean-k-mooney | wel i have not been trying very hard :) | 12:14 |
| sean-k-mooney | oh its in your previous patch | 12:15 |
| gibi | consolidate merges the allocations of the different request groups, with the new product algo this was done on each prefix of those groups. but merging G1 + G2 and G1 + G2 + G3 can be optimized if G1 + G2 is only calculated once not twice | 12:15 |
| sean-k-mooney | that explains why it was not in github https://review.opendev.org/c/openstack/placement/+/962776/5/placement/objects/allocation_candidate.py#850 | 12:15 |
| sean-k-mooney | so really your folding the exceeds_capacity logic inter the generation | 12:16 |
| gibi | yes | 12:16 |
| sean-k-mooney | ok ya that make more sense as to why it would have a benifit | 12:17 |
| gibi | because of the result of that check informs what part of the product space can be ignored | 12:17 |
| gibi | i.e. if G1+G2 fails the check then G1+G2+Gx+...Gy will awas fail the check so they can be ignored | 12:18 |
| sean-k-mooney | right so we might be able to not inline the fucntion if we return more info beyond True or False like the atribute that exceed capsity | 12:18 |
| sean-k-mooney | i dont really object to the inlineing DRY is nice but im ok wiht specific code where we have a perfroamce need for it | 12:19 |
| sean-k-mooney | im just wonderign if we can get the benift of a testibale function while also achiving the speed up if we adjust the return value to provdie more context | 12:20 |
| sean-k-mooney | looking at https://github.com/openstack/placement/blob/master/placement/objects/research_context.py#L345 | 12:21 |
| sean-k-mooney | the behvior is differnt enough that i dont think it makes sense to reuse exceeds_capacity and modify it as i was suggesting | 12:22 |
| gibi | I will think about making the code cleaner once I have pinned down the behavior with some unit test | 12:24 |
| sean-k-mooney | ack i think we coudl factor https://review.opendev.org/c/openstack/placement/+/963052/1/placement/objects/allocation_candidate.py#746 out into its own helper if we wanted but having its own impleaiton i belive is resonable | 12:25 |
| opendevreview | Julien Le Jeune proposed openstack/nova master: Update start_service() function in test https://review.opendev.org/c/openstack/nova/+/963156 | 13:26 |
| bauzas | dansmith: remember when you asked me why my comments were usually having two spaces between the hashtag and the first word ? looks to me this is due to autopep8 which automatically 'fixes' my comments 🤦 | 13:34 |
| dansmith | bauzas: I don't, but.. good to know | 13:34 |
| bauzas | still have to confirm, but unless I'm fool, I don't understand why those extra spaces suddently appear on my code while I doublechecked before committing I wasn't having them | 13:38 |
| bauzas | so this is either autopep8 or vscode (was having the same problem before I started using Cursor) | 13:39 |
| bauzas | I was becoming crazy tbh | 13:39 |
| * dansmith would prefer to not use autopep8 to check pep8 for reasons just like this | 13:39 | |
| bauzas | I'll do further tests with --no-verify | 13:40 |
| bauzas | so it would skip autopep8 | 13:40 |
| *** ralonsoh_ is now known as ralonsoh | 14:00 | |
| sean-k-mooney | bauzas: its from pep8 https://peps.python.org/pep-0008/#inline-comments | 14:03 |
| sean-k-mooney | bauzas: "An inline comment is a comment on the same line as a statement. Inline comments should be separated by at least two spaces from the statement. They should start with a # and a single space." | 14:03 |
| bauzas | I'm not talking of inline statements sorry | 14:03 |
| sean-k-mooney | oh ok. never mind then | 14:04 |
| bauzas | just a whole comment line like # TODO(x) : This is a comment | 14:04 |
| sean-k-mooney | ah well ther eshoudl be 1 space between the # and TODO | 14:04 |
| bauzas | I'm writing "# TODO" and eventually it commits "#Â TODO" with two spaces | 14:04 |
| sean-k-mooney | hum that is odd indeed | 14:05 |
| sean-k-mooney | i have not seen that but that could be the agressive option. although im not sure which guidelien would be influcancing that | 14:05 |
| sean-k-mooney | the change to drop agressive=3 is still open and mergable so if ye want to proceed with that do | 14:06 |
| sean-k-mooney | bauzas: do you happen to have a linke to the code that was formated? | 14:06 |
| bauzas | that's always when I write a new comment line | 14:07 |
| sean-k-mooney | i woudl be interested in seign the example. given we did not have to reformat any code when i enabeld that i was really not expectign it to have such an effect | 14:07 |
| bauzas | maybe this is just a codium issue | 14:07 |
| sean-k-mooney | ya maybe | 14:07 |
| bauzas | but everytime I need to doublecheck what I commit, that's super cumbersome | 14:07 |
| sean-k-mooney | you did have an issue with codium adding a nobreaking space at one point | 14:07 |
| bauzas | honestly, I need time to investigate what's wrong but I'm rushing into delivering a fix for https://bugs.launchpad.net/nova/+bug/2125935 | 14:08 |
| bauzas | so I'll check that later | 14:09 |
| sean-k-mooney | ack | 14:09 |
| opendevreview | Balazs Gibizer proposed openstack/placement master: Prune a_c search space by invalid prefixes https://review.opendev.org/c/openstack/placement/+/962776 | 14:41 |
| opendevreview | Balazs Gibizer proposed openstack/placement master: Inline _consolidate_allocation_requests https://review.opendev.org/c/openstack/placement/+/963052 | 14:41 |
| opendevreview | Balazs Gibizer proposed openstack/placement master: Inline exceeds_capacity https://review.opendev.org/c/openstack/placement/+/963053 | 14:41 |
| gibi | dansmith: this is how unit test for the new product algo will look like https://review.opendev.org/c/openstack/placement/+/962776/5 | 14:42 |
| gibi | https://review.opendev.org/c/openstack/placement/+/962776/6/placement/tests/unit/objects/test_allocation_candidate.py | 14:48 |
| dansmith | gibi: okay.. that will require some study as well | 14:48 |
| gibi | no worries. I have plenty of quirks to debug now that the unit test show some strange steps the algo makes | 14:49 |
| dansmith | gibi: so all those "bug duplicate" ones are things it should have filtered? | 14:56 |
| gibi | those are steps in the algo that should not be made. (I guess we have a off by one index error somewhere) | 15:03 |
| gibi | somehow we check the same product 3 times instead of one | 15:04 |
| gibi | I will figure it out why... | 15:04 |
| gibi | don't hang up on that now. | 15:05 |
| gibi | but the test already shows where we skip products so that is what is interesting | 15:05 |
| gibi | for review | 15:05 |
| opendevreview | Julien Le Jeune proposed openstack/nova master: Update start_service() function in test https://review.opendev.org/c/openstack/nova/+/963156 | 15:05 |
| dansmith | gibi: yeah, when I first skimmed it I thought you were identifying the things that would be skipped by using yours over the standard product, so just wanted to confirm that it was actually further skipping that needed to happen | 15:08 |
| dansmith | I'm deeper into it now so it's obvious | 15:08 |
| dansmith | gibi: can you respond here: https://review.opendev.org/c/openstack/placement/+/962776/comment/05b0ce6d_48710288/ ? | 15:20 |
| dansmith | I'm trying to grok the test first, so maybe I'm being stupid | 15:20 |
| bauzas | dansmith: fwiw, fill_metadata() seems non deterministic when you use a wrong context | 15:26 |
| bauzas | ie. in my reproducer, some host with two instances has one with metadata set while the other not | 15:27 |
| dansmith | bauzas: I dunno why you think that's possible :D | 15:28 |
| bauzas | I know my patch fixes the whole problem but I'm puzzled why I'm still getting metadata only for one (and that could explain why your UT didn't catch the failure) | 15:28 |
| dansmith | bauzas: I think it's more likely that you're using a shared context and it's being mutated in a way that seems like it's changing | 15:28 |
| bauzas | yeah, that's what I thought | 15:28 |
| bauzas | I'm passing the same context object for both instances so this probably was mutate | 15:29 |
| bauzas | mutated | 15:29 |
| * bauzas continues to understand the problem | 15:29 | |
| dansmith | the instances are from the same cell right? | 15:29 |
| opendevreview | Julien Le Jeune proposed openstack/nova master: Update start_service() function in test https://review.opendev.org/c/openstack/nova/+/963156 | 15:30 |
| opendevreview | Julien Le Jeune proposed openstack/nova master: Update start_service() function in test https://review.opendev.org/c/openstack/nova/+/963156 | 15:31 |
| gibi | dansmith: responded | 15:33 |
| gibi | the mock is on the consolidate call not on the exceeds_capacity call | 15:33 |
| gibi | so we assert whant product or partial products are generated and tested by the algo, not what is eventually returned by the product algo. I can add the later to the test as well | 15:34 |
| gibi | s/later/latter/ | 15:34 |
| dansmith | oh, right, so for that 1,2,1 case we would have seen a result of "nope not valid" from _consolidate and thus we would not have yielded it, right? | 15:35 |
| gibi | yepp | 15:37 |
| gibi | and we would also notice that 1,2,1 is an invalid prefix, so if there would be more groups, we would not test any of 1,2,1,x partial products | 15:37 |
| gibi | as x can be any value the product would be invalid | 15:38 |
| dansmith | yeah, sorry seems obvious now :P | 15:38 |
| dansmith | okay so the solution is starting to come into view now.. the odometer is supposed to roll past a bunch of product "solutions" at the first index that isn't viable | 15:39 |
| gibi | yepp | 15:39 |
| gibi | (odometer or I learned during the weekend that it is called a mixed radix number in english that the odometer represents) | 15:39 |
| dansmith | this is why I wanted to see a unit test to understand what the desired behavior was instead of trying to synthesize it in my head :) | 15:40 |
| gibi | totally valid request to have unit tests | 15:41 |
| dansmith | oh I know, I just mean.. I thought it would be easier to review the code with unit tests to show the behavior | 15:41 |
| dansmith | that's not always the case, but here.. this is some complex stuff to reason about (until you get the idea in your head) | 15:42 |
| opendevreview | Balazs Gibizer proposed openstack/placement master: Prune a_c search space by invalid prefixes https://review.opendev.org/c/openstack/placement/+/962776 | 16:11 |
| *** nicolasbock___ is now known as nicolasbock | 22:08 | |
Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!