Monday, 2025-10-06

*** mhen_ is now known as mhen  02:04
<opendevreview> Julien LE JEUNE proposed openstack/nova master: Update start_service() function in test  https://review.opendev.org/c/openstack/nova/+/963156  07:46
<opendevreview> Julien LE JEUNE proposed openstack/nova master: Update start_service() function in test  https://review.opendev.org/c/openstack/nova/+/963156  07:49
<nicolairuckel> I'm working on this patch to persist the NVRAM: https://review.opendev.org/c/openstack/nova/+/959682  So far it works for all migrations, but I have to manually add read permissions to `instance-0000003_VARS.fd`. The file for `disk` already has those. In the libvirt IRC they told me that not having global read permissions is intentional, and I'm wondering why the disk in Nova has those permissions and how I can set them for the NVRAM file as well. I asked this in the libvirt channel and was told that this might be a better place for the question.  07:56
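A minimal sketch of the manual workaround described above, i.e. adding group/other read bits to the NVRAM file so it matches what the disk file already has; the directory under the libvirt nvram path is an assumption, not something nova does today:

    import os
    import stat

    # Assumed location of the per-instance NVRAM file; the filename is the
    # one mentioned above, the directory is the libvirt default.
    nvram_path = "/var/lib/libvirt/qemu/nvram/instance-0000003_VARS.fd"

    # Add group and other read permission while keeping the existing bits,
    # mirroring the permissions the disk file already has.
    mode = os.stat(nvram_path).st_mode
    os.chmod(nvram_path, mode | stat.S_IRGRP | stat.S_IROTH)
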
<opendevreview> Julien LE JEUNE proposed openstack/nova master: Update start_service() function in test  https://review.opendev.org/c/openstack/nova/+/963156  08:48
<opendevreview> Julien Le Jeune proposed openstack/nova master: Update start_service() function in test  https://review.opendev.org/c/openstack/nova/+/963156  09:37
<sean-k-mooney> gibi: I don't know if you read this comment https://review.opendev.org/c/openstack/placement/+/962776/comment/f68ed8e5_0803b616/ but when I was running the tests locally I noticed that changing the request_limit parameter gave no performance improvement  10:30
<sean-k-mooney> """ min(conf.max_allocation_candidates, request_limit) """  10:30
<sean-k-mooney> I think we are missing ^ somewhere  10:30
</gr-replace>
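The clamping suggested in the snippet above would look roughly like this; the function and argument names are illustrative and the handling of an unset request limit is an assumption, not the actual placement code:

    # Sketch: cap the per-request limit by the configured maximum so a
    # smaller request_limit could also shrink the amount of work done.
    def effective_limit(max_allocation_candidates, request_limit):
        if request_limit is None:
            # No limit supplied in the query: fall back to the config value.
            return max_allocation_candidates
        return min(max_allocation_candidates, request_limit)
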
<gibi_> sean-k-mooney: we intentionally did not change the meaning of the limit parameter when max_allocation_candidates was introduced, because  11:16
<gibi_> Placement generates these candidates fully before applying the limit parameter provided in the allocation candidate query, to be able to do a random sampling if [placement]randomize_allocation_candidates is True.  11:16
<gibi_> the limit parameter is applied last  11:16
<gibi_> if we change that then that changes how randomize_allocation_candidates behaves  11:17
<gibi_> so we did not change limit  11:17
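A schematic of the ordering gibi describes (generate the full set, optionally sample at random, apply the limit last); the function name and arguments are illustrative, not the placement implementation:

    import random

    def limited_candidates(all_candidates, limit, randomize):
        # The full candidate set is generated first ...
        candidates = list(all_candidates)
        # ... optionally shuffled so the sample is drawn from the *whole*
        # result set when randomize_allocation_candidates is True ...
        if randomize:
            random.shuffle(candidates)
        # ... and the limit parameter is applied last.
        return candidates if limit is None else candidates[:limit]
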
*** gibi_ is now known as gibi  11:17
<sean-k-mooney> hum, I see  11:35
<sean-k-mooney> that is unfortunate  11:35
<sean-k-mooney> but I guess that's the legacy we are stuck with  11:36
<sean-k-mooney> there are two cases to consider, I guess: request_limit < max_allocation_candidates  11:37
<sean-k-mooney> I was hoping in this case we could treat it as if max_allocation_candidates == request_limit  11:37
<sean-k-mooney> in the other case it's already going to be enforced  11:37
<sean-k-mooney> I wonder if we could implement randomize_allocation_candidates differently  11:39
<sean-k-mooney> we currently have depth-first or breadth-first search  11:39
<sean-k-mooney> but could we instead do a random search for that use case?  11:39
<sean-k-mooney> it would not be exactly the same behavior, but the randomisation was really there because of the original depth-first approach to generating candidates  11:40
<sean-k-mooney> if we are using breadth-first it's much less required  11:41
<sean-k-mooney> that's a separate problem from the one you're trying to fix, but I wonder if we should continue to support that if we can't apply optimisations based on the request limit  11:41
<sean-k-mooney> in other words, I think we could provide randomness in the generation of the candidates without needing to generate all of them and then do a random choice at the end  11:42
<sean-k-mooney> that would fulfil the same use case but with a lot less wasted effort  11:43
<sean-k-mooney> maybe another topic for the PTG discussion?  11:43
<sean-k-mooney> gibi: basically I'm not sure that generating all possible candidates is part of the API contract for that parameter, so we should be free to evolve its meaning over time to be more efficient provided it still has the same overall effect  11:46
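One way to read the idea above is to randomize the order in which the per-group candidates are combined, so that the first combinations drawn are already a random-ish selection and generation can stop early; this sketch is illustrative only and, unlike the current behaviour, it is not a uniform sample of the full result set:

    import itertools
    import random

    def random_order_product(groups, limit):
        # Shuffle each group's candidates, then enumerate the product
        # lazily and stop after `limit` combinations instead of
        # materializing every possible candidate first.
        shuffled = []
        for group in groups:
            members = list(group)
            random.shuffle(members)
            shuffled.append(members)
        return list(itertools.islice(itertools.product(*shuffled), limit))
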
<gibi> I'm OK to discuss this at the PTG. But for me it is only a small issue that limit and max_allocation_candidates have two different meanings today  11:57
<sean-k-mooney> ya, it's minor, but it means the client can't control the performance of the request by asking for fewer candidates in the response  11:58
<sean-k-mooney> to me it undermines why the request_limit parameter exists in the API in the first place  11:59
<gibi> that parameter from the start only limited the size of the response, nothing else  11:59
<sean-k-mooney> sure, but as an end user I would expect it to do more than that, and it prevents nova, for example, from being smart and saying "I know this query might generate a lot of candidates so I'm going to use a lower limit dynamically"  12:00
<sean-k-mooney> not that nova has the intelligence to know when it should do that  12:01
<sean-k-mooney> so it's not a blocker, but I just find it surprising  12:01
<sean-k-mooney> gibi: I noticed you commented on your inlining patches that they don't seem to provide a benefit  12:02
<sean-k-mooney> gibi: do you think that is a dead end or are you planning to do more experiments in that direction?  12:02
<gibi> unfortunately the randomize_allocation_candidates option is defined as  12:03
<gibi> "If True, when limiting allocation candidate results, the results will be a random sampling of the full result set."  12:03
<gibi> *full result set* :/  12:03
<gibi> sean-k-mooney: the first inlining helps, the second not much  12:03
<sean-k-mooney> right, but it does not define what the full result set is  12:04
<gibi> this helps If True, when limiting allocation candidate results, the results will be a random sampling of the full result set.  12:04
<gibi> sorry  12:04
<gibi> wrong buffer  12:04
<gibi> this helps https://review.opendev.org/c/openstack/placement/+/963052/1/placement/tests/functional/test_allocation_candidates.py#265  12:04
<gibi> this does not really help https://review.opendev.org/c/openstack/placement/+/963053/2/placement/tests/functional/test_allocation_candidates.py  12:05
<sean-k-mooney> gibi: so inlining _consolidate_allocation_requests helps but not exceeds_capacity  12:05
<gibi> yep  12:05
<sean-k-mooney> I see from the comment that you have been trying to use some profilers to help with this, have you found them useful?  12:06
<gibi> I will not try to inline more; the profiler shows that the time is spent on hashing and equality checking of the building blocks, but I don't see easy moves there  12:06
<gibi> I just used the built-in cProfile  12:07
<gibi> as you can see in the commented-out code  12:07
<sean-k-mooney> ya, I was just wondering if the output was useful  12:07
<sean-k-mooney> I have never used that before, hence the question  12:07
<gibi> it tells you what was called, how many times, and how much time it took, so it shows you where you're spending your execution time  12:08
<gibi> it showed that consolidate is more expensive than exceeds_capacity  12:08
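For reference, the built-in cProfile usage gibi is referring to looks roughly like this; the profiled function here is just a stand-in for the candidate-merging step:

    import cProfile
    import pstats

    def work():
        # Stand-in for the call being measured.
        return sum(i * i for i in range(100000))

    profiler = cProfile.Profile()
    profiler.enable()
    work()
    profiler.disable()

    # Shows what was called, how many times, and how much time it took,
    # sorted by cumulative time.
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)
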
<sean-k-mooney> ack, I have used flamegraph profilers in other languages in the past, just never really used those types of tools much with openstack because they hate eventlet  12:08
<sean-k-mooney> placement fortunately does not use eventlet, so that is not an issue  12:09
<gibi> yeah, I just enabled it in the _merge_candidate call so no parallelism was involved  12:09
<sean-k-mooney> but I tried to use them on nova once, on the scheduler, and they were not really that useful when used as external tools  12:09
<gibi> I'm now providing test coverage for the prune patch, and we can decide together, once dansmith is up, which inlining patch (if any) we want to take in  12:10
<sean-k-mooney> how do you feel about this overall approach for the current bug? are you happy that it gives us enough breathing room for the currently reported edge case to move forward with for now?  12:10
<gibi> I think the prune patch already gives us time to re-think the modeling. I think the inlining of the _consolidate call still has a good ROI on code complexity. I would at least merge these two optimizations. I will then go and update our backlog spec about the problem with the new learnings and we can discuss at the PTG. My attached customer case is probably closable with only the prune patch in  12:12
<sean-k-mooney> I'm kind of surprised that manual function inlining has such an effect in a dynamic language like python. what python version were you testing with, out of interest?  12:12
<gibi> Python 3.12.10 (main, Apr  9 2025, 04:44:59) [GCC 14.2.0] on linux  12:13
<sean-k-mooney> ack, I wonder if the impact is as large on, say, 3.10 or 3.9  12:13
<gibi> note that it wasn't pure inlining, it was actually using the cumulative nature of the call to save actual steps  12:13
<sean-k-mooney> ah ok, I was actually trying to pull up the original code to compare while we were chatting but I have not found it yet  12:14
<sean-k-mooney> well, I have not been trying very hard :)  12:14
<sean-k-mooney> oh, it's in your previous patch  12:15
<gibi> consolidate merges the allocations of the different request groups; with the new product algo this was done on each prefix of those groups. but merging G1 + G2 and G1 + G2 + G3 can be optimized if G1 + G2 is only calculated once, not twice  12:15
<sean-k-mooney> that explains why it was not on github: https://review.opendev.org/c/openstack/placement/+/962776/5/placement/objects/allocation_candidate.py#850  12:15
<sean-k-mooney> so really you're folding the exceeds_capacity logic into the generation  12:16
<gibi> yes  12:16
<sean-k-mooney> ok, ya, that makes more sense as to why it would have a benefit  12:17
<gibi> because the result of that check informs what part of the product space can be ignored  12:17
<gibi> i.e. if G1+G2 fails the check then G1+G2+Gx+...+Gy will always fail the check, so they can be ignored  12:18
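A schematic of that pruning: a product generator that validates every prefix and skips the whole subtree behind an invalid one; the names are illustrative, not the placement code:

    def pruned_product(groups, prefix_is_valid, prefix=()):
        # groups: list of candidate lists (G1, G2, ...); prefix_is_valid:
        # e.g. a capacity check on the partial combination.
        if not groups:
            yield prefix
            return
        head, rest = groups[0], groups[1:]
        for choice in head:
            candidate = prefix + (choice,)
            # If G1+G2 already fails the check, no G1+G2+Gx+... extension
            # is ever enumerated.
            if prefix_is_valid(candidate):
                yield from pruned_product(rest, prefix_is_valid, candidate)

With prefix_is_valid as the capacity check, an invalid (g1, g2) prefix prunes every (g1, g2, g3, ...) extension in one step.
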
<sean-k-mooney> right, so we might be able to not inline the function if we return more info beyond True or False, like the attribute that exceeds capacity  12:18
<sean-k-mooney> I don't really object to the inlining; DRY is nice, but I'm ok with specific code where we have a performance need for it  12:19
<sean-k-mooney> I'm just wondering if we can get the benefit of a testable function while also achieving the speed-up, if we adjust the return value to provide more context  12:20
<sean-k-mooney> looking at https://github.com/openstack/placement/blob/master/placement/objects/research_context.py#L345  12:21
<sean-k-mooney> the behavior is different enough that I don't think it makes sense to reuse exceeds_capacity and modify it as I was suggesting  12:22
<gibi> I will think about making the code cleaner once I have pinned down the behavior with some unit tests  12:24
<sean-k-mooney> ack, I think we could factor https://review.opendev.org/c/openstack/placement/+/963052/1/placement/objects/allocation_candidate.py#746 out into its own helper if we wanted, but having its own implementation is reasonable I believe  12:25
<opendevreview> Julien Le Jeune proposed openstack/nova master: Update start_service() function in test  https://review.opendev.org/c/openstack/nova/+/963156  13:26
<bauzas> dansmith: remember when you asked me why my comments usually had two spaces between the hash sign and the first word? looks like this is due to autopep8, which automatically 'fixes' my comments 🤦  13:34
<dansmith> bauzas: I don't, but.. good to know  13:34
<bauzas> still have to confirm, but unless I'm a fool, I don't understand why those extra spaces suddenly appear in my code when I double-checked before committing that they weren't there  13:38
<bauzas> so this is either autopep8 or vscode (I was having the same problem before I started using Cursor)  13:39
<bauzas> it was driving me crazy tbh  13:39
* dansmith would prefer not to use autopep8 to check pep8, for reasons just like this  13:39
<bauzas> I'll do further tests with --no-verify  13:40
<bauzas> so it would skip autopep8  13:40
*** ralonsoh_ is now known as ralonsoh  14:00
<sean-k-mooney> bauzas: it's from pep8 https://peps.python.org/pep-0008/#inline-comments  14:03
<sean-k-mooney> bauzas: "An inline comment is a comment on the same line as a statement. Inline comments should be separated by at least two spaces from the statement. They should start with a # and a single space."  14:03
<bauzas> I'm not talking about inline comments, sorry  14:03
<sean-k-mooney> oh ok, never mind then  14:04
<bauzas> just a whole comment line, like #  TODO(x) : This is a comment  14:04
<sean-k-mooney> ah, well there should be 1 space between the # and TODO  14:04
<bauzas> I'm writing "# TODO" and it eventually gets committed as "#  TODO" with two spaces  14:04
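For reference, the spacing PEP 8 actually asks for; the TODO line mirrors the example above:

    x = 1  # Inline comment: at least two spaces before the '#' and a
           # single space after it.

    # TODO(x): a whole-line comment also uses a single space after the '#';
    # "#  TODO" with two spaces is the unwanted form described above.
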
<sean-k-mooney> hum, that is odd indeed  14:05
<sean-k-mooney> I have not seen that, but it could be the aggressive option, although I'm not sure which guideline would be influencing that  14:05
<sean-k-mooney> the change to drop aggressive=3 is still open and mergeable, so if ye want to proceed with that, do  14:06
<sean-k-mooney> bauzas: do you happen to have a link to the code that was formatted?  14:06
<bauzas> it's always when I write a new comment line  14:07
<sean-k-mooney> I would be interested in seeing the example. given we did not have to reformat any code when I enabled that, I was really not expecting it to have such an effect  14:07
<bauzas> maybe this is just a codium issue  14:07
<sean-k-mooney> ya, maybe  14:07
<bauzas> but having to double-check everything I commit is super cumbersome  14:07
<sean-k-mooney> you did have an issue with codium adding a non-breaking space at one point  14:07
<bauzas> honestly, I need time to investigate what's wrong, but I'm rushing to deliver a fix for https://bugs.launchpad.net/nova/+bug/2125935  14:08
<bauzas> so I'll check that later  14:09
<sean-k-mooney> ack  14:09
<opendevreview> Balazs Gibizer proposed openstack/placement master: Prune a_c search space by invalid prefixes  https://review.opendev.org/c/openstack/placement/+/962776  14:41
<opendevreview> Balazs Gibizer proposed openstack/placement master: Inline _consolidate_allocation_requests  https://review.opendev.org/c/openstack/placement/+/963052  14:41
<opendevreview> Balazs Gibizer proposed openstack/placement master: Inline exceeds_capacity  https://review.opendev.org/c/openstack/placement/+/963053  14:41
<gibi> dansmith: this is how the unit test for the new product algo will look: https://review.opendev.org/c/openstack/placement/+/962776/5  14:42
<gibi> https://review.opendev.org/c/openstack/placement/+/962776/6/placement/tests/unit/objects/test_allocation_candidate.py  14:48
<dansmith> gibi: okay.. that will require some study as well  14:48
<gibi> no worries. I have plenty of quirks to debug now that the unit test shows some strange steps the algo makes  14:49
<dansmith> gibi: so all those "bug duplicate" ones are things it should have filtered?  14:56
<gibi> those are steps in the algo that should not be made (I guess we have an off-by-one index error somewhere)  15:03
<gibi> somehow we check the same product 3 times instead of once  15:04
<gibi> I will figure out why...  15:04
<gibi> don't get hung up on that for now.  15:05
<gibi> but the test already shows where we skip products, so that is what is interesting  15:05
<gibi> for review  15:05
<opendevreview> Julien Le Jeune proposed openstack/nova master: Update start_service() function in test  https://review.opendev.org/c/openstack/nova/+/963156  15:05
<dansmith> gibi: yeah, when I first skimmed it I thought you were identifying the things that would be skipped by using yours over the standard product, so just wanted to confirm that it was actually further skipping that needed to happen  15:08
<dansmith> I'm deeper into it now so it's obvious  15:08
<dansmith> gibi: can you respond here: https://review.opendev.org/c/openstack/placement/+/962776/comment/05b0ce6d_48710288/ ?  15:20
<dansmith> I'm trying to grok the test first, so maybe I'm being stupid  15:20
<bauzas> dansmith: fwiw, fill_metadata() seems non-deterministic when you use a wrong context  15:26
<bauzas> i.e. in my reproducer, a host with two instances has metadata set on one of them but not the other  15:27
<dansmith> bauzas: I dunno why you think that's possible :D  15:28
<bauzas> I know my patch fixes the whole problem, but I'm puzzled as to why I'm still getting metadata for only one (and that could explain why your UT didn't catch the failure)  15:28
<dansmith> bauzas: I think it's more likely that you're using a shared context and it's being mutated in a way that seems like it's changing  15:28
<bauzas> yeah, that's what I thought  15:28
<bauzas> I'm passing the same context object for both instances, so it probably was mutated  15:29
* bauzas continues trying to understand the problem  15:29
<dansmith> the instances are from the same cell, right?  15:29
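A generic illustration of the shared-context pitfall dansmith describes; the class and function here are made up for the example and are not the nova fill_metadata() code:

    # Hypothetical example: a shared, mutable context is changed by the
    # first call and therefore behaves differently for the second instance.
    class FakeContext:
        def __init__(self, cell):
            self.cell = cell

    def fill_fake_metadata(ctx, instance, expected_cell):
        if ctx.cell != expected_cell:
            # Wrong cell: silently return nothing for this instance.
            return {}
        # Side effect on the shared context that leaks into the next call.
        ctx.cell = "cell-other"
        return {"hostname": instance}

    ctx = FakeContext("cell1")
    print(fill_fake_metadata(ctx, "inst-a", "cell1"))  # metadata filled
    print(fill_fake_metadata(ctx, "inst-b", "cell1"))  # empty: ctx was mutated
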
<opendevreview> Julien Le Jeune proposed openstack/nova master: Update start_service() function in test  https://review.opendev.org/c/openstack/nova/+/963156  15:30
<opendevreview> Julien Le Jeune proposed openstack/nova master: Update start_service() function in test  https://review.opendev.org/c/openstack/nova/+/963156  15:31
<gibi> dansmith: responded  15:33
<gibi> the mock is on the consolidate call, not on the exceeds_capacity call  15:33
<gibi> so we assert what products or partial products are generated and tested by the algo, not what is eventually returned by the product algo. I can add the latter to the test as well  15:34
<dansmith> oh, right, so for that 1,2,1 case we would have seen a result of "nope not valid" from _consolidate and thus we would not have yielded it, right?  15:35
<gibi> yepp  15:37
<gibi> and we would also notice that 1,2,1 is an invalid prefix, so if there were more groups, we would not test any of the 1,2,1,x partial products  15:37
<gibi> as for any value of x the product would be invalid  15:38
<dansmith> yeah, sorry, seems obvious now :P  15:38
<dansmith> okay, so the solution is starting to come into view now.. the odometer is supposed to roll past a bunch of product "solutions" at the first index that isn't viable  15:39
<gibi> yepp  15:39
<gibi> (odometer, or, as I learned over the weekend, what the odometer represents is called a mixed-radix number in English)  15:39
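A sketch of that odometer view: the per-group indices form a mixed-radix number, and an invalid prefix advances the odometer past every combination that shares it; illustrative only, not the placement code:

    def skip_past_prefix(digits, radixes, pos):
        """Advance the odometer as if every combination sharing the prefix
        digits[:pos + 1] had already been enumerated."""
        digits = list(digits)
        # Reset the lower-order wheels to the right of the invalid digit.
        for i in range(pos + 1, len(digits)):
            digits[i] = 0
        # Increment the wheel at `pos`, carrying left when it wraps.
        i = pos
        while i >= 0:
            digits[i] += 1
            if digits[i] < radixes[i]:
                return digits
            digits[i] = 0
            i -= 1
        return None  # the odometer wrapped: nothing left to enumerate

For example, with radixes [2, 3, 2], an invalid prefix at pos=1 on [0, 1, 0] skips both [0, 1, 0] and [0, 1, 1] and returns [0, 2, 0].
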
<dansmith> this is why I wanted to see a unit test to understand what the desired behavior was instead of trying to synthesize it in my head :)  15:40
<gibi> totally valid request to have unit tests  15:41
<dansmith> oh I know, I just mean.. I thought it would be easier to review the code with unit tests to show the behavior  15:41
<dansmith> that's not always the case, but here.. this is some complex stuff to reason about (until you get the idea in your head)  15:42
<opendevreview> Balazs Gibizer proposed openstack/placement master: Prune a_c search space by invalid prefixes  https://review.opendev.org/c/openstack/placement/+/962776  16:11
*** nicolasbock___ is now known as nicolasbock  22:08
