Monday, 2025-10-06

*** mhen_ is now known as mhen  02:04
<opendevreview> Julien LE JEUNE proposed openstack/nova master: Update start_service() function in test  https://review.opendev.org/c/openstack/nova/+/963156  07:46
<opendevreview> Julien LE JEUNE proposed openstack/nova master: Update start_service() function in test  https://review.opendev.org/c/openstack/nova/+/963156  07:49
<nicolairuckel> I'm working on this patch to persist the NVRAM: https://review.opendev.org/c/openstack/nova/+/959682  So far it works for all migrations, but I have to manually add read permissions to `instance-0000003_VARS.fd`. The file for `disk` already has those. In the libvirt IRC they told me that not having global read permissions is intentional, and I'm wondering why the disk in Nova has those permissions and how I can set them for the NVRAM file as well. I asked this in the libvirt channel and was told that this might be a better place for the question.  07:56
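A minimal sketch of the manual workaround described above, i.e. adding group/other read bits to the NVRAM file so it matches what the disk file already has; the directory under the libvirt nvram path is an assumption, not something nova does today:

    import os
    import stat

    # Assumed location of the per-instance NVRAM file; the filename is the
    # one mentioned above, the directory is the libvirt default.
    nvram_path = "/var/lib/libvirt/qemu/nvram/instance-0000003_VARS.fd"

    # Add group and other read permission while keeping the existing bits,
    # mirroring the permissions the disk file already has.
    mode = os.stat(nvram_path).st_mode
    os.chmod(nvram_path, mode | stat.S_IRGRP | stat.S_IROTH)
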
<opendevreview> Julien LE JEUNE proposed openstack/nova master: Update start_service() function in test  https://review.opendev.org/c/openstack/nova/+/963156  08:48
<opendevreview> Julien Le Jeune proposed openstack/nova master: Update start_service() function in test  https://review.opendev.org/c/openstack/nova/+/963156  09:37
<sean-k-mooney> gibi: I don't know if you read this comment https://review.opendev.org/c/openstack/placement/+/962776/comment/f68ed8e5_0803b616/ but when I was running the tests locally I noticed that changing the request_limit parameter gave no performance improvement  10:30
<sean-k-mooney> """ min(conf.max_allocation_candidates, request_limit) """  10:30
<sean-k-mooney> I think we are missing ^ somewhere  10:30
</gr-replace>
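The clamping suggested in the snippet above would look roughly like this; the function and argument names are illustrative and the handling of an unset request limit is an assumption, not the actual placement code:

    # Sketch: cap the per-request limit by the configured maximum so a
    # smaller request_limit could also shrink the amount of work done.
    def effective_limit(max_allocation_candidates, request_limit):
        if request_limit is None:
            # No limit supplied in the query: fall back to the config value.
            return max_allocation_candidates
        return min(max_allocation_candidates, request_limit)
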
<gibi_> sean-k-mooney: we intentionally did not change the meaning of the limit parameter when max_allocation_candidates was introduced, because  11:16
<gibi_> Placement generates these candidates fully before applying the limit parameter provided in the allocation candidate query, to be able to do a random sampling if [placement]randomize_allocation_candidates is True.  11:16
<gibi_> the limit parameter is applied last  11:16
<gibi_> if we change that then that changes how randomize_allocation_candidates behaves  11:17
<gibi_> so we did not change limit  11:17
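A schematic of the ordering gibi describes (generate the full set, optionally sample at random, apply the limit last); the function name and arguments are illustrative, not the placement implementation:

    import random

    def limited_candidates(all_candidates, limit, randomize):
        # The full candidate set is generated first ...
        candidates = list(all_candidates)
        # ... optionally shuffled so the sample is drawn from the *whole*
        # result set when randomize_allocation_candidates is True ...
        if randomize:
            random.shuffle(candidates)
        # ... and the limit parameter is applied last.
        return candidates if limit is None else candidates[:limit]
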
*** gibi_ is now known as gibi  11:17
<sean-k-mooney> hum, I see  11:35
<sean-k-mooney> that is unfortunate  11:35
<sean-k-mooney> but I guess that's the legacy we are stuck with  11:36
<sean-k-mooney> there are two cases to consider, I guess: request_limit < max_allocation_candidates  11:37
<sean-k-mooney> I was hoping in this case we could treat it as if max_allocation_candidates == request_limit  11:37
<sean-k-mooney> in the other case it's already going to be enforced  11:37
<sean-k-mooney> I wonder if we could implement randomize_allocation_candidates differently  11:39
<sean-k-mooney> we currently have depth-first or breadth-first search  11:39
<sean-k-mooney> but could we instead do a random search for that use case?  11:39
<sean-k-mooney> it would not be exactly the same behavior, but the randomisation was really there because of the original depth-first approach to generating candidates  11:40
<sean-k-mooney> if we are using breadth-first it's much less required  11:41
<sean-k-mooney> that's a separate problem from the one you're trying to fix, but I wonder if we should continue to support that if we can't apply optimisations based on the request limit  11:41
<sean-k-mooney> in other words, I think we could provide randomness in the generation of the candidates without needing to generate all of them and then do a random choice at the end  11:42
<sean-k-mooney> that would fulfil the same use case but with a lot less wasted effort  11:43
<sean-k-mooney> maybe another topic for the PTG discussion?  11:43
<sean-k-mooney> gibi: basically I'm not sure that generating all possible candidates is part of the API contract for that parameter, so we should be free to evolve its meaning over time to be more efficient provided it still has the same overall effect  11:46
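One way to read the idea above is to randomize the order in which the per-group candidates are combined, so that the first combinations drawn are already a random-ish selection and generation can stop early; this sketch is illustrative only and, unlike the current behaviour, it is not a uniform sample of the full result set:

    import itertools
    import random

    def random_order_product(groups, limit):
        # Shuffle each group's candidates, then enumerate the product
        # lazily and stop after `limit` combinations instead of
        # materializing every possible candidate first.
        shuffled = []
        for group in groups:
            members = list(group)
            random.shuffle(members)
            shuffled.append(members)
        return list(itertools.islice(itertools.product(*shuffled), limit))
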
<gibi> I'm OK to discuss this at the PTG. But for me it is only a small issue that limit and max_allocation_candidates have two different meanings today  11:57
<sean-k-mooney> ya, it's minor, but it means the client can't control the performance of the request by asking for fewer candidates in the response  11:58
<sean-k-mooney> to me it undermines why the request_limit parameter exists in the API in the first place  11:59
<gibi> that parameter from the start only limited the size of the response, nothing else  11:59
<sean-k-mooney> sure, but as an end user I would expect it to do more than that, and it prevents nova, for example, from being smart and saying "I know this query might generate a lot of candidates so I'm going to use a lower limit dynamically"  12:00
<sean-k-mooney> not that nova has the intelligence to know when it should do that  12:01
<sean-k-mooney> so it's not a blocker, but I just find it surprising  12:01
<sean-k-mooney> gibi: I noticed you commented on your inlining patches that they don't seem to provide a benefit  12:02
<sean-k-mooney> gibi: do you think that is a dead end or are you planning to do more experiments in that direction?  12:02
<gibi> unfortunately the randomize_allocation_candidates option is defined as  12:03
<gibi> "If True, when limiting allocation candidate results, the results will be a random sampling of the full result set."  12:03
<gibi> *full result set* :/  12:03
<gibi> sean-k-mooney: the first inlining helps, the second not much  12:03
<sean-k-mooney> right, but it does not define what the full result set is  12:04
<gibi> this helps If True, when limiting allocation candidate results, the results will be a random sampling of the full result set.  12:04
<gibi> sorry  12:04
<gibi> wrong buffer  12:04
<gibi> this helps https://review.opendev.org/c/openstack/placement/+/963052/1/placement/tests/functional/test_allocation_candidates.py#265  12:04
<gibi> this does not really help https://review.opendev.org/c/openstack/placement/+/963053/2/placement/tests/functional/test_allocation_candidates.py  12:05
<sean-k-mooney> gibi: so inlining _consolidate_allocation_requests helps but not exceeds_capacity  12:05
<gibi> yep  12:05
<sean-k-mooney> I see from the comment that you have been trying to use some profilers to help with this, have you found them useful?  12:06
<gibi> I will not try to inline more; the profiler shows that the time is spent on hashing and equality checking of the building blocks, but I don't see easy moves there  12:06
<gibi> I just used the built-in cProfile  12:07
<gibi> as you can see in the commented-out code  12:07
<sean-k-mooney> ya, I was just wondering if the output was useful  12:07
<sean-k-mooney> I have never used that before, hence the question  12:07
<gibi> it tells you what was called, how many times, and how much time it took, so it shows you where you're spending your execution time  12:08
<gibi> it showed that consolidate is more expensive than exceeds_capacity  12:08
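For reference, the built-in cProfile usage gibi is referring to looks roughly like this; the profiled function here is just a stand-in for the candidate-merging step:

    import cProfile
    import pstats

    def work():
        # Stand-in for the call being measured.
        return sum(i * i for i in range(100000))

    profiler = cProfile.Profile()
    profiler.enable()
    work()
    profiler.disable()

    # Shows what was called, how many times, and how much time it took,
    # sorted by cumulative time.
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)
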
<sean-k-mooney> ack, I have used flamegraph profilers in other languages in the past, just never really used those types of tools much with openstack because they hate eventlet  12:08
<sean-k-mooney> placement fortunately does not use eventlet, so that is not an issue  12:09
<gibi> yeah, I just enabled it in the _merge_candidate call so no parallelism was involved  12:09
<sean-k-mooney> but I tried to use them on nova once, on the scheduler, and they were not really that useful when used as external tools  12:09
<gibi> I'm now providing test coverage for the prune patch, and we can decide together, once dansmith is up, which inlining patch (if any) we want to take in  12:10
<sean-k-mooney> how do you feel about this overall approach for the current bug? are you happy that it gives us enough breathing room for the currently reported edge case to move forward with for now?  12:10
<gibi> I think the prune patch already gives us time to re-think the modeling. I think the inlining of the _consolidate call still has a good ROI on code complexity. I would at least merge these two optimizations. I will then go and update our backlog spec about the problem with the new learnings and we can discuss at the PTG. My attached customer case is probably closable with only the prune patch in  12:12
<sean-k-mooney> I'm kind of surprised that manual function inlining has such an effect in a dynamic language like python. what python version were you testing with, out of interest?  12:12
<gibi> Python 3.12.10 (main, Apr  9 2025, 04:44:59) [GCC 14.2.0] on linux  12:13
<sean-k-mooney> ack, I wonder if the impact is as large on, say, 3.10 or 3.9  12:13
<gibi> note that it wasn't pure inlining, it was actually using the cumulative nature of the call to save actual steps  12:13
<sean-k-mooney> ah ok, I was actually trying to pull up the original code to compare while we were chatting but I have not found it yet  12:14
<sean-k-mooney> well, I have not been trying very hard :)  12:14
<sean-k-mooney> oh, it's in your previous patch  12:15
<gibi> consolidate merges the allocations of the different request groups; with the new product algo this was done on each prefix of those groups. but merging G1 + G2 and G1 + G2 + G3 can be optimized if G1 + G2 is only calculated once, not twice  12:15
<sean-k-mooney> that explains why it was not on github: https://review.opendev.org/c/openstack/placement/+/962776/5/placement/objects/allocation_candidate.py#850  12:15
<sean-k-mooney> so really you're folding the exceeds_capacity logic into the generation  12:16
<gibi> yes  12:16
<sean-k-mooney> ok, ya, that makes more sense as to why it would have a benefit  12:17
<gibi> because the result of that check informs what part of the product space can be ignored  12:17
<gibi> i.e. if G1+G2 fails the check then G1+G2+Gx+...+Gy will always fail the check, so they can be ignored  12:18
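A schematic of that pruning: a product generator that validates every prefix and skips the whole subtree behind an invalid one; the names are illustrative, not the placement code:

    def pruned_product(groups, prefix_is_valid, prefix=()):
        # groups: list of candidate lists (G1, G2, ...); prefix_is_valid:
        # e.g. a capacity check on the partial combination.
        if not groups:
            yield prefix
            return
        head, rest = groups[0], groups[1:]
        for choice in head:
            candidate = prefix + (choice,)
            # If G1+G2 already fails the check, no G1+G2+Gx+... extension
            # is ever enumerated.
            if prefix_is_valid(candidate):
                yield from pruned_product(rest, prefix_is_valid, candidate)

With prefix_is_valid as the capacity check, an invalid (g1, g2) prefix prunes every (g1, g2, g3, ...) extension in one step.
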
<sean-k-mooney> right, so we might be able to not inline the function if we return more info beyond True or False, like the attribute that exceeds capacity  12:18
<sean-k-mooney> I don't really object to the inlining; DRY is nice, but I'm ok with specific code where we have a performance need for it  12:19
<sean-k-mooney> I'm just wondering if we can get the benefit of a testable function while also achieving the speed-up, if we adjust the return value to provide more context  12:20
<sean-k-mooney> looking at https://github.com/openstack/placement/blob/master/placement/objects/research_context.py#L345  12:21
<sean-k-mooney> the behavior is different enough that I don't think it makes sense to reuse exceeds_capacity and modify it as I was suggesting  12:22
<gibi> I will think about making the code cleaner once I have pinned down the behavior with some unit tests  12:24
<sean-k-mooney> ack, I think we could factor https://review.opendev.org/c/openstack/placement/+/963052/1/placement/objects/allocation_candidate.py#746 out into its own helper if we wanted, but having its own implementation is reasonable I believe  12:25
<opendevreview> Julien Le Jeune proposed openstack/nova master: Update start_service() function in test  https://review.opendev.org/c/openstack/nova/+/963156  13:26
<bauzas> dansmith: remember when you asked me why my comments usually had two spaces between the hash sign and the first word? looks like this is due to autopep8, which automatically 'fixes' my comments 🤦  13:34
<dansmith> bauzas: I don't, but.. good to know  13:34
<bauzas> still have to confirm, but unless I'm a fool, I don't understand why those extra spaces suddenly appear in my code when I double-checked before committing that they weren't there  13:38
<bauzas> so this is either autopep8 or vscode (I was having the same problem before I started using Cursor)  13:39
<bauzas> it was driving me crazy tbh  13:39
* dansmith would prefer not to use autopep8 to check pep8, for reasons just like this  13:39
<bauzas> I'll do further tests with --no-verify  13:40
<bauzas> so it would skip autopep8  13:40
*** ralonsoh_ is now known as ralonsoh  14:00
<sean-k-mooney> bauzas: it's from pep8 https://peps.python.org/pep-0008/#inline-comments  14:03
<sean-k-mooney> bauzas: "An inline comment is a comment on the same line as a statement. Inline comments should be separated by at least two spaces from the statement. They should start with a # and a single space."  14:03
<bauzas> I'm not talking about inline comments, sorry  14:03
<sean-k-mooney> oh ok, never mind then  14:04
<bauzas> just a whole comment line, like #  TODO(x) : This is a comment  14:04
<sean-k-mooney> ah, well there should be 1 space between the # and TODO  14:04
<bauzas> I'm writing "# TODO" and it eventually gets committed as "#  TODO" with two spaces  14:04
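For reference, the spacing PEP 8 actually asks for; the TODO line mirrors the example above:

    x = 1  # Inline comment: at least two spaces before the '#' and a
           # single space after it.

    # TODO(x): a whole-line comment also uses a single space after the '#';
    # "#  TODO" with two spaces is the unwanted form described above.
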
<sean-k-mooney> hum, that is odd indeed  14:05
<sean-k-mooney> I have not seen that, but it could be the aggressive option, although I'm not sure which guideline would be influencing that  14:05
<sean-k-mooney> the change to drop aggressive=3 is still open and mergeable, so if ye want to proceed with that, do  14:06
<sean-k-mooney> bauzas: do you happen to have a link to the code that was formatted?  14:06
<bauzas> it's always when I write a new comment line  14:07
<sean-k-mooney> I would be interested in seeing the example. given we did not have to reformat any code when I enabled that, I was really not expecting it to have such an effect  14:07
<bauzas> maybe this is just a codium issue  14:07
<sean-k-mooney> ya, maybe  14:07
<bauzas> but having to double-check everything I commit is super cumbersome  14:07
<sean-k-mooney> you did have an issue with codium adding a non-breaking space at one point  14:07
<bauzas> honestly, I need time to investigate what's wrong, but I'm rushing to deliver a fix for https://bugs.launchpad.net/nova/+bug/2125935  14:08
<bauzas> so I'll check that later  14:09
<sean-k-mooney> ack  14:09
<opendevreview> Balazs Gibizer proposed openstack/placement master: Prune a_c search space by invalid prefixes  https://review.opendev.org/c/openstack/placement/+/962776  14:41
<opendevreview> Balazs Gibizer proposed openstack/placement master: Inline _consolidate_allocation_requests  https://review.opendev.org/c/openstack/placement/+/963052  14:41
<opendevreview> Balazs Gibizer proposed openstack/placement master: Inline exceeds_capacity  https://review.opendev.org/c/openstack/placement/+/963053  14:41
<gibi> dansmith: this is how the unit test for the new product algo will look: https://review.opendev.org/c/openstack/placement/+/962776/5  14:42
<gibi> https://review.opendev.org/c/openstack/placement/+/962776/6/placement/tests/unit/objects/test_allocation_candidate.py  14:48
<dansmith> gibi: okay.. that will require some study as well  14:48
<gibi> no worries. I have plenty of quirks to debug now that the unit test shows some strange steps the algo makes  14:49
<dansmith> gibi: so all those "bug duplicate" ones are things it should have filtered?  14:56
<gibi> those are steps in the algo that should not be made (I guess we have an off-by-one index error somewhere)  15:03
<gibi> somehow we check the same product 3 times instead of once  15:04
<gibi> I will figure out why...  15:04
<gibi> don't get hung up on that for now.  15:05
<gibi> but the test already shows where we skip products, so that is what is interesting  15:05
<gibi> for review  15:05
<opendevreview> Julien Le Jeune proposed openstack/nova master: Update start_service() function in test  https://review.opendev.org/c/openstack/nova/+/963156  15:05
<dansmith> gibi: yeah, when I first skimmed it I thought you were identifying the things that would be skipped by using yours over the standard product, so just wanted to confirm that it was actually further skipping that needed to happen  15:08
<dansmith> I'm deeper into it now so it's obvious  15:08
<dansmith> gibi: can you respond here: https://review.opendev.org/c/openstack/placement/+/962776/comment/05b0ce6d_48710288/ ?  15:20
<dansmith> I'm trying to grok the test first, so maybe I'm being stupid  15:20
<bauzas> dansmith: fwiw, fill_metadata() seems non-deterministic when you use a wrong context  15:26
<bauzas> i.e. in my reproducer, a host with two instances has metadata set on one of them but not the other  15:27
<dansmith> bauzas: I dunno why you think that's possible :D  15:28
<bauzas> I know my patch fixes the whole problem, but I'm puzzled as to why I'm still getting metadata for only one (and that could explain why your UT didn't catch the failure)  15:28
<dansmith> bauzas: I think it's more likely that you're using a shared context and it's being mutated in a way that seems like it's changing  15:28
<bauzas> yeah, that's what I thought  15:28
<bauzas> I'm passing the same context object for both instances, so it probably was mutated  15:29
* bauzas continues trying to understand the problem  15:29
<dansmith> the instances are from the same cell, right?  15:29
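A generic illustration of the shared-context pitfall dansmith describes; the class and function here are made up for the example and are not the nova fill_metadata() code:

    # Hypothetical example: a shared, mutable context is changed by the
    # first call and therefore behaves differently for the second instance.
    class FakeContext:
        def __init__(self, cell):
            self.cell = cell

    def fill_fake_metadata(ctx, instance, expected_cell):
        if ctx.cell != expected_cell:
            # Wrong cell: silently return nothing for this instance.
            return {}
        # Side effect on the shared context that leaks into the next call.
        ctx.cell = "cell-other"
        return {"hostname": instance}

    ctx = FakeContext("cell1")
    print(fill_fake_metadata(ctx, "inst-a", "cell1"))  # metadata filled
    print(fill_fake_metadata(ctx, "inst-b", "cell1"))  # empty: ctx was mutated
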
<opendevreview> Julien Le Jeune proposed openstack/nova master: Update start_service() function in test  https://review.opendev.org/c/openstack/nova/+/963156  15:30
<opendevreview> Julien Le Jeune proposed openstack/nova master: Update start_service() function in test  https://review.opendev.org/c/openstack/nova/+/963156  15:31
<gibi> dansmith: responded  15:33
<gibi> the mock is on the consolidate call, not on the exceeds_capacity call  15:33
<gibi> so we assert what products or partial products are generated and tested by the algo, not what is eventually returned by the product algo. I can add the latter to the test as well  15:34
<dansmith> oh, right, so for that 1,2,1 case we would have seen a result of "nope not valid" from _consolidate and thus we would not have yielded it, right?  15:35
<gibi> yepp  15:37
<gibi> and we would also notice that 1,2,1 is an invalid prefix, so if there were more groups, we would not test any of the 1,2,1,x partial products  15:37
<gibi> as for any value of x the product would be invalid  15:38
<dansmith> yeah, sorry, seems obvious now :P  15:38
<dansmith> okay, so the solution is starting to come into view now.. the odometer is supposed to roll past a bunch of product "solutions" at the first index that isn't viable  15:39
<gibi> yepp  15:39
<gibi> (odometer, or, as I learned over the weekend, what the odometer represents is called a mixed-radix number in English)  15:39
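A sketch of that odometer view: the per-group indices form a mixed-radix number, and an invalid prefix advances the odometer past every combination that shares it; illustrative only, not the placement code:

    def skip_past_prefix(digits, radixes, pos):
        """Advance the odometer as if every combination sharing the prefix
        digits[:pos + 1] had already been enumerated."""
        digits = list(digits)
        # Reset the lower-order wheels to the right of the invalid digit.
        for i in range(pos + 1, len(digits)):
            digits[i] = 0
        # Increment the wheel at `pos`, carrying left when it wraps.
        i = pos
        while i >= 0:
            digits[i] += 1
            if digits[i] < radixes[i]:
                return digits
            digits[i] = 0
            i -= 1
        return None  # the odometer wrapped: nothing left to enumerate

For example, with radixes [2, 3, 2], an invalid prefix at pos=1 on [0, 1, 0] skips both [0, 1, 0] and [0, 1, 1] and returns [0, 2, 0].
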
<dansmith> this is why I wanted to see a unit test to understand what the desired behavior was instead of trying to synthesize it in my head :)  15:40
<gibi> totally valid request to have unit tests  15:41
<dansmith> oh I know, I just mean.. I thought it would be easier to review the code with unit tests to show the behavior  15:41
<dansmith> that's not always the case, but here.. this is some complex stuff to reason about (until you get the idea in your head)  15:42
<opendevreview> Balazs Gibizer proposed openstack/placement master: Prune a_c search space by invalid prefixes  https://review.opendev.org/c/openstack/placement/+/962776  16:11
*** nicolasbock___ is now known as nicolasbock  22:08
