bauzas | good morning Nova | 08:01 |
---|---|---|
bauzas | gibi: ack, saw the Critical bug | 08:02 |
bauzas | we still have 2 weeks for RC1, hopefully should be OK | 08:02 |
gibi | good morning | 08:12 |
gibi | yeah, hopefully we can agree on where to fix it, how to fix it, and if we fix it in oslo then hopefully we can release a new oslo.concurrency version | 08:13 |
frickler | FYI placement seems to have issues with the latest oslo.db release https://zuul.opendev.org/t/openstack/build/d7bcb42a3c4a466cac375badba13b18b not that it will likely matter much with all the projects that are completely broken by it | 08:25 |
sean-k-mooney[m] | frickler: that's just a deprecation warning, not an actual failure. placement is mostly sqlalchemy 2.0 compatible already | 08:45 |
sean-k-mooney[m] | looks like we have missed one change but that should not be hard to address | 08:46 |
gibi | frickler, sean-k-mooney[m]: I'm on it | 08:46 |
frickler | sean-k-mooney[m]: yes, just mentioned it because it would block that requirements patch. certainly not a big thing compared to the other blockers there | 08:47 |
gibi | we intended to store dicts in a cache but we stored row objects instead | 08:47 |
sean-k-mooney[m] | frickler: well as it stands it's just breaking the test as we treat warnings as errors. i doubt it's breaking placement at all | 08:48 |
sean-k-mooney[m] | but our cache is not working correctly so that's a latent bug | 08:48 |
sean-k-mooney[m] | gibi https://github.com/openstack/placement/commit/c68d472dca6619055579831ad5464042f745557a i guess we missed this use case | 08:54 |
sean-k-mooney[m] | the test is using the dict interface instead of the field interface | 08:55 |
gibi | yeah, good point, we can store row objects in our rc caches if we access the columns by attribute access and not by dict access | 08:55 |
gibi | let me check that it works | 08:56 |
gibi | as it provides an easier solution | 08:56 |
sean-k-mooney[m] | https://github.com/openstack/placement/blob/13bbdba06da19f85c05a2a9e1fbdb9d1813c3b47/placement/objects/resource_class.py#L220 we are trying to use _mapping to convert them to dicts on get all | 08:56 |
sean-k-mooney[m] | well no, to convert them via a dict to a ResourceClass, which should be fine | 08:57 |
gibi | here we have Row objects in the caches https://github.com/openstack/placement/blob/13bbdba06da19f85c05a2a9e1fbdb9d1813c3b47/placement/attribute_cache.py#L155 | 08:58 |
gibi | which is actually fine | 08:59 |
gibi | as we use attribute access everywhere | 08:59 |
gibi | except in that one func test that now fails | 08:59 |
gibi | hm, no | 09:00 |
gibi | that cache is broken | 09:00 |
gibi | as sometimes we store dicts | 09:00 |
gibi | sometimes we store Row objects | 09:00 |
gibi | https://github.com/openstack/placement/blob/13bbdba06da19f85c05a2a9e1fbdb9d1813c3b47/placement/attribute_cache.py#L163 | 09:00 |
sean-k-mooney[m] | https://github.com/openstack/placement/blob/48f31d446be5dd8743392e6d1e45ed8183a9ce1b/placement/attribute_cache.py#L148-L151 | 09:01 |
sean-k-mooney[m] | we store db rows when we load it from the db | 09:01 |
gibi | yep so this is a type mess | 09:02 |
sean-k-mooney[m] | ya stephen was complaining about this in a different patch | 09:03 |
sean-k-mooney[m] | so the all_cache | 09:04 |
sean-k-mooney[m] | currently has the row | 09:05 |
sean-k-mooney[m] | but that could just be a new tuple, right | 09:05 |
sean-k-mooney[m] | https://github.com/openstack/placement/blob/48f31d446be5dd8743392e6d1e45ed8183a9ce1b/placement/attribute_cache.py#L151 | 09:05 |
sean-k-mooney[m] | self._all_cache = {r[1]: r for r in res} -> self._all_cache = {r[1]: (r[0], r[1]) for r in res} | 09:06 |
gibi | so while 'self._all_cache = {r[1]: r._mapping for r in res}' fixes the currently failing test case, it breaks a bunch of gabbi tests | 09:06 |
sean-k-mooney[m] | what else is in that row beyond the id and string | 09:07 |
gibi | updated_at and created_at | 09:08 |
gibi | it is in the select above the fetchall | 09:08 |
sean-k-mooney[m] | ah base is the timestamp mixin | 09:08 |
sean-k-mooney[m] | also the value of the cache is meant to be a dict not a tuple so my version is incorrect | 09:09 |
gibi | yeah I have to figure out why the gabbi tests fail with my change | 09:10 |
gibi | technically Row._mapping is not a dict just a dict-like object | 09:10 |
gibi | so it might be a problem | 09:10 |
sean-k-mooney[m] | do you want to do dict(**r._mapping) | 09:10 |
sean-k-mooney[m] | by the way, to make these dicts not rows | 09:11 |
gibi | yeah I can try that | 09:11 |
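A minimal sketch of the conversion discussed above, assuming SQLAlchemy 1.4+; the table and column names are made up for illustration and are not placement's actual schema:

```python
# Turning a SQLAlchemy Row into a plain dict via its ._mapping accessor.
import sqlalchemy as sa

engine = sa.create_engine("sqlite://")
meta = sa.MetaData()
rcs = sa.Table(
    "resource_classes", meta,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("name", sa.String(255)),
)
meta.create_all(engine)

with engine.begin() as conn:
    conn.execute(rcs.insert().values(id=1, name="VCPU"))
    res = conn.execute(sa.select(rcs.c.id, rcs.c.name)).fetchall()

row = res[0]
assert row.name == "VCPU"              # attribute access on the Row
assert row._mapping["name"] == "VCPU"  # dict-style access via ._mapping
plain = dict(row._mapping)             # a real dict, as suggested above
assert plain == {"id": 1, "name": "VCPU"}
```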
sean-k-mooney[m] | although if we are expecting rows and using .id | 09:11 |
sean-k-mooney[m] | you would need a named tuple instead | 09:11 |
gibi | that cache stores dicts so if there is .id access now that would fail anyhow | 09:12 |
sean-k-mooney[m] | named tuple is the only “standard” class that will give you the field and dict style access | 09:12 |
sean-k-mooney[m] | well it's storing row objects currently | 09:13 |
sean-k-mooney[m] | it's meant to be a dict | 09:13 |
gibi | namedtuple does not give you dict access | 09:14 |
gibi | it sometimes stores Row, sometimes stores dict | 09:14 |
gibi | https://github.com/openstack/placement/blob/13bbdba06da19f85c05a2a9e1fbdb9d1813c3b47/placement/attribute_cache.py#L184-L189 | 09:14 |
gibi | so it seems we started using that cache also in a mixed mode | 09:15 |
gibi | at some places we use attribute access on it | 09:15 |
gibi | hence the gabbi test failures | 09:15 |
sean-k-mooney[m] | https://github.com/openstack/placement/blame/13bbdba06da19f85c05a2a9e1fbdb9d1813c3b47/placement/objects/trait.py#L151 ya stephen added a fixme when they noticed that | 09:16 |
sean-k-mooney[m] | namedtuple gives you indexed access i.e. r[0] | 09:16 |
sean-k-mooney[m] | but i guess it won't give you r['id'] | 09:16 |
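A quick illustration of the access styles being compared: a namedtuple supports attribute and positional access like a Row, but not dict-style lookup (RCEntry is a hypothetical cache-entry type):

```python
from collections import namedtuple

RCEntry = namedtuple("RCEntry", ["id", "name"])  # hypothetical entry type
r = RCEntry(id=1, name="VCPU")

assert r.id == 1   # attribute access, like a Row
assert r[0] == 1   # positional access
try:
    r["id"]        # dict-style access raises TypeError
except TypeError:
    pass
assert r._asdict() == {"id": 1, "name": "VCPU"}  # dict when really needed
```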
gibi | https://github.com/openstack/placement/blame/13bbdba06da19f85c05a2a9e1fbdb9d1813c3b47/placement/objects/resource_class.py#L71-L74 so we assume Row here | 09:17 |
gibi | so we need to decide which way we go: a) store Row (or namedtuple) objects in caches and keep attribute access, or b) store dicts and go with dict access | 09:18 |
sean-k-mooney[m] | https://github.com/openstack/placement/commit/b3fe04f081a096258468d032560f46cdfe77e144 stephen tried to remove these assumptions in ^ | 09:18 |
sean-k-mooney[m] | well that will work as a named tuple | 09:19 |
gibi | Row and namedtuple are pretty much compatible, yes | 09:19 |
sean-k-mooney[m] | i would avoid random dicts personally | 09:19 |
gibi | ack | 09:19 |
sean-k-mooney[m] | and either store the row or named tuple | 09:20 |
sean-k-mooney[m] | i'm not sure if there is a reason not to store the row object | 09:20 |
sean-k-mooney[m] | does it increase memory | 09:20 |
sean-k-mooney[m] | or have any other side effect we would not want | 09:20 |
gibi | storing Row is OK; the doc said dict, hence my statement that the cache is broken | 09:23 |
sean-k-mooney[m] | ack | 09:23 |
gibi | we need to mix Row and namedtuple as I can only create namedtuple here https://github.com/openstack/placement/blob/13bbdba06da19f85c05a2a9e1fbdb9d1813c3b47/placement/attribute_cache.py#L157-L168 | 09:24 |
gibi | but they are compatible | 09:24 |
gibi | so I only leave a note | 09:24 |
gibi | when I convert that to namedtuple | 09:24 |
sean-k-mooney[m] | if you use a named tuple on line 155 as well in refresh_from_db i think we don't need to mix types but cool, i'll review when you push | 09:39 |
songwenping_ | sean-k-mooney[m], gibi: hi, nova-scheduler's allocation_candidates request returns a 504 gateway timeout when creating a vm with 8 gpus requested on our client's env; there are 13 gpu computes, and every compute has 8 gpus. | 09:47 |
opendevreview | Balazs Gibizer proposed openstack/placement master: Make us compatible with oslo.db 12.1.0 https://review.opendev.org/c/openstack/placement/+/855862 | 09:47 |
gibi | sean-k-mooney[m]: ^^ | 09:47 |
gibi | stephenfin: ^^ | 09:47 |
gibi | sean-k-mooney[m]: also here is my stab at the nova only fair lock fix https://review.opendev.org/c/openstack/nova/+/855717 there is the oslo version of the fix https://review.opendev.org/c/openstack/oslo.concurrency/+/855714 | 09:50 |
songwenping_ | i reproduced this by inserting similar test data on my devstack env, and the result is the same; perhaps the allocation_candidates API with limit... needs to be optimized. | 09:51 |
gibi | but I have to go back and think about the unit tests as it seems they are unstable | 09:51 |
opendevreview | Amit Uniyal proposed openstack/nova master: Adds check for VM snapshot fail while quiesce https://review.opendev.org/c/openstack/nova/+/852171 | 09:53 |
sean-k-mooney[m] | songwenping_: this is physical gpu passthrough? | 09:54 |
sean-k-mooney[m] | not vGPU correct | 09:54 |
songwenping_ | yes | 09:54 |
songwenping_ | pgpu passthrough | 09:54 |
sean-k-mooney[m] | those are not tracked in placement then | 09:55 |
auniyal_ | Hi | 09:55 |
auniyal_ | please review these | 09:55 |
auniyal_ | https://review.opendev.org/c/openstack/nova/+/852171 | 09:55 |
auniyal_ | https://review.opendev.org/c/openstack/nova/+/854499 | 09:55 |
auniyal_ | backporting | 09:55 |
auniyal_ | https://review.opendev.org/c/openstack/nova/+/854979 | 09:55 |
auniyal_ | https://review.opendev.org/c/openstack/nova/+/854980 | 09:55 |
sean-k-mooney[m] | songwenping_: on master we can now track pci devices in placement but in any other release pci devices are not tracked in placement | 09:56 |
songwenping_ | sean-k-mooney[m]: we use cyborg to manage pgpu, the request url is :curl -g -i -X GET "http://10.7.20.73/placement/allocation_candidates?limit=1000&group_policy=none&required1=CUSTOM_GPU_NVIDIA%2CCUSTOM_GPU_PRODUCT_ID_1DB6&required2=CUSTOM_GPU_NVIDIA%2CCUSTOM_GPU_PRODUCT_ID_1DB6&required3=CUSTOM_GPU_NVIDIA%2CCUSTOM_GPU_PRODUCT_ID_1DB6&required4=CUSTOM_GPU_NVIDIA%2CCUSTOM_GPU_PRODUCT_ID_1DB6&required5=CUSTOM_GPU_NVIDIA%2CCUSTOM_GPU_PRODUCT | 09:56 |
songwenping_ | _ID_1DB6&required6=CUSTOM_GPU_NVIDIA%2CCUSTOM_GPU_PRODUCT_ID_1DB6&resources=MEMORY_MB%3A32%2CVCPU%3A1&resources1=PGPU%3A1&resources2=PGPU%3A1&resources3=PGPU%3A1&resources4=PGPU%3A1&resources5=PGPU%3A1&resources6=PGPU%3A1" -H "Accept: application/json" -H "OpenStack-API-Version: placement 1.29" -H "User-Agent: openstacksdk/0.99.0 keystoneauth1/4.6.0 python-requests/2.27.1 CPython/3.8.10" -H "X-Auth-Token: gAAAAABjFbkj8nz24Q7A6J0qmjpdZHfWM | 09:56 |
songwenping_ | vZidJOT9iYhV2MQCngcYQHhSQmjsGJofkYoT087tAISpf3IniDGwPTHXz_-8x-1nF60WavSYFgEd-5l3_ENrGumaHuU1yfhMJqZu06IR4SXacjA1g6ImSSEfLbfQ9zrPouB0roFokHPmPy3-UpnFZE" | 09:56 |
gibi | sean-k-mooney[m]: is it via cyborg? because then it might be tracked in placement | 09:56 |
sean-k-mooney[m] | ok | 09:56 |
sean-k-mooney[m] | in that case perhaps they are hitting the combinatorial explosion we were worried about with tracking VFs directly | 09:58 |
gibi | probably there are too many possible candidates | 09:58 |
sean-k-mooney[m] | there are 13 choose 8 combinations | 09:58 |
sean-k-mooney[m] | that's 1287 | 09:58 |
gibi | that is not that much | 09:58 |
gibi | but if there is 1000 computes | 09:58 |
sean-k-mooney[m] | 1287*1000 | 09:58 |
gibi | or just 100 | 09:58 |
gibi | then that is sizeable | 09:59 |
sean-k-mooney[m] | ya it will grow quickly | 09:59 |
sean-k-mooney[m] | sorry no | 10:00 |
songwenping_ | there are extra 8 computes without gpu. | 10:00 |
sean-k-mooney[m] | it's 13 hosts each with 8 gpus and the vm is asking for 8 | 10:00 |
sean-k-mooney[m] | so it won't explode like that, there is only 1 allocation possible per host | 10:00 |
gibi | nope | 10:01 |
gibi | if you have 8 groups and 8 gpus | 10:01 |
gibi | then each group can be satisfied by each gpu | 10:01 |
sean-k-mooney[m] | you think it would be n squared | 10:01 |
sean-k-mooney[m] | oh hum maybe | 10:01 |
gibi | 8! | 10:01 |
gibi | 40320 | 10:02 |
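The two figures above, checked with the standard library (math.comb needs Python 3.8+):

```python
import math

print(math.comb(13, 8))   # 1287: ways to choose 8 hosts out of 13
print(math.factorial(8))  # 40320: ways to map 8 identical request groups
                          # onto 8 GPU RPs within a single host
```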
sean-k-mooney[m] | ya… that would be bad | 10:03 |
sean-k-mooney[m] | i hope that's not what's happening or that will cause issues for pci devices in placement too | 10:03 |
gibi | this is coming from the fact that placement treats groups and RPs individually, even if we have very similar RPs and very similar groups | 10:03 |
gibi | I can make a test case for 8 PCI devs :) | 10:04 |
gibi | in nova func test | 10:04 |
gibi | so we can prove it | 10:04 |
sean-k-mooney[m] | perhaps start with 4 | 10:04 |
sean-k-mooney[m] | i mean if it's a gabbi test then sure 8 | 10:05 |
gibi | ack. first I will go and try to stabilize the fair lock unit tests then I will look at the 4-8 PCI issue | 10:05 |
sean-k-mooney[m] | we really need to not do all possible permutations on the placement side | 10:05 |
gibi | if this is really factorial then we need to introduce a per-host limit for the a_c query | 10:06 |
sean-k-mooney[m] | ya i was debating if that was the only option | 10:06 |
gibi | if the groups are different by trait then we need all permutations | 10:06 |
gibi | otherwise we might not find the right one | 10:06 |
sean-k-mooney[m] | well an allocation candidate by definition meets the requirements | 10:07 |
gibi | the trick here is that we have identical groups and identical RPs | 10:07 |
sean-k-mooney[m] | so it depends on where this is happening | 10:07 |
gibi | to find a_c placement needs to iterate permutations I think in the general case | 10:07 |
sean-k-mooney[m] | anyway something to look into i guess | 10:07 |
gibi | yepp | 10:07 |
sean-k-mooney[m] | i'm going to grab a coffee and take my blood pressure meds and then check on freya, be back in about 10 mins | 10:08 |
gibi | ack | 10:09 |
gibi | for me coffee is the blood pressure med | 10:09 |
sean-k-mooney | gibi: should fasteners reintroduce the workaround they had | 10:32 |
gibi | that is a way out too, but I have less authority over that project | 10:35 |
sean-k-mooney | isn't it an oslo deliverable too | 10:35 |
sean-k-mooney | or has it moved out of openstack | 10:36 |
sean-k-mooney | oh it's not an openstack project | 10:36 |
sean-k-mooney | i thought it used to be at one point | 10:37 |
sean-k-mooney | i guess we can fix it in oslo but we should let them know that under eventlet the reentrancy guarantee is broken | 10:38 |
sean-k-mooney | https://github.com/harlowja/fasteners#-overview | 10:38 |
sean-k-mooney | they note there that it should be reentrant | 10:38 |
sean-k-mooney | gibi: https://github.com/harlowja/fasteners/issues/86 | 10:44 |
sean-k-mooney | that's also interesting | 10:44 |
sean-k-mooney | https://github.com/harlowja/fasteners/pull/87/files | 10:44 |
sean-k-mooney | elif not self.has_pending_writers: | 10:44 |
sean-k-mooney | elif (self._writer == me) or not self.has_pending_writers: | 10:44 |
gibi | I'm not sure I follow how this connects to our current problem. | 10:49 |
sean-k-mooney | it's relying on threading.current_thread | 10:50 |
sean-k-mooney | and now it allows you to reacquire the lock if that is the same | 10:50 |
sean-k-mooney | but with spawn_n | 10:50 |
sean-k-mooney | that means two greenthreads could get the same lock if they run on the same os thread | 10:51 |
gibi | yes, ReaderWriterLock relies on current_thread for reentrancy | 10:51 |
sean-k-mooney | this was changed in january | 10:52 |
sean-k-mooney | could you try downgrading fasteners to 0.17.2 | 10:52 |
gibi | but the fact that ReaderWriterLock depends on current_thread was not introduced there, it was there before https://github.com/harlowja/fasteners/pull/87/files#diff-bdd827bd84626190e8a93d1a50782b998b426261511e653de5bb775e9082e1f3L169 | 10:52 |
sean-k-mooney | gibi: it didn't use to in the reader writer case | 10:52 |
sean-k-mooney | right but it used to prevent getting the reader lock if there were any writers | 10:53 |
gibi | oslo depends on the writer lock it seems https://github.com/openstack/oslo.concurrency/blob/master/oslo_concurrency/lockutils.py#L288 | 10:53 |
gibi | and the writer part had reentrancy before 0.17.2 | 10:53 |
sean-k-mooney | i'm going to try your oslo reproducer and downgrade it just to see | 10:54 |
gibi | and the writer lock is affected independently of the 0.17.2 change https://github.com/harlowja/fasteners/pull/87/files#diff-bdd827bd84626190e8a93d1a50782b998b426261511e653de5bb775e9082e1f3L208 | 10:54 |
sean-k-mooney | synchronized is taking a write_lock ya? | 10:56 |
sean-k-mooney | if so then it's not related to that change | 10:56 |
sean-k-mooney | but the is_writer code is not eventlet safe | 10:57 |
sean-k-mooney | likely because of the workaround you mentioned they removed | 10:57 |
sean-k-mooney | https://github.com/harlowja/fasteners/commit/467ed75ee1e9465ebff8b5edf452770befb93913 | 10:57 |
sean-k-mooney | so 0.15 dropped that | 10:58 |
gibi | yes it is broken since 0.15 | 11:00 |
gibi | it is affecting master and yoga | 11:01 |
gibi | in xena we have < 0.15 | 11:01 |
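A rough sketch of the failure mode, modeled on the reproducer discussed above (assumed, not the verbatim oslo one): fasteners >= 0.15 keys writer reentrancy on threading.current_thread(), and greenthreads started with eventlet.spawn_n do not get distinct identities there, so two greenthreads can end up inside the same "fair" critical section:

```python
import eventlet
eventlet.monkey_patch()

import fasteners

lock = fasteners.ReaderWriterLock()
inside = []

def worker(name):
    with lock.write_lock():
        inside.append(name)
        eventlet.sleep(0.1)  # yield while "holding" the lock
        # with a correct lock, only one worker is ever in here
        assert len(inside) == 1, "both greenthreads acquired the lock"
        inside.remove(name)

eventlet.spawn_n(worker, "a")  # spawn_n greenthreads share an identity
eventlet.spawn_n(worker, "b")  # so the second acquire looks reentrant
eventlet.sleep(0.5)            # let both greenthreads run
```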
sean-k-mooney | https://github.com/harlowja/fasteners/issues/96 | 11:12 |
sean-k-mooney | at least they can triage ^ and decide if it's something they want to fix | 11:13 |
gibi | thanks | 11:20 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Fix fair internal lock used from eventlet.spawn_n https://review.opendev.org/c/openstack/nova/+/855717 | 11:33 |
gibi | added the fasteners issue link and hopefully stabilized the unit test ^^ | 11:33 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/yoga: Fix fair internal lock used from eventlet.spawn_n https://review.opendev.org/c/openstack/nova/+/855718 | 11:34 |
sean-k-mooney | i'm going to propose two reverts to fasteners and associate them with the issue link too in case they decide that that is appropriate | 11:34 |
gibi | ack | 11:35 |
sean-k-mooney | https://github.com/harlowja/fasteners/pull/97 | 11:40 |
gibi | thanks | 11:47 |
*** tosky_ is now known as tosky | 12:23 | |
opendevreview | Balazs Gibizer proposed openstack/nova master: Show candidate combinatorial explosion by dev number https://review.opendev.org/c/openstack/nova/+/855885 | 13:00 |
gibi | sean-k-mooney: here are the numbers of the combinatorial explosion | 13:00 |
gibi | ^^ | 13:00 |
gibi | in case of a single device per RP (i.e. PCI or PF) we have a worst-case factorial number of candidates | 13:01 |
gibi | in case a single RP provides more than one device (n VFs for a PF RP) then worst case we have exponentially many candidates | 13:01 |
gibi | and placement first generates all of them then limits the result based on the limit queryparam https://github.com/openstack/placement/blob/723da65faf66cc9b8d02f3756387dc58437e62af/placement/objects/research_context.py#L289-L292 | 13:04 |
gibi | so this probably needs a bit of (probably massive) refactoring if we want to avoid placement blowing up on 8 devices | 13:13 |
gibi | we need to inline the limit somehow | 13:13 |
gibi | but by that we would potentially lose viable candidates | 13:15 |
gibi | so placement alone cannot decide where to limit | 13:16 |
gibi | So modeling similar PFs of PCI devs does not help as it would lead to the VF scenario. | 13:23 |
gibi | Also we cannot model count=n as a single group as placement never splits a suffixed group to fit it into multiple RPs | 13:24 |
gibi | for nova it would be enough to have a small number of candidates per compute host, but while we have nova-side PCI filtering we need all candidates from placement as we don't know which will fulfill the nova-side filtering | 13:26 |
sean-k-mooney | gibi: ya so this is exactly why we did not want each VF to be an RP | 13:44 |
sean-k-mooney | we were very concerned it would explode like this | 13:44 |
* sean-k-mooney just back from the dentist so reading back | 13:44 | |
sean-k-mooney | gibi: this feels a bit like the numa node combinatorial issue | 13:46 |
sean-k-mooney | is this happening in sql or in python | 13:46 |
gibi | this is in python | 13:46 |
gibi | in case of numa we re-tried host - guest numa mapping multiple times | 13:46 |
sean-k-mooney | ack i wonder if we can use itertools.combinations there instead of permutations in that case | 13:46 |
gibi | we use itertools.product to generate all mappings between RPs and groups | 13:47 |
sean-k-mooney | hum ya i wonder if we really need all | 13:47 |
gibi | from placement perspective we need all | 13:47 |
sean-k-mooney | do you have a pointer to the code | 13:47 |
gibi | nova might be able to limit it by providing all the PCI filtering information to placement | 13:48 |
sean-k-mooney | gibi: we likely can take a per-host limit and then use a generator to limit the amount we return | 13:48 |
sean-k-mooney | gibi: to provide all the info we would also need to pass the numa topology info which would change the tree structure | 13:49 |
sean-k-mooney | doable but a lot of work | 13:49 |
gibi | the per-host limit has the problem that if nova still filters out PCI devices after placement then we need to make sure that placement returns enough candidates to fulfill that extra filtering | 13:49 |
sean-k-mooney | https://github.com/openstack/placement/blob/c68d472dca6619055579831ad5464042f745557a/placement/objects/allocation_candidate.py#L364-L387 | 13:49 |
gibi | yeh I linked it above :) | 13:50 |
sean-k-mooney | gibi: ya it's the same issue with the current request limit | 13:50 |
sean-k-mooney | you linked to the research context | 13:50 |
sean-k-mooney | unless i missed it | 13:50 |
gibi | ahh sorry yes | 13:51 |
gibi | in the commit message I linked to the product call | 13:51 |
gibi | https://review.opendev.org/c/openstack/nova/+/855885/1//COMMIT_MSG#26 | 13:51 |
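A toy model of the explosion (not placement's actual code): itertools.product enumerates every group-to-RP mapping and only the no-reuse ones survive, which is n! for n identical single-device RPs; n=6 keeps the run fast, n=8 yields the 40320 above:

```python
import itertools

def candidates(groups, rps):
    # enumerate every group -> RP mapping, then keep only those where no
    # RP is used twice (each RP holds a single device, the PCI/PF case)
    for mapping in itertools.product(rps, repeat=len(groups)):
        if len(set(mapping)) == len(mapping):
            yield dict(zip(groups, mapping))

n = 6  # n = 8 gives 8! == 40320 candidates out of 8**8 raw mappings
groups = [f"required{i}" for i in range(1, n + 1)]
rps = [f"gpu_rp_{i}" for i in range(1, n + 1)]
print(sum(1 for _ in candidates(groups, rps)))  # 720 == 6!
```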
sean-k-mooney | ack | 13:51 |
gibi | at the moment I don't think this is easy to fix and given my time allocation for the next months I won't start on it. | 13:54 |
gibi | songwenping_ might have time and ideas to hack on it | 13:54 |
gibi | I left the above functional test top of the PCI series so we will not forget that this needs to be fixed | 13:55 |
gibi | but this makes me question if we want to merge the scheduling support in AA | 13:55 |
gibi | it might be useful for small deployments (<8 devs per host) | 13:55 |
gibi | but it is dangerous for big deployments | 13:56 |
gibi | bauzas: do we have a PTG etherpad? | 13:56 |
sean-k-mooney | i'm wondering if we need to have a way to limit this from the api query | 13:56 |
bauzas | gibi: not yet, but I can create one | 13:56 |
gibi | bauzas: I could use one :) | 13:57 |
bauzas | as you want | 13:57 |
gibi | sean-k-mooney: if we limit the a_c query then we need to give hints to placement about which order to iterate the candidates to fill the limited response | 13:58 |
gibi | sean-k-mooney: but I'm not sure I can express what we need | 13:59 |
gibi | sean-k-mooney: it is: skip those candidates that are "too similar" to an already found candidate | 13:59 |
sean-k-mooney | gibi: i'm wondering if we can avoid it by generating a sufficiently diverse set of combinations | 14:01 |
gibi | i.e. in case RP1(2), RP2(2), G1(1), G2(1) -> (RP1-G1, RP2-G2) and (RP1-G2, RP2-G1) might be too similar if G1 and G2 ask for the same RC and traits | 14:01 |
gibi | yeah diverse set of candidates == skip the too similar ones :) | 14:01 |
gibi | but what is diverse might not be universal | 14:02 |
gibi | like if RP2 is remote_managed=True in nova then placement still sees the same symmetry but the two RPs are not equivalent from nova perspective | 14:02 |
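A sketch of what "skip the too similar ones" could mean, assuming groups with the same resource class and traits are interchangeable; the group signatures and names are illustrative, and nova-only distinctions like remote_managed are exactly what such a dedup would miss:

```python
from itertools import permutations

def dedup_symmetric(candidates, group_sig):
    # collapse candidates that differ only in which of two equivalent
    # groups landed on which RP, keyed by the (signature, RP) pair set
    seen = set()
    for cand in candidates:  # cand: dict of group -> RP
        key = frozenset((group_sig[g], rp) for g, rp in cand.items())
        if key not in seen:
            seen.add(key)
            yield cand

groups = ["required1", "required2", "required3"]
sig = {g: ("PGPU", frozenset()) for g in groups}  # identical groups
cands = (dict(zip(groups, p)) for p in permutations(["rp1", "rp2", "rp3"]))
print(len(list(dedup_symmetric(cands, sig))))  # 1 instead of 3! == 6
```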
sean-k-mooney | right so we want to ignore order when looking at equivalence provided the request group is the same | 14:04 |
sean-k-mooney | i feel like there is definitely a way to optimise so that we don't generate a product | 14:04 |
sean-k-mooney | that is going to overproduce results | 14:04 |
sean-k-mooney | but off the top of my head i'm not sure of the correct way to proceed | 14:05 |
gibi | bauzas: I've created https://etherpad.opendev.org/p/nova-antelope-ptg | 14:05 |
sean-k-mooney | looking at https://docs.python.org/3/library/itertools.html#itertools-recipes | 14:06 |
gibi | sean-k-mooney: I agree on the second part ("but off the top of my head i'm not sure of the correct way to proceed") but I'm not sure we can do better than actually iterating the product | 14:06 |
sean-k-mooney | maybe take(per_host_limit, random_product(...)) | 14:07 |
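A sketch of that take(per_host_limit, random_product(...)) idea built from the itertools recipes linked above; note the recipes' random_product() returns a single random draw per call, so it is wrapped in a generator here, and per_host_limit plus the rejection-sampling loop are illustrative:

```python
import random
from itertools import islice

def take(n, iterable):
    "Return first n items of the iterable as a list (itertools recipe)."
    return list(islice(iterable, n))

def random_product(*args, repeat=1):
    "Random selection from itertools.product(*args) (itertools recipe)."
    pools = [tuple(pool) for pool in args] * repeat
    return tuple(map(random.choice, pools))

def random_mappings(groups, rps):
    while True:                    # endless stream of random draws
        m = random_product(rps, repeat=len(groups))
        if len(set(m)) == len(m):  # reject draws that reuse an RP
            yield dict(zip(groups, m))

per_host_limit = 5
groups = [f"required{i}" for i in range(1, 9)]
rps = [f"gpu_rp_{i}" for i in range(1, 9)]
for cand in take(per_host_limit, random_mappings(groups, rps)):
    print(cand)
```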
gibi | yeah random is a way out even if a dirty one | 14:09 |
sean-k-mooney | gibi: if we can't avoid the need to generate the product i'm wondering if we can break the implicit lexicographical ordering | 14:09 |
opendevreview | Amit Uniyal proposed openstack/nova master: add regression test case for bug 1552777 https://review.opendev.org/c/openstack/nova/+/855900 | 14:10 |
opendevreview | Amit Uniyal proposed openstack/nova master: Adds check for instance resizing https://review.opendev.org/c/openstack/nova/+/855901 | 14:10 |
gibi | if nova could define a requested order based on information that nova has but placement doesn't, then yes, ordering can be a solution. that is basically nova asking placement to generate diverse candidates with a definition of diverse provided by nova in the a_c query | 14:10 |
gibi | OK I tried to document this issue on the PTG etherpad and by that I will move this to background processing :) | 14:18 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Fix fair internal lock used from eventlet.spawn_n https://review.opendev.org/c/openstack/nova/+/855717 | 14:34 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/yoga: Fix fair internal lock used from eventlet.spawn_n https://review.opendev.org/c/openstack/nova/+/855718 | 14:36 |
gibi | bauzas, sean-k-mooney: ML thread about the fair lock issue https://lists.openstack.org/pipermail/openstack-discuss/2022-September/030325.html | 16:22 |
bauzas | gibi: thanks | 16:22 |
sean-k-mooney | cool | 16:23 |
gibi | I did not find any direct evidence that projects other than nova and taskflow are affected | 16:26 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Fix fair internal lock used from eventlet.spawn_n https://review.opendev.org/c/openstack/nova/+/855717 | 16:30 |
opendevreview | Balazs Gibizer proposed openstack/nova stable/yoga: Fix fair internal lock used from eventlet.spawn_n https://review.opendev.org/c/openstack/nova/+/855718 | 16:31 |
sean-k-mooney | gibi bauzas https://blueprints.launchpad.net/nova/+spec/non-admin-hw-offloaded-ovs | 17:24 |
sean-k-mooney | i can probably hack up a PoC of that this week i guess but i'm unsure if i will have time to take that to completion | 17:24 |
bauzas | cycle highlights ready to review https://review.opendev.org/c/openstack/releases/+/855974 | 17:26 |
sean-k-mooney | do we have any features to highlight for non-libvirt drivers | 17:28 |
sean-k-mooney | were there any important ironic improvements | 17:28 |
sean-k-mooney | or hyperv | 17:28 |
sean-k-mooney | based on https://docs.openstack.org/releasenotes/nova/unreleased.html#new-features i guess not | 17:29 |
sean-k-mooney | gibi: bauzas ... so there is also a libvirt bug at play for hardware-offloaded ovs | 18:05 |
sean-k-mooney | https://github.com/libvirt/libvirt/commit/8708ca01c0dd38764cad3e483405bdeb05ac2e96 | 18:06 |
whoami-rajat | bauzas, hey, we're past client freeze but my API feature is in and I just wanted to mention the OSC and novaclient patches required by my feature | 19:03 |
whoami-rajat | novaclient https://review.opendev.org/c/openstack/python-novaclient/+/827163 | 19:03 |
whoami-rajat | OSC: https://review.opendev.org/c/openstack/python-openstackclient/+/831014 | 19:03 |
*** haleyb_ is now known as haleyb | 20:12 |