*** altlogbot_3 has quit IRC | 01:28 | |
*** altlogbot_2 has joined #openstack-placement | 01:29 | |
*** tetsuro has joined #openstack-placement | 05:57 | |
*** tetsuro has quit IRC | 05:58 | |
*** tetsuro has joined #openstack-placement | 05:59 | |
*** tetsuro has quit IRC | 06:03 | |
*** helenafm has joined #openstack-placement | 07:17 | |
*** tssurya has joined #openstack-placement | 07:28 | |
openstackgerrit | Tetsuro Nakamura proposed openstack/placement master: Support `same_subtree` queryparam https://review.opendev.org/668376 | 07:47 |
openstackgerrit | Tetsuro Nakamura proposed openstack/placement master: Doc `same_subtree` queryparam https://review.opendev.org/669616 | 07:47 |
*** tetsuro has joined #openstack-placement | 07:55 | |
*** tetsuro has quit IRC | 07:57 | |
*** ttsiouts has joined #openstack-placement | 08:01 | |
*** ttsiouts has quit IRC | 08:13 | |
*** ttsiouts has joined #openstack-placement | 08:13 | |
*** ttsiouts has quit IRC | 08:18 | |
openstackgerrit | Chris Dent proposed openstack/placement master: Update implemented spec and spec document handling https://review.opendev.org/669184 | 08:18 |
openstackgerrit | Chris Dent proposed openstack/placement master: Add whereto for testing redirect rules https://review.opendev.org/669370 | 08:18 |
openstackgerrit | Chris Dent proposed openstack/placement master: tox: Stop building api-ref docs with the main docs https://review.opendev.org/669371 | 08:18 |
*** ttsiouts has joined #openstack-placement | 08:24 | |
helenafm | :q | 08:28 |
*** cdent has joined #openstack-placement | 09:05 | |
*** cdent has quit IRC | 09:45 | |
*** ttsiouts has quit IRC | 10:32 | |
*** ttsiouts has joined #openstack-placement | 10:33 | |
*** cdent has joined #openstack-placement | 10:33 | |
*** ttsiouts has quit IRC | 10:37 | |
cdent | gibi: have you had a chance to look at tetsuro's same_subtree work? needs someone besides me and efried_pto looking at it | 10:46 |
gibi | cdent: not yet, I will dive into it now. | 10:49 |
cdent | great, thanks | 10:49 |
*** ttsiouts has joined #openstack-placement | 11:01 | |
*** cdent has quit IRC | 11:12 | |
*** cdent has joined #openstack-placement | 11:18 | |
*** sean-k-mooney has quit IRC | 12:03 | |
*** sean-k-mooney has joined #openstack-placement | 12:16 | |
*** cdent has quit IRC | 12:26 | |
*** artom has joined #openstack-placement | 12:33 | |
*** edleafe has joined #openstack-placement | 12:42 | |
openstackgerrit | Merged openstack/os-resource-classes master: Add Python 3 Train unit tests https://review.opendev.org/669479 | 12:57 |
openstackgerrit | Merged openstack/os-traits master: Add Python 3 Train unit tests https://review.opendev.org/669480 | 13:00 |
*** takashin has left #openstack-placement | 13:01 | |
*** mriedem has joined #openstack-placement | 13:23 | |
*** cdent has joined #openstack-placement | 13:34 | |
gibi | cdent: this same_subtree patch is pretty dense. I hope I finish it today but no promises. | 13:40 |
cdent | gibi: no worries, there's not a huge rush because we're fairly ahead of schedule, but the sooner it is merged, the sooner nova can start playing with it I suppose | 13:40 |
gibi | Is there a volunteer from nova side to play with this during Train? | 13:41 |
gibi | I mean who will be the developer first consuming this? as we might need a review from that dev as well | 13:42 |
cdent | gibi: I don't really know | 13:44 |
cdent | efried_pto: is the one who has been driving that it needs to happen | 13:44 |
gibi | cdent: OK | 13:46 |
cdent | I hope this doesn't turn into another situation where placement is months or even years ahead of nova, but it could well do, and that's fine | 13:47 |
*** amodi has joined #openstack-placement | 13:50 | |
*** efried_pto is now known as efried | 13:54 | |
gibi | I don't remember a nova spec that explicitly stated same_subtree as a requirement | 13:54 |
gibi | and nova tends to be a lot slower to move forward than placement | 13:55 |
efried | I'm sort of hoping to bully bauzas into abandoning his vgpu affinity spec in favor of working NUMA nesting in nova. | 13:55 |
gibi | anyhow I will review the implementation, it was just a sidetrack of mine to get the "user" of the feature involved | 13:55 |
bauzas | efried: sorry but why ? | 13:55 |
bauzas | I already said both specs aren't competitive | 13:56 |
efried | No, they're not mutually exclusive... except from the standpoint of development resource. | 13:56 |
efried | IMO retrofitting filter-based affinity for placement-modeled vgpu is a backward step. | 13:57 |
bauzas | that's your opinion | 13:58 |
efried | That's what "IMO" means. | 13:58 |
efried | I'm just one voice. If there's a preference from the team to move ahead with it, so be it. | 13:59 |
gibi | there is complexity difference between the two solutions that could mean leadtime differences as well. | 13:59 |
efried | yes, absolutely. The filter bauzas proposes could be easily contained in Train. NUMA modeling/affinity in placement may well take longer. | 14:00 |
bauzas | efried: I'm not adding a filter | 14:00 |
efried | it has already taken long enough. | 14:00 |
efried | weigher? | 14:00 |
bauzas | indeed, have you looked at my spec honestly? | 14:01 |
bauzas | the weigher is not even needed | 14:01 |
efried | I certainly could be misusing terminology. I'm really good at that. | 14:01 |
bauzas | if that's really a problem for you, I can even remove it honestly | 14:01 |
bauzas | what I really just need is https://review.opendev.org/#/c/650963/9/specs/train/approved/libvirt-vgpu-numa-affinity.rst@92 | 14:02 |
bauzas | efried: the point is, a filter can get NoValidHosts | 14:02 |
bauzas | efried: not a weigher | 14:02 |
bauzas | efried: it just helps to make sure we spread instances between hosts | 14:03 |
bauzas | for vGPUs | 14:03 |
bauzas | in order to have less races | 14:03 |
efried | bauzas: Where is the code for @92 going to live, if not in the NUMATopologyFilter or a new weigher? In the libvirt virt driver? | 14:03 |
bauzas | efried: I said it in the spec | 14:04 |
bauzas | https://review.opendev.org/#/c/650963/9/specs/train/approved/libvirt-vgpu-numa-affinity.rst@101 | 14:05 |
bauzas | https://review.opendev.org/#/c/650963/9/specs/train/approved/libvirt-vgpu-numa-affinity.rst@211 and https://review.opendev.org/#/c/650963/9/specs/train/approved/libvirt-vgpu-numa-affinity.rst@214 | 14:05 |
efried | bauzas: Okay, I've reread the spec. | 14:14 |
bauzas | thanks | 14:14 |
efried | It all makes perfect sense in a world where there's no NUMA affinity at the placement level. | 14:14 |
efried | but once we have that, 80% of this code goes away. | 14:15 |
efried | A pack/spread weigher for its own sake may make sense. | 14:16 |
efried | though not related to affinity | 14:16 |
efried | The code to pick the proper NUMA node based on which PGPUs are allocated becomes n/a. | 14:17 |
efried | So my point is, a) this becomes tech debt almost immediately; and b) the effort spent coding & reviewing could be better spent getting us closer to placement-based NUMA modeling & affinity. | 14:18 |
efried | IMO | 14:18 |
cdent | IMOT | 14:18 |
cdent | Too | 14:19 |
cdent | otherwise we've spent a huge complexity cheque in placement for nada | 14:19 |
bauzas | efried: cdent: there will still be nova changes for using the new placement microversions | 14:21 |
bauzas | so, yeah, I'll work on this too | 14:22 |
bauzas | efried: the libvirt code will still possibly be there but AFAIK and unless I misunderstood, we will only have hard affinity | 14:22 |
bauzas | by using placement | 14:23 |
efried | correct | 14:23 |
efried | using group_policy=none would allow you to employ post-scheduling soft affinity | 14:25 |
*** purplerbot has quit IRC | 14:28 | |
*** purplerbot has joined #openstack-placement | 14:30 | |
bauzas | efried: so operators wanting to *only* have soft affinity for vGPUs would still need the libvirt claim | 14:30 |
bauzas | and maybe the weigher | 14:30 |
efried | actually, I take it back | 14:32 |
efried | you can't use the soft affinity after the fact - you can't disregard the claim you got. | 14:33 |
efried | it's all or nothing | 14:33 |
efried | You can use a host that hasn't modeled NUMA in nested providers, and get soft affinity; or if you're on a host that has modeled NUMA in nested providers, you can have hard affinity or no affinity. | 14:34 |
bauzas | efried: if you don't ask placement for hard affinity, then you can still have soft affinity | 14:51 |
bauzas | that's why I say the spec is not competitive | 14:52 |
efried | bauzas: you can't have soft affinity if the host is modeled with nested RPs. | 14:53 |
efried | bauzas: because we've created an allocation with resources from particular NUMA nodes. | 14:53 |
efried | So if we didn't request affinity from placement, and you happen to get allocations from opposite NUMA nodes, you can't just ignore that on the host and pick a different distribution of NUMA nodes. | 14:54 |
efried | so | 14:54 |
efried | all or nothing | 14:54 |
*** dklyle has joined #openstack-placement | 15:13 | |
mriedem | ima butt in with a question, | 15:14 |
mriedem | on https://developer.openstack.org/api-ref/placement/?expanded=update-allocations-detail#update-allocations | 15:14 |
mriedem | the 409 in the description is mostly about inventory conflicts, | 15:14 |
mriedem | but isn't there also a 409 response if the consumer exists and you pass 1.28 with consumer_generation=None? | 15:15 |
cdent | yes | 15:15 |
cdent | Inventory and/or allocations changed while attempting to allocate | 15:16 |
cdent | one could argue (weakly) that "or inventories are updated by another thread while attempting the operation" fits, since allocations change an inventory's capacity | 15:17 |
cdent | and if you try to send None for consumer_generation and it doesn't work, then there have been allocations out from under you | 15:17 |
cdent | but yes, it could be documented better | 15:17 |
* mriedem is storyboarding | 15:20 | |
cdent | you are a star | 15:20 |
mriedem | https://storyboard.openstack.org/#!/story/2006180 | 15:22 |
*** dklyle has quit IRC | 15:28 | |
*** dklyle has joined #openstack-placement | 15:28 | |
*** amodi has quit IRC | 15:29 | |
*** amodi has joined #openstack-placement | 15:39 | |
*** helenafm has quit IRC | 15:43 | |
*** tssurya has quit IRC | 15:53 | |
gibi | cdent: left comments in https://review.opendev.org/#/c/668376 | 16:03 |
cdent | thanks gibi | 16:04 |
sean-k-mooney | efried: in the numa case we should catch that in the numa topology filter before we hit the compute node | 16:07 |
sean-k-mooney | it won't be going away, at least not in the short term after we have numa support in placement | 16:07 |
efried | sean-k-mooney: Is the NUMATopologyFilter a filter or a weigher? | 16:08 |
sean-k-mooney | it might eventually go away, but ya, we can't ignore the allocation from placement | 16:08 |
sean-k-mooney | efried: it's a filter | 16:08 |
sean-k-mooney | doing hard affinity | 16:08 |
efried | so the filter part that rejects an allocation that doesn't provide affinity - that would be moot | 16:08 |
efried | because why would we not have requested affinity from placement if we were going to enforce it in the filter anyway? | 16:09 |
sean-k-mooney | if and only if we implement everything it currently does in placement | 16:09 |
sean-k-mooney | so cpu, hugepages, and pci device numa affinity | 16:09 |
sean-k-mooney | when we can enforce all of the above with placement it can go away, but not before we have all 3 | 16:09 |
sean-k-mooney | so it will allow us to do it piecemeal, in that more and more of the filtering can be left to placement, and eventually everything will be enforced by placement and it can be removed once we have parity | 16:11 |
sean-k-mooney | that is probably after U | 16:12 |
sean-k-mooney | stephenfin: jangutter: i fixed my missing bug nit in https://review.opendev.org/#/c/666387/2 if you want to hit that one quickly. it's not urgent but let's try and land it by m2 | 16:24 |
sean-k-mooney | we might want to backport it to stein too but we can do that when needed | 16:25 |
sean-k-mooney | oops wrong channel | 16:25 |
cdent | cleanly sean-k-mooney needs to be kickbanned for violating the rules so egregiously | 16:26 |
cdent | clearly! | 16:26 |
sean-k-mooney | clearly :) | 16:27 |
edleafe | and certainly not cleanly! | 16:31 |
* cdent waves goodnight | 16:33 | |
*** cdent has quit IRC | 16:33 | |
efried | sean-k-mooney: I agree the filter itself needs to stick around, especially because (IMO) we should not be trying to go whole hog in placement with things like hugepages etc. However, pieces of that code will become redundant (run but never reject) due to the bits we are enabling in placement. | 16:35 |
sean-k-mooney | yep | 16:36 |
sean-k-mooney | although that code needs to be made placement aware | 16:36 |
efried | though (run but never reject) may be better as (remove) depending on how reliably we're running the placement side. | 16:36 |
sean-k-mooney | it's the same code that does the assignment on the compute node, and it needs to know it can only assign the resources that correspond to the allocations/RPs selected by placement | 16:37 |
efried | yeah, that could get a little crazy. | 16:37 |
sean-k-mooney | efried: the filter works by invoking the assignment code that will be used on the compute node without actually claiming the resources in the RT | 16:37 |
sean-k-mooney | so 90% of the code would still be used after placement does it | 16:38 |
sean-k-mooney | but it needs to learn that it can't select from all resources anymore and can only look at the resources that correspond to the RP in the placement allocation | 16:38 |
sean-k-mooney | part of the logic will be updated to look at the allocation by the standardise cpu in placement work | 16:39 |
sean-k-mooney | when hugepages or pci devices are moved to placement the rest will have to be updated | 16:40 |
sean-k-mooney | on the plus side it should make the filter faster | 16:40 |
efried | which is kind of the whole point | 16:41 |
sean-k-mooney | making it faster | 16:41 |
efried | faster in two senses | 16:41 |
efried | failures happen earlier with less racing; and the filter itself actually performs better. | 16:42 |
efried | whole point of placement | 16:42 |
sean-k-mooney | ya, although the real win is reducing reschedules; however, to do that we likely need to move the RT claim to the conductor too. | 16:42 |
*** ttsiouts has quit IRC | 16:43 | |
*** ttsiouts has joined #openstack-placement | 16:44 | |
*** ttsiouts has quit IRC | 16:49 | |
*** mriedem has quit IRC | 21:53 | |
openstackgerrit | Merged openstack/placement master: Update implemented spec and spec document handling https://review.opendev.org/669184 | 22:22 |
openstackgerrit | Merged openstack/placement master: Add whereto for testing redirect rules https://review.opendev.org/669370 | 22:32 |
openstackgerrit | Merged openstack/placement master: tox: Stop building api-ref docs with the main docs https://review.opendev.org/669371 | 22:32 |