| opendevreview | wanghongtao proposed openstack/nova master: Fix operator precedence in limit_check https://review.opendev.org/c/openstack/nova/+/993297 | 06:51 |
|---|---|---|
| opendevreview | Shalini Srivastava proposed openstack/nova master: Filter invalid kwargs in volume_api.create to prevent TypeError https://review.opendev.org/c/openstack/nova/+/993306 | 07:26 |
| opendevreview | Kamil Sambor proposed openstack/nova master: Restore 'fork' start method for daemon mode on Python 3.14 https://review.opendev.org/c/openstack/nova/+/987815 | 07:39 |
| opendevreview | Ashish Gupta proposed openstack/placement master: tests: Add connection parameter to Database and PlacementFixture https://review.opendev.org/c/openstack/placement/+/993106 | 07:57 |
| opendevreview | Stephen Finucane proposed openstack/nova master: docs: Speed up release notes builds https://review.opendev.org/c/openstack/nova/+/989212 | 09:17 |
| opendevreview | Stephen Finucane proposed openstack/nova master: docs: Speed up release notes builds https://review.opendev.org/c/openstack/nova/+/989212 | 09:21 |
| opendevreview | Joan Gilabert proposed openstack/nova master: Add mtty/mdpy support for testing fake mdevs https://review.opendev.org/c/openstack/nova/+/898100 | 09:35 |
| opendevreview | Joan Gilabert proposed openstack/nova master: Rename vtpm job and add mtty support for vpgu test https://review.opendev.org/c/openstack/nova/+/922140 | 09:35 |
| opendevreview | Shalini Srivastava proposed openstack/nova master: Filter invalid kwargs in volume_api.create to prevent TypeError https://review.opendev.org/c/openstack/nova/+/993306 | 10:32 |
| opendevreview | Lajos Katona proposed openstack/nova master: Use SDK for Neutron Ports https://review.opendev.org/c/openstack/nova/+/969298 | 11:17 |
| opendevreview | Lajos Katona proposed openstack/nova master: Use SDK for Neutron security-groups https://review.opendev.org/c/openstack/nova/+/981141 | 11:19 |
| opendevreview | Joan Gilabert proposed openstack/nova master: Rename vtpm job and add mtty support for vpgu test https://review.opendev.org/c/openstack/nova/+/922140 | 11:33 |
| bauzas | sean-k-mooney[m]: gibi; could either of you look at https://review.opendev.org/q/topic:%22bug/2153425%22 ? I'm all good | 12:25 |
| opendevreview | Joan Gilabert proposed openstack/nova master: Rename vtpm job and add mtty support for vpgu test https://review.opendev.org/c/openstack/nova/+/922140 | 12:28 |
| -opendevstatus- | NOTICE: Recent POST_FAILURE job results with no logs were due to upload errors in one of our providers, which has been temporarily disabled now so rechecking those should be safe | 12:44 |
| sean-k-mooney | bauzas: sorry was on a call but i can take a look quickly | 12:45 |
| sean-k-mooney | bauzas: so i have been watching the updates on this passively | 12:46 |
| sean-k-mooney | bauzas: this is not quite enough to fix numa vswitch, correct? | 12:46 |
| sean-k-mooney | it's related but this is only for the live migration path | 12:47 |
| bauzas | it should be good | 12:47 |
| sean-k-mooney | we still have the issue with the compute not having it on spawn | 12:47 |
| bauzas | at least we have a customer asking for it | 12:47 |
| sean-k-mooney | or cold migrate | 12:47 |
| sean-k-mooney | bauzas: what i mean is this will help with live migration but there is more work needed to make sure that all code paths include the limits | 12:48 |
| bauzas | we can ask that for a follow-up | 12:49 |
| sean-k-mooney | bugs.launchpad.net/nova/+bug/2145135 we also have a gap for https://bugs.launchpad.net/nova/+bug/1855332 | 12:49 |
| sean-k-mooney | bauzas: sure, i'm not saying it's in scope for those changes | 12:49 |
| sean-k-mooney | but you were previously working on bugs.launchpad.net/nova/+bug/2145135 a month or two ago | 12:49 |
| sean-k-mooney | that's why i'm bringing the topic up | 12:50 |
| sean-k-mooney | anyway i'll review it now, i just want to be clear on the scope | 12:50 |
| bauzas | ack thanks | 12:52 |
| sean-k-mooney | just an fyi the expectations in the bug are also incorrect https://bugs.launchpad.net/nova/+bug/2153425 | 12:53 |
| sean-k-mooney | setting only `{'hw:numa_nodes': '1'}` | 12:53 |
| sean-k-mooney | does not claim any cpu or memory in the host numa tracker but it does end up with the vm being pinned | 12:53 |
| sean-k-mooney | to a host numa cell | 12:53 |
| sean-k-mooney | we do not expect any numa balancing to happen with that config unless you also request hw:mem_page_size in the flavor or image | 12:54 |
| sean-k-mooney | bauzas: so i'm sorry to say but that bug is technically invalid and it's a pseudo feature request. | 12:58 |
| bauzas | but the issue still remains | 13:00 |
| sean-k-mooney | the flavor is not valid | 13:00 |
| sean-k-mooney | we can add support for numa balancing on shared cpus only but that was never supported by nova | 13:00 |
| sean-k-mooney | we support it for pinned cpus | 13:00 |
| sean-k-mooney | and for guest memory | 13:00 |
| sean-k-mooney | but not for shared cpus | 13:01 |
| sean-k-mooney | so i'm not against supporting that | 13:01 |
| sean-k-mooney | but the expectation today is that we won't consider the cpu allocation ratio at the numa node level | 13:01 |
| bauzas | honestly, I don't know what to say :( | 13:09 |
| sean-k-mooney | bauzas: i'm not against building this functionality into nova but it's really a new feature | 13:10 |
| sean-k-mooney | and if we are to do it we need to do it for spawn and cold migrate and unshelve etcetera | 13:11 |
| sean-k-mooney | this is not the first time that i have had this conversation with operators or customers and pointed out that simply requesting hw:numa_nodes and nothing else is only valid if you're using file backed memory, as we do not do numa aware shared cpu placement so we will not balance between numa nodes | 13:12 |
| sean-k-mooney | to be clear even with file backed memory it's not a good idea | 13:13 |
| sean-k-mooney | if others really want to track this as a bug we need to add support for this on spawn and other move operations and more functional tests to cover that | 13:13 |
| sean-k-mooney | https://bugs.launchpad.net/nova/+bug/1792985 and https://bugs.launchpad.net/nova/+bug/1439247 are relevant context for why the current flavor is invalid | 13:23 |
| sean-k-mooney | using file backed memory prevents the OOM kill but without also using cpu pinning it does not address the cpu balancing part | 13:24 |
| sean-k-mooney | which is what ye were trying to address with that new bug | 13:24 |
| | *** sambork_ is now known as sambork | 13:32 |
| | *** iurygregory_ is now known as iurygregory | 13:46 |
| opendevreview | Joan Gilabert proposed openstack/nova master: Rename vtpm job and add mtty support for vpgu test https://review.opendev.org/c/openstack/nova/+/922140 | 13:50 |
| opendevreview | Merged openstack/nova stable/2025.1: Add reproducer test for bug 2105896 https://review.opendev.org/c/openstack/nova/+/989535 | 14:00 |
| | *** ralonsoh is now known as ralonsoh_ooo | 15:09 |
| dansmith | sean-k-mooney: unrelated to the numa aspect, why is it not right to set self.limits before we call that compute-via-RPC check? it seems that the code expects limits to be set, | 15:26 |
| dansmith | but I'm not sure when else they would be, but I'm sure I'm missing something | 15:26 |
| sean-k-mooney | dansmith: that part isn't necessarily wrong but the expectation that nova should balance based on that is. also nova should not be multiplying the core or memory on a given numa node by the ratios | 15:27 |
| Uggla | reminder: upstream meeting in ~30mn | 15:27 |
| sean-k-mooney | well the memory should not be multiplied, the cores could be viable | 15:28 |
| dansmith | sean-k-mooney: yeah okay I was going to say.. the assumption in the patch being wrong makes sense, not arguing with that | 15:29 |
| dansmith | it just looks to me like in the force case we probably need to be setting self.limits | 15:30 |
| sean-k-mooney | dansmith: so there is a separate bug in the conductor | 15:30 |
| sean-k-mooney | where the limits are not passed to the compute | 15:30 |
| dansmith | yeah | 15:30 |
| sean-k-mooney | so the scheduler does not actually select the placement on the host | 15:30 |
| sean-k-mooney | the compute does | 15:30 |
| sean-k-mooney | so until we either recreate them there or pass them | 15:31 |
| sean-k-mooney | the two could still disagree | 15:31 |
| dansmith | "select the placement" meaning the numa placement? | 15:31 |
| dansmith | we're still calling scheduler even in the force case right? | 15:31 |
| sean-k-mooney | if you mean force live migration, i believe if you use the old api where that exists | 15:32 |
| sean-k-mooney | we only check that the host exists and skip all the filters | 15:32 |
| sean-k-mooney | but if you use the non force microversion where you can pass a host | 15:32 |
| sean-k-mooney | then we run through the filters | 15:32 |
| dansmith | sean-k-mooney: L100 here https://review.opendev.org/c/openstack/nova/+/990212/5/nova/conductor/tasks/live_migrate.py# | 15:33 |
| sean-k-mooney | so if you use the old force api then we just trust you that it fits and send it to the compute | 15:33 |
| dansmith | oh I guess in that case we don't even have limits to set | 15:33 |
| sean-k-mooney | right because we are currently relying on a side effect of the numa topology filter to create them | 15:34 |
| dansmith | yeah, okay | 15:34 |
| sean-k-mooney | which is the bug with numa vswitches | 15:34 |
| sean-k-mooney | without checking i would have to assume we are effectively applying our defaults or no multiplier at all when the limits are not passed today? | 15:36 |
| sean-k-mooney | which while not ideal for memory and disk it's at least <=1.0 now | 15:37 |
| bauzas | sean-k-mooney: the problem is that we say 'no' to something is a tribal knowledge miss | 15:37 |
| sean-k-mooney | in practice we are relying on placement to enforce the multiplier globally on the compute node | 15:37 |
| bauzas | sean-k-mooney: if the scheduler provides limits and then the conductor doesn't use it, then it's a bug | 15:38 |
| sean-k-mooney | bauzas: this is something we have discussed changing a few times and when i have suggested it we were concerned about upgrade impact | 15:38 |
| bauzas | but I understand your point, the scheduler shouldn't accept this | 15:38 |
| sean-k-mooney | bauzas: dansmith just to be clear | 15:38 |
| sean-k-mooney | i'm not against fixing the fact we should pass limits when we don't | 15:38 |
| sean-k-mooney | but i'm not ok with implying hw:numa_nodes=1 with nothing else is a valid numa flavor that should be balanced | 15:39 |
| sean-k-mooney | at least not until that works for all code paths | 15:39 |
| bauzas | so we should provide a HTTP400 if an operator creates a wrong flavor then | 15:39 |
| bauzas | at least when creating the instance | 15:39 |
| sean-k-mooney | we could but we rejected that in the past | 15:40 |
| sean-k-mooney | we discussed this when stephen was adding the flavor metadata validation | 15:40 |
| sean-k-mooney | the minimum valid numa flavor today has hw:mem_page_size set to anything | 15:40 |
| bauzas | but that's a tribal knowledge, right? | 15:41 |
| sean-k-mooney | everything else builds from that baseline otherwise your vm will get OOM killed eventually | 15:41 |
| sean-k-mooney | bauzas: sort of | 15:41 |
| bauzas | because I wonder why we should just use this flavor extraspec (mem_page_size) just for NUMA usage ? | 15:41 |
| sean-k-mooney | because without it we do non numa aware memory tracking | 15:41 |
| sean-k-mooney | and we pin you to a numa node without checking if the sum of vms on that numa node fit | 15:42 |
| bauzas | so that's a tech debt | 15:42 |
| sean-k-mooney | yep from day one of numa support in tree | 15:42 |
| bauzas | operators should be able to ask for NUMA nodes without needing to ask page sizes | 15:42 |
| sean-k-mooney | and it's tech debt i have proposed fixing multiple times | 15:42 |
| sean-k-mooney | but we have rejected because it breaks upgrades | 15:43 |
| bauzas | but at least for the bug report, I think we can still accept it | 15:43 |
| sean-k-mooney | we can accept the bug for limits | 15:43 |
| bauzas | if the scheduler provides limits, we could provide them to the conductor and then the compute | 15:43 |
| sean-k-mooney | not for numa without pagesize | 15:43 |
| bauzas | then the scheduler shouldn't provide the limits | 15:44 |
| bauzas | right? | 15:44 |
| sean-k-mooney | sorry im not following the question | 15:46 |
| bauzas | lemme explain it better my question | 15:47 |
| bauzas | if we don't want nova to accept flavors asking numa cores with no page sizes, then we should not pass limits down to the conductor | 15:48 |
| sean-k-mooney | https://etherpad.opendev.org/p/nova-wallaby-ptg#L712 | 15:48 |
| bauzas | if we're passing scheduler limits to the conductor, this means "oh, yeah, I accept this instance, please check those limits when you run the claims" | 15:48 |
| sean-k-mooney | that was the last time i brought up changing this at the ptg | 15:48 |
| bauzas | sorry, my brain is bad about remembering things in general :-( | 15:49 |
| sean-k-mooney | bauzas: the limits we are checking are cpu ram and disk | 15:49 |
| sean-k-mooney | those are enforced by placement at the host level | 15:49 |
| sean-k-mooney | they have never applied within the host at the numa level | 15:49 |
| bauzas | like, I wasn't even able to correlate the bug with my own numa vswitches | 15:49 |
| sean-k-mooney | that's where the disconnect is. we have never considered the allocation ratios when it comes to numa affinity | 15:50 |
| sean-k-mooney | bauzas: back in wallaby https://etherpad.opendev.org/p/r.321f34cf3eb9caa9d87a9ec8349c3d29#L712 we agreed an approach to address this so that any numa instance would be given a hw:mem_page_size if not set | 15:52 |
| sean-k-mooney | that didn't happen because it was internally deprioritised by our pm so i didn't end up working on that | 15:52 |
| bauzas | thanks, again, I wasn't able to remember this, sorry | 15:53 |
| bauzas | but now we're sorta kind of accepting those flavors and we can't say 'well, doh, we don't support this, thanks' | 15:54 |
| sean-k-mooney | no worries. so to have a path forward, i have no issues with passing limits when they should be passed, and i have no issue with closing this numa footgun | 15:54 |
| bauzas | I'm looking at the upstream docs to see whether we call it | 15:54 |
| sean-k-mooney | bauzas: ya that has been the main concern, how do we fix this without breaking upgrades | 15:54 |
| bauzas | sean-k-mooney: so, to clarify, you say "let's provide limits when we have them, but let's not pass those limits if those shouldn't be supported". Am I correct with this assumption? | 15:55 |
| sean-k-mooney | more or less. | 15:56 |
| sean-k-mooney | i'm kind of ok with always passing them but my main issue with the bug is expecting that nova will do the numa balancing as a side effect | 15:56 |
| sean-k-mooney | it might but that was never intended to work | 15:56 |
| bauzas | hmmmm | 15:57 |
| bauzas | (I'm trying to find how we couldn't provide the limits if we have a specific flavor) | 15:57 |
| sean-k-mooney | it's a little tricky because you can change the limits via placement | 15:57 |
| bauzas | well, not really 'how' but 'where' | 15:57 |
| bauzas | yeah | 15:58 |
| sean-k-mooney | so there are 2 separate things: there are the numa topology constraints which we can create from flavor and image metadata | 15:58 |
| sean-k-mooney | and separately there are the limits for the allocation ratios | 15:58 |
| opendevreview | huangjs3 proposed openstack/nova master: cpu monitor: split zero/negative cputime paths https://review.opendev.org/c/openstack/nova/+/993205 | 15:59 |
| sean-k-mooney | i think we get those in the provider summaries | 15:59 |
| sean-k-mooney | but i think the meeting is starting now-ish? | 15:59 |
| sean-k-mooney | so we can pick this up after | 15:59 |
| Uggla | #startmeeting nova | 15:59 |
| opendevmeet | Meeting started Mon Jun 15 15:59:58 2026 UTC and is due to finish in 60 minutes. The chair is Uggla. Information about MeetBot at http://wiki.debian.org/MeetBot. | 15:59 |
| opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 15:59 |
| opendevmeet | The meeting name has been set to 'nova' | 15:59 |
| Uggla | Hello everyone | 16:00 |
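
The rule sean-k-mooney describes in the log (a flavor that requests `hw:numa_nodes` is only a usable NUMA flavor today if it also sets `hw:mem_page_size`) can be sketched as a small check. This is an illustration only: the helper name `numa_flavor_has_page_size` is hypothetical and is not a nova API, and the sketch looks only at flavor extra specs, ignoring image properties and any other way a NUMA topology can be requested.

```python
def numa_flavor_has_page_size(extra_specs: dict[str, str]) -> bool:
    """Hypothetical check, not nova code.

    Returns False only for the case discussed in the log: the flavor asks
    for NUMA nodes but sets no hw:mem_page_size, so memory would not be
    tracked per NUMA node.
    """
    requests_numa = "hw:numa_nodes" in extra_specs
    has_page_size = "hw:mem_page_size" in extra_specs
    return (not requests_numa) or has_page_size


# The flavor from the bug report: NUMA requested, no page size.
bug_flavor = {"hw:numa_nodes": "1"}
# Same flavor with a page size set (any value counts, per the log).
tracked_flavor = {"hw:numa_nodes": "1", "hw:mem_page_size": "small"}

print(numa_flavor_has_page_size(bug_flavor))
print(numa_flavor_has_page_size(tracked_flavor))
```

Whether such a check should reject the flavor, warn, or default the page size is exactly the open upgrade question raised in the discussion, so the sketch deliberately only reports the condition.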