opendevreview | Yusuke Okada proposed openstack/nova master: Fix failed count for anti-affinity check https://review.opendev.org/c/openstack/nova/+/873216 | 03:49 |
opendevreview | Yusuke Okada proposed openstack/nova master: Fix failed count for anti-affinity check https://review.opendev.org/c/openstack/nova/+/873216 | 04:10 |
*** blarnath is now known as d34dh0r53 | 06:56 | |
opendevreview | Jorge San Emeterio proposed openstack/nova master: WIP: Look for cpu controller on cgroups v2 https://review.opendev.org/c/openstack/nova/+/873127 | 08:28 |
gibi | bauzas: I'm starting to think that the functional test failure somehow changed behavior just because we added logging. | 08:50 |
bauzas | really ? | 08:50 |
bauzas | gibi: btw. saw your highlight yesterday, thanks, haven't had time yet to look at my series but I can surely rush for the nits | 08:50 |
gibi | we rechecked it through multiple days | 08:50 |
gibi | without hit | 08:51 |
bauzas | gibi: then we should merge it and see whether it magically solves our problem | 08:51 |
bauzas | gibi: we could prepare a revert | 08:51 |
gibi | lol :D | 08:51 |
bauzas | we're not really at risk for Feature Freeze and we have time to revert before RC1 | 08:52 |
bauzas | but, | 08:52 |
gibi | OK, let me push a new PS to clean things a bit up | 08:52 |
gibi | then I'm OK to merge it with a pending revert | 08:52 |
bauzas | if that solves the problem, then honestly, I don't know what to say | 08:52 |
gibi | yeah, I feel the same | 08:53 |
bauzas | we could merge some log saying "meh, don't be afraid, we love you" | 08:53 |
bauzas | gibi: btw. you know that our master branch is broken broken ? (c) elodilles | 09:00 |
gibi | bauzas: that is news to me | 09:01 |
bauzas | I'm going to recheck https://review.opendev.org/c/openstack/tempest/+/873300 after looking up the root cause of the CI failure | 09:01 |
bauzas | gibi: tl;dr when dan fixed the image caching issue in Tempest, unfortunately we didn't have coverage, so we regressed | 09:03 |
bauzas | I was on the train with a limited connection yesterday so I hadn't sent a status email, but basically our ceph-multistore job is unhappy | 09:03 |
bauzas | the tempest fix was accepted quickly, so now there is no need to send a signal to the community, but I just hope we won't face problems merging it | 09:04 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Add logging to find test cases leaking libvirt threads https://review.opendev.org/c/openstack/nova/+/872975 | 09:09 |
gibi | bauzas: pimped out ^^ | 09:09 |
bauzas | lol | 09:10 |
gibi | bauzas: ack, thanks for the summary | 09:10 |
bauzas | it brightens | 09:10 |
bauzas | gibi: sean-k-mooney is on PTO today until wed, we need a second core | 09:11 |
gibi | hm mh | 09:12 |
* bauzas is short in hands :) | 09:12 | |
sean-k-mooney[m] | i am but im reviewing your pm series | 09:12 |
sean-k-mooney[m] | right now | 09:12 |
sean-k-mooney[m] | so if you want me to look at something quickly i can | 09:12 |
bauzas | sean-k-mooney: oh, as you can | 09:12 |
gibi | sean-k-mooney[m]: just blindly merge https://review.opendev.org/c/openstack/nova/+/872975 please :D this is still trying to catch the functional tc | 09:13 |
gibi | that cause the libvirt import error in a later test | 09:13 |
bauzas | sean-k-mooney: for the PM series, I'll update the 3rd patch and work on a FUP given the method rename discussed in patch #2 | 09:13 |
gibi | after a couple of days of constant rechecks we did not get a hit | 09:13 |
gibi | so we think if we merge the extra log it will fix the gate (kidding) | 09:14 |
gibi | bauzas: one thing hit me during the night and now I remember it. If we rename get_online_cpus and return the offlined ones as well, then I think cpu_shared_set will allow listing offline CPUs without nova rejecting it, while in the past such a config was rejected | 09:15 |
sean-k-mooney[m] | hehe, ah the famous add-a-debug-line-to-fix-the-race technique | 09:16 |
sean-k-mooney[m] | there are many | 09:16 |
bauzas | technically this is a warning line :D | 09:16 |
bauzas | gibi: I need to look at that code, tbh | 09:16 |
sean-k-mooney[m] | *many C programs that are fixed with strategically placed printfs | 09:16 |
gibi | sean-k-mooney[m]: it is a heisenbug so we merge the observer into the system and hope that it won't get entangled with it | 09:17 |
bauzas | lol | 09:17 |
gibi | but I know my fate | 09:17 |
bauzas | we need a cat | 09:17 |
* gibi needs cats | 09:17 | |
bauzas | I'd rather say that the likelihood of the race condition is actually low | 09:18 |
bauzas | and we were blinded by the false positives | 09:18 |
gibi | maybe the race condition is afraid of cats ... | 09:18 |
bauzas | (just a theory) | 09:18 |
bauzas | cats don't like races | 09:19 |
bauzas | they prefer to sit down and lick their bottoms | 09:19 |
sean-k-mooney[m] | wont that break tempest | 09:19 |
bauzas | tempest is already broken :) | 09:20 |
gibi | sean-k-mooney[m]: in which way? | 09:20 |
bauzas | that's another fun story | 09:20 |
sean-k-mooney[m] | https://review.opendev.org/c/openstack/nova/+/872975/7/nova/virt/libvirt/driver.py | 09:20 |
sean-k-mooney[m] | if we ever do an abort in tempest | 09:20 |
* bauzas gets his 3rd shot of coffee after 1 hour. The day is about to be huge | 09:21 | |
sean-k-mooney[m] | then wont that either crash on parent being none or keep looping | 09:21 |
bauzas | you mean for the real tempest checks ? | 09:21 |
sean-k-mooney[m] | ya if we test live migration abort | 09:22 |
sean-k-mooney[m] | like we never set testcase_id in real code in the eventlet | 09:23 |
bauzas | that's a good point | 09:23 |
bauzas | gibi: ^ | 09:23 |
bauzas | previously, I restricted zuul to only run the functests | 09:23 |
bauzas | but if we merge this, it will be run for *all* tests | 09:24 |
sean-k-mooney[m] | its a simple fix, commented inline | 09:25 |
sean-k-mooney[m] | we just need to check that current is not None | 09:25 |
sean-k-mooney[m] | and only log if its not None | 09:25 |
sean-k-mooney[m] | so two None checks, in the while and around the log, and it will work without breaking abort | 09:26 |
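A minimal sketch of the guard sean-k-mooney describes (hypothetical names, not the actual nova patch): when the logging code walks a chain of objects via a `.parent` link, both the loop condition and the log call must tolerate `None`, otherwise a real (non-test) run, where the test-case attribute was never set, crashes or loops forever.

```python
class Node:
    """Stand-in for whatever object carries a .parent link (hypothetical)."""
    def __init__(self, name=None, parent=None):
        self.name = name
        self.parent = parent


def log_chain(node):
    """Walk node.parent links, collecting names, stopping safely at None."""
    logged = []
    current = node
    while current is not None:        # first None check: guard the loop itself
        if current.name is not None:  # second None check: guard the "log"
            logged.append(current.name)
        current = current.parent      # terminates: the root has parent=None
    return logged
```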
gibi | good point | 09:27 |
gibi | fixing it up... | 09:27 |
bauzas | gibi: working on the alternative for the PM series patch #2 | 09:51 |
bauzas | it won't be a fup | 09:51 |
opendevreview | Balazs Gibizer proposed openstack/nova master: Add logging to find test cases leaking libvirt threads https://review.opendev.org/c/openstack/nova/+/872975 | 09:52 |
gibi | bauzas: sean-k-mooney[m]: ^^ | 09:52 |
gibi | bauzas: ack | 09:52 |
bauzas | gibi: +2d | 09:53 |
sean-k-mooney[m] | +2w | 09:54 |
sean-k-mooney[m] | also sylvain i completed a pass on the pm series | 09:54 |
sean-k-mooney[m] | nothing major but if you're reworking it anyway please take a look | 09:55 |
sean-k-mooney[m] | i dont have any comments really worth holding the series over | 09:55 |
sean-k-mooney[m] | but perhaps things to think about in followups | 09:55 |
sean-k-mooney[m] | ok im going to have a coffee, check on freya, then play some factorio. my ipad will be nearby so if there is anything else just ping, but im mostly done for the day now | 09:57 |
bauzas | sean-k-mooney: ack, very much appreciated | 09:57 |
sean-k-mooney[m] | ill laugh if after all that this fails to merge in the gate because the func test fails | 09:58 |
bauzas | our gate is in the weeds either way due to https://review.opendev.org/c/openstack/tempest/+/873300 still not merged | 10:00 |
gibi | factorio++ :) | 10:04 |
gibi | sean-k-mooney[m]: yeah, that has a chance :) | 10:04 |
opendevreview | Sofia Enriquez proposed openstack/nova master: Implement encryption on backingStore https://review.opendev.org/c/openstack/nova/+/870012 | 10:12 |
bauzas | TIL about factorio | 10:17 |
* bauzas feels old | 10:17 | |
* bauzas stayed with Starcraft and C&C | 10:22 | |
gibi | bauzas: be careful with factorio, it can feel like work :D | 10:24 |
gibi | sweet sweet productive work :D | 10:25 |
bauzas | I won't install it now, that's too risky | 10:25 |
bauzas | but I used to love playing RTS | 10:25 |
elodilles | bauzas: SC1 or SC2? :D | 10:28 |
bauzas | elodilles: dude, I'm 42, what do you expect ? | 10:28 |
elodilles | bauzas: 'same applies here' ;) | 10:29 |
gibi | SC1 2vs2 battle net was nice | 10:30 |
gibi | we tried SC2 2vs2 but it did not feel the same. (we probably got old) | 10:30 |
sean-k-mooney | its much simpler to get into a productive flow with factorio | 10:30 |
sean-k-mooney | that said, did you know there are prometheus exporters for factorio so you can visualise your factory in grafana | 10:31 |
bauzas | and when I say C&C, I really mean C&C 1 | 10:31 |
bauzas | and Red Alert | 10:31 |
bauzas | (mostly Red Alert actually) | 10:31 |
elodilles | :) | 10:31 |
bauzas | gibi: LAN parties with Starcraft 2x2 were gorgeous indeed | 10:31 |
elodilles | I'm so old that i time to time play WarCraft1 campaign just for "fun" >:D | 10:32 |
sean-k-mooney | people have literally configured real monitoring systems to monitor their virtual factory and raise alerts when you run out of ore.... that is more investment than i can put into any game | 10:32 |
* gibi is wondering if SC1 battle net is still running or not | 10:32 | |
bauzas | sean-k-mooney: oh man, that's way too addictive | 10:32 |
sean-k-mooney | bauzas: you should play it its fun | 10:33 |
sean-k-mooney | just make sure you have good posture when you do because you might blink and realise its been 3 hours without moving | 10:34 |
gibi | sean-k-mooney: if we lose bauzas to factorio then you need to be the next nova PTL as a punishment :) | 10:38 |
sean-k-mooney | lol | 10:38 |
* bauzas can't remember the name of the online game he spent too much time on in the 2000s, where it was about building starships and other things | 10:38 |
sean-k-mooney | surprisingly that does not narrow it down much | 10:39 |
bauzas | Ogame, got it | 10:40 |
bauzas | that one stole too much of my free time | 10:41 |
bauzas | gibi: about https://review.opendev.org/c/openstack/nova/+/821228/6/nova/virt/libvirt/host.py#745 I wonder whether we really need to call *again* getCPUMap() | 10:49 |
bauzas | gibi: the existing logic just returns the cpus blindly | 10:49 |
bauzas | but I see your point | 10:50 |
bauzas | nevermind | 10:50 |
gibi | bauzas: you probably still need to differentiate between non existent cpu ids in the dedicated_set and existing but offlined cpus | 10:51 |
bauzas | yeah, I'm about to add a get_available_cpus() which will return all CPUs from the map | 10:52 |
gibi | yeah that will work | 10:53 |
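A sketch of how the helper bauzas describes could sit next to an online-only variant, assuming libvirt's getCPUMap() triple of (total CPUs, per-CPU online flags, online count) shown in the REPL paste further down; FakeConn is a hypothetical stand-in for a real `libvirt.open()` connection:

```python
def get_available_cpus(conn):
    """All host CPU ids known to libvirt, whether online or not."""
    total, _cpu_map, _online_count = conn.getCPUMap()
    return set(range(total))


def get_online_cpus(conn):
    """Only the CPU ids libvirt reports as online."""
    _total, cpu_map, _online_count = conn.getCPUMap()
    return {i for i, online in enumerate(cpu_map) if online}


class FakeConn:
    """Stand-in for a libvirt connection; models a host with cpu2 offline."""
    def getCPUMap(self):
        return (4, [True, True, False, True], 3)
```

With this split, callers can tell "nonexistent CPU id" (not in the available set) apart from "existing but offlined" (available but not online).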
kashyap | bauzas: What exactly does getCPUMap() fetch? The docs only say "Get node CPU information" | 10:59 |
kashyap | Ah, it maps to `virsh cpu-stats` | 11:00 |
bauzas | yup | 11:00 |
kashyap | Oh, interesting. When I run `virsh cpu-stats` on my Fedora 36 VM for a guest, it gives me: | 11:01 |
bauzas | https://libvirt.org/html/libvirt-libvirt-host.html#virNodeGetCPUMap | 11:01 |
kashyap | $> sudo virsh cpu-stats 1 | 11:01 |
kashyap | error: Failed to retrieve CPU statistics for domain 'el8-vm1' | 11:01 |
kashyap | error: Operation not supported: operation 'getCpuacctPercpuUsage' not supported for backend 'cgroup V2' | 11:01 |
bauzas | [sbauza@sbauza temp]$ python | 11:01 |
bauzas | Python 3.11.1 (main, Jan 6 2023, 00:00:00) [GCC 12.2.1 20221121 (Red Hat 12.2.1-4)] on linux | 11:01 |
bauzas | Type "help", "copyright", "credits" or "license" for more information. | 11:01 |
bauzas | >>> import libvirt | 11:01 |
bauzas | >>> conn = libvirt.open('qemu:///system') | 11:01 |
kashyap | bauzas: Yeah, was reading. It looks like there's some accounting to be done w.r.t CGroups version | 11:01 |
bauzas | >>> conn.getCPUMap() | 11:01 |
bauzas | (8, [True, True, True, True, True, True, True, True], 8) | 11:01 |
kashyap | Yep | 11:01 |
* kashyap --> back in a bit; will read back | 11:02 | |
sean-k-mooney | you can also just get the online cores from sysfs | 11:10 |
sean-k-mooney | if this is an issue, but im fine with using libvirt's api for this | 11:11 |
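For reference, sysfs exposes the online set at /sys/devices/system/cpu/online in the kernel's cpulist format (e.g. "0-3,5"); a minimal parser as an illustration, not arbiterd's or nova's actual code:

```python
def parse_cpulist(cpulist):
    """Parse a kernel cpulist string such as '0-3,5,7-8' into a set of ids."""
    cpus = set()
    for part in cpulist.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def get_online_cpus_sysfs():
    """Read the online CPU set from sysfs (Linux only)."""
    with open("/sys/devices/system/cpu/online") as f:
        return parse_cpulist(f.read())
```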
bauzas | I'm done with the new rev, just updating the upper patch now | 11:16 |
sean-k-mooney | my steamdeck is charging so im currently still at my work laptop (playing factorio). so gibi, if you're happy to review bauzas' series and it looks good to you, ping me when you're done and i can then do a final pass over it quickly and we can likely merge it today. with that said, i have a doctor's appointment in a little over 3 hours so i'll be away after that. | 11:24 |
bauzas | that's a love | 11:24 |
auniyal | O/ | 11:24 |
* bauzas just testing the shutil/cleanup thing on a py38env | 11:24 | |
gibi | sean-k-mooney: ack | 11:24 |
auniyal | in devstack is there a way to see nova-manage logs | 11:25 |
bauzas | I'm honestly torn, I don't know whether we should really pay attention to ignoring errors | 11:25 |
bauzas | and whether we should be cautious | 11:25 |
bauzas | honestly, we create a temp dir, so I don't expect problems besides the full disk problem | 11:25 |
sean-k-mooney | you should not get any now with the way the fixture works | 11:26 |
bauzas | so I'll switch to using the .cleanup() method | 11:26 |
bauzas | and meh | 11:26 |
bauzas | meh to ignoring errors | 11:26 |
gibi | I'm fine ignoring errors | 11:26 |
gibi | during delete of a temp dir | 11:26 |
gibi | we will never reuse the temp dir as it has a random postfix | 11:27 |
sean-k-mooney | the flag is only there in 3.10 | 11:27 |
gibi | if my /tmp fills up that is on me | 11:27 |
sean-k-mooney | and we need to support 3.8 | 11:27 |
gibi | aah | 11:27 |
bauzas | gibi: that's the problem | 11:27 |
sean-k-mooney | so its fine for it to error | 11:27 |
sean-k-mooney | it wont | 11:27 |
gibi | ahh | 11:27 |
bauzas | we can't say 'ignore errors' if we use cleanup | 11:27 |
bauzas | hence me torn | 11:27 |
sean-k-mooney | it could have errored when we had the copy of sysfs because of some permission | 11:27 |
gibi | I'm fine both ways | 11:27 |
sean-k-mooney | but now its just normal files owned by us | 11:27 |
bauzas | correct | 11:27 |
gibi | if it starts failing on cleanup then we will switch to shutil | 11:28 |
sean-k-mooney | so it wont error unless there is a disk issue, which is out of scope | 11:28 |
bauzas | yup, this ^ | 11:28 |
sean-k-mooney | +1 | 11:28 |
bauzas | I'll add a comment explaining the risk and how to mitigate it if we see it in CI | 11:28 |
gibi | cool | 11:29 |
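The trade-off being discussed, in sketch form: `TemporaryDirectory(ignore_cleanup_errors=True)` only exists on Python 3.10+, so with 3.8 support the portable way to swallow deletion errors is `shutil.rmtree(..., ignore_errors=True)` on a `mkdtemp()` path (the prefix below is illustrative):

```python
import os
import shutil
import tempfile

# Create a private scratch dir; mkdtemp() gives it a random suffix, so it
# is never reused even if a previous run failed to delete it.
tmp = tempfile.mkdtemp(prefix="nova-test-")
with open(os.path.join(tmp, "scratch.txt"), "w") as f:
    f.write("fixture data")

# Portable on 3.8: never raises, even if something inside resists deletion.
# (The 3.10+ spelling would be
#  tempfile.TemporaryDirectory(ignore_cleanup_errors=True).)
shutil.rmtree(tmp, ignore_errors=True)
```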
bauzas | sean-k-mooney: gibi: I haven't yet written the docs patch as it requires a bit of effort, so I'll keep your comments on it unresolved but don't misunderstand me, surely I'll do it in a subsequent patch once I'm done (monday morning hopefully) | 11:39 |
bauzas | I just wanna give other people a chance to get reviewed | 11:39 |
gibi | bauzas: sure, doc is OK after FF | 11:40 |
sean-k-mooney | ya it can be a separate patch | 11:40 |
bauzas | the core #0 note is actually very important | 11:40 |
bauzas | TIL about it | 11:40 |
bauzas | but yeah that makes sense from an OS perspective | 11:41 |
bauzas | you always rely on that core to be available | 11:41 |
bauzas | that's the most portable assumption | 11:41 |
sean-k-mooney | ya you generally need one core that can always be used to handle interrupts and, you know, turn on the others | 11:42 |
sean-k-mooney | bauzas: that is why there is no online file in /sys/bus/cpu/devices/cpu0/ | 11:43 |
sean-k-mooney | but there is in all the rest | 11:43 |
bauzas | ooooooooh | 11:44 |
bauzas | but you can create an online file and write 0 into it for core #0, right ? | 11:44 |
bauzas | that's quite a destructive CPU equivalent of disk's rm -rf / | 11:45 |
bauzas | except it's stateless | 11:45 |
sean-k-mooney | it will be ignored | 11:49 |
sean-k-mooney | you can create that file but it wont do anything as far as i am aware | 11:49 |
sean-k-mooney | by the way https://github.com/SeanMooney/arbiterd/blob/master/src/arbiterd/common/cpu.py#L58-L62 | 11:50 |
sean-k-mooney | is how i got the aviable cpus | 11:51 |
sean-k-mooney | gibi: bauzas actully also for context | 11:52 |
sean-k-mooney | https://github.com/SeanMooney/arbiterd/blob/master/src/arbiterd/common/cpu.py#L106-L107 | 11:52 |
sean-k-mooney | that default 1 was because of this | 11:52 |
bauzas | gtk | 11:52 |
sean-k-mooney | https://github.com/SeanMooney/arbiterd/blob/master/src/arbiterd/common/cpu.py#L106-L114 | 11:52 |
sean-k-mooney | thats also why i did the get_online check in set online | 11:52 |
sean-k-mooney | to not do the write | 11:52 |
sean-k-mooney | i probably should have left a code comment for that... | 11:52 |
sean-k-mooney | you removed that optimisation, but that was actually the real reason i did it in the poc. it was not an optimisation | 11:53 |
sean-k-mooney | it was to work around the cpu0 weirdness | 11:53 |
sean-k-mooney | although to be fair i didnt fix that on the set offline path | 11:54 |
sean-k-mooney | so meh | 11:54 |
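sean-k-mooney's point in sketch form (a hypothetical helper, not arbiterd's exact code): cpu0 simply has no `online` file under /sys/bus/cpu/devices/cpu0/, so a tolerant setter treats a missing file as "cannot toggle" rather than an error. The `sysfs_root` parameter exists only so the sketch can be exercised against a fake tree.

```python
import os


def set_cpu_online(cpu_id, online, sysfs_root="/sys/bus/cpu/devices"):
    """Toggle one CPU via its sysfs 'online' file.

    Returns False when the file does not exist, as for cpu0, which the
    kernel keeps online to handle interrupts and wake the other cores.
    """
    path = os.path.join(sysfs_root, "cpu%d" % cpu_id, "online")
    if not os.path.exists(path):
        return False
    with open(path, "w") as f:
        f.write("1" if online else "0")
    return True
```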
* gibi goes gets lunch | 11:58 | |
opendevreview | Sylvain Bauza proposed openstack/nova master: libvirt: let CPUs be power managed https://review.opendev.org/c/openstack/nova/+/821228 | 12:04 |
opendevreview | Sylvain Bauza proposed openstack/nova master: Enable cpus when an instance is spawning https://review.opendev.org/c/openstack/nova/+/868237 | 12:04 |
bauzas | gibi: sean-k-mooney: ^ | 12:04 |
* bauzas goes to lunch too | 12:04 | |
bauzas | and after that, will take my pen for reviewing series | 12:04 |
* gibi is back | 13:28 | |
gibi | bauzas: on it | 13:28 |
bauzas | ack | 13:28 |
* bauzas starts to look at https://etherpad.opendev.org/p/nova-antelope-blueprint-status | 13:28 | |
opendevreview | Andre Aranha proposed openstack/nova stable/yoga: [stable-only] Test setting the nova job to centos-9-stream https://review.opendev.org/c/openstack/nova/+/860087 | 13:32 |
gibi | bauzas: I'm +2 +A on the power management series | 13:36 |
bauzas | gibi: thanks | 13:37 |
gibi | it was a self contained patch series with well-split commits, so it was a pleasure to review | 13:38 |
bauzas | I'll add in the etherpad the promised docs patch | 13:38 |
gibi | I will check the evac fup now | 13:41 |
*** dasm|off is now known as dasm | 13:49 | |
gibi | bauzas: I'm not sure https://review.opendev.org/q/topic:privsep-usage-review is in a landable state | 13:54 |
gibi | I think the two patches there are only pre-reqs for the real move but I can be mistaken | 13:55 |
bauzas | I need to open those links | 13:55 |
bauzas | ideally, I'd like a migration plan | 13:55 |
bauzas | something we could merge on a step way | 13:56 |
bauzas | but yeah, I assume this blueprint would be marked Complete once we pull all the callers out of the privsep.py modules | 13:56 |
sean-k-mooney | they can be merged incrementally but i dont think there is enough done in those to see a benefit in A | 13:57 |
sean-k-mooney | i would probably wait till early next cycle | 13:57 |
sean-k-mooney | that said i have not really reviewed it so there might be more there than i think | 13:58 |
sean-k-mooney | i would not rush it however | 13:58 |
bauzas | yeah maybe | 13:58 |
bauzas | shit. https://review.opendev.org/c/openstack/tempest/+/873300 got trampled again | 13:58 |
bauzas | our gate is still onhold | 13:58 |
gibi | bauzas: there's a global requirements bump https://review.opendev.org/c/openstack/requirements/+/872065 that is RED due to nova. And it blocks bumping os-traits to 2.10.0 in global requirements, which in turn blocks the manila series and the https://review.opendev.org/q/topic:bp%252Flibvirt-maxphysaddr-support impl | 14:02 |
bauzas | damn shit. | 14:03 |
bauzas | we're constructing a pile of cards | 14:03 |
gibi | yeah | 14:04 |
bauzas | wait | 14:04 |
bauzas | https://effe4ed80a91c92fc386-ef4309a852fb3e3584cdcb1adbb4ea34.ssl.cf2.rackcdn.com/872065/4/check/cross-nova-py310/da20743/testr_results.html | 14:04 |
bauzas | 2023-02-07 15:57:42,351 ERROR [nova.privsep.utils] Error on '.' while checking direct I/O: '' | 14:04 |
bauzas | I have no idea about what would cause this | 14:04 |
gibi | I did not look into that failure | 14:05 |
bauzas | after 10 years, I still discover new areas in Nova | 14:05 |
bauzas | gibi: ouch, see this ? https://review.opendev.org/c/openstack/requirements/+/872065/4/upper-constraints.txt#207 | 14:07 |
gibi | I counted > 20 major bumps in that patch but I did not notice that libvirt-python is one of them | 14:08 |
gibi | so shit++ | 14:08 |
gibi | I don't like that big bump patch | 14:08 |
gibi | it moves too many things at once | 14:08 |
gibi | too close to RC1 | 14:08 |
kashyap | gibi: It's not merged yet, right | 14:09 |
kashyap | gibi: Yeah, I just see the 'libvirt-python' bump sneaked in there | 14:09 |
gibi | kashyap: right, it is failed on CI | 14:09 |
gibi | kashyap: but even if it clears CI I smell trouble | 14:09 |
kashyap | Yeah, I'm adding a quick review comment | 14:10 |
bauzas | I made a clear statement already | 14:11 |
bauzas | gibi: we shouldn't wait for this massive reqs update for os-traits | 14:12 |
kashyap | bauzas: Aaah, I missed your comment | 14:12 |
gibi | bauzas: we can try to propose a separate bump that only moves os-traits | 14:12 |
bauzas | gibi: want me to propose some u-c update for os-traits ? | 14:12 |
kashyap | Ah, we wrote the comment at the same moment (3:11 PM) | 14:12 |
gibi | bauzas: yeah, go ahead, I can +1 it | 14:13 |
bauzas | https://review.opendev.org/c/openstack/requirements/+/872065/4/upper-constraints.txt#361 | 14:13 |
bauzas | gibi: fwiw, os-traits isn't upgraded in this patch | 14:13 |
gibi | bauzas: yeah that patch wasn't regenerated since os-traits 2.10 landed | 14:13 |
gibi | I think it will be regenerated eventually but we don't need to wait for it | 14:13 |
bauzas | on it then | 14:13 |
gibi | thanks | 14:13 |
gibi | once we have 2.10 in global reqs we need a bump in placement too as we want to release placement with the latest os-traits | 14:14 |
* bauzas is a bit rusty on proposing requirements patches but surely I can do | 14:14 | |
bauzas | why isn't the bot proposing a new release ? https://review.opendev.org/c/openstack/requirements/+/854821 | 14:15 |
gibi | bauzas: maybe because it is waiting for the other bump to land? | 14:16 |
gibi | I'm not sure | 14:17 |
gibi | elodilles: ^^ | 14:17 |
gibi | bauzas: the spice image compression patch https://review.opendev.org/c/openstack/nova/+/828675 is nicely written and easy to review. I | 14:19 |
gibi | I | 14:19 |
gibi | I | 14:19 |
gibi | I'm +2 on it | 14:19 |
gibi | (too much coffee...) | 14:19 |
bauzas | gibi: ok, then, I'll take a look on it | 14:20 |
gibi | I think it is an easy win :) | 14:20 |
bauzas | I like easy wins | 14:20 |
bauzas | our antelope release will have bad numbers so anything that helps to improve our numbers is good :) | 14:20 |
gibi | and I jumped the gun on the manila series regarding the os-traits release. The manila series only needs os-traits 2.8.0 so that is OK. So only the scaphandre and the max_phy_bits series are blocked on os-traits 2.10.0 | 14:22 |
bauzas | that is correct | 14:23 |
bauzas | and since the scaphandre series actually relies on manila... | 14:23 |
bauzas | well, on virtiofs support coming from the manila series... | 14:24 |
gibi | yeah | 14:24 |
bauzas | gibi: as a wrap-up | 14:28 |
bauzas | I just discussed with the releases team | 14:28 |
bauzas | and the releases post-merge jobs are failing since Feb3 | 14:28 |
bauzas | in other words, os-vif and os-traits aren't formally released yet | 14:29 |
bauzas | despite the releases patch got merged | 14:29 |
gibi | ohh nice | 14:40 |
gibi | I see https://pypi.org/project/os-traits/ only has 2.9.0 | 14:46 |
bauzas | gibi: join the openstack-releases channel if you wanna get some fun | 14:47 |
gibi | I'm in :) but don't need more fun | 14:47 |
bauzas | gibi: +Wd the image compression patch | 15:24 |
bauzas | yay | 15:24 |
gibi | it won't land due to the ceph job block but still it is an achievement \o/ | 15:24 |
bauzas | yeah, and I'll chase the rechecks | 15:28 |
dansmith | I think zuul is down | 15:50 |
opendevreview | Merged openstack/python-novaclient master: Bump microversion to 2.95 https://review.opendev.org/c/openstack/python-novaclient/+/872418 | 15:52 |
dansmith | s'back | 15:54 |
bauzas | dansmith: there were problems with post-release pipelines | 15:56 |
bauzas | hence the zuul restart | 15:56 |
dansmith | yeah | 15:56 |
bauzas | dansmith: gmann: we'll get yet another failure for your tempest image fix https://zuul.openstack.org/status#873300 | 15:58 |
dansmith | yep | 15:58 |
opendevreview | Jorge San Emeterio proposed openstack/nova master: WIP: Look for cpu controller on cgroups v2 https://review.opendev.org/c/openstack/nova/+/873127 | 15:58 |
bauzas | melwitt: could you please remove your -2 on https://review.opendev.org/c/openstack/nova/+/863177 (the ironic-vnc-console blueprint got accepted for the cycle) | 16:02 |
bauzas | gibi: I think we rounded on all the possible blueprints we have | 16:06 |
gibi | bauzas: yeah, I don't have brainpower to look at the ironic vnc one | 16:07 |
bauzas | we can try to take a look at https://review.opendev.org/c/openstack/nova/+/863177 possibly but sounds a bit optimistic | 16:07 |
gibi | yeah | 16:07 |
bauzas | gmann: planning to progress on https://review.opendev.org/c/openstack/nova/+/864594 ? | 16:08 |
gibi | I think I won't start anything big any more today but I'm still around for a bit if specific review is needed | 16:08 |
bauzas | me too, I'm done for today | 16:08 |
bauzas | I'll start reviewing the maxphysaddr series today, but I'm not an expert in this | 16:08 |
gibi | ohh that one I can take a look | 16:10 |
gibi | I reviewd the spec there | 16:10 |
spatel | sean-k-mooney Hi, I have a question related to HugePages; currently i am running SRIOV + CPU pinning + HugePages | 16:12 |
spatel | Let's say i don't want to use HugePages. Is that possible? | 16:13 |
bauzas | spatel: sean-k-mooney is on PTO until wed | 16:13 |
bauzas | lemme try to answer you | 16:13 |
bauzas | spatel: yes, it's possible to have CPU pinning without huge pages | 16:14 |
bauzas | but I guess you have running workloads ? | 16:14 |
spatel | Yes | 16:15 |
spatel | I am deploying new cloud and planning to not use HugePage. We had some incident in past related memory cause strange issue. | 16:15 |
bauzas | ok, so, do you want to turn off hugepages for all your computes but a subset ? | 16:15 |
bauzas | ah | 16:15 |
spatel | New cloud with no HugePage at all.. | 16:16 |
bauzas | ok, then you just need to use flavors that don't request hugepages | 16:16 |
spatel | This is what i have currently in my cloud - intel_iommu=on iommu=pt hugepagesz=2M hugepages=30000 transparent_hugepage=never | 16:16 |
spatel | Thinking to remove hugepages and change flavor | 16:16 |
bauzas | https://docs.openstack.org/nova/latest/admin/cpu-topologies.html | 16:17 |
bauzas | I need to verify one bit, sec | 16:18 |
spatel | sure! | 16:18 |
bauzas | https://docs.openstack.org/nova/latest/admin/huge-pages.html | 16:19 |
bauzas | so, say you no longer ask for hugepages, it won't request a NUMA topology | 16:19 |
spatel | That was my next question.. How does NUMA play with HugePages? | 16:20 |
spatel | We want our workload schedule in single NUMA zone | 16:20 |
bauzas | so you want NUMA without hugepages | 16:21 |
spatel | Yes | 16:22 |
spatel | I believe openstack automatically schedules workloads according to NUMA, correct? | 16:23 |
bauzas | see that doc https://docs.openstack.org/nova/latest/admin/cpu-topologies.html#customizing-instance-numa-placement-policies | 16:23 |
gibi | bauzas: the max_phy_address patch is just the start of the series. we will need more patches there. | 16:23 |
bauzas | either you explicitly specify a NUMA topology for your guest or you make it implicit with cpu pinning or hugepages flavor extra specs | 16:24 |
bauzas | gibi: ack | 16:24 |
spatel | This is my flavor properties - hw:cpu_policy='dedicated', hw:cpu_sockets='2', hw:cpu_threads='2', hw:mem_page_size='large', hw:pci_numa_affinity_policy='preferred', sriov='true' | 16:24 |
bauzas | spatel: so, you'd just remove the mention of using large pages | 16:24 |
spatel | I didn't tell my workload where to schedule but it always puts my VMs on a single NUMA zone | 16:24 |
spatel | Yep! that is what i am thinking, remove from grub and flavor and it should be fine. | 16:25 |
bauzas | the fact it goes to the same NUMA cell is because of the policy https://docs.openstack.org/nova/latest/configuration/extra-specs.html#hw:pci_numa_affinity_policy | 16:26 |
opendevreview | Alexey Stupnikov proposed openstack/nova master: Fix logging in MemEncryption-related checks https://review.opendev.org/c/openstack/nova/+/873388 | 16:27 |
spatel | bauzas worth running some test.. i will pick one compute and try to play and see how it goes | 16:28 |
spatel | I love numa but it has some downside... | 16:28 |
bauzas | spatel: you'll need to modify the nova config, not only grub | 16:28 |
spatel | Yes..i will start with fresh compute node.. i am not going to touch existing one. | 16:29 |
bauzas | actually I'm wrong | 16:29 |
spatel | Ouch!! now what? | 16:29 |
bauzas | no nova conf is required for page management | 16:29 |
spatel | oh! | 16:29 |
bauzas | it just gets it from what we have | 16:29 |
spatel | I can't modify existing VM correct? | 16:30 |
bauzas | spatel: read the docs I gave to you | 16:30 |
bauzas | spatel: no, you can't | 16:30 |
spatel | Perfect! now i got it what to do. | 16:30 |
bauzas | flavor is embedded into the instance data | 16:30 |
bauzas | if you modify a flavor, the instances that booted from that flavor won't magically update | 16:30 |
spatel | I will add fresh compute nodes with no HugePage and create new flavor without Pages | 16:31 |
bauzas | you'll be required to resize with another flavor | 16:31 |
spatel | bauzas I totally understand.. you want just change flavor and it will work magically :) | 16:31 |
spatel | This is what happened last week: one memory module died, which crashed my whole compute node because of the HugePage requirement :( | 16:32 |
bauzas | hah, that's a common failure | 16:34 |
bauzas | and yeah, relying on RAM can be dangerous | 16:34 |
spatel | Yes.. because of that crash it created a loop in my switch (I don't know how, but it locked up my switch because of STP) | 16:35 |
spatel | Just trying to re-produce this issue with multiple variables to see if i can re-create it | 16:35 |
sean-k-mooney[m] | spatel you might want to look at the numa blancing config option | 18:32 |
sean-k-mooney[m] | packing_host_numa_cells_allocation_strategy | 18:32 |
sean-k-mooney[m] | spatel by the way if you are using cpu pinning but not hugepages you should set hw:mem_page_size=small | 18:34 |
sean-k-mooney[m] | if you dont then the vms will randomly get killed due to OOM events | 18:34 |
spatel | hmm is that a new option packing_host_numa_cells_allocation_strategy ? | 18:36 |
spatel | sean-k-mooney[m] this is interesting - by the way if you are using cpu pinning but not hugepages you should set hw:mem_page_size=small | 18:37 |
spatel | does it going to work if i don't configure hugepage in grub? | 18:37 |
sean-k-mooney[m] | packing_host_numa_cells_allocation_strategy is new and we backported it | 18:38 |
sean-k-mooney[m] | i think it was added in zed or yoga; we changed the default to spread this cycle or last | 18:39 |
sean-k-mooney[m] | packing_host_numa_cells_allocation_strategy goes in the compute section of the nova.conf i believe | 18:39 |
sean-k-mooney[m] | https://docs.openstack.org/nova/latest/configuration/config.html#compute.packing_host_numa_cells_allocation_strategy | 18:40 |
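For reference, a hedged nova.conf fragment for the option being discussed (the default has shifted between releases, so check the docs link above before relying on it):

```ini
[compute]
# True  -> pack guests onto host NUMA cells in order (fill cell 0 first)
# False -> spread guests across host NUMA cells (the newer default)
packing_host_numa_cells_allocation_strategy = False
```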
sean-k-mooney[m] | spatel: if you are using cpu pinning, hw:mem_page_size needs to be set to some valid value to turn on numa aware memory allocation | 18:41 |
sean-k-mooney[m] | if you dont we will schedule based on the global, not numa-local, memory | 18:41 |
sean-k-mooney[m] | the OOM reaper in the kernel operates at the numa level | 18:42 |
spatel | ohhhh | 18:42 |
spatel | I know what you are saying.. to run a workload within a NUMA zone we need to set hw:mem_page_size | 18:43 |
sean-k-mooney[m] | so the end result of not setting it is we will overcommit the numa node, since we are only scheduling based on the cpu in that case | 18:43 |
sean-k-mooney[m] | ya basically | 18:44 |
sean-k-mooney[m] | i have wanted to enforce this for a while but there were concerns that operators are depending on the incorrect behavior | 18:44 |
sean-k-mooney[m] | i have wanted to make hw:mem_page_size=any the default if you have a numa topology in the guest and dont set anything | 18:45 |
sean-k-mooney[m] | any is the same as small except it allows you to override it in the image | 18:45 |
spatel | hmmm | 18:47 |
sean-k-mooney[m] | the simple way to think about it is if its a numa vm you should set a mem_page_size as well | 18:51 |
sean-k-mooney[m] | well or use file backed memory but that is not a configuration that many people use | 18:52 |
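Pulling the thread together: spatel's quoted flavor with sean-k-mooney's advice applied, sketched as a Python dict of extra specs (the real change would go through `openstack flavor set`, and the exact keys shown are the ones quoted earlier in the log):

```python
# CPU-pinned flavor without host hugepages: keep hw:mem_page_size set
# ("small" = regular 4K pages) so nova still does NUMA-local memory
# accounting and guests are not silently overcommitted onto one NUMA node.
pinned_no_hugepages = {
    "hw:cpu_policy": "dedicated",                # implies a guest NUMA topology
    "hw:mem_page_size": "small",                 # was "large" with hugepages
    "hw:pci_numa_affinity_policy": "preferred",  # keeps the VM near its SRIOV NIC
}
```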
spatel | I will do it.. | 18:56 |
melwitt | bauzas: done, thanks for reminding | 19:14 |
gmann | bauzas: yes, I am planning to progress on 864594 but let's see if i can push it before FF | 22:00 |
*** dasm is now known as dasm|off | 22:38 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!