*** tosky has quit IRC | 00:11 | |
*** songwenping__ has joined #openstack-nova | 00:32 | |
*** LinPeiWen has joined #openstack-nova | 00:39 | |
*** ociuhandu has joined #openstack-nova | 00:45 | |
*** ociuhandu has quit IRC | 00:50 | |
*** mlavalle has quit IRC | 01:14 | |
*** adriant7 has joined #openstack-nova | 01:23 | |
*** adriant has quit IRC | 01:25 | |
*** adriant7 is now known as adriant | 01:25 | |
*** brinzhang_ has joined #openstack-nova | 01:32 | |
*** brinzhang has quit IRC | 01:35 | |
*** zzzeek has quit IRC | 01:38 | |
*** zzzeek has joined #openstack-nova | 01:39 | |
*** rcernin has joined #openstack-nova | 01:53 | |
*** Hazelesque has quit IRC | 02:04 | |
*** xinranwang has joined #openstack-nova | 02:05 | |
*** Hazelesque has joined #openstack-nova | 02:14 | |
*** mkrai has joined #openstack-nova | 02:20 | |
*** pmannidi has quit IRC | 02:41 | |
openstackgerrit | YuehuiLei proposed openstack/nova-specs master: Add xena directory for specs https://review.opendev.org/c/openstack/nova-specs/+/778604 | 02:51 |
---|---|---|
*** hemanth_n has joined #openstack-nova | 02:51 | |
*** hemanth_n has quit IRC | 02:53 | |
*** hemanth_n has joined #openstack-nova | 02:54 | |
*** rcernin has quit IRC | 03:06 | |
*** spatel has joined #openstack-nova | 03:08 | |
openstackgerrit | Brin Zhang proposed openstack/nova master: Add missed accel_uuids for _poll_shelved_instances https://review.opendev.org/c/openstack/nova/+/778440 | 03:23 |
*** rcernin has joined #openstack-nova | 03:24 | |
*** rcernin has quit IRC | 03:26 | |
*** rcernin has joined #openstack-nova | 03:27 | |
*** lpetrut has joined #openstack-nova | 03:27 | |
*** tbachman has quit IRC | 03:29 | |
*** tbachman has joined #openstack-nova | 03:30 | |
*** vishalmanchanda has joined #openstack-nova | 03:34 | |
*** psachin has joined #openstack-nova | 03:38 | |
*** martinkennelly has quit IRC | 03:43 | |
*** martinkennelly has joined #openstack-nova | 03:43 | |
*** jamesdenton has quit IRC | 03:59 | |
*** jamesdenton has joined #openstack-nova | 03:59 | |
*** spatel has quit IRC | 04:00 | |
*** martinkennelly has quit IRC | 04:18 | |
*** whoami-rajat has joined #openstack-nova | 04:20 | |
*** Underknowledge has quit IRC | 04:40 | |
*** Underknowledge has joined #openstack-nova | 04:40 | |
*** lpetrut has quit IRC | 04:47 | |
*** ratailor has joined #openstack-nova | 04:49 | |
*** xinranwang has quit IRC | 05:05 | |
*** gyee has quit IRC | 05:20 | |
*** yoctozepto has quit IRC | 06:00 | |
*** yoctozepto has joined #openstack-nova | 06:02 | |
*** zzzeek has quit IRC | 06:03 | |
*** gokhani has joined #openstack-nova | 06:13 | |
*** zzzeek has joined #openstack-nova | 06:16 | |
*** zzzeek has quit IRC | 06:25 | |
*** links has joined #openstack-nova | 06:27 | |
*** zzzeek has joined #openstack-nova | 06:43 | |
*** zzzeek has quit IRC | 06:44 | |
*** zzzeek has joined #openstack-nova | 06:45 | |
*** songwenping__ has quit IRC | 06:48 | |
*** songwenping__ has joined #openstack-nova | 06:49 | |
*** rcernin has quit IRC | 07:00 | |
*** rcernin has joined #openstack-nova | 07:14 | |
*** slaweq has joined #openstack-nova | 07:17 | |
*** lpetrut has joined #openstack-nova | 07:22 | |
*** takamatsu has joined #openstack-nova | 07:24 | |
*** rcernin has quit IRC | 07:30 | |
*** luksky has joined #openstack-nova | 07:36 | |
*** ralonsoh has joined #openstack-nova | 07:37 | |
openstackgerrit | Yongli He proposed openstack/nova master: Smartnic support - cyborg drive https://review.opendev.org/c/openstack/nova/+/771362 | 07:41 |
*** dklyle has quit IRC | 07:41 | |
openstackgerrit | Yongli He proposed openstack/nova master: smartnic support https://review.opendev.org/c/openstack/nova/+/758944 | 07:41 |
*** hamalq has joined #openstack-nova | 07:45 | |
*** rcernin has joined #openstack-nova | 07:55 | |
*** lpetrut has quit IRC | 07:56 | |
*** belmoreira has joined #openstack-nova | 07:59 | |
*** rcernin has quit IRC | 08:00 | |
*** hoonetorg has quit IRC | 08:03 | |
*** khomesh24 has joined #openstack-nova | 08:06 | |
*** rpittau|afk is now known as rpittau | 08:08 | |
*** rcernin has joined #openstack-nova | 08:12 | |
*** rcernin has quit IRC | 08:17 | |
*** ociuhandu has joined #openstack-nova | 08:23 | |
*** andrewbonney has joined #openstack-nova | 08:25 | |
*** mkrai has quit IRC | 08:26 | |
*** mkrai has joined #openstack-nova | 08:27 | |
*** lamt_ has joined #openstack-nova | 08:29 | |
*** TheJulia_ has joined #openstack-nova | 08:29 | |
*** flaviof_ has joined #openstack-nova | 08:29 | |
*** bbezak_ has joined #openstack-nova | 08:29 | |
*** fyx_ has joined #openstack-nova | 08:29 | |
*** johnsom_ has joined #openstack-nova | 08:30 | |
*** cz3_ has joined #openstack-nova | 08:30 | |
yonglihe | gibi: hope you have some bandwidth.. | 08:31 |
*** raorn has joined #openstack-nova | 08:32 | |
*** mnasiadka_ has joined #openstack-nova | 08:33 | |
*** rpittau_ has joined #openstack-nova | 08:33 | |
*** jrollen has joined #openstack-nova | 08:34 | |
*** tosky has joined #openstack-nova | 08:36 | |
*** cz3 has quit IRC | 08:39 | |
*** cz3_ is now known as cz3 | 08:39 | |
*** ociuhandu has quit IRC | 08:40 | |
*** ociuhandu has joined #openstack-nova | 08:41 | |
*** mnasiadka has quit IRC | 08:43 | |
*** raorn_ has quit IRC | 08:43 | |
*** lamt has quit IRC | 08:43 | |
*** rpittau has quit IRC | 08:43 | |
*** fyx has quit IRC | 08:43 | |
*** johnsom has quit IRC | 08:43 | |
*** TheJulia has quit IRC | 08:43 | |
*** flaviof has quit IRC | 08:43 | |
*** bbezak has quit IRC | 08:43 | |
*** jroll has quit IRC | 08:43 | |
*** mnasiadka_ is now known as mnasiadka | 08:43 | |
*** lamt_ is now known as lamt | 08:43 | |
*** TheJulia_ is now known as TheJulia | 08:43 | |
*** flaviof_ is now known as flaviof | 08:43 | |
*** bbezak_ is now known as bbezak | 08:43 | |
*** johnsom_ is now known as johnsom | 08:43 | |
*** rpittau_ is now known as rpittau | 08:43 | |
*** fyx_ is now known as fyx | 08:43 | |
*** khomesh24 has quit IRC | 08:44 | |
*** khomesh24 has joined #openstack-nova | 08:45 | |
*** ociuhandu has quit IRC | 08:45 | |
*** gryf has quit IRC | 08:46 | |
*** mkrai has quit IRC | 08:46 | |
*** ociuhandu has joined #openstack-nova | 08:46 | |
*** hamalq has quit IRC | 08:47 | |
*** gryf has joined #openstack-nova | 08:47 | |
*** lpetrut has joined #openstack-nova | 08:51 | |
*** ociuhandu has quit IRC | 08:55 | |
*** hoonetorg has joined #openstack-nova | 08:55 | |
*** ociuhandu has joined #openstack-nova | 09:01 | |
*** ociuhandu has quit IRC | 09:05 | |
*** ociuhandu has joined #openstack-nova | 09:05 | |
*** derekh has joined #openstack-nova | 09:07 | |
*** lucasagomes has joined #openstack-nova | 09:09 | |
*** rcernin has joined #openstack-nova | 09:13 | |
*** mkrai has joined #openstack-nova | 09:14 | |
*** rcernin has quit IRC | 09:17 | |
bauzas | lyarwood: morning | 09:20 |
* bauzas wonders whether https://040bad29f060e1f76339-88856c79572ad1783ad6e63321e57df6.ssl.cf2.rackcdn.com/761452/11/check/nova-next/fdaaa19/testr_results.html is related to https://bugs.launchpad.net/os-brick/+bug/1820007 | 09:20 | |
openstack | Launchpad bug 1820007 in os-brick "Failed to attach encrypted volumes after detach: volume device not found at /dev/disk/by-id" [Undecided,Fix released] - Assigned to Lee Yarwood (lyarwood) | 09:20 |
bauzas | I'll recheck my change, but given the bug was fixed, I wonder whether it was related | 09:21 |
*** martinkennelly has joined #openstack-nova | 09:22 | |
*** ociuhandu has quit IRC | 09:24 | |
*** zoharm has joined #openstack-nova | 09:33 | |
lyarwood | morning | 09:33 |
* lyarwood looks | 09:33 | |
*** xek has joined #openstack-nova | 09:34 | |
gibi | morning folks | 09:34 |
gibi | stephenfin: do we need https://review.opendev.org/c/openstack/nova/+/765798 for the hypervisor api bp? | 09:35 |
stephenfin | I think it's a nice-to-have rather than a necessity. gmann did say yesterday that he was going to take over that patch though, | 09:36 |
openstackgerrit | Lee Yarwood proposed openstack/nova master: WIP: nova-next: Start testing the 'q35' machine type https://review.opendev.org/c/openstack/nova/+/708701 | 09:37 |
lyarwood | weird, just had `error: remote unpack failed: error Missing blob 08f698bbe36d414eca19d0760a5acc3a714c404e` errors trying to push that ^ | 09:37 |
lyarwood | appears the git gc actually did something for a change and it's fixed after a git fetch -va | 09:37 |
*** ociuhandu has joined #openstack-nova | 09:37 | |
lyarwood | bauzas: looking at that issue now sorry | 09:37 |
bauzas | lyarwood: no worries at all, rechecked meanwhile | 09:38 |
bauzas | just an open question | 09:38 |
bauzas | and I thought you might be interested since you worked on the fix | 09:38 |
lyarwood | bauzas: yeah this is slightly different | 09:41 |
lyarwood | 43684 Mar 03 20:53:14.002343 ubuntu-focal-limestone-regionone-0023285105 nova-compute[53948]: ERROR oslo_messaging.rpc.server libvirt.libvirtError: internal error: unable to execute QEMU command 'blockdev-add': Could not open '/dev/disk/by-id/scsi-360000000000000000e00000000010001': Operation not permitted | 09:41 |
lyarwood | we find the device but can't attach it | 09:41 |
lyarwood | weird, we even encrypt it | 09:42 |
lyarwood | ah but through /dev/sda | 09:42 |
lyarwood | I hope that's the same device | 09:43 |
gibi | stephenfin: ack, then I will not push hard on that policy patch | 09:43 |
bauzas | lyarwood: that looks to me a race finding the device, right? | 09:46 |
lyarwood | bauzas: no we've found the device fine, /dev/disk/by-id/scsi-360000000000000000e00000000010001 is connected correctly and we even format/encrypt it with LUKSv1 | 09:50 |
lyarwood | bauzas: QEMU just isn't happy with the passphrase we've provided AFAICT | 09:50 |
bauzas | ah | 09:50 |
lyarwood | bauzas: the only odd thing is that we format/encrypt /dev/sda | 09:50 |
lyarwood | https://github.com/openstack/os-brick/blame/bd629a3a4105f7f3f9f35b71350ea3c66f3690e9/os_brick/encryptors/cryptsetup.py#L80-L81 and https://github.com/openstack/os-brick/blob/bd629a3a4105f7f3f9f35b71350ea3c66f3690e9/os_brick/encryptors/luks.py#L97 cause that | 09:51 |
lyarwood | but I've never seen that be a problem before, it should be the same underlying block device | 09:51 |
lyarwood | unless there's some weirdness in the block layers and /dev/disk/by-id/scsi-360000000000000000e00000000010001 still doesn't look like it's encrypted by the time QEMU attempts to attach it | 09:52 |
*** khomesh24 has quit IRC | 09:55 | |
stephenfin | gibi: lyarwood: bauzas: Finishing off the microversion to allow e.g. 'openstack server create --hostname $HOSTNAME ...'. Do we want to allow users to update the hostname? | 09:56 |
bauzas | stephenfin: spec ? | 09:56 |
stephenfin | I said in the spec that we would, but all that will change is what's stored on the metadata service unless someone re-runs e.g. cloud-init | 09:57 |
stephenfin | https://specs.openstack.org/openstack/nova-specs/specs/wallaby/approved/configurable-instance-hostnames.html | 09:57 |
stephenfin | So I'm concerned it might be misleading | 09:57 |
bauzas | I missed that one or I'm old | 09:57 |
*** ociuhandu has quit IRC | 09:57 | |
stephenfin | Well you are old... | 09:58 |
stephenfin | but I guess you just missed it :P | 09:58 |
bauzas | it's coming from the display name issue when users were dumb enough to think that ubuntu20.04 was a valid hostname ? | 09:58 |
stephenfin | yes | 09:58 |
bauzas | seriously | 09:58 |
stephenfin | well, sort of | 09:59 |
* bauzas is desperated by this | 09:59 | |
stephenfin | we said the idea of tying a display name and hostname together wasn't necessarily that clever | 09:59 |
bauzas | stephenfin: users can change their instance hostnames without asking nova, right? | 09:59 |
bauzas | it's just that nova metadata will give you one | 09:59 |
bauzas | but you can change it | 10:00 |
stephenfin | of course, but they'd have to disable cloud-init (or part thereof) | 10:00 |
bauzas | I'm pretty sure they don't need to do it | 10:00 |
bauzas | at least in 2013, this wasn't required | 10:00 |
bauzas | but now quantum, err neutron, does exist | 10:01 |
bauzas | but OK | 10:02 |
bauzas | stephenfin: let's assume the user wants to change their hostnames on the nova CLI, what's your concern ? | 10:03 |
stephenfin | I'm trying to find the relevant cloud-init docs. Best I've got is https://cloudinit.readthedocs.io/en/21.1/topics/instancedata.html but that's EC2-specific | 10:03 |
stephenfin | My concern is that AFAICT cloud-init only runs once when setting up the instance | 10:03 |
stephenfin | so providing a way to change the hostname in the metadata service could be misleading, since one would need a service to propagate that change to the instance and I don't know if such a service exists | 10:04 |
bauzas | stephenfin: https://cloudinit.readthedocs.io/en/latest/topics/modules.html#set-hostname | 10:04 |
bauzas | now I remember | 10:05 |
bauzas | man, I turned 40 but I forgot I reviewed this one | 10:05 |
bauzas | and now I see my vote on the change, I remember I approved it by fatigue | 10:06 |
lyarwood | bauzas: found it, there's another request to attach a volume that also ends up with that WWN somehow | 10:08 |
bauzas | lyarwood: hah | 10:09 |
bauzas | good catch, hence the conflict | 10:09 |
bauzas | stephenfin: so, IIRC, you can turn off the hostname management with cloud-init and set it thru any management tool like ansible or puppet | 10:10 |
lyarwood | yeah I'm not sure if this is an os-brick or cinder bug tbh | 10:10 |
bauzas | stephenfin: so, preserve_hostname be True in cloud.cfg and then you can play with /etc/hosts like you want | 10:11 |
gibi | stephenfin: one use case I can imagine for changing the hostname is that the instance is managed by ansible, and the user changed the hostname with that and wants to keep the nova view in sync with what is in the instance | 10:11 |
bauzas | gibi: yeah, honestly I feel bad with my review of the spec | 10:12 |
bauzas | I just feel I haven't properly reviewed it and eventually gave up with loosely approving it | 10:12 |
bauzas | because I see some operator concerns | 10:12 |
lyarwood | ah it's a cinder bug, noice. | 10:14 |
lyarwood | caused by us running multiple c-vol backends >< | 10:14 |
*** ociuhandu has joined #openstack-nova | 10:14 | |
lyarwood | fun, I bet this has burnt us for years and no one has noticed | 10:14 |
bauzas | eeek, haven't seen the time flying and I need to dad taxi, shit. | 10:15 |
bauzas | my productivity would dramatically increase in 5 years once both my kids are in college. | 10:16 |
bauzas | (4 years actually) | 10:16 |
stephenfin | gibi: That's a fair point | 10:20 |
stephenfin | Aight, I'll do that so | 10:20 |
* stephenfin respins | 10:20 | |
*** ociuhandu has quit IRC | 10:22 | |
lyarwood | bauzas: FWIW https://bugs.launchpad.net/cinder/+bug/1917750 | 10:26 |
openstack | Launchpad bug 1917750 in Cinder "Running parallel iSCSI/LVM c-vol backends is causing random failures in CI" [Undecided,New] | 10:26 |
*** ociuhandu has joined #openstack-nova | 10:30 | |
*** ociuhandu has quit IRC | 10:30 | |
*** ociuhandu has joined #openstack-nova | 10:30 | |
*** zzzeek has quit IRC | 10:35 | |
*** zzzeek has joined #openstack-nova | 10:35 | |
*** ociuhandu has quit IRC | 10:36 | |
*** artom has quit IRC | 10:39 | |
*** jangutter has joined #openstack-nova | 10:41 | |
*** jangutter has quit IRC | 10:43 | |
*** jangutter has joined #openstack-nova | 10:43 | |
*** ociuhandu has joined #openstack-nova | 10:43 | |
*** jangutter_ has quit IRC | 10:44 | |
*** martinkennelly has quit IRC | 10:51 | |
*** k_mouza has joined #openstack-nova | 10:59 | |
*** rcernin has joined #openstack-nova | 11:08 | |
*** rcernin has quit IRC | 11:13 | |
*** artom has joined #openstack-nova | 11:37 | |
*** jangutter_ has joined #openstack-nova | 11:42 | |
*** jangutter has quit IRC | 11:45 | |
openstackgerrit | Merged openstack/nova master: libvirt: parse alias out from device config https://review.opendev.org/c/openstack/nova/+/772384 | 11:49 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: Remove references to 'inst_type' https://review.opendev.org/c/openstack/nova/+/778548 | 11:52 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: api: Rename 'parameter_types.hostname' -> 'fqdn' https://review.opendev.org/c/openstack/nova/+/778549 | 11:52 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: api: Add support for 'hostname' parameter https://review.opendev.org/c/openstack/nova/+/778550 | 11:52 |
*** mkrai has quit IRC | 11:52 | |
*** legochen has quit IRC | 11:53 | |
*** tkajinam has quit IRC | 11:56 | |
*** psachin has quit IRC | 12:14 | |
*** martinkennelly has joined #openstack-nova | 12:19 | |
*** ociuhandu has quit IRC | 12:21 | |
*** rcernin has joined #openstack-nova | 12:24 | |
openstackgerrit | Merged openstack/nova master: tests: Poison os.uname https://review.opendev.org/c/openstack/nova/+/775415 | 12:28 |
*** rcernin has quit IRC | 12:29 | |
*** ociuhandu has joined #openstack-nova | 12:37 | |
*** ratailor has quit IRC | 12:38 | |
*** ociuhandu has quit IRC | 12:42 | |
*** rcernin has joined #openstack-nova | 12:48 | |
*** rcernin has quit IRC | 12:53 | |
*** ociuhandu has joined #openstack-nova | 12:53 | |
kashyap | Can anyone give this the final ACK (already has a +2), and put it through, please? -- https://review.opendev.org/c/openstack/nova/+/774240 | 12:55 |
*** ociuhandu has quit IRC | 12:59 | |
gibi | stephenfin ^^ please? | 13:00 |
stephenfin | yup, will look shortly | 13:00 |
gibi | thanks | 13:01 |
*** ociuhandu has joined #openstack-nova | 13:06 | |
openstackgerrit | Merged openstack/nova stable/ussuri: Fallback to same-cell resize with qos ports https://review.opendev.org/c/openstack/nova/+/773932 | 13:08 |
*** nightmare_unreal has joined #openstack-nova | 13:19 | |
openstackgerrit | Takashi Kajinami proposed openstack/nova master: WIP: Clean up allocations left by evacuation https://review.opendev.org/c/openstack/nova/+/778696 | 13:28 |
*** jangutter has joined #openstack-nova | 13:34 | |
openstackgerrit | Takashi Kajinami proposed openstack/nova master: WIP: Clean up allocations left by evacuation https://review.opendev.org/c/openstack/nova/+/778696 | 13:36 |
*** jangutter_ has quit IRC | 13:37 | |
*** jangutter has quit IRC | 13:43 | |
*** jangutter_ has joined #openstack-nova | 13:44 | |
*** spatel has joined #openstack-nova | 13:48 | |
*** derekh has quit IRC | 13:52 | |
*** derekh has joined #openstack-nova | 13:52 | |
*** hemanth_n has quit IRC | 13:55 | |
*** amodi has quit IRC | 14:04 | |
*** amodi has joined #openstack-nova | 14:06 | |
gmann | stephenfin: yeah, was busy yesterday but I am going to update that today. | 14:12 |
*** legochen has joined #openstack-nova | 14:16 | |
*** jangutter has joined #openstack-nova | 14:16 | |
*** jangutter_ has quit IRC | 14:20 | |
*** legochen has quit IRC | 14:21 | |
*** ociuhandu has quit IRC | 14:21 | |
*** ociuhandu has joined #openstack-nova | 14:22 | |
*** ociuhandu has quit IRC | 14:28 | |
bauzas | gibi: thanks for the +2 on RPC API, tbc I did put a -2 on my own change as I think we should only merge it after FF next week | 14:39 |
bauzas | gibi: I guess you don't have yet a RC1 etherpad ? | 14:40 |
bauzas | stephenfin: dansmith: although I marked -2 on https://review.opendev.org/c/openstack/nova/+/761452 I'd appreciate a second core review for making sure we can land it when we want | 14:41 |
dansmith | I know, still pending | 14:41 |
openstackgerrit | Lee Yarwood proposed openstack/nova master: nova-next: Start testing the q35 machine type https://review.opendev.org/c/openstack/nova/+/708701 | 14:42 |
*** rcernin has joined #openstack-nova | 14:49 | |
gibi | bauzas: sure, the RPC bump needs to be on hold until FF | 14:53 |
gibi | bauzas: I have the RC etherpad ready :) https://etherpad.opendev.org/p/nova-wallaby-rc-potential | 14:53 |
bauzas | huzzah, will add it then | 14:53 |
*** ociuhandu has joined #openstack-nova | 14:54 | |
gibi | thanks | 14:54 |
kashyap | lyarwood: Ah, thanks for adding to CirrOS itself: https://github.com/cirros-dev/cirros/pull/65 | 14:54 |
*** rcernin has quit IRC | 14:54 | |
bauzas | gibi: oh, already there, nice (or grenoble) | 14:54 |
* bauzas does a local pun | 14:54 | |
gibi | :) | 14:55 |
*** ociuhandu has quit IRC | 14:56 | |
*** ociuhandu has joined #openstack-nova | 14:57 | |
*** khomesh24 has joined #openstack-nova | 14:58 | |
*** jmlowe has quit IRC | 15:06 | |
*** jmlowe has joined #openstack-nova | 15:08 | |
*** ociuhandu has quit IRC | 15:13 | |
*** ociuhandu has joined #openstack-nova | 15:14 | |
*** ociuhandu has quit IRC | 15:19 | |
*** ociuhandu has joined #openstack-nova | 15:29 | |
*** __ministry1 has joined #openstack-nova | 15:29 | |
openstackgerrit | Stephen Finucane proposed openstack/nova master: tests: Remove useless mocks https://review.opendev.org/c/openstack/nova/+/778730 | 15:33 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: tests: Remove duplicate policy tests https://review.opendev.org/c/openstack/nova/+/778731 | 15:33 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: tests: Speed up 'servers' API tests https://review.opendev.org/c/openstack/nova/+/778732 | 15:33 |
*** lpetrut has quit IRC | 15:36 | |
*** __ministry1 has quit IRC | 15:37 | |
lyarwood | stephenfin: https://review.opendev.org/c/openstack/nova/+/673790 - any plans to respond to this -1 btw, I'm ready to go through the rest of the series once we've sorted this out. | 15:41 |
stephenfin | lyarwood: yup, I was going to do a separate FUP | 15:42 |
stephenfin | to avoid rebasing the whole series | 15:42 |
lyarwood | stephenfin: yup fair, I'll let you update the change before I continue on in the series | 15:43 |
lyarwood | and by that I mean comment, not rebase or anything | 15:43 |
stephenfin | kashyap: comments left on https://review.opendev.org/c/openstack/nova/+/774240. If you can respin I'll re-review today | 15:45 |
*** claudiub has joined #openstack-nova | 15:46 | |
kashyap | stephenfin: Thanks for the review. Let me look ... | 15:47 |
kashyap | stephenfin: That mock of _register_instance_machine_type is required after Lee's change | 15:49 |
stephenfin | but you didn't touch that function? | 15:49 |
kashyap | stephenfin: Especially as the test is calling init_host() directly | 15:49 |
stephenfin | kashyap: Ah, whoops | 15:50 |
lyarwood | the diff has moved it around | 15:50 |
stephenfin | yup | 15:50 |
stephenfin | apologies | 15:50 |
kashyap | No problem | 15:50 |
*** gyee has joined #openstack-nova | 15:52 | |
*** spatel has quit IRC | 15:55 | |
kashyap | stephenfin: Is it palatable to you if I don't address the style nit here: https://review.opendev.org/c/openstack/nova/+/774240/11/nova/tests/unit/virt/libvirt/test_driver.py#1579 | 15:57 |
gibi | nova meeting starts in 2 minutes in #openstack-meeting-3 | 15:57 |
kashyap | stephenfin: I'm addressing the rest of all your comments | 15:57 |
stephenfin | sure | 15:59 |
stephenfin | tbc though, I'm only suggesting doing that for the new functions, not the old ones of course | 16:00 |
openstackgerrit | Stephen Finucane proposed openstack/nova master: trivial: Clarify purpose of 'Host.supports_*' properties https://review.opendev.org/c/openstack/nova/+/778739 | 16:00 |
stephenfin | lyarwood: ^ | 16:00 |
stephenfin | Lemme know if that's not clear | 16:00 |
*** dklyle has joined #openstack-nova | 16:00 | |
lyarwood | stephenfin: thanks | 16:01 |
kashyap | stephenfin: Ah, okay; yes, just for the newly-added ones might as well address that | 16:06 |
openstackgerrit | Claudiu Belu proposed openstack/nova master: POC: tests: Adds test checking unbalanced NUMA node association https://review.opendev.org/c/openstack/nova/+/778740 | 16:10 |
*** zoharm has quit IRC | 16:12 | |
lyarwood | dansmith: sorry joined the meeting late, re the cinder failures, anything like https://bugs.launchpad.net/cinder/+bug/1917750 ? | 16:13 |
openstack | Launchpad bug 1917750 in Cinder "Running parallel iSCSI/LVM c-vol backends is causing random failures in CI" [Undecided,New] | 16:13 |
dansmith | lyarwood: depends on how that manifests, but can't say I've seen that specifically | 16:14 |
dansmith | the two major symptoms I see are a complaint about state conflict, and "unable to delete volume" | 16:14 |
lyarwood | right sorry, that basically leads to two instances looking at the same volume even when it isn't multiattached | 16:14 |
dansmith | ack | 16:15 |
dansmith | (and also.. ouch) | 16:15 |
lyarwood | that could be related if we are trying to detach the volume in Nova but it's still attached to another instance | 16:15 |
dansmith | well, I haven't seen detach fails, so much as failure to delete, but I guess it's possible it's just how we report it | 16:16 |
lyarwood | oh if it's the actual delete on the cinder side then it's likely something else | 16:16 |
lyarwood | we've already nuked the connections to the computes at that point | 16:17 |
dansmith | yeah I think it's like "delete volume, poll until it's gone...timeout" | 16:17 |
*** vishalmanchanda has quit IRC | 16:17 | |
lyarwood | kk, melwitt had a bug for lvcreate being slow, assuming it's waiting on lvdelete it could be related | 16:17 |
lyarwood | lvchange* | 16:17 |
lyarwood | there's no lvdelete | 16:17 |
dansmith | ack | 16:18 |
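The "delete volume, poll until it's gone...timeout" symptom described above follows a generic poll-with-deadline pattern. A minimal Python sketch of that pattern, assuming a hypothetical `exists` callable that stands in for "is the volume still listed?" (it is not a Cinder or Tempest API, and the timeout values are illustrative):

```python
import time


class DeleteTimeout(Exception):
    """Raised when the resource still exists once the deadline passes."""


def wait_for_deletion(exists, timeout=5.0, interval=0.5,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll `exists()` until it returns False or `timeout` seconds elapse.

    `exists` is a hypothetical zero-argument callable; `clock` and `sleep`
    are injectable so the loop can be exercised without real waiting.
    Returns the number of polls that were needed.
    """
    deadline = clock() + timeout
    polls = 0
    while True:
        polls += 1
        if not exists():
            return polls          # resource is gone
        if clock() >= deadline:
            raise DeleteTimeout(f"still present after {timeout}s")
        sleep(interval)
```

A slow backend operation (such as the slow LVM command mentioned above) would show up here as `DeleteTimeout` even though the delete eventually succeeds, which is one way the reported failure could be a timing issue rather than a failed delete.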
*** mlavalle has joined #openstack-nova | 16:28 | |
kashyap | stephenfin: Isn't your "while" spurious, here, on line-8? (The "it. If niether..." bit makes sense, though.) -- https://review.opendev.org/c/openstack/nova/+/774240/11/releasenotes/notes/allow-disabling-cpu-flags-cc861a3bdfffadf8.yaml#8 | 16:30 |
kashyap | It looks like so. I'll disregard it. | 16:31 |
*** links has quit IRC | 16:33 | |
stephenfin | kashyap: I think it's relevant | 16:33 |
stephenfin | This is possible via a '+' / '-' notation, where if you specify a CPU flag prefixed with a '+' sign (without quotes), it will be enabled for the guest, a prefix of '-' will disable it | 16:33 |
stephenfin | This is possible via a '+' / '-' notation, where if you specify a CPU flag prefixed with a '+' sign (without quotes) then it will be enabled for the guest while a prefix of '-' will disable it. | 16:34 |
stephenfin | The latter reads better to me | 16:34 |
kashyap | stephenfin: Ah, there; yes. But you're missing a comma after "guest" :) | 16:35 |
kashyap | Either that, or I've gone comma-wild (I was accused of this once, in a friendly way, on qemu-devel list before :D) | 16:35 |
stephenfin | Correct, missing comma | 16:36 |
stephenfin | since it's a comparison | 16:36 |
stephenfin | *Correct. Missing comma ;) | 16:36 |
*** lpetrut has joined #openstack-nova | 16:36 | |
kashyap | stephenfin: I.e. the missing comma in the second version is correct? Yeah? | 16:37 |
stephenfin | yeah, add the comma | 16:37 |
kashyap | Ah, nod. | 16:37 |
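The '+' / '-' notation being worded above can be illustrated with a small Python sketch. `split_cpu_flags` is a hypothetical helper written for this example, not nova's implementation, and treating an unprefixed flag as enabled is an assumption here; the flag names in the usage line are only sample values.

```python
def split_cpu_flags(flags):
    """Split CPU flag strings into (enabled, disabled) sets by prefix.

    Hypothetical helper for illustration only.
    Assumption: a flag with neither '+' nor '-' counts as enabled.
    """
    enabled, disabled = set(), set()
    for raw in flags:
        flag = raw.strip().lower()
        if not flag:
            continue                      # skip empty entries
        if flag.startswith("-"):
            disabled.add(flag[1:])        # '-' prefix: disable for the guest
        elif flag.startswith("+"):
            enabled.add(flag[1:])         # '+' prefix: enable for the guest
        else:
            enabled.add(flag)             # no prefix: assumed enabled
    return enabled, disabled
```

For example, `split_cpu_flags(["+pcid", "-mtrr", "ssbd"])` yields `pcid` and `ssbd` as enabled and `mtrr` as disabled.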
lyarwood | stephenfin: https://review.opendev.org/c/openstack/nova/+/769548 - just a reminder if you didn't have this on your list, that would then move the series into the gate. | 16:38 |
claudiub | Hello! I've noticed that in the NUMA-related docs (https://docs.openstack.org/nova/latest/admin/cpu-topologies.html#customizing-instance-numa-placement-policies) it says that "The NUMA node(s) used are normally chosen at random", | 16:38 |
lyarwood | oh and https://review.opendev.org/c/openstack/nova/+/778462/2 that gibi++ added in | 16:38 |
claudiub | but numa_fit_instance_to_host (https://github.com/openstack/nova/blob/master/nova/virt/hardware.py#L2235) says that it will return a new InstanceNUMATopology with its cell ids set to host cell ids of the first successful permutation, or None. | 16:38 |
*** lpetrut_ has joined #openstack-nova | 16:38 | |
*** lpetrut_ has quit IRC | 16:38 | |
claudiub | so, there's a mismatch there. From what I've seen, all the instances end up in the 1st NUMA node. | 16:39 |
stephenfin | lyarwood: was in the middle of reviewing. +2 on the whole series now | 16:39 |
claudiub | that could be problematic. Basically, I can have nodes with 50% consumed resources, but 1 NUMA node completely empty. | 16:39 |
stephenfin | claudiub: Yup, that's a known bug :( | 16:40 |
claudiub | so, I'm wondering which should be corrected: The docs, or the code. | 16:40 |
stephenfin | I think sean-k-mooney filed a bug for same | 16:40 |
stephenfin | claudiub: the code | 16:40 |
claudiub | ah, gotcha. :) | 16:40 |
claudiub | Wondering what the backport potential for this would be, if a fix would be added. :) | 16:41 |
stephenfin | we should shuffle the nodes or sort by least-allocated node | 16:41 |
stephenfin | depends on the implementation of course but definitely backportable IMO | 16:41 |
stephenfin | *should definitely be | 16:41 |
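The two options stephenfin names, shuffling the nodes or sorting by least-allocated node, can be sketched in Python as follows. `Cell` is a simplified stand-in invented for this example (it is not nova's NUMA cell object), and only CPU usage is considered; a real fix would have to account for memory, pinning, and PCI affinity as well.

```python
import random
from dataclasses import dataclass


@dataclass
class Cell:
    """Simplified, hypothetical stand-in for a host NUMA cell."""
    id: int
    cpus_total: int   # assumed to be greater than zero
    cpus_used: int


def order_least_allocated(cells):
    """Return cells with the lowest CPU usage ratio first."""
    return sorted(cells, key=lambda c: c.cpus_used / c.cpus_total)


def order_shuffled(cells, rng=random):
    """Return a shuffled copy of cells, leaving the input list untouched."""
    shuffled = list(cells)
    rng.shuffle(shuffled)
    return shuffled
```

Trying candidate cells in either of these orders, instead of always starting from the first host cell, would avoid filling one NUMA node while another stays empty. Sorting is deterministic and easier to test; shuffling only balances on average.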
claudiub | is someone working on this? I could look into it if no one is | 16:42 |
*** lpetrut has quit IRC | 16:42 | |
stephenfin | Not right now. If you could look, I'd be happy to review | 16:42 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: pci: track host NUMA topology in stats https://review.opendev.org/c/openstack/nova/+/774149 | 16:42 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: pci: implement the 'socket' NUMA affinity policy https://review.opendev.org/c/openstack/nova/+/772779 | 16:42 |
openstackgerrit | Artom Lifshitz proposed openstack/nova master: pci: always pass node_id to manager https://review.opendev.org/c/openstack/nova/+/778747 | 16:43 |
lyarwood | stephenfin: awesome thanks for that | 16:43 |
openstackgerrit | Kashyap Chamarthy proposed openstack/nova master: libvirt: Allow disabling CPU flags via `cpu_model_extra_flags` https://review.opendev.org/c/openstack/nova/+/774240 | 16:43 |
claudiub | great, will let you know how it goes. :) | 16:43 |
kashyap | stephenfin: gibi (lost your +2, if you'd like to add the stamp again) --^ Addressed the doc nits from Stephen | 16:44 |
gibi | sure | 16:44 |
* kashyap hopes he didn't mess up anything else | 16:44 | |
bauzas | sean-k-mooney: could you please confirm that the os-vif SHA1 for the next release looks good to you ? | 16:45 |
bauzas | sean-k-mooney: https://review.opendev.org/c/openstack/releases/+/777955/1/deliverables/wallaby/os-vif.yaml | 16:45 |
bauzas | from what I see, yup | 16:45 |
sean-k-mooney | looking | 16:46 |
sean-k-mooney | yep it's the current head of master and the main delta is just fixing the lower-constraints job and a deprecation warning | 16:49 |
sean-k-mooney | normally I would say that it could be a bugfix release but we always do a feature version bump at the end of a cycle | 16:49 |
sean-k-mooney | so this looks correct | 16:49 |
*** rcernin has joined #openstack-nova | 16:50 | |
claudiub | Also, speaking of NUMA, I was wondering if you know if an instance placed in a single NUMA node can be live-migrated to another node in a different NUMA node, or it has to be in the same NUMA node? Trying to gauge the severity of that bug that places all the instances in the same NUMA node. | 16:50 |
sean-k-mooney | claudiub: that can be done but only from train | 16:51 |
sean-k-mooney | claudiub: artom added numa live migration in the train release where we can regenerate the xml as part of the migration | 16:51 |
claudiub | live-migrating instances with numa topologies, right? I've seen that bit in code | 16:51 |
artom | claudiub, except the user can't specify which NUMA node - which I think is what you're getting at | 16:52 |
sean-k-mooney | what bug are you triaging? | 16:52 |
artom | Nova just finds a "free" one | 16:52 |
claudiub | artom: that's perfect then. :) | 16:52 |
sean-k-mooney | claudiub: please do not file a new bug for numa balancing | 16:53 |
claudiub | sean-k-mooney: I was hitting an issue where all the created instances were created on the same numa node, so I was asking about that. :) | 16:53 |
sean-k-mooney | claudiub: yep that is by design | 16:53 |
sean-k-mooney | it's something we could change but when I brought it up I was told it was a feature not a bug | 16:54 |
sean-k-mooney | we talked about it at the last ptg and when numa was first being added many years ago | 16:54 |
*** rcernin has quit IRC | 16:55 | |
claudiub | well, one thing's for certain: the docs and the code don't match. :) The docs say the chosen NUMA node is random. | 16:55 |
sean-k-mooney | from a user perspective it is | 16:55 |
claudiub | Wondering if we could at least have a config option to randomize the selected numa node (ofc, if it can be randomized. not talking about PCI devices) | 16:55 |
sean-k-mooney | it's not actually random; from an api point of view it's undefined | 16:55 |
sean-k-mooney | i have a much better solution for this, just trying to find the bug i already had filed | 16:56 |
sean-k-mooney | https://bugs.launchpad.net/nova/+bug/1893121 | 16:56 |
openstack | Launchpad bug 1893121 in OpenStack Compute (nova) "nova does not balance vm across numa node or prefer numa node with pci device when one is requested" [Undecided,Confirmed] - Assigned to sean mooney (sean-k-mooney) | 16:56 |
sean-k-mooney | claudiub: this is what determines the order | 16:57 |
sean-k-mooney | https://github.com/openstack/nova/blob/20459e3e88cb8382d450c7fdb042e2016d5560c5/nova/virt/hardware.py#L2268-L2277 | 16:57 |
sean-k-mooney | as an end user you are not allowed to rely on the detail of that implementation | 16:57 |
sean-k-mooney | what i am proposing and had planned on working on is doing multiple sorts | 16:58 |
sean-k-mooney | so that we will balance vm placement based on available resources on a host | 16:58 |
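The "multiple sorts" idea above can be illustrated with a small sketch. This is not nova code: the `Cell` class, its fields, and the key priority (PCI preference first, then free CPUs, then free memory) are assumptions made for illustration. It relies only on Python's sort being stable, so the least significant key is sorted first.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical stand-in for a host NUMA cell (not nova's object)."""
    id: int
    free_cpus: int
    free_mem_mb: int
    has_pci: bool


def order_cells(cells, pci_requested):
    """Order candidate host cells with a chain of stable sorts.

    Least significant key first: id, free memory, free CPUs, then
    whether the cell has PCI devices. The priority is an assumption.
    """
    ordered = sorted(cells, key=lambda c: c.id)
    # Prefer cells with more free memory, then more free CPUs (spread).
    ordered.sort(key=lambda c: c.free_mem_mb, reverse=True)
    ordered.sort(key=lambda c: c.free_cpus, reverse=True)
    # PCI requested: cells with devices first; otherwise cells without.
    ordered.sort(key=lambda c: c.has_pci, reverse=pci_requested)
    return ordered


cells = [
    Cell(0, free_cpus=2, free_mem_mb=1024, has_pci=False),
    Cell(1, free_cpus=8, free_mem_mb=4096, has_pci=False),
    Cell(2, free_cpus=8, free_mem_mb=8192, has_pci=True),
]
print([c.id for c in order_cells(cells, pci_requested=False)])
print([c.id for c in order_cells(cells, pci_requested=True)])
```

With this ordering the emptiest cell without PCI devices is tried first when no device is requested, which is the balancing effect being proposed.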
sean-k-mooney | claudiub: line 693 is the ptg discussion from the last ptg on the topic https://etherpad.opendev.org/p/nova-wallaby-ptg | 16:59 |
sean-k-mooney | this is one of the topics i need to bring up internally but i would like to address this next cycle if i can make time, but if you have time to work on it then that would be good too. | 17:00 |
sean-k-mooney | numa balancing is the non-invasive way to optimise better than we do today. | 17:01 |
sean-k-mooney | long term we should be doing this more abstractly. e.g. computing a cost metric for any given placement based on a number of factors and then minimising that. | 17:02 |
sean-k-mooney | kind of like how the weighers work but placement complicates that. | 17:02 |
sean-k-mooney | claudiub: the workaround for now is to follow the advice we always gave. try to create flavors that approximate the host topology. e.g. if your hosts all have 2 numa nodes then default to creating flavors with hw:numa_nodes=2 | 17:03 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Replace blind retry with libvirt event waiting in detach https://review.opendev.org/c/openstack/nova/+/770246 | 17:04 |
*** lucasagomes has quit IRC | 17:05 | |
claudiub | sean-k-mooney: Hmm, I see. but correct me if I'm wrong, but in that code snippet, the host cells are not randomized or sorted in any other way if there are no pci_requests and no pci_stats. It's the same host_cells that are set in host_topology.cells (not sure if the order ever changes here). I could spend some time on it, will read the PTG notes as well. | 17:08 |
*** belmoreira has quit IRC | 17:09 | |
sean-k-mooney | if pci devices are not requested we sort the numa nodes to prefer the ones without pci devices | 17:09 |
claudiub | But in any case, setting hw_numa_nodes=2 doesn't help in our scenario, especially since we also have nodes with just 1 NUMA node. :) Additionally, setting the instances on 2 numa nodes could affect the guest performance as well. | 17:09 |
claudiub | sean-k-mooney: indeed, agreed there. | 17:10 |
sean-k-mooney | otherwise itertools.permutations iterates over them in a deterministic manner | 17:10 |
sean-k-mooney | which for single numa node instances is ascending order from numa node 0 | 17:10 |
sean-k-mooney | then we bail out if it fits and don't try any other nodes | 17:11 |
sean-k-mooney | so that packs numa0 | 17:11 |
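The packing behaviour described above can be reproduced with a toy model. This is a simplified sketch, not nova's `numa_fit_instance_to_host`: the helper names and the CPU-only bookkeeping are hypothetical, and unlike the case discussed here it does track per-node usage.

```python
import itertools


def first_fit(free_cpus, wanted_cpus, guest_nodes=1):
    """Return the first tuple of host node ids that fits, or None.

    itertools.permutations yields tuples in a deterministic order based
    on input position, so with sorted ids node 0 is always tried first.
    """
    node_ids = sorted(free_cpus)
    for combo in itertools.permutations(node_ids, guest_nodes):
        if all(free_cpus[n] >= wanted_cpus for n in combo):
            return combo  # bail out on the first fit
    return None


def boot_many(free_cpus, wanted_cpus, count):
    """Place `count` single-node guests and report where they landed."""
    free = dict(free_cpus)
    placements = []
    for _ in range(count):
        combo = first_fit(free, wanted_cpus)
        if combo is None:
            break
        for n in combo:
            free[n] -= wanted_cpus
        placements.append(combo[0])
    return placements, free


placements, free = boot_many({0: 8, 1: 8}, wanted_cpus=2, count=5)
print(placements, free)
```

In this model node 1 is only used once node 0 is full; if per-node usage is not tracked at all, node 0 never appears full and every guest lands there.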
claudiub | yep | 17:11 |
sean-k-mooney | this has always been the case since numa was first added | 17:11 |
sean-k-mooney | the specific behavior is implementation defined and is not guaranteed by the api | 17:12 |
artom | stephenfin, D: how did you not jump on https://review.opendev.org/c/openstack/nova/+/774240/12/nova/virt/libvirt/driver.py#697 with a -2? | 17:12 |
artom | ;) | 17:12 |
*** khomesh24 has quit IRC | 17:12 | |
sean-k-mooney | so it can be modified but the current behavior optimises for being able to spawn large vms | 17:12 |
sean-k-mooney | if we balance the vms between numa nodes it will improve performance in general but pessimise spawning large vms | 17:12 |
sean-k-mooney | so there is a tradeoff between packing and spreading/balancing | 17:13 |
claudiub | i agree there, and I am aware of that. :) | 17:13 |
sean-k-mooney | yep so this is why we said this can't be changed in a bug and needs a spec | 17:13 |
sean-k-mooney | which means upstream at least it's not backportable | 17:13 |
sean-k-mooney | downstream we likely would backport it but not change the behavior by default | 17:14 |
stephenfin | artom: My yoga guy said I needed to be more chill about these things | 17:14 |
stephenfin | I'll fire him in the morning | 17:14 |
sean-k-mooney | upstream if accepted i would like to change the default but in the cycle after it's added | 17:14 |
*** ociuhandu_ has joined #openstack-nova | 17:14 | |
stephenfin | <sean-k-mooney> so it can be modified but the current behavior optimises for being able to spawn large vms | 17:14 |
stephenfin | are you talking about unpinned instances? | 17:14 |
sean-k-mooney | stephenfin: no all numa instances | 17:15 |
sean-k-mooney | stephenfin: by packing numa nodes before moving on | 17:15 |
sean-k-mooney | we keep the rest free | 17:15 |
sean-k-mooney | so vms that need all the ram or cpus on a numa node can boot | 17:15 |
stephenfin | ah, yeah I don't think that's a good argument for the unpinned case | 17:15 |
sean-k-mooney | if we spread then that makes large vms less likely to fit | 17:15 |
stephenfin | spreading makes sense there IMO | 17:15 |
sean-k-mooney | unpinned float so ya not an issue | 17:15 |
stephenfin | yup | 17:16 |
claudiub | in our scenario, large vms is not a concern, since we have pretty large hosts, so in our case, it would work better for a spread-out approach. But indeed, not everyone is the same. Could this be a config option then? | 17:16 |
stephenfin | if you don't spread, you'll basically never end up on node 1, as claudiub is seeing | 17:16 |
stephenfin | or node N > 0 | 17:16 |
sean-k-mooney | claudiub: yep it could be a config option or it could be done via aggregate metadata | 17:16 |
stephenfin | I don't think that's needed or unpinned | 17:17 |
stephenfin | *for | 17:17 |
*** ociuhandu has quit IRC | 17:17 | |
sean-k-mooney | stephenfin: well this is needed for all numa guests pinned or unpinned | 17:17 |
claudiub | hmm, aggregate metadata also sounds interesting. it could have best of both worlds | 17:17 |
sean-k-mooney | but it's not needed for non numa instances | 17:17 |
sean-k-mooney | claudiub: my concern is if we codify this as a feature we need to support it with placement in the future | 17:17 |
stephenfin | pinned guests can already use NUMA nodes > 0 | 17:17 |
sean-k-mooney | claudiub: we can do that but that means we need to do the same behavior by sorting the allocation candidates | 17:18 |
sean-k-mooney | yep they can but that's not really the issue | 17:18 |
*** ociuhandu_ has quit IRC | 17:18 | |
claudiub | hm, I am a bit outside the loop with the placement api, but doesn't the NUMAPlacementFilter also use the numa_fit_instance_to_host function? | 17:19 |
sean-k-mooney | claudiub: yes currently we are not using placement for numa and won't be for a few releases | 17:19 |
claudiub | or whatever the filter name was. :) | 17:19 |
claudiub | oh ok, gotcha. | 17:19 |
stephenfin | can we back up and say why any of this needs an aggregate metadata filter | 17:19 |
stephenfin | we're not trying to change stack/spread behavior for pinned instances, right? | 17:19 |
sean-k-mooney | claudiub: the concern is the more features the filter has the more we need to port to the placement version | 17:20 |
sean-k-mooney | stephenfin: no new aggregate filter | 17:20 |
stephenfin | only unpinned NUMA instances, because those are all landing on NUMA node 0 | 17:20 |
stephenfin | sorry, an aggregate metadata key | 17:20 |
sean-k-mooney | stephenfin: nope hugepages and all other numa instances land there too | 17:20 |
stephenfin | <sean-k-mooney> claudiub: yep it could be a config option or it could be done via aggregate metadata | 17:20 |
stephenfin | ^ that | 17:20 |
stephenfin | an instance with hugepages and no pinning *is* an unpinned NUMA instance | 17:21 |
sean-k-mooney | stephenfin: if we do it per host it has to be accounted for on live migration and we have the same problem we had with PCPUs and hyperthreading | 17:21 |
stephenfin | if we do what per hose? | 17:21 |
stephenfin | *host | 17:21 |
claudiub | Hm, I'm wondering why the current implementation is: "Hey HostState, I'm a request spec, pls fit me", rather than: "Hey HostState, Placement told me to sit in your X numa node." | 17:21 |
sean-k-mooney | stephenfin: allow packing vs spreading | 17:21 |
stephenfin | I'm not suggesting making it configurable at all | 17:22 |
sean-k-mooney | well people objected to hardcoding spreading | 17:22 |
stephenfin | the current packing behavior is a bug | 17:22 |
stephenfin | did they? Link? | 17:22 |
stephenfin | claudiub: Yes, it's the former | 17:22 |
sean-k-mooney | i filed https://bugs.launchpad.net/nova/+bug/1893121 and was told it's a feature not a bug | 17:23 |
openstack | Launchpad bug 1893121 in OpenStack Compute (nova) "nova does not balance vm across numa node or prefer numa node with pci device when one is requested" [Undecided,Confirmed] - Assigned to sean mooney (sean-k-mooney) | 17:23 |
sean-k-mooney | stephenfin: then i brought it up at the ptg as a bug and was told it needs a spec https://etherpad.opendev.org/p/nova-wallaby-ptg | 17:23 |
stephenfin | claudiub: placement gives us X VCPU inventory, but it has nothing to do with what actual host CPUs are used | 17:23 |
sean-k-mooney | stephenfin: line 710 | 17:23 |
stephenfin | claudiub: Put another way, placement says you may have 4 unpinned/pinned CPUs, and nova-compute says you may map to these specific host core(s). Placement doesn't track the specifics | 17:24 |
sean-k-mooney | stephenfin: if we always want to spread that's simple. and that's what i wanted to do originally | 17:24 |
stephenfin | sean-k-mooney: Yeah, that's never an RFE. I'm not sure how we came to that conclusion | 17:25 |
stephenfin | Packing all your instances onto one host NUMA node and leaving the others empty regardless of the number of instances created is a bug every day of the week :) | 17:26 |
*** jangutter_ has joined #openstack-nova | 17:26 | |
sean-k-mooney | if we are ok treating this as a bug then i will try to work on it next cycle and backport it | 17:26 |
stephenfin | well it sounds like claudiub might have time to work on it also | 17:26 |
stephenfin | which would be great :) | 17:26 |
gibi | stephenfin: what will happen when we model NUMA nodes in placement? I guess at that point placement will track how many PCPUs belong to which NUMA node. | 17:26 |
sean-k-mooney | gibi: we will need to sort the allocation candidates | 17:27 |
stephenfin | gibi: s/when/if/ ;) | 17:27 |
sean-k-mooney | to have the same behavior | 17:27 |
sean-k-mooney | we should get multiple allocation candidates per host | 17:27 |
stephenfin | yeah, what sean-k-mooney said | 17:27 |
gibi | OK. I'm out of brain power but I feel that if we discussed it once and came to a conclusion that it is a feature then there might be complications | 17:27 |
*** rpittau is now known as rpittau|afk | 17:27 | |
sean-k-mooney | gibi: right now since the behavior is currently undefined we can pretend the existing behavior is not a thing | 17:28 |
stephenfin | I say we work on the patch and then review | 17:28 |
stephenfin | complications should be evident by then | 17:28 |
sean-k-mooney | gibi: the concern was that it could cause large vms that previously would have booted to fail | 17:29 |
stephenfin | and we can adjust accordingly | 17:29 |
*** jangutter has quit IRC | 17:29 | |
gibi | so this will be somehow configurable or user requestable to pack or spread? | 17:29 |
sean-k-mooney | that was the reason we said i should write a spec | 17:29 |
stephenfin | sean-k-mooney: that seems like an acceptable compromise given the alternative | 17:29 |
sean-k-mooney | to debate that point | 17:29 |
sean-k-mooney | stephenfin: it's not that we won't use the other numa nodes ever | 17:30 |
sean-k-mooney | we will if the vm does not fit on the first node | 17:30 |
claudiub | btw, I had nodes with ~400 instances on a single NUMA node, and the other numa node was empty. :) | 17:30 |
stephenfin | sean-k-mooney: I don't think that will ever happen | 17:31 |
stephenfin | Yeah, what claudiub says | 17:31 |
sean-k-mooney | stephenfin: it does | 17:31 |
stephenfin | I'm almost certain we don't apply overcommit ratios correctly on a per-node basis | 17:31 |
sean-k-mooney | claudiub: you likely have misconfigured your flavors | 17:31 |
stephenfin | As above, I say we get a fix and functional test for this and then debate it there | 17:31 |
sean-k-mooney | stephenfin: we don't but i think claudiub is missing hw:mem_page_size | 17:32 |
openstackgerrit | Balazs Gibizer proposed openstack/nova master: Replace blind retry with libvirt event waiting in detach https://review.opendev.org/c/openstack/nova/+/770246 | 17:32 |
gibi | I rest my case (at least for today) | 17:32 |
sean-k-mooney | stephenfin: we said we should start defaulting that to hw:mem_page_size=any for all numa instances | 17:32 |
gibi | see you tomorrow | 17:32 |
claudiub | hm, I only had the "hw:numa_nodes=1" extra_spec | 17:32 |
lyarwood | gibi: awesome, I'll queue that up in the morning :) | 17:32 |
stephenfin | sean-k-mooney: that's a legit request | 17:32 |
lyarwood | gibi: \o | 17:32 |
sean-k-mooney | claudiub: yep that's incorrect | 17:32 |
gibi | lyarwood: note that not all of your comments are fixed yet, but I will add one more separate patch to fix those | 17:33 |
sean-k-mooney | claudiub: you have enabled numa nodes but not enabled either numa aware cpu tracking or memory tracking | 17:33 |
* gibi leaves | 17:33 | |
sean-k-mooney | if you set hw:numa_nodes=1 and don't set hw:mem_page_size to any valid value it's not correct | 17:34 |
stephenfin | that's a bit strong. It's fine if you don't have hugepages on the host | 17:34 |
stephenfin | or other things taking memory that you haven't accounted for | 17:34 |
sean-k-mooney | stephenfin: i did not say use hugepages | 17:34 |
stephenfin | setting hw:mem_page_size disables memory overcommit | 17:35 |
sean-k-mooney | stephenfin: i said set hw:mem_page_size to any value | 17:35 |
sean-k-mooney | stephenfin: yep you can't use overcommit with numa properly | 17:35 |
*** bbowen_ has quit IRC | 17:35 | |
sean-k-mooney | stephenfin: claudiub look at https://etherpad.opendev.org/p/nova-wallaby-ptg line 712 | 17:36 |
sean-k-mooney | i explained why this is needed | 17:36 |
stephenfin | that's a veritable hammer of a solution | 17:37 |
stephenfin | we could do proper tracking without that | 17:37 |
claudiub | ouch. didn't know about that | 17:37 |
sean-k-mooney | stephenfin: we can't, we tried many times | 17:37 |
sean-k-mooney | stephenfin: the only time oversubscription can be used with numa guests is if hw:numa_nodes = the number of numa nodes on the host | 17:38 |
stephenfin | I appreciate it's complicated because we need to marry the non-NUMA model with the NUMA model | 17:38 |
stephenfin | that's just because we don't do proper oversubscription modelling on a per-node basis | 17:39 |
sean-k-mooney | we do proper page tracking on a per numa basis | 17:39 |
sean-k-mooney | and that would be used if we did hw:mem_page_size=small | 17:39 |
sean-k-mooney | that could support oversubscription if we wanted | 17:39 |
claudiub | from what I've seen in the docs, it wasn't specifying that huge pages extra_spec is mandatory, so that could be easily missed, imo. | 17:39 |
stephenfin | right, and we could use that with overcommit ratio for non-pinned NUMA instances | 17:39 |
sean-k-mooney | but we need to turn that on which is what hw:mem_page_size does | 17:40 |
stephenfin | that's the gap | 17:40 |
stephenfin | that we don't consider that information for instances without explicit page size configuration | 17:40 |
sean-k-mooney | yep but we were told we can't make all guests numa guests | 17:40 |
sean-k-mooney | this only works if the guest can't float across numa nodes | 17:40 |
stephenfin | which they can't if they have a guest NUMA topology | 17:40 |
sean-k-mooney | stephenfin: yep | 17:41 |
stephenfin | <stephenfin> I appreciate it's complicated because we need to marry the non-NUMA model with the NUMA model | 17:41 |
sean-k-mooney | if we can make all guests numa guests in nova we can fix this | 17:41 |
stephenfin | ^ that's the complicated bit | 17:41 |
stephenfin | because we can't make all guests NUMA guests, as you say | 17:41 |
sean-k-mooney | yep | 17:41 |
stephenfin | so we need to somehow have two views into memory - one with NUMA context and another without it | 17:41 |
sean-k-mooney | so by making all numa guests have hw:mem_page_size=any we turn on the memory tracking | 17:42 |
stephenfin | and somehow still be able to get a useful "how much free memory do I have" answer for both NUMA and non-NUMA guests | 17:42 |
sean-k-mooney | it will use smallpages unless the image asks | 17:42 |
sean-k-mooney | we can then extend that to do oversubscription if we want | 17:42 |
stephenfin | that's doable though | 17:42 |
sean-k-mooney | claudiub: anyway sorry for the info overload | 17:43 |
claudiub | nono, thanks, this is useful. :) | 17:43 |
sean-k-mooney | claudiub: just been trying to fix this for 6+ years | 17:43 |
stephenfin | for example, in the NUMA case, free memory for NUMA N could be seen as (NUMA N total memory * overcommit) - (NUMA N used memory) - (non-NUMA used memory / NUMA count) | 17:43 |
stephenfin | i.e. just evenly divide the non-NUMA memory usage across all NUMA nodes | 17:44 |
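stephenfin's per-node free-memory proposal can be written out as a small function. This is only a sketch of the formula as stated in the chat, not anything nova implements; the function and parameter names are made up for illustration, and the numbers in the example are arbitrary.

```python
def numa_free_memory_mb(node_total_mb, node_used_mb, non_numa_used_mb,
                        numa_count, overcommit_ratio):
    """Free memory for one host NUMA node under the proposed model.

    (node total * overcommit) - (NUMA guest usage on this node)
    - (non-NUMA guest usage spread evenly across all nodes)
    """
    if numa_count < 1:
        raise ValueError("numa_count must be at least 1")
    return (node_total_mb * overcommit_ratio
            - node_used_mb
            - non_numa_used_mb / numa_count)


# Example: a 64 GiB node on a 2-node host with a 1.5 overcommit ratio.
free_mb = numa_free_memory_mb(
    node_total_mb=65536,
    node_used_mb=20000,
    non_numa_used_mb=8000,
    numa_count=2,
    overcommit_ratio=1.5,
)
print(free_mb)
```

A negative result would mean the node is already oversubscribed beyond the configured ratio under this model.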
sean-k-mooney | we cant do that | 17:44 |
stephenfin | my point being, there are other ways to solve this that don't involve the hw:mem_page_size=any hammer :) | 17:44 |
sean-k-mooney | that will cause oom issues | 17:45 |
sean-k-mooney | when a vm has a numa topology we tell the kernel and restrict its memory allocations and the cores it can float over to the numa node | 17:45 |
stephenfin | but the memory isn't locked | 17:45 |
stephenfin | so it can be swapped out | 17:45 |
sean-k-mooney | it can be swapped yes | 17:46 |
sean-k-mooney | in the case of 4k pages at least | 17:46 |
stephenfin | and the memory for non-NUMA guests will move across various NUMA nodes as needed | 17:46 |
sean-k-mooney | hugepages are not swappable | 17:46 |
stephenfin | yeah, I'm only focusing on small pages for now | 17:46 |
sean-k-mooney | stephenfin: so there are 3 things we should do. 1 numa balancing, 2 hw:mem_page_size=any, 3 make smallpages oversubscribable per numa node | 17:47 |
sean-k-mooney | well, do 2 after 3 | 17:47 |
stephenfin | I'm still not sure why 2 is needed, but agreed on the other two | 17:47 |
sean-k-mooney | we do, to turn on the memory tracking | 17:47 |
stephenfin | I'll try to take a look at this the week after next | 17:47 |
sean-k-mooney | we don't do the claim otherwise in the host numa topology object | 17:48 |
stephenfin | since despite all this talk, I'm not going to start on it today :) | 17:48 |
*** slaweq has quit IRC | 17:49 | |
sean-k-mooney | ok either way i think we need to discuss this at the ptg again and we can talk about it on irc again before | 17:49 |
stephenfin | agreed | 17:49 |
stephenfin | wanna add it to the agenda if you haven't already? | 17:49 |
stephenfin | if not I can | 17:49 |
sean-k-mooney | i didn't re-add it since i was going to proceed based on what we agreed last ptg | 17:49 |
sean-k-mooney | i just didn't get time to work on this this cycle because of vdpa | 17:50 |
claudiub | I'll try the hw:numa_pages thing as soon as possible, since it's a bit of a burning issue for us. :) | 17:50 |
*** slaweq has joined #openstack-nova | 17:51 | |
sean-k-mooney | claudiub: setting it to hw:mem_page_size=any or hw:mem_page_size=small should resolve your current issue | 17:51 |
sean-k-mooney | at the cost of oversubscription of memory being blocked | 17:51 |
lyarwood | stephenfin: do you have a link to you libvirt secure boot bug to hand? | 17:51 |
lyarwood | your* | 17:52 |
stephenfin | lyarwood: https://bugzilla.redhat.com/show_bug.cgi?id=1929357 | 17:52 |
openstack | bugzilla.redhat.com bug 1929357 in libvirt "UEFI: Provide a way how to configure different combinations of secure boot enabled/disabled and keys enrolled/not enrolled" [Medium,New] - Assigned to phrdina | 17:52 |
lyarwood | thanks | 17:52 |
sean-k-mooney | claudiub: if that is not an option for you and you can carry a downstream only patch you would add a random shuffle here https://github.com/openstack/nova/blob/20459e3e88cb8382d450c7fdb042e2016d5560c5/nova/virt/hardware.py#L2276 of host_cells | 17:52 |
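A rough idea of what such a downstream shuffle could look like, as a standalone sketch. It does not reproduce the code at the linked line: `shuffled_cells` and the plain list of cell ids are hypothetical, and only `random.shuffle` from the standard library is used.

```python
import random


def shuffled_cells(host_cells, rng=random):
    """Return a shuffled copy of host_cells, leaving the input untouched.

    A first-fit search over the shuffled list no longer always starts
    at NUMA node 0, so single-node guests spread out on average.
    """
    cells = list(host_cells)
    rng.shuffle(cells)
    return cells


rng = random.Random(0)  # seeded only to make the demo repeatable
host_cells = [0, 1]
first_choices = {shuffled_cells(host_cells, rng)[0] for _ in range(200)}
print(first_choices)
```

This only randomises the order in which cells are tried; it does not look at how full each cell is, so it is a workaround rather than real balancing.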
claudiub | will try both. :) | 17:54 |
sean-k-mooney | the balancing without oversubscription or randomisation with it likely could be backported as a workaround bugfix, with the proper feature in the future | 17:56 |
*** derekh has quit IRC | 18:03 | |
*** ralonsoh has quit IRC | 18:09 | |
*** bbowen has joined #openstack-nova | 18:17 | |
*** iurygregory has quit IRC | 18:22 | |
*** iurygregory has joined #openstack-nova | 18:22 | |
*** andrewbonney has quit IRC | 18:23 | |
sean-k-mooney | stephenfin: related to the previous topic it looks like libvirt is locking the guest memory any time a guest has vfio (pci passthrough/sriov), mdev or nvme devices assigned. nova was previously not aware of that and that means many guests we previously assumed were swappable are not. i still have to file a bug for this but it's emerging following the memory locking discussion we had for vdpa | 18:24 |
sean-k-mooney | i'm not sure how to fix that yet without requiring all guests with any kind of passthrough device to be numa guests or otherwise use hw:mem_page_size in some form | 18:26 |
sean-k-mooney | we can mitigate the problem for q35 guests by enabling the viommu | 18:27 |
sean-k-mooney | but i suspect this is the cause of many OOM bugs that have been filed in the past | 18:27 |
sean-k-mooney | oh and more fun, file backed memory does not seem to work the way we thought either | 18:29 |
sean-k-mooney | i should do some more testing but using it i was not actually able to allocate more vms than i had memory for without OOM issues killing the running vms | 18:30 |
sean-k-mooney | so it looks like instead of mmapping the guest memory from the files as the qemu/libvirt docs imply | 18:31 |
sean-k-mooney | qemu just mallocs the memory normally and then also creates a mapping of the memory to a file | 18:31 |
sean-k-mooney | that might be because we are using the legacy api for this or it might be for a different reason but either way it makes me sad :( | 18:32 |
*** slaweq has quit IRC | 18:34 | |
sean-k-mooney | actually i think i know why it's broken. if we want vms to only have file backed memory, which is what we intended, we need to explicitly set the normal memory to 0 i believe and then add the file using the memory hotplug feature, so ya it's likely a qemu bug in the old api. | 18:49 |
*** rcernin has joined #openstack-nova | 18:50 | |
*** rcernin has quit IRC | 18:56 | |
*** mgagne has quit IRC | 18:56 | |
*** mgagne has joined #openstack-nova | 18:57 | |
*** whoami-rajat has quit IRC | 18:59 | |
*** rcernin has joined #openstack-nova | 19:12 | |
*** bbowen has quit IRC | 19:15 | |
*** rcernin has quit IRC | 19:17 | |
*** hamalq has joined #openstack-nova | 19:41 | |
openstackgerrit | Merged openstack/nova master: libvirt: Parse the 'os' element from domainCapabilities https://review.opendev.org/c/openstack/nova/+/673790 | 19:43 |
*** rcernin has joined #openstack-nova | 20:00 | |
*** nightmare_unreal has quit IRC | 20:15 | |
*** gokhani has quit IRC | 20:17 | |
openstackgerrit | Merged openstack/nova master: nova-manage: Add libvirt get_machine_type command https://review.opendev.org/c/openstack/nova/+/769548 | 20:17 |
*** k_mouza has quit IRC | 20:19 | |
*** slaweq has joined #openstack-nova | 20:29 | |
*** bbowen has joined #openstack-nova | 20:38 | |
*** Jeffrey4l has quit IRC | 20:39 | |
*** Jeffrey4l has joined #openstack-nova | 20:39 | |
*** rcernin has quit IRC | 20:43 | |
*** k_mouza has joined #openstack-nova | 20:50 | |
*** k_mouza has quit IRC | 20:55 | |
*** xek has quit IRC | 21:04 | |
*** openstackgerrit has quit IRC | 21:05 | |
*** rcernin has joined #openstack-nova | 21:09 | |
*** luksky has quit IRC | 21:28 | |
*** luksky has joined #openstack-nova | 21:28 | |
*** luksky has quit IRC | 21:39 | |
*** rcernin has quit IRC | 21:49 | |
*** luksky has joined #openstack-nova | 21:53 | |
*** ircuser-1 has joined #openstack-nova | 21:54 | |
*** claudiub has quit IRC | 22:16 | |
*** slaweq has quit IRC | 22:27 | |
*** rcernin has joined #openstack-nova | 22:29 | |
*** takamatsu has quit IRC | 22:42 | |
*** rcernin has quit IRC | 22:54 | |
*** rcernin has joined #openstack-nova | 22:54 | |
*** swp20 has joined #openstack-nova | 22:56 | |
*** takamatsu has joined #openstack-nova | 22:57 | |
*** tkajinam has joined #openstack-nova | 22:57 | |
*** songwenping__ has quit IRC | 22:59 | |
melwitt | lyarwood: could you pls take a look at these stable/victoria backports? they've disabled tests in tripleo ci to workaround intermittent failures due to the bug https://review.opendev.org/c/openstack/nova/+/777121 and https://review.opendev.org/c/openstack/nova/+/777209 | 23:03 |
*** openstackgerrit has joined #openstack-nova | 23:22 | |
openstackgerrit | Takashi Kajinami proposed openstack/nova master: WIP: Clean up allocations left by evacuation https://review.opendev.org/c/openstack/nova/+/778696 | 23:22 |
*** luksky has quit IRC | 23:28 | |
*** martinkennelly has quit IRC | 23:34 | |
*** martinkennelly has joined #openstack-nova | 23:34 | |
openstackgerrit | melanie witt proposed openstack/nova master: Add functional test for bug 1837995 https://review.opendev.org/c/openstack/nova/+/775449 | 23:51 |
openstack | bug 1837995 in OpenStack Compute (nova) ""Unexpected API Error" when use "openstack usage show" command" [Undecided,In progress] https://launchpad.net/bugs/1837995 - Assigned to melanie witt (melwitt) | 23:51 |
openstackgerrit | melanie witt proposed openstack/nova master: Dynamically archive FK related records in archive_deleted_rows https://review.opendev.org/c/openstack/nova/+/773834 | 23:51 |