*** osmanlicilegi is now known as Guest0 | 04:33 | |
opendevreview | Nobuhiro MIKI proposed openstack/nova-specs master: Re-propose "Add maxphysaddr support for Libvirt" for 2024.1 Caracal https://review.opendev.org/c/openstack/nova-specs/+/895135 | 05:53 |
---|---|---|
sahid | sean-k-mooney: o/ we have discussed using tenant isolation, but that does not really address our use case, and now all vms scheduled for that given tenant are targeted to that az | 07:41 |
sahid | we would like the scheduler to consider the new az for some specified tenants | 07:42 |
sahid | any idea about how we can achieve that? | 07:43 |
sahid | basically, for a given tenant allowed on az3, the list returned to the scheduler would be [az1, az2, az3], and for the other tenants only [az1, az2] | 07:44 |
sahid | bauzas: o/ you may have some idea ^ | 07:46 |
sahid | i'm thinking about tweaking multitenantisolation a bit to achieve that | 07:46 |
opendevreview | Merged openstack/nova master: Add service version for Bobcat https://review.opendev.org/c/openstack/nova/+/893749 | 08:24 |
bauzas | sahid: looking (I was afk) | 08:27 |
bauzas | so you would like to return different AZs based on the tenant ? | 08:28 |
bauzas | if so, yeah, it's https://github.com/openstack/nova/blob/master/nova/scheduler/request_filter.py#L95 | 08:28 |
sahid | hum... | 08:37 |
sahid | that does not really work, we will have to duplicate our aggregates and set filter_tenant_id to all the duplicate ones | 09:09 |
sahid | I guess that way, for a given tenant it will be possible to schedule on az1, az2, az3 using filter_tenant_id and the others will continue on az1 and az2 | 09:11 |
zigo | stephenfin: FYI, I also found that freezer-api probably needs some sqlalchemy 2.x love too... | 09:49 |
zigo | TypeError: LegacyEngineFacade.from_config() got an unexpected keyword argument 'autocommit' | 09:49 |
zigo | ... | 09:49 |
zigo | failures=511 | 09:50 |
zigo | Is it possible that just removing the autocommit argument is enough? | 09:50 |
sean-k-mooney | is freezer still supported | 09:52 |
zigo | Wooops... s/freezer/watcher/ | 09:52 |
zigo | My bad. | 09:52 |
sean-k-mooney | i have almost the same question for watcher :P although i had seen that mentioned at least this side of the pandemic | 09:53 |
zigo | Contrary to freezer, there were some commits for this cycle. | 09:53 |
sean-k-mooney | there were like 8 commits in watcher, mostly procedural | 09:54 |
sean-k-mooney | but i guess it's still officially supported | 09:54 |
sean-k-mooney | tobias-urdin: you were making some changes to watcher earlier in the year. is that a project you have use of personally | 09:56 |
sean-k-mooney | just wondering, if patches were created, how likely is it that they would be merged/reviewed | 09:56 |
opendevreview | Amit Uniyal proposed openstack/nova master: WIP:Adds device tagging functional tests https://review.opendev.org/c/openstack/nova/+/895162 | 11:31 |
tobias-urdin | sean-k-mooney: we do not use it, nothing i actively work on, i just did some drive-by fixes for oslo.messaging or fixing ci probably | 11:51 |
sean-k-mooney | tobias-urdin: ack ya that is what i was assuming, i was just wondering if there were any active contributors left | 11:57 |
bauzas | ralonsoh: while https://review.opendev.org/c/openstack/neutron/+/893447 is now merged (the neutron revert), we still have CI failures today https://zuul.opendev.org/t/openstack/build/01dd3befff5e42cfb0c0f667b8e7e30b | 12:24 |
bauzas | elodilles: I guess you know the current situation in Nova | 12:28 |
bauzas | we still need to await a few merges before RC1 | 12:29 |
bauzas | hmpfff https://zuul.opendev.org/t/openstack/builds?job_name=nova-live-migration&skip=0 | 12:30 |
bauzas | 30% of failures I'd say | 12:31 |
mgariepy | is it possible to force a device_type in nova for pci devices ? | 12:44 |
sean-k-mooney | no | 12:44 |
mgariepy | i want to only do some pci passthrough on a couple of A100 | 12:45 |
sean-k-mooney | you can filter by the pci address | 12:46 |
sean-k-mooney | with support for regexes or bash globs if you dont want to just use the address | 12:46 |
mgariepy | https://github.com/openstack/nova/blob/stable/2023.1/nova/pci/stats.py#L682C1-L683C1 | 12:46 |
sean-k-mooney | yes if the alias is not device_type pf then we filter them out when considering candidates | 12:47 |
sean-k-mooney | that does not mean you cant request PF for passthrough | 12:48 |
sean-k-mooney | you just have to set the device_type to type-PF in the pci alias | 12:48 |
sean-k-mooney | so instead of type-pci or type-vf https://github.com/openstack/nova/blob/stable/2023.1/nova/pci/request.py#L34 you would set type-PF | 12:49 |
sean-k-mooney | the alias and device list work together to select the device. the alias is how you describe the request and the device_list (pci whitelist) is how you declare what device may be allocated | 12:50 |
mgariepy | alias = { "name": "nvidia-a100-smx4-40gb", "product_id": "20b0", "vendor_id": "10de", 'device_type': 'type-PF'} | 12:51 |
mgariepy | restart the scheduler, and still get the same error. | 12:51 |
sean-k-mooney | you need to set this on all the computes too, just an fyi, they have to match | 12:52 |
sean-k-mooney | but what is the error | 12:52 |
mgariepy | oh it's also set on the gpu compute. | 12:53 |
sean-k-mooney | have you checked the pci devices table to ensure the a100s are correctly tracked | 12:53 |
sean-k-mooney | whether they show up as type-PF or type-pci will depend on if they report support for creating VF | 12:54 |
sean-k-mooney | that may depend on if you have enabled mig mode or not | 12:55 |
sean-k-mooney | for t4 it depended on the firmware version | 12:55 |
sean-k-mooney | i think a100 would always report sriov support but its worth checking | 12:55 |
mgariepy | Placement PCI resource view: Placement PCI view on gpu-hpc21145: RP(gpu-hpc21145_0000:01:00.0, CUSTOM_PCI_10DE_20B0=1, traits=COMPUTE_MANAGED_PCI_DEVICE) | 12:56 |
sean-k-mooney | oh you're using the pci in placement feature | 12:56 |
mgariepy | yep all i tried was also failing :) haha | 12:57 |
mgariepy | on 2023.1 | 12:57 |
sean-k-mooney | well this should be fine | 12:57 |
sean-k-mooney | so is it passing placement and failing in the pci filter? | 12:57 |
sean-k-mooney | or is it failing to get results from placement | 12:57 |
elodilles | bauzas: ACK. you can -1 the patch until it'll be updated: https://review.opendev.org/c/openstack/releases/+/894693 | 12:59 |
bauzas | oh missed it from my mails | 12:59 |
mgariepy | fails at running : _filter_pools_for_unrequested_pfs() | 13:00 |
bauzas | I was wondering why I wasn't seeing it | 13:00 |
mgariepy | dev_type vs device_type ? | 13:00 |
mgariepy | or it's abstracted somewhere ? | 13:00 |
sean-k-mooney | its device_type in the alias | 13:01 |
ralonsoh | bauzas, yes, I'm going to mark this test as unstable again | 13:01 |
sean-k-mooney | https://github.com/openstack/nova/blob/stable/2023.1/nova/pci/request.py#L95C10-L104 that is the jsonschema for the validation | 13:01 |
ralonsoh | we need to find a better way to set the subport as active after the live migration | 13:02 |
sean-k-mooney | but i think its dev type in the object | 13:02 |
mgariepy | let me log the request and pool to see if something seems odd. | 13:02 |
sean-k-mooney | https://github.com/openstack/nova/blob/stable/2023.1/nova/pci/request.py#L143C13-L145 | 13:02 |
bauzas | ralonsoh: ack | 13:02 |
sean-k-mooney | mgariepy: i would check the pci_devices table in the cell db | 13:02 |
sean-k-mooney | and make sure the device has the expected type etc | 13:03 |
mgariepy | nova.pci_devices tells me that the device is type-PF | 13:04 |
ralonsoh | bauzas, https://review.opendev.org/c/openstack/tempest/+/895167 | 13:05 |
sean-k-mooney | mgariepy: what does the pci whitelist look like for that host | 13:09 |
sean-k-mooney | based on the alias above it looks ok, unless you added physnet or remote managed or something im not sure why it would be removed | 13:11 |
sean-k-mooney | its the pci filter not the numa filter that is removing it right? | 13:11 |
mgariepy | https://paste.openstack.org/show/bij0SgRpSEVCTILyzIzf/ | 13:11 |
mgariepy | seems to be the pci filter | 13:12 |
sean-k-mooney | hum that all looks correct to me | 13:13 |
mgariepy | https://paste.openstack.org/show/bP2oY2IOe4lNvj6wGB8w/ | 13:14 |
mgariepy | what is resolving the alias ? is it done by the scheduler directly ? | 13:14 |
sean-k-mooney | that depends on the request | 13:15 |
mgariepy | my flavor has alias stuff in it. | 13:15 |
sean-k-mooney | but no i think its done in the compute or the api | 13:15 |
sean-k-mooney | https://docs.openstack.org/nova/latest/admin/pci-passthrough.html#configure-nova-api | 13:16 |
sean-k-mooney | so for new vm requests this is used by the nova-api not the scheduler | 13:17 |
sean-k-mooney | and for resize its used from the compute | 13:17 |
sean-k-mooney | so if you are updating this you need to restart the nova-api not the scheduler | 13:17 |
sean-k-mooney | the scheduler receives a fully populated RequestSpec object which has the pci requests embedded in it | 13:18 |
mgariepy | but my request doesn't receive the device_type :/ | 13:19 |
sean-k-mooney | did you restart the nova-api processes after updating the alias | 13:23 |
mgariepy | i did restart the nova slice. let me try just putting the same config across all the cluster to see if it changes something. | 13:24 |
mgariepy | what driver should be loaded for the pci dev ? | 13:32 |
mgariepy | pci-stub or vfio-pci ? | 13:32 |
sean-k-mooney | the driver should not matter to the scheduler | 13:32 |
sean-k-mooney | that would only matter on the compute node but yes either of those should work | 13:33 |
mgariepy | ok | 13:33 |
sean-k-mooney | the important part is that the framebuffer is not initialised by xorg/wayland as that would prevent the passthrough | 13:33 |
mgariepy | ha ok. | 13:33 |
sean-k-mooney | vfio-pci is what kvm/qemu will use when its being passed through | 13:34 |
sean-k-mooney | so prebinding to that is generally not a bad idea but often not required | 13:34 |
mgariepy | i did some stuff with passthrough with el-cheapo gamer gpus. but with older openstack release. | 13:34 |
sean-k-mooney | there is nothing really that has changed that would impact this | 13:35 |
sean-k-mooney | the alias and whitelist/device list syntax is the same | 13:36 |
sean-k-mooney | and you're reporting that its failing in the pci filter so its not related to the placement part | 13:36 |
sean-k-mooney | since it passes that to get to the point where its failing | 13:36 |
mgariepy | yeah i'll retest in like 30 minutes.. meeting time ;) | 13:37 |
*** tosky_ is now known as tosky | 14:06 | |
mgariepy | sean-k-mooney, it works now. Thanks a lot for your help. | 15:04 |
mgariepy | with the type-PF and all my server with the same config it's almost magical :D | 15:04 |
sean-k-mooney | awesome | 15:21 |
opendevreview | Sylvain Bauza proposed openstack/placement master: Update 2023.2 reqs to support os-traits 3.0.0 as min version https://review.opendev.org/c/openstack/placement/+/895186 | 15:21 |
sean-k-mooney | the request was probably being accepted by a not restarted instance | 15:22 |
mgariepy | yeah | 15:22 |
bauzas | dansmith: sean-k-mooney: forgot to propose the requirements update for the new traits : https://review.opendev.org/c/openstack/placement/+/895186 | 15:23 |
mgariepy | any tips on debugging that sort of stuff when multiple instances of every service are running ? | 15:23 |
bauzas | we need it before RC1 | 15:23 |
dansmith | bauzas: aren't we past requirements freeze? | 15:23 |
dansmith | is that in u-c already? | 15:23 |
bauzas | dansmith: os-traits was delivered in March | 15:24 |
bauzas | and u-c already has it | 15:24 |
dansmith | okay | 15:24 |
bauzas | https://opendev.org/openstack/requirements/raw/branch/master/upper-constraints.txt | 15:24 |
dansmith | ack, cool | 15:25 |
bauzas | so we just need to say "look, for Bobcat, we need to accept only 3.0 since we use a trait in it" | 15:25 |
bauzas | I haven't checked tho whether we really use this trait in Placement but I'm assuming it | 15:25 |
dansmith | bauzas: wait you're not sure if we need the new traits? :) | 15:26 |
dansmith | we just bumped for this a few weeks ago: https://review.opendev.org/c/openstack/nova/+/873221 | 15:26 |
opendevreview | Gorka Eguileor proposed openstack/nova master: Fix load_validators https://review.opendev.org/c/openstack/nova/+/895189 | 15:38 |
opendevreview | Gorka Eguileor proposed openstack/nova master: Fix debug options https://review.opendev.org/c/openstack/nova/+/895190 | 15:39 |
opendevreview | Gorka Eguileor proposed openstack/nova master: Logs cinderclient requests when debugging https://review.opendev.org/c/openstack/nova/+/895191 | 15:39 |
opendevreview | Gorka Eguileor proposed openstack/nova master: Fix guard for NVMeOF volumes https://review.opendev.org/c/openstack/nova/+/895192 | 15:39 |
*** efried1 is now known as efried | 15:51 | |
bauzas | dansmith: so, I did some research, my bad, I just blindly followed a pattern from Antelope when I did https://review.opendev.org/c/openstack/placement/+/874080 | 15:53 |
bauzas | so, https://review.opendev.org/c/openstack/nova/+/873221 is using traits-2.10 https://review.opendev.org/c/openstack/releases/+/873106 which is already supported as minimum | 15:54 |
bauzas | dansmith: so, I think I'll abandon https://review.opendev.org/c/openstack/placement/+/895186 | 16:10 |
dansmith | bauzas: okay | 16:11 |
bauzas | I'm actually lost, we created 3.0 in order to no longer support py2.6 and py2.7 | 16:12 |
sean-k-mooney | bauzas: so https://review.opendev.org/c/openstack/nova/+/831194/37/nova/share/manila.py#43 is returning the connection via the sdk to talk to manila | 16:12 |
bauzas | sean-k-mooney: sec, trying to see whether we need https://review.opendev.org/c/openstack/releases/+/894698 | 16:12 |
sean-k-mooney | its fine | 16:13 |
sean-k-mooney | focus on that, i was just wondering where the error was | 16:13 |
bauzas | sorry I meant https://review.opendev.org/c/openstack/placement/+/895186 | 16:13 |
bauzas | we'll add a new trait which isn't used yet by Nova | 16:14 |
bauzas | but we'll also make py3.6 and py3.7 unsupported | 16:14 |
bauzas | dansmith: sean-k-mooney: okay, I think I sorted it, we eventually need https://review.opendev.org/c/openstack/placement/+/895186 not because of the new trait, but because we want to unsupport 3.6 and 3.7 | 16:20 |
bauzas | in Bobcat | 16:20 |
dansmith | okay | 16:20 |
sean-k-mooney | ok so i should +2w that then right | 16:20 |
sean-k-mooney | the min version bump to 3.0.0 | 16:20 |
* sean-k-mooney waits... | 16:21 | |
bauzas | sean-k-mooney: sorry, yeah | 16:24 |
bauzas | thanks | 16:24 |
* bauzas doesn't have yet chameleon eyes :D | 16:25 | |
* bauzas is now back into recheck mode | 16:25 | |
sean-k-mooney | mgariepy: the only real tip is to use the request-id to correlate the requests across different services when looking at the logs | 16:39 |
sean-k-mooney | that and try and use automation like kolla-ansible to ensure you have the same config on all relevant nodes rather than doing manual changes | 16:40 |
sean-k-mooney | but no not really other than that | 16:40 |
* bauzas disappears for family reasons but will be back later in the evening | 16:41 | |
-opendevstatus- NOTICE: The lists.airshipit.org and lists.katacontainers.io sites will be offline briefly for migration to a new server | 16:48 | |
mgariepy | sean-k-mooney, i'm using openstack-ansible | 17:03 |
mgariepy | when making changes that i'm not quite sure about, i do them manually instead of doing them via osa. | 17:03 |
*** EugenMayer4404 is now known as EugenMayer440 | 18:13 |
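
Editor's note on the 16:39 tip (correlating by request-id when several instances of each service are running): a minimal Python sketch, assuming log lines carry an ID of the form `req-<uuid>`. The function name `group_by_request_id`, the `SAMPLE_LOGS` data, and the log line layout are all hypothetical and only illustrate the idea; real log formats depend on the deployment's logging configuration.

```python
import re
from collections import defaultdict

# Assumption: request IDs appear in log lines as "req-" followed by a UUID.
REQ_ID = re.compile(r"req-[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}")


def group_by_request_id(lines_by_source):
    """Map each request ID to the (source, line) pairs that mention it.

    `lines_by_source` maps a label (for example "host1/nova-api") to an
    iterable of log lines collected from that service instance.
    """
    grouped = defaultdict(list)
    for source, lines in lines_by_source.items():
        for line in lines:
            # A line may mention more than one ID; record it under each.
            for req_id in sorted(set(REQ_ID.findall(line))):
                grouped[req_id].append((source, line))
    return dict(grouped)


# Hypothetical sample data, not taken from a real deployment.
SAMPLE_LOGS = {
    "host1/nova-api": [
        "INFO [req-11111111-2222-3333-4444-555555555555] POST /servers",
    ],
    "host2/nova-scheduler": [
        "DEBUG [req-11111111-2222-3333-4444-555555555555] pci filter returned 0 hosts",
        "DEBUG [req-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee] unrelated request",
    ],
}

if __name__ == "__main__":
    for req_id, hits in group_by_request_id(SAMPLE_LOGS).items():
        print(req_id)
        for source, line in hits:
            print(f"  {source}: {line}")
```

Labelling each source with its host makes it easier to spot which instance of a service handled a given request, which is the situation described at 15:22 where a non-restarted instance was probably accepting the request.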