Monday, 2025-03-24

*** ralonsoh_ is now known as ralonsoh07:55
dansmithgibi: sean-k-mooney: so starting to look at placement where the capacity checks happen, for addressing the overcommit bugs...15:35
sean-k-mooneyoh fun15:36
dansmithI'm only seeing a single test that ever asserts the capacity-exceeded scenario in all of placement's unit and functional tests15:36
dansmiththat is, like, unexpected since that's kinda the core function of placement15:36
sean-k-mooneywhen you say functional did you include the gabbit tests15:36
dansmithI know there are lots of gabbit tests I'm probably not grepping to find, but.. am I missing something?15:36
sean-k-mooneywell the gabbit test are the primarly api test/funcitonal tests in placment15:37
sean-k-mooneyi know they are closer to neutron fullstack tests then our functional tests15:37
sean-k-mooneybut i think thats where a lot of the end to end functial behaivor is verifed15:37
sean-k-mooneyhttps://github.com/openstack/placement/tree/master/placement/tests/functional/gabbits15:38
sean-k-mooneythere are a number of allcoation an dallcoation bug tests there for sepcifc beahivor15:39
dansmithyeah,15:39
dansmithbut only one that explicitly tests overcapacity for allocations, AFAICT,15:40
sean-k-mooneydansmith: i belive some of this is also enfoced at the db level althouhg i dont think they are usign db triggers or anything to actully encode it at the db level15:40
dansmithand it's just the simplest case of asking for more than exists15:40
dansmithsean-k-mooney: it is, and there's a single db test for one simple sccenario there15:40
sean-k-mooneyack15:40
dansmithI mean the db object layer in placement15:40
sean-k-mooneyi guess the assumtin is if you neever allow a simple allcotion to oversubscibe that covers the more advanced cases15:41
dansmiththere's a big complex function that checks if we're over capacity after an allocation15:41
sean-k-mooneyright15:41
sean-k-mooneyso https://github.com/openstack/placement/blob/master/placement/tests/functional/gabbits/allocations-bug-1779717.yaml#L73 is testing creating an allcotion for multiple consumers15:41
dansmithit's not that simple though, that big function has a lot of edges that can't possibly be covered by a single test (single python test and a single gabbit test)15:41
dansmithyep15:41
sean-k-mooneyalthough that is not really about capastiy15:42
sean-k-mooneylookign at the bug itself https://bugs.launchpad.net/nova/+bug/177971715:42
sean-k-mooneyi guess to un pick this a littel you might want to create a repoducer test for the current evac/cold migration case15:43
sean-k-mooneywhere we are doing the swap 15:43
sean-k-mooneyi suspect you can do that without gabit using the placement fixture15:43
dansmithwell, I want to test the underlying mechanics in placement, not nova15:43
sean-k-mooneywe certenly could do that in nova so i dont see why you could not in placment itself15:44
dansmithand I expected to see a lot more coverage in placement given how complex the capacity check is15:44
sean-k-mooneyright but the the actial api behavior does not actully depend on nova15:44
sean-k-mooneyyou jsut need to create the allcoation mess with reserved then try and swap it15:44
sean-k-mooneybasiclly emulate the OTU flow15:45
sean-k-mooneyi know you know that by the way im just trying to state the repoducer clearly15:45
sean-k-mooneygabit is expressive enough to encode that but its not very deep coverage hench why you probly will want to try using the placement fixutre and writign a new functional test15:46
dansmithyeah I'm very aware of what I need to do, I'm just surprised that there is so little coverage of this _area_ in placement15:46
sean-k-mooneyill fully adming i have never been that deep on the implemation side of placement15:47
sean-k-mooneyonly the usage of it15:47
sean-k-mooneyso i havent really looked tat the testing in any detail15:47
dansmithokay I figured you and gibi are the most familiar, but maybe not15:47
dansmithack15:47
sean-k-mooneygibi and melwitt i think. api wise yes but ptyhon wise less so on my part15:48
sean-k-mooneyim willing to review placement code but i generally need to learn the context every time i do15:48
gibidansmith: I would expect more than one over capacity test case. let me take a look15:49
sean-k-mooneyi think tis is the only functional testing of allctions https://github.com/openstack/placement/blob/master/placement/tests/functional/test_allocation.py15:49
sean-k-mooneyi.e. in python 15:50
dansmithI definitely want to do this in python functional tests and not gabbits, FWIW15:50
sean-k-mooneydansmith: this is the one you wer referint too for the db level right https://github.com/openstack/placement/blob/master/placement/tests/functional/db/test_allocation.py#L51915:51
sean-k-mooneya number of those functional tests are really like 5 tests in one15:51
dansmithyup15:52
dansmithyup, referring, but not sure yup in terms of coverage15:52
dansmiththe capacity check is very complex :)15:52
sean-k-mooneyhttps://github.com/search?q=repo%3Aopenstack/placement%20InvalidAllocationCapacityExceeded&type=code not that i realy turst github search but only 3 hits total15:53
sean-k-mooneycodesearch agrees15:53
sean-k-mooneythat excetion is not used elsewhere in the codebase15:54
sean-k-mooneyi guess there is also InvalidInventoryCapacity15:55
sean-k-mooneybut i think thats raised when you modify the inventory15:56
dansmithright15:56
gibithis file has some POST allocation tests https://github.com/openstack/placement/blob/master/placement/tests/functional/gabbits/allocations-post.yaml15:56
gibibut I understand you want moe15:57
gibimore15:57
sean-k-mooneyright and that does https://github.com/openstack/placement/blob/master/placement/tests/functional/gabbits/allocations-post.yaml#L212-L23615:57
dansmithgibi: yeah it's not just allocation, it's allocation with the capacity check15:58
dansmith*failing15:58
gibithis also has some coverage https://github.com/openstack/placement/blob/master/placement/tests/functional/db/test_allocation.py but I'm not against adding more :)15:59
dansmiththat's really the only place in python, but yeah16:04
gibiin cases when I'm not sure what is covered and not covered I tend to go in and break the code in certain cases to see if it caught be tests. Maybe there are test cases in wrongly named test files hiding16:07
sean-k-mooneygibi: i dont think placement has the saem level fo in depth testing that nova has in genral. gabit is greate but often it was used instead of unit and functional testin instead of in addtion to them16:27
sean-k-mooneyone of the test case that i looked at was teh sole testing in the commit that added a new microverions16:27
sean-k-mooneythere may have been other changes in teh serise16:28
sean-k-mooneybut if that really was the only commit to implemnt the microversion it shows that gabbit testing was considered enough coverage for an api change16:28
dansmithgibi: yeah I was going to create a reproducer for the issue and then break it to make sure it starts working.. so I guess we'll see if something stops when I do :)16:37
simondodsleyhi - i'm looking for advice on how to disable `discard` on a single cinder volume that is coming from a cinder driver where `discard` is hardcoded to be `True`. A customer is having issues with multiattach volumes that they want to use for Windows clustering, in that the SCSI page 83h doesn't appear to be passing through the SCSI reservation settings.16:44
simondodsleyI read that `nodiscard` may resolve this, but can't work out how to do the override. Overriding the whole backend for a test would be an option, but as I say the cinder driver has `disacrd` hardcoded. Is there somethin in nova that can help?16:45
sean-k-mooneysimondodsley: so windwos clustering required the vm to be create with a volume configured to use type=lun which cant be set durign attach only vm creation16:46
dansmithyep, was just going to say ^16:46
sean-k-mooneywe shoudl probaly fix that at some point16:47
sean-k-mooneybut anyway i am not sure if you can overried that with our libvirt section16:47
dansmithis that enough to enable it? I thought not16:47
sean-k-mooneywe have a cofnig option for discard but i tought it was for nova provisioend storage16:47
dansmithI thought lun was the only way16:47
sean-k-mooneyno i think you need lun for scsi v316:47
simondodsleyso how do we set the type=lun at vm creation?16:47
sean-k-mooneyi just ment that even behond that im not sure if you can overried discard form nova16:47
sean-k-mooneysimondodsley: so the only discard cofnig im awre of in nova is https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.hw_disk_discard which woudl turn it off for the host but that says "nova managed disks"16:49
dansmithsean-k-mooney: I think if they want the guest to use discard and have that make it all the way to the actual physical device they need to use lun, no?16:49
sean-k-mooneydansmith: well that what will happen if discard is enabeld on the cinder volume in the connection info16:49
dansmithI thought libvirt discard would only hit the qemu storage driver, like ceph or qcow216:49
sean-k-mooneydansmith: i guess there are two diffent things happenign here16:50
sean-k-mooneysimondodsley made the assertion that truning of discard migh help with the scsi command making it to the backend. i think we both agree it not related to that16:50
dansmithtight16:51
dansmith*right :)16:51
sean-k-mooneysimondodsley: so if you look at the server create api https://docs.openstack.org/api-ref/compute/#create-server16:51
sean-k-mooneythere is an option device_type filed in the request16:52
sean-k-mooneyin the block device mappings16:52
sean-k-mooneysorry that might not be the correct filed16:52
sean-k-mooneyi think you want block_device_mapping_v2.disk_bus  i will neeed to check the other doc16:53
sean-k-mooneyhttps://docs.openstack.org/nova/latest/user/block-device-mapping.html16:53
sean-k-mooneyok no so you want disk_bus=scsi and deivce_type=lun16:53
sean-k-mooney"""disk_bus and device_type - low level details that some hypervisors (currently only libvirt) may support. Some example disk_bus values can be: ide, usb, virtio, scsi, while device_type may be disk, cdrom, floppy, lun. This is not an exhaustive list as it depends on the virtualization driver, and may change as more support is added. Leaving these empty is the most common16:54
sean-k-mooneything to do."""16:54
simondodsleyok - so how would that be forced from the comand line, or the gui?16:54
sean-k-mooneyyou cant do it form horizon16:55
sean-k-mooneyfrom the cli you an do it when creating the block device mapppings16:55
sean-k-mooneyhttps://docs.openstack.org/python-openstackclient/latest/cli/command-objects/server.html#server-create16:55
sean-k-mooneyyou have to use teh --block-device parmater instead of --volume 16:55
sean-k-mooneyhttps://docs.openstack.org/python-openstackclient/latest/cli/command-objects/server.html#cmdoption-openstack-server-create-block-device16:55
simondodsleyawesome - i see - thank you. We will give that a try16:55
sean-k-mooneysimondodsley: when dan fixed this he added some tempest test that validate this is functional with lvm and multi attache. although obviosly not with a windows workload16:57
sean-k-mooneysimondodsley: but we do have coverage of creating a vme with a mutiatch volume in lun mode16:58
sean-k-mooneyhttps://github.com/openstack/tempest/blob/f4a8698b97a5433b8e64df408aace2fb14c2cb22/tempest/api/compute/volumes/test_attach_volume.py#L46916:58
simondodsleyand this will ensure that SCSI page 83 will be passed through correctly?16:58
sean-k-mooneydo you ahve a refence to that defintiion. it will ensure that scsiv3 command are aviable to the guest16:59
sean-k-mooneywithout that you onlyhave access to v2 commands16:59
sean-k-mooneysimondodsley: by the way we only fixed this fucntionaly in recent version of nova17:02
sean-k-mooneyi think form dalmation on17:02
simondodsleyoh - so it won't work on caracal17:03
simondodsleyhow did anyone every use windows clustering before this fix then?17:03
sean-k-mooneyi can check when we fixed it but no livbirt changed it api because qemu stop supproting the serial atibute with type lun a few years ago17:03
sean-k-mooneywindows supprot clustierning with other ways  of doing synconisation17:03
sean-k-mooneyi.e. not file system absed it has 3-4 modes17:04
sean-k-mooneyso either they used one of the other mode or they didnt do it with libvirt17:04
sean-k-mooneyi.e. they were usihg hyperv or vmware or ironic17:04
simondodsleythis is the error from windows - `Physical disk {} does not have the inquiry data (SCSI page 83h VPD descriptor) that is required by failover clustering. Status: The request is not supported`17:04
sean-k-mooneyyep17:05
dansmiththat's what lun should fix17:05
simondodsleybut it only from dalmation...17:05
sean-k-mooneyor really old libvirt17:05
dansmithsimondodsley: also, clustered windows is not cloudy and openstack really focuses on cloudy workloads :)17:05
dansmithso that's why it's recent17:06
simondodsleyi know that, but tell customers :D17:06
dansmithjust saying.17:06
sean-k-mooneyso the fix in dalmation was https://github.com/openstack/nova/commit/575ff86a4f1572786d66639f774405fbc074fdb117:06
sean-k-mooneywhich depend on another change that was done in caracal17:07
sean-k-mooneyit might be possibel to backport to caracal but not before17:07
sean-k-mooneythe prior change i dont think is backportable (using alias)17:07
dansmithnot backportable upstream, but doable for simondodsley you mean17:08
sean-k-mooneywell if it just needs https://github.com/openstack/nova/commit/575ff86a4f1572786d66639f774405fbc074fdb1 that might be ok upstream but the prior change to use the alias instead of serial to lookup the devices is much more invasive17:08
sean-k-mooneybut that already in caracal17:08
simondodsleyif we can backport the patch to caracal I can try and convince canonical to apply it downstream as well17:09
sean-k-mooneysimondodsley: its a serise of 3 patches https://review.opendev.org/q/topic:%22lun-device-serial%2217:10
sean-k-mooneysimondodsley: i would like to get elodilles input on if that is upstream backporable17:11
sean-k-mooneysimondodsley: strictly speaking https://bugs.launchpad.net/nova/+bug/2065084 was a latent bug for years so i think it could be backported upstram but it does not hurt to ask17:18
sean-k-mooneythe new feature that was added in caracal that it implcictly depend on is not backportable upstream really17:19
simondodsleyjust so i haven't got this wrong... from Dalmation we can use `openstack server create --block-device disk_bus=scsi,disk_type=lun,volume_size=x,volume_type=whatever` 17:19
sean-k-mooneyso if it was backproted it would only be to 2024.117:19
simondodsleyi thin that backpor would be fine17:19
sean-k-mooneyyes taht looks correct to me17:19
simondodsleycool17:20
sean-k-mooneywell you might need to set the destination type17:20
sean-k-mooneydestination_type=volume17:20
sean-k-mooneyyou also need to set the source type17:20
sean-k-mooneybasically  need to set all the filed for the block device mappings for that entry17:21
dansmithsean-k-mooney: what's the justification for backport upstream? it's been broken forever, not sure it ever worked, not a regression in recent memory and thus seems entirely a feature to me17:22
sean-k-mooneydansmith: just that it was tracked as a bug upstream as it worked in theory with old libvirt17:22
sean-k-mooneybut i agree that its not a very strong justification17:23
sean-k-mooneythat why im relucatnt to propose ti without impot form the stable team 17:23
sean-k-mooneye.g. elodilles  or others17:23
dansmithokay I thought you were implying it had a good justification.. to me it's just a feature.. if it ever worked it was long enough ago that it's not really a regression,17:24
dansmithso even if it's fixing a bug that never worked, it's enabling a whole new workflow, behaviors, etc.. very feature-y to me17:24
sean-k-mooneyyep. canconcal can assesss if they want to fix it in there downstream indepenetely of if we think its upstream fixable too17:25
simondodsleywould it elp if a customer raised a bug report against this?17:25
sean-k-mooneyso simondodsley i would still reach out to them about it17:25
sean-k-mooneysimondodsley: not really we have downstream custoemr bugs relate to it too17:25
sean-k-mooneyalthough it never workined in our downstream that we can tell for any supproted release17:25
sean-k-mooneyi.e. its been broekn for many many years17:26
simondodsleyi get it17:26
sean-k-mooneywe are also debtatign if we can backprot this internally but we have not done that yet17:26
sean-k-mooneyour latest downstream product is based on antelope and it does tno have the otehr fature which is the harder oen to backprot safely17:27
dansmithno, a customer bug report would not change my mind :)17:28
sean-k-mooneyfor what its worth we totelly condier this to be a feature downstream 17:28
sean-k-mooneyand honestly an incompelte one since you cant use it with attach17:29
sean-k-mooneymodifying the nova api to allow setting the device type to lun on attach will need a new microverison and someone to actully do the work :)17:30
sean-k-mooneydo be clear that woudl be nice ot have for other reason as it would allwo you to create a volume form an iso and attach it as a cd_rom17:31
sean-k-mooneybut i dont think anyone is curently working on that17:32
masahitoHi nova folks, please review the API bug fix https://review.opendev.org/c/openstack/nova/+/939658  All the comments are resolved and i hope bug fix reviews for the next release is open now. 18:29
sean-k-mooneymelwitt: i shoudl avlid checkign if things have merged before i go to bed :) ill fix the typos quickly `_SentinelExcpetion` was intentionally not `SentinelExcpetion`23:31
melwittsean-k-mooney: yes you should. that is usually a recipe for staying up even later in my experience :P23:32
melwittsean-k-mooney: also, I meant typo of the word Exception, not the underscore23:33
sean-k-mooneyoh i tried to compare the spelling but i didnt see that23:33
sean-k-mooneyill copy pat your version and add the underscore23:33
melwittyou are very consistent with your typos, I will give you that23:33
sean-k-mooneywith in the scope of a function/module 23:34
sean-k-mooneyif a copiler/interperter shouts at me ill definetly fix it23:34
sean-k-mooneybut between files much less so23:35
opendevreviewsean mooney proposed openstack/nova master: wrap wsgi_app.init_application with latch_error_on_raise  https://review.opendev.org/c/openstack/nova/+/94502823:37
sean-k-mooneymelwitt: hopefully id didnt add any new typos but ^ shoudl fix everythign you noted23:37
sean-k-mooneyah i reverst the ep to pe23:38
sean-k-mooney*reversed23:38
melwittsean-k-mooney: looks good to me, thanks23:38
sean-k-mooneyby the way is there any reason to hold https://review.opendev.org/c/openstack/nova/+/943507 23:42
sean-k-mooneywe are past rc1 and master is not 2025.2 so we can merge that on master but we cant backport until we ship 2025.1 right23:42
sean-k-mooneywell backporting is kind if awkware because the config otpion depend on specirfic oslo limits versions23:43
sean-k-mooneybut we are past the rc period now so there is no reason not to +w that right?23:43
melwittsean-k-mooney: yeah, the backport is not straightforward :( so not sure what we will do there, maybe nothing23:44
sean-k-mooneyya thats less the awsome but it what it is23:44
melwittsean-k-mooney: not that I know of but we could double check with bauzas/uggla23:45
sean-k-mooneyi could but i just +w'd i could remove it23:45
sean-k-mooneybut i think we are good23:45
melwittyeah it should be ok23:46
sean-k-mooneyi thikn there are 1 or 2 other chanage in a similar state but im not going to go look for them tonight23:47
sean-k-mooneychat to you tomrrow o/23:47
melwittseeya tomorrow o/23:47

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!