*** ralonsoh_ is now known as ralonsoh | 07:55 | |
dansmith | gibi: sean-k-mooney: so starting to look at placement where the capacity checks happen, for addressing the overcommit bugs... | 15:35 |
---|---|---|
sean-k-mooney | oh fun | 15:36 |
dansmith | I'm only seeing a single test that ever asserts the capacity-exceeded scenario in all of placement's unit and functional tests | 15:36 |
dansmith | that is, like, unexpected since that's kinda the core function of placement | 15:36 |
sean-k-mooney | when you say functional did you include the gabbit tests | 15:36 |
dansmith | I know there are lots of gabbit tests I'm probably not grepping to find, but.. am I missing something? | 15:36 |
sean-k-mooney | well the gabbit test are the primarly api test/funcitonal tests in placment | 15:37 |
sean-k-mooney | i know they are closer to neutron fullstack tests then our functional tests | 15:37 |
sean-k-mooney | but i think thats where a lot of the end to end functial behaivor is verifed | 15:37 |
sean-k-mooney | https://github.com/openstack/placement/tree/master/placement/tests/functional/gabbits | 15:38 |
sean-k-mooney | there are a number of allcoation an dallcoation bug tests there for sepcifc beahivor | 15:39 |
dansmith | yeah, | 15:39 |
dansmith | but only one that explicitly tests overcapacity for allocations, AFAICT, | 15:40 |
sean-k-mooney | dansmith: i belive some of this is also enfoced at the db level althouhg i dont think they are usign db triggers or anything to actully encode it at the db level | 15:40 |
dansmith | and it's just the simplest case of asking for more than exists | 15:40 |
dansmith | sean-k-mooney: it is, and there's a single db test for one simple sccenario there | 15:40 |
sean-k-mooney | ack | 15:40 |
dansmith | I mean the db object layer in placement | 15:40 |
sean-k-mooney | i guess the assumtin is if you neever allow a simple allcotion to oversubscibe that covers the more advanced cases | 15:41 |
dansmith | there's a big complex function that checks if we're over capacity after an allocation | 15:41 |
sean-k-mooney | right | 15:41 |
sean-k-mooney | so https://github.com/openstack/placement/blob/master/placement/tests/functional/gabbits/allocations-bug-1779717.yaml#L73 is testing creating an allcotion for multiple consumers | 15:41 |
dansmith | it's not that simple though, that big function has a lot of edges that can't possibly be covered by a single test (single python test and a single gabbit test) | 15:41 |
dansmith | yep | 15:41 |
sean-k-mooney | although that is not really about capastiy | 15:42 |
sean-k-mooney | lookign at the bug itself https://bugs.launchpad.net/nova/+bug/1779717 | 15:42 |
sean-k-mooney | i guess to un pick this a littel you might want to create a repoducer test for the current evac/cold migration case | 15:43 |
sean-k-mooney | where we are doing the swap | 15:43 |
sean-k-mooney | i suspect you can do that without gabit using the placement fixture | 15:43 |
dansmith | well, I want to test the underlying mechanics in placement, not nova | 15:43 |
sean-k-mooney | we certenly could do that in nova so i dont see why you could not in placment itself | 15:44 |
dansmith | and I expected to see a lot more coverage in placement given how complex the capacity check is | 15:44 |
sean-k-mooney | right but the the actial api behavior does not actully depend on nova | 15:44 |
sean-k-mooney | you jsut need to create the allcoation mess with reserved then try and swap it | 15:44 |
sean-k-mooney | basiclly emulate the OTU flow | 15:45 |
sean-k-mooney | i know you know that by the way im just trying to state the repoducer clearly | 15:45 |
sean-k-mooney | gabit is expressive enough to encode that but its not very deep coverage hench why you probly will want to try using the placement fixutre and writign a new functional test | 15:46 |
dansmith | yeah I'm very aware of what I need to do, I'm just surprised that there is so little coverage of this _area_ in placement | 15:46 |
sean-k-mooney | ill fully adming i have never been that deep on the implemation side of placement | 15:47 |
sean-k-mooney | only the usage of it | 15:47 |
sean-k-mooney | so i havent really looked tat the testing in any detail | 15:47 |
dansmith | okay I figured you and gibi are the most familiar, but maybe not | 15:47 |
dansmith | ack | 15:47 |
sean-k-mooney | gibi and melwitt i think. api wise yes but ptyhon wise less so on my part | 15:48 |
sean-k-mooney | im willing to review placement code but i generally need to learn the context every time i do | 15:48 |
gibi | dansmith: I would expect more than one over capacity test case. let me take a look | 15:49 |
sean-k-mooney | i think tis is the only functional testing of allctions https://github.com/openstack/placement/blob/master/placement/tests/functional/test_allocation.py | 15:49 |
sean-k-mooney | i.e. in python | 15:50 |
dansmith | I definitely want to do this in python functional tests and not gabbits, FWIW | 15:50 |
sean-k-mooney | dansmith: this is the one you wer referint too for the db level right https://github.com/openstack/placement/blob/master/placement/tests/functional/db/test_allocation.py#L519 | 15:51 |
sean-k-mooney | a number of those functional tests are really like 5 tests in one | 15:51 |
dansmith | yup | 15:52 |
dansmith | yup, referring, but not sure yup in terms of coverage | 15:52 |
dansmith | the capacity check is very complex :) | 15:52 |
sean-k-mooney | https://github.com/search?q=repo%3Aopenstack/placement%20InvalidAllocationCapacityExceeded&type=code not that i realy turst github search but only 3 hits total | 15:53 |
sean-k-mooney | codesearch agrees | 15:53 |
sean-k-mooney | that excetion is not used elsewhere in the codebase | 15:54 |
sean-k-mooney | i guess there is also InvalidInventoryCapacity | 15:55 |
sean-k-mooney | but i think thats raised when you modify the inventory | 15:56 |
dansmith | right | 15:56 |
gibi | this file has some POST allocation tests https://github.com/openstack/placement/blob/master/placement/tests/functional/gabbits/allocations-post.yaml | 15:56 |
gibi | but I understand you want moe | 15:57 |
gibi | more | 15:57 |
sean-k-mooney | right and that does https://github.com/openstack/placement/blob/master/placement/tests/functional/gabbits/allocations-post.yaml#L212-L236 | 15:57 |
dansmith | gibi: yeah it's not just allocation, it's allocation with the capacity check | 15:58 |
dansmith | *failing | 15:58 |
gibi | this also has some coverage https://github.com/openstack/placement/blob/master/placement/tests/functional/db/test_allocation.py but I'm not against adding more :) | 15:59 |
dansmith | that's really the only place in python, but yeah | 16:04 |
gibi | in cases when I'm not sure what is covered and not covered I tend to go in and break the code in certain cases to see if it caught be tests. Maybe there are test cases in wrongly named test files hiding | 16:07 |
sean-k-mooney | gibi: i dont think placement has the saem level fo in depth testing that nova has in genral. gabit is greate but often it was used instead of unit and functional testin instead of in addtion to them | 16:27 |
sean-k-mooney | one of the test case that i looked at was teh sole testing in the commit that added a new microverions | 16:27 |
sean-k-mooney | there may have been other changes in teh serise | 16:28 |
sean-k-mooney | but if that really was the only commit to implemnt the microversion it shows that gabbit testing was considered enough coverage for an api change | 16:28 |
dansmith | gibi: yeah I was going to create a reproducer for the issue and then break it to make sure it starts working.. so I guess we'll see if something stops when I do :) | 16:37 |
simondodsley | hi - i'm looking for advice on how to disable `discard` on a single cinder volume that is coming from a cinder driver where `discard` is hardcoded to be `True`. A customer is having issues with multiattach volumes that they want to use for Windows clustering, in that the SCSI page 83h doesn't appear to be passing through the SCSI reservation settings. | 16:44 |
simondodsley | I read that `nodiscard` may resolve this, but can't work out how to do the override. Overriding the whole backend for a test would be an option, but as I say the cinder driver has `disacrd` hardcoded. Is there somethin in nova that can help? | 16:45 |
sean-k-mooney | simondodsley: so windwos clustering required the vm to be create with a volume configured to use type=lun which cant be set durign attach only vm creation | 16:46 |
dansmith | yep, was just going to say ^ | 16:46 |
sean-k-mooney | we shoudl probaly fix that at some point | 16:47 |
sean-k-mooney | but anyway i am not sure if you can overried that with our libvirt section | 16:47 |
dansmith | is that enough to enable it? I thought not | 16:47 |
sean-k-mooney | we have a cofnig option for discard but i tought it was for nova provisioend storage | 16:47 |
dansmith | I thought lun was the only way | 16:47 |
sean-k-mooney | no i think you need lun for scsi v3 | 16:47 |
simondodsley | so how do we set the type=lun at vm creation? | 16:47 |
sean-k-mooney | i just ment that even behond that im not sure if you can overried discard form nova | 16:47 |
sean-k-mooney | simondodsley: so the only discard cofnig im awre of in nova is https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.hw_disk_discard which woudl turn it off for the host but that says "nova managed disks" | 16:49 |
dansmith | sean-k-mooney: I think if they want the guest to use discard and have that make it all the way to the actual physical device they need to use lun, no? | 16:49 |
sean-k-mooney | dansmith: well that what will happen if discard is enabeld on the cinder volume in the connection info | 16:49 |
dansmith | I thought libvirt discard would only hit the qemu storage driver, like ceph or qcow2 | 16:49 |
sean-k-mooney | dansmith: i guess there are two diffent things happenign here | 16:50 |
sean-k-mooney | simondodsley made the assertion that truning of discard migh help with the scsi command making it to the backend. i think we both agree it not related to that | 16:50 |
dansmith | tight | 16:51 |
dansmith | *right :) | 16:51 |
sean-k-mooney | simondodsley: so if you look at the server create api https://docs.openstack.org/api-ref/compute/#create-server | 16:51 |
sean-k-mooney | there is an option device_type filed in the request | 16:52 |
sean-k-mooney | in the block device mappings | 16:52 |
sean-k-mooney | sorry that might not be the correct filed | 16:52 |
sean-k-mooney | i think you want block_device_mapping_v2.disk_bus i will neeed to check the other doc | 16:53 |
sean-k-mooney | https://docs.openstack.org/nova/latest/user/block-device-mapping.html | 16:53 |
sean-k-mooney | ok no so you want disk_bus=scsi and deivce_type=lun | 16:53 |
sean-k-mooney | """disk_bus and device_type - low level details that some hypervisors (currently only libvirt) may support. Some example disk_bus values can be: ide, usb, virtio, scsi, while device_type may be disk, cdrom, floppy, lun. This is not an exhaustive list as it depends on the virtualization driver, and may change as more support is added. Leaving these empty is the most common | 16:54 |
sean-k-mooney | thing to do.""" | 16:54 |
simondodsley | ok - so how would that be forced from the comand line, or the gui? | 16:54 |
sean-k-mooney | you cant do it form horizon | 16:55 |
sean-k-mooney | from the cli you an do it when creating the block device mapppings | 16:55 |
sean-k-mooney | https://docs.openstack.org/python-openstackclient/latest/cli/command-objects/server.html#server-create | 16:55 |
sean-k-mooney | you have to use teh --block-device parmater instead of --volume | 16:55 |
sean-k-mooney | https://docs.openstack.org/python-openstackclient/latest/cli/command-objects/server.html#cmdoption-openstack-server-create-block-device | 16:55 |
simondodsley | awesome - i see - thank you. We will give that a try | 16:55 |
sean-k-mooney | simondodsley: when dan fixed this he added some tempest test that validate this is functional with lvm and multi attache. although obviosly not with a windows workload | 16:57 |
sean-k-mooney | simondodsley: but we do have coverage of creating a vme with a mutiatch volume in lun mode | 16:58 |
sean-k-mooney | https://github.com/openstack/tempest/blob/f4a8698b97a5433b8e64df408aace2fb14c2cb22/tempest/api/compute/volumes/test_attach_volume.py#L469 | 16:58 |
simondodsley | and this will ensure that SCSI page 83 will be passed through correctly? | 16:58 |
sean-k-mooney | do you ahve a refence to that defintiion. it will ensure that scsiv3 command are aviable to the guest | 16:59 |
sean-k-mooney | without that you onlyhave access to v2 commands | 16:59 |
sean-k-mooney | simondodsley: by the way we only fixed this fucntionaly in recent version of nova | 17:02 |
sean-k-mooney | i think form dalmation on | 17:02 |
simondodsley | oh - so it won't work on caracal | 17:03 |
simondodsley | how did anyone every use windows clustering before this fix then? | 17:03 |
sean-k-mooney | i can check when we fixed it but no livbirt changed it api because qemu stop supproting the serial atibute with type lun a few years ago | 17:03 |
sean-k-mooney | windows supprot clustierning with other ways of doing synconisation | 17:03 |
sean-k-mooney | i.e. not file system absed it has 3-4 modes | 17:04 |
sean-k-mooney | so either they used one of the other mode or they didnt do it with libvirt | 17:04 |
sean-k-mooney | i.e. they were usihg hyperv or vmware or ironic | 17:04 |
simondodsley | this is the error from windows - `Physical disk {} does not have the inquiry data (SCSI page 83h VPD descriptor) that is required by failover clustering. Status: The request is not supported` | 17:04 |
sean-k-mooney | yep | 17:05 |
dansmith | that's what lun should fix | 17:05 |
simondodsley | but it only from dalmation... | 17:05 |
sean-k-mooney | or really old libvirt | 17:05 |
dansmith | simondodsley: also, clustered windows is not cloudy and openstack really focuses on cloudy workloads :) | 17:05 |
dansmith | so that's why it's recent | 17:06 |
simondodsley | i know that, but tell customers :D | 17:06 |
dansmith | just saying. | 17:06 |
sean-k-mooney | so the fix in dalmation was https://github.com/openstack/nova/commit/575ff86a4f1572786d66639f774405fbc074fdb1 | 17:06 |
sean-k-mooney | which depend on another change that was done in caracal | 17:07 |
sean-k-mooney | it might be possibel to backport to caracal but not before | 17:07 |
sean-k-mooney | the prior change i dont think is backportable (using alias) | 17:07 |
dansmith | not backportable upstream, but doable for simondodsley you mean | 17:08 |
sean-k-mooney | well if it just needs https://github.com/openstack/nova/commit/575ff86a4f1572786d66639f774405fbc074fdb1 that might be ok upstream but the prior change to use the alias instead of serial to lookup the devices is much more invasive | 17:08 |
sean-k-mooney | but that already in caracal | 17:08 |
simondodsley | if we can backport the patch to caracal I can try and convince canonical to apply it downstream as well | 17:09 |
sean-k-mooney | simondodsley: its a serise of 3 patches https://review.opendev.org/q/topic:%22lun-device-serial%22 | 17:10 |
sean-k-mooney | simondodsley: i would like to get elodilles input on if that is upstream backporable | 17:11 |
sean-k-mooney | simondodsley: strictly speaking https://bugs.launchpad.net/nova/+bug/2065084 was a latent bug for years so i think it could be backported upstram but it does not hurt to ask | 17:18 |
sean-k-mooney | the new feature that was added in caracal that it implcictly depend on is not backportable upstream really | 17:19 |
simondodsley | just so i haven't got this wrong... from Dalmation we can use `openstack server create --block-device disk_bus=scsi,disk_type=lun,volume_size=x,volume_type=whatever` | 17:19 |
sean-k-mooney | so if it was backproted it would only be to 2024.1 | 17:19 |
simondodsley | i thin that backpor would be fine | 17:19 |
sean-k-mooney | yes taht looks correct to me | 17:19 |
simondodsley | cool | 17:20 |
sean-k-mooney | well you might need to set the destination type | 17:20 |
sean-k-mooney | destination_type=volume | 17:20 |
sean-k-mooney | you also need to set the source type | 17:20 |
sean-k-mooney | basically need to set all the filed for the block device mappings for that entry | 17:21 |
dansmith | sean-k-mooney: what's the justification for backport upstream? it's been broken forever, not sure it ever worked, not a regression in recent memory and thus seems entirely a feature to me | 17:22 |
sean-k-mooney | dansmith: just that it was tracked as a bug upstream as it worked in theory with old libvirt | 17:22 |
sean-k-mooney | but i agree that its not a very strong justification | 17:23 |
sean-k-mooney | that why im relucatnt to propose ti without impot form the stable team | 17:23 |
sean-k-mooney | e.g. elodilles or others | 17:23 |
dansmith | okay I thought you were implying it had a good justification.. to me it's just a feature.. if it ever worked it was long enough ago that it's not really a regression, | 17:24 |
dansmith | so even if it's fixing a bug that never worked, it's enabling a whole new workflow, behaviors, etc.. very feature-y to me | 17:24 |
sean-k-mooney | yep. canconcal can assesss if they want to fix it in there downstream indepenetely of if we think its upstream fixable too | 17:25 |
simondodsley | would it elp if a customer raised a bug report against this? | 17:25 |
sean-k-mooney | so simondodsley i would still reach out to them about it | 17:25 |
sean-k-mooney | simondodsley: not really we have downstream custoemr bugs relate to it too | 17:25 |
sean-k-mooney | although it never workined in our downstream that we can tell for any supproted release | 17:25 |
sean-k-mooney | i.e. its been broekn for many many years | 17:26 |
simondodsley | i get it | 17:26 |
sean-k-mooney | we are also debtatign if we can backprot this internally but we have not done that yet | 17:26 |
sean-k-mooney | our latest downstream product is based on antelope and it does tno have the otehr fature which is the harder oen to backprot safely | 17:27 |
dansmith | no, a customer bug report would not change my mind :) | 17:28 |
sean-k-mooney | for what its worth we totelly condier this to be a feature downstream | 17:28 |
sean-k-mooney | and honestly an incompelte one since you cant use it with attach | 17:29 |
sean-k-mooney | modifying the nova api to allow setting the device type to lun on attach will need a new microverison and someone to actully do the work :) | 17:30 |
sean-k-mooney | do be clear that woudl be nice ot have for other reason as it would allwo you to create a volume form an iso and attach it as a cd_rom | 17:31 |
sean-k-mooney | but i dont think anyone is curently working on that | 17:32 |
masahito | Hi nova folks, please review the API bug fix https://review.opendev.org/c/openstack/nova/+/939658 All the comments are resolved and i hope bug fix reviews for the next release is open now. | 18:29 |
sean-k-mooney | melwitt: i shoudl avlid checkign if things have merged before i go to bed :) ill fix the typos quickly `_SentinelExcpetion` was intentionally not `SentinelExcpetion` | 23:31 |
melwitt | sean-k-mooney: yes you should. that is usually a recipe for staying up even later in my experience :P | 23:32 |
melwitt | sean-k-mooney: also, I meant typo of the word Exception, not the underscore | 23:33 |
sean-k-mooney | oh i tried to compare the spelling but i didnt see that | 23:33 |
sean-k-mooney | ill copy pat your version and add the underscore | 23:33 |
melwitt | you are very consistent with your typos, I will give you that | 23:33 |
sean-k-mooney | with in the scope of a function/module | 23:34 |
sean-k-mooney | if a copiler/interperter shouts at me ill definetly fix it | 23:34 |
sean-k-mooney | but between files much less so | 23:35 |
opendevreview | sean mooney proposed openstack/nova master: wrap wsgi_app.init_application with latch_error_on_raise https://review.opendev.org/c/openstack/nova/+/945028 | 23:37 |
sean-k-mooney | melwitt: hopefully id didnt add any new typos but ^ shoudl fix everythign you noted | 23:37 |
sean-k-mooney | ah i reverst the ep to pe | 23:38 |
sean-k-mooney | *reversed | 23:38 |
melwitt | sean-k-mooney: looks good to me, thanks | 23:38 |
sean-k-mooney | by the way is there any reason to hold https://review.opendev.org/c/openstack/nova/+/943507 | 23:42 |
sean-k-mooney | we are past rc1 and master is not 2025.2 so we can merge that on master but we cant backport until we ship 2025.1 right | 23:42 |
sean-k-mooney | well backporting is kind if awkware because the config otpion depend on specirfic oslo limits versions | 23:43 |
sean-k-mooney | but we are past the rc period now so there is no reason not to +w that right? | 23:43 |
melwitt | sean-k-mooney: yeah, the backport is not straightforward :( so not sure what we will do there, maybe nothing | 23:44 |
sean-k-mooney | ya thats less the awsome but it what it is | 23:44 |
melwitt | sean-k-mooney: not that I know of but we could double check with bauzas/uggla | 23:45 |
sean-k-mooney | i could but i just +w'd i could remove it | 23:45 |
sean-k-mooney | but i think we are good | 23:45 |
melwitt | yeah it should be ok | 23:46 |
sean-k-mooney | i thikn there are 1 or 2 other chanage in a similar state but im not going to go look for them tonight | 23:47 |
sean-k-mooney | chat to you tomrrow o/ | 23:47 |
melwitt | seeya tomorrow o/ | 23:47 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!