Monday, 2025-03-24

*** ralonsoh_ is now known as ralonsoh		07:55
dansmith	gibi: sean-k-mooney: so starting to look at placement where the capacity checks happen, for addressing the overcommit bugs...	15:35
sean-k-mooney	oh fun	15:36
dansmith	I'm only seeing a single test that ever asserts the capacity-exceeded scenario in all of placement's unit and functional tests	15:36
dansmith	that is, like, unexpected since that's kinda the core function of placement	15:36
sean-k-mooney	when you say functional did you include the gabbit tests	15:36
dansmith	I know there are lots of gabbit tests I'm probably not grepping to find, but.. am I missing something?	15:36
sean-k-mooney	well the gabbit test are the primarly api test/funcitonal tests in placment	15:37
sean-k-mooney	i know they are closer to neutron fullstack tests then our functional tests	15:37
sean-k-mooney	but i think thats where a lot of the end to end functial behaivor is verifed	15:37
sean-k-mooney	https://github.com/openstack/placement/tree/master/placement/tests/functional/gabbits	15:38
sean-k-mooney	there are a number of allcoation an dallcoation bug tests there for sepcifc beahivor	15:39
dansmith	yeah,	15:39
dansmith	but only one that explicitly tests overcapacity for allocations, AFAICT,	15:40
sean-k-mooney	dansmith: i belive some of this is also enfoced at the db level althouhg i dont think they are usign db triggers or anything to actully encode it at the db level	15:40
dansmith	and it's just the simplest case of asking for more than exists	15:40
dansmith	sean-k-mooney: it is, and there's a single db test for one simple sccenario there	15:40
sean-k-mooney	ack	15:40
dansmith	I mean the db object layer in placement	15:40
sean-k-mooney	i guess the assumtin is if you neever allow a simple allcotion to oversubscibe that covers the more advanced cases	15:41
dansmith	there's a big complex function that checks if we're over capacity after an allocation	15:41
sean-k-mooney	right	15:41
sean-k-mooney	so https://github.com/openstack/placement/blob/master/placement/tests/functional/gabbits/allocations-bug-1779717.yaml#L73 is testing creating an allcotion for multiple consumers	15:41
dansmith	it's not that simple though, that big function has a lot of edges that can't possibly be covered by a single test (single python test and a single gabbit test)	15:41
dansmith	yep	15:41
sean-k-mooney	although that is not really about capastiy	15:42
sean-k-mooney	lookign at the bug itself https://bugs.launchpad.net/nova/+bug/1779717	15:42
sean-k-mooney	i guess to un pick this a littel you might want to create a repoducer test for the current evac/cold migration case	15:43
sean-k-mooney	where we are doing the swap	15:43
sean-k-mooney	i suspect you can do that without gabit using the placement fixture	15:43
dansmith	well, I want to test the underlying mechanics in placement, not nova	15:43
sean-k-mooney	we certenly could do that in nova so i dont see why you could not in placment itself	15:44
dansmith	and I expected to see a lot more coverage in placement given how complex the capacity check is	15:44
sean-k-mooney	right but the the actial api behavior does not actully depend on nova	15:44
sean-k-mooney	you jsut need to create the allcoation mess with reserved then try and swap it	15:44
sean-k-mooney	basiclly emulate the OTU flow	15:45
sean-k-mooney	i know you know that by the way im just trying to state the repoducer clearly	15:45
sean-k-mooney	gabit is expressive enough to encode that but its not very deep coverage hench why you probly will want to try using the placement fixutre and writign a new functional test	15:46
dansmith	yeah I'm very aware of what I need to do, I'm just surprised that there is so little coverage of this _area_ in placement	15:46
sean-k-mooney	ill fully adming i have never been that deep on the implemation side of placement	15:47
sean-k-mooney	only the usage of it	15:47
sean-k-mooney	so i havent really looked tat the testing in any detail	15:47
dansmith	okay I figured you and gibi are the most familiar, but maybe not	15:47
dansmith	ack	15:47
sean-k-mooney	gibi and melwitt i think. api wise yes but ptyhon wise less so on my part	15:48
sean-k-mooney	im willing to review placement code but i generally need to learn the context every time i do	15:48
gibi	dansmith: I would expect more than one over capacity test case. let me take a look	15:49
sean-k-mooney	i think tis is the only functional testing of allctions https://github.com/openstack/placement/blob/master/placement/tests/functional/test_allocation.py	15:49
sean-k-mooney	i.e. in python	15:50
dansmith	I definitely want to do this in python functional tests and not gabbits, FWIW	15:50
sean-k-mooney	dansmith: this is the one you wer referint too for the db level right https://github.com/openstack/placement/blob/master/placement/tests/functional/db/test_allocation.py#L519	15:51
sean-k-mooney	a number of those functional tests are really like 5 tests in one	15:51
dansmith	yup	15:52
dansmith	yup, referring, but not sure yup in terms of coverage	15:52
dansmith	the capacity check is very complex :)	15:52
sean-k-mooney	https://github.com/search?q=repo%3Aopenstack/placement%20InvalidAllocationCapacityExceeded&type=code not that i realy turst github search but only 3 hits total	15:53
sean-k-mooney	codesearch agrees	15:53
sean-k-mooney	that excetion is not used elsewhere in the codebase	15:54
sean-k-mooney	i guess there is also InvalidInventoryCapacity	15:55
sean-k-mooney	but i think thats raised when you modify the inventory	15:56
dansmith	right	15:56
gibi	this file has some POST allocation tests https://github.com/openstack/placement/blob/master/placement/tests/functional/gabbits/allocations-post.yaml	15:56
gibi	but I understand you want moe	15:57
gibi	more	15:57
sean-k-mooney	right and that does https://github.com/openstack/placement/blob/master/placement/tests/functional/gabbits/allocations-post.yaml#L212-L236	15:57
dansmith	gibi: yeah it's not just allocation, it's allocation with the capacity check	15:58
dansmith	*failing	15:58
gibi	this also has some coverage https://github.com/openstack/placement/blob/master/placement/tests/functional/db/test_allocation.py but I'm not against adding more :)	15:59
dansmith	that's really the only place in python, but yeah	16:04
gibi	in cases when I'm not sure what is covered and not covered I tend to go in and break the code in certain cases to see if it caught be tests. Maybe there are test cases in wrongly named test files hiding	16:07
sean-k-mooney	gibi: i dont think placement has the saem level fo in depth testing that nova has in genral. gabit is greate but often it was used instead of unit and functional testin instead of in addtion to them	16:27
sean-k-mooney	one of the test case that i looked at was teh sole testing in the commit that added a new microverions	16:27
sean-k-mooney	there may have been other changes in teh serise	16:28
sean-k-mooney	but if that really was the only commit to implemnt the microversion it shows that gabbit testing was considered enough coverage for an api change	16:28
dansmith	gibi: yeah I was going to create a reproducer for the issue and then break it to make sure it starts working.. so I guess we'll see if something stops when I do :)	16:37
simondodsley	hi - i'm looking for advice on how to disable `discard` on a single cinder volume that is coming from a cinder driver where `discard` is hardcoded to be `True`. A customer is having issues with multiattach volumes that they want to use for Windows clustering, in that the SCSI page 83h doesn't appear to be passing through the SCSI reservation settings.	16:44
simondodsley	I read that `nodiscard` may resolve this, but can't work out how to do the override. Overriding the whole backend for a test would be an option, but as I say the cinder driver has `disacrd` hardcoded. Is there somethin in nova that can help?	16:45
sean-k-mooney	simondodsley: so windwos clustering required the vm to be create with a volume configured to use type=lun which cant be set durign attach only vm creation	16:46
dansmith	yep, was just going to say ^	16:46
sean-k-mooney	we shoudl probaly fix that at some point	16:47
sean-k-mooney	but anyway i am not sure if you can overried that with our libvirt section	16:47
dansmith	is that enough to enable it? I thought not	16:47
sean-k-mooney	we have a cofnig option for discard but i tought it was for nova provisioend storage	16:47
dansmith	I thought lun was the only way	16:47
sean-k-mooney	no i think you need lun for scsi v3	16:47
simondodsley	so how do we set the type=lun at vm creation?	16:47
sean-k-mooney	i just ment that even behond that im not sure if you can overried discard form nova	16:47
sean-k-mooney	simondodsley: so the only discard cofnig im awre of in nova is https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.hw_disk_discard which woudl turn it off for the host but that says "nova managed disks"	16:49
dansmith	sean-k-mooney: I think if they want the guest to use discard and have that make it all the way to the actual physical device they need to use lun, no?	16:49
sean-k-mooney	dansmith: well that what will happen if discard is enabeld on the cinder volume in the connection info	16:49
dansmith	I thought libvirt discard would only hit the qemu storage driver, like ceph or qcow2	16:49
sean-k-mooney	dansmith: i guess there are two diffent things happenign here	16:50
sean-k-mooney	simondodsley made the assertion that truning of discard migh help with the scsi command making it to the backend. i think we both agree it not related to that	16:50
dansmith	tight	16:51
dansmith	*right :)	16:51
sean-k-mooney	simondodsley: so if you look at the server create api https://docs.openstack.org/api-ref/compute/#create-server	16:51
sean-k-mooney	there is an option device_type filed in the request	16:52
sean-k-mooney	in the block device mappings	16:52
sean-k-mooney	sorry that might not be the correct filed	16:52
sean-k-mooney	i think you want block_device_mapping_v2.disk_bus i will neeed to check the other doc	16:53
sean-k-mooney	https://docs.openstack.org/nova/latest/user/block-device-mapping.html	16:53
sean-k-mooney	ok no so you want disk_bus=scsi and deivce_type=lun	16:53
sean-k-mooney	"""disk_bus and device_type - low level details that some hypervisors (currently only libvirt) may support. Some example disk_bus values can be: ide, usb, virtio, scsi, while device_type may be disk, cdrom, floppy, lun. This is not an exhaustive list as it depends on the virtualization driver, and may change as more support is added. Leaving these empty is the most common	16:54
sean-k-mooney	thing to do."""	16:54
simondodsley	ok - so how would that be forced from the comand line, or the gui?	16:54
sean-k-mooney	you cant do it form horizon	16:55
sean-k-mooney	from the cli you an do it when creating the block device mapppings	16:55
sean-k-mooney	https://docs.openstack.org/python-openstackclient/latest/cli/command-objects/server.html#server-create	16:55
sean-k-mooney	you have to use teh --block-device parmater instead of --volume	16:55
sean-k-mooney	https://docs.openstack.org/python-openstackclient/latest/cli/command-objects/server.html#cmdoption-openstack-server-create-block-device	16:55
simondodsley	awesome - i see - thank you. We will give that a try	16:55
sean-k-mooney	simondodsley: when dan fixed this he added some tempest test that validate this is functional with lvm and multi attache. although obviosly not with a windows workload	16:57
sean-k-mooney	simondodsley: but we do have coverage of creating a vme with a mutiatch volume in lun mode	16:58
sean-k-mooney	https://github.com/openstack/tempest/blob/f4a8698b97a5433b8e64df408aace2fb14c2cb22/tempest/api/compute/volumes/test_attach_volume.py#L469	16:58
simondodsley	and this will ensure that SCSI page 83 will be passed through correctly?	16:58
sean-k-mooney	do you ahve a refence to that defintiion. it will ensure that scsiv3 command are aviable to the guest	16:59
sean-k-mooney	without that you onlyhave access to v2 commands	16:59
sean-k-mooney	simondodsley: by the way we only fixed this fucntionaly in recent version of nova	17:02
sean-k-mooney	i think form dalmation on	17:02
simondodsley	oh - so it won't work on caracal	17:03
simondodsley	how did anyone every use windows clustering before this fix then?	17:03
sean-k-mooney	i can check when we fixed it but no livbirt changed it api because qemu stop supproting the serial atibute with type lun a few years ago	17:03
sean-k-mooney	windows supprot clustierning with other ways of doing synconisation	17:03
sean-k-mooney	i.e. not file system absed it has 3-4 modes	17:04
sean-k-mooney	so either they used one of the other mode or they didnt do it with libvirt	17:04
sean-k-mooney	i.e. they were usihg hyperv or vmware or ironic	17:04
simondodsley	this is the error from windows - `Physical disk {} does not have the inquiry data (SCSI page 83h VPD descriptor) that is required by failover clustering. Status: The request is not supported`	17:04
sean-k-mooney	yep	17:05
dansmith	that's what lun should fix	17:05
simondodsley	but it only from dalmation...	17:05
sean-k-mooney	or really old libvirt	17:05
dansmith	simondodsley: also, clustered windows is not cloudy and openstack really focuses on cloudy workloads :)	17:05
dansmith	so that's why it's recent	17:06
simondodsley	i know that, but tell customers :D	17:06
dansmith	just saying.	17:06
sean-k-mooney	so the fix in dalmation was https://github.com/openstack/nova/commit/575ff86a4f1572786d66639f774405fbc074fdb1	17:06
sean-k-mooney	which depend on another change that was done in caracal	17:07
sean-k-mooney	it might be possibel to backport to caracal but not before	17:07
sean-k-mooney	the prior change i dont think is backportable (using alias)	17:07
dansmith	not backportable upstream, but doable for simondodsley you mean	17:08
sean-k-mooney	well if it just needs https://github.com/openstack/nova/commit/575ff86a4f1572786d66639f774405fbc074fdb1 that might be ok upstream but the prior change to use the alias instead of serial to lookup the devices is much more invasive	17:08
sean-k-mooney	but that already in caracal	17:08
simondodsley	if we can backport the patch to caracal I can try and convince canonical to apply it downstream as well	17:09
sean-k-mooney	simondodsley: its a serise of 3 patches https://review.opendev.org/q/topic:%22lun-device-serial%22	17:10
sean-k-mooney	simondodsley: i would like to get elodilles input on if that is upstream backporable	17:11
sean-k-mooney	simondodsley: strictly speaking https://bugs.launchpad.net/nova/+bug/2065084 was a latent bug for years so i think it could be backported upstram but it does not hurt to ask	17:18
sean-k-mooney	the new feature that was added in caracal that it implcictly depend on is not backportable upstream really	17:19
simondodsley	just so i haven't got this wrong... from Dalmation we can use `openstack server create --block-device disk_bus=scsi,disk_type=lun,volume_size=x,volume_type=whatever`	17:19
sean-k-mooney	so if it was backproted it would only be to 2024.1	17:19
simondodsley	i thin that backpor would be fine	17:19
sean-k-mooney	yes taht looks correct to me	17:19
simondodsley	cool	17:20
sean-k-mooney	well you might need to set the destination type	17:20
sean-k-mooney	destination_type=volume	17:20
sean-k-mooney	you also need to set the source type	17:20
sean-k-mooney	basically need to set all the filed for the block device mappings for that entry	17:21
dansmith	sean-k-mooney: what's the justification for backport upstream? it's been broken forever, not sure it ever worked, not a regression in recent memory and thus seems entirely a feature to me	17:22
sean-k-mooney	dansmith: just that it was tracked as a bug upstream as it worked in theory with old libvirt	17:22
sean-k-mooney	but i agree that its not a very strong justification	17:23
sean-k-mooney	that why im relucatnt to propose ti without impot form the stable team	17:23
sean-k-mooney	e.g. elodilles or others	17:23
dansmith	okay I thought you were implying it had a good justification.. to me it's just a feature.. if it ever worked it was long enough ago that it's not really a regression,	17:24
dansmith	so even if it's fixing a bug that never worked, it's enabling a whole new workflow, behaviors, etc.. very feature-y to me	17:24
sean-k-mooney	yep. canconcal can assesss if they want to fix it in there downstream indepenetely of if we think its upstream fixable too	17:25
simondodsley	would it elp if a customer raised a bug report against this?	17:25
sean-k-mooney	so simondodsley i would still reach out to them about it	17:25
sean-k-mooney	simondodsley: not really we have downstream custoemr bugs relate to it too	17:25
sean-k-mooney	although it never workined in our downstream that we can tell for any supproted release	17:25
sean-k-mooney	i.e. its been broekn for many many years	17:26
simondodsley	i get it	17:26
sean-k-mooney	we are also debtatign if we can backprot this internally but we have not done that yet	17:26
sean-k-mooney	our latest downstream product is based on antelope and it does tno have the otehr fature which is the harder oen to backprot safely	17:27
dansmith	no, a customer bug report would not change my mind :)	17:28
sean-k-mooney	for what its worth we totelly condier this to be a feature downstream	17:28
sean-k-mooney	and honestly an incompelte one since you cant use it with attach	17:29
sean-k-mooney	modifying the nova api to allow setting the device type to lun on attach will need a new microverison and someone to actully do the work :)	17:30
sean-k-mooney	do be clear that woudl be nice ot have for other reason as it would allwo you to create a volume form an iso and attach it as a cd_rom	17:31
sean-k-mooney	but i dont think anyone is curently working on that	17:32
masahito	Hi nova folks, please review the API bug fix https://review.opendev.org/c/openstack/nova/+/939658 All the comments are resolved and i hope bug fix reviews for the next release is open now.	18:29
sean-k-mooney	melwitt: i shoudl avlid checkign if things have merged before i go to bed :) ill fix the typos quickly `_SentinelExcpetion` was intentionally not `SentinelExcpetion`	23:31
melwitt	sean-k-mooney: yes you should. that is usually a recipe for staying up even later in my experience :P	23:32
melwitt	sean-k-mooney: also, I meant typo of the word Exception, not the underscore	23:33
sean-k-mooney	oh i tried to compare the spelling but i didnt see that	23:33
sean-k-mooney	ill copy pat your version and add the underscore	23:33
melwitt	you are very consistent with your typos, I will give you that	23:33
sean-k-mooney	with in the scope of a function/module	23:34
sean-k-mooney	if a copiler/interperter shouts at me ill definetly fix it	23:34
sean-k-mooney	but between files much less so	23:35
opendevreview	sean mooney proposed openstack/nova master: wrap wsgi_app.init_application with latch_error_on_raise https://review.opendev.org/c/openstack/nova/+/945028	23:37
sean-k-mooney	melwitt: hopefully id didnt add any new typos but ^ shoudl fix everythign you noted	23:37
sean-k-mooney	ah i reverst the ep to pe	23:38
sean-k-mooney	*reversed	23:38
melwitt	sean-k-mooney: looks good to me, thanks	23:38
sean-k-mooney	by the way is there any reason to hold https://review.opendev.org/c/openstack/nova/+/943507	23:42
sean-k-mooney	we are past rc1 and master is not 2025.2 so we can merge that on master but we cant backport until we ship 2025.1 right	23:42
sean-k-mooney	well backporting is kind if awkware because the config otpion depend on specirfic oslo limits versions	23:43
sean-k-mooney	but we are past the rc period now so there is no reason not to +w that right?	23:43
melwitt	sean-k-mooney: yeah, the backport is not straightforward :( so not sure what we will do there, maybe nothing	23:44
sean-k-mooney	ya thats less the awsome but it what it is	23:44
melwitt	sean-k-mooney: not that I know of but we could double check with bauzas/uggla	23:45
sean-k-mooney	i could but i just +w'd i could remove it	23:45
sean-k-mooney	but i think we are good	23:45
melwitt	yeah it should be ok	23:46
sean-k-mooney	i thikn there are 1 or 2 other chanage in a similar state but im not going to go look for them tonight	23:47
sean-k-mooney	chat to you tomrrow o/	23:47
melwitt	seeya tomorrow o/	23:47

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!