*** akekane_ is now known as abhishekk | 05:51 | |
*** alex_xu_ is now known as alex_xu | 06:37 | |
*** rpittau|afk is now known as rpittau | 07:07 | |
lyarwood | morning \o | 08:05 |
---|---|---|
*** bhagyashris_ is now known as bhagyashris | 08:27 | |
gibi | \o | 08:59 |
frickler | lyarwood: https://bugs.launchpad.net/nova/+bug/1452641 just came up in #openstack-dev , are you still planning to proceed with https://review.opendev.org/c/openstack/nova/+/579004/ ? (changing ceph mon addresses) | 09:15 |
lyarwood | frickler: yeah but not as part of that change anymore, I'm writing up a spec at the moment to provide a set of nova-manage commands to allow operators to refresh this for SHUTOFF instances | 09:18 |
lyarwood | frickler: the alternative is for users to shelve and unshelve the instances | 09:18 |
frickler | lyarwood: well from an operator perspective it would be great to have a solution that is transparent to the users and allows them to keep instances running, but I admit that that may not be achievable | 09:20 |
lyarwood | yeah that's pretty hard if not impossible | 09:21 |
* bauzas goes off for lunch a bit early | 09:22 | |
bauzas | will be back around 1130UTC | 09:22 |
opendevreview | Lee Yarwood proposed openstack/nova master: zuul: Add CentOS 8 stream integrated compute tempest job to gate https://review.opendev.org/c/openstack/nova/+/797616 | 09:57 |
sean-k-mooney | frickler: the best way to do that might just be to always put the ceph mons behind haproxy or a keepalived vrrp vip | 11:58 |
sean-k-mooney | frickler: e.g. do not have the actual mon ips present | 11:58 |
sean-k-mooney | have a separate one that you can move there instead | 11:58 |
gibi | sean-k-mooney, bauzas: I've left my view about vgpu in https://review.opendev.org/c/openstack/nova-specs/+/780452 | 11:59 |
bauzas | gibi: ack | 12:00 |
sean-k-mooney | "Does OWNER_CYBORG trait on an RP means that _every_ RC on that RP is managed by Cyborg?" so yes, traits apply to all inventories in an RP | 12:01 |
sean-k-mooney | that is why traits are on the RP not the inventory | 12:01 |
bauzas | gibi: so your comment would be about providing only traits for Cyborg VGPUs ? I'm OK if so | 12:02 |
sean-k-mooney | bauzas: we would have to use a forbidden trait on nova if we only do it for cyborg | 12:02 |
bauzas | the cyborg-agent could create the RPs with the same RCs | 12:02 |
gibi | "Based on these assumptions keeping the VGPU RC for nova usage and creating a new standard or custom RC for cyborg works for me. Personally I vote for a new standard trait for cyborg vGPUs." that is my summary | 12:02 |
gibi | shit | 12:03 |
gibi | Personally I vote for a new | 12:03 |
gibi | standard RC | 12:03 |
gibi | :D | 12:03 |
sean-k-mooney | so standard is where i have a problem with that approach | 12:03 |
bauzas | sean-k-mooney: no, not really | 12:03 |
sean-k-mooney | are you really suggesting that every time we want 2 services to manage the same thing that we need service specific resource classes | 12:04 |
bauzas | sean-k-mooney: we could have some kind of pre-filter asking for a non-Cyborg, only if we have configuration options for cyborg | 12:05 |
gibi | sean-k-mooney: I think we should avoid having two services manage the same resource. (you probably know that I'm against duplicating vgpu logic between nova and cyborg) | 12:05 |
sean-k-mooney | right but honestly i dont think vgpus should be in cyborg | 12:05 |
sean-k-mooney | they are not a programmable device | 12:05 |
bauzas | sean-k-mooney: as we discussed, it looks to me it's a non-mixed resource | 12:06 |
bauzas | if so, having another RC is understood | 12:06 |
gibi | sean-k-mooney: I stopped arguing on either side of moving the logic to cyborg or keeping it to nova only. | 12:06 |
gibi | sean-k-mooney: that would require one side of the table compromise | 12:07 |
gibi | sean-k-mooney: I have no bacon on either side | 12:07 |
gibi | so I don't want to force either party to accept the compromise | 12:08 |
gibi | this means we will have a sitation to manage a single type of resource from two services | 12:09 |
gibi | situation | 12:09 |
gibi | I assume this is a rare situation | 12:09 |
sean-k-mooney | gibi: well i have been trying to get redhat to support cyborg since i was still working at intel and it's not going to happen anytime soon, as in not before the A release, and it's not even slated for that presently | 12:09 |
sean-k-mooney | well not necessarily | 12:10 |
bauzas | my point is to say : either we mix the same RC or not | 12:10 |
sean-k-mooney | i do think there will be other cases but in general yes | 12:10 |
bauzas | if we mix, we could use the conductor to know whether we would ask cyborg or not | 12:10 |
sean-k-mooney | each RC tends to be specific to a service | 12:10 |
bauzas | if we don't mix, we don't really need to have the same RC | 12:10 |
sean-k-mooney | cinder is the other example | 12:10 |
sean-k-mooney | it should be modeling its capacity as disk_gb | 12:11 |
sean-k-mooney | which we use for local disk | 12:11 |
* gibi needs to jump to an urgent call for 10 mins | 12:11 | |
sean-k-mooney | ack | 12:11 |
sean-k-mooney | bauzas: gibi it sounds like both of ye want to use a new resource class so if that is what is required to move this forward so be it | 12:12 |
swp20 | sean-k-mooney: hi, our cyborg tempest failed with the exception: Reason: 'Query' object has no attribute 'with_lockmode' . please help us, for reference: https://fc9dac502ce26d84f9de-05fc50868e17ec6a804428f62cd7e454.ssl.cf1.rackcdn.com/797427/4/check/cyborg-tempest/bd72fc3/job-output.txt | 12:13 |
sean-k-mooney | that might be from here https://github.com/openstack/cyborg/blob/ba6a35c67ee2e25cccc693d59cae2d2182aefa58/cyborg/db/sqlalchemy/api.py#L257 | 12:16 |
sean-k-mooney | swp20: i suspect that it has changed in the new version of sqlalchemy | 12:17 |
sean-k-mooney | i think the exception is coming from here https://github.com/openstack/cyborg/blob/bb35be1b86953c6df5fd9a300221cd45e359e8ec/cyborg/conductor/manager.py#L150-L158 or here https://github.com/openstack/cyborg/blob/bb35be1b86953c6df5fd9a300221cd45e359e8ec/cyborg/conductor/manager.py#L170-L175 | 12:18 |
swp20 | is it because placement changed the version of sqlalchemy? | 12:18 |
swp20 | yeah, there are with_lockmode calls in the cyborg project. | 12:19 |
sean-k-mooney | no not placement | 12:19 |
sean-k-mooney | Collecting SQLAlchemy===1.4.18 | 12:19 |
sean-k-mooney | there was a new release of sqlalchemy not so long ago | 12:20 |
sean-k-mooney | and all the projects need to adapt to it | 12:20 |
sean-k-mooney | we updated to 1.4 2 months ago https://github.com/openstack/requirements/commit/dc86260b283dedc3076d7873f5f031f45e3e3671 | 12:21 |
sean-k-mooney | it looks like that did not check for compatibility in cyborg | 12:21 |
sean-k-mooney | "When the Query.with_lockmode() method were deprecated in favor of Query.with_for_update()..." | 12:25 |
sean-k-mooney | looks like https://docs.sqlalchemy.org/en/14/orm/query.html#sqlalchemy.orm.Query.with_for_update is the replacement | 12:26 |
sean-k-mooney | swp20: https://docs.sqlalchemy.org/en/14/orm/query.html#sqlalchemy.orm.Query.with_for_update is the replacement for Query.with_lockmode() | 12:27 |
swp20 | sean-k-mooney: ok, i'll try to update. thanks a lot. | 12:27 |
sean-k-mooney | swp20: it looks like it was deprecated in 1.1 or 1.2, they just finally got around to dropping it in 1.4 | 12:29 |
swp20 | sean-k-mooney: thanks. the version in cyborg requirements.txt now is SQLAlchemy>=0.9.0,!=1.1.5,!=1.1.6,!=1.1.7,!=1.1.8 # MIT, i have updated it, hope this will succeed. | 12:32 |
sean-k-mooney | well you should not be capping the version locally in cyborg | 12:35 |
sean-k-mooney | that should be managed via the upper constraints file in the requirements repo | 12:35 |
sean-k-mooney | since all projects are meant to be co-installable, the upper requirement is managed centrally | 12:36 |
sean-k-mooney | so cyborg should be updated to support 1.4 | 12:36 |
sean-k-mooney | in doing so your minimum requirement will increase to 1.2 | 12:36 |
sean-k-mooney | for the Query.with_for_update method | 12:36 |
sean-k-mooney | actually no | 12:36 |
sean-k-mooney | it was added in 0.9 | 12:37 |
sean-k-mooney | swp20: so you don't have to update your requirements.txt in cyborg | 12:37 |
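
For reference, the rename discussed above can be sketched as follows. This is a minimal, self-contained example assuming SQLAlchemy 1.4 or newer and an in-memory SQLite database; the `Device` model and its columns are hypothetical and are not taken from cyborg's code. SQLite ignores the `FOR UPDATE` clause, so the sketch only illustrates the call shape, not real row locking.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Device(Base):
    # Hypothetical table, for illustration only (not cyborg's schema).
    __tablename__ = "devices"
    id = Column(Integer, primary_key=True)
    name = Column(String(64))


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Device(name="fpga0"))
    session.commit()

    # Old spelling, removed in SQLAlchemy 1.4:
    #   session.query(Device).with_lockmode("update")
    # Replacement, available since 0.9:
    locked = (
        session.query(Device)
        .filter_by(name="fpga0")
        .with_for_update()
        .one()
    )
    locked_name = locked.name
```

On a backend that supports it (for example MySQL or PostgreSQL), the same call would emit `SELECT ... FOR UPDATE`.
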
* gibi is back | 12:38 | |
gibi | bauzas, sean-k-mooney: yeah I think in this specific case we need to allow both service to manage vgpus and in this specific case it is a lot simpler and cleaner to have separate RCs. | 12:38 |
gibi | sean-k-mooney: for the cinder case. I think the difference there is that cinder's disk_gb is not consumable by nova but consumable by the cinder backend. And this consumability needs to be modelled | 12:39 |
sean-k-mooney | what about for generic VFs and PFs for neutron sriov | 12:39 |
sean-k-mooney | because there is work to support cyborg managed sriov vfs also | 12:40 |
gibi | good point | 12:40 |
sean-k-mooney | that are associated with a neutron port | 12:40 |
sean-k-mooney | we can use a different resource class there too | 12:40 |
sean-k-mooney | but this pattern has many implications if we repeat it | 12:41 |
bauzas | again, same thoughts here | 12:41 |
bauzas | mixed case: we need to use the same RC | 12:41 |
bauzas | non-mixed case : lgtm for another RC | 12:41 |
sean-k-mooney | well it's not mixed in that we always know if it's from nova or cyborg | 12:41 |
sean-k-mooney | they will have different vnic types | 12:41 |
sean-k-mooney | in this case accelerator_direct vs direct | 12:42 |
gibi | are devices allocated for accelerator_direct tracked by the nova pci tracker? I think not, I think they are tracked by cyborg only | 12:42 |
sean-k-mooney | its only tracked by cyborg | 12:43 |
sean-k-mooney | like the vgpus | 12:43 |
sean-k-mooney | you could whitelist and track it in the pci tracker if it was stateless | 12:43 |
sean-k-mooney | cyborg only adds the ability to flash an image on to it if needed or do other stateful things | 12:44 |
* gibi thinking hard | 12:45 | |
sean-k-mooney | in the case of intel's current proposal they are only supporting statically pre-programmed devices but plan to make it dynamic later | 12:45 |
sean-k-mooney | the current hardware does not manage the capabilities at the VF level, it's card wide, hence static initially until new hardware is released that is more granular | 12:46 |
swp20 | sean-k-mooney: so what's the main reason of the exception? | 12:46 |
gibi | sean-k-mooney: is it actually the first time that nova needs to handle this situation where both nova and another service track the same type of resource? | 12:49 |
gibi | in case of accelerator_direct we don't have the placement issue yet, as no PCI device is modelled in placement yet | 12:49 |
gibi | so vgpu seems to be a first one when we actually do this | 12:49 |
sean-k-mooney | swp20: you are using a function that was removed in 1.4, it was deprecated around 0.9, you should just use the replacement function | 12:50 |
sean-k-mooney | gibi: yes it's the first time since cinder did not model anything in placement yet | 12:50 |
sean-k-mooney | we would have the same issue with disk_GB | 12:51 |
gibi | and we will have the same issue with VFs one we have smartnic in cyborg and PCI devices in placement | 12:51 |
gibi | once | 12:51 |
bauzas | sean-k-mooney: gibi: that's why I think we should maybe think about some kind of provider type | 12:52 |
bauzas | we couldn't just use traits for owners | 12:53 |
bauzas | or this would mean that we would have traits for all of the resources we currently have | 12:53 |
bauzas | or, using another RCs | 12:54 |
sean-k-mooney | we proposed having provider type before the trait idea | 12:54 |
bauzas | that's the alternative | 12:54 |
sean-k-mooney | the only reason we went with the trait was people did not want to do the work to extend placement to model that | 12:54 |
bauzas | sean-k-mooney: this was 3 years ago, right? | 12:54 |
bauzas | sean-k-mooney: and people weren't thinking only about cyborg, right? | 12:55 |
sean-k-mooney | when it was first proposed yes | 12:55 |
sean-k-mooney | but it came up at the ptg since | 12:55 |
gibi | I think we tend to mix provider type with consumer type, the latter is what was discussed in placement | 12:55 |
bauzas | I'm pretty sure that if cinder was asking to use the same RCs, then we would think about other alternatives than owner traits... | 12:55 |
sean-k-mooney | gibi: no we have discussed both | 12:55 |
bauzas | but here, see | 12:56 |
sean-k-mooney | owner traits were intended to be a version of provider types that did not require placement code changes | 12:56 |
bauzas | for me, if cinder wants to support some RCs, those would be like oranges | 12:56 |
bauzas | while nova looks at apples | 12:56 |
sean-k-mooney | then we should get rid of the idea of standard resource classes | 12:56 |
sean-k-mooney | they have 0 value if they cannot be shared | 12:56 |
sean-k-mooney | in fact they have negative value | 12:57 |
gibi | sean-k-mooney: standard means you predefine them and therefore easily standardize them in the flavor extraspec too | 12:57 |
sean-k-mooney | that is not how i view them | 12:58 |
sean-k-mooney | if standardising them does not bring interoperability then we can just standardise them in the code of the project that uses them | 12:58 |
sean-k-mooney | without needing a lib to do that | 12:58 |
sean-k-mooney | the project could simply have registered their "standard" resource classes when they first connect to placement and we could have even namespaced them as the owner if we wanted to | 12:59 |
gibi | sean-k-mooney: you are right I withdraw the above :) | 13:00 |
gibi | os-resource-classes for shering | 13:00 |
gibi | sharing | 13:00 |
sean-k-mooney | so we have a few options, we can proceed with new "standard" (maybe "shared" is better) resource classes for cyborg devices | 13:01 |
gibi | but sharing does not make too much sense. At least a disk_gb in cinder backend and a disk_gb in nova local storage are not interchangeable | 13:01 |
sean-k-mooney | we can use the same with some other owner mechanism | 13:01 |
sean-k-mooney | or we could use custom resource classes right | 13:01 |
sean-k-mooney | gibi: sharing requires ownership of the resource | 13:01 |
gibi | doesn't sharing mean we share ownership? | 13:02 |
bauzas | sharing vs. sharding | 13:02 |
bauzas | we discussed this yesterday | 13:02 |
sean-k-mooney | gibi: no, it means that for multiple services to use it, both must support tracking the ownership in placement via some mechanism | 13:02 |
bauzas | gibi: agreed on the sharing ownership | 13:03 |
gibi | sharing an RC via os-resource-classes only makes sense to me if there are two services, both managing that RC, and that RC represents an interchangeable resource regardless of which service reported it | 13:03 |
bauzas | gibi: if the conductor gets some allocation from a shared resource, it should pass the allocation to the right service | 13:03 |
sean-k-mooney | gibi: to me that is not what that means | 13:03 |
stephenfin | Python 3.10 looks pretty sweet. It'll be fun to use that in 5 years or whatever :-D https://lwn.net/Articles/860389/ | 13:04 |
sean-k-mooney | to date we don't have any services that share a common resource class because we have not modeled ownership yet | 13:04 |
bauzas | sean-k-mooney: we don't need to model ownership for shared resources | 13:04 |
sean-k-mooney | stephenfin: it has some nice things yes, like that switch statement based on pattern matching | 13:04 |
sean-k-mooney | stephenfin: we should be able to bump our min python to 3.8 soon | 13:04 |
bauzas | if we have same resources, this is conceptually the same | 13:04 |
opendevreview | Rodolfo Alonso proposed openstack/os-vif master: Make explicit the network backend used in the CI jobs https://review.opendev.org/c/openstack/os-vif/+/797640 | 13:05 |
sean-k-mooney | stephenfin: im hoping that 3.8 bump happens next cycle | 13:05 |
bauzas | you're asking for apples, whether they are provided by a grocery or by something else | 13:05 |
sean-k-mooney | bauzas: what makes a cyborg vgpu different from a nova one | 13:05 |
opendevreview | Rodolfo Alonso proposed openstack/nova master: Make explicit the network backend used in the CI jobs https://review.opendev.org/c/openstack/nova/+/797641 | 13:05 |
sean-k-mooney | they are identical | 13:06 |
bauzas | sean-k-mooney: if you're asking for a resource that's not the same as an apple, this is not an apple | 13:06 |
sean-k-mooney | from a user perspective | 13:06 |
bauzas | the user sees flavors | 13:06 |
sean-k-mooney | not in all cases | 13:06 |
bauzas | he doesn't see resources | 13:06 |
sean-k-mooney | the extra specs are not always visible to users | 13:07 |
bauzas | I'm done with our "power users" | 13:07 |
sean-k-mooney | in fact they might not be visible by default, that is controlled by policy | 13:07 |
bauzas | sean-k-mooney: but with cyborg, they don't see VGPUs when looking at the flavors, right? | 13:07 |
sean-k-mooney | both are stored in the flavor as an extra spec | 13:08 |
bauzas | they see device profiles, right? | 13:08 |
sean-k-mooney | e.g. resource:vgpu or device-profile=whatever | 13:08 |
bauzas | this is like ironic | 13:08 |
sean-k-mooney | or pci_alias=my-gpu | 13:08 |
bauzas | resource:vgpu=1 is only a nova syntax for nova-managed vgpus, right? | 13:08 |
gibi | flavors with resource:vgpu and flavors with device-profile could be interchangeable from the user perspective if he gets a VM with vGPU passed through in both case | 13:08 |
bauzas | in the flavor, I mean | 13:09 |
bauzas | gibi: you're mentioning a mixed usecase | 13:09 |
sean-k-mooney | bauzas: kind of yes. you could use the sriov based vgpus with a pci passthrough alias but for mdevs yes | 13:09 |
bauzas | gibi: where users don't care whether vgpus are offered by nova or cyborg | 13:09 |
gibi | bauzas: sort of yes | 13:09 |
sean-k-mooney | bauzas: yes but 99% they wont | 13:10 |
gibi | bauzas: I want to figure out which resource is interchangeable | 13:10 |
sean-k-mooney | cyborg is a mostly admin-only api | 13:10 |
sean-k-mooney | so normally user wont interact with it | 13:10 |
bauzas | I think we're boiling the ocean and we need to babystep | 13:10 |
bauzas | *for the moment* cyborg flavors are using device profiles, right? | 13:10 |
sean-k-mooney | yes | 13:10 |
gibi | bauzas: we are struggling here as we i) want a baby step but ii) don't want to create a wrong precedence | 13:11 |
bauzas | ok, so they're conceptually different from nova | 13:11 |
gibi | precedent | 13:11 |
bauzas | gibi: agreed | 13:11 |
bauzas | gibi: that's why I dislike owner traits tbh | 13:11 |
bauzas | this looks to me like a horrible hack for a single purpose | 13:11 |
sean-k-mooney | its not | 13:12 |
bauzas | sean-k-mooney: this is a hack, because this requires to touch the current modeling we have | 13:12 |
gibi | if we go with separate RC now, then we probably copy that for accelerator_direct + PCI in placement, and for disk_gb for cinder + nova too | 13:12 |
sean-k-mooney | i spent weeks trying to come up with another way and that was the only way i could get jhone and others to consider moving it forward | 13:12 |
gibi | and then I agree with sean that os-resource-classes has no use | 13:12 |
bauzas | gibi: we can make a consensus | 13:12 |
bauzas | gibi: or a statement if you prefer | 13:13 |
sean-k-mooney | bauzas: the current modeling does not fit our needs | 13:13 |
bauzas | gibi: if you have requirements for scheduling decisions that require your resources to be shared, you have to use other RCs | 13:13 |
bauzas | sharded* | 13:13 |
bauzas | and absolutely not shared | 13:13 |
bauzas | gibi: but if you are OK with having scheduling decisions that accept to mix your resources with other resources, then eventually the conductor (or the scheduling client rather) has to place the request to the right service | 13:14 |
sean-k-mooney | bauzas: we can support that but it's a lot more work for the cyborg team | 13:15 |
gibi | bauzas: I think the scheduling decision can be made independently from the service providing the resources, the consumption of the resources on the hypervisor needs the information which service tracks the physical resource | 13:15 |
sean-k-mooney | are you willing to help them do that correctly | 13:15 |
bauzas | gibi: agreed with your statement, if you're talking about the "mixed case" | 13:16 |
bauzas | sean-k-mooney: for the cyborg team, they wanna do sharding | 13:16 |
gibi | bauzas: even in a not mixed case you have to claim the resource from the service that is tracking it | 13:16 |
sean-k-mooney | bauzas: they only want to use the same resource class because we told them to do that | 13:17 |
bauzas | gibi: the claim is the allocation | 13:17 |
sean-k-mooney | bauzas: we told them to do that to not add service specific standard resource classes | 13:17 |
gibi | bauzas: nope, there is a claim in placement via the allocation, but you also need to talk to the service providing the resource to be able to use it (e.g. neutron to plug, cyborg to program and provide a pci address or mdev, cinder to provide the volume attachment information) | 13:18 |
bauzas | gibi: right | 13:18 |
* sean-k-mooney wonders if we should even use placement at this point since we cannot agree on how to use it and it has repeatedly been a blocker to adding features to openstack | 13:18 | |
bauzas | gibi: this is the conductor which would bind the arq | 13:19 |
sean-k-mooney | bauzas: it can only do that if we have a device profile | 13:19 |
bauzas | gibi: I mean, this is the conductor which would call cyborg-agent to bind the arq | 13:19 |
sean-k-mooney | if we do not, we do not have the info required to create and bind the arq | 13:19 |
bauzas | sean-k-mooney: correct, but we know the flavor | 13:19 |
bauzas | for the mixed case | 13:19 |
bauzas | say, we ask some flavor with a device profile that's turned into VGPUs | 13:20 |
bauzas | if this is acceptable to get a host not managed by cyborg, then the conductor won't call the agent to bind | 13:20 |
sean-k-mooney | that would be incorrect | 13:20 |
bauzas | but if the allocated host is cyborg-managed, then the conductor would have to call the agent | 13:21 |
sean-k-mooney | no that is not architecturally valid | 13:21 |
sean-k-mooney | the device profile is not just a resource class. | 13:21 |
sean-k-mooney | it's a resource class, possibly some traits and optionally an image to program | 13:22 |
sean-k-mooney | if the request comes from a device-profile it __must__ be fulfilled from cyborg | 13:22 |
bauzas | you could end up with some non-cyborg managed host using the same RCs and traits, right? | 13:22 |
bauzas | sean-k-mooney: then, a device profile is ABSOLUTELY AND NECESSARILY a request to shard your cloud | 13:23 |
sean-k-mooney | today yes, that is why owner traits were being introduced, to make sure that will not happen | 13:23 |
bauzas | and for this, I think a different RC is the viable option | 13:23 |
gibi | hm | 13:23 |
gibi | if a device requested by a device profile is not just a pci resource but some extra capability (e.g. programming) | 13:24 |
gibi | then we should model that capability in placement | 13:24 |
gibi | and then the scheduling will be based on capability not based on ownership | 13:24 |
sean-k-mooney | that does not work for vgpus | 13:25 |
gibi | if a cyborg vgpu has some extra feature compared to nova vgpu then we should model that | 13:25 |
sean-k-mooney | they are not programable | 13:25 |
sean-k-mooney | not in the way fpgas are | 13:25 |
sean-k-mooney | gpus are fixed function asics with their own instruction set | 13:25 |
sean-k-mooney | they are just a co processor not a programmable hardware device | 13:25 |
gibi | if the cyborg vgpu is totally the same device as a nova vgpu then we should be able not to differentiate between them | 13:25 |
sean-k-mooney | gibi: they are literally the same model of nvidia gpu | 13:26 |
gibi | it is just a vgpu resource nothing more | 13:26 |
gibi | regardless how you request it | 13:26 |
gibi | you as a user get the same vgpu in your vm | 13:26 |
bauzas | that's my point | 13:26 |
sean-k-mooney | that point is incompatible with "use different RC to track ownership", they are identical from a user point of view so should have the same resource class | 13:27 |
bauzas | if a device profile is a set of request groups, OK | 13:27 |
gibi | so if two services represent non interchangeable resources then the difference in capability needs to be modelled in placement (as a trait) | 13:27 |
gibi | if the two services represent the interchangeable resource then we use the same RC and we forbid deploying a mixed cloud | 13:27 |
gibi | as a mixed cloud would be undecidable | 13:28 |
bauzas | sean-k-mooney: in this case (the mixed one), this is absolutely OK to get allocation candidates from a device profile that AREN'T cyborg managed hosts | 13:28 |
sean-k-mooney | gibi: well you could use aggregates for that | 13:28 |
gibi | sean-k-mooney: yeah we can allow that | 13:28 |
gibi | maybe | 13:28 |
sean-k-mooney | bauzas: i dont agree | 13:29 |
sean-k-mooney | if that is the case then it's absolutely ok to get allocations from a pci passthrough alias from cyborg | 13:29 |
sean-k-mooney | or to get mdevs from cyborg | 13:29 |
sean-k-mooney | when using generic mdevs | 13:30 |
bauzas | agreed, with VGPU standard request, you could end up getting a cyborg managed host | 13:30 |
sean-k-mooney | we can support that but if we do it has to be symmetric | 13:30 |
bauzas | and I'm OK with this | 13:30 |
bauzas | but you said it's more than a standard RC | 13:31 |
sean-k-mooney | lets take that case | 13:31 |
sean-k-mooney | you have resource:vgpu=1 | 13:31 |
bauzas | but as I said, I'm OK with getting results using an explicit request group that maps a device profile | 13:31 |
sean-k-mooney | it's allocated from a cyborg RP | 13:31 |
sean-k-mooney | how do you boot that vm | 13:31 |
sean-k-mooney | bauzas: the device profile is in its own request group by the way | 13:32 |
bauzas | the conductor which gets the candidates would see this is cyborg target and accordingly would ask the agent to bind the arq | 13:32 |
sean-k-mooney | bauzas: what arq | 13:32 |
sean-k-mooney | we just had resources:vgpu | 13:32 |
sean-k-mooney | there is not arq or device type referenced in the flavor | 13:32 |
sean-k-mooney | *device-profile | 13:33 |
sean-k-mooney | and without a device-profile you cannot create an arq | 13:33 |
bauzas | so, again, that means we CAN'T be scheduler agnostic | 13:34 |
bauzas | in this case, you're asking for orange | 13:34 |
bauzas | and not appel | 13:34 |
bauzas | apple | 13:34 |
sean-k-mooney | but we agree that the device the guest sees is identical | 13:34 |
sean-k-mooney | so they are apples | 13:34 |
gibi | nova implementation needs a distinction but for the user the result is the same | 13:35 |
sean-k-mooney | they just have different requirements to consume them | 13:35 |
sean-k-mooney | gibi: exactly | 13:35 |
gibi | that is bad :/ | 13:35 |
sean-k-mooney | this is an internal implementation detail of how the cloud was deployed | 13:35 |
gibi | we have two ways to get to the same goal :/ | 13:35 |
sean-k-mooney | technically 3 | 13:36 |
gibi | :D | 13:36 |
sean-k-mooney | amd and nvidia now support vgpu via plain sriov | 13:36 |
sean-k-mooney | so you can just use pci passthrough | 13:36 |
bauzas | what makes me sad is that I thought cyborg would accept the mixed scenario | 13:36 |
bauzas | but they explicitly ask to split the cloud | 13:36 |
bauzas | due to implementation details | 13:37 |
sean-k-mooney | its not cyborg fault | 13:37 |
sean-k-mooney | nova is not being expressive enough in its query to placement | 13:37 |
bauzas | as they're unable to post-create the resources | 13:37 |
bauzas | no | 13:37 |
sean-k-mooney | no we could but we do not have the info required | 13:37 |
bauzas | you're asking placement to give you apples | 13:37 |
bauzas | it's not nova's fault that you ask for apples coming specially from Tesco | 13:38 |
sean-k-mooney | for example we could list all the device-profiles that request the vgpu resource and select one | 13:38 |
sean-k-mooney | and use that to create the arq | 13:38 |
sean-k-mooney | ah but that is not what we asked for | 13:38 |
sean-k-mooney | we asked for resource:vgpu=1 | 13:38 |
sean-k-mooney | with no other qualifier | 13:38 |
bauzas | ok, back to the original thoughts (that's already 2 hours we're tangling the problem) | 13:39 |
bauzas | we progressed | 13:39 |
sean-k-mooney | the current proposal uses owner traits and a prefilter to make that work | 13:39 |
bauzas | we know we wanna express "placement, give me apples that come from tesco and not from carrefour" | 13:39 |
sean-k-mooney | transparent to the user or admin by adding a required trait for nova or forbidden trait for cyborg in the resource:vgpu case | 13:39 |
bauzas | sean-k-mooney: in this case, I'd prefer the forbidden trait way | 13:40 |
sean-k-mooney | we can do that but it makes nova special | 13:40 |
bauzas | "placement, give me apples from Tesco" in the cyborg case | 13:40 |
bauzas | "placement, give me apples not coming from Tesco as I know you got some of them" | 13:40 |
bauzas | sean-k-mooney: only if cyborg is configured on this cloud | 13:40 |
bauzas | this is the "as I know" | 13:41 |
sean-k-mooney | that works if the class is only shared across 2 services but not 3+ | 13:41 |
bauzas | nothing would change for operators who don't give a single penny to cyborg | 13:41 |
sean-k-mooney | there is another way i guess | 13:41 |
sean-k-mooney | we could use placement aggregates to track ownership | 13:42 |
bauzas | the other way is aggregates | 13:42 |
bauzas | "placement, don't give me apples coming from those supermarkets" | 13:42 |
sean-k-mooney | each project would add all their rps to an aggregate | 13:42 |
bauzas | sean-k-mooney: I insist | 13:42 |
sean-k-mooney | and then nova would in the nova case add a member_of requirement to the vgpu request group | 13:42 |
bauzas | sean-k-mooney: whatever the solution is, the prefilter which would have the excluding logic (for the nova case) should only do this if cyborg manages hosts | 13:43 |
opendevreview | Lee Yarwood proposed openstack/nova master: Add check job for FIPS https://review.opendev.org/c/openstack/nova/+/790519 | 13:43 |
sean-k-mooney | bauzas: that is trivial to do | 13:43 |
sean-k-mooney | bauzas: we just have a config option for it | 13:43 |
sean-k-mooney | like pcpus | 13:44 |
bauzas | sean-k-mooney: I don't want upgrade impacts if the operator doesn't use cyborg yet | 13:44 |
bauzas | sean-k-mooney: can we express "give me hosts that are not from this aggregate" ? | 13:44 |
sean-k-mooney | there won't be one if we do this correctly regardless of owner traits or different resource classes | 13:44 |
sean-k-mooney | bauzas: yes | 13:44 |
bauzas | ok, so there are no operations for nova-managed hosts | 13:45 |
sean-k-mooney | but i do not think we should do a negative member_of for cyborg | 13:45 |
bauzas | sean-k-mooney: the other way | 13:45 |
bauzas | sean-k-mooney: cyborg would manage an aggregate of managed hosts | 13:45 |
sean-k-mooney | for nova the prefilter when enabled would do member_of=nova-aggregate or member_of=cyborg-aggregate | 13:45 |
bauzas | the prefilter would ask for hosts that are members of this agg in the cyborg case | 13:45 |
sean-k-mooney | when prefilter is off it wont do anything | 13:46 |
bauzas | but in the nova case, the prefilter would ask hosts that AREN'T hosts of this agg | 13:46 |
bauzas | sean-k-mooney: again, I don't wanna manage a fleet of nova hosts | 13:46 |
sean-k-mooney | bauzas: no i don't think that is the correct way to do that | 13:46 |
sean-k-mooney | bauzas: that is short sighted in my opinion | 13:46 |
sean-k-mooney | we can have nova automatically register in the nova host aggregate | 13:47 |
bauzas | eeek | 13:47 |
sean-k-mooney | we just need to choose a fixed uuid for it using uuid5 | 13:47 |
bauzas | yet again some hack for a non-necessary need in the case where operators don't care about cyborg | 13:47 |
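
The aggregate-based option discussed above can be sketched as a small query builder. This is only an illustration, assuming placement's `GET /allocation_candidates` query syntax (`resources`, `member_of`, with a `!` prefix for a forbidden aggregate); the namespace UUID, the service names, the `cyborg_deployed` flag and the `owner_aggregate` and `build_query` helpers are hypothetical and not part of nova. It follows bauzas's variant: the nova case excludes the cyborg aggregate, and nothing is added when cyborg is not deployed.

```python
import uuid
from urllib.parse import urlencode

# Hypothetical namespace; any fixed UUID works as long as every
# service agrees on it.
OWNER_NAMESPACE = uuid.UUID("00000000-0000-0000-0000-000000000000")


def owner_aggregate(service: str) -> str:
    """Derive a stable aggregate UUID for a service name using uuid5."""
    return str(uuid.uuid5(OWNER_NAMESPACE, service))


def build_query(owner: str, cyborg_deployed: bool) -> str:
    """Build an allocation-candidates query string for one VGPU."""
    params = {"resources": "VGPU:1"}
    if cyborg_deployed:
        cyborg_agg = owner_aggregate("cyborg")
        if owner == "cyborg":
            # Device-profile request: only cyborg-managed providers.
            params["member_of"] = cyborg_agg
        else:
            # Plain resources:VGPU request: exclude cyborg-managed providers.
            params["member_of"] = "!" + cyborg_agg
    # Keep ':' and '!' readable instead of percent-encoding them.
    return urlencode(params, safe=":!")
```

sean-k-mooney's variant would instead pass `owner_aggregate("nova")` as a positive `member_of` in the nova case, which scales to three or more services at the cost of nova registering its own providers in an aggregate.
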
gibi | I think we should forbid to have two services provide exactly the same thing. As soon as one of the service provides an extra capability like programability then we can schedule based on that. Ownership is an artificial quality we invented as there is no difference between the two vGPU devices. | 13:47 |
sean-k-mooney | gibi: ok so we are going to reject the cybrog vgpu spec then | 13:48 |
sean-k-mooney | gibi: since it provides nothing over novs implemenation ? | 13:48 |
gibi | so I'm back to my argument that I don't like duplicating capabilities between services | 13:48 |
gibi | as it adds 0 value | 13:48 |
bauzas | I have to bail out for 20 mins | 13:48 |
sean-k-mooney | should we schedule a call on this topic to get wider input | 13:49 |
gibi | sean-k-mooney: do you know who else cares about one or the other vgpu support? | 13:49 |
sean-k-mooney | gibi: ill bring up the idea of supporting cyborg in our product again today and see if there is any appetite for that | 13:49 |
sean-k-mooney | gibi: no not really | 13:50 |
bauzas | sean-k-mooney: I don't see it happening in OSP18 | 13:50 |
bauzas | soooo, 1+ | 13:50 |
gibi | as I only argue from the complexity perspective I cannot argue about the business perspective, I have no business in this area | 13:50 |
bauzas | A+ | 13:50 |
sean-k-mooney | i was just wondering if we want dansmith or johnthetubaguy | 13:50 |
sean-k-mooney | basically the wider core team | 13:50 |
opendevreview | Stephen Finucane proposed openstack/nova stable/wallaby: Test numa and vcpu topologies bug: #1910466 https://review.opendev.org/c/openstack/nova/+/797652 | 13:51 |
opendevreview | Stephen Finucane proposed openstack/nova stable/wallaby: Fix max cpu topologies with numa affinity https://review.opendev.org/c/openstack/nova/+/797653 | 13:51 |
dansmith | well, I can't really speak to the business side either, | 13:51 |
dansmith | but I totally agree with gibi on a technical level | 13:51 |
dansmith | it makes no sense to me to build half of cyborg inside nova unless there's some reason it has to be done that way | 13:52 |
sean-k-mooney | re no duplication i dont necessarily disagree, just practicalities | 13:52 |
dansmith | practicalities of "it's just easier to hack in the bits we care about into nova than to coordinate with or contribute to another project" right? | 13:53 |
sean-k-mooney | ok taking downstream out of this i think it would be good to do a paper exercise of thinking how we would support sharing of resource classes etc. in general between services | 13:53 |
dansmith | isn't that what the long-awaited consumer types is supposed to help with? | 13:54 |
bauzas | dansmith: that's my main concern | 13:54 |
sean-k-mooney | dansmith: well not just its easier because if we are being honest its not easy to land things in nova | 13:54 |
bauzas | adding nova traits for the whole purpose of cyborg seems invasive | 13:54 |
sean-k-mooney | dansmith: not really no | 13:54 |
sean-k-mooney | consumer type does not help with sharing resource classes between services | 13:55 |
dansmith | sean-k-mooney: well, it does if you have a nested provider so you know where the seam is | 13:55 |
sean-k-mooney | dansmith: no because cyborg creates RPs under the compute node RP | 13:56 |
sean-k-mooney | so we cant use same subtree here | 13:56 |
sean-k-mooney | it wont help | 13:56 |
sean-k-mooney | not unless we move all resources off the root RP | 13:56 |
sean-k-mooney | and have each service start their own subtree from a common shared root rp | 13:57 |
gibi | consumer type is like, instance, migration, reservation, etc. What we have here is provider type it is provided (managed by) nova, cyborg, neutron... | 13:57 |
sean-k-mooney | yep ^ | 13:57 |
gibi | so I don't think consumer_type helps here | 13:57 |
bauzas | gibi: only because we need to ask placement to only give us a subset | 13:57 |
bauzas | for cyborg-only reasons | 13:58 |
sean-k-mooney | not just for cyborg | 13:58 |
gibi | bauzas: but we only need to ask for a subset if that subset is different from the other subset | 13:58 |
gibi | is the two subsets provide the same capability then why differentiate | 13:58 |
gibi | s/is/if/ | 13:58 |
bauzas | sean-k-mooney: at the moment, yes, only because cyborg | 13:58 |
bauzas | gibi: 100% agreed | 13:59 |
bauzas | it's a whack-a-mole game | 13:59 |
gibi | we differentiate not because they provide different capabilities but because there is a different implementation behind them that we need to be aware of during plugging | 13:59 |
sean-k-mooney | the usecase predates the cyborg one and blocked other features in the past | 13:59 |
bauzas | gibi: again, 100% agreed on your last sentence | 13:59 |
sean-k-mooney | i dont like categorising this as a problem caused by cyborg | 14:00 |
gibi | maybe vgpu is an exception, a historical exception | 14:00 |
sean-k-mooney | its a nova/placement limitation that we need to solve generically to enable cyborg and other usecases | 14:00 |
sean-k-mooney | gibi: well cyborg is doing generic pci passthrough also | 14:00 |
bauzas | gibi: other services could propose resources to consume | 14:00 |
gibi | and all the other similar cases like accelerator_direct and disk_gb are different as there we have real differences to schedule on | 14:00 |
gibi | and dont need to invent ownership as a differentiator | 14:01 |
dansmith | so the concern is managing the actual RP and available resource for the cyborg things and not the allocations of those things by another source? | 14:01 |
dansmith | because cyborg reserving some accelerator that is on the compute RP _would_ use a different consumer_type I would think | 14:01 |
gibi | cyborg does not create consumers, nova creates the consumer after scheduling | 14:02 |
gibi | cyborg only creates inventories | 14:02 |
dansmith | okay I guess this has all fallen out of my head | 14:02 |
gibi | or we are back to the discussion to split the instance consumer into subconsumers | 14:02 |
gibi | today we have a single consumer per instance (except during migration) | 14:03 |
dansmith | I thought cyborg was going to have to do that because of dynamic devices that may or may not exist until they're scheduled | 14:03 |
dansmith | things it programs to create a new device that wasn't actually part of the schedule | 14:03 |
sean-k-mooney | dansmith: cyborg may have to update the RPs in some cases but we are modeling programmable devices as programmable not programmed | 14:05 |
dansmith | okay | 14:05 |
sean-k-mooney | for the case where the admin will use cyborg directly to program it out of band that device would be consumed by cyborg | 14:05 |
sean-k-mooney | but the resource provided from it would be consumed by nova still | 14:06 |
dansmith | ah, so cyborg consumes the programmable device itself, creates a new resource for the programmed thing, which nova is the consumer of, right? | 14:06 |
sean-k-mooney | yes in that case | 14:06 |
dansmith | yeah, okay | 14:06 |
sean-k-mooney | where the consumer of the programmable device is not a vm | 14:07 |
dansmith | yeah | 14:07 |
sean-k-mooney | e.g. where you have a many vm to 1 device topology | 14:07 |
sean-k-mooney | thats the theory at least | 14:07 |
sean-k-mooney | not sure they have actually implemented any driver that does this yet | 14:07 |
dansmith | I guess that makes the concern over who is managing what less clear to me | 14:07 |
dansmith | which I think was what started this | 14:08 |
dansmith | this: [06:53:45] <sean-k-mooney> ok taking downstream out of this i think it would be good to do a paper exercise of thinking how we would support sharing of resource classes etc. in general between services | 14:08 |
sean-k-mooney | we currently have 4 possible ideas for how to do ^ but no concrete proposal and we have not fully thought through all the implications | 14:09 |
sean-k-mooney | cyborg being the first to need this was trying to propose a way in the vgpu spec | 14:10 |
sean-k-mooney | using the owner_traits approach but bauzas has concerns over that as does gibi | 14:10 |
sean-k-mooney | but really this is a separate problem from cyborg vgpu support | 14:11 |
bauzas | my main concern is that we need to add traits for nova hosts | 14:11 |
sean-k-mooney | yes but they are not really nova hosts | 14:11 |
bauzas | non cyborg managed hosts, if you prefer | 14:11 |
bauzas | or libvirt-managed hosts | 14:12 |
sean-k-mooney | we are super biased but i have argued since before nested resource providers that nova should not own the root rp | 14:12 |
sean-k-mooney | well it was one of my arguments for introducing them | 14:12 |
sean-k-mooney | the "compute" host is really a shared thing | 14:12 |
sean-k-mooney | to which multiple services may create resources | 14:13 |
sean-k-mooney | being totally frank im worried that we will not come to a decision on this this cycle and cyborg will have to wait a third time to make progress | 14:14 |
sean-k-mooney | i have personally seen this type of discussion take literally 2-3 years to progress and i think that is harmful to the openstack community as a whole. i also dont want to rush it as its hard to pivot after its released | 14:16 |
bauzas | we provided alternatives | 14:17 |
sean-k-mooney | yes but we dont agree on any of them | 14:17 |
sean-k-mooney | i hate to say this but i think we need a spec for this or at least an etherpad and some midcycle-like real time design session on this | 14:19 |
sean-k-mooney | this is exactly the type of thing we would have worked through in the ptg | 14:20 |
sean-k-mooney | with a whiteboard and all the stakeholders (where possible) in the same room | 14:20 |
bauzas | I can't disagree | 14:21 |
dansmith | we've also had plenty of those in the past which didn't yield much progress.. specifically about cyborg :) | 14:21 |
sean-k-mooney | ya the dublin session in particular was less than useful | 14:22 |
* gibi is tired | 14:23 | |
sean-k-mooney | gibi: thanks for trying to help | 14:24 |
sean-k-mooney | but i feel the same | 14:24 |
gibi | I summarized my latest view in the spec review but I haven't published it yet. | 14:24 |
sean-k-mooney | could we just use a custom resource class for now and punt on this | 14:25 |
gibi | I'm torn between doing the right thing architecturally and supporting a parallel project to integrate with nova technically | 14:25 |
sean-k-mooney | i know that would require a reshape eventually but at least it would unblock them | 14:25 |
gibi | it would unblock them and we will never allocate time to do that reshape later | 14:26 |
sean-k-mooney | well it would be in cyborg? | 14:26 |
gibi | as it just a lot of work | 14:26 |
gibi | for basicly nothing | 14:26 |
sean-k-mooney | maybe it would be in nova | 14:26 |
sean-k-mooney | damn cross project reshapes will be a bitch to figure out, forget i said anything | 14:26 |
* gibi published his revised viewpoint in the spec | 14:37 | |
bauzas | gibi: thanks | 14:38 |
elodilles | sean-k-mooney gibi : as we discussed in yesterday's meeting, here is an example of a failure I saw: test_live_block_migration_with_attached_volume -- https://c5d6a707d1df71acd55f-fbeed0693cac4a6e5441d43111515edc.ssl.cf5.rackcdn.com/787252/4/gate/nova-live-migration/706ec17/testr_results.html | 14:39 |
gibi | elodilles: looking | 14:40 |
elodilles | I saw this a couple of times at the failures of this patch: https://review.opendev.org/c/openstack/nova/+/787252 | 14:40 |
gibi | elodilles: you have a good timing :) | 14:40 |
elodilles | stable/victoria | 14:40 |
elodilles | gibi: why? o:) | 14:40 |
sean-k-mooney | weird | 14:40 |
sean-k-mooney | both those tests reference the same volume 7688b3f1-6549-4e0c-a507-9789ddc2eb95 | 14:40 |
gibi | elodilles: I've just stopped thinking about the above vGPU problem | 14:41 |
sean-k-mooney | that should not happen right | 14:41 |
sean-k-mooney | two tests referring to the same volume uuid 7688b3f1-6549-4e0c-a507-9789ddc2eb95 | 14:41 |
sean-k-mooney | if they run in parallel they would conflict | 14:41 |
gibi | sean-k-mooney: wich two test cases? | 14:42 |
gibi | sean-k-mooney: I see a testcase and a tear down | 14:42 |
gibi | of the suite | 14:42 |
sean-k-mooney | oh ya its the tear down method | 14:42 |
elodilles | gibi: well I've seen that there was an ongoing discussion so I did not want to interrupt earlier :) | 14:42 |
sean-k-mooney | ok that makes a little more sense | 14:42 |
sean-k-mooney | ok so the teardown failed because presumably the volume was still attached to the vm | 14:43 |
sean-k-mooney | it was in state detaching | 14:43 |
gibi | it is trying to detach in a loop so I guess it is the original detach problem | 14:46 |
gibi | for which we created the notification based solution | 14:46 |
gibi | but that solution is not in victoria | 14:46 |
sean-k-mooney | ya the detach starts at Jun 21 09:20:16.007823 | 14:46 |
gibi | so either we increase the timeout value before we retry in victoria or backport the notification based solution to victoria | 14:48 |
sean-k-mooney | we are calling os_brick.initiator.connectors.iscsi.ISCSIConnector.disconnect_volume on it | 14:48 |
sean-k-mooney | we would not do that if the detach has not completed right? | 14:48 |
gibi | sean-k-mooney: I don't know. what I know is that the detach is not completed from libvirt's perspective, it still reports the device in the live domain after 7 retries | 14:49 |
sean-k-mooney | this is happening as part of a live migration | 14:50 |
sean-k-mooney | this looks odd, what is the test actually doing | 14:51 |
sean-k-mooney | it looks like we are running post live migration on the source which is tearing down the volume while its potentially still detaching | 14:51 |
gibi | I think lyarwood has more context on this detach problem | 14:51 |
sean-k-mooney | im wondering if the tempest test is broken | 14:52 |
sean-k-mooney | e.g. is it issuing a detach and then a live migrate without waiting for the detach to complete | 14:52 |
sean-k-mooney | this is the test https://github.com/openstack/tempest/blob/master/tempest/api/compute/admin/test_live_migration.py#L178-L204 | 14:53 |
sean-k-mooney | thats doing an attach | 14:54 |
sean-k-mooney | but its not doing a detach | 14:54 |
sean-k-mooney | its also not waiting to ensure its actually attached | 14:54 |
sean-k-mooney | unless self.attach_volume is doing that internally | 14:55 |
sean-k-mooney | ok if the attach fails its cleanup function does a detach im assuming https://github.com/openstack/tempest/blob/53c02181f87804a4ba8ddf6288ea1f7717234c2a/tempest/api/compute/base.py#L554-L591 | 14:56 |
sean-k-mooney | it is waiting internally | 14:56 |
sean-k-mooney | https://github.com/openstack/tempest/blob/53c02181f87804a4ba8ddf6288ea1f7717234c2a/tempest/api/compute/base.py#L590 | 14:57 |
sean-k-mooney | can we see the volume attach complete in the logs | 14:58 |
sean-k-mooney | we do the attachment in libvirt at Jun 21 09:20:00.476733 | 15:00 |
lyarwood | if we are still talking about https://c5d6a707d1df71acd55f-fbeed0693cac4a6e5441d43111515edc.ssl.cf5.rackcdn.com/787252/4/gate/nova-live-migration/706ec17/testr_results.html it's just another basic detach failure from the live domain | 15:10 |
lyarwood | after the test has passed and we are cleaning up | 15:10 |
lyarwood | I really need to land https://review.opendev.org/c/openstack/tempest/+/794757 so we can get some guestOS logs in this case | 15:12 |
gibi | lyarwood: that run is from stable/victoria so we don't have the new detach code there. Do you think it would help if we try to backport the new detach code? or is it possibly unrelated? | 15:13 |
lyarwood | or we could just Depends-On that from a DNM stable/victoria change | 15:13 |
elodilles | lyarwood: so you are saying that this is similar like bug #1931716 but not the exact same case? | 15:13 |
lyarwood | gibi: we've seen failures of this test on master so this time I don't think backporting that code is going to help | 15:14 |
lyarwood | brb | 15:14 |
gibi | lyarwood: OK, then I agree to land the tempest improvement first to get more info | 15:15 |
opendevreview | Elod Illes proposed openstack/nova stable/victoria: DNM: volume detach test https://review.opendev.org/c/openstack/nova/+/797675 | 15:18 |
elodilles | I've created the DNM patch that lyarwood suggested ^^^ | 15:19 |
lyarwood | elodilles: thanks | 15:21 |
elodilles | np. let's see what happens :) | 15:22 |
lyarwood | elodilles: I have a change to skip the test on master btw that we could also backport if we see the same soft lockups | 15:30 |
opendevreview | Stephen Finucane proposed openstack/nova stable/victoria: Test numa and vcpu topologies bug: #1910466 https://review.opendev.org/c/openstack/nova/+/797680 | 15:30 |
opendevreview | Stephen Finucane proposed openstack/nova stable/victoria: Fix max cpu topologies with numa affinity https://review.opendev.org/c/openstack/nova/+/797681 | 15:30 |
elodilles | lyarwood: thanks, good to know that, we might need to backport that then if needed | 15:35 |
kashyap | gibi: lyarwood: sean-k-mooney: When you get a min, have a look to see if you spot any holes there: https://blueprints.launchpad.net/nova/+spec/virtio-as-default-display-device | 15:46 |
sean-k-mooney | that will need a spec most likely | 15:50 |
sean-k-mooney | im not against it but we need to record the current model in use | 15:51 |
sean-k-mooney | and then only change the default for new instances | 15:51 |
sean-k-mooney | lyarwood: so in this case we dont actually want to do the detach on successful migration right | 15:59 |
sean-k-mooney | that is just an artifact of the way the cleanup is done | 16:00 |
sean-k-mooney | we just want to delete the vm then delete the volume | 16:00 |
lyarwood | sean-k-mooney: yeah it's just part of the tempest cleanup code that's added when we initially attach | 16:00 |
sean-k-mooney | could we modify that | 16:00 |
lyarwood | indeed, we don't need to detach as part of this test | 16:00 |
sean-k-mooney | add a flag to attach e.g cleanup=false | 16:00 |
lyarwood | so we could just nuke the instance and move on with our lives | 16:00 |
sean-k-mooney | yep | 16:01 |
lyarwood | yup, the only issue is that there's duplication in the tempest code base with lots of this utility code | 16:01 |
lyarwood | but I can do this for the compute attach volume part at least | 16:01 |
lyarwood | I think there's another helper in the volume api tests | 16:01 |
lyarwood | and maybe the scenario manager | 16:01 |
sean-k-mooney | it should in theory decrease test execution time | 16:02 |
lyarwood | Yup we would however drop some coverage of detaching volumes | 16:03 |
*** rpittau is now known as rpittau|afk | 16:08 | |
sean-k-mooney | lyarwood: we would but in principle we have tests for that specifically | 16:34 |
sean-k-mooney | detach is not really a part of what we are testing in this case | 16:34 |
sean-k-mooney | it does expose race conditions more as we see here | 16:34 |
sean-k-mooney | but im not sure that is always a good thing | 16:34 |
opendevreview | Lee Yarwood proposed openstack/nova master: WIP compute: Avoid calling detach with src connection_info during LM rollback https://review.opendev.org/c/openstack/nova/+/797725 | 17:19 |
sean-k-mooney | that was quick | 17:27 |
ganso | lyarwood: hi! is there anything pending on https://review.opendev.org/c/openstack/nova/+/795432 so it can be merged? That patch was the latest one to pass CI. There are several patches (including one with +W) that haven't had a single positive CI run | 17:39 |
lyarwood | elodilles: https://review.opendev.org/c/openstack/nova/+/795432 - can you hit this in the morning? I've upgraded my +1 to +2 to move it along. | 19:37 |
lyarwood | ganso: apologies, I'll work with elodilles and the other stable cores to unblock things in the morning | 19:38 |
ganso | lyarwood: np! thank you very much! I was mostly wondering if something else was pending because I had rebased on top of it and it still failed. See latest comments in https://review.opendev.org/c/openstack/nova/+/796719 | 19:39 |
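
Editor's sketch, not part of the log: the aggregate idea floated between 13:45 and 13:47 (a fixed aggregate UUID derived with uuid5, plus a prefilter that either requires or excludes that aggregate) could look roughly like the following. Everything named here is an assumption for illustration only: the namespace string, the service labels, and both helper functions are hypothetical and come from neither nova nor cyborg. The `member_of=!<uuid>` form for excluding an aggregate is, as far as I recall, accepted by the placement API from microversion 1.32; treat that as something to verify.

```python
import uuid
from urllib.parse import urlencode

# Hypothetical namespace for per-service "owner" aggregates (an assumption,
# not an identifier defined by nova, cyborg or placement).
OWNER_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "owner-aggregates.example.org")


def owner_aggregate_uuid(service):
    """Derive a stable aggregate UUID for a service name.

    uuid5 is deterministic, so every host computes the same UUID for the
    same service without any coordination.
    """
    return str(uuid.uuid5(OWNER_NAMESPACE, service))


def allocation_candidates_query(resources, owner=None, exclude_owner=None):
    """Build a query string for GET /allocation_candidates.

    owner: only consider providers in that service's aggregate.
    exclude_owner: skip providers in that service's aggregate
    (the negative form, written with a leading "!").
    With neither set, no aggregate constraint is added, which mirrors
    "when prefilter is off it wont do anything".
    """
    wanted = ",".join(f"{rc}:{amount}" for rc, amount in sorted(resources.items()))
    params = [("resources", wanted)]
    if owner is not None:
        params.append(("member_of", owner_aggregate_uuid(owner)))
    if exclude_owner is not None:
        params.append(("member_of", "!" + owner_aggregate_uuid(exclude_owner)))
    # Keep ":", "," and "!" unescaped so the query stays readable.
    return "/allocation_candidates?" + urlencode(params, safe=":,!")


if __name__ == "__main__":
    # Positive form: ask only for cyborg-owned providers.
    print(allocation_candidates_query({"VGPU": 1}, owner="cyborg"))
    # Negative form: ask for anything that is not cyborg-owned.
    print(allocation_candidates_query({"VGPU": 1}, exclude_owner="cyborg"))
```

The two calls at the bottom correspond to the two positions in the discussion: the positive form needs every non-cyborg host to be registered in some aggregate, while the negative form leaves hosts of operators who do not run cyborg untouched.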