14:01:31 <jbernard> #startmeeting cinder
14:01:31 <opendevmeet> Meeting started Wed Dec 11 14:01:31 2024 UTC and is due to finish in 60 minutes. The chair is jbernard. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:31 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:31 <opendevmeet> The meeting name has been set to 'cinder'
14:01:35 <jbernard> #topic roll call
14:01:38 <tosky> o/
14:01:41 <jungleboyj> o/
14:01:44 <jbernard> o/
14:01:45 <simondodsley> o/
14:01:49 <rosmaita> o/
14:01:50 <akawai> o/
14:01:51 <jbernard> #link https://etherpad.opendev.org/p/cinder-epoxy-meetings
14:01:52 <flelain> o/
14:02:00 <vdhakad> o/
14:02:07 <whoami-rajat> hey
14:02:29 <sp-bmilanov> o/
14:02:55 <msaravan> Hi
14:04:55 <jbernard> ok, welcome everyone
14:05:01 <jbernard> #topic announcements
14:05:16 <jbernard> #link https://releases.openstack.org/epoxy/schedule.html
14:05:36 <jbernard> still in M1, M2 is early Jan (6-10)
14:06:22 <jbernard> generally we (or I at least) are focused on consolidating any pending specs in the next days
14:06:30 <jbernard> specifically the dm-clone spec from Jan
14:06:37 <jbernard> but there could be something I've missed
14:06:55 <jbernard> otherwise, I hope everyone is having a good December so far
14:07:11 <simondodsley> It's December already!!!
14:07:28 <jungleboyj> Lol. 1/3 of the way through even!
14:07:30 <jbernard> actually we're almost half done! :/
14:07:42 <simondodsley> must take more notice
14:07:52 <flelain> Happy Advent time to you all!
14:08:10 <jbernard> to you too
14:08:15 <jbernard> speaking of advent... :)
14:08:21 <jbernard> #link https://adventofcode.com/
14:08:30 <jbernard> ^ this is fun if you don't have anything to do
14:08:50 <jhorstmann> o/
14:09:00 <jbernard> #topic followup on dm-clone spec
14:09:02 <flelain> lol; got a couple of colleagues of mine already deeply involved in it :) Haven't found time for it so far!
14:09:40 <jbernard> #link https://review.opendev.org/c/openstack/cinder-specs/+/935347/6
14:09:44 <jbernard> #link https://review.opendev.org/c/openstack/cinder-specs/+/935347
14:09:48 <simondodsley> I'm reading this spec now and I'm not sure I understand the need, especially if Ceph is deployed
14:10:25 <jbernard> jhorstmann: we're getting some traction on your spec review, wanted to make space available now to raise any issues so far
14:11:35 <jbernard> simondodsley: if you have ceph, my understanding is that dm-clone would not be necessary; this spec rather gives you local storage with the ability to live migrate, something closer to distributed storage without the resource requirements and complexity
14:11:44 <jhorstmann> thank you, happy to answer any questions
14:11:45 <jbernard> jhorstmann: ^ please correct me if I'm wrong there
14:12:45 <simondodsley> but you are requiring a c-vol service on each compute node - that is more resources. I'm not sure that a c-vol on each hypervisor is a good methodology
14:13:17 <jhorstmann> simondodsley: the idea is to have more options. This is intended for deployments where you either want the performance of local storage or are resource constrained and do not want to deploy a full storage solution
14:14:32 <jhorstmann> simondodsley: I agree that this is disadvantageous. It is one of the constraints if this is implemented as a pure cinder volume driver
14:14:35 <simondodsley> and how will this work with Nova AZs?
14:15:17 <simondodsley> lol - you said pure - please capitalize it ... :)
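
[For context on the kernel mechanism the spec is named after: a minimal, illustrative sketch of loading a dm-clone device-mapper mapping with dmsetup. The helper function, device paths, and sizes below are invented for illustration and are not taken from the spec or from Cinder code.]

# Illustrative sketch only: loading a dm-clone mapping via dmsetup.
# Device names, sizes, and the helper itself are hypothetical.
import subprocess

def load_clone_target(name, metadata_dev, dest_dev, source_dev,
                      size_sectors, region_size_sectors=8):
    # dm-clone table: <start> <size> clone <metadata> <destination> <source> <region size>
    # Reads of not-yet-hydrated regions are served from the (remote) source;
    # writes always land on the local destination device.
    table = "0 %d clone %s %s %s %d" % (
        size_sectors, metadata_dev, dest_dev, source_dev, region_size_sectors)
    subprocess.run(["dmsetup", "create", name, "--table", table], check=True)

# e.g. during a migration the remote copy could be attached (e.g. via iSCSI) and
# used as the source while background hydration fills the local LV:
# load_clone_target("volume-1234", "/dev/vg/meta-1234", "/dev/vg/volume-1234",
#                   "/dev/disk/by-path/...", 20971520)
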
14:15:33 <jhorstmann> :)
14:16:17 <simondodsley> i feel this may become problematic in core-edge scenarios where different edge nodes have no connectivity to other edge nodes
14:16:47 <whoami-rajat> simondodsley, IIUC it's a k8s-over-openstack use case where we require local volumes for the etcd service; we use ceph in HCI but i think it doesn't provide the desired latency
14:17:18 <whoami-rajat> simondodsley, that was one of my concerns, here all compute nodes need to be connected via the storage network
14:17:46 <simondodsley> if this is for k8s on openstack, then a CSI driver should be used rather than a cinder driver, shouldn't it?
14:17:59 <flelain> jhorstmann, about this spec, just to make sure I got it right, does it aim to provide block storage right on the local storage of the compute host running the VM?
14:20:12 <whoami-rajat> simondodsley, I'm not aware of all the details but do you mean the cinder CSI driver? won't that issue API calls to cinder anyway?
14:20:22 <simondodsley> or even a true CNS solution - i have to be careful here as my company sells one of these so I may have to recuse myself from being involved in this
14:20:52 <simondodsley> not necessarily the cinder CSI driver.
14:21:24 <jhorstmann> simondodsley: so regarding availability zones, cinder.cross_az_attach=True is the default, so the AZ is not relevant in that case, correct? If you want cross_az_attach=False you have to make sure that they overlap. I am probably missing the exact point of the question
14:22:03 <simondodsley> it's more about DCN deployments
14:22:27 <jbernard> flelain: i believe that is true, I think of it as an LVM deployment, but with the ability to move volumes between compute nodes as instances are migrated
14:22:27 <jhorstmann> flelain: that is correct. The driver will provide storage local to the compute node and offer the possibility to transfer the data on attach and live migration
14:23:53 <flelain> then couldn't the instance disk, using the local compute node's storage, do the trick? (w/o being block storage though)
14:24:11 <simondodsley> i don't understand why you are referencing iSCSI or other network-attached storage as the source, because this would mean that external storage is available, and therefore why not just use the cinder driver for that storage?
14:26:05 <jhorstmann> simondodsley, whoami-rajat: yes, if you want to deploy a cloud with this driver you would need some sort of storage network between the compute nodes, but is there a deployment where this does not apply?
14:26:20 <simondodsley> It feels like this is just live migration 2.0, using dm-clone instead of dd.
14:27:16 <simondodsley> how will you cater for network outages? does dm-clone have capabilities to automatically recover from these?
14:27:49 <jhorstmann> flelain: the idea is to have the full flexibility of the block storage service, so that you dynamically create volumes of the required size and are not bound to any instance lifecycles
14:29:26 <jhorstmann> simondodsley: the advantage is that with the dm-clone target, writes will always be local to the compute node. So you get local storage performance for writes, where it is most critical for applications like e.g. databases
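
[A quick reference for the option mentioned above: cross_az_attach is a Nova option in the [cinder] section of nova.conf. The fragment below is only an illustrative example, not a recommended setting.]

[cinder]
# Default is True: Nova will attach volumes regardless of availability zone.
# With False, the volume's AZ in Cinder must match the instance's AZ in Nova.
cross_az_attach = True
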
14:29:59 <simondodsley> but what about a network outage during a migration?
14:31:14 <whoami-rajat> the spec mentions all of these details
14:31:19 <whoami-rajat> #link https://review.opendev.org/c/openstack/cinder-specs/+/935347/7/specs/2025.1/add-dmclone-driver.rst
14:31:22 <whoami-rajat> L#528
14:31:34 <flelain> jhorstmann: gotcha. Interesting concept to be carried on!
14:31:44 <jhorstmann> simondodsley: regarding network outages: yes, you can recover from those. Of course you will get read errors if the source node is not available. Usually this is handled at the filesystem level by remounting read-only. Once you recover the source node you can recover the volume and the data will be transferred again
14:32:03 <flelain> jhorstmann I left comments on the spec, thank you for this proposal.
14:32:21 <jhorstmann> flelain: thank you
14:32:29 <rosmaita> also, the presentation from the PTG gives a good overview of the spec: https://janhorstmann.github.io/cinder-driver-dm-clone-presentation/000-toc.html
14:34:05 <jhorstmann> simondodsley: it is not an HA storage solution. If that is a requirement then this driver should not be used. As said before, I see this as an additional storage option, but it cannot replace most existing solutions
14:37:26 <whoami-rajat> i feel it would be best to leave the concerns as comments on the spec since Jan is pretty good and quick at responding to those
14:38:12 <simondodsley> One thing that concerns me is that you cannot extend a volume. This is a minimum requirement for any cinder driver as defined in the documentation
14:38:40 <whoami-rajat> we can extend it, just not during the migration
14:39:23 <simondodsley> so does it meet all the required core functions?
14:39:45 <simondodsley> https://docs.openstack.org/cinder/latest/contributor/drivers.html
14:39:58 <jhorstmann> simondodsley: yes, volume extension is possible when the data is "at rest" (no clone target loaded)
14:40:46 <whoami-rajat> so this is not specifically a new cinder driver but an extension of the LVM driver to provide local attachment + live migration support
14:41:11 <jbernard> these are all good questions, it might be good to have any outstanding concerns captured in the spec, Jan has been good about quick responses
14:41:38 <simondodsley> hmm - the spec doesn't really say that - it says it is a new driver
14:41:58 <simondodsley> Line #20
14:42:17 <simondodsley> and the title says 'Add' which implies 'new'
14:42:32 <simondodsley> even the spec name is add
14:43:12 <simondodsley> isn't this more of a new replication capability with failover capabilities?
14:43:23 <whoami-rajat> i might need to clarify my understanding there, maybe jhorstmann can help
14:44:02 <whoami-rajat> but jbernard might have other discussion points so we can discuss it at some other place
14:44:21 <whoami-rajat> s/discussion/meeting
14:44:31 <jbernard> on the agenda there are only review requests, this was/is our primary topic for today
14:44:45 <jbernard> unless there is something not on the etherpad
14:45:49 <tosky> sooo, if I can: should we mark revert to snapshot as supported for Generic NFS? I've created https://review.opendev.org/c/openstack/cinder/+/936182 also to gather feedback: it seems the pending issues were addressed. Or is there still any gap that needs to be fixed?
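
[As a rough illustration of the "extension of the LVM driver" framing discussed above: a hypothetical driver could subclass the in-tree LVM driver and override only the attach/migration paths. The class name and method bodies below are invented for illustration and are not the spec's implementation.]

# Hypothetical sketch, not the proposed driver: reuse LVM volume management,
# override only the connection/migration behaviour.
from cinder.volume.drivers import lvm

class DMCloneVolumeDriver(lvm.LVMVolumeDriver):  # name is illustrative
    def initialize_connection(self, volume, connector):
        # Attach locally when the volume already lives on this host;
        # otherwise a dm-clone mapping over a remote source would be loaded first.
        raise NotImplementedError("sketch only")

    def migrate_volume(self, context, volume, host):
        # Hand the data transfer to dm-clone background hydration
        # instead of copying everything up front.
        raise NotImplementedError("sketch only")
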
14:46:14 <jbernard> #topic open discussion
14:46:47 <jbernard> simondodsley: those are all great questions, please, if you have cycles today, put any outstanding ones in the spec so that we have a record
14:46:53 <jhorstmann> whoami-rajat, simondodsley: the driver inherits from the LVM driver for basic volume management, but needs to implement some methods differently. I am not sure where the line between extension and new driver is drawn
14:47:40 <jbernard> eharney: re tosky's question, you are frequently in the NFS codez
14:48:07 <whoami-rajat> jhorstmann, so basically if the base driver (LVM) + the new driver methods implement all required functionalities, we should be good
14:48:10 <rosmaita> jhorstmann: i think calling it a new driver is fine, we have a bunch of drivers like that
14:48:42 <eharney> jbernard: yeah i meant to +2 that one -- i'm doing some work in nfs world anyway right now so i'll see if there are any concerns there
14:49:10 <whoami-rajat> rosmaita, +1
14:50:32 <jbernard> eharney: excellent, thanks
14:50:39 <whoami-rajat> tosky, i see in one of the recent runs of the nfs job that revert is enabled, but I'm unable to find any test that is exercising that code path
14:51:20 <tosky> whoami-rajat: the tests are in cinder-tempest-plugin; I have a review to enable these tests for NFS, which is blocked on the known NFS format bug
14:51:26 <tosky> let me dig it up
14:52:31 <whoami-rajat> tosky, oh okay, though I can vouch it works since i worked on the fix and verified it
14:52:47 <whoami-rajat> (it would still be good to see it in a gate run)
14:53:08 <tosky> whoami-rajat: https://review.opendev.org/c/openstack/devstack-plugin-nfs/+/898965
14:53:42 <tosky> aaand we probably need fresh logs
14:54:34 <whoami-rajat> wait, that doc change might not be accurate
14:54:42 <whoami-rajat> so NFS doesn't support revert to snapshot
14:54:46 <tosky> oh
14:54:54 <whoami-rajat> but when it fell back to the generic revert mechanism, it still failed
14:54:59 <whoami-rajat> which is the scenario that i fixed
14:55:28 <whoami-rajat> so we cannot do driver-assisted revert to snapshot, but at least the generic code path works with nfs now
14:55:40 <tosky> so that feature is about driver-assisted revert, not just that it "just works"?
14:55:54 <tosky> but in fact having the tests enabled wouldn't hurt
14:56:03 <tosky> still, it may need to be written down in a proper way somehow
14:56:16 <tosky> or just decide that yes, revert to snapshot works, just not optimized
14:56:19 <whoami-rajat> i think everything is expected to work in the generic workflow but NFS proved us wrong :D
14:56:26 <tosky> up to whatever you people decide :)
14:56:51 <whoami-rajat> right, it is certainly an improvement over before, when we couldn't even do revert with NFS
14:57:03 <whoami-rajat> which also led to some complaints in the past
14:59:07 <jbernard> ok, last call
14:59:08 <whoami-rajat> I've left a -1 on this change but we can add it in some other doc place
15:00:00 <jbernard> #endmeeting
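
[A note on the revert-to-snapshot exchange above: conceptually, Cinder attempts a driver-assisted revert and falls back to a generic copy-based path when the backend does not provide one. The sketch below is a simplified, hypothetical rendering of that flow; the helper names are invented and it is not the actual cinder.volume.manager code.]

# Simplified, hypothetical illustration of "driver-assisted revert with a
# generic fallback"; not the real cinder.volume.manager implementation.
def revert_to_snapshot(driver, volume, snapshot):
    try:
        # Optimized path: the backend reverts in place (e.g. an array-side snapshot).
        driver.revert_to_snapshot(None, volume, snapshot)
    except NotImplementedError:
        # Generic fallback: restore the volume's data from the snapshot by
        # copying it back; this path is what the NFS fix discussed above made work.
        generic_revert_by_copy(volume, snapshot)  # hypothetical helper

def generic_revert_by_copy(volume, snapshot):
    ...  # copy the snapshot's data back onto the volume
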