14:01:31 <jbernard> #startmeeting cinder
14:01:31 <opendevmeet> Meeting started Wed Dec 11 14:01:31 2024 UTC and is due to finish in 60 minutes. The chair is jbernard. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:31 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:31 <opendevmeet> The meeting name has been set to 'cinder'
14:01:35 <jbernard> #topic roll call
14:01:38 <tosky> o/
14:01:41 <jungleboyj> o/
14:01:44 <jbernard> o/
14:01:45 <simondodsley> o/
14:01:49 <rosmaita> o/
14:01:50 <akawai> o/
14:01:51 <jbernard> #link https://etherpad.opendev.org/p/cinder-epoxy-meetings
14:01:52 <flelain> o/
14:02:00 <vdhakad> o/
14:02:07 <whoami-rajat> hey
14:02:29 <sp-bmilanov> o/
14:02:55 <msaravan> Hi
14:04:55 <jbernard> ok, welcome everyone
14:05:01 <jbernard> #topic announcements
14:05:16 <jbernard> #link https://releases.openstack.org/epoxy/schedule.html
14:05:36 <jbernard> still in M1, M2 is early Jan (6-10)
14:06:22 <jbernard> generally we (or I at least) are focused on consolidating any pending specs in the next days
14:06:30 <jbernard> specifically the dm-clone spec from Jan
14:06:37 <jbernard> but there could be something I've missed
14:06:55 <jbernard> otherwise, I hope everyone is having a good December so far
14:07:11 <simondodsley> It's December already!!!
14:07:28 <jungleboyj> Lol. 1/3 of the way through even!
14:07:30 <jbernard> actually we're almost half done! :/
14:07:42 <simondodsley> must take more notice
14:07:52 <flelain> Happy Advent time to you all!
14:08:10 <jbernard> to you too
14:08:15 <jbernard> speaking of advent... :)
14:08:21 <jbernard> #link https://adventofcode.com/
14:08:30 <jbernard> ^ this is fun if you don't have anything to do
14:08:50 <jhorstmann> o/
14:09:00 <jbernard> #topic followup on dm-clone spec
14:09:02 <flelain> lol; got a couple of colleagues of mine already deeply involved in it :) Haven't found time for it so far!
14:09:40 <jbernard> #link https://review.opendev.org/c/openstack/cinder-specs/+/935347/6
14:09:44 <jbernard> #link https://review.opendev.org/c/openstack/cinder-specs/+/935347
14:09:48 <simondodsley> I'm reading this spec now and I'm not sure I understand the need, especially if Ceph is deployed
14:10:25 <jbernard> jhorstmann: we're getting some traction on your spec review, wanted to make space available now to raise any issues so far
14:11:35 <jbernard> simondodsley: if you have ceph, my understanding is that dm-clone would not be necessary; this spec rather gives you local storage with the ability to live migrate, something closer to distributed storage without the resource requirements and complexity
14:11:44 <jhorstmann> thank you, happy to answer any questions
14:11:45 <jbernard> jhorstmann: ^ please correct me if I'm wrong there
14:12:45 <simondodsley> but you are requiring a c-vol service on each compute node - that is more resources. I'm not sure that a c-vol on each hypervisor is a good methodology
14:13:17 <jhorstmann> simondodsley: the idea is to have more options. This is intended for deployments where you either want the performance of local storage or are resource constrained and do not want to deploy a full storage solution
14:14:32 <jhorstmann> simondodsley: I agree that this is disadvantageous. It is one of the constraints if this is implemented as a pure cinder volume driver
14:14:35 <simondodsley> and how will this work with Nova AZs?
14:15:17 <simondodsley> lol - you said pure - please capitalize it ... :)
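
[For context on the kernel mechanism the spec is named after: a minimal, illustrative sketch of loading a dm-clone device-mapper mapping with dmsetup. The helper function, device paths, and sizes below are invented for illustration and are not taken from the spec or from Cinder code.]

# Illustrative sketch only: loading a dm-clone mapping via dmsetup.
# Device names, sizes, and the helper itself are hypothetical.
import subprocess

def load_clone_target(name, metadata_dev, dest_dev, source_dev,
                      size_sectors, region_size_sectors=8):
    # dm-clone table: <start> <size> clone <metadata> <destination> <source> <region size>
    # Reads of not-yet-hydrated regions are served from the (remote) source;
    # writes always land on the local destination device.
    table = "0 %d clone %s %s %s %d" % (
        size_sectors, metadata_dev, dest_dev, source_dev, region_size_sectors)
    subprocess.run(["dmsetup", "create", name, "--table", table], check=True)

# e.g. during a migration the remote copy could be attached (e.g. via iSCSI) and
# used as the source while background hydration fills the local LV:
# load_clone_target("volume-1234", "/dev/vg/meta-1234", "/dev/vg/volume-1234",
#                   "/dev/disk/by-path/...", 20971520)
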
14:15:33 <jhorstmann> :)
14:16:17 <simondodsley> i feel this may become problematic in core-edge scenarios where different edge nodes have no connectivity to other edge nodes
14:16:47 <whoami-rajat> simondodsley, IIUC it's a k8s-over-openstack use case where we require local volumes for the etcd service; we use ceph in HCI but i think it doesn't provide the desired latency
14:17:18 <whoami-rajat> simondodsley, that was one of my concerns, here all compute nodes need to be connected via the storage network
14:17:46 <simondodsley> if this is for k8s on openstack, then a CSI driver should be used rather than a cinder driver, shouldn't it?
14:17:59 <flelain> jhorstmann, about this spec, just to make sure I got it right, does it aim to provide block storage right on the local storage of the compute host running the VM?
14:20:12 <whoami-rajat> simondodsley, I'm not aware of all the details but do you mean the cinder CSI driver? won't that issue API calls to cinder anyway?
14:20:22 <simondodsley> or even a true CNS solution - i have to be careful here as my company sells one of these so I may have to recuse myself from being involved in this
14:20:52 <simondodsley> not necessarily the cinder CSI driver.
14:21:24 <jhorstmann> simondodsley: so regarding availability zones, cinder.cross_az_attach=True is the default, so the AZ is not relevant in that case, correct? If you want cross_az_attach=False you have to make sure that they overlap. I am probably missing the exact point of the question
14:22:03 <simondodsley> it's more about DCN deployments
14:22:27 <jbernard> flelain: i believe that is true, I think of it as an LVM deployment, but with the ability to move volumes between compute nodes as instances are migrated
14:22:27 <jhorstmann> flelain: that is correct. The driver will provide storage local to the compute node and offer the possibility to transfer the data on attach and live migration
14:23:53 <flelain> then couldn't the instance disk, using the local compute node's storage, do the trick? (w/o being block storage though)
14:24:11 <simondodsley> i don't understand why you are referencing iSCSI or other network-attached storage as the source, because this would mean that external storage is available, and therefore why not just use the cinder driver for that storage?
14:26:05 <jhorstmann> simondodsley, whoami-rajat: yes, if you want to deploy a cloud with this driver you would need some sort of storage network between the compute nodes, but is there a deployment where this does not apply?
14:26:20 <simondodsley> It feels like this is just live migration 2.0, using dm-clone instead of dd.
14:27:16 <simondodsley> how will you cater for network outages? does dm-clone have capabilities to automatically recover from these?
14:27:49 <jhorstmann> flelain: the idea is to have the full flexibility of the block storage service, so that you dynamically create volumes of the required size and are not bound to any instance lifecycles
14:29:26 <jhorstmann> simondodsley: the advantage is that with the dm-clone target, writes will always be local to the compute node. So you get local storage performance for writes, where it is most critical for applications like e.g. databases
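
[A quick reference for the option mentioned above: cross_az_attach is a Nova option in the [cinder] section of nova.conf. The fragment below is only an illustrative example, not a recommended setting.]

[cinder]
# Default is True: Nova will attach volumes regardless of availability zone.
# With False, the volume's AZ in Cinder must match the instance's AZ in Nova.
cross_az_attach = True
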
14:29:59 <simondodsley> but what about a network outage during a migration?
14:31:14 <whoami-rajat> the spec mentions all of these details
14:31:19 <whoami-rajat> #link https://review.opendev.org/c/openstack/cinder-specs/+/935347/7/specs/2025.1/add-dmclone-driver.rst
14:31:22 <whoami-rajat> L#528
14:31:34 <flelain> jhorstmann: gotcha. Interesting concept to be carried on!
14:31:44 <jhorstmann> simondodsley: regarding network outages: yes, you can recover from those. Of course you will get read errors if the source node is not available. Usually this is handled at the filesystem level by remounting read-only. Once you recover the source node you can recover the volume and the data will be transferred again
14:32:03 <flelain> jhorstmann I left comments on the spec, thank you for this proposal.
14:32:21 <jhorstmann> flelain: thank you
14:32:29 <rosmaita> also, the presentation from the PTG gives a good overview of the spec: https://janhorstmann.github.io/cinder-driver-dm-clone-presentation/000-toc.html
14:34:05 <jhorstmann> simondodsley: it is not an HA storage solution. If that is a requirement then this driver should not be used. As said before, I see this as an additional storage option, but it cannot replace most existing solutions
14:37:26 <whoami-rajat> i feel it would be best to leave the concerns as comments on the spec since Jan is pretty good and quick at responding to those
14:38:12 <simondodsley> One thing that concerns me is that you cannot extend a volume. This is a minimum requirement for any cinder driver as defined in the documentation
14:38:40 <whoami-rajat> we can extend it, just not during the migration
14:39:23 <simondodsley> so does it meet all the required core functions?
14:39:45 <simondodsley> https://docs.openstack.org/cinder/latest/contributor/drivers.html
14:39:58 <jhorstmann> simondodsley: yes, volume extension is possible when the data is "at rest" (no clone target loaded)
14:40:46 <whoami-rajat> so this is not specifically a new cinder driver but an extension of the LVM driver to provide local attachment + live migration support
14:41:11 <jbernard> these are all good questions, it might be good to have any outstanding concerns captured in the spec, Jan has been good about quick responses
14:41:38 <simondodsley> hmm - the spec doesn't really say that - it says it is a new driver
14:41:58 <simondodsley> Line #20
14:42:17 <simondodsley> and the title says 'Add' which implies 'new'
14:42:32 <simondodsley> even the spec name is add
14:43:12 <simondodsley> isn't this more of a new replication capability with failover capabilities?
14:43:23 <whoami-rajat> i might need to clarify my understanding there, maybe jhorstmann can help
14:44:02 <whoami-rajat> but jbernard might have other discussion points so we can discuss it at some other place
14:44:21 <whoami-rajat> s/discussion/meeting
14:44:31 <jbernard> on the agenda there are only review requests, this was/is our primary topic for today
14:44:45 <jbernard> unless there is something not on the etherpad
14:45:49 <tosky> sooo, if I can: should we mark revert to snapshot as supported for Generic NFS? I've created https://review.opendev.org/c/openstack/cinder/+/936182 also to gather feedback: it seems the pending issues were addressed. Or is there still any gap that needs to be fixed?
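
[As a rough illustration of the "extension of the LVM driver" framing discussed above: a hypothetical driver could subclass the in-tree LVM driver and override only the attach/migration paths. The class name and method bodies below are invented for illustration and are not the spec's implementation.]

# Hypothetical sketch, not the proposed driver: reuse LVM volume management,
# override only the connection/migration behaviour.
from cinder.volume.drivers import lvm

class DMCloneVolumeDriver(lvm.LVMVolumeDriver):  # name is illustrative
    def initialize_connection(self, volume, connector):
        # Attach locally when the volume already lives on this host;
        # otherwise a dm-clone mapping over a remote source would be loaded first.
        raise NotImplementedError("sketch only")

    def migrate_volume(self, context, volume, host):
        # Hand the data transfer to dm-clone background hydration
        # instead of copying everything up front.
        raise NotImplementedError("sketch only")
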
14:46:14 <jbernard> #topic open discussion
14:46:47 <jbernard> simondodsley: those are all great questions, please, if you have cycles today, put any outstanding ones in the spec so that we have a record
14:46:53 <jhorstmann> whoami-rajat, simondodsley: the driver inherits from the LVM driver for basic volume management, but needs to implement some methods differently. I am not sure where the line between extension and new driver is drawn
14:47:40 <jbernard> eharney: re tosky's question, you are frequently in the NFS codez
14:48:07 <whoami-rajat> jhorstmann, so basically if the base driver (LVM) + the new driver methods implement all required functionalities, we should be good
14:48:10 <rosmaita> jhorstmann: i think calling it a new driver is fine, we have a bunch of drivers like that
14:48:42 <eharney> jbernard: yeah i meant to +2 that one -- i'm doing some work in nfs world anyway right now so i'll see if there are any concerns there
14:49:10 <whoami-rajat> rosmaita, +1
14:50:32 <jbernard> eharney: excellent, thanks
14:50:39 <whoami-rajat> tosky, i see in one of the recent runs of the nfs job that revert is enabled, but I'm unable to find any test that is exercising that code path
14:51:20 <tosky> whoami-rajat: the tests are in cinder-tempest-plugin; I have a review to enable these tests for NFS, which is blocked on the known NFS format bug
14:51:26 <tosky> let me dig it up
14:52:31 <whoami-rajat> tosky, oh okay, though I can vouch it works since i worked on the fix and verified it
14:52:47 <whoami-rajat> (it would still be good to see it in a gate run)
14:53:08 <tosky> whoami-rajat: https://review.opendev.org/c/openstack/devstack-plugin-nfs/+/898965
14:53:42 <tosky> aaand we probably need fresh logs
14:54:34 <whoami-rajat> wait, that doc change might not be accurate
14:54:42 <whoami-rajat> so NFS doesn't support revert to snapshot
14:54:46 <tosky> oh
14:54:54 <whoami-rajat> but when it fell back to the generic revert mechanism, it still failed
14:54:59 <whoami-rajat> which is the scenario that i fixed
14:55:28 <whoami-rajat> so we cannot do driver-assisted revert to snapshot, but at least the generic code path works with nfs now
14:55:40 <tosky> so that feature is about driver-assisted revert, not just that it "just works"?
14:55:54 <tosky> but in fact having the tests enabled wouldn't hurt
14:56:03 <tosky> still, it may need to be written down in a proper way somehow
14:56:16 <tosky> or just decide that yes, revert to snapshot works, just not optimized
14:56:19 <whoami-rajat> i think everything is expected to work in the generic workflow but NFS proved us wrong :D
14:56:26 <tosky> up to whatever you people decide :)
14:56:51 <whoami-rajat> right, it is certainly an improvement over before, when we couldn't even do revert with NFS
14:57:03 <whoami-rajat> which also led to some complaints in the past
14:59:07 <jbernard> ok, last call
14:59:08 <whoami-rajat> I've left a -1 on this change but we can add it in some other doc place
15:00:00 <jbernard> #endmeeting
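
[A note on the revert-to-snapshot exchange above: conceptually, Cinder attempts a driver-assisted revert and falls back to a generic copy-based path when the backend does not provide one. The sketch below is a simplified, hypothetical rendering of that flow; the helper names are invented and it is not the actual cinder.volume.manager code.]

# Simplified, hypothetical illustration of "driver-assisted revert with a
# generic fallback"; not the real cinder.volume.manager implementation.
def revert_to_snapshot(driver, volume, snapshot):
    try:
        # Optimized path: the backend reverts in place (e.g. an array-side snapshot).
        driver.revert_to_snapshot(None, volume, snapshot)
    except NotImplementedError:
        # Generic fallback: restore the volume's data from the snapshot by
        # copying it back; this path is what the NFS fix discussed above made work.
        generic_revert_by_copy(volume, snapshot)  # hypothetical helper

def generic_revert_by_copy(volume, snapshot):
    ...  # copy the snapshot's data back onto the volume
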