15:01:32 <bswartz> #startmeeting manila
15:01:33 <openstack> Meeting started Thu Feb  4 15:01:32 2016 UTC and is due to finish in 60 minutes.  The chair is bswartz. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:34 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:36 <openstack> The meeting name has been set to 'manila'
15:01:39 <toabctl_> hey
15:01:40 <cknight> Hi
15:01:40 <gouthamr> o/ hey!
15:01:42 <ganso> hello
15:01:43 <aovchinnikov> hi
15:01:43 <bswartz> hello all
15:01:45 <tpsilva> hello
15:01:49 <vponomaryov> hi
15:01:51 <xyang1> hi
15:02:09 <mkoderer> hi
15:02:28 <bswartz> #topic announcement
15:02:31 <jcsp> hi
15:02:41 <bswartz> first thing:
15:02:49 <bswartz> welcome ganso to manila core reviewer team!
15:02:55 <toabctl_> welcome!
15:03:00 <cknight> ganso: welcome!
15:03:03 <gouthamr> +1
15:03:20 <markstur_> congrats
15:03:21 <ganso> bswartz: thanks! I am very happy to be part of this community and continue contributing to the Manila project! :)
15:03:24 <mkoderer> congrats
15:03:30 <tpsilva> congrats ganso!
15:03:42 <vponomaryov> ganso: congrats! ))
15:03:49 <ganso> :)
15:03:50 <bswartz> I'm very excited to have ganso's help, and look forward to extra reviewer bandwidth in the coming weeks as we approach Mitaka Feature Freeze
15:03:59 <xyang1> ganso: congrats!
15:04:12 <xyang1> looks like I missed the email
15:04:39 <bswartz> xyang1: this is the announcement -- I'll send an email after the meeting
15:04:58 <xyang1> bswartz: ok
15:05:02 <ameade> 0/
15:05:13 <bswartz> so next thing: deadlines
15:05:54 <bswartz> as we agreed to at the start of the release, there is a *Driver* proposal freeze next week, Feb 11
15:06:08 <bswartz> that's Feature Freeze minus 3 weeks
15:06:33 <bswartz> New drivers are supposed to be in gerrit with +1 from Jenkins, and substantially complete by that date
15:06:58 <bswartz> also major new driver features should meet the same deadline
15:07:20 <bswartz> the feature proposal freeze for ALL features is a week later, Feb 18
15:07:58 <bswartz> basically we have 4 weeks from today until feature freeze and the deadlines are supposed to help us have enough time to review and merge the large backlog of changes
15:07:59 <toabctl_> bswartz: so what drivers do we have in the pipeline besides ceph? (sorry, I'm a bit out of date currently)
15:08:33 <vponomaryov> toabctl: ZFS (on linux)
15:08:33 <bswartz> The one I know about is the ceph driver
15:08:47 <bswartz> there is another driver for ZFS which I'll discuss in a bit
15:09:15 <toabctl_> is it realistic to get ceph in? is somebody here from the ceph driver guys?
15:09:18 <jcsp> hi
15:09:23 <jcsp> ceph is hoping to meet deadline
15:09:28 <jcsp> the tempest stuff is being debugged at the moment
15:09:43 <jcsp> but it broadly works and our CI is up and running and should be green once the last couple of tests are fixed
15:09:45 <bswartz> main thing is to remind everyone about deadlines so no one is surprised next week when we start blocking new drivers
15:10:34 <mkoderer> jcsp: are the tempest tests already open for reviews?
15:10:39 <bswartz> toabctl_: ceph driver has been in gerrit for ~2 weeks now so people can review it
15:11:16 <jcsp> mkoderer: it's in the same review.  There aren't new scenario tests for it yet, but the existing tempest tests are run against the ceph driver.
15:11:30 <bswartz> yeah I haven't personally looked at the ceph driver but it seems to be meeting the deadlines with time to spare so I have a good feeling
15:12:08 <jcsp> we currently don't implement the new update_access() method, think it's optional at this stage, right?
15:12:08 <mkoderer> jcsp: ok fine
15:12:10 <jcsp> since it's not in master yet
15:12:35 <toabctl_> jcsp: I would say yes. if it's not yet in master
15:12:36 <bswartz> jcsp: thanks for that segue
15:13:02 <bswartz> everyone should be supporting the NEW access interface very soon
15:13:13 <bswartz> #link https://review.openstack.org/#/c/245126/
15:13:19 <ganso> bswartz: +1
15:13:26 <bswartz> it's okay to push patches that depend on this patch
15:13:50 <bswartz> I'm not sure why that hasn't merged yet
15:14:19 <tpsilva> it got in merge conflict, so it needed a rebase
15:14:23 <bswartz> but please don't let that fact slow down implementing the new interface
15:14:40 <ganso> ceph driver has already been rebased on top of it
15:14:46 <bswartz> excellent
15:14:52 <bswartz> okay that's enough for announcements lets get to the meeting
15:14:54 <ganso> so you can work on replacing allow_access and deny_access with update_access
15:14:56 <jcsp> (rebased on it to avoid conflicting on api versions, but not implemented method)
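For reference, a minimal sketch of what the update_access() driver interface under review might look like; the signature follows the discussion above, and the _apply_rules/_allow/_deny helpers are hypothetical placeholders, not manila code:

    def update_access(self, context, share, access_rules, add_rules,
                      delete_rules, share_server=None):
        """Update access rules for a share.

        access_rules is the full desired rule set; add_rules and
        delete_rules are the deltas since the last call.
        """
        if not (add_rules or delete_rules):
            # Recovery mode: reconcile the backend with the full list.
            self._apply_rules(share, access_rules)
            return
        for rule in add_rules:
            self._allow(share, rule)
        for rule in delete_rules:
            self._deny(share, rule)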
15:15:00 <bswartz> #agenda https://wiki.openstack.org/wiki/Manila/Meetings
15:15:07 <bswartz> #topic Data service handling protocol mount commands
15:15:19 <ganso> #link https://etherpad.openstack.org/p/manila-data-service-mount-meeting
15:15:38 <ganso> I started working on a new Data Service Driver Helper
15:15:54 <ganso> it should not require much coding effort, but it raised a lot of questions
15:16:26 <ganso> this requires additional standardization
15:16:50 <ganso> there are a few approaches suggested in the etherpad linked above
15:17:14 <ganso> there are still open questions, but each proposal follows a certain path, it defines a flexibility level
15:17:18 <ganso> other proposals are welcome
15:17:29 <bswartz> My main requirement is that clients should not need to know anything other than the export location(s) and the parameters they themselves specified when they created the share in order to mount it
15:17:54 <bswartz> it's not okay for different backends to force clients to mount their NFS shares (for example) differently
15:18:10 <bswartz> it's fine for there to be protocol-specific ways of mounting
15:18:18 <ganso> I would like that we all agree on a flexibility level, expose the problem to everyone so anyone can help with additional proposals and alternative solutions to the problem
15:18:27 <bswartz> A client that asks for a Ceph share knows he won't mount it the same way he mounts an NFS share
15:19:06 <ganso> the stricter we go, the easier for us, but not sure if it can work for everybody
15:19:10 <jcsp> is it expected that the data service will be explicitly allow_access()'d by the administrator?  Thinking about how it will e.g. know what identity to pass in when mounting ceph.
15:19:26 <ganso> jcsp: yes
15:19:33 <bswartz> also, because we have only 1 driver for each of: CephFS, GlusterFS, and HDFS, those ones will be pretty easy to handle -- although we do need to consider how the manila data copy service authenticates and gets access to those backends
15:19:46 <ganso> jcsp: allowing/denying access will be automated, the data service will need to have enough information to do it
15:19:55 <bswartz> it's NFS and CIFS which create the biggest challenges because we have multiple drivers for each
15:20:13 <ganso> jcsp: if you could please share some comments about ceph's approach, it would really help
15:20:57 <ganso> please see my observations at the end, they are very important regarding security and authentication
15:21:43 <jcsp> is the data service meant to have access to everything?
15:21:50 <bswartz> ganso: are you sure we can't be a client to 2 different CIFS domains at the same time?
15:22:04 <bswartz> jcsp: yes, at least during the time it needs access
15:22:12 <ganso> bswartz: I am not, but I am proposing that we be stricter
15:22:14 <jcsp> in the ceph case we can generate keys that give you access to every share, but I don't think there's a way to express that in manila
15:22:28 <jcsp> so the data service would need to grant itself access to each share, and get a ceph key in response (https://review.openstack.org/#/c/247540/)
15:22:30 <bswartz> jcsp: that's exactly the problem we need to fix
15:23:11 <bswartz> we need a way to get the "special admin credential" from the backend to the data copy service
15:23:21 <bswartz> and there's no interface for that yet
15:24:02 <ganso> the data service should have access to everything; it has the same rights as an admin. If drivers can create an additional export that does not require any security in the admin network (we already have the admin export location), this makes everything much easier
15:24:22 <jcsp> yes, in this case you could use the same key that we already generate for use by manila-share
15:24:49 <ganso> jcsp: by already having this key, we do not need any special or dynamic parameters in the mount command, right?
15:24:51 <bswartz> so ganso can you own solving this problem and follow up with various people offline?
15:24:58 <jcsp> though that is an overly-powerful identity because it can also create other identities, so we would probably make it part of the install procedure to ask the admin to generate one for manila-share and another less privileged one for data-service
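On the CephFS point above, a hedged sketch of how the data service might grant itself access and retrieve the cephx key (per https://review.openstack.org/#/c/247540/); the share_client object and its methods are assumptions for illustration, not an existing manila API:

    import time

    def grant_self_access(share_client, share_id, ceph_user):
        access = share_client.allow_access(share_id, access_type='cephx',
                                           access_to=ceph_user)
        # Poll until the backend reports the rule active and exposes the key.
        while access['state'] not in ('active', 'error'):
            time.sleep(1)
            access = share_client.get_access(share_id, access['id'])
        if access['state'] == 'error':
            raise RuntimeError('access rule failed for share %s' % share_id)
        # The returned access_key is what the mount helper would pass to ceph.
        return access.get('access_key')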
15:25:16 <bswartz> I don't think we're going to get a solution for all protocols right here because we don't have everyone we need
15:25:52 <bswartz> we need someone who understand HDFS and GlusterFS in addition to CephFS
15:26:08 <bswartz> and many of us can contribute ideas for NFS or CIFS
15:26:12 <vponomaryov> bswartz: HDFS looks abandoned for the moment
15:26:22 <vponomaryov> bswartz: its CI has been dead for a looong time
15:26:30 <ganso> I was looking for chen, she is supposed to be the maintainer of HDFS
15:26:41 <markstur_> it looks like a passed in template would be powerful and flexible.  Probably a good idea, but...
15:26:44 <ganso> according to an etherpad
15:26:55 <bswartz> chen isn't maintaining HDFS anymore but she mentioned the new maintainer
15:26:58 <markstur_> it would be nice if the default template usually worked
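To illustrate the template idea, a minimal sketch of what a protocol mount template for the data service could look like; the option name and helper are hypothetical, not existing manila configuration:

    from oslo_config import cfg

    data_opts = [
        cfg.StrOpt('nfs_mount_template',
                   default='mount -vt nfs %(export)s %(path)s',
                   help='Template the data service uses to mount NFS shares.'),
    ]

    CONF = cfg.CONF
    CONF.register_opts(data_opts)

    def build_mount_command(export, path):
        # Only the export location and a local path are substituted, matching
        # the requirement that nothing backend-specific is needed to mount.
        return CONF.nfs_mount_template % {'export': export, 'path': path}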
15:27:53 <bswartz> arg I can't find that email atm
15:28:32 <bswartz> ganso I'll let you know who the new HDFS maintainer is
15:28:37 <ganso> bswartz: thanks!
15:28:50 <bswartz> moving on...
15:28:56 <bswartz> #topic Separate process for copy on data service
15:29:01 <bswartz> tpsilva: you're up
15:29:09 <tpsilva> ok, so..
15:29:09 <tpsilva> data service provides a copy operation, which is currently being used by share migration
15:29:15 <tpsilva> this copy operation uses shutil.copy2 on its implementation, which causes the service to block
15:29:25 <tpsilva> while blocked, any requests coming to the data service will not be handled until the current file copy finishes
15:29:35 <tpsilva> the solution seems simple, just move the shutil.copy2 call to a different thread, unblocking the service
15:29:44 <tpsilva> problem is, threading.Thread doesn't seem to work: the service still does not respond
15:29:54 <tpsilva> multiprocessing.Process solves this problem, but now the copy operation runs on a different process
15:30:04 <tpsilva> is it ok to use a different process for that?
15:30:19 <vponomaryov> tpsilva: and you sure that you properly greenthreaded your service?
15:30:29 <tpsilva> vponomaryov: with eventlet?
15:30:33 <vponomaryov> tpsilva: yes
15:30:48 <tpsilva> yes... it is being greened like the other services
15:30:52 <tpsilva> with monkey_patch
15:30:52 <ganso> vponomaryov: at this moment, it is not a whole new service, it is just a new process
15:31:07 <bswartz> I would think eventlet would solve this, but I'm NOT a fan of eventlet so if there are other solutions I'm open to them
15:31:40 <bswartz> I like the idea of the data copy service forking child processes as long as we can cap the number of children
15:31:53 <tpsilva> it would be just one child per copy
15:32:01 <ganso> services are soft-threaded, this case requires at least 2 hard-threads for the data-service, or a data-service-worker service
15:32:06 <bswartz> same applies with threads I guess, we want parallelism, but not too much parallelism
15:32:40 <tpsilva> so, looks like using process is ok?
15:32:48 <bswartz> the admin should be able to control the max parallelism with a config param
15:32:54 <bswartz> what are the downsides of processes
15:33:14 <ganso> bswartz: we don't have a queue implementation yet
15:33:27 <tpsilva> it would be just one background process for the copy
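A rough sketch of the multiprocessing approach being discussed -- offloading the blocking shutil.copy2() call to a child process so the service keeps handling requests; this mirrors the idea above, not merged code:

    import multiprocessing
    import shutil

    def _copy_worker(src, dst):
        # Runs in the child; blocking I/O here no longer stalls the parent.
        shutil.copy2(src, dst)

    def start_copy(src, dst):
        proc = multiprocessing.Process(target=_copy_worker, args=(src, dst))
        # daemon=True would make the child die with the service; whether that
        # is desirable is exactly the restart question raised below.
        proc.start()
        return proc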
15:33:32 <bswartz> how would child processes respond to the data copy service restarting?
15:33:37 <ganso> bswartz: the point is, we are parallelizing the Copy class, which can be used by drivers, such as Huawei's
15:34:00 <tpsilva> bswartz: hmm... haven't tested that. I believe it dies if the main process dies
15:34:09 <bswartz> better find out
15:34:12 <tpsilva> ok
15:34:32 <bswartz> killing an active copy operation has major downsides
15:34:39 <bswartz> ideally the admin could choose the behaviour
15:34:52 <tpsilva> and somehow, manila-share should be notified
15:34:54 <bswartz> sometimes you want to restart the service without halting ongoing copies, sometimes you really want to kill everything
15:35:02 <tpsilva> for the migration scenario
15:35:31 <ganso> bswartz: there has been a case in our lab of the process becoming a zombie... due to an uninterruptible I/O lock caused by NFS
15:36:11 <ganso> the data service already sets all share statuses to error when restarted, if they were previously in progress
15:36:23 <bswartz> ok
15:36:29 <ganso> thing is, linux may complicate things, like having mounts stuck
15:36:40 <ganso> so the admin may have some work to do to clean things up
15:36:54 <tpsilva> but this would happen with or without a different process
15:36:55 <bswartz> the data copy service should probably mount with -o intr
15:37:13 <ganso> humm
15:37:15 <tpsilva> good to know
15:38:10 <tpsilva> I think I should submit a wip patch so other people can see it
15:38:17 <ganso> tpsilva: +1
15:38:29 <tpsilva> ok then
15:38:38 <bswartz> yeah we need to just test solutions and propose something here
15:38:45 <tpsilva> I will also test killing the service like bswartz said
15:38:45 <bswartz> if someone doesn't like it they can propose an alternative
15:38:58 <bswartz> I'm okay with either approach I think
15:39:18 <bswartz> although I dream of a future without eventlet
15:39:31 <cknight> bswartz: +1
15:39:36 <bswartz> okay next topic
15:39:43 <bswartz> #topic ZFS driver
15:40:24 <bswartz> so we agreed back in Tokyo that there needs to be a first party driver which supports replication, for testing and maintenance
15:41:01 <bswartz> DRBD turned out to be too expensive (resource-wise) to use in the gate
15:41:21 <dustins> DRBD?
15:41:22 <bswartz> so the best plan we have is a driver based on ZFS, which we're very confident will work
15:41:51 <bswartz> dustins: http://drbd.linbit.com/
15:42:09 <dustins> bswartz: Thanks!
15:42:33 <bswartz> it turns out that there's no obvious way to add ZFS support to any of the existing first party drivers, so the plan is now to simply add a 4th first party driver
15:42:51 <bswartz> vponomaryov is working on that currently
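For context, an illustrative sketch (not the driver vponomaryov is writing) of how ZFS snapshot send/receive can back share replication; the dataset names and the ssh hop are assumptions for the example:

    import subprocess

    def replicate(dataset, snapshot, dst_host, dst_dataset):
        snap = '%s@%s' % (dataset, snapshot)
        # Take a point-in-time snapshot of the share's dataset.
        subprocess.check_call(['zfs', 'snapshot', snap])
        # Stream it to the replica host; incremental sends (-i) would follow
        # the same pattern for periodic updates.
        send = subprocess.Popen(['zfs', 'send', snap], stdout=subprocess.PIPE)
        subprocess.check_call(['ssh', dst_host, 'zfs', 'recv', '-F', dst_dataset],
                              stdin=send.stdout)
        send.stdout.close()
        send.wait()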
15:44:06 <bswartz> because we don't want to wait much longer to merge the replication code (it's been in gerrit too long and had too many rebases) I'm going to propose merging the feature as soon as the ZFS driver can be demonstrated to work with the feature
15:44:44 <ganso> bswartz: +1
15:44:54 <cknight> bswartz: +1
15:45:01 <bswartz> now whether the ZFS driver will be production ready by the deadline, I'm not sure
15:46:11 <bswartz> but the whole point of the ZFS driver is to give us a way to test (and maintain) the new feature, so we need to think about what we want to do with the ZFS driver in Mitaka
15:46:21 <cknight> bswartz: It seems we don't really need 4 first-party drivers.  Which one(s) will we deprecate?  Generic?
15:46:37 <bswartz> cknight: LVM is the one with the most overlap
15:46:43 <vponomaryov> cknight: I guess LVM, as we do not really need volume-dependent thing
15:46:48 <ganso> cknight: we may have core features that could be demoed on a certain first party, including generic
15:47:09 <bswartz> ZFS is vastly superior to LVM
15:47:37 <ganso> generic as far as I know seems to be less advantageous compared to others
15:47:41 <bswartz> generic won't go away I don't think -- it solves too many use cases
15:47:50 <cknight> OK, makes sense to replace LVM with ZFS.  But LVM is so simple, it's not expensive to keep it around.  There is value in its simplicity.
15:48:04 <vponomaryov> cknight: ZFS much cheaper
15:48:05 <bswartz> personally I'm okay with 4 first party drivers
15:48:12 <jcsp> ZFS could be painful for folks testing on any distro that doesn't have the magic licensed packages, so having a fallback would be useful
15:48:26 <cknight> vponomaryov: OK, good.  Looking forward to seeing ZFS.
15:48:27 <toabctl_> jcsp: yes
15:48:32 <dustins> jcsp: +1
15:49:23 <cknight> bswartz: So the goal is to resolve our gate issues by moving all flavors of Generic to non-voting, and having ZFS and LXD as the voting drivers?
15:49:35 <bswartz> vponomaryov: I hope it turns out as simple as you say!
15:49:52 <ganso> cknight: my suggestion is that generic continue to be voting, but stripped down
15:50:09 <vponomaryov> bswartz: ZFS incredibly simple
15:50:11 <bswartz> cknight: generic MUST be nonvoting, but everything else is negotiable
15:50:17 <vponomaryov> bswartz: for usage
15:50:25 <cknight> bswartz: OK
15:50:46 <bswartz> the generic driver is incredibly valuable but just not reliable enough for gate testing (given the constrained resources of the gate)
15:50:49 <cknight> ganso: Generic can't be voting.  It has too many external dependencies that break all the time.
15:51:11 <bswartz> yes the external dependencies are also a big problem
15:51:27 <toabctl_> vponomaryov: have you also looked at btrfs which is in the mainline kernel? not sure if that supports replication.
15:51:29 <ganso> do we plan on maintaining those dependencies? or deprecating it?
15:51:46 <bswartz> ganso: the dependencies are cinder and nova -- we can't remove those
15:51:52 <vponomaryov> toabctl: no
15:52:03 <bswartz> toabctl_: I have
15:52:05 <vponomaryov> toabctl: didn't look at it
15:52:20 <ganso> bswartz: we would deprecate generic driver along with its dependencies
15:52:24 <bswartz> toabctl_: btrfs is a reasonable alternative to zfs but it's less mature
15:52:26 <cknight> ganso: I propose the Generic driver always be the responsibility of the newest core reviewer.
15:52:32 <bswartz> lol
15:52:44 <ganso> cknight: oh god
15:52:46 <markstur_> cknight, +1
15:53:06 <vponomaryov> cknight: so cruel...
15:53:16 <markstur_> and ganso nominates...
15:53:30 <toabctl_> bswartz: "less mature" ? it's in mainline and at least supported by one enterprise distro. anyway. I guess this decision is already done.
15:53:43 <bswartz> vponomaryov, toabctl_: we could conceivably make a btrfs driver alongside the ZFS driver and the code might be almost identical, but for now we just need something that works
15:53:57 <toabctl_> yes. I agree. I'm fine with zfs for now
15:54:25 <cknight> bswartz: It'd be cool to have a replication driver with pluggable filesystems underneath, but time is short for now.
15:55:13 <bswartz> toabctl_: both projects are owned by Oracle, but ZFS has been in production for about twice as long as btrfs
15:55:31 <bswartz> the reason btrfs is mainline and zfs isn't has to do with licensing
15:55:32 <toabctl_> bswartz: hm. I thought btrfs is maintained by facebook. but not sure
15:55:52 <bswartz> well it's open source, I'm sure many people contribute to both
15:56:02 <bswartz> anyways we're running out of time
15:56:13 <bswartz> #topic First party drivers share migration support cooperation
15:56:19 <bswartz> ganso: 4 minutes left... :-(
15:56:34 <ganso> this topic has been partially covered in the previous one
15:56:39 <ganso> so it is just a heads up
15:56:48 <ganso> that I've been working with some driver/CI maintainers
15:56:52 <ganso> to support migration
15:57:09 <ganso> back in Tokyo at the migration session we agreed to test migration for different drivers
15:57:26 <ganso> but so far it seems not everyone has enabled migration tests in their CI
15:57:56 <ganso> and since the migration approach is now becoming more and more generic, to support all vendors with as little effort as possible
15:58:16 <bswartz> good point
15:58:17 <ganso> I would like to take a closer look at vendor's CI
15:58:25 <ganso> I am starting from the first party drivers
15:58:31 <bswartz> we haven't pushed vendors to increase CI coverage much this cycle
15:58:37 <bswartz> it's something that needs to happen though
15:59:19 <ganso> so if maintainers create a separate job to test migration, I am here to help debug issues and help configure their CIs to work with migration
15:59:41 <bswartz> for Mitaka I think it will have to be voluntary but I do strongly encourage vendors to unskip as many tests as possible in CI, and migration is a good candidate to start testing
16:00:05 <bswartz> maybe in Newton we'll require it
16:00:11 <bswartz> so better to start checking it out now
16:00:20 <bswartz> that's it for this week everyone
16:00:24 <bswartz> thanks!
16:00:33 <bswartz> #endmeeting