15:01:32 #startmeeting manila
15:01:33 Meeting started Thu Feb 4 15:01:32 2016 UTC and is due to finish in 60 minutes. The chair is bswartz. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:34 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:36 The meeting name has been set to 'manila'
15:01:39 hey
15:01:40 Hi
15:01:40 o/ hey!
15:01:42 hello
15:01:43 hi
15:01:43 hello all
15:01:45 hello
15:01:49 hi
15:01:51 hi
15:02:09 hi
15:02:28 #topic announcement
15:02:31 hi
15:02:41 first thing:
15:02:49 welcome ganso to manila core reviewer team!
15:02:55 welcome!
15:03:00 ganso: welcome!
15:03:03 +1
15:03:20 congrats
15:03:21 bswartz: thanks! I am very happy to be part of this community and continue contributing to the Manila project! :)
15:03:24 congrats
15:03:30 congrats ganso!
15:03:42 ganso: congrats! ))
15:03:49 :)
15:03:50 I'm very excited to have ganso's help, and look forward to extra reviewer bandwidth in the coming weeks as we approach Mitaka Feature Freeze
15:03:59 ganso: congrats!
15:04:12 looks like I missed the email
15:04:39 xyang1: this is the announcement -- I'll send an email after the meeting
15:04:58 bswartz: ok
15:05:02 0/
15:05:13 so next thing: deadlines
15:05:54 as we agreed to at the start of the release, there is a *Driver* proposal freeze next week, Feb 11
15:06:08 that's Feature Freeze minus 3 weeks
15:06:33 New drivers are supposed to be in gerrit with +1 from Jenkins, and substantially complete by that date
15:06:58 also major new driver features should meet the same deadline
15:07:20 the feature proposal freeze for ALL features is a week later, Feb 18
15:07:58 basically we have 4 weeks from today until feature freeze and the deadlines are supposed to help us have enough time to review and merge the large backlog of changes
15:07:59 bswartz: so what drivers do we have in the pipeline beside ceph (sorry, I'm a bit outdated currently)
15:08:33 toabctl: ZFS (on linux)
15:08:33 The one I know about is the ceph driver
15:08:47 there is another driver for ZFS which I'll discuss in a bit
15:09:15 is it realistic to get ceph in? is somebody here from the ceph driver guys?
15:09:18 hi
15:09:23 ceph is hoping to meet deadline
15:09:28 the tempest stuff is being debugged at the moment
15:09:43 but it broadly works and our CI is up and running and should be green once the last couple of tests are fixed
15:09:45 main thing is to remind everyone about deadlines so no one is surprised next week when we start blocking new drivers
15:10:34 jcsp: are the tempest tests already open for reviews?
15:10:39 toabctl_: ceph driver has been in gerrit for ~2 weeks now so people can review it
15:11:16 mkoderer: it's in the same review. There aren't new scenario tests for it yet, but the existing tempest tests are run against the ceph driver.
15:11:30 yeah I haven't personally looked at the ceph driver but it seems to be meeting the deadlines with time to spare so I have a good feeling
15:12:08 we currently don't implement the new update_access() method, think it's optional at this stage, right?
15:12:08 jcsp: ok fine
15:12:10 since it's not in master yet
15:12:35 jcsp: I would say yes, if it's not yet in master
15:12:36 jcsp: thanks for that segue
15:13:02 everyone should be supporting the NEW access interface very soon
15:13:13 #link https://review.openstack.org/#/c/245126/
15:13:19 bswartz: +1
15:13:26 it's okay to push patches that depend on this patch
15:13:50 I'm not sure why that hasn't merged yet
15:14:19 it got in merge conflict, so it needed a rebase
15:14:23 but please don't let that fact slow down implementing the new interface
15:14:40 ceph driver has already been rebased on top of it
15:14:46 excellent
15:14:52 okay that's enough for announcements, let's get to the meeting
15:14:54 so you can work to replace access_allow and access_deny with update_access
15:14:56 (rebased on it to avoid conflicting on api versions, but not implemented method)
15:15:00 #agenda https://wiki.openstack.org/wiki/Manila/Meetings
15:15:07 #topic Data service handling protocol mount commands
15:15:19 #link https://etherpad.openstack.org/p/manila-data-service-mount-meeting
15:15:38 I started working on a new Data Service Driver Helper
15:15:54 it should not require much coding effort, but it raised a lot of questions
15:16:26 this requires additional standardization
15:16:50 there are a few approaches suggested in the etherpad linked above
15:17:14 there are still open questions, but each proposal follows a certain path, it defines a flexibility level
15:17:18 other proposals are welcome
15:17:29 My main requirement is that clients should not need to know anything other than the export location(s) and the parameters they themselves specified when they created the share in order to mount it
15:17:54 it's not okay for different backends to force clients to mount their NFS shares (for example) differently
15:18:10 it's fine for there to be protocol-specific ways of mounting
15:18:18 I would like us all to agree on a flexibility level and expose the problem to everyone, so anyone can help with additional proposals and alternative solutions to the problem
15:18:27 A client that asks for a Ceph share knows he won't mount it the same way he mounts an NFS share
15:19:06 the stricter we go, we easier for us, but not sure if it can work for everybody
15:19:10 is it expected that the data service will be explicitly allow_access()'d by the administrator? Thinking about how it will e.g. know what identity to pass in when mounting ceph.
15:19:10 s/we easier/the easier
15:19:26 jcsp: yes
15:19:33 also, because we have only 1 driver for each of: CephFS, GlusterFS, and HDFS, those ones will be pretty easy to handle -- although we do need to consider how the manila data copy service authenticates and gets access to those backends
15:19:46 jcsp: allowing/denying access will be automated, the data service will need to have enough information to do it
15:19:55 it's NFS and CIFS which create the biggest challenges because we have multiple drivers for each
15:20:13 jcsp: if you could please share some comments about ceph's approach, it would really help
15:20:57 please see my observations at the end, they are very important regarding security and authentication
15:21:43 is the data service meant to have access to everything?
15:21:50 ganso: are you sure we can't be a client to 2 different CIFS domains at the same time?
15:22:04 jcsp: yes, at least during the time it needs access
15:22:12 bswartz: I am not, but I am proposing that we be stricter
15:22:14 in the ceph case we can generate keys that give you access to every share, but I don't think there's a way to express that in manila
15:22:28 so the data service would need to grant itself access to each share, and get a ceph key in response (https://review.openstack.org/#/c/247540/)
15:22:30 jcsp: that's exactly the problem we need to fix
15:23:11 we need a way to get the "special admin credential" from the backend to the data copy service
15:23:21 and there's no interface for that yet
15:24:02 the data service should have access to everything, it has the same rights as an admin; if drivers can create an additional export that does not require any security in the admin network (we already have admin export location), this makes everything much easier
15:24:22 yes, in this case you could use the same key that we already generate for use by manila-share
15:24:49 jcsp: by already having this key, we do not need any special or dynamic parameters in the mount command, right?
15:24:51 so ganso can you own solving this problem and follow up with various people offline?
15:24:58 though that is an overly-powerful identity because it can also create other identities, so we would probably make it part of the install procedure to ask the admin to generate one for manila-share and another less privileged one for data-service
15:25:16 I don't think we're going to get a solution for all protocols right here because we don't have everyone we need
15:25:52 we need someone who understands HDFS and GlusterFS in addition to CephFS
15:26:08 and many of us can contribute ideas for NFS or CIFS
15:26:12 bswartz: HDFS looks abandoned for the moment
15:26:22 bswartz: its CI dead for looong time
15:26:30 I was looking for chen, she is supposed to be the maintainer of HDFS
15:26:41 it looks like a passed in template would be powerful and flexible. Probably a good idea, but...
15:26:44 according to an etherpad
15:26:55 chen isn't maintaining HDFS anymore but she mentioned the new maintainer
15:26:58 it would be nice if the default template usually worked
15:27:53 arg I can't find that email atm
15:28:32 ganso I'll let you know who the new HDFS maintainer is
15:28:37 bswartz: thanks!
15:28:50 moving on...
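[Editor's aside: one way to picture the "flexibility level" and "passed in template" ideas discussed above is a per-protocol mount template that the data service fills in with nothing but the export location and a local mount point. The sketch below only illustrates that option and is not an interface agreed in this meeting; the template strings, the dictionary, and the function name are invented for the example.]

    # Illustrative sketch only -- not the agreed Manila data service interface.
    # A per-protocol template keeps backend-specific details out of the mount
    # step: callers supply just the export location and a local mount point.
    DEFAULT_MOUNT_TEMPLATES = {
        'NFS': 'mount -vt nfs %(export)s %(path)s',
        'CIFS': 'mount -vt cifs %(export)s %(path)s',
        'CEPHFS': 'mount -vt ceph %(export)s %(path)s',
    }

    def build_mount_command(share_proto, export_location, mount_path,
                            templates=None):
        templates = templates or DEFAULT_MOUNT_TEMPLATES
        # A KeyError here is exactly the "how strict do we want to be"
        # question from the discussion: protocols without a template would
        # need driver-specific handling.
        return templates[share_proto.upper()] % {'export': export_location,
                                                 'path': mount_path}

    # build_mount_command('NFS', '10.0.0.5:/share_1', '/mnt/share_1')
    # -> 'mount -vt nfs 10.0.0.5:/share_1 /mnt/share_1'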
15:28:56 #topic Separate process for copy on data service
15:29:01 tpsilva: you're up
15:29:09 ok, so..
15:29:09 data service provides a copy operation, which is currently being used by share migration
15:29:15 this copy operation uses shutil.copy2 in its implementation, which causes the service to block
15:29:25 while blocked, any requests coming to the data service will not be handled until the current file copy finishes
15:29:35 the solution seems simple, just move the shutil.copy2 call to a different thread, unblocking the service
15:29:44 problem is, threading.Thread doesn't seem to work: the service still does not respond
15:29:54 multiprocessing.Process solves this problem, but now the copy operation runs on a different process
15:30:04 is it ok to use a different process for that?
15:30:19 tpsilva: and are you sure that you properly greenthreaded your service?
15:30:29 vponomaryov: with eventlet?
15:30:33 tpsilva: yes
15:30:48 yes... it is being greened like the other services
15:30:52 with monkey_patch
15:30:52 vponomaryov: at this moment, it is not a whole new service, it is just a new process
15:31:07 I would think eventlet would solve this, but I'm NOT a fan of eventlet so if there are other solutions I'm open to them
15:31:40 I like the idea of the data copy service forking child processes as long as we can cap the number of children
15:31:53 it would be just one child per copy
15:32:01 services are soft-threaded, this case requires at least 2 hard-threads for the data-service, or a data-service-worker service
15:32:06 same applies with threads I guess, we want parallelism, but not too much parallelism
15:32:40 so, looks like using process is ok?
15:32:48 the admin should be able to control the max parallelism with a config param
15:32:54 what are the downsides of processes
15:33:14 bswartz: we don't have a queue implementation yet
15:33:27 it would be just one background process for the copy
15:33:32 how would child processes respond to the data copy service restarting?
15:33:37 bswartz: the point is, we are parallelizing the Copy class, which can be used by drivers, such as Huawei's
15:34:00 bswartz: hmm... haven't tested that. I believe it dies if the main processes dies
15:34:09 better find out
15:34:10 s/processes/process
15:34:12 ok
15:34:32 killing an active copy operation has major downsides
15:34:39 ideally the admin could choose the behaviour
15:34:52 and somehow, manila-share should be notified
15:34:54 sometimes you want to restart the service without halting ongoing copies, sometimes you really want to kill everything
15:35:02 for the migration scenario
15:35:31 bswartz: there has been a case in our lab or the process becoming a zombie... due to an uninterruptible I/O lock caused by NFS
15:35:44 s/or the process/of the process
15:36:11 the data service already sets all share statuses to error when restarted, if they were previously in progress
15:36:23 ok
15:36:29 thing is, linux may complicate things, like having mounts stuck
15:36:40 so the admin may have some work to do to clean things up
15:36:54 but this would happen with or without a different process
15:36:55 the data copy service should probably mount with -o intr
15:37:13 humm
15:37:15 good to know
15:38:10 I think I should submit a wip patch so other people can see it
15:38:17 tpsilva: +1
15:38:29 ok then
15:38:38 yeah we need to just test solutions and propose something here
15:38:45 I will also test killing the service like bswartz said
15:38:45 if someone doesn't like it they can propose an alternative
15:38:58 I'm okay with either approach I think
15:39:18 although I dream of a future without eventlet
15:39:31 bswartz: +1
15:39:36 okay next topic
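[Editor's aside: a minimal sketch of the approach discussed in the "Separate process for copy on data service" topic above, offloading the blocking shutil.copy2() call to a child process with a cap on concurrent copies. This is not the actual data service code; the class name and the max_workers knob are invented, and what happens to children when the parent service restarts is exactly the open question raised above.]

    # Minimal sketch, not the real manila data service implementation.
    import multiprocessing
    import shutil

    def _copy_worker(src, dst):
        # Runs in the child process, so only that process blocks on the I/O.
        shutil.copy2(src, dst)

    class BackgroundCopier(object):
        """Run copies in child processes, limiting how many run at once."""

        def __init__(self, max_workers=2):
            # max_workers stands in for the config param mentioned above.
            self.max_workers = max_workers
            self._children = []

        def start_copy(self, src, dst):
            # Forget finished children before enforcing the cap.
            self._children = [p for p in self._children if p.is_alive()]
            if len(self._children) >= self.max_workers:
                raise RuntimeError('too many copies already in progress')
            proc = multiprocessing.Process(target=_copy_worker,
                                           args=(src, dst))
            # daemon=True kills children when the parent exits, which is one
            # possible (destructive) answer to the restart question.
            proc.daemon = True
            proc.start()
            self._children.append(proc)
            return proc

    # e.g. BackgroundCopier(max_workers=2).start_copy('/srv/src/f', '/srv/dst/f')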
15:39:43 #topic ZFS driver
15:40:24 so we agreed back in Tokyo that there needs to be a first party driver which supports replication, for testing and maintenance
15:41:01 DRBD turned out to be too expensive (resource-wise) to use in the gate
15:41:21 DRBD?
15:41:22 so the best plan we have is a driver based on ZFS, which we're very confident will work
15:41:51 dustins: http://drbd.linbit.com/
15:42:09 bswartz: Thanks!
15:42:33 it turns out that there's no obvious way to add ZFS support to any of the existing first party drivers, so the plan is now to simply add a 4th first party driver
15:42:51 vponomaryov is working on that currently
15:44:06 because we don't want to wait much longer to merge the replication code (it's been in gerrit too long and had too many rebases) I'm going to propose merging the feature as soon as the ZFS driver can be demonstrated to work with the feature
15:44:44 bswartz: +1
15:44:54 bswartz: +1
15:45:01 now whether the ZFS driver will be production ready by the deadline, I'm not sure
15:46:11 but the whole point of the ZFS driver is to give us a way to test (and maintain) the new feature, so we need to think about what we want to do with the ZFS driver in Mitaka
15:46:21 bswartz: It seems we don't really need 4 first-party drivers. Which one(s) will we deprecate? Generic?
15:46:37 cknight: LVM is the one with the most overlap
15:46:43 cknight: I guess LVM, as we do not really need volume-dependent thing
15:46:48 cknight: we may have core features that could be demoed on a certain first party, including generic
15:47:09 ZFS is vastly superior to LVM
15:47:37 generic as far as I know seems to be less advantageous compared to others
15:47:41 generic won't go away I don't think -- it solves too many use cases
15:47:50 OK, makes sense to replace LVM with ZFS. But LVM is so simple, it's not expensive to keep it around. There is value in its simplicity.
15:48:04 cknight: ZFS much cheaper
15:48:05 personally I'm okay with 4 first party drivers
15:48:12 ZFS could be painful for folks testing on any distro that doesn't have the magic licensed packages, so having a fallback would be useful
15:48:26 vponomaryov: OK, good. Looking forward to seeing ZFS.
15:48:27 jcsp: yes
15:48:32 jcsp: +1
15:49:23 bswartz: So the goal is to resolve our gate issues by moving all flavors of Generic to non-voting, and having ZFS and LXD as the voting drivers?
15:49:35 vponomaryov: I hope it turns out as simple as you say!
15:49:52 cknight: my suggestion is that generic continue to be voting, but stripped down
15:50:09 bswartz: ZFS incredibly simple
15:50:11 cknight: generic MUST be nonvoting, but everything else is negotiable
15:50:17 bswartz: for usage
15:50:25 bswartz: OK
15:50:46 the generic driver is incredibly valuable but just not reliable enough for gate testing (given the constrained resources of the gate)
15:50:49 ganso: Generic can't be voting. It has too many external dependencies that break all the time.
15:51:11 yes the external dependencies are also a big problem
15:51:27 vponomaryov: have you also looked at btrfs which is in the mainline kernel? not sure if that supports replication.
15:51:29 do we plan on maintaining those dependencies? or deprecating it?
15:51:46 ganso: the dependencies are cinder and nova -- we can't remove those
15:51:52 toabctl: no
15:52:03 toabctl_: I have
15:52:05 toabctl: didn't look at it
15:52:20 bswartz: we would deprecate generic driver along with its dependencies
15:52:24 toabctl_: btrfs is a reasonable alternative to zfs but it's less mature
15:52:26 ganso: I propose the Generic driver always be the responsibility of the newest core reviewer.
15:52:32 lol
15:52:44 cknight: oh god
15:52:46 cknight, +1
15:53:06 cknight: so cruel...
15:53:16 and ganzo nominates...
15:53:30 bswartz: "less mature"? it's in mainline and at least supported by one enterprise distro. anyway. I guess this decision is already done.
s/ganzo/ganso/
15:53:43 vponomaryov, toabctl_: we could conceivably make a btrfs driver alongside the ZFS driver and the code might be almost identical, but for now we just need something that works
15:53:57 yes. I agree. I'm fine with zfs for now
15:54:25 bswartz: It'd be cool to have a replication driver with pluggable filesystems underneath, but time is short for now.
15:55:13 toabctl_: both projects are owned by Oracle, but ZFS has been in production for about twice as long as btrfs
15:55:31 the reason btrfs is mainline and zfs isn't has to do with licensing
15:55:32 bswartz: hm. I thought btrfs is maintained by facebook. but not sure
15:55:52 well it's open source, I'm sure many people contribute to both
15:56:02 anyways we're running out of time
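[Editor's aside: a rough illustration of why ZFS is described above as "incredibly simple" for usage as the basis of a first-party driver: creating and NFS-exporting a share is a couple of zfs commands, and replication can build on zfs snapshot/send/receive. The pool and dataset names, the function, and the quota/sharenfs choices below are made up for the example; this is not the driver being written.]

    # Rough sketch only -- not the ZFS driver under development.
    import subprocess

    def create_nfs_share(pool, share_id, size_gb):
        dataset = '%s/%s' % (pool, share_id)
        # Create a quota-limited dataset for the share.
        subprocess.check_call(
            ['zfs', 'create', '-o', 'quota=%dG' % size_gb, dataset])
        # Let ZFS manage the NFS export itself.
        subprocess.check_call(['zfs', 'set', 'sharenfs=on', dataset])
        # The dataset mountpoint is what would back the export location.
        return subprocess.check_output(
            ['zfs', 'get', '-H', '-o', 'value', 'mountpoint', dataset]).strip()

    # Replication would build on the same toolbox, e.g. 'zfs snapshot' on the
    # source and 'zfs send | zfs receive' toward the replica.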
15:56:13 #topic First party drivers share migration support cooperation
15:56:19 ganso: 4 minutes left... :-(
15:56:34 this topic has been partially covered in the previous one
15:56:39 so it is just a heads up
15:56:48 that I've been working with some driver/CI maintainers
15:56:52 to support migration
15:57:09 back in Tokyo at the migration session we agreed to test migration for different drivers
15:57:26 but so far it seems not everyone has enabled migration tests in their CI
15:57:56 and since the migration approach is now becoming more and more generic, to support all vendors with as little effort as possible
15:58:16 good point
15:58:17 I would like to take a closer look at vendors' CIs
15:58:25 I am starting from the first party drivers
15:58:31 we haven't pushed vendors to increase CI coverage much this cycle
15:58:37 it's something that needs to happen though
15:59:19 so I maintainers create a separate job to test migration, I am here to help debug issues and help configure their CIs to work with migration
15:59:26 s/so I/so if
15:59:41 for Mitaka I think it will have to be voluntary but I do strongly encourage vendors to unskip as many tests as possible in CI, and migration is a good candidate to start testing
16:00:05 maybe in Newton we'll require it
16:00:11 so better to start checking it out now
16:00:20 that's it for this week everyone
16:00:24 thanks!
16:00:33 #endmeeting