15:00:14 #startmeeting manila
15:00:15 Meeting started Thu Jan 21 15:00:14 2016 UTC and is due to finish in 60 minutes. The chair is bswartz. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:16 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:18 The meeting name has been set to 'manila'
15:00:20 hello all
15:00:21 Hi
15:00:23 hello
15:00:24 \o
15:00:26 hello
15:00:29 hi
15:01:06 hmmm not many people here
15:01:09 o/
15:01:18 toabctl told me he won't be here
15:01:23 xyang isn't around
15:01:39 #topic announcements
15:01:47 Hello, everyone. This is my first time attending the manila meeting. I'm a newcomer.
15:01:52 hi
15:01:57 hi
15:01:58 tingwang: welcome!
15:01:59 hey
15:02:02 tingwang: Welcome!
15:02:04 so the M-2 milestone is today, and we have not merged everything we wanted to, not even close
15:02:05 Thank you
15:02:14 tingwang: hello, and welcome
15:02:25 bswartz: kinda hard to merge things fast when the gate doesn't work
15:02:29 hi
15:02:37 I'm happy to see you.
15:02:51 regarding M-2, there are several things in the gate I'm hoping to get in before the tag, but whether they do or not, I will tag the milestone this afternoon
15:03:15 cknight: the gate is less bad today than on previous days this week, it seems
15:03:25 but anything is possible
15:03:56 bswartz: devstack started taking about 50 minutes ((
15:04:11 the main things I wanted to call attention to are the LVM driver https://blueprints.launchpad.net/manila/+spec/lvm-driver https://review.openstack.org/#/c/232970/
15:04:58 and the share access control interface changes https://blueprints.launchpad.net/manila/+spec/new-share-access-driver-interface https://review.openstack.org/#/c/245126/
15:05:03 The LVM driver looks pretty good. It'd be great if others could also try running it.
15:05:21 I got it going pretty easily.
15:05:30 these both continue to be urgent
15:05:47 cknight: I'll make a point to give it a shot on my machines here
15:05:48 there is some discussion around the second
15:06:04 vponomaryov: are you joking about devstack?
15:06:14 bswartz: no
15:06:25 vponomaryov: how does any gate job pass?
15:06:35 bswartz: it is about failed ones
15:06:47 is it random?
15:06:54 bswartz: kind of
15:07:00 okay
15:07:21 there are 6 changes with workflow +1 currently
15:07:27 we'll see what gets through by this afternoon
15:08:04 #agenda https://wiki.openstack.org/wiki/Manila/Meetings
15:08:12 #topic Update_access backwards compatibility approach discussion
15:08:16 ganso: you're up
15:08:40 #link https://review.openstack.org/#/c/245126/
15:09:14 currently, tempest tests are testing V1 and V2 the same way
15:09:19 because the API has not changed
15:09:34 but that patch currently changes some DB fields
15:09:44 it removes the "state" field from the Access table
15:09:55 and adds "access_rules_status" to the Share Instance table
15:10:19 which causes API requests that retrieve rules to not return a value
15:10:28 this affects V1 and V2 microversioned up to 2.8
15:11:00 so if the tempest tests were done correctly, they would be failing, right?
15:11:09 so, behavior of V1 and V2-up-to-2.8 should be maintained, and not broken
15:11:21 bswartz: yes
15:11:23 they are
15:11:38 tpsilva: in that change, they currently are not
15:11:56 because they are not testing V1 appropriately
15:12:00 they are testing V2 2.9
15:12:19 right
15:12:20 so I have 2 proposals
15:12:28 proposal 1: create a property "state" in the access model that derives from the share's access_rules_status. Map "out_of_sync" to "new", "active" to "active", "error" to "error". Downside is that whenever a user adds a new access rule, all rules change to "new" and then back to "active".
15:12:34 proposal 2: do not remove fields; keep the old behavior and the new one. For each new request, an access rule is added to the DB with status "new"; if the API call succeeds, the state is changed to "active"; if it fails, it changes to "error". Behavior is closer to the original. Downside is managing when one access rule fixes the other, or when rules are added as a group, but the API's access_allow allows only one access_rule to be added at a time.
15:13:12 proposal 2 seems to have the best results for maintaining V1 behavior
15:13:40 I prefer proposal 1
15:13:58 why do we care about preserving the V1 behaviour, other than not breaking existing clients?
15:14:20 the old behaviour was confusing and not something we should continue IMO
15:15:10 the only reason to spend extra effort making the old API behaviour exactly correct would be if there were cases where clients depended on the states being expressed exactly as they used to be
15:15:21 and I seriously doubt any logic depends on the values in those fields
15:15:23 ganso: "which causes API requests that retrieve rules to not return a value"... I didn't really understand this part. We're removing a field, why would we not be able to retrieve rules?
15:15:41 as long as we provide some value which is vaguely accurate, I think clients will be fine
15:15:59 Proposal 1 is closer to what's happening on the backend. All rules are passed down to the driver to re-apply, so a brief time in the New state seems OK.
15:16:10 gouthamr: we are able to retrieve rules, but the "state" field from the Access table is not present anymore.
15:16:36 gouthamr: access state used to be stored per-rule -- we're changing it to be per-share-instance
15:16:40 Test scripts that set a rule and test its status will be broken
15:16:41 cknight: there is also the case where if one rule fails, all rules change to error at the same time
15:16:57 Production scripts?
15:17:05 ganso: that's close enough to the truth that I wouldn't expect breakage
15:17:15 markstur_: if they test individual rules they will not. If they test the whole list, or multiple items in the list, they will.
15:17:29 I would expect people to be confused, but the fix is simple -- upgrade to the new client
15:18:36 As long as that's clearly documented, I'm alright with that
15:19:18 dustins: would this affect only Mitaka docs?
15:19:40 ganso: I think this calls for a release note
15:19:41 dustins: document that "if using the V1 client, or V2 pre-2.9, the following behavior will be shown:"
15:19:48 bswartz: +1
15:19:50 ganso: +1
15:20:00 ganso: Exactly, yes
15:20:06 ganso: +1, standard deprecation.. like the others we've done so far.
15:20:18 great then
15:20:26 does everybody agree with proposal 1 or should we vote?
15:20:46 I favor proposal 1
15:20:48 any opposed?
15:20:52 I think my concern was addressed
15:21:35 ok with both
15:21:50 btw just to add some more background, the driver interface part of this change is the new interface that all the existing drivers need to implement in M-3, so it's urgent that that part of this change merges asap
15:22:02 this was discussed at the midcycle last week
15:22:36 so we can tinker with the API part of the change later if needed, but some flavor of the change needs to go in
15:22:53 that's why I'm in favor of a cheap and easy solution
15:23:24 okay I think we're agreed
15:23:50 #agreed we will go with ganso's first proposal for share access rule states
15:23:50 ok
15:24:03 #topic Exposing the Data Service to drivers
15:24:09 ganso: you're up again
15:24:49 markstur_ and I were talking yesterday about exposing some basic functionality of the Data Service to drivers, possibly through a helper
15:25:18 the manager would take care of RPC calls, handle results, and callbacks
15:25:29 my main concern is this #link https://review.openstack.org/#/c/259917/
15:25:34 I posted a comment
15:25:43 we briefly discussed this at the midcycle
15:25:44 ganso: would the manager run the copy code itself, or deal with sending RPCs to the actual data copy service?
15:25:59 the copy code would be offloaded to the data service
15:26:04 no copy would be performed in the manager
15:26:26 but there are 2 very important things here
15:26:41 1) drivers are not experimental; if a driver change is in, it is in for the Mitaka release
15:26:47 2) Migration and Data Service still are
15:27:00 so if we agree to this, we need to make Migration and the Data Service not experimental anymore
15:27:26 that ^, if a driver merges and uses the Data Service
15:27:28 ganso: I agree, but there is another option
15:28:02 the data copy service does not need to be experimental -- it has no REST API
15:28:20 bswartz: right, good point
15:28:24 the migration APIs can continue to be experimental even as we fully support the data copy service
15:28:35 right now we don't have plans to expose Data Service functions through the API, only internally to drivers through a helper
15:29:47 ganso: Thanks for pointing this out. I don't think drivers should be doing a full data copy themselves. But using the data copy service would make the share-from-snapshot asynchronous from the driver's perspective. How would state changes be handled?
15:30:03 there is precedent for managers to send RPCs, so allowing the manager to send an RPC to the data copy service doesn't seem like a bad thing
15:30:18 however we should discuss what the actual RPC for data copying looks like
15:30:36 it has to be asynchronous to some degree, and it will require DB state
15:31:24 cknight: the share status would remain in "creating", and a callback function would change it to "available". This is something we still need to design; drivers are not allowed to touch the DB, the manager needs to do it, but in this case the manager would need to know what to do. This would be a simple share model update.
15:32:13 ganso: OK. It does open possibilities for more drivers to support interesting workflows, albeit more slowly.
15:32:48 I'm referring to how the data copy service tracks its own work
15:33:13 bswartz: yes, this is a problem since we don't have the Jobs table
15:33:14 ideally the data copy service should be able to survive restarts and reboots
15:33:25 bswartz: that as well...
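
The flow ganso describes at 15:31:24 (the share stays in "creating", a callback flips it to "available", and only the manager touches the DB) could look roughly like the sketch below. This is an assumption-laden illustration: DataServiceAPISketch, copy_share_data, and data_copy_completed are placeholder names, not an agreed Manila interface.

    # Hypothetical sketch of offloading a data copy to the data service; all
    # names are placeholders rather than an agreed driver-facing helper API.

    class DataServiceAPISketch(object):
        """Stand-in for an RPC client that talks to the manila-data service."""

        def copy_share_data(self, context, source_snapshot_id,
                            dest_share_instance_id):
            # In a real deployment this would be an asynchronous RPC cast;
            # here it is a stub so the sketch is self-contained.
            print('RPC cast: copy %s -> %s'
                  % (source_snapshot_id, dest_share_instance_id))


    class ShareManagerSketch(object):
        def __init__(self, db, data_rpcapi):
            self.db = db                    # the manager's DB API object
            self.data_rpcapi = data_rpcapi  # e.g. DataServiceAPISketch()

        def create_share_from_snapshot(self, context, share_instance, snapshot):
            # The driver only provisions an empty share; the copy itself is
            # offloaded, so the share instance stays in 'creating'.
            self.data_rpcapi.copy_share_data(
                context, snapshot['id'], share_instance['id'])

        def data_copy_completed(self, context, share_instance_id, succeeded):
            # Callback from the data service; only the manager touches the DB.
            status = 'available' if succeeded else 'error'
            self.db.share_instance_update(
                context, share_instance_id, {'status': status})

The point of the callback is that the driver never writes to the DB itself; the manager records the result, which is the constraint ganso calls out above.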
15:33:44 it is not resilient or fault tolerant
15:33:51 if we don't add those features then the data copy service will end up being highly unreliable in production
15:34:27 a simple copy_jobs table would solve a lot of problems
15:34:38 bswartz: but either that, or a driver in production bottlenecking the manila-share service copying 1TB of data
15:34:45 bswartz: At a minimum, it seems we're agreed drivers shouldn't be doing full host-based data copy operations themselves, right?
15:35:06 cknight: +1
15:35:07 cknight: I'm not sure I'd go that far
15:35:21 cknight: ideally they would not, but in the short term we may need to tolerate it
15:35:43 once we have a working, reliable data copy service, then I think we should enforce that kind of rule
15:35:46 leveraging the data service would be preferred
15:36:04 but until we do, I don't want to prevent drivers from doing what needs to be done
15:36:06 bswartz: It's a scaling nightmare for any driver. Why would a driver author choose to do that?
15:36:24 I don't see why it is "highly unreliable"
15:36:25 in any case we will be no worse off than cinder, which does immense amounts of data copying on the cinder control nodes
15:37:24 Seems we're going in the right direction
15:37:36 markstur_: if all the data copy jobs simply fail when I restart the manila-data-copy service then people who use them will have to design workarounds like retry mechanisms
15:37:57 I'd rather that the retry mechanism lives inside the copy service so its users can assume reliable operation
15:38:10 bswartz, Makes sense.
15:38:58 * bswartz shakes his fist at zuul
15:39:22 zuul just restarted all the manila gate jobs
15:39:28 The Ghostbusters Zuul or the other one?
15:39:40 Oh.
15:39:49 markstur_: it's the same zuul
15:40:01 The one not causing dimensional mischief :P
15:40:15 bswartz: :-)
15:40:22 dustins: that's a matter of perspective
15:40:47 :)
15:40:59 okay so did we answer your question, ganso?
15:41:41 bswartz: so the decision is that we prefer the driver does the copy in Mitaka, and we will work on an improved data service to be released in N, right?
15:42:07 ganso: any reason we can't get the data copy service fully working in M?
15:42:29 is there not enough manpower to get it done?
15:42:47 bswartz: I am not sure we can do it in time; if the driver needs it, the driver will also need to update
15:42:58 I don't prefer that the driver does the copy, but it seems to be the only option in the short term
15:43:10 bswartz: If that's the will of the community, fine. I think the patch in question does Huawei customers, and by extension Manila, a disservice by introducing a huge performance bottleneck in what should be a quick workflow.
15:43:19 bswartz: It is better than rushing it
15:43:51 bswartz: *could be better
15:44:06 cknight: I think that can be worked around by documenting that, for the huawei driver at least, m-shr needs to be on a node with data-copying horsepower
15:44:20 it's really a deployment question
15:44:47 right now we don't have the flexibility to split the m-shr and m-dat functions onto different nodes, but in the future we will
15:44:55 end users don't understand that; cloud providers may even choose to disable snapshot functionality on a Huawei backend
15:45:57 ganso: +1 I would.
15:46:03 I actually hope that we can learn something about data copying by allowing some people to try it and let us know how it goes at production scale
15:46:29 bswartz: The migration facility should accomplish that.
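
bswartz's point about surviving restarts suggests persisting each copy as a job record. A minimal sketch of what a copy_jobs table might look like, assuming SQLAlchemy declarative models; every column name here is made up for illustration, since no such table exists yet:

    # Hypothetical copy_jobs table that would let the data copy service resume
    # or retry work after a restart. Column names are illustrative only.

    from sqlalchemy import BigInteger, Column, DateTime, String
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()


    class CopyJob(Base):
        """One row per outstanding host-based data copy."""

        __tablename__ = 'copy_jobs'

        id = Column(String(36), primary_key=True)
        share_instance_id = Column(String(36))  # instance waiting on the copy
        source_path = Column(String(1024))      # where the data is read from
        dest_path = Column(String(1024))        # where the data is written to
        status = Column(String(255))            # e.g. queued/copying/error/done
        bytes_copied = Column(BigInteger, default=0)  # progress, for resume
        updated_at = Column(DateTime)           # last heartbeat from the worker

On restart, the data copy service could scan for rows still marked as copying and requeue them, instead of leaving shares stuck in "creating" and forcing callers to build their own retry logic.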
15:46:56 yes, but I'm hearing we might not have that production-ready in time for Mitaka
15:47:15 bswartz: we can try
15:47:35 maybe create an alternative Huawei patch and by the end of M-3 we merge one or the other
15:47:44 Mitaka will hopefully give users an improved migration experience, but maybe not a fully scalable data copying engine
15:48:26 if nobody uses the Data Service and Migration is still experimental in Mitaka, then nothing is affected
15:49:19 okay I wanted to move on to the next topic
15:49:43 #topic Snapshot manage/unmanage
15:50:03 I was hoping xyang would suggest this as a midcycle topic so we could have discussed it last week
15:50:24 since she isn't here, I just wanted to point it out
15:50:44 https://review.openstack.org/#/c/249542
15:51:15 this is a feature we discussed back in Liberty
15:52:09 it's an extension to the manage/unmanage share feature to allow handling of snapshots as well
15:52:29 I hope we can make this feature part of Mitaka too
15:52:50 since xyang isn't here I guess we don't need to say anything else unless there are questions about it
15:52:51 In principle, this seems like a useful feature, although it had its detractors on the Cinder side. I'd like to understand how this will work with snapshots of replicated shares.
15:53:19 cknight: well, you can't manage a replicated share, can you?
15:53:37 perhaps you can unmanage one
15:53:49 but only if there are no snapshots
15:54:26 bswartz: I'm merely pointing out that merging this feature now adds more technical debt with feature interoperability.
15:54:27 bswartz: you could, but the replication isn't initiated by manila though.. so it wouldn't know about any existing replicas..
15:55:35 gouthamr: I'm referring to managing a share and using a share_type with replication=True
15:56:18 for that to succeed, the driver would need a TON of code to find and validate the existing replicas
15:56:46 bswartz: Rather than merging yet another feature, I'd prefer to see us make the ones we already have work together, including share manage/unmanage.
15:56:50 fortunately the driver is free to simply fail in that case
15:57:01 cknight: +1
15:58:03 cknight: +1
15:58:07 cknight: I don't see how this makes things any worse -- manage/unmanage is an optional feature
15:58:49 drivers that choose to implement it can do so and add value in that use case
15:58:53 As a side note, is there a place where I can see the required and optional features for a driver?
15:59:03 dustins: yes!
15:59:09 dustins: Yes, there's a doc in devref for that.
15:59:24 okay we're out of time
15:59:29 http://docs.openstack.org/developer/manila/devref/driver_requirements.html
15:59:31 no time for open discussion even
15:59:35 vponomaryov: Thanks!
15:59:55 continue the snapshot manage/unmanage discussion in the gerrit review
16:00:03 thanks all
16:00:05 bswartz: ok
16:00:15 #endmeeting