15:00:51 <bswartz> #startmeeting manila
15:00:55 <openstack> Meeting started Thu Mar 30 15:00:51 2017 UTC and is due to finish in 60 minutes. The chair is bswartz. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:57 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:00 <openstack> The meeting name has been set to 'manila'
15:01:01 <bswartz> hello all
15:01:03 <xyang2> hi
15:01:04 <dustins> \o
15:01:05 <ganso> hello
15:01:06 <zhongjun_> hi
15:01:07 <vponomaryov> hello
15:01:08 <markstur> hello
15:01:08 <jprovazn> hi
15:01:15 <gouthamr> hi
15:01:20 <gouthamr> o/
15:01:41 <bswartz> #agenda https://wiki.openstack.org/wiki/Manila/Meetings
15:02:13 <bswartz> I don't have any announcements today but I wanted to remind you all that Pike-1 is 2 weeks away
15:02:36 <bswartz> I'm reviewing specs today
15:02:43 <bswartz> I know several of them are ready and just waiting
15:02:52 <zhongjun_> good news
15:03:05 <bswartz> #topic ensure share
15:03:11 <bswartz> #link https://review.openstack.org/#/c/446494
15:03:14 <tommylikehu_> hi
15:03:21 <bswartz> zhongjun_: you're up
15:03:32 <zhongjun_> bswartz: thanks, According to your advice, I add two methods to solve ensure share problem in spec.
15:03:35 <toabctl> hi
15:03:40 <vkmc> o/
15:04:05 <zhongjun_> bswartz: I am not sure which one is better for manila?
15:04:18 <jungleboyj> o/
15:04:57 <bswartz> zhongjun_: based on our discussion earlier, I don't think we should be prioritizing things such as avoiding restarting m-shr with this feature
15:05:17 <bswartz> avoiding m-shr restarts _is_ a valuable goal, but it's unrelated to ensure_share IMO
15:05:38 <ganso> to add some context: one method is automatic detection/flag of shares that need to run ensure, and the other is using CLI to do so when admin wants
15:06:00 <bswartz> ensure_share is about being as smart as we can about handling issues that might occur when we do restart the service
15:06:36 <bswartz> in particular dealing with the effects of software upgrades
15:06:56 <bswartz> either the manila software itself or the backend software/configuration
15:07:49 <zhongjun_> ok, We could open another patch to solve the things that we don't need to restart
15:08:06 <bswartz> if there are enhancements we can make that allow us to "fix" some things without restarting which currently do require a restart, then I'd like to see that proposed separately
15:08:34 <bswartz> I don't know if others have different opinions -- that's how I think about it
15:10:09 <bswartz> zhongjun_: based on my reading of the current version of the spec, I think we might need to split some content out of this spec into a separate one
15:10:47 <markstur> the biggest problem with the current ensure_share is that it was misunderstood and new driver developers were unclear about what they should do with it
15:10:57 <bswartz> yeah
15:11:22 <zhongjun_> bswartz: but forget about without restarting, we still have two methods. I still need to consider all problems.
15:11:25 <bswartz> I want to stick to the model where it's called once on driver startup for some subset of the shares on that backend (possibly all, possibly none)
15:11:30 <markstur> the on_restart() concept seems clear, but we'll need to make sure it looks clear to a newbie in the end
15:12:00 <bswartz> the problem today is that it's always all shares, and drivers do whatever they want there
15:12:02 <markstur> what subset?
15:12:10 <bswartz> markstur: that's what we need to define
15:12:18 <ganso> bswartz: for that the share needs to be flagged as needing to be "ensured"
15:12:31 <markstur> ahh
15:12:34 <bswartz> ganso: I think we can do better than that
15:12:42 <bswartz> who does the flagging?
15:13:03 <ganso> bswartz: zhongjun_ has two methods to perform that flagging: on demand CLI and automatic
15:13:43 <bswartz> I'd like to focus on the automatic
15:14:03 <bswartz> on-demand maintenance is fine, but it's objectively worse than automatic
15:14:29 <bswartz> in a cloud everything should be automatic
15:14:44 <zhongjun_> ganso: I think maybe ben don't want a new CLI, right?
15:14:48 <markstur> if manila already knows which shares "need attention" to be sent to a driver, then why wait for restart
15:14:58 <ganso> zhongjun_: it is not preferable
15:15:20 <vponomaryov> markstur is right
15:15:29 <markstur> I think a CLI makes sense for when it is good for an admin to decide when to take the action
15:15:36 <vponomaryov> get it know automatically is the problem
15:15:37 <bswartz> markstur: the point is that determining which ones need attention might itself be an expensive operation
15:15:48 <markstur> but I like the push to try to sort out other stuff first and leave the CLI as a separate tool
15:15:54 <bswartz> so we only do that work on startup
15:16:38 <bswartz> or more likely, it's something that's not too expensive, but very unlikely to change except at times when there are restarts
15:16:47 <bswartz> again I'm thinking in particular about software upgrades
15:16:53 <markstur> bswartz: makes sense. I was thinking from a driver startup perspective and needed to think more of a manila restart process too
15:16:59 <bswartz> you can't upgrade your manila software without restarting m-shr
15:17:08 <tommylikehu_> maybe we could introduce the CLI first, and mark it deprecated when we already have the tested automatically features
15:17:10 <gouthamr> looks like we need both: sometimes, we need to set this "needs-attention" flag during a database migration, and we'd need to let the administrator set it
15:17:35 <bswartz> and in most cases, upgrading the software on your backend will involve downtime and/or a restart of m-shr so the driver can recognize the software upgrade
15:17:35 <zhongjun_> bswartz: If we tag share, the admin still seems so busy?
15:18:02 <markstur> I've seen where just a driver change (and the version in stats) can easily do some clever things on a driver upgrade with the current ensure
15:18:26 <markstur> The same thing on a core manila restart also has the possibility of dealing with upgrade stuff
15:18:32 <ganso> gouthamr: we need to be careful to not run in a case where the problem that needs attention needs admin intervention (like direct update to DB entry) to be fixed, on such cases tagging the share may not be useful at all
15:19:36 <gouthamr> ganso: true...
15:19:57 <bswartz> let me spell out the kind of use case I worry about
15:20:45 <bswartz> suppose a vendor improves the software on their storage controller to address a limitation with manila -- they add some feature which was missing
15:21:02 <bswartz> next they update their manila driver to take advantage of the new feature
15:21:28 <bswartz> however the driver changes require storing some additional metadata in manila, or changing existing fields in some way to remain compatible
15:22:01 <bswartz> now when an actual user who has the old version of manila and that driver and that vendor's storage controller wants to upgrade, something needs to happen
15:22:52 <bswartz> they need to decide whether to upgrade manila first or the storage controller first, and once they do, they need to perform those upgrades, and plan one or more restarts of the manila services
15:23:19 <bswartz> at some point something in the driver needs to flip the switch to start allowing the new feature which is now enabled by the new software
15:23:50 <bswartz> ensure_share seems like the ideal way to update any per-share information that needs updating in this case
15:24:31 <bswartz> the catch is, we don't want to be ensuring all the shares every time we restart, because most of the time the software isn't being upgraded
15:24:43 <bswartz> so how do we know when to do it?
15:25:08 <vponomaryov> bswartz: store driver version with share DB model?
15:25:17 <bswartz> manila needs some way to ask the driver, and the driver needs some way to persist enough state to tell the difference between a no-upgrade restart and an upgrade-restart
15:25:23 <vponomaryov> bswartz: and when version changed we run this method?
15:25:39 <bswartz> vponomaryov: yes that's the kind of thing I'm thinking about
15:25:58 <bswartz> however a plain "driver version" might not be flexible enough because versioning itself is a complex topic
15:26:21 <vponomaryov> bswartz: then we define versioning rules first
15:26:43 <bswartz> I was thinking something as simple as a unique hash which doesn't change unless the software is modified
15:26:44 <zhongjun_> We could store driver infos (include version) in method 1, It is up to driver
15:26:53 <ganso> bswartz, vponomaryov: that looks similar to what zhongjun_ proposes in her spec
15:26:56 <vponomaryov> bswartz: where we change major part only when we need to run this ensure share method
15:27:51 <zhongjun_> driver could tell manila whether the driver need to update, It is also up to driver
15:27:57 <bswartz> ganso: as long as it's automatic and there's no API/CLI invoking required
15:28:29 <vponomaryov> zhongjun_: it is already possible
15:28:34 <bswartz> so that would address the question of: should we ensure all or none of the shares
15:28:42 <vponomaryov> zhongjun_: driver can either skip "ensure_share" execution or not
15:28:55 <bswartz> the other question is, is there a case to ensure some subset of the shares?
(like cinder does with volumes)
15:30:11 <ganso> bswartz: we are most likely to run ensure_share for all shares of a certain backend stanza, even if only a few are needed, that would be okay I guess
15:30:57 <zhongjun_> vponomaryov: yes
15:31:56 <bswartz> ganso: I can't think of a case for a subset, but we have to consider the scalability implications then
15:32:28 <bswartz> if it's all-or-nothing for ensure share, then the cases where we do all of them might cause *very* slow startups on backends with large numbers of shares
15:33:34 <ganso> bswartz: a subset could be only the shares belonging to an array that was upgraded... a subset within that subset would be only the ones affected by the upgrade, but it seems too complex to try to detect this smaller subset effectively, it may be too intensive to do so
15:34:35 <ganso> bswartz: s/intensive/expensive
15:34:38 <markstur> now you guys are making it sound like these upgrades should be controlled by an admin CLI to do them in reasonable chunks
15:34:40 <bswartz> yeah
15:34:47 <zhongjun_> bswartz: We could do it in another patch, if we really need to do it.
15:35:03 <bswartz> markstur: admins already do control upgrades
15:35:17 <bswartz> I'd rather avoid making upgrades any harder than they already are
15:35:45 <markstur> I mean if this feature-enhancing on-startup will tweak 1000 shares and take a day, then maybe the admin would prefer it doesn't
15:36:03 <markstur> and control the share upgrade in smaller chunks
15:36:47 <markstur> just sayin' the concerns mentioned above made me think this ^
15:37:53 <vponomaryov> markstur: updating whole backend I would expect some time consumption
15:38:06 <markstur> but hopefully most upgrades are a fast loop thru a list with little tweak of some setting
15:38:24 <bswartz> unfortunately some "upgrades" are mandatory and whether you do them in small chunks or all at once, you can't bring up your services until they're all done
15:38:31 <markstur> vponomaryov: so...
15:38:44 <vponomaryov> markstur: so, small chunk does not really solve anything
15:39:02 <bswartz> if there's an "upgrade" which isn't really mandatory, then yes it could be handled as a background task *after* the driver fully comes online
15:39:05 <markstur> So if manila is upgraded and there are a bunch of backends and some come up right away but one is off-line for a significant period chunking thru upgrade...
15:39:29 <markstur> that'd be kind of bad, but OK. I suppose it'd be hard for us to know and warn an admin how long it'd actually take anyway
15:40:15 <markstur> maybe we need a real scenario before we worry too much about background upgrades and chunking and CLI
15:40:42 <bswartz> yeah that's what I was hoping we would arrive at with this spec
15:41:41 <markstur> I really like the vision and creativity of this group, but speculative features w/o customer need can be a real problem
15:42:41 <vponomaryov> who has experience from production deployment and manila restart?
15:42:41 <bswartz> fwiw, the cinder team faced some of the same issues and there was discussion about their ensure_volume and similar problems back in Ft Collins (summer 2016)
15:43:40 <zhongjun_> markstur: our real customer just want to update the shares without restart
15:44:49 <vkmc> that would be rolling upgrades
15:45:13 <bswartz> zhongjun_: in that case maybe ensure_share is the wrong vehicle and we need a polling-based approach or a new API (or even the UNIX SIGHUP approach)
15:45:14 <markstur> yeah but just rolling shares not like rolling all of manila or OS
15:45:42 <bswartz> rolling upgrades solves a different problem IMO
15:46:00 <bswartz> rolling upgrades is about minimizing cloud downtime while upgrading openstack
15:46:23 <vkmc> well...
taking that concept to manila only
15:46:35 <bswartz> what zhongjun_ refers to is upgrading individual shares without manila downtime
15:46:39 <vkmc> we would like to perform an upgrade without restarting services
15:46:39 <zhongjun_> rolling upgrades, and update the shares that is up to driver.
15:47:12 <ganso> also, we wouldn't be interested in keeping the old behavior in this case, it is the case where the share wouldn't work without the upgrade fix
15:47:44 <bswartz> vkmc: !!!
15:48:00 <bswartz> how do you upgrade without a restart?
15:48:19 <bswartz> that's a neat trick I'd like to learn
15:48:45 <ganso> bswartz: I don't see a problem with that... if we can reload the config and update the db, an array update becomes very simple from manila point of view
15:49:04 <ganso> bswartz: s/update/upgrade
15:49:05 <bswartz> oh you mean upgrade the backend
15:49:12 <ganso> bswartz: yea, that's one use case
15:49:23 <bswartz> yes in theory m-shr can stay up while the backend gets upgraded
15:49:34 <vkmc> bswartz, I'm not saying it's possible to do it for realz...
you can simulate it though
15:49:41 <vkmc> in the sense that customers won't see the downtime
15:49:46 <ganso> bswartz: if driver version is changed, then it has to be restarted to reload python binaries
15:49:47 <bswartz> although you might get some errors on inflight operations
15:49:52 <vkmc> and yeah, it requires some juggling
15:49:57 <bswartz> ganso: that's what I was thinking
15:50:22 <bswartz> although if the Linux kernel can support so-called "live patching" then maybe someday python will too
15:50:28 <ganso> bswartz: that's why we shouldn't worry that much about avoiding restarts, we can't solve every use case avoiding restarts
15:50:38 <bswartz> ganso: +10
15:50:42 <vkmc> still, my understanding was that ensure shares wanted to address a different problem
15:50:55 <zhongjun_> markstur: yeah, we could rolling part of manila shares, but tag shares maybe it will make admin a little busy
15:51:13 <bswartz> that's why I'm more focused on making ensure_share do the right thing when the driver restarts, and not on avoiding restarts
15:51:23 <markstur> yeah. manual stuff doesn't scale well
15:51:55 <bswartz> okay anything else on the topic of ensure_share?
15:51:58 <ganso> bswartz: the thing is, we can't have restart as a trigger
15:52:03 <ganso> bswartz: as it is today
15:52:13 <bswartz> ganso: why not?
15:52:18 <bswartz> it seems fine to me
15:52:18 <ganso> bswartz: as part of the use cases aren't solved by driver restarts
15:52:25 <markstur> ganso: I don't suppose you want to right a m-upg to do background upgrades
15:52:28 <ganso> bswartz: like, upgraded the array, driver version remains the same, restart is not needed
15:52:32 <markstur> s/right/write/
15:52:36 <ganso> markstur: lol no
15:52:40 <vponomaryov> markstur: lol
15:52:55 <ganso> bswartz: we could decouple them
15:53:03 <ganso> bswartz: just need a new trigger
15:53:04 <vkmc> ganso++
15:53:05 * markstur thinks ganso could be convinced if it was a good idea
15:53:11 <bswartz> ganso: I never said ensure_share solves all the use cases -- just that it solves some and it has the benefit of being mostly idiot-proof
15:54:05 <bswartz> other solutions like new APIs require the admin to do something and I guarantee some fraction of admins will forget or not do it
15:54:21 <ganso> markstur: lol I am not very fond of creating new services :P
15:54:29 <bswartz> anyways since we're low on time
15:54:31 <bswartz> #topic open discussion
15:54:37 <bswartz> anything else to discuss today?
15:55:36 <bswartz> alright
15:55:49 <bswartz> thanks all and please keep reviewing specs
15:56:01 <zhongjun_> thanks
15:56:02 <bswartz> #endmeeting
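
Editor's sketch (not part of the meeting log): the "unique hash which doesn't change unless the software is modified" idea raised in the ensure_share discussion could look roughly like the Python below. This is only an illustration under stated assumptions, not manila code and not what the spec proposes. The function names backend_fingerprint and shares_to_ensure, and the keys driver_version and backend_firmware, are hypothetical. The sketch assumes the driver can report a small dictionary describing itself and its backend, and that manila persists the hash from the previous startup somewhere.

```python
import hashlib
import json
from typing import Dict, Iterable, List, Optional


def backend_fingerprint(info: Dict[str, str]) -> str:
    """Hash whatever the driver reports about itself and its backend.

    Hypothetical helper: keys are sorted so the same information always
    produces the same hash, regardless of dictionary ordering.
    """
    canonical = json.dumps(info, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def shares_to_ensure(
    stored_fingerprint: Optional[str],
    info: Dict[str, str],
    share_ids: Iterable[str],
) -> List[str]:
    """All-or-none decision made once at driver startup.

    If the stored fingerprint matches the current one, this is treated as
    a no-upgrade restart and no shares are ensured. Otherwise (including
    the first start, when nothing is stored) every share is returned.
    """
    if stored_fingerprint == backend_fingerprint(info):
        return []
    return list(share_ids)


if __name__ == "__main__":
    info = {"driver_version": "1.2", "backend_firmware": "9.0"}
    saved = backend_fingerprint(info)  # persisted after the last startup

    # Plain restart: nothing changed, so nothing needs ensuring.
    print(shares_to_ensure(saved, info, ["share-a", "share-b"]))

    # Backend upgraded between restarts: every share gets ensured.
    info["backend_firmware"] = "9.1"
    print(shares_to_ensure(saved, info, ["share-a", "share-b"]))
```

The sketch covers only the all-or-none case settled on in the meeting. Picking a smaller subset, chunking, or background processing, all of which were discussed as open questions, would need additional state that is not modelled here.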