#openstack-meeting-alt log

15:00:51 <bswartz> #startmeeting manila
15:00:55 <openstack> Meeting started Thu Mar 30 15:00:51 2017 UTC and is due to finish in 60 minutes.  The chair is bswartz. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:57 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:00 <openstack> The meeting name has been set to 'manila'
15:01:01 <bswartz> hello all
15:01:03 <xyang2> hi
15:01:04 <dustins> \o
15:01:05 <ganso> hello
15:01:06 <zhongjun_> hi
15:01:07 <vponomaryov> hello
15:01:08 <markstur> hello
15:01:08 <jprovazn> hi
15:01:15 <gouthamr> hi
15:01:20 <gouthamr> o/
15:01:41 <bswartz> #agenda https://wiki.openstack.org/wiki/Manila/Meetings
15:02:13 <bswartz> I don't have any announcements today but I wanted to remind you all that Pike-1 is 2 weeks away
15:02:36 <bswartz> I'm reviewing specs today
15:02:43 <bswartz> I know several of them are ready and just waiting
15:02:52 <zhongjun_> good news
15:03:05 <bswartz> #topic ensure share
15:03:11 <bswartz> #link https://review.openstack.org/#/c/446494
15:03:14 <tommylikehu_> hi
15:03:21 <bswartz> zhongjun_: you're up
15:03:32 <zhongjun_> bswartz: thanks. Following your advice, I added two methods to solve the ensure share problem in the spec.
15:03:35 <toabctl> hi
15:03:40 <vkmc> o/
15:04:05 <zhongjun_> bswartz: I am not sure which one is better for manila?
15:04:18 <jungleboyj> o/
15:04:57 <bswartz> zhongjun_: based on our discussion earlier, I don't think we should be prioritizing things such as avoiding restarting m-shr with this feature
15:05:17 <bswartz> avoiding m-shr restarts _is_ a valuable goal, but it's unrelated to ensure_share IMO
15:05:38 <ganso> to add some context: one method is automatic detection/flag of shares that need to run ensure, and the other is using CLI to do so when admin wants
15:06:00 <bswartz> ensure_share is about being as smart as we can about handling issues that might occur when we do restart the service
15:06:36 <bswartz> in particular dealing with the effects of software upgrades
15:06:56 <bswartz> either the manila software itself or the backend software/configuration
15:07:49 <zhongjun_> ok, we could open another patch to solve the cases that shouldn't need a restart
15:08:06 <bswartz> if there are enhancements we can make that allow us to "fix" some things without restarting which currently do require a restart, then I'd like to see that proposed separately
15:08:34 <bswartz> I don't know if others have different opinions -- that's how I think about it
15:10:09 <bswartz> zhongjun_: based on my reading of the current version of the spec, I think we might need to split some content out of this spec into a separate one
15:10:47 <markstur> the biggest problem with the current ensure_share is that it was misunderstood and new driver developers were unclear about what they should do with it
15:10:57 <bswartz> yeah
15:11:22 <zhongjun_> bswartz: but even setting aside the no-restart case, we still have two methods. I still need to consider all the problems.
15:11:25 <bswartz> I want to stick to the model where it's called once on driver startup for some subset of the shares on that backend (possibly all, possibly none)
15:11:30 <markstur> the on_restart() concept seems clear, but we'll need to make sure it looks clear to a newbie in the end
15:12:00 <bswartz> the problem today is that it's always all shares, and drivers do whatever they want there
15:12:02 <markstur> what subset?
15:12:10 <bswartz> markstur: that's what we need to define
15:12:18 <ganso> bswartz: for that the share needs to be flagged as needing to be "ensured"
15:12:31 <markstur> ahh
15:12:34 <bswartz> ganso: I think we can do better than that
15:12:42 <bswartz> who does the flagging?
15:13:03 <ganso> bswartz: zhongjun_ has two methods to perform that flagging: on demand CLI and automatic
15:13:43 <bswartz> I'd like to focus on the automatic
15:14:03 <bswartz> on-demand maintenance is fine, but it's objectively worse than automatic
15:14:29 <bswartz> in a cloud everything should be automatic
15:14:44 <zhongjun_> ganso: I think maybe ben doesn't want a new CLI, right?
15:14:48 <markstur> if manila already knows which shares "need attention" to be sent to a driver, then why wait for restart
15:14:58 <ganso> zhongjun_: it is not preferable
15:15:20 <vponomaryov> markstur is right
15:15:29 <markstur> I think a CLI makes sense for when it is good for an admin to decide when to take the action
15:15:36 <vponomaryov> knowing it automatically is the problem
15:15:37 <bswartz> markstur: the point is that determining which ones need attention might itself be an expensive operation
15:15:48 <markstur> but I like the push to try to sort out other stuff first and leave the CLI as a separate tool
15:15:54 <bswartz> so we only do that work on startup
15:16:38 <bswartz> or more likely, it's something that's not too expensive, but very unlikely to change except at times when there are restarts
15:16:47 <bswartz> again I'm thinking in particular about software upgrades
15:16:53 <markstur> bswartz: makes sense.  I was thinking from a driver startup perspective and needed to think more of a manila restart process too
15:16:59 <bswartz> you can't upgrade your manila software without restarting m-shr
15:17:08 <tommylikehu_> maybe we could introduce the CLI first, and mark it deprecated when we already have the tested automatically features
15:17:10 <gouthamr> looks like we need both: sometimes, we need to set this "needs-attention" flag during a database migration, and we'd need to let the administrator set it
15:17:35 <bswartz> and in most cases, upgrading the software on your backend will involve downtime and/or a restart of m-shr so the driver can recognize the software upgrade
15:17:35 <zhongjun_> bswartz: If we tag shares, won't the admin still be very busy?
15:18:02 <markstur> I've seen where just a driver change (and the version in stats) can easily do some clever things on a driver upgrade with the current ensure
15:18:26 <markstur> The same thing on a core manila restart also has the possibility of dealing with upgrade stuff
15:18:32 <ganso> gouthamr: we need to be careful to not run in a case where the problem that needs attention needs admin intervention (like direct update to DB entry) to be fixed, on such cases tagging the share may not be useful at all
15:19:36 <gouthamr> ganso: true...
15:19:57 <bswartz> let me spell out the kind of use case I worry about
15:20:45 <bswartz> suppose a vendor improves the software on their storage controller to address a limitation with manila -- they add some feature which was missing
15:21:02 <bswartz> next they update their manila driver to take advantage of the new feature
15:21:28 <bswartz> however the driver changes require storing some additional metadata in manila, or changing existing fields in some way to remain compatible
15:22:01 <bswartz> now when an actual user who has the old version of manila and that driver and that vendor's storage controller wants to upgrade, something needs to happen
15:22:52 <bswartz> they need to decide whether to upgrade manila first or the storage controller first, and once they do, they need to perform those upgrades, and plan one or more restarts of the manila services
15:23:19 <bswartz> at some point something in the driver needs to flip the switch to start allowing the new feature which is now enabled by the new software
15:23:50 <bswartz> ensure_share seems like the ideal way to update any per-share information that needs updating in this case
15:24:31 <bswartz> the catch is, we don't want to be ensuring all the shares every time we restart, because most of the time the software isn't being upgraded
15:24:43 <bswartz> so how do we know when to do it?
15:25:08 <vponomaryov> bswartz: store driver version with share DB model?
15:25:17 <bswartz> manila needs some way to ask the driver, and the driver needs some way to persist enough state to tell the difference between a no-upgrade restart and an upgrade-restart
15:25:23 <vponomaryov> bswartz: and when version changed we run this method?
15:25:39 <bswartz> vponomaryov: yes that's the kind of thing I'm thinking about
15:25:58 <bswartz> however a plain "driver version" might not be flexible enough because versioning itself is a complex topic
15:26:21 <vponomaryov> bswartz: then we define versioning rules first
15:26:43 <bswartz> I was thinking something as simple as a unique hash which doesn't change unless the software is modified
15:26:44 <zhongjun_> We could store driver info (including version) in method 1; it is up to the driver
15:26:53 <ganso> bswartz, vponomaryov: that looks similar to what zhongjun_ proposes in her spec
15:26:56 <vponomaryov> bswartz: where we change major part only when we need to run this ensure share method
15:27:51 <zhongjun_> the driver could tell manila whether it needs to update; that is also up to the driver
15:27:57 <bswartz> ganso: as long as it's automatic and there's no API/CLI invoking required
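(editor's note) The fully automatic trigger discussed above, a value that "doesn't change unless the software is modified" and is persisted so a no-upgrade restart can be told apart from an upgrade restart, can be sketched as follows. This is a minimal illustration under stated assumptions, not Manila's design or API: `driver_fingerprint`, `startup_ensure`, the inputs to the hash, and the dict standing in for a database table are all hypothetical.

```python
import hashlib
import json
from typing import Callable, Dict, Iterable, List


def driver_fingerprint(driver_version: str, backend_version: str,
                       config: Dict[str, str]) -> str:
    """Hypothetical: hash everything whose change should trigger ensure_share."""
    # sort_keys makes the hash independent of dict ordering.
    payload = json.dumps(
        {"driver": driver_version, "backend": backend_version, "config": config},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def startup_ensure(store: Dict[str, str], backend: str, current_fp: str,
                   shares: Iterable[str],
                   ensure_share: Callable[[str], None]) -> List[str]:
    """Ensure all shares only when the stored fingerprint differs (all-or-nothing)."""
    if store.get(backend) == current_fp:
        return []  # plain restart: nothing changed, skip the expensive work
    ensured = []
    for share in shares:
        ensure_share(share)  # an exception here leaves the old fingerprint stored
        ensured.append(share)
    store[backend] = current_fp  # persist only after every share succeeded
    return ensured
```

Because the new fingerprint is written only after the loop completes, a startup that fails partway will retry the full ensure on the next restart. This matches the all-or-nothing model raised later in the meeting, including its slow-startup cost on backends with many shares.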
15:28:29 <vponomaryov> zhongjun_: it is already possible
15:28:34 <bswartz> so that would address the question of: should we ensure all or none of the shares
15:28:42 <vponomaryov> zhongjun_: driver can either skip "ensure_share" execution or not
15:28:55 <bswartz> the other question is, is there a case to ensure some subset of the shares? (like cinder does with volumes)
15:30:11 <ganso> bswartz: we are most likely to run ensure_share for all shares of a certain backend stanza, even if only a few are needed, that would be okay I guess
15:30:57 <zhongjun_> vponomaryov: yes
15:31:56 <bswartz> ganso: I can't think of a case for a subset, but we have to consider the scalability implications then
15:32:28 <bswartz> if it's all-or-nothing for ensure share, then the cases where we do all of them might cause *very* slow startups on backends with large numbers of shares
15:33:34 <ganso> bswartz: a subset could be only the shares belonging to an array that was upgraded... a subset within that subset would be only the ones affected by the upgrade, but it seems too complex to try to detect this smaller subset effectively, it may be too intensive to do so
15:34:35 <ganso> bswartz: s/intensive/expensive
15:34:38 <markstur> now you guys are making it sound like these upgrades should be controlled by an admin CLI to do them in reasonable chunks
15:34:40 <bswartz> yeah
15:34:47 <zhongjun_> bswartz: We could do it in another patch, if we really need to do it.
15:35:03 <bswartz> markstur:  admins already do control upgrades
15:35:17 <bswartz> I'd rather avoid making upgrades any harder than they already are
15:35:45 <markstur> I mean if this feature-enhancing on-startup will tweak 1000 shares and take a day, then maybe the admin would prefer it doesn't
15:36:03 <markstur> and control the share upgrade in smaller chunks
15:36:47 <markstur> just sayin' the concerns mentioned above made me think this ^
15:37:53 <vponomaryov> markstur: when updating a whole backend I would expect it to take some time
15:38:06 <markstur> but hopefully most upgrades are a fast loop thru a list with little tweak of some setting
15:38:24 <bswartz> unfortunately some "upgrades" are mandatory and whether you do them in small chunks or all at once, you can't bring up your services until they're all done
15:38:31 <markstur> vponomaryov: so...
15:38:44 <vponomaryov> markstur: so, small chunk does not really solve anything
15:39:02 <bswartz> if there's an "upgrade" which isn't really mandatory, then yes it could be handled as a background task *after* the driver fully comes online
15:39:05 <markstur> So if manila is upgraded and there are a bunch of backends and some come up right away but one is off-line for a significant period chunking thru upgrade...
15:39:29 <markstur> that'd be kind of bad, but OK.  I suppose it'd be hard for us to know and warn an admin how long it'd actually take anyway
15:40:15 <markstur> maybe we need a real scenario before we worry too much about background upgrades and chunking and CLI
15:40:42 <bswartz> yeah that's what I was hoping we would arrive at with this spec
15:41:41 <markstur> I really like the vision and creativity of this group, but speculative features w/o customer need can be a real problem
15:42:41 <vponomaryov> who has experience from production deployment and manila restart?
15:42:41 <bswartz> fwiw, the cinder team faced some of the same issues and there was discussion about their ensure_volume and similar problems back in Ft Collins (summer 2016)
15:43:40 <zhongjun_> markstur: our real customer just wants to update the shares without a restart
15:44:49 <vkmc> that would be rolling upgrades
15:45:13 <bswartz> zhongjun_: in that case maybe ensure_share is the wrong vehicle and we need a polling-based approach or a new API (or even the UNIX SIGHUP approach)
15:45:14 <markstur> yeah but just rolling shares, not like rolling all of manila or OS
15:45:42 <bswartz> rolling upgrades solves a different problem IMO
15:46:00 <bswartz> rolling upgrades is about minimizing cloud downtime while upgrading openstack
15:46:23 <vkmc> well... taking that concept to manila only
15:46:35 <bswartz> what zhongjun_ refers to is upgrading individual shares without manila downtime
15:46:39 <vkmc> we would like to perform an upgrade without restarting services
15:46:39 <zhongjun_> rolling upgrades, and updating the shares is up to the driver.
15:47:12 <ganso> also, we wouldn't be interested in keeping the old behavior in this case, it is the case where the share wouldn't work without the upgrade fix
15:47:44 <bswartz> vkmc: !!!
15:48:00 <bswartz> how do you upgrade without a restart?
15:48:19 <bswartz> that's a neat trick I'd like to learn
15:48:45 <ganso> bswartz: I don't see a problem with that... if we can reload the config and update the db, an array update becomes very simple from manila point of view
15:49:04 <ganso> bswartz: s/update/upgrade
15:49:05 <bswartz> oh you mean upgrade the backend
15:49:12 <ganso> bswartz: yea, that's one use case
15:49:23 <bswartz> yes in theory m-shr can stay up while the backend gets upgraded
15:49:34 <vkmc> bswartz, I'm not saying it's possible to do it for realz... you can simulate it though
15:49:41 <vkmc> in the sense that customers won't see the downtime
15:49:46 <ganso> bswartz: if driver version is changed, then it has to be restarted to reload python binaries
15:49:47 <bswartz> although you might get some errors on inflight operations
15:49:52 <vkmc> and yeah, it requires some juggling
15:49:57 <bswartz> ganso: that's what I was thinking
15:50:22 <bswartz> although if the Linux kernel can support so-called "live patching" then maybe someday python will too
15:50:28 <ganso> bswartz: that's why we shouldn't worry that much about avoiding restarts, we can't solve every use case avoiding restarts
15:50:38 <bswartz> ganso: +10
15:50:42 <vkmc> still, my understanding was that ensure shares wanted to address a different problem
15:50:55 <zhongjun_> markstur: yeah, we could roll part of the manila shares, but tagging shares may keep the admin a little busy
15:51:13 <bswartz> that's why I'm more focused on making ensure_share do the right thing when the driver restarts, and not on avoiding restarts
15:51:23 <markstur> yeah. manual stuff doesn't scale well
15:51:55 <bswartz> okay anything else on the topic of ensure_share?
15:51:58 <ganso> bswartz: the thing is, we can't have restart as a trigger
15:52:03 <ganso> bswartz: as it is today
15:52:13 <bswartz> ganso: why not?
15:52:18 <bswartz> it seems fine to me
15:52:18 <ganso> bswartz: as some of the use cases aren't solved by driver restarts
15:52:25 <markstur> ganso: I don't suppose you want to right a m-upg to do background upgrades
15:52:28 <ganso> bswartz: like, upgraded the array, driver version remains the same, restart is not needed
15:52:32 <markstur> s/right/write/
15:52:36 <ganso> markstur: lol no
15:52:40 <vponomaryov> markstur: lol
15:52:55 <ganso> bswartz: we could decouple them
15:53:03 <ganso> bswartz: just need a new trigger
15:53:04 <vkmc> ganso++
15:53:05 * markstur thinks ganso could be convinced if it was a good idea
15:53:11 <bswartz> ganso: I never said ensure_share solves all the use cases -- just that it solves some and it has the benefit of being mostly idiot-proof
15:54:05 <bswartz> other solutions like new APIs require the admin to do something and I guarantee some fraction of admins will forget or not do it
15:54:21 <ganso> markstur: lol I am not very fond of creating new services :P
15:54:29 <bswartz> anyways since we're low on time
15:54:31 <bswartz> #topic open discussion
15:54:37 <bswartz> anything else to discuss today?
15:55:36 <bswartz> alright
15:55:49 <bswartz> thanks all and please keep reviewing specs
15:56:01 <zhongjun_> thanks
15:56:02 <bswartz> #endmeeting