15:00:20 <bswartz> #startmeeting manila
15:00:21 <openstack> Meeting started Thu Jun 15 15:00:20 2017 UTC and is due to finish in 60 minutes.  The chair is bswartz. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:23 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:25 <openstack> The meeting name has been set to 'manila'
15:00:26 <bswartz> hello all
15:00:28 <cknight> Hi
15:00:33 <tbarron> hi
15:00:34 <markstur> hi
15:00:38 <zhongjun> hi
15:00:39 <dustins> \o
15:00:45 <ganso> hello
15:00:51 <vponomaryov> hello
15:01:08 <xyang2> hi
15:01:10 <bswartz> no announcements today
15:01:18 <bswartz> #agenda https://wiki.openstack.org/wiki/Manila/Meetings
15:01:31 <bswartz> we have 2 important topics to cover
15:01:46 <bswartz> first up:
15:01:50 <bswartz> #topic How to support both IPv6 and IPv4 with DHSS=true
15:02:31 <bswartz> so during the Atlanta PTG we identified an issue with supporting both IPv4 and IPv6 at the same time for DHSS=true backends
15:02:40 <vkmc_> o/
15:02:42 <zhongjun> Should we support both IPv6 and IPv4 with DHSS=true as the initial target?
15:02:53 <bswartz> it's not possible to do it with the current share network design
15:03:10 <bswartz> so we know that a change to share networks is needed before we can achieve that
15:03:27 <bswartz> some of us discussed possible approaches (me, tbarron, gouthamr) on Friday
15:03:40 <zhongjun> #link https://review.openstack.org/#/c/391805/
15:04:00 <bswartz> but I'd like to table that part of the discussion and first make sure we have agreement on our pike goal
15:04:34 <bswartz> for Pike the plan is to support both IPv4 and IPv6 for dhss=false drivers
15:05:05 <bswartz> and for dhss=true drivers, either IPv4 or IPv6 should be supported, but not both at the same time
15:05:07 <gouthamr> o/
15:05:21 <zhongjun> bswartz: agree
15:05:21 * bswartz marks gouthamr tardy :-p
15:05:31 <gouthamr> :[
15:06:00 <gouthamr> +1 with the proposal
15:06:40 <bswartz> I'm pretty sure this is consistent with what we've been planning since ocata, but since zhongjun was confused about it, I wanted to make sure we were all on the same page as a community
15:07:09 <tbarron> +1
15:07:19 <bswartz> so if there are no issues there, we can move on to a more contentious topic...
15:07:36 <vponomaryov> not anymore
15:07:37 <zhongjun> In Ocata, we were planning to support either IPv4 or IPv6 for dhss=false and true :)
15:08:20 <bswartz> zhongjun we've always wanted dual support
15:08:33 <bswartz> it's not hard to achieve for dhss=false
15:08:52 <zhongjun> :)   +1
15:08:57 <bswartz> it is hard for dhss=true, and we haven't had a volunteer to redesign the share networks APIs yet
15:08:58 <gouthamr> bswartz zhongjun: have we resolved how we'd test it in the gate yet though?
15:09:05 <bswartz> gouthamr: 1 job
15:09:12 <bswartz> both in 1 job I mean
15:09:25 <bswartz> ipv6 testing would just be a tempest flag
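(A minimal sketch of the "just a tempest flag" idea mentioned above, not the actual manila-tempest-plugin code: a single boolean option in the plugin's "share" config group would gate the IPv6 cases so one gate job can run v4-only or v4+v6. The option name run_ipv6_tests and the test class are assumptions for illustration.)

```python
# Sketch: gate IPv6 share tests behind a tempest config flag.
# CONF.share.run_ipv6_tests is an assumed option name, not a confirmed one.
from tempest import config
from tempest import test

CONF = config.CONF


class ShareIPv6ExportTest(test.BaseTestCase):

    @classmethod
    def skip_checks(cls):
        super(ShareIPv6ExportTest, cls).skip_checks()
        # Skip the whole class unless the job enabled IPv6 testing.
        if not CONF.share.run_ipv6_tests:
            raise cls.skipException("IPv6 share tests are disabled")

    def test_create_share_with_ipv6_export(self):
        # ... create a share and assert that at least one export location
        # contains an IPv6 address ...
        pass
```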
15:10:04 <bswartz> okay let's move on
15:10:07 <bswartz> #topic IDs and instance IDs in the driver interface
15:10:17 <bswartz> #link https://bugs.launchpad.net/manila/+bug/1697581
15:10:19 <openstack> Launchpad bug 1697581 in Manila "create snapshot failed due to absent of snapshot['share_id']" [Critical,In progress] - Assigned to Valeriy Ponomaryov (vponomaryov)
15:10:20 <zhongjun> gouthamr: a job was added by vkmc, but it's not tested yet
15:10:22 <bswartz> #link https://review.openstack.org/#/c/433854/12/manila/db/sqlalchemy/models.py
15:10:28 <bswartz> #link https://review.openstack.org/#/c/473864/
15:10:52 <bswartz> so there was some driver breakage caused by the share groups DB refactor change that merged
15:11:11 <bswartz> unfortunately reviewers (me included) did not catch the 3rd party CI failures
15:11:56 <bswartz> so I'll remind everyone that we should be looking for 3rd party CI failures on changes that can potentially break drivers (especially changes affecting the driver interface)
15:12:24 <vponomaryov> everyone just got used to their failures )
15:12:50 <bswartz> the history here is that when we introduced share instances 2 years ago, we planned to keep them hidden from drivers, and present the instances to drivers as if they were the actual shares
15:13:19 <bswartz> vponomaryov: yes that's part of the problem -- some CIs were already failing for different reasons so it was impossible to notice that the share groups change broke them
15:14:00 <gouthamr> bswartz: i had a different way to tell, our DHSS=True driver was always passing :P
15:14:16 <bswartz> at that time we agreed that the actual share IDs and snapshot IDs (the ones exposed through the REST API) should never be used by the drivers
15:14:30 <gouthamr> and then consistency_group_xyz (don't remember what) flag for tempest was reused and our manifests weren't updated
15:14:48 <bswartz> later on, when we added migration and replication, we started to bend our own rule
15:15:12 <bswartz> and now we have a situation where at least 1 driver (ZFS) actually does use the share ID for something
15:15:20 <tbarron> gouthamr: lol
15:16:00 <bswartz> there are multiple options to fix this, but we don't seem to agree on which one to pursue
15:16:24 <gouthamr> i think we need to fix replication and migration if they are the problem areas
15:16:47 <bswartz> 1) admit defeat and present both share IDs and instance IDs through the driver interface -- this would require changing all the drivers that directly consume the instance IDs instead of the share IDs
15:17:23 <bswartz> 2) be more strict about never presenting the share IDs to the driver, and fix drivers which currently rely on it
15:17:48 <bswartz> 3) roll back to the previous state where stuff worked but it was all murky and confusing
15:18:00 <gouthamr> snapshot['share_id'] has always been meant to refer to whatever ID the driver picks to identify shares on its backend... and for a lot of drivers, it's whatever comes in as share['id'] in create_share
15:18:06 <gouthamr> we should keep it that way
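(An illustrative sketch of the convention gouthamr describes, not any specific in-tree driver: the backend object made in create_share is keyed off share['id'], and create_snapshot expects to find it again via snapshot['share_id']. The self.backend helper is hypothetical; the method signatures follow the manila driver interface.)

```python
# Why the bug is Critical: many drivers assume snapshot['share_id'] holds
# the same value that was handed to them as share['id'] in create_share.
class ExampleShareDriver(object):

    def create_share(self, context, share, share_server=None):
        # Name the backend object after the ID manila passed in.
        backend_name = 'share-%s' % share['id']
        self.backend.create_volume(backend_name, share['size'])
        return self.backend.export_path(backend_name)

    def create_snapshot(self, context, snapshot, share_server=None):
        # If snapshot['share_id'] no longer matches what create_share saw,
        # this lookup fails and snapshot creation breaks.
        backend_name = 'share-%s' % snapshot['share_id']
        self.backend.snapshot_volume(backend_name,
                                     'snap-%s' % snapshot['id'])
```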
15:18:33 <bswartz> vponomaryov's bugfix for this bug was the first step toward (1), but xyang didn't like it -- AIUI because it would require significant driver changes
15:18:36 <gouthamr> about 2) this is enforced in code review...
15:18:50 <vponomaryov> bswartz, https://review.openstack.org/#/c/473864/ is (3) right now
15:18:50 <ganso> vponomaryov: can ZFS avoid using the snapshot share_id?
15:18:58 <bswartz> I personally lean towards option (2)
15:19:20 <bswartz> but I want to know if anything other than ZFS actually requires the share ID or snapshot ID (not the instance IDs)
15:19:26 <vponomaryov> ganso: this driver stores appropriate data for share facade and share instances
15:19:28 <ganso> bswartz: If I understood correctly, we are broken at this moment, so we need option #1 or #3 to fix
15:19:35 <vponomaryov> ganso: stores separate sets of data
15:19:52 <vponomaryov> bswartz: (3) now and (2) or (1) later
15:20:10 <gouthamr> +1
15:20:20 <zhongjun> +1
15:20:44 <bswartz> ganso: I think ZFS could be modified to not rely on the share ID, but it would be a breaking change
15:21:00 <bswartz> that is, existing replicated shares would probably not survive an upgrade
15:21:18 <zhongjun> We always use the  instance IDs in our driver.
15:21:23 <bswartz> I'm not aware of any users relying on ZFS fortunately
15:21:26 <tbarron> do drivers need to log the ID, and should they maybe present the share ID rather than the instance ID in the log? o/w I like #2
15:21:51 <tbarron> or is it ok to log instance IDs ... as long as we don't present to end user
15:22:03 <gouthamr> quick search: the container driver also seems to be using share['share_id']
15:22:22 <vponomaryov> gouthamr: it doesn't do snapshots ))
15:22:22 <bswartz> IMO it should be possible for drivers to come up with their own identifier that ties together the various replicas of a share and store that in driver private share data or in the provider location field
15:22:23 <gouthamr> share_name = share.share_id :|
15:22:41 <bswartz> and thus not rely on the share ID to relate different replicas to eachother
15:22:54 <gouthamr> vponomaryov: or replication
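(A rough sketch of bswartz's suggestion above: the driver mints its own "replica group" identifier, stores it in driver private data keyed by each instance's own ID, and never depends on the API-visible share ID to relate replicas. The private_storage update/get calls are the manila driver private data helpers; the key name, the _get_active_replica helper, and the overall flow are illustrative assumptions, not code from any in-tree driver.)

```python
import uuid


class ExampleReplicationDriver(object):

    REPLICA_GROUP_KEY = 'replica_group_id'  # assumed key name

    def create_share(self, context, share, share_server=None):
        # Mint a driver-owned identifier instead of reusing share['share_id'].
        group_id = uuid.uuid4().hex
        self.private_storage.update(share['id'],
                                    {self.REPLICA_GROUP_KEY: group_id})
        # ... create the backend volume named after group_id ...

    def create_replica(self, context, replica_list, new_replica,
                       access_rules, replica_snapshots, share_server=None):
        # _get_active_replica is an illustrative helper that picks the
        # currently active replica out of replica_list.
        active = self._get_active_replica(replica_list)
        group_id = self.private_storage.get(active['id'],
                                            self.REPLICA_GROUP_KEY)
        # Copy the same group id onto the new replica so later calls can
        # relate the replicas without ever touching the parent share ID.
        self.private_storage.update(new_replica['id'],
                                    {self.REPLICA_GROUP_KEY: group_id})
```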
15:24:17 <bswartz> gouthamr: regarding enforcement of (2) code reviews haven't worked well so I would propose actually modifying the manager to pass down synthetic objects to the drivers instead of the model objects
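(A sketch of what those "synthetic objects" could look like, assuming option (2); this is not existing manila code. The share manager would copy the instance model into a plain dict that presents the instance ID as 'id' and simply omits the parent share ID, so drivers cannot reach for it. Field selection is illustrative.)

```python
def build_driver_share_view(share_instance, parent_share):
    """Build the dict handed to drivers in place of the model objects."""
    view = {
        'id': share_instance['id'],  # the instance id, presented as "the" id
        'size': parent_share['size'],
        'share_proto': parent_share['share_proto'],
        'is_public': parent_share['is_public'],
        'status': share_instance['status'],
        'host': share_instance['host'],
        'share_network_id': share_instance['share_network_id'],
        # Deliberately no 'share_id' key: a driver that tries
        # view['share_id'] fails loudly in CI instead of silently
        # depending on the API-visible share ID.
    }
    return view
```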
15:25:06 <bswartz> so I'd like to figure out which approach we should pursue
15:25:15 <bswartz> hopefully nobody is in favor of (1)
15:25:22 <bswartz> but if you are please speak up
15:25:44 <bswartz> I think the question is whether to do (2), or (3), or (3) followed by (2)
15:26:17 <bswartz> sorry if my usage of numbers is confusing
15:26:29 <gouthamr> the latter... so we can evaluate this with more time..
15:26:42 <gouthamr> we can't keep being broken at the moment.
15:26:53 <vponomaryov> bswartz: personally, I don't see a problem with (1)
15:27:02 <vponomaryov> bswartz: so, I would say (3) then (1)
15:27:34 <bswartz> vponomaryov: the downside there is that existing drivers need to be modified significantly, and future driver authors see a more complex interface to deal with
15:27:53 <bswartz> it's hard enough to figure out how to write a manila driver
15:27:59 <vponomaryov> bswartz: significantly? cannot agree
15:28:06 <bswartz> adding different kinds of IDs would make that worse IMO
15:28:34 <bswartz> vponomaryov: the problem is that it's hard to be sure you've found every place the ID is used when doing a field rename
15:28:55 <bswartz> python is dynamically typed, so there's no "refactor" button that just works
15:28:58 <gouthamr> significantly: relative term --- less significantly for someone who wrote multiple first party drivers, more significantly for someone bridging a storage array :D
15:29:31 <ganso> gouthamr: +1
15:29:37 <bswartz> the other downside of any kind of driver refactor is that it makes backporting bugfixes harder -- so I'd rather not force all the driver to change
15:30:24 <vponomaryov> (3) forever then! ))
15:30:24 <gouthamr> bswartz: +1
15:30:30 <ganso> plus, we've seen the response time of driver updates when we forced everyone to implement update_access... some are still doing it or haven't done it
15:30:58 <bswartz> well what arguments are there against (2)?
15:31:18 <bswartz> Are we aware of any cases other than ZFS replication where the share ID actually matters to a driver?
15:31:28 <gouthamr> bswartz: i'm concerned about drivers that use driver-private-data
15:31:38 <bswartz> if not, then the sole downside to (2) is that we need to modify ZFS replication
15:33:12 <bswartz> which brings up another question...
15:33:33 <bswartz> since vponomaryov won't be maintaining ZFSonLinux anymore, is there a volunteer to take that over?
15:33:51 <zhongjun> Do we need to check it again? Some other driver maintainers don't attend this meeting.
15:35:03 <bswartz> zhongjun: we will double check before introducing any changes, and we can/should look closely at CI systems on any changes that actually remove the share ID or snapshot ID from the objects passed to the drivers
15:35:49 <bswartz> gouthamr: what's your concern about private data?
15:36:06 <bswartz> private data is meant to make things easier for drivers
15:36:13 <gouthamr> bswartz: I'm concerned about updates
15:36:49 <bswartz> what updates
15:37:04 <gouthamr> if the driver relies on checking the ID from fields that will disappear from the resources passed through the driver interface
15:37:23 <gouthamr> to store things in the driver-private-data, how would we avoid that problem?
15:38:00 <bswartz> gouthamr: that would count as a driver using those fields, and that's exactly what we're trying to determine
15:38:18 <bswartz> perhaps we just need to audit the code, but I was looking for anyone that knows of cases already
15:38:29 <bswartz> because vponomaryov pointed out the ZFS use case
15:38:53 <gouthamr> container driver refers to shares created with the share ID not the instance ID
15:39:25 <bswartz> container driver should be fixed then -- it's not like it has a good reason to do that, right?
15:39:39 <vponomaryov> bswartz: yes, compared to ZFS
15:40:40 <bswartz> can we agree in principle that we will do (3) and then pursue (2) assuming no other difficult cases other than ZFS come up?
15:40:54 <bswartz> since there are no volunteers to maintain ZFS I can take a look at that myself
15:41:22 <bswartz> I'm actually a heavy user of ZFS so I'm quite familiar with how it works
15:42:04 <vponomaryov> ZFSonLinux, not Oracle ZFS driver
15:42:11 <gouthamr> bswartz: nope, but vponomaryov will help with (3)?
15:42:25 <vponomaryov> (3) is already here -> https://review.openstack.org/#/c/473864/
15:42:38 <vponomaryov> just will upload one more patch set
15:42:39 <bswartz> gouthamr: if you read the #manila channel that was discussed already
15:43:12 <gouthamr> vponomaryov: awesome.. thanks. I'll see what needs to be fixed on the NetApp CI to unbreak ourselves
15:43:28 <bswartz> okay it sounds like we have a path forward here
15:43:39 <bswartz> I'll open up the floor to other topics
15:43:42 <bswartz> #topic open discussion
15:43:55 <vponomaryov> one note from me
15:44:02 <vponomaryov> From now on, I will only be able to spend my spare time on manila.
15:44:18 <tbarron> we hope you have lots of spare time
15:44:24 <vponomaryov> ^_^
15:44:28 <gouthamr> :)
15:44:31 <bswartz> tbarron: +1000
15:44:43 <tbarron> vponomaryov: and seriously, we hope your non-spare time is rewarding to you!
15:44:55 <vponomaryov> ))
15:45:10 <xyang2> vponomaryov: that includes saturday, sunday, and every night.  lots of time:)
15:45:18 <gouthamr> reserve some time to argue with ganso and me.
15:45:25 <bswartz> in my spare time I've been playing with the neutron l2gw plugin -- it may have some interesting applications for Manila
15:45:28 <ganso> gouthamr: +1 xD
15:45:36 <markstur> vponomaryov: you leave big shoes to fill
15:45:42 <tbarron> xyang2: i see you doing regular work during those times
15:46:09 <vponomaryov> markstur: just do your best )
15:46:15 <xyang2> tbarron: right.  I sleep during the day:)
15:46:21 <tbarron> bswartz: 'splain about l2gw
15:46:25 <vponomaryov> markstur: and you will see it is not that big
15:46:28 <bswartz> vponomaryov: thanks for all you've done for manila -- we would never have been this successful without your efforts
15:46:42 <gouthamr> vponomaryov: thank you!
15:46:51 <tbarron> xyang2: i sleep during meetings too
15:47:00 <vponomaryov> thanks guys, it is pleasure to work with all of you
15:47:05 <xyang2> tbarron: :)
15:47:06 <markstur> +1000
15:47:13 <ganso> vponomaryov:  =)
15:47:26 <vponomaryov> and argue too ^_^
15:47:29 <bswartz> #link https://github.com/openstack/networking-l2gw
15:47:31 <gouthamr> :P
15:47:31 <zhongjun> vponomaryov: thank you for review my code.
15:47:36 <bswartz> #link https://docs.openstack.org/developer/networking-l2gw/readme.html
15:47:37 <markstur> um the thousand was thanks to VP not in response to Tom's napping in meetings
15:47:51 * tbarron zzz....
15:48:42 <vponomaryov> ganso: where can I get a membership ticket to the closed club of retired manila devs?
15:48:57 <bswartz> ^ this neutron plugin allows you to take a neutron network and extend it to an external VLAN network
15:49:07 <ganso> vponomaryov: haha I'll email it to you :P
15:50:00 <gouthamr> http://livinglifeph.com/wp-content/uploads/2016/11/metro-manila-retirement-hoppler-1.jpg
15:50:05 <bswartz> tbarron: it's primarily of interest to dhss=true drivers
15:50:19 <ganso> gouthamr: ROFL
15:50:46 <markstur> looks nice  :)
15:50:56 <tbarron> bswartz: with off-cloud appliances, to help integrate them into neutron, right?
15:50:59 * bswartz doesn't want to live in the Philippines
15:51:15 <gouthamr> tbarron: yep..
15:51:16 <bswartz> tbarron: that's the use case I'm interested in
15:51:51 <tbarron> bswartz: I'm just calling that out b/c there may be some other appliances that have similar needs
15:52:07 <bswartz> yes that's why I'm mentioning it too
15:52:09 <tbarron> and may share your interest
15:52:58 <bswartz> currently the netapp driver has awkward requirements to be able to run in dhss=true mode
15:53:14 <bswartz> the l2gw stuff may allow us to relax the requirements and work in more use cases
15:53:25 <gouthamr> i guess every dhss=true driver in the tree, unless they natively support vxlan
15:53:28 <markstur> http://pics4.city-data.com/cpicc/cfiles37329.jpg
15:53:44 <markstur> bswartz: You can retire in Manila, Arkansas ^
15:53:53 <bswartz> >_<
15:54:00 <gouthamr> hahahaha
15:54:07 <bswartz> North Carolina is fine with me
15:54:17 <tbarron> manila, philippines is probably safer
15:54:24 <vponomaryov> markstur: "cry me a river"?
15:54:42 <tbarron> Speaking of shared interests, I'm starting to research "instance HA" for service VMs, since as manila currently uses them they are a SPOF in the data path
15:54:56 <tbarron> if anyone else is looking at this please ping me
15:55:29 <markstur> vponomaryov: :)
15:55:33 <bswartz> tbarron: related to that -- does anyone know anything about neutron "service" networks?
15:55:41 <zhongjun> tbarron: Are there any links about that?
15:56:04 <bswartz> I'm not even sure what they're called in neutron or if they exist
15:56:13 <tbarron> zhongjun: not really that I know of, maybe I'll start a wiki or blog ...
15:56:26 <gouthamr> certainly not called service-networks.. what are you thinking about?
15:56:47 <bswartz> networks that have ports on the control nodes
15:57:11 <bswartz> so m-shr could SSH to a service_instance through that port, for example
15:57:23 <vponomaryov> gouthamr: common network for service needs that is available from hosts/controllers/compute nodes
15:57:42 <vponomaryov> gouthamr: but not exposed to users
15:57:45 <bswartz> vponomaryov: did you ever find any concrete information on that?
15:57:49 <tbarron> bswartz: on that subject I've thought that maybe a bridge on the control node plus an admin/service-user-owned neutron net connected to that bridge may do what is needed
15:57:51 <vponomaryov> bswartz: no
15:57:58 <bswartz> okay
15:58:07 <bswartz> so that remains an area to research
15:58:32 <tbarron> yeah, that's the other area of the service instance module that really needs attention I think
15:58:40 <tbarron> besides the instance HA area
15:58:50 <bswartz> tbarron: the important thing would be neutron APIs to set that up though
15:59:10 <tbarron> how to get connectivity to SVMs w/o the @#$#@$ layer 2 stitching that we have to do now
15:59:25 <tbarron> bswartz: yeah, agree on the APIs
15:59:25 <bswartz> if it's not supported by neutron then it would be no better than the hack currently used by the generic driver
15:59:44 <bswartz> okay we're out of time
15:59:49 <bswartz> thank you all
15:59:57 <bswartz> #endmeeting