15:00:58 <bswartz> #startmeeting manila
15:00:59 <openstack> Meeting started Thu Oct 26 15:00:58 2017 UTC and is due to finish in 60 minutes.  The chair is bswartz. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:00 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:02 <openstack> The meeting name has been set to 'manila'
15:01:11 <gouthamr> o/
15:01:14 <dustins> \o
15:01:21 <markstur> hi
15:01:26 <bswartz> hello all
15:01:36 <tbarron> hi
15:01:38 <jungleboyj> @!
15:01:38 <_pewp_> jungleboyj (◍˃̶ᗜ˂̶◍)ノ”
15:01:50 <zhongjun> hi
15:01:57 <bswartz> jungleboyj what is that?
15:02:14 <jungleboyj> Looks like a waving cat.  :-)
15:02:37 <ganso> hello
15:03:03 <bswartz> these emojis are like Rorschach tests
15:03:25 <jungleboyj> :-)  Gotta keep your life interesting.
15:04:08 <bswartz> courtesy ping: cknight vponomaryov toabctl
15:04:25 <bswartz> #topic announcements
15:04:42 <bswartz> so we extended the spec freeze deadline by 1 week, to today
15:04:57 <bswartz> I don't see any more specs that look ready for review
15:05:07 <bswartz> #link https://review.openstack.org/#/q/status:open+project:openstack/manila-specs
15:05:35 <bswartz> so I think we're in good shape spec-wise
15:06:01 <bswartz> the agenda is short today
15:06:13 <tbarron> review comments on https://review.openstack.org/#/c/504987/ are solicited, though we won't merge it for this cycle's deadline
15:06:16 <bswartz> #agenda https://wiki.openstack.org/wiki/Manila/Meetings
15:06:57 <bswartz> tbarron: okay thanks -- I'll plan to take a look
15:07:05 <bswartz> I have one topic for today
15:07:11 <bswartz> #topic Manila test image
15:07:21 <bswartz> #link https://review.openstack.org/#/q/status:open+project:openstack/manila-test-image
15:07:37 <bswartz> is anyone reviewing changes to this project?
15:08:10 <bswartz> I'm wondering if it makes sense for me to just ninja-merge stuff here, or if there are other cores that want to review what goes into manila-test-image
15:08:31 <tbarron> I think ninja-merge is OK for this
15:08:43 <tbarron> I've been watching out of the corner of my eye from time to time
15:08:52 <bswartz> there's a ton of work needed to get the manila nextgen driver working, and it's really hard to maintain long chains of unmerged patches
15:08:55 <tbarron> this test image isn't used for our gate yet
15:09:12 <gouthamr> +1
15:09:37 <vkmc> o/
15:09:42 <bswartz> okay so no objection to ninja merging on manila-test-image during the queens cycle?
15:09:57 <markstur> ++
15:10:20 <bswartz> okay cool
15:11:08 <bswartz> #topic Let's Go Over New Bugs
15:11:35 <bswartz> dustins: title caps here hurt my eyes
15:11:36 <dustins> Looks like it's my turn, then
15:12:06 <bswartz> #link https://etherpad.openstack.org/p/manila-bug-triage-pad
15:12:07 * dustins makes note to look at his copy of "Elements of Style"
15:12:34 <dustins> So the first one this week is: https://bugs.launchpad.net/manila/+bug/1715783
15:12:35 <openstack> Launchpad bug 1715783 in Manila "The associated share will be deleted after delete all share replica" [Undecided,New]
15:13:22 <dustins> zhongjun: You opened this a bit ago; it has to do with a share being deleted when all of its associated replicas were deleted
15:13:35 <zhongjun> dustins: yes
15:13:47 <bswartz> I'm confused
15:13:58 <bswartz> when you say "all replicas" do you mean the original replica too?
15:14:03 <zhongjun> dustins: It's just from our users' perspective: they think the share replica and the share are different resources that shouldn't interact with each other. If we want to clean up the share replicas, we can delete all of them.
15:14:14 <zhongjun> bswartz:  yes
15:14:20 <gouthamr> okay, this relates to our "reset-state" mechanism
15:14:27 <bswartz> when a share is created, it gets 1 "replica" automatically
15:14:42 <bswartz> even unreplicated shares are treated as if they have 1 replica
15:15:52 <gouthamr> bswartz: zhongjun created a share, reset the replica-state of the "replica" and used share-replica-delete on that
15:16:17 <dustins> So deleting all replicas will delete the share because it IS one of the replicas
15:16:30 <bswartz> yeah that's the point I was trying to make
15:16:49 <bswartz> if you delete the last replica, you did delete the share -- working as designed IMO
15:16:52 <gouthamr> the logic in the API is you can't use share-replica-delete to delete that last replica
15:16:52 <zhongjun> bswartz: so, when a share is created, they think a "replica" is a different resource, so maybe we shouldn't create 1 "replica" automatically
15:17:13 <bswartz> gouthamr: that was going to be my next question
15:17:19 <bswartz> we may have a usability issue
15:17:53 <bswartz> while we know that deleting the last replica is the same as deleting the share, it would be better to force users to go through the share deletion path to actually do that
15:17:57 <zhongjun> dustins: yes, because it is one of the replicas
15:17:58 <bswartz> to avoid confusion and surprises
15:18:48 <gouthamr> bswartz: that's how it is implemented
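[Editor's sketch of the guard described above, with invented names and a plain dict standing in for the share record; manila's real API code is structured differently:]

```python
class LastReplicaError(Exception):
    pass


def delete_replica(share, replica_id):
    # share-replica-delete refuses to remove the last replica, so the
    # share itself can only go away via the normal share-delete path.
    if len(share['replicas']) <= 1:
        raise LastReplicaError(
            "cannot delete the last replica; use share-delete instead")
    share['replicas'].remove(replica_id)
```

[As gouthamr notes next, the admin-only reset-replica-state call can subvert the state this kind of guard relies on, which is what zhongjun hit.]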
15:19:03 <bswartz> does reset state allow you to do something not intended?
15:19:08 <gouthamr> there's reset-replica-state, which messes with that :/
15:19:14 <bswartz> okay
15:19:25 <bswartz> so we don't need to fix the whole bug during this meeting
15:19:26 <gouthamr> well, the reset-* methods are meant to be used by administrators so they don't have to go into the DB
15:19:37 <bswartz> Please update the bug with the suggested fix and let's move on
15:19:50 <dustins> Sounds good
15:20:04 <dustins> Next up: https://bugs.launchpad.net/manila/+bug/1713062
15:20:05 <openstack> Launchpad bug 1713062 in Manila "Missing ability to automatically build configuration reference artifacts" [Undecided,New]
15:20:10 <zhongjun> bswartz: okay
15:20:20 <gouthamr> #action: gouthamr update bug/1715783
15:20:37 <dustins> So this one has to do with docs stuff since the docs were moved in tree
15:21:03 <dustins> And we lost the ability to get release diffs of configuration options with that
15:21:22 <bswartz> gouthamr: this bug is pretty high priority IMO -- is it a lot of work to fix it?
15:21:23 <dustins> So we just gotta do a bit of work to get it working again (presumably)
15:21:28 <bswartz> do you know specifically what's needed?
15:22:02 <tbarron> jungleboyj: do you have this working in cinder yet?
15:22:06 <bswartz> is this one of those situations where we can just steal code from cinder?
15:22:14 <bswartz> tbarron: my thought exactly
15:22:22 <gouthamr> I don't know yet; the docs team had some tooling around our in-tree config generator that we might need to resurrect and adopt
15:22:31 * bswartz tries to rope jungleboyj into fixing all our docs issues
15:22:32 <tbarron> bswartz: didn't you talk with jungleboyj and dhellman about this?
15:22:36 <jungleboyj> tbarron:  let me look.
15:23:16 <bswartz> oh wait -- this is slightly different than what I assumed it was
15:23:24 <bswartz> yes, this topic was covered at the PTG
15:23:38 <jungleboyj> tbarron:  We talked about that in Denver.  I am trying to remember what we landed on.  Let me look at the notes.
15:23:46 <bswartz> IMO it's less of a bug than a new feature -- but either way it's high priority
15:24:21 <tbarron> presumably every openstack component has this issue so there should be some kind of tool and howto
15:24:33 <bswartz> even if cinder hasn't done this -- some other project has (I would guess nova at least) so we can go look for a model to replicate
15:24:53 <zhongjun> +1
15:24:54 <tbarron> I think we'll need to mark up our code and run a tool but that's all I know
15:25:18 <jungleboyj> Ok, found the notes.  So, there is a sphinxext plugin that I need to look at that should be able to automatically generate tables of config items for drivers in each release.
15:25:42 <dustins> jungleboyj: Mind if I assign you to the bug?
15:25:43 <tbarron> jungleboyj: do you know if any project has used it yet?
15:25:55 <jungleboyj> If, however, what you are looking to do is produce diffs of options, that is a different topic.  We have been relying on release notes to document those changes.
15:26:00 <zhongjun> jungleboyj: Is there a link?
15:26:09 <jungleboyj> tbarron:  I believe Nova has done it.
15:26:14 <jungleboyj> Hold on.
15:26:16 <tbarron> dustins: well, we can take the bug ourselves rather than the cinder PTL, but
15:26:24 <bswartz> dustins: I was joking about assigning this to jungleboyj
15:26:32 <dustins> Oh, haha
15:26:35 <bswartz> I'm sure he's busy enough
15:26:40 <tbarron> they will presumably be doing their own work in parallel
15:26:54 <bswartz> but I was serious about trying to just copy the work of either cinder or nova
15:27:06 <bswartz> ideally this shouldn't be a heavy lift
15:27:21 <jungleboyj> Notes with links are here:
15:27:27 <jungleboyj> #link  https://etherpad.openstack.org/p/cinder-ptg-queens-wednesday-notes
15:27:28 <tbarron> +1 we are short-handed so if anyone has a model to follow that will help
15:27:33 <jungleboyj> Example of how Nova is doing it:
15:27:47 <jungleboyj> #link https://github.com/openstack/nova/commit/83a9c2ac334712b27704a814552628cf0e536a85
15:28:00 <zhongjun> cool, we could just follow this
15:28:18 <gouthamr> relevant #link: https://docs.openstack.org/oslo.config/latest/reference/sphinxext.html
15:28:46 <jungleboyj> So, that gets the high level part done.  I still have to figure out if there is a way to do it per driver.
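[Editor's sketch, per the oslo.config sphinxext docs gouthamr linked: the integration is roughly a one-line change to the docs build, assuming manila's docs conf.py and its oslo.config.opts entry point follow the usual pattern:]

```python
# docs/source/conf.py -- enable oslo.config's Sphinx extension
extensions = [
    'oslo_config.sphinxext',
    # ... the project's existing extensions ...
]
```

[An RST page can then render option tables with the show-options directive, e.g. `.. show-options:: manila` (assuming "manila" is the registered oslo.config.opts namespace); generating tables per driver is the open question jungleboyj mentions above.]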
15:28:59 <bswartz> I copied those links to the bug
15:29:10 <bswartz> it still needs an owner, but now whoever takes it has a roadmap to follow
15:29:28 <dustins> And I added the information to the bug etherpad as well
15:30:04 <bswartz> dustins: it's preferable for the etherpad to have ephemeral information and for all long-term relevant info to be captured in LP
15:30:11 <jungleboyj> It is going to be a bit before I get to playing with this on the Cinder side.
15:30:14 <bswartz> they call it "etherpad" for a reason
15:30:45 <dustins> bswartz: Sure, but just in case someone looks at the etherpad for low-hanging fruit to get started with Manila, they can have some info
15:30:49 <jungleboyj> bswartz:  That is why I summarize the notes in our Wiki.  :-)
15:31:09 <dustins> Agreed that it should be in Launchpad definitively :)
15:31:18 <bswartz> may wiki live forever!
15:31:50 <bswartz> next?
15:32:10 <dustins> Next is: https://bugs.launchpad.net/manila/+bug/1713060
15:32:11 <openstack> Launchpad bug 1713060 in Manila "Changing service network mask breakes new service subnet creation" [Undecided,New]
15:32:38 <bswartz> ugh
15:32:49 <bswartz> this is unsupported intentionally
15:32:54 <dustins> Hehehe
15:33:10 <dustins> So Won't Fix/Not a Bug, then? :)
15:33:27 <bswartz> there are all kinds of configuration options, where if you change them after you've created some shares, your system will be totally hosed
15:33:48 <bswartz> we probably need to at least document which options those are
15:34:12 <bswartz> and consider preventing some of the worst problems
15:34:21 <tbarron> note that jan vondra filed this one and he has a number of patches up for the generic driver
15:34:32 <dustins> Sounds like a good idea
15:34:37 <tbarron> esp. for generic driver with cifs
15:34:52 <bswartz> yeah
15:35:00 <bswartz> the generic driver is pretty fragile
15:35:19 <bswartz> even the changes I'm working on don't solve this kind of problem
15:35:49 <ganso> bswartz: there is a config option for setting the subnet
15:35:54 <bswartz> yeah
15:36:00 <ganso> bswartz: if changing it does not work, the config option shouldn't exist
15:36:14 <bswartz> where neutron is involved, or anything related to networking really, we assume that the initial values won't ever change
15:36:14 <tbarron> bswartz: can you update that bug with your observations/reflections?
15:36:38 <tbarron> ganso: well it can have different initial values
15:37:10 <bswartz> tbarron: what ganso is getting at, and I think I agree, is that something which can't ever change should be entered into the database, not a config file
15:37:29 <tbarron> ganso: bswartz ack
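[Editor's sketch of the "put it in the database" idea being discussed (hypothetical helper; a dict stands in for the real DB): persist the value the first time it is used, and fail fast if the config file later disagrees rather than silently breaking subnet creation:]

```python
def effective_service_cidr(db, configured_cidr):
    # Pin the CIDR on first use; refuse to start if the config file
    # later disagrees, instead of breaking new service subnet creation.
    stored = db.get('service_network_cidr')
    if stored is None:
        db['service_network_cidr'] = configured_cidr
        return configured_cidr
    if stored != configured_cidr:
        raise RuntimeError(
            'service_network_cidr changed from %s to %s after service '
            'subnets were created; this is not supported'
            % (stored, configured_cidr))
    return stored
```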
15:37:50 <bswartz> so we may have some design flaws
15:37:54 <bswartz> I will update the bug
15:38:14 <bswartz> but absent some proposal to address these design issues, I don't see how to fix the bug
15:38:30 <bswartz> so it will most likely go to wontfix
15:38:30 <tbarron> maybe we'll get lucky and jan vondra will come up with an approach to the issue
15:38:36 <bswartz> yes it's possible
15:38:46 <bswartz> dustins: next?
15:38:50 <dustins> https://bugs.launchpad.net/manila/+bug/1709474
15:38:51 <openstack> Launchpad bug 1709474 in Manila "Can't reset state of a share server" [Undecided,New]
15:39:18 <dustins> This one just has to do with wanting reset operations for share servers in addition to shares themselves
15:40:00 <tbarron> how much cleanup would reset-state for share servers do?
15:40:15 <tbarron> is it *just* the DB change?
15:40:17 <gouthamr> nothing, it would only update the state on the DB
15:40:19 <gouthamr> yes
15:40:57 <tbarron> k
15:41:01 <bswartz> reset state was only ever supposed to hack the DB column right?
15:41:05 <gouthamr> yes
15:41:15 <bswartz> cleanup is explicitly not done in reset state
15:41:24 <tbarron> i'm just getting that on record :)
15:41:38 <gouthamr> :) it's a hack
15:41:39 <bswartz> if you want to cleanup, you use a non-forced delete
15:42:13 <bswartz> the main benefit of reset state, is that it allows you to retry a failed delete
15:42:27 <bswartz> increasing the chance of actually cleaning up a mess
15:42:58 <bswartz> ofc it also allows you to turn a small mess into a big mess
15:43:07 <bswartz> if used incorrectly
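[Editor's note: in other words, reset-state is nothing more than a status-column write. A minimal sketch, with a dict standing in for the DB rows:]

```python
def reset_status(rows, resource_id, new_status):
    # Admin-only escape hatch: e.g. flip 'error_deleting' back to 'error'
    # so a normal, non-forced delete can be retried and do real cleanup.
    # No backend resources are touched here.
    rows[resource_id]['status'] = new_status
```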
15:43:43 <tbarron> so this is a minor feature as we never said you can do this
15:43:56 <tbarron> but it's a reasonable expectation
15:44:15 <tbarron> what other resources can we not reset today?
15:44:20 <gouthamr> yes, if a share server is stuck in "creating" or "deleting" state (reasons abound) it can't be deleted via the manila API
15:44:20 <gouthamr> you'd have to resort to mucking with the database
15:44:22 <zhongjun> yeah, we cannot do any other operations except delete/list/show
15:44:29 <bswartz> gouthamr: is this an RFE masquerading as a bug?
15:45:03 <gouthamr> Yes, perhaps... can I not have it, then?
15:45:29 <tbarron> I don't think it requires a spec :)
15:45:35 <bswartz> I marked it wishlist
15:45:44 <bswartz> we'd need a use case I think
15:45:59 <tbarron> But it would be good to think about what other resources have the same issue.
15:46:02 <bswartz> to explain who the intended user is and what he's going to do with it
15:46:05 <gouthamr> it's on the bug.. i can add another one..
15:46:15 <gouthamr> yep
15:46:26 <bswartz> oh I had to read it a second time to see the use case
15:46:36 <bswartz> yes the RFE is clear
15:46:48 <bswartz> it feels low priority to me
15:47:07 <bswartz> the workaround is to use mysql
15:48:29 <bswartz> next?
15:48:33 <zhongjun> and we should figure out whether there's another way to solve this use case. Could we just use force delete, if the use case is just to delete it?
15:48:40 <dustins> Last one for today: https://bugs.launchpad.net/manila/+bug/1708491
15:48:41 <openstack> Launchpad bug 1708491 in Manila "tempest api migration tests very long" [Undecided,New]
15:48:56 <bswartz> zhongjun: gouthamr didn't want to delete it, he wanted to continue using it
15:48:57 <dustins> This one has to do with the speed (or perhaps the lack of it) with migration tests
15:49:29 <gouthamr> zhongjun: also, no force-delete on share-servers :)
15:50:01 <bswartz> tbarron: should we consider a separate job for migration tests?
15:50:16 <tbarron> gouthamr: probably wish-list bugs for each of these, they are relatively low-hanging fruit if still low-priority
15:50:28 <gouthamr> tbarron: +1 will add it
15:50:29 <bswartz> is there an advantage to running migration tests on every single job?
15:50:30 <tbarron> bswartz: I think we should look at this issue again after raissa
15:50:32 <zhongjun> gouthamr, bswartz: :)
15:50:47 <tbarron> has moved the tests to their own repo and we look at converting
15:51:00 <tbarron> the legacy jobs to the new zuulv3 format
15:51:22 <gouthamr> one reason they're on every job is because we have so many first party drivers and driver-modes
15:51:39 <tbarron> and in that last step maybe decide to take a different approach
15:51:40 <gouthamr> and they're not entirely "api" tests..
15:51:46 <bswartz> yeah but are we really testing anything unique in each case?
15:51:55 <bswartz> or is it the same code running over and over
15:52:14 <bswartz> I agree that drivers with assisted migration need to test that somehow
15:52:14 <ganso> bswartz: there are main tests
15:52:21 <tbarron> that's a good question
15:52:23 <ganso> bswartz: which guarantee that a migration works
15:52:24 <bswartz> but none of first party drivers do that
15:52:53 <ganso> bswartz: but there are other tests such as "migrate and extend", "migrate and create from snapshots" that guarantee that other functionality work with migration
15:53:19 <bswartz> what is really being tested though?
15:53:33 <bswartz> how is a migrate and extend different from a create new share and extend?
15:53:41 <bswartz> (for drivers that don't assist migration)
15:53:53 <ganso> it shouldn't be any different
15:54:01 <ganso> but we have seen cases where these things break
15:54:07 <gouthamr> the idea is to black-box the tests and not favor some over others based on how we know the implementation works.. :)
15:54:12 <bswartz> then there's not a lot of point in burning CPU cycles on running the tests for every job
15:54:30 <bswartz> gouthamr: in an ideal world with infinite CPU cycles, yes
15:54:37 <ganso> because drivers have to handle share_id, share_instance_id changes on migrated shares and these things can end up not being handled properly
15:54:40 <bswartz> this is a case of some unusually slow tests though
15:54:56 <bswartz> so it's reasonable to treat them specially
15:55:09 <ganso> we always migrate a 1 GB empty share
15:55:34 <tbarron> i'm less concerned with the cpu cycles than the human cycles sorting out the false negatives and rechecking on patches that have nothing to do with migration
15:55:49 <bswartz> if we can get the test's runtime down from 4 minutes to under a minute, I think that would be an acceptable fix too
15:56:12 <ganso> tbarron: if that's the case we should take a step back and investigate why they fail instead of rechecking
15:56:15 <bswartz> but assuming that's hard, we should consider running them in a different pipe
15:56:16 <tbarron> yeah, that would be best, I just don't know how to do that right off
15:56:24 <ganso> tbarron: migrations shouldn't be so fail-prone that we recheck until they work
15:56:37 <bswartz> they could still vote, but we could remove slowness as a cause of failures
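[Editor's sketch of one way to get that separate pipe: tag the migration scenarios with tempest's 'slow' attribute, so regular jobs exclude the tag by regex while a dedicated (still voting) job runs only the tagged tests. Class and test names here are illustrative:]

```python
from tempest.lib import decorators


class MigrationTest(object):  # stand-in for the real tempest base class

    @decorators.attr(type='slow')
    def test_migrate_and_extend(self):
        # Tagged 'slow': excluded from regular jobs via an attribute
        # regex, and run in a dedicated migration job instead.
        pass
```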
15:57:17 <gouthamr> could the slowness also be because of that periodic interval that can't be respected?
15:57:25 <tbarron> ganso: I agree, but often they take way too long and that's the reason for the failures, and the reason for the slowness is mysterious
15:57:26 <bswartz> that's a good point
15:57:37 <ganso> lately, especially with generic driver and lvm driver, I've seen only the migration tests failing, but most of the time they work
15:57:38 <gouthamr> or is that only the case with zfsonlinux, netapp, dummy drivers
15:57:48 <bswartz> the slowness might be something easy to address by speeding up some intervals
15:59:08 <ganso> bswartz, tbarron: I think we can attribute an "endurance test" characteristic to migration tests, and have them as a separate job. Those separate jobs should always pass; if they don't, then there's something wrong with the driver or storage, but we can separate the "flakiness" of those tests from the main jobs, which are voting and fail much less when they don't run migration tests
15:59:24 <bswartz> ganso: I agree with you that simply closing your eyes and rechecking is the wrong approach here
16:00:09 <gouthamr> time check
16:00:09 <gouthamr> so we need someone to investigate and dig into these failures?
16:00:13 <bswartz> ganso: that's more or less what I was proposing assuming we can't speed up the tests
16:00:14 <ganso> gouthamr: I've seen the periodic task running very fast on NetApp CI
16:00:21 <bswartz> we're out of time for today
16:00:36 <dustins> To the other channel!
16:00:43 <bswartz> this bug needs an owner and more discussion -- we can revisit it next week if nobody wants to grab it
16:00:47 <bswartz> thanks all
16:00:59 <bswartz> #endmeeting