15:00:21 <bswartz> #startmeeting manila
15:00:22 <openstack> Meeting started Thu Apr 14 15:00:21 2016 UTC and is due to finish in 60 minutes.  The chair is bswartz. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:23 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:26 <openstack> The meeting name has been set to 'manila'
15:00:31 <bswartz> hello all
15:00:31 <cknight> Hi
15:00:32 <dustins> \o
15:00:33 <jseiler__> hi
15:00:33 <tbarron> hi
15:00:34 <gouthamr> hello o/
15:00:36 <zhongjun_> Hi
15:00:36 <vponomaryov> hello
15:00:37 <kaisers> hi
15:00:38 <ganso> hello!
15:00:44 <xyang1> hi
15:01:04 <bswartz> it's a good thing this is an IRC meeting, because I have laryngitis that won't quit
15:01:24 <tbarron> :(  bad pollen?
15:01:38 * bswartz thinks it's virus related
15:01:41 <bswartz> anyways
15:01:46 <ganso> bswartz: I was like that for the past 2 weeks
15:01:51 <bswartz> a few topics today
15:01:57 <bswartz> #topic announcements
15:02:01 <dustins> bswartz: Could always bold or italicize :P
15:02:31 <bswartz> summit is in 2 weeks so we'll cancel the 4/28 meeting
15:02:42 <mkoderer> hello
15:02:46 <bswartz> the topics are selected on the etherpad, and I just need to transfer them into the official scheduling tool
15:03:02 <bswartz> we have a lot of spillover topics to cover on Friday afternoon
15:03:32 <gouthamr> #link https://etherpad.openstack.org/p/manila-newton-summit-topics
15:03:42 <markstur_> hi
15:03:55 <bswartz> you're still welcome to suggest more topics but I think we've got plenty to discuss already
15:04:30 <bswartz> As I mentioned last week, I think newton is not the right time to do a Europe-based midcycle meetup
15:04:54 <bswartz> so I'm looking for possible US-based locations that are willing to host
15:05:05 <bswartz> RTP (NetApp) will be willing to host again
15:05:17 <mkoderer> bswartz: ok
15:05:18 <bswartz> but we're open to other locations
15:05:40 <bswartz> we need to start planning dates for that, and get a location locked in
15:06:11 <bswartz> I'd like to have a list of locations/dates before the summit so we can poll for what people prefer
15:06:38 <bswartz> #topic Manage API retrying
15:06:48 <bswartz> gouthamr/nidhimittalhada: you here?
15:07:06 <gouthamr> bswartz: thanks. i can cover this..
15:07:25 <gouthamr> #link https://bugs.launchpad.net/manila/+bug/1565903
15:07:26 <openstack> Launchpad bug 1565903 in Manila "retry for manage share fails with an API error " [Undecided,In progress] - Assigned to Goutham Pacha Ravi (gouthamr)
15:07:49 <gouthamr> the manage API has a retry logic
15:07:59 <gouthamr> #link https://github.com/gouthampacha/manila/blob/master/manila/share/api.py#L366
15:08:17 <bswartz> gouthamr: I think retry logic isn't an accurate way to portray it
15:08:24 <gouthamr> that's broken since liberty (since share instances)
15:08:25 <bswartz> it's more like "reuse" if the admin retries the API
15:08:43 <gouthamr> bswartz: yes..
15:08:56 <bswartz> so the API service doesn't retry anything if the request fails
15:09:07 <gouthamr> it is trying to reuse the DB record created previously
15:09:29 <bswartz> but if the admin retries the API request, then the row in the database can get reused rather than forcing the admin to delete the errored share first
15:09:54 <gouthamr> bswartz: true.. is that really helpful?
15:10:11 <gouthamr> I was wondering if we could remove that from the API..
15:10:37 <gouthamr> and expect that the admin would delete the db entry and retry instead
15:11:05 <gouthamr> reason 1) There's no way to test this feature with tempest tests
15:11:12 <bswartz> I was just trying to characterize the old behavior before the bug was introduced
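(For context, the pre-bug behaviour bswartz describes amounts to roughly the following sketch. This is not the actual manila code; _find_failed_manage_attempt is a hypothetical helper standing in for whatever lookup the API layer performed.)

    def manage(self, context, host, export_path, share_data):
        """Sketch of the old 'reuse on retry' path in the manage API."""
        existing = self._find_failed_manage_attempt(context, host, export_path)
        if existing:
            # Reuse the DB row left over from the earlier failed attempt
            # instead of forcing the admin to delete the errored share first.
            share = self.db.share_update(context, existing['id'], share_data)
        else:
            share = self.db.share_create(context, share_data)
        # ...then cast to the share manager to perform the actual manage...
        return share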
15:11:34 <bswartz> I'm curious to know how people feel about it
15:11:55 <vponomaryov> I am ok removing "retry"
15:12:03 <gouthamr> #link https://bugs.launchpad.net/manila/+bug/1566504
15:12:04 <openstack> Launchpad bug 1566504 in Manila "Manage API attempts to 'retry' manage if the same parameters are used" [Undecided,New] - Assigned to NidhiMittalHada (nidhimittal19)
15:12:17 <bswartz> It was added as a convenience for admins, but it feels very anti-REST to me
15:12:26 <dustins> Yeah, I can see removing the "retry" logic
15:12:29 <gouthamr> vponomaryov: so can we do this without a microversion..
15:13:15 <bswartz> gouthamr: if we characterize the whole old behavior as a bug, then it can be fixed without changing the microversion
15:13:18 <vponomaryov> gouthamr: #link http://docs.openstack.org/developer/manila/devref/api_microversion_dev.html#when-do-i-need-a-new-microversion
15:14:03 <ganso> bswartz: looks like a bug to me
15:14:04 <gouthamr> thanks..
15:14:06 <bswartz> however if the bug just says we should restore the old behavior and make it work the way it used to, then changing it would require a new microversion (and the change would not be backportable)
15:14:42 <bswartz> IIRC it was u_glide who implemented the reuse/retry optimization
15:14:44 <gouthamr> bswartz: the "fix" to this issue is to deepcopy dictionaries in the DB helpers..
15:14:57 <bswartz> and he's no longer around to defend his decision
15:15:39 <gouthamr> bswartz: i would like to keep the fix nevertheless ... since it's always safe to copy your parameters and not modify them
15:15:43 <bswartz> so if nobody else speaks up in favor of preserving the reuse/retry logic I think we'll just rewrite the bug and say we want that logic removed
15:15:52 <ganso> gouthamr: I think those are also necessary fixes, although they can be masked if retry is removed, is that right?
15:16:09 <gouthamr> ganso: yep..
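(The dict-mutation issue gouthamr mentions is, in essence, fixed by a defensive copy inside the DB helper, so the caller's parameters survive intact for a retry. The function and key names below are illustrative, not the actual patch.)

    import copy

    def share_create(context, share_values):
        # Work on a private copy so that popping keys for the share-instance
        # record does not mutate the dict the API layer passed in and may
        # reuse if the admin retries the manage call.
        values = copy.deepcopy(share_values)
        instance_values = values.pop('share_instance', {})
        # ...create the share row from `values` and the instance row from
        # `instance_values` as before...
        return values, instance_values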
15:16:53 <bswartz> okay I'm not hearing any support
15:17:14 <gouthamr> will add to the bug report
15:17:15 <gouthamr> thanks
15:17:16 <bswartz> #agreed share manage should not attempt to reuse IDs of previously failed (error_managing) shares
15:17:29 <bswartz> #topic Tempest test structuring - resource sharing and cleanup issues
15:17:40 <bswartz> so akerr isn't able to attend today
15:17:54 <mkoderer> bswartz: do you know about the issues he wants to raise?
15:17:58 <bswartz> I will introduce this topic briefly but we may need to save discussion for a time when he's around
15:18:31 <dustins> I know there was the patch he introduced about improving cleanup to not cascade failures
15:18:34 <vponomaryov> mkoderer: the issue is to make a cleanup failure for a given resource appear only once
15:18:45 <bswartz> the problem is that our tempest tests do a lot at the class level rather than the instance level to facilitate resource sharing between tests
15:18:59 <vponomaryov> mkoderer: right now every test's cleanup tries to remove the failed resource even though it may already have been deleted
15:19:08 <bswartz> this was considered a good idea for the generic driver because it's so slow and it was the only way to make our gate tests fast
15:19:23 <dustins> bswartz: But should probably be removed post haste
15:19:32 <bswartz> IMO, for every driver other than the generic driver, this is a bad way to do tests
15:19:44 <vponomaryov> #link https://review.openstack.org/#/c/304834/
15:19:56 <dustins> I'd argue that it's a bad way even for the generic driver, but a necessary evil
15:20:02 <bswartz> we should not share resources between test groups
15:20:05 <cknight> bswartz: +1
15:20:10 <bswartz> dustins: +1
15:20:46 <ganso> +1
15:20:55 <bswartz> so I'm still looking for a good solution to: "how do we make the generic driver able to run tempest tests quickly in a world where we don't share any resources between tests"
15:21:23 <dustins> The answer might be that we cannot
15:21:26 <bswartz> assuming we can solve that problem, then I'm strongly in favor of restructuring our tests to eliminate sharing of resources
15:21:38 <bswartz> perhaps not
15:21:51 <dustins> I'd hate for all of our tests to suffer just for the sake of the Generic Driver
15:22:04 <bswartz> I would be very curious to see an experiment where we run tempest tests on generic driver with no resource sharing
15:22:12 <dustins> But having a "fork" of our tests specifically for the Generic Driver isn't optimal either
15:22:26 <cknight> Once we have the new 1st-party drivers voting, can we run a smaller set of tests on Generic?
15:22:27 <bswartz> I'd like to know (1) how long it takes and (2) how much CPU/RAM/storage gets used
15:22:34 <mkoderer> I think we should really find a better reference implementation
15:22:39 <dustins> mkoderer: +1
15:22:47 <bswartz> well yes, we're working hard on new reference drivers
15:23:02 <bswartz> that's a necessary-but-not-sufficient solution to the problem
15:23:13 <dustins> bswartz: It would be an interesting experiment
15:23:24 <bswartz> after we have new reference drivers we will use those for voting gate jobs, but we will continue to support and test generic driver
15:23:48 <mkoderer> bswartz: but we could reduce the number of tests for the generic driver
15:24:09 <bswartz> mkoderer: possibly, but not less than what we require of 3rd party vendor CI
15:24:56 <vponomaryov> bswartz: what is the goal of removing "shared resources"?
15:25:09 <bswartz> vponomaryov: test reliability
15:25:21 <dustins> Failures in one test not affecting any other
15:25:27 <bswartz> dustins: +1
15:26:12 <vponomaryov> bswartz, dustins: the problem that caused the errors will still exist
15:26:14 <bswartz> it will be easier to debug failures when a small bug causes exactly 1 failure in tempest rather than what happens today, where a small bug can cause a cascade of failures
15:26:33 <dustins> bswartz: exactly
15:26:49 <vponomaryov> bswartz: it is easier to update the "cleanup" approach
15:27:00 <bswartz> yes I think that was akerr's point
15:27:03 <vponomaryov> bswartz: than to stop using shared resources
15:27:19 <bswartz> the first small step to making things better is improve cleanup
15:27:45 <dustins> Ultimately I'd like to see shared resources go away, but baby steps first
15:27:49 <bswartz> I'll let akerr speak to his goals around changing class methods to instance methods in the tempest code
15:28:05 <bswartz> that will have to wait for another meeting
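(A minimal sketch of the class-level vs. per-test patterns under discussion. BaseSharesTest, resource_setup, and create_share are assumed from the manila tempest plugin; the cleanup_in_class flag and the class names are illustrative, and this is not akerr's proposal itself.)

    from manila_tempest_tests.tests.api import base

    class SharedShareTest(base.BaseSharesTest):
        @classmethod
        def resource_setup(cls):
            super(SharedShareTest, cls).resource_setup()
            # One share per class, reused by every test: fast on the generic
            # driver, but a failure here (or in its cleanup) cascades to all
            # tests in the class.
            cls.share = cls.create_share()

    class IsolatedShareTest(base.BaseSharesTest):
        def setUp(self):
            super(IsolatedShareTest, self).setUp()
            # One share per test: slower, but a failure stays confined to the
            # test that actually hit it.
            self.share = self.create_share(cleanup_in_class=False)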
15:28:52 <bswartz> it makes sense to me intuitively that resource sharing across tests is generally bad because it can cause a failure in 1 test to cause failures in other tests
15:29:14 <tbarron> +1
15:29:17 <bswartz> but I also get the benefit of speeding things up by not creating needless extra objects
15:29:24 <bswartz> so we have to find a balance there
15:29:37 <bswartz> I suspect we're currently too far on the sharing of resources side
15:29:49 <bswartz> that's all I'll say about this topic
15:29:53 <dustins> Indeed, make it as fast as possible while maintaining as much independence of resources as we can
15:30:03 <vponomaryov> we share them now in such a way that they cannot break each other running in parallel in any order
15:30:07 <vponomaryov> it is enough
15:30:24 <bswartz> vponomaryov: I think it depends on the type of object
15:30:35 <vponomaryov> 1 cleanup error or 10 - either way we have a problem
15:31:07 <bswartz> I think you may be correct if you ignore problems in test cleanup, but I think many QA people believe it's wrong to ignore those cleanup errors
15:31:31 <dustins> Absolutely
15:31:44 <vponomaryov> bswartz: why ignore? they are analyzed easily
15:31:51 <vponomaryov> bswartz: that's the main thing
15:32:01 <bswartz> not by me
15:32:12 <bswartz> I have a hard time tracing down the reason tests fail
15:32:19 <dustins> Same
15:32:22 <bswartz> and I've been doing this for a while
15:32:29 <vponomaryov> because it lies not in tempest but in the services
15:32:34 <bswartz> I feel bad for first time developers trying to debug a test failure
15:33:00 <cknight> Same here.  A failing test should be caused by an issue with that test, not some other one.
15:33:02 <dustins> Or the testers trying to figure out if something's actually wrong or if something just randomly failed
15:33:02 <vponomaryov> it is a question of cleanup, not of shared resources
15:33:26 <dustins> vponomaryov: I'd argue that it's a bit of both
15:33:45 <bswartz> yeah I think we need to discuss the various proposed changes on a case by case basis
15:33:47 <vponomaryov> dustins: the current problem in "cleanup" is python-specific
15:34:06 <bswartz> some sharing is probably acceptable
15:34:12 <vponomaryov> dustins: we have false cleanup errors on "not-really-shared-resources"
15:34:18 <bswartz> but some of it definitely causes issues
15:34:25 <vponomaryov> dustins: just because of python specifics
15:34:49 <dustins> vponomaryov: I'm not sure I follow, I'll chat with you afterwards to see if we can get on the same page
15:34:52 <gouthamr> vponomaryov: we've never seen those on the NetApp CI
15:35:03 <gouthamr> vponomaryov: false cleanup errors..
15:35:07 <bswartz> so I'd like to revisit this topic with vponomaryov, akerr, and dustins sometime soon
15:35:17 <bswartz> perhaps next meeting or perhaps earlier than that if we can find a time
15:35:38 <bswartz> for now let's move on
15:35:40 <vponomaryov> gouthamr: by "false" I mean "duplication of failure again and again"
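(One way to read "appear only once" is to make cleanup tolerant of resources another test's cleanup already removed. The sketch below shows the idea only; it is not the patch under review, and the client method names are assumptions.)

    from tempest.lib import exceptions as lib_exc

    def safe_delete_share(shares_client, share_id):
        # Swallow NotFound so a share that was already deleted (or never
        # created) does not re-report the same failure in the cleanup of
        # every test that shares it.
        try:
            shares_client.delete_share(share_id)
            shares_client.wait_for_resource_deletion(share_id)
        except lib_exc.NotFound:
            pass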
15:35:42 * gouthamr adds himself as interested party :P
15:35:57 <bswartz> gouthamr: I won't exclude anybody
15:36:06 <bswartz> #topic Snapshot support common capability in the documentation
15:36:14 <bswartz> gouthamr: you are up again
15:36:27 <gouthamr> alright.. i saw this bug #link https://bugs.launchpad.net/manila/+bug/1506863
15:36:28 <openstack> Launchpad bug 1506863 in Manila "'snapshot_support' capability absent in appropriate dev doc" [High,New]
15:36:57 <bswartz> gah!
15:36:57 <gouthamr> and i decided to let some part of it piggyback on my changes to that file.. since it's a doc change..
15:37:07 <gouthamr> https://review.openstack.org/#/c/300018/5/doc/source/devref/share_back_ends_feature_support_mapping.rst
15:37:37 <gouthamr> but then, i got some concerns about when a driver should advertise support for this capability
15:37:40 <bswartz> that's an unfortunate bug
15:37:44 <gouthamr> :(
15:37:59 <bswartz> gouthamr: please don't piggyback multiple changes
15:38:08 <gouthamr> bswartz: now i know.
15:38:41 <gouthamr> bswartz: so i thought i'd let you guys decide what's to be done about snapshot_support in that table.
15:38:47 <bswartz> the only good reason to squash unrelated changes is if there are 2 or more gate-blocker bugs that need fixing simultaneously
15:39:52 <bswartz> I thought we were going to get rid of that table...
15:39:53 <gouthamr> bswartz: i can take it out.. but i'm still interested in knowing how we want that column to read..
15:40:02 <gouthamr> bswartz: that could be one solution
15:40:36 <bswartz> we need to capture the relevant info somewhere before we remove it
15:41:09 <bswartz> but let's not spend too much time arguing about how the table should look if it's planned to be deleted
15:41:39 <gouthamr> suggestion: the OpenStack Configuration Reference : http://docs.openstack.org/mitaka/config-reference/
15:42:15 <gouthamr> bswartz: i know there's value in maintaining a table with checkmarks showing whether a feature is currently supported
15:42:32 <bswartz> yes
15:42:33 <gouthamr> it's valuable to developers, deployers and users..
15:42:39 <bswartz> just not in the dev ref
15:42:54 <bswartz> users and deployers should never find anything useful in the dev ref
15:43:03 <gouthamr> true
15:43:20 <bswartz> if they do, it's because we failed to put useful information in the actual docs
15:43:56 <dustins> Well, not ONLY in the devref :)
15:44:08 <dustins> But agreed
15:44:10 <bswartz> dustins: you know what I mean
15:44:33 <dustins> Indeed
15:44:39 <bswartz> gouthamr: does that work for you?
15:44:44 <gouthamr> we can move this to config-ref..
15:44:52 <bswartz> we can plan to move it
15:45:04 <bswartz> but for your change, just avoid that issue entirely
15:45:17 <gouthamr> bswartz: for the review in question, i would like to remove the snapshot_support column but retain the replication_type column
15:45:28 <gouthamr> bswartz: or do i not bother about that column too? ^
15:45:44 <bswartz> I would ignore it as not relevant
15:45:59 <bswartz> start another change to deal with it specifically
15:46:12 <gouthamr> sure thing. thanks for the clarification!
15:46:41 <bswartz> okay!
15:46:45 <bswartz> #topic open discussion
15:47:08 <bswartz> anything else today?
15:48:10 <bswartz> okay thanks everyone!
15:48:19 <bswartz> #endmeeting