15:00:21 <bswartz> #startmeeting manila
15:00:22 <openstack> Meeting started Thu Apr 14 15:00:21 2016 UTC and is due to finish in 60 minutes. The chair is bswartz. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:23 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:26 <openstack> The meeting name has been set to 'manila'
15:00:31 <bswartz> hello all
15:00:31 <cknight> Hi
15:00:32 <dustins> \o
15:00:33 <jseiler__> hi
15:00:33 <tbarron> hi
15:00:34 <gouthamr> hello o/
15:00:36 <zhongjun_> Hi
15:00:36 <vponomaryov> hello
15:00:37 <kaisers> hi
15:00:38 <ganso> hello!
15:00:44 <xyang1> hi
15:01:04 <bswartz> it's a good thing this is an IRC meeting, because I have laryngitis that won't quit
15:01:24 <tbarron> :( bad pollen?
15:01:38 * bswartz thinks it's virus related
15:01:41 <bswartz> anyways
15:01:46 <ganso> bswartz: I was like that for the past 2 weeks
15:01:51 <bswartz> a few topics today
15:01:57 <bswartz> #topic announcements
15:02:01 <dustins> bswartz: Could always bold or italicize :P
15:02:31 <bswartz> summit is in 2 weeks so we'll cancel the 4/28 meeting
15:02:42 <mkoderer> hello
15:02:46 <bswartz> the topics are selected on the etherpad, and I just need to transfer them into the official scheduling tool
15:03:02 <bswartz> we have a lot of spillover topics to cover on Friday afternoon
15:03:32 <gouthamr> #link https://etherpad.openstack.org/p/manila-newton-summit-topics
15:03:42 <markstur_> hi
15:03:55 <bswartz> you're still welcome to suggest more topics but I think we've got plenty to discuss already
15:04:30 <bswartz> As I mentioned last week, I think newton is not the right time to do a Europe-based midcycle meetup
15:04:54 <bswartz> so I'm looking for possible US-based locations that are willing to host
15:05:05 <bswartz> RTP (NetApp) will be willing to host again
15:05:17 <mkoderer> bswartz: ok
15:05:18 <bswartz> but we're open to other locations
15:05:40 <bswartz> we need to start planning dates for that, and get a location locked in
15:06:11 <bswartz> I'd like to have a list of locations/dates before the summit so we can poll for what people prefer
15:06:38 <bswartz> #topic Manage API retrying
15:06:48 <bswartz> gouthamr/nidhimittalhada: you here?
15:07:06 <gouthamr> bswartz: thanks. i can cover this..
15:07:25 <gouthamr> #link https://bugs.launchpad.net/manila/+bug/1565903
15:07:26 <openstack> Launchpad bug 1565903 in Manila "retry for manage share fails with an API error " [Undecided,In progress] - Assigned to Goutham Pacha Ravi (gouthamr)
15:07:49 <gouthamr> the manage API has retry logic
15:07:59 <gouthamr> #link https://github.com/gouthampacha/manila/blob/master/manila/share/api.py#L366
15:08:17 <bswartz> gouthamr: I think retry logic isn't an accurate way to portray it
15:08:24 <gouthamr> that's been broken since liberty (since share instances)
15:08:25 <bswartz> it's more like "reuse" if the admin retries the API
15:08:43 <gouthamr> bswartz: yes..
15:08:56 <bswartz> so the API service doesn't retry anything if the request fails
15:09:07 <gouthamr> it is trying to reuse the DB record created previously
15:09:29 <bswartz> but if the admin retries the API request, then the row in the database can get reused rather than forcing the admin to deleted the errored share first
15:09:42 <bswartz> s/deleted/delete/
15:09:54 <gouthamr> bswartz: true.. is that really helpful?
15:10:11 <gouthamr> I was wondering if we could remove that from the API..
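[For context, a minimal, self-contained sketch of the reuse-on-retry behavior being described: an in-memory dict stands in for the shares table, and none of the names below are manila's actual code.]

    # Illustrative only: the dict stands in for the shares table, and this
    # function is not manila's real manage API.
    SHARES = {}  # share_id -> share record


    def manage_share(export_path, host, options):
        # If a previous manage attempt left a row in 'error_managing' state
        # for the same export location and host, reuse that row instead of
        # making the admin delete the errored share and start over.
        for share in SHARES.values():
            if (share['export_path'] == export_path
                    and share['host'] == host
                    and share['status'] == 'error_managing'):
                share['status'] = 'managing'
                share['options'] = dict(options)
                return share
        # Otherwise create a fresh record, as a first attempt would.
        share_id = 'share-%d' % (len(SHARES) + 1)
        share = {'id': share_id, 'export_path': export_path, 'host': host,
                 'status': 'managing', 'options': dict(options)}
        SHARES[share_id] = share
        return share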
15:10:37 <gouthamr> and expect that the admin would delete the db entry and retry instead
15:11:05 <gouthamr> reason 1) There's no way to test this feature with tempest tests
15:11:12 <bswartz> I was just trying to characterize the old behavior before the bug was introduced
15:11:34 <bswartz> I'm curious to know how people feel about it
15:11:55 <vponomaryov> I am ok with removing "retry"
15:12:03 <gouthamr> #link https://bugs.launchpad.net/manila/+bug/1566504
15:12:04 <openstack> Launchpad bug 1566504 in Manila "Manage API attempts to 'retry' manage if the same parameters are used" [Undecided,New] - Assigned to NidhiMittalHada (nidhimittal19)
15:12:17 <bswartz> It was added as a convenience for admins, but it feels very anti-REST to me
15:12:26 <dustins> Yeah, I can see removing the "retry" logic
15:12:29 <gouthamr> vponomaryov: so can we do this without a microversion..
15:13:15 <bswartz> gouthamr: if we characterize the whole old behavior as a bug, then it can be fixed without changing the microversion
15:13:18 <vponomaryov> gouthamr: #link http://docs.openstack.org/developer/manila/devref/api_microversion_dev.html#when-do-i-need-a-new-microversion
15:14:03 <ganso> bswartz: looks like a bug to me
15:14:04 <gouthamr> thanks..
15:14:06 <bswartz> however if the bug just says we should restore the old behavior and make it work the way it used to, then changing it would require a new microversion (and the change would not be backportable)
15:14:42 <bswartz> IIRC it was u_glide who implemented the reuse/retry optimization
15:14:44 <gouthamr> bswartz: the "fix" to this issue is to deepcopy dictionaries in the DB helpers..
15:14:57 <bswartz> and he's no longer around to defend his decision
15:15:39 <gouthamr> bswartz: i would like to keep the fix nevertheless ... since it's always safe to copy your parameters and not modify them
15:15:43 <bswartz> so if nobody else speaks up in favor of preserving the reuse/retry logic I think we'll just rewrite the bug and say we want that logic removed
15:15:52 <ganso> gouthamr: I think those are also necessary fixes, although they can be masked if retry is removed, is that right?
15:16:09 <gouthamr> ganso: yep..
15:16:53 <bswartz> okay I'm not hearing any support
15:17:14 <gouthamr> will add to the bug report
15:17:15 <gouthamr> thanks
15:17:16 <bswartz> #agreed share manage should not attempt to reuse IDs of previously failed (error_managing) shares
15:17:29 <bswartz> #topic Tempest test structuring - resource sharing and cleanup issues
15:17:40 <bswartz> so akerr isn't able to attend today
15:17:54 <mkoderer> bswartz: do you know about the issues he wants to raise?
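[Closing out the previous topic: a minimal sketch of the deepcopy fix gouthamr referred to, assuming a hypothetical DB helper named share_create rather than manila's actual DB API.]

    import copy


    # Hypothetical DB helper, not manila's actual db_api: copy the caller's
    # dict before mutating it, so a retried API call can pass the same
    # parameters without having them silently modified.
    def share_create(values):
        values = copy.deepcopy(values)
        # Popping from the copy is safe; the caller's dict is untouched.
        instance_values = values.pop('share_instance', {})
        return {'share': values, 'instance': instance_values}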
15:17:58 <bswartz> I will introduce this topic briefly but we may need to save discussion for a time when he's around
15:18:31 <dustins> I know there was the patch he introduced about improving cleanup to not cascade failures
15:18:34 <vponomaryov> mkoderer: issue is to make cleanup failure for some resource appear only once
15:18:45 <bswartz> the problem is that our tempest tests do a lot at the class level rather than the instance level to facilitate resource sharing between tests
15:18:59 <vponomaryov> mkoderer: now every test's cleanup tries to remove a failed resource that could have been deleted before
15:19:08 <bswartz> this was considered a good idea for the generic driver because it's so slow and it was the only way to make our gate tests fast
15:19:23 <dustins> bswartz: But should probably be removed post haste
15:19:32 <bswartz> IMO, for every driver other than the generic driver, this is a bad way to do tests
15:19:44 <vponomaryov> #link https://review.openstack.org/#/c/304834/
15:19:56 <dustins> I'd argue that it's a bad way even for the generic driver, but a necessary evil
15:20:02 <bswartz> we should not share resources between test groups
15:20:05 <cknight> bswartz: +1
15:20:10 <bswartz> dustins: +1
15:20:46 <ganso> +1
15:20:55 <bswartz> so I'm still looking for a good solution to: "how do we make the generic driver able to run tempest tests quickly in a world where we don't share any resources between tests"
15:21:23 <dustins> The answer might be that we cannot
15:21:26 <bswartz> assuming we can solve that problem, then I'm strongly in favor of restructuring our tests to eliminate sharing of resources
15:21:38 <bswartz> perhaps not
15:21:51 <dustins> I'd hate for all of our tests to suffer just for the sake of the Generic Driver
15:22:04 <bswartz> I would be very curious to see an experiment where we run tempest tests on generic driver with no resource sharing
15:22:12 <dustins> But having a "fork" of our tests specifically for the Generic Driver isn't optimal either
15:22:26 <cknight> Once we have the new 1st-party drivers voting, can we run a smaller set of tests on Generic?
15:22:27 <bswartz> I'd like to know (1) how long it takes and (2) how much CPU/RAM/Storage gets used
15:22:34 <mkoderer> I think we should really find a better reference implementation
15:22:39 <dustins> mkoderer: +1
15:22:47 <bswartz> well yes we're working hard on new reference drivers
15:23:02 <bswartz> that's a necessary-but-not-sufficient solution to the problem
15:23:13 <dustins> bswartz: It would be an interesting experiment
15:23:24 <bswartz> after we have new reference drivers we will use those for voting gate jobs, but we will continue to support and test generic driver
15:23:48 <mkoderer> bswartz: but we could reduce the amount of tests for the generic driver
15:24:09 <bswartz> mkoderer: possibly, but not less than what we require of 3rd party vendor CI
15:24:56 <vponomaryov> bswartz: what is the goal that would be solved by removing "shared resources"?
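[A sketch of the class-level vs. per-test resource pattern under discussion, written with plain unittest rather than the actual tempest plugin code; create_share and delete_share are stand-ins for API calls.]

    import unittest


    def create_share():
        return {'id': 'share-1', 'status': 'available'}  # stand-in for an API call


    def delete_share(share):
        share['status'] = 'deleted'


    class SharedResourceTests(unittest.TestCase):
        """Class-level share: created once and reused by every test.

        Fast (one share per class), but a failure during creation, or a test
        that corrupts the share, takes down every test in the class.
        """

        @classmethod
        def setUpClass(cls):
            cls.share = create_share()

        @classmethod
        def tearDownClass(cls):
            delete_share(cls.share)

        def test_share_is_available(self):
            self.assertEqual('available', self.share['status'])


    class IsolatedResourceTests(unittest.TestCase):
        """Per-test share: slower, but a failure stays inside one test."""

        def setUp(self):
            self.share = create_share()
            self.addCleanup(delete_share, self.share)

        def test_share_is_available(self):
            self.assertEqual('available', self.share['status'])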
15:25:09 <bswartz> vponomaryov: test reliability
15:25:21 <dustins> Failures in one test not affecting any other
15:25:27 <bswartz> dustins: +1
15:26:12 <vponomaryov> bswartz, dustins: the problem that caused the errors will still exist
15:26:14 <bswartz> it will be easier to debug failures when a small bug causes exactly 1 failure in tempest rather than what happens today, where a small bug can cause a cascade of failures
15:26:33 <dustins> bswartz: exactly
15:26:49 <vponomaryov> bswartz: it is easier to update the "cleanup" approach
15:27:00 <bswartz> yes I think that was akerr's point
15:27:03 <vponomaryov> bswartz: than to stop using shared resources
15:27:19 <bswartz> the first small step to making things better is to improve cleanup
15:27:45 <dustins> Ultimately I'd like to see shared resources go away, but baby steps first
15:27:49 <bswartz> I'll let akerr speak to his goals around changing class methods to instance methods in the tempest code
15:28:05 <bswartz> that will have to wait for another meeting
15:28:52 <bswartz> it makes sense to me intuitively that resource sharing across tests is generally bad because it can cause a failure in 1 test to cause failures in other tests
15:29:14 <tbarron> +1
15:29:17 <bswartz> but I also get the benefit of speeding things up by not creating needless extra objects
15:29:24 <bswartz> so we have to find a balance there
15:29:37 <bswartz> I suspect we're currently too far on the sharing of resources side
15:29:49 <bswartz> that's all I'll say about this topic
15:29:53 <dustins> Indeed, make it as fast as possible while maintaining as much independence of resources as we can
15:30:03 <vponomaryov> we share them now so they cannot break each other running in parallel in any order
15:30:07 <vponomaryov> it is enough
15:30:24 <bswartz> vponomaryov: I think it depends on the type of object
15:30:35 <vponomaryov> 1 cleanup error or 10 - anyway we have a problem
15:31:07 <bswartz> I think you may be correct if you ignore problems in test cleanup, but I think many QA people believe it's wrong to ignore those cleanup errors
15:31:31 <dustins> Absolutely
15:31:44 <vponomaryov> bswartz: why ignore? they are analyzed easily
15:31:51 <vponomaryov> bswartz: that's the main thing
15:32:01 <bswartz> not by me
15:32:12 <bswartz> I have a hard time tracing down the reason tests fail
15:32:19 <dustins> Same
15:32:22 <bswartz> and I've been doing this for a while
15:32:29 <vponomaryov> because the cause lies not in tempest but in the services
15:32:34 <bswartz> I feel bad for first-time developers trying to debug a test failure
15:33:00 <cknight> Same here. A failing test should be caused by an issue with that test, not some other one.
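[One way to read the "improve cleanup first" point above: make each cleanup tolerate resources that are already gone, so a single failed resource is reported once rather than by every test that shared it. The NotFound exception and the delete call below are stand-ins, not the real tempest client.]

    class NotFound(Exception):
        """Stand-in for the client's 404 exception."""


    def ignore_not_found(delete_call, *args, **kwargs):
        # Tolerate resources that an earlier cleanup (or another test)
        # already removed; re-reporting them would only duplicate the
        # original failure in every test that shared the resource.
        try:
            delete_call(*args, **kwargs)
        except NotFound:
            pass

A test would then register its cleanups as self.addCleanup(ignore_not_found, client.delete_share, share['id']), where client.delete_share is again a hypothetical stand-in.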
15:33:02 <dustins> Or the testers trying to figure out if something's actually wrong or if something just randomly failed
15:33:02 <vponomaryov> it is a question of cleanup, not shared resources
15:33:26 <dustins> vponomaryov: I'd argue that it's a bit of both
15:33:45 <bswartz> yeah I think we need to discuss the various proposed changes on a case by case basis
15:33:47 <vponomaryov> dustins: current problem in "cleanup" is python-specific
15:34:06 <bswartz> some sharing is probably acceptable
15:34:12 <vponomaryov> dustins: we have false cleanup errors on "not-really-shared-resources"
15:34:18 <bswartz> but some of it definitely causes issues
15:34:25 <vponomaryov> dustins: just because of python specifics
15:34:49 <dustins> vponomaryov: I'm not sure I follow, I'll chat with you afterwards to see if we can get on the same page
15:34:52 <gouthamr> vponomaryov: we've never seen those on the NetApp CI
15:35:03 <gouthamr> vponomaryov: false cleanup errors..
15:35:07 <bswartz> so I'd like to revisit this topic with vponomaryov, akerr, and dustins sometime soon
15:35:17 <bswartz> perhaps next meeting or perhaps earlier than that if we can find a time
15:35:38 <bswartz> for now let's move on
15:35:40 <vponomaryov> gouthamr: by "false" I mean "duplication of failure again and again"
15:35:42 * gouthamr adds himself as interested party :P
15:35:57 <bswartz> gouthamr: I won't exclude anybody
15:36:06 <bswartz> #topic Snapshot support common capability in the documentation
15:36:14 <bswartz> gouthamr: you are up again
15:36:27 <gouthamr> alright.. i saw this bug #link https://bugs.launchpad.net/manila/+bug/1506863
15:36:28 <openstack> Launchpad bug 1506863 in Manila "'snapshot_support' capability absent in appropriate dev doc" [High,New]
15:36:57 <bswartz> gah!
15:36:57 <gouthamr> and i decided to let some part of it piggyback on my changes to that file.. since it's a doc change..
15:37:07 <gouthamr> https://review.openstack.org/#/c/300018/5/doc/source/devref/share_back_ends_feature_support_mapping.rst
15:37:37 <gouthamr> but then, i got some concerns about when a driver should advertise support for this capability
15:37:40 <bswartz> that's an unfortunate bug
15:37:44 <gouthamr> :(
15:37:59 <bswartz> gouthamr: please don't piggyback multiple changes
15:38:08 <gouthamr> bswartz: now i know.
15:38:41 <gouthamr> bswartz: so i thought i'd let you guys decide what's to be done about snapshot_support in that table.
15:38:47 <bswartz> the only good reason to squash unrelated changes is if there are 2 or more gate-blocker bugs that need fixing simultaneously
15:39:52 <bswartz> I thought we were going to get rid of that table...
15:39:53 <gouthamr> bswartz: i can take it out.. but i'm still interested in knowing how we want that column to read..
15:40:02 <gouthamr> bswartz: that could be one soluion
15:40:23 <gouthamr> s/soluion/solution
15:40:36 <bswartz> we need to capture the relevant info somewhere before we remove it
15:41:09 <bswartz> but let's not spend too much time arguing about how the table should look if it's planned to be deleted
15:41:39 <gouthamr> suggestion: the OpenStack Configuration Reference: http://docs.openstack.org/mitaka/config-reference/
15:42:15 <gouthamr> bswartz: i know there's value in maintaining a table with checkmarks showing whether a feature is currently supported
15:42:32 <bswartz> yes
15:42:33 <gouthamr> it's valuable to developers, deployers and users..
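[On the question raised above of when a driver should advertise the capability: a hedged sketch of a back end reporting snapshot_support through its stats. The class below only loosely mirrors manila's driver interface and is not the exact base-class API.]

    class ExampleShareDriver(object):
        """Illustrative driver, not a real manila back end."""

        def _update_share_stats(self):
            # Only advertise snapshot_support if the back end actually
            # implements the snapshot operations; the scheduler uses this
            # common capability to place shares that require snapshots.
            self._stats = {
                'share_backend_name': 'EXAMPLE',
                'driver_version': '1.0',
                'storage_protocol': 'NFS_CIFS',
                'snapshot_support': True,
            }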
15:42:39 <bswartz> just not in the dev ref
15:42:54 <bswartz> users and deployers should never find anything useful in the dev ref
15:43:03 <gouthamr> true
15:43:20 <bswartz> if they do, it's because we failed to put useful information in the actual docs
15:43:56 <dustins> Well, not ONLY in the devref :)
15:44:08 <dustins> But agreed
15:44:10 <bswartz> dustins: you know what I mean
15:44:33 <dustins> Indeed
15:44:39 <bswartz> gouthamr: does that work for you?
15:44:44 <gouthamr> we can move this to config-ref..
15:44:52 <bswartz> we can plan to move it
15:45:04 <bswartz> but for your change, just avoid that issue entirely
15:45:17 <gouthamr> bswartz: for the review in question, i would like to remove the snapshot_support column but retain the replication_type column
15:45:28 <gouthamr> bswartz: or do i not bother about that column too? ^
15:45:44 <bswartz> I would ignore it as not relevant
15:45:59 <bswartz> start another change to deal with it specifically
15:46:12 <gouthamr> sure thing. thanks for the clarification!
15:46:41 <bswartz> okay!
15:46:45 <bswartz> #topic open discussion
15:47:08 <bswartz> anything else today?
15:48:10 <bswartz> okay thanks everyone!
15:48:19 <bswartz> #endmeeting