15:00:21 #startmeeting manila
15:00:22 Meeting started Thu Apr 14 15:00:21 2016 UTC and is due to finish in 60 minutes. The chair is bswartz. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:23 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:26 The meeting name has been set to 'manila'
15:00:31 hello all
15:00:31 Hi
15:00:32 \o
15:00:33 hi
15:00:33 hi
15:00:34 hello o/
15:00:36 Hi
15:00:36 hello
15:00:37 hi
15:00:38 hello!
15:00:44 hi
15:01:04 it's a good thing this is an IRC meeting, because I have laryngitis that won't quit
15:01:24 :( bad pollen?
15:01:38 * bswartz thinks it's virus related
15:01:41 anyways
15:01:46 bswartz: I was like that for the past 2 weeks
15:01:51 a few topics today
15:01:57 #topic announcements
15:02:01 bswartz: Could always bold or italicize :P
15:02:31 summit is in 2 weeks so we'll cancel the 4/28 meeting
15:02:42 hello
15:02:46 the topics are selected on the etherpad, and I just need to transfer them into the official scheduling tool
15:03:02 we have a lot of spillover topics to cover on Friday afternoon
15:03:32 #link https://etherpad.openstack.org/p/manila-newton-summit-topics
15:03:42 hi
15:03:55 you're still welcome to suggest more topics but I think we've got plenty to discuss already
15:04:30 As I mentioned last week, I think Newton is not the right time to do a Europe-based midcycle meetup
15:04:54 so I'm looking for possible US-based locations that are willing to host
15:05:05 RTP (NetApp) will be willing to host again
15:05:17 bswartz: ok
15:05:18 but we're open to other locations
15:05:40 we need to start planning dates for that, and get a location locked in
15:06:11 I'd like to have a list of locations/dates before the summit so we can poll for what people prefer
15:06:38 #topic Manage API retrying
15:06:48 gouthamr/nidhimittalhada: you here?
15:07:06 bswartz: thanks. i can cover this..
15:07:25 #link https://bugs.launchpad.net/manila/+bug/1565903
15:07:26 Launchpad bug 1565903 in Manila "retry for manage share fails with an API error" [Undecided,In progress] - Assigned to Goutham Pacha Ravi (gouthamr)
15:07:49 the manage API has retry logic
15:07:59 #link https://github.com/gouthampacha/manila/blob/master/manila/share/api.py#L366
15:08:17 gouthamr: I think retry logic isn't an accurate way to portray it
15:08:24 that's broken since Liberty (since share instances)
15:08:25 it's more like "reuse" if the admin retries the API
15:08:43 bswartz: yes..
15:08:56 so the API service doesn't retry anything if the request fails
15:09:07 it is trying to reuse the DB record created previously
15:09:29 but if the admin retries the API request, then the row in the database can get reused rather than forcing the admin to delete the errored share first
15:09:54 bswartz: true.. is that really helpful?
15:10:11 I was wondering if we could remove that from the API..
15:10:37 and expect that the admin would delete the db entry and retry instead
15:11:05 reason 1) There's no way to test this feature with tempest tests
15:11:12 I was just trying to characterize the old behavior before the bug was introduced
15:11:34 I'm curious to know how people feel about it
15:11:55 I am ok removing "retry"
15:12:03 #link https://bugs.launchpad.net/manila/+bug/1566504
15:12:04 Launchpad bug 1566504 in Manila "Manage API attempts to 'retry' manage if the same parameters are used" [Undecided,New] - Assigned to NidhiMittalHada (nidhimittal19)
15:12:17 It was added as a convenience for admins, but it feels very anti-REST to me
15:12:26 Yeah, I can see removing the "retry" logic
15:12:29 vponomaryov: so can we do this without a microversion..
15:13:15 gouthamr: if we characterize all of the old behavior as a bug, then it can be fixed without changing the microversion
15:13:18 gouthamr: #link http://docs.openstack.org/developer/manila/devref/api_microversion_dev.html#when-do-i-need-a-new-microversion
15:14:03 bswartz: looks like a bug to me
15:14:04 thanks..
15:14:06 however if the bug just says we should restore the old behavior and make it work the way it used to, then changing it would require a new microversion (and the change would not be backportable)
15:14:42 IIRC it was u_glide who implemented the reuse/retry optimization
15:14:44 bswartz: the "fix" to this issue is to deepcopy dictionaries in the DB helpers..
15:14:57 and he's no longer around to defend his decision
15:15:39 bswartz: i would like to keep the fix nevertheless ... since it's always safe to copy your parameters and not modify them
15:15:43 so if nobody else speaks up in favor of preserving the reuse/retry logic I think we'll just rewrite the bug and say we want that logic removed
15:15:52 gouthamr: I think those are also necessary fixes, although they can be masked if retry is removed, is that right?
15:16:09 ganso: yep..
15:16:53 okay I'm not hearing any support
15:17:14 will add to the bug report
15:17:15 thanks
15:17:16 #agreed share manage should not attempt to reuse IDs of previously failed (error_managing) shares
15:17:29 #topic Tempest test structuring - resource sharing and cleanup issues
15:17:40 so akerr isn't able to attend today
15:17:54 bswartz: do you know about the issues he wants to raise?
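
To illustrate the fix gouthamr mentions above (deepcopying dictionaries in the DB helpers), here is a minimal, self-contained Python sketch. It is not Manila's actual DB layer; create_share_record and the fields it touches are hypothetical stand-ins used only to show why copying caller-supplied dicts matters.

    # Minimal sketch of the deepcopy pattern discussed above; hypothetical
    # helper, not Manila's real code. If the helper mutated the caller's
    # dict in place, a retry that reuses the same parameters would see
    # already-modified data.
    import copy


    def create_share_record(values):
        # Copy first so the caller's dict is never modified by this helper.
        values = copy.deepcopy(values)
        values.setdefault('status', 'creating')  # helper-side normalization
        values.pop('metadata', None)             # e.g. stored in another table
        return values                            # stand-in for the DB row


    original = {'display_name': 'share1', 'metadata': {'k': 'v'}}
    create_share_record(original)
    assert 'metadata' in original  # the caller's input is left untouched
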
15:17:58 I will introduce this topic briefly but we may need to save discussion for a time when he's around
15:18:31 I know there was the patch he introduced about improving cleanup to not cascade failures
15:18:34 mkoderer: issue is to make cleanup failure for some resource appear only once
15:18:45 the problem is that our tempest tests do a lot at the class level rather than the instance level to facilitate resource sharing between tests
15:18:59 mkoderer: now every test cleanup tries to remove a failed thing that could have been deleted before
15:19:08 this was considered a good idea for the generic driver because it's so slow and it was the only way to make our gate tests fast
15:19:23 bswartz: But should probably be removed post haste
15:19:32 IMO, for every driver other than the generic driver, this is a bad way to do tests
15:19:44 #link https://review.openstack.org/#/c/304834/
15:19:56 I'd argue that it's a bad way even for the generic driver, but a necessary evil
15:20:02 we should not share resources between test groups
15:20:05 bswartz: +1
15:20:10 dustins: +1
15:20:46 +1
15:20:55 so I'm still looking for a good solution to: "how do we make the generic driver able to run tempest tests quickly in a world where we don't share any resources between tests"
15:21:23 The answer might be that we cannot
15:21:26 assuming we can solve that problem, then I'm strongly in favor of restructuring our tests to eliminate sharing of resources
15:21:38 perhaps not
15:21:51 I'd hate for all of our tests to suffer just for the sake of the Generic Driver
15:22:04 I would be very curious to see an experiment where we run tempest tests on the generic driver with no resource sharing
15:22:12 But having a "fork" of our tests specifically for the Generic Driver isn't optimal either
15:22:26 Once we have the new 1st-party drivers voting, can we run a smaller set of tests on Generic?
15:22:27 I'd like to know (1) how long does it take and (2) how much CPU/RAM/storage gets used
15:22:34 I think we should really find a better reference implementation
15:22:39 mkoderer: +1
15:22:47 well yes, we're working hard on new reference drivers
15:23:02 that's a necessary-but-not-sufficient solution to the problem
15:23:13 bswartz: It would be an interesting experiment
15:23:24 after we have new reference drivers we will use those for voting gate jobs, but we will continue to support and test the generic driver
15:23:48 bswartz: but we could reduce the amount of tests for the generic driver
15:24:09 mkoderer: possibly, but not less than what we require of 3rd party vendor CI
15:24:56 bswartz: what is the goal that is going to be solved by removing "shared resources"?
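
As an aside on the class-level versus instance-level structuring bswartz describes above, the contrast can be sketched with plain unittest. This is an illustrative toy, not the Manila tempest plugin; create_share and delete_share are hypothetical stand-ins for expensive backend operations.

    # Toy contrast between shared (class-level) and isolated (per-test)
    # resources; plain unittest, not the actual Manila tempest tests.
    import unittest


    def create_share():
        return {'id': 'fake-share-id'}  # stand-in for a slow backend call


    def delete_share(share):
        pass  # stand-in for resource deletion


    class SharedResourceTests(unittest.TestCase):
        """One share created up front and reused by every test: fast, but if
        creation fails or one test corrupts it, every test fails with it."""

        @classmethod
        def setUpClass(cls):
            cls.share = create_share()

        def test_a(self):
            self.assertIn('id', self.share)

        def test_b(self):
            self.assertIn('id', self.share)


    class IsolatedResourceTests(unittest.TestCase):
        """A fresh share per test: slower, but failures stay contained."""

        def setUp(self):
            self.share = create_share()
            self.addCleanup(delete_share, self.share)

        def test_a(self):
            self.assertIn('id', self.share)


    if __name__ == '__main__':
        unittest.main()
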
15:25:09 vponomaryov: test reliability
15:25:21 Failures in one test not affecting any other
15:25:27 dustins: +1
15:26:12 bswartz, dustins: the problem that caused the errors will still exist
15:26:14 it will be easier to debug failures when a small bug causes exactly 1 failure in tempest rather than what happens today, where a small bug can cause a cascade of failures
15:26:33 bswartz: exactly
15:26:49 bswartz: it is easier to update the "cleanup" approach
15:27:00 yes I think that was akerr's point
15:27:03 bswartz: than to stop using shared resources
15:27:19 the first small step to making things better is to improve cleanup
15:27:45 Ultimately I'd like to see shared resources go away, but baby steps first
15:27:49 I'll let akerr speak to his goals around changing class methods to instance methods in the tempest code
15:28:05 that will have to wait for another meeting
15:28:52 it makes sense to me intuitively that resource sharing across tests is generally bad because it can cause a failure in 1 test to cause failures in other tests
15:29:14 +1
15:29:17 but I also get the benefit of speeding things up by not creating needless extra objects
15:29:24 so we have to find a balance there
15:29:37 I suspect we're currently too far on the sharing of resources side
15:29:49 that's all I'll say about this topic
15:29:53 Indeed, make it as fast as possible while maintaining as much independence of resources as we can
15:30:03 we share them now so they cannot break each other running in parallel in any order
15:30:07 it is enough
15:30:24 vponomaryov: I think it depends on the type of object
15:30:35 1 cleanup error or 10, either way we have a problem
15:31:07 I think you may be correct if you ignore problems in test cleanup, but I think many QA people believe it's wrong to ignore those cleanup errors
15:31:31 Absolutely
15:31:44 bswartz: why ignore? they are analyzed easily
15:31:51 bswartz: that's the main thing
15:32:01 not by me
15:32:12 I have a hard time tracing down the reason tests fail
15:32:19 Same
15:32:22 and I've been doing this for a while
15:32:29 because it lies not in tempest but in the services
15:32:34 I feel bad for first-time developers trying to debug a test failure
15:33:00 Same here. A failing test should be caused by an issue with that test, not some other one.
15:33:02 Or the testers trying to figure out if something's actually wrong or if something just randomly failed
15:33:02 it is a question of cleanup, but not shared resources
15:33:26 vponomaryov: I'd argue that it's a bit of both
15:33:45 yeah I think we need to discuss the various proposed changes on a case by case basis
15:33:47 dustins: the current problem in "cleanup" is Python-specific
15:34:06 some sharing is probably acceptable
15:34:12 dustins: we have false cleanup errors on "not-really-shared-resources"
15:34:18 but some of it definitely causes issues
15:34:25 dustins: just because of Python specifics
15:34:49 vponomaryov: I'm not sure I follow, I'll chat with you afterwards to see if we can get on the same page
15:34:52 vponomaryov: we've never seen those on the NetApp CI
15:35:03 vponomaryov: false cleanup errors..
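
On the duplicated cleanup failures vponomaryov raises above, one common pattern is to treat "already gone" as success during cleanup, so the same underlying failure is not re-reported by every test that registered the same resource. The sketch below is only an illustration of that idea, not the change under review at https://review.openstack.org/#/c/304834/; FakeClient and NotFound are hypothetical.

    # Failure-tolerant cleanup sketch; hypothetical client, not real code.
    class NotFound(Exception):
        pass


    class FakeClient(object):
        def __init__(self):
            self.shares = {'s1'}

        def delete_share(self, share_id):
            try:
                self.shares.remove(share_id)
            except KeyError:
                raise NotFound(share_id)


    def safe_delete(client, share_id):
        """Delete a share, treating 'already deleted' as success."""
        try:
            client.delete_share(share_id)
        except NotFound:
            pass  # cleaned up elsewhere; don't fail this cleanup too


    client = FakeClient()
    safe_delete(client, 's1')
    safe_delete(client, 's1')  # second attempt is a no-op, not a new failure
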
15:35:07 so I'd like to revisit this topic with vponomaryov, akerr, and dustins sometime soon
15:35:17 perhaps next meeting or perhaps earlier than that if we can find a time
15:35:38 for now let's move on
15:35:40 gouthamr: by "false" I mean "duplication of failure again and again"
15:35:42 * gouthamr adds himself as interested party :P
15:35:57 gouthamr: I won't exclude anybody
15:36:06 #topic Snapshot support common capability in the documentation
15:36:14 gouthamr: you are up again
15:36:27 alright.. i saw this bug #link https://bugs.launchpad.net/manila/+bug/1506863
15:36:28 Launchpad bug 1506863 in Manila "'snapshot_support' capability absent in appropriate dev doc" [High,New]
15:36:57 gah!
15:36:57 and i decided to let some part of it piggyback on my changes to that file.. since it's a doc change..
15:37:07 https://review.openstack.org/#/c/300018/5/doc/source/devref/share_back_ends_feature_support_mapping.rst
15:37:37 but then, i got some concerns about when a driver should advertise support for this capability
15:37:40 that's an unfortunate bug
15:37:44 :(
15:37:59 gouthamr: please don't piggyback multiple changes
15:38:08 bswartz: now i know.
15:38:41 bswartz: so i thought i'd let you guys decide what's to be done about snapshot_support in that table.
15:38:47 the only good reason to squash unrelated changes is if there are 2 or more gate-blocker bugs that need fixing simultaneously
15:39:52 I thought we were going to get rid of that table...
15:39:53 bswartz: i can take it out.. but I'm still interested in knowing how we want that column to read..
15:40:02 bswartz: that could be one solution
15:40:36 we need to capture the relevant info somewhere before we remove it
15:41:09 but let's not spend too much time arguing about how the table should look if it's planned to be deleted
15:41:39 suggestion: the OpenStack Configuration Reference: http://docs.openstack.org/mitaka/config-reference/
15:42:15 bswartz: i know there's value in maintaining a table with checkmarks for whether a feature is currently supported
15:42:32 yes
15:42:33 it's valuable to developers, deployers and users..
15:42:39 just not in the dev ref
15:42:54 users and deployers should never find anything useful in the dev ref
15:43:03 true
15:43:20 if they do, it's because we failed to put useful information in the actual docs
15:43:56 Well, not ONLY in the devref :)
15:44:08 But agreed
15:44:10 dustins: you know what I mean
15:44:33 Indeed
15:44:39 gouthamr: does that work for you?
15:44:44 we can move this to config-ref..
15:44:52 we can plan to move it
15:45:04 but for your change, just avoid that issue entirely
15:45:17 bswartz: for the review in question, i would like to remove the snapshot_support column but retain the replication_type column
15:45:28 bswartz: or do i not bother about that column too? ^
15:45:44 I would ignore it as not relevant
15:45:59 start another change to deal with it specifically
15:46:12 sure thing. thanks for the clarification!
15:46:41 okay!
15:46:45 #topic open discussion
15:47:08 anything else today?
15:48:10 okay thanks everyone!
15:48:19 #endmeeting
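
Regarding gouthamr's open question about when a driver should advertise the snapshot_support capability, the following standalone sketch shows the general idea of a back end reporting the capability in its stats. It is a hedged illustration only; FakeShareDriver and get_backend_stats are hypothetical names and this does not use the real manila driver interface.

    # Standalone sketch only; not the real manila.share.driver API.
    class FakeShareDriver(object):
        """A back end that really does implement snapshot calls."""

        def get_backend_stats(self):
            return {
                'share_backend_name': 'fake_backend',
                'driver_handles_share_servers': False,
                # Advertise True only if the snapshot calls below actually
                # work; otherwise shares requiring snapshots could be placed
                # on this back end and then fail in the driver.
                'snapshot_support': True,
            }

        def create_snapshot(self, share, snapshot_name):
            return {'name': snapshot_name, 'share': share['id']}

        def delete_snapshot(self, share, snapshot_name):
            pass
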