15:00:58 #startmeeting manila
15:00:59 Meeting started Thu Oct 26 15:00:58 2017 UTC and is due to finish in 60 minutes. The chair is bswartz. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:00 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:02 The meeting name has been set to 'manila'
15:01:11 o/
15:01:14 \o
15:01:21 hi
15:01:26 hello all
15:01:36 hi
15:01:38 @!
15:01:38 <_pewp_> jungleboyj (◍˃̶ᗜ˂̶◍)ﾉ”
15:01:50 hi
15:01:57 jungleboyj what is that?
15:02:14 Looks like a waving cat. :-)
15:02:37 hello
15:03:03 these emojis are like rorschach tests
15:03:25 :-) Gotta keep your life interesting.
15:04:08 courtesy ping: cknight vponomaryov toabctl
15:04:25 #topic announcements
15:04:42 so we extended the spec freeze deadline 1 week, to today
15:04:57 I don't see any more specs that look ready for review
15:05:07 #link https://review.openstack.org/#/q/status:open+project:openstack/manila-specs
15:05:35 so I think we're in good shape spec-wise
15:06:01 the agenda is short today
15:06:13 review comments on https://review.openstack.org/#/c/504987/ are solicited, though we won't merge it for this cycle's deadline
15:06:16 #agenda https://wiki.openstack.org/wiki/Manila/Meetings
15:06:57 tbarron: okay thanks -- I'll plan to take a look
15:07:05 I have one topic for today
15:07:11 #topic Manila test image
15:07:21 #link https://review.openstack.org/#/q/status:open+project:openstack/manila-test-image
15:07:37 is anyone reviewing changes to this project?
15:08:10 I'm wondering if it makes sense for me to just ninja merge stuff here, or if there are other core that want to review what goes into manila-test-image
15:08:31 I think ninja-merge is OK for this
15:08:43 I've been watching out of the corner of my eye from time to time
15:08:52 there's a ton of work needed to get the manila nextgen driver working, and it's really hard to maintain long chains of unmerged patches
15:08:55 this test image isn't used for our gate yet
15:09:12 +1
15:09:37 o/
15:09:42 okay so no objection to ninja merging on manila-test-image during the queens cycle?
15:09:57 ++
15:10:20 okay cool
15:11:08 #topic Let's Go Over New Bugs
15:11:35 dustins: title caps here hurt my eyes
15:11:36 Looks like it's my turn, then
15:12:06 #link https://etherpad.openstack.org/p/manila-bug-triage-pad
15:12:07 * dustins makes note to look at his copy of "Elements of Style"
15:12:34 So the first one this week is: https://bugs.launchpad.net/manila/+bug/1715783
15:12:35 Launchpad bug 1715783 in Manila "The associated share will be deleted after delete all share replica" [Undecided,New]
15:13:22 zhongjun: You opened this a bit ago, has to do with a share being deleted when all of its associated replicas were deleted
15:13:35 dustins: yes
15:13:47 I'm confused
15:13:58 when you say "all replicas" do you mean the original replica too?
15:14:03 dustins: It just from our user's perspective, they thought the share replica and share are different resources. They shouldn’t interact with each other. If we want to clean the share replica then we can delete all share replica.
15:14:14 bswartz: yes
15:14:20 okay, this relates to our "reset-state" mechanism
15:14:27 when a share is created, it gets 1 "replica" automatically
15:14:42 even unreplicated shares are treated as if they have 1 replica
15:15:52 bswartz: zhongjun created a share, reset the replica-state of the "replica" and used share-replica-delete on that
15:16:17 So deleting all replicas will delete the share because it IS one of the replicas
15:16:30 yeah that's the point I was trying to make
15:16:49 if you delete the last replica, you did delete the share -- working as designed IMO
15:16:52 the logic in the API is you can't use share-replica-delete to delete that last replica
15:16:52 bswartz: so, when a share is created, they thought "replica" is a different resource, so we may not gets 1 "replica" automatically
15:17:13 gouthamr: that was going to be my next question
15:17:19 we may have a usability issue
15:17:53 while we know that deleting the last replica is the same as deleting the share, it would be better to force users to go through the share deletion path to actually do that
15:17:57 dustins: yes, because it is one of the replicas
15:17:58 to avoid confusion and surprises
15:18:48 bswartz: that's how it is implemented
15:19:03 does reset state allow you to do something not intended?
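
The rule discussed above (share-replica-delete must not remove the last replica, because that replica is the share) can be sketched as a count-based guard. This is only an illustration, not Manila's API code: `delete_replica`, `LastReplicaError`, and the list-of-dicts replica model are hypothetical names invented for the example.

```python
class LastReplicaError(Exception):
    """Raised when a caller tries to remove a share's only remaining replica."""


def delete_replica(replicas, replica_id):
    """Return the replica list without replica_id, refusing to empty it.

    `replicas` is a hypothetical list of dicts with an "id" key, standing in
    for the replica rows that belong to a single share.
    """
    remaining = [r for r in replicas if r["id"] != replica_id]
    if len(remaining) == len(replicas):
        # Nothing was filtered out, so the id does not belong to this share.
        raise KeyError(replica_id)
    if not remaining:
        # Removing the last replica would remove the share itself, so the
        # caller is sent through the share deletion path instead.
        raise LastReplicaError(
            "cannot delete the last replica; delete the share instead")
    return remaining
```

Because the check counts replicas rather than inspecting their state, it gives the same answer whatever replica state an administrator has set.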
15:19:08 there's reset-replica-state, which messes with that :/
15:19:14 okay
15:19:25 so we don't need to fix the whole bug during this meeting
15:19:26 well, the reset-* methods are meant to be used by administrators so they don't go into the DB
15:19:37 Please update the bug with the suggested fix and let's move on
15:19:50 Sounds good
15:20:04 Next up: https://bugs.launchpad.net/manila/+bug/1713062
15:20:05 Launchpad bug 1713062 in Manila "Missing ability to automatically build configuration reference artifacts" [Undecided,New]
15:20:10 bswartz: okay
15:20:20 #action: gouthamr update bug/1715783
15:20:37 So this one has to do with docs stuff since the docs were moved in tree
15:21:03 And we lost the ability to get release diffs of configuration options with that
15:21:22 gouthamr: this bug is pretty high priority IMO -- is it a lot of work to fix it?
15:21:23 So just gotta do a bit of work to get it working again (presumably)
15:21:28 do you know specifically what's needed?
15:22:02 jungleboyj: do you have this working in cinder yet?
15:22:06 is this one of those situations where we can just steal code from cinder?
15:22:14 tbarron: my thought exactly
15:22:22 i don't yet, the docs teams had some tooling around our in-tree config generator that we might need to resurrect and adopt
15:22:31 * bswartz tries to rope jungleboyj into fixing all our docs issues
15:22:32 bswartz: didn't you talk with jungleboyj and dhellman about this?
15:22:36 tbarron: let me look.
15:23:16 oh wait -- this is slightly different than what I assumed it was
15:23:24 yes, this topic was covered at the PTG
15:23:38 tbarron: We talked about that in Denver. I am trying to remember what we landed on. Let me look at the notes.
15:23:46 IMO it's less of a bug than a new feature -- but either way it's high priority
15:24:21 presumably every openstack component has this issue so there should be some kind of tool and howto
15:24:33 even if cinder hasn't done this -- some other project has (I would guess nova at least) so we can go look for a model to replicate
15:24:53 +1
15:24:54 I think we'll need to mark up our code and run a tool but that's all I know
15:25:18 Ok, found the notes. So, there is a sphinxext plugin that I need to look at that should be able to automatically generate tables of config items for drivers in each release.
15:25:42 jungleboyj: Mind if I assign you to the bug?
15:25:43 jungleboyj: do you know if any project has used it yet?
15:25:55 If, however, what you are looking to do is produce diffs of options. That is a different topic. We have been relying on release notes with changes to document those.
15:26:00 jungleboyj: Is there a link?
15:26:09 tbarron: I believe Nova has done it.
15:26:14 Hold on.
15:26:16 dustins: well we can take the bug rather than the cinder ptl but
15:26:24 dustins: I was joking about assigning this to jungleboyj
15:26:32 Oh, haha
15:26:35 I'm sure he's busy enough
15:26:40 they will presumably be doing their own work in parallel
15:26:54 but I was serious about trying to just copy the work of either cinder or nova
15:27:06 ideally this shouldn't be a heavy lift
15:27:21 Notes with links are here:
15:27:27 #link https://etherpad.openstack.org/p/cinder-ptg-queens-wednesday-notes
15:27:28 +1 we are short-handed so if anyone has a model to follow that will help
15:27:33 Example of how Nova is doing it:
15:27:47 #link https://github.com/openstack/nova/commit/83a9c2ac334712b27704a814552628cf0e536a85
15:28:00 cool, we could just follow this
15:28:18 relevant #link: https://docs.openstack.org/oslo.config/latest/reference/sphinxext.html
15:28:46 So, that gets the high level part done. I still have to figure out if there is a way to do it per driver.
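
For whoever picks up the bug, the "high level part" mentioned above might look roughly like the following. It is a sketch under assumptions: the first line is a Sphinx `conf.py` fragment enabling the oslo.config extension linked in the log, and `show_options_block` is a hypothetical helper that only renders the text of the extension's `show-options` directive. The `manila` namespace in the usage note is assumed, and nothing here addresses the per-driver question.

```python
# Sphinx conf.py fragment: enable the oslo.config extension linked above.
extensions = ["oslo_config.sphinxext"]


def show_options_block(namespaces):
    """Render a show-options directive that lists the given namespaces.

    Hypothetical helper for illustration; the returned text is what would be
    pasted into a .rst page of the configuration reference.
    """
    lines = [".. show-options::", ""]
    # Each namespace goes on its own indented line in the directive body.
    lines.extend("   " + namespace for namespace in namespaces)
    return "\n".join(lines) + "\n"
```

Calling `show_options_block(["manila"])` returns the directive text for a single assumed namespace; the real namespace names would need to be checked against the project's registered option entry points.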
15:28:59 I copied those links to the bug
15:29:10 it still needs an owner, but now whoever takes it has a roadmap to follow
15:29:28 And I added the information to the bug etherpad as well
15:30:04 dustins: it's preferable for the etherpad to have ephemeral information and for all long-term relevant into to be captured in LP
15:30:11 It is going to be a bit before I get to playing with this on the Cinder side.
15:30:14 they call it "etherpad" for a reason
15:30:29 s/into/info/
15:30:45 bswartz: Sure, but just in case someone looks at the etherpad for a low hanging fruit to get started with Manila, they can have some info
15:30:49 bswartz: That is why I summarize the notes in our Wiki. :-)
15:31:09 Agreed that it should be in Launchpad definitively :)
15:31:18 may wiki live forever!
15:31:50 next?
15:32:10 Next is: https://bugs.launchpad.net/manila/+bug/1713060
15:32:11 Launchpad bug 1713060 in Manila "Changing service network mask breakes new service subnet creation" [Undecided,New]
15:32:38 ugh
15:32:49 this is unsupported intentionally
15:32:54 Hehehe
15:33:10 So Won't Fix/Not a Bug, then? :)
15:33:27 there are all kinds of configuration options, where if you change them after you've created some shares, your system will be totally hosed
15:33:48 we probably need to at least document which options those are
15:34:12 and consider preventing some of the worst problems
15:34:21 note that jan vondra filed this one and he has a number of patches up for the generic driver
15:34:32 Sounds like a good idea
15:34:37 esp. for generic driver with cifs
15:34:52 yeah
15:35:00 the generic driver is pretty fragile
15:35:19 even the changes I'm working on don't solve this kind of problem
15:35:49 bswartz: there is a config option for setting the subnet
15:35:54 yeah
15:36:00 bswartz: if changing it does not work, the config option shouldn't exist
15:36:14 where neutron is involved, or anything related to networking really, we assume that the initial values won't ever change
15:36:14 bswartz: can you update that bug with your observations/reflections?
15:36:38 ganso: well it can have different initial values
15:37:10 tbarron: what ganso is getting at, and I think I agree, is that something which can't ever change should be entered into the database, not a config file
15:37:29 ganso: bswartz ack
15:37:50 so we may have some design flaws
15:37:54 I will update the bug
15:38:14 but absent some proposal to address these design issues, I don't see how to fix the bug
15:38:30 so it will most likely go to wontfix
15:38:30 maybe we'll get lucky and jan vondra will come up with an approach to the issue
15:38:36 yes it's possible
15:38:46 dustins: next?
15:38:50 https://bugs.launchpad.net/manila/+bug/1709474
15:38:51 Launchpad bug 1709474 in Manila "Can't reset state of a share server" [Undecided,New]
15:39:18 This one just has to do with wanting reset operations for share servers in addition to shares themselves
15:40:00 how much cleanup would reset-state for share servers do?
15:40:15 is it *just* the DB change?
15:40:17 nothing, it would only update the state on the DB
15:40:19 yes
15:40:57 k
15:41:01 reset state was only ever supposed to hack the DB column right?
15:41:05 yes
15:41:15 cleanup is explicitly not done in reset state
15:41:24 i'm just getting that on record :)
15:41:38 :) it's a hack
15:41:39 if you want to cleanup, you use a non-forced delete
15:42:13 the main benefit of reset state, is that it allows you to retry a failed delete
15:42:27 increasing the chance of actually cleaning up a mess
15:42:58 ofc it also allows you to turn a small mess into a big mess
15:43:07 if used incorrectly
15:43:43 so this is a minor feature as we never said you can do this
15:43:56 but it's a reasonable expectation
15:44:15 what other resources can we not reset today?
15:44:20 yes, if a share server is stuck in "creating" or "deleting" state (reasons abound) it can't be deleted via manila API
15:44:20 you'd've to resort to mucking with the database
15:44:22 yeah, we cannot do any other operations except delete/list/show
15:44:29 gouthamr: is this an RFE masquerading as a bug?
15:45:03 Yes, perhaps... can i not have it, then?
15:45:29 I don't think it requires a spec :)
15:45:35 I marked it wishlist
15:45:44 we'd need a use case I think
15:45:59 But it would be good to think about what other resources have the same issue.
15:46:02 to explain who the intended user is and what he's going to do with it
15:46:05 it's on the bug.. i can add another one..
15:46:15 yep
15:46:26 oh I had to read it a second time to see the use case
15:46:36 yes the RFE is clear
15:46:48 it feels low priority to me
15:47:07 the workaround is to use mysql
15:48:29 next?
15:48:33 and figure out is there another way to solve this use case? Could we just use force delete if the use case is just for delete it.
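
The reset-state semantics described above (it only rewrites the status column, does no cleanup, and its main benefit is letting a failed delete be retried) can be modeled with a toy in-memory object. Everything here is hypothetical: `ShareServer`, `reset_state`, `delete`, the `backend_ok` flag, and the choice of which statuses allow a delete are assumptions for illustration, not Manila code.

```python
class ShareServer:
    """Hypothetical stand-in for a share server DB row plus its backend."""

    def __init__(self):
        self.status = "active"
        # Placeholder names for whatever a real delete would clean up.
        self.backend_resources = ["vm", "port"]


def reset_state(server, status):
    """Only rewrite the status column; backend resources stay untouched."""
    server.status = status


def delete(server, backend_ok=True):
    """Non-forced delete: refuse unless the status allows it, then clean up."""
    if server.status not in ("active", "error"):  # assumed deletable statuses
        raise RuntimeError("cannot delete server in status %r" % server.status)
    server.status = "deleting"
    if not backend_ok:
        # Simulated backend failure: the row is left stuck in "deleting".
        return
    server.backend_resources.clear()
    server.status = "deleted"
```

In this model a server left in "deleting" by a failed delete cannot be deleted again until `reset_state(server, "error")` is called, and the retry is what finally removes the backend resources.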
15:48:40 Last one for today: https://bugs.launchpad.net/manila/+bug/1708491
15:48:41 Launchpad bug 1708491 in Manila "tempest api migration tests very long" [Undecided,New]
15:48:56 zhongjun: gouthamr didn't want to delete it, he wanted to continue using it
15:48:57 This one has to do with the speed (or perhaps the lack of it) with migration tests
15:49:29 zhongjun: also, no force-delete on share-servers :)
15:50:01 tbarron: should we consider a separate job for migration tests?
15:50:16 gouthamr: probably wish-list bugs for each of these, they are relatively low-hanging fruit if still low-priority
15:50:28 tbarron: +1 will add it
15:50:29 is there an advantage to running migration tests on every single job?
15:50:30 bswartz: I think we should look at this issue again after raissa
15:50:32 gouthamr, bswartz: :)
15:50:47 has moved the tests to their own repo and we look at convertying
15:51:00 converting the legacy jobs to new zuulv3 format
15:51:22 one reason they're on every job is because we have so many first party drivers and driver-modes
15:51:39 and in that last step maybe decide to take a different approach
15:51:40 and they're not entirely "api" tests..
15:51:46 yeah but are we really testing anything unique in each case?
15:51:55 or is it the same code running over and over
15:52:14 I agree that drivers with assisted migration need to test that somehow
15:52:14 bswartz: there are main tests
15:52:21 that's a good question
15:52:23 bswartz: which guarantee that a migration works
15:52:24 but none of first party drivers do that
15:52:53 bswartz: but there are other tests such as "migrate and extend", "migrate and create from snapshots" that guarantee that other functionality work with migration
15:53:19 what is really being tested though?
15:53:33 how is a migrate and extend different from a create new share and extend?
15:53:41 (for drivers that don't assist migration)
15:53:53 it shouldn't be any different
15:54:01 but we have seen cases where these things break
15:54:07 the idea is to blackbox the tests and not favor them based on how we know it works.. :)
15:54:12 then there's not a lot of point in burning CPU cycles on running the tests for every job
15:54:30 gouthamr: in an ideal world with infinite CPU cycles, yes
15:54:37 because drivers have to handle share_id, share_instance_id changes on migrated shares and these things can end up not being handled properly
15:54:40 this is a case of some unusually slow tests though
15:54:56 so it's reasonable to treat them specially
15:55:09 we always migrate a 1gb empty share
15:55:34 i'm less concerned with the cpu cycles than the human cycles sorting out the false negatives and rechecking on patches that have nothing to do with migration
15:55:49 if we can get the test's runtime down from 4 minutes to under a minute, I think that would be an acceptable fix too
15:56:12 tbarron: if that's the case we should take a step back and investigate why they fail instead of rechecking
15:56:15 but assuming that's hard, we should consider running them in a different pipe
15:56:16 yeah, that would be best, I just don't know how to do that right off
15:56:24 tbarron: migrations shouldn't be fail-prone that we recheck until the work
15:56:31 s/the work/they work
15:56:37 they could still vote, but we could remove slowness as a cause of failures
15:57:17 can the slowness be because of that periodic interval that can't be respected as well?
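
One way to reason about the periodic-interval question above is a toy model in which each migration phase finishes quickly but the next phase only starts on the following periodic tick. This is a sketch under assumptions, not a measurement of Manila: `simulated_wait`, the three-phase breakdown, and the interval values are all invented for illustration.

```python
import math


def simulated_wait(phase_work_seconds, periodic_interval):
    """Seconds until all phases finish when progress is only noticed on ticks.

    Assumes ticks fire at multiples of periodic_interval and that a phase
    can only begin on a tick at or after the previous phase completed.
    """
    clock = 0
    for work in phase_work_seconds:
        done_at = clock + work
        # Round up to the first periodic tick at or after done_at.
        clock = math.ceil(done_at / periodic_interval) * periodic_interval
    return clock
```

With three phases of one second of real work each, a 60 second interval gives a total of 180 seconds while a 1 second interval gives 3 seconds, which is the kind of gap that shortening intervals could close if this model resembles what the tests actually wait on.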
15:57:25 ganso: I agree, but often they take way too long and that's the reason for the failures, and the reason for the slowness is mysterious
15:57:26 that's a good point
15:57:37 lately, especially with generic driver and lvm driver, I've seen only the migration tests failing, but most of the time they work
15:57:38 or is that only the case with zfsonlinux, netapp, dummy drivers
15:57:48 the slowness might be something easy to address by speeding up some intervals
15:59:08 bswartz, tbarron: I think we can attribute an "endurance test" characteristic to migration tests, and have them as a separate job. Those separate jobs should always pass, if they don't, then there's something wrong with the driver or storage, but we can separate the "flakiness" of those tests from the main jobs which are voting and fail much less when they don't run migration tests
15:59:24 ganso: I agree with you that simply closing your eyes and rechecking is the wrong approach here
16:00:09 time check
16:00:09 so we need someone to investigate and dig into these failures?
16:00:13 ganso: that's more or less what I was proposing assuming we can't speed up the tests
16:00:14 gouthamr: I've seen the periodic task running very fast on NetApp CI
16:00:21 we're out of time for today
16:00:36 To the other channel!
16:00:43 this bug needs an owner and more discussion -- we can revisit it next week if nobody wants to grab it
16:00:47 thanks all
16:00:59 #endmeeting