16:00:07 <smcginnis> #startmeeting Cinder
16:00:07 <openstack> Meeting started Wed Jun 22 16:00:07 2016 UTC and is due to finish in 60 minutes. The chair is smcginnis. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:09 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:10 <smcginnis> ping dulek duncant eharney geguileo winston-d e0ne jungleboyj jgriffith thingee smcginnis hemna xyang tbarron scottda erlon rhedlind jbernard _alastor_ vincent_hou kmartin patrickeast sheel dongwenjuan JaniceLee cFouts Thelo vivekd adrianofr mtanino yuriy_n17 karlamrhein diablo_rojo jay.xu jgregor baumann rajinir wilson-l reduxio wanghao
16:00:11 <yuriy_n17> hi
16:00:11 <openstack> The meeting name has been set to 'cinder'
16:00:15 <e0ne> hi
16:00:15 <geguileo> Hi!
16:00:17 <cknight> hi
16:00:17 <ntpttr> hi
16:00:19 <smcginnis> Hey everyone
16:00:21 <e0ne> #link https://wiki.openstack.org/wiki/CinderMeetings#Next_meeting
16:00:21 <patrickeast> o/
16:00:22 <adrianofr> hi
16:00:24 <tbarron> hi
16:00:25 <_alastor_> \o
16:00:27 <diablo_rojo> Hello :)
16:00:29 <scottda> hey
16:00:33 <eharney> hi
16:00:35 * DuncanT waves
16:00:37 <wanghao_> hi
16:00:39 <smcginnis> #topic Announcements
16:00:42 <Swanson> hello
16:00:50 <xyang> hi
16:00:55 <smcginnis> #link https://etherpad.openstack.org/p/cinder-spec-review-tracking Review focus
16:00:57 <dulek> o/
16:01:00 <erlon> hey
16:01:03 <smcginnis> Please take a look at the review focus etherpad.
16:01:26 <smcginnis> Cores in particular - if we can spend some time working on the new driver submissions - it would be great to get a few more of those out of there.
16:01:44 <smcginnis> Though with the lack of activity on some responding to comments, I'm sure a few of those are not going to make it.
16:02:00 <smcginnis> #link https://etherpad.openstack.org/p/newton-cinder-midcycle
16:02:01 <dulek> geguileo: Can you update the list of patches related to c-vol A/A there?
16:02:21 * jungleboyj is having to drop off for other commitments. Will catch up on the notes later.
16:02:25 <smcginnis> Just another reminder on the midcycle. Sign up if you are going, and please add any topics there.
16:02:30 <smcginnis> jungleboyj: Fine, be that way.
16:02:31 <smcginnis> :P
16:02:32 <erlon> smcginnis: do they all have the CI running ok?
16:02:40 <smcginnis> erlon: Not all of them yet.
16:02:42 <jungleboyj> smcginnis: I will.
16:02:48 <geguileo> dulek: Do you want a list or just the first in the series?
16:02:49 * jungleboyj stomps off pouting.
16:02:54 <smcginnis> :)
16:03:05 <smcginnis> geguileo, dulek: Yeah, that could help.
16:03:17 <smcginnis> That's another one I would like to see get through and be done with.
16:03:24 <dulek> geguileo: Good question. A list makes it easier to assess size by just looking at the Etherpad.
16:03:31 <dulek> geguileo: …I think.
16:03:36 <smcginnis> #info Gerrit outage July 1 @ 20:00 UTC for upgrades.
16:03:38 <geguileo> dulek: Ok, list it is
16:03:48 <smcginnis> Gerrit will be offline for upgrades.
16:03:50 <xyang> smcginnis: midcycle is the week of 7/18, newton-2 is 7/12 when all specs are supposed to be approved. these dates are reversed :)
16:04:11 <smcginnis> xyang: Yeah, as usual, not good timing. :]
16:04:26 <smcginnis> I would really like to try to finalize specs by then.
16:04:47 <smcginnis> But like last time - at the discretion of the core team whether to accept any more after that point.
16:05:00 <xyang> smcginnis: sure
16:05:04 <hemna> yough
16:05:14 <smcginnis> Hopefully no last minute rewrites of major features related to desserts though. ;)
16:05:26 <xyang> smcginnis: right, we delayed the midcycle due to some tournaments in Fort Collins
16:05:36 <xyang> :)
16:05:58 <smcginnis> #topic Refactor quota reservation in managing volume/snapshot
16:06:01 <smcginnis> wanghao_: Hi
16:06:05 <wanghao_> hi
16:06:11 <smcginnis> #link https://blueprints.launchpad.net/cinder/+spec/refactor-quota-reservation-in-managing-vol Blueprint
16:06:29 <smcginnis> #link https://bugs.launchpad.net/cinder/+bug/1504007 Quota bug
16:06:29 <openstack> Launchpad bug 1504007 in Cinder "unmanage volume/snapshot reduce quota incorrectly" [Medium,In progress] - Assigned to wanghao (wanghao749)
16:07:28 <smcginnis> wanghao_: Was there something you wanted to discuss with this?
16:07:37 <wanghao_> I talked about this bug with michal; we think that to fix this bug completely, we need to refactor managing of existing volumes/snapshots.
16:07:59 <smcginnis> Oh?
16:08:06 <wanghao_> So I want to hear everyone's opinion.
16:08:26 <smcginnis> wanghao_: Can you describe a little what is needed to be done for that?
16:08:57 <wanghao_> We want to make the quota reservation and commit in cinder-api instead of in cinder-volume.
16:09:21 <dulek> Quotas on manage are broken because we don't know the size of a volume being managed in c-api.
16:09:46 <dulek> So we're reserving them in c-vol.
16:09:49 <dulek> And it's hard to revert things properly in c-vol.
16:09:53 <xyang> dulek: should we know more about the volume size in cinder volume rather than cinder api?
16:09:53 <smcginnis> dulek: Ah. So by the time we manage it it's too late?
16:09:55 <dulek> (if not impossible)
16:09:56 <wanghao_> yes, that means we need a get-size call in cinder-api.
16:10:40 <wanghao_> smcginnis:
16:10:43 <smcginnis> So from c-api we would call down to get the size first, then call down to manage if the quotas check out?
16:10:56 <wanghao_> smcginnis: yeah
16:11:42 <dulek> I think this was an anti-pattern in the past, that's why I wanted that to be discussed in a meeting.
16:12:00 <dulek> Now we have list manageable volumes, which also works as a call to c-vol.
16:12:28 <smcginnis> I think it makes sense, but would be interested in hearing from others on any reasons why not to do it that way.
16:12:35 <DuncanT> list manageable is not implemented everywhere, and is low performance
16:13:23 <wanghao_> The point is getting the volume size, since quota reservation needs that.
16:14:01 <xyang> how do you know the size without making a call to the backend?
16:14:11 <xyang> this should go to the scheduler first?
16:14:42 <cknight> Why is it "hard to revert things properly in c-vol"?
16:14:43 <geguileo> xyang: I agree
16:15:35 <wanghao_> xyang: we don't know it, so we want to do it first in cinder-api, not cinder-volume.
16:15:51 <xyang> wanghao_: the scheduler is involved in manage volume
16:16:05 <DuncanT> Moving this to c-api seems not to be the right way forward, honestly
16:16:20 <xyang> wanghao_: I think that is why it is done in cinder volume. cinder api -> scheduler -> cinder volume
16:16:28 <cknight> DuncanT: +1
16:16:29 <dulek> xyang: When doing manage you need to specify host, backend and pool, so no scheduler needed I think.
16:16:32 <xyang> I don't know how you get the size in cinder api?
16:16:34 <hemna> that can take a 'long time' to have the c-api wait for the array to report size
16:16:46 <hemna> c-api -> c-vol -> array and back
16:16:46 <xyang> dulek: scheduler is involved
16:16:49 <hemna> can be very slow
16:16:54 <sheel> dulek: right
16:17:04 <xyang> dulek: it goes to the scheduler first.
16:17:34 <smcginnis> This adds APIs for getting details: https://specs.openstack.org/openstack/cinder-specs/specs/newton/list-manage-existing.html
16:18:00 <dulek> xyang: Oh, it only validates the host specified by the user. Checks if a volume will fit there with filters.
16:18:04 <smcginnis> A little different scenario, but similar.
16:18:39 <dulek> Okay, so here's why it's hard to do things correctly in c-vol.
16:18:54 <dulek> It's because the DB entry is already created.
16:19:00 <dulek> (by c-api)
16:19:32 <dulek> In other places quotas are validated before creating a DB row for a volume.
16:20:08 <DuncanT> We can just put the volume in error if the quota reservation fails in c-vol though... it's an admin only call
16:20:09 <dulek> So we don't have the problem of deciding whether to decrement quota counters when deleting a volume.
16:20:33 <dulek> Sure, but now do I decrement the quota or not when deleting?
16:20:40 <dulek> How do I know when it failed?
16:21:04 <DuncanT> Hmmm, a valid point
16:21:32 <dulek> Another option could (probably) be a special error status that is set on failure modes not requiring a quota decrement.
16:22:03 <DuncanT> error_managing?
16:22:33 <wanghao_> DuncanT: that's why the bug is here; we need to cover some cases, but maybe we can't fix it completely.
16:23:35 <dulek> DuncanT: Naming isn't a problem here. ;)
16:24:23 <DuncanT> wanghao_: I think a special error status is less messy than moving the quota reservation, in this case
16:24:48 <dulek> DuncanT: I don't know if a new state would cover all cases. Like when the volume message was lost on the way to c-vol… What do we do? Do we decrement?
16:25:17 <DuncanT> dulek: No, since we haven't reserved yet
16:25:18 <dulek> We have a 'creating' status. It's like we would need a 'managing' status as well.
16:25:23 <xyang> DuncanT: Is there a reason that we cannot decrement the quota reserve when it fails?
16:25:33 <xyang> DuncanT: in cinder volume
16:25:34 <DuncanT> "managing" seems reasonable
16:26:13 <DuncanT> xyang: Because we decrement quota when deleting volumes in error state
16:26:35 <cknight> Manila did the same thing. There are states for 'managing' and 'manage_error'.
16:26:56 <dulek> cknight: Ouch, was it to fix the same problem?
16:27:08 <cknight> dulek: I don't recall, honestly.
16:27:11 <wanghao_> cknight: emm, that's useful info.
16:27:35 <smcginnis> cknight: Do you know if there was a reason for manage_error rather than just error? Just to be clear.
16:27:42 <DuncanT> 'managing' matches more closely what we have with other operations, though the quota management is still different
16:27:53 <cknight> smcginnis: I'll check.
16:28:08 <smcginnis> cknight: No worries, I was just curious if you happened to know.
16:28:13 <cknight> But a managing volume has a size of 0 until it succeeds, so I don't know why it's an issue with quota reservations.
16:28:14 <wanghao_> DuncanT: if managing, we don't decrement the quota, and if error_manage, decrement it, right?
16:28:17 <DuncanT> smcginnis: Don't delete quota when deleting a volume in manage_error for us....
16:28:33 <dulek> cknight: It's about the volume number quota, I think.
16:28:35 <smcginnis> DuncanT: Yeah, it could be a way to differentiate that operation.
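
For readers following along, here is a minimal sketch of the asymmetry dulek describes above. QUOTAS.reserve/commit/rollback mirror Cinder's quota engine calls; the db_* and rpc_* helpers and the exact flow are hypothetical simplifications, not the real code. In the normal create flow the reservation happens before the DB row exists, so a failed reservation needs no cleanup; in the manage flow the row is created first and the reservation can only happen later in c-vol, once the driver has reported the size.

    # Sketch only: QUOTAS.reserve/commit/rollback follow Cinder's quota
    # engine API; the db_* and rpc_* helpers are hypothetical.

    def create_volume(context, size):
        # Normal create: reserve quota BEFORE any DB row exists.
        reservations = QUOTAS.reserve(context, volumes=1, gigabytes=size)
        try:
            volume = db_create_volume_row(context, size=size,
                                          status='creating')
        except Exception:
            QUOTAS.rollback(context, reservations)  # nothing else to undo
            raise
        QUOTAS.commit(context, reservations)
        return volume

    def manage_existing(context, host, ref):
        # Manage: the size is unknown until the driver inspects `ref`, so
        # the DB row is created first and quota is reserved later in c-vol.
        # If that reservation fails, a row already exists, and
        # delete_volume cannot tell whether it ever counted against quota.
        volume = db_create_volume_row(context, size=0, status='creating')
        rpc_cast_manage_existing(context, volume, ref)  # async cast to c-vol
        return volume
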
16:28:44 <hemna> DuncanT, we should be able to decrement the quota for failed manage calls though in c-vol
16:28:46 <wanghao_> DuncanT:
16:28:51 <hemna> we shouldn't have to wait for a delete
16:28:59 <smcginnis> hemna: +1
16:29:13 <cknight> dulek: Sure, so it goes to error and the quota is decremented on delete.
16:29:49 <dulek> Well, if Manila already tried the pattern, then why don't we?
16:30:16 <cknight> dulek: I discussed this with bswartz at the time, and his feeling was that a failed manage results in an errored volume that counts against quota until it is deleted.
16:30:53 <cknight> dulek: Since manage is admin only, the admin can always delete or force-delete as needed.
16:30:57 <smcginnis> cknight: I wonder why. The volume was already there before on the backend.
16:31:24 <hemna> cknight, but we don't know and may never know how much quota (space) to count against that volume
16:31:25 <smcginnis> cknight: Seems strange to now count it against quota if it's not actually managed.
16:31:29 <hemna> it's broken all the way around
16:31:50 <dulek> admin-only-ness is controlled by policy.json, so I don't think we should make such assumptions.
16:31:56 <cknight> smcginnis: Yes, but it's all consistent. Every non-deleted volume in the db counts against quota.
16:32:11 <dulek> hemna: Quotas are for tenants, not backends.
16:32:23 <dulek> hemna: I think you're thinking of capacity here.
16:32:38 <smcginnis> cknight: I guess I can see that argument. Not saying I buy into it, but I can see the point. ;)
16:32:39 <hemna> dulek, there are capacity quotas as well
16:32:44 <hemna> and counts of volumes
16:33:07 <xyang> DuncanT: so if we only change this for manage volume, it seems inconsistent? for everything else, we only decrement quota in delete_volume
16:33:24 <dulek> The volume is already on the backend, so capacity is decreased. But a tenant's quotas aren't related to the capacity of a single backend.
16:33:46 <DuncanT> xyang: For everything else, we know the size early
16:34:10 <hemna> dulek, not sure that makes much sense to me, sorry.
16:34:23 <xyang> hemna: so our quota management does not look at capacity on the backend. they are not synced up at all :(
16:34:54 <smcginnis> hemna: I think he's saying the reported capacity of the backend never changes in this case because it's space already consumed.
16:34:55 <dulek> xyang: Which isn't a problem.
16:35:09 <hemna> I don't care about the capacity on the array
16:35:22 <hemna> we have a size quota in cinder related to the volume sizes
16:35:23 <hemna> anyway
16:35:38 <hemna> nm
16:35:55 <DuncanT> xyang: There's lots of reasons not to sync them up.... arrays can be shared
16:36:18 <erlon> overcommitted
16:36:29 <erlon> overprovisioned
16:36:42 <wanghao_> How about getting the size from the driver first, and then going to scheduler->volume?
16:37:12 <cknight> wanghao_: Are you suggesting a synchronous call from c-api to c-vol?
16:37:18 <hemna> and if the driver fails to get the size?
16:37:21 <tbarron> cknight: +1
16:37:26 <wanghao_> yes
16:37:27 <hemna> cknight, :(
16:37:31 <hemna> that can take a long time
16:37:31 <tbarron> i don't like that
16:37:40 <cknight> wanghao_: -1 Gotta be async. https://wiki.openstack.org/wiki/BasicDesignTenets
16:37:48 <tbarron> cknight: +1
16:38:29 <DuncanT> There's a sync call in transfer volume BTW, that should be fixed if anybody wants to take a look
16:38:33 <dulek> cknight: Whoa, cool page.
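
For reference, a sketch of the state-based variant DuncanT describes (don't give quota back when deleting a volume in the error-managing state); Manila's variant instead lets the errored volume count against quota until it is deleted. The 'error_managing' status is the one proposed in the meeting; everything besides the QUOTAS.* calls is a hypothetical simplification of the delete path.

    # Sketch of the state-based approach discussed above; 'error_managing'
    # is the status proposed in the meeting, and all helpers other than
    # QUOTAS.* are hypothetical.

    def delete_volume(context, volume):
        driver_delete_volume(volume)

        if volume.status == 'error_managing':
            # Quota was never committed for this volume (the manage failed
            # at or before reservation time), so there is nothing to give
            # back; just drop the DB row.
            db_destroy_volume_row(context, volume)
            return

        # Every other non-deleted volume counts against quota; release it
        # by committing a negative reservation, as Cinder's delete path
        # does for volume count and gigabytes.
        reservations = QUOTAS.reserve(context,
                                      volumes=-1,
                                      gigabytes=-volume.size)
        db_destroy_volume_row(context, volume)
        QUOTAS.commit(context, reservations)
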
16:38:34 * tbarron thinks this is the kind of issue that is driving jaypipes to propose doing away with reservations (which can be rolled back and expire) in favor of claims
16:39:09 * tbarron isn't sure which bandaid fix is best, but let's not change basic design patterns
16:39:43 <smcginnis> So how do we move this forward?
16:40:10 <cknight> I see a couple of options. Handle the reservations in c-vol like now (also like Manila does it). Or require a size parameter in the manage API (failing in c-vol if the number turns out to be wrong).
16:40:14 <tbarron> maybe we live with a bug till we can do something clean and architecturally sound
16:40:33 <DuncanT> tbarron: That work is not going to be usable this cycle.
16:40:45 <tbarron> DuncanT: +1
16:40:55 <cknight> (or wait, like Tom says)
16:41:05 <smcginnis> cknight: I'm a fan of consistency, so following the Manila approach seems reasonable to me.
16:41:19 <tbarron> but maybe sometimes fixing something is just pushing on a balloon
16:41:35 <DuncanT> I'd prefer the state based approach - it seems least intrusive
16:42:16 <smcginnis> DuncanT: The Manila state based approach, right?
16:42:32 <wanghao_> cknight: I prefer the first option too if we can't do something in cinder-api.
16:42:36 <DuncanT> smcginnis: Sounds like it, yes - I haven't looked at their code
16:42:46 <dulek> Starting to require size would be a microversion, so we could still end up with broken quotas.
16:42:55 <smcginnis> DuncanT: Me neither, but at least at a high level it sounds alright.
16:42:59 <dulek> If the user used an older client.
16:42:59 <cknight> DuncanT: I'll send you a link to it.
16:43:11 <dulek> So states seem to make sense.
16:43:19 <smcginnis> dulek: Passing in the expected size to the API call?
16:43:36 <smcginnis> dulek: OK, nevermind, I see what you're saying.
16:43:45 <dulek> "Or require a size parameter in the manage API (failing in c-vol if the number turns out to be wrong)."
16:43:54 <dulek> I was referring to this.
16:43:55 <wanghao_> smcginnis: it will break the api too I think...
16:44:04 <smcginnis> wanghao_: Does this make sense to you? Can you take a look at the Manila approach and see if you can use that for Cinder?
16:44:06 <DuncanT> cknight: I've got it checked out, I've just not looked yet - will do after the meeting
16:44:13 <wanghao_> smcginnis: sure
16:44:17 <cknight> DuncanT: ok
16:44:27 <smcginnis> wanghao_: OK, let's leave it at that then and move on.
16:44:37 <smcginnis> #action wanghao_ to review Manila's approach
16:44:49 <smcginnis> #topic Fixes of 'extend volume' shouldn't be merged without tempest dependency
16:44:53 <smcginnis> erlon: Hey
16:44:57 <erlon> Hi
16:45:03 <erlon> this is quick
16:45:08 <erlon> So, there are some bugfixes in the queue that seem to be trivial, but they need some specific backend testing to make sure they all work.
16:45:24 <erlon> https://bugs.launchpad.net/cinder?field.searchtext=honor&search=Search&field.status%3Alist=NEW&field.status%3Alist=INCOMPLETE_WITH_RESPONSE&field.status%3Alist=INCOMPLETE_WITHOUT_RESPONSE&field.status%3Alist=CONFIRMED&field.status%3Alist=TRIAGED&field.status%3Alist=INPROGRESS&field.status%3Alist=FIXCOMMITTED&field.assignee=&field.bug_reporter=&field.omit_dupes=on&field.has_patch=&field.has_no_package=
16:45:42 <erlon> Tempest does not cover this 'path' yet, so I added this test there:
16:45:48 <erlon> https://review.openstack.org/#/c/310061/
16:46:26 <hemna> goosh
16:46:27 <erlon> https://www.irccloud.com/pastebin/4lQP2TGT/
16:46:29 <hemna> tinyurl....
16:46:37 <erlon> sorry, I re
16:47:01 <erlon> the wiki has a problem with goo.gl
16:47:18 <erlon> didn't try any other
16:47:33 <erlon> but that's it, just so cores are aware
16:47:37 <scottda> So, I believe erlon has pointed out that some patches for that fix already merged, and that those patches will fail this in-flight tempest test (because the merged patches have a bug). Am I correct here, erlon?
16:47:43 <smcginnis> erlon: Yeah, the wiki will block shortened URLs.
16:48:08 <smcginnis> erlon: OK, so you just need eyes on that.
16:48:13 <smcginnis> Once it passes Jenkins...
16:48:52 <erlon> scottda: yes, some patches that should fix that have gone in already; when the test gets merged in tempest they can potentially break
16:49:00 <xyang> erlon: is there a bug describing what the problem in those merged patches is?
16:50:07 <erlon> xyang: it can be, but not necessarily; some backends have problems with the adopted fix, others might not
16:50:52 <xyang> erlon: I guess I don't know what the problem is :). I may have approved some of the merged patches, but don't know what is wrong with them
16:51:00 <xyang> erlon: some explanation would help
16:51:32 <smcginnis> erlon: It's not clear, so the idea is to get tempest coverage on that so we know for sure, correct?
16:51:34 <hemna> that patch is failing liberty
16:51:37 <hemna> is that expected to pass?
16:52:19 <erlon> In general terms, the fixes that people are sending call driver.extend() from inside the driver, but the volume object at that point still does not have the provider_location saved, and code inside driver.extend() might try to access it
16:52:43 <hemna> and that isn't going to get fixed in liberty
16:52:55 <hemna> if the liberty tempest test is a real failure (due to the patch)
16:53:07 <erlon> smcginnis: yes
16:53:40 <hemna> that liberty failure looks like a neutron problem from what I can tell
16:54:03 <erlon> smcginnis: I'll also add the Depends-On and recheck those patches, so they run the test that is in tempest
16:54:36 <hemna> gerrit seems really slow
16:54:49 <erlon> hemna: liberty?
16:54:54 <smcginnis> hemna: I'm shocked. ;)
16:55:12 <hemna> erlon, yah on your tempest patch, there is a liberty job
16:55:41 <hemna> huh, now I can't pull gerrit up at all
16:56:08 <erlon> hemna: hmmm,
16:56:20 <smcginnis> 3 minute warning
16:56:22 <hemna> gate-tempest-dsvm-neutron-full-liberty
16:56:34 <smcginnis> Anything more needed to discuss in the meeting?
16:56:34 <hemna> FAILURE in 52m 20s
16:56:40 <hemna> anyway, recheck should fix that one
16:57:00 <erlon> hemna: it might be because the test is failing at the point it is now testing, the extend()
16:57:35 <erlon> hemna: I'll check that out, haven't looked yet
16:57:52 <smcginnis> OK, anything else?
16:58:10 <erlon> smcginnis: for me that's all
16:58:15 <smcginnis> erlon: Thanks!
16:58:25 <smcginnis> Thanks everyone.
16:58:33 <smcginnis> #endmeeting
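
For context on erlon's point at 16:52:19, here is a sketch of the ordering bug behind the in-flight tempest test. The flow below is a hypothetical, simplified rendering of a manage path, not the actual cinder-volume code; manage_existing_get_size(), manage_existing() and extend_volume() are real driver interface methods, but the surrounding function and the exact ordering shown are illustrative assumptions.

    # Sketch only: a simplified manage flow showing why calling the
    # driver's extend path too early breaks drivers whose extend_volume()
    # reads volume.provider_location.

    def manage_existing_flow(context, driver, volume, existing_ref):
        size = driver.manage_existing_get_size(volume, existing_ref)
        model_update = driver.manage_existing(volume, existing_ref)

        # BUG pattern: extending here hands the driver a volume whose
        # provider_location is still unset, because the model update
        # returned by manage_existing() has not been saved yet:
        #
        #     driver.extend_volume(volume, volume.size)  # location is None

        volume.update(model_update)  # provider_location now on the object
        volume.save()

        if size < volume.size:
            # Safe: the driver can read volume.provider_location here.
            driver.extend_volume(volume, volume.size)
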