16:00:07 <smcginnis> #startmeeting Cinder
16:00:07 <openstack> Meeting started Wed Jun 22 16:00:07 2016 UTC and is due to finish in 60 minutes. The chair is smcginnis. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:09 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:10 <smcginnis> ping dulek duncant eharney geguileo winston-d e0ne jungleboyj jgriffith thingee smcginnis hemna xyang tbarron scottda erlon rhedlind jbernard _alastor_ vincent_hou kmartin patrickeast sheel dongwenjuan JaniceLee cFouts Thelo vivekd adrianofr mtanino yuriy_n17 karlamrhein diablo_rojo jay.xu jgregor baumann rajinir wilson-l reduxio wanghao
16:00:11 <yuriy_n17> hi
16:00:11 <openstack> The meeting name has been set to 'cinder'
16:00:15 <e0ne> hi
16:00:15 <geguileo> Hi!
16:00:17 <cknight> hi
16:00:17 <ntpttr> hi
16:00:19 <smcginnis> Hey everyone
16:00:21 <e0ne> #link https://wiki.openstack.org/wiki/CinderMeetings#Next_meeting
16:00:21 <patrickeast> o/
16:00:22 <adrianofr> hi
16:00:24 <tbarron> hi
16:00:25 <_alastor_> \o
16:00:27 <diablo_rojo> Hello :)
16:00:29 <scottda> hey
16:00:33 <eharney> hi
16:00:35 * DuncanT waves
16:00:37 <wanghao_> hi
16:00:39 <smcginnis> #topic Announcements
16:00:42 <Swanson> hello
16:00:50 <xyang> hi
16:00:55 <smcginnis> #link https://etherpad.openstack.org/p/cinder-spec-review-tracking Review focus
16:00:57 <dulek> o/
16:01:00 <erlon> hey
16:01:03 <smcginnis> Please take a look at the review focus etherpad.
16:01:26 <smcginnis> Cores in particular - if we can spend some time working on the new driver submissions - it would be great to get a few more of those out of there.
16:01:44 <smcginnis> Though with the lack of activity on some responding to comments, I'm sure a few of those are not going to make it.
16:02:00 <smcginnis> #link https://etherpad.openstack.org/p/newton-cinder-midcycle
16:02:01 <dulek> geguileo: Can you update the list of patches related to c-vol A/A there?
16:02:21 * jungleboyj is having to drop off for other commitments. Will catch up on the notes later.
16:02:25 <smcginnis> Just another reminder on the midcycle. Sign up if you are going, and please add any topics there.
16:02:30 <smcginnis> jungleboyj: Fine, be that way.
16:02:31 <smcginnis> :P
16:02:32 <erlon> smcginnis: do they all have the CI running ok?
16:02:40 <smcginnis> erlon: Not all of them yet.
16:02:42 <jungleboyj> smcginnis: I will.
16:02:48 <geguileo> dulek: Do you want a list or just the first in the series?
16:02:49 * jungleboyj stomps off pouting.
16:02:54 <smcginnis> :)
16:03:05 <smcginnis> geguileo, dulek: Yeah, that could help.
16:03:17 <smcginnis> That's another one I would like to see get through and be done with.
16:03:24 <dulek> geguileo: Good question. A list makes it easier to assess size by just looking at the Etherpad.
16:03:31 <dulek> geguileo: …I think.
16:03:36 <smcginnis> #info Gerrit outage July 1 @ 20:00 UTC for upgrades.
16:03:38 <geguileo> dulek: Ok, list it is
16:03:48 <smcginnis> Gerrit will be offline for upgrades.
16:03:50 <xyang> smcginnis: midcycle is the week of 7/18, newton-2 is 7/12 when all specs are supposed to be approved. these dates are reversed :)
16:04:11 <smcginnis> xyang: Yeah, as usual, not good timing. :]
16:04:26 <smcginnis> I would really like to try to finalize specs by then.
16:04:47 <smcginnis> But like last time - at the discretion of the core team whether to accept any more after that point.
16:05:00 <xyang> smcginnis: sure
16:05:04 <hemna> yough
16:05:14 <smcginnis> Hopefully no last minute rewrites of major features related to desserts though. ;)
16:05:26 <xyang> smcginnis: right, we delayed the midcycle due to some tournaments in Fort Collins
16:05:36 <xyang> :)
16:05:58 <smcginnis> #topic Refactor quota reservation in managing volume/snapshot
16:06:01 <smcginnis> wanghao_: Hi
16:06:05 <wanghao_> hi
16:06:11 <smcginnis> #link https://blueprints.launchpad.net/cinder/+spec/refactor-quota-reservation-in-managing-vol Blueprint
16:06:29 <smcginnis> #link https://bugs.launchpad.net/cinder/+bug/1504007 Quota bug
16:06:29 <openstack> Launchpad bug 1504007 in Cinder "unmanage volume/snapshot reduce quota incorrectly" [Medium,In progress] - Assigned to wanghao (wanghao749)
16:07:28 <smcginnis> wanghao_: Was there something you wanted to discuss with this?
16:07:37 <wanghao_> I talked about this bug with michal; we think that to fix this bug completely, we need to refactor managing of existing volumes/snapshots.
16:07:59 <smcginnis> Oh?
16:08:06 <wanghao_> So I want to hear everyone's opinion.
16:08:26 <smcginnis> wanghao_: Can you describe a little what is needed to be done for that?
16:08:57 <wanghao_> We want to make the quota reservation and commit in cinder-api instead of in cinder-volume.
16:09:21 <dulek> Quotas on manage are broken because we don't know the size of a volume being managed in c-api.
16:09:46 <dulek> So we're reserving them in c-vol.
16:09:49 <dulek> And it's hard to revert things properly in c-vol.
16:09:53 <xyang> dulek: should we know more about the volume size in cinder volume rather than cinder api?
16:09:53 <smcginnis> dulek: Ah. So by the time we manage it it's too late?
16:09:55 <dulek> (if not impossible)
16:09:56 <wanghao_> yes, that means we need a get-size call in cinder-api.
16:10:40 <wanghao_> smcginnis:
16:10:43 <smcginnis> So from c-api we would call down to get the size first, then call down to manage if the quotas check out?
16:10:56 <wanghao_> smcginnis: yeah
16:11:42 <dulek> I think this was an anti-pattern in the past, that's why I wanted that to be discussed in a meeting.
16:12:00 <dulek> Now we have list manageable volumes, which also works as a call to c-vol.
16:12:28 <smcginnis> I think it makes sense, but would be interested in hearing from others on any reasons why not to do it that way.
16:12:35 <DuncanT> list manageable is not implemented everywhere, and is low performance
16:13:23 <wanghao_> The point is getting the volume size, since quota reservation needs that.
16:14:01 <xyang> how do you know the size without making a call to the backend?
16:14:11 <xyang> this should go to the scheduler first?
16:14:42 <cknight> Why is it "hard to revert things properly in c-vol"?
16:14:43 <geguileo> xyang: I agree
16:15:35 <wanghao_> xyang: we don't know it, so we want to do it first in cinder-api, not cinder-volume.
16:15:51 <xyang> wanghao_: the scheduler is involved in manage volume
16:16:05 <DuncanT> Moving this to c-api seems not to be the right way forward, honestly
16:16:20 <xyang> wanghao_: I think that is why it is done in cinder volume. cinder api -> scheduler -> cinder volume
16:16:28 <cknight> DuncanT: +1
16:16:29 <dulek> xyang: When doing manage you need to specify host, backend and pool, so no scheduler needed I think.
16:16:32 <xyang> I don't know how you get the size in cinder api?
16:16:34 <hemna> that can take a 'long time' to have the c-api wait for the array to report size
16:16:46 <hemna> c-api -> c-vol -> array and back
16:16:46 <xyang> dulek: scheduler is involved
16:16:49 <hemna> can be very slow
16:16:54 <sheel> dulek: right
16:17:04 <xyang> dulek: it goes to the scheduler first.
16:17:34 <smcginnis> This adds APIs for getting details: https://specs.openstack.org/openstack/cinder-specs/specs/newton/list-manage-existing.html
16:18:00 <dulek> xyang: Oh, it only validates the host specified by the user. Checks if a volume will fit there with filters.
16:18:04 <smcginnis> A little different scenario, but similar.
16:18:39 <dulek> Okay, so here's why it's hard to do things correctly in c-vol.
16:18:54 <dulek> It's because the DB entry is already created.
16:19:00 <dulek> (by c-api)
16:19:32 <dulek> In other places quotas are validated before creating a DB row for a volume.
16:20:08 <DuncanT> We can just put the volume in error if the quota reservation fails in c-vol though... it's an admin only call
16:20:09 <dulek> So we don't have the problem of deciding whether to decrement quota counters when deleting a volume.
16:20:33 <dulek> Sure, but now do I decrement the quota or not when deleting?
16:20:40 <dulek> How do I know when it failed?
16:21:04 <DuncanT> Hmmm, a valid point
16:21:32 <dulek> Another option could (probably) be a special error status that is set on failure modes not requiring a quota decrement.
16:22:03 <DuncanT> error_managing?
16:22:33 <wanghao_> DuncanT: that's why the bug is here; we need to cover some cases, but maybe we can't fix it completely.
16:23:35 <dulek> DuncanT: Naming isn't a problem here. ;)
16:24:23 <DuncanT> wanghao_: I think a special error status is less messy than moving the quota reservation, in this case
16:24:48 <dulek> DuncanT: I don't know if a new state would cover all cases. Like when the volume message was lost on the way to c-vol… What do we do? Do we decrement?
16:25:17 <DuncanT> dulek: No, since we haven't reserved yet
16:25:18 <dulek> We have a 'creating' status. It's like we would need a 'managing' status as well.
16:25:23 <xyang> DuncanT: Is there a reason that we cannot decrement the quota reserve when it fails?
16:25:33 <xyang> DuncanT: in cinder volume
16:25:34 <DuncanT> "managing" seems reasonable
16:26:13 <DuncanT> xyang: Because we decrement quota when deleting volumes in error state
16:26:35 <cknight> Manila did the same thing. There are states for 'managing' and 'manage_error'.
16:26:56 <dulek> cknight: Ouch, was it to fix the same problem?
16:27:08 <cknight> dulek: I don't recall, honestly.
16:27:11 <wanghao_> cknight: emm, that's useful info.
16:27:35 <smcginnis> cknight: Do you know if there was a reason for manage_error rather than just error? Just to be clear.
16:27:42 <DuncanT> 'managing' matches more closely what we have with other operations, though the quota management is still different
16:27:53 <cknight> smcginnis: I'll check.
16:28:08 <smcginnis> cknight: No worries, I was just curious if you happened to know.
16:28:13 <cknight> But a managing volume has a size of 0 until it succeeds, so I don't know why it's an issue with quota reservations.
16:28:14 <wanghao_> DuncanT: if managing, we don't decrement the quota, and if error_manage, decrement it, right?
16:28:17 <DuncanT> smcginnis: Don't delete quota when deleting a volume in manage_error for us....
16:28:33 <dulek> cknight: It's about the volume number quota, I think.
16:28:35 <smcginnis> DuncanT: Yeah, it could be a way to differentiate that operation.
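
For readers following along, here is a minimal sketch of the asymmetry dulek describes above. QUOTAS.reserve/commit/rollback mirror Cinder's quota engine calls; the db_* and rpc_* helpers and the exact flow are hypothetical simplifications, not the real code. In the normal create flow the reservation happens before the DB row exists, so a failed reservation needs no cleanup; in the manage flow the row is created first and the reservation can only happen later in c-vol, once the driver has reported the size.

    # Sketch only: QUOTAS.reserve/commit/rollback follow Cinder's quota
    # engine API; the db_* and rpc_* helpers are hypothetical.

    def create_volume(context, size):
        # Normal create: reserve quota BEFORE any DB row exists.
        reservations = QUOTAS.reserve(context, volumes=1, gigabytes=size)
        try:
            volume = db_create_volume_row(context, size=size,
                                          status='creating')
        except Exception:
            QUOTAS.rollback(context, reservations)  # nothing else to undo
            raise
        QUOTAS.commit(context, reservations)
        return volume

    def manage_existing(context, host, ref):
        # Manage: the size is unknown until the driver inspects `ref`, so
        # the DB row is created first and quota is reserved later in c-vol.
        # If that reservation fails, a row already exists, and
        # delete_volume cannot tell whether it ever counted against quota.
        volume = db_create_volume_row(context, size=0, status='creating')
        rpc_cast_manage_existing(context, volume, ref)  # async cast to c-vol
        return volume
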
16:28:44 <hemna> DuncanT, we should be able to decrement the quota for failed manage calls though in c-vol
16:28:46 <wanghao_> DuncanT:
16:28:51 <hemna> we shouldn't have to wait for a delete
16:28:59 <smcginnis> hemna: +1
16:29:13 <cknight> dulek: Sure, so it goes to error and the quota is decremented on delete.
16:29:49 <dulek> Well, if Manila already tried the pattern, then why don't we?
16:30:16 <cknight> dulek: I discussed this with bswartz at the time, and his feeling was that a failed manage results in an errored volume that counts against quota until it is deleted.
16:30:53 <cknight> dulek: Since manage is admin only, the admin can always delete or force-delete as needed.
16:30:57 <smcginnis> cknight: I wonder why. The volume was already there before on the backend.
16:31:24 <hemna> cknight, but we don't know and may never know how much quota (space) to count against that volume
16:31:25 <smcginnis> cknight: Seems strange to now count it against quota if it's not actually managed.
16:31:29 <hemna> it's broken all the way around
16:31:50 <dulek> admin-only-ness is controlled by policy.json, so I don't think we should make such assumptions.
16:31:56 <cknight> smcginnis: Yes, but it's all consistent. Every non-deleted volume in the db counts against quota.
16:32:11 <dulek> hemna: Quotas are for tenants, not backends.
16:32:23 <dulek> hemna: I think you're thinking of capacity here.
16:32:38 <smcginnis> cknight: I guess I can see that argument. Not saying I buy into it, but I can see the point. ;)
16:32:39 <hemna> dulek, there are capacity quotas as well
16:32:44 <hemna> and counts of volumes
16:33:07 <xyang> DuncanT: so if we only change this for manage volume, it seems inconsistent? for everything else, we only decrement quota in delete_volume
16:33:24 <dulek> The volume is already on the backend, so capacity is decreased. But a tenant's quotas aren't related to the capacity of a single backend.
16:33:46 <DuncanT> xyang: For everything else, we know the size early
16:34:10 <hemna> dulek, not sure that makes much sense to me, sorry.
16:34:23 <xyang> hemna: so our quota management does not look at capacity on the backend. they are not synced up at all :(
16:34:54 <smcginnis> hemna: I think he's saying the reported capacity of the backend never changes in this case because it's space already consumed.
16:34:55 <dulek> xyang: Which isn't a problem.
16:35:09 <hemna> I don't care about the capacity on the array
16:35:22 <hemna> we have a size quota in cinder related to the volume sizes
16:35:23 <hemna> anyway
16:35:38 <hemna> nm
16:35:55 <DuncanT> xyang: There's lots of reasons not to sync them up.... arrays can be shared
16:36:18 <erlon> overcommitted
16:36:29 <erlon> overprovisioned
16:36:42 <wanghao_> How about getting the size from the driver first, and then going to scheduler->volume?
16:37:12 <cknight> wanghao_: Are you suggesting a synchronous call from c-api to c-vol?
16:37:18 <hemna> and if the driver fails to get the size?
16:37:21 <tbarron> cknight: +1
16:37:26 <wanghao_> yes
16:37:27 <hemna> cknight, :(
16:37:31 <hemna> that can take a long time
16:37:31 <tbarron> i don't like that
16:37:40 <cknight> wanghao_: -1 Gotta be async. https://wiki.openstack.org/wiki/BasicDesignTenets
16:37:48 <tbarron> cknight: +1
16:38:29 <DuncanT> There's a sync call in transfer volume BTW, that should be fixed if anybody wants to take a look
16:38:33 <dulek> cknight: Whoa, cool page.
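
For reference, a sketch of the state-based variant DuncanT describes (don't give quota back when deleting a volume in the error-managing state); Manila's variant instead lets the errored volume count against quota until it is deleted. The 'error_managing' status is the one proposed in the meeting; everything besides the QUOTAS.* calls is a hypothetical simplification of the delete path.

    # Sketch of the state-based approach discussed above; 'error_managing'
    # is the status proposed in the meeting, and all helpers other than
    # QUOTAS.* are hypothetical.

    def delete_volume(context, volume):
        driver_delete_volume(volume)

        if volume.status == 'error_managing':
            # Quota was never committed for this volume (the manage failed
            # at or before reservation time), so there is nothing to give
            # back; just drop the DB row.
            db_destroy_volume_row(context, volume)
            return

        # Every other non-deleted volume counts against quota; release it
        # by committing a negative reservation, as Cinder's delete path
        # does for volume count and gigabytes.
        reservations = QUOTAS.reserve(context,
                                      volumes=-1,
                                      gigabytes=-volume.size)
        db_destroy_volume_row(context, volume)
        QUOTAS.commit(context, reservations)
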
16:38:34 * tbarron thinks this is the kind of issue that is driving jaypipes to propose doing away with reservations (which can be rolled back and expire) in favor of claims
16:39:09 * tbarron isn't sure which bandaid fix is best, but let's not change basic design patterns
16:39:43 <smcginnis> So how do we move this forward?
16:40:10 <cknight> I see a couple of options. Handle the reservations in c-vol like now (also like Manila does it). Or require a size parameter in the manage API (failing in c-vol if the number turns out to be wrong).
16:40:14 <tbarron> maybe we live with a bug till we can do something clean and architecturally sound
16:40:33 <DuncanT> tbarron: That work is not going to be usable this cycle.
16:40:45 <tbarron> DuncanT: +1
16:40:55 <cknight> (or wait, like Tom says)
16:41:05 <smcginnis> cknight: I'm a fan of consistency, so following the Manila approach seems reasonable to me.
16:41:19 <tbarron> but maybe sometimes fixing something is just pushing on a balloon
16:41:35 <DuncanT> I'd prefer the state based approach - it seems least intrusive
16:42:16 <smcginnis> DuncanT: The Manila state based approach, right?
16:42:32 <wanghao_> cknight: I prefer the first option too if we can't do something in cinder-api.
16:42:36 <DuncanT> smcginnis: Sounds like it, yes - I haven't looked at their code
16:42:46 <dulek> Starting to require size would be a microversion, so we could still end up with broken quotas.
16:42:55 <smcginnis> DuncanT: Me neither, but at least at a high level it sounds alright.
16:42:59 <dulek> If the user used an older client.
16:42:59 <cknight> DuncanT: I'll send you a link to it.
16:43:11 <dulek> So states seem to make sense.
16:43:19 <smcginnis> dulek: Passing in the expected size to the API call?
16:43:36 <smcginnis> dulek: OK, nevermind, I see what you're saying.
16:43:45 <dulek> "Or require a size parameter in the manage API (failing in c-vol if the number turns out to be wrong)."
16:43:54 <dulek> I was referring to this.
16:43:55 <wanghao_> smcginnis: it will break the api too I think...
16:44:04 <smcginnis> wanghao_: Does this make sense to you? Can you take a look at the Manila approach and see if you can use that for Cinder?
16:44:06 <DuncanT> cknight: I've got it checked out, I've just not looked yet - will do after the meeting
16:44:13 <wanghao_> smcginnis: sure
16:44:17 <cknight> DuncanT: ok
16:44:27 <smcginnis> wanghao_: OK, let's leave it at that then and move on.
16:44:37 <smcginnis> #action wanghao_ to review Manila's approach
16:44:49 <smcginnis> #topic Fixes of 'extend volume' shouldn't be merged without tempest dependency
16:44:53 <smcginnis> erlon: Hey
16:44:57 <erlon> Hi
16:45:03 <erlon> this is quick
16:45:08 <erlon> So, there are some bugfixes in the queue that seem to be trivial, but they need some specific backend testing to make sure they all work.
16:45:24 <erlon> https://bugs.launchpad.net/cinder?field.searchtext=honor&search=Search&field.status%3Alist=NEW&field.status%3Alist=INCOMPLETE_WITH_RESPONSE&field.status%3Alist=INCOMPLETE_WITHOUT_RESPONSE&field.status%3Alist=CONFIRMED&field.status%3Alist=TRIAGED&field.status%3Alist=INPROGRESS&field.status%3Alist=FIXCOMMITTED&field.assignee=&field.bug_reporter=&field.omit_dupes=on&field.has_patch=&field.has_no_package=
16:45:42 <erlon> Tempest does not cover this 'path' yet, so I added this test there:
16:45:48 <erlon> https://review.openstack.org/#/c/310061/
16:46:26 <hemna> goosh
16:46:27 <erlon> https://www.irccloud.com/pastebin/4lQP2TGT/
16:46:29 <hemna> tinyurl....
16:46:37 <erlon> sorry, I re
16:47:01 <erlon> the wiki has a problem with goo.gl
16:47:18 <erlon> didn't try any other
16:47:33 <erlon> but that's it, just so cores are aware
16:47:37 <scottda> So, I believe erlon has pointed out that some patches for that fix already merged, and that those patches will fail this in-flight tempest test (because the merged patches have a bug). Am I correct here, erlon?
16:47:43 <smcginnis> erlon: Yeah, the wiki will block shortened URLs.
16:48:08 <smcginnis> erlon: OK, so you just need eyes on that.
16:48:13 <smcginnis> Once it passes Jenkins...
16:48:52 <erlon> scottda: yes, some patches that should fix that have gone in already; when the test gets merged in tempest they can potentially break
16:49:00 <xyang> erlon: is there a bug describing what the problem in those merged patches is?
16:50:07 <erlon> xyang: it can be, but not necessarily; some backends have problems with the adopted fix, others might not
16:50:52 <xyang> erlon: I guess I don't know what the problem is :). I may have approved some of the merged patches, but don't know what is wrong with them
16:51:00 <xyang> erlon: some explanation would help
16:51:32 <smcginnis> erlon: It's not clear, so the idea is to get tempest coverage on that so we know for sure, correct?
16:51:34 <hemna> that patch is failing liberty
16:51:37 <hemna> is that expected to pass?
16:52:19 <erlon> In general terms, the fixes that people are sending call driver.extend() from inside the driver, but the volume object at that point still does not have the provider_location saved, and code inside driver.extend() might try to access it
16:52:43 <hemna> and that isn't going to get fixed in liberty
16:52:55 <hemna> if the liberty tempest test is a real failure (due to the patch)
16:53:07 <erlon> smcginnis: yes
16:53:40 <hemna> that liberty failure looks like a neutron problem from what I can tell
16:54:03 <erlon> smcginnis: I'll also add the Depends-On and recheck those patches, so they run the test that is in tempest
16:54:36 <hemna> gerrit seems really slow
16:54:49 <erlon> hemna: liberty?
16:54:54 <smcginnis> hemna: I'm shocked. ;)
16:55:12 <hemna> erlon, yah on your tempest patch, there is a liberty job
16:55:41 <hemna> huh, now I can't pull gerrit up at all
16:56:08 <erlon> hemna: hmmm,
16:56:20 <smcginnis> 3 minute warning
16:56:22 <hemna> gate-tempest-dsvm-neutron-full-liberty
16:56:34 <smcginnis> Anything more needed to discuss in the meeting?
16:56:34 <hemna> FAILURE in 52m 20s
16:56:40 <hemna> anyway, recheck should fix that one
16:57:00 <erlon> hemna: it might be because the test is failing at the point it is now testing, the extend()
16:57:35 <erlon> hemna: I'll check that out, haven't looked yet
16:57:52 <smcginnis> OK, anything else?
16:58:10 <erlon> smcginnis: for me that's all
16:58:15 <smcginnis> erlon: Thanks!
16:58:25 <smcginnis> Thanks everyone.
16:58:33 <smcginnis> #endmeeting
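
For context on erlon's point at 16:52:19, here is a sketch of the ordering bug behind the in-flight tempest test. The flow below is a hypothetical, simplified rendering of a manage path, not the actual cinder-volume code; manage_existing_get_size(), manage_existing() and extend_volume() are real driver interface methods, but the surrounding function and the exact ordering shown are illustrative assumptions.

    # Sketch only: a simplified manage flow showing why calling the
    # driver's extend path too early breaks drivers whose extend_volume()
    # reads volume.provider_location.

    def manage_existing_flow(context, driver, volume, existing_ref):
        size = driver.manage_existing_get_size(volume, existing_ref)
        model_update = driver.manage_existing(volume, existing_ref)

        # BUG pattern: extending here hands the driver a volume whose
        # provider_location is still unset, because the model update
        # returned by manage_existing() has not been saved yet:
        #
        #     driver.extend_volume(volume, volume.size)  # location is None

        volume.update(model_update)  # provider_location now on the object
        volume.save()

        if size < volume.size:
            # Safe: the driver can read volume.provider_location here.
            driver.extend_volume(volume, volume.size)
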