16:00:07 #startmeeting Cinder
16:00:07 Meeting started Wed Jun 22 16:00:07 2016 UTC and is due to finish in 60 minutes. The chair is smcginnis. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:09 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:10 ping dulek duncant eharney geguileo winston-d e0ne jungleboyj jgriffith thingee smcginnis hemna xyang tbarron scottda erlon rhedlind jbernard _alastor_ vincent_hou kmartin patrickeast sheel dongwenjuan JaniceLee cFouts Thelo vivekd adrianofr mtanino yuriy_n17 karlamrhein diablo_rojo jay.xu jgregor baumann rajinir wilson-l reduxio wanghao
16:00:11 hi
16:00:11 The meeting name has been set to 'cinder'
16:00:15 hi
16:00:15 Hi!
16:00:17 hi
16:00:17 hi
16:00:19 Hey everyone
16:00:21 #link https://wiki.openstack.org/wiki/CinderMeetings#Next_meeting
16:00:21 o/
16:00:22 hi
16:00:24 hi
16:00:25 <_alastor_> \o
16:00:27 Hello :)
16:00:29 hey
16:00:33 hi
16:00:35 * DuncanT waves
16:00:37 hi
16:00:39 #topic Announcements
16:00:42 hello
16:00:50 hi
16:00:55 #link https://etherpad.openstack.org/p/cinder-spec-review-tracking Review focus
16:00:57 o/
16:01:00 hey
16:01:03 Please take a look at the review focus etherpad.
16:01:26 Cores in particular - if we can spend some time working on the new driver submissions - it would be great to get a few more of those out of there.
16:01:44 Though with the lack of activity on some responding to comments, I'm sure a few of those are not going to make it.
16:02:00 #link https://etherpad.openstack.org/p/newton-cinder-midcycle
16:02:01 geguileo: Can you update the list of patches related to c-vol A/A there?
16:02:21 * jungleboyj is having to drop off for other commitments. Will catch up on the notes later.
16:02:25 Just another reminder on the midcycle. Sign up if you are going, and please add any topics there.
16:02:30 jungleboyj: Fine, be that way.
16:02:31 :P
16:02:32 smcginnis: do they all have the CI running ok?
16:02:40 erlon: Not all of them yet.
16:02:42 smcginnis: I will.
16:02:48 dulek: Do you want a list or just the first in the series?
16:02:49 * jungleboyj stomps off pouting.
16:02:54 :)
16:03:05 geguileo, dulek: Yeah, that could help.
16:03:17 That's another one I would like to see get through and be done with.
16:03:24 geguileo: Good question. A list makes it easier to assess size by just looking at the Etherpad.
16:03:31 geguileo: …I think.
16:03:36 #info Gerrit outage July 1 @ 20:00 UTC for upgrades.
16:03:38 dulek: Ok, list it is
16:03:48 Gerrit will be offline for upgrades.
16:03:50 smcginnis: midcycle is the week of 7/18, newton-2 is 7/12 when all specs are supposed to be approved. these dates are reversed :)
16:04:11 xyang: Yeah, as usual, not good timing. :]
16:04:26 I would really like to try to finalize specs by then
16:04:47 But like last time - at the discretion of the core team whether to accept any more after that point.
16:05:00 smcginnis: sure
16:05:04 yough
16:05:14 Hopefully no last minute rewrites of major features related to desserts though. ;)
16:05:26 smcginnis: right, we delayed the midcycle due to some tournaments in Fort Collins
16:05:36 :)
16:05:58 #topic Refactor quota reservation in managing volume/snapshot
16:06:01 wanghao_: Hi
16:06:05 hi
16:06:11 #link https://blueprints.launchpad.net/cinder/+spec/refactor-quota-reservation-in-managing-vol Blueprint
16:06:29 #link https://bugs.launchpad.net/cinder/+bug/1504007 Quota bug
16:06:29 Launchpad bug 1504007 in Cinder "unmanage volume/snapshot reduce quota incorrectly" [Medium,In progress] - Assigned to wanghao (wanghao749)
16:07:28 wanghao_: Was there something you wanted to discuss with this?
16:07:37 I discussed this bug with michal; we think that to fix it completely we need to refactor managing of existing volumes/snapshots.
16:07:59 Oh?
16:08:06 So I want to hear everyone's opinion.
16:08:26 wanghao_: Can you describe a little what is needed to be done for that?
16:08:57 Want to make the quota reservation and commit in cinder-api instead of in cinder-volume.
16:09:21 Quotas on manage are broken because we don't know the size of a volume being managed in c-api.
16:09:46 So we're reserving them in c-vol.
16:09:49 And it's hard to revert things properly in c-vol.
16:09:53 dulek: should we know more about the volume size in cinder volume rather than cinder api?
16:09:53 dulek: Ah. So by the time we manage it it's too late?
16:09:55 (if not impossible)
16:09:56 yes, that means we need to call getting-size in cinder-api.
16:10:40 smcginnis:
16:10:43 So from c-api we would call down to get size first, then call down to manage if quotas check out?
16:10:56 smcginnis: yeah
16:11:42 I think this was an anti-pattern in the past, that's why I wanted that to be discussed at a meeting.
16:12:00 Now we have list manageable volumes, which also works as a call to c-vol.
16:12:28 I think it makes sense, but would be interested in hearing from others on any reasons why not to do it that way.
16:12:35 list manageable is not implemented everywhere, and is low performance
16:13:23 The point is getting the volume size, since quota reservation needs that.
16:14:01 how do you know the size without making a call to the backend?
16:14:11 this should go to scheduler first?
16:14:42 Why is it "hard to revert things properly in c-vol"?
16:14:43 xyang: I agree
16:15:35 xyang: we don't know it, so we want to do it first in cinder-api not cinder-volume.
16:15:51 wanghao_: scheduler is involved in manage volume
16:16:05 Moving this to c-api seems not to be the right way forward, honestly
16:16:20 wanghao_: I think that is why it is done in cinder volume. cinder api -> scheduler -> cinder volume
16:16:28 DuncanT: +1
16:16:29 xyang: When doing manage you need to specify host, backend and pool, so no scheduler needed I think.
16:16:32 I don’t know how you get the size in cinder api?
16:16:34 that can take a 'long time' to have the c-api wait for the array to report size
16:16:46 c-api -> c-vol -> array and back
16:16:46 dulek: scheduler is involved
16:16:49 can be very slow
16:16:54 dulek: right
16:17:04 dulek: it goes to scheduler first.
16:17:34 This adds APIs for getting details: https://specs.openstack.org/openstack/cinder-specs/specs/newton/list-manage-existing.html
16:18:00 xyang: Oh, it only validates the host specified by the user. Checks if a volume will fit there with filters.
16:18:04 A little different scenario, but similar.
16:18:39 Okay, so here's why it's hard to do things correctly in c-vol.
16:18:54 It's because the DB entry is already created.
16:19:00 (by c-api)
16:19:32 In other places quotas are validated before creating a DB row for a volume.
16:20:08 We can just put the volume in error if the quota reservation fails in c-vol though... it's an admin only call
16:20:09 So now we have a problem - should we decrement quota counters when deleting a volume?
16:20:33 Sure, but now do I decrement the quota or not when deleting?
16:20:40 How do I know when it failed?
16:21:04 Hmmm, a valid point
16:21:32 Another option could (probably) be a special error status that is set on failure modes not requiring quota decrement.
16:22:03 error_managing?
16:22:33 DuncanT: that's why the bug is here, we need to cover some cases, but maybe can't fix it completely.
16:23:35 DuncanT: Naming isn't a problem here. ;)
16:24:23 wanghao_: I think a special error status is less messy than moving the quota reservation, in this case
16:24:48 DuncanT: I don't know if a new state would cover all cases. Like when the volume message was lost on the way to c-vol… What do we do? Do we decrement?
16:25:17 dulek: No, since we haven't reserved yet
16:25:18 We have a creating status. It's like we would need a "managing" status as well.
16:25:23 DuncanT: Is there a reason that we cannot decrement the quota reservation when it fails?
16:25:33 DuncanT: in cinder volume
16:25:34 "managing" seems reasonable
16:26:13 xyang: Because we decrement quota when deleting volumes in error state
16:26:35 Manila did the same thing. There are states for 'managing' and 'manage_error'.
16:26:56 cknight: Ouch, was it to fix the same problem?
16:27:08 dulek: I don't recall, honestly.
16:27:11 cknight: emm, that's useful info.
16:27:35 cknight: Do you know if there was a reason for manage_error rather than just error? Just to be clear.
16:27:42 'managing' matches more closely what we have with other operations, though the quota management is still different
16:27:53 smcginnis: I'll check.
16:28:08 cknight: No worries, I was just curious if you happened to know.
16:28:13 But a managing volume has a size of 0 until it succeeds, so I don't know why it's an issue with quota reservations.
16:28:14 DuncanT: if managing, we don't decrement the quota, and if error_manage, decrement it, right?
16:28:17 smcginnis: Don't delete quota when deleting a volume in manage_error for us....
16:28:33 cknight: It's about the volume number quota I think.
16:28:35 DuncanT: Yeah, it could be a way to differentiate that operation.
16:28:44 DuncanT, we should be able to decrement the quota for failed manage calls though in c-vol
16:28:46 DuncanT:
16:28:51 we shouldn't have to wait for a delete
16:28:59 hemna: +1
16:29:13 dulek: Sure, so it goes to error and the quota is decremented on delete.
16:29:49 Well, if Manila already tried the pattern, then why don't we?
16:30:16 dulek: I discussed this with bswartz at the time, and his feeling was that a failed manage results in an errored volume that counts against quota until it is deleted.
16:30:53 dulek: Since manage is admin only, the admin can always delete or force-delete as needed.
16:30:57 cknight: I wonder why. The volume was already there before on the backend.
16:31:24 cknight, but we don't know and may never know how much quota (space) to count against that volume
16:31:25 cknight: Seems strange to now count it against quota if it's not actually managed.
16:31:29 it's broken all the way around
16:31:50 admin-only-ness is controlled by policy.json, so I don't think we should make such assumptions.
16:31:56 smcginnis: Yes, but it's all consistent. Every non-deleted volume in the db counts against quota.
16:32:11 hemna: Quotas are for tenants, not backends.
16:32:23 hemna: I think you're thinking of capacity here.
16:32:38 cknight: I guess I can see that argument. Not saying I buy into it, but I can see the point. ;)
16:32:39 dulek, there are capacity quotas as well
16:32:44 and counts of volumes
16:33:07 DuncanT: so if we only change this for manage volume, it seems inconsistent? for everything else, we only decrement quota in delete_volume
16:33:24 The volume is already on the backend, so capacity is decreased. But a tenant's quotas aren't related to the capacity of a single backend.
16:33:46 xyang: For everything else, we know the size early
16:34:10 dulek, not sure that makes much sense to me sorry.
16:34:23 hemna: so our quota management does not look at capacity on the backend. they are not synced up at all :(
16:34:54 hemna: I think he's saying the reported capacity of the backend never changes in this case because it's space already consumed.
16:34:55 xyang: Which isn't a problem.
16:35:09 I don't care about the capacity on the array
16:35:22 we have a size quota in cinder related to the volume sizes
16:35:23 anyway
16:35:38 nm
16:35:55 xyang: There are lots of reasons not to sync them up.... arrays can be shared
16:36:18 overcommitted
16:36:29 overprovisioned
16:36:42 How about getting the size from the driver first, and then going to scheduler->volume?
16:37:12 wanghao_: Are you suggesting a synchronous call from c-api to c-vol?
16:37:18 and if the driver fails to get the size?
16:37:21 cknight: +1
16:37:26 yes
16:37:27 cknight, :(
16:37:31 that can take a long time
16:37:31 i don't like that
16:37:40 wanghao_: -1 Gotta be async. https://wiki.openstack.org/wiki/BasicDesignTenets
16:37:48 cknight: +1
16:38:29 There's a sync call in transfer volume BTW, that should be fixed if anybody wants to take a look
16:38:33 cknight: Whoa, cool page.
16:38:34 * tbarron thinks this is the kind of issue that is driving jaypipes to propose doing away with reservations (which can be rolled back and expire) in favor of claims
16:39:09 * tbarron isn't sure which bandaid fix is best, but let's not change basic design patterns
16:39:43 So how do we move this forward?
16:40:10 I see a couple of options. Handle the reservations in c-vol like now (also like Manila does it). Or require a size parameter in the manage API (failing in c-vol if the number turns out to be wrong).
16:40:14 maybe we live with a bug till we can do something clean and architecturally sound
16:40:33 tbarron: That work is not going to be usable this cycle.
16:40:45 DuncanT: +1
16:40:55 (or wait, like Tom says)
16:41:05 cknight: I'm a fan of consistency, so following the Manila approach seems reasonable to me.
16:41:19 but maybe sometimes fixing something is just pushing on a balloon
16:41:35 I'd prefer the state based approach - it seems least intrusive
16:42:16 DuncanT: The Manila state based approach, right?
16:42:32 cknight: I prefer the first option too if we can't do something in cinder-api.
16:42:36 smcginnis: Sounds like it, yes - I haven't looked at their code
16:42:46 Starting to require size would be a microversion, so we still could end up with broken quotas.
16:42:55 DuncanT: Me neither, but at least at a high level it sounds alright.
16:42:59 If the user used an older client.
16:42:59 DuncanT: I'll send you a link to it.
16:43:11 So states seem to make sense.
16:43:19 dulek: Passing in the expected size to the API call?
16:43:36 dulek: OK, nevermind, I see what you're saying.
16:43:45 "Or require a size parameter in the manage API (failing in c-vol if the number turns out to be wrong)." 16:43:54 I was referring to this. 16:43:55 smcginnis: it will break the api too I think... 16:44:04 wanghao_: Does this make sense to you? Can you take a look at the manila approach and see if you can use that for Cinder? 16:44:06 cknight: I've got it checked out, I've just not looked yet - will do after the meeting 16:44:13 smcginnis: sure 16:44:17 DuncanT: ok 16:44:27 wanghao_: OK, let's leave it at that then and move on. 16:44:37 #action wanghao_ to review Manila's approach 16:44:49 #topic Fixes of 'extend volume' should'n be merged without tempest dependency 16:44:53 erlon: Hey 16:44:57 Hi 16:45:03 this is quick 16:45:08 So, there are some bugfixes in the queue that seems to be trivial, but they need to be some specific backend testing to make sure all that work. 16:45:24 https://bugs.launchpad.net/cinder?field.searchtext=honor&search=Search&field.status%3Alist=NEW&field.status%3Alist=INCOMPLETE_WITH_RESPONSE&field.status%3Alist=INCOMPLETE_WITHOUT_RESPONSE&field.status%3Alist=CONFIRMED&field.status%3Alist=TRIAGED&field.status%3Alist=INPROGRESS&field.status%3Alist=FIXCOMMITTED&field.assignee=&field.bug_reporter=&field.omit_dup 16:45:24 es=on&field.has_patch=&field.has_no_package= 16:45:42 Tempest does not cover this 'path' yet, so, I added this test there: 16:45:48 https://review.openstack.org/#/c/310061/ 16:46:26 goosh 16:46:27 https://www.irccloud.com/pastebin/4lQP2TGT/ 16:46:29 tinyurl.... 16:46:37 sorry, I re 16:47:01 I the wiki has a problem with goo.gl 16:47:18 did't tried any other 16:47:33 but that's it just to cores be aware 16:47:37 So, I believe erlon has pointed out that some patches for that fix already merged, and that those patches will fail this in-flight tempest test (because the merged patches have a bug). Am I correct here, erlon ? 16:47:43 erlon: Yeah, the wiki will block shortened URLs. 16:48:08 erlon: OK, so you just need eyes on that. 16:48:13 Once it passes Jenkins... 16:48:52 scottda: yes, some patches that should fix that have entered already, when the test get merged on tempest they can potentially break 16:49:00 erlon: is there a bug on what the problem in those merged patches? 16:50:07 xyang: it can be, but not necessarily , some BE have problems with the adopted fix. Others might have not 16:50:52 erlon: I guess I don’t know what the problem is:). I may have approved some of the merged patches, but don’t know what is wrong with them 16:51:00 erlon: some explanation would help 16:51:32 erlon: It's not clear, so the idea is to get tempest coverage on that so we know for sure, correct? 16:51:34 that patch is failing liberty 16:51:37 is that expected to pass ? 16:52:19 In general lines, the fixes that people are sending can driver.extend() from inside the driver, but, the object volume, at that point still does not have the provider_location saved, and code inside driver.extend() might try to access it 16:52:32 s/can/call/ 16:52:43 and that isn't going to get fixed in liberty 16:52:55 if the liberty tempest test is a real failure (due to the patch) 16:53:07 smcginnis: yes 16:53:40 that liberty failure looks like a neutron problem from what I can tell 16:54:03 smcginnis: I'll also add the Depends-on and recheck those patches, so, they run the test that is in tempest 16:54:36 gerrit seems really slow 16:54:49 hemna: liberty? 16:54:54 hemna: I'm shocked. 
;) 16:55:12 erlon, yah on your tempest patch, there is a liberty job 16:55:41 huh, now I can't pull gerrit up at a ll 16:56:08 hemna: hmmm, 16:56:20 3 minute warning 16:56:22 gate-tempest-dsvm-neutron-full-liberty 16:56:34 Anything more needed to discuss in the meeting? 16:56:34 FAILURE in 52m 20s 16:56:40 anyway, recheck should fix that one 16:57:00 hemna: it might be because the test is failing in the point it is now testing, the extend() 16:57:35 hemna: I'll check that out, haven't looked yet 16:57:52 OK, anything else? 16:58:10 smcginnis: for me its all 16:58:15 erlon: Thanks! 16:58:25 Thanks everyone. 16:58:33 #endmeeting
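
For reference, the state-based fix discussed in the quota topic above (a 'managing' status while the manage operation runs, plus an error status that tells a later delete not to decrement quota) might look roughly like the sketch below. This is a minimal illustration only, not Cinder's actual code: the status names, the quotas helper, and the simplified driver call signatures are assumptions made for the example.

# Minimal sketch, assuming hypothetical status names and quota/driver
# helpers -- not Cinder's real manage_existing flow.

STATUS_MANAGING = 'managing'
STATUS_ERROR_MANAGING = 'error_managing'   # quota was never reserved
STATUS_AVAILABLE = 'available'


def manage_existing(volume, driver, quotas):
    """Adopt an existing backend volume (runs in c-vol, where the real
    size can be read from the backend before reserving quota)."""
    volume.status = STATUS_MANAGING
    reservations = None
    try:
        size = driver.manage_existing_get_size(volume)       # ask the backend
        reservations = quotas.reserve(gigabytes=size, volumes=1)
        driver.manage_existing(volume)                        # adopt the volume
        quotas.commit(reservations)
    except Exception:
        if reservations:
            quotas.rollback(reservations)
        # Nothing is counted against the tenant at this point, so flag the
        # volume so that a later delete knows to skip the quota decrement.
        volume.status = STATUS_ERROR_MANAGING
        raise
    volume.size = size
    volume.status = STATUS_AVAILABLE


def delete_volume(volume, quotas):
    """Delete a volume, decrementing quota only if it was ever counted."""
    if volume.status != STATUS_ERROR_MANAGING:
        reservations = quotas.reserve(gigabytes=-volume.size, volumes=-1)
        quotas.commit(reservations)
    # ... backend deletion and DB cleanup would follow here ...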