16:00:01 <thingee> #startmeeting cinder
16:00:01 <openstack> Meeting started Wed Jan 21 16:00:01 2015 UTC and is due to finish in 60 minutes. The chair is thingee. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:02 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:05 <openstack> The meeting name has been set to 'cinder'
16:00:08 <thingee> hi everyone
16:00:12 <smcginnis> Hello
16:00:12 <dulek> o/
16:00:13 <rushiagr> hi!
16:00:14 <rhe00_> hi
16:00:15 <avishay> Hey
16:00:15 <thangp> hi
16:00:17 <eikke> hi
16:00:21 <eharney> hi
16:00:25 <thingee> our agenda for today:
16:00:28 <thingee> #link https://wiki.openstack.org/wiki/CinderMeetings
16:00:34 <scottda> hi
16:00:44 <xyang> hi
16:00:48 <kmartin> hello
16:01:07 <thingee> Just a reminder on third party ci
16:01:14 <tbarron> hi
16:01:15 <mtanino> hi
16:01:19 <cknight> Hi
16:01:33 <thingee> I sent an email to individual driver maintainers. a long email explaining this
16:01:37 <TobiasE> Hi
16:01:51 <thingee> the deadline being March 19th
16:02:03 <thingee> I also emailed the openstack dev mailing list:
16:02:05 <thingee> #link http://lists.openstack.org/pipermail/openstack-dev/2015-January/054614.html
16:02:05 <erlon> hi
16:02:08 <e0ne> hi!
16:02:34 <thingee> In addition for those interested, join the Third party ci documentation sprint starting today
16:02:37 <thingee> #link http://lists.openstack.org/pipermail/openstack-dev/2015-January/054690.html
16:02:45 <thingee> ok lets get started!
16:02:58 <thingee> #topic Sharing volume snapshots
16:03:01 <rushiagr> https://blueprints.launchpad.net/cinder/+spec/snapshot-sharing
16:03:02 <thingee> rushiagr: you're up
16:03:06 <rushiagr> #link ^
16:03:13 <thingee> #link https://blueprints.launchpad.net/cinder/+spec/snapshot-sharing
16:03:33 <rushiagr> so we left halfway last time..
16:03:59 <rushil> \o
16:04:03 <rushiagr> the general thinking was between 'umm okay' and 'no'..
16:04:43 <xyang> how is this related to the other blueprint on public snapshot?
16:04:43 <rushiagr> I dont remember if there was any strong objection on the idea.. If I am wrong, please remind me again..
16:04:48 <flip214> rushiagr: sounds sane to me... I can see the usecase. And it doesn't even suffer from the quota-hardlink problem that UNIX had.
16:04:59 <rushiagr> xyang: it's an extension of that
16:05:15 <rushiagr> xyang: but to be clear, it's not full ACL
16:05:27 <dulek> Wasn't the outcome that public and private snapshots are enough?
16:05:30 <asselin_> hi
16:05:34 <xyang> rushiagr: so is this one dependent on that?
16:06:07 <rushiagr> xyang: there was already a blueprint and spec for public snapshot, so I wasn't sure how to write that, but yes, it's going to be dependent on that
16:06:13 <avishay> So what is the usecase? We already have shareable images, right? What do shareable snapshots give us? Why not shareable volumes too?
16:06:23 <rushiagr> dulek: I don't think we reached the conclusion
16:06:45 <jgriffith> rushiagr: to recap my preference was "no" on sharing, and "yes" on public snaps
16:06:53 <rushiagr> avishay: I don't know the exact requirement. But I was told 'they' wanted to share 'UAT disks' between tenants
16:07:11 <dulek> rushiagr: Yes, definitely, but public snapshots solve the usecase and are more-or-less accepted.
16:07:19 <winston-d_> rushiagr: what's UAT?
16:07:28 <dulek> rushiagr: So I just wanted to remind the idea.
16:07:34 <avishay> "they"? sounds like a government conspiracy :P
16:07:35 <jgriffith> dulek: +1
16:07:36 <rushiagr> jgriffith: okay. I remember you saying sharing snapshots as I proposed is a saner way than ACL. That's all I remembered :)
16:07:51 <jgriffith> rushiagr: yeah, that's the good part ;)
16:08:15 <jgriffith> rushiagr: I thought about use cases and looked at code last week-end; I like public as best option
16:08:25 <thingee> Sharable snapshots makes me think there is some framework in place for cinder to be mediator for the snapshot to be shared. making a snapshot public, seems like it's a bit switch and up to the owner to communicate to who they want to tell the uuid to.
16:08:26 <rushiagr> dulek: 'they' (winks at avishay) might not want to make stuff public
16:08:57 <jgriffith> thingee: +1, clean and easy IMO
16:09:22 <jgriffith> and it makes sense in other OpenStack workflows
16:09:25 <rushiagr> just to be clear about implementation, when a user will share a snap with a tenant, it will just add a db entry. No actual data will be generated/transferred out of the block storage until that new user creates a disk out of that shared snap
16:09:32 <thingee> sure, we just did it with volume types in k-1
16:09:44 <dulek> thingee: won't public snapshot be visible on cinder list? Glance shows public resources on the lists.
16:09:46 <thingee> oh I guess that was project_id
16:10:07 <rushiagr> thingee: sorry, I didn't completely understand you..
16:10:45 <thingee> dulek: I think the idea is, and rushiagr can correct me if I'm wrong, that the listing won't show the snapshots. Instead we skip the context check on the snapshot if it's public.
16:11:08 <thingee> but it's up to the owner to communicate the uuid to who they want to share it with.
16:11:19 <thingee> if it's anything other than that you're talking about ACL's
16:11:20 <rushiagr> thingee: yes, listing won't show public snapshots (or shared snapshots) by default..
16:11:21 <dulek> thingee: I don't like how inconsistent it is with other services behaviour
16:11:43 <xyang> if it is public, any tenant should be able to see it
16:11:49 <jgriffith> rushiagr: ummmm... that would be part of the feature add
16:11:56 <jgriffith> rushiagr: dulek not sure the concern?
16:12:24 <jgriffith> owner can set snapshot to "public", modify get to get all tenant and public, done
16:12:58 <jgriffith> let's try this...
16:13:03 <rushiagr> I think it's a valid use case if a tenant wants to share a snapshot. I even think it's cleaner than creating a vol, and then transferring to another tenant.. Best would have been only sharing snapshots, and no vol transfer.. But again, personal opinions
16:13:04 <dulek> jgriffith: Yup, that's how it should be done IMO
16:13:15 <jgriffith> anybody have a good reason to NOT use public, but instead use SHARING?
16:13:36 <dulek> But thingee and rushiagr are proposing that public won't be shown on lists for other tenants.
16:13:48 <dulek> Or did I misunderstood something?
16:13:53 <thingee> dulek: I'm guessing what rushiagr wants.
16:13:57 <jgriffith> dulek: hmmm... i didn't catch that, seems odd
16:14:05 <xyang> seems that we are mixing the two
16:14:09 <jgriffith> xyang: :)
16:14:11 <rushiagr> dulek: I'm just saying that current API won't, but a new API will
16:14:23 <jgriffith> so look... here's the thing; Every use case is covered if we do public
16:14:32 <jgriffith> you can transfer a volume to another tenant if you want
16:14:38 <jgriffith> or you can make a snapshot public
16:14:38 <winston-d_> rushiagr: what new API?
16:14:40 <rushiagr> 'show all my snaps plus all shared snaps)
16:15:00 <rushiagr> s/)/'/
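
For concreteness, a minimal sketch of the "tenant's own plus public" snapshot lookup jgriffith describes above ("modify get to get all tenant and public"), assuming a hypothetical is_public flag on the snapshot model; the helper name and column are illustrative, not existing Cinder code:

    from sqlalchemy import or_

    def snapshot_get_all_visible(context, session, snapshot_model):
        """Return the caller's own snapshots plus any snapshot flagged public.

        'is_public' is a hypothetical column; whether public snapshots also
        show up in a plain 'cinder snapshot-list' is the separate API
        question being debated above.
        """
        query = session.query(snapshot_model)
        if not context.is_admin:
            query = query.filter(or_(
                snapshot_model.project_id == context.project_id,
                snapshot_model.is_public.is_(True)))
        return query.all()
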
16:15:02 <jgriffith> I'm still not completely clear on use cases rushiagr is interested in solving
16:15:09 <dulek> Not every, rushiagr mentioned that "they" may not want to have *public* snaps, just shared
16:15:33 <thingee> flip214: what use cases did you see with this?
16:15:43 <winston-d_> rushiagr: does any other openstack project has similar API like this?
16:15:50 <avishay> Maybe "they" should share their use case - maybe there is a better way to achieve what they want
16:15:51 <rushiagr> jgriffith: I am trying not to make an artificial example. But it's similar to volume transfer..
16:16:14 <thingee> avishay: +1
16:16:25 <thingee> here's what I'm going to propose...
16:16:27 <winston-d_> rushiagr: then use volume transfer
16:16:32 <rushiagr> avishay: they have data in block storage snapshotted.. I'm using too much 'they' I guess.
16:16:34 <thingee> rushiagr: work with Tiantian on https://review.openstack.org/#/c/125045/
16:16:35 <jgriffith> winston-d_: :) +1
16:16:47 <flip214> thingee: having a few well-known images that _all_ people can use. RHEL6, RHEL7, Ubuntu, etc.
16:16:56 <thingee> seems like the idea of just public is a preferred approach
16:16:57 <dulek> thingee: +!
16:17:00 <rushiagr> winston-d_: the problem is it's slightly bad on usability. We seem to be not caring much about the end user..
16:17:17 <thingee> right now we're all confused on what is being proposed here and guessing what you want. and I just spent 15 mins trying to figure it out
16:17:23 <thingee> also if you look at:
16:17:25 <winston-d_> rushiagr: how come?
16:17:25 <thingee> #link https://review.openstack.org/#/c/125045/
16:17:33 <thingee> I asked for use cases :)
16:18:12 <rushiagr> thingee: sorry, I was and will be busy for this whole week.
16:18:32 <rushiagr> I apologise if I don't seem to have a use case.. Totally my fault..
16:18:55 <thingee> rushiagr: ok, well you got until feb 15 for feature freeze. and no worries, we just need things better defined to make a decision
16:19:38 <thingee> anything else?
16:19:39 <rushiagr> thingee: okay. Thanks. I'll write a spec.. that would help come to a decision soon..
16:19:48 <rushiagr> thingee: nope. We can move to next topic
16:19:51 <thingee> rushiagr: work with this spec if possible https://review.openstack.org/#/c/125045/
16:19:53 <rushiagr> thanks all
16:19:58 <rushiagr> thingee: definitely
16:20:05 <thingee> #topic Additional requirements with Third Party CI
16:20:28 <thingee> so I've gotten questions from driver maintainers
16:20:46 <thingee> How long can a ci be down, before we consider the driver inactive?
16:21:36 <flip214> How about 4 weeks from first notice, or 1 week from third one or so?
16:21:40 <smcginnis> A week? Unless they are in contact for specific issues.
16:21:43 <erlon> does a non reporting CI is considered broken??
16:22:25 <thingee> erlon: after the deadline, yes. you're not reporting, you're not continuous integrating with proposed changes.
16:22:35 <flip214> I'm thinking about the case that "the" CI maintainer is on vacation for 3 weeks (not that uncommon here in Europe)
16:22:35 <thingee> defeats the whole purpose
16:22:53 <jordanP> yeah, 3 weeks of vacation happens here :)
16:22:59 <erlon> flip214: +1
16:23:03 * smcginnis moving to Europe
16:23:04 <flip214> (yes, there should be a few people knowing how to fix it, but still...)
16:23:15 <winston-d_> flip214, jordanP: i want to get a job there
16:23:27 <eikke> winston-d_: we're hiring
16:23:34 <thingee> I ask maintainers to give me an email address that forwards to a few people in the company
16:23:35 <erlon> even a weekend cannot be enough with 5 days period
16:24:03 <hemna> flip214, then someone else in the company should do their best to be a backup
16:24:16 <hemna> 3 weeks of no reporting on a driver is terrible IMHO
16:24:27 <thingee> asselin_: opinions?
16:24:31 <jgriffith> so IMHO there's no sense arguing about "I might go on vacation"
16:24:39 <jgriffith> first step is get a working system
16:24:42 <jgriffith> go from there
16:24:42 <hemna> jgriffith, +1
16:24:47 <flip214> hemna: yes, "should". In practice, especially in the summer time when lots of people are on vacation, I can foresee issues with that.
16:24:49 <avishay> i don't think there should be a hard rule. if it's down for a week because of equipment failure and you report that you're working on it, i think that's fine. if you disappear for 3 weeks, that's not IMO.
16:25:05 <hemna> avishay, +1
16:25:17 <winston-d_> avishay: +1
16:25:20 <jgriffith> personally if you have a system up and running and you go on vacation and it breaks I don't give a crap
16:25:21 <thingee> Ok, we're agreeing no hard rule, but case by case basis?
16:25:21 <Swanson> avishay: +1
16:25:22 <hemna> flip214, make sure at least someone is a backup.
16:25:24 <jgriffith> fix it when you get back :)
16:25:24 <xyang> avishay: +1
16:25:31 <hemna> thingee, +1
16:25:49 <jgriffith> thingee: hard rule does need to be on demonstrating something functional and reliable at some point though IMO
16:25:49 <smcginnis> thingee: +1
16:26:13 <avishay> jgriffith: +1
16:26:17 <jgriffith> In other words, "prove you've set it up and done the work"
16:26:18 <jgriffith> we
16:26:20 <dulek> jgriffith: +1
16:26:28 <thingee> #agreed case by case basis on CIs not reporting will be handled
16:26:30 <jgriffith> we'll take issues with failures or the system going down as they come up
16:26:37 <jgriffith> cool :)
16:26:43 <jgriffith> just wanted to make sure I was on the same page
16:26:47 <thingee> #topic Target drivers - DRBD transport layer Nova
16:26:58 <thingee> flip214: you'll need to better explain this :)
16:27:02 <flip214> right.
16:27:14 <flip214> well, I'm trying to get a new block storage transport mechanism working.
16:27:23 <flip214> but I keep stumbling....
16:27:35 <flip214> there's parts of brick, target drivers, connector, etc.
16:28:11 <flip214> so I've got the architectural question: should I change *all* these places, or will eg. brick be removed in the next few weeks anyway?
16:28:26 <hemna> flip214, I'm working on brick
16:28:30 <hemna> it's going to be a bit
16:28:44 <hemna> it'll be my job to try and keep it in sync until we switch over. :(
16:28:56 <hemna> flip214, for now, do the work in cinder/brick/initiator
16:28:57 <flip214> hemna: okay. how about connector? target driver? jgriffith is working on that one, I believe.
16:28:59 <hemna> that you need to do.
16:29:02 <flip214> okay.
16:29:04 <thingee> hemna: do we have an idea of it happening in K?
16:29:18 <hemna> the target side of brick isn't going into the external lib. just the initiator side
16:29:37 <hemna> thingee, I'm going to try and get it in. I'm churning on multi-attach at the same time.
16:29:49 <thingee> flip214: https://github.com/openstack/cinder/blob/master/cinder/volume/targets/driver.py
16:29:54 <flip214> next question: create_export() doesn't know which host will attach the volume. the volume already has the key "attached_host" but it's None.
16:29:57 <thingee> you need to define a new target driver in cinder
16:30:11 <flip214> thingee: I'm doing that.
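
As a rough illustration of what "define a new target driver" means here, a skeleton following the abstract interface in the linked cinder/volume/targets/driver.py; the class name is made up and the exact method signatures are an assumption to be checked against that file:

    from cinder.volume.targets import driver


    class DRBDTarget(driver.Target):
        """Illustrative skeleton only, not the actual LINBIT implementation."""

        def ensure_export(self, context, volume, volume_path):
            # Re-create the export, e.g. after a c-vol restart.
            pass

        def create_export(self, context, volume, volume_path):
            # Publish the volume. Note flip214's point: the attaching host is
            # not known yet at this stage ('attached_host' is still None).
            pass

        def remove_export(self, context, volume):
            pass

        def initialize_connection(self, volume, connector):
            # Whatever the Nova-side connector needs to reach the volume.
            return {'driver_volume_type': 'drbd', 'data': {}}

        def terminate_connection(self, volume, connector, **kwargs):
            pass
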
16:30:11 <asselin_> thingee, sorry multitasking. There's a spec for CI Status that will apply to all 3rd party ci. I think we should define working as part of that.
16:30:27 <thingee> asselin_: thanks!
16:30:41 <flip214> do we want attached_host to have the host that it _will_ be attached to?
16:30:47 <thingee> asselin_: can you give me the link when you have a chance?
16:30:53 <asselin_> #link https://review.openstack.org/#/c/135170/
16:31:02 <flip214> would make my life much easier, if I would get that information.
16:31:44 <flip214> next point: for DRBD many storage hosts can be connected to one nova node at the same time. (That's the HA part of DRBD). So I'd need provider_location much larger... 4kB or even more.
16:32:04 <flip214> Is there a chance for that, or is there some hard restriction that says no?
16:32:35 <flip214> Can I pass only a string in provider_location to Nova, or can that be structured data, too?
16:32:59 <xyang> the size of provider_location is 255
16:33:05 <thingee> I'm not sure I understand why provider_location matters for multiple nodes attaching
16:33:06 <flip214> xyang: now, yes.
16:33:32 <flip214> thingee: because the nova node needs to know _all_ storage nodes it should access at the same time. Many hostnames, IP addresses, TCP Ports, etc.
16:33:41 <flip214> the 255 characters are too small.
16:34:02 <flip214> If there's _no_ chance to get that enlarged, I'll have to pass the data around in some other way.
16:34:30 <hemna> can't you pass in a list to a new entry?
16:34:34 <flip214> hemna: BTW, my changes are in https://github.com/LINBIT/cinder
16:34:37 <hemna> provider_locations = []
16:34:39 <thingee> flip214: I don't have an answer for you right now. provider_location was for things like iscsi..know the ip portal, lun number, etc
16:35:05 <flip214> hemna: I don't think so, because that all gets serialized to the database, which won't know about arrays, no?
16:35:23 <xyang> flip214: do you have to save it in provider_location? can initialize_connection return that info to nova?
16:35:33 <flip214> thingee: yes, for a single connection it was enough. _multiple_ connections are one reason for using DRBD.
16:35:58 <flip214> xyang: initialize_connection in cinder, or in nova? if in nova, where would it get that information from?
16:36:03 <xyang> in cinder
16:36:37 <thingee> flip214: so it sounds like you'll need to propose a way to support this in cinder. I was just saying I don't have an immediate answer for you
16:37:19 <flip214> right. I wanted to bring that up, so that people can think about it and tell me possible solutions ... we can talk about them later on, I hope for next week.
16:37:55 <flip214> if there's some way to relay information from cinder to nova via mq without size or data structure restriction it would be ideal ... I don't know whether such a thing exists.
16:38:19 <thingee> flip214: everything is done through cinder-api.
16:38:56 <flip214> so, as summary: my questions are pre-fill "attached_host" during create_export? size of "provider_location"? connector, target driver, brick?
16:38:59 <flip214> thank you.
16:39:06 <flip214> I'm done for today.
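
To make xyang's suggestion concrete: a volume driver's initialize_connection() already hands Nova a free-form dict, so the multi-host DRBD endpoints could travel there instead of in the 255-character provider_location column. A minimal sketch, with purely illustrative field names:

    # Sketch of a Cinder volume driver method; the 'data' payload is an
    # ordinary dict passed to the Nova-side connector, so it is not limited
    # to 255 characters the way provider_location is.
    def initialize_connection(self, volume, connector):
        return {
            'driver_volume_type': 'drbd',
            'data': {
                'device': volume['name'],
                # every storage node the compute host should talk to
                'hosts': [
                    {'address': '192.0.2.10', 'port': 7789},
                    {'address': '192.0.2.11', 'port': 7789},
                ],
            },
        }
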
16:39:12 <thingee> heh
16:39:20 <thingee> sorry we weren't much of help
16:39:31 <thingee> #topic TaskFlow workflows
16:39:34 <thingee> #link https://review.openstack.org/#/c/135170/
16:39:37 <flip214> that's okay, I didn't expect (only hope ;) for immediate answers.
16:39:38 <thingee> dulek: hi
16:39:41 <dulek> okay, some meetings ago I offered my help on improving reliability of cinder
16:39:51 <dulek> I wasn't pointed to any community-driven initiative, so tried to find one
16:40:00 <dulek> idea is to resume interrupted workflows when starting the services
16:40:02 <thingee> dulek: great we need help with taskflow, and it's not with persistence
16:40:15 <thingee> #link https://bugs.launchpad.net/bugs/1408763
16:40:17 <dulek> general spec is in the link thingee provided
16:40:39 <e0ne> thingee, dulek: me and vilobh are also interested in it
16:40:41 <dulek> yeah, I'm monitoring this bug
16:41:02 <dulek> so I'm asking for opinions
16:41:05 <thingee> I think jgriffith was looking at that bug for a week to get us to work better with taskflow. right now it's not a positive thing imo
16:41:32 <thingee> I'll let others talk :)
16:41:45 <dulek> this should be certainly solved before persistence, I've mentioned it in API part of the spec
16:42:04 <jgriffith> dulek: that's not the issue
16:42:10 <dulek> I can take a look at that as it's a blocker and jgriffith stopped working on that
16:42:22 <thingee> jgriffith: yes sorry...
16:42:29 <dulek> why?
16:42:46 <thingee> so the real issue is people are finding that working with taskflow hasn't been the easiest.
16:42:46 <jgriffith> dulek: it's not an issue of restarting jobs
16:43:02 <jgriffith> dulek: the problem is that the taskflow retries are issued from the TF lib
16:43:11 <jgriffith> dulek: our code isn't written to deal with that properly
16:43:22 <hemna> thingee, that's kinda always been the main issue w/ taskflow
16:43:25 <hemna> is its complexity
16:43:28 <dulek> jgriffith: resumes can also trigger a retry
16:43:37 <jgriffith> dulek: so for example in the increment GB section.... it raises before the increment
16:43:47 <jgriffith> dulek: which IMO is going to make things "worse"
16:43:51 <jgriffith> dulek: with our current code
16:44:03 <jgriffith> dulek: we don't have clear control lines
16:44:17 <jgriffith> dulek: so we're doing a number of "unexpected" things on retries
16:44:32 <jgriffith> dulek: besides.... what's so great about resume functionality anyway :)
16:44:55 <dulek> resumes solve clean shutdown problems and increase reliability
16:45:02 <jgriffith> dulek: meh
16:45:07 <jgriffith> dulek: don't believe the hype
16:45:20 <jgriffith> dulek: regardless....
16:45:21 <e0ne> dulek: it could help
16:45:32 <dulek> if cinder service dies it will probably be started again by pacemaker and resume its work
16:45:33 <e0ne> if we'll implement it
16:45:35 <jgriffith> dulek: go for it, happy to see folks improve what we have
16:45:37 <avishay> can't tell if jgriffith is being sarcastic or not
16:45:47 <dulek> avishay: :)
16:45:51 <jgriffith> avishay: little of both
16:45:54 <rushiagr> avishay: +1
16:46:10 <dulek> but jgriffith answered my main concern
16:46:39 <dulek> should I go in this direction or is this idea not worth it?
16:47:09 <e0ne> dulek: we both need to work with harlowja_away on it. there is really not a lot of time to get it done in K
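
For readers unfamiliar with TaskFlow, a minimal sketch of the execute/revert structure being debated: a retry, or a resume after a service restart, replays or reverts these steps, which is why jgriffith's point about the surrounding code not being written for replays matters. Task names and the quota logic are illustrative only, and the persistence/resume wiring (a storage backend plus a saved flow detail) is omitted:

    import taskflow.engines
    from taskflow.patterns import linear_flow
    from taskflow import task


    class ReserveQuota(task.Task):
        def execute(self, size):
            print('reserving %d GB of quota' % size)

        def revert(self, size, **kwargs):
            # Runs when a later task fails; it must be safe to call even if
            # execute() was interrupted partway through.
            print('releasing %d GB of quota' % size)


    class CreateDbEntry(task.Task):
        def execute(self, size):
            print('creating a %d GB volume record' % size)


    flow = linear_flow.Flow('create_volume_sketch')
    flow.add(ReserveQuota(), CreateDbEntry())
    taskflow.engines.run(flow, store={'size': 10})
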
16:47:23 <dulek> I know tpatil's team can help on implementation and now e0ne is interested
16:47:48 <dulek> yeah, I'm consulting this with harlowja_away constantly
16:48:00 <dulek> so any opinions from cores?
16:48:27 <thingee> by the way, tell vilobh to finish what he started with cinder :)
16:48:52 <e0ne> dulek: harlowja_away said that vilobh will continue work on it in cinder
16:49:16 <thingee> #topic open discussion
16:49:27 <e0ne> thingee: i've already tried. hope, he will be available today
16:49:45 <dulek> Ok, I hope this means go on. ;)
16:50:07 <thingee> dulek: I moved on because you said you got your answer from jgriffith. jgriffith answering for me, I'll take it!
16:50:08 <e0ne> thingee: is it ok if i'll continue to work on that patch if vilobh won't soon?
16:50:30 <thingee> e0ne: which? this https://blueprints.launchpad.net/cinder/+spec/cinder-state-enforcer
16:50:43 <e0ne> thingee: yes, this pne
16:50:48 <e0ne> s/pne/one
16:50:55 <thingee> e0ne: please
16:51:02 <e0ne> thingee: thanks
16:51:13 <thingee> just communicate with harlowja_away and vilobh
16:51:41 <thingee> oh, so topics for mid cycle meetup
16:52:13 <thingee> add to potential topics https://etherpad.openstack.org/p/cinder-kilo-midcycle-meetup
16:52:22 <thingee> we already have some :)
16:52:35 <thingee> I think that's it for me
16:52:37 <thingee> anyone else?
16:52:52 <thingee> going
16:52:53 <flip214> is next week a meeting?
16:52:53 <thingee> going
16:53:02 <hemna> thingee, do we need to chat about brick
16:53:02 <thingee> flip214: good question, no.
16:53:09 <hemna> and the process of cutting over?
16:53:22 <hemna> I'll add that. (in the hopes that I can get the lib done)
16:53:23 <e0ne> thingee: will we have hangouts?
16:53:36 <thingee> e0ne: we did last time. hemna was nice enough to do that.
16:53:42 <e0ne> cool!
16:53:43 <rushiagr> what's a good way to target a bug or a bp to a milestone? Can I as a cinder driver group member do that? or is it the responsibility of the PTL?
16:53:46 <thingee> e0ne: not sure about this time
16:53:56 <e0ne> hemna: pleeeez
16:54:13 <thingee> rushiagr: core or me can target it. it'll be targeted regardless after being implemented
16:54:19 <rushiagr> e.g. for bugs which have a patch which has two +1s and a +2?
16:54:42 <rushiagr> thingee: okay. Thanks :)
16:54:43 <hemna> added Cinder Agent discussion as well.
16:55:03 <thingee> ok thanks everyone!
16:55:05 <thingee> #endmeeting