16:03:05 <jgriffith> #startmeeting cinder
16:03:07 <openstack> Meeting started Wed Nov 14 16:03:05 2012 UTC. The chair is jgriffith. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:03:08 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:03:09 <openstack> The meeting name has been set to 'cinder'
16:03:20 <thingee> oh I didn't comprehend your message correctly. It appears durgin just signed on at 7:56
16:03:23 <jdurgin1> hello
16:03:26 <thingee> there!
16:03:28 <jgriffith> thingee: Morning! :)
16:03:52 <jgriffith> thingee: after your all-nighter you don't stand a chance at comprehending me :)
16:04:00 <jgriffith> thingee: I'm confusing enough on a good day
16:04:43 <jgriffith> alright, I want to start with G1 status updates
16:04:47 <jgriffith> #topic G1 status
16:04:56 <jgriffith> #link https://launchpad.net/cinder/+milestone/grizzly-1
16:05:41 <jgriffith> This is what we have slated, and remember G1 is next week
16:06:19 <winston-d> :)
16:06:45 <jgriffith> Anybody have any concerns about what they're signed up for that I should know about?
16:06:50 <bswartz> I see one of the blueprints mentions splitting the drivers into 1 per file
16:06:55 <jgriffith> Need help, blockers, etc?
16:07:01 <bswartz> I would prefer keeping the netapp drivers together
16:07:44 <winston-d> jgriffith, mine are on track.
16:07:57 <jgriffith> winston-d: excellent
16:08:20 <jgriffith> bswartz: the first phases of that change have already landed
16:08:27 <jgriffith> bswartz: over a week ago
16:08:57 <thingee> jgriffith: looking pretty good on apiv2. clearer-error-messages is going to be pretty easy too
16:09:16 <jgriffith> thingee: awesome, so the power of positive thinking worked :)
16:09:27 <jgriffith> thingee: That and no sleep for a night!
16:09:38 <bswartz> jgriffith: I think the change in general is a good idea, but I'm just suggesting that the NetApp drivers be exempted
16:09:55 <jgriffith> bswartz: I'm just pointing out that the change already merged
16:10:09 <bswartz> oh wait, I'm looking at the wrong tree
16:10:10 <jgriffith> bswartz: https://review.openstack.org/#/c/15000/
16:10:53 <bswartz> okay it looks like netapp wasn't affected by that change
16:11:02 <thingee> jgriffith: I thought apiv2 was targeted for g1?
16:11:09 <bswartz> I will keep an eye out for the next phase of the change
16:11:38 <jgriffith> bswartz: ok, we still need to figure out if there needs to be a next phase I suppose, but anyway
16:12:11 <jgriffith> thingee: hmmm.... it was, but it was moved last week when we got Chuck's version
16:12:25 <jgriffith> thingee: My plan/idea was to get the structure in for G1
16:12:25 <thingee> ah right
16:12:42 <jgriffith> thingee: All the other additions/enhancements will come in G2
16:12:54 <jgriffith> thingee: So it's critical that we have everything in place to do that at G1
16:12:56 <thingee> yeah. I'll let you know once I move the other stuff into separate bps and then retarget?
16:13:09 <DuncanT> Sorry, didn't see the time
16:13:09 <jgriffith> perfect
16:13:16 <jgriffith> DuncanT: NP
16:13:38 <jgriffith> So if there are no big concerns about getting these in....
16:13:51 <jgriffith> Are there any big concerns about something that should be there that's not?
16:14:44 <jgriffith> Yay! I'll take silence as a good thing ;)
16:15:09 <jgriffith> #topic service and host info
16:15:30 <jgriffith> So something else that came up this week was cinder-manage service list :)
16:15:48 <jgriffith> Turns out I was in the process of implementing that functionality in the cinderclient
16:16:11 <jgriffith> There may be some concern around doing this and I wanted to get feedback from all of you
16:16:28 <winston-d> i see you are working on a host extension?
16:16:44 <jgriffith> winston-d: Yes, but I am planning to change that around a bit
16:16:56 <jgriffith> It should fit more with what nova does I think
16:17:12 <jgriffith> and then add a separate service extension for the "other stuff"
16:17:24 <winston-d> jgriffith, ok. i'm fine with that.
16:17:33 <jgriffith> TBH I'm not sure of the value in the host extension any longer
16:17:49 <winston-d> hmm
16:17:52 <jgriffith> 90% of it just raises NotImplemented
16:18:19 <jgriffith> winston-d: I'd like to implement the extension and then we can fill in the underlying stuff later
16:18:33 <jgriffith> but I want to make sure it's not something that nobody will ever use :)
16:18:59 <creiht> what was the host extension?
16:19:02 <jgriffith> ie host power-actions (reboot a cinder node), set maintenance-window, etc
16:19:08 <creiht> oh
16:19:12 <creiht> interesting
16:19:39 <jgriffith> creiht: yeah, I started with just a place to put things like "show me all the cinder services, their status and where they're running"
16:19:41 <DuncanT> Is there a detailed blueprint for it? I'm only familiar with a very small subset of it
16:19:52 <jgriffith> Then I noticed nova had this hosts extension, but it's a bit different
16:20:09 <DuncanT> The services/nodes/statuses stuff is definitely useful
16:20:14 <jgriffith> DuncanT: Nope, I didn't go into detail with the BP because it existed in Nova
16:20:32 <jgriffith> DuncanT: Yeah, but I'm thinking that should be separate from the hosts extension
16:20:58 <winston-d> DuncanT, agree. service/node/status are useful.
16:20:59 <jgriffith> Leave the hosts extension in line with what it does in Nova, and add services specifically for checking status, enable/disable etc
16:21:18 <winston-d> i usually treat that as how the scheduler sees the cluster.
16:21:29 <DuncanT> I need to go look at the nova version before I can comment, I guess...
16:21:32 <jgriffith> winston-d: explain?
16:21:57 <jgriffith> DuncanT: It's sorta funky IMO, mostly because a good portion of it is not implemented
16:22:14 <jgriffith> DuncanT: But the idea is that you perform actions on your compute nodes
16:22:41 <winston-d> jgriffith, well, what services are up/down, etc. basically i'd check nova-manage service info when i see something wrong with instance scheduling.
16:22:55 <winston-d> this was a missing part in cinder.
16:22:55 <jgriffith> winston-d: ahh... yes, ok
16:23:11 <jgriffith> winston-d: So that is what I set out to address with this
16:23:29 <jgriffith> winston-d: But I'm thinking now that it might be useful if this had its own extension (service)
16:23:39 <DuncanT> I don't think the cinder API is the right place to be rebooting nodes... (nor the nova API for that matter)
16:23:43 <jgriffith> winston-d: There are a number of things that I think could be added into that in the future
16:24:05 <jgriffith> DuncanT: yeah, that's something I thought would come up :)
16:24:05 <DuncanT> Service info as its own extension sounds like the way to go then...
16:24:08 <winston-d> i agree with DuncanT.
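A rough sketch of the per-service view such an admin-only services extension might expose, mirroring what nova-manage service list reports (binary, host, zone, enabled/disabled, up/down, last heartbeat). The field names, the liveness cutoff, and the row layout below are illustrative assumptions, not the schema of any merged Cinder extension.

    # Illustrative sketch only: field names and the liveness cutoff are
    # assumptions, not Cinder's actual services-extension schema.
    from datetime import datetime, timedelta

    SERVICE_DOWN_TIME = timedelta(seconds=60)  # hypothetical heartbeat cutoff

    def service_to_view(service, now=None):
        """Map one services-table row (given as a dict) to a list-services entry."""
        now = now or datetime.utcnow()
        alive = (now - service['updated_at']) < SERVICE_DOWN_TIME
        return {
            'binary': service['binary'],            # e.g. 'cinder-volume'
            'host': service['host'],
            'zone': service['availability_zone'],
            'status': 'disabled' if service['disabled'] else 'enabled',
            'state': 'up' if alive else 'down',
            'updated_at': service['updated_at'].isoformat(),
        }

An enable/disable action would then just flip the disabled column and return the refreshed view, which leaves the hosts extension free to stay in line with Nova's.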
16:24:28 <jgriffith> #action jgriffith dump hosts extension for now and implement services ext
16:24:45 <jgriffith> Everybody ok with that?
16:24:53 <DuncanT> Yup
16:24:58 <winston-d> i'm good.
16:25:04 <jdurgin1> sounds good
16:25:29 <jgriffith> Just for background... there's a push to get things out of the *-manage utilities and into the clients
16:25:43 <jgriffith> that's why I didn't just pick up that patch and be done last week :)
16:26:24 <winston-d> jgriffith, that's related to the admin API stuff?
16:26:26 <jgriffith> ok... any questions/suggestions on this topic?
16:26:37 <DuncanT> I'd ideally like to keep "cinder-manage service list" working direct from the database too, but I won't cry if it disappears (I'll just carry my own version... I only want it for dev systems)
16:26:41 <jgriffith> winston-d: yes, they would be admin-only extensions
16:26:57 <jgriffith> DuncanT: Well, we can put it in as well.... but it DNE today :)
16:27:38 <DuncanT> I'll send a patch to put it in :-)
16:27:54 <jgriffith> DuncanT: We can just reactivate the one I rejected this week :)
16:28:05 <jgriffith> I think it was from Avishay, but can't remember
16:28:14 <kmartin> I'll let hemna know
16:28:24 <jgriffith> kmartin: Ahhh... thanks!!!!
16:28:31 <jgriffith> kmartin: Yes, it was hemna!
16:28:38 <DuncanT> :-)
16:28:43 <jgriffith> I'm just not sure about the value in having both, but whatevs
16:29:07 <jgriffith> any other thoughts?
16:29:23 <kmartin> jgriffith: he should be in the office shortly, he'll get it done today
16:29:27 <winston-d> i think nova is trying to avoid direct db access.
16:29:44 <jgriffith> winston-d: yes, that was my reason for rejecting it the first time around
16:29:52 <winston-d> having that in cinder-manage means we are adding direct db access?
16:30:00 <jgriffith> winston-d: yup
16:30:14 <DuncanT> I think it is a laudable aim, but not needing the endpoint address is handy on dev systems
16:30:29 * jgriffith is hopeful somebody might agree with him on this
16:30:52 <winston-d> last time, when i proposed to add some feature to nova-manage, it was rejected with the suggestion to do it in novaclient.
16:31:18 <jgriffith> Ok, I'm reverting back to my original stance on this
16:31:33 <jgriffith> kmartin: don't tell hemna to resubmit please :)
16:31:55 <kmartin> no problem
16:32:00 <DuncanT> Fair enough, I'll keep an out-of-tree version for now
16:32:00 <creiht> lol
16:32:01 <jgriffith> DuncanT: we can revisit later, but i hate to put something in there just for dev
16:32:18 <jgriffith> DuncanT: Maybe I can just give you a developer's tool that you can use :)
16:32:24 <creiht> I'm not sure if I am fond of having the manage tools also in the client
16:32:32 <winston-d> i think the rackspace private cloud team has a lot of db access scripts to do management jobs.
16:32:35 <creiht> but I can understand how it makes certain things easier
16:32:44 <winston-d> they even have some project around that.
16:32:59 <jgriffith> creiht: Well the idea is they would be ONLY in the client, if that helps :)
16:33:06 <DuncanT> DB access can be really handy when something in the db is stopping your API server from working :-)
16:33:24 <jgriffith> DuncanT: yeah, in some cases it's the only option :)
16:33:25 <DuncanT> But those types of tools tend to be very site-specific I think
16:33:47 <jgriffith> DuncanT: I think it's fair that those are handled by the provider IMO
16:34:15 <DuncanT> As I said, it is a trivial enough thing to maintain out-of-tree for now... might bring it up again in six months
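The out-of-tree helper DuncanT describes would look roughly like the sketch below: read the services table straight from the database, with no API endpoint required. The connection URL and column names are assumptions about a typical dev install, not a supported Cinder interface.

    # Hypothetical dev-box helper, not a supported interface: the DB URL and
    # column names are assumptions about a typical dev deployment.
    import sqlalchemy as sa

    def list_services(db_url='mysql://cinder:secret@localhost/cinder'):
        engine = sa.create_engine(db_url)
        with engine.connect() as conn:
            rows = conn.execute(sa.text(
                "SELECT host, `binary`, disabled, updated_at "
                "FROM services WHERE deleted = 0"))
            for host, binary, disabled, updated_at in rows:
                status = 'disabled' if disabled else 'enabled'
                print('%-20s %-20s %-9s %s' % (binary, host, status, updated_at))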
16:34:16 <winston-d> DuncanT, yeah, i know. the question is whether we want more of that going into cinder.
16:34:58 <jgriffith> Ok, I'm going to proceed forward.... people can scream and punch me in the face later if they want :)
16:34:59 <winston-d> ops people may already have much more powerful db scripts to do auditing/monitoring/reaping jobs, i guess.
16:35:07 <jgriffith> winston-d: +1
16:35:25 <winston-d> :)
16:35:31 <jgriffith> #topic gate test failures
16:35:41 <DuncanT> I'd like to bring some of that power into the upstream tree to save reinventing the wheel, but I'm happy to accept that cleaning things up needs to happen first
16:35:57 <jgriffith> It just occurred to me this AM that I haven't been updating people on this whole mess
16:36:13 <jgriffith> The Ubuntu dd issue...
16:36:30 <jgriffith> We continue to see intermittent failures in the gate tests due to this
16:36:58 <jgriffith> the kernel folks working on it are making some progress, but it's really becoming a thorn in my side
16:37:01 <jgriffith> Soo.....
16:37:02 <DuncanT> I tried and failed to reproduce
16:37:15 <jgriffith> DuncanT: Yeah, that's what sucks about it
16:37:29 <jgriffith> DuncanT: But if you tear down and rebuild enough times you'll hit it
16:37:42 <jgriffith> physical or virtual
16:37:59 <jgriffith> Anyway, I put in a temporary patch:
16:38:07 <DuncanT> Hmmm, have you got a set of backtraces from when it is happening?
16:38:13 <winston-d> DuncanT, check out https://github.com/JCallicoat/pulsar, that's the *nova swiss army knife*.
16:38:20 <jgriffith> I added a "secure_delete" flag that is checked on the dd command
16:39:10 <DuncanT> winston-d: Cheers
16:39:15 <jgriffith> The default is set to True, but in the gate/tempest jobs we set it to False in the localrc file
16:39:25 <winston-d> that works?
16:39:49 <jgriffith> This will hopefully keep everybody from saying "Cinder failed jenkins again"
16:39:56 <jgriffith> :)
16:40:09 <winston-d> great
16:40:31 <jgriffith> I've been trying some other methods of the secure delete, but they either have the same problem or other severe perf problems
16:41:08 <jgriffith> Anyway, I thought I should start keeping everybody up to speed on what's going on with that
16:41:34 <jgriffith> I'm still hopeful that one morning I'll find the kernel fairy came by while I slept and fixed this
16:42:11 <jgriffith> otherwise during G2 we'll need to focus on a viable alternative
16:42:29 <jgriffith> Any questions/thoughts on this?
16:42:35 <creiht> jgriffith: would out-of-band zeroing make it better?
16:42:47 <jgriffith> creiht: how do you mean?
16:42:50 <winston-d> jgriffith, do you have an ubuntu bug # for this?
16:43:21 <jgriffith> winston-d: https://bugs.launchpad.net/cinder/+bug/1023755
16:43:23 <DuncanT> There's always the option of breaking the LVM and hand-building a couple of linear mappings and writing zeros to them
16:43:23 <uvirtbot> Launchpad bug 1023755 in linux "Precise kernel locks up while dd to /dev/mapper files > 1Gb (was: Unable to delete volume)" [Undecided,Confirmed]
16:43:45 <winston-d> jgriffith, thx
16:43:57 <DuncanT> Should have way better performance that way too
16:44:30 <creiht> jgriffith: we zero in an outside job that runs periodically
16:44:42 <jgriffith> creiht: Ohhh..... got ya
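The temporary patch described above amounts to gating the zero-out step on a config flag. A minimal sketch follows, assuming a boolean secure_delete option as named in the discussion; the helper and its dd invocation are illustrative, not the actual Cinder driver code.

    # Minimal sketch, assuming a boolean 'secure_delete' option as discussed
    # above; not the actual Cinder driver code.
    import subprocess

    def clear_volume(dev_path, size_gb, secure_delete=True):
        """Zero the backing device before deleting it, unless disabled."""
        if not secure_delete:
            # Gate/tempest jobs turn this off (via localrc) to dodge the
            # intermittent Precise kernel hang seen while dd-ing /dev/mapper.
            return
        subprocess.check_call([
            'dd', 'if=/dev/zero', 'of=%s' % dev_path,
            'bs=1M', 'count=%d' % (size_gb * 1024), 'conv=fdatasync',
        ])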
16:44:49 <jgriffith> creiht: I don't think it would, sadly
16:45:10 <jgriffith> creiht: The dd to the /dev/mapper device itself seems to be the issue
16:45:34 <jgriffith> I don't think it would matter when that's done, the failures in the tests are MOSTLY because the kernel locks up
16:46:05 <jgriffith> This would still happen, but "other" tests/operations on the system would fail
16:46:22 <jgriffith> and it would be harder to figure out why.... unless I'm missing something
16:46:44 <jgriffith> creiht: although... if you guys aren't seeing this issue maybe there's something to that idea
16:48:19 <creiht> what version of ubuntu are they seeing it on?
16:48:20 <jgriffith> 12.04
16:48:57 <creiht> yeah it is weird that we haven't seen anything like that
16:49:16 <jgriffith> creiht: That is odd....
16:49:22 <creiht> is it specific to how dd writes, or is it just the sequential writes of data?
16:49:28 <creiht> because I don't think we use dd
16:49:33 <jgriffith> OH!!
16:49:45 <jgriffith> Yeah, it definitely seems dd-related
16:49:47 <creiht> we have a python script that zeros
16:49:56 <creiht> I think
16:49:58 <creiht> :)
16:50:08 <jgriffith> But I tried changing it to something like "cp /dev/zero /dev/mapper/xxxx"
16:50:26 <jgriffith> This eventually failed as well
16:50:47 <creiht> I'm not somewhere that I can look at the code right now, but I will see if I can dig a little deeper and report back to you
16:50:55 <jgriffith> creiht: cool
16:51:30 <eharney> jgriffith: have you tried different dd block sizes? maybe the python script just writes with a different pattern
16:51:31 <jgriffith> creiht: It may be worth testing, as you pointed out, just doing direct writes from python in the delete function as well
16:52:00 <jgriffith> eharney: Haven't messed with block sizes too much
16:52:29 <jgriffith> DuncanT: I'd also like to hear more about your proposal as well
16:52:38 <bswartz> jgriffith: don't do "cp /dev/zero /dev/mapper/xxxx", do "cat < /dev/zero > /dev/mapper/xxxx"
16:53:12 <jgriffith> bswartz: ok, I can try it...
16:53:15 <jgriffith> bswartz: thanks
16:53:21 <DuncanT> jgriffith: I'm trying to code it now :-)
16:53:30 <jgriffith> DuncanT: awesome
16:53:56 <jgriffith> So BTW... anybody interested in this, feel free :) I'm open to ideas
16:54:09 <jgriffith> It's just really tough to repro
16:54:28 <jgriffith> You almost have to tear down and rebuild each time
16:54:37 <DuncanT> I can't reproduce, but I don't need to to send you a patch
16:55:09 <jgriffith> cool
16:55:20 <jgriffith> alright... we've beat that dead horse enough for today
16:55:30 <jgriffith> #topic open discussion
16:55:43 <jgriffith> Anybody have anything they want/need to talk about?
16:56:01 <winston-d> nope
16:56:10 <DuncanT> https://blueprints.launchpad.net/cinder/+spec/add-expect-deleted-flag-in-volume-db
16:57:04 <DuncanT> I've a slightly alternative proposal: set the state to 'deleting' in the API
16:57:15 <DuncanT> Matching the 'attaching' state that we already have
16:57:32 <zykes-> Oh, cinder meeting ?
16:57:38 <zykes-> How goes the FC / SAN stuff ?
16:57:39 <bswartz> I would like to introduce rishuagr
16:57:40 <winston-d> DuncanT, don't we have that?
16:57:47 <bswartz> rushiagr*
16:58:20 <DuncanT> winston-d: I'm not sure if cinder has it, I couldn't see it in the code, but I only spent a few seconds looking. If we do have it, then I can't see what the blueprint is about?
16:58:31 <bswartz> Rushi is a member of the NetApp team who is working on cinder full time
16:58:44 <rushiagr> hi all !
16:59:11 <kmartin> zykes-: The FC blueprint is moving through the HP legal system....slowly
16:59:41 <bswartz> I would like Rushi to be added to the cinder core team soon
16:59:52 <winston-d> DuncanT, i think that bp is mainly for billing. they don't want slow zeroing to mess up billing.
17:00:13 <DuncanT> winston-d: Surely you stop billing once it is in the 'deleting' state?
17:00:18 <DuncanT> (we do)
17:00:31 <jgriffith> DuncanT: I would hope so :)
17:00:32 <winston-d> DuncanT, oh, i see your point.
17:00:43 <winston-d> rongze_, ping
17:00:44 <jgriffith> DuncanT: Otherwise you'd better give an option to NOT secure delete :)
17:00:49 <zykes-> kmartin: :/
17:00:55 <creiht> that's the other nice thing about out-of-band delete
17:01:01 <winston-d> creiht, :)
17:01:02 <zykes-> billing, you mean metering ?
17:01:04 <creiht> erm, out-of-band zeroing
17:01:33 <rongze_> hi
17:01:46 <eharney> i haven't talked to some of you guys much yet, but Cinder is also becoming my primary focus... so.. hi :)
17:02:00 <thingee> DuncanT: the api appears to have a delete method for setting the deleting state.
17:02:00 <winston-d> rongze_, DuncanT was talking about your expected-deleted-flag bp.
17:02:07 <jgriffith> eharney: welcome...
17:02:12 <DuncanT> OoB zeroing is a win we've found too, but that is a different question from this blueprint I think?
17:02:18 <thingee> before it calls volume_delete
17:02:19 <jgriffith> Let's get through DuncanT's topic here...
17:02:44 <winston-d> thingee, yes, there is.
17:03:02 <DuncanT> So what is this blueprint proposing? I can't make sense of it
17:03:09 <winston-d> rongze_, DuncanT suggests you stop billing once a volume is in the 'deleting' state, what do you think?
17:03:17 <rongze_> yes
17:03:29 <rongze_> I agree with DuncanT
17:03:46 <DuncanT> So what is the blueprint suggesting?
17:03:49 <jgriffith> rongze_: I thought when we talked about this though the idea was.....
17:04:23 <jgriffith> rongze_: We have the ability to know when it's safe to remove a volume even if the delete operation never quite finished or errored out
17:05:32 <DuncanT> Isn't that 'still in deleting state'?
17:05:58 <jgriffith> DuncanT: yes, but it's the "hung in deleting state" thing that could be solved
17:06:22 <jgriffith> DuncanT: at least that's what I thought we were aiming for
17:06:53 <jgriffith> DuncanT: as it stands right now you can get into that state and you're there forever unless you go in and manipulate the DB by hand
17:07:11 <DuncanT> I think this is a special case of the problem of needing (non-customer-facing) substates for all sorts of hang/lost-message cases... Would it be worth trying to come up with a proposal that covers all of them?
17:07:25 <jgriffith> DuncanT: Perhaps.... yes
17:07:48 <jgriffith> rongze_: Is my interpretation accurate, or did I misunderstand you on this?
17:08:16 <DuncanT> i.e. have a sub-state field that could go 'delete api request dispatched' -> 'delete started on node XXX' -> 'scrubbing finished' -> 'gone'
17:08:41 <DuncanT> Same field can be used for create subtasks, snapshot subtasks, backup subtasks etc
17:08:46 <jgriffith> DuncanT: Yeah, which brings up the new state implementation stuff clayg teased us with :)
17:08:48 <rongze_> What if the instance is deleted?
17:09:00 <jgriffith> rongze_: on delete we don't care...
17:09:10 <jgriffith> rongze_: we're already detached right?
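DuncanT's sub-state idea above amounts to tracking the progress of a long-running operation alongside the customer-facing status, so a volume hung in 'deleting' can be diagnosed or safely cleaned up. A rough sketch of that progression follows; the names and transitions are illustrative only, not an agreed design.

    # Illustrative only: sub-state names and ordering are not an agreed design.
    DELETE_SUBSTATES = (
        'delete_dispatched',   # API accepted the request and cast it to a node
        'delete_started',      # cinder-volume on that node picked it up
        'scrubbing_finished',  # zeroing/secure delete completed
        'gone',                # backing storage released; row safe to purge
    )

    def advance(current):
        """Move to the next sub-state; refuse to skip steps or go backwards."""
        idx = DELETE_SUBSTATES.index(current)
        if idx == len(DELETE_SUBSTATES) - 1:
            raise ValueError("'%s' is already terminal" % current)
        return DELETE_SUBSTATES[idx + 1]

As DuncanT notes, the same field could carry create, snapshot, and backup sub-states as well.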
17:09:53 <jgriffith> DuncanT: I think what you're proposing is the way we want to go, and I believe it's the sort of thing clayg had in mind
17:10:03 <rongze_> I think we can reference instance delete operation
17:10:22 <jgriffith> rongze_: Oh, I see what you mean.. sorry
17:10:31 <jgriffith> Ok...
17:11:02 <jgriffith> #action discuss/clarify blueprint add-expect-deleted-flag-in-volume-db
17:11:12 <jgriffith> We'll pick this up at the top of G2
17:11:20 <jgriffith> Meanwhile...
17:11:35 <jgriffith> rushiagr: welcome
17:11:55 <jgriffith> eharney: welcome to you as well
17:12:09 <jgriffith> rushiagr: eharney Hang out on IRC in #openstack-cinder
17:12:30 <eharney> will do
17:12:30 <jgriffith> Or PM me and we can sync up later
17:12:50 <jgriffith> I'm headed to the airport here and will be travelling today but otherwise....
17:12:55 <jgriffith> kmartin: any FC updates?
17:13:03 <eharney> ok
17:13:38 <kmartin> jgriffith: legal stuff...but that has not stopped us from starting to code
17:14:31 <jgriffith> kmartin: Ok.... please try and get some details added to the BP next week if you can
17:15:53 <jgriffith> Ok... we're over time
17:15:57 <jgriffith> Thanks everyone
17:16:01 <jgriffith> #endmeeting