#openstack-meeting log

15:59:51 <jgriffith> #startmeeting cinder
15:59:52 <openstack> Meeting started Wed Apr 16 15:59:51 2014 UTC and is due to finish in 60 minutes.  The chair is jgriffith. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:59:54 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:59:57 <openstack> The meeting name has been set to 'cinder'
16:00:15 <jgriffith> Hey everyone
16:00:29 <kmartin> hello
16:00:33 <thingee> o/
16:00:36 <jgriffith> Just wanted to do a quick synch up with folks today
16:00:39 <asselin> hi
16:00:43 <jgriffith> https://wiki.openstack.org/wiki/CinderMeetings
16:00:56 <jgriffith> #topic Release Status
16:01:15 <jgriffith> We cut another RC yesterday morning
16:01:30 <jgriffith> At this point we should be done unless something REALLY critical pops up
16:01:46 <jgriffith> akerr: did note a problem with Glance API V2
16:01:53 <thingee> jgriffith: there was an issue that came up with create from image
16:02:01 <jgriffith> thingee: :)
16:02:03 <thingee> with regards to checksum missing
16:02:05 <jgriffith> yes
16:02:11 <jgriffith> same issue I just mentioned
16:02:16 <jgriffith> It's Glance API V2
16:02:25 <jgriffith> There are two bugs associated with that....
16:02:36 <jgriffith> https://bugs.launchpad.net/cinder/+bug/1308594
16:02:37 <kmartin> should that be fixed, if it's easy?
16:02:37 <uvirtbot> Launchpad bug 1308594 in cinder "upload-to-image fails with size error on glance v2 api" [High,Confirmed]
16:02:46 <jgriffith> https://bugs.launchpad.net/cinder/+bug/1308058
16:02:47 <uvirtbot> Launchpad bug 1308058 in cinder "Cannot create volume from glance image without checksum" [Undecided,New]
16:02:55 <jgriffith> kmartin: the problem is timing
16:03:13 <kmartin> yeah, down to the wire
16:03:28 <jgriffith> So spinning another RC and resetting the package maintainers again this late is not good
16:03:31 <jgriffith> Also....
16:03:36 <thingee> and since we can't guarantee the timing with the state of things, I say we revert and default to None.
16:03:39 <jgriffith> My view is that this is a V2 Glance API thing
16:03:50 <jgriffith> and we default to V1
16:04:08 <jgriffith> My vote/view is document in the Release notes as Known issue (which is done) and roll
16:04:10 <kmartin> agree...should note it
16:04:15 <glenng> But it does have an impact to a new NetApp feature requiring v2.
16:04:22 <jgriffith> glenng: yes, correct
16:04:24 <jgriffith> which sucks
16:04:29 <glenng> agreed
16:05:14 <thingee> it's too bad that this was reported pretty early yesterday too
16:05:18 <jgriffith> I talked to akerr and he stated that a backport could be relatively easy for Netapp customers
16:05:26 <thingee> and not noticed
16:05:37 <glenng> Not the end of the world; documenting would be okay.
16:05:46 <jgriffith> thingee: well since we default to V1 only Netapp uses V2 right now
16:05:59 <thingee> just saying, I
16:06:05 <jgriffith> thingee: FYI even yesterday morning was a bit late
16:06:07 <thingee> think the cut off wasn't done right
16:06:13 <akerr> jgriffith: our feature is optional as well.  As glenng says, not the end of the world
16:06:14 <thingee> if there are unverified issues like this
16:06:18 <jgriffith> thingee: we cut/shipped yesterday AM at about 8:00
16:06:26 <jgriffith> I think we're ok
16:06:28 <thingee> It was earlier than that
16:06:37 <jgriffith> thingee: well ok...
16:06:41 <jgriffith> thingee: and the point is?
16:06:58 <thingee> it was unverified. there was a cut off
16:07:05 <thingee> what if it was critical?
16:07:33 <jgriffith> thingee: I'm not sure what you're looking for here?  Is this criticism, or something else?
16:07:50 <thingee> I'm just unhappy with the cut off decision
16:07:58 <thingee> on new unverified issues
16:08:04 <jgriffith> thingee: what specifically do you mean?
16:08:05 <thingee> be present
16:08:11 <jgriffith> thingee: the "cut off" decision?
16:08:12 <thingee> can't make it more clear than that
16:08:27 <jgriffith> thingee: You're unhappy with me cutting the RC yesterday?
16:09:03 <jgriffith> thingee: maybe we should talk after the meeting
16:09:17 <jgriffith> thingee: You seem to be very unhappy with how things have been going lately
16:09:21 <jgriffith> maybe we can fix it
16:09:55 <jgriffith> Ok... so back to our regular scheduled program
16:10:25 <jgriffith> #topic summit sessions update
16:10:35 <glenng> Yahoo!
16:10:39 <jgriffith> I'll make another pass on those shortly
16:10:49 <jgriffith> Have some good proposals
16:11:04 <jgriffith> It's not too late if you have items you want to propose, but need to do it today
16:11:12 <thingee> jgriffith: is the ISCSI and FC clean up work in brick needed for a session?
16:11:19 <thingee> could probably just unconf for interested parties
16:11:32 <thingee> well, not just brick
16:11:43 <bswartz> thingee: unconf sessions are no more
16:11:47 <bswartz> let me find the blog
16:11:59 <thingee> bswartz: I can invite anyone to a bar with me to discuss it
16:12:01 <jgriffith> thingee: We have slots
16:12:04 <thingee> and then bring it up in the ML
16:12:09 <thingee> ;)
16:12:17 <jgriffith> thingee: we should propose it
16:12:21 <jgriffith> IMO
16:12:24 <glenng> *is interested*
16:12:30 <bswartz> oh a REAL unconf session! are you buying?
16:12:32 <thingee> ok, I had second thoughts about how productive it would be
16:12:46 <thingee> bswartz: HA, I'm on a budget nowadays
16:13:00 <thingee> jgriffith: ok I'll propose it
16:13:05 <jgriffith> thingee: thanks
16:13:24 <jgriffith> any questions/suggestions WRT summit sessions?
16:13:53 <thingee> have we verified the people that have proposed these are going to be present or have someone familiar with the subject to be present?
16:13:59 <thingee> I don't want more david wang situations
16:14:01 <jgriffith> haha
16:14:09 <DuncanT-1> Who /is/ David Wang?
16:14:15 <jgriffith> I've checked with each of them and they've "said" they'll be there
16:14:15 <thingee> DuncanT-1: that's the new shirt
16:14:36 <hemna> mornin
16:15:28 <winston-d> hemna: morning~
16:15:29 <jgriffith> anything else from anyone?
16:15:44 <DuncanT-1> I saw a couple of sessions marked incomplete
16:15:52 <DuncanT-1> Does that mean they're out?
16:16:01 <hemna> thingee, iSCSI/FC cleanup work ?
16:16:01 <jgriffith> DuncanT-1: nahh
16:16:10 <thingee> hemna: yes, it's a mess
16:16:12 <jgriffith> DuncanT-1: means they have a chance to come back with a more detailed focus
16:16:18 <thingee> complicated code
16:16:32 <xyang1> jgriffith: how many slots do we have
16:16:38 <jgriffith> DuncanT-1: but as the proposal was there were concerns or it wasn't clear what the objective was
16:16:47 <hemna> thingee, ok fill me in offline
16:16:47 <jgriffith> xyang1: 12
16:16:56 <jgriffith> errr... 11
16:17:05 <DuncanT-1> jgriffith: Ok, cool
16:17:24 <xyang1> jgriffith: do you know which days yet? Wed, Thu, Fri?
16:17:29 <jgriffith> hemna: if you want to update multi-attach perhaps?
16:17:31 <hemna> thingee, almost all of that is directly from nova.   I have plans to refactor some of the initiator side, wrt multi-attach and rediscovery at detach time for iSCSI
16:17:39 <jgriffith> xyang1: Friday last check
16:17:45 <hemna> jgriffith, sure
16:17:46 <jgriffith> xyang1: same as usual
16:18:12 <hemna> multi-attach is coming along.   I have the first set of patches as WIP in gerrit for nova, cinder, cinderclient
16:18:21 <jgriffith> hemna: awesome
16:18:23 <hemna> but I'm working on changes to it as well as getting unit tests working
16:18:29 <jgriffith> #topic open-discussion
16:18:42 * thingee added a topic last minute
16:18:58 <hemna> I had to make a change to the current patches at detach time to pass in the attachment uuid instead of instance uuid, because Cinder can attach to a host (no instance uuid in that case)
16:19:12 <winston-d> jgriffith: was asking about if people are interested in immutable volume type or in place update for volumes when admin making changes to type definition or type-QoS associations
16:19:16 <jgriffith> thingee: I don't see it?
16:20:02 <hemna> I came across an issue yesterday that I might need some help with
16:20:02 <thingee> weird, I do
16:20:10 <thingee> well it's 'cinder resource status'
16:20:15 * jgriffith refreshes
16:20:18 <thingee> specifically the way we handle a status for an object
16:20:21 <jungleboyj_> I am here now.
16:20:22 <thingee> https://bugs.launchpad.net/cinder/+bug/1305550
16:20:23 <uvirtbot> Launchpad bug 1305550 in cinder "Failed retype with driver raised exception should set volume status to "error"" [Undecided,In progress]
16:20:42 <thingee> this bug raised a thought that we make the status field too complicated
16:20:45 <jgriffith> thingee: oh, topic to the agenda
16:20:49 <jgriffith> not the summit
16:21:13 <winston-d> thingee: or not complicated enough?
16:21:19 <jgriffith> #topic what to do on retype failure
16:21:37 <thingee> I would like folks to think of cinder as trying to remove a lot of the need for intervention by ops and users.
16:21:38 <jgriffith> winston-d: not complicated enough IMO
16:21:53 <hemna> volume/manager.py _migrate_volume_generic has a call into nova to update_server_volume for an instance.   since cinder can be attached to multiple instances, which one do I use in the nova_api.update_server_volume() call?
16:21:58 <thingee> to be clear I don't think it should be up to people recover volumes if cinder can do it
16:22:03 <hemna> or do I call it for every instance.
16:22:12 <hemna> that's the only outstanding issue I have wrt multi-attach
16:22:13 <jungleboyj_> I think the problem is bigger than just for three type. It seems like we should be able to provide the user with more information about a failure.
16:22:40 <jgriffith> hemna: can we come back around to that
16:22:45 <thingee> if something is in an error, just stop. it's done. and keep the status as *error*.
16:22:45 <jungleboyj_> *retype
16:22:49 <jgriffith> hemna: finish up the talk about retype first
16:22:52 <hemna> so I'm going to try and get a second set of patches up in gerrit this week
16:22:55 <hemna> jgriffith, ok
16:23:01 <thingee> don't try to convey it with 'it-failed-because-of-this-thing' status
16:23:17 <thingee> have a separate field that explains why the status is 'error'
16:23:24 <jgriffith> thingee: so the question, however, is whether in that particular case setting it to error status is appropriate. I vote no
16:23:28 <DuncanT-1> Fine with the separate field
16:23:39 <thingee> jgriffith: ok, what do you gain from other statuses?
16:23:39 <akerr> thingee: doesn't nova do something similar to that when an instance goes into error?
16:23:52 <jungleboyj_> Can we do a separate field and still be able to have the user take actions or will it only be the administrator?
16:23:54 <ameade> akerr: +1 instance faults
16:24:01 <jgriffith> DuncanT-1: I think thingee is saying the opposite
16:24:03 <DuncanT-1> jgriffith: what should it do? Leave the volume at the old type?
16:24:15 <thingee> my point is we should reserve error for, there is nothing cinder can do about it. nothing the user can do about it
16:24:17 <jgriffith> DuncanT-1: well, I think so yes
16:24:18 <thingee> it's up to ops
16:24:27 <jgriffith> DuncanT-1: and the reason is because there's nothing "wrong" with the volume
16:24:35 <DuncanT-1> jgriffith: That seems quite reasonable to me
16:24:41 <jgriffith> and worse the user doesn't have a mechanism to know what retype's are valid
16:24:45 <jgriffith> so it's trial and error
16:24:53 <thingee> jgriffith: so i agree with retype
16:24:57 <jgriffith> and I think it's bad user experience to put it in error
16:25:07 <jgriffith> and say "haha now you can't use your volume"
16:25:08 <thingee> I'm saying in general, better conveying to the user what happened is what I'm advocating here
16:25:32 <jgriffith> thingee: so that's another topic IMO
16:25:39 <winston-d> jgriffith: well, i think https://bugs.launchpad.net/cinder/+bug/1305550 here is more about something wrong happened when retyping a volume
16:25:39 <jgriffith> thingee: and we should propose sub-states
16:25:41 <uvirtbot> Launchpad bug 1305550 in cinder "Failed retype with driver raised exception should set volume status to "error"" [Undecided,In progress]
16:25:53 <thingee> jgriffith: agreed. and if you look back to my original sentence, this topic brought on a thought for me
16:25:56 <thingee> of this
16:26:10 <jgriffith> winston-d: yeah, that one is different and that's no good
16:26:28 <jungleboyj_> Thingee so you were saying we wouldn't put the volume in error?
16:26:31 <akerr> jgriffith: we already have error_extending, but I'm not sure thats the best way to go
16:26:31 <thingee> I think sub-states is also complicated. Again, error is just nothing can be done about it. Not cinder, not the user. Just ops
16:26:40 <thingee> put that in a status description field
16:26:45 <jungleboyj_> Just leave it as available with more information in the field?
16:26:53 <thingee> make it so it's safe for users' eyes
16:26:58 <jgriffith> akerr: yeah, it still blocks some things that look for "error_"
16:27:02 <winston-d> Nova has both 'task state' and 'instance fault'
16:27:06 <thingee> ops can see a general idea from what the user sees and look at the logs for more information
16:27:11 <jgriffith> winston-d: +1
16:27:13 <winston-d> we can at least have one
16:27:20 <ameade> you need to be able to handle the case of multiple errors on one resource
16:27:21 <winston-d> or both
16:27:29 <ameade> i dont want to only see info about the latest
16:27:54 <thingee> ameade: so I talked about that in #openstack-cinder too
16:27:54 <jgriffith> ameade: I don't agree with that but I think we're rat holing a bit
16:28:07 <winston-d> ameade: use cases like multi-attaching?
16:28:20 <jgriffith> the bottom line is right now we have ONE and only ONE method of conveying status
16:28:26 <jgriffith> it seems that's not enough
16:28:36 <winston-d> jgriffith: +1
16:28:38 <jgriffith> so we should at least start by implementing a task-state
16:28:41 <jungleboyj_> ameade I think that is going further than we need right now.
16:28:44 <jgriffith> and go from there
16:28:52 * thingee is talking to himself when he just brought up a second way of giving status
16:29:22 <jungleboyj_> jgriffith sounds reasonable.
16:29:34 <jgriffith> thingee: what did you want to say
16:29:39 <jgriffith> thingee: floor is all yours
16:29:45 <jgriffith> everybody listen to thingee
16:29:51 <thingee> let me scrollback up and paste what I said earlier
16:30:04 * jungleboyj_ listens
16:30:24 <thingee> This is also explained in the bug https://bugs.launchpad.net/cinder/+bug/1305550
16:30:25 <uvirtbot> Launchpad bug 1305550 in cinder "Failed retype with driver raised exception should set volume status to "error"" [Undecided,In progress]
16:30:46 <jgriffith> thingee: ummm... sorry that doesn't help me
16:30:50 <jgriffith> thingee: what did YOU say
16:30:51 <hemna> (we should get that bot in openstack-cinder)
16:31:04 <thingee> Reserve 'error' for the resource is not recoverable by user or cinder. it requires manual intervention by ops
16:31:07 <thingee> jgriffith: sorry still typing
16:31:24 <thingee> use a *second* field to give a description of the status
16:31:41 <thingee> instead of 'it-failed-because-of-this-status' like we've been doing
16:32:22 <jgriffith> thingee: sure
16:32:23 <jungleboyj_> jgriffith I think he is pointing out that he wants to keep her for just the worst of situations.
16:32:31 <jgriffith> thingee: as in my proposal in comment #2 of the bug
16:32:38 <jungleboyj_> *error
16:32:52 <winston-d> thingee: so no 'error-extending' but just 'error' with an description field?
16:33:27 <akerr> winston-d: i think not even "error" there because the volume is still usable, just not the new size
16:33:32 <thingee> in order to promote better state setting, I would say instead of using the db api directly, we need some helper for setting state that would require things like the new state e.g. available, error, in-use, and a status description is required if it's something like error state.
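A minimal sketch of the state-setting helper thingee describes, assuming a volume is a plain dict. The function name `set_volume_state`, the `status_description` key and the set of statuses are hypothetical illustrations, not Cinder's actual DB API:

```python
# Hypothetical helper; not part of Cinder's db API.
VALID_STATUSES = {"available", "in-use", "creating", "deleting", "error"}


def set_volume_state(volume, status, description=None):
    """Set volume['status'], requiring a description whenever status is 'error'."""
    if status not in VALID_STATUSES:
        raise ValueError("unknown status: %s" % status)
    if status == "error" and not description:
        # 'error' is reserved for states only ops can recover from,
        # so the caller must say why the volume ended up there.
        raise ValueError("a status description is required for 'error'")
    volume["status"] = status
    volume["status_description"] = description
    return volume
```

Routing every status change through one function like this is what would let a rule such as "error needs a reason" be enforced in a single place instead of at each db call site.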
16:33:32 <jungleboyj_> winston-d or available with a description field.
16:33:51 <winston-d> akerr: well, that depends.
16:33:58 <winston-d> jungleboyj_: ^^
16:34:01 <jgriffith> thingee: yeah, we've been saying for a year defined and real states
16:34:50 <winston-d> i wish we can have backend driver report some type of failure that actually doesn't hurt/touch the volume.
16:34:59 <jungleboyj_> It seems like we would still have to add a state for the case for your command failed and you need to see the additional information.
16:35:03 <thingee> jgriffith: I guess when I read #2 comment in that bug, I took it as another key being used for the sub-status, not a full description of text.
16:35:06 <jgriffith> winston-d: I agree
16:35:38 <akerr> you could define something like a 'nonFatalError' exception that drivers could throw
16:35:41 <jungleboyj_> winston-d +1
16:35:49 <hemna> doesn't this fall under general state management of volume transactions.  Wasn't taskflow supposed to help with this some?
16:36:07 <winston-d> akerr: yeah, but until we have that, an error could be an unrecoverable error
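A rough sketch of akerr's suggestion, under the assumption that drivers raise a dedicated exception when an operation failed without touching the volume. `VolumeDriverError`, `NonFatalDriverError` and `status_after_failure` are hypothetical names, not existing Cinder classes; anything not explicitly marked non-fatal still falls back to 'error', matching winston-d's caveat:

```python
class VolumeDriverError(Exception):
    """Hypothetical base class for driver failures."""


class NonFatalDriverError(VolumeDriverError):
    """Hypothetical: the operation failed but the volume itself is untouched."""


def status_after_failure(exc, previous_status):
    """Pick the volume status to record after a driver call raised `exc`."""
    if isinstance(exc, NonFatalDriverError):
        # Volume is still usable; restore the status it had before the call.
        return previous_status
    # Anything else is treated as possibly unrecoverable.
    return "error"
```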
16:36:42 <jgriffith> Ok... can I say something without hurting any feelings or pissing anybody off?
16:36:48 <jgriffith> let's back up and focus a little
16:37:12 <jgriffith> first; don't bring taskflow into the discussion, it doesn't do what we're talking about regardless of if that was a goal or not
16:37:24 <jgriffith> Let's propose a summit session
16:37:41 <jgriffith> First... let's agree on: Adding a task-status entry
16:37:58 <jgriffith> We can argue about verbosity, what it means etc later
16:38:03 <jungleboyj_> jgriffith +1
16:38:31 <jgriffith> At the same time, that means we have the opportunity to limit the status field we have today as thingee pointed out
16:38:35 <jungleboyj_> I think getting in a room together and talking about this is a good idea.
16:38:36 <jgriffith> which I think is needed/good
16:38:48 <jgriffith> There are a lot of opportunities here
16:38:55 <jungleboyj_> Agreed.
16:39:15 <jgriffith> but you can't throw in EVERYTHING all at once
16:39:35 <jgriffith> does this sound reasonable to everyone?
16:39:44 <jgriffith> are there any disagreements?
16:39:50 <thingee> question
16:40:01 <hemna> well I just think taskflow is relevant to the discussion of volume state.   that's all.
16:40:14 <thingee> what is the task-state accomplishing? what exactly does 'creating' currently mean for example?
16:40:48 <jgriffith> creating today is a status
16:40:54 <thingee> correct
16:40:56 <jungleboyj_> thingee it is telling the user what is happening that they can't see.
16:41:12 <jgriffith> when you say task-state are you referring to the hypothetical yet to exist thing?
16:41:13 <hemna> and it means you can't take other actions on the volume while it's in that status.
16:41:16 <jgriffith> or something else?
16:41:17 <thingee> ok, so again, it's explaining what 'creating' status currently means.
16:41:20 <thingee> more detailed
16:41:26 <thingee> as an example
16:41:49 <thingee> jgriffith: I'm referring to 'task-state' that you just said a few lines up
16:41:57 <jgriffith> thanks
16:41:59 <jgriffith> and NO
16:42:07 <jgriffith> it's not to describe the status
16:42:15 <thingee> I don't know what task-state means and I was just giving an example to understand.
16:42:17 <jgriffith> it's not to describe what "attaching" means
16:42:24 <jungleboyj_> thingee that is what I am thinking.  That is what we need to talk about at the summit.
16:42:37 <jgriffith> I'll try my proposal again....
16:42:42 <jgriffith> For example:
16:42:49 <jgriffith> You try to extend a volume
16:42:50 <akerr> do you mean state=creating, task-state=in progress?
16:42:56 <jgriffith> The volume/backend doesn't support extend
16:43:12 <jgriffith> The volume is "fine", just not extended
16:43:23 <jgriffith> DONT put the volume in error status
16:43:39 <jgriffith> Set a task-status of "extend-failed" or whatever
16:43:52 <jgriffith> leave the volume as 'available' and the original size
16:44:01 <jgriffith> Example 2:
16:44:09 <jgriffith> retype from foo to baz
16:44:18 <jgriffith> backend doesn't support baz, and migration is not enabled
16:44:28 <jgriffith> DONT set volume to error status and make it unusable
16:44:41 <jgriffith> Set the task-status to "error-retype" or whatever
16:44:49 <jgriffith> Leave the status as "available"
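jgriffith's first example can be sketched as follows, assuming a dict-based volume, a `task_status` key, and a hypothetical `ExtendNotSupported` exception raised by the backend; none of these names come from Cinder itself:

```python
class ExtendNotSupported(Exception):
    """Hypothetical: raised by a backend that cannot extend volumes."""


class NoExtendDriver:
    """Stand-in backend that never supports extend."""

    def extend_volume(self, volume, new_size):
        raise ExtendNotSupported("backend does not support extend")


def extend_volume(driver, volume, new_size):
    """Try to extend; on an unsupported backend, record a task-status instead of 'error'."""
    try:
        driver.extend_volume(volume, new_size)
    except ExtendNotSupported:
        # The volume is fine, just not extended: keep status and size as they were.
        volume["task_status"] = "extend-failed"
        return volume
    volume["size"] = new_size
    return volume
```

The retype case in Example 2 would follow the same shape, with "error-retype" as the task-status and the original type left in place.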
16:45:01 <jgriffith> thingee: is that clear?
16:45:05 <thingee> yup
16:45:07 <jgriffith> thingee: do I need another example?
16:45:10 <jungleboyj_> jgriffith +2
16:45:10 <hemna> shouldn't we include a tnx history, instead of just the last failure ?
16:45:16 <thingee> hemna: +1
16:45:17 <hemna> txn
16:45:17 <jgriffith> tnx?
16:45:25 <hemna> sorry, I'm lazy....transaction
16:45:29 <glenng> jgriffith + 1
16:45:30 <jgriffith> gotcha
16:45:36 <jgriffith> hemna: maybe...
16:45:40 <jungleboyj_> hemna one thing at a time.
16:45:42 <jgriffith> hemna: 1. What would that be
16:45:48 <jgriffith> hemna: 2. Do you need that first pass
16:45:55 <jgriffith> hemna: 3. How do you manage it
16:45:56 <winston-d> hemna: like instance faults of Nova?
16:46:04 <jgriffith> winston-d: yes
16:46:08 <thingee> jgriffith: I think it's really important to consider this in the design now. If we change our mind later, it's going to be a pain to change on deployed
16:46:17 <hemna> do we need it right now?  I'd argue that yes, we could use it now :)
16:46:18 <jgriffith> thingee: I'm not saying that it isn't
16:46:19 <thingee> once deployed*
16:46:26 <hemna> does it have to be done first pass, probably not.
16:46:29 <jgriffith> hemna: Please answer the first question
16:46:33 <akerr> maybe I'm getting ahead here again, but would we want a 3rd field with a more descriptive explanation of why the task failed?
16:46:34 <jgriffith> hemna: 'what is it'
16:46:52 <jungleboyj_> The last few states of that volume?
16:47:00 <hemna> another table in the db that tracks transactions and their states/steps/failures
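One way to picture the table hemna describes is the in-memory SQLite sketch below, which also carries the | Task | Outcome | Timestamp | columns akerr raises later. The `volume_task_history` table, its columns, and both helper functions are assumptions for illustration, not a proposed Cinder migration:

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per task attempt on a volume.
SCHEMA = """
CREATE TABLE volume_task_history (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    volume_id TEXT NOT NULL,
    task TEXT NOT NULL,
    outcome TEXT NOT NULL,
    detail TEXT,
    created_at TEXT NOT NULL
)
"""


def record_task(conn, volume_id, task, outcome, detail=None):
    """Append one task attempt with a UTC timestamp."""
    conn.execute(
        "INSERT INTO volume_task_history"
        " (volume_id, task, outcome, detail, created_at) VALUES (?, ?, ?, ?, ?)",
        (volume_id, task, outcome, detail, datetime.now(timezone.utc).isoformat()),
    )


def task_history(conn, volume_id):
    """Return (task, outcome, created_at) rows in the order they were recorded."""
    rows = conn.execute(
        "SELECT task, outcome, created_at FROM volume_task_history"
        " WHERE volume_id = ? ORDER BY id",
        (volume_id,),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
```

Keeping history in its own table, rather than overwriting a single field, is what answers ameade's concern about seeing more than the latest failure.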
16:47:07 <jgriffith> jungleboyj_: that's your interpretation... I want hemna 's
16:47:25 <thingee> hemna: pretty much what I thought too
16:47:35 <jgriffith> hemna: for whose consumption?
16:47:40 <thingee> the user
16:47:44 <ameade> how does the user know there was an error at all (if the status isn't error)?
16:48:00 <hemna> soo......that leads me to bring up taskflow again.  Isn't there a built in mechanism to taskflow that tracks the transaction state?
16:48:04 * hemna ducks
16:48:13 <hemna> jgriffith, for admins
16:48:21 <winston-d> ameade: task status
16:48:30 <DuncanT-1> hemna: Not really, no. There ought to be, but isn't
16:48:35 <jgriffith> hemna: so this is why I'm asking "you"
16:48:42 <ameade> winston-d: sure that could make sense maybe
16:48:46 <jgriffith> hemna: you say admins, thingee says users
16:48:49 <thingee> winston-d: I think the problem though is how do you know. say the task status already has a value
16:48:50 <akerr> winston-d: that assumes the task-status would clear up after some time?
16:48:50 <hemna> heh
16:48:52 <thingee> how do you know it's new?
16:48:55 <jgriffith> others may say "ops" etc
16:49:15 <hemna> I dunno, I don't think users should need to see why retype failed, but admins do.
16:49:19 <thingee> what if you get the same status? do you have to keep track of the old status to know a change has happened?
16:49:47 <jgriffith> I really think that this is being made much more complex than it should be
16:49:55 <winston-d> akerr: well, task state/status clear doesn't help if you want to find out why the 'retype' request you invoked 3 days ago failed.
16:50:00 <hemna> DuncanT-1, ok sounds like we should ping harlowja about adding it then.
16:50:07 <jgriffith> which is part of the problem I have with existing things (like taskflow)
16:50:10 <jungleboyj_> jgriffith +2
16:50:23 <DuncanT-1> hemna: Not simple, since taskflow currently isn't built in a way it can usefully track it
16:50:25 <winston-d> akerr: and after that you also did a bunch of new operations to the volume
16:50:34 <thingee> jgriffith: I think the current thought is more simplified than it should be. I'm trying to figure out how people would use it.
16:50:41 <thingee> how it would look in clients like horizon
16:50:54 <jgriffith> thingee: the same as it looks in Nova for example
16:51:10 <jgriffith> |Status|Task|
16:51:27 <jgriffith> available|unable-to-retype|
16:51:39 <ameade> fwiw, i think typically in a RESTful api what is usually done is the user would create a new 'retype' resource and they can poll that to see the status of the task
16:51:55 <ameade> but that of course makes no sense in our current design
16:52:00 <jungleboyj_> add a timestamp perhaps?
16:52:21 <thingee> jgriffith: so I'm totally in agreement with going back to available status. +1000. But if extend fails..the user tries twice...they get the same task state back. I guess that's fine and maybe a timestamp of when that task state was updated?
16:52:25 <thingee> just so you know something finished?
16:52:34 <winston-d> So Nova has |Status|Task|InstanceFaults|
16:52:53 <jungleboyj_> thingee +2
16:53:13 <winston-d> |available|unable-to-retype|backend_not_supported|
16:53:48 <jgriffith> winston-d: sure
16:54:09 <DuncanT-1> backend_not_supported doesn't mean anything to an end user's tenant though
16:54:22 <jgriffith> DuncanT-1: yeah, I'd suggest that field be admin
16:54:30 <akerr> thingee: so I suppose a task history would come in handy there — cinder task-history <uuid> -> | Task | Outcome | Timestamp |
16:54:31 <jgriffith> but again I think we're getting ahead of ourselves a bit
16:54:34 <winston-d> DuncanT-1: instance fault is for admins
16:54:34 <DuncanT-1> Ok, that makes sense
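The |Status|Task|InstanceFaults| layout discussed above, with the fault column shown only to admins as jgriffith suggests, might render like this. `VolumeView` and `render_row` are hypothetical names, and the admin check is reduced to a boolean for the sketch:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VolumeView:
    status: str
    task_status: Optional[str] = None
    fault: Optional[str] = None  # admin-only detail, e.g. 'backend_not_supported'


def render_row(view, is_admin):
    """Return the |Status|Task|(Fault)| row a client such as horizon might show."""
    cols = [view.status, view.task_status or ""]
    if is_admin:
        cols.append(view.fault or "")
    return "|" + "|".join(cols) + "|"
```

Usage: for `VolumeView("available", "unable-to-retype", "backend_not_supported")`, an admin sees all three columns while a tenant sees only `|available|unable-to-retype|`.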
16:54:43 <hemna> DuncanT-1, unless you want to portray it as that action is not available
16:54:51 <hemna> since it will always fail
16:55:30 <jgriffith> 5 minute warning
16:55:43 <winston-d> akerr: try logstash with request ID
16:56:04 <DuncanT-1> Or stacky with the same
16:56:13 <winston-d> yeah
16:56:17 <jgriffith> yeah, please don't suggest duplicating the log files in some API call
16:56:30 <thingee> winston-d: I still don't think it helps in knowing if a task finished when you retry a failed task.
16:56:35 <thingee> from the user's standpoint
16:56:37 <thingee> or client
16:56:59 <jgriffith> My suggestion was that running a new task 'always' clears the previous task-state
16:57:05 <jgriffith> sets it to None at the onset
16:57:10 <jungleboyj_> give the user as much info as possible.   Eventually it helps the admin as well.
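A minimal sketch of jgriffith's rule that every new task clears the previous task-state at the onset, combined with thingee's timestamp idea so that a repeated failure is still visibly new. `begin_task`, `finish_task` and the dict keys are hypothetical:

```python
from datetime import datetime, timezone


def begin_task(volume, task):
    """Hypothetical: reset task state at the onset of every new operation."""
    volume["task_status"] = None
    volume["current_task"] = task
    volume["task_updated_at"] = datetime.now(timezone.utc)


def finish_task(volume, outcome):
    """Record the result with a fresh timestamp, so a retry that fails the same way is distinguishable."""
    volume["task_status"] = outcome
    volume["current_task"] = None
    volume["task_updated_at"] = datetime.now(timezone.utc)
```

A client polling the volume would see task_status go to None when the retry starts and then back to the failure value with a newer timestamp, which addresses thingee's "how do you know it's new?" question.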
16:57:39 <hemna> jungleboyj_, hey, user here is a nice fat stacktrace for you.  good luck. :P
16:57:48 <DuncanT-1> jungleboyj_: Disagree. Far too easy for the user to start guessing what the problem is and get completely the wrong end of the stick
16:57:54 <jgriffith> hemna: and so much for the abstraction
16:57:57 <glenng> Or confuses them. Seeing old error info may hinder when current operation worked.
16:57:59 <thingee> jgriffith: would that be obvious to someone new? I'm trying to remember if on certain operations we list the volume/snapshot or whatever details before doing certain actions
16:58:12 <jungleboyj_> hemna ... Not that much.
16:58:21 <hemna> :P  good.
16:58:42 <thingee> jgriffith: It's not an obvious thing to me that a field would be cleared on a new action.
16:58:45 <jgriffith> thingee: it's a hell of a lot more obvious than silently not extending or setting the volume to error because something isn't supported
16:58:49 <hemna> other than the current volume state, all the admin has now are log stacktraces...if that.
16:59:04 <jgriffith> thingee: when you run an API cmd and see the field change it seems obvious to me
16:59:26 <thingee> jgriffith: I agree it's better. I'm just saying if we're going to revamp this, lets be careful and consider these things so we're not repeating ourselves.
16:59:28 <jungleboyj_> hemna backend_not_supported doesn't seem dangerous though.
16:59:30 <DuncanT-1> hemna: Good drivers log lots of useful info of their own too... if yours doesn't, talk to your vendor
16:59:37 <jgriffith> thingee: fair enough
16:59:45 <hemna> jungleboyj_, +1
16:59:46 <jgriffith> DuncanT-1: +1
16:59:59 <jgriffith> Ok
16:59:59 <thingee> DuncanT-1: +1
17:00:04 <bswartz> +1 for log spam
17:00:15 <jgriffith> We've successfully burned our hour
17:00:19 <hemna> DuncanT-1, ours does a good job of logging failures/reasons.   I'm just saying in general though that's not overly useful to an admin
17:00:24 <jgriffith> I'll get a session for this proposed
17:00:25 <thingee> also with that, reviewers should be encouraging driver changes to give great logs to cinder users!
17:00:30 <jgriffith> and have some code for ATL
17:00:34 <hemna> because it takes for fricking ever to find the error in the log on a busy system.
17:00:41 <DuncanT-1> ATL?
17:00:43 <jgriffith> thanks everyone
17:00:45 <thingee> atlanta
17:00:49 <hemna> forcing admins to have to look in the log, is the wrong approach IMO
17:00:50 <DuncanT-1> Ah
17:00:52 <jungleboyj_> thingee my favorite thing to do.
17:00:53 <jgriffith> #endmeeting