17:00:12 <hartsocks> #startmeeting VMwareAPI
17:00:13 <openstack> Meeting started Wed Oct 23 17:00:12 2013 UTC and is due to finish in 60 minutes. The chair is hartsocks. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:14 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:00:17 <openstack> The meeting name has been set to 'vmwareapi'
17:00:25 <hartsocks> hi all
17:00:30 <hartsocks> I'm back.
17:00:35 <hartsocks> Did you miss me?
17:00:42 <hartsocks> Raise a hand if you did :-)
17:00:43 <tjones> I DID!!!
17:00:49 <hartsocks> *lol*
17:01:08 <tjones> :-D
17:01:48 <hartsocks> I'm still reading my old emails but I'm reading from newest to oldest so if there's something you need me to help on, please re-send… it'll get taken care of sooner (and perhaps twice even).
17:02:14 <hartsocks> One time I actually read all my emails and I didn't know what to do.
17:02:25 <hartsocks> Then the problem went away.
17:03:12 <hartsocks> Vui said he'd be online later so we might see him.
17:03:22 <hartsocks> Anybody else around?
17:03:55 <sarvind> sarvind; I'm here. First time for the IRC meeting.
17:04:08 <tjones> welcome sarvind
17:04:28 <tjones> i may start calling you that face to face ;-)
17:05:01 <hartsocks> *lol*
17:06:19 <hartsocks> Well, let's get rolling then...
17:06:22 <hartsocks> #topic bugs
17:06:59 <hartsocks> #link http://goo.gl/uD7VDR
17:07:44 <hartsocks> here's a query on launchpad that combs for open bugs… new, in progress, etc.
17:08:25 <hartsocks> We have a few that have popped up since last week.
17:08:35 <hartsocks> Looks like 5 or so.
17:09:08 <hartsocks> This one troubles me...
17:09:10 <smurugesan> This is sabari here. Hi all.
17:09:11 <hartsocks> #link https://bugs.launchpad.net/nova/+bug/1241350
17:09:13 <uvirtbot> Launchpad bug 1241350 in nova "VMware: Detaching a volume from an instance also deletes the volume's backing vmdk" [Undecided,In progress]
17:09:17 <hartsocks> hey sabari!
17:10:07 <danwent> hi folks, garyk says he is having technical issues with his irc client
17:10:14 <hartsocks> okay.
17:10:16 <tjones> looks like that is out for review
17:10:25 <danwent> (he let me know via skype, apparently a more reliable channel)
17:10:41 <sarvind> i had trouble with the irc client as well; switched to the webchat for now
17:11:04 <hartsocks> okay.
17:11:09 <garyk> hi
17:11:33 <hartsocks> My plan for today is just to hit bugs, then blueprints, then HK summit stuff.
17:11:42 <hartsocks> Hey gary.
17:11:57 <hartsocks> We were just talking about #link https://bugs.launchpad.net/nova/+bug/1241350
17:11:58 <uvirtbot> Launchpad bug 1241350 in nova "VMware: Detaching a volume from an instance also deletes the volume's backing vmdk" [Undecided,In progress]
17:12:26 <garyk> that is a critical issue - hopefully we can get it backported asap
17:12:26 <tjones> jay has it out for review
17:12:49 <garyk> there are some minor issues with test cases, but nothing blocking
17:13:08 <hartsocks> okay good.
17:13:27 <hartsocks> #link https://bugs.launchpad.net/nova/+bug/1243222
17:13:28 <uvirtbot> Launchpad bug 1243222 in nova "VMware: Detach leaves volume in inconsistent state" [Undecided,New]
17:14:02 <garyk> i am currently working on that
17:14:11 <hartsocks> Okay.
17:14:15 <garyk> it is when a snapshot takes place on the instance
17:14:53 <hartsocks> you've confirmed it then?
17:15:22 <garyk> confirmed the bug?
17:15:45 <hartsocks> yes. So it can be marked "confirmed" not "new" or something else?
17:16:09 <garyk> i'll mark it as confirmed
17:16:37 <hartsocks> groovy.
17:16:47 <danwent> garyk: ok, so that only happens when a snapshot is done?
17:17:05 <danwent> the title does not indicate that anywhere, making it sound much larger :)
17:17:16 <hartsocks> heh. good point.
17:17:33 <garyk> the issue is as follows: when we do a snapshot we read the disk from the hardware summary. problem is that due to the fact that there are 2 disks we do not read the right disk
17:17:47 <garyk> this causes us to snapshot the cinder disk instead of the nova disk.
17:17:50 <danwent> we seem to have a problem with that in general, giving bugs very general titles that make things seem very broken
17:18:12 <garyk> danwent: correct. we need to work on our descriptions
17:18:13 <hartsocks> I've edited the title to describe step 4 in the repro steps.
17:18:32 <danwent> ok, i'll send a note to the list just to remind people about this
17:18:33 <tjones> danwent: we want to keep you on your toes
17:18:49 <hartsocks> *lol* what's wrong with: "broken-ness all things happen bad"?
17:19:00 <danwent> tjones: or the hospital? i almost had a heart attack when i saw that
17:19:36 <hartsocks> I see things like that and I usually say, "yeah right"
17:19:37 <tjones> :-P
17:20:08 <tjones> garyk: did this break in tempest or do we need to add more tests?
17:20:15 <danwent> or wait, that's a different bug than I thought we were talking about
17:20:17 <danwent> one sec
17:20:40 <garyk> tjones: i do not think that this is covered in tempest - if it is, it does not really check the validity of the disks
17:21:00 <hartsocks> This is a good point...
17:21:04 <garyk> danwent: there are 2 bugs which are very closely related - they may even be the same
17:21:09 <danwent> https://bugs.launchpad.net/nova/+bug/1241350
17:21:11 <uvirtbot> Launchpad bug 1241350 in nova "VMware: Detaching a volume from an instance also deletes the volume's backing vmdk" [High,In progress]
17:21:26 <danwent> is the one i saw
17:21:27 <hartsocks> yeah.
17:21:37 <hartsocks> That was top of my list too.
17:22:03 <garyk> that bug already has a patch upstream. that is a blocker
17:22:16 <garyk> we have discussed that a few minutes ago
17:22:24 <danwent> i'm confused, does this mean any use of volumes is broken?
17:23:00 <garyk> danwent: something changed as this is a scenario that we have done a million times
17:23:19 <garyk> i am not sure if it is in cinder. nothing on our side was changed here (we also have this in a lab)
17:23:25 <danwent> yeah, that is my sense too.
17:23:36 <hartsocks> slow down guys.
17:23:52 <hartsocks> This looks like more bad naming confusion here.
17:24:04 <hartsocks> If I'm reading this line right...
17:24:09 <hartsocks> https://github.com/openstack/nova/blob/master/nova/virt/vmwareapi/vm_util.py#L471
17:24:13 <garyk> i think that subbu and kartik were looking deeper at the status of the disk and that may have highlighted the problem
17:24:31 <hartsocks> assuming the bug reporter was linking correctly… this means the
17:24:38 <garyk> hartsocks: that is what the fix addresses
17:24:41 <garyk> there are two cases
17:24:49 <garyk> 1. a consolidated disk needs to be deleted
17:24:53 <hartsocks> nova volume-detach
17:24:56 <hartsocks> calls the delete volume code
17:24:57 <garyk> 2. a detachment does not need to be deleted
17:25:03 <danwent> one at a time please :)
17:25:06 <hartsocks> def delete_virtual_disk_spec(
17:25:14 <hartsocks> which is not the right thing to do.
17:25:23 <hartsocks> Since deleting is not detaching.
17:25:34 <hartsocks> So.
17:25:37 <hartsocks> I'm saying:
17:25:41 <hartsocks> delete is not detach.
17:25:43 <garyk> please look at https://review.openstack.org/#/c/52645/
17:26:17 <hartsocks> great.
17:26:23 <hartsocks> I was scared there for a second.
17:26:40 <danwent> ok, so can we come up a level and talk about impact on customer?
17:26:48 <hartsocks> So the impact.
17:26:53 <hartsocks> can only be on
17:27:04 <hartsocks> the nova created vmdk right?
17:27:13 <hartsocks> this can't be bleeding into cinder somehow?
17:27:24 <garyk> hartsocks: danwent: no, the problem is the cinder volume
17:27:47 <garyk> the 'detachment' 'deletes' the cinder volume.
17:28:05 <garyk> due to the fact that it is attached to an empty vm it will not be deleted but may turn into read only
17:28:31 <garyk> so the case is: we have instance X
17:28:35 <garyk> that uses volume Y
17:28:40 <garyk> and we write to Y
17:28:43 <garyk> then detach
17:29:07 <garyk> and attach to instance Z, then we can read what was written by X but may not be able to write again
17:29:27 <garyk> sorry for the piecemeal comments - the web client is hard and my irc client is broken
17:29:52 <danwent> well, the bug says that the re-attach actually fails
17:30:02 <danwent> not that it succeeds, but the volume is read-only
17:30:27 <hartsocks> in my book, that means we haven't really confirmed this bug.
17:30:51 <garyk> hartsocks: subbu and kartik have confirmed this and i have tested the patch
17:31:23 <danwent> garyk: confirmed what behavior? what is written in the bug (second attach fails) or what you mentioned (second attach works, but read-only)
17:31:50 <hartsocks> I have no doubt you've found *a* bug and fixed it.
17:32:31 <garyk> danwent: i think that they have confirmed what is written in the bug. i am not 100% sure, but I discussed this with them
17:33:31 <danwent> ok… well, i guess one thing that is different in the bug from what I personally have tested is that I've never tried to immediately re-attach a volume to the same VM, we always booted another VM and attached the volume to that vm
17:33:32 <garyk> danwent: my understanding, and i may be wrong, or confused, most likely the latter, is that the disk could become read only when we do something like a delete or a snapshot and it is owned by someone else
17:34:46 <garyk> danwent: that is the scenario that i always tested
17:35:37 <danwent> ok, i don't totally follow on the read-only part, I'm just trying to understand how pervasive the bug is, as the write-up makes it sound like any volume that is detached is deleted and can never be attached to another VM, which means the whole point of volumes is in question.
17:36:08 <danwent> but that seems to contradict what we've tested.
17:36:14 <garyk> danwent: i'll follow up with subbu and kartik and get all of the details so that we can paint a better picture
17:36:30 <danwent> ok, thanks, yeah, don't need to take up the whole meeting, but this does seem pretty important
17:36:39 <garyk> hartsocks: you can action item that for me
17:36:44 <danflorea> I agree. We need to know if we should say "don't snapshot when you have Cinder volumes attached" or "don't use our Cinder driver"
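For readers following the "delete is not detach" thread above: in the vSphere API, removing a disk from a VM is a reconfigure with a VirtualDeviceConfigSpec whose operation is "remove"; the backing VMDK is only destroyed if fileOperation is additionally set to "destroy". The sketch below contrasts the two specs. It assumes the suds-style client_factory pattern used in nova/virt/vmwareapi; the helper names are illustrative only and are not the actual patch under review at https://review.openstack.org/#/c/52645/.

    # Illustrative sketch only -- helper names are hypothetical, not the
    # patch under review. Assumes a suds-style client_factory as used
    # elsewhere in nova/virt/vmwareapi.

    def detach_disk_spec(client_factory, device):
        """Spec that removes the disk from the VM but keeps the backing
        VMDK on the datastore (detach is not delete)."""
        device_spec = client_factory.create('ns0:VirtualDeviceConfigSpec')
        device_spec.operation = "remove"
        # NOTE: no fileOperation here -- setting it would destroy the backing file.
        device_spec.device = device

        config_spec = client_factory.create('ns0:VirtualMachineConfigSpec')
        config_spec.deviceChange = [device_spec]
        return config_spec

    def delete_disk_spec(client_factory, device):
        """Spec that removes the disk AND destroys its backing VMDK --
        only appropriate for a disk nova itself owns (e.g. a consolidated
        local disk), never for an attached cinder volume."""
        device_spec = client_factory.create('ns0:VirtualDeviceConfigSpec')
        device_spec.operation = "remove"
        device_spec.fileOperation = "destroy"
        device_spec.device = device

        config_spec = client_factory.create('ns0:VirtualMachineConfigSpec')
        config_spec.deviceChange = [device_spec]
        return config_spec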
17:36:52 <garyk> yeah i concur it is very important
17:37:31 <hartsocks> #action garyk follow up on https://bugs.launchpad.net/nova/+bug/1241350 and narrow scope/descriptions
17:37:33 <uvirtbot> Launchpad bug 1241350 in nova "VMware: Detaching a volume from an instance also deletes the volume's backing vmdk" [High,In progress]
17:37:52 <hartsocks> Which brings me to...
17:37:55 <hartsocks> #link https://bugs.launchpad.net/nova/+bug/1243193
17:37:56 <uvirtbot> Launchpad bug 1243193 in nova "VMware: snapshot backs up wrong disk when instance is attached to volume" [Undecided,New]
17:38:13 <hartsocks> which seems related.
17:38:18 <garyk> hartsocks: i am currently debugging this
17:38:24 <hartsocks> (if only by subject matter)
17:38:31 <garyk> this is related to https://bugs.launchpad.net/nova/+bug/1243222
17:38:34 <uvirtbot> Launchpad bug 1243222 in nova "VMware: Detach after snapshot leaves volume in inconsistent state" [Undecided,Confirmed]
17:39:01 <hartsocks> yeah, glad you're on it.
17:39:11 <garyk> i need a stiff drink
17:39:46 <hartsocks> putting your name on the bug so I don't accidentally try to pick it up.
17:40:15 <hartsocks> Whoever is in HK should buy Gary a round.
17:40:19 <tjones> garyk: at least it's late enough for you to do just hat
17:40:29 <tjones> that
17:40:37 <garyk> :)
17:40:44 <hartsocks> a hat full of vodka.
17:40:47 <hartsocks> :-)
17:41:15 <tjones> :-D
17:41:24 <hartsocks> any other pressing things?
17:41:32 <hartsocks> (on the topic of bugs that is)
17:42:16 <hartsocks> anyone look at #link https://bugs.launchpad.net/nova/+bug/1240355
17:42:18 <uvirtbot> Launchpad bug 1240355 in nova "Broken pipe error when copying image from glance to vSphere" [Undecided,New]
17:43:19 <hartsocks> That seems like someone with a screwy setup more than anything.
17:43:26 <hartsocks> Okay.
17:43:35 <smurugesan> I think this is related to the bug Tracy is working on. let me pull it up
17:43:47 <garyk> i have seen that on a number of occasions. have never been able to debug it
17:44:15 <hartsocks> hmm… so maybe not just a screwy setup (I've never seen this)
17:44:21 <smurugesan> Could this be because the vmdk descriptor file exists but not the flat file?
17:44:41 <garyk> i actually think that it happens when the image is copied to the vc - i do not think that devstack uses ssl between nova and glance
17:45:14 <garyk> i see it once every few days using a vanilla devstack installation with the debian instance
17:45:50 <hartsocks> Really?!?
17:46:03 <tjones> odd, i have never seen it
17:46:07 <garyk> i am not sure if a packet is discarded or corrupted. but it is a tcp session so it should be retransmitted
17:46:13 <hartsocks> That error looks to me like a transient networking failure.
17:46:42 <hartsocks> Yes. TCP should cover retransmit of the occasional packet loss.
17:47:14 <garyk> my thinking is that the current connection is terminated and a new session with the vc is started. the file download is not restarted... but then again i have not been able to reproduce to be able to say for sure
17:47:45 <hartsocks> Hmmm…
17:48:24 <hartsocks> When we transfer to the datastores...
17:48:37 <hartsocks> are we using the HTTP "rest" like interfaces?
17:48:47 <hartsocks> I don't recall… I suppose we would have to.
17:49:09 <hartsocks> I recall that there is a problem with session time-outs between the two forms of connections.
17:49:19 <hartsocks> The vanilla HTTP connection used for large file transfer...
17:49:27 <hartsocks> and the SOAP connection have different sessions.
17:49:36 <hartsocks> One can time out and the other can still be active.
17:49:51 <hartsocks> This would tend to happen on long running large file transfers.
17:50:03 <hartsocks> Is that what you've seen Gary?
17:50:46 <garyk> i have just seen the exception. have not delved any deeper than that
17:51:08 <garyk> i'll try and run tcpdump and see if it reproduces. this may crystallize your theory
17:51:33 <danflorea> It would be good to know if this is isolated to one testbed or if we see it in multiple. If it's the latter, it's less likely that this is just a network/setup issue.
17:52:04 <danflorea> PayPal in particular has big image files and is sensitive to failures like this, so it's definitely worth investigating.
17:52:20 <garyk> both ryand and i have seen this. only thing in common is that we use the same cloud
17:52:26 <tjones> garyk: you said it was with the debian image?
17:53:03 <garyk> tjones: yes
17:53:33 <hartsocks> Does the transfer ever take more than 30 minutes?
17:53:36 <tjones> that's only 1G
17:54:30 <garyk> nope, it is usually a few seconds, maybe a minute at most
17:54:46 <hartsocks> Okay. That doesn't support my theory at all.
17:55:23 <hartsocks> Hmm… who should look at this?
17:55:44 <garyk> i am bigged donw in the disk and datastores
17:55:44 <tjones> If ryan can show me what he does i can take a look
17:55:52 <tjones> yeah garyk has enough ;-)
17:55:57 <garyk> bogged not bigged
17:55:58 <hartsocks> totally.
17:56:05 <hartsocks> why not both?
17:56:06 <tjones> bugged
17:56:18 <garyk> nah, not bugged.
17:56:53 <tjones> at least i can put some debugging in there so we can catch it more easily if i cannot repro
17:57:07 <hartsocks> okay.
17:57:08 <hartsocks> #action tjones to follow up on https://bugs.launchpad.net/nova/+bug/1240355
17:57:10 <uvirtbot> Launchpad bug 1240355 in nova "Broken pipe error when copying image from glance to vSphere" [Undecided,New]
17:57:27 <hartsocks> We spent most of the meeting on bugs.
17:57:47 <hartsocks> #topic open discussion
17:58:03 <hartsocks> Anything else pressing we need to talk about?
17:58:19 <danflorea> Just one request. Please review upstream Nova driver & Cinder driver docs :)
17:58:25 <tjones> :-D
17:58:43 <danflorea> Nova driver patch: https://review.openstack.org/#/c/51756/
17:59:05 <danflorea> Cinder driver doc is already merged. But send me comments if you have any and I can update that one too.
17:59:12 <hartsocks> #action everyone give some review love to upstream nova driver and cinder docs!
17:59:35 <hartsocks> So we're out of time.
17:59:41 <tjones> adios
17:59:41 <hartsocks> Thanks for the turnout today.
17:59:50 <danflorea> bye
18:00:08 <hartsocks> We're on #openstack-vmware just hangin' out if anyone needs to chat.
18:00:13 <hartsocks> #endmeeting
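A footnote on the broken-pipe discussion (bug 1240355): tjones offered to add debugging around the glance-to-datastore copy so the failure is easier to catch. The sketch below shows the kind of instrumentation that could help. It assumes a generic chunked read/write loop with file-like handles rather than nova's actual transfer plumbing; the function name and chunk size are hypothetical.

    import logging
    import time

    LOG = logging.getLogger(__name__)

    CHUNK_SIZE = 64 * 1024  # hypothetical chunk size


    def copy_with_progress(read_handle, write_handle, total_size):
        """Copy an image stream while logging progress, so a broken pipe
        can be correlated with how far the transfer got and how long it took."""
        transferred = 0
        start = time.time()
        try:
            while True:
                chunk = read_handle.read(CHUNK_SIZE)
                if not chunk:
                    break
                write_handle.write(chunk)
                transferred += len(chunk)
        except IOError:
            # Log how much of the image made it across before the pipe broke.
            LOG.exception("Transfer failed after %d/%d bytes in %.1fs",
                          transferred, total_size, time.time() - start)
            raise
        LOG.debug("Transferred %d/%d bytes in %.1fs",
                  transferred, total_size, time.time() - start)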