17:00:12 #startmeeting VMwareAPI
17:00:13 Meeting started Wed Oct 23 17:00:12 2013 UTC and is due to finish in 60 minutes. The chair is hartsocks. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:14 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:00:17 The meeting name has been set to 'vmwareapi'
17:00:25 hi all
17:00:30 I'm back.
17:00:35 Did you miss me?
17:00:42 Raise a hand if you did :-)
17:00:43 I DID!!!
17:00:49 *lol*
17:01:08 :-D
17:01:48 I'm still reading my old emails but I'm reading from newest to oldest, so if there's something you need me to help on, please re-send… it'll get taken care of sooner (and perhaps twice even).
17:02:14 One time I actually read all my emails and I didn't know what to do.
17:02:25 Then the problem went away.
17:03:12 Vui said he'd be online later so we might see him.
17:03:22 Anybody else around?
17:03:55 sarvind: I'm here. First time for the IRC meeting.
17:04:08 welcome sarvind
17:04:28 i may start calling you that face to face ;-)
17:05:01 *lol*
17:06:19 Well, let's get rolling then...
17:06:22 #topic bugs
17:06:59 #link http://goo.gl/uD7VDR
17:07:44 here's a query on launchpad that combs for open bugs… new, in progress, etc.
17:08:25 We have a few that have popped up since last week.
17:08:35 Looks like 5 or so.
17:09:08 This one troubles me...
17:09:10 This is sabari here. Hi all.
17:09:11 #link https://bugs.launchpad.net/nova/+bug/1241350
17:09:13 Launchpad bug 1241350 in nova "VMware: Detaching a volume from an instance also deletes the volume's backing vmdk" [Undecided,In progress]
17:09:17 hey sabari!
17:10:07 hi folks, garyk says he is having technical issues with his irc client
17:10:14 okay.
17:10:16 looks like that is out for review
17:10:25 (he let me know via skype, apparently a more reliable channel)
17:10:41 i had trouble with the irc client as well, switched to the webchat for now
17:11:04 okay.
17:11:09 hi
17:11:33 My plan for today is just to hit bugs, then blueprints, then HK summit stuff.
17:11:42 Hey gary.
17:11:57 We were just talking about #link https://bugs.launchpad.net/nova/+bug/1241350
17:11:58 Launchpad bug 1241350 in nova "VMware: Detaching a volume from an instance also deletes the volume's backing vmdk" [Undecided,In progress]
17:12:26 that is a critical issue - hopefully we can get it backported asap
17:12:26 jay has it out for review
17:12:49 there are some minor issues with test cases. but nothing blocking
17:13:08 okay good.
17:13:27 #link https://bugs.launchpad.net/nova/+bug/1243222
17:13:28 Launchpad bug 1243222 in nova "VMware: Detach leaves volume in inconsistent state" [Undecided,New]
17:14:02 i am currently working on that
17:14:11 Okay.
17:14:15 it is when a snapshot takes place on the instance
17:14:53 you've confirmed it then?
17:15:22 confirmed the bug?
17:15:45 yes. So it can be marked "confirmed" not "new" or something else?
17:16:09 i'll mark it as confirmed
17:16:37 groovy.
17:16:47 garyk: ok, so that only happens when a snapshot is done?
17:17:05 the title does not indicate that anywhere, making it sound much larger :)
17:17:16 heh. good point.
17:17:33 the issue is as follows: when we do a snapshot we read the disk from the hardware summary. problem is that due to the fact that there are 2 disks we do not read the right disk
17:17:47 this causes us to snapshot the cinder disk instead of the nova disk.
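To make the failure mode garyk describes concrete: with a cinder volume attached, the VM's hardware summary lists two VirtualDisk devices, and snapshotting whichever disk comes first can hit the volume instead of the instance's root disk. A minimal sketch of one way to pick the right disk, not the actual nova fix; the `devices` list, the `instance_uuid` naming convention for the instance's datastore folder, and the helper name are all assumptions for illustration:

    def get_root_disk_path(devices, instance_uuid):
        """Pick the instance's own VMDK out of a VM's device list.

        Sketch only: `devices` is assumed to be the virtual device list
        from the VM's config.hardware. Taking the first VirtualDisk is
        the bug -- with a cinder volume attached there are two disks.
        """
        disks = [d for d in devices
                 if d.__class__.__name__ == 'VirtualDisk']
        for disk in disks:
            file_name = disk.backing.fileName  # e.g. "[ds1] <dir>/<name>.vmdk"
            # Assumption: the nova-created disk lives in the instance's
            # own datastore folder, while a cinder volume's backing file
            # lives elsewhere on the datastore.
            if instance_uuid in file_name:
                return file_name
        raise LookupError('root disk not found for %s' % instance_uuid)

Filtering on the backing file's datastore path rather than device order is the point of the sketch; the real fix would need to match whatever folder layout the driver actually uses.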
17:17:50 we seem to have a problem with that in general, giving bugs very general titles that make things seem very broken
17:18:12 danwent: correct. we need to work on our descriptions
17:18:13 I've edited the title to describe step 4 in the repro steps.
17:18:32 ok, i'll send a note to the list just to remind people about this
17:18:33 danwent: we want to keep you on your toes
17:18:49 *lol* what's wrong with: "broken-ness all things happen bad" ?
17:19:00 tjones: or the hospital? i almost had a heart attack when i saw that
17:19:36 I see things like that and I usually say, "yeah right"
17:19:37 :-P
17:20:08 garyk: did this break in tempest or do we need to add more tests?
17:20:15 or wait, that's a different bug than I thought we were talking about
17:20:17 one sec
17:20:40 tjones: i do not think that this is covered in tempest - if it is, it does not really check the validity of the disks
17:21:00 This is a good point...
17:21:04 danwent: there are 2 bugs which are very closely related - they may even be the same
17:21:09 https://bugs.launchpad.net/nova/+bug/1241350
17:21:11 Launchpad bug 1241350 in nova "VMware: Detaching a volume from an instance also deletes the volume's backing vmdk" [High,In progress]
17:21:26 is the one i saw
17:21:27 yeah.
17:21:37 That was top of my list too.
17:22:03 that bug already has a patch upstream. that is a blocker
17:22:16 we discussed that a few minutes ago
17:22:24 i'm confused, does this mean any use of volumes is broken?
17:23:00 danwent: something changed, as this is a scenario that we have done a million times
17:23:19 i am not sure if it is in cinder. nothing on our side was changed here (we also have this in a lab)
17:23:25 yeah, that is my sense too.
17:23:36 slow down guys.
17:23:52 This looks like more bad naming confusion here.
17:24:04 If I'm reading this line right...
17:24:09 https://github.com/openstack/nova/blob/master/nova/virt/vmwareapi/vm_util.py#L471
17:24:13 i think that subbu and kartik were looking deeper at the status of the disk and that may have highlighted the problem
17:24:31 assuming the bug reporter was linking correctly… this means the
17:24:38 hartsocks: that is what the fix addresses
17:24:41 there are two cases
17:24:49 1. a consolidated disk needs to be deleted
17:24:53 nova volume-detach
17:24:56 calls the delete volume code
17:24:57 2. a detachment does not need to be deleted
17:25:03 one at a time please :)
17:25:06 def delete_virtual_disk_spec(
17:25:14 which is not the right thing to do.
17:25:23 Since deleting is not detaching.
17:25:34 So.
17:25:37 I'm saying:
17:25:41 delete is not detach.
17:25:43 please look at https://review.openstack.org/#/c/52645/
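The detach-versus-delete distinction being drawn here maps directly onto the vSphere reconfigure API: a VirtualDeviceConfigSpec with operation 'remove' merely detaches the disk, and it is the optional fileOperation 'destroy' that additionally deletes the backing VMDK. A minimal sketch in the suds client_factory style the driver uses, not the code under review; the function name and destroy_disk flag are illustrative:

    def detach_disk_spec(client_factory, device, destroy_disk=False):
        """Build a reconfigure spec that removes a VirtualDisk from a VM.

        With destroy_disk=False the disk is detached and its backing
        VMDK is left on the datastore; setting fileOperation to
        'destroy' is what deletes the file -- which is exactly what a
        volume detach must NOT do.
        """
        device_spec = client_factory.create('ns0:VirtualDeviceConfigSpec')
        device_spec.operation = 'remove'
        if destroy_disk:
            device_spec.fileOperation = 'destroy'
        device_spec.device = device

        config_spec = client_factory.create('ns0:VirtualMachineConfigSpec')
        config_spec.deviceChange = [device_spec]
        return config_spec

In other words, the detach path only needs to omit fileOperation; routing volume detach through a delete-disk spec is what destroys the cinder volume's backing file.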
17:26:17 great.
17:26:23 I was scared there for a second.
17:26:40 ok, so can we go up a level and talk about impact on customer?
17:26:48 So the impact.
17:26:53 can only be on
17:27:04 the nova created vmdk right?
17:27:13 this can't be bleeding into cinder somehow?
17:27:24 hartsocks: danwent: no, the problem is the cinder volume
17:27:47 the 'detachment' 'deletes' the cinder volume.
17:28:05 due to the fact that it is attached to an empty vm it will not be deleted, but may turn into read only
17:28:31 so the case is: we have instance X
17:28:35 that uses volume Y
17:28:40 and we write to Y
17:28:43 then detach
17:29:07 and attach to instance Z, then we can read what was written by X but may not be able to write again
17:29:27 sorry for the piecemeal comments - web client is hard and my irc client is broken
17:29:52 well, the bug says that the re-attach actually fails
17:30:02 not that it succeeds, but the volume is read-only
17:30:27 in my book, that means we haven't really confirmed this bug.
17:30:51 hartsocks: subbu and kartik have confirmed this and i have tested the patch
17:31:23 garyk: confirmed what behavior? what is written in the bug (second attach fails) or what you mentioned (second attach works, but read-only)
17:31:50 I have no doubt you've found *a* bug and fixed it.
17:32:31 danwent: i think that they have confirmed what is written in the bug. i am not 100% sure, but I discussed this with them
17:33:31 ok… well, i guess one thing that is different in the bug from what I personally have tested is that I've never tried to immediately re-attach a volume to the same VM; we always booted another VM and attached the volume to that vm
17:33:32 danwent: my understanding, and i may be wrong, or confused, most likely the latter, is that the disk could become read only when we do something like a delete or a snapshot and it is owned by someone else
17:34:46 danwent: that is the scenario that i always tested
17:35:37 ok, i don't totally follow on the read-only part, I'm just trying to understand how pervasive the bug is, as the write-up makes it sound like any volume that is detached is deleted and can never be attached to another VM, which means the whole point of volumes is in question.
17:36:08 but that seems to contradict what we've tested.
17:36:14 danwent: i'll follow up with subbu and kartik and get all of the details so that we can paint a better picture
17:36:30 ok, thanks, yeah, don't need to take up the whole meeting, but this does seem pretty important
17:36:39 hartsocks: you can action item that for me
17:36:44 I agree. We need to know if we should say "don't snapshot when you have Cinder volumes attached" or "don't use our Cinder driver"
17:36:52 yeah i concur it is very important
17:37:31 #action garyk follow up on https://bugs.launchpad.net/nova/+bug/1241350 and narrow scope/descriptions
17:37:33 Launchpad bug 1241350 in nova "VMware: Detaching a volume from an instance also deletes the volume's backing vmdk" [High,In progress]
17:37:52 Which brings me to...
17:37:55 #link https://bugs.launchpad.net/nova/+bug/1243193
17:37:56 Launchpad bug 1243193 in nova "VMware: snapshot backs up wrong disk when instance is attached to volume" [Undecided,New]
17:38:13 which seems related.
17:38:18 hartsocks: i am currently debugging this
17:38:24 (if only by subject matter)
17:38:31 this is related to https://bugs.launchpad.net/nova/+bug/1243222
17:38:34 Launchpad bug 1243222 in nova "VMware: Detach after snapshot leaves volume in inconsistent state" [Undecided,Confirmed]
17:39:01 yeah, glad you're on it.
17:39:11 i need a stiff drink
17:39:46 putting your name on the bug so I don't accidentally try to pick it up.
17:40:15 Whoever is in HK should buy Gary a round.
17:40:19 garyk: at least it's late enough for you to do just hat
17:40:29 that
17:40:37 :)
17:40:44 a hat full of vodka.
17:40:47 :-)
17:41:15 :-D
17:41:24 any other pressing things?
17:41:32 (on the topic of bugs that is)
17:42:16 anyone look at #link https://bugs.launchpad.net/nova/+bug/1240355
17:42:18 Launchpad bug 1240355 in nova "Broken pipe error when copying image from glance to vSphere" [Undecided,New]
17:43:19 That seems like someone with a screwy setup more than anything.
17:43:26 Okay.
17:43:35 I think this is related to the bug Tracy is working on. let me pull it up
17:43:47 i have seen that on a number of occasions. have never been able to debug it
17:44:15 hmm… so maybe not just a screwy setup (I've never seen this)
17:44:21 Could this be because the vmdk descriptor file exists but not the flat file?
17:44:41 i actually think that it happens when the image is copied to the vc - i do not think that devstack uses ssl between nova and glance
17:45:14 i see it once every few days using a vanilla devstack installation with the debian instance
17:45:50 Really?!?
17:46:03 odd, i have never seen it
17:46:07 i am not sure if a packet is discarded or corrupted. but it is a tcp session so it should be retransmitted
17:46:13 That error looks to me like a transient networking failure.
17:46:42 Yes. TCP should cover retransmit of the occasional packet loss.
17:47:14 my thinking is that the current connection is terminated and a new session with the vc is started. the file download is not restarted... but then again i have not been able to reproduce to be able to say for sure
17:47:45 Hmmm…
17:48:24 When we transfer to the datastores...
17:48:37 are we using the HTTP "rest" like interfaces?
17:48:47 I don't recall… I suppose we would have to.
17:49:09 I recall that there is a problem with session time-outs between the two forms of connections.
17:49:19 The vanilla HTTP connection used for large file transfer...
17:49:27 and the SOAP connection have different sessions.
17:49:36 One can time out and the other can still be active.
17:49:51 This would tend to happen on long running large file transfers.
17:50:03 Is that what you've seen Gary?
17:50:46 i have just seen the exception. have not delved any deeper than that
17:51:08 i'll try and run tcpdump and see if it reproduces. this may crystallize your theory
17:51:33 It would be good to know if this is isolated to one testbed or if we see it in multiple. If it's the latter, it's less likely that this is just a network/setup issue.
17:52:04 PayPal in particular has big image files and is sensitive to failures like this so definitely worth investigating.
17:52:20 both ryand and i have seen this. only thing in common is that we use the same cloud
17:52:26 garyk: you said it was with the debian image?
17:53:03 tjones: yes
17:53:33 Does the transfer ever take more than 30 minutes?
17:53:36 that's only 1G
17:54:30 nope, it is usually a few seconds, maybe a minute at most
17:54:46 Okay. That doesn't support my theory at all.
17:55:23 Hmm… who should look at this?
17:55:44 i am bigged down in the disks and datastores
17:55:44 If ryan can show me what he does i can take a look
17:55:52 yeah garyk has enough ;-)
17:55:57 bogged not bigged
17:55:58 totally.
17:56:05 why not both?
17:56:06 bugged
17:56:18 nah, not bugged.
17:56:53 at least i can put some debugging in there so we can catch it more easily if i cannot repro
17:57:07 okay.
17:57:08 #action tjones to follow up on https://bugs.launchpad.net/nova/+bug/1240355
17:57:10 Launchpad bug 1240355 in nova "Broken pipe error when copying image from glance to vSphere" [Undecided,New]
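Since hartsocks' session-timeout theory does not fit transfers that finish in seconds, garyk's hypothesis stands: the connection gets torn down and the download is never restarted. One low-risk mitigation in that case is to retry the whole transfer on a broken pipe, alongside the extra debugging he mentions. A sketch only, not the driver's actual code; `start_transfer` is a hypothetical zero-argument callable standing in for whatever performs one complete glance-to-datastore copy:

    import errno
    import logging
    import socket
    import time

    LOG = logging.getLogger(__name__)

    def transfer_with_retry(start_transfer, max_attempts=3, delay=5):
        """Retry a glance->datastore image copy on a broken pipe.

        Assumes the failure is transient and that retrying means
        restarting the stream from glance, not resuming mid-file.
        """
        for attempt in range(1, max_attempts + 1):
            try:
                return start_transfer()
            except socket.error as exc:
                if exc.errno != errno.EPIPE or attempt == max_attempts:
                    raise
                LOG.warning('Broken pipe on attempt %d/%d, retrying in %ds',
                            attempt, max_attempts, delay)
                time.sleep(delay)

The logging here doubles as the "put some debugging in there" step: even if the retry masks the failure, the warnings would show how often it actually occurs.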
17:57:27 We spent most of the meeting on bugs.
17:57:47 #topic open discussion
17:58:03 Anything else pressing we need to talk about?
17:58:19 Just one request. Please review upstream Nova driver & Cinder driver docs :)
17:58:25 :-D
17:58:43 Nova driver patch: https://review.openstack.org/#/c/51756/
17:59:05 Cinder driver doc is already merged. But send me comments if you have any and I can update that one too.
17:59:12 #action everyone give some review love to upstream nova driver and cinder docs!
17:59:35 So we're out of time.
17:59:41 adios
17:59:41 Thanks for the turnout today.
17:59:50 bye
18:00:08 We're on #openstack-vmware just hangin' out if anyone needs to chat.
18:00:13 #endmeeting