17:01:17 #startmeeting vmwareapi
17:01:18 Meeting started Wed Jul 9 17:01:17 2014 UTC and is due to finish in 60 minutes. The chair is tjones. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:01:19 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:01:22 The meeting name has been set to 'vmwareapi'
17:01:38 hi folks - who's here today?
17:01:51 * mdbooth waves
17:02:07 mdbooth: thought you were coming late?
17:02:14 tjones: I am late :)
17:02:18 * mdbooth rushed
17:02:18 lol
17:02:20 hi
17:02:22 o/
17:02:22 thanks!
17:02:45 gary wants me to pass word along that he is not able to make the meeting.
17:03:15 Hi tjones, I'm here
17:03:18 ok lets get started. I want time during the open discussion to talk about the 1 cluster / nova compute issue
17:03:28 so lets wait until the end for that
17:03:36 1st lets talk about bugs
17:03:43 #topic vmware api bugs
17:04:14 I've put together this web page which allows more flexibility in looking at bugs
17:04:15 http://54.201.139.117/demo.html
17:04:32 ok i have one defect https://review.openstack.org/99623
17:04:33 if you type vmware in the search area you can see our bugs
17:04:53 ok KanagarajM we can talk about that in a little bit
17:05:36 I would like us to spend 10 minutes going through the bugs and calling out issues that we think would be affecting our customers from being able to successfully deploy openstack on vsphere
17:06:05 for those bugs we need to add them to the vmwareapi project so they can be tracked separately
17:06:25 we can't tag them unfortunately - so we have to add them to the project. see this bug as an example
17:06:54 https://bugs.launchpad.net/nova/+bug/1240373
17:06:56 Launchpad bug 1240373 in openstack-vmwareapi-team "VMware: Sparse glance vmdk's size property is mistaken for capacity" [High,Confirmed]
17:07:08 we have to add it to the vmwareapi-team project
17:07:10 make sense?
17:09:07 tjones: Not really sure what I'm looking for, tbh
17:09:18 these are things that we would like to make sure that a customer has fixes for before trying to deploy. for example https://bugs.launchpad.net/nova/+bug/1255317
17:09:19 Launchpad bug 1255317 in nova "VMware: can't boot from sparse image copied to volume" [High,In progress]
17:10:03 tjones: This one is extremely easy to hit: https://bugs.launchpad.net/nova/+bug/1333587
17:10:04 Launchpad bug 1333587 in nova "VMware: ExtendVirtualDisk_Task fails due to locked file" [High,Fix committed]
17:10:21 It came from RH qa
17:10:36 yes - exactly
17:11:05 this is the kind of bug we should have added to the vmwareapi-team list. I'll add it
17:11:05 tjones: We're just looking for vmware bugs?
17:11:25 yes
17:11:29 ok
17:11:55 so that bug is fixed right?
17:12:13 vuil: Yeah, upstream. Needs backporting, though
17:12:21 i don't want to spend more than 5 minutes on this - but lets call out as many as we can. we will do this every week. you can also do it when you see something like this that would affect a customer
17:13:39 this one also i think https://bugs.launchpad.net/nova/+bug/1304593
17:13:41 Launchpad bug 1304593 in nova "VMware: waste of disk datastore when root disk size of instance is 0" [High,In progress]
17:14:08 tjones: I *think* I just fixed that one
17:14:12 co-incidentally
17:14:17 In the refactor
17:14:19 awesome
17:15:06 the purpose of this list is calling out only issues that will cause a customer to fail, not to overwhelm them with issues
17:15:22 ok lets move on to BP now
17:15:31 tjones: i would like to ask if some approver could look into the defect https://bugs.launchpad.net/nova/+bug/1329261 .
17:15:33 Launchpad bug 1329261 in nova "vmware: VCdriver creates same hypervisor_hostname for different vcenters with same Cluster name" [High,In progress]
17:15:51 oh sorry KanagarajM - you said you wanted to discuss that
17:16:01 hi, sorry for being late
17:16:15 tjones: https://review.openstack.org/#/c/104145/5/nova/tests/virt/vmwareapi/test_vmops.py line 98
17:16:41 we can only +1 it. we cannot +2 it
17:16:57 oh ok
17:17:02 That test change is required because it previously created an unnecessary copy, and now it doesn't
17:17:16 So we had to change the target size to force the copy creation
17:17:30 Which is what makes me suspect the root gb size 0 bug *might* be fixed
17:17:57 KanagarajM: looks like you have been getting reviews - so once it has a couple +1 from this team and a +1 from minesweeper you should ask the core guys to take a look (on irc)
17:18:12 mdbooth: cool - we should retest it then
17:18:27 ok on to BP?
17:18:30 tjones: ok sure. thanks
17:18:55 #topic approved BP (aka refactor)
17:19:04 vuil: mdbooth: how's it going?
17:19:16 tjones: Pretty good
17:19:29 thanks to mdbooth for posting the patches, and updating some existing ones
17:19:45 http://paste.fedoraproject.org/116749/26339140
17:19:51 I am working on laying phase 3 patches on top of it.
17:20:12 nice!!!
17:20:12 vuil: The end of the line is currently https://review.openstack.org/104148
17:20:27 pretty :-). How about I put it in https://etherpad.openstack.org/p/vmware-subteam-juno?
17:20:29 i did a zillion reviews yesterday
17:20:29 the oslo integration is also ready for review
17:20:42 thanks rado as well.
17:20:44 vuil: Go ahead
17:20:47 tjones: I saw, thanks
17:20:54 garyk said that hot plug is moving well too
17:21:21 mdbooth: vuil - so what's your feeling on when the refactor is completely merged?
17:21:42 tjones: Well, we've been fairly successful in getting trivial fixes merged
17:21:43 and rgerganov_ can you post the link to the oslo integration for this team?
17:22:06 This has been incredibly useful, as it has meant we have dealt with a bunch of known potential merge conflicts already
17:22:27 tjones: here it is: https://review.openstack.org/#/c/70175/
17:22:35 dansmith: could you please provide your inputs on the 1 cluster 1 compute process.
17:22:36 it has +1 from Jenkins and Minesweeper
17:22:44 However, I feel that what's left has some patches in there which would require more than a glance
17:22:53 KanagarajM: please wait until open discussion for that
17:22:57 Realistically, I think we should be looking for a couple of sponsors
17:23:44 dansmith: and mriedem have been pretty helpful - should we ask them to sponsor?
17:24:06 tjones: I'll be happy with whoever wants the job :)
17:24:12 :-D
17:24:19 Mark McL has been too
17:24:26 true
17:24:27 Again, I'm happy to stay on top of review comments aggressively
17:24:36 thanks - this is moving nicely
17:24:52 rgerganov_: looks like you could use a few more reviews on http://paste.fedoraproject.org/116749/26339140
17:25:10 I am working through those as well
17:25:13 it's a big one - should we try to get a sponsor for this too?
17:25:22 rgerganov_: Sorry, employer interrupt meant I didn't get to that :(
17:25:30 tjones: I would like to get approval for the BP https://review.openstack.org/104211, NFS glance datastore. could you please bring it here thanks.
17:25:48 KanagarajM: please wait until we get to unapproved BP for that. it is next
17:26:15 tjones: sure.
17:26:22 any other approved BP we should cover?
17:26:48 vuil: Does refactor phase 3 cover factoring out image cache -> disk creation?
17:26:54 from spawn
17:27:08 * mdbooth vaguely recalls it does, but hasn't gotten around to looking, yet
17:27:12 image fetch/convert/cache/use
17:27:24 vuil: Awesome. I have a use for that.
17:27:54 cool
17:27:57 ok lets move on
17:27:58 vuil: Drop me a mail if you want any help with those patches, btw
17:28:01 #topic unapproved BP
17:28:05 #link https://review.openstack.org/#/q/status:open+project:openstack/nova-specs+message:vmware,n,z
17:28:14 vuil: Otherwise I'm going to concentrate on the existing queue
17:28:15 kirankv: want to go 1st?
17:28:37 mdbooth: will do (in the middle of decomposing that into small ones)
17:28:48 https://review.openstack.org/98704
17:29:15 vuil: Excellent. I think that's working well.
17:29:29 need review on this one https://review.openstack.org/98704
17:29:43 kirankv: should you chat with johnthetubaguy on irc to cover his concern (and remove the -1)?
17:30:08 i chatted with him today, he mentioned he will look into it tomorrow
17:30:14 great
17:30:31 I was actually the one who pointed out the issue to him
17:30:39 ok looks like it could use more reviews from this team too. guys please take a look
17:30:41 So I'm moderately familiar with it
17:31:07 anything else kirankv?
17:31:31 the other one I have to address the sync issues
17:31:50 tjones: thanks
17:31:53 Ah, I may be thinking of the other one
17:31:55 ok lets wait until we finish BP and then do that in open discussion
17:32:00 KanagarajM: your turn
17:32:05 yes
17:32:17 would like to get approval for https://review.openstack.org/104211 NFS glance
17:32:25 datastore
17:32:43 we cannot approve - we can only review
17:32:55 guys can you please review this BP this week?
17:33:13 KanagarajM: you want to say a few words about it?
17:33:20 should i wait till i get approval for the BP, or can i go ahead and submit the patch?
17:33:48 sure: it actually uses the datastore mounted on the NFS mount point created on the image store of a glance
17:34:21 during the instance boot, compute copies the image from glance to vmware_temp by using the NFS datastore
17:34:26 you can push patches but they will get -2 until the BP is approved.
17:34:44 it improved the image caching exponentially
17:35:11 KanagarajM: do you have any measurements?
17:35:16 sure, i will wait for the approval, but not sure how to get the attention of the core approvers, as this blueprint is completely for the vmware vc driver
17:35:43 yes, for 800 MB image, the boot time reduced from 16 mins to 6 minutes
17:35:43 rgerganov_: I can well believe it. I'm convinced the performance of the copy code must be terrible from looking at it
17:35:47 KanagarajM: if you have any data, then I'd suggest to put it in the spec
17:35:52 i'd start with getting arnaud__ to take a look as he is a core reviewer in glance
17:36:11 sure, thats a great idea. thanks rgerganov.
17:36:13 wow! that is an improvement!!
17:36:14 KanagarajM: it depends on the vCenter version you are using
17:36:32 KanagarajM: Is that a common deployment, btw?
17:36:38 2Gb generally takes 5m
17:36:51 i.e. glance image store externally accessible by nfs?
17:36:57 kirankv: that is right, its a windows vc
17:37:11 mdbooth: couldn't get your question
17:37:13 * mdbooth doesn't know how people use this stuff in the real world
17:37:42 yes, the right vCenter update versions have better performance
17:37:43 KanagarajM: I'm assuming that your BP only covers the case where the glance image store *can* be mounted via NFS from an ESX host
17:37:50 Is that a common deployment architecture?
17:37:57 i.e. is anybody likely to be able to use it?
17:39:21 NFS datastore is a common deployment scenario.
17:39:50 some deployments only use high end NFS, in fact
17:40:01 I meant for glance
17:40:25 * mdbooth used to work at a VMware shop which used NFS for everything
17:41:50 so we really need to move onto open discussion. lets table this and continue either on the ML or in -vmware after the meeting
17:41:57 #topic open discussion
17:42:49 there is some confusion over what we are doing with 1 vc / n-cpu.
17:43:13 yes, would prefer that the current approach be retained
17:43:17 as i understood this was a mandate from the core guys made in atlanta
17:43:43 you can achieve the 1 cluster to 1 service mapping with the existing implementation
17:43:50 I have posted the patch for dropping the multi-cluster support because this was on our roadmap for Juno
17:43:56 kirankv: We lost that argument. No point in continuing it imho.
17:44:33 tjones: only when we have a viable alternative can we drop it
17:44:57 *looking for the notes from last week*
17:45:51 kirankv: what would we be missing with multiple n-cpu processes managing a single cluster each?
17:46:03 as a viable alternative I mean
17:46:15 vuil: Apart from the additional resources they consume ;)
17:46:22 vuil: the memory footprint is more
17:46:42 kirankv: I agree with you, but we lost.
17:46:46 number of vcenter connections
17:47:09 the real problem is with the image cache which will be replicated for each nova host
17:47:31 I believe this can be solved with nova-conductor but requires more thinking and research
17:47:51 frankly i've lost sight of the problem we were solving by even doing this. can someone recap why the core guys insisted on this?
17:48:05 rgerganov_: The image cache is already replicated per datastore.
17:48:17 I don't think that will change with multiple n-cpu
17:48:26 tjones: nova-compute is supposed to scale horizontally, not vertically. that was the main point I think
17:48:29 i have lost the connection in IRC, and am back
17:48:45 rgerganov: my BP intends to solve this :)
17:48:54 rgerganov_: oh yeah right
17:49:32 * mdbooth sees it as a deployment decision, personally
17:49:37 mdbooth: you won't replicate the cache for each cluster right now
17:49:40 Admins are grown-ups too
17:50:03 and the feeling that the nodes concept was being abused, and they want it done
17:50:04 mdbooth: yes its a deployment decision and we should leave the choice as it exists today
17:50:19 vuil: correct
17:50:20 s/done/gone
17:50:55 rgerganov_: You'll replicate it per datastore
17:51:06 So unless the clusters share datastores, it'll make no difference
17:51:09 tjones: is the discussion started on 1 cluster 1 nova-compute? i have lost the connection
17:51:19 yes we are discussing
17:51:48 mdbooth: it is a common case to share a datastore among clusters in the same vCenter I believe
17:52:29 rgerganov_: Ok, in that case it would make a difference
17:52:57 yes, in deployments ive seen its always shared datastores within the cluster
17:53:01 ok, multibackend solves many problems as listed in the BP and in addition, i have added comments saying that its going to save capital expense and running expense for the cloud provider
17:53:11 Anyway, regardless of the merits of it, we just need to get this done
17:53:17 mdbooth: are you referring to shared datastores across clusters?
17:53:23 kirankv: yes
17:53:59 * mdbooth would like to float an idea briefly before we finish, btw
17:54:09 mdbooth: ah, so if we were to cache images on such a shared datastore it would help
17:54:46 tjones: can we tell nova core that we have issues to resolve before we can move to the 1 cluster = 1 compute service model
17:55:31 kirankv: i think it would be best to send something out on the ML discussing the issues we are facing for full transparency
17:55:45 and until then we will leave this model, the admin still has the choice of using 1 cluster = 1 service in the existing design
17:55:47 tjones: +1
17:55:48 +1
17:57:06 who is going to write it - i feel like it's best a joint email from (for example) rgerganov_, mdbooth, and kirankv. thoughts?
17:57:10 ok there are different ways to solve the 1 cluster 1 service issue, but they have drawbacks as listed in the BP, like keeping the same configuration across multiple nova-xx.conf
17:57:59 i can start the thread
17:58:01 * mdbooth isn't sufficiently familiar with the detail to write it
17:58:04 kirankv: +1
17:58:18 So, RH qa hit an issue when suspending a rescue image
17:58:23 great - 2 more minutes
17:58:26 go mdbooth
17:58:28 It explodes into multiple pieces
17:58:46 I was looking over the code earlier
17:58:46 cinder took this multi-backend approach a while after the cinder project started, and the main reason was that from one cinder-volume service they want to spawn as many cinder-volume processes as needed for a given driver
17:58:48 mdbooth: you have a bug link?
17:59:05 tjones: It's an outcrop of https://bugs.launchpad.net/nova/+bug/1269418
17:59:09 Launchpad bug 1269418 in openstack-vmwareapi-team "[OSSA 2014-017] nova rescue doesn't put VM into RESCUE status on vmware (CVE-2014-2573)" [High,In progress]
17:59:09 But it's really a new bug
17:59:21 But I only just verified, so it's not in lp yet
17:59:25 Anyway
17:59:29 ok
17:59:36 garyk points out that there's a band-aid fix for it
17:59:50 However, I wondered about changing the way we do rescue images
17:59:55 Instead of creating a new vm
18:00:03 We just add the rescue image to the existing vm and boot from it
18:00:23 oops - outta time. lets move over to -vmware
18:00:26 That way, we don't have to do this dance to check if something's a rescue image all over the place
18:00:32 * mdbooth moves
18:00:51 #endmeeting
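
[Editor's note] For readers following the NFS glance datastore discussion above (blueprint https://review.openstack.org/104211), a minimal sketch of the deployment KanagarajM describes: the glance filesystem store sits on an NFS export that the ESX hosts also mount as a datastore, so the driver can copy images datastore-side instead of streaming them over HTTP through nova-compute. This is only the editor's reading of the description in the log, not the blueprint's actual implementation; the server name, export path and datastore name are hypothetical, and the glance config section may differ between releases.

    # glance-api.conf -- filesystem store backed by an NFS export (illustrative)
    [DEFAULT]
    default_store = filesystem
    filesystem_store_datadir = /var/lib/glance/images
    # /var/lib/glance/images is an NFS mount of nfs-server:/export/glance

    # The same export is mounted on each ESX host as a datastore, e.g.:
    #   esxcli storage nfs add --host nfs-server --share /export/glance --volume-name glance-images
    # so image copies into the vmware_temp cache become datastore-to-datastore
    # copies rather than downloads through the compute node.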
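[Editor's note] Similarly, a minimal sketch of the 1 cluster = 1 nova-compute model debated in open discussion, assuming the Juno-era [vmware] driver options (host_ip, host_username, host_password, cluster_name, datastore_regex); option names can differ between releases and every value below is hypothetical.

    # nova-compute-cluster1.conf -- one nova-compute service per vSphere cluster
    [DEFAULT]
    host = vc1-cluster1                  # unique service name when several
                                         # nova-compute processes share a machine
    compute_driver = vmwareapi.VMwareVCDriver

    [vmware]
    host_ip = vcenter1.example.com       # hypothetical vCenter address
    host_username = administrator@vsphere.local
    host_password = secret
    cluster_name = Cluster1              # exactly one cluster per service
    datastore_regex = nfs-ds.*           # optional: restrict usable datastores

A second file (say, nova-compute-cluster2.conf pointing at Cluster2) would be passed to another nova-compute process via --config-file, which is the duplication of per-cluster config files kirankv raises as a drawback in the log above.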