17:01:17 <tjones> #startmeeting vmwareapi
17:01:18 <openstack> Meeting started Wed Jul 9 17:01:17 2014 UTC and is due to finish in 60 minutes. The chair is tjones. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:01:19 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:01:22 <openstack> The meeting name has been set to 'vmwareapi'
17:01:38 <tjones> hi folks - who's here today?
17:01:51 * mdbooth waves
17:02:07 <tjones> mdbooth: thought you were coming late?
17:02:14 <mdbooth> tjones: I am late :)
17:02:18 * mdbooth rushed
17:02:18 <tjones> lol
17:02:20 <kirankv> hi
17:02:22 <vuil> o/
17:02:22 <tjones> thanks!
17:02:45 <vuil> gary wants me to pass word along that he is not able to make the meeting.
17:03:15 <KanagarajM> Hi tjones, I'm here
17:03:18 <tjones> ok lets get started. I want time during the open discussion to talk about the 1 cluster / nova compute issue
17:03:28 <tjones> so lets wait until the end for that
17:03:36 <tjones> 1st lets talk about bugs
17:03:43 <tjones> #topic vmware api bugs
17:04:14 <tjones> I've put together this web page which allows more flexibility in looking at bugs
17:04:15 <tjones> http://54.201.139.117/demo.html
17:04:32 <KanagarajM> ok i have one defect https://review.openstack.org/99623
17:04:33 <tjones> if you type vmware in the search area you can see our bugs
17:04:53 <tjones> ok KanagarajM we can talk about that in a little bit
17:05:36 <tjones> I would like us to spend 10 minutes going through the bugs and calling out issues that we think would prevent our customers from successfully deploying openstack on vsphere
17:06:05 <tjones> for those bugs we need to add them to the vmwareapi project so they can be tracked separately
17:06:25 <tjones> we can't tag them unfortunately - so we have to add them to the project. see this bug as an example
17:06:54 <tjones> https://bugs.launchpad.net/nova/+bug/1240373
17:06:56 <uvirtbot> Launchpad bug 1240373 in openstack-vmwareapi-team "VMware: Sparse glance vmdk's size property is mistaken for capacity" [High,Confirmed]
17:07:08 <tjones> we have to add it to the vmwareapi-team project
17:07:10 <tjones> make sense?
17:09:07 <mdbooth> tjones: Not really sure what I'm looking for, tbh
17:09:18 <tjones> these are things that we would like to make sure that a customer has fixes for before trying to deploy. for example https://bugs.launchpad.net/nova/+bug/1255317
17:09:19 <uvirtbot> Launchpad bug 1255317 in nova "VMware: can't boot from sparse image copied to volume" [High,In progress]
17:10:03 <mdbooth> tjones: This one is extremely easy to hit: https://bugs.launchpad.net/nova/+bug/1333587
17:10:04 <uvirtbot> Launchpad bug 1333587 in nova "VMware: ExtendVirtualDisk_Task fails due to locked file" [High,Fix committed]
17:10:21 <mdbooth> It came from RH qa
17:10:36 <tjones> yes - exactly
17:11:05 <tjones> this is the kind of bug we should have added to the vmwareapi-team list. I'll add it
17:11:05 <mdbooth> tjones: We just looking for vmware bugs?
17:11:25 <tjones> yes
17:11:29 <mdbooth> ok
17:11:55 <vuil> so that bug is fixed right?
17:12:13 <mdbooth> vuil: Yeah, upstream. Needs backporting, though
17:12:21 <tjones> i don't want to spend more than 5 minutes on this - but lets call out as many as we can. we will do this every week. you can also do it when you see something like this that would affect a customer
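The sparse-image bugs above (1240373, 1255317) hinge on one distinction: for a sparse VMDK, the file size that glance reports as `size` is smaller than the disk's virtual capacity, which lives in the sparse extent header. Below is a minimal sketch of reading both, assuming the monolithic-sparse header layout from the VMDK format spec; the helper name is ours, not nova's.

```python
import os
import struct

SECTOR_SIZE = 512

def sparse_vmdk_capacity(path):
    # Monolithic-sparse VMDKs begin with a sparse extent header:
    # magic 'KDMV' (4 bytes), version (4), flags (4), then the virtual
    # capacity in 512-byte sectors as a little-endian uint64.
    with open(path, 'rb') as f:
        magic, _version, _flags, capacity = struct.unpack('<IIIQ', f.read(20))
    if magic != 0x564d444b:  # the bytes 'KDMV' read as a little-endian uint32
        raise ValueError('%s is not a sparse VMDK' % path)
    return capacity * SECTOR_SIZE

# The bug: using the first number where the driver needs the second.
print(os.path.getsize('disk.vmdk'))        # what glance's 'size' reports
print(sparse_vmdk_capacity('disk.vmdk'))   # how big the disk really is
```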
17:13:39 <tjones> this one also i think https://bugs.launchpad.net/nova/+bug/1304593
17:13:41 <uvirtbot> Launchpad bug 1304593 in nova "VMware: waste of disk datastore when root disk size of instance is 0" [High,In progress]
17:14:08 <mdbooth> tjones: I *think* I just fixed that one
17:14:12 <mdbooth> coincidentally
17:14:17 <mdbooth> In the refactor
17:14:19 <tjones> awesome
17:15:06 <tjones> the purpose of this list is calling out only issues that will cause a customer to fail. not to overwhelm them with issues
17:15:22 <tjones> ok lets move on to BP now
17:15:31 <KanagarajM> tjones: i would like to ask if some approver could look into the defect https://bugs.launchpad.net/nova/+bug/1329261
17:15:33 <uvirtbot> Launchpad bug 1329261 in nova "vmware: VCdriver creates same hypervisor_hostname for different vcenters with same Cluster name" [High,In progress]
17:15:51 <tjones> oh sorry KanagarajM - you said you wanted to discuss that
17:16:01 <rgerganov_> hi, sorry for being late
17:16:15 <mdbooth> tjones: https://review.openstack.org/#/c/104145/5/nova/tests/virt/vmwareapi/test_vmops.py line 98
17:16:41 <tjones> we can only +1 it. we cannot +2 it
17:16:57 <KanagarajM> oh ok
17:17:02 <mdbooth> That test change is required because it previously created an unnecessary copy, and now it doesn't
17:17:16 <mdbooth> So we had to change the target size to force the copy creation
17:17:30 <mdbooth> Which is what makes me suspect the root gb size 0 bug *might* be fixed
17:17:57 <tjones> KanagarajM: looks like you have been getting reviews - so once it has a couple +1 from this team and a +1 from minesweeper you should ask the core guys to take a look (on irc)
17:18:12 <tjones> mdbooth: cool - we should retest it then
17:18:27 <tjones> ok on to BP?
17:18:30 <KanagarajM> tjones: ok sure. thanks
17:18:55 <tjones> #topic approved BP (aka refactor)
17:19:04 <tjones> vuil: mdbooth: how's it going?
17:19:16 <mdbooth> tjones: Pretty good
17:19:29 <vuil> thanks to mdbooth for posting the patches, and updating some existing ones
17:19:45 <mdbooth> http://paste.fedoraproject.org/116749/26339140
17:19:51 <vuil> I am working on laying phase 3 patches on top of it.
17:20:12 <tjones> nice!!!
17:20:12 <mdbooth> vuil: The end of the line is currently https://review.openstack.org/104148
17:20:27 <vuil> pretty :-). How about I put it in https://etherpad.openstack.org/p/vmware-subteam-juno?
17:20:29 <tjones> i did a zillion reviews yesterday
17:20:29 <rgerganov_> the oslo integration is also ready for review
17:20:42 <vuil> thanks rado as well.
17:20:44 <mdbooth> vuil: Go ahead
17:20:47 <mdbooth> tjones: I saw, thanks
17:20:54 <tjones> garyk said that hot plug is moving well too
17:21:21 <tjones> mdbooth: vuil - so what's your feeling on when the refactor is completely merged?
17:21:42 <mdbooth> tjones: Well, we've been fairly successful in getting trivial fixes merged
17:21:43 <tjones> and rgerganov_ can you post the link to the oslo integration for this team?
17:22:06 <mdbooth> This has been incredibly useful, as it has meant we have dealt with a bunch of known potential merge conflicts already
17:22:27 <rgerganov_> tjones: here it is: https://review.openstack.org/#/c/70175/
17:22:35 <KanagarajM> dansmith: could you please provide your inputs on the 1 cluster 1 compute process.
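Circling back to bug 1329261 from the bug discussion above: the nodename was derived from the cluster name alone, so two vCenters that each contain a cluster named 'Cluster1' collide in nova's compute_nodes table. A hedged sketch of one obvious disambiguation follows; it is purely illustrative and not necessarily the fix in KanagarajM's patch.

```python
def make_nodename(vcenter_uuid, cluster_name):
    # Folding the vCenter's instanceUuid into hypervisor_hostname keeps
    # clusters that merely share a display name distinct.
    return '%s.%s' % (cluster_name, vcenter_uuid)

# Two vCenters, each with a 'Cluster1', now report distinct nodes:
print(make_nodename('uuid-of-vcenter-a', 'Cluster1'))  # Cluster1.uuid-of-vcenter-a
print(make_nodename('uuid-of-vcenter-b', 'Cluster1'))  # Cluster1.uuid-of-vcenter-b
```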
17:22:36 <rgerganov_> it has +1 from Jenkins and Minesweeper
17:22:44 <mdbooth> However, I feel that what's left has some patches in there which would require more than a glance
17:22:53 <tjones> KanagarajM: please wait until open discussion for that
17:22:57 <mdbooth> Realistically, I think we should be looking for a couple of sponsors
17:23:44 <tjones> dansmith: and mriedem have been pretty helpful - should we ask them to sponsor?
17:24:06 <mdbooth> tjones: I'll be happy with whoever wants the job :)
17:24:12 <tjones> :-D
17:24:19 <vuil> Mark McL has been too
17:24:26 <tjones> true
17:24:27 <mdbooth> Again, I'm happy to stay on top of review comments aggressively
17:24:36 <tjones> thanks - this is moving nicely
17:24:52 <tjones> rgerganov_: looks like you could use a few more reviews on http://paste.fedoraproject.org/116749/26339140
17:25:10 <vuil> I am working through those as well
17:25:13 <tjones> it's a big one - should we try to get a sponsor for this too?
17:25:22 <mdbooth> rgerganov_: Sorry, employer interrupt meant I didn't get to that :(
17:25:30 <KanagarajM> tjones: I would like to get approval for the BP https://review.openstack.org/104211, NFS glance datastore. could you please bring it up here? thanks.
17:25:48 <tjones> KanagarajM: please wait until we get to unapproved BP for that. it is next
17:26:15 <KanagarajM> tjones: sure.
17:26:22 <tjones> any other approved BP we should cover?
17:26:48 <mdbooth> vuil: Does refactor phase 3 cover factoring out image cache -> disk creation?
17:26:54 <mdbooth> from spawn
17:27:08 * mdbooth vaguely recalls it does, but hasn't gotten around to looking, yet
17:27:12 <vuil> image fetch/convert/cache/use
17:27:24 <mdbooth> vuil: Awesome. I have a use for that.
17:27:54 <tjones> cool
17:27:57 <tjones> ok lets move on
17:27:58 <mdbooth> vuil: Drop me a mail if you want any help with those patches, btw
17:28:01 <tjones> #topic unapproved BP
17:28:05 <tjones> #link https://review.openstack.org/#/q/status:open+project:openstack/nova-specs+message:vmware,n,z
17:28:14 <mdbooth> vuil: Otherwise I'm going to concentrate on the existing queue
17:28:15 <tjones> kirankv: want to go 1st?
17:28:37 <vuil> mdbooth: will do (in the middle of decomposing that into small ones)
17:28:48 <kirankv> https://review.openstack.org/98704
17:29:15 <mdbooth> vuil: Excellent. I think that's working well.
17:29:29 <kirankv> need review on this one https://review.openstack.org/98704
17:29:43 <tjones> kirankv: should you chat with johnthetubaguy on irc to cover his concern (and remove the -1)?
17:30:08 <kirankv> i chatted with him today, he mentioned he will look into it tomorrow
17:30:14 <tjones> great
17:30:31 <mdbooth> I was actually the one who pointed out the issue to him
17:30:39 <tjones> ok looks like it could use more reviews from this team too. guys please take a look
17:30:41 <mdbooth> So I'm moderately familiar with it
17:31:07 <tjones> anything else kirankv?
17:31:31 <kirankv> the other one I have to address the sync issues
17:31:50 <kirankv> tjones: thanks
17:31:53 <mdbooth> Ah, I may be thinking of the other one
17:31:55 <tjones> ok lets wait until we finish BP and then do that in open discussion
17:32:00 <tjones> KanagarajM: your turn
17:32:05 <KanagarajM> yes
17:32:17 <KanagarajM> would like to get approval for https://review.openstack.org/104211 NFS glance
17:32:25 <KanagarajM> datastore
17:32:43 <tjones> we cannot approve - we can only review
17:32:55 <tjones> guys can you please review this BP this week?
17:33:13 <tjones> KanagarajM: you want to say a few words about it?
17:33:20 <KanagarajM> should i wait till i get approval for the BP, or can i go ahead and submit the patch?
17:33:48 <KanagarajM> sure: it uses a datastore mounted via NFS from the glance image store
17:34:21 <KanagarajM> during the instance boot, compute copies the image from glance to vmware_temp by using the NFS datastore
17:34:26 <tjones> you can push patches but they will get -2 until the BP is approved.
17:34:44 <KanagarajM> it improved the image caching dramatically
17:35:11 <rgerganov_> KanagarajM: do you have any measurements?
17:35:16 <KanagarajM> sure, i will wait for the approval, but not sure how to get the attention of the core approvers, as this blueprint is completely for the vmware vc driver
17:35:43 <KanagarajM> yes, for an 800 MB image, the boot time reduced from 16 minutes to 6 minutes
17:35:43 <mdbooth> rgerganov_: I can well believe it. I'm convinced the performance of the copy code must be terrible from looking at it
17:35:47 <rgerganov_> KanagarajM: if you have any data, then I'd suggest to put it in the spec
17:35:52 <tjones> i'd start with getting arnaud__ to take a look as he is a core reviewer in glance
17:36:11 <KanagarajM> sure, that's a great idea. thanks rgerganov.
17:36:13 <tjones> wow! that is an improvement!!
17:36:14 <kirankv> KanagarajM: it depends on the vCenter version you are using
17:36:32 <mdbooth> KanagarajM: Is that a common deployment, btw?
17:36:38 <kirankv> 2Gb generally takes 5m
17:36:51 <mdbooth> i.e. glance image store externally accessible by nfs?
17:36:57 <KanagarajM> kirankv: that is right, it's a windows vc
17:37:11 <KanagarajM> mdbooth: couldn't get your question
17:37:13 * mdbooth doesn't know how people use this stuff in the real world
17:37:42 <kirankv> yes, the right vCenter update versions have better performance
17:37:43 <mdbooth> KanagarajM: I'm assuming that your BP only covers the case where the glance image store *can* be mounted via NFS from an ESX host
17:37:50 <mdbooth> Is that a common deployment architecture?
17:37:57 <mdbooth> i.e. is anybody likely to be able to use it?
17:39:21 <vuil> NFS datastore is a common deployment scenario.
17:39:50 <vuil> some deployments only use high-end NFS, in fact
17:40:01 <mdbooth> I meant for glance
17:40:25 * mdbooth used to work at a VMware shop which used NFS for everything
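To make the spec's claim concrete: today the image bytes stream from glance over HTTP through nova-compute and then up to the datastore; with the glance filesystem store exported over NFS and mounted on the ESX hosts as a datastore, the fetch becomes a host-side datastore-to-datastore copy. Below is a sketch of that flow under stated assumptions; GLANCE_DATASTORE, TEMP_DATASTORE, and copy_virtual_disk are illustrative names, not nova's real API.

```python
GLANCE_DATASTORE = 'glance-nfs'  # NFS datastore backed by the glance image store
TEMP_DATASTORE = 'datastore1'    # datastore holding vmware_temp for the instance

def fetch_image_via_nfs(session, image_id):
    src = '[%s] %s.vmdk' % (GLANCE_DATASTORE, image_id)
    dst = '[%s] vmware_temp/%s.vmdk' % (TEMP_DATASTORE, image_id)
    # The copy runs inside vSphere (a CopyVirtualDisk_Task-style operation);
    # no image bytes pass through the nova-compute process, which is where
    # the reported 16 min -> 6 min boot-time win comes from.
    copy_virtual_disk(session, src, dst)  # hypothetical helper
    return dst
```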
17:41:50 <tjones> so we really need to move on to open discussion. lets table this and continue either on the ML or in -vmware after the meeting
17:41:57 <tjones> #topic open discussion
17:42:49 <tjones> there is some confusion over what we are doing with 1vc / n-cpu.
17:43:13 <kirankv> yes, would prefer that the current approach be retained
17:43:17 <tjones> as i understood this was a mandate from the core guys made in atlanta
17:43:43 <kirankv> you can achieve the 1 cluster to 1 service mapping with the existing implementation
17:43:50 <rgerganov_> I have posted the patch for dropping the multi-cluster support because this was on our roadmap for Juno
17:43:56 <mdbooth> kirankv: We lost that argument. No point in continuing it imho.
17:44:33 <kirankv> tjones: only when we have a viable alternative can we drop it
17:44:57 <tjones> *looking for the notes from last week*
17:45:51 <vuil> kirankv: what would we be missing with multiple n-cpu processes managing a single cluster each?
17:46:03 <vuil> as a viable alternative I mean
17:46:15 <mdbooth> vuil: Apart from the additional resources they consume ;)
17:46:22 <kirankv> vuil: the memory footprint is more
17:46:42 <mdbooth> kirankv: I agree with you, but we lost.
17:46:46 <kirankv> number of vcenter connections
17:47:09 <rgerganov_> the real problem is with the image cache which will be replicated for each nova host
17:47:31 <rgerganov_> I believe this can be solved with nova-conductor but requires more thinking and research
17:47:51 <tjones> frankly i've lost sight of the problem we were solving by even doing this. can someone recap why the core guys insisted on this?
17:48:05 <mdbooth> rgerganov_: The image cache is already replicated per datastore.
17:48:17 <mdbooth> I don't think that will change with multiple n-cpu
17:48:26 <rgerganov_> tjones: nova-compute is supposed to scale horizontally, not vertically. that was the main point I think
17:48:29 <KanagarajM> i have lost the connection in IRC, and am back
17:48:45 <kirankv> rgerganov: my BP intends to solve this :)
17:48:54 <tjones> rgerganov_: oh yeah right
17:49:32 * mdbooth sees it as a deployment decision, personally
17:49:37 <rgerganov_> mdbooth: you won't replicate the cache for each cluster right now
17:49:40 <mdbooth> Admins are grown-ups too
17:50:03 <vuil> and the feeling that the nodes concept was being abused, and they want it done
17:50:04 <kirankv> mdbooth: yes its a deployment decision and we should leave the choice as it exists today
17:50:19 <rgerganov_> vuil: correct
17:50:20 <vuil> s/done/gone
17:50:55 <mdbooth> rgerganov_: You'll replicate it per datastore
17:51:06 <mdbooth> So unless the clusters share datastores, it'll make no difference
17:51:09 <KanagarajM> tjones: has the discussion started on 1 cluster 1 nova-compute? i have lost the connection
17:51:19 <tjones> yes we are discussing
17:51:48 <rgerganov_> mdbooth: it is a common case to share a datastore among clusters in the same vCenter I believe
17:52:29 <mdbooth> rgerganov_: Ok, in that case it would make a difference
17:52:57 <kirankv> yes, in deployments ive seen its always shared datastores within the cluster
17:53:01 <KanagarajM> ok, multibackend solves many problems as listed in the BP and in addition, i have added comments saying that it's going to save capital expense and running expense for the cloud provider
17:53:11 <mdbooth> Anyway, regardless of the merits of it, we just need to get this done
17:53:17 <kirankv> mdbooth: are you referring to shared datastores across clusters?
17:53:23 <mdbooth> kirankv: yes
17:53:59 * mdbooth would like to float an idea briefly before we finish, btw
17:54:09 <kirankv> mdbooth: ah, so if we were to cache images on such a shared datastore it would help
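The cache argument above is easier to see from the path layout. The driver keeps its image cache in a fixed directory per datastore (nova's cache directory is 'vmware_base'; the exact layout below is illustrative), so cache entries are keyed by datastore, not by nova-compute process.

```python
def cached_image_path(datastore, image_id):
    # One cache directory per datastore: two nova-compute processes whose
    # clusters mount the same datastore resolve an image to the same file.
    return '[%s] vmware_base/%s/%s.vmdk' % (datastore, image_id, image_id)

# Clusters sharing a datastore share the cache entry...
print(cached_image_path('shared-ds', 'img-1'))  # same path from either n-cpu
# ...while clusters with private datastores each keep their own copy.
print(cached_image_path('ds-a', 'img-1'))
print(cached_image_path('ds-b', 'img-1'))
```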
17:54:46 <kirankv> tjones: can we tell nova core that we have issues to resolve before we can move to the 1 cluster = 1 compute service model
17:55:31 <tjones> kirankv: i think it would be best to send something out on the ML discussing the issues we are facing for full transparency
17:55:45 <kirankv> and until then we will leave this model, the admin still has the choice of using 1 cluster = 1 service in the existing design
17:55:47 <rgerganov_> tjones: +1
17:55:48 <mdbooth> +1
17:57:06 <tjones> who is going to write it - i feel like it's best as a joint email from (for example) rgerganov_, mdbooth, and kirankv. thoughts?
17:57:10 <KanagarajM> ok there are different ways to solve the 1 cluster 1 service issue, but they have drawbacks as listed in the BP, like keeping the same configuration across multiple nova-xx.conf files
17:57:59 <kirankv> i can start the thread
17:58:01 * mdbooth isn't sufficiently familiar with the detail to write it
17:58:04 <mdbooth> kirankv: +1
17:58:18 <mdbooth> So, RH qa hit an issue when suspending a rescue image
17:58:23 <tjones> great - 2 more minutes
17:58:26 <tjones> go mdbooth
17:58:28 <mdbooth> It explodes into multiple pieces
17:58:46 <mdbooth> I was looking over the code earlier
17:58:46 <KanagarajM> cinder took this multi-backend approach a while after the project started; the main reason was that from one cinder-volume service they wanted to spawn as many cinder-volume processes as needed for a given driver
17:58:48 <tjones> mdbooth: you have a bug link?
17:59:05 <mdbooth> tjones: It's an outcrop of https://bugs.launchpad.net/nova/+bug/1269418
17:59:09 <uvirtbot> Launchpad bug 1269418 in openstack-vmwareapi-team "[OSSA 2014-017] nova rescue doesn't put VM into RESCUE status on vmware (CVE-2014-2573)" [High,In progress]
17:59:09 <mdbooth> But it's really a new bug
17:59:21 <mdbooth> But I only just verified it, so it's not in lp yet
17:59:25 <mdbooth> Anyway
17:59:29 <tjones> ok
17:59:36 <mdbooth> garyk points out that there's a band-aid fix for it
17:59:50 <mdbooth> However, I wondered about changing the way we do rescue images
17:59:55 <mdbooth> Instead of creating a new vm
18:00:03 <mdbooth> We just add the rescue image to the existing vm and boot from it
18:00:23 <tjones> oops - outta time. lets move over to -vmware
18:00:26 <mdbooth> That way, we don't have to do this dance to check if something's a rescue image all over the place
18:00:32 * mdbooth moves
18:00:51 <tjones> #endmeeting
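For the record, here is a sketch of the rescue approach mdbooth floated at the end: attach the rescue image as an extra boot disk on the existing VM rather than cloning a second rescue VM, so the rest of the driver never has to special-case rescue VMs. All helper names are hypothetical; this illustrates the idea, not an implementation.

```python
def rescue(session, vm_ref, rescue_image_path):
    power_off(session, vm_ref)                        # hypothetical helpers
    # Attach the rescue disk and boot from it; the original root disk
    # stays attached so the user can repair it from the rescue OS.
    attach_disk(session, vm_ref, rescue_image_path, boot_first=True)
    power_on(session, vm_ref)

def unrescue(session, vm_ref, rescue_disk):
    power_off(session, vm_ref)
    detach_disk(session, vm_ref, rescue_disk)         # original boot disk restored
    power_on(session, vm_ref)
```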