17:00:35 <tjones> #startmeeting vmwareapi
17:00:36 <openstack> Meeting started Wed Jun 4 17:00:35 2014 UTC and is due to finish in 60 minutes. The chair is tjones. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:37 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:00:39 <openstack> The meeting name has been set to 'vmwareapi'
17:01:06 <tjones> hi folks - who is here today?
17:01:15 <browne> me
17:01:17 <arnaud> hi
17:01:32 <vuil> hi
17:01:40 <garyk> eitah
17:01:46 <mdbooth> hi
17:01:50 <tjones> garyk: ?
17:02:10 <garyk> eitah is hello in South African slang
17:02:17 <tjones> lol - cool
17:02:22 <tjones> ok, let's get started
17:02:39 <tjones> today is nova bug day - that means we should be trying to fix/review bugs today instead of feature work
17:03:43 <tjones> still, i want to get a sense of where we are on features in the meeting today. I will be sending out an email on the ML showing our status with the refactor specifically
17:03:58 <garyk> tjones: sounds good
17:04:00 <tjones> so let's talk about approved BPs first
17:04:09 <tjones> #topic approved BP status
17:04:10 <tjones> https://blueprints.launchpad.net/openstack?searchtext=vmware
17:04:36 <tjones> phase 1 of the spawn refactor has a +2 from matt.
17:04:41 <tjones> so we are almost done with that
17:04:49 <garyk> ahhh, is that why it is raining outside
17:04:59 <tjones> vuil - want to talk about phase 2? (lol gary)
17:05:27 <garyk> my concern with the refactoring is that backporting patches is really difficult
17:05:45 <vuil> the patches are just about done.
17:06:06 <vuil> I am in the rebase/reorg phase, to arrange them into self-contained patches to post.
17:06:25 <vuil> hopefully by today if no complications come up
17:06:31 <mdbooth> garyk: Indeed. Incidentally, that's also a reason to try to break down patches into small chunks.
17:06:33 <tjones> vuil: once done with that please send me an email and i'll use that as part of the message to the ML
17:06:46 <tjones> garyk: is there any possibility of backporting the refactor?
17:06:49 <vuil> will do.
17:07:00 <garyk> mdbooth: i am not sure that small chunks will help when the whole code base is updated.
17:07:18 <garyk> tjones: i am not sure about that. it may be an idea well worth exploring
17:07:54 <garyk> tjones: maybe when we are done we can consider it and write a mail to the stable team if relevant
17:08:04 <mdbooth> garyk: Well, the whole codebase needs to be updated :) Small chunks make it easier to see what we did.
17:08:08 <garyk> i think that we have hijacked the refactoring explanation by vuil, sorry
17:08:24 <tjones> let's take this to the open discussion part of the meeting
17:08:29 <garyk> ok, thanks
17:08:30 <tjones> vuil: anything else on phase 2?
17:09:02 <vuil> not really, was thinking about the backports, but let's take that in the open discussion.
17:09:06 <tjones> ok
17:09:18 <tjones> how about oslo? blocked on phase 2 i suspect
17:09:51 <vuil> yeah, once the spawn refactor work goes up, it will be easier to continue on the oslo bit.
17:09:54 * mdbooth almost has concrete plans for an updated api, btw
17:10:24 <tjones> mdbooth: let's take that in open discussion too :-D
17:10:25 <vuil> mdbooth: share when done
17:10:35 <tjones> last approved BP is hot plug - garyk?
17:10:51 <garyk> tjones: it has been in review for months :)
17:11:08 <garyk> i rebased and updated the code after the summit...
17:11:22 <tjones> garyk: thought you said there was something to do last week?? if not, we need to get on the reviews
17:11:54 <tjones> #action for all to review https://review.openstack.org/59365 and https://review.openstack.org/#/c/91005/
17:12:18 <garyk> tjones: not that i am aware of. rado wanted me to consider an idea he had. i am thinking about that but i do not think it should block what is posted
17:12:25 <tjones> please take a look at those reviews (tomorrow after bug day)
17:12:36 <tjones> #topic BP in review
17:12:37 <tjones> https://review.openstack.org/#/q/status:open+project:openstack/nova-specs+message:vmware,n,z
17:13:17 <tjones> anyone here want to discuss a BP in this list? There are a number needing updates, but not sure if those folks are attending
17:13:54 <garyk> tjones: i have posted the spec for the ephemeral disk support
17:13:58 <garyk> will add the code tomorrow
17:14:03 <tjones> garyk: great! thanks
17:15:14 <tjones> ok, assuming no other BP discussion is needed.
17:15:31 <tjones> #topic bugs
17:15:50 <tjones> our 50 bugs http://tinyurl.com/p28mz43
17:16:29 <tjones> of those, the ones which are not assigned http://tinyurl.com/kkyw9c4
17:16:39 <tjones> 18 of them
17:16:46 <tjones> this is the perfect day to work on this list
17:17:16 <tjones> anyone have a bug to discuss, or should we go to open discussion since we have a lot to talk about?
17:17:22 <tjones> going once.....
17:17:40 <tjones> #topic open discussion
17:17:41 <tjones> GO
17:18:03 <mdbooth> iSCSI
17:18:05 <tjones> i think we wanted to talk about backporting, api, and iSCSI
17:18:15 <tjones> mdbooth: want to start?
17:18:16 <garyk> tjones: when you send the mail to the list about the refactor can you please add in the bug lists
17:18:23 <tjones> garyk: will do
17:18:28 <garyk> tjones: thanks
17:18:44 <mdbooth> the iSCSI problem is hard, because it seems to require cluster config
17:18:45 <arnaud> mdbooth, I have posted a patch for the second issue (the first one being the auth that I think you are fixing)
17:19:04 <mdbooth> arnaud: What's the second issue?
17:19:14 <arnaud> https://review.openstack.org/#/c/97612/
17:19:46 <mdbooth> arnaud: That's probably good, but unfortunately it doesn't solve the problem
17:19:57 <arnaud> hmm
17:20:03 <arnaud> that solves a part of the problem
17:20:13 <arnaud> the fact that you have powered-off hosts
17:20:19 <arnaud> is not specific to iSCSI
17:20:26 <mdbooth> iSCSI disks are rdm devices
17:20:35 <mdbooth> rdm devices need to be present on all hosts
17:20:41 <arnaud> yes
17:20:50 <arnaud> I agree with that
17:20:54 <vuil> that is the 'third' problem I guess, that is hard
17:21:07 <mdbooth> Right
17:21:15 <arnaud> we have been looking at the vMotion problem
17:21:21 <arnaud> 'third' problem
17:21:36 <mdbooth> It's not just a vMotion problem
17:21:42 <vuil> We tried to establish that DRS will not auto-vmotion to a host that cannot see the device, which I believe to be the case
17:22:05 <mdbooth> If the vm is taken down, it might not be able to come up again if it can't get on the host with its rdm device
17:22:21 <arnaud> same for a new host
17:22:31 <mdbooth> yup
17:22:59 <mdbooth> Anyway, auth is easy to add
17:23:25 <mdbooth> But iSCSI is kinda broken until we come up with a solution to this
17:23:31 <mdbooth> How much do we care?
17:23:57 <arnaud> we are looking at it so yes we care
17:23:58 <arnaud> :)
17:24:29 <mdbooth> Are there any other examples of config which needs to be on all hosts?
17:24:40 * mdbooth can't think of any
17:25:20 <mdbooth> I think fixing this may require a new db table
17:25:44 <garyk> mdbooth: a new db table specific to a virt driver will be problematic
17:25:51 <vuil> what do you mean by config that needs to be on all hosts here?
17:25:55 <mdbooth> garyk: I guessed as much :(
17:25:57 <vuil> list of rdm devices?
17:26:03 <mdbooth> vuil: Yes
17:26:41 <garyk> mdbooth: there may be system metadata for hosts. i am not sure.
17:27:06 <vuil> I think this does not have to be exposed.
17:27:25 <vuil> It is somewhat analogous to a cluster with non-homogeneous hosts...
17:27:27 <arnaud> your reasoning with the table is to store this config in a table, and every time we scan we update the table, correct?
17:27:43 <vuil> with some host not having enough resources to accept a VM, so it doesn't
17:27:51 <mdbooth> arnaud: Yeah
17:28:49 <mdbooth> vuil: We don't need to expose it. I just think we need to store some state.
17:29:12 <mdbooth> I don't think it's unreasonable for a driver to store persistent state.
17:29:36 <garyk> i do not understand something here - cinder is responsible for the volume, right?
17:29:47 <mdbooth> garyk: Yes.
17:30:03 <mdbooth> garyk: Nova is responsible for making it available to the vm.
17:30:06 <garyk> so cinder should be aware of where the vm is running
17:30:22 <garyk> then it can either perform the operation if possible or fail if not.
17:30:35 <garyk> maybe i just do not understand the problems. sorry
17:30:37 <mdbooth> garyk: No. The vm is running in the 'cluster'. Cinder should not have an internal view of the cluster.
17:30:58 <garyk> mdbooth: cinder should. maybe the problem should be addressed there.
17:31:24 <mdbooth> garyk: The VM may also move outside of openstack's control.
17:31:38 <mdbooth> Because of HA or DRS, for example.
17:31:38 <garyk> mdbooth: why?
17:31:56 <garyk> but if cinder was aware of that then it would not be a problem - for example with the vmdk driver
17:31:57 <mdbooth> Or explicit vMotion by an admin.
17:31:58 <vuil> mdbooth: purpose of the persistent state is...?
17:32:31 <vuil> so we know which hosts need to rescan for new devices?
17:32:32 <mdbooth> vuil: To store which targets need to be configured.
17:32:49 <mdbooth> Poss also rescan.
17:33:21 <vuil> A rescan is needed to discover new targets. A rescan discovers all discoverable targets
17:33:23 <mdbooth> I haven't thought it through in detail, but I'm pretty sure doing it without persistent state would be a pain.
17:33:40 <vuil> so in theory there is no need to track how many there are to discover
17:33:42 <mdbooth> vuil: The target must also be added to the hba
17:33:54 <mdbooth> When a new host joins the cluster, it won't have any configured targets
17:34:01 <mdbooth> Or when a new target is added to host a
17:34:06 <mdbooth> it won't also be added to host b
17:34:15 <mdbooth> So you need to track both
17:34:16 <vuil> Yeah, sabari and I had a discussion about this.
17:35:21 <vuil> One option is to wait till a new volume is added, and do the rescan
17:35:28 <mdbooth> In theory you could scrape this from existing vm config, but that wouldn't be pretty or cheap
17:35:39 <mdbooth> So technically it would be a denormalisation
17:36:11 <mdbooth> Either way, you're going to want a reference golden state which is replicated to all hosts
17:36:13 <arnaud> vuil, you mean rescan each host every time we add a volume to 1 host?
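(For reference on the mechanics being discussed here: adding an iSCSI send target to a host's software HBA and rescanning it goes through the host's HostStorageSystem in the vSphere API, via AddInternetScsiSendTargets and RescanHba. The following is only a minimal sketch using oslo.vmware, not the nova driver's actual code; the session setup, host reference and HBA device name are illustrative assumptions.)

    # Sketch: configure an iSCSI send target on one ESX host and rescan the HBA,
    # so the cinder-exported LUN becomes visible as an RDM-attachable device.
    # Hypothetical values throughout; hedged, not existing driver code.
    from oslo.vmware import api
    from oslo.vmware import vim_util

    # Connection details are placeholders.
    session = api.VMwareAPISession('vcenter.example.org', 'administrator',
                                   'secret', 10, 1)

    def add_target_and_rescan(host_ref, hba_device, target_ip, target_port=3260):
        """Register a send target on the host's iSCSI HBA, then rescan it."""
        # Each ESX host exposes its storage configuration through
        # configManager.storageSystem (a HostStorageSystem managed object).
        storage_system = session.invoke_api(
            vim_util, 'get_object_property', session.vim,
            host_ref, 'configManager.storageSystem')

        # Describe the target portal to add (address/port of the iSCSI server).
        client_factory = session.vim.client.factory
        send_target = client_factory.create('ns0:HostInternetScsiHbaSendTarget')
        send_target.address = target_ip
        send_target.port = target_port

        # Add the send target to the HBA, then rescan so new LUNs are discovered.
        session.invoke_api(session.vim, 'AddInternetScsiSendTargets',
                           storage_system, iScsiHbaDevice=hba_device,
                           targets=[send_target])
        session.invoke_api(session.vim, 'RescanHba', storage_system,
                           hbaDevice=hba_device)

As mdbooth notes below, this per-host pair of calls is exactly what has to be repeated on every host in the cluster (and the rescan is what makes it expensive).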
17:36:22 <mdbooth> Not just rescan
17:36:29 <mdbooth> Also add new targets as required
17:36:31 <mdbooth> Or remove them
17:36:43 <vuil> yeah, I am assuming that's part of it.
17:36:59 <vuil> but essentially whatever we are doing for existing hosts, do the same for a new one.
17:37:04 <vuil> kinda punt the new host problem.
17:37:26 <vuil> it doesn't participate in iscsi/rdm until a new volume is added
17:37:31 <arnaud> the new host can be specified in the doc
17:37:39 <arnaud> when you add a new host to your cluster, you need to scan
17:38:04 <vuil> when a new volume is created and added to an instance, that already requires a rescan
17:38:17 <mdbooth> Is there any vmware feature we can use for this?
17:38:28 <mdbooth> It doesn't sound like a problem which is unique to us
17:38:41 <mdbooth> Maybe some kind of profile
17:38:53 <mdbooth> Then we could modify the profile
17:40:32 <mdbooth> Without that, I think we can do this if we store targets and paths in some kind of persistent storage
17:40:49 <arnaud> tbh, I think that if we rescan each host every time, make sure when we attach that we take a host that is powered on, and specify in the doc that when you add a new host you need to scan
17:40:54 <arnaud> the problem is solved
17:40:59 <mdbooth> Rescan is pretty expensive, btw
17:41:07 <mdbooth> Many seconds
17:41:36 <vuil> yeah, but that needs to be done at minimum for one host
17:42:39 <garyk> the patch that arnaud added should have the correct host for the instance
17:42:47 <garyk> would that not suffice?
17:43:01 <mdbooth> garyk: No, because it won't continue to work if the vm moves for any reason.
17:43:19 <arnaud> not enough for the vmotion case and the new host case
17:43:28 <mdbooth> And that move will not necessarily involve openstack.
17:43:58 <garyk> it just seems like an edge case that the admin will move it to a host that does not see the correct device
17:44:06 <vuil> I am leaning towards advocating that vmotion-enablement be done out of band by some admin action
17:44:32 <mdbooth> vuil: Or DRS, or HA
17:44:40 <vuil> by that I mean one can asynchronously rescan the hba, and the VMs will eventually be vmotionable.
17:44:58 <garyk> my understanding would be that all of the hosts should be able to see the same devices. if not, then it seems to be a setup issue
17:45:03 <vuil> DRS should not try and fail to vmotion the VM to a host that cannot see the device
17:45:21 <mdbooth> garyk: It absolutely is a setup issue. The setup needs to be done by Nova.
17:45:28 <vuil> so until some other host discovers the iscsi devices, the VM is essentially pinned (but temporarily)
17:45:49 <garyk> mdbooth: why is the setup done by nova?
17:46:06 <mdbooth> vuil: That's fine. But we still need to ensure that the config is propagated eventually.
17:46:20 <garyk> i would think that it is done out of band by someone who wants to increase the capacity of their cloud - they just add another host to the cluster
17:47:34 <mdbooth> I think: config stored persistently somewhere (where?). Config on target host updated synchronously. Config on all other hosts updated asynchronously. Scan detects new hosts and auto adds all targets.
17:48:12 <mdbooth> garyk: iscsi volumes can come and go. e.g. cinder create ... creates one.
17:48:24 <mdbooth> garyk: It's not feasible for an admin.
17:49:01 <mdbooth> Scan process could also be responsible for async update.
17:49:05 <garyk> it is starting to sound like cinder should be responsible for this
17:49:15 <mdbooth> Cinder can't be responsible for this.
17:49:21 <garyk> why?
17:49:23 <mdbooth> Cinder doesn't control the cluster.
17:49:27 <arnaud> yeah, because cinder doesn't know about the hosts
17:49:29 <arnaud> in the cluster
17:49:36 <arnaud> it cannot trigger the rescan of the targets
17:49:37 <garyk> but cinder can add this support?
17:49:52 <mdbooth> garyk: Cinder in this case is an iscsi provider.
17:49:57 <garyk> i just feel that we are trying to solve the problem in the wrong place
17:50:03 <mdbooth> It knows nothing of what is consuming the iscsi volumes.
17:50:15 <mdbooth> This is a consumption problem, not a provision problem.
17:50:19 <arnaud> agreed
17:50:42 <garyk> it receives a request to provide a resource. why can it not be 'clever' about that provisioning?
17:51:16 <mdbooth> Shall we continue this on the list? 9 mins for other topics.
17:51:24 <arnaud> the iscsi logic in cinder should not be aware of vmware
17:51:27 <vuil> because the cinder driver is not VC aware
17:51:33 <tjones> mdbooth: we could also move to openstack-vmware to continue
17:51:37 <tjones> in real-time
17:51:42 <mdbooth> Ok
17:52:19 <garyk> ok, i was not aware that the cinder driver was not aware of vmware
17:52:24 <vuil> lets do that
17:52:33 <arnaud> garyk: it's not the vmdk driver in cinder
17:52:38 <arnaud> it's the lvm iscsi driver
17:52:44 <garyk> i am currently working on the esx deprecation.
17:53:01 <garyk> there are a number of issues there. i will send a mail to the list at some stage or another
17:53:09 <tjones> the other topics i am aware of are backporting due to the refactor. I think we decided that once phase 1 is complete, gary will ask the stable core if we can backport the refactor. the other issue was the api changes mdbooth was mentioning
17:53:46 <tjones> mdbooth: do you want to discuss this more here or was it just a heads up?
17:53:46 <garyk> what api changes?
17:53:47 <arnaud> quick question: what is the advantage of backporting the refactor?
17:53:57 <vuil> there is some churn in phase one, but the main upheaval happens in phase 2/3
17:54:06 <vuil> facilitate backports
17:54:06 <garyk> arnaud: we will need to add bug fixes to the stable branch.
17:54:10 <mdbooth> tjones: We need agreement to move forward. I've dropped it because I don't know how to proceed.
17:54:25 <mdbooth> refactor related: https://review.openstack.org/#/c/97170/
17:54:31 <mdbooth> Different refactor
17:54:42 <mdbooth> Quick opinions on this?
17:54:51 <mdbooth> I similarly dislike vim_util, btw
17:55:00 <garyk> i posted mine on the review
17:55:14 <mdbooth> garyk: Yeah, want to move code changes to another patch.
17:55:16 <arnaud> mdbooth: +1, I cannot agree more with this patch
17:55:19 <vuil> seems fine, and fairly orthogonal to the current refactor work
17:55:47 <garyk> mdbooth: why move them to another patch when you can do them on this one?
17:56:11 <tjones> yes, this and the power_off are orphans of phase 1 - but i hesitate to link them and block phase 2
17:56:23 <mdbooth> garyk: Because if you're reviewing it, it's simpler to see: this patch moves code around, that patch changes the code.
17:56:30 <mdbooth> The 2 things are reviewed differently.
17:56:47 <mdbooth> Also, if you're backporting, you can more easily pick out code changes.
17:56:55 <tjones> cheaper to review a "move only" patch
17:57:03 <mdbooth> Otherwise you're left manually scanning big chunks of very similar code.
17:57:10 <garyk> if so then just address the log comments
17:57:30 <mdbooth> garyk: That said, I agreed with all your comments. Will do another patch.
17:57:41 <mdbooth> i.e. separate patch.
17:57:57 <garyk> no, those should be done in this one
17:58:57 <tjones> ok, 2 minutes - i think we still have some chatting to do, so let's move over to openstack-vmware to continue
17:59:06 <garyk> those are in the process of being changed in https://review.openstack.org/91352 .
18:00:05 <tjones> gotta end now
18:00:13 <tjones> moving to openstack-vmware
18:00:13 <mdbooth> k
18:00:16 <tjones> #endmeeting
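(As a rough illustration of the propagation scheme mdbooth outlines at 17:47:34 above - a persistent "golden" target list, synchronous configuration of the host an attach lands on, and asynchronous reconciliation of every other host, including hosts added to the cluster later - something along the following lines could work. This is a sketch only: the store API, the helper callables, and the add_target_and_rescan helper from the earlier sketch are hypothetical, not existing nova code.)

    # Sketch of the "golden state" approach discussed in the iSCSI thread above.
    # All names are hypothetical assumptions made for illustration.

    def attach_volume_target(store, cluster, attach_host, hba, target):
        # Synchronous path: record the target in the persistent golden list,
        # then configure the host the instance will actually run on before
        # the attach proceeds.
        store.add_target(cluster, target)                      # hypothetical store
        add_target_and_rescan(attach_host, hba, target.ip, target.port)

    def reconcile_cluster_targets(store, cluster, list_hosts, get_hba,
                                  get_configured_targets):
        # Asynchronous path (e.g. a periodic task or the scan process mdbooth
        # mentions): walk every host currently in the cluster - including hosts
        # added after the targets were first configured - and add any target
        # from the golden list that the host does not yet have on its HBA.
        wanted = set(store.get_targets(cluster))
        for host in list_hosts(cluster):
            hba = get_hba(host)
            missing = wanted - set(get_configured_targets(host, hba))
            for target in missing:
                add_target_and_rescan(host, hba, target.ip, target.port)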