17:00:35 #startmeeting vmwareapi
17:00:36 Meeting started Wed Jun 4 17:00:35 2014 UTC and is due to finish in 60 minutes. The chair is tjones. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:37 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:00:39 The meeting name has been set to 'vmwareapi'
17:01:06 hi folks - who is here today?
17:01:15 me
17:01:17 hi
17:01:32 hi
17:01:40 eitaj
17:01:46 hi
17:01:50 garyk: ?
17:02:10 eitah is hello in south african slang
17:02:17 lol - cool
17:02:22 ok lets get started
17:02:39 today is nova bug day - that means we should be trying to fix/review bugs today instead of feature work
17:03:43 still i want to get a sense of where we are for features in the meeting today. I will be sending out an email on the ML showing our status with refactor specifically
17:03:58 tjones: sounds good
17:04:00 so lets talk about approved BP 1st
17:04:09 #topic approved BP status
17:04:10 https://blueprints.launchpad.net/openstack?searchtext=vmware
17:04:36 phase 1 of spawn refactor has a +2 from matt.
17:04:41 so we are almost done with that
17:04:49 ahhh, is that why it is arining outside
17:04:53 raining
17:04:59 vui - want to talk about phase 2? (lol gary)
17:05:27 my concerns with the refactoring are that backporting patches is really difficult
17:05:45 the patches are just about done.
17:06:06 I am in the rebase/reorg phase, to arrange them into self contained patches to post.
17:06:25 hopefully by today if no complications come up
17:06:31 garyk: Indeed. Incidentally, that's also a reason to try to break down patches into small chunks.
17:06:33 vuil: once done with that please send me an email and i'll use that as part of the message to the ML
17:06:46 garyk: is there any possibility of backporting the refactor?
17:06:49 will do.
17:07:00 mdbooth: i am not sure that small chunks will help when the whole code base is updated.
17:07:18 tjones: i am not sure about that. it may be an idea well worth exploring
17:07:54 tjones: maybe when we are done we can consider it and write a mail to the stable team if relevant
17:08:04 garyk: Well, the whole codebase needs to be updated :) Small chunks make it easier to see what we did.
17:08:08 i think that we have hijacked the refactoring explanations by vui, sorry
17:08:24 lets take this to the open discussion part of the meeting
17:08:29 ok, thanks
17:08:30 vuil: anything else on phase 2?
17:09:02 not really, was thinking about the backports, but let's take that in the open disc.
17:09:06 ok
17:09:18 how about oslo? blocked on phase 2 i suspect
17:09:51 yeah, once the spawn refactor work goes up, it will be easier to continue on the oslo bit.
17:09:54 * mdbooth almost has concrete plans for an updated api, btw
17:10:24 mdbooth: lets take that in open discussion too :-D
17:10:25 mdbooth: share when you do
17:10:35 last approved BP is hot plug - garyk?
17:10:51 tjones: it has been in review for months :)
17:11:08 i rebased and updated the code after the summit...
17:11:22 garyk: thought you said there was something to do last week?? if not we need to get on the reviews
17:11:54 #action for all to review https://review.openstack.org/59365 and https://review.openstack.org/#/c/91005/
17:12:18 tjones: not that i am aware of. rado wanted me to consider an idea he had.
i am thinking about that but i do not think it should block what is posted
17:12:25 please take a look at those reviews (tomorrow after bug day)
17:12:36 #topic BP in review
17:12:37 https://review.openstack.org/#/q/status:open+project:openstack/nova-specs+message:vmware,n,z
17:13:17 anyone here want to discuss a BP in this list? There are a number needing update, but not sure if those folks are attending
17:13:54 tjones: i have posted the spec for the ephemeral disk support
17:13:58 will add the code tomorrow
17:14:03 garyk: great! thanks
17:15:14 ok assuming no other BP discussion needed.
17:15:31 #topic bugs
17:15:50 our 50 bugs http://tinyurl.com/p28mz43
17:16:29 of those the ones which are not assigned http://tinyurl.com/kkyw9c4
17:16:39 18 of them
17:16:46 this is the perfect day to work on this list
17:17:16 anyone have a bug to discuss or should we go to open discussion since we have a lot to talk about?
17:17:22 going once.....
17:17:40 #topic open discussion
17:17:41 GO
17:18:03 iSCSI
17:18:05 i think we wanted to talk about backporting, api, and iSCSI
17:18:15 mdbooth: want to start?
17:18:16 tjones: when you send the mail to the list about the refactor can you please add in the bug lists
17:18:23 garyk: will do
17:18:28 tjones: thanks
17:18:44 iSCSI problem is hard, because it seems to require cluster config
17:18:45 mdbooth, I have posted a patch for the second issue (the first one being the auth that I think you are fixing)
17:19:04 arnaud: What's the second issue?
17:19:14 https://review.openstack.org/#/c/97612/
17:19:46 arnaud: That's probably good, but unfortunately it doesn't solve the problem
17:19:57 hmm
17:20:03 that solves a part of the problem
17:20:13 the fact that you have powered off hosts
17:20:19 is not specific to iSCSI
17:20:26 iSCSI disks are rdm devices
17:20:35 rdm devices need to be present on all hosts
17:20:41 yes
17:20:50 I agree with that
17:20:54 that is the 'third' problem I guess, that is hard
17:21:07 Right
17:21:15 we have been looking at the vMotion problem
17:21:21 'third' problem
17:21:36 It's not just a vMotion problem
17:21:42 We tried to establish that DRS will not auto vmotion to a host that cannot see the device, which I believe to be the case
17:22:05 If the vm is taken down, it might not be able to come up again if it can't get on the host with its rdm device
17:22:21 same for a new host
17:22:31 yup
17:22:59 Anyway, auth is easy to add
17:23:25 But iSCSI is kinda broken until we come up with a solution to this
17:23:31 How much do we care?
17:23:57 we are looking at it so yes we care
17:23:58 :)
17:24:29 Are there any other examples of config which needs to be on all hosts?
17:24:40 * mdbooth can't think of any
17:25:20 I think fixing this may require a new db table
17:25:44 mdbooth: a new db table specific to a virt driver will be problematic
17:25:51 what do you mean by config that needs to be on all hosts here?
17:25:55 garyk: I guessed as much :(
17:25:57 list of rdm devices?
17:26:03 vuil: Yes
17:26:41 mdbooth: there may be system metadata for hosts. i am not sure.
17:27:06 I think this does not have to be exposed.
17:27:25 It is somewhat analogous to a cluster with non-homogeneous hosts...
17:27:27 your reasoning with the table is to store this config in a table, and every time we scan we update the table, correct?
17:27:43 with some host not having enough resources to accept a VM, so it doesn't
17:27:51 arnaud: Yeah
17:28:49 vuil: We don't need to expose it. I just think we need to store some state.
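For context on the persistent-state idea being debated here, a minimal sketch, assuming a SQLAlchemy-mapped table, of what one row per configured iSCSI target could look like so the driver could replay targets onto hosts that have not seen them yet. The table name, columns, and model are hypothetical illustrations; no such schema exists in Nova.

```python
# Illustrative sketch only: a hypothetical record of iSCSI targets configured
# for a cluster, so the driver could replay them onto hosts that have not seen
# them yet (including hosts added later). Table and column names are invented;
# nothing like this exists in the Nova schema today.
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class VMwareIscsiTarget(Base):
    """One row per iSCSI target attached anywhere in the cluster."""
    __tablename__ = 'vmware_iscsi_targets'

    id = Column(Integer, primary_key=True)
    cluster_ref = Column(String(255), nullable=False)    # cluster managed object ref
    target_iqn = Column(String(255), nullable=False)      # e.g. an openstack volume IQN
    target_portal = Column(String(255), nullable=False)   # "ip:port" of the iSCSI portal
    auth_method = Column(String(16))                      # e.g. 'CHAP' when auth is used


if __name__ == '__main__':
    # Smoke test against an in-memory database.
    engine = create_engine('sqlite://')
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine)()
    session.add(VMwareIscsiTarget(cluster_ref='domain-c7',
                                  target_iqn='iqn.2010-10.org.openstack:volume-1',
                                  target_portal='10.0.0.5:3260'))
    session.commit()
    print(session.query(VMwareIscsiTarget).count())
```

Whether such state would live in a Nova table, host system metadata, or driver-private storage is exactly the open question the discussion below turns on.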
17:29:12 I don't think it's unreasonable for a driver to store persistent state.
17:29:36 i do not understand something here - cinder is responsible for the volume right?
17:29:47 garyk: Yes.
17:30:03 garyk: Nova is responsible for making it available to the vm.
17:30:06 so cinder should be aware of where the vm is running
17:30:22 then it can either perform the operation if possible or fail if not.
17:30:35 maybe i just do not understand the problems. sorry
17:30:37 garyk: No. The vm is running in the 'cluster'. Cinder should not have an internal view of the cluster.
17:30:58 mdbooth: cinder should. maybe the problem should be addressed there.
17:31:24 garyk: The VM may also move outside of openstack's control.
17:31:38 Because of HA or DRS, for eg.
17:31:38 mdbooth: why?
17:31:56 but if cinder was aware of that then it would not be a problem - for example with the vmdk driver
17:31:57 Or explicit vMotion by an admin.
17:31:58 mdbooth: purpose of the persistent state is...?
17:32:31 so we know which hosts need to rescan for new devices?
17:32:32 vuil: To store which targets need to be configured.
17:32:49 Poss also rescan.
17:33:21 A rescan is needed to discover new targets. A rescan discovers all discoverable targets
17:33:23 I haven't thought it through in detail, but I'm pretty sure doing it without persistent state would be a pain.
17:33:40 so in theory there is no need to track how many there are to discover
17:33:42 vuil: The target must also be added to the hba
17:33:54 When a new host joins the cluster, it won't have any configured targets
17:34:01 Or when a new target is added to host a
17:34:06 it won't also be added to host b
17:34:15 So you need to track both
17:34:16 Yeah sabari and I had a discussion about this.
17:35:21 One option is to wait till a new volume is added, and do the rescan
17:35:28 In theory you could scrape this from existing vm config, but that wouldn't be pretty or cheap
17:35:39 So technically it would be a denormalisation
17:36:11 Either way, you're going to want a reference golden state which is replicated to all hosts
17:36:13 vuil, you mean rescan each host every time we add a volume to 1 host?
17:36:22 Not just rescan
17:36:29 Also add new targets as required
17:36:31 Or remove them
17:36:43 yeah, I am assuming that's part of it.
17:36:59 but essentially whatever we are doing for existing hosts, do the same for a new one.
17:37:04 kinda punt the new host problem.
17:37:26 it doesn't participate in iscsi/rdm until a new volume is added
17:37:31 the new host can be specified in the doc
17:37:39 when you add a new host to your cluster, you need to scan
17:38:04 when a new volume is created and added to an instance, that already requires a rescan
17:38:17 Is there any vmware feature we can use for this?
17:38:28 It doesn't sound like a problem which is unique to us
17:38:41 Maybe some kind of profile
17:38:53 Then we could modify the profile
17:40:32 Without that, I think we can do this if we store targets and paths in some kind of persistent storage
17:40:49 tbh, I think that we rescan each host every time and when we attach we make sure that we take a host that is powered on and we specify in the doc that when we add a new host, we need to scan
17:40:54 the problem is solved
17:40:59 Rescan is pretty expensive, btw
17:41:07 Many seconds
17:41:36 yeah, but that needs to be done at minimum for one host
17:42:39 the patch that arnaud added should have the correct host for the instance
17:42:47 would that not suffice?
17:43:01 garyk: No, because it won't continue to work if the vm moves for any reason.
17:43:19 not enough for the vmotion case and the new host case
17:43:28 And that move will not necessarily involve openstack.
17:43:58 it just seems like an edge case that the admin will move it to a host that does not see the correct device
17:44:06 I am leaning towards advocating that vmotion-enablement be done out of band by some admin action
17:44:32 vuil: Or DRS, or HA
17:44:40 by that I mean one can asynchronously rescan the hba, and the VMs will be eventually vmotionable.
17:44:58 my understanding would be that all of the hosts should be able to see the same devices. if not then it seems to be a setup issue
17:45:03 DRS should not try and fail to vmotion the VM to a host that cannot see the device
17:45:21 garyk: It absolutely is a setup issue. The setup needs to be done by Nova.
17:45:28 so until some other host discovers the iscsi devices, the VM is essentially pinned (but temporarily)
17:45:49 mdbooth: why is the setup done by nova?
17:46:06 vuil: That's fine. But we still need to ensure that the config is propagated eventually.
17:46:20 i would think that it is done out of band by someone who wants to increase the capacity of their cloud - they just add another host to the cluster
17:47:34 I think: config stored persistently somewhere (where?). Config on target host updated synchronously. Config on all other hosts updated asynchronously. Scan detects new hosts and auto adds all targets.
17:48:12 garyk: iscsi volumes can come and go. e.g. cinder create ... creates one.
17:48:24 garyk: It's not feasible for an admin.
17:49:01 Scan process could also be responsible for async update.
17:49:05 it is starting to sound like cinder should be responsible for this
17:49:15 Cinder can't be responsible for this.
17:49:21 why?
17:49:23 Cinder doesn't control the cluster.
17:49:27 yeah because cinder doesn't know about the hosts
17:49:29 in the cluster
17:49:36 it cannot trigger the rescan of the targets
17:49:37 but cinder can add this support?
17:49:52 garyk: Cinder in this case is an iscsi provider.
17:49:57 i just feel that we are trying to solve the problem in the wrong place
17:50:03 It knows nothing of what is consuming the iscsi volumes.
17:50:15 This is a consumption problem, not a provision problem.
17:50:19 agreed
17:50:42 it receives a request to provide a resource. why can it not be 'clever' about that provisioning
17:51:16 Shall we continue this on the list? 9 mins for other topics.
17:51:24 the iscsi logic in cinder should not be aware of vmware
17:51:27 because the cinder driver is not VC aware
17:51:33 mdbooth: we could also move to openstack-vmware to continue
17:51:37 in real-time
17:51:42 Ok
17:52:19 ok, i was not aware that the cinder driver was not aware of vmware
17:52:24 lets do that
17:52:33 garyk it's not the vmdk driver in cinder
17:52:38 it's the lvm iscsi driver
17:52:44 i am currently working on the esx deprecation.
17:53:01 there are a number of issues there. i will send a mail to the list at some stage or another
17:53:09 the other topics i am aware of are backporting due to the refactor. I think we decided that once phase 1 is complete, gary will ask the stable core if we can backport the refactor. the other issue was the api changes mdbooth was mentioning
17:53:46 mdbooth: do you want to discuss this more here or was it just a heads up?
17:53:46 what api changes?
17:53:47 quick question: what is the advantage of backporting the refactor?
17:53:57 there is some churn in phase one, but the main upheaval happens in phase 2/3
17:54:06 facilitate backports
17:54:06 arnaud: we will need to add in bug fixes to the stable branch.
17:54:10 tjones: We need agreement to move forward. I've dropped it because I don't know how to proceed.
17:54:25 refactor related: https://review.openstack.org/#/c/97170/
17:54:31 Different refactor
17:54:42 Quick opinions on this?
17:54:51 I similarly dislike vim_util, btw
17:55:00 i posted mine on the review
17:55:14 garyk: Yeah, want to move code changes to another patch.
17:55:16 mdbooth +1 I cannot agree more with this patch
17:55:19 seems fine, and fairly orthogonal to the current refactor work
17:55:47 mdbooth: why move them to another patch when you can do them on this one?
17:56:11 yes this and the power_off are orphans of phase 1 - but i hesitate to link them and block phase 2
17:56:23 garyk: Because if you're reviewing it, it's simpler to see: this patch moves code around, that patch changes the code.
17:56:30 The 2 things are reviewed differently.
17:56:47 Also, if you're backporting, you can more easily pick out code changes.
17:56:55 cheaper to review a "move only" patch
17:57:03 Otherwise you're left manually scanning big chunks of very similar code.
17:57:10 if so then just address the log comments
17:57:30 garyk: That said, I agreed with all your comments. Will do another patch.
17:57:41 i.e. separate patch.
17:57:57 no, those should be done in this one
17:58:57 ok 2 minutes - i think we still have some chatting to do, so lets move over to openstack-vmware to continue
17:59:06 those are in the process of being changed in https://review.openstack.org/91352 .
18:00:05 gotta end now
18:00:13 moving to openstack-vmware
18:00:13 k
18:00:16 #endmeeting
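For reference, a rough sketch of the out-of-band target propagation and rescan discussed in the iSCSI portion of the meeting, written against pyVmomi. AddInternetScsiSendTargets, RescanAllHba and RescanVmfs are HostStorageSystem calls in the vSphere API; the function name, cluster traversal, and power-state check are illustrative assumptions, not the driver's actual code.

```python
# Rough sketch (not the driver's actual code) of the out-of-band propagation
# discussed above: push a new iSCSI send target to every powered-on host in a
# cluster, then rescan. AddInternetScsiSendTargets, RescanAllHba and RescanVmfs
# are HostStorageSystem calls in the vSphere API; the surrounding structure is
# an illustrative assumption.
from pyVmomi import vim


def propagate_iscsi_target(cluster, portal_ip, portal_port=3260):
    """Add a send target to each host's software iSCSI HBA and rescan."""
    target = vim.host.InternetScsiHba.SendTarget(address=portal_ip,
                                                 port=portal_port)
    for host in cluster.host:
        # Powered-off hosts need the asynchronous catch-up path discussed in
        # the meeting once they come back.
        if host.runtime.powerState != 'poweredOn':
            continue
        storage = host.configManager.storageSystem
        for hba in host.config.storageDevice.hostBusAdapter:
            if isinstance(hba, vim.host.InternetScsiHba):
                storage.AddInternetScsiSendTargets(iScsiHbaDevice=hba.device,
                                                   targets=[target])
        # Expensive: a full HBA/VMFS rescan can take many seconds per host.
        storage.RescanAllHba()
        storage.RescanVmfs()
```

As noted in the meeting, the rescan step can take many seconds per host, which is why doing it asynchronously (and only for powered-on hosts, with new or returning hosts caught up later) was the direction being considered.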