17:01:29 <tjones> #startmeeting vmwareapi
17:01:30 <openstack> Meeting started Wed Aug 6 17:01:29 2014 UTC and is due to finish in 60 minutes. The chair is tjones. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:01:31 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:01:33 <openstack> The meeting name has been set to 'vmwareapi'
17:02:01 <tjones> anyone here today?
17:02:10 * mdbooth waves
17:02:12 <vuil> o/
17:02:13 <arnaud> o/
17:02:43 <tjones> hey guys - so it's all about reviews and bug fixing i think for us.
17:03:34 <garyk> hi
17:03:59 <tjones> it looks to me that the only BPs we will get in juno are spawn refactor, oslo.vmware, and v3 diags. anyone think anything different? - like spbm and vsan??
17:04:20 <mdbooth> tjones: I don't see it happening
17:04:49 <garyk> tjones: we are working on spbm and ephemeral - all have code posted
17:04:51 <vuil> vsan bp was approved too I thought, but with all the logjam of patches still needing reviews, yeah.
17:04:55 <tjones> spbm is set to "good progress" - garyk what do you think?
17:05:15 <garyk> tjones: code was completed about 8 months ago. we just need to rebase - i will do that tomorrow
17:05:35 <garyk> yesterday we had the oslo.vmware updated so the spbm code can now be used
17:05:51 <garyk> it is all above the oslo.vmware integration patch
17:05:55 <tjones> garyk: im spacing - where is the ephemeral one?? https://blueprints.launchpad.net/openstack?searchtext=vmware (just got back from vacation and still fuzzy)
17:05:59 <garyk> which may land in 2021
17:06:10 <tjones> lol
17:06:19 <garyk> https://review.openstack.org/109432
17:06:40 <vuil> the few patches using oslo.vmware to provide streamOptimized/vsan support are being updated right now as well.
17:06:41 <mdbooth> garyk: Incidentally, I spent the afternoon looking at bdm
17:06:52 <mdbooth> And I agree with you
17:07:05 <mdbooth> about ephemeral, that is
17:07:08 <garyk> ah, that is only relevant to libvirt?
17:07:18 <mdbooth> No, it's definitely relevant to us
17:07:30 <mdbooth> However, there's no need for it to be in this patch
17:07:42 <vuil> *missing context re bdm*
17:07:51 <mdbooth> block device mapping
17:07:52 <garyk> ah, ok. then next stuff i'll add in a patch after that
17:08:21 <mdbooth> garyk: See my big comment in spawn() about the driver behaviour being broken wrt bdm?
17:08:27 <mdbooth> I think it needs to be fixed along with that
17:08:48 <mdbooth> Probably quite an involved patch
17:08:51 <garyk> mdbooth: i think that the bdm support is broken in general
17:08:58 * mdbooth would be happy to write it, though
17:08:58 <garyk> but that is for another discussion -
17:09:03 <mdbooth> garyk: +1 :)
17:09:14 <tjones> ok so lets go through the BPs 1 by 1 (we can revisit bdm later on)
17:09:15 <tjones> https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/vmware-spawn-refactor,n,z
17:09:16 <vuil> I noted the same in my earlier phase 3 stuff too, so yeah this should be one of the first things to deal with after the refactor
17:09:24 <tjones> here's spawn
17:09:29 <garyk> after we chatted there was someone who wrote on irc that he did not manage to get it working in I…
17:10:21 <mdbooth> tjones: That's currently blocked on minesweeper
17:10:29 <tjones> arrggghhhhhh!!!
17:10:41 <mdbooth> Vui and I chatted just before this meeting about the phase 3 stuff
17:10:45 <tjones> down again?
17:11:03 <dansmith> tjones: hasn't it been down for a couple weeks now? or is it just really spotty?
17:11:07 <mdbooth> garyk had a handle on it, I believe
17:11:10 <tjones> is phase 3 in that list? im losing track
17:11:18 <dansmith> we've been asking about minesweeper status in -nova for at least two weeks
17:11:30 <garyk> dansmith: no, it has been up the last week. just a few problems the last 2 days
17:11:39 <tjones> dansmith: spotty. i literally got back from vacation just before this meeting. i'll get an update after
17:11:45 <mdbooth> tjones: Yes. Basically all Vui's patches are phase 3.
17:11:48 <dansmith> garyk: hmm, okay I've recheck-vmware'd a few things and never get responses
17:12:08 <garyk> dansmith: that is due to the fact that the queue is very long due to the fact it was down for a while
17:12:24 <dansmith> okay, I wish we could see the queue so we'd know, but... okay
17:12:33 <vuil> Matt had some nice suggestions, I will be taking on those and posting an update to the phase 3 chain of patches.
17:12:36 <garyk> it was averaging about 14 patches a day due to infra issues
17:12:41 <mdbooth> +1
17:13:00 <mdbooth> I understand it's on internal infrastructure, though
17:13:57 <tjones> ok im assuming oslo.vmware is also blocked on minesweeper
17:14:23 <vuil> and reviews obv
17:14:31 <garyk> tjones: all of those patches had +1's, then we needed to rebase and it was at a time when ms was down. so back to square 1.
17:14:35 <dansmith> tjones: we talked last week about not approving any without them
17:14:50 <dansmith> them == votes
17:14:57 <tjones> yes - i agree we need minesweeper runs
17:15:02 <dansmith> I've been trying to come back to patches I've -2d regularly to check for minesweeper votes
17:15:27 <dansmith> I definitely don't want to be the guy that -2s for that and then holds us up after MS shows up :)
17:15:35 <tjones> :-D
17:15:52 <garyk> dansmith: it is understood. rules are rules
17:16:02 <tjones> ok here's the complete list of our stuff out for review
17:16:03 <garyk> we just need to get our act together with minesweeper
17:16:05 <tjones> #link https://review.openstack.org/#/q/status:open+project:openstack/nova+message:vmware,n,z
17:16:12 <mdbooth> Hmm, I have an actual -1 from dansmith
17:16:15 <garyk> but when it is up it would be nice if the patches could get some extra eyes
17:16:22 <dansmith> mdbooth: frame it!
17:16:26 <mdbooth> In phase 2 spawn refactor
17:16:32 <tjones> we still need our team to be reviewing like mad so when MS comes back we are ready
17:16:32 <vuil> @dansmith: on a related note, even when minesweeper passes we have seen -1 from xenserver CI quite a bit despite rechecks. does that -1 factor into the filtering for reviewable things?
17:16:34 <mdbooth> dansmith: I'm honoured :)
17:16:42 <dansmith> vuil: not to me
17:16:55 * mdbooth can address that tomorrow
17:16:57 <dansmith> vuil: everyone gets -1s from xen ci right now :)
17:17:10 <garyk> just note that ms does not run on patches in the test directory
17:17:11 <dansmith> vuil: they're working hard on that too
17:17:29 <garyk> so a patch like https://review.openstack.org/105454 should not be blocked
17:17:37 <vuil> ah got it
17:17:43 <dansmith> garyk: yeah, that makes sense to me
17:18:20 <tjones> anything else on BPs?
17:19:21 <tjones> *listening*
17:19:38 <tjones> ok lets talk about bugs
17:19:41 <tjones> #topic bugs
17:19:50 <tjones> #link http://tinyurl.com/p28mz43
17:19:53 <tjones> we have 59
17:20:04 <tjones> a number that is not going down
17:20:27 <tjones> we have a number of these in new
17:20:34 <tjones> or triaged state
17:21:01 <garyk> tjones: a lot of them have been triaged and a lot are in progress and a lot have been completed.
17:21:08 <garyk> we need to do a cleanup
17:21:12 <tjones> i filtered out completed
17:21:23 <garyk> some are also very concerning - basically the multi cluster stuff breaks a lot of things
17:21:34 <tjones> these are only new, in progress, triaged, and confirmed
17:21:45 <garyk> i am in favor of pushing rado's patch which drops the support as we discussed at the summit
17:22:44 <tjones> HP raised a lot of concerns about that as i recall
17:23:36 <mdbooth> The principal concern was memory usage on vsphere, right?
17:23:46 <garyk> yeah, i asked that they write to the list so that we can get some discussion going about it and there was nothing
17:23:54 <mdbooth> I saw that
17:23:57 <garyk> that was one - but it is something that can be addressed
17:23:57 <tjones> i thought they posted something
17:24:12 <garyk> my main concern is that each compute node has its own cache dir - that is very costly
17:24:18 <tjones> i thought it was spinning up an n-compute for each cluster they did not like, and the image cache
17:24:41 <mdbooth> So, I happened to read a tripleo thing about deploying multiple novas per node earlier
17:24:48 <garyk> tjones: i do not recall seeing anything. if someone did can you please forward the mail message to me
17:24:59 <mdbooth> That would solve a provisioning issue, but not the memory usage thing
17:25:04 <tjones> https://www.mail-archive.com/openstack-dev@lists.openstack.org/msg29338.html
17:25:38 <mdbooth> garyk: I think that's something we're going to have to live with until we get inter-node locking
17:26:36 <tjones> i think the biggest issue is what to do with existing customers that have deployed the current way. what is the upgrade path?
17:26:44 <mdbooth> garyk: Random thought: how likely is the backing store to do de-duplication?
17:26:49 <garyk> tjones: thanks
17:27:10 <vuil> @mdbooth storage-vendor specific
17:27:17 <mdbooth> Right, but is it common?
17:27:46 <garyk> mdbooth: sorry i do not understand
17:27:51 <vuil> *not entirely sure*
17:28:08 <mdbooth> Would suck to have to recommend it, for sure
17:29:19 <mdbooth> tjones: Is anybody looking at that, btw?
17:30:21 <tjones> mdbooth: looking at??
17:30:30 <tjones> upgrade?
17:30:33 <mdbooth> tjones: yeah
17:30:40 <tjones> not that i am aware of
17:30:58 <tjones> i think we have to tackle that before we do this change though
17:31:21 <mdbooth> Is anybody currently motivated to do it?
17:31:37 <mdbooth> s/motivated/motivatable/
17:31:47 <vuil> so the path we are talking about is provide inter-node locking then make image caching more efficient
17:31:47 <tjones> i dont see any way around saying "you have to reconfigure your n-compute"
17:32:11 <vuil> before taking out multi-cluster?
17:32:19 <mdbooth> tjones: Right. Presumably also db hacking.
17:32:24 <mdbooth> Much nastiness.
17:32:30 <tjones> yes
17:32:33 <garyk> mdbooth: there is no need for db hacking
17:32:40 <garyk> there is a patch for the esx migration
17:32:49 <garyk> an external utility
17:33:03 <garyk> https://review.openstack.org/101744
17:34:03 <tjones> so the issues with this are 1. soap takes too much memory / connection (could be solved with pyvmomi) 2. image cache duplication 3. upgrade path.
17:34:06 <tjones> right?
17:34:12 <garyk> btw there is also a patch for the actual esx deprecation
17:34:32 <mdbooth> tjones: I don't see how pyvmomi would solve 1
17:34:43 <tjones> doesn't it use a different transport?
17:34:47 <tjones> not soap
17:34:49 <mdbooth> If it can, then we can presumably solve it without pyvmomi
17:34:54 <mdbooth> Don't think so
17:35:07 <vuil> no, still soap, but on a more lightweight stack
17:35:27 <mdbooth> More lightweight on the client side, no difference to the server
17:35:29 <vuil> as in we may save some memory usage by taking out suds
17:35:40 <vuil> *remains to be seen*
17:35:47 <mdbooth> Our problem is on the server, though, iiuc
17:36:02 <vuil> hp was concerned about multiple computes each taking up lots of resources.
17:36:35 <garyk> my concerns with the multi cluster support are edge cases - for example the resize issue
17:36:37 <mdbooth> vuil: That was resources on vcenter, though, right?
17:36:38 <vuil> I don't think server impact is going to be much
17:36:44 <tjones> i thought they were concerned on the client side - running multiple n-compute
17:36:57 * mdbooth might have misunderstood this
17:36:58 <vuil> no, actual python n-cpu processes taking up memory.
17:37:05 <garyk> https://review.openstack.org/108225
17:37:05 <tjones> yeah that is what i thought
17:37:17 <dansmith> I thought I heard something about server side too
17:37:27 <dansmith> because each connection to vcenter comes with a lot of overhead
17:37:30 <vuil> in terms of load on VC it is pretty much the same whether it comes from one ncpu managing N clusters or N ncpu managing one each
17:37:34 <mdbooth> ~140MB in the driver
17:37:37 <mdbooth> Got it
17:37:39 <dansmith> so one compute using one connection for lots of machines
17:37:44 <mdbooth> Ok, that's way more manageable
17:38:22 <vuil> @dansmith, the addition of a couple more connections should not be too big of a deal.
17:38:31 <dansmith> vuil: okay
17:39:19 <mdbooth> So, tripleo were talking about deploying multiple novas per server in separate containers
17:39:57 <tjones> even if we can decrease the client side memory load we still have the duplicate cache and upgrade. the duplicate cache we could solve by using a shared datastore for glance - right?
17:40:03 <mdbooth> And in the grand scheme of things, anybody deploying 32 VMware clusters worth of openstack isn't going to notice the cost of 4GB of RAM
17:40:27 <mdbooth> Although it's inelegant to waste it, it's probably not a huge deal
17:41:11 <vuil> mdbooth: my thoughts as well.
17:42:06 <mdbooth> Duplicate cache:
17:42:18 <mdbooth> Only an issue for clusters sharing datastores
17:43:16 <mdbooth> Otherwise the cache would be duplicated anyway
17:43:47 <tjones> we need to get to a place where we can implement this change without screwing HP's existing customers…
17:44:13 <garyk> tjones: yes, they are in production and this would break an installation
17:44:24 <tjones> yep
17:44:51 <dansmith> so,
17:45:06 <dansmith> sounds like maybe we should put something into the juno release notes that such an arrangement is deprecated
17:45:17 <dansmith> to give time and notice so we can get something into kilo?
17:45:17 <tjones> i was just typing that very thing
17:45:25 <dansmith> cool
17:45:29 <tjones> we should deprecate this to give them some time
17:45:45 <dansmith> is there a config variable that would go away that we can also document as deprecated?
17:46:09 <dansmith> and, despite its limited usefulness, we should also log.warning("this is going away soon") if they have that configured
17:46:16 <dansmith> per usual protocol
17:46:22 <garyk> no, there is no specific config var
17:46:22 <mdbooth> The one which selects clusters, presumably
17:46:34 <tjones> yeah - i think it's a list
17:46:36 <garyk> we can identify if there is more than one cluster configured
17:46:48 <dansmith> so, iirc, the patch changed it from a list to a string, which we can't do anyway
17:46:52 <garyk> but i think that we should address the issue on the list with the guys from hp
17:47:16 <mdbooth> self._cluster_names = CONF.vmware.cluster_name
17:47:17 <dansmith> so: 1. document in release notes, 2. log.warning() if len(list)>1 ?
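[Editor's note: a minimal sketch of the deprecation warning dansmith proposes above - warn when more than one cluster is configured. The option name `cluster_name` comes from the snippet mdbooth pasted; the helper function and message text are hypothetical, not the actual nova patch.]

```python
import logging

LOG = logging.getLogger(__name__)


def warn_if_multi_cluster(cluster_names):
    """Log a deprecation warning when the (deprecated) multi-cluster
    configuration is in use, i.e. CONF.vmware.cluster_name lists more
    than one cluster. Returns True if the warning was emitted."""
    if cluster_names and len(cluster_names) > 1:
        LOG.warning("Support for multiple clusters per nova-compute is "
                    "deprecated and may be removed in a future release; "
                    "configured clusters: %s", cluster_names)
        return True
    return False


# e.g. in the driver's __init__, roughly:
#   warn_if_multi_cluster(CONF.vmware.cluster_name)
```

This keeps the option a list (so no config-format change in juno) and only starts the deprecation timer, matching "document in release notes, log.warning() if len(list)>1".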
17:47:25 <mdbooth> Hmm
17:47:34 <mdbooth> So it's still going to be called 'cluster_name', presumably
17:47:44 <mdbooth> Except it's no longer going to accept a list
17:47:47 <garyk> https://github.com/openstack/nova/blob/master/nova/virt/vmwareapi/driver.py#L60
17:47:51 <mdbooth> That's ugly
17:47:59 <dansmith> mdbooth: yeah, that was my complaint, IIRC
17:48:06 <garyk> i think that this needs more discussion prior to us doing a deprecation warning
17:48:41 <tjones> we really should reply back on that thread
17:48:46 <dansmith> garyk: IMHO, we mark as deprecated to force the conversation, nothing says we *have* to yank it in kilo if not ready
17:48:53 <garyk> an existing customer will also need an upgrade patch for cached images.
17:48:58 <tjones> to keep kiran in the loop - this time is not very good for him
17:49:10 <garyk> dansmith: i think that is not a good approach
17:49:24 <garyk> why force something until we have thought it out properly
17:49:50 <dansmith> garyk: well, because I feel like we've decided that it's coming out, we just have to figure out how it's going to look when it's out
17:50:01 <dansmith> garyk: and because if we're going to deprecate, starting the timer limits when we can actually remove it
17:50:15 <garyk> no, we have not decided
17:50:28 <garyk> a few people have but the general community has some issues with this
17:50:30 <dansmith> we agreed at summit, no?
17:50:51 <garyk> at the summit we discussed it in a small room. after the summit people started to raise issues
17:51:04 <garyk> are we not allowed to change things after problems and issues are raised?
17:51:15 <garyk> is that going to build a healthy community discussion?
17:51:40 <dansmith> sure, we can.. I didn't think any of the discussion here was considering the option of not doing it
17:51:50 <dansmith> but that's fine, see how the ML thread goes
17:51:50 <garyk> some people who work on the driver were not able to attend the summit, and only after they were aware of the discussion did they raise their issues
17:52:08 <garyk> and i think that they have come up with some valid arguments
17:52:09 <dansmith> but we should make sure to revisit before too late in juno, merely for the deprecation timer ... timing :)
17:52:27 <garyk> to be honest i am kicking myself for not having found the problems when we originally added the feature, but we all approved that
17:52:42 <tjones> ok the way we got to this discussion was because we were talking about bugs
17:52:48 <tjones> #link http://preview.tinyurl.com/kkyw9c4
17:52:50 <garyk> i will follow up on the list tomorrow about this
17:53:05 <tjones> of the 59 there are 20 that are not owned by anyone and some are high prio
17:53:27 <tjones> garyk thanks for following up
17:54:33 <tjones> so - 6 minutes left. please do reviews, fix bugs, etc.
17:54:37 <tjones> #topic open discussion
17:54:40 <tjones> anything else?
17:54:48 <garyk> tjones: i honestly do not like the multi cluster support and would be happy to see it dropped, but we need to find something that works :)
17:55:10 <tjones> garyk: i don't disagree at all
17:55:38 <mdbooth> tjones: Any chance of getting more hardware for minesweeper?
17:55:55 <tjones> mdbooth: it is on order (and has been for a while). it will get here in sept
17:56:03 <tjones> it's a slllllooooowwwww process
17:56:11 * mdbooth has been there
17:56:15 <garyk> mdbooth: my understanding is that there is a request for more hardware
17:56:24 <garyk> and we all know how long that can take
17:56:29 <tjones> dansmith: im going to see if we can figure out how to get external access to the minesweeper status and queue
17:56:48 <dansmith> tjones: just scraping it and POSTing to an external site would be enough
17:56:58 <dansmith> tjones: just so we have some indication of whether we should ping you, or wait, or... :)
17:57:03 <tjones> yes that is what i am thinking - put it in the same place as my bug list
17:57:07 <dansmith> yeah
17:57:25 <tjones> if only i could get a free RAX vm....
17:57:33 <tjones> :-)
17:57:48 <mdbooth> :)
17:58:09 <tjones> ok i have nothing else - anyone??
17:58:14 <mdbooth> tjones: Email it somewhere?
17:58:18 <dansmith> heaven forbid, the multi-billion dollar company pay $12/mo for hosting :)
17:58:31 <tjones> lol
17:58:40 <mdbooth> dansmith: Do you know how much the lawyers to approve $12/mo hosting cost?
17:58:41 <tjones> it's $14 :-D
17:58:59 <dansmith> tjones: https://www.digitalocean.com/pricing/
17:59:20 <dansmith> tjones: $5 would be plenty for this :P
17:59:25 <tjones> nice! cheaper than AWS
17:59:31 <dansmith> mdbooth: I used to work for IBM, I know all about this :)
18:00:04 <mdbooth> We're all the same :)
18:00:08 <tjones> ok i think we are done - thanks folks!
18:00:34 <garyk> have a good one
18:01:00 <mdbooth> g'night
18:01:53 <mdbooth> #endmeeting ?
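[Editor's note: a rough sketch of dansmith's "scrape it and POST it" suggestion for exposing the minesweeper queue. The queue page URL, its format, and the publish endpoint are all hypothetical - the real internal queue page layout is not described in the log.]

```python
import json
import re
import urllib.request

# Both URLs are placeholders, not real endpoints.
QUEUE_URL = "http://minesweeper.internal/queue"       # hypothetical internal page
STATUS_URL = "https://example.com/minesweeper.json"   # hypothetical public endpoint


def summarize_queue(html):
    """Build a queue summary by finding Gerrit change numbers in the page.

    Matches both review.openstack.org/NNNNNN and
    review.openstack.org/#/c/NNNNNN style links."""
    changes = re.findall(r"review\.openstack\.org/(?:#/c/)?(\d+)", html)
    return {"queue_length": len(set(changes)),
            "changes": sorted(set(changes))}


def publish(summary):
    """POST the summary as JSON to the public status endpoint."""
    req = urllib.request.Request(
        STATUS_URL,
        data=json.dumps(summary).encode(),
        headers={"Content-Type": "application/json"},
        method="POST")
    urllib.request.urlopen(req)
```

A cron job running `publish(summarize_queue(urllib.request.urlopen(QUEUE_URL).read().decode()))` every few minutes would give reviewers the "should we ping you, or wait" signal discussed above.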