17:01:29 #startmeeting vmwareapi
17:01:30 Meeting started Wed Aug 6 17:01:29 2014 UTC and is due to finish in 60 minutes. The chair is tjones. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:01:31 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:01:33 The meeting name has been set to 'vmwareapi'
17:02:01 anyone here today?
17:02:10 * mdbooth waves
17:02:12 o/
17:02:13 o/
17:02:43 hey guys - so it's all about reviews and bug fixing i think for us.
17:03:34 hi
17:03:59 it looks to me that the only BPs we will get in juno are spawn refactor, oslo.vmware, and v3 diags. anyone think anything different? - like spbm and vsan??
17:04:20 tjones: I don't see it happening
17:04:49 tjones: we are working on spbm and ephemeral - all have code posted
17:04:51 vsan bp was approved too I thought, but with all the logjam of patches still needing reviews, yeah.
17:04:55 spbm is set to "good progress" - garyk what do you think?
17:05:15 tjones: code was completed about 8 months ago. we just need to rebase - i will do that tomorrow
17:05:35 yesterday we had oslo.vmware updated so the spbm code can now be used
17:05:51 it is all above the oslo.vmware integration patch
17:05:55 garyk: im spacing - where is the ephemeral one?? https://blueprints.launchpad.net/openstack?searchtext=vmware (just got back from vacation and still fuzzy)
17:05:59 which may land in 2021
17:06:10 lol
17:06:19 https://review.openstack.org/109432
17:06:40 the few patches using oslo.vmware to provide streamOptimized/vsan support are being updated right now as well.
17:06:41 garyk: Incidentally, I spent the afternoon looking at bdm
17:06:52 And I agree with you
17:07:05 about ephemeral, that is
17:07:08 ah, that is only relevant to libvirt?
17:07:18 No, it's definitely relevant to us
17:07:30 However, there's no need for it to be in this patch
17:07:42 *missing context re bdm*
17:07:51 block device mapping
17:07:52 ah, ok. then next stuff i'll add in a patch after that
17:08:21 garyk: See my big comment in spawn() about the driver behaviour being broken wrt bdm?
17:08:27 I think it needs to be fixed along with that
17:08:48 Probably quite an involved patch
17:08:51 mdbooth: i think that the bdm support is broken in general
17:08:58 * mdbooth would be happy to write it, though
17:08:58 but that is for another discussion -
17:09:03 garyk: +1 :)
17:09:14 ok so lets go through the BPs 1 by 1 (we can revisit bdm later on)
17:09:15 https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/vmware-spawn-refactor,n,z
17:09:16 I noted the same in my earlier phase 3 stuff too, so yeah this should be one of the first things to deal with after the refactor
17:09:24 here's spawn
17:09:29 after we chatted there was someone who wrote on irc that he did not manage to get it working in I…
17:10:21 tjones: That's currently blocked on minesweeper
17:10:29 arrggghhhhhh!!!
17:10:41 Vui and I chatted just before this meeting about the phase 3 stuff
17:10:45 down again?
17:11:03 tjones: hasn't it been down for a couple weeks now? or is it just really spotty?
17:11:07 garyk had a handle on it, I believe
17:11:10 is phase 3 in that list? im losing track of
17:11:18 we've been asking about minesweeper status in -nova for at least two weeks
17:11:30 dansmith: no, it has been up. just the last week. just a few problems last 2 days
17:11:39 dansmith: spotty. i literally got back from vacation just before this meeting. i'll get an update after
17:11:45 tjones: Yes. Basically all Vui's patches are phase 3.
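
(Re: the bdm exchange at 17:06-17:09 - for context, a minimal sketch of what "handling bdm in spawn()" refers to. This is not the vmwareapi driver code; the dict keys follow the usual Nova block_device_info shape, but everything else is illustrative.)

    # Illustrative only: a Nova virt driver's spawn() is handed a
    # block_device_info dict describing volumes to attach; a driver with
    # working bdm support attaches those instead of always booting from the
    # glance image.
    def spawn(context, instance, image_meta, block_device_info=None):
        block_device_info = block_device_info or {}
        mappings = block_device_info.get('block_device_mapping', [])
        if not mappings:
            print("no volumes mapped: boot from the image root disk")
            return
        for bdm in mappings:
            # A driver honouring bdm would attach the volume here rather
            # than unconditionally creating a disk from the image.
            print("would attach %(connection_info)s at %(mount_device)s" % bdm)
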
17:11:48 garyk: hmm, okay I've recheck-vmware'd a few things and never get responses
17:12:08 dansmith: that is due to the fact that the queue is very long due to the fact it was down for a while
17:12:24 okay, I wish we could see the queue so we'd know, but... okay
17:12:33 Matt had some nice suggestions, I will be taking on those and posting an update to the phase 3 chain of patches.
17:12:36 it was averaging about 14 patches a day due to infra issues
17:12:41 +1
17:13:00 I understand it's on internal infrastructure, though
17:13:57 ok im assuming oslo.vmware is also blocked on minesweeper
17:14:23 and reviews obv
17:14:31 tjones: all of those patches had +1's, then we needed to rebase and it was at a time when ms was down. so back to square 1.
17:14:35 tjones: we talked last week about not approving any without them
17:14:50 them == votes
17:14:57 yes - i agree we need minesweeper runs
17:15:02 I've been trying to come back to patches I've -2d regularly to check for minesweeper votes
17:15:27 I definitely don't want to be the guy that -2s for that and then holds us up after MS shows up :)
17:15:35 :-D
17:15:52 dansmith: it is understood. rules are rules
17:16:02 ok here's the complete list of our stuff out for review
17:16:03 we just need to get our act together with minesweeper
17:16:05 #link https://review.openstack.org/#/q/status:open+project:openstack/nova+message:vmware,n,z
17:16:12 Hmm, I have an actual -1 from dansmith
17:16:15 but when it is up it would be nice if the patches could get some extra eyes
17:16:22 mdbooth: frame it!
17:16:26 In phase 2 spawn refactor
17:16:32 we still need our team to be reviewing like mad so that when MS comes back we are ready
17:16:32 @dansmith: on a related note, even when minesweeper passes, we have seen -1 on xenserver CI quite a bit despite rechecks, does that -1 factor into the filtering for reviewable things?
17:16:34 dansmith: I'm honoured :)
17:16:42 vuil: not to me
17:16:55 * mdbooth can address that tomorrow
17:16:57 vuil: everyone gets -1s from xen ci right now :)
17:17:10 just note that ms does not run on patches in the test directory
17:17:11 vuil: they're working hard on that too
17:17:29 so a patch like https://review.openstack.org/105454 should not be blocked
17:17:37 ah got it
17:17:43 garyk: yeah, that makes sense to me
17:18:20 anything else on BP?
17:19:21 *listening*
17:19:38 ok lets talk about bugs
17:19:41 #topic bugs
17:19:50 #link http://tinyurl.com/p28mz43
17:19:53 we have 59
17:20:04 a number that is not going down
17:20:27 we have a number of these in new
17:20:34 or triaged state
17:21:01 tjones: a lot of them have been triaged and a lot are in progress and a lot have been completed.
17:21:08 we need to do a cleanup
17:21:12 i filtered out completed
17:21:23 some are also very concerning - basically the multi cluster stuff breaks a lot of things
17:21:34 these are only new, in progress, triaged, and confirmed
17:21:45 i am in favor of pushing rado's patch which drops the support as we discussed at the summit
17:22:44 HP raised a lot of concerns about that as i recall
17:23:36 The principal concern was memory usage on vsphere, right?
17:23:46 yeah, i asked that they write to the list so that we can get some discussion going about it and there was nothing
17:23:54 I saw that
17:23:57 that was one - but it is something that can be addressed
17:23:57 i thought they posted something
17:24:12 my main concern is that each compute node has its own cache dir - that is very costly
17:24:18 i thought it was spinning up an n-compute for each cluster they did not like, and the image cache
17:24:41 So, I happened to read a tripleo thing about deploying multiple novas per node earlier
17:24:48 tjones: i do not recall seeing anything. if someone did can you please forward the mail message to me
17:24:59 That would solve a provisioning issue, but not the memory usage thing
17:25:04 https://www.mail-archive.com/openstack-dev@lists.openstack.org/msg29338.html
17:25:38 garyk: I think that's something we're going to have to live with until we get inter-node locking
17:26:36 i think the biggest issue is what to do with existing customers that have deployed the current way. what is the upgrade path?
17:26:44 garyk: Random thought: how likely is the backing store to do de-duplication?
17:26:49 tjones: thanks
17:27:10 @mdbooth storage-vendor specific
17:27:17 Right, but is it common?
17:27:46 mdbooth: sorry i do not understand
17:27:51 *not entirely sure*
17:28:08 Would suck to have to recommend it, for sure
17:29:19 tjones: Is anybody looking at that, btw?
17:30:21 mdbooth: looking at??
17:30:30 upgrade?
17:30:33 tjones: yeah
17:30:40 not that i am aware of
17:30:58 i think we have to tackle that before we do this change though
17:31:21 Is anybody currently motivated to do it?
17:31:37 s/motivated/motivatable/
17:31:47 so the path we are talking about is: provide inter-node locking, then make image caching more efficient
17:31:47 i dont see any way around saying "you have to reconfigure your n-compute"
17:32:11 before taking out multi-cluster?
17:32:19 tjones: Right. Presumably also db hacking.
17:32:24 Much nastiness.
17:32:30 yes
17:32:33 mdbooth: there is no need for db hacking
17:32:40 there is a patch for the esx migration
17:32:49 an external utility
17:33:03 https://review.openstack.org/101744
17:34:03 so the issues with this are 1. soap takes too much memory / connection (could be solved with pyvmomi) 2. image cache duplication 3. upgrade path.
17:34:06 right?
17:34:12 btw there is also a patch for the actual esx deprecation
17:34:32 tjones: I don't see how pyvmomi would solve 1
17:34:43 doesn't it use a different transport?
17:34:47 not soap
17:34:49 If it can, then we can presumably solve it without pyvmomi
17:34:54 Don't think so
17:35:07 no still soap, but on a more lightweight stack
17:35:27 More lightweight on the client side, no difference to the server
17:35:29 as in we may save some memory usage by taking out suds
17:35:40 *remains to be seen*
17:35:47 Our problem is on the server, though, iiuc
17:36:02 hp was concerned about multiple computes each taking up lots of resources.
17:36:35 my concerns about the multi cluster support are edge cases - for example the resize issue
17:36:37 vuil: That was resources on vcenter, though, right?
17:36:38 I don't think server impact is going to be much
17:36:44 i thought they were concerned about the client side - running multiple n-compute
17:36:57 * mdbooth might have misunderstood this
17:36:58 no, actual python n-cpu processes taking up memory.
17:37:05 https://review.openstack.org/108225
17:37:05 yeah that is what i thought
17:37:17 I thought I heard something about server side too
17:37:27 because each connection to vcenter comes with a lot of overhead
17:37:30 in terms of load on VC it is pretty much the same whether it comes from one ncpu managing N clusters or N ncpu managing one each
17:37:34 ~140MB in the driver
17:37:37 Got it
17:37:39 so one compute using one connection for lots of machines
17:37:44 Ok, that's way more manageable
17:38:22 @dansmith, addition of a couple more connections should not be too big of a deal.
17:38:31 vuil: okay
17:39:19 So, tripleo were talking about deploying multiple novas per server in separate containers
17:39:57 even if we can decrease the client side memory load we still have the duplicate cache and upgrade. the duplicate cache we could solve by using a shared datastore for glance - right?
17:40:03 And in the grand scheme of things, anybody deploying 32 VMware clusters worth of openstack isn't going to notice the cost of 4GB of RAM
17:40:27 Although it's inelegant to waste it, it's probably not a huge deal
17:41:11 mdbooth: my thoughts as well.
17:42:06 Duplicate cache:
17:42:18 Only an issue for clusters sharing datastores
17:43:16 Otherwise the cache would be duplicated anyway
17:43:47 we need to get to a place where we can implement this change without screwing HP's existing customers...
17:44:13 tjones: yes, they are in production and this would break an installation
17:44:24 yep
17:44:51 so,
17:45:06 sounds like maybe we should put something into the juno release notes that such an arrangement is deprecated
17:45:17 to give time and notice so we can get something into kilo?
17:45:17 i was just typing that very thing
17:45:25 cool
17:45:29 we should deprecate this to give them some time
17:45:45 is there a config variable that would go away that we can also document as deprecated?
17:46:09 and, despite its limited usefulness, we should also log.warning("this is going away soon") if they have that configured
17:46:16 per usual protocol
17:46:22 no, there is no specific config var
17:46:22 The one which selects clusters, presumably
17:46:34 yeah - i think it's a list
17:46:36 we can identify if there is more than one cluster configured
17:46:48 so, iirc, the patch changed it from a list to a string, which we can't do anyway
17:46:52 but i think that we should address the issue on the list with the guys from hp
17:47:16 self._cluster_names = CONF.vmware.cluster_name
17:47:17 so: 1. document in release notes, 2. log.warning() if len(list)>1 ?
17:47:25 Hmm
17:47:34 So it's still going to be called 'cluster_name', presumably
17:47:44 Except it's no longer going to accept a list
17:47:47 https://github.com/openstack/nova/blob/master/nova/virt/vmwareapi/driver.py#L60
17:47:51 That's ugly
17:47:59 mdbooth: yeah, that was my complaint, IIRC
17:48:06 i think that this needs more discussion prior to us doing a deprecation warning
17:48:41 we really should reply back on that thread
17:48:46 garyk: IMHO, we mark as deprecated to force the conversation, nothing says we *have* to yank it in kilo if not ready
17:48:53 an existing customer will also need an upgrade patch for cached images.
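
(A minimal sketch of the log.warning() idea from 17:45-17:47, keyed off the CONF.vmware.cluster_name list quoted at 17:47:16. The helper name and message wording are illustrative, not the actual proposed patch.)

    import logging

    LOG = logging.getLogger(__name__)

    def warn_if_multi_cluster(cluster_names):
        # Warn at driver start-up when more than one cluster is configured,
        # per the "log.warning() if len(list) > 1" suggestion above.
        names = cluster_names or []
        if len(names) > 1:
            LOG.warning("Managing multiple clusters (%s) from one nova-compute "
                        "is deprecated; run one nova-compute per cluster.",
                        ", ".join(names))

    # e.g. called from the driver's __init__ with CONF.vmware.cluster_name
    warn_if_multi_cluster(["cluster-a", "cluster-b"])
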
17:48:58 to keep kiran in the loop - this time is not very good for him
17:49:10 dansmith: i think that is not a good approach
17:49:24 why force something until we have thought it out properly
17:49:50 garyk: well, because I feel like we've decided that it's coming out, we just have to figure out how it's going to look when it's out
17:50:01 garyk: and because if we're going to deprecate, starting the timer limits when we can actually remove it
17:50:15 no, we have not decided
17:50:28 a few people have but the general community has some issues with this
17:50:30 we agreed at summit, no?
17:50:51 at the summit we discussed it in a small room. after the summit people started to raise issues
17:51:04 are we not allowed to change things after problems and issues are raised?
17:51:15 is that going to build a healthy community discussion?
17:51:40 sure, we can.. I didn't think any of the discussion here was considering the option of not doing it
17:51:50 but that's fine, see how the ML thread goes
17:51:50 some people who work on the driver were not able to attend the summit and only after they were aware of the discussion did they raise their issues
17:52:08 and i think that they have come up with some valid arguments
17:52:09 but we should make sure to revisit before too late in juno, merely for the deprecation timer ... timing :)
17:52:27 to be honest i am kicking myself for not having found the problems when we originally added the feature, but we all approved that
17:52:42 ok the way we got to this discussion was because we were talking about bugs
17:52:48 #link http://preview.tinyurl.com/kkyw9c4
17:52:50 i will follow up on this list tomorrow about this
17:53:05 of the 59 there are 20 that are not owned by anyone and some are high prio
17:53:27 garyk thanks for following up
17:54:33 so - 6 minutes left. please do reviews, fix bugs, etc.
17:54:37 #topic open discussion
17:54:40 anything else?
17:54:48 tjones: i honestly do not like the multi cluster support and would be happy to see it dropped, but we need to find something that works :)
17:55:10 garyk: i don't disagree at all
17:55:38 tjones: Any chance of getting more hardware for minesweeper?
17:55:55 mdbooth: it is on order (and has been for a while). it will get here in sept
17:56:03 it's a slllllooooowwwww process
17:56:11 * mdbooth has been there
17:56:15 mdbooth: my understanding is that there is a request for more hardware
17:56:24 and we all know how long that can take
17:56:29 dansmith: im going to see if we can figure out how to get external access to the minesweeper status and queue
17:56:48 tjones: just scraping it and POSTing to an external site would be enough
17:56:58 tjones: just so we have some indication on whether we should ping you, or wait, or... :)
17:57:03 yes that is what i am thinking - put it in the same place as my bug list
17:57:07 yeah
17:57:25 if only i could get a free RAX vm....
17:57:33 :-)
17:57:48 :)
17:58:09 ok i have nothing else - anyone??
17:58:14 tjones: Email it somewhere?
17:58:18 heaven forbid, the multi-billion dollar company pay $12/mo for hosting :)
17:58:31 lol
17:58:40 dansmith: Do you know how much the lawyers to approve $12/mo hosting cost?
17:58:41 it's $14 :-D
17:58:59 tjones: https://www.digitalocean.com/pricing/
17:59:20 tjones: $5 would be plenty for this :P
17:59:25 nice! cheaper than AWS
17:59:31 mdbooth: I used to work for IBM, I know all about this :)
18:00:04 We're all the same :)
18:00:08 ok i think we are done - thanks folks!
18:00:34 have a good one
18:01:00 g'night
18:01:53 #endmeeting
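
(On the 17:56 idea of scraping the Minesweeper status and POSTing it to an external site, a rough sketch of what that could look like. Both URLs are placeholders, and a real version would probably parse out the queue length rather than mirroring the raw page.)

    import requests

    STATUS_URL = "http://minesweeper.example.internal/status"  # placeholder
    PUBLIC_URL = "http://example.com/minesweeper-status"        # placeholder

    def mirror_status():
        # Fetch the internal status page and re-publish it somewhere visible
        # to reviewers outside the internal network.
        resp = requests.get(STATUS_URL, timeout=30)
        resp.raise_for_status()
        requests.post(PUBLIC_URL, data={"status": resp.text}, timeout=30)

    if __name__ == "__main__":
        mirror_status()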