17:01:33 <hartsocks> #startmeeting VMwareAPI
17:01:33 <garyk> that was quick
17:01:34 <openstack> Meeting started Wed Nov 20 17:01:33 2013 UTC and is due to finish in 60 minutes.  The chair is hartsocks. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:01:35 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:01:37 <tjones> lol
17:01:37 <openstack> The meeting name has been set to 'vmwareapi'
17:01:47 <tjones> ok we are done!
17:01:53 <hartsocks> I didn't realize it would take the whole name...
17:01:54 <garyk> tjones: and we completed it without incident :)
17:02:01 <tjones> awesome!
17:02:10 <garyk> chat to you guys next week ...
17:02:11 <hartsocks> :-)
17:02:25 <rgerganov> hi guys
17:02:58 <hartsocks> hey. Who else is around?
17:04:49 <hartsocks> Okay. Well, we'll just hope people who can jump in will.
17:04:53 <ogelbukh> hartsocks: o/
17:04:58 <hartsocks> #topic bugs
17:05:08 <hartsocks> #link https://bugs.launchpad.net/nova/+bugs?field.tag=vmware+&field.status%3Alist=NEW
17:05:41 <hartsocks> We've got 3 bugs I've not managed to cycle back to yet.
17:05:49 <hartsocks> I haven't set a priority on:
17:06:00 <hartsocks> #link https://bugs.launchpad.net/nova/+bug/1240355
17:06:03 <uvirtbot> Launchpad bug 1240355 in nova "Broken pipe error when copying image from glance to vSphere" [Undecided,New]
17:06:09 <garyk> hartsocks: what do you mean we have 3 bugs?
17:06:15 <tjones> that is intermittent
17:06:20 <hartsocks> Because I've had no luck reproducing it.
17:06:22 <tjones> 3 non-triaged bugs
17:06:32 <garyk> ok, thanks for the clarification
17:06:35 <hartsocks> We have 3 bugs without priority. Sorry, I dropped a word.
17:06:57 <hartsocks> So 1240355 in nova "Broken pipe error when copying image from glance to vSphere"
17:07:01 <tjones> #link https://bugs.launchpad.net/nova/+bug/1252827
17:07:03 <uvirtbot> Launchpad bug 1252827 in nova "VMWARE: Intermittent problem with stats reporting" [Undecided,New]
17:07:08 <hartsocks> I can't repro. But, I understand others have.
17:07:26 <garyk> i have yet to reproduce this one, but it is critical in my opinion
17:07:38 <tjones> this one is critical - it's blocking our CI.
17:07:41 <hartsocks> which one? reporting or intermittent pipe error?
17:07:45 <garyk> if someone has reproduced then they should set it as confirmed
17:07:51 <tjones> sorry - should have finished pipe discussion 1st
17:08:06 <garyk> the statistics - the reason is that if the stats are not reported then VM's cannot be launched
17:08:09 <hartsocks> Well, I thought Gary had reproduced the pipe thing.
17:08:26 <hartsocks> yeah. That's pretty bad. So we can call that "High" then.
17:08:33 <garyk> which bug are we talking about?
17:08:47 <hartsocks> *lol*
17:08:49 <tjones> let's finish the pipe discussion and move on - my bad
17:08:57 <garyk> tjones: ok. np
17:09:02 <hartsocks> So the pipe bug.
17:09:13 <hartsocks> Gary. You said you've seen it? I can't make it happen.
17:09:35 <garyk> i have seen this one a few times. not as of late
17:09:54 <garyk> i am marking it as confirmed as i have seen it on my setup
17:09:59 <hartsocks> So… it could be gone now for all we know?
17:10:14 <hartsocks> Okay.
17:10:23 <hartsocks> When it happens is it bad?
17:10:30 <hartsocks> Can you keep working?
17:10:51 <garyk> yes, when it happens one is unable to boot a VM and you need to try again. That is bad in my opinion.
17:11:12 <hartsocks> so is the cloud dead from that point on or does it recover?
17:11:34 <hartsocks> … and yet three groups have tried to repro. this and can't.
17:12:02 <garyk> the cloud is not dead. the current VM being booted fails.
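For context, the failure mode described above is a one-off boot failure rather than a dead cloud: the copy from glance to vSphere hits a broken pipe and the operator has to retry. Below is a minimal sketch of the kind of retry-on-transient-socket-error mitigation being implied; the function name and retry policy are hypothetical and this is not the driver's actual code.

```python
# Illustrative only: a generic retry wrapper for a transient "broken pipe"
# during an image copy. Names and policy are hypothetical, not nova code.
import errno
import socket
import time


def copy_with_retry(copy_fn, attempts=3, delay=2.0):
    """Call copy_fn(), retrying a few times on transient broken-pipe errors."""
    for attempt in range(1, attempts + 1):
        try:
            return copy_fn()
        except socket.error as exc:
            # EPIPE is the classic "broken pipe"; back off briefly and
            # retry the copy rather than failing the whole boot request.
            if exc.errno != errno.EPIPE or attempt == attempts:
                raise
            time.sleep(delay * attempt)


copy_with_retry(lambda: "image copied")   # succeeds on the first attempt
```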
17:12:09 <hartsocks> well, I'm calling that Medium for now and moving on to a more important thing then.
17:12:21 <hartsocks> #link https://bugs.launchpad.net/nova/+bug/1252827
17:12:23 <uvirtbot> Launchpad bug 1252827 in nova "VMWARE: Intermittent problem with stats reporting" [Undecided,New]
17:12:28 <hartsocks> tracy go!
17:12:36 <tjones> CRITICAL!  blocking CI
17:12:58 <hartsocks> awesome.
17:13:04 <tjones> and someone from this TZ should take it on so we can work with ryan and sreeram on it.  gary was looking, but its at the end of his day
17:13:23 <tjones> sabari was taking it but he was sick yesterday
17:13:27 <hartsocks> Technically, we can't put things under "Critical" since our driver's failures don't hit critical in the grand OpenStack scheme of things.
17:13:35 <tjones> ok super high ;-)
17:13:47 <garyk> tjones: i looked at the log files that they provided but there was only one concerning thing - an image was unable to be deleted
17:13:50 <hartsocks> So this I'll tack to the vmwareapi subteam thingy.
17:13:53 <tjones> but it is affecting CI which makes it important
17:14:11 <garyk> the bug is critical
17:14:51 <hartsocks> yeah. I guess now that our CI is feeding the Openstack infra we can claim that. Let's do it and see what happens.
17:14:58 <garyk> i think that we need to provide a debug version to sreeram and ryan and see where it goes from there.
17:15:04 <garyk> i'll give them something soon
17:15:11 <tjones> i don't know if sabari is here today - if not who wants to take it?  I can't as i have a prep for a demo tomorrow :-P
17:15:30 <garyk> i can look into it in the coming hours
17:15:53 <hartsocks> #action Gary to follow up on blocking bug/1252827
17:15:56 <hartsocks> cool.
17:15:57 <tjones> thanks garyk - you'll need to pass it on to someone at the end of your day
17:16:05 <garyk> tjones: ok, np
17:16:47 <hartsocks> okay, let's not drop that one then.
17:16:58 <hartsocks> last one… then on to other topics
17:17:05 <hartsocks> #link https://bugs.launchpad.net/nova/+bug/1251501
17:17:06 <uvirtbot> Launchpad bug 1251501 in nova "VMware: error when booting sparse images" [Undecided,New]
17:17:18 <hartsocks> Where did we go on that one?
17:17:20 <vuil> hi just checking in.
17:17:30 <hartsocks> hey.
17:17:31 <tjones> that looks like a backport issue (on my part) it does not happen in master
17:17:33 <hartsocks> Just in time.
17:17:36 <tjones> i'll take a look
17:17:41 <hartsocks> cool.
17:17:55 <vuil> cool. was wondering what Ryan meant
17:18:06 <hartsocks> so that might not be an issue.
17:18:09 <hartsocks> good.
17:18:17 <tjones> not for master :-)
17:18:45 <hartsocks> Okay.
17:18:48 <hartsocks> So bugs in general.
17:19:25 <hartsocks> #link https://bugs.launchpad.net/nova/+bug/1195139
17:19:27 <uvirtbot> Launchpad bug 1195139 in nova "vmware Hyper  doesn't report hypervisor version correctly to database" [Critical,In progress]
17:19:32 <hartsocks> This one popped into my attention.
17:19:43 <garyk> that is fixed and waiting for review
17:19:48 <hartsocks> Gary, you moved the priority on this and it looks like you got some push back in review.
17:19:55 <garyk> it was −2 due to the invalid bug status 'won't fix'
17:20:07 <garyk> it was fixed and validated by a QE engineer at HP who encountered the problem
17:20:09 <hartsocks> #link https://review.openstack.org/#/c/53109/
17:20:29 <garyk> i have updated the bug status and hopefully the reviewer will understand
17:20:57 <garyk> please excuse my spelling - i am trying to do three things at once and not succeeding in any of them
17:21:01 <hartsocks> So I remember this issue is that the idea of numbers for versions doesn't work universally.
17:21:28 <garyk> at the moment no one has provided a hypervisor version that does not work with the numbers.
17:21:32 <hartsocks> I understand we can't fix upstream … so this is an attempt to solve a blocking problem.
17:21:49 <hartsocks> So the argument is that it's an academic argument.
17:22:10 <garyk> upstream there are 2 issues
17:22:11 <hartsocks> For example: W13k isn't the version of any existing hypervisor.
17:22:14 <garyk> 1. the bug fix
17:22:22 <garyk> 2. changing the way that it is used in the db
17:22:34 <garyk> the latter was not liked for reasons unbeknownst to me
17:22:41 <garyk> in the short term we should go with #1
17:23:04 <hartsocks> If they won't change the DB then I suppose we're forced to do it this way.
17:23:24 <garyk> sadly this is what we have until someone makes the changes to the db.
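For reference, option #1 above boils down to packing the dotted version string the hypervisor reports into a single integer that fits the existing database column. The following is a minimal sketch of that conversion, assuming the usual major/minor/patch packing; the exact scheme in the patch under review may differ.

```python
# Minimal sketch of packing a dotted hypervisor version such as "5.1.0"
# into one integer for the database column. Illustrative only; the real
# patch may use a different scheme.
def version_to_int(version):
    """Convert 'major.minor.patch' into e.g. 5001000 for '5.1.0'."""
    parts = [int(p) for p in version.split('.')]
    while len(parts) < 3:          # pad "5.1" out to "5.1.0"
        parts.append(0)
    major, minor, patch = parts[:3]
    return major * 1000000 + minor * 1000 + patch


assert version_to_int('5.1.0') == 5001000
assert version_to_int('5.5') == 5005000
```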
17:23:33 <hartsocks> Why is it critical? Is it blocking something?
17:23:41 <garyk> at this stage i am not sure if anyone is actually doing that work
17:23:55 <garyk> it is critical - one cannot use postgres with our hypervisor
17:25:05 <hartsocks> I'm expecting we'll get a knock on the head for calling it critical but I'll let it ride just to see where the line is.
17:25:37 <garyk> if others disagree they can change the severity
17:25:42 <hartsocks> It's a shame this got knocked around so long if it's critical.
17:26:18 <hartsocks> Do you need someone to reach out to Joe since he's the one who blocked you?
17:26:35 <hartsocks> I'll take that off line.
17:26:36 <garyk> i sent him a mail and wrote on the review. if you could reach out to him it will be great.
17:26:50 <garyk> thanks
17:27:04 <hartsocks> #action follow up on why bug/1195139 is blocked.
17:27:18 <hartsocks> Okay.
17:27:32 <hartsocks> So does anyone else have a bug we need to discuss and sync up on?
17:28:09 * hartsocks politely listens for people who are slower on the keys.
17:29:42 <hartsocks> okay. We'll have open discussion at the end.
17:29:47 <hartsocks> #topic blueprints
17:29:52 <hartsocks> #link https://blueprints.launchpad.net/nova?searchtext=vmware
17:30:04 <hartsocks> So let's do this.
17:30:42 <hartsocks> Let's pick a blueprint and discuss it and its priority to the project, and try to do a different one each week.
17:31:06 <garyk> hartsocks: sadly all bps are set as low until 2 cores jump in...
17:31:14 <hartsocks> yes.
17:31:19 <hartsocks> For background...
17:31:50 <hartsocks> if you weren't following, the BP process changed. So that if you want higher than "low" priority you have to get 2 core developers signed up for your BP.
17:32:24 <hartsocks> In all fairness this doesn't mean a BP won't get done, it just means it won't get done as fast. Virtually all our BP were "low" last round and many still made it in.
17:33:01 <hartsocks> On the link I posted...
17:33:11 <tjones> so our config checker (which i have done nothing on) didn't make this list cause it doesn't say vmware in it…
17:33:28 <garyk> i just hope that we manage to convince the core guys to jump in and review our stuff. at the moment it seems to be going as usual. an example is the diagnostics. russel has been very helpful here
17:33:28 <hartsocks> *all* but one of our BP have been bumped *below* "low" priority BTW.
17:33:43 <hartsocks> tjones: it's a dumb query, best I have.
17:33:50 <tjones> that's ok - i'll work around it
17:34:11 <hartsocks> tjones: post your BP so we can pick on… er… discuss it.
17:34:15 <hartsocks> :-)
17:34:32 <tjones> https://blueprints.launchpad.net/nova/+spec/config-validation-script
17:34:37 <tjones> it's OUR BP ;-)
17:34:51 <hartsocks> Actually, it's *awesome* that this happened.
17:35:06 <ogelbukh> :)
17:35:12 <tjones> i just put vmware in it ;-)
17:35:14 <hartsocks> I called my old BP on the configuration validator for the driver defunct. This is much better.
17:36:08 <tjones> ogelbukh did a nice job of capturing requirements in https://etherpad.openstack.org/p/w5BwMtCG6z
17:36:42 <ogelbukh> we have 2 distinct parts in it
17:36:47 <hartsocks> tjones: make sure to link that into the BP.
17:36:56 <tjones> just did
17:37:10 <hartsocks> ogelbukh: go ahead
17:37:56 <ogelbukh> first is modifications to common config
17:38:25 <ogelbukh> additional flag types and validations
17:38:52 <tjones> i like the idea of doing that part in oslo and auto generating
17:38:54 <ogelbukh> there are multiple blueprints along those lines
17:39:22 <ogelbukh> and i think vmware part will be the first one to implement as it's first use case
17:39:40 <hartsocks> Yeah. The validation and config-check thing is a cross-cutting concern even for VMwareAPI-related drivers...
17:39:54 <hartsocks> the folks on Cinder have some of the same validation checks the folks on Nova will.
17:39:55 <ogelbukh> second part is standalone tool capable of per-service validation
17:40:22 <ogelbukh> of cross-services consistency
17:41:29 <hartsocks> My chief concern about validation at service start up and validation in a stand alone tool… was that this would be mostly the same code … so I wanted to see code reuse to avoid duplicate work.
17:41:48 <tjones> absolutely
17:42:05 <ogelbukh> my idea right now is that it should 'register' service config or something like that
17:42:15 <hartsocks> So I take it this is going to be part Oslo-level work and part work at the driver level?
17:42:19 <ogelbukh> and validate against 'known' configs
17:42:26 <ogelbukh> but that has implications
17:42:44 <ogelbukh> and I'm still trying to identify all of them
17:43:24 <ogelbukh> hartsocks: tjones: I'm not sure that validation logic will be the same for those 2 parts
17:43:35 <hartsocks> Okay.
17:43:38 <tjones> config validation?
17:44:02 <ogelbukh> with oslo.config part it is mostly additional types and regexp matching
17:44:21 <hartsocks> Hmm...
17:44:34 <ogelbukh> while in cross-services part we'll have to inspect semantics
17:45:40 <ogelbukh> logical connections between services
17:45:49 <ogelbukh> that are not explicit in the code
17:45:55 <tjones> service validation at runtime would get deeper into the config but config validation can be done either place
17:46:02 <tjones> 2 different things to attack
17:46:03 <ogelbukh> sure
17:46:05 <ogelbukh> yes
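To make the oslo.config side of this concrete: the "additional flag types and regexp matching" idea amounts to checking each option's value against a pattern before the service starts. Here is a rough standalone sketch; the option names and patterns are illustrative, not the real nova [vmware] schema, and a real implementation would hook into oslo.config rather than a plain dict.

```python
# Rough sketch of regex-based config validation. Option names and
# patterns below are illustrative, not the actual [vmware] options.
import re

VALIDATORS = {
    'host_ip': re.compile(r'^\d{1,3}(\.\d{1,3}){3}$'),   # IPv4 only here
    'host_username': re.compile(r'^\S+$'),               # non-empty, no spaces
    'datastore_regex': re.compile(r'.+'),                 # must be non-empty
}


def validate(section):
    """Return a list of (option, value) pairs that fail their pattern."""
    failures = []
    for option, pattern in VALIDATORS.items():
        value = section.get(option, '')
        if not pattern.match(value):
            failures.append((option, value))
    return failures


# A missing or malformed option shows up in the failure list.
print(validate({'host_ip': '10.0.0.5', 'host_username': 'admin'}))
```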
17:46:27 <hartsocks> should we have a separate session to discuss this in depth?
17:46:30 <tjones> so how best to work on this further?  another irc meeting ??
17:46:35 <tjones> lol - read my mind
17:46:38 <ogelbukh> i believe so )
17:46:42 <hartsocks> :-)
17:47:00 <ogelbukh> we could have a call in webex or google hangouts if you like
17:47:21 <ogelbukh> but time windows are really narrow
17:47:23 <hartsocks> #action set up meeting (in IRC or otherwise) for ogelbukh, tjones, hartsocks, (and anyone else interested) to discuss config validation
17:47:27 <ogelbukh> given i'm in utc+4
17:47:34 <hartsocks> Yeah.
17:47:47 <tjones> yes they are very narrow - are you in australia?
17:47:47 <ogelbukh> and 12 hours difference with PST
17:47:55 <hartsocks> Well I have a teeny tiny baby… so sometimes 8pm to midnight EST is the best time for me.
17:47:55 <ogelbukh> no, Russia
17:48:00 <ogelbukh> Moscow TZ
17:48:04 <tjones> ah
17:48:11 <hartsocks> cool
17:48:15 <tjones> very cool
17:48:23 <hartsocks> sometimes snowy and cold even.
17:48:28 <ogelbukh> :)
17:48:28 <hartsocks> :-)
17:48:35 <ogelbukh> probably in 2 weeks )
17:48:43 <ogelbukh> so we could start with another irc
17:49:10 <hartsocks> We are holding #openstack-vmware for discussions people aren't 100% sure belong in #openstack-nova
17:49:14 <ogelbukh> ok
17:49:17 <ogelbukh> that's cool
17:49:21 <hartsocks> This is one of those that can probably go either place.
17:49:22 <ogelbukh> i'm already there
17:49:54 <hartsocks> So let's table that BP for now.
17:50:00 <hartsocks> #topic open discussion
17:50:36 <hartsocks> Last 10 minutes, for anything people need to call out.
17:50:46 * hartsocks listens
17:51:20 <garyk> fyi - i have given sreeram and ryan a debug version.
17:51:35 <garyk> troubling line of code - https://github.com/openstack/nova/blob/master/nova/virt/vmwareapi/host.py#L118
17:51:54 <garyk> i have seen exceptions indicating that they have datastore access problems
17:52:08 <tjones> and everything is returning 0
17:52:09 <garyk> sreeram had a very good idea of not resetting the stats - we just need to validate
17:52:40 <vuil> Can happen sometimes especially with NFS datastores.
17:53:13 <hartsocks> So, sometimes an NFS datastore is "not found" and then later it is?
17:53:21 * hartsocks boggles
17:53:21 <tjones> ugh
17:53:31 <vuil> yeah. transient network connectivity issues can cause that
17:53:39 <tjones> VC too then
17:53:43 * hartsocks nods knowingly
17:53:43 <tjones> wonder how they handle it
17:53:43 <garyk> i am not sure. hopefully after a run or 2 we'll have some debug info
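Sreeram's suggestion above amounts to keeping the last-known-good stats when the datastore query throws, instead of reporting zeros that make the scheduler think the host has no capacity. A hedged sketch of that idea follows, with illustrative names rather than the actual host.py code.

```python
# Sketch of the "don't reset the stats" idea: if a transient datastore
# error makes the capacity query fail, keep reporting the last known
# values instead of zeros. Names are illustrative, not host.py code.
class HostStats(object):
    def __init__(self, query_fn):
        self._query_fn = query_fn                       # returns e.g. {'free_gb': ...}
        self._last_good = {'total_gb': 0, 'free_gb': 0}

    def get_stats(self):
        try:
            self._last_good = self._query_fn()
        except Exception:
            # Transient failure (e.g. an NFS datastore briefly unreachable):
            # fall back to the previous values rather than zeroing them,
            # so the scheduler does not think the host has no capacity.
            pass
        return dict(self._last_good)


stats = HostStats(lambda: {'total_gb': 500, 'free_gb': 120})
print(stats.get_stats())
```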
17:54:03 <hartsocks> It's "the 7 fallacies of Network programming"
17:54:10 <hartsocks> or something like that.
17:54:35 <hartsocks> #link http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing
17:54:48 <hartsocks> Looks like I remembered wrong. There are 8.
17:54:53 <tjones> 1. the network is reliable
17:54:58 <hartsocks> *lol*
17:54:59 <hartsocks> yep.
17:55:34 <hartsocks> Is it *ironic* that our "cloud computing" code suffers from a lot of these?
17:55:56 <garyk> i think that they need to reevaluate after the advent of SDN
17:56:21 <hartsocks> there might be more?
17:56:39 <hartsocks> :-)
17:56:56 <garyk> :)
17:57:49 <hartsocks> I have nothing against short meetings.
17:58:00 <hartsocks> As I proved earlier. :-) … going once...
17:58:26 <hartsocks> … twice ...
17:59:09 <hartsocks> … three times...
17:59:13 <hartsocks> #endmeeting