15:02:15 <johnthetubaguy> #startmeeting XenAPI
15:02:15 <openstack> Meeting started Wed Dec 11 15:02:15 2013 UTC and is due to finish in 60 minutes.  The chair is johnthetubaguy. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:02:16 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:02:17 <johnthetubaguy> hello all
15:02:18 <openstack> The meeting name has been set to 'xenapi'
15:02:23 <johnthetubaguy> who is around today?
15:02:29 <thouveng> hi
15:02:57 <matel> hi
15:03:25 <johnthetubaguy> cool, so let's get cracking
15:03:33 <johnthetubaguy> #topic Blueprints
15:03:40 <BobBall> Sorry guys
15:03:45 <johnthetubaguy> anyone got anything to chat about blueprints?
15:03:56 <thouveng> yes :)
15:04:03 <johnthetubaguy> cool, fire away
15:04:19 <thouveng> I added a link to an etherpad and I don't know if it is a good practice
15:04:37 <BobBall> Could you just link to the bp thouveng ?
15:04:42 <BobBall> in here I mean
15:04:48 <BobBall> I'm being thick and can't find it
15:05:00 <thouveng> https://blueprints.launchpad.net/nova/+spec/pci-passthrough-xenapi
15:05:09 <thouveng> and the link to etherpad is https://etherpad.openstack.org/p/pci-passthrough-xenapi
15:05:22 <BobBall> Perfect
15:05:28 <BobBall> that's right isn't it johnthetubaguy ?
15:05:59 <johnthetubaguy> yeah, that looks good
15:06:29 <thouveng> ok cool.
15:07:13 <johnthetubaguy> are there any bits you want to discuss in that?
15:07:18 <johnthetubaguy> or just get a general review?
15:07:48 <thouveng> just a general review for the moment
15:08:15 <BobBall> I think it LGTM - but that might be because we've discussed it outside of the BP
15:08:17 <johnthetubaguy> what is all the hiding of PCI devices?
15:08:51 <johnthetubaguy> I thought all the hiding is implemented inside nova?
15:08:54 <BobBall> I'll let you answer thouveng - but if you want me to step in, let me know
15:09:21 <thouveng> yes please go because I didn't get the question sorry
15:09:54 <johnthetubaguy> thouveng: you say about passing a "hide" option to pciback module, why is that?
15:10:00 <BobBall> In order to do PCI pass through for PV guests in xen (and stably for HVM guests) the devices should use the pciback driver in dom0 to make sure dom0 doesn't use them for other things
15:10:20 <BobBall> Therefore they need to be "hidden" from the normal kernel boot so pciback can collect them
15:10:34 <BobBall> hence pciback.hide=(device_id) on the kernel command line
15:11:11 <BobBall> the KVM approach is to change the module dynamically but that's a little less stable - and you still have to enumerate the devices in nova.conf anyway
15:11:16 <BobBall> so the two might as well be combined
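For the record, the boot-time hiding described above looks roughly like this on the dom0 kernel command line; the PCI BDFs are illustrative, not from the meeting:

    # dom0 kernel command line: hand these devices to pciback instead of dom0
    # (illustrative BDFs)
    pciback.hide=(0000:04:00.0)(0000:04:00.1)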
15:12:25 <johnthetubaguy> OK, so I am not sure I understand what is going on there, I know there is a whitelist that tells nova which devices it can pass to a guest, and in the flavor we tell nova which devices to pass to a specific server
15:13:10 <BobBall> Post-whitelist, KVM will try to disconnect the device from dom0 and attach it to the KVM-equivalent of the pciback driver
15:13:16 <johnthetubaguy> oh, I see
15:13:36 <johnthetubaguy> so we have to disconnect some set of devices from dom0, and that's what we are updating?
15:13:37 <BobBall> All we're saying is we'll do that at boot time since that's better practice - particularly for xen
15:13:58 <BobBall> yes - but they are not "disconnected", just never connected to dom0, as it's a dom0 kernel option
15:14:15 <johnthetubaguy> right
15:14:28 <johnthetubaguy> so this is moving into host aggregates, and the config file is going away
15:14:33 <johnthetubaguy> does that create an issue?
15:14:42 <BobBall> what is moving into host aggregates?
15:15:08 <BobBall> and what config file?  do you mean the whitelist option in nova.conf?
15:15:26 <BobBall> but no, I'm sure it won't create a problem
15:15:36 <thouveng> there are two different things to configure for PCI passthrough. The whitelist is just used by the compute node.
15:16:02 <johnthetubaguy> yes, the configuration in nova.conf is going into the DB
15:16:08 <BobBall> well - the compute node uses the whitelist to report to the scheduler what it can provide, right?
15:16:13 <thouveng> so I think that it doesn't create any issue since we just replace the whitelist with boot command line detection.
15:16:15 <thouveng> BobBall: yes
15:16:17 <johnthetubaguy> that includes the whitelist I think
15:16:18 <BobBall> that won't be a problem
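For context, the Havana-era nova.conf options under discussion look like the sketch below; the vendor/product IDs and alias name are illustrative:

    # compute node: devices nova is allowed to pass through (illustrative IDs)
    pci_passthrough_whitelist = [{"vendor_id": "8086", "product_id": "1520"}]
    # controller: alias that flavors can request
    pci_alias = {"vendor_id": "8086", "product_id": "1520", "name": "niantic"}

A flavor then requests a device through an extra spec, e.g. nova flavor-key m1.large set "pci_passthrough:alias"="niantic:1".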
15:16:27 <BobBall> I think :P
15:16:45 <johnthetubaguy> erm, except the user now specifies the whitelist through an administration API?
15:16:53 <johnthetubaguy> as in rest API
15:17:31 <BobBall> That won't be possible in the current thinking - and possibly isn't possible _at all_ with Xen
15:18:16 <johnthetubaguy> hmm, so that's what we decided at the summit for PCI passthrough, oh dear...
15:18:42 <johnthetubaguy> so, it's not all bad, right… as long as you expose more PCI devices than you want in your nova-based whitelist
15:18:49 <BobBall> true
15:18:57 <BobBall> so it might need to be an intersection of the two
15:19:03 <BobBall> i.e. you have to configure in dom0 to expose it
15:19:03 <johnthetubaguy> cool, I think we are good
15:19:11 <johnthetubaguy> yeah, +1
15:19:14 <BobBall> and then nova can only use it if it's both there and in the whitelist
15:19:28 <johnthetubaguy> expose some in dom0, then you can configure some of those in the dynamic whitelist, it all works
15:19:33 <johnthetubaguy> yup
15:19:35 <johnthetubaguy> cool
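A minimal sketch of the behaviour agreed above, with hypothetical function and device names: nova may only use devices that are both assignable in dom0 and present in the whitelist.

    # Python sketch: the usable pool is the intersection of the two sets
    def usable_pci_devices(dom0_assignable, whitelist):
        """Return the PCI addresses nova can hand to guests."""
        return set(dom0_assignable) & set(whitelist)

    dom0 = ["0000:04:00.0", "0000:04:00.1", "0000:05:00.0"]  # hidden via pciback
    conf = ["0000:04:00.0", "0000:06:00.0"]                  # nova whitelist
    print(usable_pci_devices(dom0, conf))                    # {'0000:04:00.0'}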
15:19:39 <johnthetubaguy> any more blueprint stuff?
15:19:50 <BobBall> It might be possible with newer versions of XenServer btw - so Augusta or beyond - which use Xen 4.2+
15:20:04 <BobBall> but not with existing versions (which don't have the xl pci-assignable stuff)
15:20:39 <johnthetubaguy> ah, OK
15:20:41 <johnthetubaguy> good to know
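For reference, the Xen 4.2+ mechanism BobBall mentions is the xl pci-assignable family of commands; a sketch, with an illustrative BDF:

    # dom0, Xen 4.2+: dynamically hand a device to pciback, then list the pool
    xl pci-assignable-add 0000:04:00.0
    xl pci-assignable-list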
15:20:44 <thouveng> nice
15:20:47 <BobBall> Does that make sense thouveng?  I think that's right?
15:20:58 <thouveng> yes I think so
15:21:34 <johnthetubaguy> I would just get in there, and throw up some code, and we can help you through it
15:21:44 <johnthetubaguy> all sounding really good :)
15:22:03 <BobBall> good good
15:22:08 <johnthetubaguy> #topic Docs
15:22:11 <johnthetubaguy> any news?
15:22:33 <johnthetubaguy> #topic QA
15:22:49 <johnthetubaguy> matel: want to update us on the tempest gate work?
15:23:10 <matel> Yep, nodepool is not prepared for server restarts
15:23:24 <matel> so I'm proposing a patch, so that this concept fits in.
15:23:38 <johnthetubaguy> I am looking at the other side, so assuming we get nodepool sorted, and we have a VM setup, what can we do
15:23:59 <johnthetubaguy> at the same time, going to see what I can do with config drive to get IPs into it, inside rax cloud
15:24:08 <matel> If that's ready, 2 more items left on the list: 1.) prepare the instance for snapshotting 2.) come up with a localrc.
15:24:20 <johnthetubaguy> yep
15:24:22 <matel> Yes, config drive would save us some reboots
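A sketch of reading data back out of a mounted config drive; the mount point is an assumption, and on Havana-era drives the IP details typically lived in an injected network config file referenced from openstack/latest/meta_data.json:

    import json

    # assumes the config drive is already mounted at /mnt/config (assumption)
    with open("/mnt/config/openstack/latest/meta_data.json") as f:
        meta = json.load(f)

    # meta_data.json carries uuid/name/keys; network details are referenced
    # from it rather than embedded directly
    print(meta.get("uuid"), meta.get("name"))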
15:24:36 <johnthetubaguy> I am kinda looking at (2), in theory anyways
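A minimal localrc along the lines of item (2), assuming the standard devstack XenAPI variables; all values are placeholders:

    # devstack localrc sketch for a XenAPI compute node
    VIRT_DRIVER=xenserver
    XENAPI_CONNECTION_URL="http://<xenserver-host>"
    XENAPI_USER=root
    XENAPI_PASSWORD=<password>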
15:24:50 <matel> Apart from that, an email will go to the infra list with our ideas.
15:24:56 <johnthetubaguy> bobball: how is making tempest stable going?
15:25:00 <matel> Hopefully today.
15:25:04 <johnthetubaguy> matel: some fine work
15:25:35 <BobBall> in the RS cloud just waiting for Mate to collect the logs so we can try and figure out why it's not working properly up there but it works better here
15:25:38 <BobBall> anyway
15:25:53 <BobBall> I've got a whole heap of changes stuck waiting for review needed to get tempest stable + fast enough
15:26:05 <johnthetubaguy> BobBall: is that full tempest failing or smoke too?
15:26:13 <matel> Bob - what's up with changing to raring?
15:26:16 <BobBall> smoke in RS cloud fails
15:26:23 <BobBall> That's what I was just going to say - but I want saucy
15:26:39 <BobBall> anyway - the hopefully last issue is a kernel bug in precise
15:26:45 <matel> Oh, so you say we should try saucy?
15:26:45 <BobBall> which we've just confirmed as a kernel bug
15:26:50 <BobBall> yeah
15:26:53 <BobBall> saucy is newer
15:26:57 <BobBall> new = good, right?
15:27:03 <johnthetubaguy> something like that
15:27:04 <matel> I'm a bit confused, saucy is the latest?
15:27:08 <BobBall> Maybe raring is good enough
15:27:09 <BobBall> yes
15:27:12 <BobBall> raring = 13.04
15:27:14 <matel> It's not true in the software world.
15:27:15 <BobBall> saucy = 13.10
15:27:26 <johnthetubaguy> matel: what is in your XVA at the moment?
15:27:43 <matel> A debootstrapped install.
15:27:48 <matel> wait a sec...
15:27:53 <BobBall> Anyway - the kernel bug in precise causes a semaphore to be held when the userspace program finishes
15:27:58 <BobBall> causing all sorts of things to fail randomly
15:28:01 <johnthetubaguy> nasty
15:28:05 <BobBall> like lsof or lvs or anything really
15:28:15 <johnthetubaguy> eek, nice
15:28:20 <BobBall> which in turn (very disappointingly) means that tempest fails
15:28:22 <matel> john: which XVA are you asking about?
15:28:27 <matel> john: gimme url.
15:28:33 <BobBall> I thought all XVAs were precise?
15:28:43 <johnthetubaguy> matel: the one in your script in gerrit?
15:28:57 <matel> Ah, OK, I thought you were interested in the package list.
15:29:05 <matel> So yes, that's a precise.
15:29:24 <johnthetubaguy> OK, so there is (3) update xva to latest ubuntu
15:29:26 <matel> Guys, should we agree to go for saucy?
15:29:45 <johnthetubaguy> we should go for whatever works for you locally at the moment
15:29:50 <BobBall> anyway - the frustrating thing is that for some reason we don't seem to hit this kernel issue if we don't have one of my changes... but other things in tempest randomly fail without it
15:30:00 <johnthetubaguy> lol
15:30:06 <johnthetubaguy> that sucks
15:30:26 <BobBall> #link https://review.openstack.org/#/c/60253/ is the one that fixes some real nova failures and seems to somehow expose the kernel bug
15:30:38 <matel> that's not lol, Bob is losing his hair.
15:30:56 <matel> So, Saucy?
15:30:57 <matel> Bob?
15:30:59 <BobBall> It's true.  I have pulled most of it out in the last week.
15:31:03 <BobBall> I say yes matel
15:31:11 <BobBall> no point sticking on precise IMO
15:31:24 <johnthetubaguy> you make it go so fast… you hit a kernel bug
15:31:24 <johnthetubaguy> it happens to us all
15:31:28 <matel> Okay, I will go for that as well.
15:31:28 <BobBall> either saucy or just say "sod it" and go for centos like the rest of the infra jobs ;)
15:31:37 <matel> john - do you know if anyone is on saucy?
15:31:39 <BobBall> but that's a bigger change
15:31:51 <BobBall> I'm happy with trying raring if it's easier - e.g. exists in RS
15:31:52 <matel> Yep, I am afraid of the unknowns.
15:31:58 <johnthetubaguy> I dunno, can't remember what we use, don't think it's ubuntu
15:32:07 <matel> Okay, let's go with raring.
15:32:14 <BobBall> RS are standardising on Debian moving forward
15:32:15 <johnthetubaguy> raring is fine for now
15:32:24 <matel> wheezy?
15:32:40 <matel> Bob, do you think, wheezy would be a good option?
15:32:48 <BobBall> not sure which one matel ... antonym did tell me, but I can't remember
15:32:50 <johnthetubaguy> BobBall: thats all I remember, debian
15:33:01 <BobBall> but debian vs ubuntu yes
15:33:05 <johnthetubaguy> but some folks want centos, but hey
15:33:07 <matel> I'm just afraid of being the only team on the edge.
15:33:20 <johnthetubaguy> yeah, let's just pick something that works
15:33:25 <BobBall> maybe it was sid - Rackspace like being on the edge ;)
15:33:28 <johnthetubaguy> if it falls over, we pick something else, right
15:33:44 <matel> The problem is the cost of these probes, John.
15:34:02 <matel> It's quite expensive, so thinking for a while is a good idea.
15:34:13 <johnthetubaguy> sure, but we know precise is broken, I would rather pick an LTS, but whatever works for now
15:34:15 <BobBall> Can we just remove the XVA and run a dozen or so smokes overnight to see if raring works?
15:34:26 <johnthetubaguy> BobBall: +1
15:34:28 <johnthetubaguy> anyways
15:34:33 <johnthetubaguy> let's move on I think
15:34:51 <johnthetubaguy> not precise, as precise is broken for us; that seems OK for now
15:35:09 <johnthetubaguy> but let's leave that for now
15:35:16 <johnthetubaguy> we need the nodepool working first
15:35:30 <johnthetubaguy> let's get a failing test rather than no test
15:35:33 <matel> will add you to the reviewers.
15:35:39 <johnthetubaguy> cool, sounds good
15:35:55 <johnthetubaguy> any bugs that people want to talk about?
15:36:12 <BobBall> #link https://review.openstack.org/#/c/60808/ is always fun
15:36:20 <BobBall> got a very weird thing happening
15:36:24 <BobBall> but it's not the cause of the kernel bug
15:36:40 <BobBall> basically we get kernel messages
15:36:49 <BobBall> saying the device is in use by nova when we're trying to unplug it
15:36:53 <BobBall> and it leaks grant entries
15:37:04 <BobBall> which isn't a "problem" - but something that's very weird
15:37:16 <BobBall> the device _does_ unplug
15:37:27 <BobBall> because the next loop sees it as inactive and then it's ok
15:37:33 <BobBall> but compounded it might cause a problem
15:37:39 <BobBall> after hundreds of the g.e.'s leak
15:37:53 <BobBall> so I was trying to fix it with syncs and direct access to disks
15:37:57 <BobBall> all of which should prevent it
15:37:59 <BobBall> but it's not :(
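The kind of change being described, making sure dom0 writes actually reach the device before the VBD is unplugged, might look like the sketch below; the helper name and path handling are hypothetical, and only fsync is shown since O_DIRECT (the other approach mentioned) needs aligned buffers:

    import os

    def write_and_sync(path, data):
        """Write data and force it to the device before unplugging (sketch)."""
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
        try:
            os.write(fd, data)
            os.fsync(fd)  # flush the page cache so the unplug sees quiesced data
        finally:
            os.close(fd)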
15:38:00 <johnthetubaguy> yeah, might run out of handles or something...
15:38:06 <johnthetubaguy> hmm
15:38:11 <BobBall> not handles - but Xen will get very unhappy
15:38:25 <BobBall> I assert the changes I've made are good changes and worthwile to have
15:38:29 <BobBall> which is why I haven't pulled them
15:38:39 <BobBall> but they haven't fully fixed the issue I'm seeing
15:38:58 <BobBall> and I can't explain why because everything is so disconnected it's impossible to trace back to the nova code that's causing this
15:39:12 <BobBall> AND it only happens in parallel tempest at random times too
15:39:23 <johnthetubaguy> yuck
15:39:46 <BobBall> But maybe it'll be fixed by upgrading to the latest version of Ubuntu
15:39:58 <BobBall> And if it's not, we can just wait for 14.04 :)
15:40:12 <johnthetubaguy> yeah, sounds nasty, would PVHVM be better?
15:40:36 <BobBall> can't run that in RS cloud
15:40:49 <BobBall> I assume you mean HVM rather than PVH
15:41:16 <johnthetubaguy> true, damn half-working nested virt
15:41:16 <johnthetubaguy> yeah, HVM with PV drivers, but we can't do that either, I assume
15:41:33 <BobBall> PVH will be cool when it exists
15:41:38 <johnthetubaguy> +1
15:41:47 <johnthetubaguy> cool, so let's move on...
15:41:57 <johnthetubaguy> #topic Open Discussion
15:42:05 <johnthetubaguy> anything else for today's meeting?
15:42:32 <annegentle> Doc Bug Day 12/20 -- next Friday
15:42:34 <annegentle> follow the sun!
15:42:50 <BobBall> regarding direct IO for writing config drive that you just commented on, John - I don't have a bug that I can say is fixed by this, which is why it doesn't have a link
15:43:00 <johnthetubaguy> ah, my last day at work, that sounds like a good time to help update docs, I will put that in my diary
15:43:05 <BobBall> the change is one that we should be doing, but the symptoms I saw weren't fixed by this
15:43:06 <annegentle> It would be great to clean up/consolidate the Xen doc bugs, I think they're mostly tagged accurately. https://bugs.launchpad.net/openstack-manuals/+bugs?field.tag=xen
15:43:21 <annegentle> johnthetubaguy: yeah the timing is pretty cool. My team is gonna put in a movie in the afternoon :)
15:44:26 <johnthetubaguy> #action johnthetubaguy sort out doc bugs on doc day
15:45:12 <johnthetubaguy> cool, so I guess we are all done?
15:48:02 <johnthetubaguy> #endmeeting