15:01:11 <johnthetubaguy> #startmeeting XenAPI
15:01:12 <openstack> Meeting started Wed Dec  4 15:01:11 2013 UTC and is due to finish in 60 minutes.  The chair is johnthetubaguy. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:13 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:16 <johnthetubaguy> Hi everyone
15:01:17 <openstack> The meeting name has been set to 'xenapi'
15:01:19 <matel> hi
15:01:24 <BobBall> hi
15:01:25 <johnthetubaguy> who is around for today's meeting?
15:01:34 <thouveng> hi
15:01:42 <matel> I'm here
15:01:47 <johnthetubaguy> cool
15:01:51 <johnthetubaguy> so let's get cracking
15:02:05 <johnthetubaguy> #link https://wiki.openstack.org/wiki/Meetings/XenAPI
15:02:17 <johnthetubaguy> #topic Blueprints
15:02:32 <johnthetubaguy> Icehouse-1 is closing, and Icehouse-2 is starting
15:02:42 <johnthetubaguy> anyone got any worries or plans with that?
15:03:00 <johnthetubaguy> any blueprints we need to get in during Icehouse-2?
15:03:01 <BobBall> nope - but there is a BP that thouveng will want to talk about for I-2
15:03:11 <pvo> o/
15:03:13 <BobBall> first - hi thouveng !
15:03:24 <thouveng> hello everybody
15:03:30 <thouveng> hi BobBall
15:03:32 <johnthetubaguy> thouveng: hello! feel free to discuss your blueprint, and give a quick intro
15:03:44 <thouveng> now?
15:03:48 <johnthetubaguy> sure
15:03:53 <thouveng> ok
15:03:56 <BobBall> Just as a brief intro - thouveng is from bull.net who are interested in PCI pass through and potentially vGPU in future
15:04:19 <thouveng> So first here is the link: https://blueprints.launchpad.net/nova/+spec/pci-passthrough-xenapi
15:04:27 <johnthetubaguy> ah, cool
15:04:42 <thouveng> The goal of this bp is to add support for pci passthrough into xenapi driver
15:05:07 <thouveng> I see two tasks. First, add support for updating the status of the compute host
15:05:17 <thouveng> and second add the mechanism to attach a pci device to a VM when booting.
15:05:56 <BobBall> What do you mean by status of the host?  The PCI devices that it has available?
15:06:02 <johnthetubaguy> sounds good, I have added them as work items in the blueprint
15:06:20 <thouveng> The pci devices that are available for the pci passthrough
15:06:52 <BobBall> So what's the process for getting this prioritised / cores assigned, johnthetubaguy?
15:07:05 <johnthetubaguy> does the current structure in Nova look OK for wiring up to XenAPI?
15:07:25 <johnthetubaguy> BobBall: just target it to a milestone, and it should get reviewed, I am kinda doing that as we type
15:07:44 <thouveng> johnthetubaguy: Yes it seems to have all needed wires
15:08:15 <johnthetubaguy> cool, I will add a note in the blueprint saying we expect no new configuration settings, does that seem fair?
15:09:22 <thouveng> Yes. For the configuration I have some doubts, but I think I only need to catch pci devices that are passed on the dom0 command line at boot.
15:09:38 <thouveng> So I think that we don't need extra configuration.
15:09:53 <johnthetubaguy> thouveng: how are you going to wire up with xenapi, does it need a plugin?
15:10:19 <thouveng> johnthetubaguy: Yes exactly.
15:10:22 <johnthetubaguy> thouveng: that sounds right to me, the filtering and grouping that is getting added in nova can be wired up once it drops
15:10:49 <johnthetubaguy> thouveng: cool, it would be good to get these details in the blueprint, helps set expectations, and review the direction the implementation will take.
15:10:55 <johnthetubaguy> let me just add that in
15:10:56 <thouveng> I should be able to add the function into an existing plugin. So I just need to upgrade the plugin version.
15:11:36 <johnthetubaguy> BobBall: does this really need a plugin, are we missing some XenAPI functions for this stuff?
15:11:51 <BobBall> The issue is we need to list the PCI devices on the host
15:12:04 <BobBall> of course anything on the host can be exposed through XAPI - but XAPI doesn't do that ATM
15:12:17 <johnthetubaguy> right, I guess that was it, that's OK
15:12:23 <BobBall> so any use of a plugin could be considered "missing functionality" from XAPI
15:12:52 <BobBall> In particular thouveng is going to look at the boot command line and/or the output of lspci -Dv to see which modules are loaded for the devices
15:13:02 <BobBall> To me that feels too specific to be exposed through XAPI
15:13:05 <BobBall> and that's what plugins are for
15:13:14 <johnthetubaguy> yeah, probably
15:13:25 <johnthetubaguy> sounds good
15:13:28 <BobBall> thouveng isn't proposing a plugin that _modifies_ state from dom0 - just reads it
15:13:31 <johnthetubaguy> … final question
15:13:36 <BobBall> all the modification can be done through XAPI
15:14:09 <thouveng> agree
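For context, a minimal sketch of what such a read-only dom0 plugin call could look like, parsing `lspci -D -n` output to list the PCI devices on the host; the function name, argument convention, and returned fields are illustrative only, not the actual nova plugin interface:

    # Illustrative sketch only: list PCI devices visible in dom0 by parsing
    # `lspci -D -n`. Real plugins follow nova's XenAPI plugin conventions;
    # the names and fields here are hypothetical.
    import subprocess

    def get_pci_device_list(session, args):
        """Return one dict per PCI device (address, vendor, product)."""
        output = subprocess.check_output(['lspci', '-D', '-n']).decode()
        devices = []
        for line in output.splitlines():
            # Example line: "0000:00:1f.2 0106: 8086:1c03 (rev 04)"
            parts = line.split()
            if len(parts) < 3:
                continue
            vendor_id, product_id = parts[2].split(':')
            devices.append({'address': parts[0],  # domain:bus:slot.func
                            'vendor_id': vendor_id,
                            'product_id': product_id})
        return devices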
15:14:15 <johnthetubaguy> BobBall: cool, that's what I remember from looking at this, I have added it to the blueprint
15:14:19 <johnthetubaguy> so that question
15:14:33 <johnthetubaguy> do you think the code will be ready during Icehouse-2?
15:14:52 <thouveng> Sorry I don't remember the date for I2?
15:15:05 <BobBall> 23rd Jan
15:15:22 <johnthetubaguy> yeah, so it needs to be up for review early Jan
15:15:38 <BobBall> or 23th Jan if you want to be pedantic and copy the schedule
15:15:43 <BobBall> #link https://wiki.openstack.org/wiki/Icehouse_Release_Schedule
15:15:46 <johnthetubaguy> we can always move things if we have to, but sooner is better
15:16:02 <leif> Hey quick question, since this is piggy-backing libvirt support, does libvirt handle hot-plugging?
15:16:10 <thouveng> The first part about updating host status will be available.
15:16:27 <johnthetubaguy> OK, well thats enough to say I-2 for now, and see how it goes
15:16:46 <johnthetubaguy> leif: I kinda assume this is attach at boot, not hot-plug right now, but thouveng?
15:16:48 <thouveng> I hope that the part that attaches the pci device will be ready too, but I haven't done that much on that part yet.
15:16:48 <BobBall> think so leif - but I might be missing something from the code.  don't know to be honest.
15:17:11 <thouveng> johnthetubaguy: attach at boot yes. No hotplug.
15:17:22 <johnthetubaguy> I think its just at boot via the flavour right now, so thats all cool
15:17:40 <thouveng> johnthetubaguy: exactly
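For illustration, requesting a device at boot via the flavour would follow the plumbing the earlier libvirt passthrough work added, assuming the xenapi driver reuses it unchanged; the vendor/product IDs and alias name below are made up:

    # nova.conf on the compute node (illustrative values)
    pci_passthrough_whitelist = {"vendor_id": "10de", "product_id": "11b4"}
    pci_alias = {"vendor_id": "10de", "product_id": "11b4", "name": "gpu1"}

    # flavour extra spec asking for one matching device at boot, e.g.
    #   nova flavor-key m1.gpu set "pci_passthrough:alias"="gpu1:1"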
15:17:46 <leif> Yeah, no problem with that level. Was asking because if libvirt supports hotplug, what do we return back?
15:18:00 <leif> "level" - only support at boot.
15:18:23 <BobBall> no idea leif
15:18:24 <leif> Wanted to know if this is an immediate error or a post-launch error.
15:18:34 <johnthetubaguy> well, should just be extra wiring up, it's the status reporting I have more worries about
15:18:35 <leif> no problem, just asking at this point.
15:18:50 <johnthetubaguy> immediate error?
15:19:06 <BobBall> Actually looking at the code - I might be mistaken
15:19:09 <BobBall> https://review.openstack.org/#/c/39891/40/nova/virt/libvirt/driver.py
15:19:12 <BobBall> at least in that changeset
15:19:33 <BobBall> the "hotplugging" only happens during VM lifecycle operations (e.g. reboot, suspend, snapshot)
15:19:47 <BobBall> so might just be reqs from libvirt for a reboot to work etc
15:19:53 <johnthetubaguy> yeah, there is no "attach PCI device" api call I know about
15:20:00 <johnthetubaguy> but there could be, I suppose
15:20:19 <johnthetubaguy> anyways, let's not get sidetracked
15:20:25 <johnthetubaguy> blueprint approved
15:20:26 <leif> agreed. :-)
15:20:45 <BobBall> w00t.  So - john - how do we get another core signed up?
15:20:49 <thouveng> :)
15:21:06 <johnthetubaguy> BobBall: you ask them, or you get me to ask them
15:21:29 <johnthetubaguy> but to be honest, just see who is interested and adds their name in the first instance
15:21:32 <BobBall> heh :) I thought this was now a managed process
15:21:54 <johnthetubaguy> well, it's crowd-sourced
15:22:02 <BobBall> Ah - if it's low priority then that's one thing... I assumed this would be a medium blueprint so needed two cores to sign up before approval or something
15:22:07 <johnthetubaguy> I see blueprints I want, so I sign up to do them
15:22:14 <johnthetubaguy> or if I see xenapi ones, I sign up
15:22:32 <johnthetubaguy> it's low because there are not two cores signed up
15:22:42 <johnthetubaguy> it can be promoted if another core signs up
15:23:01 <BobBall> ok
15:23:01 <johnthetubaguy> it's nice and loose
15:23:05 <johnthetubaguy> anyways
15:23:23 <johnthetubaguy> awesome stuff, I remember talking about this in San Diego, and never getting time to do it :D
15:23:37 <johnthetubaguy> let's move on...
15:23:46 <johnthetubaguy> #topic Docs
15:23:58 <johnthetubaguy> anyone got any doc things?
15:24:09 <BobBall> no doc fun this week
15:26:11 <BobBall> john?
15:26:55 <matel> oh
15:26:59 <johnthetubaguy> yeah, sorry
15:26:59 <matel> network issues?
15:27:08 <johnthetubaguy> #topic Bugs & QA
15:27:14 <johnthetubaguy> how is the tempest work going
15:27:21 * johnthetubaguy looks at matel
15:27:25 <BobBall> matel first?
15:27:32 <matel> Okay.
15:27:57 <matel> So I am working on scripts to create a cloud-ready XenServer
15:28:13 <matel> #link https://github.com/citrix-openstack/xenapi-in-the-cloud
15:28:36 <johnthetubaguy> is that "working" now?
15:29:01 <matel> it's working, but I am speeding it up - so that we have an image that can just be launched.
15:29:07 <johnthetubaguy> … and have we done a tempest run in that cloud-based XenServer yet?
15:29:11 <BobBall> Thanks to antonym - tap devices were fixed :)
15:29:15 <matel> Yes, we did smoke.
15:29:24 <johnthetubaguy> how "fast" was smoke?
15:29:28 <johnthetubaguy> and did it work?
15:29:39 <matel> Ran 223 tests in 624.476s
15:29:59 <matel> tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern.test_volume_boot_pattern[compute,image,volume]          164.053
15:29:59 <matel> tempest.scenario.test_snapshot_pattern.TestSnapshotPattern.test_snapshot_pattern[compute,image,network]                  98.140
15:29:59 <matel> tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario[compute,image,network,volume]   91.124
15:29:59 <matel> tempest.api.compute.servers.test_server_rescue.ServerRescueTestJSON.test_rescue_unrescue_instance[gate,smoke]            38.828
15:30:04 <matel> tempest.scenario.test_server_basic_ops.TestServerBasicOps.test_server_basicops[compute,network]                          36.813
15:30:07 <matel> tempest.thirdparty.boto.test_ec2_volumes.EC2VolumesTest.test_create_volume_from_snapshot[gate,smoke]                     33.709
15:30:10 <matel> tempest.api.compute.v3.servers.test_server_actions.ServerActionsV3TestXML.test_rebuild_server[gate,smoke]                31.742
15:30:13 <matel> tempest.api.compute.servers.test_server_rescue.ServerRescueTestXML.test_rescue_unrescue_instance[gate,smoke]             23.964
15:30:14 <johnthetubaguy> hmm, OK, that could be worse
15:30:16 <matel> tempest.scenario.test_dashboard_basic_ops.TestDashboardBasicOps.test_basic_scenario[dashboard]                           23.038
15:30:19 <matel> tempest.api.compute.v3.servers.test_server_actions.ServerActionsV3TestJSON.test_rebuild_server[gate,smoke]               18.980
15:30:22 <matel> Some really slow ones.
15:30:24 <matel> So that's smoke.
15:30:30 <matel> No full runs yet
15:30:39 <johnthetubaguy> any errors with smoke?
15:30:42 <matel> Bob has fixes to make the test runs more stable.
15:30:42 <johnthetubaguy> I guess not?
15:30:49 <matel> They were a bit unstable.
15:30:53 <johnthetubaguy> hmm, OK
15:30:58 <matel> I'll pass it to Bob.
15:31:02 <johnthetubaguy> so can we wire this up into Zuul now?
15:31:08 <matel> As he found the reason.
15:31:28 <BobBall> I've found some really fun errors in full tempest...
15:31:30 <matel> I would need someone who can help wire it up to that black box.
15:31:51 <BobBall> Ranging from lack of memory leading to compute memory fragmentation leading to VBD plug failures which then clearly fails the tempest test
15:31:52 <johnthetubaguy> OK, so have you seen the docs and other changes where people added tests? I can help with that
15:31:57 <matel> I would say we are getting closer to wiring it up, but it's measured in weeks...
15:32:14 <johnthetubaguy> why weeks?
15:32:37 <matel> Because there is work to be done, and work needs time.
15:32:54 <BobBall> Lots of unknowns still in exactly how to integrate with the existing zuul stuff
15:33:02 <johnthetubaguy> agreed, but digging into that, can I help make that go faster, or is it not that kind of work?
15:33:07 <BobBall> like exactly where the split between creating a VM and putting it in the nodepool should be etc
15:33:11 <matel> Sure, you can make it faster.
15:33:26 <matel> I would need to have a description of the zuul entry points.
15:33:42 <matel> AFAIK, it's "PREPARE IMAGE" and "RUN DEVSTACK"
15:33:50 <johnthetubaguy> Yup, OK, so I will have a chat with people, but I can come and sit next to you tomorrow and help move it forward?
15:34:05 <matel> And we would need to find out how to "customise" the localrc - so that it knows about XenServer.
15:34:24 <johnthetubaguy> yup, that all makes sense
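As a rough idea of the localrc customisation being discussed, assuming the standard devstack XenAPI variables of the time; the host address and password below are placeholders:

    # Hypothetical localrc fragment pointing devstack at a XenServer host
    VIRT_DRIVER=xenserver
    XENAPI_CONNECTION_URL="http://<xenserver-host>"
    XENAPI_USER=root
    XENAPI_PASSWORD=<dom0-root-password>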
15:34:29 <matel> Yes, so I am working on a script that produces an image.
15:34:38 <johnthetubaguy> that sounds good
15:34:45 <johnthetubaguy> I am happy to look into Zuul bits if that helps
15:34:46 <matel> which is really the first step - I am close (this week)
15:34:55 <johnthetubaguy> anyways, we are making good progress
15:34:55 <matel> Okay, stay in touch
15:34:58 <johnthetubaguy> let's move on
15:35:04 <johnthetubaguy> http://ci.openstack.org/
15:35:06 <johnthetubaguy> BTW
15:35:18 <BobBall> those docs are rubbish
15:35:24 <johnthetubaguy> perhaps
15:35:25 <BobBall> we were looking at them yesterday :)
15:35:37 <matel> Yes, I am so upset about the lack of documentation.
15:35:42 <johnthetubaguy> OK, any more on QA
15:35:42 <BobBall> I know more of the overview than is in the docs just from playing
15:35:48 <BobBall> Well we can talk about my tempest fun
15:35:52 <johnthetubaguy> sure
15:35:59 <johnthetubaguy> you got patches and bugs up?
15:36:00 <matel> So john, if you could put together a nice diagram and put it on the docs page, that would help.
15:36:04 <BobBall> As I said - if you give domU too little memory, all sorts of things can go wrong
15:36:12 <johnthetubaguy> matel: my plan is to read the code
15:36:14 <BobBall> I've got a patch up for devstack - but you'll need to be aware of it in production
15:36:26 <matel> Sure, and create a diagram as well.
15:36:29 <johnthetubaguy> BobBall: yes, well, we try not to run out of memory I guess
15:36:29 <BobBall> if your compute VM is swapping then you might be hitting an issue where VBD plug fails
15:36:53 <johnthetubaguy> we don't run rabbit in our compute domUs, oddly
15:36:55 <BobBall> it's not running out of memory that's the problem - it's memory fragmentation that prevents you from allocating a 128k block - which is quite large in memory terms
15:37:01 <johnthetubaguy> but its a good catch though
15:37:13 <BobBall> so if you have a long-running and busy compute VM with only just enough memory, you _will_ hit this sooner or later
15:37:42 <BobBall> I suspect all three of those will apply at Rackspace since I know you squeeze compute's memory to get more VMs on a box
15:38:03 <johnthetubaguy> yeah, not sure about the headroom though, needs a good look
15:38:12 <johnthetubaguy> whats the fix?
15:38:16 <BobBall> more memory
15:38:23 <johnthetubaguy> lol, OK
15:38:23 <BobBall> or reboot compute to stop fragmentation
15:38:33 <johnthetubaguy> yeah, thats possible
15:38:46 <BobBall> but who knows how often you have to reboot
15:38:57 <BobBall> parallel tempest was hitting it within 60 minutes
15:38:58 <johnthetubaguy> never under normal operation, but hey
15:39:27 <BobBall> Anyway - next fun - we already synchronize on VBD.plug, but we have to synchronise on VBD.unplug too (both at the same time) otherwise we get the same random racy failures
15:39:36 <BobBall> #link https://review.openstack.org/#/c/59856/
15:39:47 <johnthetubaguy> Ah, yes, I did ask about that at the time, I guessed we might need that
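As a rough sketch of the serialisation being described, with plug and unplug taking the same lock; nova's real code uses its own synchronized/lockutils-style helpers rather than a module-level lock, so the names here are illustrative:

    # Illustrative only: VBD.plug and VBD.unplug serialised behind one lock
    # so the two operations cannot race each other on the same host.
    import threading

    _vbd_lock = threading.Lock()

    def plug_vbd(session, vbd_ref):
        with _vbd_lock:
            session.call_xenapi('VBD.plug', vbd_ref)

    def unplug_vbd(session, vbd_ref):
        # Holding the same lock as plug is what avoids the random racy
        # failures seen when a plug and an unplug overlap.
        with _vbd_lock:
            session.call_xenapi('VBD.unplug', vbd_ref)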
15:40:35 <BobBall> and there are some weird things I'm looking at ATM but I don't have a solution for
15:40:38 <johnthetubaguy> cool, any more?
15:40:40 <BobBall> timeouts with volumes
15:40:52 <BobBall> the really annoying thing is that full tempest does pass
15:40:59 <johnthetubaguy> oh, is that the old kernel issue with iSCSI on the same box again?
15:40:59 <BobBall> so it's timeouts / races that we're fighting against
15:41:03 <BobBall> no
15:41:06 <BobBall> Mate fixed that
15:41:16 <johnthetubaguy> but it could be related?
15:41:31 <BobBall> I doubt it at this point
15:41:39 <johnthetubaguy> you get deadlocks in tapdisk, is that related?
15:41:41 <BobBall> the iscsi issue is definitely mitigated
15:41:55 <BobBall> there may be other deadlocks - but not that one
15:42:08 <BobBall> that one is mitigated by adding a new memcopy in the process
15:42:24 <BobBall> if we're still seeing deadlocks then it's a different issue
15:42:29 <matel> Let me dig up the change...
15:42:34 <johnthetubaguy> OK, no worries
15:42:43 <johnthetubaguy> just checking the things that came to mind
15:42:48 <BobBall> (the memcopy isn't actually real - it's about changing the IO mode used by the SR)
15:43:05 <matel> #link https://github.com/citrix-openstack/qa/blob/master/install-devstack-xen.sh#L314
15:43:49 <johnthetubaguy> cool, making progress though, which is good to see
15:43:57 <johnthetubaguy> so, in other news...
15:44:17 <johnthetubaguy> I plan to use all your good work, and copy and paste it to run cloud cafe tests, assuming we keep using those
15:44:27 <johnthetubaguy> but thats just for context
15:44:42 <johnthetubaguy> so...
15:44:48 <johnthetubaguy> #topic Open Discussion
15:44:56 <johnthetubaguy> any more for any more?
15:45:19 <BobBall> can't for the life of me think what it was
15:45:22 <BobBall> but I was going to say something
15:46:08 <matel> I'm okay, I just need zuul-capable people.
15:46:11 <matel> to talk to.
15:46:24 <BobBall> Just ask on -infra
15:46:30 <BobBall> they usually answer
15:46:36 <matel> Oh, yes.
15:46:52 <BobBall> and they only charge £5 per question
15:46:54 <BobBall> quite reasonable
15:46:57 <johnthetubaguy> yeah, so I am happy to take on the zuul integration with you if that helps?
15:47:37 <matel> Let's find entry points, that's good enough.
15:47:38 <johnthetubaguy> while you make the stand-alone script stable, and Bob works on getting tempest running stably
15:47:46 <johnthetubaguy> yup, I am happy to work on that
15:47:49 <BobBall> johnthetubaguy: other question - while we're here - guy in #openstack-dev asking a question I don't know the answer to...
15:47:58 <matel> okay, sounds like a plan.
15:48:02 <BobBall> hi, any idea why two rules are used instead of a single rule at lines 137 and 147? https://github.com/openstack/nova/blob/master/plugins/xenserver/networking/etc/xensource/scripts/ovs_configure_vif_flows.py
15:48:17 <BobBall> thought you might know :)
15:48:24 <johnthetubaguy> nope sorry, I am the wrong guy to ask about that stuff
15:48:36 <BobBall> I guessed it was different implementations of ARP - some using the host IP and others using 0.0.0.0 as the response
15:48:40 <BobBall> ah well
15:49:07 <johnthetubaguy> yeah, it's deep in the niggles of network land, I know the basics not the details there :(
15:49:28 <BobBall> indeed
15:49:37 <johnthetubaguy> cool, are we all done?
15:49:48 <johnthetubaguy> a good meeting today I feel, thank you all for your hard work!
15:49:52 <BobBall> I know a fair bit, particularly around that tenant isolation, but that one had me stumped...
15:50:04 <BobBall> heh - you're welcome?
15:50:34 <johnthetubaguy> ...I was meaning more generally, not the hard work of the meeting
15:50:36 <johnthetubaguy> anyways
15:50:39 <johnthetubaguy> any more questions?
15:51:02 <BobBall> yeah
15:51:20 <BobBall> what's the deal with this furby boom thing?  I hear it's a popular xmas toy but I just don't get it
15:51:23 <BobBall> maybe I'm too old...
15:51:31 <BobBall> hmmm - do you think that question is off-topic?
15:52:17 <johnthetubaguy> maybe
15:52:26 <johnthetubaguy> I never get these crazes
15:52:33 <johnthetubaguy> they always seem to pass me by
15:52:38 <johnthetubaguy> but maybe I am just tight
15:52:41 <johnthetubaguy> anyways...
15:52:45 <johnthetubaguy> #endmeeting