17:01:12 <johnthetubaguy> #startmeeting XenAPI
17:01:13 <openstack> Meeting started Wed Mar  6 17:01:12 2013 UTC.  The chair is johnthetubaguy. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:01:14 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:01:16 <openstack> The meeting name has been set to 'xenapi'
17:01:22 <BobBall> Yay :)
17:01:34 <johnthetubaguy> hi everyone
17:02:02 <BobBall> Morning John
17:02:05 <BobBall> or afternoon
17:02:09 <BobBall> depending on where everyone is
17:02:25 <matelakat> hi
17:02:29 <johnthetubaguy> #topic actions from last meeting
17:02:38 <johnthetubaguy> So I had a few actions
17:03:04 <johnthetubaguy> #action johnthetubaguy needs to do the actions from last week
17:03:14 <BobBall> haha
17:03:16 <BobBall> nice action
17:03:29 <johnthetubaguy> been stuck with XCP on CentOS, so not really started on the CentOS install docs
17:03:32 <johnthetubaguy> hey ho
17:03:34 <matelakat> I guess it was a busy week.
17:03:53 * johnthetubaguy bangs head against wall
17:03:59 <johnthetubaguy> anyways...
17:04:07 <BobBall> What's the progress on XCP on CentOS?
17:04:16 <johnthetubaguy> #topic blueprints
17:04:38 <johnthetubaguy> we got a meeting on that Tuesday 17.00 UTC in #centos-devel
17:04:59 <johnthetubaguy> summary: broken with odd permissions errors, no one really can tell why
17:05:05 <BobBall> sounds fun
17:05:14 <johnthetubaguy> joyous
17:05:22 <johnthetubaguy> so… blueprints?
17:05:36 <johnthetubaguy> just a call to look at the etherpad for the summit and add things
17:05:41 <BobBall> oh
17:05:43 <johnthetubaguy> we added the odd bit last time
17:05:48 <BobBall> I forgot to add stuff
17:05:52 <BobBall> lemme have a quick look
17:06:05 <johnthetubaguy> #link https://etherpad.openstack.org/HavanaXenAPIRoadmap
17:06:11 <BobBall> yeah
17:06:14 <BobBall> just got it from the web page
17:06:17 <BobBall> my bad, sorry
17:06:27 <johnthetubaguy> np
17:06:34 <BobBall> okay
17:06:46 <BobBall> there is one key thing that isn't there
17:06:55 <BobBall> or if it is then I can't see...
17:07:01 <BobBall> is quantum support for XS
17:07:06 <BobBall> didn't make grizzly-3
17:07:13 <johnthetubaguy> it's a nova session though
17:07:26 <BobBall> so it should definitely be on the roadmap even if we don't need a new blueprint
17:07:28 <BobBall> ahhhhh
17:07:33 <johnthetubaguy> it was covered in the last summit
17:07:33 <BobBall> didn't spot that from etherpad!
17:07:43 <johnthetubaguy> implementation agreed, code pushed for review
17:07:48 <BobBall> *nod*
17:07:53 <johnthetubaguy> but only one quantum core ever reviewed
17:08:03 <johnthetubaguy> just to document / share
17:08:13 <BobBall> Tis a shame it didn't make the grizzly cut!
17:08:41 <johnthetubaguy> indeed, we half planned a backport to folsom
17:09:06 <johnthetubaguy> but no one will review it, but hopefully that will change now
17:09:19 <johnthetubaguy> so anything else?
17:09:25 <johnthetubaguy> blueprint wise
17:09:53 <BobBall> not from my end
17:10:01 <johnthetubaguy> #topic docs
17:10:04 <johnthetubaguy> any news?
17:10:09 <johnthetubaguy> I didn't do my things
17:10:21 <johnthetubaguy> there was a note on the mailing list about live-migration docs
17:10:28 <BobBall> but I guess there will be an AOB for adding new features to the roadmap?
17:10:32 <johnthetubaguy> they look fairly poor, might need to expand them
17:10:50 <johnthetubaguy> sure, we can do
17:10:57 <BobBall> we've got a plan to look at some docs - Mate has an action this week to go through and look at what's there
17:11:03 <johnthetubaguy> cool
17:11:20 <johnthetubaguy> #action matelakat to look at state of XenAPI docs and report back next week
17:11:23 <BobBall> there we go :D
17:11:29 <BobBall> I was going to say can you add an #action
17:11:41 <johnthetubaguy> catch me on IRC if there are questions
17:11:51 <matelakat> #link https://github.com/citrix-openstack/bugstat/blob/master/bugreport/main_report.md#openstack-manuals----20
17:12:13 <johnthetubaguy> cool
17:12:36 <matelakat> A lot to do...
17:13:06 <johnthetubaguy> some of those don't affect XenAPI
17:13:23 <matelakat> it's just a dumb search
17:13:31 <johnthetubaguy> sure, no worries
17:13:38 <matelakat> I will look at them, and put em to categories.
17:13:42 <BobBall> e.g. https://bugs.launchpad.net/openstack-manuals/+bug/1095095
17:13:43 <uvirtbot> Launchpad bug 1095095 in openstack-manuals "Configuring for resize with KVM" [Medium,Confirmed]
17:13:59 <BobBall> just says KVM docs aren't as good as XenServer for resize
17:14:19 <matelakat> So it includes the string "XenServer"
17:14:22 <BobBall> indeed
17:14:35 <BobBall> but only in the context of "XenServer doesn't have this bug" :)
17:15:07 <johnthetubaguy> it's probably worth manually adding xenserver tags, and having a tag-only search
17:15:22 <johnthetubaguy> anyways, let's move on
17:15:36 <johnthetubaguy> any more for any more?
17:15:38 <BobBall> I'd like to keep the dumb search, but use the tagged search to say "this has been triaged by someone who knows it's a XS bug"
17:15:47 <johnthetubaguy> +1
17:16:07 <johnthetubaguy> that is what I meant
17:16:34 <BobBall> ah right
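The two searches discussed above (a keyword "dumb search" kept alongside a tag-only search that marks a bug as triaged) could be sketched roughly as below. This is only an illustration under stated assumptions: the `Bug` class, the keyword list, the `xenserver` tag name, the sample description text and the bug ids 1 and 2 are hypothetical, and nothing here reflects how the bugstat tool is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Bug:
    # Hypothetical minimal bug record; not the Launchpad data model.
    bug_id: int
    title: str
    description: str = ""
    tags: set = field(default_factory=set)


KEYWORDS = ("xenserver", "xenapi", "xcp")  # assumed search strings
TRIAGED_TAG = "xenserver"                  # assumed tag name


def dumb_search(bugs):
    """Return bugs whose title or description mentions any keyword."""
    hits = []
    for bug in bugs:
        text = f"{bug.title} {bug.description}".lower()
        if any(keyword in text for keyword in KEYWORDS):
            hits.append(bug)
    return hits


def tagged_search(bugs):
    """Return bugs that someone has explicitly tagged after triage."""
    return [bug for bug in bugs if TRIAGED_TAG in bug.tags]


def untriaged(bugs):
    """Keyword hits that nobody has tagged yet, i.e. still to be looked at."""
    tagged_ids = {bug.bug_id for bug in tagged_search(bugs)}
    return [bug for bug in dumb_search(bugs) if bug.bug_id not in tagged_ids]


# Sample data: 1095095 is the bug mentioned in the meeting; its description
# text and the other two bugs are made up for the example.
bugs = [
    Bug(1095095, "Configuring for resize with KVM",
        "made-up text that happens to mention XenServer"),
    Bug(1, "XenAPI docs missing a step", tags={"xenserver"}),
    Bug(2, "Unrelated docs bug"),
]
```

With this split, the keyword search stays as a wide net, and the difference between the two result sets is the list of bugs still waiting for someone to confirm whether they really concern XenServer.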
17:16:50 <johnthetubaguy> #topic QA and Bugs
17:16:59 <johnthetubaguy> anything major worrying people?
17:17:11 <johnthetubaguy> matelakat have you got the link to your bug finder?
17:17:26 <BobBall> I guess one thing that surprised me is that devstack multihost doesn't seem to be tested by anyone else
17:17:49 <matelakat> #link https://github.com/citrix-openstack/bugstat
17:18:01 <guitarzan> we'd like to get some eyes on this: https://review.openstack.org/#/c/23662/
17:18:41 <BobBall> *has a butchers*
17:20:32 <BobBall> That one's an interesting issue!
17:20:42 <guitarzan> and a little painful :)
17:21:13 <guitarzan> I'm going to try it out
17:21:23 <BobBall> So has the whole SR gone away?
17:21:34 <guitarzan> hopefully with the patch, yes
17:21:49 <BobBall> ahhh - this is an iSCSI SR?
17:21:52 <guitarzan> yes
17:21:53 <johnthetubaguy> I guess the point is, if your iSCSI target dies, then VM will not start
17:22:00 <guitarzan> johnthetubaguy: exactly
17:22:04 <BobBall> yup - not surprising.
17:22:05 <BobBall> okay
17:22:05 <BobBall> got it
17:22:13 <BobBall> I was getting a little confused
17:22:45 <guitarzan> I'm not sure what happens in the other HVs case
17:22:59 <guitarzan> but I also don't have to worry about that case
17:23:07 <BobBall> *grin*
17:23:34 <BobBall> I'll have to have a think about this one
17:23:56 <johnthetubaguy> yeah, sounds like an excessive timeout, would be nice to be able to specify that in the check call
17:24:12 <guitarzan> s1rp and I talked about it quite a bit, and this seems to be the best we could come up with on short notice
17:24:18 <guitarzan> mad props to him for making it work
17:24:23 <BobBall> you mean the XAPI timeout?
17:24:55 <johnthetubaguy> erm, timeout in the xapi operation
17:25:06 <BobBall> I guess this is currently a critical issue for you guys?
17:25:12 <guitarzan> well, it's an ugly one
17:25:20 <guitarzan> requires ops to go in and nuke the SR
17:25:28 <guitarzan> it doesn't happen often
17:25:35 <johnthetubaguy> maybe some kind of health check would be better, with tunable timeout
17:25:59 <guitarzan> it should only happen if something happens to the network or we lose a storage node
17:26:01 <johnthetubaguy> I like scan SR because it should be quick in the working cases
17:26:01 <BobBall> so XS doesn't timeout the SR?
17:26:36 <johnthetubaguy> oh, I see you only call that in error cases
17:26:55 <guitarzan> johnthetubaguy: yeah, we don't do anything unless it doesn't boot
17:27:14 <johnthetubaguy> makes sense
17:27:31 <BobBall> how long ago had the SR gone away?
17:27:34 <BobBall> or had it only just gone?
17:27:55 <johnthetubaguy> shame we can't have a non destructive error case again, do we need to tell cinder we detached the volume?
17:27:57 <guitarzan> the sr is still there, it just can't make the iscsi connection
17:28:10 <guitarzan> johnthetubaguy: compute manager does that
17:28:14 <BobBall> Also, could you just post the XS error log to the bug so that we've got a traceback
17:28:28 <guitarzan> the fun part was propagating the bad devices back up to compute
17:28:38 <BobBall> *grin* that does look fun
17:28:43 <johnthetubaguy> ah, got ya, didn't get there yet
17:29:11 <guitarzan> BobBall: I'll try to remember to paste a stack
17:29:28 <BobBall> this _handle_bad_volumes_detached case?
17:29:53 <guitarzan> well, I'll grab the xen log from the failed boot
17:30:04 <BobBall> that's perfect
17:30:36 <johnthetubaguy> pull out the network cable between your iscsi target and hypervisor, it should repro OK
17:30:41 * BobBall is impressed with this one
17:30:46 <BobBall> I like that bug
17:30:48 <guitarzan> glad you like it
17:30:56 <BobBall> is there a Bug Of The Month award?
17:31:01 <guitarzan> we were hoping XS would boot without all the volumes, but alas
17:31:08 <BobBall> yup
17:31:21 <BobBall> well we might also be able to patch ISCSISR.py to do something
17:31:22 <BobBall> not sure
17:31:24 <johnthetubaguy> good old xapi trying to protect us from doing bad things again
17:31:26 <BobBall> depends how the SR is failing
17:31:35 <BobBall> unlikely to be XAPI
17:31:52 <johnthetubaguy> oh, OK
17:32:02 <BobBall> is it the vm start that fails? I'm almost surprised the shutdown works ok if the SR is timing out
17:32:17 <johnthetubaguy> it probably got shutdown before that right?
17:32:22 <guitarzan> I'm not sure
17:32:28 <johnthetubaguy> or this is the first start?
17:32:30 <guitarzan> it's a reboot, so it wasn't really shut down
17:32:35 <johnthetubaguy> ah
17:33:00 * johnthetubaguy remembers bug report…drrr
17:33:08 <BobBall> ok well might I suggest that John, you and I take an action to look at it?
17:33:26 <BobBall> I see you've already added yourself!
17:33:30 <BobBall> hah :)
17:33:43 <johnthetubaguy> make sure the SR is behaving correctly, for the "graceful" fix
17:34:09 <BobBall> guitarzan, do you happen to know if this is a soft or hard reboot?
17:34:49 <johnthetubaguy> #action johnthetubaguy guitarzan to look into broken SR issues https://review.openstack.org/#/c/23662
17:35:19 <guitarzan> BobBall: not sure
17:35:24 <guitarzan> I'll try both
17:35:31 <johnthetubaguy> maybe hard because soft failed...
17:35:33 <johnthetubaguy> cool
17:35:39 <BobBall> probably both tbh
17:35:47 <guitarzan> that's my guess
17:35:58 <johnthetubaguy> cool
17:36:02 <BobBall> *not sure if XAPI handles the SRs differently for the two cases*
17:36:05 <BobBall> Anyway - let's move on :)
17:36:08 <johnthetubaguy> indeed
17:36:13 <johnthetubaguy> any more bugs?
17:36:33 <johnthetubaguy> me guessing that is a no...
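The "health check with tunable timeout" idea from this topic could look something like the sketch below. It is an assumption-laden illustration, not the approach taken in the patch under review: `check_with_timeout` and the `probe` callable are hypothetical names, and `probe` only stands in for whatever call would exercise the SR (for example a scan). No real XenAPI calls are used.

```python
import concurrent.futures
import time


def check_with_timeout(probe, timeout_seconds):
    """Run probe() in a worker thread and report whether it finished in time.

    Returns True if probe() completed without raising within
    timeout_seconds, and False if it raised or took too long.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(probe)
    try:
        future.result(timeout=timeout_seconds)
        return True
    except concurrent.futures.TimeoutError:
        # The probe is still running; the caller gets an answer anyway.
        return False
    except Exception:
        # The probe itself failed, which also counts as unhealthy.
        return False
    finally:
        # Do not block here waiting for a slow probe to finish.
        executor.shutdown(wait=False)


def fast_probe():
    # Stand-in for a check against a healthy backend.
    return None


def slow_probe():
    # Stand-in for a check that hangs on an unreachable backend.
    time.sleep(0.5)
```

One limitation worth noting: a Python thread cannot be cancelled, so a hung probe keeps running in the background after the caller has been told the check failed. The timeout bounds how long the caller waits, not how long the underlying operation takes.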
17:36:54 <johnthetubaguy> #topic Open Discussion
17:37:07 <johnthetubaguy> so, bobball has a few things?
17:37:22 <s1rp> ohai guys...
17:37:34 <johnthetubaguy> hey
17:37:34 <guitarzan> we were just talking about you
17:37:37 <s1rp> yeah the clean_reboot operation hangs for 120 secs...
17:37:50 <s1rp> luckily a subsequent SR.scan seems to be quick-ish
17:37:56 <BobBall> ahhhhh
17:37:59 <s1rp> only the first-one after unplugging seems to be slow
17:38:11 <s1rp> it's like it stores some data somewhere marking it as failed (?)
17:38:12 <guitarzan> BobBall: I haven't tried that iscsi patch you sent me yet
17:38:18 <johnthetubaguy> that figures, cool
17:38:57 <BobBall> s1rp, I thought it was the SR scan that waited 120 seconds
17:39:14 <BobBall> failing fast in clean_reboot is likely to be a XAPI thing waiting for the SR to respond to its attach request
17:39:15 <s1rp> that too... lemme clarify
17:39:32 <BobBall> sorry john - we're derailing the agenda :D
17:39:38 <s1rp> so if you do an sr-scan w/o a reboot, then that call will take 120 secs (this is what i was doing on the command line to troubleshoot this)
17:40:03 <johnthetubaguy> it's OK, it's important
17:40:07 <s1rp> but, and i'm not 100% sure on this, but if you do a clean_reboot, that will cause an underlying timeout, but i *think* the next SR.scan will actually fail-fast
17:40:08 <BobBall> ah - but failed reboot followed by sr-scan to find the failing device is fast
17:40:27 <s1rp> BobBall: yeah, need to triple check that case, but i believe so
17:40:28 <BobBall> unfortunately that might mean the timeout is in iscsiadm ?
17:40:37 <BobBall> ... or fortunately :)
17:40:43 <BobBall> that might be easy to fix
17:40:50 <johnthetubaguy> right, hack the RD
17:40:54 <johnthetubaguy> lol SR
17:41:01 <BobBall> or just an other-config
17:41:07 <BobBall> I think we can pass some iscsiadm flags through
17:41:09 <johnthetubaguy> even better
17:41:18 <BobBall> not 100% on that though.  Maybe only 73% sure.
17:41:54 <BobBall> btw john, my stuff on libvirt can wait until next week if we have other things to get through :)
17:42:14 <johnthetubaguy> BobBall: thanks
17:43:19 <BobBall> s1rp, Was saying to guitarzan that we'd like some of the XS logs in the bug report just for traceability if that's ok
17:44:35 <s1rp> BobBall: cool, we can get those over to you; luckily this is very easy to replicate!
17:44:55 <johnthetubaguy> sounds good, any more on that one?
17:45:26 <BobBall> no :) Let's leave that one for now
17:45:27 <johnthetubaguy> can always take it to the ML
17:45:44 <johnthetubaguy> cool, bobball summit stuff you wanted to mention?
17:46:18 <BobBall> Uhhhh maybe?  I don't remember which summit stuff you're referring to?
17:47:12 <johnthetubaguy> ok, misunderstood
17:47:24 <johnthetubaguy> put stuff on the etherpad to help discuss at the summit
17:47:47 <BobBall> Sorry - I could have been clearer! :)
17:47:51 <johnthetubaguy> assuming that session goes ahead, if there is loads, might ask for extra sessions
17:48:04 <BobBall> Summit stuff then - looking forward to it.  matelakat and I have booked our flights so we'll see you there
17:48:24 <johnthetubaguy> sounds like a Xen on libvirt vs XenAPI discussion might be good, as long as it stays sensible and not too religious
17:48:50 <johnthetubaguy> I was thinking in the summit
17:48:57 <johnthetubaguy> but bob you wanted to bring that up this weel?
17:49:00 <johnthetubaguy> week?
17:49:19 <BobBall> Well what I'd like to understand is what the primary value that the XenAPI integration is using from XAPI that can't be provided by libvirt
17:50:22 <BobBall> not sure if we've got enough time to explore that question properly today
17:50:24 <BobBall> which is fine :)
17:50:58 <s1rp> we do a lot of weird stuff w/ dom0 plugins, but that probably could be handled with proper hooks in the libvirt layer
17:51:37 <johnthetubaguy> s1rp: we can't use both today, we would freak out xapi
17:51:44 <johnthetubaguy> but yes, I get your point
17:52:13 <johnthetubaguy> I think the question is, should we evolve XAPI/XCP or should we evolve libvirt
17:52:30 <johnthetubaguy> and how much effort is each approach, at this point
17:52:50 <johnthetubaguy> I guess what is missing between libvirt+Xen vs xapi+Xen in openstack today
17:53:19 <BobBall> Well the question is more that there are lots of things that are getting first-dibs in libvirt and whether a libvirt-on-xen/xapi hybrid approach would bring us much and what level of pain it would be for XAPI to tolerate such a hybrid approach
17:53:22 <johnthetubaguy> I get the idea that gap could be quite small, but I never got libvirt+Xen working that well, but didn't try very hard
17:53:47 <johnthetubaguy> hmm, maybe
17:54:02 <johnthetubaguy> but that is like how many years out?
17:54:29 <BobBall> It's not this week, that's true
17:54:30 <johnthetubaguy> well, maybe that fits into evolving XAPI actually...
17:54:49 <johnthetubaguy> I wondered about using xenopsd instead
17:55:26 <johnthetubaguy> that's the thing under xapi
17:56:09 <johnthetubaguy> anyways, we can take this offline
17:56:13 <johnthetubaguy> anything else?
17:57:22 <johnthetubaguy> cool
17:57:29 <johnthetubaguy> #endmeeting