14:00:27 <PaulMurray> #startmeeting Nova Live Migration
14:00:28 <openstack> Meeting started Tue Aug 16 14:00:27 2016 UTC and is due to finish in 60 minutes. The chair is PaulMurray. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:29 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:32 <openstack> The meeting name has been set to 'nova_live_migration'
14:00:46 <PaulMurray> hi - anyone here for LM
14:01:01 <paul-carlton2> hi
14:01:18 <mdbooth> o/
14:01:24 <mriedem> o/
14:01:26 <davidgiluk> o/
14:01:27 <johnthetubaguy> o/
14:01:49 <PaulMurray> I thought there would be no one for a minute - I was going to head for the beach
14:02:03 <PaulMurray> agenda here: https://wiki.openstack.org/wiki/Meetings/NovaLiveMigration
14:02:13 * mdbooth is lurking, trying to address a bunch of review comments before eod
14:02:16 <PaulMurray> I'm still catching up after vacation
14:02:33 <PaulMurray> #topic Libvirt image backend
14:02:44 <PaulMurray> anything here mdbooth ?
14:02:53 <mdbooth> Yep
14:03:09 <mdbooth> I've had a bunch of really good review from jaypipes and dansmith
14:03:28 <mdbooth> And I'm hoping I can land the test changes over the next few days
14:03:36 <PaulMurray> saw some disagreements about your test changes
14:03:45 <mdbooth> That's actually a substantial chunk of the work
14:03:54 <mdbooth> Yeah, the substantive disagreements are done, though
14:04:02 <mdbooth> That was just trivia
14:04:34 <mdbooth> So there's still a ton of stuff to land, and I'm actually furiously rewriting some of it :)
14:04:36 <PaulMurray> good - so just need reviews
14:04:45 <mdbooth> But it progresses
14:04:50 <PaulMurray> is the series getting longer or shorter ?
14:04:58 <mdbooth> It is now getting shorter
14:05:23 <PaulMurray> good stuff - I'll spend some time on it this week, but you really need the +2s
14:05:32 <mdbooth> Yup
14:05:39 <mdbooth> I wrote some generator functions, btw
14:05:50 <mdbooth> like get_ephemerals()
14:06:06 <mdbooth> and functions to fetch local disk, swap, etc
14:06:20 <mdbooth> These were kinda ugly and represented a different interface
14:06:35 <mdbooth> So as the reviews haven't got there yet I'm rewriting that part of the series
14:06:44 <mdbooth> I say rewrite, but it's really code motion.
14:07:04 <mdbooth> If the reviews get there, I'll convert the new stuff into future patches and keep the existing code.
14:07:21 <PaulMurray> ok - anything else
14:07:31 <mdbooth> Remind me what the deadline is?
14:07:40 <PaulMurray> I was just trying to remember
14:07:43 <mdbooth> I'm guessing I'm starting to be up against that.
14:07:47 <PaulMurray> mriedem, ?
14:08:11 * mriedem reads scrollback
14:08:16 <mriedem> 9/2
14:08:23 <mriedem> https://wiki.openstack.org/wiki/Nova/Newton_Release_Schedule
14:08:32 <mriedem> probably more like 9/1
14:08:36 <mdbooth> mriedem: Thanks
14:08:36 <davidgiluk> hmm, Luis doesn't seem to be back; I was hoping he'd fix the postcopy network bug
14:08:37 <mriedem> i don't think we ever have a friday deadline
14:08:44 <mdbooth> Yeah, I'm definitely up against that
14:08:56 <mdbooth> Esp as I'm at a conference next week :/
14:09:19 <PaulMurray> that is close
14:10:16 <mriedem> mdbooth: i thought there were red hat team trust falls in toronto?
14:10:37 <mdbooth> mriedem: It's KVM Forum, and we're also doing Red Hat team trust falls
14:10:42 <PaulMurray> mdbooth, will anyone be able to continue while you're gone ?
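[As context for the generator functions mdbooth mentions above: a minimal sketch of what accessors like get_ephemerals() might look like over the libvirt driver's block_device_info dict. The bodies below are illustrative assumptions, not the actual code from the series under review.]

```python
# Illustrative sketch only - not the code from mdbooth's series.
# Assumes nova's block_device_info dict layout, where ephemeral disks
# live under 'ephemerals' and the swap disk under 'swap'.

def get_ephemerals(block_device_info):
    """Yield each ephemeral disk entry for the instance."""
    for eph in block_device_info.get('ephemerals') or []:
        yield eph


def get_swap(block_device_info):
    """Yield the swap disk entry, if the instance has one."""
    swap = block_device_info.get('swap')
    if swap and swap.get('swap_size'):
        yield swap
```

[The appeal of generators here is that callers can iterate over all of an instance's disks uniformly, without knowing how each category is stored in the dict.]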
14:10:59 <mdbooth> Incidentally, I'll be at KVM Forum and I have a talk scheduled for our internal libvirt team
14:11:11 <mdbooth> I was planning to talk about all the features we wish we had in libvirt
14:11:28 <mdbooth> If anybody would like me to present their pet feature to RH's libvirt team, let me know
14:11:56 <mdbooth> PaulMurray: Realistically no, but I'll be working on it
14:11:57 <davidgiluk> mdbooth: Anything you want from qemu?
14:12:30 <mdbooth> davidgiluk: I'm sure we'll let you know :)
14:12:45 <mdbooth> Feature parity for qemu nfs and iscsi would be awesome
14:13:19 <PaulMurray> let's move on
14:13:26 <PaulMurray> #topic Storage pools
14:13:33 <PaulMurray> anything to say paul-carlton2
14:14:14 <PaulMurray> no paul-carlton2
14:14:33 <PaulMurray> we can come back if we need to...
14:14:36 <PaulMurray> #topic CI
14:14:47 <paul-carlton2> yep, resubmitted spec for Ocata
14:15:09 <paul-carlton2> got feedback from danpb before I went away
14:15:24 <PaulMurray> ok
14:15:30 <paul-carlton2> more feedback to incorporate but busy with internal stuff today
14:15:49 <PaulMurray> The agenda items for CI are the ones that were there last week I think
14:16:12 <PaulMurray> https://review.openstack.org/#/c/329466/ - NFS shared storage enablement
14:16:29 <PaulMurray> that has a -1 from mriedem
14:16:42 <PaulMurray> is tdurakov around ?
14:18:27 <mriedem> it's a simple fix for the -1
14:18:30 <mriedem> needs a rebase though
14:18:44 <mriedem> i haven't checked the live migration test results since we started skipping the volume-backed live migration test
14:18:49 <mriedem> but that was supposed to make it more stable
14:18:59 <mriedem> want to give me an action to follow up on that?
14:19:26 <PaulMurray> will do
14:19:57 <PaulMurray> you mean the patch or the results ?
14:20:58 <PaulMurray> #action mriedem to follow up on https://review.openstack.org/#/c/329466/ and live migration stability
14:21:30 <mriedem> all of it
14:21:31 <mriedem> sure
14:21:37 <PaulMurray> anything else on CI
14:21:37 <mriedem> i should do something around here
14:22:18 <PaulMurray> I saw the comment about the "unexpected abort" bug being a blocker for voting
14:23:06 <PaulMurray> ah, the fix on the bug is to skip the test - have I got that right? mriedem
14:24:46 <PaulMurray> maybe I answered my own question - this is very one-way today...
14:25:00 <mriedem> yes
14:25:07 <mriedem> sorry, in 3 channel discussions
14:25:30 <PaulMurray> I noticed, but glad you're here :)
14:25:44 <PaulMurray> #topic Networking
14:26:10 <PaulMurray> Not much progress on Swami's patches
14:26:32 <mriedem> the dvr one?
14:26:37 <PaulMurray> yes
14:26:44 <mriedem> https://review.openstack.org/#/c/275073/
14:26:45 <PaulMurray> tempest one is failing - saw he rebased
14:26:49 <mriedem> ok,
14:27:04 <mriedem> i swear i was just about ready on that guy, all it needed was a rebase, and then it changed like 7 more times
14:27:11 <mriedem> so i left it alone for a while until it quieted down
14:27:56 <PaulMurray> wow - the comments are so long it takes time to load them
14:28:39 <PaulMurray> the last change was last week - I'll catch up with him and find out what's going on
14:29:11 <PaulMurray> #action PaulMurray to follow up on Swami's DVR patch status
14:30:07 <PaulMurray> johnthetubaguy has this spec developing
14:30:27 <PaulMurray> https://review.openstack.org/#/c/353982/ - WIP: Improve Nova and Neutron interactions
14:30:42 <PaulMurray> it has live migration in it and he would like reviews
14:30:51 <PaulMurray> neutron midcycle is this week
14:31:44 <PaulMurray> johnthetubaguy, does your spec cover the port binding update described in https://review.openstack.org/#/c/309416
14:32:50 <johnthetubaguy> so, ish
14:33:17 <johnthetubaguy> the idea is to work with neutron and evolve that to some place that works
14:33:32 <johnthetubaguy> I am hoping it will look similar to what we are doing with cinder too
14:33:43 <johnthetubaguy> it's very similar flows for both
14:34:17 <johnthetubaguy> the main bit is, we want to track volume attachments / port bindings inside cinder/neutron such that you have two of them on a single volume / port during live-migration
14:34:43 <johnthetubaguy> neutron needs to know when the destination binding goes active
14:35:02 <johnthetubaguy> cinder has some complications around shared per-host connections to the storage array
14:35:13 <PaulMurray> danpb suggested using gARP from the vm to indicate the new port is in use
14:35:25 <johnthetubaguy> yeah, that's basically what we do downstream
14:35:29 <PaulMurray> I talked to carl_baldwin about that a couple of weeks ago
14:35:34 <johnthetubaguy> but that depends on the network technology
14:35:46 <PaulMurray> we thought it would work in some cases but might not be general enough
14:35:49 <johnthetubaguy> it's something neutron should be doing, either way
14:36:16 <johnthetubaguy> there are driver-specific actions to take on the destination port, when it needs to go active
14:37:33 <johnthetubaguy> basically it's about getting the correct shared state, so neutron can do the gARP at the right time
14:38:26 <johnthetubaguy> anyways, I am going to the neutron midcycle this week, so they know about the issues we are facing
14:38:37 <johnthetubaguy> and hopefully come back with something that looks like a plan to fix things
14:39:17 <PaulMurray> remember the post-copy case - switchover in the middle of the migration
14:39:37 * PaulMurray has that on his mind all the time now
14:40:13 <PaulMurray> next topic is...
14:40:14 * davidgiluk has that on his mind as well; but I don't know when Luis is back and I'm not sure who else to ask
14:40:41 <PaulMurray> #topic Open Discussion
14:40:51 <PaulMurray> anything here ?
14:41:58 <johnthetubaguy> PaulMurray: yeah, I mentioned post-copy in the spec and comments on that other spec I think
14:42:27 <paul-carlton2> live migration of affinity instances is an issue
14:42:32 <johnthetubaguy> PaulMurray: turns out that's mostly our own problem though.
14:42:57 <PaulMurray> johnthetubaguy, yes, I think so
14:44:12 <PaulMurray> paul-carlton2, is that a discussion or a statement ?
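[The double-binding flow johnthetubaguy outlines above can be sketched as pseudocode. No Neutron API for multiple port bindings existed at the time of this meeting, so every call below is a hypothetical illustration of the idea, not a real client method.]

```python
# Rough sketch of the double-binding flow for a port during live
# migration. All neutron client calls here are hypothetical.

def live_migrate_port(neutron, port_id, source_host, dest_host):
    # 1. Create a second, inactive binding for the destination host;
    #    the source binding stays active and keeps carrying traffic.
    neutron.create_port_binding(port_id, host=dest_host, active=False)

    # 2. ... the live migration runs; guest memory and state move ...

    # 3. At switchover, tell neutron the destination binding is live,
    #    so it can take the driver-specific actions for that port
    #    (e.g. sending the gARP at the right time).
    neutron.activate_port_binding(port_id, host=dest_host)

    # 4. Clean up the now-unused source binding.
    neutron.delete_port_binding(port_id, host=source_host)
```

[The point of the shared state is step 3: only neutron knows what "going active" means for a given network technology, which is why the gARP belongs on its side rather than nova's.]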
14:44:16 <paul-carlton2> we need a plan to fix group policies so this can be addressed
14:44:39 <PaulMurray> how would you fix them ?
14:44:56 <paul-carlton2> both, discussed it a few weeks ago with johnthetubaguy but not sure we have agreement on a way forward
14:45:46 <paul-carlton2> I'd like to migrate all members of an affinity group in one operation but that is sort of orchestration which is out of scope
14:45:49 <johnthetubaguy> does --force not let you work around that?
14:46:08 <johnthetubaguy> at least, I think that was the original plan
14:46:26 <PaulMurray> johnthetubaguy, that's what we do for now
14:46:43 <paul-carlton2> johnthetubaguy, yes but I don't like force, it is too dangerous, requires the operator to know everything about the instance
14:47:10 <PaulMurray> we can do "check destination" now
14:47:16 <paul-carlton2> I'd like --ignore-affinity so it still uses the scheduler but overrides affinity only
14:47:17 <PaulMurray> and name the host
14:48:02 <paul-carlton2> anti-affinity is not an issue, just affinity
14:48:16 <PaulMurray> if affinity is best effort it works too
14:48:31 <PaulMurray> only strict affinity has a problem
14:48:48 <PaulMurray> so we could loosen the definition instead
14:49:21 <johnthetubaguy> so my take is probably a bit extreme, per-host affinity is basically broken anyway
14:49:32 <PaulMurray> yep
14:49:42 <johnthetubaguy> once we have the looser per-rack affinity, etc, the problem kinda goes away
14:50:01 <paul-carlton2> per-host affinity works, provided you create instances serially
14:50:02 <johnthetubaguy> well, it's less acute, I should say
14:50:52 <johnthetubaguy> so there is another option maybe...
14:51:04 <PaulMurray> or if you have strict affinity you just can't migrate
14:51:07 <johnthetubaguy> ignore disabled hosts, when checking affinity
14:51:26 <paul-carlton2> The benefit of doing all instances as a single operation is you can find a host that has space for all members of the group then move them one at a time till done
14:51:31 <johnthetubaguy> so in most cases it's disable a host, move all the VMs, we could make affinity work better in that case
14:51:48 <johnthetubaguy> paul-carlton2: yeah, true, that's the sucky bit
14:52:04 <paul-carlton2> So maybe a find-host-for-these-instances operation is the answer?
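[For reference, the --force and "check destination" behaviours discussed above correspond to the Newton-era live migration API (microversion 2.30): naming a target host without force still runs it past the scheduler, while --force bypasses the scheduler and, with it, the affinity filters. Assuming a Newton-era python-novaclient, the three modes look roughly like:]

```
# scheduler picks the host; affinity filters apply
nova live-migration <server>

# name a host, but let the scheduler verify it ("check destination")
nova live-migration <server> <target-host>

# bypass the scheduler entirely - the operator takes responsibility
nova live-migration --force <server> <target-host>
```

[This is why --force is the only escape hatch for strict affinity: any path through the scheduler re-applies the group policy that pinned the instances together in the first place.]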
14:52:14 <johnthetubaguy> so eventually, you will be able to get all the claims from the placement API, then pass them into the live-migrate
14:52:26 <johnthetubaguy> maybe we just leave this one to be fixed in the future
14:52:38 <johnthetubaguy> --force for the short term
14:53:10 <PaulMurray> johnthetubaguy, yes, if the placement api deals with claims appropriately
14:53:30 <PaulMurray> the problem is still there in the scheduler (placer)
14:54:02 <paul-carlton2> ok, but I'd like to provide a way for the operator to use the scheduler to identify host(s) that can accommodate the instances
14:54:03 <PaulMurray> but yes, if we have a two-phase get-a-claim / use-claim we can do it
14:54:14 <PaulMurray> but it becomes more orchestration
14:54:40 <PaulMurray> paul-carlton2, constraints scheduler :0
14:54:51 <PaulMurray> :)
14:55:55 <PaulMurray> I would redefine affinity to something we can do, personally
14:56:04 <PaulMurray> or refuse to migrate
14:56:11 <PaulMurray> unless admin forces
14:57:52 <paul-carlton2> johnthetubaguy, whilst on the subject of affinity and migration, could you look at https://review.openstack.org/#/c/339588/ - it is a bug fix for anti-affinity migration
14:58:25 <PaulMurray> time to close
14:58:51 <PaulMurray> thanks for coming
14:58:58 <PaulMurray> #endmeeting
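[As a footnote to the two-phase claim idea floated near the end (claim resources for the whole group first, then migrate members one at a time): a rough sketch of how that might look. The placement API had no such claims interface when this meeting took place, so every name and call below is a hypothetical illustration, not a real API.]

```python
# Hypothetical two-phase claim flow for migrating an affinity group:
# phase 1 reserves space for all members on one destination, phase 2
# consumes those claims one migration at a time. Nothing here existed
# at the time of this meeting; all names are illustrative.

def migrate_affinity_group(placement, nova, group, dest_host):
    # Phase 1: atomically claim resources for every member of the
    # group on the destination. If any claim fails, nothing moves.
    claims = placement.claim_resources(
        host=dest_host,
        requests=[member.flavor for member in group.members])

    # Phase 2: consume the claims, moving instances one at a time.
    # Strict affinity holds throughout, because the space for the
    # whole group was reserved up front.
    for member, claim in zip(group.members, claims):
        nova.live_migrate(member.id, host=dest_host, claim=claim)
```

[This captures why the two-phase split matters: it moves the "find a host that fits everyone" problem into the claim step, leaving the migrations themselves as simple one-at-a-time operations rather than orchestration.]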