14:00:27 <PaulMurray> #startmeeting Nova Live Migration
14:00:28 <openstack> Meeting started Tue Aug 16 14:00:27 2016 UTC and is due to finish in 60 minutes. The chair is PaulMurray. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:29 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:32 <openstack> The meeting name has been set to 'nova_live_migration'
14:00:46 <PaulMurray> hi - anyone here for LM
14:01:01 <paul-carlton2> hi
14:01:18 <mdbooth> o/
14:01:24 <mriedem> o/
14:01:26 <davidgiluk> o/
14:01:27 <johnthetubaguy> o/
14:01:49 <PaulMurray> I thought there would be no one for a minute - I was going to head for the beach
14:02:03 <PaulMurray> agenda here: https://wiki.openstack.org/wiki/Meetings/NovaLiveMigration
14:02:13 * mdbooth is lurking, trying to address a bunch of review comments before eod
14:02:16 <PaulMurray> I'm still catching up after vacation
14:02:33 <PaulMurray> #topic Libvirt image backend
14:02:44 <PaulMurray> anything here mdbooth ?
14:02:53 <mdbooth> Yep
14:03:09 <mdbooth> I've had a bunch of really good review from jaypipes and dansmith
14:03:28 <mdbooth> And I'm hoping I can land the test changes over the next few days
14:03:36 <PaulMurray> saw some disagreements about your test changes
14:03:45 <mdbooth> That's actually a substantial chunk of the work
14:03:54 <mdbooth> Yeah, the substantive disagreements are done, though
14:04:02 <mdbooth> That was just trivia
14:04:34 <mdbooth> So there's still a ton of stuff to land, and I'm actually furiously rewriting some of it :)
14:04:36 <PaulMurray> good - so just need reviews
14:04:45 <mdbooth> But it progresses
14:04:50 <PaulMurray> is the series getting longer or shorter ?
14:04:58 <mdbooth> It is now getting shorter
14:05:23 <PaulMurray> good stuff - I'll spend some time on it this week, but you really need the +2s
14:05:32 <mdbooth> Yup
14:05:39 <mdbooth> I wrote some generator functions, btw
14:05:50 <mdbooth> like get_ephemerals()
14:06:06 <mdbooth> and functions to fetch local disk, swap, etc
14:06:20 <mdbooth> These were kinda ugly and represented a different interface
14:06:35 <mdbooth> So as the reviews haven't got there yet I'm rewriting that part of the series
14:06:44 <mdbooth> I say rewrite, but it's really code motion.
14:07:04 <mdbooth> If the reviews get there, I'll convert the new stuff into future patches and keep the existing code.
14:07:21 <PaulMurray> ok - anything else
14:07:31 <mdbooth> Remind me what the deadline is?
14:07:40 <PaulMurray> I was just trying to remember
14:07:43 <mdbooth> I'm guessing I'm starting to be up against that.
14:07:47 <PaulMurray> mriedem, ?
14:08:11 * mriedem reads scrollback
14:08:16 <mriedem> 9/2
14:08:23 <mriedem> https://wiki.openstack.org/wiki/Nova/Newton_Release_Schedule
14:08:32 <mriedem> probably more like 9/1
14:08:36 <mdbooth> mriedem: Thanks
14:08:36 <davidgiluk> hmm, Luis doesn't seem to be back; I was hoping he'd fix the postcopy network bug
14:08:37 <mriedem> i don't think we ever have a friday deadline
14:08:44 <mdbooth> Yeah, I'm definitely up against that
14:08:56 <mdbooth> Esp as I'm at a conference next week :/
14:09:19 <PaulMurray> that is close
14:10:16 <mriedem> mdbooth: i thought there were red hat team trust falls in toronto?
14:10:37 <mdbooth> mriedem: It's KVM Forum, and we're also doing Red Hat team trust falls
14:10:42 <PaulMurray> mdbooth, will anyone be able to continue while you're gone ?
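[As context for the generator functions mdbooth mentions above: a minimal sketch of what accessors like get_ephemerals() might look like over the libvirt driver's block_device_info dict. The bodies below are illustrative assumptions, not the actual code from the series under review.]

```python
# Illustrative sketch only - not the code from mdbooth's series.
# Assumes nova's block_device_info dict layout, where ephemeral disks
# live under 'ephemerals' and the swap disk under 'swap'.

def get_ephemerals(block_device_info):
    """Yield each ephemeral disk entry for the instance."""
    for eph in block_device_info.get('ephemerals') or []:
        yield eph


def get_swap(block_device_info):
    """Yield the swap disk entry, if the instance has one."""
    swap = block_device_info.get('swap')
    if swap and swap.get('swap_size'):
        yield swap
```

[The appeal of generators here is that callers can iterate over all of an instance's disks uniformly, without knowing how each category is stored in the dict.]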
14:10:59 <mdbooth> Incidentally, I'll be at KVM Forum and I have a talk scheduled for our internal libvirt team
14:11:11 <mdbooth> I was planning to talk about all the features we wish we had in libvirt
14:11:28 <mdbooth> If anybody would like me to present their pet feature to RH's libvirt team, let me know
14:11:56 <mdbooth> PaulMurray: Realistically no, but I'll be working on it
14:11:57 <davidgiluk> mdbooth: Anything you want from qemu?
14:12:30 <mdbooth> davidgiluk: I'm sure we'll let you know :)
14:12:45 <mdbooth> Feature parity for qemu nfs and iscsi would be awesome
14:13:19 <PaulMurray> let's move on
14:13:26 <PaulMurray> #topic Storage pools
14:13:33 <PaulMurray> anything to say paul-carlton2
14:14:14 <PaulMurray> no paul-carlton2
14:14:33 <PaulMurray> we can come back if we need to...
14:14:36 <PaulMurray> #topic CI
14:14:47 <paul-carlton2> yep, resubmitted spec for Ocata
14:15:09 <paul-carlton2> got feedback from danpb before I went away
14:15:24 <PaulMurray> ok
14:15:30 <paul-carlton2> more feedback to incorporate but busy with internal stuff today
14:15:49 <PaulMurray> The agenda items for CI are the ones that were there last week I think
14:16:12 <PaulMurray> https://review.openstack.org/#/c/329466/ - NFS shared storage enablement
14:16:29 <PaulMurray> that has a -1 from mriedem
14:16:42 <PaulMurray> is tdurakov around ?
14:18:27 <mriedem> it's a simple fix for the -1
14:18:30 <mriedem> needs a rebase though
14:18:44 <mriedem> i haven't checked the live migration test results since we started skipping the volume-backed live migration test
14:18:49 <mriedem> but that was supposed to make it more stable
14:18:59 <mriedem> want to give me an action to follow up on that?
14:19:26 <PaulMurray> will do
14:19:57 <PaulMurray> you mean the patch or the results ?
14:20:58 <PaulMurray> #action mriedem to follow up on https://review.openstack.org/#/c/329466/ and live migration stability
14:21:30 <mriedem> all of it
14:21:31 <mriedem> sure
14:21:37 <PaulMurray> anything else on CI
14:21:37 <mriedem> i should do something around here
14:22:18 <PaulMurray> I saw the comment about the "unexpected abort" bug being a blocker for voting
14:23:06 <PaulMurray> ah, the fix on the bug is to skip the test - have I got that right? mriedem
14:24:46 <PaulMurray> maybe I answered my own question - this is very one-way today...
14:25:00 <mriedem> yes
14:25:07 <mriedem> sorry, in 3 channel discussions
14:25:30 <PaulMurray> I noticed, but glad you're here :)
14:25:44 <PaulMurray> #topic Networking
14:26:10 <PaulMurray> Not much progress on Swami's patches
14:26:32 <mriedem> the dvr one?
14:26:37 <PaulMurray> yes
14:26:44 <mriedem> https://review.openstack.org/#/c/275073/
14:26:45 <PaulMurray> tempest one is failing - saw he rebased
14:26:49 <mriedem> ok,
14:27:04 <mriedem> i swear i was just about ready on that guy, all it needed was a rebase, and then it changed like 7 more times
14:27:11 <mriedem> so i left it alone for a while until it quieted down
14:27:56 <PaulMurray> wow - the comments are so long it takes time to load them
14:28:39 <PaulMurray> the last change was last week - I'll catch up with him and find out what's going on
14:29:11 <PaulMurray> #action PaulMurray to follow up on Swami's DVR patch status
14:30:07 <PaulMurray> johnthetubaguy has this spec developing
14:30:27 <PaulMurray> https://review.openstack.org/#/c/353982/ - WIP: Improve Nova and Neutron interactions
14:30:42 <PaulMurray> it has live migration in it and he would like reviews
14:30:51 <PaulMurray> neutron midcycle is this week
14:31:44 <PaulMurray> johnthetubaguy, does your spec cover the port binding update described in https://review.openstack.org/#/c/309416
14:32:50 <johnthetubaguy> so, ish
14:33:17 <johnthetubaguy> the idea is to work with neutron and evolve that to some place that works
14:33:32 <johnthetubaguy> I am hoping it will look similar to what we are doing with cinder too
14:33:43 <johnthetubaguy> it's very similar flows for both
14:34:17 <johnthetubaguy> the main bit is, we want to track volume attachments / port bindings inside cinder/neutron such that you have two of them on a single volume / port during live-migration
14:34:43 <johnthetubaguy> neutron needs to know when the destination binding goes active
14:35:02 <johnthetubaguy> cinder has some complications around shared per-host connections to the storage array
14:35:13 <PaulMurray> danpb suggested using gARP from the vm to indicate the new port is in use
14:35:25 <johnthetubaguy> yeah, that's basically what we do downstream
14:35:29 <PaulMurray> I talked to carl_baldwin about that a couple of weeks ago
14:35:34 <johnthetubaguy> but that depends on the network technology
14:35:46 <PaulMurray> we thought it would work in some cases but might not be general enough
14:35:49 <johnthetubaguy> it's something neutron should be doing, either way
14:36:16 <johnthetubaguy> there are driver-specific actions to take on the destination port, when it needs to go active
14:37:33 <johnthetubaguy> basically it's about getting the correct shared state, so neutron can do the gARP at the right time
14:38:26 <johnthetubaguy> anyways, I am going to the neutron midcycle this week, so they know about the issues we are facing
14:38:37 <johnthetubaguy> and hopefully come back with something that looks like a plan to fix things
14:39:17 <PaulMurray> remember the post-copy case - switchover in the middle of the migration
14:39:37 * PaulMurray has that on his mind all the time now
14:40:13 <PaulMurray> next topic is...
14:40:14 * davidgiluk has that on his mind as well; but I don't know when Luis is back and I'm not sure who else to ask
14:40:41 <PaulMurray> #topic Open Discussion
14:40:51 <PaulMurray> anything here ?
14:41:58 <johnthetubaguy> PaulMurray: yeah, I mentioned post-copy in the spec and comments on that other spec I think
14:42:27 <paul-carlton2> live migration of affinity instances is an issue
14:42:32 <johnthetubaguy> PaulMurray: turns out that's mostly our own problem though.
14:42:57 <PaulMurray> johnthetubaguy, yes, I think so
14:44:12 <PaulMurray> paul-carlton2, is that a discussion or a statement ?
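[The double-binding flow johnthetubaguy outlines above can be sketched as pseudocode. No Neutron API for multiple port bindings existed at the time of this meeting, so every call below is a hypothetical illustration of the idea, not a real client method.]

```python
# Rough sketch of the double-binding flow for a port during live
# migration. All neutron client calls here are hypothetical.

def live_migrate_port(neutron, port_id, source_host, dest_host):
    # 1. Create a second, inactive binding for the destination host;
    #    the source binding stays active and keeps carrying traffic.
    neutron.create_port_binding(port_id, host=dest_host, active=False)

    # 2. ... the live migration runs; guest memory and state move ...

    # 3. At switchover, tell neutron the destination binding is live,
    #    so it can take the driver-specific actions for that port
    #    (e.g. sending the gARP at the right time).
    neutron.activate_port_binding(port_id, host=dest_host)

    # 4. Clean up the now-unused source binding.
    neutron.delete_port_binding(port_id, host=source_host)
```

[The point of the shared state is step 3: only neutron knows what "going active" means for a given network technology, which is why the gARP belongs on its side rather than nova's.]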
14:44:16 <paul-carlton2> we need a plan to fix group policies so this can be addressed
14:44:39 <PaulMurray> how would you fix them ?
14:44:56 <paul-carlton2> both, discussed it a few weeks ago with johnthetubaguy but not sure we have agreement on a way forward
14:45:46 <paul-carlton2> I'd like to migrate all members of an affinity group in one operation but that is sort of orchestration which is out of scope
14:45:49 <johnthetubaguy> does --force not let you work around that?
14:46:08 <johnthetubaguy> at least, I think that was the original plan
14:46:26 <PaulMurray> johnthetubaguy, that's what we do for now
14:46:43 <paul-carlton2> johnthetubaguy, yes but I don't like force, it is too dangerous, requires the operator to know everything about the instance
14:47:10 <PaulMurray> we can do "check destination" now
14:47:16 <paul-carlton2> I'd like --ignore-affinity so it still uses the scheduler but overrides affinity only
14:47:17 <PaulMurray> and name the host
14:48:02 <paul-carlton2> anti-affinity is not an issue, just affinity
14:48:16 <PaulMurray> if affinity is best effort it works too
14:48:31 <PaulMurray> only strict affinity has a problem
14:48:48 <PaulMurray> so we could loosen the definition instead
14:49:21 <johnthetubaguy> so my take is probably a bit extreme, per-host affinity is basically broken anyway
14:49:32 <PaulMurray> yep
14:49:42 <johnthetubaguy> once we have the looser per-rack affinity, etc, the problem kinda goes away
14:50:01 <paul-carlton2> per-host affinity works, provided you create instances serially
14:50:02 <johnthetubaguy> well, it's less acute, I should say
14:50:52 <johnthetubaguy> so there is another option maybe...
14:51:04 <PaulMurray> or if you have strict affinity you just can't migrate
14:51:07 <johnthetubaguy> ignore disabled hosts, when checking affinity
14:51:26 <paul-carlton2> The benefit of doing all instances as a single operation is you can find a host that has space for all members of the group then move them one at a time till done
14:51:31 <johnthetubaguy> so in most cases it's disable a host, move all the VMs, we could make affinity work better in that case
14:51:48 <johnthetubaguy> paul-carlton2: yeah, true, that's the sucky bit
14:52:04 <paul-carlton2> So maybe a find-host-for-these-instances operation is the answer?
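[For reference, the --force and "check destination" behaviours discussed above correspond to the Newton-era live migration API (microversion 2.30): naming a target host without force still runs it past the scheduler, while --force bypasses the scheduler and, with it, the affinity filters. Assuming a Newton-era python-novaclient, the three modes look roughly like:]

```
# scheduler picks the host; affinity filters apply
nova live-migration <server>

# name a host, but let the scheduler verify it ("check destination")
nova live-migration <server> <target-host>

# bypass the scheduler entirely - the operator takes responsibility
nova live-migration --force <server> <target-host>
```

[This is why --force is the only escape hatch for strict affinity: any path through the scheduler re-applies the group policy that pinned the instances together in the first place.]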
14:52:14 <johnthetubaguy> so eventually, you will be able to get all the claims from the placement API, then pass them into the live-migrate
14:52:26 <johnthetubaguy> maybe we just leave this one to be fixed in the future
14:52:38 <johnthetubaguy> --force for the short term
14:53:10 <PaulMurray> johnthetubaguy, yes, if the placement api deals with claims appropriately
14:53:30 <PaulMurray> the problem is still there in the scheduler (placer)
14:54:02 <paul-carlton2> ok, but I'd like to provide a way for the operator to use the scheduler to identify host(s) that can accommodate the instances
14:54:03 <PaulMurray> but yes, if we have a two-phase get-a-claim / use-claim we can do it
14:54:14 <PaulMurray> but it becomes more orchestration
14:54:40 <PaulMurray> paul-carlton2, constraints scheduler :0
14:54:51 <PaulMurray> :)
14:55:55 <PaulMurray> I would redefine affinity to something we can do, personally
14:56:04 <PaulMurray> or refuse to migrate
14:56:11 <PaulMurray> unless admin forces
14:57:52 <paul-carlton2> johnthetubaguy, whilst on the subject of affinity and migration, could you look at https://review.openstack.org/#/c/339588/ - it is a bug fix for anti-affinity migration
14:58:25 <PaulMurray> time to close
14:58:51 <PaulMurray> thanks for coming
14:58:58 <PaulMurray> #endmeeting
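[As a footnote to the two-phase claim idea floated near the end (claim resources for the whole group first, then migrate members one at a time): a rough sketch of how that might look. The placement API had no such claims interface when this meeting took place, so every name and call below is a hypothetical illustration, not a real API.]

```python
# Hypothetical two-phase claim flow for migrating an affinity group:
# phase 1 reserves space for all members on one destination, phase 2
# consumes those claims one migration at a time. Nothing here existed
# at the time of this meeting; all names are illustrative.

def migrate_affinity_group(placement, nova, group, dest_host):
    # Phase 1: atomically claim resources for every member of the
    # group on the destination. If any claim fails, nothing moves.
    claims = placement.claim_resources(
        host=dest_host,
        requests=[member.flavor for member in group.members])

    # Phase 2: consume the claims, moving instances one at a time.
    # Strict affinity holds throughout, because the space for the
    # whole group was reserved up front.
    for member, claim in zip(group.members, claims):
        nova.live_migrate(member.id, host=dest_host, claim=claim)
```

[This captures why the two-phase split matters: it moves the "find a host that fits everyone" problem into the claim step, leaving the migrations themselves as simple one-at-a-time operations rather than orchestration.]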