14:00:27 #startmeeting Nova Live Migration
14:00:28 Meeting started Tue Aug 16 14:00:27 2016 UTC and is due to finish in 60 minutes. The chair is PaulMurray. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:29 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:32 The meeting name has been set to 'nova_live_migration'
14:00:46 hi - anyone here for LM
14:01:01 hi
14:01:18 o/
14:01:24 o/
14:01:26 o/
14:01:27 o/
14:01:49 I thought there would be no one for a minute - I was going to head for the beach
14:02:03 agenda here: https://wiki.openstack.org/wiki/Meetings/NovaLiveMigration
14:02:13 * mdbooth is lurking, trying to address a bunch of review comments before eod
14:02:16 I'm still catching up after vacation
14:02:33 #topic Libvirt image backend
14:02:44 anything here mdbooth ?
14:02:53 Yep
14:03:09 I've had a bunch of really good review from jaypipes and dansmith
14:03:28 And I'm hoping I can land the test changes over the next few days
14:03:36 saw some disagreements about your test
14:03:45 That's actually a substantial chunk of the work
14:03:54 Yeah, the substantive disagreements are done, though
14:04:02 That was just trivia
14:04:34 So there's still a ton of stuff to land, and I'm actually furiously rewriting some of it :)
14:04:36 good - so just need reviews
14:04:45 But it progresses
14:04:50 is the series getting longer or shorter ?
14:04:58 It is now getting shorter
14:05:23 good stuff - I'll spend some time on it this week, but you really need the +2s
14:05:32 Yup
14:05:39 I wrote some generator functions, btw
14:05:50 like get_ephemerals()
14:06:06 and functions to fetch local disk, swap, etc
14:06:20 These were kinda ugly and represented a different interface
14:06:35 So as the reviews haven't got there yet I'm rewriting that part of the series
14:06:44 I say rewrite, but it's really code motion.
14:07:04 If the reviews get there, I'll convert the new stuff into future patches and keep the existing code.
14:07:21 ok - anything else
14:07:31 Remind me what the deadline is?
14:07:40 I was just trying to remember
14:07:43 I'm guessing I'm starting to be up against that.
14:07:47 mriedem, ?
14:08:11 * mriedem reads scrollback
14:08:16 9/2
14:08:23 https://wiki.openstack.org/wiki/Nova/Newton_Release_Schedule
14:08:32 probably more like 9/1
14:08:36 mriedem: Thanks
14:08:36 hmm, Luis doesn't seem to be back; I was hoping he'd fix the postcopy network bug
14:08:37 i don't think we ever have a friday deadline
14:08:44 Yeah, I'm definitely up against that
14:08:56 Esp as I'm at a conference next week :/
14:09:19 that is close
14:10:16 mdbooth: i thought there were red hat team trust falls in toronto?
14:10:37 mriedem: It's KVM forum, and we're also doing Red Hat team trust falls
14:10:42 mdbooth, will anyone be able to continue while you're gone ?
14:10:59 Incidentally, I'll be at KVM forum and I have a talk scheduled for our internal libvirt team
14:11:11 I was planning to talk about all the features we wish we had in libvirt
14:11:28 If anybody would like me to present their pet feature to RH's libvirt team, let me know
14:11:56 PaulMurray: Realistically no, but I'll be working on it
14:11:57 mdbooth: Anything you want from qemu?
14:12:30 davidgiluk: I'm sure we'll let you know :)
14:12:45 Feature parity for qemu nfs and iscsi would be awesome
14:13:19 let's move on
14:13:26 #topic Storage pools
14:13:33 anything to say paul-carlton2
14:14:14 no paul-carlton2
14:14:33 we can come back if we need to....
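[Editor's note on the libvirt image backend topic above: a minimal sketch of the generator-style disk accessors mdbooth describes. get_ephemerals() is named in the log; the disk_mapping structure and every other name here are assumptions for illustration only.]

    def get_ephemerals(disk_mapping):
        """Yield (name, spec) pairs for ephemeral disks only.

        disk_mapping is assumed to map disk names like 'disk',
        'disk.eph0', 'disk.swap' to per-disk specs.
        """
        for name, spec in sorted(disk_mapping.items()):
            if name.startswith('disk.eph'):
                yield name, spec

    def get_swap(disk_mapping):
        """Yield the swap disk, if the instance has one."""
        spec = disk_mapping.get('disk.swap')
        if spec is not None:
            yield 'disk.swap', spec

    # Example: dict(get_ephemerals({'disk': {}, 'disk.eph0': {}}))
    # returns {'disk.eph0': {}}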
14:14:36 #topic CI
14:14:47 yep, resubmitted spec for Ocata
14:15:09 got feedback from danpb before I went away
14:15:24 ok
14:15:30 more feedback to incorporate but busy with internal stuff today
14:15:49 The agenda items for CI are the ones that were there last week I think
14:16:12 https://review.openstack.org/#/c/329466/ - NFS shared storage enablement
14:16:29 that has a -1 from mriedem
14:16:42 is tdurakov around ?
14:18:27 it's a simple fix for the -1
14:18:30 needs a rebase though
14:18:44 i haven't checked the live migration test results since we started skipping the volume-backed live migration test
14:18:49 but that was supposed to make it more stable
14:18:59 want to give me an action to follow up on that?
14:19:26 will do
14:19:57 you mean the patch or the results ?
14:20:58 #action mriedem to follow up on https://review.openstack.org/#/c/329466/ and live migration stability
14:21:30 all of it
14:21:31 sure
14:21:37 anything else on CI
14:21:37 i should do something around here
14:22:18 I saw the comment on the "unexpected abort" bug being a blocker for voting
14:23:06 ah, the fix on the bug is to skip the test - have I got that right? mriedem
14:24:46 maybe I answered my own question - this is very one way today........
14:25:00 yes
14:25:07 sorry, in 3 channel discussions
14:25:30 I noticed, but glad you're here :)
14:25:44 #topic Networking
14:26:10 Not much progress on Swami's patches
14:26:32 the dvr one?
14:26:37 yes
14:26:44 https://review.openstack.org/#/c/275073/
14:26:45 tempest one is failing - saw he rebased
14:26:49 ok,
14:27:04 i swear i was just about ready on that guy, all it needed was a rebase, and then it changed like 7 more times
14:27:11 so i left it alone for awhile until it quieted down
14:27:56 wow - the comments are so long it takes time to load them
14:28:39 the last change was last week - I'll catch up with him and find out what's going on
14:29:11 #action PaulMurray to follow up on Swami's DVR patch status
14:30:07 johnthetubaguy has this spec developing
14:30:27 https://review.openstack.org/#/c/353982/ - WIP: Improve Nova and Neutron interactions
14:30:42 it has live migration in it and he would like reviews
14:30:51 neutron mid cycle is this week
14:31:44 johnthetubaguy, does your spec cover the port binding update described in https://review.openstack.org/#/c/309416
14:32:50 so, ish
14:33:17 the idea is to work with neutron and evolve that to some place that works
14:33:32 I am hoping it will look similar to what we are doing with cinder too
14:33:43 it's very similar flows for both
14:34:17 the main bit is, we want to track volume attachments / port bindings inside cinder/neutron such that you have two of them on a single volume / port during live-migration
14:34:43 neutron needs to know when the destination binding goes active
14:35:02 cinder has some complications around shared per host connections to the storage array
14:35:13 danpb suggested using gARP from the vm to indicate the new port is in use
14:35:25 yeah, that's basically what we do downstream
14:35:29 I talked to carl_baldwin about that a couple of weeks ago
14:35:34 but that depends on the network technology
14:35:46 we thought it would work in some cases but might not be general enough
14:35:49 it's something neutron should be doing, either way
14:36:16 there are driver specific actions to take on the destination port, when it needs to go active
14:37:33 basically it's about getting the correct shared state, so neutron can do the gARP at the right time
14:38:26 anyways, I am going to the neutron midcycle this week, so they know about the issues we are facing
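[Editor's note: a rough sketch of the two-binding flow johnthetubaguy describes above - a port briefly holds a source and a destination binding during live migration, and neutron activates the destination binding at switchover. None of these client calls existed at the time of this discussion; every name here is hypothetical.]

    def live_migrate_port(neutron, port_id, source_host, dest_host):
        # 1. Create a second, inactive binding for the destination host,
        #    so the port has two bindings at once during the migration.
        neutron.create_port_binding(port_id, host=dest_host,
                                    status='INACTIVE')

        # 2. ... guest memory/disk copy happens here ...

        # 3. At switchover, tell neutron the destination binding is now
        #    active. Driver-specific wiring on the destination port and
        #    the gARP happen inside neutron, which now knows the right
        #    time to do them.
        neutron.activate_port_binding(port_id, host=dest_host)

        # 4. Clean up the stale source binding.
        neutron.delete_port_binding(port_id, host=source_host)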
14:38:37 and hopefully come back with something that looks like a plan to fix things
14:39:17 remember the post copy case - switchover in the middle of the migration
14:39:37 * PaulMurray has that on his mind all the time now
14:40:13 next topic is......
14:40:14 * davidgiluk has that on his mind as well; but I don't know when Luis is back and I'm not sure who else to ask
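[Editor's note: post-copy switches the guest to the destination partway through the migration, which is why the switchover timing matters for networking. A minimal sketch using the libvirt python bindings (VIR_MIGRATE_POSTCOPY and migrateStartPostCopy are real APIs as of libvirt 1.3.3); the URIs and domain name are assumptions, and error handling and progress monitoring are omitted.]

    import threading
    import time

    import libvirt

    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByName('instance-00000001')

    flags = (libvirt.VIR_MIGRATE_LIVE |
             libvirt.VIR_MIGRATE_PEER2PEER |
             libvirt.VIR_MIGRATE_POSTCOPY)

    # migrateToURI blocks until the migration finishes, so run it in a
    # background thread.
    t = threading.Thread(
        target=dom.migrateToURI,
        args=('qemu+tcp://dest-host/system', flags, None, 0))
    t.start()

    # Crude stand-in for monitoring pre-copy progress before switching.
    time.sleep(5)

    # From this point the guest executes on the destination while the
    # remaining memory is pulled over on demand - so networking must
    # already point at the destination, which is exactly the
    # switchover-in-the-middle problem raised in the log.
    dom.migrateStartPostCopy(0)
    t.join()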
14:40:41 #topic Open Discussion
14:40:51 anything here ?
14:41:58 PaulMurray: yeah, I mentioned post copy in the spec and comments on that other spec I think
14:42:27 live migration of affinity instances is an issue
14:42:32 PaulMurray: turns out that's mostly our own problem though.
14:42:57 johnthetubaguy, yes, I think so
14:44:12 paul-carlton2, is that a discussion or a statement ?
14:44:16 we need a plan to fix group policies so this can be addressed
14:44:39 how would you fix them ?
14:44:56 both, discussed it a few weeks ago with johnthetubaguy but not sure we have agreement on a way forward
14:45:46 I'd like to migrate all members of an affinity group in one operation but that is sort of orchestration which is out of scope
14:45:49 does --force not let you work around that?
14:46:08 at least, I think that was the original plan
14:46:26 johnthetubaguy, that's what we do for now
14:46:43 johnthetubaguy, yes but I don't like force, it is too dangerous, requires the operator to know everything about the instance
14:47:10 we can do "check destination" now
14:47:16 I'd like --ignore-affinity so it still uses the scheduler but overrides affinity only
14:47:17 and name the host
14:48:02 anti-affinity is not an issue, just affinity
14:48:16 if affinity is best effort it works too
14:48:31 only strict affinity has a problem
14:48:48 so could loosen the definition instead
14:49:21 so my take is probably a bit extreme, per host affinity is basically broken anyway
14:49:32 yep
14:49:42 once we have the looser per rack affinity, etc, the problem kinda goes away
14:50:01 per host affinity works, provided you create instances serially
14:50:02 well, it's less acute, I should say
14:50:52 so there is another option maybe...
14:51:04 or if you have strict affinity you just can't migrate
14:51:07 ignore disabled hosts, when checking affinity
14:51:26 The benefit of doing all instances as a single operation is you can find a host that has space for all members of the group then move them one at a time till done
14:51:31 so most cases it's disable a host, move all the VMs, we could make affinity work better in that case
14:51:48 paul-carlton2: yeah, true, that's the suck-ey bit
14:52:04 So maybe a find host for these instances operation is the answer?
14:52:14 so eventually, you will be able to get all the claims from the placement API, then pass them into the live-migrate
14:52:26 maybe we just leave this one to be fixed in the future
14:52:38 --force for the short term
14:53:10 johnthetubaguy, yes, if the placement api deals with claims appropriately
14:53:30 the problem is still there in the scheduler (placer)
14:54:02 ok, but I'd like to provide a way for the operator to use the scheduler to identify host(s) that can accommodate the instances
14:54:03 but yes, if we have a two phase get a claim/use claim we can do it
14:54:14 but it becomes more orchestration
14:54:40 paul-carlton2, constraints scheduler :0
14:54:51 :)
14:55:55 I would redefine affinity to something we can do personally
14:56:04 or refuse to migrate
14:56:11 unless admin forces
14:57:52 johnthetubaguy whilst on the subject of affinity and migration could you look at https://review.openstack.org/#/c/339588/, it is a bug fix to anti-affinity migration
14:58:25 time to close
14:58:51 thanks for coming
14:58:58 #endmeeting
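[Editor's note on the affinity discussion: the "ignore disabled hosts, when checking affinity" idea could be sketched as a scheduler-filter tweak. This is a simplified illustration, not nova's real ServerGroupAffinityFilter; the data structures and function name are assumptions.]

    def affinity_host_passes(candidate_host, group_hosts, disabled_hosts):
        """Strict affinity: the candidate must match the hosts the group
        already runs on - but hosts that are disabled (e.g. being drained
        for maintenance) are ignored, so the group can migrate off them.
        """
        active_group_hosts = set(group_hosts) - set(disabled_hosts)
        # An empty set means every group member sits on a disabled host,
        # so any candidate is acceptable; once the first member lands,
        # the rest of the group is pulled to the same new host.
        return not active_group_hosts or candidate_host in active_group_hosts

    # Example: the group is on host1, which has been disabled for
    # maintenance, so host2 now passes the strict-affinity check:
    # affinity_host_passes('host2', {'host1'}, {'host1'}) -> True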