14:00:11 #startmeeting Nova Live Migration
14:00:13 Meeting started Tue Jul 26 14:00:11 2016 UTC and is due to finish in 60 minutes. The chair is PaulMurray. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:14 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:16 The meeting name has been set to 'nova_live_migration'
14:00:25 o/
14:00:27 hi
14:00:28 o/
14:00:31 o/
14:00:31 o/
14:00:32 hi
14:00:40 good morning all
14:00:43 hi
14:01:00 mdbooth is on vacation this week
14:01:00 hi
14:01:06 I'm in mountain time today - I don't know how you americans can do this stuff at this time in the morning
14:01:13 o/
14:01:34 * johnthetubaguy points towards a strong coffee
14:01:39 agenda: https://wiki.openstack.org/wiki/Meetings/NovaLiveMigration
14:01:41 me neither
14:02:00 * PaulMurray only has black coffee - no creamer in room - agghhh
14:02:16 going straight in
14:02:32 #topic Libvirt image backend
14:02:59 #link http://lists.openstack.org/pipermail/openstack-dev/2016-July/099858.html - mdbooth email summary
14:03:20 bottom change is https://review.openstack.org/#/c/333271/
14:03:28 i plan on getting on that today
14:03:42 now that the api deprecation stuff is mostly done
14:03:58 mdbooth, that was a very good email btw
14:04:09 all details in there
14:04:25 note that mdbooth is on vacation this week
14:04:49 mriedem, is he back next week ?
14:04:53 yes
14:05:02 o/
14:05:05 good
14:05:25 #topic Storage pools
14:05:56 #link http://lists.openstack.org/pipermail/openstack-dev/2016-July/099931.html - storage pools message from paul-carlton2
14:06:04 paul-carlton2, anything to add ?
14:06:36 i'm not sure which Matt is being called out there
14:06:43 nope, hope to get this approved, talking to max from virtusso in nova room about it now
14:07:02 will bug mdbooth when he returns next week
14:07:10 has danpb looked at the spec?
14:07:30 if it gets some plus ones dansmith says he will look at it
14:07:43 I'll ping danpb too
14:08:08 we merge this one, so we can make any progress we can, in parallel with libvirt image backend right?
14:08:10 * dansmith nods
14:08:12 I think he has seen it, we talked about it last mid cycle i.e. in Bristol
14:08:59 johnthetubaguy, yep, there is more work than there is time to do it in Newton that does not depend on mdbooth and diana_clarke work
14:09:03 danpb wasn't on it, i added him
14:09:07 so chances are he hasn't seen it
14:09:14 mriedem, ta
14:09:27 paul-carlton2: cool, just checking my memory, sounds good
14:10:04 next
14:10:07 #topic CI
14:10:32 Where are we with CI - I lost track a bit ?
14:10:52 * johnthetubaguy looks at tdurakov
14:11:04 live-migration job - waiting for the fix from danpb
14:11:15 I am also curious if we have tempest tests up for review for all the new live-migration API operations?
14:11:31 What about https://bugs.launchpad.net/nova/+bug/1524898 ? It's coming down to an iscsi setup thing
14:11:31 Launchpad bug 1524898 in OpenStack Compute (nova) "Volume based live migration aborted unexpectedly" [High,Confirmed]
14:12:07 johnthetubaguy: i think https://review.openstack.org/#/c/338256/ and below
14:12:56 I think the plan from mid cycle was to stop testing everything in multinode job and start testing in live migration job
14:13:04 Is my memory right ?
14:13:12 mriedem: ah, yep
14:13:14 * mriedem checks etherpad
14:13:36 L441 here https://etherpad.openstack.org/p/nova-newton-midcycle
14:13:52 TODO: enable tempest tests + micro versions for live migration in live migration job - one live migration microversion in the multi-node job - try enabling nfs(tdurakov)
14:14:26 we're already doing the first one
14:14:33 mriedem: will prepare that dnm patch for nfs soon
14:14:33 that's it - you got to the line number before me
14:14:39 the 2nd (only run one microversion for live migration tests in the multinode job) requires some skips in d-g i think
14:14:47 and then the NFS one
14:15:20 mriedem: could take a look on 2nd also
14:15:38 tdurakov: ok, on the 2nd i'd ask mtreinish if there are questions, i'm not sure how he thought we'd do that one
14:15:44 w/o hacky regex in d-g
14:16:03 mriedem: ok, then only 3rd on me
14:17:13 something I was wondering, is do we need a functional test setup for some of these live-migrate bits, for testing the error handling and things? or does that mock out so much its not useful enough?
14:17:14 yeah i'm asking mtreinish about #2 in -qa
14:17:33 any updates on libvirt fix?
14:17:51 johnthetubaguy: that was on my todo list
14:18:05 johnthetubaguy: ++ for testing rollbacks in tempest
14:18:41 tdurakov, how do we test rollback in tempest ?
14:19:00 yeah, feels like that should be in the functional test jobs
14:19:11 we can check the affinity filters and things, as well, I guess
14:19:23 PaulMurray: I was thinking about at least testing it during live-migration-abort
14:19:55 tdurakov, oh, I see - need to get the timing right though - could be racy
14:20:25 PaulMurray: some hacky image?
14:20:40 that has no chance to finish with success
14:21:21 tdurakov: I've got a 512byte boot block that does that
14:21:40 davidgiluk: good:)
14:22:21 I wonder, should we have a stable job first, or could I do these tests in parallel?
14:22:26 it feels like those negative tests are not a perfect fit for tempest as such, but thats just my take
14:22:47 tdurakov, davidgiluk it sounds a little scary to me - something stable would be a good start
14:23:04 tdurakov: http://git.qemu.org/?p=qemu.git;a=commitdiff;h=ea0c6d62391d269e2d8927a80912d479a0c5cf8a
14:23:18 johnthetubaguy, functional tests would be good - may need some framework building behind it
14:23:32 davidgiluk: thanks
14:24:25 PaulMurray: agree, I'd wait for existing job being stable first, will prepare WIP patch anyway
14:25:00 tdurakov, thanks
14:25:01 yeah, we will need that deep testing once we start doing the statemachine work, just checking where it is on folks radar
14:25:24 johnthetubaguy, I have been thinking about it - the state space is enormous for lm
14:25:42 johnthetubaguy, but worth some effort
14:26:09 johnthetubaguy, maybe we can talk about it later
14:26:15 (like next week)
14:26:53 +1
14:26:54 * tdurakov started working on docs for state machine, as discussed on mid-cycle
14:27:01 ah - actually I've got some holiday coming up - but will go over anyway
14:27:19 * johnthetubaguy is looking at neutron and cinder bits of the live-migrate statemachine, in preparation for the neutron mid-cycle
14:27:20 note that dansmith brought up the need for a dsvm-integration job in nova
14:27:28 which would be backed by devstack
14:27:34 so not living in nova/tests/functional
14:27:48 so it wouldn't be tempest tests, just in tree tests that use a real backend
14:27:55 mriedem: yeah, I think it gets split between both of those
14:28:25 but...nova-dsvm-integration requiring 2 nodes gets a bit wonky
14:28:32 at least on a single-node dev vm
14:28:40 oh... hmm.
14:28:46 forgot about that
14:28:56 anyway, something to think about
14:29:47 lots of stuff to think about it seems
14:30:09 but I think we have an immediate plan
14:30:28 so lets move on
14:30:40 #topic Networking
14:30:55 a couple of things to bring up here
14:31:08 from mid cycle:
14:31:09 Swami's DVR fix to be merged when tempest test passes
14:31:22 fix: https://review.openstack.org/#/c/275073
14:31:35 tempest: https://review.openstack.org/#/c/286855
14:31:40 has to be rebased
14:31:43 the nova change
14:31:51 b/c the pci migration change merged
14:32:09 I'm not sure if swami is aware of the decision from last week
14:32:13 the tempest test passed last week btw
14:32:14 I can tell him
14:32:32 PaulMurray: L89 here https://etherpad.openstack.org/p/nova-newton-midcycle
14:32:33 for notes
14:32:58 thanks
14:33:17 following on from that...
14:33:19 Future port info spec to be worked on for Ocata
14:33:33 Current form: https://review.openstack.org/#/c/309416
14:34:17 I am attending the neutron midcycle to try and connect our efforts up a little bit more, the new live-migrate flow will be a big part of that
14:34:20 * PaulMurray I locked up for a moment
14:35:00 I was trying to type in: johnthetubaguy is working on this
14:35:36 heh, yeah, I hope to look into that
14:35:53 I saw armax this week, he said you had been on to him already
14:36:31 yeah, caught him on friday
14:36:58 The next item is slightly related: Bug: post-copy interrupts networking
14:37:20 i entered a bug to make a note of this: https://bugs.launchpad.net/nova/+bug/1605016
14:37:20 Launchpad bug 1605016 in OpenStack Compute (nova) "Post copy live migration interrupts network connectivity" [Undecided,New]
14:37:39 * davidgiluk looks
14:37:42 We can't say that post-copy is serious until this is dealt with
14:37:54 and I think it will need a spec
14:38:08 +1
14:38:14 its a good problem to focus our minds
14:38:34 PaulMurray: Oh
14:38:41 luis5tb: ^^^
14:38:59 did you lose connectivity all the time?
with one specific VM?
14:39:17 its not really about losing connectivity
14:39:27 its just there is no networking while post copy is happening
14:40:21 at least it can speed up post copy :)
14:40:22 from after the switch until the end, right?
14:40:28 right
14:41:25 ok, I guess I tried either with VMs that did not have that much dirty memory after the switch, so it was fairly fast
14:42:27 and I did not try to connect during the migration either
14:43:25 I expect all our tests check to see if a migration happens - could do with something to test what happens to connectivity at some point
14:43:36 during migration
14:44:21 not sure where to do that kind of thing
14:44:55 mriedem, PaulMurray should we start thinking about switching to neutron instead of nova-network for live-migration?
14:45:35 also related to postcopy, we did not handle what to do if migration fails after the switch. VM at destination will be cleaned up, but the one at source will be left "broken"
14:45:49 tdurakov: for the live migration job?
14:45:58 there is a neutron multi-node job that runs live migration i think
14:46:07 i don't know how stable that is vs the nova-net multinode job
14:46:15 mriedem: yes
14:47:23 use failapotomus to find out
14:47:25 luis5tb: I think its simpler, the host just doesn't have traffic flowing to it until we tell it, which is currently at the very end of live-migrate, as I understood it
14:47:42 #action mriedem to compare live migration job results between nova-net and neutron multinode jobs
14:48:49 johnthetubaguy: yes, that I know, I was just mentioning another "unsolved" problem/issue, the bug is clear
14:49:35 does anyone want to take an action to look at the post-copy networking issue ?
14:49:48 tdurakov, pkoniszewski and I talked about it
14:50:06 we could do with a plan
14:50:22 I would like to take a look, but I'm on vacations after this week
14:50:57 luis5tb: gotcha
14:51:50 johnthetubaguy, I meant to say that I am interested in the networking stuff as well
14:52:34 johnthetubaguy, I'm going on holiday for some of august too - so I'll just keep tabs on what you're up to
14:53:27 johnthetubaguy: I think this case should be covered in state transitions also
14:53:54 as it requires communication with networking service
14:54:51 #topic Open Discussion
14:54:56 anything to finish on ?
14:55:22 I will send out a status summary email this week
14:55:44 #action PaulMurray to send out status summary
14:56:19 I will be on vacation for some of august, including next week
14:56:38 can someone volunteer to run the meeting for the next couple of weeks ?
14:56:53 i could
14:57:04 tdurakov, thanks - you got it
14:57:20 I'll be back mid august so I'll catch up with you then
14:57:37 ok
14:57:50 I think we are done
14:57:55 thank you all for coming
14:58:03 #endmeeting