14:00:11 <PaulMurray> #startmeeting Nova Live Migration
14:00:13 <openstack> Meeting started Tue Jul 26 14:00:11 2016 UTC and is due to finish in 60 minutes.  The chair is PaulMurray. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:14 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:16 <openstack> The meeting name has been set to 'nova_live_migration'
14:00:25 <mriedem> o/
14:00:27 <andrearosa> hi
14:00:28 <auggy> o/
14:00:31 <diana_clarke> o/
14:00:31 <luis5tb> o/
14:00:32 <paul-carlton2> hi
14:00:40 <PaulMurray> good morning all
14:00:43 <tdurakov> hi
14:01:00 <diana_clarke> mdbooth is on vacation this week
14:01:00 <johnthetubaguy> hi
14:01:06 <PaulMurray> I'm in mountain time today - I don't know how you americans can do this stuff at this time in the morning
14:01:13 <woodster_> o/
14:01:34 * johnthetubaguy points towards a strong coffee
14:01:39 <PaulMurray> agenda: https://wiki.openstack.org/wiki/Meetings/NovaLiveMigration
14:01:41 <auggy> me neither
14:02:00 * PaulMurray only has black coffee - no creamer in room - agghhh
14:02:16 <PaulMurray> going straight in
14:02:32 <PaulMurray> #topic Libvirt image backend
14:02:59 <PaulMurray> #link http://lists.openstack.org/pipermail/openstack-dev/2016-July/099858.html - mdbooth email summary
14:03:20 <mriedem> bottom change is https://review.openstack.org/#/c/333271/
14:03:28 <mriedem> i plan on getting on that today
14:03:42 <mriedem> now that the api deprecation stuff is mostly done
14:03:58 <PaulMurray> mdbooth, that was a very good email btw
14:04:09 <PaulMurray> all details in there
14:04:25 <mriedem> note that mdbooth is on vacation this week
14:04:49 <PaulMurray> mriedem, is he back next week ?
14:04:53 <diana_clarke> yes
14:05:02 <davidgiluk> o/
14:05:05 <PaulMurray> good
14:05:25 <PaulMurray> #topic Storage pools
14:05:56 <PaulMurray> #link http://lists.openstack.org/pipermail/openstack-dev/2016-July/099931.html - storage pools message from paul-carlton2
14:06:04 <PaulMurray> paul-carlton2, anything to add ?
14:06:36 <mriedem> i'm not sure which Matt is being called out there
14:06:43 <paul-carlton2> nope, hope to get this approved, talking to max from Virtuozzo in nova room about it now
14:07:02 <paul-carlton2> will bug mdbooth when he returns next week
14:07:10 <mriedem> has danpb looked at the spec?
14:07:30 <paul-carlton2> if it gets some plus ones dansmith says he will look at it
14:07:43 <paul-carlton2> I'll ping danpb too
14:08:08 <johnthetubaguy> we merge this one, so we can make any progress we can, in parallel with libvirt image backend right?
14:08:10 * dansmith nods
14:08:12 <paul-carlton2> I think he has seen it, we talked about it last mid cycle i.e. in Bristol
14:08:59 <paul-carlton2> johnthetubaguy, yep, there is more work than there is time to do it in Newton that does not depend on mdbooth and diana_clarke work
14:09:03 <mriedem> danpb wasn't on it, i added him
14:09:07 <mriedem> so chances are he hasn't seen it
14:09:14 <paul-carlton2> mriedem, ta
14:09:27 <johnthetubaguy> paul-carlton2: cool, just checking my memory, sounds good
14:10:04 <PaulMurray> next
14:10:07 <PaulMurray> #topic CI
14:10:32 <PaulMurray> Where are we with CI - I lost track a bit ?
14:10:52 * johnthetubaguy looks at tdurakov
14:11:04 <tdurakov> live-migration job - waiting for the fix from danpb
14:11:15 <johnthetubaguy> I am also curious if we have tempest tests up for review for all the new live-migration API operations?
14:11:31 <davidgiluk> What about https://bugs.launchpad.net/nova/+bug/1524898 ? It's coming down to an iSCSI setup thing
14:11:31 <openstack> Launchpad bug 1524898 in OpenStack Compute (nova) "Volume based live migration aborted unexpectedly" [High,Confirmed]
14:12:07 <mriedem> johnthetubaguy: i think https://review.openstack.org/#/c/338256/ and below
14:12:56 <PaulMurray> I think the plan from mid cycle was to stop testing everything in multinode job and start testing in live migration job
14:13:04 <PaulMurray> Is my memory right ?
14:13:12 <johnthetubaguy> mriedem: ah, yep
14:13:14 * mriedem checks etherpad
14:13:36 <mriedem> L441 here https://etherpad.openstack.org/p/nova-newton-midcycle
14:13:52 <mriedem> TODO: enable tempest tests + micro versions for live migration in live migration job - one live migration microversion in the multi-node job - try enabling nfs(tdurakov)
14:14:26 <mriedem> we're already doing the first one
14:14:33 <tdurakov> mriedem: will prepare that dnm patch for nfs soon
14:14:33 <PaulMurray> that's it - you got to the line number before me
14:14:39 <mriedem> the 2nd (only run one microversion for live migration tests in the multinode job) requires some skips in d-g i think
14:14:47 <mriedem> and then the NFS one
14:15:20 <tdurakov> mriedem: could take a look on 2nd also
14:15:38 <mriedem> tdurakov: ok, on the 2nd i'd ask mtreinish if there are questions, i'm not sure how he thought we'd do that one
14:15:44 <mriedem> w/o hacky regex in d-g
14:16:03 <tdurakov> mriedem: ok, then only 3rd on me
14:17:13 <johnthetubaguy> something I was wondering is, do we need a functional test setup for some of these live-migrate bits, for testing the error handling and things? or does that mock out so much it's not useful enough?
14:17:14 <mriedem> yeah i'm asking mtreinish about #2 in -qa
14:17:33 <tdurakov> any updates on libvirt fix?
14:17:51 <tdurakov> johnthetubaguy: that was on my todo list
14:18:05 <tdurakov> johnthetubaguy: ++ for testing rollbacks in tempest
14:18:41 <PaulMurray> tdurakov, how do we test rollback in tempest ?
14:19:00 <johnthetubaguy> yeah, feels like that should be in the functional test jobs
14:19:11 <johnthetubaguy> we can check the affinity filters and things, as well, I guess
14:19:23 <tdurakov> PaulMurray: I was thinking about at least testing it during live-migration-abort
14:19:55 <PaulMurray> tdurakov, oh, I see - need to get the timing right though - could be racy
14:20:25 <tdurakov> PaulMurray: some hacky image?
14:20:40 <tdurakov> that has no chance to finish with success
14:21:21 <davidgiluk> tdurakov: I've got a 512byte boot block that does that
14:21:40 <tdurakov> davidgiluk: good:)
14:22:21 <tdurakov> I wonder, should we have a stable job first, or could I do these tests in parallel?
14:22:26 <johnthetubaguy> it feels like those negative tests are not a perfect fit for tempest as such, but that's just my take
14:22:47 <PaulMurray> tdurakov, davidgiluk it sounds a little scary to me - something stable would be a good start
14:23:04 <davidgiluk> tdurakov: http://git.qemu.org/?p=qemu.git;a=commitdiff;h=ea0c6d62391d269e2d8927a80912d479a0c5cf8a
14:23:18 <PaulMurray> johnthetubaguy, functional tests would be good - may need some framework building behind it
14:23:32 <tdurakov> davidgiluk: thanks
14:24:25 <tdurakov> PaulMurray: agree, I'd wait for existing job being stable first, will prepare WIP patch anyway
14:25:00 <PaulMurray> tdurakov, thanks
14:25:01 <johnthetubaguy> yeah, we will need that deep testing once we start doing the statemachine work, just checking where it is on folks radar
14:25:24 <PaulMurray> johnthetubaguy, I have been thinking about it - the state space is enormous for lm
14:25:42 <PaulMurray> johnthetubaguy, but worth some effort
14:26:09 <PaulMurray> johnthetubaguy, maybe we can talk about it later
14:26:15 <PaulMurray> (like next week)
14:26:53 <johnthetubaguy> +1
14:26:54 * tdurakov started working on docs for state machine, as discussed on mid-cycle
14:27:01 <PaulMurray> ah - actually I've got some holiday coming up - but will go over anyway
14:27:19 * johnthetubaguy is looking at neutron and cinder bits of the live-migrate statemachine, in preparation for the neutron mid-cycle
14:27:20 <mriedem> note that dansmith brought up the need for a dsvm-integration job in nova
14:27:28 <mriedem> which would be backed by devstack
14:27:34 <mriedem> so not living in nova/tests/functional
14:27:48 <mriedem> so it wouldn't be tempest tests, just in tree tests that use a real backend
14:27:55 <johnthetubaguy> mriedem: yeah, I think it gets split between both of those
14:28:25 <mriedem> but...nova-dsvm-integration requiring 2 nodes gets a bit wonky
14:28:32 <mriedem> at least on a single-node dev vm
14:28:40 <johnthetubaguy> oh... hmm.
14:28:46 <johnthetubaguy> forgot about that
14:28:56 <mriedem> anyway, something to think about
14:29:47 <PaulMurray> lots of stuff to think about it seems
14:30:09 <PaulMurray> but I think we have an immediate plan
14:30:28 <PaulMurray> so let's move on
14:30:40 <PaulMurray> #topic Networking
14:30:55 <PaulMurray> a couple of things to bring up here
14:31:08 <PaulMurray> from mid cycle:
14:31:09 <PaulMurray> Swami's DVR fix to be merged when tempest test passes
14:31:22 <PaulMurray> fix: https://review.openstack.org/#/c/275073
14:31:35 <PaulMurray> tempest: https://review.openstack.org/#/c/286855
14:31:40 <mriedem> has to be rebased
14:31:43 <mriedem> the nova change
14:31:51 <mriedem> b/c the pci migration change merged
14:32:09 <PaulMurray> I'm not sure if swami is aware of the decision from last week
14:32:13 <mriedem> the tempest test passed last week btw
14:32:14 <PaulMurray> I can tell him
14:32:32 <mriedem> PaulMurray: L89 here https://etherpad.openstack.org/p/nova-newton-midcycle
14:32:33 <mriedem> for notes
14:32:58 <PaulMurray> thanks
14:33:17 <PaulMurray> following on from that...
14:33:19 <PaulMurray> Future port info spec to be worked on for Ocata
14:33:33 <PaulMurray> Current form: https://review.openstack.org/#/c/309416
14:34:17 <johnthetubaguy> I am attending the neutron midcycle to try and connect our efforts up a little bit more, the new live-migrate flow will be a big part of that
14:34:20 * PaulMurray I locked up for a moment
14:35:00 <PaulMurray> I was trying to type in: johnthetubaguy is working on this
14:35:36 <johnthetubaguy> heh, yeah, I hope to look into that
14:35:53 <PaulMurray> I saw armax this week, he said you had been on to him already
14:36:31 <johnthetubaguy> yeah, caught him on friday
14:36:58 <PaulMurray> The next item is slightly related: Bug: post-copy interrupts networking
14:37:20 <PaulMurray> i entered a bug to make a note of this: https://bugs.launchpad.net/nova/+bug/1605016
14:37:20 <openstack> Launchpad bug 1605016 in OpenStack Compute (nova) "Post copy live migration interrupts network connectivity" [Undecided,New]
14:37:39 * davidgiluk looks
14:37:42 <PaulMurray> We can't say that post-copy is serious until this is dealt with
14:37:54 <PaulMurray> and I think it will need a spec
14:38:08 <johnthetubaguy> +1
14:38:14 <johnthetubaguy> it's a good problem to focus our minds
14:38:34 <davidgiluk> PaulMurray: Oh
14:38:41 <davidgiluk> luis5tb: ^^^
14:38:59 <luis5tb> did you lose connectivity all the time? with one specific VM?
14:39:17 <PaulMurray> it's not really about losing connectivity
14:39:27 <PaulMurray> it's just there is no networking while post copy is happening
14:40:21 <PaulMurray> at least it can speed up post copy :)
14:40:22 <luis5tb> from after the switch until the end, right?
14:40:28 <PaulMurray> right
14:41:25 <luis5tb> ok, I guess I tried either with VMs that did not have that much dirty memory after the switch, so it was fairly fast
14:42:27 <luis5tb> and I did not try to connect during the migration either
14:43:25 <PaulMurray> I expect all our tests check to see if a migration happens - could do with something to test what happens to connectivity at some point
14:43:36 <PaulMurray> during migration
14:44:21 <PaulMurray> not sure where to do that kind of thing
14:44:55 <tdurakov> mriedem, PaulMurray should we start thinking about switching to neutron instead of nova-network for live-migration?
14:45:35 <luis5tb> also related to postcopy, we did not handle what to do if migration fails after the switch. VM at destination will be cleaned up, but the one at source will be left "broken"
14:45:49 <mriedem> tdurakov: for the live migration job?
14:45:58 <mriedem> there is a neutron multi-node job that runs live migration i think
14:46:07 <mriedem> i don't know how stable that is vs the nova-net multinode job
14:46:15 <tdurakov> mriedem: yes
14:47:23 <mriedem> use failapotomus to find out
14:47:25 <johnthetubaguy> luis5tb: I think it's simpler, the host just doesn't have traffic flowing to it until we tell it, which is currently at the very end of live-migrate, as I understood it
14:47:42 <mriedem> #action mriedem to compare live migration job results between nova-net and neutron multinode jobs
14:48:49 <luis5tb> johnthetubaguy: yes, that I know, I was just mentioning another "unsolved" problem/issue, the bug is clear
14:49:35 <PaulMurray> does anyone want to take an action to look at the post-copy networking issue ?
14:49:48 <PaulMurray> tdurakov, pkoniszewski and I talked about it
14:50:06 <PaulMurray> we could do with a plan
14:50:22 <luis5tb> I would like to take a look, but I'm on vacation after this week
14:50:57 <johnthetubaguy> luis5tb: gotcha
14:51:50 <PaulMurray> johnthetubaguy, I meant to say that I am interested in the networking stuff as well
14:52:34 <PaulMurray> johnthetubaguy, I'm going on holiday for some of august too - so I'll just keep tabs on what you're up to
14:53:27 <tdurakov> johnthetubaguy: I think this case should be covered in state transitions also
14:53:54 <tdurakov> as it requires communication with networking service
14:54:51 <PaulMurray> #topic Open Discussion
14:54:56 <PaulMurray> anything to finish on ?
14:55:22 <PaulMurray> I will send out a status summary email this week
14:55:44 <PaulMurray> #action PaulMurray to send out status summary
14:56:19 <PaulMurray> I will be on vacation for some of august, including next week
14:56:38 <PaulMurray> can someone volunteer to run the meeting for the next couple of weeks ?
14:56:53 <tdurakov> i could
14:57:04 <PaulMurray> tdurakov, thanks - you got it
14:57:20 <PaulMurray> I'll be back mid august so I'll catch up with you then
14:57:37 <tdurakov> ok
14:57:50 <PaulMurray> I think we are done
14:57:55 <PaulMurray> thank you all for coming
14:58:03 <PaulMurray> #endmeeting