14:00:24 <tdurakov> #startmeeting Nova Live Migration
14:00:24 <openstack> Meeting started Tue Oct 18 14:00:24 2016 UTC and is due to finish in 60 minutes.  The chair is tdurakov. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:25 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:28 <openstack> The meeting name has been set to 'nova_live_migration'
14:00:35 <tdurakov> hi everyone
14:00:43 <davidgiluk> o/
14:00:45 <pkoniszewski> o/
14:00:45 <wznoinsk> o/
14:00:53 <mriedem> o/
14:00:58 <scsnow> hi
14:01:22 <tdurakov> #topic CI
14:01:34 <tdurakov> update on ceph
14:01:53 <tdurakov> https://review.openstack.org/#/c/387836/ - start moving local changes to fit upstream gates
14:02:28 <tdurakov> this change uses devstack ceph plugin
14:03:05 <tdurakov> pkoniszewski: any updates on grenade?
14:03:11 <pkoniszewski> sure
14:03:34 <tdurakov> except I could see it on the list of jobs :)
14:03:48 <pkoniszewski> so patches are still in the same place, tempest change got some reviews but is still not merged
14:04:15 <pkoniszewski> one thing I noticed is that grenade+LM job fails on all patches to, e.g., stable/newton
14:04:22 <pkoniszewski> http://logs.openstack.org/76/386676/1/check/gate-grenade-dsvm-multinode-live-migration-nv/b5ac5ce/logs/grenade.sh.txt.gz#_2016-10-14_16_12_48_938
14:04:54 <pkoniszewski> probably something that we need to care about
14:05:07 <tdurakov> pkoniszewski: is it only about grenade?
14:05:18 <pkoniszewski> mriedem: thoughts? should i change this job to use trusty when testing on stable/newton?
14:05:35 <mriedem> pkoniszewski: no, i'd talk to infra/qa about what that failure is
14:05:44 <pkoniszewski> okay
14:05:45 <mriedem> we were testing master dsvm jobs with xenial in newton
14:05:50 <mriedem> so newton should be able to handle xenial
14:06:29 <pkoniszewski> ok, so i will ask infra about this
14:06:50 <pkoniszewski> that's all I think
14:06:57 <tdurakov> I'd expect xenial to be added to the list
14:07:34 <tdurakov> pkoniszewski: what is the first patch in queue to push grenade forward?
14:08:22 <pkoniszewski> https://review.openstack.org/#/c/379638/
14:08:40 <tdurakov> btw
14:08:42 <pkoniszewski> this is tempest change so that tests will do ping-pong live migrations
14:08:42 <tdurakov> https://review.openstack.org/#/c/386967/
14:09:28 <tdurakov> pkoniszewski: is it tested in grenade?
14:09:38 <pkoniszewski> let me check
14:10:00 <tdurakov> I mean is it possible to check that ping-pong is used in any of the jobs?
14:10:20 <mriedem> tdurakov: it'll be used in the gate-grenade-dsvm-multinode-live-migration-nv job
14:10:27 <mriedem> i tested the stack with a wip d-g change
14:11:27 <tdurakov> mriedem: I saw that patch last week; the only thing stopping me from giving it +1 is that it would be really cool to see it working in the gate, imo
14:11:37 <mriedem> it's non-voting for now
14:11:46 <mriedem> well, i'm not sure what you mean by 'gate'
14:11:59 <mriedem> the job won't be in the gate queue b/c it's non-voting
14:12:34 <mriedem> i believe gate-grenade-dsvm-multinode-live-migration-nv is in nova's experimental queue right now
14:13:17 <mriedem> tested the change here https://review.openstack.org/#/c/380840/
14:13:31 <mriedem> once the series is merged then the job in nova's experimental queue will run the back and forth stuff
14:13:47 <mriedem> once we are comfortable with it being stable we can move to non-voting check queue
14:13:51 <pkoniszewski> grenade job isn't experimental, it's just non-voting
14:14:07 <pkoniszewski> i mean, it's already in non-voting check queue
14:14:14 <tdurakov> requested experimental for tempest change
14:14:26 <mriedem> yeah you're right i guess it's in the check queue https://github.com/openstack-infra/project-config/blob/master/zuul/layout.yaml#L12036
14:14:30 <tdurakov> dsvm-tempest-live-migration should be triggered
14:14:40 <mriedem> tdurakov: the tempest change won't test itself anyway
14:14:47 <mriedem> b/c of the config option you have to set in the devstack change
14:14:54 <mriedem> all of that defaults to not run the back and forth live migration
14:15:01 <mriedem> that's what the d-g change at the top of the dep chain is for
14:15:17 <tdurakov> mriedem: just want to be sure that we will not break simple live-migration
14:15:28 <mriedem> it won't, it's already been verified
14:15:39 <mriedem> and it only runs in the live migration + grenade job
14:15:43 <mriedem> which is non-voting
14:15:45 <tdurakov> okay
14:15:54 <tdurakov> +1 then
14:16:15 <tdurakov> mriedem, pkoniszewski thanks for clarification
14:16:28 <tdurakov> anything else on ci?
14:16:37 <wznoinsk> one thing from me
14:16:42 <tdurakov> sure
14:16:56 <wznoinsk> we started to run multinode + LM recently http://intel-openstack-ci-logs.ovh/05/355805/4/check/tempest-dsvm-multinode-ovsdpdk-nfv-networking-xenial/b52122e/testr_results.html.gz
14:17:28 <wznoinsk> it's only the api LM test; I guess the next step is to wait for more LM tests to be merged in tempest, or should we look at nova/tests as well?
14:18:00 <tdurakov> no, only tempest, imo
14:18:49 <tdurakov> if it's possible, it would be good to have a job to test live-migration with nfv features, e.g. sr-iov
14:19:11 <pkoniszewski> +1, that's the biggest issue i have with pushing sr-iov live migration forward
14:19:46 <tdurakov> wznoinsk: what do you think?
14:20:26 <pkoniszewski> currently we cannot run such LM tests because basically it will not work
14:20:29 <wznoinsk> that's the long term plan, at the moment we're getting sriov CI right, next step is multinode + LM, not sure whether we'll be ready by the EOY tho
14:21:12 <wznoinsk> pkoniszewski: we can take it offline I guess
14:21:16 <pkoniszewski> sure
14:21:18 <pkoniszewski> thanks
14:21:48 <tdurakov> ok, let's go next
14:21:52 <wznoinsk> btw. the link above includes nfv features like numa and cpu pinning, hugepages
14:22:28 <wznoinsk> http://intel-openstack-ci-logs.ovh/05/355805/4/check/tempest-dsvm-multinode-ovsdpdk-nfv-networking-xenial/b52122e/logs/screen-n-cpu.txt.gz look for hugepage
14:23:00 <tdurakov> wznoinsk: yay, it would be great to have tempest tests for these features
14:23:48 <wznoinsk> tdurakov: we have them running https://github.com/openstack/intel-nfv-ci-tests , but not on multi-node or in a live-migration context
14:24:09 <wznoinsk> http://intel-openstack-ci-logs.ovh/55/387855/1/check/tempest-dsvm-intel-nfv-xenial/0b0ce6f/
14:24:31 <tdurakov> wznoinsk: let's merge them to tempest/master?
14:24:43 <wznoinsk> tdurakov: that's TODO, yes
14:25:23 <tdurakov> wznoinsk: do you need help with that?
14:25:47 <tdurakov> we could try to find volunteers
14:25:48 <wznoinsk> probably not but let's talk after the meeting
14:26:00 <tdurakov> ok
14:26:08 <tdurakov> #topic bugs
14:26:36 <paul-carlton2> I have a couple I'd like to discuss
14:27:10 <tdurakov> yes please
14:27:11 <paul-carlton2> we have a fix for https://bugs.launchpad.net/nova/+bug/1600251 waiting for review
14:27:12 <openstack> Launchpad bug 1600251 in OpenStack Compute (nova) "live migration does not honor server group policy" [High,In progress] - Assigned to Paul Carlton (paul-carlton2)
14:27:55 <paul-carlton2> https://bugs.launchpad.net/nova/+bug/1628606 also has a fix for review but maybe needs some discussion
14:27:56 <openstack> Launchpad bug 1628606 in OpenStack Compute (nova) "live migration does not clean up at target node if a failure occurs during post migration" [Low,In progress] - Assigned to Paul Carlton (paul-carlton2)
14:28:00 <tdurakov> paul-carlton2: that one https://review.openstack.org/#/c/339588/ ?
14:28:33 <paul-carlton2> And finally, I'm trying to fix https://bugs.launchpad.net/nova/+bug/1633033
14:28:33 <openstack> Launchpad bug 1633033 in OpenStack Compute (nova) "live migration with encrypted volume fails" [Undecided,New]
14:28:40 <paul-carlton2> tdurakov, yep
14:31:06 <tdurakov> for the second one there is a patch on review: https://review.openstack.org/#/c/379491
14:31:45 <tdurakov> I've already reviewed it, so we could discuss it right now, or wait for more feedback first
14:32:38 <pkoniszewski> i will take a look at that
14:33:11 <paul-carlton2> Happy to discuss on review or here, as per my reply I can do something to avoid calling a method twice if needed but would like some more input, ta pkoniszewski
14:33:13 <pkoniszewski> but the thing we need to do is to change the way computes are communicating in post live migration, basically this is what tdurakov is working on
14:34:07 <tdurakov> yes, it's in progress, but we could merge a suitable hotfix that could be backported
14:34:41 <pkoniszewski> currently it is all about adding some try-except blocks
14:34:49 <pkoniszewski> we can't really do anything else
14:35:54 <tdurakov> how critical is it, for example, to fail on termination of volume connection?
14:36:34 <pkoniszewski> it isn't critical at all, VM on source is not running anyway and will be removed by libvirt (or nova)
14:36:57 <pkoniszewski> but there might be a security risk
14:37:17 <paul-carlton2> the problem is that at this stage the instance is running on the target and needs to be updated to reflect that
14:37:37 <pkoniszewski> got it, this is critical to have instance host updated
14:37:44 <paul-carlton2> the issue with failure to clean up on source is an issue too
14:37:55 <paul-carlton2> but hard to solve automatically
14:38:31 <tdurakov> I'd prefer not to do double rpc casts here
14:38:39 <paul-carlton2> It is vital the host is updated, because the instance has started running on the target although Nova still thinks it is on the source
14:39:01 <tdurakov> could we wrap the code in such a way that we guarantee only one rpc to the destination?
14:39:25 <tdurakov> wrap=try/except
14:39:33 <tdurakov> paul-carlton2^
14:40:28 <paul-carlton2> tdurakov, happy to try something like that, the vital thing is to update the instance host
14:40:32 <pkoniszewski> i'd go a different way, so that post_live_migration_at_destination is always called regardless of what happens in _post_live_migration
14:41:33 <paul-carlton2> The really important thing is that the code in the finally block runs
14:41:57 <tdurakov> pkoniszewski: that might work
14:42:16 <pkoniszewski> it will still require manual cleanup, but the host will always be updated during post steps
14:42:51 <pkoniszewski> i will post my comments in patch
14:42:59 <paul-carlton2> pkoniszewski, ta
14:43:30 <tdurakov> paul-carlton2: let's try to remove the rpc from _post_live_migration and put it in the finally section of a new try/except block
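
[Editorial sketch, not part of the meeting log] The try/finally idea discussed above could look roughly like the following. This is a minimal illustration only, not nova's actual code: run_post_live_migration, cleanup_source and notify_destination are hypothetical names standing in for the source-side steps of _post_live_migration and the single rpc to the destination.

```python
def run_post_live_migration(cleanup_source, notify_destination):
    """Run source-side cleanup, then always notify the destination once.

    Both arguments are hypothetical callables used only for illustration.
    """
    try:
        # Source-side steps (for example terminating volume connections)
        # may raise.
        cleanup_source()
    finally:
        # Runs whether or not cleanup raised, so the destination is
        # notified exactly once and the instance host can be updated.
        notify_destination()


calls = []


def failing_cleanup():
    calls.append("cleanup")
    raise RuntimeError("volume disconnect failed")


def notify():
    calls.append("notify")


try:
    run_post_live_migration(failing_cleanup, notify)
except RuntimeError as exc:
    # The original error still propagates after the finally block runs.
    calls.append("error: %s" % exc)
```

In this sketch the cleanup error is not swallowed, which matches the point made in the discussion that a failed source cleanup would still need manual attention.
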
14:44:30 <tdurakov> paul-carlton2: what about the last bug?
14:45:27 <paul-carlton2> Yes, trying to fix it, has anyone tried live migration of instances with encrypted volumes before?
14:46:09 <paul-carlton2> I get a keyphrase error in devstack which I need to figure out how to overcome
14:46:56 <tdurakov> haven't tried that case
14:47:09 <tdurakov> 13 minutes left
14:47:10 <paul-carlton2> In our product I've almost got it fixed, just fails when defining guest on target due to os-brick command err
14:47:45 <paul-carlton2> ok, will post a WIP fix later this week, would appreciate input
14:48:04 <tdurakov> paul-carlton2: feel free to add folks from team to review the change
14:48:10 <paul-carlton2> ta
14:48:15 <tdurakov> #topic summit
14:48:33 <tdurakov> who is going to Barcelona?
14:48:43 <pkoniszewski> i'm going
14:48:55 * tdurakov not going
14:49:24 <tdurakov> mriedem: do we have a slot for live-migration in design sessions?
14:49:28 <mriedem> nope
14:49:54 <mriedem> we have one for the imagebackend refactor in libvirt
14:49:54 <paul-carlton2> nor me
14:50:01 <mriedem> part retrospective, part plan for ocata
14:50:03 <mriedem> on that blueprint
14:50:12 <tdurakov> mriedem: acked
14:50:45 <wznoinsk> not me
14:50:56 <kashyap> tdurakov: There's Friday meet-up space, IIRC, for ad-hoc sessions
14:51:18 <tdurakov> kashyap: right
14:51:55 <tdurakov> I think we should prepare topics to discuss there
14:52:07 <kashyap> tdurakov: Isn't there already an etherpad?
14:52:15 <pkoniszewski> i will probably want to cover sr-iov LM on this session, but we will see
14:52:22 <pkoniszewski> depends what happens with nova-neutron interactions
14:52:52 <kashyap> tdurakov: https://etherpad.openstack.org/p/ocata-nova-summit-meetup
14:53:27 <tdurakov> so please put your topics here^
14:54:08 <tdurakov> #topic Open discussions
14:54:17 <tdurakov> we have ~5 minutes
14:54:35 <scsnow> tdurakov, rg https://review.openstack.org/#/c/355805/
14:54:59 <scsnow> you've mentioned that reno node is required for this change. how to add that?
14:55:03 <scsnow> note*
14:55:32 <pkoniszewski> reno new name-of-release-note
14:55:42 <pkoniszewski> but don't we want to have at least specless blueprint for that?
14:56:21 <tdurakov> mriedem: what do you think?
14:57:35 <tdurakov> I think we could have one, no specs, just bp
14:58:03 <scsnow> ok, I'll add bp. And what about reno?
14:58:44 <pkoniszewski> scsnow: http://docs.openstack.org/developer/reno/usage.html
14:59:02 <scsnow> pkoniszewski, ok, thanks. will take a look.
14:59:14 <tdurakov> pkoniszewski: thanks for the link
14:59:21 <tdurakov> so
14:59:29 <tdurakov> thanks for coming
14:59:33 <pkoniszewski> thanks!
14:59:39 <tdurakov> #endmeeting