14:00:24 <tdurakov> #startmeeting Nova Live Migration
14:00:24 <openstack> Meeting started Tue Oct 18 14:00:24 2016 UTC and is due to finish in 60 minutes. The chair is tdurakov. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:25 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:28 <openstack> The meeting name has been set to 'nova_live_migration'
14:00:35 <tdurakov> hi everyone
14:00:43 <davidgiluk> o/
14:00:45 <pkoniszewski> o/
14:00:45 <wznoinsk> o/
14:00:53 <mriedem> o/
14:00:58 <scsnow> hi
14:01:22 <tdurakov> #topic CI
14:01:34 <tdurakov> update on ceph
14:01:53 <tdurakov> https://review.openstack.org/#/c/387836/ - start moving local changes to fit upstream gates
14:02:28 <tdurakov> this change uses devstack ceph plugin
14:03:05 <tdurakov> pkoniszewski: any updates on grenade?
14:03:11 <pkoniszewski> sure
14:03:34 <tdurakov> except I could see it on the list of jobs:)
14:03:48 <pkoniszewski> so patches are still in the same place, tempest change got some reviews but still not merged yet
14:04:15 <pkoniszewski> one thing I noticed is that grenade+LM job fails on all patches to, e.g., stable/newton
14:04:22 <pkoniszewski> http://logs.openstack.org/76/386676/1/check/gate-grenade-dsvm-multinode-live-migration-nv/b5ac5ce/logs/grenade.sh.txt.gz#_2016-10-14_16_12_48_938
14:04:54 <pkoniszewski> probably something that we need to care about
14:05:07 <tdurakov> pkoniszewski: is it only about grenade?
14:05:18 <pkoniszewski> mriedem: thoughts? should i change this job to use trusty when testing on stable/newton?
14:05:35 <mriedem> pkoniszewski: no, i'd talk to infra/qa about what that failure is
14:05:44 <pkoniszewski> okay
14:05:45 <mriedem> we were testing master dsvm jobs with xenial in newton
14:05:50 <mriedem> so newton should be able to handle xenial
14:06:29 <pkoniszewski> ok, so i will ask infra about this
14:06:50 <pkoniszewski> that's all I think
14:06:57 <tdurakov> I'd expect xenial to be added to the list
14:07:34 <tdurakov> pkoniszewski: what is the first patch in queue to push grenade forward?
14:08:22 <pkoniszewski> https://review.openstack.org/#/c/379638/
14:08:40 <tdurakov> btw
14:08:42 <pkoniszewski> this is tempest change so that tests will do ping-pong live migrations
14:08:42 <tdurakov> https://review.openstack.org/#/c/386967/
14:09:28 <tdurakov> pkoniszewski: is it tested in grenade?
14:09:38 <pkoniszewski> let me check
14:10:00 <tdurakov> I mean is it possible to check that ping-pong is used in any of the jobs?
14:10:20 <mriedem> tdurakov: it'll be used in the gate-grenade-dsvm-multinode-live-migration-nv job
14:10:27 <mriedem> i tested the stack with a wip d-g change
14:11:27 <tdurakov> mriedem: I saw that patch last week; the only thing stopping me from a +1 is that it would be really cool to see it work in gate, imo
14:11:37 <mriedem> it's non-voting for now
14:11:46 <mriedem> well, i'm not sure what you mean by 'gate'
14:11:59 <mriedem> the job won't be in the gate queue b/c it's non-voting
14:12:34 <mriedem> i believe gate-grenade-dsvm-multinode-live-migration-nv is in nova's experimental queue right now
14:13:17 <mriedem> tested the change here https://review.openstack.org/#/c/380840/
14:13:31 <mriedem> once the series is merged then the job in nova's experimental queue will run the back and forth stuff
14:13:47 <mriedem> once we are comfortable with it being stable we can move to non-voting check queue
14:13:51 <pkoniszewski> grenade job isn't experimental, it's just non-voting
14:14:07 <pkoniszewski> i mean, it's already in non-voting check queue
14:14:14 <tdurakov> requested experimental for tempest change
14:14:26 <mriedem> yeah you're right i guess it's in the check queue https://github.com/openstack-infra/project-config/blob/master/zuul/layout.yaml#L12036
14:14:30 <tdurakov> dsvm-tempest-live-migration should be triggered
14:14:40 <mriedem> tdurakov: the tempest change won't test itself anyway
14:14:47 <mriedem> b/c of the config option you have to set in the devstack change
14:14:54 <mriedem> all of that defaults to not run the back and forth live migration
14:15:01 <mriedem> that's what the d-g change at the top of the dep chain is for
14:15:17 <tdurakov> mriedem: just want to be sure that we will not break simple live-migration
14:15:28 <mriedem> it won't, it's already been verified
14:15:39 <mriedem> and it only runs in the live migration + grenade job
14:15:43 <mriedem> which is non-voting
14:15:45 <tdurakov> okay
14:15:54 <tdurakov> +1 then
14:16:15 <tdurakov> mriedem, pkoniszewski thanks for clarification
14:16:28 <tdurakov> anything else on ci?
14:16:37 <wznoinsk> one thing from me
14:16:42 <tdurakov> sure
14:16:56 <wznoinsk> we started to run multinode + LM recently http://intel-openstack-ci-logs.ovh/05/355805/4/check/tempest-dsvm-multinode-ovsdpdk-nfv-networking-xenial/b52122e/testr_results.html.gz
14:17:28 <wznoinsk> it's only api LM test, I guess next step is to wait for more LM tests merged in tempest or should we look at nova/tests as well?
14:18:00 <tdurakov> no, only tempest, imo
14:18:49 <tdurakov> if it's possible it would be good to have a job to test live-migration with nfv features, i.e. sr-iov
14:19:11 <pkoniszewski> +1, that's the biggest issue i have with pushing sr-iov live migration forward
14:19:46 <tdurakov> wznoinsk: what do you think?
14:20:26 <pkoniszewski> currently we cannot run such LM tests because basically it will not work
14:20:29 <wznoinsk> that's the long term plan, at the moment we're getting sriov CI right, next step is multinode + LM, not sure whether we'll be ready by the EOY tho
14:21:12 <wznoinsk> pkoniszewski: we can take it offline I guess
14:21:16 <pkoniszewski> sure
14:21:18 <pkoniszewski> thanks
14:21:48 <tdurakov> ok, let's go next
14:21:52 <wznoinsk> btw. the link above includes nfv features like numa and cpu pinning, hugepages
14:22:28 <wznoinsk> http://intel-openstack-ci-logs.ovh/05/355805/4/check/tempest-dsvm-multinode-ovsdpdk-nfv-networking-xenial/b52122e/logs/screen-n-cpu.txt.gz look for hugepage
14:23:00 <tdurakov> wznoinsk: yay, it would be great to have tempest-tests for these features
14:23:48 <wznoinsk> tdurakov: we have them running https://github.com/openstack/intel-nfv-ci-tests , not on a multi-node nor in a live-migration context
14:24:09 <wznoinsk> http://intel-openstack-ci-logs.ovh/55/387855/1/check/tempest-dsvm-intel-nfv-xenial/0b0ce6f/
14:24:31 <tdurakov> wznoinsk: let's merge them to tempest/master?
14:24:43 <wznoinsk> tdurakov: that's TODO, yes
14:25:23 <tdurakov> wznoinsk: do you need help with that?
14:25:47 <tdurakov> we could try to find volunteers
14:25:48 <wznoinsk> probably not but let's talk after the meeting
14:26:00 <tdurakov> ok
14:26:08 <tdurakov> #topic bugs
14:26:36 <paul-carlton2> I have a couple I'd like to discuss
14:27:10 <tdurakov> yes please
14:27:11 <paul-carlton2> we have a fix for https://bugs.launchpad.net/nova/+bug/1600251 waiting for review
14:27:12 <openstack> Launchpad bug 1600251 in OpenStack Compute (nova) "live migration does not honor server group policy" [High,In progress] - Assigned to Paul Carlton (paul-carlton2)
14:27:55 <paul-carlton2> https://bugs.launchpad.net/nova/+bug/1628606 also has a fix for review but maybe needs some discussion
14:27:56 <openstack> Launchpad bug 1628606 in OpenStack Compute (nova) "live migration does not clean up at target node if a failure occurs during post migration" [Low,In progress] - Assigned to Paul Carlton (paul-carlton2)
14:28:00 <tdurakov> paul-carlton2: that one https://review.openstack.org/#/c/339588/ ?
14:28:33 <paul-carlton2> And finally, I'm trying to fix https://bugs.launchpad.net/nova/+bug/1633033
14:28:33 <openstack> Launchpad bug 1633033 in OpenStack Compute (nova) "live migration with encrypted volume fails" [Undecided,New]
14:28:40 <paul-carlton2> tdurakov, yep
14:31:06 <tdurakov> for the second one there is a patch on review: https://review.openstack.org/#/c/379491
14:31:45 <tdurakov> I've already reviewed it, so could discuss it right now, or wait for more feedback first
14:32:38 <pkoniszewski> i will take a look at that
14:33:11 <paul-carlton2> Happy to discuss on review or here, as per my reply I can do something to avoid calling a method twice if needed but would like some more input, ta pkoniszewski
14:33:13 <pkoniszewski> but the thing we need to do is to change the way computes are communicating in post live migration, basically this is what tdurakov is working on
14:34:07 <tdurakov> yes, it's in progress, but we could merge suitable hotfix that could be backported
14:34:41 <pkoniszewski> currently it is all about adding some try-except blocks
14:34:49 <pkoniszewski> we can't really do anything else
14:35:54 <tdurakov> how critical is it, for example, to fail on termination of volume connection?
14:36:34 <pkoniszewski> it isn't critical at all, VM on source is not running anyway and will be removed by libvirt (or nova)
14:36:57 <pkoniszewski> but there might be a security risk
14:37:17 <paul-carlton2> the problem is that at this stage the instance is running on the target and needs to be updated to reflect that
14:37:37 <pkoniszewski> got it, this is critical to have instance host updated
14:37:44 <paul-carlton2> the issue with failure to clean up on source is an issue too
14:37:55 <paul-carlton2> but hard to solve automatically
14:38:31 <tdurakov> I'd prefer not doing double rpc casts here
14:38:39 <paul-carlton2> It is vital the host is updated because it has started running on the target although Nova still thinks it is on source
14:39:01 <tdurakov> could we wrap the code that way, so we could guarantee only one rpc to destination?
14:39:25 <tdurakov> wrap=try/except
14:39:33 <tdurakov> paul-carlton2^
14:40:28 <paul-carlton2> tdurakov, happy to try something like that, the vital thing is to update the instance host
14:40:32 <pkoniszewski> i'd go a different way, so that post_live_migration_at_destination is always called regardless of what happens in _post_live_migration
14:41:33 <paul-carlton2> The really important thing is that the code in the finally block runs
14:41:57 <tdurakov> pkoniszewski: that might work
14:42:16 <pkoniszewski> it will still require manual cleanup, but the host will always be updated during post steps
14:42:51 <pkoniszewski> i will post my comments in patch
14:42:59 <paul-carlton2> pkoniszewski, ta
14:43:30 <tdurakov> paul-carlton2: let's try to remove rpc from _post_live_migration and put it to finally section on the new try/except block
14:44:30 <tdurakov> paul-carlton2: what about the last bug?
14:45:27 <paul-carlton2> Yes, trying to fix it, anyone tried live migration of instances with encrypted volumes before?
14:46:09 <paul-carlton2> I get a keyphrase error in devstack which I need to figure out how to overcome
14:46:56 <tdurakov> haven't tried that case
14:47:09 <tdurakov> 13 minutes left
14:47:10 <paul-carlton2> In our product I've almost got it fixed, just fails when defining guest on target due to os-brick command err
14:47:45 <paul-carlton2> ok, will post a WIP fix later this week, would appreciate input
14:48:04 <tdurakov> paul-carlton2: feel free to add folks from team to review the change
14:48:10 <paul-carlton2> ta
14:48:15 <tdurakov> #topic summit
14:48:33 <tdurakov> who is going to Barcelona?
14:48:43 <pkoniszewski> i'm going
14:48:55 * tdurakov not going
14:49:24 <tdurakov> mriedem: do we have a slot for live-migration in design sessions?
14:49:28 <mriedem> nope
14:49:54 <mriedem> we have one for the imagebackend refactor in libvirt
14:49:54 <paul-carlton2> nor me
14:50:01 <mriedem> part retrospective, part plan for ocata
14:50:03 <mriedem> on that blueprint
14:50:12 <tdurakov> mriedem: acked
14:50:45 <wznoinsk> not me
14:50:56 <kashyap> tdurakov: There's Friday meet-up space, IIRC, for ad-hoc sessions
14:51:18 <tdurakov> kashyap: right
14:51:55 <tdurakov> I think to prepare topics to discuss on that
14:52:07 <kashyap> tdurakov: Isn't there already an etherpad?
14:52:15 <pkoniszewski> i will probably want to cover sr-iov LM on this session, but we will see
14:52:22 <pkoniszewski> depends what happens with nova-neutron interactions
14:52:52 <kashyap> tdurakov: https://etherpad.openstack.org/p/ocata-nova-summit-meetup
14:53:27 <tdurakov> so please put your topics here^
14:54:08 <tdurakov> #topic Open discussions
14:54:17 <tdurakov> we have ~5 minutes
14:54:35 <scsnow> tdurakov, rg https://review.openstack.org/#/c/355805/
14:54:59 <scsnow> you've mentioned that reno node is required for this change. how to add that?
14:55:03 <scsnow> note*
14:55:32 <pkoniszewski> reno new name-of-release-note
14:55:42 <pkoniszewski> but don't we want to have at least specless blueprint for that?
14:56:21 <tdurakov> mriedem: what do you think?
14:57:35 <tdurakov> I think we could have one, no specs, just bp
14:58:03 <scsnow> ok, I'll add bp. And what about reno?
14:58:44 <pkoniszewski> scsnow: http://docs.openstack.org/developer/reno/usage.html
14:59:02 <scsnow> pkoniszewski, ok, thanks. will take a look.
14:59:14 <tdurakov> pkoniszewski: thanks for the link
14:59:21 <tdurakov> so
14:59:29 <tdurakov> thanks for coming
14:59:33 <pkoniszewski> thanks!
14:59:39 <tdurakov> #endmeeting
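
Note on the bugs topic (14:38 to 14:43): the suggestion for bug 1628606 was to take the destination RPC out of _post_live_migration and issue it from the finally section of a new try block, so the instance host is always updated even if source-side cleanup fails. The following is only a rough sketch of that control flow, not Nova's code. FakeDestination, source_cleanup, the dict-based instance and the fail_cleanup flag are all hypothetical stand-ins invented for illustration; only the name post_live_migration_at_destination comes from the discussion.

```python
class FakeDestination:
    """Hypothetical stand-in for the RPC client of the destination compute."""

    def __init__(self):
        self.calls = 0

    def post_live_migration_at_destination(self, instance):
        # Count calls so we can check the RPC is issued exactly once.
        self.calls += 1
        # Record that the instance now lives on the destination host.
        instance["host"] = instance["dest"]


def source_cleanup(instance, fail):
    """Hypothetical source-side steps, e.g. terminating volume connections."""
    if fail:
        raise RuntimeError("failed to terminate volume connection")


def post_live_migration(instance, dest, fail_cleanup=False):
    """Run source cleanup, then always notify the destination exactly once."""
    try:
        source_cleanup(instance, fail_cleanup)
    finally:
        # Runs whether or not cleanup raised, so the host is always updated.
        # The original exception, if any, still propagates afterwards, so a
        # failed cleanup stays visible and can be handled manually.
        dest.post_live_migration_at_destination(instance)


if __name__ == "__main__":
    dest = FakeDestination()
    instance = {"host": "source", "dest": "target"}
    try:
        post_live_migration(instance, dest, fail_cleanup=True)
    except RuntimeError as exc:
        print("cleanup failed:", exc)
    print(instance["host"], dest.calls)
```

Keeping the call in a single finally section gives one RPC to the destination on both the success and the failure path, which matches the concern about avoiding double RPC casts; the source still needs manual cleanup when the exception is raised.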