14:00:24 #startmeeting Nova Live Migration
14:00:24 Meeting started Tue Oct 18 14:00:24 2016 UTC and is due to finish in 60 minutes. The chair is tdurakov. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:25 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:28 The meeting name has been set to 'nova_live_migration'
14:00:35 hi everyone
14:00:43 o/
14:00:45 o/
14:00:45 o/
14:00:53 o/
14:00:58 hi
14:01:22 #topic CI
14:01:34 update on ceph
14:01:53 https://review.openstack.org/#/c/387836/ - start moving local changes to fit upstream gates
14:02:28 this change uses devstack ceph plugin
14:03:05 pkoniszewski: any updates on grenade?
14:03:11 sure
14:03:34 except I could see it on the list of jobs:)
14:03:48 so patches are still in the same place, tempest change got some reviews but still not merged yet
14:04:15 one thing I noticed is that grenade+LM job fails on all patches to, e.g., stable/newton
14:04:22 http://logs.openstack.org/76/386676/1/check/gate-grenade-dsvm-multinode-live-migration-nv/b5ac5ce/logs/grenade.sh.txt.gz#_2016-10-14_16_12_48_938
14:04:54 probably something that we need to care about
14:05:07 pkoniszewski: is it only about grenade?
14:05:18 mriedem: thoughts? should i change this job to use trusty when testing on stable/newton?
14:05:35 pkoniszewski: no, i'd talk to infra/qa about what that failure is
14:05:44 okay
14:05:45 we were testing master dsvm jobs with xenial in newton
14:05:50 so newton should be able to handle xenial
14:06:29 ok, so i will ask infra about this
14:06:50 that's all I think
14:06:57 I'd expect xenial to be added to the list
14:07:34 pkoniszewski: what is the first patch in queue to push grenade forward?
14:08:22 https://review.openstack.org/#/c/379638/
14:08:40 btw
14:08:42 this is tempest change so that tests will do ping-pong live migrations
14:08:42 https://review.openstack.org/#/c/386967/
14:09:28 pkoniszewski: is it tested in grenade?
14:09:38 let me check
14:10:00 I mean is it possible to check that ping-pong is used in any of the jobs?
14:10:20 tdurakov: it'll be used in the gate-grenade-dsvm-multinode-live-migration-nv job
14:10:27 i tested the stack with a wip d-g change
14:11:27 mriedem: I saw that patch last week, the only thing that stops me from giving +1 is that it would be really cool to see it work in gate, imo
14:11:37 it's non-voting for now
14:11:46 well, i'm not sure what you mean by 'gate'
14:11:59 the job won't be in the gate queue b/c it's non-voting
14:12:34 i believe gate-grenade-dsvm-multinode-live-migration-nv is in nova's experimental queue right now
14:13:17 tested the change here https://review.openstack.org/#/c/380840/
14:13:31 once the series is merged then the job in nova's experimental queue will run the back and forth stuff
14:13:47 once we are comfortable with it being stable we can move to non-voting check queue
14:13:51 grenade job isn't experimental, it's just non-voting
14:14:07 i mean, it's already in non-voting check queue
14:14:14 requested experimental for tempest change
14:14:26 yeah you're right i guess it's in the check queue https://github.com/openstack-infra/project-config/blob/master/zuul/layout.yaml#L12036
14:14:30 dsvm-tempest-live-migration should be triggered
14:14:40 tdurakov: the tempest change won't test itself anyway
14:14:47 b/c of the config option you have to set in the devstack change
14:14:54 all of that defaults to not run the back and forth live migration
14:15:01 that's what the d-g change at the top of the dep chain is for
14:15:17 mriedem: just want to be sure that we will not break simple live-migration
14:15:28 it won't, it's already been verified
14:15:39 and it only runs in the live migration + grenade job
14:15:43 which is non-voting
14:15:45 okay
14:15:54 +1 then
14:16:15 mriedem, pkoniszewski thanks for clarification
14:16:28 anything else on ci?
14:16:37 one thing from me
14:16:42 sure
14:16:56 we started to run multinode + LM recently http://intel-openstack-ci-logs.ovh/05/355805/4/check/tempest-dsvm-multinode-ovsdpdk-nfv-networking-xenial/b52122e/testr_results.html.gz
14:17:28 it's only api LM test, I guess next step is to wait for more LM tests merged in tempest or should we look at nova/tests as well?
14:18:00 no, only tempest, imo
14:18:49 if it's possible it would be good to have a job to test live-migration with nfv features, i.e. sr-iov
14:19:11 +1, that's the biggest issue i have with pushing sr-iov live migration forward
14:19:46 wznoinsk: what do you think?
14:20:26 currently we cannot run such LM tests because basically it will not work
14:20:29 that's the long term plan, at the moment we're getting sriov CI right, next step is multinode + LM, not sure whether we'll be ready by the EOY tho
14:21:12 pkoniszewski: we can take it offline I guess
14:21:16 sure
14:21:18 thanks
14:21:48 ok, let's go next
14:21:52 btw. the link above includes nfv features like numa and cpu pinning, hugepages
14:22:28 http://intel-openstack-ci-logs.ovh/05/355805/4/check/tempest-dsvm-multinode-ovsdpdk-nfv-networking-xenial/b52122e/logs/screen-n-cpu.txt.gz look for hugepage
14:23:00 wznoinsk: yay, it would be great to have tempest-tests for these features
14:23:48 tdurakov: we have them running https://github.com/openstack/intel-nfv-ci-tests , not on a multi-node nor in a live-migration context
14:24:09 http://intel-openstack-ci-logs.ovh/55/387855/1/check/tempest-dsvm-intel-nfv-xenial/0b0ce6f/
14:24:31 wznoinsk: let's merge them to tempest/master?
14:24:43 tdurakov: that's TODO, yes
14:25:23 wznoinsk: do you need help with that?
14:25:47 we could try to find volunteers
14:25:48 probably not but let's talk after the meeting
14:26:00 ok
14:26:08 #topic bugs
14:26:36 I have a couple I'd like to discuss
14:27:10 yes please
14:27:11 we have a fix for https://bugs.launchpad.net/nova/+bug/1600251 waiting for review
14:27:12 Launchpad bug 1600251 in OpenStack Compute (nova) "live migration does not honor server group policy" [High,In progress] - Assigned to Paul Carlton (paul-carlton2)
14:27:55 https://bugs.launchpad.net/nova/+bug/1628606 also has a fix for review but maybe needs some discussion
14:27:56 Launchpad bug 1628606 in OpenStack Compute (nova) "live migration does not clean up at target node if a failure occurs during post migration" [Low,In progress] - Assigned to Paul Carlton (paul-carlton2)
14:28:00 paul-carlton2: that one https://review.openstack.org/#/c/339588/ ?
14:28:33 And finally, I'm trying to fix https://bugs.launchpad.net/nova/+bug/1633033
14:28:33 Launchpad bug 1633033 in OpenStack Compute (nova) "live migration with encrypted volume fails" [Undecided,New]
14:28:40 tdurakov, yep
14:31:06 for the second one there is a patch on review: https://review.openstack.org/#/c/379491
14:31:45 I've already reviewed it, so could discuss it right now, or wait for more feedback first
14:32:38 i will take a look at that
14:33:11 Happy to discuss on review or here, as per my reply I can do something to avoid calling a method twice if needed but would like some more input, ta pkoniszewski
14:33:13 but the thing we need to do is to change the way computes are communicating in post live migration, basically this is what tdurakov is working on
14:34:07 yes, it's in progress, but we could merge suitable hotfix that could be backported
14:34:41 currently it is all about adding some try-except blocks
14:34:49 we can't really do anything else
14:35:54 how critical is it, for example, to fail on termination of volume connection?
14:36:34 it isn't critical at all, VM on source is not running anyway and will be removed by libvirt (or nova)
14:36:57 but there might be a security risk
14:37:17 the problem is that at this stage the instance is running on the target and needs to be updated to reflect that
14:37:37 got it, it is critical to have the instance host updated
14:37:44 the issue with failure to clean up on source is an issue too
14:37:55 but hard to solve automatically
14:38:31 I'd prefer not to do double rpc casts here
14:38:39 It is vital the host is updated because it has started running on the target although Nova still thinks it is on source
14:39:01 could we wrap the code that way, so we could guarantee only one rpc to destination?
14:39:25 wrap=try/except
14:39:33 paul-carlton2^
14:40:28 tdurakov, happy to try something like that, the vital thing is to update the instance host
14:40:32 i'd go a different way, so that post_live_migration_at_destination is always called regardless of what happens in _post_live_migration
14:41:33 The really important thing is that the code in the finally block runs
14:41:57 pkoniszewski: that might work
14:42:16 it will still require manual cleanup, but the host will always be updated during post steps
14:42:51 i will post my comments in patch
14:42:59 pkoniszewski, ta
14:43:30 paul-carlton2: let's try to remove rpc from _post_live_migration and put it in the finally section of the new try/except block
14:44:30 paul-carlton2: what about the last bug?
14:45:27 Yes, trying to fix it, anyone tried live migration of instances with encrypted volumes before
14:46:09 I get a keyphrase error in devstack which I need to figure out how to overcome
14:46:56 haven't tried that case
14:47:09 13 minutes left
14:47:10 In our product I've almost got it fixed, just fails when defining guest on target due to os-brick command err
14:47:45 ok, will post a WIP fix later this week, would appreciate input
14:48:04 paul-carlton2: feel free to add folks from team to review the change
14:48:10 ta
14:48:15 #topic summit
14:48:33 who is going to Barcelona?
14:48:43 i'm going
14:48:55 * tdurakov not going
14:49:24 mriedem: do we have a slot for live-migration in design sessions?
14:49:28 nope
14:49:54 we have one for the imagebackend refactor in libvirt
14:49:54 nor me
14:50:01 part retrospective, part plan for ocata
14:50:03 on that blueprint
14:50:12 mriedem: acked
14:50:45 not me
14:50:56 tdurakov: There's Friday meet-up space, IIRC, for ad-hoc sessions
14:51:18 kashyap: right
14:51:55 I think to prepare topics to discuss on that
14:52:07 tdurakov: Isn't there already an etherpad?
14:52:15 i will probably want to cover sr-iov LM on this session, but we will see
14:52:22 depends what happens with nova-neutron interactions
14:52:52 tdurakov: https://etherpad.openstack.org/p/ocata-nova-summit-meetup
14:53:27 so please put your topics here^
14:54:08 #topic Open discussions
14:54:17 we have ~5 minutes
14:54:35 tdurakov, rg https://review.openstack.org/#/c/355805/
14:54:59 you've mentioned that reno node is required for this change. how to add that?
14:55:03 note*
14:55:32 reno new name-of-release-note
14:55:42 but don't we want to have at least specless blueprint for that?
14:56:21 mriedem: what do you think?
14:57:35 I think we could have one, no specs, just bp
14:58:03 ok, I'll add bp. And what about reno?
14:58:44 scsnow: http://docs.openstack.org/developer/reno/usage.html
14:59:02 pkoniszewski, ok, thanks. will take a look.
14:59:14 pkoniszewski: thanks for the link
14:59:21 so
14:59:29 thanks for coming
14:59:33 thanks!
14:59:39 #endmeeting
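
Note on the bugs topic (14:38 to 14:43): the suggestion was to issue the destination RPC from a finally section so that it runs exactly once, even if source-side cleanup fails. Below is a minimal sketch of that idea only. It is not Nova's code: `RecordingRpc`, `post_live_migration`, `source_cleanup`, and `failing_cleanup` are hypothetical stand-ins invented for illustration, and only the method name post_live_migration_at_destination comes from the discussion.

```python
class RecordingRpc:
    """Hypothetical stand-in for a compute RPC client; it only records calls."""

    def __init__(self):
        self.calls = []

    def post_live_migration_at_destination(self, instance, dest):
        self.calls.append((instance, dest))


def post_live_migration(instance, dest, rpc, source_cleanup):
    """Run source-side cleanup, then always notify the destination once."""
    try:
        source_cleanup(instance)
    finally:
        # Runs whether or not cleanup raised, so the destination is told
        # exactly once and can update the instance host.
        rpc.post_live_migration_at_destination(instance, dest)


def failing_cleanup(instance):
    # Assumed failure mode, mirroring the volume-connection example above.
    raise RuntimeError("could not terminate volume connection")


rpc = RecordingRpc()
try:
    post_live_migration("vm-1", "dest-host", rpc, failing_cleanup)
except RuntimeError:
    # The error still propagates; source cleanup remains a manual task.
    pass
```

With this shape the cleanup failure is not swallowed, which matches the remark that manual cleanup is still required while the host is always updated during the post steps.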