14:01:14 <tdurakov> #startmeeting Nova Live Migration
14:01:16 <openstack> Meeting started Tue Oct 11 14:01:14 2016 UTC and is due to finish in 60 minutes. The chair is tdurakov. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:17 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:19 <openstack> The meeting name has been set to 'nova_live_migration'
14:01:28 <tdurakov> hi everyone
14:01:31 <markus_z> o/
14:01:31 <pkoniszewski> o/
14:01:52 <mriedem> o/
14:02:11 <davidgiluk> o/
14:02:32 * kashyap waves
14:02:35 <tdurakov> let's start
14:02:46 <tdurakov> agenda: https://wiki.openstack.org/wiki/Meetings/NovaLiveMigration
14:02:52 <tdurakov> #topic CI
14:02:57 <raj_singh> o/
14:03:26 <tdurakov> pkoniszewski, raj_singh, any updates on grenade job?
14:03:38 <pkoniszewski> well, patches are waiting for review
14:03:58 <raj_singh> https://review.openstack.org/#/c/364809/ is merged
14:04:00 <pkoniszewski> nothing moved forward there
14:04:36 <pkoniszewski> I mean, patches that do implement new config option to execute ping-pong live migrations
14:04:57 <tdurakov> pkoniszewski: haven't seen tempest patch, we've discussed last time
14:05:01 <mriedem> do we have a d-g patch that tests the tempest change?
14:05:02 <pkoniszewski> https://review.openstack.org/#/c/379638/
14:05:19 <mriedem> ah https://review.openstack.org/#/c/382456/
14:05:38 <pkoniszewski> not sure whether it is enough to test it, is it?
14:06:05 <mriedem> well, it should be
14:06:12 <mriedem> set in tempest.conf in http://logs.openstack.org/56/382456/4/check/gate-grenade-dsvm-neutron-multinode/b447a15/logs/new/tempest_conf.txt.gz
14:06:30 <mriedem> [compute-feature-enabled] attach_encrypted_volume = True live_migrate_back_and_forth = True
14:06:32 <mriedem> it is
14:06:44 <pkoniszewski> yeah, so it should be enough
14:06:53 <mriedem> is it set on the old side?
14:07:07 <mriedem> looks like it's not http://logs.openstack.org/56/382456/4/check/gate-grenade-dsvm-neutron-multinode/b447a15/logs/old/tempest_conf.txt.gz
14:07:38 <mriedem> oh i know why
14:07:39 <pkoniszewski> if it needs to be set on both sides i need to improve it then
14:07:44 <mriedem> b/c we don't hae the var in devstack in stable/newton
14:07:54 <mriedem> we don't want to run that in stable/newton i don't think
14:08:20 <pkoniszewski> do we know on which node tempest is running? I mean, do we need to have the config option set on both sides?
14:08:22 <mriedem> so basically, this means we'd only run the live migration back and forth in the grenade multinode job and only on master changes (ocata+)
14:08:28 <mriedem> it's on both sides
14:09:02 <tdurakov> pkoniszewski: it's master node in d-g terms
14:09:04 <mriedem> grenade multinode is a full newton, run smoke tests, upgrade everything to ocata except one compute node stays newton, then run ocata tempest smoke tests
14:09:38 <mriedem> so on the 'new' side you have ocata on one host, and newton on an n-cpu
14:10:17 <pkoniszewski> so i'd need to backport this change
14:10:48 <mriedem> you'd have to backport the devstack change if we wanted to run live migration back and forth on the old side
14:10:50 <mriedem> which is newton
14:10:58 <mriedem> but do we?
14:11:13 <tdurakov> mriedem, tempest is installed on the master node, right?
14:11:16 <mriedem> that would also change grenade multinode jobs on stable/newton to run this
14:11:46 <mriedem> tempest is run from the controller...yes
14:12:19 <tdurakov> well, there is no need to backport any
14:12:26 <mriedem> tempest is branchless
14:12:27 <pkoniszewski> okay, so we can leave it as it is right now, it should be enough to have this flag set only on a 'new' node
14:12:30 <mriedem> d-g is branchless
14:12:43 <mriedem> so the only thing that keeps us from running lm back and forth on newton is devstack
14:13:00 <mriedem> and i don't think we need to expose newton to this right now
14:13:05 <mriedem> since it's in stable branch mode
14:13:29 <pkoniszewski> devstack change is here https://review.openstack.org/#/c/382451/
14:14:13 <pkoniszewski> so this is a chain of three patches ATM: https://review.openstack.org/#/c/379638/ (tempest) -> https://review.openstack.org/#/c/382451/ (devstack) -> https://review.openstack.org/#/c/382456 (devstack-gate)
14:14:34 <mriedem> i'm +1 on the d-g change
14:14:37 <mriedem> i'll work down the stack
14:15:00 <pkoniszewski> thanks mriedem for your support there
14:16:25 <tdurakov> anything else on that?
14:16:52 <tdurakov> are there plans to test this on different backends?
14:16:52 <pkoniszewski> we can move forward
14:17:38 <pkoniszewski> this gate uses the same hooks, so once we enable NFS and Ceph backend back again we will have both tested in greande job too
14:17:46 <pkoniszewski> the same hooks as a normal job
14:18:06 <tdurakov> pkoniszewski: let's discuss it after meeting, have concerns on that
14:18:11 <pkoniszewski> okay
14:18:45 <tdurakov> #topic Bugs
14:18:54 <tdurakov> markus_z: are you around?
14:19:00 <markus_z> yep, I'm here
14:19:22 <markus_z> I wanted to ask for reviews for https://review.openstack.org/#/c/275801/
14:19:40 <markus_z> ^ solves a 1yr old issue when live-migrating and instance which has serial-console support.
14:20:11 <pkoniszewski> i will make sure to test this
14:20:13 <markus_z> A backport to newton would be desirable too
14:20:48 <markus_z> pkoniszewski: Awesome, thanks. You could use https://github.com/markuszoeller/openstack/tree/master/scripts/vagrant/live-migration-U1404-VB if you like.
14:21:38 <pkoniszewski> okay, if i don't manage to resurrect my env, thanks!
14:23:11 <tdurakov> 28 versions, ok, let's test and review it.
14:23:34 <tdurakov> markus_z: thanks for bringing this
14:24:00 <pkoniszewski> we have some outstanding high bugs with fixes up for review
14:24:39 <tdurakov> pkoniszewski: share links on that
14:24:52 <pkoniszewski> searching for
14:25:08 <pkoniszewski> https://review.openstack.org/#/c/339588/ - LM currently does not honor affinity/anti-affinity policies
14:26:10 <pkoniszewski> https://review.openstack.org/#/c/356558/ - fixes LM on dedicated interface when tunnelling is OFF (by default in newton tunnelling is off)
14:26:28 <johnthetubaguy> Looks like some of these are listed, but maybe a few more to add? https://etherpad.openstack.org/p/ocata-nova-priorities-tracking
14:27:12 <tdurakov> there is 1 unassigned high bug https://bugs.launchpad.net/nova/+bug/1566622
14:27:12 <openstack> Launchpad bug 1566622 in OpenStack Compute (nova) "live migration fails with xenapi virt driver and SRs with old-style naming convention" [High,Confirmed]
14:28:32 <tdurakov> johnthetubaguy: you've reviewed fix on that https://review.openstack.org/#/c/307541/
14:28:59 <tdurakov> johnthetubaguy: is it ok to restore and update it?
14:29:52 <johnthetubaguy> tdurakov: possibly, I thought we merged something on that one though... I don't 100% remember now
14:30:09 <johnthetubaguy> tdurakov: it made we want to do lots of refactoring, but not sure
14:30:43 <tdurakov> need to figure out whether is still valid or not
14:30:51 <tdurakov> johnthetubaguy: could you help with it?
14:32:20 <tdurakov> #topic specs
14:32:56 <tdurakov> do we have any new specs except 2 in agenda?
14:33:22 <tdurakov> pkoniszewski: do you still going to restore sr-iov one?
14:33:40 <pkoniszewski> tdurakov: yes
14:33:53 <pkoniszewski> tdurakov: haven't found time yet
14:34:08 <tdurakov> cool, I'm going to wrap things in the etherpad today
14:34:36 <tdurakov> to track ocata things
14:34:45 <johnthetubaguy> tdurakov: I can try help with that, although its on quite a big stack right now
14:35:18 <tdurakov> johnthetubaguy: refactoring?
14:36:02 <johnthetubaguy> tdurakov: basically that bit of the xenapi driver is messy around its use of the objects, and dicts and other stuff, so I want to tidy that all up at some point
14:36:26 <tdurakov> ok, help is welcome on that
14:36:36 <tdurakov> very welcome:)
14:37:06 <tdurakov> #topic open discussions
14:37:08 <johnthetubaguy> I do have some WIP patches / stuff in my head about how that should work, I will try get to that
14:38:48 <tdurakov> markus_z, there is a question in agenda, so?
14:39:11 <markus_z> Yeah, I thought I might ask here
14:39:31 <markus_z> My question is, how do we handle live-migrations to target nodes with an older hypervisor version today?
14:39:57 <pkoniszewski> we don't allow them
14:39:58 <markus_z> Let's say we have qemu 2.7.0 on the source node and qemu 2.3.0 on the target node.
14:40:00 <johnthetubaguy> we fail, in some cases, in pre-livemigration, I think
14:40:08 <pkoniszewski> in conductor
14:40:18 <davidgiluk> markus_z: With upstream qemu that's very unlikely to work
14:40:42 <markus_z> Oh, in conductor you say? I have checked only the libvirt driver.
14:40:53 <pkoniszewski> https://github.com/openstack/nova/blob/c824982e6a3d6660697e503f7236377cc8202d41/nova/conductor/tasks/live_migrate.py#L140
14:40:56 <pkoniszewski> here
14:41:02 <kashyap> markus_z: I think he means having machine types compatilibity, etc
14:41:04 <johnthetubaguy> yeah, thats it
14:41:11 <pkoniszewski> we do check version of the hypervisor basing on what we have in DB
14:41:31 <kashyap> davidgiluk: Isn't it, when you're referring to fwd & backward migr
14:41:32 <pkoniszewski> so we only allow old -> new live migrations
14:41:33 <markus_z> Ah man, yeah, thanks, that's exactly what I was looking for
14:41:52 <tdurakov> hmm
14:42:04 <tdurakov> I wonder, could it be blocker during upgrades?
14:42:10 <pkoniszewski> it can be
14:42:27 <markus_z> how?
14:42:41 <tdurakov> if smth goes wrong it will be impossible to get back to old compute, right?
14:43:02 <pkoniszewski> markus_z: it can be when you are moving VMs to newer operating system
14:43:09 <pkoniszewski> e.g. from 14.04 to 16.04
14:43:14 <pkoniszewski> you might not be able to move back to 14.04
14:43:34 <pkoniszewski> unless you upgrade QEMU on 14.04 side
14:43:41 <markus_z> I thought the VM of the source node gets killed *after* the l-m was successful?
14:43:59 <tdurakov> if there will be no check, will it be technically possible? on libvirt/qemu layers
14:44:42 <pkoniszewski> davidgiluk just said that it's very unlikely to work
14:45:19 <tdurakov> okay then
14:46:01 <markus_z> thanks a lot :)
14:46:05 <tdurakov> I'd prefer to have some soft checks flags for that
14:46:19 <davidgiluk> pkoniszewski: backwards migration tends not to work unless you're either 1) very lucky or 2) Someone has done a lot of work to wire the machine types to be exactly the same (that we do downstream)
14:47:37 <tdurakov> thanks davidgiluk
14:47:41 <pkoniszewski> hmm, can we base on configuration then? i mean, when someone explicitly sets machine type in nova.conf?
but how would we know whether this machine type is supported by QEMU on both sides, maybe we need another precheck during check_can_live_migrate
14:48:26 <tdurakov> pkoniszewski: worth to test, I thinh
14:48:31 <tdurakov> s/think
14:48:44 <pkoniszewski> yeah, something we might want to check
14:49:09 <tdurakov> anything else to bring?
14:49:39 <davidgiluk> pkoniszewski: It's difficult to check, for example it might be one particular device that used some new field in a new qemu
14:50:33 <tdurakov> davidgiluk: i think it's not about upgrades
14:51:06 <tdurakov> but still
14:51:46 <tdurakov> thanks everyone for participating
14:51:50 <tdurakov> #endmeeting
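
Editor's note on the open discussion: the rule pkoniszewski describes (the conductor compares hypervisor versions recorded in the DB and only allows old -> new live migrations) can be illustrated roughly as below. This is a minimal sketch and not nova's actual code. The names HostInfo, encode_version, check_compatible, HypervisorTypeMismatch and DestinationTooOld are hypothetical, and the integer version encoding is an assumption made only so the comparison is easy to follow.

```python
from dataclasses import dataclass


class HypervisorTypeMismatch(Exception):
    """Hypothetical error: source and destination run different hypervisors."""


class DestinationTooOld(Exception):
    """Hypothetical error: destination hypervisor is older than the source."""


@dataclass(frozen=True)
class HostInfo:
    """Hypothetical stand-in for the per-host record kept in the DB."""
    hypervisor_type: str
    hypervisor_version: int


def encode_version(version: str) -> int:
    """Pack 'major.minor.patch' into one comparable integer (assumed encoding)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major * 1_000_000 + minor * 1_000 + patch


def check_compatible(source: HostInfo, destination: HostInfo) -> None:
    """Allow a migration only if the destination is the same or newer."""
    if source.hypervisor_type != destination.hypervisor_type:
        raise HypervisorTypeMismatch(
            f"{source.hypervisor_type} != {destination.hypervisor_type}")
    if source.hypervisor_version > destination.hypervisor_version:
        raise DestinationTooOld(
            f"source {source.hypervisor_version} is newer than "
            f"destination {destination.hypervisor_version}")


# markus_z's example: qemu 2.7.0 on the source, qemu 2.3.0 on the target.
newer = HostInfo("QEMU", encode_version("2.7.0"))
older = HostInfo("QEMU", encode_version("2.3.0"))
check_compatible(older, newer)  # old -> new passes silently
```

With these hosts, check_compatible(newer, older) raises DestinationTooOld, which matches the "we don't allow them" answer given in the meeting. A version-only comparison like this says nothing about machine type compatibility, which is the harder problem davidgiluk raises.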