14:01:14 <tdurakov> #startmeeting Nova Live Migration
14:01:16 <openstack> Meeting started Tue Oct 11 14:01:14 2016 UTC and is due to finish in 60 minutes.  The chair is tdurakov. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:17 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:19 <openstack> The meeting name has been set to 'nova_live_migration'
14:01:28 <tdurakov> hi everyone
14:01:31 <markus_z> o/
14:01:31 <pkoniszewski> o/
14:01:52 <mriedem> o/
14:02:11 <davidgiluk> o/
14:02:32 * kashyap waves
14:02:35 <tdurakov> let's start
14:02:46 <tdurakov> agenda: https://wiki.openstack.org/wiki/Meetings/NovaLiveMigration
14:02:52 <tdurakov> #topic CI
14:02:57 <raj_singh> o/
14:03:26 <tdurakov> pkoniszewski, raj_singh, any updates on grenade job?
14:03:38 <pkoniszewski> well, patches are waiting for review
14:03:58 <raj_singh> https://review.openstack.org/#/c/364809/ is merged
14:04:00 <pkoniszewski> nothing moved forward there
14:04:36 <pkoniszewski> I mean, patches that do implement new config option to execute ping-pong live migrations
14:04:57 <tdurakov> pkoniszewski: haven't seen the tempest patch we discussed last time
14:05:01 <mriedem> do we have a d-g patch that tests the tempest change?
14:05:02 <pkoniszewski> https://review.openstack.org/#/c/379638/
14:05:19 <mriedem> ah https://review.openstack.org/#/c/382456/
14:05:38 <pkoniszewski> not sure whether it is enough to test it, is it?
14:06:05 <mriedem> well, it should be
14:06:12 <mriedem> set in tempest.conf in http://logs.openstack.org/56/382456/4/check/gate-grenade-dsvm-neutron-multinode/b447a15/logs/new/tempest_conf.txt.gz
14:06:30 <mriedem> [compute-feature-enabled] attach_encrypted_volume = True live_migrate_back_and_forth = True
14:06:32 <mriedem> it is
14:06:44 <pkoniszewski> yeah, so it should be enough
14:06:53 <mriedem> is it set on the old side?
14:07:07 <mriedem> looks like it's not http://logs.openstack.org/56/382456/4/check/gate-grenade-dsvm-neutron-multinode/b447a15/logs/old/tempest_conf.txt.gz
14:07:38 <mriedem> oh i know why
14:07:39 <pkoniszewski> if it needs to be set on both sides i need to improve it then
14:07:44 <mriedem> b/c we don't have the var in devstack in stable/newton
14:07:54 <mriedem> we don't want to run that in stable/newton i don't think
14:08:20 <pkoniszewski> do we know on which node tempest is running? I mean, do we need to have the config option set on both sides?
14:08:22 <mriedem> so basically, this means we'd only run the live migration back and forth in the grenade multinode job and only on master changes (ocata+)
14:08:28 <mriedem> it's on both sides
14:09:02 <tdurakov> pkoniszewski: it's master node in d-g terms
14:09:04 <mriedem> grenade multinode is a full newton, run smoke tests, upgrade everything to ocata except one compute node stays newton, then run ocata tempest smoke tests
14:09:38 <mriedem> so on the 'new' side you have ocata on one host, and newton on an n-cpu
14:10:17 <pkoniszewski> so i'd need to backport this change
14:10:48 <mriedem> you'd have to backport the devstack change if we wanted to run live migration back and forth on the old side
14:10:50 <mriedem> which is newton
14:10:58 <mriedem> but do we?
14:11:13 <tdurakov> mriedem, tempest is installed on the master node, right?
14:11:16 <mriedem> that would also change grenade multinode jobs on stable/newton to run this
14:11:46 <mriedem> tempest is run from the controller...yes
14:12:19 <tdurakov> well, there is no need to backport any
14:12:26 <mriedem> tempest is branchless
14:12:27 <pkoniszewski> okay, so we can leave it as it is right now, it should be enough to have this flag set only on a 'new' node
14:12:30 <mriedem> d-g is branchless
14:12:43 <mriedem> so the only thing that keeps us from running lm back and forth on newton is devstack
14:13:00 <mriedem> and i don't think we need to expose newton to this right now
14:13:05 <mriedem> since it's in stable branch mode
14:13:29 <pkoniszewski> devstack change is here https://review.openstack.org/#/c/382451/
14:14:13 <pkoniszewski> so this is a chain of three patches ATM: https://review.openstack.org/#/c/379638/ (tempest) -> https://review.openstack.org/#/c/382451/ (devstack) -> https://review.openstack.org/#/c/382456 (devstack-gate)
14:14:34 <mriedem> i'm +1 on the d-g change
14:14:37 <mriedem> i'll work down the stack
14:15:00 <pkoniszewski> thanks mriedem for your support there
14:16:25 <tdurakov> anything else on that?
14:16:52 <tdurakov> are there plans to test this on different backends?
14:16:52 <pkoniszewski> we can move forward
14:17:38 <pkoniszewski> this gate uses the same hooks, so once we re-enable the NFS and Ceph backends we will have both tested in the grenade job too
14:17:46 <pkoniszewski> the same hooks as a normal job
14:18:06 <tdurakov> pkoniszewski: let's discuss it after the meeting, I have concerns about that
14:18:11 <pkoniszewski> okay
14:18:45 <tdurakov> #topic Bugs
14:18:54 <tdurakov> markus_z: are you around?
14:19:00 <markus_z> yep, I'm here
14:19:22 <markus_z> I wanted to ask for reviews for https://review.openstack.org/#/c/275801/
14:19:40 <markus_z> ^ solves a 1yr old issue when live-migrating an instance which has serial-console support.
14:20:11 <pkoniszewski> i will make sure to test this
14:20:13 <markus_z> A backport to newton would be desirable too
14:20:48 <markus_z> pkoniszewski: Awesome, thanks. You could use https://github.com/markuszoeller/openstack/tree/master/scripts/vagrant/live-migration-U1404-VB if you like.
14:21:38 <pkoniszewski> okay, if i don't manage to resurrect my env, thanks!
14:23:11 <tdurakov> 28 versions, ok, let's test and review it.
14:23:34 <tdurakov> markus_z: thanks for bringing this
14:24:00 <pkoniszewski> we have some outstanding high bugs with fixes up for review
14:24:39 <tdurakov> pkoniszewski: share links on that
14:24:52 <pkoniszewski> searching for
14:25:08 <pkoniszewski> https://review.openstack.org/#/c/339588/ - LM currently does not honor affinity/anti-affinity policies
14:26:10 <pkoniszewski> https://review.openstack.org/#/c/356558/ - fixes LM on dedicated interface when tunnelling is OFF (by default in newton tunnelling is off)
14:26:28 <johnthetubaguy> Looks like some of these are listed, but maybe a few more to add? https://etherpad.openstack.org/p/ocata-nova-priorities-tracking
14:27:12 <tdurakov> there is 1 unassigned high bug https://bugs.launchpad.net/nova/+bug/1566622
14:27:12 <openstack> Launchpad bug 1566622 in OpenStack Compute (nova) "live migration fails with xenapi virt driver and SRs with old-style naming convention" [High,Confirmed]
14:28:32 <tdurakov> johnthetubaguy: you've reviewed fix on that https://review.openstack.org/#/c/307541/
14:28:59 <tdurakov> johnthetubaguy: is it ok to restore and update it?
14:29:52 <johnthetubaguy> tdurakov: possibly, I thought we merged something on that one though... I don't 100% remember now
14:30:09 <johnthetubaguy> tdurakov: it made me want to do lots of refactoring, but not sure
14:30:43 <tdurakov> need to figure out whether it is still valid or not
14:30:51 <tdurakov> johnthetubaguy: could you help with it?
14:32:20 <tdurakov> #topic specs
14:32:56 <tdurakov> do we have any new specs except 2 in agenda?
14:33:22 <tdurakov> pkoniszewski: are you still going to restore the SR-IOV one?
14:33:40 <pkoniszewski> tdurakov: yes
14:33:53 <pkoniszewski> tdurakov: haven't found time yet
14:34:08 <tdurakov> cool, I'm going to wrap things in the etherpad today
14:34:36 <tdurakov> to track ocata things
14:34:45 <johnthetubaguy> tdurakov: I can try to help with that, although it's on quite a big stack right now
14:35:18 <tdurakov> johnthetubaguy: refactoring?
14:36:02 <johnthetubaguy> tdurakov: basically that bit of the xenapi driver is messy around its use of the objects, and dicts and other stuff, so I want to tidy that all up at some point
14:36:26 <tdurakov> ok, help is welcome on that
14:36:36 <tdurakov> very welcome:)
14:37:06 <tdurakov> #topic open discussions
14:37:08 <johnthetubaguy> I do have some WIP patches / stuff in my head about how that should work, I will try to get to that
14:38:48 <tdurakov> markus_z, there is a question in agenda, so?
14:39:11 <markus_z> Yeah, I thought I might ask here
14:39:31 <markus_z> My question is, how do we handle live-migrations to target nodes with an older hypervisor version today?
14:39:57 <pkoniszewski> we don't allow them
14:39:58 <markus_z> Let's say we have qemu 2.7.0 on the source node and qemu 2.3.0 on the target node.
14:40:00 <johnthetubaguy> we fail, in some cases, in pre-livemigration, I think
14:40:08 <pkoniszewski> in conductor
14:40:18 <davidgiluk> markus_z: With upstream qemu that's very unlikely to work
14:40:42 <markus_z> Oh, in conductor you say? I have checked only the libvirt driver.
14:40:53 <pkoniszewski> https://github.com/openstack/nova/blob/c824982e6a3d6660697e503f7236377cc8202d41/nova/conductor/tasks/live_migrate.py#L140
14:40:56 <pkoniszewski> here
14:41:02 <kashyap> markus_z: I think he means having machine types compatibility, etc
14:41:04 <johnthetubaguy> yeah, thats it
14:41:11 <pkoniszewski> we do check the version of the hypervisor based on what we have in the DB
14:41:31 <kashyap> davidgiluk: Isn't it, when you're referring to fwd & backward migr
14:41:32 <pkoniszewski> so we only allow old -> new live migrations
14:41:33 <markus_z> Ah man, yeah, thanks, that's exactly what I was looking for
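The conductor rule pkoniszewski describes (compare the hypervisor versions recorded in the DB and only allow old -> new live migrations) can be sketched roughly as below. This is a minimal, hypothetical illustration, not Nova's code: `ComputeInfo`, `check_compatible` and `version_to_int` are made-up names, and the integer encoding assumes libvirt's `major * 1,000,000 + minor * 1,000 + micro` convention, so markus_z's QEMU 2.7.0 source and 2.3.0 target become 2007000 and 2003000.

```python
# Hypothetical sketch of a conductor-side hypervisor compatibility check.
# Names are illustrative only; they are not Nova's actual API.
from dataclasses import dataclass


class InvalidHypervisorType(Exception):
    """Source and destination run different hypervisor types."""


class DestinationHypervisorTooOld(Exception):
    """Destination hypervisor version is older than the source's."""


@dataclass
class ComputeInfo:
    # Stand-in for the per-host record the conductor reads from the DB.
    hypervisor_type: str
    hypervisor_version: int


def version_to_int(major: int, minor: int, micro: int) -> int:
    # Assumes libvirt-style encoding: QEMU 2.7.0 -> 2007000.
    return major * 1_000_000 + minor * 1_000 + micro


def check_compatible(source: ComputeInfo, dest: ComputeInfo) -> None:
    # Different hypervisor types can never live-migrate between each other.
    if source.hypervisor_type != dest.hypervisor_type:
        raise InvalidHypervisorType()
    # Only equal or old -> new migrations pass; new -> old is rejected.
    if source.hypervisor_version > dest.hypervisor_version:
        raise DestinationHypervisorTooOld()


if __name__ == "__main__":
    newer = ComputeInfo("QEMU", version_to_int(2, 7, 0))
    older = ComputeInfo("QEMU", version_to_int(2, 3, 0))
    try:
        check_compatible(newer, older)
    except DestinationHypervisorTooOld:
        print("rejected: 2.7.0 -> 2.3.0")
    check_compatible(older, newer)
    print("accepted: 2.3.0 -> 2.7.0")
```

A plain integer comparison like this is exactly what makes the "move back to 14.04" case discussed next fail: nothing in it knows about machine types, only about versions.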
14:41:52 <tdurakov> hmm
14:42:04 <tdurakov> I wonder, could it be blocker during upgrades?
14:42:10 <pkoniszewski> it can be
14:42:27 <markus_z> how?
14:42:41 <tdurakov> if smth goes wrong it will be impossible to get back to old compute, right?
14:43:02 <pkoniszewski> markus_z: it can be when you are moving VMs to newer operating system
14:43:09 <pkoniszewski> e.g. from 14.04 to 16.04
14:43:14 <pkoniszewski> you might not be able to move back to 14.04
14:43:34 <pkoniszewski> unless you upgrade QEMU on 14.04 side
14:43:41 <markus_z> I thought the VM of the source node gets killed *after* the l-m was successful?
14:43:59 <tdurakov> if there were no check, would it be technically possible at the libvirt/qemu layers?
14:44:42 <pkoniszewski> davidgiluk just said that it's very unlikely to work
14:45:19 <tdurakov> okay then
14:46:01 <markus_z> thanks a lot :)
14:46:05 <tdurakov> I'd prefer to have some soft checks flags for that
14:46:19 <davidgiluk> pkoniszewski: backwards migration tends not to work unless you're either 1) very lucky or 2) Someone has done a lot of work to wire the machine types to be exactly the same (that we do downstream)
14:47:37 <tdurakov> thanks davidgiluk
14:47:41 <pkoniszewski> hmm, can we base on configuration then? i mean, when someone explicitly sets machine type in nova.conf? but how would we know whether this machine type is supported by QEMU on both sides, maybe we need another precheck during check_can_live_migrate
14:48:26 <tdurakov> pkoniszewski: worth testing, I think
14:48:44 <pkoniszewski> yeah, something we might want to check
14:49:09 <tdurakov> anything else to bring?
14:49:39 <davidgiluk> pkoniszewski: It's difficult to check, for example it might be one particular device that used some new field in a new qemu
14:50:33 <tdurakov> davidgiluk: i think it's not about upgrades
14:51:06 <tdurakov> but still
14:51:46 <tdurakov> thanks everyone for participating
14:51:50 <tdurakov> #endmeeting