14:00:59 <tdurakov> #startmeeting Nova Live Migration
14:01:00 <openstack> Meeting started Tue Oct  4 14:00:59 2016 UTC and is due to finish in 60 minutes.  The chair is tdurakov. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:01 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:04 <openstack> The meeting name has been set to 'nova_live_migration'
14:01:07 <tdurakov> hello everyone
14:01:10 <macsz> \o
14:01:14 <pkoniszewski> o/
14:01:21 <raj_singh> o/
14:01:47 <tdurakov> agenda: https://wiki.openstack.org/wiki/Meetings/NovaLiveMigration
14:01:55 <tdurakov> let's start
14:02:05 <tdurakov> #topic CI
14:02:54 <tdurakov> pkoniszewski, raj_singh any updates on grenade?
14:02:58 <pkoniszewski> sure
14:03:15 <pkoniszewski> so the job is almost ready, probably it will be merged in current state
14:03:21 <pkoniszewski> #link https://review.openstack.org/#/c/364809/
14:03:34 <davidgiluk> o/
14:03:47 <pkoniszewski> there are still tests missing and I'm not sure which approach we should take
14:04:03 <raj_singh> I made a test job to test LM + Neutron + Grenade + Moving VM source->target and then target->source. It all worked https://www.youtube.com/watch?v=-nQGBZQrtT0
14:04:06 <pkoniszewski> we definitely need to do something about tests as in grenade job we should test live migration back and forth
14:04:10 <raj_singh> sorry wrong link
14:04:18 <raj_singh> https://review.openstack.org/#/c/378015/
14:04:43 <tdurakov> afair tempest run only smoke tests for grenade?
14:04:45 <pkoniszewski> depends which approach we want to take, I hoped that at least one nova core would be there today
14:04:59 <pkoniszewski> by default - yes
14:05:04 <pkoniszewski> but we can change it
14:05:19 <tdurakov> so, first option mark l-m tests as smoke
14:05:25 <pkoniszewski> the point is that our current tests live migrate all VMs only once
14:05:26 <tdurakov> do we have any others?
14:05:33 <pkoniszewski> i don't think that we need to do that
14:05:59 <pkoniszewski> i mean, we don't need to mark all lm tests as smoke
14:06:20 <pkoniszewski> so the point is whether we want to reimplement all tests so that all tests will live migrate VM back and forth
14:06:37 <pkoniszewski> or implement new tests specific for grenade job to LM VMs back and forth
14:07:01 <raj_singh> pkoniszewski: I got some comments from matt for that https://review.openstack.org/#/c/379638/
14:07:09 <pkoniszewski> or just have a single test for each type of migration, i.e., a test that live migrates VM back and forth and a test that block live migrates VM back and forth
14:07:18 <tdurakov> pkoniszewski: I wonder, could we reuse existing tests, or methods from that, but wrap them in new synthetic smoke l-m tests?
14:07:20 <raj_singh> He mentioned we dont need to test moving VM back and forth for all tests
14:08:43 <pkoniszewski> yeah, but live migrating both ways in existing job (i mean the job without grenade) might be good too as it will test whether post live migration steps are cleaning up compute node
14:08:48 <pkoniszewski> source node ofc
14:09:00 <raj_singh> +1
14:09:17 <pkoniszewski> i will send an email to the mailing list or try to catch Matt on IRC later today
14:09:29 <pkoniszewski> we need to make this decision to make this job fully functional
14:09:57 <tdurakov> we need both directions imho, there is the case when something goes wrong during upgrades and operator wants to rollback instances to old compute
14:10:08 <pkoniszewski> in grenade, yes
14:10:18 <pkoniszewski> but the concern is whether we need to do that in tempest job also
14:11:02 <tdurakov> pkoniszewski: I expected that it's gonna be tempest anyway
14:11:02 <johnthetubaguy> could that be like a tempest config?
14:11:13 <johnthetubaguy> or two separate tests?
14:11:44 <pkoniszewski> johnthetubaguy: two separate classes (files) you mean?
14:11:55 <tdurakov> johnthetubaguy: config will help
14:12:27 <tdurakov> I think it's about one more test case class for upgrades testing purposes only, right?
14:12:51 <tdurakov> that will be enforced by a new tempest config option
14:12:53 <pkoniszewski> its not just one more test case class as we already have two classes
14:13:02 <pkoniszewski> for microversion <= 2.24 and microversion >= 2.25
14:13:08 <pkoniszewski> and i think we should have both covered
14:14:04 <pkoniszewski> it's really about overriding one method that invokes live migration and validates whether it succeeded or not
14:14:21 <pkoniszewski> in base class
14:14:25 <pkoniszewski> https://github.com/openstack/tempest/blob/023ea9eeb6b70c9f10c2d53810ae4f3d1cbd3052/tempest/api/compute/admin/test_live_migration.py#L93
14:15:22 <tdurakov> pkoniszewski: flag that enables 2-way testing?
14:15:32 <pkoniszewski> it might be an option
14:15:45 <pkoniszewski> and probably is the best option
14:16:01 <pkoniszewski> so that if the flag is set run migrations back and forth, if it is not, just migrate and check
14:16:26 <tdurakov> that should work
14:16:31 <pkoniszewski> okay, i will try this approach
14:16:52 <davidgiluk> ping-pong migrations really find lots of bugs for us on qemu, I suspect they will also help at this level
14:17:05 <tdurakov> I'm fine with that, just try to expose it over tempest config
14:17:20 <pkoniszewski> okay, will see what we can do there
14:17:27 <tdurakov> ok
14:17:38 <tdurakov> anything else on grenade job?
14:17:48 <johnthetubaguy> I think that flag seems the best approach
14:17:56 <johnthetubaguy> so we can just modify existing jobs to run the new tests
14:18:14 <pkoniszewski> i think that's all, the decision is to add new flag
14:18:26 <johnthetubaguy> seems worth a try
14:18:46 <tdurakov> right, thanks for working on that
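Editor's sketch of the agreed approach (one shared "migrate and check" method that goes back and forth only when a flag is set). This is not tempest or nova code: FakeCloud, migrate_and_check, and the back_and_forth parameter are hypothetical names made up for illustration, and the real option name and base-class method were not decided in this meeting.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCloud:
    """Hypothetical stand-in for a compute API; not tempest or nova code."""
    hosts: tuple = ("compute-old", "compute-new")
    location: dict = field(default_factory=dict)

    def boot(self, server_id, host):
        self.location[server_id] = host

    def live_migrate(self, server_id, target):
        # A real cloud would do this asynchronously; here it is instant.
        self.location[server_id] = target

    def host_of(self, server_id):
        return self.location[server_id]


def migrate_and_check(cloud, server_id, back_and_forth=False):
    """Live migrate once, or out and back when the (assumed) flag is set.

    Returns the hosts the server visited, starting with the source host.
    """
    source = cloud.host_of(server_id)
    target = next(h for h in cloud.hosts if h != source)
    visited = [source]
    # One hop by default; two hops (target, then source) for ping-pong runs.
    hops = [target, source] if back_and_forth else [target]
    for destination in hops:
        cloud.live_migrate(server_id, destination)
        if cloud.host_of(server_id) != destination:
            raise AssertionError(f"{server_id} did not reach {destination}")
        visited.append(destination)
    return visited
```

With the flag off the existing single-hop behaviour is unchanged, so the same test classes (both microversion ranges) could be reused by a grenade job that turns the flag on.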
14:19:20 <tdurakov> let's go next
14:19:26 <tdurakov> #topic Bugs
14:20:01 <tdurakov> https://bugs.launchpad.net/nova/+bug/1573875 - who could try to reproduce that?
14:20:02 <openstack> Launchpad bug 1573875 in OpenStack Compute (nova) "The same ceph rbd device is used by multiple instances" [Undecided,New]
14:21:39 <pkoniszewski> i don't have any setup with ceph atm
14:22:32 <tdurakov> ok, as I'm working on ceph backend right now, will try to reproduce it too
14:23:50 <tdurakov> any updates on this one https://bugs.launchpad.net/nova/+bug/1605016 ?
14:23:51 <openstack> Launchpad bug 1605016 in OpenStack Compute (nova) "Post copy live migration interrupts network connectivity" [High,In progress] - Assigned to Matthew Booth (mbooth-9)
14:24:40 <tdurakov> mdbooth: are you going to fix it within a bug, or do we need a bp for that?
14:25:49 <tdurakov> will try to reach mdbooth later
14:26:04 <tdurakov> let's move on, any bugs we need to discuss?
14:26:39 <tdurakov> ok, next topic then
14:26:43 <tdurakov> #topic specs
14:27:22 <tdurakov> so, at this moment there are only 2 in the list, do we have more?
14:27:39 <pkoniszewski> what about your spec tdurakov about removing compute-compute communication?
14:28:08 <tdurakov> pkoniszewski: it's kind of changed
14:28:09 <mdbooth> tdurakov: Sorry, afk. Hope there's no bp required.
14:28:33 <tdurakov> #link https://www.dropbox.com/s/wspfpaf02asp23z/live-migration.png?dl=0  https://www.dropbox.com/s/da69wv9bl1g19ko/cold_migration.png?dl=0  https://www.dropbox.com/s/7e3y9vtfipjo3ik/evacuate.png?dl=0
14:28:42 <tdurakov> drafts for state machine :
14:29:05 <pkoniszewski> nice to see that this work is moving forward
14:29:26 <pkoniszewski> are you going to propose this in Ocata cycle?
14:29:29 <tdurakov> yes
14:29:45 <tdurakov> I'm doing some coding on that right now
14:30:22 <pkoniszewski> i wanted to work on live migration with sr-iov a bit
14:30:25 <johnthetubaguy> so the plan is state machine first, then possibly evolve it?
14:30:36 <tdurakov> johnthetubaguy: right
14:30:44 <johnthetubaguy> pkoniszewski: the sr-iov stuff needs the neutron api changes I guess?
14:30:46 <johnthetubaguy> tdurakov: cool
14:30:57 <johnthetubaguy> tdurakov: thats what I remember at the mid cycle I think
14:31:06 <pkoniszewski> johnthetubaguy: that's right, but we need to plan all changes to make it possible
14:31:42 <tdurakov> AFAIR, there was some activity on sr-iov  last cycle
14:31:47 <johnthetubaguy> for the api changes I do have this spec up: https://review.openstack.org/#/c/375580/
14:32:45 <tdurakov> pkoniszewski, johnthetubaguy, folks consider attending nfv meeting next week, might be useful to discuss that there too, what do you think?
14:32:45 <davidgiluk> pkoniszewski: Do you have any idea how to do sr-iov + migration?
14:32:55 <pkoniszewski> also I wanted to work on LM configuration in nova.conf a bit, I mean I want to simplify it, not sure this will require a spec, but I will write something anyway
14:33:07 <davidgiluk> pkoniszewski: There have been 2 or 3 approaches suggested but none have happened yet
14:33:08 <pkoniszewski> davidgiluk: so I'm only thinking about macvtap SR-IOV, not the direct one
14:33:35 <pkoniszewski> davidgiluk: i need to read through all of them, i don't have solution ready yet
14:33:47 <davidgiluk> pkoniszewski: Can you explain what you mean by 'macvtap SR-IOV' I know macvtap, and I know SR-IOV, I don't understand how the two go together
14:34:51 <pkoniszewski> so you can connect SR-IOV port using macvtap device which is residing on the host
14:35:42 <pkoniszewski> davidgiluk: there is this wiki page about how it works - https://wiki.openstack.org/wiki/Nova-neutron-sriov
14:36:09 <johnthetubaguy> yeah, only that special case will work right now
14:36:50 <pkoniszewski> yeah, definitely i don't want to try to make direct sr-iov work with live migration as this might be undoable right now
14:37:24 <davidgiluk> pkoniszewski: Oh! So it's sr-iov VF on the host, then that macvtap wired to the qemu? So the qemu still just sees it as macvtap?
14:37:33 <pkoniszewski> yeah
14:37:56 <johnthetubaguy> that mode is only there to make live-migrate work, as I understand it
14:37:59 <davidgiluk> pkoniszewski: Oh right, yes migrating sr-iov passed through to the guest is a whole different pain which people have been sweating over on qemu
14:38:01 <tdurakov> https://review.openstack.org/#/c/251387/ - abandoned spec
14:38:29 <pkoniszewski> right, that's why i don't want to touch it
14:38:44 <pkoniszewski> macvtap makes it possible so probably we can enable it in nova
14:39:08 <davidgiluk> pkoniszewski: Is the sr-iov+macvtap just to get rid of the intermediate bridge?
14:39:19 <tdurakov> pkoniszewski: it's about macvtap
14:39:36 <pkoniszewski> tdurakov: thanks, i was looking for this spec
14:39:53 <pkoniszewski> davidgiluk: can't really answer right now whether it is only to get rid of the bridge
14:40:09 <pkoniszewski> might be
14:40:25 <davidgiluk> pkoniszewski: OK, it just seems a little odd arrangement from my point of view; I don't quite understand why you need the SR-IOV at all
14:41:36 <tdurakov> pkoniszewski: please ask folks and if they are ok repropose it then
14:42:04 <pkoniszewski> okay, will do
14:42:17 <tdurakov> anything else for ocata?
14:42:37 <pkoniszewski> given that Ocata is a very short cycle... ;)
14:43:00 <tdurakov> next topic
14:43:10 <tdurakov> #topic Open discussion
14:43:38 <tdurakov> any open questiong?
14:43:44 <tdurakov> s/questions
14:44:43 <tdurakov> okay then, thank you for joining
14:44:55 <tdurakov> #endmeeting