14:00:59 <tdurakov> #startmeeting Nova Live Migration
14:01:00 <openstack> Meeting started Tue Oct 4 14:00:59 2016 UTC and is due to finish in 60 minutes. The chair is tdurakov. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:01 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:04 <openstack> The meeting name has been set to 'nova_live_migration'
14:01:07 <tdurakov> hello everyone
14:01:10 <macsz> \o
14:01:14 <pkoniszewski> o/
14:01:21 <raj_singh> o/
14:01:47 <tdurakov> agenda: https://wiki.openstack.org/wiki/Meetings/NovaLiveMigration
14:01:55 <tdurakov> let's start
14:02:05 <tdurakov> #topic CI
14:02:54 <tdurakov> pkoniszewski, raj_singh any updates on grenade?
14:02:58 <pkoniszewski> sure
14:03:15 <pkoniszewski> so the job is almost ready, probably it will be merged in current state
14:03:21 <pkoniszewski> #link https://review.openstack.org/#/c/364809/
14:03:34 <davidgiluk> o/
14:03:47 <pkoniszewski> there are still tests missing and I'm not sure which approach we should take
14:04:03 <raj_singh> I made a test job to test LM + Neutron + Grenade + Moving VM source->target and then target->source. It all worked https://www.youtube.com/watch?v=-nQGBZQrtT0
14:04:06 <pkoniszewski> we definitely need to do something about tests as in grenade job we should test live migration back and forth
14:04:10 <raj_singh> sorry wrong link
14:04:18 <raj_singh> https://review.openstack.org/#/c/378015/
14:04:43 <tdurakov> afair tempest runs only smoke tests for grenade?
14:04:45 <pkoniszewski> depends which approach we want to take, I hoped that at least one nova core would be there today
14:04:59 <pkoniszewski> by default - yes
14:05:04 <pkoniszewski> but we can change it
14:05:19 <tdurakov> so, first option mark l-m tests as smoke
14:05:25 <pkoniszewski> the point is that our current tests live migrate all VMs only once
14:05:26 <tdurakov> do we have any others?
14:05:33 <pkoniszewski> i don't think that we need to do that
14:05:59 <pkoniszewski> i mean, we don't need to mark all lm tests as smoke
14:06:20 <pkoniszewski> so the point is whether we want to reimplement all tests so that all tests will live migrate VM back and forth
14:06:37 <pkoniszewski> or implement new tests specific for grenade job to LM VMs back and forth
14:07:01 <raj_singh> pkoniszewski: I got some comments from matt for that https://review.openstack.org/#/c/379638/
14:07:09 <pkoniszewski> or just have a single test for each type of migration, i.e., a test that live migrates VM back and forth and a test that block live migrates VM back and forth
14:07:18 <tdurakov> pkoniszewski: I wonder, could we reuse existing tests, or methods from that, but wrap them in new synthetic smoke l-m tests?
14:07:20 <raj_singh> He mentioned we don't need to test moving VM back and forth for all tests
14:08:43 <pkoniszewski> yeah, but live migrating both ways in existing job (i mean the job without grenade) might be good too as it will test whether post live migration steps are cleaning up compute node
14:08:48 <pkoniszewski> source node ofc
14:09:00 <raj_singh> +1
14:09:17 <pkoniszewski> i will send an email to the mailing list or try to catch Matt on IRC later today
14:09:29 <pkoniszewski> we need to make this decision to make this job fully functional
14:09:57 <tdurakov> we need both directions imho, there is the case when something goes wrong during upgrades and operator wants to rollback instances to old compute
14:10:08 <pkoniszewski> in grenade, yes
14:10:18 <pkoniszewski> but the concern is whether we need to do that in tempest job also
14:11:02 <tdurakov> pkoniszewski: I expected that it's gonna be tempest anyway
14:11:02 <johnthetubaguy> could that be like a tempest config?
14:11:13 <johnthetubaguy> or two separate tests?
14:11:44 <pkoniszewski> johnthetubaguy: two separate classes (files) you mean?
14:11:55 <tdurakov> johnthetubaguy: config will help
14:12:27 <tdurakov> I think it's about one more test case class for upgrades testing purposes only, right?
14:12:51 <tdurakov> that will be enforced by new tempest config option
14:12:53 <pkoniszewski> it's not just one more test case class as we already have two classes
14:13:02 <pkoniszewski> for microversion <= 2.24 and microversion >= 2.25
14:13:08 <pkoniszewski> and i think we should have both covered
14:14:04 <pkoniszewski> it's really about overriding one method that is invoking live migration and validates whether it succeeded or not
14:14:21 <pkoniszewski> in base class
14:14:25 <pkoniszewski> https://github.com/openstack/tempest/blob/023ea9eeb6b70c9f10c2d53810ae4f3d1cbd3052/tempest/api/compute/admin/test_live_migration.py#L93
14:15:22 <tdurakov> pkoniszewski: flag that enables 2-way testing?
14:15:32 <pkoniszewski> it might be an option
14:15:45 <pkoniszewski> and probably is the best option
14:16:01 <pkoniszewski> so that if the flag is set run migrations back and forth, if it is not, just migrate and check
14:16:26 <tdurakov> that should work
14:16:31 <pkoniszewski> okay, i will try this approach
14:16:52 <davidgiluk> ping-pong migrations really find lots of bugs for us on qemu, I suspect they will also help at this level
14:17:05 <tdurakov> I'm fine with that, just try to expose it over tempest config
14:17:20 <pkoniszewski> okay, will see what we can do there
14:17:27 <tdurakov> ok
14:17:38 <tdurakov> anything else on grenade job?
14:17:48 <johnthetubaguy> I think that flag seems the best approach
14:17:56 <johnthetubaguy> so we can just modify existing jobs to run the new tests
14:18:14 <pkoniszewski> i think that's all, the decision is to add new flag
14:18:26 <johnthetubaguy> seems worth a try
14:18:46 <tdurakov> right, thanks for working on that
14:19:20 <tdurakov> let's go next
14:19:26 <tdurakov> #topic Bugs
14:20:01 <tdurakov> https://bugs.launchpad.net/nova/+bug/1573875 - who could try to reproduce that?
14:20:02 <openstack> Launchpad bug 1573875 in OpenStack Compute (nova) "The same ceph rbd device is used by multiple instances" [Undecided,New]
14:21:39 <pkoniszewski> i don't have any setup with ceph atm
14:22:32 <tdurakov> ok, as I'm working on ceph backend right now, will try to reproduce it too
14:23:50 <tdurakov> any updates on this one https://bugs.launchpad.net/nova/+bug/1605016 ?
14:23:51 <openstack> Launchpad bug 1605016 in OpenStack Compute (nova) "Post copy live migration interrupts network connectivity" [High,In progress] - Assigned to Matthew Booth (mbooth-9)
14:24:40 <tdurakov> mdbooth: are you going to fix it within a bug, or do we need bp for that?
14:25:49 <tdurakov> will try to reach mdbooth later
14:26:04 <tdurakov> let's move on, any bugs we need to discuss?
14:26:39 <tdurakov> ok, next topic then
14:26:43 <tdurakov> #topic specs
14:27:22 <tdurakov> so, at this moment there are only 2 in a list, do we have more?
14:27:39 <pkoniszewski> what about your spec tdurakov about removing compute-compute communication?
14:28:08 <tdurakov> pkoniszewski: it's kind of changed
14:28:09 <mdbooth> tdurakov: Sorry, afk. Hope there's no bp required.
14:28:33 <tdurakov> #link https://www.dropbox.com/s/wspfpaf02asp23z/live-migration.png?dl=0 https://www.dropbox.com/s/da69wv9bl1g19ko/cold_migration.png?dl=0 https://www.dropbox.com/s/7e3y9vtfipjo3ik/evacuate.png?dl=0
14:28:42 <tdurakov> drafts for state machine :
14:29:05 <pkoniszewski> nice to see that this work is moving forward
14:29:26 <pkoniszewski> are you going to propose this in Ocata cycle?
14:29:29 <tdurakov> yes
14:29:45 <tdurakov> I'm doing some coding on that right now
14:30:22 <pkoniszewski> i wanted to work on live migration with sr-iov a bit
14:30:25 <johnthetubaguy> so the plan is state machine first, then possibly evolve it?
14:30:36 <tdurakov> johnthetubaguy: right
14:30:44 <johnthetubaguy> pkoniszewski: the sr-iov stuff needs the neutron api changes I guess?
14:30:46 <johnthetubaguy> tdurakov: cool
14:30:57 <johnthetubaguy> tdurakov: that's what I remember at the mid cycle I think
14:31:06 <pkoniszewski> johnthetubaguy: that's right, but we need to plan all changes to make it possible
14:31:42 <tdurakov> AFAIR, there was some activity on sr-iov last cycle
14:31:47 <johnthetubaguy> for the api changes I do have this spec up: https://review.openstack.org/#/c/375580/
14:32:45 <tdurakov> pkoniszewski, johnthetubaguy, folks consider attending nfv meeting next week, might be useful to discuss that there too, what do you think?
14:32:45 <davidgiluk> pkoniszewski: Do you have any idea how to do sr-iov + migration?
14:32:55 <pkoniszewski> also I wanted to work on LM configuration in nova.conf a bit, I mean I want to simplify it, not sure this will require a spec, but I will write something anyway
14:33:07 <davidgiluk> pkoniszewski: There have been 2 or 3 approaches suggested but none have happened yet
14:33:08 <pkoniszewski> davidgiluk: so I'm only thinking about macvtap SR-IOV, not the direct one
14:33:35 <pkoniszewski> davidgiluk: i need to read through all of them, i don't have solution ready yet
14:33:47 <davidgiluk> pkoniszewski: Can you explain what you mean by 'macvtap SR-IOV' I know macvtap, and I know SR-IOV, I don't understand how the two go together
14:34:51 <pkoniszewski> so you can connect SR-IOV port using macvtap device which is residing on the host
14:35:42 <pkoniszewski> davidgiluk: there is this wiki page about how it works - https://wiki.openstack.org/wiki/Nova-neutron-sriov
14:36:09 <johnthetubaguy> yeah, only that special case will work right now
14:36:50 <pkoniszewski> yeah, definitely i don't want to try to make direct sr-iov work with live migration as this might be undoable right now
14:37:24 <davidgiluk> pkoniszewski: Oh! So it's sr-iov VF on the host, then that macvtap wired to the qemu? So the qemu still just sees it as macvtap?
14:37:33 <pkoniszewski> yeah
14:37:56 <johnthetubaguy> that mode is only there to make live-migrate work, as I understand it
14:37:59 <davidgiluk> pkoniszewski: Oh right, yes migrating sr-iov passed through to the guest is a whole different pain which people have been sweating over on qemu
14:38:01 <tdurakov> https://review.openstack.org/#/c/251387/ - abandoned spec
14:38:29 <pkoniszewski> right, that's why i don't want to touch it
14:38:44 <pkoniszewski> macvtap makes it possible so probably we can enable it in nova
14:39:08 <davidgiluk> pkoniszewski: Is the sr-iov+macvtap just to get rid of the intermediate bridge?
14:39:19 <tdurakov> pkoniszewski: it's about macvtap
14:39:36 <pkoniszewski> tdurakov: thanks, i was looking for this spec
14:39:53 <pkoniszewski> davidgiluk: can't really answer right now whether it is only to get rid of the bridge
14:40:09 <pkoniszewski> might be
14:40:25 <davidgiluk> pkoniszewski: OK, it just seems a little odd arrangement from my point of view; I don't quite understand why you need the SR-IOV at all
14:41:36 <tdurakov> pkoniszewski: please ask folks and if they are ok repropose it then
14:42:04 <pkoniszewski> okay, will do
14:42:17 <tdurakov> anything else for ocata?
14:42:37 <pkoniszewski> given that Ocata is a very short cycle... ;)
14:43:00 <tdurakov> next topic
14:43:10 <tdurakov> #topic Open discussion
14:43:38 <tdurakov> any open questiong?
14:43:44 <tdurakov> s/questions
14:44:43 <tdurakov> okay then, thank you for joining
14:44:55 <tdurakov> #endmeeting
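
Editor's note on the CI topic: the approach agreed in the meeting (a flag that, when set, runs each live migration back and forth, and otherwise just migrates and checks) could look roughly like the sketch below. This is an illustration only, not tempest code: FakeCompute, migrate_and_check and the two_way parameter are hypothetical names standing in for the compute client, the overridden base-class method and the proposed tempest config option.

```python
from dataclasses import dataclass


@dataclass
class FakeCompute:
    """Hypothetical stand-in for a compute client; tracks a server's host."""
    host: str = "host-a"

    def live_migrate(self, target: str) -> None:
        # A real client would call the API and wait for the migration to end.
        if target == self.host:
            raise ValueError("target must differ from the current host")
        self.host = target


def migrate_and_check(compute, target, two_way=False):
    """Migrate to target and verify; if two_way is set, migrate back too.

    two_way models the proposed config flag (name is an assumption).
    Returns the list of hosts the server was seen on, in order.
    """
    source = compute.host
    visited = [source]

    compute.live_migrate(target)
    assert compute.host == target, "server did not reach the target host"
    visited.append(compute.host)

    if two_way:
        # Ping-pong leg: also exercises post-migration cleanup on the source.
        compute.live_migrate(source)
        assert compute.host == source, "server did not return to the source"
        visited.append(compute.host)

    return visited
```

With the flag off the helper behaves like the existing one-way check, so existing jobs would be unaffected; a grenade job would turn the flag on to cover the rollback-to-old-compute case raised in the discussion.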