14:00:25 #startmeeting Nova Live Migration
14:00:26 Meeting started Tue Nov 22 14:00:25 2016 UTC and is due to finish in 60 minutes. The chair is tdurakov. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:27 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:29 The meeting name has been set to 'nova_live_migration'
14:00:35 hello everyone
14:00:54 #link agenda https://wiki.openstack.org/wiki/Meetings/NovaLiveMigration
14:01:16 o/
14:01:34 hm
14:01:45 only me and markus_z O_o
14:01:51 o/
14:02:16 whoop whoop, three's a party :)
14:02:22 o/
14:02:36 last week there were only 3 :(
14:02:41 so now it's 4 :)
14:02:42 * kashyap waves
14:02:47 ok, let's start
14:02:51 #topic ci
14:03:12 I think the grenade and ceph bits are stuck on the tempest part
14:03:35 that's right
14:03:45 https://review.openstack.org/#/c/379638/
14:03:45 https://review.openstack.org/#/c/389767/
14:04:15 tried to catch JordanP last week, no results
14:05:46 send a message to #openstack-qa with a review request
14:06:01 late hi
14:06:08 pkoniszewski: any updates on stats for post-copy testing?
14:06:57 * tdurakov trying to make predictable mem load on instance - work in progress
14:07:02 none yet, i was buried in reviewing serial console and claims patches
14:07:04 i'm still a bit
14:07:14 ok
14:07:21 let's go to the next topic
14:07:31 #topic bugs
14:07:53 so, let's discuss those patches for serial console and live migration
14:08:13 I wanted to ask here for a general direction on those patches.
14:08:17 For context: https://bugs.launchpad.net/nova/+bug/1455252
14:08:17 Launchpad bug 1455252 in OpenStack Compute (nova) "enabling serial console breaks live migration" [High,In progress] - Assigned to sahid (sahid-ferdjaoui)
14:08:39 I took over from sahid on that series and discussed it yesterday with dansmith.
14:09:06 Let me get the reference to that, one moment
14:09:35 http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2016-11-21.log.html#t2016-11-21T15:10:15
14:09:57 https://review.openstack.org/#/c/397276/ - is that the fix?
14:10:35 ^ is one of the preparation patches. The actual fix is https://review.openstack.org/#/c/275801/
14:10:48 I'm rebasing that after this meeting.
14:11:00 we are almost there, I think
14:11:08 there is one change that I don't like in those patches
14:11:08 markus_z: I'm ok with None too
14:11:10 I mean this check move
14:11:20 this is completely unrelated to the fix
14:11:45 OK, cool, those are the two major hurdles at the moment.
14:11:53 I got it and will change them accordingly.
14:12:02 i'm fine with everything else and I agree with Dan's point
14:12:18 Anything else which would make the reviews easier?
14:13:16 hold on
14:13:57 https://review.openstack.org/#/c/398389/2/nova/virt/libvirt/migration.py - won't this raise obj_attr unset?
14:14:11 can't find anything else
14:14:37 tdurakov: according to Dan's comment I don't think we still need this change in its current form
14:14:45 tdurakov: yes it would. I already reverted that but didn't push it yet. It will be in ps3.
14:14:45 I mean, we still need to have a guard there
14:14:54 great! :)
14:15:05 ok :)
14:15:51 Please be aware that tempest isn't yet capable of testing the serial console with live migration. That's done with: https://review.openstack.org/#/c/346815/
14:16:34 To be precise, this one in Nova does the prep step: https://review.openstack.org/#/c/347471/
14:16:45 Just to be sure to give the full context here.
14:17:03 markus_z: I could update the hook
14:17:50 tdurakov: That would be nice, thanks :)
14:17:56 so, if it works, we could depend on that
14:18:16 again, not sure the tempest tests will be merged soon
14:18:40 maybe it will be easier to create a tempest plugin for live migration
14:19:13 anything else on that?
14:19:23 For testing, right now I use https://github.com/markuszoeller/openstack/tree/master/scripts/vagrant/live-migration-U1404-VB like the savage I am. :)
14:19:39 No, that's all on that, thanks for your time.
14:19:52 :)
14:20:01 let's move on
14:20:07 #topic specs
14:20:19 johnthetubaguy: are you arounds?
14:20:26 s/around
14:20:29 I am
14:20:32 are we past spec freeze already?
14:20:43 yes, it was last week, Thursday
14:20:58 but there are still specs unmerged
14:20:59 so
14:21:00 https://review.openstack.org/#/c/347161/
14:21:30 I'm fine with that one, except I do not want the checks to be made in a sync manner
14:22:19 johnthetubaguy: could you please take a look once again
14:22:50 I thought the problem with that was the issue around recreating the libvirt xml?
14:23:45 FWIW, it seems worth adding that quick RPC call to get capabilities.
14:24:03 I know we don't want RPC calls in the API, but at least it should be a quick thing to process
14:24:29 what's the difference with other l-m pre-checks here?
14:24:48 they had to call out to cinder and things, and it was two sets of RPC calls
14:24:53 why not get this check too via migration status and instance-actions
14:24:54 they took time
14:25:25 I think it's a fine line, but if we can fail early, very reliably, that doesn't seem so bad
14:25:32 it's only needed if the VM is in Rescue state
14:25:49 btw, is there a plan to store that data on the api side in the db?
14:25:51 unless we check if live-migrate is supported at all, I guess, which I guess we should
14:26:10 probably never, that would be a child DB thing if it did happen
14:26:16 at least in my head
14:26:57 okay, still against that, but looks like I'm in the minority on that question
14:27:14 so the key bit, for me, is the API experience
14:27:15 if we do rpc.call, let's do it direct to compute
14:27:31 without an extra hop to conductor
14:27:34 right, that's what we are asking, a direct RPC call to compute
14:28:02 anyways, the key bit is the API semantics
14:28:08 the implementation can change over time
14:28:30 ok, will re-review that soon
14:29:27 johnthetubaguy: thanks
14:29:34 let's go to the next one
14:30:02 #topic Open discussion
14:30:46 do we have anything?
14:31:23 Hi All! I am trying to reproduce this bug https://bugs.launchpad.net/nova/+bug/1605016. Would someone be able to help me with that?
14:31:23 Launchpad bug 1605016 in OpenStack Compute (nova) "Post copy live migration interrupts network connectivity" [High,Confirmed] - Assigned to Sarafraj Singh (sarafraj-singh)
14:31:42 siva_krish: sure
14:32:01 siva_krish: the easiest way would be to have an environment with DVR
14:32:13 as it appears only in certain neutron configurations
14:32:16 right
14:32:51 I tried reproducing it with a DVR setup only, but I wasn't able to reproduce it
14:33:05 pkoniszewski: ^
14:33:24 siva_krish: how are you detecting network failure?
14:34:08 siva_krish: The reproducer of it is non-trivial
14:34:17 As you might've noticed in the description from Matt Booth there
14:34:24 tdurakov: by pinging the VM from outside the network; I also ran the stress command on the VM before migrating
14:35:01 siva_krish: hm, are you sure that the migration is switched to post-copy?
14:36:41 tdurakov: I changed the configuration to use post-copy in nova.conf. Is there any other way to check it?
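(A minimal sketch of one way to answer the "any other way to check it?" question above: QEMU's QMP `query-migrate` command reports the status `postcopy-active` once the switchover has actually happened. On a compute node the reply could be obtained with something like `virsh qemu-monitor-command <domain> '{"execute": "query-migrate"}'`; the helper names below are hypothetical, only the QMP reply shape is from QEMU's migration interface.)

```python
import json


def migration_phase(qmp_reply: str) -> str:
    """Return the migration status string from a query-migrate QMP reply."""
    reply = json.loads(qmp_reply)
    return reply.get("return", {}).get("status", "unknown")


def in_postcopy(qmp_reply: str) -> bool:
    # QEMU reports "postcopy-active" after the post-copy switch; a plain
    # precopy migration shows "active" until it completes.
    return migration_phase(qmp_reply) == "postcopy-active"


# Example with a canned reply, as it might come back from virsh:
sample = '{"return": {"status": "postcopy-active", "ram": {"remaining": 4096}}}'
print(in_postcopy(sample))  # True once post-copy has actually started
```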
14:36:59 siva_krish: Make sure the guest is running something heavy that won't migrate with precopy
14:37:09 did you just wait or did you use the force-complete API?
14:37:14 siva_krish: e.g. stressapptest hammering memory
14:38:05 pkoniszewski: I didn't use force-complete
14:38:16 siva_krish: that's the point
14:38:25 precopy is pretty much irrelevant here in practice. You should see packet loss/interruption to a degree in any case.
14:38:28 you need to raise the stress level on the vm
14:38:50 Raising the stress level is only required to trigger post-copy in the first place
14:39:15 However, post-copy is subsequently fast enough that you're unlikely to notice the additional network switchover delay due to post-copy
14:39:58 mdbooth: tdurakov: will try doing that and let you know what happened
14:40:00 I found that the easiest way to test it was to introduce an artificial delay in the code.
14:40:18 davidgiluk: will try that out
14:40:40 Also, frig the code to switch to post-copy mode immediately
14:40:50 Then it doesn't require stress in the guest
14:42:24 anything else?
14:42:29 hmm, i think it still does require it
14:43:00 we don't know when LM will be switched to post-copy, do we?
14:43:44 davidgiluk: can QEMU switch LM to post-copy in the middle of a particular iteration?
14:44:03 or does it wait till the end of the iteration and then start post-copy in the subsequent iteration?
14:44:16 pkoniszewski: That... depends
14:44:20 pkoniszewski: Yep, because we do the switch
14:44:33 mdbooth: but it's still async...
14:44:49 pkoniszewski: Nope
14:44:51 pkoniszewski: It only checks at certain points; if you've set a bandwidth limit I think it'll probably notice before the end of an iteration, but no guarantee
14:44:56 Nova triggers it explicitly
14:45:04 And waits
14:45:16 sounds like load in the VM, and making the code trigger it straight away, would help?
14:45:32 yeh, guest load is dead easy anyway
14:45:34 mdbooth: nova scheduled it, no?
14:45:44 yeah, nova just schedules it
14:45:49 mdbooth: Sorry, had an internet connection issue.
14:46:08 siva_krish: the resolution is the same, a good load level :)
14:46:28 tdurakov: mdbooth: will try your suggestions
14:46:29 and switch to post-copy explicitly
14:46:50 siva_krish: we have folks trying to simulate load in the ops/qa team right now, they can probably share that with you
14:46:56 yeah, and somehow trigger post-copy
14:47:07 API call, or via changing the code a bit
14:47:08 live-migration-force-complete
14:47:24 johnthetubaguy: that might be helpful as well
14:47:35 Nova triggers the switch to post-copy explicitly in the _live_migration_monitor loop
14:48:07 The issue is that it doesn't do the network switchover until that loop completes
14:48:13 LM is really a red herring here
14:48:18 The bug is the design of the loop
14:48:43 It's not worth spending a lot of time on a complicated setup to trigger a long post-copy
14:48:52 Because post-copy isn't what's causing the issue
14:48:57 so the fix can be done in parallel, I would say
14:49:03 It's what Nova does when post-copy happens which is the issue
14:49:58 (and it's not a big issue)
14:50:41 mdbooth: johnthetubaguy: will start working on the fix.
14:54:48 ^ thanks for all of your info/suggestions on this bug
14:55:21 thanks everyone for joining
14:55:28 #endmeeting
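(A minimal sketch of the guest-side load the reviewers describe above: re-dirtying memory faster than precopy can copy it, so the migration only converges once it switches to post-copy. The real-world tools mentioned in the log are `stress` and `stressapptest`; the function below is an illustrative, hypothetical stand-in. During the migration you would then either wait for Nova's automatic switch or use `nova live-migration-force-complete`, as suggested.)

```python
PAGE = 4096  # typical x86 page size


def dirty_loop(mib: int, rounds: int) -> int:
    """Touch one byte per page of a `mib`-MiB buffer, `rounds` times.

    Every write marks its page dirty again, so the migration's dirty-page
    rate stays high for as long as this runs. Returns pages touched.
    """
    buf = bytearray(mib * 1024 * 1024)
    touched = 0
    for _ in range(rounds):
        for off in range(0, len(buf), PAGE):
            buf[off] = (buf[off] + 1) & 0xFF  # write => page is dirty again
            touched += 1
    return touched


# Bounded demo; in a real reproduction you would size the buffer near the
# guest's RAM and loop indefinitely while the migration is in flight.
print(dirty_loop(16, 4))
```

The one-byte-per-page write is deliberate: dirty tracking is per page, so touching a single byte is enough to force the whole page to be re-sent, maximizing dirtied memory per unit of CPU work.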