13:59:59 <tdurakov> #startmeeting Nova Live Migration
14:00:00 <openstack> Meeting started Tue Aug 9 13:59:59 2016 UTC and is due to finish in 60 minutes. The chair is tdurakov. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:01 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:03 <openstack> The meeting name has been set to 'nova_live_migration'
14:00:12 <tdurakov> hi everyone
14:00:16 * kashyap waves
14:01:19 <tdurakov> just wait a minute, so everyone joins
14:01:53 <tdurakov> agenda https://wiki.openstack.org/wiki/Meetings/NovaLiveMigration
14:02:16 <mriedem> o/
14:02:20 <davidgiluk> o/
14:02:35 <tdurakov> so, let's start
14:02:43 <tdurakov> #topic Libvirt image backend
14:02:52 <tdurakov> any updates?
14:03:19 <tdurakov> I've reviewed several patches on that, it looks like chain reduces
14:03:36 <woodster_> o/
14:04:16 <tdurakov> I'd expect almost the same for the next topic...
14:04:27 <tdurakov> #topic Storage pools
14:05:12 <tdurakov> any updates on that?
14:05:55 <tdurakov> okay, let's move then to CI one...
14:06:03 <tdurakov> #topic CI
14:06:09 <tdurakov> first
14:06:32 <tdurakov> wanted to mention, markus_z is working on the serial consoles for live-migration
14:06:50 <tdurakov> #link https://review.openstack.org/#/c/347471/
14:07:18 <tdurakov> so, it's improve coverage for that
14:08:03 <tdurakov> https://bugs.launchpad.net/nova/+bug/1524898 - this one is still hanged...
14:08:03 <openstack> Launchpad bug 1524898 in OpenStack Compute (nova) "Volume based live migration aborted unexpectedly" [High,Confirmed]
14:08:09 <tdurakov> mriedem: anything on that^
14:08:35 <mriedem> nope
14:08:49 <mriedem> we had talked about skipping it
14:08:55 <mriedem> last week
14:09:01 <mriedem> at least in the multinode job
14:09:04 <mriedem> to see if that stabilizes that job
14:09:25 <tdurakov> mriedem: ok, I'll submit patch today for that
14:09:30 <mriedem> ok
14:10:13 <tdurakov> there is also potential issue with nfs https://bugs.launchpad.net/nova/+bug/1535232
14:10:13 <openstack> Launchpad bug 1535232 in OpenStack Compute (nova) "live-migration ci failure on nfs shared storage" [Medium,Confirmed]
14:10:56 <tdurakov> I expected it will work properly on xenial, but looks like we have it on xenial either
14:11:20 <kashyap> tdurakov: Is that a new bug?
14:11:22 * kashyap clicks
14:11:41 <tdurakov> kashyap: it's not a new, but it turns out to be valid for xenial too
14:11:45 <kashyap> Duh, no, I even seem to have commented on it, a bit old (5 months old)
14:12:05 <tdurakov> yes
14:12:33 <kashyap> tdurakov: Currently we seem to be stuck at understanding
14:12:48 <kashyap> Why block I/O errors would cause live mig to abort
14:13:08 <kashyap> davidgiluk: Later, do you have time to take a quick look at? - https://bugs.launchpad.net/nova/+bug/1535232/comments/6
14:13:08 <openstack> Launchpad bug 1535232 in OpenStack Compute (nova) "live-migration ci failure on nfs shared storage" [Medium,Confirmed]
14:13:14 <tdurakov> kashyap: looks like another underlying issue
14:13:25 * davidgiluk looks
14:14:51 <kashyap> davidgiluk: The hint is at the end of the comment.
14:16:09 <davidgiluk> kashyap: The IO errors are interesting, but that suggests it's something related to the storage again
14:16:24 <kashyap> tdurakov: Is it reproducible for you locally? (I don't have an NFS env on my local env, should set one up.)
14:16:32 <tdurakov> kashyap: same
14:16:34 <kashyap> davidgiluk: Ah, you mean the host storage env?
14:16:49 <tdurakov> I haven't managed to reproduce it
14:16:51 <kashyap> Hmm, I have no clue what the underlying storage setup is like
14:17:07 <davidgiluk> kashyap: Yes
14:17:26 <kashyap> tdurakov: Incidentally, I'm doing a drive-mirror+NBD storage migration test with QEMU itself, I'll keep an eye to see if I can trigger this
14:17:42 <tdurakov> kashyap: ok, thanks
14:17:53 <davidgiluk> kashyap: I'd guess at NFS permissions
14:17:56 <kashyap> (I'll test it w/ libvirt APIs, too, which is more useful in Nova's case.)
14:18:29 <kashyap> davidgiluk: Yeah, without 'sosreport'-style logs, can't be certain that they're NFS permissions, can we...?
14:18:46 <davidgiluk> kashyap: But also we need to see libvirt source logs to know why it decided to shut it down
14:19:13 <kashyap> tdurakov: Do you have a handy link for the latest hits for this failure?
14:19:20 <tdurakov> davidgiluk: the strange thing here that there are tests passed on that, during one run
14:19:40 <tdurakov> kashyap: http://logs.openstack.org/66/329466/4/check/gate-tempest-dsvm-multinode-live-migration/586c6be/
14:20:15 <kashyap> tdurakov: Hmm, the failure is not deterministic?
14:20:24 <tdurakov> kashyap: looks so
14:21:09 <davidgiluk> tdurakov: Deterministic failures would be too easy
14:21:13 <kashyap> tdurakov: 'subnode-2' is the destination, isn't it?
14:21:22 <tdurakov> no
14:21:29 <kashyap> Ah, it's the source?
14:21:33 <tdurakov> subnode-2 is just compute only
14:21:39 <tdurakov> in that case it's source
14:21:59 <davidgiluk> tdurakov: I regularly do NFS qemu only migrations - so it normally works at that level
14:22:05 <kashyap> tdurakov: Okay, I got confused because, for some instances on 'subnode-2', I see the QEMU command-line parameter '-incoming', for some not
14:22:28 <kashyap> E.g. this (instance-00000009) has the "-incoming defer" (which means, it's the destination) - http://logs.openstack.org/66/329466/4/check/gate-tempest-dsvm-multinode-live-migration/586c6be/logs/subnode-2/libvirt/qemu/instance-00000009.txt.gz
14:23:01 <kashyap> davidgiluk: You don't test drive-mirror + NBD, do you? I presume you still do: 'migrate -b'?
14:23:26 <kashyap> Or 'migrate tcp:localhost:4444'
14:23:37 <davidgiluk> kashyap: No, I rarely do migrate -b either, I just use migrate with NFS shared storage most of the time
14:24:00 <tdurakov> I'll check one thing, whether it's always fails on subnode, or there is no correlation
14:24:28 <kashyap> davidgiluk: That's the source libvirt daemon log - logs.openstack.org/66/329466/4/check/gate-tempest-dsvm-multinode-live-migration/586c6be/logs/subnode-2/libvirt/libvirtd.txt.gz
14:24:29 <tdurakov> my understanding for that was, that things are unrelated...
14:25:07 <tdurakov> http://logs.openstack.org/66/329466/4/check/gate-tempest-dsvm-multinode-live-migration/586c6be/logs/subnode-2/libvirt/libvirtd.txt.gz#_2016-08-03_00_45_27_923
14:25:12 <kashyap> davidgiluk: Oh, you seem to sound right...
14:25:12 <tdurakov> kashyap, davidgiluk ^
14:25:13 <kashyap> 2016-08-03 00:45:21.456+0000: 18816: info : qemuMonitorIOProcess:423 : QEMU_MONITOR_IO_PROCESS: mon=0x7f7008012390 buf={"timestamp": {"seconds": 1470185121, "microseconds": 456577}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk0", "nospace": false, "reason": "Permission denied", "operation": "write", "action": "report"}}
14:25:22 <kashyap> davidgiluk: "Permission denied"
14:25:33 <davidgiluk> kashyap: Yeh that looks like a simple perm problem on the NFS
14:26:21 <kashyap> davidgiluk: Also, checking out the URL from tdurakov w/ the "load of migration failed: Input/output error"
14:26:47 <tdurakov> davidgiluk, about perm issue, http://logs.openstack.org/66/329466/4/check/gate-tempest-dsvm-multinode-live-migration/586c6be/console.html.gz#_2016-08-03_00_45_50_882922 - there are several tests that passes
14:26:49 <kashyap> tdurakov: We can move this discussion out of the meeting; maybe you have other issues to discuss.
14:27:15 <tdurakov> I'd spend 5 more minutes on that
14:27:20 <kashyap> Yeah, sure
14:27:44 <kashyap> tdurakov: Hmm, so this succeeds 'test_live_block_migration_paused'
14:27:54 <tdurakov> kashyap: yes
14:28:22 <tdurakov> so if it's perm issue, I'd expect to see all tests to fail, right?
14:29:51 <tdurakov> well, let's take a timeout for that, so everyone interested could grep through the code, and get back for that later
14:30:28 <davidgiluk> tdurakov: I wonder if the one that is passing is actually NFS shared storage; it's name is live_block_migration - so is it actually using NFS shared?
14:30:40 <tdurakov> davidgiluk: it's nfs
14:30:44 <davidgiluk> ok
14:30:49 <tdurakov> just bad tempest naming...
14:31:10 <kashyap> tdurakov: When the storage is shared, what is the point of doing a *block* migration?
14:31:26 <tdurakov> it's not block migration
14:31:37 <kashyap> Ah, bad tempest test name you say above
14:31:43 <tdurakov> right
14:31:53 <tdurakov> another ci thing to mention
14:31:56 <tdurakov> https://bugs.launchpad.net/nova/+bug/1592015
14:31:56 <openstack> Launchpad bug 1592015 in OpenStack Compute (nova) "libvirt: cleanup of a volume backed instance resize leaves behind the instance directory" [Medium,In progress] - Assigned to Lee Yarwood (lyarwood)
14:32:37 <tdurakov> I'm going to add tempest test that do cold migration, and then live-migration after that
14:33:06 <tdurakov> once the bug is fixed
14:33:53 <tdurakov> mriedem1, thoughts on that^
14:35:19 <davidgiluk> tdurakov: Do those systems use apparmor or anything like that? I know there's a couple of selinux tricks to get NFS working, but I guess it's not using that
14:35:45 <tdurakov> davidgiluk: does ubuntu use apparmor?
14:35:57 <davidgiluk> tdurakov: It does, I don't know if it's used in this case
14:36:16 <tdurakov> I mean by default
14:36:39 <mriedem> tdurakov: the more tests the better
14:37:22 <tdurakov> ok let's move on then
14:38:05 <tdurakov> pkoniszewski: added http://lists.openstack.org/pipermail/openstack-dev/2016-August/100657.html this thread to agenda
14:38:40 <tdurakov> so, the point is to not using tunneled for live-migration by default
14:39:33 <tdurakov> so, I support that idea
14:40:09 <tdurakov> if you folks, missed the thread feel free to respond on that
14:40:25 <tdurakov> also there is a patch on review already https://review.openstack.org/#/c/350480/
14:41:30 <tdurakov> #topic: Open discussion
14:41:36 <kashyap> tdurakov: I've read the thread, and agree with danpb's comment there
14:41:50 <tdurakov> kashyap: ok
14:41:52 <kashyap> tdurakov: And, yours
14:42:10 <kashyap> You're also "thumbs up" for default=False, for good reasons.
14:42:10 <tdurakov> cool, review the patch then, thanks:)
14:42:28 <kashyap> Yeah, looking
14:43:14 <tdurakov> anything else to discuss on live-migration?
14:43:55 <kashyap> Nothing from me as of now.
14:44:22 <tdurakov> ok, thank you for coming, It was kind of sub-sub-team meeting:)
14:45:01 <tdurakov> #endmeeting
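
Note on the BLOCK_IO_ERROR line quoted at 14:25:13: a rough sketch of how such events might be pulled out of a libvirtd debug log for triage. It assumes each QEMU monitor message appears on one log line after a "buf=" marker, as in the quoted line; the function name block_io_errors and the sample variable are hypothetical, and nothing here is part of Nova, libvirt, or the CI tooling.

```python
import json

MARKER = "QEMU_MONITOR_IO_PROCESS"


def block_io_errors(lines):
    """Yield (log timestamp, event data) for each BLOCK_IO_ERROR event.

    Hypothetical helper: assumes the monitor JSON follows "buf=" on the
    same log line, as in the line quoted in the meeting.
    """
    decoder = json.JSONDecoder()
    for line in lines:
        if MARKER not in line:
            continue
        _, sep, payload = line.partition("buf=")
        if not sep:
            continue
        try:
            # raw_decode parses the first JSON value and ignores trailing text.
            message, _ = decoder.raw_decode(payload.lstrip())
        except ValueError:
            continue  # not JSON (for example, a truncated buffer)
        if isinstance(message, dict) and message.get("event") == "BLOCK_IO_ERROR":
            # The log timestamp is the first two whitespace-separated fields.
            stamp = " ".join(line.split()[:2]).rstrip(":")
            yield stamp, message.get("data", {})


# The log line quoted in the meeting, used as sample input.
sample = (
    '2016-08-03 00:45:21.456+0000: 18816: info : qemuMonitorIOProcess:423 : '
    'QEMU_MONITOR_IO_PROCESS: mon=0x7f7008012390 buf={"timestamp": '
    '{"seconds": 1470185121, "microseconds": 456577}, "event": '
    '"BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk0", '
    '"nospace": false, "reason": "Permission denied", '
    '"operation": "write", "action": "report"}}'
)

for stamp, data in block_io_errors([sample]):
    print(stamp, data["device"], data["operation"], data["reason"])
```

To scan a downloaded and decompressed log, the same function could be fed an open file object instead of the one-element list.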