13:59:59 <tdurakov> #startmeeting Nova Live Migration
14:00:00 <openstack> Meeting started Tue Aug  9 13:59:59 2016 UTC and is due to finish in 60 minutes.  The chair is tdurakov. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:01 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:03 <openstack> The meeting name has been set to 'nova_live_migration'
14:00:12 <tdurakov> hi everyone
14:00:16 * kashyap waves
14:01:19 <tdurakov> let's wait a minute, so everyone can join
14:01:53 <tdurakov> agenda https://wiki.openstack.org/wiki/Meetings/NovaLiveMigration
14:02:16 <mriedem> o/
14:02:20 <davidgiluk> o/
14:02:35 <tdurakov> so, let's start
14:02:43 <tdurakov> #topic Libvirt image backend
14:02:52 <tdurakov> any updates?
14:03:19 <tdurakov> I've reviewed several patches on that, it looks like the patch chain is getting shorter
14:03:36 <woodster_> o/
14:04:16 <tdurakov> I'd expect almost the same for the next topic...
14:04:27 <tdurakov> #topic Storage pools
14:05:12 <tdurakov> any updates on that?
14:05:55 <tdurakov> okay, let's move then to CI one...
14:06:03 <tdurakov> #topic CI
14:06:09 <tdurakov> first
14:06:32 <tdurakov> wanted to mention, markus_z is working on the serial consoles for live-migration
14:06:50 <tdurakov> #link https://review.openstack.org/#/c/347471/
14:07:18 <tdurakov> so, it improves coverage for that
14:08:03 <tdurakov> https://bugs.launchpad.net/nova/+bug/1524898 - this one is still hanging...
14:08:03 <openstack> Launchpad bug 1524898 in OpenStack Compute (nova) "Volume based live migration aborted unexpectedly" [High,Confirmed]
14:08:09 <tdurakov> mriedem: anything on that^
14:08:35 <mriedem> nope
14:08:49 <mriedem> we had talked about skipping it
14:08:55 <mriedem> last week
14:09:01 <mriedem> at least in the multinode job
14:09:04 <mriedem> to see if that stabilizes that job
14:09:25 <tdurakov> mriedem: ok, I'll submit patch today for that
14:09:30 <mriedem> ok
14:10:13 <tdurakov> there is also potential issue with nfs https://bugs.launchpad.net/nova/+bug/1535232
14:10:13 <openstack> Launchpad bug 1535232 in OpenStack Compute (nova) "live-migration ci failure on nfs shared storage" [Medium,Confirmed]
14:10:56 <tdurakov> I expected it would work properly on xenial, but it looks like we have it on xenial too
14:11:20 <kashyap> tdurakov: Is that a new bug?
14:11:22 * kashyap clicks
14:11:41 <tdurakov> kashyap: it's not new, but it turns out to be valid for xenial too
14:11:45 <kashyap> Duh, no, I even seem to have commented on it, a bit old (5 months old)
14:12:05 <tdurakov> yes
14:12:33 <kashyap> tdurakov: Currently we seem to be stuck at understanding
14:12:48 <kashyap> Why block I/O errors would cause live mig to abort
14:13:08 <kashyap> davidgiluk: Later, do you have time to take a quick look at this? - https://bugs.launchpad.net/nova/+bug/1535232/comments/6
14:13:08 <openstack> Launchpad bug 1535232 in OpenStack Compute (nova) "live-migration ci failure on nfs shared storage" [Medium,Confirmed]
14:13:14 <tdurakov> kashyap: looks like another underlying issue
14:13:25 * davidgiluk looks
14:14:51 <kashyap> davidgiluk: The hint is at the end of the comment.
14:16:09 <davidgiluk> kashyap: The IO errors are interesting, but that suggests it's something related to the storage again
14:16:24 <kashyap> tdurakov: Is it reproducible for you locally?  (I don't have an NFS env on my local env, should set one up.)
14:16:32 <tdurakov> kashyap: same
14:16:34 <kashyap> davidgiluk: Ah, you mean the host storage env?
14:16:49 <tdurakov> I haven't managed to reproduce it
14:16:51 <kashyap> Hmm, I have no clue what the underlying storage setup is like
14:17:07 <davidgiluk> kashyap: Yes
14:17:26 <kashyap> tdurakov: Incidentally, I'm doing a drive-mirror+NBD storage migration test with QEMU itself, I'll keep an eye out to see if I can trigger this
14:17:42 <tdurakov> kashyap: ok, thanks
14:17:53 <davidgiluk> kashyap: I'd guess at NFS permissions
14:17:56 <kashyap> (I'll test it w/ libvirt APIs, too, which is more useful in Nova's case.)
14:18:29 <kashyap> davidgiluk: Yeah, without 'sosreport'-style logs, can't be certain that they're NFS permissions, can we...?
14:18:46 <davidgiluk> kashyap: But also we need to see libvirt source logs to know why it decided to shut it down
14:19:13 <kashyap> tdurakov: Do you have a handy link for the latest hits for this failure?
14:19:20 <tdurakov> davidgiluk: the strange thing here is that some tests pass on that, within the same run
14:19:40 <tdurakov> kashyap: http://logs.openstack.org/66/329466/4/check/gate-tempest-dsvm-multinode-live-migration/586c6be/
14:20:15 <kashyap> tdurakov: Hmm, the failure is not deterministic?
14:20:24 <tdurakov> kashyap: looks so
14:21:09 <davidgiluk> tdurakov: Deterministic failures would be too easy
14:21:13 <kashyap> tdurakov: 'subnode-2' is the destination, isn't it?
14:21:22 <tdurakov> no
14:21:29 <kashyap> Ah, it's the source?
14:21:33 <tdurakov> subnode-2 is just compute only
14:21:39 <tdurakov> in that case it's source
14:21:59 <davidgiluk> tdurakov: I regularly do NFS qemu only migrations - so it normally works at that level
14:22:05 <kashyap> tdurakov: Okay, I got confused because, for some instances on 'subnode-2', I see the QEMU command-line parameter '-incoming', for some not
14:22:28 <kashyap> E.g. this (instance-00000009) has the "-incoming defer" (which means, it's the destination) - http://logs.openstack.org/66/329466/4/check/gate-tempest-dsvm-multinode-live-migration/586c6be/logs/subnode-2/libvirt/qemu/instance-00000009.txt.gz
14:23:01 <kashyap> davidgiluk: You don't test drive-mirror + NBD, do you?  I presume you still do: 'migrate -b'?
14:23:26 <kashyap> Or 'migrate tcp:localhost:4444'
14:23:37 <davidgiluk> kashyap: No, I rarely do migrate -b either, I just use migrate with NFS shared storage most of the time
14:24:00 <tdurakov> I'll check one thing, whether it always fails on the subnode, or there is no correlation
14:24:28 <kashyap> davidgiluk: That's the source libvirt daemon log - http://logs.openstack.org/66/329466/4/check/gate-tempest-dsvm-multinode-live-migration/586c6be/logs/subnode-2/libvirt/libvirtd.txt.gz
14:24:29 <tdurakov> my understanding was that these things are unrelated...
14:25:07 <tdurakov> http://logs.openstack.org/66/329466/4/check/gate-tempest-dsvm-multinode-live-migration/586c6be/logs/subnode-2/libvirt/libvirtd.txt.gz#_2016-08-03_00_45_27_923
14:25:12 <kashyap> davidgiluk: Oh, you seem to sound right...
14:25:12 <tdurakov> kashyap, davidgiluk ^
14:25:13 <kashyap> 2016-08-03 00:45:21.456+0000: 18816: info : qemuMonitorIOProcess:423 : QEMU_MONITOR_IO_PROCESS: mon=0x7f7008012390 buf={"timestamp": {"seconds": 1470185121, "microseconds": 456577}, "event": "BLOCK_IO_ERROR", "data": {"device": "drive-virtio-disk0", "nospace": false, "reason": "Permission denied", "operation": "write", "action": "report"}}
14:25:22 <kashyap> davidgiluk: "Permission denied"
14:25:33 <davidgiluk> kashyap: Yeh that looks like a simple perm problem on the NFS
14:26:21 <kashyap> davidgiluk: Also, checking out the URL from tdurakov w/ the "load of migration failed: Input/output error"
14:26:47 <tdurakov> davidgiluk, about the perm issue, http://logs.openstack.org/66/329466/4/check/gate-tempest-dsvm-multinode-live-migration/586c6be/console.html.gz#_2016-08-03_00_45_50_882922 - there are several tests that pass
14:26:49 <kashyap> tdurakov: We can move this discussion out of the meeting; maybe you have other issues to discuss.
14:27:15 <tdurakov> I'd spend 5 more minutes on that
14:27:20 <kashyap> Yeah, sure
14:27:44 <kashyap> tdurakov: Hmm, so this one succeeds: 'test_live_block_migration_paused'
14:27:54 <tdurakov> kashyap: yes
14:28:22 <tdurakov> so if it's a perm issue, I'd expect all tests to fail, right?
14:29:51 <tdurakov> well, let's take a timeout on that, so everyone interested can grep through the code, and get back to it later
14:30:28 <davidgiluk> tdurakov: I wonder if the one that is passing is actually NFS shared storage; its name is live_block_migration - so is it actually using NFS shared?
14:30:40 <tdurakov> davidgiluk: it's nfs
14:30:44 <davidgiluk> ok
14:30:49 <tdurakov> just bad tempest naming...
14:31:10 <kashyap> tdurakov: When the storage is shared, what is the point of doing a *block* migration?
14:31:26 <tdurakov> it's not block migration
14:31:37 <kashyap> Ah, bad tempest test name you say above
14:31:43 <tdurakov> right
14:31:53 <tdurakov> another ci thing to mention
14:31:56 <tdurakov> https://bugs.launchpad.net/nova/+bug/1592015
14:31:56 <openstack> Launchpad bug 1592015 in OpenStack Compute (nova) "libvirt: cleanup of a volume backed instance resize leaves behind the instance directory" [Medium,In progress] - Assigned to Lee Yarwood (lyarwood)
14:32:37 <tdurakov> I'm going to add a tempest test that does a cold migration, and then a live-migration after that
14:33:06 <tdurakov> once the bug is fixed
14:33:53 <tdurakov> mriedem1, thoughts on that^
14:35:19 <davidgiluk> tdurakov: Do those systems use apparmor or anything like that? I know there's a couple of selinux tricks to get NFS working, but I guess it's not using that
14:35:45 <tdurakov> davidgiluk: does ubuntu use apparmor?
14:35:57 <davidgiluk> tdurakov: It does, I don't know if it's used in this case
14:36:16 <tdurakov> I mean by default
14:36:39 <mriedem> tdurakov: the more tests the better
14:37:22 <tdurakov> ok let's move on then
14:38:05 <tdurakov> pkoniszewski added this thread to the agenda: http://lists.openstack.org/pipermail/openstack-dev/2016-August/100657.html
14:38:40 <tdurakov> so, the point is to not use tunneled mode for live-migration by default
14:39:33 <tdurakov> so, I support that idea
14:40:09 <tdurakov> if you folks missed the thread, feel free to respond on that
14:40:25 <tdurakov> also there is a patch up for review already https://review.openstack.org/#/c/350480/
14:41:30 <tdurakov> #topic Open discussion
14:41:36 <kashyap> tdurakov: I've read the thread, and agree with danpb's comment there
14:41:50 <tdurakov> kashyap: ok
14:41:52 <kashyap> tdurakov: And, yours
14:42:10 <kashyap> You're also "thumbs up" for default=False, for good reasons.
14:42:10 <tdurakov> cool, review the patch then, thanks:)
14:42:28 <kashyap> Yeah, looking
14:43:14 <tdurakov> anything else to discuss on live-migration?
14:43:55 <kashyap> Nothing from me as of now.
14:44:22 <tdurakov> ok, thank you for coming, it was kind of a sub-sub-team meeting:)
14:45:01 <tdurakov> #endmeeting
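
Editor's note on the libvirtd log line quoted at 14:25:13: the "Permission denied" reason is carried in the JSON payload after "buf=" in the QEMU_MONITOR_IO_PROCESS record. The sketch below is one possible way to pull that payload out of a single log line so the reason, operation, and device can be compared across CI runs. The function name extract_block_io_error is hypothetical, and the only log format assumed is the one shown in that quoted line; real libvirtd logs may append extra text after the JSON, which the decoder call used here tolerates.

```python
import json

# The log line quoted in the meeting at 14:25:13, reproduced verbatim.
SAMPLE_LINE = (
    '2016-08-03 00:45:21.456+0000: 18816: info : qemuMonitorIOProcess:423 : '
    'QEMU_MONITOR_IO_PROCESS: mon=0x7f7008012390 buf={"timestamp": '
    '{"seconds": 1470185121, "microseconds": 456577}, '
    '"event": "BLOCK_IO_ERROR", '
    '"data": {"device": "drive-virtio-disk0", "nospace": false, '
    '"reason": "Permission denied", "operation": "write", '
    '"action": "report"}}'
)

_DECODER = json.JSONDecoder()


def extract_block_io_error(line):
    """Return the 'data' dict of a BLOCK_IO_ERROR event found in one
    log line, or None if the line carries no such event.

    Hypothetical helper: it assumes the JSON starts right after 'buf='.
    """
    start = line.find("buf={")
    if start == -1:
        return None
    try:
        # raw_decode parses one JSON value and ignores trailing text.
        event, _ = _DECODER.raw_decode(line, start + len("buf="))
    except ValueError:
        return None
    if not isinstance(event, dict) or event.get("event") != "BLOCK_IO_ERROR":
        return None
    return event.get("data", {})


if __name__ == "__main__":
    info = extract_block_io_error(SAMPLE_LINE)
    print(info["device"], info["operation"], info["reason"])
```

Run over every line of a decompressed libvirtd log, a helper like this would list which devices hit write errors and why, which may help check whether the failures correlate with a particular node or instance.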