14:00:16 #startmeeting Nova Live Migration
14:00:17 Meeting started Tue Jun 28 14:00:16 2016 UTC and is due to finish in 60 minutes. The chair is PaulMurray. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:19 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:21 The meeting name has been set to 'nova_live_migration'
14:00:25 o/
14:00:25 hi
14:00:27 o/
14:00:28 o/
14:00:29 o/
14:00:29 Hi all
14:00:30 o/
14:00:31 o/
14:00:46 o/
14:00:52 agenda https://wiki.openstack.org/wiki/Meetings/NovaLiveMigration
14:01:12 Did anyone notice that the agenda was up last Friday!?
14:01:25 * kashyap didn't
14:01:26 * PaulMurray organized for once
14:01:26 :D
14:02:02 #topic Non-priority features
14:02:04 hi
14:02:16 June 30: non-priority feature freeze
14:02:21 two days away
14:02:37 so a quick check for anything that needs to be pushed along
14:02:38 Here's the URL, that PaulMurray has been shy to post here :-) - https://wiki.openstack.org/wiki/Meetings/NovaLiveMigration
14:02:43 https://review.openstack.org/#/c/308198
14:03:07 and https://review.openstack.org/#/c/328280
14:03:17 kashyap, it's up a few lines ^^^^
14:03:24 * PaulMurray not so shy
14:03:36 Oh, then I joined late :-) Sorry
14:03:39 immediately before you joined in fact
14:03:43 :)
14:04:12 I found Re-Proposes to check destination on migrations is complete
14:04:18 I haven't looked today
14:04:33 Automatic Live Migration Completion (luis5tb, pkoniszewski, paul-carlton)
14:04:40 how is this one going ? ^^
14:04:52 well, it is in very good shape
14:04:55 looking good
14:04:57 i think everything is already accepted
14:04:58 looks nearly there
14:05:05 https://review.openstack.org/#/q/topic:bp/auto-live-migration-completion
14:05:17 so we are waiting for the last +2/+W on libvirt cleanup
14:05:31 yeah https://review.openstack.org/#/c/327763/
14:05:33 paul-carlton2: Incidentally, are you aware of this patch: https://review.openstack.org/#/c/270289/
14:05:39 danpb just pinged me to look at that one after the meeting
14:05:41 paul-carlton2: Also looks fairly close to the line
14:05:45 o/
14:06:28 Live Migration of Rescued Instances (paul-carlton)
14:06:30 mdbooth, hadn't seen that
14:06:39 https://review.openstack.org/#/q/topic:bp/live-migrate-rescued-instances
14:07:08 this looks to have more to go - paul-carlton2 ?
14:07:18 no tempest test for that? ^
14:07:45 mriedem, I haven't done the tempest test for it but plan to, been delayed, my dad died two weeks ago
14:08:04 mriedem: Incidentally, how can we add a tempest test for something which hasn't landed yet? What's the dance to avoid breaking the gate?
14:08:04 sorry to hear that paul-carlton2
14:08:20 depends-on
14:08:21 paul-carlton2: I'm so sorry, Paul. I'm learning tempest & happy to try a rescue test scenario.
14:08:24 paul-carlton2: sorry to hear that
14:08:29 mdbooth: yeah, depends-on
14:08:34 PaulMurray: Thanks
14:08:47 mdbooth: e.g. https://review.openstack.org/#/c/327191/
14:09:07 I think I know what is needed to do tempest test, just need to implement it
14:10:02 then you have a merge conflict and a couple of test failures - can you fix those ?
14:10:23 yep
14:10:43 ok, next
14:10:45 Make checks before live-migration async (tdurakov)
14:10:55 https://review.openstack.org/#/q/topic:bp/live-migrate-rescued-instances
14:11:10 oops - that's wrong
14:11:23 https://review.openstack.org/#/q/topic:bp/async-live-migration-rest-check
14:11:30 tdurakov said that he will skip the live migration meeting today
14:11:59 i had asked for a tempest test on the api changes there also
14:12:13 the api change is the last thing in the series https://review.openstack.org/#/c/314932/
14:12:36 it also had other issues
14:12:58 that last one is his also
14:12:59 Remove compute-compute communication in live-migration (tdurakov)
14:13:09 https://review.openstack.org/#/q/topic:bp/remove-compute-compute-communication
14:13:24 that has 4 open changes, some have -1s
14:13:45 pkoniszewski, is tdurakov just missing the meeting - i.e. is he working on these ?
14:14:14 he is just missing the meeting
14:14:17 he keeps working on
14:14:23 ok
14:14:28 that's what i understood
14:14:37 So a few of those are close
14:14:55 let's see if we can get them landed - call for help reviewing if needed
14:15:18 #topic CI
14:15:33 I think the lm job has moved to check queue ?
14:15:45 yes
14:16:20 the results are way up and down
14:17:11 mriedem: you mean time wise or failures?
14:17:23 failures
14:17:27 very sporadic
14:17:34 got logs to look at?
14:18:36 http://tinyurl.com/zcceagy
14:19:40 logs of any of the failed ones?
14:20:14 davidgiluk: there are 2 bugs on http://status.openstack.org//elastic-recheck/index.html related to live migration failures
14:20:24 following the logstash links for those would get you some failed jobs and links to logs
14:20:43 problem is those are also for the multinode job which is running on stable branches too
14:21:20 http://bugs.launchpad.net/bugs/1539271
14:21:20 Launchpad bug 1539271 in OpenStack Compute (nova) "Libvirt live block migration migration stalls" [High,Confirmed]
14:21:37 http://bugs.launchpad.net/bugs/1524898
14:21:37 Launchpad bug 1524898 in OpenStack Compute (nova) "Volume based live migration aborted unexpectedly" [High,Confirmed]
14:21:48 are those the ones you mean ?
14:22:39 yeah
14:22:53 but like i said, e-r is also tracking failures for those from stable branches, so it skews the results a bit
14:23:02 is there a way to tell which job they are coming from ?
14:23:04 you'd have to filter on the live migration job name and master branch only i'd think
14:23:08 build_name
14:23:13 on the left side, in logstash
14:23:40 we want to filter on build_name:"gate-tempest-dsvm-multinode-live-migration"
14:24:51 we can look later (I expect everyone is trying right now though)
14:25:47 They are the same ones that have been around a long time - it would be good to find a way into them
14:25:58 so what i'm seeing is,
14:26:08 it's this one http://status.openstack.org//elastic-recheck/index.html#1524898
14:26:36 1539271 isn't showing up in the last 7 days on that job
14:27:08 so, we could try skipping the volume-backed live migration test for now and see if the job stabilizes
14:27:10 Is anybody able to reproduce that 'stalling' bug? 1539271
14:27:30 kashyap: 1539271 isn't the problem in the live migration job from what logstash is showing me
14:27:38 we tried here and couldn't reproduce
14:27:40 the failures are coming from bug 1524898
14:27:40 bug 1524898 in OpenStack Compute (nova) "Volume based live migration aborted unexpectedly" [High,Confirmed] https://launchpad.net/bugs/1524898
14:27:45 mriedem: Is that still running the old qemu ?
14:27:48 * kashyap is reading that
14:28:20 davidgiluk: from one of the random failures http://logs.openstack.org/55/328055/12/check/gate-tempest-dsvm-multinode-live-migration/e4d241f/logs/subnode-2/
14:28:28 davidgiluk: Not modern enough for sure
14:28:44 qemu-kvm 1:2.5+dfsg-5ubuntu10.2
14:28:57 libvirt-bin 1.3.1
14:29:07 these are ubuntu 16.04 nodes
14:29:25 kashyap: to be fair, your answer is always that we aren't running against trunk libvirt/qemu :)
14:29:27 1.2 QEMU was released in 2012 DEC
14:29:36 kashyap: this is qemu 2.5
14:29:36 mriedem: :-) I knew someone would say that
14:29:58 mriedem: OK, that's much better than it used to be
14:29:59 mriedem: Okay, "I'll shut up and look at logs"
14:30:31 so the volume backed live migration aborts
14:30:31 not sure why
14:30:48 the live migration job could be failing for other reasons we aren't tracking yet, e.g. i saw one fail this weekend which was due to ansible failing
14:30:51 but we didn't have the ansible logs from the node to see why (failed during inventory collection)
14:30:54 mriedem: Instead of holding the meeting up, can you pm and help me work through the logs?
14:31:03 i'm out some this afternoon
14:31:18 davidgiluk: danpb is probably best to ask questions about libvirt logs for a failure
14:31:20 in the -nova channel
14:31:22 mriedem: ok, no problem; but I'd be happy to try and work through the logs with someone
14:31:25 s/afternoon/morning/
14:31:33 work with kashyap and danpb
14:31:49 davidgiluk: Yeah, happy to assist
14:32:08 is there anything else for CI ?
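The point raised in the CI discussion, that elastic-recheck numbers are skewed because the same bugs also match multinode runs on stable branches, comes down to restricting the sample to one job on master before computing a failure rate. The sketch below illustrates that filtering offline. It is a hypothetical example, not a logstash or elastic-recheck query: only `build_name` and the job name come from the discussion, while the `JobResult` record, the `build_branch`/`build_status` fields, the status strings and the `other-multinode-job` name are assumptions.

```python
from dataclasses import dataclass


# Hypothetical job-result record. Only build_name comes from the discussion;
# build_branch and build_status are assumed field names for this sketch.
@dataclass
class JobResult:
    build_name: str
    build_branch: str
    build_status: str  # assumed values: "SUCCESS" or "FAILURE"


LM_JOB = "gate-tempest-dsvm-multinode-live-migration"


def failure_rate(results, job=LM_JOB, branch="master"):
    """Return the failure rate of one job on one branch, ignoring all other runs.

    Returns None when no run matches, so an empty sample is not reported as 0%.
    """
    relevant = [r for r in results
                if r.build_name == job and r.build_branch == branch]
    if not relevant:
        return None
    failures = sum(1 for r in relevant if r.build_status == "FAILURE")
    return failures / len(relevant)


# Made-up sample data: the stable-branch runs would inflate an unfiltered count.
results = [
    JobResult(LM_JOB, "master", "FAILURE"),
    JobResult(LM_JOB, "master", "SUCCESS"),
    JobResult("other-multinode-job", "stable/mitaka", "FAILURE"),  # hypothetical job
    JobResult(LM_JOB, "stable/mitaka", "FAILURE"),
]
print(failure_rate(results))  # prints 0.5
```

Counting every failure in the sample would give 3 out of 4; filtering on the job name and branch first gives 1 out of 2, which is the kind of skew described above.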
14:32:36 let's go on then
14:32:42 #topic Libvirt Storage Pools
14:33:07 No news from me, really. Image backend reviews are always welcome though. Otherwise, I'm looking into tempest coverage.
14:33:36 * mdbooth is currently churning quite a lot of stuff
14:33:48 mdbooth, diana_clarke I still want to dig into this - but later this week. (as I said last week)
14:33:57 I now have a large and growing outstanding patch queue, which I intend to prepend to diana_clarke's patch queue
14:34:18 It's long enough that I'll have to send out an explanatory covering email when it's done
14:34:55 thanks mdbooth
14:35:12 you thought we would not get there this cycle - is that still your opinion ?
14:35:27 PaulMurray: Unfortunately yes
14:36:02 ok - I'll catch up later
14:36:11 moving on
14:36:26 #topic Use target VIF information on Migration
14:36:44 this topic is about awareness for something coming up
14:36:45 https://blueprints.launchpad.net/nova/+spec/migration-use-target-vif
14:37:10 right, it requires some coordination between nova and neutron
14:37:51 neutron work needs to be done first (new api), so what neutron is looking for is some statement from nova that this fits somehow in their plans
14:38:09 on the other side nova of course wants to know the priorities on neutron side for this...
14:38:24 If anyone can't remember
14:38:41 the problem is to do with the port binding after live migration
14:38:49 taking too long and causing problems
14:38:59 so the plan is to bind ports in advance
14:39:10 and happening too late! if it fails, the instance is stuck in error state on the target
14:39:36 so we're basically going to have nova ask neutron for host capabilities during scheduling for live migration
14:40:00 which is not really the same as generic resource pools/providers
14:40:29 andreas_s: i know carl_baldwin is going to be at the nova midcycle, and armax might be too but i'm not sure about armax,
14:40:39 we're too late in newton to do anything with this in nova,
14:40:48 Yes, I'll be there the first two days.
14:40:49 but we should talk to them about it there if the neutron team has thought about it
14:40:49 mriedem, I know, that's not a problem
14:41:18 could also be next release
14:41:28 mriedem: I am working it out, do you have an updated agenda?
14:41:37 andreas_s: so work on getting any design docs / background / issues put together before the meetup so we can discuss
14:41:47 armax: we have an etherpad with topics
14:42:02 * armax looks
14:42:08 https://etherpad.openstack.org/p/nova-newton-midcycle
14:42:19 mriedem, ok, all stuff should be covered by the 2 specs (nova + neutron)
14:42:47 andreas_s, I would like to help get this lined up
14:43:00 so ask if you need anything
14:43:10 andreas_s: i've added an entry to the etherpad, if there are specs please link them in
14:43:17 what is the state of the neutron side at the moment ?
14:43:19 PaulMurray, thanks!
14:43:24 mriedem, I'll do!
14:43:29 mriedem: if I can come, I’ll be able to participate full time July 20, so it would be helpful perhaps for the Neutron topics to defer them until then
14:43:42 PaulMurray, I'm currently working on a database rework which is a prereq
14:43:43 mriedem: but I’ll give you a more definite answer later this week
14:43:50 armax: sounds good
14:44:18 to be honest, I don't think that the new api will make newton!
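The ordering problem behind "bind ports in advance" can be illustrated with a toy model. In the model, if ports are bound only after the guest has moved, a binding failure leaves the instance in ERROR on the target; if binding on the destination is attempted first, the migration can be aborted while the instance is still running on the source. Everything below (`FakeNetwork`, `bind_port`, `BindingFailed`, the two `live_migrate_*` functions and the dict-based instance) is hypothetical and is not the Nova or Neutron API. It also does not model the scheduling-time host-capability check mentioned in the discussion.

```python
class BindingFailed(Exception):
    """Raised by the toy network service when a port cannot be bound on a host."""


class FakeNetwork:
    """Hypothetical stand-in for the networking service (not the Neutron API)."""

    def __init__(self, bindable_hosts):
        self.bindable_hosts = set(bindable_hosts)

    def bind_port(self, port_id, host):
        # Binding succeeds only on hosts this toy network knows it can serve.
        if host not in self.bindable_hosts:
            raise BindingFailed(f"cannot bind {port_id} on {host}")
        return {"port": port_id, "host": host}


def live_migrate_bind_after(net, instance, dest):
    """Problematic ordering: move the guest first, bind its ports afterwards."""
    instance["host"] = dest  # the guest is already running on the target
    try:
        for port in instance["ports"]:
            net.bind_port(port, dest)
    except BindingFailed:
        instance["state"] = "ERROR"  # stuck in error state on the target
    return instance


def live_migrate_bind_before(net, instance, dest):
    """Proposed ordering: bind on the target first and abort before moving.

    A real implementation would also have to clean up any bindings already
    created for earlier ports; this toy model does not track bindings.
    """
    try:
        for port in instance["ports"]:
            net.bind_port(port, dest)
    except BindingFailed:
        return instance  # migration aborted; guest untouched on the source
    instance["host"] = dest
    return instance


net = FakeNetwork(bindable_hosts=["src"])  # "dest" cannot bind ports
vm1 = {"host": "src", "state": "ACTIVE", "ports": ["p1"]}
vm2 = {"host": "src", "state": "ACTIVE", "ports": ["p1"]}
live_migrate_bind_after(net, vm1, "dest")
live_migrate_bind_before(net, vm2, "dest")
print(vm1["host"], vm1["state"])  # prints: dest ERROR
print(vm2["host"], vm2["state"])  # prints: src ACTIVE
```

The only difference between the two functions is where the binding step sits relative to the point of no return (moving the guest). That difference is what decides whether a networking failure is recoverable.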
14:44:28 so probably ocata
14:45:08 andreas_s: i'd avoid adding a new api to neutron until we have discussed it on the nova side
14:45:16 as in nova would actually use the new api and fits what we need as a client
14:45:27 mriedem, exactly
14:46:01 mriedem, but the database restructure was planned anyhow - that's what I'm currently working on
14:46:38 andreas_s, did you way you are coming to our mid cycle ?
14:46:51 s/way/say/
14:47:10 PaulMurray: I don't think there's a way, as our nova guy is also not able to come...
14:47:29 but I can try to ask my mgmt again with this item in mind
14:47:42 is markus_z coming?
14:47:56 i'm assuming that's your nova guy?
14:48:11 mriedem, yes, not he just said that he will not come
14:48:24 confusing sentence...
14:48:28 he will not come
14:48:39 ok
14:49:10 mriedem, PaulMurray: So I'll update the etherpad with the specs
14:49:17 yep, thanks
14:49:25 try to get as much feedback as possible before the midcycle
14:49:39 and will brief armax and carl_baldwin to discuss that
14:49:41 and ping me if you need help
14:50:03 ...next on the agenda
14:50:12 In addition I'll send out a mail via ML in a minute or so
14:50:16 thanks!
14:50:22 #topic reviews / open discussion
14:50:31 nothing on the agenda
14:50:39 does anyone have anything ?
14:50:43 for last few minutes
14:50:51 or we end
14:51:18 I'll take that as a 'no'
14:51:24 thanks for coming everyone
14:51:37 let's get those last patches in for FF
14:51:51 #endmeeting