14:00:16 <PaulMurray> #startmeeting Nova Live Migration
14:00:17 <openstack> Meeting started Tue Jun 28 14:00:16 2016 UTC and is due to finish in 60 minutes. The chair is PaulMurray. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:19 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:21 <openstack> The meeting name has been set to 'nova_live_migration'
14:00:25 <davidgiluk> o/
14:00:25 <andrearosa> hi
14:00:27 <diana_clarke> o/
14:00:28 <mdbooth> o/
14:00:29 <pkoniszewski> o/
14:00:29 <PaulMurray> Hi all
14:00:30 <andreas_s> o/
14:00:31 <luis5tb> o/
14:00:46 <mriedem> o/
14:00:52 <PaulMurray> agenda https://wiki.openstack.org/wiki/Meetings/NovaLiveMigration
14:01:12 <PaulMurray> Did anyone notice that the agenda was up last friday ! ?
14:01:25 * kashyap didn't
14:01:26 * PaulMurray organized for once
14:01:26 <luis5tb> :D
14:02:02 <PaulMurray> #topic Non-priority features
14:02:04 <paul-carlton2> hi
14:02:16 <PaulMurray> June 30: non-priority feature freeze
14:02:21 <PaulMurray> two days away
14:02:37 <PaulMurray> so a quick check for anything that needs to be pushed along
14:02:38 <kashyap> Here's the URL, that PaulMurray has been shy to post here :-) - https://wiki.openstack.org/wiki/Meetings/NovaLiveMigration
14:02:43 <paul-carlton2> https://review.openstack.org/#/c/308198
14:03:07 <paul-carlton2> and https://review.openstack.org/#/c/328280
14:03:17 <PaulMurray> kashyap, it's up a few lines ^^^^
14:03:24 * PaulMurray not so shy
14:03:36 <kashyap> Oh, then I joined late :-) Sorry
14:03:39 <PaulMurray> immediately before you joined in fact
14:03:43 <PaulMurray> :)
14:04:12 <PaulMurray> I found Re-Proposes to check destination on migrations is complete
14:04:18 <PaulMurray> I haven't looked today
14:04:33 <PaulMurray> Automatic Live Migration Completion (luis5tb, pkoniszewski, paul-carlton)
14:04:40 <PaulMurray> how is this one going ? ^^
14:04:52 <pkoniszewski> well, it is in very good shape
14:04:55 <luis5tb> looking good
14:04:57 <pkoniszewski> i think everything is already accepted
14:04:58 <PaulMurray> looks nearly there
14:05:05 <PaulMurray> https://review.openstack.org/#/q/topic:bp/auto-live-migration-completion
14:05:17 <pkoniszewski> so we are waiting for the last +2/+W on libvirt cleanup
14:05:31 <mriedem> yeah https://review.openstack.org/#/c/327763/
14:05:33 <mdbooth> paul-carlton2: Incidentally, are you aware of this patch: https://review.openstack.org/#/c/270289/
14:05:39 <mriedem> danpb just pinged me to look at that one after the meeting
14:05:41 <mdbooth> paul-carlton2: Also looks fairly close to the line
14:05:45 <woodster_> o/
14:06:28 <PaulMurray> Live Migration of Rescued Instances (paul-carlton)
14:06:30 <paul-carlton2> mdbooth, hadn't seen that
14:06:39 <PaulMurray> https://review.openstack.org/#/q/topic:bp/live-migrate-rescued-instances
14:07:08 <PaulMurray> this looks to have more to go - paul-carlton2 ?
14:07:18 <mriedem> no tempest test for that? ^
14:07:45 <paul-carlton2> mriedem, I haven't done the tempest test for it but plan to, been delayed, my dad died two weeks ago
14:08:04 <mdbooth> mriedem: Incidentally, how can we add a tempest test for something which hasn't landed yet? What's the dance to avoid breaking the gate?
14:08:04 <PaulMurray> sorry to hear that paul-carlton2
14:08:20 <PaulMurray> depends-on
14:08:21 <diana_clarke> paul-carlton2: I'm so sorry, Paul. I'm learning tempest & happy to try a rescue test scenario.
14:08:24 <mriedem> paul-carlton2: sorry to hear that
14:08:29 <mriedem> mdbooth: yeah, depends-on
14:08:34 <mdbooth> PaulMurray: Thanks
14:08:47 <mriedem> mdbooth: e.g. https://review.openstack.org/#/c/327191/
14:09:07 <paul-carlton2> I think I know what is needed to do tempest test, just need to implement it
14:10:02 <PaulMurray> then you have a merge conflict and a couple of test failures - can you fix those ?
14:10:23 <paul-carlton2> yep
14:10:43 <PaulMurray> ok, next
14:10:45 <PaulMurray> Make checks before live-migration async (tdurakov)
14:10:55 <PaulMurray> https://review.openstack.org/#/q/topic:bp/live-migrate-rescued-instances
14:11:10 <PaulMurray> oops - that's wrong
14:11:23 <PaulMurray> https://review.openstack.org/#/q/topic:bp/async-live-migration-rest-check
14:11:30 <pkoniszewski> tdurakov said that he will skip live migration meeting today
14:11:59 <mriedem> i had asked for a tempest test on the api changes there also
14:12:13 <mriedem> the api change is the last thing in the series https://review.openstack.org/#/c/314932/
14:12:36 <mriedem> it also had other issues
14:12:58 <PaulMurray> that last one is his also
14:12:59 <PaulMurray> Remove compute-compute communication in live-migration (tdurakov)
14:13:09 <PaulMurray> https://review.openstack.org/#/q/topic:bp/remove-compute-compute-communication
14:13:24 <mriedem> that has 4 open changes, some have -1s
14:13:45 <PaulMurray> pkoniszewski, is tdurakov just missing the meeting - i.e. is he working on these ?
14:14:14 <pkoniszewski> he is just missing the meeting
14:14:17 <pkoniszewski> he keeps working on
14:14:23 <PaulMurray> ok
14:14:28 <pkoniszewski> that's what i understood
14:14:37 <PaulMurray> So a few of those are close
14:14:55 <PaulMurray> lets see if we can get them landed - call for help reviewing if needed
14:15:18 <PaulMurray> #topic CI
14:15:33 <PaulMurray> I think the lm job has moved to check queue ?
14:15:45 <pkoniszewski> yes
14:16:20 <mriedem> the results are way up and down
14:17:11 <davidgiluk> mriedem: you mean time wise or failures?
14:17:23 <mriedem> failures
14:17:27 <mriedem> very sporadic
14:17:34 <davidgiluk> got logs to look at?
14:18:36 <mriedem> http://tinyurl.com/zcceagy
14:19:40 <davidgiluk> logs of any of the failed ones?
14:20:14 <mriedem> davidgiluk: there are 2 bugs on http://status.openstack.org//elastic-recheck/index.html related to live migration failures
14:20:24 <mriedem> following the logstash links for those would get you some failure jobs and links to logs
14:20:43 <mriedem> problem is those are also for the multinode job which is running on stable branches too
14:21:20 <PaulMurray> http://bugs.launchpad.net/bugs/1539271
14:21:20 <openstack> Launchpad bug 1539271 in OpenStack Compute (nova) "Libvirt live block migration migration stalls" [High,Confirmed]
14:21:37 <PaulMurray> http://bugs.launchpad.net/bugs/1524898
14:21:37 <openstack> Launchpad bug 1524898 in OpenStack Compute (nova) "Volume based live migration aborted unexpectedly" [High,Confirmed]
14:21:48 <PaulMurray> are those the ones you mean ?
14:22:39 <mriedem> yeah
14:22:53 <mriedem> but like i said, e-r is also tracking failures for those from stable branches, so it skews the results a bit
14:23:02 <PaulMurray> is there a way to tell which job they are coming from ?
14:23:04 <mriedem> you'd have to filter on the live migration job name and master branch only i'd think
14:23:08 <mriedem> build_name
14:23:13 <mriedem> on the left side, in logstash
14:23:40 <mriedem> we want to filter on build_name:"gate-tempest-dsvm-multinode-live-migration"
14:24:51 <PaulMurray> we can look later (I expect everyone is trying right now though)
14:25:47 <PaulMurray> They are the same ones that have been around a long time - it would be good to find a way into them
14:25:58 <mriedem> so what i'm seeing is,
14:26:08 <mriedem> it's this one http://status.openstack.org//elastic-recheck/index.html#1524898
14:26:36 <mriedem> 1539271 isn't showing up in the last 7 days on that job
14:27:08 <mriedem> so, we could try skipping the volume-backed live migration test for now and see if the job stabilizes
14:27:10 <kashyap> Is anybody able to reproduce that 'stalling' bug? 1539271
14:27:30 <mriedem> kashyap: 1539271 isn't the problem in the live migration job from what logstash is showing me
14:27:38 <PaulMurray> we tried here and couldn't reproduce
14:27:40 <mriedem> the failures are coming from bug 1524898
14:27:40 <openstack> bug 1524898 in OpenStack Compute (nova) "Volume based live migration aborted unexpectedly" [High,Confirmed] https://launchpad.net/bugs/1524898
14:27:45 <davidgiluk> mriedem: Is that still running the old qemu ?
14:27:48 * kashyap is reading that
14:28:20 <mriedem> davidgiluk: from one of the random failures http://logs.openstack.org/55/328055/12/check/gate-tempest-dsvm-multinode-live-migration/e4d241f/logs/subnode-2/
14:28:28 <kashyap> davidgiluk: Not modern enough for sure
14:28:44 <mriedem> qemu-kvm 1:2.5+dfsg-5ubuntu10.2
14:28:57 <mriedem> libvirt-bin 1.3.1
14:29:07 <mriedem> these are ubuntu 16.04 nodes
14:29:25 <mriedem> kashyap: to be fair, your answer is always that we aren't running against trunk libvirt/qemu :)
14:29:27 <kashyap> 1.2 QEMU was released in 2012 DEC
14:29:36 <mriedem> kashyap: this is qemu 2.5
14:29:36 <kashyap> mriedem: :-) I knew someone would say that
14:29:58 <davidgiluk> mriedem: OK, that's much better than it used to be
14:29:59 <kashyap> mriedem: Okay, "I'll shut up and look at logs"
14:30:31 <mriedem> so the volume backed live migration aborts
14:30:31 <mriedem> not sure why
14:30:48 <mriedem> the live migration job could be failing for other reasons we aren't tracking yet, e.g. i saw one fail this weekend which was due to ansible failing
14:30:51 <mriedem> but we didn't have the ansible logs from the node to see why (failed during inventory collection)
14:30:54 <davidgiluk> mriedem: Instead of holding the meeting up, can you pm and help me work through the logs?
14:31:03 <mriedem> i'm out some this afternoon
14:31:18 <mriedem> davidgiluk: danpb is probably best to ask questions about libvirt logs for a failure
14:31:20 <mriedem> in the -nova channel
14:31:22 <davidgiluk> mriedem: ok, no problem; but I'd be happy to try and work through the logs with someone
14:31:25 <mriedem> s/afternoon/morning/
14:31:33 <mriedem> work with kashyap and danpb
14:31:49 <kashyap> davidgiluk: Yeah, happy to assist
14:32:08 <PaulMurray> is there anything else for CI ?
14:32:36 <PaulMurray> lets go on then
14:32:42 <PaulMurray> #topic Libvirt Storage Pools
14:33:07 <diana_clarke> No news from me, really. Image backend reviews are always welcome though. Otherwise, I'm looking into tempest coverage.
14:33:36 * mdbooth is currently churning quite a lot of stuff
14:33:48 <PaulMurray> mdbooth, diana_clarke I still want to dig into this - but later this week. (as I said last week)
14:33:57 <mdbooth> I now have a large and growing outstanding patch queue, which I intend to prepend to diana_clarke's patch queue
14:34:18 <mdbooth> It's long enough that I'll have to send out an explanatory covering email when it's done
14:34:55 <PaulMurray> thanks mdbooth
14:35:12 <PaulMurray> you thought we would not get there this cycle - is that still your opinion ?
14:35:27 <mdbooth> PaulMurray: Unfortunately yes
14:36:02 <PaulMurray> ok - I'll catch up later
14:36:11 <PaulMurray> moving on
14:36:26 <PaulMurray> #topic Use target VIF information on Migration
14:36:44 <PaulMurray> this topic is about awareness for something coming up
14:36:45 <PaulMurray> https://blueprints.launchpad.net/nova/+spec/migration-use-target-vif
14:37:10 <andreas_s> right, it requires some coordination between nova and neutron
14:37:51 <andreas_s> neutron work needs to be done first (new api), so what neutron is looking for is some statement from nova that this fits somehow in their plans
14:38:09 <andreas_s> on the other side nova of course wants to know the priorities on neutron side for this...
14:38:24 <PaulMurray> If anyone can't remember
14:38:41 <PaulMurray> the problem is to do with the port binding after live migration
14:38:49 <PaulMurray> taking too long and causing problems
14:38:59 <PaulMurray> so the plan is to bind ports in advance
14:39:10 <andreas_s> and happening too late! if it fails, the instance is stuck in error state on the target
14:39:36 <mriedem> so we're basically going to have nova ask neutron for a host capabilities during scheduling for live migration
14:40:00 <mriedem> which is not really the same as generic resource pools/providers
14:40:29 <mriedem> andreas_s: i know carl_baldwin is going to be at the nova midcycle, and armax might be too but i'm not sure about armax,
14:40:39 <mriedem> we're too late in newton to do anything with this in nova,
14:40:48 <carl_baldwin> Yes, I'll be there the first two days.
14:40:49 <mriedem> but we should talk to them about it there if the neutron team has thought about it
14:40:49 <andreas_s> mriedem, I know, that's not a problem
14:41:18 <andreas_s> could also be next release
14:41:28 <armax> mriedem: I am working it out, do you have an updated agenda?
14:41:37 <mriedem> andreas_s: so work on getting any design docs / background / issues put together before the meetup so we can discuss
14:41:47 <mriedem> armax: we have an etherpad with topics
14:42:02 * armax looks
14:42:08 <mriedem> https://etherpad.openstack.org/p/nova-newton-midcycle
14:42:19 <andreas_s> mriedem, ok, all stuff should be covered by the 2 specs (nova + neutron)
14:42:47 <PaulMurray> andreas_s, I would like to help get this lined up
14:43:00 <PaulMurray> so ask if you need anything
14:43:10 <mriedem> andreas_s: i've added an entry to the etherpad, if there are specs please link them in
14:43:17 <PaulMurray> what is the state of the neutron side at the moment ?
14:43:19 <andreas_s> PaulMurray, thanks!
14:43:24 <andreas_s> mriedem, I'll do!
14:43:29 <armax> mriedem: if I can come, I’ll be able to participate full time July 20, so it would be helpful perhaps for the Neutron topics to defer them until then
14:43:42 <andreas_s> PaulMurray, I'm currently working on a database rework which is a prereq
14:43:43 <armax> mriedem: but I’ll give you a more definite answer later this week
14:43:50 <mriedem> armax: sounds good
14:44:18 <andreas_s> to be honest, I don't think that the new api will make newton!
14:44:28 <andreas_s> so probably ocata
14:45:08 <mriedem> andreas_s: i'd avoid adding a new api to neutron until we have discussed it on the nova side
14:45:16 <mriedem> as in nova would actually use the new api and fits what we need as a client
14:45:27 <andreas_s> mriedem, exactly
14:46:01 <andreas_s> mriedem, but the database restructure was planned anyhow - that's what I'm currently working on
14:46:38 <PaulMurray> andreas_s, did you way you are coming to our mid cycle ?
14:46:51 <PaulMurray> s/way/say/
14:47:10 <andreas_s> PaulMurray: I don't think there's a way, as our nova guy is also not able to come...
14:47:29 <andreas_s> but I can try to ask my mgmt again with this item in mind
14:47:42 <mriedem> is markus_z coming?
14:47:56 <mriedem> i'm assuming that's your nova guy?
14:48:11 <andreas_s> mriedem, yes, not he just said that he will not come
14:48:24 <andreas_s> confusing sentence...
14:48:28 <andreas_s> he will not come
14:48:39 <mriedem> ok
14:49:10 <andreas_s> mriedem, PaulMurray: So I'll update the etherpad with the specs
14:49:17 <PaulMurray> yep, thanks
14:49:25 <andreas_s> try to get as much feedback as possible before the midcycle
14:49:39 <andreas_s> and will brief armax and carl_baldwin to discuss that
14:49:41 <PaulMurray> and ping me if you need help
14:50:03 <PaulMurray> ...next on the agenda
14:50:12 <andreas_s> In addition I'll send out a mail via ML in a minute or so
14:50:16 <andreas_s> thanks!
14:50:22 <PaulMurray> #topic reviews / open discussion
14:50:31 <PaulMurray> nothing on the agenda
14:50:39 <PaulMurray> does anyone have anything ?
14:50:43 <PaulMurray> for last few minutes
14:50:51 <PaulMurray> or we end
14:51:18 <PaulMurray> I'll take that as a 'no'
14:51:24 <PaulMurray> thanks for coming everyone
14:51:37 <PaulMurray> lets get those last patches in for FF
14:51:51 <PaulMurray> #endmeeting