14:00:11 <PaulMurray> #startmeeting Nova Live Migration
14:00:12 <openstack> Meeting started Tue May 3 14:00:11 2016 UTC and is due to finish in 60 minutes. The chair is PaulMurray. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:13 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:16 <openstack> The meeting name has been set to 'nova_live_migration'
14:00:30 <PaulMurray> Hi, is anyone here for live migration ?
14:00:36 <luis5tb> Hi!
14:00:36 <rdopiera> o/
14:00:38 <andrearosa> hi I am
14:00:40 <paul-carlton1> o/
14:00:46 <mdbooth> o/
14:00:53 <diana_clarke> o/
14:00:56 <abhishek> \o/
14:00:59 * mriedem is here but has to run to another physical meeting for a half hour
14:01:13 <davidgiluk> o/
14:01:27 <PaulMurray> just gonna wait a moment while people pop up
14:01:44 <PaulMurray> There is an agenda here: https://wiki.openstack.org/wiki/Meetings/NovaLiveMigration
14:02:04 * kashyap waves
14:02:08 <PaulMurray> Now I'm back home I will try to get these agendas sorted on Fridays
14:02:31 <PaulMurray> #topic Review summit outcome
14:02:45 <PaulMurray> Did everyone have a good summit ?
14:03:03 <abhishek> yes
14:03:20 <PaulMurray> I put the new deadlines on the agenda
14:03:28 <PaulMurray> but so you can see them here:
14:03:45 <PaulMurray> Non-priority spec freeze: May 30-03 (R-18)
14:03:45 <PaulMurray> Non-priority feature freeze: Jun 27-01 (R-14)
14:03:45 <PaulMurray> Priority spec freeze: Aug 01-05 (R-9)
14:03:45 <PaulMurray> Priority feature freeze: Aug 29-02 (R-5)
14:03:45 <PaulMurray> No deadline for CI work
14:04:02 <mriedem> #link https://wiki.openstack.org/wiki/Nova/Newton_Release_Schedule
14:04:08 <paul-carlton1> May 30-03 means May 30th?
14:04:32 <abhishek> May 30 to June 3 probably
14:04:34 <PaulMurray> I think it means the week of May 30th (goes up to 3rd June)
14:04:36 <mriedem> june 2nd
14:04:48 <mriedem> thursday 6/2 is n-1 and non-priority spec approval freeze
14:05:25 <PaulMurray> mriedem, do all those dates refer to Thursday as the deadline ?
14:06:28 <PaulMurray> Anyway, you have the link there
14:06:47 <PaulMurray> I also did an ML post - with some corrections in the thread....
14:07:04 <PaulMurray> #link Summit round up: http://lists.openstack.org/pipermail/openstack-dev/2016-April/093538.html
14:07:26 <PaulMurray> Feel free to add corrections if you see any
14:07:31 <PaulMurray> mistakes
14:07:53 <mdbooth> It was a great roundup, thanks
14:08:20 <kashyap> Yeah, useful to those of us who were not present
14:08:40 <PaulMurray> The storage pools work was considered right to be made a priority
14:08:57 <PaulMurray> because it addresses security issues and is a mess now
14:09:09 <PaulMurray> so we will push that one as best we can
14:09:17 <PaulMurray> the others are non-priority
14:09:46 <PaulMurray> mdbooth, paul-carlton1 and I will need a little help
14:10:04 <PaulMurray> mdbooth, are you going to share a list of patches soon ?
14:10:40 <mdbooth> PaulMurray: They should be automatically attached to the bp?
14:11:00 <mdbooth> So, breaking news, there are now 2 of us working on this series
14:11:12 * diana_clarke waves hello! I'll be working with mdbooth to help with this.
14:11:12 <PaulMurray> mdbooth, who is the second ?
14:11:19 <mdbooth> ^^^
14:11:26 <PaulMurray> hello diana_clarke
14:11:33 <PaulMurray> what time zone are you ?
14:11:34 <diana_clarke> PaulMurray: hello!
14:11:45 <diana_clarke> PaulMurray: Toronto, Canada (EDT)
14:11:50 <andrearosa> diana_clarke: welcome
14:11:58 <diana_clarke> andrearosa: Thanks!
14:12:30 <PaulMurray> great, good to have you with us
14:12:57 <PaulMurray> mdbooth, is the spec approved - I didn't look before the meeting
14:13:10 <mdbooth> I don't think so, but I know it's being approved
14:13:14 <mdbooth> s/approved/reviewed/
14:13:38 <mdbooth> No, it's not approved
14:13:40 <PaulMurray> #link mdbooth's spec: https://review.openstack.org/#/c/302117/
14:13:48 <PaulMurray> right one ^^ ?
14:13:57 <mdbooth> Yes
14:14:34 <PaulMurray> ok - let's get this reviewed
14:14:44 <mdbooth> dansmith said he was looking at it
14:15:00 <dansmith> as we speak
14:15:01 <PaulMurray> good - dansmith did say he would help with core reviews
14:15:02 <mdbooth> So I'm expecting a -2 any time now ;)
14:15:08 <PaulMurray> thanks dansmith
14:15:55 <PaulMurray> The other spec we focused on in the session was post-copy
14:16:20 <PaulMurray> luis5tb, I think that's your spec right ?
14:16:26 <luis5tb> yes
14:16:40 <PaulMurray> #link Add post copy: https://review.openstack.org/#/c/301509/
14:16:46 <PaulMurray> Were you there ?
14:16:57 <mdbooth> PaulMurray: Did you see danpb's post about performance testing?
14:17:03 <luis5tb> no, unfortunately I couldn't attend the summit
14:17:26 <PaulMurray> mdbooth, no, I didn't - do you have a link ?
14:17:48 <mdbooth> http://lists.openstack.org/pipermail/openstack-dev/2016-May/093741.html
14:17:57 <luis5tb> how should we proceed with that one? should I modify the specs so that it is an admin option (at nova.conf) to enable/disable postcopy?
14:18:10 <luis5tb> and then automatic switch based on number of memory iterations?
14:18:26 <mdbooth> PaulMurray: I think it's worth parking this discussion until next week when hopefully he'll have posted the results
14:19:07 <luis5tb> but regardless of the results, we still need to have a decision on how to enable post-copy
14:19:07 <davidgiluk> the idea about only using it for small VMs is odd; using it for big VMs works very nicely
14:19:14 <PaulMurray> just for luis5tb's sake.... we need to have a discussion on your spec
14:19:18 <PaulMurray> or possibly on the ML
14:19:34 <PaulMurray> and danpb's work might inform it a little
14:19:41 <luis5tb> even if postcopy is better for performance (which I'm assuming it is), for reliability it may not be the best solution
14:19:49 <mdbooth> PaulMurray: We also need to pull in other drivers
14:20:06 <mdbooth> We need a general api, not a libvirt api
14:20:28 <PaulMurray> mdbooth, agreed, but we will take time to get any consensus on it
14:20:54 <PaulMurray> luis5tb, might be able to create an argument for having it as a fixed config option
14:20:54 <mdbooth> It's taking enough to just to decide whether it's a good idea
14:21:23 <luis5tb> can we, for now, just propose an admin configurable variable (no API modification) and automatic switch?
14:21:39 <luis5tb> maybe later we can decide if it is worth including it in the API or not
14:22:01 <mdbooth> We have a force completion api already, right?
14:22:24 <luis5tb> yes, but that will degrade the performance of the VM, or even make it "non-live"
14:22:46 <paul-carlton1> yes an idea we discussed earlier today was to have force complete switch to post copy if post copy is enabled in conf
14:22:50 <PaulMurray> mdbooth, right, does it make sense to do without an auto switch - i.e. use post copy if config opt set, otherwise use pause
14:22:54 <PaulMurray> on force complete
14:22:55 <mdbooth> We might have an admin knob to configure the behaviour of that
14:23:14 <mdbooth> paul-carlton1: +1
14:23:28 <andrearosa> or specify it in the body of the force complete API call, if I am not wrong we pass something in the body
14:23:46 <PaulMurray> andrearosa, that's the option they don't like
14:23:51 <mdbooth> andrearosa: The problem with that is you start exposing qemu details in the api
14:23:53 <PaulMurray> it becomes visible in the api
14:24:02 <mdbooth> Nobody likes that
14:24:12 <andrearosa> oh yes because it is not available for all drivers
14:24:52 <PaulMurray> so looks like mdbooth paul-carlton1 and me ( PaulMurray ) think doing it on force-complete is ok if set by config
14:24:59 <PaulMurray> any other suggestions ?
14:25:07 <luis5tb> I think the automatic switching is not the problem, d.Berrange likes that
14:25:40 <luis5tb> but the discussion is still how to include the post-copy flag in the VMs, as this needs to be included when triggering the migration
14:25:46 <luis5tb> so, a config option?
14:26:02 <paul-carlton1> auto switch over is a viable option but gives the user less control
14:26:03 <PaulMurray> yes, sounds good to me - at least for now
14:26:04 <luis5tb> I also like the idea of including post-copy in the force option
14:26:08 <mdbooth> fyi, hyperv and xenapi appear at first glance to support live migration
14:26:17 <mdbooth> And I know it's on vmware's immediate todo list
14:26:22 <mdbooth> It would be nice to at least ask them
14:26:35 <PaulMurray> mdbooth, yep, we can do that
14:27:04 <paul-carlton1> You could do both, i.e. allow force complete to invoke it but also allow the migration code to decide it is needed if the instance is not making progress
14:27:20 <luis5tb> my initial idea for the automatic switching was to be based on a variable regarding the number of memory iterations before the switching
14:27:23 <davidgiluk> paul-carlton1: Yes, that's a good combo
14:27:33 <mdbooth> paul-carlton1: The latter could be an api option, btw
14:27:40 <luis5tb> that could be working together with the force migration + post-copy too
14:27:45 <mdbooth> i.e. live-migration auto-force=True
14:28:32 <PaulMurray> mdbooth, do you mean that as a config opt or as a flag in the API ?
14:28:44 <mdbooth> PaulMurray: I was thinking api flag
14:28:51 <PaulMurray> not sure I like that
14:28:55 <luis5tb> I like paul-carlton1's idea, pretty similar to what I already have
14:29:22 <mdbooth> PaulMurray: I'm not sufficiently wedded to it to bike-shed it :)
14:29:23 <PaulMurray> ...but would need to think
14:29:44 <PaulMurray> luis5tb, would you like to do an updated spec
14:29:50 <PaulMurray> then we can discuss it there
14:29:58 <luis5tb> yes, I was waiting for summit decision to do so
14:30:03 <PaulMurray> and maybe promote its existence on the ML when it's up
14:30:11 <andrearosa> I am not 100% sure about the auto-switch if we are not making any progress, some users could prefer to abort the live migration and not go for the risky post-copy option
14:30:40 <luis5tb> but if no progress is being done, you could do the force-completion
14:30:44 <mdbooth> andrearosa: That's why I was thinking api flag rather than config opt
14:30:46 <davidgiluk> andrearosa: As a feature that's enablable
14:30:49 <PaulMurray> The plan is to discuss it on the spec / ML to converge on a plan
14:30:51 <paul-carlton1> That is where my spec comes in https://review.openstack.org/#/c/306561 Automatic Live Migration Completion
14:30:53 <mdbooth> indeed
14:31:22 <PaulMurray> So, luis5tb, we'll leave it with you to do the spec and tell us when it's up
14:31:26 <PaulMurray> and we can move on here
14:31:40 <luis5tb> ok
14:31:44 <PaulMurray> #topic Specs
14:31:51 <luis5tb> I'll update the spec and send the email
14:32:22 <PaulMurray> #link specs for review: https://review.openstack.org/#/q/project:openstack/nova-specs+status:open
14:32:42 <PaulMurray> #undo
14:32:42 <openstack> Removing item from minutes: <ircmeeting.items.Link object at 0xa92bb10>
14:32:49 <PaulMurray> that's not what I meant to post
14:33:15 <PaulMurray> I think this is what I was looking for
14:33:24 <PaulMurray> #link subteam specs for review: https://etherpad.openstack.org/p/newton-nova-priorities-tracking
14:33:47 <PaulMurray> The deadline for non-priority is very soon
14:34:57 <PaulMurray> Does anyone want to talk about one of them now ?
14:35:44 <PaulMurray> ok
14:36:07 <PaulMurray> We will review progress on these specs each week up to the freeze date
14:36:14 <paul-carlton1> https://review.openstack.org/#/c/307131 Live Migration of Rescued Instances
14:36:40 <paul-carlton1> reviews welcome, there is code ready to go too
14:36:40 <PaulMurray> paul-carlton1, ?
14:36:56 <paul-carlton1> You asked about spec reviews
14:37:00 <PaulMurray> yep,
14:37:21 <PaulMurray> that is an interesting one - should be a quick win if it's all correct
14:37:47 <PaulMurray> next topic
14:38:05 <PaulMurray> #topic Open Discussion
14:38:25 <PaulMurray> aha - https://review.openstack.org/#/c/215483/3
14:38:27 <abhishek> hi, this is related to bug https://bugs.launchpad.net/nova/+bug/1470420
14:38:29 <openstack> Launchpad bug 1470420 in OpenStack Compute (nova) "Set migration status to 'error' instead of 'failed' during live-migration" [Low,In progress] - Assigned to Rajesh Tailor (rajesh-tailor)
14:38:57 <abhishek> during the summit we had a discussion with PaulMurray and alaski about this.
14:39:17 <abhishek> please take a look at it, IMO, as of now we can have this solution and once a permanent fix is implemented we can remove the periodic task _cleanup_incomplete_migrations
14:40:01 <mdbooth> abhishek: Is this just the freeform status field?
14:40:13 <abhishek> currently the patch is in merge conflict, I can rebase it in a few minutes if it is required
14:40:21 <PaulMurray> what did alaski say when you spoke to him?
14:40:53 <abhishek> alaski said he will have a look at it again, we have added a comment on the patch for him
14:41:06 <mdbooth> This should follow whatever we do at project level, tbh. It looks like user-visible api to me, which means changing it could be regarded as a regression.
14:41:21 <mdbooth> I mean, it's obviously a wart, but it might be a wart we have to live with
14:41:54 <mdbooth> However, if we've decided as a project that we fix this sort of thing...
14:41:59 <abhishek> mdbooth, but with this we will be able to clean up the files from the source or destination node
14:42:06 <PaulMurray> mdbooth, I think we need to look at the migrations reporting in general a bit more carefully
14:42:16 <mdbooth> We can punch operators in the face /so many times/ before they complain
14:42:37 <PaulMurray> at the moment the migrations record is used in several different ways
14:42:58 <PaulMurray> some types even end up finished when others end up completed
14:43:12 <mdbooth> Eww
14:43:12 <abhishek> PaulMurray: right
14:43:41 <PaulMurray> abhishek, if it's about a quick fix, are there any other non-user-facing parameters that can be used to flag that a clean up is needed ?
14:43:55 <abhishek> currently the patch is in merge conflict, I can rebase it
14:44:14 <mdbooth> If we're going to do this, can we please create an enum and enforce it somewhere?
14:44:21 <abhishek> PaulMurray, IMO no
14:44:24 <mdbooth> Perhaps at the object layer?
14:45:11 <abhishek> mdbooth, this can be done
14:45:16 <PaulMurray> abhishek, I think mdbooth's point is where this got stuck before - it's user facing and some people have tooling that looks for the values
14:47:15 <abhishek> PaulMurray, I think it's still good to have a way to clean up files rather than keeping it as it is
14:47:35 <PaulMurray> abhishek, I think we may need to find another way though, we keep going around on this one
14:48:26 <abhishek> PaulMurray, I will try to get alaski's view on this
14:48:42 <PaulMurray> abhishek, ok
14:49:09 <PaulMurray> abhishek, thanks for keeping on with it, but be open-minded about how it will end up
14:49:25 <abhishek> PaulMurray, mdbooth: sure, thank you for your time
14:49:39 <PaulMurray> anything else for the last few minutes ?
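mdbooth's suggestion to define an enum for migration status values and enforce it at the object layer can be sketched as below. This is a minimal illustration under stated assumptions, not Nova's migration object: `MigrationStatus`, `Migration` and `needs_cleanup` are hypothetical names, and the enum lists only the status values mentioned in this meeting ('completed', 'finished', 'failed', 'error'). It also assumes, as the patch for bug 1470420 proposes, that a failed live migration is recorded as 'error' so that a periodic task such as `_cleanup_incomplete_migrations` can find it.

```python
from dataclasses import dataclass
from enum import Enum


class MigrationStatus(str, Enum):
    """Hypothetical closed set of migration status values.

    Only the values mentioned in the meeting are listed; a real set would
    have to cover every state the service actually writes.
    """
    COMPLETED = "completed"
    FINISHED = "finished"
    FAILED = "failed"
    ERROR = "error"


@dataclass
class Migration:
    """Hypothetical stand-in for an object-layer migration record."""
    migration_id: int
    status: str

    def __post_init__(self) -> None:
        # Enforce the enum at construction time: an unknown string raises
        # ValueError instead of being stored and shown to users.
        self.status = MigrationStatus(self.status).value


def needs_cleanup(migrations):
    """Return migrations whose leftover files should be cleaned up.

    Assumes (hypothetically) that failed live migrations are marked 'error'.
    """
    return [m for m in migrations if m.status == MigrationStatus.ERROR.value]
```

Validating on construction turns any change to the set of values into an explicit, reviewable edit, which matters here because the status is user facing and operators' tooling matches on the strings.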
14:49:52 <PaulMurray> anything anyone would like to see in these meetings ?
14:50:09 <PaulMurray> (in future I mean)
14:51:14 <PaulMurray> I will try to organise the subteam page: https://etherpad.openstack.org/p/newton-nova-live-migration
14:51:24 <PaulMurray> in the meantime - thanks for coming
14:51:29 <diana_clarke> Nice to meet you, folks!
14:51:42 <PaulMurray> #endmeeting
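The post-copy options discussed under "Review summit outcome" can be sketched as a small decision function. It covers the interim agreement (on force complete, switch to post-copy if the operator has enabled it in config, otherwise pause) together with paul-carlton1's and luis5tb's idea of switching automatically after a number of memory iterations. This is a hypothetical illustration, not Nova's implementation or configuration: `MigrationConfig`, `post_copy_enabled`, `max_precopy_iterations` and `choose_action` are invented names, and the real behaviour was left to luis5tb's updated spec.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CompletionAction(Enum):
    CONTINUE = "continue"    # keep copying memory (pre-copy)
    POST_COPY = "post-copy"  # switch the running migration to post-copy
    PAUSE = "pause"          # pause the guest so pre-copy can converge


@dataclass
class MigrationConfig:
    """Hypothetical operator settings, e.g. read from nova.conf."""
    post_copy_enabled: bool = False
    # None disables automatic switching; otherwise switch to post-copy once
    # this many memory-copy iterations have run.
    max_precopy_iterations: Optional[int] = None


def choose_action(cfg: MigrationConfig, iterations: int,
                  force_complete_requested: bool) -> CompletionAction:
    if force_complete_requested:
        # Force complete: post-copy if the operator allows it, else pause.
        if cfg.post_copy_enabled:
            return CompletionAction.POST_COPY
        return CompletionAction.PAUSE
    # Automatic switch only when post-copy is enabled AND a limit is set.
    if (cfg.post_copy_enabled
            and cfg.max_precopy_iterations is not None
            and iterations >= cfg.max_precopy_iterations):
        return CompletionAction.POST_COPY
    return CompletionAction.CONTINUE
```

Automatic switching stays off unless both settings are present, which reflects andrearosa's concern that some users would rather abort a stalled migration than take the riskier post-copy path.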