14:00:47 <PaulMurray> #startmeeting Nova Live Migration
14:00:48 <openstack> Meeting started Tue Nov 17 14:00:47 2015 UTC and is due to finish in 60 minutes.  The chair is PaulMurray. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:49 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:53 <openstack> The meeting name has been set to 'nova_live_migration'
14:01:01 * bauzas lurks
14:01:11 * johnthetubaguy lurks with intent
14:01:17 <PaulMurray> Hi, Anyone here for live migration ?
14:01:21 <pkoniszewski> o/
14:01:23 <paul-carlton> yep
14:01:23 <rdopiera> o/
14:01:24 <mdbooth> o/
14:01:24 <alex_xu> o/
14:01:34 * kashyap waves
14:01:44 <shaohe_feng1> hi PaulMurray
14:01:46 <eliqiao> hi PaulMurray
14:01:46 <kashyap> *
14:01:47 <andrearosa> hi
14:01:50 <eliqiao> o/
14:02:01 <eliqiao> hi pkoniszewski
14:02:18 <shaohe_feng1> hi pkoniszewski
14:02:25 <shaohe_feng1> o/
14:02:28 <PaulMurray> Hi Everyone, just wait one minute in case someone is late
14:02:34 <jlanoux> O/
14:03:17 <johnthetubaguy> looks like a good turn out, great to see :)
14:03:30 <eliqiao> I thought the time was 1300 UTC as polled. Okay anyway.
14:03:30 <PaulMurray> ok - that's long enough - thanks for coming everyone
14:04:08 <PaulMurray> eliqiao, interesting, you are second to say that - the poll was definitely 1400UTC, but we're here now anyway
14:04:31 <PaulMurray> I assume you have all seen the meeting page here: https://wiki.openstack.org/wiki/Meetings/NovaLiveMigration
14:04:37 <PaulMurray> it has an agenda on it
14:04:41 <johnthetubaguy> daylight savings changed between now and the poll, thats always fun
14:04:59 <kashyap> And, the 6 etherpads it has URLs to :P
14:05:00 <PaulMurray> I will try to make sure the agenda is posted far enough in advance that everyone gets to see it beforehand
14:05:26 <eliqiao> johnthetubaguy: get it :)
14:05:49 <PaulMurray> I'll also try to go through things promptly in the meeting so we can go early if there is nothing in particular to discuss
14:05:55 <shaohe_feng1> johnthetubaguy: got it. I was also puzzled by the time.
14:06:21 <PaulMurray> Did the poll adjust the times for local time zones?
14:06:42 <pkoniszewski> it did, i had no problems
14:06:56 <PaulMurray> Ah, I will remember that in future - sorry everyone
14:07:06 <pkoniszewski> but i was also affected by daylight saving thing
14:07:37 <PaulMurray> #topic Specs status
14:07:50 <johnthetubaguy> I like this part of the year when I live in UTC, life is so much simpler!
14:07:54 <shaohe_feng1> PaulMurray: there is no daylight in our country.
14:08:15 <PaulMurray> shaohe_feng1, not at all, how do you see?
14:08:22 <kashyap> PaulMurray: :-)
14:08:25 <pkoniszewski> :D
14:08:33 <PaulMurray> #link https://etherpad.openstack.org/p/mitaka-nova-spec-review-tracking
14:08:43 <PaulMurray> This is the priority spec review page
14:09:11 <PaulMurray> We have a few specs under live migrate
14:09:19 <PaulMurray> just scroll a little from the top
14:09:27 <shaohe_feng1> PaulMurray: daylight saving. :)
14:09:32 <PaulMurray> Two are already merged
14:10:06 <PaulMurray> I added the last one for alex_xu earlier today
14:10:16 <alex_xu> PaulMurray: thanks
14:10:32 <alex_xu> sorry join the party late
14:10:51 <PaulMurray> I want to discuss the pause VM during live migration for a moment, any others people want to mention?
14:11:25 <PaulMurray> #link https://review.openstack.org/#/c/229040/
14:11:48 <mdbooth> PaulMurray: libvirt volumes may have hit an issue requiring a minor rethink
14:11:55 <PaulMurray> This is the pause VM one - it is very close to being done, but has a couple of points to clear
14:12:03 <PaulMurray> mdbooth, ok - come back to that in a mo
14:12:07 <mdbooth> +1
14:12:15 <eliqiao> I think it will be hard to control the status when the live-migration is running
14:12:28 <bzhou> "As an operator of an OpenStack cloud, I would like the ability to pause VM during live migration." Why "during live migration"?
14:12:49 <PaulMurray> bzhou, its really a way to push a migration through
14:13:00 <pkoniszewski> bzhou to get rid of dirty pages
14:13:11 <johnthetubaguy> so, for me, the key thing here is live-migrate can literally take forever, let's give operators a way to force the end of that process
14:13:14 <johnthetubaguy> yeah, that
14:13:20 <eliqiao> does pause == cancel?
14:13:32 <PaulMurray> eliqiao, no
14:13:38 <PaulMurray> we will do both separately
14:13:44 <PaulMurray> as two features
14:13:48 <pkoniszewski> no, cancel is different operation which will literally cancel this process
14:14:17 <PaulMurray> So the question on the spec is about naming
14:14:19 <mdbooth> Can we support post-copy yet?
14:14:24 <PaulMurray> but also about future intent
14:14:31 <pkoniszewski> not yet, post-copy should be there around N-release of openstack
14:14:33 <bzhou> do we need to resume it after LM is done?
14:14:36 <mdbooth> Another option for an operator might be to cancel and post-copy
14:14:45 <pkoniszewski> bzhou: we don't, libvirt does the job
14:14:52 <eliqiao> [danpb] NB, when we make use of post-copy migration, cancel will be impossible once post-copy starts.
14:15:05 <eliqiao> I see this from LM etherpad
14:15:12 <paul-carlton> I thought post copy was pending changes in qemu that are still being tested?
14:15:20 <pkoniszewski> paul-carlton: exactly
14:15:29 <shaohe_feng1> PaulMurray: converge can also help to reduce the rate of dirty pages
14:15:36 <pkoniszewski> a change to libvirt should be simple, tricky part will be OpenStack, e.g., networking
14:16:01 <johnthetubaguy> we should get back to Mitaka stuff I think
14:16:04 <pkoniszewski> y
14:16:05 <PaulMurray> The question I would like to see an answer to is do we intend this only be the pause version
14:16:11 <bzhou> are we able to resume even if LM is NOT done?
14:16:16 <andrearosa> post copy is not an option now, do not discuss it. Converge can help; the idea of pausing the LM is a kind of last resort
14:16:34 <PaulMurray> or do we intend that there could be other options that will come under the same action
14:16:45 <johnthetubaguy> so I think we can add options later
14:16:50 <shaohe_feng1> post copy is ready for qemu?
14:16:57 <PaulMurray> If the second we need a different name - that is all I think
14:17:02 <pkoniszewski> PaulMurray: I don't like this idea to have different actions under the same action
14:17:09 <pkoniszewski> it brings confusions
14:17:10 <eliqiao> shaohe_feng1: no
14:17:30 <kashyap> shaohe_feng1: It is merged upstream QEMU
14:17:35 <kashyap> And the relevant Kernel part.
14:17:38 <johnthetubaguy> thing is, we don't need to decide that now, except if we add pause in the API name, I guess
14:18:24 <PaulMurray> johnthetubaguy, agreed - so perhaps change name - makes it distinctly different to normal pause operation
14:18:32 <johnthetubaguy> just for clarity, we generally don't want features in Nova to depend on unreleased QEMU or libvirt
14:18:53 <eliqiao> johnthetubaguy: +1
14:19:07 <PaulMurray> Lets move the discussion of the naming to the spec and move on
14:19:14 <johnthetubaguy> so yes, the current list of names
14:19:15 <andrearosa> I like johnthetubaguy's proposal to call it "live-migrate-force-end". That name doesn't have any implementation details and gives us the flexibility to change the actual actions in future
14:19:33 <PaulMurray> mdbooth, you had something to discuss
14:19:39 <kashyap> johnthetubaguy: Post copy dev in QEMU is on this channel - davidgiluk.  Maybe he can comment which release it'll be in QEMU.
14:20:02 <mdbooth> PaulMurray: Yes, although I posted review comments.
14:20:03 <quintela____> 2.5
14:20:06 <eliqiao> but force-end is kind like cancel
14:20:07 <quintela____> code is already upstream
14:20:12 <mdbooth> https://review.openstack.org/#/c/232053/
14:20:28 <pkoniszewski> andrearosa: if we will change entire action in the future, e.g. to something that will dynamically throttle VM, +1 from me
14:20:31 <PaulMurray> mdbooth, does it need discussion here or just highlight the review?
14:20:39 <pkoniszewski> andrearosa: if we gonna hide two distinct operations under one action - huge -1
14:21:00 <mdbooth> The concern is that libvirt's storage copy method will probably be a severe performance regression from rsync
14:21:06 <mdbooth> in almost all cases
14:21:22 <johnthetubaguy> mdbooth: seems like we might have to assume rsync lives until that bug is fixed then?
14:21:23 <davidgiluk> kashyap: The libvirt work is still in progress; there is also an experimental openstack set that the guys at UMU have done
14:21:24 <mdbooth> So while I think we should go ahead with it, rsync needs to remain a first class citizen for the moment
14:21:43 <johnthetubaguy> mdbooth: that sounds like a good approach
14:21:46 <mdbooth> Yes. It can live differently, using libvirt storage pools.
14:22:06 <mdbooth> But I don't think we can deprecate it, because some people do use it, and they're unlikely to be happy.
14:22:07 <paul-carlton> mdbooth, rsync is not an option for security reasons
14:22:22 <PaulMurray> paul-carlton, that's not true for everyone
14:22:28 <johnthetubaguy> paul-carlton: well you can choose performance vs security right?
14:22:41 <paul-carlton> we really need the libvirt perf improvement danpb proposed before this is viable
14:23:10 <mdbooth> paul-carlton: I don't think we need to go that far. The storage pools stuff looks great. We'll just have to retain some cruft for a bit.
14:23:20 <andrearosa> paul-carlton: it is an option now. I think that the recap made by danpb in his comment is a reasonable approach
14:23:27 <johnthetubaguy> mdbooth: +1
14:23:29 <paul-carlton> true, we should do it but a lot of people (including HPE) will not use it if the choice is rsync or slow
14:23:56 <johnthetubaguy> paul-carlton: agreed, we just don't have an alternative, and it seems like slow might get fixed soonish
14:24:39 <PaulMurray> maybe we can find people to fix the slow option - we could look into that
14:24:50 <johnthetubaguy> yeah, I think thats the correct approach here
14:24:51 <PaulMurray> is anyone working on it now?
14:24:55 <mdbooth> I think it's vaguely in motion.
14:24:55 <paul-carlton> if so then fine, I'm not arguing against doing the work, I'm saying that there will be a class of customers who won't use it till the libvirt fix is done
14:25:09 <mdbooth> danpb sounded interested. I've been looking at the code this morning.
14:25:40 <PaulMurray> #info storage pools version of migration will be slow - need to keep rsync version and look to improve performance
14:25:57 <paul-carlton> I'd prefer to implement based on the assumption that libvirt will fix performance issues in due course
14:26:04 <johnthetubaguy> we should get that detail in the spec, but I think the existing comments make that clear
14:26:35 <PaulMurray> paul-carlton, mdbooth I think this all sounds ok
14:26:56 <PaulMurray> do the change - keep rsync option - organise getting performance improvement
14:27:02 <PaulMurray> that can be done
14:27:07 <mdbooth> Thrash out the detail in the spec?
14:27:14 <PaulMurray> unless anyone thinks I'm being naive
14:27:24 <mdbooth> Sounds good to me
14:27:39 <PaulMurray> lets do that then
14:27:57 <PaulMurray> Any other specs - they are the main thing at the moment
14:28:08 <johnthetubaguy> I am curious about the query one
14:28:09 <PaulMurray> Any other with urgent issue to go over that is
14:28:10 <johnthetubaguy> and cancel
14:28:30 <PaulMurray> johnthetubaguy, goahead
14:29:06 <johnthetubaguy> do we agree how that API should look, I assume similar to that pause/force-end one?
14:29:15 <johnthetubaguy> the action of cancel I mean
14:29:39 <PaulMurray> paul-carlton, ? ~^^
14:29:45 <johnthetubaguy> looks like the latest version is going that way, which is cool for me
14:29:59 <paul-carlton> yep
14:30:24 <johnthetubaguy> so the current version doesn't have much query in it
14:30:33 <johnthetubaguy> by which I mean, the progress reporting
14:31:01 <paul-carlton> I took it out and changed the name
14:31:01 <pkoniszewski> i thought that they decided that progress is already reported somehow
14:31:19 <paul-carlton> yes, because progress is reported on instance details
14:31:29 <johnthetubaguy> ah, with the percentages
14:31:40 <mdbooth> percentage of what?
14:31:43 <mdbooth> Disk transfer?
14:31:44 <PaulMurray> paul-carlton, it still says query in the title
14:31:46 <mdbooth> Convergence?
14:32:00 <johnthetubaguy> mdbooth: good question
14:32:04 <mdbooth> How can you tell the difference between transfer of a big disk, and a vm which is slow to converge?
14:32:27 <shaohe_feng1> percentage of ram, disk?
14:32:42 <johnthetubaguy> mdbooth: yeah, I think thats what I would love to see, disk transfer vs convergence step
14:33:05 <paul-carlton> I'll fix the title
14:33:38 <PaulMurray> Maybe that should be split out as a separate issue as there is something already there for query
14:33:48 <PaulMurray> just fix the title as paul-carlton says
14:33:50 <johnthetubaguy> PaulMurray: +1 for a separate spec on this
14:33:51 <paul-carlton> I think if it is slow to converge you should see percentage complete going up and down
14:34:25 <paul-carlton> There is only one spec, abort, I'll purge it of any mention of query, missed the title
14:34:44 <PaulMurray> paul-carlton, good
14:34:53 <johnthetubaguy> It would be nice to be explicit about this at some point, but yeah, let's track that separately
14:35:03 <johnthetubaguy> does anyone want to talk that spec?
14:35:14 <johnthetubaguy> oops
14:35:15 <johnthetubaguy> take
14:35:22 <PaulMurray> I'd rather move to the next topic
14:35:30 <PaulMurray> But first a note to everyone
14:36:01 <PaulMurray> please do review the specs on that list. We can target getting things in for the spec cutoff
14:36:12 <PaulMurray> even though it is not absolute for priorities
14:36:23 <PaulMurray> I really want to make progress as fast as possible
14:36:28 <shaohe_feng1> johnthetubaguy: do we need smart tuning (such as migration, pause) when convergence is slow and percentage complete is going up and down?
14:36:42 <johnthetubaguy> #help would be good for someone to take on writing up better live-migrate progress
14:36:55 <PaulMurray> #topic CI status
14:37:02 <johnthetubaguy> shaohe_feng1: I think the pause and cancel spec have a lot of that covered already
14:37:10 <PaulMurray> I don't see tdurkov
14:37:23 <PaulMurray> Does anyone know the status of CI?
14:37:40 <PaulMurray> I know there is a job he has a review for
14:37:44 <shaohe_feng1> such as migration, pause/such as compress, pause
14:38:16 <PaulMurray> I also don't have the review links to hand
14:38:20 <davidgiluk> shaohe_feng1: QEMU 2.5 has an improved autoconverge
14:38:22 <kashyap> PaulMurray: For the CI to work, first multi-node CI ought to be working successfully, no?
14:38:40 <PaulMurray> kashyap, the plan is to create a new CI job for live migration
14:38:44 <PaulMurray> then add coverage to it
14:38:54 <johnthetubaguy> yeah, the discussion was create a new one
14:38:56 <PaulMurray> The idea is to separate it from other instabilities
14:39:09 <kashyap> Sure.
14:39:47 <paul-carlton> johnthetubaguy, I'll look at the way progress is calculated and propose a solution that meets everyone's needs if you like
14:39:53 <PaulMurray> I know Timofei Durakov is part way there
14:40:07 <shaohe_feng1> davidgiluk: so nova does not need a strategy tune, right?
14:40:25 <PaulMurray> So I guess there is nothing more to say on this one
14:40:32 <johnthetubaguy> paul-carlton: sounds good
14:40:39 <pkoniszewski> one more thing
14:40:48 <kashyap> shaohe_feng1: I think topics are being mixed up, probably good for open floor discussion.
14:40:50 <PaulMurray> pkoniszewski, on CI?
14:41:06 <pkoniszewski> oh, thought its open discussion already, sorry! :)
14:41:16 <shaohe_feng1> kashyap:  sorry.  go ahead CI.
14:41:23 <PaulMurray> no - going on with agenda
14:41:25 <davidgiluk> shaohe_feng1: The qemu stuff is experimental and still needs the libvirt stuff wiring up, but it should do appropriate CPU limiting - it would be good to check that out before trying to write a new one
14:41:38 <PaulMurray> #topic Bugs
14:41:53 <PaulMurray> shaohe_feng1, this is your spot I think?
14:42:12 <PaulMurray> You have been looking at bugs in Intel and Rackspace right?
14:42:29 <PaulMurray> https://docs.google.com/spreadsheets/d/19MFatOpjePS4JtkVHXCh6Qa8XUf6T2t0Igy1PucZ3Zk/edit#gid=2127877307
14:42:44 <shaohe_feng1> PaulMurray:  maybe we can discuss the pkoniszewski bug fix.
14:43:04 <pkoniszewski> this list is a bit outdated
14:43:16 <PaulMurray> Do you have a link?
14:43:24 <shaohe_feng1> PaulMurray: https://review.openstack.org/#/c/168916/
14:43:34 <PaulMurray> #link https://review.openstack.org/#/c/168916/
14:44:34 <shaohe_feng1> PaulMurray: https://review.openstack.org/#/c/235994/
14:44:35 <PaulMurray> I haven't looked at this yet, can you tell us something?
14:45:05 <PaulMurray> #link https://review.openstack.org/#/c/235994/
14:46:02 <shaohe_feng1> PaulMurray: yes. we need to set the correct VM status when the live-migration RPC call times out
14:46:32 <shaohe_feng1> pkoniszewski is working on it.
14:46:57 <PaulMurray> Do you need anything, or just reviews?
14:47:00 <bauzas> less than 15 mins left
14:47:06 <PaulMurray> bauzas, thanks
14:47:32 <shaohe_feng1> PaulMurray:   we need a little  discussion.
14:48:00 <PaulMurray> Can I suggest it goes on the ML this time
14:48:14 <shaohe_feng1> PaulMurray: yes.
14:48:18 <pkoniszewski> ML should be way better for this problem
14:48:22 <pkoniszewski> its a bit weird
14:48:26 <PaulMurray> My understanding is that there are a few people going through the live migration bugs
14:48:36 <PaulMurray> Is that still true
14:48:38 <PaulMurray> ?
14:49:23 <andrearosa> no replies so probably you are right
14:49:31 <PaulMurray> Maybe it's best if I contact you about this separately
14:49:35 <PaulMurray> moving on
14:49:40 <PaulMurray> #topic Open
14:49:45 <mdbooth> Did I miss the bug list?
14:50:03 <mdbooth> Ah, the google spreadsheet
14:50:16 <PaulMurray> mdbooth, there are links on the meeting page and on the google sheet
14:50:34 <PaulMurray> mikal has put reviews on the priority review page:
14:50:50 <PaulMurray> #link https://etherpad.openstack.org/p/mitaka-nova-priorities-tracking
14:51:05 <PaulMurray> Please return to this regularly to get subteam reviews done
14:51:23 <PaulMurray> then when reviews are ready we can bring them to the attention of cores
14:51:24 <shaohe_feng1> PaulMurray: do you mean this google link? https://docs.google.com/spreadsheets/d/19MFatOpjePS4JtkVHXCh6Qa8XUf6T2t0Igy1PucZ3Zk/edit#gid=2127877307
14:52:15 <PaulMurray> If you have code ready to review put it on the page in the subteam section
14:53:25 <johnthetubaguy> +1 for that etherpad
14:53:31 <johnthetubaguy> would be great to get the bug fixes listed in there
14:53:36 <johnthetubaguy> if the code is up for review
14:53:37 <shaohe_feng1> +1.
14:54:00 <PaulMurray> johnthetubaguy, do you want the individual reviews in there - there?
14:54:15 <mdbooth> paul-carlton: Do you have code already for anything relating to libvirt storage pools?
14:55:17 <PaulMurray> Anything that needs to be brought up in this meeting now?
14:55:19 <paul-carlton> mdbooth, there is lots of code from Solly Ross's attempt at this but I've not done anything yet
14:55:26 <PaulMurray> (as opposed to discussed after)
14:55:38 <mdbooth> paul-carlton: Ok, lets chat offline as I'm interested and don't want to tread on your toes.
14:55:56 <paul-carlton> Welcome the help, we should talk
14:55:58 <PaulMurray> mdbooth, thanks for stepping up for work to do
14:56:15 <PaulMurray> Thank you everyone for coming - please help me be as organised
14:56:29 <PaulMurray> as I can be - I will settle into chairing
14:56:43 <kashyap> PaulMurray: Good work, keeping things on-topic :-)
14:56:45 <PaulMurray> Same time next week 1400UTC
14:56:57 <PaulMurray> kashyap, doing my best :)
14:57:03 <PaulMurray> #endmeeting