14:00:21 <PaulMurray> #startmeeting Nova Live Migration
14:00:22 <openstack> Meeting started Tue Apr 12 14:00:21 2016 UTC and is due to finish in 60 minutes. The chair is PaulMurray. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:23 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:25 <openstack> The meeting name has been set to 'nova_live_migration'
14:00:33 <andrearosa> hi
14:00:33 <PaulMurray> Hi, anyone here for live migration ?
14:00:36 * eliqiao lurks
14:00:36 <davidgiluk> o/
14:00:37 <pkoniszewski> o/
14:00:40 <tdurakov> hi
14:00:41 <eliqiao> o/
14:01:00 <PaulMurray> Agenda: https://wiki.openstack.org/wiki/Meetings/NovaLiveMigration
14:01:04 <luis5tb_> hi
14:01:04 <mdbooth> o/
14:01:21 <PaulMurray> wait just one minute
14:01:27 <scheuran> hi
14:01:29 <PaulMurray> for stragglers (if any)
14:02:22 <paul-carlton2> o/
14:02:33 <tdurakov> PaulMurray: while waiting can i ask to review this series: https://review.openstack.org/#/c/304621/
14:02:49 <PaulMurray> you just did :)
14:02:59 <PaulMurray> #topic CI
14:03:00 <tdurakov> :)
14:03:04 <tdurakov> okay
14:03:17 <PaulMurray> tdurakov, what is the state of play for CI ?
14:03:45 <tdurakov> here is update: now I'm working on coverage for negative cases for live-migration. To check rollback things.
14:04:28 <tdurakov> about job stability, any news for devstack plugin?
14:04:53 <PaulMurray> tdurakov, who was looking at devstack plugin ?
14:05:39 <tdurakov> last time I've asked there was activity led by markus_z
14:06:33 <tdurakov> so, if it's done - we could check whether the problem with jobs is fixed by newer versions of libvirt/qemu or not
14:07:23 <PaulMurray> tdurakov, and I just pinged markus_z so lets see if he turns up
14:07:36 <tdurakov> yep
14:08:14 <PaulMurray> he says he will be here in 5-10 mins
14:08:48 <PaulMurray> anything else while we are waiting ?
14:08:57 <tdurakov> maybe let's move on and get back to this topic when markus will be ready?
14:09:01 <tdurakov> :)
14:09:06 <PaulMurray> ok
14:09:17 <PaulMurray> #topic Summit Sessions
14:09:34 <PaulMurray> I want to get organised for the summit sessions
14:09:50 <PaulMurray> to start with I want to make sure we have enough time
14:09:59 <PaulMurray> we have one session dedicated to us so far
14:10:13 <PaulMurray> see: https://etherpad.openstack.org/p/newton-nova-summit-ideas
14:10:33 <PaulMurray> Wed 11:50-12:30: live migration (gaps, re-approved specs, testing)
14:10:54 <PaulMurray> So we need to make sure there are no significant clashes
14:11:02 <PaulMurray> for people with talks etc.
14:11:21 <PaulMurray> Let me know if you do (or put it on the etherpad)
14:11:37 <PaulMurray> #link summit session planning https://etherpad.openstack.org/p/newton-nova-summit-ideas
14:11:55 <PaulMurray> mdbooth, do you need time to go through storage pools
14:12:13 <PaulMurray> I know paul-carlton2 will not be there
14:12:16 <mdbooth> I'm wondering.
14:12:33 <mdbooth> My agenda is more around inculcating the importance of cleanup in this code.
14:13:13 <mdbooth> But I've already put that in a spec: https://review.openstack.org/#/c/302117/
14:13:45 <mdbooth> If people could get eyes on that spec, if it's not controversial I don't think we need a separate session
14:14:23 * markus_z joins late
14:14:25 <PaulMurray> mdbooth, do you think it is realistic to get storage pools and changing migrate to use it in this cycle
14:14:36 <mdbooth> Yes
14:14:41 <mdbooth> Umm..
14:14:46 <mdbooth> Storage pools, yes
14:14:59 <mdbooth> The goal of eliminating ssh... maybe not
14:16:13 <PaulMurray> I would like a plan to get both done if possible - it's a relatively high want for security
14:16:48 <PaulMurray> Hi, markus_z, will come to you in a moment
14:17:25 <paul-carlton2> We can focus on this but there may be some changes for storage pools that require a change to externals that need a release notice, so early in O might be more realistic
14:18:27 <PaulMurray> ok - we can talk about it outside this meeting - and maybe in the summit
14:19:04 <mdbooth> PaulMurray: Yes, that might want a session.
14:19:26 <PaulMurray> There is also discussion going on around post-copy
14:19:31 <PaulMurray> How is that going ?
14:20:03 <PaulMurray> Is luis5tb_ here ?
14:20:07 <davidgiluk> I know danpb has some comments on luis5tb's spec
14:20:08 <luis5tb_> I did not receive that much feedback beside the one we already had last week
14:20:36 <pkoniszewski> i can say that i agree with Daniel, we shouldn't expose it through API
14:20:55 <davidgiluk> the question then is how to structure it instead
14:20:59 <pkoniszewski> we need something that paul-carlton2 mentioned in ml (some kind of SLA) or task api finally
14:21:01 <luis5tb_> I think that we can include a configuration in nova.conf
14:21:19 <luis5tb_> so that post-copy migration will be used or not based on that, per host level, instead of VM level
14:21:42 <luis5tb_> maybe somehow similar to the way of enabling tunneling for migrations
14:22:00 <luis5tb_> and then, I thought of a simple way of switching from pre-copy to post-copy
14:22:07 <paul-carlton2> I'm working on a spec to define instances as cattle, pets or pandas and migrations as low, medium or high importance then use this to work out if/when to use features like post copy
14:22:09 <luis5tb_> based on the number of Iterations
14:22:11 <tdurakov> imo it's bad idea to put it into nova.conf
14:22:13 <tdurakov> again
14:22:31 <davidgiluk> host based is probably wrong because it does depend on the feature of the VMs
14:23:01 <tdurakov> what point not to expose it over api?
14:23:05 <pkoniszewski> so we have configuration vs api
14:23:13 <pkoniszewski> to not expose hypervisor specific details through API
14:23:33 <pkoniszewski> probably we should discuss it during summit
14:23:35 <luis5tb_> yep, I completely agree with Daniel regarding the automatic switching
14:23:36 * andrearosa thinks that post-copy-lm needs a discussion at the summit as well
14:23:43 <tdurakov> well, we could "downgrade" to pre-copy if it's not supported
14:23:54 <tdurakov> i'd go with api approach
14:23:54 <luis5tb_> though I still think it is nice to have a flag for --post-copy to only use it when you like
14:24:11 <pkoniszewski> we never know when it is supported on the api level, that's the point
14:24:30 <tdurakov> pkoniszewski: we could check it:p
14:24:43 <luis5tb_> we can check if libvirt has the VIR_MIGRATE_POSTCOPY flag
14:24:48 <andrearosa> tdurakov: then nova can check it instead of the user
14:25:18 <tdurakov> andrearosa: what i think we should make l-m fully async
14:25:30 <pkoniszewski> can we agree that it should be discussed during summit? we won't agree on a solution here..
14:25:30 <tdurakov> and force user to check result with instance actions
14:25:37 <andrearosa> pkoniszewski: +1
14:25:45 <paul-carlton2> luis5tb_, my approach is to have one flag on live-migration, optional importance defaulting to medium and let it work out what features to use to force migration to complete based on that and the instance type
14:26:02 <pkoniszewski> we will have all cores there and it will be their decision, we can just show pros and cons of both, they will summarize
14:26:45 <luis5tb_> so, then I should leave the spec as it is until a decision is made during the summit?
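[Editor's sketch, not part of the meeting log] The exchange above floats three ideas: checking whether libvirt exposes the VIR_MIGRATE_POSTCOPY flag, "downgrading" to pre-copy when it does not, and switching to post-copy after some number of pre-copy iterations. The Python below is only a minimal illustration of how those pieces could fit together. It is not Nova code. The function names, the iteration threshold, and the stand-in binding objects are all hypothetical; only the constant name VIR_MIGRATE_POSTCOPY comes from the discussion.

```python
from types import SimpleNamespace


def supports_postcopy(libvirt_module):
    """Return True if the given libvirt binding exposes the post-copy flag."""
    return hasattr(libvirt_module, "VIR_MIGRATE_POSTCOPY")


def choose_mode(libvirt_module, postcopy_requested):
    """Use post-copy only when requested and available, else fall back."""
    if postcopy_requested and supports_postcopy(libvirt_module):
        return "post-copy"
    return "pre-copy"


def should_switch(iteration, max_precopy_iterations):
    """Hypothetical trigger: switch once pre-copy has iterated enough."""
    return iteration >= max_precopy_iterations


# Stand-ins for an older and a newer libvirt Python binding.
# The value 1 is a placeholder, not the real constant.
old_binding = SimpleNamespace()
new_binding = SimpleNamespace(VIR_MIGRATE_POSTCOPY=1)

print(choose_mode(old_binding, postcopy_requested=True))
print(choose_mode(new_binding, postcopy_requested=True))
print(should_switch(iteration=3, max_precopy_iterations=5))
```

In a real deployment the first argument would be the imported libvirt module rather than a stand-in. Whether the request comes from an API flag, nova.conf, or an importance level is exactly the question the meeting leaves open for the summit.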
14:26:49 <PaulMurray> It will be a good idea to come in with a proposal - a straw man - and then see what everyone wants to do
14:27:13 <pkoniszewski> luis5tb: yes
14:27:17 <luis5tb_> ok
14:27:20 <PaulMurray> the summit should be for decisions - discussion needs to come before if possible so everyone knows what they think before they get there
14:27:37 <pkoniszewski> we have two proposals already
14:27:54 <PaulMurray> I know, but no agreement yet
14:27:56 * davidgiluk isn't sure if anyone who has experience with postcopy is at the summit, but still OK
14:27:57 <pkoniszewski> with paul-carlton2 idea i think that we have three proposals
14:28:43 <kashyap> davidgiluk: Probably you should head there, yourself? :-)
14:28:43 <luis5tb_> IMHO, main advantage of having the flag is that you don't need to always include the VIR_MIGRATE_POSTCOPY (and its overhead) when starting the migrations
14:29:01 <pkoniszewski> but overhead is very small, isn't it?
14:29:11 <PaulMurray> So that's two topics that need some time
14:29:24 <pkoniszewski> is it even noticeable? do we have any numbers?
14:29:44 <tdurakov> PaulMurray: maybe, get back to devstack plugin?
14:29:51 <luis5tb_> davidgiluk: do you have some numbers about the performance impact of using postcopy flag?
14:30:08 <PaulMurray> tdurakov, just want to finish this topic - markus_z says he can wait a bit
14:30:09 <pkoniszewski> i know that it will add few milliseconds to VM downtime in final phase
14:30:18 <tdurakov> PaulMurray: cool
14:30:22 <davidgiluk> luis5tb_: No I don't; but I'm more worried about restrictions on features than actual overhead
14:30:40 <pkoniszewski> u mean, like RDMA or compression?
14:30:42 <mdbooth> davidgiluk: Unfortunately danpb won't be there.
14:30:55 <davidgiluk> pkoniszewski: Right
14:31:05 <pkoniszewski> so, we do not support compression anyway
14:31:11 <luis5tb_> pkoniszewski: I meant overhead of including the flag, but including the flag does not mean you are actually using post-copy mode
14:31:16 <pkoniszewski> and if you use RDMA do you really need to use post-copy at all?
14:31:25 <davidgiluk> luis5tb_: That overhead is pretty low
14:31:32 <luis5tb_> ok
14:32:00 <davidgiluk> pkoniszewski: My bet is it's still a bad thing to autoenable based on host; but I may be wrong
14:32:27 <PaulMurray> lets cut this one now
14:32:38 <PaulMurray> is there anything else that will need time ?
14:32:51 <PaulMurray> We will go over things to do and CI
14:32:55 <pkoniszewski> davidgiluk: tbh, i dont like both solutions
14:33:29 <PaulMurray> I'm wondering about anything that can't be sorted out without discussion there
14:33:58 <PaulMurray> There is some neutron/nova stuff that is listed at the end of the agenda
14:34:16 <PaulMurray> and it has been listed in the neutron/nova session for the summit
14:34:40 <PaulMurray> so may not need to budget for that - I'll make sure to follow up on that
14:35:13 <PaulMurray> ok - lets go back to markus_z now then
14:35:19 <PaulMurray> #topic Back To CI
14:35:32 <PaulMurray> are you still here markus_z
14:35:34 <markus_z> yep
14:35:56 <PaulMurray> tdurakov, was asking about devstack plugin for versions of libvirt/qemu
14:36:10 <PaulMurray> what is the status
14:36:23 <markus_z> The "happy path" isn't working right now. I have 2 patches which should solve that.
14:36:24 <markus_z> https://review.openstack.org/#/q/project:openstack/devstack-plugin-additional-pkg-repos
14:36:39 <markus_z> tonyb is the only core of the project and I didn't get him in the last days
14:36:53 <tdurakov> starred this
14:37:17 <markus_z> I wanted to discuss with him at the summit how to proceed.
14:37:32 <tdurakov> markus_z: any estimates?
14:38:06 <markus_z> tdurakov: estimates for what? Until it is fully finished and no experimental job anymore?
14:38:27 <tdurakov> markus_z: plugin readiness i mean
14:38:46 <markus_z> tdurakov: I hope to get it done until newton-1
14:38:52 <tdurakov> acked
14:39:07 <tdurakov> markus_z: thank you for details
14:39:09 <PaulMurray> markus_z, do you need any help - or is it just down to core review ?
14:39:32 <markus_z> PaulMurray: It's down to core review. We also need cores for that plugin project.
14:40:10 <PaulMurray> markus_z, is it a separate project on its own ?
14:40:11 <markus_z> PaulMurray: After I talked with tony I would post on the ML how the version-bump would look like. Then we can discuss there.
14:40:24 <markus_z> PaulMurray: The devstack-plugin is, yes
14:40:47 <tdurakov> what about moving ubuntu versions for ci instead?
14:40:51 <markus_z> PaulMurray: It's also of use for Neutron and Cinder, that's why we decided to keep it out-of-tree of Nova.
14:41:32 <PaulMurray> markus_z, got it - didn't clock that
14:41:46 <markus_z> tdurakov: You mean the test nodes in the jenkins jobs?
14:41:52 <tdurakov> yes
14:42:04 <tdurakov> just bump whole linux version instead:)
14:42:24 <markus_z> We get the pre-built libvirt(kvm) packages from Ubuntu-Cloud-Archive (at the moment).
14:42:36 <tdurakov> batteries will be included then
14:43:22 <markus_z> tdurakov: I think we will bump the libvirt version more often than the linux version of the test nodes
14:43:49 <tdurakov> markus_z: ok
14:44:27 <kashyap> markus_z: Only libvirt version, but not QEMU, too?
14:44:34 <markus_z> If you like, we can discuss this at the summit. I'll grab you, PaulMurray and tonyb and figure this out in a few minutes
14:44:47 <PaulMurray> markus_z, sounds good to me
14:44:56 <tdurakov> markus_z: I'll miss Austin
14:45:04 <PaulMurray> markus_z, also only one core is not a good place to be - we can sort that out I'm sure
14:45:40 <PaulMurray> Thanks markus_z
14:45:46 <clarkb> we have xenial available for experimental jobs now
14:46:16 <clarkb> and sometime after summit and xenial's release will switch trusty jobs over
14:46:33 <tdurakov> clarkb: could we discuss it on #openstack-infra i got one experimental job for this
14:46:35 <PaulMurray> What's xenial ?
14:46:41 <tdurakov> 16.0.4
14:46:49 <tdurakov> 16.04 i mean
14:47:06 <tdurakov> new ubuntu version
14:47:17 <PaulMurray> oh, got it
14:47:32 <PaulMurray> #topic Specs
14:47:41 <tdurakov> PaulMurray: I think it will fit live-migration job
14:48:01 <PaulMurray> I just wanted to point out that the specs need reviews
14:48:03 <PaulMurray> https://etherpad.openstack.org/p/newton-nova-priorities-tracking
14:48:28 <PaulMurray> And paul-carlton2, you need to submit newton versions of your specs
14:48:37 <PaulMurray> I am going to remove the mitaka ones from the page
14:49:41 <PaulMurray> If there are specs missing from that page please put them there
14:49:51 <PaulMurray> I will go through reviewing everything this week
14:50:00 <PaulMurray> please all do the same if you can
14:50:13 <PaulMurray> #topic Open Discussion
14:50:31 <PaulMurray> The only one here is from scheuran
14:50:42 <scheuran> hi
14:50:43 <PaulMurray> VIF information on migration
14:50:51 <PaulMurray> We did mention last week
14:51:03 <PaulMurray> and I know you said the main input is needed from neutron at the moment
14:51:05 <PaulMurray> is that right ?
14:51:11 <scheuran> Right
14:51:18 <scheuran> i posted a mail on the ML
14:51:23 <scheuran> #link http://lists.openstack.org/pipermail/openstack-dev/2016-April/092073.html
14:52:02 <scheuran> The point is, that we cannot just set the binding:host_id in pre_live_migration, as this would break some thirdparty mechanisms + the l2pop mechanism
14:52:13 <scheuran> so we need to figure out another way on the Neutron side
14:52:35 <scheuran> One idea has already been posted on the ML
14:52:37 <paul-carlton2> PaulMurray, ack
14:53:02 <scheuran> I'll come back to you guys as soon as I have a direction from Neutron side...
14:54:01 <mdbooth> I posted to the ML last week about image cache. Anybody get a chance to look at that?
14:54:15 <mdbooth> I think markus_z was interested.
14:54:46 <mdbooth> Basically, we've got some stored technical debt in image cache which is going to require a layout migration at some point when it's fixed.
14:54:57 <mdbooth> storage pools will also require a layout migration.
14:55:34 <mdbooth> I suspect we don't want to try to fix image cache in this cycle, but I'd prefer to make this decision eyes open.
14:55:45 <mdbooth> i.e. it will require 2 layout migrations.
14:56:24 <PaulMurray> I want to ask why - but I suspect I should re-read the ML
14:56:28 <paul-carlton2> the layout migration you refer to is disk.info?
14:56:52 <mdbooth> paul-carlton2: Short answer: image cache stores data in the wrong place.
14:57:24 <mdbooth> Essentially it's broken in design as well as implementation.
14:57:54 <mdbooth> paul-carlton2: Nope, the _base directory needs to go away
14:58:44 <PaulMurray> We only have 1 minute left
14:59:05 <PaulMurray> I would like to start discussing this - but we can't really
14:59:08 <mdbooth> Discussion on the list is fine by my
14:59:09 <mdbooth> me
14:59:13 <PaulMurray> how about we read and answer
14:59:21 <mdbooth> +1 thanks
14:59:29 <PaulMurray> and we can talk again next week if needed
14:59:41 <PaulMurray> got to cut off now I'm afraid
14:59:46 <PaulMurray> thanks all for coming
14:59:53 <PaulMurray> #endmeeting