14:01:16 <PaulMurray> #startmeeting Nova Live Migration
14:01:21 <openstack> Meeting started Tue Jun 21 14:01:16 2016 UTC and is due to finish in 60 minutes. The chair is PaulMurray. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:22 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:24 <openstack> The meeting name has been set to 'nova_live_migration'
14:01:26 <davidgiluk> o/
14:01:28 <tdurakov> o/
14:01:32 <PaulMurray> hi all
14:01:34 <luis5tb> o/
14:01:38 <mdbooth> o/
14:01:38 <diana_clarke> o/
14:01:42 <mriedem> o/
14:01:48 <paul-carlton2> o/
14:01:55 <PaulMurray> agenda: https://wiki.openstack.org/wiki/Meetings/NovaLiveMigration
14:02:31 <PaulMurray> #topic Non-priority features
14:02:58 <PaulMurray> we have just over another week to complete non-prio features
14:03:17 * kashyap waves
14:03:21 <PaulMurray> #link Newton release schedule: https://wiki.openstack.org/wiki/Nova/Newton_Release_Schedule
14:03:31 <PaulMurray> So I thought it would be
14:03:46 <PaulMurray> a good idea to run through where the bps have got to
14:03:52 <PaulMurray> so, starting with
14:04:04 <PaulMurray> #link Automatic Live Migration Completion (luis5tb, pkoniszewski, paul-carlton) - https://review.openstack.org/#/c/306561
14:04:15 <PaulMurray> luis5tb, tdurakov ?
14:04:28 <tdurakov> m?
14:04:44 <PaulMurray> I think the patches are mostly from luis5tb
14:04:44 <luis5tb> we are waiting for some more reviews, but looking good
14:04:54 <mriedem> https://review.openstack.org/#/q/topic:bp/auto-live-migration-completion
14:05:04 <PaulMurray> are you dependent on danpb's patches ?
14:05:17 <luis5tb> yes
14:05:29 <PaulMurray> do you need to be ?
14:05:36 <luis5tb> since the live_migration_monitor function was getting too complex
14:05:43 <luis5tb> and failed to pass the pep8 tests
14:06:04 <PaulMurray> ok, so we need to get quite a few of those patches through before yours can merge
14:06:09 <luis5tb> also, the automatic switch is dependent on the bug affecting the progress_watermark update
14:06:18 <luis5tb> https://review.openstack.org/#/c/331685/1
14:06:23 <tdurakov> PaulMurray: I think it's worth merging danpb's patches
14:06:46 <tdurakov> although this one depends on them: https://review.openstack.org/#/c/304621/
14:07:00 <luis5tb> yes, it makes more sense to split some migration code into migration.py
14:07:52 <pkoniszewski> danpb's changes are ready for core review i think
14:07:59 <tdurakov> +1
14:08:00 <pkoniszewski> so we can move this item to core review
14:08:25 <PaulMurray> yes, let's see if we can get someone to review it
14:08:39 <PaulMurray> where does the dependency on https://review.openstack.org/#/c/304621/ come in ?
14:08:55 <tdurakov> PaulMurray: monitor code complexity
14:09:12 <PaulMurray> I don't see the patches linked in a series
14:09:45 <tdurakov> oh, need to update it, but anyway this really depends on danpb's patches
14:10:03 <luis5tb> ahh, you mean that https://review.openstack.org/#/c/304621/ depends on danpb's patches
14:10:12 <mriedem> https://review.openstack.org/#/c/304621/ should also be linked to the bp if it's for a bp
14:10:19 <luis5tb> not on the automatic live migration ones
14:10:21 <luis5tb> right?
14:10:50 <tdurakov> no it shouldn't, as it's a bugfix; my point was about the importance of dan's patches
14:11:04 <mriedem> i don't see a bug on https://review.openstack.org/#/c/304621/
14:11:12 <mriedem> if it's a bug, let's open a bug and track it that way
14:11:19 <tdurakov> yup
14:12:11 <PaulMurray> So apart from that patch needing an update, the rest of the series looks ready for core reviewers
14:12:20 <PaulMurray> but there are a lot of patches
14:12:28 <PaulMurray> so we need it to go smoothly
14:12:50 <PaulMurray> next
14:12:53 <PaulMurray> #link Live Migration of Rescued Instances (paul-carlton) - https://review.openstack.org/#/c/307131
14:13:03 <PaulMurray> paul-carlton2, do you have any comments
14:13:17 <PaulMurray> https://review.openstack.org/#/q/topic:bp/live-migrate-rescued-instances
14:14:00 <paul-carlton2> working on the final fix for images with kernel and ramdisk
14:14:11 <paul-carlton2> it works now, just fixing tests
14:14:25 <paul-carlton2> Also need to do a tempest test
14:14:54 <PaulMurray> when will all the code be up for review?
14:15:28 <paul-carlton2> I also told johnthetubaguy I'd implement it for XenAPI but that might miss the cut now
14:15:51 <paul-carlton2> Code is up, tests need fixing, but feel free to review what is there
14:16:42 <paul-carlton2> If I don't get the XenAPI implementation done I will do it for O
14:16:56 <PaulMurray> ok
14:16:59 <PaulMurray> next
14:17:07 <PaulMurray> #link Make checks before live-migration async (tdurakov) - https://review.openstack.org/#/c/320416/
14:17:20 <PaulMurray> https://review.openstack.org/#/q/topic:bp/async-live-migration-rest-check
14:17:47 <tdurakov> this one is good, review is welcome
14:17:56 <tdurakov> the first 2 are ready to be merged
14:18:03 <PaulMurray> good - most have a +2 I see
14:18:29 <PaulMurray> #link Remove compute-compute communication in live-migration (tdurakov) - https://review.openstack.org/#/c/292271/
14:18:34 <PaulMurray> has this been started ?
14:18:39 <tdurakov> yes
14:18:45 <tdurakov> https://review.openstack.org/#/q/topic:bp/remove-compute-compute-communication
14:19:09 <tdurakov> actually it's a rework/rebase of https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:conductor
14:19:17 <tdurakov> so ready to be reviewed
14:19:35 <tdurakov> will resubmit rollback centralization today, and all is done
14:20:04 <PaulMurray> good - are they on the review page ?
14:20:20 <tdurakov> will add right now
14:20:27 <PaulMurray> thank you
14:20:42 <PaulMurray> for completeness
14:20:43 <PaulMurray> #link Re-Proposes to check destination on migrations (bauzas) - https://review.openstack.org/#/c/296408/
14:20:53 <PaulMurray> https://review.openstack.org/#/q/topic:bp/check-destination-on-migrations-newton
14:20:59 <PaulMurray> this is probably tracked under scheduling
14:21:25 <PaulMurray> see if bauzas pops up in a minute
14:21:34 <mriedem> only changes left there are the client
14:21:44 * bauzas puts the head in the door's corner
14:21:57 <PaulMurray> that don't sound safe bauzas
14:21:57 <bauzas> oh that
14:22:09 <bauzas> so, the BP is complete for the Nova side
14:22:18 <bauzas> I made a patch for the client side of evacuation
14:22:31 <bauzas> and I'm about to upload a patch for the live-mig one by EOD
14:22:44 <PaulMurray> do these fall under the scheduler subteam ?
14:22:46 <bauzas> couple of nits for the first review, but looks good so far
14:22:51 <bauzas> well, not really
14:22:54 <bauzas> it's not a prio bp
14:23:06 <mriedem> so the bp is done,
14:23:10 <mriedem> only thing left is the cli
14:23:11 <bauzas> it's rather a poor lone cowboy
14:23:16 <mriedem> which isn't restricted to the nova release schedule
14:23:23 <PaulMurray> ok - that's good then
14:23:23 <bauzas> yeah that
14:23:24 <mriedem> it's the next microversion that needs to be implemented in the cli
14:23:26 <mriedem> so it will be done
14:23:30 <mriedem> as others depend on it
14:23:33 <bauzas> I fooled the release schedule :)
14:23:40 <mriedem> so we don't really need to worry about that one here
14:23:44 <bauzas> +1
14:23:44 <PaulMurray> thanks bauzas
14:23:54 <PaulMurray> are there any bps I have missed
14:24:12 <PaulMurray> these were on our etherpad - I have a feeling there is another one that might have got lost
14:24:21 <PaulMurray> ...but only a feeling
14:24:46 <PaulMurray> ok......
14:24:50 <PaulMurray> next up
14:24:58 <PaulMurray> before mriedem goes running to a meeting somewhere
14:25:05 <PaulMurray> #topic CI
14:25:16 <tdurakov> mriedem: any updates on job stability?
14:25:16 <PaulMurray> any updates ?
14:25:29 <mriedem> i was going to check the live migration job quick
14:25:45 <mriedem> but since it's in the experimental queue i'm not sure if we track stats on that, i'll find out
14:26:15 <mriedem> you might have to come back around to me
14:26:44 <PaulMurray> np
14:26:56 <tdurakov> PaulMurray, fyi: all shared storages were disabled, to track only block-migration failures for now
14:27:03 <mriedem> the other thing is
14:27:21 <mriedem> since it's an on-demand job, any numbers we have are only going to be for what was run against it
14:27:28 <mriedem> which could also be busted changes
14:27:49 <PaulMurray> I should encourage people to use the jobs
14:27:56 <tdurakov> mriedem: well, maybe move it to the check pipeline then?
14:27:58 <mriedem> so i'm thinking we should probably just move that job to the check queue as non-voting
14:28:00 <PaulMurray> I forgot to do that recently
14:28:08 <tdurakov> mriedem: +1
14:28:16 <tdurakov> I'll prepare a patch
14:28:17 <PaulMurray> sounds good to me
14:28:37 <mriedem> i'm not sure if that means we also want to drop the live migration tests from the gate-tempest-dsvm-multinode-full job, which is also in the check queue and non-voting today
14:28:51 <mriedem> since they would be essentially duplicating the test run
14:28:56 <tdurakov> mriedem: i'd leave the other job untouched for now
14:29:25 <mriedem> yeah i agree, at least to start
14:29:36 <tdurakov> need to figure out live-migration job health first, then disable tests for the other one
14:29:54 <PaulMurray> yep, good plan
14:30:06 <PaulMurray> has anything else happened with CI this week
14:30:13 <tdurakov> guess not
14:30:23 <PaulMurray> ok, let's move on then
14:30:33 <mriedem> ok so no data for gate-tempest-dsvm-multinode-live-migration
14:30:35 <mriedem> just checked it
14:30:40 <mriedem> b/c it's in the experimental queue
14:30:44 <PaulMurray> #topic Libvirt Storage Pools
14:31:07 <PaulMurray> mdbooth, diana_clarke going ahead as usual ?
14:31:11 <diana_clarke> Reviews, especially on the individual image backends, would be great. I'm also waiting on Virtuozzo CI.
14:31:38 <mdbooth> Yeah, we've gotten a bit stalled on reviews for the main series
14:32:19 <PaulMurray> let's see if we can get some more reviews
14:32:31 <PaulMurray> could be people turning attention to non-priority ff
14:32:31 <mdbooth> However, we think there's some missing tempest coverage right now
14:32:40 <PaulMurray> I'd like to get a good understanding of where you are in the process after the coming week
14:32:50 <PaulMurray> so I'll be in touch about that
14:32:55 <PaulMurray> paul-carlton2 can help me understand
14:32:59 <mdbooth> Suspect that resize might not be working correctly with the new series, but CI still says it's ok
14:33:16 <PaulMurray> hmmm
14:33:34 <PaulMurray> you think the tests are any good ?
14:33:50 <mdbooth> diana_clarke: were you going to look into that?
14:33:51 <PaulMurray> I haven't looked
14:35:42 <diana_clarke> mdbooth: Perhaps. I was thinking of taking a look at the tempest coverage next.
14:36:17 <mriedem> mdbooth: what do you suspect is missing? resize with ephemeral? i saw you opened a bug for that
14:36:22 <mriedem> the resize tests in tempest are pretty basic
14:36:32 <PaulMurray> #action ALL please keep up reviews on image back end patches
14:36:43 <mdbooth> mriedem: I haven't looked in detail, but I have a suspicion resize is completely broken
14:36:53 <mdbooth> For all disks (with the new series)
14:37:02 <mriedem> mdbooth: oh, heh
14:37:09 <mriedem> mdbooth: as in the disks don't actually resize
14:37:10 <mdbooth> If that's true, that would suggest no CI coverage
14:37:16 <mdbooth> I *think* so
14:37:18 <mriedem> mdbooth: tempest just verifies the API doesn't barf
14:37:27 <mriedem> it doesn't check the size of the disks or anything
14:37:35 <mdbooth> Right, we'd want it to actually go into the guest and check the size of its disks
14:37:43 <mriedem> which i don't think tempest is going to do
14:37:46 <mriedem> because that's virt specific
14:37:50 <mriedem> unless you ssh into the guest
14:37:57 <mdbooth> Right, you'd have to ssh into the guest
14:38:08 <mdbooth> But it's a thing for any hypervisor, right?
14:38:13 <mriedem> we can't get any of that info out of the bdms from the api?
14:38:24 <mdbooth> Hehe, funny story
14:38:27 <mdbooth> BDMs don't get updated
14:38:34 * mdbooth is working on fixing that right now
14:38:47 <mriedem> ok so rebuilding with bdms would take you back to the pre-resize state
14:38:48 <mriedem> nice
14:39:03 <mdbooth> mriedem: Depends where you read your data from
14:39:08 <mdbooth> Because instance object *is* updated
14:39:17 <mdbooth> So root_gb, ephemeral_gb, swap_mb all updated
14:39:20 <mdbooth> But not bdms
14:40:17 <PaulMurray> is your code worse than the existing code ?
14:40:20 <PaulMurray> for that?
14:40:32 <PaulMurray> ...it's not something you've introduced is it ?
14:40:33 <mdbooth> PaulMurray: If I'm right about it not resizing anything, then yes.
14:40:43 <mdbooth> Currently it only doesn't resize ephemeral disks.
14:41:13 <PaulMurray> ....well at least your code is consistent.... lol
14:41:26 <mdbooth> PaulMurray: :)
14:41:33 * PaulMurray that's an ironic lol, not a real laugh
14:42:02 <mdbooth> Interestingly, resizing of ephemeral disks is broken in at least 3 different ways I'm currently aware of.
14:42:25 <mdbooth> 3 different and entirely independent ways, each of which means it could never work.
14:42:48 <mdbooth> Incidentally, in attempting to fix that I'm going to have to fudge a design question
14:43:03 <mdbooth> Namely, what does it mean to resize an instance with more than 1 ephemeral disk?
14:43:20 <mdbooth> It's permitted to create multiple ephemeral disks up to a total of ephemeral_gb
14:43:31 <mdbooth> By default it only creates 1, taking up the whole space
14:43:38 <mdbooth> Resizing just 1 is obvious
14:43:45 <mdbooth> But what does it even mean to resize more than 1?
14:43:54 <PaulMurray> I don't follow
14:43:57 * mdbooth is planning to emit a warning in the logs and do nothing
14:44:17 <mdbooth> PaulMurray: You have a flavor which permits an ephemeral disk of 5F
14:44:21 <mdbooth> s/F/G/
14:44:43 <mdbooth> When you create your instance, you explicitly specify 2 ephemeral disks of sizes 2G and 3G
14:45:02 <mdbooth> This is ok, because total <= 5G
14:45:12 <mdbooth> Now you resize to a flavor with ephemeral size of 10G
14:45:16 <mdbooth> What do you do?
14:45:24 <mdbooth> Create a third ephemeral of size 5G?
14:45:31 <mdbooth> Double the size of both existing disks?
14:45:34 <mdbooth> Resize one of them?
14:45:53 * PaulMurray obviously doesn't remember this bit
14:46:10 <PaulMurray> I would imagine you would log it and do nothing
14:46:12 <mriedem> yikes
14:46:16 <mriedem> config option!
14:46:22 * mdbooth larts mriedem
14:46:30 <PaulMurray> you would need more detail to specify what to do right ?
14:46:32 * mriedem looks up "larts"
14:46:54 <mdbooth> PaulMurray: Yeah, I'm just planning to punt it.
14:47:03 <mdbooth> It's broken now, and I'm pretty sure almost nobody does this
14:47:13 <mdbooth> I was just going to leave it broken with a warning.
14:47:19 <mriedem> mdbooth: might be a good ask for the ops ml
14:47:27 <PaulMurray> I'm going to do some homework so I know what happens
14:47:28 <mdbooth> We can fix it after we've bikeshedded it a bit
14:47:28 <mriedem> to see if they have users running into that
14:48:36 <PaulMurray> there was nothing else on the agenda, but I had better get back to it and make sure, just in case
14:48:41 * mdbooth is pretty sure the regular case is a single ephemeral disk, and that's easy enough to fix
14:48:48 <mdbooth> 1 thing
14:48:55 <PaulMurray> go ahead
14:49:00 <mdbooth> I wrote this last week: http://lists.openstack.org/pipermail/openstack-dev/2016-June/097529.html
14:49:07 <mdbooth> Writing it was a useful exercise for me
14:49:19 <mdbooth> If it's useful to anybody else too, that would be awesome
14:49:29 <mriedem> i still need to read it
14:49:33 <mriedem> but i intend on reading it
14:49:38 <andrearosa> same here
14:49:43 <mriedem> because if there is one thing i love reading, it's about BDMs in nova
14:49:53 <PaulMurray> me too
14:49:58 <PaulMurray> I keep it for bed time
14:50:06 <mdbooth> Related note, I think instance_disk_info might need to die
14:50:11 <mriedem> i'm reading 'faust' before bed right now, so it can't get much worse
14:50:16 <PaulMurray> mdbooth, any understanding is a good thing
14:50:29 <mdbooth> Which would be relevant to live migration, because that seems to be its primary use
14:50:40 <mdbooth> I think we can take its contents from the db
14:50:52 <mdbooth> Because if the 2 are out of sync we have other problems
14:51:47 <PaulMurray> mdbooth, are we likely to get to a point where we can implement storage pools for migrate/resize before the end of this cycle ?
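(A minimal sketch of the "warn and do nothing" approach mdbooth describes above for instances with more than one ephemeral disk. This is an illustration only, not the actual Nova code; resize_ephemeral_disks and grow_disk are hypothetical helpers.)

    import logging

    LOG = logging.getLogger(__name__)


    def resize_ephemeral_disks(instance_uuid, disks, new_eph_gb):
        """Resize local ephemeral disks after a flavor change.

        'disks' is a list of (path, size_gb) tuples describing the existing
        ephemeral disks; 'new_eph_gb' is ephemeral_gb from the new flavor.
        """
        if len(disks) > 1:
            # The ambiguous case from the discussion above: the total
            # allowance grows, but nothing says which disk should absorb the
            # extra space or whether a new disk should be created, so warn
            # and leave the disks untouched.
            LOG.warning('Instance %s has %d ephemeral disks; not resizing '
                        'them despite new ephemeral_gb=%d',
                        instance_uuid, len(disks), new_eph_gb)
            return

        if not disks:
            return

        path, size_gb = disks[0]
        if new_eph_gb > size_gb:
            # The single-disk case is unambiguous: grow it to the flavor size.
            grow_disk(path, new_eph_gb)


    def grow_disk(path, size_gb):
        # Placeholder for whatever image-backend-specific resize is used.
        raise NotImplementedError
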
14:52:19 <mdbooth> PaulMurray: I'm beginning to doubt it :(
14:52:47 <mriedem> honestly we'll be looking good if we can get the refactor done for newton
14:52:51 <PaulMurray> that's what I really wanted to know when I said I wanted to dig into it next week
14:53:00 <mriedem> i've started spending my review time on non-priority bp's this week and next
14:53:58 <PaulMurray> so what I would like to do is work out what the plan is and if there is any way we can order it - I know you
14:54:05 <PaulMurray> are probably doing that already, but
14:54:19 <PaulMurray> I would like to get a handle on how to use people
14:54:28 <PaulMurray> this cycle and next
14:54:46 <PaulMurray> I'm not here to get in your way
14:55:01 <mriedem> are you talking to me or mdbooth?
14:55:18 <PaulMurray> I am here to get in your way mriedem
14:55:22 <PaulMurray> always
14:55:31 <PaulMurray> it's mdbooth I'm trying to help
14:55:32 <mriedem> damn you paul murray
14:55:59 <PaulMurray> but need to get this week out of the way first
14:56:12 <mriedem> sounds like we could use someone poking on a tempest test that does resize plus ssh into the guest before and after to verify the disks are resized
14:56:17 <mdbooth> PaulMurray: Happy to discuss in more detail next week if you like.
14:56:19 <diana_clarke> My image backend patches haven't really been reviewed at all, so reviews would help.
14:56:26 <mdbooth> +1 diana_clarke
14:56:46 <PaulMurray> I didn't check if anyone had specific review requests or opens, but there was nothing on the agenda
14:56:49 <PaulMurray> so let's skip that
14:57:04 <PaulMurray> I think we need to wrap up
14:57:22 * PaulMurray is going to change the title of this meeting to "What mdbooth found wrong with nova this week"
14:57:27 <PaulMurray> :)
14:57:39 * mdbooth is honoured
14:57:40 <PaulMurray> I put an action for reviews
14:57:49 <PaulMurray> time to end
14:57:52 <PaulMurray> bye all
14:58:01 <PaulMurray> #endmeeting
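(A rough sketch of the kind of check mriedem suggests above: resize a server, then ssh into the guest before and after to verify the disk actually grew. It is not a real tempest test; it uses python-novaclient and paramiko directly, and the credentials, server name, flavor, floating IP and guest login are all placeholders.)

    import time

    import paramiko
    from novaclient import client

    # Placeholder credentials and endpoint.
    nova = client.Client('2.1', 'demo', 'secret', 'demo',
                         'http://controller:5000/v2.0')


    def guest_disk_bytes(ip, device='/dev/vda'):
        """Return the size of a block device as seen from inside the guest."""
        ssh = paramiko.SSHClient()
        ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        ssh.connect(ip, username='cirros', password='cubswin:)')  # placeholder login
        _, stdout, _ = ssh.exec_command('sudo blockdev --getsize64 %s' % device)
        size = int(stdout.read().strip())
        ssh.close()
        return size


    def wait_for_status(server, status, timeout=300):
        for _ in range(timeout // 5):
            server = nova.servers.get(server.id)
            if server.status == status:
                return server
            time.sleep(5)
        raise RuntimeError('timed out waiting for %s' % status)


    server = nova.servers.find(name='resize-probe')   # pre-created test instance
    ip = '203.0.113.10'                               # its floating IP (placeholder)
    flavor = nova.flavors.find(name='m1.small')       # flavor with a larger root disk

    before = guest_disk_bytes(ip)
    nova.servers.resize(server, flavor)
    server = wait_for_status(server, 'VERIFY_RESIZE')
    nova.servers.confirm_resize(server)
    server = wait_for_status(server, 'ACTIVE')

    after = guest_disk_bytes(ip)
    assert after > before, 'root disk did not grow after resize'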