14:01:16 #startmeeting Nova Live Migration
14:01:21 Meeting started Tue Jun 21 14:01:16 2016 UTC and is due to finish in 60 minutes. The chair is PaulMurray. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:22 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:24 The meeting name has been set to 'nova_live_migration'
14:01:26 o/
14:01:28 o/
14:01:32 hi all
14:01:34 o/
14:01:38 o/
14:01:38 o/
14:01:42 o/
14:01:48 o/
14:01:55 agenda: https://wiki.openstack.org/wiki/Meetings/NovaLiveMigration
14:02:31 #topic Non-priority features
14:02:58 we have just over another week to complete non-prio features
14:03:17 * kashyap waves
14:03:21 #link Newton release schedule: https://wiki.openstack.org/wiki/Nova/Newton_Release_Schedule
14:03:31 So I thought it would be
14:03:46 a good idea to run through where the bps have got to
14:03:52 so, starting with
14:04:04 #link Automatic Live Migration Completion (luis5tb, pkoniszewski, paul-carlton) - https://review.openstack.org/#/c/306561
14:04:15 luis5tb, tdurakov ?
14:04:28 m?
14:04:44 I think the patches are mostly from luis5tb
14:04:44 we are waiting for some more reviews, but looking good
14:04:54 https://review.openstack.org/#/q/topic:bp/auto-live-migration-completion
14:05:04 are you dependent on danpb's patches?
14:05:17 yes
14:05:29 do you need to be?
14:05:36 since the live_migration_monitor function was getting too complex
14:05:43 and fails to pass the pep8 tests
14:06:04 ok, so we need to get quite a few of those patches through before yours can merge
14:06:09 also, the automatic switch is dependent on the bug affecting the progress_watermark update
14:06:18 https://review.openstack.org/#/c/331685/1
14:06:23 PaulMurray: I think it's worth merging danpb's patches
14:06:46 although it depends on https://review.openstack.org/#/c/304621/
14:07:00 yes, it makes more sense to split some migration code into migration.py
14:07:52 danpb's changes are ready for core review i think
14:07:59 +1
14:08:00 so we can move this item to core review
14:08:25 yes, let's see if we can get someone to review it
14:08:39 where does the dependency on https://review.openstack.org/#/c/304621/ come in?
14:08:55 PaulMurray: monitor code complexity
14:09:12 I don't see the patches linked in a series
14:09:45 oh, need to update it, but anyway this really depends on danpb's patches
14:10:03 ahh, you mean that https://review.openstack.org/#/c/304621/ depends on danpb's patches
14:10:12 https://review.openstack.org/#/c/304621/ should also be linked to the bp if it's for a bp
14:10:19 not on the automatic live migration ones
14:10:21 right?
14:10:50 no it shouldn't, as it's a bugfix; the point is the importance of dan's patches
14:11:04 i don't see a bug on https://review.openstack.org/#/c/304621/
14:11:12 if it's a bug, let's open a bug and track it that way
14:11:19 yup
14:12:11 So apart from that patch needing an update, the rest of the series looks ready for core reviewers
14:12:20 but there are a lot of patches
14:12:28 so we need it to go smoothly
14:12:50 next
14:12:53 #link Live Migration of Rescued Instances (paul-carlton) - https://review.openstack.org/#/c/307131
14:13:03 paul-carlton2, do you have any comments
14:13:17 https://review.openstack.org/#/q/topic:bp/live-migrate-rescued-instances
14:14:00 working on final fix for images with kernel and ramdisk
14:14:11 it works now, just fixing tests
14:14:25 Also need to do tempest test
14:14:54 when will all the code be up for review?
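(Aside on the auto-live-migration-completion and progress_watermark discussion above: the general idea is that the monitor thread tracks the lowest amount of data remaining it has seen, and if that low-water mark stops dropping for too long the migration is considered stalled and can be forced to complete or aborted. The following is a minimal, illustrative sketch of that check only; the names ProgressSample and should_force_complete are hypothetical and are not taken from the patches under review.)

```python
# Illustrative sketch only, not Nova's actual monitor code.
import time


class ProgressSample(object):
    def __init__(self, data_remaining, timestamp=None):
        self.data_remaining = data_remaining       # bytes left to transfer
        self.timestamp = timestamp or time.time()  # when the sample was taken


def should_force_complete(samples, stall_timeout):
    """Return True if the migration has made no forward progress for
    longer than stall_timeout seconds.

    'Forward progress' means the remaining data dropped below the lowest
    value seen so far (the progress watermark).
    """
    if not samples:
        return False
    watermark = samples[0].data_remaining
    watermark_time = samples[0].timestamp
    for sample in samples[1:]:
        if sample.data_remaining < watermark:
            # New low-water mark: the guest is still converging, reset the clock.
            watermark = sample.data_remaining
            watermark_time = sample.timestamp
    return (time.time() - watermark_time) > stall_timeout
```

(In the series under review this kind of logic is being split out of the oversized live_migration_monitor function into migration.py, which is why the blueprint depends on danpb's refactoring patches.)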
14:15:28 I also told johnthetubaguy I'd implement it for XenAPI but that might miss the cut now
14:15:51 Code is up, tests need fixing but feel free to review what is there
14:16:42 If I don't get the XenAPI implementation done I will do it for O
14:16:56 ok
14:16:59 next
14:17:07 #link Make checks before live-migration async (tdurakov) - https://review.openstack.org/#/c/320416/
14:17:20 https://review.openstack.org/#/q/topic:bp/async-live-migration-rest-check
14:17:47 this one is good, review is welcome
14:17:56 first 2 ready to be merged
14:18:03 good - most have a +2 I see
14:18:29 #link Remove compute-compute communication in live-migration (tdurakov) - https://review.openstack.org/#/c/292271/
14:18:34 has this been started?
14:18:39 yes
14:18:45 https://review.openstack.org/#/q/topic:bp/remove-compute-compute-communication
14:19:09 actually it's a rework/rebase of https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:conductor
14:19:17 so ready to be reviewed
14:19:35 will resubmit rollback centralization today, and all is done
14:20:04 good - are they on the review page?
14:20:20 will add right now
14:20:27 thank you
14:20:42 for completeness
14:20:43 #link Re-Proposes to check destination on migrations (bauzas) - https://review.openstack.org/#/c/296408/
14:20:53 https://review.openstack.org/#/q/topic:bp/check-destination-on-migrations-newton
14:20:59 this is probably tracked under scheduling
14:21:25 see if bauzas pops up in a minute
14:21:34 only changes left there are the client
14:21:44 * bauzas puts the head in the door's corner
14:21:57 that doesn't sound safe bauzas
14:21:57 oh that
14:22:09 so, the BP is complete for the Nova side
14:22:18 I made a patch for the client side of evacuation
14:22:31 and I'm about to upload a patch for the live-mig one by EOD
14:22:44 do these fall under the scheduler subteam?
14:22:46 couple of nits for the first review, but looks good so far
14:22:51 well, not really
14:22:54 it's not a prio bp
14:23:06 so the bp is done,
14:23:10 only thing left is the cli
14:23:11 it's rather a poor lone cowboy
14:23:16 which isn't restricted to the nova release schedule
14:23:23 ok - that's good then
14:23:23 yeah that
14:23:24 it's the next microversion that needs to be implemented in the cli
14:23:26 so it will be done
14:23:30 as others depend on it
14:23:33 I fooled the release schedule :)
14:23:40 so we don't really need to worry about that one here
14:23:44 +1
14:23:44 thanks bauzas
14:23:54 are there any bps I have missed?
14:24:12 these were on our etherpad - I have a feeling there is another one that might have got lost
14:24:21 ...but only a feeling
14:24:46 ok......
14:24:50 next up
14:24:58 before mriedem goes running to a meeting somewhere
14:25:05 #topic CI
14:25:16 mriedem: any updates for job stability?
14:25:16 any updates?
14:25:29 i was going to check the live migration job quickly
14:25:45 but since it's in the experimental queue i'm not sure if we track stats on that, i'll find out
14:26:15 you might have to come back around to me
14:26:44 np
14:26:56 PaulMurray, fyi: all shared storage configurations were disabled, to track only block-migration failures for now
14:27:03 the other thing is
14:27:21 since it's an on-demand job any numbers we have are only going to be for what was run against it
14:27:28 which could also be busted changes
14:27:49 I should encourage people to use the jobs
14:27:56 mriedem: well, maybe move it to the check pipeline then?
14:27:58 so i'm thinking we should probably just move that job to the check queue as non-voting
14:28:00 I forgot to do that recently
14:28:08 mriedem: +1
14:28:16 I'll prepare a patch
14:28:17 sounds good to me
14:28:37 i'm not sure if that means we also want to drop the live migration tests from the gate-tempest-dsvm-multinode-full job which is also in the check queue and non-voting today
14:28:51 since they would be essentially duplicating the test run
14:28:56 mriedem: i'd leave the other job untouched for now
14:29:25 yeah i agree, at least to start
14:29:36 need to figure out live-migration job health first, then disable tests for the other one
14:29:54 yep, good plan
14:30:06 has anything else happened with CI this week
14:30:13 guess no
14:30:23 ok, let's move on then
14:30:33 ok so no data for gate-tempest-dsvm-multinode-live-migration
14:30:35 just checked it
14:30:40 b/c it's in the experimental queue
14:30:44 #topic Libvirt Storage Pools
14:31:07 mdbooth, diana_clarke going ahead as usual?
14:31:11 Reviews, especially on the individual image backends, would be great. I'm also waiting on Virtuozzo CI.
14:31:38 Yeah, we've gotten a bit stalled on reviews for the main series
14:32:19 let's see if we can get some more reviews
14:32:31 could be people turning attention to non-priority ff
14:32:31 However, we think there's some missing tempest coverage right now
14:32:40 I'd like to get a good understanding of where you are in the process after the coming week
14:32:50 so I'll be in touch about that
14:32:55 paul-carlton2 can help me understand
14:32:59 Suspect that resize might not be working correctly with the new series, but CI still says it's ok
14:33:16 hmmm
14:33:34 you think the tests are any good?
14:33:50 diana_clarke: were you going to look into that?
14:33:51 I haven't looked
14:35:42 mdbooth: Perhaps. I was thinking of taking a look at the tempest coverage next.
14:36:17 mdbooth: what do you suspect is missing? resize with ephemeral? i saw you opened a bug for that
14:36:22 the resize tests in tempest are pretty basic
14:36:32 #action ALL please keep up reviews on image back end patches
14:36:43 mriedem: I haven't looked in detail, but I have a suspicion resize is completely broken
14:36:53 For all disks (with the new series)
14:37:02 mdbooth: oh, heh
14:37:09 mdbooth: as in the disks don't actually resize
14:37:10 If that's true, that would suggest no CI coverage
14:37:16 I *think* so
14:37:18 mdbooth: tempest just verifies the API doesn't barf
14:37:27 it doesn't check the size of the disks or anything
14:37:35 Right, we'd want it to actually go into the guest and check the size of its disks
14:37:43 which i don't think tempest is going to do
14:37:46 because that's virt specific
14:37:50 unless you ssh into the guest
14:37:57 Right, you'd have to ssh into the guest
14:38:08 But it's a thing for any hypervisor, right?
14:38:13 we can't get any of that info out of the bdms from the api?
14:38:24 Hehe, funny story
14:38:27 BDMs don't get updated
14:38:34 * mdbooth is working on fixing that right now
14:38:47 ok so rebuilding with bdms would take you back to the pre-resize state
14:38:48 nice
14:39:03 mriedem: Depends where you read your data from
14:39:08 Because the instance object *is* updated
14:39:17 So root_gb, ephemeral_gb, swap_mb are all updated
14:39:20 But not bdms
14:40:17 is your code worse than the existing code?
14:40:20 for that?
14:40:32 ...it's not something you've introduced is it?
14:40:33 PaulMurray: If I'm right about it not resizing anything, then yes.
14:40:43 Currently it only doesn't resize ephemeral disks.
14:41:13 ....well at least your code is consistent.... lol
14:41:26 PaulMurray: :)
14:41:33 * PaulMurray that's an ironic lol, not a real laugh
14:42:02 Interestingly, resizing of ephemeral disks is broken in at least 3 different ways I'm currently aware of.
14:42:25 3 different and entirely independent ways, each of which means it could never work.
14:42:48 Incidentally, in attempting to fix that I'm going to have to fudge a design question
14:43:03 Namely, what does it mean to resize an instance with more than 1 ephemeral disk?
14:43:20 It's permitted to create multiple ephemeral disks up to a total of ephemeral_gb
14:43:31 By default it only creates 1, taking up the whole space
14:43:38 Resizing just 1 is obvious
14:43:45 But what does it even mean to resize more than 1?
14:43:54 I don't follow
14:43:57 * mdbooth is planning to emit a warning in the logs and do nothing
14:44:17 PaulMurray: You have a flavor which permits an ephemeral disk of 5F
14:44:21 s/F/G/
14:44:43 When you create your instance, you explicitly specify 2 ephemeral disks of sizes 2G and 3G
14:45:02 This is ok, because total <= 5G
14:45:12 Now you resize to a flavor with ephemeral size of 10G
14:45:16 What do you do?
14:45:24 Create a third ephemeral of size 5G?
14:45:31 Double the size of both existing disks?
14:45:34 Resize one of them?
14:45:53 * PaulMurray obviously doesn't remember this bit
14:46:10 I would imagine you would log it and do nothing
14:46:12 yikes
14:46:16 config option!
14:46:22 * mdbooth larts mriedem
14:46:30 you would need more detail to specify what to do right?
14:46:32 * mriedem looks up "larts"
14:46:54 PaulMurray: Yeah, I'm just planning to punt it.
14:47:03 It's broken now, and I'm pretty sure almost nobody does this
14:47:13 I was just going to leave it broken with a warning.
14:47:19 mdbooth: might be a good ask for the ops ml
14:47:27 I'm going to do some homework so I know what happens
14:47:28 We can fix it after we've bikeshedded it a bit
14:47:28 to see if they have users running into that
14:48:36 there was nothing else on the agenda, but I had better get back to it and make sure just in case
14:48:41 * mdbooth is pretty sure the regular case is a single ephemeral disk, and that's easy enough to fix
14:48:48 1 thing
14:48:55 go ahead
14:49:00 I wrote this last week: http://lists.openstack.org/pipermail/openstack-dev/2016-June/097529.html
14:49:07 Writing it was a useful exercise for me
14:49:19 If it's useful to anybody else too, that would be awesome
14:49:29 i still need to read it
14:49:33 but i intend on reading it
14:49:38 same here
14:49:43 because if there is one thing i love reading, it's about BDMs in nova
14:49:53 me too
14:49:58 I keep it for bed time
14:50:06 Related note, I think instance_disk_info might need to die
14:50:11 i'm reading 'faust' before bed right now, so it can't get much worse
14:50:16 mdbooth, any understanding is a good thing
14:50:29 Which would be relevant to live migration, because that seems to be its primary use
14:50:40 I think we can take its contents from the db
14:50:52 Because if the 2 are out of sync we have other problems
14:51:47 mdbooth, are we likely to get to a point where we can implement storage pools for migrate/resize before the end of this cycle?
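(Aside on the multi-ephemeral resize question above: to make the proposed "warn and do nothing" behaviour concrete, here is a minimal sketch of what such a helper could look like. It assumes a list of ephemeral BDM-like objects and the new flavor's ephemeral_gb; the name pick_ephemeral_to_resize and the exact BDM shape are hypothetical, not Nova's actual code.)

```python
# Hypothetical sketch of the "warn and skip" approach described above.
import logging

LOG = logging.getLogger(__name__)


def pick_ephemeral_to_resize(ephemeral_bdms, new_ephemeral_gb):
    """Return the single ephemeral BDM to grow, or None if the layout is
    ambiguous.

    With exactly one ephemeral disk the intent is clear: grow it to the new
    flavor's ephemeral_gb. With several disks there is no obvious mapping
    from the flavor's single ephemeral_gb value onto individual disks, so
    log a warning and leave them alone.
    """
    if len(ephemeral_bdms) == 1:
        return ephemeral_bdms[0]
    if len(ephemeral_bdms) > 1:
        LOG.warning("Instance has %d ephemeral disks; resizing to %dGB of "
                    "ephemeral space is ambiguous, leaving disks unchanged.",
                    len(ephemeral_bdms), new_ephemeral_gb)
    return None
```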
14:52:19 PaulMurray: I'm beginning to doubt it :(
14:52:47 honestly we'll be looking good if we can get the refactor done for newton
14:52:51 that's what I really wanted to know when I said I wanted to dig into it next week
14:53:00 i've started spending my review time on non-priority bp's this week and next
14:53:58 so what I would like to do is work out what the plan is and if there is any way we can order it - I know you
14:54:05 are probably doing that already, but
14:54:19 I would like to get a handle on how to use people
14:54:28 this cycle and next
14:54:46 I'm not here to get in your way
14:55:01 are you talking to me or mdbooth?
14:55:18 I am here to get in your way mriedem
14:55:22 always
14:55:31 it's mdbooth I'm trying to help
14:55:32 damn you paul murray
14:55:59 but need to get this week out of the way first
14:56:12 sounds like we could use someone poking at a tempest test that does resize plus ssh into the guest before and after to verify the disks are resized
14:56:17 PaulMurray: Happy to discuss in more detail next week if you like.
14:56:19 My image backend patches haven't really been reviewed at all, so reviews would help.
14:56:26 +1 diana_clarke
14:56:46 I didn't check if anyone had specific review requests or opens, but there was nothing on the agenda
14:56:49 so let's skip that
14:57:04 I think we need to wrap up
14:57:22 * PaulMurray is going to change the title of this meeting to "What mdbooth found wrong with nova this week"
14:57:27 :)
14:57:39 * mdbooth is honoured
14:57:40 I put an action for reviews
14:57:49 time to end
14:57:52 bye all
14:58:01 #endmeeting
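(Aside on the suggested tempest test that resizes and then SSHes into the guest to verify disk sizes: a real test would use tempest's own remote-client helpers, but the guest-side check amounts to something like the paramiko-based sketch below. The host, credentials, and device name are placeholders, not values from the meeting.)

```python
# Rough sketch of the guest-side verification only; placeholders throughout.
import paramiko


def get_guest_disk_size_gb(host, username, key_filename, device="vdb"):
    """Return the size of /dev/<device> inside the guest, in GiB."""
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username=username, key_filename=key_filename)
    try:
        # /sys/block/<dev>/size reports the size in 512-byte sectors and is
        # readable without root.
        _stdin, stdout, _stderr = client.exec_command(
            "cat /sys/block/%s/size" % device)
        sectors = int(stdout.read().strip())
        return sectors * 512 / float(1024 ** 3)
    finally:
        client.close()


# A resize test could then assert, after the resize completes, something like:
#   assert get_guest_disk_size_gb(ip, "cirros", key) >= new_flavor_ephemeral_gb
```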