14:01:16 <PaulMurray> #startmeeting Nova Live Migration
14:01:21 <openstack> Meeting started Tue Jun 21 14:01:16 2016 UTC and is due to finish in 60 minutes. The chair is PaulMurray. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:22 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:24 <openstack> The meeting name has been set to 'nova_live_migration'
14:01:26 <davidgiluk> o/
14:01:28 <tdurakov> o/
14:01:32 <PaulMurray> hi all
14:01:34 <luis5tb> o/
14:01:38 <mdbooth> o/
14:01:38 <diana_clarke> o/
14:01:42 <mriedem> o/
14:01:48 <paul-carlton2> o/
14:01:55 <PaulMurray> agenda: https://wiki.openstack.org/wiki/Meetings/NovaLiveMigration
14:02:31 <PaulMurray> #topic Non-priority features
14:02:58 <PaulMurray> we have just over another week to complete non-prio features
14:03:17 * kashyap waves
14:03:21 <PaulMurray> #link Newton release schedule: https://wiki.openstack.org/wiki/Nova/Newton_Release_Schedule
14:03:31 <PaulMurray> So I thought it would be
14:03:46 <PaulMurray> a good idea to run through where the bps have got to
14:03:52 <PaulMurray> so, starting with
14:04:04 <PaulMurray> #link Automatic Live Migration Completion (luis5tb, pkoniszewski, paul-carlton) - https://review.openstack.org/#/c/306561
14:04:15 <PaulMurray> luis5tb, tdurakov ?
14:04:28 <tdurakov> m?
14:04:44 <PaulMurray> I think the patches are mostly from luis5tb
14:04:44 <luis5tb> we are waiting for some more reviews, but looking good
14:04:54 <mriedem> https://review.openstack.org/#/q/topic:bp/auto-live-migration-completion
14:05:04 <PaulMurray> are you dependent on danpb's patches ?
14:05:17 <luis5tb> yes
14:05:29 <PaulMurray> do you need to be ?
14:05:36 <luis5tb> since the live_migration_monitor function was getting too complex
14:05:43 <luis5tb> and failed to pass the pep8 tests
14:06:04 <PaulMurray> ok, so we need to get quite a few of those patches through before yours can merge
14:06:09 <luis5tb> also, the automatic switch is dependent on the bug affecting the progress_watermark update
14:06:18 <luis5tb> https://review.openstack.org/#/c/331685/1
14:06:23 <tdurakov> PaulMurray: I think it's worth merging danpb's patches
14:06:46 <tdurakov> although this one depends on them: https://review.openstack.org/#/c/304621/
14:07:00 <luis5tb> yes, it makes more sense to split some migration code into migration.py
14:07:52 <pkoniszewski> danpb's changes are ready for core review i think
14:07:59 <tdurakov> +1
14:08:00 <pkoniszewski> so we can move this item to core review
14:08:25 <PaulMurray> yes, let's see if we can get someone to review it
14:08:39 <PaulMurray> where does the dependency on https://review.openstack.org/#/c/304621/ come in ?
14:08:55 <tdurakov> PaulMurray: monitor code complexity
14:09:12 <PaulMurray> I don't see the patches linked in a series
14:09:45 <tdurakov> oh, need to update it, but anyway this really depends on danpb's patches
14:10:03 <luis5tb> ahh, you mean that https://review.openstack.org/#/c/304621/ depends on danpb's patches
14:10:12 <mriedem> https://review.openstack.org/#/c/304621/ should also be linked to the bp if it's for a bp
14:10:19 <luis5tb> not on the automatic live migration ones
14:10:21 <luis5tb> right?
14:10:50 <tdurakov> no it shouldn't, as it's a bugfix; my point was about the importance of dan's patches
14:11:04 <mriedem> i don't see a bug on https://review.openstack.org/#/c/304621/
14:11:12 <mriedem> if it's a bug, let's open a bug and track it that way
14:11:19 <tdurakov> yup
14:12:11 <PaulMurray> So apart from that patch needing an update, the rest of the series looks ready for core reviewers
14:12:20 <PaulMurray> but there are a lot of patches
14:12:28 <PaulMurray> so we need it to go smoothly
14:12:50 <PaulMurray> next
14:12:53 <PaulMurray> #link Live Migration of Rescued Instances (paul-carlton) - https://review.openstack.org/#/c/307131
14:13:03 <PaulMurray> paul-carlton2, do you have any comments
14:13:17 <PaulMurray> https://review.openstack.org/#/q/topic:bp/live-migrate-rescued-instances
14:14:00 <paul-carlton2> working on the final fix for images with kernel and ramdisk
14:14:11 <paul-carlton2> it works now, just fixing tests
14:14:25 <paul-carlton2> Also need to do a tempest test
14:14:54 <PaulMurray> when will all the code be up for review?
14:15:28 <paul-carlton2> I also told johnthetubaguy I'd implement it for XenAPI but that might miss the cut now
14:15:51 <paul-carlton2> Code is up, tests need fixing, but feel free to review what is there
14:16:42 <paul-carlton2> If I don't get the XenAPI implementation done I will do it for O
14:16:56 <PaulMurray> ok
14:16:59 <PaulMurray> next
14:17:07 <PaulMurray> #link Make checks before live-migration async (tdurakov) - https://review.openstack.org/#/c/320416/
14:17:20 <PaulMurray> https://review.openstack.org/#/q/topic:bp/async-live-migration-rest-check
14:17:47 <tdurakov> this one is good, review is welcome
14:17:56 <tdurakov> the first 2 are ready to be merged
14:18:03 <PaulMurray> good - most have a +2 I see
14:18:29 <PaulMurray> #link Remove compute-compute communication in live-migration (tdurakov) - https://review.openstack.org/#/c/292271/
14:18:34 <PaulMurray> has this been started ?
14:18:39 <tdurakov> yes
14:18:45 <tdurakov> https://review.openstack.org/#/q/topic:bp/remove-compute-compute-communication
14:19:09 <tdurakov> actually it's a rework/rebase of https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:conductor
14:19:17 <tdurakov> so ready to be reviewed
14:19:35 <tdurakov> will resubmit rollback centralization today, and all is done
14:20:04 <PaulMurray> good - are they on the review page ?
14:20:20 <tdurakov> will add right now
14:20:27 <PaulMurray> thank you
14:20:42 <PaulMurray> for completeness
14:20:43 <PaulMurray> #link Re-Proposes to check destination on migrations (bauzas) - https://review.openstack.org/#/c/296408/
14:20:53 <PaulMurray> https://review.openstack.org/#/q/topic:bp/check-destination-on-migrations-newton
14:20:59 <PaulMurray> this is probably tracked under scheduling
14:21:25 <PaulMurray> see if bauzas pops up in a minute
14:21:34 <mriedem> only changes left there are the client
14:21:44 * bauzas puts the head in the door's corner
14:21:57 <PaulMurray> that don't sound safe bauzas
14:21:57 <bauzas> oh that
14:22:09 <bauzas> so, the BP is complete for the Nova side
14:22:18 <bauzas> I made a patch for the client side of evacuation
14:22:31 <bauzas> and I'm about to upload a patch for the live-mig one by EOD
14:22:44 <PaulMurray> do these fall under the scheduler subteam ?
14:22:46 <bauzas> couple of nits for the first review, but looks good so far
14:22:51 <bauzas> well, not really
14:22:54 <bauzas> it's not a prio bp
14:23:06 <mriedem> so the bp is done,
14:23:10 <mriedem> only thing left is the cli
14:23:11 <bauzas> it's rather a poor lone cowboy
14:23:16 <mriedem> which isn't restricted to the nova release schedule
14:23:23 <PaulMurray> ok - that's good then
14:23:23 <bauzas> yeah that
14:23:24 <mriedem> it's the next microversion that needs to be implemented in the cli
14:23:26 <mriedem> so it will be done
14:23:30 <mriedem> as others depend on it
14:23:33 <bauzas> I fooled the release schedule :)
14:23:40 <mriedem> so we don't really need to worry about that one here
14:23:44 <bauzas> +1
14:23:44 <PaulMurray> thanks bauzas
14:23:54 <PaulMurray> are there any bps I have missed
14:24:12 <PaulMurray> these were on our etherpad - I have a feeling there is another one that might have got lost
14:24:21 <PaulMurray> ...but only a feeling
14:24:46 <PaulMurray> ok......
14:24:50 <PaulMurray> next up
14:24:58 <PaulMurray> before mriedem goes running to a meeting somewhere
14:25:05 <PaulMurray> #topic CI
14:25:16 <tdurakov> mriedem: any updates on job stability?
14:25:16 <PaulMurray> any updates ?
14:25:29 <mriedem> i was going to check the live migration job quick
14:25:45 <mriedem> but since it's in the experimental queue i'm not sure if we track stats on that, i'll find out
14:26:15 <mriedem> you might have to come back around to me
14:26:44 <PaulMurray> np
14:26:56 <tdurakov> PaulMurray, fyi: all shared storages were disabled, to track only block-migration failures for now
14:27:03 <mriedem> the other thing is
14:27:21 <mriedem> since it's an on-demand job, any numbers we have are only going to be for what was run against it
14:27:28 <mriedem> which could also be busted changes
14:27:49 <PaulMurray> I should encourage people to use the jobs
14:27:56 <tdurakov> mriedem: well, maybe move it to the check pipeline then?
14:27:58 <mriedem> so i'm thinking we should probably just move that job to the check queue as non-voting
14:28:00 <PaulMurray> I forgot to do that recently
14:28:08 <tdurakov> mriedem: +1
14:28:16 <tdurakov> I'll prepare a patch
14:28:17 <PaulMurray> sounds good to me
14:28:37 <mriedem> i'm not sure if that means we also want to drop the live migration tests from the gate-tempest-dsvm-multinode-full job, which is also in the check queue and non-voting today
14:28:51 <mriedem> since they would be essentially duplicating the test run
14:28:56 <tdurakov> mriedem: i'd leave the other job untouched for now
14:29:25 <mriedem> yeah i agree, at least to start
14:29:36 <tdurakov> need to figure out live-migration job health first, then disable tests for the other one
14:29:54 <PaulMurray> yep, good plan
14:30:06 <PaulMurray> has anything else happened with CI this week
14:30:13 <tdurakov> guess not
14:30:23 <PaulMurray> ok, let's move on then
14:30:33 <mriedem> ok so no data for gate-tempest-dsvm-multinode-live-migration
14:30:35 <mriedem> just checked it
14:30:40 <mriedem> b/c it's in the experimental queue
14:30:44 <PaulMurray> #topic Libvirt Storage Pools
14:31:07 <PaulMurray> mdbooth, diana_clarke going ahead as usual ?
14:31:11 <diana_clarke> Reviews, especially on the individual image backends, would be great. I'm also waiting on Virtuozzo CI.
14:31:38 <mdbooth> Yeah, we've gotten a bit stalled on reviews for the main series
14:32:19 <PaulMurray> let's see if we can get some more reviews
14:32:31 <PaulMurray> could be people turning attention to non-priority ff
14:32:31 <mdbooth> However, we think there's some missing tempest coverage right now
14:32:40 <PaulMurray> I'd like to get a good understanding of where you are in the process after the coming week
14:32:50 <PaulMurray> so I'll be in touch about that
14:32:55 <PaulMurray> paul-carlton2 can help me understand
14:32:59 <mdbooth> Suspect that resize might not be working correctly with the new series, but CI still says it's ok
14:33:16 <PaulMurray> hmmm
14:33:34 <PaulMurray> you think the tests are any good ?
14:33:50 <mdbooth> diana_clarke: were you going to look into that?
14:33:51 <PaulMurray> I haven't looked
14:35:42 <diana_clarke> mdbooth: Perhaps. I was thinking of taking a look at the tempest coverage next.
14:36:17 <mriedem> mdbooth: what do you suspect is missing? resize with ephemeral? i saw you opened a bug for that
14:36:22 <mriedem> the resize tests in tempest are pretty basic
14:36:32 <PaulMurray> #action ALL please keep up reviews on image back end patches
14:36:43 <mdbooth> mriedem: I haven't looked in detail, but I have a suspicion resize is completely broken
14:36:53 <mdbooth> For all disks (with the new series)
14:37:02 <mriedem> mdbooth: oh, heh
14:37:09 <mriedem> mdbooth: as in the disks don't actually resize
14:37:10 <mdbooth> If that's true, that would suggest no CI coverage
14:37:16 <mdbooth> I *think* so
14:37:18 <mriedem> mdbooth: tempest just verifies the API doesn't barf
14:37:27 <mriedem> it doesn't check the size of the disks or anything
14:37:35 <mdbooth> Right, we'd want it to actually go into the guest and check the size of its disks
14:37:43 <mriedem> which i don't think tempest is going to do
14:37:46 <mriedem> because that's virt specific
14:37:50 <mriedem> unless you ssh into the guest
14:37:57 <mdbooth> Right, you'd have to ssh into the guest
14:38:08 <mdbooth> But it's a thing for any hypervisor, right?
14:38:13 <mriedem> we can't get any of that info out of the bdms from the api?
14:38:24 <mdbooth> Hehe, funny story
14:38:27 <mdbooth> BDMs don't get updated
14:38:34 * mdbooth is working on fixing that right now
14:38:47 <mriedem> ok so rebuilding with bdms would take you back to the pre-resize state
14:38:48 <mriedem> nice
14:39:03 <mdbooth> mriedem: Depends where you read your data from
14:39:08 <mdbooth> Because instance object *is* updated
14:39:17 <mdbooth> So root_gb, ephemeral_gb, swap_mb all updated
14:39:20 <mdbooth> But not bdms
14:40:17 <PaulMurray> is your code worse than the existing code ?
14:40:20 <PaulMurray> for that?
14:40:32 <PaulMurray> ...it's not something you've introduced is it ?
14:40:33 <mdbooth> PaulMurray: If I'm right about it not resizing anything, then yes.
14:40:43 <mdbooth> Currently it only doesn't resize ephemeral disks.
14:41:13 <PaulMurray> ....well at least your code is consistent.... lol
14:41:26 <mdbooth> PaulMurray: :)
14:41:33 * PaulMurray that's an ironic lol, not a real laugh
14:42:02 <mdbooth> Interestingly, resizing of ephemeral disks is broken in at least 3 different ways I'm currently aware of.
14:42:25 <mdbooth> 3 different and entirely independent ways, each of which means it could never work.
14:42:48 <mdbooth> Incidentally, in attempting to fix that I'm going to have to fudge a design question
14:43:03 <mdbooth> Namely, what does it mean to resize an instance with more than 1 ephemeral disk?
14:43:20 <mdbooth> It's permitted to create multiple ephemeral disks up to a total of ephemeral_gb
14:43:31 <mdbooth> By default it only creates 1, taking up the whole space
14:43:38 <mdbooth> Resizing just 1 is obvious
14:43:45 <mdbooth> But what does it even mean to resize more than 1?
14:43:54 <PaulMurray> I don't follow
14:43:57 * mdbooth is planning to emit a warning in the logs and do nothing
14:44:17 <mdbooth> PaulMurray: You have a flavor which permits an ephemeral disk of 5F
14:44:21 <mdbooth> s/F/G/
14:44:43 <mdbooth> When you create your instance, you explicitly specify 2 ephemeral disks of sizes 2G and 3G
14:45:02 <mdbooth> This is ok, because total <= 5G
14:45:12 <mdbooth> Now you resize to a flavor with ephemeral size of 10G
14:45:16 <mdbooth> What do you do?
14:45:24 <mdbooth> Create a third ephemeral of size 5G?
14:45:31 <mdbooth> Double the size of both existing disks?
14:45:34 <mdbooth> Resize one of them?
14:45:53 * PaulMurray obviously doesn't remember this bit
14:46:10 <PaulMurray> I would imagine you would log it and do nothing
14:46:12 <mriedem> yikes
14:46:16 <mriedem> config option!
14:46:22 * mdbooth larts mriedem
14:46:30 <PaulMurray> you would need more detail to specify what to do right ?
14:46:32 * mriedem looks up "larts"
14:46:54 <mdbooth> PaulMurray: Yeah, I'm just planning to punt it.
14:47:03 <mdbooth> It's broken now, and I'm pretty sure almost nobody does this
14:47:13 <mdbooth> I was just going to leave it broken with a warning.
14:47:19 <mriedem> mdbooth: might be a good ask for the ops ml
14:47:27 <PaulMurray> I'm going to do some homework so I know what happens
14:47:28 <mdbooth> We can fix it after we've bikeshedded it a bit
14:47:28 <mriedem> to see if they have users running into that
14:48:36 <PaulMurray> there was nothing else on the agenda, but I had better get back to it and make sure, just in case
14:48:41 * mdbooth is pretty sure the regular case is a single ephemeral disk, and that's easy enough to fix
14:48:48 <mdbooth> 1 thing
14:48:55 <PaulMurray> go ahead
14:49:00 <mdbooth> I wrote this last week: http://lists.openstack.org/pipermail/openstack-dev/2016-June/097529.html
14:49:07 <mdbooth> Writing it was a useful exercise for me
14:49:19 <mdbooth> If it's useful to anybody else too, that would be awesome
14:49:29 <mriedem> i still need to read it
14:49:33 <mriedem> but i intend on reading it
14:49:38 <andrearosa> same here
14:49:43 <mriedem> because if there is one thing i love reading, it's about BDMs in nova
14:49:53 <PaulMurray> me too
14:49:58 <PaulMurray> I keep it for bed time
14:50:06 <mdbooth> Related note, I think instance_disk_info might need to die
14:50:11 <mriedem> i'm reading 'faust' before bed right now, so it can't get much worse
14:50:16 <PaulMurray> mdbooth, any understanding is a good thing
14:50:29 <mdbooth> Which would be relevant to live migration, because that seems to be its primary use
14:50:40 <mdbooth> I think we can take its contents from the db
14:50:52 <mdbooth> Because if the 2 are out of sync we have other problems
14:51:47 <PaulMurray> mdbooth, are we likely to get to a point where we can implement storage pools for migrate/resize before the end of this cycle ?
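(A minimal sketch of the "warn and do nothing" approach mdbooth describes above for instances with more than one ephemeral disk. This is an illustration only, not the actual Nova code; resize_ephemeral_disks and grow_disk are hypothetical helpers.)

    import logging

    LOG = logging.getLogger(__name__)


    def resize_ephemeral_disks(instance_uuid, disks, new_eph_gb):
        """Resize local ephemeral disks after a flavor change.

        'disks' is a list of (path, size_gb) tuples describing the existing
        ephemeral disks; 'new_eph_gb' is ephemeral_gb from the new flavor.
        """
        if len(disks) > 1:
            # The ambiguous case from the discussion above: the total
            # allowance grows, but nothing says which disk should absorb the
            # extra space or whether a new disk should be created, so warn
            # and leave the disks untouched.
            LOG.warning('Instance %s has %d ephemeral disks; not resizing '
                        'them despite new ephemeral_gb=%d',
                        instance_uuid, len(disks), new_eph_gb)
            return

        if not disks:
            return

        path, size_gb = disks[0]
        if new_eph_gb > size_gb:
            # The single-disk case is unambiguous: grow it to the flavor size.
            grow_disk(path, new_eph_gb)


    def grow_disk(path, size_gb):
        # Placeholder for whatever image-backend-specific resize is used.
        raise NotImplementedError
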
14:52:19 <mdbooth> PaulMurray: I'm beginning to doubt it :(
14:52:47 <mriedem> honestly we'll be looking good if we can get the refactor done for newton
14:52:51 <PaulMurray> that's what I really wanted to know when I said I wanted to dig into it next week
14:53:00 <mriedem> i've started spending my review time on non-priority bp's this week and next
14:53:58 <PaulMurray> so what I would like to do is work out what the plan is and if there is any way we can order it - I know you
14:54:05 <PaulMurray> are probably doing that already, but
14:54:19 <PaulMurray> I would like to get a handle on how to use people
14:54:28 <PaulMurray> this cycle and next
14:54:46 <PaulMurray> I'm not here to get in your way
14:55:01 <mriedem> are you talking to me or mdbooth?
14:55:18 <PaulMurray> I am here to get in your way mriedem
14:55:22 <PaulMurray> always
14:55:31 <PaulMurray> it's mdbooth I'm trying to help
14:55:32 <mriedem> damn you paul murray
14:55:59 <PaulMurray> but need to get this week out of the way first
14:56:12 <mriedem> sounds like we could use someone poking on a tempest test that does resize plus ssh into the guest before and after to verify the disks are resized
14:56:17 <mdbooth> PaulMurray: Happy to discuss in more detail next week if you like.
14:56:19 <diana_clarke> My image backend patches haven't really been reviewed at all, so reviews would help.
14:56:26 <mdbooth> +1 diana_clarke
14:56:46 <PaulMurray> I didn't check if anyone had specific review requests or opens, but there was nothing on the agenda
14:56:49 <PaulMurray> so let's skip that
14:57:04 <PaulMurray> I think we need to wrap up
14:57:22 * PaulMurray is going to change the title of this meeting to "What mdbooth found wrong with nova this week"
14:57:27 <PaulMurray> :)
14:57:39 * mdbooth is honoured
14:57:40 <PaulMurray> I put an action for reviews
14:57:49 <PaulMurray> time to end
14:57:52 <PaulMurray> bye all
14:58:01 <PaulMurray> #endmeeting
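(A rough sketch of the kind of check mriedem suggests above: resize a server, then ssh into the guest before and after to verify the disk actually grew. It is not a real tempest test; it uses python-novaclient and paramiko directly, and the credentials, server name, flavor, floating IP and guest login are all placeholders.)

    import time

    import paramiko
    from novaclient import client

    # Placeholder credentials and endpoint.
    nova = client.Client('2.1', 'demo', 'secret', 'demo',
                         'http://controller:5000/v2.0')


    def guest_disk_bytes(ip, device='/dev/vda'):
        """Return the size of a block device as seen from inside the guest."""
        ssh = paramiko.SSHClient()
        ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        ssh.connect(ip, username='cirros', password='cubswin:)')  # placeholder login
        _, stdout, _ = ssh.exec_command('sudo blockdev --getsize64 %s' % device)
        size = int(stdout.read().strip())
        ssh.close()
        return size


    def wait_for_status(server, status, timeout=300):
        for _ in range(timeout // 5):
            server = nova.servers.get(server.id)
            if server.status == status:
                return server
            time.sleep(5)
        raise RuntimeError('timed out waiting for %s' % status)


    server = nova.servers.find(name='resize-probe')   # pre-created test instance
    ip = '203.0.113.10'                               # its floating IP (placeholder)
    flavor = nova.flavors.find(name='m1.small')       # flavor with a larger root disk

    before = guest_disk_bytes(ip)
    nova.servers.resize(server, flavor)
    server = wait_for_status(server, 'VERIFY_RESIZE')
    nova.servers.confirm_resize(server)
    server = wait_for_status(server, 'ACTIVE')

    after = guest_disk_bytes(ip)
    assert after > before, 'root disk did not grow after resize'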