14:01:05 <PaulMurray> #startmeeting Nova Live Migration
14:01:06 <openstack> Meeting started Tue Feb  9 14:01:05 2016 UTC and is due to finish in 60 minutes.  The chair is PaulMurray. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:07 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:10 <openstack> The meeting name has been set to 'nova_live_migration'
14:01:16 <tdurakov> o/
14:01:26 <PaulMurray> hi - who is here?
14:01:29 <pkoniszewski> o/
14:01:29 <mdbooth> o/
14:01:34 <jlanoux> hi
14:01:34 <andrearosa> hi
14:02:28 <paul-carlton2> o/
14:02:45 <PaulMurray> the agenda is here: https://wiki.openstack.org/wiki/Meetings/NovaLiveMigration
14:03:19 <PaulMurray> #topic Next meeting
14:03:29 <PaulMurray> I will not be here next week
14:03:47 <PaulMurray> is there someone willing to be chair for the meeting on 16th Feb?
14:03:52 <tdurakov> I could
14:04:04 <PaulMurray> thank you - I will put you down
14:04:08 <tdurakov> sure
14:04:13 <PaulMurray> #info tdurakov to chair meeting on 16th Feb
14:04:22 <PaulMurray> great, that's sorted
14:04:28 <pkoniszewski> in case there are some troubles I will also be able to cover
14:04:46 <tdurakov> cool, will ping you if any happens
14:04:49 <PaulMurray> ok - thanks pkoniszewski
14:04:52 <PaulMurray> #topic Priority reviews
14:05:11 <PaulMurray> the feature freeze is three weeks away I think
14:05:20 <PaulMurray> but we are starting to get some movement
14:05:34 <PaulMurray> split-network-plane-for-live-migration is complete
14:05:54 <PaulMurray> pause-vm-during-live-migration only needs +W on last patch
14:06:28 <pkoniszewski> pausing needs API approval
14:06:33 <PaulMurray> abort-live-migration has started to get some patches up (well, one)
14:06:52 <andrearosa> ...yes, but I am working on the big one
14:07:19 <PaulMurray> pkoniszewski, mikal asked for alex_xu or similar to approve - I think it will be done pretty soon
14:07:52 <PaulMurray> block-live-migrate-with-attached-volumes is making good progress
14:08:03 <PaulMurray> dan +2'd the first patch
14:08:27 <PaulMurray> I think pkoniszewski is going to update second
14:08:32 <PaulMurray> pkoniszewski, ?
14:08:53 <PaulMurray> #link https://review.openstack.org/#/c/234659 - needs minor revision
14:09:03 <pkoniszewski> so actually I'm working on the second patch, as I'm not sure that there is an issue
14:09:29 <PaulMurray> is that the one ^^
14:09:43 <pkoniszewski> I'm sure that we want to skip the config drive ISO, because the destination host will try to write to a readonly device, which is not a good idea at all
14:10:02 <pkoniszewski> now I'm checking what happens if other readonly device is attached to VM
14:10:14 <pkoniszewski> which is not config drive
14:10:34 <PaulMurray> hmmm, did no one think of that?
14:10:53 <PaulMurray> we used to migrate the config drive in HP
14:11:06 <PaulMurray> I think we did it differently though - paul-carlton2 can you remember?
14:11:26 <tdurakov> PaulMurray, was it block-migrate?
14:11:44 <PaulMurray> tdurakov, that's what I am trying to remember
14:11:46 <paul-carlton2> We copied config drive over rcp
14:11:58 <tdurakov> prior to migration?
14:12:04 <paul-carlton2> yep
14:12:05 <PaulMurray> right, which we don't want to do
14:12:15 <paul-carlton2> same for rescue libvirt xml
14:12:29 <paul-carlton2> nope, not the upstream way forward
14:13:12 <PaulMurray> I thought qemu could handle vfat config drives? so it must have worked at some point?
14:13:23 <pkoniszewski> well, config drive works fine
14:13:28 <pkoniszewski> in current implementation
14:13:53 <paul-carlton2> ah, now I recall, we made config drives vfat which copied ok
14:14:08 <tdurakov> ah
14:14:08 <pkoniszewski> right, they are not readonly
14:14:22 <PaulMurray> I see
14:14:28 <tdurakov> paul-carlton2, they work fine
14:14:48 <paul-carlton2> Yep we got it to work, but using a local patch
14:14:51 <paul-carlton2> I think
14:15:16 <paul-carlton2> can't libvirt copy config drives?
14:15:19 <tdurakov> afair vfat works well upstream too for block-migrate
14:15:32 <PaulMurray> pkoniszewski, so just to check - vfat config drives already migrate - its only the iso format that is blocked
14:15:44 <tdurakov> so the issue affects only iso9660
14:15:52 <PaulMurray> that's why that readonly conditional was there
14:16:06 <PaulMurray> the one danpb commented on
14:16:31 <paul-carlton2> surely at the libvirt/qemu level it should be able to copy any disk device it is told to
14:16:52 <paul-carlton2> otherwise a readonly device could not be created?
14:17:07 <PaulMurray> qemu doesn't create it
14:18:06 <PaulMurray> pkoniszewski, what did it mean when iso9660 support was added
14:18:22 <pkoniszewski> let me find libvirt release note for this
14:18:45 <PaulMurray> ok - lets move this discussion to the patch and move on
14:18:54 <PaulMurray> or we wont finish the meeting
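For context, a minimal sketch of the check under discussion, with invented names rather than the actual nova.virt.libvirt code: an ISO 9660 config drive is read-only on the destination, so block migration must skip it, while a vfat config drive is writable and copies fine.

    # Illustrative sketch only; names and structure are assumptions,
    # not the real Nova libvirt driver code.

    ISO9660 = 'iso9660'
    VFAT = 'vfat'

    def should_block_migrate(disk_info):
        """Decide whether a disk should be copied during block migration.

        disk_info is assumed to be a dict like:
            {'path': '/var/lib/nova/instances/<uuid>/disk.config',
             'format': 'iso9660', 'is_config_drive': True}
        """
        if disk_info.get('is_config_drive') and disk_info['format'] == ISO9660:
            # An ISO 9660 config drive is read-only; the destination
            # host cannot write blocks into it, so it must be skipped
            # and regenerated or fetched out of band instead.
            return False
        # vfat config drives (and ordinary writable disks) copy fine.
        return True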
14:19:26 <PaulMurray> the next spec was live-migration-progress-report
14:19:33 <PaulMurray> http://lists.openstack.org/pipermail/openstack-dev/2016-February/085662.html
14:19:54 <PaulMurray> did this ML thread resolve the write to db question?
14:20:02 <pkoniszewski> i think so
14:20:20 <PaulMurray> I wasn't sure if there was an actual conclusion
14:20:45 <tdurakov> heh, I'd prefer to do this reporting online instead of db writes)
14:20:53 <tdurakov> still
14:21:12 <pkoniszewski> I think that we will be able to change this once migration progress is saved on compute side
14:21:41 <johnthetubaguy> what do you mean by "saved on the compute side"?
14:22:09 <paul-carlton2> tdurakov, I disagree - the compute process should update the db and api calls should get it from there
14:22:29 <tdurakov> johnthetubaguy, it's about conductor and compute communication refactoring, i believe
14:22:32 <paul-carlton2> that is how it is done everywhere in openstack, certainly in nova
14:22:39 <pkoniszewski> johnthetubaguy: during the midcycle Nikola had some concerns that in case the compute dies it will need to have the data saved locally
14:22:51 <pkoniszewski> johnthetubaguy: to recover migration process/cleanup/whatever
14:23:03 <johnthetubaguy> yes, thats in addition, not instead of
14:23:20 <paul-carlton2> That is a separate issue to the progress reporting
14:23:27 <pkoniszewski> thought we can use it for reporting progress online
14:23:40 <johnthetubaguy> anyways, I think we should keep the nova architecture the same here, at least until we find a real issue folks are hitting
14:23:55 <mdbooth> Long term, the db doesn't seem like the best place for this. If the compute is eventually going to have this information, api can directly query the compute because it has an rpc address.
14:23:56 <johnthetubaguy> pkoniszewski: in summary, we can't, the API reads from the DB only
14:24:06 <pkoniszewski> johnthetubaguy: got it
14:24:10 <johnthetubaguy> yeah, lets not go there for now
14:24:32 <pkoniszewski> that's reasonable, i'd stick to writing progress every 5 seconds
14:24:48 <tdurakov> ok, let's move on then
14:24:50 <pkoniszewski> but the current implementation does not reset it anywhere
14:25:06 <pkoniszewski> so i have completed migrations that still show disk and memory left to be transferred, which is weird
14:25:41 <johnthetubaguy> the API shouldn't be showing progress info for not in progress things, so I am not sure its a big issue
14:26:03 <pkoniszewski> well, currently it can show that VM is paused with a progress of 92%
14:26:07 <rdopiera> johnthetubaguy: how about for things that were interrupted?
14:27:13 <PaulMurray> a migration should be running, completed, cancelled or in error - only one of those needs a progress report?
14:27:35 <PaulMurray> I'd count paused as running
14:27:42 <johnthetubaguy> rdopiera: same, once the interruption is spotted, but really I just mean we can clean up that stuff later
14:28:02 <johnthetubaguy> PaulMurray: that sounds correct
14:28:08 <rdopiera> johnthetubaguy: it could be useful to know where it finished for cleaning up
14:28:20 <johnthetubaguy> rdopiera: maybe
14:28:25 <rdopiera> johnthetubaguy: where it was interrupted, I mean
14:28:30 <pkoniszewski> rdopiera: +1
14:28:39 <PaulMurray> rdopiera, isn't that covered by its state?
14:28:46 <paul-carlton2> the current monitor code dan wrote calculates the progress every 5 seconds, so it will be updated
14:28:56 <johnthetubaguy> PaulMurray: non-zero progress is an interesting data point though
14:28:57 <rdopiera> PaulMurray: the state is very coarse grained
14:29:17 <paul-carlton2> progress should be reset to zero when migration ends, be it good or bad outcome
14:30:22 <johnthetubaguy> paul-carlton2: sure, but the API should not report any progress unless its running, so its a nit really
14:30:37 <PaulMurray> johnthetubaguy, pkoniszewski so I think we concluded to write every 5 seconds
14:31:08 <PaulMurray> lets move on
14:31:15 <paul-carlton2> The api, as in nova show, reports progress all the time
14:31:26 <PaulMurray> I mentioned auto-converge on the agenda
14:31:40 <paul-carlton2> could be zero, should be zero unless live mig or snapshot in progress
14:31:41 <PaulMurray> but I think that is out of scope because it didn't have an approved spec
14:32:05 <pkoniszewski> PaulMurray: and it needs newer libvirt, even for a config flag, so let's skip it
14:32:29 <PaulMurray> pkoniszewski, right, it was on the priority reviews pad so I'm going to remove it
14:32:32 <PaulMurray> just got confused
14:32:48 <pkoniszewski> sure, go ahead, i forgot to remove it
14:32:55 <PaulMurray> #topic Bugs
14:33:14 <paul-carlton2> if we are referring to the new data counters, they should be left as-is on completion, since it may be useful to see how far it got when it stopped/completed and how much data it copied
14:33:28 <PaulMurray> paul-carlton2, we've moved on
14:33:35 <johnthetubaguy> paul-carlton2: yeah, thats what I mean API returns zero if not in progress, regardless of DB, but yeah, lets move on
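A rough sketch of the behaviour converged on above, with assumed names rather than the real monitor code: persist progress to the migration record about every five seconds while the job runs, and zero the progress field on any terminal state, so the API never reports stale values for a finished migration (the gap pkoniszewski noted).

    import time

    WRITE_INTERVAL = 5  # seconds, as agreed above

    def monitor_migration(migration, get_job_progress, is_running):
        """Hypothetical progress loop; 'migration' stands in for the
        Nova Migration DB object, get_job_progress/is_running for
        libvirt job queries. None of these names are the real API."""
        last_write = 0.0
        while is_running():
            now = time.monotonic()
            if now - last_write >= WRITE_INTERVAL:
                migration.progress = get_job_progress()  # 0-100
                migration.save()  # one DB write per interval, not per tick
                last_write = now
            time.sleep(0.5)
        # Reset on completion/cancel/error so nothing downstream shows
        # e.g. "paused at 92%" for a migration that already ended.
        migration.progress = 0
        migration.save()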
14:34:00 <PaulMurray> there are 4 new bugs for triage
14:34:03 <PaulMurray> https://bugs.launchpad.net/nova/+bugs?field.tag=live-migration+&field.status%3Alist=NEW
14:34:21 <PaulMurray> I don't know who is doing the triage but well done to them :)
14:34:47 <PaulMurray> I moved some of the bug patches around in the priorities pad
14:34:52 <PaulMurray> because several were stale
14:35:14 <PaulMurray> the rest have had some review and updates recently
14:35:38 <PaulMurray> Does anyone have any questions ?
14:36:00 <PaulMurray> johnthetubaguy, do you have any comments on bugs - we don't do a great job of turning those over
14:36:08 <tdurakov> about this one: https://bugs.launchpad.net/nova/+bug/1526642 - it's possible to break anti-affinity rules by providing a dest host, right?
14:36:10 <openstack> Launchpad bug 1526642 in OpenStack Compute (nova) "Simultaneous live migrations break anti-affinity policy" [Undecided,New]
14:36:35 <johnthetubaguy> PaulMurray: only that once reviews are ready, lets get them in that etherpad, like the other patches
14:36:35 <pkoniszewski> tdurakov: no, scheduler breaks anti-affinity policy
14:36:49 <mdbooth> bauzas: ^^^
14:36:50 <johnthetubaguy> PaulMurray: having a set of stuff you want in mitaka would be no bad thing
14:36:54 <pkoniszewski> tdurakov: in case of parallel scheduling it might cause rescheduling and instances are spawned on the same host
14:36:56 <PaulMurray> anti-affinity is checked for boot
14:37:00 <PaulMurray> in claims code
14:37:12 <PaulMurray> so if you get a claim you got anti-affinity
14:37:17 <PaulMurray> its easier than affinity
14:37:20 <johnthetubaguy> PaulMurray: there were a heap of bugs in the tracker that seemed to be fixed by upcoming blueprints, those are worth commenting on, and maybe changing the status somehow
14:37:28 <tdurakov> pkoniszewski, if you have an aa policy and manually pick a dest node, that breaks the rules?
14:37:41 <pkoniszewski> tdurakov: if you force, yes
14:37:53 <PaulMurray> I think force is allowed to break rules
14:37:57 <PaulMurray> it overrides
14:38:08 <johnthetubaguy> yes, thinking about migrating one host to a new host
14:38:13 <PaulMurray> there will be a check-destinations feature coming
14:38:13 <tdurakov> yep, it doesn't ask the scheduler
14:38:15 <johnthetubaguy> you need to move each VM one by one
14:38:31 <johnthetubaguy> so you actually have to break affinity = true rules while you do the move
14:38:45 * johnthetubaguy mutters something about great power and great responsibility
14:38:45 * bauzas lurks
14:38:50 <PaulMurray> johnthetubaguy, yes, true - I was thinking about that the other day
14:39:17 <PaulMurray> only moving whole host to whole host works for these kind of policies
14:39:21 <mdbooth> bauzas: Discussion was around anti-affinity during live-migration https://bugs.launchpad.net/nova/+bug/1526642
14:39:22 <openstack> Launchpad bug 1526642 in OpenStack Compute (nova) "Simultaneous live migrations break anti-affinity policy" [Undecided,New]
14:39:31 <johnthetubaguy> now eventually, we could do some fancy stuff to work around that, but lets just get it working first
14:39:32 <bauzas> so, server groups are checked in a semaphore in the compute manager AFAIR
14:39:46 <bauzas> actually, I wrote that in Kilo
14:39:48 <PaulMurray> bauzas, yes, in claims
14:40:21 <bauzas> PaulMurray: probably, I don't remember in which caller I made that
14:40:24 <johnthetubaguy> bauzas: force needs to break these rules quite a few times, and thats the idea, basically
14:40:49 <pkoniszewski> johnthetubaguy: it breaks rules even without force
14:41:13 <bauzas> mmm
14:41:35 <tdurakov> same for affinity rules
14:41:36 <bauzas> it would add another level of verification
14:41:40 <bauzas> but fair
14:41:52 <bauzas> for the moment, we don't provide the Spec object to the compute manager
14:42:03 <bauzas> so, there is no way for it to know whether it was forced or not
14:42:05 <bauzas> anyway
14:42:17 <bauzas> I'm discussing a non-implemented BP
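For reference, a simplified sketch of the kind of late re-check being discussed, with hypothetical names (the real check sits in the compute manager claims path): serializing on the server group means two parallel migrations of group members cannot both pass the anti-affinity test.

    import threading
    from collections import defaultdict

    _group_locks = defaultdict(threading.Lock)  # per-server-group locks

    class GroupAffinityError(Exception):
        pass

    def validate_anti_affinity(group_uuid, group_hosts, dest_host):
        """Hypothetical re-check on the destination compute.

        group_hosts: set of hosts currently used by members of the
        server group; mutated here to record the new claim.
        """
        with _group_locks[group_uuid]:
            if dest_host in group_hosts:
                # Another member already lives (or just claimed a slot)
                # on this host: reject rather than silently violating
                # the anti-affinity policy.
                raise GroupAffinityError(
                    "host %s already has a member of group %s"
                    % (dest_host, group_uuid))
            group_hosts.add(dest_host)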
14:42:45 <PaulMurray> we can look at this more later
14:43:07 <PaulMurray> anything else on bugs?
14:43:46 <PaulMurray> #topic open discussion
14:43:58 <PaulMurray> the floor is open for anything else?
14:44:04 <PaulMurray> I noticed there is movement
14:44:15 <mdbooth> Libvirt storage pools isn't going to happen by the end of the month
14:44:16 <PaulMurray> on ploop support for storage pools in libvirt
14:44:48 <mdbooth> I talked to paul-carlton2 the other day, and there's basically 1 preparatory patch I'd like to get in Mitaka
14:45:03 <mdbooth> Which would make it easier to migrate between backend storage schemes in Newton
14:45:15 <PaulMurray> do you have a link yet?
14:45:21 <PaulMurray> review
14:45:27 <PaulMurray> or is it coming
14:45:48 <mdbooth> I haven't written it, yet, because I was trying to get there in order. However, I may bodge something out of order for expediency.
14:46:19 <PaulMurray> we have less than 3 weeks I think
14:46:25 <mdbooth> Basically, we have a file, disk.info, which currently stores information about which file format a particular disk is using.
14:46:36 <mdbooth> I want to change it to be able to store more than that
14:46:44 <mdbooth> For example, which backend it's using.
14:46:59 <mdbooth> However, the file is on shared storage
14:47:07 <mdbooth> So it needs to be able to co-exist with N-1
14:47:25 <mdbooth> I'd like to change the parser in M to understand the format I intend to use in N
14:47:33 <mdbooth> And not barf
14:47:35 <johnthetubaguy> pkoniszewski: today, yes I think it does break that, thats what bauzas's blueprint is trying to fix
14:48:19 <mdbooth> In the meantime, I will continue trying to get there more cleanly.
14:48:22 <johnthetubaguy> mdbooth: why not start reporting the new stuff now, but only use the new data later?
14:48:42 <mdbooth> johnthetubaguy: It's a JSON file mapping disk -> format
14:48:58 <mdbooth> I want it to map disk -> {lots: of, new: things}
14:49:34 <mdbooth> Also, if I changed the format now, it would break compatibility with Liberty
14:49:43 <johnthetubaguy> mdbooth: oh, I see, you need the reader to support both before you use both
14:49:48 <mdbooth> If Liberty and Mitaka computes are on the same shared storage
14:49:55 <mdbooth> Yeah
14:49:56 <johnthetubaguy> mdbooth: thats testable quite easily, so that should be OK
14:50:37 <johnthetubaguy> mdbooth: or just add an additional file? and eventually delete the old one, but that does seem wasteful
14:50:54 <johnthetubaguy> don't want to hit filesystem limits because of that
14:50:59 <mdbooth> I hoped at one point that the file was redundant, but it turns out it's not
14:51:10 <mdbooth> So I figured I'd just co-opt it
14:51:41 <mdbooth> Also, it already took the best name ;)
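A minimal sketch of the forward-compatible reader mdbooth describes; the keys in the new per-disk dict are invented for illustration. The Mitaka-era parser would accept both the legacy mapping of disk path to format string and a richer Newton-style mapping of disk path to dict, instead of barfing.

    import json

    def read_disk_info(path):
        """Parse disk.info, tolerating both the legacy and future layouts.

        Legacy (Liberty/Mitaka):  {"/path/disk": "qcow2"}
        Future (Newton, assumed): {"/path/disk": {"format": "qcow2",
                                                  "backend": "storage-pool"}}
        Returns {disk_path: {"format": ..., ...}} either way.
        """
        with open(path) as f:
            raw = json.load(f)
        result = {}
        for disk, value in raw.items():
            if isinstance(value, str):
                # Legacy entry: bare format string.
                result[disk] = {'format': value}
            elif isinstance(value, dict):
                # Future entry: keep extra keys we don't understand yet,
                # so an N compute sharing storage with an M compute
                # doesn't lose information.
                result[disk] = dict(value)
            else:
                raise ValueError('unrecognised disk.info entry for %s' % disk)
        return result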
14:53:37 <PaulMurray> mdbooth, when you've figured out what you need to land in mitaka you can let us know and we will try and push it
14:53:57 <mdbooth> PaulMurray: Sure.
14:54:12 <PaulMurray> mdbooth, thanks
14:54:30 <PaulMurray> anything else?
14:54:52 <mdbooth> PaulMurray: Amongst other things, it will allow us to know which instances have been moved from the legacy scheme to libvirt storage pools.
14:55:53 <PaulMurray> mdbooth, are you going to negotiate that between hosts or does it end up in the db?
14:56:04 <PaulMurray> when it comes to a migration I mean?
14:56:12 <mdbooth> It's on disk on the compute.
14:56:39 <mdbooth> It uses instance storage, which makes most sense.
14:57:03 <mdbooth> The other advantage of using the existing file is that current code should expect it to exist.
14:57:27 <mdbooth> So fewer dependent changes.
14:57:46 <PaulMurray> I get it,
14:57:51 * mdbooth assumes this file would be copied over during non-shared storage migration, for eg
14:58:29 <PaulMurray> we're coming to the end of the meeting now
14:58:48 <paul-carlton> surely it would be created by the dest compute node, not copied?
14:59:04 <PaulMurray> tdurakov, or pkoniszewski if anything goes wrong for tdurakov
14:59:14 <mdbooth> paul-carlton: Wouldn't surprise me, but that would be a bug.
14:59:18 <PaulMurray> will chair next meeting ^^
14:59:24 <PaulMurray> we have to close
14:59:30 <PaulMurray> bye all - and thanks for coming
14:59:31 <andrearosa> bye
14:59:33 <tdurakov> bye
14:59:34 <PaulMurray> #endmeeting