14:01:52 <tdurakov> #startmeeting Nova Live Migration
14:01:53 <openstack> Meeting started Tue Nov  1 14:01:52 2016 UTC and is due to finish in 60 minutes.  The chair is tdurakov. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:54 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:56 <openstack> The meeting name has been set to 'nova_live_migration'
14:02:03 <tdurakov> hi everyone
14:02:55 <mrhillsman> o/
14:03:00 <raj_singh> o/
14:03:09 <tdurakov> #link https://wiki.openstack.org/wiki/Meetings/NovaLiveMigration#Agenda_for_next_meeting
14:03:13 <tdurakov> agenda
14:03:22 <mriedem> o/
14:03:30 <wznoinsk> o/
14:03:50 <tdurakov> let's start
14:03:54 <paul-carlton2> o/
14:03:55 <tdurakov> #topic CI
14:04:23 <tdurakov> raj_singh, pkoniszewski any updates on grenade job?
14:04:50 <raj_singh> Tempest patch is still not merged
14:04:58 <raj_singh> https://review.openstack.org/#/c/379638/
14:05:26 <raj_singh> I will try to address the comments and ping in QA folks
14:05:27 <mriedem> probably need to catch up with jordanP in openstack-qa at some point
14:05:34 <tdurakov> raj_singh: maybe it's worth asking on #openstack-qa for feedback?
14:05:39 <tdurakov> mriedem: agree
14:05:39 <mriedem> my guess is the QA people are distracted from the summit
14:05:40 <raj_singh> yup
14:05:58 <raj_singh> will do
14:06:40 <tdurakov> raj_singh: please address/respond to jordanP's comments
14:06:46 <tdurakov> anything else?
14:06:57 <wznoinsk> one thing
14:07:05 <tdurakov> sure
14:07:26 <wznoinsk> I will not have time to work on upstreaming the nfv tests, I would appreciate some helping hands here
14:08:29 <tdurakov> wznoinsk: please start ml thread to find volunteers, it would be great to write down a test plan in this thread too
14:08:32 <wznoinsk> I mean, the ones run by Intel NFV CI - http://intel-openstack-ci-logs.ovh/53/356553/12/check/tempest-dsvm-intel-nfv-xenial/0f5b1e7/testr_results.html.gz
14:08:48 <wznoinsk> ok, will follow-up on ML then
14:08:56 <tdurakov> thanks!
14:09:02 <tdurakov> ok, 2c from me
14:09:13 <tdurakov> https://review.openstack.org/#/c/389546/31 - ceph bits
14:09:20 <tdurakov> it works
14:09:46 <tdurakov> mriedem: I've responded to your comments and resubmitted patches
14:10:08 <tdurakov> one thing to mention http://logs.openstack.org/82/389582/26/check/gate-tempest-dsvm-multinode-live-migration-ubuntu-xenial/975c897/
14:10:30 <tdurakov> the gate rarely marks the job with post-failure
14:10:52 <tdurakov> I've already asked folks on #openstack-infra, it should be an ansible issue
14:11:25 <tdurakov> mriedem: taking this into account, could we merge it now? or should we wait?
14:11:35 <mriedem> what do you mean by post-failure?
14:11:46 <tdurakov> it looks like the issue isn't related to the change itself
14:11:59 <mriedem> oh the status on the job was literally POST_FAILURE?
14:12:14 <mriedem> i see
14:12:15 <mriedem> No such file or directory (2)\nrsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1183) [sender=3.1.0]\n", "rc": 23}
14:12:18 <tdurakov> yes
14:12:24 <mriedem> yeah, failure to upload logs usually
14:12:28 <mriedem> or some ssh failure
14:12:32 <mriedem> those are known issues
14:12:39 <mriedem> just have to recheck the job
14:12:44 <mriedem> *recheck the patch
14:12:46 <tdurakov> right
14:13:00 <mriedem> we don't index the ansible logs in logstash yet, else we could track those failures better
14:13:01 <tdurakov> if it's a known issue, I believe there are no blockers
14:13:19 <mriedem> oo speaking of https://review.openstack.org/#/c/351269/
14:13:20 <mriedem> i forgot i had that
14:14:04 <mriedem> anyway, ignore
14:14:08 <tdurakov> starred that^
14:14:10 <mriedem> i'll look at the ceph change later today
14:14:26 <tdurakov> thanks, everyone is welcome to do the same
14:14:42 <tdurakov> ok, let's move on
14:15:04 <tdurakov> #topic
14:15:06 <tdurakov> Bugs
14:15:10 <tdurakov> #topic Bugs
14:15:13 <tdurakov> sorry
14:15:33 <tdurakov> https://review.openstack.org/#/c/338929/ - want to bring this one up
14:16:15 <johnthetubaguy> ooh, interesting
14:16:32 <tdurakov> the whole idea is ok
14:16:59 <tdurakov> but I think it would be better to put such data in migrate_data object instead
14:17:04 <tdurakov> what do you think?
14:17:42 <johnthetubaguy> the problem with the migrate object is the per hypervisor-ness, but I get your point
14:17:57 <johnthetubaguy> something that might change this, is the fact we are moving to store the connection info inside Cinder, longer term
14:18:35 <tdurakov> I could rephrase, I do not want to extend the rpc api param list for all those methods
14:18:39 <johnthetubaguy> I guess I need to dig a bit deeper on why we actually need to pass that
14:18:44 <paul-carlton2> see https://review.openstack.org/#/c/389608, I've added bdm pre state to mig info in there
14:19:14 <paul-carlton2> I have some bugs to discuss
14:20:02 <paul-carlton2> I submitted a revert for https://bugs.launchpad.net/nova/+bug/1614019, the fix merged but breaks live migration
14:20:03 <openstack> Launchpad bug 1614019 in OpenStack Compute (nova) "Instances lose its serial ports during soft-reboot after live-migration" [Undecided,Fix released] - Assigned to sahid (sahid-ferdjaoui)
14:20:03 <tdurakov> johnthetubaguy: could you leave a comment on that change please?
14:20:23 <johnthetubaguy> tdurakov: that would require me to decide what is best, but I could
14:20:46 <johnthetubaguy> s/could/should
14:20:50 <tdurakov> :)
14:21:01 <tdurakov> paul-carlton2: will discuss it a bit later
14:21:25 <paul-carlton2> ok
14:21:41 <tdurakov> another important change/fix https://review.openstack.org/#/c/389687/
14:22:18 <tdurakov> imo it's ok to change rpc from cast to call here, just wanted everyone to be on the same page
14:22:52 <johnthetubaguy> calls can timeout of course, thats the usual issue
14:23:16 <tdurakov> johnthetubaguy: it's 'post' step
14:23:29 <tdurakov> on the other hand we have a race here right now
14:23:43 <johnthetubaguy> yeah, I think pkoniszewski mentioned this to me at one point
14:24:08 <tdurakov> so, for me it's better to time out, rather than have a nondeterministic race
14:24:14 <johnthetubaguy> upgrade wise, I think this is safe enough...
14:24:56 <johnthetubaguy> I mean, you don't get the fix till after the upgrade, but thats what you would expect
14:25:03 <johnthetubaguy> it could be backported I guess
14:25:14 <tdurakov> good point on upgrades tbh
14:26:03 <tdurakov> yeah, destination should be upgraded by that moment
14:26:04 * mriedem runs to another meeting
14:26:34 <tdurakov> paul-carlton2: so:)
14:26:39 <tdurakov> your turn
14:26:49 <johnthetubaguy> tdurakov: thats probably the right thing to do
14:26:59 <paul-carlton2> it breaks live migration because after that hard reboot fails
14:27:47 <paul-carlton2> also https://review.openstack.org/#/c/339588/ still waiting for reviews
14:27:53 <johnthetubaguy> tdurakov: can you test that with a VM that has 15 volumes attached, btw?
14:28:40 <tdurakov> paul-carlton2: will take a look after the meeting
14:28:47 <tdurakov> johnthetubaguy: which one?
14:28:56 <paul-carlton2> and https://bugs.launchpad.net/nova/+bug/1633033, I have an attempt at a fix but it is not working!
14:28:57 <openstack> Launchpad bug 1633033 in OpenStack Compute (nova) "live migration with encrypted volume fails" [Undecided,In progress] - Assigned to Paul Carlton (paul-carlton2)
14:29:01 <johnthetubaguy> tdurakov: commented on the patch changing cast to call
14:29:16 <johnthetubaguy> tdurakov: as a side note, seems like we need to update our section in https://etherpad.openstack.org/p/ocata-nova-priorities-tracking
14:30:09 <tdurakov> johnthetubaguy: ok, will do
14:30:23 <tdurakov> paul-carlton2: about the bug above
14:31:05 <tdurakov> any thoughts on potential fix for that?
14:32:24 <paul-carlton2> seems maybe it needs an os-brick fix, should I ask in cinder room?
14:32:43 <tdurakov> paul-carlton2: I think yes
14:33:01 <tdurakov> offtop https://github.com/ansible/ansible/issues/18281 - ansible bug for the post_failure
14:33:20 <paul-carlton2> thanks, got to go now, meeting
14:33:27 <johnthetubaguy> paul-carlton2: it feels like your fix is also needed for migration?
14:35:11 <tdurakov> ok, let's go next
14:35:17 <tdurakov> #topic specs
14:35:27 <tdurakov> any updates on the specs for live-migration
14:36:23 <tdurakov> well, it seems like folks are still returning from Barcelona, will try to reach them after the meeting
14:36:36 <tdurakov> #topic Open discussion
14:37:05 <tdurakov> pkoniszewski: are you around?
14:37:21 <tdurakov> https://review.openstack.org/#/c/292826/ - wanted to update status on this chain
14:38:11 <tdurakov> another thing https://review.openstack.org/#/c/274097 - devref for nova-neutron communication during live-migration
14:38:38 <tdurakov> johnthetubaguy: you might be interested^
14:39:11 <johnthetubaguy> yeah, thats on my review list
14:39:25 <johnthetubaguy> given the discussions to totally re-write that flow
14:40:04 <tdurakov> I've shared diagrams for that in one of the previous meetings
14:40:15 <tdurakov> working on this part
14:40:23 <tdurakov> so, anything else to bring up?
14:40:35 <johnthetubaguy> mdbooth: are you still working on this post copy bug, I think you were in the design summit session where I mentioned the quick fix for that? https://bugs.launchpad.net/nova/+bug/1605016
14:40:37 <openstack> Launchpad bug 1605016 in OpenStack Compute (nova) "Post copy live migration interrupts network connectivity" [High,In progress] - Assigned to Matthew Booth (mbooth-9)
14:40:57 <johnthetubaguy> I should follow up on that from your comment
14:42:10 <mdbooth> johnthetubaguy: Not really, tbh.
14:42:39 <mdbooth> johnthetubaguy: I was working on it while other stuff was stalled, but I'd be very happy for somebody else to take it from me.
14:43:32 <tdurakov> mdbooth: please update assignee and status in the lp
14:43:51 <tdurakov> mdbooth: will try to find anyone interested in fixing that
14:44:18 <mdbooth> i still have context on it, though. I spoke to pavel(?) about it last week.
14:44:46 <mdbooth> Incidentally, I still dispute the 'High' importance. I believe it's of limited consequence in practice.
14:46:12 <tdurakov> mdbooth: let's discuss that next week then, once more people are there
14:46:38 <mdbooth> Should be fixed, of course, but probably nobody will notice.
14:47:45 <tdurakov> need to re-read the bug, will comment on it
14:47:51 <tdurakov> anything else?
14:48:53 <tdurakov> thanks everyone for coming
14:49:23 <tdurakov> #endmeeting