14:01:52 <tdurakov> #startmeeting Nova Live Migration
14:01:53 <openstack> Meeting started Tue Nov 1 14:01:52 2016 UTC and is due to finish in 60 minutes. The chair is tdurakov.
14:02:03 <tdurakov> hi everyone
14:02:55 <mrhillsman> o/
14:03:00 <raj_singh> o/
14:03:09 <tdurakov> #link https://wiki.openstack.org/wiki/Meetings/NovaLiveMigration#Agenda_for_next_meeting
14:03:13 <tdurakov> agenda
14:03:22 <mriedem> o/
14:03:30 <wznoinsk> o/
14:03:50 <tdurakov> let's start
14:03:54 <paul-carlton2> o/
14:03:55 <tdurakov> #topic CI
14:04:23 <tdurakov> raj_singh, pkoniszewski any updates on grenade job?
14:04:50 <raj_singh> Tempest patch is still not merged
14:04:58 <raj_singh> https://review.openstack.org/#/c/379638/
14:05:26 <raj_singh> I will try to address the comments and ping in QA folks
14:05:27 <mriedem> probably need to catch up with jordanP in openstack-qa at some point
14:05:34 <tdurakov> raj_singh: maybe it's worth to ask on #openstack-qa for feedback?
14:05:39 <tdurakov> mriedem: agree
14:05:39 <mriedem> my guess is the QA people are distracted from the summit
14:05:40 <raj_singh> yup
14:05:58 <raj_singh> will do
14:06:40 <tdurakov> raj_singh: please address/response to jordanP comments
14:06:46 <tdurakov> anything else?
14:06:57 <wznoinsk> one thing
14:07:05 <tdurakov> sure
14:07:26 <wznoinsk> I will not have time to work on upstreaming the nfv tests, I would appreciate some helping hands here
14:08:29 <tdurakov> wznoinsk: please start ml thread to find volunteers, it would be great to write down a test plan in this thread too
14:08:32 <wznoinsk> I mean, the ones ran by Intel NFV CI - http://intel-openstack-ci-logs.ovh/53/356553/12/check/tempest-dsvm-intel-nfv-xenial/0f5b1e7/testr_results.html.gz
14:08:48 <wznoinsk> ok, will follow-up on ML then
14:08:56 <tdurakov> thanks!
14:09:02 <tdurakov> ok, 2c from me
14:09:13 <tdurakov> https://review.openstack.org/#/c/389546/31 - ceph bits
14:09:20 <tdurakov> it works
14:09:46 <tdurakov> mriedem: I've responded to your comments and resubmitted patches
14:10:08 <tdurakov> one thing to mention http://logs.openstack.org/82/389582/26/check/gate-tempest-dsvm-multinode-live-migration-ubuntu-xenial/975c897/
14:10:30 <tdurakov> gates rarely marks job with post-failure
14:10:52 <tdurakov> I've already asked folks on #openstack-infra, it should be ansible issue
14:11:25 <tdurakov> mriedem: taking this into account, could we merge it now? or should wait?
14:11:35 <mriedem> what do you mean by post-failure?
14:11:46 <tdurakov> it looks like issue doesn't related with the change itself
14:11:59 <mriedem> oh the status on the job was literally POST_FAILURE?
14:12:14 <mriedem> i see
14:12:15 <mriedem> No such file or directory (2)\nrsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1183) [sender=3.1.0]\n", "rc": 23}
14:12:18 <tdurakov> yes
14:12:24 <mriedem> yeah, failure to upload logs usually
14:12:28 <mriedem> or some ssh failure
14:12:32 <mriedem> those are known issues
14:12:39 <mriedem> just have to recheck the job
14:12:44 <mriedem> *recheck the patch
14:12:46 <tdurakov> right
14:13:00 <mriedem> we don't index the ansible logs in logstash yet, else we could track those failures better
14:13:01 <tdurakov> if it's known issue, I believe there are no blockers
14:13:19 <mriedem> oo speaking of https://review.openstack.org/#/c/351269/
14:13:20 <mriedem> i forgot i had that
14:14:04 <mriedem> anyway, ignore
14:14:08 <tdurakov> starred that^
14:14:10 <mriedem> i'll look at the ceph change later today
14:14:26 <tdurakov> thanks, everyone are welcome to do the same
14:14:42 <tdurakov> ok, let's move on
14:15:04 <tdurakov> #topic
14:15:06 <tdurakov> Bugs
14:15:10 <tdurakov> #topic Bugs
14:15:13 <tdurakov> sorry
14:15:33 <tdurakov> https://review.openstack.org/#/c/338929/ - want to bring this one
14:16:15 <johnthetubaguy> ooh, interesting
14:16:32 <tdurakov> the whole idea is ok
14:16:59 <tdurakov> but I think it would be better to put such data in migrate_data object instead
14:17:04 <tdurakov> what do you think?
14:17:42 <johnthetubaguy> the problem with the migrate object is the per hypervisor-ness, but I get your point
14:17:57 <johnthetubaguy> something that might change this, is the fact we are moving to store the connection info inside Cinder, longer term
14:18:35 <tdurakov> I could rephrase, I do not want to extend rpc api param list for all that methods
14:18:39 <johnthetubaguy> I guess I need to dig a bit deeper on why we actually need to pass that
14:18:44 <paul-carlton2> see https://review.openstack.org/#/c/389608, I've added bdm pre state to mig info in there
14:19:14 <paul-carlton2> I have some bugs to discuss
14:20:02 <paul-carlton2> I submitted a revert for https://bugs.launchpad.net/nova/+bug/1614019, the fix merged but breaks live migration
14:20:03 <openstack> Launchpad bug 1614019 in OpenStack Compute (nova) "Instances lose its serial ports during soft-reboot after live-migration" [Undecided,Fix released] - Assigned to sahid (sahid-ferdjaoui)
14:20:03 <tdurakov> johnthetubaguy: could you leave a comment on that change please?
14:20:23 <johnthetubaguy> tdurakov: that would require me to decide what is best, but I could
14:20:46 <johnthetubaguy> s/could/should
14:20:50 <tdurakov> :)
14:21:01 <tdurakov> paul-carlton2: will discuss it a bit later
14:21:25 <paul-carlton2> ok
14:21:41 <tdurakov> another important change/fix https://review.openstack.org/#/c/389687/
14:22:18 <tdurakov> imo it's ok to change rpc from cast to call here, just wanted everyone to be on the same page
14:22:52 <johnthetubaguy> calls can timeout of course, thats the usual issue
14:23:16 <tdurakov> johnthetubaguy: it's 'post' step
14:23:29 <tdurakov> on the other hand we have a race here right now
14:23:43 <johnthetubaguy> yeah, I think pkoniszewski mentioned this to me at one point
14:24:08 <tdurakov> so, from me it's better to timeout, rather than undetermined race
14:24:14 <johnthetubaguy> upgrade wise, I think this is safe enough...
14:24:56 <johnthetubaguy> I mean, you don't get the fix till after the upgrade, but thats what you would expect
14:25:03 <johnthetubaguy> it could be backported I guess
14:25:14 <tdurakov> good point on upgrades tbh
14:26:03 <tdurakov> yeah, destination should be upgraded by that moment
14:26:04 * mriedem runs to another meeting
14:26:34 <tdurakov> paul-carlton2: so:)
14:26:39 <tdurakov> your turn
14:26:49 <johnthetubaguy> tdurakov: thats probably the right thing to do
14:26:59 <paul-carlton2> it breaks live migration because after that hard reboot fails
14:27:47 <paul-carlton2> also https://review.openstack.org/#/c/339588/ still waiting for reviews
14:27:53 <johnthetubaguy> tdurakov: can you test that with a VM that has 15 volumes attached, btw?
14:28:40 <tdurakov> paul-carlton2: will take a look after the meeting
14:28:47 <tdurakov> johnthetubaguy: which one?
14:28:56 <paul-carlton2> and https://bugs.launchpad.net/nova/+bug/1633033, I have an attempt at a fix but it is not working!
14:28:57 <openstack> Launchpad bug 1633033 in OpenStack Compute (nova) "live migration with encrypted volume fails" [Undecided,In progress] - Assigned to Paul Carlton (paul-carlton2)
14:29:01 <johnthetubaguy> tdurakov: commented on the patch changing call to cast
14:29:16 <johnthetubaguy> tdurakov: as a side note, seems like we need to update our section in https://etherpad.openstack.org/p/ocata-nova-priorities-tracking
14:30:09 <tdurakov> johnthetubaguy: ok, will do
14:30:23 <tdurakov> paul-carlton2: about the bug above
14:31:05 <tdurakov> any thoughts on potential fix for that?
14:32:24 <paul-carlton2> seems maybe it needs an os-brick fix, should I ask in cinder room?
14:32:43 <tdurakov> paul-carlton2: I think yes
14:33:01 <tdurakov> offtop https://github.com/ansible/ansible/issues/18281 - ansible bug for the post_failure
14:33:20 <paul-carlton2> thanks, got to go now, meeting
14:33:27 <johnthetubaguy> paul-carlton2: it feels like your fix is also needed for migration?
14:35:11 <tdurakov> ok, let's go next
14:35:17 <tdurakov> #topic specs
14:35:27 <tdurakov> any updates on the specs for live-migration
14:36:23 <tdurakov> well, it seems like folks are still returning from Barcelona, will try to reach them after the meeting
14:36:36 <tdurakov> #topic Open discussion
14:37:05 <tdurakov> pkoniszewski: are you around?
14:37:21 <tdurakov> https://review.openstack.org/#/c/292826/ - wanted to update status on this chain
14:38:11 <tdurakov> another thing https://review.openstack.org/#/c/274097 - devref for nova-neutron communication during live-migration
14:38:38 <tdurakov> johnthetubaguy: you might be interested^
14:39:11 <johnthetubaguy> yeah, thats on my review list
14:39:25 <johnthetubaguy> given the discussions to totally re-write that flow
14:40:04 <tdurakov> I've shared diagrams for that on one of the previous meetings
14:40:15 <tdurakov> working on this part
14:40:23 <tdurakov> so, anything else to bring up?
14:40:35 <johnthetubaguy> mdbooth: are you still working on this post copy bug, I think you were in the design summit session where I mentioned the quick fix for that? https://bugs.launchpad.net/nova/+bug/1605016
14:40:37 <openstack> Launchpad bug 1605016 in OpenStack Compute (nova) "Post copy live migration interrupts network connectivity" [High,In progress] - Assigned to Matthew Booth (mbooth-9)
14:40:57 <johnthetubaguy> I should follow up on that from your comment
14:42:10 <mdbooth> johnthetubaguy: Not really, tbh.
14:42:39 <mdbooth> johnthetubaguy: I was working on it while other stuff was stalled, but I'd be very happy for somebody else to take it from me.
14:43:32 <tdurakov> mdbooth: please update assignee and status in the lp
14:43:51 <tdurakov> mdbooth: will try to find anyone interested in fixing that
14:44:18 <mdbooth> i still have context on it, though. I spoke to pavel(?) about it last week.
14:44:46 <mdbooth> Incidentally, I still dispute the 'High' importance. I believe it's of limited consequence in practise.
14:46:12 <tdurakov> mdbooth: let's discuss that next week then, once more people be there
14:46:38 <mdbooth> Should be fixed, of course, but probably nobody will notice.
14:47:45 <tdurakov> need to re-read bug, will comment it
14:47:51 <tdurakov> anything else?
14:48:53 <tdurakov> thanks everyone for coming
14:49:23 <tdurakov> #endmeeting