14:01:52 #startmeeting Nova Live Migration
14:01:53 Meeting started Tue Nov 1 14:01:52 2016 UTC and is due to finish in 60 minutes. The chair is tdurakov. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:54 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:56 The meeting name has been set to 'nova_live_migration'
14:02:03 hi everyone
14:02:55 o/
14:03:00 o/
14:03:09 #link https://wiki.openstack.org/wiki/Meetings/NovaLiveMigration#Agenda_for_next_meeting
14:03:13 agenda
14:03:22 o/
14:03:30 o/
14:03:50 let's start
14:03:54 o/
14:03:55 #topic CI
14:04:23 raj_singh, pkoniszewski any updates on grenade job?
14:04:50 Tempest patch is still not merged
14:04:58 https://review.openstack.org/#/c/379638/
14:05:26 I will try to address the comments and ping the QA folks
14:05:27 probably need to catch up with jordanP in openstack-qa at some point
14:05:34 raj_singh: maybe it's worth asking on #openstack-qa for feedback?
14:05:39 mriedem: agree
14:05:39 my guess is the QA people are distracted from the summit
14:05:40 yup
14:05:58 will do
14:06:40 raj_singh: please address/respond to jordanP comments
14:06:46 anything else?
14:06:57 one thing
14:07:05 sure
14:07:26 I will not have time to work on upstreaming the nfv tests, I would appreciate some helping hands here
14:08:29 wznoinsk: please start ml thread to find volunteers, it would be great to write down a test plan in this thread too
14:08:32 I mean, the ones run by Intel NFV CI - http://intel-openstack-ci-logs.ovh/53/356553/12/check/tempest-dsvm-intel-nfv-xenial/0f5b1e7/testr_results.html.gz
14:08:48 ok, will follow-up on ML then
14:08:56 thanks!
14:09:02 ok, 2c from me
14:09:13 https://review.openstack.org/#/c/389546/31 - ceph bits
14:09:20 it works
14:09:46 mriedem: I've responded to your comments and resubmitted patches
14:10:08 one thing to mention http://logs.openstack.org/82/389582/26/check/gate-tempest-dsvm-multinode-live-migration-ubuntu-xenial/975c897/
14:10:30 the gate rarely marks the job with post-failure
14:10:52 I've already asked folks on #openstack-infra, it should be an ansible issue
14:11:25 mriedem: taking this into account, could we merge it now? or should we wait?
14:11:35 what do you mean by post-failure?
14:11:46 it looks like the issue isn't related to the change itself
14:11:59 oh the status on the job was literally POST_FAILURE?
14:12:14 i see
14:12:15 No such file or directory (2)\nrsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1183) [sender=3.1.0]\n", "rc": 23}
14:12:18 yes
14:12:24 yeah, failure to upload logs usually
14:12:28 or some ssh failure
14:12:32 those are known issues
14:12:39 just have to recheck the job
14:12:44 *recheck the patch
14:12:46 right
14:13:00 we don't index the ansible logs in logstash yet, else we could track those failures better
14:13:01 if it's a known issue, I believe there are no blockers
14:13:19 oo speaking of https://review.openstack.org/#/c/351269/
14:13:20 i forgot i had that
14:14:04 anyway, ignore
14:14:08 starred that^
14:14:10 i'll look at the ceph change later today
14:14:26 thanks, everyone is welcome to do the same
14:14:42 ok, let's move on
14:15:04 #topic
14:15:06 Bugs
14:15:10 #topic Bugs
14:15:13 sorry
14:15:33 https://review.openstack.org/#/c/338929/ - want to bring this one up
14:16:15 ooh, interesting
14:16:32 the whole idea is ok
14:16:59 but I think it would be better to put such data in migrate_data object instead
14:17:04 what do you think?
14:17:42 the problem with the migrate object is the per hypervisor-ness, but I get your point
14:17:57 something that might change this, is the fact we are moving to store the connection info inside Cinder, longer term
14:18:35 I could rephrase, I do not want to extend rpc api param list for all those methods
14:18:39 I guess I need to dig a bit deeper on why we actually need to pass that
14:18:44 see https://review.openstack.org/#/c/389608, I've added bdm pre state to mig info in there
14:19:14 I have some bugs to discuss
14:20:02 I submitted a revert for https://bugs.launchpad.net/nova/+bug/1614019, the fix merged but breaks live migration
14:20:03 Launchpad bug 1614019 in OpenStack Compute (nova) "Instances lose its serial ports during soft-reboot after live-migration" [Undecided,Fix released] - Assigned to sahid (sahid-ferdjaoui)
14:20:03 johnthetubaguy: could you leave a comment on that change please?
14:20:23 tdurakov: that would require me to decide what is best, but I could
14:20:46 s/could/should
14:20:50 :)
14:21:01 paul-carlton2: will discuss it a bit later
14:21:25 ok
14:21:41 another important change/fix https://review.openstack.org/#/c/389687/
14:22:18 imo it's ok to change rpc from cast to call here, just wanted everyone to be on the same page
14:22:52 calls can timeout of course, that's the usual issue
14:23:16 johnthetubaguy: it's 'post' step
14:23:29 on the other hand we have a race here right now
14:23:43 yeah, I think pkoniszewski mentioned this to me at one point
14:24:08 so, from me it's better to timeout, rather than an undetermined race
14:24:14 upgrade wise, I think this is safe enough...
14:24:56 I mean, you don't get the fix till after the upgrade, but that's what you would expect
14:25:03 it could be backported I guess
14:25:14 good point on upgrades tbh
14:26:03 yeah, destination should be upgraded by that moment
14:26:04 * mriedem runs to another meeting
14:26:34 paul-carlton2: so:)
14:26:39 your turn
14:26:49 tdurakov: that's probably the right thing to do
14:26:59 it breaks live migration because after that hard reboot fails
14:27:47 also https://review.openstack.org/#/c/339588/ still waiting for reviews
14:27:53 tdurakov: can you test that with a VM that has 15 volumes attached, btw?
14:28:40 paul-carlton2: will take a look after the meeting
14:28:47 johnthetubaguy: which one?
14:28:56 and https://bugs.launchpad.net/nova/+bug/1633033, I have an attempt at a fix but it is not working!
14:28:57 Launchpad bug 1633033 in OpenStack Compute (nova) "live migration with encrypted volume fails" [Undecided,In progress] - Assigned to Paul Carlton (paul-carlton2)
14:29:01 tdurakov: commented on the patch changing call to cast
14:29:16 tdurakov: as a side note, seems like we need to update our section in https://etherpad.openstack.org/p/ocata-nova-priorities-tracking
14:30:09 johnthetubaguy: ok, will do
14:30:23 paul-carlton2: about the bug above
14:31:05 any thoughts on potential fix for that?
14:32:24 seems maybe it needs an os-brick fix, should I ask in cinder room?
14:32:43 paul-carlton2: I think yes
14:33:01 offtop https://github.com/ansible/ansible/issues/18281 - ansible bug for the post_failure
14:33:20 thanks, got to go now, meeting
14:33:27 paul-carlton2: it feels like your fix is also needed for migration?
14:35:11 ok, let's go next
14:35:17 #topic specs
14:35:27 any updates on the specs for live-migration
14:36:23 well, it seems like folks are still returning from Barcelona, will try to reach them after the meeting
14:36:36 #topic Open discussion
14:37:05 pkoniszewski: are you around?
14:37:21 https://review.openstack.org/#/c/292826/ - wanted to update status on this chain
14:38:11 another thing https://review.openstack.org/#/c/274097 - devref for nova-neutron communication during live-migration
14:38:38 johnthetubaguy: you might be interested^
14:39:11 yeah, that's on my review list
14:39:25 given the discussions to totally re-write that flow
14:40:04 I've shared diagrams for that in one of the previous meetings
14:40:15 working on this part
14:40:23 so, anything else to bring up?
14:40:35 mdbooth: are you still working on this post copy bug, I think you were in the design summit session where I mentioned the quick fix for that? https://bugs.launchpad.net/nova/+bug/1605016
14:40:37 Launchpad bug 1605016 in OpenStack Compute (nova) "Post copy live migration interrupts network connectivity" [High,In progress] - Assigned to Matthew Booth (mbooth-9)
14:40:57 I should follow up on that from your comment
14:42:10 johnthetubaguy: Not really, tbh.
14:42:39 johnthetubaguy: I was working on it while other stuff was stalled, but I'd be very happy for somebody else to take it from me.
14:43:32 mdbooth: please update assignee and status in the lp
14:43:51 mdbooth: will try to find anyone interested in fixing that
14:44:18 i still have context on it, though. I spoke to pavel(?) about it last week.
14:44:46 Incidentally, I still dispute the 'High' importance. I believe it's of limited consequence in practise.
14:46:12 mdbooth: let's discuss that next week then, once more people are there
14:46:38 Should be fixed, of course, but probably nobody will notice.
14:47:45 need to re-read bug, will comment on it
14:47:51 anything else?
14:48:53 thanks everyone for coming
14:49:23 #endmeeting