21:02:46 #startmeeting project
21:02:47 Meeting started Tue Sep 30 21:02:46 2014 UTC and is due to finish in 60 minutes. The chair is ttx. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:02:48 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:02:50 The meeting name has been set to 'project'
21:02:52 Our agenda for today:
21:02:56 #link http://wiki.openstack.org/Meetings/ProjectMeeting
21:03:01 should be a very quick one
21:03:08 #topic News from the 1:1 sync points
21:03:17 Here are the logs:
21:03:21 #link http://eavesdrop.openstack.org/meetings/ptl_sync/2014/ptl_sync.2014-09-30-07.57.html
21:03:23 and
21:03:27 #link http://eavesdrop.openstack.org/meetings/ptl_sync/2014/ptl_sync.2014-09-30-16.13.html
21:03:34 Keystone, Glance, Ceilometer already published a RC1
21:03:40 Cinder, Sahara and Nova will be done first thing tomorrow
21:03:49 or maybe just after this meeting if it ends early
21:03:55 Horizon, Neutron, Swift, Heat, Trove are still at least a day off
21:04:08 Race at: http://old-wiki.openstack.org/rc/
21:04:15 #topic Other program news
21:04:17 reviews are in progress for the last Heat thing
21:04:21 Any other program with a quick announcement ?
21:04:36 we expect to have a patch release of oslo.db in the next day or two
21:04:39 zaneb: ok, approve the open-kilo if it merges
21:04:52 ttx: cool, will do
21:05:52 ok, nobody else ? annegentle, mordred ?
21:06:15 ttx: nothing from me
21:07:08 infra has nothing to add afaik
21:07:13 oh, havana eol
21:07:34 but you know, we have mailing lists where we announce these things too, so that's already old news
21:07:52 ok then
21:07:53 #topic Kilo release schedule
21:07:58 Final proposal (with release on April 30) posted at:
21:08:00 oh, I did want to bring up: https://bugs.launchpad.net/neutron/+bug/1323658 but we can save that until the end
21:08:03 Launchpad bug 1323658 in nova "Nova resize/restart results in guest ending up in inconsistent state with Neutron" [Critical,Confirmed]
21:08:03 http://lists.openstack.org/pipermail/openstack-dev/2014-September/047128.html
21:08:11 Unless someone complains very soon on that thread, it will be officialized
21:08:25 #topic Open discussion
21:08:31 mtreinish: go for it
21:08:58 so that bug is one of the largest gate issues right now
21:09:29 and it's basically blocking any movement on icehouse stuff
21:10:09 so I was curious about its deferral into K1
21:10:45 mtreinish: that's a critical bug that needs to be fixed ASAP
21:10:53 Bug 1323658 - Nova resize/restart results in guest ending up in inconsistent state with Neutron
21:10:57 Launchpad bug 1323658 in nova "Nova resize/restart results in guest ending up in inconsistent state with Neutron" [Critical,Confirmed] https://launchpad.net/bugs/1323658
21:11:02 ttx: well the neutron team deferred it to k1
21:11:08 but before or after RC1/release is not really a question
21:11:17 sdague: it will land in master
21:11:22 and maybe be backported
21:11:24 and as far as I can tell no one is working on it
21:11:35 but it should not prevent the RC1 from being published
21:11:40 Yeah, my comments there are because whilst it's critical, I don't think it's RC
21:11:41 that's another question.
21:11:44 sdague: both salv-orlando and armax have dug into it
21:11:45 And no one was assigned
21:11:48 It should have someone working on it, rc1 or not
21:11:48 the only active patch is me turning off the tests - https://review.openstack.org/#/c/125150
21:12:06 ttx: well it does not
21:12:19 unfortunately we don't have mestery around
21:12:22 sdague: it does not on the nova side
21:12:28 sdague: it does on the neutron side
21:12:39 So, that sounds like one for me to go and try and find a volunteer to work on it
21:12:42 mikal: if someone is actively working on it, I'd expect patches up
21:12:49 even if just log enhancement ones
21:12:56 Aaron Rosen is assigned to it on the neutron side
21:13:11 sdague: so I think it is going to take a pairing of a nova dev and neutron dev
21:13:33 trying to ping him to see
21:13:36 since some of the neutron folks who've looked at it have reached the limits of their nova knowledge
21:13:46 ttx: I tried pinging him earlier today, no ack
21:14:08 markmcclain: ok, well when I brought it up in neutron channel earlier today, it was all dead air
21:14:12 here he is
21:14:20 :)
21:14:23 arosen-home1: what's the state?
21:14:40 sdague: sorry about dead air… lots of neutron devs were offline today
21:15:02 I've spent a good amount of time trying to reproduce this and dig into it but unfortunately i'm still grasping at straws why it's failing :(
21:15:42 I'm wondering if it might be more of an issue with config drive but i see the postgres tests use the metadata agent and still have failures as well.
21:16:21 the resize code though doesn't unplug the vif so really neutron shouldn't even know that anything happened.
21:16:45 well, the fact that there are no patches up to add debug logging that would expose that means I assume everyone has just given up
21:17:16 and if that's the case, we should just be honest about that
21:17:26 and not pretend it's critical when no one is working on it
21:17:36 weird that it would hit icehouse and master at the same time
21:17:46 ttx: not really
21:17:56 So, reading the comment history
21:18:05 The neutron devs assert that the instance never boots
21:18:10 sdague: i agree sorry I didn't communicate much better on this one :/ .
21:18:41 I think this issue might be related to config-drive more so maybe but still not really sure :/
21:18:44 sdague: oh, 10 days of history ? heh
21:19:00 What about this being a qemu disk corruption problem?
21:20:33 arosen-home1: so if you believe there are areas that this might be in, can we get some debug logs added to verify those assertions or not
21:20:40 honestly, that's really an anyone:
21:20:55 because I don't actually want to single out arosen-home1 here, he did dive for a while
21:21:46 sdague: so, the neutron guys say they see logs indicating the instance didn't boot completely. Is it possible to stash the instance disk files somewhere in the case of failure?
21:22:00 mikal: anything is possible
21:22:17 sdague: I'd like to see if this is another of those qemu disk corruption bugs
21:22:24 sdague: we've fixed at least one of those recently
21:22:48 the bug was filed back in May, but flew under the radar between June 16 and September 2 ?
21:23:07 sdague: for example https://review.openstack.org/#/c/123957/
21:23:12 or did it just become more of a PITA recently
21:23:31 mikal: is there any way to check if the disk is corrupt?
21:23:54 arosen-home1: not that I can immediately think of, except for archiving it for inspection
21:23:56 would a console-log show us anything? I think there is something funky going on with when tempest does its console-logging.
21:24:18 arosen-home1: a console log might help, but also might not
21:25:05 I think an audit of every use of qemu-img to make sure it has a fsync() might be a reasonable thing to do
21:25:10 Not sure if it would help here though
21:25:52 ttx: it rose in exposure recently
21:27:04 * dhellmann apologizes for having to leave early
21:28:04 I can't think at this late hour anyway
21:28:50 mikal: that's at least something
21:29:17 sdague: I will ask Tony to see if he can find other places the qemu thing might be happening
21:29:31 But I need him to wake up first
21:29:35 mikal: so lacking that, we should probably bring this patch in anyway right?
21:29:46 sdague: yes, definitely
21:29:56 sdague: my point was "we should trust qemu less"
21:30:00 sure
21:30:02 But I am not convinced it is our issue here
21:30:04 Just hoping it is
21:30:34 I'd be surprised if there's any qemu in the restart path for example
21:30:44 But it would be for resize
21:30:45 mikal: so we do have the qemu logs for each instance saved, but I'm not sure that would capture what you need
21:30:50 sdague: would you say it happens more often on icehouse than on master right now ? That would go in the direction of something that master would have partially patched
21:31:12 ttx: it's probably more icehouse
21:31:14 ttx: the "layer 0" dependencies are different for icehouse as well remember
21:31:26 that's mostly related to the fact that nova is faster in juno
21:31:29 ttx: different libvirt version for example
21:31:31 in my guess
21:31:35 or neutron even
21:31:42 there were enough perf changes as well
21:31:52 feels like something has slightly altered timing and now we've upset the balance of the universe
21:32:01 yeh, that happens over time as well
21:32:06 as different tests overlap
21:32:16 based on what's skipped or not in each branch
21:32:30 this is why tracking these things down is hard
21:32:52 sdague: i feel like it's not related to neutron as the resize code path in nova does nothing to the instance that neutron would know about (i put print statements in the plug/unplug code and that doesn't happen when a resize occurs on the same host).
21:33:20 arosen-home1: except it only exposes on neutron paths, right?
21:33:42 sdague, arosen-home1, markmcclain, mikal: do you want to keep this meeting open to keep debugging ? Or come up with another venue to continue working on it in common ?
21:33:53 I think we could do this in another channel
21:33:57 sdague: ah really? sorry I didn't realize that :(
21:34:02 Let's not waste unrelated PTLs' time
21:34:11 yeah we can chat in -neutron
21:34:14 well, I could use some sleep
21:34:32 * ttx feels like a bartender ejecting his last faithful customers
21:35:13 ok, unless someone else has something to raise...
21:35:55 #endmeeting