21:02:46 <ttx> #startmeeting project
21:02:47 <openstack> Meeting started Tue Sep 30 21:02:46 2014 UTC and is due to finish in 60 minutes.  The chair is ttx. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:02:48 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:02:50 <openstack> The meeting name has been set to 'project'
21:02:52 <ttx> Our agenda for today:
21:02:56 <ttx> #link http://wiki.openstack.org/Meetings/ProjectMeeting
21:03:01 <ttx> should be a very quick one
21:03:08 <ttx> #topic News from the 1:1 sync points
21:03:17 <ttx> Here are the logs:
21:03:21 <ttx> #link http://eavesdrop.openstack.org/meetings/ptl_sync/2014/ptl_sync.2014-09-30-07.57.html
21:03:23 <ttx> and
21:03:27 <ttx> #link http://eavesdrop.openstack.org/meetings/ptl_sync/2014/ptl_sync.2014-09-30-16.13.html
21:03:34 <ttx> Keystone, Glance, Ceilometer already published a RC1
21:03:40 <ttx> Cinder, Sahara and Nova will be done first thing tomorrow
21:03:49 <ttx> or maybe just after this meeting if it ends early
21:03:55 <ttx> Horizon, Neutron, Swift, Heat, Trove are still at least a day off
21:04:08 <ttx> Race at: http://old-wiki.openstack.org/rc/
21:04:15 <ttx> #topic Other program news
21:04:17 <zaneb> reviews are in progress for the last Heat thing
21:04:21 <ttx> Any other program with a quick announcement ?
21:04:36 <dhellmann> we expect to have a patch release of oslo.db in the next day or two
21:04:39 <ttx> zaneb: ok, approve the open-kilo if it merges
21:04:52 <zaneb> ttx: cool, will do
21:05:52 <ttx> ok, nobody else ? annegentle, mordred ?
21:06:15 <mtreinish> ttx: nothing from me
21:07:08 <fungi> infra has nothing to add afaik
21:07:13 <fungi> oh, havana eol
21:07:34 <fungi> but you know, we have mailing lists where we announce these things too, so that's already old news
21:07:52 <ttx> ok then
21:07:53 <ttx> #topic Kilo release schedule
21:07:58 <ttx> Final proposal (with release on April 30) posted at:
21:08:00 <mtreinish> oh, I did want to bring up: https://bugs.launchpad.net/neutron/+bug/1323658 but we can save that until the end
21:08:03 <uvirtbot> Launchpad bug 1323658 in nova "Nova resize/restart results in guest ending up in inconsistent state with Neutron" [Critical,Confirmed]
21:08:03 <ttx> http://lists.openstack.org/pipermail/openstack-dev/2014-September/047128.html
21:08:11 <ttx> Unless someone complains very soon on that thread, it will be officialized
21:08:25 <ttx> #topic Open discussion
21:08:31 <ttx> mtreinish: go for it
21:08:58 <mtreinish> so that bug is one of the largest gate issues right now
21:09:29 <mtreinish> and it's basically blocking any movement on icehouse stuff
21:10:09 <mtreinish> so I was curious about its deferral into K1
21:10:45 <ttx> mtreinish: that's a critical bug that needs to be fixed ASAP
21:10:53 <sdague> Bug 1323658 - Nova resize/restart results in guest ending up in inconsistent state with Neutron
21:10:57 <uvirtbot> Launchpad bug 1323658 in nova "Nova resize/restart results in guest ending up in inconsistent state with Neutron" [Critical,Confirmed] https://launchpad.net/bugs/1323658
21:11:02 <sdague> ttx: well the neutron team deferred to k1
21:11:08 <ttx> but before or after RC1/release is not really a question
21:11:17 <ttx> sdague: it will land in master
21:11:22 <ttx> and maybe be backported
21:11:24 <sdague> and as far as I can tell no one is working on it
21:11:35 <ttx> but it should not prevent the RC1 from being published
21:11:40 <mikal> Yeah, my comments there are because whilst it's critical, I don't think it's RC
21:11:41 <ttx> that's another question.
21:11:44 <markmcclain> sdague: both salv-orlando and armax have dug into it
21:11:45 <mikal> And no one was assigned
21:11:48 <ttx> It should have someone working on it, rc1 or not
21:11:48 <sdague> the only active patch is me turning off the tests - https://review.openstack.org/#/c/125150
21:12:06 <sdague> ttx: well it does not
21:12:19 <ttx> unfortunately we don't have mestery around
21:12:22 <mikal> sdague: it does not on the nova side
21:12:28 <mikal> sdague: it does on the neutron side
21:12:39 <mikal> So, that sounds like one for me to go and try and find a volunteer to work on it
21:12:42 <sdague> mikal: if someone is actively working on it, I'd expect patches up
21:12:49 <sdague> even if just log enhancement ones
21:12:56 <ttx> Aaron Rosen is assigned to it on the neutron side
21:13:11 <markmcclain> sdague: so I think it is going to take a pairing of a nova dev and neutron dev
21:13:33 <ttx> trying to ping him to see
21:13:36 <markmcclain> since some of the neutron folks who've looked at it have reached the limits of their nova knowledge
21:13:46 <sdague> ttx: I tried pinging him earlier today, no ack
21:14:08 <sdague> markmcclain: ok, well when I brought it up in neutron channel earlier today, it was all dead air
21:14:12 <ttx> here he is
21:14:20 <arosen-home1> :)
21:14:23 <ttx> arosen-home1: what's the state?
21:14:40 <markmcclain> sdague: sorry about dead air… lots of neutron devs were offline today
21:15:02 <arosen-home1> I've spent a good amount of time trying to reproduce this and dig into it, but unfortunately I'm still grasping at straws as to why it's failing :(
21:15:42 <arosen-home1> I'm wondering if it might be more of an issue with config drive, but I see the postgres tests use the metadata agent and still have failures as well.
21:16:21 <arosen-home1> the resize code though doesn't unplug the vif so really neutron shouldn't even know that anything happened.
21:16:45 <sdague> well, the fact that there are no patches up to add debug logging that would expose that means I assume everyone has just given up
21:17:16 <sdague> and if that's the case, we should just be honest about that
21:17:26 <sdague> and not pretend it's critical when no one is working on it
21:17:36 <ttx> weird that it would hit icehouse and master at the same time
21:17:46 <sdague> ttx: not really
21:17:56 <mikal> So, reading the comment history
21:18:05 <mikal> The neutron devs assert that the instance never boots
21:18:10 <arosen-home1> sdague: I agree, sorry I didn't communicate better on this one :/
21:18:41 <arosen-home1> I think this issue might be more related to config-drive, but I'm still not really sure :/
21:18:44 <ttx> sdague: oh, 10 days of history ? heh
21:19:00 <mikal> What about this being a qemu disk corruption problem?
21:20:33 <sdague> arosen-home1: so if you believe there are areas that this might be in, can we get some debug logs added to verify those assertions or not?
21:20:40 <sdague> honestly, that's really an anyone:
21:20:55 <sdague> because I don't actually want to single out arosen-home1 here, he did dive for a while
21:21:46 <mikal> sdague: so, the neutron guys say they see logs indicating the instance didn't boot completely. Is it possible to stash the instance disk files somewhere in the case of failure?
21:22:00 <sdague> mikal: anything is possible
21:22:17 <mikal> sdague: I'd like to see if this is another of those qemu disk corruption bugs
21:22:24 <mikal> sdague: we've fixed at least one of those recently
21:22:48 <ttx> the bug was filed back in May, but flew under the radar between June 16 and September 2 ?
21:23:07 <mikal> sdague: for example https://review.openstack.org/#/c/123957/
21:23:12 <ttx> or did it just become more of a PITA recently
21:23:31 <arosen-home1> mikal: is there any way to check if the disk is corrupt?
21:23:54 <mikal> arosen-home1: not that I can immediately think of, except for archiving it for inspection
21:23:56 <arosen-home1> would a console-log show us anything? I think there is something funky going on with when tempest does its console-logging.
21:24:18 <mikal> arosen-home1: a console log might help, but also might not
21:25:05 <mikal> I think an audit of every use of qemu-img to make sure it has a fsync() might be a reasonable thing to do
21:25:10 <mikal> Not sure if it would help here though
21:25:52 <sdague> ttx: it rose in exposure recently
21:27:04 * dhellmann apologizes for having to leave early
21:28:04 <ttx> I can't think at this late hour anyway
21:28:50 <sdague> mikal: that's at least something
21:29:17 <mikal> sdague: I will ask Tony to see if he can find other places the qemu thing might be happening
21:29:31 <mikal> But I need him to wake up first
21:29:35 <sdague> mikal: so lacking that, we should probably bring this patch in anyway right?
21:29:46 <mikal> sdague: yes, definitely
21:29:56 <mikal> sdague: my point was "we should trust qemu less"
21:30:00 <sdague> sure
21:30:02 <mikal> But I am not convinced it is our issue here
21:30:04 <mikal> Just hoping it is
21:30:34 <mikal> I'd be surprised if there's any qemu in the restart path for example
21:30:44 <mikal> But it would be for resize
21:30:45 <mtreinish> mikal: so we do have the qemu logs for each instance saved, but I'm not sure that would capture what you need
21:30:50 <ttx> sdague: would you say it happens more often on icehouse than on master right now ? That would go in the direction of something that master would have partially patched
21:31:12 <sdague> ttx: it's probably more icehouse
21:31:14 <mikal> ttx: the "layer 0" dependencies are different for icehouse as well, remember
21:31:26 <sdague> that's mostly related to the fact that nova is faster in juno
21:31:29 <mikal> ttx: different libvirt version for example
21:31:31 <sdague> in my guess
21:31:35 <sdague> or neutron even
21:31:42 <sdague> there were enough perf changes as well
21:31:52 <markmcclain> feels like something has slightly altered timing and now we've upset the balance of the universe
21:32:01 <sdague> yeh, that happens over time as well
21:32:06 <sdague> as different tests overlap
21:32:16 <sdague> based on what's skipped or not in each branch
21:32:30 <sdague> this is why tracking these things down is hard
21:32:52 <arosen-home1> sdague: i feel like it's not related to neutron as the resize code path in nova does nothing to the instance that neutron would know about (i put print statements in the plug/unplug code and that doesn't happen when a resize occurs on the same host).
21:33:20 <sdague> arosen-home1: except it only exposes on neutron paths, right?
21:33:42 <ttx> sdague, arosen-home1, markmcclain, mikal: do you want to keep this meeting open to keep debugging ? Or come up with another venue to continue working on it in common ?
21:33:53 <mikal> I think we could do this in another channel
21:33:57 <arosen-home1> sdague:  ah really?  sorry I didn't realize that :(
21:34:02 <mikal> Let's not waste unrelated PTLs' time
21:34:11 <markmcclain> yeah we can chat in -neutron
21:34:14 <ttx> well, I could use some sleep
21:34:32 * ttx feels like a bartender ejecting his last faithful customers
21:35:13 <ttx> ok, unless someone else has something to raise...
21:35:55 <ttx> #endmeeting