21:02:33 #startmeeting project
21:02:34 Meeting started Tue Jan 21 21:02:33 2014 UTC and is due to finish in 60 minutes. The chair is ttx. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:02:35 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:02:35 ttx: need me ? :)
21:02:37 o/
21:02:38 The meeting name has been set to 'project'
21:02:40 * flaper87 is here in case an update on oslo is needed
21:02:40 o/
21:02:45 ttx: bah. I don't believe in meetings
21:02:50 lifeless: you can have that hour of your life back
21:02:51 so does the o/ actually do something bot related?
21:02:57 #link http://wiki.openstack.org/Meetings/ProjectMeeting
21:03:03 hub_cap: no
21:03:09 The main topic of discussion for today is obviously the situation with the frozen gate and the consequences for icehouse-2
21:03:17 #topic Frozen gate situation
21:03:29 so, on that topic...
21:03:31 First of all, I would like to discuss the current situation: flow rate, main causes
21:03:42 so that everyone is on the same page
21:03:57 sdague, jeblair, mordred: any volunteer to summarize the current state ?
21:04:02 sure
21:04:14 so we've got kind of a perfect storm of issues
21:04:31 we're hitting non-linear scaling issues on zuul, because we're throwing so much at it
21:04:36 and the gate queue is so long
21:04:47 we're in starvation for nodes in general
21:05:01 our working set is much bigger than all the nodes available to us
21:05:21 and we've got a bunch of new gate-resetting issues that cropped up in the last couple of weeks
21:05:31 is that node starvation due to lack of cloud resources ? Or something else ?
21:05:44 which we were blind to because we were actually losing console logs to elasticsearch
21:05:53 ttx: lack of cloud resources
21:06:05 basically we're swapping
21:06:08 sdague: is that something we need to ask our kind sponsors to help with ?
21:06:15 ttx: it has been requested
21:06:19 sdague: ok
21:06:23 i think i saw mordred say he was asking
21:06:34 yep, mordred is working that
21:06:40 yes. I have emailed rackspace about more nodes/quota
21:06:44 sdague, mordred: let me know if some kind of official foundation statement can help
21:06:46 sdague: i don't want to contradict anything you have said, all of which is correct, but i don't think starvation is having a huge impact on throughput
21:06:48 pvo hasn't given a final answer back yet
21:07:05 jeblair: that could be
21:07:12 mordred: if you could take iad or dfw, that would be a huge help.
21:07:16 is that possible?
21:07:28 hp's new cloud region is where new capacity comes from, and those nodes do not work well for us in the gate yet, so we're dependent on rax for additional node quota if we're getting any
21:07:29 though yesterday we were thrashing due to node starvation
21:07:33 pvo: I believe so?
21:07:35 in some interesting patterns
21:07:43 jeblair: thoughts on pvo's question?
21:07:45 also, it is worth mentioning that the reduction in tempest concurrency is likely affecting throughput
21:07:46 but we can debate the subtleties later
21:08:04 jeblair: well, it saw a noticeable increase in reliability
21:08:06 mordred: I can get the quotas raised there if that'll work for you. I've been going back and forth with our ops guys.
21:08:11 so what's the flow rate right now ? 20 changes per day ? more ? less ?
21:08:14 so I think, on net, it's helping
21:08:15 https://github.com/openstack/openstack/graphs/commit-activity
21:08:18 flow
21:08:24 pvo: absolutely
21:08:27 jog0: awesome, thx
21:08:49 jeblair: mordred: cool. I'll try to get those changes done today.
21:09:04 pvo: thanks! I owe you much beer
21:09:19 I've spent enough time looking at the perf data now that I'm very much with russellb: 4x concurrency puts the environment at such a high load for so long, we get unpredictable behavior
21:09:32 +1
21:09:36 sdague: is there a guide to running tempest somewhere? I can't see one on the wiki and am ashamed to admit I've never run it in person.
21:09:38 it's just a non-starter right now
21:09:42 OK, do we have reasons to hope things will get back to a flow rate in the near future that would let us absorb that backlog ?
21:09:47 mordred: np. what's the tenant id again?
21:09:54 pvo: thanks a ton
21:09:58 can private message that if you want
21:10:04 Sorry, wrong channel
21:10:08 heh
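[Editor's note: for readers who, like the questioner above, have never run tempest by hand: a minimal sketch of invoking it with a pinned worker count, assuming a tempest checkout driven by testr (the runner it used at the time). The actual gate job definitions differ in detail, and the "4x concurrency" above presumably refers to four parallel test workers.]

```python
# Illustrative only: run tempest with an explicit testr worker count.
# Four workers overload the small gate VMs into swapping; halving the
# count trades wall-clock time for predictable behavior.
import subprocess

def run_tempest(concurrency=2):
    # testr caps parallel workers via --concurrency; without it,
    # --parallel starts roughly one worker per CPU.
    subprocess.check_call(
        ["testr", "run", "--parallel",
         "--concurrency=%d" % concurrency])

if __name__ == "__main__":
    run_tempest(concurrency=2)  # the reduced setting being discussed
```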
21:10:45 ttx: without more people helping on the race fix side... no
21:11:01 we're going to mitigate some things, to make us thrash less
21:11:13 but we won't get back to normal flow until we aren't doing so many resets
21:11:35 i've had a fix for a huge offender up for 5 hours, but it's so behind we can't get it in
21:11:38 sdague: do we have a bug number for the key issue(s)
21:11:40 it hasn't shown up in check yet even
21:11:51 ttx: I put 2 on the list yesterday
21:11:59 sdague: ok, those two are still current
21:12:00 https://review.openstack.org/#/c/68147/3 ... fixes a bug with 340 hits in the last 12 hours
21:12:02 part of it is also discovering the resets
21:12:23 ttx: yes, I had a morning update on list about it
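[Editor's note: a toy model, not Zuul's actual algorithm, of why resets dominate throughput here. In Zuul's dependent gate pipeline every change is tested on top of the changes ahead of it, so a single spurious failure restarts the jobs of everything behind it.]

```python
# Toy throughput model for a dependent (speculative) merge queue.
# Assumes each change's jobs fail spuriously with probability p, and
# each failure ahead of a change forces that change's jobs to rerun.
def expected_job_runs(queue_len, p):
    runs = 0.0
    for pos in range(queue_len):
        # one initial run, plus roughly one rerun per expected failure
        # among the `pos` changes ahead of this one
        runs += 1 + pos * p
    return runs

# 50 queued changes at a 10% spurious-failure rate: ~172 job runs to
# merge 50 changes, i.e. over 3x the node demand of a clean queue --
# which is why race fixes, not just more nodes, restore flow.
print(expected_job_runs(50, 0.10))
```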
21:12:43 OK... As far as the icehouse-2 milestone goes, we need to decide if it makes sense to cut branches at the end of the day today, or if we should just delay by a week
21:12:52 On one hand, a milestone is just a point in time, so we can cut anytime
21:13:01 On the other hand, we want it to be a *reference* point in time, and here we have a lot of stuff in-flight... so cutting now is a bit useless
21:13:03 honestly, we're only at about an 80% characterization rate right now as well, so there could be killer bugs we haven't identified yet
21:13:15 Decision all depends on our confidence that adding a week will make a difference.
21:13:24 that the currently in-flight stuff will make it.
21:13:31 which in turn depends on our ability to restore a decent flow rate to absorb the backlog.
21:13:38 I'm open to suggestions here
21:13:48 ttx: I do see people try to install milestone releases, so do wait if you think it would help
21:14:10 http://status.openstack.org/release/ should show how many blueprints are stuck in queue for i2
21:14:21 my instinct is cut what we have, the trees were always supposed to be shippable
21:14:26 ttx: I am not sure if waiting a week is enough TBH. If we don't fix the gate a week won't help
21:14:27 sdague: same here
21:14:37 my concern is that if we delay i2, then all these people are going to rush the queue harder
21:14:37 yeah. we're not time-based releases for nothing round here
21:14:48 i say just cut
21:14:52 ++
21:15:02 sdague: that's a very good point
21:15:08 we don't need to encourage a gate rush right now
21:15:15 we need people to chill out for a while
21:15:17 yeah, avoid gate rush. I'm just going back and forth here. I'll stop
21:15:20 sdague: I'm fine with cutting now if you PTLs are fine with that
21:15:24 I vote for doing a cut now
21:15:37 it will look a bit empty but who cares
21:15:55 ttx: sometimes we get to fail in public, and i think that's OK
21:15:58 yeh, I already had one group ping me for a requirements review so they could land something in i2 this morning
21:16:00 will this encourage folks whose changes are in-flight but don't make it to delay them further, making a bigger rush for i3 ?
21:16:01 if it will alleviate the mental push to help clean up the queue a bit, let's do it
21:16:10 I'm fine with a cut now, except there is one patch that I really need to land first that has been stuck for hours.
21:16:14 devananda: we're going to have a bigger rush in i3 regardless
21:16:18 if we weren't all so damn busy, we could write a book on all the amazing lessons we're learning along the way :)
21:16:27 russellb: ++
21:16:27 so... let's say I'll cut early next morning, which gives a bit of time for things to land
21:16:34 +1
21:16:35 that would be like 7am MST
21:16:35 horizon may have a licensing issue if we cut now, may need to get a fix in
21:16:58 would that work for everyone ?
21:17:04 ttx: good for me
21:17:21 i'm fine with cutting now
21:17:28 ++
21:17:31 and then tag the next morning same hour
21:17:33 https://bugs.launchpad.net/horizon/+bug/1270764
21:17:38 or even tag on the same day
21:17:42 ttx: we may want to get ptls to indicate super-important must-land patches in case promoting them is necessary
21:17:45 it's not as if backports would make it through anyway
21:18:08 yeh, if there is a licensing patch, that would qualify
21:18:11 so... branch cut at 7am MST, tag toward the end of the afternoon
21:18:19 mordred: i hope you're not referring to blueprints .. ?
21:18:19 works for me
21:18:22 ttx: This Critical fix needs to get in one way or another https://review.openstack.org/#/c/68135/
21:18:40 i don't care about landing more features, i just want to help restore the gate :)
21:18:45 at this point
21:18:52 russellb: +1
21:18:52 sdague: could you add it to your magic list of priority fixes ?
21:18:54 dolphm: nope. I just think there are a couple of things, like the license thing, that we should help land before the morning if we can
21:18:55 russellb: ++
21:18:58 russellb: ++
21:18:59 https://review.openstack.org/#/c/68135/
21:19:02 russellb: ++ please
21:19:18 trying
21:19:26 ttx: so it has to have *some* recent test results
21:19:31 if we fail to land it today, we still have the backport option tomorrow
21:19:34 before we promote
21:19:46 sdague: ack
21:20:08 ttx: I need to run - are we good on the portion of the meeting for which I'm useful?
21:20:15 #agreed I2 branch cut around 7am MST tomorrow. Tags expected toward the end of day
21:20:30 mordred: I think we can handle it from here
21:20:38 sdague: nova fix is in the check queue now at least (no dsvm nodes yet)
21:20:40 heh
21:20:55 russellb: yeh, there is something else going on, like last week
21:21:04 7 AM MST3K?
21:21:14 too many meetings this afternoon to dive on it :)
21:21:27 I'm down
21:21:28 sdague: let's make this one quick
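[Editor's note: a hypothetical sketch of the mechanics behind "branch cut at 7am MST, tag toward the end of the afternoon", assuming the milestone-proposed ("mp") branch workflow ttx mentions in the open-discussion topic below. The branch name, tag value, and remote here are illustrative; the real release tooling also handles ACLs, Launchpad milestones, and tarball jobs.]

```python
# Hypothetical milestone-cut sketch: branch from master in the morning,
# tag the branch at the end of the day once must-land fixes are in.
import subprocess

def git(*args):
    subprocess.check_call(("git",) + args)

def cut_and_tag(branch="milestone-proposed", tag="2014.1.b2"):
    git("fetch", "origin")
    # create the milestone branch from current master on the remote;
    # backports can still be cherry-picked onto it before tagging
    git("push", "origin", "origin/master:refs/heads/%s" % branch)
    # later the same day: signed tag marking the i2 milestone
    git("fetch", "origin")
    git("tag", "-s", tag, "origin/%s" % branch)
    git("push", "origin", tag)
```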
21:21:34 #topic stable/havana situation (david-lyle)
21:21:50 So stable/grizzly was broken a few weeks ago, which in turn paralyzed stable/havana (due to grenade trying to run stable/grizzly stuff)
21:21:55 My understanding is that it's all put on the backburner until we get the master gate in order
21:22:03 and in the meantime we should not approve stable/* stuff
21:22:15 seems stable +A was removed anyway :)
21:22:18 yeh
21:22:19 hah
21:22:21 good
21:22:26 solves that.
21:22:27 we made an executive call on that on sunday
21:22:29 david-lyle: comment on that ?
21:22:41 (you raised that topic)
21:22:47 ok, just checking. I think queued-up jobs are finally failing
21:22:48 the stable/grizzly fix is actually in the gate
21:23:00 david-lyle: are there still stable jobs in the gate?
21:23:08 I thought we managed to kick those all out
21:23:12 wanted to verify
21:23:37 I believe https://review.openstack.org/#/c/67739/ will make stable/havana work in the gate again
21:23:39 sdague: I believe I saw a couple Horizon failures yesterday
21:23:44 stable/grizzly is actually working again
21:23:49 david-lyle: gate or check
21:23:50 ok
21:23:59 so this is actually being solved too
21:24:11 david-lyle: anything more on that topic ?
21:24:14 sdague: I think gate
21:24:24 ttx: no, I'm good
21:24:27 #topic common openstack client (markwash)
21:24:31 david-lyle: so if there are any other stable/havana things in gate, we should remove them
21:24:36 markwash: floor is yours
21:24:52 okay, I was really just hoping to do a quick poll here
21:24:58 sdague, ack
21:25:16 the discussion on openstack-dev is about what kind of approach to take: many client projects, or one client project
21:25:25 one
21:25:40 oh
21:25:45 I'm interested especially because I want to make some changes to glanceclient, and would love to know where to put them
21:25:59 I'm more worried about MULTIPLE unified client projects
21:26:01 there's also a difference between one CLI client and one library SDK
21:26:31 I thought openstackclient was the one, but the discussion seemed to point to something else (haven't read it in detail yet)
21:26:33 markwash: my read of the thread is you should keep doing what you are doing now; I can't imagine this becomes reality pre-icehouse
21:27:02 sdague: makes sense
21:27:34 If no one else has anything they want to share re one vs many, I'm good. I just was hoping to hear from any PTLs who felt strongly
21:28:16 whatever works :)
21:28:18 this is in regard to one SDK, correct?
21:28:22 not another CLI?
21:28:40 I would be happy to get behind a unified client, but I've still not got around to integrating heat with python-openstackclient
21:28:59 dolphm: I think that's the key distinction, yeah... not a lot of opposition in my mind to unifying the CLI
21:29:09 yeah, we really need someone to head that client projects division
21:29:23 and push things forward
21:29:36 in a coherent way
21:29:52 honestly, I'm very pro unified CLI and unified SDK. But I don't think this is an icehouse thing regardless
21:29:53 my take is, programs should own their little part of the SDK within reason, but we can probably solve that regardless of how we structure the client repo(s)
21:29:59 I'm happy that people are diving on it
21:30:09 #topic Red Flag District / Blocked blueprints
21:30:18 Any inter-project blocked work that this meeting could help unblock ?
21:30:18 markwash: ++
21:30:37 Though I suspect the gate is the common issue
21:31:41 OK, I guess none then
21:31:43 #topic Incubated projects
21:31:53 o/
21:32:00 o/
21:32:03 devananda, kgriffs, SergeyLukjanov: will cut your branches at the same time as the grown-ups'
21:32:10 great :)
21:32:18 works for me too
21:32:33 in savanna, we only have 1 bp and 1 bug in flight
21:32:39 https://launchpad.net/savanna/+milestone/icehouse-2
21:32:43 we've got a bunch of bug fixes in flight, stalled on the same slow gate that the grown-ups are stalled on.
21:32:54 I will defer what's still open
21:33:03 so keep your blueprint status current
21:33:08 (and your bugs too)
21:33:21 any question on that process ?
21:33:26 our BP status is correct (2 implemented); I've deferred the rest that were targeted previously
21:33:50 please feel free to defer any still-open bugs when you cut. we'll land what we can in the meantime
21:34:02 no questions here
21:34:05 #topic Open discussion
21:34:13 Anything else, anyone ?
21:34:27 on the cross-project front...
21:34:43 if you are using eventlet.wsgi to log requests, please turn debug off
21:34:45 https://review.openstack.org/#/c/67737/
21:34:51 I proposed that to neutron and nova
21:34:58 otherwise you get a ton of log spam
21:35:01 #info if you are using eventlet.wsgi to log requests, please turn debug off https://review.openstack.org/#/c/67737/
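[Editor's note: a minimal sketch of the cross-project request above, assuming a service that serves its WSGI app directly with eventlet; the per-project changes are in the review linked in the #info, and the exact debug output varies with eventlet versions.]

```python
# Minimal eventlet WSGI server with debug output disabled.
# eventlet.wsgi.server() defaults to debug=True, which emits exception
# tracebacks for failed requests -- at gate scale, a ton of log spam.
import eventlet
import eventlet.wsgi

def app(environ, start_response):
    # trivial placeholder application
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok\n"]

sock = eventlet.listen(("0.0.0.0", 8080))
eventlet.wsgi.server(sock, app, debug=False)  # the recommended setting
```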
21:35:17 sdague: thanks for the pointer. I suspect we need to do that, too
21:35:22 OK, back to useful work, then.
21:35:23 ttx, are you planning to use 'proposed/icehouse-2' instead of mp branches for i3?
21:35:33 no, mp
21:35:39 the change wasn't merged yet
21:35:47 we might try that for i-3
21:35:55 anything else ?
21:36:16 ok then
21:36:19 #endmeeting