21:02:34 #startmeeting project
21:02:35 Meeting started Tue Jan 7 21:02:34 2014 UTC and is due to finish in 60 minutes. The chair is ttx. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:02:36 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:02:38 o/
21:02:38 The meeting name has been set to 'project'
21:02:40 ttx: I did, new irc client - still configuring alerts
21:02:54 stevebaker: we can do it just after this one
21:02:57 ok
21:02:59 #link http://wiki.openstack.org/Meetings/ProjectMeeting
21:03:08 #topic Icehouse-2
21:03:16 We looked into icehouse-2 progress during the 1:1s today
21:03:31 only two weeks left before icehouse-2. how time flies
21:03:39 We are a bit late overall and I fear congestion at the gate next week
21:03:46 So land what you can this week :)
21:03:51 (if you can, eh)
21:03:54 dang holidays
21:04:02 We'll probably defer anything "not started" next Tuesday.
21:04:24 in other news, we have a few topics up for discussion
21:04:31 #topic Nova-network deprecation status
21:04:37 #link http://lists.openstack.org/pipermail/openstack-dev/2014-January/023555.html
21:04:52 We said we'd make a final call after icehouse-2, but now is a good time to take the pulse on this
21:05:10 markmcclain, russellb: what's the status at this point ?
21:05:30 sdague: and if you're around, QA perspective would be good too
21:05:44 we're actively working on closing the gaps from both a feature and QA side
21:06:14 we're meeting next week to sprint on Neutron/Tempest
21:06:22 think they'll be closed by icehouse-2? or still a ways off?
21:06:39 it still feels a bit off from there to me
21:07:00 there is also a general perception that built over time that makes people oversensitive to the issue
21:07:12 my experience is it's at least 4 - 6 weeks to stabilize the parallel jobs, and those aren't lit yet
21:07:24 Ideally feature parity would let us keep nova-network frozen
21:07:39 yeah stabilizing the jobs has uncovered other issues
21:07:44 then we can wait for more confidence before marking deprecated
21:07:47 I don't know where we stand on the feature matches, that's just the qa side
21:08:02 which has been a good thing, but has slowed velocity down
21:08:08 markmcclain: how about the feature parity bits?
21:08:14 there really wasn't too much left there
21:08:17 one big thing, really
21:08:23 multi-host?
21:08:24 o/
21:08:25 yeah
21:08:42 so still have two different groups working on it
21:08:45 what about having a good open source default that scales moderately well
21:09:13 jog0: do we have that in the nova-network case ?
21:09:26 they're working from different positions that have different payoffs (ie one gets us a good solution short term and the other pays off long term)
21:09:32 ("moderately scalable")
21:09:35 ttx: AFAIK yes
21:09:41 jog0: yeah i guess that was the other thing
21:09:54 so salv-orlando said a bunch of the issues are ovs 1.3
21:10:07 when moving to ovs 2.0 a lot of issues fixed themselves
21:10:11 i was actually just talking to a user yesterday saying that they were trying neutron with OVS on 200 nodes and it fell over badly
21:10:15 nova-network was/is fine
21:10:17 with havana
21:10:27 as part of the parallel testing we've been uncovering and fixing some of the scaling issues
21:10:34 that's good
21:10:39 russellb: that's a great benchmark to try to match here
21:10:46 OVS 2.0 is the direction to head long term
21:10:47 so maybe icehouse will fare better
21:10:51 russellb: any idea what ovs version?
21:10:52 but will require some distro support
21:10:57 sdague: nope, but i can find out
21:11:06 whatever we have in RHEL i suspect
21:11:07 russellb: that would be helpful, just to get more data
21:11:21 and i should know the answer to that ...
21:11:28 russellb: there are two pieces I think... you need feature parity to keep nova-network frozen. you need performance parity / stability parity before marking it deprecated
21:11:30 I agree, we should have a scaling threshold
21:11:35 russellb: would love to know how their cluster fell over
21:11:43 ttx: working on getting more info
21:11:47 err, markmcclain ^^
21:12:09 will point you to info once i have it
21:12:15 ttx: so feature parity is being worked through testing
21:12:45 russellb: I think we may get to feature parity, but fear that perf/stability parity might be farther away, not even talking about the confidence to build around it
21:12:49 ttx: the problem is that we've had a *much* longer time between freeze and deprecation than is healthy
21:12:51 (in icehouse)
21:13:03 performance parity depends on what you're measuring
21:13:09 I do think all of this paints a story that isn't going to close in icehouse
21:13:22 so ... i'm tempted to unfreeze, and revisit a freeze (and deprecation) when we have a much better feel for when that will happen
21:13:31 I think some huge strides have been made, which have been great
21:13:42 sdague: that's what i'm afraid of
21:13:50 unfreezing is a bad idea because it creates a moving target
21:13:52 yes, great progress and velocity, the track is just longer than expected
21:13:55 but I don't think we're going to close on all of it in icehouse, not in a way I'd feel comfortable with
21:13:56 markmcclain: and the migration story (re: a doc saying if using nova-network use this in neutron )
21:14:05 markmcclain: +1
21:14:33 russellb: how much would unfreezing really create a moving target ? what would make it in, really ?
21:14:42 i don't expect major work
21:14:44 just curious are there significant proposals out there for changes to nova-net?
21:14:45 jog0: we can get those docs written
21:14:46 bugfixes, new features ?
21:15:00 but there are all kinds of things we haven't bothered doing because it was on its way out
21:15:06 my question is that if folks want to change nova-net are they contributing to neutron?
21:15:17 if not then we're fragmenting the devs
21:15:19 no-db-compute is one example
21:15:23 and that was a couple releases ago
21:15:51 yeh, and I don't think it's reasonable to ask nova-network folks to work on neutron if they can't deploy it at scale in their envs.
21:15:53 more minor things
21:15:57 keeping it up to date with the rest of nova
21:16:05 lots of that kind of stuff we have passed on while it's frozen
21:16:06 markmcclain: I think russell is more talking about nova infra changes that were applied everywhere but in nova-net, like no-db-compute
21:16:12 honestly, I'd be +1 with the unfreeze, as I do think it's been frozen too long
21:16:13 mostly, yes
21:16:23 so maybe just a softer freeze
21:16:28 sdague: but when they don't contribute we lose that information and the problem being solved because it is not shared
21:16:36 like ... we don't really want *major* work on it, because that should be neutron
21:16:49 but we'll let smaller enhancements, and other infrastructure work to keep it up to date with the rest of nova
21:16:59 russellb: we could have a soft freeze (no new feature, just keep it up to date)
21:17:07 no major new features anyway
21:17:14 some of those would still count as "features" but would not widen the gap
21:17:20 moving to no db type stuff makes sense
21:17:24 russellb: but then what do "minor" enhancements buy you anyway?
21:17:27 like 'adopt nova new object model'
21:17:47 jgriffith: probably something we'd have to take case by case
21:17:53 russellb: fair
21:17:56 ttx: yes, that's been the biggest pain
21:18:06 ttx: that's an example I'm not sure I fully understand
21:18:09 rpc versioning stuff, no-db-compute stuff, nova objects stuff ... all avoided nova-network
21:18:13 markmcclain: so we could have a freeze for any feature that would widen the gap
21:18:17 has gone for a few releases now
21:18:23 and it's adding up
21:18:25 but not a strict code freeze
21:18:34 +1 on ttx / russellb's idea of a 'soft' freeze.
21:19:02 a freeze on features makes sense while allowing framework changes to be made
21:19:08 markmcclain: if you get to feature parity, then we can easily argue that feature dev should happen in neutron
21:19:14 performance enhancements would be OK i think
21:19:25 someone told me they had some of those they hadn't submitted because they thought they would be rejected
21:19:27 probably right
21:19:46 performance enhancements should be ok as long as the information is shared with Neutron team
21:19:46 it's just that nova-net users might not be ready to switch in icehouse, and need a working nova-net
21:19:47 so I think this just needs to be more incentive for the neutron team to close the basic gaps.
21:20:06 we might have already solved some of the same issues
21:20:16 * russellb nods
21:20:33 of course, the other big problem with all of this is our messaging to the community at large
21:20:40 i think it's pretty ... cloudy right now
21:20:46 ok, we can revisit the state after your neutron sprint ?
21:20:57 but I think this half-freeze has potential
21:21:18 yeah I think we'll have a better idea where we stand after the 17th
21:21:27 sounds good
21:21:59 all that said, I'm very pleased with the focus on those issues in the neutron team
21:22:14 it's just that the gap was deep
21:22:50 and covering it can take longer than expected
21:22:57 which brings us to the next topic
21:23:02 #topic Gate stability (jgriffith)
21:23:28 So going into next week I'm a bit concerned
21:23:36 jgriffith: anything on your mind ? the 78-deep gate queue ?
21:23:45 ttx: that would be correct
21:24:03 I'm wondering if we need to rethink some things
21:24:13 we don't seem to be solving the problem
21:24:13 mordred, fungi, jeblair, clarkb: around?
21:24:23 Or am I the only one that thinks there's a problem?
21:24:23 jeblair and clarkb are in .au
21:24:44 sdague: mordred is too
21:24:50 well today there was a nova problem that put us way behind
21:24:56 sounds like maybe I'm in the minority
21:24:56 or is he no
21:24:57 libvirt package was updated by accident
21:24:57 t
21:25:00 jgriffith: definitely a problem. Still very few people working on race bugs
21:25:08 and that caused all nova patches to fail until it was resolved
21:25:12 and we had compounding issues
21:25:18 ttx: sup?
21:25:21 also for reference ... http://not.mn/gate_status.html
21:25:35 russellb: yeah... makes sense but I guess my question is we continue to uncover more of these "races" than we fix
21:25:43 but that was just today
21:25:46 mordred: weekly "do we need to rethink some things about the gate" topic
21:25:50 I'm beginning to wonder if our testing approach might need some tweaking?
21:25:57 ttx: yeah.
21:25:59 * jgriffith ducks while everybody throws things at him
21:26:00 jgriffith: have any tweaks in mind?
21:26:10 jgriffith: I also think that we keep finding races because we're adding better tests
21:26:19 i generally always end up back at "we just need to fix the problems" ...
21:26:22 russellb: So I just started thinking about this yesterday
21:26:29 jgriffith: a lot of these races have been here a long time
21:26:32 jgriffith: well - I think the problem is that what needs tweaking is what people hack on and how - but we've been using the gate as a stick to try to force people to do so
21:26:34 just no one is working on them
21:26:44 russellb: +1 :)
21:26:48 Ok
21:26:53 Maybe I'm completely off base here then
21:26:57 jgriffith: I agree we may need to think of new things - but not tech things in the gate - I think we may not be motivating people successfully
21:27:01 ttx: yeah, here
21:27:04 I don't agree that it's "nobody working on them though"
21:27:09 i think the motivation issue is key
21:27:19 * jog0 thinks of beer
21:27:29 most nova gate bugs are collecting dust i think
21:27:30 one bug, one beer
21:27:40 I think there's a number of things going on that are not helping
21:27:56 working on the issues may be part of it
21:28:02 sdague: any specific project to shame ?
21:28:02 but there are a TON of distractions
21:28:15 gitignore, vim heading, __init__ files, typos in comments
21:28:27 and my favorite the 10 part commits (one line each)
21:28:38 This adds stress to the review queues, the gates etc etc
21:28:43 jog0: maybe you're onto something -- a special party at the summit for contributors to the race issues?
21:28:46 ttx: not really, I'm still working towards a better dashboard, but I get dragged down until actually trying to debug the key issues as well
21:28:47 and takes attention/focus away from those races in the gates
21:29:21 and trying to determine better ways of exposing those races early/first time around
21:29:33 so http://status.openstack.org/elastic-recheck/ lists about 20 bugs or so that are actively happening in the gate
21:29:34 I think we are at a key moment -- adding better tests so catching more bugs, with the tooling and reporting slightly behind and people not all lined up to fix them
21:30:00 trick is overall, that means stuff is getting better
21:30:10 ttx: I fully agree with that
21:30:19 yes, it hurts our velocity
21:30:32 and I apologize that I'm not bringing this up with a proposal other than maybe rethink some things
21:30:34 jgriffith: as for distractions, I think you just need to -2 more
21:30:42 but then, maybe we shouldn't be that fast as long as we have those bugs around
21:30:44 jog0: sure
21:30:55 ok
21:30:55 I also think that given the review load, I think it's totally ok to start -2ing things at this point that aren't useful
21:30:59 it's i2 almost
21:31:09 hurting the velocity is probably good if our quality is hurting
21:31:11 and so time to start shedding review load
21:31:12 "Your frustration I hear", said ttx, "but on the right path we are."
21:31:14 the tension comes when the bugs are on one project and the delays hit other projects (see the discussion from last meeting)
21:31:20 velocity is only useful if the quality stays high
21:31:21 Well it seems I'm in the minority on some of this so we can move on
21:31:25 jgriffith: here was one i just did https://review.openstack.org/#/c/64393/
21:31:30 russellb: ++
21:31:36 jd__: summarizing, you are good at.
21:31:53 jgriffith: I think you are not in the minority in believing the current situation sucks
21:32:06 jgriffith: I agree with your concern
21:32:07 sdague: to be clear, I'm not here to say "this sucks"
21:32:09 ttx: totally get the cross project tensions ... not sure what to do about that
21:32:13 * notmyname just woke up at 5am in AUS
21:32:20 I was hoping to get some thought on maybe doing something "differently"
21:32:46 whether that be changes in how we gate... changes in philosophy on commits during certain periods of time etc
21:32:57 jgriffith: yeh, so to me differently is having any gate bug go a week without an update
21:33:01 and of course getting discovered issues worked on
21:33:07 jgriffith: when there are critical gate bugs, I think it's very reasonable to ignore all other reviews and work on the gate / other critical bug
21:33:10 https://bugs.launchpad.net/nova/+bug/1257626 - last updated a month ago
21:33:11 jgriffith: there is some potential around slarter gate queues, as notmyname proposed last meeting
21:33:15 smarter*
21:33:17 it's like #4 on the list
21:33:21 sdague: agree, and yet we have bugs with 500+ rechecks from August
21:33:27 which needs some dev help to happen
21:33:39 jgriffith: right, that was my point on "*no one* is working on them"
21:33:49 which really should have been: not enough people are working on them
21:33:51 it seems like more people fixing these bugs is the biggest issue by far
21:33:52 sdague: fair enough
21:34:04 and people want to work on the fun stuff (features etc)
21:34:08 russellb: agreed
21:34:16 alright...
so that's all fair
21:34:17 things are tolerable until we get hit by some external issue and then the day is lost
21:34:33 it's going to hit the fan next week :)
21:34:36 so it's a bit of a fragile equilibrium
21:34:45 ttx: honestly, I'm not convinced we're at tolerable
21:34:52 even without external events
21:34:53 maybe when that happens people will be more interested
21:35:01 as in their stuff isn't landing
21:35:04 but the pain threshold hasn't seemed to bother enough people to dive in
21:35:11 I think the queue is 22hrs deep right now
21:35:14 sdague: ttx I'd say we are not
21:35:16 by a long shot
21:35:16 sdague: might be a european perspective. At least we have the option of sneaking our patches before the queue fails
21:35:24 ttx: LOL
21:35:34 ttx: with a 22 hour queue you don't
21:35:42 ttx: yeh, not at the moment you don't
21:36:12 Ok, that's all I have I guess. I'll think on how to motivate more effort on bug fixes
21:36:13 sdague: about current situation, is it just backlog, or some persisting issue ?
21:36:32 o/
21:36:32 and I would ideally love to come up with a solution to expose things first time around
21:36:34 ttx: so there were a few compounding events, like the libvirt upgrade
21:36:47 jgriffith: so math is actually against us on that one
21:36:51 https://bugs.launchpad.net/tempest/+bug/1253896 is my favorite bug
21:37:02 bounty system? don't know who provides the prize though
21:37:03 ttx: and now zuul is so hammered, it's actually also timing out bugs
21:37:05 sdague: and you can't beat math as I've always said in the past
21:37:11 filed and critical since second half of november
21:37:14 jgriffith: I'd be happy to work with you to figure something out
21:37:32 notmyname: thx
21:37:48 kidnapping first born, anyone?
21:37:49 jog0: https://bugs.launchpad.net/tempest/+bug/1253896 has actually been seeing active work
21:38:02 notmyname: There's a lot of really sharp people already thinking/working on it
21:38:04 I'd suggest cloning salvatore orlando a few thousand times
21:38:23 It'd be interesting if we all got together on something other than "get more people working on it"
21:38:28 ttx: +1
21:38:42 that was a test. good news. you passed
21:39:15 jgriffith: maybe. I'm not sure we're going to clever our way out of fixing bugs though :)
21:39:33 sdague: I'm personally a firm believer in smarter rather than harder :)
21:39:39 jgriffith: so right now all we have is the atomic "gate is wedged, stop +2ing" tactic
21:39:45 which is not a good one to use
21:40:06 my point is how to NOT wedge the gate in the first place
21:40:08 I think the concept of slowing down feature development until bugs are fixed is good. The problem is that the pain is shared by everyone, but not everyone can fix those bugs where they are
21:40:14 and I don't buy the "fix the bugs"
21:40:38 because there's something wrong IMO with either our code, or how we're testing
21:40:38 hence the frustration
21:40:51 jgriffith: so you're concerned so many bugs are getting into the gate?
21:40:54 jgriffith: put all of openstack into 1 single threaded process
21:41:03 * jgriffith proposes a quality / tech-debt release.. no new features :)
21:41:04 the issue is really races between components
21:41:10 jog0: exactly
21:41:26 which means there are timing challenges, which is why they show up 1 or 2% of the time
21:41:28 sdague: if that's true we need to look at what we're doing as a whole
21:41:38 ie the design/arch of OpenStack
21:41:40 I guess I feel like if someone contributed a test that failed to glance, I wouldn't even look at it, because it would make my unit test suite unusable, even if the test reflected an actual bug. .
so I'm not sure how nondeterministic failures are really different
21:41:42 the odds are against us
21:41:44 but I don't necessarily know if I agree
21:41:59 there are things that are done in tests that aren't really valid sometimes IMO
21:42:06 ok, we need to move on
21:42:13 no magic bullet again
21:43:02 k.. thanks everyone
21:43:04 although I have hope that the current efforts already in progress will improve the situation
21:43:30 (new elastic-recheck reporting, smarter queues, moar neutron parallel testing)
21:43:35 next topic
21:43:53 #topic Brick library (jgriffith)
21:43:57 jgriffith: you again
21:44:08 oi
21:44:13 you're on fire
21:44:17 gogogo
21:44:17 Ok... so I've talked with some folks on this
21:44:32 Cinder came up with this idea of brick.. basically a mini-cinder service
21:44:51 do things like manage LVM/local storage on compute nodes
21:45:03 long term goal is things like scheduling local disk on instance nodes
21:45:20 also things like manila, trove etc
21:45:34 IMO should leverage cinder for block storage rather than write their own
21:45:58 My proposal is to skip incubation in oslo and create a lib right out of the gate
21:46:02 it would fall under Cinder
21:46:21 then once that's set up of course go through and pull into the other projects
21:46:46 I've chatted with a couple of folks and wanted to get all the PTLs up to speed
21:46:48 i don't think we really incubate any other library right? incubating libs in oslo was just while the API was unstable
21:46:50 jgriffith: is it a library ?
21:46:50 sounds sane
21:46:51 make sure there are no objections
21:47:00 seems fine to me
21:47:04 ttx: it's not yet, I need to turn it into one :)
21:47:13 not sure about the oslo part, but dhellmann might have an opinion
21:47:19 I'll need help from ttx and the infra gurus on that front :)
21:47:23 no need for incubation in the oslo sense
21:47:30 jd__: I spoke with dhellmann on this, skip oslo
21:47:38 and no need for incubation in the normal sense either right?
21:47:39 ^^ skip oslo was what we came up with
21:47:46 russellb: that's what I'm hoping
21:47:47 yeah, we discussed this and I agree it doesn't make sense for this to go through the oslo-incubator
21:47:49 works for me
21:47:53 jgriffith: ok, so that would be just a standard Python library?
21:47:57 russellb: I'm hoping its use in cinder has been the "incubation"
21:48:00 jgriffith: one question left was the use of the openstack namespace for the lib name, right
21:48:05 jgriffith: this become oslo.foo or cinder.foo?
21:48:06 ttx: yes
21:48:18 sdague: that's the million dollar question :)
21:48:19 please make a standard Python lib, don't zope it
21:48:26 jd__: noted :)
21:48:30 jd__: nice :)
21:48:34 sdague: it can only be cinder.brick if cinder is converted to a namespace package (which isn't a bad idea, but will involve some code churn)
21:48:47 * jgriffith hates code churn
21:48:56 just brick seems fine
21:48:57 but if it's worth the end result I'm down for it
21:49:00 if not already taken
21:49:12 I'll deal with a name if needed
21:49:13 ok, so the only thing to note is we'll need to gate it like the oslo.config and oslo.messaging libs
21:49:29 sdague: that's the part I'm going to need some help on, at least some guidance
21:49:31 as that's our current policy for things like that (which might get revisited)
21:49:35 sure, normally that's what I'd say, but this is really only intended to be used by openstack apps so that's why we were discussing a global namespace
21:49:39 sdague: roger that
21:49:55
jgriffith: yeh the oslo graduation section lays most of it out
21:50:08 is this really only intended to be used by openstack apps ?
21:50:11 and ttx is going through it with oslo.rootwrap right now
21:50:17 lots of fun
21:50:30 vaguely documented on the oslo page now
21:50:42 ttx: well I'm not designing it with anybody else in mind but I don't have a problem if others pick it up and use it
21:50:51 but I don't want to maintain it in that perspective
21:51:03 Life is much simpler when it's "openstack only" IMO
21:51:35 anyway, I just wanted to run this by the other projects
21:51:42 mostly russellb (so thanks russellb )
21:51:51 jgriffith: do you need anything more on that subject, or can we move on ?
21:51:54 make sure there's no "WTH is this"
21:52:03 ttx: move on... thanks everyone
21:52:10 jgriffith: sure!
21:52:16 #topic Red Flag District / Blocked blueprints
21:52:24 No blocked blueprint afaict
21:52:33 Any blocked work that this meeting could help unblock ?
21:52:49 78-deep gate doesn't count.
21:53:46 76*
21:53:50 yay
21:54:17 so coming back to the gate issue, one thing we haven't tried is global gate bug fix days. Especially if you got a lot of PTLs and top tech folks to sign up for devoting a day it might shake some of these loose.
21:54:24 elephant gun approach
21:54:35 sdague: i'd love to do that
21:54:52 mondays are good moments for that. Clean gate
21:55:02 sounds like a good idea
21:55:03 it's not a long term sustainable approach ... but it's good for playing catchup
21:55:08 and i think we need to play catchup right now
21:55:12 +1
21:55:13 ++
21:55:18 +1
21:55:20 i'd be happy to devote at least a full day, if not a couple
21:55:36 sdague: go for it
21:55:50 damn, I just signed myself up for organizing that didn't I....
21:55:50 just track the results to see if it made a difference :)
21:56:01 sdague: sure did
21:56:07 ttx: good call
21:56:17 #topic Incubated projects
21:56:23 ok, well with montreal I'll plan to do one post i2
21:56:26 * NobodyCam is here for ironic
21:56:32 4 minutes for questions if any
21:56:52 * SergeyLukjanov here too
21:56:54 NobodyCam: are we still on track for an icehouse-2 functional milestone of ironic ?
21:57:32 Ironic update. we are looking mostly good. we may have an issue with neutron integration, but have already started on a fallback plan for that case
21:57:34 SergeyLukjanov: https://launchpad.net/savanna/+milestone/icehouse-2 looking good
21:58:18 ttx, yup, heat integration is already landed
21:58:49 fwiw we should have some incubation status intermediary meeting soon at the TC level
21:58:53 will keep you posted
21:59:12 great
21:59:19 #action ttx to organize intermediary incubation status TC meeting soon
21:59:34 any question ?
21:59:53 no time left for open discussion
22:00:25 Thank you all
22:00:28 #endmeeting