21:03:01 #startmeeting project
21:03:01 Meeting started Tue Jul 8 21:03:01 2014 UTC and is due to finish in 60 minutes. The chair is ttx. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:03:02 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:03:04 The meeting name has been set to 'project'
21:03:09 * eglynn didn't realize the meeting was off
21:03:10 Agenda for today is available at:
21:03:21 #link http://wiki.openstack.org/Meetings/ProjectMeeting
21:03:25 #topic Actions from previous meeting
21:03:30 I documented SPD and SAD, in case you want to use them for Juno in your projects:
21:03:35 #link https://wiki.openstack.org/wiki/SpecProposalDeadline
21:03:37 ttx: Thanks!
21:03:40 #link https://wiki.openstack.org/wiki/SpecApprovalDeadline
21:03:44 #topic News from the 1:1 sync points
21:03:52 There were no 1:1 sync points today since I've been traveling
21:03:57 If any of you has news items, just shout now
21:04:38 Swift 2.0.0 was out but I guess everyone noticed that
21:04:43 #topic Other program news
21:04:47 I'm starting work on cross-project unit test jobs (https://review.openstack.org/#/c/95885/) in the next couple of days.
21:04:53 Infra, QA, Docs... anything you'd like to mention ?
21:05:13 dhellmann: what is a cross-project unit test?
21:05:25 * ttx follows link
21:05:32 ttx: running unit tests for projects using master of oslo libraries, and vice versa
21:05:57 dhellmann: ah, ok
21:06:01 nifty
21:06:06 we already have integration tests via d-g, but this will prevent breaking unit tests with new releases (we did that a couple of times last cycle)
21:06:08 dhellmann: ++
21:06:48 any other news ?
21:07:24 #topic Neutron parallel gate job switchover (mtreinish)
21:07:30 mtreinish: floor is yours
21:07:37 ttx: ok thanks
21:08:01 so switching the neutron jobs over to parallel is going to happen soon
21:08:11 mtreinish: Yay!
21:08:17 mtreinish: didn't we do that once already ?
21:08:19 right now they're the only jobs that still run tempest serially in the gate
21:08:44 ttx: not for the neutron jobs
21:08:50 ok
21:08:56 ttx: no, we got really close, then the January massive gate wedge happened
21:08:57 we've had a nonvoting parallel job for a while
21:09:18 so the question I wanted to get opinions on was whether we make the switch everywhere
21:09:46 or we have an asymmetrical gate with neutron (ie switch it to gate parallel for neutron and leave it serial on all the other projects)
21:10:09 the concern from salv-orlando and others was that the reliability of the jobs goes down slightly after switching to parallel
21:10:11 is neutron running all of tempest yet?
21:10:24 russellb: this switch will include that more or less
21:10:32 i see. ok
21:10:33 I feel like you should make it everywhere if you're going to do it
21:10:42 But if it's unreliable, probably shouldn't do it at all
21:10:43 mtreinish: I think switching neutron first sounds good, so if things do go wrong they don't bring everything down.
21:10:45 mtreinish: did you quantify the difference in ... reliability ?
21:11:05 ttx: salv-orlando's ML thread had some numbers
21:11:11 let me pull up that link
21:11:17 mtreinish: I am a fan of making changes like this slowly and carefully and not all at once
21:11:24 mikal: well everything is unreliable to some degree. Nova fails unit tests in the gate sometimes. It's all a matter of degrees.
21:11:39 #link http://lists.openstack.org/pipermail/openstack-dev/2014-June/038496.html
21:11:44 jogo: I agree it should be rolled out in steps, but I think the end goal should be consistency
21:11:46 jogo++
21:11:57 mikal: yup
21:12:02 jogo: the concern with it being asymmetrical is that we will break the neutron gate at some point
21:12:25 mtreinish: can we leave one serial job in neutron gate?
21:12:28 just to be safe
21:12:32 so in a situation like this, i'd prefer to make it voting within the project but not elsewhere yet
21:12:41 jogo: it's the reverse case
21:12:42 jogo: that's not actually the concern
21:12:53 primarily to help developers within the project avoid introducing new issues that this would catch
21:12:54 So... Making this change would triple the fail rate?
21:13:03 Or am I reading this wrong?
21:13:03 sdague: what is the concern, sounds like i am missing something
21:13:10 mikal: that's how I read it, too
21:13:21 Triple is a pretty big increase
21:13:29 (Noting that it might be telling us about actual neutron bugs though)
21:13:34 jogo: in parallel things like hitting keystone and the rest of the services change dramatically
21:13:50 if they change an access pattern that doesn't work for neutron, neutron is wedged, and they can't fix it
21:13:50 from ~10% to ~30%
21:14:17 mikal: it should only be about 2x
21:14:34 mikal: there are still some outstanding big bugs which is why we haven't switched yet
21:14:41 mtreinish: ahhh, at the end of the email he says 2x _if_ some bugs can be fixed
21:14:41 actually, you have to compare the check queues
21:15:04 "Summarizing the only failure modes specific to the full job seem to be C &
21:15:07 D. If we were able to fix those we should reasonably expect a failure rate
21:15:10 of about 6.5%. That's still almost twice as the smoke job, but I deem it
21:15:13 acceptable for two reasons"
21:15:14 sdague: right, but what is the risk of moving neutron to parallel before the rest?
21:15:27 jogo: because this job doesn't just test neutron
21:15:35 it's a configuration that includes neutron
21:15:43 testing everything
21:15:45 sdague: ah right, so keystone may wedge neutron only
21:15:49 yep
21:15:58 so the summary at the end of that mail seems to support the possibility of making it asymmetric initially
21:15:58 jogo: or more likely nova...
21:16:15 I think that risk is fairly small, while the risk of parallel neutron breaking everything is higher.
21:16:56 jogo: honestly, with all the races in security groups in nova that we've been diving in, I'd actually say there is a pretty good chance we're going to 'fix' something and break neutron in the process.
21:17:18 mtreinish: I still tend to prefer enabling it for neutron only as a first step, but I'll happily defer to the gatemasters
21:17:35 ttx: honestly, I'm actually good with neutron first as well
21:17:53 ttx: well I was initially opposed to that, but I don't really feel too strongly about it
21:17:57 but I did want to make sure the other side downfalls were clear
21:18:06 sdague: that may be true. Sounds like we agree
21:18:12 as long as we keep our eyes open about the issues with doing it
21:18:27 so it means if we do that, the core teams for other projects are going to have to sign up to moving on changes fast if they broke neutron
21:18:29 mtreinish: the goal being... to flesh out issues without breaking everyone else... and make it all parallel asap
21:18:30 devananda: knows firsthand how much fun an asymmetrical gate is
21:18:31 mtreinish: ++
21:18:56 ok, anything else on that topic?
21:19:06 mtreinish: indeed ....
21:19:10 mtreinish: you said this will enable full tempest too?
21:19:17 jogo: yes
21:19:31 although there are still a bunch of skip if neutrons in there because of api issues
21:19:37 mtreinish: awesome then we can do https://review.openstack.org/#/c/100033/ after
21:19:51 ttx: that's all I had, we've got a direction to move this forward when it's ready
21:19:52 jogo: no, I really don't want to do that
21:20:02 ok, I think we can move on then
21:20:03 #topic Are we digging ourselves into a backporting hole with branchless Tempest? (eglynn)
21:20:03 sdague: oh?
let's talk after
21:20:13 context is: in ceilo we're beginning to see featureful backports to stable just to facilitate branchless Tempest
21:20:15 eglynn: here is the mike
21:20:21 example ... https://review.openstack.org/104863
21:20:52 in this case it was a kinda featureful fix for a bug of *omission* ... i.e. ceilo wasn't handling the cinder notification at all
21:20:54 they shall be rejected by the stable team, no?
21:21:00 i've been wondering when this would be brought up
21:21:18 ironic hasn't hit it yet, but i've started to suspect we will hit this soon
21:21:18 eglynn: so are notifications part of your API contract?
21:21:26 ttx: yep would be in the normal course of events
21:21:39 sdague: so question is ... is leaving the backporting policy unchanged a goal of branchless tempest?
21:21:55 eglynn: the point was to enforce published API contract
21:22:01 because we were slipping it a lot
21:22:18 so I guess the key problem is that some of these ceilometer tests are not strictly API tests
21:22:30 ok, so then they probably shouldn't be in tempest
21:22:40 i.e. we want to assert that a certain notification emitted by another service is consumed by ceilometer and the expected metering datapoints appear in ceilometer
21:23:11 sdague: do we have another integration test harness for such tests to go to? ;)
21:23:18 eglynn: nope
21:23:21 this is a bit like saying "when we ask nova's API to launch an instance, it actually makes a usable instance"
21:23:26 eglynn: what about if you add a certain notification, say during the Juno cycle, which wasn't in Icehouse, and want to test that?
21:23:51 eglynn: is that the situation you're in, or have i misunderstood?
21:23:51 devananda: so basically the situation is a little like scenario #1 "New Tests for new features" in the BP
21:23:53 dhellmann: so I consider a working compute part of the API contract
21:24:00 #link https://github.com/openstack/qa-specs/blob/master/specs/implemented/branchless-tempest.rst
21:24:07 devananda: https://github.com/openstack/tempest/blob/master/README.rst#1-new-tests-for-new-features
21:24:19 ... but without the obvious discoverability in this case
21:24:26 sdague: and ceilometer says that the events it collects will be returnable by the API, but the API itself does not specify what events are collected -- like nova doesn't specify which images are available
21:24:35 i'm sorely lacking in background here, but it seems like "datapoints appear in ceilometer" might be part of the api contract; similarly to a nova instance showing up
21:24:38 right, without discoverability
21:25:07 sdague: because new event types can be created by end-users through the API, deployer custom code, etc. The API doesn't specify the event names.
21:25:09 this sounds like an issue in the ceilometer API itself
21:25:12 jeblair: the API is the same ol' API, i.e. the query doesn't require a bump in the API version
21:25:23 dhellmann: ok, then the tempest test proposed isn't valid
21:25:34 because it makes too many assumptions on something not in the API
21:25:40 jogo: I fear we may be bending the problem to fit the "solution"
21:26:06 sdague: so we don't want any tests that say that ceilometer does actually collect data from the other services?
21:26:28 dhellmann: is it a required part of the API?
21:26:29 eglynn: ? can you elaborate. I meant having an open ended API like that without discoverability is dangerous
21:26:30 sounds way too restrictive for an integration test harness
21:26:31 I think we want that tested somewhere, even though the list of events isn't defined in the ceilometer API.
21:26:43 so I was chatting with dkranz a bit about this earlier
21:26:46 "can I store arbitrary data in this API" vs "when cinder does $thing, did $value get stored in ceilometer" ?
21:26:47 sdague: yes, ceilometer says it will "collect events" but it doesn't say which ones
21:27:01 dkranz had the idea of micro-versioning the service (as opposed to the API)
21:27:15 ... so that a test can be skipped on the basis of whether a particular commit is available or not
21:27:32 eglynn: how would that be reflected on a public cloud?
21:27:34 ... which gives fine-grained discoverability when tempest is run in-CI-gate
21:28:02 sdague: I was waiting for you to ask that ... it doesn't help in that case
21:28:41 ... it seems like forcing tempest to do double-duty in that regard will restrict what we put into tempest (in its role as the integration testing harness for the CI gate)
21:28:43 right, so I think this becomes one of those things where people keep throwing tests at tempest that aren't of the level of stability to really be part of it
21:29:05 so is "able to collect data" the feature? or "ceilo actually collects data from $service"? If the latter is the feature being tested, then AIUI, it's not related to the API version at all
21:29:16 eglynn: is ^ a fair summary of the question?
21:29:29 I don't see this as a "stability" question, though. The problem is the events collected actually depend on ceilometer's pipeline configuration file as much as code.
21:29:35 devananda: yep, the latter
21:29:37 eglynn: I feel like Ironic is bumping against a similar challenge with scenario testing
21:29:59 so do we need to split off the CI-focused and the public-cloud-capability-test-focused aspects of Tempest?
21:30:07 yeh, I think what's emerging is that for a long time we basically had unit tests and tempest
21:30:16 and I don't think that's the right approach going forward
21:30:27 eglynn: because I actually want to test what /nova/ does, and even though its API isn't changing, in a future version of OpenStack, it may be capable of doing more things with Ironic than it is today
21:30:49 I think that projects really should have functional tests that run against a devstack just for their project which is in this middle ground
21:31:01 eglynn: which sounds similar to you testing the cinder->ceilo messages, not ceilo's API itself
21:31:01 devananda: yep, and would those extra things be externally discoverable?
21:31:02 and Tempest needs to stay at the API stable boundary
21:31:05 sdague: ++
21:31:09 eglynn: nope.
21:31:17 eglynn: since Nova doesn't expose what driver is being used
21:31:22 devananda: k, so same problem I think
21:31:37 eglynn: right, because you actually want white box testing
21:31:39 sdague: those tests need to protect us so a project can't change the format of an event and that break ceilometer
21:31:55 neutron has some functional testing along those lines
21:31:57 so we can't just run them against the project
21:32:04 sdague: "projects really should have functional tests that run against a devstack just for their project" --> does that scale?
21:32:05 so we definitely need integration tests for the various cross-project interactions like this
21:32:18 eglynn: honestly, it scales better than the current model
21:32:26 does that mean another functional test suite that is not tempest?
21:32:29 which is what tempest originally provided, based on my limited understanding
21:32:30 (scale in terms of the QA expertise to build such a harness on a per-project basis)
21:33:03 swift does it too
21:33:06 dhellmann: I actually think these are functional tests in the project itself
21:33:10 (also in terms of CI resources to run all these separate mini-tempests)
21:33:15 ceilometer owns ceilometer's functional tests
21:33:25 sdague: but ceilometer will be depending on information coming from other projects for these tests
21:33:30 shouldn't all cross-project interactions be versioned to help address this issue?
21:33:38 with tested APIs
21:33:40 sdague: so what if $otherproject breaks the API contract between it and ceilo?
21:33:41 devananda: exactly
21:33:47 sdague: ditto for eg. nova <-> ironic
21:34:00 jogo: yeah, there's no contract on notification format at all right now, afaict
21:34:03 devananda: so I think we actually can solve that with things like the contract unit test job we put in nova
21:34:07 or, for that matter, any project and keystoneclient
21:34:14 dhellmann: yeah and I think that's the crux of the issue here
21:34:14 dhellmann: IMHO that is a bad
21:34:24 so the contract around notifications is different to an API
21:34:34 eglynn: there is no contract on notifications
21:34:34 eglynn: eh?
21:34:35 totally agree; and jd__ had a blueprint to work on it but didn't get much traction
21:34:35 notifications are versioned usually, for example
21:34:57 eglynn: rpc payloads are versioned, but are notifications?
21:35:15 that might have changed since I was involved closely with this part of ceilometer...
21:35:22 dhellmann: I mean *not* versioned
21:35:25 ah
21:35:30 Freudian slip :)
21:35:34 eglynn: heh
21:35:40 so here is my point of view based on a month of trying to dig us out of current gate situations.
21:35:59 I don't think the current model of pile more into tempest, and expand tempest scope, is helping
21:36:08 because people aren't helping fix the issues
21:36:14 they are just recheck grinding
21:37:01 so functional / grey / white box testing that people recheck grind on because their nova change is getting broken by a ceilometer test.... I'm not sure that's going to help us move forward
21:37:03 s/tempest/OpenStack/ and that statement is still fairly true
21:37:10 sdague: isn't there also a strong incentive to pile more & more into tempest in terms of TC mandated requirements for tempest coverage?
21:37:20 jogo: exactly
21:37:22 eglynn: there are API coverage needs
21:37:52 jogo: that's the crux of the problem, IMO, but also a different discussion
21:37:56 but testing the API, and doing functional testing of cinder notifications which were not provided by the API is different
21:38:02 devananda: agreed
21:38:23 sdague: but these tests are good check compatibility projects
21:38:33 sdague: I agree with the need for separate tests. I don't agree that *all* of them can be project-specific.
21:38:43 vrovachev: then they should be valid across an API
21:38:44 sdague: can we redefine the question to, should this be an API discoverable thing?
21:38:56 jogo: sure
21:39:26 because this sounds like something that probably should be. If I am using a public ceilometer I want to know what it supports etc
21:39:36 jogo: We could add an API to ceilometer to discover what sort of data is being collected at all. But how do we version the response, if the deployer can change it.
21:39:51 sdague: in the case of ironic, the problem is the API being exercised is not the one being tested by tempest
21:39:53 sdague: the interaction between ceilo and all services simply isn't API based in its entirety
21:40:18 eglynn: I think that's one of the big issues here
21:40:19 eglynn: in this situation, is the end-user communicating with ceilo, or is the notification API "hidden" behind other services?
21:40:20 eglynn: right, I do get that. But I also get the scope problems we currently have.
21:40:40 sdague: then it is necessary to mock all API requests
21:40:53 devananda: in this situation the end user snapshots a volume so they only interact with the cinder API
21:40:58 eglynn: and you've caught me at a particular level of frustration, because realistically I've not been reviewing tempest code for the better part of a month now
21:41:03 eglynn: i suspect again we're facing the same issue -- this isn't discoverable by tempest because tempest isn't talking to the API being tested
21:41:17 because everyone wants to add new tests, no one wants to debug the fails we hit
21:41:47 sdague: and this is tests to validate and not functional tests
21:41:48 devananda: yes, in our case tempest talks to the ceilo API to check for a side effect of the interaction done via notifications
21:41:49 sdague: because there's a mandate that projects add more tests if they want to graduate / stay integrated
21:42:11 OK, I think we should think a bit more about this and maybe discuss it on the list. Problem looks a bit more complex than we can solve in a one-hour IRC meeting
21:42:46 devananda: well, until we can dig out of the current fail pit, we can't really sort out the long term issues here.
21:42:48 ttx: I can bring the topic to the ML tmrw, if everyone is agreed to continue the discussion there?
21:42:48 ttx: ++
21:42:55 eglynn: sure
21:42:58 eglynn: sounds good to me
21:43:02 eglynn: ++
21:43:06 sdague: cool thank you sir!
21:43:12 at least we established now that there is no easy way out
21:43:22 eglynn: Sounds good, thanks!
21:43:25 #topic Open discussion
21:43:39 Feel free to continue discussing now :)
21:43:45 Anything else, anyone ?
21:43:46 :)
21:43:59 meanwhile, in Brazil...
21:44:06 for srs
21:44:12 i haven't seen a gate status email to the dev list in a while
21:44:31 did i just miss it, or did folks stop sending those?
21:44:32 * eglynn is running out of fingers to count the German goals ;)
21:44:42 sdague, jogo, mtreinish: ^ ?
21:45:05 jeblair: honestly, I ran out of energy to do them
21:45:10 jeblair: I have been on vacation
21:45:25 jeblair: do you miss them / do you think they helped?
21:45:27 it's apparent that sdague is frustrated, i just want to make sure everyone else actually knows that :)
21:45:41 ttx: i don't know. from the sidelines, i thought they did
21:46:17 i _personally_ found them helpful just to know what was going on
21:46:24 that's not a good enough reason to do them
21:46:41 but i did think that they usually got results for the top offending bugs
21:46:50 jeblair: yeh, when it seemed like they were drawing more folks in, it was easier to get motivated on them. But when the decision is 'do I write this up, or try to fix one of these' I've been opting for fixing
21:47:30 jeblair, sdague : I found them useful, too, to understand the state of what's being dealt with.
21:47:46 dhellmann: +1, I also found them interesting and useful as well.
21:47:51 dhellmann: right, but did you find them useful enough to devote 3 days to fixing things :)
21:48:01 because being interesting isn't really the goal
21:48:07 indeed
21:48:40 sdague: if I had had 3 days, I might have, but I at least knew why my other patches weren't working and was less likely to recheck them as a result
21:48:44 i'm not asking for myself, more thinking that if i'm surprised that sdague is frustrated by the gate status, perhaps other people who might be able to help would be as well
21:48:48 so if I have commitments from people that me doing that will bring more people to fixing issues, that's an easy trade off
21:49:16 jeblair: right, I'm not asking sdague to keep it up, but I think it was valuable so maybe we need to find some community members who can take that task off his plate
21:49:53 not everyone is going to be able to dive right in and debug, but summarizing the state of the most common gate failures doesn't require that level of expertise (at least I think not, maybe I'm wrong)
21:50:44 dhellmann: i think you may be right; it probably requires paying some attention over time, but not huge expertise
21:50:51 dhellmann: yeh, another volunteer to do that would be appreciated as well
21:50:54 jogo: sdague: any thoughts on what to use for white box integration testing as the scope of OpenStack continues to expand, and the gate becomes exponentially more complex?
21:51:18 devananda: realistically, I think we need to reevaluate whether we think that's sustainable
21:51:20 this is something we should probably start talking about since it seems to be on our minds already
21:51:25 I think the gate news, IF relatively low cost, can help getting people interested
21:51:58 At least that gives me good posts to point people to when they ask "where should I help"
21:52:14 sdague: i don't think it is.
I thought (and said) that over a year ago, fwiw
21:52:23 sdague: the board asked us at the summit how they can help, maybe we can take this specific request for volunteering to a few of them?
21:52:33 dhellmann: sure
21:52:47 I'd be more than happy to help bootstrap folks here
21:52:51 devananda: I agree with sdague on this one
21:52:52 OK, I think we can wrap up
21:53:09 before we kill the gate in frustration
21:53:13 jogo: i mean, i agree as well -- i don't think it's sustainable
21:53:32 when you think you had a bad day, consider Brazil's team day
21:53:50 I fear OpenStack is facing feature sprawl without focusing on solidifying the foundations
21:53:50 devananda: realistically I think we need to be more surgical about this
21:54:10 because we can actually put contract points in, like we did with ironic on nova
21:54:58 maybe a topic for another meeting
21:55:04 ttx: sure :)
21:55:04 ttx: yep
21:55:05 last words ?
21:55:12 you just said it was open discussion
21:55:27 OK then, let's close it :)
21:55:37 #endmeeting