21:01:52 #startmeeting project
21:01:53 Meeting started Tue Aug 26 21:01:52 2014 UTC and is due to finish in 60 minutes. The chair is ttx. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:01:54 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:01:57 Our agenda for today:
21:01:58 The meeting name has been set to 'project'
21:02:03 #link http://wiki.openstack.org/Meetings/ProjectMeeting
21:02:14 0/
21:02:35 #topic News from the 1:1 sync points
21:02:41 Here is the log:
21:02:46 #link http://eavesdrop.openstack.org/meetings/ptl_sync/2014/ptl_sync.2014-08-26-08.14.html
21:03:18 Only Glance was MIA today
21:03:26 In summary, juno-3 and feature freeze will hit most projects next week
21:03:40 That means we have only 9 days to merge the remaining targeted features
21:03:42 * mestery notes hit is not necessarily a figurative term
21:04:08 We are at about 15% of targets landed right now, and 80% of our juno-3 time is consumed
21:04:25 so let's just say we really need to switch gears
21:04:47 (to reach icehouse activity level we need to reach 80% of those targets :)
21:04:48 * dolphm makes a sacrifice to the transient failure gods
21:04:56 (by some weird simplistic metric)
21:05:00 * mestery tosses one on after dolphm.
21:05:09 We identified a few blocked things, which we'll discuss in this meeting
21:05:25 ttx: keystone is unblocked!
21:06:03 cool
21:06:09 dolphm: now get that feature merged :)
21:06:15 #topic Other program news
21:06:21 Any other program with a quick announcement ?
21:06:48 ttx: well last week I forgot to announce that salv-orlando got the neutron parallel full gate enabled everywhere
21:06:53 which is awesome
21:06:59 mtreinish: ++, awesome work by salv-orlando!
21:07:02 And the QA team
21:07:06 excellent
21:07:17 Neutron is proceeding with the incubator proposal for new features, we should have it up and running this week yet.
21:07:54 mestery: where will neutron advanced services (FWaaS etc.)
fit in to that
21:08:02 #link https://wiki.openstack.org/wiki/Network/Incubator
21:08:19 jogo: That's still under discussion, it's been talked about possibly moving them there, but not right away at least.
21:08:23 jogo: So, stay tuned :)
21:09:18 ok, anything else ?
21:09:30 jeblair: how is the gate holding so far ?
21:09:47 jeblair: despite all my efforts to stage the review activity, it still looks like we'll have a heavy week ahead
21:11:02 mestery: thanks
21:11:43 ttx: we are experiencing lots of unit test issues: http://jogo.github.io/gate/ (at the bottom)
21:12:32 unit tests ? that's a new one
21:12:55 ttx: yeah sadly it is
21:13:01 any reason for the surge?
21:13:18 the magic random hash, one assumes
21:13:24 ttx: I think it was a new testtools version breaking glance
21:13:40 jogo: is that keystone or trove? colors are nearly the same
21:13:50 those things happened a few days ago so not sure
21:14:00 the freshness checks were also turned off
21:14:08 well here are the unclassified gate failures: http://status.openstack.org/elastic-recheck/data/uncategorized.html
21:14:13 I think the 24hr freshness check probably should come back
21:14:27 sdague: ++ why was that removed?
21:14:29 for sahara there was an issue w/ new jsonschema release (in unit tests)
21:14:34 sdague: hmm, so avoidable unit test fails are back too
21:14:35 ttx: the upgrade to tox caused the hashseed value to be set randomly, and that was causing the python-26 tests to fail in trove.
21:14:50 sdague: yeah that may be related. can we just bring back unit test freshness?
21:15:01 I've seen a number of changes pushed to bypass that though... aren't they merged by now?
21:15:03 jogo: or drop unit tests from the gate :)
21:15:08 which I wanted to do 8 months ago
21:15:17 ttx: we've got a fix in place to work around that, but are also looking at fixing the problematic tests as a high pri.
21:15:49 dolphm: so the theory was that people shouldn't have massively flakey unit tests....
because they are unit tests, so it shouldn't be an issue
21:15:54 ttx: yes merged now.
21:16:02 sdague: theory!
21:16:10 sdague, jogo: ok, let me know if there is anything I can do, or anything this particular meeting could help with
21:16:45 ttx: people can classify their unclassified failures
21:16:49 http://status.openstack.org/elastic-recheck/data/uncategorized.html
21:17:00 ttx: so we have better insight into what is failing and why
21:17:12 #action everyone help classify gate fails (http://status.openstack.org/elastic-recheck/data/uncategorized.html)
21:17:28 jogo: I'll try to give it some of my remaining cycles tomorrow
21:17:32 if anyone needs help I will be in -qa
21:17:46 If we enter the week of death with a gate that has a cold, we won't be in good shape.
21:17:52 jogo: how do people go about classifying something?
21:18:06 sdague: I think bringing back the 24 hour check (or maybe just a 72 hour or something) is a good idea right now
21:18:07 an email to the mailing list often helps with classifying failures
21:18:13 dolphm: elastic-recheck fingerprints
21:18:16 who wants to send the email?
21:18:34 jogo: the 24hr check before entering the gate is the one that protects the gate
21:18:48 the 72 hr check probably just eats nodes
21:19:00 sorry, was afk
21:19:11 sdague: is there evidence the 24h check would help?
21:19:42 jeblair: rigorous? no. I've just seen a few wrecking balls when I've looked on stuff that can't pass the gate
21:19:54 those add a lot of time on the gate side
21:19:56 sdague: previously when we've looked, we have not found gate failures that would have been prevented by the 24h check
21:19:57 ttx: 'gate' health: http://status.openstack.org/elastic-recheck/index.html
21:20:09 sdague: because we still have the requirement for a +1 before entering the gate
21:20:13 jogo: does 'recheck bug ###' still count towards classifications?
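[Editor's note] The hash seed failure mentioned at 21:14:35 can be illustrated with a small sketch. Newer tox releases export a random PYTHONHASHSEED for each run, so a test that asserts on the iteration order of a set of strings may pass under one seed and fail under another. The code below is not taken from trove or any other project; the helper `set_order_for_seed` and the sample names are hypothetical. It runs a child interpreter under several seeds and shows that an order-independent comparison stays stable whatever the seed is.

```python
import os
import subprocess
import sys

# Hypothetical sample data; any set of strings behaves the same way.
SNIPPET = "print(','.join({'nova', 'glance', 'trove', 'heat', 'swift'}))"


def set_order_for_seed(seed):
    """Return the set's iteration order in a child interpreter using this hash seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output([sys.executable, "-c", SNIPPET], env=env)
    return out.decode().strip().split(",")


# Raw orders may differ from seed to seed, so a test asserting on one of them
# is tied to a particular seed.
orders = [set_order_for_seed(seed) for seed in range(1, 6)]

# Comparing sorted (order-independent) forms gives the same answer for every seed.
normalized = {tuple(sorted(order)) for order in orders}
```

Pinning the seed only hides the dependency on ordering; rewriting the assertion to compare sorted or set-based values removes it.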
21:20:15 ttx: if anyone knows javascript I have a few ideas for that page to make it more useful
21:20:21 jeblair: the thing I called people out on the list about for glance would have
21:20:26 dolphm: no, you have to add a fingerprint
21:20:30 s/still//
21:20:37 jeblair: so I think we're probably just looking at different times
21:20:45 sdague: yes, but they have agreed to a procedural change for that
21:20:53 ok, looks like we could use a quick email reminding/teaching people how to classify
21:21:08 jeblair: http://lists.openstack.org/pipermail/openstack-dev/2014-August/043810.html
21:21:15 ttx: I can send one out
21:21:25 jogo: is there any point to anything beyond just 'recheck' anymore?
21:21:58 dolphm: yes, we don't collect the data on a regular basis but yes.
21:22:06 jeblair: yeh, I think people mostly don't realize that's what they are doing. Anyway, if you don't feel it would help, I'm not going to push it
21:22:19 adding a bug number shows you know what the issue is, and aren't just ignoring things
21:22:57 dolphm: you can now say any reason after the word recheck
21:23:08 jogo: good to know
21:23:12 like 'recheck -- I have a hunch something else just broke this'
21:23:20 sdague: the thing that would convince me is a change that passed > 24 hours ago but reliably failed < 24 hours ago getting into the gate
21:23:34 right, that glance stuff was that
21:23:36 but bare rechecks send the social message that you don't know why it failed and don't care
21:23:38 sdague: i realize that the tox switch could do that ^ and likely happened with glance
21:23:40 (IMHO)
21:23:46 sdague: but we don't globally break tox every day :)
21:23:46 OK, let's move on...
but yes, any extra effort to get the gate back in shape in this crucial week is very appreciated
21:24:02 #topic juno-3 blueprints blocked on cross-project issues
21:24:03 jeblair: sure
21:24:14 I think the only one we have left at this point is:
21:24:17 * ceilometer/grenade-resource-survivability
21:24:25 Blocked pending discussion between jogo and Chris Dent
21:24:37 i've synced with jogo about this before the meeting.
21:24:43 ah, great
21:24:45 patch can be tracked here: https://review.openstack.org/#/c/102354/
21:25:01 gordc: so it's unblocked now ?
21:25:09 basically we just need feedback for the patch since cdent (dev working on implementation) is sort of unsure how to proceed
21:25:18 ttx: somewhat
21:25:36 jogo. i think cdent posted a question to your reply in gerrit
21:26:00 gordc: note that if this is just touching tests, it's fine to land post-FF
21:26:02 gordc: I'll take a look
21:26:05 (pre-RC1)
21:26:16 ttx: good to know.
21:26:37 it should just be touching tests... but i'm not sure how much the scope will change.
21:27:05 if it can be completed for j3, all the better, but if not, it can automatically be targeted to RC1
21:27:17 #topic Packaging for functional tests (zaneb)
21:27:22 i'm not that familiar with grenade/javelin stuff myself so i'm pretty useless there... if anyone has knowledge it'd be cool if we got your opinion. :)
21:27:31 #link http://lists.openstack.org/pipermail/openstack-dev/2014-August/044072.html
21:27:36 zaneb: around?
21:27:39 yep
21:27:44 Floor is yours
21:27:52 was pasting the #link, but you beat me to it ;)
21:27:55 gordc: responded
21:27:59 stevebaker: o/
21:28:03 jogo: awesome.
much appreciated
21:28:11 so, basically it's all in that email
21:28:19 sdague: it would be good to get your thoughts on it too https://review.openstack.org/#/c/102354/
21:28:26 zaneb: \o
21:28:32 but we're looking for a consensus on how the new in-project functional tests should be packaged
21:28:39 zaneb: so I think the disconnect is around the idea that these are tempest plugins
21:28:49 because that's really not what was intended
21:28:55 input from folks who already have in-tree functional tests on that thread would be helpful
21:29:04 I don't see tempest having anything to do with these tests
21:29:13 sdague: interesting
21:29:20 I just replied to that thread. tl;dr +1 on zaneb's suggestion of -integrationtests package
21:29:28 this is project functional testing
21:29:43 the tests we want to add in Heat are basically the scenario tests we haven't been able to land in Tempest
21:29:55 as far as I understand it
21:30:08 sdague: agreed, but because of the nature of heat our functional tests are really integration tests. All we do is interact with other services
21:30:11 so it may be that this is a different thing to what everyone else is doing
21:30:18 right, it might be
21:30:43 which I guess is basically the question I am asking in that thread
21:30:46 yeah, I expect the functional tests for oslo.messaging to not rely on other services, for example
21:30:56 These sound different than the neutron functional tests
21:31:13 The neutron ones functionally test out bits that neutron relies on, but no other openstack services are required to my knowledge.
21:31:28 e.g. ip commands, ovs-vsctl, etc.
21:32:04 I think the swift functional tests run against a full running swift (needs keystone?) but swift would be self-contained so they could still be considered functional
21:32:37 notmyname: ^?
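[Editor's note] The distinction drawn above, between functional tests that do not rely on other services and tests that really exercise them, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Heat or Nova code: the `StackLauncher` class and the `compute.boot(name)` call are hypothetical, and the other service is replaced with `unittest.mock` from the Python standard library.

```python
import unittest
from unittest import mock


class StackLauncher:
    """Hypothetical orchestration code that depends on a compute client."""

    def __init__(self, compute):
        self.compute = compute

    def launch(self, names):
        # Boot one server per name and collect the returned IDs.
        return [self.compute.boot(name) for name in names]


class LauncherFunctionalTest(unittest.TestCase):
    def test_launch_without_real_compute(self):
        # The compute service is a mock, so no other service has to be running.
        compute = mock.Mock()
        compute.boot.side_effect = lambda name: "id-" + name
        launcher = StackLauncher(compute)
        self.assertEqual(launcher.launch(["a", "b"]), ["id-a", "id-b"])
        self.assertEqual(compute.boot.call_count, 2)

    def test_compute_failure_propagates(self):
        # A mocked dependency can be made to fail on demand.
        compute = mock.Mock()
        compute.boot.side_effect = RuntimeError("compute unavailable")
        with self.assertRaises(RuntimeError):
            StackLauncher(compute).launch(["a"])
```

An integration test would build the same launcher with a real client pointed at a running deployment, which is why it needs the other services up and is slower and more failure-prone.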
21:34:11 so if we're the only project with this kind of tests, I'm happy to do our own thing
21:34:16 sdague: the heat functional tests are independent of tempest, they have forklifted some of the tempest scenario scaffolding just to get started
21:34:32 I don't want to do our own thing just for the sake of it though
21:35:03 if the other projects are actually running, then it's an integration test
21:35:07 zaneb / stevebaker - I think the challenge is heat's a weird starting point for defining this because it is so far up the stack
21:35:20 sdague: agreed
21:35:20 and i think all of the thinking about how functional tests can be optimized doesn't apply
21:35:49 yep
21:36:58 zaneb: so I'd say given that, you guys should run with whatever works for you
21:36:59 and I do see check-heat-dsvm-functional becoming a voting job on nova, keystone, neutron so that it doesn't break all the time
21:37:03 zaneb: looks like you reached a conclusion there
21:37:12 stevebaker: so the point is to *not* do that
21:37:27 cool, thanks everyone
21:37:30 if you depend on certain behavior in those projects, that should be locked down inside that project
21:37:37 not by lots of more cross project tests
21:37:49 because that's the scaling issue we are trying to address
21:38:23 sdague: we may need to wait and see how often it breaks. I guess regressions require a new tempest test to prevent a repeat
21:38:34 sdague: this is the "nova has a unit test that it doesn't break heat's use of some api" idea
21:38:43 jeblair: exactly
21:39:05 if you depend on a behavior in a component, and they keep breaking it, put some tests in the project to stop that :)
21:39:51 sdague: so should we run the heat functional tests with everything running for real, but only gate heat on it, and when it breaks, fix it and add a unit test in the other project?
21:39:54 I think it's unlikely that Nova &c.
will break us that often
21:40:05 jeblair: that's an option
21:40:07 does check-heat-dsvm-functional need to be renamed -integration? (please no ;)
21:40:24 stevebaker: I vote no :)
21:40:25 jeblair: that's what I'm thinking
21:40:26 sdague: or should heat run with nova, etc mocked out?
21:40:37 sdague: (trying to understand what the other options would be)
21:40:46 jeblair: so those are the 2 options
21:41:03 jeblair: we need to test heat<-> agent interaction, so nova needs to keep it real
21:41:06 I'm not sure if we're going to know which is more effective in catching and resolving issues until we try
21:41:25 I actually think we need both
21:41:32 jeblair: fake virt driver will be useful for scale tests though
21:42:00 I would like to test with e.g. induced failures
21:42:15 but we also need to spin VMs to check we can talk to the agents on them
21:42:17 this is something I'm going to be experimenting with nova over the next couple of weeks
21:42:34 zaneb: so maybe you've got a couple of types here
21:42:52 yes, some functional and some integration
21:43:04 I tried killing nova-api to test heat resilience, but nova went into an unrecoverable state ;)
21:43:09 because it's not clear to me if having a fake glance or a real glance (which honestly basically never fails) is better
21:44:02 stevebaker: interesting, because I kill and restart nova-api all the time in devstack, and it's fine :)
21:44:03 we'll need the real glance artifact repo when it comes out
21:44:21 sdague: yeah, and i think we've been seeing both patterns as people start working on func test jobs
21:44:28 jeblair: agreed
21:44:37 sdague: during server boot?
I think I ended up with an undeletable server
21:44:40 sdague: i think neutron may run nothing else, swift runs everything (but probably uses nothing else)
21:45:20 stevebaker: there are tons of ways to get undeletable servers :)
21:45:46 stevebaker: anyway, yes, functional jobs should make fault injection something that's more manageable to start doing
21:46:16 yep
21:46:59 ok, so we should set up both functional tests that mock out the services and integration tests that run against all of devstack
21:46:59 so I'd say right now feel free to carve your own path, and let's work to converge on working patterns in the middle of kilo when we have more experience
21:47:41 because I'm really hesitant to say "do it thusly" until we have more experience
21:48:28 cool, many thanks sdague & jeblair for your input
21:48:41 alright
21:48:46 #topic Open discussion
21:48:57 anything anyone ?
21:49:05 new feedback on the czar^Wliaison proposal ?
21:49:47 ttx: to add a data point on the earlier gate status topic: the waiting jobs queue touched 0 last night; so we're working through 1 day's worth of changes in approx 1 day
21:49:59 I'll likely propose that delegation (if deemed necessary) is indicated on the main project wiki page
21:50:08 that's easier than a governance patch
21:50:44 jeblair: ok, not too bad. Expect load to grow though. I expect a peak on Thursday and Tuesday
21:50:53 that's using my meteorological model
21:51:10 ttx: +1 for wiki, but maybe they should be all in one big matrix?
rather than on individual program pages
21:51:32 zaneb: hmm, yes, that could prove easier
21:51:46 also, if anyone has a public cloud laying around, i'll be happy to use it :)
21:51:58 * ttx looks in his closet
21:51:58 ping me and i'll send you info
21:51:59 ttx: we'll soon see who has empty columns that way ;)
21:52:10 jeblair: i'm renewing my thinkpad, if you want my old one
21:52:20 jeblair: it has a SSD, as fast as on day 1
21:52:33 ttx: i think i might want it just for parts :)
21:52:59 don't scare my laptop, the new one is still on some boat from Hong-Kong
21:53:10 if I'm to trust UPS
21:53:20 "if" indeed :)
21:53:35 more probably in some labs in west virginia
21:53:52 ok, unless someone has something to add...
21:54:01 let's close this now
21:54:33 #endmeeting