21:01:52 <ttx> #startmeeting project
21:01:53 <openstack> Meeting started Tue Aug 26 21:01:52 2014 UTC and is due to finish in 60 minutes. The chair is ttx. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:01:54 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:01:57 <ttx> Our agenda for today:
21:01:58 <openstack> The meeting name has been set to 'project'
21:02:03 <ttx> #link http://wiki.openstack.org/Meetings/ProjectMeeting
21:02:14 <jgriffith> 0/
21:02:35 <ttx> #topic News from the 1:1 sync points
21:02:41 <ttx> Here is the log:
21:02:46 <ttx> #link http://eavesdrop.openstack.org/meetings/ptl_sync/2014/ptl_sync.2014-08-26-08.14.html
21:03:18 <ttx> Only Glance was MIA today
21:03:26 <ttx> In summary, juno-3 and feature freeze will hit most projects next week
21:03:40 <ttx> That means we have only 9 days to merge the remaining targeted features
21:03:42 * mestery notes hit is not necessarily a figurative term
21:04:08 <ttx> We are at about 15% of targets landed right now, and 80% of our juno-3 time is consumed
21:04:25 <ttx> so let's just say we really need to switch gears
21:04:47 <ttx> (to reach icehouse activity level we need to reach 80% of those targets :)
21:04:48 * dolphm makes a sacrifice to the transient failure gods
21:04:56 <ttx> (by some weird simplistic metric)
21:05:00 * mestery tosses one on after dolphm.
21:05:09 <ttx> We identified a few blocked things, which we'll discuss in this meeting
21:05:25 <dolphm> ttx: keystone is unblocked!
21:06:03 <ttx> cool
21:06:09 <ttx> dolphm: now get that feature merged :)
21:06:15 <ttx> #topic Other program news
21:06:21 <ttx> Any other program with a quick announcement ?
21:06:48 <mtreinish> ttx: well last week I forgot to announce that salv-orlando got the neutron parallel full gate enabled everywhere
21:06:53 <mtreinish> which is awesome
21:06:59 <mestery> mtreinish: ++, awesome work by salv-orlando!
21:07:02 <mestery> And the QA team
21:07:06 <david-lyle> excellent
21:07:17 <mestery> Neutron is proceeding with the incubator proposal for new features, we should have it up and running this week yet.
21:07:54 <jogo> mestery: where will neutron advanced services (FWaaS etc.) fit in to that
21:08:02 <mestery> #link https://wiki.openstack.org/wiki/Network/Incubator
21:08:19 <mestery> jogo: That's still under discussion, it's been talked about possibly moving them there, but not right away at least.
21:08:23 <mestery> jogo: So, stay tuned :)
21:09:18 <ttx> ok, anything else ?
21:09:30 <ttx> jeblair: how is the gate holding so far ?
21:09:47 <ttx> jeblair: despite all my efforts to stage the review activity, it still looks like we'll have a heavy week ahead
21:11:02 <jogo> mestery: thanks
21:11:43 <jogo> ttx: we are experiencing lots of unit test issues: http://jogo.github.io/gate/ (at the bottom)
21:12:32 <ttx> unit tests ? that's a new one
21:12:55 <jogo> ttx: yeah sadly it is
21:13:01 <ttx> any reason for the surge?
21:13:18 <zaneb> the magic random hash, one assumes
21:13:24 <mtreinish> ttx: I think it was a new testtools version breaking glance
21:13:40 <dolphm> jogo: is that keystone or trove? colors are nearly the same
21:13:50 <jogo> those things happened a few days ago so not sure
21:14:00 <sdague> the freshness checks were also turned off
21:14:08 <jogo> well here are the unclassified gate failures: http://status.openstack.org/elastic-recheck/data/uncategorized.html
21:14:13 <sdague> I think the 24hr freshness check probably should come back
21:14:27 <dolphm> sdague: ++ why was that removed?
21:14:29 <SergeyLukjanov> for sahara there was an issue w/ new jsonschema release (in unit tests)
21:14:34 <ttx> sdague: hmm, so avoidable unit test fails are back too
21:14:35 <SlickNik> ttx: the upgrade to tox caused the hashseed value to be set randomly, and that was causing the python-26 tests to fail in trove.
21:14:50 <jogo> sdague: yeah that may be related. can we just bring back unit test freshness?
21:15:01 <ttx> I've seen a number of changes pushed to bypass that though... aren't they merged by now?
21:15:03 <sdague> jogo: or drop unit tests from the gate :)
21:15:08 <sdague> which I wanted to do 8 months ago
21:15:17 <SlickNik> ttx: we've got a fix in place to work around that, but are also looking at fixing the problematic tests as a high pri.
21:15:49 <sdague> dolphm: so the theory was that people shouldn't have massively flakey unit tests.... because they are unit tests, so it shouldn't be an issue
21:15:54 <SlickNik> ttx: yes merged now.
21:16:02 <dolphm> sdague: theory!
21:16:10 <ttx> sdague, jogo: ok, let me know if there is anything I can do, or anything this particular meeting could help with
21:16:45 <jogo> ttx: people can classify their unclassified failures
21:16:49 <jogo> http://status.openstack.org/elastic-recheck/data/uncategorized.html
21:17:00 <jogo> ttx: so we have better insight into what is failing and why
21:17:12 <ttx> #action everyone help classify gate fails (http://status.openstack.org/elastic-recheck/data/uncategorized.html)
21:17:28 <ttx> jogo: I'll try to give it some of my remaining cycles tomorrow
21:17:32 <jogo> if anyone needs help I will be in -qa
21:17:46 <ttx> If we enter the week of death with a gate that has a cold, we won't be in good shape.
21:17:52 <dolphm> jogo: how do people go about classifying something?
21:18:06 <jogo> sdague: I think bringing back the 24 hour check (or maybe just a 72 hour or something) is a good idea right now
21:18:07 <anteaya> an email to the mailing list often helps with classifying failures
21:18:13 <jogo> dolphm: elastic-recheck fingerprints
21:18:16 <anteaya> who wants to send the email?
21:18:34 <sdague> jogo: the 24hr check before entering the gate is the one that protects the gate
21:18:48 <sdague> the 72 hr check probably just eats nodes
21:19:00 <jeblair> sorry, was afk
21:19:11 <jeblair> sdague: is there evidence the 24h check would help?
21:19:42 <sdague> jeblair: rigorous? no. I've just seen a few wrecking balls when I've looked on stuff that can't pass the gate
21:19:54 <sdague> those add a lot of time on the gate side
21:19:56 <jeblair> sdague: previously when we've looked, we have not found gate failures that would have been prevented by the 24h check
21:19:57 <jogo> ttx: 'gate' health: http://status.openstack.org/elastic-recheck/index.html
21:20:09 <jeblair> sdague: because we still have the requirement for a +1 before entering the gate
21:20:13 <dolphm> jogo: does 'recheck bug ###' still count towards classifications?
21:20:15 <jogo> ttx: if anyone knows javascript I have a few ideas for that page to make it more useful
21:20:21 <sdague> jeblair: the thing I called people out on the list about for glance would have
21:20:26 <jogo> dolphm: no, you have to add a fingerprint
21:20:30 <dolphm> s/still//
21:20:37 <sdague> jeblair: so I think we're probably just looking at different times
21:20:45 <jeblair> sdague: yes, but they have agreed to a procedural change for that
21:20:53 <ttx> ok, looks like we could use a quick email reminding/teaching people how to classify
21:21:08 <sdague> jeblair: http://lists.openstack.org/pipermail/openstack-dev/2014-August/043810.html
21:21:15 <jogo> ttx: I can send one out
21:21:25 <dolphm> jogo: is there any point to anything beyond just 'recheck' anymore?
21:21:58 <jogo> dolphm: yes, we don't collect the data on a regular basis but yes.
21:22:06 <sdague> jeblair: yeh, I think people mostly don't realize that's what they are doing. Anyway, if you don't feel it would help, I'm not going to push it
21:22:19 <jogo> adding a bug number shows you know what the issue is, and aren't just ignoring things
21:22:57 <jogo> dolphm: you can now say any reason after the word recheck
21:23:08 <dolphm> jogo: good to know
21:23:12 <jogo> like 'recheck -- I have a hunch something else just broke this'
21:23:20 <jeblair> sdague: the thing that would convince me is a change that passed > 24 hours ago but reliably failed < 24 hours ago getting into the gate
21:23:34 <sdague> right, that glance stuff was that
21:23:36 <jogo> but bare rechecks send the social message that you don't know why it failed and don't care
21:23:38 <jeblair> sdague: i realize that the tox switch could do that ^ and likely happened with glance
21:23:40 <jogo> IMHO)
21:23:46 <jeblair> sdague: but we don't globally break tox every day :)
21:23:46 <ttx> OK, let's move on... but yes, any extra effort to get the gate back in shape in this crucial week is very appreciated
21:24:02 <ttx> #topic juno-3 blueprints blocked on cross-project issues
21:24:03 <sdague> jeblair: sure
21:24:14 <ttx> I think the only one we have left at this point is:
21:24:17 <ttx> * ceilometer/grenade-resource-survivability
21:24:25 <ttx> Blocked pending discussion between jogo and Chris Dent
21:24:37 <gordc> i've synced with jogo about this before the meeting.
21:24:43 <ttx> ah, great
21:24:45 <gordc> patch can be tracked here: https://review.openstack.org/#/c/102354/
21:25:01 <ttx> gordc: so it's unblocked now ?
21:25:09 <gordc> basically we just need feedback for the patch since cdent (dev working on implementation) is sort of unsure how to proceed
21:25:18 <gordc> ttx: somewhat
21:25:36 <gordc> jogo. i think cdent posted a question to your reply in gerrit
21:26:00 <ttx> gordc: note that if this is just touching tests, it's fine to land post-FF
21:26:02 <jogo> gordc: I'll take a look
21:26:05 <ttx> (pre-RC1)
21:26:16 <gordc> ttx: good to know.
21:26:37 <gordc> it should just be touching tests... but i'm not sure how much the scope will change.
21:27:05 <ttx> if it can be completed for j3, all the better, but if not, it can automatically be targeted to RC1
21:27:17 <ttx> #topic Packaging for functional tests (zaneb)
21:27:22 <gordc> i'm not that familiar with grenade/javelin stuff myself so i'm pretty useless there... if anyone has knowledge it'd be cool if we got your opinion. :)
21:27:31 <ttx> #link http://lists.openstack.org/pipermail/openstack-dev/2014-August/044072.html
21:27:36 <ttx> zaneb: around?
21:27:39 <zaneb> yep
21:27:44 <ttx> Floor is yours
21:27:52 <zaneb> was pasting the #link, but you beat me to it ;)
21:27:55 <jogo> gordc: responded
21:27:59 <zaneb> stevebaker: o/
21:28:03 <gordc> jogo: awesome. much appreciated
21:28:11 <zaneb> so, basically it's all in that email
21:28:19 <jogo> sdague: it would be good to get your thoughts on it too https://review.openstack.org/#/c/102354/
21:28:26 <stevebaker> zaneb: \o
21:28:32 <zaneb> but we're looking for a consensus on how the new in-project functional tests should be packaged
21:28:39 <sdague> zaneb: so I think the disconnect is around the idea that these are tempest plugins
21:28:49 <sdague> because that's really not what was intended
21:28:55 <zaneb> input from folks who already have in-tree functional tests on that thread would be helpful
21:29:04 <sdague> I don't see tempest having anything to do with these tests
21:29:13 <zaneb> sdague: interesting
21:29:20 <stevebaker> I just replied that thread. tl;dr +1 on zaneb's suggestion of <project>-integrationtests package
21:29:28 <sdague> this is project functional testing
21:29:43 <zaneb> the tests we want to add in Heat are basically the scenario tests we haven't been able to land in Tempest
21:29:55 <zaneb> as far as I understand it
21:30:08 <stevebaker> sdague: agreed, but because of the nature of heat our functional tests are really integration tests. All we do is interact with other services
21:30:11 <zaneb> so it may be that this is a different thing to what everyone else is doing
21:30:18 <sdague> right, it might be
21:30:43 <zaneb> which I guess is basically the question I am asking in that thread
21:30:46 <dhellmann> yeah, I expect the functional tests for oslo.messaging to not rely on other services, for example
21:30:56 <mestery> These sound different than the neutron functional tests
21:31:13 <mestery> The neutron ones functionally test out bits that neutron relies on, but no other openstack services are required to my knowledge.
21:31:28 <mestery> e.g. ip commands, ovs-vsctl, etc.
21:32:04 <stevebaker> I think the swift functional tests run against a full running swift (needs keystone?) but swift would be self-contained so they could still be considered functional
21:32:37 <stevebaker> notmyname: ^?
21:34:11 <zaneb> so if we're the only project with this kind of tests, I'm happy to do our own thing
21:34:16 <stevebaker> sdague: the heat functional tests are independent of tempest, they have forklifted some of the tempest scenario scaffolding just to get started
21:34:32 <zaneb> I don't want to do our own thing just for the sake of it though
21:35:03 <jeblair> if the other projects are actually running, then it's an integration test
21:35:07 <sdague> zaneb / stevebaker - I think the challenge is heat's a weird starting point for defining this because it is so far up the stack
21:35:20 <zaneb> sdague: agreed
21:35:20 <jeblair> and i think all of the thinking about how functional tests can be optimized doesn't apply
21:35:49 <stevebaker> yep
21:36:58 <sdague> zaneb: so I'd say given that, you guys should run with whatever works for you
21:36:59 <stevebaker> and I do see check-heat-dsvm-functional becoming a voting job on nova, keystone, neutron so that it doesn't break all the time
21:37:03 <ttx> zaneb: looks like you reached a conclusion there
21:37:12 <sdague> stevebaker: so the point is to *not* do that
21:37:27 <zaneb> cool, thanks everyone
21:37:30 <sdague> if you depend on certain behavior in those projects, that should be locked down inside that project
21:37:37 <sdague> not by lots of more cross project tests
21:37:49 <sdague> because that's the scaling issue we are trying to address
21:38:23 <stevebaker> sdague: we may need to wait and see how often it breaks. I guess regressions require a new tempest test to prevent a repeat
21:38:34 <jeblair> sdague: this is the "nova has a unit test that it doesn't break heat's use of some api" idea
21:38:43 <sdague> jeblair: exactly
21:39:05 <sdague> if you depend on a behavior in a component, and they keep breaking it, put some tests in the project to stop that :)
21:39:51 <jeblair> sdague: so should we run the heat functional tests with everything running for real, but only gate heat on it, and when it breaks, fix it and add a unit test in the other project?
21:39:54 <zaneb> I think it's unlikely that Nova &c. will break us that often
21:40:05 <sdague> jeblair: that's an option
21:40:07 <stevebaker> does check-heat-dsvm-functional need to be renamed -integration? (please no ;)
21:40:24 <sdague> stevebaker: I vote no :)
21:40:25 <zaneb> jeblair: that's what I'm thinking
21:40:26 <jeblair> sdague: or should heat run with nova, etc mocked out?
21:40:37 <jeblair> sdague: (trying to understand what the other options would be)
21:40:46 <sdague> jeblair: so those are the 2 options
21:41:03 <stevebaker> jeblair: we need to test heat<-> agent interaction, so nova needs to keep it real
21:41:06 <sdague> I'm not sure if we're going to know which is more effective in catching and resolving issues until we try
21:41:25 <zaneb> I actually think we need both
21:41:32 <stevebaker> jeblair: fake virt driver will be useful for scale tests though
21:42:00 <zaneb> I would like to test with e.g. induced failures
21:42:15 <zaneb> but we also need to spin VMs to check we can talk to the agents on them
21:42:17 <sdague> this is something I'm going to be experimenting with nova over the next couple of weeks
21:42:34 <sdague> zaneb: so maybe you've got a couple of types here
21:42:52 <zaneb> yes, some functional and some integration
21:43:04 <stevebaker> I tried killing nova-api to test heat resilience, but nova went into an unrecoverable state ;)
21:43:09 <sdague> because it's not clear to me if having a fake glance or a real glance (which honestly basically never fails) is better
21:44:02 <sdague> stevebaker: interesting, because I kill and restart nova-api all the time in devstack, and it's fine :)
21:44:03 <zaneb> we'll need the real glance artifact repo when it comes out
21:44:21 <jeblair> sdague: yeah, and i think we've been seeing both patterns as people start working on func test jobs
21:44:28 <sdague> jeblair: agreed
21:44:37 <stevebaker> sdague: during server boot? I think I ended up with an undeletable server
21:44:40 <jeblair> sdague: i think neutron may run nothing else, swift runs everything (but probably uses nothing else)
21:45:20 <sdague> stevebaker: there are tons of ways to get undeletable servers :)
21:45:46 <sdague> stevebaker: anyway, yes, functional jobs should make fault injection something that's more manageable to start doing
21:46:16 <stevebaker> yep
21:46:59 <zaneb> ok, so we should set up both functional tests that mock out the services and integration tests that run against all of devstack
21:46:59 <sdague> so I'd say right now feel free to carve your own path, and let's work to converge on working patterns in the middle of kilo when we have more experience
21:47:41 <sdague> because I'm really hesitant to say "do it thusly" until we have more experience
21:48:28 <zaneb> cool, many thanks sdague & jeblair for your input
21:48:41 <ttx> alright
21:48:46 <ttx> #topic Open discussion
21:48:57 <ttx> anything anyone ?
21:49:05 <ttx> new feedback on the czar^Wliaison proposal ?
21:49:47 <jeblair> ttx: to add a data point on the earlier gate status topic: the waiting jobs queue touched 0 last night; so we're working through 1 day's worth of changes in approx 1 day
21:49:59 <ttx> I'll likely propose that delegation (if deemed necessary) is indicated on the main project wiki page
21:50:08 <ttx> that's easier than a governance patch
21:50:44 <ttx> jeblair: ok, not too bad. Expect load to grow though. I expect a peak on Thursday and Tuesday
21:50:53 <ttx> that's using my meteorological model
21:51:10 <zaneb> ttx: +1 for wiki, but maybe they should be all in one big matrix?
21:51:19 <zaneb> rather than on individual program pages
21:51:32 <ttx> zaneb: hmm, yes, that could prove easier
21:51:46 <jeblair> also, if anyone has a public cloud laying around, i'll be happy to use it :)
21:51:58 * ttx looks in his closet
21:51:58 <jeblair> ping me and i'll send you info
21:51:59 <zaneb> ttx: we'll soon see who has empty columns that way ;)
21:52:10 <ttx> jeblair: i'm renewing my thinkpad, if you want my old one
21:52:20 <ttx> jeblair: it has a SSD, as fast as on day 1
21:52:33 <jeblair> ttx: i think i might want it just for parts :)
21:52:59 <ttx> don't scare my laptop, the new one is still on some boat from Hong-Kong
21:53:10 <ttx> if I'm to trust UPS
21:53:20 <jeblair> "if" indeed :)
21:53:35 <ttx> more probably in some labs in west virginia
21:53:52 <ttx> ok, unless someone has something to add...
21:54:01 <ttx> let's close this now
21:54:33 <ttx> #endmeeting