21:01:52 <ttx> #startmeeting project
21:01:53 <openstack> Meeting started Tue Aug 26 21:01:52 2014 UTC and is due to finish in 60 minutes.  The chair is ttx. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:01:54 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:01:57 <ttx> Our agenda for today:
21:01:58 <openstack> The meeting name has been set to 'project'
21:02:03 <ttx> #link http://wiki.openstack.org/Meetings/ProjectMeeting
21:02:14 <jgriffith> 0/
21:02:35 <ttx> #topic News from the 1:1 sync points
21:02:41 <ttx> Here is the log:
21:02:46 <ttx> #link http://eavesdrop.openstack.org/meetings/ptl_sync/2014/ptl_sync.2014-08-26-08.14.html
21:03:18 <ttx> Only Glance was MIA today
21:03:26 <ttx> In summary, juno-3 and feature freeze will hit most projects next week
21:03:40 <ttx> That means we have only 9 days to merge the remaining targeted features
21:03:42 * mestery notes hit is not necessarily a figurative term
21:04:08 <ttx> We are at about 15% of targets landed right now, and 80% of our juno-3 time is consumed
21:04:25 <ttx> so let's just say we really need to switch gears
21:04:47 <ttx> (to reach icehouse activity level we need to reach 80% of those targets :)
21:04:48 * dolphm makes a sacrifice to the transient failure gods
21:04:56 <ttx> (by some weird simplistic metric)
21:05:00 * mestery tosses one on after dolphm.
21:05:09 <ttx> We identified a few blocked things, which we'll discuss in this meeting
21:05:25 <dolphm> ttx: keystone is unblocked!
21:06:03 <ttx> cool
21:06:09 <ttx> dolphm: now get that feature merged :)
21:06:15 <ttx> #topic Other program news
21:06:21 <ttx> Any other program with a quick announcement ?
21:06:48 <mtreinish> ttx: well last week I forgot to announce that salv-orlando got the neutron parallel full gate enabled everywhere
21:06:53 <mtreinish> which is awesome
21:06:59 <mestery> mtreinish: ++, awesome work by salv-orlando!
21:07:02 <mestery> And the QA team
21:07:06 <david-lyle> excellent
21:07:17 <mestery> Neutron is proceeding with the incubator proposal for new features, we should have it up and running this week yet.
21:07:54 <jogo> mestery: where will neutron advanced services (FWaaS etc.) fit in to that
21:08:02 <mestery> #link https://wiki.openstack.org/wiki/Network/Incubator
21:08:19 <mestery> jogo: That's still under discussion, it's been talked about possibly moving them there, but not right away at least.
21:08:23 <mestery> jogo: So, stay tuned :)
21:09:18 <ttx> ok, anything else ?
21:09:30 <ttx> jeblair: how is the gate holding so far ?
21:09:47 <ttx> jeblair: despite all my efforts to stage the review activity, it still looks like we'll have a heavy week ahead
21:11:02 <jogo> mestery: thanks
21:11:43 <jogo> ttx: we are experiencing lots of unit test issues: http://jogo.github.io/gate/ (at the bottom)
21:12:32 <ttx> unit tests ? that's a new one
21:12:55 <jogo> ttx: yeah sadly it is
21:13:01 <ttx> any reason for the surge?
21:13:18 <zaneb> the magic random hash, one assumes
21:13:24 <mtreinish> ttx: I think it was a new testtools version breaking glance
21:13:40 <dolphm> jogo: is that keystone or trove? colors are nearly the same
21:13:50 <jogo> those things happened a few days ago so not sure
21:14:00 <sdague> the freshness checks were also turned off
21:14:08 <jogo> well here are the unclassified gate failures: http://status.openstack.org/elastic-recheck/data/uncategorized.html
21:14:13 <sdague> I think the 24hr freshness check probably should come back
21:14:27 <dolphm> sdague: ++ why was that removed?
21:14:29 <SergeyLukjanov> for sahara there was an issue w/ new jsonschema release (in unit tests)
21:14:34 <ttx> sdague: hmm, so avoidable unit test fails are back too
21:14:35 <SlickNik> ttx: the upgrade to tox caused the hashseed value to be set randomly, and that was causing the python-26 tests to fail in trove.
21:14:50 <jogo> sdague: yeah that may be related. can we just bring back unit test freshness?
21:15:01 <ttx> I've seen a number of changes pushed to bypass that though... aren't they merged by now?
21:15:03 <sdague> jogo: or drop unit tests from the gate :)
21:15:08 <sdague> which I wanted to do 8 months ago
21:15:17 <SlickNik> ttx: we've got a fix in place to work around that, but are also looking at fixing the problematic tests as a high pri.
21:15:49 <sdague> dolphm: so the theory was that people shouldn't have massively flakey unit tests.... because they are unit tests, so it shouldn't be an issue
21:15:54 <SlickNik> ttx: yes merged now.
21:16:02 <dolphm> sdague: theory!
21:16:10 <ttx> sdague, jogo: ok, let me know if there is anything I can do, or anything this particular meeting could help with
21:16:45 <jogo> ttx: people can classify their unclassified failures
21:16:49 <jogo> http://status.openstack.org/elastic-recheck/data/uncategorized.html
21:17:00 <jogo> ttx: so we have better insight into what is failing and why
21:17:12 <ttx> #action everyone help classify gate fails (http://status.openstack.org/elastic-recheck/data/uncategorized.html)
21:17:28 <ttx> jogo: I'll try to give it some of my remaining cycles tomorrow
21:17:32 <jogo> if anyone needs help I will be in -qa
21:17:46 <ttx> If we enter the week of death with a gate that has a cold, we won't be in good shape.
21:17:52 <dolphm> jogo: how do people go about classifying something?
21:18:06 <jogo> sdague: I think bringing back the 24 hour check (or maybe just a 72 hour or something) is a good idea right now
21:18:07 <anteaya> an email to the mailing list often helps with classifying failures
21:18:13 <jogo> dolphm: elastic-recheck fingerprints
21:18:16 <anteaya> who wants to send the email?
21:18:34 <sdague> jogo: the 24hr check before entering the gate is the one that protects the gate
21:18:48 <sdague> the 72 hr check probably just eats nodes
21:19:00 <jeblair> sorry, was afk
21:19:11 <jeblair> sdague: is there evidence the 24h check would help?
21:19:42 <sdague> jeblair: rigorous? no. I've just seen a few wrecking balls when I've looked on stuff that can't pass the gate
21:19:54 <sdague> those add a lot of time on the gate side
21:19:56 <jeblair> sdague: previously when we've looked, we have not found gate failures that would have been prevented by the 24h check
21:19:57 <jogo> ttx: 'gate' health: http://status.openstack.org/elastic-recheck/index.html
21:20:09 <jeblair> sdague: because we still have the requirement for a +1 before entering the gate
21:20:13 <dolphm> jogo: does 'recheck bug ###' still count towards classifications?
21:20:15 <jogo> ttx: if anyone knows javascript I have a few ideas for that page to make it more useful
21:20:21 <sdague> jeblair: the thing I called people out on the list about for glance would have
21:20:26 <jogo> dolphm: no, you have to add a fingerprint
21:20:30 <dolphm> s/still//
21:20:37 <sdague> jeblair: so I think we're probably just looking at different times
21:20:45 <jeblair> sdague: yes, but they have agreed to a procedural change for that
21:20:53 <ttx> ok, looks like we could use a quick email reminding/teaching people how to classify
21:21:08 <sdague> jeblair: http://lists.openstack.org/pipermail/openstack-dev/2014-August/043810.html
21:21:15 <jogo> ttx: I can send one out
21:21:25 <dolphm> jogo: is there any point to anything beyond just 'recheck' anymore?
21:21:58 <jogo> dolphm: yes, we don't collect the data on a regular basis but yes.
21:22:06 <sdague> jeblair: yeh, I think people mostly don't realize that's what they are doing. Anyway, if you don't feel it would help, I'm not going to push it
21:22:19 <jogo> adding a bug number shows you know what the issue is, and aren't just ignoring things
21:22:57 <jogo> dolphm: you can now say any reason after the word recheck
21:23:08 <dolphm> jogo: good to know
21:23:12 <jogo> like 'recheck -- I have a hunch something else just broke this'
21:23:20 <jeblair> sdague: the thing that would convince me is a change that passed > 24 hours ago but reliably failed < 24 hours ago getting into the gate
21:23:34 <sdague> right, that glance stuff was that
21:23:36 <jogo> but bare rechecks send the social message that you don't know why it failed and don't care
21:23:38 <jeblair> sdague: i realize that the tox switch could do that ^ and likely happened with glance
21:23:40 <jogo> (IMHO)
21:23:46 <jeblair> sdague: but we don't globally break tox every day :)
21:23:46 <ttx> OK, let's move on... but yes, any extra effort to get the gate back in shape in this crucial week is very appreciated
21:24:02 <ttx> #topic juno-3 blueprints blocked on cross-project issues
21:24:03 <sdague> jeblair: sure
21:24:14 <ttx> I think the only one we have left at this point is:
21:24:17 <ttx> * ceilometer/grenade-resource-survivability
21:24:25 <ttx> Blocked pending discussion between jogo and Chris Dent
21:24:37 <gordc> i've synced with jogo about this before the meeting.
21:24:43 <ttx> ah, great
21:24:45 <gordc> patch can be tracked here: https://review.openstack.org/#/c/102354/
21:25:01 <ttx> gordc: so it's unblocked now ?
21:25:09 <gordc> basically we just need feedback for the patch since cdent (dev working on implementation) is sort of unsure how to proceed
21:25:18 <gordc> ttx: somewhat
21:25:36 <gordc> jogo. i think cdent posted a question to your reply in gerrit
21:26:00 <ttx> gordc: note that if this is just touching tests, it's fine to land post-FF
21:26:02 <jogo> gordc: I'll take a look
21:26:05 <ttx> (pre-RC1)
21:26:16 <gordc> ttx: good to know.
21:26:37 <gordc> it should just be touching tests... but i'm not sure how much the scope will change.
21:27:05 <ttx> if it can be completed for j3, all the better, but if not, it can automatically be targeted to RC1
21:27:17 <ttx> #topic Packaging for functional tests (zaneb)
21:27:22 <gordc> i'm not that familiar with grenade/javelin stuff myself so i'm pretty useless there... if anyone has knowledge it'd be cool if we got your opinion. :)
21:27:31 <ttx> #link http://lists.openstack.org/pipermail/openstack-dev/2014-August/044072.html
21:27:36 <ttx> zaneb: around?
21:27:39 <zaneb> yep
21:27:44 <ttx> Floor is yours
21:27:52 <zaneb> was pasting the #link, but you beat me to it ;)
21:27:55 <jogo> gordc: responded
21:27:59 <zaneb> stevebaker: o/
21:28:03 <gordc> jogo: awesome. much appreciated
21:28:11 <zaneb> so, basically it's all in that email
21:28:19 <jogo> sdague: it would be good to get your thoughts on it too https://review.openstack.org/#/c/102354/
21:28:26 <stevebaker> zaneb: \o
21:28:32 <zaneb> but we're looking for a consensus on how the new in-project functional tests should be packaged
21:28:39 <sdague> zaneb: so I think the disconnect is around the idea that these are tempest plugins
21:28:49 <sdague> because that's really not what was intended
21:28:55 <zaneb> input from folks who already have in-tree functional tests on that thread would be helpful
21:29:04 <sdague> I don't see tempest having anything to do with these tests
21:29:13 <zaneb> sdague: interesting
21:29:20 <stevebaker> I just replied to that thread. tl;dr +1 on zaneb's suggestion of <project>-integrationtests package
21:29:28 <sdague> this is project functional testing
21:29:43 <zaneb> the tests we want to add in Heat are basically the scenario tests we haven't been able to land in Tempest
21:29:55 <zaneb> as far as I understand it
21:30:08 <stevebaker> sdague: agreed, but because of the nature of heat our functional tests are really integration tests. All we do is interact with other services
21:30:11 <zaneb> so it may be that this is a different thing to what everyone else is doing
21:30:18 <sdague> right, it might be
21:30:43 <zaneb> which I guess is basically the question I am asking in that thread
21:30:46 <dhellmann> yeah, I expect the functional tests for oslo.messaging to not rely on other services, for example
21:30:56 <mestery> These sound different than the neutron functional tests
21:31:13 <mestery> The neutron ones functionally test out bits that neutron relies on, but no other openstack services are required to my knowledge.
21:31:28 <mestery> e.g. ip commands, ovs-vsctl, etc.
21:32:04 <stevebaker> I think the swift functional tests run against a full running swift (needs keystone?) but swift would be self-contained so they could still be considered functional
21:32:37 <stevebaker> notmyname: ^?
21:34:11 <zaneb> so if we're the only project with this kind of tests, I'm happy to do our own thing
21:34:16 <stevebaker> sdague: the heat functional tests are independent of tempest, they have forklifted some of the tempest scenario scaffolding just to get started
21:34:32 <zaneb> I don't want to do our own thing just for the sake of it though
21:35:03 <jeblair> if the other projects are actually running, then it's an integration test
21:35:07 <sdague> zaneb / stevebaker - I think the challenge is heat's a weird starting point for defining this because it is so far up the stack
21:35:20 <zaneb> sdague: agreed
21:35:20 <jeblair> and i think all of the thinking about how functional tests can be optimized doesn't apply
21:35:49 <stevebaker> yep
21:36:58 <sdague> zaneb: so I'd say given that, you guys should run with whatever works for you
21:36:59 <stevebaker> and I do see check-heat-dsvm-functional becoming a voting job on nova, keystone, neutron so that it doesn't break all the time
21:37:03 <ttx> zaneb: looks like you reached a conclusion there
21:37:12 <sdague> stevebaker: so the point is to *not* do that
21:37:27 <zaneb> cool, thanks everyone
21:37:30 <sdague> if you depend on certain behavior in those projects, that should be locked down inside that project
21:37:37 <sdague> not by lots of more cross project tests
21:37:49 <sdague> because that's the scaling issue we are trying to address
21:38:23 <stevebaker> sdague: we may need to wait and see how often it breaks. I guess regressions require a new tempest test to prevent a repeat
21:38:34 <jeblair> sdague: this is the "nova has a unit test that it doesn't break heat's use of some api" idea
21:38:43 <sdague> jeblair: exactly
21:39:05 <sdague> if you depend on a behavior in a component, and they keep breaking it, put some tests in the project to stop that :)
21:39:51 <jeblair> sdague: so should we run the heat functional tests with everything running for real, but only gate heat on it, and when it breaks, fix it and add a unit test in the other project?
21:39:54 <zaneb> I think it's unlikely that Nova &c. will break us that often
21:40:05 <sdague> jeblair: that's an option
21:40:07 <stevebaker> does check-heat-dsvm-functional need to be renamed -integration? (please no ;)
21:40:24 <sdague> stevebaker: I vote no :)
21:40:25 <zaneb> jeblair: that's what I'm thinking
21:40:26 <jeblair> sdague: or should heat run with nova, etc mocked out?
21:40:37 <jeblair> sdague: (trying to understand what the other options would be)
21:40:46 <sdague> jeblair: so those are the 2 options
21:41:03 <stevebaker> jeblair: we need to test heat<-> agent interaction, so nova needs to keep it real
21:41:06 <sdague> I'm not sure if we're going to know which is more effective in catching and resolving issues until we try
21:41:25 <zaneb> I actually think we need both
21:41:32 <stevebaker> jeblair: fake virt driver will be useful for scale tests though
21:42:00 <zaneb> I would like to test with e.g. induced failures
21:42:15 <zaneb> but we also need to spin VMs to check we can talk to the agents on them
21:42:17 <sdague> this is something I'm going to be experimenting with nova over the next couple of weeks
21:42:34 <sdague> zaneb: so maybe you've got a couple of types here
21:42:52 <zaneb> yes, some functional and some integration
21:43:04 <stevebaker> I tried killing nova-api to test heat resilience, but nova went into an unrecoverable state ;)
21:43:09 <sdague> because it's not clear to me if having a fake glance or a real glance (which honestly basically never fails) is better
21:44:02 <sdague> stevebaker: interesting, because I kill and restart nova-api all the time in devstack, and it's fine :)
21:44:03 <zaneb> we'll need the real glance artifact repo when it comes out
21:44:21 <jeblair> sdague: yeah, and i think we've been seeing both patterns as people start working on func test jobs
21:44:28 <sdague> jeblair: agreed
21:44:37 <stevebaker> sdague: during server boot? I think I ended up with an undeletable server
21:44:40 <jeblair> sdague: i think neutron may run nothing else, swift runs everything (but probably uses nothing else)
21:45:20 <sdague> stevebaker: there are tons of ways to get undeletable servers :)
21:45:46 <sdague> stevebaker: anyway, yes, functional jobs should make fault injection something that's more manageable to start doing
21:46:16 <stevebaker> yep
21:46:59 <zaneb> ok, so we should set up both functional tests that mock out the services and integration tests that run against all of devstack
21:46:59 <sdague> so I'd say right now feel free to carve your own path, and let's work to converge on working patterns in the middle of kilo when we have more experience
21:47:41 <sdague> because I'm really hesitant to say "do it thusly" until we have more experience
21:48:28 <zaneb> cool, many thanks sdague & jeblair for your input
21:48:41 <ttx> alright
21:48:46 <ttx> #topic Open discussion
21:48:57 <ttx> anything anyone ?
21:49:05 <ttx> new feedback on the czar^Wliaison proposal ?
21:49:47 <jeblair> ttx: to add a data point on the earlier gate status topic: the waiting jobs queue touched 0 last night; so we're working through 1 day's worth of changes in approx 1 day
21:49:59 <ttx> I'll likely propose that delegation (if deemed necessary) is indicated on the main project wiki page
21:50:08 <ttx> that's easier than a governance patch
21:50:44 <ttx> jeblair: ok, not too bad. Expect load to grow though. I expect a peak on Thursday and Tuesday
21:50:53 <ttx> that's using my meteorological model
21:51:10 <zaneb> ttx: +1 for wiki, but maybe they should be all in one big matrix?
21:51:19 <zaneb> rather than on individual program pages
21:51:32 <ttx> zaneb: hmm, yes, that could prove easier
21:51:46 <jeblair> also, if anyone has a public cloud laying around, i'll be happy to use it  :)
21:51:58 * ttx looks in his closet
21:51:58 <jeblair> ping me and i'll send you info
21:51:59 <zaneb> ttx: we'll soon see who has empty columns that way ;)
21:52:10 <ttx> jeblair: i'm renewing my thinkpad, if you want my old one
21:52:20 <ttx> jeblair: it has a SSD, as fast as on day 1
21:52:33 <jeblair> ttx: i think i might want it just for parts :)
21:52:59 <ttx> don't scare my laptop, the new one is still on some boat from Hong-Kong
21:53:10 <ttx> if I'm to trust UPS
21:53:20 <jeblair> "if" indeed :)
21:53:35 <ttx> more probably in some labs in west virginia
21:53:52 <ttx> ok, unless someone has something to add...
21:54:01 <ttx> let's close this now
21:54:33 <ttx> #endmeeting