19:01:36 <jeblair> #startmeeting ci
19:01:37 <openstack> Meeting started Tue Nov 13 19:01:36 2012 UTC. The chair is jeblair. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:38 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:39 <openstack> The meeting name has been set to 'ci'
19:02:08 <jeblair> #topic actions from last meeting
19:02:32 <jeblair> fungi: any movement on foundation server stuff?
19:02:39 <fungi> talked to reed again today
19:03:01 <fungi> he's sitting across a conference table from toddmorey for the next two days, and will try to get some info/movement on it
19:03:37 <fungi> my patches are probably a bit stale, so i'll rebase them again
19:03:43 <jeblair> #action toddmorey provide a test foundation server
19:03:43 <fungi> other than that, nothing
19:04:44 <jeblair> i do not think that mordred updated the bug list
19:04:54 <clarkb> I am pretty sure he hasn't
19:04:57 <jeblair> #action mordred bugify summit actions
19:05:08 <jeblair> #action everyone collect action items from other summit session etherpads and register as bugs
19:05:18 <jeblair> and i confess, i have not done that second thing yet myself.
19:05:45 <jeblair> I _have_ deconfigured nova-volume testing on master...
19:06:01 <jeblair> so the current devstack-gate only runs cinder on master
19:06:03 <clarkb> I did put a thing or two on the state-of-ci list so that mordred would do it :)
19:06:16 <jeblair> and runs cinder+n-vol for folsom, and n-vol for <folsom
19:06:35 <jeblair> i think the mechanics for that will work out well for similar projects, like quantum
19:07:01 <jeblair> #topic grenade / quantum
19:07:41 <jeblair> These haven't been progressing much, and I need to spend some time tracking people down and trying to get them moving again.
19:08:39 <clarkb> what is left for quantum?
19:08:43 <jeblair> dtroyer suggested that grenade may be making some assumptions about where upgrade data are stored that are not compatible with running it in the devstack gate.
19:08:46 <jeblair> as for quantum
19:08:55 <jeblair> there is this change:
19:08:57 <jeblair> #link https://review.openstack.org/#/c/14990/
19:09:10 <jeblair> which I'd like to get some nova-core people looking at...
19:09:53 <jeblair> particularly since it seems to do a lot of wrapping devstack exercises with "if using quantum...; else..."
19:10:27 <clarkb> mordred just walked into a different meeting. fyi...
19:10:55 <jeblair> <sigh>
19:11:44 <jeblair> #topic testr and friends
19:11:55 <jeblair> clarkb: what's up with testr?
19:11:55 <mordred> jeblair: hey man - some of us have to walk in to meetings sometimes
19:12:06 <pabelanger> o/
19:12:26 <mordred> I have patches that get nova all the way to using testr
19:12:27 <clarkb> so I haven't been able to do much with testr while you guys were conferencing, but we haven't had a meeting over that time period either...
19:12:28 <jeblair> mordred: glad you're here. you're up next. :)
19:12:44 <mordred> except - the last few patches make things SLOW
19:12:50 <jeblair> that's sad
19:12:56 <mordred> I have not yet been able to diagnose
19:12:57 <clarkb> I have a patch that basically got testr mostly working so that nova devs could look at it and play with it
19:13:06 <clarkb> I think jog0 was one of the few to really take a look at it.
19:13:13 <jeblair> clarkb, mordred: are these the same or different patches?
19:13:28 <clarkb> jeblair: different, mine was more just get it to go and mordred's is more make it work properly
19:14:05 <clarkb> one comment from jog0 was wondering if we could have nose and testr as options...
19:14:09 <jeblair> no
19:14:27 <clarkb> I kind of figured we didn't want to support both.
However, getting coverage with testr may be tricky
19:14:42 <fungi> run_tests.sh and tox options are already causing enough confusion on what will pass ci testing
19:14:52 <jeblair> the attempt to move from run_tests to tox left us with two ways of running tests. i really don't want four.
19:14:54 <jeblair> fungi: exactly.
19:14:56 <mordred> I imagine it has something to do with making database init proper fixtures
19:15:21 <clarkb> the problem with testr and coverage is testr runs everything in different processes and relies upon a line protocol so you can't just run it under coverage to get that info
19:16:03 <clarkb> each individual process would need to be told to run under coverage then you will need to merge the results. certainly possible, just something I haven't sorted out yet
19:16:32 <mordred> clarkb: I think we should not care about testr for coverage tests
19:16:41 <jeblair> mordred: how will we run coverage tests then?
19:16:54 <mordred> any of the normal test runners
19:17:04 <mordred> coverage has a wrapper
19:17:05 <jeblair> i thought testr was more "normal" than nose
19:17:08 <mordred> no
19:17:24 <mordred> testr requires that the unittests _themselves_ operate in the usual unittest protocol
19:17:38 <mordred> but it's quite a complex pipeline approach suitable for running tests - but not for doing other things
19:17:57 <jeblair> mordred: then what does it yield us if we still have to run nose?
19:18:25 <mordred> running nose for the coverage tests should be fine - because if that fails weirdly, whatever
19:18:39 <mordred> the tests will have been correctly run via the unittests
19:18:56 <mordred> alternately
19:19:17 <jeblair> mordred: that sounds nice, but i just heard "there are now twice as many ways running tests can break".
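As an illustration of the "run each process under coverage, then merge the results" idea clarkb raises above, here is a minimal sketch. It assumes each test worker can emit a plain mapping of file name to executed line numbers; that format, the function name merge_coverage, and the sample file names are hypothetical and are not the data format of coverage.py or testr. coverage.py does ship its own parallel mode and a `coverage combine` command for this kind of merge; whether those fit testr's worker model is the question left open in the log.

```python
from collections import defaultdict


def merge_coverage(reports):
    """Union per-worker maps of {filename: iterable of executed line numbers}."""
    merged = defaultdict(set)
    for report in reports:
        for filename, lines in report.items():
            # A line counts as covered if any worker executed it.
            merged[filename].update(lines)
    return {name: sorted(lines) for name, lines in merged.items()}


# Hypothetical results from two parallel test workers.
worker_a = {"nova/compute/api.py": [1, 2, 5], "nova/utils.py": [10]}
worker_b = {"nova/compute/api.py": [2, 3], "nova/db/api.py": [7, 8]}

combined = merge_coverage([worker_a, worker_b])
for filename in sorted(combined):
    print(filename, combined[filename])
```

The merge is a set union per file, so it does not matter how the test suite was partitioned across workers or in what order the reports arrive.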
19:19:37 <mordred> there's still only one way to run tests - coverage is a post-analysis thing
19:19:42 <mordred> but I hear you
19:19:44 <mordred> it's just a thought
19:19:57 <mordred> the other option is going to be patching subunit/testtools to grok the coverage library
19:20:10 <jeblair> mordred: actually, we've been talking about making it pre-merge, so you can factor coverage changes into merge decisions.
19:20:13 <clarkb> I think we can just run subunit/testtools under coverage
19:20:17 <mordred> ok. nevermind then
19:20:21 <mordred> we need to patch subunit
19:20:27 <clarkb> mordred: why?
19:20:31 * mordred poked at this on the plane
19:20:37 <clarkb> :(
19:20:59 <mordred> I could be wrong - I'm just pretty sure
19:21:02 <mordred> please prove me wrong :)
19:21:04 <jeblair> (and running it pre-merge means that coverage run-time affects overall check test run time, btw)
19:21:36 <clarkb> I want to say that you can run subunit under coverage and as long as it doesn't fork you get all the details
19:21:37 <mordred> yeah. k. let's make coverage work with testr then
19:21:59 <mordred> we need to be able to work with it in parallel mode - which is why we need patching I think
19:22:23 <clarkb> or have some external way to merge multiple coverage reports
19:22:31 <jeblair> clarkb: you want to continue hacking on that?
19:22:35 <clarkb> but yes, patching subunit/testtools is a possibility.
19:22:37 <clarkb> jeblair: yes
19:22:59 <jeblair> #action clarkb look into subunit/testtools with coverage
19:23:21 <clarkb> mordred: can you #link your nova change for testr?
19:24:31 <mordred> #link https://review.openstack.org/#/c/14949/
19:24:49 <jeblair> shall we move onto project creation?
19:25:02 <clarkb> yes, I think we have covered testr for now
19:25:06 <jeblair> #topic automagic project creation
19:25:11 <jeblair> #link https://review.openstack.org/#/c/15352/
19:25:28 <jeblair> this seems nearly ready to go!
19:25:37 <mordred> yes.
just needs docs
19:25:44 <jeblair> didn't clarkb write some?
19:25:51 <clarkb> no, I fixed the technical issues
19:25:53 <jeblair> ah
19:26:05 <clarkb> that said mordred if you are in meetings all day I can crank out docs
19:26:19 <mordred> clarkb: please. my day got bitchslapped
19:26:29 <clarkb> ok, I will do that today
19:27:07 <clarkb> then, once those are written we should probably do another round of testing on review-dev to catch any potential fails in our recent updates
19:27:18 <clarkb> mordred: or have you been testing as you went?
19:27:26 <mordred> clarkb: I have not tested the group-add change
19:27:32 <clarkb> k
19:28:11 <jeblair> #action clarkb document and test project creation change
19:28:28 <jeblair> i'm very excited about merging that and being ready for the expected onslaught of new projects.
19:28:35 <mordred> ++
19:28:35 <clarkb> for those following along at home this change puts gerrit project management in puppet
19:28:59 <clarkb> so that anyone can propose new gerrit projects and have them automagically created when the puppet change is approved through gerrit
19:29:52 <jeblair> #topic gerrit user sync script
19:30:18 <jeblair> now that there is an api call in launchpad to look up a user given an openid,
19:30:32 <jeblair> we can have the gerrit sync script automatically correct the situation where a user logs into gerrit with an unexpected openid
19:30:55 <jeblair> it's not a perfect solution to the problem, but it should eliminate the need to ask the lp admins to manually correct the situation.
19:31:13 <jeblair> i've started working on that, and since i have to fully comprehend the sync script in order to implement it...
19:31:23 <jeblair> i'm trying to leave it in a better state than i found it
19:31:50 <mordred> you're ripping it down to the new group sync semantics, yeah?
19:31:58 <jeblair> which means hopefully more modular and maintainable, along with a few technical changes:
19:32:11 <jeblair> yes, one is that it will only sync groups that exist in gerrit
19:32:31 <jeblair> which should cut down on syncing tons of unnecessary groups and perhaps thousands of users.
19:32:55 <fungi> potential major runtime improvement there
19:33:08 <jeblair> another is to cache all the LP data at the start of the script, and move the actual database writes to the end, so that the time spent holding write locks in mysql is much smaller
19:33:53 <jeblair> so we should be able to actually use gerrit group admin functions again, which we pretty much can't because the script is always holding a write lock on the groups tables.
19:34:13 <fungi> in which case we might not have to worry so much about turning off the sync script during maintenance involving gerrit db changes too
19:34:17 <mordred> excellent - once that's in - I want to delete the useless groups too
19:34:22 <jeblair> mordred: +1
19:34:48 <jeblair> i should have something for review soon, but i'm also going to try to make one more improvement:
19:34:50 <clarkb> ++
19:35:21 <jeblair> a debug mode that caches the LP data in a pickle for re-use across runs so that we can actually test and debug the script in human rather than geological time.
19:35:51 <mordred> :) funny story - I had something similar to that in the VERY FIRST version of the script
19:35:53 <mordred> oops
19:36:02 <jeblair> #action jeblair finish updates to sync script.
19:36:21 <jeblair> mordred: yeah, it's pretty important. this poor script has seen a lot of action. :(
19:37:14 <jeblair> #topic ci-issues-log
19:37:31 <jeblair> clarkb: want to talk about your idea?
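For the sync script changes jeblair describes under the previous topic (fetch all Launchpad data first, optionally cache it in a pickle for debug runs, restrict to groups that exist in gerrit, and leave database writes to the end), a rough sketch might look like the following. Everything here is an assumption for illustration: load_lp_data, plan_writes, fake_fetch, and the sample group names are hypothetical, and the real script's structure is not shown in the log.

```python
import os
import pickle
import tempfile


def load_lp_data(fetch, cache_path=None):
    """Return Launchpad data, reusing a pickle cache when cache_path is given.

    fetch is a zero-argument callable standing in for the slow Launchpad queries.
    """
    if cache_path and os.path.exists(cache_path):
        with open(cache_path, "rb") as cache_file:
            return pickle.load(cache_file)
    data = fetch()
    if cache_path:
        with open(cache_path, "wb") as cache_file:
            pickle.dump(data, cache_file)
    return data


def plan_writes(lp_groups, gerrit_groups):
    """Compute every membership row up front so the later write phase is short.

    Only groups that already exist in gerrit are considered.
    """
    return sorted(
        (group, member)
        for group, members in lp_groups.items()
        if group in gerrit_groups
        for member in members
    )


calls = []


def fake_fetch():
    # Hypothetical stand-in for the Launchpad API; records each invocation.
    calls.append(1)
    return {"nova-core": ["alice", "bob"], "unused-team": ["carol"]}


cache = os.path.join(tempfile.mkdtemp(), "lp-cache.pickle")
first = load_lp_data(fake_fetch, cache)   # slow path: fetches and writes the cache
second = load_lp_data(fake_fetch, cache)  # debug re-run: served from the pickle
rows = plan_writes(second, {"nova-core"})
print(len(calls), rows)
```

With the rows computed ahead of time, the only work left inside a database transaction would be applying them, which is the property that shortens the time spent holding write locks.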
19:37:34 <clarkb> #link https://etherpad.openstack.org/ci-issues-log
19:38:05 <clarkb> at the summit there was a lot of mention about when the infrastructure failed and when things couldn't merge and so on
19:38:24 <clarkb> and we weren't tracking these issues very well
19:38:27 <fungi> assertions about perceived gate "instability"
19:38:58 <clarkb> now, these things don't always end up being bugs in the infrastructure or even things related to what we do, but the perception is there
19:39:12 <jeblair> perceived is a good word, because at this point the infrastructure very rarely fails.
19:39:22 <clarkb> so filing bugs against openstack-ci for things we can never fix or don't have a hand in doesn't make sense
19:39:34 <clarkb> but we still want to track this so I started the above etherpad
19:39:58 <jeblair> it's useful for that, but i think it's actually more useful as a communication tool for ourselves...
19:40:11 <clarkb> basically when something fails jot it down in there
19:40:17 <clarkb> jeblair: yes, it has been useful for that
19:40:33 <clarkb> being able to keep up to date with the latest status of a particular issue is helpful
19:40:47 <jeblair> i find it's valuable to see what has been happening and what other people have been doing, for exactly the reason a ship's log is useful to crews going on and off shift.
19:41:54 <jeblair> but it can also be a tool for exposing what's going on (and what's going wrong) to the wider community...
19:42:23 <jeblair> but whether etherpad is the best tool for that is an open question
19:42:32 <clarkb> ya, I am still not sold on it
19:42:40 <jeblair> it's great that we can all edit it and keep things up to date
19:43:03 <fungi> at my last job, we used a private wordpress instance for that, but it wasn't really ideal either
19:43:26 <fungi> i like the etherpad better in that the content is more granularly collaborative and wikilike
19:43:37 <clarkb> we could potentially use a git repository
19:43:44 <clarkb> to have stronger versioning and history
19:44:01 <fungi> or a wiki page...
19:44:03 <jeblair> we could try publicising it and see what happens; i guess my only concern is that misinformation or less-useful information, or "problem dumps" start showing up there.
19:44:05 <clarkb> or a wiki page
19:44:27 <clarkb> jeblair: ya, I don't really see it as a user reporting tool
19:44:34 <clarkb> the info there should be pre filtered
19:44:35 <fungi> etherpad and wiki are both not natively great for keeping long and continuously updated logs of things though, i think
19:45:00 <clarkb> so that it isn't ambiguous to the next shift if things have been filtered
19:45:22 <fungi> i think i want something bloglike over the long term but wikilike over the short term
19:45:34 <fungi> not really sure such a thing exists
19:45:57 <clarkb> we could use a static content blog system backed by git
19:46:13 <jeblair> clarkb: i think quick updates are key
19:46:40 <jeblair> i really hate heavyweight reporting tools.
19:46:41 <fungi> i agree. and at that, etherpad is great. normal wikis somewhat but not quite as much. blogs and git far less so
19:47:12 <jeblair> clarkb: that doesn't exclude your idea, but i think it suggests that maybe it should be wrapped with quick scripts or something.
19:47:31 <fungi> maybe something that scraped a daily etherpad into a git-backed blog entry?
19:48:01 <jeblair> we could also write a web app that's half wiki and half blog.
click to edit the most recent entries, automatic archiving of old ones...
19:48:16 <jeblair> okay, so more brainstorming about this, but it seems several of us really like the idea. :)
19:48:24 <fungi> yes
19:48:38 <jeblair> on a related note, flakey tests...
19:48:42 <jeblair> #topic flakey tests
19:49:12 <jeblair> there have been a lot of flakey tests lately, obviously, and the issues log is at least partly a response to that
19:49:17 <clarkb> yes
19:49:19 <jeblair> we've been sort of a de-facto clearinghouse for information about the tests
19:49:39 <fungi> or front desk for complaints about them anyway
19:49:55 <jeblair> which is a useful thing to do, but i think it's distracting us from doing the things we're rather better at than being a help desk.
19:50:06 <mordred> ++
19:50:23 <mordred> jaypipes: you around? this might be a convo you should be in on...
19:50:36 <clarkb> my initial instinct is to take away reverify
19:50:47 <mordred> same here
19:50:54 <clarkb> you can recheck to see if your patch is actually bad
19:50:55 <mordred> although it will cause an immediate revolt
19:51:07 <clarkb> but to merge your code you must take some ownership of the failures
19:51:18 <clarkb> and the core members can re authorize if need be
19:52:00 <jeblair> yes, removing recheck/reverify doesn't stop you from merging code, but it escalates problems.
19:52:25 <jeblair> and given how solid the infrastructure is, i feel comfortable doing it from that point of view... however...
19:52:33 <jeblair> it seems like either the code or tests or both are kind of crap right now.
19:52:35 <clarkb> that said I think the flakeyness is pretty visible and that hasn't helped the troubleshooting much
19:53:11 <jeblair> and it will really annoy people that their changes are harder to get merged (even if it's the fault of their co-devs)
19:53:22 <jeblair> so, how about this for a compromise:
19:53:25 <torgomatic> but if the flaky code is in another project, then that makes developers' lives harder
19:53:43 <jeblair> torgomatic: indeed, exacerbating that point.
19:53:47 <fungi> there is only one project, and that project is openstack
19:54:00 <torgomatic> for example, if the devstack gate fails due to some Cinder thing when run on a commit in Swift, there's about a 0.0% chance that I (as a Swift dev) can go fix it
19:54:01 <jeblair> fungi: that's right too. :)
19:54:18 <jeblair> anyway, idea: recheck/reverify require a bug link.
19:54:28 <clarkb> torgomatic: correct, which is why the core members being able to re authorize is important.
19:54:39 <fungi> maybe you don't fix it, but you involve devs for the component which is suspect
19:54:42 <clarkb> torgomatic: but, in doing so those core members should be working with the other projects to sort out the problems
19:54:50 <jeblair> so you have to at least diagnose/triage the problem enough to identify an existing bug in the correct project, or report a new one.
19:54:53 <clarkb> (this is my thought of how things would work in an ideal world)
19:55:44 <jeblair> and we whip up a report of the most active/recent bug links attached to reverify/rechecks
19:56:06 <fungi> it's also an incentive to step up scrutiny of stability for new openstack components during incubation, since everyone becomes responsible for it being smooth once we gate on it
19:56:09 <jeblair> so that they can be quantified, tracked in the project meeting, and hopefully more dev attention focused on them.
19:56:34 <clarkb> I really like that
19:57:03 <clarkb> may be less useful for rechecks as it could be the patch itself that is broken
19:57:18 <clarkb> but being able to track and quantify is a giant step above where we are now
19:57:27 <fungi> yeah, tying every failure to a documented bug report (even a vague one), would be great
19:57:33 <torgomatic> it can be difficult to know which codebase something else is in, though
19:58:14 <clarkb> torgomatic: we can move bugs around projects
19:58:16 <jeblair> yeah, but if the volumes test fails, you can at least start with a bug against cinder, and if that's not right, it can be moved to the right project on later inspection
19:58:28 <torgomatic> clarkb: fair enough
19:58:36 <clarkb> I think if a bug is submitted with general failure details then as part of the troubleshooting that info can become more solid
19:59:15 <mordred> (lurking - but can we suggest that the bug gets a flaky-ci tag or something so that they can be raised in the weekly meetings?)
19:59:27 <jeblair> #action jeblair propose a system for linking reverifies to bugs
19:59:28 <jeblair> mordred: +1
19:59:40 <jeblair> and we're out of time
19:59:46 <jeblair> thanks everyone!
19:59:49 <jeblair> #endmeeting
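
The "recheck/reverify require a bug link" idea from the flakey tests topic, together with the report of most-cited bugs, could be prototyped roughly as below. This is only a sketch under assumptions: the meeting did not settle on a comment syntax, so the "recheck bug N" / "reverify bug N" form, the function names, and the bug numbers are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical comment syntax: "recheck bug 1234" or "reverify bug 1234".
RETRIGGER = re.compile(r"^\s*(recheck|reverify)\s+bug\s+#?(\d+)\s*$", re.IGNORECASE)


def parse_retrigger(comment):
    """Return (action, bug_number) for a valid comment, or None if no bug is given."""
    match = RETRIGGER.match(comment)
    if match is None:
        return None
    return match.group(1).lower(), int(match.group(2))


def top_bugs(comments, limit=3):
    """Count how often each bug is cited across retrigger comments."""
    parsed = (parse_retrigger(comment) for comment in comments)
    counts = Counter(result[1] for result in parsed if result is not None)
    return counts.most_common(limit)


comments = [
    "reverify bug 1001",
    "recheck bug 1001",
    "reverify",            # rejected: no bug link
    "reverify bug 1002",
    "looks good to me",    # ordinary review comment, ignored
]
print(top_bugs(comments))
```

A bare "reverify" is rejected rather than retriggering the job, which is what forces the minimal triage step described in the log; the tally is the raw material for the report to be raised in the project meeting.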