19:01:24 #startmeeting
19:01:25 Meeting started Tue Aug  7 19:01:24 2012 UTC. The chair is mtaylor. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:26 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:26 CI anybody?
19:01:33 please.
19:01:43 jeblair: you wanna talk about anything?
19:02:21 mtaylor: i'd like to talk about current problems and what we're going to do about them.
19:02:28 #topic current problems
19:02:36 jeblair: you have the floor
19:02:41 mtaylor: have you had a chance to look into why a git remote update takes 6 minutes?
19:03:21 jeblair: no, I have not. I keep getting pulled on to phone calls
19:03:59 jeblair: is it only between rax hosts? or does that need investigation too?
19:04:02 so i'm very worried about that.
19:04:55 i have only performed minimal investigation. it seems to be most pronounced on an oneiric rs host talking to review.openstack.org. it seems less of a problem from a precise host. and it doesn't seem to be much of a problem from my home connection.
19:04:58 that's all the data i have.
19:05:08 ok
19:05:22 also....
19:06:15 review.o.o is averaging 2.24 Mb outbound traffic. I forget what the link is, but i don't think it's hitting the limit.
19:06:28 oneiric is running an older version of git than precise - but not by much, so it _should_ support the efficient http protocol
19:07:06 i'd really love it if someone could look into that, because it's eating up 6 minutes of run time for every unit test, and more than _20_ minutes on devstack runs.
19:07:11 i think it's a critical issue.
19:07:14 1.7.5.4 vs. 1.7.9.5
19:07:35 yeah. I will work on that this afternoon. LinuxJedi any chance you have any brain-space to help?
19:07:53 the ideas i've brainstormed are slow links between certain sets of hosts within rackspace
19:08:09 probably, yes
19:08:33 or perhaps even oversubscribed io
19:08:53 * LinuxJedi wonders if we can simulate the traffic
19:09:11 which traffic?
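[Editorial note: one quick way to get real numbers on the git slowdown discussed above would be to time the same operation from each class of host and compare. This is a sketch, not an existing CI script; the helper name and the repo path in the comment are illustrative.]

```python
import subprocess
import time

def time_command(cmd, cwd=None):
    """Run cmd, discard its output, and return (seconds, exit status)."""
    start = time.monotonic()
    proc = subprocess.run(cmd, cwd=cwd,
                          stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL)
    return time.monotonic() - start, proc.returncode

# Intended use (repo path and operation are examples): run from an
# oneiric slave, a precise slave, and a home machine, then compare:
#   elapsed, rc = time_command(["git", "remote", "update"],
#                              cwd="/path/to/nova")
```

Comparing oneiric vs. precise vs. an outside connection against the same review.openstack.org repo should show whether the problem is the host, the link, or the git version.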
19:09:38 git http
19:10:14 "git remote update" ?
19:10:41 jeblair: btw - do we need to do remote update?
19:10:51 sure, but I meant with something we can get some real data from
19:10:54 jeblair: since we're later doing fetch on specific refspec?
19:10:59 yes, so that we can "git checkout master"
19:11:23 LinuxJedi: i'm trying to follow, but i'm puzzled. what's not real about that?
19:11:56 LinuxJedi: i'm trying to understand what thing you want to simulate?
19:12:45 jeblair: never mind, we can just use packet analysers to get everything I was thinking of
19:12:56 ok
19:13:12 next critical issue:
19:13:17 i filed this bug: https://bugs.launchpad.net/openstack-ci/+bug/1034032
19:13:18 Launchpad bug 1034032 in openstack-ci "make static html versions of jenkins reports for archiving" [Critical,Triaged]
19:13:44 we really need to stop using the junit post-build option in jenkins
19:14:31 that's step one in that bug -- to generate junit output some other way
19:14:55 then we can stick the job information, including the unit test report, on a static webserver
19:15:28 i think this is the next biggest scalability hurdle for us -- it's significantly slowing down zuul and jenkins
19:15:39 and it's the biggest cause of deadlocks that we have to manually clear out
19:16:07 can we disable that plugin while we work on a fix?
19:16:11 so i'd really like to see some progress on that.
19:16:32 clarkb: unfortunately, we did a lot of work to make the unit test output usable via the junit module...
19:16:36 clarkb: doesn't nose have an html output mechanism?
19:16:48 mtaylor: no it does not.
19:17:34 mtaylor: if we disabled the junit build step, do you believe the results would be usable to developers, as things currently stand?
19:17:53 (my understanding is that you did some work to get all of the log output, etc, in the xunit report)
19:18:16 jeblair: I did - but that was really just getting the results to be picked up by nose
19:18:35 jeblair: if we turned off xml output, we'd get standard nose error reports to stdout at the end of the run
19:19:06 it isn't the output that is failing though right? it is the junit plugin waiting for the output?
19:19:11 correct
19:20:00 mtaylor: so you think that output would be useful, without the organization provided by junit?
19:20:34 jeblair: it wouldn't be as pretty, but all of the information should be there
19:20:45 oh, wait
19:21:42 so - they'd get this:
19:21:44 https://jenkins.openstack.org/job/gate-nova-python26/3956/console
19:21:53 so you can see the traceback at the end
19:22:00 as well as captured logging
19:22:03 it's ugly, but it's there
19:22:05 not yet i can't, i'm waiting for jenkins.
19:23:10 mtaylor: why isn't that job processed by junit?
19:23:29 jeblair: it is but the option to also retain stdout results is checked
19:23:46 what clarkb said
19:23:49 er
19:23:49 no
19:23:58 why does that build not have a test report?
19:24:00 oh, you're right
19:24:48 we do this: export NOSE_WITH_XUNIT=1
19:24:49 the option to "also retain stdout" is an option in jenkins junit processing that causes jenkins to keep the output recorded in the xml file
19:25:00 yes, that job _wrote_ an xml file
19:25:05 but jenkins did not read it.
19:25:36 clarkb: the "retain stdout" option doesn't have any effect on what is printed to stdout by nose, or what's recorded in the xml file.
19:25:56 good question
19:26:07 mtaylor: so what i'm getting at is, is there logging that's going into the xml file that we're not seeing?
19:26:42 jeblair: there should be no _additional_ logging into xml
19:26:46 mtaylor: i thought there was a whole "run_tests.log" thing and you got most of the output to go via the nose log capture plugin.
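[Editorial note: for reference, the nosetests.xml that `NOSE_WITH_XUNIT=1` writes looks roughly like the sample below (fields follow the common xunit layout; the test names are made up), and the jenkins junit step is essentially just reading counts and failure elements out of it. A stdlib sketch of that read:]

```python
import xml.etree.ElementTree as ET

REPORT = """\
<testsuite name="nosetests" tests="2" errors="0" failures="1" skip="0">
  <testcase classname="nova.tests.test_foo" name="test_ok" time="0.01"/>
  <testcase classname="nova.tests.test_foo" name="test_boom" time="0.02">
    <failure type="AssertionError">Traceback (most recent call last) ...</failure>
  </testcase>
</testsuite>
"""

def summarize(xml_text):
    """Return (total, failed) test counts from an xunit report."""
    suite = ET.fromstring(xml_text)
    cases = suite.findall("testcase")
    failed = [c for c in cases
              if c.find("failure") is not None or c.find("error") is not None]
    return len(cases), len(failed)

print(summarize(REPORT))  # -> (2, 1)
```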
19:26:58 i don't see any logs from that failed test
19:27:06 that is the output at the end
19:27:14 oh wait
19:27:17 there is _one_ log line
19:27:27 yeah. there just wasn't much log output during that test run
19:27:38 okay, sorry i missed that.
19:27:50 i just saw the exception. was expecting more logs.
19:28:29 okay, so i guess we can disable junit in jenkins as clarkb suggested. do we have agreement on that?
19:28:43 yes. works for me
19:28:49 works for me too
19:29:06 and then we can work on getting nose output into html pages that can be copied to a static web server
19:29:24 once we do that, I assume zuul will be able to report back the right link for people to look at?
19:29:25 okay. i will submit a change to disable junit
19:29:32 yep. that's all described in the bug
19:29:35 great
19:29:59 (the rest of the bug still stands -- this is just the first step)
19:30:18 yup
19:30:26 #action jeblair disable junit processing in jenkins jobs
19:30:36 (is that the right action syntax?)
19:31:06 next critical item: https://bugs.launchpad.net/openstack-ci/+bug/1010621
19:31:08 Launchpad bug 1010621 in openstack-ci "important servers should have backups" [Critical,In progress]
19:31:30 mtaylor: have you heard back about the hpcloud volume service?
19:32:10 jeblair: yes I think that is correct
19:32:30 jeblair: no
19:32:34 jeblair: pinging again
19:33:32 mtaylor: are they being unresponsive? cause I'd really like to get this going, but i don't think we have an effective backup until we _at least_ have one in another account from the one where all our important servers are...
19:34:28 i mean, should we come up with a different backup strategy?
19:36:09 jeblair: they report it as enabled on tenant id 15813847660783
19:36:46 mtaylor: no idea what tenant that is.
19:36:52 I know. super helpful, right?
19:37:59 okay, that's supposed to be the stackforge tenant. i'll try again.
19:38:21 #action jeblair to see if volume service works in hpcloud for backups
19:38:38 so those are all the critical issues i know of that we can do something about at the moment...
19:39:16 i'd really like to see work going into addressing those, and maybe dealing with some of the bug backlog.
19:39:57 * mtaylor agrees
19:40:11 I'll work with LinuxJedi on trying to figure out the git slowdown
19:40:20 i'm filing a bug about that one right now
19:40:21 clarkb: can you take figuring out how to get a nice html report from nose?
19:40:27 sure
19:40:44 jeblair: artifact copy itself is a blocking operation, yeah?
19:40:57 jeblair: do we need to solve that for html report transfer?
19:41:19 mtaylor: we could just add another 'execute shell' at the end of the builds if we need to
19:41:25 yeah...
19:41:28 or
19:41:40 there is the option to always run a piece of code even if a test fails I think
19:41:42 just as an scp operation, without listing them as artifacts, like the docs jobs.
19:42:00 that should work
19:42:13 k
19:42:17 i think that's worth investigating first -- i don't think it has the jenkins locking problems that artifacts have
19:42:26 and it's cleaner than running a shell
19:42:37 but another challenge is how to get the console output..
19:42:38 and we already have jenkins_job builders for it
19:42:47 jeblair: more tee?
19:43:19 let's do one thing at a time - let's get test output fixed
19:43:26 clarkb: that's an idea. output with timestamps would be even more awesome though.
:)
19:43:38 then we can figure out console output
19:43:56 https://bugs.launchpad.net/openstack-ci/+bug/1034130
19:43:58 Launchpad bug 1034130 in openstack-ci "find out why git operations from oneiric hosts are slow" [Critical,Triaged]
19:44:25 #action clarkb get a nice html report from nose
19:44:41 #mtaylor,LinuxJedi find out why git operations from oneiric hosts are slow
19:45:05 you forgot the #action
19:45:10 #action mtaylor,LinuxJedi find out why git operations from oneiric hosts are slow
19:45:11 heh
19:45:37 mtaylor is an action :)
19:45:42 i think that covers critical operational stuff.
19:46:09 * clarkb jumps in really quick so that he doesn't forget. if people could take a look at https://review.openstack.org/#/c/10784/ that would be awesome
19:46:24 that is a draft to deal with one of the things assigned to me last week
19:49:14 clarkb: I shall look at that
19:49:21 clarkb: perhaps un-draft it so I don't forget?
19:49:35 ok
19:50:16 #topic open floor
19:50:22 anybody got anything else?
19:50:38 gerritbot is ready for its first release as soon as zuul can handle it
19:50:50 go for it
19:51:00 I don't think I have the proper permissions
19:51:30 I can't even +2 gerritbot
19:51:42 devstack merged my openvz support patch, so it would be great if devstack-gate did the same soon
19:51:59 clarkb: apparently you need to join openstack-ci-core
19:52:07 jeblair: ok
19:52:13 (and we probably need to give that group perms to tag)
19:52:38 clarkb: http://pypi.python.org/pypi/HTMLTestRunner
19:52:44 devananda: ++
19:53:03 mtaylor: why don't you review it? i gave it a +2. ;)
19:53:11 mtaylor: nice. BSD too
19:53:46 jeblair: maybe I will
19:53:51 lovely. i mean, other than the colors. but i think that's exactly what we want.
19:54:01 I have no idea if that's suitable, but it might be a start at least
19:54:22 that at least is what i was imagining the end product should look like.
19:55:02 ++
19:57:01 okie.
that's all I've got for this week
19:57:07 thanks all
19:57:10 #endmeeting
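[Editorial note on the HTMLTestRunner idea discussed above: the end product being imagined is essentially "xunit XML in, static HTML out". HTMLTestRunner itself does considerably more; this is a minimal stdlib sketch of the shape of the thing, with an illustrative sample report and markup.]

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SAMPLE = """\
<testsuite name="nosetests" tests="2" failures="1">
  <testcase classname="nova.tests.test_foo" name="test_ok" time="0.01"/>
  <testcase classname="nova.tests.test_foo" name="test_boom" time="0.02">
    <failure type="AssertionError">Traceback ...</failure>
  </testcase>
</testsuite>
"""

def xunit_to_html(xml_text):
    """Render an xunit report as a minimal static HTML table."""
    suite = ET.fromstring(xml_text)
    rows = []
    for case in suite.findall("testcase"):
        status = "PASS" if case.find("failure") is None else "FAIL"
        rows.append("<tr><td>%s.%s</td><td>%s</td></tr>" % (
            escape(case.get("classname", "")),
            escape(case.get("name", "")),
            status))
    return ("<html><body><table>\n%s\n</table></body></html>"
            % "\n".join(rows))

print(xunit_to_html(SAMPLE))
```

A page like this could be written at the end of each job and scp'd to the static webserver alongside the console log, as described in bug 1034032.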