13:31:39 <esberglu_> #startmeeting powervm_ci_meeting
13:31:39 <openstack> Meeting started Thu Dec 8 13:31:39 2016 UTC and is due to finish in 60 minutes. The chair is esberglu_. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:31:40 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:31:42 <openstack> The meeting name has been set to 'powervm_ci_meeting'
13:31:49 <esberglu_> Hey guys
13:32:26 * adreznec waves
13:33:40 <thorst_> o/
13:34:36 <esberglu_> #topic status
13:34:51 <esberglu_> So those runs are _slowly_ going through
13:35:13 <adreznec> Yeah
13:35:16 <esberglu_> The runs themselves seem to be fine
13:35:18 <thorst_> how slow?
13:35:20 <adreznec> Seeing some scary times out there on the queue
13:35:52 <esberglu_> thorst_: I can ping you the zuul ip if you want to look
13:36:06 <thorst_> I have it
13:36:08 <thorst_> 10 hours?
13:36:13 <esberglu_> But like 10 hours
13:36:14 <esberglu_> Yeah
13:36:28 <adreznec> Do we know what's bogging things down yet?
13:36:31 <thorst_> are any actually going?
13:36:59 <thorst_> the jenkins has a ton of idle VMs.
13:37:40 <esberglu_> Yeah. There are 3 going through right now, 20 - 30 have gone through in the last 12 hours
13:37:46 <adreznec> Yeah
13:38:00 <adreznec> It doesn't actually look like anything's been running for all that long
13:38:03 <esberglu_> That's about the volume I would expect
13:38:07 <adreznec> but things have been in the queue for a long time
13:38:13 <esberglu_> Its just they sit around in the queue forever first
13:39:16 <esberglu_> Which means the queue just keeps getting bigger
13:41:03 <adreznec> Ok, so I think we need to nail down exactly what's causing the initial queuing to build up
13:41:28 <adreznec> If it's git issues, then we probably need to invest in mirrors at this point
13:42:57 <thorst_> did we tell Zuul to only let 3 through at a time?
13:43:16 <thorst_> wasn't there some gate in zuul about throughput?
13:43:34 <thorst_> I don't really know how this could be git...what's that train of thought there?
13:44:21 <thorst_> (not that a mirror is a bad idea...)
13:44:38 <esberglu_> No the only zuul conf that changed was moving nova from silent to check pipeline
13:44:45 <thorst_> hmm
13:45:13 <adreznec> Well we were seeing those git performance issues yesterday, and one theory was that we were hitting some kind of internal timeouts doing the clones/fetches
13:45:40 <thorst_> ahh, cause zuul does some sort of clone
13:45:51 <thorst_> which I don't understand...I'd have thought that was just in the Jenkins slave VM
13:45:51 <adreznec> Because we could see it attempting to do the same fetch multiple times on different PIDs
13:46:03 <esberglu_> We were seeing these git fetch <change>
13:46:10 <esberglu_> That seemed to just be looping
13:46:15 <adreznec> Not sure we have enough data to say that concretely
13:46:21 <adreznec> But it was a theory
13:46:54 <esberglu_> The only other thing that I thought it might be
13:47:09 <esberglu_> There are these changes in the queue that depend on like 10 other changes
13:47:31 <esberglu_> And some of the changes are having merge issues
13:48:01 <thorst_> why is zuul doing this? ssh -i /var/lib/zuul/ssh/id_rsa -p 29418 powervmci@review.openstack.org git-upload-pack '/openstack/nova'
13:48:18 <esberglu_> Here's an example of one of those changes https://review.openstack.org/#/c/337789/
13:48:32 <esberglu_> Not sure
13:49:34 <thorst_> that process has been running for a while
13:51:27 <adreznec> Interesting...
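[Editor's note, not part of the meeting: git-upload-pack is the server-side half of git fetch; it advertises refs and then streams the requested pack, so Zuul's merger invoking it over SSH is expected. A rough sketch of reproducing the call by hand to see whether Gerrit or the network is the slow side; the key path and account are taken from the ps output quoted above, and the timings are only illustrative.]

    # Client-side equivalent of the advertisement phase: list every ref Gerrit
    # advertises for nova (one per change patch set, so the output is large).
    time git ls-remote ssh://powervmci@review.openstack.org:29418/openstack/nova >/dev/null

    # Or invoke the server command directly, as zuul does, and just count the
    # bytes of the ref advertisement it sends back.
    time ssh -i /var/lib/zuul/ssh/id_rsa -p 29418 \
        powervmci@review.openstack.org git-upload-pack '/openstack/nova' </dev/null | wc -c

[If the ls-remote alone takes minutes, the slowness is likely on the Gerrit/network side rather than on the zuul host itself.]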
13:51:29 <thorst_> the commit message in zuul about why that runs is "I'll document this later"
13:51:48 <adreznec> That should never really be a particularly long-running command
13:52:27 <thorst_> I suggest we kill that proc
13:52:30 <thorst_> and see if we unwedge.
13:52:54 <esberglu_> Sure
13:53:23 <adreznec> thorst_: How long is a while?
13:53:27 <adreznec> Hours?
13:53:51 <thorst_> says 08:48 in the ps aux output
13:54:15 <thorst_> so under 5 min.
13:54:19 <thorst_> its done now.
13:54:36 <esberglu_> Yeah I killed it. Another one just popped up in its place
13:55:09 <thorst_> did you kill that second one?
13:55:13 <thorst_> they just seem to be really slow
13:55:14 <esberglu_> No
13:56:13 <adreznec> Right
13:56:35 <thorst_> wonder what git-upload-pack does
13:56:44 <thorst_> needs some investigation, because I don't think a clone would help that...
13:58:52 <thorst_> well...when in doubt, just run by hand.
13:59:05 <thorst_> it returns quite the amount of data.
14:00:22 <adreznec> I think it does discovery/fetching of objects from git during a fetch
14:00:39 <adreznec> Not 100% sure on that
14:02:45 <thorst_> alright...so is that the status. Figure out why we're wedged.
14:02:56 <thorst_> (since we're over on time in the meeting)
14:03:09 <adreznec> Yeah
14:03:16 <adreznec> Clearly we need longer than 30 minutes to investigate this
14:03:28 <thorst_> just running the command ourselves may take 30 minutes
14:03:33 <esberglu_> Yeah. Other than that I put a wiki page up for CI
14:03:48 <esberglu_> If you guys want to take a look. Still need to finish a few sections and polish it up
14:04:56 <adreznec> Where did it land?
14:04:58 <adreznec> Novalink wiki?
14:05:34 <esberglu_> Neo dev wiki
14:05:49 <esberglu_> Subpage under PowerVM CI System
14:06:04 <adreznec> Ok
14:06:12 <esberglu_> _WIP_ CI System and Deployment
14:06:16 <thorst_> so that is also for wangqwsh as you train him to be able to redeploy the CI?
14:06:20 <esberglu_> Yep
14:06:38 <thorst_> excellent. And if we do need a git mirror, that may be a good project for wangqwsh to drive
14:07:24 <esberglu_> #endmeeting
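[Editor's note, not part of the meeting: a minimal sketch of the git mirror idea raised above. The mirror path, repo list, schedule, and host name are assumptions, not anything decided in the meeting; the only real pieces are git clone --mirror, git remote update, git daemon, and DevStack's GIT_BASE setting.]

    # Mirror the repos the CI clones most, refreshed periodically.
    sudo mkdir -p /opt/git-mirror/openstack
    cd /opt/git-mirror/openstack
    git clone --mirror https://git.openstack.org/openstack/nova nova.git
    git clone --mirror https://git.openstack.org/openstack/requirements requirements.git

    # Refresh on a schedule, e.g. from cron every 15 minutes:
    # */15 * * * * cd /opt/git-mirror/openstack/nova.git && git remote update --prune

    # Serve the mirror read-only:
    git daemon --detach --export-all --base-path=/opt/git-mirror

    # DevStack slaves can then clone from the mirror instead of git.openstack.org
    # by setting GIT_BASE in local.conf, e.g.:
    # GIT_BASE=git://<mirror-host>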