openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Handle build_set being None for priority https://review.openstack.org/508634 | 00:13 |
*** harlowja has quit IRC | 00:14 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Handle build_set being None for priority https://review.openstack.org/508634 | 00:14 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Protect against builds dict changing while we iterate https://review.openstack.org/508629 | 00:14 |
SpamapS | ooo I wish I hadn't been in meetings all day I'd love to do the load average limiter patch | 01:40 |
SpamapS | fungi: still working on it? | 01:41 |
fungi | SpamapS: i never even really got off the ground with it--if you want to take it, all yours! | 01:44 |
fungi | there's some discussion in here on a viable direction for it, at least | 01:45 |
fungi | if you haven't already caught up | 01:45 |
SpamapS | I did see that | 01:46 |
SpamapS | I'm wondering if we can just go simpler and limit things with a thread pool. | 01:46 |
SpamapS | if ansible jobs are the source of load and RAM usage, then limiting concurrency seems like the way to go. | 01:47 |
fungi | and just let the remaining jobs pile up in gearman until an executor has available threads again? i guess that would be the result | 01:50 |
SpamapS | yep | 01:51 |
SpamapS | but it's easier to just make a thread pool than monitor load | 01:51 |
SpamapS | and the way gearman works, busier executors will always respond slower than idle ones if the concurrency hasn't all been used up | 01:52 |
openstackgerrit | John L. Villalovos proposed openstack-infra/zuul feature/zuulv3: Fix pep8 error https://review.openstack.org/508643 | 01:55 |
*** harlowja has joined #zuul | 02:39 | |
*** harlowja has quit IRC | 02:55 | |
SpamapS | hm actually no | 03:00 |
SpamapS | if I just have a thread pool for jobs the executor server will slurp all of the jobs in. | 03:00 |
SpamapS | it's simpler than that anyway. I can have a counter for active jobs and deregister/register when it crosses the concurrency threshold | 03:01 |
jeblair | SpamapS: that works for a max job count, but we were thinking that load average might be more adaptive | 03:11 |
jeblair | (like, actual system load average) | 03:11 |
SpamapS | jeblair: It is, but it's also more complicated. ;) | 03:13 |
SpamapS | now that I'm digging in | 03:14 |
SpamapS | it's not thaaaat much more complicated | 03:15 |
SpamapS | I have it working where it will deregister when it has more than 5 jobs running, and re-register when it drops below 5 | 03:16 |
SpamapS | jeblair: a concurrency limit will also help control memory in a coarse kind of way. | 03:17 |
SpamapS | but if we can poll load average, we can poll free | 03:17 |
SpamapS | still I'm inclined to start with this and see how it goes. | 03:18 |
SpamapS | as much because I'm about to get home and I won't be coding for about 48 hours after that. ;) | 03:18 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Add a concurrency limit to zuul-executor https://review.openstack.org/508649 | 03:21 |
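The counter-based scheme described above (deregister the expensive gearman function once too many jobs are running, re-register when the count drops back below the limit) could look roughly like the sketch below. The method names registerFunction/unRegisterFunction and the function name execute:execute are assumptions taken from the discussion, not the contents of the 508649 patch itself.

```python
import threading


class ConcurrencyGovernor:
    """Sketch of a counter-based concurrency limit (illustrative only;
    not the actual change proposed in 508649)."""

    def __init__(self, worker, max_jobs=5):
        self.worker = worker          # assumed to be a gear.Worker-like object
        self.max_jobs = max_jobs
        self.active = 0
        self.registered = True
        self.lock = threading.Lock()

    def job_started(self):
        with self.lock:
            self.active += 1
            if self.registered and self.active >= self.max_jobs:
                # Stop advertising the expensive function; cheap functions
                # such as execute:stop stay registered so cancels still work.
                self.worker.unRegisterFunction('execute:execute')
                self.registered = False

    def job_finished(self):
        with self.lock:
            self.active -= 1
            if not self.registered and self.active < self.max_jobs:
                self.worker.registerFunction('execute:execute')
                self.registered = True
```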
SpamapS | anyway, I may hack on a load average based one later or something | 03:21 |
* SpamapS afk for a bit | 03:21 | |
jeblair | SpamapS: i'm not opposed to more tunable limits; i think what i'd like the default to be though is automatic based on load average. i'm not a fan of systems that make sysadmins guess tunable parameters when we have computers that can do it for them. but we can build on that. :) | 03:22 |
jeblair | i love ndb. i'd love it even more if it ran that perl script for me before starting. ;) | 03:22 |
SpamapS | hah yeah | 03:44 |
SpamapS | jeblair: I love adaptable systems too. Rarely do I get to write one. :) Sitting on a bus now, will see what flies out of me. | 03:45 |
SpamapS | I think the right thing is to just check load before getJob | 03:46 |
SpamapS | if load is too high, sleep a bit and check again. | 03:46 |
jeblair | SpamapS: thing is we need to keep getting jobs other than execute:execute though. especially execute:stop | 03:49 |
jeblair | well, i guess that's the only other one. | 03:50 |
jeblair | but it is important. :) | 03:50 |
SpamapS | oh right | 03:50 |
SpamapS | not sleep, unregister | 03:50 |
SpamapS | so check load... if too busy, unregister expensive | 03:50 |
SpamapS | I think that's the ticket | 03:50 |
SpamapS | note that there's still a tunable required | 03:51 |
SpamapS | Which is "what's an OK load?" | 03:51 |
jeblair | yeah, the load average. we could set it to nproc*3 | 03:51 |
jeblair | by default | 03:51 |
SpamapS | what's the 3 coming from? | 03:51 |
jeblair | swag | 03:51 |
SpamapS | +1 | 03:52 |
jeblair | looking at http://cacti.openstack.org/cacti/graph.php?action=zoom&local_graph_id=63999&rra_id=1&view_type=&graph_start=1506660317&graph_end=1506742125&graph_height=120&graph_width=500&title_font_size=10 make it look like 20 is the magic number for those servers | 03:54 |
jeblair | so maybe nproc*2.5 :) | 03:54 |
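A load-average default derived from the machine itself, along the lines jeblair suggests, could be computed like this; the 2.5 factor is just the swag taken from the cacti graphs above:

```python
import multiprocessing
import os

# Hypothetical default: keep accepting work while the 1-minute load average
# stays below nproc * 2.5 (the guess from the graphs above).
LOAD_FACTOR = 2.5
max_load = multiprocessing.cpu_count() * LOAD_FACTOR

load_1min, _, _ = os.getloadavg()
accepting_work = load_1min < max_load
```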
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: WIP Do not add implied branch matchers in project-templates https://review.openstack.org/508658 | 03:58 |
jeblair | clarkb, mordred: ^ i *think* that's 99% there; i think the logic is correct; i just need to write a commit message and do a little cleanup from a previous attempt. should be able to do that tomorrow. | 03:59 |
SpamapS | whoa that was weird | 04:01 |
SpamapS | I just did git review -d 508649, and got the content from 508658 | 04:01 |
SpamapS | almost like it collided with you uploading 508658 | 04:02 |
SpamapS | bah | 04:18 |
SpamapS | if I do it only before getJob... we stay doing nothing until a cancel/cat/merge comes in | 04:18 |
SpamapS | need a thread and I think a lock on the client :-P | 04:18 |
clarkb | can you do it without a thread as a noop handler? | 04:20 |
clarkb | iirc the server sends those gratuitously to wake workers? | 04:20 |
clarkb | or maybe it was different request | 04:21 |
SpamapS | no | 04:21 |
SpamapS | well | 04:21 |
SpamapS | yeah | 04:22 |
SpamapS | NOOP is what the server sends to say "Hey, you say you can do this, wake up and GRAB_JOB" | 04:22 |
SpamapS | so yes | 04:22 |
SpamapS | this delay thing is interesting | 04:22 |
SpamapS | Not sure why that's there. | 04:22 |
SpamapS | I mean I know why it says it is there. | 04:23 |
SpamapS | but that seems unnecessary. The delay should already be happening by virtue of the fact that the less busy servers should respond faster. | 04:23 |
clarkb | the problem is the job cost is delayed itself | 04:26 |
SpamapS | clarkb: so the problem is once we've unregistered from work, we won't get noop's anymore. | 04:26 |
clarkb | so they will more slowly grab jobs | 04:27 |
clarkb | but that doesn't matter when load is 80 because of ansible | 04:27 |
clarkb | SpamapS: aha | 04:27 |
SpamapS | and gearman will send us jobs for anything we're registered for, so if we don't unregister, getjob will still assign us jobs | 04:27 |
SpamapS | so we need something that periodically checks to see that we're ready for more work | 04:27 |
clarkb | but ya if you grab 50 jobs and load skyrockets you'll just continue to add on but more slowly | 04:28 |
SpamapS | yeah it's untenable this way | 04:28 |
SpamapS | so a thread that just sleeps and goes "am I taking work? If not, is load low enough to take more work? If yes, register" every few seconds seems like the right thing. | 04:29 |
SpamapS | but that also gets into "are gear.Worker's thread safe?" | 04:30 |
SpamapS | because the worker is likely to be in getJob() | 04:31 |
SpamapS | I wonder if we could just make the tunable "concurrency_factor" and basically say "multiply this times nproc to get the concurrent jobs"? | 04:32 |
SpamapS | because controlling the # of concurrent jobs is pretty easy | 04:32 |
SpamapS | since we're always going to be in control when jobs are started or finished. | 04:32 |
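If the knob is expressed as a factor rather than an absolute job count, the limit is a one-line derivation at startup; concurrency_factor is a hypothetical option name here:

```python
import multiprocessing


def max_concurrent_jobs(concurrency_factor=2.5):
    # Site-tunable factor times core count, with a floor of one job.
    return max(1, int(multiprocessing.cpu_count() * concurrency_factor))
```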
SpamapS | that also feels like a pretty reasonable factor to expect sites to tune as they optimize. We can make the default roughly what infra sees on its executors, but other sites may have very different jobs | 04:34 |
SpamapS | It's also a bit safer since taking on jobs while load is low and we're just waiting for a bunch of devstack runs may backfire if the devstack runs all finish at once and we're doing 50 concurrent rsyncs. | 04:34 |
SpamapS | we could even get good at tracking job cost eventually, and not go by the count of jobs, but by the count of expected job execution seconds/iops/etc. | 04:36 |
SpamapS | anyway.. home now... will ponder more. If we see it getting out of control the patch I made will at least give us a governor | 04:37 |
SpamapS | Actually I just check load in finish and start. I can make a special exception never to unregister in finish if it would take me below 1 job running. | 04:48 |
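Checking only at job start and finish, with the exception described above so the executor never unregisters itself into a state where nothing can finish, might look like this sketch (names are illustrative, and the approach is superseded later in the discussion by a periodic check):

```python
import os


def check_load(worker, max_load, running_jobs, registered):
    """Start/finish-time load check only (no polling thread).
    Returns the new registration state; all names are illustrative."""
    load_1min, _, _ = os.getloadavg()
    if load_1min > max_load:
        # Never unregister if that would leave fewer than one job running,
        # since then nothing would ever finish and re-trigger this check.
        if registered and running_jobs >= 1:
            worker.unRegisterFunction('execute:execute')
            return False
    elif not registered:
        worker.registerFunction('execute:execute')
        return True
    return registered
```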
* SpamapS may not sleep until this is implemented | 04:48 | |
*** xinliang has quit IRC | 07:46 | |
*** xinliang has joined #zuul | 07:59 | |
*** bhavik1 has joined #zuul | 09:31 | |
*** bhavik1 has quit IRC | 09:35 | |
*** xinliang has quit IRC | 09:35 | |
*** huangtianhua has quit IRC | 11:11 | |
*** zhuli has quit IRC | 13:15 | |
openstackgerrit | Jeremy Stanley proposed openstack-infra/zuul-jobs master: Yet still more fix post log location https://review.openstack.org/508684 | 14:31 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Handle build_set being None for priority https://review.openstack.org/508634 | 15:04 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Yet still more fix post log location https://review.openstack.org/508684 | 15:06 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Protect against builds dict changing while we iterate https://review.openstack.org/508629 | 15:06 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Make fetch-tox-output more resilient https://review.openstack.org/508563 | 15:28 |
jeblair | SpamapS: i was about to say "oh it's totally safe to call register functions from another thread", and in fact we do exactly that in zuul v2.5 | 15:49 |
jeblair | SpamapS: i went to double check that though, and i did find that we may want this: remote: https://review.openstack.org/508698 Add a send lock to the base Connection class | 15:49 |
jeblair | SpamapS: zuul v2.5 was not using ssl, but v3 is. so we were very unlikely to see a problem with that in v2, but somewhat more likely in v3. | 15:50 |
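The shape of that change is a per-connection lock around the write path, so two threads can't interleave bytes on the same (possibly SSL-wrapped) socket; this is only a sketch of the idea, not the code in 508698:

```python
import threading


class LockedConnection:
    """Illustrative only: serialize sends on a shared socket."""

    def __init__(self, sock):
        self.sock = sock                  # plain or SSL-wrapped socket
        self.send_lock = threading.Lock()

    def send_bytes(self, data):
        # Without the lock, two threads writing concurrently can interleave
        # their bytes (especially visible with SSL record framing).
        with self.send_lock:
            while data:
                sent = self.sock.send(data)
                data = data[sent:]
```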
jeblair | SpamapS: also, a downside to only checking at start/end of jobs is that if we unregister, and a bunch of jobs then finish while the load is high, we stay unregistered; if the load then drops but it's an hour until the next job finishes, we end up substantially underutilized. so i favor something that checks more regularly. | 15:52 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Make fetch-tox-output more resilient https://review.openstack.org/508563 | 15:55 |
SpamapS | jeblair: yeah in fact finishJob sends work complete from another thread from what I see. | 16:13 |
jeblair | good point, we're already playing the odds | 16:14 |
SpamapS | Yeah so I think I can just start a thread that checks load every few seconds and registers or unregisters appropriately. | 16:15 |
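That polling thread might look like the following; the class name, the five-second interval, and the load factor are assumptions for illustration rather than what 508649 ends up doing:

```python
import multiprocessing
import os
import threading
import time


class LoadGovernor(threading.Thread):
    """Sketch: periodically re-check load and register/unregister the
    expensive gearman function; cheap functions stay registered."""

    def __init__(self, worker, interval=5.0, load_factor=2.5):
        super().__init__(daemon=True)
        self.worker = worker
        self.interval = interval
        self.max_load = multiprocessing.cpu_count() * load_factor
        self.registered = True
        self._running = True

    def run(self):
        while self._running:
            load_1min, _, _ = os.getloadavg()
            if self.registered and load_1min > self.max_load:
                self.worker.unRegisterFunction('execute:execute')
                self.registered = False
            elif not self.registered and load_1min <= self.max_load:
                self.worker.registerFunction('execute:execute')
                self.registered = True
            time.sleep(self.interval)

    def stop(self):
        self._running = False
```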
SpamapS | And then maybe we can put some armor on gear to make it less of a dice roll. | 16:16 |
jeblair | SpamapS: https://review.openstack.org/508698 should be armor | 16:19 |
SpamapS | I will say also that I'm not sure gearman is the best protocol for this. AMQP has a specific response which is "send this to somebody else I'm too busy" will would allow us to make per-job cost decisions. | 16:19 |
SpamapS | s/will/which/. DYAC | 16:20 |
SpamapS | Nice! | 16:21 |
SpamapS | Re: lock | 16:21 |
SpamapS | oh man I just discovered mosh the other day.. nice that my irc session never disconnects now | 16:24 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Handle build_set being None for priority https://review.openstack.org/508634 | 16:42 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Limit concurrency in zuul-executor under load https://review.openstack.org/508649 | 16:48 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Do not add implied branch matchers in project-templates https://review.openstack.org/508658 | 16:48 |
SpamapS | jeblair: ^ load based, definitely could end up trying to send at the same time we're sending other stuff so may add another dice roll. ;) | 16:50 |
SpamapS | I'm going to test it out in my GD internal zuul. | 16:51 |
jeblair | i'm never going to stap giggling whenever you say that :) | 16:51 |
jeblair | stop even | 16:51 |
*** bhavik1 has joined #zuul | 16:54 | |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Yet still more fix post log location https://review.openstack.org/508684 | 16:54 |
*** bhavik1 has quit IRC | 16:56 | |
*** bhavik1 has joined #zuul | 16:56 | |
*** bhavik1 has quit IRC | 16:58 | |
*** bhavik1 has joined #zuul | 16:58 | |
*** bhavik1 has quit IRC | 17:00 | |
SpamapS | jeblair: It's a great GD zuul. ;) | 17:07 |
openstackgerrit | Clark Boylan proposed openstack-infra/zuul feature/zuulv3: Do not add implied branch matchers in project-templates https://review.openstack.org/508658 | 17:19 |
openstackgerrit | David Moreau Simard proposed openstack-infra/zuul feature/zuulv3: Delete IncludeRole object from result object for include_role tasks https://review.openstack.org/504238 | 17:57 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Do not add implied branch matchers in project-templates https://review.openstack.org/508658 | 18:11 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Protect against builds dict changing while we iterate https://review.openstack.org/508629 | 20:39 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Multi-node: Set up hosts file https://review.openstack.org/504552 | 21:45 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Multi-node: Set up firewalls https://review.openstack.org/504553 | 21:45 |
*** mrhillsman has joined #zuul | 21:55 |