jeblair | SpamapS: that is a state that only exists in the test suite. that's basically testing that sendWorkFail() causes the job to run again. in <=2.5 the fake build has a run_error attribute, and if it is set, it will send WORK_FAIL over gearman, but also add a history entry (to the list of build histories the fake job runner in the tests uses to keep track of what has run) with RUN_ERROR so that it's easy to check that happened. | 00:04 |
jeblair | SpamapS: it looks like the update to FakeBuild for v3 doesn't actually sendWorkFail anymore | 00:05 |
jeblair | so i suspect the solution will be to figure out how to cause that to happen when the run_error flag is set | 00:05 |
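For context, a rough sketch of the <=2.5 fake-build behavior jeblair describes above. This is a minimal illustration only; the attribute, method, and history names here are assumptions, not the actual test code:

```python
import json


class FakeBuild(object):
    """Sketch of the <=2.5 fake build behavior described above."""

    def run(self):
        if self.run_error:
            # Tell the scheduler over gearman that the job failed to run...
            self.job.sendWorkFail()
            result = 'RUN_ERROR'
        else:
            self.job.sendWorkComplete(
                json.dumps({'result': 'SUCCESS'}).encode('utf8'))
            result = 'SUCCESS'
        # ...and record what happened so the tests can assert on it.
        self.worker.build_history.append(
            {'name': self.name, 'result': result})
```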
SpamapS | jeblair: so RUN_ERROR still happens | 00:08 |
SpamapS | jeblair: but retries do not | 00:08 |
SpamapS | jeblair: everything just gets flipped to SKIPPED | 00:08 |
jeblair | SpamapS: right -- since RUN_ERROR is just intra-test communication -- the real thing that should be happening is the WORK_FAIL that results from setting run_error | 00:09 |
SpamapS | so now trying to figure out why :-P | 00:09 |
SpamapS | I'm having a hard time even mapping this to any concrete understanding of zuul. :-/ | 00:09 |
SpamapS | perhaps need to bore the hole a bit deeper into head. | 00:10 |
SpamapS | jeblair: I think I'm grasping. So when a work fail gets received by the client, there's no 'result', so LaunchClient.onBuildCompleted should see that and set retry = True | 00:12 |
jeblair | SpamapS: exactly | 00:12 |
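A minimal sketch of the client-side retry logic being described here, assuming illustrative names (LaunchClient.onBuildCompleted and a per-build retry flag) rather than the actual Zuul v3 code:

```python
class LaunchClient(object):
    """Sketch only: treat a build reported with no result as 'did not run'."""

    def onBuildCompleted(self, build, result=None):
        if result is None:
            # A WORK_FAIL (or a WORK_COMPLETE with no payload) arrives with
            # no result; flag the build so the scheduler reruns it instead
            # of recording a failure.
            build.retry = True
        build.result = result
        self.sched.onBuildCompleted(build)
```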
SpamapS | Might be the first time I've seen WORK_FAIL used right btw. ;) | 00:13 |
jeblair | SpamapS: tbh, it was my second try. :) | 00:13 |
SpamapS | it's not altogether complicated, but it's just not always obvious that it's just a work complete by a different name. :) | 00:14 |
SpamapS | many users expect gearman to retry the job automatically when it's used | 00:14 |
jeblair | SpamapS: i note that the current launcher in v3 doesn't actually send work_fail unless there's a problem with the job. i think it's missing a big giant exception handler in _launch to send it. i think the solution might be to add that, then have RecordingLaunchServer.runAnsible raise an exception if build.run_error is true. | 00:15 |
jeblair | SpamapS: it's also fine to sendWorkComplete as long as result is None; it's the same thing either way. i think that is actually what happens with the v2.5 launcher (might be good to examine that since it's battle-hardened). | 00:17 |
SpamapS | jeblair: I think you're about 1 layer ahead of me. Was just wondering how I bubble that into the launcher | 00:18 |
clarkb | jeblair: in v2.5 None signifies the worker disappeared or otherwise went away iirc | 00:18 |
SpamapS | if a worker disappears, geard should send the work to another worker | 00:18 |
SpamapS | (that's what gearmand will do anyway) | 00:18 |
jeblair | term collision here -- i think clarkb meant zuul worker | 00:19 |
clarkb | yes the thing formerly called a slave | 00:19 |
SpamapS | ack | 00:19 |
SpamapS | so the gearman worker will see that and sendWorkFail(), which is an empty result. | 00:20 |
SpamapS | clarkb: the test I'm trying to re-enable is testing that those get retried | 00:20 |
clarkb | ya so I think in v2.5 it was changed to be success, fail, or nil and nil signifies something happened such that I don't actually have data | 00:21 |
clarkb | like "slave" went away | 00:22 |
SpamapS | that's how I read it yes | 00:22 |
SpamapS | also gets that if somehow the worker received a job for a queue/function name that zuul doesn't understand | 00:23 |
SpamapS | s/worker/launcher/ | 00:23 |
SpamapS | which is kind of a '# this should never happen' branch | 00:23 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Make waitForBuildDeletion() more robust https://review.openstack.org/406411 | 00:26 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Add checksum support to fake-image-create https://review.openstack.org/406412 | 00:26 |
SpamapS | oh I actually think it's WORK_EXCEPTION not WORK_FAIL that happens in 2.5 | 00:26 |
jeblair | SpamapS: yeah, i think you're right; that seems a reasonable thing to keep. | 00:28 |
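Putting the pieces from the discussion above together, a hypothetical sketch of the catch-all handler in _launch plus a test server that raises when run_error is set. Class and method names mirror the conversation but are not guaranteed to match the real code, and getBuild() is an invented lookup helper:

```python
import traceback


class LaunchServer(object):
    """Sketch of the launcher-side handling discussed above."""

    def _launch(self, job):
        try:
            data = self.runAnsible(job)
        except Exception:
            # Any unexpected failure while running the job is reported back
            # over gearman with no result, so the scheduler retries the build.
            job.sendWorkException(traceback.format_exc().encode('utf8'))
            return
        job.sendWorkComplete(data)

    def runAnsible(self, job):
        # Placeholder for the real "run the job" implementation.
        return b''


class RecordingLaunchServer(LaunchServer):
    """Test double: simulate a launcher failure when run_error is set."""

    def runAnsible(self, job):
        build = self.getBuild(job)  # hypothetical lookup helper
        if build.run_error:
            raise Exception("Simulated run error")
        return super(RecordingLaunchServer, self).runAnsible(job)
```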
* SpamapS almost figuring it out | 00:31 | |
pabelanger | next round of fedora-23 image uploads started | 00:32 |
SpamapS | mmmmmmmmmm I think I got it down to a 5 line patch | 00:37 |
SpamapS | and I think it might actually be correct | 00:37 |
* SpamapS will prove that to himself if he can write intelligible prose in the commit message | 00:37 | |
SpamapS | btw now that tests are turned back on.. running the whole thing slams my laptop | 00:38 |
* SpamapS kinda likes that | 00:38 | |
SpamapS | Ran 214 (+5) tests in 195.157s (-6.854s) | 00:40 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Delete hard upload failures from current builds https://review.openstack.org/406342 | 00:41 |
jeblair | Shrews: ^ found it. you're going to like it. :) | 00:41 |
jeblair | Shrews: (i wonder if we could start passing in ImageBuild objects to avoid that sort of thing) | 00:42 |
clarkb | SpamapS: now run it single process and appreciate how much faster it is now that we run multiple workers | 00:42 |
SpamapS | clarkb: right? sent my box to a load of 7 | 00:42 |
jeblair | SpamapS: yes, it's doing lots of Important Work! | 00:42 |
jeblair | SpamapS: oh, i should probably make a note of this in the dev guide -- you may be interested in setting "ZUUL_TEST_ROOT=/tmpfs" | 00:43 |
jeblair | or wherever you keep a tmpfs | 00:44 |
jeblair | lots of git hdd thrashing. | 00:44 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul: Re-enable TestScheduler.test_rerun_on_error https://review.openstack.org/406416 | 00:44 |
SpamapS | jeblair: my SSD laughs at your tmpfs | 00:45 |
SpamapS | but good idea :) | 00:45 |
SpamapS | anyway, 406416 is the result | 00:46 |
jeblair | i actually did it to extend the life of my ssd (probably by several hours) | 00:46 |
* SpamapS likes that the test remains unchanged | 00:46 | |
SpamapS | there's a lot of really heated debate on tmpfs and ssds and swap lately | 00:46 |
SpamapS | but ultimately, tmpfs for /tmp on laptops _should_ give you a bit less writing to the SSD. :) | 00:47 |
jeblair | yeah, that's a no-go for me the way i use tmp | 00:47 |
SpamapS | total used free shared buff/cache available | 00:47 |
SpamapS | Mem: 15747 3205 6742 561 5798 11536 | 00:47 |
SpamapS | Swap: 16075 0 16075 | 00:47 |
SpamapS | especially when you have 16GB :-P | 00:47 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Make sure we also cleanup checksum files https://review.openstack.org/406417 | 00:47 |
SpamapS | jeblair: well technically, /tmp is supposed to be for _small_ temp files. Anything that needs to be spooled should be in /var/tmp | 00:48 |
clarkb | my tumbleweed install did a really weird setup for / | 00:49 |
clarkb | it made subvolumes for everything | 00:50 |
pabelanger | eep, looks like we have a race condition in nodepool now | 00:50 |
pabelanger | http://logs.openstack.org/11/406411/1/check/nodepool-coverage-ubuntu-xenial/399fa53/console.html | 00:50 |
jeblair | SpamapS: i'm 200% more productive with the shorter tmp name | 00:50 |
pabelanger | I'll poke at it for monday | 00:51 |
clarkb | pabelanger: the coverage job can tickle them extra good too due to the instrumentation code making stuff run at much different speeds than typical | 00:51 |
pabelanger | indeed | 00:52 |
SpamapS | jeblair: actually that's poor practice in a multi-user system :) | 01:10 |
SpamapS | jeblair: not that your laptop is multi-user :) | 01:10 |
SpamapS | jeblair: you're much better off with ~/tmp | 01:10 |
SpamapS | which is what I've trained myself to use | 01:10 |
clarkb | I have one of those too | 01:11 |
clarkb | used to be because ubuntu thought that only encrypting the homedir was smart for some reason | 01:11 |
clarkb | now I do it because it has actually helped me keep my homedir cleaner by starting stuff there if I know I want to rm it later | 01:12 |
Shrews | jeblair: wow | 04:13 |
Shrews | jeblair: if we passed the object in, we could do type checking to prevent such programming errors. (have i mentioned i hate the weak typing of python?) | 04:15 |
SpamapS | Shrews: eh.. python is strongly typed | 06:44 |
SpamapS | "A strongly-typed programming language is one in which each type of data (such as integer, character, hexadecimal, packed decimal, and so forth) is predefined as part of the programming language and all constants or variables defined for a given program must be described with one of the data types." | 06:44 |
*** bhavik1 has joined #zuul | 11:12 | |
*** willthames has quit IRC | 11:25 | |
*** jamielennox is now known as jamielennox|away | 12:22 | |
*** bhavik1 has quit IRC | 13:30 | |
*** dmsimard|pto is now known as dmsimard | 13:35 | |
*** harlowja has quit IRC | 14:40 | |
mordred | strong/weak and static/dynamic == different axes. I agree with Shrews that the dynamic typing of python annoys me. which is funny - there was a time in my life when I really liked it | 16:44 |
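A quick Python illustration of the two axes being distinguished here: the language is strongly typed (unrelated types aren't silently coerced) but dynamically typed (names can be rebound to any type, with checks only at runtime):

```python
x = "1"
try:
    x + 1              # strong typing: str + int raises TypeError, no coercion
except TypeError as err:
    print(err)

x = 1                  # dynamic typing: the same name rebinds to an int
print(x + 1)           # 2
```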
SpamapS | It's truly a double edged sword though. | 18:12 |
SpamapS | over time the two will support around the same velocity | 18:13 |
SpamapS | but agility in early dev is high with dynamic typing | 18:13 |
SpamapS | so you get a nice steep early functionality spike | 18:13 |
SpamapS | but later as you move into maturity.. you find yourself chasing typing problems and it's definitely annoying | 18:14 |
fungi | another zuul-related question for people who enjoy answering things on ask.o.o: https://ask.openstack.org/question/99896 | 18:19 |
fungi | wonder if we need an faq about why zuul enforces strict serialization and that it doesn't support atomic cross-repo merges | 18:19 |
SpamapS | I'd be interested in the background behind why that's not supported (I think I know, but I'm not sure enough to answer) | 18:31 |
fungi | my understanding is that zuul can't guarantee they all merge at the exact same moment, and if gerrit allows it to merge one and then refuses to let it merge another... you could end up in a bad situation | 18:45 |
fungi | though there are likely other technical reasons as well | 18:46 |
fungi | (and then there are the theoretical reasons behind not supporting it, in that it promotes poor development patterns, but that's more of a judgment call) | 18:47 |
jeblair | it would not be much code for zuul to support that, and we probably will as an option in v3. but we don't want to enable it for openstack because we value the strict sequencing which aids in CD. | 18:47 |
jeblair | (in order to support it, we would have zuul push to gerrit for merges rather than depend on the gerrit submission queue) | 18:47 |
fungi | good point | 18:52 |
fungi | that would at least mitigate the situation i was concerned about | 18:52 |
jeblair | yeah, the code is basically there (that was part of the idea of the mergers -- they even already have a push method), we just (intentionally) don't have a way to turn it on. | 18:53 |
fungi | i bet we could also stop crippling the git merge calls to allow the normal suite of strategies | 18:55 |
zaro | jeblair: have you taken a look at the Gerrit batch plugin. im wondering whether there's any benefit using that over zuul mergers? | 18:56 |
fungi | once we no longer have to rely on gerrit actually being capable of merging on its own | 18:56 |
jeblair | fungi: yeah, aside from us possibly changing our minds and allowing simultaneous cross-dependent change merges, we might want to enable that to deal with the octomerge issue | 18:57 |
fungi | https://gerrit.googlesource.com/plugins/batch/ "...building and previewing sets of proposed updates to multiple projects/branches/refs that should be applied together..." | 18:57 |
fungi | neat | 18:57 |
jeblair | zaro: no -- we might be able to use it, but also, we're doing just about all the work needed anyway. | 18:57 |
jeblair | so i'm not sure it would save us anything | 18:58 |
fungi | also with zuul-mergers we can scale pretty limitlessly | 18:58 |
fungi | whereas the batch plugin would likely put a lot of additional load on gerrit | 18:59 |
*** jamielennox|away is now known as jamielennox | 20:08 | |
zaro | sorry i had to drop off. | 21:28 |
zaro | fungi: i think the idea is that load is mostly from cloning the repo and pushing back. | 21:30 |
zaro | fungi: therefore when zuul is cloning it's putting load on gerrit. with the batch plugin it's all done on the server, so no cloning is needed, just the merges, so it might be less load. | 21:31 |
zaro | fungi: actually maybe not, since zuul clones from the cgit repos? so maybe there's not much benefit. | 21:38 |