jeblair | SpamapS: that is a state that only exists in the test suite. that's basically testing that sendWorkFail() causes the job to run again. in <=2.5 the fake build has a run_error attribute, and if it is set, it will send WORK_FAIL over gearman, but also add a history entry (to the list of build histories the fake job runner in the tests uses to keep track of what has run) with RUN_ERROR so that it's easy to check that happened. | 00:04 |
jeblair | SpamapS: it looks like the update to FakeBuild for v3 doesn't actually sendWorkFail anymore | 00:05 |
jeblair | so i suspect the solution will be to figure out how to cause that to happen when the run_error flag is set | 00:05 |
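For context, a rough sketch of the <=2.5 fake-build behavior jeblair describes above. This is a minimal illustration only; the attribute, method, and history names here are assumptions, not the actual test code:

```python
import json


class FakeBuild(object):
    """Sketch of the <=2.5 fake build behavior described above."""

    def run(self):
        if self.run_error:
            # Tell the scheduler over gearman that the job failed to run...
            self.job.sendWorkFail()
            result = 'RUN_ERROR'
        else:
            self.job.sendWorkComplete(
                json.dumps({'result': 'SUCCESS'}).encode('utf8'))
            result = 'SUCCESS'
        # ...and record what happened so the tests can assert on it.
        self.worker.build_history.append(
            {'name': self.name, 'result': result})
```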
SpamapS | jeblair: so RUN_ERROR still happens | 00:08 |
SpamapS | jeblair: but retries do not | 00:08 |
SpamapS | jeblair: everything just gets flipped to SKIPPED | 00:08 |
jeblair | SpamapS: right -- since RUN_ERROR is just intra-test communication -- the real thing that should be happening is the WORK_FAIL that results from setting run_error | 00:09 |
SpamapS | so now trying to figure out why :-P | 00:09 |
SpamapS | I'm having a hard time even mapping this to any concrete understanding of zuul. :-/ | 00:09 |
SpamapS | perhaps need to bore the hole a bit deeper into head. | 00:10 |
SpamapS | jeblair: I think I'm grasping. So when a work fail gets received by the client, there's no 'result', so LaunchClient.onBuildCompleted should see that and set retry = True | 00:12 |
jeblair | SpamapS: exactly | 00:12 |
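A minimal sketch of the client-side retry logic being described here, assuming illustrative names (LaunchClient.onBuildCompleted and a per-build retry flag) rather than the actual Zuul v3 code:

```python
class LaunchClient(object):
    """Sketch only: treat a build reported with no result as 'did not run'."""

    def onBuildCompleted(self, build, result=None):
        if result is None:
            # A WORK_FAIL (or a WORK_COMPLETE with no payload) arrives with
            # no result; flag the build so the scheduler reruns it instead
            # of recording a failure.
            build.retry = True
        build.result = result
        self.sched.onBuildCompleted(build)
```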
SpamapS | Might be the first time I've seen WORK_FAIL used right btw. ;) | 00:13 |
jeblair | SpamapS: tbh, it was my second try. :) | 00:13 |
SpamapS | it's not altogether complicated, but it's just not always obvious that it's just a work complete by a different name. :) | 00:14 |
SpamapS | many users expect gearman to retry the job automatically when it's used | 00:14 |
jeblair | SpamapS: i note that the current launcher in v3 doesn't actually send work_fail unless there's a problem with the job. i think it's missing a big giant exception handler in _launch to send it. i think the solution might be to add that, then have RecordingLaunchServer.runAnsible raise an exception if build.run_error is true. | 00:15 |
jeblair | SpamapS: it's also fine to sendWorkComplete as long as result is None; it's the same thing either way. i think that is actually what happens with the v2.5 launcher (might be good to examine that since it's battle-hardened). | 00:17 |
SpamapS | jeblair: I think you're about 1 layer ahead of me. Was just wondering how I bubble that into the launcher | 00:18 |
clarkb | jeblair: in v2.5 None signifies the worker disappeared or otherwise went away iirc | 00:18 |
SpamapS | if a worker disappears, geard should send the work to another worker | 00:18 |
SpamapS | (that's what gearmand will do anyway) | 00:18 |
jeblair | term collision here -- i think clarkb meant zuul worker | 00:19 |
clarkb | yes the thing formerly called a slave | 00:19 |
SpamapS | ack | 00:19 |
SpamapS | so the gearman worker will see that and sendWorkFail(), which is an empty result. | 00:20 |
SpamapS | clarkb: the test I'm trying to re-enable is testing that those get retried | 00:20 |
clarkb | ya so I think in v2.5 it was changed to be success, fail, or nil and nil signifies something happened such that I don't actually have data | 00:21 |
clarkb | like "slave" went away | 00:22 |
SpamapS | that's how I read it yes | 00:22 |
SpamapS | also gets that if somehow the worker received a job for a queue/function name that zuul doesn't understand | 00:23 |
SpamapS | s/worker/launcher/ | 00:23 |
SpamapS | which is kind of a '# this should never happen' branch | 00:23 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Make waitForBuildDeletion() more robust https://review.openstack.org/406411 | 00:26 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Add checksum support to fake-image-create https://review.openstack.org/406412 | 00:26 |
SpamapS | oh I actually think it's WORK_EXCEPTION not WORK_FAIL that happens in 2.5 | 00:26 |
jeblair | SpamapS: yeah, i think you're right; that seems a reasonable thing to keep. | 00:28 |
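Putting the pieces from the discussion above together, a hypothetical sketch of the catch-all handler in _launch plus a test server that raises when run_error is set. Class and method names mirror the conversation but are not guaranteed to match the real code, and getBuild() is an invented lookup helper:

```python
import traceback


class LaunchServer(object):
    """Sketch of the launcher-side handling discussed above."""

    def _launch(self, job):
        try:
            data = self.runAnsible(job)
        except Exception:
            # Any unexpected failure while running the job is reported back
            # over gearman with no result, so the scheduler retries the build.
            job.sendWorkException(traceback.format_exc().encode('utf8'))
            return
        job.sendWorkComplete(data)

    def runAnsible(self, job):
        # Placeholder for the real "run the job" implementation.
        return b''


class RecordingLaunchServer(LaunchServer):
    """Test double: simulate a launcher failure when run_error is set."""

    def runAnsible(self, job):
        build = self.getBuild(job)  # hypothetical lookup helper
        if build.run_error:
            raise Exception("Simulated run error")
        return super(RecordingLaunchServer, self).runAnsible(job)
```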
* SpamapS almost figuring it out | 00:31 | |
pabelanger | next round of fedora-23 image uploads started | 00:32 |
SpamapS | mmmmmmmmmm I think I got it down to a 5 line patch | 00:37 |
SpamapS | and I think it might actually be correct | 00:37 |
* SpamapS will prove that to himself if he can write intelligible prose in the commit message | 00:37 | |
SpamapS | btw now that tests are turned back on.. running the whole thing slams my laptop | 00:38 |
* SpamapS kinda likes that | 00:38 | |
SpamapS | Ran 214 (+5) tests in 195.157s (-6.854s) | 00:40 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Delete hard upload failures from current builds https://review.openstack.org/406342 | 00:41 |
jeblair | Shrews: ^ found it. you're going to like it. :) | 00:41 |
jeblair | Shrews: (i wonder if we could start passing in ImageBuild objects to avoid that sort of thing) | 00:42 |
clarkb | SpamapS: now run it single process and appreciate how much faster it is now that we run multiple workers | 00:42 |
SpamapS | clarkb: right? sent my box to a load of 7 | 00:42 |
jeblair | SpamapS: yes, it's doing lots of Important Work! | 00:42 |
jeblair | SpamapS: oh, i should probably make a note of this in the dev guide -- you may be interested in setting "ZUUL_TEST_ROOT=/tmpfs" | 00:43 |
jeblair | or wherever you keep a tmpfs | 00:44 |
jeblair | lots of git hdd thrashing. | 00:44 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul: Re-enable TestScheduler.test_rerun_on_error https://review.openstack.org/406416 | 00:44 |
SpamapS | jeblair: my SSD laughs at your tmpfs | 00:45 |
SpamapS | but good idea :) | 00:45 |
SpamapS | anyway, 406416 is the result | 00:46 |
jeblair | i actually did it to extend the life of my ssd (probably by several hours) | 00:46 |
* SpamapS likes that the test remains unchanged | 00:46 | |
SpamapS | there's a lot of really heated debate on tmpfs and ssds and swap lately | 00:46 |
SpamapS | but ultimately, tmpfs for /tmp on laptops _should_ give you a bit less writing to the SSD. :) | 00:47 |
jeblair | yeah, that's a no-go for me the way i use tmp | 00:47 |
SpamapS | total used free shared buff/cache available | 00:47 |
SpamapS | Mem: 15747 3205 6742 561 5798 11536 | 00:47 |
SpamapS | Swap: 16075 0 16075 | 00:47 |
SpamapS | especially when you have 16GB :-P | 00:47 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Make sure we also cleanup checksum files https://review.openstack.org/406417 | 00:47 |
SpamapS | jeblair: well technically, /tmp is supposed to be for _small_ temp files. Anything that needs to be spooled should be in /var/tmp | 00:48 |
clarkb | my tumbleweed install did a really weird setup for / | 00:49 |
clarkb | it made subvolumes for everything | 00:50 |
pabelanger | eep, looks like we have a race condition in nodepool now | 00:50 |
pabelanger | http://logs.openstack.org/11/406411/1/check/nodepool-coverage-ubuntu-xenial/399fa53/console.html | 00:50 |
jeblair | SpamapS: i'm 200% more productive with the shorter tmp name | 00:50 |
pabelanger | I'll poke at it for monday | 00:51 |
clarkb | pabelanger: the coverage job can tickle them extra good too due to the instrumentation code making stuff run at much different speeds than typical | 00:51 |
pabelanger | indeed | 00:52 |
SpamapS | jeblair: actually that's poor practice in a multi-user system :) | 01:10 |
SpamapS | jeblair: not that your laptop is multi-user :) | 01:10 |
SpamapS | jeblair: you're much better off with ~/tmp | 01:10 |
SpamapS | which is what I've trained myself to use | 01:10 |
clarkb | I have one of those too | 01:11 |
clarkb | used to be because ubuntu thought that only encrypting the homedir was smart for some reason | 01:11 |
clarkb | now I do it because it has actually helped me keep my homedir cleaner by starting stuff there if I know I want to rm it later | 01:12 |
Shrews | jeblair: wow | 04:13 |
Shrews | jeblair: if we passed the object in, we could do type checking to prevent such programming errors. (have i mentioned i hate the weak typing of python?) | 04:15 |
SpamapS | Shrews: eh.. python is strongly typed | 06:44 |
SpamapS | "A strongly-typed programming language is one in which each type of data (such as integer, character, hexadecimal, packed decimal, and so forth) is predefined as part of the programming language and all constants or variables defined for a given program must be described with one of the data types." | 06:44 |
*** bhavik1 has joined #zuul | 11:12 | |
*** willthames has quit IRC | 11:25 | |
*** jamielennox is now known as jamielennox|away | 12:22 | |
*** bhavik1 has quit IRC | 13:30 | |
*** dmsimard|pto is now known as dmsimard | 13:35 | |
*** harlowja has quit IRC | 14:40 | |
mordred | strong/weak and static/dynamic == different axes. I agree with Shrews that the dynamic typing of python annoys me. which is funny - there was a time in my life when I really liked it | 16:44 |
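A quick Python illustration of the two axes being distinguished here: the language is strongly typed (unrelated types aren't silently coerced) but dynamically typed (names can be rebound to any type, with checks only at runtime):

```python
x = "1"
try:
    x + 1              # strong typing: str + int raises TypeError, no coercion
except TypeError as err:
    print(err)

x = 1                  # dynamic typing: the same name rebinds to an int
print(x + 1)           # 2
```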
SpamapS | It's truly a double edged sword though. | 18:12 |
SpamapS | over time the two will support around the same velocity | 18:13 |
SpamapS | but agility in early dev is high with dynamic typing | 18:13 |
SpamapS | so you get a nice steep early functionality spike | 18:13 |
SpamapS | but later as you move into maturity.. you find yourself chasing typing problems and it's definitely annoying | 18:14 |
fungi | another zuul-related question for people who enjoy answering things on ask.o.o: https://ask.openstack.org/question/99896 | 18:19 |
fungi | wonder if we need an faq about why zuul enforces strict serialization and that it doesn't support atomic cross-repo merges | 18:19 |
SpamapS | I'd be interested in the background behind why that's not supported (I think I know, but I'm not sure enough to answer) | 18:31 |
fungi | my understanding is that zuul can't guarantee they all merge at the exact same moment, and if gerrit allows it to merge one and then refuses to let it merge another... you could end up in a bad situation | 18:45 |
fungi | though there are likely other technical reasons as well | 18:46 |
fungi | (and then there are the theoretical reasons behind not supporting it, in that it promotes poor development patterns, but that's more of a judgment call) | 18:47 |
jeblair | it would not be much code for zuul to support that, and we probably will as an option in v3. but we don't want to enable it for openstack because we value the strict sequencing which aids in CD. | 18:47 |
jeblair | (in order to support it, we would have zuul push to gerrit for merges rather than depend on the gerrit submission queue) | 18:47 |
fungi | good point | 18:52 |
fungi | that would at least mitigate the situation i was concerned about | 18:52 |
jeblair | yeah, the code is basically there (that was part of the idea of the mergers -- they even already have a push method), we just (intentionally) don't have a way to turn it on. | 18:53 |
fungi | i bet we could also stop crippling the git merge calls to allow the normal suite of strategies | 18:55 |
zaro | jeblair: have you taken a look at the Gerrit batch plugin. im wondering whether there's any benefit using that over zuul mergers? | 18:56 |
fungi | once we no longer have to rely on gerrit actually being capable of merging on its own | 18:56 |
jeblair | fungi: yeah, aside from us possibly changing our minds and allowing simultaneous cross-dependent change merges, we might want to enable that to deal with the octomerge issue | 18:57 |
fungi | https://gerrit.googlesource.com/plugins/batch/ "...building and previewing sets of proposed updates to multiple projects/branches/refs that should be applied together..." | 18:57 |
fungi | neat | 18:57 |
jeblair | zaro: no -- we might be able to use it, but also, we're doing just about all the work needed anyway. | 18:57 |
jeblair | so i'm not sure it would save us anything | 18:58 |
fungi | also with zuul-mergers we can scale pretty limitlessly | 18:58 |
fungi | whereas the batch plugin would likely put a lot of additional load on gerrit | 18:59 |
*** jamielennox|away is now known as jamielennox | 20:08 | |
zaro | sorry i had to drop off. | 21:28 |
zaro | fungi: i think the idea is that load is mostly from cloning the repo and pushing back. | 21:30 |
zaro | fungi: therefore when zuul is cloning it's putting load on gerrit. with the batch plugin it's all done on the server, so no cloning is needed, just the merges, so it might be less load. | 21:31 |
zaro | fungi: actually maybe not, since zuul clones from the cgit repos? so maybe there's not much benefit. | 21:38 |