*** jamesmcarthur has quit IRC | 00:01 | |
*** jamesmcarthur has joined #zuul | 00:21 | |
*** jamesmcarthur has quit IRC | 00:52 | |
*** jamesmcarthur has joined #zuul | 01:08 | |
*** jamesmcarthur has quit IRC | 01:55 | |
*** jamesmcarthur has joined #zuul | 01:55 | |
*** jamesmcarthur has quit IRC | 02:00 | |
*** jamesmcarthur has joined #zuul | 02:05 | |
*** jamesmcarthur has quit IRC | 02:10 | |
*** jamesmcarthur has joined #zuul | 02:18 | |
*** jamesmcarthur has quit IRC | 02:22 | |
*** jamesmcarthur has joined #zuul | 02:24 | |
*** jamesmcarthur has quit IRC | 02:32 | |
*** jamesmcarthur has joined #zuul | 02:58 | |
*** jamesmcarthur has quit IRC | 03:31 | |
*** jamesmcarthur_ has joined #zuul | 03:31 | |
*** jamesmcarthur_ has quit IRC | 04:01 | |
*** jamesmcarthur has joined #zuul | 04:21 | |
*** zbr has quit IRC | 04:45 | |
*** zbr has joined #zuul | 04:46 | |
-openstackstatus- NOTICE: Our CI system has problems uploading job results to the log server and thus all jobs are failing. Do not recheck jobs until the situation is fixed. | 05:41 | |
*** ChanServ changes topic to "Our CI system has problems uploading job results to the log server and thus all jobs are failing. Do not recheck jobs until the situation is fixed." | 05:41 | |
*** jamesmcarthur has quit IRC | 05:45 | |
*** jamesmcarthur has joined #zuul | 06:23 | |
*** yolanda has joined #zuul | 06:33 | |
*** jamesmcarthur has quit IRC | 06:36 | |
*** shachar has quit IRC | 07:51 | |
*** shachar has joined #zuul | 07:51 | |
*** altlogbot_1 has quit IRC | 07:57 | |
*** altlogbot_0 has joined #zuul | 07:59 | |
*** ChanServ changes topic to "Discussion of the project gating system Zuul | Website: https://zuul-ci.org/ | Docs: https://zuul-ci.org/docs/ | Source: https://git.zuul-ci.org/ | Channel logs: http://eavesdrop.openstack.org/irclogs/%23zuul/ | Weekly updates: https://etherpad.openstack.org/p/zuul-update-email" | 12:32 | |
-openstackstatus- NOTICE: log publishing is working again, you can recheck your jobs that failed with "retry_limit" | 12:32 |
*** rfolco|ruck has quit IRC | 12:58 | |
*** tosky has joined #zuul | 13:40 | |
*** bhavikdbavishi has joined #zuul | 13:43 | |
*** bhavikdbavishi has quit IRC | 13:59 | |
*** bhavikdbavishi has joined #zuul | 15:56 | |
*** jamesmcarthur has joined #zuul | 16:19 | |
*** jamesmcarthur has quit IRC | 16:44 | |
*** jamesmcarthur has joined #zuul | 16:45 | |
*** armstrongs has joined #zuul | 16:46 | |
armstrongs | hi, question: i have set up a zookeeper cluster and hooked up multiple zuul-executors and nodepool launchers. The config references the zookeeper cluster. When i schedule jobs i keep seeing them land on the same executor. If i take the executor out of service it lands on another, but i am not seeing a distribution of jobs across executors. How do | 16:48 |
armstrongs | you make sure that jobs are distributed? | 16:48 |
*** jamesmcarthur has quit IRC | 16:57 | |
SpamapS | armstrongs: gearman distributes jobs based on response time | 16:59 |
armstrongs | ignore me i am talking nonsense had a config issue | 16:59 |
armstrongs | working now | 16:59 |
armstrongs | thanks | 16:59 |
SpamapS | armstrongs: it sends out a "wakeup" to every worker, and the first one that responds with "GRAB_JOB" wins. | 16:59 |
armstrongs | thanks for the info | 17:00 |
SpamapS | also zookeeper is only used for scheduler and nodepool IIRC | 17:00 |
armstrongs | ah ok so executor just connects to gearman | 17:00 |
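[Editor's aside: the wakeup/GRAB_JOB behavior SpamapS describes can be modeled with a tiny sketch. This is a toy simulation of the dispatch idea only, not Zuul's or gearman's actual code; the executor names and response times are invented for illustration.]

```python
# Toy model of gearman dispatch as described above: when a job arrives,
# the server wakes up every idle worker, and the first one to respond
# with GRAB_JOB wins. Response latency stands in for "who answers first".

def dispatch(job, workers):
    """workers: list of (response_time_ms, name) tuples.
    The fastest responder to the wakeup grabs the job."""
    wakened = sorted(workers)   # every worker is notified...
    return wakened[0][1]        # ...the fastest GRAB_JOB wins

executors = [(12, "executor-1"), (5, "executor-2"), (30, "executor-3")]
print(dispatch("build-a", executors))  # executor-2 responds first
```

Under this model a consistently faster executor wins every job, which is why the backoff heuristics fungi mentions next matter for evening out the load.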
fungi | we do also have backoff heuristics in the executors to make sure they stop claiming jobs if they reach certain resource thresholds, which should make sure the workload evens out a bit under heavy volume | 17:01 |
fungi | even if some of your executors are faster at claiming jobs than others | 17:01 |
armstrongs | yeah its looking good testing with 5 executors and it looks pretty distributed | 17:01 |
armstrongs | :) | 17:02 |
fungi | corvus: so, looking at mirror_info i expect the challenge faced is roughly the same as in the currently-used role... basically for debian 10/buster right now we'd want to omit a mirror_info.debian entry for "https://{{ zuul_site_local_mirror_host }}/debian {{ ansible_distribution_release }}-updates main" until debian has its first stable point release of buster (ideal would be if we could auto-detect presence of | 17:02 |
fungi | the buster-updates dist on our mirrors, but just being able to omit it until we know we should adjust configuration to put it to use once available) | 17:02 |
fungi | (...should be a viable alternative) | 17:05 |
fungi | also https://zuul-ci.org/docs/zuul-jobs/mirror.html#rolevar-mirror_info.debian doesn't mention how you enumerate which dist repositories you want to use at the given url (e.g. stretch vs stretch-backports vs stretch-updates) | 17:07 |
*** jamesmcarthur has joined #zuul | 17:11 | |
*** jamesmcarthur has quit IRC | 17:16 | |
corvus | fungi: re the last thing, i think that's the (perhaps poorly named) 'components' attribute? | 17:16 |
fungi | "components" seems to be for specifying things like main, contrib, non-free | 17:17 |
fungi | suites | 17:17 |
corvus | fungi: re the debian 10 thing -- ah, i see now. that seems like logic that we could put into the debian mirror role, since at least whether debian has had a release is globally applicable. if we wanted to make that site-local configuration, i guess we would need to add or change that data structure.... | 17:18 |
corvus | we should rename that suites then :) | 17:18 |
fungi | er, suites is actually what i'm calling dists, sorry... i'm going to rephrase using the field descriptors from the sources.list(5) manpage | 17:19 |
fungi | the schema for a sources.list entry is: | 17:20 |
fungi | <type> [options] <uri> <suite> [components] | 17:21 |
fungi | so "components" is the right term for things like "main contrib non-free" | 17:21 |
fungi | suite is something like "stretch" or "stretch-backports" or "stretch-updates" | 17:21 |
fungi | type is generally "deb" or "deb-src" | 17:22 |
fungi | so the terminology used in mirror_info.debian looks reasonable, we're just missing at least a couple more fields | 17:23 |
corvus | oh, somehow i missed the difference between suite and components | 17:24 |
corvus | i think i just saw "<uri> [components]" :) | 17:24 |
corvus | so sounds like we should add suite | 17:24 |
fungi | type can possibly be inferred (we can decide to either always include deb-src for every deb entry, or to never include it and assume jobs won't be consuming source packages) | 17:25 |
fungi | though if the mirror doesn't include source packages, then deb-src entries could result in apt update failures, i expect | 17:26 |
*** jamesmcarthur has joined #zuul | 17:52 | |
*** jamesmcarthur has quit IRC | 17:59 | |
*** tosky has quit IRC | 18:12 | |
*** jamesmcarthur has joined #zuul | 18:16 | |
*** jamesmcarthur has quit IRC | 18:21 | |
armstrongs | i have put the zuul web dashboard behind a load balancer and it is all working fine, apart from the streaming of logs. It seems for some web instances it isn't showing and just outputs "end of stream" as opposed to the running log. Is there anything special needed on the load balancer to get this to work for all instances? | 18:21 |
fungi | possible the websocket connection is getting aggressively timed out by the lb? | 18:25 |
*** bhavikdbavishi has quit IRC | 18:25 | |
armstrongs | it comes up eventually but just has a long delay | 18:33 |
armstrongs | like 5 or 6 seconds | 18:34 |
fungi | ahh, i think "end of stream" misleadingly displays until the javascript responsible is able to establish a connection, so sounds like something is getting delayed there maybe | 18:38 |
fungi | are you observing this when pulling up the log stream for builds which have been underway for a while, or only on builds as they're starting up? | 18:41 |
fungi | the console log streamer is started as an early part of the build, so isn't there instantly | 18:42 |
fungi | if it's not something obvious like that, i would probably resort to packet captures or access log analysis as the next step | 18:47 |
*** tflink has quit IRC | 19:11 | |
*** tflink has joined #zuul | 19:12 | |
*** tflink has quit IRC | 19:15 | |
*** tflink has joined #zuul | 19:17 | |
*** themroc has joined #zuul | 19:19 | |
*** themroc has quit IRC | 19:25 | |
*** jamesmcarthur has joined #zuul | 20:18 | |
SpamapS | armstrongs: some LB's don't handle websockets properly in HTTP mode. | 20:36 |
SpamapS | ELB Classics being one of those. | 20:36 |
*** jamesmcarthur has quit IRC | 20:54 | |
*** tosky has joined #zuul | 21:32 | |
armstrongs | observing it for jobs that have been running a while. Was trying this with 5 web nodes as a scale-up test; it seems the more web nodes there are, the more it happens. It's like the stream is pinned to a specific web server | 23:23 |
*** tosky has quit IRC | 23:25 | |
armstrongs | as i tried hitting specific web nodes directly behind the load balancer and some aren't getting streams. It's like 1 out of 10 don't show when running 10 concurrent jobs. | 23:25 |
fungi | could one of them lack the requisite connectivity from the fingergw/web daemons to 7900/tcp on the executors? | 23:37 |
fungi | or could 7900/tcp on some of your executors be blocked? | 23:37 |
fungi | https://zuul-ci.org/docs/zuul/admin/components.html | 23:38 |
*** panda is now known as panda|pubholiday | 23:41 | |
*** jamesmcarthur has joined #zuul | 23:58 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!