*** sanjayu__ has quit IRC | 00:19 | |
*** sgw has quit IRC | 00:31 | |
*** zenkuro has quit IRC | 00:50 | |
*** hamalq_ has quit IRC | 01:03 | |
*** openstackgerrit has joined #zuul | 01:15 | |
openstackgerrit | Ian Wienand proposed zuul/zuul master: web: PF4 minor rework of log viewer page https://review.opendev.org/751140 | 01:15 |
ianw | felixedel: thanks, i think your work looks great. it makes some other bits look a bit old now :) i had a go at some PF4ness for the log viewer page ^ | 01:16 |
openstackgerrit | Ian Wienand proposed zuul/zuul master: web: PF4 minor rework of log viewer page https://review.opendev.org/751140 | 02:08 |
openstackgerrit | Ian Wienand proposed zuul/zuul master: web: PF4 minor rework of log viewer page https://review.opendev.org/751140 | 02:50 |
*** bstinson has quit IRC | 04:26 | |
*** evrardjp has quit IRC | 04:33 | |
*** evrardjp has joined #zuul | 04:33 | |
*** bstinson has joined #zuul | 04:38 | |
*** cloudnull has quit IRC | 04:51 | |
*** cloudnull has joined #zuul | 04:52 | |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: Prepare Zookeeper for scale-out scheduler https://review.opendev.org/717269 | 05:19 |
felixedel | ianw: Glad you like it :) Right, I had the same feeling yesterday. But currently I have too many PF4 changes open and I want to finish those first before starting with another new page :D | 05:38 |
*** wxy has joined #zuul | 06:16 | |
*** jcapitao has joined #zuul | 06:55 | |
*** jcapitao has quit IRC | 07:04 | |
*** jcapitao has joined #zuul | 07:06 | |
*** jcapitao has quit IRC | 07:21 | |
*** saneax has joined #zuul | 07:24 | |
*** jcapitao has joined #zuul | 07:26 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Fix memleak on zk session loss https://review.opendev.org/751170 | 07:26 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Clear traceback before attaching exception to event https://review.opendev.org/751171 | 07:26 |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Remove source_event from Change objects https://review.opendev.org/751172 | 07:26 |
*** hashar has joined #zuul | 07:39 | |
*** saneax has quit IRC | 07:44 | |
*** jpena|off is now known as jpena | 07:44 | |
*** jcapitao has quit IRC | 07:45 | |
*** LLIU82 has joined #zuul | 08:01 | |
openstackgerrit | Lida Liu proposed zuul/zuul master: Add commit id to Change for mqtt reporter https://review.opendev.org/722478 | 08:02 |
mhu | felixedel, that's wise, any change you'd like reviewed as a priority? | 08:22 |
tobiash | zuul-maint: we have a couple of memleak fixes we've been hunting throughout this week: https://review.opendev.org/#/q/project:zuul/zuul+topic:memleak-fixes | 08:23 |
felixedel | mu: https://review.opendev.org/#/c/741385/6 and https://review.opendev.org/#/c/746112/9 would be cool. But I have the feeling that the latter one must be rebased after we merge our "scroll issue fixes" and the modal changes (https://review.opendev.org/#/c/750875/1 + parents). The filtertoolbar works independently, though. | 08:29 |
*** jcapitao has joined #zuul | 08:30 | |
felixedel | ^mhu | 08:30 |
*** ssbarnea has joined #zuul | 08:38 | |
mhu | felixedel, oops disregard comments on obsolete PS, I guess they were kept in cache by firefox | 08:43 |
*** tosky has joined #zuul | 08:47 | |
*** wuchunyang has joined #zuul | 08:51 | |
felixedel | ianw, corvus: I've abandoned my other "try to fix the scroll issues" changes as I think they are not necessary any longer once the config error modal and the related changes are merged. | 08:57 |
*** vishalmanchanda has joined #zuul | 09:00 | |
*** harrymichal has joined #zuul | 09:02 | |
*** armstrongs has joined #zuul | 09:02 | |
*** ssbarnea has quit IRC | 09:06 | |
*** LLIU82 has quit IRC | 09:08 | |
*** harrymichal has quit IRC | 09:25 | |
*** zenkuro has joined #zuul | 09:27 | |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: Prepare Zookeeper for scale-out scheduler https://review.opendev.org/717269 | 09:29 |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: Mandatory Zookeeper connection for ZuulWeb in tests https://review.opendev.org/721254 | 09:29 |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: Driver event ingestion https://review.opendev.org/717299 | 09:29 |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: Connect merger to Zookeeper https://review.opendev.org/716221 | 09:29 |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: Connect fingergw to Zookeeper https://review.opendev.org/716875 | 09:29 |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: Connect executor to Zookeeper https://review.opendev.org/716262 | 09:29 |
openstackgerrit | Jan Kubovy proposed zuul/zuul master: WIP: Switch to using zookeeper instead of gearman for jobs (keep gearman for mergers) https://review.opendev.org/744416 | 09:29 |
*** saneax has joined #zuul | 09:35 | |
*** sanjayu_ has joined #zuul | 09:41 | |
*** saneax has quit IRC | 09:42 | |
*** nils has joined #zuul | 10:06 | |
*** mnaser has quit IRC | 10:11 | |
*** gundalow has quit IRC | 10:11 | |
*** donnyd has quit IRC | 10:11 | |
*** ttx has quit IRC | 10:11 | |
*** andreykurilin has quit IRC | 10:11 | |
*** freefood has quit IRC | 10:11 | |
*** corvus has quit IRC | 10:11 | |
*** andreykurilin has joined #zuul | 10:11 | |
*** corvus has joined #zuul | 10:11 | |
*** gundalow has joined #zuul | 10:11 | |
*** donnyd has joined #zuul | 10:11 | |
*** freefood has joined #zuul | 10:16 | |
openstackgerrit | Matthieu Huin proposed zuul/zuul-client master: Add console-stream subcommand https://review.opendev.org/751238 | 10:32 |
mhu | zuul-maint: https://review.opendev.org/#/c/749775/ needs the last +3! | 10:35 |
*** wuchunyang has quit IRC | 10:35 | |
tobiash | +3 with comment | 10:47 |
*** jcapitao is now known as jcapitao_lunch | 10:47 | |
tobiash | mhu: cool, you already thought about the encrypt subcommand :) | 10:48 |
openstackgerrit | Merged zuul/zuul-client master: Initialize repository https://review.opendev.org/749775 | 10:56 |
openstackgerrit | Merged zuul/zuul-client master: Add promote, release jobs https://review.opendev.org/750193 | 10:56 |
*** sanjayu__ has joined #zuul | 11:03 | |
*** sanjayu_ has quit IRC | 11:05 | |
tobiash | zuul-maint: this bugfix for vars returned via zuul_return in combination with retries would need another review: https://review.opendev.org/711002 | 11:20 |
*** sanjayu__ has quit IRC | 11:28 | |
*** hashar has quit IRC | 11:34 | |
*** ttx has joined #zuul | 11:40 | |
*** jpena is now known as jpena|lunch | 11:42 | |
mhu | the promote jobs for zuul-client seem to be missing something https://zuul.opendev.org/t/zuul/builds?project=zuul%2Fzuul-client&pipeline=promote | 11:49 |
*** jcapitao_lunch is now known as jcapitao | 11:59 | |
*** rfolco|ruck has joined #zuul | 12:01 | |
*** rlandy has joined #zuul | 12:01 | |
*** zenkuro has quit IRC | 12:14 | |
*** Goneri has joined #zuul | 12:16 | |
tobiash | mhu: there is a jobname mismatch: opendev-tox-docs vs zuul-tox-docs | 12:16 |
mhu | tobiash, ah I see, I'll upload a patch | 12:20 |
mhu | I guess build-python-release is also missing | 12:20 |
tobiash | mhu: k, so just use zuul-tox-docs instead of opendev-tox-docs and the docs promote should work | 12:21 |
tobiash | nodepool does it like that as well | 12:21 |
*** LLIU82 has joined #zuul | 12:22 | |
tobiash | and gate misses build-python-release | 12:22 |
tobiash | yepp | 12:22 |
LLIU82 | https://review.opendev.org/#/c/722478/ need some review here | 12:23 |
openstackgerrit | Matthieu Huin proposed zuul/zuul-client master: Fix promote and release pipelines https://review.opendev.org/751259 | 12:24 |
mhu | tobiash, ^ should do it | 12:24 |
tobiash | +2 | 12:24 |
openstackgerrit | Matthieu Huin proposed zuul/zuul-client master: Add cross testing with Zuul https://review.opendev.org/751264 | 12:36 |
*** hashar has joined #zuul | 12:37 | |
*** zenkuro has joined #zuul | 12:39 | |
*** zenkuro has quit IRC | 12:43 | |
*** jpena|lunch is now known as jpena | 12:44 | |
openstackgerrit | Merged zuul/nodepool master: [provider][aws] use one API call to create tags https://review.opendev.org/746921 | 12:55 |
*** zenkuro has joined #zuul | 13:07 | |
*** LLIU82 has quit IRC | 13:09 | |
openstackgerrit | Merged zuul/zuul-client master: Fix promote and release pipelines https://review.opendev.org/751259 | 13:12 |
*** zenkuro has quit IRC | 13:12 | |
openstackgerrit | Matthieu Huin proposed zuul/zuul-client master: Add cross testing with Zuul https://review.opendev.org/751264 | 13:20 |
*** zenkuro has joined #zuul | 13:37 | |
*** saneax has joined #zuul | 13:40 | |
*** zenkuro has quit IRC | 13:42 | |
*** dmsimard has quit IRC | 13:44 | |
*** dmsimard has joined #zuul | 13:45 | |
openstackgerrit | Tobias Henkel proposed zuul/zuul master: Ignore 500 errors when requesting pr files https://review.opendev.org/751281 | 13:48 |
*** gmann is now known as gmann_afk | 14:05 | |
openstackgerrit | Matthieu Huin proposed zuul/zuul-client master: Make default config files location a class attribute https://review.opendev.org/751291 | 14:13 |
openstackgerrit | Matthieu Huin proposed zuul/zuul master: Add zuul-client to requirements https://review.opendev.org/750196 | 14:14 |
*** hashar has quit IRC | 14:18 | |
tobiash | promote worked now: https://review.opendev.org/#/c/751259/ :) | 14:24 |
tobiash | and it's hosted: https://zuul-ci.org/docs/zuul-client/ | 14:25 |
tobiash | (but not linked yet) | 14:25 |
mhu | tobiash, ah cool! Can we get an initial 0.0 release so that the project's on PyPI as well? | 14:31 |
tobiash | corvus is our release expert :) | 14:33 |
corvus | tobiash, mhu: i'll take a look in a bit | 14:34 |
mhu | thanks! | 14:34 |
fungi | mhu: you should be able to submit a change against the zuul/zuul-website repository to add it to the docs list | 14:35 |
fungi | when you're ready | 14:35 |
mhu | fungi, I'll have a look | 14:35 |
corvus | mhu, tobiash: commit 56981c76df188d78e8395260a19eee9e5ad16b54 (HEAD -> master, tag: 0.0.0, origin/master, origin/HEAD, refs/changes/59/751259/1) | 14:39 |
corvus | that look right? | 14:39 |
tobiash | lgtm | 14:40 |
mhu | corvus, yep! Should allow for the rest to get going | 14:40 |
mhu | thanks | 14:40 |
tobiash | mhu: thinking about https://review.opendev.org/750196 it might make sense to additionally expose those tests in their own job so they can be used in the zuul-client repo as integration tests | 14:41 |
mhu | tobiash, I was thinking of doing something like what's done with nodepool: https://review.opendev.org/#/c/751264/ | 14:41 |
mhu | would that work for you? | 14:41 |
corvus | mhu: pushed | 14:42 |
mhu | \o/ | 14:42 |
tobiash | mhu: wfm | 14:43 |
*** sgw has joined #zuul | 14:44 | |
openstackgerrit | Matthieu Huin proposed zuul/zuul-website master: Add link to zuul-client documentation https://review.opendev.org/751312 | 14:48 |
mhu | uh, there might be a problem with the archive name | 14:49 |
mhu | creating '/home/zuul/src/opendev.org/zuul/zuul-client/dist/zuul-0.0.0-py3-none-any.whl' and adding '.' to it | 14:49 |
fungi | i'm not getting whatever the problem is with that archive name | 14:51 |
corvus | oh that should say "zuul-client-0.0.0"? | 14:51 |
fungi | aha, yep | 14:51 |
fungi | it's likely needing to be tweaked in setup.cfg | 14:51 |
mhu | yep, exactly | 14:51 |
mhu | sorry about that | 14:51 |
fungi | the package name needs to not be inherited from the module name | 14:52 |
corvus | https://zuul.opendev.org/t/zuul/build/3dbd2b28fc46457eb10e1c2dddf14c98/log/job-output.txt#486 | 14:52 |
corvus | congratulations, you just uploaded zuul 0.0.0 | 14:52 |
corvus | https://pypi.org/project/zuul/0.0.0/ | 14:52 |
mhu | ahaha | 14:53 |
corvus | i'm not laughing | 14:53 |
fungi | i can delete that release | 14:53 |
mhu | I mean, at least it didn't erase 3.19.1 | 14:53 |
corvus | i'm more concerned about whether we overwrote a real release | 14:53 |
fungi | pypi (warehouse) thankfully won't allow you to replace a release or a file | 14:54 |
corvus | ok. i thought there was a 0.0.0 | 14:54 |
corvus | fungi: i am in favor of you deleting that | 14:54 |
fungi | it may have allowed us to upload additional files for 0.0.0 if their names were different than existing files for that release | 14:54 |
corvus | looks like it's just the new files | 14:55 |
fungi | though from what i could tell digging in zuul git repository history, we started versioning at 1.0.0 | 14:55 |
fungi | you know, because we could ;) | 14:55 |
corvus | we may have had a 0.0.0 release with no files (due to the old pypi registration system) | 14:55 |
corvus | that's what i used to do when it could be done | 14:55 |
fungi | maybe. i used to just use "0" | 14:56 |
fungi | though i'm pretty sure the release would have been marked as many years ago rather than 7 minutes ago in that case | 14:56 |
corvus | even better | 14:56 |
fungi | pypi has normally kept the release timestamp the same even when new files are uploaded for an existing release (like adding more wheels or something) | 14:57 |
fungi | #status log deleted errant zuul-0.0.0-py3-none-any.whl and zuul-0.0.0.tar.gz files and the corresponding 0.0.0 release from the zuul project on pypi | 14:59 |
openstackstatus | fungi: finished logging | 14:59 |
openstackgerrit | Matthieu Huin proposed zuul/zuul-client master: Fix package metadata https://review.opendev.org/751315 | 15:00 |
fungi | going to see if i need to remove them from tarballs.o.o as well | 15:00 |
fungi | looks like we only upload branch tip artifacts there, no release artifacts or signatures: https://tarballs.opendev.org/zuul/zuul/ | 15:01 |
fungi | and not since 2020-02-20 apparently | 15:02 |
*** harrymichal has joined #zuul | 15:02 | |
mhu | fungi, thanks for looking into that, https://review.opendev.org/#/c/751315/ should get things back to normal. My apologies for letting that typo go past! | 15:05 |
fungi | i think i reviewed the addition too and didn't spot it, so it's not all on you | 15:06 |
corvus | fungi: thx | 15:07 |
corvus | mhu: should that be zuulclient or zuul-client ? | 15:07 |
mhu | corvus, every other project is named zuul-something, I'll add the hyphen | 15:10 |
openstackgerrit | Matthieu Huin proposed zuul/zuul-client master: Fix package metadata https://review.opendev.org/751315 | 15:11 |
corvus | mhu: maybe capitalize that "a" at the start? i think that will show up on pypi | 15:12 |
corvus | (it's a nit, but a big nit) | 15:12 |
openstackgerrit | Matthieu Huin proposed zuul/zuul-client master: Fix package metadata https://review.opendev.org/751315 | 15:13 |
mhu | voilà :) | 15:13 |
corvus | tobiash, fungi: ^ lgtm | 15:13 |
*** harrymichal has quit IRC | 15:15 | |
fungi | yep, when i build with that it produces zuul-client-0.0.1.dev1.tar.gz and zuul_client-0.0.1.dev1-py3-none-any.whl now, so should be all set | 15:18 |
openstackgerrit | Merged zuul/zuul-client master: Fix package metadata https://review.opendev.org/751315 | 15:29 |
*** harrymichal has joined #zuul | 15:39 | |
*** jcapitao has quit IRC | 15:46 | |
corvus | mhu, fungi: commit 1d2301814b5c27e2e712e50a5beb0e96fccf3bab (HEAD -> master, tag: 0.0.1, origin/master, origin/HEAD, refs/changes/15/751315/3) | 15:47 |
corvus | look right? | 15:47 |
fungi | corvus: yep, that looks like my origin/HEAD too | 15:48 |
fungi | and is also the change i checked out and tested | 15:49 |
corvus | pushed | 15:50 |
openstackgerrit | Pierre-Louis Bonicoli proposed zuul/zuul master: gitlab: an "update" event isn't always a "labeled" action https://review.opendev.org/750544 | 15:58 |
*** hamalq has joined #zuul | 15:58 | |
*** hamalq_ has joined #zuul | 15:59 | |
*** hamalq has quit IRC | 16:03 | |
*** harrymichal has quit IRC | 16:08 | |
*** saneax has quit IRC | 16:31 | |
*** rfolco|ruck is now known as rfolco|ruck|brb | 16:39 | |
clarkb | is http://paste.openstack.org/show/797786/ a zuul bug? | 16:43 |
clarkb | I'm checking now if the problem persists in that repo, but maybe the executor crashed and leaked that file and that is another thing to scrub on startup? | 16:43 |
clarkb | the other thing that could be happening is having the build contexts leak out somehow? | 16:43 |
clarkb | -rw-r--r-- 1 zuuld zuuld 0 Aug 25 07:18 index.lock <- uptime says 17 days which is ~ to that time | 16:45 |
clarkb | so ya I think the server may have crashed and we leaked the git index.lock | 16:45 |
*** fdegir has quit IRC | 16:53 | |
fungi | maybe it would be safest if executors cleaned their local repos at boot? | 16:53 |
fungi | at start, whatever | 16:53 |
*** tobberydberg has quit IRC | 16:57 | |
clarkb | ya I think we should add in a startup task to clear out index.lock files at least. Doing an update on all repos first would likely be very expensive | 16:57 |
*** jpena is now known as jpena|off | 17:34 | |
openstackgerrit | Clark Boylan proposed zuul/zuul master: Clean up stale git index.lock files on merger startup https://review.opendev.org/751370 | 17:37 |
clarkb | something like that maybe for cleaning up the index.lock files. I was hoping the build dir cleanup had tests but I don't see any? Or maybe they were added after the original implementation and i need to look harder | 17:37 |
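A minimal sketch of the startup cleanup idea being discussed here: walk the merger's git repositories at start, before any git operations begin, and delete any leftover index.lock files. The merge_root path and the remove-unconditionally policy are assumptions for illustration; this is not the code proposed in change 751370.

```python
import logging
import os

log = logging.getLogger("zuul.MergerCleanup")


def clean_stale_index_locks(merge_root):
    """Remove leftover git index.lock files under merge_root.

    Assumes it runs at startup, before any git operations, so no live
    process can still be holding these locks (e.g. after a host crash).
    """
    for dirpath, dirnames, filenames in os.walk(merge_root):
        # Only look directly inside .git directories.
        if os.path.basename(dirpath) != ".git":
            continue
        if "index.lock" in filenames:
            lock_path = os.path.join(dirpath, "index.lock")
            log.warning("Removing stale git lock file %s", lock_path)
            os.unlink(lock_path)
```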
*** gmann_afk is now known as gmann | 17:42 | |
*** rfolco|ruck|brb is now known as rfolco|ruck | 17:46 | |
*** fdegir has joined #zuul | 18:02 | |
fungi | could recent changes for zuul's webui have changed how api queries are being made? we're noticing a lot of "cache busting" (pragma: no-cache, cache-control: no-cache, max-age: 1) which we think has caused our deployment in opendev to no longer be able to offload requests onto apache mod_cache... zuul-web cpu utilization is quite high, response times are terrible, and apache definitely is not caching the status | 18:06 |
fungi | json for us | 18:06 |
fungi | has anyone else seen similar behavior recently? | 18:06 |
clarkb | and it seemed to start after we updated zuul-scheduler and zuul-web to pick up the hold build status in the db | 18:07 |
fungi | if that's intentional then we can probably come up with a workaround, just first trying to make sure we're not barking up the wrong tree | 18:08 |
fungi | our request volume doesn't look like it's particularly higher than usual, but we also weren't logging mod_cache info until yesterday so we can't be 100% certain we were successfully caching it before either | 18:09 |
fungi | it's just our only good explanation for what we're seeing at this point | 18:10 |
clarkb | ya we noticed that we lacked logging for this in the debugging process and have since added it | 18:10 |
*** vishalmanchanda has quit IRC | 18:20 | |
openstackgerrit | Clark Boylan proposed zuul/zuul master: Clean up stale git index.lock files on merger startup https://review.opendev.org/751370 | 18:25 |
corvus | clarkb: i don't always see a pragma header being sent | 18:31 |
corvus | clarkb: only when i do a shift-reload | 18:33 |
clarkb | corvus: the cache-control: no-cache seems to be there if I just let it sit and wait | 18:34 |
clarkb | as is the pragma | 18:34 |
clarkb | it also seems to be refreshing way more quickly than I would normally expect | 18:35 |
clarkb | every 5 seconds? (maybe this is related) | 18:36 |
corvus | every 5 seconds is normal | 18:38 |
clarkb | oh I thought it was 60 seconds for some reason | 18:39 |
corvus | the only way i can get mine to send a pragma is by shift reload. i've never sent a cache-control header afaict | 18:39 |
clarkb | I see both on every request. Maybe it is browser specific: FF 81.0b7 here I'll check chrome now | 18:40 |
corvus | i do see one bug: it's supposed to stop sending the request if the browser window is not active, but that only happens if it's not in the middle of a request when the user switches focus. if they do that while the request is outstanding, it does not disable the request loop. | 18:40 |
corvus | ff80.0.1 | 18:40 |
corvus | i believe that changed recently | 18:41 |
corvus | (the request loop disabling code) | 18:41 |
clarkb | chrome doesn't seem to provide cache-control or pragma on the 5 second interval refreshes. Doing a shift reload now to see if it does then | 18:41 |
clarkb | shift reload in chrome set both | 18:42 |
corvus | then see if the next request clears them | 18:42 |
corvus | (or maybe the one after that if the loop gets out of sync) | 18:42 |
clarkb | yes the next regular interval (and subsequent intervals) has cleared them | 18:42 |
fungi | so maybe the performance decrease is due to no longer skipping refreshes on inactive windows, and we've actually been failing to cache stuff all along | 18:43 |
clarkb | if I'm the only one using broken ff beta then that doesn't explain why we aren't caching though | 18:43 |
clarkb | fungi: oh good call | 18:43 |
corvus | so we have 2 hypotheses: a) "pragma: no-cache" header sent consistently in ff beta. b) refresh loop broken in js causing lots of extra requests from people leaving backgrounded status tabs open | 18:44 |
fungi | however, like i said, our actual request volume doesn't look particularly higher going by bandwidth utilization and packet rate | 18:44 |
corvus | ah, and c) web server cache config is broken? | 18:44 |
corvus | clarkb: do we know we are not caching? if so, how do we know that? | 18:44 |
fungi | by adding cache details to the access log | 18:45 |
clarkb | that was a change we made yesterday to better understand this | 18:45 |
clarkb | and we see the static resources being handled by the caching system but not the status json | 18:45 |
fungi | yeah, when we started to dig into it we noticed that apache's cacheenable directive isn't documented as supporting regular expressions, so we tried changing that and got the static content to start caching, but not the status api calls for the multi-tenant vhost (but it's somehow caching them for the whitelabeled vhost) | 18:46 |
fungi | we need a wildcarded pattern match to cover the multi-tenant status api path, so tried putting cacheenable in a locationmatch instead, but it's still not getting cached | 18:47 |
clarkb | fungi: but also you tried it with specific tenant names and that wasn't caching either | 18:48 |
clarkb | and that is when we started to think it may be due to the requests themselves | 18:48 |
corvus | how can you tell a cache hit? | 18:48 |
corvus | i see that it says "cache miss" in the access log | 18:49 |
clarkb | I think it will say cache hit but also cache something I'm caching it on this request | 18:49 |
* clarkb looks | 18:49 | |
fungi | oh, yep, instead of the locationmatch i tried to just cacheenable mem /api/tenant/openstack/status and we still weren't getting any hits | 18:49 |
fungi | corvus: grep for "cache hit" | 18:49 |
corvus | ah. i don't see that for any status url | 18:50 |
clarkb | cache miss: attempting entity save and cache hit | 18:50 |
clarkb | you see both for /static | 18:50 |
corvus | i see either "cache miss" or nothing for status | 18:50 |
fungi | corvus: yep, that's the problem we're running into | 18:50 |
clarkb | so caching is working for that path but not status | 18:50 |
corvus | i thought someone said whitelabel status was being cached | 18:50 |
fungi | er, i don't recall if it was being cached, but it was at least hitting mod_cache according to the access log | 18:51 |
fungi | while the non-whitelabeled status api path was not that i could find | 18:51 |
fungi | even when just hardcoding it to one of the tenants and not including any wildcarding | 18:51 |
clarkb | corvus: yes if you filter by /api/status which is the whitelabeled path then you'll see both | 18:51 |
clarkb | (though not many cache hits they do exist) | 18:51 |
fungi | (right now it's configured for a locationmatch, but that's also not working) | 18:51 |
clarkb | but if you filter by /api/tenant/ you get nothing | 18:52 |
*** tobberydberg has joined #zuul | 18:58 | |
clarkb | corvus: fwiw I'm not entirely sure that we're caching the whitelabeled status correctly, just that the system agrees it is something to cache (and appears to have done so occasionally) | 18:59 |
corvus | clarkb: 4 times in the past day sounds pretty spurious | 19:00 |
fungi | yeah, i don't know that's even a frequency we can chalk up to not many people using the whitelabeled vhost these days | 19:01 |
corvus | there are lots of whitelabeled requests | 19:02 |
corvus | plenty of times we see multiple requests / second for whitelabel, so should be enough to hit the cache | 19:03 |
corvus | abusing the inactivity bug -- i now have 4 tabs reloading continually, all of them are cache misses | 19:05 |
corvus | and i do occasionally see duplicate content length, so it's very likely at least some of the time i'm hitting the internal zuul-web cache (so i know it's within the 1-second window) | 19:06 |
*** johanssone_ has joined #zuul | 19:13 | |
*** johanssone has quit IRC | 19:15 | |
clarkb | corvus: do you think the bug causing background refreshes is new enough to have been pulled in by the latest restart? or is that an old one? | 19:24 |
corvus | clarkb: i think it's new, lemme dig | 19:32 |
corvus | (was just out to lunch. so to speak) | 19:32 |
clarkb | no worries I'm just finishing up mine too | 19:32 |
clarkb | it does seem like if something like that is new with the latest restart it makes a likely candidate for the issue | 19:33 |
clarkb | doesn't explain the apache caching problems but maybe we can solve that separately if this is the underlying problem | 19:33 |
AJaeger | clarkb, corvus, here's a reorg of the upload-logs roles in zuul-jobs, could you review the stack at https://review.opendev.org/#/c/742732/7 , please? | 19:36 |
corvus | clarkb: 1ecbe58474 ed9d0446d5 70a7997197 are first-pass candidates from git blame; all authored in june/july timeframe (unsure when merged) | 19:37 |
clarkb | corvus: iirc our last restart of zuul-web was july 31 I feel like I checked that but need to find it in logs | 19:38 |
clarkb | oh that was before we restarted for scroll fixes | 19:38 |
corvus | 1ecbe58474 merged aug 24; ed9d0446d5 merged jul 13; 70a7997197 merged jul 7 | 19:39 |
clarkb | we restarted on the 28th for scroll fixes | 19:40 |
clarkb | August 28 I mean | 19:40 |
corvus | so all of these were in place then | 19:40 |
corvus | and aug 24 is the most recent related change | 19:41 |
clarkb | ya so either its the cause and no one complained until recently or its something else | 19:41 |
corvus | i doubt it's that no one complained :) | 19:41 |
*** hashar has joined #zuul | 19:42 | |
clarkb | looking at gerrit the only web related change between the earlier build page restart and scroll fixes and this restart was the change we restarted for | 19:43 |
clarkb | which added the hold attribute to builds | 19:43 |
clarkb | maybe it's the scheduler then (since the zuul web process is largely just a fancy proxy for that) | 19:44 |
corvus | clarkb: the logs indicate that zuul web internal caching is working as expected | 19:44 |
corvus | clarkb: zuul-web is only making a request to the scheduler 1/sec | 19:45 |
corvus | (so zuul-web is protecting the scheduler, but nothing is protecting zuul-web) | 19:45 |
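For context, a minimal sketch of the kind of one-second internal cache described here: status requests are answered from a cached payload that is refreshed from the scheduler at most once per ttl. The class and names are illustrative only, not zuul-web's actual code.

```python
import threading
import time


class CachedStatus:
    """Serve a cached status payload, refreshing it at most once per ttl."""

    def __init__(self, fetch, ttl=1.0):
        self._fetch = fetch        # callable that queries the scheduler
        self._ttl = ttl
        self._lock = threading.Lock()
        self._value = None
        self._expires = 0.0

    def get(self):
        now = time.monotonic()
        with self._lock:
            if self._value is None or now >= self._expires:
                self._value = self._fetch()
                self._expires = now + self._ttl
            return self._value


# Many concurrent web requests share one cache instance, so the scheduler
# sees at most ~1 status query per second no matter how many clients poll.
# status_cache = CachedStatus(ask_scheduler_for_status)  # hypothetical fetch
```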
clarkb | I guess it can also be an update in cherrypy? | 19:45 |
clarkb | no new cherrypy releases | 19:45 |
corvus | check cheroot | 19:46 |
clarkb | https://pypi.org/project/cheroot/8.4.5/ is from august 24 | 19:46 |
corvus | we unpinned on jul 14 | 19:46 |
corvus | so that all should be the same then | 19:46 |
clarkb | when I looked yesterday (and now looking again) it seems the gearman requests finish in a reasonable amount of time | 19:49 |
clarkb | maybe not always super fast but hundreds of milliseconds not multiple whole seconds | 19:49 |
clarkb | but we don't log the job uuids so may have things mismatched | 19:49 |
clarkb | I'm able to reproduce what fungi said which is that direct requests to zuul-web are slow | 19:53 |
clarkb | which has me leaning back towards maybe we need better caching in apache to protect zuul-web, but also what changed here ? | 19:53 |
clarkb | corvus: I notice that 404s are slow too | 19:57 |
clarkb | corvus: almost like it is routing that is the problem | 19:57 |
clarkb | because we should 404 early for something like /api/tenant/zuul/shoulderror | 19:58 |
clarkb | and not do any expensive backend processing | 19:58 |
corvus | clarkb: i imagine zuul-web is just cpu starved and backlogged | 19:58 |
clarkb | ya it is a busy process (and it isn't forking, right?) | 20:00 |
corvus | yes, it's threaded | 20:01 |
clarkb | so we're back to figuring out apache I guess. Maybe we need to increase the max ttl ? | 20:02 |
clarkb | perhaps the max-age value is hurting us? | 20:04 |
clarkb | though you'd expect apache would cache it for a second? | 20:04 |
corvus | yeah, pretty sure this worked at one point | 20:05 |
clarkb | I wonder if that is calculated against the last modified value and not when apache gets it | 20:05 |
clarkb | maybe that delta is > 1 due to the slowness | 20:05 |
corvus | if so, could be a simple tipping point scenario | 20:05 |
clarkb | "Well formed content that is intended to be cached should declare an explicit freshness lifetime with the Cache-Control header's max-age or s-maxage fields, or by including an Expires header." and "If a response does not include an Expires header but does include a Last-Modified header, mod_cache can infer a freshness lifetime based on a heuristic, which can be controlled through the use of the | 20:06 |
clarkb | CacheLastModifiedFactor directive." | 20:06 |
clarkb | so I think that is a possibility here | 20:06 |
clarkb | reading about that CacheLastModifiedFactor directive now | 20:06 |
clarkb | max-age is meant to be taken from the time of request | 20:10 |
clarkb | so ya I think if zuul takes longer than a second to respond then we stop caching? | 20:10 |
clarkb | and perhaps the background refreshes are a component in tipping over | 20:12 |
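To make the arithmetic behind that hypothesis concrete, here is a simplified version of the HTTP age calculation from RFC 7234 §4.2.3; the numbers are made up, and whether mod_cache actually behaves this way in this deployment is exactly the open question above.

```python
def is_fresh(max_age, request_time, response_time, now,
             date_header, age_header=0.0):
    """Simplified RFC 7234 freshness check for a cached response."""
    apparent_age = max(0.0, response_time - date_header)
    response_delay = response_time - request_time      # time on the wire
    corrected_initial_age = max(apparent_age, age_header + response_delay)
    resident_time = now - response_time                # time spent in cache
    current_age = corrected_initial_age + resident_time
    return current_age < max_age


# A fast backend: fetched in 0.2s, served again 0.3s later -> still fresh.
print(is_fresh(1, request_time=0.0, response_time=0.2, now=0.5,
               date_header=0.2))                        # True
# A 2-second backend response with max-age=1 is already stale the moment it
# is stored, so every follow-up request becomes a cache miss.
print(is_fresh(1, request_time=0.0, response_time=2.0, now=2.0,
               date_header=2.0))                        # False
```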
clarkb | thinking out loud here, maybe we should bump that internal caching and max-age to 10 seconds? | 20:20 |
clarkb | then see if the behavior changes at all? (the longest request I saw was 9 seconds so expect 10 to be plenty) | 20:21 |
corvus | i'm not a fan of that | 20:22 |
corvus | i think a large installation like opendev either needs a good caching layer or more zuul-web instances | 20:23 |
*** nils has quit IRC | 20:23 | |
clarkb | hrm any idea if SO_REUSEADDR is set? if so we may be able to just run a few zuul-webs on the existing host | 20:25 |
clarkb | https://github.com/cherrypy/cherrypy/issues/1088 implies that it is set | 20:26 |
clarkb | webknjaz: ^ would it be crazy to start a few cherrypy processes on port 9000 and let the kernel decide where connections go? | 20:27 |
fungi | yeah, we have plenty of available processors, zuul-web seems to only want to use one | 20:30 |
fungi | well, one processor worth anyway | 20:31 |
fungi | though the work *seems* to get distributed across the processors according to top (if you hit the 1 key to expand the processor list) | 20:32 |
fungi | so i can't say for sure it would actually help | 20:32 |
clarkb | fungi: its a single python process with many threads and due to the GIL I think the only way to effectively use multiple cpus is to fork | 20:32 |
openstackgerrit | James E. Blair proposed zuul/zuul master: Correct visibility check in web JS https://review.opendev.org/751425 | 20:33 |
fungi | i wonder why top makes it look like those threads are distributed across processors then | 20:33 |
corvus | fungi: it's io heavy so it can use > 1 processor at a time | 20:33 |
corvus | but barely | 20:34 |
corvus | clarkb: is there no way to actually use apache to cache? | 20:35 |
clarkb | corvus: we may be able to tell apache2 to ignore the max-age sent by zuul-web | 20:35 |
corvus | clarkb, fungi, zbr: https://review.opendev.org/751425 should fix the visibility check | 20:35 |
corvus | clarkb: i don't want to ignore it, i want it to honor it :) | 20:36 |
corvus | "cache this for one second after you get it" | 20:36 |
clarkb | corvus: there are lots of tunables I'm just not sure how to express that in this case | 20:37 |
fungi | and sorry i'm mostly on silent running for the past few... trying to get okonomiyaki grilled and consumed before the mordred hour | 20:38 |
corvus | fungi: grill extra and share | 20:40 |
fungi | it's too thick to fit through the fax machine | 20:41 |
fungi | (you're not supposed to press down on it while it cooks!) | 20:42 |
clarkb | looking at the js fix now | 20:47 |
clarkb | hrm I should go pour a whiskey for the mordred happy hour | 20:50 |
fungi | i have a bowl of sake | 20:52 |
clarkb | the way the air is here I bet sake would almost taste like whiskey | 20:52 |
clarkb | (not really, the trees burning are not all oak) | 20:52 |
fungi | this is pretty terrible sake (the finest sake, gekkeikan, serving the imperial household by appointment!) | 20:53 |
fungi | i normally use it for cooking, but desperate times call for desperate sake | 20:54 |
fungi | our local grocery carries 1.5 liter bottles of the stuff | 20:54 |
clarkb | actually I've just remembered I have beer | 20:54 |
* mordred has made a caipirinha - but probably won't try to fax it to anyone | 20:55 |
*** rlandy is now known as rlandy|afk | 21:09 | |
webknjaz | @clarkb: it is set but I haven't tried using it https://github.com/cherrypy/cheroot/pull/53/files#diff-b0366adf530cee9249c1888ba4f32260R1550 | 21:14 |
clarkb | webknjaz: great well I expect https://review.opendev.org/751426 to test it | 21:15 |
*** nils has joined #zuul | 21:17 | |
*** rfolco|ruck has quit IRC | 21:20 | |
*** rlandy|afk is now known as rlandy | 21:53 | |
*** hashar has quit IRC | 21:59 | |
*** nils has quit IRC | 22:13 | |
clarkb | webknjaz: Timeout('Port 9000 not free on 127.0.0.1.') so it's not quite working, but I'm not likely to debug that further today. I'll poke at it more when I have time | 22:15 |
clarkb | "except when there is an active listening socket bound to the address." <- that may be the problem | 22:17 |
*** hamalq has joined #zuul | 23:13 | |
*** hamalq_ has quit IRC | 23:14 | |
*** tosky has quit IRC | 23:20 | |
*** hamalq has quit IRC | 23:37 | |
*** armstrongs47 has joined #zuul | 23:39 | |
*** armstrongs47 has quit IRC | 23:49 |