| -@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: | 02:00 | |
| - [zuul/zuul] 957479: Replace getQuotaUsed with cache https://review.opendev.org/c/zuul/zuul/+/957479 | ||
| - [zuul/zuul] 957480: Only log quota messages when acted upon https://review.opendev.org/c/zuul/zuul/+/957480 | ||
| -@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 957456: Include requested nodes in quota percentage calculation https://review.opendev.org/c/zuul/zuul/+/957456 | 03:05 | |
| -@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: | 06:06 | |
| - [zuul/zuul] 957457: Sort nodes by request time https://review.opendev.org/c/zuul/zuul/+/957457 | ||
| - [zuul/zuul] 957294: Fix race with node provisioning https://review.opendev.org/c/zuul/zuul/+/957294 | ||
| -@gerrit:opendev.org- Zuul merged on behalf of Simon Westphahl: [zuul/zuul] 957304: Always send nodes provisioned event before unlock https://review.opendev.org/c/zuul/zuul/+/957304 | 06:06 | |
| -@gerrit:opendev.org- Simon Westphahl proposed on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 957480: Only log quota messages when acted upon https://review.opendev.org/c/zuul/zuul/+/957480 | 07:13 | |
| -@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 957464: Ensure node request event is sent after crash https://review.opendev.org/c/zuul/zuul/+/957464 | 07:44 | |
| -@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: | 07:45 | |
| - [zuul/zuul] 957295: Launcher: improve missing artifact log message https://review.opendev.org/c/zuul/zuul/+/957295 | ||
| - [zuul/zuul] 957296: Launcher: retry loading providers on failure https://review.opendev.org/c/zuul/zuul/+/957296 | ||
| @harbott.osism.tech:regio.chat | what is the expected behaviour when job.semaphores specifies a name for which no semaphore is defined? I would expect a config error to be triggered, or maybe the job being blocked forever. what I'm seeing is the job getting executed regardless, likely without any limit on parallel execution (didn't verify the latter though) (this is with Zuul 12.0.0) | 12:21 |
|---|---|---|
| @jangutter:matrix.org | According to https://zuul-ci.org/docs/zuul/latest/config/job.html#attr-job.semaphores : | 12:51 |
| `Also the name of a semaphore can be any string (without being previously defined via semaphore directive). In this case an implicit semaphore is created with capacity max=1.` | ||
| @jangutter:matrix.org | Jens Harbott: in our pipelines we use implicit semaphores quite extensively since it works across branches with no conflicts. | 12:52 |
| -@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 957479: Replace getQuotaUsed with cache https://review.opendev.org/c/zuul/zuul/+/957479 | 21:51 | |
| -@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 957480: Only log quota messages when acted upon https://review.opendev.org/c/zuul/zuul/+/957480 | 21:52 | |
| -@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul] 957574: Fix connection filter lookups in updateTenantProviders https://review.opendev.org/c/zuul/zuul/+/957574 | 21:58 | |
| @clarkb:matrix.org | corvus: swest digging into ^ I have a lot of thoughts.... First up I'm not convinced that test case is really testing much for us for two reasons. The first is that Launchers with connection_filters short circuit in `updateTenantProviders` here: https://opendev.org/zuul/zuul/src/commit/8a25d07f6f82372e0859b99d4a8ac98117e3dff6/zuul/launcher/server.py#L2233-L2236 and we don't have any connection names that match 'nope' in our Launcher configs. Second we're not asserting any state at the end of the test. I think all we're doing is forcing the request to be handled by that specific launcher then waiting for the launcher to stop after we stop it. Internally in the log (https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_da7/zuul/da75b459d7654f7e929bf532b1cfc351/testr_results.html) we can see the request fails due to quota problems. | 22:05 |
| @clarkb:matrix.org | I think it might be a good idea to reconsider what we're trying to test there. My change makes a stab at half of what i think we need to test: that we're able to pick launchers for endpoints and the other half is that we actually successfully boot instances using matching connection filters. | 22:06 |
| @clarkb:matrix.org | Then in addition to the test case I also found we're doing some very similar work in several places around the launcher: There is `LauncherApi.getMatchingRequests()` (this uses the `launcher_score()` function monkey patched in the test), `Launcher._chooseLauncherForEndpoint()`, and `CleanupWorker.getMatchingEndpoints()`. These last two do not use `laucher_score()` whcih I found confusing with the existing test case as it seems like we're only half patching things. I wonder if can/should simplify things by having a common place to lookup endpoints that apply to the current launcher? | 22:11 |
| @clarkb:matrix.org | Finally, looking in that test log I note that we have a mixtures of `zuul.Launcher`, `zuul.Launcher-0`, and `zuul.Launcher-1` log messages. I suspect that in production this isn't a big deal because we're always running one Launcher? But within the test cases it makes for some confusing log messages to have `zuul.Launcher` messages that belong to one of the two Launchers. I think the reason for this is we set the thread name for Launcher.run's thread to "Launcher". In theory we can update that value for more consistency and differentiation? | 22:15 |
| @clarkb:matrix.org | * Finally, looking in that test log I note that we have a mixtures of `zuul.Launcher`, `zuul.Launcher-0`, and `zuul.Launcher-1` log messages. I suspect that in production this isn't a big deal because we're always running one Launcher? But within the test cases it makes for some confusing log messages to have `zuul.Launcher` messages that belong to one of the two Launchers. I think the reason for this is we set the thread name for `Launcher.run`'s thread to "Launcher". In theory we can update that value for more consistency and differentiation? | 22:23 |
| @jim:acmegating.com | Clark: those are a lot of thoughts. i think with your fix, the test is sufficient. it tests that one of the launchers *handled* the request. i agree, the test would be slightly better if we assert the request was fulfilled, which it should be under normal circumstances. but even with the failure you observed, what the test was intended to cover was still actually tested: the one launcher out of the two that could handle it did. but it's trivial to add a check that the request is fulfilled, so let's do that. | 22:58 |
| @jim:acmegating.com | Clark: it's worth noting that the _chooseLauncherForEndpoint method is only used for one thing: nominating a launcher to do the initial quota lookup for a provider (so they don't do a thundering herd). it's not used to determine what launcher should handel a request, which is why the failure you looked at was quota related. | 23:00 |
| @clarkb:matrix.org | ah | 23:02 |
| @clarkb:matrix.org | my main concern with having several methods there was around them calculating a similar score in three different places the nwe only patch one. but I guess it doesn't matter since as you say its for quota (and the other is for cleanup) which isn't going to affect this? | 23:02 |
| @jim:acmegating.com | yep. i think there's probably some opportunity to DRY that method, but i don't think it matters here. | 23:05 |
| @jim:acmegating.com | and yeah, the log messages are just for humans reviewing the test results to have some clue as to which of the two is doing it. i don't think it's worth a lot of effort. if we need more differentiation in that test, we can override more things, but when i was writing the test, i thought that was enough. certainly we don't need to do anything like that for prod. | 23:07 |
| -@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed on behalf of Clark Boylan: [zuul/zuul] 957574: Fix connection filter lookups in updateTenantProviders https://review.opendev.org/c/zuul/zuul/+/957574 | 23:08 | |
| @clarkb:matrix.org | ya it just took me a minute to figure out which launcher the exception originated in | 23:08 |
| @jim:acmegating.com | Clark: ^ that'll check the result | 23:08 |
| @clarkb:matrix.org | corvus: I'm not sure it actually gets fulfilled unless an error result is considered fulfulled? | 23:09 |
| @jim:acmegating.com | Clark: it does normally -- you saw a quota error because of the bug you fixed | 23:09 |
| @clarkb:matrix.org | oh! | 23:10 |
| @clarkb:matrix.org | got it. We failed to update tenant info which led to broken quota | 23:10 |
| @jim:acmegating.com | yep | 23:10 |
| @jim:acmegating.com | the one launcher handled the request and failed it, which means the connection filter still worked, just not the quota init | 23:10 |
| @jim:acmegating.com | which is why i say the test was still actually testing that part of things. | 23:10 |
| @jim:acmegating.com | now with your change (and also that assert) it also tests the quota part. :) | 23:11 |
Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!