Friday, 2025-08-15

-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed:		02:00
- [zuul/zuul] 957479: Replace getQuotaUsed with cache https://review.opendev.org/c/zuul/zuul/+/957479
- [zuul/zuul] 957480: Only log quota messages when acted upon https://review.opendev.org/c/zuul/zuul/+/957480
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 957456: Include requested nodes in quota percentage calculation https://review.opendev.org/c/zuul/zuul/+/957456		03:05
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com:		06:06
- [zuul/zuul] 957457: Sort nodes by request time https://review.opendev.org/c/zuul/zuul/+/957457
- [zuul/zuul] 957294: Fix race with node provisioning https://review.opendev.org/c/zuul/zuul/+/957294
-@gerrit:opendev.org- Zuul merged on behalf of Simon Westphahl: [zuul/zuul] 957304: Always send nodes provisioned event before unlock https://review.opendev.org/c/zuul/zuul/+/957304		06:06
-@gerrit:opendev.org- Simon Westphahl proposed on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 957480: Only log quota messages when acted upon https://review.opendev.org/c/zuul/zuul/+/957480		07:13
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 957464: Ensure node request event is sent after crash https://review.opendev.org/c/zuul/zuul/+/957464		07:44
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com:		07:45
- [zuul/zuul] 957295: Launcher: improve missing artifact log message https://review.opendev.org/c/zuul/zuul/+/957295
- [zuul/zuul] 957296: Launcher: retry loading providers on failure https://review.opendev.org/c/zuul/zuul/+/957296
@harbott.osism.tech:regio.chat	what is the expected behaviour when job.semaphores specifies a name for which no semaphore is defined? I would expect a config error to be triggered, or maybe the job being blocked forever. what I'm seeing is the job getting executed regardless, likely without any limit on parallel execution (didn't verify the latter though) (this is with Zuul 12.0.0)	12:21
@jangutter:matrix.org	According to https://zuul-ci.org/docs/zuul/latest/config/job.html#attr-job.semaphores :	12:51
`Also the name of a semaphore can be any string (without being previously defined via semaphore directive). In this case an implicit semaphore is created with capacity max=1.`
@jangutter:matrix.org	Jens Harbott: in our pipelines we use implicit semaphores quite extensively since it works across branches with no conflicts.	12:52
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 957479: Replace getQuotaUsed with cache https://review.opendev.org/c/zuul/zuul/+/957479		21:51
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul] 957480: Only log quota messages when acted upon https://review.opendev.org/c/zuul/zuul/+/957480		21:52
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul] 957574: Fix connection filter lookups in updateTenantProviders https://review.opendev.org/c/zuul/zuul/+/957574		21:58
@clarkb:matrix.org	corvus: swest digging into ^ I have a lot of thoughts.... First up I'm not convinced that test case is really testing much for us for two reasons. The first is that Launchers with connection_filters short circuit in `updateTenantProviders` here: https://opendev.org/zuul/zuul/src/commit/8a25d07f6f82372e0859b99d4a8ac98117e3dff6/zuul/launcher/server.py#L2233-L2236 and we don't have any connection names that match 'nope' in our Launcher configs. Second we're not asserting any state at the end of the test. I think all we're doing is forcing the request to be handled by that specific launcher then waiting for the launcher to stop after we stop it. Internally in the log (https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_da7/zuul/da75b459d7654f7e929bf532b1cfc351/testr_results.html) we can see the request fails due to quota problems.	22:05
@clarkb:matrix.org	I think it might be a good idea to reconsider what we're trying to test there. My change makes a stab at half of what i think we need to test: that we're able to pick launchers for endpoints and the other half is that we actually successfully boot instances using matching connection filters.	22:06
@clarkb:matrix.org	Then in addition to the test case I also found we're doing some very similar work in several places around the launcher: There is `LauncherApi.getMatchingRequests()` (this uses the `launcher_score()` function monkey patched in the test), `Launcher._chooseLauncherForEndpoint()`, and `CleanupWorker.getMatchingEndpoints()`. These last two do not use `laucher_score()` whcih I found confusing with the existing test case as it seems like we're only half patching things. I wonder if can/should simplify things by having a common place to lookup endpoints that apply to the current launcher?	22:11
@clarkb:matrix.org	Finally, looking in that test log I note that we have a mixtures of `zuul.Launcher`, `zuul.Launcher-0`, and `zuul.Launcher-1` log messages. I suspect that in production this isn't a big deal because we're always running one Launcher? But within the test cases it makes for some confusing log messages to have `zuul.Launcher` messages that belong to one of the two Launchers. I think the reason for this is we set the thread name for Launcher.run's thread to "Launcher". In theory we can update that value for more consistency and differentiation?	22:15
@clarkb:matrix.org	* Finally, looking in that test log I note that we have a mixtures of `zuul.Launcher`, `zuul.Launcher-0`, and `zuul.Launcher-1` log messages. I suspect that in production this isn't a big deal because we're always running one Launcher? But within the test cases it makes for some confusing log messages to have `zuul.Launcher` messages that belong to one of the two Launchers. I think the reason for this is we set the thread name for `Launcher.run`'s thread to "Launcher". In theory we can update that value for more consistency and differentiation?	22:23
@jim:acmegating.com	Clark: those are a lot of thoughts. i think with your fix, the test is sufficient. it tests that one of the launchers handled the request. i agree, the test would be slightly better if we assert the request was fulfilled, which it should be under normal circumstances. but even with the failure you observed, what the test was intended to cover was still actually tested: the one launcher out of the two that could handle it did. but it's trivial to add a check that the request is fulfilled, so let's do that.	22:58
@jim:acmegating.com	Clark: it's worth noting that the _chooseLauncherForEndpoint method is only used for one thing: nominating a launcher to do the initial quota lookup for a provider (so they don't do a thundering herd). it's not used to determine what launcher should handel a request, which is why the failure you looked at was quota related.	23:00
@clarkb:matrix.org	ah	23:02
@clarkb:matrix.org	my main concern with having several methods there was around them calculating a similar score in three different places the nwe only patch one. but I guess it doesn't matter since as you say its for quota (and the other is for cleanup) which isn't going to affect this?	23:02
@jim:acmegating.com	yep. i think there's probably some opportunity to DRY that method, but i don't think it matters here.	23:05
@jim:acmegating.com	and yeah, the log messages are just for humans reviewing the test results to have some clue as to which of the two is doing it. i don't think it's worth a lot of effort. if we need more differentiation in that test, we can override more things, but when i was writing the test, i thought that was enough. certainly we don't need to do anything like that for prod.	23:07
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed on behalf of Clark Boylan: [zuul/zuul] 957574: Fix connection filter lookups in updateTenantProviders https://review.opendev.org/c/zuul/zuul/+/957574		23:08
@clarkb:matrix.org	ya it just took me a minute to figure out which launcher the exception originated in	23:08
@jim:acmegating.com	Clark: ^ that'll check the result	23:08
@clarkb:matrix.org	corvus: I'm not sure it actually gets fulfilled unless an error result is considered fulfulled?	23:09
@jim:acmegating.com	Clark: it does normally -- you saw a quota error because of the bug you fixed	23:09
@clarkb:matrix.org	oh!	23:10
@clarkb:matrix.org	got it. We failed to update tenant info which led to broken quota	23:10
@jim:acmegating.com	yep	23:10
@jim:acmegating.com	the one launcher handled the request and failed it, which means the connection filter still worked, just not the quota init	23:10
@jim:acmegating.com	which is why i say the test was still actually testing that part of things.	23:10
@jim:acmegating.com	now with your change (and also that assert) it also tests the quota part. :)	23:11

Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!