Tuesday, 2024-02-13

corvus	tonyb: which ze did that job run on?	01:50
tonyb	I'm not sure. I found the logs by looking at them all `for i in 01 ..... 12 ; do ...`	01:52
corvus	ze03	01:53
tonyb	How'd you find that so quickly? ... just experience?	01:54
tonyb	just .... like its a small thing	01:54
corvus	oh nope not that one	01:54
corvus	you can do an ad-hoc ansible command to grep	01:55
corvus	it's ze08	01:56
tonyb	Okay	01:56
corvus	tonyb: the exceptions don't end up with build tags, so if there is one, you have to read the log and search for it	01:57
corvus	2024-02-13 00:29:53,859 ERROR zuul.AnsibleJob: Exception: Variable names may only contain letters, numbers, and underscores	01:57
tonyb	Ahh okay. That makes sense	01:59
tonyb	I think maybe I found it.	02:01
tonyb	Thanks	02:01
corvus	\o/	02:01
*** tosky_ is now known as tosky		09:48
*** Adri2000_ is now known as Adri2000		13:42
opendevreview	Rodolfo Alonso proposed openstack/project-config master: Implement "neutron-unmaintained-core" group https://review.opendev.org/c/openstack/project-config/+/908911	15:32
opendevreview	Merged zuul/zuul-jobs master: Introduce LogJuicer roles https://review.opendev.org/c/zuul/zuul-jobs/+/899212	15:41
TheJulia	o/ folks, looks like https://zuul.opendev.org/t/openstack/status has a hung job on queued docs..... any insight or is there any way to clear it out?	15:57
clarkb	TheJulia: generally those get stuck due to being unable to provision the node type for the job. I've got a meeting now but can look at logs afterwards to see exactly why it is stuck	16:01
opendevreview	Merged zuul/zuul-jobs master: Add zuul-tenant-conf-check role/job https://review.opendev.org/c/zuul/zuul-jobs/+/907363	16:09
clarkb	the status page does say it is waiting on a node request. If I had to guess this is fallout from the restarts of nodepool related to some recent bugfixes to nodepool. cc corvus I think we may have seen this before where a node request gets lost but it happened over the holidays and so we didn't have logs for it.	16:31
clarkb	corvus: I haven't dug up logs yet though, but my hunch is this is that same issue. This time we should hopefully have logs since it happened recently	16:36
clarkb	The rax-dfw provider is trying to process the node request and has been since 2024-02-12 20:09:30,572 DEBUG nodepool.PoolWorker.rax-dfw-main: [e: 4369e4e61c504735bc2f32c589152357] [node_request: 300-0023568458] Locking request	16:58
clarkb	It has been failing in a loop on not enough quota remaining so it pauses and retries	16:58
clarkb	rax-dfw is not launching any nodes	16:59
clarkb	doing a server list against that provider is taking a very long time...	17:02
clarkb	trying to double check the values grafana and nodepool are reporting against what the cloud reports	17:02
clarkb	I suspect we may need to set max-servers in this region to 0 though while we figure this out	17:02
clarkb	ok ya there are many servers in server list that are not known to nodepool. I'll try manually deleting them first and if that doesn't work we can set max-servers to 0 and file a ticket	17:04
fungi	chances are they're undeletable and we'll have to get rax support to clean them up. not the first time this has happened in the past few months	17:05
clarkb	ya	17:06
clarkb	there are a few listed as active and I'm starting with them rather than the ones already in error or deleting	17:06
clarkb	nodepool reports all the servers it knows about in that region should be deleting already so should be safe to delete anything off the list I generated. Now that I'm trying to delete stuff new servers may be booted though	17:06
fungi	note that you should put in a wait between asking nodepool for a list of instances it knows about and then asking the cloud for a list	17:10
fungi	nodepool may be waiting for instances to come active, and won't know the uuids for those until they boot, so you could race it	17:11
clarkb	ya in this case it has been steady state for a while (almost 10 minutes)	17:12
clarkb	also I don't think any of the 30 something deletes I've done have made any changes	17:13
clarkb	oh maybe I'm wrong	17:14
clarkb	I see the building graph jumping up	17:14
clarkb	I guess I'll continue with the deletions	17:14
corvus	clarkb: do they look like leaked nodepool nodes? do they have any metadata?	17:26
clarkb	corvus: yes they are all named np000XYZ...	17:26
clarkb	I'll check the metadata on a server after the current mass delete ends (enough are failing I'll still have examples)	17:26
corvus	++	17:27
corvus	clarkb: then my next question (assuming that there is no metadata) is whether these nodes were ever used. as in: if the cloud is messing up the metadata association, is it on creation (so we never got those nodes to be used) or on deletion (we used them and told the cloud to delete, and it deleted the metadata but not the node)	17:28
clarkb	TheJulia: the cleanups I'm doing above allowed your node to be scheduled and you should have a zuul report on the change now	17:28
corvus	can probably answer by grepping for one of those node ids	17:28
clarkb	corvus: looks like some are deletes failing because multiple nodes have the same name	17:28
TheJulia	clarkb: thanks!	17:28
clarkb	which is a surprising state considering that nodepool largely runs single threaded per provider?	17:29
clarkb	corvus: I'm not sure I'll have time to do that debugging for a while. I have two more meetings this morning. One of which I'm running	17:29
TheJulia	from a 30k ft guess, I bet something told it was deleting but it didn't actually get deleted	17:29
corvus	clarkb: we'll reuse the name if we retry launching a node. but we perform delete operations based on nova id so it shouldn't care	17:29
clarkb	ya also server list doesn't show me duplicates	17:30
clarkb	so wherever that duplicate is isn't exposed to us	17:30
corvus	TheJulia: that sounds likely, but we expect lies like that from the cloud so we also use metadata to detect leaks. the worst thing is if the cloud deletes the metadata but not the instance (or conversely, creates an instance and doesn't attach metadata). the only resolution to that is human intervention.	17:31
clarkb	corvus: 8624c16c-97a0-435b-93e3-dfcf048f4e6a is np0036475154 and openstack says there is a duplicate. There is no metadata on this server	17:32
clarkb	10f0b06b-7019-4792-a3db-b76ab2ab2f3f was np0036473596 and had metadata. I manually deleted it though	17:33
clarkb	Makes me wonder if nodepool isn't able to delete things that have duplicates for some reason	17:33
clarkb	since it should be a deletable node with the metadata	17:33
corvus	clarkb: if it had metadata and wasn't detected as a leak, that sounds like a nodepool bug worth exploring. might want to keep it around if you find another with metadata.	17:33
clarkb	corvus: ack	17:34
corvus	clarkb: i don't see any info about the lifecycle of np0036475154 before we started trying to delete it; i expect those logs are gone	17:36
corvus	grep 0036475154 /var/log/nodepool/* | grep -v Exception | grep -v cache | grep -v Delete | grep -v delete is what i used	17:36
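The grep pipeline corvus describes can be sketched in Python as a single filter. This is only an illustration of the same idea: the function name `lifecycle_lines` and the `EXCLUDED` tuple are made up here, and the excluded words are simply the ones from the `grep -v` stages in the message.

```python
from typing import Iterable, Iterator

# Words whose presence drops a line, mirroring the chained `grep -v` stages.
EXCLUDED = ("Exception", "cache", "Delete", "delete")


def lifecycle_lines(lines: Iterable[str], node_id: str) -> Iterator[str]:
    """Yield lines mentioning node_id that contain none of the excluded words."""
    for line in lines:
        if node_id in line and not any(word in line for word in EXCLUDED):
            yield line
```

Feeding it the lines of the nodepool log files with `node_id="0036475154"` would leave only the messages that are not delete or cache noise, if any such messages still exist after log rotation.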
clarkb	corvus: ya the newest one I've got is from january 21	17:36
clarkb	I think we may have rotated logs on most/all of these	17:36
clarkb	67ad5b7b-1667-4923-8470-1b274460fba0 is the newest one np0036478059	17:37
clarkb	it was created at 2024-01-21T11:24:29Z, had a fault at 2024-01-21T12:13:39Z, then was last updated at 2024-01-21T12:15:14Z	17:37
corvus	my question about the lifecycle is mostly curiosity / to help characterize what i expect is a cloud failure and not that important. but any ongoing failure to delete leaked nodes with metadata is potentially actionable.	17:37
TheJulia	corvus: still seems like that could be reconciled out of potentially :( Anyway, Thanks!	17:37
corvus	TheJulia: with heuristics we could automatically delete leaked nodes without metadata, but considering how quickly this system can do great damage, we avoid doing stuff like that with heuristics.	17:38
fungi	TheJulia: a common problem we've run into is that something times out between bits of the cloud during server creation, and the metadata nodepool relies on never gets added to the instances	17:39
corvus	a mistake there could delete the entire dev infrastructure in a blink. :)	17:39
TheJulia	corvus: different tenants maybe :)	17:39
TheJulia	fungi: ugh	17:39
corvus	TheJulia: opendev uses different tenants, generally, for that purpose. but not everyone is as lucky as opendev in that regard, and even opendev in some cases has had trouble getting multiple tenants from clouds. we avoid using those clouds for infrastructure.	17:40
fungi	actually, thinking back to when i looked into it last time, openstacksdk adds the metadata as a separate api call? so if it gives up waiting for the server instance then the metadata is never added	17:40
fungi	though i may also be confusing this with similar leaks we see with image uploads	17:41
TheJulia	corvus: true	17:41
clarkb	corvus: np0036478700 is also from january 21, but it has metadata	17:41
clarkb	corvus: though looking at it it may be a stuck delete /me looks at nodepool	17:41
corvus	clarkb: yeah it's trying to delete that one still	17:42
clarkb	ya it is	17:42
clarkb	I'm going to try manually deleting it since we know nodepool is trying now	17:42
clarkb	I don't expect it to actually go away but worth a shot in case the openstack client gives me any useful info	17:42
corvus	++ nodepool is only getting a timeout	17:43
corvus	clarkb: any nova error state set?	17:43
clarkb	corvus: {'message': 'MessagingTimeout', 'code': 500, 'created': '2024-01-21T16:08:45Z'}	17:44
clarkb	it's possible that we need an admin to reset the task state so that a delete will actually be reattempted	17:44
clarkb	corvus: I found one e3f0d035-71f9-4bdb-8a5d-e576de2ead87 is np0036476592 (duplicate reported) and has metadata but is not being deleted by nodepool	17:45
clarkb	corvus: possibly because the state for this node is DELETED. Maybe we need to retry deletes until servers stop being listed?	17:45
clarkb	corvus: so I suspect that the vast majority of these are a cloud side issue but maybe this subset is possible to have nodepool cleanup?	17:45
corvus	clarkb: that should already be the case	17:46
corvus	lemme look up the zk record for that one	17:46
clarkb	ok I need to context switch now. There are quite a few nodes that are still leaked. I'll make a record of them in my homedir on bridge to distinguish them from the running nodes	17:47
corvus	clarkb: oh i see, the cloud state is deleted; yeah i think our understanding of that is that it doesn't affect quota and we can ignore it, so we don't try deleting nova nodes that are deleted	17:48
corvus	if we think this is a problem, we could probably do as you suggest and keep deleting the deleted nodes	17:48
clarkb	corvus: I guess it isn't clear to me if that node is counting against our quota. But it continues to show up in listings several weeks later. I think it might be a good idea to keep trying simply to clear out the noise as much as possible making it easier to see the actual problems (assuming they don't count against quota)	17:49
clarkb	corvus: nodes in a DELETED state are the only ones that have leaked that I see with metadata so far fwiw	17:49
clarkb	ok file with list of leaked nodes is in my homedir if anyone else wants to look at it and avoid touching nodes that may be in use	17:50
corvus	clarkb: i don't disagree, but one thing to consider is it's trading one kind of noise for another (noisy nodepool logs)	17:52
fungi	okay, so there are two categories of leaks present: those stuck in deleting/error and those missing metadata?	17:52
fungi	that's fun	17:52
clarkb	fungi: three I've seen so far. deleted with metadata now ignored by nodepool intentionally, error with metadata not ignored by nodepool but cloud fails to delete them, and error/active/build with no metadata that nodepool ignores as a result. The first and third categories have been manually deletable if you use uuids to get around duplicate node complaints	17:53
corvus	if our previous assessment that nova-deleted nodes don't count against quota still holds, then category #1 is only an admin annoyance, but not an operational impediment. if that doesn't hold true anymore then i definitely think we should continuously delete them	17:56
clarkb	well #3 is a problem too because it uses up our quota then jobs get stuck. But I'm not sure nodepool can do much to solve them (so admin problem not a nodepool problem)	17:58
corvus	yes, i was trying to clarify that those 3 classes of "leaked nodes" may not all be causing operational issues. #3 clearly is and i think #2 probably is (but less sure), and i think #1 is not (unless something changed since the last time we evaluated that behavior)	18:00
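The three categories of leaked node discussed above can be written down as a small classifier. This is a sketch for triaging a server listing by hand, not nodepool's own logic; the names `LeakCategory` and `classify_leak` are hypothetical, and the status strings are assumed to match the states named in the chat (deleted, error, active, build).

```python
from enum import Enum
from typing import Optional


class LeakCategory(Enum):
    DELETED_WITH_METADATA = 1  # nodepool ignores these on purpose
    ERROR_WITH_METADATA = 2    # nodepool retries, the cloud fails to delete
    NO_METADATA = 3            # nodepool cannot recognise these as its own


def classify_leak(status: str, has_metadata: bool) -> Optional[LeakCategory]:
    """Map a server's status and metadata presence to one of the three categories."""
    status = status.upper()
    if not has_metadata:
        if status in {"ERROR", "ACTIVE", "BUILD"}:
            return LeakCategory.NO_METADATA
        return None
    if status == "DELETED":
        return LeakCategory.DELETED_WITH_METADATA
    if status == "ERROR":
        return LeakCategory.ERROR_WITH_METADATA
    return None  # e.g. an ACTIVE server with metadata is probably in use
```

Per the discussion, category 3 clearly consumes quota, category 2 probably does, and category 1 is thought not to.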
fungi	unrelated to anything else, https://zuul.opendev.org/t/openstack/status shows a change which is about to clear the gate, and has a null queue name for it	18:05
fungi	wondering what could lead to that	18:06
clarkb	corvus: gotcha	18:06
corvus	fungi: not assigned to a named queue, so it's only in its automatically created per-project queue	18:09
fungi	corvus: interesting, other changes show the project name as the queue name in that situation rather than just a blank	18:09
corvus	might be due to the backwards compat handling?	18:10
fungi	in this case it was for an openstack/openstack-ansible change (now it's merged so no longer showing there), but i couldn't find any queue name set in the project's config	18:10
opendevreview	Lajos Katona proposed openstack/project-config master: Implement "neutron-unmaintained-core" group https://review.opendev.org/c/openstack/project-config/+/908911	18:36
fungi	i just noticed https://github.com/lxc/lxc-ci because someone announced plans to package it in debian... looks very similar to dib, but focused on just creating lxc images instead of virtual machine images	18:47
clarkb	interesting that lxc wouldn't do dockerfile like builds (don't have to use docker)	18:50
fungi	they also don't seem to have as much of a mix-in model like how dib elements work, and more repetition between related distros as a result	18:53
opendevreview	Merged opendev/system-config master: Check launched server for x86-64-v2/sse4_2 support https://review.opendev.org/c/opendev/system-config/+/908512	19:17
opendevreview	Merged opendev/zone-opendev.org master: Switch the keycloak CNAME to the new server https://review.opendev.org/c/opendev/zone-opendev.org/+/908357	19:24
opendevreview	James E. Blair proposed zuul/zuul-jobs master: Remove command.warn usage https://review.opendev.org/c/zuul/zuul-jobs/+/908671	19:27
opendevreview	Steve Baker proposed openstack/diskimage-builder master: Add setuptools for python3.12 support in venvs https://review.opendev.org/c/openstack/diskimage-builder/+/902497	19:30
fungi	having not used the zuul web admin access before now, is the "sign in" button in the top-right corner supposed to do anything? i tried it a few times a while back and it doesn't seem to ever do anything when i click that	20:06
fungi	tried it from multiple browsers too, i don't think there's a popup blocker breaking it	20:07
fungi	i would have expected it to send me to the configured openid provider	20:08
Clark[m]	Did we update the URL to drop /auth/ yet?	20:09
Clark[m]	It may be doing a request in the background that fails. Browser debugger may help	20:09
tonyb	404 openid-configuration.js	20:10
opendevreview	Merged opendev/system-config master: Update Zuul auth config for new Keycloak images https://review.opendev.org/c/opendev/system-config/+/908353	20:11
tonyb	sorry the 404 is on openid-configuration XHR from oidc-client.min.js:1	20:13
tonyb	```status#system-config:1 Access to XMLHttpRequest at 'https://keycloak.opendev.org/auth/realms/zuul/.well-known/openid-configuration' from origin 'https://zuul.opendev.org' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource.```	20:15
Clark[m]	The /auth/ prefix is at least part of the problem. Maybe it sets cors headers when hitting the valid path?	20:16
tonyb	Yeah I get data without the /auth/ in the url. I admit I have no idea how to test/debug CORS stuff	20:17
fungi	okay, so in theory it will work once 908353 is in use	20:17
fungi	we probably need to restart zuul-web for that?	20:17
fungi	since the change is to zuul.conf, which i don't think gets read live	20:18
fungi	huh, when did zuul start leaving comments like "1 is the latest approved patch-set. No files were changed between the latest approved patch-set and the submitted	20:20
fungi	one."	20:20
fungi	note the required votes were all on patch set #2	20:20
fungi	was that part of the circular deps refactor?	20:21
Clark[m]	That's from Gerrit I think	20:22
Clark[m]	All zuul is doing is clicking the submit button and then Gerrit records a comment on your honor	20:22
fungi	oh, right that's the gerrit comment on zuul's behalf	20:22
fungi	so anyway, 908353 hasn't deployed yet. once it does i should restart each zuul-web container in turn?	20:24
Clark[m]	Yes I think that would be the next steps	20:24
tonyb	fungi: Yup sounds good to me	20:24
fungi	it deployed seconds after i said that	20:29
fungi	issuer_id looks correct in the configs of both servers so downing/upping their containers one at a time now	20:30
fungi	mmm, i probably should have waited longer between those	20:32
fungi	sorry about that	20:32
tonyb	I get redirected to keycloak now when clicking the 'sign in' button	20:37
fungi	yep, the webui seems to be back up and working again	20:38
fungi	and yes, the sign in button is also working for me	20:38
tonyb	\o/	20:39
fungi	unfortunately, when signing in with my account credentials, i get "login in progress, you will be redirected shortly..." for what seems like is probably forever	20:41
opendevreview	Jeremy Stanley proposed opendev/system-config master: Document adding Zuul WebUI admins https://review.opendev.org/c/opendev/system-config/+/908949	20:44
fungi	everything's working for me up to the very last step, signing into zuul	20:44
fungi	i probably need to check the zuul-web logs for errors	20:45
fungi	for me it's spinning forever on the https://zuul.opendev.org/auth_callback that keycloak redirects to	20:49
tonyb	eeek gotta do the school run	20:50
fungi	okay, javascript console in my browser says this:	20:54
fungi	Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at https://keycloak.opendev.org/realms/zuul/protocol/openid-connect/userinfo. (Reason: CORS header ‘Access-Control-Allow-Origin’ missing). Status code: 200.	20:54
fungi	is that going to be on the zuul side or the keycloak side?	20:56
clarkb	which server responded with that error?	20:57
clarkb	you should see it in the network debugger with clearer network paths	20:57
fungi	aha, thanks, i'm entirely unfamiliar with debugging browser-based stuff	20:58
fungi	aside from basic html/xhtml/sgml	20:58
clarkb	but I think that may be zuul saying it blocked the content from keycloak based on the same origin policy	20:58
clarkb	I'm not sure how to address that. It would be something in react maybe?	20:58
fungi	looking at the network trace, the last response is from zuul yes	20:59
fungi	i wonder why it wasn't a problem with the old keycloak server	20:59
fungi	i see we explicitly set Access-Control-Allow-Origin in our vhost configs for gitea, graphite and jitsi-meet	21:00
clarkb	basically I think the webbrowser is saying zuul.o.o can't trust keycloak.o.o because of the policy. Which is slightly different than the CORS policy which is the server saying "you can use this elsewhere"	21:00
clarkb	oh except maybe keycloak needs to respond with access-control-allow-origin ?	21:00
fungi	https://zuul-ci.org/docs/zuul/latest/howtos/openid-with-keycloak.html#create-a-client covers setting "web origins" in keycloak, which should cover that	21:01
clarkb	you should see that in your network trace	21:03
clarkb	for the request responses from keycloak	21:03
fungi	web origins for the client i created in the zuul realm is set to https://zuul.opendev.org/	21:04
fungi	description for that field in the form is "Allowed CORS origins. To permit all origins of Valid Redirect URIs, add '+'. This does not include the '*' wildcard though. To permit all origins, explicitly add '*'."	21:04
clarkb	I would double check you see that in your web browser dev tools network trace	21:04
fungi	it'll be in a response header, right? still trying to figure out how/where it exposes those	21:06
clarkb	yes	21:06
clarkb	if you click on a line entry in the network trace it should open a more details very for that request and response	21:06
clarkb	that will include header info	21:07
clarkb	*more details view	21:07
fungi	aha, thanks	21:07
fungi	referrer policy is strict-origin-when-cross-origin	21:07
fungi	Access-Control-Allow-Origin	21:08
fungi	is *	21:08
clarkb	which isn't https://zuul.opendev.org/	21:09
clarkb	but also should be sufficient to allow things to happen	21:09
fungi	that's coming from zuul.o.o though	21:09
clarkb	oh you need to find the requests to and from keycloak	21:09
fungi	aha, yeah i had to redo the login rather than just refreshing	21:12
fungi	it reports the origin as https://zuul.opendev.org but no Access-Control-Allow-Origin header	21:13
clarkb	ok that is likely the problem. We need the access control headers for the browser to do the right thing	21:13
clarkb	maybe there is a different setting field in keycloak now that we have to set?	21:14
fungi	however, an earlier request to keycloak.opendev.org is returning "Access-Control-Allow-Origin: https://zuul.opendev.org"	21:14
fungi	just not the request that it's getting stuck on	21:14
fungi	https://keycloak.opendev.org/realms/zuul/protocol/openid-connect/certs has the expected header, https://keycloak.opendev.org/realms/zuul/protocol/openid-connect/userinfo does not	21:16
clarkb	https://keycloak.discourse.group/t/access-control-allow-origin-header-missing/328/28 says they had to remove trailing slashes from the origin	21:17
fungi	gah	21:17
clarkb	that seems unlikely to be the problem, but maybe that is it?	21:17
fungi	that was entirely it. thanks	21:19
clarkb	wow	21:19
fungi	i'll adjust my docs update for zuul	21:19
clarkb	that seems like a bug in input validation for keycloak if they don't want to accept a trailing /	21:19
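Why a trailing slash would matter can be illustrated with a short sketch. A browser's `Origin` header carries only scheme, host and optional port, with no path, so a configured value of `https://zuul.opendev.org/` never equals it under an exact string comparison. The code below assumes that kind of exact comparison purely for illustration; it is not Keycloak's implementation, and `to_origin` and `origin_allowed` are hypothetical helper names.

```python
from typing import Iterable
from urllib.parse import urlsplit


def to_origin(url: str) -> str:
    """Reduce a URL to scheme://host[:port], dropping any path or trailing slash."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def origin_allowed(request_origin: str, configured: Iterable[str]) -> bool:
    """Exact string match, as a strict (assumed) server-side check might do."""
    return request_origin in set(configured)


browser_origin = "https://zuul.opendev.org"
as_entered = ["https://zuul.opendev.org/"]
normalised = [to_origin(value) for value in as_entered]
```

With these values, `origin_allowed(browser_origin, as_entered)` is false while `origin_allowed(browser_origin, normalised)` is true, which matches the behaviour fungi saw before and after removing the slash.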
clarkb	I've just sent the ansible 6 removal announcement email	21:19
fungi	infra-root: new keycloak server is now in production, instructions as to how to add your user are provided by https://review.opendev.org/908949 (and eventually by our system-config docs once that merges)	21:25
clarkb	thanks I'll review that change and the zuul docs update now	21:26
clarkb	fungi: mhu has some comments on the zuul change	21:27
fungi	https://github.com/keycloak/keycloak/issues/25522 seems to be the corresponding bug report for that	21:27
fungi	clarkb: thanks, forgot to address those when updating	21:30
fungi	should be covered now	21:30
clarkb	I'm going to pop out soon for a bike ride. We're going to get hit by another atmospheric river/pineapple express tomorrow and I can see blue skies right now	21:30
clarkb	just a heads up I'll be afk for a bit this afternoon to take advantage of the weather	21:31
fungi	we very narrowly escaped flooding today from the wind storm that just plowed through	21:31
clarkb	is that the same storm as the nor'easter hitting the areas north of you?	21:31
fungi	probably. a few inches deeper at high tide and we'd have had some interior cleanup to deal with	21:31
clarkb	wow	21:31
clarkb	when I get back I'll look into the rest of that rax dfw cleanup that we can do without admin perms	21:32
fungi	thankfully it only just reached the edge of our patio on the waterfront side	21:32
clarkb	I finally made it up into the hills over the weekend and the destruction from our ice storm last month is still very evident in areas that were hit hard. One very large multistory home had an entire corner just sliced/ripped off	21:33
clarkb	the massive rootball of a tree near the house gave me an idea of how big the tree must've been to do that	21:33
clarkb	and there is still a fairly large tree hanging off utility lines next to the commuter trail	21:34
fungi	oh yes, trees are extremely strong, but add an inch-thick coating of ice to their branches and they topple like (insanely heavy) toothpicks, taking out everything in their path	21:34
fungi	especially if the soil is also wet and compromised by the same storm	21:35
clarkb	yup pretty classic scenario for trees coming down. There is a big push in our local media to try and rescue the reputation of trees	21:36
clarkb	there is concern that everyone will chop down all the trees now. But they provide a lot of benefits like defending against urban heat islands and so on	21:36
fungi	also preventing or at least limiting erosion	21:37
fungi	chop down all the trees, and you guarantee mudslides. saw it all over growing up, on mountainsides that had been clearcut	21:37
clarkb	ya and trees work together to block wind. If you thin them out too much those that are left may be at higher risk	21:38
fungi	once the trees are gone, there's nothing to hold all the soil and boulders onto the substrate. so it plows downhill at insane speeds and turns whole subdivisions into parking lots	21:38
clarkb	it's all about balance	21:38
clarkb	alright working on popping out now. Back later	21:42
fungi	next (non-urgent) question for those who have used the admin functions of zuul's webui: the docs suggest that after logging in i should see the option to create an autohold, but where? it shows me logged in (displays my username in the top-right corner), yet neither the tenants list nor the autoholds page for a tenant seems to give any such option. what am i missing?	21:44
fungi	the docs suggest it should appear on the tenant-specific autoholds page	21:45
fungi	makes me wonder if i missed adding some sort of authorization, and by default my account is unprivileged	21:48
corvus	fungi: yeah, in opendev we set up an expectation that there would be groups with the tenant name and you'd be in that group	22:23
corvus	fungi: to have admin perms, you either need to be in a group with the exact name of the tenant, or in a group named `infra-root` you probably want the latter	22:25
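The rule corvus states here reduces to a simple membership test. The sketch below only restates that rule as described for opendev's tenant config; it is not how Zuul evaluates its authorization rules internally, and `is_tenant_admin` is a hypothetical name.

```python
from typing import Iterable


def is_tenant_admin(groups: Iterable[str], tenant: str) -> bool:
    """True if the user is in a group named exactly like the tenant, or in infra-root."""
    group_set = set(groups)
    return tenant in group_set or "infra-root" in group_set
```

A user with no groups at all, which is what fungi's account looked like at this point, fails the check for every tenant.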
corvus	i'm going to create the infra-root group now	22:31
corvus	and we need a mapping too; i'll set that up	22:38
fungi	corvus: oh, thanks! that would explain exactly what i saw. where is that expectation reflected in the zuul configuration? somewhere i'm overlooking in zuul.conf?	22:56
corvus	fungi: in main.yaml -- the tenant config	22:57
fungi	aha. i was looking in entirely the wrong place	22:58
corvus	fungi: okay i did 2 things: first i added a mapper for groups like we just discussed: https://keycloak.opendev.org/admin/master/console/#/zuul/clients/b4ed13af-2692-4821-a06e-03a2f356b7f3/clientScopes/dedicated	22:58
fungi	i should probably mention this in https://review.opendev.org/c/opendev/system-config/+/908949 too	22:58
corvus	that's specific to how we want to set up authz for opendev, so yeah, that probably belongs in that doc but not the zuul tutorial	22:59
corvus	there was one other thing missing that probably belongs in the tutorial; after creating a client scope for the audience, you need to add it to the zuul client config	22:59
corvus	fungi: that's here: https://keycloak.opendev.org/admin/master/console/#/zuul/clients/b4ed13af-2692-4821-a06e-03a2f356b7f3/clientScopes	23:00
fungi	awesome. i can cover the custom mapper/group creation in keycloak.rst and adding users to the infra-root group in sysadmin.rst, i guess	23:01
fungi	i'll do that tomorrow between pre-ptg sessions	23:01
corvus	fungi: basically on client details page, go to "client scopes" tab, click the "add client scope" button, check "zuul_aud", click "add", then "default"	23:01
fungi	perfect, i'll update the zuul docs change with that too. thanks!	23:02
corvus	fungi: alternatively, i think you could do what i did with the groups, which is rather than defining a standalone client scope and then adding it to the client; i think you could add a dedicated client scope.	23:02
corvus	fungi: since i hadn't fully explored this, we now have a standalone scope for zuul_aud and a dedicated scope for groups; if that bothers anyone else's ocd, we might want to pick one of those two styles and stick with it	23:03
corvus	i'm fine with either	23:03
corvus	fungi: since i'm here, i added you to the infra-root group	23:03
corvus	i'm done with the keycloak admin ui now	23:04
fungi	thanks corvus!	23:04
fungi	signing out of and back into zuul, i see a link to create an autohold now	23:05
fungi	i also have dequeue and promote options in the status view	23:06
fungi	i'll get the various docs changes fixed up for this tomorrow	23:06
corvus	there's also a little wizard hat if you open up the user info dialog	23:06
corvus	should say "Logged in as: fungi <wizard hat>"	23:07
fungi	i see the hat! i have a hat!	23:07
fungi	very cute	23:07
fungi	corvus: the doc in zuul already does cover adding zuul_aud to default. did it still show up as "none" for the assigned type?	23:10
fungi	the ui around that part seems to have changed somewhat between the version the original steps were written for and now, so i had to adapt it a little, but maybe i misunderstood what i was looking at and adapted incorrectly	23:11
fungi	or maybe i just missed clicking a "save" button somewhere there	23:12
corvus	i didn't see zuul_aud added as a client scope to the zuul client; the zuul client had no client scopes; yeah maybe a missed save	23:12
corvus	fungi: hrm i see the text about adding it as a scope, but i think it's missing an explicit "Add" step	23:13
corvus	not quite sure how to reconcile that; might need a clean run through?	23:13
fungi	oh, got it. yeah this talks about setting the assigned type for the zuul_aud client scope to default	23:13
fungi	in the client scopes list	23:14
fungi	i do still have a held node i can override my name resolution for and go through the same setup steps on	23:14
corvus	yeah; when you add it, you also select whether it's default or not (i did select default); it almost seems like that's an edit step that's missing the add	23:14
fungi	but can't easily test external interactions from zuul with it of course	23:15
corvus	or maybe the docs are correct for adding a dedicated scope, but you did a standalone scope instead in prod?	23:15
corvus	fungi: you can decode the jwt and check the "aud" field manually	23:15
corvus	it should have "zuul" in it; without the mapper it just says "account"	23:16
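Decoding the token by hand, as corvus suggests, can be done with the standard library alone. This sketch reads the payload segment without verifying the signature, so it is only suitable for inspection; `jwt_claims` and `audiences` are hypothetical helper names.

```python
import base64
import json
from typing import List


def jwt_claims(token: str) -> dict:
    """Decode the payload (second dot-separated segment) of a JWT, unverified."""
    payload_b64 = token.split(".")[1]
    # JWTs strip base64 padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def audiences(token: str) -> List[str]:
    """Return the "aud" claim as a list, whether it was a string or an array."""
    aud = jwt_claims(token).get("aud", [])
    return [aud] if isinstance(aud, str) else list(aud)
```

Per the discussion, `"zuul" in audiences(token)` should hold once the audience mapper is attached to the client, and the list would contain only `account` without it.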
fungi	oh, looking at the diff in 908855 i think i did indeed misinterpret the instructions there when adapting it to the current version. i clicked "client scopes" in the left sidebar and changed the assigned type for zuul_aud there, rather than picking "clients" in the left sidebar and going into the zuul client to change zuul_aud there	23:18
*** dmitriis is now known as Guest2683		23:44
clarkb	I've kicked off a deletion process which should delete the remaining nodes that we can delete from the rax dfw nodepool provider	23:51
clarkb	there are ~25 that have stuck around	23:56
clarkb	a big improvement but still enough that we'll probably want to file a ticket to see if rax can clear out the remainder	23:56
