corvus | thanks; the delta from your last review on keycloak is very small :) | 00:00 |
clarkb | corvus: for the auth token, do we supply the value in the config as our token directly, or do we generate a time-specific token? And if we generate a time-specific token, what prevents anyone from doing that? (I've got the zuul client docs up and they don't seem to speak to this) | 00:02 |
clarkb | oh there it is zuul create-token | 00:02 |
clarkb | zuul create-auth-token | 00:02 |
clarkb | but that still doesn't explain how this is ACL'd https://zuul-ci.org/docs/zuul/reference/client.html#create-auth-token | 00:03 |
corvus | clarkb: my plan is to generate one token with no limit on the lifetime, and stick that in a config file on zuul0X for use with zuul-client | 00:05 |
corvus | if we lose that token, we'll need to change the secret, but we can do that easily | 00:05 |
clarkb | corvus: ok, and what prevents anyone from running zuul create-auth-token and creating another token? | 00:06 |
clarkb | that is what I'm confused about. The command doesn't seem to accept a secret at all | 00:06 |
clarkb | so wondering where the trust is involved | 00:06 |
corvus | clarkb: oh we run that on zuul02 and it reads zuul.conf, so only we can create those tokens | 00:06 |
corvus | and then the acl is here: https://review.opendev.org/820277 | 00:06 |
clarkb | oh this isn't creating a token via the rest api | 00:06 |
clarkb | creating the token is local via the local config. Then the created token can be used via the rest api according to the rules in 820277 | 00:07 |
corvus | correct, 'zuul create-auth-token' is an admin-only command that can only be run with the actual production zuul.conf. | 00:07 |
corvus | clarkb: exactly | 00:07 |
clarkb | ok now to find the docs on those config settings | 00:08 |
corvus | and i set up the rules so we can use a single token for every tenant | 00:08 |
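A minimal sketch of the two pieces being discussed, assuming Zuul's HS256 authenticator driver; the section name, secret, tenant, and rule names here are placeholders, not the actual values from 820277. The `issuer_id` in zuul.conf becomes the token's `iss` claim, and the tenant authorization rule matches on that issuer:

```ini
; zuul.conf (sketch; secret and names are illustrative)
[auth zuul_operator]
driver=HS256
secret=NIZVzc7vw7NxGOANXIWGbQKB5kMHdz7V
issuer_id=zuul_operator
client_id=zuul_web
```

```yaml
# tenant config (sketch): accept any token minted by our issuer
- authorization-rule:
    name: opendev_operator
    conditions:
      - iss: zuul_operator
- tenant:
    name: openstack
    admin-rules:
      - opendev_operator
```

The token itself would then be minted on the scheduler host with something like `zuul create-auth-token --auth-config zuul_operator --tenant openstack --user admin` (flags per the client docs linked earlier; the names are again placeholders).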
clarkb | corvus: if I read the docs correctly you may not actually be planning to use the create-auth-token command and instead manually generate the token? | 00:13 |
clarkb | I'm basing this on the fact that you seem to be relying on the issuer to accept the token via the iss config in 820277. But create-auth-token doesn't seem to allow you to specify that? Or maybe it is automatic based on the zuul-web canonical name | 00:13 |
corvus | it uses the value from the config file (issuer_id) | 00:14 |
clarkb | aha | 00:14 |
clarkb | and we could filter this further by matching sub is admin or similar, but in this case it is sufficient to match the issuer since we are the only issuer | 00:14 |
clarkb | I know we've broken down the zuul docs in terms of tutorials, guides, and reference but digging through docs for this makes me wish that everything was more centralized. | 00:15 |
clarkb | I'll have to think on how to make this stuff more discoverable and easy to read in the docs. I don't have any good ideas other than I wish it wasn't in three different places right now :) | 00:17 |
clarkb | corvus: one last question. The docs say that revoking tokens is not trivial. I assume the process is basically to change the secret in the config? then old tokens won't match anymore? | 00:21 |
clarkb | Basically it is doable but might require a restart? | 00:21 |
corvus | yep | 00:22 |
corvus | (it may as well be a shared-secret system the way we'll be using it -- but this way we don't need to implement another auth mech) | 00:23 |
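To illustrate why this "may as well be a shared-secret system" and why rotating the secret revokes every outstanding token, here is a hand-rolled HS256 JWT in shell (claims, issuer, and secrets are all illustrative; real tokens come from `zuul create-auth-token`):

```shell
# Build header.claims.signature, base64url-encoded, HMAC-SHA256 signed.
b64url() { openssl base64 -A | tr '+/' '-_' | tr -d '='; }
sign() {  # $1 = shared secret
  header=$(printf '{"alg":"HS256","typ":"JWT"}' | b64url)
  claims=$(printf '{"iss":"zuul_operator","sub":"admin"}' | b64url)
  sig=$(printf '%s.%s' "$header" "$claims" \
        | openssl dgst -sha256 -hmac "$1" -binary | b64url)
  printf '%s.%s.%s\n' "$header" "$claims" "$sig"
}
old=$(sign oldsecret)
new=$(sign newsecret)
# Identical claims, different secret => different signature, so every
# token minted under the old secret stops verifying after rotation.
[ "$old" != "$new" ] && echo "old tokens invalidated"
```

The signature is the only part that depends on the secret, which is exactly why revocation is "change the secret and restart" rather than per-token.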
clarkb | cool and then if we only ever leave the token on the servers themselves the risk is roughly the same as the other side of the verification | 00:23 |
corvus | yep | 00:23 |
clarkb | if we move the secret off host we should time scope them | 00:23 |
corvus | yeah. hopefully instead of doing that though we just run keycloak for our local convenience :) | 00:24 |
clarkb | right | 00:24 |
clarkb | I'm just making sure I've got a good grasp of how this works. This has helped, thanks | 00:24 |
clarkb | and for keycloak we'd add a new auth opendevkeycloak entry or similar that is driver openidconnect. Then in our admin rules we would configure it to look for that issuer and probably specific users | 00:25 |
clarkb | and then token issuance happens via keycloak's api | 00:25 |
clarkb | whatever that method might be | 00:25 |
corvus | clarkb: yep, and we can do stuff with keycloak groups and zuul tenants, etc. | 00:26 |
Clark[m] | Related to understanding how things work, John Carmack did a commencement speech recently where he talks about not being afraid to dig into details and actually understand how things work. I'm sure I'll never have as much understanding as he does, but I find that I enjoy technology a lot more when I don't treat it as magic but instead as largely decipherable tools. https://www.youtube.com/watch?v=YOZnqjHkULc | 00:29 |
clarkb | heh and now to the other client. Of course sometimes you wonder if software and computers truly are decipherable :) | 00:30 |
opendevreview | Ian Wienand proposed opendev/system-config master: Update bridge playbook match https://review.opendev.org/c/opendev/system-config/+/820281 | 00:38 |
opendevreview | Ian Wienand proposed opendev/system-config master: Rename install-ansible to bootstrap-bridge https://review.opendev.org/c/opendev/system-config/+/820282 | 00:38 |
opendevreview | Ian Wienand proposed opendev/system-config master: Rename install-ansible to bootstrap-bridge https://review.opendev.org/c/opendev/system-config/+/820282 | 00:45 |
ianw | corvus: sorry, lost it in the scrollback, did you want a full zuul restart, or just schedulers/web? | 00:46 |
corvus | Roll sched and web only | 00:48 |
ianw | ok, i can do that, and gerrit, in say 1-30/2 hrs from now | 00:49 |
corvus | Thx. I won't be around then fyi. Or even now really :) | 00:50 |
ianw | no worries. what could possibly go wrong :) | 00:50 |
ianw | as clarkb says, always good to get into the details | 00:50 |
corvus | You know where the big red reset button is | 00:51 |
*** rlandy|ruck is now known as rlandy|out | 00:55 |
kevinz | clarkb: ianw: Np, I will update today for the Cert. | 02:30 |
ianw | ok going for some restarts | 02:45 |
ianw | ok, https://zuul.opendev.org/t/openstack/build/414cdcdd14bc432c9909c692a3841aed/logs pushed 3.3: digest: sha256:152fc54f4d91f938cfe6bf5a762f129f8716e05a46619a5fe31eaaca5eabd7c5 | 02:48 |
ianw | that matches https://hub.docker.com/layers/opendevorg/gerrit/3.3/images/sha256-152fc54f4d91f938cfe6bf5a762f129f8716e05a46619a5fe31eaaca5eabd7c5?context=explore | 02:48 |
ianw | "RepoDigests": [ | 02:50 |
ianw | "opendevorg/gerrit@sha256:152fc54f4d91f938cfe6bf5a762f129f8716e05a46619a5fe31eaaca5eabd7c5" | 02:50 |
ianw | ], | 02:50 |
ianw | matches on review. so we're ready to restart with 3.3.8 | 02:51 |
ianw | old image is a071a9727a92 | 02:51 |
fungi | lgtm | 02:53 |
ianw | ... and back Powered by Gerrit Code Review (3.3.8-9-g783af24727-dirty) | 02:53 |
fungi | yay! thanks | 02:54 |
ianw | #status log restarted gerrit with 3.3.8 from https://review.opendev.org/c/opendev/system-config/+/819733/ | 02:54 |
opendevstatus | ianw: finished logging | 02:54 |
ianw | now zuul | 02:54 |
*** sshnaidm is now known as sshnaidm|off | 02:57 | |
ianw | zuul/zuul-scheduler <none> 0a216ce83b59 26 hours ago 491MB | 03:00 |
ianw | i'm not so sure about this on zuul01 | 03:00 |
ianw | https://hub.docker.com/layers/zuul/zuul-scheduler/latest/images/sha256-25347323eeaead7f8a8ca27f5b8ffd5ee62dda5ddcb508b30f1b6390727674bb?context=explore | 03:00 |
ianw | is the latest | 03:00 |
ianw | "zuul/zuul-scheduler@sha256:25347323eeaead7f8a8ca27f5b8ffd5ee62dda5ddcb508b30f1b6390727674bb" | 03:02 |
ianw | it's ok, i'm just blind, that's the old image | 03:02 |
ianw | everything matches | 03:02 |
ianw | 2021-12-03 03:02:40,151 DEBUG zuul.CommandSocket: Received b'stop' from socket | 03:03 |
wxy-xiyuan | Hi, @ianw, could you please take a look at https://review.opendev.org/c/openstack/project-config/+/818723 if you're free? When I tried to add openEuler support to devstack, the team suggested that the CI could be ready at the same time. Before I write the job, the node should be ready first I guess. | 03:03 |
ianw | https://zuul.opendev.org/components shows zuul01 init | 03:06 |
ianw | 01 is up, going to restart 02 & web | 03:24 |
ianw | wxy-xiyuan: sorry, i had missed that one. one comment inline | 03:33 |
ianw | z2 scheduler up, restarting web now | 03:39 |
wxy-xiyuan | @ianw, big thanks! will reply soon | 03:40 |
opendevreview | wangxiyuan proposed openstack/project-config master: Add openEuler 20.03 LTS SP2 node https://review.opendev.org/c/openstack/project-config/+/818723 | 03:44 |
ianw | we now seem to have a beaker next to pipelines, which i think means it worked | 03:57 |
ianw | #status log performed rolling restart of zuul01/02 and zuul-web | 03:58 |
opendevstatus | ianw: finished logging | 03:58 |
fungi | https://zuul.openstack.org/status is back as well now that the fix is in place | 04:14 |
*** ysandeep|out is now known as ysandeep|ruck | 04:54 | |
*** pojadhav|afk is now known as pojadhav | 05:00 | |
wxy-xiyuan | ianw, fixed now. :) | 06:06 |
*** raukadah is now known as chandankumar | 06:12 | |
*** ysandeep|ruck is now known as ysandeep|afk | 06:16 | |
ianw | wxy-xiyuan: one other thing; is there a reason it's not added to nl02? as it's a new distro, we could restrict it to the rax servers to start and then roll out everywhere when it is working (i.e. do it in a follow-on) | 07:48 |
ianw | it is easier to debug one thing at a time | 07:48 |
wxy-xiyuan | My thought is to enable it in a small place for testing first. Once it's stable enough, we can add it to everywhere. So I just added it in nl01 for x86 | 07:51 |
opendevreview | Ian Wienand proposed opendev/system-config master: infra-prod: setup system-config on bridge in bootstrap job https://review.opendev.org/c/opendev/system-config/+/820320 | 07:52 |
ianw | wxy-xiyuan: ok; might be worth adding it in a follow-on but mark it wip | 07:53 |
wxy-xiyuan | Sure | 07:53 |
ianw | clarkb / fungi: ^^ that's a bit of a hail mary change and i'll think on it some more; but it feels right. basically, an infra-prod-bootstrap-bridge job should *always* run, and it should a) install the production ansible and b) update system-config to the buildset reference | 07:55 |
ianw | at the moment, we use playbooks/zuul/run-production-playbook.yaml to run "install-ansible.yaml". that actually feels wrong -- that is using the "production" ansible to install ... the production ansible | 07:57 |
ianw | now I don't think you'd actually ever notice unless we wiped out ansible on bridge, but it still feels like we've got a hidden bootstrap problem with that | 07:58 |
ianw | it is an open question for me whether our existing install-ansible role is 100% idempotent. it probably is, but i'd want to investigate | 07:59 |
ianw | by removing the file matcher and making this run unconditionally, i think we have avoided the fundamental problem of the other queues not updating the source | 08:00 |
*** jpena|off is now known as jpena | 08:02 | |
ianw | the DISABLE-ANSIBLE flag also needs integration. I think the place to do that is from setup-keys -- after setting things up so each executor can log into bridge, each job can then check the on-disk flag file on bridge and stop itself before it goes on | 08:02 |
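The flag-file guard described above amounts to a check like the following (the real path on bridge is not stated here, so this sketch uses a temporary stand-in path):

```shell
# Simulate the DISABLE-ANSIBLE guard: a job proceeds only when the
# operator-created flag file is absent.
flag=$(mktemp -u)          # mktemp -u yields a path that does not exist
check() { if [ -f "$flag" ]; then echo disabled; else echo proceeding; fi; }
first=$(check)             # no flag: deployment jobs run normally
touch "$flag"              # operator sets the flag on bridge
second=$(check)            # flag present: jobs stop before doing work
rm -f "$flag"
echo "$first then $second"
```

Running this prints "proceeding then disabled", mirroring the before/after behavior each executor-side check would see.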
*** ysandeep|afk is now known as ysandeep | 08:07 | |
*** ysandeep is now known as ysandeep|ruck | 08:08 | |
opendevreview | Ian Wienand proposed openstack/project-config master: Update the opendev/system-config tag https://review.opendev.org/c/openstack/project-config/+/819715 | 08:16 |
opendevreview | Elod Illes proposed openstack/project-config master: Add rights to neutron-dynamic-routing-stable-maint https://review.opendev.org/c/openstack/project-config/+/820351 | 08:51 |
*** ysandeep|ruck is now known as ysandeep|lunch | 08:58 | |
*** ysandeep|lunch is now known as ysandeep | 10:13 | |
*** ysandeep is now known as ysandeep|ruck | 10:16 | |
opendevreview | Arnaud Morin proposed openstack/project-config master: Disable nodepool temporarily https://review.opendev.org/c/openstack/project-config/+/820369 | 10:44 |
*** rlandy_ is now known as rlandy|ruck | 11:14 | |
*** arxcruz|rover is now known as arxcruz | 12:43 | |
fungi | ianw: i've pretty much always assumed there were bootstrapping gaps for bridge.o.o, but i agree it would be good to close them where we can | 13:08 |
*** pojadhav is now known as pojadhav|brb | 13:48 | |
*** pojadhav|brb is now known as pojadhav | 15:20 | |
*** chandankumar is now known as raukadah | 15:29 | |
clarkb | ianw: fungi: left a couple of thoughts on that system-config update. I think it is close to what we want, but needs a few edits. Also as noted we might be able to do it in two stages where we can confirm the first job is doing what we want before we rely on it | 16:08 |
fungi | i'm still thinking through how best to test the hanging newlist call | 16:08 |
clarkb | fungi: Maybe hack up the test input for the test job and undo the "don't send emails" | 16:10 |
fungi | yeah, maybe drop all the lists except the mailman meta-list for one site | 16:11 |
clarkb | fungi: for that I think you'll want to update the inventory stuff to remove all the lists except for your lists.openinfra.dev list. Then toggle the test flag to false | 16:11 |
clarkb | ya exactly | 16:11 |
clarkb | then you should be in a state where it is stuck and the job will timeout. Then you can hop onto the held node and rerun the newlist manually | 16:11 |
*** marios is now known as marios|afk | 16:15 | |
*** pojadhav is now known as pojadhav|dinner | 16:36 | |
*** ysandeep|ruck is now known as ysandeep|out | 16:38 | |
opendevreview | Jeremy Stanley proposed opendev/system-config master: DNM: Reproduce mailman newlist hanging https://review.opendev.org/c/opendev/system-config/+/820392 | 16:38 |
clarkb | ya something along those lines should work | 16:39 |
fungi | if it ever gets a node assigned | 16:44 |
fungi | there it goes | 16:45 |
corvus | infra-root: frickler requested that i move the docker volume mapped directory /var/keycloak/log to /var/log/keycloak in https://review.opendev.org/819923 -- do we have a collective preference for that? | 16:45 |
corvus | in some cases we have, eg, /var/log/zuul, but in others, i see us putting all the docker volume dirs under one /var/foo | 16:46 |
fungi | i guess it's a question of whether it's more convenient to group the mapped dirs together in one place on the host fs | 16:46 |
clarkb | I don't think we've been very consistent about how to capture logs for containers. Partly because services do logging in a variety of ways. For services that write to stdout/stderr we've got syslog capturing systems that write them to dedicated files on disk. Services like zuul and gitea we capture in log dirs but as you mention they are done in different ways | 16:47 |
fungi | i don't have a personal preference, other than for consistency, which we already seem to lack at this point | 16:47 |
corvus | yeah, that change seems to have collected a very large number of nit comments where people said "we do it this way" and in fact, we do it that way half the time, and we do it another way the other half of the time. so i'm trying to navigate that and produce something that will actually get some +2 reviews. | 16:48 |
corvus | so i'm trying to figure out what the actual right answers are | 16:48 |
corvus | (i copied that from the etherpad role, btw, so everything in that change has precedent) | 16:49 |
fungi | i'll admit i do tend to look in /var/log first when trying to find logs, but a quick skim of the docker-compose file typically sorts me out if i don't find whatever i'm looking for | 16:50 |
corvus | okay, i'll switch it then since frickler has a preference and no one else does | 16:50 |
*** jpena is now known as jpena|off | 16:51 | |
clarkb | For me my biggest concerns at this point are understanding the user we're running as (1000 in this case) and ensuring we don't accidentally delete state because it was written into the ephemeral image and not a bind mount (I think we're mounting the h2 db dir?) | 16:51 |
corvus | clarkb: correct | 16:51 |
clarkb | Otherwise it is hard to be consistent with every application simply because they differ and referring to the configs (docker-compose in many cases) is a good way to determine that when debugging | 16:52 |
corvus | (tbh, i prefer the other way considering that the actual filesystem location in the container is /opt/jboss/keycloak/standalone/log , and it's a sibling directory to data, so in my original change, they are both sibling directories in both locations) | 16:52 |
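The two layouts under discussion differ only in the host-side paths; the container paths come from the message above. A hedged docker-compose sketch of the frickler-preferred variant (service name illustrative, image omitted):

```yaml
# Sketch: logs to /var/log/<service>, state to /var/<service>, both
# bind-mounted so nothing lives only in the ephemeral image layer.
services:
  keycloak:
    volumes:
      - /var/keycloak/data:/opt/jboss/keycloak/standalone/data
      - /var/log/keycloak:/opt/jboss/keycloak/standalone/log
```

corvus's original layout would instead map both to siblings under /var/keycloak, matching the sibling data/log directories inside the container.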
*** pojadhav|dinner is now known as pojadhav | 16:53 | |
opendevreview | James E. Blair proposed opendev/system-config master: Add a keycloak server https://review.opendev.org/c/opendev/system-config/+/819923 | 16:54 |
corvus | clarkb, ianw, frickler ^ i think i addressed all the comments | 16:54 |
*** marios|afk is now known as marios | 16:59 | |
*** marios is now known as marios|out | 17:08 | |
opendevreview | Clark Boylan proposed opendev/system-config master: Add a second Zuul user in gerrit testing https://review.opendev.org/c/opendev/system-config/+/820395 | 17:10 |
clarkb | That's a naive first step in testing around case-sensitive usernames | 17:10 |
opendevreview | James E. Blair proposed opendev/system-config master: Add a keycloak server https://review.opendev.org/c/opendev/system-config/+/819923 | 17:24 |
*** pojadhav is now known as pojadhav|out | 17:24 | |
corvus | clarkb, fungi ^ one more ps to fix the thing fungi caught | 17:24 |
fungi | thanks! | 17:25 |
fungi | so the good news is that i was able to recreate the hanging newlist call: https://zuul.opendev.org/t/openstack/build/ef9e10d4365b4aa69afac0bd4bd149de | 17:34 |
clarkb | yay | 17:34 |
fungi | the ara report has the tasks up to the one which tries to create the lists since that task times out and never completes | 17:34 |
fungi | doesn't actually have that task itself, so i simply assume it's hanging like we observed in production | 17:35 |
clarkb | ya the json only writes out when the task completes iirc | 17:35 |
clarkb | and we probably timed out and killed ansible before that happened due to the hang | 17:36 |
clarkb | fungi: I guess you can run newlist by hand next and see what it prompts for and if that is different for the mailman meta list than a normal list? | 17:41 |
clarkb | it may be that there is a required value to be provided and it can't default like the normal list case | 17:41 |
fungi | i suspect it's more that our workaround isn't working around. i'm going to test that next | 17:43 |
clarkb | fungi: also why didn't testing catch this when you added the new site? I guess that is why it is good news we replicated | 17:45 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: DNM: Reproduce mailman newlist hanging https://review.opendev.org/c/opendev/system-config/+/820392 | 17:45 |
fungi | well, we don't test it because we explicitly avoid sending notifications, and it's the notification sending which prompts | 17:46 |
fungi | i replicated it by running the test job without disabling notifications | 17:46 |
fungi | we previously thought we had replicated it and that replacing stdin with an empty string would do the same as what we had tested, i believe | 17:47 |
fungi | but we may need to add </dev/null or something like that | 17:48 |
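The `</dev/null` idea can be demonstrated with a stand-in for newlist's "hit enter to notify" prompt (the function below is hypothetical, not mailman's actual prompt text):

```shell
# A prompting command blocks on read until stdin yields a line or EOF.
prompt_cmd() { printf 'Hit enter to notify the list owner... '; read -r _; echo done; }
# Redirecting stdin from /dev/null makes read hit EOF immediately, so
# the command completes non-interactively instead of hanging forever.
out=$(prompt_cmd </dev/null)
echo "$out"
```

Under Ansible the same hang occurs because nothing ever arrives on stdin, so the redirect (or an explicit stdin value) is what breaks the wait.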
clarkb | oh right I forgot about that so ya removing the no email flag makes it prompt (I wish software didn't do that, but what can you do) | 17:49 |
clarkb | fungi: yes I'm fairly certain we managed to confirm it was fixed, but I suppose its possible we only convinced ourselves of that and reality was different | 17:49 |
fungi | well, what's fun is that removing the no email flag makes it prompt if it thinks you're in an interactive shell, and it looks like ansible probably goes out of its way to convince shells that's the case | 17:50 |
fungi | part of the problem is that killing the newlist once it reaches the notification prompt basically has the desired effect minus notification, since the list has already been created at that point | 17:52 |
fungi | so subsequent runs will see the list exists and not rerun newlist | 17:52 |
clarkb | fun. | 17:52 |
fungi | the effective way to test this would be to configure exim to send all messages to /dev/null or something | 17:54 |
fungi | so that newlist can believe it's notifying admins | 17:54 |
clarkb | fungi: and remove the test only flag? I'd be open to that | 17:54 |
clarkb | but also not sure I know how to make exim garuntee that | 17:54 |
fungi | well, we could in theory use this change or a similar one to work out the details | 17:55 |
fungi | so that we avoid annoying/confusing real list admins | 17:55 |
clarkb | ++ | 17:56 |
clarkb | fungi: maybe we need to run under nohup? | 17:58 |
clarkb | a new patchset to your existing change should be able to test something like that. Then we can work backwards to swap out exim configs? | 17:59 |
fungi | yep. that ought to work | 18:00 |
fungi | mmm, nohup also redirects stdout/stderr to local files automatically. that may make debugging harder | 18:06 |
fungi | i'll wip it for the moment while we test whether it's an effective workaround at all | 18:06 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Run newlist under nohup https://review.opendev.org/c/opendev/system-config/+/820397 | 18:09 |
clarkb | fungi: nohup manpage says we can redirect to other files if we prefer. We would have to switch from command to shell module in the ansible to use redirects | 18:13 |
fungi | yeah, and if we can use redirects we could just </dev/null explicitly instead | 18:13 |
clarkb | ya | 18:18 |
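The nohup default-vs-explicit-redirect behavior mentioned above, sketched with a trivial stand-in command (the echoed text is illustrative):

```shell
# nohup appends to ./nohup.out by default; an explicit redirect
# overrides that and keeps output somewhere log collection can find it.
tmp=$(mktemp)
nohup sh -c 'echo list created' >"$tmp" 2>&1
out=$(cat "$tmp")
rm -f "$tmp"
echo "$out"
```

With the explicit `>"$tmp" 2>&1` nothing lands in nohup.out, which addresses the "makes debugging harder" concern about nohup's automatic local files.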
fungi | i think we ended up using cmd and overloading stdin that way because "shell" is frowned upon? | 18:19 |
clarkb | ya I think the linter prefers it that way but if redirecting then you need the shell and its fine | 18:21 |
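A hedged sketch of the task change being discussed (list name, owner address, and password variable are all illustrative):

```yaml
# The command module passes argv directly, so "</dev/null" would be a
# literal argument there; the shell module is needed for the redirect.
- name: Create the mailing list non-interactively
  shell: newlist mylist mylist-owner@example.org "{{ list_password }}" </dev/null
```

This trades the linter's preference for `command` against actually getting shell redirection semantics, which is the point of the switch.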
clarkb | fungi: https://zuul.opendev.org/t/openstack/build/d6d023ba318e44e6b5aefd861614061e/log/job-output.txt#22193-22221 | 18:32 |
clarkb | fungi: maybe the "fix" here is to switch to sending a newline on the stdin | 18:32 |
fungi | 819923,9 looks like it failed system-config-run-keycloak when trying to start apache, but we don't collect apache logs | 18:32 |
clarkb | https://docs.ansible.com/ansible/latest/collections/ansible/builtin/shell_module.html#parameter-stdin_add_newline but apparently it is already the default to send a newline | 18:33 |
clarkb | fungi: https://opendev.org/opendev/puppet-mailman/src/branch/master/lib/puppet/provider/mailman_list/mailman.rb#L69-L93 that is what puppet was doing I think. Unfortunately no explicit stdin handling | 18:38 |
clarkb | fungi: is it possible that mailman updates and/or mailman on focal is the cause of this behavior change? | 18:39 |
clarkb | that could explain why we were confident it was fixed but now it isn't | 18:39 |
fungi | i can certainly try running the job on bionic | 18:42 |
fungi | or xenial? | 18:43 |
clarkb | ya it would've been xenial before | 18:43 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: DNM: Reproduce mailman newlist hanging https://review.opendev.org/c/opendev/system-config/+/820392 | 18:47 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Redirect stdin for newlist https://review.opendev.org/c/opendev/system-config/+/820397 | 18:47 |
fungi | switched the reproducer to xenial, updated the workaround to redirect stdin with a shell task | 18:47 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Pipe yes into newlist https://review.opendev.org/c/opendev/system-config/+/820397 | 19:25 |
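What "pipe yes into newlist" amounts to, sketched with a hypothetical stand-in for the prompt, alongside the single-newline variant discussed earlier:

```shell
# prompt_cmd stands in for newlist waiting on its "hit enter" prompt.
prompt_cmd() { read -r _; echo notified; }
# One newline on stdin answers one prompt; 'yes ""' supplies an
# endless stream of empty lines, answering however many prompts appear.
via_newline=$(printf '\n' | prompt_cmd)
via_yes=$(yes '' | prompt_cmd)
echo "$via_newline / $via_yes"
```

Both variants unblock the read; `yes` is just the belt-and-suspenders version when the number of prompts is unknown (yes exits on SIGPIPE once the consumer finishes).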
clarkb | infra-root I copied my raw notes file for the gerrit user summit into my homedir on review02 | 19:27 |
clarkb | I'd typically prefer to stick them in an etherpad but they are very raw and have event urls and I'm not comfortable putting them on etherpad right away | 19:28 |
clarkb | if you would like them more curated on an etherpad I can work on that next week | 19:28 |
corvus | clarkb: don't forget to scrub the names for gdpr compliance! (/sarcasm -- maybe, honestly, don't know) | 19:30 |
clarkb | corvus: ya... thats one of the things since I put some names in there | 19:30 |
clarkb | and they seem very cautious in that community about names :) | 19:31 |
corvus | simple solution: give everyone aliases from clue. Colonel Mustard uploaded gerrit 3.4 in the office with the release script. | 19:32 |
clarkb | hahahaha | 19:32 |
fungi | the newlist </dev/null in https://zuul.opendev.org/t/openstack/build/052e2fa7f2ef4fa592db74ffe991136d looks like it did work (i think the prompt was printed but bypassed), though subsequent tests failed for it | 19:32 |
fungi | oh, but maybe that's because i didn't fix the tests for the lists i omitted | 19:32 |
clarkb | fungi: ya the tests check specific sites and lists | 19:32 |
clarkb | if </dev/null worked then we probably have a reasonable workaround | 19:33 |
fungi | i agree, i'll try switching back to that one momentarily | 19:34 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: DNM: Reproduce mailman newlist hanging https://review.opendev.org/c/opendev/system-config/+/820392 | 19:39 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Redirect stdin for newlist https://review.opendev.org/c/opendev/system-config/+/820397 | 19:39 |
fungi | also i confirmed the reproducer still reproduces on xenial, so i don't think this crept in with the focal upgrade, i think it was just never thoroughly tested | 19:40 |
fungi | further, i think corvus would make an excellent professor plum | 19:42 |
ianw | > in some cases we have, eg, /var/log/zuul, but in others, i see us putting all the docker volume dirs under one /var/foo | 20:33 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: DNM: Reproduce mailman newlist hanging https://review.opendev.org/c/opendev/system-config/+/820392 | 20:34 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Redirect stdin for newlist https://review.opendev.org/c/opendev/system-config/+/820397 | 20:34 |
ianw | my feeling on that is probably that if it's under /var/foo, /var/foo might be a separate cinder volume. i feel like maybe gerrit/graphite are things that have separate storage volumes | 20:34 |
ianw | anyway, not super fussed | 20:34 |
corvus | well, etherpad was the role i copied all that from | 20:35 |
corvus | (so everything someone disagreed with was true for the etherpad role). there's /var/etherpad/db and /var/etherpad/www | 20:35 |
corvus | i'm not super fussed either, but given the differences between roles and review comments, maybe we ought to go through and articulate a policy | 20:36 |
corvus | infra-root: public service announcement: because of all the recent 'zuul delete-state' runs, there are no autohold records in zuul, but there are some held nodes in nodepool. might be worth a check of the nodepool nodes. | 20:38 |
ianw | clarkb: thanks, will loop back on your comments, sorry yes i meant to add the nodes:[] from your prior comment on that, thanks for picking up | 20:39 |
fungi | clarkb: one other thing i've noticed... even though i set my address as the testlist admin, i did *not* receive any notification from the test node. checked my mta's logs and there were no connections (not even rejections) from the node's ip address | 21:28 |
fungi | not sure if we're successfully blocking test nodes from sending e-mail already, or if that workaround is causing newlist not to generate the notification | 21:30 |
fungi | (though it has an exit code of 0 so it didn't act like that was a failure) | 21:30 |
clarkb | corvus: thanks for the reminder I am pretty sure I have a couple I can clean up. Will check momentarily | 21:30 |
clarkb | fungi: it could be the test node provider blocks smtp | 21:31 |
clarkb | I have requested that nodepool delete my held nodes; they should disappear momentarily | 21:32 |
clarkb | fungi: maybe that is the easiest thing to do though add an iptables rule blocking port 25? | 21:33 |
clarkb | left that as a suggestion on the change so that it doesn't get missed if fungi is weekending already | 21:37 |
fungi | this is how i weekend ;) | 21:45 |
clarkb | fungi: if you were closer I'd take you fishing or something so that you could weekend more weekendy | 21:46 |
fungi | but yeah, i agree, an egress rule blocking destination port 25/tcp before the allow all egress rule would be a great addition | 21:46 |
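The egress rule described above, as an iptables-save style fragment (a sketch; the real rules file layout in system-config may differ):

```
# Reject outbound SMTP ahead of the blanket egress accept, so test
# nodes cannot deliver real mail but the attempt still shows in logs.
-A OUTPUT -p tcp --dport 25 -j REJECT --reject-with tcp-reset
-A OUTPUT -j ACCEPT
```

Using REJECT with a TCP reset (rather than DROP) makes the MTA fail fast instead of waiting on connection timeouts, which keeps test jobs from stalling.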
fungi | if you were closer you wouldn't need to take me fishing, we could just cast from the yard ;) | 21:46 |
clarkb | ha indeed | 21:46 |
fungi | anyway, firewall rule is a stellar idea, far less effort than reconfiguring exim to drop outbound messages on the floor | 21:48 |
ianw | i probably have some gerrit held nodes, they can be removed if in there, otherwise i'll clean up and test the new gerrit 3.4 images next week | 21:48 |
fungi | clarkb: the main reason i brought it up is that i suspect we need to test whether it tried to deliver the message, so that we know the workaround isn't just equivalent to always doing newlist -q | 21:48 |
clarkb | fungi: ah good point. Maybe we can test that on a held node? | 21:49 |
clarkb | through manual invocations of newlist | 21:49 |
fungi | also, maybe we can configure our parent job to collect exim logs | 21:49 |
fungi | if i had the exim logs and they showed mailman sending notifications through exim (even if undeliverable because of firewall drop/reject) that would be enough to satisfy my concern | 21:50 |
fungi | so i think that's what i'll do. a change to block outbound 25 on our test list nodes, a change to collect exim logs in all our deployment tests, and then drop the -q and associated conditional in the workaround | 21:52 |
clarkb | sounds like a plan | 21:53 |
clarkb | by the way my naive case-insensitive username collision test fails and this is apparently expected. The reason for this is while current gerrit treats existing usernames as case sensitive (therefore not breaking our existing users) it won't let you create new users that have collisions | 21:54 |
clarkb | that makes testing of the behavior changes a bit difficult, but probably good enough for now | 21:54 |
fungi | in fact, it might be a good idea to just make system-config-run block outbound 25/tcp from everything? | 21:55 |
fungi | system-config-run is only inherited by test jobs, right? | 21:56 |
clarkb | yes system-config-run is independent of our prod stuff | 21:56 |
clarkb | there is overlap where the roles and common group/host vars are used | 21:57 |
clarkb | but system-config-run runs distinct playbooks to set up stuff and will also put new zuul test job specific host/group vars in place | 21:57 |
fungi | okay, i'll propose a single change which blocks 25/tcp outbound in system-config-run and collects exim logs | 21:57 |
fungi | if that makes sense | 21:57 |
clarkb | yup I think that sounds great | 21:58 |
fungi | that way any of our deployment test jobs shouldn't be able to accidentally send outbound e-mail, but also if we're curious about whether something tried we can look in the log | 21:59 |
clarkb | infra-root https://review.opendev.org/c/opendev/system-config/+/820267 would be a good one to review for early next week. It upgrades gitea to 1.15.7 | 22:05 |
clarkb | I'm a bit distracted this afternoon with parenting duties so please don't approve now unless you intend on watching it :) but I'll happily land it monday or fix issues if people find them | 22:05 |
opendevreview | James E. Blair proposed opendev/system-config master: Add a keycloak server https://review.opendev.org/c/opendev/system-config/+/819923 | 22:17 |
corvus | clarkb, fungi, ianw: i now have a definite preference for the ordering of server certs; i updated that to list the server first so that we don't have to template out more of the apache config | 22:18 |
corvus | (and really, the individual server name is optional anyway; we could drop it and be fine; it's just a convenience for us when debugging with direct access) | 22:19 |
Clark[m] | Ah because it can always be keycloak.opendev.org in the file path that way? | 22:19 |
corvus | yep | 22:20 |
opendevreview | James E. Blair proposed opendev/system-config master: Update letsencrypt role docs to suggest a specific order https://review.opendev.org/c/opendev/system-config/+/820409 | 22:26 |
ianw | corvus: i'm fine with adding on backups as we find out. is there a particular reason you don't want to grab the service-status page via apache in the testinfra? i've definitely seen things before where it was listening, but not actually responding correctly, so my preference is to do more end-to-end validation in testinfra if we can | 22:36 |
corvus | ianw: oh sorry i forgot to reply to that comment... do we do that anywhere? | 22:40 |
ianw | we generally call out to curl; quite a few examples e.g. https://opendev.org/opendev/system-config/src/branch/master/testinfra/test_codesearch.py#L23 | 22:40 |
ianw | as this has a UI, could even do the screenshot stuff | 22:41 |
ianw | i don't mind if this is a follow-on; just we do have facilities to do a lot more testing there | 22:41 |
corvus | have an example with a system-status page? | 22:42 |
corvus | i'm looking and can't find one | 22:42 |
ianw | oh, i don't think explicitly a system-status page | 22:42 |
ianw | but we do have examples of using requests directly | 22:43 |
ianw | https://opendev.org/opendev/system-config/src/branch/master/testinfra/test_paste.py#L35 is what i'm thinking of. so if it's json that might be easier too | 22:45 |
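(The request-and-validate pattern ianw points at in test_paste.py boils down to fetching a page and asserting on its content, not just on the port being open. A minimal sketch of the JSON-validation half; the payload shape here is hypothetical, not keycloak's actual status output.)

```python
import json

def service_ok(body: str) -> bool:
    """Return True if the fetched body looks like a healthy JSON status.

    Checking the response content catches the "listening but not actually
    responding correctly" failure mode that a simple port check misses.
    """
    try:
        data = json.loads(body)
    except ValueError:
        return False
    # Hypothetical field name; a real test would assert on whatever
    # the service's status document actually contains.
    return data.get("status") == "ok"

# In a real testinfra test the body would come from something like
# requests.get("https://<server>/...", verify=False).text
print(service_ok('{"status": "ok"}'))       # True
print(service_ok('<html>not json</html>'))  # False
```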
corvus | how about we put in a check for loading the main page (so something like "test_paste") in a followup? I already destroyed my local test env; i think at this point it'll be easier to just look at the server when we boot it to get the correct output. i think that's worth more than the apache status page anyway (which is only there again because i copied that from etherpad) | 22:50 |
corvus | incidentally, the reason i missed that comment is that it's on line 23, and gerrit and gertty disagree on whether that file has a line 23. | 22:55 |
corvus | (the final byte of that file is the newline on line 22) | 22:55 |
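(The gerrit/gertty disagreement corvus describes comes down to whether a trailing newline terminates the last line or separates it from an empty new one. In miniature:)

```python
# A 22-line file whose final byte is the newline ending line 22:
content = "line 21\nline 22\n"

# splitlines() treats the trailing newline as a line terminator,
# so the file has 2 lines and no line 23.
print(len(content.splitlines()))  # 2

# split("\n") treats it as a separator, producing a phantom empty
# final element -- the "line 23" that one tool sees and the other doesn't.
print(len(content.split("\n")))   # 3
```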
fungi | seems fine to keep on the to do list for a followup change, i assume ianw would be fine with that too since he's +2'd the current change | 22:57 |
clarkb | corvus: that is a neat bug :) | 22:57 |
corvus | yeah. i'm clearly going to need to "fix" it even if i don't agree it's "broken" :) | 22:57 |
clarkb | the best bugs are the ones you fix that were never really broken in the first place | 22:58 |
corvus | is it okay from a system-config ansible perspective for me to +w that now? | 22:58 |
clarkb | I just sat back down after a school run and can take another look, however if only the le names moved I'm fine with a +w | 22:59 |
clarkb | it also doesn't have a new server yet so this is largely a noop until we boot one right? | 22:59 |
clarkb | so ya +w should be fine | 22:59 |
fungi | yes, that's all | 23:14 |
ianw | follow-up is fine. i've just been writing a talk about how amazing our testing is, so i'm attuned to it atm :) | 23:16 |
corvus | should i +w that change, or should i create the server and add it to dns and inventory first? | 23:17 |
corvus | not sure about the chicken/egg thing here | 23:17 |
corvus | also, i'm going to guess 4g for this server | 23:18 |
ianw | i'd probably make the server and add to inventory then merge the change but i don't think it matters | 23:20 |
Clark[m] | Ya doesn't matter too much. You'll just have to follow-up with an inventory update next | 23:20 |
corvus | okay, i'll do the server first... and redo it since i just named it keystone instead of keycloak | 23:23 |
corvus | er, anyone know how to run "openstack server list" on bridge? | 23:28 |
fungi | huh, yeah, i'm getting a "temporary failure in name resolution" | 23:30 |
corvus | oh okay so that was supposed to work | 23:30 |
fungi | sudo ~fungi/osc/bin/openstack --os-cloud openstackci-vexxhost --os-region-name ca-ymq-1 server list | 23:30 |
fungi | i installed latest osc in a venv there | 23:31 |
corvus | oh should i make this in vexxhost and not rax dfw? | 23:31 |
fungi | oh, i see, i think the name resolution failures are something to do with osc being installed in a docker container? | 23:31 |
fungi | /usr/local/bin/openstack on bridge is a wrapper script calling docker run | 23:32 |
corvus | oh, so running your command with rax instead of vexx might work? | 23:32 |
fungi | there may be dns resolver configuration problems within the container itself | 23:32 |
corvus | running `~fungi/osc/bin/openstack --os-cloud openstackci-rax` as root does not work for me | 23:33 |
corvus | `Version 2 is not supported, use supported version 3 instead.` | 23:33 |
fungi | yeah, i was just trying to figure that out as well | 23:34 |
fungi | i wonder if rackspace changed their keystone | 23:34 |
corvus | somehow launch-node works tho | 23:34 |
fungi | yeah, our clouds.yaml seems to set identity_api_version: 2 | 23:35 |
corvus | okay, i managed to find the right rackspace web login and deleted the server | 23:37 |
fungi | i'm betting we need to update the clouds.yaml now | 23:38 |
ianw | i am 100% sure we've had this problem with the openstack wrapper on bridge before | 23:41 |
ianw | i just can not find any details about it | 23:42 |
ianw | i feel like we might have done something like a docker restart and it started working | 23:43 |
fungi | oh, i think the "Version 2 is not supported, use supported version 3 instead." error is coming from osc itself. rackspace still uses/needs keystone v2 api, so you have to use an old osc release to talk to rackspace | 23:43 |
fungi | it seems like osc has given up on the idea of backward compatibility there | 23:45 |
Clark[m] | This is why shade existed and it makes me wonder if we need to resurrect that sort of idea with the sdk team | 23:47 |
fungi | though i can't be sure, as i'm not actually finding that error string in osc | 23:51 |
Clark[m] | It would be in the keystoneauth library | 23:54 |
Clark[m] | And ya grepping those repos is often an exercise in frustration because the code does too much magic | 23:54 |
fungi | i think it's actually bubbling up from cinderclient? | 23:58 |
fungi | sudo ~fungi/osc/bin/openstack --os-cloud openstackci-rax --os-region-name DFW --os-volume-api-version 3 server list | 23:59 |
fungi | that's working for me now | 23:59 |
fungi | overriding the volume_api_version: 2 from clouds.yaml | 23:59 |
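(In clouds.yaml terms, the workaround fungi landed on amounts to overriding the pinned cinder API version; roughly like the fragment below. Keys shown are real openstacksdk config options, but the values and layout are illustrative, not the actual bridge configuration.)

```yaml
clouds:
  openstackci-rax:
    auth:
      auth_url: https://identity.api.rackspacecloud.com/v2.0/
    identity_api_version: 2   # rackspace still speaks keystone v2
    # the cinderclient bundled with newer osc refuses volume API v2,
    # so override the pin (or pass --os-volume-api-version 3 on the CLI):
    volume_api_version: 3
```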
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!