Monday, 2024-01-29

frickler	tonyb: not sure about the status of inmotion cleanup, according to grafana there's still 28 nodes stuck in deleting?	09:10
*** tosky_ is now known as tosky		10:19
opendevreview	Jan Marchel proposed openstack/project-config master: Add new components to NebulOuS project: prediction-orchiestrator, exn-middleware, overlay-network-agent https://review.opendev.org/c/openstack/project-config/+/907060	13:48
jrosser	i know not much can be done but looks like connectivity between me and opendev.org is pretty terrible again	15:25
jrosser	and mtr points to above.net / zayo as culprit again	15:26
fungi	jrosser: ipv4, v6, both?	15:32
jrosser	ah i'm not on a dual stack vm right now	15:33
* jrosser tries something else		15:33
* frickler still has no v6 connectivity to opendev.org. (or again? I keep losing track. likely should add that topic to preptg agenda, too)		15:35
jrosser	i see this for v4 https://paste.opendev.org/show/bVNawMTqNZJenQUqe40F/	15:36
jrosser	though i'm not sure that actually reports end to end loss	15:36
frickler	well end to end loss is in the final line, which says 0%, so that looks fine	15:38
jrosser	it's the same symptom that i've seen before of only kbit/s throughput on git things	15:46
frickler	jrosser: here's an mtr in the opposite direction to the first public IP in your trace. looks like there might be congestion between zayo and bbc https://paste.opendev.org/show/bVonvvCAVp1gj1U8Sk5x/	16:02
opendevreview	Tim Burke proposed opendev/git-review master: Add classifiers for Python 3.10 and 3.11 https://review.opendev.org/c/opendev/git-review/+/907097	16:41
frickler	seems we have a regression in linting for git-review, failure unrelated to the patch afaict https://zuul.opendev.org/t/opendev/build/6fbff753824c418aa26d173d5ffefb13	16:45
corvus	Clark: i'm working on the skopeo thing	16:50
clarkb	corvus: ok, is it more involved than simply using a newer client?	16:55
clarkb	hrm I guess the change that tried that failed too	16:55
opendevreview	James E. Blair proposed zuul/zuul-jobs master: ensure-skopeo: use make install https://review.opendev.org/c/zuul/zuul-jobs/+/907100	16:57
corvus	clarkb: i think that's the next mole to whack ^	16:57
corvus	if works, will need to squash	16:57
clarkb	I guess that installs to /usr/local/bin by default?	16:58
opendevreview	Tim Burke proposed opendev/git-review master: Use importlib.metadata instead of pkg_resources https://review.opendev.org/c/opendev/git-review/+/907101	17:00
corvus	clarkb: yep, verified in a local build	17:01
opendevreview	Merged openstack/project-config master: Implement ironic-unmaintained-core group https://review.opendev.org/c/openstack/project-config/+/902796	17:09
opendevreview	Tim Burke proposed opendev/git-review master: Fix flake8 issue https://review.opendev.org/c/opendev/git-review/+/907102	17:10
timburke	frickler, thanks for letting me know; ^^^ should address	17:11
opendevreview	Jan Marchel proposed openstack/project-config master: Add new components to NebulOuS project: prediction-orchiestrator, exn-middleware, overlay-network-agent https://review.opendev.org/c/openstack/project-config/+/907060	17:13
opendevreview	Jeremy Stanley proposed opendev/system-config master: Retire the OpenInfra Labs mailing list https://review.opendev.org/c/opendev/system-config/+/907103	17:16
clarkb	fungi: where did we end up with keycloak?	17:26
clarkb	fungi: I think if others don't object renaming the database files and mounting them in the new locations seems reasonable. I guess make backups of the files too	17:26
fungi	clarkb: yeah, that's what i'm doing, basically	17:28
fungi	also dawned on me that i should double-check the ownership and permissions inside the container	17:28
clarkb	oh ya those may have changed too. Fun	17:28
fungi	does mirror02.iad3.inmotion need to remain in the emergency disable list, or is it working again?	17:29
fungi	i seem to be able to ssh into it	17:29
fungi	just want to be sure before i re-break our base deploy	17:29
clarkb	I think it is happy after tonyb did surgery on that cloud	17:29
clarkb	basically had to restart rabbitmq properly. We tried a naive approach which didn't work but there was a kolla method that was better and got things working again	17:30
fungi	okay, i've taken it back out but we should keep an eye on the deploy jobs	17:30
clarkb	++	17:31
fungi	i also added keycloak01 to the disable list while i'm working on potential compose file edits	17:31
clarkb	corvus: fwiw I checked snapcraft and microk8s latest/stable hasn't updated since they broke things. No new release yet. The bug has gotten some references added to it so maybe it will get fixed	17:32
opendevreview	James E. Blair proposed zuul/zuul-jobs master: Run buildset registry test jobs on ensure-skopeo changes https://review.opendev.org/c/zuul/zuul-jobs/+/907104	17:33
clarkb	corvus: which means we may want to go ahead with the nodepool change for now then swing around and cleanup zuul-jobs	17:34
corvus	which nodepool change?	17:34
corvus	https://review.opendev.org/906905	17:35
clarkb	yup that one. That change is what started my journey on friday :)	17:35
clarkb	trying to land the ssh keyscan change	17:35
clarkb	corvus: https://review.opendev.org/c/zuul/zuul-jobs/+/906907 was next then https://review.opendev.org/c/zuul/zuul-jobs/+/906916	17:36
corvus	yeah, i got those but missed the rollback one :)	17:37
corvus	i think i missed it cause i started at the end and worked backwards	17:37
clarkb	and then finally the ssh keyscan change you wrote is at the very beginning	17:38
corvus	thanks for digging into that :)	17:40
clarkb	you're welcome	17:41
opendevreview	Merged opendev/git-review master: Fix flake8 issue https://review.opendev.org/c/opendev/git-review/+/907102	17:51
opendevreview	Tim Burke proposed opendev/git-review master: Use importlib.metadata instead of pkg_resources https://review.opendev.org/c/opendev/git-review/+/907101	17:52
clarkb	it took me a second to understand why ^ is necessary since git-review depends on setuptools to install. Except it doesn't if you install from a wheel...	18:05
timburke	clarkb, bingo -- idea was to have it as an alternative to https://review.opendev.org/c/opendev/git-review/+/898839?usp=search	18:31
fungi	mmm, i'm getting increasingly out of my depth with the keycloak migration experiment... container log is now full of errors about failing to start the server because it can't establish a jdbc connection, reporting "ERROR: Wrong user name or password [28000-197]"	18:45
fungi	i guess h2 databases have some sort of integrated credentials?	18:45
clarkb	I didn't think they did. I thought it was more like sqlite. But maybe I'm wrong about that	18:46
fungi	https://stackoverflow.com/questions/63800413/invalid-username-password-when-accessing-keycloaks-h2-database	18:47
adamcarthur5	I've seen that in some cases people are adding a comment that re-runs the CI to check if it is a fluke. What is the best way to do this myself?	18:47
clarkb	heh that even points at keycloak	18:47
fungi	adamcarthur5: start a review comment with the word "recheck" but ideally only do that after you've looked into the failure and feel confident it's not being caused by your change	18:48
adamcarthur5	Okay great, thank you :))	18:48
clarkb	you can also append any text after "recheck" if you want to add notes around why you are rerunning things	18:48
JayF	Yes, please do not run `recheck` without some reasoning/explanation afterwards.	18:49
JayF	even `recheck the code under test is not changed by this patch` is good as a minimum bar	18:49
fungi	yes, i generally "recheck because ..." (some summary of the nature of the job failure and why i know it's unrelated)	18:49
clarkb	fungi: things like that are making me think maybe we need to consider a mariadb and start over on config	18:49
clarkb	just because we know how to work with a mariadb and configuration of that is a bit more explicit	18:49
clarkb	I dunno this feels like a framework migration where they never considered h2 users because that isn't a "production" deployment so doesn't matter	18:50
JayF	adamcarthur5 the ideal case is that, as a contributor to $project, you'd troubleshoot the random failure and fix it like some kinda coding superhero :D. The reality is as an early contributor, the best thing you can do there is just read the logs, make sure you start recognizing if patterns emerge in failures, and if so raise a question in IRC or on mailing list	18:50
adamcarthur5	JayF yeah definitely. I'll just take it project by project and see how I get on :))	18:51
clarkb	fungi: the only other idea I've got is if we can determine the old h2 db creds we can probably configure quarkus keycloak to use those creds	18:52
JayF	Yeah the entire point is twofold: 1) each `recheck` comment is expensive, in terms of actual-computers-being-run, and 2) we do not want a periodic failure (say, one that fails randomly 5-10% of the time) to get into the codebase because we "rechecked" to get a green	18:52
clarkb	fungi: also why you would change those at all is beyond me. It only serves to make users' lives miserable	18:52
clarkb	fungi: KC_DB_USERNAME and KC_DB_PASSWORD seem to be what you use to set things for postgres etc. Maybe we can set those values and have them work with h2	18:53
fungi	clarkb: yeah, found this in an upgrade note too: "Keycloak ships for development purposes with an H2 database driver. As it is intended for development purposes only, it should never be used in a production environment. In this release, the H2 driver has been upgraded from version 1.x to version 2.x. This change might require changes to the H2 JDBC URL or migration of the H2 database files in	18:53
fungi	an existing Keycloak setup. [...] Purge existing H2 database files to start with an empty database, export and import the realms using Keycloak’s export and import functionality, or refer to the migration notes on the H2 database project’s website for details on how to migrate H2 database contents."	18:53
fungi	that's for 20.0.0 i think, but reflects their sentiments fairly well	18:54
opendevreview	James E. Blair proposed zuul/zuul-jobs master: Run buildset registry test jobs on ensure-skopeo changes https://review.opendev.org/c/zuul/zuul-jobs/+/907104	18:54
clarkb	fungi: ya basically they don't care about people using h2	18:54
clarkb	fungi: https://stackoverflow.com/questions/72863453/unable-to-access-keycloak-18-0-2-embedded-h2-db-file this has what may be the defaults we used	18:54
clarkb	fungi: we could try forcing those values here. Otherwise maybe we consider a revert and plan for a proper db to avoid problems in the future?	18:54
clarkb	oh wait those values are for quarkus not wildfly	18:55
clarkb	but we might be able to look in the same files and go back in time to find the old values?	18:56
corvus	clarkb: fungi if our actual goal is to eventually proxy to other providers, then we could use h2 and reconstruct the contents via the api on every deployment.	18:56
clarkb	corvus: we'd still need the user generated content though	18:56
fungi	yeah, it looks like the container change included switching from wildfly to quarkus	18:56
corvus	(or of course, use a real dbms; just mentioning the option)	18:56
clarkb	fungi: yes that is the difference between legacy and not legacy	18:57
clarkb	and it seems like they didn't put a lot of effort into making that transition easy	18:57
clarkb	I mean they could've copied the db from the old location to the new one and continued to use the same credentials	18:57
fungi	corvus: yeah, i think that's where we're headed. the main question i have is whether we should roll back the container change temporarily, and whether we need to include steps to export the h2 realm data from h2/import it into mariadb or want to start with a clean slate	18:58
fungi	so two questions i guess	18:58
corvus	clarkb: oh i guess we would accumulate mappings, wouldn't we?	18:58
clarkb	corvus: yes exactly. Over time users would be building out the db telling keycloak what backend identities they have for the frontend of opendev id (for lack of better terminology)	18:58
corvus	then i agree dbms is the right long-term answer; just a question of whether we want to take that hit now or later	18:59
fungi	not trying to keep our old data would only impact maybe half a dozen users, but i'm open to doing the extra export/import if folks want	19:00
fungi	also since we only have keycloak hooked up to one system for now and it's not in a critical path for anything, it seems like this would be the time to switch it to a proper sql server. that will in theory greatly simplify the container version updates we want to do once we're on the "new style" image, but also means less data to lose or migrate if we do it before working on tying it to more	19:01
fungi	systems	19:01
clarkb	ya I think it also simplifies working with that data	19:02
clarkb	since we know how to talk to a mariadb (or even a postgres) but h2 is a bit more complicated	19:02
opendevreview	Jeremy Stanley proposed opendev/system-config master: Revert "Switch from legacy to new style keycloak container" https://review.opendev.org/c/opendev/system-config/+/907119	19:12
fungi	oh funky... default behavior of git revert has changed! now if you revert a revert, the subject changes to 'Reapply "whatever the reverted commit subject was"' rather than 'Revert "Revert ..."'	19:14
clarkb	yup. I left a note in there about keeping the new test as I think that is valuable	19:14
fungi	the test may need altering because of the change to the url, but i can do that. or i can reintroduce the test as a separate change before the reapply	19:15
fungi	preference?	19:15
clarkb	ya you need to add the /auth/ prefix (noted that in my comment)	19:16
clarkb	I'm fine with a followup change to the revert	19:16
fungi	also i've undone my test changes to the db files (wiped both paths we're mounting into the container and replaced their content with the local backup copies i made)	19:18
fungi	once we merge a revert, i'll take the server back out of the disable list	19:18
clarkb	sounds like a plan	19:18
opendevreview	Jeremy Stanley proposed opendev/system-config master: Revert "Switch from legacy to new style keycloak container" https://review.opendev.org/c/opendev/system-config/+/907119	19:21
fungi	clarkb: something to consider... if we don't care about exporting/importing, maybe it would make more sense to make the jump to the latest keycloak lts version instead of incrementally upgrading?	19:28
clarkb	fungi: ++	19:30
fungi	huh, actually they don't do "lts" (unless you count red hat's commercial version), our choices are between nightly branch tips, latest release version, or pinning a major (or minor/patch/build) version	19:32
fungi	at the moment, 23.0.5-0 == 23.0 == latest	19:33
fungi	so we'd either do latest and live with potential surprises (like in a month when 24.0.0 is released) or 23.0 i think and then keep on top of updates for it	19:34
fungi	i'll start conservatively with 23.0 and see what happens	19:34
clarkb	I think we did latest previously	19:35
clarkb	and then latest switched to quarkus and broke so we switched to legacy and that updated a few releases without trouble	19:35
clarkb	the main risk is probably if they do another big shift like quarkus	19:35
opendevreview	Jeremy Stanley proposed opendev/system-config master: Upgrade to Keycloak 23.0 https://review.opendev.org/c/opendev/system-config/+/907141	19:55
fungi	https://zuul.opendev.org/t/openstack/build/799c01c02b1d46539c8d68e4e15b48bc/log/keycloak01.opendev.org/docker/keycloak-docker_keycloak_1.txt#41 says "Listening on: http://127.0.0.1:8080" at 20:15:22 but then testinfra fails at 20:17:15 saying that it wasn't https://zuul.opendev.org/t/openstack/build/799c01c02b1d46539c8d68e4e15b48bc/log/job-output.txt#17294	20:44
fungi	host networking reaching the container?	20:45
fungi	it's definitely odd, because it was listening on 8080/tcp in the prior change when it was still using the h2 backend	20:45
fungi	rather, the test that it was listening passed	20:46
opendevreview	Jeremy Stanley proposed opendev/system-config master: Revert "Switch from legacy to new style keycloak container" https://review.opendev.org/c/opendev/system-config/+/907119	20:47
opendevreview	Jeremy Stanley proposed opendev/system-config master: Upgrade to Keycloak 23.0 https://review.opendev.org/c/opendev/system-config/+/907141	20:47
fungi	still trying to get the new test to pass on the revert, but also i've set an autohold for the upgrade change	20:48
clarkb	looking at db stuff for keycloak they don't do utf8mb4...	21:20
clarkb	that won't explain your test failures, but wow in 2024 we still can't get reliable utf8 from mysql/mariadb	21:20
clarkb	fungi: looking at the mariadb log https://zuul.opendev.org/t/openstack/build/799c01c02b1d46539c8d68e4e15b48bc/log/keycloak01.opendev.org/docker/keycloak-docker_mariadb_1.txt I don't see it logging any connections	21:22
clarkb	I don't know that it would necessarily, but maybe we aren't connecting properly and keycloak is dying?	21:22
fungi	i would have expected to see that in its log, which i didn't. but hey, who knows	21:25
fungi	i'll know more once the held node is there	21:26
fungi	held node is 217.182.142.60	21:27
fungi	it's still running	21:27
fungi	netstat indicates there's something listening at 127.0.0.1:8080	21:27
fungi	i get html back from it too, connecting over the loopback on the host	21:28
fungi	what would the is_listening assert be checking if not that?	21:31
clarkb	I would expect it to do the equivalent of a netstat or ss for that port and ip	21:32
fungi	which is what i did first, and it also listed the socket in a listening state (netstat -lnt)	21:41
fungi	hrm, though it shows up under tcp6 and not tcp	21:42
fungi	tcp6 0 0 127.0.0.1:8080 :::* LISTEN	21:42
fungi	but i was still able to query http://127.0.0.1:8080/ with wget, no problem	21:43
clarkb	maybe change the tcp:// to tcp6:// in the test?	21:43
clarkb	I wonder if that is filtering too aggressively	21:44
opendevreview	Jeremy Stanley proposed opendev/system-config master: Upgrade to Keycloak 23.0 https://review.opendev.org/c/opendev/system-config/+/907141	21:45
fungi	worth a shot	21:45
clarkb	starting on tomorrow's meeting agenda. I'll add notes about keycloak and remove the zuul db item	21:58
clarkb	I guess I should look at the centos wheel builds to catch up on that	21:58
clarkb	looks like the publish jobs are working now but not the release jobs	22:00
clarkb	'VLDB: no such entry' and "afs_volume": "mirror.wheel.cent8a64"	22:01
clarkb	I see the issue	22:04
opendevreview	Clark Boylan proposed openstack/project-config master: Fix wheel_volume values for centos stream wheel mirrors https://review.opendev.org/c/openstack/project-config/+/907150	22:10
fungi	clarkb: no dice... "Cannot validate protocol 'tcp6'. Should be tcp, udp or unix"	22:23
clarkb	fungi: I hate this next idea but maybe we rewrite the test to do an http fetch which if successful implies the socket is listening	22:24
clarkb	the other thought is to make sure the correct server is being tested	22:24
fungi	or delete the test entirely since we now have more thorough tests anyway?	22:24
clarkb	ya	22:25
fungi	it had a purpose back when we didn't actually interact with the service in any tests	22:25
clarkb	it is weird that it is failing though	22:25
fungi	oh, the existing tests don't connect to the listener from the container directly though, they go through the apache reverse-proxy	22:26
fungi	but regardless, i expect they'd fail if it wasn't up and listening	22:26
clarkb	and the container uses host networking so the distinction there isn't meaningful. I meant more is the testinfra node correct	22:27
opendevreview	Jeremy Stanley proposed opendev/system-config master: Upgrade to Keycloak 23.0 https://review.opendev.org/c/opendev/system-config/+/907141	22:27
fungi	i've got a new autohold set for it too	22:28
fungi	in part, i'm curious to see if the subsequent tests actually work even though that one was failing	22:29
fungi	oh, in fact they did pass in previous builds too	22:31
fungi	so only the is_listening check for 8080/tcp on localhost was failing, but the proxied interactions (using the admin test creds to issue a token and then making an admin-only api call with that token) through apache's proxy were fine	22:32
clarkb	yup it has to be something to do with the test but whatever it is isn't very obvious	22:34
clarkb	I've updated the meeting agenda. Anything else to add?	22:39
fungi	looks like it relies on either ss or netstat depending on what's available: https://github.com/pytest-dev/pytest-testinfra/blob/main/testinfra/modules/socket.py#L212-L312	22:40
fungi	defaults to ss if present, so i'll test with that on the next held node to see if i can spot the problem	22:40
fungi	it seems to do some rather fragile parsing of ss output too	22:49
opendevreview	Jeremy Stanley proposed opendev/system-config master: DNM: Fail keycloak testing for an autohold https://review.opendev.org/c/opendev/system-config/+/906600	23:36
fungi	forgot the autohold wasn't going to do much good if i got the change to a passing state	23:40
clarkb	that looks like a noop rebase though?	23:41
clarkb	but I think it will fail	23:41
fungi	i restored the dnm change i was using before and then cherry-picked it onto the passing change	23:41
clarkb	ah	23:42
fungi	so autohold is now set on the dnm change that appends an always-failing test	23:42
fungi	then i'll step through the testinfra.modules.socket.LinuxSocketSS._iter_sockets() and see what it comes back with	23:44
fungi	looks like it should do `ss --numeric --listening` and then check all the returned lines for a match	23:49
fungi	wow, that method has a variable named "local"	23:50
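
Editor's note: the "filtering too aggressively" guess from earlier in the log can be illustrated with a small sketch. This is not pytest-testinfra's code; the function names and the strict/relaxed protocol sets are hypothetical, and the only real data is the netstat row quoted above. It only shows how a check that requires the protocol column to be exactly "tcp" would miss a listener reported under "tcp6", even though the local address is 127.0.0.1:8080.

```python
# Illustrative sketch only; this is NOT pytest-testinfra's implementation.
# The sample row is the `netstat -lnt` output quoted in the log.
NETSTAT_ROWS = ["tcp6 0 0 127.0.0.1:8080 :::* LISTEN"]


def parse_row(row):
    """Split a `netstat -lnt` row into (protocol, local_address, state)."""
    # Columns: Proto Recv-Q Send-Q Local-Address Foreign-Address State
    fields = row.split()
    return fields[0], fields[3], fields[5]


def is_listening(rows, address, protocols):
    """Return True if any row is a LISTEN socket on `address` whose
    protocol column is one of `protocols` (hypothetical helper)."""
    for row in rows:
        protocol, local, state = parse_row(row)
        if state == "LISTEN" and local == address and protocol in protocols:
            return True
    return False


# A strict filter accepts only rows labelled "tcp"; a relaxed one also
# accepts rows labelled "tcp6".
strict = is_listening(NETSTAT_ROWS, "127.0.0.1:8080", {"tcp"})
relaxed = is_listening(NETSTAT_ROWS, "127.0.0.1:8080", {"tcp", "tcp6"})
print(strict, relaxed)  # prints: False True
```

Whether the real check behaves like the strict variant is exactly what stepping through the held node is meant to establish; the sketch only makes the hypothesis concrete.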
