frickler | tonyb: not sure about the status of inmotion cleanup, according to grafana there's still 28 nodes stuck in deleting? | 09:10 |
---|---|---|
*** tosky_ is now known as tosky | 10:19 | |
opendevreview | Jan Marchel proposed openstack/project-config master: Add new components to NebulOuS project: prediction-orchiestrator, exn-middleware, overlay-network-agent https://review.opendev.org/c/openstack/project-config/+/907060 | 13:48 |
jrosser | i know not much can be done but looks like connectivity between me and opendev.org is pretty terrible again | 15:25 |
jrosser | and mtr points to above.net / zayo as culprit again | 15:26 |
fungi | jrosser: ipv4, v6, both? | 15:32 |
jrosser | ah i'm not on a dual stack vm right now | 15:33 |
* jrosser tries something else | 15:33 | |
* frickler still has no v6 connectivity to opendev.org. (or again? I keep losing track. likely should add that topic to preptg agenda, too) | 15:35 | |
jrosser | i see this for v4 https://paste.opendev.org/show/bVNawMTqNZJenQUqe40F/ | 15:36 |
jrosser | though i'm not sure that actually reports end to end loss | 15:36 |
frickler | well end to end loss is in the final line, which says 0%, so that looks fine | 15:38 |
jrosser | it's the same symptom that i've seen before of only kbit/s throughput on git things | 15:46 |
frickler | jrosser: here's an mtr in the opposite direction to the first public IP in your trace. looks like there might be congestion between zayo and bbc https://paste.opendev.org/show/bVonvvCAVp1gj1U8Sk5x/ | 16:02 |
opendevreview | Tim Burke proposed opendev/git-review master: Add classifiers for Python 3.10 and 3.11 https://review.opendev.org/c/opendev/git-review/+/907097 | 16:41 |
frickler | seems we have a regression in linting for git-review, failure unrelated to the patch afaict https://zuul.opendev.org/t/opendev/build/6fbff753824c418aa26d173d5ffefb13 | 16:45 |
corvus | Clark: i'm working on the skopeo thing | 16:50 |
clarkb | corvus: ok, is it more involved than simply using a newer client? | 16:55 |
clarkb | hrm I guess the change that tried that failed too | 16:55 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: ensure-skopeo: use make install https://review.opendev.org/c/zuul/zuul-jobs/+/907100 | 16:57 |
corvus | clarkb: i think that's the next mole to whack ^ | 16:57 |
corvus | if works, will need to squash | 16:57 |
clarkb | I guess that installs to /usr/local/bin by default? | 16:58 |
opendevreview | Tim Burke proposed opendev/git-review master: Use importlib.metadata instead of pkg_resources https://review.opendev.org/c/opendev/git-review/+/907101 | 17:00 |
corvus | clarkb: yep, verified in a local build | 17:01 |
opendevreview | Merged openstack/project-config master: Implement ironic-unmaintained-core group https://review.opendev.org/c/openstack/project-config/+/902796 | 17:09 |
opendevreview | Tim Burke proposed opendev/git-review master: Fix flake8 issue https://review.opendev.org/c/opendev/git-review/+/907102 | 17:10 |
timburke | frickler, thanks for letting me know; ^^^ should address | 17:11 |
opendevreview | Jan Marchel proposed openstack/project-config master: Add new components to NebulOuS project: prediction-orchiestrator, exn-middleware, overlay-network-agent https://review.opendev.org/c/openstack/project-config/+/907060 | 17:13 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Retire the OpenInfra Labs mailing list https://review.opendev.org/c/opendev/system-config/+/907103 | 17:16 |
clarkb | fungi: where did we end up with keycloak? | 17:26 |
clarkb | fungi: I think if others don't object renaming the database files and mounting them in the new locations seems reasonable. I guess make backups of the files too | 17:26 |
fungi | clarkb: yeah, that's what i'm doing, basically | 17:28 |
fungi | also dawned on me that i should double-check the ownership and permissions inside the container | 17:28 |
clarkb | oh ya those may have changed too. Fun | 17:28 |
fungi | does mirror02.iad3.inmotion need to remain in the emergency disable list, or is it working again? | 17:29 |
fungi | i seem to be able to ssh into it | 17:29 |
fungi | just want to be sure before i re-break our base deploy | 17:29 |
clarkb | I think it is happy after tonyb did surgery on that cloud | 17:29 |
clarkb | basically had to restart rabbitmq properly. We tried a naive approach which didn't work but there was a kolla method that was better and got things working again | 17:30 |
fungi | okay, i've taken it back out but we should keep an eye on the deploy jobs | 17:30 |
clarkb | ++ | 17:31 |
fungi | i also added keycloak01 to the disable list while i'm working on potential compose file edits | 17:31 |
clarkb | corvus: fwiw I checked snapcraft and microk8s latest/stable hasn't updated since they broke things. No new release yet. The bug has gotten some references added to it so maybe it will get fixed | 17:32 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Run buildset registry test jobs on ensure-skopeo changes https://review.opendev.org/c/zuul/zuul-jobs/+/907104 | 17:33 |
clarkb | corvus: which means we may want to go ahead with the nodepool change for now then swing around and cleanup zuul-jobs | 17:34 |
corvus | which nodepool change? | 17:34 |
corvus | https://review.opendev.org/906905 | 17:35 |
clarkb | yup that one. That change is what started my journey on friday :) | 17:35 |
clarkb | trying to land the ssh keyscan change | 17:35 |
clarkb | corvus: https://review.opendev.org/c/zuul/zuul-jobs/+/906907 was next then https://review.opendev.org/c/zuul/zuul-jobs/+/906916 | 17:36 |
corvus | yeah, i got those but missed the rollback one :) | 17:37 |
corvus | i think i missed it cause i started at the end and worked backwards | 17:37 |
clarkb | and then finally the ssh keyscan change you wrote is at the very beginning | 17:38 |
corvus | thanks for digging into that :) | 17:40 |
clarkb | you're welcome | 17:41 |
opendevreview | Merged opendev/git-review master: Fix flake8 issue https://review.opendev.org/c/opendev/git-review/+/907102 | 17:51 |
opendevreview | Tim Burke proposed opendev/git-review master: Use importlib.metadata instead of pkg_resources https://review.opendev.org/c/opendev/git-review/+/907101 | 17:52 |
clarkb | it took me a second to understand why ^ is necessary since git-review depends on setuptools to install. Except it doesn't if you install from a wheel... | 18:05 |
timburke | clarkb, bingo -- idea was to have it as an alternative to https://review.opendev.org/c/opendev/git-review/+/898839?usp=search | 18:31 |
fungi | mmm, i'm getting increasingly out of my depth with the keycloak migration experiment... container log is now full of errors about failing to start the server because it can't establish a jdbc connection, reporting "ERROR: Wrong user name or password [28000-197]" | 18:45 |
fungi | i guess h2 databases have some sort of integrated credentials? | 18:45 |
clarkb | I didn't think they did. I thought it was more like sqlite. But maybe I'm wrong about that | 18:46 |
fungi | https://stackoverflow.com/questions/63800413/invalid-username-password-when-accessing-keycloaks-h2-database | 18:47 |
adamcarthur5 | I've seen that in some cases people are adding a comment that re-runs the CI to check if it is a fluke. What is the best way to do this myself? | 18:47 |
clarkb | heh that even points at keycloak | 18:47 |
fungi | adamcarthur5: start a review comment with the word "recheck" but ideally only do that after you've looked into the failure and feel confident it's not being caused by your change | 18:48 |
adamcarthur5 | Okay great, thank you :)) | 18:48 |
clarkb | you can also append any text after "recheck" if you want to add notes around why you are rerunning things | 18:48 |
JayF | Yes, please do not run `recheck` without some reasoning/explanation afterwards. | 18:49 |
JayF | even `recheck the code under test is not changed by this patch` is good as a minimum bar | 18:49 |
fungi | yes, i generally "recheck because ..." (some summary of the nature of the job failure and why i know it's unrelated) | 18:49 |
clarkb | fungi: things like that are making me think maybe we need to consider a mariadb and start over on config | 18:49 |
clarkb | just because we know how to work with a mariadb and configuration of that is a bit more explicit | 18:49 |
clarkb | I dunno this feels like a framework migration where they never considered h2 users because that isn't a "production" deployment so doesn't matter | 18:50 |
JayF | adamcarthur5 the ideal case is that, as a contributor to $project, you'd troubleshoot the random failure and fix it like some kinda coding superhero :D. The reality is as an early contributor, the best thing you can do there is just read the logs, make sure you start recognizing if patterns emerge in failures, and if so raise a question in IRC or on mailing list | 18:50 |
adamcarthur5 | JayF yeah definitely. I'll just take it project by project and see how I get on :)) | 18:51 |
clarkb | fungi: the only other idea I've got is if we can determine the old h2 db creds we can probably configure quarkus keycloak to use those creds | 18:52 |
JayF | Yeah the entire point is twofold: 1) each `recheck` comment is expensive, in terms of actual-computers-being-run, and 2) we do not want to get to a point where a periodic failure (say, one that fails randomly 5-10% of the time) gets into the codebase because we "rechecked" to get a green | 18:52 |
clarkb | fungi: also why you would change those at all is beyond me. It only serves to make users' lives miserable | 18:52 |
clarkb | fungi: KC_DB_USERNAME and KC_DB_PASSWORD seem to be what you use to set things for postgres etc. Maybe we can set those values and have them work with h2 | 18:53 |
fungi | clarkb: yeah, found this in an upgrade note too: "Keycloak ships for development purposes with an H2 database driver. As it is intended for development purposes only, it should never be used in a production environment. In this release, the H2 driver has been upgraded from version 1.x to version 2.x. This change might require changes to the H2 JDBC URL or migration of the H2 database files in an existing Keycloak setup. [...] Purge existing H2 database files to start with an empty database, export and import the realms using Keycloak’s export and import functionality, or refer to the migration notes on the H2 database project’s website for details on how to migrate H2 database contents." | 18:53 |
fungi | that's for 20.0.0 i think, but reflects their sentiments fairly well | 18:54 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Run buildset registry test jobs on ensure-skopeo changes https://review.opendev.org/c/zuul/zuul-jobs/+/907104 | 18:54 |
clarkb | fungi: ya basically they don't care about people using h2 | 18:54 |
clarkb | fungi: https://stackoverflow.com/questions/72863453/unable-to-access-keycloak-18-0-2-embedded-h2-db-file this has what may be the defaults we used | 18:54 |
clarkb | fungi: we could try forcing those values here. Otherwise maybe we consider a revert and plan for a proper db to avoid problems in the future? | 18:54 |
clarkb | oh wait those values are for quarkus not wildfly | 18:55 |
clarkb | but we might be able to look in the same files and go back in time to find the old values? | 18:56 |
corvus | clarkb: fungi if our actual goal is to eventually proxy to other providers, then we could use h2 and reconstruct the contents via the api on every deployment. | 18:56 |
clarkb | corvus: we'd still need the user generated content though | 18:56 |
fungi | yeah, it looks like the container change included switching from wildfly to quarkus | 18:56 |
corvus | (or of course, use a real dbms; just mentioning the option) | 18:56 |
clarkb | fungi: yes that is the difference between legacy and not legacy | 18:57 |
clarkb | and it seems like they didn't put a lot of effort into making that transition easy | 18:57 |
clarkb | I mean they could've copied the db from the old location to the new one and continued to use the same credentials | 18:57 |
fungi | corvus: yeah, i think that's where we're headed. the main question i have is whether we should roll back the container change temporarily, and whether we need to include steps to export the h2 realm data from h2/import it into mariadb or want to start with a clean slate | 18:58 |
fungi | so two questions i guess | 18:58 |
corvus | clarkb: oh i guess we would accumulate mappings, wouldn't we? | 18:58 |
clarkb | corvus: yes exactly. Over time users would be building out the db telling keycloak what backend identities they have for the frontend of opendev id (for lack of better terminology) | 18:58 |
corvus | then i agree dbms is the right long-term answer; just a question of whether we want to take that hit now or later | 18:59 |
fungi | not trying to keep our old data would only impact maybe half a dozen users, but i'm open to doing the extra export/import if folks want | 19:00 |
fungi | also since we only have keycloak hooked up to one system for now and it's not in a critical path for anything, it seems like this would be the time to switch it to a proper sql server. that will in theory greatly simplify the container version updates we want to do once we're on the "new style" image, but also means less data to lose or migrate if we do it before working on tying it to more systems | 19:01 |
clarkb | ya I think it also simplifies working with that data | 19:02 |
clarkb | since we know how to talk to a mariadb (or even a postgres) but h2 is a bit more complicated | 19:02 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Revert "Switch from legacy to new style keycloak container" https://review.opendev.org/c/opendev/system-config/+/907119 | 19:12 |
fungi | oh funky... default behavior of git revert has changed! now if you revert a revert, the subject changes to 'Reapply "whatever the reverted commit subject was"' rather than 'Revert "Revert ..."' | 19:14 |
clarkb | yup. I left a note in there about keeping the new test as I think that is valuable | 19:14 |
fungi | the test may need altering because of the change to the url, but i can do that. or i can reintroduce the test as a separate change before the reapply | 19:15 |
fungi | preference? | 19:15 |
clarkb | ya you need to add the /auth/ prefix (noted that in my comment) | 19:16 |
clarkb | I'm fine with a followup change to the revert | 19:16 |
fungi | also i've undone my test changes to the db files (wiped both paths we're mounting into the container and replaced their content with the local backup copies i made) | 19:18 |
fungi | once we merge a revert, i'll take the server back out of the disable list | 19:18 |
clarkb | sounds like a plan | 19:18 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Revert "Switch from legacy to new style keycloak container" https://review.opendev.org/c/opendev/system-config/+/907119 | 19:21 |
fungi | clarkb: something to consider... if we don't care about exporting/importing, maybe it would make more sense to make the jump to the latest keycloak lts version instead of incrementally upgrading? | 19:28 |
clarkb | fungi: ++ | 19:30 |
fungi | huh, actually they don't do "lts" (unless you count red hat's commercial version), our choices are between nightly branch tips, latest release version, or pinning a major (or minor/patch/build) version | 19:32 |
fungi | at the moment, 23.0.5-0 == 23.0 == latest | 19:33 |
fungi | so we'd either do latest and live with potential surprises (like in a month when 24.0.0 is released) or 23.0 i think and then keep on top of updates for it | 19:34 |
fungi | i'll start conservatively with 23.0 and see what happens | 19:34 |
clarkb | I think we did latest previously | 19:35 |
clarkb | and then latest switched to quarkus and broke so we switched to legacy and that updated a few releases without trouble | 19:35 |
clarkb | the main risk is probably if they do another big shift like quarkus | 19:35 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Upgrade to Keycloak 23.0 https://review.opendev.org/c/opendev/system-config/+/907141 | 19:55 |
fungi | https://zuul.opendev.org/t/openstack/build/799c01c02b1d46539c8d68e4e15b48bc/log/keycloak01.opendev.org/docker/keycloak-docker_keycloak_1.txt#41 says "Listening on: http://127.0.0.1:8080" at 20:15:22 but then testinfra fails at 20:17:15 saying that it wasn't https://zuul.opendev.org/t/openstack/build/799c01c02b1d46539c8d68e4e15b48bc/log/job-output.txt#17294 | 20:44 |
fungi | host networking reaching the container? | 20:45 |
fungi | it's definitely odd, because it *was* listening on 8080/tcp in the prior change when it was still using the h2 backend | 20:45 |
fungi | rather, the test that it was listening passed | 20:46 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Revert "Switch from legacy to new style keycloak container" https://review.opendev.org/c/opendev/system-config/+/907119 | 20:47 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Upgrade to Keycloak 23.0 https://review.opendev.org/c/opendev/system-config/+/907141 | 20:47 |
fungi | still trying to get the new test to pass on the revert, but also i've set an autohold for the upgrade change | 20:48 |
clarkb | looking at db stuff for keycloak they don't do utf8mb4... | 21:20 |
clarkb | that won't explain your test failures, but wow in 2024 we still can't get reliable utf8 from mysql/mariadb | 21:20 |
clarkb | fungi: looking at the mariadb log https://zuul.opendev.org/t/openstack/build/799c01c02b1d46539c8d68e4e15b48bc/log/keycloak01.opendev.org/docker/keycloak-docker_mariadb_1.txt I don't see it logging any connections | 21:22 |
clarkb | I don't know that it would necessarily, but maybe we aren't connecting properly and the keycloak is dying? | 21:22 |
fungi | i would have expected to see that in its log, which i didn't. but hey, who knows | 21:25 |
fungi | i'll know more once the held node is there | 21:26 |
fungi | held node is 217.182.142.60 | 21:27 |
fungi | it's still running | 21:27 |
fungi | netstat indicates there's something listening at 127.0.0.1:8080 | 21:27 |
fungi | i get html back from it too, connecting over the loopback on the host | 21:28 |
fungi | what would the is_listening assert be checking if not that? | 21:31 |
clarkb | I would expect it to do the equivalent of a netstat or ss for that port and ip | 21:32 |
fungi | which is what i did first, and it also listed the socket in a listening state (netstat -lnt) | 21:41 |
fungi | hrm, though it shows up under tcp6 and not tcp | 21:42 |
fungi | tcp6 0 0 127.0.0.1:8080 :::* LISTEN | 21:42 |
fungi | but i was still able to query http://127.0.0.1:8080/ with wget, no problem | 21:43 |
clarkb | maybe change the tcp:// to tcp6:// in the test? | 21:43 |
clarkb | I wonder if that is filtering too aggressively | 21:44 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Upgrade to Keycloak 23.0 https://review.opendev.org/c/opendev/system-config/+/907141 | 21:45 |
fungi | worth a shot | 21:45 |
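One way such a check could miss the socket, as a minimal sketch: the sample row is the netstat line quoted above, the header row is assumed, and the parsing functions are hypothetical illustrations rather than testinfra's actual code. A filter that only accepts rows whose protocol column reads exactly `tcp` skips a listener that netstat reports under `tcp6`, even though the address looks like plain IPv4.

```python
# Hypothetical illustration only; this is NOT how testinfra parses netstat.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp6       0      0 127.0.0.1:8080          :::*                    LISTEN
"""


def listening_sockets(netstat_output):
    """Yield (proto, host, port) for each LISTEN row of `netstat -lnt` output."""
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a listening socket row.
        if len(fields) < 6 or fields[5] != "LISTEN":
            continue
        host, _, port = fields[3].rpartition(":")
        yield fields[0], host, int(port)


def is_listening_strict(netstat_output, host, port):
    """Match only rows whose protocol column is exactly 'tcp'."""
    return ("tcp", host, port) in listening_sockets(netstat_output)


def is_listening_lenient(netstat_output, host, port):
    """Match rows reported under either 'tcp' or 'tcp6'."""
    return any(
        proto in ("tcp", "tcp6") and (h, p) == (host, port)
        for proto, h, p in listening_sockets(netstat_output)
    )


print(is_listening_strict(SAMPLE, "127.0.0.1", 8080))
print(is_listening_lenient(SAMPLE, "127.0.0.1", 8080))
```

Whether the real is_listening check fails for this reason is not established at this point in the log; the sketch only shows that the `tcp6` label alone is enough to defeat a strict protocol match.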
clarkb | starting on tomorrow's meeting agenda. I'll add notes about keycloak and remove the zuul db item | 21:58 |
clarkb | I guess I should look at the centos wheel builds to catch up on that | 21:58 |
clarkb | looks like the publish jobs are working now but not the release jobs | 22:00 |
clarkb | 'VLDB: no such entry' and "afs_volume": "mirror.wheel.cent8a64" | 22:01 |
clarkb | I see the issue | 22:04 |
opendevreview | Clark Boylan proposed openstack/project-config master: Fix wheel_volume values for centos stream wheel mirrors https://review.opendev.org/c/openstack/project-config/+/907150 | 22:10 |
fungi | clarkb: no dice... "Cannot validate protocol 'tcp6'. Should be tcp, udp or unix" | 22:23 |
clarkb | fungi: I hate this next idea but maybe we rewrite the test to do an http fetch which if successful implies the socket is listening | 22:24 |
clarkb | the other thought is to make sure the correct server is being tested | 22:24 |
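A rough sketch of the HTTP-fetch idea, using only the Python standard library. The helper name `http_responds` and the timeout are assumptions for illustration; this is not the test that was eventually written. Any HTTP response, including an error status, is treated as proof that something is listening.

```python
import http.client
import urllib.error
import urllib.request


def http_responds(url, timeout=5.0):
    """Return True if any HTTP response comes back from url.

    Hypothetical helper: a successful fetch (or an HTTP error status)
    implies the socket is listening; a refused connection or timeout
    returns False.
    """
    # Ignore proxy settings from the environment so a loopback URL is
    # contacted directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with 4xx/5xx, so it is up and listening.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


if __name__ == "__main__":
    print(http_responds("http://127.0.0.1:8080/"))
```

Unlike parsing ss or netstat output, this does not care which address family the listener is reported under, at the cost of only proving that some HTTP server answers on that address.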
fungi | or delete the test entirely since we now have more thorough tests anyway? | 22:24 |
clarkb | ya | 22:25 |
fungi | it had a purpose back when we didn't actually interact with the service in any tests | 22:25 |
clarkb | it is weird that it is failing though | 22:25 |
fungi | oh, the existing tests don't connect to the listener from the container directly though, they go through the apache reverse-proxy | 22:26 |
fungi | but regardless, i expect they'd fail if it wasn't up and listening | 22:26 |
clarkb | and the container uses host networking so the distinction there isn't meaningful. I meant more is the testinfra node correct | 22:27 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Upgrade to Keycloak 23.0 https://review.opendev.org/c/opendev/system-config/+/907141 | 22:27 |
fungi | i've got a new autohold set for it too | 22:28 |
fungi | in part, i'm curious to see if the subsequent tests actually work even though that one was failing | 22:29 |
fungi | oh, in fact they did pass in previous builds too | 22:31 |
fungi | so only the is_listening check for 8080/tcp on localhost was failing, but the proxied interactions (using the admin test creds to issue a token and then making an admin-only api call with that token) through apache's proxy were fine | 22:32 |
clarkb | yup it has to be something to do with the test but whatever it is isn't very obvious | 22:34 |
clarkb | I've updated the meeting agenda. Anything else to add? | 22:39 |
fungi | looks like it relies on either ss or netstat depending on what's available: https://github.com/pytest-dev/pytest-testinfra/blob/main/testinfra/modules/socket.py#L212-L312 | 22:40 |
fungi | defaults to ss if present, so i'll test with that on the next held node to see if i can spot the problem | 22:40 |
fungi | it seems to do some rather fragile parsing of ss output too | 22:49 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: DNM: Fail keycloak testing for an autohold https://review.opendev.org/c/opendev/system-config/+/906600 | 23:36 |
fungi | forgot the autohold wasn't going to do much good if i got the change to a passing state | 23:40 |
clarkb | that looks like a noop rebase though? | 23:41 |
clarkb | but I think it will fail | 23:41 |
fungi | i restored the dnm change i was using before and then cherry-picked it onto the passing change | 23:41 |
clarkb | ah | 23:42 |
fungi | so autohold is now set on the dnm change that appends an always-failing test | 23:42 |
fungi | then i'll step through the testinfra.modules.socket.LinuxSocketSS._iter_sockets() and see what it comes back with | 23:44 |
fungi | looks like it should run `ss --numeric --listening` and then check all the returned lines for a match | 23:49 |
fungi | wow, that method has a variable named "local" | 23:50 |