clarkb | fungi: ok posted a review. | 00:04 |
---|---|---|
tkajinam | I wonder if https://review.opendev.org/c/openstack/project-config/+/905976 can be moved forward | 02:06 |
tonyb | tkajinam: Looks good to me. | 02:08 |
tkajinam | tonyb, thanks ! | 02:09 |
fungi | clarkb: thanks! | 02:11 |
opendevreview | Merged openstack/project-config master: Add puppet-ceph-release right for special stable branch handling https://review.opendev.org/c/openstack/project-config/+/905976 | 02:46 |
tonyb | I've done more poking on the inmotion cloud and it looks like there are instances in the nova_api database that are deleted in the nova_cell0 database which explains the mismatch. I've reached out in openstack-nova for some help and will keep prodding there | 05:59 |
tonyb | I think it's just a matter of missed cleanups but I'd like some help from nova to make sure I do it right. | 06:00 |
tonyb | While working on it I may need to set the various hypervisors to disabled in a rolling fashion but I don't think that's any worse than what we have right now. | 06:01 |
frickler | tonyb: ack, thx for digging through this | 06:30 |
opendevreview | Jan Marchel proposed openstack/project-config master: Add new components to NebulOuS project: prediction-orchestrator, exn-middleware, overlay-network-agent https://review.opendev.org/c/openstack/project-config/+/907060 | 08:50 |
*** liuxie is now known as liushy | 08:57 | |
*** zigo_ is now known as zigo | 09:43 | |
*** ykarel_ is now known as ykarel | 10:00 | |
opendevreview | Merged openstack/project-config master: Add new components to NebulOuS project: prediction-orchestrator, exn-middleware, overlay-network-agent https://review.opendev.org/c/openstack/project-config/+/907060 | 13:17 |
*** d34dh0r5| is now known as d34dh0r53 | 15:01 | |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Upgrade to Keycloak 23.0 https://review.opendev.org/c/opendev/system-config/+/907141 | 15:21 |
*** d34dh0r53 is now known as d34dh0r5| | 16:01 | |
*** d34dh0r5| is now known as d34dh0r53 | 16:01 | |
fungi | clarkb: i think i addressed all your comments on ^ and the keycloak job is still passing (buildset just hasn't reported yet) | 16:22 |
clarkb | ack I'll rereview shortly | 16:22 |
fungi | no rush, just making sure you're aware | 16:22 |
clarkb | I just need to finish doing local updates and catch up on emails | 16:23 |
fungi | i hear ya, that sounds like the first 3 hours of my day | 16:23 |
clarkb | also tea which I now have | 16:24 |
fungi | oh! yes i'm overdue for a cup myself, thanks for the reminder | 16:24 |
clarkb | fungi: one thing that occurred to me is I'm not sure if the keycloak service has db backups yet (or backups at all). Would be a good followup post redeployment to ensure that is added | 16:32 |
clarkb | I don't think we need to bundle it in the deployment though since we aren't using it for anything critical yet | 16:32 |
fungi | clarkb: yes, i added a note about db backups on the etherpad, i was unsure if that was something i could include in the initial change or if it needed to be a followup after deployment | 16:33 |
clarkb | I think you can do it all together, but that is a lot of moving parts and something that might end up getting discarded if we redo it | 16:33 |
fungi | i don't think it has any persistent data other than in the database, so unless we also want to backup its logs (maybe not a bad idea for forensic reasons) the db backup should be sufficient for disaster recovery | 16:34 |
clarkb | I think we get the system backups with the exclusions by default then we can add db on top. Not sure if the current roles allow you to just do the db | 16:35 |
clarkb | also the change lgtm now | 16:35 |
fungi | really, remote logging (or local worm) would be best for forensics, but not something to worry about for now | 16:36 |
fungi | i wonder what worm drive options there might be. something tells me that's not a common thing in cloud providers | 16:37 |
clarkb | aroo? | 16:38 |
fungi | "write once read many" | 16:39 |
fungi | sometimes called "append-only" | 16:39 |
fungi | apparently amazon glacier has a worm option | 16:39 |
clarkb | ah. "worm drive" makes me think of gears and stuff | 16:39 |
fungi | hah, yes that's also a type of gear. i had to replace one in my stand mixer recently | 16:40 |
fungi | i suppose the modern solution is cryptographic approaches for tamper-evident logging, i.e. merkle-damgård | 16:44 |
fungi | e.g., you progressively add each line to an iterative hash of the previous line | 16:45 |
fungi | but then you still have to put the hashes somewhere they can't be tampered with | 16:46 |
fungi | chained hashes | 16:47 |
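
A minimal sketch of the chained-hash idea described above: each log line is folded into a running digest, so altering or removing any earlier line changes every digest that follows it. The function and seed names here are illustrative only, not taken from any real syslog implementation.

```python
import hashlib

def chain_log_lines(lines, seed=b"chain-seed"):
    """Fold each log line into a running SHA-256 digest.

    Tampering with (or deleting) any earlier line changes every digest
    that follows it, making the chain tamper-evident as long as the
    latest digest is stored somewhere the attacker cannot reach.
    """
    digest = hashlib.sha256(seed).hexdigest()
    chained = []
    for line in lines:
        digest = hashlib.sha256((digest + line).encode()).hexdigest()
        chained.append((digest, line))
    return chained

for digest, line in chain_log_lines([
    "sshd: accepted publickey for root",
    "sudo: session opened",
    "sudo: session closed",
]):
    print(digest[:16], line)
```

As fungi notes just above, the chain only helps if the digests (or at least the most recent one) live somewhere the attacker cannot also modify.
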
fungi | hmm... apparently syslog-ng has something along those lines: https://man.archlinux.org/man/secure-logging.7.en | 16:49 |
fungi | though that also encrypts them | 16:49 |
fungi | but ultimately, the most thorough solution is to just stream logs in near-real-time to another system and try to make sure that the place you send your logs is unlikely to get compromised even if someone manages to tamper with the sending system and tries to hide their tracks by editing or removing logs | 16:55 |
fungi | so yeah, there's no real magic solution like the old-school worm enforced at the hardware level (or even older school logging to greenbar on an impact printer attached to a serial line to a different locked room/building) | 16:57 |
fungi | what was great was when the admins would mindlessly just toss piles of that into an unsecured dumpster, and you could laboriously read through looking for places where someone accidentally typed their password at the username prompt | 16:58 |
fungi | i mean, not that i ever did that or anything | 16:59 |
fungi | on closer inspection, this version of keycloak seems to do all its logging to stdout and gets captured in the container log, so we can drop that extra mount i think | 17:07 |
fungi | on the host filesystem for the held node, /var/log/keycloak/ is entirely empty | 17:08 |
clarkb | that may be another change between jboss and wildfly | 17:09 |
clarkb | in that case I think the update to have syslog consume it for us is fine and we can probably drop the dir and mount for the log dir? | 17:09 |
fungi | yeah, that's what i'm thinking | 17:10 |
fungi | minor concern though, there's still h2 databases in the container. i'll check for signs it's actually using sql | 17:10 |
fungi | possible i've got the envvars wrong | 17:11 |
fungi | yeah, there are no tables in the keycloak database | 17:15 |
fungi | resorting to cloning the source to dig for confirmation the envvar names are correct, but wow this is not a small repo | 17:29 |
fungi | worst case we can probably just map in our own https://github.com/keycloak/keycloak/blob/main/quarkus/dist/src/main/content/conf/keycloak.conf and set values directly there | 17:29 |
fungi | 605mb just checking out the main branch | 17:30 |
clarkb | fungi: https://www.keycloak.org/server/containers has different vars | 17:31 |
fungi | there are build-time and run-time envvars | 17:31 |
fungi | pretty sure those are the options to set when building your own image | 17:31 |
clarkb | looks like instead of an address we have to give it a full jdbc connection string? | 17:31 |
clarkb | oh weird | 17:32 |
clarkb | fungi: further down in that page they provide the db info as args to the start command | 17:32 |
clarkb | under running a standard keycloak container. Maybe ditch the env vars and use the command line instead? | 17:32 |
fungi | it looks like DB_VENDOR may have changed to just DB? https://github.com/keycloak/keycloak/blob/main/quarkus/config-api/src/main/java/org/keycloak/config/DatabaseOptions.java | 17:32 |
fungi | and yeah, i considered switching to the cli opts, since we already have several we're supplying anyway | 17:33 |
clarkb | https://mariadb.com/kb/en/about-mariadb-connector-j/ has jdbc url example for mariadb | 17:33 |
fungi | i'm going to fiddle with the held node a bit and see what works | 17:33 |
jrosser | i have some examples of this if you're interested | 17:33 |
jrosser | we run HA keycloak and mariadb | 17:33 |
fungi | jrosser: oh really? yes please! | 17:33 |
jrosser | `db-url=jdbc:{{ keycloak_jdbc_provider }}://{{ keycloak_jdbc_haproxy_vip }}:{{ keycloak_jdbc_db_port }}/{{ keycloak_db_name }}` | 17:34 |
jrosser | from the conf file | 17:34 |
jrosser | ansible, of course so those are our vars | 17:34 |
fungi | jrosser: also https://review.opendev.org/c/opendev/system-config/+/907141/11/playbooks/roles/keycloak/templates/docker-compose.yaml.j2 is what we'd tried up to this point | 17:34 |
clarkb | fungi: looking at that file I agree DB appears to be the var to set the high level type | 17:34 |
clarkb | but it isn't clear to me if those are read as env vars | 17:35 |
clarkb | --db=postgres is in the first example link I provided so they seem to map to cli args at least | 17:35 |
fungi | clarkb: the other common envvars like DB_PASSWORD turned up in that file | 17:35 |
fungi | so just a hunch | 17:35 |
fungi | i'm going to break for lunch and then start fiddling around a bit | 17:36 |
jrosser | fungi: ours is installed from distro packages so we template out the conf file | 17:37 |
jrosser | but we have recently done a massive series of upgrades bringing it to a pretty new version | 17:37 |
fungi | jrosser: yeah, like i said earlier, we can also just map our own conffile into the container if we want | 17:55 |
fungi | but having some semi-stable api (more stable than tracking changes to their default config file) would be preferable if we can work it out | 17:56 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Upgrade to Keycloak 23.0 https://review.opendev.org/c/opendev/system-config/+/907141 | 18:35 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: DNM: Fail keycloak testing for an autohold https://review.opendev.org/c/opendev/system-config/+/906600 | 18:35 |
fungi | clarkb: jrosser: ^ apparently they template out the db url with sub-options | 18:35 |
clarkb | that simplifies things | 18:35 |
fungi | so you can do --db-url-host, --db-url-port, --db-url-database... | 18:35 |
fungi | the tricky bit is that --db-url-host gets stuck straight into the jdbc url string, which is colon-delimited, so you have to include [] if using raw ipv6 addresses | 18:36 |
fungi | --db-url-host=::1 didn't work (and returned odd errors about the port), which confused me initially, until i realized it was reading that as a null host and null port | 18:37 |
fungi | --db-url-host=[::1] worked a treat though | 18:38 |
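
A minimal sketch (not Keycloak's actual implementation) of why a raw IPv6 literal breaks when it is substituted straight into a colon-delimited JDBC URL, and why the bracketed form parses cleanly:

```python
def build_jdbc_url(host, port, database, driver="mariadb"):
    # Naive, colon-delimited JDBC URL assembly, illustrative only.
    return f"jdbc:{driver}://{host}:{port}/{database}"

# Unbracketed IPv6 literal: the host:port boundary becomes ambiguous,
# matching the "null host and null port" behaviour described above.
print(build_jdbc_url("::1", 3306, "keycloak"))    # jdbc:mariadb://::1:3306/keycloak
# Bracketed literal: the port is unambiguous again.
print(build_jdbc_url("[::1]", 3306, "keycloak"))  # jdbc:mariadb://[::1]:3306/keycloak
```
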
fungi | i also added a test to confirm we have expected initial database content | 18:39 |
JayF | I may have just backported an ironic fix of a similar shape :-| | 18:39 |
JayF | although I think we found some things want [] and some break with [] | 18:39 |
fungi | what's old is old again | 18:39 |
clarkb | and docker just refuses to understand both versions | 18:39 |
clarkb | and podman refuses to change in solidarity with docker | 18:39 |
fungi | because the podman folks love docker so very, very much | 18:40 |
JayF | At least all our API mistakes are our API mistakes. It has to be annoying to be chasing someone elses' API | 18:40 |
fungi | JayF: only when it's undocumented | 18:40 |
fungi | which, you know, is most of the time | 18:41 |
clarkb | JayF: the frustrating thing as an end user is that podman is not compatible with docker in a bunch of different ways | 18:41 |
JayF | I have | 18:41 |
clarkb | but for some reason ipv6 literal support is not one of the ways they can differ | 18:41 |
JayF | I have lots of opinions about podman, and none of them include "this is a good idea that influenced tech in a positive way". I prefer someone be incompatible rather than 90% there | 18:41 |
fungi | it's mainly annoying that they clearly chose to be incompatible with docker in some ways, but then refuse to acknowledge clear bugs with the excuse that they want to be bug-compatible with docker | 18:42 |
* fungi has cake and eats it too | 18:43 | |
JayF | Yeah, this is the pattern you get trapped in if you chase someone elses' API | 18:43 |
clarkb | now I want cake | 18:44 |
fungi | the cake is a lie | 18:50 |
opendevreview | Clark Boylan proposed opendev/system-config master: Update to etherpad 1.9.6 https://review.opendev.org/c/opendev/system-config/+/907349 | 18:57 |
fungi | wow, database test worked on the first try! | 18:57 |
clarkb | nice | 18:58 |
fungi | the new held keycloak test node is 104.239.230.31 | 19:00 |
fungi | also no h2 files in the running container this time | 19:01 |
fungi | which i considered adding a test for, but figured checking mariadb for content was sufficient | 19:01 |
clarkb | ya and I think h2 dbs can be used as caches (gerrit does something like this) | 19:05 |
clarkb | so as long as the permanent data ends up in mariadb we should be good | 19:05 |
frickler | fwiw I'm seeing packet loss and sometimes-slow-responses from review.o.o like SvenKieske did earlier (in #*-kolla) | 19:13 |
clarkb | I'm seeing very minimal loss within my isp before packets jump out of our AS but nothing beyond that | 19:18 |
fungi | frickler: ipv4 i guess? you're presumably still not able to reach it at all over ipv6 | 19:21 |
clarkb | I've started to try and add some more depth to the pre ptg etherpad | 19:23 |
fungi | for me, ping -4 from home to review.o.o looks like: 100 packets transmitted, 100 received, 0% packet loss, time 99725ms rtt min/avg/max/mdev = 66.104/74.671/431.658/36.152 ms | 19:23 |
fungi | ping -6 is surprisingly a little better: 100 packets transmitted, 100 received, 0% packet loss, time 99140ms rtt min/avg/max/mdev = 54.896/57.737/68.236/2.460 ms | 19:26 |
frickler | fungi: yes, v6 is still unreachable | 19:31 |
JayF | 75.196/79.345/89.269/4.652 ms from here over v4, it looks good/normal to me | 19:33 |
* JayF has no v6 | 19:33 | |
fungi | i'm doing some tests from our mirror server in france, since that's the geographically closest network to germany i have access to, though i doubt it's following similar routes | 19:36 |
fungi | i could also boot one in vexxhost's warsaw region but that's not really any better | 19:37 |
fungi | ipv6: 100 packets transmitted, 100 received, 0% packet loss, time 99136ms, rtt min/avg/max/mdev = 80.353/81.795/99.545/3.724 ms | 19:41 |
fungi | ipv4: 100 packets transmitted, 100 received, 0% packet loss, time 99154ms, rtt min/avg/max/mdev = 80.125/80.971/98.320/2.549 ms | 19:41 |
fungi | fairly consistent from there | 19:41 |
jrosser | frickler: do you see where they are lost with mtr? | 19:42 |
fungi | we can also install the mtr package on review.o.o to get the reverse path for comparison, since cases like this quite often involve an asymmetric route somewhere and you've got a 50% chance to see failures misattributed to the first hop where they diverge | 19:44 |
frickler | jrosser: seems to be only the final two hops, so either the vexxhost link is full or something going wrong on the return path | 19:51 |
frickler | too bad mnaser isn't around any more most of the time to look at things from the inside | 19:52 |
fungi | when you see failures like that close to a provider edge, odds are you're dealing with an asymmetric route and the loss is somewhere on the way back | 19:52 |
mnaser | i'm around, but honestly, there's not much we can do with zayo, i've filed endless tickets with them | 19:52 |
fungi | we can run mtr from review.o.o to see where it errors | 19:52 |
mnaser | i'm playing ping pong with them and it's just a matter of having the contract lapse and recommending no one to ever touch their stuff :) | 19:53 |
fungi | i definitely don't envy you, nor do i miss chasing backbone provider problems | 19:53 |
mnaser | it turns out after all the internet is a series of tubes | 19:54 |
fungi | very leaky ones at that | 19:54 |
mnaser | it's not a big truck | 19:54 |
fungi | most of the troubles i remember would end up being two backbone providers who couldn't agree on who was responsible for upgrading the capacity on their peering with one another, so they'd just point fingers and let customers suffer until one of them eventually caved and added more circuits | 19:55 |
fungi | our bgp tables were a never-ending churn of pads and prefs to try to work around the worst offenders, but there was only so much we could do | 19:56 |
jrosser | just now everything i can look at (not at my work laptop) goes via cogent and looks OK | 19:56 |
frickler | mnaser: sorry to hear that. though from my traceroutes, both directions seem to be via cogent. and tbh I've heard more bad stories about cogent than zayo, but who knows | 20:14 |
mnaser | frickler: historically cogent has been the bad guy, but surprisingly they got their act together | 20:14 |
frickler | fungi: would we install mtr by hand or do we need to add it to the automation somewhere? | 20:15 |
frickler | mnaser: well not in terms of their connectivity to german telekom it seems | 20:16 |
fungi | frickler: i would just manually `sudo apt install mtr` but i wouldn't object to adding it and similar diagnostic tools to our default set if others are in favor | 20:18 |
frickler | I went for mtr-tiny in order to avoid installing like 100 X11 libraries | 20:20 |
tonyb | yeah I think adding it to the defaults is good. | 20:21 |
frickler | but having that as default tool together with things like tcpdump and nc is a good idea | 20:21 |
tonyb | also maybe jq? | 20:21 |
frickler | jq is also good, yes | 20:21 |
tonyb | is nmap too much? | 20:22 |
frickler | hmm ... at least questionable I'd say, too easy to do unwanted things with it | 20:22 |
frickler | I can look into a patch tomorrow, EODing for now | 20:24 |
fungi | yeah, i don't see nmap as being in the same category as those other things | 20:24 |
opendevreview | Clark Boylan proposed opendev/system-config master: DNM force etherpad failure to hold node https://review.opendev.org/c/opendev/system-config/+/840972 | 20:38 |
clarkb | put a hold in place for ^ after a successful test run on the parent | 20:39 |
fungi | looks like you should have a held node for it now | 21:24 |
opendevreview | James E. Blair proposed openstack/project-config master: DNM: test syntax error https://review.opendev.org/c/openstack/project-config/+/907362 | 21:44 |
clarkb | 173.231.255.107 is the held node and I'm in the clarkb-test etherpad if you want to help test | 21:49 |
clarkb | chrome is doing the random reconnect thing we've seen in the past but seems to work otherwise | 21:52 |
clarkb | if others can't find issues I think this is probably a safe update | 21:52 |
clarkb | curiously chrome and firefox render that reddish color differently | 21:53 |
*** blarnath is now known as d34dh0r53 | 22:17 | |
clarkb | I missed that https://discuss.python.org/t/what-to-do-about-gpus-and-the-built-distributions-that-support-them/7125 is a thing pypi is actually looking at now | 22:20 |
clarkb | this same issue is what ultimately led to us turning off our pypi mirroring | 22:20 |
fungi | yeah | 22:21 |
clarkb | it seems like the fundamental issue is that CUDA isn't packaged in a way that is consumable as a dependency so everyone bundles it | 22:23 |
clarkb | kind of surprising to me that very little of the discussion seems to have gone down the path of "stop allowing cuda to do this to us" | 22:24 |
fungi | stockholm syndrome | 22:25 |
clarkb | nvidia is making large buckets of money in large part due to the success of cuda + python | 22:26 |
clarkb | it's crazy to me that investing a small amount of that into making the packaging of the software not suck seems insurmountable | 22:26 |
clarkb | I guess at the end of the thread there is talk of the cudapython lib which does some of that | 22:27 |
clarkb | except that those bindings are different than the ones everyone is already using | 22:27 |
fungi | held etherpad lgtm | 22:28 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Add zuul-tenant-conf-check role/job https://review.opendev.org/c/zuul/zuul-jobs/+/907363 | 22:35 |
clarkb | I've got a doctor appointment tomorrow morning but maybe we upgrade etherpad when I get back | 22:44 |
clarkb | tonyb: looks like fungi reviewed the meetpad stack too if we want to start merging some of those. I think most of them are safe as they don't try and replace anything yet? | 22:45 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Add zuul-tenant-conf-check role/job https://review.opendev.org/c/zuul/zuul-jobs/+/907363 | 22:46 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Add zuul-tenant-conf-check role/job https://review.opendev.org/c/zuul/zuul-jobs/+/907363 | 22:57 |
fungi | sgtm | 23:05 |
tonyb | clarkb: Sounds good. I'll add a comment to the first review to address your question | 23:11 |
tonyb | Also FWIW: I'm slowly removing the stuck nodes from inmotion | 23:12 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Add zuul-tenant-conf-check role/job https://review.opendev.org/c/zuul/zuul-jobs/+/907363 | 23:14 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Add zuul-tenant-conf-check role/job https://review.opendev.org/c/zuul/zuul-jobs/+/907363 | 23:24 |
clarkb | tonyb: what process did you end up using for cleaning up the stuck nodes? | 23:43 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Add zuul-tenant-conf-check role/job https://review.opendev.org/c/zuul/zuul-jobs/+/907363 | 23:55 |