clarkb | fungi: ok posted a review. | 00:04 |
---|---|---|
tkajinam | I wonder if https://review.opendev.org/c/openstack/project-config/+/905976 can be moved forward | 02:06 |
tonyb | tkajinam: Looks good to me. | 02:08 |
tkajinam | tonyb, thanks ! | 02:09 |
fungi | clarkb: thanks! | 02:11 |
opendevreview | Merged openstack/project-config master: Add puppet-ceph-release right for special stable branch handling https://review.opendev.org/c/openstack/project-config/+/905976 | 02:46 |
tonyb | I've done more poking on the inmotion cloud and it looks like there are instances in the nova_api database that are deleted in the nova_cell0 database which explains the mismatch. I've reached out in openstack-nova for some help and will keep prodding there | 05:59 |
tonyb | I think it's just a matter of missed cleanups but I'd like some help from nova to make sure I do it right. | 06:00 |
tonyb | While working on it I may need to set the various hypervisors to disabled in a rolling fashion but I don't think that's any worse than what we have right now. | 06:01 |
frickler | tonyb: ack, thx for digging through this | 06:30 |
opendevreview | Jan Marchel proposed openstack/project-config master: Add new components to NebulOuS project: prediction-orchestrator, exn-middleware, overlay-network-agent https://review.opendev.org/c/openstack/project-config/+/907060 | 08:50 |
*** liuxie is now known as liushy | 08:57 | |
*** zigo_ is now known as zigo | 09:43 | |
*** ykarel_ is now known as ykarel | 10:00 | |
opendevreview | Merged openstack/project-config master: Add new components to NebulOuS project: prediction-orchestrator, exn-middleware, overlay-network-agent https://review.opendev.org/c/openstack/project-config/+/907060 | 13:17 |
*** d34dh0r5| is now known as d34dh0r53 | 15:01 | |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Upgrade to Keycloak 23.0 https://review.opendev.org/c/opendev/system-config/+/907141 | 15:21 |
*** d34dh0r53 is now known as d34dh0r5| | 16:01 | |
*** d34dh0r5| is now known as d34dh0r53 | 16:01 | |
fungi | clarkb: i think i addressed all your comments on ^ and the keycloak job is still passing (buildset just hasn't reported yet) | 16:22 |
clarkb | ack I'll rereview shortly | 16:22 |
fungi | no rush, just making sure you're aware | 16:22 |
clarkb | I just need to finish doing local updates and catch up on emails | 16:23 |
fungi | i hear ya, that sounds like the first 3 hours of my day | 16:23 |
clarkb | also tea which I now have | 16:24 |
fungi | oh! yes i'm overdue for a cup myself, thanks for the reminder | 16:24 |
clarkb | fungi: one thing that occurred to me is I'm not sure if the keycloak service has db backups yet (or backups at all). Would be a good followup post redeployment to ensure that is added | 16:32 |
clarkb | I don't think we need to bundle it in the deployment though since we aren't using it for anything critical yet | 16:32 |
fungi | clarkb: yes, i added a note about db backups on the etherpad, i was unsure if that was something i could include in the initial change or if it needed to be a followup after deployment | 16:33 |
clarkb | I think you can do it all together, but that is a lot of moving parts and something that might end up getting discarded if we redo it | 16:33 |
fungi | i don't think it has any persistent data other than in the database, so unless we also want to backup its logs (maybe not a bad idea for forensic reasons) the db backup should be sufficient for disaster recovery | 16:34 |
clarkb | I think we get the system backups with the exclusions by default then we can add db on top. Not sure if the current roles allow you to just do the db | 16:35 |
clarkb | also the change lgtm now | 16:35 |
fungi | really, remote logging (or local worm) would be best for forensics, but not something to worry about for now | 16:36 |
fungi | i wonder what worm drive options there might be. something tells me that's not a common thing in cloud providers | 16:37 |
clarkb | aroo? | 16:38 |
fungi | "write once read many" | 16:39 |
fungi | sometimes called "append-only" | 16:39 |
fungi | apparently amazon glacier has a worm option | 16:39 |
clarkb | ah. "worm drive" makes me think of gears and stuff | 16:39 |
fungi | hah, yes that's also a type of gear. i had to replace one in my stand mixer recently | 16:40 |
fungi | i suppose the modern solution is cryptographic approaches for tamper-evident logging, i.e. merkle-damgård | 16:44 |
fungi | e.g., you progressively add each line to an iterative hash of the previous line | 16:45 |
fungi | but then you still have to put the hashes somewhere they can't be tampered with | 16:46 |
fungi | chained hashes | 16:47 |
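
A minimal sketch of the chained-hash idea described above: each log line is folded into a running digest, so altering or removing any earlier line changes every digest that follows it. The function and seed names here are illustrative only, not taken from any real syslog implementation.

```python
import hashlib

def chain_log_lines(lines, seed=b"chain-seed"):
    """Fold each log line into a running SHA-256 digest.

    Tampering with (or deleting) any earlier line changes every digest
    that follows it, making the chain tamper-evident as long as the
    latest digest is stored somewhere the attacker cannot reach.
    """
    digest = hashlib.sha256(seed).hexdigest()
    chained = []
    for line in lines:
        digest = hashlib.sha256((digest + line).encode()).hexdigest()
        chained.append((digest, line))
    return chained

for digest, line in chain_log_lines([
    "sshd: accepted publickey for root",
    "sudo: session opened",
    "sudo: session closed",
]):
    print(digest[:16], line)
```

As fungi notes just above, the chain only helps if the digests (or at least the most recent one) live somewhere the attacker cannot also modify.
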
fungi | hmm... apparently syslog-ng has something along those lines: https://man.archlinux.org/man/secure-logging.7.en | 16:49 |
fungi | though that also encrypts them | 16:49 |
fungi | but ultimately, the most thorough solution is to just stream logs in near-real-time to another system and try to make sure that the place you send your logs is unlikely to get compromised even if someone manages to tamper with the sending system and tries to hide their tracks by editing or removing logs | 16:55 |
fungi | so yeah, there's no real magic solution like the old-school worm enforced at the hardware level (or even older school logging to greenbar on an impact printer attached to a serial line to a different locked room/building) | 16:57 |
fungi | what was great was when the admins would mindlessly just toss piles of that into an unsecured dumpster, and you could laboriously read through looking for places where someone accidentally typed their password at the username prompt | 16:58 |
fungi | i mean, not that i ever did that or anything | 16:59 |
fungi | on closer inspection, this version of keycloak seems to do all its logging to stdout and gets captured in the container log, so we can drop that extra mount i think | 17:07 |
fungi | on the host filesystem for the held node, /var/log/keycloak/ is entirely empty | 17:08 |
clarkb | that may be another change between jboss and wildfly | 17:09 |
clarkb | in that case I think the update to have syslog consume it for us is fine and we can probably drop the dir and mount for the log dir? | 17:09 |
fungi | yeah, that's what i'm thinking | 17:10 |
fungi | minor concern though, there's still h2 databases in the container. i'll check for signs it's actually using sql | 17:10 |
fungi | possible i've got the envvars wrong | 17:11 |
fungi | yeah, there are no tables in the keycloak database | 17:15 |
fungi | resorting to cloning the source to dig for confirmation the envvar names are correct, but wow this is not a small repo | 17:29 |
fungi | worst case we can probably just map in our own https://github.com/keycloak/keycloak/blob/main/quarkus/dist/src/main/content/conf/keycloak.conf and set values directly there | 17:29 |
fungi | 605mb just checking out the main branch | 17:30 |
clarkb | fungi: https://www.keycloak.org/server/containers has different vars | 17:31 |
fungi | there are build-time and run-time envvars | 17:31 |
fungi | pretty sure those are the options to set when building your own image | 17:31 |
clarkb | looks like instead of an address we have to give it a full jdbc connection string? | 17:31 |
clarkb | oh weird | 17:32 |
clarkb | fungi: further down in that page they provide the db info as args to the start command | 17:32 |
clarkb | under running a standard keycloak container. Maybe ditch the env vars and use the command line instead? | 17:32 |
fungi | it looks like DB_VENDOR may have changed to just DB? https://github.com/keycloak/keycloak/blob/main/quarkus/config-api/src/main/java/org/keycloak/config/DatabaseOptions.java | 17:32 |
fungi | and yeah, i considered switching to the cli opts, since we already have several we're supplying anyway | 17:33 |
clarkb | https://mariadb.com/kb/en/about-mariadb-connector-j/ has jdbc url example for mariadb | 17:33 |
fungi | i'm going to fiddle with the held node a bit and see what works | 17:33 |
jrosser | i have some examples of this if you're interested | 17:33 |
jrosser | we run HA keycloak and mariadb | 17:33 |
fungi | jrosser: oh really? yes please! | 17:33 |
jrosser | `db-url=jdbc:{{ keycloak_jdbc_provider }}://{{ keycloak_jdbc_haproxy_vip }}:{{ keycloak_jdbc_db_port }}/{{ keycloak_db_name }}` | 17:34 |
jrosser | from the conf file | 17:34 |
jrosser | ansible, of course so those are our vars | 17:34 |
fungi | jrosser: also https://review.opendev.org/c/opendev/system-config/+/907141/11/playbooks/roles/keycloak/templates/docker-compose.yaml.j2 is what we'd tried up to this point | 17:34 |
clarkb | fungi: looking at that file I agree DB appears to be the var to set the high level type | 17:34 |
clarkb | but it isn't clear to me if those are read as env vars | 17:35 |
clarkb | --db=postgres is in the first example link I provided so they seem to map to cli args at least | 17:35 |
fungi | clarkb: the other common envvars like DB_PASSWORD turned up in that file | 17:35 |
fungi | so just a hunch | 17:35 |
fungi | i'm going to break for lunch and then start fiddling around a bit | 17:36 |
jrosser | fungi: ours is installed from distro packages so we template out the conf file | 17:37 |
jrosser | but we have recently done a massive series of upgrades bringing it to a pretty new version | 17:37 |
fungi | jrosser: yeah, like i said earlier, we can also just map our own conffile into the container if we want | 17:55 |
fungi | but having some semi-stable api (more stable than tracking changes to their default config file) would be preferable if we can work it out | 17:56 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Upgrade to Keycloak 23.0 https://review.opendev.org/c/opendev/system-config/+/907141 | 18:35 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: DNM: Fail keycloak testing for an autohold https://review.opendev.org/c/opendev/system-config/+/906600 | 18:35 |
fungi | clarkb: jrosser: ^ apparently they template out the db url with sub-options | 18:35 |
clarkb | that simplifies things | 18:35 |
fungi | so you can do --db-url-host, --db-url-port, --db-url-database... | 18:35 |
fungi | the tricky bit is that --db-url-host gets stuck straight into the jdbc url string, which is colon-delimited, so you have to include [] if using raw ipv6 addresses | 18:36 |
fungi | --db-url-host=::1 didn't work (and returned odd errors about the port), which confused me initially, until i realized it was reading that as a null host and null port | 18:37 |
fungi | --db-url-host=[::1] worked a treat though | 18:38 |
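
A minimal sketch (not Keycloak's actual implementation) of why a raw IPv6 literal breaks when it is substituted straight into a colon-delimited JDBC URL, and why the bracketed form parses cleanly:

```python
def build_jdbc_url(host, port, database, driver="mariadb"):
    # Naive, colon-delimited JDBC URL assembly, illustrative only.
    return f"jdbc:{driver}://{host}:{port}/{database}"

# Unbracketed IPv6 literal: the host:port boundary becomes ambiguous,
# matching the "null host and null port" behaviour described above.
print(build_jdbc_url("::1", 3306, "keycloak"))    # jdbc:mariadb://::1:3306/keycloak
# Bracketed literal: the port is unambiguous again.
print(build_jdbc_url("[::1]", 3306, "keycloak"))  # jdbc:mariadb://[::1]:3306/keycloak
```
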
fungi | i also added a test to confirm we have expected initial database content | 18:39 |
JayF | I may have just backported an ironic fix of a similar shape :-| | 18:39 |
JayF | although I think we found some things want [] and some break with [] | 18:39 |
fungi | what's old is old again | 18:39 |
clarkb | and docker just refuses to understand both versions | 18:39 |
clarkb | and podman refuses to change in solidarity with docker | 18:39 |
fungi | because the podman folks love docker so very, very much | 18:40 |
JayF | At least all our API mistakes are our API mistakes. It has to be annoying to be chasing someone elses' API | 18:40 |
fungi | JayF: only when it's undocumented | 18:40 |
fungi | which, you know, is most of the time | 18:41 |
clarkb | JayF: the frustrating thing as an end user is that podman is not compatible with docker in a bunch of different ways | 18:41 |
JayF | I have | 18:41 |
clarkb | but for some reason ipv6 literal support is not one of the ways they can differ | 18:41 |
JayF | I have lots of opinions about podman, and none of them include "this is a good idea that influenced tech in a positive way". I prefer someone be incompatible rather than 90% there | 18:41 |
fungi | it's mainly annoying that they clearly chose to be incompatible with docker in some ways, but then refuse to acknowledge clear bugs with the excuse that they want to be bug-compatible with docker | 18:42 |
* fungi has cake and eats it too | 18:43 | |
JayF | Yeah, this is the pattern you get trapped in if you chase someone elses' API | 18:43 |
clarkb | now I want cake | 18:44 |
fungi | the cake is a lie | 18:50 |
opendevreview | Clark Boylan proposed opendev/system-config master: Update to etherpad 1.9.6 https://review.opendev.org/c/opendev/system-config/+/907349 | 18:57 |
fungi | wow, database test worked on the first try! | 18:57 |
clarkb | nice | 18:58 |
fungi | the new held keycloak test node is 104.239.230.31 | 19:00 |
fungi | also no h2 files in the running container this time | 19:01 |
fungi | which i considered adding a test for, but figured checking mariadb for content was sufficient | 19:01 |
clarkb | ya and I think h2 dbs can be used as caches (gerrit does something like this) | 19:05 |
clarkb | so as long as the permanent data ends up in mariadb we should be good | 19:05 |
frickler | fwiw I'm seeing packet loss and sometimes-slow-responses from review.o.o like SvenKieske did earlier (in #*-kolla) | 19:13 |
clarkb | I'm seeing very minimal loss within my isp before packets jump out of our AS but nothing beyond that | 19:18 |
fungi | frickler: ipv4 i guess? you're presumably still not able to reach it at all over ipv6 | 19:21 |
clarkb | I've started to try and add some more depth to the pre ptg etherpad | 19:23 |
fungi | for me, ping -4 from home to review.o.o looks like: 100 packets transmitted, 100 received, 0% packet loss, time 99725ms rtt min/avg/max/mdev = 66.104/74.671/431.658/36.152 ms | 19:23 |
fungi | ping -6 is surprisingly a little better: 100 packets transmitted, 100 received, 0% packet loss, time 99140ms rtt min/avg/max/mdev = 54.896/57.737/68.236/2.460 ms | 19:26 |
frickler | fungi: yes, v6 is still unreachable | 19:31 |
JayF | 75.196/79.345/89.269/4.652 ms from here over v4, it looks good/normal to me | 19:33 |
* JayF has no v6 | 19:33 | |
fungi | i'm doing some tests from our mirror server in france, since that's the geographically closest network to germany i have access to, though i doubt it's following similar routes | 19:36 |
fungi | i could also boot one in vexxhost's warsaw region but that's not really any better | 19:37 |
fungi | ipv6: 100 packets transmitted, 100 received, 0% packet loss, time 99136ms, rtt min/avg/max/mdev = 80.353/81.795/99.545/3.724 ms | 19:41 |
fungi | ipv4: 100 packets transmitted, 100 received, 0% packet loss, time 99154ms, rtt min/avg/max/mdev = 80.125/80.971/98.320/2.549 ms | 19:41 |
fungi | fairly consistent from there | 19:41 |
jrosser | frickler: do you see where they are lost with mtr? | 19:42 |
fungi | we can also install the mtr package on review.o.o to get the reverse path for comparison, since cases like this quite often involve an asymmetric route somewhere and you've got a 50% chance to see failures misattributed to the first hop where they diverge | 19:44 |
frickler | jrosser: seems to be only the final two hops, so either the vexxhost link is full or something going wrong on the return path | 19:51 |
frickler | too bad mnaser isn't around any more most of the time to look at things from the inside | 19:52 |
fungi | when you see failures like that close to a provider edge, odds are you're dealing with an asymmetric route and the loss is somewhere on the way back | 19:52 |
mnaser | i'm around, but honestly, there's not much we can do with zayo, i've filed endless tickets with them | 19:52 |
fungi | we can run mtr from review.o.o to see where it errors | 19:52 |
mnaser | i'm playing ping pong with them and it's just a matter of having the contract lapse and recommending no one to ever touch their stuff :) | 19:53 |
fungi | i definitely don't envy you, nor do i miss chasing backbone provider problems | 19:53 |
mnaser | it turns out after all the internet is a series of tubes | 19:54 |
fungi | very leaky ones at that | 19:54 |
mnaser | it's not a big truck | 19:54 |
fungi | most of the troubles i remember would end up being two backbone providers who couldn't agree on who was responsible for upgrading the capacity on their peering with one another, so they'd just point fingers and let customers suffer until one of them eventually caved and added more circuits | 19:55 |
fungi | our bgp tables were a never-ending churn of pads and prefs to try to work around the worst offenders, but there was only so much we could do | 19:56 |
jrosser | just now everything i can look at (not at my work laptop) goes via cogent and looks OK | 19:56 |
frickler | mnaser: sorry to hear that. though from my traceroutes, both directions seem to be via cogent. and tbh I've heard more bad stories about cogent than zayo, but who knows | 20:14 |
mnaser | frickler: historically cogent has been the bad guy, but surprisingly they got their act together | 20:14 |
frickler | fungi: would we install mtr by hand or do we need to add it to the automation somewhere? | 20:15 |
frickler | mnaser: well not in terms of their connectivity to german telekom it seems | 20:16 |
fungi | frickler: i would just manually `sudo apt install mtr` but i wouldn't object to adding it and similar diagnostic tools to our default set if others are in favor | 20:18 |
frickler | I went for mtr-tiny in order to avoid installing like 100 X11 libraries | 20:20 |
tonyb | yeah I think adding it to the defaults is good. | 20:21 |
frickler | but having that as default tool together with things like tcpdump and nc is a good idea | 20:21 |
tonyb | also maybe jq? | 20:21 |
frickler | jq is also good, yes | 20:21 |
tonyb | is nmap too much? | 20:22 |
frickler | hmm ... at least questionable I'd say, too easy to do unwanted things with it | 20:22 |
frickler | I can look into a patch tomorrow, EODing for now | 20:24 |
fungi | yeah, i don't see nmap as being in the same category as those other things | 20:24 |
opendevreview | Clark Boylan proposed opendev/system-config master: DNM force etherpad failure to hold node https://review.opendev.org/c/opendev/system-config/+/840972 | 20:38 |
clarkb | put a hold in place for ^ after a successful test run on the parent | 20:39 |
fungi | looks like you should have a held node for it now | 21:24 |
opendevreview | James E. Blair proposed openstack/project-config master: DNM: test syntax error https://review.opendev.org/c/openstack/project-config/+/907362 | 21:44 |
clarkb | 173.231.255.107 is the held node and I'm in the clarkb-test etherpad if you want to help test | 21:49 |
clarkb | chrome is doing the random reconnect thing we've seen in the past but seems to work otherwise | 21:52 |
clarkb | if others can't find issues I think this is probably a safe update | 21:52 |
clarkb | curiously chrome and firefox render that reddish color differently | 21:53 |
*** blarnath is now known as d34dh0r53 | 22:17 | |
clarkb | I missed that https://discuss.python.org/t/what-to-do-about-gpus-and-the-built-distributions-that-support-them/7125 is a thing pypi is actually looking at now | 22:20 |
clarkb | this same issue is what ultimately led to us turning off our pypi mirroring | 22:20 |
fungi | yeah | 22:21 |
clarkb | it seems like the fundamental issue is that CUDA isn't packaged in a way that is consumable as a dependency so everyone bundles it | 22:23 |
clarkb | kind of surprising to me that very little of the discussion seems to have gone down the path of "stop allowing cuda to do this to us" | 22:24 |
fungi | stockholm syndrome | 22:25 |
clarkb | nvidia is making large buckets of money in large part due to the success of cuda + python | 22:26 |
clarkb | it's crazy to me that investing a small amount of that into making the packaging of the software not suck seems insurmountable | 22:26 |
clarkb | I guess at the end of the thread there is talk of the cudapython lib which does some of that | 22:27 |
clarkb | except that those bindings are different than the ones everyone is already using | 22:27 |
fungi | held etherpad lgtm | 22:28 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Add zuul-tenant-conf-check role/job https://review.opendev.org/c/zuul/zuul-jobs/+/907363 | 22:35 |
clarkb | I've got a doctor appointment tomorrow morning but maybe we upgrade etherpad when I get back | 22:44 |
clarkb | tonyb: looks like fungi reviewed the meetpad stack too if we want to start merging some of those. I think most of them are safe as they don't try and replace anything yet? | 22:45 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Add zuul-tenant-conf-check role/job https://review.opendev.org/c/zuul/zuul-jobs/+/907363 | 22:46 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Add zuul-tenant-conf-check role/job https://review.opendev.org/c/zuul/zuul-jobs/+/907363 | 22:57 |
fungi | sgtm | 23:05 |
tonyb | clarkb: Sounds good. I'll add a comment to the first review to address your question | 23:11 |
tonyb | Also FWIW: I'm slowly removing the stuck nodes from inmotion | 23:12 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Add zuul-tenant-conf-check role/job https://review.opendev.org/c/zuul/zuul-jobs/+/907363 | 23:14 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Add zuul-tenant-conf-check role/job https://review.opendev.org/c/zuul/zuul-jobs/+/907363 | 23:24 |
clarkb | tonyb: what process did you end up using for cleaning up the stuck nodes? | 23:43 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Add zuul-tenant-conf-check role/job https://review.opendev.org/c/zuul/zuul-jobs/+/907363 | 23:55 |