jrosser | NeilHanlon: thanks for the pointer to `rocky-release-kernel` i think there's a missing build for aarch64 64k page size? | 08:31 |
---|---|---|
jrosser | hamburgler: was there anything specific you were interested in for the radosgw config? | 08:53 |
hamburgler | jrosser: hey :), i was curious how the mappings were working from storage-policy (swift) to placement-group(ceph) as for some reason in horizon I was unable to see these from the drop down menu in horizon so I could not create a container that way. I ended up upgrading to 18.2.1 from 18.2.0 and it seemed to resolve the issue lol :D | 19:50 |
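[Editor's note: for context, the Swift storage policies discussed above are how radosgw exposes its placement targets to Swift clients. A hedged sketch of how to inspect and select them; the container and policy names are illustrative, not from this conversation:]

```shell
# List the placement targets in the zonegroup; each one is surfaced
# to Swift clients as a storage policy.
radosgw-admin zonegroup placement list

# A Swift client picks one at container-creation time via the
# X-Storage-Policy header (policy name == placement target id).
swift post mycontainer -H "X-Storage-Policy: default-placement"
```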
hamburgler | was able to get a multi-site config setup running in lab, really quite neat :) | 19:51 |
jrosser | hamburgler: interesting - is that with replication? | 19:53 |
hamburgler | yes sir, so for demo purposes, two ceph azs, one is master, other is secondary, when i write to first az it gets replicated to second | 19:54 |
hamburgler | lowered garbage collection times to run faster to see that syncs as well | 19:55 |
jrosser | did you ever investigate performance with lots of objects in a bucket | 19:55 |
jrosser | delete time for the bucket for example | 19:55 |
jrosser | ^ not related to multisite | 19:56 |
jrosser | we see a delete rate of about 500 objects/second which can make it extremely long to delete massive buckets | 19:58 |
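[Editor's note: a quick bit of arithmetic shows why a fixed per-gateway delete rate hurts at scale. The 500 objects/second figure is the rate observed above; the bucket size is a hypothetical example:]

```python
def delete_eta_hours(num_objects: int, rate_per_s: float = 500.0) -> float:
    """Hours needed to delete num_objects at a sustained rate_per_s objects/second."""
    return num_objects / rate_per_s / 3600.0

# At ~500 objects/s, a hypothetical 100M-object bucket takes more than
# two days of continuous deleting.
print(f"{delete_eta_hours(100_000_000):.1f} hours")  # → 55.6 hours
```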
hamburgler | not yet with any measurable workload - but from observation it looks pretty quick with a single bucket, that really doesn't mean much though I suppose :D, if I delete an object in a bucket, i can see the pool drop in size (this must be a marker saying objects are to be deleted), then since i set rgw_gc_processor_period and rgw_gc_obj_min_wait to 60s, the objects within a bucket are gone i'd say within 2 minutes | 19:59 |
hamburgler | everything is removed | 19:59 |
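[Editor's note: the two GC options named above can be applied at runtime; a minimal sketch, using the 60-second lab values described in the conversation rather than recommended production settings:]

```shell
# Seconds a deleted object must wait before it is eligible for GC
# (default is much higher; 60s is the lab value discussed above).
ceph config set client.rgw rgw_gc_obj_min_wait 60

# Seconds between garbage-collection processing cycles.
ceph config set client.rgw rgw_gc_processor_period 60
```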
hamburgler | did you adjust the default delete io rate limit for objects in buckets? | 19:59 |
jrosser | almost certainly not | 19:59 |
hamburgler | i haven't played with that yet myself | 19:59 |
jrosser | generally we've focussed on getting throughput up | 19:59 |
hamburgler | gotcha, what type of disks do you use? are your pools mapped to specific crush rules/drive types - such as I read the red hat docs a bit ago and it looks like certain pools related to metadata should be mapped to ssd at a minimum, ideally NVMe - then data pools likely hdd with an ssd/nvme journal/wal | 20:01 |
jrosser | all the rgw metadata is on an nvme pool | 20:01 |
jrosser | but then there is several PB of hdd for the default placement group | 20:02 |
hamburgler | replication or EC? | 20:02 |
jrosser | and we made an extra placement group "fast" which is placed on the nvme | 20:02 |
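[Editor's note: a hypothetical sketch of creating an nvme-backed placement target like the "fast" one mentioned above; the pool names and the use of the default zonegroup/zone are assumptions, not details from this cluster:]

```shell
# Register the placement target in the zonegroup, then bind it to
# pools in the zone. Pool names here are illustrative.
radosgw-admin zonegroup placement add --rgw-zonegroup default \
    --placement-id fast
radosgw-admin zone placement add --rgw-zone default --placement-id fast \
    --data-pool default.rgw.fast.data \
    --index-pool default.rgw.fast.index \
    --data-extra-pool default.rgw.fast.non-ec
# radosgw instances need a restart to pick up the new placement target
```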
jrosser | all replicated, no EC | 20:02 |
jrosser | initially i didn't have enough chassis to sensibly do EC | 20:02 |
hamburgler | yeah it looks like quite a few nodes are needed for that, I'm truly not sure if it is a benefit over replication as I've barely touched EC for anything | 20:03 |
hamburgler | I would say since you have metadata on nvme pool that's probably not the limit of 500 objects/second - wonder if it's the io rate but again haven't really played with it | 20:04 |
hamburgler | is it different from web to cli? | 20:05 |
jrosser | i don't think so | 20:05 |
jrosser | we did discuss it briefly and decided there must be some rate limit | 20:05 |
jrosser | but bigger problem is dealing with quota exceeded :) | 20:06 |
hamburgler | haha - is that an issue on a per tenant basis? | 20:06 |
jrosser | that returns an error code immediately from the radosgw and is absolutely not bound by the latency of doing any storage I/O | 20:07 |
jrosser | so if the client is written naively, then that can badly hurt your radosgw node | 20:07 |
jrosser | we need to come up with some rate limiting for that case | 20:08 |
hamburgler | ahh you mean that if there is lots of objects being written it triggers quota exceeded? | 20:08 |
hamburgler | sorry if I am not following there | 20:09 |
jrosser | once you exceed your quota and the client just retries over and over, you can DOS the radosgw pretty easily | 20:09 |
jrosser | particularly if the client is highly parallel | 20:09 |
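[Editor's note: the well-behaved-client side of this is retry backoff. A minimal, generic sketch of full-jitter exponential backoff, not tied to any particular S3 SDK; the function name and parameters are illustrative:]

```python
import random

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 60.0) -> float:
    """Full-jitter exponential backoff: a uniform random delay in
    [0, min(cap, base * 2**attempt)] seconds for retry number `attempt`."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))

# A client would sleep(backoff_delay(n)) before retry n of a request that
# came back QuotaExceeded, instead of retrying immediately in a tight,
# highly parallel loop that hammers the radosgw.
```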
hamburgler | oh shoot that's good to know | 20:09 |
jrosser | so there's two sides to that, client needs to behave appropriately | 20:09 |
jrosser | but object store needs to be able to cope with a stupid client | 20:10 |
hamburgler | hmm could haproxy not handle a rate limit? | 20:11 |
hamburgler | via source | 20:11 |
jrosser | it could | 20:11 |
jrosser | but we need to revisit as the haproxy on the front is L4 | 20:11 |
jrosser | and SSL termination is done on a big bank of radosgw behind that | 20:11 |
jrosser | so that needs to become a bit fancier architecture | 20:12 |
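[Editor's note: the per-source rate limit floated above could look roughly like this in haproxy. A hedged sketch only: it requires an L7 (`mode http`) tier, so with the L4 frontend described above it would have to live on the haproxy layer behind SSL termination instead; names, ports, and thresholds are illustrative:]

```
frontend rgw_front
    mode http
    bind :8080
    # Track per-source-IP request rate over a 10s window
    stick-table type ip size 1m expire 30s store http_req_rate(10s)
    http-request track-sc0 src
    # Reject clients exceeding the threshold with 429 instead of
    # letting them hammer the radosgw with quota-exceeded retries
    http-request deny deny_status 429 if { sc_http_req_rate(0) gt 1000 }
    default_backend rgw_back
```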
hamburgler | yeah absolutely, I'm not sure what we would do horizon side, but I imagine we will likely set the public endpoint through a different set of haproxy nodes - currently my lab is using the orchestrator to deploy keepalived/haproxy for ceph internal to openstack | 20:13 |
jrosser | that's what we do - a separate pair of haproxy for the public endpoint | 20:14 |
jrosser | the radosgw sit in a sort of dmz | 20:14 |
---|---|---|
jrosser | and then there's some magic which allows access to those radosgw directly from instances | 20:15 |
jrosser | without having to go through a neutron router | 20:15 |
hamburgler | oh interesting never thought of that use case! | 20:15 |
hamburgler | is your object storage built on its own cluster or dedicated root where rbd pools sit? | 20:16 |
jrosser | it's the same cluster for everything | 20:17 |
jrosser | but we had some particular requirements around object store throughput MB/s, rather than rbd iops | 20:17 |
hamburgler | gotcha, we have multiple roots for different tiers at the moment, was debating about throwing object storage on its own dedicated cluster but again $ :| lol | 20:18 |
jrosser | i have opportunity to expand the object store significantly | 20:19 |
jrosser | but the disk chassis are 84 disks each | 20:19 |
jrosser | which is some pretty wild number for OSDs | 20:19 |
hamburgler | oh yeah :O, those are HDD nodes for data pools? | 20:20 |
jrosser | yes | 20:20 |
hamburgler | was gunna say if that was NVMe or even SSD those poor processors :D bottleneck for days | 20:20 |
hamburgler | *with that many disks on one node :) | 20:24 |
hamburgler | actually pretty happy with 18.2.1 - not that I am much of a fan of dashboards but the multi-site and overview are a nice touch | 20:26 |
hamburgler | jrosser: btw ty, appreciate the chat! | 20:30 |
jrosser | heh no problem | 20:30 |
jrosser | tbh we had a lot of trouble so far with the radosgw dashboard | 20:30 |
jrosser | buckets with loads of objects did not go well | 20:31 |
hamburgler | I imagine it will be the same for me when not in a quiet lab env :D | 20:32 |
NeilHanlon | jrosser: not sure about the 64k kernel--but will check! | 20:40 |
jrosser | NeilHanlon: that would be great | 20:41 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!