kota | it looks like python 3.10 is now available https://mail.python.org/archives/list/python-committers@python.org/message/OQWNWZWDPASOUOAT6VPUXIXBH2THYREC/ | 01:39 |
slim00 | hi, i am looking to merge two swift clusters, any ideas how to merge account and container rings? | 01:42 |
kota | slim00: perhaps, composite ring would be helpful for your purpose https://docs.openstack.org/swift/latest/overview_ring.html#composite-rings | 01:46 |
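(For reference, ring composition is scripted rather than done with swift-ring-builder. A minimal sketch using swift's CompositeRingBuilder follows; the builder and ring file names are made up, and swift expects the component builders to describe disjoint sets of devices. Note the caveat below: this only makes sense if both clusters already share the same swift-hash values.)

```python
# Minimal sketch: compose two component builder files into one ring.
# Builder/ring file names here are hypothetical.
from swift.common.ring.composite_builder import CompositeRingBuilder

crb = CompositeRingBuilder(['cluster1-account.builder',
                            'cluster2-account.builder'])
ring_data = crb.compose()          # validates the components and merges them
ring_data.save('account.ring.gz')  # deployable composite ring file
```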
timburke_ | slim00, are the hash prefix/suffix the same between the two clusters? i'd assume not -- in which case, you'll have to download everything from one cluster and upload it all to the other | 03:59 |
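(These are the per-cluster secrets timburke_ means; if they differ, the same object name hashes to different partitions and on-disk paths, so the two rings cannot simply be combined:)

```ini
# /etc/swift/swift.conf on each cluster -- values are placeholders
[swift-hash]
swift_hash_path_prefix = prefix_set_at_deploy_time
swift_hash_path_suffix = suffix_set_at_deploy_time
```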
timburke_ | might be able to use container-sync to do the movement | 03:59 |
timburke_ | once a container finishes syncing, you can break the sync then issue a bunch of DELETEs to clear out the data | 03:59 |
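(A rough sketch of that sync-then-delete flow using python-swiftclient; the auth endpoint, container, realm/cluster names, and key are all placeholders, and the realm must be configured in container-sync-realms.conf on both clusters:)

```python
# Sketch: sync a container from the old cluster to the new one, then
# break the sync and clear out the source. All names are placeholders.
from swiftclient import client

url, token = client.get_auth('http://old-cluster:8080/auth/v1.0',
                             'account:user', 'password')

# start syncing; '//realm/cluster/...' must match container-sync-realms.conf
client.post_container(url, token, 'mycontainer', headers={
    'X-Container-Sync-To': '//realm/newcluster/AUTH_account/mycontainer',
    'X-Container-Sync-Key': 'shared-secret',
})

# ...wait for the sync to catch up, then break it and free the capacity
client.post_container(url, token, 'mycontainer',
                      headers={'X-Container-Sync-To': ''})
_, objects = client.get_container(url, token, 'mycontainer',
                                  full_listing=True)
for obj in objects:
    client.delete_object(url, token, 'mycontainer', obj['name'])
```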
timburke_ | as capacity frees up in the "old" cluster, you pick a node (or maybe a rack, depending on topology) and drop all its devices' weights to zero. wait for replication to drain them off, then take them out of the ring entirely | 03:59 |
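(In swift-ring-builder terms the drain looks something like this; the IP is a placeholder, and each change needs a rebalance plus a ring push before the waiting step:)

```shell
# Sketch: drain one node's devices, then remove them once empty.
swift-ring-builder object.builder set_weight --ip 10.0.0.1 0
swift-ring-builder object.builder rebalance
# ...deploy the new ring, wait for replication to drain the devices...
swift-ring-builder object.builder remove --ip 10.0.0.1
swift-ring-builder object.builder rebalance
```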
timburke_ | then they should be good to get ingested into the "new" cluster (possibly with a device reformat in between) | 04:00 |
timburke_ | it's gonna be a long, slow, sucky process unfortunately -- doubly so if capacity is tight | 04:00 |
timburke_ | slim00, i guess my main question would be: why do you want to merge the two clusters? operational concerns, client ease of use, ... something else? | 04:03 |
slim00 | kota, thanks. will read more about composite ring | 04:12 |
slim00 | timburke, they are not the same | 04:13 |
slim00 | timburke, will explore container-sync option. yes, it's more for operational issue and we would like to remove the old cluster and only operate new cluster | 04:14 |
timburke_ | slim00, is there a pretty good link between the clusters? are there still new writes going into the old cluster? | 04:26 |
timburke_ | prometheanfire, if i had to guess, i'd say dnspython is the likely culprit. looks like the type of an answer's rrset.items changed in the 1->2 transition? | 04:27 |
slim00 | timburke, yes there is a good link between them and yes new writes are still going in | 04:28 |
timburke_ | oh hey... eventlet actually seems to support dnspython>=2.0.0 ... https://github.com/eventlet/eventlet/issues/619 | 04:29 |
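(The incompatibility, illustrated: dnspython 1.x exposed an rrset's items as a list, while 2.x turned it into a dict, so positional indexing breaks but iteration still works. A hypothetical example, not the swift patch itself:)

```python
# Illustration of the dnspython 1.x -> 2.x change being discussed.
import dns.resolver

# dns.resolver.query() exists in both major versions (deprecated in 2.x
# in favor of resolve())
answer = dns.resolver.query('www.example.com', 'CNAME')

# 1.x: answer.rrset.items is a list -> items[0] works
# 2.x: answer.rrset.items is a dict -> items[0] raises KeyError
# Iterating works in both, since iterating a dict yields its keys
# (the rdata objects):
cname = next(iter(answer.rrset.items)).to_text()
```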
timburke_ | slim00, fwiw, if the new cluster has enough spare capacity that you don't need to transition hardware between the clusters, i think it should simplify some things. you can set up container sync to push from old to new, then once things seem mostly caught up, use the read-only middleware to stop new writes and wait for the sync to finish | 04:32 |
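(The write cut-over is just a proxy config change on the old cluster; a sketch, with the pipeline abbreviated:)

```ini
# proxy-server.conf on the old cluster (pipeline abbreviated)
[pipeline:main]
pipeline = catch_errors ... read_only ... proxy-server

[filter:read_only]
use = egg:swift#read_only
read_only = true
```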
prometheanfire | that makes sense | 04:32 |
timburke_ | monitoring your progress is likely to be difficult, and it's probably not going to go as fast as you'll want it to, particularly if there's a lot of EC data | 04:33 |
prometheanfire | man, that was a large bump in dnspython | 04:34 |
slim00 | timburke, thanks for the suggestions. going to try it out in the lab. | 04:34 |
timburke_ | we've put it off for a while, mostly because of the lack of eventlet support :-( | 04:34 |
prometheanfire | well, if I get this merged... | 04:35 |
opendevreview | Tim Burke proposed openstack/swift master: cname_lookup: Work with dnspython 2.0+ https://review.opendev.org/c/openstack/swift/+/812424 | 04:50 |
timburke_ | prometheanfire, well that was easy! who knows how well it works in practice, of course... but at least unit tests should pass! | 04:51 |
prometheanfire | heh | 04:54 |
opendevreview | Matthew Oliver proposed openstack/swift master: container-updater: no incoming syncs no account update https://review.opendev.org/c/openstack/swift/+/811833 | 08:58 |
mattoliver | ^ that's one approach to it that also supports single-replica container rings.. not 100% on it. | 09:00 |
mattoliver | acoles: ^ | 09:00 |
acoles | mattoliver: ack, thanks | 09:01 |
opendevreview | Merged openstack/swift master: cname_lookup: Work with dnspython 2.0+ https://review.opendev.org/c/openstack/swift/+/812424 | 21:07 |
reid_g | Question: When we specify different IPs/Ports for replication in the ring, how does the reconstructor work? I see calls to the replication IP:Port in the error log, but I see a huge amount of traffic going over the normal network during a rebalance. Before the rebalance the normal network is around 2GB/s in the cluster; during the rebalance it is ~15-25GB/s, while the replication network went from about 800MB/s to 1.8GB/s | 21:08 |
timburke_ | :-/ https://github.com/openstack/swift/blob/master/swift/obj/reconstructor.py#L396-L397 looks suspicious -- that should probably be using replication_ip/replication_port | 21:13 |
timburke_ | the good news is that SSYNC traffic should be using the replication network: https://github.com/openstack/swift/blob/master/swift/obj/ssync_sender.py#L235-L236 | 21:15 |
timburke_ | but we should really be pulling frags for reconstruction over that, too | 21:15 |
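(The gist of the change, as a standalone paraphrase rather than the literal diff; see the review proposed below:)

```python
# Paraphrased sketch of the fix: pick the replication endpoint when
# fetching frags for reconstruction, matching what ssync already does.
def frag_fetch_endpoint(node):
    # before the fix, the reconstructor effectively used node['ip'] and
    # node['port'] (the client-facing network) for these GETs
    return node['replication_ip'], node['replication_port']

node = {'ip': '10.0.0.1', 'port': 6200,
        'replication_ip': '10.1.0.1', 'replication_port': 6200}
assert frag_fetch_endpoint(node) == ('10.1.0.1', 6200)
```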
opendevreview | Tim Burke proposed openstack/swift master: ec: Use replication network to get frags for reconstruction https://review.opendev.org/c/openstack/swift/+/812614 | 21:22 |
timburke_ | reid_g, good spot! i'm surprised we never noticed that before... | 21:23 |
reid_g | That is pretty traffic intensive because it is trying to reconstruct the data that is missing, right? | 21:24 |
reid_g | The ssync part only matters if the reconstructor is pushing the data to the correct node? | 21:26 |
timburke_ | yup, i wouldn't be surprised if it's fairly traffic intensive -- reverting data from handoffs should just use the replication network, but any reconstruction would need ndata frags for every frag it sends | 21:27 |
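(To put a hypothetical number on that: with an 8+4 EC policy, rebuilding one missing fragment requires reading 8 other fragments of roughly the same size, so the frag-fetch traffic is about 8x the size of the data being re-created, whereas a handoff revert only ships each fragment once over the replication network.)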
timburke_ | is there an expansion going on, or is this day-to-day "make sure everything is durable" reconstruction? | 21:28 |
prometheanfire | nice, the dnspython change merged :D | 21:54 |
reid_g | This is an expansion. I have 1 or more rebalances left to do, but we are adding to other clusters | 21:54 |
timburke_ | reid_g, if you haven't already, you might want to turn on handoffs_only -- it should prevent reconstruction so you can use those iops just to rebalance data, and as a side-benefit it should only be doing stuff on the replication network | 22:00 |
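(For reference, that's this reconstructor setting; it's intended to be temporary:)

```ini
# object-server.conf on the storage nodes -- revert to false (the
# default) once the rebalance has settled
[object-reconstructor]
handoffs_only = true
```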
timburke_ | (that's probably why we hadn't really noticed the problem before...) | 22:00 |
reid_g | I think I get why we want to use the handoffs_only option now... It causes the reconstructors to just push data instead of recreating the missing fragments, which is a lighter operation? | 23:53 |
reid_g | Just clicked... we have not been using that setting | 23:57 |