kota | it looks like python 3.10 is now available https://mail.python.org/archives/list/python-committers@python.org/message/OQWNWZWDPASOUOAT6VPUXIXBH2THYREC/ | 01:39 |
slim00 | hi, i am looking to merge two swift clusters, any ideas how to merge account and container rings? | 01:42 |
kota | slim00: perhaps, composite ring would be helpful for your purpose https://docs.openstack.org/swift/latest/overview_ring.html#composite-rings | 01:46 |
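(For reference, ring composition is scripted rather than done with swift-ring-builder. A minimal sketch using swift's CompositeRingBuilder follows; the builder and ring file names are made up, and swift expects the component builders to describe disjoint sets of devices. Note the caveat below: this only makes sense if both clusters already share the same swift-hash values.)

```python
# Minimal sketch: compose two component builder files into one ring.
# Builder/ring file names here are hypothetical.
from swift.common.ring.composite_builder import CompositeRingBuilder

crb = CompositeRingBuilder(['cluster1-account.builder',
                            'cluster2-account.builder'])
ring_data = crb.compose()          # validates the components and merges them
ring_data.save('account.ring.gz')  # deployable composite ring file
```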
timburke_ | slim00, are the hash prefix/suffix the same between the two clusters? i'd assume not -- in which case, you'll have to download everything from one cluster and upload it all to the other | 03:59 |
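(These are the per-cluster secrets timburke_ means; if they differ, the same object name hashes to different partitions and on-disk paths, so the two rings cannot simply be combined:)

```ini
# /etc/swift/swift.conf on each cluster -- values are placeholders
[swift-hash]
swift_hash_path_prefix = prefix_set_at_deploy_time
swift_hash_path_suffix = suffix_set_at_deploy_time
```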
timburke_ | might be able to use container-sync to do the movement | 03:59 |
timburke_ | once a container finishes syncing, you can break the sync then issue a bunch of DELETEs to clear out the data | 03:59 |
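(A rough sketch of that sync-then-delete flow using python-swiftclient; the auth endpoint, container, realm/cluster names, and key are all placeholders, and the realm must be configured in container-sync-realms.conf on both clusters:)

```python
# Sketch: sync a container from the old cluster to the new one, then
# break the sync and clear out the source. All names are placeholders.
from swiftclient import client

url, token = client.get_auth('http://old-cluster:8080/auth/v1.0',
                             'account:user', 'password')

# start syncing; '//realm/cluster/...' must match container-sync-realms.conf
client.post_container(url, token, 'mycontainer', headers={
    'X-Container-Sync-To': '//realm/newcluster/AUTH_account/mycontainer',
    'X-Container-Sync-Key': 'shared-secret',
})

# ...wait for the sync to catch up, then break it and free the capacity
client.post_container(url, token, 'mycontainer',
                      headers={'X-Container-Sync-To': ''})
_, objects = client.get_container(url, token, 'mycontainer',
                                  full_listing=True)
for obj in objects:
    client.delete_object(url, token, 'mycontainer', obj['name'])
```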
timburke_ | as capacity frees up in the "old" cluster, you pick a node (or maybe a rack, depending on topology) and drop all its devices' weights to zero. wait for replication to drain them off, then take them out of the ring entirely | 03:59 |
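(In swift-ring-builder terms the drain looks something like this; the IP is a placeholder, and each change needs a rebalance plus a ring push before the waiting step:)

```shell
# Sketch: drain one node's devices, then remove them once empty.
swift-ring-builder object.builder set_weight --ip 10.0.0.1 0
swift-ring-builder object.builder rebalance
# ...deploy the new ring, wait for replication to drain the devices...
swift-ring-builder object.builder remove --ip 10.0.0.1
swift-ring-builder object.builder rebalance
```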
timburke_ | then they should be good to get ingested into the "new" cluster (possibly with a device reformat in between) | 04:00 |
timburke_ | it's gonna be a long, slow, sucky process unfortunately -- doubly so if capacity is tight | 04:00 |
timburke_ | slim00, i guess my main question would be: why do you want to merge the two clusters? operational concerns, client ease of use, ... something else? | 04:03 |
slim00 | kota, thanks. will read more about composite ring | 04:12 |
slim00 | timburke, they are not the same | 04:13 |
slim00 | timburke, will explore container-sync option. yes, it's more for operational issue and we would like to remove the old cluster and only operate new cluster | 04:14 |
timburke_ | slim00, is there a pretty good link between the clusters? are there still new writes going into the old cluster? | 04:26 |
timburke_ | prometheanfire, if i had to guess, i'd say dnspython is the likely culprit. looks like the type of an answer's rrset.items changed in the 1->2 transition? | 04:27 |
slim00 | timburke, yes there is a good link between them and yes new writes are still going in | 04:28 |
timburke_ | oh hey... eventlet actually seems to support dnspython>=2.0.0 ... https://github.com/eventlet/eventlet/issues/619 | 04:29 |
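(The incompatibility, illustrated: dnspython 1.x exposed an rrset's items as a list, while 2.x turned it into a dict, so positional indexing breaks but iteration still works. A hypothetical example, not the swift patch itself:)

```python
# Illustration of the dnspython 1.x -> 2.x change being discussed.
import dns.resolver

# dns.resolver.query() exists in both major versions (deprecated in 2.x
# in favor of resolve())
answer = dns.resolver.query('www.example.com', 'CNAME')

# 1.x: answer.rrset.items is a list -> items[0] works
# 2.x: answer.rrset.items is a dict -> items[0] raises KeyError
# Iterating works in both, since iterating a dict yields its keys
# (the rdata objects):
cname = next(iter(answer.rrset.items)).to_text()
```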
timburke_ | slim00, fwiw, if the new cluster has enough spare capacity that you don't need to transition hardware between the clusters, i think it should simplify some things. you can set up container sync to push from old to new, then once things seem mostly caught up, use the read-only middleware to stop new writes and wait for the sync to finish | 04:32 |
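(The write cut-over is just a proxy config change on the old cluster; a sketch, with the pipeline abbreviated:)

```ini
# proxy-server.conf on the old cluster (pipeline abbreviated)
[pipeline:main]
pipeline = catch_errors ... read_only ... proxy-server

[filter:read_only]
use = egg:swift#read_only
read_only = true
```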
prometheanfire | that makes sense | 04:32 |
timburke_ | monitoring your progress is likely to be difficult, and it's probably not going to go as fast as you'll want it to, particularly if there's a lot of EC data | 04:33 |
prometheanfire | man, that was a large bump in dnspython | 04:34 |
slim00 | timburke, thanks for the suggestions. going to try it out in the lab. | 04:34 |
timburke_ | we've put it off for a while, mostly because of the lack of eventlet support :-( | 04:34 |
prometheanfire | well, if I get this merged... | 04:35 |
opendevreview | Tim Burke proposed openstack/swift master: cname_lookup: Work with dnspython 2.0+ https://review.opendev.org/c/openstack/swift/+/812424 | 04:50 |
timburke_ | prometheanfire, well that was easy! who knows how well it works in practice, of course... but at least unit tests should pass! | 04:51 |
prometheanfire | heh | 04:54 |
opendevreview | Matthew Oliver proposed openstack/swift master: container-updater: no incoming syncs no account update https://review.opendev.org/c/openstack/swift/+/811833 | 08:58 |
mattoliver | ^ that's one approach to it that also supports single-replica container rings.. not 100% on it. | 09:00 |
mattoliver | acoles: ^ | 09:00 |
acoles | mattoliver: ack, thanks | 09:01 |
opendevreview | Merged openstack/swift master: cname_lookup: Work with dnspython 2.0+ https://review.opendev.org/c/openstack/swift/+/812424 | 21:07 |
reid_g | Question: When we specify different IPs/Ports for replication in the ring, how does the reconstructor work? I see calls to the replication IP:Port in the error log, but I see a huge amount of traffic going over the normal network during a rebalance. Before the rebalance the normal network is around 2GB/s in the cluster; during the rebalance it is ~15-25GB/s, while the replication network went from about 800MB/s to 1.8GB/s | 21:08 |
timburke_ | :-/ https://github.com/openstack/swift/blob/master/swift/obj/reconstructor.py#L396-L397 looks suspicious -- that should probably be using replication_ip/replication_port | 21:13 |
timburke_ | the good news is that SSYNC traffic should be using the replication network: https://github.com/openstack/swift/blob/master/swift/obj/ssync_sender.py#L235-L236 | 21:15 |
timburke_ | but we should really be pulling frags for reconstruction over that, too | 21:15 |
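(The gist of the change, as a standalone paraphrase rather than the literal diff; see the review proposed below:)

```python
# Paraphrased sketch of the fix: pick the replication endpoint when
# fetching frags for reconstruction, matching what ssync already does.
def frag_fetch_endpoint(node):
    # before the fix, the reconstructor effectively used node['ip'] and
    # node['port'] (the client-facing network) for these GETs
    return node['replication_ip'], node['replication_port']

node = {'ip': '10.0.0.1', 'port': 6200,
        'replication_ip': '10.1.0.1', 'replication_port': 6200}
assert frag_fetch_endpoint(node) == ('10.1.0.1', 6200)
```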
opendevreview | Tim Burke proposed openstack/swift master: ec: Use replication network to get frags for reconstruction https://review.opendev.org/c/openstack/swift/+/812614 | 21:22 |
timburke_ | reid_g, good spot! i'm surprised we never noticed that before... | 21:23 |
reid_g | That is pretty traffic intensive because it is trying to reconstruct the data that is missing, right? | 21:24 |
reid_g | The ssync part only matters if the reconstructor is pushing the data to the correct node? | 21:26 |
timburke_ | yup, i wouldn't be surprised if it's fairly traffic intensive -- reverting data from handoffs should just use the replication network, but any reconstruction would need ndata frags for every frag it sends | 21:27 |
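(To put a hypothetical number on that: with an 8+4 EC policy, rebuilding one missing fragment requires reading 8 other fragments of roughly the same size, so the frag-fetch traffic is about 8x the size of the data being re-created, whereas a handoff revert only ships each fragment once over the replication network.)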
timburke_ | is there an expansion going on, or is this day-to-day "make sure everything is durable" reconstruction? | 21:28 |
prometheanfire | nice, the dnspython change merged :D | 21:54 |
reid_g | This is an expansion. I have 1 or more rebalances left to do, but we are adding to other clusters | 21:54 |
timburke_ | reid_g, if you haven't already, you might want to turn on handoffs_only -- it should prevent reconstruction so you can use those iops just to rebalance data, and as a side-benefit it should only be doing stuff on the replication network | 22:00 |
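(For reference, that's this reconstructor setting; it's intended to be temporary:)

```ini
# object-server.conf on the storage nodes -- revert to false (the
# default) once the rebalance has settled
[object-reconstructor]
handoffs_only = true
```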
timburke_ | (that's probably why we hadn't really noticed the problem before...) | 22:00 |
reid_g | I think I get why we want to use the handoffs_only option now... It causes the reconstructors to just push data instead of recreating the missing fragments, which is a lighter operation? | 23:53 |
reid_g | Just clicked... we have not been using that setting | 23:57 |