*** gyee has quit IRC | 00:50 | |
kota_ | timburke: good info, thx for the summit schedule | 02:26 |
*** rcernin has quit IRC | 02:50 | |
*** rcernin has joined #openstack-swift | 02:59 | |
openstackgerrit | Merged openstack/swift master: Remove some useless swob.Request attr setting https://review.opendev.org/750013 | 03:11 |
*** josephillips has quit IRC | 03:19 | |
*** evrardjp has quit IRC | 04:33 | |
*** evrardjp has joined #openstack-swift | 04:33 | |
*** m75abrams has joined #openstack-swift | 05:06 | |
*** rcernin has quit IRC | 08:35 | |
*** manuvakery has joined #openstack-swift | 09:06 | |
*** rdejoux has joined #openstack-swift | 09:36 | |
*** StevenK has quit IRC | 10:21 | |
*** rcernin has joined #openstack-swift | 10:39 | |
*** StevenK_ has joined #openstack-swift | 11:19 | |
*** StevenK_ is now known as StevenK | 11:23 | |
*** yuxin_ has quit IRC | 12:35 | |
*** yuxin_ has joined #openstack-swift | 12:36 | |
*** m75abrams has quit IRC | 12:55 | |
*** sorrison has quit IRC | 13:14 | |
*** sorrison has joined #openstack-swift | 13:14 | |
*** rcernin has quit IRC | 15:50 | |
*** yuxin_ has quit IRC | 16:03 | |
*** yuxin_ has joined #openstack-swift | 16:06 | |
cwright | Hi everyone. I've been working with ormandj on our swift cluster, and as he's described here, we have run into some performance issues. | 16:35 |
cwright | We've started looking at implementing servers_per_port=2. We are using an independent replication network, and have seen that this is an issue for servers_per_port (re: https://bugs.launchpad.net/swift/+bug/1669579 ) | 16:35 |
openstack | Launchpad bug 1669579 in OpenStack Object Storage (swift) "servers_per_port will not bind to replication_port" [Medium,In progress] - Assigned to Romain LE DISEZ (rledisez) | 16:35 |
cwright | I've read the available docs, but I've been having trouble finding any real examples online of how the ring should look, given the above bug and that we are using a separate replication network. | 16:36 |
cwright | Here's a gist that shows the object ring for a small test cluster of 4 servers, each with 3 object disks, both how it is today (before implementing servers_per_port) and how I *think* it should look after servers_per_port: | 16:36 |
cwright | https://gist.github.com/corywright/d89a93bf21de3773ee9ade39d8c324fc | 16:36 |
cwright | Can anyone confirm that what I'm planning to do looks sane? | 16:36 |
timburke | cwright, so in light of the bug, i'd recommend keeping all of the replication ports as 6300 -- you then run two instances of the object-server with two different configs: the one serving data back to proxies should be using servers per port, while the one serving replication traffic will continue using workers | 16:44 |
timburke | at least, that's how clayg tells me we've been running my clusters; it might be interesting to see if that matches what rledisez does, as another data point | 16:45 |
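A minimal sketch of what that ring layout could look like, assuming a node with three disks; the region/zone numbers, IPs, weights, and device names are placeholders rather than values from cwright's gist. Each disk gets its own cluster-facing port, while every device keeps the same replication port (6300), per the recommendation above:

```sh
# Placeholder values throughout -- adjust region/zone/IPs/weights to your cluster.
swift-ring-builder object.builder add \
    --region 1 --zone 1 --ip 10.0.0.11 --port 6200 \
    --replication-ip 10.1.0.11 --replication-port 6300 \
    --device sdb --weight 100
swift-ring-builder object.builder add \
    --region 1 --zone 1 --ip 10.0.0.11 --port 6201 \
    --replication-ip 10.1.0.11 --replication-port 6300 \
    --device sdc --weight 100
swift-ring-builder object.builder add \
    --region 1 --zone 1 --ip 10.0.0.11 --port 6202 \
    --replication-ip 10.1.0.11 --replication-port 6300 \
    --device sdd --weight 100
```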
timburke | note that you likely *don't* want to set `replication_server = true` on that second instance: while we finally have a fix for https://bugs.launchpad.net/swift/+bug/1446873, it's not been in a tagged release yet | 16:46 |
openstack | Launchpad bug 1446873 in OpenStack Object Storage (swift) "ssync doesn't work with replication_server = true" [Medium,Fix released] | 16:46 |
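A rough sketch (not a verified config) of the proxy-facing instance timburke describes, with placeholder paths, IPs, and ports; the replication-facing counterpart is sketched further down, after clayg confirms the replication_server question:

```ini
# /etc/swift/object-server/1.conf -- proxy-facing instance (placeholder path)
[DEFAULT]
# cluster-facing IP; with servers_per_port, the per-disk ports to bind are
# taken from the object ring devices registered at this IP
bind_ip = 10.0.0.11
bind_port = 6200
servers_per_port = 2

[pipeline:main]
pipeline = healthcheck recon object-server

[app:object-server]
use = egg:swift#object
```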
clayg | I think rledisez carries that patch so he can do "something" with his replication servers... I don't know exactly why that's desirable; but maybe we should try to land https://review.opendev.org/#/c/337861/ | 16:48 |
patchbot | patch 337861 - swift - Permit to bind object-server on replication_port - 7 patch sets | 16:48 |
ormandj | we're seeing fun stuff like a single drive in a single server in a cluster with hundreds of drives across multiple servers getting busy due to a patrol read, for example, causing the entire cluster to slow down to pretty abysmal rates | 16:49 |
ormandj | to include replication | 16:49 |
clayg | timburke: yes, we just have all our replication server workers on the same port - having disk isolation in worker processes would be helpful - but I don't know that we want to dedicate that much memory to running a bunch of replication workers | 16:49 |
ormandj | and when you're onlining new capacity, that's pretty painful | 16:50 |
clayg | what's a "patrol" read? | 16:50 |
ormandj | drive controller kicks off a process that effectively does a sector scan on drives in the background/transparently to the OS to look for failing disks/sectors, which makes disk access 'slow' | 16:51 |
ormandj | no different than if you hit it with 150 IOPS or something like that from the OS, just slows down the disk | 16:51 |
ormandj | HPs/Dells/Ciscos/etc. do it via their RAID controllers | 16:51 |
clayg | IME a single drive getting busy or slow does not make the whole cluster performance "abysmal" - but adding capacity should be able to make effective use of ALL cluster resources (edging out clients) if the consistency engine is cranked all the way up | 16:52 |
cwright | thanks timburke. we already run two object-servers, with the second one only having `replication_server = true` | 16:52 |
cwright | since that is the one that binds to the IP on the replication network | 16:53 |
clayg | cwright: yeah timburke is definitely saying if you use EC you shouldn't have replication_server = true because of https://bugs.launchpad.net/swift/+bug/1446873 unless you're running master | 16:53 |
openstack | Launchpad bug 1446873 in OpenStack Object Storage (swift) "ssync doesn't work with replication_server = true" [Medium,Fix released] | 16:53 |
ormandj | clayg: if we kick off a patrol read on a drive, it absolutely does negatively impact performance for the cluster, significantly, which is why we're looking to implement servers-per-port. we'll see successful request rates drop through the floor, and we start serving 499s/500s more frequently - it's very odd | 16:54 |
clayg | oic, well - yes - one of the main benefits of servers-per-port was in fact isolating disks to particular wsgi servers such that a tarpit disk only affects requests to that disk (and not requests to other disks on the same server) | 16:55 |
ormandj | no EC here btw, just replication of 3 | 16:56 |
ormandj | also we aren't using ssync, i wasn't aware that was usable/reliable yet, is it? | 16:56 |
clayg | ssync is the only option for EC, for replicated rsync is still good - but some deployments are having success with ssync on replicated too | 16:57 |
clayg | since you're just 3R - it makes a little less sense why a single slow disk/server would affect "most" requests | 16:57 |
clayg | isolating slow disks to a single object server worker helps A LOT with other requests to other disks on the same server - but shouldn't affect anything on OTHER servers really 🤔 | 16:58 |
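A hedged illustration of that isolation, with made-up ports and disk names:

```sh
# Illustration only, with made-up ports/disks: three disks per node, each on
# its own ring port, and servers_per_port = 2 in the proxy-facing config.
#
#   sdb -> port 6200 -> 2 object-server processes serve only sdb
#   sdc -> port 6201 -> 2 object-server processes serve only sdc
#   sdd -> port 6202 -> 2 object-server processes serve only sdd
#
# If a patrol read tarpits sdc, only the two processes on 6201 get stuck;
# requests for sdb and sdd (and for disks on other servers) keep their own
# processes, which is the isolation being described here.
```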
ormandj | yeah, it's baffling us too, because it causes a pretty significant impact | 16:58 |
ormandj | well, we think on our small clusters of 3 servers, it may cause a problem because of 3x replication | 16:59 |
ormandj | we've also had to set fairly high node timeouts due to the impact of a single drive | 16:59 |
clayg | ah, sure so with 3 servers a 3R PUT will need to talk to every server 😁 | 16:59 |
ormandj | and i think that 'slows down' everything | 16:59 |
clayg | I think you're on the right track with servers per port to start | 17:00 |
cwright | so since we aren't using EC/ssync, should it be safe to set `replication_server = true` in the second object-server config? | 17:01 |
clayg | if a disk becomes nearly unresponsive during a patrol read you might want to look at unmounting it first (507s don't take a whole node timeout and the proxy will stage writes on handoffs, then object-replicator can automatically repair when it's back online) | 17:02 |
clayg | the only problem with unmounting is that the replicator will also try to rebuild parts while it's unmounted - you might need some new code or custom middleware to "temporarily suspend writes" to a disk (I could see lots of uses for something like that) | 17:03 |
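For reference, a hypothetical version of the unmount approach, assuming mount_check = true (the default) in the object-server config; device and mount-point names are placeholders:

```sh
# Hypothetical sequence, placeholder names; with mount_check = true the
# object-server returns 507 for an unmounted device and the proxy writes to
# handoff nodes instead.
umount /srv/node/sdc
# ... wait for the patrol read / maintenance to finish ...
mount /dev/sdc1 /srv/node/sdc
# the object-replicator then brings the device back in sync from the handoffs;
# the caveat above still applies: while it is unmounted, replicators will also
# treat the device as failed and start rebuilding its partitions elsewhere
```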
*** gyee has joined #openstack-swift | 17:03 | |
ormandj | yeah, we wanted to see how that went, but when we went to implement, ran into trouble because of the replication network bug. so just wanted to clarify before we make changes and break the planet ;) | 17:04 |
clayg | cwright: yes, if you're not using EC/ssync replication_server = true is fine as far as I know, but leaving it commented out is also ok (the config's bind_ip tells it where to listen, and it must match the ring regardless) | 17:04 |
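Putting that together with the earlier sketch, the replication-facing instance could look roughly like this (placeholder path and IPs; bind_ip must match the replication IPs in the ring):

```ini
# /etc/swift/object-server/2.conf -- replication-facing instance (placeholder path)
[DEFAULT]
# replication-network IP; must match the ring's replication_ip for this node
bind_ip = 10.1.0.11
# the shared replication_port from the ring
bind_port = 6300
workers = 4

[pipeline:main]
pipeline = healthcheck recon object-server

[app:object-server]
use = egg:swift#object
# fine on replicated-only clusters per the discussion above; avoid it with
# EC/ssync until the bug 1446873 fix is in a tagged release
replication_server = true
```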
cwright | clayg: ok great, thanks | 17:05 |
ormandj | clayg: we can't really do the unmounting thing. that's not really how it works, it kicks off on all the drives at varying intervals within the parameters you set. | 17:05 |
ormandj | i think we'll probably be a lot better off with this change, and we'll see how it goes with that in place | 17:05 |
ormandj | one question we did have: mutating the ring by updating the port on the non-replication entry - are we talking a full reshuffle of data everywhere? | 17:06 |
clayg | ormandj: neat! we have a "node agent" that runs in the background and does periodic disk checks - if we had something in the object server that would respect a "drive temporarily offline file" - that's where I'd add code to notice a drive is being scanned and mark a drive in maintenance mode | 17:07 |
clayg | ... but a cron would work too | 17:07 |
clayg | no, just changing a device's replication_ip/port wouldn't require a rebalance of parts | 17:07 |
ormandj | clayg: yeah, this is hidden from the OS, you can 'see' it occurring through the controller CLI tools, so it'd have to be some magic there, which is suboptimal | 17:07 |
ormandj | clayg: awesome, hadn't looked at the hashing algo to figure out if ports/etc were used, that's great news | 17:07 |
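A sketch of that ring change, with placeholder IPs, ports, and device names; the set_info flag syntax here is from memory, so check swift-ring-builder's help output on your version:

```sh
# set_info edits a device's ip/port info in place, so partition assignments do
# not change -- no rebalance needed, just write out the new ring afterwards.
swift-ring-builder object.builder set_info \
    --ip 10.0.0.11 --port 6200 --device sdc \
    --change --port 6201
swift-ring-builder object.builder write_ring
```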
clayg | ormandj: it sounds like a really neat check that's doing a lot to try and stay transparent - it's unfortunate that it's adding so much latency in io wait queues 😢 | 17:09 |
clayg | hopefully the drive isolation of server_per_port will do the trick! 🤞 | 17:09 |
ormandj | yeah, for most workloads it's relatively transparent, but random IOPS? not so much | 17:09 |
ormandj | most modern servers with RAID controllers will have it on by default | 17:10 |
*** tonyb has quit IRC | 17:15 | |
*** tonyb has joined #openstack-swift | 17:57 | |
*** manuvakery has quit IRC | 18:05 | |
*** gmann is now known as gmann_afk | 18:11 | |
openstackgerrit | Merged openstack/swift master: gate: Make rolling upgrade job work with either 60xx or 62xx ports https://review.opendev.org/750679 | 18:52 |
*** mikecmpbll has quit IRC | 18:59 | |
*** rdejoux has quit IRC | 19:31 | |
openstackgerrit | Clay Gerrard proposed openstack/swift master: add swift-manage-shard-ranges shrink command https://review.opendev.org/741721 | 20:23 |
*** openstackgerrit has quit IRC | 20:36 | |
cwright | Follow-up question: is it recommended to deploy servers_per_port changes for account and container servers at the same time that we roll it out for object servers? | 21:04 |
cwright | actually, just saw a comment in the code that servers_per_port is only for object-server at the moment, so that answers that | 21:38 |
*** gmann_afk is now known as gmann | 22:49 | |
*** rcernin has joined #openstack-swift | 22:58 | |
*** rcernin has quit IRC | 22:59 | |
*** rcernin has joined #openstack-swift | 22:59 | |
*** openstackgerrit has joined #openstack-swift | 23:10 | |
openstackgerrit | Tim Burke proposed openstack/swift master: Authors/ChangeLog for 2.26.0 https://review.opendev.org/750537 | 23:10 |
timburke | rendered release notes preview: https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_59e/750537/2/check/build-openstack-releasenotes/59e2503/docs/current.html | 23:49 |