*** gyee has quit IRC | 00:50 | |
kota_ | timburke: good info, thx for the summit schedule | 02:26 |
*** rcernin has quit IRC | 02:50 | |
*** rcernin has joined #openstack-swift | 02:59 | |
openstackgerrit | Merged openstack/swift master: Remove some useless swob.Request attr setting https://review.opendev.org/750013 | 03:11 |
*** josephillips has quit IRC | 03:19 | |
*** evrardjp has quit IRC | 04:33 | |
*** evrardjp has joined #openstack-swift | 04:33 | |
*** m75abrams has joined #openstack-swift | 05:06 | |
*** rcernin has quit IRC | 08:35 | |
*** manuvakery has joined #openstack-swift | 09:06 | |
*** rdejoux has joined #openstack-swift | 09:36 | |
*** StevenK has quit IRC | 10:21 | |
*** rcernin has joined #openstack-swift | 10:39 | |
*** StevenK_ has joined #openstack-swift | 11:19 | |
*** StevenK_ is now known as StevenK | 11:23 | |
*** yuxin_ has quit IRC | 12:35 | |
*** yuxin_ has joined #openstack-swift | 12:36 | |
*** m75abrams has quit IRC | 12:55 | |
*** sorrison has quit IRC | 13:14 | |
*** sorrison has joined #openstack-swift | 13:14 | |
*** rcernin has quit IRC | 15:50 | |
*** yuxin_ has quit IRC | 16:03 | |
*** yuxin_ has joined #openstack-swift | 16:06 | |
cwright | Hi everyone. I've been working with ormandj on our swift cluster, and as he's described here, we have run into some performance issues. | 16:35 |
cwright | We've started looking at implementing servers_per_port=2. We are using an independent replication network, and have seen that this is an issue for servers_per_port (re: https://bugs.launchpad.net/swift/+bug/1669579 ) | 16:35 |
openstack | Launchpad bug 1669579 in OpenStack Object Storage (swift) "servers_per_port will not bind to replication_port" [Medium,In progress] - Assigned to Romain LE DISEZ (rledisez) | 16:35 |
cwright | I've read the available docs, but I've been having trouble finding any real examples online of how the ring should look, given the above bug and that we are using a separate replication network. | 16:36 |
cwright | Here's a gist that shows the object ring for a small test cluster of 4 servers, each with 3 object disks, both how it is today (before implementing servers_per_port) and how I *think* it should look after servers_per_port: | 16:36 |
cwright | https://gist.github.com/corywright/d89a93bf21de3773ee9ade39d8c324fc | 16:36 |
cwright | Can anyone confirm that what I'm planning to do looks sane? | 16:36 |
timburke | cwright, so in light of the bug, i'd recommend keeping all of the replication ports as 6300 -- you then run two instances of the object-server with two different configs: the one serving data back to proxies should be using servers per port, while the one serving replication traffic will continue using workers | 16:44 |
timburke | at least, that's how clayg tells me we've been running my clusters; it might be interesting to see if that matches what rledisez does, as another data point | 16:45 |
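A minimal sketch of what that ring layout could look like, assuming a node with three disks; the region/zone numbers, IPs, weights, and device names are placeholders rather than values from cwright's gist. Each disk gets its own cluster-facing port, while every device keeps the same replication port (6300), per the recommendation above:

```sh
# Placeholder values throughout -- adjust region/zone/IPs/weights to your cluster.
swift-ring-builder object.builder add \
    --region 1 --zone 1 --ip 10.0.0.11 --port 6200 \
    --replication-ip 10.1.0.11 --replication-port 6300 \
    --device sdb --weight 100
swift-ring-builder object.builder add \
    --region 1 --zone 1 --ip 10.0.0.11 --port 6201 \
    --replication-ip 10.1.0.11 --replication-port 6300 \
    --device sdc --weight 100
swift-ring-builder object.builder add \
    --region 1 --zone 1 --ip 10.0.0.11 --port 6202 \
    --replication-ip 10.1.0.11 --replication-port 6300 \
    --device sdd --weight 100
```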
timburke | note that you likely *don't* want to set `replication_server = true` on that second instance: while we finally have a fix for https://bugs.launchpad.net/swift/+bug/1446873, it's not been in a tagged release yet | 16:46 |
openstack | Launchpad bug 1446873 in OpenStack Object Storage (swift) "ssync doesn't work with replication_server = true" [Medium,Fix released] | 16:46 |
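A rough sketch (not a verified config) of the proxy-facing instance timburke describes, with placeholder paths, IPs, and ports; the replication-facing counterpart is sketched further down, after clayg confirms the replication_server question:

```ini
# /etc/swift/object-server/1.conf -- proxy-facing instance (placeholder path)
[DEFAULT]
# cluster-facing IP; with servers_per_port, the per-disk ports to bind are
# taken from the object ring devices registered at this IP
bind_ip = 10.0.0.11
bind_port = 6200
servers_per_port = 2

[pipeline:main]
pipeline = healthcheck recon object-server

[app:object-server]
use = egg:swift#object
```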
clayg | I think rledisez carries that patch so he can do "something" with his replication servers... I don't know exactly why that's desirable; but maybe we should try to land https://review.opendev.org/#/c/337861/ | 16:48 |
patchbot | patch 337861 - swift - Permit to bind object-server on replication_port - 7 patch sets | 16:48 |
ormandj | we're seeing fun stuff like a single drive in a single server in a cluster with hundreds of drives across multiple servers getting busy due to a patrol read, for example, causing the entire cluster to slow down to pretty abysmal rates | 16:49 |
ormandj | to include replication | 16:49 |
clayg | timburke: yes, we just have all our replication server workers on the same port - having disk isolation in worker processes would be helpful - but I don't know that we want to dedicate that much memory to running a bunch of replication workers | 16:49 |
ormandj | and when you're onlining new capacity, that's pretty painful | 16:50 |
clayg | what's a "patrol" read? | 16:50 |
ormandj | drive controller kicks off a process that effectively does a sector scan on drives in the background/transparently to the OS to look for failing disks/sectors, which makes disk access 'slow' | 16:51 |
ormandj | no different than if you hit it with 150 IOPS or something like that from the OS, just slows down the disk | 16:51 |
ormandj | HPs/Dells/Ciscos/etc. do it via their RAID controllers | 16:51 |
clayg | IME a single drive getting busy or slow does not make the whole cluster performance "abysmal" - but adding capacity should be able to make effective use of ALL cluster resources (edging out clients) if the consistency engine is cranked all the way up | 16:52 |
cwright | thanks timburke. we already run two object-servers, with the second one only having `replication_server = true` | 16:52 |
cwright | since that is the one that binds to the IP on the replication network | 16:53 |
clayg | cwright: yeah timburke is definitely saying if you use EC you shouldn't have replication_server = true because of https://bugs.launchpad.net/swift/+bug/1446873 unless you're running master | 16:53 |
openstack | Launchpad bug 1446873 in OpenStack Object Storage (swift) "ssync doesn't work with replication_server = true" [Medium,Fix released] | 16:53 |
ormandj | clayg: if we kick off a patrol read on a drive, it absolutely does negatively impact performance for the cluster, significantly, which is why we're looking to implement servers-per-port. we'll see successful request rates drop through the floor, and we start serving 499s/500s more frequently - it's very odd | 16:54 |
clayg | oic, well - yes - one of the main benefits of servers-per-port was in fact isolating disks to particular wsgi servers such that a tarpit disk only affects requests to that disk (and not requests to other disks on the same server) | 16:55 |
ormandj | no EC here btw, just replication of 3 | 16:56 |
ormandj | also we aren't using ssync, i wasn't aware that was usable/reliable yet, is it? | 16:56 |
clayg | ssync is the only option for EC, for replicated rsync is still good - but some deployments are having success with ssync on replicated too | 16:57 |
clayg | since you're just 3R - it makes a little less sense why a single slow disk/server would affect "most" requests | 16:57 |
clayg | isolating slow disks to a single object server worker helps A LOT with other requests to other disks on the same server - but shouldn't affect anything on OTHER servers really 🤔 | 16:58 |
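A hedged illustration of that isolation, with made-up ports and disk names:

```sh
# Illustration only, with made-up ports/disks: three disks per node, each on
# its own ring port, and servers_per_port = 2 in the proxy-facing config.
#
#   sdb -> port 6200 -> 2 object-server processes serve only sdb
#   sdc -> port 6201 -> 2 object-server processes serve only sdc
#   sdd -> port 6202 -> 2 object-server processes serve only sdd
#
# If a patrol read tarpits sdc, only the two processes on 6201 get stuck;
# requests for sdb and sdd (and for disks on other servers) keep their own
# processes, which is the isolation being described here.
```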
ormandj | yeah, it's baffling us too, because it causes a pretty significant impact | 16:58 |
ormandj | well, we think on our small clusters of 3 servers, it may cause a problem because of 3x replication | 16:59 |
ormandj | we've also had to set fairly high node timeouts due to the impact of a single drive | 16:59 |
clayg | ah, sure so with 3 servers a 3R PUT will need to talk to every server 😁 | 16:59 |
ormandj | and i think that 'slows down' everything | 16:59 |
clayg | I think you're on the right track with servers per port to start | 17:00 |
cwright | so since we aren't using EC/ssync, should it be safe to set `replication_server = true` in the second object-server config? | 17:01 |
clayg | if a disk becomes nearly unresponsive during a patrol read you might want to look at unmounting it first (507s don't take a whole node timeout and the proxy will stage writes on handoffs, then object-replicator can automatically repair when it's back online) | 17:02 |
clayg | the only problem with unmounting is that the replicator will also try to rebuild parts while it's unmounted - you might need some new code or custom middleware to "temporarily suspend writes" to a disk (I could see lots of uses for something like that) | 17:03 |
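For reference, a hypothetical version of the unmount approach, assuming mount_check = true (the default) in the object-server config; device and mount-point names are placeholders:

```sh
# Hypothetical sequence, placeholder names; with mount_check = true the
# object-server returns 507 for an unmounted device and the proxy writes to
# handoff nodes instead.
umount /srv/node/sdc
# ... wait for the patrol read / maintenance to finish ...
mount /dev/sdc1 /srv/node/sdc
# the object-replicator then brings the device back in sync from the handoffs;
# the caveat above still applies: while it is unmounted, replicators will also
# treat the device as failed and start rebuilding its partitions elsewhere
```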
*** gyee has joined #openstack-swift | 17:03 | |
ormandj | yeah, we wanted to see how that went, but when we went to implement, ran into trouble because of the replication network bug. so just wanted to clarify before we make changes and break the planet ;) | 17:04 |
clayg | cwright: yes, if you're not using EC/ssync replication_server = true is fine as far as I know, but leaving it commented out is also ok (the config's bind_ip tells it where to listen, and it must match the ring regardless) | 17:04 |
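Putting that together with the earlier sketch, the replication-facing instance could look roughly like this (placeholder path and IPs; bind_ip must match the replication IPs in the ring):

```ini
# /etc/swift/object-server/2.conf -- replication-facing instance (placeholder path)
[DEFAULT]
# replication-network IP; must match the ring's replication_ip for this node
bind_ip = 10.1.0.11
# the shared replication_port from the ring
bind_port = 6300
workers = 4

[pipeline:main]
pipeline = healthcheck recon object-server

[app:object-server]
use = egg:swift#object
# fine on replicated-only clusters per the discussion above; avoid it with
# EC/ssync until the bug 1446873 fix is in a tagged release
replication_server = true
```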
cwright | clayg: ok great, thanks | 17:05 |
ormandj | clayg: we can't really do the unmounting thing. that's not really how it works, it kicks off on all the drives at varying intervals within the parameters you set. | 17:05 |
ormandj | i think we'll probably be a lot better off with this change, and we'll see how it goes with that in place | 17:05 |
ormandj | one question we did have: mutating the ring by updating the port on the non-replication entry - are we talking a full reshuffle of data everywhere? | 17:06 |
clayg | ormandj: neat! we have a "node agent" that runs in the background and does periodic disk checks - if we had something in the object server that would respect a "drive temporarily offline file" - that's where I'd add code to notice a drive is being scanned and mark a drive in maintenance mode | 17:07 |
clayg | ... but a cron would work too | 17:07 |
clayg | no, just changing a device's replication_ip/port wouldn't require a rebalance of parts | 17:07 |
ormandj | clayg: yeah, this is hidden from the OS, you can 'see' it occurring through the controller CLI tools, so it'd have to be some magic there, which is suboptimal | 17:07 |
ormandj | clayg: awesome, hadn't looked at the hashing algo to figure out if ports/etc were used, that's great news | 17:07 |
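A sketch of that ring change, with placeholder IPs, ports, and device names; the set_info flag syntax here is from memory, so check swift-ring-builder's help output on your version:

```sh
# set_info edits a device's ip/port info in place, so partition assignments do
# not change -- no rebalance needed, just write out the new ring afterwards.
swift-ring-builder object.builder set_info \
    --ip 10.0.0.11 --port 6200 --device sdc \
    --change --port 6201
swift-ring-builder object.builder write_ring
```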
clayg | ormandj: it sounds like a really neat check that's doing a lot to try and stay transparent - it's unfortunate that it's adding so much latency in io wait queues 😢 | 17:09 |
clayg | hopefully the drive isolation of server_per_port will do the trick! 🤞 | 17:09 |
ormandj | yeah, for most workloads it's relatively transparent, but random IOPS? not so much | 17:09 |
ormandj | most modern servers with RAID controllers will have it on by default | 17:10 |
*** tonyb has quit IRC | 17:15 | |
*** tonyb has joined #openstack-swift | 17:57 | |
*** manuvakery has quit IRC | 18:05 | |
*** gmann is now known as gmann_afk | 18:11 | |
openstackgerrit | Merged openstack/swift master: gate: Make rolling upgrade job work with either 60xx or 62xx ports https://review.opendev.org/750679 | 18:52 |
*** mikecmpbll has quit IRC | 18:59 | |
*** rdejoux has quit IRC | 19:31 | |
openstackgerrit | Clay Gerrard proposed openstack/swift master: add swift-manage-shard-ranges shrink command https://review.opendev.org/741721 | 20:23 |
*** openstackgerrit has quit IRC | 20:36 | |
cwright | Follow-up question: is it recommended to deploy servers_per_port changes for account and container servers at the same time that we roll it out for object servers? | 21:04 |
cwright | actually, just saw a comment in the code that servers_per_port is only for object-server at the moment, so that answers that | 21:38 |
*** gmann_afk is now known as gmann | 22:49 | |
*** rcernin has joined #openstack-swift | 22:58 | |
*** rcernin has quit IRC | 22:59 | |
*** rcernin has joined #openstack-swift | 22:59 | |
*** openstackgerrit has joined #openstack-swift | 23:10 | |
openstackgerrit | Tim Burke proposed openstack/swift master: Authors/ChangeLog for 2.26.0 https://review.opendev.org/750537 | 23:10 |
timburke | rendered release notes preview: https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_59e/750537/2/check/build-openstack-releasenotes/59e2503/docs/current.html | 23:49 |