*** d34dh0r53 has quit IRC | 01:09 | |
*** baojg has joined #openstack-swift | 01:28 | |
*** d34dh0r53 has joined #openstack-swift | 02:47 | |
*** gkadam has quit IRC | 03:00 | |
*** thurloat8 has quit IRC | 03:03 | |
*** gkadam has joined #openstack-swift | 03:06 | |
*** openstackgerrit has joined #openstack-swift | 03:06 | |
openstackgerrit | zhongshengping proposed openstack/swift master: Replace git.openstack.org URLs with opendev.org URLs https://review.opendev.org/654278 | 03:06 |
*** psachin has joined #openstack-swift | 03:22 | |
*** tonyb has joined #openstack-swift | 04:46 | |
*** tonyb has quit IRC | 06:13 | |
*** tonyb has joined #openstack-swift | 06:14 | |
*** ccamacho has joined #openstack-swift | 06:58 | |
*** baojg has quit IRC | 07:07 | |
*** pcaruana has joined #openstack-swift | 07:19 | |
*** tkajinam has quit IRC | 08:24 | |
*** pcaruana has quit IRC | 08:45 | |
*** rcernin has joined #openstack-swift | 09:03 | |
*** e0ne has joined #openstack-swift | 09:42 | |
*** hoonetorg has quit IRC | 10:00 | |
*** hoonetorg has joined #openstack-swift | 10:13 | |
*** baojg has joined #openstack-swift | 10:53 | |
*** pcaruana has joined #openstack-swift | 10:58 | |
*** gkadam has quit IRC | 12:49 | |
*** gkadam has joined #openstack-swift | 12:52 | |
*** psachin has quit IRC | 13:39 | |
*** openstackgerrit has quit IRC | 14:28 | |
clayg | cool, yeah everything seemed to "just work" on my end... | 15:00 |
clayg | we have p 654278 which looks fully legit | 15:00 |
patchbot | https://review.openstack.org/#/c/654278/ - swift - Replace git.openstack.org URLs with opendev.org URLs - 1 patch set | 15:00 |
*** gyee has joined #openstack-swift | 15:11 | |
notmyname | good morning | 15:41 |
*** e0ne has quit IRC | 15:47 | |
*** e0ne has joined #openstack-swift | 16:01 | |
*** gkadam has quit IRC | 16:30 | |
*** e0ne has quit IRC | 16:36 | |
*** ndk_ has quit IRC | 17:48 | |
*** ybunker has joined #openstack-swift | 17:49 | |
*** sleterrier has quit IRC | 17:50 | |
*** sleterrier has joined #openstack-swift | 17:50 | |
ybunker | is there any swift command that i can use to find where the partition (for example 1111) is located? i want to find the main partition and the replicas | 17:51 |
notmyname | `swift-get-nodes -p PARTITION` | 17:52 |
notmyname | `swift-get-nodes [-a] <ring.gz> -p partition` (from the usage string, so more complete/correct) | 17:53 |
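For reference, a minimal invocation of the command mentioned above, assuming the object ring lives at the default /etc/swift/object.ring.gz and using the example partition 1111 from the question (the output shape sketched in the comments is approximate):

    # print the primary nodes and handoff nodes for partition 1111
    swift-get-nodes /etc/swift/object.ring.gz -p 1111
    # the output lists each node as server:port plus device, e.g.
    #   Server:Port Device      10.0.0.11:6200 sdb
    # followed by handoff locations and suggested curl/ssh commands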
*** e0ne has joined #openstack-swift | 17:58 | |
ybunker | notmyname: thanks :-) | 18:02 |
*** e0ne has quit IRC | 18:08 | |
ybunker | got into a drive-100%-full condition, and after trying many things.. for example (setting handoffs_delete and handoffs_first) | 18:12 |
ybunker | and already added (3) new nodes to the cluster | 18:12 |
notmyname | for full clusters, the goal is to first add new drives, then get the handoffs moved as quickly as possible. the handoffs_first=true and handoffs_delete set to something like 1 or 2 are the first things to check | 18:13 |
notmyname | then check rsync settings. make sure you've got a lot of available connections on the new drives, make sure rsync is not accepting inbound connections on the servers with full drives | 18:13 |
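A sketch of the settings being described here, as they would appear in the [object-replicator] section of object-server.conf (values are illustrative, not a universal recommendation):

    [object-replicator]
    # replicate handoff partitions before primaries, so full drives drain first
    handoffs_first = True
    # remove a handoff after this many successful pushes instead of waiting
    # for all replicas to confirm; 1 or 2 frees space sooner, at some risk
    handoffs_delete = 2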
ybunker | it seems that since the full drives are at 100% used space (10G free), it does not 'remove' the replicated partitions from those drives.. and they're still at 100% | 18:13 |
*** e0ne has joined #openstack-swift | 18:14 | |
notmyname | clayg: what are the other things we set in the emergency replication mode? | 18:14 |
ybunker | notmyname: handoffs_* already checked and it's not freeing a byte :( | 18:15 |
clayg | if you're 100% full you need handoffs_delete = 1 | 18:15 |
ybunker | already did that on the full nodes | 18:16 |
notmyname | meeting time for me. gotta close irc | 18:16 |
clayg | ok, then you should be good to go | 18:16 |
ybunker | thanks a lot notmyname | 18:16 |
clayg | you can increase object replicator workers & concurrency if you have iops available on the nodes that want to drain | 18:16 |
ybunker | clayg: already did that but disks keeps at 100% | 18:17 |
clayg | you might want to do a back-of-the-napkin estimate of how many streams of io you can pull/push per node | 18:17 |
clayg | check logs for errors - something could be preventing successful transfer (e.g. rsync connection limiting) | 18:17 |
clayg | this is replicated or EC fragments? | 18:18 |
ybunker | clayg: and the problem is that we have a 'maintenance' window for obj-repl that runs 4hs a day.. because when it runs the latency of the cluster goes through the roof.. from 44ms... to 4s.... and it's impossible to operate at those numbers | 18:18 |
ybunker | replicated | 18:18 |
clayg | everybody has an iops budget - you could play with the ionice settings | 18:19 |
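If the Swift release in use supports the per-daemon I/O priority options (an assumption; older releases would instead need to wrap the daemon with the ionice command), the idea could look roughly like this in object-server.conf:

    [object-replicator]
    # run replication at a lower I/O priority so client traffic wins contention
    # (illustrative values: best-effort class, lowest priority within it)
    ionice_class = IOPRIO_CLASS_BE
    ionice_priority = 7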
clayg | but - i figure you'd want to get things moving first and then worry about making it easier to manage - you should be able to free up some bytes during your four hour window if you go full tilt | 18:20 |
clayg | no? | 18:20 |
ybunker | clayg: yeah i want to get rid of the 100% full condition.. the problem is that during those 4h of replication.. the full drives are still at 100%, i got 8 workers running on that window | 18:21 |
clayg | are you doing rsync connections/modules per disk? | 18:21 |
ybunker | per ACO | 18:22 |
clayg | so with 8 workers your outbound streams from the node is 8x concurrency - what's your object replicator concurrency? how many (object) disks per node? | 18:23 |
ybunker | found the following on the log error: object-replicator: [worker 1/8 pid=20511] @ERROR: max connections (8) reached -- try again later | 18:23 |
ybunker | 9 disks per node | 18:23 |
clayg | there you go! | 18:23 |
clayg | going nowhere fast | 18:24 |
ybunker | i just changed max connections on rsyncd to 16 | 18:24 |
clayg | that limit is set in the rsync.conf | 18:24 |
clayg | is that enough? what concurrency are you running? | 18:24 |
clayg | 16 per node isn't even 2 connections per disk - I'd think you'd want 4-8 per disk in an emergency | 18:25 |
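With the 9 disks per node mentioned above, that rule of thumb works out to roughly:

    9 disks x 4 connections = 36 inbound rsync connections (lower bound)
    9 disks x 8 connections = 72 inbound rsync connections (upper bound)

which lines up with the "max connections" value of around 75 suggested a little later.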
clayg | of course w/o rsync modules per disk you can't guarantee even distribution... | 18:25 |
ybunker | clayg: http://paste.openstack.org/show/749609/ | 18:25 |
clayg | yeah! try and move your deployment in this direction at some point -> https://github.com/openstack/swift/blob/master/etc/rsyncd.conf-sample#L25 | 18:28 |
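The per-disk module setup being linked to would look roughly like this (a sketch based on the linked sample; the device name sda and the paths are placeholders):

    # object-server.conf, [object-replicator] section: address one module per device
    rsync_module = {replication_ip}::object_{device}

    # rsyncd.conf: one module per disk, so a single slow or busy disk
    # cannot consume every available connection
    [object_sda]
    max connections = 4
    path = /srv/node
    read only = False
    lock file = /var/lock/object_sda.lock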
clayg | ok, so concurrency is 8 and workers is 8 - so you have about 64 streams coming out of each full node - so you'll want the aggregate capacity of all new incoming hardware ready to receive all that | 18:29 |
clayg | how many standing nodes do you have - and how many are you adding? | 18:29 |
ybunker | i have 9 standing nodes (2 at 100% full) and adding (3) new ones | 18:30 |
clayg | ok, so you should be able to really hammer those new guys - I'd recommend increasing workers to 9 so you have one worker per disk - set concurrency to 4 so you have 324 outgoing streams, then set max_connections at something like 75 or so on those nodes you want to fill up and LET HER RIP! | 18:33 |
clayg | when is your window? it'll be a replication party!!! | 18:34 |
ybunker | for rsyncd something like this (names are from 4..12) => http://paste.openstack.org/show/749610/ | 18:34 |
clayg | oh, you ARE doing rsync module per disk then | 18:34 |
ybunker | nono, i mean the changes that you told me about | 18:35 |
clayg | how can that be tho? you're not setting it in the object-server.conf | 18:35 |
clayg | oh oh oh - sure that'd be nice to have for the future - makes replication a lot more closely tied to the rare commodity of spinning platters | 18:36 |
clayg | up to you if you want to change that now or after the fire-drill | 18:36 |
clayg | we rebalanced lots of clusters before we had rsync modules per disk ;) | 18:36 |
ybunker | after :) | 18:36 |
clayg | yeah so tweak your workers X concurrency and increase the rsync max_connections on the 3 new nodes like... A LOT | 18:37 |
clayg | you should see them start to get hammered with write io, and shortly after that the nodes pushing data will be able to do some DELETEs | 18:37 |
ybunker | so on all the nodes im going to set: workers = 9, concurrency = 4, replicator_workers = 9 and max_connections on rsyncd to 76 | 18:37 |
clayg | that's fine - if it's easy for you to make a subtle change I'd recommend a heterogeneous deployment | 18:38 |
clayg | lots of workers X concurrency on the PUSHING nodes w/ very little room for incoming rsync - then the OPPOSITE on the receiving nodes | 18:38 |
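Pulling the push/receive asymmetry together, a hedged sketch using the option names and numbers from this conversation (9 disks per node, 3 receiving nodes; section placement assumed, so adjust to the real config layout and IOPS budget):

    # full nodes that PUSH data -- object-server.conf [object-replicator]
    replicator_workers = 9     # one worker per disk
    concurrency = 4            # roughly 36 outgoing streams per node
    # ...and keep their rsyncd.conf "max connections" small, since little
    # data should be flowing in

    # new nodes that RECEIVE data -- rsyncd.conf
    [object]
    max connections = 75       # plenty of headroom for the incoming streams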
clayg | does that make sense? I'm not sure it's obvious what these options all really "do" exactly? | 18:39 |
ybunker | so on the full nodes i need to decrease the max_connections? or do i increase it on all the nodes? | 18:39 |
clayg | you can increase it on all nodes - it would be a very small improvement to leave it set to something small on the nodes that should be pushing data (just to avoid busy work) | 18:40 |
ybunker | got it | 18:40 |
clayg | normally "busy" work is fine, it's just... the code doesn't really understand "this is an emergency!!! FREAK OUT!" so if you give it a bunch of breathing room it might think "oh you want me to GO FAST, can do boss!" | 18:41 |
ybunker | makes sense :) | 18:42 |
ybunker | hopefully with these changes it'll start to decrease a little bit :-), i'll get back tomorrow hopefully with some good news :-), thanks clayg | 18:43 |
clayg | great! | 18:43 |
*** e0ne has quit IRC | 19:23 | |
*** dasp has joined #openstack-swift | 19:35 | |
*** ybunker has quit IRC | 19:53 | |
*** e0ne has joined #openstack-swift | 20:05 | |
*** pcaruana has quit IRC | 20:46 | |
*** fyx has quit IRC | 20:53 | |
*** jungleboyj has quit IRC | 20:54 | |
*** clayg has quit IRC | 20:56 | |
*** jungleboyj has joined #openstack-swift | 20:57 | |
*** e0ne has quit IRC | 20:58 | |
*** gmann has quit IRC | 20:59 | |
*** beisner has quit IRC | 21:00 | |
*** e0ne has joined #openstack-swift | 21:02 | |
*** jungleboyj has quit IRC | 21:04 | |
*** e0ne has quit IRC | 21:06 | |
*** jungleboyj has joined #openstack-swift | 21:07 | |
*** fyx has joined #openstack-swift | 21:07 | |
*** gmann has joined #openstack-swift | 21:07 | |
*** beisner has joined #openstack-swift | 21:08 | |
*** clayg has joined #openstack-swift | 21:14 | |
*** ChanServ sets mode: +v clayg | 21:14 | |
*** irclogbot_0 has quit IRC | 21:58 | |
*** tonyb has quit IRC | 21:58 | |
*** irclogbot_1 has joined #openstack-swift | 22:03 | |
*** jungleboyj has quit IRC | 22:07 | |
*** gmann has quit IRC | 22:07 | |
*** jungleboyj has joined #openstack-swift | 22:07 | |
*** gmann has joined #openstack-swift | 22:07 | |
*** sleterrier_ has joined #openstack-swift | 22:09 | |
*** gmann has quit IRC | 22:11 | |
*** irclogbot_1 has quit IRC | 22:11 | |
*** gmann has joined #openstack-swift | 22:11 | |
*** kinrui has joined #openstack-swift | 22:14 | |
*** fungi has quit IRC | 22:16 | |
*** sleterrier has quit IRC | 22:16 | |
*** mathiasb has quit IRC | 22:16 | |
*** kinrui is now known as fungi | 22:19 | |
*** irclogbot_0 has joined #openstack-swift | 22:21 | |
*** rcernin has quit IRC | 22:45 | |
*** rcernin has joined #openstack-swift | 22:45 | |
notmyname | working on my swift project update talk for next week, and I'm looking for some really old info. which means I'm looking through project update talks from back in 2011 (and earlier). wow they were ugly ;-) | 22:50 |
notmyname | eg http://d.not.mn/swift_overview_oscon2011.pdf | 22:51 |
*** tkajinam has joined #openstack-swift | 22:57 | |
*** threestrands has joined #openstack-swift | 23:04 | |
*** rcernin has quit IRC | 23:15 | |
*** rcernin has joined #openstack-swift | 23:16 | |
*** csmart has joined #openstack-swift | 23:26 | |
mattoliverau | morning | 23:48 |
notmyname | hello mattoliverau | 23:48 |
mattoliverau | wow, that looks a lot different than your current update style. They look about as ugly as any of my slides. | 23:52 |
notmyname | it's funny seeing the progression from the original all-text-and-bullets version to the stylized text, to the just-big-pics, to my current blend of text and pics | 23:54 |
notmyname | I also need to figure out the right way to say "hello everyone. this is my last project update to give, so you get to listen to what I want to talk about for the next 40 minutes" ;-) | 23:57 |
notmyname | "I've given close to 20 of these, so now imma tell you what's what" | 23:57 |