*** d34dh0r53 has quit IRC | 01:09 | |
*** baojg has joined #openstack-swift | 01:28 | |
*** d34dh0r53 has joined #openstack-swift | 02:47 | |
*** gkadam has quit IRC | 03:00 | |
*** thurloat8 has quit IRC | 03:03 | |
*** gkadam has joined #openstack-swift | 03:06 | |
*** openstackgerrit has joined #openstack-swift | 03:06 | |
openstackgerrit | zhongshengping proposed openstack/swift master: Replace git.openstack.org URLs with opendev.org URLs https://review.opendev.org/654278 | 03:06 |
*** psachin has joined #openstack-swift | 03:22 | |
*** tonyb has joined #openstack-swift | 04:46 | |
*** tonyb has quit IRC | 06:13 | |
*** tonyb has joined #openstack-swift | 06:14 | |
*** ccamacho has joined #openstack-swift | 06:58 | |
*** baojg has quit IRC | 07:07 | |
*** pcaruana has joined #openstack-swift | 07:19 | |
*** tkajinam has quit IRC | 08:24 | |
*** pcaruana has quit IRC | 08:45 | |
*** rcernin has joined #openstack-swift | 09:03 | |
*** e0ne has joined #openstack-swift | 09:42 | |
*** hoonetorg has quit IRC | 10:00 | |
*** hoonetorg has joined #openstack-swift | 10:13 | |
*** baojg has joined #openstack-swift | 10:53 | |
*** pcaruana has joined #openstack-swift | 10:58 | |
*** gkadam has quit IRC | 12:49 | |
*** gkadam has joined #openstack-swift | 12:52 | |
*** psachin has quit IRC | 13:39 | |
*** openstackgerrit has quit IRC | 14:28 | |
clayg | cool, yeah everything seemed to "just work" on my end... | 15:00 |
clayg | we have p 654278 which looks fully legit | 15:00 |
patchbot | https://review.openstack.org/#/c/654278/ - swift - Replace git.openstack.org URLs with opendev.org URLs - 1 patch set | 15:00 |
*** gyee has joined #openstack-swift | 15:11 | |
notmyname | good morning | 15:41 |
*** e0ne has quit IRC | 15:47 | |
*** e0ne has joined #openstack-swift | 16:01 | |
*** gkadam has quit IRC | 16:30 | |
*** e0ne has quit IRC | 16:36 | |
*** ndk_ has quit IRC | 17:48 | |
*** ybunker has joined #openstack-swift | 17:49 | |
*** sleterrier has quit IRC | 17:50 | |
*** sleterrier has joined #openstack-swift | 17:50 | |
ybunker | is there any swift command that i can use to find where the partition (for example 1111) is located? i want to find the main partition and the replicas | 17:51 |
notmyname | `swift-get-nodes -p PARTITION` | 17:52 |
notmyname | `swift-get-nodes [-a] <ring.gz> -p partition` (from the usage string, so more complete/correct) | 17:53 |
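For reference, a minimal invocation of the command mentioned above, assuming the object ring lives at the default /etc/swift/object.ring.gz and using the example partition 1111 from the question (the output shape sketched in the comments is approximate):

    # print the primary nodes and handoff nodes for partition 1111
    swift-get-nodes /etc/swift/object.ring.gz -p 1111
    # the output lists each node as server:port plus device, e.g.
    #   Server:Port Device      10.0.0.11:6200 sdb
    # followed by handoff locations and suggested curl/ssh commands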
*** e0ne has joined #openstack-swift | 17:58 | |
ybunker | notmyname: thanks :-) | 18:02 |
*** e0ne has quit IRC | 18:08 | |
ybunker | got into a drive-100%-full condition, and after trying many things.. for example (setting handoffs_delete and handoffs_first) | 18:12 |
ybunker | and already added (3) new nodes to the cluster | 18:12 |
notmyname | for full clusters, the goal is to first add new drives, then get the handoffs moved as quickly as possible. the handoffs_first=true and handoffs_delete set to something like 1 or 2 are the first things to check | 18:13 |
notmyname | then check rsync settings. make sure you've got a lot of available connections on the new drives, make sure rsync is not accepting inbound connections on the servers with full drives | 18:13 |
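A sketch of the settings being described here, as they would appear in the [object-replicator] section of object-server.conf (values are illustrative, not a universal recommendation):

    [object-replicator]
    # replicate handoff partitions before primaries, so full drives drain first
    handoffs_first = True
    # remove a handoff after this many successful pushes instead of waiting
    # for all replicas to confirm; 1 or 2 frees space sooner, at some risk
    handoffs_delete = 2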
ybunker | it seems that since the full drives are at 100% used space (10G free), it does not 'remove' the replicated partitions from those drives.. and they're still at 100% | 18:13 |
*** e0ne has joined #openstack-swift | 18:14 | |
notmyname | clayg: what are the other things we set in the emergency replication mode? | 18:14 |
ybunker | notmyname: handoffs_* already checked and it's not freeing a byte :( | 18:15 |
clayg | if you're 100% full you need handoffs_delete = 1 | 18:15 |
ybunker | already did that on the full nodes | 18:16 |
notmyname | meeting time for me. gotta close irc | 18:16 |
clayg | ok, then you should be good to go | 18:16 |
ybunker | thanks a lot notmyname | 18:16 |
clayg | you can increase object replicator workers & concurrency if you have iops available on the nodes that want to drain | 18:16 |
ybunker | clayg: already did that but disks keeps at 100% | 18:17 |
clayg | you might want to do a back-of-the-napkin estimate of how many streams of io you can pull/push per node | 18:17 |
clayg | check logs for errors - something could be preventing successful transfer (e.g. rsync connection limiting) | 18:17 |
clayg | this is replicated or EC fragments? | 18:18 |
ybunker | clayg: and the problem is that we have a 'maintenance' window for obj-repl that runs 4hs a day.. because when it runs the latency of the cluster goes through the roof.. from 44ms... to 4s.... and it's impossible to operate at those numbers | 18:18 |
ybunker | replicated | 18:18 |
clayg | everybody has an iops budget - you could play with the ionice settings | 18:19 |
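If the Swift release in use supports the per-daemon I/O priority options (an assumption; older releases would instead need to wrap the daemon with the ionice command), the idea could look roughly like this in object-server.conf:

    [object-replicator]
    # run replication at a lower I/O priority so client traffic wins contention
    # (illustrative values: best-effort class, lowest priority within it)
    ionice_class = IOPRIO_CLASS_BE
    ionice_priority = 7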
clayg | but - i figure you'd want to get things moving first and then worry about making it easier to manage - you should be able to free up some bytes during your four hour window if you go full tilt | 18:20 |
clayg | no? | 18:20 |
ybunker | clayg: yeah i want to get rid of the 100% full condition.. the problem is that during those 4h of replication.. the full drives are still at 100%, i got 8 workers running on that window | 18:21 |
clayg | are you doing rsync connections/modules per disk? | 18:21 |
ybunker | per ACO | 18:22 |
clayg | so with 8 workers your outbound streams from the node is 8x concurrency - what's your object replicator concurrency? how many (object) disks per node? | 18:23 |
ybunker | found the following on the log error: object-replicator: [worker 1/8 pid=20511] @ERROR: max connections (8) reached -- try again later | 18:23 |
ybunker | 9 disks per node | 18:23 |
clayg | there you go! | 18:23 |
clayg | going nowhere fast | 18:24 |
ybunker | i just changed max connections on rsyncd to 16 | 18:24 |
clayg | that limit is set in the rsync.conf | 18:24 |
clayg | is that enough? what concurrency are you running? | 18:24 |
clayg | 16 per node isn't even 2 connections per disk - I'd think you'd want 4-8 per disk in an emergency | 18:25 |
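With the 9 disks per node mentioned above, that rule of thumb works out to roughly:

    9 disks x 4 connections = 36 inbound rsync connections (lower bound)
    9 disks x 8 connections = 72 inbound rsync connections (upper bound)

which lines up with the "max connections" value of around 75 suggested a little later.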
clayg | of course w/o rsync modules per disk you can't guarantee even distribution... | 18:25 |
ybunker | clayg: http://paste.openstack.org/show/749609/ | 18:25 |
clayg | yeah! try and move your deployment in this direction at some point -> https://github.com/openstack/swift/blob/master/etc/rsyncd.conf-sample#L25 | 18:28 |
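The per-disk module setup being linked to would look roughly like this (a sketch based on the linked sample; the device name sda and the paths are placeholders):

    # object-server.conf, [object-replicator] section: address one module per device
    rsync_module = {replication_ip}::object_{device}

    # rsyncd.conf: one module per disk, so a single slow or busy disk
    # cannot consume every available connection
    [object_sda]
    max connections = 4
    path = /srv/node
    read only = False
    lock file = /var/lock/object_sda.lock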
clayg | ok, so concurrency is 8 and workers is 8 - so you have about 64 streams coming out of each full node - so you'll want the aggregate capacity of all new incoming hardware ready to receive all that | 18:29 |
clayg | how many standing nodes do you have - and how many are you adding? | 18:29 |
ybunker | i have 9 standing nodes (2 at 100% full) and adding (3) new ones | 18:30 |
clayg | ok, so you should be able to really hammer those new guys - I'd recommend increasing workers to 9 so you have one worker per disk - set concurrency to 4 so you have 324 outgoing streams, then set max_connections at something like 75 or so on those nodes you want to fill up and LET HER RIP! | 18:33 |
clayg | when is your window? it'll be a replication party!!! | 18:34 |
ybunker | for rsyncd something like this (names are from 4..12) => http://paste.openstack.org/show/749610/ | 18:34 |
clayg | oh, you ARE doing rsync module per disk then | 18:34 |
ybunker | nono, i mean the changes that you told me about | 18:35 |
clayg | how can that be tho? you're not setting it in the object-server.conf | 18:35 |
clayg | oh oh oh - sure that'd be nice to have for the future - makes replication a lot more closely tied to the rare commodity of spinning platters | 18:36 |
clayg | up to you if you want to change that now or after the fire-drill | 18:36 |
clayg | we rebalanced lots of clusters before we had rsync modules per disk ;) | 18:36 |
ybunker | after :) | 18:36 |
clayg | yeah so tweak your workers X concurrency and increase the rsync max_connections on the 3 new nodes like... A LOT | 18:37 |
clayg | you should see them start to get hammered with write io, and shortly after that the nodes pushing data will be able to do some DELETEs | 18:37 |
ybunker | so on all the nodes im going to set: workers = 9, concurrency = 4, replicator_workers = 9 and max_connections on rsyncd to 76 | 18:37 |
clayg | that's fine - if it's easy for you to make a subtle change I'd recommend a heterogeneous deployment | 18:38 |
clayg | lots of workers X concurrency on the PUSHING nodes w/ very little room for incoming rsync - then the OPPOSITE on the receiving nodes | 18:38 |
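Pulling the push/receive asymmetry together, a hedged sketch using the option names and numbers from this conversation (9 disks per node, 3 receiving nodes; section placement assumed, so adjust to the real config layout and IOPS budget):

    # full nodes that PUSH data -- object-server.conf [object-replicator]
    replicator_workers = 9     # one worker per disk
    concurrency = 4            # roughly 36 outgoing streams per node
    # ...and keep their rsyncd.conf "max connections" small, since little
    # data should be flowing in

    # new nodes that RECEIVE data -- rsyncd.conf
    [object]
    max connections = 75       # plenty of headroom for the incoming streams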
clayg | does that make sense? I'm not sure it's obvious what these options all really "do" exactly? | 18:39 |
ybunker | so on the full nodes i need to decrease the max_connections? or do i increase it on all the nodes? | 18:39 |
clayg | you can increase it on all nodes - it would be a very small improvement to leave it set to something small on the nodes that should be pushing data (just to avoid busy work) | 18:40 |
ybunker | got it | 18:40 |
clayg | normally "busy" work is fine, it's just... the code doesn't really understand "this is an emergency!!! FREAK OUT!" so if you give it a bunch of breathing room it might think "oh you want me to GO FAST, can do boss!" | 18:41 |
ybunker | makes sense :) | 18:42 |
ybunker | hopefully with these changes it'll start to decrease a little bit :-), i'll get back tomorrow hopefully with some good news :-), thanks clayg | 18:43 |
clayg | great! | 18:43 |
*** e0ne has quit IRC | 19:23 | |
*** dasp has joined #openstack-swift | 19:35 | |
*** ybunker has quit IRC | 19:53 | |
*** e0ne has joined #openstack-swift | 20:05 | |
*** pcaruana has quit IRC | 20:46 | |
*** fyx has quit IRC | 20:53 | |
*** jungleboyj has quit IRC | 20:54 | |
*** clayg has quit IRC | 20:56 | |
*** jungleboyj has joined #openstack-swift | 20:57 | |
*** e0ne has quit IRC | 20:58 | |
*** gmann has quit IRC | 20:59 | |
*** beisner has quit IRC | 21:00 | |
*** e0ne has joined #openstack-swift | 21:02 | |
*** jungleboyj has quit IRC | 21:04 | |
*** e0ne has quit IRC | 21:06 | |
*** jungleboyj has joined #openstack-swift | 21:07 | |
*** fyx has joined #openstack-swift | 21:07 | |
*** gmann has joined #openstack-swift | 21:07 | |
*** beisner has joined #openstack-swift | 21:08 | |
*** clayg has joined #openstack-swift | 21:14 | |
*** ChanServ sets mode: +v clayg | 21:14 | |
*** irclogbot_0 has quit IRC | 21:58 | |
*** tonyb has quit IRC | 21:58 | |
*** irclogbot_1 has joined #openstack-swift | 22:03 | |
*** jungleboyj has quit IRC | 22:07 | |
*** gmann has quit IRC | 22:07 | |
*** jungleboyj has joined #openstack-swift | 22:07 | |
*** gmann has joined #openstack-swift | 22:07 | |
*** sleterrier_ has joined #openstack-swift | 22:09 | |
*** gmann has quit IRC | 22:11 | |
*** irclogbot_1 has quit IRC | 22:11 | |
*** gmann has joined #openstack-swift | 22:11 | |
*** kinrui has joined #openstack-swift | 22:14 | |
*** fungi has quit IRC | 22:16 | |
*** sleterrier has quit IRC | 22:16 | |
*** mathiasb has quit IRC | 22:16 | |
*** kinrui is now known as fungi | 22:19 | |
*** irclogbot_0 has joined #openstack-swift | 22:21 | |
*** rcernin has quit IRC | 22:45 | |
*** rcernin has joined #openstack-swift | 22:45 | |
notmyname | working on my swift project update talk for next week, and I'm looking for some really old info. which means I'm looking through project update talks from back in 2011 (and earlier). wow they were ugly ;-) | 22:50 |
notmyname | eg http://d.not.mn/swift_overview_oscon2011.pdf | 22:51 |
*** tkajinam has joined #openstack-swift | 22:57 | |
*** threestrands has joined #openstack-swift | 23:04 | |
*** rcernin has quit IRC | 23:15 | |
*** rcernin has joined #openstack-swift | 23:16 | |
*** csmart has joined #openstack-swift | 23:26 | |
mattoliverau | morning | 23:48 |
notmyname | hello mattoliverau | 23:48 |
mattoliverau | wow, that looks a lot different than your current update style. They look about as ugly as any of my slides. | 23:52 |
notmyname | it's funny seeing the progression from the original all-text-and-bullets version to the stylized text, to the just-big-pics, to my current blend of text and pics | 23:54 |
notmyname | I also need to figure out the right way to say "hello everyone. this is my last project update to give, so you get to listen to what I want to talk about for the next 40 minutes" ;-) | 23:57 |
notmyname | "I've given close to 20 of these, so now imma tell you what's what" | 23:57 |