kota_ | good morning | 00:47 |
---|---|---|
*** hoonetorg has quit IRC | 00:53 | |
*** lucasxu has joined #openstack-swift | 00:57 | |
*** lucasxu has quit IRC | 00:57 | |
*** hoonetorg has joined #openstack-swift | 01:05 | |
*** klrmn has quit IRC | 01:08 | |
*** tovin07_ has joined #openstack-swift | 01:09 | |
*** hoonetorg has quit IRC | 01:16 | |
*** cshastri has joined #openstack-swift | 01:25 | |
*** hoonetorg has joined #openstack-swift | 01:29 | |
*** bkopilov_ has quit IRC | 01:47 | |
*** aselius has quit IRC | 02:08 | |
*** m_kazuhiro has joined #openstack-swift | 02:18 | |
*** klrmn has joined #openstack-swift | 02:34 | |
*** pdardeau has joined #openstack-swift | 02:57 | |
*** bkopilov_ has joined #openstack-swift | 03:04 | |
mahatic | good morning | 03:24 |
openstackgerrit | Lingxian Kong proposed openstack/swift master: Write-affinity aware object deletion https://review.openstack.org/470158 | 03:24 |
* mahatic had a long weekend | 03:24 | |
mahatic | kota_: o/ | 03:24 |
kota_ | mahatic: o/ | 03:24 |
*** links has joined #openstack-swift | 03:42 | |
*** gkadam has joined #openstack-swift | 03:44 | |
*** pdardeau has quit IRC | 03:44 | |
*** psachin has joined #openstack-swift | 03:48 | |
*** gkadam has quit IRC | 03:48 | |
*** Dinesh_Bhor has joined #openstack-swift | 04:02 | |
mattoliverau | kota_, mahatic: o/ | 04:05 |
kota_ | mahatic: o/ | 04:06 |
*** m_kazuhiro has quit IRC | 04:13 | |
*** gkadam has joined #openstack-swift | 04:27 | |
*** MVenesio has quit IRC | 04:37 | |
kong | mattoliverau: hi, could you please take a look at the latest patchset for https://review.openstack.org/470158 and see if you are happy with that? | 04:39 |
patchbot | patch 470158 - swift - Write-affinity aware object deletion | 04:39 |
*** sanchitmalhotra has joined #openstack-swift | 04:39 | |
kong | i prefer the default value to be 'auto' since it's clearer for ops than 'None' | 04:39 |
kong | also, clayg ^^ | 04:40 |
*** MVenesio has joined #openstack-swift | 04:41 | |
mattoliverau | kong: yup I'll take a look when I get a chance (hopefully tonight). By 'None' in a configuration I meant nothing (leaving it unset), but yeah, 'auto' also tells ops that the value will be automatically calculated depending on your system, so that's better I think :) | 04:58 |
notmyname | FYI, no 2100 meeting this week. only the 0700 meeting | 05:09 |
notmyname | https://wiki.openstack.org/wiki/Meetings/Swift is up to date | 05:09 |
*** htruta` has joined #openstack-swift | 05:11 | |
*** etiennem1 has joined #openstack-swift | 05:12 | |
*** sgundur has joined #openstack-swift | 05:12 | |
*** edausqu has joined #openstack-swift | 05:13 | |
*** htruta has quit IRC | 05:13 | |
*** etienneme has quit IRC | 05:13 | |
*** mmotiani has quit IRC | 05:13 | |
*** sgundur- has quit IRC | 05:13 | |
*** EmilienM has quit IRC | 05:13 | |
*** edausq has quit IRC | 05:13 | |
*** edausqu is now known as edausq | 05:13 | |
*** psachin has quit IRC | 05:14 | |
*** psachin has joined #openstack-swift | 05:16 | |
*** adriant has joined #openstack-swift | 05:20 | |
*** EmilienM has joined #openstack-swift | 05:22 | |
*** adriant has quit IRC | 05:25 | |
*** mmmucky_ has joined #openstack-swift | 05:37 | |
*** mahatic_ has joined #openstack-swift | 05:37 | |
*** ChubYann has quit IRC | 05:38 | |
*** mmmucky has quit IRC | 05:39 | |
*** mahatic has quit IRC | 05:39 | |
*** ntata has quit IRC | 05:39 | |
*** rcernin has joined #openstack-swift | 05:55 | |
*** sanchitmalhotra has quit IRC | 05:57 | |
mattoliverau | kong: there is a config_auto_int_value helper method in utils you can use | 06:02 |
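A minimal sketch of the 'auto' convention discussed above, assuming the semantics described in the conversation: an unset or 'auto' value falls back to a computed default, and anything else must parse as an integer. `auto_int_value` is a hypothetical stand-in written for illustration; for the real `config_auto_int_value` helper, check `swift/common/utils.py` in your Swift version.

```python
def auto_int_value(value, default):
    """Return `default` when value is unset or 'auto', otherwise int(value).

    Hypothetical illustration of the 'auto' config convention; not Swift's code.
    """
    if value is None or str(value).strip().lower() == 'auto':
        return default
    try:
        return int(value)
    except (TypeError, ValueError):
        raise ValueError(
            'Config option must be an integer or "auto", not %r' % (value,))


# An ops-facing default of 'auto' resolves to whatever the code computes.
print(auto_int_value('auto', 3))  # 3
print(auto_int_value('5', 3))     # 5
```

Keeping 'auto' as the documented default means the sample config states intent explicitly, while the code stays free to change how the value is calculated.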
*** cschwede has joined #openstack-swift | 06:07 | |
*** ChanServ sets mode: +v cschwede | 06:07 | |
*** klrmn has quit IRC | 06:19 | |
*** rcernin has quit IRC | 06:26 | |
*** rcernin has joined #openstack-swift | 06:32 | |
*** hseipp has joined #openstack-swift | 06:39 | |
*** tesseract has joined #openstack-swift | 06:55 | |
*** hseipp has quit IRC | 06:56 | |
*** hseipp has joined #openstack-swift | 06:56 | |
*** hseipp has quit IRC | 07:00 | |
*** jeblair has quit IRC | 07:08 | |
*** jeblair has joined #openstack-swift | 07:09 | |
*** skudlik has joined #openstack-swift | 07:21 | |
*** gyee has quit IRC | 07:23 | |
*** pcaruana has joined #openstack-swift | 07:32 | |
acoles | good morning | 07:33 |
mattoliverau | acoles: o/ | 07:38 |
acoles | mattoliverau: hi | 07:38 |
*** oshritf has joined #openstack-swift | 07:53 | |
mahatic_ | acoles: mattoliverau: hello | 08:00 |
mattoliverau | mahatic: o/ | 08:04 |
openstackgerrit | liuyamin proposed openstack/swift master: Fix the reST field raises in docstrings https://review.openstack.org/451143 | 08:12 |
*** oshritf_ has joined #openstack-swift | 08:22 | |
*** oshritf has quit IRC | 08:25 | |
*** cbartz has joined #openstack-swift | 08:41 | |
*** etiennem1 is now known as etienneme | 08:41 | |
*** d0ugal has quit IRC | 08:58 | |
*** d0ugal_ has joined #openstack-swift | 08:59 | |
*** d0ugal_ has quit IRC | 08:59 | |
*** d0ugal has joined #openstack-swift | 08:59 | |
*** d0ugal has joined #openstack-swift | 08:59 | |
*** jarbod_ has joined #openstack-swift | 09:30 | |
*** tonyb_ has joined #openstack-swift | 09:30 | |
*** mvk has quit IRC | 09:30 | |
*** karenc has joined #openstack-swift | 09:30 | |
*** StevenK_ has joined #openstack-swift | 09:30 | |
*** jlvillal_ has joined #openstack-swift | 09:33 | |
*** logan_ has joined #openstack-swift | 09:33 | |
*** d0ugal has quit IRC | 09:34 | |
*** hoonetorg has quit IRC | 09:34 | |
*** rledisez has quit IRC | 09:34 | |
*** alecuyer has quit IRC | 09:34 | |
*** StevenK has quit IRC | 09:34 | |
*** logan- has quit IRC | 09:34 | |
*** jarbod___ has quit IRC | 09:34 | |
*** mgagne has quit IRC | 09:35 | |
*** tonyb has quit IRC | 09:35 | |
*** karenc_ has quit IRC | 09:35 | |
*** jlvillal has quit IRC | 09:35 | |
*** jlvillal_ is now known as jlvillal | 09:36 | |
*** alecuyer has joined #openstack-swift | 09:37 | |
*** jlvillal is now known as Guest60319 | 09:37 | |
*** mgagne has joined #openstack-swift | 09:37 | |
*** mgagne is now known as Guest28796 | 09:37 | |
*** rledisez has joined #openstack-swift | 09:37 | |
*** logan_ is now known as logan- | 09:37 | |
*** d0ugal has joined #openstack-swift | 09:41 | |
*** hoonetorg has joined #openstack-swift | 09:41 | |
*** chlong has joined #openstack-swift | 09:46 | |
*** mvk has joined #openstack-swift | 09:57 | |
*** tovin07_ has quit IRC | 10:01 | |
openstackgerrit | wangzhenyu proposed openstack/python-swiftclient master: Enable some off-by-default checks https://review.openstack.org/477872 | 10:19 |
openstackgerrit | Lingxian Kong proposed openstack/swift master: Write-affinity aware object deletion https://review.openstack.org/470158 | 10:42 |
*** bkopilov_ has quit IRC | 10:49 | |
*** cshastri has quit IRC | 11:00 | |
*** chlong_ has joined #openstack-swift | 11:04 | |
*** chlong has quit IRC | 11:04 | |
openstackgerrit | iswarya vakati proposed openstack/swift master: Add python 3.5 in classifier https://review.openstack.org/477901 | 11:52 |
*** vint_bra has joined #openstack-swift | 11:57 | |
*** vint_bra has quit IRC | 12:03 | |
*** bkopilov_ has joined #openstack-swift | 12:14 | |
*** lifeless has quit IRC | 12:30 | |
*** gkadam has quit IRC | 12:33 | |
*** lifeless has joined #openstack-swift | 12:43 | |
*** NM has joined #openstack-swift | 12:43 | |
*** skudlik has quit IRC | 12:47 | |
*** skudlik has joined #openstack-swift | 12:51 | |
*** lucasxu has joined #openstack-swift | 13:05 | |
*** kei_yama has quit IRC | 13:08 | |
*** klamath has joined #openstack-swift | 13:42 | |
*** klamath has quit IRC | 13:42 | |
*** klamath has joined #openstack-swift | 13:43 | |
*** ukaynar has joined #openstack-swift | 13:52 | |
*** vint_bra has joined #openstack-swift | 14:10 | |
*** gsmethells has joined #openstack-swift | 14:27 | |
*** aselius has joined #openstack-swift | 14:30 | |
*** d0ugal has quit IRC | 14:32 | |
*** d0ugal has joined #openstack-swift | 14:32 | |
*** d0ugal has quit IRC | 14:32 | |
*** d0ugal has joined #openstack-swift | 14:32 | |
*** skudlik has quit IRC | 14:45 | |
*** gkadam has joined #openstack-swift | 14:52 | |
*** gkadam has quit IRC | 14:59 | |
*** Guest60319 is now known as jlvillal | 15:04 | |
*** rcernin has quit IRC | 15:04 | |
*** links has quit IRC | 15:05 | |
openstackgerrit | Alistair Coles proposed openstack/swift master: WIP: Ring rebalance respects co-builders' last_part_moves https://review.openstack.org/477000 | 15:11 |
*** lucasxu has quit IRC | 15:14 | |
*** lucasxu has joined #openstack-swift | 15:16 | |
*** gsmethells has quit IRC | 15:20 | |
*** lucasxu has quit IRC | 15:28 | |
*** klamath has quit IRC | 15:28 | |
*** klamath has joined #openstack-swift | 15:30 | |
*** vinsh has joined #openstack-swift | 15:33 | |
*** gyee has joined #openstack-swift | 15:34 | |
*** gsmethells has joined #openstack-swift | 15:37 | |
*** klamath has quit IRC | 15:38 | |
gsmethells | clayg are you available? | 15:39 |
*** klamath has joined #openstack-swift | 15:39 | |
*** NM has quit IRC | 15:42 | |
*** ghebda has joined #openstack-swift | 15:44 | |
gsmethells | is anyone available to help with https://bugs.launchpad.net/swift/+bug/1700585 ? | 15:48 |
openstack | Launchpad bug 1700585 in OpenStack Object Storage (swift) "Objects can become orphaned in Swift 2.4.0" [Undecided,Incomplete] | 15:48 |
*** psachin has quit IRC | 15:54 | |
*** ukaynar has quit IRC | 16:01 | |
*** ukaynar has joined #openstack-swift | 16:01 | |
*** gyee has quit IRC | 16:02 | |
jrichli | gsmethells: clayg is on PST time, so it is straight-up 9am for him now. He will probably be online soon. | 16:02 |
*** links has joined #openstack-swift | 16:02 | |
gsmethells | Thanks jrichli for the heads up | 16:02 |
*** chlong has joined #openstack-swift | 16:05 | |
*** chlong has quit IRC | 16:05 | |
*** pcaruana has quit IRC | 16:09 | |
notmyname | good morning | 16:10 |
*** cbartz has quit IRC | 16:11 | |
notmyname | I'm off to Boston today, so I'm not sure how much I'll be online this week. I'll be back in SF on Friday | 16:13 |
*** lucasxu has joined #openstack-swift | 16:16 | |
*** itlinux has joined #openstack-swift | 16:19 | |
*** gyee has joined #openstack-swift | 16:25 | |
*** lucasxu has quit IRC | 16:29 | |
*** lucasxu has joined #openstack-swift | 16:30 | |
*** gsmethells_ has joined #openstack-swift | 16:30 | |
*** gsmethells has quit IRC | 16:34 | |
*** klamath has quit IRC | 16:39 | |
*** tesseract has quit IRC | 16:40 | |
*** klamath has joined #openstack-swift | 16:41 | |
*** skudlik has joined #openstack-swift | 16:41 | |
*** lucasxu has quit IRC | 16:49 | |
clayg | gsmethells_: ghebda: did you try the request node count setting with an insane high value? | 16:53 |
clayg | The dispersion populate isn't going to do much good after the fact - if you have it at 100% population and monitor the report closely it tells you when you're close to losing >1 replica of a part. | 16:54 |
ghebda | clayg: gsmethells: I just updated the ticket with some information. We did not change that value because we weren't sure what exactly that line was supposed to look like | 16:55 |
clayg | I think I saw where the replication cycle time on your cluster is 4 days - which is high for a stable system. Maybe during some replication event it was much much higher? | 16:56 |
*** links has quit IRC | 16:57 | |
clayg | in app:proxy-server section add request_node_count = 1000 - n.b. will cause significant increase in latency for the cluster | 16:59 |
*** klamath has quit IRC | 17:00 | |
*** chsc has joined #openstack-swift | 17:02 | |
*** klamath has joined #openstack-swift | 17:05 | |
ghebda | OK, i set that variable to 1000 and restarted the swift proxy service. I ran the curl -I -XHEAD commands from ticket comment #5 with the same results, meaning the two that came back OK came back OK and the two giving 404s came back with 404s | 17:11 |
*** lucasxu has joined #openstack-swift | 17:12 | |
*** mat128 has joined #openstack-swift | 17:13 | |
*** klamath has quit IRC | 17:16 | |
*** klamath has joined #openstack-swift | 17:17 | |
ghebda | as far as the replication time, that could be because a node shut down and moved, so maybe has not caught up yet... | 17:18 |
*** klrmn has joined #openstack-swift | 17:25 | |
*** mvk has quit IRC | 17:27 | |
*** oshritf_ has quit IRC | 17:27 | |
gsmethells_ | clayg is there anything else we ought to be looking into? | 17:29 |
*** vinsh has quit IRC | 17:31 | |
*** vinsh has joined #openstack-swift | 17:32 | |
*** vinsh has quit IRC | 17:32 | |
*** vinsh has joined #openstack-swift | 17:32 | |
*** vinsh has quit IRC | 17:37 | |
clayg | was the response time noticeably higher for the 404 after changing request_node_count - can you confirm from the transaction id and logs that the proxy was hitting all the disks looking for that hash? | 17:37 |
*** geaaru has joined #openstack-swift | 17:44 | |
ghebda | i did not run the initial queries, so I can't compare the times, but my results were pretty immediate. Here's the line from the logs for that transaction: | 17:51 |
ghebda | 10.11.12.200 10.11.12.200 26/Jun/2017/15/05/41 HEAD /v1/AUTH_75673124ca7f42968e28bc264ed32331/1/1.2.840.114204.2.2.4.1.243395414945023.14589405468080000/1.2.840.114204.2.2.2.1.199754063548486.14589405856570000.dcm HTTP/1.0 404 - curl/7.19.7%20%28x86_64-redhat-linux-gnu%29%20libcurl/7.19.7%20NSS/3.19.1%20Basic%20ECC%20zlib/1.2.3%20libidn/1.18%20libssh2/1.4.2 47901120da53431e... - - - tx5b2de3447b6a45e8a5a5e-00595122c4 - 1.1213 | 17:51 |
*** hseipp has joined #openstack-swift | 17:54 | |
*** links has joined #openstack-swift | 17:55 | |
clayg | that looks like the proxy request - were there a bunch of backend object-server requests too? like... dozens? with the same txid | 17:59 |
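clayg's check here is to match the proxy's transaction id against the object-server logs on every storage node. A minimal sketch of that tally, assuming you have already gathered each node's object-server log lines (log locations vary by deployment and are not shown); `count_txid_lines` is a hypothetical helper, roughly what running `grep -c <txid>` on each node would give you. The sample lines are placeholders, not the real object-server log format.

```python
from collections import Counter


def count_txid_lines(lines_by_node, txid):
    """Count how many log lines on each node mention the given transaction id.

    lines_by_node maps a node name to an iterable of log lines (for example an
    open log file). Nodes with no matching lines are reported with a count of 0,
    which is exactly the case worth noticing.
    """
    counts = Counter()
    for node, lines in lines_by_node.items():
        counts[node] = sum(1 for line in lines if txid in line)
    return counts


# Placeholder lines (not real object-server log format):
txid = 'tx5b2de3447b6a45e8a5a5e-00595122c4'
sample = {
    'node1': ['... HEAD ... %s ...' % txid, '... unrelated ...'],
    'node2': ['... unrelated ...'],
}
print(dict(count_txid_lines(sample, txid)))  # {'node1': 1, 'node2': 0}
```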
*** tonanhngo has joined #openstack-swift | 18:00 | |
*** links has quit IRC | 18:02 | |
ghebda | actually with those txids, i'm not seeing any requests in the logs on the nodes | 18:06 |
*** tonanhngo has quit IRC | 18:08 | |
clayg | that's not good | 18:10 |
clayg | ;) | 18:10 |
clayg | but it's not an account or container existence check issue? You can list the account, list the container, other objects in the container HEAD just fine - always the same objects fail? Maybe it's just an issue with your greps - can you sanity check you can find the object-server requests for the *successful* txid? | 18:11 |
*** tonanhngo has joined #openstack-swift | 18:12 | |
ghebda | does the object-server log those only if the request is made from the swift client, and not for curl? | 18:13 |
clayg | from the object-server's perspective all requests are made from the proxy | 18:13 |
clayg | regardless of what client the proxy is talking to | 18:14 |
clayg | the user-agent string shouldn't make any difference anyway | 18:14 |
ghebda | ok. is verbose logging necessary to see which disks it checks? | 18:14 |
*** hseipp has quit IRC | 18:15 | |
clayg | object-server access log lines are logged at INFO | 18:15 |
clayg | https://docs.openstack.org/developer/swift/logs.html#storage-node-logs | 18:15 |
*** tonanhngo has quit IRC | 18:17 | |
*** skudlik has quit IRC | 18:18 | |
*** tonanhngo has joined #openstack-swift | 18:19 | |
*** tonanhngo has quit IRC | 18:23 | |
*** noark9 has joined #openstack-swift | 18:24 | |
*** tonanhngo has joined #openstack-swift | 18:25 | |
*** gsmethells_ has quit IRC | 18:28 | |
*** tonanhngo has quit IRC | 18:29 | |
*** tonanhngo has joined #openstack-swift | 18:31 | |
ghebda | so i pulled a recent GET operation txid from one of my logs, and on 1 node, i see 10 lines in rapid succession. on node 2, i get nothing from that grep, and on node 3, there are about 20 GET lines in rapid succession | 18:32 |
*** tonanhngo has quit IRC | 18:36 | |
*** tonanhngo has joined #openstack-swift | 18:37 | |
clayg | this is after you made the change to the request-node-count? if you're seeing ~30 lines for the GET probably so... but it must have been a 404 response, yes? I can't imagine why one node wouldn't have any logs for that txid? | 18:39 |
clayg | did it not log them/you can't find them? Or did it not get sent requests? | 18:40 |
*** tonanhngo has quit IRC | 18:41 | |
*** tonanhngo has joined #openstack-swift | 18:43 | |
*** gsmethells has joined #openstack-swift | 18:47 | |
*** tonanhngo has quit IRC | 18:48 | |
*** JimCheung has joined #openstack-swift | 18:48 | |
*** tonanhngo has joined #openstack-swift | 18:50 | |
*** tonanhngo has quit IRC | 18:54 | |
*** tonanhngo has joined #openstack-swift | 18:56 | |
*** Renich has joined #openstack-swift | 18:58 | |
*** geaaru has quit IRC | 19:00 | |
*** joeljwright has joined #openstack-swift | 19:01 | |
*** ChanServ sets mode: +v joeljwright | 19:01 | |
*** tonanhngo has quit IRC | 19:02 | |
*** ChubYann has joined #openstack-swift | 19:08 | |
*** noark9 has quit IRC | 19:11 | |
ghebda | ok, i'm actually able to download some of the files that were 404-ing before using the swift client. What's strange is that when I go to one of the storage nodes and swift-get-nodes for one of those files, the curl commands all 404 when I run them back on the proxy | 19:21 |
*** Renich has quit IRC | 19:22 | |
*** mvk has joined #openstack-swift | 19:23 | |
*** vinsh has joined #openstack-swift | 19:23 | |
clayg | ... I don't understand any of that except the bit where you say we found the lost objects ;) | 19:25 |
clayg | the rest makes no sense :P | 19:25 |
*** tonanhngo has joined #openstack-swift | 19:25 | |
clayg | I mean.. the words make sense - but I don't understand how all of these things can be true/correct at the same time :D | 19:25 |
clayg | distributed systems are fun! | 19:25 |
*** tonanhngo has quit IRC | 19:29 | |
ghebda | right. so, we also ran the ls commands on the paths that swift-get-nodes gives us and came up with no such file or directory for all 6 paths (3 primary and 3 handoff locations) | 19:30 |
gsmethells | ghebda - which server are you running the successful command on? what command is it? | 19:36 |
ghebda | i'm running a swift download from the proxy | 19:36 |
*** tonanhngo has joined #openstack-swift | 19:37 | |
clayg | so it's possible the object data is available in *non-primary* locations | 19:38 |
clayg | that's what the request_node_count is all about - check more nodes | 19:38 |
clayg | troubleshooting *why* the data isn't on primary locations is one thing - but it's a different thing - durability vs. availability | 19:39 |
clayg | go find the 200 responses for those txids - do an ls on those devices/nodes - how far down the list of handoffs do those nodes appear in swift-get-nodes if you use the --all option or whatever it's called | 19:40 |
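The suggestion here is to check how far down the node list (primaries first, then handoffs) the copies that returned 200 actually sit. A minimal sketch of that bookkeeping, assuming you have copied the device order out of the `swift-get-nodes` output by hand; `data_positions` and `min_request_node_count` are hypothetical helpers, and the second one assumes `request_node_count` counts primaries as well as handoffs. Device names are made up.

```python
def data_positions(ordered_nodes, nodes_with_data, replicas=3):
    """Locate the nodes that actually hold the object within the ring's node order.

    ordered_nodes: device names, primaries first and then handoffs, in the order
    the ring lists them. nodes_with_data: names where the object was found.
    Returns (node, index, label) tuples in ring order.
    """
    found = []
    for index, node in enumerate(ordered_nodes):
        if node in nodes_with_data:
            if index < replicas:
                label = 'primary'
            else:
                label = 'handoff #%d' % (index - replicas + 1)
            found.append((node, index, label))
    return found


def min_request_node_count(positions):
    """Smallest number of nodes the proxy must try to reach the first copy."""
    return positions[0][1] + 1 if positions else None


# Hypothetical devices: d1..d3 are primaries, d4..d10 are handoffs.
order = ['d%d' % i for i in range(1, 11)]
where = data_positions(order, {'d8', 'd9', 'd10'})
print(where)  # [('d8', 7, 'handoff #5'), ('d9', 8, 'handoff #6'), ('d10', 9, 'handoff #7')]
print(min_request_node_count(where))  # 8
```

If the first copy sits deeper than the configured node count reaches, the proxy returns 404 even though the data exists, which is the availability (not durability) problem described in this conversation.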
*** tonanhngo has quit IRC | 19:42 | |
clayg | maybe something like this would help figure out what's going on with replication: https://gist.github.com/clayg/4261e7dc654cc2c80a529b741a7cdd5f | 19:42 |
*** tonanhngo has joined #openstack-swift | 19:43 | |
*** tonanhngo has quit IRC | 19:48 | |
*** tonanhngo has joined #openstack-swift | 19:50 | |
ghebda | ok, yeah, so we ran through the entire list of handoff locations and found 3 that gave us 200s, much deeper down the list from the 3 primaries and 3 initial handoffs. so that makes sense | 19:50 |
clayg | NOICE | 19:54 |
*** tonanhngo has quit IRC | 19:54 | |
clayg | ... wait ... is that new information to only me? I wasn't sure if we were looking at potentially a durability issue? Or even really why you're still seeing some 404's on objects you don't think should have ever been deleted? | 19:55 |
*** tonanhngo has joined #openstack-swift | 19:56 | |
*** alanvitor has joined #openstack-swift | 19:59 | |
gsmethells | clayg - it sounds like you configured the storage nodes to basically search every possible location (essentially look on all disks). That is clearly non-optimal though it can give us 200s instead of 404s. I'm happy we achieved some level of correctness, but now I wonder how we achieve optimality again. Why are the files stuck in odd handoff locations and not in the primary locations? Is there a way to track that down now? | 19:59 |
*** tonanhngo has quit IRC | 20:00 | |
gsmethells | ghebda - have you guys run this? https://gist.github.com/clayg/4261e7dc654cc2c80a529b741a7cdd5f | 20:00 |
tdasilva | maybe it has to do with that very long replication time mentioned earlier? was it 4 days? | 20:01 |
clayg | yeah, having the parts on the wrong nodes is an availability issue - request_node_count was an experiment | 20:01 |
gsmethells | yeah, I figured. Are we now at the hard part? | 20:01 |
ghebda | we have not run that yet | 20:02 |
gsmethells | There must be a way to get Swift to usher these "orphaned" handoffs to their final (primary) location? And then determine how to prevent it in the future. | 20:02 |
clayg | I think the hard part is not freaking out when you think you may have lost data - so I say we're past the hard part | 20:02 |
gsmethells | Oh, goodie. :) | 20:03 |
clayg | this is just optimization and twiddling stuff - we can do that with beers | 20:03 |
clayg | there's lots of strategies you can employ to deal with mis-placed parts | 20:04 |
clayg | recognizing the issue (and monitoring for it) is the biggest first step | 20:04 |
clayg | and sort of pre-requisite | 20:04 |
*** itlinux has quit IRC | 20:04 | |
clayg | since doing something... to "hurry things along" requires knowing when you can stop doing that - getting visibility is required | 20:04 |
clayg | probably you should just run the classify stuff - or some kind of part counting sort of collection script - you can graph it or stick it in a database or run it adhoc on demand (w/e) | 20:05 |
clayg | ... then probably just increase replicator concurrency, turn on handoffs_first/only and maybe set handoff_delete to ... 2 or 1 - probably 1 is fine | 20:06 |
clayg | you might need to check your rsync max_connections configuration or something too... | 20:06 |
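A simplified model of the `handoff_delete` knob mentioned above, under my reading of the object replicator's behavior: with the default `auto`, a handoff partition is only removed once it has synced to every primary; with an integer n, n successful syncs are enough. `should_delete_handoff` is a hypothetical function, not the replicator's code; check `object-server.conf-sample` for your release before relying on it.

```python
def should_delete_handoff(sync_results, primary_count, handoff_delete=0):
    """Decide whether a replicator may remove a handoff partition after pushing it.

    sync_results: one boolean per primary node the handoff tried to sync to.
    handoff_delete: 0 stands for 'auto' (require every primary to succeed);
    a positive integer n means n successful syncs are enough.
    """
    if handoff_delete:
        return sum(1 for ok in sync_results if ok) >= handoff_delete
    return len(sync_results) == primary_count and all(sync_results)


# One primary unreachable (e.g. over a slow or failing link):
results = [True, False, True]
print(should_delete_handoff(results, 3))                    # False
print(should_delete_handoff(results, 3, handoff_delete=1))  # True
```

A lower value frees handoff space sooner, at the cost of deleting the local copy before every primary has confirmed it.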
openstackgerrit | Thiago da Silva proposed openstack/swift master: Bind SAIO services on different loopback addresses https://review.openstack.org/475202 | 20:08 |
gsmethells | I like the suggestions but, what's step 1 here? | 20:08 |
*** tonanhngo has joined #openstack-swift | 20:08 | |
gsmethells | I'm thinking we ought to find a methodology for moving the parts along to their primary locations instead of their current handoff locations. How do we do that? | 20:09 |
clayg | if you had 100% dispersion population - you would normally be using swift-dispersion-report to track rebalances - or at least that's how it's been "classically" - although I prefer part counting/classification these days... | 20:09 |
gsmethells | Is that the "classify script"? | 20:09 |
clayg | yeah I think visibility is step 0 - otherwise you can't really tell what's working | 20:11 |
clayg | and since data is kinda heavy you have to have some patience - you might start something off and not know for hours (days?) if it's going to be the solution | 20:12 |
*** tonanhngo has quit IRC | 20:13 | |
*** tonanhngo has joined #openstack-swift | 20:14 | |
*** spetersen has joined #openstack-swift | 20:19 | |
*** tonanhngo has quit IRC | 20:19 | |
ghebda | hey clay, thanks. i'm actually jumping out now and scott(spetersen) is taking over. | 20:20 |
*** ghebda has quit IRC | 20:20 | |
*** ukaynar_ has joined #openstack-swift | 20:20 | |
spetersen | hi clay | 20:20 |
*** tonanhngo has joined #openstack-swift | 20:21 | |
clayg | heh, hi guys | 20:21 |
spetersen | Thanks for helping greg | 20:21 |
spetersen | We changed the number to 1000 and we found the data way down the list but we did find 3 copies. | 20:22 |
clayg | yeah it's great when it's not a durability issue *phew* | 20:22 |
spetersen | you are not kidding. | 20:23 |
spetersen | greg said the default was 6 ? | 20:23 |
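On the question of the default: as far as I know, the proxy's `request_node_count` defaults to `2 * replicas`, which on a 3-replica ring works out to 6 nodes (3 primaries plus 3 handoffs). Verify against `proxy-server.conf-sample` for your release. A minimal sketch of how the two accepted forms resolve, with `parse_request_node_count` as a hypothetical helper:

```python
def parse_request_node_count(value, replicas):
    """Resolve a request_node_count setting to a concrete number of nodes.

    Accepts either a plain integer ('1000') or the '<N> * replicas' form.
    Illustrative only; not the proxy's actual parsing code.
    """
    parts = str(value).lower().split()
    if len(parts) == 1:
        return int(parts[0])
    if len(parts) == 3 and parts[1] == '*' and parts[2] == 'replicas':
        return int(parts[0]) * replicas
    raise ValueError('Invalid request_node_count value: %r' % (value,))


print(parse_request_node_count('2 * replicas', 3))  # 6
print(parse_request_node_count('1000', 3))          # 1000
```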
clayg | so... rolling into the next thing - you need to do something to monitor replication/rebalance - you let it get way out of hand and stuff got messed way up without you knowing - not good | 20:23 |
spetersen | right | 20:24 |
*** ukaynar has quit IRC | 20:24 | |
clayg | as an open source project swift has as many different monitoring stacks as it does deployments - IMHO, best of breed these days is part counting techniques - you listdir in /srv/node/dXXX/objects every so often and then check which of those part numbers "belong" on this node (according to the current ring version) and which ones don't | 20:25 |
clayg | you don't need to track the specific parts as much as the counts - 10K primaries 1K handoffs | 20:25 |
*** tonanhngo has quit IRC | 20:25 | |
clayg | if handoffs is big - it's not great - if it's not going down it's sorta not good - if it's growing it's bad | 20:26 |
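The part-counting approach described above, as a minimal Python sketch. It assumes you already have the set of partition numbers the current ring assigns to the device (deriving that from the ring is not shown; a script like the classify gist linked earlier is one way). `classify_part_names`, `count_parts` and `handoff_trend` are hypothetical names, and the trend check only compares the first and last samples.

```python
import os


def classify_part_names(names, primary_parts):
    """Split partition directory names into (primaries, handoffs) counts.

    primary_parts: set of partition numbers the current ring assigns to this
    device (assumed to be computed elsewhere). Non-numeric entries are skipped.
    """
    primaries = handoffs = 0
    for name in names:
        if not name.isdigit():
            continue
        if int(name) in primary_parts:
            primaries += 1
        else:
            handoffs += 1
    return primaries, handoffs


def count_parts(objects_dir, primary_parts):
    """Apply classify_part_names to a directory such as /srv/node/<device>/objects."""
    return classify_part_names(os.listdir(objects_dir), primary_parts)


def handoff_trend(samples):
    """Rough reading of handoff counts collected over time (oldest first)."""
    if len(samples) < 2:
        return 'unknown'
    if samples[-1] > samples[0]:
        return 'growing: bad'
    if samples[-1] < samples[0]:
        return 'shrinking'
    return 'not going down: not great'


print(classify_part_names(['12', '40', '977', 'lock'], {12, 40}))  # (2, 1)
print(handoff_trend([1000, 1200, 1500]))                           # growing: bad
```

Collecting `count_parts` per device on a schedule and graphing the handoff column gives the "is it going down?" signal without tracking individual partitions.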
clayg | is this a geo cluster? | 20:26 |
clayg | multi-region? | 20:26 |
clayg | write affinity? | 20:26 |
spetersen | we moved one of the 3 storage nodes to our main office, 100MB line | 20:26 |
*** tonanhngo has joined #openstack-swift | 20:27 | |
spetersen | since then we have not performed a successful replication | 20:27 |
spetersen | 5 days worth | 20:27 |
clayg | cool! | 20:27 |
spetersen | we have r1z1n1 r1z2n1 and r1z3n1 | 20:28 |
spetersen | we moved r1z2n1 here | 20:28 |
spetersen | We did have a 10GB backbone on the cluster but we reduced that connection by 100 times by moving a node here. | 20:28 |
spetersen | I believe that to be true, I may be wrong. | 20:29 |
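The 100x figure checks out if both numbers are bit rates. A back-of-the-envelope sketch of what that means for replication, assuming the units noted in the comments and a purely hypothetical 1 TB of data to move; real rsync throughput will be no better than these ideal figures.

```python
def transfer_hours(data_bytes, link_bits_per_sec):
    """Ideal transfer time in hours, ignoring protocol overhead and competing traffic."""
    return data_bytes * 8 / link_bits_per_sec / 3600


BACKBONE_BPS = 10e9  # assumption: "10GB backbone" means 10 Gbit/s
OFFICE_BPS = 100e6   # assumption: "100MB line" means 100 Mbit/s

print(BACKBONE_BPS / OFFICE_BPS)                     # 100.0
print(round(transfer_hours(1e12, OFFICE_BPS), 1))    # 22.2 (1 TB over 100 Mbit/s)
print(round(transfer_hours(1e12, BACKBONE_BPS), 2))  # 0.22 (same 1 TB over 10 Gbit/s)
```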
clayg | which location do you ingest to? both? are you using read/write affinity? | 20:29 |
spetersen | a 3 node cluster with 34 bays each. | 20:29 |
spetersen | I wanted to enable that on day one before moving the node here but was rushed by management. | 20:30 |
spetersen | I just enabled read / write affinity on the proxy at our colo. | 20:30 |
spetersen | read_affinity = r1z1=100, r1z3=100, r1z2=200 | 20:31 |
*** itlinux has joined #openstack-swift | 20:31 | |
spetersen | write_affinity = r1z1, r1z3 | 20:31 |
spetersen | Is that right? | 20:31 |
*** tonanhngo has quit IRC | 20:32 | |
spetersen | r1z1n1 and r1z3n1 are also at the colo. | 20:32 |
clayg | maybe - write affinity requires a very high fidelity of careful replication monitoring - or maybe very predictable ingest patterns... | 20:32 |
clayg | read affinity is fine - but I'd recommend you drop write_affinity - even if you do get some kind of handle on replication monitoring... | 20:33 |
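In `read_affinity`, a lower number means a higher preference, so the line posted above asks the proxy to read from the colo zones (z1, z3) before the office zone (z2). A minimal sketch of that ordering, assuming this simplified reading of the option (region-only entries like `r1=100` are also accepted); `parse_read_affinity` and `sort_by_affinity` are hypothetical helpers, not the proxy's implementation.

```python
import re

_RULE = re.compile(r'r(\d+)(?:z(\d+))?\s*=\s*(\d+)')


def parse_read_affinity(value):
    """Parse e.g. 'r1z1=100, r1z2=200' into (region, zone or None, priority) rules."""
    rules = []
    for item in value.split(','):
        item = item.strip()
        if not item:
            continue
        match = _RULE.fullmatch(item)
        if match is None:
            raise ValueError('Invalid read_affinity entry: %r' % (item,))
        region, zone, priority = match.groups()
        rules.append((int(region), int(zone) if zone else None, int(priority)))
    return rules


def sort_by_affinity(nodes, rules):
    """Order nodes by their best (lowest) matching priority; unmatched nodes go last.

    Ties keep their input order because sorted() is stable.
    """
    def priority(node):
        matches = [prio for region, zone, prio in rules
                   if node['region'] == region
                   and (zone is None or node['zone'] == zone)]
        return min(matches) if matches else float('inf')
    return sorted(nodes, key=priority)


rules = parse_read_affinity('r1z1=100, r1z3=100, r1z2=200')
nodes = [{'name': 'r1z2n1', 'region': 1, 'zone': 2},
         {'name': 'r1z1n1', 'region': 1, 'zone': 1},
         {'name': 'r1z3n1', 'region': 1, 'zone': 3}]
print([n['name'] for n in sort_by_affinity(nodes, rules)])
# ['r1z1n1', 'r1z3n1', 'r1z2n1']
```

Read affinity only changes the order in which existing copies are tried, which is why it is the safer of the two settings; write affinity changes where new data lands and so depends on replication keeping up.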
*** tonanhngo has joined #openstack-swift | 20:33 | |
clayg | doesn't matter that much right at this moment tho... | 20:33 |
spetersen | ok, i commented out write_affinity | 20:33 |
spetersen | do you recommend moving r1z2n1 back to the colo ? | 20:34 |
clayg | well... that's probably fine - but it only affects new parts coming in | 20:34 |
clayg | there's a lot of variables to consider there - i'm sorry I don't have a simple answer for you - best I can do is try to point you at information available online how swift works and let you learn and use your best judgement | 20:34 |
spetersen | cool | 20:35 |
clayg | I don't mind pointing you at a protip now and again - esp. if there's a hair on fire situation :D | 20:35 |
spetersen | Thanks! | 20:36 |
clayg | read over the classify script - test it on a development/lab environment - then check it out on one of the prod nodes - try to build up a mental model for what's going on - let me know if you have any questions or want to confirm/correct any understanding | 20:36 |
spetersen | sure thing! | 20:36 |
spetersen | thanks! | 20:36 |
clayg | np, gl | 20:37 |
*** tonanhngo has quit IRC | 20:38 | |
*** tonanhngo has joined #openstack-swift | 20:39 | |
clayg | i'm looking at https://gist.github.com/clayg/4261e7dc654cc2c80a529b741a7cdd5f and thinking it'd be nice if line 80 kept a count of the primary parts ondisk too - i only recently started making a distinction between handoffs and misplaced parts - but it's particularly relevant in geo clusters where handoffs are normal/expected with write_affinity - but misplaced parts after a rebalance are just never good unless they're | 20:41 |
clayg | going down. | 20:41 |
*** tonanhngo has quit IRC | 20:43 | |
*** ukaynar has joined #openstack-swift | 20:44 | |
*** ukaynar_ has quit IRC | 20:47 | |
spetersen | Thanks clay. | 20:47 |
*** lucasxu has quit IRC | 20:48 | |
*** vinsh has quit IRC | 20:49 | |
*** spetersen has quit IRC | 20:51 | |
*** tonanhngo has joined #openstack-swift | 20:52 | |
*** MVenesio has quit IRC | 20:53 | |
*** tonanhngo has quit IRC | 20:56 | |
*** tonanhngo has joined #openstack-swift | 20:58 | |
tdasilva | timburke: around? | 20:59 |
*** cschwede has quit IRC | 21:01 | |
*** tonanhngo has quit IRC | 21:02 | |
*** itlinux has quit IRC | 21:10 | |
*** tonanhngo has joined #openstack-swift | 21:10 | |
*** tonanhngo has quit IRC | 21:18 | |
clayg | tdasilva: he's on PTO this week | 21:26 |
tdasilva | clayg: ah cool! good for him. just left him a comment on patch 459023 | 21:27 |
patchbot | https://review.openstack.org/#/c/459023/ - liberasurecode - Consistently use zlib for crc32 | 21:27 |
clayg | yeah i wish I could tell you i know what's going on in that patch - I don't - kota_ ??? | 21:28 |
*** mmmucky_ is now known as mmmucky | 21:31 | |
*** tonanhngo has joined #openstack-swift | 21:32 | |
*** itlinux has joined #openstack-swift | 21:32 | |
*** joeljwright has quit IRC | 21:34 | |
*** tonanhngo has quit IRC | 21:37 | |
*** tonanhngo has joined #openstack-swift | 21:38 | |
*** tonanhngo has quit IRC | 21:43 | |
*** tonanhngo has joined #openstack-swift | 21:44 | |
*** ukaynar has quit IRC | 21:45 | |
*** tonanhngo has quit IRC | 21:50 | |
*** tonanhngo has joined #openstack-swift | 21:51 | |
*** alanvitor has quit IRC | 21:53 | |
*** tonanhngo has quit IRC | 21:55 | |
*** tonanhngo has joined #openstack-swift | 21:57 | |
*** gsmethells_ has joined #openstack-swift | 22:01 | |
*** tonanhngo has quit IRC | 22:02 | |
*** tonanhngo has joined #openstack-swift | 22:03 | |
*** gsmethells has quit IRC | 22:04 | |
*** gsmethells_ has quit IRC | 22:04 | |
*** tonanhngo has quit IRC | 22:07 | |
*** tonanhngo has joined #openstack-swift | 22:09 | |
*** tonanhngo has quit IRC | 22:14 | |
*** tonanhngo has joined #openstack-swift | 22:15 | |
*** tonanhngo has quit IRC | 22:19 | |
*** tonanhngo has joined #openstack-swift | 22:21 | |
*** tonanhngo has quit IRC | 22:26 | |
*** klamath has quit IRC | 22:32 | |
*** klamath has joined #openstack-swift | 22:33 | |
*** klamath_ has joined #openstack-swift | 22:35 | |
*** klamath has quit IRC | 22:35 | |
*** tonanhngo has joined #openstack-swift | 22:43 | |
*** itlinux has quit IRC | 22:46 | |
*** tonanhngo has quit IRC | 22:47 | |
*** tonanhngo has joined #openstack-swift | 22:49 | |
*** tonanhngo has quit IRC | 22:53 | |
*** tonanhngo has joined #openstack-swift | 22:55 | |
*** tonanhngo has quit IRC | 23:00 | |
*** tonanhngo has joined #openstack-swift | 23:01 | |
*** tonanhngo has quit IRC | 23:06 | |
*** hoonetorg has quit IRC | 23:09 | |
*** tonanhngo has joined #openstack-swift | 23:23 | |
*** tonanhngo has quit IRC | 23:24 | |
*** tonanhngo has joined #openstack-swift | 23:24 | |
*** hoonetorg has joined #openstack-swift | 23:26 | |
*** chsc has quit IRC | 23:31 | |
*** klamath_ has quit IRC | 23:38 |