Friday, 2015-06-05

*** annegentle has joined #openstack-swift00:09
*** zhill_ has joined #openstack-swift00:11
*** zhill_ has quit IRC00:11
*** annegentle has quit IRC00:14
hogood morning!00:18
notmynamegood morning ho00:18
honotmyname: hello00:19
*** remix_tj has quit IRC00:21
*** annegentle has joined #openstack-swift00:22
hoacoles: "why did 'w' get missed out :)" <=== (^-^;)00:35
hoacoles: (^-^;) means I'm embarrassed and breaking out in a cold sweat, in Japan :-)00:43
mattoliverauho: morning00:46
homattoliverau: morning!00:49
*** blmartin has joined #openstack-swift00:51
*** blmartin_ has joined #openstack-swift00:51
*** chlong has quit IRC00:52
*** chlong has joined #openstack-swift00:54
*** lpabon has joined #openstack-swift01:03
*** lpabon has quit IRC01:03
*** annegentle has quit IRC01:06
*** kota_ has joined #openstack-swift01:13
*** ChanServ sets mode: +v kota_01:13
kota_good morning01:13
kota_notmyname: hi :)01:18
*** hugespoon has left #openstack-swift01:49
*** blmartin has quit IRC01:56
*** blmartin_ has quit IRC01:56
*** jkugel has joined #openstack-swift02:12
hohello clayg!03:06
*** zul has quit IRC03:08
*** asettle is now known as asettle-afk03:10
*** zul has joined #openstack-swift03:20
*** asettle-afk is now known as asettle03:48
*** kota_ has quit IRC04:12
*** jamielennox is now known as jamielennox|away04:32
hopatch #187489 got a liberasurecode error in the gate:
holiberasurecode_backend_open: dynamic linking error cannot open shared object file: No such file or directory04:47
howere there any env changes in the gate?04:50
*** ahonda has quit IRC04:58
swifterdarrellho: is jerasure installed too?04:58
*** ppai has joined #openstack-swift05:00
hoswifterdarrell: i'm not sure. do you know how to check it in the gate?05:04
swifterdarrellho: no idea05:04
swifterdarrellho: ask the infra guys?05:04
hoswifterdarrell: do you know a gu05:05
swifterdarrellho: but based on the error, I'd first make sure libjerasure is actually installed05:05
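One quick way to follow swifterdarrell's suggestion from a Python shell is to ask the dynamic linker whether the shared libraries resolve at all; a sketch using only the stdlib (liberasurecode dlopen()s the jerasure backend at runtime, so both must be resolvable):

```python
# Check whether the erasure-code shared libraries resolve on this machine.
# find_library() consults ldconfig/gcc on Linux, roughly what the dynamic
# linker does when liberasurecode tries to open its jerasure backend.
from ctypes.util import find_library

results = {name: find_library(name) for name in ("erasurecode", "Jerasure")}
for name, path in results.items():
    print(name, "->", path or "NOT FOUND")
```

If either prints NOT FOUND, the "cannot open shared object file" error in the gate is expected.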
hosomething is wrong with my keyboard...05:05
swifterdarrellho: I think monty is one of them?  not sure the full set of infra folks05:06
swifterdarrellho: try #openstack-infra channel?05:06
hoswifterdarrell: thanks! I will ask this to infra guys05:06
swifterdarrellho: np, good luck!05:07
swifterdarrellportante: good news! I'm clearly seeing that one bad disk per node totally fucks object-server, even with workers = 90 (3x disks).  Details to come, but it's night and day05:11
swifterdarrellportante: now testing with 1 bad disk in one of the storage nodes, then I'll do a control group with no bad disks... just normal unfettered swift-object-auditor05:12
portanteso then you'll do the servers-per-port runs with that05:12
swifterdarrellportante: sneak peek
swifterdarrellportante: ya05:12
* portante looks05:12
swifterdarrellportante: that graph has 4 runs of servers_per_disk=3 w/one hammered disk per storage node, then 4 runs of workers=90, also w/one hammered disk per node05:13
swifterdarrellportante: far right is the first run w/workers=90 with only one hammered drive in one of two storage nodes05:13
portantewhich means the object servers are not all engaged handling requests, a few of them are getting the requests; a top on the system would probably bear that out05:14
swifterdarrellportante: I have a threaded python script that uses the directio python module to issue random reads directly (O_DIRECT) to the raw block device of one disk; it keeps the I/O queue full at 128 with await times between 500 and 700+ ms05:15
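A rough stdlib-only sketch of the disk "hammer" swifterdarrell describes (his version uses the directio module; here os.preadv plus an anonymous mmap gives the page-aligned buffer O_DIRECT requires). The device path, I/O size, and thread count are hypothetical:

```python
# Sketch: issue random O_DIRECT reads against a raw block device to
# saturate its I/O queue. O_DIRECT needs sector-aligned offsets/sizes
# and a suitably aligned buffer (anonymous mmap is page-aligned).
import mmap
import os
import random
import threading

BLOCK = 512  # sector alignment required by O_DIRECT

def aligned_offsets(disk_size, io_size, count, block=BLOCK):
    """Random read offsets aligned to `block`, leaving room for a full read."""
    top = (disk_size - io_size) // block
    return [random.randrange(0, top + 1) * block for _ in range(count)]

def hammer(dev="/dev/sdX", io_size=64 * 1024, reads=10000):
    # dev is hypothetical -- point this at a disk you can afford to abuse
    fd = os.open(dev, os.O_RDONLY | os.O_DIRECT)
    buf = mmap.mmap(-1, io_size)  # page-aligned scratch buffer
    try:
        size = os.lseek(fd, 0, os.SEEK_END)
        for off in aligned_offsets(size, io_size, reads):
            os.preadv(fd, [buf], off)  # pread-style: threads stay independent
    finally:
        os.close(fd)

# Usage (NOT run here): spawn several threads of hammer("/dev/sdX"), e.g.
#   for _ in range(8): threading.Thread(target=hammer).start()
```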
portantewhere in that graph is the servers-per-port stuff05:15
swifterdarrellportante: ya, the longer blocking I/Os to the bad disk should interrupt servicing of I/O to other non-bad disks by that same, unlucky, swift-object-server05:15
swifterdarrellportante: servers_per_port=3 are the left 4 runs05:16
swifterdarrellportante: (the ones that don't look like shit)05:16
*** SkyRocknRoll has joined #openstack-swift05:17
swifterdarrellportante: this is ssbench-master run-scenario -U bench2 -S -f Real_base2_simplified.scenario -u 120 -r 120005:17
swifterdarrellportante: 120 concurrency05:17
swifterdarrellportante: I simplified the realistic scenario to reduce noise; cutting down concurrency & whatnot for all background daemons other than swift-object-auditor also reduced noise05:17
portanteso in that graph, left y-axis is min avg max response time, and right is request per sec05:18
swifterdarrellportante: so I'm seeing even one hammered disk screws things up... this is 1 hammered disk out of ~60 total obj disks in cluster05:18
swifterdarrellportante: yup05:19
swifterdarrellportante: so lower black line is bad, and higher red line is bad05:19
openstackgerritKota Tsuyuzaki proposed openstack/swift: Fix the missing SLO state on fast-post
swifterdarrellportante: I'll have the actual numbers later (i have all the raw data for these runs)05:19
portantewow, so my guess is that if you were to lower the maxclients you would see less of that effect with 90 workers05:19
portanteand the flat lines just mean test wasn't running05:20
swifterdarrellportante: flat line was dinner + True Detective :)05:20
swifterdarrellportante: so you think lowering the default 1024 max clients will improve those runs on the right? the workers=90?05:21
swifterdarrellportante: maybe I'll try that in the morning (still w/one bad disk); that'll be cheap & easy to test05:21
portanteit should because it will allow more of the workers to participate fully05:21
portanteeventlet accept greenlet is greedy05:22
swifterdarrellportante: i'm not convinced that starvation's happening, necessarily... I think the problem is that any object-server (on the affected node) can get fucked by that bad disk05:22
portanteit'll just keep posting accepts and gobbling up, as long as that process gets on the run queue05:22
swifterdarrellportante: but I'm interested in the experiment :)05:22
portantecertainly, but the more greedy the object server the larger the effect05:22
portanteservers-per-port is by nature not greedy05:23
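The servers-per-port approach under discussion relies on several server processes accepting on the same per-disk port, so the kernel distributes connections among them instead of one greedy accept loop taking everything. A minimal sketch of the underlying SO_REUSEPORT mechanism, assuming Linux 3.9+ (an illustration only, not Swift's actual wsgi code):

```python
# Two listeners can bind the SAME TCP port if both set SO_REUSEPORT before
# bind(); the kernel then load-balances incoming connections between them.
import socket

def reuseport_socket(port):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(("127.0.0.1", port))
    s.listen(128)
    return s

a = reuseport_socket(0)          # kernel picks a free port
port = a.getsockname()[1]
b = reuseport_socket(port)       # second listener on the same port succeeds
print(a.getsockname()[1], b.getsockname()[1])
a.close(); b.close()
```

With one such listener per disk and servers_per_port processes each, a request blocked on a slow disk only ties up workers dedicated to that disk.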
swifterdarrellportante: I see CPU consumption of swift-object-server being uneven, but how can I tell there aren't outstanding I/Os blocked for the others?  i.e. lack of CPU consumption doesn't mean reqs aren't being processed05:23
swifterdarrellportante: makes sense; what value would you suggest?05:24
portanteI would drop it down to 5 or something05:24
swifterdarrellportante: k05:24
portantemaybe even 1 if 90 workers covers all your requests05:25
swifterdarrellportante: let's see... 120 clients w/some PUT amplification is up to ~360 concurrent reqs, divided by 180 total workers betw 2 servers, is 205:25
swifterdarrellportante: so I'll try it w/205:25
portantesounds reasonable05:26
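The back-of-envelope above, as a quick sketch (numbers taken from the chat; the 3x PUT amplification is the replica fan-out assumption):

```python
# Pick max_clients so expected concurrent requests spread across workers.
clients = 120            # ssbench concurrency (-u 120)
put_amplification = 3    # each client PUT fans out to ~3 replica requests
workers_per_node = 90
nodes = 2

concurrent_reqs = clients * put_amplification    # ~360
total_workers = workers_per_node * nodes         # 180
max_clients = concurrent_reqs // total_workers   # 2
print(max_clients)
```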
portanteso what percentage of the outstanding requests would be for the bad disk in this controlled experiment?05:26
swifterdarrellportante: thanks for the idea!  I won't get a chance to run it 'til tomorrow morning05:27
swifterdarrellportante: heading to bed soon05:27
portanteI should be myself05:27
swifterdarrellportante: all subject to Swift's dispersion (md5)05:27
swifterdarrellportante: 1 bad disk (latest runs) out of 6005:27
portante60 across all servers, right?05:28
swifterdarrellportante: ya05:28
swifterdarrellportante: like 29 and 32 i think05:28
swifterdarrellportante: so actually 61, but whatever05:28
portanteso my guess is that what you'll see is that servers-per-disk still works, maybe not too badly, but better than normal by a lot05:28
swifterdarrellP that a GET hits it is... 1 / (61 / 3) ~= 5%05:29
portantebut that is just a guess, it depends on how well the requests get spread out05:29
*** Triveni has joined #openstack-swift05:29
swifterdarrellP that a PUT hits it is... same, I guess05:29
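swifterdarrell's estimate written out (3 replicas dispersed over 61 disks, one of them slow):

```python
# Chance a given GET (or PUT) touches the one slow disk when its 3 replica
# locations are spread over ~61 disks: 1 / (61 / 3) == 3 / 61.
replicas = 3
disks = 61
p = replicas / disks
print(f"{p:.1%}")   # ~4.9%, i.e. roughly 5%
```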
portantethis is good stuff05:30
swifterdarrellportante: I'll ping you tomorrow when I have something05:30
swifterdarrellportante: g'night!05:30
*** mitz has quit IRC05:41
hoclayg, notmyname, torgomatic: macque's info:
cschwedeGood Morning!06:23
*** jistr has joined #openstack-swift06:57
hocschwede: Morning!07:03
openstackgerritChristian Schwede proposed openstack/python-swiftclient: Add connection release test
*** remix_tj has joined #openstack-swift07:27
*** mmcardle has joined #openstack-swift07:42
*** jistr is now known as jistr|biab07:47
*** hseipp has joined #openstack-swift08:09
*** acoles_away is now known as acoles08:27
acolesho: i did put :) after my comment. just amused me that the values accidentally ended up using v,x,y,z :P08:29
*** chlong has quit IRC08:34
*** foexle has joined #openstack-swift08:35
hoacoles: i didn't notice it. it exposed my weaknesses in english :-)08:45
acolesho: did you get any idea why the unit tests failed with that problem? i see it on other jenkins jobs too08:46
hoacoles: I asked it to infra guys but i didn't get any response yet.08:48
hoacoles: FYI: 14:09:58 (ho) hello, patch #187489 got an error regarding liberasurecode in gate. I would like to know whether libJerasure is installed or not.
*** jistr|biab is now known as jistr08:50
acolesho: thanks08:51
hoacoles: you are welcome!08:51
acolesho: that was a good addition to the test btw08:52
hoacoles: thanks! :)08:54
*** jistr has quit IRC08:59
*** jistr has joined #openstack-swift09:19
cschwedeacoles: do you remember if we have an "official" abandon policy? I wanted to abandon a few stale patches (not mine).09:20
*** geaaru has joined #openstack-swift09:21
*** marzif_ has joined #openstack-swift09:21
acolescschwede: hi. iirc swift policy is that mattoliverau has a bot that finds old patches (4 weeks with -1 ??), mails the author to warn them of likely abandonment, then after 2 weeks he or notmyname abandons them.09:22
cschwedemorning :) yes, that's for swift - but what about swiftclient? i think we don't do it there09:24
cschwedeacoles: ^^09:24
acolescschwede: i guess an improvement might be to place a comment on the patch when warning is given, then any core could abandon after another 2 weeks.09:24
acolescschwede: hmm, you are most likely right, maybe mattoliverau only does it for swift09:24
cschwedeacoles: ^^ is a better view, all patches that got either a -1/-2 from a reviewer or jenkins09:27
*** SkyRocknRoll has quit IRC09:27
cschwedethat’s more than half of all swiftclient patches09:27
acolespatch 158701 i will ping author (hp)09:27
cschwedeheh, i was looking at that patch just a few minutes ago :)09:28
acolescschwede: so we don't abandon if WIP, i think09:32
acolescschwede: global requirements notmyname always -2's09:33
cschwedemakes sense09:34
*** aix has joined #openstack-swift09:34
acolespatch 148791 appears to be superseded by patch 18562909:36
acolescschwede: shall i abandon 148791?09:37
cschwedei was thinking the same09:37
acolesdoing it now09:37
cschwedei abandoned a patch earlier and left the following msg:09:37
cschwede"There has been no change to this patch in nearly a year, thus abandoning this patch to clear the list of incoming reviews.09:38
cschwedePlease feel free to restore this patch in case you are still working on it. Thank you!"09:38
cschwedeon patch 10980209:38
*** SkyRocknRoll has joined #openstack-swift09:40
acolescschwede: i will abandon patch 160169 because i think the bug is already fixed09:45
cschwedeyes, and there was also no response to your comment09:46
acolescschwede: does an 'abandon' add to our stackalytics scores :P09:48
cschwedei don’t think so?09:48
acolescschwede: that leaves patch 116065 that is old and not WIP09:51
*** shlee322 has joined #openstack-swift09:51
cschwedeyes, and i agree with Darrells comment on that patch. probably a candidate for abandoning too09:52
acolesgo for it09:53
acolesauthor can always restore if they disagree09:53
acolespatch 172791 is not quite so old so i left a comment asking if more work is planned. also, it has had no human review.09:58
*** dmorita has quit IRC09:58
acolescschwede: i have fresh coffee and croissant waiting for me. bbiab :)09:58
openstackgerritChristian Schwede proposed openstack/python-swiftclient: Add ability to download objects to particular folder.
cschwedeacoles: coffee sounds great - enjoy!09:59
*** Triveni has quit IRC10:13
*** shlee322 has quit IRC10:18
*** shlee322 has joined #openstack-swift10:32
*** ho has quit IRC10:41
*** wasmum has quit IRC10:55
*** marzif_ has quit IRC11:11
*** marzif_ has joined #openstack-swift11:12
*** marzif_ has quit IRC11:13
*** marzif_ has joined #openstack-swift11:13
*** wasmum has joined #openstack-swift11:19
*** blmartin_ has joined #openstack-swift11:33
*** jkugel has quit IRC11:34
*** shlee322 has quit IRC11:56
*** wbhuber has joined #openstack-swift11:58
*** geaaru has quit IRC12:08
*** geaaru has joined #openstack-swift12:09
*** km has quit IRC12:11
*** mmcardle has quit IRC12:13
openstackgerritAlistair Coles proposed openstack/swift: Make test_proxy work independent of evn vars
openstackgerritAlistair Coles proposed openstack/swift: Make test_proxy work independent of env vars
*** zul has quit IRC12:16
*** zul has joined #openstack-swift12:16
*** thurloat_isgone is now known as thurloat12:21
*** sc has quit IRC12:23
*** mmcardle has joined #openstack-swift12:25
*** MVenesio has joined #openstack-swift12:25
*** MVenesio has quit IRC12:25
*** sc has joined #openstack-swift12:26
*** wbhuber has quit IRC12:27
*** annegentle has joined #openstack-swift12:32
*** blmartin_ has quit IRC12:38
openstackgerritPrashanth Pai proposed openstack/swift: Make object creation more atomic in Linux
*** jkugel has joined #openstack-swift13:00
*** ppai has quit IRC13:01
*** jkugel1 has joined #openstack-swift13:02
*** jkugel has quit IRC13:05
*** kei_yama has quit IRC13:07
*** petertr7_away is now known as petertr713:07
*** SkyRocknRoll has quit IRC13:18
*** wbhuber has joined #openstack-swift13:19
*** blmartin has joined #openstack-swift13:23
*** acoles is now known as acoles_away13:37
tdasilvagood morning13:38
cschwedeHello Thiago!13:42
tdasilvacschwede: hi!13:43
*** acampbell has joined #openstack-swift13:43
*** acampbel11 has joined #openstack-swift13:44
*** jrichli has joined #openstack-swift14:06
*** esker has joined #openstack-swift14:09
*** esker has quit IRC14:14
*** esker has joined #openstack-swift14:15
*** acoles_away is now known as acoles14:15
*** foexle has quit IRC14:27
swifterdarrellportante: initial results for workers=90 + max_clients=2 do not look good14:29
*** thurloat is now known as thurloat_isgone14:30
*** acampbel11 has quit IRC14:38
*** thurloat_isgone is now known as thurloat14:41
swifterdarrellportante: ya, we'll see how consistent the 4 runs are, but the first run of workers=90 + max_clients=2 is worse than workers=90 + max_clients=102414:43
*** minwoob has joined #openstack-swift14:50
portanteswifterdarrell: bummer15:12
*** B4rker has joined #openstack-swift15:12
swifterdarrellportante: cluster might have just gotten cold overnight; the 2nd run of max_clients=2 is looking very similar to the max_clients=102415:13
swifterdarrellportante: PUTs:
swifterdarrellportante: GETs:
swifterdarrellportante: far right are the (still going) max_clients=2 runs15:14
swifterdarrellportante: they're identical to the middle runs:  1 bad disk out of 61; swift-object-auditor running full speed, workers=90; only difference is max_clients15:14
swifterdarrellportante: and servers_per_disk=3 with two bad disks (one per server) is better than all the workers=90 runs w/only 1 bad disk15:16
swifterdarrellportante: so I'm still pretty sure that the I/O isolation is really important15:16
swifterdarrellportante: (what threads-per-disk and servers_per_port really try to get at )15:16
swifterdarrellportante: funny story: one of our guys is onsite w/Intel doing EC benchmarking on a much larger cluster than I've got for my testing and their results were being interfered with by like 1 or two naughty disks15:18
*** gyee_ has joined #openstack-swift15:38
*** janonymous_ has joined #openstack-swift15:41
*** janonymous_ has quit IRC15:46
*** david-lyle has quit IRC15:46
*** david-lyle has joined #openstack-swift15:46
*** SkyRocknRoll has joined #openstack-swift15:50
portanteswifterdarrell: yes15:50
portante;) that is the reality, too often we work with "pristine" environments and expect that is what customers have15:51
*** shlee322 has joined #openstack-swift15:55
*** zaitcev has joined #openstack-swift15:57
*** ChanServ sets mode: +v zaitcev15:57
swifterdarrellportante: little more data for the max_clients=2:
swifterdarrellportante: you can see it's very ball-park with max_clients=102416:03
swifterdarrellportante: (tossing the first run as an outlier)16:03
*** jistr has quit IRC16:03
swifterdarrellportante: I'm going to halt the max_clients=2 runs and proceed with the rest of my targets16:04
*** B4rker has quit IRC16:07
*** B4rker has joined #openstack-swift16:10
*** B4rker has quit IRC16:14
*** B4rker has joined #openstack-swift16:16
*** jordanP has joined #openstack-swift16:29
*** Fin1te has joined #openstack-swift16:29
portanteswifterdarrell: so the timeline for the maxclients=1024 is 22:00 - 00:00, and maxclients=2 is 07:00 till end?16:38
*** breitz has quit IRC16:38
*** breitz has joined #openstack-swift16:39
portanteswifterdarrell: are there 4 phases to the test which might match those peaks and valleys?16:43
*** acoles is now known as acoles_away16:45
*** annegentle has quit IRC16:46
*** jordanP has quit IRC16:48
openstackgerritTim Burke proposed openstack/python-swiftclient: Add ability to download objects to particular folder.
*** mmcardle has quit IRC16:53
*** annegentle has joined #openstack-swift16:54
notmynamegood morning16:54
*** breitz has quit IRC16:57
*** SkyRocknRoll has quit IRC16:59
*** annegentle has quit IRC17:08
*** Fin1te has quit IRC17:10
*** SkyRocknRoll has joined #openstack-swift17:11
*** dimasot has joined #openstack-swift17:13
*** zhill_ has joined #openstack-swift17:15
*** geaaru has quit IRC17:18
pelusejrichli, you there17:27
*** harlowja has quit IRC17:27
*** SkyRocknRoll has quit IRC17:32
*** harlowja has joined #openstack-swift17:32
*** lastops has joined #openstack-swift17:33
*** SkyRocknRoll has joined #openstack-swift17:45
*** marzif_ has quit IRC17:45
*** B4rker has quit IRC17:46
*** annegentle has joined #openstack-swift17:50
*** marzif_ has joined #openstack-swift17:51
*** B4rker has joined #openstack-swift17:53
*** hseipp has left #openstack-swift17:53
*** B4rker has quit IRC17:58
*** B4rker has joined #openstack-swift17:59
jrichlipeluse: I am back.  what can I do for you?18:01
*** proteusguy has joined #openstack-swift18:04
*** Fin1te has joined #openstack-swift18:33
*** harlowja has quit IRC18:46
*** proteusguy has quit IRC18:47
*** gyee_ has quit IRC18:47
*** themadcanudist has joined #openstack-swift18:48
tdasilvaso i probably missed this conversation, but what's the status with the py27 tests failing?18:49
themadcanudisthey guys, i ran a $container->deleteAllObjects(); from the php opencloud SDK and the command translated to "DELETE /v1/AUTH_$tenant_id%3Fbulk-delete%3D1"18:50
themadcanudistwhich ended up deleting the account!! Now I can't do anything.. i get a 403 "recently deleted" message18:50
themadcanudistis this supposed to be possible?18:50
*** harlowja has joined #openstack-swift18:53
*** serverascode has quit IRC18:54
*** briancurtin has quit IRC18:54
*** zhiyan has quit IRC18:54
*** nottrobin has quit IRC18:54
*** odsail has joined #openstack-swift18:55
*** lastops has quit IRC19:00
redbothemadcanudist: what version of swift are you running?  It's possible that could happen if you have an old version of swift and you don't have the bulk delete middleware running and you have the proxy set up to do account management.19:02
redboand you're using a client that doesn't use POST for bulk deletes19:04
portantePOST for bulk deletes, don't tell the REST police!19:04
swifterdarrellportante: probably?  each "run" was consecutive 20-min ssbench runs w/like 120s sleep in between.  So there should have been 4 humps from max_clients=1024 and 2 or 3 from the in-progress max_clients=2 at the time I took those screenshots19:04
portanteswifterdarrell: k thanks19:05
*** lastops has joined #openstack-swift19:10
*** annegentle has quit IRC19:14
*** SkyRocknRoll has quit IRC19:14
*** lastops has quit IRC19:15
*** odsail has quit IRC19:18
*** lastops has joined #openstack-swift19:22
*** lastops has quit IRC19:25
themadcanudistredbo: That's likely it19:26
themadcanudistcan I disable that functionality!?19:26
themadcanudistit's *super-dangerous*19:26
*** acampbell has quit IRC19:26
themadcanudistI just blew away my test account by running a bulkdelete php api call *on a container*19:26
themadcanudistnot swift's fault, except for the fact that it allows that behaviour19:27
*** Fin1te has quit IRC19:29
openstackgerritMinwoo Bae proposed openstack/swift: The hash_cleanup_listdir function should only be called when necessary.
*** silor has joined #openstack-swift19:31
*** silor has quit IRC19:37
*** B4rker has quit IRC19:40
*** ptb has joined #openstack-swift19:42
ptbIn a multi-region cluster (4 replica - 2/2) when getting a 404 on head it will go over the wire to check the 2 remote disks and 2 secondary locations.  Ideally, when read affinity is enabled would it make sense to have an additional setting to only check local region for an object - if enabled?19:48
notmynameptb: even knowing that setting will result in false 404s to the client?19:51
*** annegentle has joined #openstack-swift19:54
ptbagreed, what I am trying to solve for is the extra latency over the wire to the remote region19:54
ptbb4 returning a 404 that is.  8)19:56
notmynameptb: yeah, that makes sense. just wanted to know if you still wanted the config option even knowing the tradeoffs19:59
notmynameptb: so it sounds like you're ok with a 404 for data that has been stored in the system but isn't in the current region20:00
notmynameis that true?20:00
ptbExactly! There is an application pattern in use that always checks b4 writing files.20:00
ptbIn a multi-region replication scenario having such a setting would save a great deal of over the wire traffic to the remote regions with a HEAD 40420:02
notmynameyeah, I understand how it's good for speeding up 404s20:03
notmynamebut if you have an issue in one region and those 2 replicas aren't available (but they are available in the other region), then you'll ask for it and get a 404 to the client20:03
*** annegentle has quit IRC20:04
notmynameI'm thinking of data that is actually already in the cluster20:04
themadcanudistredbo: Is there a way to disable the functionality that allows for older swift servers to have the account sent a DELETE?20:04
*** annegentle has joined #openstack-swift20:04
ptbYes...this is why I think it should be a setting.  Or in a multi-region >2 you could even have a single fallback region.20:04
*** petertr7 is now known as petertr7_away20:06
ptbI have regions A,B,C,D and perhaps could set a fallback region to check if the local lookup fails - as opposed to the current (albeit correct) behavior of checking all locations?20:07
notmynamea specific fallback is already accounted for with the read_affinity setting (you can prefer one region or zone over another)20:08
notmynamebut yeah, you're asking for a config option to return the wrong answer really fast (in some casese)20:09
notmynamegranted, the common case would just make 404s faster20:09
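For reference, the read-affinity knobs notmyname mentions look roughly like this in proxy-server.conf (the region name r1 is hypothetical). Note this only re-orders where the proxy looks first; on a miss it still falls through to the remote region before returning a 404, which is exactly the behavior ptb wants an option to skip:

```ini
[app:proxy-server]
use = egg:swift#proxy
sorting_method = affinity
# prefer region 1 for reads; lower number = higher priority
read_affinity = r1=100
```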
ptbYes.  Ideally that is the problem to address - returning 404s faster in a multi-region cluster during a HEAD operation20:10
ptbAs we add regions, customers are expecting the same response times...trying to see if there are options to help meet their expectations.  8)20:11
ptbWhich also sends another +1 for your async container fix!20:12
notmynamehave you considered not doing the initial HEAD and instead doing a PUT with the "If-None-Match: *" header?20:13
* notmyname needs to get back to that patch soon20:13
ptbI hadn't...will experiment.20:13
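A sketch of notmyname's suggestion: skip the HEAD-then-PUT round trip and let the cluster reject the write when the object already exists (the proxy answers 412 Precondition Failed to a PUT carrying "If-None-Match: *" if the object is there). Host, token, and path below are hypothetical:

```python
# Conditional PUT: create the object only if nothing is stored there yet.
import http.client

def put_if_absent(host, port, path, body, token=None):
    """Returns (created, status): (True, 201) on create,
    (False, 412) if the object already existed."""
    headers = {"If-None-Match": "*"}
    if token:
        headers["X-Auth-Token"] = token
    conn = http.client.HTTPConnection(host, port)
    try:
        conn.request("PUT", path, body=body, headers=headers)
        resp = conn.getresponse()
        resp.read()
        return resp.status == 201, resp.status
    finally:
        conn.close()

# e.g. put_if_absent("proxy.example.com", 8080,
#                    "/v1/AUTH_test/cont/obj", b"data", token="tk...")
```

This trades the extra cross-region HEAD for a single write request that is safe against overwrites.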
*** charlesw has joined #openstack-swift20:14
redbothemadcanudist: if allow_account_management is set to "yes" in the proxy configs, that should be removed/set to no.20:14
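A sketch of the proxy-server.conf fix redbo describes (pipeline abbreviated; keep your other middlewares). With account management off, a DELETE on the account itself is refused, and the bulk middleware is what turns ?bulk-delete requests into individual object deletes instead of letting them fall through:

```ini
[app:proxy-server]
use = egg:swift#proxy
# refuse PUT/DELETE on accounts themselves
allow_account_management = no

[pipeline:main]
pipeline = catch_errors bulk proxy-server

[filter:bulk]
use = egg:swift#bulk
```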
themadcanudistlast question redbo… if I accidently deleted an account like that, can I udnelete it and will the objects and containers still exist?20:15
themadcanudistcuz righ tnow i'm getting a 403 "recently deleted"20:15
notmynamethemadcanudist: from the swift "ideas" page: "utility to "undelete" accounts, as described in"20:16
*** blmartin has quit IRC20:16
redbothemadcanudist: it depends.  if you're running the swift-account-reaper, that's what actually deletes things.  Oh yeah, just read that page.20:17
*** tellesnobrega_ has joined #openstack-swift20:17
notmynamethemadcanudist: unfortunately, the current way to do that is to twiddle the bit in the account db20:17
themadcanudistand mapping the account -> /srv/node/*/accounts/$HASH ?20:19
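Roughly, yes: the account hash follows Swift's hash_path (md5 of the prefix, "/<account>", and the hash_path_suffix from /etc/swift/swift.conf), and "twiddling the bit" means clearing the DELETED marker in the account_stat table of that sqlite DB. A hedged sketch (suffix value and paths are hypothetical; a real undelete may also need timestamps adjusted, and back up the .db first):

```python
# Locate an account DB on disk and clear its deleted status.
import hashlib
import sqlite3

def account_hash(account, hash_suffix, hash_prefix=""):
    """Mirrors swift.common.utils.hash_path for an account."""
    name = "/" + account
    return hashlib.md5(
        (hash_prefix + name + hash_suffix).encode("utf-8")).hexdigest()

h = account_hash("AUTH_test", hash_suffix="changeme")
# DB lives at /srv/node/<dev>/accounts/<part>/<h[-3:]>/<h>/<h>.db,
# where <part> comes from the account ring.
print(h[-3:], h)

def undelete(db_path):
    """Clear the DELETED marker in the account DB."""
    with sqlite3.connect(db_path) as conn:
        conn.execute("UPDATE account_stat SET status = '' "
                     "WHERE status = 'DELETED'")
```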
notmynameptb: can you add that idea of a config option as you described it as a bug on launchpad?
*** thurloat is now known as thurloat_isgone20:20
ptbIndeed...will do. Thanks!20:23
notmynameptb: thanks20:27
minwoob"Invalid arguments passed to liberasurecode_instance_create" -- has anyone seen this error lately, when testing a patch?20:34
minwoobIt isn't showing up locally, but when I push it to gerrit, the gate test seems to fail.20:35
minwoobOn my local system it is fine.20:35
*** tellesnobrega_ has quit IRC20:38
*** tellesnobrega_ has joined #openstack-swift20:39
torgomaticthemadcanudist: fwiw, more-recent versions of Swift disallow account DELETE calls with a query string for just that reason20:41
*** nottrobin has joined #openstack-swift20:46
*** serverascode has joined #openstack-swift20:51
*** zhiyan has joined #openstack-swift20:54
*** themadcanudist has quit IRC20:59
peluseminwoob, just approved so lets see if it happens again...20:59
minwoobpeluse: All right. Thank you.21:01
pelusethere's been some issues in the past up there, I didn't deal with them though so if it still pukes we'll find someone who did :)  works locally for me as well though21:02
*** briancurtin has joined #openstack-swift21:05
minwoobpeluse: Do you mean that problem has been observed before, or just in general that the gate tests occasionally have exhibited strange behaviors?21:10
*** tellesnobrega_ has quit IRC21:11
peluseminwoob, in general issues with liberasurecode/pyeclib not being set up correctly on those systems...21:13
pelusebah, still failed.21:15
peluseprobably little chance of getting help til Mon...21:15
*** lastops has joined #openstack-swift21:18
*** doxavore has joined #openstack-swift21:21
*** lastops has quit IRC21:23
*** shlee322 has quit IRC21:26
*** shlee322 has joined #openstack-swift21:27
*** esker has quit IRC21:29
*** jrichli has quit IRC21:31
*** shlee322 has quit IRC21:36
swifterdarrellpeluse: speaking of PyECLib & friends, the 1.0.7m versions break w/some distribute/pip/setuptools combination... I don't have anything more specific than that, but just FYI it can cause trouble21:38
swifterdarrellpeluse: i.e. I had 1.0.7m installed and the requirement ">=1.0.7" was unmet and my proxies wouldn't start21:39
swifterdarrellpeluse: arguably, I should have just packaged "1.0.7m" as "1.0.7" (a little white lie)21:39
swifterdarrellpeluse: (this was for PyECLib)21:39
*** openstack has joined #openstack-swift21:43
torgomaticswifterdarrell: that is pretty darn convincing21:43
peluseswifterdarrell, interesting info on pyeclib.  tsg is out of the country, I'll pass it on to him and kevin though21:43
swifterdarrellpeluse: thx; don't think there's a real fix beyond getting gate to use actual libraries and get rid of the "m"21:44
swifterdarrellpeluse: (so the PyPi version also just uses liberasurecode vs. some bundled C stuff)21:44
swifterdarrelltorgomatic: yup...21:44
swifterdarrelltorgomatic: night and day21:44
notmynameswifterdarrell: run "2" is with the failing disk(s)?21:44
swifterdarrellnotmyname: numbers like "0;" and "2;" indicate how many disks were made slow21:45
notmynameah ok21:45
swifterdarrellacoles_away: clayg: cschwede: zaitcev: portante: mattoliverau: notmyname: tdasilva: torgomatic: so comparing "0;..." to "2;..." shows how badly 2 slow disks (1 per storage node, with 61 total disks in cluster) hurt21:46
notmynamenight and day :)21:46
notmynamethe max latency is the scariest part (of workers=90)21:47
openstackgerritDarrell Bishop proposed openstack/swift: Allow 1+ object-servers-per-disk deployment
torgomaticswifterdarrell: so that's 61 total disks in a cluster of 2 nodes, where each storage node had 1 slow disk?21:49
torgomaticor is that 2 slow disks per storage node?21:50
pelusewe just experienced those same type things here in our cluster...21:52
notmynamedfg_: glange: redbo: y'all should definitely check out those results and the patch ^21:52
swifterdarrellacoles_away: clayg: cschwede: zaitcev: portante: mattoliverau: notmyname: tdasilva: torgomatic: updated the gist to clarify21:56
swifterdarrellacoles_away: clayg: cschwede: zaitcev: portante: mattoliverau: notmyname: tdasilva: torgomatic: "0;..." is no slow disks (just normal swift-object-auditor) "1;..." is one slow disk (out of 61 total disks in cluster) in one of 2 storage nodes "2;..." is two slow disks (out of 61 total disks in cluster), one per each of the 2 storage nodes21:57
torgomaticswifterdarrell: thanks21:57
swifterdarrelltorgomatic: np21:57
zaitcevI saw that21:57
swifterdarrellacoles_away: clayg: cschwede: zaitcev: portante: mattoliverau: notmyname: tdasilva: torgomatic: also updated gist to include the make-a-drive-slow script I used; a value of 256 kills a disk pretty good (await between 500 and 1000+ms)21:59
openstackgerritMichael Barton proposed openstack/swift: go: restructure cmd/hummingbird.go
*** marzif_ has quit IRC22:01
swifterdarrellacoles_away: clayg: cschwede: zaitcev: portante: mattoliverau: notmyname: tdasilva: torgomatic: here's earlier results that illustrate the threads_per_disk overhead:
peluseswifterdarrell, so I guess the question is how many clusters out there are being hit by this but their admins have no idea?22:07
swifterdarrellpeluse: probably a lot?22:07
notmynameall of them? ;-)22:07
notmynamepeluse: only the ones that have drives that fail22:08
pelusesafe bet I think!22:08
pelusefail or just intermittently crappy perf?22:08
swifterdarrellpeluse: Intel saw it in their testing w/COSBench way back, like 4 design summits ago? that was what prompted the threads_per_disk change22:08
peluseyup, I remember - that was jiangang22:08
peluseportland summit22:08
swifterdarrellpeluse: notmyname: declining perf is worse than total failure, I think22:08
notmynameyeah, bad drive. or overloaded drive22:08
notmynameswifterdarrell: yeah22:08
swifterdarrellpeluse: ya! portland22:09
notmynamegood: working drive. bad: broken drive. worst: drive that isn't broken but is slow22:09
redbowas this not common knowledge?22:09
peluseI didn't think the extent was common knowledge but could just be me22:10
swifterdarrellredbo: which part? the pain of slow disks has been common knowledge since at least the portland summit22:10
* peluse means by extent the drastic data that swifterdarrell just showed vs 'yeah there's impact'22:10
*** charlesw has quit IRC22:10
swifterdarrellredbo: Mercado Libre deployed object servers per disk via ring device port differentiation quite a while ago and talked about it, so that's been common knowledge22:11
swifterdarrellredbo: I don't know about common knowledge, but we've seen too-high overhead w/threads_per_disk and no longer recommend it22:11
pelusemaybe ripping it out would be a good low priority todo item at some point....22:12
*** dimasot has quit IRC22:12
notmynamethe new thing here is multiple listeners per port when there's one drive per port (right?). the mercado libre situation was 1:1 port:drive22:12
swifterdarrellpeluse: I've been terrified of blocking I/O calls starving eventlet hub for quite a while now22:12
swifterdarrellpeluse: servers_per_port has been on my hitlist for a long time... but finally had enough time to actually work on it22:12
peluseso they're not keeping you busy enough, is that what you're saying? :)22:13
swifterdarrellpeluse: haha22:13
notmynameand swifterdarrell's results show that the 1 worker per drive isn't always good (or much better than just a lot of workers wrt max latency). but multiple workers per port is great for smoothing out latency and keeping it low22:13
swifterdarrellnotmyname: ya, servers_per_port=1 didn't cut it22:14
swifterdarrellnotmyname: 3 was the sweet spot for my 30-disk nodes; not sure how that'd change for a 60 or 80-disk storage node22:14
notmynameI think the new thing here is that multiple servers per port where each port is a different drive22:14
portanteswifterdarrell: nice work, I see that maxclients=2 with 90 workers only helps the 99%, but the average is still high, so that really shows how important the server-per-port method is22:17
openstackgerritDarrell Bishop proposed openstack/swift: Allow 1+ object-servers-per-disk deployment
swifterdarrellportante: ya, the max_clients=2 was a wash22:18
*** zhill__ has joined #openstack-swift22:19
*** jkugel1 has quit IRC22:19
*** zhill_ has quit IRC22:20
*** bi_fa_fu has joined #openstack-swift22:21
*** themadcanudist has joined #openstack-swift22:24
themadcanudisttorgomatic: thanks!22:25
*** ptb has quit IRC22:28
*** openstackgerrit has quit IRC22:37
*** openstackgerrit has joined #openstack-swift22:37
*** lcurtis has quit IRC22:41
*** wbhuber has quit IRC22:46
*** doxavore has quit IRC22:48
*** ozialien has joined #openstack-swift22:49
*** zhill__ has quit IRC22:53
*** zhill_ has joined #openstack-swift22:55
*** zhill_ is now known as zhill_mbp22:57
*** petertr7_away is now known as petertr722:58
*** ozialien has quit IRC23:02
*** ozialien has joined #openstack-swift23:05
*** petertr7 is now known as petertr7_away23:07
*** annegentle has quit IRC23:18
openstackgerritMichael Barton proposed openstack/swift: go: restructure cmd/hummingbird.go
*** ozialien has quit IRC23:33

Generated by 2.14.0 by Marius Gedminas - find it at!