kota_ | good morning | 00:49 |
mattoliverau | kota_: morning | 00:54 |
kota_ | mattoliverau: morning | 00:54 |
*** tqtran has quit IRC | 01:00 | |
charz | kota_: mattoliverau morning | 01:06 |
kota_ | charz: o/ | 01:06 |
zhengyin | good morning | 01:06 |
mattoliverau | charz, zhengyin: o/ | 01:10 |
clayg | hehe - good morning everyone! | 01:11 |
openstackgerrit | Clay Gerrard proposed openstack/swift: WIP: Make ECDiskFileReader check fragment metadata https://review.openstack.org/387655 | 01:12 |
*** ntata_ has joined #openstack-swift | 01:13 | |
clayg | i think overall the test failures are trending down - managed to get an implementation for the object server quarantine that i'm satisfied with | 01:14 |
clayg | but I'm sort of worrying/wondering if for backports we should *just* have the full read quarantine for the auditor - but we'll see what happens when we start to cherry pick it i 'spose | 01:14 |
*** ntata_ has quit IRC | 01:21 | |
*** blair has joined #openstack-swift | 01:28 | |
kota_ | clayg: thanks for updating that, will look at | 01:28 |
*** clu_ has quit IRC | 01:39 | |
openstackgerrit | Kazuhiro MIYAHARA proposed openstack/swift: Remove 'X-Static-Large-Object' from .meta files https://review.openstack.org/385412 | 02:12 |
*** chlong has joined #openstack-swift | 02:16 | |
openstackgerrit | Kota Tsuyuzaki proposed openstack/liberasurecode: Fix liberasurecode skipping a bunch of invalid_args tests https://review.openstack.org/387879 | 02:23 |
*** lcurtis has quit IRC | 02:53 | |
*** klrmn has quit IRC | 03:10 | |
*** kei_yama has quit IRC | 03:16 | |
*** rjaiswal has quit IRC | 03:41 | |
*** klrmn has joined #openstack-swift | 03:42 | |
*** tqtran has joined #openstack-swift | 03:45 | |
*** links has joined #openstack-swift | 03:46 | |
*** cshastri has joined #openstack-swift | 03:50 | |
*** tqtran has quit IRC | 03:50 | |
*** Guest29440 has quit IRC | 03:53 | |
*** klrmn has quit IRC | 04:01 | |
*** trananhkma has joined #openstack-swift | 04:19 | |
*** ppai has joined #openstack-swift | 04:34 | |
openstackgerrit | Tuan Luong-Anh proposed openstack/swift: Add prefix "$" for command examples https://review.openstack.org/388355 | 04:36 |
*** cshastri has quit IRC | 04:51 | |
*** klrmn has joined #openstack-swift | 04:54 | |
*** sure has joined #openstack-swift | 04:56 | |
*** sure is now known as Guest89668 | 04:56 | |
Guest89668 | hii all, I am doing "container synchronization" in the same cluster; for that i created my "container-sync-realms.conf" file like this http://paste.openstack.org/show/586313/ | 04:58 |
Guest89668 | i created two containers and uploaded objects to one, but those objects are not copied to the other container | 04:59 |
Guest89668 | please, someone help | 04:59 |
*** klrmn has quit IRC | 05:08 | |
*** itlinux has quit IRC | 05:09 | |
*** raginbaj- has joined #openstack-swift | 05:11 | |
openstackgerrit | Bryan Keller proposed openstack/swift: WIP: Add notification policy and transport middleware https://review.openstack.org/388393 | 05:12 |
mattoliverau | Guest89668: so your container sync realms file is in /etc/swift/ | 05:16 |
*** SkyRocknRoll has joined #openstack-swift | 05:16 | |
Guest89668 | mattoliverau: yes | 05:17 |
mattoliverau | Guest89668: also you can remove the clustername2 line, you only need to define each cluster once (and you are only using 1 cluster) but that shouldn't be stopping anything | 05:17 |
Guest89668 | mattoliverau: here is error log http://paste.openstack.org/show/586314/ | 05:18 |
mattoliverau | hmm, so it's timing out and then on the retry it's saying method not allowed. And it's a DELETE | 05:25 |
mattoliverau | Guest89668: you have the same secret key on both containers in the sync? | 05:27 |
Guest89668 | mattoliverau: yes | 05:27 |
Guest89668 | here is my http://paste.openstack.org/show/586315/ | 05:28 |
Guest89668 | container stats | 05:28 |
mattoliverau | and just to make sure, your proxy or loadbalancer (or whatever your ip in your realms config is pointing at) is listening on port 80? | 05:28 |
mattoliverau | cause thats what your realms config says | 05:29 |
Guest89668 | yes it is listening at port 80 | 05:29 |
*** qwertyco has joined #openstack-swift | 05:34 | |
mattoliverau | Guest89668: is the endpoint to your cluster (that's listening on port 80) a swift proxy? a load balancer? Just trying to figure out why the request is 405'ed | 05:43 |
mattoliverau | and where is container_sync on the proxy pipeline? | 05:43 |
Guest89668 | my swift endpoint is " http://192.168.2.187:8080/v1/AUTH_%(tenant_id)s" | 05:44 |
mattoliverau | oh so they're listening on port 8080, not port 80, or do you have a load balancer listening on 80? | 05:45 |
*** ChubYann has quit IRC | 05:45 | |
Guest89668 | mattoliverau: no | 05:45 |
mattoliverau | Guest89668: if not, try changing your endpoints in the realm to: http://192.168.2.187:8080/v1/ | 05:45 |
mattoliverau | Guest89668: looks like the container sync daemon is trying to update whatever is listening on port 80, maybe a webserver | 05:46 |
Guest89668 | mattoliverau: just now i changed and tried again | 05:46 |
Guest89668 | but it's still the same result, although that ERROR is gone from the log now | 05:47 |
mattoliverau | also I mentioned before you only need to specify a single cluster if you have a single cluster, so if you remove the second you'll have to update container metadata that points to cluster2 to point to cluster1 | 05:47 |
mattoliverau | same result as in no objects? | 05:47 |
mattoliverau | have you waited or reran the container-sync? | 05:48 |
Guest89668 | yes i reran container-sync | 05:49 |
mattoliverau | if you're on the container server in question (a container server that serves as a primary for the container in question) you can stop container-sync and force it to run manually with: swift-init container-sync once | 05:49 |
Guest89668 | here is my new realm file http://paste.openstack.org/show/586317/ | 05:49 |
mattoliverau | if you're using swift-init | 05:50 |
mattoliverau | Guest89668: that looks right (the :8080) | 05:51 |
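The realm file under discussion follows Swift's container-sync-realms.conf format. A minimal single-cluster sketch, with the realm name, keys and cluster name as illustrative placeholders (the endpoint is the :8080 proxy URL from the discussion), might look like:

```ini
# /etc/swift/container-sync-realms.conf -- single-cluster sketch;
# realm/cluster names and keys below are placeholders
[realm1]
# shared secrets used to sign sync requests
key = realm1key
key2 = realm1key2
# one endpoint line per cluster; a single cluster needs only one
cluster_clustername1 = http://192.168.2.187:8080/v1/
```

Containers then opt in by setting X-Container-Sync-To to //realm1/clustername1/ACCOUNT/CONTAINER and a matching X-Container-Sync-Key on both containers.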
Guest89668 | and my proxy-server.conf http://paste.openstack.org/show/586316/ | 05:51 |
mattoliverau | Guest89668: cool, container sync is before auth | 05:52 |
mattoliverau | Guest89668: is the container-sync logging anything? it should log something, even if it's just saying it's running or warning about the internal client using defaults | 05:53 |
Guest89668 | here is that log http://paste.openstack.org/show/586318/ | 05:54 |
*** klrmn has joined #openstack-swift | 05:54 | |
*** klrmn has quit IRC | 05:56 | |
mattoliverau | hmm, yeah ok, that's a normal message, but it means container sync is running. | 05:57 |
mattoliverau | Guest89668: now that we have the ports right, how about you put another object in a container.. just in case container sync thinks it's up to date | 05:58 |
mattoliverau | cause it isn't erroring | 05:58 |
Guest89668 | mattoliverau: i deleted both the containers and created again but still same result | 05:59 |
mattoliverau | Guest89668: what's your container sync interval? have you set one in the config? if not, by default it's 300 seconds | 05:59 |
mattoliverau | or 5 mins | 06:00 |
mattoliverau | Guest89668: and your container server can access your proxy servers (via the IP you specified)? because that's where container sync is running from | 06:01 |
Guest89668 | mattoliverau: i am using single node swift (proxy+storage in same node) | 06:03 |
mattoliverau | oh ok | 06:03 |
Guest89668 | and how do i set the container sync interval | 06:03 |
mattoliverau | Guest89668: in your container-server config(s) there should be a section for container-sync. Under that heading you can specify an interval by adding: | 06:04 |
mattoliverau | interval = <number> | 06:05 |
mattoliverau | while you're in there, you can turn up the logging verbosity for just the container sync daemon, by adding to the same container-sync section: | 06:05 |
mattoliverau | log_level = DEBUG | 06:05 |
mattoliverau | then restart the container sync daemon | 06:06 |
mattoliverau | and hopefully it'll log more and it might tell us what's going on | 06:06 |
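As a rough sketch of what is being described (the interval value is only an example), the container-server config would gain something like:

```ini
# container-server.conf
[container-sync]
# run a sync pass every 60 seconds instead of the 300 second default
interval = 60
# verbose logging for the container-sync daemon only
log_level = DEBUG
```

followed by a restart of the daemon, e.g. `swift-init container-sync restart`, or a single forced pass with `swift-init container-sync once` as mentioned above.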
*** trananhkma has quit IRC | 06:07 | |
*** trananhkma has joined #openstack-swift | 06:07 | |
*** trananhkma has quit IRC | 06:08 | |
*** trananhkma has joined #openstack-swift | 06:08 | |
*** rcernin has joined #openstack-swift | 06:11 | |
Guest89668 | mattoliverau: i added both parameters but same result | 06:12 |
Guest89668 | i didn't find any extra logs | 06:13 |
mattoliverau | Guest89668: it seems that container-sync isn't finding objects to sync. | 06:18 |
mattoliverau | hmm weird, what could we be missing | 06:22 |
clayg | busy busy | 06:31 |
Guest89668 | mattoliverau: then how to debug this | 06:33 |
mattoliverau | Guest89668: any logs matching the time the container sync ran in the proxy server logs (the other side of the container sync transaction)> | 06:35 |
mattoliverau | ? | 06:35 |
Guest89668 | mattoliverau: just now i uploaded one object to the container1 here is the log http://paste.openstack.org/show/586321/ | 06:38 |
*** eranrom has joined #openstack-swift | 06:40 | |
mattoliverau | Guest89668: line 12 says you are getting a container-sync error and a 404 not found | 06:40 |
mattoliverau | so double check your container sync paths, and make sure you can access the proxy at the ip you specify in the realms config | 06:41 |
Guest89668 | the file "openstack" what i deleted | 06:41 |
*** winggundamth has quit IRC | 06:42 | |
Guest89668 | when i first created the containers i uploaded an object called openstack | 06:42 |
mattoliverau | it doesn't seem the log level change has taken effect, because you should see a lot more. | 06:42 |
Guest89668 | mattoliverau: after that i deleted the two containers and created them again | 06:43 |
Guest89668 | i have given log_level = DEBUG | 06:44 |
Guest89668 | it is correct | 06:44 |
Guest89668 | ? | 06:44 |
mattoliverau | yeah, and did you restart container-sync? Also I don't see your proxy log as part of that. | 06:45 |
mattoliverau | clayg: you're still up! | 06:45 |
*** winggundamth has joined #openstack-swift | 06:45 | |
Guest89668 | mattoliverau: yes i restarted | 06:50 |
*** silor has joined #openstack-swift | 06:51 | |
*** hseipp has joined #openstack-swift | 06:52 | |
onovy | clayg: no. i shut it down, it spiked up. after power on, it spiked down (and some time for sync of missing data). it was off for ~1 hour | 07:00 |
onovy | "down" = value before shutdown, but still higher than before upgrade | 07:02 |
*** tesseract has joined #openstack-swift | 07:08 | |
*** tesseract is now known as Guest90211 | 07:09 | |
*** qwertyco has quit IRC | 07:11 | |
clayg | handoffs first? | 07:12 |
clayg | i think there's a warning emitted if you have it turned on - but the behavior changed at some point | 07:13 |
clayg | onovy: 01410129dac6903ce7f486997a48e36072fa0401 first appeared in 2.7 tag | 07:14 |
*** silor has quit IRC | 07:18 | |
*** rledisez has joined #openstack-swift | 07:24 | |
*** joeljwright has joined #openstack-swift | 07:34 | |
*** ChanServ sets mode: +v joeljwright | 07:34 | |
*** trananhkma has quit IRC | 07:37 | |
*** _JZ_ has quit IRC | 07:40 | |
*** geaaru has joined #openstack-swift | 07:46 | |
*** tqtran has joined #openstack-swift | 07:49 | |
*** amoralej|off is now known as amoralej | 07:52 | |
onovy | clayg: # handoffs_first = False | 07:53 |
onovy | so commented out default | 07:53 |
*** tqtran has quit IRC | 07:53 | |
onovy | clayg: btw: s/and some time for sync/after some time for sync/ | 07:56 |
onovy | don't understand why the rsync metric jumped up after one node shutdown. no reason to sync anything, because handoffs are used only when there is a disk failure/unmount, not a whole server failure, right? | 07:57 |
clayg | onovy: incoming writes/deletes will go to handoff while node is down - and i think handoffs_first would spin while waiting for the node to come back up - so it could have explained the change - oh well | 08:06 |
clayg | onovy: I still don't understand what sort of magic you're applying to make that recon drop show up in a graph like that - that metric - and all of the rsync metrics - are dropped at the end of a cycle and overwritten by the next cycle | 08:07 |
clayg | IME the cycle while there is real part movement going on (rsync's) is *much* longer than the cycle of a few suffix rehashes | 08:08 |
clayg | i kept losing my interesting numbers because I didn't want to spin in a tight loop collecting the same number over and over just to find an interesting edge | 08:09 |
clayg | not to mention that the number only got spit out *after* the fact - so it gave me no insight into what was going on *right* now | 08:09 |
onovy | clayg: but i don't have handoffs_first enabled, so i don't think it explains it | 08:10 |
clayg | so I only track the statsd stuff from the replicator and the finish_time | 08:10 |
onovy | as for the graph: every 5 minutes i GET all stores and process the json replies | 08:10 |
onovy | so i don't see "edges" when number is reset, but only numbers "every 5 minutes" | 08:10 |
clayg | and doesn't that give you the same number lots of times? like even in a stable cluster with enough nodes i report my cycle time ~20 mins for the whole cluster (it could be smaller if a reporter gets bad timing: the cycle takes 15, he reports at 14, 5 mins later it says it finished in 20, etc) | 08:12 |
clayg | when i have a rebalance going and real weight needs to move the cycle time is ... much longer ;) | 08:12 |
onovy | yep, it does | 08:12 |
onovy | it's not perfect metric, i know :) | 08:12 |
clayg | i can't really see that come through in the graphs you're sending? is it just too scaled out? | 08:12 |
onovy | graph4? | 08:13 |
clayg | of the spike? | 08:13 |
onovy | jop, that's scaled out | 08:13 |
onovy | mmnt | 08:13 |
onovy | jop=yes :) | 08:13 |
onovy | https://s15.postimg.org/559djmgxn/graph5.png zoom in | 08:14 |
onovy | "max" zoom https://s17.postimg.org/5e0m1ji7j/graph6.png | 08:15 |
*** rcernin has quit IRC | 08:16 | |
*** rcernin has joined #openstack-swift | 08:16 | |
*** rcernin has quit IRC | 08:18 | |
*** rcernin has joined #openstack-swift | 08:19 | |
*** rcernin has quit IRC | 08:19 | |
*** rcernin has joined #openstack-swift | 08:19 | |
clayg | ok, so maybe it's not an imperfect proxy - i use a statsd metric suffix.syncs which happens around the same time as rsync getting incremented but only for primary partitions in update | 08:21 |
clayg | it'd be great if those rsync's were broken out by primary sync with peer or handoff sync to delete | 08:22 |
clayg | onovy: it doesn't make much sense to me that it would climb like that - even if a bunch of suffixes were invalid *and* also out of sync - why wouldn't one pass *fix* most of them? | 08:23 |
clayg | any chance some of the rsync's are failing? max connections limit or something? I think a "failure" number comes out right next to rsyncs? | 08:23 |
onovy | yep, many failures | 08:26 |
onovy | i have connection limits in rsync to prevent overload drivers | 08:27 |
onovy | https://s12.postimg.org/floivqq59/graph_failure.png | 08:27 |
onovy | btw: we are going to change this to statsd, but i just "joined" our really old monitoring with swift using few lines of python code :) | 08:28 |
onovy | rsyncd.conf: max connections = 64 for objects | 08:32 |
onovy | for 24 disks per server | 08:32 |
onovy | and "concurrency: 4" for object-replicator | 08:33 |
*** x1fhh9zh has joined #openstack-swift | 08:33 | |
onovy | s/drivers/of drives/ | 08:33 |
*** x1fhh9zh has quit IRC | 08:48 | |
*** x1fhh9zh has joined #openstack-swift | 08:50 | |
Guest89668 | mattoliverau: my problem is resolved and it is syncing objects without any issue | 08:51 |
mattoliverau | Great! What did we miss? Sorry, was called away | 09:02 |
Guest89668 | mattoliverau: the actual issue was with the swift endpoint i mentioned in the realms file; after your suggestion i changed it and restarted the services, but i hadn't checked the other container to see whether the objects were copied or not | 09:04 |
Guest89668 | now it is working fine and syncing the objects with the time interval i set in container-server.conf | 09:05 |
mattoliverau | \o/ nice work! | 09:06 |
Guest89668 | mattoliverau: you helped a lot to debug this issue | 09:06 |
Guest89668 | thanks again...!! | 09:07 |
*** links has quit IRC | 09:09 | |
clayg | ok, well at least that explains how it's able to cycle so fast | 09:15 |
clayg | onovy: i'm really lovin' on the rsync module per disk configuration - my rsync.conf has a few more lines in it - but the per drive rsync connection limits are really nice | 09:17 |
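The per-disk setup clayg refers to pairs one rsync module per object device in rsyncd.conf with the object-replicator's rsync_module option; a rough sketch (device names, paths and limits are illustrative) could be:

```ini
# rsyncd.conf -- one module per object device so "max connections" applies per disk
[object_sda1]
path = /srv/node
read only = false
max connections = 4
lock file = /var/lock/object_sda1.lock

[object_sdb1]
path = /srv/node
read only = false
max connections = 4
lock file = /var/lock/object_sdb1.lock
```

```ini
# object-server.conf, [object-replicator] section -- point rsync at the per-device modules
rsync_module = {replication_ip}::object_{device}
```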
clayg | so do that, and the statsd, and 10 million other things, and ... ;) | 09:17 |
clayg | wtg mattoliverau and Guest89668 !!!! | 09:18 |
clayg | wooooo!!! | 09:18 |
onovy | clayg: we are using salt for deploy, so i can generate rsyncd.conf automagically | 09:23 |
onovy | but need to fix this first use and finish upgrade first :] | 09:24 |
patchbot | Error: Spurious "]". You may want to quote your arguments with double quotes in order to prevent extra brackets from being evaluated as nested commands. | 09:24 |
onovy | *first issue | 09:26 |
onovy | clayg: do you think it's safe to downgrade that one swift store node? | 09:29 |
clayg | i'd have to look over the change log 2.5 -> 2.7 | 09:35 |
clayg | notmyname tries to highlight stuff that can't be backed out of - and we try to avoid stuff that can't be backed out of | 09:36 |
clayg | ... but it's still not something folks do very much - I don't personally have a lot of experience with it | 09:37 |
clayg | maybe we ~broke something with suffix hashing between versions | 09:38 |
clayg | i'm not sure that would explain the reboot tho - unless that many suffixes really got invalidated | 09:39 |
clayg | the rsyncs could be firing and doing nothing - not really sending data (some probably still do) but - maybe - the majority of the delta is rsyncs that are finding the directories already have the same files | 09:40 |
clayg | i sorta remember something with fast post because of ssync and ctype timestamps - we had to fix *something* in suffix hashing | 09:41 |
clayg | but I thought we decided it was backwards compatible | 09:41 |
clayg | onovy: do you use fast-post? do you have .meta files in your cluster? | 09:42 |
onovy | # object_post_as_copy = true | 10:03 |
onovy | and no meta files | 10:04 |
onovy | btw: do you have 2.7.0 in production already? | 10:04 |
*** mvk has quit IRC | 10:14 | |
openstackgerrit | Stefan Majewsky proposed openstack/swift: swift-recon-cron: do not get confused by files in /srv/node https://review.openstack.org/388029 | 10:14 |
*** zhengyin has quit IRC | 10:39 | |
*** mvk has joined #openstack-swift | 10:43 | |
*** Guest89668 has quit IRC | 10:56 | |
onovy | clayg: https://github.com/openstack/swift/commit/2d55960a221c9934680053873bf1355c4690bb19 this is that patch about 'ssync' vs. suffix hashing? | 11:00 |
onovy | cite: in most | 11:00 |
onovy | 'normal' situations the result of the hashing is the same | 11:00 |
onovy | as before this patch. That avoids a storm of hash mismatches | 11:00 |
onovy | when this patch is deployed in an existing cluster. | 11:00 |
*** hseipp has quit IRC | 11:02 | |
*** ppai has quit IRC | 11:04 | |
*** x1fhh9zh has quit IRC | 11:06 | |
onovy | + https://github.com/openstack/swift/commit/9db7391e55e069d82f780c4372ffa32ef4e79c35 this patch makes downgrades harder | 11:07 |
*** cdelatte has joined #openstack-swift | 11:23 | |
*** x1fhh9zh has joined #openstack-swift | 11:48 | |
*** tqtran has joined #openstack-swift | 11:50 | |
*** tqtran has quit IRC | 11:55 | |
*** klamath has joined #openstack-swift | 12:02 | |
openstackgerrit | Kota Tsuyuzaki proposed openstack/swift: Items to consider for ECObjectAuditor https://review.openstack.org/388648 | 12:03 |
*** links has joined #openstack-swift | 12:20 | |
*** SkyRocknRoll has quit IRC | 12:25 | |
openstackgerrit | Shashi proposed openstack/python-swiftclient: Enable code coverage report in console output https://review.openstack.org/388669 | 12:30 |
kota_ | acoles: I added my thoughts to patch 387655. | 12:51 |
patchbot | https://review.openstack.org/#/c/387655/ - swift - WIP: Make ECDiskFileReader check fragment metadata | 12:51 |
kota_ | and clayg:^^ | 12:51 |
kota_ | basically the way we are going with patch 387655 seems ok. | 12:51 |
patchbot | https://review.openstack.org/#/c/387655/ - swift - WIP: Make ECDiskFileReader check fragment metadata | 12:51 |
*** amoralej is now known as amoralej|lunch | 12:52 | |
kota_ | that one works to detect all the frag archives from admin6 as corrupted (this is awesome!) | 12:52 |
kota_ | but i found some corner cases we cannot detect, or where we could quarantine a good frag archive. | 12:53 |
acoles | kota_: ack | 12:53 |
kota_ | acoles, clayg: hopefully i'm just being a worrier, but i think it can happen, so I'd like to hear your opinions on that. | 12:53 |
kota_ | acoles:!! | 12:54 |
acoles | kota_: worriers make good reviewers ! | 12:54 |
kota_ | sorry, i have to leave my office asap | 12:54 |
kota_ | the office is going to be closed. | 12:54 |
acoles | kota_: just looking at yours and clayg changes | 12:54 |
admin6 | kota_: that sounds good :-) | 12:54 |
acoles | kota_: ok have a good night leave it with me | 12:54 |
kota_ | acoles: thanks man, and if you make comments (either gerrit, irc, etc...), i will take a look whenever. | 12:55 |
* acoles worries kota may be locked in office all night | 12:55 | |
onovy | clayg: upgraded second node | 12:56 |
*** links has quit IRC | 12:59 | |
*** Jeffrey4l has quit IRC | 12:59 | |
*** remix_tj has quit IRC | 13:01 | |
*** remix_tj has joined #openstack-swift | 13:01 | |
*** jordanP has joined #openstack-swift | 13:04 | |
*** StevenK has quit IRC | 13:09 | |
*** StevenK has joined #openstack-swift | 13:15 | |
*** mvk has quit IRC | 13:50 | |
*** mvk has joined #openstack-swift | 13:53 | |
*** amoralej|lunch is now known as amoralej | 13:57 | |
*** vinsh has quit IRC | 14:07 | |
*** silor has joined #openstack-swift | 14:11 | |
onovy | clayg: so new info: after the second node upgrade, rsync metrics bumped up again. and i did a test in our test env: if I have 1/2 of the nodes on 2.5.0 and 1/2 on 2.7.0, the rsync metric is higher than if all nodes are on the same version | 14:11 |
onovy | so i think there is a hash comparison incompatibility between 2.5.0 and 2.7.0 | 14:12 |
onovy | and it's not only about rsync metrics, the rsync cmd really is called much more | 14:12 |
onovy | https://s14.postimg.org/tboppflq9/graph_2_nodes.png // rsync metrics graph | 14:14 |
*** jordanP has quit IRC | 14:15 | |
*** x1fhh9zh has quit IRC | 14:18 | |
*** hseipp has joined #openstack-swift | 14:24 | |
*** sgundur has joined #openstack-swift | 14:28 | |
*** jistr is now known as jistr|call | 14:28 | |
*** silor1 has joined #openstack-swift | 14:31 | |
*** silor has quit IRC | 14:32 | |
*** silor1 is now known as silor | 14:32 | |
tdasilva | rledisez, acoles, onovy: what's the best practice for your clouds re object-expirer? do you typically run it on storage nodes or proxy nodes? doesn't seem like there's good consensus, so I proposed patch 388185 | 14:37 |
patchbot | https://review.openstack.org/#/c/388185/ - swift - added expirer service to list | 14:37 |
*** sgundur has quit IRC | 14:39 | |
tdasilva | ahale: ^^^ | 14:39 |
*** sgundur has joined #openstack-swift | 14:43 | |
*** vinsh has joined #openstack-swift | 14:43 | |
rledisez | tdasilva: for now, we run it on proxy nodes because we don't have real scaling issues with the expirer. In the rare situations where we had a problem, we just increased concurrency and that was enough for us. but i guess it depends on how many objects you have to expire. we expire between 1M and 1.5M objects every day and have no negative feedback | 14:51 |
rledisez | would be nice to have some metrics about how many expired objects are waiting to be effectively expired | 14:51 |
rledisez | querying the containers of the special account .expired-objects (or whatever its name is) | 14:52 |
rledisez | tdasilva: what are you calling storage node on your patch? object or account/container? | 14:53 |
rledisez | i'm afraid that if it runs on the object servers there will be too many requests on the container servers, taking down the entire cluster (it already happened to us with a homemade process that was querying containers from the object servers) | 14:54 |
rledisez | memcache would be a requirement then | 14:55 |
*** vinsh has quit IRC | 14:55 | |
*** vinsh_ has joined #openstack-swift | 14:55 | |
*** vinsh has joined #openstack-swift | 14:56 | |
*** klrmn has joined #openstack-swift | 14:58 | |
*** vinsh_ has quit IRC | 15:00 | |
*** sgundur has quit IRC | 15:00 | |
*** hseipp has quit IRC | 15:00 | |
hurricanerix | tdasilva I am going to try and get this updated over the ocata cycle: https://review.openstack.org/#/c/252085/ | 15:10 |
patchbot | patch 252085 - swift - Refactoring the expiring objects feature | 15:10 |
*** jistr|call is now known as jistr | 15:12 | |
*** Guest90211 has quit IRC | 15:13 | |
*** pcaruana has quit IRC | 15:14 | |
tdasilva | rledisez: honestly i was calling anything but a proxy a storage node. typically we don't separate aco nodes, but i understand if you guys do | 15:15 |
tdasilva | hurricanerix: cool, are you planning to do that on the golang code? | 15:16 |
*** rcernin has quit IRC | 15:16 | |
hurricanerix | tdasilva not sure yet, since there is already a POC mostly done, i was just going to rebase it to get it up to master and verify that it does not break anything. | 15:17 |
tdasilva | hurricanerix: got it | 15:19 |
hurricanerix | tdasilva i think it also needs some more documentation, like a deployment/rollback strategy, since this will likely need to be done in phases. | 15:19 |
glange | tdasilva: the object expirer stuff as written can cause problems with heavy usage | 15:19 |
glange | tdasilva: besides getting behind, it can result in a large number of asyncs | 15:19 |
tdasilva | glange: yeah, i remember dfg talking about that in tokyo | 15:19 |
glange | tdasilva: for really heavy usage, we need a rewrite either like the one alan did or something similar | 15:20 |
tdasilva | glange: do you guys also currently run on the proxy nodes? | 15:20 |
glange | tdasilva: we are only keeping up in some of our clusters because we run a hacked up version of the code | 15:20 |
glange | each of our clusters have a few extra systems that are used for various things | 15:21 |
glange | we run the expirer there | 15:21 |
tdasilva | glange: oh, interesting, neat | 15:21 |
glange | these extra boxes do log processing and some other stuff | 15:21 |
glange | we have a few customers that heavily use that feature :/ | 15:22 |
glange | it doesn't scale very well as written :) | 15:22 |
glange | and we give the developer who wrote that feature (he sits nearby) crap about it from time to time :) | 15:23 |
tdasilva | glange: hehehe | 15:26 |
acoles | clayg: fyi I am working on fixing the ssync tests in patch 387655 | 15:28 |
patchbot | https://review.openstack.org/#/c/387655/ - swift - WIP: Make ECDiskFileReader check fragment metadata | 15:28 |
acoles | clayg: back later | 15:29 |
*** hoonetorg has quit IRC | 15:29 | |
*** acoles is now known as acoles_ | 15:29 | |
*** sgundur has joined #openstack-swift | 15:36 | |
*** jistr is now known as jistr|biab | 15:39 | |
onovy | tdasilva: hi. we run expirer on 1-4 nodes in every region | 15:39 |
onovy | i mean the 1st - 4th storage nodes | 15:39 |
onovy | and every dones 1/4 of expiring | 15:40 |
onovy | *does | 15:40 |
onovy | so: processes=4, process=0 on first storage node, =1 on second, etc. | 15:41 |
onovy | same in both region. so if one region if off, we still expire objects | 15:42 |
onovy | in the first version we had the expirer on all nodes with processes=0, but there were many errors in the logs: an expirer was trying to delete an object which had just been deleted a few seconds before by another expirer | 15:42 |
onovy | *one region is off | 15:43 |
*** hseipp has joined #openstack-swift | 15:43 | |
onovy | tdasilva: we have aco on same servers => storage nodes. p is separated | 15:43 |
onovy | and a+c is on SSD, o on rotational disk | 15:43 |
notmyname | good morning | 15:45 |
onovy | + we have ~ x0-x00 expirations per second and x000 asyncs in the queue :) | 15:45 |
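A sketch of the split onovy describes, with one expirer per designated storage node each taking a quarter of the work (only the options relevant to the split are shown; values are illustrative):

```ini
# object-expirer.conf on the first of the four expirer nodes
[object-expirer]
concurrency = 1
# divide the expiration work into 4 slices; each node claims a distinct slice
processes = 4
process = 0   # 1 on the second node, 2 on the third, 3 on the fourth
```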
*** links has joined #openstack-swift | 15:46 | |
rledisez | tdasilva: fyi, we used to do pac / o, we are now moving to p / ac / o | 15:48 |
onovy | rledisez: hi. what's your reason for separating ac from o pls? | 15:49 |
*** tqtran has joined #openstack-swift | 15:51 | |
rledisez | performance. o are slow rotational devices while ac are fast SSDs. and also the numbers differ | 15:52 |
rledisez | onovy: ^ | 15:52 |
onovy | ah. we have 1 SSD per storage node and 23 rotational disks | 15:53 |
onovy | ac are on the one SSD, o are on the 23 rotational disks | 15:53 |
rledisez | onovy: makes sense, but it would cost too much for us. we have thousands of object servers, we only need 100 or 200 SSD for ac servers | 15:54 |
onovy | ah, right. we have ~16 stores per region now :) | 15:54 |
notmyname | rledisez: onovy: I'd definitely appreciate it if you can help update https://etherpad.openstack.org/p/BCN-ops-swift for next week | 15:56 |
onovy | notmyname: but i'm not op :) | 15:56 |
onovy | i will forward it to our ops | 15:56 |
*** tqtran has quit IRC | 15:56 | |
rledisez | notmyname: thx for the reminder, i wrote down some topics I had in mind, will try to think more :) | 16:00 |
*** jistr|biab is now known as jistr | 16:05 | |
*** admin6_ has joined #openstack-swift | 16:07 | |
notmyname | thanks :-) | 16:08 |
*** klrmn has quit IRC | 16:08 | |
onovy | notmyname: is there any deadline for that etherpad? | 16:09 |
*** admin6 has quit IRC | 16:10 | |
*** admin6_ is now known as admin6 | 16:10 | |
notmyname | onovy: i put a link to the agenda item in there. that's the deadline. when the session starts | 16:10 |
*** ChubYann has joined #openstack-swift | 16:11 | |
notmyname | cschwede: around? | 16:12 |
openstackgerrit | John Dickinson proposed openstack/swift: use the new upper constraints infra features https://review.openstack.org/354291 | 16:19 |
*** sgundur has quit IRC | 16:30 | |
*** rledisez has quit IRC | 16:31 | |
*** links has quit IRC | 16:31 | |
*** sgundur has joined #openstack-swift | 16:36 | |
onovy | notmyname: ok, thanks, forwarded :] | 16:37 |
patchbot | Error: Spurious "]". You may want to quote your arguments with double quotes in order to prevent extra brackets from being evaluated as nested commands. | 16:37 |
onovy | clayg: https://bugs.launchpad.net/swift/+bug/1634967 | 16:41 |
openstack | Launchpad bug 1634967 in OpenStack Object Storage (swift) "2.5.0 -> 2.7.0 upgrade problem with object-replicator" [Undecided,New] | 16:41 |
*** pcaruana has joined #openstack-swift | 16:51 | |
clayg | onovy: sigh (on fast-post suffix hashing change) - i'm running out of ideas! | 16:51 |
onovy | i think it must be related to suffix hashing change | 16:52 |
onovy | i read whole git log 2.5.0..2.7.0 and only this seems related | 16:52 |
onovy | good news is i can reproduce it in lab | 16:53 |
onovy | and i think everybody can :) | 16:53 |
clayg | onovy: but - i'm still not sure that the spike isn't just because of 2.5 <=> 2.7 | 16:54 |
onovy | i'm sure it's a problem with running a mixed-version cluster | 16:55 |
clayg | it's not like all our >= 2.7 clusters got a 10x increase in rsync traffic and no one noticed | 16:55 |
onovy | if whole cluster have same version (2.7 or 2.5) problem disappear | 16:55 |
clayg | *maybe* we saw the same bumps *while* upgrading but didn't notice | 16:55 |
onovy | look to bug :) | 16:55 |
onovy | if i have 2x 2.5.0 + 2x 2.7.0 in lab, i have big spike | 16:56 |
clayg | ok, so ... it probably was something in suffix hashing between 2.5 and 2.7 | 16:56 |
onovy | when i downgrade or upgrade whole cluster to same version, spike disappear | 16:56 |
onovy | (after few tens of minutes) | 16:56 |
onovy | yep | 16:56 |
onovy | i think so | 16:56 |
onovy | maybe it's just "feature", but we should document it than | 16:56 |
onovy | and maybe recommend shutting down the replicator during the upgrade process | 16:57 |
onovy | because it can overload cluster imho | 16:57 |
*** Jeffrey4l has joined #openstack-swift | 16:58 | |
clayg | I'm looking @ https://review.openstack.org/#/c/267788/ - but i made a note in the review that when I had it all loaded in my head I thought the hashes would always be the same | 16:59 |
patchbot | patch 267788 - swift - Fix inconsistent suffix hashes after ssync of tomb... (MERGED) | 16:59 |
clayg | maybe you could poke at the REPLICATE api with curl or do some debug logging to find out if one of your parts on 2.7 code has a different result in hashes.pkl than a 2.5 node for the same part? | 17:00 |
onovy | can you try in your lab (with your config) reproduce it? | 17:01 |
onovy | just install few 2.5.0 nodes and upgrade few of them to 2.7.0 | 17:02 |
onovy | we can confirm it's not "my setup" problem | 17:02 |
clayg | onovy: not this week I can't! ;) | 17:02 |
onovy | :] | 17:02 |
patchbot | Error: Spurious "]". You may want to quote your arguments with double quotes in order to prevent extra brackets from being evaluated as nested commands. | 17:02 |
clayg | trying to get ready for barca and fix some bugs :) | 17:02 |
onovy | clayg: do you have 2.7.0 in production already btw? | 17:02 |
clayg | onovy: this is our latest tag -> https://github.com/swiftstack/swift/tree/ss-release-2.9.0.2 | 17:03 |
clayg | we have lots of folks that have upgraded to 2.9, some are still on ... much older releases | 17:03 |
*** amoralej is now known as amoralej|off | 17:04 | |
onovy | ok | 17:04 |
onovy | clayg: what about: https://review.openstack.org/#/c/387591/ ? | 17:04 |
patchbot | patch 387591 - swift - Set owner of drive-audit recon cache to swift user | 17:04 |
*** klrmn has joined #openstack-swift | 17:05 | |
onovy | zaitcev: torgomatic: ^ can you look too pls? | 17:06 |
*** tqtran has joined #openstack-swift | 17:09 | |
onovy | clayg: thanks | 17:09 |
*** joeljwright has quit IRC | 17:10 | |
clayg | onovy: do you still have a mixed environment in play - or is everything upgraded to 2.7 now? | 17:12 |
onovy | clayg: in dev i have anything. in production i have 2 nodes on 2.7.0, and other on 2.5.0 | 17:13 |
clayg | onovy: well, would you confirm/deny my suspicion about mismatched suffix hashing? https://bugs.launchpad.net/swift/+bug/1550563 | 17:13 |
openstack | Launchpad bug 1550563 in OpenStack Object Storage (swift) "need a devops tool for inspecting object server hashes" [Wishlist,New] | 17:13 |
zaitcev | What about "patch add(s)" :-) | 17:14 |
clayg | zaitcev: fix it | 17:15 |
onovy | clayg: so i should run this? https://gist.github.com/clayg/035dc3b722b7f89cce66520dde285c9a | 17:15 |
onovy | on 2.7.0 or 2.5.0 node? | 17:15 |
clayg | it uses the ring to talk to primary nodes about parts - so ideally you would find a partition that is on a 2.5 and 2.7 node | 17:16 |
clayg | hopefully you could identify such a part from the logs on the node with the high volume rsync's | 17:16 |
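A hedged sketch of that kind of check, separate from clayg's gist: the object server answers a REPLICATE request for a partition with a pickled dict mapping suffix to hash, so the same partition can be compared across a 2.5 node and a 2.7 node directly. Hosts, ports, devices and the partition below are placeholders taken from the paste later in the log.

```python
# Rough sketch only: assumes the object server's REPLICATE response body is a
# pickled {suffix: md5} dict (which is how the replicator itself consumes it)
# and that the default storage policy is in play; add an
# X-Backend-Storage-Policy-Index header for other policies.
import pickle
try:
    from http.client import HTTPConnection
except ImportError:          # python 2, which swift of this era runs on
    from httplib import HTTPConnection


def get_suffix_hashes(host, port, device, partition):
    conn = HTTPConnection(host, port)
    conn.request('REPLICATE', '/%s/%s' % (device, partition))
    resp = conn.getresponse()
    body = resp.read()
    conn.close()
    if resp.status != 200:
        raise RuntimeError('REPLICATE %s/%s on %s:%s -> %s'
                           % (device, partition, host, port, resp.status))
    return pickle.loads(body)


# placeholder nodes/devices/partition -- pick a partition that has replicas
# on both a 2.5 node and a 2.7 node
old = get_suffix_hashes('sdn-swift-store3.test', 6000, 'hd3-500G', '123')
new = get_suffix_hashes('sdn-swift-store1.test', 6000, 'hd7-500G', '123')
diff = {s: (old.get(s), new.get(s))
        for s in set(old) | set(new) if old.get(s) != new.get(s)}
print(diff or 'suffix hashes match')
```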
openstackgerrit | Pete Zaitcev proposed openstack/swift: Set owner of drive-audit recon cache to swift user https://review.openstack.org/387591 | 17:16 |
zaitcev | your wish is my command | 17:17 |
onovy | clayg: i have 4 nodes, 3 replicas and 2 nodes on 2.5.0 and 2 nodes on 2.7.0 | 17:17 |
onovy | every partition is on 2.5.0 and 2.7.0 node | 17:17 |
onovy | clayg: really looong output | 17:18 |
onovy | sdn-swift-store1.test 6000 hd7-500G | 17:18 |
onovy | {'9fd': '282d14b6c9f3ccc447ac1f387d9c9c60', '9fe': 'bcf1431d13d69ba1123d7504216787bb', | 17:18 |
onovy | something like this | 17:18 |
onovy | sdn-swift-store3.test 6000 hd3-500G '9fd': '282d14b6c9f3ccc447ac1f387d9c9c60' | 17:20 |
onovy | sdn-swift-store1.test 6000 hd7-500G '9fd': '282d14b6c9f3ccc447ac1f387d9c9c60' | 17:20 |
onovy | so same hash... :/ | 17:20 |
*** acoles_ is now known as acoles | 17:22 | |
acoles | clayg: onovy IDK if it's relevant or helpful but we do have a direct client method to get hashes from an object server https://github.com/openstack/swift/blob/0d41b2326009c470f41f365c508e473ebdacb11c/swift/common/direct_client.py#L484-L484 | 17:30 |
*** mvk has quit IRC | 17:31 | |
onovy | i'm trying to edit clay's script to compare hashes across servers, almost done | 17:31 |
onovy | running over all partitions now... | 17:32 |
acoles | k, i was just scan-reading backlog, ignore me ;) | 17:32 |
onovy | :) | 17:33 |
*** sgundur has quit IRC | 17:39 | |
*** sgundur has joined #openstack-swift | 17:40 | |
onovy | zaitcev: thanks for review and fix | 17:41 |
*** mvk has joined #openstack-swift | 18:02 | |
*** klrmn1 has joined #openstack-swift | 18:07 | |
*** klrmn has quit IRC | 18:07 | |
openstackgerrit | Ondřej Nový proposed openstack/swift: Fixed rysnc -> rsync typo https://review.openstack.org/388843 | 18:17 |
*** sgundur has quit IRC | 18:18 | |
*** geaaru has quit IRC | 18:19 | |
onovy | tdasilva: we are from Czech republic, not Canada :P | 18:35 |
tdasilva | onovy: i knew that, did i mis-spell something? :( | 18:36 |
tdasilva | oops, seznam.ca sorry | 18:36 |
onovy | :))) | 18:36 |
tdasilva | i meant cz | 18:36 |
onovy | clayg: thanks for pointing to rsync_module | 18:40 |
onovy | onovy@jupiter~/tmp/salt-state (rsync_module) $ git show | wc -l | 18:40 |
onovy | 124 | 18:40 |
onovy | i love salt => ready to deploy :) | 18:40 |
*** vinsh has quit IRC | 19:00 | |
*** charz has quit IRC | 19:08 | |
*** mlanner has quit IRC | 19:09 | |
*** hugokuo has quit IRC | 19:09 | |
*** sgundur has joined #openstack-swift | 19:09 | |
*** alpha_ori has quit IRC | 19:09 | |
*** treyd has quit IRC | 19:10 | |
*** ctennis has quit IRC | 19:11 | |
*** zackmdavis has quit IRC | 19:12 | |
*** charz has joined #openstack-swift | 19:12 | |
*** acorwin has quit IRC | 19:12 | |
*** swifterdarrell has quit IRC | 19:12 | |
*** bobby2_ has quit IRC | 19:12 | |
*** hugokuo has joined #openstack-swift | 19:12 | |
*** timburke has quit IRC | 19:14 | |
*** sgundur has quit IRC | 19:15 | |
*** balajir has quit IRC | 19:16 | |
*** charz has quit IRC | 19:17 | |
*** treyd has joined #openstack-swift | 19:17 | |
*** mlanner has joined #openstack-swift | 19:18 | |
*** bobby2 has joined #openstack-swift | 19:18 | |
*** swifterdarrell has joined #openstack-swift | 19:19 | |
*** ChanServ sets mode: +v swifterdarrell | 19:19 | |
acoles | notmyname: are we meeting today? | 19:19 |
*** balajir has joined #openstack-swift | 19:19 | |
*** alpha_ori has joined #openstack-swift | 19:20 | |
*** acorwin has joined #openstack-swift | 19:20 | |
*** zackmdavis has joined #openstack-swift | 19:21 | |
*** ctennis has joined #openstack-swift | 19:22 | |
*** timburke has joined #openstack-swift | 19:22 | |
*** ChanServ sets mode: +v timburke | 19:22 | |
*** charz has joined #openstack-swift | 19:23 | |
*** sgundur has joined #openstack-swift | 19:25 | |
notmyname | acoles: yes. need to go over backports and big bugs and any questions about the summit. I should have the work sessions scheduled by then | 19:27 |
acoles | notmyname: k, thanks | 19:28 |
*** joeljwright has joined #openstack-swift | 19:31 | |
*** ChanServ sets mode: +v joeljwright | 19:32 | |
openstackgerrit | Alistair Coles proposed openstack/swift: WIP: Make ECDiskFileReader check fragment metadata https://review.openstack.org/387655 | 19:35 |
*** hseipp has quit IRC | 19:35 | |
acoles | clayg: ^^ kota_ fixed failing ssync tests, proxy tests still to do plus kota's suggestion in dependent patch | 19:35 |
acoles | back for meeting | 19:36 |
*** joeljwright has quit IRC | 19:37 | |
*** acoles is now known as acoles_ | 19:39 | |
clayg | yay! | 19:48 |
*** pcaruana has quit IRC | 19:50 | |
clayg | i *think* i understand the ssync test fixes sort of? | 19:52 |
*** joeljwright has joined #openstack-swift | 19:59 | |
*** ChanServ sets mode: +v joeljwright | 19:59 | |
*** silor has quit IRC | 20:04 | |
*** nikivi has joined #openstack-swift | 20:04 | |
*** sn0v has joined #openstack-swift | 20:06 | |
*** sn0v has left #openstack-swift | 20:06 | |
*** joeljwright has quit IRC | 20:11 | |
*** joeljwright has joined #openstack-swift | 20:11 | |
*** ChanServ sets mode: +v joeljwright | 20:11 | |
*** hoonetorg has joined #openstack-swift | 20:19 | |
*** nikivi has quit IRC | 20:21 | |
*** chsc has joined #openstack-swift | 20:33 | |
*** chsc has joined #openstack-swift | 20:33 | |
openstackgerrit | Shashirekha Gundur proposed openstack/swift: Invalidate cached tokens api https://review.openstack.org/370319 | 20:34 |
mattoliverau | Morning | 20:36 |
joeljwright | morning | 20:37 |
joeljwright | :) | 20:37 |
*** sgundur has quit IRC | 20:52 | |
kota_ | good morning | 20:55 |
kota_ | acoles: thanks for working on that. I had another thought last night on part of my concerns, will update my comment. | 20:59 |
*** acoles_ is now known as acoles | 20:59 | |
notmyname | meeting time in #openstack-meeting | 20:59 |
*** sgundur has joined #openstack-swift | 20:59 | |
*** mmotiani_ has joined #openstack-swift | 20:59 | |
acoles | kota_: we definitely need to change the exceptions as you suggested, I just didn't get time to do that | 21:00 |
*** vint_bra has joined #openstack-swift | 21:12 | |
*** vint_bra has left #openstack-swift | 21:12 | |
*** m_kazuhiro has joined #openstack-swift | 21:21 | |
*** Jeffrey4l has quit IRC | 21:34 | |
acoles | tdasilva: thanks for +2 on the reconstructor patch! | 21:42 |
*** m_kazuhiro has quit IRC | 21:44 | |
*** mmotiani_ has quit IRC | 21:52 | |
*** nikivi has joined #openstack-swift | 21:54 | |
*** sgundur has quit IRC | 21:54 | |
*** acoles is now known as acoles_ | 21:57 | |
*** nikivi has quit IRC | 22:14 | |
*** klamath has quit IRC | 22:32 | |
*** _JZ_ has joined #openstack-swift | 22:37 | |
*** jmunsch has joined #openstack-swift | 22:40 | |
*** vint_bra has joined #openstack-swift | 22:47 | |
*** joeljwright has quit IRC | 22:48 | |
*** vint_bra has left #openstack-swift | 22:48 | |
jmunsch | anyone able to verify my previous messages exist? | 22:50 |
jmunsch | hello. How do I view the X-Delete-After and X-Delete-At metadata on an object, or where in the code should i look, more specifically? i have been looking through http://git.openstack.org/cgit/openstack/deb-swift/tree/swift/obj/server.py trying to figure out how the .expiring_objects gets set, and looking to see how it gets read for GET responses. I have looked at these related links: | 22:55 |
jmunsch | http://docs.openstack.org/developer/swift/overview_expiring_objects.html http://developer.openstack.org/api-ref/object-storage/ http://www.gossamer-threads.com/lists/openstack/dev/31872 https://blog.rackspace.com/rackspace-cloud-files-how-to-use-expiring-objects-api-functionality http://git.openstack.org/cgit/openstack/deb-swift/tree/api-ref/source/storage-object-services.inc http://git.openstack.org/cgit/openstack/deb-swif | 22:55 |
notmyname | jmunsch: you're wanting to view the data on an existing object? | 22:56 |
jmunsch | notmyname: the meta data | 22:56 |
jmunsch | For example I have done something like this: | 22:57 |
*** vint_bra has joined #openstack-swift | 22:57 | |
jmunsch | object_headers.update({'X-Delete-After': '2592000'}) # seconds in 90 days | 22:57 |
*** gyee has joined #openstack-swift | 22:57 | |
jmunsch | On a PUT | 22:58 |
notmyname | ok | 22:59 |
zaitcev | guys guys guys. Where is PyECLib's upstream nowadays, https://github.com/openstack/pyeclib/ ? | 23:00 |
mattoliverau | zaitcev: yup it's apart of OpenStack namespace now | 23:00 |
zaitcev | mattoliverau: that explains why I could not find 1.3.1 | 23:01 |
notmyname | jmunsch: ok, so what do you want to find now that you've done the PUT? | 23:01 |
jmunsch | notmyname: a key/value in the response to a GET or `swift stat|list` indicating that the created object has had the expiry set | 23:09 |
notmyname | jmunsch: ok, so `swift stat <container> <object>` will show that, as will a direct HEAD or GET request to the object | 23:10 |
notmyname | x-delete-after gets translated into x-delete-at as an absolute time | 23:11 |
notmyname | jmunsch: eg https://gist.github.com/notmyname/3aa5f7f6d6b6e6c76e4499061df7fcc0 | 23:12 |
mattoliverau | jmunsch: the updating of the expiring objects container is done in the proxy on a put or post. As notmyname mentioned, this is also where x-delete-after is translated into x-delete-at to be stored as metadata in the object server | 23:16 |
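For completeness, a small python-swiftclient sketch of the round trip described above, with placeholder auth URL and credentials: X-Delete-After on the PUT comes back as an absolute X-Delete-At on a HEAD of the object.

```python
# Placeholder credentials; demonstrates the X-Delete-After -> X-Delete-At
# translation discussed above.
from swiftclient.client import Connection

conn = Connection(authurl='http://192.168.2.187:8080/auth/v1.0',
                  user='test:tester', key='testing')   # placeholders
conn.put_object('mycontainer', 'myobject', contents=b'data',
                headers={'X-Delete-After': '2592000'})  # expire in 30 days
headers = conn.head_object('mycontainer', 'myobject')
print(headers.get('x-delete-at'))  # absolute unix timestamp set on the object
```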
mathiasb | notmyname: sorry I fell asleep and missed the meeting :/ | 23:29 |
mathiasb | any chance of moving the topics from working session 5 from friday to thursday, since neither me nor kota_ will be around on friday? | 23:29 |
notmyname | mathiasb: yeah, I do need to adjust that. will do it over the next 24 hours | 23:30 |
mathiasb | ..just going over the meeting logs and saw that the issue was raised there already | 23:30 |
mathiasb | thanks! | 23:30 |
*** Jeffrey4l has joined #openstack-swift | 23:31 | |
mathiasb | do you know anything more about the meeting room facilities, e.g., if they have projectors to show slides? | 23:32 |
notmyname | mathiasb: I don't, for sure. but I expect them to have something like that. we have had it in the past | 23:34 |
jmunsch | notmyname mattoliverau : thanks so much for the help | 23:39 |