*** baojg has quit IRC | 00:03 | |
*** baojg has joined #openstack-swift | 00:06 | |
*** geaaru has quit IRC | 00:08 | |
notmyname | mattoliverau: \o/ | 00:43 |
mattoliverau | notmyname: only thing that would be awesome for me at the PTG would be a US power cable so I don't have to use this converter ;) one with a IEC C5 Female (or mickey-mouse) or whatever people call it so I can plug it into my dell power adaptor ;) So if anyone happens to have one laying around an office IIEC C5 Female can steal, that would be nice, but it isn't really required :) | 00:55 |
mattoliverau | * I can steal — somehow a paste got in there. I'd blame the travel, but I haven't left yet :P | 00:57 |
*** aagrawal has joined #openstack-swift | 01:06 | |
*** abhinavtechie has quit IRC | 01:10 | |
notmyname | mattoliverau: the one with the 3 barrels thing? https://ae01.alicdn.com/kf/HTB1Re6EHVXXXXbKXFXXq6xXFXXXU/-font-b-IEC-b-font-320-font-b-C14-b-font-Male-font-b-C5.jpg | 01:33 |
notmyname | oh like the top one here http://sc02.alicdn.com/kf/HTB1mAmgMVXXXXclXpXXq6xXFXXXq/IEC-FEMALE-C13-TO-C6-CLOVER-MALE.jpg | 01:33 |
notmyname | I'm not sure what I'm looking at, but I'm pretty sure I don't have it https://www.google.com/search?q=IEC+C5+Female&client=safari&rls=en&source=lnms&tbm=isch&sa=X&ved=0ahUKEwiyjc3V-pbWAhUY8WMKHRrYClwQ_AUICygC&biw=1386&bih=762#imgrc=eeYadaU41T4SlM: | 01:35 |
mattoliverau | Yeah that 3 barrel one, like in the last link | 01:36 |
*** itlinux has quit IRC | 01:37 | |
*** mat128 has joined #openstack-swift | 01:38 | |
*** itlinux has joined #openstack-swift | 02:06 | |
*** mat128 has quit IRC | 02:11 | |
*** mat128 has joined #openstack-swift | 03:09 | |
openstackgerrit | Pete Zaitcev proposed openstack/swift master: PUT+POST and its development test https://review.openstack.org/427911 | 03:13 |
*** mat128 has quit IRC | 03:42 | |
*** baojg has quit IRC | 03:48 | |
*** baojg has joined #openstack-swift | 03:50 | |
*** aagrawal has quit IRC | 03:50 | |
*** klrmn has quit IRC | 04:35 | |
*** mat128 has joined #openstack-swift | 04:41 | |
*** mat128 has quit IRC | 05:13 | |
*** baojg has quit IRC | 05:47 | |
*** mat128 has joined #openstack-swift | 06:11 | |
*** silor has joined #openstack-swift | 06:40 | |
*** mat128 has quit IRC | 06:43 | |
*** rcernin has joined #openstack-swift | 06:48 | |
*** mat128 has joined #openstack-swift | 07:41 | |
*** baojg has joined #openstack-swift | 07:51 | |
*** psachin has joined #openstack-swift | 08:07 | |
*** mat128 has quit IRC | 08:13 | |
*** baojg has quit IRC | 08:50 | |
*** baojg has joined #openstack-swift | 08:54 | |
*** mat128 has joined #openstack-swift | 09:10 | |
*** baojg has quit IRC | 09:21 | |
*** geaaru has joined #openstack-swift | 09:39 | |
*** mat128 has quit IRC | 09:44 | |
*** mat128 has joined #openstack-swift | 10:45 | |
*** silor has quit IRC | 11:11 | |
*** mat128 has quit IRC | 11:13 | |
*** psachin has quit IRC | 12:12 | |
*** mat128 has joined #openstack-swift | 12:12 | |
*** mat128 has quit IRC | 12:44 | |
*** rcernin has quit IRC | 12:45 | |
*** baojg has joined #openstack-swift | 12:58 | |
*** baojg has quit IRC | 13:01 | |
*** baojg has joined #openstack-swift | 13:05 | |
*** silor has joined #openstack-swift | 14:04 | |
*** mat128 has joined #openstack-swift | 14:07 | |
*** silor has quit IRC | 14:13 | |
openstackgerrit | zhangyangyang proposed openstack/swift master: Replace deprecated .assertRaisesRegexp() https://review.openstack.org/502243 | 14:16 |
*** mat128 has quit IRC | 14:37 | |
*** mat128 has joined #openstack-swift | 14:39 | |
*** mat128 has quit IRC | 14:39 | |
*** baojg has quit IRC | 14:44 | |
*** baojg has joined #openstack-swift | 14:45 | |
*** baojg has quit IRC | 14:45 | |
*** catintheroof has joined #openstack-swift | 15:25 | |
*** catintheroof has quit IRC | 15:29 | |
*** mat128 has joined #openstack-swift | 15:35 | |
*** catintheroof has joined #openstack-swift | 15:55 | |
*** catintheroof has quit IRC | 15:57 | |
*** catintheroof has joined #openstack-swift | 15:58 | |
*** baojg has joined #openstack-swift | 16:20 | |
*** baojg has quit IRC | 16:25 | |
*** itlinux has quit IRC | 16:59 | |
*** silor has joined #openstack-swift | 17:16 | |
*** baojg has joined #openstack-swift | 17:23 | |
*** baojg has quit IRC | 17:28 | |
*** itlinux has joined #openstack-swift | 17:49 | |
*** psachin has joined #openstack-swift | 17:54 | |
*** psachin has quit IRC | 18:01 | |
*** catintheroof has quit IRC | 18:12 | |
*** d0ugal has quit IRC | 18:21 | |
*** d0ugal has joined #openstack-swift | 18:24 | |
*** silor has quit IRC | 18:34 | |
*** mat128 has quit IRC | 18:44 | |
*** klrmn has joined #openstack-swift | 18:58 | |
*** vint_bra has quit IRC | 19:14 | |
*** itlinux has quit IRC | 20:25 | |
*** catintheroof has joined #openstack-swift | 20:41 | |
*** catintheroof has quit IRC | 20:45 | |
*** eckesicle has joined #openstack-swift | 21:40 | |
eckesicle | im experiencing some degraded swift performance that I cant figure out the cause of | 21:41 |
eckesicle | one of our keystone servers went down, and after coming back up, we're seeing a strange issue | 21:41 |
eckesicle | https://www.irccloud.com/pastebin/puGHNeQB/ | 21:42 |
eckesicle | that error appears over and over again | 21:42 |
eckesicle | but! | 21:42 |
eckesicle | only sometimes | 21:42 |
eckesicle | if try to upload/download the file with the same credentials a couple of times | 21:42 |
eckesicle | it'll work like 3/5 times | 21:42 |
eckesicle | it's also only occurring for some accounts | 21:43 |
eckesicle | 85% of the accounts are okay. | 21:43 |
eckesicle | if we restart keystone | 21:43 |
eckesicle | we see the same behaviour | 21:43 |
eckesicle | except | 21:43 |
eckesicle | the 15% of broken accounts, have now changed to be another 15% (seemingly random accounts) | 21:44 |
eckesicle | any ideas? | 21:44 |
eckesicle | ah wrong paste | 21:46 |
eckesicle | this is the correct error message | 21:46 |
eckesicle | https://www.irccloud.com/pastebin/D4E7m96Q/ | 21:46 |
notmyname | eckesicle: (just guessing here) sounds like it might be a cache issue. do you have multiple proxy servers? are they configured to have the same common memcache pool? (they should) | 21:53 |
eckesicle | yeah they are | 21:54 |
notmyname | but you might have it configured that the proxies are using different memcache pools or even only using localhost. which means some cached tokens are now invalid | 21:54 |
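A shared pool is normally configured in each proxy's cache filter (or in /etc/swift/memcache.conf), with every proxy pointing at the same server list. A minimal sketch, with placeholder addresses rather than this cluster's real ones:

    [filter:cache]
    use = egg:swift#memcache
    # the same list on every proxy, so cached tokens are visible to all of them
    memcache_servers = 10.0.0.1:11211,10.0.0.2:11211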
eckesicle | i could restart memcached? | 21:54 |
eckesicle | for debug purposes i have disabled all but one swift-proxy | 21:54 |
eckesicle | and the error is still occurring | 21:54 |
notmyname | restarting memcache will be a cache flush. I'm not sure what the impact for your cluster will be. could be minimal. | 21:55 |
eckesicle | it'll be fine, usage is minimal at 2300 on a saturday | 21:55 |
eckesicle | that didnt solve it :( | 21:56 |
notmyname | can we isolate the issue any further? you said you tried to redownload with the same credentials. is that like a client request that's doing the auth dance again? or do you mean you're using curl and already have the token and the token sometimes works and sometimes doesn't? | 21:57 |
eckesicle | well i can simulate a client request | 21:57 |
eckesicle | but I'm just reloading the /health endpoint (which connects to swift, uploads a heartbeat file, and downloads the same file) | 21:57 |
eckesicle | so if i go to a client that is currently down | 21:57 |
eckesicle | and reload that page over and over | 21:58 |
eckesicle | it'll work like 20% of the time | 21:58 |
eckesicle | but then after a minute or so | 21:58 |
eckesicle | it'll work 100% of the time | 21:58 |
notmyname | what client are you using? | 21:58 |
eckesicle | (but another client will go down) | 21:58 |
eckesicle | fog | 21:58 |
eckesicle | and this is what it looks like in the logs when it fails: | 21:59 |
eckesicle | https://www.irccloud.com/pastebin/r9LbR6pe/ | 21:59 |
notmyname | what's between the swift cluster and the client? load balancer? any cache? CDN? TLS terminator? | 22:00 |
eckesicle | nothing at all | 22:00 |
eckesicle | we use a custom dns to route properly but that's it | 22:01 |
notmyname | client box talks directly to the swift proxy server process? | 22:01 |
eckesicle | [pipeline:main] | 22:02 |
eckesicle | pipeline = catch_errors gatekeeper healthcheck proxy-logging cache container_sync bulk tempurl ratelimit authtoken keystoneauth container-quotas account-quotas slo dlo proxy-logging proxy-server | 22:02 |
eckesicle | that's the pipeline | 22:02 |
notmyname | next step I'd do is use the swift CLI to get an auth token. eg `swift auth -v` | 22:02 |
notmyname | make a note of the auth token returned and then use curl to do requests to the given storage URL | 22:03 |
notmyname | or maybe to the specific IP of the proxy server you're testing | 22:03 |
notmyname | see if that works or fails | 22:03 |
eckesicle | ok | 22:04 |
notmyname | eg curl command: `curl -i -H "x-auth-token: yourtoken" http://1.2.3.4/v1/youraccount/` | 22:05 |
notmyname | what are you using for TLS termination? | 22:06 |
eckesicle | got it! | 22:07 |
eckesicle | this cluster runs completely on an internal network | 22:08 |
notmyname | what was it? | 22:08 |
eckesicle | so there is no TLS | 22:08 |
notmyname | ok | 22:08 |
eckesicle | export OS_STORAGE_URL=http://swift-proxy.service.binet:8080/v1/AUTH_fe48f5d760a14d1a9c398d95275235bf | 22:08 |
eckesicle | export OS_AUTH_TOKEN=xxx | 22:08 |
notmyname | you got it working now? | 22:09 |
eckesicle | that works | 22:09 |
notmyname | ok, good | 22:09 |
notmyname | so now that you have the token itself, use curl to make requests to the proxy server | 22:10 |
notmyname | (the reason I like using curl is because it doesn't hide anything or do extra requests for you. makes debugging much easier) | 22:10 |
notmyname | `curl -i -H "x-auth-token: xxx" http://swift-proxy.service.binet:8080/v1/AUTH_fe48f5d760a14d1a9c398d95275235bf` in your case | 22:12 |
notmyname | that does a GET on the account and returns a list of the containers in the account | 22:12 |
eckesicle | yeah | 22:13 |
notmyname | if you have a bunch of containers, then `curl -I -XHEAD ...` will just do the HEAD request and return the account metadata in the response headers | 22:13 |
eckesicle | it does get me a list of the containers | 22:13 |
notmyname | cool | 22:13 |
notmyname | does it work every time? | 22:13 |
eckesicle | but i actually cant see | 22:13 |
eckesicle | oh right | 22:13 |
eckesicle | no i cant see the container i expect to see | 22:15 |
notmyname | TBH I don't care if you can see what you expect or not (yet). I care that you don't get 401s | 22:15 |
eckesicle | ok | 22:15 |
eckesicle | well im getting 401s now | 22:15 |
eckesicle | and now it works again | 22:16 |
eckesicle | (running that same command) | 22:16 |
notmyname | big picture, here's where my thinking is going: I want to see if the issue is between swift and keystone with validation of tokens or if the issue is a client to keystone issue where maybe keystone is giving tokens that dont' get validated | 22:16 |
notmyname | hmm | 22:16 |
notmyname | ok | 22:16 |
notmyname | so repeatedly using the same token you are sometimes getting 200 and sometimes getting 401 | 22:16 |
notmyname | and it flops back and forth | 22:16 |
notmyname | right? | 22:17 |
eckesicle | yes | 22:17 |
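A quick way to quantify the flapping from a shell, reusing the OS_AUTH_TOKEN and OS_STORAGE_URL exported earlier (a sketch only):

    for i in $(seq 1 20); do
      curl -s -o /dev/null -w "%{http_code}\n" \
        -H "x-auth-token: $OS_AUTH_TOKEN" "$OS_STORAGE_URL"
    done | sort | uniq -c
    # a healthy token prints only 200s; a flapping one mixes in 401s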
notmyname | that would imply the problem is more likely to be between swift and keystone | 22:18 |
notmyname | I don't know a whole lot about keystone internals. are you using uuid tokens? | 22:18 |
notmyname | or fernet tokens? | 22:18 |
eckesicle | i dont know the answer to that | 22:19 |
eckesicle | that's in the keystone config? | 22:19 |
notmyname | yes, I'd guess | 22:19 |
notmyname | fernet tokens are long IIRC. uuid tokens are 32 bytes | 22:19 |
eckesicle | uuid then | 22:20 |
notmyname | ok | 22:20 |
eckesicle | 32 chars hex | 22:20 |
notmyname | yeah | 22:20 |
notmyname | so with the curl requests, when you get a 401, the log messages are the same as the ones you pasted above? | 22:20 |
eckesicle | HTTP/1.1 401 Unauthorized | 22:21 |
eckesicle | Content-Length: 131 | 22:21 |
eckesicle | Content-Type: text/html; charset=UTF-8 | 22:21 |
eckesicle | Www-Authenticate: Swift realm="AUTH_fe48f5d760a14d1a9c398d95275235bf" | 22:21 |
eckesicle | WWW-Authenticate: Keystone uri='http://keystone.service.binet:5000' | 22:21 |
eckesicle | X-Trans-Id: tx3ee4698a23314989aa548-0059b46844 | 22:21 |
eckesicle | Date: Sat, 09 Sep 2017 22:16:36 GMT | 22:21 |
eckesicle | <html><h1>Unauthorized</h1><p>This server could not verify that you are authorized to access the document you requested.</p></html>root@sld-stor2:/etc/swift/proxy-server# curl -i -H "x-auth-token: | 22:21 |
eckesicle | ah sorry i should've pastebinned that | 22:21 |
notmyname | no worries | 22:21 |
notmyname | what about the log lines. the "Auth token not in the request header" is the one from earlier that looks weird to me | 22:22 |
notmyname | are you still seeing that? | 22:22 |
eckesicle | yes, but i think i know what that is | 22:22 |
notmyname | oh, and if you grep all of your logs for the x-trans-id, you'll see everything in the swift cluster for that request (tx3ee4698a23314989aa548-0059b46844 in this example) | 22:22 |
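For example, assuming the proxy logs under /var/log/swift/ (the path is an assumption; many installs log via syslog instead):

    grep -r tx3ee4698a23314989aa548-0059b46844 /var/log/swift/
    # or, for syslog-based setups:
    grep tx3ee4698a23314989aa548-0059b46844 /var/log/syslog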
eckesicle | I think that's the health check, just making a http request | 22:22 |
notmyname | oh | 22:22 |
eckesicle | https://www.irccloud.com/pastebin/gEmVZsu9/ | 22:23 |
eckesicle | there's a pastebin of the grep | 22:23 |
eckesicle | who validates the token? | 22:24 |
eckesicle | is it the proxy or the object/container/account server? | 22:24 |
notmyname | you implied you had multiple proxy servers. can you turn on another without adding it to the dns/load balancer (ie to be able to change the config without end users using it) | 22:25 |
notmyname | auth happens in the proxy server | 22:25 |
notmyname | so if the auth fails, requests don't even go to the storage nodes (account, container, object) | 22:25 |
eckesicle | i cant easily turn it on now unfortunately | 22:26 |
notmyname | no worries | 22:26 |
notmyname | I was hoping you could set a proxy server to use debug-level logging | 22:26 |
eckesicle | oh i can just restart this one | 22:26 |
eckesicle | in debug | 22:26 |
notmyname | there's a bunch of logs that the keystone integration spits out that might be handy | 22:26 |
notmyname | ok, put it in debug then, redo the curl request until you get a 401, then grep for the transaction id | 22:27 |
eckesicle | do you know the command by heart? | 22:27 |
eckesicle | in the config i mean | 22:27 |
notmyname | which command? | 22:27 |
notmyname | oh, to set it to debug? | 22:27 |
notmyname | log_level=DEBUG IIRC | 22:27 |
eckesicle | yeah thats it | 22:28 |
notmyname | https://github.com/openstack/swift/blob/master/etc/proxy-server.conf-sample#L44 | 22:28 |
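Roughly, that goes in the [DEFAULT] section of proxy-server.conf, followed by a proxy restart (the restart command depends on how the service is managed and is only an assumption here):

    [DEFAULT]
    log_level = DEBUG

    # then restart the proxy, e.g.
    swift-init proxy restart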
eckesicle | https://www.irccloud.com/pastebin/NF6aqZVD/ | 22:29 |
eckesicle | okay | 22:29 |
eckesicle | so i set it to debug | 22:29 |
eckesicle | ran a few curls until it 401 | 22:30 |
eckesicle | then grepped with context for that txn | 22:30 |
eckesicle | 'cached token marked as unauthorized' looks interesting | 22:31 |
notmyname | yeah | 22:31 |
notmyname | looking | 22:31 |
notmyname | eckesicle: this started when you upgraded or otherwise changed keystone, right? | 22:34 |
eckesicle | server crashed | 22:34 |
eckesicle | and we booted it up again | 22:34 |
notmyname | yikes | 22:35 |
notmyname | in the proxy server config, under the [filter:authtoken] section, do you have cache set to swift.cache? https://github.com/openstack/swift/blob/master/etc/proxy-server.conf-sample#L384 | 22:37 |
eckesicle | no | 22:37 |
notmyname | if not, I think you should | 22:37 |
eckesicle | delay_auth_decision = true | 22:37 |
notmyname | good | 22:38 |
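Put together, the relevant part of [filter:authtoken] would look roughly like this (the endpoints match the config pasted a bit further down; this is a sketch, not the full section):

    [filter:authtoken]
    paste.filter_factory = keystonemiddleware.auth_token:filter_factory
    auth_uri = http://keystone.service.binet:5000
    identity_uri = http://keystone-admin.service.binet:35357
    delay_auth_decision = true
    cache = swift.cache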
eckesicle | okay | 22:39 |
eckesicle | im setting that | 22:39 |
eckesicle | same problem | 22:39 |
notmyname | hmm | 22:39 |
notmyname | that was from https://github.com/openstack/keystonemiddleware/blob/master/keystonemiddleware/auth_token/__init__.py#L203-L211 | 22:40 |
eckesicle | i wonder if someone made a breaking change to the config | 22:41 |
eckesicle | saved it but didnt reload keystone | 22:41 |
eckesicle | and then when the reboot happened it loaded up an invalid config | 22:42 |
notmyname | I don't think you need to reload keystone if you just update the swift proxy config | 22:42 |
notmyname | ok, so far it seems like the proxy server gets a token from the end user, the keystone middleware looks for the token in its cache, gets an invalid response from the cache, and then returns 401 to the user | 22:42 |
eckesicle | it's cached in memcached right? | 22:44 |
notmyname | hmm... maybe setting cache to swift.cache could hide/fix the problem, but it makes me wonder why it was working before | 22:45 |
notmyname | IIRC the swift.cache was the best to use a long time ago, but that was related to some scale concerns in keystone's memcache client (which IIRC have been resolved) | 22:45 |
notmyname | so, taking a quick glance through unfamiliar code (the keystone middleware project), it seems that keystone is looking up the token in memcache | 22:46 |
eckesicle | so i did set cache = swift.cache | 22:46 |
eckesicle | and restarted every swift service | 22:46 |
notmyname | and so perhaps something with the keystone memcache pool or settings was changed with the keystone upgrade | 22:46 |
eckesicle | but it didnt fix it | 22:46 |
notmyname | yeah. that may be a red herring | 22:47 |
notmyname | or, alternatively, it might be some other keystone config related to memcache that is a common problem there | 22:47 |
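For reference, keystonemiddleware's auth_token filter exposes a few cache-related knobs that could be worth checking; the values shown are illustrative defaults, not this cluster's settings:

    [filter:authtoken]
    # used only when 'cache = swift.cache' is NOT set
    memcached_servers = 127.0.0.1:11211
    # how long (seconds) a validated token is kept in the cache
    token_cache_time = 300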
eckesicle | [filter:authtoken] | 22:48 |
eckesicle | paste.filter_factory = keystonemiddleware.auth_token:filter_factory | 22:48 |
eckesicle | identity_uri = http://keystone-admin.service.binet:35357 | 22:48 |
eckesicle | auth_uri = http://keystone.service.binet:5000 | 22:48 |
eckesicle | what does paste.filter_factory do here? | 22:48 |
notmyname | there's not any chance you've got pre-upgrade versions of the config file to compare against, do you? a company I used to work for had all the config files checked into a version control system for that | 22:48 |
notmyname | oh, the filter_factory identifies the code module that will be used | 22:48 |
notmyname | basically, it's how the plugin system works | 22:49 |
eckesicle | i have all configs in ansible | 22:49 |
eckesicle | i was just browsing that | 22:49 |
eckesicle | but i checked the config files, they have not been touched since 2015-12-12 | 22:50 |
notmyname | I'm looking at https://github.com/openstack/keystonemiddleware/blob/master/keystonemiddleware/auth_token/__init__.py#L948-L957 | 22:50 |
eckesicle | i wonder if maybe the auto update has broken something | 22:50 |
notmyname | again, it's an unfamiliar codebase, and I'm not sure where, if anywhere, keystone has sample config files | 22:51 |
eckesicle | some package that we depend on couldve been updated | 22:51 |
notmyname | but those lines make it look like there's some settable options | 22:51 |
eckesicle | we did something wrong ... now the number of affected clients has gone from 15% to 60% | 22:51 |
notmyname | yikes! | 22:52 |
notmyname | ok, unset the cache thing (that's the only thing we changed) | 22:52 |
notmyname | oh, debug logging, but that shouldn't affect things | 22:53 |
eckesicle | yeah | 22:53 |
eckesicle | ill see if it fixes things | 22:53 |
eckesicle | i wonder if it's a load thing? | 22:53 |
eckesicle | like the cache lookup times out every now and then or something | 22:53 |
notmyname | maybe | 22:53 |
eckesicle | the load is quite high on the server | 22:54 |
eckesicle | 3 on a 6 core | 22:54 |
notmyname | on the keystone server or the proxy server? | 22:54 |
notmyname | oh, are they the same box? | 22:54 |
eckesicle | yeah | 22:54 |
notmyname | so the other proxy servers were talking to this box too? | 22:55 |
eckesicle | ive taken those down | 22:55 |
eckesicle | right now (for debugging) | 22:55 |
notmyname | right | 22:55 |
eckesicle | there is only one keystone and one swift proxy | 22:55 |
notmyname | ya | 22:55 |
eckesicle | but a lot of storage nodes | 22:56 |
eckesicle | no one uses this service out of office hours | 22:56 |
eckesicle | so i dont have to worry about bringing things down | 22:56 |
notmyname | but the load is high right now? | 22:56 |
eckesicle | we're back down to 15% now | 22:56 |
notmyname | ok | 22:56 |
eckesicle | yeah the load is high | 22:56 |
notmyname | how many requests per second are you handling right now? roughly | 22:57 |
eckesicle | 100 or so | 22:57 |
notmyname | ok | 22:57 |
eckesicle | i just paused all the health checks | 22:57 |
eckesicle | okay, we're down to zero requests per second :) | 22:58 |
eckesicle | let's see if the load goes down | 22:58 |
eckesicle | and if the curl still fails | 22:58 |
eckesicle | well the load is down to <1 | 22:59 |
eckesicle | and the 401s are still showing | 22:59 |
notmyname | hmm | 22:59 |
notmyname | I wonder if there's something in the memcache config? did that get updated? | 23:00 |
notmyname | are there knobs there that can increase capacity? | 23:00 |
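If memcached came from the stock Debian/Ubuntu package, its knobs live in /etc/memcached.conf; the values below are the usual defaults, assumed rather than taken from this box:

    -m 64        # cache memory in MB; a small cache evicts tokens sooner
    -c 1024      # max simultaneous connections
    -l 127.0.0.1 # listen address (localhost-only by default)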
notmyname | also, I'm getting to the end of my knowledge at this point | 23:02 |
eckesicle | 2015-10-30 | 23:02 |
eckesicle | yeah | 23:02 |
eckesicle | thanks so much for your help | 23:02 |
notmyname | might be worth it to see if there's anyone in the #openstack-keystone channel | 23:02 |
eckesicle | (if i knew your name and address id send you a bottle of gin or something) | 23:02 |
notmyname | do you have other openstack services using this keystone instance? | 23:02 |
notmyname | lol | 23:02 |
eckesicle | or a gift card :) | 23:03 |
eckesicle | priv me your amazon username and ill send you a little thanks | 23:04 |
eckesicle | no, it's only swift | 23:05 |
*** alenavolk has joined #openstack-swift | 23:38 |