*** tonyb has joined #openstack-swift | 00:22 | |
*** psachin has joined #openstack-swift | 03:24 | |
openstackgerrit | Merged openstack/swift master: proxy: Include thread_locals when spawning _fragment_GET_request https://review.opendev.org/749376 | 03:33 |
*** m75abrams has joined #openstack-swift | 04:18 | |
*** evrardjp has quit IRC | 04:33 | |
*** evrardjp has joined #openstack-swift | 04:33 | |
*** gyee has quit IRC | 05:38 | |
*** manuvakery has joined #openstack-swift | 05:47 | |
*** rcernin has quit IRC | 06:50 | |
*** rcernin has joined #openstack-swift | 07:08 | |
*** rcernin has quit IRC | 07:28 | |
*** mikecmpbll has joined #openstack-swift | 08:03 | |
*** rcernin has joined #openstack-swift | 08:11 | |
*** rcernin has quit IRC | 08:17 | |
*** rcernin has joined #openstack-swift | 08:50 | |
*** rcernin has quit IRC | 09:03 | |
*** lxkong has joined #openstack-swift | 09:32 | |
*** m75abrams has quit IRC | 09:46 | |
*** zaitcev has joined #openstack-swift | 12:49 | |
*** ChanServ sets mode: +v zaitcev | 12:49 | |
*** mikecmpbll has quit IRC | 13:54 | |
*** gyee has joined #openstack-swift | 15:15 | |
DHE | anyone seen deadlocks in the proxy server? I've had a few times now where the proxy service hangs and requires a full restart, the workers apparently hung in a futex() call of sorts | 15:28 |
DHE | I've brought this up before without resolution, but that was months ago | 15:28 |
zaitcev | Not me, but my cluster is small. | 15:29 |
DHE | mine isn't all that big either. but it might need updating. could be an eventlet bug maybe, etc. | 15:32 |
DHE | Something like this https://github.com/eventlet/eventlet/issues/508 except I'm running python 3.6 instead and this specifically mentions 3.7. | 15:37 |
zaitcev | I was hitting https://github.com/eventlet/eventlet/issues/526 and I noticed it on 3.6 first. This does not have a direct relation to your deadlock, but it tells us that the lowest broken and highest working versions are not always reliable. | 15:39 |
DHE | fair | 15:39 |
*** manuvakery has quit IRC | 16:03 | |
*** lxkong has quit IRC | 16:11 | |
*** manuvakery has joined #openstack-swift | 16:14 | |
*** psachin has quit IRC | 16:15 | |
ormandj | timburke: you'll be happy to know we're rolling out servers_per_port as we speak | 16:26 |
ormandj | will let you know how it goes | 16:26 |
ormandj | on a point unrelated to that - why would we be seeing lots of timeouts/errors getting responses from the object-updater connecting to the container-server port/disk? the disks are basically idle (and are SSD) for container dbs | 16:27
timburke | DHE, fwiw i've been running into https://bugs.launchpad.net/swift/+bug/1895739 at home | 16:56 |
openstack | Launchpad bug 1895739 in OpenStack Object Storage (swift) "Proxy server sometimes deadlocks while logging client disconnect" [Undecided,New] | 16:56 |
timburke | for the moment, i'm running with something like http://paste.openstack.org/show/798024/ to see if the problem goes away, or if any *new* problems creep up -- so far, so good 🤞 | 16:58 |
DHE | stack xray? I like it already... | 16:59 |
timburke | super-handy! torgomatic's awesome | 17:00 |
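(For reference: the xray tool attaches from outside with gdb, but a rough in-process equivalent can be built on the greenlet API. This is a generic sketch, not python-stack-xray itself; the helper name is made up, and it assumes the greenlet package is importable:)
    import gc
    import traceback
    import greenlet

    def dump_greenthread_stacks():
        # Every live greenlet is reachable via the garbage collector;
        # gr_frame is the frame it is currently suspended in.
        for obj in gc.get_objects():
            if isinstance(obj, greenlet.greenlet) and obj.gr_frame is not None:
                print('--- %r ---' % obj)
                traceback.print_stack(obj.gr_frame)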
DHE | so I'm currently on 2.23.1 but it looks like it will install cleanly | 17:05 |
timburke | oh yeah, zaitcev, have you tried latest eventlet recently? i *think* the SSLContext thing should be fixed now... | 17:24
zaitcev | timburke: on my list. I'm just back from the desert today. | 17:24 |
timburke | oh, no worries! just figured i'd check :-) | 17:25 |
timburke | good trip? i know i really needed a trip to the mountains a few weeks ago | 17:25 |
zaitcev | It was okay. Lost 16 pounds. | 17:26 |
zaitcev | While eating mostly chicken and sausage. | 17:26
DHE | oh I found a hung process! one moment please! | 17:33 |
DHE | http://paste.openstack.org/show/IRDAu64Piux39lZIRbkJ/ | 17:37 |
*** irclogbot_1 has quit IRC | 17:58 | |
*** irclogbot_2 has joined #openstack-swift | 18:02 | |
timburke | what happened to the poor guys that we spawned up at https://github.com/openstack/swift/blob/2.23.1/swift/proxy/controllers/base.py#L1335-L1336 ? :-/ they were supposed to be making the backend requests | 18:31
timburke | DHE, try grepping logs for "STDERR" or "_make_node_request" | 18:31 |
DHE | STDERR shows a lot of memcached errors | 18:33 |
timburke | (the biggest downside to the xray thing is that it can only tell you where things are *now*, after things have broken -- it's like how i always want a "rewind" option in debuggers) | 18:34 |
timburke | fwiw, the memcache logging will improve somewhat with https://github.com/openstack/swift/commit/e4586fdcd | 18:34 |
DHE | memcached is limiting itself to 1024 connections, I suspect that is the reason... | 18:35 |
timburke | i could see that causing issues, yeah. fwiw, i think we run with like 16k by default | 18:38 |
timburke | i don't think it's the root cause for *this* issue, of course. by the time we get down to making and waiting on backend connections, we shouldn't be touching memcache | 18:40 |
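(Back-of-the-envelope arithmetic on why a 1024-connection cap runs out fast -- every number below is made up purely for illustration, not taken from this cluster:)
    # Each proxy worker keeps its own small pool of memcache connections
    # per memcached server, so the total scales multiplicatively.
    proxy_nodes = 4
    workers_per_node = 64   # assumed worker count per node
    conns_per_worker = 4    # assumed per-worker pool size
    print(proxy_nodes * workers_per_node * conns_per_worker)  # 1024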
openstackgerrit | Tim Burke proposed openstack/swift master: Authors/ChangeLog for 2.26.0 https://review.opendev.org/750537 | 18:56 |
*** ccamel has quit IRC | 18:57 | |
*** camelCaser has joined #openstack-swift | 19:15 | |
DHE | is it possible to print the active thread's stack? I can run "bt" in gdb but have no idea how to interpret that | 19:16 |
DHE | (I know enough to be dangerous but not enough to be useful) | 19:16 |
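(A gdb-free way to see what the active thread is doing is the stdlib faulthandler module; it dumps every OS thread's Python stack, though it can't see individual greenthreads. A minimal sketch, assuming a Unix signal is acceptable:)
    import faulthandler
    import signal

    # After this, `kill -USR1 <pid>` makes the process dump all thread
    # stacks to stderr without stopping it.
    faulthandler.register(signal.SIGUSR1, all_threads=True)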
*** openstackgerrit has quit IRC | 19:21 | |
DHE | so I've dumped 3 processes so far. 1 has your nested current_thread call deadlock. the other 2 are nigh-identical and one is the paste I gave above. the only significant difference is the top stack. | 19:30 |
*** manuvakery has quit IRC | 20:03 | |
*** openstackgerrit has joined #openstack-swift | 20:21 | |
openstackgerrit | Tim Burke proposed openstack/swift master: Add a new URL parameter to allow for async cleanup of SLO segments https://review.opendev.org/733026 | 20:21 |
timburke | DHE, so basically any stack that ends with `self.greenlet.switch()` should be inactive - it's registered what it's waiting on with eventlet and put itself to sleep. generally when you observe a deadlock, the only active thread will be the eventlet hub, and it'll be in a sleep-a-little-then-check-what's-ready loop | 20:33 |
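(A tiny eventlet demo of that behaviour, assuming a stock eventlet install: a greenthread blocked on an Event registers a waiter and switches back to the hub, so its stack ends in the hub switch until something wakes it:)
    import traceback
    import eventlet
    from eventlet import event

    evt = event.Event()
    waiter = eventlet.spawn(evt.wait)  # will park itself in the hub
    eventlet.sleep(0)                  # let it run up to the blocking point
    traceback.print_stack(waiter.gr_frame)  # innermost frame is the hub switch
    evt.send('done')                   # wake it; now it can finish
    print(waiter.wait())               # -> 'done'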
timburke | good to know that you've seen this nested-current_thread-call issue, too! i'm not sure why it hasn't come up before -- it really seems like it should've been an issue with py2 as well... | 20:35 |
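(A generic illustration of that failure class -- deliberately not Swift's or eventlet's actual code path -- is a nested call re-entering code that already holds a plain, non-reentrant lock; the thread then waits forever on a lock it already owns:)
    import threading

    _lock = threading.Lock()

    def lookup(reentered=False):
        with _lock:
            if not reentered:
                lookup(reentered=True)  # blocks here forever

    # lookup()  # uncommenting this hangs the process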
DHE | yes, but 1 out of 3 isn't reassuring. and when I strace'd a hung thread it was in a futex that wasn't time-limited | 20:35 |
*** lxkong has joined #openstack-swift | 20:36 | |
*** lxkong has quit IRC | 20:36 | |
DHE | proxy server isn't actually multi-threaded and doesn't share memory with other instances, right? so there isn't anything that could cause a wake-up | 20:36 |
DHE | unless somehow the bug manifests in other ways | 20:37 |
DHE | sorry, I've been spending most of the day trying to figure this one out. bit us pretty hard this morning | 20:39 |
DHE | and this is definitely not my area of expertise | 20:40 |
timburke | might try picking up the changes in https://github.com/swiftstack/python-stack-xray/pull/2/files so you can see pthread stacks, too -- i think those should all be "active" | 21:17 |
DHE | okay that shows a new stack with your current_thread doublecall that wasn't in the previous output | 21:34 |
DHE | yeah this looks right | 22:06 |
*** mgagne has joined #openstack-swift | 22:37 | |
openstackgerrit | Tim Burke proposed openstack/swift master: Add a new URL parameter to allow for async cleanup of SLO segments https://review.opendev.org/733026 | 22:39 |
timburke | reeeally... mind sharing? i'd only ever seen it in a greenthread stack iirc | 22:39 |
openstackgerrit | Tim Burke proposed openstack/swift master: Run swift-tox-func-encryption-py37 job in the gate https://review.opendev.org/752580 | 22:47 |
*** rcernin has joined #openstack-swift | 23:05 | |
DHE | http://paste.openstack.org/show/798034/ oh sorry, yeah I should share that | 23:49 |