opendevreview | Jianjian Huo proposed openstack/swift master: Container-server: add container namespaces GET https://review.opendev.org/c/openstack/swift/+/890470 | 00:57 |
opendevreview | Merged openstack/swift master: slo: 500 if we can't load the manifest https://review.opendev.org/c/openstack/swift/+/887230 | 01:11 |
zigo | timburke: Thanks, let me try that patch. | 09:16 |
zigo | timburke: The patch looks like it fixes that StopIteration issue for me, though I still get this unit test failure: | 09:21 |
zigo | https://paste.opendev.org/show/bB2hp2iu7RaVGsQFzTdd/ | 09:21 |
zigo | Maybe I can ignore that one? | 09:21 |
opendevreview | Matthew Vernon proposed openstack/swift master: swift-expired-status CLI tool to inspect the expiry queue https://review.opendev.org/c/openstack/swift/+/893861 | 10:54 |
opendevreview | Matthew Vernon proposed openstack/swift master: swift-expired-status: tool to inspect the expiry queue https://review.opendev.org/c/openstack/swift/+/893861 | 10:55 |
opendevreview | Alistair Coles proposed openstack/swift master: Improve FakeSwift Backend-Ignore-Range support https://review.opendev.org/c/openstack/swift/+/893577 | 12:08 |
opendevreview | ASHWIN A NAIR proposed openstack/swift master: wip: refactor slo https://review.opendev.org/c/openstack/swift/+/893578 | 18:55 |
kota | good morning | 20:58 |
zaitcev | ossu | 20:58 |
seongsoocho | good morning | 20:59 |
mattoliver | Morning | 21:02 |
seongsoocho | Hi~! | 21:03 |
* zaitcev taps his foot | 21:04 |
* zaitcev pokes timburke | 21:04 |
zaitcev | I have a question for an open discussion slot... What happens if one back-end node goes down? Yeah, really. | 21:04 |
kota | At which point? During user requests, or from a cluster-management perspective? | 21:06 |
zaitcev | I'm looking at our code for error state in proxy. | 21:09 |
kota | In the write context, a single backend failing doesn't matter, because the remaining two nodes can respond with a correct 201, write? (and also, if the failure can be detected at the beginning of the request, we can use handoff nodes) | 21:12 |
mattoliver | It gets error-limited, and new PUTs are put on handoffs. | 21:12 |
kota | not write, right | 21:12 |
mattoliver | Well what kota said 😀 | 21:13 |
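(A rough sketch of the mechanics kota and mattoliver are describing: the ring hands the proxy a partition and its primary nodes, and get_more_nodes yields the handoffs it falls back to when a primary is down or error-limited. The ring path and the account/container/object names below are placeholders, assuming a stock SAIO-style layout.)

```python
# Sketch: primaries vs. handoffs for one object, assuming a standard
# object ring at /etc/swift/object.ring.gz; names are placeholders.
from itertools import islice

from swift.common.ring import Ring

ring = Ring('/etc/swift', ring_name='object')
part, primaries = ring.get_nodes('AUTH_test', 'mycontainer', 'myobject')

for node in primaries:
    print('primary:', node['ip'], node['port'], node['device'])

# When a primary is down or error-limited, the proxy walks this
# generator instead; a PUT still succeeds as long as a quorum
# (2 of 3 for a 3-replica policy) returns 201.
for node in islice(ring.get_more_nodes(part), 3):
    print('handoff:', node['ip'], node['port'], node['device'])
```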
kota | and, perhaps no meeting? | 21:16 |
zaitcev | Apparently the issue is that an operator has a cluster that is very slow, and they narrowed it down to a particular node being slow enough to cause connect timeouts. However, this does not result in the node getting error-limited. | 21:17 |
zaitcev | Maybe it would be limited if it either reset connections or returned HTTP errors, but connect timeouts aren't doing it. | 21:17 |
mattoliver | Are they using servers per port? One slow disk can slow a whole ide down. | 21:18 |
mattoliver | *node down | 21:18 |
mattoliver | Oh really? I would've thought they would. Need to look at the code. | 21:19 |
kota | Got it, slow-node handling. Hmm... | 21:19 |
zaitcev | I'll look into the servers per port thing. We do not enable it by default, I think. | 21:19 |
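(For reference, zaitcev is right that it's off by default: servers_per_port is an object-server [DEFAULT] option that defaults to 0. A hedged illustration of turning it on; the value below is just an example, and the feature only works if the ring assigns each disk its own port.)

```ini
# object-server.conf -- illustrative values, not a recommendation
[DEFAULT]
# 0 (the default) disables the feature; with N > 0, each port in the
# ring gets N dedicated server processes, so one slow disk can't
# stall the listeners serving every other disk on the node.
servers_per_port = 4
```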
opendevreview | Merged openstack/swift master: Improve FakeSwift Backend-Ignore-Range support https://review.opendev.org/c/openstack/swift/+/893577 | 21:26 |
mattoliver | Hmm, in base, on ConnectionTimeout we do call the proxy's exception_occurred, which is supposed to increment the error limit value for the node. | 21:28 |
zaitcev | Right. | 21:28 |
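(A simplified sketch of the pattern mattoliver is pointing at, not the exact proxy code: the connect attempt is wrapped in a ConnectionTimeout, and the except path is what calls exception_occurred to bump the node's error count. The helper name and structure here are hypothetical; the real logic lives in swift/proxy/controllers/base.py and is more involved.)

```python
# Illustrative only: roughly how the proxy's backend connect path
# feeds the error limiter on a connect timeout.
from swift.common.bufferedhttp import http_connect
from swift.common.exceptions import ConnectionTimeout

def connect_to_node(app, node, part, method, path, headers):
    """Hypothetical helper; returns a connection or None."""
    try:
        with ConnectionTimeout(app.conn_timeout):
            conn = http_connect(node['ip'], node['port'], node['device'],
                                part, method, path, headers=headers)
        return conn
    except (Exception, ConnectionTimeout):
        # ConnectionTimeout subclasses eventlet's Timeout (a
        # BaseException), so it must be caught explicitly.  This call
        # is what increments the node's error count.
        app.exception_occurred(node, 'Object',
                               'Trying to %s %s' % (method, path))
        return None
```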
mattoliver | Well, it looks like it's supposed to. Is it intermittent enough that the node's error-limit suppression threshold is too high to ever actually trigger in the time frame? | 21:30 |
zaitcev | Unfortunately, I don't know. | 21:31 |
zaitcev | I don't even have the ring layout yet. | 21:31 |
mattoliver | Yeah, we need more info on this; there is no way to get ready error_limit date without logs in debug. Tracing pulls this data out, so you can see the state of the error_limit dictionary from the proxy you visit. | 21:32 |
mattoliver | *Get "real" error limit data | 21:33 |
mattoliver | Sorry on phone and it keeps auto correcting | 21:33 |
zaitcev | np | 21:35 |
mattoliver | Yeah, I'd get the suppression count and suppression interval; if there aren't enough errors happening within the interval, it won't get limited. | 21:35 |
mattoliver | So maybe it's too intermittent. Could adjust those values. But also look at servers per port (or whatever it's called), as that means a single slow disk doesn't slow down the whole backend service. Although I guess it could just as easily be other latency issues. | 21:36 |
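(To make the suppression arithmetic concrete: the stock proxy-server.conf defaults are error_suppression_limit = 10 and error_suppression_interval = 60. Below is a minimal, self-contained sketch of that bookkeeping; Swift's actual implementation differs in detail, but the takeaway is the same: errors spaced wider than the interval never accumulate, so intermittent connect timeouts may never trip the limiter.)

```python
# Minimal sketch of error-suppression bookkeeping using the stock
# proxy-server.conf defaults; Swift's own code differs in detail.
import time

ERROR_SUPPRESSION_INTERVAL = 60   # seconds
ERROR_SUPPRESSION_LIMIT = 10      # errors tolerated before limiting

class NodeErrors:
    def __init__(self):
        self.count = 0
        self.last_error = 0.0

    def record_error(self):
        now = time.time()
        if now - self.last_error > ERROR_SUPPRESSION_INTERVAL:
            # The previous streak aged out, so the count restarts --
            # this is why sufficiently intermittent connect timeouts
            # never push a node over the limit.
            self.count = 0
        self.count += 1
        self.last_error = now

    def is_limited(self):
        return (self.count > ERROR_SUPPRESSION_LIMIT and
                time.time() - self.last_error <= ERROR_SUPPRESSION_INTERVAL)
```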
mattoliver | Also guessing there will be no meeting today 😜 | 21:37 |
zaitcev | The report says "we can easily reproduce it with iptables with -j DROP". Okay duh. | 21:37 |
mattoliver | Lol | 21:37 |
mattoliver | I can play with a saio and see what happens when it times out (post breakfast). | 21:40 |
mattoliver | zaitcev: once you get more info, if you want/need to chat more just ping me, esp. if you just need another sounding board. | 21:40 |
zaitcev | mattoliver: thanks! | 21:40 |
opendevreview | Tim Burke proposed openstack/swift master: Clean up watchdog threads https://review.opendev.org/c/openstack/swift/+/885812 | 22:18 |
opendevreview | Tim Burke proposed openstack/swift master: Add our own sys.unraisablehook https://review.opendev.org/c/openstack/swift/+/893991 | 22:19 |