opendevreview | Jianjian Huo proposed openstack/swift master: Container-server: add container namespaces GET https://review.opendev.org/c/openstack/swift/+/890470 | 00:57 |
opendevreview | Merged openstack/swift master: slo: 500 if we can't load the manifest https://review.opendev.org/c/openstack/swift/+/887230 | 01:11 |
zigo | timburke: Thanks, let me try that patch. | 09:16 |
zigo | timburke: The patch looks like it fixes that StopIteration issue for me, though I still get this unit test failure: | 09:21 |
zigo | https://paste.opendev.org/show/bB2hp2iu7RaVGsQFzTdd/ | 09:21 |
zigo | Maybe I can ignore that one? | 09:21 |
opendevreview | Matthew Vernon proposed openstack/swift master: swift-expired-status CLI tool to inspect the expiry queue https://review.opendev.org/c/openstack/swift/+/893861 | 10:54 |
opendevreview | Matthew Vernon proposed openstack/swift master: swift-expired-status: tool to inspect the expiry queue https://review.opendev.org/c/openstack/swift/+/893861 | 10:55 |
opendevreview | Alistair Coles proposed openstack/swift master: Improve FakeSwift Backend-Ignore-Range support https://review.opendev.org/c/openstack/swift/+/893577 | 12:08 |
opendevreview | ASHWIN A NAIR proposed openstack/swift master: wip: refactor slo https://review.opendev.org/c/openstack/swift/+/893578 | 18:55 |
kota | good morning | 20:58 |
zaitcev | ossu | 20:58 |
seongsoocho | good morning | 20:59 |
mattoliver | Morning | 21:02 |
seongsoocho | Hi~! | 21:03 |
* zaitcev taps his foot | 21:04 |
* zaitcev pokes timburke | 21:04 |
zaitcev | I have a question for an open discussion slot... What happens if one back-end node goes down? Yeah, really. | 21:04 |
kota | At which point? During user requests, or from a cluster-management perspective? | 21:06 |
zaitcev | I'm looking at our code for error state in proxy. | 21:09 |
kota | In the write context, a single backend failing doesn't matter, because the remaining two nodes can respond with a correct 201, write? (and also, if the failure can be detected at the beginning of the request, we can use handoff nodes) | 21:12 |
mattoliver | It gets error-limited, and new PUTs are put on handoffs. | 21:12 |
kota | not write, right | 21:12 |
mattoliver | Well what kota said 😀 | 21:13 |
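(A rough sketch of the mechanics kota and mattoliver are describing: the ring hands the proxy a partition and its primary nodes, and get_more_nodes yields the handoffs it falls back to when a primary is down or error-limited. The ring path and the account/container/object names below are placeholders, assuming a stock SAIO-style layout.)

```python
# Sketch: primaries vs. handoffs for one object, assuming a standard
# object ring at /etc/swift/object.ring.gz; names are placeholders.
from itertools import islice

from swift.common.ring import Ring

ring = Ring('/etc/swift', ring_name='object')
part, primaries = ring.get_nodes('AUTH_test', 'mycontainer', 'myobject')

for node in primaries:
    print('primary:', node['ip'], node['port'], node['device'])

# When a primary is down or error-limited, the proxy walks this
# generator instead; a PUT still succeeds as long as a quorum
# (2 of 3 for a 3-replica policy) returns 201.
for node in islice(ring.get_more_nodes(part), 3):
    print('handoff:', node['ip'], node['port'], node['device'])
```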
kota | and, perhaps no meeting? | 21:16 |
zaitcev | Apparently the issue is that an operator has a cluster that is very slow, and they narrowed it down to a particular node being slow enough to cause connect timeouts. However, this does not result in the node getting error-limited. | 21:17 |
zaitcev | Maybe it would be limited if it either reset connections or returned HTTP errors, but connect timeouts aren't doing it. | 21:17 |
mattoliver | Are they using servers per port? One slow disk can slow a whole ide down. | 21:18 |
mattoliver | *node down | 21:18 |
mattoliver | Oh really? I would've thought they would. Need to look at the code. | 21:19 |
kota | Got it, slow-node handling. Hmm... | 21:19 |
zaitcev | I'll look into the servers per port thing. We do not enable it by default, I think. | 21:19 |
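(For reference, zaitcev is right that it's off by default: servers_per_port is an object-server [DEFAULT] option that defaults to 0. A hedged illustration of turning it on; the value below is just an example, and the feature only works if the ring assigns each disk its own port.)

```ini
# object-server.conf -- illustrative values, not a recommendation
[DEFAULT]
# 0 (the default) disables the feature; with N > 0, each port in the
# ring gets N dedicated server processes, so one slow disk can't
# stall the listeners serving every other disk on the node.
servers_per_port = 4
```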
opendevreview | Merged openstack/swift master: Improve FakeSwift Backend-Ignore-Range support https://review.opendev.org/c/openstack/swift/+/893577 | 21:26 |
mattoliver | Hmm, in base, on ConnectionTimeout we do call the proxy's exception_occurred, which is supposed to increment the error limit value for the node. | 21:28 |
zaitcev | Right. | 21:28 |
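(A simplified sketch of the pattern mattoliver is pointing at, not the exact proxy code: the connect attempt is wrapped in a ConnectionTimeout, and the except path is what calls exception_occurred to bump the node's error count. The helper name and structure here are hypothetical; the real logic lives in swift/proxy/controllers/base.py and is more involved.)

```python
# Illustrative only: roughly how the proxy's backend connect path
# feeds the error limiter on a connect timeout.
from swift.common.bufferedhttp import http_connect
from swift.common.exceptions import ConnectionTimeout

def connect_to_node(app, node, part, method, path, headers):
    """Hypothetical helper; returns a connection or None."""
    try:
        with ConnectionTimeout(app.conn_timeout):
            conn = http_connect(node['ip'], node['port'], node['device'],
                                part, method, path, headers=headers)
        return conn
    except (Exception, ConnectionTimeout):
        # ConnectionTimeout subclasses eventlet's Timeout (a
        # BaseException), so it must be caught explicitly.  This call
        # is what increments the node's error count.
        app.exception_occurred(node, 'Object',
                               'Trying to %s %s' % (method, path))
        return None
```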
mattoliver | Well, it looks like it's supposed to. Is it intermittent enough that the node's error-limit suppression threshold is too high to ever actually trigger in the time frame? | 21:30 |
zaitcev | Unfortunately, I don't know. | 21:31 |
zaitcev | I don't even have the ring layout yet. | 21:31 |
mattoliver | Yeah, we need more info on this; there is no way to get ready error_limit date without logs in debug. Tracing pulls this data out, so you can see the state of the error_limit dictionary from the proxy you visit. | 21:32 |
mattoliver | *Get "real" error limit data | 21:33 |
mattoliver | Sorry on phone and it keeps auto correcting | 21:33 |
zaitcev | np | 21:35 |
mattoliver | Yeah, I'd get the suppression count and suppression interval; if there aren't enough errors happening within the interval, it won't get limited. | 21:35 |
mattoliver | So maybe it's too intermittent. Could adjust those values. But also look at servers per port (or whatever it's called), as that means a single slow disk doesn't slow down the whole backend service. Although I guess it could just as easily be other latency issues. | 21:36 |
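(To make the suppression arithmetic concrete: the stock proxy-server.conf defaults are error_suppression_limit = 10 and error_suppression_interval = 60. Below is a minimal, self-contained sketch of that bookkeeping; Swift's actual implementation differs in detail, but the takeaway is the same: errors spaced wider than the interval never accumulate, so intermittent connect timeouts may never trip the limiter.)

```python
# Minimal sketch of error-suppression bookkeeping using the stock
# proxy-server.conf defaults; Swift's own code differs in detail.
import time

ERROR_SUPPRESSION_INTERVAL = 60   # seconds
ERROR_SUPPRESSION_LIMIT = 10      # errors tolerated before limiting

class NodeErrors:
    def __init__(self):
        self.count = 0
        self.last_error = 0.0

    def record_error(self):
        now = time.time()
        if now - self.last_error > ERROR_SUPPRESSION_INTERVAL:
            # The previous streak aged out, so the count restarts --
            # this is why sufficiently intermittent connect timeouts
            # never push a node over the limit.
            self.count = 0
        self.count += 1
        self.last_error = now

    def is_limited(self):
        return (self.count > ERROR_SUPPRESSION_LIMIT and
                time.time() - self.last_error <= ERROR_SUPPRESSION_INTERVAL)
```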
mattoliver | Also guessing there will be no meeting today 😜 | 21:37 |
zaitcev | The report says "we can easily reproduce it with iptables with -j DROP". Okay duh. | 21:37 |
mattoliver | Lol | 21:37 |
mattoliver | I can play with a saio and see what happens when it times out (post breakfast). | 21:40 |
mattoliver | zaitcev: once you get more info, if you want/need to chat more just ping me, esp. if you just need another sounding board. | 21:40 |
zaitcev | mattoliver: thanks! | 21:40 |
opendevreview | Tim Burke proposed openstack/swift master: Clean up watchdog threads https://review.opendev.org/c/openstack/swift/+/885812 | 22:18 |
opendevreview | Tim Burke proposed openstack/swift master: Add our own sys.unraisablehook https://review.opendev.org/c/openstack/swift/+/893991 | 22:19 |