Monday, 2020-10-05

*** psachin has joined #openstack-swift		03:39
*** evrardjp has quit IRC		04:33
*** evrardjp has joined #openstack-swift		04:33
*** m75abrams has joined #openstack-swift		05:06
*** tkajinam has quit IRC		06:13
*** tkajinam has joined #openstack-swift		06:13
*** manuvakery has joined #openstack-swift		06:45
*** rpittau|afk is now known as rpittau		07:29
*** mikecmpbll has joined #openstack-swift		07:49
*** mikecmpbll has quit IRC		09:50
*** mikecmpbll has joined #openstack-swift		09:53
*** dsariel has joined #openstack-swift		10:24
*** tkajinam has quit IRC		13:49
*** tkajinam has joined #openstack-swift		13:49
*** tonyb has quit IRC		14:50
*** m75abrams has quit IRC		15:06
*** gyee has joined #openstack-swift		15:34
*** rpittau is now known as rpittau|afk		16:06
timburke	good morning	16:16
zaitcev	Good Morning!	16:46
clayg	timburke: so trying to test p 735271 - I don't know that "timing since" is really the right way to emit these stats	16:47
patchbot	https://review.opendev.org/#/c/735271/ - swift - metrics: Add lag metric to expirer - 1 patch set	16:48
clayg	like only get metrics for count, mean and upper - do we only care about upper?	16:48
clayg	there's an interesting line for error-404.lag.upper until it hits a reclaim age - most everything else is just spotty.	16:49
clayg	the real values are NUTS too - like i might have a 30s lag on a successful - but I approach a reclaim age on 404's 🤷‍♂️	16:50
clayg	Maybe I'm still stuck thinking about the expirer queue in terms of "how big is the volume of work that's past its deadline" instead of "how late was I on this particular item" 🤔	16:50
timburke	clayg, i could see some value in count -- if you've got something like your script to do some GETs/HEADs in the expiring-object account and see that you've got a bit of a backlog, the counts can give you an idea of how long it'll take you to get through them all	16:51
clayg	successful.lag.count then is a proxy for processing rate - and you can just sum that up across nodes?	16:52
clayg	is that significantly better than extrapolating from the graph of "how fast is the backlog going down" - I mean i guess so, it's more real time (no container-updates needed) if you're adjusting concurrency tuning etc that's useful	16:53
timburke	don't see why not -- though we've already got an "objects" counter that'd do the same thing...	16:54
timburke	the tendency of upper to approach reclaim age makes it seem like mean (or maybe 90th percentile?) would be useful, too -- sure, i've got some stale queue entries that are going to need to get reaped after a reclaim age -- but how are the rest of the entries doing?	16:55
openstackgerrit	Tim Burke proposed openstack/swift master: squash: More logging refactoring; stop needing an extra catch_errors https://review.opendev.org/755906	16:56
clayg	Have you played with these metrics in a dev environment? do you have graphs you LIKE? I'm probably failing to conceptualize them...	16:56
clayg	I have a value on a graph of "542K" for object-expirer.successful.lag.upper - it's part of a series that went up then down then back up... what... does... that... mean?	16:57
clayg	did I really have my expirer slowed down so much it successfully deleted an expired object six days late? I only had it running over the weekend! 🤔	16:58
timburke	i've not actually seen it graphed yet, no. it was largely speculative that it'd be useful, and drawn somewhat by analogy from p 715580 -- way i see it, we've got an item of work, it became valid to perform it at such-and-such time, it's worth tracking how long it takes for us to actually get it done	17:08
patchbot	https://review.opendev.org/#/c/715580/ - swift - obj-updater: add metric on lag of containers listing - 1 patch set	17:08
timburke	there will be times when a whole bunch of stuff is all available to be done at once, and it's going to lead to spikes on the graph -- if it stays under reclaim age, i know i'm still good	17:10
timburke	if upper and median keep trending toward reclaim age, i probably need to think about upping my reclaim age; otherwise there's gonna be some data that never gets cleaned up properly	17:12
timburke	if upper trends toward reclaim age but median seems "healthy", i guess i won't worry too much -- there's some stale work that's taking a while to process and i should maybe look at the health of my object-updaters?	17:13
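(Editor's sketch, not from the channel.) The rule of thumb in the last three messages can be written down as a small check. This is only an illustration: the function name, the returned labels, and the `warn_fraction` threshold are assumptions made up for this example, and "trending toward reclaim age" is approximated here by a single snapshot crossing a fraction of the reclaim age. Values are taken to be in seconds.

```python
def assess_lag(upper, median, reclaim_age, warn_fraction=0.8):
    """Classify expirer lag against reclaim age (all values in seconds).

    warn_fraction is a hypothetical threshold chosen for illustration only.
    """
    limit = warn_fraction * reclaim_age
    if upper < limit:
        # Even the latest item was handled well inside reclaim age.
        return "ok"
    if median < limit:
        # Only the tail is late: some stale entries, typical work is fine.
        return "stale-entries"
    # Typical work is also approaching reclaim age.
    return "falling-behind"


ONE_WEEK = 7 * 24 * 60 * 60  # 604800 seconds

print(assess_lag(upper=30, median=10, reclaim_age=ONE_WEEK))
print(assess_lag(upper=542000, median=30, reclaim_age=ONE_WEEK))
print(assess_lag(upper=600000, median=500000, reclaim_age=ONE_WEEK))
```

A real dashboard would more likely look at the trend over time than at one sample; the snapshot version just keeps the three cases easy to see.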
clayg	but, like if we have multiple nodes you can't just "sum" the upper - so you take an "average" - but then what does that really mean... I mean... I don't know what it means already maybe a mean of the mean would be mean 🤮	17:34
timburke	no -- you can look at the upper of the upper, and you might even be able to make sense of the upper of the mean, but my gut says trying to average across nodes is unlikely to give you anything useful	18:11
*** psachin has quit IRC		18:16
*** manuvakery has quit IRC		18:45
openstackgerrit	Tim Burke proposed openstack/swift master: squash: More logging refactoring; stop needing an extra catch_errors https://review.opendev.org/755906	18:46
clayg	oh right, max of the upper across nodes 🤔	18:53
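(Editor's sketch, not from the channel.) One way to picture the cross-node aggregation discussed above: sum the counts, and take the maximum of the per-node upper (and, optionally, the maximum of the per-node mean) rather than averaging. The dictionary layout, node names, and function name are assumptions for illustration; they are not the format Swift or any statsd backend actually emits.

```python
def summarize_lag(per_node):
    """Combine per-node lag timer stats into cluster-level numbers.

    per_node maps a node name to a dict with 'count', 'mean' and 'upper'.
    Nodes that reported no samples are ignored.
    """
    stats = [s for s in per_node.values() if s["count"] > 0]
    if not stats:
        return {"count": 0, "max_mean": None, "max_upper": None}
    return {
        # Counts are additive, so a sum is meaningful.
        "count": sum(s["count"] for s in stats),
        # "Upper of the mean": the worst typical lag on any node.
        "max_mean": max(s["mean"] for s in stats),
        # "Upper of the upper": the single latest item anywhere.
        "max_upper": max(s["upper"] for s in stats),
    }


# Hypothetical sample values.
nodes = {
    "node1": {"count": 120, "mean": 35.0, "upper": 90.0},
    "node2": {"count": 80, "mean": 410.0, "upper": 542000.0},
    "node3": {"count": 0, "mean": 0.0, "upper": 0.0},
}
summary = summarize_lag(nodes)
print(summary)
```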
clayg	I might have to try again to set up an experiment, right now i've just got https://gist.github.com/clayg/e6ddc8396b72683d162ce56c52b5b390 and i made some to expire @ 30, 300, 3000, 30000, etc.	18:55
*** viks____ has quit IRC		19:02
*** edausq has quit IRC		20:25
*** tonyb has joined #openstack-swift		21:22
openstackgerrit	Pete Zaitcev proposed openstack/swift master: DNM: follow-up for Dark Data #35 https://review.opendev.org/756163	21:48
zaitcev	What a can of worms	21:48
zaitcev	timburke: I wanted to consult with you about this real quick though. Suppose we have plugins X and Y. X runs first and quarantines the object. Should Y get called to the already-quarantined object?	21:56
zaitcev	timburke: There's also a Solomon solution: prohibit quarantining by plugins and just not do it.	21:57
timburke	my gut says no -- but i also don't want us stat'ing the file to see whether it still exists between every plugin invocation...	21:58
timburke	maybe we could make it part of the api that plugins aren't allowed to quarantine themselves, but they can instead signal that the auditor should do it?	21:59
timburke	or maybe, if a plugin quarantines a file itself, it has to let a DiskFileQuarantined bubble out...	22:00
zaitcev	https://review.opendev.org/#/c/756163/1/swift/obj/audit_dark_data.py does just that	22:01
patchbot	patch 756163 - swift - DNM: follow-up for Dark Data #35 - 1 patch set	22:01
zaitcev	However, that exception causes the execution of chain of plugins to stop.	22:01
zaitcev	As you see, David omitted the raise, so other plugins were called anyway. I thought it was asking for trouble.	22:02
timburke	aborting the chain seems reasonable to me -- it's not in the objects tree, so no more watching	22:04
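(Editor's sketch, not from the channel.) The behaviour agreed on above, where a plugin that quarantines an object lets an exception bubble out and the remaining plugins are skipped, might look roughly like this. Everything here is hypothetical: `QuarantinedError` is a local stand-in for Swift's DiskFileQuarantined, and `run_plugins`, `plugin_x`, and `plugin_y` are illustrative names, not the auditor's actual API.

```python
class QuarantinedError(Exception):
    """Local stand-in for Swift's DiskFileQuarantined (illustration only)."""


def run_plugins(plugins, obj):
    """Call each plugin on obj in order; stop once one quarantines it.

    Returns the names of the plugins that were actually invoked.
    """
    called = []
    for plugin in plugins:
        called.append(plugin.__name__)
        try:
            plugin(obj)
        except QuarantinedError:
            # The object left the objects tree, so later plugins are skipped.
            break
    return called


def plugin_x(obj):
    # Pretend X decided the object is bad and quarantined it.
    raise QuarantinedError(obj)


def plugin_y(obj):
    # Y only observes the object.
    pass


print(run_plugins([plugin_x, plugin_y], "some-object"))
print(run_plugins([plugin_y, plugin_x], "some-object"))
```

Signalling through an exception avoids having to stat the file between plugin invocations, which was the concern raised earlier.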
zaitcev	Thanks a lot	22:05
*** rcernin has joined #openstack-swift		22:19
openstackgerrit	Tim Burke proposed openstack/swift stable/ussuri: Authors/ChangeLog for 2.25.1 https://review.opendev.org/756166	22:21
*** samueldmq has quit IRC		22:22
*** samueldmq has joined #openstack-swift		22:26
mattoliverau	morning	22:30
openstackgerrit	Tim Burke proposed openstack/swift stable/train: ChangeLog for 2.23.2 https://review.opendev.org/756167	22:47
*** mikecmpbll has quit IRC		23:12
*** mikecmpbll has joined #openstack-swift		23:15
*** dsariel has quit IRC		23:53
