*** psachin has joined #openstack-swift | 03:39 | |
*** evrardjp has quit IRC | 04:33 | |
*** evrardjp has joined #openstack-swift | 04:33 | |
*** m75abrams has joined #openstack-swift | 05:06 | |
*** tkajinam has quit IRC | 06:13 | |
*** tkajinam has joined #openstack-swift | 06:13 | |
*** manuvakery has joined #openstack-swift | 06:45 | |
*** rpittau|afk is now known as rpittau | 07:29 | |
*** mikecmpbll has joined #openstack-swift | 07:49 | |
*** mikecmpbll has quit IRC | 09:50 | |
*** mikecmpbll has joined #openstack-swift | 09:53 | |
*** dsariel has joined #openstack-swift | 10:24 | |
*** tkajinam has quit IRC | 13:49 | |
*** tkajinam has joined #openstack-swift | 13:49 | |
*** tonyb has quit IRC | 14:50 | |
*** m75abrams has quit IRC | 15:06 | |
*** gyee has joined #openstack-swift | 15:34 | |
*** rpittau is now known as rpittau|afk | 16:06 | |
timburke | good morning | 16:16 |
---|---|---|
zaitcev | Good Morning! | 16:46 |
clayg | timburke: so trying to test p 735271 - I don't know that "timing since" is really the right way to emit these stats | 16:47 |
patchbot | https://review.opendev.org/#/c/735271/ - swift - metrics: Add lag metric to expirer - 1 patch set | 16:48 |
clayg | like only get metrics for count, mean and upper - do we only care about upper? | 16:48 |
clayg | there's an interesting line for error-404.lag.upper until it hits a reclaim age - most everything else is just spotty. | 16:49 |
clayg | the real values are NUTS too - like i might have a 30s lag on a successful - but I approach a reclaim age on 404's 🤷‍♂️ | 16:50 |
clayg | Maybe I'm still stuck thinking about the expirer queue in terms of "how big is the volume of work that's past its deadline" instead of "how late was I on this particular item" 🤔 | 16:50 |
timburke | clayg, i could see some value in count -- if you've got something like your script to do some GETs/HEADs in the expiring-object account and see that you've got a bit of a backlog, the counts can give you an idea of how long it'll take you to get through them all | 16:51 |
clayg | successful.lag.count then is a proxy for processing rate - and you can just sum that up across nodes? | 16:52 |
clayg | is that significantly better than extrapolating from the graph of "how fast is the backlog going down" - I mean i guess so, it's more real time (no container-updates needed); if you're adjusting concurrency tuning etc that's useful | 16:53 |
timburke | don't see why not -- though we've already got an "objects" counter that'd do the same thing... | 16:54 |
timburke | the tendency of upper to approach reclaim age makes it seem like mean (or maybe 90th percentile?) would be useful, too -- sure, i've got some stale queue entries that are going to need to get reaped after a reclaim age -- but how are the *rest* of the entries doing? | 16:55 |
openstackgerrit | Tim Burke proposed openstack/swift master: squash: More logging refactoring; stop needing an extra catch_errors https://review.opendev.org/755906 | 16:56 |
clayg | Have you played with these metrics in a dev environment? do you have graphs you LIKE? I'm probably failing to conceptualize them... | 16:56 |
clayg | I have a value on a graph of "542K" for object-expirer.successful.lag.upper - it's part of a series that went up then down then back up... what... does... that... mean? | 16:57 |
clayg | did I really have my expirer slowed down so much it successfully deleted an expired object six days late? I only had it running over the weekend! 🤔 | 16:58 |
timburke | i've not actually seen it graphed yet, no. it was largely speculative that it'd be useful, and drawn somewhat by analogy from p 715580 -- way i see it, we've got an item of work, it became valid to perform it at such-and-such time, it's worth tracking how long it takes for us to actually get it done | 17:08 |
patchbot | https://review.opendev.org/#/c/715580/ - swift - obj-updater: add metric on lag of containers listing - 1 patch set | 17:08 |
timburke | there will be times when a whole bunch of stuff is all available to be done at once, and it's going to lead to spikes on the graph -- if it stays under reclaim age, i know i'm still good | 17:10 |
timburke | if upper *and* median keep trending toward reclaim age, i probably need to think about upping my reclaim age; otherwise there's gonna be some data that never gets cleaned up properly | 17:12 |
timburke | if upper trends toward reclaim age but median seems "healthy", i guess i won't worry too much -- there's some stale work that's taking a while to process and i should maybe look at the health of my object-updaters? | 17:13 |
clayg | but, like if we have multiple nodes you can't just "sum" the upper - so you take an "average" - but then what does that really mean... I mean... I don't know what it means already maybe a mean of the mean would be mean 🤮 | 17:34 |
timburke | no -- you can look at the upper of the upper, and you might even be able to make sense of the upper of the mean, but my gut says trying to average across nodes is unlikely to give you anything useful | 18:11 |
*** psachin has quit IRC | 18:16 | |
*** manuvakery has quit IRC | 18:45 | |
openstackgerrit | Tim Burke proposed openstack/swift master: squash: More logging refactoring; stop needing an extra catch_errors https://review.opendev.org/755906 | 18:46 |
clayg | oh right, max of the upper across nodes 🤔 | 18:53 |
clayg | I might have to try again to set up an experiment, right now i've just got https://gist.github.com/clayg/e6ddc8396b72683d162ce56c52b5b390 and i made some to expire @ 30, 300, 3000, 30000, etc. | 18:55 |
*** viks____ has quit IRC | 19:02 | |
*** edausq has quit IRC | 20:25 | |
*** tonyb has joined #openstack-swift | 21:22 | |
openstackgerrit | Pete Zaitcev proposed openstack/swift master: DNM: follow-up for Dark Data #35 https://review.opendev.org/756163 | 21:48 |
zaitcev | What a can of worms | 21:48 |
zaitcev | timburke: I wanted to consult with you about this real quick though. Suppose we have plugins X and Y. X runs first and quarantines the object. Should Y get called on the already-quarantined object? | 21:56 |
zaitcev | timburke: There's also a Solomon solution: prohibit quarantining by plugins and just not do it. | 21:57 |
timburke | my gut says no -- but i also don't want us stat'ing the file to see whether it still exists between every plugin invocation... | 21:58 |
timburke | maybe we could make it part of the api that plugins aren't allowed to quarantine themselves, but they can instead signal that the auditor should do it? | 21:59 |
timburke | or maybe, *if* a plugin quarantines a file itself, it has to let a DiskFileQuarantined bubble out... | 22:00 |
zaitcev | https://review.opendev.org/#/c/756163/1/swift/obj/audit_dark_data.py does just that | 22:01 |
patchbot | patch 756163 - swift - DNM: follow-up for Dark Data #35 - 1 patch set | 22:01 |
zaitcev | However, that exception causes the execution of the chain of plugins to stop. | 22:01 |
zaitcev | As you see, David omitted the raise, so other plugins were called anyway. I thought it was asking for trouble. | 22:02 |
timburke | aborting the chain seems reasonable to me -- it's not in the objects tree, so no more watching | 22:04 |
zaitcev | Thanks a lot | 22:05 |
*** rcernin has joined #openstack-swift | 22:19 | |
openstackgerrit | Tim Burke proposed openstack/swift stable/ussuri: Authors/ChangeLog for 2.25.1 https://review.opendev.org/756166 | 22:21 |
*** samueldmq has quit IRC | 22:22 | |
*** samueldmq has joined #openstack-swift | 22:26 | |
mattoliverau | morning | 22:30 |
openstackgerrit | Tim Burke proposed openstack/swift stable/train: ChangeLog for 2.23.2 https://review.opendev.org/756167 | 22:47 |
*** mikecmpbll has quit IRC | 23:12 | |
*** mikecmpbll has joined #openstack-swift | 23:15 | |
*** dsariel has quit IRC | 23:53 |
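
The cross-node aggregation discussed above (sum the counts, take the max of the per-node `upper`, avoid averaging across nodes) can be sketched roughly as follows. This is only an illustration: `NodeLagStats`, `aggregate_lag`, and `estimate_drain_seconds` are hypothetical names, not part of Swift or of patch 735271, and the sample numbers are made up. The sketch assumes each node reports `count`, `mean`, and `upper` for one flush interval.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class NodeLagStats:
    """Hypothetical per-node summary of a lag timer for one flush interval."""
    node: str
    count: int    # number of timing samples emitted in the interval
    mean: float   # mean lag in seconds
    upper: float  # largest lag in seconds


def aggregate_lag(stats: Iterable[NodeLagStats]) -> dict:
    """Combine per-node summaries without averaging across nodes."""
    stats = list(stats)
    if not stats:
        return {"count": 0, "max_upper": None, "max_mean": None}
    return {
        # Counts are additive, so the sum is a cluster-wide processing count.
        "count": sum(s.count for s in stats),
        # "upper of the upper": the worst lag seen on any node.
        "max_upper": max(s.upper for s in stats),
        # "upper of the mean": the node whose typical lag is worst.
        "max_mean": max(s.mean for s in stats),
    }


def estimate_drain_seconds(backlog: int, total_count: int,
                           interval_seconds: float) -> Optional[float]:
    """Rough time to work through a backlog at the observed rate."""
    if total_count <= 0 or interval_seconds <= 0:
        return None  # no observed progress, so no estimate
    rate = total_count / interval_seconds  # items per second
    return backlog / rate


# Made-up sample data for two nodes over a 10 second interval.
sample = [
    NodeLagStats("node1", count=10, mean=30.0, upper=120.0),
    NodeLagStats("node2", count=5, mean=45.0, upper=542000.0),
]
summary = aggregate_lag(sample)
drain = estimate_drain_seconds(300, summary["count"], 10.0)
```

Keeping `max_upper` and `max_mean` separate matches the point made in the channel: a high `upper` next to a healthy mean suggests a few stale entries rather than a queue that is falling behind overall.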
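
The plugin-chain behaviour agreed on above (a plugin that quarantines a file lets `DiskFileQuarantined` bubble out, and the auditor stops calling later plugins) might look roughly like this. It is a simplified sketch, not the code in review 756163: the `DiskFileQuarantined` class here is a local stand-in for Swift's exception of the same name, and `run_watchers`, the watcher classes, and the single-argument `see_object(path)` signature are assumptions made for illustration.

```python
class DiskFileQuarantined(Exception):
    """Local stand-in for Swift's exception of the same name."""


class QuarantiningWatcher:
    """Hypothetical plugin that quarantines every object it sees."""
    name = "x"

    def see_object(self, path):
        # A real plugin would move the file aside before raising.
        raise DiskFileQuarantined(path)


class CountingWatcher:
    """Hypothetical plugin that only counts the objects it sees."""
    name = "y"

    def __init__(self):
        self.seen = 0

    def see_object(self, path):
        self.seen += 1


def run_watchers(watchers, path):
    """Call each watcher in order; abort the chain once one quarantines.

    Returns (names of the watchers that were called, whether the object
    was quarantined).
    """
    called = []
    for watcher in watchers:
        called.append(watcher.name)
        try:
            watcher.see_object(path)
        except DiskFileQuarantined:
            # The file is no longer in the objects tree, so later
            # watchers are skipped instead of stat'ing it again.
            return called, True
    return called, False


counter = CountingWatcher()
result = run_watchers([QuarantiningWatcher(), counter], "/hypothetical/object.data")
```

Using the exception as the signal avoids checking whether the file still exists between plugin invocations, which was the concern raised in the channel.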