21:00:50 <timburke> #startmeeting swift
21:00:50 <opendevmeet> Meeting started Wed Mar 1 21:00:50 2023 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:50 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:50 <opendevmeet> The meeting name has been set to 'swift'
21:01:00 <timburke> who's here for the swift meeting?
21:01:08 <zaitcev> o/
21:01:12 <indianwhocodes> o/
21:01:45 <mattoliver> i'm kinda here, have the day off today so that means I'm on getting kids ready for school (however that works) :P
21:02:51 <timburke> i didn't get around to updating the agenda, but i think it's mostly going to be a couple updates from last week, maybe one interesting new thing i'm working on
21:03:20 <timburke> #topic ssync, data with offsets, and meta
21:03:34 <acoles> o/
21:03:51 <timburke> clayg's probe test got squashed into acoles's fix
21:03:59 <timburke> #link https://review.opendev.org/c/openstack/swift/+/874122
21:04:41 <timburke> we're upgrading our cluster now to include that fix; we should be sure to include feedback about how that went on the review
21:05:37 <timburke> being able to deal with metas with timestamps is still a separate review, but acoles seems to like the direction
21:05:40 <timburke> #link https://review.opendev.org/c/openstack/swift/+/874184
21:06:24 <acoles> timburke: persuaded me that we should fix a future bug while we had this all in our heads
21:06:29 <timburke> the timestamp-offset delimiter business still seems a little strange, but i didn't immediately see a better way to deal with it
21:07:56 <timburke> #topic http keepalive timeout
21:08:28 <timburke> so my eventlet patch merged! gotta admit, seemed easier to get merged than expected :-)
21:08:30 <timburke> #link https://github.com/eventlet/eventlet/pull/788
21:09:24 <timburke> which means i ought to revisit the swift patch to add config plumbing
21:09:28 <timburke> #link https://review.opendev.org/c/openstack/swift/+/873744
21:10:28 <timburke> are we all ok with turning it into a pure-plumbing patch, provided i make it clear in the sample config that the new option kinda requires new eventlet?
21:12:03 <acoles> what happens if the option is set without new eventlet?
21:13:12 <timburke> largely, existing behavior: keepalive is turned on, and with the general socket timeout (ie, client_timeout)
21:13:41 <timburke> it would also give the option of setting keepalive_timeout to 0 to turn off keepalive behavior
21:13:50 <mattoliver> Yup, do it
21:14:36 <acoles> ok
21:15:32 <timburke> all right then
21:15:34 <timburke> #topic per-policy quotas
21:15:45 <timburke> thanks for the reviews, mattoliver!
21:16:11 <timburke> test refactor is now landed, and there's a +2 on the code refactor
21:16:18 <timburke> #link https://review.opendev.org/c/openstack/swift/+/861487
21:16:27 <timburke> any reason not to just merge it?
21:17:44 <timburke> i suppose mattoliver's busy ;-) i can poke him more later
21:18:09 <timburke> the actual feature patch needs some docs -- i'll try to get that up this week
21:18:12 <timburke> #link https://review.opendev.org/c/openstack/swift/+/861282
21:19:22 <timburke> other interesting thing i've been working on (and i should be sure to add it to the PTG etherpad)
21:19:24 <acoles> I just glanced (not reviewed) and the refactor looks nicer than the original
21:20:16 <timburke> thanks -- there were a couple sneaky spots, but the existing tests certainly helped
21:20:24 <timburke> #topic statsd labeling extensions
21:21:18 <mattoliver> Yeah it can probably just land
21:21:20 <timburke> when swift came out, statsd was the basis for a pretty solid monitoring stack
21:22:03 <timburke> these days, though, people generally seem to be coalescing around prometheus, or at least its data model
21:23:23 <timburke> we at nvidia, for example, are running https://github.com/prometheus/statsd_exporter on every node to turn swift's stats into something that can be periodically scraped
21:24:29 <mattoliver> I've been playing with otel metrics, put it as a topic on the ptg etherpad. Got a basic client to test some infrastructure here at work. Maybe I could at least write up some doc on how that works for extra discussions at the ptg?
21:25:00 <mattoliver> By that i mean how open telemetry works
21:25:08 <timburke> that'd be great, thanks!
21:26:33 <timburke> as it works for us today, there's a bunch of parsing that's required -- a stat like `proxy-server.object.HEAD.200.timing:56.9911003112793|ms` doesn't have all the context we really want in a prometheus metric (like, 200 is the status, HEAD is the request method, etc.)
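[editor's note] To make the parsing burden timburke describes concrete, here is a minimal Python sketch of recovering labels from a plain statsd line. The regex, the function name `parse_legacy_stat`, and the label names (`service`, `layer`, `method`, `status`) are assumptions for illustration only; this is not the statsd_exporter mapping syntax or anything from Swift itself.

```python
import re

# Hypothetical rule for one metric shape; every new shape would need
# another rule like this, which is the friction described in the log.
LEGACY_TIMING = re.compile(
    r"^(?P<service>[^.]+)\.(?P<layer>account|container|object)\."
    r"(?P<method>[A-Z]+)\.(?P<status>\d{3})\.timing$"
)


def parse_legacy_stat(line):
    """Split a plain statsd line into (name, labels, value, type)."""
    name, _, rest = line.partition(":")
    value, _, metric_type = rest.partition("|")
    match = LEGACY_TIMING.match(name)
    if match is None:
        # Unknown shape: no labels can be recovered without a new rule.
        return name, {}, float(value), metric_type
    return "timing", match.groupdict(), float(value), metric_type
```

A line that matches no rule comes back with an empty label set, so a newly added metric stays unlabeled until someone writes a rule for it.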
21:27:55 <timburke> which means that whenever we add a new metric, there's a handoff between dev and ops about what the new metric is, then ops need to go update some yaml file so the new metric gets parsed properly, and *then* they can start using it in new dashboards
21:28:12 <timburke> which all seems like some unnecessary friction
21:29:33 <timburke> fortunately, there are already some extensions to add the missing labels for components, and the statsd_exporter even already knows how to eat several of them: https://github.com/prometheus/statsd_exporter#tagging-extensions
21:30:08 <timburke> so i'm currently playing around with emitting metrics like `proxy-server.timing,layer=account,method=HEAD,status=204:41.67628288269043|ms`
21:30:22 <timburke> or `proxy-server.timing:34.14654731750488|ms|#layer:account,method:HEAD,status:204`
21:30:35 <timburke> or `proxy-server.timing#layer=account,method=HEAD,status=204:5.418539047241211|ms`
21:30:44 <timburke> or `proxy-server.timing;layer=account;method=HEAD;status=204:34.639835357666016|ms`
21:31:33 <timburke> (really, "proxy-server" should probably get labeled as something like "service"...)
21:31:58 <timburke> my hope is to have a patch up ahead of the PTG, so... look forward to that!
21:32:05 <acoles> nice!
21:32:37 <acoles> "layer" is a new term to me?
21:32:56 <timburke> idk, feel free to offer alternative suggestions :-)
21:33:10 <acoles> vs tier or resource (I guess tier isn't clear)
21:33:22 <acoles> haha it took us < 1second to get into a naming debate :D
21:33:40 <acoles> let's save that for the PTG
21:34:53 <mattoliver> Oh cool, I look forward to seeing it!
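[editor's note] The four tagged line shapes timburke pasted above can be produced by one small formatter. This is a sketch only, not the patch under discussion: the function name `format_tagged_stat` and the style keys are assumptions, with the keys chosen after the conventions these shapes are commonly associated with (InfluxDB, DogStatsD, Librato, Graphite).

```python
def format_tagged_stat(name, value, metric_type, labels, style):
    """Render one statsd line with labels in the given tagging style."""
    pairs = sorted(labels.items())
    if style == "dogstatsd":
        # Tags trail the metric type: name:value|type|#k:v,k:v
        tags = ",".join(f"{k}:{v}" for k, v in pairs)
        return f"{name}:{value}|{metric_type}|#{tags}"
    # Other styles put tags between the name and the value:
    # (character that introduces the tags, character between tags)
    separators = {
        "influxdb": (",", ","),
        "librato": ("#", ","),
        "graphite": (";", ";"),
    }
    lead, sep = separators[style]  # raises KeyError for an unknown style
    tags = sep.join(f"{k}={v}" for k, v in pairs)
    return f"{name}{lead}{tags}:{value}|{metric_type}"
```

Because the labels travel with the metric, a new metric would need no per-metric parsing rule on the collection side, only a collector that understands the chosen style.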
21:34:54 <timburke> if it doesn't mesh well with an operator's existing metrics stack, (1) it's opt-in and they can definitely still do the old-school vanilla statsd metrics, and (2) most collection endpoints (i believe) offer some translation mechanism
21:34:55 <acoles> I'm hoping we might eventually converge this "structured" stats with structured logging
21:35:14 <mattoliver> +1
21:35:31 <timburke> yes! there's a lot of context that seems like it'd be smart to share between stats and logging
21:35:40 <acoles> e.g. build a "context" data structure and squirt it at a logger and/or a stats client and you're done
21:36:15 <timburke> that's all i've got
21:36:19 <timburke> #topic open discussion
21:36:25 <timburke> what else should we bring up this week?
21:36:40 <acoles> on that theme, I wanted to draw attention to a change i have proposed to sharder logging
21:37:18 <timburke> #link https://review.opendev.org/c/openstack/swift/+/875220
21:37:21 <timburke> #link https://review.opendev.org/c/openstack/swift/+/875221
21:37:27 <acoles> 2 patches currently: https://review.opendev.org/c/openstack/swift/+/875220 and https://review.opendev.org/c/openstack/swift/+/875221
21:37:35 <acoles> timburke: is so quick!
21:38:13 <mattoliver> Oh yeah, I've been meaning to get to that.. but off for the rest of the week, so won't happen now until next week.
21:38:19 <acoles> I recently had to debug some sharder issue and found the inconsistent log formats very frustrating
21:38:58 <acoles> e.g. sometimes we include the DB path, sometimes the resource path, sometimes both...but worst, sometimes neither
21:39:59 <acoles> So the patches ensure that every log message associated with a container DB (which is almost all) will consistently get both the db file path and the resource path (i.e., 'a/c') appended to the message
21:40:31 <acoles> I wanted to flag it up because that includes WARNING and ERROR level messages that I am aware some ops may parse for alerts
21:41:07 <acoles> so this change may break some parsing, but on the whole I believe we'll be better for having consistency
21:41:11 <mattoliver> Sounds good, and as we eventually worker up the sharper it gets all more important.
21:41:37 <mattoliver> *sharder
21:42:07 <acoles> IDK if we have precedent for flagging up such a change, or if I am worrying too much (I tend to!)
21:43:26 <mattoliver> You're making debugging via log messages easier.. and that's a win in my book
21:43:43 <timburke> there's some precedent (e.g., https://review.opendev.org/c/openstack/swift/+/863446) but in general i'm not worried
21:44:19 <acoles> ok so I could add an UpgradeImpact to the commit message
21:45:02 <timburke> if we got to the point of actually emitting structured logs, and then *took that away*, i'd worry. but this, *shrug*
21:46:01 <timburke> fwiw, i did *not* call it out in the changelog
21:46:14 <acoles> well if there's no concerns re. the warnings then I will squash the two patches
21:47:11 <acoles> and then I can look forward to the next sharder debugging session 😜
21:47:21 <timburke> sounds good
21:49:06 <timburke> all right, i think i'll call it
21:49:17 <timburke> thank you all for coming, and thank you for working on swift!
21:49:23 <timburke> #endmeeting
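[editor's note] The consistency acoles describes for sharder logs (every message about a container DB carries both the resource path and the DB file path) can be sketched with Python's standard `logging.LoggerAdapter`. The class name `ContainerDBLogAdapter`, the `extra` keys, the suffix layout, and the example paths are assumptions for illustration; the reviews linked in the log may do this quite differently.

```python
import logging


class ContainerDBLogAdapter(logging.LoggerAdapter):
    """Append the resource path and DB file path to every message."""

    def process(self, msg, kwargs):
        # Hypothetical suffix layout: "<msg>, path: a/c, db: <file>"
        suffix = f"path: {self.extra['path']}, db: {self.extra['db_file']}"
        return f"{msg}, {suffix}", kwargs


# Usage: build one adapter per container DB, then log through it.
logger = ContainerDBLogAdapter(
    logging.getLogger("sharder-sketch"),
    {"path": "a/c", "db_file": "/srv/node/d1/containers/abc.db"},
)
logger.warning("Failed to find own shard range")
```

Putting the suffix in a single place means WARNING and ERROR messages change shape together, which is why acoles raises the alert-parsing concern for operators.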