21:00:21 <timburke> #startmeeting swift
21:00:22 <openstack> Meeting started Wed Jan 6 21:00:21 2021 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:24 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:26 <openstack> The meeting name has been set to 'swift'
21:00:29 <timburke> who's here for the swift meeting?
21:00:49 <kota_> o/
21:00:51 <acoles> o/
21:00:52 <seongsoocho> o/
21:01:00 <rledisez> hi! o/
21:01:11 <alecuyer> o/
21:01:27 <zaitcev> o/
21:01:36 <mattoliverau> o/
21:01:59 <timburke> as usual, the agenda's at https://wiki.openstack.org/wiki/Meetings/Swift
21:02:04 <clayg> 0/
21:02:10 <timburke> first up
21:02:18 <timburke> #topic happy new year
21:02:40 <timburke> hope everyone had a good break for the end of the year
21:03:14 <timburke> we've done a lot of great stuff this last year, and i'm sure this coming year will be great, too :-)
21:03:54 <timburke> but... there's also some stuff that's not so great
21:03:54 <mattoliverau> 2020 is finally over, may 2021 be less 2020y :P
21:04:03 <timburke> #topic stable gates
21:04:49 <timburke> mattoliverau, so far... that's not looking great. but we should always be hopeful!
21:05:37 <timburke> so for about the past month we've had broken stable gates
21:07:00 <timburke> a lot of fixes have landed to resolve the bandit troubles (on pike, queens, and rocky, and a slightly different tack taken for victoria)
21:07:29 <timburke> and it seems like we've got fixes for stein and train, though they're currently held up on some devstack patches
21:07:59 <timburke> #link https://review.opendev.org/c/openstack/swift/+/766214
21:08:08 <timburke> #link https://review.opendev.org/c/openstack/swift/+/766489
21:08:19 <timburke> which depend on
21:08:26 <timburke> #link https://review.opendev.org/c/openstack/devstack/+/768256
21:08:30 <timburke> #link https://review.opendev.org/c/openstack/devstack/+/768257
21:10:00 <clayg> Amazing work Tim! 🙏. Do you need any reviews or anything we can do to help?
21:10:01 <timburke> i tried applying a similar fix for ussuri as was used for r-t
21:10:06 <timburke> #link https://review.opendev.org/c/openstack/swift/+/769439
21:11:04 <timburke> but there seems to be some trouble with grenade
21:11:51 <timburke> i'll keep working at it, will keep folks updated
21:13:00 <timburke> clayg, thanks -- i'm not sure there *is* much to be done at this point, unless someone else would like to become the stable-branch champion. i'll keep trying to make it work, but there's other stuff i want to get done, too ;-)
21:13:23 <timburke> #topic OpenStack Xena
21:13:31 <timburke> we have a new release name!
21:13:42 <timburke> #link http://lists.openstack.org/pipermail/openstack-discuss/2020-December/019537.html
21:14:00 <mattoliverau> I wonder if we can add the swift core group to swift stable, is there still a stable team?
21:14:33 <mattoliverau> Nice I used to work on an opensource project called Xena, so this takes me back.. let all the warrior princess memes begin :P
21:15:43 <timburke> there is a separate swift-stable-maint group; if anyone else would like to join it, review some backports and i'd be happy to propose you be added to it
21:16:21 <timburke> in general, i feel like the problem isn't review bandwidth, though; mostly a matter of putting in time to make sure things still work *shrug*
21:17:48 <timburke> but speaking of new releases...
21:17:57 <timburke> #topic swift 2.27.0
21:18:35 <clayg> we're only at 2.26.0.6 - it hasn't been *that* long since a release 🤷‍♂️
21:18:36 <timburke> i feel like we've landed a good bit of stuff lately, and we should probably do a fresh release soon
21:20:16 <timburke> we've added audit watchers, improved waterfall ec, improved logging, fixed some py3 issues, added a new option to deal with libec upgrades, ...
21:21:02 <clayg> 💪 :shipit:
21:21:05 <timburke> so i'm going to work on getting some release notes up in the next week or two; if you know of anything else that really should land for the next release, let me know and i'll try to get it reviewed!
21:21:23 <clayg> shard range caching (spoiler!!!)
21:21:32 <timburke> (or add it to https://wiki.openstack.org/wiki/Swift/PriorityReviews -- or do both!)
21:21:47 <acoles> https://review.opendev.org/c/openstack/swift/+/761659
21:21:58 <timburke> speaking of...
21:22:11 <timburke> #topic shard range caching for listings
21:22:18 <timburke> acoles, clayg take it away;-)
21:22:54 <acoles> https://review.opendev.org/c/openstack/swift/+/761659 adds caching shard ranges in memcache for container listings
21:23:16 <acoles> we've had it in production for a month or so and it works
21:23:31 <clayg> when we shipped it to prod our baseline 5XX dropped to almost nothing and everyone was singing @acoles praises
21:24:03 <acoles> it's had a few rounds of review so I hope it is close to merging
21:24:32 <clayg> well, but it's blocked on your BUGFIX!!! (don don DUUUUUUN)
21:24:56 <acoles> clayg: yeah, I was going to mention the bug too
21:25:23 <acoles> so while working on this I discovered a nasty bug, but one I doubt anyone is at risk of yet
21:25:41 <acoles> the bug fix is here https://review.opendev.org/c/openstack/swift/+/767410/5
21:26:05 <mattoliverau> I plan to review your bug + cache chain today.
21:26:28 <mattoliverau> And is something that we should land before the next release ;)
21:26:47 <acoles> there's a corner case risk of loops in proxy, but only when shrinking shards, and then only in very particular conditions, but I really want to get that fixed, cos loops aren't good
21:27:01 * mattoliverau is never biased toward sharding patches :P
21:27:33 <acoles> happily, the fix is relatively simple. so that is https://review.opendev.org/c/openstack/swift/+/767410
21:28:12 <acoles> back to the caching patch, I wanted to mention that by default the shard caching is enabled
21:28:38 <acoles> (BTW, this is similar to the existing object PUT shard caching that timburke added a while back)
21:29:34 <clayg> true true, which also was on by default - and helps with object PUT throughput quite a bit on sharded containers
21:29:43 <acoles> the complete set of shard ranges are cached on memcache, so if you have big containers with lots of shard ranges, the memcache values can be large. we have found it necessary to increase the default memcache config to allow for these increased sizes
21:30:10 <zaitcev> so, how do I know what sizes to use?
21:30:21 <clayg> hence: https://bugs.launchpad.net/swift/+bug/1890643
21:30:22 <openstack> Launchpad bug 1890643 in OpenStack Object Storage (swift) "shard range cache can get too big for memcache" [Undecided,New]
21:30:23 <acoles> which is why I flag up the 'on by default' property
21:30:26 <mattoliverau> Maybe we need to make sure that is mentioned somewhere in doc? deployment or admin guide?
21:30:44 <acoles> mattoliverau: yes! and we knew you would volunteer :D
21:30:49 <acoles> ;)
21:30:51 <clayg> mattoliverau: I think a doc change would be a GREAT fix for lp bug #1890643
21:31:20 <mattoliverau> lol, ok.. so long as someone gives me a hint on the right answer I'll add it somewhere :P
21:31:21 <acoles> zaitcev: yes, so we propose to add doc about memcache sizing
21:31:22 <clayg> zaitcev: IIRC we bumped up from 1MB to 5MB? although 10MB would have probably been more reasonable 🤔
21:32:26 <clayg> by the time we caught it our largest container's shard ranges were something like 1400 KiB (but I think it was also full of overlaps) - point is we're close enough to the default that swift deployments should be aware of it
21:32:28 <timburke> the default for memcache is something like 1MB, right? ballpark an upper bound for a serialized shard range at something like 2-4kb, so you ought to be good until you've got >250 or so shard ranges for a container
21:32:56 <clayg> maybe worse is that caching 5MB of json in memcache might not be like... "perfectly optimized" 😬
21:34:26 <timburke> we should probably also flag up what happens when we exceed the max memcache value size (both here and in the deployment guide somewhere ;-)
21:35:52 <timburke> in short, the `set` then works like a `delete`, and you start hammering the container DB again, similar to if caching were disabled
21:36:31 <clayg> yeah, caching disabled sucks - which is why it's on by default
21:37:12 <clayg> but if you're sharding (which is still notoriously difficult/manual) - but if you're sharding, luckily you'll also be caching (unless you didn't know the sekret memcache tweak - then you won't be caching... which sucks)
21:37:42 <mattoliverau> hmm, maybe can recommend pulling the size stats from memcache periodically in one's monitoring scripts. Then you can know when you're going to hit a ceiling.
21:37:54 <clayg> but after we merge p 761659 and mattoliverau fixes lp bug #1890643 everything will be great!
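A back-of-the-envelope version of the sizing estimate timburke gives at 21:32:28 can be sketched as below. It assumes the roughly 1 MiB memcache item limit and the 2-4 KiB per serialized shard range quoted in the discussion; the function names and the constant are hypothetical and are not part of Swift.

```python
# Rough estimate only. None of these names exist in Swift; the 1 MiB limit
# and the 2-4 KiB per shard range are the ballpark figures from the meeting.
MEMCACHE_ITEM_LIMIT = 1024 * 1024  # assumed default max item size, in bytes


def max_cacheable_shard_ranges(item_limit, bytes_per_range):
    """Return how many serialized shard ranges fit in one memcache value."""
    return item_limit // bytes_per_range


def fits_in_memcache(num_ranges, bytes_per_range,
                     item_limit=MEMCACHE_ITEM_LIMIT):
    """True if the whole shard range list is expected to fit in one value."""
    return num_ranges * bytes_per_range <= item_limit


if __name__ == "__main__":
    # Compare the optimistic (2 KiB) and pessimistic (4 KiB) per-range sizes.
    for per_range in (2 * 1024, 4 * 1024):
        n = max_cacheable_shard_ranges(MEMCACHE_ITEM_LIMIT, per_range)
        print(f"{per_range} bytes/range -> about {n} shard ranges")
```

With the pessimistic 4 KiB figure the limit works out to 256 shard ranges, which lines up with the ">250 or so" in the discussion. Passing a larger item_limit (for example the 5MB clayg mentions) shows how much headroom a raised limit would give.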
21:37:56 <openstack> Launchpad bug 1890643 in OpenStack Object Storage (swift) "shard range cache can get too big for memcache" [Undecided,New] https://launchpad.net/bugs/1890643
21:38:40 <clayg> mattoliverau: uhh... WE should probably do that 🤔
21:39:12 <clayg> we have some other memcache stats - like hit/miss - it's possible "min/max/avg key size" comes out too?
21:39:35 <acoles> we should also figure out shrinking so that our shard lists do not grow infinitely long...I'm working on that
21:39:37 <mattoliverau> according to google you can 'states sizes'
21:39:53 <mattoliverau> according to google you can *'stats sizes'
21:40:54 <mattoliverau> but the call may lock up for a while, so only something you'd want to do occasionally
21:41:07 <mattoliverau> acoles: YES!
21:41:32 <clayg> STAT sizes_status disabled 🤔
21:41:33 <timburke> it may also come back with `STAT sizes_status disabled` (apparently; that's what my vsaio says, anyway)
21:41:51 <mattoliverau> bugger
21:42:09 <clayg> NEWAY - point is mattoliverau is going to merge this AWESOME FEATURE this afternoon - and do some docs soon - because matt is 'mazing
21:42:29 <mattoliverau> lol, or rather acoles rocks :)
21:42:54 <clayg> there's not a finite amount of awesomeness in the universe - you AND acoles can both rock 💪
21:43:19 <acoles> you rock I'll roll
21:43:27 <timburke> is anyone opposed to the caching being on by default? iirc there was some concern about it before, but it sounds like most of the people involved with the patch now feel like on-by-default is the right way to go
21:44:09 <acoles> (repeating) the object PUT shard caching is already on by default
21:44:18 <mattoliverau> I figure 1. it's proven itself in prod and 2. sharding isn't something that is supposed to happen all the time so caching for some time is fine :)
21:45:30 <clayg> yeah the "amount of time" is configurable - but the "pounding the root to get ranges every object PUT and every container listing" was NOT working
21:45:42 <clayg> i guess in theory you could turn the value to 0 if you hate your container servers 🤔
21:45:45 <clayg> or your clients
21:46:04 <timburke> all right, sounds like we've got a plan then
21:46:10 <timburke> that's all we had on the agenda
21:46:15 <timburke> #topic open discussion
21:46:15 <acoles> clayg: yep, setting the expiry time to 0 disables the caching
21:46:25 <timburke> what else should we bring up this week?
21:47:06 <clayg> idk, we have some crazy clients - they be listing like massive concurrency for like parallel search or something
21:47:27 <acoles> did I see audit watchers merged?
21:47:42 <zaitcev> Yes
21:48:00 <acoles> then we should congratulate zaitcev and all others involved - congrats!
21:48:21 <timburke> it's so great. i need to write a few more watchers :-)
21:48:37 <clayg> zaitcev: congrats!!!
21:48:42 <zaitcev> acoles: dsariel did a lot of work
21:48:43 <acoles> I mean, I had an entire career diversion while you kept plugging away on that
21:48:44 <mattoliverau> Yeah! great work
21:48:48 <timburke> slo validation's gonna be a good one...
21:49:05 <zaitcev> Unfortunately, torgomatic is no longer around.
21:49:08 <mattoliverau> acoles: lol
21:49:18 <acoles> dsariel: congrats and thanks too
21:50:49 <zaitcev> We had some good changes done in the very final round of reviews. I just wish this good feedback could be gotten a year ago. I remember what I rolled out 2 PTGs ago, oh boy. And I thought it was ready to merge, too.
21:50:55 <acoles> zaitcev: :'( yes, but maybe torgomatic is .... 'watching'
21:51:10 <timburke> if anyone wants to look at some other watcher code, i've got https://review.opendev.org/c/openstack/swift/+/744078/ (EC stats, looking at libec versions and crcs) and https://review.opendev.org/c/openstack/swift/+/766640 (per-policy stats) proposed
21:51:43 <zaitcev> I don't quite understand what the EC watcher does, but the CRC one clearly is useful.
21:53:06 <zaitcev> I already went and abandoned Sam's original review 212824.
21:53:09 <timburke> yeah, the CRC guy seems real useful for figuring out whether it's safe to upgrade libec
21:54:19 <timburke> the per-policy stats are mainly just me wanting a way to get a size distribution of objects, and a split on how much data's in each policy
21:55:43 <timburke> all right, i'm calling it
21:55:56 <timburke> thank you all for coming, and thank you for working on swift!
21:56:00 <timburke> #endmeeting