21:00:34 <timburke> #startmeeting swift
21:00:35 <openstack> Meeting started Wed May 20 21:00:34 2020 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:36 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:38 <openstack> The meeting name has been set to 'swift'
21:00:41 <timburke> who's here for the swift meeting?
21:00:48 <mattoliverau> o/
21:00:49 <rledisez> hi o/
21:00:53 <kota_> o/
21:01:17 <tdasilva> o/
21:02:15 <timburke> agenda's at https://wiki.openstack.org/wiki/Meetings/Swift
21:02:22 <timburke> #topic ptg
21:02:35 <timburke> it's only a week and a half away!
21:03:04 <timburke> thanks everybody for adding topics to the etherpad
21:03:07 <timburke> #link https://etherpad.opendev.org/p/swift-ptg-victoria
21:03:29 <timburke> i made sure that the link was included on http://ptg.openstack.org/etherpads.html
21:04:15 <timburke> if you haven't already, please do register -- i expect it'll help with logistics/planning
21:04:20 <timburke> #link https://www.eventbrite.com/e/virtual-project-teams-gathering-june-2020-tickets-103456996662
21:05:15 <timburke> mattoliverau did a great job of making sure we had some timeslots booked for video conferencing
21:05:36 <timburke> the schedule of those is up at http://ptg.openstack.org/ptg.html
21:05:48 <timburke> as well as the top of the etherpad
21:07:30 <timburke> one thing i think i'd like to try is having everyone come to the ptg with a few patches they'd really like to see some progress on -- pick three or so and add them (and your name) to the priority reviews page!
21:07:33 <timburke> #link https://wiki.openstack.org/wiki/Swift/PriorityReviews
21:08:07 <timburke> any questions/comments on the ptg?
21:08:31 <mattoliverau> Nope, looking forward to it :)
21:09:42 <timburke> on to new business :-)
21:09:52 <timburke> #topic ratelimit + s3api
21:10:55 <timburke> so after digging out from the massive pile of async pendings, i wanted to make sure it didn't happen again, at least not easily. and one easy way to do that is to limit how quickly writes happen in a cluster
21:11:34 <timburke> fortunately, we have a ratelimit middleware! unfortunately, it could be a little annoying to deploy with s3api
21:12:47 <timburke> (my understanding is) you usually want to place it left of auth -- that auth decision may be expensive, and you don't want to have auth fall down because of an over-eager swift client
21:13:34 <timburke> but if it's an s3 request, you won't have the full account/container path until *after* auth
21:14:28 <timburke> so i'm thinking that having it twice (once before s3api and once after auth) might be a reasonable way to go? how do other people deploy that?
21:14:51 <timburke> #link https://review.opendev.org/729051
21:14:51 <patchbot> patch 729051 - swift - ratelimit: Allow multiple placements - 3 patch sets
21:15:55 <timburke> rledisez, surely you've got *something* for this, yeah? is it a custom ratelimiter?
21:16:23 <rledisez> right now (when we enabled it) we put it left of auth and s3api, but we don't have much s3 requests. and by default we do not use it
21:16:34 <rledisez> timburke: nothing custom. we don't ratelimit, we scale :P
21:17:03 <rledisez> joke aside, we enable this only when really necessary, it's pretty rare
21:17:04 <kota_> excellent
21:17:35 <timburke> oh, cool!
never mind then ;-)
21:17:38 <rledisez> the usual situation is the "delete storm"
21:18:49 <timburke> fwiw, following https://review.opendev.org/#/c/697535/ we can have ratelimit right of s3api and auth and still serve "reasonable" responses to s3 clients
21:18:50 <patchbot> patch 697535 - swift - s3api: Better handle 498/429 responses (MERGED) - 1 patch set
21:20:04 <timburke> ...but i quickly realized that it'd throw off my metrics, since AWS sends out 503s -- i want a way to easily differentiate between 503 (slow down) and 503 (backend failed)
21:20:20 <timburke> which led to https://review.opendev.org/729092
21:20:21 <patchbot> patch 729092 - swift - s3api: Log ratelimited requests as 498s - 2 patch sets
21:21:22 <timburke> idk how sane of a thing that is to do though -- it feels a little dirty lying in logs like that
21:23:03 <timburke> so i also started thinking about returning some other error code -- i don't know of any s3 clients that would retry a 498, but the rfc-compliant 429 seemed to get awscli to retry at least
21:23:06 <timburke> #link https://review.opendev.org/729093
21:23:07 <patchbot> patch 729093 - swift - s3api: Add config option to return 429s on ratelimit - 1 patch set
21:23:42 <timburke> rledisez, kota_: any opinion on which approach seems better/more reasonable?
21:24:43 <timburke> (could even do both, i suppose; if configured, log & return 429, otherwise log 498 but return 503)
21:25:13 <rledisez> we need to log the same code that what is returned to customer (otherwise we can't discuss the SLA: I got 10% 503, I only see 2%)
21:25:30 <kota_> sounds reasonable but what i thought when looking the log change patch a little is we may leave 503 logging for debug due.
21:25:31 <rledisez> so I would say we need to return something different
21:26:00 <kota_> s/due/for user support/
21:26:17 <kota_> i didn't dig it in detail yet, just quick look.
21:27:14 <timburke> *nod* makes sense.
i suppose i ought to dig in more to see how well other s3 clients support 429
21:27:48 <rledisez> I know that I would for sure only use the "429" option you describe, seems the best option (and retry is not that bad I guess, it does not consume much resources)
21:28:06 <kota_> user might say, "hey I got 503s" so 503s reported in the swift logs helps us to debug.
21:29:39 <timburke> all right, on to updates!
21:29:50 <timburke> #topic lots of small files
21:30:12 <timburke> rledisez, alecuyer how's it going?
21:31:03 <rledisez> So alecuyer is off, he told me few days ago that some tests were passing. I guess he worked on it this week but to be honest I don't know much. We saw a change in diskfile that need to be backported to LOSF (a new parameter to _finalize_durable from a recent patch of you timburke)
21:31:54 <rledisez> that's it
21:32:31 <timburke> 👍 sorry for the extra trouble ;-)
21:33:10 <timburke> #topic database reclaim locking
21:33:45 <timburke> i was hoping clayg would be around to discuss his findings on https://review.opendev.org/#/c/727876/
21:33:46 <patchbot> patch 727876 - swift - Breakup reclaim into batches - 4 patch sets
21:34:02 <timburke> but i could take a stab at it :D
21:35:36 <timburke> the current code breaks up the deleted namespace into batches of ~1k and reaps them in a loop, getting a fresh connection/lock each time
21:36:48 <timburke> and it seems to be working well! the reclaim still takes a while, but the server could continue writing deletes at a decent clip while it was happening
21:37:44 <timburke> i think tests should pass now, and it's ready for review! i might have to ask clay for his testing setup though
21:38:07 <timburke> has anyone else had a chance to take a look at it yet?
21:38:51 <rledisez> not yet
21:40:08 <timburke> no worries. it's ready when you are :-)
21:40:17 <timburke> that's all i had
21:40:22 <timburke> #topic open discussion
21:40:32 <timburke> anything else we should discuss?
21:42:50 <timburke> i wonder whether clayg's reclaim patch will make https://review.opendev.org/#/c/571917/ and https://review.opendev.org/#/c/724943/ unnecessary...
21:42:50 <patchbot> patch 571917 - swift - Manage async_pendings priority per containers - 5 patch sets
21:42:51 <patchbot> patch 724943 - swift - WIP: Batched updates for object-updater - 2 patch sets
21:43:54 <clayg> one of the comments on the patch has a gist for the script i'm using to do reclaim in one thread while inserting tombstones in another
21:45:24 <timburke> all right, we oughta let mattoliverau and kota_ start their day ;-)
21:45:32 <rledisez> I'm not sure that it makes them unnecessary. they are just other tools in the toolbox to reduce the container-listing lag
21:45:40 <timburke> thank you all for coming, and thank you for working on swift!
21:45:45 <timburke> #endmeeting
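
Editor's sketch (not part of the meeting log): the batched reclaim idea discussed under "database reclaim locking" can be illustrated roughly as below. This is a minimal sketch using Python's sqlite3 module and is not the code from patch 727876. The `object` table layout, the `reclaim_in_batches` and `demo` names, and the marker-based paging are all assumptions made for illustration. The point it shows is that each batch of about 1k tombstones is removed in its own short transaction on a fresh connection, so other writers can get the database lock between batches.

```python
import os
import sqlite3
import tempfile

BATCH_SIZE = 1000  # roughly the "~1k" batch size mentioned in the meeting


def reclaim_in_batches(path, age_cutoff, batch_size=BATCH_SIZE):
    """Delete tombstones older than age_cutoff, one short transaction per batch.

    Hypothetical schema: object(name, created_at, deleted).
    Returns the number of rows removed.
    """
    total = 0
    marker = ""  # page through the deleted namespace in name order
    while True:
        conn = sqlite3.connect(path)  # fresh connection (and lock) per batch
        try:
            with conn:  # commits the batch's DELETE on exit
                names = [row[0] for row in conn.execute(
                    "SELECT name FROM object "
                    "WHERE deleted = 1 AND created_at < ? AND name > ? "
                    "ORDER BY name LIMIT ?",
                    (age_cutoff, marker, batch_size))]
                if names:
                    cur = conn.execute(
                        "DELETE FROM object "
                        "WHERE deleted = 1 AND created_at < ? "
                        "AND name > ? AND name <= ?",
                        (age_cutoff, marker, names[-1]))
                    total += cur.rowcount
        finally:
            conn.close()
        if not names:
            return total
        marker = names[-1]


def demo():
    """Build a throwaway DB with 2500 tombstones and 10 live rows, then reclaim."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "container.db")
        conn = sqlite3.connect(path)
        with conn:
            conn.execute("CREATE TABLE object "
                         "(name TEXT PRIMARY KEY, created_at REAL, deleted INTEGER)")
            rows = [("del-%06d" % i, 1.0, 1) for i in range(2500)]
            rows += [("live-%06d" % i, 1.0, 0) for i in range(10)]
            conn.executemany("INSERT INTO object VALUES (?, ?, ?)", rows)
        conn.close()
        reclaimed = reclaim_in_batches(path, age_cutoff=2.0)
        conn = sqlite3.connect(path)
        remaining = conn.execute("SELECT COUNT(*) FROM object").fetchone()[0]
        conn.close()
        return reclaimed, remaining
```

Calling `demo()` reclaims the 2500 tombstones in three batches and leaves the 10 live rows in place. The whole reclaim still takes as long as before, but no single transaction holds the lock for long, which matches the behaviour timburke describes of the server continuing to write deletes while reclaim runs.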