21:00:08 <timburke> #startmeeting swift
21:00:08 <opendevmeet> Meeting started Wed Jun 1 21:00:08 2022 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:08 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:08 <opendevmeet> The meeting name has been set to 'swift'
21:00:17 <timburke> who's here for the swift meeting?
21:00:44 <kota> o/
21:01:09 <clayg> o/
21:02:19 <timburke> as usual, the agenda's at https://wiki.openstack.org/wiki/Meetings/Swift
21:02:47 <timburke> first up
21:03:01 <timburke> #topic deprecating sha1 in formpost
21:03:14 <timburke> we merged the deprecation for tempurl!
21:03:27 <kota> cool
21:03:28 <timburke> now to get the same for formpost :-)
21:03:52 <timburke> but of course, before we can do *that*, we need to support some other signature
21:04:03 <timburke> #link https://review.opendev.org/c/openstack/swift/+/838434
21:04:16 <timburke> adds support for sha256 and sha512
21:05:10 <timburke> once we have that, we can get a patch against tempest to use one of those instead of sha1
21:05:20 <timburke> #link https://review.opendev.org/c/openstack/swift/+/833713
21:05:33 <timburke> meanwhile, does the actual deprecation for sha1
21:06:15 <timburke> i was hoping mattoliver would be around to talk about the new-digests patch a little, but i think it might be a holiday in australia
21:06:57 <mattoliver> Oh hi o/
21:07:07 <timburke> \o/
21:07:08 <mattoliver> No just running late
21:07:24 <timburke> so i had two main questions
21:08:45 <timburke> one was that the sig always seems to be hex-encoded -- which seems like an inconsistency from what i did for tempurl in https://review.opendev.org/c/openstack/swift/+/525770
21:09:26 <timburke> there, you could have <hex-encoded-sig> or <digest>:<base64-encoded-sig>
21:09:56 <timburke> i was wondering whether that was an intentional difference, and how much we care
21:11:43 <timburke> the other question was around some unused info we're tracking in test_prefixed_and_not_prefixed_sigs_unsupported -- i'm not sure whether we want to make some assertions on it, or just drop it
21:11:46 <mattoliver> No real intentional difference. I guess I should keep them the same as tempurl so we're consistent.
21:11:55 <clayg> well, it WAS always *just* `<hex-encoded-sig>` until we added support for other signatures? And then it was always `<digest>:<base64-encoded-sig>`?
21:12:29 <clayg> er... *if* you specify a digest then it better be base64 - you can't do `<digest>:<hex-encoded-sig>` for tempurl?
21:13:09 <timburke> ^^^ that's the one
21:14:01 <timburke> you can still do <hex-encoded-sig> for sha1, sha256, and sha512 with tempurl -- it'll detect which digest based on the length
21:14:54 <timburke> a hex-encoded sha512 sig seems pretty long to me, so i added base64 support -- but required the prefix
21:16:20 <timburke> anyway, sounds like we've got a plan to move forward -- and i figure we can just drop the extra info that's being tracked in that test
21:16:49 <timburke> #topic backend rate limiting
21:17:37 <timburke> i don't *think* we've tried this out in prod yet, but wanted to check in to make sure
21:17:45 <mattoliver> doesn't adding base64 increase a string by like a 3rd, so makes things longer.. maybe it's too early for my brain to work. either case I'll match tempurl.
21:18:04 <mattoliver> Oh we've got stuff setup in staging
21:18:16 <timburke> woo!
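
A minimal sketch of the two signature encodings from the formpost/tempurl discussion above, assuming an HMAC over some canonical string-to-sign; the function name, the choice of URL-safe base64, and the example message are illustrative, not the actual middleware code:

```python
# Sketch only: names, the URL-safe base64 choice, and the string-to-sign
# format are assumptions for illustration, not swift's tempurl/formpost code.
import base64
import hashlib
import hmac


def sign(key, message, digestmod=hashlib.sha512, prefixed=True):
    mac = hmac.new(key, message, digestmod)
    if prefixed:
        # "<digest>:<base64-encoded-sig>" form, e.g. "sha512:3X7...dw=="
        return '%s:%s' % (mac.name.split('-')[-1],
                          base64.urlsafe_b64encode(mac.digest()).decode('ascii'))
    # bare "<hex-encoded-sig>" form; the digest is inferred from its length
    # (40 hex chars for sha1, 64 for sha256, 128 for sha512)
    return mac.hexdigest()


key = b'secret'
msg = b'GET\n1654126800\n/v1/AUTH_test/c/o'
print(len(sign(key, msg, prefixed=False)))  # 128 hex characters for sha512
print(len(sign(key, msg)))                  # "sha512:" plus 88 base64 characters
```

On the length question: base64 grows the raw 64-byte sha512 MAC by about a third (to 88 characters), while hex doubles it (to 128 characters), so the prefixed base64 form is the shorter of the two even with the digest name attached.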
21:18:31 <mattoliver> And I've been waiting on SRE to start testing backend ratelimit with the load generator
21:19:08 <mattoliver> Not sure where that's at atm though. Long weekend for US and all that.
21:19:44 <mattoliver> But will try and find out. I'm lined up to use the load balancers next for the noop pipemutex testing.
21:19:56 <mattoliver> *load generators
21:19:59 <timburke> 👍
21:20:04 <timburke> i know there were some concerns about interactions with the proxy's error limiting -- and it made me wonder if we should rethink how we do that
21:20:37 <mattoliver> Al and I had a talk about that.. and we are more convinced it could do the right thing when used at scale..
21:21:01 <mattoliver> not sure staging is the right scale though (there aren't many nodes to lose in staging).
21:21:04 <mattoliver> but we'll see.
21:21:20 <timburke> i was thinking, what if instead of a threshold that would trigger some cool-down period, we started trying to estimate the probability of a request to a backend giving an error
21:22:00 <timburke> and "error-limit" roughly that proportion of requests to the backend
21:22:12 <mattoliver> oh, interesting thought
21:23:00 <timburke> something like https://paste.opendev.org/show/bj5m3lXImDLiX5tp4Bvu/
21:23:33 <clayg> mattoliver: i'm sure that SRE could use some help with load generation; we could also try turning the replication-facing rate limit way down and try and make sure the consistency engine still works as expected
21:26:05 <mattoliver> @clay 👍️ good thinking! Confirming that, and no proxy to get in the way.. unless internal clients, but not sure how effective any error_limiting would be there.
21:27:05 <timburke> pretty sure i've seen error limiting kick in for internal clients before (though i don't remember the context now)
21:27:52 <timburke> i think it'd be super-interesting to try different generated loads against both error-limiting implementations, ideally with a few different configs
21:28:18 <timburke> which probably means i ought to try to get tests passing with my idea :-)
21:28:44 <mattoliver> When we have enough proxies, error limiting + backend ratelimiting kinda does what we expect. Taking out a proxy would actually cut off some load, giving "more" to the other nodes.. so at scale it might actually be great. But on a SAIO with 4 nodes and 1 proxy.. well it just takes down your cluster pretty quickly with a lot of load :P
21:30:08 <mattoliver> Yeah, depends on the life of the internal client I guess. If it sticks around or a lot is done then the in-memory error_limiting structure will be used.
21:30:21 <mattoliver> But yeah testing both approaches would be interesting.
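
A rough sketch of the probabilistic error-limiting idea timburke floats above (estimate each backend's error rate and skip roughly that proportion of requests to it); the class name, the exponential moving average, and the decay value are assumptions for illustration, not the contents of the linked paste:

```python
# Sketch only: the estimator shape and the 0.9 decay are illustrative
# assumptions, not the approach in the paste linked above.
import random


class NodeErrorEstimator:
    """Track a rough estimate of a backend node's error probability."""

    def __init__(self, decay=0.9):
        self.decay = decay      # weight kept by history vs. the newest sample
        self.error_rate = 0.0   # 0.0 = healthy, 1.0 = every request failing

    def record(self, was_error):
        # exponentially weighted moving average over recent requests
        sample = 1.0 if was_error else 0.0
        self.error_rate = self.decay * self.error_rate + (1 - self.decay) * sample

    def should_skip(self):
        # "error-limit" the node with probability roughly equal to its
        # estimated error rate, instead of an all-or-nothing cool-down
        return random.random() < self.error_rate
```

Compared with the existing threshold-plus-cool-down behaviour, a node failing about 10% of its requests would only lose roughly 10% of its traffic rather than being dropped entirely for a suppression interval.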
21:31:02 <timburke> all right, i think we know what we're doing next to get that merged
21:31:05 <timburke> #topic s3api test suite
21:31:42 <timburke> i remember one of the action items coming out of the last PTG was to actually run the handful of tests we've got under test/s3api in the gate
21:31:53 <timburke> #link https://review.opendev.org/c/openstack/swift/+/843567
21:32:02 <timburke> will do that
21:32:50 <kota> nice
21:33:18 <timburke> acoles also poked at simplifying the process of running those tests against actual AWS
21:33:24 <timburke> #link https://review.opendev.org/c/openstack/swift/+/838563
21:33:56 <timburke> i think my only concern on that was that it'd be nice if we could piggy-back a little more on boto3's config parsing
21:34:58 <timburke> maybe that'd be better as future work, though
21:37:01 <timburke> if anyone had some review cycles to spare, those would be handy additions, i think
21:37:49 <timburke> one last-minute topic (since both clayg and kota are here :-)
21:37:54 <timburke> #topic expiring MPUs
21:38:24 <timburke> a while back, clayg wrote a patch to delete MPU segments when the manifest expires
21:38:26 <timburke> #link https://review.opendev.org/c/openstack/swift/+/800701
21:39:17 <timburke> we've been running it in prod for a while -- i want to know: do we have any reservations about merging this to master?
21:40:10 <zaitcev> Hmm
21:40:54 <zaitcev> Makes sense, if only S3 orphan segments are gathered.
21:42:59 <timburke> clayg, maybe it'd be worth going over the goals of the patch, how it decides when to delete segments, and which segments to delete?
21:44:28 <clayg> there was some stuff in the response of the DELETE request to the manifest that indicated it was an s3 manifest - and because s3 MPU doesn't let you reference/access your segments, we know it's safe to DELETE them
21:46:07 <mattoliver> I think it's good. Solves a big issue. Is there a way we can leverage something similar for the known related bug (on overwrite)? or do we need an auditor watcher to run periodically?
21:47:45 <timburke> i think we still need the auditor -- even with just this, if the expirer falls over between deleting the manifest and finishing deleting the segments, there's still going to be orphans
21:48:02 <mattoliver> true
21:49:13 <zaitcev> It seems like a duplication of effort. If you have a semantic S3 watcher, it can delete.
21:52:27 <timburke> i think it becomes a matter of how quickly it can get deleted -- an audit watcher would clean things up over the course of days, but our users often want to be able to get themselves under quota again sooner rather than later
21:53:00 <timburke> so maybe there *is* an argument that we should try to do something similar for the overwrite case
21:53:15 <mattoliver> instead of inline delete, queueing them for delete I guess is the other option. But for overwrite, we'd need to check whether the existing object is an MPU, and trigger a delete when we finalise (before we lose the old manifest).
21:54:25 <mattoliver> but overwrite would be inline, so maybe queuing up with the expirer does make sense?
21:55:36 <mattoliver> moving the manifest and pulling an expiry on it, so only it needs to get queued up (overwrite case)
21:55:46 <mattoliver> sorry, now just thinking out loud.
21:56:05 <mattoliver> *putting an expiry on it.
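
A very rough sketch of the clean-up flow clayg describes above: the expirer deletes the manifest, and if the DELETE response shows it was an s3 MPU manifest, the segments get deleted too. The marker header and the helper callables here are hypothetical stand-ins, not the names used in the linked patch:

```python
# Hypothetical sketch only: the sysmeta header and the helper callables are
# stand-ins, not what https://review.opendev.org/c/openstack/swift/+/800701 uses.
def expire_s3_mpu(delete_manifest, list_segments, delete_segment):
    resp = delete_manifest()  # the expirer DELETEs the expiring manifest first
    if resp.headers.get('X-Object-Sysmeta-S3Api-Upload-Id'):  # hypothetical marker
        # s3 MPU segments can never be referenced or read directly by clients,
        # so once the manifest is gone it is safe to delete them as well
        for segment in list_segments(resp):
            delete_segment(segment)
    return resp
```

As noted in the discussion, this still leaves orphans if the expirer dies between the manifest delete and the last segment delete, which is why an audit watcher remains useful as a backstop.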
21:56:07 <timburke> all right, i'll try to review the patch soon with an eye towards merging it as-is, and think some more about what else we can do
21:56:34 <timburke> last couple minutes
21:56:39 <timburke> #topic open discussion
21:56:47 <timburke> anything else we ought to bring up this week?
21:58:09 <timburke> all right, i'll call it then
21:58:21 <timburke> thank you all for coming, and thank you for working on swift!
21:58:25 <timburke> #endmeeting