21:00:08 <timburke> #startmeeting swift
21:00:08 <opendevmeet> Meeting started Wed Jun  1 21:00:08 2022 UTC and is due to finish in 60 minutes.  The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:08 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:08 <opendevmeet> The meeting name has been set to 'swift'
21:00:17 <timburke> who's here for the swift meeting?
21:00:44 <kota> o/
21:01:09 <clayg> o/
21:02:19 <timburke> as usual, the agenda's at https://wiki.openstack.org/wiki/Meetings/Swift
21:02:47 <timburke> first up
21:03:01 <timburke> #topic deprecating sha1 in formpost
21:03:14 <timburke> we merged the deprecation for tempurl!
21:03:27 <kota> cool
21:03:28 <timburke> now to get the same for formpost :-)
21:03:52 <timburke> but of course, before we can do *that*, we need to support some other signature
21:04:03 <timburke> #link https://review.opendev.org/c/openstack/swift/+/838434
21:04:16 <timburke> adds support for sha256 and sha512
21:05:10 <timburke> once we have that, we can get a patch against tempest to use one of those instead of sha1
21:05:20 <timburke> #link https://review.opendev.org/c/openstack/swift/+/833713
21:05:33 <timburke> meanwhile, does the actual deprecation for sha1
21:06:15 <timburke> i was hoping mattoliver would be around to talk about the new-digests patch a little, but i think it might be a holiday in australia
21:06:57 <mattoliver> Oh hi o/
21:07:07 <timburke> \o/
21:07:08 <mattoliver> No just running late
21:07:24 <timburke> so i had two main questions
21:08:45 <timburke> one was that the sig always seems to be hex-encoded -- which seems like an inconsistency from what i did for tempurl in https://review.opendev.org/c/openstack/swift/+/525770
21:09:26 <timburke> there, you could have <hex-encoded-sig> or <digest>:<base64-encoded-sig>
21:09:56 <timburke> i was wondering whether that was an intentional difference, and how much we care
21:11:43 <timburke> the other question was around some unused info we're tracking in test_prefixed_and_not_prefixed_sigs_unsupported -- i'm not sure whether we want to make some assertions on it, or just drop it
21:11:46 <mattoliver> No real intentional difference. I guess I should keep them the same as tempurl so we're consistent.
21:11:55 <clayg> well, it WAS always *just* `<hex-encoded-sig>` until we added support for other signatures?  And then it was always `<digest>:<base64-encoded-sig>`?
21:12:29 <clayg> er... *if* you specify a digest then it better be base64 - you can't do `<digest>:<hex-encoded-sig>` for tempurl?
21:13:09 <timburke> ^^^ that's the one
21:14:01 <timburke> you can still do <hex-encoded-sig> for sha1, sha256, and sha512 with tempurl -- it'll detect which digest based on the length
21:14:54 <timburke> a hex-encoded sha512 sig seems pretty long to me, so i added base64 support -- but required the prefix
21:16:20 <timburke> anyway, sounds like we've got a plan to move forward -- and i figure we can just drop the extra info that's being tracked in that test
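(A minimal sketch, purely for illustration and not the actual tempurl middleware code, of the two accepted forms discussed above: a bare hex signature whose digest is inferred from its length, or a "<digest>:<base64-sig>" prefixed form.)

    import base64
    import binascii

    HEX_LEN_TO_DIGEST = {40: 'sha1', 64: 'sha256', 128: 'sha512'}
    RAW_SIG_LEN = {'sha1': 20, 'sha256': 32, 'sha512': 64}

    def parse_sig(value):
        """Return (digest_name, raw_sig_bytes); raise ValueError if unparsable."""
        if ':' in value:
            # "<digest>:<sig>" form: the prefix names the digest and the
            # signature must be base64-encoded (no "<digest>:<hex>")
            digest, _, encoded = value.partition(':')
            if digest not in RAW_SIG_LEN:
                raise ValueError('unsupported digest %r' % digest)
            try:
                raw = base64.b64decode(encoded, validate=True)
            except binascii.Error:
                raise ValueError('prefixed signature must be base64')
            if len(raw) != RAW_SIG_LEN[digest]:
                raise ValueError('bad signature length for %s' % digest)
            return digest, raw
        # bare form: must be hex, and the digest is inferred from its length
        digest = HEX_LEN_TO_DIGEST.get(len(value))
        if digest is None:
            raise ValueError('cannot infer digest from signature length')
        try:
            return digest, binascii.unhexlify(value)
        except binascii.Error:
            raise ValueError('bare signature must be hex-encoded')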
21:16:49 <timburke> #topic backend rate limiting
21:17:37 <timburke> i don't *think* we've tried this out in prod yet, but wanted to check in to make sure
21:17:45 <mattoliver> doesn't adding base64 increase a string by like a third, so it makes things longer.. maybe it's too early for my brain to work. either way I'll match tempurl.
21:18:04 <mattoliver> Oh we've got stuff setup in staging
21:18:16 <timburke> woo!
21:18:31 <mattoliver> And I've been waiting on SRE to start testing backend ratelimit with the load generator
21:19:08 <mattoliver> Not sure where that's at atm though. Long weekend for US and all that.
21:19:44 <mattoliver> But will try and find out. I'm lined up to use the load balancers next for the noop pipemutex testing.
21:19:56 <mattoliver> *load generators
21:19:59 <timburke> 👍
21:20:04 <timburke> i know there were some concerns about interactions with the proxy's error limiting -- and it made me wonder if we should rethink how we do that
21:20:37 <mattoliver> Al and I had a talk about that.. and we are more convinced it could do the right thing when used at scale..
21:21:01 <mattoliver> not sure staging is the right scale though (there aren't many nodes to lose in staging).
21:21:04 <mattoliver> but we'll see.
21:21:20 <timburke> i was thinking, what if instead of a threshold that would trigger some cool-down period, we started trying to estimate the probability of a request to a backend giving an error
21:22:00 <timburke> and "error-limit" roughly that proportion of requests to the backend
21:22:12 <mattoliver> oh, interesting thought
21:23:00 <timburke> something like https://paste.opendev.org/show/bj5m3lXImDLiX5tp4Bvu/
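(The linked paste isn't reproduced here; a minimal sketch of the general idea -- track an estimated per-node error probability and skip roughly that fraction of requests -- might look like the following.)

    import random

    class NodeErrorEstimator(object):
        """Toy illustration of probabilistic error-limiting instead of a
        threshold + cool-down period. Not the code from the paste."""

        def __init__(self, decay=0.95):
            self.decay = decay        # weight given to history vs the newest sample
            self.error_rate = {}      # node key -> estimated probability of error

        def record(self, node_key, was_error):
            # exponentially-decayed estimate of how often this node errors
            old = self.error_rate.get(node_key, 0.0)
            sample = 1.0 if was_error else 0.0
            self.error_rate[node_key] = self.decay * old + (1 - self.decay) * sample

        def should_skip(self, node_key):
            # skip ("error-limit") the node with probability equal to its
            # estimated error rate, so a flaky node still sees some traffic
            # and gets a chance to recover
            return random.random() < self.error_rate.get(node_key, 0.0)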
21:23:33 <clayg> mattoliver: i'm sure that SRE could use some help with load generation; we could also try turning the replication facing rate limit way down and try and make sure the consistency engine still works as expected
21:26:05 <mattoliver> @clay 👍️ good thinking! Confirming that, and there's no proxy to get in the way.. unless internal clients are involved, but not sure how effective any error_limiting would be there.
21:27:05 <timburke> pretty sure i've seen error limiting kick in for internal clients before (though i don't remember the context now)
21:27:52 <timburke> i think it'd be super-interesting to try different generated loads against both error-limiting implementations, ideally with a few different configs
21:28:18 <timburke> which probably means i ought to try to get tests passing with my idea :-)
21:28:44 <mattoliver> When we have enough proxies, error limiting + backend ratelimiting kinda does what we expect. Taking out a proxy would actually cut off some load, giving "more" to the other nodes.. so at scale it might actually be great. But on a SAIO with 4 nodes and 1 proxy.. well it just takes down your cluster pretty quickly with a lot of load :P
21:30:08 <mattoliver> Yeah, depends on the life of the internal client I guess. If it sticks around or a lot is done through it, then the in-memory error_limiting structure will be used.
21:30:21 <mattoliver> But yeah testing both approaches would be interesting.
21:31:02 <timburke> all right, i think we know what we're doing next to get that merged
21:31:05 <timburke> #topic s3api test suite
21:31:42 <timburke> i remember one of the action items coming out of the last PTG was to actually run the handful of tests we've got under test/s3api in the gate
21:31:53 <timburke> #link https://review.opendev.org/c/openstack/swift/+/843567
21:32:02 <timburke> will do that
21:32:50 <kota> nice
21:33:18 <timburke> acoles also poked at simplifying the process of running those tests against actual AWS
21:33:24 <timburke> #link https://review.opendev.org/c/openstack/swift/+/838563
21:33:56 <timburke> i think my only concern on that was that it'd be nice if we could piggy-back a little more on boto3's config parsing
21:34:58 <timburke> maybe that'd be better as future work, though
21:37:01 <timburke> if anyone had some review cycles to spare, those would be handy additions, i think
21:37:49 <timburke> one last-minute topic (since both clayg and kota are here :-)
21:37:54 <timburke> #topic expiring MPUs
21:38:24 <timburke> a while back, clayg wrote a patch to delete MPU segments when the manifest expires
21:38:26 <timburke> #link https://review.opendev.org/c/openstack/swift/+/800701
21:39:17 <timburke> we've been running it in prod for a while -- i want to know: do we have any reservations about merging this to master?
21:40:10 <zaitcev> Hmm
21:40:54 <zaitcev> Makes sense, if only S3 orphan segments are gathered.
21:42:59 <timburke> clayg, maybe it'd be worth going over the goals of the patch, how it decides when to delete segments, and which segments to delete?
21:44:28 <clayg> there was some stuff in the response of the DELETE request to the manifest that indicated it was an s3 manifest - and because s3 MPU doesn't let you reference/access your segments; we know it's safe to DELETE them
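(A self-contained toy sketch of that rule, not the patch itself -- the key names here are made up for illustration.)

    def segments_to_delete(manifest_info):
        """manifest_info: metadata available when the expiring manifest is deleted.

        If the manifest was an S3 MPU manifest, its segments were never directly
        addressable by the client, so they can be deleted along with it; for a
        plain SLO we can't assume that.
        """
        if not manifest_info.get('is_s3_mpu_manifest'):
            return []
        return list(manifest_info.get('segments', []))

    # e.g. an expired MPU manifest whose two segments should be reaped:
    info = {'is_s3_mpu_manifest': True,
            'segments': ['segments/obj/upload-id/1', 'segments/obj/upload-id/2']}
    assert segments_to_delete(info) == info['segments']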
21:46:07 <mattoliver> I think it's good. Solves a big issue. Is there a way we can leverage something similar for the known related bug (on overwrite)? or do we need an audit watcher to run periodically?
21:47:45 <timburke> i think we still need the auditor -- even with just this, if the expirer falls over between deleting the manifest and finishing deleting the segments, there's still going to be orphans
21:48:02 <mattoliver> true
21:49:13 <zaitcev> It seems like a duplication of effort. If you have a semantic S3 watcher, it can delete.
21:52:27 <timburke> i think it becomes a matter of how quickly it can get deleted -- an audit watcher would clean things up over the course of days, but our users often want to be able to get themselves under quota again sooner rather than later
21:53:00 <timburke> so maybe there *is* an argument that we should try to do something similar for the overwrite case
21:53:15 <mattoliver> instead of inline delete, queueing them for delete I guess is the other option. But for overwrite, we'd need to check whether the object being overwritten is an MPU, and trigger a delete when we finalise (before we lose the old manifest).
21:54:25 <mattoliver> but overwrite would be inline, so maybe queuing it up with the expirer does make sense?
21:55:36 <mattoliver> moving the manifest and pulling an expiry on it, so only it needs to get queued up (overwrite case)
21:55:46 <mattoliver> sorry, now just thinking out loud.
21:56:05 <mattoliver> *putting an expiry on it.
21:56:07 <timburke> all right, i'll try to review the patch soon with an eye towards merging it as-is, and think some more about what else we can do
21:56:34 <timburke> last couple minutes
21:56:39 <timburke> #topic open discussion
21:56:47 <timburke> anything else we ought to bring up this week?
21:58:09 <timburke> all right, i'll call it then
21:58:21 <timburke> thank you all for coming, and thank you for working on swift!
21:58:25 <timburke> #endmeeting