21:00:08 #startmeeting swift
21:00:08 Meeting started Wed Jun 1 21:00:08 2022 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:08 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:08 The meeting name has been set to 'swift'
21:00:17 who's here for the swift meeting?
21:00:44 o/
21:01:09 o/
21:02:19 as usual, the agenda's at https://wiki.openstack.org/wiki/Meetings/Swift
21:02:47 first up
21:03:01 #topic deprecating sha1 in formpost
21:03:14 we merged the deprecation for tempurl!
21:03:27 cool
21:03:28 now to get the same for formpost :-)
21:03:52 but of course, before we can do *that*, we need to support some other signature
21:04:03 #link https://review.opendev.org/c/openstack/swift/+/838434
21:04:16 adds support for sha256 and sha512
21:05:10 once we have that, we can get a patch against tempest to use one of those instead of sha1
21:05:20 #link https://review.opendev.org/c/openstack/swift/+/833713
21:05:33 meanwhile, that one does the actual deprecation for sha1
21:06:15 i was hoping mattoliver would be around to talk about the new-digests patch a little, but i think it might be a holiday in australia
21:06:57 Oh hi o/
21:07:07 \o/
21:07:08 No, just running late
21:07:24 so i had two main questions
21:08:45 one was that the sig always seems to be hex-encoded -- which seems like an inconsistency from what i did for tempurl in https://review.opendev.org/c/openstack/swift/+/525770
21:09:26 there, you could have `<hex sig>` or `<digest>:<base64 sig>`
21:09:56 i was wondering whether that was an intentional difference, and how much we care
21:11:43 the other question was around some unused info we're tracking in test_prefixed_and_not_prefixed_sigs_unsupported -- i'm not sure whether we want to make some assertions on it, or just drop it
21:11:46 No real intentional difference. I guess I should keep them the same as tempurl so we're consistent.
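[editor's note] For context on the signatures discussed above: tempurl authorizes requests with an HMAC over a small canonical string, `<method>\n<expires>\n<path>`, keyed by the account's temp-url key. A minimal sketch of computing such a sig with sha256 (one of the digests the linked patch adds for formpost); the key and path values are made-up examples:

```python
import hmac
from hashlib import sha256

def tempurl_sig(key, method, expires, path):
    # tempurl signs "<method>\n<expires>\n<path>" with the account's temp-url key
    msg = '%s\n%s\n%s' % (method, expires, path)
    return hmac.new(key.encode(), msg.encode(), sha256).hexdigest()

sig = tempurl_sig('mykey', 'GET', 1655000000, '/v1/AUTH_test/c/o')
# a hex-encoded sha256 HMAC is always 64 characters
```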
21:11:55 well, it WAS always *just* `<hex sig>` until we added support for other signatures? And then it was always `<digest>:<sig>`?
21:12:29 er... *if* you specify a digest then it better be base64 - you can't do `<digest>:<hex sig>` for tempurl?
21:13:09 ^^^ that's the one
21:14:01 you can still do `<hex sig>` for sha1, sha256, and sha512 with tempurl -- it'll detect which digest based on the length
21:14:54 a hex-encoded sha512 sig seems pretty long to me, so i added base64 support -- but required the prefix
21:16:20 anyway, sounds like we've got a plan to move forward -- and i figure we can just drop the extra info that's being tracked in that test
21:16:49 #topic backend rate limiting
21:17:37 i don't *think* we've tried this out in prod yet, but wanted to check in to make sure
21:17:45 doesn't adding base64 increase a string by like a 3rd, so makes things longer.. maybe it's too early for my brain to work. either way, I'll match tempurl.
21:18:04 Oh, we've got stuff set up in staging
21:18:16 woo!
21:18:31 And I've been waiting on SRE to start testing backend ratelimit with the load generator
21:19:08 Not sure where that's at atm though. Long weekend for US and all that.
21:19:44 But will try and find out. I'm lined up to use the load balancers next for the noop pipemutex testing.
21:19:56 *load generators
21:19:59 👍
21:20:04 i know there were some concerns about interactions with the proxy's error limiting -- and it made me wonder if we should rethink how we do that
21:20:37 Al and I had a talk about that.. and we are more convinced it could do the right thing when used at scale..
21:21:01 not sure staging is the right scale though (there aren't many nodes to lose in staging).
21:21:04 but we'll see.
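[editor's note] To put numbers on the base64 question above: base64 grows the *raw* digest by about a third, but hex doubles it, so base64 still comes out noticeably shorter than hex for the same digest. A quick check for sha512:

```python
import base64

raw = bytes(64)  # a sha512 HMAC digest is 64 raw bytes
hex_len = len(raw.hex())              # 2 chars per byte -> 128
b64_len = len(base64.b64encode(raw))  # 4 chars per 3 bytes (padded) -> 88
# so a prefixed "sha512:<base64 sig>" stays well under the 128-char hex form
```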
21:21:20 i was thinking, what if instead of a threshold that would trigger some cool-down period, we started trying to estimate the probability of a request to a backend giving an error
21:22:00 and "error-limit" roughly that proportion of requests to the backend
21:22:12 oh, interesting thought
21:23:00 something like https://paste.opendev.org/show/bj5m3lXImDLiX5tp4Bvu/
21:23:33 mattoliver: i'm sure that SRE could use some help with load generation; we could also try turning the replication-facing rate limit way down and try to make sure the consistency engine still works as expected
21:26:05 @clay 👍️ good thinking! Confirming that, and no proxy to get in the way.. unless internal clients, but not sure how effective any error_limiting would be there.
21:27:05 pretty sure i've seen error limiting kick in for internal clients before (though i don't remember the context now)
21:27:52 i think it'd be super-interesting to try different generated loads against both error-limiting implementations, ideally with a few different configs
21:28:18 which probably means i ought to try to get tests passing with my idea :-)
21:28:44 When we have enough proxies, error limiting + backend ratelimiting kinda does what we expect. Taking out a proxy would actually cut off some load, giving "more" to the other nodes.. so at scale it might actually be great. But on a SAIO with 4 nodes and 1 proxy.. well, it just takes down your cluster pretty quickly with a lot of load :P
21:30:08 Yeah, depends on the life of the internal client I guess. If it sticks around or a lot is done, then the in-memory error_limiting structure will be used.
21:30:21 But yeah, testing both approaches would be interesting.
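[editor's note] The probabilistic alternative floated above (not Swift's current error-limiting implementation, and the linked paste may have expired) could look roughly like this sketch: keep an exponentially weighted estimate of each node's error rate and shed about that fraction of requests, rather than a hard threshold plus cool-down. All names here are hypothetical:

```python
import random

class ProbabilisticErrorLimiter:
    """Sketch only: estimate per-node error probability with an
    exponential moving average and skip a node with that probability."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha  # weight given to the newest observation
        self.rates = {}     # node id -> estimated error probability

    def record(self, node, is_error):
        old = self.rates.get(node, 0.0)
        self.rates[node] = old + self.alpha * (float(is_error) - old)

    def should_skip(self, node):
        # "error-limit" roughly the same proportion of requests
        # as the node's observed error rate
        return random.random() < self.rates.get(node, 0.0)

limiter = ProbabilisticErrorLimiter(alpha=0.5)
for err in (True, True, False):
    limiter.record('1.2.3.4:6200', err)
# EMA after True, True, False with alpha=0.5: 0.5 -> 0.75 -> 0.375
```

Unlike a cool-down, this never fully blacklists a node, so a recovering backend automatically wins traffic back as its observed error rate decays.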
21:31:02 all right, i think we know what we're doing next to get that merged
21:31:05 #topic s3api test suite
21:31:42 i remember one of the action items coming out of the last PTG was to actually run the handful of tests we've got under test/s3api in the gate
21:31:53 #link https://review.opendev.org/c/openstack/swift/+/843567
21:32:02 will do that
21:32:50 nice
21:33:18 acoles also poked at simplifying the process of running those tests against actual AWS
21:33:24 #link https://review.opendev.org/c/openstack/swift/+/838563
21:33:56 i think my only concern on that was that it'd be nice if we could piggy-back a little more on boto3's config parsing
21:34:58 maybe that'd be better as future work, though
21:37:01 if anyone has some review cycles to spare, those would be handy additions, i think
21:37:49 one last-minute topic (since both clayg and kota are here :-)
21:37:54 #topic expiring MPUs
21:38:24 a while back, clayg wrote a patch to delete MPU segments when the manifest expires
21:38:26 #link https://review.opendev.org/c/openstack/swift/+/800701
21:39:17 we've been running it in prod for a while -- i want to know: do we have any reservations about merging this to master?
21:40:10 Hmm
21:40:54 Makes sense, if only S3 orphan segments are gathered.
21:42:59 clayg, maybe it'd be worth going over the goals of the patch, how it decides when to delete segments, and which segments to delete?
21:44:28 there was some stuff in the response of the DELETE request to the manifest that indicated it was an s3 manifest - and because s3 MPU doesn't let you reference/access your segments, we know it's safe to DELETE them
21:46:07 I think it's good. Solves a big issue. Is there a way we can leverage something similar for the known related bug (on overwrite)? or do we need an auditor watcher to run periodically?
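[editor's note] For the segment-deletion discussion above, the bookkeeping a cleanup pass needs is in the manifest itself: a SLO manifest fetched with `?multipart-manifest=get` is a JSON list of segment entries whose `name` fields are `/<container>/<object>` paths. A hypothetical helper (not code from the patch) that extracts the paths a DELETE pass would target:

```python
import json

def segment_paths(manifest_body):
    # each SLO segment entry names its backing object as "/<container>/<obj>"
    return [tuple(seg['name'].lstrip('/').split('/', 1))
            for seg in json.loads(manifest_body)]

body = json.dumps([
    {'name': '/segs/big-upload/part-0001'},
    {'name': '/segs/big-upload/part-0002'},
])
paths = segment_paths(body)
# -> [('segs', 'big-upload/part-0001'), ('segs', 'big-upload/part-0002')]
```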
21:47:45 i think we still need the auditor -- even with just this, if the expirer falls over between deleting the manifest and finishing deleting the segments, there's still going to be orphans
21:48:02 true
21:49:13 It seems like a duplication of effort. If you have a semantic S3 watcher, it can delete.
21:52:27 i think it becomes a matter of how quickly it can get deleted -- an audit watcher would clean things up over the course of days, but our users often want to be able to get themselves under quota again sooner rather than later
21:53:00 so maybe there *is* an argument that we should try to do something similar for the overwrite case
21:53:15 instead of an inline delete, queueing them for delete I guess is the other option. But for overwrite, we'd need to check whether the object being overwritten is an MPU, and trigger a delete when we finalize (before we lose the old manifest).
21:54:25 but overwrite would be inline, so maybe queuing up with the expirer does make sense?
21:55:36 moving the manifest and pulling an expiry on it, so only it needs to get queued up (overwrite case)
21:55:46 sorry, now just thinking out loud.
21:56:05 *putting an expiry on it.
21:56:07 all right, i'll try to review the patch soon with an eye towards merging it as-is, and think some more about what else we can do
21:56:34 last couple minutes
21:56:39 #topic open discussion
21:56:47 anything else we ought to bring up this week?
21:58:09 all right, i'll call it then
21:58:21 thank you all for coming, and thank you for working on swift!
21:58:25 #endmeeting