21:00:18 <timburke> #startmeeting swift
21:00:19 <openstack> Meeting started Wed Sep 30 21:00:18 2020 UTC and is due to finish in 60 minutes.  The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:20 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:22 <openstack> The meeting name has been set to 'swift'
21:00:27 <timburke> who's here for the swift meeting?
21:00:41 <seongsoocho> o/
21:00:50 <mattoliverau> o/
21:01:06 <alecuyer> o/
21:01:09 <rledisez> o/
21:01:12 <kota_> hi
21:02:18 <timburke> not a whole lot on the agenda
21:02:21 <timburke> #link https://wiki.openstack.org/wiki/Meetings/Swift
21:02:42 <timburke> #topic ptg
21:02:51 <timburke> we're less than a month away!
21:03:04 <timburke> planning etherpad is at
21:03:06 <timburke> #link https://etherpad.opendev.org/p/swift-ptg-wallaby
21:03:56 <timburke> i'll plan on an ops-feedback-style session for one of our slots, and make another etherpad just for that
21:04:50 <timburke> moving on
21:04:55 <clayg> o/
21:05:02 <timburke> #topic replication lock
21:05:11 <timburke> #link https://review.opendev.org/#/c/754242/
21:05:12 <patchbot> patch 754242 - swift - Fix a race condition in case of cross-replication - 4 patch sets
21:05:19 <timburke> clayg, i think you added this, right?
21:05:27 <clayg> i forget why I wanted to talk about this - rledisez has been pushing up new patchsets daily and it just keeps getting better!
21:05:51 <clayg> rledisez: do you want some help with a ssync_reciever test?  I'm not sure I can a probe test ๐Ÿค”
21:06:22 <rledisez> I still have one (major) point to discuss. this patchset only fix the issue for SSYNC. But rsync is(must be) impacted too
21:07:02 <rledisez> and yeah, I don't know how to write a probetest for this
21:08:22 <rledisez> I don't know how to fix rsync actually
21:08:53 <clayg> meh rsync
21:09:18 <timburke> part of me wants the excuse to change our "saio" to a "saif" or so...
21:09:43 <clayg> swift-all-in-fhawhatnow?
21:10:29 <timburke> four! so like, `vagrant up` and you get four vms, each independently configurable
21:10:43 <clayg> rledisez: maybe this is just another reason to get of rsync and start working more on tsync - do you think we could "catch" rsync doing this?  how did you find it with ssync?
21:10:56 <timburke> or i could learn the dockers ๐Ÿค”
21:11:31 <rledisez> clayg: we wrote a tool do scan and discover in near real time disappearing fragments, and we started a rebalance. And we saw it happening
21:11:48 <clayg> I think we can say "closes-bug" with a caveat it still exists in rsync and has forever I think?  not being in the datapath takes so many good options off the table.
21:11:56 <rledisez> we have a cluster were it is (too) easily reproducible :(
21:12:11 <clayg> ๐Ÿ˜ข
21:12:25 <timburke> that's a damn cool tool
21:12:46 <mattoliverau> could we add the current ring version in the replicate verb call, and if I receive one newer I look for a new ring?
21:13:02 <clayg> so it's kinda like a dispersion report sort of thing?  like it checks on individual frags?
21:13:13 <mattoliverau> that plus a lock, cause thre would still be overlap when things can happen
21:13:34 <clayg> mattoliverau: so the tricky thing with update_deleted is it doesn't check with the remote node via REPLICATE (hash recalc) before pushing it's local data over rsync
21:13:55 <rledisez> alecuyer: is the one writing it. ir scan disk and push in a single DB all files. then a tool scan the DB and list objects that don't have the required number of frags or replicas.
21:14:01 <clayg> it just says "this isn't MY part; rsync away - 1, 2, 3 - everyone good?  BELETED"
21:14:44 <rledisez> mattoliverau: the object-server does not have the ring loaded so it can't know the ring version.
21:15:12 <rledisez> + what clayg just said
21:15:27 <mattoliverau> oh yes, the whole no ring in the storage daemon thing.
21:16:22 <timburke> (not having the ring loaded isn't the end of the world; espescially if we only care about some of the json metadata at the front and not the full set of assignments -- it's something solvable)
21:16:22 <clayg> I mean, there's some version of "load the ring" that could just pull the metadata like the ring version, I think it'd be useful to pursue but the lock is better
21:17:34 <clayg> it's kinda weird that the replicator is so quick to pick up and notice a new part but so slow to realize it has new rings...
21:18:18 <rledisez> to be clear, it happens because our ring deployment takes about 30 minutes. if the ring deployment was synchronised, we may have never hit that bug
21:18:23 <clayg> everything is probably worse with EC than replicated - just in that deleting a copy doesn't trigger a rebuild
21:18:29 <timburke> i think the trouble is that it may *not* have the new ring yet -- yeah, what rledisez said
21:19:32 <clayg> so someone pushed it parts, and it doesn't believe they belong there... ๐Ÿค” lots more options with ssync than rsync
21:20:14 <clayg> anyway, I think rsync can suck a lemon and fixing ssync is going to be great!
21:20:43 <timburke> and much mor costly to be wrong with ssync than rsync. i agree; we ought to fix ssync and footnote rsync
21:20:47 <clayg> I'm super open to saying we want to deprecate rsync replication too, like you can't complain about bugs in deprecated deployment options right?
21:20:56 <clayg> timburke gets it
21:22:11 <rledisez> tbh, i'm not confortable to keep rsync in the code if the bug is not fixed. it's pretty bad, it bite us pretty hardโ€ฆ so i can't imagine that kind of bug being unfixed if we know it's there
21:23:42 <timburke> it's not that we *shouldn't* fix it, it's that we should land what fixes we have in hand, and shore up the rest when we can
21:24:02 <clayg> well we can't pull it out with deprecating - but it could be a strong warning - also I might be biased cause I'm not sure how to fix it... so maybe we can just keep thinking on that harder after we fix ssync (no reason to wait on fixing that right?)
21:25:31 <clayg> rledisez: it's not a new bug - there may be something about rsync with replicated data that makes it less of a problem for some reason we don't entirely understand yet ๐Ÿค”
21:25:52 <rledisez> clayg: agree with that. we can fix ssync quickly and think of rsync at the same time
21:26:30 <timburke> so, wanting to work toward that rsync fix: what about a pre-rsync REPLICATE that grabs the partition lock with some (configurable) lease time and a post-rsync REPLICATE to release it again?
21:26:59 <clayg> timburke: ๐Ÿคฏ
21:27:30 <rledisez> it's gonna be tricky to keep the lock without an active connection, but it's something we can try
21:27:52 <clayg> i'd hate to add additional suffix hashing that not's necessary - so it'd probably look more like "fool, i'm reverthing this to you - don't delete it for X hours" kind of new pre-flight message
21:27:53 <rledisez> i was thinking an SSYNC connection that just "wrap" the rsync run
21:28:16 <rledisez> and keeping it active by sending "noop" action
21:28:59 <clayg> rledisez: that sounds not terrible!
21:29:08 <timburke> either of those could work, too :-)
21:29:29 <clayg> look us acting like we know how to build software - thanks for finding the bug @rledisez !!!
21:29:57 <mattoliverau> +1
21:30:25 <kota_> +1
21:31:02 <rledisez> i'll finish the ssync patch and i'll do some test on rsync. any idea how to test that in an automated way?
21:32:08 <timburke> i'll start trying to get a multi-vm dev setup going with an explicit goal of repro-ing this and let you know how it goes
21:32:12 <mattoliverau> I guess we need to have a different rings on different "nodes" so maybe a ring override folder for siao
21:32:52 <clayg> rledisez: i wouldn't block anything on probe test - there's a lot of new requirements (timing, ring changes)
21:33:06 <rledisez> ok
21:33:19 <clayg> rledisez: there's a unittest that already has an ssync_reciever object server and reconstructor daemon - I'd start there
21:34:13 <timburke> anything else to bring up on this?
21:34:44 <rledisez> not on my side
21:34:59 <timburke> #topic async SLO segment deletion
21:35:02 <timburke> #link https://review.opendev.org/#/c/733026/
21:35:03 <patchbot> patch 733026 - swift - Add a new URL parameter to allow for async cleanup... - 11 patch sets
21:36:02 <timburke> so clay pointed out that we should bring this up with a broader audience before merging as-s
21:36:54 <clayg> well, but also you already added the operator config value
21:37:13 <clayg> what do folks think about having users dump a lot of "to be expired *rfn*" objects into your .expiring_objects quuee
21:37:17 <clayg> does anyone monitor that shit?
21:37:35 <timburke> still, seems worth asking whether i got the default for the config option right ;-)
21:38:14 <rledisez> i'd like to monitore it, but I never took time to wrote that patch in the expirer that would send a metric of the queue size
21:38:40 <mattoliverau> that's what the .expiring_objects queue is for. async delete. it's a good idea.
21:38:49 <rledisez> we sometime get late on deletion because of a burst, so it's useful to know something is going wrong
21:39:10 <mattoliverau> maybe we need to add some tooling around viewing and maintaining the queue would be useful though
21:39:30 <clayg> rledisez: something like the "lag" metric for the container-updater?  I never quite understood how that metric works in a time series visualization...
21:39:52 <clayg> ... although I *do* have some better tools for development with metrics - maybe I just need to look at it again!
21:40:39 <clayg> mattoliverau: all I have is https://gist.github.com/clayg/7f66eab2a61c77869e1e84ac4ed6f1df (run it on any node every N minutes and plot the numbers over time)
21:41:00 <mattoliverau> I always though expired queue deletion was a soft shedule. it shouldn't be deleted until at least that time.
21:41:26 <clayg> stale entires should be zero-ish, but any time someone does an SLO delete it would spike and depending on how aggressive your object-epxirers are configured...
21:41:27 <timburke> speaking of lag metrics, there's https://review.opendev.org/#/c/735271/ ...
21:41:27 <patchbot> patch 735271 - swift - metrics: Add lag metric to expirer - 1 patch set
21:42:24 <clayg> timburke: yeah!  i'll take a look at that one again, maybe that's just what we need (or a version of that which is slo-async-delete aware)
21:42:34 <mattoliverau> clayg: cool that's something, tracking the size is great :)
21:44:09 <clayg> i'm not even sure I could easily get all those lag metrics to even pop up in a dev env
21:44:55 <timburke> clayg, good point -- it should probably emit different metrics for expired vs. async deleted....
21:45:47 <clayg> but just poking at the queue listings doesn't scale horizontally, one of the container walkers could dump stats as it goes and then you just ... divide by replica count?  ๐Ÿคฎ
21:47:19 <clayg> ok - so "monitor expiring queue" sounds like something everyone is keen on, but no objections to extending SLO delete in this fashion?
21:48:05 <timburke> does anyone feel uncomfortable having the async-delete behavior as the default?
21:48:08 <clayg> having the objects in the segment container listing until their cleared seems reasonable - and it doesn't hurt anything if the client ends up deleting them on their own
21:48:31 <mattoliverau> no I think it's a great idea because it means we use an existing mechanism that is ment to do it. I'll review the patch today
21:49:05 <clayg> once the async delete successfully queues everything - the manifest is gone
21:49:49 <mattoliverau> timburke: I'm warming to the idea of it being the default
21:50:07 <mattoliverau> Will think about it while I review the patch.
21:50:48 <timburke> cool, sounds like we're good to go then, thanks. between mattoliverau, clayg, and zaitcev, i'm sure this is gonna turn out great :-)
21:50:59 <timburke> #topic open discussion
21:51:11 <timburke> last few minutes: anything else to bring up?
21:53:06 <timburke> i updated https://review.opendev.org/#/c/738959/ to have LIBERASURECODE_WRITE_LEGACY_CRC unset, set to "", and set to "0" all mean the same thing. i realized the commit message needs some clean up, but otherwise i think it's good to go
21:53:06 <patchbot> patch 738959 - liberasurecode - Be willing to write fragments with legacy crc - 3 patch sets
21:53:24 <clayg> ๐Ÿฅณ
21:53:41 <kota_> ok, thx. I'll circle back to the patch around Friday.
21:53:51 <timburke> thanks kota_!
21:54:20 <timburke> the swift side at https://review.opendev.org/#/c/739164/ should maybe be updated to set LIBERASURECODE_WRITE_LEGACY_CRC=1 instead of true, but that's pretty minor
21:54:21 <patchbot> patch 739164 - swift - ec: Add an option to write fragments with legacy crc - 2 patch sets
21:55:25 <clayg> I was playing with p 749400 trying to preserve existing behavior - but it got wonky when I put p 749401 on top of it ๐Ÿ˜ž
21:55:26 <patchbot> https://review.opendev.org/#/c/749400/ - swift - proxy: Put storage policy index in object responses - 3 patch sets
21:55:28 <patchbot> https://review.opendev.org/#/c/749401/ - swift - s3api: Ensure backend headers make it through s3api - 3 patch sets
21:57:02 <clayg> existing behavior for symlinks relies on the client request headers getting a x-backend-storage-policy index - and the s3api bug is that proxy-logging doesn't see those swift_req headers
21:57:02 <timburke> yeah, i'm getting less and less certain that we want p 749400 :-/
21:57:02 <patchbot> https://review.opendev.org/#/c/749400/ - swift - proxy: Put storage policy index in object responses - 3 patch sets
21:57:51 <clayg> timburke: I tried changing it to x-backend-proxy-policy-index; but that was mostly because all the resp.headers.pop were making me nervous (i thought we'd leave 'em just in case)
21:58:20 <timburke> it *seems like* the wort of thing that's reasonable... until you start actually seeing what it does to logs :-(
21:58:37 <clayg> but it doesn't really matter - because if we're using the response headers it's whatever request was last - so swift api would prefer the client.req header and s3api wouldn't have that - so it'd use the resp header it just sucked
21:59:42 <clayg> ah yeah - so for *me* the bug is just s3api metrics not having policy index since we started actually logging the s3api requests (instead of just their underlying swift request)
22:00:04 <clayg> putting the policy index in the response headers may not be the way to go - i'll try to refocus on just the s3api bug
22:00:23 <timburke> i think we end up needing to make sure that s3api copies some (request!) headers back into the provided env from a subrequest env -- yuk
22:00:45 <timburke> oh! one last thing: the election nomination period is over -- we should all read over TC candidacy messages so we're ready for the election
22:00:49 <timburke> #link http://lists.openstack.org/pipermail/openstack-discuss/2020-September/017668.html
22:01:00 <timburke> and #link https://governance.openstack.org/election/#wallaby-tc-candidates
22:01:07 <clayg> and i'm also struggling with making SLO responses that fail mid-transfer show up as 503 in logs p 752770 but that is ALSO a request/resp/environ/context mess ๐Ÿ˜ก
22:01:07 <patchbot> https://review.opendev.org/#/c/752770/ - swift - Log error processing manifest as ServerError - 3 patch sets
22:01:44 <timburke> all right, we're about out of time
22:02:23 <timburke> but we can argue about the wisdom of WSGI back in -swift ;-)
22:02:36 <timburke> thank you all for coming, and thank you for working on swift!
22:02:42 <timburke> #endmeeting