21:00:18 <timburke> #startmeeting swift 21:00:19 <openstack> Meeting started Wed Sep 30 21:00:18 2020 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot. 21:00:20 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. 21:00:22 <openstack> The meeting name has been set to 'swift' 21:00:27 <timburke> who's here for the swift meeting? 21:00:41 <seongsoocho> o/ 21:00:50 <mattoliverau> o/ 21:01:06 <alecuyer> o/ 21:01:09 <rledisez> o/ 21:01:12 <kota_> hi 21:02:18 <timburke> not a whole lot on the agenda 21:02:21 <timburke> #link https://wiki.openstack.org/wiki/Meetings/Swift 21:02:42 <timburke> #topic ptg 21:02:51 <timburke> we're less than a month away! 21:03:04 <timburke> planning etherpad is at 21:03:06 <timburke> #link https://etherpad.opendev.org/p/swift-ptg-wallaby 21:03:56 <timburke> i'll plan on an ops-feedback-style session for one of our slots, and make another etherpad just for that 21:04:50 <timburke> moving on 21:04:55 <clayg> o/ 21:05:02 <timburke> #topic replication lock 21:05:11 <timburke> #link https://review.opendev.org/#/c/754242/ 21:05:12 <patchbot> patch 754242 - swift - Fix a race condition in case of cross-replication - 4 patch sets 21:05:19 <timburke> clayg, i think you added this, right? 21:05:27 <clayg> i forget why I wanted to talk about this - rledisez has been pushing up new patchsets daily and it just keeps getting better! 21:05:51 <clayg> rledisez: do you want some help with a ssync_reciever test? I'm not sure I can a probe test ๐ค 21:06:22 <rledisez> I still have one (major) point to discuss. this patchset only fix the issue for SSYNC. But rsync is(must be) impacted too 21:07:02 <rledisez> and yeah, I don't know how to write a probetest for this 21:08:22 <rledisez> I don't know how to fix rsync actually 21:08:53 <clayg> meh rsync 21:09:18 <timburke> part of me wants the excuse to change our "saio" to a "saif" or so... 21:09:43 <clayg> swift-all-in-fhawhatnow? 21:10:29 <timburke> four! so like, `vagrant up` and you get four vms, each independently configurable 21:10:43 <clayg> rledisez: maybe this is just another reason to get of rsync and start working more on tsync - do you think we could "catch" rsync doing this? how did you find it with ssync? 21:10:56 <timburke> or i could learn the dockers ๐ค 21:11:31 <rledisez> clayg: we wrote a tool do scan and discover in near real time disappearing fragments, and we started a rebalance. And we saw it happening 21:11:48 <clayg> I think we can say "closes-bug" with a caveat it still exists in rsync and has forever I think? not being in the datapath takes so many good options off the table. 21:11:56 <rledisez> we have a cluster were it is (too) easily reproducible :( 21:12:11 <clayg> ๐ข 21:12:25 <timburke> that's a damn cool tool 21:12:46 <mattoliverau> could we add the current ring version in the replicate verb call, and if I receive one newer I look for a new ring? 21:13:02 <clayg> so it's kinda like a dispersion report sort of thing? like it checks on individual frags? 21:13:13 <mattoliverau> that plus a lock, cause thre would still be overlap when things can happen 21:13:34 <clayg> mattoliverau: so the tricky thing with update_deleted is it doesn't check with the remote node via REPLICATE (hash recalc) before pushing it's local data over rsync 21:13:55 <rledisez> alecuyer: is the one writing it. ir scan disk and push in a single DB all files. then a tool scan the DB and list objects that don't have the required number of frags or replicas. 21:14:01 <clayg> it just says "this isn't MY part; rsync away - 1, 2, 3 - everyone good? BELETED" 21:14:44 <rledisez> mattoliverau: the object-server does not have the ring loaded so it can't know the ring version. 21:15:12 <rledisez> + what clayg just said 21:15:27 <mattoliverau> oh yes, the whole no ring in the storage daemon thing. 21:16:22 <timburke> (not having the ring loaded isn't the end of the world; espescially if we only care about some of the json metadata at the front and not the full set of assignments -- it's something solvable) 21:16:22 <clayg> I mean, there's some version of "load the ring" that could just pull the metadata like the ring version, I think it'd be useful to pursue but the lock is better 21:17:34 <clayg> it's kinda weird that the replicator is so quick to pick up and notice a new part but so slow to realize it has new rings... 21:18:18 <rledisez> to be clear, it happens because our ring deployment takes about 30 minutes. if the ring deployment was synchronised, we may have never hit that bug 21:18:23 <clayg> everything is probably worse with EC than replicated - just in that deleting a copy doesn't trigger a rebuild 21:18:29 <timburke> i think the trouble is that it may *not* have the new ring yet -- yeah, what rledisez said 21:19:32 <clayg> so someone pushed it parts, and it doesn't believe they belong there... ๐ค lots more options with ssync than rsync 21:20:14 <clayg> anyway, I think rsync can suck a lemon and fixing ssync is going to be great! 21:20:43 <timburke> and much mor costly to be wrong with ssync than rsync. i agree; we ought to fix ssync and footnote rsync 21:20:47 <clayg> I'm super open to saying we want to deprecate rsync replication too, like you can't complain about bugs in deprecated deployment options right? 21:20:56 <clayg> timburke gets it 21:22:11 <rledisez> tbh, i'm not confortable to keep rsync in the code if the bug is not fixed. it's pretty bad, it bite us pretty hardโฆ so i can't imagine that kind of bug being unfixed if we know it's there 21:23:42 <timburke> it's not that we *shouldn't* fix it, it's that we should land what fixes we have in hand, and shore up the rest when we can 21:24:02 <clayg> well we can't pull it out with deprecating - but it could be a strong warning - also I might be biased cause I'm not sure how to fix it... so maybe we can just keep thinking on that harder after we fix ssync (no reason to wait on fixing that right?) 21:25:31 <clayg> rledisez: it's not a new bug - there may be something about rsync with replicated data that makes it less of a problem for some reason we don't entirely understand yet ๐ค 21:25:52 <rledisez> clayg: agree with that. we can fix ssync quickly and think of rsync at the same time 21:26:30 <timburke> so, wanting to work toward that rsync fix: what about a pre-rsync REPLICATE that grabs the partition lock with some (configurable) lease time and a post-rsync REPLICATE to release it again? 21:26:59 <clayg> timburke: ๐คฏ 21:27:30 <rledisez> it's gonna be tricky to keep the lock without an active connection, but it's something we can try 21:27:52 <clayg> i'd hate to add additional suffix hashing that not's necessary - so it'd probably look more like "fool, i'm reverthing this to you - don't delete it for X hours" kind of new pre-flight message 21:27:53 <rledisez> i was thinking an SSYNC connection that just "wrap" the rsync run 21:28:16 <rledisez> and keeping it active by sending "noop" action 21:28:59 <clayg> rledisez: that sounds not terrible! 21:29:08 <timburke> either of those could work, too :-) 21:29:29 <clayg> look us acting like we know how to build software - thanks for finding the bug @rledisez !!! 21:29:57 <mattoliverau> +1 21:30:25 <kota_> +1 21:31:02 <rledisez> i'll finish the ssync patch and i'll do some test on rsync. any idea how to test that in an automated way? 21:32:08 <timburke> i'll start trying to get a multi-vm dev setup going with an explicit goal of repro-ing this and let you know how it goes 21:32:12 <mattoliverau> I guess we need to have a different rings on different "nodes" so maybe a ring override folder for siao 21:32:52 <clayg> rledisez: i wouldn't block anything on probe test - there's a lot of new requirements (timing, ring changes) 21:33:06 <rledisez> ok 21:33:19 <clayg> rledisez: there's a unittest that already has an ssync_reciever object server and reconstructor daemon - I'd start there 21:34:13 <timburke> anything else to bring up on this? 21:34:44 <rledisez> not on my side 21:34:59 <timburke> #topic async SLO segment deletion 21:35:02 <timburke> #link https://review.opendev.org/#/c/733026/ 21:35:03 <patchbot> patch 733026 - swift - Add a new URL parameter to allow for async cleanup... - 11 patch sets 21:36:02 <timburke> so clay pointed out that we should bring this up with a broader audience before merging as-s 21:36:54 <clayg> well, but also you already added the operator config value 21:37:13 <clayg> what do folks think about having users dump a lot of "to be expired *rfn*" objects into your .expiring_objects quuee 21:37:17 <clayg> does anyone monitor that shit? 21:37:35 <timburke> still, seems worth asking whether i got the default for the config option right ;-) 21:38:14 <rledisez> i'd like to monitore it, but I never took time to wrote that patch in the expirer that would send a metric of the queue size 21:38:40 <mattoliverau> that's what the .expiring_objects queue is for. async delete. it's a good idea. 21:38:49 <rledisez> we sometime get late on deletion because of a burst, so it's useful to know something is going wrong 21:39:10 <mattoliverau> maybe we need to add some tooling around viewing and maintaining the queue would be useful though 21:39:30 <clayg> rledisez: something like the "lag" metric for the container-updater? I never quite understood how that metric works in a time series visualization... 21:39:52 <clayg> ... although I *do* have some better tools for development with metrics - maybe I just need to look at it again! 21:40:39 <clayg> mattoliverau: all I have is https://gist.github.com/clayg/7f66eab2a61c77869e1e84ac4ed6f1df (run it on any node every N minutes and plot the numbers over time) 21:41:00 <mattoliverau> I always though expired queue deletion was a soft shedule. it shouldn't be deleted until at least that time. 21:41:26 <clayg> stale entires should be zero-ish, but any time someone does an SLO delete it would spike and depending on how aggressive your object-epxirers are configured... 21:41:27 <timburke> speaking of lag metrics, there's https://review.opendev.org/#/c/735271/ ... 21:41:27 <patchbot> patch 735271 - swift - metrics: Add lag metric to expirer - 1 patch set 21:42:24 <clayg> timburke: yeah! i'll take a look at that one again, maybe that's just what we need (or a version of that which is slo-async-delete aware) 21:42:34 <mattoliverau> clayg: cool that's something, tracking the size is great :) 21:44:09 <clayg> i'm not even sure I could easily get all those lag metrics to even pop up in a dev env 21:44:55 <timburke> clayg, good point -- it should probably emit different metrics for expired vs. async deleted.... 21:45:47 <clayg> but just poking at the queue listings doesn't scale horizontally, one of the container walkers could dump stats as it goes and then you just ... divide by replica count? ๐คฎ 21:47:19 <clayg> ok - so "monitor expiring queue" sounds like something everyone is keen on, but no objections to extending SLO delete in this fashion? 21:48:05 <timburke> does anyone feel uncomfortable having the async-delete behavior as the default? 21:48:08 <clayg> having the objects in the segment container listing until their cleared seems reasonable - and it doesn't hurt anything if the client ends up deleting them on their own 21:48:31 <mattoliverau> no I think it's a great idea because it means we use an existing mechanism that is ment to do it. I'll review the patch today 21:49:05 <clayg> once the async delete successfully queues everything - the manifest is gone 21:49:49 <mattoliverau> timburke: I'm warming to the idea of it being the default 21:50:07 <mattoliverau> Will think about it while I review the patch. 21:50:48 <timburke> cool, sounds like we're good to go then, thanks. between mattoliverau, clayg, and zaitcev, i'm sure this is gonna turn out great :-) 21:50:59 <timburke> #topic open discussion 21:51:11 <timburke> last few minutes: anything else to bring up? 21:53:06 <timburke> i updated https://review.opendev.org/#/c/738959/ to have LIBERASURECODE_WRITE_LEGACY_CRC unset, set to "", and set to "0" all mean the same thing. i realized the commit message needs some clean up, but otherwise i think it's good to go 21:53:06 <patchbot> patch 738959 - liberasurecode - Be willing to write fragments with legacy crc - 3 patch sets 21:53:24 <clayg> ๐ฅณ 21:53:41 <kota_> ok, thx. I'll circle back to the patch around Friday. 21:53:51 <timburke> thanks kota_! 21:54:20 <timburke> the swift side at https://review.opendev.org/#/c/739164/ should maybe be updated to set LIBERASURECODE_WRITE_LEGACY_CRC=1 instead of true, but that's pretty minor 21:54:21 <patchbot> patch 739164 - swift - ec: Add an option to write fragments with legacy crc - 2 patch sets 21:55:25 <clayg> I was playing with p 749400 trying to preserve existing behavior - but it got wonky when I put p 749401 on top of it ๐ 21:55:26 <patchbot> https://review.opendev.org/#/c/749400/ - swift - proxy: Put storage policy index in object responses - 3 patch sets 21:55:28 <patchbot> https://review.opendev.org/#/c/749401/ - swift - s3api: Ensure backend headers make it through s3api - 3 patch sets 21:57:02 <clayg> existing behavior for symlinks relies on the client request headers getting a x-backend-storage-policy index - and the s3api bug is that proxy-logging doesn't see those swift_req headers 21:57:02 <timburke> yeah, i'm getting less and less certain that we want p 749400 :-/ 21:57:02 <patchbot> https://review.opendev.org/#/c/749400/ - swift - proxy: Put storage policy index in object responses - 3 patch sets 21:57:51 <clayg> timburke: I tried changing it to x-backend-proxy-policy-index; but that was mostly because all the resp.headers.pop were making me nervous (i thought we'd leave 'em just in case) 21:58:20 <timburke> it *seems like* the wort of thing that's reasonable... until you start actually seeing what it does to logs :-( 21:58:37 <clayg> but it doesn't really matter - because if we're using the response headers it's whatever request was last - so swift api would prefer the client.req header and s3api wouldn't have that - so it'd use the resp header it just sucked 21:59:42 <clayg> ah yeah - so for *me* the bug is just s3api metrics not having policy index since we started actually logging the s3api requests (instead of just their underlying swift request) 22:00:04 <clayg> putting the policy index in the response headers may not be the way to go - i'll try to refocus on just the s3api bug 22:00:23 <timburke> i think we end up needing to make sure that s3api copies some (request!) headers back into the provided env from a subrequest env -- yuk 22:00:45 <timburke> oh! one last thing: the election nomination period is over -- we should all read over TC candidacy messages so we're ready for the election 22:00:49 <timburke> #link http://lists.openstack.org/pipermail/openstack-discuss/2020-September/017668.html 22:01:00 <timburke> and #link https://governance.openstack.org/election/#wallaby-tc-candidates 22:01:07 <clayg> and i'm also struggling with making SLO responses that fail mid-transfer show up as 503 in logs p 752770 but that is ALSO a request/resp/environ/context mess ๐ก 22:01:07 <patchbot> https://review.opendev.org/#/c/752770/ - swift - Log error processing manifest as ServerError - 3 patch sets 22:01:44 <timburke> all right, we're about out of time 22:02:23 <timburke> but we can argue about the wisdom of WSGI back in -swift ;-) 22:02:36 <timburke> thank you all for coming, and thank you for working on swift! 22:02:42 <timburke> #endmeeting