21:00:28 <timburke__> #startmeeting swift
21:00:28 <opendevmeet> Meeting started Wed May 11 21:00:28 2022 UTC and is due to finish in 60 minutes.  The chair is timburke__. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:28 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:28 <opendevmeet> The meeting name has been set to 'swift'
21:00:35 <timburke__> who's here for the swift meeting?
21:00:46 <kota> hi
21:00:52 <mattoliver> o/
21:01:53 <timburke> as usual, the agenda's at
21:01:59 <timburke> #link https://wiki.openstack.org/wiki/Meetings/Swift
21:02:08 <acoles> o/
21:02:19 <timburke> but i mostly just want to follow up on the various threads of work i mentioned last week
21:02:28 <timburke> #topic ring v2
21:02:55 <timburke> i don't think we've done further work on this yet -- can anyone spare some review cycles for the head of the chain?
21:03:42 <mattoliver> I can go back and look again, but probably need fresh eyes too.
21:04:03 <timburke> thanks 👍
21:04:24 <timburke> #topic memcache errors
21:05:04 <timburke> clayg took a look and seemed to like it -- iirc we're going to try it out in prod this week and see how it goes
21:05:42 <timburke> i think it'll be of tremendous use to callers to be able to tell the difference between a cache miss and an error talking to memcached
21:06:10 <mattoliver> +1
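For context, the distinction being aimed for looks roughly like the sketch below. The names (SimpleCache, CacheError, cached_lookup) are hypothetical stand-ins, not Swift's actual memcached API; the point is only that an unreachable cache should surface as an exception rather than masquerade as a miss.

```python
import socket


class CacheError(Exception):
    """Hypothetical: raised when the cache server could not be reached."""


class SimpleCache(object):
    """Hypothetical cache client, not Swift's MemcacheRing."""

    def __init__(self, addr):
        self.addr = addr

    def get(self, key, raise_on_error=False):
        try:
            return self._fetch(key)  # talk to the cache server; None on a miss
        except OSError:
            if raise_on_error:
                raise CacheError('error talking to %r' % (self.addr,))
            return None  # old behaviour: an error is indistinguishable from a miss

    def _fetch(self, key):
        raise socket.timeout()  # stand-in for a real memcached round trip


def cached_lookup(cache, key, backend_lookup):
    try:
        cached = cache.get(key, raise_on_error=True)
    except CacheError:
        # real error: fall back to the backend, but don't try to re-populate
        # the cache, and count it as an error rather than a miss
        return backend_lookup()
    if cached is None:
        # genuine miss: fetch from the backend and (elsewhere) re-populate
        return backend_lookup()
    return cached
```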
21:06:51 <timburke> #topic eventlet and logging handler locks
21:08:24 <mattoliver> I took a look at this in a vsaio. Threw a bunch of load at it, upped the workers, and it all seemed to work fine.
21:08:29 <timburke> we're also going to try this out in prod. it may take a little longer to get data on it, though, since it seems riskier (and so requires ops to opt-in to the new behavior)
21:08:46 <timburke> good proof point! thanks mattoliver
21:09:34 <timburke> it'll be real interesting if we can notice a performance difference when we try it in prod
21:09:51 <mattoliver> I then wanted to go to an ELK environment to better tell if there were any logging issues, since Filebeat tries to break things into JSON
21:10:02 <mattoliver> But stalled getting the env set up
21:10:32 <mattoliver> Might have a play with it now that we'll have it in our staging cluster.
21:11:04 <timburke> 👍
21:11:20 <timburke> #topic backend rate limiting
21:11:38 <timburke> sorry, acoles, i'm pretty sure i promised reviews and haven't delivered
21:12:21 <acoles> hehe, no worries, I've not circled back to this yet (in terms of deploying at least)
21:12:50 <acoles> I think since last week we've had some thoughts about how the proxy should deal with 529s...
21:14:11 <acoles> originally I felt the proxy should not error limit. I still think that is the case given the difference in time scales (per-second rate limiting vs a 60-second error limit), BUT it could be mitigated by increasing the backend ratelimit buffer to match
21:14:37 <acoles> i.e. average backend ratelimiting over ~60secs
21:15:29 <acoles> also, we need to consider that there are N proxies making requests to each backend ratelimiter
21:17:05 <acoles> in the meantime, since last week I think, I added a patch to the chain to load the ratelimit config from a separate file, and to periodically reload
21:17:06 <timburke> do we already have plumbing to make the buffer time configurable?
21:17:16 <mattoliver> If only we had global error limiting 😜
21:17:31 <mattoliver> Oh nice
21:17:58 <acoles> timburke: yes, rate_buffer can be configured in the upstream patch
21:19:01 <timburke> and do we want to make it an option in the proxy whether to error-limit 529s or not? then we'd have a whole bunch of knobs that ops can try out, and hopefully they can run some experiments and tell us how they like it
21:20:09 <acoles> that's an idea I have considered, but not actioned yet
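To make the time-scale argument above concrete, here is a rough token-bucket-style sketch in which rate_buffer controls how much credit can accumulate, i.e. over how long the per-second rate is effectively averaged. The class and its internals are illustrative only and are not the code in the upstream patch.

```python
import time


class BackendRateLimiter(object):
    """Illustrative limiter: allow max_rate requests/sec on average,
    with up to rate_buffer seconds' worth of burst."""

    def __init__(self, max_rate, rate_buffer=5.0):
        self.max_rate = float(max_rate)        # requests per second
        self.rate_buffer = float(rate_buffer)  # seconds of allowed burst
        self.running_time = time.time()

    def is_allowed(self):
        now = time.time()
        # never let the "paid up until" clock fall more than rate_buffer
        # behind now, so at most rate_buffer * max_rate requests can burst
        self.running_time = max(self.running_time, now - self.rate_buffer)
        if self.running_time > now:
            return False  # over the rate: the backend would answer 529
        self.running_time += 1.0 / self.max_rate
        return True


# With rate_buffer=60, a backend that was idle for a minute will accept a
# minute's worth of requests in one burst, so the per-second limit behaves
# more like an average over ~60s -- closer to the proxy's 60s error-limit
# window, which is the mitigation discussed above.
limiter = BackendRateLimiter(max_rate=50, rate_buffer=60)
```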
21:20:47 <timburke> all right, that's it for old business
21:20:54 <timburke> #topic shard range gaps
21:22:01 <timburke> part of why i didn't get to reviewing the ratelimiting stuff -- i slipped up last week and broke some shard ranges in prod
21:23:24 <timburke> we were having trouble keeping shard ranges in cache -- memcache would often complain about being out of memory
21:23:44 <timburke> i wanted to relieve some of the memory pressure and knew there were shards getting cached that didn't have any objects in them, so i wanted to shrink them away
21:23:57 * zaitcev is getting excited at the possibility of dark data in production.
21:24:26 <timburke> unfortunately, there was also a transient shard-range overlap which blocked me from trying to compact
21:24:46 <timburke> i thought, "overlaps seem bad," and ran repair
21:25:33 <zaitcev> Not to wish evil on Tim and crew, but I kinda wish my work on watchers were helpful to someone.
21:25:58 <zaitcev> So what did repair do?
21:26:30 <timburke> only, the trouble was that (1) a shard had sharded, (2) at least one replica fully split up, and (3) at least one other replica reported back to the root. everybody was currently marked active, but it wasn't going to stay that way
21:27:21 <acoles> @zaitcev IIRC we try very hard to send object updates *somewhere*, ultimately falling back to the root container 🤞
21:27:51 <timburke> iirc, the child shards all got marked shrinking. so now the parent was trying to kick rows out to the children and delete itself, while the children were trying to send rows back to the parent and delete *themselves*
21:28:45 <timburke> eventually some combination succeeded well enough to where the children got marked shrunk and the parent got marked sharded. and now nobody's covering the range
21:29:22 <mattoliver> So the "overlap" was a normal shard event, just everything hadn't replicated. And after repair the shrink acceptor disappeared (deleted itself).
21:29:33 <timburke> acoles was quick to get some new code into swift-manage-shard-ranges to help us fix it
21:29:35 <timburke> #link https://review.opendev.org/c/openstack/swift/+/841143
21:29:52 <timburke> and immediately following this meeting, i'm going to try it out :-)
21:31:46 <timburke> any questions or comments on any of that?
21:32:28 <mattoliver> Nope, but I've been living it 😀 interested to see how it goes
21:32:39 <zaitcev> I haven't even enabled shrinking, so none.
21:33:59 <mattoliver> Also, it seems fixing the early ACTIVE/CLEAVED states would've helped here.
21:34:18 <timburke> i was just going to mention that :-)
21:34:45 <mattoliver> So maybe we're seeing the early ACTIVE edge case in real life, rather than in theory.
21:35:14 <mattoliver> So we weren't crazy to work on it Al, that's a silver lining.
21:35:37 <timburke> the chain ending at https://review.opendev.org/c/openstack/swift/+/789651 likely would have prevented me getting into trouble, since i would've seen some mix of active and cleaved/created and no overlap would've been detected
21:36:28 <timburke> all right, that's all i've got
21:36:32 <timburke> #topic open discussion
21:36:40 <timburke> what else should we bring up this week?
21:36:57 <mattoliver> As mentioned at the PTG, we think we have a better way of solving it without new states, but yeah, it definitely would've helped
21:37:58 <timburke> mattoliver, do we have a patch for that yet, or is it still mostly hypothetical?
21:39:11 <mattoliver> I've done the prereq patch and am playing with a timing-out algorithm. But no, I still need to find time to write the rest of the code... maybe once the gaps are filled ;)
21:39:39 <acoles> there are a few other improvements we've identified, like making the repair-overlaps tool check for any obvious parent-child relationship between the overlapping ranges
21:40:27 <acoles> and also being wary of fixing recently created overlaps (that may be transient)
21:40:58 <acoles> but yeah, ideally we would eliminate the transient overlaps that are a feature of normal sharding
21:43:37 <timburke> all right, i think i'll get on with that repair then :-)
21:43:54 <timburke> thank you all for coming, and thank you for working on swift
21:43:58 <timburke> #endmeeting
06:59:27 <SharathCk> Hi, I am working on enabling Keystone audit middleware in Swift. Since swift_proxy_server supports middleware, I am trying to add the audit filter in the pipeline and enable audit for the Swift service. But audit events are not getting generated. As per my analysis, events are not getting notified. Is this a known issue, or is Keystone audit middleware not supported for Swift?
13:38:26 <opendevreview> Alistair Coles proposed openstack/swift master: trivial: add comment re sharder misplaced found stat  https://review.opendev.org/c/openstack/swift/+/841592
13:38:49 <acoles> ^^ easy review :)
13:39:31 <acoles> trivial change but I lost time confused by the code
15:31:01 <opendevreview> Alistair Coles proposed openstack/swift master: sharder: ensure that misplaced tombstone rows are moved  https://review.opendev.org/c/openstack/swift/+/841612
18:13:43 <opendevreview> Merged openstack/swift master: trivial: add comment re sharder misplaced found stat  https://review.opendev.org/c/openstack/swift/+/841592
02:48:45 <opendevreview> Takashi Kajinami proposed openstack/swift master: Add missing services to sample rsyslog.conf  https://review.opendev.org/c/openstack/swift/+/841673
03:57:49 <opendevreview> Tim Burke proposed openstack/swift master: Add --test-config option to WSGI servers  https://review.opendev.org/c/openstack/swift/+/833124
03:57:50 <opendevreview> Tim Burke proposed openstack/swift master: Add a swift-reload command  https://review.opendev.org/c/openstack/swift/+/833174
03:57:50 <opendevreview> Tim Burke proposed openstack/swift master: systemd: Send STOPPING/RELOADING notifications  https://review.opendev.org/c/openstack/swift/+/837633
03:57:51 <opendevreview> Tim Burke proposed openstack/swift master: Add abstract sockets for process notifications  https://review.opendev.org/c/openstack/swift/+/837641
07:38:28 <opendevreview> Alistair Coles proposed openstack/swift master: sharder: ensure that misplaced tombstone rows are moved  https://review.opendev.org/c/openstack/swift/+/841612
07:42:50 <acoles> mattoliver: ^^^ do you have any recollection of whether there was a reason to process misplaced object rows in the order undeleted followed by deleted?
08:19:41 <mattoliver> I was going to say: in case a delete was issued after a put but the delete hits first. But that's only an issue if a delete isn't recorded when the object isn't already there? Surely it is. Maybe it was to make sure objects that were there got to their destination first, to minimise missing objects from listings.
08:20:39 <mattoliver> Maybe we should treat it more like a journal, and move objects in row order (include deleted) from the start.
08:21:57 <mattoliver> If there were a lot of deletes, like in an expired-objects container, then maybe moving the deleted=0 rows first makes it look like progress is being made 🤷‍♂️
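A toy comparison of the two orderings being weighed here; the rows are a simplification of the container broker's object rows, and both helpers are hypothetical, meant only to show the difference between undeleted-first and strict journal (ROWID) order.

```python
rows = [
    {'rowid': 1, 'name': 'a', 'deleted': 0},
    {'rowid': 2, 'name': 'a', 'deleted': 1},  # later DELETE of the same name
    {'rowid': 3, 'name': 'b', 'deleted': 0},
]


def undeleted_first(rows):
    # current-style ordering: live rows first, tombstones afterwards
    return ([r for r in rows if not r['deleted']] +
            [r for r in rows if r['deleted']])


def journal_order(rows):
    # "treat it like a journal": move rows strictly in the order they landed
    return sorted(rows, key=lambda r: r['rowid'])


# With undeleted_first, the PUT of 'a' and its later DELETE move in separate
# passes; with journal_order they move together in write order, so the
# destination sees them in the same sequence they originally happened.
print([r['rowid'] for r in undeleted_first(rows)])  # [1, 3, 2]
print([r['rowid'] for r in journal_order(rows)])    # [1, 2, 3]
```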
08:51:52 <afaranha_> mattoliver, hey, I see that the tempurl patches are in good shape, thanks for that: https://review.opendev.org/c/openstack/swift/+/525771 -- are we currently just waiting for reviews?
08:51:54 <afaranha_> do you need any help with the patches?
08:57:00 <mattoliver> Yeah, I'll double back to them and check, but yeah it's waiting on reviews I believe. I'll poke people about it in the next meeting if it isn't reviewed by then.
08:57:38 <mattoliver> If you want to double check and review them too (if you haven't already) that'll be great! (On my phone so don't have them handy to check).
08:58:10 <afaranha_> thanks, I'll do that, and ask my colleague
16:44:58 <opendevreview> Merged openstack/swift master: memcached: Give callers the option to accept errors  https://review.opendev.org/c/openstack/swift/+/839448
23:54:57 <opendevreview> Tim Burke proposed openstack/swift master: container: Add delimiter-depth query param  https://review.opendev.org/c/openstack/swift/+/829605
05:28:39 <opendevreview> Tim Burke proposed openstack/swift master: container: Add delimiter-depth query param  https://review.opendev.org/c/openstack/swift/+/829605
04:30:16 <sharathck> Hi, I am working on enabling Keystone audit middleware in Swift. Since swift_proxy_server supports middleware, I am trying to add the audit filter in the pipeline and enable audit for the Swift service. But audit events are not getting generated. As per my analysis, events are not getting notified. Is this a known issue, or is Keystone audit middleware not supported for Swift?
18:41:27 <opendevreview> Tim Burke proposed openstack/swift master: Distinguish workers by their args  https://review.opendev.org/c/openstack/swift/+/841989
20:51:28 <opendevreview> Merged openstack/swift master: Refactor rate-limiting helper into a class  https://review.opendev.org/c/openstack/swift/+/834960
06:24:29 <opendevreview> Matthew Oliver proposed openstack/swift master: ring v2 serialization: more test coverage follow up  https://review.opendev.org/c/openstack/swift/+/842040
08:15:19 <opendevreview> Merged openstack/swift master: AbstractRateLimiter: add option to burst on start-up  https://review.opendev.org/c/openstack/swift/+/835122
19:30:37 <acoles> timburke__: apologies, I won't be able to make today's meeting
20:54:58 <opendevreview> Merged openstack/swift master: Add missing services to sample rsyslog.conf  https://review.opendev.org/c/openstack/swift/+/841673
21:03:35 <kota> meeting?
21:04:30 <mattoliver> I'll poke tim
21:05:05 <kota> thx mattoliver
21:05:47 <zaitcev> I didn't forget this time, meaning it's probably going to be jinxed with Tim's sick child or something.
21:06:14 <kota> :/
21:06:20 <mattoliver> lol
21:08:32 <mattoliver> Still no response from him
21:11:08 <timburke__> sorry, got distracted by an issue at home
21:11:14 <opendevmeet> timburke__: Error: Can't start another meeting, one is in progress.  Use #endmeeting first.
21:11:41 <mattoliver> that's a weird error
21:11:41 <timburke__> wha...
21:11:45 <timburke__> #endmeeting