21:00:00 <timburke> #startmeeting swift
21:00:01 <openstack> Meeting started Wed May 13 21:00:00 2020 UTC and is due to finish in 60 minutes.  The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:02 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:05 <openstack> The meeting name has been set to 'swift'
21:00:09 <timburke> who's here for the swift meeting?
21:00:14 <seongsoocho> o/
21:00:18 <kota_> hi
21:00:55 <tdasilva> hello
21:01:05 <rledisez> o/
21:02:11 <timburke> clayg, mattoliverau, alecuyer?
21:02:19 <clayg> o/
21:02:35 <timburke> agenda's at https://wiki.openstack.org/wiki/Meetings/Swift
21:02:53 <timburke> #topic PTG
21:03:22 <timburke> first up, the usual reminder that we'll have our virtual PTG in just a few weeks
21:03:43 <timburke> mattoliverau has been doing a great job of working to organize it
21:04:06 <timburke> and you guys have been great about adding topics to the etherpad
21:04:11 <timburke> #link https://etherpad.opendev.org/p/swift-ptg-victoria
21:05:24 <timburke> mattoliverau added some times for us to https://ethercalc.openstack.org/126u8ek25noy
21:05:37 <tdasilva> is there any info anywhere on how it will actually work? I read somewhere about reserving "rooms", but I'm not sure I understand what that means
21:06:04 <timburke> i'll make sure they get added to the etherpad
21:07:03 <timburke> tdasilva, that's a great question! unfortunately i've been a bit distracted, so i'm not entirely sure myself. i'll find out more and drop what i find out (and probably links to mailing list messages) in -swift about it
21:07:41 <tdasilva> timburke: no worries, I can go dig for it too
21:08:12 <timburke> are there any other questions about the PTG? i'm not sure i'll be able to answer them right away, but i should be able to do some research
21:10:05 <timburke> all right, on to my most recent distraction, then :-)
21:10:14 <timburke> #topic object updater
21:10:50 <timburke> so last week i shared that one of our clusters had a *lot* of async pendings
21:11:05 <timburke> good news! we're doing much better now :-)
21:11:28 <timburke> better news: we've written up a bunch of bugs describing things we saw going wrong
21:12:27 <clayg> best news (?): patches are starting to come up -> https://review.opendev.org/#/c/727876/1
21:12:28 <timburke> clayg did the lion's share of the investigation -- the core of it was
21:12:28 <patchbot> patch 727876 - swift - Breakup reclaim into batches - 1 patch set
21:12:33 <timburke> #link https://bugs.launchpad.net/swift/+bug/1877651
21:12:33 <openstack> Launchpad bug 1877651 in OpenStack Object Storage (swift) "Reclaim of tombstone rows is unbounded and causes LockTimeout (10s)" [Medium,In progress] - Assigned to clayg (clay-gerrard)
21:12:47 <timburke> yeah, that's the stuff :-)
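A minimal sketch of the batched-reclaim idea from bug 1877651 and patch 727876 (the function name, schema, and batch size here are illustrative, not the actual patch): delete reclaimable tombstone rows a bounded number at a time, committing between batches, so no single transaction holds the SQLite write lock long enough to trip other workers' LockTimeout.

```python
import sqlite3

RECLAIM_BATCH = 10000  # illustrative batch size, not the patch's value

def reclaim_in_batches(conn, age_timestamp, batch=RECLAIM_BATCH):
    """Delete reclaimable tombstone rows a bounded number at a time.

    Committing between batches keeps each transaction short, so the
    database lock is never held long enough to starve other workers.
    """
    while True:
        cur = conn.execute(
            'DELETE FROM object WHERE rowid IN '
            '(SELECT rowid FROM object WHERE deleted = 1 '
            ' AND created_at < ? LIMIT ?)', (age_timestamp, batch))
        conn.commit()
        if cur.rowcount < batch:
            break
```

The `rowid IN (SELECT ... LIMIT ?)` form is used because plain `DELETE ... LIMIT` requires a non-default SQLite compile option.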
21:15:08 <clayg> rledisez: i don't know yet how this affects my thinking about the on-disk layout for the per-container stuff
21:15:54 <timburke> tl;dr: after pushing our container-replicator cycle time up by about a hundred-fold, we could get our updaters from not keeping up with incoming asyncs to net-reducing them by about 11M/hr
21:16:06 <rledisez> clayg: me neither. i'll make sure to go look at these patches, i'll see then if it still makes sense
21:16:16 <clayg> we had a node offline for a few days to get a nic swapped and we have a bunch of async pendings all over the cluster spread across all the containers on that node...
21:16:31 <clayg> ... which is kind of different than the situation you're optimizing for
21:16:49 <timburke> along the way we noticed some workers dying off due to https://bugs.launchpad.net/swift/+bug/1877924
21:16:49 <openstack> Launchpad bug 1877924 in OpenStack Object Storage (swift) "object-updater should be more tolerant of already-removed async pendings" [Undecided,In progress]
21:17:22 <timburke> we did some sharding (which helped, but it could be better; see https://bugs.launchpad.net/swift/+bug/1878090)
21:17:23 <openstack> Launchpad bug 1878090 in OpenStack Object Storage (swift) "object-updater should remember redirects and proactively check whether an update should be pointing at a shard instead" [Undecided,Confirmed]
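The idea in bug 1878090 could look something like this hypothetical sketch (the class and method names are illustrative, not Swift's actual code): remember shard redirects so subsequent updates for the same container go straight to the shard instead of bouncing off the root container first.

```python
class ShardRedirectCache:
    """Hypothetical cache of container -> shard redirects, so the
    object-updater can proactively target the shard on later updates."""

    def __init__(self):
        self._redirects = {}  # (account, container) -> shard path

    def remember(self, account, container, shard_path):
        # Record a redirect learned from a 301 response.
        self._redirects[(account, container)] = shard_path

    def update_path(self, account, container):
        # Fall back to the root container when no redirect is known.
        return self._redirects.get(
            (account, container), '/%s/%s' % (account, container))
```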
21:18:16 <timburke> did some config changes that restarted updaters cluster-wide and caused some minor heart attacks due to https://bugs.launchpad.net/swift/+bug/1878056
21:18:16 <openstack> Launchpad bug 1878056 in OpenStack Object Storage (swift) "object-updater should shuffle work before making requests" [Undecided,In progress]
21:18:58 <timburke> but at the end of the day, we've got a really great system!
21:19:41 <clayg> 💪 ten years of sweat and tears has to be good for something - it's resilient and flexible if nothing else
21:20:29 <timburke> (speaking of -- swift's birthday is in four days! 10 years running in prod!)
21:21:02 <clayg> i also filed a couple more https://bugs.launchpad.net/bugs/1877662 https://bugs.launchpad.net/bugs/1877663 https://bugs.launchpad.net/bugs/1877665
21:21:02 <openstack> Launchpad bug 1877662 in OpenStack Object Storage (swift) "Magic number for per_diff in rsync_then_merge " [Low,New]
21:21:03 <openstack> Launchpad bug 1877663 in OpenStack Object Storage (swift) "Default db_replicator per_diff is the degenerate configuration " [Low,New]
21:21:04 <openstack> Launchpad bug 1877665 in OpenStack Object Storage (swift) "Database WAL PENDING_CAP should be configurable" [Medium,New]
21:21:16 <clayg> i'm not sure how many I'll get before I go back to waterfall-ec
21:21:17 <kota_> congrats for the birthday
21:21:31 <rledisez> 10 years, awesome! is swift one of the oldest object storage software?
21:21:41 <clayg> i guess none of them really have anything to do with big containers just eating PUTs slowly - but we're 100% on the sharding train and trying to do more and go faster
21:22:00 <clayg> so the whole "500M row database" might not be a thing we keep caring about 🤔
21:23:06 <timburke> rledisez, i'm not sure, good question...
21:23:20 <zaitcev> Sage defended his dissertation on Ceph in 2007, and it was running on Bluehost, so that's longer than Swift.
21:23:29 <timburke> clayg, we need to help out mattoliverau with getting some *real* auto-sharding going :D
21:24:02 <clayg> 😬
21:24:11 <rledisez> timburke: (operator-speaking) it's clearly the next big thing
21:24:32 <timburke> zaitcev, otoh, https://en.wikipedia.org/wiki/Ceph_(software) lists their initial release in 2012
21:24:34 <timburke> *shrug*
21:24:59 <mattoliverau> o/ sorry I'm very late, kinda slept in.
21:25:07 <timburke> no worries!
21:27:04 <timburke> anyway, i think that's about all i've got by way of post-mortem -- i've been playing at putting a story together with graphs and everything, i'll see what comes out of that (and what i can share :-/ this working-for-a-Big-Corp is kinda new to me)
21:27:50 <timburke> only other thing i've got for the agenda is an update for LOSF
21:28:15 <timburke> but i think alecuyer said he wasn't going to be able to make it to discuss?
21:28:19 <rledisez> so, no update this week, alecuyer was busy on another project
21:28:37 <timburke> 👍
21:28:44 <timburke> i know how it goes ;-)
21:29:52 <timburke> oh, last minute topic! i think i'd like to get a swiftclient release out soon. i missed the window to get versioning support out in ussuri, but it's sitting there on master -- we should publish it!
21:30:38 <timburke> it'll also bring some recently-landed application credential support, and i'll see what other client patches might make sense to get merged in the next week
21:31:19 <timburke> if anyone has any they want to get in, please let me know (or update the priority reviews page)
21:31:26 <timburke> #topic open discussion
21:31:49 <timburke> anything else we should talk about?
21:33:02 <zaitcev> Nothing here.
21:33:13 <mattoliverau> A note from earlier: the time booking is also booking the room, so I booked the same location (different times) for the other. We're in Liberty... Now I'm not sure what that means; I assume it's some virtual room
21:33:32 <mattoliverau> *PTG
21:33:39 <mattoliverau> Damn autocorrect
21:33:44 <zaitcev> Well, the Dark Data is stuck. I haven't worked on it in a while. I think I addressed Romain's concerns.
21:35:36 <rledisez> I'll try to go look at it while I'm on updater patches
21:36:13 <clayg> I don't think Romain is a great candidate for the dark data auditors - it's like part power adjustment or container sync - it won't work for everyone to their satisfaction; still useful
21:36:54 <timburke> crap, i'd meant to look at that hadn't i...
21:36:56 <zaitcev> He's the only one who might have some dark data for this to actually find. My test cluster seems to have none at all.
21:37:06 <clayg> Is there a question if it should be maintained upstream?
21:37:31 <clayg> zaitcev: no dark data is good!
21:37:52 <zaitcev> Anything that's not upstream rots. Just look how well it worked for swift3 and swauth.
21:37:54 <clayg> it normally takes something to fall over pretty good and be missed for a while
21:38:38 <timburke> counter-point: container-sync
21:39:17 <clayg> timburke: but rledisez is gunna look at that again, and... Gil before that?  Alistair was a container sync fan.  1space container crawler is based off container-sync!
21:39:21 <timburke> (just playing devil's advocate -- i'm all for having audit-watchers upstream, and the feature makes a lot more sense when there's at least one user)
21:40:18 <clayg> well, unless it's broken; in principle I'm in favor of merging it - with docs
21:40:37 <clayg> some iterative review would be a great pre-ptg project
21:41:51 <zaitcev> For me, the feature itself has value, but Sam wanted a general API. Does that need still exist, and do we have people modding the auditor with additional functions?
21:42:17 <clayg> lord knows I'm going to need some help/review https://review.opendev.org/#/c/727876/1/test/unit/account/test_backend.py@209
21:42:18 <patchbot> patch 727876 - swift - Breakup reclaim into batches - 1 patch set
21:43:27 <timburke> i seem to recall that we have at least one audit-watcher-like thing that was implemented as a whole extra auditor (alongside ZBF and ALL) -- i forget what it was for offhand, though
21:44:34 <zaitcev> Math sounds like an interesting challenge, although this probably can just be tested by running a few boundary conditions through.
21:44:37 <clayg> zaitcev: can you add dark data and auditor watchers to https://etherpad.opendev.org/p/swift-ptg-victoria
21:46:20 <timburke> how about this as an idea, too -- i'm assigning homework! everyone find 1-5 patches they'd like to see progress on over the course of PTG week. add them to the priority reviews wiki
21:47:05 <clayg> timburke: can they all be patches I write or review between now and ptg week?
21:47:29 <timburke> i'll put a new section in for it. list of patches, then maybe a sub-bullet for irc nicks interested
21:47:38 <mattoliverau> I wonder if we could use audit watchers to let the auditors walk the filesystem and trigger events like replication, sharding, etc. And that would mean much less filesystem walking by almost all the daemons (just thinking out loud and pre coffee).
21:48:11 <rledisez> mattoliverau: i like that, I kinda thought about something similar last week
21:48:12 <timburke> kinda like our old in-person sticky notes-and-dots system at swift hackathons
21:48:40 <rledisez> split the "job producer" from the "job executor"
21:49:07 <timburke> mattoliverau, rledisez yes! i've thought about that too -- gets us closer to a central point for all i/o scheduling
21:49:17 <rledisez> we are taking this approach to scale container-sync => we split container-sync: one part crawls the DB, one executes what needs to be synchronised
21:49:19 <mattoliverau> yeah, might screw up our tunings on different daemon intervals, but less io is less io
21:52:07 <clayg> i want this even more with generic tasks queues
21:52:53 <clayg> let's do it - let's build generic task queues - single producer multi executor for all daemons - and s3 bucket policies for multi-site replication and expiry
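The single-producer/multi-executor split being discussed could be sketched like this (all names are hypothetical, not an actual Swift interface): one producer walks the filesystem or DB once and enqueues work, while a pool of executor threads drains the queue.

```python
import queue
import threading

def run_tasks(produce, execute, workers=4):
    """One producer feeds a bounded queue; `workers` executor threads
    drain it. A None sentinel per worker signals shutdown."""
    q = queue.Queue(maxsize=1000)

    def worker():
        while True:
            task = q.get()
            if task is None:  # sentinel: producer is done
                return
            execute(task)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for task in produce():
        q.put(task)
    for _ in threads:
        q.put(None)
    for t in threads:
        t.join()
```

The bounded queue gives natural backpressure: a fast producer blocks rather than buffering unbounded work, which is one central place to do i/o scheduling.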
21:53:50 <mattoliverau> sounds like another PTG topic :)
21:54:04 <clayg> or two
21:54:18 <mattoliverau> I wish we had this discussion before I had to book times. I could have tried to find some more slots.
21:54:36 <clayg> hahaha
21:54:42 <clayg> we'll make do
21:54:46 <timburke> worst case, we keep talking in irc :D
21:55:07 <timburke> i'll just need to make sure i have enough coffee at home
21:55:57 <timburke> all right, let's let mattoliverau, seongsoocho, and kota_ go have breakfast
21:56:10 <timburke> thank you all for coming, and thank you for working on swift!
21:56:15 <timburke> #endmeeting