21:01:10 <timburke> #startmeeting swift
21:01:10 <openstack> Meeting started Wed Jul 15 21:01:10 2020 UTC and is due to finish in 60 minutes.  The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:01:11 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:01:13 <openstack> The meeting name has been set to 'swift'
21:01:19 <timburke> who's here for the swift meeting?
21:01:28 <seongsoocho> o/
21:01:38 <rledisez> o/
21:02:16 <kota_> morning
21:02:19 <mattoliverau> o/
21:03:04 <timburke> agenda's at https://wiki.openstack.org/wiki/Meetings/Swift -- it's fairly light, just updates on ongoing work
21:03:21 <timburke> #topic libec upgrade issue
21:04:08 <timburke> a bit ago, alecuyer noticed an issue running EC with a mix of focal and bionic nodes (iirc)
21:04:32 <timburke> i wrote up https://bugs.launchpad.net/swift/+bug/1886088 as i started playing with a mix of libec versions
21:04:32 <openstack> Launchpad bug 1886088 in OpenStack Object Storage (swift) "Mixed versions of liberasurecode leads to quarantined fragments" [Undecided,In progress]
21:05:10 <timburke> then realized that i'd actually seen a report of this before: https://bugs.launchpad.net/swift/+bug/1867937
21:05:11 <openstack> Launchpad bug 1867937 in OpenStack Object Storage (swift) "Different erasure codes metadata checksums with pypy3 and python2" [Undecided,New]
21:06:06 <timburke> wrote up a couple patches to deal with it: p 738959, p 739164
21:06:34 <timburke> *ahem* p 738959, p 739164
21:06:35 <patchbot> https://review.opendev.org/#/c/738959/ - liberasurecode - Be willing to write fragments with legacy crc - 1 patch set
21:06:37 <patchbot> https://review.opendev.org/#/c/739164/ - swift - ec: Add an option to write fragments with legacy crc - 1 patch set
21:07:14 <timburke> the idea is, libec looks for an env var; if set, write the legacy CRC
21:07:58 <timburke> then, since managing env vars isn't how we've typically tooled up our deployment tools, add a config option to swift that sets or clears the env var
21:08:41 <timburke> does that seem like a reasonable way to go? so far in playing with it in dev, it's working out for me pretty well
21:09:21 <timburke> i tried to put a decent write-up of what would be needed to do a smooth upgrade in the commit messages
21:09:51 <kota_> sounds reasonable. I didn't find a better way.
21:09:52 <rledisez> I guess the choice of env variable was to avoid changing the API, right?
21:11:18 <timburke> rledisez, yeah, that was my thinking. didn't really want to go breaking a bunch of signatures if i could help it (plus it would've required an additional patch for pyeclib)
21:12:20 <zaitcev> Normally I hate it when people penetrate layers like that and just rummage down like it's monkey patching or something, but oh well.
21:13:01 <timburke> i could be persuaded to do it anyway ;-) that's why i asked
21:13:15 <rledisez> Wondering if we could "auto" detect that the data are in legacy mode and automatically switch to it? I mean, as an operator, how am I supposed to know if I should enable it or not…
21:15:18 <timburke> i tried to put a decent write-up of how to tell whether you're affected in https://bugs.launchpad.net/swift/+bug/1886088/comments/2
21:15:18 <openstack> Launchpad bug 1886088 in OpenStack Object Storage (swift) "Mixed versions of liberasurecode leads to quarantined fragments" [Undecided,In progress]
21:15:22 <kota_> ah... exactly, to enable the env var, we need to bump the libec version anyway.
21:16:02 <rledisez> great! I missed that
21:16:04 <timburke> i'm not so sure that we can do any auto-enabling, though, since the idea is to have this as a temporary measure during upgrade, then turn it back off after everything's upgraded
21:16:08 <kota_> newer libec is compatible with legacy.
21:16:29 <timburke> yup -- i was at least *that* careful originally ;-)
21:18:40 <kota_> oic. however, the env var basically prevents *writing* a new fragment with zlib, so that it stays compatible with an older legacy backend.
21:19:20 <kota_> it sounds hard to make it auto-detect.
21:20:27 <timburke> so, do people generally agree with zaitcev, that it'd be better to have an explicit option that we plumb through pyeclib as well? or is the env var "good enough"?
21:21:32 <rledisez> I'm voting "good enough", maybe to remove in a future version (eg 1.7), and if somebody wants to upgrade, we say that 1.6 is a "checkpoint release" :)
21:22:03 <timburke> yeah, i was about to ask about that part -- how long should we have this option available? at some point i feel like we'd be justified in removing support for it
21:22:45 <timburke> (maybe that ends up informing the env var vs option debate)
21:24:11 <kota_> either is fine with me, but if we choose the option, we have to carefully touch both libec and pyeclib, which are maintained in separate repos :/
21:25:09 <kota_> the env var may just be ignored if an older libec is used.
21:27:16 <timburke> all right, seems like we're fairly split at the moment -- let's think about it some more and we can revisit next week. will anyone have a chance to take a look at either or both patches?
21:27:38 <zaitcev> Not a consideration for me, fortunately, I'm going to add "Requires:" into packaging, so "dnf update" does the right thing.
21:29:45 <timburke> fortunately they're both pretty small, less than 100 lines between them
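A minimal sketch of the approach discussed in this topic (a config option that sets or clears an environment variable which the lower-level library checks), assuming Python. The env var name, the option name, and both helper functions are hypothetical and are not taken from the patches under review.

```python
import os

# Hypothetical name, for illustration only; the real patches may differ.
LEGACY_CRC_ENV_VAR = "EXAMPLE_EC_WRITE_LEGACY_CRC"


def _is_true(value):
    """Interpret common truthy config strings (hypothetical helper)."""
    return str(value).strip().lower() in ("1", "true", "yes", "on")


def apply_legacy_crc_option(conf, environ=os.environ):
    """Set or clear the env var based on a config option.

    `conf` is a dict of config values; `write_legacy_crc` is a
    hypothetical option name. Returns True if the env var ends up set.
    """
    if _is_true(conf.get("write_legacy_crc", "false")):
        # Library code that checks this variable would write the legacy CRC.
        environ[LEGACY_CRC_ENV_VAR] = "1"
    else:
        # Clearing it restores the default behavior once the upgrade is done.
        environ.pop(LEGACY_CRC_ENV_VAR, None)
    return LEGACY_CRC_ENV_VAR in environ
```

In this sketch an operator would turn the option on while old and new library versions coexist, then turn it off after every node is upgraded, which matches the temporary-measure intent described above.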
21:31:17 <timburke> all right, moving on
21:31:26 <timburke> #topic waterfall EC
21:32:01 <timburke> so clay isn't around today, but i started trying it out a bit yesterday -- it's great! i'm loving it
21:32:12 <zaitcev> Nice
21:32:36 <timburke> i wrote up a bit of my findings on the main patch
21:32:39 <timburke> #link https://review.opendev.org/#/c/711342/
21:32:39 <patchbot> patch 711342 - swift - Add concurrent_gets to EC GET requests - 12 patch sets
21:34:15 <timburke> basically, i drove a known load via swift-bench against my saio -- then introduced some random delays and watched how the various timeouts available led to different behaviors
21:35:21 <timburke> without clay's work, i could only really play with recoverable_node_timeout, which would help but not that much. and the lower i turned it down, the more failures i'd see as a client
21:36:35 <timburke> with the new behavior, throughput's up and failures are down. i'd still like to see what it does in a real cluster with real disks, but i like it so far
21:36:58 <timburke> anyone else who's interested in EC and performance ought to take a look :-)
21:38:11 <timburke> that's all i've got
21:38:16 <timburke> #topic open discussion
21:38:29 <timburke> anyone else have topics we ought to bring up?
21:38:45 <kota_> definitely I'm interested but sorry I won't have enough time to play with it
21:38:54 <kota_> > waterfallEC
21:40:24 <timburke> all right, let's let mattoliverau enjoy his birthday :-)
21:40:35 <timburke> thank you all for coming, and thank you for working on swift!
21:40:38 <timburke> #endmeeting