21:01:10 <timburke> #startmeeting swift
21:01:10 <openstack> Meeting started Wed Jul 15 21:01:10 2020 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:01:11 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:01:13 <openstack> The meeting name has been set to 'swift'
21:01:19 <timburke> who's here for the swift meeting?
21:01:28 <seongsoocho> o/
21:01:38 <rledisez> o/
21:02:16 <kota_> morning
21:02:19 <mattoliverau> o/
21:03:04 <timburke> agenda's at https://wiki.openstack.org/wiki/Meetings/Swift -- it's fairly light, just updates on ongoing work
21:03:21 <timburke> #topic libec upgrade issue
21:04:08 <timburke> a bit ago, alecuyer noticed an issue running EC with a mix of focal and bionic nodes (iirc)
21:04:32 <timburke> i wrote up https://bugs.launchpad.net/swift/+bug/1886088 as i started playing with a mix of libec versions
21:04:32 <openstack> Launchpad bug 1886088 in OpenStack Object Storage (swift) "Mixed versions of liberasurecode leads to quarantined fragments" [Undecided,In progress]
21:05:10 <timburke> then realized that i'd actually seen a report of this before: https://bugs.launchpad.net/swift/+bug/1867937
21:05:11 <openstack> Launchpad bug 1867937 in OpenStack Object Storage (swift) "Different erasure codes metadata checksums with pypy3 and python2" [Undecided,New]
21:06:06 <timburke> wrote up a couple patches to deal with it: p 738959, p 739164
21:06:34 <timburke> *ahem* p 738959, p 739164
21:06:35 <patchbot> https://review.opendev.org/#/c/738959/ - liberasurecode - Be willing to write fragments with legacy crc - 1 patch set
21:06:37 <patchbot> https://review.opendev.org/#/c/739164/ - swift - ec: Add an option to write fragments with legacy crc - 1 patch set
21:07:14 <timburke> the idea is, libec looks for an env var; if set, write the legacy CRC
21:07:58 <timburke> then, since managing env vars isn't how we've typically tooled up our deployment tools, add a config option to swift that sets or clears the env var
21:08:41 <timburke> does that seem like a reasonable way to go? so far in playing with it in dev, it's working out for me pretty well
21:09:21 <timburke> i tried to put a decent write-up of what would be needed to do a smooth upgrade in the commit messages
21:09:51 <kota_> sounds reasonable. I didn't find better way.
21:09:52 <rledisez> I guess the choice of env variable was to avoid changing the API, right?
21:11:18 <timburke> rledisez, yeah, that was my thinking. didn't really want to go breaking a bunch of signatures if i could help it (plus it would've required an additional patch for pyeclib)
21:12:20 <zaitcev> Normally I hate it when people penetrate layers like that and just rummage down like it's monkey patching or something, but oh well.
21:13:01 <timburke> i could be persuaded to do it anyway ;-) that's why i asked
21:13:15 <rledisez> Wondering if we could "auto" detect that the data are in legacy mode and automatically switch to it? I mean, as an operator, how am I suppose to know if I should enable it or not…
21:15:18 <timburke> i tried to put a decent write-up of how to tell whether you're affected in https://bugs.launchpad.net/swift/+bug/1886088/comments/2
21:15:18 <openstack> Launchpad bug 1886088 in OpenStack Object Storage (swift) "Mixed versions of liberasurecode leads to quarantined fragments" [Undecided,In progress]
21:15:22 <kota_> ah... exactly, to enable the env var, anyway we need to bump libec version.
21:16:02 <rledisez> great! I missed that
21:16:04 <timburke> i'm not so sure that we can do any auto-enabling, though, since the idea is to have this as a temporary measure during upgrade, then turn it back off after everything's upgraded
21:16:08 <kota_> newer libec is compatible with legacy.
21:16:29 <timburke> yup -- i was at least *that* careful originally ;-)
21:18:40 <kota_> oic. however, the env basically prevents to *write* a new fragment with zlib to be compatible with older legacy backend.
21:19:20 <kota_> it sounds hard to make it as auto-detect.
21:20:27 <timburke> so, do people generally agree with zaitcev, that it'd be better to have an explicit option that we plumb through pyeclib as well? or is the env var "good enough"?
21:21:32 <rledisez> I'm voting "good enough", maybe to remove in a future version (eg 1.7), and if somebody wants to upgrade, we say that 1.6 is a "checkpoint release" :)
21:22:03 <timburke> yeah, i was about to ask about that part -- how long should we have this option available? at some point i feel like we'd be justified in removing support for it
21:22:45 <timburke> (maybe that ends up informing the env var vs option debate)
21:24:11 <kota_> either is fine to me but if we choose option, we have to touch it carefully both libec and pyeclib that are maintained in separate repo :/
21:25:09 <kota_> env may be just ignored if older libec was used.
21:27:16 <timburke> all right, seems like we're fairly split at the moment -- let's think about it some more and we can revisit next week. will anyone have a chance to take a look at either or both patches?
21:27:38 <zaitcev> Not a consideration for me, fortunately, I'm going to add "Requires:" into packaging, so "dnf update" does the right thing.
21:29:45 <timburke> fortunately they're both pretty small, less than 100 lines between them
21:31:17 <timburke> all right, moving on
21:31:26 <timburke> #topic waterfall EC
21:32:01 <timburke> so clay isn't around today, but i started trying it out a bit yesterday -- it's great! i'm loving it
21:32:12 <zaitcev> Nice
21:32:36 <timburke> i wrote up a bit of my findings on the main patch
21:32:39 <timburke> #link https://review.opendev.org/#/c/711342/
21:32:39 <patchbot> patch 711342 - swift - Add concurrent_gets to EC GET requests - 12 patch sets
21:34:15 <timburke> basically, i drove a known load via swift-bench against my saio -- then introduced some random delays and watched how the various timeouts available led to different behaviors
21:35:21 <timburke> without clay's work, i could only really play with recoverable_node_timeout, which would help but not that much. and the lower i turned it down, the more failures i'd see as a client
21:36:35 <timburke> with the new behavior, throughput's up and failures are down. i'd still like to see what it does in a real cluster with real disks, but i like it so far
21:36:58 <timburke> anyone else who's interested in EC and performance ought to take a look :-)
21:38:11 <timburke> that's all i've got
21:38:16 <timburke> #topic open discussion
21:38:29 <timburke> anyone else have topics we ought to bring up?
21:38:45 <kota_> definitely I'm interested but sorry I won't have enough time to play with it
21:38:54 <kota_> > waterfallEC
21:40:24 <timburke> all right, let's let mattoliverau enjoy his birthday :-)
21:40:35 <timburke> thank you all for coming, and thank you for working on swift!
21:40:38 <timburke> #endmeeting
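
Editor's note on the libec upgrade topic: the approach described at 21:07 (a config option that sets or clears an env var which the library then checks) might look roughly like the sketch below. This is only an illustration under stated assumptions, not the code from either patch. The log never gives the actual env var or option names, so `EXAMPLE_WRITE_LEGACY_CRC`, `example_write_legacy_crc`, and `apply_legacy_crc_option` are all hypothetical.

```python
import os

# Hypothetical names: the meeting log does not state the real env var
# or config option, so these are placeholders for illustration only.
LEGACY_CRC_ENV_VAR = "EXAMPLE_WRITE_LEGACY_CRC"
LEGACY_CRC_OPTION = "example_write_legacy_crc"

TRUE_VALUES = {"1", "true", "yes", "on"}


def apply_legacy_crc_option(conf, environ=None):
    """Set or clear the env var according to a boolean-like config value.

    conf is a plain dict of option name to string value. environ defaults
    to os.environ, but any mutable mapping may be passed in for testing.
    Returns True if the legacy mode ends up enabled.
    """
    environ = os.environ if environ is None else environ
    raw = str(conf.get(LEGACY_CRC_OPTION, "false")).strip().lower()
    enabled = raw in TRUE_VALUES
    if enabled:
        # Ask the library to write the legacy checksum during the upgrade.
        environ[LEGACY_CRC_ENV_VAR] = "1"
    else:
        # Clear it explicitly so a stale value cannot linger once every
        # node is upgraded and the option is switched back off.
        environ.pop(LEGACY_CRC_ENV_VAR, None)
    return enabled
```

Clearing the variable when the option is off matches the intent stated at 21:16, that this is a temporary measure to be turned back off once everything is upgraded.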