21:00:17 <timburke> #startmeeting swift
21:00:18 <openstack> Meeting started Wed Mar 18 21:00:17 2020 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:19 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:21 <openstack> The meeting name has been set to 'swift'
21:00:25 <timburke> who's here for the swift meeting?
21:00:30 <seongsoocho> o/
21:00:39 <alecuyer> o/
21:01:02 <kota_> hello
21:01:26 <tdasilva> o/
21:02:01 <clayg> more like *party* time
21:02:12 <timburke> agenda's at https://wiki.openstack.org/wiki/Meetings/Swift
21:02:25 <timburke> #topic covid-19 / Vancouver
21:02:34 <timburke> so i'd meant to mention this thread last week but forgot (things have been a little hectic with my recent job transition)
21:02:38 <timburke> #link http://lists.openstack.org/pipermail/openstack-discuss/2020-March/013127.html
21:02:43 <timburke> but it looks like i've got a new message to reference now, anyway!
21:02:49 <timburke> #link http://lists.openstack.org/pipermail/foundation/2020-March/002854.html
21:02:59 <timburke> looks like the PTG is going virtual!
21:03:23 <timburke> i know just this week the bay area (where i live) has recommended people shelter-in-place for the next three weeks
21:03:33 <timburke> so i guess this isn't entirely surprising
21:04:08 <timburke> (also, apologies in advance -- i'm probably going to be less available than usual if i've got two small kids at home full time)
21:04:29 <alecuyer> yup.. I'm trying that now (and with just one :) )
21:04:39 <rledisez> hi /
21:04:41 <rledisez> o/
21:04:47 <clayg> *virtual* PTG 🤔
21:04:56 <clayg> I hadn't heard that - thanks @timburke
21:05:13 <kota_> no worry, a lot of people have same situation. I also take care of my kids in my home.
21:05:33 <alecuyer> clayg: sounds as good as a "virtual" beer no ? but I guess it's for the best
21:06:06 <kota_> +1 for the virtual drinking.
21:06:08 <timburke> i'm sure there will be more organizing and planning going on over the next few months
21:07:16 <timburke> and i'm hopeful about us finding a way to have some dedicated time together to think hard about swift :-)
21:07:39 <timburke> stay safe everyone!
21:07:54 <clayg> perhaps even in person post apocalypse!
21:08:23 <timburke> we'll all meet up at matt's ~~beach house~~ bunker!
21:09:00 <timburke> #topic jerasure support in liberasurecode
21:09:25 <timburke> so i noticed recently that the liberasurecode gate is currently broken
21:09:31 <timburke> all jobs fail with something like `fatal: repository 'http://lab.jerasure.org/jerasure/gf-complete.git/' not found`
21:09:38 <timburke> at first, i was inclined to just replace the repo with a working mirror (such as ceph's fork on github, done in https://review.opendev.org/#/c/712842/)
21:09:39 <patchbot> patch 712842 - liberasurecode - Use ceph's GitHub mirrors for gf-complete/jerasure - 1 patch set
21:09:49 <timburke> but investigating further, i found
21:09:51 <timburke> #link http://web.eecs.utk.edu/~jplank/plank/www/software.html
21:09:59 <timburke> the notice toward the top indicates that jerasure is no longer supported and the source has been taken down
21:10:06 <timburke> (all-in-all, it sounds like part of a patent-suit settlement)
21:10:23 <timburke> so i guess my main question is: do we go chasing forks/mirrors (which may share a similar fate), or stop supporting jerasure? or maybe just stop *testing* jerasure? but then it'll be difficult to tell when/whether we've broken support
21:10:46 <timburke> i suppose that last one is the closest to our current support model for shss and libphazr... but i don't know that we'd even get reports of breakage, much less any assistance in resolving issues :-/
21:11:29 <kota_> true
21:12:17 <rledisez> stopping test does not seem good. I would vote in favor of deprecating it, but still supporting it through mirror for some time (1 year?)
21:12:32 <timburke> does anyone have clusters running with jerasure? i know swiftstack would always go with isa-l...
21:12:33 <rledisez> what are the other options instead if ISA-L to support the same EC schema?
21:12:39 <clayg> rledisez: that's pretty reasonable!
21:12:49 <rledisez> we run jerasure but i(ve been considering to move to isa-l recently
21:13:04 <kota_> AFAIK, isa-l or shss for NTT groups
21:13:31 <clayg> rledisez: oh ouch - i remember when we looked at jerasure the lawsuit stuff turned us off 😬
21:13:36 <rledisez> but I'm thinking what about people running swift on non-x86 processor (does somebody do that?). can they run isa-l in replacement of jerasure?
21:14:03 <clayg> rledisez: I don't think isa-l is going to be "compatible" so much as it'd just be a different ec policy with a different scheme - you'd want to "support" jerasure forever (or re-encode all your data!)
21:14:32 <kota_> I'd make sure if the liberasurecode_rs_vand is not effected by the GF-complete problem?
21:14:37 <rledisez> clayg: i did basci test and it was working, but it's on my todo to run extensive testing on that
21:15:00 <clayg> rledisez: oh WOW - it'd be *amazing* if I was wrong about that
21:15:24 <rledisez> clayg: just a basic test running pyeclib manualy, still need a lot of confirmation
21:15:26 <clayg> kota_: was the GF-complete the thing where decode would return bad data if you gave it specific combinations of frags?
21:15:53 <timburke> kota_, i know libec's built-in algo doesn't link against gf-complete -- though whether it would run into patent trouble is a separate issue...
21:16:21 <clayg> timburke: ok, well 1) awesome find, i'm sure no one else was paying attention to gate tests for pyeclib and 2) does rledisez 's suggestion of "support" through ceph mirror with big WARNING WILL REMOVE somewhere in the changelog ASAP?
21:16:28 <timburke> clayg, no, the bad data thing was an isa-l bug
21:16:41 <kota_> clayg: I don't think so. that problem was in isa-l rs_vand.
21:16:46 <clayg> timburke: then hopefully rledisez can drive putting together a "how to not with the jerasure" guide that we can publish when we pull the plu
21:17:07 <timburke> that all sounds like a great plan :-)
21:17:48 <clayg> rledisez: and god speed on getting of jerasure 👍
21:17:49 <timburke> (this, and the quadiron patches, reminds me how i rather wish we had some alternate plugin model that more-explicitly pushed the glue-code responsibility down to each driver...)
21:17:57 <rledisez> well, I hope my plan is gonna work then :D
21:18:20 <clayg> rledisez: well you can be like "look upstream is removing support for jerasure" :P
21:18:32 <clayg> timburke: yes plugins are so hard to do right 😞
21:18:48 <timburke> especially in a language you're not super-familiar with
21:19:41 <timburke> all right, i think i've got what i need out of that -- on to updates!
21:19:47 <timburke> #topic waterfall EC
21:20:00 <timburke> clayg, how's it looking?
21:20:32 <clayg> so I think my last update was two weeks ago - at that time I was like "yeah we can't just extend replicated concurrent gets; because the control is in the wrong place"
21:20:56 <clayg> so then I thought I'd just decouple EC get from database & replicated GETs then I'd be able to "make it so much simpler!!!"
21:21:00 <clayg> yeah that didn't work
21:21:15 <clayg> the first thing I wanted to "rip out" was the "resuming stream feature"
21:21:52 <clayg> basically I never liked it and don't have a clear picture of how often a chunkreadtimeout actually turns into a resume'd get - and even less so how often that WORKS - even for replicated!
21:22:54 <clayg> then I started looking at how it fails in the EC case and was all like https://media1.tenor.com/images/dcb66efa26bc6d58becc3581e5f41e38/tenor.gif
21:23:25 <clayg> So i decided EC GET's don't NEED resuming behavior and THEN I can "make it so much simpler!!!
21:23:31 <clayg> but yeah that didn't work
21:24:16 <clayg> I removed a couple hundred lines of resume code - but there's still like 400 lines of "multi-byte range" response handling code that is ALSO burried in the GETorHEADHandlerBase/ResumingGetter mess
21:24:46 <clayg> and I'm not sure I can convince myself EC GET's don't NEED multi-byte-range responses
21:25:09 <clayg> I mean... they probably don't - I think Sam just added it because he wanted too and no one stopped him... but I could be wrong, maybe someone wants it
21:25:31 <clayg> and since I don't really have a good reason to pull it off of replicated objects it seems like we're probably stuck with it on EC
21:25:38 <clayg> ^ that's actually up for debate I guess?
21:25:47 <clayg> tdasilva: seemed to think "well maybe we CAN drop it!?"
21:27:01 <rledisez> if it was broken I would say drop it, but I think it's working, and I can tell for sure that somebody somewhere in the world is using it. so changing the API, mmm…
21:27:21 <clayg> anyways - aside from maybe a little forward progress on the core EC GET request handling code and related tests I'm kinda back to square one 😞
21:27:37 <clayg> yup, that's my gut as well
21:27:57 <timburke> i'm still wondering whether it might make things easier to reason about if we at least pulled the multi-range support out to middleware -- though i think SLO uses it, so ordering may be a little annoying...
21:28:04 <kota_> IIRC, the multi range supports for EC is needed because a segment may belong to 2 fragments
21:28:39 <kota_> due to the user range GET request.
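[editor's note] The byte-range translation that comes up at this point in the discussion can be illustrated with a small sketch. It is not Swift's actual code: the function names, the 1024-byte segment size and the 300-byte fragment size are all hypothetical, chosen only to show why a client range has to be widened to whole EC segments before it is mapped onto fragment archives.

```python
# Hypothetical helpers, not taken from the Swift code base.

def align_to_segments(start, end, segment_size):
    """Widen an inclusive client byte range to whole-segment boundaries.

    An EC segment can only be decoded as a whole, so a range that starts
    or ends mid-segment has to pull in the complete segments it touches.
    """
    seg_start = (start // segment_size) * segment_size
    seg_end = (end // segment_size + 1) * segment_size - 1
    return seg_start, seg_end


def segments_to_fragment_range(seg_start, seg_end, segment_size, fragment_size):
    """Map a segment-aligned object range to a range in a fragment archive.

    Assumes each segment contributes exactly fragment_size bytes to every
    fragment archive (an assumption of this sketch).
    """
    frag_start = (seg_start // segment_size) * fragment_size
    frag_end = ((seg_end + 1) // segment_size) * fragment_size - 1
    return frag_start, frag_end


# A client range that straddles two segments.
seg_range = align_to_segments(1000, 1100, segment_size=1024)
frag_range = segments_to_fragment_range(*seg_range, segment_size=1024,
                                        fragment_size=300)
```

A range that crosses a segment boundary pulls in both segments, and this applies to a single range just as much as to several.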
21:29:11 <clayg> kota_: there IS some byte range translation for client requests - and you need that even for SINGLE range requests - but the ability for bytes=0-4,8-12 to turn into a MIME document isn't really dependent on the storage policy
21:29:37 <clayg> in FACT - we could *BUILD* multi-byte-range responses (the MIME responses) in middleware using ONLY single byte-range requests to the proxy
21:30:01 <clayg> start a MIME response, fetch bytes 0-4 and send those, then fetch 8-12 and send those
21:30:14 <kota_> ah, it should follow the storage policy. I don't think the translation is needed for the repliated one.
21:30:31 <clayg> that actually seems like a MUCH better way to do multi-byte-range responses than what we have now (that threads mime handling all through the proxy and storage layer)
21:31:03 <clayg> right for multi-byte-range request to replicated data we just return the object server's MIME response (which is... idk, gross to me for some reason)
21:31:15 <timburke> there's going to be some corner cases we'd have to consider if we moved it to middleware -- a 416 on the first range may or may not mean we should 416 the whole request, for example
21:31:38 <clayg> like I don't WANT my object servers to know how to make MIME responses - I think Sam just got a little crazy with multipart messages once he did that thing for EC PUT 🤷
21:31:44 <timburke> and *definitely* need to make sure we plumb in an If-Match header on subsequent requests
21:32:44 <clayg> timburke: yeah... if we decided to stop and say "ok, you can't have better backend EC request handling until you pull multi-part-byte requests to middleware" it'd be LONG haul
21:33:50 <timburke> fwiw, AWS only supports a single range per request
21:34:27 <clayg> so realistically I guess I'll probably take another stab at pulling apart GETorHEADHandler somehow
21:35:13 <clayg> leave the resuming and multi-byte-range handling in place and extract the connection logic so it's either like dependency injection, or just subclasses
21:37:10 <clayg> maybe ResummingGetter becomes BaseMultiRangeResumingGetter and GETorHEADHandler becomes ReplicatedGETorHEADHandler and some of ECObjectController._get_or_head_response goes into a new ECGETorHEADHandler that does all the Response Bucket stuff
21:37:23 <clayg> so, I guess that's the plan
21:37:25 <tdasilva> just to add a bit more about my idea of just dropping it. my reasoning was: 1. s3 doesn't support it (hence my assumption very few (if any) people care about it. 2. we can have build it in middleware. So my idea was "drop it" and if someone complains, add it to middleware
21:37:27 <clayg> 3rd times the charm!
21:38:11 <clayg> tdasilva: I didn't mean to throw you under the bus - FWIW I totally understood that line of reasoning and find it compelling
21:38:15 <tdasilva> if no one complains, less code for us to support.
21:39:09 <tdasilva> clayg: I gotcha, just wanted to provide some thoughts behind it, cause I honestly don't think it's a bad idea. but that's just my opinion...
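[editor's note] The multi-range-as-middleware idea discussed above (start a MIME response, fetch each range with its own single-range request, pin later requests with If-Match) might look roughly like the sketch below. It is only an illustration under stated assumptions, not a proposed Swift middleware: `fetch_range` is a hypothetical callable standing in for a single-range GET to the proxy, and the 416 corner cases timburke mentions are not handled.

```python
def iter_multipart_byteranges(fetch_range, ranges, total_length, boundary,
                              content_type="application/octet-stream"):
    """Yield the chunks of a multipart/byteranges body, one part per range.

    fetch_range(start, end, if_match) is a hypothetical callable that
    performs one single-range GET and returns (etag, body_bytes).
    The etag from the first response is passed as if_match on every later
    request so all parts come from the same version of the object.
    """
    delimiter = b"--" + boundary.encode("ascii")
    etag = None
    for start, end in ranges:
        etag, body = fetch_range(start, end, etag)
        part_headers = (
            "Content-Type: %s\r\n"
            "Content-Range: bytes %d-%d/%d\r\n\r\n"
            % (content_type, start, end, total_length)
        ).encode("ascii")
        yield delimiter + b"\r\n" + part_headers + body + b"\r\n"
    # Closing delimiter ends the MIME document.
    yield delimiter + b"--\r\n"


# Usage with an in-memory stand-in for the object store.
DATA = b"abcdefghijklmnop"
seen_if_match = []

def fake_fetch(start, end, if_match):
    seen_if_match.append(if_match)
    return "etag1", DATA[start:end + 1]

mime_body = b"".join(iter_multipart_byteranges(
    fake_fetch, [(0, 4), (8, 12)], len(DATA), boundary="BOUNDARY"))
```

The response would carry `Content-Type: multipart/byteranges; boundary=BOUNDARY`. Writing it as a generator means each part can be sent as soon as its range has been fetched.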
21:39:24 <tdasilva> we could have the middleware ready
21:39:29 <clayg> also having investigated how much work it'll be to make "waterfall-ec" mergable - it's entirely possible priorities may shift and this will be a slow burn rather than hard push
21:40:53 <clayg> rledisez: straw man - if we had a change that made EC demonstrably better, plus simpler code - but dropped multi-byte-range responses BUT in followup patch we reimplemented multi-byte-range as middleware ware ...
21:41:08 <clayg> could we merge the first one w/o merging the second one until we need it? 😁
21:42:06 <rledisez> clayg: well, that's a tough position for me. like I have to wait for a customer to complain, then we merge it. in the mean time, my customer says he will move to OTHER-CLOUD-PROVIDER because it didn't break his workflow
21:42:32 <rledisez> maybe I should add a timeseries to monitore if somebody use it
21:43:17 <clayg> rledisez: i guess it depends how much you want it out
21:43:45 <clayg> rledisez: and it sounds like you're probably justifyable saying "it's not causing ME any pain; please don't make pain for me" and that seems reasonable
21:43:49 <tdasilva> I this it's reasonable to think that over time we add cruft to the code base that over time is no longer used/needed. It's really hard (almost impossible) to find it, but I think we should make attempts as it would simplify the code, making it better
21:43:53 <clayg> let me take one more stab at this with less code churn
21:44:00 <tdasilva> s/I this/I think
21:44:31 <clayg> if I fail again I may come back and beg you to do some more qualification on multi-range responses
21:44:32 <timburke> tdasilva or i could start poking at multi-range-as-middleware if we get serious about going that route, anyway
21:44:33 <rledisez> I guess we have some time to decide on this (if we need the middleware). I'll try to find out if somebody use multi-byte range on my clusters
21:44:54 <timburke> sounds good. we oughta keep moving
21:44:59 <clayg> 👍
21:45:02 <timburke> #topic lots of small files
21:45:08 <timburke> rledisez, i saw a merge from master!
21:45:23 <rledisez> yep, I'll let alecuyer explain where he is now on losf
21:45:27 <alecuyer> I've posted a list of the main changes planned so far, here
21:45:32 <alecuyer> #link https://wiki.openstack.org/wiki/Swift/ideas/small_files/implementation#LOSF_v2
21:45:57 <alecuyer> If you have questions, go ahead, or I can put it on an etherpad if that's better
21:46:20 <alecuyer> Otherwise, I haven't posted code yet, for lack of time these past few days, but also because of going back and forth and changing my mind about some things
21:46:58 <timburke> so does hashes.pkl get written in the volume, or somewhere else?
21:47:20 <alecuyer> it's written in the same place as it is in the regular diskfile, currently
21:47:34 <alecuyer> object-X/partition - but , that could change to be below the "losf" directory
21:48:07 <timburke> cool - i couldn't remember where we wrote it currently ;-)
21:48:52 <timburke> i look forward to seeing the next few patches!
21:48:56 <rledisez> right now the development is happening in our internal branch. how do you see the reconciliation with feature/losf?
21:49:01 <rledisez> alecuyer: ^
21:50:02 <alecuyer> well I think I still need to do some testing, and once I get something that I think works, I'll try to split it in proper patches
21:51:03 <rledisez> great. i guess we will try that future dev happen directly on feature branch :)
21:51:39 <timburke> +1
21:51:49 <timburke> #topic CORS
21:51:54 <timburke> p 712585 adds a cors gate job, and it even passes!
21:51:55 <patchbot> https://review.opendev.org/#/c/712585/ - swift - Add gate job for CORS func tests - 11 patch sets
21:52:02 <timburke> next up i'll work on stacking the s3api changes on top of that, and getting the s3 tests in p 710354 distributed across the s3api patches so you can see what gets enabled by each patch
21:52:02 <patchbot> https://review.opendev.org/#/c/710354/ - swift - Add CORS func tests for s3api - 3 patch sets
21:52:22 <timburke> has anyone tried running the new tests in p 533028? or even looked at them? i want to figure out whether this is even a palatable way to have func tests with an actual browser, or if i need to sort out something different
21:52:22 <patchbot> https://review.opendev.org/#/c/533028/ - swift - Add some functional CORS tests - 8 patch sets
21:52:35 <timburke> i saw that clayg has opinions :-)
21:52:57 <clayg> so on p 533028 - should all of the tests PASS?
21:52:57 <patchbot> https://review.opendev.org/#/c/533028/ - swift - Add some functional CORS tests - 8 patch sets
21:53:27 <timburke> yes
21:53:51 <timburke> (with the two patches that it's stacked on top of)
21:54:12 <timburke> well, pass or skip, anyway
21:55:14 <timburke> actually, maybe it's better to follow-up in -swift -- i wanted to leave time for
21:55:22 <timburke> #topic open discussion
21:55:47 <timburke> anything else for us to bring up?
21:56:02 <alecuyer> I'm curious to know the proportion of HEAD requests you all get on your clusters. Do share if you can!
21:56:09 <alecuyer> (I think I asked that once already actually ;) )
21:56:34 <rledisez> so for us, 54% HEAD, 23% GET
21:57:01 <rledisez> we've been trying to evaluate the cost of HEAD (cost in I/O)
21:57:06 <rledisez> it's not that easy
21:58:41 <clayg> I don't have that metric in aggregate offhand - I'll drop a note to try and sample some clusters we can look at
21:59:27 <timburke> alecuyer, rledisez do you also have stats on user agents? i know python-swiftclient tends to be noisy with the HEADs...
21:59:42 <rledisez> just a note on the drop-md5 work, i uploaded a "working" patch (some tests still need to be fixed). if you're interrested you can look at it. on replication policy it increase the download speed like x3. let me find the link
21:59:55 <alecuyer> rledisez: if you don't, I will look it at (user -agent) I don't have it now
21:59:59 <rledisez> timburke: I can check that
22:00:07 <rledisez> or alecuyer will :)
22:00:24 <rledisez> drop-md5: https://review.opendev.org/#/c/713059/
22:00:24 <patchbot> patch 713059 - swift - WIP: Make the hashing algorithm configurable - 2 patch sets
22:00:30 <timburke> i've got a snippet of logs from one of our clusters that's got like 300:9:1 for GET:HEAD:PUT, but it's a pretty short timespan iirc
22:00:31 <zaitcev> holy cow, where do all these HEAD come from?
22:00:45 <alecuyer> zaitcev: my thoughts exactly
22:01:14 <seongsoocho> 80/15/5 for GET/HEAD/PUT
22:01:32 <alecuyer> seongsoocho: thanks
22:02:01 <timburke> all right, we're at time
22:02:03 <timburke> thank you all for coming, and thank you for working on swift!
22:02:07 <timburke> #endmeeting