21:04:16 <timburke> #startmeeting swift
21:04:17 <openstack> Meeting started Wed Nov 13 21:04:16 2019 UTC and is due to finish in 60 minutes. The chair is timburke. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:04:18 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:04:20 <openstack> The meeting name has been set to 'swift'
21:04:30 <timburke> who's here for the swift meeting?
21:04:31 <kota_> o/
21:04:35 <clayg> o/
21:04:39 <mattoliverau> o/
21:04:39 <alecuyer> o/
21:04:43 <rledisez> o/
21:05:18 <timburke> sorry to run a bit late; was upgrading the OS on my laptop
21:05:18 <clayg> alecuyer: rledisez: I feel like I *just* saw you guys!?
21:05:40 <alecuyer> chinese flashback!
21:05:41 <tdasilva> o/
21:05:54 <timburke> on that note...
21:05:58 <timburke> #topic PTG recap
21:06:01 <rledisez> clayg: did you travel in time when flying home? it may be the explanation ;)
21:06:40 <timburke> as expected, it was wonderful seeing everyone who made it :-)
21:06:48 <clayg> rledisez: so funny story, I started my travel day by going to the wrong airport - but... i made it home
21:06:56 <timburke> everyone else, hopefully i'll get to see you in vancouver
21:07:03 <clayg> VANCOUVER!!!
21:07:11 <timburke> clayg, oh jeeze! glad you still made it home!
21:07:15 <clayg> i hear vancouver has a bunch of chinese millionaires
21:08:01 <timburke> we had some really good discussions, hopefully we more or less kept the etherpad updated
21:08:04 <timburke> #link https://etherpad.openstack.org/p/swift-ptg-shanghai
21:08:06 * rledisez wishes he was a canadian millionaire in china
21:08:15 <mattoliverau> lol
21:08:17 <tdasilva> lol
21:09:03 <timburke> we met a new Korean operator -- if you see seongsoocho in -swift, you ought to say hi :D
21:09:21 <timburke> had a good ops feedback session
21:09:21 <mattoliverau> nice :)
21:09:24 <timburke> #link https://etherpad.openstack.org/p/PVG-swift-ops-feedback
21:09:31 <clayg> it's 6am in seoul
21:09:51 <clayg> i'm sure kota_ and mattoliverau wouldn't complain if we wanted to make the meeting a little later 🤷♂️
21:10:40 <mattoliverau> 8am here now, so isn't too bad. But it's 6 in Tokyo isn't it kota_?
21:10:58 <mattoliverau> daylight savings for the win here
21:11:11 <tdasilva> when is ptg in vancouver?
21:11:36 <timburke> i want to say june? let me find the email...
21:11:40 <kota_> mattoliverau: yup
21:11:55 <kota_> same with Tokyo timezone
21:11:59 <rledisez> yep, 8 to 11 june I think
21:12:13 <mattoliverau> Seeing as I'm not in the cloud team anymore at Suse I probably won't have any travel funding to go. I guess I could attempt to get a talk in, maybe if it gets accepted they'll send me. I'd have to talk to my new manager.
21:12:13 <rledisez> I'm sure it ends June 11th
21:12:13 <timburke> http://lists.openstack.org/pipermail/foundation/2019-September/002794.html says Jun 8-11
21:13:02 <timburke> mattoliverau, there's also the travel support program: https://wiki.openstack.org/wiki/Travel_Support_Program
21:13:14 <timburke> i should make sure seongsoocho knows about it, too
21:13:15 <mattoliverau> that's true, I have used it before :)
21:13:53 <timburke> big takeaways i got out of the week (and feel free to chime in with corrections or additional info):
21:14:12 <clayg> tdasilva: aren't you going to New Zealand at some point? Can you pick up matt on your way back to the states?
21:14:35 <tdasilva> heh, sounds like a good idea
21:14:47 <timburke> on LOSF: the main cluster that rledisez and alecuyer needed this for is getting phased out, so its future is up in the air a bit
21:14:48 <tdasilva> or maybe you all come down to join us
21:15:19 <rledisez> timburke: to be a bit more precise, it will happen in about 12 to 18 months, so we still get a bit of time on it ;)
21:15:20 <mattoliverau> ^ that :)
21:15:57 <timburke> we know there are a bunch more tests that we'd like to see, but we're also not sure we want to take on the maintenance burden when we might be able to get a lot of benefit out of things like xfs's realtime device support
21:16:26 <clayg> rledisez: well, but even given the runway on the phase out - aren't you also being tasked with planning for the NEW cluster (i.e.
benchmarking alternatives to existing LOF index & slab storage)
21:17:03 <rledisez> right now, we are still investigating all possibilities:
21:17:07 <timburke> we know drives are only going to get bigger over time, though -- and we suspect that lots-of-small-files as a problem is going to start to look like lots-of-files
21:17:19 <rledisez> xfs realtime => I asked about its status on the XFS ML, to see if it is maintained/tested/…
21:17:26 <rledisez> zfs => looks like a nice possibility
21:17:35 <rledisez> LOSF/LOF => still in progress
21:17:50 <rledisez> open-cas => does not seem stable, but is maintained so we may talk to them
21:18:08 <timburke> rledisez, alecuyer was saying that zfs doesn't have good recovery tooling, though, yeah?
21:18:41 <rledisez> I think it was about what happens in case of an I/O error. the only option might be to reboot the server
21:18:53 <rledisez> so we need to check that, right
21:18:58 <alecuyer> I started to work on eBPF scripts to monitor block device IO and link it to inode/xattr access, vs file data. Good going on SAIO, but not working on our prod (need diff kernel options). Once I have something good i'll share so everyone can check if they'd benefit from xattr stored or cached on a fast device (allowing for the use of larger HDDs for data)
21:19:13 <tdasilva> anyone ever look how hard it would be to use something like bluestore?
21:19:46 <rledisez> tdasilva: does it allow placing the rocksdb on a different device? (SSD/NVMe)
21:20:10 <tdasilva> dunno
21:20:39 <tdasilva> rledisez: why?
21:20:44 <mattoliverau> alecuyer: yeah
21:20:53 <rledisez> the goal is to place filesystem metadata (inode/xattr) on a faster device (or the LOSF index)
21:21:01 <mattoliverau> you can decide if you want the rocks db and wal on a separate device.
21:21:39 <rledisez> mattoliverau: so yes, that's something that could be investigated, but it's pretty much the same solution as LOSF, so I would stick to LOSF for now as it is designed specifically for swift
21:22:10 <timburke> on versioning: people seemed enthusiastic. null namespace didn't seem to scare anyone off, and swift growing another versioning scheme seems like a good idea given how poorly the current one maps to s3 versioning. iirc, everyone wants s3 versioning
21:23:19 <timburke> so, clayg, tdasilva, and i will be working on that a lot, hopefully getting it ready to merge to master within the next few weeks
21:23:35 <mattoliverau> nice
21:24:52 <timburke> on "atomic large objects": we recognize the utility, but still aren't sure about how to implement it. had a couple discussions but no clear resolution -- will probably come up again the next time we meet in person
21:26:07 <clayg> yes bluestore was very much aimed at putting the index/metadata db on a separate device from the blob slab
21:26:31 <clayg> you can also put it on the same device, but it's not quite as awesome as filestore
21:26:35 <timburke> on the object updater, i had an idea about grouping async pendings by container and sending UPDATE requests to do them all at once
21:26:37 <timburke> will almost certainly need some benchmarking before we know whether it's actually a *good* idea
21:26:57 <mattoliverau> oh cool
21:27:09 <mattoliverau> with the new UPDATE that makes sense.
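The grouping step timburke describes at 21:26:35 might look roughly like the sketch below. It is only an illustration: the record layout (dicts with "op", "account", "container" and "obj" keys) and the function name group_pendings_by_container are assumptions made for the example, not Swift's actual updater code, and the batched UPDATE request itself is left out.

```python
from collections import defaultdict


def group_pendings_by_container(pendings):
    """Group pending update records by (account, container).

    `pendings` is a hypothetical list of dicts; the real on-disk async
    pending format may differ.
    """
    batches = defaultdict(list)
    for update in pendings:
        key = (update["account"], update["container"])
        batches[key].append(update)
    return dict(batches)


# Hypothetical sample records, in the order they were found on disk.
pendings = [
    {"op": "PUT", "account": "AUTH_test", "container": "c1", "obj": "o1"},
    {"op": "PUT", "account": "AUTH_test", "container": "c2", "obj": "o2"},
    {"op": "DELETE", "account": "AUTH_test", "container": "c1", "obj": "o3"},
]

batches = group_pendings_by_container(pendings)
for (account, container), updates in sorted(batches.items()):
    # One batched request per container would be sent here.
    print(account, container, [u["obj"] for u in updates])
```

Each batch keeps its records in discovery order, so a single request per container could replay them in sequence. Whether that beats one request per object is the benchmarking question raised at 21:26:37.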
21:27:14 <clayg> either way it's VERY ceph specific - I couldn't find any documentation for an ABI or something that allows general access to the volume
21:27:45 <timburke> on recon dumps: it'd be nice to get more/better keys, but where we *really* want to go involves tracking what the oldest unprocessed work item is
21:27:47 <mattoliverau> clayg: yeah I've struggled to find decent info on it too
21:29:11 <timburke> like, replicator should be able to track when partitions have successfully synced with all primaries, expose the one that needs to sync most, and prioritize that work
21:29:49 <timburke> in the meantime, getting missing keys into recon could be a good short-term win
21:30:04 <clayg> #ft
21:30:06 <clayg> #ftw
21:31:09 <timburke> on tiering... we didn't actually talk much about it. sorry mattoliverau. fwiw, i know that we have customers wanting that sort of behavior, though, and i think that the null namespace could be very useful for the implementation
21:31:47 <mattoliverau> yeah, I thought so too. if null namespace is the future, we should hold off so we use it.
21:32:16 <timburke> in particular, it'd be useful in combination with versioning -- so you could tier off the non-active versions to somewhere cheaper
21:32:31 <mattoliverau> +1
21:32:43 <mattoliverau> speaking of recon, I wrote this ages ago.. need to see if it still is correct and everything, it's a little old: https://review.opendev.org/#/c/541141/
21:33:10 <timburke> i think it'd be great to keep in mind as we think about implementing some of s3's bucket policies (in particular, deleting non-active versions older than X)
21:33:27 <zaitcev> Dude. I once tried to write a Ceph client in Python. The current one at the time spawned a thread for each RADOS request, which called into C++. But it was impossible with the lack of docs, and the code was fairly impregnable. You may be able to reverse-engineer Bluestore API, but it's not going to be easy, I can guarantee that much.
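The "oldest unprocessed work item" idea from 21:27:45 and 21:29:11 can be sketched as follows, assuming the replicator keeps a mapping from partition number to the time it last synced with all primaries. The mapping and both helper names are hypothetical. This is not how recon or the replicator currently work.

```python
import time


def oldest_unsynced(last_synced, now=None):
    """Return (partition, age_in_seconds) for the stalest partition.

    `last_synced` maps partition -> timestamp of its last full sync
    (a hypothetical structure for this sketch).
    """
    now = time.time() if now is None else now
    part = min(last_synced, key=last_synced.get)
    return part, now - last_synced[part]


def sync_order(last_synced):
    """Order partitions so the one synced longest ago is handled first."""
    return sorted(last_synced, key=last_synced.get)


# Hypothetical timestamps, in seconds.
last_synced = {101: 1000.0, 102: 400.0, 103: 900.0}

part, age = oldest_unsynced(last_synced, now=1600.0)
order = sync_order(last_synced)
print(part, age, order)
```

The age of the stalest partition is the single number a recon dump could expose, and the sorted order is what "prioritize that work" would amount to in this simplified model.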
21:33:53 <timburke> mattoliverau, oh, nice! yeah, that does seem useful
21:35:00 <clayg> zaitcev: i bet alecuyer could do it :nerd_snipe:
21:35:11 <mattoliverau> :)
21:35:19 <timburke> on performance: https://review.opendev.org/#/c/693116/ looks good, clayg brought some nice history to the discussion that made us all feel a lot better about getting rid of the queues
21:35:36 <alecuyer> clayg: all that talk certainly got me curious ;)
21:35:37 <timburke> working its way through the gate now
21:35:49 <clayg> EAT IT GATE
21:36:13 <clayg> timburke: rledisez: i was unclear on if dropping the q significantly helped throughput - or it just mostly reduced cpu?
21:36:24 <rledisez> clayg: both actually
21:36:34 <rledisez> I'll submit soon a patch to get rid of the with Chunk*Timeout
21:36:49 <timburke> clayg, https://etherpad.openstack.org/p/swift-profiling says just no-queue brought like 45% better throughput
21:36:53 <mattoliverau> here's my dodgy benchmark results from a SAIO: https://etherpad.openstack.org/p/swift-remove-prxy-queues-benchmarks
21:37:15 <rledisez> after that, I found some other places that could provide some perf improvement (especially a place in the proxy that does a cache that is reset at every request :))
21:37:18 <mattoliverau> which was just a look with ssbench and getput
21:37:55 <mattoliverau> rledisez: oh that sounds like a useful cache :P
21:38:12 <rledisez> and after that, I want to get rid of MD5 as a checksum algorithm (not as placement algorithm)
21:39:20 <clayg> ok then!
21:39:36 <timburke> rledisez, i wonder if it'd actually be easier to get rid of it as a placement algo...
or at least, make the choice of algo a property of the ring
21:39:57 <rledisez> timburke: yeah, but I'm not sure yet the gain is worth it
21:40:07 <timburke> fair
21:40:13 <rledisez> while as checksum, he, the double md5 calculation in EC is really killing perf
21:40:17 <rledisez> but it's gonna be hard to drop MD5 because I think it's part of API (etag header)
21:40:41 <timburke> yup, that was my thought, too :-(
21:41:40 <timburke> still, if EC only had to do one MD5 and one (HW-optimized, yeah?) SHA-256 or SHA-512... might be a solid win
21:41:52 <timburke> similar with encryption
21:41:56 <rledisez> yeah, that was my thought too
21:42:36 <timburke> on swiftclient test directory layout: yeah, just make it consistent with swift. merged.
21:42:36 <rledisez> i'm thinking of adler32 maybe, which is designed especially for that (and 4 times faster than md5 on my bench server)
21:44:33 <timburke> i *think* that about covers the PTG... am i forgetting anything?
21:44:52 <clayg> i hope not! I gotta go async
21:45:55 <timburke> there were some interesting talks from cmurphy about keystone scoping and access rules -- seems like they might fit well with swift
21:46:32 <timburke> and i know people are interested in having keystone application credential support in swiftclient
21:46:56 <mattoliverau> oh really, I might have to go look those up. cmurphy is awesome
21:47:12 <timburke> all right, i think that's all i've got
21:47:13 * cmurphy blushes
21:47:17 <timburke> #topic open discussion
21:48:35 <alecuyer> I'll just leave the link here, anyone has ideas about using larger drives, please add to it: https://etherpad.openstack.org/p/swift-ptg-shanghai-large-drives
21:50:29 <timburke> all right. thank you all for coming, and thank you for working on swift!
21:50:37 <timburke> #endmeeting
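For anyone who wants to try the adler32 versus MD5 comparison from 21:42:36 on their own hardware, a rough timing sketch using only the Python standard library could look like this. The buffer size, round count and the helper name time_checksum are arbitrary choices for the example, and no particular speedup is claimed here. Note that adler32 is a 32-bit non-cryptographic checksum, so it could only stand in for MD5 where the goal is detecting accidental corruption.

```python
import hashlib
import time
import zlib


def time_checksum(func, data, rounds=20):
    """Return the seconds taken to run `func(data)` `rounds` times."""
    start = time.perf_counter()
    for _ in range(rounds):
        func(data)
    return time.perf_counter() - start


# 1 MiB of zero bytes; real object data may behave differently.
data = b"\0" * (1 << 20)

md5_digest = hashlib.md5(data).hexdigest()
adler_value = zlib.adler32(data) & 0xFFFFFFFF

md5_seconds = time_checksum(lambda d: hashlib.md5(d).hexdigest(), data)
adler_seconds = time_checksum(lambda d: zlib.adler32(d), data)

print("md5:     %.4fs" % md5_seconds)
print("adler32: %.4fs" % adler_seconds)
```

Results will vary with CPU, Python build and buffer size, so this is a starting point for a benchmark, not a substitute for measuring inside the EC path itself.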