21:00:13 #startmeeting swift
21:00:13 Meeting started Wed Mar 15 21:00:13 2017 UTC and is due to finish in 60 minutes. The chair is notmyname. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:14 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:16 The meeting name has been set to 'swift'
21:00:19 who's here for the swift team meeting?
21:00:23 o/
21:00:25 hi
21:00:27 o/
21:00:40 hi
21:00:42 o/
21:00:44 o/
21:00:56 hi
21:00:57 o/
21:01:12 o/
21:01:17 o/
21:01:17 clayg: around? we're going to talk about the rebalance stuff
21:01:23 Now that I am not an hour early.
21:01:29 jungleboyj: :-)
21:01:42 o/
21:01:45 Time change is truly messing with me this year.
21:02:13 welcome everyone
21:02:28 we've got some interesting and important stuff to cover this week
21:02:33 #link https://wiki.openstack.org/wiki/Meetings/Swift
21:02:49 o/
21:02:58 first up, although she's asleep right now, I'd like to welcome mahatic to the swift core team
21:03:15 whoooo!
21:03:16 \o/ great news, welcome Mahati!
21:03:52 congrats!
21:04:02 \o/
21:04:13 congrats Mahati
21:04:13 part time freak out woooo
21:04:23 congrats!
21:04:23 congrats
21:04:35 #topic awareness patches
21:04:54 two patches to make you aware of, and need gerrit attention, but I don't think they need big discussion in here
21:05:05 first, patch to close a critical bug
21:05:10 #link https://review.openstack.org/#/c/444604/
21:05:16 so please review that one
21:05:35 clayg and mahatic and zaitcev have been looking. thanks
21:05:35 I did but I didn't realize it was critical.
21:06:24 second, thurloat has proposed a patch to make an API to undelete accounts that have been marked as deleted
21:06:29 #link https://review.openstack.org/#/c/445160/
21:06:29 omg timezones suck so much - the globe is HUGE
21:06:55 we've said for about 7 years that we should have an API for this, and thurloat actually took the initiative and put together a patch
21:07:08 I know that several of us have run into needing this from time to time
21:07:30 so while I don't think it's a time-critical thing to review, I think it's very important to review from the perspective of the API we want
21:07:45 is thurloat around? if patch 445160 works I'm pretty interested?
21:07:48 this patch adds a header to a proxy PUT verb. maybe that's great. maybe it's not
21:08:03 but we need to look at it and write down thoughts in gerrit
21:08:10 I'm always scared to start a review on a big deal from a new contributor tho (no offense!)
21:08:25 he's in -swift, but not here right now
21:08:32 and IIRC he's in an EU timezone
21:08:57 anyway, like I said, needs eyes from an API perspective
21:09:15 ok, let's move on to the bigger stuff
21:09:34 #topic hummingbird->golang->rebalance->oh my! status
21:10:04 last week I said I'd come this week with something written up on this topic: what's going on with golang/hummingbird/etc
21:10:24 there's a couple of things that you should read, but start with this one...
21:10:30 #link https://wiki.openstack.org/wiki/Swift/Fixing-rebalance-and-golang
21:10:50 ok
21:10:50 clayg also wrote up a lot of info in an etherpad that's on the ideas page and linked in my wiki page
21:11:01 here's the basic summary
21:11:21 first, golang or not golang, that doesn't matter. language is a tool to solve problems
21:11:44 and the problem is that rebalances and latency and ops overhead for these are bad
21:12:08 we know several reasons why these things are bad
21:12:10 sorry what happened now
21:12:41 yeah! we want big fast and stupid easy!
21:12:50 eg with rsync, we aren't in the data path. and we don't do a good job of scheduling partition movement. and eventlet's hub is bad for disk IO
21:13:08 and we've talked about a lot of different ways to help solve these problems
21:13:17 one of which is rewriting stuff in golang
21:13:37 this is the essence of what needs to get done:
21:13:40 stupid eventlet hub and blocking operations - it says right on the tin don't use this if you do blocking io - why'd we do it then!?
21:13:55 1) we need a better protocol for moving data between storage nodes
21:14:16 2) we need a better way to schedule the work that needs to be done (eg moving a partition or hashing directories)
21:14:49 3) we need a standard protocol for proxy<->storage node communication that doesn't depend on our custom additions to it (I'm looking at you, 100-headers)
21:15:10 4) we need a better way to do network-to-disk-and-back operations
21:15:19 that's it. that's the four things
21:15:24 * clayg waves at torgomatic
21:15:33 and to be clear, golang is still part of the solution
21:15:44 number 4 is solved with a golang object server
21:15:51 but rebalance is in replicator, isn't it? Why is proxy-storage communication important?
21:16:04 zaitcev: it's both repl and ec
21:16:09 just moving parts to the right place
21:16:22 proxy->storage matters for EC and for crypto
21:16:45 MIME prroglem?
21:16:53 problem
21:16:55 and we can't rewrite the object server in golang and not change the protocol unless we also modify standard golang libraries for doing http
21:17:17 aww man I didn't know what a prroglem was - but I definitely thought we had one and didn't want it
21:17:43 so if instead we use a protocol that's more supported across different languages, we can actually solve the problems in the object server more easily
21:17:54 kota_: Go supports MIME and 100-continue, but not the way we use them, in particular extra headers for the 100 reply.
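
[Editor's sketch] The "extra headers for the 100 reply" point above can be illustrated with raw sockets. This is a minimal sketch, not Swift's actual wire protocol: the header name `X-Example-Capability`, the path `/obj`, and all function names are hypothetical, chosen only to show an interim `100 Continue` response that carries headers the client must read before sending the body. Raw sockets are used here so that the interim headers are visible without relying on any particular HTTP library's behavior.

```python
import socket
import threading


def read_head(sock, buf=b""):
    """Read from sock until a blank line ends an HTTP head; return all bytes read."""
    while b"\r\n\r\n" not in buf:
        chunk = sock.recv(4096)
        if not chunk:
            break
        buf += chunk
    return buf


def parse_head(raw):
    """Split a raw HTTP head into (first_line, lowercase-keyed headers dict)."""
    lines = raw.split(b"\r\n\r\n", 1)[0].decode("ascii").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers


def serve_once(listener):
    """Toy storage node: answer Expect with a 100 that carries an extra header."""
    conn, _ = listener.accept()
    with conn:
        raw = read_head(conn)
        _, headers = parse_head(raw)
        length = int(headers.get("content-length", "0"))
        # The interim response advertises a (hypothetical) capability header.
        conn.sendall(b"HTTP/1.1 100 Continue\r\n"
                     b"X-Example-Capability: yes\r\n\r\n")
        body = raw.split(b"\r\n\r\n", 1)[1]
        while len(body) < length:
            chunk = conn.recv(4096)
            if not chunk:
                break
            body += chunk
        conn.sendall(b"HTTP/1.1 201 Created\r\nContent-Length: 0\r\n\r\n")


def put_with_expect(port, body=b"hello"):
    """Toy proxy: send headers, inspect the 100 reply's headers, then send the body."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as sock:
        sock.sendall(b"PUT /obj HTTP/1.1\r\nHost: localhost\r\n"
                     b"Content-Length: " + str(len(body)).encode() + b"\r\n"
                     b"Expect: 100-continue\r\n\r\n")
        interim_status, interim_headers = parse_head(read_head(sock))
        sock.sendall(body)
        final_status, _ = parse_head(read_head(sock))
        return interim_status, interim_headers, final_status


def demo():
    """Run the toy server and client once on an ephemeral localhost port."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    worker = threading.Thread(target=serve_once, args=(listener,), daemon=True)
    worker.start()
    try:
        return put_with_expect(listener.getsockname()[1])
    finally:
        worker.join(timeout=5)
        listener.close()


if __name__ == "__main__":
    print(demo())
```

The client here has to parse the interim response by hand, which is the kind of custom handling the discussion wants to stop depending on.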
21:18:13 in other words, the proxy->object protocol isn't itself crucial, but it's needed to vastly simplify the work needed to rewrite the object server
21:18:17 clayg, zaitcev: ok
21:18:34 zaitcev: can you send a second 100-continue in the middle of a chunked transfer?!
21:18:36 kota_: yep. the MIME stuff
21:18:48 including the 100-continue stuff
21:18:58 so that's why we need the new proxy<->storage protocol
21:19:35 also, I know I'm saying "storage" instead of "object". only to keep in mind longer-term stuff that may include account/container. but object server is definitely the first thing
21:19:37 *I* don't like the MIME stuff - it's *technically* pretty ok - don't confuse the 100-continue stuff that is actually wrong, with the 100-continue stuff that is just annoying, with the MIME stuff that is basically fine except clayg misses curl
21:19:52 clayg: :-)
21:20:04 clayg: no, the runtime decides when to send one. But it seems to do it just where we otherwise want it, so no issue that I found. The only problem is that it's not possible to negotiate with our extra headers.
21:20:11 unless we *all* miss curl - in which case - heck yeah stupid MIME!
21:20:22 the point is, something that actually works without our upstream patches to eventlet is what we want
21:20:27 I generate MIME body and feed it to curl
21:20:56 ok, I'm going to rush ahead, then stop for questions
21:21:09 zaitcev: that doesn't sound correct - the *big* problem - the part that's *wrong* is the pause in the middle of the EC stream after sending the body waiting on the 100 continue to send the rest of the MIME document indicating the commit bit
21:21:17 so to get there (how do we get there john?), here's what we do....
21:22:06 first (or zeroth, in the wiki page) is that we realize that the hummingbird branch is great R&D and POC.
but it's not something we're going to add to and eventually merge with master
21:22:38 zaitcev: do you have MIME generating wrapper thing that you can use with pipes like `cat data | python wrap_mine.py | curl `!?
21:22:43 next, clayg's already been hard at work getting reconstruction and replication to a point where it's not a "hair on fire" situation when someone has big rebalance problems
21:22:58 that's pretty much done for replication and close for reconstruction
21:23:02 clayg: I captured it with wireshark once.... Sorry, not a generated one.
21:23:36 zaitcev: ack
21:23:42 after that, we need to make a better scheduler for the rebalance work (initially farming work to rsync or ssync), then we make "tsync"
21:24:00 notmyname: wfm, ain't no body been trying to merge hummingbird anyway near as I can tell
21:24:21 clayg has written a bunch on tsync (see the link in the wiki), and the basic idea here is that it uses http2+grpc (ie common stuff that's not NIH swift team wire protocols)
21:24:51 alongside the scheduler and tsync work, we need a golang object server that's feature complete
21:25:19 and we need to do the infra/devstack/testing/etc work to make sure golang is consumable in openstack
21:25:30 much of this is parallelizable
21:25:35 oh "written a bunch *on* tsync" like "about" not "written a bunch *of* tsync"
21:25:45 :-)
21:25:52 you're pretty much done right? ;-)
21:26:09 yeah I like that we can work on fixing rebalance separately from fixing the object-server problem
21:26:55 that (the pages of ranting I just did) is where we are with solving one of the biggest problems in swift today, and golang is part of the overall solution to it
21:27:02 And I like that going with something like gRPC could make it a bunch easier to migrate the rebalance engine to golang piecemeal
21:27:07 and that's a basic walkthrough of the wiki page
21:27:12 so...
21:27:17 what questions do you have?
:-)
21:27:26 and I like the idea of defining the swift consistency protocol separately from the implementation
21:27:35 this is a pretty good idea actually notmyname - kudos!
21:27:57 clayg: well you helped me finish it up yesterday :-)
21:28:03 "hummingbird branch is ... not something we're going to add to and eventually merge with master" - so, what is?
21:28:28 zaitcev: I fully expect that some parts of hummingbird will be able to be reused
21:28:46 as of this moment, that is undefined (ie what branch that will go on)
21:29:06 is there somewhere any kind of functional tests for object server that could be used in checking golang object-server is complete?
21:29:09 zaitcev: you might be asking "what is the first patch that adds golang to swift master going to be" (it's a good question - do we know?)
21:29:37 rledisez: no, and it's been an annoying gap since we started talking about a hard rewrite of the object server
21:29:57 rledisez: so far all we have are the probe tests and functests. (ie not complete)
21:30:06 rledisez: I have had same thought, it would be a great thing to have
21:30:31 clayg, notmyname: at some point we will need some for our diskfile implem, so we will certainly start the work on that in the following weeks
21:30:46 rledisez: yep!
21:31:23 rledisez: oh some direct object server tests
21:31:27 rledisez: oh wait? are you saying you'll start on the diskfile stuff? or the tests?
21:31:42 rledisez: for testing current diskfile vs. your implementation
21:31:43 interesting
21:32:04 notmyname: start working on tests, to test the diskfile we're working on
21:32:09 awesome!
21:32:10 we should start an etherpad to brainstorm requirements for that test suite
21:32:12 compared to the original
21:32:38 is there a bot that will put rledisez on the hook to start that etherpad?
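
[Editor's sketch] The idea of tests that check an object server purely over HTTP, so the same checks could run against a Python or a golang implementation, might look roughly like the following. This is a sketch under stated assumptions: `InMemoryObjectHandler` is a stand-in written for this example, not Swift's object server, and the path `/sda/0/a/c/o` plus the expected status codes are illustrative choices rather than a statement of what the real server requires (request headers a real server would expect are not modeled).

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class InMemoryObjectHandler(BaseHTTPRequestHandler):
    """Stand-in server: keeps object bodies in a dict keyed by request path."""
    store = {}

    def log_message(self, *args):
        pass  # keep the demo output quiet

    def _reply(self, status, body=b""):
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_PUT(self):
        length = int(self.headers.get("Content-Length", 0))
        self.store[self.path] = self.rfile.read(length)
        self._reply(201)

    def do_GET(self):
        if self.path in self.store:
            self._reply(200, self.store[self.path])
        else:
            self._reply(404)

    def do_DELETE(self):
        if self.store.pop(self.path, None) is None:
            self._reply(404)
        else:
            self._reply(204)


def request(host, port, method, path, body=None):
    """Issue one HTTP request and return (status, response body)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request(method, path, body=body)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()


def check_roundtrip(host, port, path="/sda/0/a/c/o"):
    """Black-box check: only talks HTTP, so any implementation can be the target."""
    statuses = []
    status, _ = request(host, port, "PUT", path, body=b"payload")
    statuses.append(status)
    status, data = request(host, port, "GET", path)
    assert data == b"payload", "GET must return what PUT stored"
    statuses.append(status)
    status, _ = request(host, port, "DELETE", path)
    statuses.append(status)
    status, _ = request(host, port, "GET", path)
    statuses.append(status)
    return statuses


def run_demo():
    """Start the stand-in server on an ephemeral port and run the check once."""
    server = HTTPServer(("127.0.0.1", 0), InMemoryObjectHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return check_roundtrip("127.0.0.1", server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo())
```

Because `check_roundtrip` takes only a host and port, pointing it at a different server implementation would not require changing the checks themselves.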
21:32:50 I'll ask notmynamebot
21:32:57 :D
21:33:12 rledisez: there are some unit tests that make requests to a port on localhost, IDK if they could be re-used somehow
21:33:12 shit.. now i'm thinking about rledisez's diskfile and entrypoints
21:33:19 stay focused clayg
21:33:44 notmyname: zaitcev: I think there is an open and important question "what is the expected first golang patch to swift"
21:33:46 clayg: stay focused, but we started today about entry points ;)
21:33:48 what other questions are there?
21:34:07 clayg: yeah. I don't think we know
21:34:09 clarkb: I think notmyname just said "undefined" to that.
21:34:27 we've talked about finding something tiny (eg a simple CLI script) to rewrite in golang just to test the mechanics
21:34:54 to get all the infra mechanics out of the way without it blocking a major patch
21:35:15 but beyond that, undefined at the moment
21:35:56 in general, though, does this make sense to everyone? specifically, the problem, the different parts of the solution, and how golang fits in?
21:36:35 i don't immediately love the idea of re-writing some trivial component in golang just to prove it's doable - that runs pretty counter to my expectation we'd push back on rewrites just for the sake of rewrites :\
21:37:07 clayg: I think it depends on how much of the infrastructure is in our repo vs external
21:37:43 it's helpful to have the problem/solution broken down into the contributing parts - thanks for the write-ups notmyname and clayg
21:37:51 I had the idea that the thing we use to test the waters is something like logging, that we can extract from say golang, and would build bridges (and be small to make sense) because it could be the golong oslo.logging or config.. pick one.
21:37:51 if it's all in the swift repo (eg devstack plugin etc), then we can pretty easily avoid gratuitous rewrite
21:38:17 s/goloang/hummingbird
21:38:21 mattoliverau: I think that's what some of the golang-commons has proposed
21:38:54 wait.
reorder that sentence
21:39:09 I think that's what some of the proposed golang-commons is about
21:39:10 i think we avoid a bunch of friction if we only golang stuff that's not tested/required in devstack
21:39:42 there's a huge amount of questions like this that will come up. and that we can't answer right now
21:40:13 right now I want to make sure people understand the general idea as presented and if there are any immediate concerns
21:40:36 there is a bunch of questions about "how does openstack golang" that are orthogonal to "how does openstack use a swift that doesn't have a python object server"
21:41:16 like openstack has a way that it does requirements and dependencies and configs and talk to database and access DLM's and ... not all of that affects us
21:41:33 the part that is really sticky is the stuff that "deploys openstack" that has to change dramatically
21:41:52 on the openstack side, I've asked tdasilva and cschwede to help me by working on a TC resolution for the flavio process.
21:42:26 my point is, find a small bit that we can work through all the testing and golang openstack problems, get it right, something huge would take too long, and something useless doesn't help as much but a part of the overall thing is what we want..
21:42:27 notmyname: tdasilva is mostly working on this atm, he just sent me a proposal
21:42:29 historically we've been pretty slow to go update that stuff to setup multiple storage policies, multiple regions, ec, encryption - if we change fundamentally how swift is deployed from source - we might have to play ball a little more
21:42:32 clayg: and I want to start talking to other projects about that asap. I'll probably bug mattoliverau about ansible and cschwede about tripleo and find others for other projects
21:42:47 but anyway sure, this can be moved to an etherpad or out of meeting :)
21:42:59 yeah, I want to get to acoles's topic too :-)
21:43:12 so.. everyone excited? terrified? wheeee!
21:43:24 wheeee...
21:43:51 ok, mental context shift coming
21:44:08 #topic composite rings, how to best expose building in a CLI
21:44:14 acoles: this is your topic. what's up?
21:44:25 I'm terri-wheeed
21:44:33 :P
21:44:34 I think zaitcev has the skepticism about the golang object server replacement - but i'm not 100% sure why - we essentially need to rewrite the object server and we have a working example in hummingbird
21:44:55 ok, so Kota has been doing great work on global EC, we now have experimental support for duplication of frags...
21:45:06 yay kota_
21:45:06 so awesome
21:45:13 and his next patch (well one of them) is composite rings https://review.openstack.org/#/c/441921
21:45:18 thx
21:45:40 and we've an etherpad going to discuss some of the issues around that https://etherpad.openstack.org/p/composite_rings
21:45:56 #link https://etherpad.openstack.org/p/composite_rings
21:45:58 (for the bots)
21:46:15 thanks acoles to sketch the issues there
21:46:59 one particular issue we've been discussing is how best to expose the composition of rings on a CLI, and whether any state needs to be maintained about the composite ring
21:47:03 yup etherpads + acoles == awesomesauce
21:47:05 (see the etherpad...)
21:47:57 acoles: it's a *long* etherpad? at some point it even gets into other global-ec metaissues
21:48:04 do you want people to go read it now? are you just raising awareness?
21:48:11 and even whether we need at this point in time to put CLI support in place or just have the functions available for people to use if they want to compose rings
21:48:19 acoles: kota_: great. do we have a particular question or thing to decide in the meeting, or do we follow up next week after people have read it?
21:48:26 No, don't read it now,
21:48:39 I'm raising awareness and seeking any opinions on the etherpad
21:49:06 ok, great.
I'll bring it up again next week to make sure we've collected input and see where we are
21:49:42 acoles: is there anything else on that topic for today's meeting?
21:50:48 just that specific thing that's been discussed is whether we add support for this to swift-ring-builder, or have another CLI, or ... ? so if you have opinions on that please add them
21:50:55 cschwede: I'm particularly interested if you could spare some braincells on the UX for managing composite and component rings - I feel like you have a lot of relevant experience to draw from
21:51:08 +1 that^^
21:51:43 acoles: thanks for bringing it up
21:52:02 #topic open discussion
21:52:02 acoles + etherpads == awesomesauce - it is known
21:52:14 please remember to register for the summit, if you're going
21:52:23 clayg: we should get that on a t-shirt
21:52:27 there's an etherpad with topics we'd like to see discussed...
21:52:38 #link https://etherpad.openstack.org/p/BOS-Swift-brainstorming
21:52:40 clayg: I kinda hate them actually !
21:53:07 I'm just back from PTG. Seems like Too Soon.
21:53:17 please update the etherpad. we've got about 5 days before people will pull from that to make topics and schedules
21:53:55 acoles: that just means you bring the awesomesauce. . or should I say you can spell awesomesauce without acoles ;)
21:54:00 anything else to bring up this week in the meeting? any more follow up from a previous topic? we've got about 5 minutes more
21:54:04 *cant
21:55:01 Or register for the forum while it's free (if you were at PTG).. and then hope to come (I'm in this boat)
21:55:29 if you need a ticket and didn't get a free one, please ping me. I might be able to find one or two
21:56:27 anything else? shall we end a few minutes early?
21:56:42 early mark! (or should I say breakfast)
21:56:52 notmyname: clayg are you planning to add a forum topic on rebalance-awesomeness work?
21:57:01 For those of you that are having trouble getting management backing there was a note that came out from the foundation explaining why technical representation was important.
21:57:19 Check my twitter feed @jungleboyj or the @openstack feed for the link.
21:57:21 the forum topics need to be operator-focussed, so we'll try to add one phrased like that
21:57:35 jungleboyj: got it handy now?
21:57:45 * jungleboyj looks quick.
21:58:04 notmyname: yes, might need a different slant than the wiki content
21:58:14 acoles: definitely
21:58:39 "ops who has rebalances running more than 72 hours after a drive replacement please register to this session"
21:58:46 lol
21:58:47 yes!
21:58:56 thank you for your hard work on swift. the project is better because of everyone here
21:59:02 have a good rest of your day!
21:59:06 #endmeeting