21:00:33 #startmeeting swift
21:00:34 Meeting started Wed Jul 15 21:00:33 2015 UTC and is due to finish in 60 minutes. The chair is notmyname. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:35 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:36 hello, everyone
21:00:38 The meeting name has been set to 'swift'
21:00:44 who's here for the swift meeting?
21:00:46 here
21:00:47 i
21:00:48 \o/
21:00:49 yo
21:00:49 🐄
21:00:49 o/
21:00:49 hi
21:00:51 o/
21:00:52 hey
21:01:09 great
21:01:24 I know we've got a few people who are out (acoles, peluse, clayg)
21:01:34 agenda is at
21:01:35 \o/
21:01:38 #link https://wiki.openstack.org/wiki/Meetings/Swift
21:01:49 #topic general stuff
21:01:58 first up, some general stuff
21:02:05 #link https://www.eventbrite.com/e/swift-hackathon-tickets-17308818141
21:02:12 hackathon registration link there ^
21:02:29 if you have questions, please feel free to ask me (or jrichli)
21:02:37 I'm looking forward to seeing everyone there
21:03:05 how many people will be there? 30?
21:03:21 ho: yes, I believe so
21:03:24 also, today (in just under 10 hours) is the deadline for conference presentations at the tokyo summit
21:03:30 #link https://www.openstack.org/summit/tokyo-2015/call-for-speakers/
21:03:32 notmyname: thanks
21:03:42 so please submit a talk if you want to speak
21:03:52 this is for the conference part, not the technical sessions
21:03:58 we'll do technical session stuff later
21:04:08 figuring out that schedule
21:04:54 also, especially if you aren't from the US, please check *now* about visa requirements for Japan. you may need to start the process now to have everything in time
21:05:34 any other general info from people before we move on to the other scheduled topics?
21:05:59 ok
21:06:05 #topic swiftclient release
21:06:18 so I tried to do a swiftclient 2.5.0 release yesterday
21:06:30 it didn't happen though
21:06:34 nice
21:06:44 oh, no
21:06:51 just tried
21:06:51 reason being, I don't have permissions to do it anymore
21:06:59 ?!
21:07:09 so there's been a change in how "library projects" are released
21:07:16 it's fairly new
21:07:25 but here are some links (and then I'll summarize)
21:07:33 #link http://lists.openstack.org/pipermail/openstack-dev/2015-June/066346.html
21:07:39 #link http://git.openstack.org/cgit/openstack/releases/tree/README.rst
21:07:58 Summary: with so many "library" projects, releases were being done differently and inconsistently. Now, only a small group ("library-release") actually does the releases, with some of it automated. Eventually, there will be a "releases" repo containing a bunch of yaml files with info about each release, and jobs will be kicked off based on those files.
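
(For illustration, this is roughly what one of those yaml files might look like, following the deliverable layout described in the openstack/releases README linked above. The path, schema details, and hash below are illustrative stand-ins, not an actual release request.)

    # deliverables/liberty/python-swiftclient.yaml -- illustrative only
    launchpad: python-swiftclient
    releases:
      - version: 2.5.0
        projects:
          - repo: openstack/python-swiftclient
            hash: 0000000000000000000000000000000000000000  # commit to tag

(Proposing a file like this to the releases repo, via gerrit as described next, is what would trigger the tagging and publishing jobs.)
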
21:08:46 the new way is for me to submit a gerrit review to a new repo (recently created), and then it's supposed to get approved
21:09:27 so...that's the new way of things
21:09:51 I'll try to work through that and adjust and find out what's best
21:10:40 I've got to balance "how big of a deal would it be to not do this" against being in the "normal" way of things with the rest of the openstack projects, and see what the relative costs are
21:11:13 my goal is that I'll be the only one who has to worry about the process/bureaucracy, so everyone else can focus on getting code landed
21:11:28 but I wanted to share what's happened and why the release didn't happen yesterday
21:11:45 yep, it's not openstack until you have to file a Ticket to Push Something
21:12:42 so to avoid further frustration, let's move on to the next topic :-)
21:12:55 (although I'm happy to discuss this with anyone later, if you want)
21:13:10 #topic python3 support
21:13:19 hmm...doesn't look like haypo is here
21:13:52 ok, there have been a lot of patches (I'm sure you've seen) about making swift work with py3
21:14:25 right now, it's slow, hard to review, of marginal benefit, and causes frustration for both the patch authors and the reviewers
21:14:36 so, let's find something better
21:15:03 the current strategy seems to be "here's a thing that needs to be ported; fix it everywhere"
21:15:32 the problem is that you end up with big patches that are *very* hard to review, with no visible benefit to check, and you can't even see a green py34 gate job at the end!
21:15:51 so I think there are a couple of different options we have for actually making progress
21:15:57 well, you can't see a green py34 gate anyway because of pyeclib
21:16:07 well, first..
21:16:33 I'm presupposing that "works with py3" is a good thing. I think it is. is there any disagreement on that point?
21:17:10 ok, good :-)
21:17:11 "works with py3" is good. "works with py3" is also not my top priority.
21:17:21 * peluse sneaks in late...
21:17:22 torgomatic: I don't think anyone disagrees there :-)
21:17:43 ok, so there are 2 options we have for moving forward with py3 support
21:18:05 one option is to freeze the world, port all at once, then let dev work continue
21:18:45 the advantage is that it takes care of it all at once (modulo any lingering port issues that cause bugs to sneak in)
21:18:47 * torgomatic votes for option two, whatever it is
21:18:50 lol
21:18:52 :D
21:18:59 lol
21:19:06 yeah, the downside is that you freeze the world
21:19:34 ok, the other option is that we go for a depth-first instead of a breadth-first method
21:19:40 hah.. I mean you're really just freezing merges. And forcing reconciling conflicts onto everyone else.
21:20:37 If we could do it in a week, it wouldn't be so bad.
21:20:37 redbo: right, but knowing how long stuff like that takes
21:20:37 do you think we could do it in a week?
21:20:50 so right now we're taking one issue at a time and porting it across the project
21:21:21 I have no idea! It doesn't seem outside the realm of possibility.
21:21:37 the alternative would be to take one module at a time and port everything in it. start with modules that have few import links and move up. then exclude the code that hasn't been ported yet so we get a passing gate
21:22:13 if we do one module at a time, I think it would probably take longer, but we'd have a passing gate job and wouldn't have to freeze the world for this to land
21:22:27 so that's the question. what do you think? which would you rather?
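
(For a concrete sense of what "port one module at a time" means, here is a minimal hypothetical sketch of the kind of bytes-vs-text change a port typically involves. The function names are made up, not from the Swift tree, and the six library is assumed for the cross-version import.)

    # -*- coding: utf-8 -*-
    # Hypothetical example of a typical py2->py3 port: md5 and URL-quoting
    # operate on bytes, so text must be encoded to UTF-8 explicitly.
    import hashlib

    from six.moves.urllib.parse import quote  # same import on py2 and py3


    def hash_path(account, container):
        """Return the md5 hex digest of an account/container path."""
        path = u'/%s/%s' % (account, container)
        # py2's md5() quietly accepted native strings; py3 raises a
        # TypeError for text, so the encode must be explicit.
        return hashlib.md5(path.encode('utf-8')).hexdigest()


    def quote_path(account, container):
        """URL-quote a path, encoding text to UTF-8 bytes first."""
        return quote((u'/%s/%s' % (account, container)).encode('utf-8'))

(Finding every call site that needs an explicit encode like this is the "hard part" redbo describes below, which is why a module-at-a-time port with a passing py34 gate per module is attractive.)
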
21:22:57 depth first +1
21:23:07 I'd really like to know how long we'd have to stop the world
21:23:17 +1
21:24:03 if we figure out how long it would take, it'd help us make a better decision
21:24:04 here's the current py3 patches
21:24:05 https://review.openstack.org/#/q/status:open+project:openstack/swift+branch:master+topic:py3,n,z
21:24:19 at least those with a "py3" topic
21:24:51 definitely the second; if our estimates are low, then stop-the-world paralyzes all development, while one-at-a-time at least lets other development continue, even if it is more merge-conflict-prone
21:24:56 15 patches
21:25:08 wbhuber_: redbo: how do we figure out how long we'd have to freeze?
21:25:34 IME, regression testing and issues always take longer than expected
21:26:03 if my experience with swiftclient is anything to go by, the issues can be frustrating and subtle
21:26:15 joeljwright: porting issues or resulting bugs?
21:26:37 there are always new issues related to unicode
21:26:50 and even when things are passing, you might not be testing what you think you are
21:27:11 there are still patches to fix py3 behaviour in the swiftclient queue now
21:27:23 and py3 has been turned on for ages
21:28:26 +1 for the second (joeljwright: thanks for the info)
21:28:30 I am curious to hear what acoles and clayg have to say on the topic
21:28:44 unfortunately acoles is off enjoying himself somewhere sunny
21:28:50 True, we use UTF-8-encoded byte strings all over the place, and py3 will probably be easier if we use unicode representations. And then you have to deal with things like how to urlencode a unicode object, all over the place.
21:28:52 notmyname: couldn't we freeze for an acceptable amount of time, say a week, merge as much as possible, then unfreeze, allowing the world to move again, and re-evaluate what to do next after that?
21:29:01 * peluse is holding out for py4
21:29:30 doing the depth-first type of port also presupposes that it's possible to find disconnected modules that can be ported
21:30:37 we could start with depth-first on the modules that seem easier to "disconnect" and go until maybe the rest is doable all at once
21:30:53 this patch https://review.openstack.org/#/c/199034/ is close to what I'm describing. it uses a whitelist instead of a blacklist for the py34 check, but there's still a lot of files touched just to get test_exceptions ported
21:31:05 Anywhere you urlencode or md5 a string, you have to know to convert it to utf-8 first, since those operate on bytes. Finding all that junk will be the hard part.
21:31:15 jrichli: yeah, I think that's what we'd effectively get to anyway
21:34:03 redbo: what do you think of porting the disconnected stuff and getting a gate passing on them, then reevaluating as we get to more connected modules?
21:34:08 hurricanerix: ^
21:35:08 if y'all are ok with that, then I'll talk to haypo about it this week
21:35:46 sounds ok to me.
21:35:58 Sure, if we can make it work and find smaller units to pull out and fix.
21:36:28 yeah, I suspect we'll end up with a big chunk of work at the end, but not 100% at once
21:36:54 ok, I'll talk to haypo and janonymous this week about it
21:37:18 or write something up so we can point people at it. that would probably be better (more scalable than depending on when I'm awake and online)
21:37:29 ok, thanks for the input!
21:37:33 #topic container sync
21:37:39 eranrom: this is your topic
21:37:52 thanks
21:37:52 you have a lot of good links in the meeting agenda. can you summarize?
21:38:02 summary:
21:38:40 1. Existing container sync is single-process, single-threaded, and generates very low sync BW
21:39:06 2. Written in a way that every object is likely to get copied up to 4 times
21:39:38 a relatively simple change which adds parallelism greatly improves the BW it can generate
21:39:48 \o/
21:40:00 the links summarise (1) measurements of before and after
21:40:13 (2) show what the parallel code looks like
21:40:16 we tested container sync and shut it off almost immediately because polling for things to sync uses a lot of resources. wouldn't parallelizing that make it worse?
21:40:34 not necessarily
21:40:50 you can tune the time it is working and the amount of parallelism
21:41:38 I guess the problem is worse if the ratio of synced / not synced is really small
21:41:41 especially since you would most likely put container dbs on SSDs, and the rust drives holding your objects wouldn't necessarily get touched any more often if your containers were in sync
21:42:06 agree.
21:42:33 but in my case, ingest rate often exceeds the rate that container syncs can possibly occur
21:42:38 single-threaded
21:42:56 I also agree that an approach like the one taken for the reconciler might be better, but from our point of view we prefer to first have a solution that works - perhaps suboptimal, but working - and then make it better
21:43:19 MooingLemur: So I take it that you want parallelism
21:43:36 even syncing N containers in parallel would work well, since we're sharded out quite wide.
21:43:56 not necessary to sync a single container any differently. We just need more workers.
21:44:41 In fact, the suggested patch does both multi-processing (each process gets to sync a different container) and multi-threading (each process will sync a container using many threads)
21:45:35 eranrom: you hinted that you have before and after numbers. meaning you've already got your solution coded up? running somewhere?
21:45:46 yes
21:45:59 nice :D
21:46:12 I'd just like to see the "figuring out what needs to be synced" part re-architected so it doesn't use so many resources. Then if you use a bunch of parallelism for the actual object transfers, that makes sense.
21:46:56 eranrom: is that split possible with what you've done?
21:47:38 Well, the split would require changing additional places so as to queue only the work that needs to be done
21:47:54 wouldn't that be very similar to the reconciler pattern?
21:47:59 Another option is to build only the list of synced containers
21:48:32 we can build it from the devices only once in a while and keep it in memcache
21:49:14 I am not sure this is exactly what redbo meant, but it is a big improvement.
21:49:41 Yeah. Basically we had it running and we had hardly any containers with sync turned on, but it was chewing up a ton of CPU.
21:49:45 the reconciler builds that up in swift itself (in the .misplaced_objects container, via the container replication process). then you've decoupled the sync transfer implementation from what needs to be synced
21:50:09 doing something similar with container sync seems possible, right?
21:50:55 yes, in more than one way
21:51:13 with tradeoffs of effort to what you get from it
21:51:18 I guess the more important question is whether it is better. what do you think?
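
(A rough sketch of the multi-process/multi-thread shape eranrom describes: one worker process per container, with a thread pool inside each process to overlap the per-object round trips. Everything here is a hypothetical stand-in, not the actual patch from the agenda links.)

    # Illustrative parallel container sync: processes across containers,
    # threads within a container. All helpers are hypothetical stubs.
    from multiprocessing import Pool
    from multiprocessing.pool import ThreadPool

    PROCESSES = 4  # containers synced concurrently
    THREADS = 8    # rows shipped concurrently within one container


    def get_sync_enabled_containers():
        # Stub for walking the container dbs (or reading a cached list
        # from memcache, per the discussion above).
        return ['AUTH_a/c1', 'AUTH_a/c2', 'AUTH_b/c3', 'AUTH_b/c4']


    def rows_to_sync(container):
        # Stub for reading not-yet-synced rows from the container db.
        return [(container, i) for i in range(100)]


    def sync_row(row):
        # Stub for PUTting/DELETEing one object on the remote cluster.
        return True


    def sync_container(container):
        pool = ThreadPool(THREADS)
        try:
            results = pool.map(sync_row, rows_to_sync(container))
        finally:
            pool.close()
            pool.join()
        return container, results.count(True), results.count(False)


    if __name__ == '__main__':
        procs = Pool(PROCESSES)
        try:
            for name, ok, failed in procs.imap_unordered(
                    sync_container, get_sync_enabled_containers()):
                print('%s: %d synced, %d failed' % (name, ok, failed))
        finally:
            procs.close()
            procs.join()

(Note this only parallelizes the transfers; it does nothing about the "figuring out what needs to be synced" cost redbo raises, which is where a reconciler-style queue would help.)
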
21:51:41 To go the reconciler way, we need to change things outside of container sync
21:51:56 * notmyname wants to wrap up this topic in the next two minutes (time's running out)
21:51:59 mainly to 'register' or queue the sync work that needs to be done
21:52:11 eranrom: is there a specific question or something, or were you more looking for general input?
21:52:24 of course we'll continue discussing this outside of this meeting :-)
21:52:35 ok, let's continue outside of the meeting.
21:52:56 We can suggest a patch along the lines of what has been discussed so far
21:53:11 ok. thanks for bringing it up! I'm excited that you're working on it. it's a place that's really in need of improvement :-)
21:53:26 my pleasure :-)
21:53:26 #topic open discussion
21:53:47 we've got about 5 minutes left, so everything else gets thrown in here
21:53:53 the last big thing I want to bring up is..
21:54:11 EC bugs https://bugs.launchpad.net/swift/+bugs?field.tag=ec
21:54:21 most listed there are resolved
21:54:29 there is one "New" one
21:54:31 I wrote up 2 patches for EC
21:54:33 https://review.openstack.org/#/c/201055/
21:54:37 I have two more in the works right now.
21:54:44 https://review.openstack.org/#/c/199043/
21:54:50 minwoob_: bugs or patches? :-)
21:54:57 Patches for bugs.
21:54:59 yay
21:55:00 waiting to be reviewed
21:55:01 So, two bugs.
21:55:17 minwoob_: ah. are they not registered bugs yet?
21:55:29 They are, and they're pretty close, in my opinion.
21:55:37 summary being, these need to be resolved asap
21:55:39 minwoob_: ok
21:55:49 and as soon as they are resolved, we can do a swift release
21:55:50 One needs review, and the other is just in a merge conflict right now.
21:56:21 Aside from that, congrats to kota on the server-side copy fix :-)
21:56:28 everyone, please review the bugs and patches for them. I'll be starring them as I see them in gerrit
21:57:28 anything else to bring up today in the meeting?
21:57:31 notmyname: Sounds good.
21:58:08 great!
21:58:10 thanks, everyone, for coming today. and thanks for working on swift :-)
21:58:16 #endmeeting