19:01:52 <notmyname> #startmeeting swift
19:01:53 <openstack> Meeting started Wed Jun 11 19:01:52 2014 UTC and is due to finish in 60 minutes.  The chair is notmyname. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:54 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:56 <openstack> The meeting name has been set to 'swift'
19:02:10 <notmyname> Thanks for coming. big stuff to talk about this week
19:02:15 <notmyname> #link https://wiki.openstack.org/wiki/Meetings/Swift
19:02:25 <notmyname> briancline updated the agenda perfectly :-)
19:02:34 <notmyname> I guess I'm pretty predictable :-)
19:03:01 <notmyname> first up, some tangentially related logistics
19:03:36 <briancline> huzzah!
19:03:42 <notmyname> I'm having surgery tomorrow morning, early, and I've added clayg and torgomatic to the swift-ptl group in gerrit, temporarily
19:04:12 <notmyname> I expect to be back online this weekend. maybe friday pm
19:04:13 <clayg> o/
19:04:33 <peluse_> good luck man!
19:04:37 <notmyname> thanks
19:04:40 <notmyname> so, moving on to the current stuff in swift...
19:04:42 <portante> a/
19:04:49 <portante> o/
19:04:53 <notmyname> #topic storage policies merge
19:04:53 <portante> sorry I am late
19:05:01 <notmyname> portante: no worries. just getting started
19:05:24 <notmyname> just last night clayg proposed what I think is the "final" set of SP patches. ie the set with all the functionality
19:05:40 <notmyname> note that the new "end of chain" is https://review.openstack.org/#/c/99315/
19:05:51 <portante> 2 vector timestamps
19:05:59 <peluse_> reviewing now...
19:06:01 <portante> I'd like some time to review that
19:06:08 <notmyname> of course
19:06:17 <acoles> i've started but not done on 99315
19:06:34 <portante> would there be any revolt pushing this out from this week to next?
19:06:42 <clayg> REVOLT!
19:06:43 <notmyname> clayg: can you confirm that, other than discovered issues in the proposed patches, there is no additional functionality expected to be proposed
19:07:03 <clayg> notmyname: well i'm not sure the two vector timestamp stuff is really done *done*
19:07:08 <notmyname> ok
19:07:26 <clayg> notmyname: at a minimum it needs some extra probetests, and after two days of testing and testing i just sorta said - well i guess that's good enough
19:07:33 <notmyname> :-)
19:07:43 <notmyname> ok
19:08:20 <clayg> notmyname: I tried to audit everywhere that swift core was dealing with timestamps and I think it's all quite manageable, but w/o probetests my confidence in the consistency engine in the face of the internalized form is only like... 95%
19:08:33 <notmyname> ack
19:08:37 <clayg> maybe 92%
19:08:45 <notmyname> 93.4%?
19:08:48 <peluse_> clayg:  I'm only a few files into it but so far it seems like an improvement even aside from the new functionality (cleaner)
19:09:03 <acoles> clayg: i have a concern that the offset needs to be absolute, like another timestamp, for it to be useful for the object metadata post use case
19:09:03 <briancline> (no, negotiate up!)
19:09:05 <clayg> peluse_: maybe the timestamp class is sorta nice
19:09:11 <peluse_> yup
19:09:23 <acoles> clayg: the Timestamp class IS nice
19:09:37 <clayg> acoles: maybe we can offline that - i'm pretty sure it's useless if the offset is absolute ;)
19:09:54 <clayg> fixed deterministic is the way to go!
19:10:05 <acoles> clayg: yeah, discuss offline
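For readers following the two-vector timestamp thread above, here is a minimal sketch of the idea: a wall-clock timestamp paired with a monotonically increasing offset, serialized so values still sort. The class name, field widths, and serialization format below are illustrative assumptions, not necessarily what the patch under review (https://review.openstack.org/#/c/99315/) implements.

```python
# Illustrative sketch only: a two-vector timestamp as (wall-clock time, offset).
# The serialization format and field widths here are assumptions, not
# necessarily what the proposed Swift Timestamp class uses.

class TwoVectorTimestamp(object):
    def __init__(self, timestamp, offset=0):
        self.timestamp = float(timestamp)   # wall-clock component
        self.offset = int(offset)           # tie-breaker for same-time updates

    @property
    def normal(self):
        # External form: what clients would see in X-Timestamp headers.
        return '%016.05f' % self.timestamp

    @property
    def internal(self):
        # Internal form: append the offset so updates made at the same
        # wall-clock time still have a total order on disk.
        if self.offset:
            return '%s_%016x' % (self.normal, self.offset)
        return self.normal

    def __lt__(self, other):
        return (self.timestamp, self.offset) < (other.timestamp, other.offset)


# Example: a metadata POST that must not change the data timestamp can bump
# the offset instead of the wall-clock component.
t0 = TwoVectorTimestamp(1402514343.89510)
t1 = TwoVectorTimestamp(1402514343.89510, offset=1)
assert t0 < t1
assert t0.normal == t1.normal          # same externally visible time
assert t0.internal != t1.internal      # but distinguishable internally
```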
19:10:34 <notmyname> ok, so it looks like we should target Monday instead of today/tomorrow for the SP merge
19:10:46 <notmyname> ie giving several more days for review
19:10:58 <peluse_> I think I'm washing my hair on Monday...
19:11:00 <clayg> notmyname: what about getting some of the other pending stuff merged
19:11:10 <notmyname> clayg: well that's my next topic :-)
19:11:13 <clayg> notmyname: did you want to cut a release 1.14 or some such before sp?
19:11:14 <acoles> peluse_: !
19:11:21 <portante> peluse_: wait, you have that much hair to wash? :)
19:11:26 <notmyname> we've had the soft freeze for a while, and there is stuff queued up
19:11:28 <portante> hair washing?
19:11:34 <portante> next topic?
19:11:42 <peluse_> :)
19:11:47 <notmyname> I'm not anticipating a 1.13.2 or 1.14 release before the SP release
19:12:07 <portante> notmyname: where are we at with the xLO fixes?
19:12:14 <notmyname> what I am expecting is that we'll land the SP chain and also the stuff that's queued up. and all of that will be in the release
19:12:21 <notmyname> portante: 2 +2s but not merged
19:12:27 <briancline> what's queued up for the SP release? or is that the bottom half of the priority reviews page
19:12:35 <portante> but does that fix anticw's concerns?
19:12:41 <briancline> (other than SP of course)
19:12:42 <notmyname> portante: yes it does
19:12:47 <portante> great
19:12:56 <notmyname> the bottom half of https://wiki.openstack.org/wiki/Swift/PriorityReviews has patches that have 1 +2 and could be in a release
19:13:08 <notmyname> I don't have the stuff with 2 +2s listed anywhere right now
19:13:30 <notmyname> portante: and, later, I've got to decide if we backport that to icehouse (likely, but not definite)
19:13:32 <briancline> could probably beat gerrit into submission to find that
19:13:36 <notmyname> portante: if so, I'll take care of it
19:13:45 <portante> notmyname: k
19:14:05 <notmyname> so that's the current tension: a bunch of queued reviews and the SP chain that we want to avoid a bunch of rebasing on
19:14:30 <notmyname> how about this: today I'll merge the stuff that's pending (2 +2s) and then clayg can rebase tomorrow after those land.
19:14:31 <notmyname> ?
19:14:35 <notmyname> clayg: thoughts?
19:14:58 * portante wonders if the 2-vector timestamps should go in first against master as is
19:15:02 <notmyname> the point of the soft freeze is to avoid a bunch of rebases, but if we're all in agreement, then we can do it on a limited basis
19:15:13 <notmyname> portante: stop your speculation ;-)
19:15:21 <portante> okay
19:15:24 <notmyname> lol
19:15:34 * portante wonders why it is not raining here ...
19:15:41 <clayg> rebases away!
19:16:05 <notmyname> everyone else ok with me landing the pending stuff and then having a monday target for SP landing?
19:16:13 <briancline> +1
19:16:16 <clayg> the only reason to avoid them is cause it's ugly in gerrit for reviewers - but everyone's tolerance seems to have built up against that
19:16:28 <clayg> notmyname: monday is a weird day to do anything
19:16:42 <notmyname> clayg: so are other days that end in "y"
19:16:43 <portante> I don't mind the patch set rebase if it does not hinder clayg's efforts
19:16:51 <notmyname> portante: that's my concern
19:16:59 <clayg> portante: no it's no trouble for me at all really
19:17:09 <clayg> portante: it's just annoying to reviewers
19:17:10 <portante> okay ... pig pile!
19:17:17 <briancline> *shrug* I've grown numb to long patch chains
19:17:17 <acoles> i don't mind being annoyed
19:17:41 * portante wonders if he is comfortably numb
19:17:47 <notmyname> the gate seems to have moved from "horrible" to simply "terrible", so it may take all night to merge stuff
19:17:48 <briancline> numb not because of this... that's glance's fault
19:18:02 <notmyname> or tomorrow
19:18:09 <portante> notmyname: what was the special way to land this patch set?
19:18:17 <portante> is it written up somewhere?
19:18:25 <notmyname> #action notmyname to land pending changes
19:18:27 <portante> the PS set
19:18:37 <portante> SP
19:18:52 <notmyname> portante: good question. nice transition :-)
19:19:08 * portante checks made out to ...
19:19:23 <notmyname> so, given the state of the gate (13.5 hours at a 50% pass rate now), we don't want to try to land 29 patches there
19:19:35 <acoles> notmyname: aww
19:19:36 <clayg> man... i remember when it was *only* 27
19:19:39 <notmyname> lol
19:19:48 <portante> ;)
19:19:50 <notmyname> so I've been talking with -infra to figure out a better way
19:19:58 <notmyname> here's what we've come up with:
19:20:05 <notmyname> review the current patches as normal
19:20:16 <notmyname> leave +1s and +2s
19:20:20 <notmyname> (or -1s)
19:21:28 <notmyname> when they all have all the necessary reviews, then we will have -infra build a new feature branch (probably "sp-review") and we'll force push all the patches there. then one merge commit will be proposed to master and reviewed in gerrit. I'll link the existing patch reviews (for historians) and we'll merge that one patch
19:21:51 <notmyname> the key is that all 29 patches will not have to be gated. just the final set of them all will be gated once
19:22:02 <notmyname> I'm working with mordred on this
19:22:12 <clayg> yay mordred!
19:22:15 <notmyname> make sense?
19:22:45 <portante> so will the individual commits be lost then?
19:22:50 <notmyname> portante: no
19:23:01 <portante> great, all for it then
19:23:30 <notmyname> portante: the individual commits (the 29 proposed) will still exist. but they will be added to master in one atomic commit (which is also nice for future bisects and bug tracking)
19:23:50 <notmyname> basically, this is how you are supposed to do git ;-)
19:24:03 <portante> nice
19:24:16 <notmyname> ok. so that takes us up to the "everything is on master" time
19:24:18 <zaitcev> you mean in one merge
19:24:18 <peluse_> cool
19:24:34 <notmyname> I'm hoping that we'll be there on tuesday (ie merge monday)
19:24:36 * portante wonders if a disney movie quote fits here ... fox and the hound
19:25:04 <notmyname> at that point, with the SP patches and the other queued up stuff, we'll cut an RC for the next release
19:25:20 <notmyname> and master is open for new patches
19:25:39 <notmyname> the RC period will be extended from the normal 3-4 days to two weeks
19:25:59 <portante> is anybody from rackspace here?
19:26:02 <notmyname> during this time, I'm hoping that everyone will be able to do their own testing in their labs for this release
19:26:14 <torgomatic> it amazes me that it takes a whole team effort to force Gerrit to work like Git wants it to :|
19:26:35 <briancline> are there any and/or do we need to define any parameters (logistically) for testing the RC?
19:26:48 <clayg> torgomatic is filled with astonishment
19:26:50 <notmyname> I have soft commitments from RAX, HP, softlayer, Red Hat, and maybe NeCTAR and maybe eNovance to do testing
19:26:57 <briancline> or just throw everything imaginable at it?
19:27:46 <notmyname> briancline: the most important thing is that existing clusters don't break. after that, look at the new features and do "stuff" to ensure it works as expected. IOW, what would happen if you deployed it to prod and turned it on :-)
19:28:22 <notmyname> I'm most concerned about regressions. then functionality
19:28:37 <zaitcev> do we have the 2-phase config in the latest SP or not? If yes, it has to be documented in some kind of readme  1)  yum update or apt-get something, 2) edit swift.conf (on all nodes) and set SP_SCHEMA=true
19:28:55 <zaitcev> or is it implicit for >1 policies
19:29:10 <briancline> are there any metrics or other things in specific that all who are *extremely* familiar with this full set of patches would like us to make note of?
19:29:25 <notmyname> zaitcev: yes. updating the code is "safe". having >1 policy is what is the trigger for many of the code paths and is the "can't downgrade" point of no return
19:29:30 <briancline> aside from what we might usually do in our individual normal course of testing
19:29:35 <peluse_> zaitcev:  the docs have updated info about the order to do upgrades
19:29:40 <notmyname> peluse_: ah good
19:29:59 <zaitcev> peluse_: thanks, I'll re-review
19:30:13 <notmyname> assuming nothing is found during the RC period that is not also fixed during the RC period, then at the end of it we will have the final release. that will be Swift v2.0
19:30:24 <peluse_> zaitcev:  Cool, in the section called "Upgrading" or something like that
19:30:40 <peluse_> yes!
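To make the upgrade discussion above concrete, here is a hedged sketch of what a swift.conf with more than one storage policy might look like. Section and option names are reproduced as best recalled from the SP chain's docs and should be treated as assumptions; the exact names belong to the "Upgrading" notes peluse_ mentions.

```ini
# Hedged sketch of a swift.conf defining more than one storage policy.
# Per the discussion above, adding a second policy is the effective
# point of no return for downgrades; check the exact option names
# against the "Upgrading" section in the SP docs.

[swift-hash]
swift_hash_path_suffix = changeme

[storage-policy:0]
# Policy 0 is the legacy/default policy that existing objects map to.
name = gold
default = yes

[storage-policy:1]
# Adding a second policy is what enables the new code paths.
name = silver
```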
19:30:43 <notmyname> and I'm letting some upstream community, packagers, and marketing people know
19:30:57 <clayg> briancline: containers are going to be slower to fill up, at least in the pathological case - it may be hard to prove with all the object server and http connection overhead
19:31:03 <notmyname> unfortunately, as external-to-devs get involved, it puts more pressure on specific dates
19:31:21 <clayg> briancline: but if your benchmarking normally uses 100 containers - you might try it with only 10 - and get a before and after
19:31:30 <notmyname> all of this put together means an end-of-June release
19:31:44 <notmyname> any questions here? does this sound reasonable?
19:31:55 <peluse_> bueno
19:32:09 <briancline> clayg: works for me - I'll make a note for myself
19:32:20 <notmyname> peluse_: will y'all be testing at Intel? can I add your name to the "soft QA commitment" list?
19:32:41 <peluse_> we don't have production clusters but I planned on testing an upgrade on a real test cluster
19:32:52 <notmyname> ack
19:33:30 <notmyname> ok, so everyone review the SP patches, and when not looking at those, take a look at the ones listed at the bottom of https://wiki.openstack.org/wiki/Swift/PriorityReviews
19:33:51 <portante> ack
19:34:31 <notmyname> and, once again, thank you to everyone here. every time I see the Swift community come together, I'm struck by your awesomeness :-)
19:34:56 <peluse_> good luck tomorrow... say yes to morphine
19:35:06 <tdasilva> notmyname: good luck tomorrow
19:35:10 <notmyname> thanks
19:35:15 <portante> aw, com'on man, think wolverine!
19:35:23 <notmyname> as a note about the non-SP release, if there are other patches that need to be in the release, please add them to the bottom of that wiki page
19:35:24 <portante> did he have any morphine?
19:35:43 <peluse_> portante:  you're right I think he passed
19:35:47 <notmyname> briancline had one more topic he wanted to discuss
19:35:55 <notmyname> #topic container sync questions
19:35:58 <notmyname> briancline: you're up
19:36:29 <briancline> right, so this is just a quick meta-question or two on container sync -
19:37:25 <briancline> in reviewing some of the innards and the doc on it (http://docs.openstack.org/developer/swift/overview_container_sync.html), it isn't quite clear how it handles syncing objects whose replica 0 lives on a downed storage node
19:37:56 <briancline> there's a brief mention of balancing distribution of work but not missing work, but the latter part isn't covered much that I saw
19:38:25 <briancline> I've got a WIP for the multinode instructions and figured if I can get some clarity on it then perhaps I could submit a change to clarify these
19:39:01 <clayg> briancline: the second sync point watches will march up and sync all rows - but it expects to short circuit when everyone is doing their job
19:39:07 <notmyname> I know that the container sync processes weren't very scalable (unlike the expirer). what have you seen in your testing pandemicsyn?
19:39:32 <clayg> notmyname: well you get replica count guys per container
19:39:59 <clayg> so smaller less active containers sync more quicklyish than larger more active ones - and it's really mostly dominated by the weight
19:40:17 <notmyname> clayg: I mean the single-thread, single-process syncer that is transporting for the whole cluster. I don't recall if that was improved
19:40:25 <clayg> but there's no idea of "container-sync is running slow, i'll run more"
19:40:45 <briancline> clayg: ahh ok, so the secondary/tertiary nodes should detect this from SP2? if so, will they have the intelligence to distinguish between the replica 0 node being down versus it taking a long time to complete a prior sync?
19:40:46 <clayg> notmyname: well it is single process per container server...
19:40:51 <notmyname> clayg: ah ok
19:40:59 <clayg> briancline: what?
19:41:13 <clayg> briancline: you're talking specifically about container sync with storage policies?
19:41:21 <notmyname> sync point, I think :-)
19:41:22 <clayg> or just scaling container sync in general?
19:41:26 <briancline> no, container sync itself
19:41:28 <clayg> oh heheheheh
19:41:33 <notmyname> briancline: SP now means storage polices :-)
19:41:47 <briancline> oh, haha
19:41:52 <briancline> sorry, sync point 2 :)
19:42:08 <clayg> briancline: no they don't distinguish, everyone moves everything eventually
19:42:54 <clayg> briancline: but at first they only try mod replica count and then wait for the second pass before doing all rows and hope the other guys make that second sweep quick
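A rough sketch of the two-pass behaviour clayg describes: on the first pass each container replica only pushes the rows that map to it by "mod replica count", and on a later pass it sweeps every remaining row up to its earlier sync point, which covers the case where the replica that "owned" a row (e.g. the node holding replica 0) was down. Function and variable names below are illustrative assumptions, not Swift's actual container-sync code.

```python
# Illustrative sketch of the two-pass container sync row selection described
# above; names and details are assumptions, not the real implementation.

def rows_to_sync(rows, node_index, replica_count, sync_point1, sync_point2):
    """Yield the rows this container replica should push on one sync cycle.

    rows        -- (row_id, object_record) pairs from the container DB
    node_index  -- which replica of the container this node holds (0-based)
    sync_point1 -- highest row id this node already handled on its own pass
    sync_point2 -- highest row id known to be fully covered by all replicas
    """
    for row_id, record in rows:
        if row_id <= sync_point2:
            continue  # already covered by everyone
        if row_id <= sync_point1:
            # Catch-up pass: sweep every remaining row, in case the replica
            # that owned it was down or slow.
            yield row_id, record
        elif row_id % replica_count == node_index:
            # First pass: only push the rows assigned to this replica,
            # spreading the work roughly evenly across replicas.
            yield row_id, record


# Example: with 3 replicas, node 1 owns rows 1, 4, 7, ... on the first pass,
# but eventually pushes any stragglers on its catch-up pass.
rows = [(i, 'obj%d' % i) for i in range(10)]
print(list(rows_to_sync(rows, node_index=1, replica_count=3,
                        sync_point1=3, sync_point2=0)))
```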
19:43:25 <briancline> alright, that helps a good bit
19:43:33 <clayg> briancline: you might be surprised
19:43:41 <clayg> briancline: but it *sounds* good!
19:43:54 <clayg> briancline: and it works... which is always nice
19:43:57 <briancline> I only mean it helps my understanding ;-)
19:44:59 <notmyname> cool. I think other design/usage questions should come up in #openstack-swift
19:45:07 <clayg> heheheh
19:45:29 <notmyname> and Rackspace is looking at it too, so you might want to ping them on it as well. and I'm hoping all of it results in patches to make it better :-)
19:45:30 <briancline> so my current understanding is the contention point is we don't want to worry about a single coordination point that would solve some of the scaling issues, and that it's totally serial per container server, correct?
19:46:49 <briancline> if this is a bit too in the weeds I can take it offline
19:47:03 <notmyname> briancline: ya, I think it should be discussion in -swift
19:47:07 <notmyname> *discussed
19:47:32 <briancline> I mostly put it on the agenda since I've seen a lot of lonely souls ask about it without much input
19:47:35 <briancline> alright, cool
19:47:45 <notmyname> #topic other topics?
19:47:54 <notmyname> anything else to bring up as a group this week?
19:48:34 <creiht> howdy
19:48:35 <creiht> sorry
19:48:41 <notmyname> welcome :-)
19:48:46 <notmyname> creiht: we are just finishing up
19:48:51 <notmyname> actually, I think we're done
19:49:14 <creiht> perfect timing :)
19:49:14 <notmyname> thanks everyone for attending and participating
19:49:16 <clayg> creiht: i'll stay and talk with you if you want
19:49:19 <notmyname> #endmeeting