21:00:04 <notmyname> #startmeeting swift
21:00:04 <openstack> Meeting started Wed Jun  7 21:00:04 2017 UTC and is due to finish in 60 minutes.  The chair is notmyname. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:06 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:09 <openstack> The meeting name has been set to 'swift'
21:00:11 <notmyname> who's here for the swift team meeting?
21:00:16 <mattoliverau> o/
21:00:17 <timburke> o/
21:00:17 <m_kazuhiro> o/
21:00:18 <jrichli> hi
21:00:20 <rledisez> o/
21:00:21 <mathiasb> o/
21:00:29 <kota_> hello
21:00:52 <notmyname> acoles: tdasilva: clayg: ping
21:00:59 <acoles> notmyname: pong
21:01:00 <tdasilva> hello
21:01:18 <clayg> don't @ me
21:01:38 <notmyname> don't tell me what to do ;-)
21:01:43 <notmyname> welcome, everyone
21:02:07 <notmyname> agenda this week is at ...
21:02:08 <notmyname> #link https://wiki.openstack.org/wiki/Meetings/Swift
21:02:18 <notmyname> you'll see that I reorganized that page quite a bit
21:02:23 <notmyname> I hope it's easier to read now
21:02:35 <notmyname> also, OH MY there's a lot of stuff on it!
21:02:42 <tdasilva> it is until you add items to the other meeting
21:02:47 <notmyname> heh
21:03:15 <notmyname> reminder that our next 0700 meeting is june 14 (ie before the next 2100 meeting) and acoles will be leading it
21:04:03 <notmyname> let's see how quickly we can go through some stuff
21:04:15 <clayg> yeah let's go!
21:04:17 <clayg> go go go !
21:04:19 <notmyname> oh! I just saw an email on the mailing list from Erin Disney about the next PTG
21:04:25 <notmyname> has a registration link and hotel info
21:04:41 <timburke> http://lists.openstack.org/pipermail/openstack-dev/2017-June/118002.html
21:04:42 <notmyname> so be sure to read that
21:04:45 <notmyname> timburke: thanks
21:04:47 <clayg> already!?
21:04:51 <notmyname> #link http://lists.openstack.org/pipermail/openstack-dev/2017-June/118002.html
21:05:17 <notmyname> #topic follow-up from previous meeting
21:05:20 <notmyname> LSOF status
21:05:39 <rledisez> nothing new here, didn't have time to do the bench, just writing code
21:05:40 <notmyname> rledisez: you said you'd been swamped with other stuff and haven't been able to make much progress on this yet
21:05:44 <notmyname> ack
21:05:51 <notmyname> TC goals status
21:05:54 <rledisez> still on the todo list, hopefully for next week
21:06:01 <notmyname> acoles did a great job with the uwsgi one
21:06:02 <clayg> right on!
21:06:23 <notmyname> although the conversation sort of went in a "but do we really need this?" direction
21:06:43 <notmyname> however, it's documented, and as far as I can tell, we're done with that one
21:07:02 <notmyname> still no progress on the py3 docs
21:07:03 <mattoliverau> Nice
21:07:21 <notmyname> ok, `mount_check` for containers patch
21:07:24 <notmyname> #link https://review.openstack.org/#/c/466255/
21:07:25 <patchbot> patch 466255 - swift - Make mount_check option usable in containerized en...
21:07:49 <notmyname> we discussed it last week. clayg has +2'd it. zaitcev has a +1
21:07:57 <notmyname> hmm... cschwede?
21:07:57 <tdasilva> i think i signed up to review that, but have not had a chance yet, will review tomorrow
21:08:02 <notmyname> ack
21:08:04 <notmyname> thanks
21:08:14 <notmyname> ah, yes. your name is on it in gerrit :-)
21:08:30 <notmyname> and deprecating known-bad EC config
21:08:32 <notmyname> #link https://review.openstack.org/#/c/468105/
21:08:32 <patchbot> patch 468105 - swift - Require that known-bad EC schemes be deprecated
21:08:58 <notmyname> clayg: you have a comment on it saying someone should send an email to the ops mailing list
21:09:02 <notmyname> clayg: could that someone be you?
21:09:03 <clayg> notmyname: yeah you should do that
21:09:11 <tdasilva> lol
21:09:12 <notmyname> "someone" ;-)
21:09:13 <clayg> I can barely spell EC
21:09:38 <notmyname> ok ok. I'll do it
21:09:41 <clayg> we don't have to send jack - we can just merge it
21:09:48 <clayg> I also don't really know who reads the ops ML?
21:09:50 <notmyname> well, that should happen too :-)
21:09:55 <tdasilva> jack might not want to go
21:10:05 <notmyname> #action notmyname to send email to ops ML about https://review.openstack.org/#/c/468105/
21:10:06 <patchbot> patch 468105 - swift - Require that known-bad EC schemes be deprecated
21:10:22 <notmyname> we shouldn't wait for a ML post to land that patch, though
21:10:39 <notmyname> tdasilva: your name is on it in gerrit too. are you planning on reviewing?
21:10:47 <clayg> either way - I *think* we landed on we want to make proxies not start in at least some conditions - which is pretty ballsy IMHO - maybe it's a great thing - for our product it means a bunch of work before we can safely upgrade to a version of swift with that behavior - i have no idea what it means for other people
21:11:32 <notmyname> IIRC the "some conditions" were when you had the bad EC policy as the only policy
21:11:45 <notmyname> otherwise, auto-set the bad config to deprecated, right?
21:11:46 <clayg> i'm really pretty sure the code is top shelf (duh, timburke wrote it) - the question is really about what is the process for the canonical openstack never-breaks-compat project when they want to break compat?
21:12:38 <notmyname> it's not an API compat breakage. it's a "stop slowly deleting data when you have a certain config" change. with a strong signal to move your data to a different policy
21:12:46 <clayg> notmyname: I guess?  is that what timburke implemented?  that was one suggestion... is it *really* any better to automatically hide a policy and let you continue to ignore the error?  Does it mean any less work for someone that wants to start to consume this version of swift - but *somehow* still doesn't know they have these bad policies and isn't dealing with them?
21:13:44 <notmyname> timburke: I think that's questions best answered by you
21:13:46 <clayg> notmyname: data is only corrupted if you have a sufficiently old liberasurecode - which AFAIK isn't really part of the warning
21:13:59 <clayg> yeah timburke why are you doing this?  put your head back in the sand?
21:14:16 <timburke> clayg: i like turning over rocks?
21:14:16 <clayg> j/k
21:14:21 <clayg> lol!
21:14:23 <notmyname> also, this would be a great place for input from kota_ cschwede acoles mattoliverau tdasilva jrichli (all the cores)
21:14:40 <notmyname> rledisez: what do you want to happen in this case?
21:15:35 <timburke> so the change as it currently stands is to stop services (all of them!) from starting if you have a known-bad policy that hasn't been deprecated
21:15:36 <rledisez> i would prefer the process not to start. at least it's obvious something must be changed in the config, i can't miss it. if the policy is auto-set to deprecated, an operator might miss it and then get called because "i can't create containers anymore"
21:16:18 <clayg> easier to catch in staging - i like it
21:16:25 <acoles> clayg: are you saying that isa_l_rs_vand with parity>5 is ok if you have recent libec?
21:16:35 <clayg> acoles: nope
21:16:41 <clayg> acoles: just that it doesn't corrupt data
21:16:47 <timburke> just that we won't still silently corrupt data
21:17:03 * acoles confused
21:17:06 <clayg> acoles: it might still 503 - we could also add more code to make it 503 less-ish if you can manage to get enough frags to get yourself out of a bad erase list
21:17:37 <acoles> OIC, we don't silently corrupt it, we refuse to decode it?
21:17:54 <timburke> yeah
21:18:07 <clayg> acoles: but I like don't start if you have these policies - i just have to write a bunch of code to make sure it doesn't come to that - and we don't really offer cluster-assisted-policy-migration yet - so if someone gets the "if you upgrade swift you are boned" message - there's no clear path to #winning
21:18:19 <clayg> you could deprecate it and continue to ignore it i guess?
21:18:46 <mattoliverau> Not starting services seems like something that would annoy people, but it seems the lesser of 2 evils. I mean we can log, but losing data if you miss a log entry is kinda scary
21:19:07 * jungleboyj sneaks in late
21:19:28 <acoles> let's not do this in a release that you also cannot roll back from
21:19:50 <clayg> do we support rollback sometimes!
21:19:54 <notmyname> acoles: yeah, that's a scary part because we've never really done anything around supporting rollbacks
21:20:14 <notmyname> although most of the time it should work. probably.
21:20:17 <kota_> iirc, the newer pyeclib/liberasurecode refuses to decode in that bad case
21:20:39 <acoles> no, but we have had releases where we warned that rollback would be really hard, e.g. changing the hashes.pkl format
21:21:02 <acoles> I guess if you never start the new release you're ok
21:21:02 <timburke> if you really want to continue with the policy un-deprecated, you could always go in and comment out the raise
21:21:10 <clayg> ugh, i'm bored - this sucks - anyone that has these policies is screwed - but what can we do?  Does this patch actually *help* someone?  I will punch you in the face until you check this checkbox - BAM do you like that!? <checks checkbox> - warning: you are still hosed
21:21:34 <kota_> so operators can deploy the policy but sometimes they will see decode errors in the reconstructor logs or proxy logs on GET requests
21:22:10 <kota_> note that this is only if the operators use a newer version of pyeclib/liberasurecode
21:22:11 <clayg> let's just ignore this and work on cluster-assisted-policy-migration?
21:22:33 <notmyname> clayg: right, so it's what mattoliverau said. this one is less bad than ignoring it
21:23:24 <clayg> ok cool
21:23:49 <acoles> timburke: but if you go in, comment out the raise, and find you can read your data, you then feel mad at the devs
21:24:13 <notmyname> I'll send the ML message, and this patch should be reviewed and merged when we like it and operators will be in a better place
21:24:28 <notmyname> acoles: if you're editing the code on your own in prod, you're already off the reservation :-)
21:24:39 <mattoliverau> +1
21:24:48 <acoles> notmyname: wasn't my suggestion
21:24:53 <notmyname> :-)
21:25:06 <notmyname> ok, let's move on. we've at least got a little bit of a plan for it
21:25:06 <timburke> acoles: you'll find you can read your data just by deprecating the policy, too
21:25:20 <acoles> timburke: not if it is your only policy
21:25:33 <acoles> so, yeah, go make another policy. I know.
21:25:40 <tdasilva> so did we agree the current patch is ok?
21:25:52 <clayg> EVERYTHING IS AWESOME - PAINT IT BLUE!
21:26:09 <notmyname> it's not a bikeshed argument!
21:26:22 <notmyname> but yeah, I haven't heard any other paint colors being proposed ;-)
21:26:54 * acoles needs to find power cord, bbiab
21:26:57 <clayg> i suggested we still have the nuclear-reactor^Wcluster-assisted-policy-migration that would be a great *solution* to these policies being terrible?
21:27:58 <notmyname> yes, that would be awesome. but when? m_kazuhiro's had patches to help with that up for a long time
21:27:59 <timburke> so get behind https://review.openstack.org/#/c/173580/ or https://review.openstack.org/#/c/209329/ first?
21:28:00 <patchbot> patch 173580 - swift - wip: Cluster assisted Storage Policy Migration
21:28:01 <patchbot> patch 209329 - swift - WIP: Changing Policies
21:28:20 <mattoliverau> clayg: yeah, that's a feature that would make this mute more of a moot point.
21:29:07 <notmyname> for the time being (ie until we have the cluster assisted policy migration) we've got timburke's patch to not start unless the bad config is deprecated
21:29:44 <clayg> win!
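For context: "deprecating" the known-bad scheme is a one-line change in swift.conf. A minimal sketch of what such a policy section might look like once deprecated (the policy index, name, and fragment counts below are made up for illustration; isa_l_rs_vand with more than 5 parity fragments is the scheme under discussion):

```
# swift.conf -- illustrative sketch only; index, name, and counts are hypothetical
[storage-policy:2]
name = ec-10-6
policy_type = erasure_coding
ec_type = isa_l_rs_vand
ec_num_data_fragments = 10
ec_num_parity_fragments = 6
# with the patch discussed above, services refuse to start unless this is set
deprecated = yes
```

Note that a deprecated policy cannot be the default policy, so a cluster whose only policy is the bad one also needs to add (and default to) a replacement policy, which relates to acoles' "not if it is your only policy" point above.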
21:30:41 <notmyname> ok, let's move on
21:31:16 <notmyname> looking at nicks that just netsplit, I don't actually think anyone but torgomatic was here in this meeting (and he's already rejoined--if he was doing more than lurking)
21:31:22 <notmyname> #topic discuss: follow-up vs patch overwrite. what's the better pattern?
21:31:51 <notmyname> recently I've noticed us, as a group, doing a lot more of "oh this patch looks good, but..." and proposing a small follow-on patch
21:32:14 <acoles> notmyname: both. if the patch can be landed now, follow up; if it needs revisions anyway, overwrite
21:32:19 <notmyname> maybe that's great. maybe it's better to give the original author diffs or push over the current patch set
21:32:37 <acoles> unless follow up is so trivial, then overwrite and land
21:33:04 <mattoliverau> s/mute/much
21:33:05 <mattoliverau> Arrg phone autocomplete :p
21:33:24 <notmyname> it's something that seems to be a relatively recent change (at least in frequency), so i wanted to address it and either make sure we're all ok with it or know if people are worried by it (or want something different)
21:33:41 <notmyname> so what are your thoughts?
21:34:06 <notmyname> acoles: seems reasonable (ie "use your judgement") as long as we don't end up getting tests and docs in the follow-up patches?
21:34:25 <acoles> notmyname: agree
21:34:42 <notmyname> what do other people thing?
21:34:45 <notmyname> *think
21:34:54 * acoles wonders if notmyname is now going to tell me I have just done exactly that?
21:35:51 <mattoliverau> Yeah what acoles said, but also follow up if it's a change that's building upon the last one but a little out of scope.
21:36:37 <mattoliverau> I.e., critical bug (stop the bleeding) first, then maybe improve
21:36:42 <notmyname> heh, so I can see we're not likely to get some "this is good" or "this is bad" sort of comment. but i at least wanted to point it out as a thing that's happening
21:36:57 <notmyname> so... pay attention! :-)
21:37:07 <clayg> notmyname: please clarify, you observed -> doing a lot more of "oh this patch looks good, but..." as opposed to giving the original author diffs or pushing over the current patch set
21:37:08 <acoles> I sometimes feel we could use commit subject line for more useful words than 'follow up for...' and just use a Related-Change tag in the commit msg body
21:37:27 <rledisez> i also feel "trust your judgement" is the best choice. i like the follow-up when you start a patch and, the more you dig, the more you want to change things because the new way to do it is different than years ago. so splitting things like the bugfix vs the code improvement (in a follow-up) seems reasonable. also, it helps separate what everybody agrees on from what is up for debate
21:37:33 <notmyname> clayg: yes, correct
21:37:41 <clayg> of "give the original author diffs" or "push over the current patch set" which do you think was more common before you observed this change?
21:37:53 <jrichli> I have seen two different types of "follow-up" patches.  1) something more that can be done after the original merges 2) a suggested change, intended to be rolled into original, then abandoned
21:38:47 <notmyname> I think previously we waited more for the original author to write something (and occasionally gave a diff) -- at least for frequent contributors.
21:39:13 <jungleboyj> jrichli:  That second approach sounds dangerous.
21:39:33 <acoles> notmyname: but, but... you told us we needed to merge more stuff faster ;)
21:39:34 <jungleboyj> In that case it would be better to work with the original author to fix it or push up a new patch.
21:39:54 <clayg> notmyname: thanks for clarifying;  ok, I think trying to approve changes and address additional improvements in follow-up patches is a categorical improvement and something we did on purpose based on community review sessions that you've led at summits and stuff
21:40:07 <notmyname> acoles: I don't think the current situation is necessarily bad
21:40:19 <jungleboyj> Policy I have always followed is to push up a dependent patch unless it is something really minor that isn't worth waiting for the other person to fix, then push up a new patch.
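For reference, the two workflows being contrasted here look roughly like this with git-review (a sketch: the change number is reused from the patch above purely as a placeholder, the commit message is invented, and the Related-Change tag is the convention acoles mentions above):

```
# Option 1: follow-up / dependent patch -- leave the original change
# alone and propose the extra work as a new change on top of it.
git review -d 466255        # download the original change from gerrit
# ...make the additional edits...
git commit -a -m "Follow-up: tighten error handling

Related-Change: <Change-Id of the original patch>"
git review                  # uploads a new, separate change

# Option 2: overwrite -- amend the author's commit directly, which
# uploads a new patch set to the *same* gerrit change.
git review -d 466255
# ...make the edits...
git commit -a --amend       # keep the existing Change-Id footer
git review
```

Either way gerrit keeps the original author's work visible in the change history; the practical difference is whether the original patch can merge independently of the extra work.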
21:40:28 <notmyname> eg https://review.openstack.org/#/c/468011/3 and https://review.openstack.org/#/c/465184/
21:40:29 <patchbot> patch 468011 - swift - Update Global EC docs with reference to composite ...
21:40:30 <patchbot> patch 465184 - swift - Ring doc cleanups
21:41:21 <clayg> don't use docs as the example - doc patches are terrible - they're so subjective
21:41:24 <timburke> https://review.openstack.org/#/c/466952/ maybe fell on the other side of that line
21:41:24 <patchbot> patch 466952 - swift - Follow-up for per-policy proxy configs (MERGED)
21:41:46 <notmyname> they were just easy ones I knew were proposed as follow-on that I had quick access to
21:41:58 <timburke> though parts of it were definitely out of scope for the original
21:42:04 <clayg> notmyname: i know i was just whining
21:42:44 <notmyname> I think landing more stuff faster is better. and it seems like this pattern has been helping in that regard
21:42:49 <mattoliverau> jrichli's second approach is how we're working through the sharding patch. Seems to work as we can break down the large patch and then merge it back in.
21:43:09 <acoles> notmyname: I'm not sure I see them as follow-ups. The ring doc cleanup is standalone, it just may have been prompted by timburke reviewing another change??
21:43:11 <notmyname> but since it is a change, then I want to make sure people are feeling ok with it now that it's been happening for a bit
21:43:44 <clayg> wfm
21:44:22 <notmyname> kota_: tdasilva: are you concerned with the current pattern of using minor follow-on patches?
21:44:53 <kota_> hmm
21:45:23 <kota_> I'm not feeling so bad on the current follow up use cases right now
21:45:29 <notmyname> great :-)
21:45:34 <tdasilva> notmyname: no concern from me. I've noticed different people do different things...hasn't really bothered me
21:45:44 <notmyname> great :-)
21:45:56 <timburke> i definitely find myself more likely to take that second approach -- otherwise i'd worry that by pushing over a patch and +2ing it, the original author may not get a chance to protest before someone else comes along for the +A
21:46:16 <notmyname> ok, if someone *does* have comments, please feel free to say something, publicly or privately
21:46:21 <notmyname> ok, let's move on
21:46:32 <notmyname> #topic priority items
21:46:38 <notmyname> THERE'S SO MUCH STUFF TO DOOOO!!
21:46:57 <notmyname> ok, we're not going to be able to talk about all the stuff on https://wiki.openstack.org/wiki/Meetings/Swift
21:47:13 <notmyname> I think I'll take this section and copy it over to the priority reviews page :-)
21:47:36 <notmyname> but I do want to mention a few of the high priority bugs and some small follow-on things
21:47:41 <notmyname> https://bugs.launchpad.net/swift/+bug/1568650
21:47:43 <openstack> Launchpad bug 1568650 in OpenStack Object Storage (swift) "Connection between client and proxy service does not closes" [High,Confirmed] - Assigned to drax (devesh-gupta)
21:47:53 <notmyname> related to https://bugs.launchpad.net/swift/+bug/1572719
21:47:54 <openstack> Launchpad bug 1572719 in OpenStack Object Storage (swift) "Client may hold socket open after ChunkWriteTimeout" [High,Confirmed]
21:48:09 <notmyname> high priority bugs, open for a while, no current patches for them
21:48:53 <notmyname> is there someone who can look into them this week? if not to write a full patch, at least to outline what needs to be done to help someone who *can* write a patch
21:48:57 <clayg> yeah... stupid connection close bugs... timburke found another one
21:49:02 <notmyname> yep
21:49:28 <notmyname> timburke: you seem to be on this topic lately. can you check these?
21:49:39 <clayg> whoa!  poor timburke :'(
21:50:00 <clayg> i was going to just suggest we move them to medium and wait for them to bite us?
21:50:14 <notmyname> maybe that's the best
21:50:17 <timburke> i suppose i can try... need to go digging into eventlet anyway...
21:51:11 <notmyname> right, I mean keep these in mind as you're doing the investigation on the new bug you found
21:51:33 <notmyname> thanks
21:51:49 <notmyname> ok, next: DB replicators lose metadata
21:51:54 <notmyname> https://review.openstack.org/#/c/302494/
21:51:54 <patchbot> patch 302494 - swift - Sync metadata in 'rsync_then_merge' in db_replicator
21:51:58 <notmyname> https://bugs.launchpad.net/swift/+bug/1570118
21:52:00 <openstack> Launchpad bug 1570118 in OpenStack Object Storage (swift) "Metadata can be missing by account/container replication" [High,In progress] - Assigned to Daisuke Morita (morita-daisuke)
21:52:09 <kota_> I added +2 already?
21:52:16 <notmyname> kota_: you did! thanks!
21:52:29 <clayg> i'ma look at that
21:52:34 <notmyname> clayg: thanks!
21:52:36 <clayg> werd
21:52:52 <notmyname> ok, we already mentioned the EC docs updates...
21:52:59 <notmyname> now https://bugs.launchpad.net/swift/+bug/1652323
21:53:01 <openstack> Launchpad bug 1652323 in OpenStack Object Storage (swift) "ssync syncs an expired object as a tombstone, probe test_object_expirer fails" [Medium,Confirmed]
21:53:03 <kota_> timburke: fyi -> https://review.openstack.org/#/c/471613/
21:53:03 <patchbot> patch 471613 - swift - Use config_number method instead of node_id + 1
21:53:18 <kota_> related on the db_replicator's probe
21:53:19 <timburke> kota_: i saw; thanks
21:53:51 <notmyname> acoles: on that bug, anything to do right now? it's listed as medium
21:54:47 <notmyname> the title mentions the probe tests failing, but the comments indicate something more serious (ssync failing to replicate data)
21:54:56 <notmyname> which is why I mention it here today
21:55:06 <acoles> notmyname: I'm not sure how best to close that one. I discussed it with rledisez this week - do we do a lot of work to be able to reconstruct an expired frag, or just ignore that it is missing ... cos it has expired anyway??
21:55:14 <notmyname> rledisez: is this something your devs at OVH can help with?
21:55:26 <acoles> notmyname: we fixed the probe test with a workaround
21:55:35 <notmyname> ah, ok
21:55:46 <acoles> but we didn't fix ssync :/
21:55:58 <notmyname> ok, so we need to update this bug or get a different bug to track it
21:56:10 <notmyname> which do you think is best?
21:56:15 <rledisez> notmyname: right now no, but i want this fixed, so i might take some time to look into it, but i still don't know what would be the best way
21:56:18 <acoles> the recent change rledisez made so we can open an expired diskfile helps, but doesn't get us everything we need to fix this one
21:56:32 <acoles> notmyname: I'll update the bug
21:56:42 <notmyname> acoles: perfect. thanks
21:56:58 <notmyname> https://review.openstack.org/#/c/439572/
21:56:59 <patchbot> patch 439572 - swift - Limit number of revert tombstone SSYNC requests
21:57:07 <notmyname> for https://bugs.launchpad.net/swift/+bug/1668857
21:57:08 <openstack> Launchpad bug 1668857 in OpenStack Object Storage (swift) "EC reconstructor revert tombstone SSYNCs with too many primaries" [Medium,In progress] - Assigned to Mahati Chamarthy (mahati-chamarthy)
21:57:14 <acoles> any other opinions - missing frag, but expired, can we just leave it missing??
21:57:32 <tdasilva> acoles: what's the cons to that?
21:57:36 <notmyname> kota_: on this patch, you've got a +1
21:57:57 <acoles> tdasilva: inconsistent state, for a while
21:58:04 <notmyname> kota_: will you get a chance to revisit it?
21:58:06 <kota_> notmyname: yup, i reviewed it again yesterday but i'm still at +1, not +2 yet
21:58:26 <kota_> still want to collect clay's opinion
21:58:28 <notmyname> kota_: ok. what do you want to see before you get to a +2? what's missing?
21:58:33 <notmyname> oh, clayg is already +2 ;-)
21:58:45 <kota_> the diff between clay's description and the current code
21:58:59 <kota_> for the number of primary nodes to push the tombstone to
21:59:33 <clayg> kota_: is it > 3 and < replicas?  I will +2 it so hard!
21:59:34 <kota_> i added my thoughts yesterday so now I'm waiting on the ack. I was just thinking to poke clayg after this meeting
21:59:45 <notmyname> ok
21:59:45 <kota_> it's about the non-duplicated case
21:59:53 <kota_> to clayg
22:00:27 <clayg> k, we can chat
22:00:35 <notmyname> kota_: clayg: thank you
22:00:59 <notmyname> ok, we're at time, but please check the other patches linked on the meeting agenda (and i'll move them to the priority reviews page as well)
22:01:07 <notmyname> thank you, everyone, for your work on swift
22:01:15 <notmyname> and thanks for coming today!
22:01:16 <jungleboyj> Thanks.
22:01:17 <notmyname> #endmeeting