21:00:04 <notmyname> #startmeeting swift
21:00:04 <openstack> Meeting started Wed Jun 7 21:00:04 2017 UTC and is due to finish in 60 minutes. The chair is notmyname. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:06 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:09 <openstack> The meeting name has been set to 'swift'
21:00:11 <notmyname> who's here for the swift team meeting?
21:00:16 <mattoliverau> o/
21:00:17 <timburke> o/
21:00:17 <m_kazuhiro> o/
21:00:18 <jrichli> hi
21:00:20 <rledisez> o/
21:00:21 <mathiasb> o/
21:00:29 <kota_> hello
21:00:52 <notmyname> acoles: tdasilva: clayg: ping
21:00:59 <acoles> notmyname: pong
21:01:00 <tdasilva> hello
21:01:18 <clayg> don't @ me
21:01:38 <notmyname> don't tell me what to do ;-)
21:01:43 <notmyname> welcome, everyone
21:02:07 <notmyname> agenda this week is at ...
21:02:08 <notmyname> #link https://wiki.openstack.org/wiki/Meetings/Swift
21:02:18 <notmyname> you'll see that I reorganized that page quite a bit
21:02:23 <notmyname> I hope it's easier to read now
21:02:35 <notmyname> also, OH MY there's a lot of stuff on it!
21:02:42 <tdasilva> it is until you add items to the other meeting
21:02:47 <notmyname> heh
21:03:15 <notmyname> reminder that our next 0700 meeting is june 14 (ie before the next 2100 meeting) and acoles will be leading it
21:04:03 <notmyname> let's see how quickly we can go through some stuff
21:04:15 <clayg> yeah let's go!
21:04:17 <clayg> go go go !
21:04:19 <notmyname> oh! I just saw an email on the mailing list from Erin Disney about the next PTG
21:04:25 <notmyname> has a registration link and hotel info
21:04:41 <timburke> http://lists.openstack.org/pipermail/openstack-dev/2017-June/118002.html
21:04:42 <notmyname> so be sure to read that
21:04:45 <notmyname> timburke: thanks
21:04:47 <clayg> already!?
21:04:51 <notmyname> #link http://lists.openstack.org/pipermail/openstack-dev/2017-June/118002.html
21:05:17 <notmyname> #topic follow-up from previous meeting
21:05:20 <notmyname> LSOF status
21:05:39 <rledisez> nothing new here, didn't have time to do the bench, just writing code
21:05:40 <notmyname> rledisez: you said you'd been swamped with other stuff and haven't been able to make much progress on this yet
21:05:44 <notmyname> ack
21:05:51 <notmyname> TC goals status
21:05:54 <rledisez> still on the todo list, hopefully for next week
21:06:01 <notmyname> acoles did a great job with the uwsgi one
21:06:02 <clayg> right on!
21:06:23 <notmyname> although the conversation sort of went in a "but do we really need this?" direction
21:06:43 <notmyname> however, it's documented, and as far as I can tell, we're done with that one
21:07:02 <notmyname> still no progress on the py3 docs
21:07:03 <mattoliverau> Nice
21:07:21 <notmyname> ok, `mount_check` for containers patch
21:07:24 <notmyname> #link https://review.openstack.org/#/c/466255/
21:07:25 <patchbot> patch 466255 - swift - Make mount_check option usable in containerized en...
21:07:49 <notmyname> we discussed it last week. clayg has +2'd it. zaitcev has a +1
21:07:57 <notmyname> hmm... cschwede?
21:07:57 <tdasilva> i think i signed up to review that, but have not had a chance yet, will review tomorrow
21:08:02 <notmyname> ack
21:08:04 <notmyname> thanks
21:08:14 <notmyname> ah, yes. your name is on it in gerrit :-)
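[editor's note: for context, `mount_check` (the option the patch above is about) is a [DEFAULT] setting in the account-, container-, and object-server configs that tells Swift to use only devices that are real mount points. A minimal, illustrative sketch of turning it off, as is common in containerized deployments where devices are bind-mounted rather than separately mounted; this does not show the behavior added by patch 466255:]

    # container-server.conf -- illustrative sketch only
    [DEFAULT]
    # when true (the default), any device under the devices root (/srv/node)
    # that is not an actual mount point is skipped rather than used
    mount_check = false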
21:08:30 <notmyname> and deprecating known-bad EC config
21:08:32 <notmyname> #link https://review.openstack.org/#/c/468105/
21:08:32 <patchbot> patch 468105 - swift - Require that known-bad EC schemes be deprecated
21:08:58 <notmyname> clayg: you have a comment on it saying someone should send an email to the ops mailing list
21:09:02 <notmyname> clayg: could that someone be you?
21:09:03 <clayg> notmyname: yeah you should do that
21:09:11 <tdasilva> lol
21:09:12 <notmyname> "someone" ;-)
21:09:13 <clayg> I can barely spell EC
21:09:38 <notmyname> ok ok. I'll do it
21:09:41 <clayg> we don't have to send jack - we can just merge it
21:09:48 <clayg> I also don't really know who reads the ops ML?
21:09:50 <notmyname> well, that should happen too :-)
21:09:55 <tdasilva> jack might not want to go
21:10:05 <notmyname> #action notmyname to send email to ops ML about https://review.openstack.org/#/c/468105/
21:10:06 <patchbot> patch 468105 - swift - Require that known-bad EC schemes be deprecated
21:10:22 <notmyname> we shouldn't wait for a ML post to land that patch, though
21:10:39 <notmyname> tdasilva: your name is on it in gerrit too. are you planning on reviewing?
21:10:47 <clayg> either way - I *think* we landed on wanting to make proxies not start in at least some conditions - which is pretty ballsy IMHO - maybe it's a great thing - for our product it means a bunch of work before we can safely upgrade to a version of swift with that behavior - i have no idea what it means for other people
21:11:32 <notmyname> IIRC the "some conditions" were when you had the bad EC policy as the only policy
21:11:45 <notmyname> otherwise, auto-set the bad config to deprecated, right?
21:11:46 <clayg> i'm really pretty sure the code is top shelf (duh, timburke wrote it) - the question is really about what is the process for the canonical openstack never-breaks-compat project when they want to break compat?
21:12:38 <notmyname> it's not an API compat breakage. it's a "stop slowly deleting data when you have a certain config" change. with a strong signal to move your data to a different policy
21:12:46 <clayg> notmyname: I guess? is that what timburke implemented? that was one suggestion... is it *really* any better to automatically hide a policy and let you continue to ignore the error? Does it mean any less work for someone that wants to start to consume this version of swift - but *somehow* still doesn't know they have these bad policies and isn't dealing with them?
21:13:44 <notmyname> timburke: I think those are questions best answered by you
21:13:46 <clayg> notmyname: data is only corrupted if you have a sufficiently old liberasurecode - which AFAIK isn't really part of the warning
21:13:59 <clayg> yeah timburke why are you doing this? put your head back in the sand?
21:14:16 <timburke> clayg: i like turning over rocks?
21:14:16 <clayg> j/k
21:14:21 <clayg> lol!
21:14:23 <notmyname> also, this would be a great place for input from kota_ cschwede acoles mattoliverau tdasilva jrichli (all the cores)
21:14:40 <notmyname> rledisez: what do you want to happen in this case?
21:15:35 <timburke> so the change as it currently stands is to stop services (all of them!) from starting if you have a known-bad policy that hasn't been deprecated
21:15:36 <rledisez> i would prefer the process not to start. at least it's obvious something must be changed in config, i can't miss it. if a policy is auto-set deprecated, an operator might miss it and then get called because "i can't create containers anymore"
21:16:18 <clayg> easier to catch in staging - i like it
21:16:25 <acoles> clayg: are you saying that isa_l_rs_vand with parity>5 is ok if you have recent libec?
21:16:35 <clayg> acoles: nope
21:16:41 <clayg> acoles: just that it doesn't corrupt data
21:16:47 <timburke> just that we won't still silently corrupt data
21:17:03 * acoles confused
21:17:06 <clayg> acoles: it might still 503 - we could also add more code to make it 503 less-ish if you can manage to get enough frags to get yourself out of a bad erase list
21:17:37 <acoles> OIC, we don't corrupt it noisily, we refuse to decode it?
21:17:54 <timburke> yeah
21:18:07 <clayg> acoles: but I like don't start if you have these policies - i just have to write a bunch of code to make sure it doesn't come to that - and we don't really offer cluster-assisted-policy-migration yet - so if someone gets the "if you upgrade swift you are boned" message - there's no clear path to #winning
21:18:19 <clayg> you could deprecate it and continue to ignore it i guess?
21:18:46 <mattoliverau> Not starting services seems like something that would annoy people, but it seems the lesser of 2 evils. I mean we can log, but losing data if you miss a log entry is kinda scary
21:19:07 * jungleboyj sneaks in late
21:19:28 <acoles> let's not do this in a release that you also cannot roll back from
21:19:50 <clayg> do we support rollback sometimes!
21:19:54 <notmyname> acoles: yeah, that's a scary part because we've never really done anything around supporting rollbacks
21:20:14 <notmyname> although most of the time it should work. probably.
21:20:17 <kota_> iirc, the newer pyeclib/liberasurecode stops decoding in that bad case
21:20:39 <acoles> no but we have had releases where we warn that rollback would be really hard e.g. changing hashes.pkl format
21:21:02 <acoles> I guess if you never start the new release you're ok
21:21:02 <timburke> if you really want to continue with the policy un-deprecated, you could always go in and comment out the raise
21:21:10 <clayg> ugh, i'm bored - this sucks - anyone that has these policies is screwed - but what can we do? Does this patch actually *help* someone? I will punch you in the face until you check this checkbox - BAM do you like that!? <checks checkbox> - warning: you are still hosed
21:21:34 <kota_> so operators can deploy the policy but sometimes they will see decode errors in the reconstructor logs or proxy logs on GET requests
21:22:10 <kota_> note that that is if the operators use a newer version of pyeclib/liberasurecode
21:22:11 <clayg> let's just ignore this and work on cluster-assisted-policy-migration?
21:22:33 <notmyname> clayg: right, so it's what mattoliverau said. this one is less bad than ignoring it
21:23:24 <clayg> ok cool
21:23:49 <acoles> timburke: but if you go in, comment out the raise, and find you can read your data, you then feel mad at devs
21:24:13 <notmyname> I'll send the ML message, and this patch should be reviewed and merged when we like it and operators will be in a better place
21:24:28 <notmyname> acoles: if you're editing the code on your own in prod, you're already off the reservation :-)
21:24:39 <mattoliverau> +1
21:24:48 <acoles> notmyname: wasn't my suggestion
21:24:53 <notmyname> :-)
21:25:06 <notmyname> ok, let's move on. we've at least got a little bit of a plan for it
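[editor's note: "deprecating" a policy here means setting the `deprecated` flag on the storage policy in swift.conf; existing data in a deprecated policy stays readable, but new containers can no longer be created in it, which is the "i can't create containers anymore" situation rledisez describes above. A hedged sketch of what that looks like for the known-bad scheme being discussed; the policy index, name, and fragment counts below are made up for illustration:]

    # swift.conf -- illustrative sketch only
    [storage-policy:2]
    name = ec-bad-example            # hypothetical policy name
    policy_type = erasure_coding
    ec_type = isa_l_rs_vand          # the scheme discussed above
    ec_num_data_fragments = 10       # example value
    ec_num_parity_fragments = 6      # parity > 5 is the known-bad combination
    deprecated = yes                 # keeps data readable, blocks new containers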
21:25:06 <timburke> acoles: you'll find you can read your data just by deprecating the policy, too
21:25:20 <acoles> timburke: not if it is your only policy
21:25:33 <acoles> so, yeah, go make another policy. I know.
21:25:40 <tdasilva> so did we agree the current patch is ok?
21:25:52 <clayg> EVERYTHING IS AWESOME - PAINT IT BLUE!
21:26:09 <notmyname> it's not a bikeshed argument!
21:26:22 <notmyname> but yeah, I haven't heard any other paint colors being proposed ;-)
21:26:54 * acoles needs to find power cord, bbiab
21:26:57 <clayg> i suggested we still have the nuclear-reactor^Wcluster-assisted-policy-migration that would be a great *solution* to these policies being terrible?
21:27:58 <notmyname> yes, that would be awesome. but when? m_kazuhiro's had patches to help with that up for a long time
21:27:59 <timburke> so get behind https://review.openstack.org/#/c/173580/ or https://review.openstack.org/#/c/209329/ first?
21:28:00 <patchbot> patch 173580 - swift - wip: Cluster assisted Storage Policy Migration
21:28:01 <patchbot> patch 209329 - swift - WIP: Changing Policies
21:28:20 <mattoliverau> clayg: yeah, that's a feature that would make this mute more of a moot point.
21:29:07 <notmyname> for the time being (ie until we have the cluster assisted policy migration) we've got timburke's patch to not start unless the bad config is deprecated
21:29:44 <clayg> win!
21:30:41 <notmyname> ok, let's move on
21:31:16 <notmyname> looking at nicks that just netsplit, I don't actually think anyone but torgomatic was here in this meeting (and he's already rejoined--if he was doing more than lurking)
21:31:22 <notmyname> #topic discuss: follow-up vs patch overwrite. what's the better pattern?
21:31:51 <notmyname> recently I've noticed us, as a group, doing a lot more of "oh this patch looks good, but..." and proposing a small follow-on patch
21:32:14 <acoles> notmyname: both. if the patch can be landed now, follow-up; if it needs revisions anyway, overwrite
21:32:19 <notmyname> maybe that's great. maybe it's better to give the original author diffs or push over the current patch set
21:32:37 <acoles> unless the follow up is so trivial, then overwrite and land
21:33:04 <mattoliverau> so/mute/much
21:33:05 <mattoliverau> Arrg phone autocomplete :p
21:33:24 <notmyname> it's something that seems to be a relatively recent change (at least in frequency), so i wanted to address it and either make sure we're all ok with it or know if people are worried by it (or want something different)
21:33:41 <notmyname> so what are your thoughts?
21:34:06 <notmyname> acoles: seems reasonable (ie "use your judgement") as long as we don't end up getting tests and docs in the follow-up patches?
21:34:25 <acoles> notmyname: agree
21:34:42 <notmyname> what do other people thing?
21:34:45 <notmyname> *think
21:34:54 * acoles wonders if notmyname is now going to tell I have just done exactly that?
21:35:51 <mattoliverau> Yeah what acoles said, but also follow up if it's a change that's building upon the last one but a little out of scope.
21:36:37 <mattoliverau> Ie, Critical bug (stop bleeding) then maybe improve
21:36:42 <notmyname> heh, so I can see we're not likely to get some "this is good" or "this is bad" sort of comment. but i at least wanted to raise it as a way to point it out as a thing that's happening
21:36:57 <notmyname> so... pay attention! :-)
21:37:07 <clayg> notmyname: please clarify, you observed -> doing a lot more of "oh this patch looks good, but..." as opposed to giving the original author diffs or pushing over the current patch set
21:37:08 <acoles> I sometimes feel we could use the commit subject line for more useful words than 'follow up for...' and just use a Related-Change tag in the commit msg body
21:37:27 <rledisez> i also feel "trust your judgement" is the best choice. i like the follow-up when you start a patch and, the more you dig, the more you want to change things because the new way to do it is different than years ago. so splitting things like the bugfix vs the code improvement (in a follow-up) seems reasonable. also, it helps separate what everybody agrees on from what makes for debate
21:37:33 <notmyname> clayg: yes, correct
21:37:41 <clayg> of "give the original author diffs" or "push over the current patch set" which do you think was more common before you observed this change?
21:37:53 <jrichli> I have seen two different types of "follow-up" patches. 1) something more that can be done after the original merges 2) a suggested change, intended to be rolled into the original, then abandoned
21:38:47 <notmyname> I think previously we waited more for the original author to write something (and occasionally gave a diff) -- at least for frequent contributors.
21:39:13 <jungleboyj> jrichli: That second approach sounds dangerous.
21:39:33 <acoles> notmyname: but, but... you told us we needed to merge more stuff faster ;)
21:39:34 <jungleboyj> In that case it would be better to work with the original author to fix it or push up a new patch.
21:39:54 <clayg> notmyname: thanks for clarifying; ok, I think trying to approve changes and address additional improvements in follow-up patches is a categorical improvement and something we did on purpose based on community review sessions that you've led at summits and stuff
21:40:07 <notmyname> acoles: I don't think the current situation is necessarily bad
21:40:19 <jungleboyj> The policy I have always followed is to push up a dependent patch unless it is something really minor that isn't worth waiting for the other person to fix, then push up a new patch.
21:40:28 <notmyname> eg https://review.openstack.org/#/c/468011/3 and https://review.openstack.org/#/c/465184/
21:40:29 <patchbot> patch 468011 - swift - Update Global EC docs with reference to composite ...
21:40:30 <patchbot> patch 465184 - swift - Ring doc cleanups
21:41:21 <clayg> don't use docs as the example - doc patches are terrible - they're so subjective
21:41:24 <timburke> https://review.openstack.org/#/c/466952/ maybe fell on the other side of that line
21:41:24 <patchbot> patch 466952 - swift - Follow-up for per-policy proxy configs (MERGED)
21:41:46 <notmyname> they were just easy ones I knew were proposed as follow-on that I had quick access to
21:41:58 <timburke> though parts of it were definitely out of scope for the original
21:42:04 <clayg> notmyname: i know i was just whining
21:42:44 <notmyname> I think landing more stuff faster is better. and it seems like this pattern has been helping in that regard
21:42:49 <mattoliverau> jrichli's second approach is how we're working through the sharding patch. Seems to work as we can break down the large patch and then merge it back in.
21:43:09 <acoles> notmyname: I'm not sure I see them as follow-up. The ring doc cleanup is standalone, just may have been prompted by timburke reviewing another change??
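[editor's note: the Related-Change tag acoles mentions at 21:37:08 is a commit-message footer convention for pointing at the change a patch follows up on, leaving the subject line free to describe the change itself. A sketch of a follow-up commit message using it; the subject and Change-Id values below are made up:]

    Clarify ring builder docs

    Addresses review comments on the related change below without
    reusing "Follow up for ..." as the subject line.

    Related-Change: I0123456789abcdef0123456789abcdef01234567
    Change-Id: Ifedcba9876543210fedcba9876543210fedcba98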
21:43:11 <notmyname> but since it is a change, I want to make sure people are feeling ok with it now that it's been happening for a bit
21:43:44 <clayg> wfm
21:44:22 <notmyname> kota_: tdasilva: are you concerned with the current pattern of using minor follow-on patches?
21:44:53 <kota_> hmm
21:45:23 <kota_> I'm not feeling so bad on the current follow up use cases right now
21:45:29 <notmyname> great :-)
21:45:34 <tdasilva> notmyname: no concern from me. I've noticed different people do different things...hasn't really bothered me
21:45:44 <notmyname> great :-)
21:45:56 <timburke> i definitely find myself more likely to take that second approach -- otherwise i'd worry that by pushing over a patch and +2ing it, the original author may not get a chance to protest before someone else comes along for the +A
21:46:16 <notmyname> ok, if someone *does* have comments, please feel free to say something, publicly or privately
21:46:21 <notmyname> ok, let's move on
21:46:32 <notmyname> #topic priority items
21:46:38 <notmyname> THERE'S SO MUCH STUFF TO DOOOO!!
21:46:57 <notmyname> ok, we're not going to be able to talk about all the stuff on https://wiki.openstack.org/wiki/Meetings/Swift
21:47:13 <notmyname> I think I'll take this section and copy it over to the priority reviews page :-)
21:47:36 <notmyname> but I do want to mention a few of the high priority bugs and some small follow-on things
21:47:41 <notmyname> https://bugs.launchpad.net/swift/+bug/1568650
21:47:43 <openstack> Launchpad bug 1568650 in OpenStack Object Storage (swift) "Connection between client and proxy service does not closes" [High,Confirmed] - Assigned to drax (devesh-gupta)
21:47:53 <notmyname> related to https://bugs.launchpad.net/swift/+bug/1572719
21:47:54 <openstack> Launchpad bug 1572719 in OpenStack Object Storage (swift) "Client may hold socket open after ChunkWriteTimeout" [High,Confirmed]
21:48:09 <notmyname> high priority bugs, open for a while, no current patches for them
21:48:53 <notmyname> is there someone who can look into them this week? if not to write a full patch, at least to outline what needs to be done to help someone who *can* write a patch
21:48:57 <clayg> yeah... stupid connection close bugs... timburke found another one
21:49:02 <notmyname> yep
21:49:28 <notmyname> timburke: you seem to be on this topic lately. can you check these?
21:49:39 <clayg> whoa! poor timburke :'(
21:50:00 <clayg> i was going to just suggest we move them to medium and wait for them to bite us?
21:50:14 <notmyname> maybe that's the best
21:50:17 <timburke> i suppose i can try... need to go digging into eventlet anyway...
21:51:11 <notmyname> right, I mean keep these in mind as you're doing the investigation on the new bug you found
21:51:33 <notmyname> thanks
21:51:49 <notmyname> ok, next: DB replicators lose metadata
21:51:54 <notmyname> https://review.openstack.org/#/c/302494/
21:51:54 <patchbot> patch 302494 - swift - Sync metadata in 'rsync_then_merge' in db_replicator
21:51:58 <notmyname> https://bugs.launchpad.net/swift/+bug/1570118
21:52:00 <openstack> Launchpad bug 1570118 in OpenStack Object Storage (swift) "Metadata can be missing by account/container replication" [High,In progress] - Assigned to Daisuke Morita (morita-daisuke)
21:52:09 <kota_> I added +2 already?
21:52:16 <notmyname> kota_: you did! thanks!
21:52:29 <clayg> i'ma look at that
21:52:34 <notmyname> clayg: thanks!
21:52:36 <clayg> werd
21:52:52 <notmyname> ok, we already mentioned the EC docs updates...
21:52:59 <notmyname> now https://bugs.launchpad.net/swift/+bug/1652323
21:53:01 <openstack> Launchpad bug 1652323 in OpenStack Object Storage (swift) "ssync syncs an expired object as a tombstone, probe test_object_expirer fails" [Medium,Confirmed]
21:53:03 <kota_> timburke: fyi -> https://review.openstack.org/#/c/471613/
21:53:03 <patchbot> patch 471613 - swift - Use config_number method instead of node_id + 1
21:53:18 <kota_> related to the db_replicator's probe
21:53:19 <timburke> kota_: i saw; thanks
21:53:51 <notmyname> acoles: on that bug, anything to do right now? it's listed as medium
21:54:47 <notmyname> the title mentions the probe tests failing, but the comments indicate something more serious (ssync failing to replicate data)
21:54:56 <notmyname> which is why I mention it here today
21:55:06 <acoles> notmyname: I'm not sure how best to close that one. I discussed it with rledisez this week - do we do a lot of work to be able to reconstruct an expired frag, or just ignore that it is missing ... cos it has expired anyway??
21:55:14 <notmyname> rledisez: is this something your devs at OVH can help with?
21:55:26 <acoles> notmyname: we fixed the probe test with a workaround
21:55:35 <notmyname> ah, ok
21:55:46 <acoles> but we didn't fix ssync :/
21:55:58 <notmyname> ok, so we need to update this bug or get a different bug to track
21:56:10 <notmyname> which do you think is best?
21:56:15 <rledisez> notmyname: right now no, but i want this fixed, so i might take some time to look into it, but i still don't know what would be the best way
21:56:18 <acoles> the recent change rledisez made so we can open an expired diskfile helps, but doesn't get us everything we need to fix this one
21:56:32 <acoles> notmyname: I'll update the bug
21:56:42 <notmyname> acoles: perfect. thanks
21:56:58 <notmyname> https://review.openstack.org/#/c/439572/
21:56:59 <patchbot> patch 439572 - swift - Limit number of revert tombstone SSYNC requests
21:57:07 <notmyname> for https://bugs.launchpad.net/swift/+bug/1668857
21:57:08 <openstack> Launchpad bug 1668857 in OpenStack Object Storage (swift) "EC reconstructor revert tombstone SSYNCs with too many primaries" [Medium,In progress] - Assigned to Mahati Chamarthy (mahati-chamarthy)
21:57:14 <acoles> any other opinions - missing frag, but expired, can we just leave it missing??
21:57:32 <tdasilva> acoles: what are the cons to that?
21:57:36 <notmyname> kota_: on this patch, you've got a +1
21:57:57 <acoles> tdasilva: inconsistent state, for a while
21:58:04 <notmyname> kota_: will you get a chance to revisit it?
21:58:06 <kota_> notmyname: yup, i reviewed yesterday again but i'm still on +1 not +2 yet
21:58:26 <kota_> still want to collect clay's opinion
21:58:28 <notmyname> kota_: ok. what do you want to see before you get to a +2? what's missing?
21:58:33 <notmyname> oh, clayg is already +2 ;-)
21:58:45 <kota_> diff from clay's description with current code
21:58:59 <kota_> for the number of primary nodes to push the tombstone
21:59:33 <clayg> kota_: is it > 3 and < replicas? I will +2 it so hard!
21:59:34 <kota_> i added my thought yesterday so now waiting on the ack. I was just thinking to poke clayg after this meeting
21:59:45 <notmyname> ok
21:59:45 <kota_> it's on the non-duplicated case
21:59:53 <kota_> to clayg
22:00:27 <clayg> k, we can chat
22:00:35 <notmyname> kota_: clayg: thank you
22:00:59 <notmyname> ok, we're at time, but please check the other patches linked on the meeting agenda (and i'll move them to the priority reviews page as well)
22:01:07 <notmyname> thank you, everyone, for your work on swift
22:01:15 <notmyname> and thanks for coming today!
22:01:16 <jungleboyj> Thanks.
22:01:17 <notmyname> #endmeeting