21:00:04 #startmeeting swift
21:00:04 Meeting started Wed Jun 7 21:00:04 2017 UTC and is due to finish in 60 minutes. The chair is notmyname. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:06 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:09 The meeting name has been set to 'swift'
21:00:11 who's here for the swift team meeting?
21:00:16 o/
21:00:17 o/
21:00:17 o/
21:00:18 hi
21:00:20 o/
21:00:21 o/
21:00:29 hello
21:00:52 acoles: tdasilva: clayg: ping
21:00:59 notmyname: pong
21:01:00 hello
21:01:18 don't @ me
21:01:38 don't tell me what to do ;-)
21:01:43 welcome, everyone
21:02:07 agenda this week is at ...
21:02:08 #link https://wiki.openstack.org/wiki/Meetings/Swift
21:02:18 you'll see that I reorganized that page quite a bit
21:02:23 I hope it's easier to read now
21:02:35 also, OH MY there's a lot of stuff on it!
21:02:42 it is until you add items to the other meeting
21:02:47 heh
21:03:15 reminder that our next 0700 meeting is june 14 (ie before the next 2100 meeting) and acoles will be leading it
21:04:03 let's see how quickly we can go through some stuff
21:04:15 yeah let's go!
21:04:17 go go go !
21:04:19 oh! I just saw an email on the mailing list from Erin Disney about the next PTG
21:04:25 has a registration link and hotel info
21:04:41 http://lists.openstack.org/pipermail/openstack-dev/2017-June/118002.html
21:04:42 so be sure to read that
21:04:45 timburke: thanks
21:04:47 already!?
21:04:51 #link http://lists.openstack.org/pipermail/openstack-dev/2017-June/118002.html
21:05:17 #topic follow-up from previous meeting
21:05:20 LSOF status
21:05:39 nothing new here, didn't have time to do the benchmarks, just writing code
21:05:40 rledisez: you said you'd been swamped with other stuff and haven't been able to make much progress on this yet
21:05:44 ack
21:05:51 TC goals status
21:05:54 still on the todo list, hopefully for next week
21:06:01 acoles did a great job with the uwsgi one
21:06:02 right on!
21:06:23 although the conversation sort of went in a "but do we really need this?" direction
21:06:43 however, it's documented, and as far as I can tell, we're done with that one
21:07:02 still no progress on the py3 docs
21:07:03 Nice
21:07:21 ok, `mount_check` for containers patch
21:07:24 #link https://review.openstack.org/#/c/466255/
21:07:25 patch 466255 - swift - Make mount_check option usable in containerized en...
21:07:49 we discussed it last week. clayg has +2'd it. zaitcev has a +1
21:07:57 hmm... cschwede?
21:07:57 i think i signed up to review that, but have not had a chance yet, will review tomorrow
21:08:02 ack
21:08:04 thanks
21:08:14 ah, yes. your name is on it in gerrit :-)
21:08:30 and deprecating known-bad EC config
21:08:32 #link https://review.openstack.org/#/c/468105/
21:08:32 patch 468105 - swift - Require that known-bad EC schemes be deprecated
21:08:58 clayg: you have a comment on it saying someone should send an email to the ops mailing list
21:09:02 clayg: could that someone be you?
21:09:03 notmyname: yeah you should do that
21:09:11 lol
21:09:12 "someone" ;-)
21:09:13 I can barely spell EC
21:09:38 ok ok. I'll do it
21:09:41 we don't have to send jack - we can just merge it
21:09:48 I also don't really know who reads the ops ML?
21:09:50 well, that should happen too :-)
21:09:55 jack might not want to go
21:10:05 #action notmyname to send email to ops ML about https://review.openstack.org/#/c/468105/
21:10:06 patch 468105 - swift - Require that known-bad EC schemes be deprecated
21:10:22 we shouldn't wait for a ML post to land that patch, though
21:10:39 tdasilva: your name is on it in gerrit too. are you planning on reviewing?
21:10:47 either way - I *think* we landed on wanting to make proxies not start in at least some conditions - which is pretty ballsy IMHO - maybe it's a great thing - for our product it means a bunch of work before we can safely upgrade to a version of swift with that behavior - i have no idea what it means for other people
21:11:32 IIRC the "some conditions" were when you had the bad EC policy as the only policy
21:11:45 otherwise, auto-set the bad config to deprecated, right?
21:11:46 i'm really pretty sure the code is top shelf (duh, timburke wrote it) - the question is really about what is the process for the canonical openstack never-breaks-compat project when they want to break compat?
21:12:38 it's not an API compat breakage. it's a "stop slowly deleting data when you have a certain config" change. with a strong signal to move your data to a different policy
21:12:46 notmyname: I guess? is that what timburke implemented? that was one suggestion... is it *really* any better to automatically hide a policy and let you continue to ignore the error? Does it mean any less work for someone that wants to start to consume this version of swift - but *somehow* still doesn't know they have these bad policies and isn't dealing with them?
21:13:44 timburke: I think those are questions best answered by you
21:13:46 notmyname: data is only corrupted if you have a sufficiently old liberasurecode - which AFAIK isn't really part of the warning
21:13:59 yeah timburke why are you doing this? put your head back in the sand?
21:14:16 clayg: i like turning over rocks?
21:14:16 j/k
21:14:21 lol!
21:14:23 also, this would be a great place for input from kota_ cschwede acoles mattoliverau tdasilva jrichli (all the cores)
21:14:40 rledisez: what do you want to happen in this case?
21:15:35 so the change as it currently stands is to stop services (all of them!) from starting if you have a known-bad policy that hasn't been deprecated
21:15:36 i would prefer the process not to start. at least it's obvious something must be changed in config, i can't miss it. if the policy is auto-set deprecated, an operator might miss it and then get called because "i can't create containers anymore"
21:16:18 easier to catch in staging - i like it
21:16:25 clayg: are you saying that isa_l_rs_vand with parity>5 is ok if you have recent libec?
21:16:35 acoles: nope
21:16:41 acoles: just that it doesn't corrupt data
21:16:47 just that we won't still silently corrupt data
21:17:03 * acoles confused
21:17:06 acoles: it might still 503 - we could also add more code to make it 503 less-ish if you can manage to get enough frags to get yourself out of a bad erase list
21:17:37 OIC, we don't corrupt it noisily, we refuse to decode it?
21:17:54 yeah
21:18:07 acoles: but I like "don't start if you have these policies" - i just have to write a bunch of code to make sure it doesn't come to that - and we don't really offer cluster-assisted-policy-migration yet - so if someone gets the "if you upgrade swift you are boned" message - there's no clear path to #winning
21:18:19 you could deprecate it and continue to ignore it i guess?
21:18:46 Not starting services seems like something that would annoy people, but it seems the lesser of 2 evils. I mean we can log, but losing data if you miss a log entry is kinda scary
21:19:07 * jungleboyj sneaks in late
21:19:28 let's not do this in a release that you also cannot roll back from
21:19:50 do we support rollback sometimes!
21:19:54 acoles: yeah, that's a scary part because we've never really done anything around supporting rollbacks
21:20:14 although most of the time it should work. probably.
21:20:17 iirc, the newer pyeclib/liberasurecode refuses to decode in that bad case
21:20:39 no but we have had releases where we warned that rollback would be really hard e.g. changing hashes.pkl format
21:21:02 I guess if you never start the new release you're ok
21:21:02 if you really want to continue with the policy un-deprecated, you could always go in and comment out the raise
21:21:10 ugh, i'm bored - this sucks - anyone that has these policies is screwed - but what can we do? Does this patch actually *help* someone? I will punch you in the face until you check this checkbox - BAM do you like that!? - warning: you are still hosed
21:21:34 so operators can deploy the policy but sometimes they will see decode errors in reconstructor logs or proxy logs on GET requests
21:22:10 note that that is if the operator uses a newer version of pyeclib/liberasurecode
21:22:11 let's just ignore this and work on cluster-assisted-policy-migration?
21:22:33 clayg: right, so it's what mattoliverau said. this one is less bad than ignoring it
21:23:24 ok cool
21:23:49 timburke: but if you go in, comment out the raise, and find you can read your data, you then feel mad at devs
21:24:13 I'll send the ML message, and this patch should be reviewed and merged when we like it and operators will be in a better place
21:24:28 acoles: if you're editing the code on your own in prod, you're already off the reservation :-)
21:24:39 +1
21:24:48 notmyname: wasn't my suggestion
21:24:53 :-)
21:25:06 ok, let's move on. we've at least got a little bit of a plan for it
21:25:06 acoles: you'll find you can read your data just by deprecating the policy, too
21:25:20 timburke: not if it is your only policy
21:25:33 so, yeah, go make another policy. I know.
21:25:40 so did we agree the current patch is ok?
21:25:52 EVERYTHING IS AWESOME - PAINT IT BLUE!
21:26:09 it's not a bikeshed argument!
21:26:22 but yeah, I haven't heard any other paint colors being proposed ;-)
21:26:54 * acoles needs to find power cord, bbiab
21:26:57 i suggested we still have the nuclear-reactor^Wcluster-assisted-policy-migration that would be a great *solution* to these policies being terrible?
21:27:58 yes, that would be awesome. but when? m_kazuhiro's had patches to help with that up for a long time
21:27:59 so get behind https://review.openstack.org/#/c/173580/ or https://review.openstack.org/#/c/209329/ first?
21:28:00 patch 173580 - swift - wip: Cluster assisted Storage Policy Migration
21:28:01 patch 209329 - swift - WIP: Changing Policies
21:28:20 clayg: yeah, that's a feature that would make this mute more of a moot point.
21:29:07 for the time being (ie until we have the cluster assisted policy migration) we've got timburke's patch to not start unless the bad config is deprecated
21:29:44 win!
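
(For context, a minimal sketch of what the deprecation discussed above might look like in swift.conf, assuming patch 468105 lands as described in the meeting: a policy using isa_l_rs_vand with more than 5 parity fragments has to be explicitly marked deprecated before services will start. The policy names and fragment counts below are illustrative, not taken from the meeting.)

    [storage-policy:0]
    name = standard-replication
    policy_type = replication
    default = yes

    # Known-bad combination discussed above: isa_l_rs_vand with more than
    # 5 parity fragments can fail to decode. Marking the policy deprecated
    # lets services start again; no new containers can be created with it,
    # but existing containers keep working while data is moved elsewhere.
    [storage-policy:1]
    name = ec-10-6
    policy_type = erasure_coding
    ec_type = isa_l_rs_vand
    ec_num_data_fragments = 10
    ec_num_parity_fragments = 6
    ec_object_segment_size = 1048576
    deprecated = yes
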
21:30:41 ok, let's move on
21:31:16 looking at nicks that just netsplit, I don't actually think anyone but torgomatic was here in this meeting (and he's already rejoined--if he was doing more than lurking)
21:31:22 #topic discuss: follow-up vs patch overwrite. what's the better pattern?
21:31:51 recently I've noticed us, as a group, doing a lot more of "oh this patch looks good, but..." and proposing a small follow-on patch
21:32:14 notmyname: both. if the patch can be landed now, follow-up; if it needs revisions anyway, overwrite
21:32:19 maybe that's great. maybe it's better to give the original author diffs or push over the current patch set
21:32:37 unless the follow-up is so trivial, then overwrite and land
21:33:04 so/mute/much
21:33:05 Arrg phone autocomplete :p
21:33:24 it's something that seems to be a relatively recent change (at least in frequency), so i wanted to address it and either make sure we're all ok with it or know if people are worried by it (or want something different)
21:33:41 so what are your thoughts?
21:34:06 acoles: seems reasonable (ie "use your judgement") as long as we don't end up getting tests and docs in the follow-up patches?
21:34:25 notmyname: agree
21:34:42 what do other people thing?
21:34:45 *think
21:34:54 * acoles wonders if notmyname is now going to tell me I have just done exactly that?
21:35:51 Yeah what acoles said, but also follow up if it's a change that's building upon the last one but a little out of scope.
21:36:37 Ie, critical bug (stop the bleeding) first, then maybe improve
21:36:42 heh, so I can see we're not likely to get some "this is good" or "this is bad" sort of comment. but i at least wanted to raise it as a way to point it out as a thing that's happening
21:36:57 so... pay attention! :-)
21:37:07 notmyname: please clarify, you observed -> doing a lot more of "oh this patch looks good, but..." as opposed to giving the original author diffs or pushing over the current patch set
21:37:08 I sometimes feel we could use the commit subject line for more useful words than 'follow up for...' and just use a Related-Change tag in the commit msg body
21:37:27 i also feel "trust your judgement" is the best choice. i like the follow-up when you start a patch and, the more you dig, the more you want to change things because the way to do it now is different than it was years ago. so splitting things like the bugfix vs the code improvement (in a follow-up) seems reasonable. also, it helps separate what everybody agrees on from what's still up for debate
21:37:33 clayg: yes, correct
21:37:41 of "give the original author diffs" or "push over the current patch set", which do you think was more common before you observed this change?
21:37:53 I have seen two different types of "follow-up" patches. 1) something more that can be done after the original merges 2) a suggested change, intended to be rolled into the original, then abandoned
21:38:47 I think previously we waited more for the original author to write something (and occasionally gave a diff) -- at least for frequent contributors.
21:39:13 jrichli: That second approach sounds dangerous.
21:39:33 notmyname: but, but... you told us we needed to merge more stuff faster ;)
21:39:34 In that case it would be better to work with the original author to fix it or push up a new patch.
21:39:54 notmyname: thanks for clarifying; ok, I think trying to approve changes and address additional improvements in follow-up patches is a categorical improvement and something we did on purpose based on community review sessions that you've led at summits and stuff
21:40:07 acoles: I don't think the current situation is necessarily bad
21:40:19 The policy I have always followed is to push up a dependent patch unless it is something really minor that isn't worth waiting for the other person to fix, then push up a new patch.
21:40:28 eg https://review.openstack.org/#/c/468011/3 and https://review.openstack.org/#/c/465184/
21:40:29 patch 468011 - swift - Update Global EC docs with reference to composite ...
21:40:30 patch 465184 - swift - Ring doc cleanups
21:41:21 don't use docs as the example - doc patches are terrible - they're so subjective
21:41:24 https://review.openstack.org/#/c/466952/ maybe fell on the other side of that line
21:41:24 patch 466952 - swift - Follow-up for per-policy proxy configs (MERGED)
21:41:46 they were just easy ones I knew were proposed as follow-on that I had quick access to
21:41:58 though parts of it were definitely out of scope for the original
21:42:04 notmyname: i know i was just whining
21:42:44 I think landing more stuff faster is better. and it seems like this pattern has helped in that regard
21:42:49 jrichli's second approach is how we're working through the sharding patch. Seems to work as we can break down the large patch and then merge it back in.
21:43:09 notmyname: I'm not sure I see them as follow-up. The ring doc cleanup is standalone, just may have been prompted by timburke reviewing another change??
21:43:11 but since it is a change, then I want to make sure people are feeling ok with it now that it's been happening for a bit
21:43:44 wfm
21:44:22 kota_: tdasilva: are you concerned with the current pattern of using minor follow-on patches?
21:44:53 hmm
21:45:23 I'm not feeling so bad about the current follow-up use cases right now
21:45:29 great :-)
21:45:34 notmyname: no concern from me. I've noticed different people do different things...hasn't really bothered me
21:45:44 great :-)
21:45:56 i definitely find myself more likely to take that second approach -- otherwise i'd worry that by pushing over a patch and +2ing it, the original author may not get a chance to protest before someone else comes along for the +A
21:46:16 ok, if someone *does* have comments, please feel free to say something, publicly or privately
21:46:21 ok, let's move on
21:46:32 #topic priority items
21:46:38 THERE'S SO MUCH STUFF TO DOOOO!!
21:46:57 ok, we're not going to be able to talk about all the stuff on https://wiki.openstack.org/wiki/Meetings/Swift
21:47:13 I think I'll take this section and copy it over to the priority reviews page :-)
21:47:36 but I do want to mention a few of the high priority bugs and some small follow-on things
21:47:41 https://bugs.launchpad.net/swift/+bug/1568650
21:47:43 Launchpad bug 1568650 in OpenStack Object Storage (swift) "Connection between client and proxy service does not closes" [High,Confirmed] - Assigned to drax (devesh-gupta)
21:47:53 related to https://bugs.launchpad.net/swift/+bug/1572719
21:47:54 Launchpad bug 1572719 in OpenStack Object Storage (swift) "Client may hold socket open after ChunkWriteTimeout" [High,Confirmed]
21:48:09 high priority bugs, open for a while, no current patches for them
21:48:53 is there someone who can look into them this week? if not to write a full patch, at least to outline what needs to be done to help someone who *can* write a patch
21:48:57 yeah... stupid connection close bugs... timburke found another one
21:49:02 yep
21:49:28 timburke: you seem to be on this topic lately. can you check these?
21:49:39 whoa! poor timburke :'(
21:50:00 i was going to just suggest we move them to medium and wait for them to bite us?
21:50:14 maybe that's the best
21:50:17 i suppose i can try... need to go digging into eventlet anyway...
21:51:11 right, I mean keep these in mind as you're doing the investigation on the new bug you found
21:51:33 thanks
21:51:49 ok, next: DB replicators lose metadata
21:51:54 https://review.openstack.org/#/c/302494/
21:51:54 patch 302494 - swift - Sync metadata in 'rsync_then_merge' in db_replicator
21:51:58 https://bugs.launchpad.net/swift/+bug/1570118
21:52:00 Launchpad bug 1570118 in OpenStack Object Storage (swift) "Metadata can be missing by account/container replication" [High,In progress] - Assigned to Daisuke Morita (morita-daisuke)
21:52:09 I added +2 already?
21:52:16 kota_: you did! thanks!
21:52:29 i'ma look at that
21:52:34 clayg: thanks!
21:52:36 werd
21:52:52 ok, we already mentioned the EC docs updates...
21:52:59 now https://bugs.launchpad.net/swift/+bug/1652323
21:53:01 Launchpad bug 1652323 in OpenStack Object Storage (swift) "ssync syncs an expired object as a tombstone, probe test_object_expirer fails" [Medium,Confirmed]
21:53:03 timburke: fyi -> https://review.openstack.org/#/c/471613/
21:53:03 patch 471613 - swift - Use config_number method instead of node_id + 1
21:53:18 related to the db_replicator's probe
21:53:19 kota_: i saw; thanks
21:53:51 acoles: on that bug, anything to do right now? it's listed as medium
21:54:47 the title mentions the probe tests failing, but the comments indicate something more serious (ssync failing to replicate data)
21:54:56 which is why I mention it here today
21:55:06 notmyname: I'm not sure how best to close that one. I discussed it with rledisez this week - do we do a lot of work to be able to reconstruct an expired frag, or just ignore that it is missing ... cos it has expired anyway??
21:55:14 rledisez: is this something your devs at OVH can help with?
21:55:26 notmyname: we fixed the probe test with a workaround
21:55:35 ah, ok
21:55:46 but we didn't fix ssync :/
21:55:58 ok, so we need to update this bug or get a different bug to track
21:56:10 which do you think is best?
21:56:15 notmyname: right now no, but i want this fixed, so i might take some time to look into it, but i still don't know what would be the best way
21:56:18 the recent change rledisez made so we can open an expired diskfile helps, but doesn't get us everything we need to fix this one
21:56:32 notmyname: I'll update the bug
21:56:42 acoles: perfect. thanks
21:56:58 https://review.openstack.org/#/c/439572/
21:56:59 patch 439572 - swift - Limit number of revert tombstone SSYNC requests
21:57:07 for https://bugs.launchpad.net/swift/+bug/1668857
21:57:08 Launchpad bug 1668857 in OpenStack Object Storage (swift) "EC reconstructor revert tombstone SSYNCs with too many primaries" [Medium,In progress] - Assigned to Mahati Chamarthy (mahati-chamarthy)
21:57:14 any other opinions - missing frag, but expired, can we just leave it missing??
21:57:32 acoles: what are the cons to that?
21:57:36 kota_: on this patch, you've got a +1
21:57:57 tdasilva: inconsistent state, for a while
21:58:04 kota_: will you get a chance to revisit it?
21:58:06 notmyname: yup, i reviewed it again yesterday but i'm still on +1, not +2 yet
21:58:26 still want to collect clay's opinion
21:58:28 kota_: ok. what do you want to see before you get to a +2? what's missing?
21:58:33 oh, clayg is already +2 ;-)
21:58:45 the diff between clay's description and the current code
21:58:59 for the number of primary nodes to push the tombstone to
21:59:33 kota_: is it > 3 and < replicas? I will +2 it so hard!
21:59:34 i added my thought yesterday so now waiting on the ack. I was just thinking to poke clayg after this meeting
21:59:45 ok
21:59:45 it's on the non-duplicated case
21:59:53 to clayg
22:00:27 k, we can chat
22:00:35 kota_: clayg: thank you
22:00:59 ok, we're at time, but please check the other patches linked on the meeting agenda (and i'll move them to the priority reviews page as well)
22:01:07 thank you, everyone, for your work on swift
22:01:15 and thanks for coming today!
22:01:16 Thanks.
22:01:17 #endmeeting