21:00:26 <notmyname> #startmeeting swift
21:00:28 <openstack> Meeting started Wed Mar 29 21:00:26 2017 UTC and is due to finish in 60 minutes.  The chair is notmyname. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:29 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:31 <openstack> The meeting name has been set to 'swift'
21:00:35 <notmyname> who's here for the swift team meeting?
21:00:40 <jrichli> o/
21:00:41 <m_kazuhiro> o/
21:00:42 <mathiasb> o/
21:00:51 <kota_> o/
21:00:53 <rledisez> hey o/
21:00:55 <mattoliverau> o/
21:01:08 <jungleboyj> o/
21:01:15 <acoles> hi
21:01:22 <timburke> hello
21:01:28 <dmorita> hi
21:01:32 <notmyname> tdasilva: here?
21:01:44 <notmyname> onovy: here?
21:02:28 <notmyname> welcome everyone
21:02:54 <notmyname> looks slightly smaller this week, but there's a few people on vacation i think
21:03:01 <notmyname> agenda for this week is
21:03:03 <notmyname> #link https://wiki.openstack.org/wiki/Meetings/Swift
21:03:14 <notmyname> first, follow-ups from last week
21:03:20 <notmyname> #topic stable gate failures
21:03:29 <notmyname> mattoliverau: you were looking in to this. what did you find?
21:03:29 <mattoliverau> hi
21:03:55 <mattoliverau> OK so after running unit tests continually
21:04:10 <mattoliverau> I did find one failure that could be back ported
21:04:24 <mattoliverau> https://review.openstack.org/448854
21:04:52 <mattoliverau> after that, it ran 272 more times before another intermittent bug was found
21:05:22 <mattoliverau> Further, the failure links reported all point to different test failures.
21:05:36 <notmyname> interesting
21:05:45 <notmyname> meaning that what you found isn't what is being reported?
21:05:59 <mattoliverau> The backported fix hit a 201 != 202 failure in a different place than the one from the meeting, so I've been trying to track that down.
21:06:07 <mattoliverau> yeah
21:06:07 <notmyname> ok
21:06:25 <tdasilva> sorry, i'm here now
21:06:27 <notmyname> thanks for looking in to that
21:06:29 <mattoliverau> so TL;DR: there isn't simply one intermittent failure, they are all different. And at least on the stable SAIO I'm running on, they are pretty few and far between. Maybe the zuul env is more flaky. And I'm not 100% sure this is just a stable problem. However, at least one problem found was fixed in master, so it's being backported.
21:06:29 <mattoliverau> Maybe I should run the same on master in another cloud instance and see how that goes.
21:06:45 <notmyname> ok
21:06:57 <mattoliverau> I plan to continue looking, as I do it in the back ground
21:06:59 <notmyname> with your proposed backport, I'm not sure how I feel about it
21:07:08 <mattoliverau> I have been putting notes here:
21:07:08 <notmyname> on the one hand, it's simple, clear, and obvious
21:07:20 <notmyname> on the other, backporting tests only? seems odd
21:07:32 <mattoliverau> #link https://etherpad.openstack.org/p/swift_stable_newton_periodic_probs
21:07:56 <notmyname> mattoliverau: thanks
21:08:08 <mattoliverau> notmyname: sure, but if they were fixed in master and are causing failures, then maybe it is worth it, for small levels of worth it
21:08:25 <notmyname> mattoliverau: yeah :-)
21:08:49 <timburke> notmyname: will it mean that we can better trust the gate for backports we *do* care about? does it mean that we get better signal from the periodic jobs?
21:08:56 <acoles> mattoliverau: thanks for the etherpad notes
21:09:11 <notmyname> timburke: yeah. like I said, I'm conflicted :-)
21:09:34 <notmyname> just land everything
21:09:56 <notmyname> mattoliverau: anything more on that subject for now?
21:10:25 <mattoliverau> I've been scripting the runs so they drop me into a debugger shell I can poke around in via ipython. I can just keep it running in tmux and jump across and have a poke if something is found, so it's something that can be done in the background
21:10:36 <notmyname> nice
21:10:54 <mattoliverau> I describe how on my blog: https://oliver.net.au/?p=302
21:11:05 <mattoliverau> cause otherwise I'd forget in the future :P
21:11:23 <jrichli> thanks for sharing!
21:11:25 <mattoliverau> also nose-pudb is useful ;)
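For reference, a minimal sketch of the rerun-until-failure loop described above, assuming nose and nose-pudb are installed on the SAIO; the test path, the flag, and the loop structure are illustrative, not the exact script from the blog post:

    import subprocess

    run = 0
    while True:
        run += 1
        # --pudb-failures (from nose-pudb) drops into the pudb debugger on the
        # first failing test, so the tmux pane just sits there waiting to be poked at
        rc = subprocess.call([
            'nosetests', '--pudb-failures', 'test/unit/container/test_replicator.py',
        ])
        if rc != 0:
            print('failure on run %d, debugger should be waiting' % run)
            break
        print('run %d passed' % run)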
21:11:42 <jungleboyj> That is cool.
21:11:47 <mathiasb> mattoliverau: that looks great, thanks!
21:11:54 <acoles> the failures with KeyError: StoragePolicy are curious because it suggests policy (un)patching is either not happening or happening async
21:12:00 <tdasilva> mathiasb: pudb curses?
21:12:05 <tdasilva> mattoliverau: ^^^
21:12:18 <notmyname> acoles: yeah
21:12:23 <mattoliverau> acoles: yeah I agree
21:12:35 <mattoliverau> tdasilva: yup
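As a generic illustration of the (un)patching concern (a plain dict stand-in for the policy registry, not swift's actual patch_policies helper): if the registry is restored while something is still looking up the patched entry, the late lookup raises a KeyError much like the ones in the etherpad:

    # hypothetical module-level registry, standing in for the policy registry
    REGISTRY = {0: 'default'}

    def patch_registry(extra):
        saved = dict(REGISTRY)
        REGISTRY.update(extra)
        return saved

    def unpatch_registry(saved):
        REGISTRY.clear()
        REGISTRY.update(saved)

    saved = patch_registry({1: 'one'})
    unpatch_registry(saved)   # tear-down runs first (or concurrently)...
    REGISTRY[1]               # ...then a late lookup raises KeyError: 1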
21:12:44 <notmyname> let's come back to this again next week to see what else has been found?
21:12:51 <mattoliverau> kk
21:13:22 <notmyname> #topic backports
21:13:27 <notmyname> speaking of backports
21:13:28 <mattoliverau> acoles: I'm waiting for a policy bug to be a failure so I can poke around
21:13:36 <notmyname> I backported jrichli's patch for https://bugs.launchpad.net/swift/+bug/1657246
21:13:36 <openstack> Launchpad bug 1657246 in OpenStack Object Storage (swift) "Proxy logs wrong request method when validating SLO segments" [Critical,Fix released] - Assigned to Janie Richling (jrichli)
21:13:47 <notmyname> it applied cleanly to ocata
21:13:55 <jrichli> thanks!
21:14:02 <notmyname> but it brought in https://review.openstack.org/#/c/449836/ for newton
21:14:25 <notmyname> jrichli: if you could glance over those and give a +1 if they look right, I'd appreciate it
21:14:35 <jrichli> will do
21:14:38 <notmyname> thanks
21:15:18 <notmyname> which leaves just https://review.openstack.org/#/c/444653/ open in backports
21:15:36 <notmyname> so, more stuff to look at
21:15:59 <notmyname> FWIW, I'm not in a rush to tag another backport release, and I definitely want to hear more from what mattoliverau finds before doing so
21:16:14 <notmyname> #topic boston forum topics
21:16:29 <notmyname> I didn't see anything new added to the etherpad (which is good)
21:16:32 <notmyname> #link https://etherpad.openstack.org/p/BOS-Swift-brainstorming
21:16:52 <notmyname> and I proposed all the stuff to the forum topic submission site
21:16:54 <notmyname> #link http://forumtopics.openstack.org
21:17:06 <kota_> notmyname: thanks for that
21:17:15 <notmyname> feel free to peruse those and leave lots of supporting comments for the ones you want to see discussed
21:18:00 <notmyname> hmm...not much else to say about that :-)
21:18:08 <notmyname> #topic TC golang resolution
21:18:15 <notmyname> #link https://review.openstack.org/#/c/451524/
21:18:24 <notmyname> tdasilva wrote up a great overview for the TC
21:18:42 <notmyname> this is the first step for the flavio process for getting not-python into openstack
21:18:49 <notmyname> step one is technical justification
21:18:54 <notmyname> step two is "do the work"
21:19:09 <acoles> tdasilva: great work - it's really well written - thank you
21:19:13 <jrichli> good work tdasilva
21:19:31 <notmyname> and when the TC approves step one, it means that we're good and our use of golang isn't to be further questioned
21:19:41 <notmyname> yes, tdasilva did a great job on the writeup
21:20:03 <tdasilva> team effort and most of the credit should go to torgomatic
21:20:23 <notmyname> please follow that patch. I expect that it would be brought up during one of the next two TC meetings (tuesdays), but it's already got some positive comments on it
21:21:27 <notmyname> and as a reminder, the golang work is part of the overall "fix rebalance" work. https://wiki.openstack.org/wiki/Swift/Fixing-rebalance-and-golang
21:21:54 <notmyname> any questions on the golang proposal?
21:22:24 <timburke> is it scoped to swift specifically, or all of openstack?
21:22:30 <notmyname> timburke: swift
21:22:38 <timburke> should the title reflect that?
21:22:39 <notmyname> and more specifically, swift's storage nodes
21:22:56 <notmyname> probably. but let's not make edits until they are requested by the TC
21:23:02 <timburke> ya
21:23:17 <tdasilva> ack
21:23:58 <notmyname> ok, last issue on the agenda that I wanted to bring up
21:24:03 <notmyname> #topic DB replication issue
21:24:20 <notmyname> this is a bug filed by Pavel, and it looks like it might be a big deal
21:24:26 <notmyname> #link https://bugs.launchpad.net/swift/+bug/1675500
21:24:26 <openstack> Launchpad bug 1675500 in OpenStack Object Storage (swift) "Container/account disk drive fault results replication on all rest drives" [High,In progress] - Assigned to Pavel Kvasnička (pavel-kvasnicka)
21:24:58 <notmyname> the summary is that if a drive fails, the DB replicator might over-replicate the container (and account?) DBs
21:25:08 <notmyname> timburke looked at it this morning and seemed to confirm it
21:25:19 <notmyname> so it's at least a high priority bug
21:25:32 <notmyname> might go up to critical based on what further investigation finds
21:26:06 <notmyname> at this point, we need some other people to duplicate the steps that Pavel wrote in the bug report and then take it to the next step: try with more drives than a SAIO, more nodes, etc
21:26:15 <notmyname> and see what the scope is and where it might be
21:26:31 <notmyname> timburke: any more to share on this, or did I get all that down correctly?
21:26:43 <timburke> fwiw, i was doing the four drive, two replica variation from the bug report
21:27:25 <timburke> and wound up with a replica of some DBs on every drive
21:27:57 <notmyname> a new replica each pass, right?
21:28:15 <mattoliverau> I'll take a look and try to recreate it as best I can. I've spent a lot of time in database replication thanks to sharding.
21:28:23 <timburke> notmyname: yup
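For anyone reproducing this, a rough helper for spotting the over-replication (it assumes container DBs live under /srv/node/<drive>/containers; adjust the glob for your SAIO layout). With 4 drives and 2 replicas, any DB showing more than 2 copies is suspect:

    import collections
    import glob
    import os

    copies = collections.Counter()
    # layout: /srv/node/<drive>/containers/<part>/<suffix>/<hash>/<hash>.db
    for db_path in glob.glob('/srv/node/*/containers/*/*/*/*.db'):
        copies[os.path.basename(db_path)] += 1

    for name, count in copies.most_common():
        if count > 2:  # more copies than the ring's replica count
            print('%s: %d copies' % (name, count))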
21:28:26 <acoles> notmyname: I'll try to give it some time
21:28:28 <notmyname> mattoliverau: you beat me to the question! thanks :-)
21:28:31 <notmyname> acoles: thanks
21:28:55 <acoles> I need to spend more time in db replication code ;)
21:28:58 <notmyname> ok, please leave notes in the bug and/or IRC
21:28:59 <acoles> mattoliverau: ^^
21:29:25 <rledisez> is it specific to rings with 2 replicas, or should it apply to n replicas?
21:29:36 <notmyname> rledisez: great question. I don't know
21:29:47 <notmyname> I wondered if it might be related to the odd/even quorum size
21:29:53 <rledisez> i’m gonna test on a 3-replica cluster then
21:29:54 <notmyname> but that needs to be tested
21:29:58 <notmyname> rledisez: thank you
21:30:28 <timburke> from the bug report, it sounds like drives >= 2 * replicas? but yeah, please test
21:30:36 <notmyname> oh, and timburke will be out after today on vacation, so don't ping him for it after about 3 hours from now
21:30:45 <timburke> hehe
21:31:08 <mattoliverau> timburke deservers a vacation!
21:31:25 <mattoliverau> wow, my tpying is bad today.. I blame morning and no coffee
21:31:25 <notmyname> mattoliverau: not like he can walk down to the beach every day!
21:31:40 <mattoliverau> pftt, you could if you came for a visit ;)
21:31:46 <notmyname> :-)
21:31:47 <timburke> mattoliverau: but it should be a vacation in which i have enough of sharding loaded into my head to continue thinking about it in the back of my mind :-)
21:32:01 <mattoliverau> timburke: :)
21:32:11 <notmyname> #topic open discussion
21:32:18 <notmyname> anything else to bring up in today's meeting?
21:32:31 <notmyname> i'm starting to think about the next release of swift and swiftclient
21:32:56 <notmyname> there's a few things in the works that I'd probably prefer to wait on (like resolution or triage of Pavel's bug)
21:33:03 <notmyname> and a very nice perf patch to swiftclient
21:33:17 <notmyname> but keep the next release in the back of your mind
21:33:32 <notmyname> start thinking about stuff you want to see in it that's nearly ready to land
21:33:39 <notmyname> anything else from anyone?
21:34:25 <timburke> oh... speaking of... i wouldn't mind if someone else took a look at the test failure for https://review.openstack.org/#/c/449771/
21:35:00 <timburke> and thinking hard about py2 vs py3 and what it means to be using a unicode buffer for our upload source...
21:35:11 <timburke> and then deciding whether that test actually makes any sense to keep
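The underlying question, roughly (a generic illustration, not the actual swiftclient test in that patch): a text buffer has no single length in bytes until an encoding is chosen, which is exactly what py3 forces out into the open:

    import io

    data = u'caf\u00e9'                           # 4 characters of text
    text_buf = io.StringIO(data)                  # reads return unicode/str
    byte_buf = io.BytesIO(data.encode('utf-8'))   # 5 bytes on the wire

    # py2 often papers over this with implicit coercion; on py3 anything that
    # reaches a socket must already be bytes, so someone has to pick the encoding
    print(len(data), len(data.encode('utf-8')))   # 4 vs 5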
21:36:03 <notmyname> right
21:36:28 <timburke> i would ask joel, but he's not here :'(
21:36:59 <notmyname> m_kazuhiro: could you take a look at timburke's patch there?
21:38:07 <m_kazuhiro> notmyname: please give me some minutes.
21:38:30 <notmyname> m_kazuhiro: sure. I just mean sometime soon. not right this moment
21:38:58 <notmyname> anything else from anyone?
21:39:08 <notmyname> if not, we'll end early (yay!)
21:39:29 <timburke> again, i'm out through all of next week, so no rush on my account. although if we want a release sooner, i'm also happy to have someone else push over the patch to get it in
21:40:50 <timburke> oh! are people OK with the change in behavior in https://review.openstack.org/#/c/446142/ ?
21:41:50 <timburke> i maintain that the exclusions in versioned_writes for DLOs are a bug
21:41:52 <notmyname> clearly acoles is :-)
21:42:16 <tdasilva> timburke: i think I came up with another reason why DLOs are not versioned, but we can take it to #openstack-swift
21:42:26 <tdasilva> or the patch
21:42:38 <timburke> and when we pulled versioned_writes out to middleware we went for bug-for-bug compatibility which was totally the right call
21:42:45 <timburke> but now i wanna fix the bug
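For anyone wanting to see the behavior in question, a sketch against an SAIO using python-swiftclient (auth URL and credentials are the usual SAIO placeholders): with today's exclusion, overwriting a DLO manifest leaves nothing in the versions container; with the proposed change the manifest would be versioned like any other object:

    from swiftclient.client import Connection

    conn = Connection(authurl='http://127.0.0.1:8080/auth/v1.0',
                      user='test:tester', key='testing')

    conn.put_container('versions')
    conn.put_container('docs', headers={'X-Versions-Location': 'versions'})
    conn.put_container('docs_segments')

    # two segments, then PUT the manifest twice to give versioning a chance to fire
    for i in (1, 2):
        conn.put_object('docs_segments', 'report/%06d' % i, contents=b'chunk')
    for _ in (1, 2):
        conn.put_object('docs', 'report', contents=b'',
                        headers={'X-Object-Manifest': 'docs_segments/report'})

    # empty today, because versioned_writes skips DLO manifests
    print([o['name'] for o in conn.get_container('versions')[1]])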
21:42:53 <timburke> tdasilva: ya, that's fine
21:42:58 <notmyname> ok, then let's close this meeting
21:43:07 <notmyname> thanks for working on swift!
21:43:14 <notmyname> #endmeeting