21:00:26 #startmeeting swift
21:00:28 Meeting started Wed Mar 29 21:00:26 2017 UTC and is due to finish in 60 minutes. The chair is notmyname. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:29 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:31 The meeting name has been set to 'swift'
21:00:35 who's here for the swift team meeting?
21:00:40 o/
21:00:41 o/
21:00:42 o/
21:00:51 o/
21:00:53 hey o/
21:00:55 o/
21:01:08 o/
21:01:15 hi
21:01:22 hello
21:01:28 hi
21:01:32 tdasilva: here?
21:01:44 onovy: here?
21:02:28 welcome everyone
21:02:54 looks slightly smaller this week, but there's a few people on vacation i think
21:03:01 agenda for this week is
21:03:03 #link https://wiki.openstack.org/wiki/Meetings/Swift
21:03:14 first, follow-ups from last week
21:03:20 #topic stable gate failures
21:03:29 mattoliverau: you were looking into this. what did you find?
21:03:29 hi
21:03:55 OK so after running unit tests continually
21:04:10 I did find one failure that could be backported
21:04:24 https://review.openstack.org/448854
21:04:52 after that, it ran 272 more times before another intermittent bug was found
21:05:22 Further, all links reported are actually different test failures.
21:05:36 interesting
21:05:45 meaning that what you found isn't what is being reported?
21:05:59 The backported bug hit a 201 != 202 bug in a different place than the one from the meeting, so I've been trying to track that down.
21:06:07 yeah
21:06:07 ok
21:06:25 sorry, i'm here now
21:06:27 thanks for looking into that
21:06:29 so TL;DR: there isn't simply 1 intermittent failure, they are all different. And at least on the stable SAIO I'm running on, they are pretty few and far between. Maybe the zuul env is more flaky. And I'm not 100% sure this is just a stable problem. However, at least 1 problem found was fixed in master, so it's being backported.
21:06:29 Maybe I should run the same on master in another cloud instance and see how that goes.
21:06:45 ok
21:06:57 I plan to continue looking, as I can do it in the background
21:06:59 with your proposed backport, I'm not sure how I feel about it
21:07:08 I have been putting notes here:
21:07:08 on the one hand, it's simple, clear, and obvious
21:07:20 on the other, backporting tests only? seems odd
21:07:32 #link https://etherpad.openstack.org/p/swift_stable_newton_periodic_probs
21:07:56 mattoliverau: thanks
21:08:08 notmyname: sure, but if they were fixed in master and are causing failures, then maybe it is worth it, for small levels of worth it
21:08:25 mattoliverau: yeah :-)
21:08:49 notmyname: will it mean that we can better trust the gate for backports we *do* care about? does it mean that we get better signal from the periodic jobs?
21:08:56 mattoliverau: thanks for the etherpad notes
21:09:11 timburke: yeah. like I said, I'm conflicted :-)
21:09:34 just land everything
21:09:56 mattoliverau: anything more on that subject for now?
21:10:25 I've been scripting the runs so they drop me into a debugger shell that I can poke around in via ipython, so I can just keep it running in tmux and then jump across and have a poke if something is found. so it's something that can be done in the background
21:10:36 nice
21:10:54 I mention how on my blog: https://oliver.net.au/?p=302
21:11:05 cause otherwise I'd forget in the future :P
21:11:23 thanks for sharing!
21:11:25 also nose-pudb is useful ;)
21:11:42 That is cool.
21:11:47 mattoliverau: that looks great, thanks!
21:11:54 the failures with KeyError: StoragePolicy are curious because they suggest policy (un)patching is either not happening or happening async
21:12:00 mathiasb: pudb curses?
21:12:05 mattoliverau: ^^^
21:12:18 acoles: yeah
21:12:23 acoles: yeah I agree
21:12:35 tdasilva: yup
21:12:44 let's come back to this again next week to see what else has been found?
21:12:51 kk
21:13:22 #topic backports
21:13:27 speaking of backports
21:13:28 acoles: I'm waiting for a policy bug to be a failure so I can pock around
21:13:34 poke even
21:13:36 I backported jrichli's patch for https://bugs.launchpad.net/swift/+bug/1657246
21:13:36 Launchpad bug 1657246 in OpenStack Object Storage (swift) "Proxy logs wrong request method when validating SLO segments" [Critical,Fix released] - Assigned to Janie Richling (jrichli)
21:13:47 it applied cleanly to ocata
21:13:55 thanks!
21:14:02 but it brought in https://review.openstack.org/#/c/449836/ for newton
21:14:25 jrichli: if you could glance over those and give a +1 if they look right, I'd appreciate it
21:14:35 will do
21:14:38 thanks
21:15:18 which leaves just https://review.openstack.org/#/c/444653/ open in backports
21:15:36 so, more stuff to look at
21:15:59 FWIW, I'm not in a rush to tag another backport release, and I definitely want to hear more about what mattoliverau finds before doing so
21:16:14 #topic boston forum topics
21:16:29 I didn't see anything new added to the etherpad (which is good)
21:16:32 #link https://etherpad.openstack.org/p/BOS-Swift-brainstorming
21:16:52 and I proposed all the stuff to the forum topic submission site
21:16:54 #link http://forumtopics.openstack.org
21:17:06 notmyname: thanks for that
21:17:15 feel free to peruse those and leave lots of supporting comments for the ones you want to see discussed
21:18:00 hmm...not much else to say about that :-)
21:18:08 #topic TC golang resolution
21:18:15 #link https://review.openstack.org/#/c/451524/
21:18:24 tdasilva wrote up a great overview for the TC
21:18:42 this is the first step of the flavio process for getting not-python into openstack
21:18:49 step one is technical justification
21:18:54 step two is "do the work"
21:19:09 tdasilva: great work - it's really well written - thank you
21:19:13 good work tdasilva
21:19:31 and when the TC approves step one, it means that we're good and our use of golang isn't to be further questioned
21:19:41 yes, tdasilva did a great job on the writeup
21:20:03 team effort and most of the credit should go to torgomatic
21:20:23 please follow that patch. I expect it will be brought up during one of the next two TC meetings (tuesdays), but it's already got some positive comments on it
21:21:27 and as a reminder, the golang work is part of the overall "fix rebalance" work. https://wiki.openstack.org/wiki/Swift/Fixing-rebalance-and-golang
21:21:54 any questions on the golang proposal?
21:22:24 is it scoped to swift specifically, or all of openstack?
21:22:30 timburke: swift
21:22:38 should the title reflect that?
21:22:39 and more specifically, swift's storage nodes
21:22:56 probably. but let's not make edits until they are requested by the TC
21:23:02 ya
21:23:17 ack
21:23:58 ok, last issue on the agenda that I wanted to bring up
21:24:03 #topic DB replication issue
21:24:20 this is a bug filed by Pavel, and it looks like it might be a big deal
21:24:26 #link https://bugs.launchpad.net/swift/+bug/1675500
21:24:26 Launchpad bug 1675500 in OpenStack Object Storage (swift) "Container/account disk drive fault results replication on all rest drives" [High,In progress] - Assigned to Pavel Kvasnička (pavel-kvasnicka)
21:24:58 the summary is that if a drive fails, the DB replicator might over-replicate the container (and account?) DBs
21:25:08 timburke looked at it this morning and seemed to confirm it
21:25:19 so it's at least a high priority bug
21:25:32 might go up to critical based on what further investigation finds
21:26:06 at this point, we need some other people to duplicate the steps that Pavel wrote in the bug report and then take it to the next step. try with more drives than a SAIO, more nodes, etc
21:26:15 and see what the scope is and where it might be
21:26:31 timburke: any more to share on this, or did I get all that down correctly?
21:26:43 fwiw, i was doing the four drive, two replica variation from the bug report
21:27:25 and wound up with a replica of some DBs on ever drive
21:27:33 every*
21:27:57 a new replica each pass, right?
21:28:15 I'll take a look and try to recreate it as best I can. I've spent a lot of time in database replication thanks to sharding.
21:28:23 notmyname: yup
21:28:26 notmyname: I'll try to give it some time
21:28:28 mattoliverau: you beat me to the question! thanks :-)
21:28:31 acoles: thanks
21:28:55 I need to spend more time in db replication code ;)
21:28:58 ok, please leave notes in the bug and/or IRC
21:28:59 mattoliverau: ^^
21:29:25 is it specific to rings with 2 replicas, or should it apply to n replicas?
21:29:36 rledisez: great question. I don't know
21:29:47 I wondered if it might be related to the odd/even quorum size
21:29:53 i'm gonna test on a 3-replica cluster then
21:29:54 but that needs to be tested
21:29:58 rledisez: thank you
21:30:28 from the bug report, it sounds like drives >= 2 * replicas? but yeah, please test
21:30:36 oh, and timburke will be out after today on vacation, so don't ping him for it after about 3 hours from now
21:30:45 hehe
21:31:08 timburke deserves a vacation!
21:31:25 wow, my typing is bad today.. I blame morning and no coffee
21:31:25 mattoliverau: not like he can walk down to the beach every day!
21:31:40 pftt, you could if you came for a visit ;)
21:31:46 :-)
21:31:47 mattoliverau: but it should be a vacation in which i have enough of sharding loaded into my head to continue thinking about it in the back of my mind :-)
21:32:01 timburke: :)
21:32:11 #topic open discussion
21:32:18 anything else to bring up in today's meeting?
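[editor's note: on the odd/even quorum question raised above, a plain majority quorum behaves differently for 2- and 3-replica rings, which is one reason results from a 2-replica test may not generalize. Below is a minimal illustration using the generic strict-majority formula; it is not necessarily the exact check Swift's DB replicator performs.]

```python
def majority_quorum(replica_count):
    """Smallest success count that is a strict majority of the replicas."""
    return replica_count // 2 + 1


# With 2 replicas a strict-majority quorum needs *every* replica (2 of 2),
# while 3 replicas tolerate one failure (2 of 3) -- so quorum-related
# behavior can differ between even- and odd-sized rings.
for n in (2, 3, 4):
    print("%d replicas -> quorum %d" % (n, majority_quorum(n)))
```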
21:32:31 i'm starting to think about the next release of swift and swiftclient
21:32:56 there's a few things in the works that I'd probably prefer to wait on (like resolution or triage of Pavel's bug)
21:33:03 and a very nice perf patch to swiftclient
21:33:17 but keep the next release in the back of your mind
21:33:32 start thinking about stuff you want to see in it that's nearly ready to land
21:33:39 anything else from anyone?
21:34:25 oh... speaking of... i wouldn't mind if someone else took a look at the test failure for https://review.openstack.org/#/c/449771/
21:35:00 and thinking hard about py2 vs py3 and what it means to be using a unicode buffer for our upload source...
21:35:11 and then deciding whether that test actually makes any sense to keep
21:36:03 right
21:36:28 i would ask joel, but he's not here :'(
21:36:59 m_kazuhiro: could you take a look at timburke's patch there?
21:38:07 notmyname: please give me some minutes.
21:38:30 m_kazuhiro: sure. I just mean sometime soon. not right this moment
21:38:58 anything else from anyone?
21:39:08 if not, we'll end early (yay!)
21:39:29 again, i'm out through all of next week, so no rush on my account. although if we want a release sooner, i'm also happy to have someone else push the patch over to get it in
21:40:50 oh! are people OK with the change in behavior in https://review.openstack.org/#/c/446142/ ?
21:41:50 i maintain that the exclusions in versioned_writes for DLOs are a bug
21:41:52 clearly acoles is :-)
21:42:16 timburke: i think I came up with another reason why DLOs are not versioned, but we can take it to #openstack-swift
21:42:26 or the patch
21:42:38 and when we pulled versioned_writes out to middleware we went for bug-for-bug compatibility, which was totally the right call
21:42:45 but now i wanna fix the bug
21:42:53 tdasilva: ya, that's fine
21:42:58 ok, then let's close this meeting
21:43:07 thanks for working on swift!
21:43:14 #endmeeting