00:05:50 #startmeeting CongressTeamMeeting
00:05:51 Meeting started Thu Aug 11 00:05:50 2016 UTC and is due to finish in 60 minutes. The chair is thinrichs. Information about MeetBot at http://wiki.debian.org/MeetBot.
00:05:52 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
00:05:54 The meeting name has been set to 'congressteammeeting'
00:06:00 ekcs: hi!
00:06:05 Greetings all!
00:06:20 ekcs: thanks for handling last week's meeting. I looked over the logs—sounded like a good one.
00:06:38 hi
00:06:49 thanks!
00:07:07 masahito is out of town this week
00:07:12 So we can get started.
00:07:17 On my agenda...
00:07:26 1. Extra ATC nominations
00:07:32 2. Status updates
00:07:53 3. Synchronization during API calls discussion
00:07:57 Anything else?
00:08:22 i’d like to talk about
00:08:38 how to test congress HA in tempest.
00:08:53 but maybe only if masahito is here.
00:09:05 4. testing Congress HA in tempest (possibly)
00:09:31 #topic Extra ATC nominations
00:10:04 Typically, Active Technical Contributors are people who have submitted at least 1 patch
00:10:23 but now it's possible to nominate someone who has been an active contributor but who has not submitted any patches
00:10:57 Is there anyone like that we should consider nominating?
00:11:05 * thinrichs digging up link to description
00:11:35 ATCs get free admission to the summit
00:11:42 which is, I think, the main value
00:11:52 besides acknowledging their contributions publicly
00:12:36 Most people have patches. The only person I can think of who wouldn’t have a Congress patch may be bryan_att.
00:13:23 #link http://www.openstack.org/legal/technical-committee-member-policy/
00:14:16 ekcs: That's true. I don't think he's submitted any patches to Congress, but he is actively interested and integrated Congress into OPNFV Copper.
00:14:30 "Contributions might include, but aren't limited to, bug triage, design work, and documentation -- there is a lot of leeway in how teams define contribution for ATC status."
00:14:58 Nova submits people who have co-authored a patch but not been the primary committer
00:15:36 Integrating Congress into OPNFV Copper probably wouldn't do it. That's a great open-source contribution, but not a contribution to OpenStack.
00:15:54 question: is it a project-by-project thing? like basically, if someone has other openstack patches but not congress patches, does it still make any difference whatsoever to nominate them?
00:15:57 He worked on installers though and was trying to get them committed
00:16:07 ekcs: I don't think so
00:16:17 ekcs: I think it's binary: either you're an ATC or you're not.
00:16:36 ekcs: whether you contributed 1000 patches to 10 different projects or 1 patch
00:16:37 thinrichs: Yes, something with Juju, but he may not have been the primary author.
00:17:25 bryan_att is the only person I can think of too. I'll follow up with him to see what he's done and whether he's already an ATC.
00:17:42 Oh, and I guess whether he's an OpenStack Foundation member, which is also required
00:18:26 Moving on…
00:18:28 thinrichs: I'm checking Stackalytics
00:18:39 aimeeu: great!
00:18:57 (well, I do work with him, sort of)
00:19:04 #topic status
00:19:16 ramineni_: want to start?
00:19:41 thinrichs: ok
00:21:00 not much to update; worked on a couple of gate failures and tempest tests
00:21:32 that's it from my side
00:21:45 The gate seems to be green now. Thanks!
00:21:50 Sounded like a challenge to get it working.
00:22:24 actually, in between it was because of a tempest failure, not related to our code
00:22:45 for the other one, ekcs actually worked on it and disabled the test, as it is redundant
00:23:19 Right—saw the disabled one. Sounds like there may be some deeper issues lurking there.
00:23:30 Anything to worry about?
00:24:04 ya, couldn't figure out what the issue is. i thought the locking mechanism in place should fix that; masahito already has a patch for that
00:24:30 it seems like masahito’s patch fixes the issue.
00:24:38 *would* fix the issue.
00:25:44 I imagine we'll find a number of locking issues as we move forward with the new arch
00:26:09 ekcs: want to go next?
00:26:25 lossless DSD failover patch in review. lossless PE failover patch is coming.
00:26:34 Thought about how to do integration testing of a replicated Congress deployment. Maybe we'll discuss it later.
00:26:47 Identified a couple more issues with datasource add/delete/sync when it comes to distributed/replicated deployment. About to document them in bugs.
00:26:53 Worked with ramineni_ on resolving gate issues.
00:27:15 that’s all.
00:27:56 ekcs: it is lossless in the sense that it retries to exec an action?
00:28:25 thinrichs: yea. loose terminology.
00:28:41 ekcs: just making sure I knew which patch you meant
00:29:00 Maybe I missed this when looking at the patch....
00:29:16 do all the PEs retry? Or how do they know when to retry?
00:29:55 all PEs retry.
00:30:07 on every action execution?
00:30:08 the DSD (whenever it comes back online) would acknowledge all “followers” and exec the action from the “leader”
00:30:28 yes.
00:31:02 Got it.
00:31:57 if we’re not careful,
00:32:16 we could run out of greenthreads if lots and lots of actions take place within the retry time.
00:32:26 not an issue if retry is handled by oslo-messaging.
00:32:55 Does oslo-messaging have retry functionality?
00:33:59 it does, but it’s not totally clear how it behaves and whether it fits our needs.
00:34:14 response to your comment: the reason I implemented our own retry logic is the following:
00:34:15 1. Using the built-in retry, it doesn't seem that we can separately specify how long to wait on each try and how long before we give up retrying. From reading the docs, in some situations oslo-messaging may wait the full length of the timeout on the first try, instead of retrying. The exact behavior probably depends on the transport used and other specific conditions.
00:34:16 2. oslo-messaging does not support unbounded retry without a timeout. I made it an option in our retry logic. But maybe we don't want that anyway. (Like maybe if the ExecutionDriver takes a full day to come back, it never makes sense to retry executing the action after that long.)
00:34:59 Understood. Thanks for copying in your response here.
00:35:03 I also asked this on openstack-dev, but haven’t gotten anything helpful.
00:35:10 Hi all, I have a question about oslo-messaging RPC retry. Thanks so much!
00:35:11 Say I set timeout=600 and retry=None (unbounded retry within the time limit): would an rpc call wait 600s on the first try, or would it do multiple retries until 600s of total time has elapsed? What factors does the answer depend on?
00:35:11 Alternatively, if I set timeout=600 and retry=100, would it be 600s of total time before failing? or would it be a 600s timeout for each try?
00:35:22 May need to just read the code and test it myself to figure out.
00:36:00 not clear how to test how many failed retries take place.
00:36:27 Right—retry logic seems hard to test.
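A minimal sketch (not from the meeting) of the "test it myself" experiment ekcs describes above, assuming a broker is configured through the usual oslo.config options; the topic name 'congress-retry-probe' is made up and deliberately has no RPC server behind it, so the elapsed time hints at whether timeout bounds the whole call or each individual try:

    import sys
    import time

    import oslo_messaging
    from oslo_config import cfg

    # Load the standard transport options (e.g. the rabbit URL) from CLI/config.
    cfg.CONF(sys.argv[1:], project='retry-probe')
    transport = oslo_messaging.get_transport(cfg.CONF)
    # Hypothetical topic with no RPC server listening on it.
    target = oslo_messaging.Target(topic='congress-retry-probe')
    client = oslo_messaging.RPCClient(transport, target)

    for timeout, retry in [(10, None), (10, 3)]:
        start = time.time()
        try:
            # With no server on the topic, the call should fail one way or
            # another; how long it takes reveals the timeout/retry interaction.
            client.prepare(timeout=timeout, retry=retry).call({}, 'ping')
        except (oslo_messaging.MessagingTimeout,
                oslo_messaging.MessageDeliveryFailure) as exc:
            print('timeout=%s retry=%s -> %s after %.1fs'
                  % (timeout, retry, type(exc).__name__, time.time() - start))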
Maybe better to look at code/comments for that one.
00:37:17 aimeeu: want to give a status update?
00:37:26 Sure. Unfortunately not much from me this past week due to lots of last-minute meetings. Code reviews, and thanks ekcs for the feedback on the HA Overview guide. #link https://review.openstack.org/#/c/350731/
00:38:02 That's it from me
00:38:56 aimeeu: anything you need from us? Last week you were looking for a feature to add and test. Did you find something?
00:39:16 ekcs gave me some ideas to ponder
00:40:07 This isn't a feature, but we talked about replacing exception.message with str(exception).
00:40:56 Yes, I started on that, but the majority of changes are in the horizon plug-in, so I was waiting until ramineni's patch was merged
00:41:04 Got it.
00:41:30 I'll keep my eye out then, as I'm (slowly) getting back into things after vacation
00:41:55 I can probably finish up the exception patch tomorrow
00:42:07 aimeeu: great!
00:42:22 status update from me…
00:42:32 Trying to get that large patch in that disables Dse1.
00:42:51 Rebased today and addressed lingering comments. Think it's ready to go.
00:43:03 Other than that, trying to get back up to speed after vacation
00:43:17 That's it.
00:43:27 2 items left on the agenda:
00:43:41 synchronization on APIs, and HA testing with tempest
00:44:10 ekcs: since you've spent time thinking about it, I'd suggest HA testing with tempest
00:44:15 unless you want to wait for masahito
00:44:32 i’ll put out what I have now
00:44:38 great
00:44:46 and hear your comments, and see what masahito thinks later.
00:44:49 may be quick.
00:44:54 Last meeting we discussed whether to add new HA tests to the existing tempest job or create a new one.
00:45:02 It's not clear how to test replicated Congress in tempest because:
00:45:03 1. Not sure we can (and how complex it is to) deploy a load balancer in the test env.
00:45:03 2. Not clear how easy it is to kill and restart individual processes in the test env.
00:45:30 My current thinking is this:
00:45:31 For basic functionality testing: deploy 1 API + n PEs + 1 DSDs node, run all the current tempest tests (except the replica-HA test). Some tests would need to be modified to wait/retry because the different instances get out of sync.
00:46:13 Agree with basic functionality
00:46:16 Notice that this is NOT how we recommend deploying. We recommend n API+PE nodes behind a load balancer. But we can test it this way to avoid having to set up a load balancer. Not ideal, but I think it works.
00:46:57 Hang on… missed the n PEs
00:47:16 Does that work?
00:47:37 I thought the API was hard-coded to send to 1 engine
00:47:40 by name
00:48:10 thinrichs: i think it sends locally if the engine exists locally, otherwise over rpc
00:48:19 so it should work
00:48:29 ekcs: right?
00:48:32 how does the API pick which engine to send it to?
00:48:35 Yes, but when we run multiple PEs with the same service name (different node names), oslo-messaging actually just arbitrarily picks one.
00:49:00 Interesting behavior from oslo-messaging
00:49:07 or it sends to multiple and takes the response from the first.
00:49:26 either way should work for tempest purposes.
00:49:36 Another option is to deploy n API+PEs and 1 DSD, like we recommend...
00:49:48 and then write tests that implement the load balancer
00:50:13 which we probably want to do anyway
00:50:34 Such as: write policy to API+PE 1, then run a query on API+PE 2
00:51:03 that’s an option too.
00:51:31 I could imagine having a LB class that simulates the behavior of a LB, for when we want to do that.
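A rough sketch of the simulated-LB class thinrichs imagines above, assuming N API+PE nodes listening on different local ports; the class name, endpoint list, and request paths are hypothetical, not actual Congress test code:

    import itertools

    import requests


    class SimulatedLoadBalancer(object):
        """Round-robin HTTP requests across several Congress API endpoints,
        standing in for a real load balancer in tests."""

        def __init__(self, endpoints):
            # e.g. ['http://127.0.0.1:1789', 'http://127.0.0.1:1790']
            self._endpoints = itertools.cycle(endpoints)

        def request(self, method, path, **kwargs):
            # Each call goes to the next node, so a test that writes policy
            # and then queries it naturally hits different API+PE instances,
            # exercising cross-node synchronization.
            return requests.request(method, next(self._endpoints) + path,
                                    **kwargs)

A test could then POST a rule through one node and poll with GET until the next node in the rotation sees it, which is the write-on-node-1/query-on-node-2 scenario mentioned above.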
00:51:32 but then we get less reuse out of our current tempest tests.
00:51:50 I see
00:52:21 yes, we can do that too. it’s a bit complicated though. I was thinking we would deploy N API+PEs on different ports and load balance between them.
00:52:27 as an option.
00:52:41 For testing different failover scenarios, I'm thinking maybe the best place to start is the unittest environment (similar to test_congress). Again it's not ideal because it's not how things actually get deployed, but it makes it much easier to manipulate and create failure scenarios.
00:52:41 But that's where I want to hear from you guys and masahito to see if I'm mis-estimating the complexity of different things. Like maybe it's easy to manipulate processes in tempest. And maybe it's easy to deploy a load balancer in tempest.
00:53:15 and that’s where we can also do tests simulating LB behavior.
00:53:51 that’s all from me.
00:54:05 I'd say it makes sense to use test_congress tests for more comprehensive testing.
00:54:39 I usually think of tempest tests as checking that deployment and devstack are set up correctly.
00:54:58 And tempest tests parallelism.
00:55:00 thinrichs: spawning new processes and testing with the current tempest tests should be doable
00:55:32 I guess we're already spinning up a new Congress instance inside the tempest tests.
00:55:36 ramineni_: right, that’s an option as well.
00:55:38 So there's the functionality to start new processes
00:55:53 yes
00:55:55 which makes spinning up LBs inside of tempest a possibility
00:56:01 it’s just not clear that it offers any benefit over the test_congress style.
00:56:21 is test_congress running with true parallelism?
00:56:46 Can we effectively test race conditions?
00:57:07 I assume that if you start new processes then it’s parallel.
00:57:18 even if only one core is allocated, you still get to test race conditions.
00:57:38 I'm always in favor of testing inside of test_congress since then we get feedback quicker when we've broken something.
00:57:50 And I think it ends up being more comprehensive.
00:58:03 because the tests are easier to write
00:58:14 yup, that’s my thought too. but want to hear/consider if there are advantages to doing it in tempest.
00:58:16 I agree, they would be more comprehensive
00:58:32 in tempest maybe we can go for basic testing
00:59:24 We definitely want to test the real deployment with LBs, etc. using tempest. Using test_congress for comprehensiveness makes sense to me too.
00:59:43 1 minute left. Any last thoughts?
01:00:00 i thought of adding unit tests for the synchronizer too
01:00:13 https://bugs.launchpad.net/congress/+bug/1609223
01:00:13 ramineni_: Error: Could not gather data from Launchpad for bug #1609223 (https://launchpad.net/bugs/1609223). The error has been logged
01:00:15 ramineni_: that would be cool!
01:00:26 out of time
01:00:34 Thanks all!
01:00:56 thanks!
01:01:17 #endmeeting