00:01:52 #startmeeting CongressTeamMeeting
00:01:53 Meeting started Thu May 12 00:01:52 2016 UTC and is due to finish in 60 minutes. The chair is thinrichs. Information about MeetBot at http://wiki.debian.org/MeetBot.
00:01:54 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
00:01:56 The meeting name has been set to 'congressteammeeting'
00:01:57 hi
00:02:24 Today I just have status updates for the agenda.
00:02:32 Anything else to discuss?
00:03:27 #topic status
00:03:31 ramineni: want to start?
00:03:48 thinrichs: sure,
00:04:55 thinrichs: i just started with migration of test-congress this week
00:05:38 thinrichs: and fixed some gate failures
00:05:51 thinrichs: that's it from my side for this week
00:06:26 Great! Getting good unit test coverage for at least the in-memory version of the new arch is important, I'd say.
00:06:35 And the gate is always important to get fixed
00:06:43 masahito: want to go next?
00:07:06 hi all sorry I’m late. just got in from a conference.
00:07:26 sure
00:07:40 ekcs: no problem. Just doing status updates.
00:08:48 I updated the Congress code with the new architecture and it works now!
00:08:55 https://review.openstack.org/#/c/314873/
00:09:28 From link?
00:09:32 wrong link?
00:09:52 And I added code which loads configured datasources.
00:10:43 from this patch, it works https://review.openstack.org/#/c/280793/11
00:11:39 ekcs: sorry, I wrote the wrong link.
00:12:46 Hi all. Still here?
00:12:54 yes
00:12:56 yes
00:13:18 yup
00:13:26 My irc client crashed, and then wouldn't reconnect. I couldn't even ping chat.freenode.net
00:13:35 masahito: the gate job for new arch still fails, i think it is because of datasource_id vs name?
00:13:53 ramineni_: I think so
00:14:27 ramineni_: or other bugs in new_architecture
00:15:13 The name vs. id would definitely cause tests to fail
00:15:20 Could be other problems, of course.
00:15:40 How did we decide to fix that?
00:16:00 Have the API translate from ID to name?
00:16:16 by asking the DB?
00:16:26 I think the consensus was to do that. same way it’s done now.
00:18:40 i think a patch got merged for that, right?
00:18:54 masahito: Did I miss any important discussion about your status update?
00:19:32 I think nothing.
00:20:17 thinrichs: my update is just Congress with new architecture works now!
00:20:27 ramineni: the patch for translating from ID to name?
00:22:32 thinrichs: not getting now, i thought you have kept a patch for it already?
00:23:28 masahito: the whole new architecture works? Even the datasource name/id bug is fixed?
00:24:26 ramineni: I abandoned that patch thinking we were just trying something out.
00:24:45 ramineni: masahito reviewed it and pointed out some correctness problems
00:24:46 thinrichs: https://review.openstack.org/#/c/310597/
00:25:06 thinrichs: it's abandoned
00:25:08 ?
00:25:25 I was just confused. I'll resurrect it and work out the bugs.
00:25:43 thinrichs: oh, 'works' means Congress can launch with DseNode, not pass tempest tests.
00:26:10 masahito: understood
00:26:30 thinrichs: oh ok
00:26:57 ekcs: want to do your status update?
00:27:09 sure.
00:27:35 Thinking through and writing out the details of HA proposal based on our discussions. working with Tim and Andrew from Redhat on that.
00:27:35 Design summarized in a diagram in the etherpad. Comments welcome. https://etherpad.openstack.org/p/newton-congress-availability
00:27:39 Will be putting it together into a spec proposal.
00:27:44 Also will revise the update sequencing patch once we decide whether to do sequencing logic in DseNode or DataService.
00:31:05 ekcs: for the reactive enforcement policy engine, a hot+cold would add quite a bit of complexity b/c we'd need leader election
00:31:39 ekcs: originally I had thought we'd want that to ensure we don't miss executing any actions if the engine crashes and we need to restart it.
00:32:00 But do we really need a hot+cold?
00:33:14 I’m thinking for first step we can just restart action execution policy engine on failure. that gives us maybe 30 seconds or less of downtime?
00:33:40 but if better is needed, then standby would be necessary.
00:33:55 thinrichs, ekcs: restarting the policy engine on the same node might not work every time, right?
00:34:16 Here's my question: are we actually ever going to miss executing an action, even if the policy engine is down for minutes or hours?
00:34:43 All the messages are sent over oslo-messaging with the receiver set to the execution policy engine
00:35:02 Those messages will stay there on the bus for however long it takes to resurrect the policy engine.
00:35:34 When it comes back up, it should read the messages off the bus, and as long as it processes them 1 by 1, execute all the correct actions.
00:36:01 (That's assuming, I suppose, that it doesn't fall so far behind that it never catches up. But that'd be a whole different problem anyway.)
00:36:17 thinrichs1: The way things are done right now, executions will be missed. because a policy engine only looks at latest snapshot on restart. but I can think through whether we can do it differently and leverage the message queue to make sure we don’t miss execution.
00:37:05 Leader-election is hard, so if there's some way we can leverage oslo-messaging to avoid it, that'd be good.
00:37:32 Does anyone know if there are tools in OpenStack to make hot+cold easy to implement?
00:37:35 so you’re saying not missing any execution action is an important requirement. downtime is less important.
00:38:34 I think so, for the execution policy engine at least. The only reason downtime would matter is if the user wanted actions to be executed as soon as the conditions were satisfied.
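[Editor's sketch] The difference raised above between restarting from the latest snapshot and reading queued messages one by one can be illustrated with a small example. This is only a sketch under stated assumptions: the in-memory deque stands in for the message bus (it is not the oslo.messaging API), and the rule, the `pause_vm` action, and all function names are hypothetical, not Congress code.

```python
from collections import deque


def actions_for(state):
    """Hypothetical rule: run 'pause_vm' whenever cpu load exceeds 90."""
    return ["pause_vm"] if state["cpu"] > 90 else []


def recover_from_snapshot(pending):
    """Look only at the most recent update; earlier updates are discarded."""
    if not pending:
        return []
    latest = pending[-1]
    pending.clear()
    return actions_for(latest)


def recover_by_replay(pending):
    """Drain the queue in arrival order, evaluating the rule on every update."""
    executed = []
    while pending:
        executed.extend(actions_for(pending.popleft()))
    return executed


# Updates that arrived while the (hypothetical) engine was down.
backlog = [{"cpu": 50}, {"cpu": 95}, {"cpu": 40}]

snapshot_result = recover_from_snapshot(deque(backlog))
replay_result = recover_by_replay(deque(backlog))

print(snapshot_result)  # []
print(replay_result)    # ['pause_vm']
```

In this toy run the snapshot approach sees only the final state and executes nothing, while the replay approach executes the action triggered by the intermediate update. Whether a real deployment behaves this way depends on how the bus retains messages, which the sketch does not model.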
00:39:05 Hmmm…that was what masahito needed though: sub-second response times
00:39:43 thinrichs1: usually hot+cold is implemented by pacemaker in OpenStack
00:40:15 So pacemaker is responsible for picking a hot and a cold
00:40:26 The other thing is this: my understanding is that we’ll make the necessary changes in congress to do an HA deployment. And we’ll document and test a reference HA deployment. but it’d be up to the operator to actually configure the deployment with all the different tools. is that correct? So to me it’s okay to document how to do a hot standby deployment using say pacemaker. and let that be an optional thing based on user need.
00:41:22 is that your understanding as well?
00:42:02 ekcs: I'm not sure what other projects do. Based on the discussion with the Redhat/Suse guys at the summit, though, that sounds right.
00:42:28 ekcs: yes, but IMO using pacemaker is the de facto standard now.
00:42:45 the end user would need a bunch of external tools to do it. corosync, pacemaker, haproxy, etc.
00:42:53 masahito: yes.
00:43:50 Ideally we'd want to have system tests that test our HA: deploy the HA configuration, shoot some instances, and check that we don't lose functionality.
00:44:05 Not sure how other projects do that though.
00:45:13 Thanks for the update ekcs.
00:45:15 thinrichs1: you mean an automated test?
00:45:21 ekcs: yes
00:45:46 Let's all try to take time to read/comment on ekcs's etherpad proposal.
00:46:04 ekcs: or should we wait, if you're getting a spec ready?
00:46:25 I think quick scan and comment/questions would be helpful.
00:46:41 #action Everyone takes a quick look at the HA
00:47:09 #link https://etherpad.openstack.org/p/newton-congress-availability
00:48:00 For my status update, I've been looking at the vm-placement policy engine
00:48:28 trying to get it working again for the person who mailed us wanting to use it.
00:48:54 Trying to enable it to be spun up like a datasource
00:49:27 just so that it can be used on mitaka/liberty/etc.
00:50:01 I'll also make sure to resurrect and fix up the patch for the datasource name/id bug
00:50:12 That's about it
00:51:17 Anything else today?
00:51:51 on vmplacement,
00:52:36 I’m a little confused as to whether the person wants to use vmplacement policy engine we have, or use his/her own vmplacement code.
00:52:48 That I don't know for sure either.
00:53:14 The vm-placement code wasn't easy to write, so I was assuming they'd want to start with that and tweak it.
00:53:17 ok
00:53:20 But it's hard to say
00:55:34 Seems like that's it for the day.
00:55:37 Thanks all!
00:55:43 thanks!
00:56:20 thanks
00:57:15 #endmeeting
16:01:01 hello
16:01:20 o/
16:01:20 emagana: moo.
16:01:57 anyone here for the net guide meeting?
16:02:02 I am
16:02:07 cool
16:02:57 Sam-I-Am: Error: Can't start another meeting, one is in progress. Use #endmeeting first.
16:03:03 #endmeeting