00:05:47 #startmeeting CongressTeamMeeting
00:05:48 Meeting started Thu Jun 2 00:05:47 2016 UTC and is due to finish in 60 minutes. The chair is thinrichs. Information about MeetBot at http://wiki.debian.org/MeetBot.
00:05:49 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
00:05:51 The meeting name has been set to 'congressteammeeting'
00:06:01 masahito ekcs ramineni_: courtesy ping
00:06:09 hi
00:06:15 hi
00:06:18 hi all.
00:06:46 I just have status updates on the agenda. Anything else?
00:07:55 One other thing first: the gate
00:07:57 #topic gate
00:08:20 Looking at the reviews, we have a failure on a global requirements update.
00:08:28 #link https://review.openstack.org/#/c/323883/
00:08:40 One of the doctor driver tests is failing.
00:08:57 The error says that it can't find a datasource.
00:09:26 I wonder if this has to do with our translation of datasource IDs to names in the API.
00:09:43 masahito: I added you as a reviewer since you're most familiar with that code
00:10:35 thinrichs: yep, I'll check it.
00:10:49 masahito: great!
00:11:26 ramineni_: One of your patches is failing devstack too, but I couldn't seem to track down the error.
00:11:28 #link https://review.openstack.org/#/c/320306/
00:12:10 That patch has been ready to merge since last week
00:12:21 so I'd be surprised if it had anything to do with the patch
00:12:27 thinrichs: yes, changing to the rabbit driver is not working if started on multiple nodes
00:13:04 ramineni_: I'd like to hear more on that.
00:13:06 #topic status
00:13:06 masahito raised a patch to solve the issue https://review.openstack.org/#/c/321459/
00:13:59 masahito: I had a question on that patch similar to ramineni's…
00:14:23 That patch assigns a different partition to every DseNode, right?
00:14:46 yes
00:14:47 Doesn't that stop them from communicating? They'd all be listening on different topics.
00:15:07 yes
00:16:08 thinrichs: yeah, I think policies must be synchronized across nodes?
00:16:27 policy sync is done via the DB, which is independent.
00:16:44 ekcs: datasource policies are not synchronized via the DB
00:17:17 ekcs: they are stored in memory, right? Not sure of the reason for not storing them in the DB in the code
00:18:16 ramineni_: can you clarify what datasource policies need DSE2 to sync?
00:18:52 masahito: so when we write a script that creates multiple DseNodes, we need to feed them all the same partition_id, right?
00:20:01 masahito: (in a multi-node deployment where, say, the policy engine runs in one DseNode and the datasources run in a separate DseNode)
00:20:02 thinrichs: with multiple PolicyEngines?
00:20:20 oh
00:20:33 in that case, yes.
00:20:52 I forgot that use case.
00:21:03 ekcs: this is the bug we have now
00:21:06 https://bugs.launchpad.net/congress/+bug/1585975
00:21:06 Launchpad bug 1585975 in congress "Fails to start replica on rabbit bus" [Undecided,New] - Assigned to Masahito Muroi (muroi-masahito)
00:21:30 adding to what thinrichs is asking, sometimes we want multiple nodes to be on the same partition, and sometimes we don't. I wonder how we should control that. Right now it seems the patch is taking the node_id as the partition_id. It'd be nice if we could separately specify partition_id, with the default being that all nodes are on the same partition.
00:21:54 But I'm not sure we allow users to deploy Congress with multiple DseNodes and one PE now.
00:22:54 I'm confused about why it matters how many PEs there are.
00:23:04 PE1 and PE2 maintain their own datasource policies in memory,
00:23:06 I understand the reason for ramineni_'s and thinrichs's questions.
00:23:10 Maybe I'm missing something, so let me lay this out.
00:23:29 Suppose we want the PE to run on machine1 and all the datasources to run on machine2.
00:24:02 Then on machine 1 we create a DseNode, and spin up the PE.
00:24:13 And on machine 2 we create a DseNode and spin up all the datasources
00:24:36 The way those 2 DseNodes communicate is via oslo-messaging, which means the DseNodes need to be talking on the same topic.
00:24:46 So that would require them to be given the same partition_id
00:24:53 Right?
00:24:55 right.
00:25:17 Today we don't have a mechanism/script for deploying that way, so it shouldn't be a problem.
00:25:42 I would imagine this bug is due to the fact that the replica SHOULD be assigned a separate partition but it is not.
00:26:05 thinrichs: yes.
00:26:14 yes
00:26:29 ramineni_: does that sound right?
00:27:15 thinrichs: confused :( if it's assigned a separate partition they can't listen on the same topic, right?
00:28:09 In the case of HA we would want them to listen on the same topic, so that any policy engine can serve requests
00:28:10 ramineni_: the replica is intended to be a totally separate instance of Congress …
00:28:12 that does not communicate with the other version of Congress EXCEPT by synchronizing with the DB.
00:28:57 ramineni_: We're not that far along yet. Masahito's change is intended to just fix the existing HA test, not implement the HA/HT arch that ekcs is working on
00:29:36 In effect, we want to handle 2 different use cases for DseNode:
00:30:04 (i) creating a new "root" DseNode that is isolated from all other "root" DseNodes
00:30:18 (ii) creating a new DseNode that communicates with an existing "root" DseNode
00:30:43 thinrichs: oh ok, got it
00:31:26 masahito: does your change allow us to do both use cases?
00:32:04 no
00:32:23 It allows only (i)
00:32:40 I think we need additional work for (ii)
00:33:20 I guess what we want is for the eventlet_server to take an argument that is the name of the DseNode partition
00:33:23 I think it'd be clearer to make it an explicit option to specify a different partition_id, rather than using the node_id as the partition_id. Then we can also build on that to support (ii). But I'm also okay with this as a quick fix for now.
00:33:39 ekcs: agreed
00:34:12 I'd also like us to get a multi-node version of Congress working (the non-replicated kind)
00:34:51 and have scripts/code/something so that we can arrange multiple DseNodes to implement the different HA/HT architectures we're designing
00:35:06 That is, we should do some testing that multiple DseNodes on different machines actually work.
00:37:27 And have a tempest test or two that spins up a multi-node version of Congress and runs tests
00:37:39 makes sense
00:37:49 That seems like the next milestone for the dist-arch
00:38:13 Once that's working, we can begin adding the code for HA/HT
00:38:26 yes
00:38:28 thinrichs: agreed.
00:38:58 We do have the script in scripts/start_process.py that was intended to start up several DseNodes
00:39:24 That was written before we had DseNode, so it'll require some tweaking or rewriting
00:39:49 But it's worth looking through at the very least, since I know it underwent several rounds of iteration
00:39:59 got it.
00:40:38 Anyone want to spearhead writing the tempest tests?
00:42:08 thinrichs: I will look into it
00:42:42 ramineni_: great! Pull us in as you need to.
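
A side note on the partition discussion above (roughly 00:14 through 00:33): the behavior being described is that a DseNode's oslo.messaging topic is tied to its partition_id, so only nodes sharing a partition can hear each other. The toy sketch below is plain Python for illustration only, not the actual DseNode or oslo.messaging API; all class and attribute names here are made up. It shows the two use cases thinrichs listed: generating a fresh partition per node isolates it (use case (i)), while passing an existing partition_id joins that partition (use case (ii)), which also matches ekcs's suggestion of an explicit partition_id option.

    # Toy sketch (hypothetical names, not Congress code): topic routing
    # keyed on partition_id.  Nodes on the same partition share a topic
    # and therefore see each other's messages; nodes on different
    # partitions are isolated.
    import uuid
    from collections import defaultdict

    BUS = defaultdict(list)  # topic -> nodes subscribed to that topic

    class ToyDseNode:
        def __init__(self, node_id, partition_id=None):
            # No partition given -> fresh partition, isolating the node
            # (use case (i)).  An existing partition_id joins that
            # partition (use case (ii)).
            self.node_id = node_id
            self.partition_id = partition_id or uuid.uuid4().hex
            self.topic = 'congress-dse-%s' % self.partition_id
            self.inbox = []
            BUS[self.topic].append(self)

        def broadcast(self, msg):
            for node in BUS[self.topic]:
                if node is not self:
                    node.inbox.append((self.node_id, msg))

    # Same partition: the PE node and the datasources node can talk.
    pe = ToyDseNode('pe-node', partition_id='part-1')
    ds = ToyDseNode('ds-node', partition_id='part-1')
    # Fresh partition: an isolated replica hears nothing from them.
    replica = ToyDseNode('replica-node')

    pe.broadcast('policy update')
    assert ds.inbox and not replica.inbox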
00:42:58 sure
00:43:27 I'd even start by spinning up 2-3 processes on a single machine, where each process has its own DseNode.
00:43:51 That could be the first tempest test.
00:44:06 I don't know if tempest supports multi-node stuff, so that may be the only test.
00:44:23 The script start_process.py was designed for the multi-process, single-node case.
00:44:27 yes, that's what I'm thinking too
00:44:53 the current HA test starts 2 processes with different node IDs
00:46:02 ramineni_: perfect! So we have a model to start from
00:46:23 15 minutes left. Let's discuss HA/HT.
00:46:32 #topic High availability and throughput
00:46:47 ekcs: how's that discussion going?
00:47:10 Good. I reworked the HA spec with more details on the issues people are concerned about. Thanks to all the feedback, I think we're converging toward supporting active-active replication with symmetric nodes in the first phase. Please comment if you think we should take a different approach!
00:47:20 #link: https://review.openstack.org/#/c/318383/
00:47:43 I will be filling in the smaller details this week.
00:48:20 I understand from the spec that we are going for active-active for the PE and active-passive for the datasource nodes, right?
00:48:44 ramineni_: right.
00:50:01 And then what for action execution? Leader election of some sort?
00:50:07 ekcs: but currently concentrating on all processes on a single node?
00:50:59 thinrichs: yes. Not sure there is a consensus yet between a local leader and a global leader.
00:52:15 ramineni_: All services (PE, API, DSDs) in one single-process node. Replicate that node N times, but disable DSDs on non-leaders.
00:52:53 ekcs: "disable DSDs" means stop them from polling as well as taking them off the bus?
00:53:09 thinrichs: yes.
00:53:57 ramineni_: nodes will likely be replicated across different hosts.
00:54:15 ramineni_: up to the deployer, but that's something we need to support.
00:54:43 ok
00:55:13 is this leader election only for action requests?
00:55:30 The crucial bit there seems to be ensuring that (a) Pacemaker pushes leader election info into Congress correctly and (b) Congress properly disables DSDs on non-leaders.
00:55:56 ramineni_: leader election is also needed for disabling DSDs on non-leaders.
00:56:04 Do we know how Pacemaker picks a leader and how non-leaders find out they're non-leaders?
00:56:12 thinrichs: yes.
00:56:33 (Running short on time.)
00:56:35 ekcs: how
00:56:36 ?
00:56:41 ekcs: I thought on other nodes we could start only the API and PE, not the datasources
00:57:01 thinrichs: We write custom scripts (called resource agents) that Pacemaker calls to promote and demote.
00:57:17 like proposed in this patch https://review.openstack.org/#/c/307693/
00:57:19 those scripts can make API calls to tell a node to be the leader
00:57:52 ekcs: so basically we'll need to add API calls that tell a DseNode to disable its datasources and to enable its action-execution
00:58:10 That seems straightforward
00:58:12 yes. Pacemaker picks a leader based on underlying clustering as well as different weights you can configure.
00:58:47 underlying clustering provided by corosync or another service.
00:59:04 ekcs: I think I'm confused on the leader approach; I will add my questions on the spec
00:59:47 I think that seems good to me. The only downside I can see is that we're taking a dependency on Pacemaker even if the user doesn't need action-execution. Without action-execution, we can get by without leader election.
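
To make the promote/demote idea from the exchange above (around 00:57-00:58) concrete: a Pacemaker resource agent would call a Congress API that enables or disables a node's DSDs and action execution when leadership changes. The sketch below is purely illustrative and assumes hypothetical names (NodeLeadershipManager, start_polling, stop_polling, can_execute_actions); the real interface would be whatever the spec under review ends up defining.

    # Hypothetical sketch of the node-side promote/demote hook a
    # Pacemaker resource agent could trigger via the API.  Names are
    # illustrative only, not actual Congress code.
    class NodeLeadershipManager:
        def __init__(self, datasource_drivers):
            self.datasource_drivers = datasource_drivers
            self.is_leader = False

        def promote(self):
            # Called when Pacemaker elects this node leader: start DSD
            # polling and allow the PE to execute actions.
            self.is_leader = True
            for dsd in self.datasource_drivers:
                dsd.start_polling()

        def demote(self):
            # Called on non-leaders: stop polling and refuse action
            # execution so only one replica touches the datasources.
            self.is_leader = False
            for dsd in self.datasource_drivers:
                dsd.stop_polling()

        def can_execute_actions(self):
            return self.is_leader

    # Minimal usage with a stand-in driver, just to show the flow.
    class _FakeDSD:
        def __init__(self):
            self.polling = False
        def start_polling(self):
            self.polling = True
        def stop_polling(self):
            self.polling = False

    mgr = NodeLeadershipManager([_FakeDSD()])
    mgr.promote()   # resource agent: "this replica is now leader"
    assert mgr.can_execute_actions()
    mgr.demote()    # demoted replica stops touching datasources
    assert not mgr.can_execute_actions()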
00:59:47 ekcs: thanks for the very detailed spec, that's really helpful :)
00:59:48 ramineni_: yes, that's what we started out with, but then evolved to a new proposal. The spec discusses and compares both. Look forward to your questions and comments!
01:00:33 thinrichs: yes. In the asymmetric node approach we don't need a global leader, period. But it is more complex in other ways.
01:00:47 But this way we'll support full functionality out of the box with HA/HT.
01:00:54 On the other hand, Pacemaker is essentially required, and leader election comes at low marginal cost
01:01:04 once you've set up Pacemaker.
01:01:17 We can provide an alternative deployment as an option later (e.g., if you need HT datasources or don't need action-execution)
01:01:29 Out of time for today.
01:01:32 Thanks all!
01:01:36 #endmeeting