00:00:21 <thinrichs> #startmeeting CongressTeamMeeting
00:00:22 <openstack> Meeting started Thu Aug 13 00:00:21 2015 UTC and is due to finish in 60 minutes. The chair is thinrichs. Information about MeetBot at http://wiki.debian.org/MeetBot.
00:00:22 <pballand> hi
00:00:23 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
00:00:24 <masahito> Hi
00:00:26 <openstack> The meeting name has been set to 'congressteammeeting'
00:00:30 <veena> Hi
00:00:30 <Yingxin1> hi
00:00:31 <RuiChen> hola, guys
00:00:39 <jwy> hello
00:00:55 <thinrichs> We've got a good crew here right on time. So let's get started.
00:00:56 <Prakash_DataTap> #info Prakash Kanthi
00:01:18 <thinrichs> Let's start with a quick recap of last week's mid-cycle sprint.
00:01:22 <thinrichs> #topic Mid-cycle sprint
00:01:39 <thinrichs> We had 9-10 people working hard on a new distributed architecture.
00:01:54 <thinrichs> I sent out an email to the mailing list with details.
00:02:14 <thinrichs> But the short version is that we decided on...
00:02:23 <thinrichs> running each datasource driver in its own process;
00:02:30 <thinrichs> running each policy engine in its own process;
00:02:36 <thinrichs> running the API in its own process;
00:02:45 <thinrichs> and having them all communicate using oslo.messaging.
00:03:16 <thinrichs> There were also plans for generalizing that, as the need arises, to enable multiple datasources/policy engines to run in a single process.
00:03:40 <thinrichs> Comments/questions? (I'll try to dig up a pointer to the email I sent in the meantime.)
00:04:07 <thinrichs> #link http://lists.openstack.org/pipermail/openstack-dev/2015-August/071653.html
00:04:20 <RuiChen> we will support the multi workers on same datasource drivers?
00:05:00 <thinrichs> What does multi-workers mean?
00:05:12 <RuiChen> multi processes
00:05:43 <thinrichs> I suppose that within a single process we could eventually multi-thread the datasource driver code.
00:05:46 <RuiChen> like nova-conductor
00:06:15 <RuiChen> ok, get it
00:06:17 <thinrichs> I don't know nova-conductor
00:06:39 <thinrichs> But we don't expect to ever need more than 1 datasource driver process per datasource.
00:06:43 <thinrichs> Even for high-availability.
00:06:44 <Yingxin1> multiple policy engines means HA? There is only one engine right now
00:07:09 <thinrichs> If one datasource driver process crashes, we just bring up another one, and it'll pull data immediately.
00:07:35 <thinrichs> Yingxin1: either HA or if we end up with different kinds of policy engines (such as the vm-placement policy engine we experimented with)
00:07:36 <Yingxin1> ok
00:08:03 <thinrichs> For policy-engines, we WILL need multiple replicas (multi-master) to handle HA and high query throughput.
00:08:15 <thinrichs> Each of those policy-engines will run in its own process.
00:08:38 <thinrichs> For HA we'll put those policy-engine processes on different boxes.
00:09:00 <thinrichs> Though you could imagine having multiple policy-engine processes on the same box if all you wanted was high query-throughput.
00:09:28 <thinrichs> Before I forget, there are a bunch of notes on the etherpad.
00:09:30 <thinrichs> #link https://etherpad.openstack.org/p/congress-liberty-sprint
00:09:54 <Yingxin1> get it
00:10:15 <thinrichs> Other questions/comments/suggestions?
00:11:00 <masahito> We try to implement it by Liberty, right?
00:11:37 <masahito> it's just a confirmation for me and others who didn't attend the meet-up.
00:11:42 <thinrichs> Feature-freeze (liberty-3) is Sept 1-3, so there's no real way I see getting it done by liberty.
00:12:34 <thinrichs> For liberty, we will release the existing architecture. For M we'll release the new distributed architecture.
00:13:11 <pballand> I do remember agreeing that we will try to have the code ready by the liberty summit though
00:13:50 <pballand> (I assumed that meant we would have a distributed version available on an alpha basis, but the full existing functionality still working)
00:13:58 <thinrichs> pballand: Oh right. The goal was to have the first draft ready in master by the summit.
00:14:09 <thinrichs> And then make it part of the release in M.
00:14:40 <masahito> Thank you. (when I re-read etherpad there is no description for the timeline so I wanted to confirm it)
00:15:06 <thinrichs> With that, maybe it's time to move on to a discussion about the work-items we produced.
00:15:09 <thinrichs> #topic Blueprints
00:15:29 <thinrichs> The blueprints that came out of the meeting all start with dist-
00:15:36 <thinrichs> #link https://blueprints.launchpad.net/congress
00:15:51 <thinrichs> They're all Medium priority.
00:16:03 <thinrichs> (Typically High priority are the ones we're targeting for the current release.)
00:16:49 <thinrichs> About half of the dist- blueprints don't have assignees.
00:17:22 <thinrichs> A good number of them are for migrating different API modules to use the RPC-style of interaction with the policy engine/datasource drivers.
00:17:42 <thinrichs> All the dist-api- prefixed ones are for the api.
00:17:50 <thinrichs> Those are all fairly small and self-contained.
00:18:44 <thinrichs> Please make sure to sign up before actually starting the work
00:18:50 <thinrichs> so that we don't duplicate effort
00:19:19 <masahito> thinrichs: I can't change assignee because I'm not core.
00:20:26 <thinrichs> masahito: Send me an email and I'll sign you up.
00:20:29 <masahito> thinrichs: I think it's good to send a mail to ML if someone decide to implement it.
00:20:43 <masahito> thinrichs: OK
00:20:58 <thinrichs> masahito: that sounds good, but I'm guessing people will forget.
00:21:10 <Yingxin1> Have we defined a uniform RPC interface, to make dist-api- easier to implement?
00:21:13 <thinrichs> Or won't want to tell everyone they're signing up.
00:21:22 <RuiChen> masahito: I think it's a good idea
00:21:59 <thinrichs> Yingxin1: that's a good thought. Let's discuss that in a couple of minutes.
00:22:14 <pballand> Yingxin1: I am working on the RPC interface as part of the base class in ‘dist-cross-process-dse’
00:23:00 <pballand> thinrichs: ok, will wait to discuss until you say it’s time
00:23:30 <thinrichs> Question: can anyone else change assignees on blueprints? Or can you only change the assignee on blueprints that you have created?
00:23:33 <Yingxin1> pballand: I'll have a look at it
00:23:58 <RuiChen> maybe we should add the dependency between the dist-* blueprint
00:24:10 <masahito> thinrichs: I can only change assignee I've created.
00:24:25 <Yingxin1> masahito: agreed
00:24:40 <thinrichs> masahito: thanks. I don't see a way to let anyone change the assignee on my blueprints.
00:24:43 <masahito> thinrichs: and especially I can change it to only myself.
00:25:27 <thinrichs> Can someone try this one:
00:25:29 <thinrichs> #link https://blueprints.launchpad.net/congress/+spec/dist-api-rpcify-row
00:25:53 <thinrichs> I set it to Approved and set an Approver. If that doesn't work, I don't see another way.
00:26:56 <RuiChen> I can try it dist-api-rpcify-row
00:27:02 <masahito> Your set looks work.
00:28:00 <masahito> launchpad allows me to change whiteboard and workitems.
00:28:17 <thinrichs> I don't see anyone's name showing up under assignee.
00:28:40 <thinrichs> It seems only the owner/core can change that.
00:28:41 <Yingxin1> thinrichs: I still cannot change the Assignee
00:28:47 <thinrichs> Yingxin1: thanks for trying.
00:28:52 <thinrichs> Let's move on.
00:28:57 <thinrichs> #topic RPC interface
00:29:08 <thinrichs> pballand says he's working on something
00:29:40 <thinrichs> I'm guessing it's this:
00:29:42 <thinrichs> #link https://review.openstack.org/#/c/210159/
00:29:56 <pballand> I’ve been testing out various strategies for DseNode using oslo.messaging primitives
00:30:01 <thinrichs> I did some of the preliminary work for the policy-model API
00:30:25 <thinrichs> #link https://review.openstack.org/#/c/210691/
00:30:36 <thinrichs> pballand: please continue - didn't mean to interrupt
00:30:36 <pballand> and also working on the spec for ‘dist-cross-process-dse’
00:30:59 <pballand> the spec isn’t pushed for review yet, because I wanted to test out some things before proposing a design
00:31:22 <pballand> I’ve found one major shortcoming in oslo.messaging
00:31:49 <pballand> it seems that the message bus connection is managed automatically (including reconnects)… this is good
00:32:07 <pballand> the problem is that the application logic isn’t notified when the message bus is disconnected
00:32:28 <pballand> this presents a problem with the design we outlined in the midcycle
00:32:55 <pballand> if a node is disconnected, and oslo.messaging reconnects, the dse doesn’t know that it may have missed messages
00:33:11 <alexsyip> We can detect that with sequence numbers.
00:33:27 <pballand> alexsyip: yes - more on that in a sec
00:34:04 <pballand> I am working on two solutions: 1) chatted with some of the oslo.messaging folks, and am going to send an email to the mailing list to propose getting a trigger for connections and disconnections
00:34:29 <pballand> 2) as alexsyip said: we can use sequence numbers to detect gaps
00:35:05 <pballand> sequence numbers don’t work for services that aren’t sending updates, however, so we will need to have periodic heartbeats
00:35:06 <alexsyip> Does oslo messaging lose messages in any other situation?
00:35:22 <alexsyip> sometimes, these message systems will drop messages under overload conditions.
00:35:46 <pballand> alexsyip: I have yet to determine that, however it seems that the solution in 2) will handle that case as well
00:36:20 <alexsyip> Ok. The clone pattern is meant to deal with lost messages: http://zguide.zeromq.org/php:chapter5#Reliable-Pub-Sub-Clone-Pattern
00:36:39 <pballand> so I’m currently thinking we will ship with design 2, and changes in oslo.messaging will be an optimization if/when they come
00:36:52 <alexsyip> sounds good to me.
00:37:16 <pballand> I am working on a trial of this using oslo.messaging’s RPC interface, and hope to publish a spec by the end of the week
00:37:18 <pballand> that’s it from me
00:37:55 <thinrichs> pballand: sounds good to me too.
00:37:59 <masahito> pballand: sounds great.
00:39:27 <Yingxin1> :)
00:39:39 <thinrichs> Am I right in thinking that this sequence-number issue will be handled at a lower layer than the api-models would worry about?
00:40:11 <thinrichs> That is, when doing the api-modifications we can assume RPC is reliable, right?
00:40:48 <pballand> thinrichs: that’s right - I expect the DseNode class will be a parent for all services on the bus, and it will contain methods that send updates and send full data - the base class will manage adding metadata such as sequence numbers
00:40:50 <alexsyip> you can’t ever expect RPCs to be reliable.
00:41:12 <alexsyip> unless you receive an ack.
00:41:20 <pballand> (that’s right was for the first message, agree with alexsyip’s comment)
00:42:04 <Yingxin1> So I think there could be an ack or a timeout.
00:42:08 <pballand> a given table will have in-order updates up to some point, without the caller needing to worry about sequence numbers
00:42:24 <thinrichs> So when someone is writing the API model that inserts a rule, and we send off an API call but don't hear back, what do we do?
00:42:31 <pballand> but when making an explicit RPC to a service, you need an ack or timeout as Yingxin1 says
00:43:03 <thinrichs> Does the behavior depend on the API call?
00:43:07 <alexsyip> You can ask to see if the rule exists.
00:43:11 <pballand> thinrichs: my initial thought would be to throw a 503 in that case
00:43:24 <thinrichs> pballand: but what if the rule actually got inserted?
00:43:30 <pballand> the caller won’t know if the call succeeded or not, but that’s a common problem
00:43:54 <pballand> the caller can check for the rule, or try the insert again (if/when we support idempotent create)
00:44:04 <thinrichs> pballand: so you're saying never retry - it's the user's problem.
00:44:23 <pballand> thinrichs: well, we retry internally, but only up to some time limit
00:44:46 <pballand> ultimately, it’s always the user’s problem (there can be disconnections other places in the line)
00:44:51 <thinrichs> pballand: is that rolled into the base class, or does the retry logic depend on the particulars of the API call?
00:45:03 <thinrichs> I'm just trying to figure out what abstraction we need to use when modifying the API models.
00:45:12 <veena> thinrichs: yes, there should be a time out limit to handle a request message
00:45:31 <pballand> oslo.messaging has support for that internally
00:45:42 <pballand> #link: I am working on two solutions: 1) chatted with some of the
00:45:45 <pballand> oops
00:45:52 <pballand> #link http://docs.openstack.org/developer/oslo.messaging/rpcclient.html#oslo_messaging.RPCClient.call
00:46:26 <thinrichs> So I should be treating the RPC method implemented in the DseNode base class as something that either returns a value/ack or that times out.
00:46:37 <thinrichs> And then if there's a time-out I return a 503.
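[Editor's sketch, not part of the meeting log] The abstraction thinrichs summarizes here (an RPC that either returns a value/ack or times out, with a timeout surfaced as a 503) could look roughly like the following. Everything in it is an assumption made for illustration: `RpcTimeout` is a hypothetical stand-in for oslo.messaging's `MessagingTimeout`, `FakeRpc` is a stub for the not-yet-written DseNode RPC method, and the service and method names are invented.

```python
class RpcTimeout(Exception):
    """Hypothetical stand-in for oslo_messaging.MessagingTimeout."""


class FakeRpc:
    """Hypothetical stub for the RPC method of the planned DseNode base class."""

    def __init__(self, responsive=True):
        self.responsive = responsive
        self.rules = []

    def call(self, service, method, **kwargs):
        # Either return a value (the ack) or raise a timeout; nothing else.
        if not self.responsive:
            raise RpcTimeout("no reply from %s" % service)
        if method == "insert_rule":
            self.rules.append(kwargs["rule"])
            return {"rule": kwargs["rule"]}
        raise ValueError("unknown method: %s" % method)


def add_rule(rpc, rule):
    """API-model handler sketch: returns an (http_status, body) pair."""
    try:
        result = rpc.call("policy-engine", "insert_rule", rule=rule)
    except RpcTimeout:
        # The outcome is unknown: the rule may or may not have been inserted.
        # The caller has to check for the rule or retry.
        return 503, {"error": "policy engine did not respond in time"}
    return 201, result


print(add_rule(FakeRpc(), "p(x) :- q(x)"))
print(add_rule(FakeRpc(responsive=False), "p(x) :- q(x)"))
```

The 503 branch deliberately makes no claim about whether the insert happened, which matches pballand's point that the caller cannot know and must check or retry.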
00:46:55 <pballand> (from the docs, however, it doesn’t look like the call timeout is configurable, so we may need to implement some more logic)
00:47:19 <veena> pballand: it is configurable
00:47:31 <thinrichs> Lost track of the time.
00:47:39 <thinrichs> I wanted to make 1 quick comment before moving on.
00:47:45 <pballand> thinrichs: I would treat it as synchronous, but could raise MessagingTimeout, RemoteError, MessageDeliveryFailure
00:47:51 <pballand> veena: thanks
00:48:00 <thinrichs> pballand: sounds good.
00:48:14 <thinrichs> In the first edits I made to the policy-model,
00:48:16 <thinrichs> #link https://review.openstack.org/#/c/210691/
00:48:20 <thinrichs> I did 2 things:
00:48:43 <thinrichs> 1. introduced a self.rpc to mimic the one that will belong to the DseNode and made all communication go through that.
00:49:02 <thinrichs> (The implementation is just invoking the policy-engine's methods directly.)
00:49:15 <thinrichs> 2. I moved the database logic out of the API model and into the policy-engine.
00:49:35 <thinrichs> So instead of the API keeping the database and policy engine synchronized, that's left to the policy engine itself.
00:50:10 <thinrichs> None of that relies on having DseNode ready, so we can do all that in parallel with pballand's efforts.
00:50:24 <thinrichs> masahito: you had a question about whether (2) makes sense.
00:50:41 <masahito> thinrichs: yap.
00:51:02 <thinrichs> The question was whether the API should directly talk to the database or whether the API should always talk to one of the other processes to answer questions.
00:51:11 <thinrichs> Is that right?
00:51:14 <masahito> yes
00:51:18 <alexsyip> thinrichs: you could just write directly to the db from the API. Then wait for the policy engine to read from the DB.
00:51:54 <alexsyip> My hunch is that it’s better to talk directly to the db.
00:52:00 <thinrichs> alexsyip: understood, but in the PE case, only the PE knows whether a new rule can be inserted so we need to talk to the PE anyway.
00:53:22 <masahito> IMO, writing access to db is only permitted for PE, reading DB is permitted for all.
00:53:23 <alexsyip> I think it’s not enough to ask the PE
00:53:47 <alexsyip> because there may be two writers going through different PEs that write conflicting writes.
00:54:24 <alexsyip> Oh, maybe not.
00:54:58 <alexsyip> Are you saying the PE writes the transaction, but only the PE knows under what conditions to write the transaction?
00:55:38 <masahito> alexsyip: maybe yes
00:55:50 <thinrichs> alexsyip: yes. The PE is the only one that knows syntactically valid statements, and whether adding a statement would cause cycles.
00:56:09 <thinrichs> Lost track of time again. Let's put this on hold and see if anyone else needs help from the group.
00:56:13 <thinrichs> #topic open discussion
00:56:16 <alexsyip> Are there multiple PEs with different ideas of what is valid?
00:57:31 <thinrichs> Sorry we ran so short on time this week, everyone.
00:57:49 <thinrichs> Lots of energy for the new architecture!
00:57:54 <thinrichs> Great to see!
00:58:06 <thinrichs> No one has anything to ask?
00:58:52 <RuiChen> pleased to see the new architecture in plan
00:58:58 <RuiChen> no from me
00:59:09 <thinrichs> alexsyip: if there were a bunch of rules additions coming in close together, then 2 PEs could get out of sync and then evaluate whether a new rule would create a cycle, I suppose.
00:59:17 <thinrichs> I'm guessing that's going to be rare, but possible.
00:59:40 <thinrichs> It wouldn't be until the sync that the two realized the rules in the DB were actually recursive.
00:59:44 <thinrichs> And so not permitted.
01:00:00 <thinrichs> RuiChen: agreed.
01:00:04 <alexsyip> Yes, so that means the PE is not able to really evaluate at API time.
01:00:15 <thinrichs> Thanks all - we're officially out of time.
01:00:21 <thinrichs> I can continue on #congress for a few minutes.
01:00:25 <thinrichs> #endmeeting
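
[Editor's sketch, not part of the meeting log] The sequence-number and heartbeat idea raised by alexsyip and pballand under the RPC interface topic could be sketched on the receiving side as follows. This is only an illustration under stated assumptions, not the DseNode design: the class name `SequenceTracker`, its method names, the rule that sequence numbers start at 1 and grow by 1 per update, and the rule that a heartbeat repeats the publisher's latest sequence number are all hypothetical.

```python
class SequenceTracker:
    """Hypothetical receiver-side bookkeeping for detecting missed messages."""

    def __init__(self, heartbeat_timeout=30.0):
        self.heartbeat_timeout = heartbeat_timeout
        self.last_seq = {}    # publisher -> last sequence number seen
        self.last_heard = {}  # publisher -> time of last update or heartbeat

    def on_update(self, publisher, seq, now):
        """Record an update; return True if a gap means a full resync is needed."""
        expected = self.last_seq.get(publisher, 0) + 1
        self.last_seq[publisher] = seq
        self.last_heard[publisher] = now
        return seq != expected

    def on_heartbeat(self, publisher, seq, now):
        """Record a heartbeat that repeats the publisher's latest sequence number.

        Returns True if the publisher is ahead of (or behind) what was seen,
        which covers updates lost while the publisher was otherwise quiet.
        """
        missed = seq != self.last_seq.get(publisher, 0)
        self.last_seq[publisher] = seq
        self.last_heard[publisher] = now
        return missed

    def is_silent(self, publisher, now):
        """True if nothing has been heard from the publisher within the timeout."""
        heard = self.last_heard.get(publisher)
        return heard is None or now - heard > self.heartbeat_timeout


tracker = SequenceTracker(heartbeat_timeout=10)
print(tracker.on_update("nova", 1, now=0))     # in order
print(tracker.on_update("nova", 3, now=1))     # update 2 was lost
print(tracker.on_heartbeat("nova", 3, now=2))  # nothing missed since
print(tracker.is_silent("nova", now=20))       # no heartbeat for 18 seconds
```

Heartbeats cover the case pballand points out: a publisher that sends no updates would otherwise give the receiver no way to tell a quiet service from a lost connection.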