19:01:15 #startmeeting Marconi Project Team
19:01:15 hi
19:01:16 Meeting started Thu Jan 17 19:01:15 2013 UTC. The chair is kgriffs. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:17 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:19 The meeting name has been set to 'marconi_project_team'
19:01:37 #topic introductions
19:01:50 OK, folks, let's get this party started
19:02:51 Since this is a new project and involves some folks new to OpenStack, I thought it would be good to start out with a few introductions from the core team and anyone else who's here to talk about Marconi.
19:03:49 First off, I'm Kurt Griffiths with Rackspace in the Atlanta office. I developed the notification bus for our Cloud Backup product.
19:04:08 treeder: you want to go next?
19:04:31 sure
19:05:57 I'm Travis Reeder, CTO of Iron.io, in SF. We built a cloud message queue called IronMQ. Excited to be involved in this project to help define the cloud MQ standard.
19:06:22 (Travis is on the core team for Marconi)
19:06:59 Cool. Anyone else want to say hi? Let's give it a couple minutes and move on to the next topic.
19:07:41 I'm Evan Shaw from Iron.io.
19:07:55 I'm John Hopper, engineer over in the Cloud Integration group. We're working on a unified logging component for OpenStack, and considering the potential applications of Marconi, we're very interested.
19:08:04 Rackspace, San Antonio^
19:08:46 Hey all. Chad Arimura, Iron.io, in SF with Travis.
19:09:00 I'm Jamie Painter in Rackspace Atlanta (Kurt's colleague). Looking forward to contributing to Marconi.
19:09:00 Hi, I am Nithiwat from Rackspace's Blacksburg office. We use a lot of messaging here and are interested in the Marconi project.
19:09:05 I'm Chad Lung, Rackspace Software Developer, most recently on Atom Hopper (atomhopper.org) and helping out John Hopper on the Logging as a Service effort now. I'm in San Antonio.
19:09:17 Hi, I am Oz Akan from Rackspace, Atlanta
19:09:27 Allan Metts, also from Rackspace Atlanta. Happy to see such a great turnout!
19:09:48 #topic Answer any general questions about the project
19:10:27 Fire away, and I'll do my best to answer (or nudge one of my colleagues to answer).
19:10:40 Can you define the initial use cases for the system? The target user audience, etc.?
19:14:08 The high-level use cases revolve around providing a message bus that web app developers can use when deploying on OpenStack cloud(s)...
19:14:33 Two main use cases that I see are:
19:15:20 1. Notifications and RPC between backend systems and user agents through the Internet
19:16:18 2. Distributed job queuing (producer-consumer)
19:16:37 Gotcha. Thank you, that makes it clearer to me.
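To make use case 2 concrete, here is a minimal in-process sketch of the producer-consumer (worker) pattern, using only the Python standard library. It is illustrative only; in the distributed case the local Queue would be replaced by calls to Marconi's HTTP API, which was still a rough draft at the time of this meeting.

    import queue
    import threading

    jobs = queue.Queue()  # in-process stand-in for a Marconi queue

    def worker(name):
        while True:
            try:
                job = jobs.get(timeout=1)
            except queue.Empty:
                return  # no more work
            print(f"{name} processing {job}")
            jobs.task_done()

    # Producer: one system puts messages on the queue
    for i in range(5):
        jobs.put({"job_id": i, "action": "resize_image"})

    # Consumers: other systems (here, just threads) pull them off to process them
    for name in ("worker-1", "worker-2"):
        threading.Thread(target=worker, args=(name,)).start()

    jobs.join()  # block until every job has been processed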
19:17:02 treeder: Anything to add based on your experience?
19:17:44 I like the idea of using a "Home Document" versus WADL. Are the other OpenStack projects using home documents, since the RFC is pretty new? Or is this just something Marconi will have out of the box?
19:17:51 there's quite a few, mind if i post a link to a blog post with top 10 uses?
19:17:55 don't want to be spammy
19:17:59 sounds good
19:18:01 That'd be great
19:18:06 http://blog.iron.io/2012/12/top-10-uses-for-message-queue.html
19:18:16 #info http://blog.iron.io/2012/12/top-10-uses-for-message-queue.html
19:18:37 There's also this one which gets a bit deeper into some things: http://blog.iron.io/2013/01/common-actions-on-messages-in-queue.html
19:18:51 #info http://blog.iron.io/2013/01/common-actions-on-messages-in-queue.html
19:18:57 cool, thanks
19:19:35 the worker pattern is probably the most common though
19:19:47 one system puts messages on the queue
19:20:05 Makes sense - so this is targeting a much wider distribution model than the current queue implementations already in use (RabbitMQ), is that a safe assumption?
19:20:07 another system (could be many different machines) pulls them off to process them
19:20:15 I think so
19:20:22 chadl: The idea for a home document came from mnot and I've been trying to follow his guidance. Thoughts on WADL vs. home documents?
19:20:45 I prefer home documents - they're more concise and easier to parse and process in web-native platforms
19:20:51 For instance, one interesting thing we're seeing, since it's a cloud queue that supports webhooks,
19:20:59 is integration between third-party systems.
19:21:11 That would be unique, especially when looking at the mobile market
19:21:24 ie: service X (stripe, github, or whatever) posts messages to a queue via their webhook support
19:21:41 then you can deal with them later to process them with workers, etc.
19:22:04 rabbit wouldn't work for those scenarios
19:22:12 #agreed Stick with home document
19:22:57 How will the bus handle the potentially vastly different latencies consumers will have? For example: mobile clients may have a much higher per-message latency for platform reasons than my VM hosted in an adjacent VLAN.
19:25:08 Do you think mobile clients would use the service directly or via some kind of mobile push bridge?
19:25:35 do you mean the latency in terms of the mobile user's experience? or how the mq would deal with it technically?
19:25:48 I guess their web browsers will do it via JavaScript if they load up a web app.
19:26:24 kgriffs: the mobile thing is a bit tricky. We'd like them to be able to post directly; the hardest part of that is authentication though
19:26:45 The bridge would make sense. I was curious to know if there's a distinction between retention and replication of events made available to consumers that might drain their queue infrequently. The bridge is probably a more ideal solution since they're two different use-case domains.
19:27:43 #info Need to figure out auth to allow posting directly from mobile
19:27:48 regarding authentication, the mobile app developer would have to embed their authentication tokens into the client
19:28:00 even BaaS haven't figured that out yet
19:28:48 Interesting. We should take that up in depth in a future meeting
19:29:20 #action kgriffs add mobile app auth discussion to a future mtg agenda
19:31:34 Are there any throughput/scaling targets? I've been following some of your blog posts but I'm curious to know if there's a workload range defined.
19:31:48 re the latency question, the combination of customizable message TTLs and the ability to query stats (messages per queue, also trending) would allow app developers to tune for different scenarios (even dynamically)
19:33:01 Right now I've just got some SWAGs re throughput and scaling targets
19:33:29 Haha, fair enough - just curious. I know development is in the early stages
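Regarding the home-document agreement recorded above: a hypothetical sketch of what a Marconi home document might look like, loosely following mnot's json-home draft. The relation names and href template are assumptions for illustration; only the query parameter names come from the draft API quoted later in this meeting.

    import json

    # Hypothetical home document served at the API root (illustrative only;
    # the real document is defined by the Marconi spec, not this sketch).
    home_document = {
        "resources": {
            "rel/messages": {
                "href-template": "/v1/messages{?tags,ts,limit,sort,audit,echo}",
                "hints": {
                    "allow": ["GET", "POST"],
                    "formats": {"application/json": {}},
                },
            }
        }
    }

    print(json.dumps(home_document, indent=2))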
19:34:15 AMQP support or not?
19:34:21 For example, looking at 50ms max turnaround per request (hopefully it will be more like 10-20ms, but we're dealing with Python here, so no promises)
19:34:21 -1
19:34:28 ;)
19:35:33 In the logging service there's a REST component that's loosely defined for downstream event dissemination - the REST interface described by this system seems to fit our needs quite well, hence the curiosity about throughput
19:35:40 I like the idea of keeping Marconi RESTful (to use an abused term), and creating bridges to other systems that may be more appropriate for certain problem domains.
19:36:07 that -1 was for amqp support
19:36:14 I agree with that. AMQP was designed for certain reasons
19:36:43 jhopper: is 50ms max response time good enough for your logging use case?
19:37:15 re throughput, it isn't unreasonable to target millions of tps for a 10-12 box cluster
19:38:11 that's basically 2000 serial requests a second. I'd need to know payload size and how that impacts request/response latency, but I don't see us outstripping your performance. We're looking at a sustained output of 10k+ messages a second, however.
19:39:14 Our numbers are based roughly on Nova's API logging output
19:39:19 Well, Nova in general
19:40:06 #info 10k+ messages/sec
19:40:15 What kind of networking libs are you planning to use? Twisted? Eventlet? Tornado? etc.
19:40:50 Good to know. We should keep that in mind and see how little hardware we can get away with deploying to support that.
19:42:51 What I'm really interested in is how Marconi provides us with a uniform way of transmitting messages. Ceilometer currently pulls a lot of information right out of RabbitMQ - Marconi may be a more ideal, decoupled solution. Even more so if we treat it as the log event distribution mechanism (used in metering, billing, monitoring, etc.)
19:42:52 This is very tentative, but it's looking like gevent with monkey-patched sockets for async requests to auth/storage. I don't think we'll need to do any disk I/O directly from the app (other than error logging).
19:43:53 jhopper: Good point. The Ceilometer team has talked about some frustrations they have had in working with the current RabbitMQ-based RPC mechanism
19:44:29 Indeed. We would like to position our solution as the feeding tube of Ceilometer, and Marconi would drastically simplify that
19:44:51 re library, it will be WSGI. If people want to do push, they can add that as another layer on their own.
19:46:10 I don't want to necessarily position Marconi as a direct competitor to things like RabbitMQ, but if there are use cases where it makes sense to use it, I don't see any reason to stop people either.
19:49:15 P.S. - as for web frameworks, I'm working on one built specifically for these kinds of projects: simple, fast, focused on APIs only (vs. web apps). Preliminary benchmarks show it's about 10x faster than Flask, but that number will probably come down when I do some more realistic benchmarks.
19:49:48 Latency is very important and I'm also interested in efficient use of hardware.
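A minimal sketch of the server approach described above (gevent with monkey-patched sockets in front of a plain WSGI app). This is illustrative only, not Marconi's actual code; the route and payload are placeholders.

    from gevent import monkey
    monkey.patch_all()  # make socket I/O (e.g., calls to auth/storage) cooperative

    import json
    from gevent.pywsgi import WSGIServer

    def app(environ, start_response):
        # Placeholder handler; a real app would dispatch on PATH_INFO and method.
        body = json.dumps({"status": "ok"}).encode("utf-8")
        start_response("200 OK", [("Content-Type", "application/json"),
                                  ("Content-Length", str(len(body)))])
        return [body]

    if __name__ == "__main__":
        WSGIServer(("127.0.0.1", 8000), app).serve_forever()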
19:50:17 Interesting. Is the framework going to be Python 3 compatible?
19:51:04 Part of my concern with the current dependency on RabbitMQ for Ceilometer is that it requires the Ceilometer instance to be hosted locally to that AMQP instance. This means that having a unified global view requires another aggregation system (another Ceilometer, perhaps) - not ideal in my opinion.
19:51:14 Yes, I've done some preliminary work on Python 3 compat, but there is more to do. Once the library stabilizes, that will be next on the todo list.
19:51:15 https://github.com/racker/falcon
19:51:36 I'll take a look at it. We're using Pecan at the moment to design our REST interfaces.
19:51:43 interesting - definitely less than ideal
19:52:32 Doug & Co. were touting that at the summit - I will run it through its paces as well.
19:52:57 (The AMQP situation is less than ideal, not Pecan)
19:53:25 kgriffs: looks nice
19:53:28 Yeah, that's the trick with things like AMQP and DDS
19:53:44 Check. I find Pecan a little under-developed still, but usable and better than Flask. The models in it are pretty straightforward and the framework is tiny.
19:54:03 I'd love to hear your perspective as well as see how the framework behaves when put into the grinder.
19:54:10 ^ in the future, that is
19:54:16 treeder: Thanks. I sort of got strong-armed into doing this in python, so I'm using just about every trick in the book to get performance up.
19:54:58 jhopper: Sure thing. Pecan is another framework more targeted at web apps rather than just APIs, but that doesn't mean you can't do both with it.
19:54:59 jhopper: quick question, for your use case, is it ok to lose some messages in a failure scenario?
19:55:28 # info https://github.com/racker/falcon
19:55:29 or do you need guarantees that all messages are safe
19:55:32 ?
19:55:35 #info https://github.com/racker/falcon
19:56:02 Since many of the events will be utilized for alerts and usage, redundancy and message durability are a must
19:56:02 #action kgriffs Kick the tires on Pecan
19:56:22 Do you require cross-DC replication?
19:56:32 ok, so in your rabbit install, you are clustering it?
19:56:49 and using persistence?
19:56:54 Is there a plan down the road to have a web UI peering into Marconi, kind of like how RabbitMQ has its own?
19:57:40 Cross-DC replication is not required but would be ideal. I don't know many of the specifics of the RabbitMQ cluster itself, but there is message durability implied, though not necessarily flushing.
19:57:50 ChadL: Good idea. No one has talked about a standalone admin surface
19:58:14 about durability, AWS says that even though it won't happen often, some messages in SQS might get lost
19:58:22 Well, now that I think about it, cross-DC responsibilities may be better satisfied by a datastore
19:58:24 ChadL: I assume a Horizon plugin is in the future, but it would be nice to have something standalone as well.
19:58:44 so it might be very difficult to have a fully redundant web-scale queue service
19:59:17 #action kgriffs Add admin web UI to list of future features
20:00:36 You can replicate across DCs but it slows things down. What do you think about clients being able to specify message durability? If they want something super-durable but slower throughput, they can choose that.
20:01:25 #topic winding down the discussion
20:01:36 Looks like we are running short on time
20:02:10 I will shift the remaining agenda items to future meetings.
20:02:13 That would be interesting. It might make things more flexible if the client can determine if durability is important to it. I'd like to explore that more, but I like the concept.
20:02:16 Cool
20:02:41 kgriffs: can we talk about the spec a bit?
20:03:36 Sure. I just didn't want to keep people around if they've got other meetings, but if some folks wanna stick around a bit, I don't mind.
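For anyone kicking the tires on the Falcon framework linked above: a minimal resource written against a recent Falcon release (the library's API has evolved since this meeting; early versions used falcon.API() rather than falcon.App()). The route and payload are illustrative only, not part of Marconi.

    import falcon
    from wsgiref.simple_server import make_server

    class HealthResource:
        """Illustrative resource; not part of Marconi."""

        def on_get(self, req, resp):
            resp.media = {"status": "ok"}  # Falcon serializes this to JSON
            resp.status = falcon.HTTP_200

    app = falcon.App()  # a WSGI callable; any WSGI server can host it
    app.add_route("/v1/health", HealthResource())

    if __name__ == "__main__":
        make_server("127.0.0.1", 8000, app).serve_forever()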
20:03:54 The spec in general, or the API?
20:04:10 well both I suppose
20:05:20 #topic Discuss the spec rough draft
20:05:23 #info http://wiki.openstack.org/marconi/specs/grizzly
20:05:33 two things I'm interested in discussing
20:05:38 tags
20:05:56 vs a concrete queue name
20:06:19 #info API blueprint - http://wiki.openstack.org/marconi/specs/api/v1
20:06:31 and the options to Get Messages
20:06:33 GET {base_url}/messages{?tags,ts,limit,sort,audit,echo}
20:06:52 OK. Let's take them one at a time
20:07:09 So, tags and/or concrete queue name
20:07:25 #topic tags and/or concrete queues
20:08:20 So, I know we discussed previously in a G+ hangout that having queue names offers some benefits.
20:08:32 ya
20:08:38 so a few of those benefits
20:08:43 While we *could* do everything with tags, it isn't very pragmatic
20:09:04 well just to reiterate, tags essentially feels like the user has 1 single massive queue where each message is tagged
20:09:15 with one or more tags
20:09:16 tags will come with a hefty performance hit
20:09:19 and replication hit
20:09:26 yep, for sure
20:09:39 I love them, don't get me wrong
20:10:08 so 1) scaling with tags would be very hard
20:10:50 2) sharing queues - ie: different rights/access to queues for different users/parties
20:11:05 i suppose you could assign rights to tags, but tags feel more ad-hoc
20:11:17 They're really powerful for implying event types and interest - if Marconi had a poll-push model then they might make more sense in that you could actively filter. However I don't think removing the construct is a good idea either - tags are powerful descriptors. It might be enough to just have them as part of the event schema and let downstream poller decide interest
20:11:35 I agree, there's an impedance mismatch with the mental model
20:12:47 Sharing queues makes sense to me but authentication and authorization would have to be defined first before visiting what it means to host access for a shared queue
20:13:09 If AuthN/Z are offloaded by a proxy then permissions at the queue level might not matter at all?
20:13:37 guess it depends how fine grained the auth can be
20:14:05 This is true
20:14:10 Seems like auth would be more expensive if it were based on a tag set
20:14:22 kgriffs: yes, it would be
20:14:46 If the bus doesn't really do any kind of tag introspection then using tags for permissions (while interesting) would require additional effort
20:15:04 What do you guys think about having concrete queues as well as some limited number of tags purely for filtering?
20:15:04 if you can assign tokens to a particular queue, then you can give that token to a third party, embed it in a mobile app, or whatever and be sure it can only access messages in that queue
20:15:24 that's an interesting concept
20:15:30 Indeed
20:15:40 generate an API token that is tied to a specific app and a specific queue
20:15:41 the idea of a global queue scares me
20:15:42 ;)
20:15:53 ya, exactly
20:16:21 #action kgriffs Add generating an API token that is tied to a specific app and a specific queue to the feature proposals list
20:16:22 Having concrete queues makes sense to me. If you want to do tag comprehension for filtering input into the queue, I think it would be interesting but that model also opens up a lot of extra questions.
20:16:48 jhopper: agree
20:17:02 although, for first version, I would highly recommend no tagging
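However the tags question lands, here is a rough consumer-side sketch of the draft "Get Messages" operation quoted above (GET {base_url}/messages{?tags,ts,limit,sort,audit,echo}). Only the query parameter names come from the draft; the base URL, auth header, and response shape are placeholders, and the tags parameter may not survive this discussion.

    import json
    import urllib.parse
    import urllib.request

    BASE_URL = "http://localhost:8888/v1"  # placeholder endpoint
    AUTH_TOKEN = "example-token"           # placeholder; real auth is still TBD

    def get_messages(limit=10, echo=False):
        query = urllib.parse.urlencode({"limit": limit, "echo": str(echo).lower()})
        req = urllib.request.Request(
            f"{BASE_URL}/messages?{query}",
            headers={"X-Auth-Token": AUTH_TOKEN, "Accept": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            # Assumes a JSON body; the actual response format was not defined here.
            return json.loads(resp.read().decode("utf-8"))

    if __name__ == "__main__":
        print(get_messages(limit=5))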
20:17:44 Can a queue impart tags to messages published to it?
20:17:54 The sticking point is that we need it for Cloud Backup and I was hoping to use that as a guinea pig
20:18:01 Ah, I see.
20:18:02 you need tags?
20:18:09 just one
20:18:17 can you explain the use case?
20:18:30 Sure.
20:19:24 If the control panel wants to communicate with a backup agent installed on someone's server, it uses RSE...
20:19:35 For example...
20:21:18 …actually, I'm jumping ahead of myself, but I think it would be possible to do what Cloud Backup needs by allocating two queues per agent. The downside is you are putting 2x the load on the servers and using 2x bandwidth, etc.
20:21:27 …so back to my example
20:21:51 When a customer logs in, the control panel checks for recent heartbeat events
20:21:53 i think I get it already, you want some sort of fanout?
20:22:03 one message in, two consumers?
20:22:06 yeah, you probably know where I'm going
20:22:23 so we tell people to use two queues, just like you said
20:22:27 when someone logs in, the CP checks for heartbeat events
20:22:39 it does this at an account-wide level
20:23:16 There are other scenarios where the CP wants to talk just to a particular agent
20:23:38 anyway, depending on what's going on, you've got fanout in one direction or the other.
20:23:48 right
20:24:06 would using multiple queues work just as well?
20:24:13 besides the fact you'd have to post multiple times
20:24:35 Event types are all multiplexed across a single queue, essentially, and the queues are only used to namespace by account and agent
20:24:37 It might be expensive since you could potentially have a queue for each host-agent, of which there may be hundreds
20:24:51 Yes, you could, it just doubles the load, also for polling.
20:25:01 I see, so you'd use tags to demux the queue?
20:25:11 there are actually going to be hundreds of thousands of agents
20:26:00 Possibly, but I think the real benefit is to add one or two nested namespaces in order to provide an affordance that can be leveraged for stuff like fanout.
20:26:38 In any case, we could do v1.0 without tags and folks could use multiple queues, but I think it makes sense to do it at some point.
20:27:03 I ran into this thought game when it came to AtomNuke - this type of filtering, in my opinion, should probably be part of the responsibility of the poller or an intermediate bridge.
20:27:17 Of course, once the ability is there, who knows how people will leverage it? People do surprising things.
20:27:31 But again, there's a lot of +/- to the argument. That's a fair question too, kgriffs
20:27:52 jhopper: food for thought.
20:28:39 Probably not the poller; consider a web app doing the filtering. JavaScript isn't exactly a speed demon.
20:28:48 kgriffs: for your tagged example
20:29:07 Well, an intermediate bridge - maybe not the end client, but rather a client from the queue's perspective
20:29:20 But maybe a proxy approach. Although depending on the storage driver, it may be more efficient to just do it as part of the query, assuming a finite number of tags (say, 2-3)
20:29:23 if agent A takes a message, how does it even get to the two consumers?
20:30:09 That's a good question too. Hrm.
20:30:28 you'd have to have two messages anyways
20:30:39 Not sure I follow - if the agent gets a message, it *is* the consumer, right?
20:30:58 i thought you wanted one message to get to two consumers?
20:31:08 oic
20:31:54 Well, the scenario is the CP wants to broadcast an event to all agents that Bob is running under his cloud account.
20:32:17 So, agents need to listen on the broadcast queue, as well as another queue for RPC-type stuff.
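An in-process sketch of one reading of the two-queues-per-agent idea being discussed: each agent polls a broadcast queue plus its own RPC queue, and the control panel fans out broadcasts by posting once per agent ("you'd have to post multiple times"). Queue names and message shapes are illustrative only; the real deployment would use Marconi queues over HTTP.

    import queue

    AGENTS = ["agent-1", "agent-2"]

    # Two queues per agent: one for account-wide broadcasts, one for direct RPC.
    broadcast_q = {agent: queue.Queue() for agent in AGENTS}
    rpc_q = {agent: queue.Queue() for agent in AGENTS}

    def broadcast(event):
        # Fan-out by posting a copy to every agent's broadcast queue.
        for agent in AGENTS:
            broadcast_q[agent].put(event)

    def send_rpc(agent_id, command):
        rpc_q[agent_id].put(command)

    def poll(agent_id):
        # One polling pass: the agent drains both of its queues.
        messages = []
        for q in (broadcast_q[agent_id], rpc_q[agent_id]):
            while not q.empty():
                messages.append(q.get_nowait())
        return messages

    broadcast({"event": "heartbeat-config-changed"})
    send_rpc("agent-1", {"command": "start-backup"})
    print(poll("agent-1"))  # sees the broadcast and its RPC command
    print(poll("agent-2"))  # sees only the broadcast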
20:32:27 Unless you can do a tag
20:32:46 ok
20:32:57 so even in that scenario, I think two queues is probably better than tags anyways
20:33:13 unless you wanted to ensure that EVERY tag gets a message somehow
20:33:19 but that's adding some serious complexity
20:33:32 Tags imply a lot of routing logic
20:33:54 and delivery logic
20:33:55 I think that kgriffs' case makes sense; it's very useful in my opinion - just hard to implement.
20:34:03 In RSE we basically have the notion of a single tag, AKA subqueue
20:34:12 Well, delivery logic would be simply placing a message reference into the relevant queue
20:34:18 It works pretty well. Things would get more complicated with N tags
20:34:39 This is true, unlimited taxonomies produce only headaches
20:35:40 I guess you need the ability to say "give me everything that has no tags and give me everything just with this tag set" in a single query
20:36:29 so, I don't query /foo?tag=bar AND /foo in separate requests. Otherwise might as well just use two queues
20:39:54 maybe fanout support would be good?
20:40:04 Going the other way, if an agent were to post to /account with tag=agent, the CP could decide whether it wants to listen to all events (in the case of a dashboard page) and just get everything regardless of tags for /account, or, if you are looking at a single agent, it would then request /foo?tag=agent
20:41:13 eg:
20:41:44 hmmm, need to think about that a bit more
20:42:21 the broadcast use case is interesting
20:42:46 we have that with our Push Queues, but not for pull queues
20:43:19 can we continue that discussion next week?
20:43:44 Sounds good. We've been at it for quite a while now. :D
20:43:49 I'd like that - there's a lot to think about
20:44:13 good first meeting though!
20:44:31 Definitely. I wasn't sure what to expect, but I think we're off to a good start
20:44:35 ya, definitely
20:44:36 I'm excited. This all sounds really promising and I like the API.
20:44:46 good stuff guys!
20:44:53 yeah, the API looks great so far
20:45:26 OK. Let's sleep on some of these things we've been discussing and follow up next week.
20:46:33 alright, cheers guys, good meeting you all and talk to you next week.
20:46:40 take care
20:46:43 #endmeeting