19:01:15 #startmeeting Marconi Project Team
19:01:15 hi
19:01:16 Meeting started Thu Jan 17 19:01:15 2013 UTC. The chair is kgriffs. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:17 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:19 The meeting name has been set to 'marconi_project_team'
19:01:37 #topic introductions
19:01:50 OK, folks, let's get this party started
19:02:51 Since this is a new project and involves some folks new to OpenStack, I thought it would be good to start out with a few introductions from the core team and anyone else who's here to talk about Marconi.
19:03:49 First off, I'm Kurt Griffiths with Rackspace in the Atlanta office. I developed the notification bus for our Cloud Backup product.
19:04:08 treeder: you want to go next?
19:04:31 sure
19:05:57 I'm Travis Reeder, CTO of Iron.io, in SF. We built a cloud message queue called IronMQ. Excited to be involved in this project to help define the cloud MQ standard.
19:06:22 (Travis is on the core team for Marconi)
19:06:59 Cool. Anyone else want to say hi? Let's give it a couple minutes and move on to the next topic.
19:07:41 I'm Evan Shaw from Iron.io.
19:07:55 I'm John Hopper, engineer over in the Cloud Integration group. We're working on a unified logging component for OpenStack, and considering the potential applications of Marconi, we're very interested.
19:08:04 Rackspace, San Antonio^
19:08:46 Hey all. Chad Arimura, Iron.io, in SF with Travis.
19:09:00 I'm Jamie Painter in Rackspace Atlanta (Kurt's colleague). Looking forward to contributing to Marconi.
19:09:00 Hi, I am Nithiwat from Rackspace's Blacksburg office. We use a lot of messaging here and are interested in the Marconi project.
19:09:05 I'm Chad Lung, Rackspace Software Developer, most recently on Atom Hopper (atomhopper.org) and helping out John Hopper on the Logging as a Service effort now. I'm in San Antonio.
19:09:17 Hi, I am Oz Akan from Rackspace, Atlanta
19:09:27 Allan Metts, also from Rackspace Atlanta. Happy to see such a great turnout!
19:09:48 #topic Answer any general questions about the project
19:10:27 Fire away, and I'll do my best to answer (or nudge one of my colleagues to answer).
19:10:40 Can you define the initial use cases for the system? The target user audience, etc.?
19:14:08 The high-level use cases revolve around providing a message bus that web app developers can use when deploying on OpenStack cloud(s)...
19:14:33 Two main use cases that I see are:
19:15:20 1. Notifications and RPC between backend systems and user agents through the Internet
19:16:18 2. Distributed job queuing (producer-consumer)
19:16:37 Gotcha. Thank you, that makes it clearer to me.
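To make use case 2 concrete, here is a minimal in-process sketch of the producer-consumer (worker) pattern, using only the Python standard library. It is illustrative only; in the distributed case the local Queue would be replaced by calls to Marconi's HTTP API, which was still a rough draft at the time of this meeting.

    import queue
    import threading

    jobs = queue.Queue()  # in-process stand-in for a Marconi queue

    def worker(name):
        while True:
            try:
                job = jobs.get(timeout=1)
            except queue.Empty:
                return  # no more work
            print(f"{name} processing {job}")
            jobs.task_done()

    # Producer: one system puts messages on the queue
    for i in range(5):
        jobs.put({"job_id": i, "action": "resize_image"})

    # Consumers: other systems (here, just threads) pull them off to process them
    for name in ("worker-1", "worker-2"):
        threading.Thread(target=worker, args=(name,)).start()

    jobs.join()  # block until every job has been processed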
19:17:02 treeder: Anything to add based on your experience?
19:17:44 I like the idea of using a "Home Document" versus WADL. Are the other OpenStack projects using home documents, since the RFC is pretty new? Or is this just something Marconi will have out of the box?
19:17:51 there's quite a few, mind if i post a link to a blog post with top 10 uses?
19:17:55 don't want to be spammy
19:17:59 sounds good
19:18:01 That'd be great
19:18:06 http://blog.iron.io/2012/12/top-10-uses-for-message-queue.html
19:18:16 #info http://blog.iron.io/2012/12/top-10-uses-for-message-queue.html
19:18:37 There's also this one which gets a bit deeper into some things: http://blog.iron.io/2013/01/common-actions-on-messages-in-queue.html
19:18:51 #info http://blog.iron.io/2013/01/common-actions-on-messages-in-queue.html
19:18:57 cool, thanks
19:19:35 the worker pattern is probably the most common though
19:19:47 one system puts messages on the queue
19:20:05 Makes sense - so this is targeting a much wider distribution model than the current queue implementations already in use (RabbitMQ), is that a safe assumption?
19:20:07 another system (could be many different machines) pulls them off to process them
19:20:15 I think so
19:20:22 chadl: The idea for a home document came from mnot and I've been trying to follow his guidance. Thoughts on WADL vs. home documents?
19:20:45 I prefer home documents - they're more concise and easier to parse and process in web-native platforms
19:20:51 For instance, one interesting thing we're seeing, since it's a cloud queue that supports webhooks,
19:20:59 is integration between third-party systems.
19:21:11 That would be unique, especially when looking at the mobile market
19:21:24 ie: service X (stripe, github, or whatever) posts messages to a queue via their webhook support
19:21:41 then you can deal with them later to process them with workers, etc.
19:22:04 rabbit wouldn't work for those scenarios
19:22:12 #agreed Stick with home document
19:22:57 How will the bus handle the potentially vastly different latencies consumers will have? For example: mobile clients may have a much higher per-message latency for platform reasons than my VM hosted in an adjacent VLAN.
19:25:08 Do you think mobile clients would use the service directly or via some kind of mobile push bridge?
19:25:35 do you mean the latency in terms of the mobile user's experience? or how the mq would deal with it technically?
19:25:48 I guess their web browsers will do it via JavaScript if they load up a web app.
19:26:24 kgriffs: the mobile thing is a bit tricky. We'd like them to be able to post directly; the hardest part of that is authentication though
19:26:45 The bridge would make sense. I was curious to know if there's a distinction between retention and replication of events made available to consumers that might drain their queue infrequently. The bridge is probably a more ideal solution since they're two different use-case domains.
19:27:43 #info Need to figure out auth to allow posting directly from mobile
19:27:48 regarding authentication, the mobile app developer would have to embed their authentication tokens into the client
19:28:00 even BaaS haven't figured that out yet
19:28:48 Interesting. We should take that up in depth in a future meeting
19:29:20 #action kgriffs add mobile app auth discussion to a future mtg agenda
19:31:34 Are there any throughput/scaling targets? I've been following some of your blog posts but I'm curious to know if there's a workload range defined.
19:31:48 re the latency question, the combination of customizable message TTLs and the ability to query stats (messages per queue, also trending) would allow app developers to tune for different scenarios (even dynamically)
19:33:01 Right now I've just got some SWAGs re throughput and scaling targets
19:33:29 Haha, fair enough - just curious. I know development is in the early stages
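Regarding the home-document agreement recorded above: a hypothetical sketch of what a Marconi home document might look like, loosely following mnot's json-home draft. The relation names and href template are assumptions for illustration; only the query parameter names come from the draft API quoted later in this meeting.

    import json

    # Hypothetical home document served at the API root (illustrative only;
    # the real document is defined by the Marconi spec, not this sketch).
    home_document = {
        "resources": {
            "rel/messages": {
                "href-template": "/v1/messages{?tags,ts,limit,sort,audit,echo}",
                "hints": {
                    "allow": ["GET", "POST"],
                    "formats": {"application/json": {}},
                },
            }
        }
    }

    print(json.dumps(home_document, indent=2))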
19:34:15 AMQP support or not?
19:34:21 For example, looking at 50ms max turnaround per request (hopefully it will be more like 10-20ms, but we're dealing with Python here, so no promises)
19:34:21 -1
19:34:28 ;)
19:35:33 In the logging service there's a REST component that's loosely defined for downstream event dissemination - the REST interface described by this system seems to fit our needs quite well, hence the curiosity about throughput
19:35:40 I like the idea of keeping Marconi RESTful (to use an abused term), and creating bridges to other systems that may be more appropriate for certain problem domains.
19:36:07 that -1 was for amqp support
19:36:14 I agree with that. AMQP was designed for certain reasons
19:36:43 jhopper: is 50ms max response time good enough for your logging use case?
19:37:15 re throughput, it isn't unreasonable to target millions of tps for a 10-12 box cluster
19:38:11 that's basically 2000 serial requests a second. I'd need to know payload size and how that impacts request/response latency, but I don't see us outstripping your performance. We're looking at a sustained output of 10k+ messages a second, however.
19:39:14 Our numbers are based roughly on Nova's API logging output
19:39:19 Well, Nova in general
19:40:06 #info 10k+ messages/sec
19:40:15 What kind of networking libs are you planning to use? Twisted? Eventlet? Tornado? etc.
19:40:50 Good to know. We should keep that in mind and see how little hardware we can get away with deploying to support that.
19:42:51 What I'm really interested in is how Marconi provides us with a uniform way of transmitting messages. Ceilometer currently pulls a lot of information right out of RabbitMQ - Marconi may be a more ideal, decoupled solution. Even more so if we treat it as the log event distribution mechanism (used in metering, billing, monitoring, etc.)
19:42:52 This is very tentative, but it's looking like gevent with monkey-patched sockets for async requests to auth/storage. I don't think we'll need to do any disk I/O directly from the app (other than error logging).
19:43:53 jhopper: Good point. The Ceilometer team has talked about some frustrations they have had in working with the current RabbitMQ-based RPC mechanism
19:44:29 Indeed. We would like to position our solution as the feeding tube of Ceilometer, and Marconi would drastically simplify that
19:44:51 re library, it will be WSGI. If people want to do push, they can add that as another layer on their own.
19:46:10 I don't want to necessarily position Marconi as a direct competitor to things like RabbitMQ, but if there are use cases where it makes sense to use it, I don't see any reason to stop people either.
19:49:15 P.S. - as for web frameworks, I'm working on one built specifically for these kinds of projects: simple, fast, focused on APIs only (vs. web apps). Preliminary benchmarks show it's about 10x faster than Flask, but that number will probably come down when I do some more realistic benchmarks.
19:49:48 Latency is very important and I'm also interested in efficient use of hardware.
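A minimal sketch of the server approach described above (gevent with monkey-patched sockets in front of a plain WSGI app). This is illustrative only, not Marconi's actual code; the route and payload are placeholders.

    from gevent import monkey
    monkey.patch_all()  # make socket I/O (e.g., calls to auth/storage) cooperative

    import json
    from gevent.pywsgi import WSGIServer

    def app(environ, start_response):
        # Placeholder handler; a real app would dispatch on PATH_INFO and method.
        body = json.dumps({"status": "ok"}).encode("utf-8")
        start_response("200 OK", [("Content-Type", "application/json"),
                                  ("Content-Length", str(len(body)))])
        return [body]

    if __name__ == "__main__":
        WSGIServer(("127.0.0.1", 8000), app).serve_forever()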
19:50:17 Interesting. Is the framework going to be Python 3 compatible?
19:51:04 Part of my concern with the current dependency on RabbitMQ for Ceilometer is that it requires the Ceilometer instance to be hosted locally to that AMQP instance. This means that having a unified global view requires another aggregation system (another Ceilometer, perhaps) - not ideal in my opinion.
19:51:14 Yes, I've done some preliminary work on Python 3 compat, but there is more to do. Once the library stabilizes, that will be next on the todo list.
19:51:15 https://github.com/racker/falcon
19:51:36 I'll take a look at it. We're using Pecan at the moment to design our REST interfaces.
19:51:43 interesting - definitely less than ideal
19:52:32 Doug & Co. were touting that at the summit - I will run it through its paces as well.
19:52:57 (The AMQP situation is less than ideal, not Pecan)
19:53:25 kgriffs: looks nice
19:53:28 Yeah, that's the trick with things like AMQP and DDS
19:53:44 Check. I find Pecan a little under-developed still, but usable and better than Flask. The models in it are pretty straightforward and the framework is tiny.
19:54:03 I'd love to hear your perspective as well as see how the framework behaves when put into the grinder.
19:54:10 ^ in the future, that is
19:54:16 treeder: Thanks. I sort of got strong-armed into doing this in python, so I'm using just about every trick in the book to get performance up.
19:54:58 jhopper: Sure thing. Pecan is another framework more targeted at web apps rather than just APIs, but that doesn't mean you can't do both with it.
19:54:59 jhopper: quick question, for your use case, is it ok to lose some messages in a failure scenario?
19:55:28 # info https://github.com/racker/falcon
19:55:29 or do you need guarantees that all messages are safe
19:55:32 ?
19:55:35 #info https://github.com/racker/falcon
19:56:02 Since many of the events will be utilized for alerts and usage, redundancy and message durability are a must
19:56:02 #action kgriffs Kick the tires on Pecan
19:56:22 Do you require cross-DC replication?
19:56:32 ok, so in your rabbit install, you are clustering it?
19:56:49 and using persistence?
19:56:54 Is there a plan down the road to have a web UI peering into Marconi, kind of like how RabbitMQ has its own?
19:57:40 Cross-DC replication is not required but would be ideal. I don't know many of the specifics of the RabbitMQ cluster itself, but there is message durability implied, though not necessarily flushing.
19:57:50 ChadL: Good idea. No one has talked about a standalone admin surface
19:58:14 about durability, AWS says that even though it won't happen often, some messages in SQS might get lost
19:58:22 Well, now that I think about it, cross-DC responsibilities may be better satisfied by a datastore
19:58:24 ChadL: I assume a Horizon plugin is in the future, but it would be nice to have something standalone as well.
19:58:44 so it might be very difficult to have a fully redundant web-scale queue service
19:59:17 #action kgriffs Add admin web UI to list of future features
20:00:36 You can replicate across DCs but it slows things down. What do you think about clients being able to specify message durability? If they want something super-durable but slower throughput, they can choose that.
20:01:25 #topic winding down the discussion
20:01:36 Looks like we are running short on time
20:02:10 I will shift the remaining agenda items to future meetings.
20:02:13 That would be interesting. It might make things more flexible if the client can determine if durability is important to it. I'd like to explore that more, but I like the concept.
20:02:16 Cool
20:02:41 kgriffs: can we talk about the spec a bit?
20:03:36 Sure. I just didn't want to keep people around if they've got other meetings, but if some folks wanna stick around a bit, I don't mind.
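For anyone kicking the tires on the Falcon framework linked above: a minimal resource written against a recent Falcon release (the library's API has evolved since this meeting; early versions used falcon.API() rather than falcon.App()). The route and payload are illustrative only, not part of Marconi.

    import falcon
    from wsgiref.simple_server import make_server

    class HealthResource:
        """Illustrative resource; not part of Marconi."""

        def on_get(self, req, resp):
            resp.media = {"status": "ok"}  # Falcon serializes this to JSON
            resp.status = falcon.HTTP_200

    app = falcon.App()  # a WSGI callable; any WSGI server can host it
    app.add_route("/v1/health", HealthResource())

    if __name__ == "__main__":
        make_server("127.0.0.1", 8000, app).serve_forever()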
20:03:54 The spec in general, or the API?
20:04:10 well both I suppose
20:05:20 #topic Discuss the spec rough draft
20:05:23 #info http://wiki.openstack.org/marconi/specs/grizzly
20:05:33 two things I'm interested in discussing
20:05:38 tags
20:05:56 vs a concrete queue name
20:06:19 #info API blueprint - http://wiki.openstack.org/marconi/specs/api/v1
20:06:31 and the options to Get Messages
20:06:33 GET {base_url}/messages{?tags,ts,limit,sort,audit,echo}
20:06:52 OK. Let's take them one at a time
20:07:09 So, tags and/or concrete queue name
20:07:25 #topic tags and/or concrete queues
20:08:20 So, I know we discussed previously in a G+ hangout that having queue names offers some benefits.
20:08:32 ya
20:08:38 so a few of those benefits
20:08:43 While we *could* do everything with tags, it isn't very pragmatic
20:09:04 well just to reiterate, tags essentially feels like the user has 1 single massive queue where each message is tagged
20:09:15 with one or more tags
20:09:16 tags will come with a hefty performance hit
20:09:19 and replication hit
20:09:26 yep, for sure
20:09:39 I love them, don't get me wrong
20:10:08 so 1) scaling with tags would be very hard
20:10:50 2) sharing queues - ie: different rights/access to queues for different users/parties
20:11:05 i suppose you could assign rights to tags, but tags feel more ad-hoc
20:11:17 They're really powerful for implying event types and interest - if Marconi had a poll-push model then they might make more sense in that you could actively filter. However I don't think removing the construct is a good idea either - tags are powerful descriptors. It might be enough to just have them as part of the event schema and let downstream poller decide interest
20:11:35 I agree, there's an impedance mismatch with the mental model
20:12:47 Sharing queues makes sense to me but authentication and authorization would have to be defined first before visiting what it means to host access for a shared queue
20:13:09 If AuthN/Z are offloaded by a proxy then permissions at the queue level might not matter at all?
20:13:37 guess it depends how fine grained the auth can be
20:14:05 This is true
20:14:10 Seems like auth would be more expensive if it were based on a tag set
20:14:22 kgriffs: yes, it would be
20:14:46 If the bus doesn't really do any kind of tag introspection then using tags for permissions (while interesting) would require additional effort
20:15:04 What do you guys think about having concrete queues as well as some limited number of tags purely for filtering?
20:15:04 if you can assign tokens to a particular queue, then you can give that token to a third party, embed it in a mobile app, or whatever and be sure it can only access messages in that queue
20:15:24 that's an interesting concept
20:15:30 Indeed
20:15:40 generate an API token that is tied to a specific app and a specific queue
20:15:41 the idea of a global queue scares me
20:15:42 ;)
20:15:53 ya, exactly
20:16:21 #action kgriffs Add generating an API token that is tied to a specific app and a specific queue to the feature proposals list
20:16:22 Having concrete queues makes sense to me. If you want to do tag comprehension for filtering input into the queue, I think it would be interesting but that model also opens up a lot of extra questions.
20:16:48 jhopper: agree
20:17:02 although, for first version, I would highly recommend no tagging
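However the tags question lands, here is a rough consumer-side sketch of the draft "Get Messages" operation quoted above (GET {base_url}/messages{?tags,ts,limit,sort,audit,echo}). Only the query parameter names come from the draft; the base URL, auth header, and response shape are placeholders, and the tags parameter may not survive this discussion.

    import json
    import urllib.parse
    import urllib.request

    BASE_URL = "http://localhost:8888/v1"  # placeholder endpoint
    AUTH_TOKEN = "example-token"           # placeholder; real auth is still TBD

    def get_messages(limit=10, echo=False):
        query = urllib.parse.urlencode({"limit": limit, "echo": str(echo).lower()})
        req = urllib.request.Request(
            f"{BASE_URL}/messages?{query}",
            headers={"X-Auth-Token": AUTH_TOKEN, "Accept": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            # Assumes a JSON body; the actual response format was not defined here.
            return json.loads(resp.read().decode("utf-8"))

    if __name__ == "__main__":
        print(get_messages(limit=5))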
20:17:44 Can a queue impart tags to messages published to it?
20:17:54 The sticking point is that we need it for Cloud Backup and I was hoping to use that as a guinea pig
20:18:01 Ah, I see.
20:18:02 you need tags?
20:18:09 just one
20:18:17 can you explain the use case?
20:18:30 Sure.
20:19:24 If the control panel wants to communicate with a backup agent installed on someone's server, it uses RSE...
20:19:35 For example...
20:21:18 …actually, I'm jumping ahead of myself, but I think it would be possible to do what Cloud Backup needs by allocating two queues per agent. The downside is you are putting 2x the load on the servers and using 2x bandwidth, etc.
20:21:27 …so back to my example
20:21:51 When a customer logs in, the control panel checks for recent heartbeat events
20:21:53 i think I get it already, you want some sort of fanout?
20:22:03 one message in, two consumers?
20:22:06 yeah, you probably know where I'm going
20:22:23 so we tell people to use two queues, just like you said
20:22:27 when someone logs in, the CP checks for heartbeat events
20:22:39 it does this at an account-wide level
20:23:16 There are other scenarios where the CP wants to talk just to a particular agent
20:23:38 anyway, depending on what's going on, you've got fanout in one direction or the other.
20:23:48 right
20:24:06 would using multiple queues work just as well?
20:24:13 besides the fact you'd have to post multiple times
20:24:35 Event types are all multiplexed across a single queue, essentially, and the queues are only used to namespace by account and agent
20:24:37 It might be expensive since you could potentially have a queue for each host-agent, of which there may be hundreds
20:24:51 Yes, you could, it just doubles the load, also for polling.
20:25:01 I see, so you'd use tags to demux the queue?
20:25:11 there are actually going to be hundreds of thousands of agents
20:26:00 Possibly, but I think the real benefit is to add one or two nested namespaces in order to provide an affordance that can be leveraged for stuff like fanout.
20:26:38 In any case, we could do v1.0 without tags and folks could use multiple queues, but I think it makes sense to do it at some point.
20:27:03 I ran into this thought game when it came to AtomNuke - this type of filtering, in my opinion, should probably be part of the responsibility of the poller or an intermediate bridge.
20:27:17 Of course, once the ability is there, who knows how people will leverage it? People do surprising things.
20:27:31 But again, there's a lot of +/- to the argument. That's a fair question too, kgriffs
20:27:52 jhopper: food for thought.
20:28:39 Probably not the poller; consider a web app doing the filtering. JavaScript isn't exactly a speed demon.
20:28:48 kgriffs: for your tagged example
20:29:07 Well, an intermediate bridge - maybe not the end client, but rather a client from the queue's perspective
20:29:20 But maybe a proxy approach. Although depending on the storage driver, it may be more efficient to just do it as part of the query, assuming a finite number of tags (say, 2-3)
20:29:23 if agent A takes a message, how does it even get to the two consumers?
20:30:09 That's a good question too. Hrm.
20:30:28 you'd have to have two messages anyways
20:30:39 Not sure I follow - if the agent gets a message, it *is* the consumer, right?
20:30:58 i thought you wanted one message to get to two consumers?
20:31:08 oic
20:31:54 Well, the scenario is the CP wants to broadcast an event to all agents that Bob is running under his cloud account.
20:32:17 So, agents need to listen on the broadcast queue, as well as another queue for RPC-type stuff.
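An in-process sketch of one reading of the two-queues-per-agent idea being discussed: each agent polls a broadcast queue plus its own RPC queue, and the control panel fans out broadcasts by posting once per agent ("you'd have to post multiple times"). Queue names and message shapes are illustrative only; the real deployment would use Marconi queues over HTTP.

    import queue

    AGENTS = ["agent-1", "agent-2"]

    # Two queues per agent: one for account-wide broadcasts, one for direct RPC.
    broadcast_q = {agent: queue.Queue() for agent in AGENTS}
    rpc_q = {agent: queue.Queue() for agent in AGENTS}

    def broadcast(event):
        # Fan-out by posting a copy to every agent's broadcast queue.
        for agent in AGENTS:
            broadcast_q[agent].put(event)

    def send_rpc(agent_id, command):
        rpc_q[agent_id].put(command)

    def poll(agent_id):
        # One polling pass: the agent drains both of its queues.
        messages = []
        for q in (broadcast_q[agent_id], rpc_q[agent_id]):
            while not q.empty():
                messages.append(q.get_nowait())
        return messages

    broadcast({"event": "heartbeat-config-changed"})
    send_rpc("agent-1", {"command": "start-backup"})
    print(poll("agent-1"))  # sees the broadcast and its RPC command
    print(poll("agent-2"))  # sees only the broadcast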
20:32:27 Unless you can do a tag
20:32:46 ok
20:32:57 so even in that scenario, I think two queues is probably better than tags anyways
20:33:13 unless you wanted to ensure that EVERY tag gets a message somehow
20:33:19 but that's adding some serious complexity
20:33:32 Tags imply a lot of routing logic
20:33:54 and delivery logic
20:33:55 I think that kgriffs' case makes sense; it's very useful in my opinion - just hard to implement.
20:34:03 In RSE we basically have the notion of a single tag, AKA subqueue
20:34:12 Well, delivery logic would be simply placing a message reference into the relevant queue
20:34:18 It works pretty well. Things would get more complicated with N tags
20:34:39 This is true, unlimited taxonomies produce only headaches
20:35:40 I guess you need the ability to say "give me everything that has no tags and give me everything just with this tag set" in a single query
20:36:29 so, I don't query /foo?tag=bar AND /foo in separate requests. Otherwise might as well just use two queues
20:39:54 maybe fanout support would be good?
20:40:04 Going the other way, if an agent were to post to /account with tag=agent, the CP could decide whether it wants to listen to all events (in the case of a dashboard page) and just get everything regardless of tags for /account, or, if you are looking at a single agent, it would then request /foo?tag=agent
20:41:13 eg:
20:41:44 hmmm, need to think about that a bit more
20:42:21 the broadcast use case is interesting
20:42:46 we have that with our Push Queues, but not for pull queues
20:43:19 can we continue that discussion next week?
20:43:44 Sounds good. We've been at it for quite a while now. :D
20:43:49 I'd like that - there's a lot to think about
20:44:13 good first meeting though!
20:44:31 Definitely. I wasn't sure what to expect, but I think we're off to a good start
20:44:35 ya, definitely
20:44:36 I'm excited. This all sounds really promising and I like the API.
20:44:46 good stuff guys!
20:44:53 yeah, the API looks great so far
20:45:26 OK. Let's sleep on some of these things we've been discussing and follow up next week.
20:46:33 alright, cheers guys, good meeting you all and talk to you next week.
20:46:40 take care
20:46:43 #endmeeting