Tuesday, 2013-10-22

*** jdaggett has joined #openstack-marconi		00:15
*** jdaggett has quit IRC		00:19
*** amitgandhi has joined #openstack-marconi		00:25
*** reed has quit IRC		00:43
*** reed has joined #openstack-marconi		00:44
*** jdaggett has joined #openstack-marconi		00:45
*** jdaggett has quit IRC		00:50
*** nosnos has joined #openstack-marconi		00:52
*** jdaggett has joined #openstack-marconi		01:36
*** amitgandhi has quit IRC		01:36
*** oz_akan_ has joined #openstack-marconi		01:41
*** oz_akan_ has quit IRC		02:22
*** reed has quit IRC		02:38
*** reed has joined #openstack-marconi		05:49
*** yassine has joined #openstack-marconi		08:11
*** cthulhup has joined #openstack-marconi		08:22
*** reed has quit IRC		09:11
*** tedross has joined #openstack-marconi		11:38
*** malini_afk is now known as malini		11:40
*** malini is now known as malini_afk		12:13
*** malini_afk is now known as malini		12:15
*** nosnos has quit IRC		12:27
*** nosnos has joined #openstack-marconi		12:28
*** nosnos has quit IRC		12:32
*** ayoung has quit IRC		12:51
*** jcru has joined #openstack-marconi		12:54
*** oz_akan_ has joined #openstack-marconi		13:08
*** alcabrera has joined #openstack-marconi		13:10
alcabrera	Good morning! :D	13:11
*** mpanetta has joined #openstack-marconi		13:20
*** malini is now known as malini_afk		13:31
*** amitgandhi has joined #openstack-marconi		13:42
*** ayoung has joined #openstack-marconi		13:49
openstackgerrit	Zhihao Yuan proposed a change to openstack/marconi: feat(shard): queue listing from multiple sources https://review.openstack.org/53127	14:19
zyuan	^^ i'm trying to find out which change makes py26 fail	14:19
zyuan	don't review these	14:20
alcabrera	k	14:22
openstackgerrit	Zhihao Yuan proposed a change to openstack/marconi: feat(shard): queue listing from multiple sources https://review.openstack.org/53127	14:24
*** reed has joined #openstack-marconi		14:30
openstackgerrit	Zhihao Yuan proposed a change to openstack/marconi: feat(shard): queue listing from multiple sources https://review.openstack.org/53127	14:36
openstackgerrit	Zhihao Yuan proposed a change to openstack/marconi: feat(shard): queue listing from multiple sources https://review.openstack.org/53127	15:00
openstackgerrit	Zhihao Yuan proposed a change to openstack/marconi: feat(shard): queue listing from multiple sources https://review.openstack.org/53127	15:04
*** whenry has joined #openstack-marconi		15:05
*** jdaggett1 has joined #openstack-marconi		15:08
*** malini_afk is now known as malini		15:08
openstackgerrit	Alejandro Cabrera proposed a change to openstack/marconi: feat: add shard management resource https://review.openstack.org/50702	15:13
openstackgerrit	Alejandro Cabrera proposed a change to openstack/marconi: feat: shards storage controller interface https://review.openstack.org/50721	15:14
openstackgerrit	Zhihao Yuan proposed a change to openstack/marconi: feat(shard): queue listing from multiple sources https://review.openstack.org/53127	15:17
*** vkmc has joined #openstack-marconi		15:17
*** vkmc has quit IRC		15:17
*** vkmc has joined #openstack-marconi		15:17
*** kgriffs_afk is now known as kgriffs		15:23
openstackgerrit	Zhihao Yuan proposed a change to openstack/marconi: feat(shard): queue listing from multiple sources https://review.openstack.org/53127	15:24
zyuan	please review https://review.openstack.org/#/c/53127/	15:29
zyuan	i changed nothing, and it passed. jenkins you win.	15:29
alcabrera	lol	15:30
alcabrera	I'll review that patch once I finish getting jenkins happy on my local rebase.	15:31
alcabrera	*appeasing jenkins	15:31
alcabrera	errmm... hmm.. jenkins -> tox	15:32
alcabrera	:P	15:32
kgriffs	alcabrera: got a minute to discuss health endpoint stuff?	15:40
alcabrera	yup	15:40
kgriffs	ok, so we have two bugs	15:40
kgriffs	https://bugs.launchpad.net/marconi/+bug/1242926	15:40
alcabrera	whitelisting for keystone ^^	15:41
kgriffs	so, my thinking there was to have keystone inject a header, e.g., X-Auth-Whitelisted	15:41
kgriffs	or something	15:41
kgriffs	so the other middleware can key off of that and not complain about missing auth stuff	15:41
kgriffs	that would allow the solution to work with EOM as well as if an operator is just deploying with out-of-box middleware support	15:42
alcabrera	I see. As for keystone itself, we'd need support for whitelisting particular routes.	15:42
alcabrera	s/itself/input	15:43
zyuan	hmm	15:43
zyuan	that looks scary	15:43
kgriffs	why scary?	15:43
zyuan	what if user send X-Auth-Whitelisted?	15:43
mpanetta	Don't forget that the capabilities of the thing running the health check is pretty minimal...	15:43
mpanetta	I can't send any headers.	15:44
kgriffs	the keystone middleware would strip that header from the client, just like it already strips X-Roles	15:44
kgriffs	(and such)	15:44
zyuan	our X-Project-Id?	15:44
kgriffs	mpanetta: the LB does not need to send that header	15:44
mpanetta	Ah ok	15:44
kgriffs	it would be injected by the keystone middleware	15:44
kgriffs	just to notify downstream middlewarez	15:45
alcabrera	hmmm...	15:45
kgriffs	that is one option	15:45
kgriffs	other options include having another app	15:45
zyuan	...	15:45
kgriffs	two variants on that	15:46
zyuan	if we only allow the header on 1 endpoint, that looks fine	15:46
kgriffs	a. simple middleware app that implements its own health, otherwise passes through	15:46
kgriffs	b. middleware that implements health check but translates it to a request to post something to a special/hidden "health check" queue	15:47
alcabrera	I kind of like (a), given that it would just wrap the marconi app.	15:48
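Option (a) above can be sketched as a plain WSGI wrapper. This is only an illustration under assumptions: the name `health_middleware`, the `/health` path default, and the 204 response are choices made for the example, not Marconi's actual code.

```python
def health_middleware(app, path="/health"):
    """Wrap a WSGI app: answer `path` directly, pass everything else through.

    Hypothetical helper for illustration; not part of Marconi.
    """
    def wrapper(environ, start_response):
        if environ.get("PATH_INFO") == path:
            # Shallow check: it only proves this worker process can respond.
            start_response("204 No Content", [("Content-Length", "0")])
            return [b""]
        # Any other path goes to the wrapped application unchanged.
        return app(environ, start_response)
    return wrapper
```

A wrapper like this never touches storage, so it answers only the "is the process up" question; the deeper check is a separate concern.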
kgriffs	...which brings me to my second bug.	15:48
kgriffs	https://bugs.launchpad.net/marconi/+bug/1243268	15:48
mpanetta	Ah	15:49
alcabrera	cool - a deeper health check	15:49
mpanetta	Yes, we need a deep check.	15:49
kgriffs	Whatever we do, I think that we need to check that we can talk to a real process and that process can talk to its storage backend	15:49
zyuan	how deep? all shards?	15:49
kgriffs	the check goes to a single web head	15:50
kgriffs	this is for the LB, right?	15:50
alcabrera	zyuan has a point - a single webhead can access any shard.	15:50
kgriffs	is this also going to be used by the uwsgi router?	15:50
mpanetta	yes	15:50
mpanetta	well LB, does not have to go through router.	15:50
kgriffs	(or localhost nginx balancer, whatever operators deploy in front of workers)	15:51
mpanetta	But I think that was how we were going to expose the endpoint.	15:51
kgriffs	mpanetta: if the router supports checking worker health, that could be useful as well - esp. if it knows how to restart workers	15:51
mpanetta	But yeah, single web head, this is not a health check as would necessarily be performed by a queues user, just the system.	15:51
kgriffs	or some kind of localhost daemon manager	15:52
mpanetta	kgriffs: It actually seems to know how to restart pretty well, I killed some workers manually the other day and it happily restarted them.	15:52
kgriffs	ok, but can it preemptively check for hung or 500's or something	15:55
mpanetta	Not that I am aware of	15:55
kgriffs	ok	15:57
kgriffs	i guess the LB alerts when a node becomes unhealthy?	15:58
kgriffs	so someone can look at it??	15:58
mpanetta	It drops the node from the rotation	15:58
mpanetta	no alerts are sent the way we have the LB's configured at the moment.	15:58
kgriffs	ah, that should be remedied. If something goes MIA for an extended period, someone needs to be notified	15:59
mpanetta	Yes	15:59
kgriffs	aaaanyway	16:02
kgriffs	re sharding, that is a good point	16:06
kgriffs	So, a deep check would need to verify it can communicate with all shards	16:07
kgriffs	so, the shard catalog would need a "health check" call or something	16:07
kgriffs	actually	16:07
kgriffs	if the storage driver interface includes a "health" method	16:08
mpanetta	Shards, are the db distribution method?	16:08
kgriffs	then we are cool, since the shard driver can just implement that as well	16:08
zyuan	if so, then the "healthy" need to be related to the sharding fallback algorithm	16:08
zyuan	we are going to use	16:08
mpanetta	That sounds perfect	16:08
kgriffs	mpanetta: app-level db sharding	16:08
mpanetta	kgriffs: Ah ok	16:09
alcabrera	hmm...	16:10
alcabrera	it shouldn't be very difficult to extend the shard catalogue storage driver to check the health of all registered shards.	16:10
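One way the shard catalogue could aggregate health across registered shards is sketched below. All names here (`ShardCatalog`, `shard_health`, `is_alive`, `FakeDriver`) are hypothetical and exist only for the example; the real driver interface may look different.

```python
class ShardCatalog:
    """Hypothetical catalogue mapping shard names to storage drivers."""

    def __init__(self, shards):
        self._shards = dict(shards)

    def shard_health(self):
        """Return {shard_name: bool}; a driver that raises counts as down."""
        report = {}
        for name, driver in self._shards.items():
            try:
                report[name] = bool(driver.is_alive())
            except Exception:
                report[name] = False
        return report

    def is_alive(self):
        # The catalogue is healthy only if every registered shard is.
        return all(self.shard_health().values())


class FakeDriver:
    """Stand-in storage driver used only to exercise the sketch."""

    def __init__(self, alive=True, error=None):
        self._alive = alive
        self._error = error

    def is_alive(self):
        if self._error is not None:
            raise self._error
        return self._alive
```

Whether one dead shard should fail the whole node is a policy question; as zyuan notes above, it would depend on the sharding fallback algorithm eventually chosen.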
*** mpanetta is now known as mpanetta_lunch		16:11
*** yassine has quit IRC		16:20
*** jdaggett has joined #openstack-marconi		16:50
*** jdaggett1 has quit IRC		16:54
openstackgerrit	Alejandro Cabrera proposed a change to openstack/marconi: feat: shards mongodb driver + tests https://review.openstack.org/50815	16:57
*** whenry has quit IRC		17:03
*** fvollero is now known as fvollero|gone		17:04
zyuan	kgriffs: ping	17:48
zyuan	i think it will be easier to add a noop() to the storage interface	18:00
zyuan	i think it needs a very short connection timeout...	18:01
*** ametts has quit IRC		18:01
*** JRow has joined #openstack-marconi		18:19
*** JRow has quit IRC		18:43
alcabrera	kgriffs: ping	18:54
zyuan	kgriffs: i have some questions	18:54
zyuan	...	18:54
zyuan	you first	18:54
alcabrera	:D	18:54
kgriffs	pong	18:55
alcabrera	w00t	18:55
*** JRow has joined #openstack-marconi		18:55
alcabrera	so to be sure, every patch up to https://review.openstack.org/#/c/50815/ is rebased and ready.	18:55
alcabrera	I need to double check the single catalogue driver patch.	18:55
alcabrera	I'm almost done rebasing the transport + storage part of sharding.	18:56
alcabrera	*sharding admin	18:56
*** mpanetta_lunch is now known as mpanetta		18:57
kgriffs	ok, I will take a look at those shortly	19:00
alcabrera	kgriffs: thanks!	19:00
zyuan	kgriffs: i noticed that the whole app only uses 1 database connection. it might be a stupid question but.... why not 1 connection per client?	19:01
kgriffs	the hole app, meaning "queues" app?	19:03
kgriffs	whole	19:03
zyuan	whole	19:03
zyuan	yea	19:03
kgriffs	in the case of the mongodb driver, pymongo maintains its own connection pool iirc	19:05
zyuan	wsgi container can run N apps, but N database connections are served; multiple sessions currently share 1 connection.	19:05
zyuan	kgriffs: it doesn't matter iiuc, that's only useful when you ask you multiple connections; for each app we ask for 1 connection	19:06
zyuan	for* multiple...	19:06
kgriffs	each wsgi app is single-threaded, right?	19:07
kgriffs	i mean, each worker process	19:07
zyuan	it's allowed to be not	19:07
zyuan	^^ not very sure	19:07
kgriffs	well, you could try deploying Marconi using multithreaded workers	19:08
kgriffs	but, I don't think anyone has tried it yet	19:08
zyuan	no, it's allowed to be not. because flask support multithread	19:08
kgriffs	I'm trying to think whether Falcon has any state that would blow up in a multithreaded environment	19:09
zyuan	so, you mean we are just fine to reuse the database connections? can greenlets preempt between queries?	19:09
kgriffs	pymongo is gevent-aware, not sure about eventlet	19:09
kgriffs	so, you can multiplex across a single client connection	19:10
zyuan	hmmm, ok	19:10
alcabrera	pymongo has issues with eventlet, iirc, because it runs on gevent	19:10
zyuan	i know sqlite doesn't work if you have multiple threads accessing 1 connection...	19:10
zyuan	i ask this because a traditional website only connects to the db when a request comes in	19:11
zyuan	anyway. then that means we do need a noop() DB access to test whether a shard is still alive	19:12
kgriffs	traditional setups also use thread pools per connection.	19:12
kgriffs	but usually that is less efficient than using an evented module combined with a set of single-threaded worker processes	19:12
zyuan	kgriffs: that's fine; the purpose is to limit the total threads count, but still 1 thread 1 connection	19:13
zyuan	kgriffs: i vaguely thing so	19:13
zyuan	k*	19:13
zyuan	the next question is about X-Auth-Whitelisted	19:13
zyuan	i don't know what needs to be done in Marconi.	19:14
zyuan	keystone controls every access	19:14
zyuan	if no request comes to marconi, there won't be an X-Auth-Whitelisted seen by any side	19:15
zyuan	i looked at keystone's conf, it makes more sense to me if there is a configuration to exclude some uri...	19:16
kgriffs	yeah, I was thinking it would be great to patch python-keystoneclient middleware to support whitelisting, but...	19:21
kgriffs	it can be a real pain to get them to accept anything	19:21
kgriffs	(or so I've heard)	19:21
zyuan	but X-Auth-Whitelisted also needs a keystone patch; i don't see what marconi can do...	19:21
zyuan	(unless we have two apps...)	19:22
*** JRow has left #openstack-marconi		19:28
kgriffs	just a sec	19:28
kgriffs	I have an idea	19:28
zyuan	btw, pls review https://review.openstack.org/#/c/53127/ ; i finally got jenkins to accept it (by doing nothing)	19:31
*** malini is now known as malini_afk		19:32
kgriffs	https://docs.google.com/file/d/0BxZAkOZUwvFAWnU1aWJfdjJ6V1U/edit?usp=sharing	19:34
zyuan	??	19:35
zyuan	coooool	19:36
zyuan	i use dia	19:36
zyuan	this is.... basically a dia	19:36
kgriffs	http://oi39.tinypic.com/rrqb80.jpg	19:37
alcabrera	kgriffs: +1 - middleware saves the day	19:37
zyuan	put pipeline before keystone?	19:38
kgriffs	so, that is one option.	19:38
kgriffs	i am thinking we could have a generic wsgi pipeline constructor	19:38
kgriffs	downside is this won't work for non-HTTP transport	19:39
kgriffs	but I guess that is something to discuss later	19:39
kgriffs	although	19:39
kgriffs	we could just translate ZMQ messages to WSGI calls. :p	19:39
kgriffs	aaaanyway	19:40
kgriffs	cross that bridge when we come to it	19:40
alcabrera	that'd be a chore. :P	19:40
alcabrera	yup	19:40
kgriffs	since we don't know yet how auth will work period in that case, anyway	19:40
alcabrera	good point - keystone zmq auth plugin.	19:40
alcabrera	>.>	19:40
mpanetta	How would whitelisting work?	19:49
kgriffs	so, my current thinking is we have a wsgi pipeline app that is configurable via json	19:49
mpanetta	I guess what I mean is, how does the system determine who/what is whitelisted? Are only local connections allowed to be whitelisted?	19:50
kgriffs	you give it a list of apps and it loads them with stevedore	19:50
kgriffs	um	19:50
kgriffs	it can be whatever you like, I suppose	19:50
mpanetta	Ah ok it is app level whitelisting	19:50
kgriffs	whitelist based on URL and/or ip address or something	19:50
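The JSON-configured pipeline idea can be sketched roughly as follows. To keep the example self-contained it looks middleware factories up in a plain dict rather than loading them with stevedore, and the `"pipeline"` key, `build_pipeline`, and `tagging_middleware` are all assumptions made for illustration.

```python
import json


def build_pipeline(config_json, registry, app):
    """Wrap `app` with the middleware named in the config, first name outermost.

    `registry` maps names to factories taking and returning a WSGI app.
    A plain dict stands in here for a plugin loader such as stevedore.
    """
    names = json.loads(config_json)["pipeline"]
    for name in reversed(names):
        app = registry[name](app)
    return app


def tagging_middleware(tag):
    """Demo factory: records the order in which middleware sees a request."""
    def factory(app):
        def wrapper(environ, start_response):
            environ.setdefault("demo.trace", []).append(tag)
            return app(environ, start_response)
        return wrapper
    return factory
```

With a config of `{"pipeline": ["health", "auth"]}`, a request passes through `health` first and `auth` second, which is the ordering zyuan asks about ("put pipeline before keystone?").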
kgriffs	but really, can't you block the auth url at the LB from outside users hitting it?	19:51
kgriffs	(blacklist)	19:51
kgriffs	that seems more reliable than the app trying to determine whether the caller is an admin or load balancer or something	19:52
mpanetta	the LB is quite dumb	19:54
mpanetta	It does not allow URL filtering	19:54
mpanetta	I think the idea was the admin endpoint would not go through the LB and would only be internally accessible.	19:54
kgriffs	hmm	19:55
mpanetta	I don't think the app should know anything, tis why I thought a separate app for health check would be good.	19:55
mpanetta	That app would be responsible for auth.	19:55
kgriffs	but then you are only checking whether the health app is "healthy", not the app itself	19:55
kgriffs	wait	19:56
kgriffs	I think I see where you're going	19:56
mpanetta	No cause the health app would do a queue post, and etc	19:56
mpanetta	if the queue post fails we return a 5xx	19:56
mpanetta	I would assume if the storage backend is bad on the node the queue post would fail.	19:56
mpanetta	Is that a poor assumption?	19:57
zyuan	i'm implementing an alive() method for storage (which calls mongoclient's alive())	19:57
zyuan	i think one app is enough, since if the app itself is down, you won't get a 200 response anyway	19:58
kgriffs	i think we should actually try to post something - that would be a deeper test, would it not?	19:58
kgriffs	and it isn't like this is going to be gazillions of pings per second or anything	19:59
zyuan	unless the db is physically broken, i don't see a need for anything other than a no-op	19:59
mpanetta	Nope, should be no more than a few a minute.	19:59
kgriffs	mpanetta: thing is, if you have a separate app, how does it talk to the localhost "real" app without going through auth? It would have to have its own valid cloud creds	19:59
kgriffs	zyuan: there may be a failing disk or something that alive() isn't going to check	20:00
kgriffs	remember that this needs to be storage agnostic	20:00
zyuan	db connection can break of course	20:00
zyuan	failing disk can not be checked with a queue posting or something	20:01
zyuan	because usually these requests don't go to disk	20:01
zyuan	you need disk monitor services	20:01
mpanetta	Don't care about that	20:01
mpanetta	we only care if the queue service will respond	20:02
zyuan	"health", to me, only means "connection is good"	20:02
zyuan	yea	20:02
mpanetta	and by respond, I mean allow posting to a queue.	20:02
mpanetta	Beyond that, we don't worry about, disk failure is out of scope.	20:02
zyuan	afaic [posting to a queue] is just an implementation detail	20:03
mpanetta	yes	20:03
zyuan	what mongo's alive() does is to select on connections	20:03
kgriffs	http://d3j5vwomefv46c.cloudfront.net/photos/large/816932810.png?1382472163	20:03
zyuan	i believe that's enough	20:03
kgriffs	zyuan: It's up to the devops guys, what they want	20:04
kgriffs	mpanetta, oz_akan_: how deep of a check do you need?	20:04
mpanetta	Technically all we care about is that we can post to a queue, that is enough proof to me that the server is alive.	20:04
mpanetta	The app (or whatever it ends up being) will only return a simple status code.	20:05
kgriffs	ok, so the nice thing about that is we don't have to implement it differently for each storage driver	20:05
mpanetta	I think oz_akan_ has something else in mind for zenoss, but for the LB all we need is a go/nogo	20:05
*** vkmc has quit IRC		20:05
kgriffs	we could just attempt calling the driver's post() method	20:06
kgriffs	I guess we would have to ensure the queue is created first too. :[	20:07
zyuan	it's create()	20:07
kgriffs	the sharding driver would just call that for each driver under its control. I guess it is trickier for when sharding is not enabled. We currently go directly to the storage driver's method	20:08
kgriffs	ah	20:08
kgriffs	we could just have a default implementation in the base class that does the check to try and post a message.	20:08
kgriffs	the sharding driver would be the only one that would need to override it	20:08
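The "default implementation in the base class" idea might look something like the sketch below. The class and method names (`StorageBase`, `create_queue`, `post_message`, `health`) and the reserved queue name are hypothetical; they stand in for whatever the real storage interface exposes, and queue creation is assumed to be idempotent.

```python
class StorageBase:
    """Hypothetical storage driver base class with a default deep check."""

    HEALTH_QUEUE = "health-check"  # assumed reserved queue name

    def create_queue(self, name):
        raise NotImplementedError

    def post_message(self, queue, body):
        raise NotImplementedError

    def health(self):
        """Default check: make sure the queue exists, then post one message."""
        try:
            self.create_queue(self.HEALTH_QUEUE)  # assumed safe to repeat
            self.post_message(self.HEALTH_QUEUE, {"ping": True})
        except Exception:
            return False
        return True


class InMemoryStorage(StorageBase):
    """Toy backend that inherits health() without overriding it."""

    def __init__(self):
        self.queues = {}

    def create_queue(self, name):
        self.queues.setdefault(name, [])

    def post_message(self, queue, body):
        self.queues[queue].append(body)


class FailingStorage(InMemoryStorage):
    """Toy backend whose writes always fail."""

    def post_message(self, queue, body):
        raise IOError("backend unavailable")
```

Ordinary drivers would inherit `health()` as is; only the sharding driver would override it to call `health()` on each driver it manages. A real version would also need the posted messages to expire or be cleaned up.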
kgriffs	zyuan: ?	20:09
zyuan	i feel uncomfortable about including a testing method in the API	20:09
mpanetta	kgriffs: Yeah of course, but that should be simple logic.	20:09
zyuan	i'm in favor of a proof of working method...	20:09
kgriffs	zyuan: messages controller defines post for messages	20:09
mpanetta	(The queue existing part)	20:09
zyuan	i mean queues	20:09
zyuan	they want queues	20:09
kgriffs	mpanetta: I thought you said post a message, not create a queue for the test?	20:10
zyuan	so , you see, different people want to test different parts. they show the evidence, but not proof	20:10
mpanetta	kgriffs: Well, the queue has to exist to post to it, no? ;)	20:10
kgriffs	yeah	20:10
mpanetta	Queue creation should only occur once.	20:11
mpanetta	This is why I am kind of for just having a health check client, it removes health check from the scope of marconi.	20:11
kgriffs	if the test is create and delete a queue then we would have to choose unique names each time for the queue	20:11
kgriffs	just something to consider	20:11
mpanetta	Different people may have different ideas of health check	20:11
kgriffs	mpanetta: I'm still not convinced that is a real health check	20:11
zyuan	if you want test, then just do so	20:11
zyuan	and test auth at the same time	20:12
mpanetta	kgriffs: Me either	20:12
kgriffs	if you aren't going through the workers serving the actual requests, then you can't know they are healthy	20:12
zyuan	i don't see a reason why health means test	20:12
mpanetta	kgriffs: Yes, we would only be hitting a single worker.	20:12
kgriffs	zyuan: health check is for LB so it knows whether or not to stop sending traffic to a node	20:12
mpanetta	kgriffs: Although worker health is a uwsgi issue, and it seems to handle that well.	20:12
kgriffs	imo, we should simulate the user as closely as possible for a health check	20:13
mpanetta	At least with the very simple, "Kill some random workers and see if they respawn" test...	20:13
zyuan	kgriffs: then health internally ping each shard, what's the problem?	20:13
mpanetta	kgriffs: I agree	20:13
kgriffs	we aren't just testing the storage, we are testing the uwsgi	20:13
kgriffs	s/testing/health-checking	20:13
zyuan	if you can't	20:13
zyuan	get a response from an endpoint, obviously your wsgi is down	20:14
mpanetta	It is basically an end-to-end check, but don't think of it so complex, it really is just a simple (is the system useable?) test I think.	20:14
kgriffs	zyuan: that's the thing. The LB can't ping the endpoint using an Auth token - it isn't smart enough	20:14
mpanetta	Yes, this LB is quite simple...	20:15
kgriffs	so we are trying to find a way to ping a real endpoint just like a user except without auth	20:15
zyuan	kgriffs: open /health, that's what we're trying to do, don't it?	20:15
zyuan	doesn't it?	20:15
kgriffs	yeah, but that is behind auth right now. We could circumvent that within Marconi itself, but then anyone building up their own wsgi pipeline still has the problem.	20:16
mpanetta	but I thought just opening /health was basically a noop anyay?	20:16
kgriffs	so, I was trying to come up with a generic solution. Let's us ping real workers without Auth	20:16
mpanetta	*anyway	20:16
kgriffs	it is now	20:16
kgriffs	that's why I created two bugs	20:17
mpanetta	Ah, but it does not have to be?	20:17
mpanetta	Ok	20:17
kgriffs	first step is to just fix the auth issue	20:17
mpanetta	I see now.	20:17
mpanetta	does health have to be authed?	20:17
kgriffs	second step is to do a deeper health check	20:17
*** jdaggett has quit IRC		20:17
mpanetta	I guess we are worried about a DOS attack in that case...	20:17
kgriffs	mpanetta: I don't see why it does, although you may want to prevent end users from hitting it unless you know that it will be rate limited	20:17
kgriffs	or you know it returns sensitive internal data	20:18
mpanetta	kgriffs: Ok, yeah that was my only concern.	20:18
kgriffs	right now it just returns an empty body	20:18
mpanetta	It probably should stay that way...	20:18
kgriffs	thing is, if rate limiting depends on knowing the project ID, you won't have that unless you auth. :p	20:19
mpanetta	There is no default rate limit?	20:19
mpanetta	A very low default rate limit would be ok...	20:19
mpanetta	Perhaps	20:19
kgriffs	you have to have some way to scope/bucket the counters	20:19
mpanetta	Ah, you do not have an 'unknown' or 'everything else' bin? ;)	20:19
alcabrera	Guys, I'm heading home. I'll be back online in a bit to finish rebasing the last of the patches. I've hit an annoying unit test issue where bootstrap.storage is always returning faulty_driver, so that's been slowing me down. :P	20:20
kgriffs	well, then someone could do a DDoS on us	20:20
mpanetta	Either way, I can still forsee an issue...	20:20
alcabrera	See y'all in a bit.	20:20
kgriffs	just flood us with health checks so the LB can't get in	20:20
*** alcabrera has quit IRC		20:20
mpanetta	kgriffs: That was my concern	20:20
kgriffs	hmmm	20:21
kgriffs	I just realized my last diagram is bogus	20:21
kgriffs	if it's whitelisted, it is whitelisted for everybody	20:21
kgriffs	so, rbac check doesn't help	20:21
mpanetta	Crap, so back to the DDOS...	20:21
mpanetta	Seems like even more a reason for health check to be external...	20:24
mpanetta	At least that way we could expose the end point to only a short list of IP's, the ones specific to the LB's.	20:24
zyuan	i don't think so	20:24
zyuan	if you can DDOS health, you can DDOS auth as well	20:25
mpanetta	Auth requests are cached though aren't they?	20:25
zyuan	so you can't DDOS health; it currently does nothing and i'm trying to prevent it from doing too much	20:26
mpanetta	Problem is, doing nothing isn't very useful ;)	20:27
zyuan	if it proves the app is working, it's enough	20:27
kgriffs	http://d3j5vwomefv46c.cloudfront.net/photos/large/816934453.png?1382473585	20:27
zyuan	if you want to test, add a special account for testing	20:27
kgriffs	how about that?	20:27
kgriffs	oh crap	20:28
mpanetta	Basically it looks to me like we have 2 paths, we have a very simple health check, that just says that marconi is listening, and is fast to reply, or we have something more complex, like check a queue which exercises the backend as well.	20:28
kgriffs	needs to go through worker	20:28
kgriffs	blah	20:28
kgriffs	just a moment	20:28
kgriffs	can LB hit a different port on the node to do health check?	20:28
zyuan	mpanetta: if you want that, open an account and post it in client	20:28
mpanetta	no :(	20:28
mpanetta	LB is super dumb	20:28
zyuan	it's just correct that LB is dumb	20:29
mpanetta	zyuan: That is what I am saying, maybe we should do that in a separate app?	20:29
zyuan	where you need that app?	20:30
mpanetta	on the same box as the web head	20:30
zyuan	behind LB?	20:30
zyuan	if so, i don't agree	20:30
mpanetta	the request would have to be routed appropriately	20:30
mpanetta	Yes behind the LB	20:30
mpanetta	It would have to be, since the LB is running the check.	20:30
zyuan	i don't think so, LB should not have a way to tell the app logic	20:31
kgriffs	i'm tired of uploading the pic	20:31
kgriffs	https://docs.google.com/file/d/0BxZAkOZUwvFAWnU1aWJfdjJ6V1U/edit?usp=sharing	20:31
kgriffs	just go there and open in draw.io	20:31
kgriffs	:p	20:31
mpanetta	zyuan: Exactly	20:31
mpanetta	LB is dumb, will always be dumb.	20:31
zyuan	then why do you want it to be behind LB?	20:31
kgriffs	so, that latest revision to the drawing would be nice if the LB were smart enough to go to a different port on the box	20:31
zyuan	behind LB means LB can access this app	20:31
kgriffs	waaaait	20:31
mpanetta	zyuan: to you, what is behind LB?	20:31
kgriffs	can't you make your router smart enough to go through the bastion for just /health ?	20:32
kgriffs	i mean, nginx can do stuff like that	20:32
mpanetta	To me behind LB means on marconi side of LB, not client side.	20:32
zyuan	request -> LB -> here -> marconi	20:32
mpanetta	Ok, yes it has to be where here is.	20:32
zyuan	if request -> LB -> health -> marconi	20:32
mpanetta	LB has to be able to access	20:32
mpanetta	Yes that is exactly.	20:32
zyuan	then LB can tell marconi's logic by accessing health	20:32
mpanetta	How?	20:32
zyuan	i don't agree with this.	20:32
mpanetta	All the health endpoint does is return go/nogo	20:33
zyuan	because you want this health do real posting	20:33
mpanetta	Internally	20:33
mpanetta	All the LB will see is 200 or 500	20:33
kgriffs	https://docs.google.com/file/d/0BxZAkOZUwvFAWnU1aWJfdjJ6V1U/edit?usp=sharing	20:33
zyuan	real posting is NG here to me.	20:33
kgriffs	i saved - not sure if it automagically updates for you guys	20:33
zyuan	request -> LB -> marconi	20:33
zyuan	and LB can also get 200 or xxx	20:34
zyuan	"is marconi alive" should be the information known by LB	20:34
mpanetta	kgriffs: taking a look now.	20:34
*** ayoung has quit IRC		20:34
zyuan	not "is marconi doing the right thing"	20:34
mpanetta	zyuan: But, isn't alive "doing the right thing"?	20:34
zyuan	no	20:34
zyuan	alive means, physically good	20:35
zyuan	right means, logically good	20:35
mpanetta	Because if it isn't 'doing the right thing', then the LB needs to drop it, else clients will see the issue.	20:35
kgriffs	by "alive" I should think we mean "the node can accept requests from users without 500's"	20:35
mpanetta	Yes	20:35
zyuan	LB must not drop nodes because they are logically wrong. LB should be dumb and not know what "logic" means	20:35
kgriffs	500's or layer 3 errors	20:36
kgriffs	network link between LB and web heads is already handled by the LB	20:36
mpanetta	zyuan: Why not? It does for other services.	20:36
kgriffs	we need to just detect internal app health problems	20:36
mpanetta	zyuan: If a server returns a 5xx, LB will drop.	20:36
kgriffs	so we can send the user somewhere else before they know anything is wrong	20:36
zyuan	there are many cases a server returns 5xx	20:36
zyuan	and some of them i don't think LB should understand	20:37
kgriffs	the health ping is just for checking health then in advance of user requests?	20:37
zyuan	so far yes, and what i'm	20:37
kgriffs	i mean, if the LB already watches for 500's then is the health check needed?	20:37
mpanetta	kgriffs: Specifically it means can this web head serve user requests.	20:37
zyuan	trying to add is to check db connection as well	20:38
zyuan	mpanetta: so, you see	20:38
mpanetta	kgriffs: If the health endpoint returns 5xx when something is broken then yeah.	20:38
zyuan	there is a big gap between "can serve user requests" and "can create queue/messages"	20:38
kgriffs	mpanetta: no, i mean, if in the course of a user request, a 500 comes back, does LB take the node out of rotation?	20:39
kgriffs	(not a request to health)	20:39
mpanetta	Basically I point the LB to a specific endpoint and LB pings the endpoint checking for a response. No response, LB drops, bad (5xx) LB drops.	20:39
mpanetta	kgriffs: No, it won't see it.	20:39
kgriffs	if so, and user requests happen frequently enough, seems like you wouldn't need the health check	20:39
mpanetta	I wish it did that ;)	20:39
kgriffs	me too!	20:39
mpanetta	But no, it only checks the endpoint we tell it to.	20:39
kgriffs	ok, so what do you think about that latest design?	20:40
mpanetta	It won't load :(	20:40
kgriffs	bah	20:40
kgriffs	stand by	20:40
kgriffs	g+	20:40
zyuan	last word: if LB can tell whether user can create a queue or not, it's no longer a "load" balancer; it's a "marconi" balancer	20:41
kgriffs	mpanetta755 ?	20:41
mpanetta	Technically the LB doesn't know squat.	20:41
mpanetta	kgriffs: panetta.mike	20:41
mpanetta	All LB knows is the endpoint we told it returns 500 or not	20:41
mpanetta	The one issue that this has to avoid, is that the end user can contact the queue server and not be able to use the service because one of the web heads is malfunctioning, but we can't detect it.	20:43
kgriffs	https://plus.google.com/hangouts/_/a260606dab37d907e09607e4dd107f51dd294bd5?hl=en	20:43
mpanetta	Ultimately that is all we care about, if you want to get down to it.	20:44
mpanetta	kgriffs: That image is closer to what I was thinking.	20:46
mpanetta	except the bastion app will reside on the same server as the worker(s) it is responsible for.	20:46
mpanetta	Mainly because it has to... heh	20:46
kgriffs	yeah	20:47
kgriffs	so that big box is all localhost	20:47
mpanetta	Ok that looks correct.	20:47
*** jdaggett1 has joined #openstack-marconi		20:48
kgriffs	can the router be made to do that?	20:48
mpanetta	I believe so yes.	20:48
kgriffs	the alternative I guess is to have a wsgi app that does it	20:48
kgriffs	then it wouldn't depend on the router - work with any router	20:48
mpanetta	Yes	20:49
mpanetta	In our case the router is uwsgi heh	20:49
mpanetta	So it would be a uwsgi app	20:49
*** alcabrera has joined #openstack-marconi		20:50
* alcabrera catches up		20:50
kgriffs	arg	20:51
kgriffs	just thought of something	20:51
kgriffs	nevermind	20:51
kgriffs	rbac to the rescue	20:51
mpanetta	alcabrera: if you have the link, kgriffs has pretty pictures in the hangout :)	20:52
kgriffs	https://plus.google.com/hangouts/_/a260606dab37d907e09607e4dd107f51dd294bd5?hl=en	20:52
mpanetta	Maybe I can convince oz_akan_ to come here too heh	20:53
alcabrera	I'll join up in a moment. :)	20:54
* alcabrera is caught up		20:54
kgriffs	so, the idea is that we would have that health bastion middleware	20:54
kgriffs	you configure it with an account which has a specific role that you can key off in RBAC middleware or oslo.policy	20:55
*** vkmc has joined #openstack-marconi		20:55
kgriffs	so, for everything BUT /health, the bastion is pass-through	20:55
kgriffs	for /health, it injects an X-Auth-Token	20:55
mpanetta	Hmm	20:56
kgriffs	that way, we don't have random users DDoS'n us	20:56
mpanetta	An optimization would be to make uwsgi only forward requests to /health to the bastion.	20:56
kgriffs	(since rate limiting is keyed off the tenant/project ID)	20:56
mpanetta	Ah ok	20:56
*** jdaggett1 has quit IRC		20:57
kgriffs	performance-wise the passthrough should be almost as fast as the router	20:57
kgriffs	but yeah, we could make it an external app	20:57
kgriffs	I was just thinking, putting the bastion into the pipeline means it would work with any kind of router/rproxy you put in front	20:58
mpanetta	Hmm, ok.	20:58
mpanetta	Or, the health check 'app', could be the first example app for the client lib... If you want others to be able to reuse it.	20:59
mpanetta	I donno.	21:00
kgriffs	hmm	21:00
mpanetta	I really should draw up a low level diagram of how we are doing things now.	21:00
mpanetta	Basically each web head has 8 uwsgi instances running marconi, and a router uwsgi instance	21:01
mpanetta	That selects the marconi instances based on some load balancing logic.	21:01
kgriffs	crap crap crap	21:01
mpanetta	?	21:01
alcabrera	?	21:01
kgriffs	any user can still hit /health unless somehow health bastion checks who is calling	21:01
alcabrera	yup	21:02
mpanetta	yeah	21:02
mpanetta	That is the biggest issue.	21:02
mpanetta	And I think we can handle it in the router, uwsgi has allow/deny rules I believe.	21:02
alcabrera	short of having a magic key or something like that for the health bastion configuration, any user could hit that endpoint.	21:02
mpanetta	We would have to whitelist the LB ips.	21:02
kgriffs	let's assume LB can only talk to the web head via internal network	21:03
oz_akan_	we can have an interesting url that none would know	21:03
kgriffs	meaning, a user can't hit the web head directly	21:03
oz_akan_	/healthsosososdjj33343434	21:03
kgriffs	oh, nevermind	21:03
mpanetta	security through obscurity is a nono...	21:03
kgriffs	lol	21:03
mpanetta	That is true, LB can only talk to web head internally.	21:04
*** malini_afk is now known as malini		21:04
kgriffs	but it is bad mojo to put secrets in URL	21:04
mpanetta	yes	21:04
oz_akan_	it is health check after all	21:04
mpanetta	Yeah, but we are worried about health check becoming the achilles heel I think.	21:05
kgriffs	so, normally we don't care if a user hits /health	21:05
kgriffs	however, the LB needs to hit it without auth	21:05
kgriffs	hmmm	21:06
kgriffs	so we inject auth using a bastion	21:06
kgriffs	but then rate limiting goes into a single bucket	21:06
kgriffs	opening us up for DDoS	21:06
kgriffs	unless	21:06
kgriffs	blah	21:07
kgriffs	nevermind	21:07
alcabrera	suggestion - why not: rate limit -> bastion -> auth -> rbac -> app?	21:07
kgriffs	does LB do X-Forwarded-For?	21:07
alcabrera	DDoS protection by rate limiting?	21:07
oz_akan_	I thought we had a very simple solution already :)	21:07
kgriffs	rate limit depends on knowing project ID	21:07
alcabrera	hmmm	21:08
alcabrera	that's fine :)	21:08
kgriffs	and we don't know that before auth	21:08
oz_akan_	I mean Mike's solution	21:08
alcabrera	We fake the project ID	21:08
alcabrera	Everyone hitting health has the same project ID	21:08
alcabrera	How about that?	21:08
kgriffs	that is the problem	21:08
kgriffs	we don't want that	21:08
mpanetta	oz_akan_: I think my solution turns out to be complicated when we worry about DDOS.	21:09
kgriffs	allows non-admin users to mess with the limit counter	21:09
kgriffs	mpanetta: how about that X-Forwarded-For header?	21:09
mpanetta	kgriffs: I don't know.	21:09
openstackgerrit	Dirk Mueller proposed a change to openstack/marconi: Start 2014.1 development https://review.openstack.org/53219	21:10
mpanetta	I can't set headers in health check, but maybe the absence of them would be ok.	21:10
kgriffs	My thinking was, if that header IS NOT present, we can assume it is the LB itself making the request to /health	21:10
mpanetta	Ah good thought.	21:10
alcabrera	cool - that sounds like it would work.	21:11
mpanetta	Let me read LB docs to see if it sets that header.	21:11
kgriffs	then and only then would the bastion inject the auth. Alternatively, it could skip auth middleware	21:12
alcabrera	I'm passing on the pretty pictures. I'm pretty heads down on getting this patch rebased. :P	21:12
mpanetta	alcabrera: Ok :)	21:12
kgriffs	so, related thought	21:13
kgriffs	it would be nice if instead of hard-coding keystone auth into the marconi WSGI app...	21:14
kgriffs	we had a generic notion of a WSGI pipeline	21:14
kgriffs	so operators don't have to write their own app.py to use other middleware	21:14
kgriffs	then we can have a solution to the auth issue that works for everybody	21:15
kgriffs	the issue is that sometimes you want to put stuff after keystone auth	21:15
kgriffs	but the current "auth strategy" approach only allows you to put stuff before auth	21:15
mpanetta	Ah yes	21:16
kgriffs	so, you are forced to not use WSGI transport's auth support if you want to do that	21:16
kgriffs	and it leads to every operator having to reinvent app.py ad nauseam	21:17
mpanetta	Bah, the docs seem lacking wrt set headers from LB	21:17
kgriffs	can you test	21:17
mpanetta	Hmm	21:17
kgriffs	i mean, capture request headers with a test request through the LB?	21:17
mpanetta	yeah...	21:17
mpanetta	sec	21:19
mpanetta	I need to install tcpdump...	21:20
mpanetta	Well, the user agent is set to an interesting value...	21:22
mpanetta	Yes, it sets X-Forwarded-For	21:24
mpanetta	kgriffs: ^^	21:25
kgriffs	ok, does the health check set that header as well?	21:25
kgriffs	(not sure what it would set it to!)	21:25
mpanetta	And X-Forwarded-Port as well...	21:25
mpanetta	It does not set it for health check	21:25
mpanetta	Only header that is set for that is agent id	21:25
kgriffs	w00t	21:26
kgriffs	FAN-TAST-IC	21:26
mpanetta	Yep	21:26
mpanetta	Stress level going down ;)	21:26
kgriffs	ok, so only thing left to decide is whether to run a separate app and have a router rule or just stick it in the wsgi pipeline	21:27
mpanetta	https://gist.github.com/anonymous/ff85fea856d008f312c5	21:28
kgriffs	from my perspective, deploying an extra app on the box seems more complicated, but i could be wrong	21:28
mpanetta	Check that for available headers	21:28
kgriffs	thanks!	21:29
mpanetta	Eh, from my pov it is just configuring another uwsgi instance, so not much more difficult.	21:29
mpanetta	Either way, how it is done does not matter as much to me ;)	21:29
kgriffs	ok	21:29
kgriffs	hmm	21:30
kgriffs	separate app would also require configuration for the loopback	21:30
mpanetta	true	21:30
*** jdaggett1 has joined #openstack-marconi		21:31
kgriffs	i'm thinking the wsgi middleware (worker option B) would be better and has a nice property of working with any kind of router	21:31
kgriffs	gunicorn, uwsgi, nginx, whatever	21:31
mpanetta	Ok	21:31
kgriffs	without having to use different configs, or maybe the router is too dumb anyway	21:31
mpanetta	I don't know about other implementations, but uwsgi is extremely powerful.	21:31
mpanetta	To the point of confusion in some cases it seems...	21:31
kgriffs	indeed	21:32
kgriffs	ready to write your first EOM contribution?	21:33
kgriffs	;)	21:33
mpanetta	hehe sure lol	21:34
kgriffs	so, we need a thing that you can configure with a URI	21:34
kgriffs	if that URI matches, AND X-Forwarded-For is NOT present, then it should inject X-Auth-Token	21:35
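
The rule kgriffs describes here can be sketched as a small WSGI middleware. This is only an illustration under stated assumptions, not the EOM implementation: the class name `HealthAuthBastion`, the `health_path` parameter, and the `get_token` callable (standing in for however the configured account's token would be obtained) are all hypothetical. It also assumes, as discussed earlier in the log, that web heads are reachable only through the LB, so a missing X-Forwarded-For header means the request is the LB's own health check.

```python
class HealthAuthBastion(object):
    """Hypothetical WSGI middleware: pass-through except for LB health checks."""

    def __init__(self, app, health_path, get_token):
        self.app = app                  # next WSGI app in the pipeline
        self.health_path = health_path  # e.g. '/health' (assumed value)
        self.get_token = get_token      # assumed callable returning a token string

    def __call__(self, environ, start_response):
        # WSGI exposes the X-Forwarded-For header as HTTP_X_FORWARDED_FOR.
        is_health = environ.get('PATH_INFO') == self.health_path
        from_lb = 'HTTP_X_FORWARDED_FOR' not in environ

        if is_health and from_lb:
            # Only the LB's own health check gets credentials injected.
            environ['HTTP_X_AUTH_TOKEN'] = self.get_token()

        # Everything else passes through untouched.
        return self.app(environ, start_response)
```

Wrapping the application would then look like `app = HealthAuthBastion(app, '/health', get_token)`, placed ahead of the auth middleware so the injected token is validated like any other.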
mpanetta	Hopefully oz_akan_ will grant me the time to do such a wonderful thing ;)	21:35
kgriffs	guess we should make an issue	21:35
kgriffs	hold on	21:35
* mpanetta holds		21:35
mpanetta	afk	21:36
*** mpanetta is now known as mpanetta_afk		21:36
oz_akan_	I have to leave now, I will try to understand tomorrow why this needs to be a part of EOM and why mpanetta needs to invest time on this	21:36
kgriffs	heh	21:37
oz_akan_	even if it is eom, it has to have a token first.. I didn't follow the thread.. anyway.. talk to you tomorrow	21:37
kgriffs	he doesn't necessarily, but someone does	21:37
kgriffs	yeah, you have to configure it with account creds	21:37
oz_akan_	ok, let's catch up tomorrow, bye for now	21:37
*** oz_akan_ has quit IRC		21:37
kgriffs	ttfn	21:37
alcabrera	setattr - it's what's been biting me all this time. Something about TestBaseFaulty makes it so that all tests were using the FaultyStorage driver. :/	21:40
alcabrera	Still digging into this	21:40
alcabrera	I noticed that changing the name of bootstrap.storage to bootstrap.kab was fixing all tests except for those involving the Faulty Storage driver.	21:41
zyuan	alcabrera: so you want some tests uses Faulty driver or something?	21:43
alcabrera	nah	21:43
alcabrera	There's something weird going on that the existing FaultyTest that's affecting the rest of the tests.	21:44
alcabrera	**on with the ...	21:44
mpanetta_afk	alcabrera: Your brain is faster than your fingers :)	21:49
alcabrera	mpanetta_afk: yup. :P	21:50
*** mpanetta_afk is now known as mpanetta		21:50
mpanetta	I have that problem at times, it results in sentences missing bits of thought lol	21:51
mpanetta	Anyway, it is go home time for me.	21:52
*** mpanetta has quit IRC		21:53
alcabrera	curiously, if I remove the setattrs from the FaultyDriver test setup, all tests now pass.	21:53
zyuan	...	21:54
alcabrera	including the faulty driver tests.	21:54
zyuan	!!!	21:54
openstack	zyuan: Error: "!!" is not a valid command.	21:54
*** ayoung has joined #openstack-marconi		21:57
zyuan	i want to help if you can't solve it by tomorrow	21:57
*** malini is now known as malini_afk		22:05
openstackgerrit	Alejandro Cabrera proposed a change to openstack/marconi: feat: integrate shard storage with transport https://review.openstack.org/50998	22:05
alcabrera	solved	22:06
alcabrera	I decided to promote 'faulty' to a setup.cfg entry point.	22:06
alcabrera	Then everything works without setattr magic	22:06
alcabrera	zyuan: ^	22:06
alcabrera	zyuan: thanks for the offer to help, though. :)	22:06
*** tedross has quit IRC		22:07
alcabrera	zyuan, kgriffs: all patches in the admin api feature branch are rebased and ready for review.	22:09
alcabrera	I'm double-checking the queues' catalogue storage driver now	22:09
alcabrera	(separate branch)	22:09
kgriffs	nice work	22:10
kgriffs	I will check it out	22:10
alcabrera	thanks!	22:10
kgriffs	fyi, eom issue for that auth thingy	22:12
kgriffs	https://github.com/racker/eom/issues/7	22:12
alcabrera	kgriffs: cool - I'll tackle that one tomorrow morning. I need a change of pace. Waaaay too much rebasing lately. :P	22:13
kgriffs	ok	22:13
kgriffs	let's sync up with the devops guys in the morning to finalize the design	22:14
alcabrera	sure thing	22:14
alcabrera	I'm our for the night. There's some pork chops just waiting to be cooked.	22:15
alcabrera	o/	22:15
alcabrera	*out	22:15
*** alcabrera has left #openstack-marconi		22:15
*** jdaggett1 has quit IRC		22:16
*** jdaggett1 has joined #openstack-marconi		22:17
*** jdaggett1 has left #openstack-marconi		22:17
*** amitgandhi has quit IRC		22:18
*** oz_akan_ has joined #openstack-marconi		22:44
*** oz_akan_ has quit IRC		22:49
*** reed has quit IRC		23:15
*** reed has joined #openstack-marconi		23:16
*** jcru has quit IRC		23:25
*** malini_afk is now known as malini		23:34
*** kgriffs is now known as kgriffs_afk		23:37
*** amitgandhi has joined #openstack-marconi		23:53
