*** vkmc has joined #openstack-marconi | 00:10 | |
*** fifieldt_ has joined #openstack-marconi | 00:24 | |
*** vkmc has quit IRC | 00:39 | |
*** oz_akan_ has joined #openstack-marconi | 00:44 | |
*** oz_akan_ has quit IRC | 01:06 | |
*** oz_akan_ has joined #openstack-marconi | 01:06 | |
*** oz_akan_ has joined #openstack-marconi | 01:07 | |
*** oz_akan_ has quit IRC | 01:13 | |
*** oz_akan_ has joined #openstack-marconi | 01:20 | |
*** oz_akan_ has quit IRC | 02:27 | |
*** oz_akan_ has joined #openstack-marconi | 02:28 | |
*** oz_akan_ has quit IRC | 02:32 | |
*** oz_akan_ has joined #openstack-marconi | 02:41 | |
*** oz_akan_ has quit IRC | 02:43 | |
*** oz_akan_ has joined #openstack-marconi | 03:54 | |
*** oz_akan_ has quit IRC | 03:58 | |
*** dafter has joined #openstack-marconi | 05:34 | |
*** dafter has quit IRC | 05:40 | |
*** dafter has joined #openstack-marconi | 05:41 | |
*** dafter has quit IRC | 05:45 | |
*** ykaplan has joined #openstack-marconi | 06:40 | |
*** dafter has joined #openstack-marconi | 07:46 | |
*** ykaplan has quit IRC | 08:04 | |
*** yassine has joined #openstack-marconi | 08:14 | |
openstackgerrit | Flavio Percoco proposed a change to openstack/marconi: 'Persist' __getattr__ results https://review.openstack.org/51539 | 08:25 |
---|---|---|
openstackgerrit | Flavio Percoco proposed a change to openstack/marconi: Use stevedore instead of importutils https://review.openstack.org/51540 | 08:25 |
openstackgerrit | Flavio Percoco proposed a change to openstack/marconi: Return a consumer function instead of consuming https://review.openstack.org/51541 | 08:25 |
*** ykaplan has joined #openstack-marconi | 08:38 | |
*** ykaplan has quit IRC | 09:03 | |
*** ykaplan has joined #openstack-marconi | 10:38 | |
*** dafter has quit IRC | 11:04 | |
*** tedross has joined #openstack-marconi | 11:18 | |
*** fifieldt_ has quit IRC | 11:21 | |
*** dafter has joined #openstack-marconi | 12:06 | |
*** dafter has quit IRC | 12:12 | |
*** tvb|afk has joined #openstack-marconi | 12:14 | |
*** tvb|afk has quit IRC | 12:14 | |
*** tvb|afk has joined #openstack-marconi | 12:14 | |
*** tedross has quit IRC | 12:23 | |
*** malini_afk is now known as malini | 12:33 | |
*** mpanetta has joined #openstack-marconi | 12:40 | |
*** tvb|afk has quit IRC | 12:41 | |
*** tedross has joined #openstack-marconi | 12:41 | |
*** mpanetta has quit IRC | 12:42 | |
*** mpanetta has joined #openstack-marconi | 12:42 | |
*** dafter has joined #openstack-marconi | 12:44 | |
*** dafter has quit IRC | 12:44 | |
*** dafter has joined #openstack-marconi | 12:44 | |
*** oz_akan_ has joined #openstack-marconi | 13:05 | |
*** oz_akan_ has quit IRC | 13:07 | |
*** oz_akan_ has joined #openstack-marconi | 13:08 | |
*** rustlebee is now known as russellb | 13:10 | |
*** amitgandhi has joined #openstack-marconi | 13:26 | |
*** amitgandhi has quit IRC | 13:26 | |
*** amitgandhi has joined #openstack-marconi | 13:27 | |
*** vkmc has joined #openstack-marconi | 13:27 | |
*** kgriffs_afk is now known as kgriffs | 13:38 | |
*** kgriffs is now known as kgriffs_afk | 13:40 | |
*** ametts has joined #openstack-marconi | 13:44 | |
*** jcru has joined #openstack-marconi | 14:08 | |
*** alcabrera has joined #openstack-marconi | 14:21 | |
*** dafter has quit IRC | 14:33 | |
*** dafter has joined #openstack-marconi | 14:38 | |
*** tvb|afk has joined #openstack-marconi | 14:38 | |
*** tvb|afk has quit IRC | 14:38 | |
*** tvb|afk has joined #openstack-marconi | 14:38 | |
*** dafter has quit IRC | 14:38 | |
*** kgriffs_afk is now known as kgriffs | 14:41 | |
kgriffs | alcabrera: morning! | 14:41 |
alcabrera | kgriffs: morning! :) | 14:41 |
kgriffs | soooo | 14:41 |
kgriffs | I kicked over an ant pile when I started working on getting rid of global conf | 14:42 |
kgriffs | :p | 14:42 |
alcabrera | oh? :x | 14:42 |
alcabrera | What'd you find that cause a stir? | 14:42 |
kgriffs | well, it turns out there are tons of things that break when you try to do that | 14:42 |
kgriffs | so, I am putting that it it's own patch | 14:42 |
kgriffs | and then the sharding will depend on it | 14:43 |
zyuan | kgriffs: i remembered there was a negative ttl problem reported to cloudqueues@, is there any further information? | 14:43 |
kgriffs | zyuan: ah, that one's been fixed | 14:43 |
kgriffs | alcabrera: so, i am getting close to having that done, stand by | 14:44 |
alcabrera | kgriffs: yikes. well, the separate patch approach works for me, esp. considering the scope of the change. | 14:44 |
alcabrera | also... | 14:44 |
alcabrera | Marconi meeting in an hour, kgriffs? :P | 14:44 |
* alcabrera goes to check agenda | 14:44 | |
kgriffs | oh, yeah. forgot to send out the email again | 14:44 |
alcabrera | it was a frantic week, this last one. Seems like one of us should start sending out that email Fridays. | 14:45 |
kgriffs | yeah. Fridays would be good | 14:45 |
kgriffs | anyway, yeah, we are meeting | 14:46 |
kgriffs | :p | 14:46 |
alcabrera | heh. :P | 14:47 |
zyuan | kgriffs: which patch? | 14:47 |
kgriffs | https://github.com/openstack/marconi/commit/33f75cd5927e129e6de4fe2116abb5159fa0061a | 14:48 |
*** tvb|afk has quit IRC | 14:50 | |
kgriffs | alcabrera: just registered this bug, fwiw - https://bugs.launchpad.net/marconi/+bug/1239725 | 14:51 |
alcabrera | kgriffs: thanks! There's this other very related one here: https://bugs.launchpad.net/marconi/+bug/1229848 | 14:53 |
alcabrera | actually... the title is 90% identical. :P | 14:53 |
alcabrera | Maybe dup? | 14:53 |
alcabrera | Since you're working on eliminating the global config, I'm going to merge the two tickets into yours and leave you assigned to it. | 14:54 |
kgriffs | oh, ok | 14:54 |
kgriffs | I saw that one but it was specific to proxy and the proxie's future is TBD | 14:55 |
kgriffs | so... | 14:55 |
kgriffs | not sure if it is worth merging yet | 14:55 |
*** flaper87|afk is now known as flaper87 | 14:57 | |
alcabrera | kgriffs: fair enough. | 14:57 |
alcabrera | flaper87: o/ | 14:57 |
alcabrera | kgriffs: I updated the title instead (prefix: [queues]) | 14:57 |
zyuan | kgriffs: not this one, this one fixes negative age. serveral days ago, someone mentioned there is a negative ttl | 15:01 |
kgriffs | hmm, I didn't hear about that | 15:02 |
kgriffs | is there a bug registered? | 15:02 |
*** yassine has quit IRC | 15:03 | |
*** tedross has quit IRC | 15:03 | |
alcabrera | kgriffs, zyuan: there is no open bug registered for negative TTL/age at this time. | 15:04 |
kgriffs | malini: ping | 15:05 |
flaper87 | yo yo yooooo | 15:10 |
flaper87 | how are y'all doiiiing??? | 15:11 |
zyuan | kgriffs: ah ok, it is the one you fixed | 15:11 |
alcabrera | flaper87: alright. How'd PyconIE go? (is it still going?) :) | 15:11 |
flaper87 | it went pretty well, loved it! The talk was good as well,at least that's what people said :D | 15:13 |
kgriffs | cool | 15:13 |
* flaper87 talked about the new partition stuff | 15:14 | |
flaper87 | folks were very happy about it | 15:14 |
alcabrera | sweet! | 15:15 |
*** yassine has joined #openstack-marconi | 15:16 | |
flaper87 | I'm about to jump on my flight back, I guess I'll miss the meeting this week | 15:17 |
flaper87 | did you guys saw my comments on gerrit? | 15:17 |
flaper87 | (and the new patches) "D | 15:17 |
flaper87 | :D | 15:17 |
alcabrera | flaper87: yup, I've been checking them out. :) | 15:17 |
flaper87 | kgriffs: btw, if you get a chance, could you go through the client queue ? | 15:18 |
kgriffs | yeah, sorry I've been behind on that | 15:18 |
kgriffs | hopefully later today | 15:18 |
flaper87 | alcabrera: awesome, thanks a lot! | 15:18 |
alcabrera | flaper87: cached_getattr is cool. Nice decorator! | 15:18 |
kgriffs | I am getting rid of global config in queues at the moment | 15:19 |
flaper87 | I'll be back full-time tomorrow | 15:19 |
kgriffs | it's blocking the sharding work | 15:19 |
*** yassine has quit IRC | 15:19 | |
flaper87 | alcabrera: :D | 15:19 |
*** yassine has joined #openstack-marconi | 15:19 | |
alcabrera | flaper87: looking forward to your return, but glad to hear the trip has been going well. :D | 15:19 |
flaper87 | kgriffs: awesome, that will definitely help with the global config patch | 15:19 |
kgriffs | yep | 15:19 |
flaper87 | yeah! It's my first time in Dublin, very cool city | 15:20 |
zyuan | wow | 15:20 |
flaper87 | I don't like the wheather, though. :P | 15:20 |
flaper87 | but it seems that it's like 'the best it gets' here | 15:20 |
flaper87 | :D | 15:20 |
kgriffs | heh | 15:20 |
*** tedross has joined #openstack-marconi | 15:20 | |
flaper87 | just like london! | 15:20 |
flaper87 | but london gets better, though | 15:20 |
flaper87 | anyway | 15:21 |
kgriffs | lol | 15:21 |
kgriffs | I'm thinking maybe we skip today's meeting | 15:23 |
kgriffs | There is a bunch of stuff on the agenda that I'd like flaper87 here for | 15:23 |
kgriffs | and the remaining stuff would make for a very short meeting | 15:24 |
kgriffs | thoughts? | 15:24 |
alcabrera | kgriffs: sounds fair. I'm double-checking the agenda to be sure. | 15:24 |
ametts | Okay with me. | 15:24 |
flaper87 | okey with me! Sorry for not being around today! | 15:24 |
alcabrera | action review, bug update, graduation status, API feedback, API versioning strategy, open discussion | 15:24 |
alcabrera | So of those items, I wouldn't want flaper87 to miss out on graduation/API discussions. :P | 15:25 |
alcabrera | Plus, we can start making it a routine to announce our meeting on the ML every Friday. | 15:26 |
kgriffs | yep | 15:26 |
kgriffs | I'll do that | 15:26 |
alcabrera | kgriffs: my thoughts - I'm in favor of skipping this meeting. | 15:26 |
kgriffs | anyway, I'll pop into #openstack-meeting-alt and let folks know | 15:26 |
alcabrera | cool, thanks! | 15:27 |
flaper87 | awesome, thanks guys, A LOT! | 15:27 |
malini | o/ | 15:31 |
malini | sorry just got back from the appt | 15:31 |
malini | We had a negative TTL bug (for message & claims) a whole back tht alcabrera fixed | 15:31 |
malini | The other bugs which kgriffs fixed recently is the negative age for messages from the stats endpoint | 15:32 |
malini | bug* | 15:32 |
alcabrera | malini: o/ | 15:32 |
kgriffs | hmm | 15:34 |
kgriffs | # Maximum Content-Length allowed for metadata updating and | 15:34 |
kgriffs | # message posting. | 15:34 |
kgriffs | ;metadata_max_length = 65536 | 15:34 |
kgriffs | ;content_max_length = 262144 | 15:34 |
kgriffs | it looks like if I have, say, queue metadata with whitespace in it, i could exceed 64 K | 15:35 |
kgriffs | and the request would fail | 15:35 |
kgriffs | …even though | 15:36 |
kgriffs | we say they can have up to 64 K compacted (without whitespace) | 15:36 |
kgriffs | metadata_size_uplimit | 15:36 |
zyuan | kgriffs: wait | 15:36 |
kgriffs | am I misunderstanding? | 15:36 |
kgriffs | # Place JSON size restriction before parsing | 15:37 |
kgriffs | if req.content_length > CFG.metadata_max_length: | 15:37 |
zyuan | what's the problem? | 15:37 |
kgriffs | the problem is that we aren't making good on our promise | 15:37 |
kgriffs | at least, with the default settings | 15:37 |
zyuan | those are just content_length limits | 15:38 |
kgriffs | yes, but if they are both 64 K | 15:38 |
zyuan | and? | 15:38 |
kgriffs | so, whitespace isn't really ignored, in effect | 15:38 |
zyuan | it is ignored if you condigured the limit to ignore to a smaller value | 15:39 |
zyuan | it's just not observable when the configuration is 64k/64k | 15:39 |
alcabrera | kgriffs: I'm of the opinion that we should take back our promise on compaction, since that necessarily implies that somewhere in the processing pipeline, we'd be doing some form of compaction. | 15:40 |
alcabrera | On whitespace. | 15:40 |
kgriffs | alcabrera: we are | 15:40 |
kgriffs | the validator compacts the json before checking length | 15:41 |
alcabrera | kgriffs: ah | 15:41 |
zyuan | storage don;t receive request | 15:41 |
kgriffs | we thought it would be easier for the user | 15:41 |
kgriffs | easier/more predictable | 15:41 |
kgriffs | anyway | 15:41 |
kgriffs | say I set the uplimit to 11 | 15:41 |
alcabrera | "length = _compact_json_length(metadata)" | 15:41 |
alcabrera | I see it now | 15:41 |
kgriffs | and max content length to 11 | 15:42 |
kgriffs | and I pass {'thing':1} | 15:42 |
kgriffs | that works fine | 15:42 |
kgriffs | {'thing': 1} | 15:42 |
kgriffs | would fail | 15:42 |
kgriffs | right? | 15:42 |
zyuan | yes | 15:42 |
kgriffs | that's all I'm saying | 15:42 |
zyuan | if you do want to observe the effect of checking | 15:42 |
zyuan | set them to different values | 15:42 |
zyuan | the fact is | 15:42 |
kgriffs | that is the whole point | 15:42 |
kgriffs | of the compaction | 15:42 |
zyuan | don't change *_length | 15:43 |
*** dafter has joined #openstack-marconi | 15:43 | |
zyuan | user are not expected to change *_max_length | 15:43 |
* kgriffs is confused | 15:43 | |
kgriffs | why not | 15:43 |
kgriffs | and for that matter, what is the point of _max_length? | 15:43 |
zyuan | the limit of content_length is not so useful to user | 15:43 |
zyuan | it's just a defender, that's it | 15:43 |
* alcabrera just learned about the separators option in json.dumps | 15:44 | |
kgriffs | but | 15:44 |
kgriffs | the web server already has limits like that | 15:44 |
zyuan | then why you want to change it? | 15:44 |
kgriffs | what do we gain by doing it in our app? | 15:44 |
kgriffs | i mean, the content length limits? | 15:44 |
alcabrera | kgriffs: good point | 15:45 |
zyuan | because you said web server may not have it | 15:45 |
alcabrera | hmmm | 15:45 |
kgriffs | hmm | 15:45 |
* alcabrera does some research | 15:45 | |
zyuan | and marconi should do it by itself | 15:45 |
kgriffs | if I did, I mispoke | 15:45 |
alcabrera | http://wiki.nginx.org/HttpCoreModule#client_max_body_size (nginx) | 15:45 |
kgriffs | every real web server restricts body size | 15:45 |
zyuan | and i agree it's ok to do it byitself | 15:45 |
alcabrera | http://www.iis.net/configreference/system.webserver/security/requestfiltering/requestlimits (Microsoft IIS) | 15:45 |
zyuan | because it looks wired if you need to configure things in two places | 15:46 |
zyuan | especially | 15:46 |
zyuan | marconi don't know how you configured your web server | 15:46 |
zyuan | and don't know the limit | 15:46 |
kgriffs | I think we should just let the web server do it's thing, and in the app we only check compact json, since that has storage ramifications, and the webserver isn't smart enough to parse the json | 15:47 |
zyuan | to take advantage of not to performan the whitespace checking | 15:47 |
zyuan | i decided to inline the content_length checking | 15:47 |
*** flaper87 is now known as flaper87|afk | 15:47 | |
zyuan | kgriffs: my suggestion is we can keep this config var | 15:47 |
kgriffs | alcabrera: does json schema act on the raw text? | 15:47 |
zyuan | but perform no checking | 15:48 |
zyuan | just ask user to fill those config var with what their server have | 15:48 |
zyuan | sounds better? | 15:48 |
alcabrera | kgriffs: lemme double check that jsonschema behavior | 15:48 |
alcabrera | http://httpd.apache.org/docs/trunk/mod/core.html#limitrequestbody (Apache httpd) | 15:48 |
kgriffs | i don't see any point in requiring the operator to configure that in two places | 15:48 |
zyuan | kgriffs: otherwise you have to perform real whitespace checking in deployment | 15:49 |
kgriffs | basically, we need to verify JSON, and we need to verify length of compact json for the user. Fudging it is a bad UX | 15:49 |
zyuan | i need an approach to workaround it | 15:49 |
kgriffs | I was under the impression we were doing real checking already | 15:49 |
zyuan | kgriffs: no | 15:50 |
zyuan | kgriffs: not in deployment, i belive | 15:50 |
zyuan | this is deployment-dependent | 15:50 |
kgriffs | hmm | 15:50 |
kgriffs | then what we are telling users and what we are checking doesn't match | 15:50 |
alcabrera | kgriffs: jsonschema works off of the already-decoded JSON, so... dict objects. | 15:51 |
zyuan | this behavior is not observable, so we kept the promise | 15:51 |
kgriffs | it is observable | 15:51 |
kgriffs | see my example above | 15:51 |
zyuan | how? | 15:51 |
alcabrera | jsonschema.validate(schema: dict, data: dict) | 15:51 |
zyuan | no, it's not, it violated one of the limit | 15:51 |
kgriffs | a user will submit metadata with whitespace, expecting it to be valid, because we promise that we will strip whitespace before checking the limit | 15:52 |
kgriffs | HOWEVER | 15:52 |
kgriffs | we check it against content length and return 400 | 15:52 |
kgriffs | so, we are violating the contract | 15:52 |
kgriffs | or am I missing something? | 15:52 |
zyuan | setting _max_length is equivilient to set it in server | 15:52 |
zyuan | so user should understand this as "my request is totally wrong and not reached marconi yet" | 15:53 |
zyuan | user violated the precondition first | 15:53 |
zyuan | so we just done the right thing | 15:53 |
kgriffs | _max_length would need to be large enough to account for potential whitespace difference between the two checks | 15:53 |
zyuan | kgriffs: so you need to set it to a large enough value! | 15:54 |
zyuan | not 11 | 15:54 |
zyuan | for example, 64K | 15:54 |
kgriffs | the default config it is equivalent, and it isn't obvious that this is the case | 15:54 |
kgriffs | and in any case, we are duplicating what the web server already does for us | 15:54 |
zyuan | kgriffs: we can alwasy change the other value | 15:54 |
zyuan | kgriffs: i agree, but i also want to say we'd better to keep this value | 15:55 |
kgriffs | I'm still not convinced we need _max_length at all | 15:55 |
kgriffs | it's redundant | 15:55 |
alcabrera | given that IIS, apache httpd, and nginx all support content-length limitation, I'm in agreement, kgriffs. | 15:55 |
kgriffs | (since the server already checks this for us) | 15:55 |
zyuan | kgriffs: i know! but how marconi get server setting? | 15:55 |
kgriffs | marconi doesn't care | 15:55 |
zyuan | kgriffs: i has to | 15:56 |
zyuan | it has to | 15:56 |
kgriffs | why? | 15:56 |
zyuan | to skip the deserilization checking at all | 15:56 |
zyuan | without breaking any promise | 15:56 |
kgriffs | first of all, I think it is safe to assume that the server's max content length will always be greater than marconi's | 15:57 |
zyuan | they shall be same | 15:57 |
zyuan | they'd better be same | 15:57 |
kgriffs | nope | 15:57 |
kgriffs | i mean | 15:57 |
kgriffs | greater than marconi's uplimit | 15:57 |
zyuan | uplimit is fine, different | 15:57 |
kgriffs | assume we remove marconi's metadata_max_length | 15:57 |
zyuan | max_length shall be as same as server setting | 15:58 |
kgriffs | I'm saying, let's remove it | 15:58 |
alcabrera | +1 | 15:58 |
zyuan | if we remove it, then for each request we need to do deserialization | 15:58 |
kgriffs | and also assume that server's max length is like 2x metadata_size_uplimit or whatever | 15:58 |
zyuan | kgriffs: *_uplimit is fine, it can be anyX | 15:59 |
alcabrera | kgriffs: that kind of assumption should be documented in the *yet to be born* marconi operator's manual. | 15:59 |
kgriffs | zyuan: for each request that is within a sane limit, yes we need to deserialize | 15:59 |
kgriffs | that happens anyway | 15:59 |
alcabrera | It's going to be fun working on marconi docs when we get to that point. :) | 15:59 |
zyuan | server's max length and metadata_size_uplimit are just not related, i actually don't know that you mean | 15:59 |
kgriffs | here's the thing | 16:00 |
kgriffs | right now there is | 16:00 |
kgriffs | metadata_max_length | 16:00 |
kgriffs | if the raw, unparsed json exceeds that length, we return 400 | 16:01 |
*** whenry has joined #openstack-marconi | 16:01 | |
kgriffs | also | 16:01 |
kgriffs | otherwise we keep going | 16:01 |
kgriffs | next we check compact json | 16:01 |
kgriffs | but only | 16:01 |
kgriffs | if metadata_size_uplimit < metadata_max_length | 16:01 |
kgriffs | so, assuming that the operator would always set metadata_max_length to something greater than metadata_size_uplimit | 16:02 |
kgriffs | (to account for possible whitespace) | 16:02 |
zyuan | it's ok, and it's right, just slow | 16:02 |
kgriffs | that short-circuit, in practice, will never happen | 16:03 |
zyuan | it must | 16:03 |
zyuan | SQS does it | 16:03 |
kgriffs | if we are always skipping the compact JSON check then there is not point in having it in the first place | 16:03 |
kgriffs | s/not/no | 16:03 |
zyuan | kgriffs: there is a point, i want this behavior to be opauqe | 16:04 |
zyuan | oh, you mean, first place | 16:04 |
zyuan | ok. i don't know. | 16:04 |
kgriffs | seems like we either need to decide to take the hit and validate this for real, or tell users that we validate with whitespace included | 16:05 |
*** alcabrera has quit IRC | 16:06 | |
zyuan | i just provided things work for both case | 16:06 |
zyuan | (it sounds like sophistry but i belive that's key point of design...) | 16:08 |
*** dafter has quit IRC | 16:08 | |
kgriffs | %timeit _compact_json_length({}) | 16:09 |
kgriffs | 10.6 us | 16:09 |
kgriffs | hmm | 16:09 |
*** reed has joined #openstack-marconi | 16:09 | |
*** alcabrera has joined #openstack-marconi | 16:09 | |
kgriffs | let me think on this. that is pretty slow. we either need to speed it up or abandon the idea altogether | 16:09 |
alcabrera | back - internet had a disconnect. :P | 16:10 |
* alcabrera catches up | 16:10 | |
zyuan | and there is always an idea around which to use a much faster and native and (fill whatever you want) json parser | 16:10 |
kgriffs | yeah, the original discussion is coming back to me | 16:13 |
*** yassine has quit IRC | 16:13 | |
kgriffs | the main slowdown was reserializing the body of each message | 16:13 |
*** ykaplan has quit IRC | 16:15 | |
kgriffs | basically, we would need a way to very quickly check a subdocument's length, with whitespace stripped | 16:17 |
kgriffs | whether we compact or not, we need to verify the size of each message body | 16:19 |
zyuan | kgriffs: i want to ask a question | 16:21 |
zyuan | if i send a piece of JSON in UTF-8 | 16:22 |
zyuan | which contains { "key": "this is one unicode char represented in 3 bytes" } | 16:22 |
zyuan | what's the expected "docoment length with whitespace stripped"? | 16:23 |
zyuan | {"key": "1"} or {"key": "xxx"}? | 16:24 |
kgriffs | the spec says "charactes" not "bytes" | 16:24 |
kgriffs | so we should be counting chars I guess | 16:24 |
kgriffs | or we can change the spec | 16:25 |
zyuan | i'm ok with chars | 16:25 |
zyuan | because "whitespace" only make sense on chars | 16:25 |
kgriffs | looks like _compact_json_length is counting chars | 16:25 |
zyuan | kgriffs: yes | 16:25 |
zyuan | i ask this just for measure how hard to write a size-only parser. | 16:26 |
alcabrera | ```len(filter(lambda x: x not in string.whitespace, document)``` is crazy fast on Python 3 (~300 ns), and about 3us on Python 2.7 | 16:27 |
alcabrera | for a small document, I should say. | 16:28 |
zyuan | alcabrera: but obviously that does not work because JSON string can also contains whitespace | 16:28 |
alcabrera | hmmm... | 16:29 |
kgriffs | if we could avoid even parsing messages['body'] in the first place, keeping it a blob, then we could just strip whitespace and do len() | 16:29 |
kgriffs | len(messages['body']) | 16:29 |
zyuan | kgriffs: a more realistic problem | 16:30 |
zyuan | what is the length of : 3.1416? | 16:30 |
zyuan | the storage presure of 3.1416 is as same as .0 | 16:31 |
kgriffs | come on, I bet we could contribute a patch to simplejson to support that | 16:31 |
kgriffs | where's your sense of adventure? | 16:31 |
zyuan | kgriffs: i have a feeling that "size without whitespace" have different meaning to different usecase, can the term "size" itself can also have different meanings | 16:32 |
kgriffs | well, we have already established that "size" means "number of characters" | 16:32 |
kgriffs | whether or not we strip whitespace, we still need to validate number of characters for each body | 16:33 |
zyuan | but as i shown it does not mean too much for storage | 16:33 |
kgriffs | unless we want to tell users something different than we've been telling them | 16:33 |
zyuan | i agree it's quite clear to JSON | 16:34 |
kgriffs | thought experiment | 16:34 |
kgriffs | what if we were to allow application/msgpack | 16:34 |
kgriffs | (assuming that were a real media type | 16:34 |
kgriffs | ) | 16:34 |
zyuan | msgpack has only 1 representation | 16:34 |
kgriffs | now, we end up defining limits differently | 16:35 |
zyuan | so, no problem | 16:35 |
zyuan | that's true | 16:35 |
kgriffs | i think it is ok to define limits based on media type | 16:35 |
kgriffs | the main thing is the user knows what to expect | 16:35 |
zyuan | however, things become tricky because | 16:35 |
zyuan | msgpack size = representation size = bytes | 16:35 |
zyuan | json size (in our definition) != json chars sent by user != bytes | 16:36 |
zyuan | JSON has two more layers here, when comparing to msgpack | 16:37 |
kgriffs | yeah | 16:37 |
zyuan | one for encoding (utf-8), one for whitespace | 16:37 |
zyuan | and if storage comes in, things become furtherly tricky | 16:38 |
zyuan | does mongo support insert BSON directly? | 16:39 |
kgriffs | mongo only speaks bson, so I assume you mean pymongo | 16:39 |
zyuan | yea | 16:39 |
kgriffs | idk | 16:39 |
zyuan | (who cares the server api) | 16:40 |
zyuan | anyway, it simplfies mongo but not for others... | 16:41 |
zyuan | need to revisit the sizing concept. if needed, i'm ok with doing anything in C/C++ | 16:48 |
kgriffs | OK. I think the best bet would be to have a JSON validator that can check the size of subdocuments | 16:49 |
zyuan | (simplejson the one we are using is also in C) | 16:49 |
zyuan | kgriffs: hard to say at this point of time | 16:50 |
alcabrera | InvalidDocument: BSON document too large (1000000098 bytes) - the connected server supports BSON document sizes up to 16777216 bytes - given a large enough project-id | 16:51 |
kgriffs | we have three options | 16:51 |
kgriffs | 1. verify size before serialization | 16:51 |
kgriffs | 2. verify size after serialization, which would require re-serialization | 16:51 |
kgriffs | 3. don't serialize the body in the first place | 16:51 |
kgriffs | but | 16:51 |
kgriffs | number 3 will cause us problems if we ever want to support non-JSON media types | 16:52 |
kgriffs | unless we tell users that body has to be a string blob | 16:52 |
kgriffs | but that sucks | 16:52 |
kgriffs | if body is structured as the user submits it to us, we need to give it back to them structured | 16:52 |
kgriffs | anyway | 16:53 |
kgriffs | 2. is what we are doing now | 16:53 |
zyuan | i have one more option in mind | 16:53 |
kgriffs | I'm all ears | 16:54 |
zyuan | first, define size differently, for each json types | 16:54 |
zyuan | for example, json string size as just len(s) | 16:54 |
zyuan | double size is 8 | 16:54 |
zyuan | as well as integer | 16:54 |
zyuan | null size is 8 | 16:54 |
zyuan | bool, maybe 1? | 16:55 |
zyuan | like that | 16:55 |
zyuan | then, work through the parsed python dict, sum up the size | 16:55 |
zyuan | that's it | 16:55 |
zyuan | benefits: 1. close to what db stores | 16:56 |
zyuan | 2. close to user expectation | 16:56 |
zyuan | 3. easy to implement. slow in cpython i blieve, but ok for pypy. pypy's raw loops are fine. | 16:56 |
kgriffs | I guess the con is that it is ony "close" to user expectations | 16:57 |
zyuan | when i say "user expectation" i mean, JSON cares each chars | 16:57 |
zyuan | but user don't! | 16:57 |
zyuan | if i;m user, i hope server don't care the floating point precision of my dicoment | 16:57 |
zyuan | that makes no sense | 16:58 |
zyuan | 3.1415345347 should be regarded to have the same size as .0 | 16:58 |
kgriffs | that can be confusing to some users | 16:58 |
zyuan | i'm a numeric anaysis user | 16:58 |
zyuan | :) | 16:58 |
kgriffs | yeah | 16:58 |
kgriffs | some users would be ok, but some would be confused | 16:59 |
kgriffs | that being said | 16:59 |
kgriffs | if you weren't actually emitting a serialized document string | 16:59 |
zyuan | when you design something significant, there will be someone confused (said by Bjarne LOL) | 16:59 |
kgriffs | could you make counting faster but still be accurate? | 16:59 |
kgriffs | you still have to str the numbers | 17:00 |
zyuan | you mean..? | 17:00 |
kgriffs | but bool and everything else you could just count | 17:00 |
zyuan | numbers don't len | 17:00 |
kgriffs | yeah | 17:01 |
zyuan | they are just constant, size 8 | 17:01 |
kgriffs | your would have to len(str(number)) | 17:01 |
kgriffs | i bet you could make it faster than simplejson, but I don't know if it would be significant | 17:01 |
zyuan | why? in my design above they are always 8 | 17:01 |
kgriffs | no, I mean, not having always 8 | 17:01 |
kgriffs | but actually counting the real text length of the number when serialized | 17:02 |
zyuan | what i'm arguing is that counting by char is flawed | 17:02 |
zyuan | it's meaningful to JSON | 17:02 |
zyuan | but JSON is just a rich representation | 17:02 |
kgriffs | in the user's mind, JSON is the context | 17:03 |
zyuan | and user usually care the meaning | 17:03 |
kgriffs | we have to pick some neutral format, and that is JSON at the moment | 17:03 |
zyuan | I don't agree. JSON is just an appraoch to represent meaning | 17:04 |
zyuan | and it's an intermediate one (not close to representation in either way, JSON bytes, or DB storage) | 17:04 |
zyuan | while my suggestion give it meaning by value | 17:04 |
zyuan | and key | 17:05 |
kgriffs | but then the user will always have to be calculating in their head how many bytes they have left after serializing from their apps internal data structure to JSON | 17:05 |
kgriffs | since they can't just look at the json and tell | 17:05 |
zyuan | kgriffs: they don't need to calc | 17:05 |
kgriffs | now they have to think about the sizes that marconi has "reserved" for each JSON type | 17:05 |
zyuan | they need to calc if we do chars counting | 17:06 |
kgriffs | I guarantee you are going to get a lot of push back on this idea from users | 17:06 |
zyuan | for example | 17:06 |
kgriffs | but the calc is much simpler and more intuitive | 17:06 |
zyuan | "\uaaaa" is size 8 | 17:06 |
zyuan | in our current policy | 17:06 |
zyuan | but with my suggestion | 17:06 |
zyuan | it's 3 | 17:06 |
zyuan | same as you do s.length in our client | 17:07 |
zyuan | whatever we see in client, we have it in server | 17:07 |
zyuan | no plain | 17:07 |
zyuan | otherwise, you need to do repr(e) in client to get size | 17:07 |
zyuan | which is, awful | 17:07 |
kgriffs | how is "\uaaaa" is size 8 | 17:08 |
kgriffs | if that is a single unicode char, it will count as "1" | 17:08 |
zyuan | no | 17:08 |
zyuan | to JSON, each char counts | 17:08 |
zyuan | i mean, "\uaaaa" in JSON, not python | 17:08 |
kgriffs | ok, I see your point | 17:08 |
zyuan | one more example | 17:08 |
zyuan | if a user read a csv file | 17:09 |
zyuan | how he measure the size? | 17:09 |
kgriffs | but having to account for some unicode strs in my length calculation seems a lot easier than having to add up size for all the different types based on marconi's arbitrary definition of such | 17:09 |
zyuan | here is what the next thing i want to show | 17:09 |
kgriffs | actually | 17:10 |
zyuan | for scalars other than string, user may not care about the size of scalars at all! | 17:10 |
kgriffs | marconi will be deserializing | 17:10 |
zyuan | if you read a csv file | 17:10 |
kgriffs | nevermind my last comment | 17:10 |
zyuan | you know 1. the file size (which making no sense) | 17:10 |
kgriffs | zyuan: what if they are sending a huge batch of numbers | 17:11 |
zyuan | 2. each value's repr (which is hard to marsure) | 17:11 |
kgriffs | i suppose then they will have to take array.length * marconi_size_per_float | 17:11 |
zyuan | but user know one thing: how many number i have in total | 17:11 |
zyuan | a client jsut need to limit on "how many numbers", which is they matrix size, that's it! | 17:11 |
zyuan | forget marconi_size_per_float | 17:12 |
zyuan | it's just ok to define an max matrix size in client | 17:12 |
zyuan | without looking at marconi_size_per_float | 17:12 |
kgriffs | how do I know it will fit in a single message | 17:13 |
zyuan | for example, 100 x 100 | 17:13 |
zyuan | a client degsiner only need to test several times, then pick a number | 17:13 |
kgriffs | trial by error? | 17:13 |
zyuan | like 1000 * 1000? opps, failed | 17:14 |
kgriffs | seems hacky to me | 17:14 |
zyuan | 800 x 800 ok | 17:14 |
zyuan | then defined it as 800 x 800 | 17:14 |
kgriffs | and nobody knows the real limits of marconi | 17:14 |
zyuan | it's client implementer's job, not user's | 17:14 |
kgriffs | the client implementor is the user | 17:14 |
zyuan | so it's not quite trial by error | 17:14 |
zyuan | user only know 800 | 17:14 |
kgriffs | or at least, *a* user | 17:14 |
zyuan | he/she don't need to know whether the number comes from | 17:15 |
zyuan | the final interface is consist by marconi the server and client | 17:15 |
zyuan | (s) | 17:15 |
zyuan | they are not seperatale | 17:15 |
zyuan | or we can say, marconi_size_per_float is something libarary implementro need to look at | 17:16 |
zyuan | in any appraoch, i don't want user to care and calculate the limits. | 17:16 |
zyuan | what they should care is how much content they have. | 17:17 |
zyuan | like matrix size, string length, etc. | 17:17 |
kgriffs | the unfortunate thing is that the user will care as soon as they get back a 400 the first time. Then they want to know *why* and then we have to tell them | 17:17 |
kgriffs | and if our explaination is "well, it depends" | 17:18 |
zyuan | they should not. a domain-specific client should be alle to detect it before sending to marconi. | 17:18 |
kgriffs | "you have to take into account the sizes of all these different types, blah blah" | 17:18 |
kgriffs | people are going to complain that it is "too confusing" and "not predictable" | 17:18 |
zyuan | ask their library implementers | 17:18 |
kgriffs | not all clients are domain-specific | 17:18 |
kgriffs | you have the generic go library | 17:19 |
kgriffs | some startup uses it | 17:19 |
kgriffs | hits the limit | 17:19 |
kgriffs | and then calls support because they are confused | 17:19 |
zyuan | lastword: this algorithms is easy to implement everwhere | 17:19 |
zyuan | we can even add it to a gneric client! | 17:19 |
kgriffs | well sure. we would need to run this by client implementors then to see what they think | 17:20 |
zyuan | i find it's reasonable, since it also solves msgpack | 17:20 |
zyuan | because msgpack, if have the same content (dict), they get same size, and same to that JSON have | 17:21 |
zyuan | ... see you after lunch | 17:21 |
kgriffs | ttfn | 17:22 |
zyuan | forgot to say, we can define marconi_size_per_float to 1, so that less confusion | 17:22 |
zyuan | each string has len(s), each scalar is 1 | 17:23 |
zyuan | that shows my "size as content size" better, i think | 17:23 |
alcabrera | brb | 17:25 |
*** ametts has quit IRC | 17:28 | |
alcabrera | back | 17:32 |
*** jdaggett1 has joined #openstack-marconi | 17:33 | |
kgriffs | malini or alcabrera: can you verify a couple bugs? | 17:33 |
kgriffs | https://bugs.launchpad.net/marconi/+bug/1239784 | 17:33 |
kgriffs | https://bugs.launchpad.net/marconi/+bug/1239768 | 17:33 |
kgriffs | the second one you can see if you make content_max_length larger than message_size_uplimit | 17:34 |
kgriffs | Those appear to be bugs based on looking at the code alone, but i need someone to actually test and confirm | 17:35 |
malini | I am looking at it now | 17:35 |
alcabrera | kgriffs: heh, I definitely wouldn't have seen the second one, since I wipe my marconi-queues.conf of limits. :P | 17:35 |
alcabrera | malini: thanks! | 17:35 |
malini | kgriffs: https://bugs.launchpad.net/marconi/+bug/1239784 | 17:37 |
malini | iirc we decided to go this route because as long as individual_msg_size is within limit, their sum would be too | 17:40 |
kgriffs | that doesn't sound right | 17:41 |
kgriffs | if I have two messages, each with a body that is 256 KB, then the overall limit would be exceeded, even though the individual limit is OK | 17:41 |
malini | But overall limit >= individual_msg_size_max * max_allowed_msg_count | 17:43 |
kgriffs | currently we do two things | 17:44 |
kgriffs | we first check whether the entire body of the request exceeds some maximum | 17:45 |
kgriffs | that will be going away, since it is redundant with what the web server does | 17:45 |
kgriffs | (web server already specifies a max request body size) | 17:45 |
kgriffs | then we go and we check the size of each message body | 17:45 |
kgriffs | (not counting the envelope) | 17:46 |
kgriffs | (just counting the body) | 17:46 |
kgriffs | if each body is <= the limit (256 KB by default) we say OK | 17:46 |
kgriffs | but we never check whether the sum of the length of each body is greater than the overall limit | 17:47 |
malini | iow, this is not being done now 'the total size for all message bodies included in a single request is limited as well to 256 KiB by default (configurable)' ? | 17:47 |
kgriffs | the problem is, the content_length limit got conflagrated with the sum_of_bodies_limit (which setting doesn't actually exist, btw) | 17:47 |
*** fvollero is now known as fvollero|gone | 17:47 | |
kgriffs | that is the bug | 17:48 |
kgriffs | at least, I think it is a bug but haven't tried it | 17:48 |
kgriffs | to test this, you would have to set content_max_length | 17:48 |
kgriffs | to something big, like 1 MB | 17:48 |
malini | So if I post messages with 2 messages (240KB) each, I can get a repro | 17:48 |
kgriffs | since at the moment it is being (incorrectly) overloaded to mean "sum of all body lengths" | 17:49 |
kgriffs | malini: yeah, but you need to up that setting I just mentioned, since otherwise the body validation check is short-circuited | 17:49 |
malini | aah..ok..I'll add a test for this | 17:50 |
*** jdaggett2 has joined #openstack-marconi | 17:53 | |
*** jdaggett1 has quit IRC | 17:56 | |
*** jdaggett2 has quit IRC | 18:10 | |
*** kgriffs is now known as kgriffs_afk | 18:13 | |
*** vkmc has quit IRC | 18:15 | |
zyuan | kgriffs_afk: ping | 18:28 |
*** kgriffs_afk is now known as kgriffs | 18:32 | |
*** jdaggett1 has joined #openstack-marconi | 18:41 | |
zyuan | kgriffs: how you repro the first bug? | 18:42 |
*** ametts has joined #openstack-marconi | 18:46 | |
*** jdaggett1 has quit IRC | 18:50 | |
*** kgriffs is now known as kgriffs_afk | 18:57 | |
*** alcabrera has quit IRC | 19:00 | |
*** kgriffs_afk is now known as kgriffs | 19:07 | |
*** jdaggett1 has joined #openstack-marconi | 19:16 | |
*** ayoung has joined #openstack-marconi | 19:38 | |
*** jdaggett1 has quit IRC | 20:44 | |
*** jdaggett1 has joined #openstack-marconi | 20:45 | |
*** amitgandhi1 has joined #openstack-marconi | 20:50 | |
*** amitgandhi has quit IRC | 20:51 | |
*** malini is now known as malini_afk | 20:52 | |
*** malini_afk is now known as malini | 20:52 | |
*** amitgandhi1 has quit IRC | 20:56 | |
*** amitgandhi has joined #openstack-marconi | 20:57 | |
openstackgerrit | Kurt Griffiths proposed a change to openstack/marconi: fix(queues): Global config used everywhere https://review.openstack.org/51705 | 21:02 |
openstackgerrit | Kurt Griffiths proposed a change to openstack/marconi: feat: Storage sharding foundation https://review.openstack.org/50437 | 21:03 |
openstackgerrit | Kurt Griffiths proposed a change to openstack/marconi: Setup storage pipeline in the boostrap instead of driver base https://review.openstack.org/51049 | 21:03 |
openstackgerrit | Kurt Griffiths proposed a change to openstack/marconi: chore: Remove GC cruft from storage driver base class https://review.openstack.org/51016 | 21:03 |
*** malini is now known as malini_afk | 21:03 | |
zyuan | as a Chinese, i've never been to HongKong. BTW, Chinese people need another passer to go HongKong, with a maximum stay time of 7 days -- I've heard of that HongKong is a part of China? | 21:19 |
*** mpanetta has quit IRC | 21:23 | |
zyuan | kgriffs: more comments added https://bugs.launchpad.net/marconi/+bug/1239768 | 21:24 |
*** oz_akan_ has quit IRC | 21:26 | |
*** oz_akan_ has joined #openstack-marconi | 21:26 | |
openstackgerrit | Kurt Griffiths proposed a change to openstack/marconi: fix(queues): Global config used everywhere https://review.openstack.org/51705 | 21:28 |
*** jdaggett1 has quit IRC | 21:28 | |
*** jdaggett1 has joined #openstack-marconi | 21:29 | |
*** oz_akan_ has quit IRC | 21:31 | |
*** fifieldt_ has joined #openstack-marconi | 21:32 | |
*** jdaggett1 has quit IRC | 21:33 | |
openstackgerrit | Kurt Griffiths proposed a change to openstack/marconi: feat: Storage sharding foundation https://review.openstack.org/50437 | 21:33 |
*** amitgandhi1 has joined #openstack-marconi | 21:35 | |
*** amitgandhi2 has joined #openstack-marconi | 21:35 | |
*** amitgandhi1 has quit IRC | 21:35 | |
*** amitgandhi has quit IRC | 21:38 | |
openstackgerrit | A change was merged to openstack/python-marconiclient: Complete refactor of api.py https://review.openstack.org/49698 | 21:48 |
zyuan | again i found that python need a switch statement... | 21:50 |
openstackgerrit | A change was merged to openstack/python-marconiclient: Split get_transport into 2 different functions https://review.openstack.org/49786 | 22:00 |
*** kgriffs is now known as kgriffs_afk | 22:22 | |
*** amitgandhi2 has quit IRC | 22:30 | |
*** oz_akan_ has joined #openstack-marconi | 22:37 | |
*** oz_akan_ has quit IRC | 22:41 | |
*** kgriffs_afk is now known as kgriffs | 22:59 | |
*** jcru has quit IRC | 23:09 | |
*** kgriffs is now known as kgriffs_afk | 23:12 | |
*** kgriffs_afk is now known as kgriffs | 23:14 | |
*** kgriffs is now known as kgriffs_afk | 23:24 |
Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!