14:00:49 #startmeeting oslo
14:00:50 Meeting started Fri Oct 11 14:00:49 2013 UTC and is due to finish in 60 minutes. The chair is dhellmann. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:51 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:51 cool
14:00:52 nop, just here for fun
14:00:53 The meeting name has been set to 'oslo'
14:01:03 how about a show of hands for the record
14:01:05 o/
14:01:08 here
14:01:13 o/
14:01:17 o/
14:01:21 #link https://wiki.openstack.org/wiki/Meetings/Oslo
14:01:29 o/
14:01:37 o/
14:01:44 hey
14:02:19 ok, the main topic today is the delayed translation feature in openstack.common.gettextutils
14:02:22 \o
14:02:43 we should start with a quick summary of the original requirements
14:02:55 who can provide that?
14:03:21 there were a few issues with the way localization worked in the projects before
14:03:34 mainly that API messages and log messages were tied to the locale of the system
14:04:31 the goal was to untie them
14:04:32 ok
14:04:48 i can find the original blueprint...
14:05:03 do we need the log messages untied too, or is it just that log messages are another place we do translation?
14:05:16 https://blueprints.launchpad.net/oslo/+spec/delayed-message-translation
14:05:49 well there was a requirement to have english logs alongside the localized ones
14:05:59 that we had internally
14:06:10 "we"?
14:06:10 more for debugging and support purposes
14:06:18 personally, I think we should focus on properly translated REST API responses
14:06:29 logs in different languages seems bizarre to me
14:06:49 yeah
14:06:55 +1
14:07:00 markmc: yes thats why we wanted a way to change them back to english and make them independent of the system locale
14:07:11 ok, so we need to have a way to delay translation of messages until right before they are output, and then at that point select the locale for the translation
14:07:23 thats a good summary yes
14:07:23 well, that's one way
14:07:36 the other way is that you know what language you want when you create the message
14:08:06 Can you do that for api calls?
14:08:14 i.e. a lot of the logic here is about retaining _("%s" % (foo, )) so that we don't have to know the language when making the substitution
14:08:14 doesn't that push info about the locale all over the code, though?
14:08:17 You could have users with multiple locales requesting the same message.
14:08:30 (didn't explain that very well)
14:08:51 here's what I'd like to see: make Message a container, but *not* something that pretends to be a string. Then handle the translation explicitly at the point of output.
14:08:51 i.e. it's not about, at runtime, knowing what language you want _() to return
14:09:08 it's about allowing that language decision to be made long after _() is called
14:09:18 dhellmann: I was thinking along the same lines.
14:09:34 markmc: right, that's where I'm going, too, I think
14:09:41 so _() returns a Message, that's fine
14:09:51 dhellmann, I think that's overcomplicating all of this
14:10:14 markmc: oh?
14:10:14 I'm not sure that if we ignore the "logs in multiple languages" thing, we couldn't simplify this greatly
14:10:33 e.g. if the language is part of the request context, that's all over the place already
14:10:35 the other place where the % operator is used is building exception messages
14:10:41 yes
14:10:59 exceptions are probably the main way that messages end up in REST API responses
14:11:12 for the most part thats true yes
14:11:12 I don't see how you do this without some sort of object returned by _ to encapsulate the original message for later translation.
14:11:31 beekneemech, pass the required language to _()
14:12:03 anyway
14:12:07 that would work, but require passing knowledge of how to get that language throughout a *lot* of code
14:12:15 All of it. :-)
14:12:15 just giving my "are we *sure* we're not overcomplicating this" perspective
14:12:28 dhellmann, it's in the request context
14:12:43 is all code that throws an exception aware of the request context?
14:13:04 ^
14:13:15 if you call a function that throws exceptions and you don't pass it a context, then catch all exceptions and raise a new exception with translated message
14:13:43 Heh, that's a whole new can of worms.
14:13:46 perhaps with the original message as a non-translated detail
14:13:53 Something that needs to be done anyway though.
14:13:56 so the WSGI framework used could do that, catching Message and rebuilding them with the right language?
14:14:09 jd__: right, that's what I was thinking
14:14:18 rather than making every openstack developer understand how this works
14:14:24 That's essentially where it's happening right now.
14:14:35 * dims says belated o/
14:14:36 The delayed translation happens in wsgi.py for each project.
14:14:40 except for RPC requests
14:14:51 We're translating RPC requests?
14:14:54 we have a few other requirements to consider
14:15:06 well... RemoteError in Nova
14:15:14 wraps other exceptions basically
14:15:23 I'm not sure updating all of the apps and API implementations to pass a lang val to _() everywhere or translate exceptions and reraise is practical :-/
14:15:36 I'm not sure expecting every drive-by developer to get that right, either
14:15:37 * jd__ agrees with dhellmann
14:15:53 there are a lot of places that will be missed in the future with such an approach
14:16:07 that's why I like the approach of a catch-all handler at the point of output
14:16:25 are there any downsides to a catch-all?
14:16:46 There's a lot of code that expects a str/unicode from _ that is now getting a Message.
14:16:55 it complicates exception handling at that point a little, because not every exception is going to have a translatable message
14:16:58 That's basically what led to this discussion.
14:17:11 yeah, that was the other thing I wanted to understand better
14:17:19 what code is expecting a string-like object now?
14:17:46 well, python logging kind of
14:17:54 which is where we hit the issue
14:17:57 ok, we can address that in our adapter
14:18:01 was there something else?
14:18:09 just want to say the way the Message was implemented, it looks and feels very much like a string
14:18:20 but i haven't run into much else that can't be tweaked to account for Messages
14:18:24 so all the places that do "An error happened": "The actual error"
14:18:36 still work and both strings are translatable
14:18:42 what would happen if we made Message.__str__ and Message.__unicode__ raise an exception?
14:18:47 explicitly
14:18:56 dhellmann: we actually talked about that here : https://etherpad.openstack.org/bug-1225099
14:18:57 like, you can't turn these objects into a string-like thing using these methods?
14:19:13 #link https://etherpad.openstack.org/bug-1225099
14:19:14 we = luisg, bpokorny and i
14:19:26 what conclusion did you reach?
14:19:38 i think we need to do approach 2 listed there ^
14:20:07 well there are options but i like the one where if we can't str() a message (since it has non-ascii bytes) then we raise the UnicodeError and let the code deal with it
14:20:21 Well, the logging issue is a bug in the logging module IMHO.
14:20:34 instead of trying to determine what the best encoding is
14:20:34 So I'd say the adapter fix addresses that to my satisfaction.
14:20:41 well, the other thing we need to do is ensure that Message is always just returning unicode values (never byte strings)
14:20:48 beekneemech: take a look at the link
14:21:10 i think they actually handled it well, they try to str() it, but if it doesn't work they defer
14:21:13 beekneemech: yeah, we should verify that python3's logger works properly in that case
14:21:30 dhellmann: yeah that is the other option
14:21:34 luisg_: but what happens when that unicode error is raised? can the caller figure it out?
14:21:45 or do we just get unicode errors in the log files and API responses?
14:21:49 dhellmann: u mean in the logger?
14:22:06 anywhere
14:22:08 API responses should be using unicode already
14:22:11 it is actually not raised, they try to str(), if it's not possible they leave the object alone
14:22:15 i think there was a bunch of work around that in Nova anyways
14:22:40 luisg_: doesn't that introduce extra cases, then? sometimes it works and sometimes it does not, so you always have to be able to handle either
14:22:49 it seems better to be explicit
14:23:03 say that a Message instance is not a string and cannot be converted to a string without providing a locale
14:23:18 if they unicode(message) that will always work
14:23:20 it actually just makes it look more like a str in that if u try to str non-ascii it won't let u
14:23:20 so no __str__ or __unicode__, only an explicit method call
14:23:43 mrodden: it will always work, but what language will the message be in?
14:23:48 dhellmann: i think we need to at least have unicode, b/c everybody expects unicode out of _()
14:24:03 the point of Message is not just to handle the encoding, it's also to handle the translation
14:24:23 luisg_: who is everybody?
14:24:31 consumers of _()
14:24:52 I mean, who in this case is expecting a unicode object that we can't update to handle Message instances?
14:25:02 i would expect unicode(message) to return it in a default locale, probably the system locale
14:25:06 what else consumes _() responses other than logging and API handlers?
14:25:06 dhellmann: if you force __init__ to have a locale with default to system, you can have __str__ or __unicode__, no?
14:25:23 jd__: that's back to markmc's suggestion of passing locale to _()
14:25:40 dhellmann: yay but by default you pass nothing, and the WSGI catcher rebuilds them with the locale set
14:25:48 mrodden: that's a reasonable default, but where is it useful that we could not require an explicit locale?
14:26:07 The problem is that if you call str() on a Message that translates to un-str-able characters it fails.
14:26:10 As it should, of course.
14:26:18 beekneemech: correct
14:26:19 beekneemech: +!
14:26:22 dhellmann: or just set the .locale (or so) attribute of the message
14:26:22 +1
14:26:23 thats what i realized
14:26:23 the reason for disallowing implicit conversion is it avoids cases of failing to explicitly handle the translation
14:26:47 right i see what dhellmann is saying...
14:26:48 so we don't have cases where sometimes the caller gets english and sometimes they get their language
14:27:03 dhellmann: I don't see how avoiding implicit helps?
14:27:04 so we force *ourselves* to handle all of those cases by uncovering exceptions
14:27:27 because anything that tries to treat a Message as a str() or unicode() ends up with an exception immediately
14:27:36 rather than leaking the wrong language
14:27:45 those might take a while to flush out
14:27:46 dhellmann: r u proposing we go with option 3?
14:27:52 but that would be nice to do
14:28:04 luisg_: no, I am suggesting that both __str__ and __unicode__ should raise a RuntimeError
14:28:24 a RuntimeError for unicoding a message?
14:28:26 "you must translate Message objects explicitly before outputting them"
14:28:47 The problem I see there is we end up with special-case code all over the place for lazy vs. not-lazy translation.
14:28:48 luisg_: there are 2 concerns: encoding and language
14:28:48 still not getting why the system default is not a good one
14:28:53 here's a thought - if there is an error translating at runtime, can we gracefully fall back to no translation
14:28:54 Which I guess is okay.
14:29:00 beekneemech: we're going to have that anyway, no?
14:29:05 i.e. the same fallback as if there is no translation available
14:29:11 As long as we understand that means we can't go back by just switching lazy=True to lazy=False like we just did.
14:29:13 markmc: +1
14:29:16 markmc: don't raise from __unicode__
14:29:24 * markmc concerned about people hitting a public API, requesting obscure languages and tickling bugs
14:29:46 Which is pretty much what happened with the logging bug. :-)
14:29:48 dont raise from unicode?
14:29:54 dhellmann: with a default locale Message works just like if there was no Message class at all
14:30:01 beekneemech: that switch will still turn off the use of Message objects, which would make _() return unicode objects, which wouldn't need special cases
14:30:31 luisg_: isn't the point of the API requirement that there is no default locale?
14:30:49 the logging bug actually happened because we are forcing utf-8 encoding on strings that are later used with sys.getdefaultlocale() like other strings
14:30:55 no
14:31:03 the point is that u can translate from the default locale to others
14:31:06 markmc: gettext handles an unsupported language, doesn't it?
14:31:15 doesn't it just return the original string?
14:31:19 yes
14:31:26 that's not the case I'm talking about
14:31:28 I mean e.g.
14:31:44 dhellmann: But if we're going to say you can't call str or unicode on Messages, then we need some other way to get the value out, which won't work if we turn off lazy translation and start returning unicode again.
14:31:44 _("%(flavour)s") % flavor
14:31:56 "no key 'flavour' found"
14:31:59 or whatever
14:32:18 beekneemech: not every string is going to be wrapped in our _() (third party libraries, esp.) so we have to handle strings as a case anyway
14:33:03 * jd__ ponders if it wouldn't be easier to teach english to everyone
14:33:05 i think the first concern should be enabling that to happen for our strings (our=from openstack)
14:33:09 lol
14:33:14 Okay, so we're looking at a bunch of if isinstance Message stuff then, right?
14:33:24 jd__: +1 :-)
14:33:27 basically, yes, but in 2 general places
14:34:13 markmc: I think I get what you're saying. That exception wouldn't be raised by our code, so the exception would have a real string in it not a Message, so the output code would just pass it along.
14:35:15 dhellmann, (_("%(flavour)s") % flavor) returns a Message, right?
14:35:23 yeah
14:35:39 dhellmann, in __unicode__(), if there's an exception, try again with the untranslated string
14:35:49 markmc: oh, yeah, sorry, was misreading the location of %
14:36:15 markmc: yes, that makes sense
14:36:46 i dont follow?
14:36:59 mrodden: translation has 3 steps
14:36:59 where would __unicode__() be called in that example
14:37:08 first, translate the message and try to combine it with the args
14:37:25 calling unicode() on the Message returned from that expression
14:37:29 ok
14:37:42 if the first attempt fails, take the original untranslated string and combine it with the args
14:37:47 we actually run the % internally in message just as a sanity check
14:37:59 if that fails, which error do we report?
14:38:02 and let any KeyError raise from there
14:38:22 mrodden: yeah, the Message object still needs to support % to hold onto its arguments
14:38:24 since its another developers error usually
14:38:30 mrodden: right
14:38:38 So we would throw an exception before getting to the unicode step, right?
14:38:43 During the % operation.
14:38:44 correct
14:38:53 our error needs to include the untranslated message so they have some hope of finding where it came from in the code, because the traceback won't point there
14:39:01 oh, sure, that's a way to do it
14:39:33 although there is still a chance that the untranslated format string will work, but the translated one will not
14:39:43 that is another concern
14:39:48 yes
14:39:52 we can log that case, and then return the untranslated message
14:40:11 but if __mod__ raises when the untranslated string doesn't work, that would point right to the problem line
14:40:22 nice, beekneemech
14:40:43 That's how it works now, isn't it?
14:40:47 it actually does it now
14:40:48 yeah
14:40:54 even better
14:40:59 So credit to mrodden :-)
14:41:06 what changes do we need to make to Message, then?
14:41:13 except we don't distinguish between translated and untranslated we just attempt the translated one
14:41:14 * dhellmann tips hat in mrodden's direction
14:41:22 we can fix that though
14:41:28 the problem is in the str() method
14:41:34 described in the link mrodden pasted above
14:41:46 How would you do the translated string at % time?
14:41:52 You don't know the locale yet, do you?
14:41:57 right, you can't
14:42:05 we use the system locale
14:42:09 if available
14:42:19 that is what is returned from the original _()
14:42:51 I think we're focusing too much on making Message behave like a string. It really shouldn't need to do that.
14:43:08 it only needs to be a thing we can turn into a string
14:43:15 right
14:43:18 +1
14:43:19 y
14:43:29 we have about 15 minutes, let's summarize the changes we want to make
14:43:34 it does not need to be basestring or unicode
14:43:43 1. change the base class of Message to just object?
14:43:50 __str__ should raise RunTime errors
14:44:05 yeah make Message str() raise like any other str() would with non-ascii
14:44:10 (i think thats what we concluded)
14:44:13 that is the main change and would solve the bug
14:44:20 That makes sense to me.
14:44:24 luisg_: it should raise a RuntimeError that the operation isn't permitted, I think
14:44:37 i thought we would just raise a UnicodeError
14:44:41 that is what the logger is expecting
14:44:45 or any consumer
14:44:47 Catch this sort of problem up front, rather than wait for someone to request the problem locale.
14:44:52 the logger will never get a Message object
14:45:14 that's the main change: we are NEVER going to pass a Message to code outside of OpenStack
14:45:33 we will always handle the translation at the point where a Message would have been passed, and then pass the resulting string
14:45:38 * beekneemech wonders how many places that will require changes.
14:45:52 Investigating that is probably a todo from this.
14:45:56 yeah
14:45:58 yes
14:46:03 should be easy to throw warnings for now
14:46:08 we expect it to be logging (handled already) and API
14:46:10 to find the problem points
14:46:23 ok, back to the changes
14:46:53 add an explicit translation method to Message that takes a required locale argument (maybe we have that already?)
14:47:10 It was proposed, but not merged yet.
14:47:14 no, it looks like we're sticking the locale into the message and then calling unicode
14:47:15 ok
14:47:21 we kind of do, but it needs to be fixed up
14:47:24 yea
14:47:40 there is a module level one we use essentially
14:47:43 can we remove the _locale attribute and locale property?
14:47:47 it should probably just be a function on the object
14:47:52 mrodden: right
14:47:57 translate()?
14:48:03 get_localized_message
14:48:09 ok
14:48:13 is what it is now
14:48:24 translate() would be a good candidate for the new one
14:48:45 do we need the data property?
14:48:55 probably not any more
14:49:01 unsure though
14:49:46 I saw some recursive handling of Message in the locale setter, so that will need to move to translate()
14:50:05 yep
14:50:13 most of that will move to translate i'd imagin
14:50:14 e
14:50:19 the methods for adding messages together can probably go, right?
14:50:35 most of the operator methods, I guess
14:50:38 that would be tricky
14:50:41 __add__, __mul__, etc.
14:50:50 because we're not going to pretend a Message is a string
14:50:51 requires quite a bit of cod changes
14:50:52 yeah
14:50:56 code*
14:51:47 assuming this is all Icehouse work we are planning?
14:51:48 do you think we still need them?
14:51:52 mrodden: yes
14:51:54 ok
14:52:06 so just trying to take a step back for a second
14:52:08 we can work on migrating away from them
14:52:15 are we adding Messages and strings (or messages) anywhere? I'm not sure how those are used
14:52:24 that sounds like we want to re-design what markmc beekneemech mrodden had done at the beginning of havana
14:52:26 which we know works
14:52:36 would it be possible for us to just fix the bug
14:52:37 concatenation
14:52:46 that caused all this, and try it for a little bit
14:52:59 (I'm in favour of re-thinking this if we think it'll get us to a better place - too much magic currently IMHO)
14:53:04 mrodden: hmm, ok, maybe we need to keep that stuff
14:53:12 It works, but it's complicated and prone to third-party bugs.
14:53:27 markmc: +1
14:53:45 i think in the case of the logger
14:53:50 mrodden: but it may need to be updated in light of the "we are not a string" approach :-)
14:53:53 the reason why it broke was because we were forcing an encoding
14:54:01 I like the idea of making the delayed translation more explicit.
14:54:02 but it has worked in all other places other than that
14:54:15 luisg_: No, logging failed because it was calling str() on an object that resolves to unicode.
14:54:32 Which it shouldn't be doing.
14:54:33 luisg_: what is setting the locale of the Message objects in the API?
14:54:37 dhellmann: yeah, i don't see any issues with it currently, but if we are going to not be a string-like anymore it shouldn't be around then
14:55:07 it failed because we let the logger str() a non-ascii
14:55:12 mrodden: ok, we can leave that for last, just in case
14:55:13 instead of raising a unicode saying it cant be done
14:55:43 logging doesn't try any special encodings, it only knows about the default 'ascii'
14:55:53 correct
14:55:57 which is why it was choking on utf8 encoded str we were giving it
14:56:04 it actually just uses the default, which is normally ascii
14:56:11 we're almost out of time here
14:56:22 but it does know that if that does not work it does not encode
14:56:45 other changes: find the spot(s) in the API code where explicit translation is needed and update the way it works (or add it)
14:57:06 that should be pretty easy
14:57:14 i have to catch almost everything going out that endpoint already
14:57:30 luisg_: I hear what you're saying, but I'm just not comfortable with turning the current implementation back on.
14:57:58 something else to consider: is there any reason not to have this on all the time, once we get it working smoothly?
14:58:12 it was on all the time in Nova
14:58:17 even in tests
14:58:21 Yeah, the lazy value was already hard coded.
14:58:25 mrodden: we'll have to look at the pecan/wsme APIs, too, but those should be relatively easy as well
14:58:30 (we actually found a bunch of localization issues with it on)
14:59:02 well i guess i did
14:59:41 ok, good, so we can re-enable the feature with a change in oslo, then, instead of touching each project
15:00:07 anything else, before we wrap up?
15:00:28 I should ask, who will be working on this? :-)
15:00:48 good question...
15:01:25 I can put some basic details about what we agreed to today into a blueprint, but you guys know more about the implementation than I do
15:01:44 it'll probably end up being our team (luis, bpokorny and myself)
15:02:01 ok, thanks
15:02:03 yeah i was just thinking a blueprint would be the first start
15:02:16 starting point
15:02:24 yes, definitely -- would you like me to do that, or do you want to handle it?
15:02:40 i can take what we decided in the meeting and BP-ify it
15:02:48 excellent, thank you again
15:03:10 ok, we're a couple of minutes over, so I think we should wait for the BP before continuing the discussion
15:03:15 +1
15:03:24 thank you all, I appreciate everyone's help on this
15:03:28 thanks doug!
15:03:28 thanks
15:03:40 ty
15:03:46 Thanks all
15:04:02 #endmeeting