21:03:11 <rockyg> #startmeeting #log_wg 21:03:12 <openstack> Meeting started Wed Mar 2 21:03:11 2016 UTC and is due to finish in 60 minutes. The chair is rockyg. Information about MeetBot at http://wiki.debian.org/MeetBot. 21:03:13 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. 21:03:16 <openstack> The meeting name has been set to '_log_wg' 21:04:35 <rockyg> #topic RequestID in log messages 21:05:44 <rockyg> Abhishek has mostly finished getting requestIDs into the projects and is moving now to get them into logs. 21:06:02 <rockyg> #link http://lists.openstack.org/pipermail/openstack-dev/2016-March/087878.html is where he starts the discusssion 21:06:09 * dhellmann slides in a bit late 21:06:22 <rockyg> Hey dhellmann ! 21:06:39 <rockyg> Talking requestIDs into log messages. Perfect day to be here. 21:07:49 <jokke_> dhellmann: o/ 21:07:50 <dhellmann> hi, rockyg 21:07:53 <dhellmann> hi, johnthetubaguy 21:07:55 <dhellmann> oops, jokke_ 21:08:12 * dhellmann had a tab-completion-failure 21:08:32 <rbradfor> sorry, was out 21:08:33 <jokke_> I'm bit concerned about that req ID list to be dumped to the logs 21:08:57 * jokke_ needs to subscribe back to the mailing lists 21:09:21 <rockyg> I agree the log format sucks, but we can work on that first, does adding request_utils module to oslo.log make sense? 21:09:52 * dhellmann goes to look at what's in request_utils 21:10:03 <rockyg> This is where dhellmann and rbradfor have the best info. Let's give them time to look into this. 21:11:08 <rbradfor> while we are talking about request-ids (and this is likely a larger cross project discussion), a.) is "req-" needed, b) can you shorten the display of ids (e.g. like we do for git commits) 21:11:31 <rockyg> I've been doing a lousy job of actually putting stuff together this cycle. Involved in too many projects/work groups I apologize 21:11:57 <rockyg> So, how to differentiate req- from uuid? 21:12:32 <dhellmann> I don't even see that module any more 21:12:46 <rbradfor> rockyg why do you have to? 21:12:49 <rockyg> dhellmann, could it be in incubator? 21:12:56 <dhellmann> rockyg : not my copy 21:13:16 <rbradfor> and looking at the email, duplication: INFO tempest.tests [req-a0df655b-18a2-4510-9e85-b9435dh8ye4 admin admin] Request ID Link: request.link 'req-a0df655b-18a2-4510-9e85-b9435dh8ye4' -> Target='nova' TargetId=req-f0fb885b-18a2-4510-9e85-b9066b410ee4 21:13:31 <dhellmann> we deleted all the remaining code in the incubator earlier this cycle, so we'll have to recover it from an older git commit, I guess 21:13:45 <rbradfor> so it seems one could use like my idea of 7-8 digits for first in the list (as it's part of the [context] already) 21:13:45 <rockyg> oh, sorry dhellmann . He wants to resurrect it. 21:13:53 <jokke_> rbradfor: that req- prefix was specific request from ops over multiple conversations ... reasoning, can't remember 21:14:05 <rbradfor> jokke_, ok, happy with explanation. 21:14:17 <rbradfor> there was somebody wanting to or proposed a review to remove it recently 21:14:30 <rockyg> So, I don't think Ops cares about how we specify the ids, just so they can tell what they are. 21:15:10 * rbradfor wishes some things like ids were shorter then they are, and they contained structure, not just randomized 21:16:01 <rockyg> This is where the syslog key-value section of the its header can be valuable. originator: xxx, caller: xxx, called: xxx 21:16:09 <jokke_> yeah iirc ops were pretty open for the actual ID type itself during the conversations as long as it was unique and there was some indication that it's the request id (other than just position where it shows up) 21:17:24 <jokke_> rbradfor: I totally agree having to log multiple ID's just makes log lines ridiculously long and difficult for human to parse 21:17:41 <rockyg> rbradfor, well, reqid *could* use a smaller UUID size. Or, that could be a config option (might be best -- default as is but let ops set it based on their cloud size as appropriate) 21:19:56 <rbradfor> I do not know the varied details of uuid lengths in python, except it does exist. I guess some things need to potentially be more unique then others, so is the config of length project wide or note. It's probably way to difficult to implement, and a discussion for elsewhere 21:20:20 <rbradfor> anyway, I digress 21:20:23 <dhellmann> this appears to be the request_utils code? https://review.openstack.org/#/c/170074/ 21:20:45 <rockyg> The ability to define size is already in oslo. Saw it somewhere. You can pick the number of bits 21:21:14 <dhellmann> as far as I know, we're using uuid4 everywhere, which are random numbers. All UUID values have the same length, unless you format them without the - in the middle. 21:21:20 <rbradfor> rockyg, not using oslo.context http://git.openstack.org/cgit/openstack/oslo.context/tree/oslo_context/context.py#n40 21:21:34 <rockyg> Ah. OK. Thanks. 21:21:40 <rbradfor> but some projects could be using their own id generation, 21:22:35 <dhellmann> some do, and they end up with longer request ids iirc 21:22:44 <dhellmann> I think heat is the canonical example of that 21:23:09 <dhellmann> is the idea to include the id chain in every line, or log it once when the request starts? 21:23:43 <rockyg> Idea is to log every time a handoff from client to project or project to project happens. 21:24:09 <dhellmann> ok. I'm not sure the best place to put that, to be honest 21:24:29 <dhellmann> possibly oslo.context? 21:25:12 <dhellmann> frankly, we might want to implement it in one service first and see what the results look like before we pick a library to act as home for the code 21:25:19 <rockyg> We could do it every message, but *very* redundant over time. but Ops might like it. Or prefer the other. Have to ask on that one 21:25:33 <dhellmann> oh, no, I don't think we want that, it just wasn't clear what the proposal was 21:26:43 <rockyg> Here is the way I thought we could include it in the log message: STRUCTURED-DATA = NILVALUE / 1*SD-ELEMENT 21:26:43 <rockyg> SD-ELEMENT = "[" SD-ID *(SP SD-PARAM) "]" 21:26:43 <rockyg> SD-PARAM = PARAM-NAME "=" %d34 PARAM-VALUE %d34 21:26:43 <rockyg> SD-ID = SD-NAME 21:26:43 <rockyg> PARAM-NAME = SD-NAME PARAM-VALUE = UTF-8-STRING ; characters '"', '\' and ; ']' MUST be escaped. 21:26:46 <rockyg> SD-NAME = 1*32PRINTUSASCII 21:26:48 <rockyg> ; except '=', SP, ']', %d34 (") 21:26:58 <jokke_> well the hand off is one thing, but there is no much of joy about it in the logs unless it gets logged in the actual log lines as well 21:28:12 <dhellmann> the question is whether we need it in every line for it to be useful 21:28:17 <rockyg> This is the last header field for syslog message format. We could do SD-ID = ReqId then originating: dispatching: received: As the three 21:28:44 <rockyg> Every handoff would need to be logged. 21:29:36 <rockyg> I am not positive, but I think there is enough redundancy in the rest of the log message that instance id would be trackable in between the handoffs. 21:30:01 <dhellmann> logging just the hand-offs is relatively straightforward, we just have to find all of the places to do that 21:30:48 <rockyg> right. And I believe Abhi took care of that when he instituted the requestID stuff in the API calls/returns 21:32:23 <rockyg> Using the syslog format, we could do one project as POC and the field would be [-] for where it's not implemented 21:34:20 <dhellmann> I'm not sure what you mean? If we're logging discrete events, when would we have an empty field? 21:34:32 <jokke_> dhellmann: the whole point of req ID was to enable tracking requests on flailure 21:35:04 <jokke_> dhellmann: if we do have busy system and no request id in logs we have no idea what error is related to what request 21:35:21 <dhellmann> it is already possible to always log the *current* request id 21:35:42 <jokke_> dhellmann: yes 21:35:56 <dhellmann> I thought this was about logging "I just called from nova into glance from request X to request Y" 21:36:21 <dhellmann> and that's one line, and if you don't have both X ad Y then why are you even logging it? 21:36:45 <dhellmann> or am I misunderstanding something more fundamental? sorry, I've been out of touch with this stuff for a bit so it may have changed from when I last looked. 21:36:53 <rockyg> dhellmann, if the SD-ELEMENT has not been added to the the log message call, syslog format would automatically put in the [-]. Once the sd element is added, it would appear in the message properly 21:37:14 <jokke_> a-ha! sorry my bad ... stuck on the previous proposal of this having list of req- IDs that has gone through system 21:37:30 <dhellmann> rockyg : I'm afraid I don't know what those abbreviations mean. What is "SD"? 21:38:07 <rockyg> from the exerpt of the RFC 5424 "SD" is structured data. It's the last field in the syslog header format. 21:38:19 <jokke_> but that was what the client were proposed returning, not what gets logged it seems 21:39:54 <rockyg> Right now, each project creates its own req-ID when it starts a task. Abhi's just finished work returns those req-ids in the API response. Now he/we want to collect them in a log message. 21:41:45 <dhellmann> rockyg : our syslog API has a log level and a message field. The message is, as far as I can tell, formatted using the same configuration option that is used to specify the log line format for regular formatters. I haven't used that in production with syslog, so I'm not sure if there is extra punctuation added or noth 21:41:47 <dhellmann> *not 21:41:54 <rockyg> Every time an api call results in a new req-id, we would put the new one in the "received" or "called" keypair, put the reqid of the process that made the api call in the "caller" or "dispatched" keypair and the original client call stays the same in the originator keypair 21:42:39 <dhellmann> rockyg : if service A is calling service B, do I do the logging in A or in B? 21:42:45 <dhellmann> or both? 21:44:12 <rockyg> I originally thought A but that doesn't make all that much sense. I think B. The ops guys are talking about using logstash + elastic search or other agregators to get all related log messages 21:45:39 <rockyg> A and B would be more robust, but I am not sure we need that. The key is ops want to see where the line breaks if something doesn't happen. 21:46:47 <jokke_> well we can't log A's req ID in B because that req ID doesn't get sent to B 21:46:48 <rockyg> A andB would allow quick determination that handoff didn't happen but B would show the breakage point. 21:47:15 <rockyg> jokke_, that's how it doesn't make sense ;-) 21:47:33 <jokke_> so only place to log that is in A after B has returned 21:47:39 <dhellmann> ok, so we log requests in A after B returns, since the response from B includes the request id 21:47:54 <jokke_> dhellmann: that's the only way to do that 21:48:05 <rockyg> Oh, good. The mistake in my logic is found. Thank you! 21:48:06 <dhellmann> so A is logging "I just called B with request id Y" and A's request id X is automatically already in the log output 21:48:33 <jokke_> so you need to parse ahead to seek the request ID that has been logged, then roll back in time to look the service B logs what happened 21:48:49 <rockyg> Yeah. Hopefully. If not, we have to insert it. 21:49:58 <dhellmann> so I guess the next step is to either update everywhere we have a client call, or just update the clients, to log the request ids 21:50:33 <dhellmann> if we do it in the clients, using python's standard logging, the overrides from the apps that use oslo.log will give them the current request as part of the format string 21:51:01 <dhellmann> so the question is, do we actually want to do it in the clients, or do we want to do it where the clients are used? 21:51:39 <dhellmann> the latter is a lot more changes 21:51:46 <jokke_> dhellmann: based on the previous discussions around the topic I doubt people want's that logging to happen by the client 21:51:53 <dhellmann> the former would mean always logging a given call at the same level, though 21:52:24 <rockyg> Here's another question: client says: give me a vm. Sends to Nova. Nova send to Neutron: give me a network Does "give me a network" call come from a client? 21:52:26 <dhellmann> ok, so we need to find all outgoing client calls in all apps and wrap them with try:finally and log calls I guess 21:52:44 <dhellmann> rockyg : yes, nova uses neutron client to talk to neutron 21:52:49 <rockyg> Cool. 21:53:09 <rockyg> I believe the amount of work necessary is moot as Abhi and his team plan on doing it. 21:53:31 <dhellmann> well, it's not, because that's a *lot* of code churn and reviews 21:53:51 <rockyg> That's true. Sorry. Wasn't thinking it through. 21:54:12 <jokke_> yes that brings lots of of big reviews or huge amount of small reviews 21:54:36 <dhellmann> it's much less work all around to log from within the clients, so I'm curious about why that was rejected 21:55:04 <rockyg> So, Abhi's email proposed three options: putting the utils in oslo.log, putting it into oslo.utils or putting it in the individual projects. I believe 3 equates to the every client? 21:55:17 <dhellmann> yes, at least 21:55:54 <dhellmann> it's not clear how the request_utils.py code applies to this, if I've found the right version 21:55:59 <dhellmann> https://review.openstack.org/#/c/170074/2/openstack/common/request_utils.py 21:57:21 <dhellmann> it looks like it just builds a very verbose log message? 21:57:46 <dhellmann> well, and there's the notifier bit at the end 21:57:59 <jokke_> rockyg: if it's not done by client there is no way it's done anywhere without logging specifically after each client call 21:58:17 <rockyg> Yeah. That's not gonna work. I see now. His email just restates the code implementation. 21:58:39 <dhellmann> jokke_, rockyg : I guess my point is, why do we even need a function for this? it's just a call to the logger. 21:58:55 <rockyg> So I believe the answer is, that's not the way to do this, and we suggest what we just discussed. 21:58:56 <jokke_> dhellmann: I totally agree 21:59:21 <dhellmann> logging.getLogger(__name__).info('%(current_function)s used request ids [%(request_id_chain)s]') 21:59:46 <rockyg> It goes in clients. It utilizes the syslog structured data field in the message header. Point him to this discussion 22:00:00 <dhellmann> rockyg : we do not, by default, use syslog, so we have to leave that part out 22:00:00 <jokke_> rockyg: as said I highly doubt community will agree clients doing that logging 22:00:48 <jokke_> as people tends to be really touchy libs logging stuff 22:01:40 <rockyg> Cool. So, it's just adding the extra formatter field, plus the related code. So, how do we do the logging in B rather than A? How do we get A's request ID and the originating ID? 22:01:45 <jokke_> we're out of time 22:01:59 <jokke_> rockyg: we don't 22:02:23 <rockyg> Argh. 22:02:35 <jokke_> rockyg: and that extra formatter field won't help as someone needs to initiate that log event 22:02:50 <rockyg> Well, great discussion. We point him to this and continue the ML discussion. 22:03:11 <jokke_> thanks everyone 22:03:17 <rockyg> Is that reasonable for the next week worth of design? 22:03:34 <rockyg> And Yes. Gotta go. docs has this channel next. 22:03:36 <dhellmann> yeah, let's continue on the mailing list 22:03:41 <rbradfor> ok, later 22:03:47 <rockyg> #endmeeting