20:00:47 <Rockyg> #startmeeting log_wg
20:00:47 <openstack> Meeting started Wed May 6 20:00:47 2015 UTC and is due to finish in 60 minutes. The chair is Rockyg. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:48 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:00:51 <openstack> The meeting name has been set to 'log_wg'
20:00:59 <Rockyg> Roll call!
20:01:07 <dhellmann> o/
20:01:27 <Rockyg> jokke_, nkrinner
20:01:50 <jokke_> o/
20:02:02 <Rockyg> bknudson:
20:02:36 <Rockyg> Am I missing anyone?
20:02:54 <bknudson> hi
20:02:58 <Rockyg> eugeniya is not on, at least not on this channel
20:03:01 <Rockyg> Hey!
20:03:34 <Rockyg> And dhellmann, thanks for showing up. We'll be going for a better time after the summit.
20:04:01 <Rockyg> #topic log sessions at the summit
20:04:16 <dhellmann> ok, this meeting time isn't terrible for me but if we can find a time that's better for others that would be good, too -- we need more members :-)
20:04:33 <Rockyg> We have two ops sessions. A general session and a working session
20:05:05 <Rockyg> We do not have a dev session, but a conversation with doug leads me to believe we aren't ready for that yet.
20:05:30 <dhellmann> here's one: https://libertydesignsummit.sched.org/event/764d77baafe13caaad8ff1badabb9b9a#.VUpz_s5hW9Y
20:05:30 <Rockyg> The ops sessions will get us much closer to that point if we do our jobs.
20:05:31 <dhellmann> where is the second?
20:05:57 <Rockyg> 4:30pm Wednesday
20:06:34 <dhellmann> https://libertydesignsummit.sched.org/event/fbfceb17bc4927136c0aa778d38586d1#.VUp0Rs5hW9Y
20:06:57 <Rockyg> Thanks
20:07:28 <Rockyg> I've started the etherpad but we need to flesh it out some. We will use it for both the general and the working session.
20:07:51 <Rockyg> https://etherpad.openstack.org/p/YVR-ops-logging
20:08:38 <Rockyg> I was trying to edit it yesterday, though, and had problems.
Might have been my connection to etherpad.openstack.org, but it seemed the site might have been overloaded then
20:09:18 <Rockyg> Is there anything any of you want specifically addressed and discussed at the summit?
20:09:46 <dhellmann> Rockyg: that happens once in a while; you have to force the page to reload to make it work again, unfortunately
20:10:38 <Rockyg> Yeah. I did that and kept getting an error. Other pages weren't having the problem. That page was in a weird state, so I figured I'd come back and try today.
20:10:49 <bknudson> will be interesting to hear from ops if they have any specific complaints
20:11:04 <dhellmann> Rockyg: hmm, weird
20:11:06 <Rockyg> If I still have an issue, it's off to openstack-infra to discuss after the meeting.
20:11:28 <bknudson> e.g., I can't tell what happened when the scheduler fails.
20:11:41 <dhellmann> Rockyg: I'm able to edit the etherpad (I added a link to an oslo.log spec)
20:12:02 <Rockyg> The three big issues from Paris were: 1) tracing the error back to its true origination
20:12:34 <bknudson> luckily we made lots of progress on those 3 issues!
20:12:47 <Rockyg> 2) error messages that were useless (noops, misleading, worse)
20:13:32 <Rockyg> 3) and now I can't remember.
20:14:25 <Rockyg> Another one from before the summit from ops folks I know: stuff happens in eventlets that never makes it to the logs
20:14:44 <Rockyg> So, propagating important notifications to logs
20:15:43 <Rockyg> Oh, yeah. Consistency. Same format, same number of fields. If an optional field in the message is not used, put a "-" in its place in the message
20:16:22 <Rockyg> Thank you, dhellmann, for adding that spec link. That was what I was trying to add yesterday
20:17:04 <Rockyg> and yeah, bknudson, we've made progress ;-)
20:17:16 <Rockyg> Actually, we have, thanks to dhellmann and the oslo team
20:17:24 <Rockyg> We have better docs
20:17:38 <Rockyg> we have better guidelines for devs
20:18:06 <bknudson> keystone logs didn't improve...
still need the time to work on it.
20:18:09 <Rockyg> And we have improvements on config and a few other things that ops wanted.
20:18:40 <Rockyg> 3) global config setting of log format (syslog)
20:18:51 <Rockyg> 3) is done.
20:19:00 <dhellmann> Rockyg: the ops session on logging is at the same time as the oslo session on logging, so we should talk to dims and whoever is scheduling the ops track to see if we can move one of them
20:19:20 <dhellmann> http://libertydesignsummit.sched.org/event/35475d6e34ad1b4c2ffc5a2ff8cc68ed#.VUp3Rc5hW9Y
20:19:26 <dhellmann> it would be good to have the ops session before ^^
20:19:58 <Rockyg> dhellmann: Definitely. And agreed. If we can get actionable items out of the ops working session before the oslo session, that would be great.
20:20:11 <dhellmann> that ops session is also up against a talk on logging, which is unfortunate: http://libertydesignsummit.sched.org/event/407b15645cef2cfb4248a28a0c96c9fe#.VUp3a85hW9Y
20:20:18 <dhellmann> we may want to move both of them
20:20:45 <dhellmann> both the ops session and the oslo session, that is
20:20:48 <Rockyg> #action connect with fifeld and dims about reschedule of one of the log sessions currently Wednesday at 4:30
20:21:53 <Rockyg> #link http://libertydesignsummit.sched.org/event/407b15645cef2cfb4248a28a0c96c9fe#.VUp3a85hW9Y
20:22:29 <Rockyg> #link https://libertydesignsummit.sched.org/event/fbfceb17bc4927136c0aa778d38586d1#.VUp0Rs5hW9Y
20:23:39 <Rockyg> Ok. For the first meeting, I would like to present the specs and/or reviews that have made or will make logging better, and the docs
20:24:04 <Rockyg> dhellmann: are you aware of any user docs that talk about logging?
20:24:24 <Rockyg> Config, admin, etc?
20:24:26 <dhellmann> I haven't looked for anything like that, but I would expect the ops manual to cover it somewhere?
20:24:44 <dhellmann> I don't know if they still call the manual "ops guide" or whatever
20:25:02 <Rockyg> I think there is less than a paragraph somewhere.
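For reference, the consistency request from Paris (every line carries the same fields in the same order, with "-" standing in for an unused optional field) and the single shared log format can both be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not how oslo.log implements it; the field names `request_id`, `user` and `tenant`, the format string, and the helper names are hypothetical and chosen only for the example.

```python
import io
import logging

# Hypothetical context fields: every formatted line carries all of them.
CONTEXT_FIELDS = ("request_id", "user", "tenant")

# One format string shared by every handler (illustrative, not oslo.log's default).
LOG_FORMAT = ("%(asctime)s %(levelname)s %(name)s "
              "[%(request_id)s %(user)s %(tenant)s] %(message)s")


class DashDefaultsFilter(logging.Filter):
    """Give every record the full set of context fields, using "-" for missing ones."""

    def filter(self, record):
        for field in CONTEXT_FIELDS:
            if not hasattr(record, field):
                setattr(record, field, "-")
        return True  # annotate the record, never drop it


def make_handler(stream):
    """Build a handler that always emits the same fields in the same order."""
    handler = logging.StreamHandler(stream)
    handler.addFilter(DashDefaultsFilter())
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    return handler


if __name__ == "__main__":
    buf = io.StringIO()
    log = logging.getLogger("nova.api")
    log.setLevel(logging.INFO)
    log.addHandler(make_handler(buf))
    log.info("request with context", extra={"request_id": "req-123", "user": "alice"})
    log.info("background task with no context")
    print(buf.getvalue(), end="")
```

With this setup, the first line shows `[req-123 alice -]` in the context slot and the second shows `[- - -]`, so a parser downstream always sees the same number of fields. Putting the filter on the handler rather than on each logger means every logger routed to that handler gets the same treatment.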
20:25:10 <Rockyg> I think it's config.
20:25:22 <Rockyg> At least that's what it was like in the Paris timeframe.
20:26:06 <bknudson> keystone logging: http://docs.openstack.org/admin-guide-cloud/content/keystone-logging.html
20:26:35 <bknudson> we should be able to have keystone logging docs just point to a general logging config doc that all the others point to
20:26:41 <Rockyg> I think that's another reason there are so many complaints. Ops has to dig into dev docs or code to find stuff.
20:27:24 <bknudson> http://docs.openstack.org/admin-guide-cloud/content/section_manage-logs.html
20:27:28 <bknudson> there's the compute one... nice!
20:28:41 <Rockyg> I would like to get ops volunteers to document what they do and have devs review the results. And yeah, bknudson, that's much better.
20:30:05 <Rockyg> I think a target for Liberty is to have a chapter on logging (configuring, managing, etc.) that covers a full stack
20:30:34 <Rockyg> I think that is an attainable goal.
20:30:43 <jokke_> yeap
20:31:10 <Rockyg> So, uh, topic drift. Do we have anything else for summit planning?
20:32:36 <Rockyg> #topic general discussion
20:34:12 <Rockyg> I am thinking that a good chunk of the dissatisfaction with logging is a lack of understanding of the underlying python logging, along with a lack of specific documentation of what oslo.log adds/modifies.
20:35:11 <Rockyg> The more I dig and learn about it, the more I see that the naive approach to the logs makes managing them much more difficult for ops.
20:35:17 <jokke_> Looking at the logs, it's just not that ... lots of the logging we do is for developers, not for ops. What we log, the way we log it, etc.
20:36:20 <Rockyg> jokke_: That's also very true.
We need to educate developers on how large systems use logs and what they expect from them
20:36:41 <Rockyg> So at a minimum, better docs on both sides
20:37:26 <jokke_> having that would be a good start
20:37:39 * dims checks in
20:37:49 <jokke_> then having people actually read those docs and act on them would be awesome ;)
20:38:04 <Rockyg> Hey dims, we have a summit scheduling conflict
20:38:34 <Rockyg> the ops log working session and the oslo.log session are at the same time
20:39:00 <dhellmann> dims: during the 4:30 period on wed
20:39:16 <Rockyg> we really need the ops session to happen before the oslo.log session
20:39:16 <dhellmann> we have an oslo logging session, an ops logging session, and a conference talk on logging
20:39:51 <dims> dhellmann: ack, will try to move oslo.log
20:39:58 <dims> to thu
20:40:08 <Rockyg> Thanks, dims
20:40:44 <dhellmann> dims: welcome to summit scheduling :-)
20:42:23 <Rockyg> folks, do you think we could get a YouTube talk on good logging algorithms (when to log) and message style? Would that help developers?
20:43:35 <dhellmann> we should probably write down what we would want to say, first
20:44:21 <Rockyg> Definitely. But if it would help, we could spend a meeting on planning the talk. Either at the summit, or after.
20:45:46 <dhellmann> ok. I'm not sure if it would be useful or not, so I'll leave that up to others
20:46:37 <Rockyg> Agreed. If we could get an ATC who is also ops to do it, that might help. Something to think about.
20:46:46 <Rockyg> Anything else?
20:48:09 <Rockyg> One thing I was thinking about, after looking at the log files generated from a tempest run: I want to start a discussion with ops as to whether consolidating some of the logs through a collector might be something they would like
20:48:50 <jokke_> Rockyg: care to elaborate a bit?
20:50:07 <Rockyg> So, a description of how to.
From my ops experience, getting all the nova logs (api, etc.) into a single log would at least put all of them in one file, one place to look for nova issues
20:50:46 <Rockyg> https://docs.google.com/spreadsheets/d/1XTncfK_droY8E-Uy2icVuU-z9ya38ZBK_ZIRvGfPOXc/edit#gid=0
20:51:27 <Rockyg> is my start on listing all the files and their types. Sorry, bknudson, keystone logs aren't in there yet because I haven't gotten to the apache logs
20:51:30 <jokke_> in my experience / by my knowledge OS is impossible to run without something like that already in place ;)
20:52:15 <Rockyg> for instance, nova has: api, cond, cpu, crt, obj, sch, agt logs all separate
20:52:30 <jokke_> I'd be very surprised if someone ran production without having some kind of centralized logging where they pull all that together
20:52:43 <Rockyg> Wow. Major freenode delay!
20:53:34 <Rockyg> right now, they use logstash or splunk to pull them together. But if they were already more consolidated, setup would be a lot easier. And it might make life easier for smaller cloud installations.
20:54:36 <Rockyg> All the logs and their locations should also be documented in the various project chapters of the admin guides
20:55:03 <bknudson> you can configure the logs to go anywhere
20:55:15 <jokke_> do you mean something like instructions on how to set all services to log to syslog and manage the split into files from there?
20:55:47 <Rockyg> bknudson: yes, but you can also configure them to write related logs to a single "local" log file first.
20:57:54 <Rockyg> jokke_ bknudson: more like have oslo.log send all nova.xxx files to /etc/logs/nova.syslog. Then the third-party aggregator only has to grab one nova log (well, likely a couple because of other types of logs you'd want from that system) from each nova instance
20:59:27 <Rockyg> right now, nova.conductor messages go to nova-cond and nova.api messages go to nova-api.
Why wouldn't you want the api messages in the same file as the conductor messages? They are both part of the nova "system?"
20:59:33 <jokke_> doubt anyone wants to have their logs in /etc/ but I see your point ;)
21:00:37 <Rockyg> sorry :-( but, yeah. I want to discuss this with ops, first, but if a slightly modified default setup for the files themselves makes sense, we can do that with either a sample config file, or ....
21:01:00 <Rockyg> /var/logs ...that better?
21:01:01 <dhellmann> having multiple processes write to the same file will introduce race conditions and corrupt the logs
21:01:15 <jokke_> yup
21:01:30 <dhellmann> that's why we write to separate files, and deployers can use something like syslog if they want a single file
21:01:43 <Rockyg> dhellmann: what if we use the collector and parent setup from python logging, modelled on syslog?
21:01:47 <jokke_> thus you need something like syslog if you want to have them all in one
21:02:01 <dhellmann> Rockyg: then that's another service we have to maintain, instead of using one that already exists
21:02:34 <Rockyg> I think it's already in python logging
21:05:12 <Rockyg> I will look more closely and see what I find. I think it's the syslog handler or socket handler.
21:05:21 <Rockyg> But, it is TIME.
21:05:24 <dhellmann> Rockyg: ok, I'm not sure what you mean but I'll look for it too
21:05:30 <Rockyg> #endmeeting
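dhellmann's objection is that several writers appending to one file can interleave and corrupt lines. The Python logging cookbook's general answer is to funnel all records through a single writer, and the standard library's `logging.handlers.QueueHandler` and `QueueListener` (Python 3.2+) are one way to do that; `SysLogHandler` and `SocketHandler`, which Rockyg mentions, are other options in the same module. The sketch below is a minimal, self-contained illustration using threads and `queue.Queue`; it assumes a single process. Separate services such as nova-api and nova-conductor run as separate processes, so there the queue would have to be a `multiprocessing` queue or a `SocketHandler` feeding a receiver process, which is exactly the "another service we have to maintain" trade-off dhellmann raises versus just using syslog. The logger names, file path, and helper functions are hypothetical.

```python
import logging
import logging.handlers
import os
import queue
import tempfile
import threading


def start_single_writer(path, fmt="%(name)s %(levelname)s %(message)s"):
    """Start a listener thread that is the only writer to `path`."""
    records = queue.Queue()
    file_handler = logging.FileHandler(path)
    file_handler.setFormatter(logging.Formatter(fmt))
    listener = logging.handlers.QueueListener(records, file_handler)
    listener.start()
    return records, listener, file_handler


def attach_producer(name, records):
    """Route one service logger (e.g. "nova.api") into the shared queue."""
    log = logging.getLogger(name)
    log.setLevel(logging.INFO)
    handler = logging.handlers.QueueHandler(records)
    log.addHandler(handler)
    log.propagate = False  # keep the demo's records away from the root logger
    return log, handler


def worker(log, count):
    for i in range(count):
        log.info("message %d", i)


def write_from_threads(path, names, per_thread):
    """Have several loggers log concurrently; return the lines of the single file."""
    records, listener, file_handler = start_single_writer(path)
    producers = [attach_producer(name, records) for name in names]
    threads = [threading.Thread(target=worker, args=(log, per_thread))
               for log, _ in producers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    listener.stop()  # handles everything already queued, then ends the thread
    file_handler.close()
    for log, handler in producers:  # detach so a second run does not double-log
        log.removeHandler(handler)
    with open(path) as f:
        return f.read().splitlines()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "nova.log")
    lines = write_from_threads(path, ["nova.api", "nova.conductor", "nova.scheduler"], 100)
    print(len(lines), "lines, all written by one listener thread")
```

The producers never touch the file; they only enqueue records, and the listener thread is the sole writer, so whole lines cannot interleave. The same shape holds when the queue becomes a socket or syslog: many senders, one process owning the file.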