Thursday, 2018-01-25

00:09 *** kumarmn has joined #openstack-tc
00:14 *** kumarmn has quit IRC
00:21 *** kumarmn has joined #openstack-tc
00:38 *** kumarmn has quit IRC
01:36 *** liujiong has joined #openstack-tc
02:15 *** gcb has joined #openstack-tc
02:39 *** kumarmn has joined #openstack-tc
02:43 *** kumarmn has quit IRC
02:49 *** ianychoi has quit IRC
02:49 *** ianychoi has joined #openstack-tc
03:21 *** harlowja has quit IRC
03:33 *** openstackgerrit has quit IRC
04:06 *** rosmaita has quit IRC
04:28 *** harlowja has joined #openstack-tc
04:37 *** kumarmn has joined #openstack-tc
05:09 *** kumarmn has quit IRC
05:09 *** kumarmn has joined #openstack-tc
05:14 *** kumarmn has quit IRC
05:17 *** harlowja has quit IRC
06:08 *** ianychoi has quit IRC
06:09 *** ianychoi has joined #openstack-tc
07:18 *** liujiong has quit IRC
08:26 *** gcb has quit IRC
08:29 *** gcb has joined #openstack-tc
08:51 *** jpich has joined #openstack-tc
11:17 *** gcb has quit IRC
11:20 *** gcb has joined #openstack-tc
12:10 *** kumarmn has joined #openstack-tc
12:15 *** kumarmn has quit IRC
12:56 *** rosmaita has joined #openstack-tc
14:00 *** kumarmn has joined #openstack-tc
14:01 *** david-lyle has quit IRC
14:26 -openstackstatus- NOTICE: We're currently experiencing issues with the server which will result in POST_FAILURE for jobs; please stand by and don't needlessly recheck jobs while we troubleshoot the problem.
15:00 <smcginnis> tc-members: Office hours, it seems.
15:00 * smcginnis will be distracted though
15:01 <EmilienM> same, in meetings as usual...
15:09 <TheJulia> err, o/
15:09 <TheJulia> Clearly, I require more coffee
15:09 <cmurphy> it's a more coffee kind of week
15:10 <TheJulia> I really feel like we need to push an effort forward to be kinder to CI resources
15:12 <smcginnis> TheJulia: I've had the same thought. We want good test coverage, but I think we do put a lot of unnecessary load out there.
15:12 <TheJulia> not quite "be kind, please rewind", but be kind, be sensible
15:14 <TheJulia> I think some of the tests in ironic can be made to run in parallel, and we can improve log gathering to only grab essentials unless we are debugging a job or there is a failure.
15:16 <cmurphy> fungi would likely have thoughts here if he weren't traveling
15:17 <TheJulia> I totally see the argument for collecting all the logs all the time, but often only a handful of logs are needed to verify that a job succeeded
15:19 <smcginnis> The problem is always when you realize too late that the one skipped file would have had the information you needed to fix something. :)
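[Editor's note: the "grab only essential logs unless the job failed" idea discussed above could be sketched as a small filter. This is a hypothetical illustration; the `ESSENTIAL` file names and the `job_failed` flag are assumptions, not part of any real Zuul or devstack interface.]

```python
# Hypothetical sketch: choose which log files to upload based on the
# job outcome. On failure keep everything; on success keep only a
# short "essential" list. File names here are illustrative.
from pathlib import Path

ESSENTIAL = {"job-output.txt", "devstacklog.txt"}

def logs_to_upload(log_dir, job_failed):
    """Return all log files on failure, only the essentials on success."""
    paths = sorted(p for p in Path(log_dir).rglob("*") if p.is_file())
    if job_failed:
        return paths
    return [p for p in paths if p.name in ESSENTIAL]
```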
15:22 <dhellmann> are there specific issues triggered by the current resource use patterns?
15:23 <TheJulia> dhellmann: well, we seem to have run out of space, and also, from my standpoint as a contributor in ironic, there is a willingness within the community to just create more and more jobs, since there is often not a great understanding of what resources are consumed on average
15:24 <dhellmann> TheJulia: I thought that was zuul's log volume, not from the jobs? maybe I misunderstood something.
15:24 <TheJulia> if there were an "average job log size" counter, and a minutes-of-compute-time-used counter, perhaps that might help people grok the actual cost
15:24 <dhellmann> oh, having some stats would be interesting
15:24 <dhellmann> although I'm not sure we necessarily want to encourage log sizes of 0 :-)
15:24 <pabelanger> the current outage isn't a space issue; we seem to have lost a volume on logs.o.o (not sure why yet) but working on fscks now
15:25 <dhellmann> thanks, pabelanger
15:25 <TheJulia> dhellmann: exactly, and I'm not advocating no logs :)
15:25 <TheJulia> pabelanger: thanks!
15:25 <pabelanger> but I do agree, when we have issues with logs.o.o all jobs are affected
15:26 <TheJulia> I remember... maybe 3 years ago fungi did come up with some average numbers for log data...
15:26 <cmurphy> I feel like most of the issues we tend to see are due to bugs that can be fixed, not so much to greedy usage
15:27 <TheJulia> I wouldn't call it greedy; I would call it overzealous, or viewing the resources as free when they really are not
15:27 <pabelanger> yah, we had some metrics at the last PTG too. We had a project that was uploading a lot of data to logs.o.o, much more than before. But I feel things have been in a good spot for logs.o.o the last few months (aside from today)
15:27 <TheJulia> Someone has to pay the power bill....
15:29 <TheJulia> pabelanger: makes sense to do it again, imho, and possibly encourage consideration of log/resource usage if teams are grumpy regarding CI and intend to discuss it at the PTG.
15:29 <cmurphy> I guess ideally the system would deal with it and regulate it somehow; I'm not sure I want it to be on the developers to have to be self-aware, or for the jobs to suffer
15:31 <pabelanger> yah, we've discussed in the past how we could make log storage better. And even with zuulv3, we've discussed some sort of function to limit the amount of data a job could push to logs.o.o, but so far we haven't enabled any of that.
15:31 <dhellmann> we don't retain logs indefinitely, right?
15:31 <pabelanger> But I do agree, it would be nice to see projects aware of the resources they consume.
15:31 <pabelanger> dhellmann: no, we are down to maybe 30 days now
15:31 <pabelanger> due to sizes
15:33 <TheJulia> I'm also not thinking of just logs; IO/bandwidth is also a consideration that ultimately has a cost and impact. Maybe I am just thinking with my ops hat on, but those impacts can ultimately relate to overall performance. Of course, if cloud performance were perfectly reliable, then it would become easy to just benchmark jobs
15:35 <TheJulia> maybe average out counters at the end of the jobs, but I'm not sure we're really retaining enough of that to do anything beyond bandwidth (since that should be easy)
15:36 <pabelanger> we also store some of those metrics in graphite.o.o today, job run times for example
15:36 <pabelanger> which has a much longer retention
15:36 <pabelanger> zuulv3 also has an SQL reporter where we now track data too
15:36 <TheJulia> except underlying clouds make overall runtimes variable
15:37 <TheJulia> granted, there is the unaccounted-time report from devstack that has been useful at times in the past.
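[Editor's note: the "average job log size" counter TheJulia floats above could be computed with a short script over a log archive. The directory layout assumed here (one subdirectory per job run) is hypothetical, not the actual logs.o.o structure.]

```python
# Sketch of an "average job log size" report, assuming one
# subdirectory per job run under a top-level archive directory.
# The layout is a hypothetical assumption for illustration.
from pathlib import Path

def average_log_size(archive_dir):
    """Mean total bytes of logs per job-run subdirectory."""
    runs = [d for d in Path(archive_dir).iterdir() if d.is_dir()]
    if not runs:
        return 0.0
    totals = [
        sum(f.stat().st_size for f in run.rglob("*") if f.is_file())
        for run in runs
    ]
    return sum(totals) / len(totals)
```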
15:38 *** david-lyle has joined #openstack-tc
15:38 <pabelanger> As it relates to CI usage, we are also down another cloud this week (infracloud). So that also adds pressure on how long developers wait for jobs to run in the gate. Something to also keep in mind
15:48 <dhellmann> how do people feel about the goal selection process?
15:49 <dhellmann> do we have consensus on a goal or two for rocky?
15:51 <cmurphy> seems to be still up in the air from what i can tell, but i feel like mox and mutable config are in the lead
15:52 *** hongbin has joined #openstack-tc
15:52 <dhellmann> mutable config seems less interesting now with so many folks deploying in containers
15:58 <cmurphy> containers make service restarts unnecessary? o.0
15:59 <pabelanger> I thought the workflow was more about deploying a new container with the changes, rather than stopping / starting the running one
16:00 <pabelanger> (hasn't really used containers)
16:00 <dhellmann> cmurphy: I thought the pattern for managing container-based apps was to launch a new one and kill the old one.
16:01 <dhellmann> right, what pabelanger said
16:01 <dhellmann> so it's not that restarting is not needed, it's just the norm
16:01 <dhellmann> and given the isolation, I don't know how we would send a signal to the service inside the container when it does need to reread the file
16:02 <dhellmann> at one point I was working with the tripleo team to look at confd for that, but we ultimately decided that made the containers themselves more complicated
16:02 -openstackstatus- NOTICE: logs.o.o is stabilized and there should no longer be *new* POST_FAILURE errors. Logs for jobs that ran in the past weeks until earlier today are currently unavailable pending FSCK completion. We're going to temporarily disable *successful* jobs from uploading their logs to reduce strain on our current limited capacity. Thanks for your patience!
16:04 <cmurphy> that seems like quite a lot of work if what you want to do is turn on debug logging
16:05 <pabelanger> dhellmann: my understanding of confd and containers is that that is how it is used outside of openstack. So I am unsure why it would be more complicated
16:06 <dhellmann> pabelanger: it looked like "config maps" were the new hotness for k8s
16:07 <pabelanger> dhellmann: oh, maybe. Haven't really looked into that
16:08 <pabelanger> cmurphy: yah, that workflow is much like nodepool and DIB changes. I can see how it would take a while to make that change. But I also agree, supporting a reload should be there
16:09 <dhellmann> it looked easier to update the map and tell k8s to relaunch the container than to push new config somewhere some other way and have the container pick it up
16:09 <dhellmann> use the built-in tools
16:10 <pabelanger> Yah, if that is where the k8s community has moved, that is great. My fear was that we as openstack would implement some other method to do it, specific to us
16:10 <pabelanger> glad to see that isn't the case
16:11 <dhellmann> right, I don't think we want that
16:11 <dhellmann> now, not everyone deploys with containers, so maybe this is still useful
16:11 <dhellmann> the mutable config, that is
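[Editor's note: the mutable-config pattern debated above boils down to rereading a config file on SIGHUP instead of restarting the service. The sketch below uses only Python's stdlib; it is a generic illustration, not the actual oslo.config mutable-config implementation, and the file name is hypothetical.]

```python
# Generic sketch of a mutable-config reload: reread an INI file on
# SIGHUP instead of restarting the process, e.g. to flip debug
# logging on a running service. Illustrative only; real OpenStack
# services do this through oslo.config, not this code.
import configparser
import signal

CONF_FILE = "service.conf"  # hypothetical path
CONF = configparser.ConfigParser()

def load_config():
    CONF.read(CONF_FILE)

def handle_sighup(signum, frame):
    # Reread the file in place without dropping in-flight work.
    load_config()

signal.signal(signal.SIGHUP, handle_sighup)
load_config()
```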
16:23 * fungi apologizes for missing yet another office hour. trying to catch up
16:31 *** dtantsur|afk is now known as dtantsur
16:41 *** openstackstatus has quit IRC
16:43 *** openstackstatus has joined #openstack-tc
16:43 *** ChanServ sets mode: +v openstackstatus
17:15 *** dtantsur is now known as dtantsur|afk
17:25 *** jpich has quit IRC
18:07 *** diablo_rojo has joined #openstack-tc
18:08 *** david-lyle has quit IRC
18:43 *** diablo_rojo has quit IRC
19:09 *** david-lyle has joined #openstack-tc
19:09 *** david-lyle has quit IRC
19:26 *** david-lyle has joined #openstack-tc
19:44 *** harlowja has joined #openstack-tc
20:23 *** flwang has quit IRC
20:36 *** flwang has joined #openstack-tc
23:26 *** ianychoi has quit IRC
23:27 *** ianychoi has joined #openstack-tc
23:31 *** kumarmn has quit IRC
23:32 *** kumarmn has joined #openstack-tc
23:36 *** kumarmn has quit IRC

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at!