*** kumarmn has joined #openstack-tc | 00:09 | |
*** kumarmn has quit IRC | 00:14 | |
*** kumarmn has joined #openstack-tc | 00:21 | |
*** kumarmn has quit IRC | 00:38 | |
*** liujiong has joined #openstack-tc | 01:36 | |
*** gcb has joined #openstack-tc | 02:15 | |
*** kumarmn has joined #openstack-tc | 02:39 | |
*** kumarmn has quit IRC | 02:43 | |
*** ianychoi has quit IRC | 02:49 | |
*** ianychoi has joined #openstack-tc | 02:49 | |
*** harlowja has quit IRC | 03:21 | |
*** openstackgerrit has quit IRC | 03:33 | |
*** rosmaita has quit IRC | 04:06 | |
*** harlowja has joined #openstack-tc | 04:28 | |
*** kumarmn has joined #openstack-tc | 04:37 | |
*** kumarmn has quit IRC | 05:09 | |
*** kumarmn has joined #openstack-tc | 05:09 | |
*** kumarmn has quit IRC | 05:14 | |
*** harlowja has quit IRC | 05:17 | |
*** ianychoi has quit IRC | 06:08 | |
*** ianychoi has joined #openstack-tc | 06:09 | |
*** liujiong has quit IRC | 07:18 | |
*** gcb has quit IRC | 08:26 | |
*** gcb has joined #openstack-tc | 08:29 | |
*** jpich has joined #openstack-tc | 08:51 | |
*** gcb has quit IRC | 11:17 | |
*** gcb has joined #openstack-tc | 11:20 | |
*** kumarmn has joined #openstack-tc | 12:10 | |
*** kumarmn has quit IRC | 12:15 | |
*** rosmaita has joined #openstack-tc | 12:56 | |
*** kumarmn has joined #openstack-tc | 14:00 | |
*** david-lyle has quit IRC | 14:01 | |
-openstackstatus- NOTICE: We're currently experiencing issues with the logs.openstack.org server which will result in POST_FAILURE for jobs, please stand by and don't needlessly recheck jobs while we troubleshoot the problem. | 14:26 | |
smcginnis | tc-members: Office hours, it seems. | 15:00 |
cmurphy | hello | 15:00 |
* smcginnis will be distracted though | 15:00 | |
EmilienM | o/ | 15:00 |
EmilienM | same, in meetings as usual... | 15:01 |
dtroyer | ola! | 15:01 |
TheJulia | o | 15:09 |
TheJulia | err, o/ | 15:09 |
TheJulia | Clearly, I require more coffee | 15:09
cmurphy | it's a more coffee kind of week | 15:09 |
TheJulia | I'm really feeling like we need to push an effort forward to be more kind to CI resources | 15:10 |
smcginnis | TheJulia: I've had the same thought. We want good test coverage, but I think we do put a lot of unnecessary load out there. | 15:12 |
TheJulia | not quite like be kind, please rewind, but be kind, be sensible | 15:12 |
TheJulia | I think some of the tests in ironic can be made to run parallel, and we can improve log gathering to only grab essentials unless we are debugging a job or there is a failure. | 15:14 |
cmurphy | likely fungi would have thoughts here if he weren't traveling | 15:16 |
TheJulia | likely | 15:16 |
TheJulia | I totally see the argument to collecting all the logs all the time, but often to verify success only a handful of logs are needed for jobs | 15:17 |
smcginnis | The problem is always when you realize too late that that one skipped file would have the information you need to fix something. :) | 15:19 |
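[Editor's note: TheJulia's "grab only essentials unless we are debugging a job or there is a failure" idea (15:14), which the later 16:02 status notice echoes by disabling uploads for successful jobs, could be sketched roughly as below. The file names and the helper are hypothetical, not an actual Zuul or devstack hook.]

```python
# Hypothetical sketch: keep full logs on failure, only essentials on success.
# ESSENTIAL_LOGS names are illustrative, not real job artifact names.
ESSENTIAL_LOGS = {"job-output.txt", "devstack-summary.txt"}

def logs_to_upload(all_logs, job_succeeded):
    """Pick which log files are worth uploading.

    On failure, keep everything for debugging; on success, keep only the
    files needed to verify the run.
    """
    if not job_succeeded:
        return sorted(all_logs)
    return sorted(f for f in all_logs if f in ESSENTIAL_LOGS)
```

As smcginnis notes, the trade-off is that the one skipped file may turn out to be the one you needed.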
dhellmann | o/ | 15:21 |
dhellmann | are there specific issues triggered by the current resource use patterns? | 15:22 |
TheJulia | dhellmann: well, we seem to have run out of space, and also, speaking as a contributor in ironic, there is a willingness to create more and more jobs within the community, since there is often not a great understanding of what resources are consumed on average | 15:23
dhellmann | TheJulia : I thought that was zuul's log volume, not from the jobs? maybe I misunderstood something. | 15:24 |
TheJulia | if there was an "average job log size" counter or something like that, and minutes of compute time used counter, perhaps that might help people grok the actual cost | 15:24 |
dhellmann | oh, having some stats would be interesting | 15:24 |
dhellmann | although I'm not sure we necessarily want to encourage log sizes of 0 :-) | 15:24 |
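[Editor's note: TheJulia's proposed "average job log size" counter (15:24) could look something like the sketch below. The one-subdirectory-per-job-run layout is an assumption for illustration, not how logs.openstack.org is actually organized.]

```python
import os

def average_job_log_size(jobs_root):
    """Average bytes of logs per job run, assuming (hypothetically) one
    subdirectory per job run under jobs_root."""
    totals = []
    for entry in sorted(os.listdir(jobs_root)):
        job_dir = os.path.join(jobs_root, entry)
        if not os.path.isdir(job_dir):
            continue
        # Sum every file in the job's log tree.
        size = 0
        for dirpath, _subdirs, files in os.walk(job_dir):
            size += sum(os.path.getsize(os.path.join(dirpath, f)) for f in files)
        totals.append(size)
    return sum(totals) / len(totals) if totals else 0.0
```

Surfacing a number like this per project would give contributors the "actual cost" feedback TheJulia is asking for.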
pabelanger | the current outage isn't a space issue, we seem to have lost a volume on logs.o.o (not sure why yet) but working on fscks now | 15:24
dhellmann | thanks, pabelanger | 15:25 |
TheJulia | dhellmann: exactly, and I'm not advocating no logs :) | 15:25
TheJulia | pabelanger: thanks! | 15:25 |
pabelanger | but do agree, when we have issues with logs.o.o all jobs are affected | 15:25
TheJulia | I remember.... maybe 3 years ago fungi did come up with some average numbers for log data... | 15:26 |
cmurphy | I feel like most of the issues we tend to see are due to bugs that can be fixed, not so much to greedy usage | 15:26 |
TheJulia | I wouldn't call it greedy, I would call it overzealous or viewing the resources as free when they really are not free | 15:27 |
pabelanger | yah, we had some metrics at last PTG too. We had a project that was uploading a lot of data to logs.o.o, much more than before. But I feel things have been in a good spot for logs.o.o the last few months (aside from today) | 15:27
TheJulia | Someone has to pay the power bill.... | 15:27 |
TheJulia | pabelanger: makes sense to do it again, imho, and possibly encourage consideration of log/resource usage if teams are grumpy regarding CI and intend to discuss it at the PTG. | 15:29 |
cmurphy | I guess ideally the system would deal with it and regulate it somehow, I'm not sure I want it to be on the developers to have to be self-aware or for the jobs to suffer | 15:29 |
pabelanger | yah, we've discussed in the past how we could make log storage better. And even with zuulv3, we've discussed some sort of function to limit the amount of data a job could push to logs.o.o, but so far we haven't enabled any of that. | 15:31 |
dhellmann | we don't retain logs indefinitely, right? | 15:31 |
pabelanger | But do agree, it would be nice to see projects be aware of the resources consumed. | 15:31
pabelanger | dhellmann: no, we are down to maybe 30 days now | 15:31 |
pabelanger | due to sizes | 15:31 |
dhellmann | ok | 15:31 |
TheJulia | I'm also not thinking of just logs; IO/bandwidth is also a consideration that ultimately has a cost and impact. Maybe I am just thinking with my ops hat on, but those impacts can ultimately affect overall performance. Of course, if cloud performance were perfectly reliable, it would be easy to just benchmark jobs | 15:33
TheJulia | maybe average out counters at the end of the jobs, but I'm not sure we're really retaining enough of that to do anything beyond bandwidth (since that should be easy) | 15:35 |
pabelanger | we also store some of those metrics in graphite.o.o today, job run times for example | 15:36 |
pabelanger | which has a much longer retention | 15:36 |
pabelanger | zuulv3 also has an sql reporter where we now track data too | 15:36 |
TheJulia | except underlying clouds make overall runtimes variable | 15:36 |
pabelanger | agree | 15:36 |
TheJulia | granted, there is the unaccounted time report from devstack that has been useful at times in the past. | 15:37 |
*** david-lyle has joined #openstack-tc | 15:38 | |
pabelanger | As it relates to CI usage, we are also down another cloud this week (infracloud). So that also adds pressure to how long developers wait for jobs to run in the gate. Something to also keep in mind | 15:38
dhellmann | how do people feel about the goal selection process? | 15:48 |
dhellmann | do we have consensus on a goal or two for rocky? | 15:49 |
cmurphy | seems to be still up in the air from what i can tell, but i feel like mox and mutable config are in the lead | 15:51 |
*** hongbin has joined #openstack-tc | 15:52 | |
dhellmann | interesting | 15:52 |
dhellmann | mutable config seems less interesting now with so many folks deploying in containers | 15:52 |
cmurphy | containers make service restarts unnecessary? o.0 | 15:58 |
pabelanger | I thought the workflow was more about deploying a new container with the changes, rather than stop/starting the running one | 15:59
pabelanger | (hasn't really used containers) | 16:00
dhellmann | cmurphy : I thought the pattern for managing container-based apps was to launch a new one and kill the old one. | 16:00 |
dhellmann | right, what pabelanger said | 16:01 |
dhellmann | so it's not that restarting is not needed, it's just the norm | 16:01 |
dhellmann | and given the isolation, I don't know how we would send a signal to the service inside the container when it does need to reread the file | 16:01 |
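[Editor's note: the reload dhellmann describes is conventionally done with SIGHUP. The sketch below is a hedged stand-in: real OpenStack services wire this up via oslo.service and actually re-parse their config files, while this toy handler just flips a flag.]

```python
import signal

CONFIG = {"debug": False}

def _reload_config(signum, frame):
    # In a real service this would re-read the configuration file;
    # flipping a flag here stands in for that re-parse.
    CONFIG["debug"] = True

# Register the handler so a HUP delivered to this process triggers a reload.
signal.signal(signal.SIGHUP, _reload_config)
```

From outside, the signal can be delivered to a containerized service with something like `docker kill --signal=HUP <container>`, assuming the service runs as the container's main process, which is exactly the isolation hurdle dhellmann is pointing at.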
dhellmann | at one point I was working with the tripleo team to look at confd for that, but we ultimately decided that made the containers themselves more complicated | 16:02 |
-openstackstatus- NOTICE: logs.openstack.org is stabilized and there should no longer be *new* POST_FAILURE errors. Logs for jobs that ran in the past weeks until earlier today are currently unavailable pending FSCK completion. We're going to temporarily disable *successful* jobs from uploading their logs to reduce strain on our current limited capacity. Thanks for your patience ! | 16:02 | |
cmurphy | that seems like quite a lot of work if what you want to do is turn on debug logging | 16:04 |
pabelanger | dhellmann: my understanding of confd and containers is that that is how it is used outside of openstack. So I am unsure why it would be more complicated | 16:05
dhellmann | pabelanger : it looked like "config maps" were the new hotness for k8s | 16:06 |
pabelanger | dhellmann: oh, maybe. Haven't really looked into that | 16:07 |
pabelanger | cmurphy: yah, that workflow is much like nodepool and DIB changes. I can see how it would take a while to make that change. But also agree, supporting a reload should also be there | 16:08 |
dhellmann | it looked easier to update the map and tell k8s to relaunch the container than to push new config somewhere via some other way and have the container pick that up | 16:09 |
dhellmann | use the built-in tools | 16:09 |
pabelanger | Yah, if that is how the k8s community has moved towards, that is great. My fear was we as openstack would implement some other method to do it, specific to us | 16:10 |
pabelanger | glad to see that isn't the case | 16:10 |
dhellmann | right, I don't think we want that | 16:11 |
dhellmann | now, not everyone deploys with containers, so maybe this is still useful | 16:11 |
dhellmann | the mutable config that is | 16:11 |
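[Editor's note: the "mutable config" goal being discussed is, roughly, letting a running service apply some option changes on reload without a restart. The sketch below is modeled loosely on oslo.config's `mutable=True` options, but the class and method names are invented for illustration and are not the real oslo.config API.]

```python
# Illustrative sketch of mutable-config semantics: only options flagged
# mutable are applied on reload; everything else waits for a restart.
class MutableConfig:
    def __init__(self, options, mutable=()):
        self._options = dict(options)
        self._mutable = set(mutable)

    def __getitem__(self, name):
        return self._options[name]

    def mutate(self, new_options):
        """Apply only the options flagged mutable, the way a running
        service honors a reload; return what actually changed."""
        applied = {}
        for name, value in new_options.items():
            if name in self._mutable and self._options.get(name) != value:
                self._options[name] = value
                applied[name] = value
        return applied
```

Toggling debug logging, cmurphy's example above, is the canonical mutable option: it should take effect on reload, while something like a database connection string should not.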
* fungi apologizes for missing yet another office hour. trying to catch up | 16:23 | |
*** dtantsur|afk is now known as dtantsur | 16:31 | |
*** openstackstatus has quit IRC | 16:41 | |
*** openstackstatus has joined #openstack-tc | 16:43 | |
*** ChanServ sets mode: +v openstackstatus | 16:43 | |
*** dtantsur is now known as dtantsur|afk | 17:15 | |
*** jpich has quit IRC | 17:25 | |
*** diablo_rojo has joined #openstack-tc | 18:07 | |
*** david-lyle has quit IRC | 18:08 | |
*** diablo_rojo has quit IRC | 18:43 | |
*** david-lyle has joined #openstack-tc | 19:09 | |
*** david-lyle has quit IRC | 19:09 | |
*** david-lyle has joined #openstack-tc | 19:26 | |
*** harlowja has joined #openstack-tc | 19:44 | |
*** flwang has quit IRC | 20:23 | |
*** flwang has joined #openstack-tc | 20:36 | |
*** ianychoi has quit IRC | 23:26 | |
*** ianychoi has joined #openstack-tc | 23:27 | |
*** kumarmn has quit IRC | 23:31 | |
*** kumarmn has joined #openstack-tc | 23:32 | |
*** kumarmn has quit IRC | 23:36 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!