*** kumarmn has joined #openstack-tc | 00:09 | |
*** kumarmn has quit IRC | 00:14 | |
*** kumarmn has joined #openstack-tc | 00:21 | |
*** kumarmn has quit IRC | 00:38 | |
*** liujiong has joined #openstack-tc | 01:36 | |
*** gcb has joined #openstack-tc | 02:15 | |
*** kumarmn has joined #openstack-tc | 02:39 | |
*** kumarmn has quit IRC | 02:43 | |
*** ianychoi has quit IRC | 02:49 | |
*** ianychoi has joined #openstack-tc | 02:49 | |
*** harlowja has quit IRC | 03:21 | |
*** openstackgerrit has quit IRC | 03:33 | |
*** rosmaita has quit IRC | 04:06 | |
*** harlowja has joined #openstack-tc | 04:28 | |
*** kumarmn has joined #openstack-tc | 04:37 | |
*** kumarmn has quit IRC | 05:09 | |
*** kumarmn has joined #openstack-tc | 05:09 | |
*** kumarmn has quit IRC | 05:14 | |
*** harlowja has quit IRC | 05:17 | |
*** ianychoi has quit IRC | 06:08 | |
*** ianychoi has joined #openstack-tc | 06:09 | |
*** liujiong has quit IRC | 07:18 | |
*** gcb has quit IRC | 08:26 | |
*** gcb has joined #openstack-tc | 08:29 | |
*** jpich has joined #openstack-tc | 08:51 | |
*** gcb has quit IRC | 11:17 | |
*** gcb has joined #openstack-tc | 11:20 | |
*** kumarmn has joined #openstack-tc | 12:10 | |
*** kumarmn has quit IRC | 12:15 | |
*** rosmaita has joined #openstack-tc | 12:56 | |
*** kumarmn has joined #openstack-tc | 14:00 | |
*** david-lyle has quit IRC | 14:01 | |
-openstackstatus- NOTICE: We're currently experiencing issues with the logs.openstack.org server which will result in POST_FAILURE for jobs, please stand by and don't needlessly recheck jobs while we troubleshoot the problem. | 14:26 | |
smcginnis | tc-members: Office hours, it seems. | 15:00 |
cmurphy | hello | 15:00 |
* smcginnis will be distracted though | 15:00 | |
EmilienM | o/ | 15:00 |
EmilienM | same, in meetings as usual... | 15:01 |
dtroyer | ola! | 15:01 |
TheJulia | o | 15:09 |
TheJulia | err, o/ | 15:09 |
TheJulia | Clearly, I require more coffee | 15:09
cmurphy | it's a more coffee kind of week | 15:09 |
TheJulia | I'm really feeling like we need to push an effort forward to be more kind to CI resources | 15:10 |
smcginnis | TheJulia: I've had the same thought. We want good test coverage, but I think we do put a lot of unnecessary load out there. | 15:12 |
TheJulia | not quite like be kind, please rewind, but be kind, be sensible | 15:12 |
TheJulia | I think some of the tests in ironic can be made to run parallel, and we can improve log gathering to only grab essentials unless we are debugging a job or there is a failure. | 15:14 |
cmurphy | likely fungi would have thoughts here if he weren't traveling | 15:16 |
TheJulia | likely | 15:16 |
TheJulia | I totally see the argument to collecting all the logs all the time, but often to verify success only a handful of logs are needed for jobs | 15:17 |
smcginnis | The problem is always when you realize too late that that one skipped file would have the information you need to fix something. :) | 15:19 |
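[Editor's note: TheJulia's "grab only essentials unless we are debugging a job or there is a failure" idea (15:14), which the later 16:02 status notice echoes by disabling uploads for successful jobs, could be sketched roughly as below. The file names and the helper are hypothetical, not an actual Zuul or devstack hook.]

```python
# Hypothetical sketch: keep full logs on failure, only essentials on success.
# ESSENTIAL_LOGS names are illustrative, not real job artifact names.
ESSENTIAL_LOGS = {"job-output.txt", "devstack-summary.txt"}

def logs_to_upload(all_logs, job_succeeded):
    """Pick which log files are worth uploading.

    On failure, keep everything for debugging; on success, keep only the
    files needed to verify the run.
    """
    if not job_succeeded:
        return sorted(all_logs)
    return sorted(f for f in all_logs if f in ESSENTIAL_LOGS)
```

As smcginnis notes, the trade-off is that the one skipped file may turn out to be the one you needed.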
dhellmann | o/ | 15:21 |
dhellmann | are there specific issues triggered by the current resource use patterns? | 15:22 |
TheJulia | dhellmann: well, we seem to have run out of space, and also, speaking as a contributor in ironic, there is a willingness to create more and more jobs within the community, since there is often not a great understanding of what resources are consumed on average | 15:23
dhellmann | TheJulia : I thought that was zuul's log volume, not from the jobs? maybe I misunderstood something. | 15:24 |
TheJulia | if there was an "average job log size" counter or something like that, and minutes of compute time used counter, perhaps that might help people grok the actual cost | 15:24 |
dhellmann | oh, having some stats would be interesting | 15:24 |
dhellmann | although I'm not sure we necessarily want to encourage log sizes of 0 :-) | 15:24 |
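[Editor's note: TheJulia's proposed "average job log size" counter (15:24) could look something like the sketch below. The one-subdirectory-per-job-run layout is an assumption for illustration, not how logs.openstack.org is actually organized.]

```python
import os

def average_job_log_size(jobs_root):
    """Average bytes of logs per job run, assuming (hypothetically) one
    subdirectory per job run under jobs_root."""
    totals = []
    for entry in sorted(os.listdir(jobs_root)):
        job_dir = os.path.join(jobs_root, entry)
        if not os.path.isdir(job_dir):
            continue
        # Sum every file in the job's log tree.
        size = 0
        for dirpath, _subdirs, files in os.walk(job_dir):
            size += sum(os.path.getsize(os.path.join(dirpath, f)) for f in files)
        totals.append(size)
    return sum(totals) / len(totals) if totals else 0.0
```

Surfacing a number like this per project would give contributors the "actual cost" feedback TheJulia is asking for.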
pabelanger | the current outage isn't a space issue, we seem to have lost a volume on logs.o.o (not sure why yet) but working on fscks now | 15:24
dhellmann | thanks, pabelanger | 15:25 |
TheJulia | dhellmann: exactly, and I'm not advocating no logs :) | 15:25
TheJulia | pabelanger: thanks! | 15:25 |
pabelanger | but do agree, when we have issues with logs.o.o all jobs are affected | 15:25
TheJulia | I remember.... maybe 3 years ago fungi did come up with some average numbers for log data... | 15:26 |
cmurphy | I feel like most of the issues we tend to see are due to bugs that can be fixed, not so much to greedy usage | 15:26 |
TheJulia | I wouldn't call it greedy, I would call it overzealous or viewing the resources as free when they really are not free | 15:27 |
pabelanger | yah, we had some metrics at last PTG too. We had a project that was uploading a lot of data to logs.o.o, much more than before. But I feel things have been in a good spot for logs.o.o the last few months (aside from today) | 15:27
TheJulia | Someone has to pay the power bill.... | 15:27 |
TheJulia | pabelanger: makes sense to do it again, imho, and possibly encourage consideration of log/resource usage if teams are grumpy regarding CI and intend to discuss it at the PTG. | 15:29 |
cmurphy | I guess ideally the system would deal with it and regulate it somehow, I'm not sure I want it to be on the developers to have to be self-aware or for the jobs to suffer | 15:29 |
pabelanger | yah, we've discussed in the past how we could make log storage better. And even with zuulv3, we've discussed some sort of function to limit the amount of data a job could push to logs.o.o, but so far we haven't enabled any of that. | 15:31 |
dhellmann | we don't retain logs indefinitely, right? | 15:31 |
pabelanger | But do agree, it would be nice to see projects be aware of the resources consumed. | 15:31
pabelanger | dhellmann: no, we are down to maybe 30 days now | 15:31 |
pabelanger | due to sizes | 15:31 |
dhellmann | ok | 15:31 |
TheJulia | I'm also not thinking of just logs; IO/bandwidth is also a consideration that ultimately has a cost and impact. Maybe I am just thinking with my ops hat on, but those impacts can ultimately affect overall performance. Of course, if cloud performance were perfectly reliable, it would be easy to just benchmark jobs | 15:33
TheJulia | maybe average out counters at the end of the jobs, but I'm not sure we're really retaining enough of that to do anything beyond bandwidth (since that should be easy) | 15:35 |
pabelanger | we also store some of those metrics in graphite.o.o today, job run times for example | 15:36 |
pabelanger | which has a much longer retention | 15:36 |
pabelanger | zuulv3 also has an sql reporter where we now track data too | 15:36 |
TheJulia | except underlying clouds make overall runtimes variable | 15:36 |
pabelanger | agree | 15:36 |
TheJulia | granted, there is the unaccounted time report from devstack that has been useful at times in the past. | 15:37 |
*** david-lyle has joined #openstack-tc | 15:38 | |
pabelanger | As it relates to CI usage, we are also down another cloud this week (infracloud). So that also adds pressure to how long developers wait for jobs to run in the gate. Something to also keep in mind | 15:38
dhellmann | how do people feel about the goal selection process? | 15:48 |
dhellmann | do we have consensus on a goal or two for rocky? | 15:49 |
cmurphy | seems to be still up in the air from what i can tell, but i feel like mox and mutable config are in the lead | 15:51 |
*** hongbin has joined #openstack-tc | 15:52 | |
dhellmann | interesting | 15:52 |
dhellmann | mutable config seems less interesting now with so many folks deploying in containers | 15:52 |
cmurphy | containers make service restarts unnecessary? o.0 | 15:58 |
pabelanger | I thought the workflow was more about deploying a new container with the changes, rather than stop/starting the running one | 15:59
pabelanger | (hasn't really used containers) | 16:00
dhellmann | cmurphy : I thought the pattern for managing container-based apps was to launch a new one and kill the old one. | 16:00 |
dhellmann | right, what pabelanger said | 16:01 |
dhellmann | so it's not that restarting is not needed, it's just the norm | 16:01 |
dhellmann | and given the isolation, I don't know how we would send a signal to the service inside the container when it does need to reread the file | 16:01 |
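[Editor's note: the reload dhellmann describes is conventionally done with SIGHUP. The sketch below is a hedged stand-in: real OpenStack services wire this up via oslo.service and actually re-parse their config files, while this toy handler just flips a flag.]

```python
import signal

CONFIG = {"debug": False}

def _reload_config(signum, frame):
    # In a real service this would re-read the configuration file;
    # flipping a flag here stands in for that re-parse.
    CONFIG["debug"] = True

# Register the handler so a HUP delivered to this process triggers a reload.
signal.signal(signal.SIGHUP, _reload_config)
```

From outside, the signal can be delivered to a containerized service with something like `docker kill --signal=HUP <container>`, assuming the service runs as the container's main process, which is exactly the isolation hurdle dhellmann is pointing at.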
dhellmann | at one point I was working with the tripleo team to look at confd for that, but we ultimately decided that made the containers themselves more complicated | 16:02 |
-openstackstatus- NOTICE: logs.openstack.org is stabilized and there should no longer be *new* POST_FAILURE errors. Logs for jobs that ran in the past weeks until earlier today are currently unavailable pending FSCK completion. We're going to temporarily disable *successful* jobs from uploading their logs to reduce strain on our current limited capacity. Thanks for your patience ! | 16:02 | |
cmurphy | that seems like quite a lot of work if what you want to do is turn on debug logging | 16:04 |
pabelanger | dhellmann: my understanding of confd and containers is that that is how it is used outside of openstack. So I am unsure why it would be more complicated | 16:05
dhellmann | pabelanger : it looked like "config maps" were the new hotness for k8s | 16:06 |
pabelanger | dhellmann: oh, maybe. Haven't really looked into that | 16:07 |
pabelanger | cmurphy: yah, that workflow is much like nodepool and DIB changes. I can see how it would take a while to make that change. But also agree, supporting a reload should also be there | 16:08 |
dhellmann | it looked easier to update the map and tell k8s to relaunch the container than to push new config somewhere via some other way and have the container pick that up | 16:09 |
dhellmann | use the built-in tools | 16:09 |
pabelanger | Yah, if that is how the k8s community has moved towards, that is great. My fear was we as openstack would implement some other method to do it, specific to us | 16:10 |
pabelanger | glad to see that isn't the case | 16:10 |
dhellmann | right, I don't think we want that | 16:11 |
dhellmann | now, not everyone deploys with containers, so maybe this is still useful | 16:11 |
dhellmann | the mutable config that is | 16:11 |
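[Editor's note: the "mutable config" goal being discussed is, roughly, letting a running service apply some option changes on reload without a restart. The sketch below is modeled loosely on oslo.config's `mutable=True` options, but the class and method names are invented for illustration and are not the real oslo.config API.]

```python
# Illustrative sketch of mutable-config semantics: only options flagged
# mutable are applied on reload; everything else waits for a restart.
class MutableConfig:
    def __init__(self, options, mutable=()):
        self._options = dict(options)
        self._mutable = set(mutable)

    def __getitem__(self, name):
        return self._options[name]

    def mutate(self, new_options):
        """Apply only the options flagged mutable, the way a running
        service honors a reload; return what actually changed."""
        applied = {}
        for name, value in new_options.items():
            if name in self._mutable and self._options.get(name) != value:
                self._options[name] = value
                applied[name] = value
        return applied
```

Toggling debug logging, cmurphy's example above, is the canonical mutable option: it should take effect on reload, while something like a database connection string should not.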
* fungi apologizes for missing yet another office hour. trying to catch up | 16:23 | |
*** dtantsur|afk is now known as dtantsur | 16:31 | |
*** openstackstatus has quit IRC | 16:41 | |
*** openstackstatus has joined #openstack-tc | 16:43 | |
*** ChanServ sets mode: +v openstackstatus | 16:43 | |
*** dtantsur is now known as dtantsur|afk | 17:15 | |
*** jpich has quit IRC | 17:25 | |
*** diablo_rojo has joined #openstack-tc | 18:07 | |
*** david-lyle has quit IRC | 18:08 | |
*** diablo_rojo has quit IRC | 18:43 | |
*** david-lyle has joined #openstack-tc | 19:09 | |
*** david-lyle has quit IRC | 19:09 | |
*** david-lyle has joined #openstack-tc | 19:26 | |
*** harlowja has joined #openstack-tc | 19:44 | |
*** flwang has quit IRC | 20:23 | |
*** flwang has joined #openstack-tc | 20:36 | |
*** ianychoi has quit IRC | 23:26 | |
*** ianychoi has joined #openstack-tc | 23:27 | |
*** kumarmn has quit IRC | 23:31 | |
*** kumarmn has joined #openstack-tc | 23:32 | |
*** kumarmn has quit IRC | 23:36 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!