Monday, 2021-11-15

00:11 *** rlandy is now known as rlandy|ruck
04:18 *** ricolin_ is now known as ricolin
05:24 *** ysandeep|out is now known as ysandeep
07:00 *** amoralej|off is now known as amoralej
08:19 *** ykarel__ is now known as ykarel
08:21 *** soniya29|ruck is now known as soniya29|ruck|lunch
08:31 *** bauwser is now known as bauzas
09:02 *** giblet is now known as gibi
09:35 *** soniya29|ruck|lunch is now known as soniya29|ruck
09:42 *** ykarel is now known as ykarel|lunch
09:44 *** ysandeep is now known as ysandeep|afk
09:49 <opendevreview> daniel.pawlik proposed openstack/ci-log-processing master: Initial project commit  https://review.opendev.org/c/openstack/ci-log-processing/+/815604
10:43 *** ykarel|lunch is now known as ykarel
10:57 *** ysandeep|afk is now known as ysandeep
10:59 *** jpena|off is now known as jpena
11:19 *** rlandy is now known as rlandy|ruck
12:27 *** amoralej is now known as amoralej|lunch
13:38 *** amoralej|lunch is now known as amoralej
15:00 *** soniya29|ruck is now known as soniya29|ruck|dinner
15:03 *** ykarel is now known as ykarel|away
15:08 <opendevreview> Merged openstack/project-config master: Add Vault role to Zuul jobs  https://review.opendev.org/c/openstack/project-config/+/799825
15:27 <opendevreview> daniel.pawlik proposed openstack/ci-log-processing master: Initial project commit  https://review.opendev.org/c/openstack/ci-log-processing/+/815604
15:34 *** soniya29|ruck|dinner is now known as soniya29|ruck
15:58 <lajoskatona> clarkb: Hi, may I bother you about gate health and load? https://paste.opendev.org/show/jD6kAP9tHk7PZr2nhv8h/
15:59 <lajoskatona> clarkb: that list was created back around January, I think. Is there a way to get fresh data, to see for example whether the recent Neutron efforts to optimize job execution have had any effect?
16:08 *** jpena is now known as jpena|off
16:16 <clarkb> lajoskatona: that output is from a script we can run against the zuul logs. I think it may need updating to handle the new zuul logs, but I can look at that. Note it has nothing to do with gate health, just usage
16:19 <lajoskatona> clarkb: yeah, thanks, that would be useful. Please tell me if I can help
16:19 <clarkb> lajoskatona: the alternative is that you can get the data out of graphite
16:19 <clarkb> I think that would be a good long term solution to this, then we don't have to run the script anymore (zuul didn't report those stats via statsd until well after the script was written, but I believe it does report them now)
16:24 *** amoralej is now known as amoralej|off
16:31 <lajoskatona> clarkb: that's something new for me :-)
16:34 *** ysandeep is now known as ysandeep|out
16:37 <clarkb> lajoskatona: I think zuul.nodepool.resources.project.$PROJECT.$RESOURCE is the graphite key
16:38 <clarkb> https://graphite.opendev.org/?width=586&height=308&target=stats.zuul.nodepool.resources.project.opendev_org-openstack-neutron.instances for example
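[A minimal sketch of pulling that per-project metric out of graphite programmatically, assuming the standard Graphite render API with format=json; the metric path and project slug are taken from clarkb's example above, and nothing here is an official OpenDev tool.]

    # Minimal sketch: query graphite.opendev.org's render endpoint for a
    # project's instance-usage datapoints (assumes the standard Graphite API).
    import json
    import urllib.parse
    import urllib.request

    GRAPHITE = "https://graphite.opendev.org/render"

    def fetch_instance_usage(project, days=7):
        """Fetch raw datapoints for a project's instance usage metric."""
        target = f"stats.zuul.nodepool.resources.project.{project}.instances"
        query = urllib.parse.urlencode({
            "target": target,
            "from": f"-{days}d",
            "format": "json",
        })
        with urllib.request.urlopen(f"{GRAPHITE}?{query}") as resp:
            series = json.load(resp)
        # Each entry looks like {"target": ..., "datapoints": [[value, ts], ...]}
        return series[0]["datapoints"] if series else []

    if __name__ == "__main__":
        points = fetch_instance_usage("opendev_org-openstack-neutron")
        total = sum(v for v, _ts in points if v is not None)
        print(f"sum of reported samples over the last week: {total:.0f}")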
16:40 <clarkb> looking at that data I'm wondering if it is accurate though. Or maybe the scale isn't what I expect
16:43 <lajoskatona> clarkb: I think I found the same, and tried to compare it with nova for example
16:46 <clarkb> oh ha, I just realized that we run multiple schedulers now, so we also have to combine data across them. This problem is getting more fun. I'll try to run the scripts, but the multiple schedulers will make the data less clear
16:47 <lajoskatona> clarkb: thanks
16:52 <clarkb> ok, confirmed the script is broken with current logs, so that will need debugging (there is a division by zero because we aren't finding any resource usage) :/
16:58 <clarkb> ok, the issue is that recording moved into the zuul executors, of which there are 12
17:10 <clarkb> lajoskatona: ya, this is going to need work. The log format changed (we no longer record the build name :/ ) and the data is recorded on 12 separate instances. I think we should look at making the graphite data work instead, since that is collected centrally. It also lacks per job info, but that isn't a regression against the new log format
17:13 <clarkb> I'm going to hack this script up and run it against ze01 to give you a rough idea of the info, but to be more accurate we'll likely need to use graphite one way or another
17:17 <clarkb> lajoskatona: https://paste.opendev.org/show/bYTlHXfbX84aESK6cLMM/ there you go. As noted, the data isn't a complete report
17:17 <clarkb> Just a survey of one of the zuul executors
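[To go beyond the single-executor survey above, the per-executor results would have to be merged. A rough sketch of that merge step, assuming (hypothetically) that each executor's script run yields a {project: instance_seconds} mapping; the data shape and numbers below are invented for illustration.]

    # Hypothetical merge of per-executor usage reports across ze01..ze12.
    # Assumes each report is a dict of {project: instance_seconds}; both the
    # layout and the sample numbers are made up for illustration.
    from collections import Counter

    def merge_executor_reports(reports):
        """Sum per-project usage across all executors."""
        combined = Counter()
        for per_project in reports:
            combined.update(per_project)
        return combined

    if __name__ == "__main__":
        ze01 = {"openstack/neutron": 120000, "openstack/nova": 30000}
        ze02 = {"openstack/neutron": 95000, "zuul/zuul": 8000}
        totals = merge_executor_reports([ze01, ze02])
        for project, seconds in totals.most_common():
            print(f"{project}: {seconds} instance-seconds")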
17:19 <clarkb> related: looks like neutron has been very good at resetting the gate lately
17:19 <fungi> nice to see that non-openstack activity accounted for ~10% of the utilization
17:34 *** jgwentworth is now known as melwitt
17:52 <lajoskatona> clarkb: thanks
19:09 <clarkb> lajoskatona: fungi: https://graphite.opendev.org/?width=943&height=529&target=integralByInterval(stats.zuul.nodepool.resources.project.opendev_org-openstack-neutron.instances%2C%20%221d%22)&target=integralByInterval(stats.zuul.nodepool.resources.project.opendev_org-openstack-nova.instances%2C%20%221d%22)&from=00%3A00_20211109&until=23%3A59_20211115
19:10 <clarkb> I think that shows neutron's instance-seconds use per day compared to nova's
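[A small sketch for generating URLs like the one above rather than hand-editing them, assuming the standard Graphite render endpoint and its integralByInterval function; the project slugs are just the two examples from the chat.]

    # Build a Graphite graph URL wrapping each project's instances metric in
    # integralByInterval(..., "1d") so the graph shows cumulative use per day.
    import urllib.parse

    GRAPHITE = "https://graphite.opendev.org/"

    def daily_usage_graph_url(projects, from_="00:00_20211109", until="23:59_20211115"):
        params = [("width", "943"), ("height", "529")]
        for project in projects:
            metric = f"stats.zuul.nodepool.resources.project.{project}.instances"
            params.append(("target", f'integralByInterval({metric}, "1d")'))
        params += [("from", from_), ("until", until)]
        return GRAPHITE + "?" + urllib.parse.urlencode(params)

    if __name__ == "__main__":
        print(daily_usage_graph_url([
            "opendev_org-openstack-neutron",
            "opendev_org-openstack-nova",
        ]))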
19:10 <clarkb> The scale of the data changes dramatically when you look prior to the 9th, so I'm not sure if there is a bug or what
19:11 <clarkb> lajoskatona: you can make a grafana dashboard in openstack/project-config/grafyaml that sets up a graph similar to that, to compare openstack projects
19:12 <lajoskatona> clarkb: thanks
19:12 <clarkb> one thing to note is the graphite data seems to correlate with my ze01 data. The nova scale is about 1/4-1/5 that of neutron
19:12 <lajoskatona> clarkb: so we can add it, for example, to https://grafana.opendev.org/d/BmiopeEMz/neutron-failure-rate?orgId=1 ?
19:13 <clarkb> there may also be a better function than integralByInterval, but this seemed to show use increasing over time on a daily basis, which I found interesting
19:13 <clarkb> lajoskatona: I would probably start a new dashboard in a new file and title it OpenStack CI resource usage or similar
19:13 <lajoskatona> clarkb: yeah, that sounds good, I will check it
19:14 <clarkb> lajoskatona: the instances records are probably the most interesting, but there is also memory and cpu too. Maybe do a graph for each
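[A rough sketch of generating such a dashboard definition, one graph per resource type. The resource suffixes ("instances", "cores", "ram") and the grafyaml schema keys used here are assumptions; the real layout should be checked against the existing dashboards in openstack/project-config.]

    # Sketch: emit a grafyaml-style dashboard with one graph per resource.
    # Resource names and the dashboard schema are assumptions, not confirmed
    # by the discussion above.
    import yaml  # PyYAML

    PROJECTS = [
        "opendev_org-openstack-neutron",
        "opendev_org-openstack-nova",
    ]
    RESOURCES = ["instances", "cores", "ram"]  # assumed metric suffixes

    def build_dashboard():
        panels = []
        for resource in RESOURCES:
            targets = [
                {"target": f'integralByInterval(stats.zuul.nodepool.resources.'
                           f'project.{project}.{resource}, "1d")'}
                for project in PROJECTS
            ]
            panels.append({"title": f"Daily {resource} usage",
                           "type": "graph",
                           "targets": targets})
        return {"dashboard": {"title": "OpenStack CI resource usage",
                              "rows": [{"title": "Usage", "height": "300px",
                                        "panels": panels}]}}

    if __name__ == "__main__":
        print(yaml.safe_dump(build_dashboard(), sort_keys=False))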
21:16 <clarkb> https://review.opendev.org/c/zuul/zuul/+/818019 is an update to zuul to make the logs a bit more verbose, so that we can get the per job resource usage info from the logs again
