Monday, 2021-11-15

00:11 *** rlandy is now known as rlandy|ruck
04:18 *** ricolin_ is now known as ricolin
05:24 *** ysandeep|out is now known as ysandeep
07:00 *** amoralej|off is now known as amoralej
08:19 *** ykarel__ is now known as ykarel
08:21 *** soniya29|ruck is now known as soniya29|ruck|lunch
08:31 *** bauwser is now known as bauzas
09:02 *** giblet is now known as gibi
09:35 *** soniya29|ruck|lunch is now known as soniya29|ruck
09:42 *** ykarel is now known as ykarel|lunch
09:44 *** ysandeep is now known as ysandeep|afk
09:49 <opendevreview> daniel.pawlik proposed openstack/ci-log-processing master: Initial project commit  https://review.opendev.org/c/openstack/ci-log-processing/+/815604
10:43 *** ykarel|lunch is now known as ykarel
10:57 *** ysandeep|afk is now known as ysandeep
10:59 *** jpena|off is now known as jpena
11:19 *** rlandy is now known as rlandy|ruck
12:27 *** amoralej is now known as amoralej|lunch
13:38 *** amoralej|lunch is now known as amoralej
15:00 *** soniya29|ruck is now known as soniya29|ruck|dinner
15:03 *** ykarel is now known as ykarel|away
15:08 <opendevreview> Merged openstack/project-config master: Add Vault role to Zuul jobs  https://review.opendev.org/c/openstack/project-config/+/799825
15:27 <opendevreview> daniel.pawlik proposed openstack/ci-log-processing master: Initial project commit  https://review.opendev.org/c/openstack/ci-log-processing/+/815604
15:34 *** soniya29|ruck|dinner is now known as soniya29|ruck
15:58 <lajoskatona> clarkb: Hi, may I bother you about gate health and load? https://paste.opendev.org/show/jD6kAP9tHk7PZr2nhv8h/
15:59 <lajoskatona> clarkb: that list was created back around January, I think. Is there a way to get fresh data, to see for example whether the recent Neutron efforts to optimize job execution have had any effect?
16:08 *** jpena is now known as jpena|off
16:16 <clarkb> lajoskatona: that output is from a script we can run against the zuul logs. I think it may need updating to handle the new zuul logs, but I can look at that. Note it has nothing to do with gate health, just usage
16:19 <lajoskatona> clarkb: yeah, thanks, that would be useful. Please tell me if I can help
16:19 <clarkb> lajoskatona: the alternative is that you can get the data out of graphite
16:19 <clarkb> I think that would be a good long term solution to this, then we don't have to run the script anymore (zuul didn't report those stats via statsd until well after the script was written, but I believe it does report them now)
16:24 *** amoralej is now known as amoralej|off
16:31 <lajoskatona> clarkb: that's something new for me :-)
16:34 *** ysandeep is now known as ysandeep|out
16:37 <clarkb> lajoskatona: I think zuul.nodepool.resources.project.$PROJECT.$RESOURCE is the graphite key
16:38 <clarkb> https://graphite.opendev.org/?width=586&height=308&target=stats.zuul.nodepool.resources.project.opendev_org-openstack-neutron.instances for example
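[A minimal sketch of pulling that per-project metric out of graphite programmatically, assuming the standard Graphite render API with format=json; the metric path and project slug are taken from clarkb's example above, and nothing here is an official OpenDev tool.]

    # Minimal sketch: query graphite.opendev.org's render endpoint for a
    # project's instance-usage datapoints (assumes the standard Graphite API).
    import json
    import urllib.parse
    import urllib.request

    GRAPHITE = "https://graphite.opendev.org/render"

    def fetch_instance_usage(project, days=7):
        """Fetch raw datapoints for a project's instance usage metric."""
        target = f"stats.zuul.nodepool.resources.project.{project}.instances"
        query = urllib.parse.urlencode({
            "target": target,
            "from": f"-{days}d",
            "format": "json",
        })
        with urllib.request.urlopen(f"{GRAPHITE}?{query}") as resp:
            series = json.load(resp)
        # Each entry looks like {"target": ..., "datapoints": [[value, ts], ...]}
        return series[0]["datapoints"] if series else []

    if __name__ == "__main__":
        points = fetch_instance_usage("opendev_org-openstack-neutron")
        total = sum(v for v, _ts in points if v is not None)
        print(f"sum of reported samples over the last week: {total:.0f}")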
16:40 <clarkb> looking at that data I'm wondering if it is accurate though. Or maybe the scale isn't what I expect
16:43 <lajoskatona> clarkb: I think I found the same, and tried to compare it with nova for example
16:46 <clarkb> oh ha, I just realized that we run multiple schedulers now, so we also have to combine data across them. This problem is getting more fun. I'll try to run the scripts, but the multiple schedulers will make the data less clear
16:47 <lajoskatona> clarkb: thanks
16:52 <clarkb> ok, confirmed the script is broken with current logs, so that will need debugging (there is a division by zero because we aren't finding any resource usage) :/
16:58 <clarkb> ok, the issue is that recording moved into the zuul executors, of which there are 12
17:10 <clarkb> lajoskatona: ya, this is going to need work. The log format changed (we no longer record the build name :/ ) and the data is recorded on 12 separate instances. I think we should look at making the graphite data work instead, since that is collected centrally. It also lacks per job info, but that isn't a regression against the new log format
17:13 <clarkb> I'm going to hack this script up and run it against ze01 to give you a rough idea of the info, but to be more accurate we'll likely need to use graphite one way or another
17:17 <clarkb> lajoskatona: https://paste.opendev.org/show/bYTlHXfbX84aESK6cLMM/ there you go. As noted, the data isn't a complete report
17:17 <clarkb> Just a survey of one of the zuul executors
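[To go beyond the single-executor survey above, the per-executor results would have to be merged. A rough sketch of that merge step, assuming (hypothetically) that each executor's script run yields a {project: instance_seconds} mapping; the data shape and numbers below are invented for illustration.]

    # Hypothetical merge of per-executor usage reports across ze01..ze12.
    # Assumes each report is a dict of {project: instance_seconds}; both the
    # layout and the sample numbers are made up for illustration.
    from collections import Counter

    def merge_executor_reports(reports):
        """Sum per-project usage across all executors."""
        combined = Counter()
        for per_project in reports:
            combined.update(per_project)
        return combined

    if __name__ == "__main__":
        ze01 = {"openstack/neutron": 120000, "openstack/nova": 30000}
        ze02 = {"openstack/neutron": 95000, "zuul/zuul": 8000}
        totals = merge_executor_reports([ze01, ze02])
        for project, seconds in totals.most_common():
            print(f"{project}: {seconds} instance-seconds")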
17:19 <clarkb> related: looks like neutron has been very good at resetting the gate lately
17:19 <fungi> nice to see that non-openstack activity accounted for ~10% of the utilization
17:34 *** jgwentworth is now known as melwitt
17:52 <lajoskatona> clarkb: thanks
19:09 <clarkb> lajoskatona: fungi: https://graphite.opendev.org/?width=943&height=529&target=integralByInterval(stats.zuul.nodepool.resources.project.opendev_org-openstack-neutron.instances%2C%20%221d%22)&target=integralByInterval(stats.zuul.nodepool.resources.project.opendev_org-openstack-nova.instances%2C%20%221d%22)&from=00%3A00_20211109&until=23%3A59_20211115
19:10 <clarkb> I think that shows neutron's instance-seconds use per day compared to nova's
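[A small sketch for generating URLs like the one above rather than hand-editing them, assuming the standard Graphite render endpoint and its integralByInterval function; the project slugs are just the two examples from the chat.]

    # Build a Graphite graph URL wrapping each project's instances metric in
    # integralByInterval(..., "1d") so the graph shows cumulative use per day.
    import urllib.parse

    GRAPHITE = "https://graphite.opendev.org/"

    def daily_usage_graph_url(projects, from_="00:00_20211109", until="23:59_20211115"):
        params = [("width", "943"), ("height", "529")]
        for project in projects:
            metric = f"stats.zuul.nodepool.resources.project.{project}.instances"
            params.append(("target", f'integralByInterval({metric}, "1d")'))
        params += [("from", from_), ("until", until)]
        return GRAPHITE + "?" + urllib.parse.urlencode(params)

    if __name__ == "__main__":
        print(daily_usage_graph_url([
            "opendev_org-openstack-neutron",
            "opendev_org-openstack-nova",
        ]))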
19:10 <clarkb> The scale of the data changes dramatically when you look prior to the 9th, so I'm not sure if there is a bug or what
19:11 <clarkb> lajoskatona: you can make a grafana dashboard in openstack/project-config/grafyaml that sets up a graph similar to that, to compare openstack projects
19:12 <lajoskatona> clarkb: thanks
19:12 <clarkb> one thing to note is the graphite data seems to correlate with my ze01 data. The nova scale is about 1/4-1/5 that of neutron
19:12 <lajoskatona> clarkb: so we can add it, for example, to https://grafana.opendev.org/d/BmiopeEMz/neutron-failure-rate?orgId=1 ?
19:13 <clarkb> there may also be a better function than integralByInterval, but this seemed to show use increasing over time on a daily basis, which I found interesting
19:13 <clarkb> lajoskatona: I would probably start a new dashboard in a new file and title it OpenStack CI resource usage or similar
19:13 <lajoskatona> clarkb: yeah, that sounds good, I will check it
19:14 <clarkb> lajoskatona: the instances records are probably the most interesting, but there is also memory and cpu too. Maybe do a graph for each
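[A rough sketch of generating such a dashboard definition, one graph per resource type. The resource suffixes ("instances", "cores", "ram") and the grafyaml schema keys used here are assumptions; the real layout should be checked against the existing dashboards in openstack/project-config.]

    # Sketch: emit a grafyaml-style dashboard with one graph per resource.
    # Resource names and the dashboard schema are assumptions, not confirmed
    # by the discussion above.
    import yaml  # PyYAML

    PROJECTS = [
        "opendev_org-openstack-neutron",
        "opendev_org-openstack-nova",
    ]
    RESOURCES = ["instances", "cores", "ram"]  # assumed metric suffixes

    def build_dashboard():
        panels = []
        for resource in RESOURCES:
            targets = [
                {"target": f'integralByInterval(stats.zuul.nodepool.resources.'
                           f'project.{project}.{resource}, "1d")'}
                for project in PROJECTS
            ]
            panels.append({"title": f"Daily {resource} usage",
                           "type": "graph",
                           "targets": targets})
        return {"dashboard": {"title": "OpenStack CI resource usage",
                              "rows": [{"title": "Usage", "height": "300px",
                                        "panels": panels}]}}

    if __name__ == "__main__":
        print(yaml.safe_dump(build_dashboard(), sort_keys=False))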
21:16 <clarkb> https://review.opendev.org/c/zuul/zuul/+/818019 is an update to zuul to make the logs a bit more verbose, so that we can get the per job resource usage info from the logs again
