*** rlandy is now known as rlandy|ruck | 00:11 | |
*** ricolin_ is now known as ricolin | 04:18 | |
*** ysandeep|out is now known as ysandeep | 05:24 | |
*** amoralej|off is now known as amoralej | 07:00 | |
*** ykarel__ is now known as ykarel | 08:19 | |
*** soniya29|ruck is now known as soniya29|ruck|lunch | 08:21 | |
*** bauwser is now known as bauzas | 08:31 | |
*** giblet is now known as gibi | 09:02 | |
*** soniya29|ruck|lunch is now known as soniya29|ruck | 09:35 | |
*** ykarel is now known as ykarel|lunch | 09:42 | |
*** ysandeep is now known as ysandeep|afk | 09:44 | |
opendevreview | daniel.pawlik proposed openstack/ci-log-processing master: Initial project commit https://review.opendev.org/c/openstack/ci-log-processing/+/815604 | 09:49 |
*** ykarel|lunch is now known as ykarel | 10:43 | |
*** ysandeep|afk is now known as ysandeep | 10:57 | |
*** jpena|off is now known as jpena | 10:59 | |
*** rlandy is now known as rlandy|ruck | 11:19 | |
*** amoralej is now known as amoralej|lunch | 12:27 | |
*** amoralej|lunch is now known as amoralej | 13:38 | |
*** soniya29|ruck is now known as soniya29|ruck|dinner | 15:00 | |
*** ykarel is now known as ykarel|away | 15:03 | |
opendevreview | Merged openstack/project-config master: Add Vault role to Zuul jobs https://review.opendev.org/c/openstack/project-config/+/799825 | 15:08 |
opendevreview | daniel.pawlik proposed openstack/ci-log-processing master: Initial project commit https://review.opendev.org/c/openstack/ci-log-processing/+/815604 | 15:27 |
*** soniya29|ruck|dinner is now known as soniya29|ruck | 15:34 | |
lajoskatona | clarkb: Hi, shall I disturb you with gate health and load: https://paste.opendev.org/show/jD6kAP9tHk7PZr2nhv8h/ | 15:58 |
lajoskatona | clarkb: it was perhaps January when this list was created, is there a way to get fresh data to see if, for example, recent Neutron efforts to optimize job execution have had any effect? | 15:59 |
*** jpena is now known as jpena|off | 16:08 | |
clarkb | lajoskatona: that output is from a script we can run against zuul logs. I think it may need updating to handle new zuul logs but I can look at that. Note it has nothing to do with gate health just usage | 16:16 |
lajoskatona | clarkb: yeah, thanks that would be useful, please tell if I can help | 16:19 |
clarkb | lajoskatona: the alternative is that you can get the data out of graphite | 16:19 |
clarkb | I think that would be a good long term solution to this then we don't have to run the script anymore (zuul didn't report those stats via statsd until well after the script was written but I believe it does report them now) | 16:19 |
*** amoralej is now known as amoralej|off | 16:24 | |
lajoskatona | clarkb: that's something new for me :-) | 16:31 |
*** ysandeep is now known as ysandeep|out | 16:34 | |
clarkb | lajoskatona: I think zuul.nodepool.resources.project.$PROJECT.$RESOURCE is the graphite key | 16:37 |
clarkb | https://graphite.opendev.org/?width=586&height=308&target=stats.zuul.nodepool.resources.project.opendev_org-openstack-neutron.instances for example | 16:38 |
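A minimal sketch of how the key in the example above could be built for other projects. The escaping rule (dots become underscores, slashes become hyphens) is an assumption inferred only from the single `opendev_org-openstack-neutron` example, and `graphite_key` is a hypothetical helper, not part of zuul or graphite.

```python
def graphite_key(canonical_project: str, resource: str) -> str:
    """Build the graphite key for a project's nodepool resource usage.

    Assumption: the project name is escaped the way the example key
    suggests, e.g. "opendev.org/openstack/neutron" becomes
    "opendev_org-openstack-neutron".
    """
    safe_name = canonical_project.replace(".", "_").replace("/", "-")
    return f"stats.zuul.nodepool.resources.project.{safe_name}.{resource}"


if __name__ == "__main__":
    print(graphite_key("opendev.org/openstack/neutron", "instances"))
```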
clarkb | looking at that data I'm wondering if it is accurate though. Or maybe the scale isn't what I expect | 16:40 |
lajoskatona | clarkb: I think I found the same, and tried to compare it with nova for example | 16:43 |
clarkb | oh ha I just realized that we run multiple schedulers now so we also have to combine data across them. This problem is getting more fun. I'll try to run the scripts but the multiple schedulers will make the data less clear | 16:46 |
lajoskatona | clarkb: thanks | 16:47 |
clarkb | ok confirmed the script is broken with current logs so that will need debugging (there is a division by 0 as we aren't finding any resource usage :/ ) | 16:52 |
clarkb | ok the issue is that recording moved into the zuul executors of which there are 12 | 16:58 |
clarkb | lajoskatona: ya this is going to need work: the log format changed (we no longer record the build name :/ ) and the data is recorded on 12 separate instances. I think we should look at making the graphite data work instead since that is collected centrally. It also lacks per job info but that isn't a regression against the new log format | 17:10 |
clarkb | I'm going to hack this script up and run it against ze01 to give you a rough idea of info but to be more accurate we'll likely need to use graphite one way or another | 17:13 |
clarkb | lajoskatona: https://paste.opendev.org/show/bYTlHXfbX84aESK6cLMM/ there you go. As noted, the data isn't a complete report | 17:17 |
clarkb | Just a survey of one of the zuul executors | 17:17 |
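For illustration only, a rough sketch of what a complete report would involve: summing per-project usage across all executors, then computing each project's share with a guard against the division by 0 mentioned earlier. The function names and the input shape (one dict of project to usage per executor) are hypothetical; this is not the actual script.

```python
from collections import Counter


def merge_executor_usage(per_executor):
    """Sum per-project usage over an iterable of per-executor dicts."""
    total = Counter()
    for usage in per_executor:
        total.update(usage)  # Counter.update adds values instead of replacing
    return total


def usage_shares(usage):
    """Return each project's percentage of total usage.

    Returns an empty dict when no usage was found, rather than dividing by 0.
    """
    grand_total = sum(usage.values())
    if grand_total == 0:
        return {}
    return {project: 100.0 * value / grand_total for project, value in usage.items()}


if __name__ == "__main__":
    # Made-up numbers standing in for two executors' parsed logs.
    merged = merge_executor_usage([{"neutron": 3, "nova": 1}, {"neutron": 1}])
    print(usage_shares(merged))
```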
clarkb | related: looks like neutron has been very good at resetting the gate lately | 17:19 |
fungi | nice to see that non-openstack activity accounted for ~10% of the utilization | 17:19 |
*** jgwentworth is now known as melwitt | 17:34 | |
lajoskatona | clarkb: thanks | 17:52 |
clarkb | lajoskatona: fungi: https://graphite.opendev.org/?width=943&height=529&target=integralByInterval(stats.zuul.nodepool.resources.project.opendev_org-openstack-neutron.instances%2C%20%221d%22)&target=integralByInterval(stats.zuul.nodepool.resources.project.opendev_org-openstack-nova.instances%2C%20%221d%22)&from=00%3A00_20211109&until=23%3A59_20211115 | 19:09 |
clarkb | I think that shows neutron's instance-seconds use per day compared to nova's | 19:10 |
clarkb | The scale of the data changes dramatically when you look prior to the 9th so I'm not sure if there is a bug or what | 19:10 |
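A hedged sketch of building the same comparison as a query against graphite's `/render` endpoint with `format=json`, so the numbers can be pulled by a script instead of read off the graph. `daily_usage_url` is a hypothetical helper, and whether the opendev graphite instance serves this endpoint publicly is an assumption.

```python
from urllib.parse import urlencode

GRAPHITE = "https://graphite.opendev.org"


def daily_usage_url(keys, start, end, fmt="json"):
    """Build a render URL that integrates each key over 1-day intervals."""
    # One "target" parameter per series, as in the link above.
    params = [("target", f'integralByInterval({key}, "1d")') for key in keys]
    params += [("from", start), ("until", end), ("format", fmt)]
    return f"{GRAPHITE}/render?{urlencode(params)}"


if __name__ == "__main__":
    prefix = "stats.zuul.nodepool.resources.project"
    print(daily_usage_url(
        [f"{prefix}.opendev_org-openstack-neutron.instances",
         f"{prefix}.opendev_org-openstack-nova.instances"],
        "00:00_20211109",
        "23:59_20211115",
    ))
```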
clarkb | lajoskatona: you can make a grafana dashboard in openstack/project-config/grafyaml that sets up a graph similar to that to compare openstack projects | 19:11 |
lajoskatona | clarkb: thanks | 19:12 |
clarkb | one thing to note is the graphite data seems to correlate to my ze01 data. the nova scale is about 1/4-1/5 that of neutron | 19:12 |
lajoskatona | clarkb: so we can add it to for example https://grafana.opendev.org/d/BmiopeEMz/neutron-failure-rate?orgId=1 ? | 19:12 |
clarkb | there may also be a better function than integralByInterval but this seemed to show use increase over time on a daily basis which I found interesting | 19:13 |
clarkb | lajoskatona: I would probably start a new dashboard in a new file and title it OpenStack CI resource usage or similar | 19:13 |
lajoskatona | clarkb: yeah that sounds good, I will check it | 19:13 |
clarkb | lajoskatona: the instances records are probably most interesting but there is also memory and cpu. Maybe do a graph for each | 19:14 |
clarkb | https://review.opendev.org/c/zuul/zuul/+/818019 is an update to zuul to make the logs a bit more verbose so that we can get the per job resource usage info from the logs again | 21:16 |