opendevreview | Merged openstack/project-config master: Retire openstack-health project: end project gating https://review.opendev.org/c/openstack/project-config/+/836707 | 00:01 |
opendevreview | Ghanshyam proposed openstack/project-config master: Retire openstack-health projects: remove project from infra https://review.opendev.org/c/openstack/project-config/+/836709 | 00:09 |
opendevreview | Ghanshyam proposed openstack/project-config master: Retire opendev/puppet-openstack_health: set noop jobs https://review.opendev.org/c/openstack/project-config/+/836710 | 00:31 |
opendevreview | Ghanshyam proposed openstack/project-config master: Retire opendev/puppet-openstack_health: remove project from infra https://review.opendev.org/c/openstack/project-config/+/836712 | 00:39 |
*** ysandeep|out is now known as ysandeep | 05:22 | |
*** jpena|off is now known as jpena | 07:36 | |
noonedeadpunk | clarkb: regarding zuul queues, I have 2 questions. 1. Should we calculate the time a project was consuming in the gate before we switched to shared queues, to compare how much time we started wasting with that approach? 2. Can I just define that in project-config? | 08:20 |
*** raukadah is now known as chandankumar | 08:29 | |
*** ysandeep is now known as ysandeep|lunch | 09:08 | |
opendevreview | Elod Illes proposed openstack/project-config master: Add ansible-collection-kolla to projects.yaml https://review.opendev.org/c/openstack/project-config/+/836763 | 09:49 |
*** ysandeep|lunch is now known as ysandeep | 10:03 | |
*** rlandy_ is now known as rlandy | 10:19 | |
opendevreview | Marcin Juszkiewicz proposed openstack/project-config master: Build and publish wheel mirror for CentOS Stream 9 https://review.opendev.org/c/openstack/project-config/+/836793 | 10:24 |
opendevreview | Marcin Juszkiewicz proposed openstack/project-config master: Add centos-stream-9-arm64 nodes https://review.opendev.org/c/openstack/project-config/+/836796 | 10:34 |
opendevreview | Marcin Juszkiewicz proposed openstack/project-config master: Remove CentOS 8 wheel mirrors https://review.opendev.org/c/openstack/project-config/+/836799 | 10:37 |
*** dviroel|out is now known as dviroel | 11:21 | |
*** ysandeep is now known as ysandeep|afk | 11:37 | |
opendevreview | Merged openstack/project-config master: Remove CentOS 8 wheel mirrors https://review.opendev.org/c/openstack/project-config/+/836799 | 11:50 |
*** ysandeep|afk is now known as ysandeep | 12:56 | |
opendevreview | Marcin Juszkiewicz proposed openstack/openstack-zuul-jobs master: Add build-wheel-cache jobs for CentOS Stream 9 https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/836829 | 13:19 |
opendevreview | Marcin Juszkiewicz proposed openstack/project-config master: Build and publish wheel mirror for CentOS Stream 9 https://review.opendev.org/c/openstack/project-config/+/836793 | 13:19 |
lajoskatona | frickler: Hi, can you join the Neutron discussion for OVN related questions? (https://www.openstack.org/ptg/rooms/grizzly) | 13:23 |
opendevreview | Ghanshyam proposed openstack/project-config master: Remove tempest-lib from infra https://review.opendev.org/c/openstack/project-config/+/836703 | 13:24 |
*** dasm|off is now known as dasm|ruck | 13:44 | |
*** dviroel is now known as dviroel|ptg | 14:13 | |
clarkb | noonedeadpunk: I think that is a different waste value to the one you were concerned about previously. The previous concern was wasted resources, which isn't quite the same as gate time. From a resource perspective that is tracked in graphite and there are some dashboards for it already: https://grafana.opendev.org/d/94891e7b01/resource-usage-by-tenants-and-projects?orgId=1 but | 15:01 |
clarkb | looks like those graphs may have broken recently | 15:02 |
clarkb | noonedeadpunk: I think graphite is also tracking the time in queues but we don't have dashboards for it | 15:02 |
clarkb | noonedeadpunk: for 2) you can define it there, since there should already be entries for those projects | 15:03 |
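For reference, a shared gate queue in project-config's zuul.d files looks roughly like the sketch below; the queue name and project names are illustrative, and it assumes the current top-level queue syntax rather than the older deprecated per-pipeline form.

```yaml
# Minimal sketch of a shared queue (illustrative names).
# Projects in the same queue gate together, so a failure that resets
# the gate affects every change behind it in that queue.
- queue:
    name: openstack-ansible

- project:
    name: openstack/openstack-ansible
    queue: openstack-ansible

- project:
    name: openstack/openstack-ansible-roles
    queue: openstack-ansible
```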
clarkb | lajoskatona: I'm trying to remember, were you working on the resource usage dashboards? | 15:05 |
lajoskatona | clarkb: yes, but yatin actually finished it | 15:05 |
clarkb | ah. I'm not sure why those graphs would've stopped having data recently. I wonder if zuul shifted the location of that info, but I don't recall that happening | 15:06 |
lajoskatona | clarkb: this one: https://grafana.opendev.org/d/94891e7b01/resource-usage-by-tenants-and-projects?orgId=1 | 15:06 |
clarkb | ya data seems to end on march 25th | 15:06 |
lajoskatona | clarkb: true, I don't know what happened with it | 15:06 |
clarkb | let me ask the zuul project if they know what may have caused that | 15:07 |
*** dasm|ruck is now known as dasm|ruck|bbl | 15:08 | |
noonedeadpunk | clarkb: likely I wasn't able to explain my concerns correctly :) | 15:10 |
clarkb | noonedeadpunk: well either way that info should be in graphite and you can query it there | 15:10 |
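If a dashboard for queue time were wanted, grafana dashboards in project-config are written with grafyaml; a minimal sketch is below, assuming Zuul's per-pipeline resident_time timer and an illustrative dashboard title.

```yaml
# Sketch of a grafyaml dashboard graphing time spent in the gate pipeline.
# The stats path is an assumption based on Zuul's documented
# zuul.tenant.<tenant>.pipeline.<pipeline>.resident_time timer.
dashboard:
  title: Zuul Queue Time (sketch)
  rows:
    - title: Gate
      height: 250px
      panels:
        - title: Mean time changes spend in the gate pipeline
          type: graph
          span: 12
          targets:
            - target: stats.timers.zuul.tenant.openstack.pipeline.gate.resident_time.mean
```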
noonedeadpunk | clarkb: as my concern was mostly about random failures in the gate that would invalidate the whole queue | 15:10 |
noonedeadpunk | ok, gotcha | 15:11 |
clarkb | noonedeadpunk: right, but it was expressed as a concern about wasting resources, not wall time for developers (they are related but distinct problems) | 15:11 |
noonedeadpunk | (just today I got an OOM during tempest in the gate) | 15:11 |
clarkb | wall time is much more susceptible to other demands in the system | 15:11 |
noonedeadpunk | eventually, having a flavor with 12GB of RAM would be quite helpful to eliminate that | 15:12 |
fungi | noonedeadpunk: we have flavors with 16 | 15:12 |
noonedeadpunk | fungi: I checked nodepool conf and saw only 1 provider having them... | 15:12 |
clarkb | but they are limited and you may experience NODE_FAILURE depending on availability of the cloud provider | 15:13 |
fungi | it's possible we're down to 1 now that ericsson cancelled their citycloud donation, yeah. i'll double-check | 15:13 |
clarkb | yes I think it was vexxhost and citycloud and now just vexxhost? citycloud could never reliably boot the larger instances either | 15:13 |
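For background, larger-memory nodes are just nodepool labels that only certain providers carry; the sketch below shows roughly what that looks like in a nodepool config (provider, flavor, and image names are all illustrative). When the single provider carrying a label cannot boot the instance, the node request ultimately fails with NODE_FAILURE.

```yaml
# Sketch of a label offered by only one provider (illustrative names).
labels:
  - name: ubuntu-focal-16GB
    min-ready: 0

providers:
  - name: vexxhost
    driver: openstack
    cloud: vexxhost
    pools:
      - name: main
        max-servers: 8
        labels:
          - name: ubuntu-focal-16GB
            flavor-name: v2-highcpu-16  # hypothetical flavor name
            diskimage: ubuntu-focal
```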
noonedeadpunk | uhhhh... citycloud should just donate on their own... | 15:13 |
fungi | noonedeadpunk: you'd think that, but... if you happen to know anybody there that would be awesome | 15:14 |
noonedeadpunk | we're working on that but progress is slow :( | 15:14 |
* noonedeadpunk happens to know himself.... | 15:14 | |
* noonedeadpunk happens to know evrardjp too, who is CTO there | 15:15 | |
fungi | oh, i didn't realize that's where he ended up! | 15:15 |
clarkb | right, so with citycloud we had resources there independent of airship and had to shut them all down | 15:15 |
clarkb | then airship + ericsson + citycloud did a thing for a bit that was smaller but intended for larger flavor sizes. Unfortunately the larger flavor sizes were very flaky | 15:16 |
fungi | er, didn't have resources independent of what was being donated for airship | 15:16 |
clarkb | and now we don't have that at all which leaves just vexxhost for the larger flavor sizes aiui | 15:16 |
clarkb | fungi: we did several years ago | 15:17 |
noonedeadpunk | well, we have some ideas, but have soooo limited time for that.... | 15:17 |
noonedeadpunk | and huge backlog... | 15:17 |
clarkb | but then they were gone for a year or two by the time airship donation happened | 15:17 |
fungi | oh, right there was an earlier citycloud provider but we were running into scheduling issues with it then too | 15:17 |
noonedeadpunk | but will see | 15:17 |
noonedeadpunk | I didn't know that you had to drop citycloud though | 15:17 |
clarkb | noonedeadpunk: yes we were told they didn't have the extra capacity anymore iirc | 15:18 |
clarkb | which is a perfectly valid reason to stop donating | 15:18 |
fungi | on the original donation, right | 15:18 |
clarkb | Just want to point out that asking us to give you bigger instances is difficult when the clouds you work for are unable to do it :) | 15:18 |
fungi | and then the more recent donation ended because ericsson stopped paying for it | 15:18 |
clarkb | re OOMing I've tried a few times to encourage projects to look at their resource consumption particularly memory | 15:19 |
noonedeadpunk | ok, gotcha | 15:19 |
clarkb | the last time I did debugging of it privsep was a major culprit | 15:19 |
clarkb | because every service (maybe even each process) runs a separate privsep instance and they all together consume more memory than any single openstack service iirc | 15:19 |
noonedeadpunk | well, the job that failed was installing the whole telemetry stack.... | 15:19 |
fungi | right, i expect openstack-ansible is falling victim to unchecked memory bloat across openstack services, which happens in part because they happily use all the memory our flavors offer | 15:20 |
clarkb | fungi: right, if we had 16GB of memory as default then openstack would balloon to fill that | 15:20 |
clarkb | we can kick the can down the road or push openstack to fix it | 15:20 |
fungi | if every project started using 16gb flavors for testing, suddenly the openstack services would just grow their memory footprint in response, right | 15:20 |
clarkb | I've attempted the please fix it route without much success | 15:20 |
fungi | and so openstack-ansible would probably continue to oom once that happened | 15:21 |
noonedeadpunk | well, we are really limiting the number of workers, which reduces memory consumption | 15:21 |
*** ysandeep is now known as ysandeep|out | 15:21 | |
noonedeadpunk | but from AIO builds it feels like something around 12GB of RAM is needed for stable operation. Such sandboxes can keep working for several months without issues | 15:21 |
fungi | it does, until the individual service projects see that their memory-hungry patches are no longer failing with oom, and start to merge them without concern for how much memory they're wasting | 15:22 |
noonedeadpunk | for example, when we were testing manila I spent half a day figuring out a flavor that would allow the VM to spawn (in terms of image requirements) and not OOM | 15:22 |
noonedeadpunk | as 256MB was too small for focal to launch and 384MB too much to still start the VM | 15:23 |
noonedeadpunk | So I'd say we're almost fitting into 8GB, until hungrier jobs (like the ones with ceph) are launched... | 15:24 |
fungi | probably the tc would need to select integrated/aggregate testing memory footprints as a cross-project goal | 15:25 |
fungi | and figure out ways to keep services from consuming however much memory is available | 15:25 |
fungi | because otherwise, projects are just going to grow their memory consumption to fit whatever new flavor is provided | 15:26 |
fungi | it seems like our current test nodes are the only thing forcing them to be careful about memory waste | 15:27 |
clarkb | really I think if privsep was improved we'd see a major benefit | 15:28 |
fungi | there was a time when we ran devstack jobs in 2gb ram virtual machines. when we switched it to 4gb, all the projects quickly grew their memory utilization such that running in 2gb was no longer possible. same happened almost immediately again when we increased from 4gb to 8gb | 15:28 |
clarkb | like maybe running a single privsep for all the things or making its regexes more efficient (re2 maybe?) | 15:28 |
opendevreview | Elod Illes proposed openstack/openstack-zuul-jobs master: Use again bionic in lower-constraints for xena, wallaby and victoria https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/836843 | 15:29 |
fzzf[m] | Hi, I use nodepool-builder to build disk images. diskimage-builder runs `"curl -v -L -o source-repositories/.download.qg00hgu8 -w '%{http_code}' --connect-timeout 10 --retry 3 --retry-delay 30 https://download.cirros-cloud.net/0.3.4/cirros-0.3.4-x86_64-disk.img -z"` and a **timeout occurred**.... (full message at https://matrix.org/_matrix/media/r0/download/matrix.org/NzuCpmBXgBZrhuVqdwZaYcPU) | 15:29 |
opendevreview | Elod Illes proposed openstack/openstack-zuul-jobs master: Use again bionic in lower-constraints for xena and older branches https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/836843 | 15:29 |
fungi | fzzf[m]: normally, dib should only download a particular cirros image once on each builder and then cache it indefinitely | 15:30 |
fungi | fzzf[m]: i'm able to download that image at home in 3.3 seconds. maybe you have a proxy or filter between your builder and the internet which is blocking that request? | 15:32 |
opendevreview | Elod Illes proposed openstack/openstack-zuul-jobs master: Use again bionic in lower-constraints for older branches https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/836843 | 15:33 |
opendevreview | Clark Boylan proposed openstack/project-config master: Update grafana resource usage graphs to match new paths https://review.opendev.org/c/openstack/project-config/+/836844 | 15:33 |
clarkb | lajoskatona: yasufum ^ I think that will fix the issue with those graphs | 15:33 |
clarkb | er sorry yasufum I meant to ping yatin and tab complete failed me | 15:33 |
lajoskatona | clarkb: thanks, sorry we have neutron session in the meantime | 15:34 |
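For the curious, the fix amounts to pointing the grafyaml targets at Zuul's new stats paths; the sketch below shows the kind of change involved, assuming the resource counters moved under total/in_use subtrees (which would line up with the data ending in late March). The exact paths and alias index are assumptions.

```yaml
# Before (sketch): stats.gauges.zuul.nodepool.resources.tenant.*.cores
# After  (sketch): stats.gauges.zuul.nodepool.resources.in_use.tenant.*.cores
panels:
  - title: CPU usage by tenant
    type: graph
    targets:
      # aliasByNode(..., 7) labels each series with the tenant name
      - target: aliasByNode(stats.gauges.zuul.nodepool.resources.in_use.tenant.*.cores, 7)
```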
fzzf[m] | fungi: Is there any way to set the proxy or the timeout, or a way to download manually and configure it? | 15:36 |
clarkb | fzzf[m]: DIB should respect http_proxy env var settings. Is the problem that you are running in an environment that doesn't have external access without a proxy? | 15:39 |
clarkb | I don't expect increasing the timeout will make it work any better if whatever caused the initial timeout isn't addressed | 15:40 |
clarkb | specifically if you cannot connect in 10 seconds that indicates a problem somewhere aside from the timeout | 15:40 |
*** dviroel|ptg is now known as dviroel|ptg|lunch | 15:42 | |
opendevreview | Merged openstack/project-config master: Retire opendev/puppet-openstack_health: set noop jobs https://review.opendev.org/c/openstack/project-config/+/836710 | 15:44 |
fzzf[m] | clarkb: Does DIB have any variable configuration for http_proxy? | 15:46 |
clarkb | fzzf[m]: I think it should respect the standard env vars: http_proxy and https_proxy | 15:47 |
fzzf[m] | clarkb: Do you mean the variables displayed by the env command? | 15:51 |
clarkb | fzzf[m]: the env command displays all currently set environment variables yes. I'm saying if you set http_proxy and https_proxy as environment variables pointing at your proxy that DIB should respect it | 15:52 |
*** dasm|ruck|bbl is now known as dasm|ruck | 15:52 | |
fungi | the http_proxy and https_proxy environment variables are a standard unix/linux way of specifying the location of outbound proxies for your systems; it's not something dib/nodepool/zuul-specific | 15:53 |
fungi | if you're going to be running any servers in a network which requires use of a proxy, it's how you would handle that for most applications you run | 15:54 |
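In deployments like OpenDev's, nodepool-builder runs in a container, so the environment variables can be passed in there; a minimal docker-compose sketch follows (the image tag, mount, and proxy URL are placeholders to substitute for your environment).

```yaml
# Sketch: pass proxy settings into a containerized nodepool-builder.
services:
  nodepool-builder:
    image: zuul/nodepool-builder:latest  # substitute your image/tag
    environment:
      http_proxy: http://proxy.example.com:3128   # placeholder proxy URL
      https_proxy: http://proxy.example.com:3128
      no_proxy: localhost,127.0.0.1
    volumes:
      - /etc/nodepool:/etc/nodepool:ro
```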
timburke | fungi, clarkb: fyi, looks like the fedora mirror issue for py310 cleared up. thanks again for the quick investigation! | 15:55 |
fungi | timburke: thanks for pointing it out! glad things there stabilized eventually | 15:56 |
fungi | seems like there was a very large mirror push from fedora and the second-tier mirror we sync from was probably mid-update for a while | 15:56 |
fzzf[m] | <fungi> "the http_proxy and https_proxy..." <- okay, I get it, I'll check it. thanks. | 15:59 |
opendevreview | Merged openstack/project-config master: Update grafana resource usage graphs to match new paths https://review.opendev.org/c/openstack/project-config/+/836844 | 16:00 |
opendevreview | Marcin Juszkiewicz proposed openstack/openstack-zuul-jobs master: Add build-wheel-cache jobs for CentOS Stream 9 https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/836829 | 16:03 |
opendevreview | Marcin Juszkiewicz proposed openstack/openstack-zuul-jobs master: Install EPEL on CentOS Stream 9 before using it https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/836852 | 16:03 |
*** dviroel|ptg|lunch is now known as dviroel|ptg | 16:03 | |
opendevreview | Marcin Juszkiewicz proposed openstack/openstack-zuul-jobs master: Add build-wheel-cache jobs for CentOS Stream 9 https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/836829 | 16:13 |
opendevreview | Marcin Juszkiewicz proposed openstack/project-config master: Add centos-stream-9-arm64 nodes https://review.opendev.org/c/openstack/project-config/+/836796 | 16:16 |
opendevreview | Marcin Juszkiewicz proposed openstack/project-config master: Add EPEL into CentOS Stream 9 images https://review.opendev.org/c/openstack/project-config/+/836855 | 16:19 |
*** jpena is now known as jpena|off | 16:35 | |
opendevreview | Merged openstack/project-config master: Add EPEL into CentOS Stream 9 images https://review.opendev.org/c/openstack/project-config/+/836855 | 16:38 |
opendevreview | Merged openstack/project-config master: Add centos-stream-9-arm64 nodes https://review.opendev.org/c/openstack/project-config/+/836796 | 16:40 |
clarkb | ok three of the resource usage graphs work now but not the first one | 16:49 |
opendevreview | Clark Boylan proposed openstack/project-config master: Fix small bug in resource usage graphs https://review.opendev.org/c/openstack/project-config/+/836861 | 16:51 |
clarkb | fungi: ^ I suspect that small typo is to blame | 16:52 |
fungi | d'oh! i did not spot it | 16:52 |
clarkb | I wrote it :) | 16:52 |
*** dasm|ruck is now known as dasm|ruck|bbl | 17:06 | |
opendevreview | Merged openstack/project-config master: Fix small bug in resource usage graphs https://review.opendev.org/c/openstack/project-config/+/836861 | 17:10 |
*** dviroel|ptg is now known as dviroel | 17:22 | |
*** dasm|ruck|bbl is now known as dasm|ruck | 17:36 | |
*** dviroel is now known as dviroel|mtg | 17:47 | |
clarkb | the resource usage graphs are fixed now | 18:22 |
*** dviroel|mtg is now known as dviroel | 18:43 | |
*** rlandy is now known as rlandy|biab | 19:48 | |
*** rlandy|biab is now known as rlandy | 20:04 | |
*** dviroel is now known as dviroel|afk | 20:34 | |
*** dasm|ruck is now known as dasm|off | 21:12 | |
*** rlandy is now known as rlandy|out | 22:46 |