Wednesday, 2025-07-02

00:38 <opendevreview> James E. Blair proposed openstack/project-config master: Fix zuul status node requests graph  https://review.opendev.org/c/openstack/project-config/+/953914
01:57 <opendevreview> Tony Breeds proposed openstack/diskimage-builder master: Remove mirror from experimental debian jobs  https://review.opendev.org/c/openstack/diskimage-builder/+/953256
01:57 <opendevreview> Tony Breeds proposed openstack/diskimage-builder master: Remove nodepool based testing  https://review.opendev.org/c/openstack/diskimage-builder/+/952953
01:57 <opendevreview> Tony Breeds proposed openstack/diskimage-builder master: Remove testing for f37  https://review.opendev.org/c/openstack/diskimage-builder/+/952954
09:12 <opendevreview> Merged openstack/project-config master: Fix zuul status node requests graph  https://review.opendev.org/c/openstack/project-config/+/953914
11:13 <stephenfin> clarkb: fungi: gate fixes for pbr starting here, when you have a chance. I'd like to get that going to unblock a few other open changes, before moving onto further testing
11:13 <stephenfin> note that I'm simply disabling tests with newer setuptools since the fixes are likely to be involved, and this way we can selectively re-enable them as we fix them, or delete them if it's no longer sustainable
12:21 <frickler> is this a new feature in the gerrit UI that in the "reply" popup I can remove reviewers/CCs by clicking on the "x" next to them, but then when I try to submit my comment/reviews, I get "Error 403 (Forbidden): remove reviewer not permitted" and I also have no way of restoring to the original state without losing the comment I typed?
12:21 <mnasiadka> Good feature
12:26 <fungi> frickler: if you changed permission levels for your account (adding to/removing from the administrators group for example) you'll need to force-refresh gerrit in your browser since the client caches a lot of permission lookups client-side
12:28 <frickler> hmm, last permission changes were a very long time ago. still I'd argue they should be treated consistently within the UI
13:00 <fungi> headed out to run some errands, should be back soon
13:30 <mnasiadka> corvus/frickler: willing to take a look at https://review.opendev.org/c/opendev/zuul-providers/+/953908? ;-)
14:54 <clarkb> stephenfin: ya I should be able to take a look today
14:54 <stephenfin> ty
14:54 <clarkb> since things seem otherwise quiet I'm going to take the opportunity for some local system updates first though. I've been neglecting those due to a busy early week
16:18 <opendevreview> James E. Blair proposed openstack/project-config master: Revise zuul-launcher dashboards  https://review.opendev.org/c/openstack/project-config/+/953973
16:23 <opendevreview> Michal Nasiadka proposed openstack/diskimage-builder master: Retry git clone/fetch on timeout  https://review.opendev.org/c/openstack/diskimage-builder/+/721581
16:25 <corvus> i'm restarting the launchers with a fix that should address a situation where the launchers were only using half the quota
16:32 <corvus> https://i.imgur.com/jO8j7dL.png
16:32 <corvus> that looks a lot better
16:33 <corvus> you can see the usage increase when i restarted, and it's now fully saturated
16:33 <corvus> that's from the new style dashboard i just uploaded in 953973
16:33 <clarkb> corvus: which launcher change was that?
16:34 <corvus> since osuosl is the only provider for arm64 labels, all of the arm64 node requests get assigned there.  which means, for the first time, we can see the backlog for that particular provider (the red line that goes way above the max line)
16:34 <corvus> clarkb: https://review.opendev.org/953925
16:34 <corvus> wrote it late last night and early this morning
16:35 <clarkb> ah a data lookup bug in quota calculations
16:35 <clarkb> and ya that graph is neat
16:36 <clarkb> and for reviewing the graph update change I'm going to look at screenshots and if they look good give it a +2 rather than try and parse all the json.
17:01 <corvus> yeah, i was sad about the json, but i'm pretty sure the beta-feature of the "config from query results" transform is not supportable in our old yaml...
17:03 <clarkb> corvus: looking at the screenshots I see that other providers also have the red backlog line above the max server capacity. Is that the backlog that those providers have grabbed from the queue?
17:03 <clarkb> it doesn't seem to reflect the global backlog as it differs between them
17:04 <corvus> yes, and figuring out why those are getting assigned to providers instead of waiting in the nodeset request queue is next on my list
17:04 <corvus> (so the graphs are showing what i think is a real thing that should be improved)
17:05 <clarkb> nice. Data helping make things better
17:05 <clarkb> I +2'd the change; the graphs seem to work in the screenshots
17:05 <corvus> like, 6 hours ago 300 nodes got assigned to rax-dfw, and 50 got assigned to rax-ord.  why?  i dunno.  :)
17:05 <clarkb> not sure if anyone else wants to review that or if we should go ahead and approve it
17:06 <corvus> it is interesting to see that in ovh, the controlling limit is cores.  we use 20% of our ram limit when we're at 100% of cores.
17:07 <fungi> i'll take a quick peek
17:08 <corvus> osuosl is, unsurprisingly, perfectly balanced -- we're at 100% of instances, cores, ram all at the same time.  :)
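corvus's point about which limit binds first (cores in ovh, everything at once in osuosl) is a min-over-limits calculation. Below is a minimal Python sketch. It assumes every node uses a single flavor. The limits and flavor sizes are purely hypothetical, because the real ovh numbers are not in this log.

The method: divide each limit by per-node usage to get how many nodes that limit alone would allow. The smallest of those results is the controlling limit, and the utilization of every other limit follows from it.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    instances: float
    cores: float
    ram_gb: float


def quota_report(limits, flavor):
    """Return (binding resource, max nodes, utilization per resource).

    Assumes every node uses the same (hypothetical) flavor and that the
    flavor uses a non-zero amount of each resource.
    """
    fields = ("instances", "cores", "ram_gb")
    # How many nodes each individual limit would allow on its own.
    capacity = {f: getattr(limits, f) // getattr(flavor, f) for f in fields}
    binding = min(capacity, key=capacity.get)
    count = int(capacity[binding])
    # Fraction of each limit consumed once the binding limit is reached.
    usage = {f: count * getattr(flavor, f) / getattr(limits, f) for f in fields}
    return binding, count, usage


if __name__ == "__main__":
    # Hypothetical provider: 200 instances, 800 cores, 4000 GB RAM;
    # hypothetical flavor: 8 cores, 8 GB RAM.
    print(quota_report(Resources(200, 800, 4000), Resources(1, 8, 8)))
```

These made-up numbers were chosen to reproduce the shape corvus describes. Cores bind at 100 nodes, which leaves RAM at 20% and instances at 50% of their limits. A provider whose three ratios coincide, like osuosl, hits all limits at once.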
17:30 <clarkb> stephenfin: fungi: ok I have reviewed the pbr changes. I have a concern in the first change that we're over-restricting setuptools to <80 even in cases we don't need to, and I'm concerned that the precommit update in the last change switches to pulling hacking from an uncached location rather than pypi, which is cached in our CI system
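stephenfin's plan is to disable only the tests that break on newer setuptools and re-enable them as they are fixed. Done that way, the rest of the suite does not need the global `<80` cap clarkb is worried about. A rough sketch of a version-gated skip using only the standard library follows. The cutoff of 80 is taken from this discussion. The test class and method names are hypothetical and are not pbr's actual tests.

```python
import importlib.metadata
import unittest

# Assumed cutoff, taken from the "<80" bound discussed above.
SETUPTOOLS_CUTOFF = 80


def setuptools_major_version():
    """Return the installed setuptools major version, or None if absent."""
    try:
        version = importlib.metadata.version("setuptools")
    except importlib.metadata.PackageNotFoundError:
        return None
    # Versions look like "80.9.0"; only the leading integer matters here.
    return int(version.split(".")[0])


def newer_setuptools():
    major = setuptools_major_version()
    return major is not None and major >= SETUPTOOLS_CUTOFF


class HypotheticalIntegrationTest(unittest.TestCase):
    @unittest.skipIf(newer_setuptools(), "not yet fixed for setuptools>=80")
    def test_legacy_behaviour(self):
        # Placeholder for a test that currently only passes on older setuptools.
        self.assertTrue(True)
```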
17:32 <stephenfin> thanks, looking now before I wrap up for the day
17:32 <clarkb> fwiw I wish I had kept up with the precommit change
17:32 <clarkb> I -1'd it, then it was merged despite my request that we not do the thing that the last precommit change does too
17:33 <clarkb> precommit is a bad choice in CI environments that attempt to cache things like packages because no one sets it up to install from packages
17:33 <clarkb> instead installing from random git repo refs that are not cached
17:34 <clarkb> I feel like this proves my point that it's a bad tool. You can use it in less bad ways but no one does even when I explicitly asked that we use it correctly...
17:36 <clarkb> fungi: fyi I suspect that precommit is also why the fixup change includes unrelated formatting updates
17:36 <clarkb> it is, but I'm not positive of that (there are a bunch of rules in the file for pre-commit-hooks)
17:37 <corvus> why would precommit need the ci caches... it's not run in the gate, right?
17:37 <clarkb> corvus: it is run in the gate
17:37 <opendevreview> Merged openstack/project-config master: Revise zuul-launcher dashboards  https://review.opendev.org/c/openstack/project-config/+/953973
17:37 <clarkb> corvus: the way openstack has configured pre-commit is that the pep8/linters targets now all run pre-commit if pre-commit is added to the repo
17:38 <clarkb> this way you don't end up with flake8 called directly failing a commit that was pre-commit checked locally due to a mismatch in versions or whatever
17:38 <corvus> https://governance.openstack.org/tc/reference/project-testing-interface.html
17:38 <corvus> pre-commit does not appear there
17:39 <clarkb> corvus: I think the rules in that document are loose enough: "every project must enforce code style"
17:39 <corvus> i feel like that would have been a really good place to talk about how to integrate pre-commit in a way that doesn't break testing
17:39 <clarkb> whether you use flake8 directly or pre-commit to call flake8 is fine under that rule
17:39 <clarkb> but yes the problem isn't pre-commit existing. The problem is no one configures pre-commit to use CI caches
17:40 <clarkb> even when I explicitly ask them to, apparently
17:40 <corvus> yeah, that doc used to mostly be about making sure tox was set up correctly
17:40 <corvus> so i feel like bypassing the dep installation of tox warrants a note :)
17:41 <frickler> jrosser reported some unusual cluster of job timeouts, like in https://zuul.opendev.org/t/openstack/buildset/c9c0373292f445a28033086f4ab4ff4c seems to be mostly on rax-ord/dfw, but I didn't look closer yet and won't get to that today. just as a note in case more such failures pop up
17:42 <clarkb> tl;dr you should have pre-commit install python packages by name and version; that way pip will look at pypi, which in our CI environment has caching
17:42 <corvus> clarkb: ++
17:43 <stephenfin> clarkb: I recall this coming up, but it's a matter of competing priorities. For me, I selfishly care far more about the dev ex win that pre-commit represents than I do about something abstract (to me) like caching
17:43 <clarkb> stephenfin: right but you don't run pre-commit 10k times a day
17:43 <corvus> i mean, you can have both
17:44 <clarkb> the CI system does and we should be better citizens of the Internet (though in the age of AI crawlers this seems like a drop in the bucket)
17:44 <clarkb> and ya no one is saying don't use pre-commit. Just asking that you not point at a repo and sha
17:44 <clarkb> which I know is the pre-commit default but aiui doesn't have to be run that way
17:45 <stephenfin> by doing local configuration, you lose the ability to auto-bump dependencies
17:45 <stephenfin> client-side caching too, iirc
17:45 <clarkb> stephenfin: you have to bump them in your tox config (which may be requirements.txt or whatever)
17:46 <stephenfin> and you need to copy the configuration from the upstream pre-commit hooks
17:46 <clarkb> this is a bug in pre-commit fwiw
17:46 <clarkb> it should be easier to use like this.
17:46 <stephenfin> I think the issue is that there's no way to get all the pre-commit context from the installed package
17:47 <stephenfin> that's a known file (.pre-commit-hooks.yaml) in the root of the repo, not something included in the package
17:47 <clarkb> I'm not sure I follow. flake8 installed from a git hash or from a pypi package should contain the exact same code as long as the commit hash matches that pypi package release
17:48 <clarkb> stephenfin: if there are files missing from the packages then that is a broken package
17:49 <clarkb> and that should be fixable (though you may need a new release)
17:50 <stephenfin> Alas, no. A .pre-commit-hooks.yaml file won't be included since it lives in the root of the repo, not in the package, so it's a data file (which is a deprecated thing) rather than a package data file
17:50 <clarkb> we're still able to include things like READMEs and AUTHORS files
17:50 <clarkb> why is this any different?
17:53 <stephenfin> the README is included because it's a blessed thing for package management tools. The AUTHORS is included because pbr (and only pbr, right?) has a soon-to-be-broken hook for that. Adding another file to dist-info will require a new hooking mechanism, which I can only assume would necessitate a PEP
17:53 <clarkb> as a sanity check I grabbed https://files.pythonhosted.org/packages/e7/7f/2143758ec2ed791b9fe506a4721fed680452291f7d8bfb39b397d9a86687/zuul-12.1.0.tar.gz and it contains Changelog and even the input MANIFEST.in
17:54 <clarkb> I guess what you're saying is setuptools is going to break our ability to properly package software further and we won't be able to do that in the future?
17:54 <clarkb> but as far as I can tell this does work today with existing packaging
17:55 <fungi> stephenfin: no, AUTHORS gets included in packages automatically by setuptools as a license-related file, auto-detected similar to Copying and LICENSE et cetera
17:55 <fungi> the fact that we generate it with pbr is orthogonal to that
17:56 <stephenfin> fungi: thanks, I wasn't sure about that. ChangeLog too, or is that one pbr only?
17:56 <fungi> setuptools is not, afaik, planning to break inclusion of license files
17:56 <clarkb> the .zuul.yaml is also in the sdist so I think this would work fine for pre-commit. At least with current setuptools and pbr
17:56 <fungi> again, ChangeLog is generated by pbr but its inclusion is a matter of being entered in the manifest
17:56 <stephenfin> clarkb: no. Nothing includes .pre-commit-hooks.yaml. You can verify that by pulling a package that includes one like hacking
17:56 <clarkb> I don't know enough of their future plans to know if that will break
17:56 <clarkb> stephenfin: right but you can, is my point
17:56 <clarkb> just like zuul includes .zuul.yaml in it
17:57 <clarkb> if that information is required for pre-commit to work then not having it in your package is a package bug imo
17:57 <fungi> perhaps one of the things i should have added to the pbr features etherpad is that it generates the dist manifest based on the results of git ls-files
17:57 <clarkb> not a fundamental reason to not use packages
17:57 <stephenfin> okay, I get your point now
17:57 <fungi> though setuptools also has its own manifest auto-generator these days that scans for python modules, but can be given a list of additional files to include
17:57 <clarkb> stephenfin: hacking includes the precommit file in its package
17:58 <clarkb> stephenfin: https://files.pythonhosted.org/packages/f7/19/cf7a61cb63288c226bf2fa012ddcda51e4baad3039dbb4fc4b4e1a2b8e16/hacking-7.0.0.tar.gz extract that and you'll see the file
17:58 <stephenfin> I was going to say no, it's not there, but forgot nautilus (or whatever the graphical file browser in Fedora is) doesn't show hidden files by default
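A graphical file browser can hide dotfiles, so it can be easier to check from code whether an sdist ships `.pre-commit-hooks.yaml`. A short sketch using only the standard library's `tarfile` module is below. The function name is illustrative. It could be pointed at a downloaded tarball such as the hacking-7.0.0.tar.gz linked above, e.g. `sdist_members_named("hacking-7.0.0.tar.gz")`.

```python
import tarfile


def sdist_members_named(sdist_path, filename=".pre-commit-hooks.yaml"):
    """List members of a .tar.gz sdist whose basename equals filename.

    Hidden dotfiles are listed like any other member, unlike in file
    browsers that hide them by default.
    """
    with tarfile.open(sdist_path, "r:gz") as tar:
        return [
            member.name
            for member in tar.getmembers()
            if member.name.rsplit("/", 1)[-1] == filename
        ]
```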
17:59 <fungi> stephenfin: anyway, in future it would be good to separate random format updating and comment typo corrections into their own patch, if you're going to auto-style files every time you touch them
17:59 <fungi> i didn't -1 for that, but it's distracting to reviewers
17:59 <clarkb> and yes it's likely that pre-commit doesn't have the necessary plumbing today to make all of that work. But it could; I don't think there is a fundamental reason that it wouldn't work and most of the pieces are there. It's just pre-commit itself missing the necessary bits
17:59 <stephenfin> err, which review are we talking about
18:00 <fungi> stephenfin: the pbr reviews i left nit comments/questions on
18:00 <clarkb> https://review.opendev.org/c/openstack/pbr/+/953892/3/pbr/tests/test_integration.py and https://review.opendev.org/c/openstack/pbr/+/953839/5/pbr/tests/test_core.py for example
18:01 <fungi> about sudden appearance of extra blank lines, reflowing function parameters that weren't being changed; there was also a mistyped word in a random code comment that i didn't point out but was seemingly unrelated to the patch
18:01 <fungi> maybe you're running some tool that's randomly altering files behind your back, and not checking the diff yourself so didn't notice?
18:03 <clarkb> fungi: we just got an email indicating the lists backups failed. These failures have occurred occasionally over the last week
18:03 <fungi> not a huge deal, but like i said distracting when it's unrelated to and in some cases distantly removed from your actual intended edits in the file
18:03 <clarkb> fungi: I assume (but haven't checked) that this is going to be load related to the crawling stuff
18:03 <clarkb> I guess I should look at the logs and see if we have more rules to add to the UA filter
18:03 <fungi> clarkb: possible, that could easily lead to timeouts
18:04 <fungi> the ua filter change i started earlier in the week just includes the one ua you pointed out, i haven't looked at the logs to see if there are others
18:04 <fungi> also it's still open and can be amended
18:04 <clarkb> fungi: ya or affecting timing in such a way that it overlaps with tasks on the backup server side like backup validation that I think cause new backups to error if they happen concurrently
18:05 <stephenfin> fungi: no, that's me alright. I'm relying on Gerrit highlighting significant changes differently to newlines/rewraps
18:06 <stephenfin> others like https://review.opendev.org/c/openstack/pbr/+/953839/5/pbr/tests/test_integration.py were to make code more comprehensible. I can't drag that out into a precursor patch like I normally would since the gate is currently broken
18:06 <fungi> stephenfin: makes sense, thanks
18:08 <fungi> clarkb: right, in the past what we saw was colliding backup runs from the two separate servers
18:08 <clarkb> fungi: `cut -d' ' -f 12- lists.opendev.org-ssl-access.log | sort | uniq -c | sort` says chatgpt and claude are the worst offenders
18:08 <fungi> usually related to long backup times causing them to overlap when they normally wouldn't
18:09 <clarkb> fungi: /pipermail is the old url compatibility shim right?
18:09 <clarkb> I should maybe grep -v pipermail and see what things look like then
18:09 <stephenfin> clarkb: as for pre-commit, I need time to think about that more. I can surely wrangle a solution, but the biggest issue is that the pre-commit author is one of the least pleasant people I've had the misfortune of working with and I actively avoid contributing to either that or flake8 nowadays
18:09 <stephenfin> if only astral would come up with their own variant of that too...
18:10 <clarkb> oof
18:10 <corvus> i think all the zuul-launcher anomalies i have investigated can be explained by the recently fixed bug; so i'm going to disregard past issues and just look for new weirdness.
18:10 <clarkb> and ya for something like PBR it's relatively minor due to the lack of activity. I'm more just frustrated that it's been years of trying to get people to be more cautious with pre-commit to make it CI friendly but that is never anyone else's priority
18:10 <stephenfin> least pleasant might be a big harsh: least agreeable is perhaps more apt :)
18:11 <stephenfin> *bit
18:11 <fungi> /pipermail is the copy of the old mm2 archives, which are redundant (but not able to be automatically mapped/redirected from their mm3 counterparts) in some cases, though uniquely archival for lists that were retired before we migrated. i guess they ignore the crawl-delay we set in robots.txt?
18:11 <clarkb> stephenfin: fwiw I can think of other tools that suffer similar problems. Docker comes immediately to mind, where they made the image protocol annoying to cache and now they are enforcing strict request limits on everyone
18:11 <clarkb> fungi: I haven't checked timestamps to see if they honor the crawl delay
18:12 <clarkb> fungi: but a lot of requests are to pipermail for chatgpt at least
18:12 <fungi> well, we set it to 2 so i wouldn't expect a ton of requests at that speed
18:12 <clarkb> `grep -v '/pipermail/' lists.opendev.org-ssl-access.log | cut -d' ' -f 12- | sort | uniq -c | sort` gives a very different result
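The `cut -d' ' -f 12- ... | uniq -c` pipelines above tally requests by everything from the twelfth space-separated field onward. In Apache's combined log format that is the quoted User-Agent, assuming the referer contains no spaces; the actual LogFormat on the lists server is not shown in this log. Below is a rough Python equivalent that takes the `/pipermail/` exclusion as an option.

The trailing `sort` in the shell version sorts lexically. `most_common()` orders by count, as `sort -n` would.

```python
from collections import Counter


def count_user_agents(lines, exclude_substring=None):
    """Count requests per User-Agent in Apache combined-format log lines.

    Mirrors `cut -d' ' -f 12- | sort | uniq -c`: fields from the 12th
    onward (1-based) are treated as the User-Agent. Lines containing
    exclude_substring (e.g. "/pipermail/") are skipped, like `grep -v`.
    """
    counts = Counter()
    for line in lines:
        if exclude_substring and exclude_substring in line:
            continue
        fields = line.rstrip("\n").split(" ")
        if len(fields) < 12:
            continue  # malformed or truncated line
        counts[" ".join(fields[11:])] += 1
    return counts.most_common()
```

As fungi points out just below, the counts only reflect what clients claim to be, since a User-Agent header can be spoofed.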
18:12 <fungi> at crawl-delay: 2 the bot should top out around 43200/day
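fungi's ceiling is the number of seconds in a day divided by the delay: 86400 / 2 = 43200 requests per day for a crawler that honors `Crawl-delay: 2`. Crawl-delay is a non-standard robots.txt extension (RFC 9309 does not define it), so crawlers are free to ignore it.

Checking whether a given UA actually honors it is the timestamp check clarkb mentions not having done yet. It could be sketched as follows. The function names are hypothetical, and the per-UA request timestamps would have to be extracted from the access log separately.

```python
from datetime import datetime, timedelta

SECONDS_PER_DAY = 24 * 60 * 60


def max_requests_per_day(crawl_delay_seconds):
    """Upper bound on daily requests for a crawler honoring Crawl-delay."""
    return SECONDS_PER_DAY // crawl_delay_seconds


def delay_violations(timestamps, crawl_delay_seconds):
    """Count consecutive request pairs that are closer together than the delay."""
    ordered = sorted(timestamps)
    limit = timedelta(seconds=crawl_delay_seconds)
    return sum(
        1 for earlier, later in zip(ordered, ordered[1:]) if later - earlier < limit
    )
```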
18:13 <fungi> but also, just because a ua claims to be claude or chat-gpt, doesn't mean it actually is
18:13 <clarkb> yes but once I ignore pipermail I think the picture is more clear
18:13 <clarkb> serving pipermail is basically free compared to the django stuff
18:14 <fungi> right, it's direct file handoff for the most part, while django is database queries
18:14 <opendevreview> Merged opendev/zuul-providers master: Drop niz- label prefix from nodesets  https://review.opendev.org/c/opendev/zuul-providers/+/953835
18:15 <opendevreview> Merged opendev/zuul-providers master: Remove "normal" labels, etc  https://review.opendev.org/c/opendev/zuul-providers/+/953836
18:16 <opendevreview> Merged opendev/zuul-providers master: Remove gentoo-17 nodeset  https://review.opendev.org/c/opendev/zuul-providers/+/952723
18:17 <opendevreview> Merged opendev/zuul-providers master: Remove ubuntu-xenial nodeset  https://review.opendev.org/c/opendev/zuul-providers/+/952726
18:18 <corvus> clarkb: ^ you highlighted that there is a possibility of fallout from that (but not expected)
18:19 <clarkb> corvus: ya and it would be in places like system-config
18:19 <clarkb> so infra-root keep your eyes open and let us know if you see something
18:21 <stephenfin> clarkb: https://review.opendev.org/c/openstack/pbr/+/953982 I've kept it separate for now lest it fail in CI
18:22 <clarkb> stephenfin: ack that works for me. Probably easier to whittle down what needs further fixing that way too
18:22 <stephenfin> that's my thinking, yes
18:22 <stephenfin> (start from a stable base rather than house on fire)
18:22 <clarkb> I +2d but didn't approve as I figured fungi may want to weigh in on the nits
18:23 <clarkb> fungi: but feel free to approve the stack now if you're happy
18:23 <fungi> looking
18:26 <fungi> i'll give it a bit to see how tests are doing with that change before i approve the whole lot
18:28 <opendevreview> James E. Blair proposed zuul/zuul-jobs master: Remove no_log for image upload tasks  https://review.opendev.org/c/zuul/zuul-jobs/+/953983
18:29 <opendevreview> James E. Blair proposed opendev/zuul-providers master: Switch to zuul-jobs upload-image-swift  https://review.opendev.org/c/opendev/zuul-providers/+/951018
18:48 <opendevreview> Clark Boylan proposed opendev/system-config master: Add more UA filters  https://review.opendev.org/c/opendev/system-config/+/953904
18:49 <clarkb> fungi: ^ that is an updated set of rules. In some cases I found we already had regex type rules that I expanded. In others I just went more verbatim because it is easy
18:51 <clarkb> for testing of that change our gitea testinfra requests go to port 3081, which should be the apache there, which should check that we don't completely break apache with those rules, which is nice
18:51 <clarkb> something we should be aware of as those rules are applied pretty broadly these days. If we break apache with a bad rule then we'll have widespread sadness
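The shared Apache UA filter is included by many services, so one invalid pattern could break all of them. CI catches this through the gitea testinfra requests to port 3081. A cheap pre-check can also compile every pattern and exercise it against a few sample User-Agents before pushing.

The Python sketch below uses hypothetical patterns; the real rules in change 953904 are not reproduced here. Python's `re` syntax is close to the PCRE dialect Apache uses, but not identical, so this supplements the CI test rather than replacing it.

```python
import re

# Hypothetical patterns for illustration; not the actual opendev rules.
UA_PATTERNS = [
    r"GPTBot",
    r"ClaudeBot",
    r"Bytespider",
]


def compile_patterns(patterns):
    """Compile every pattern, failing loudly on any invalid regex."""
    compiled = []
    for pattern in patterns:
        try:
            compiled.append(re.compile(pattern))
        except re.error as exc:
            raise ValueError(f"invalid UA pattern {pattern!r}: {exc}") from exc
    return compiled


def is_blocked(user_agent, compiled):
    """True if any compiled pattern matches anywhere in the User-Agent."""
    return any(p.search(user_agent) for p in compiled)
```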
19:01 <clarkb> in addition to that change, https://review.opendev.org/c/opendev/system-config/+/953848 and https://review.opendev.org/c/opendev/system-config/+/953846 are straightforward simple improvements to zuul and zookeeper docker compose
19:01 <clarkb> and now I'm finding lunch
20:05 <opendevreview> Merged opendev/system-config master: Cleanup zookeeper config management  https://review.opendev.org/c/opendev/system-config/+/953846
20:17 <opendevreview> Merged opendev/system-config master: Remove docker compose version from zuul services  https://review.opendev.org/c/opendev/system-config/+/953848
20:24 <clarkb> that first change made the trailing space correction to the inventory so it's running all the infra prod jobs
20:24 <clarkb> I'm keeping an eye on it
20:30 <clarkb> corvus: the deployment job for grafana against 953846 (so an unrelated change) failed with Exception: Duplicate dashboard found in '/grafana/zuul-launcher-ovh.yaml: 'Zuul Launcher: OVH' already defined
20:30 <clarkb> corvus: I'm wondering if we need to do manual cleanup for the existing launcher dashboards as it can't resolve the deltas for some reason?
20:31 <clarkb> it does look like the dashboard I see currently on grafana is the old version (no memory, cores, etc graphs)
20:31 <fungi> the discussion in #openstack-nova about why glean is still needed may be relevant to the interests of some in here
20:32 <clarkb> is it ongoing or should I just look at logs?
20:32 <fungi> (cropped up earlier based on discussion about nova long-term plans to get rid of configdrive)
20:32 <fungi> it's in progress
20:33 <fungi> semi-synchronous due to tz differences between conversants
20:33 <fungi> so it stretches back to early this morning western time
20:44 <clarkb> the zookeeper docker-compose.yaml update did end up doing a rolling restart of the cluster fwiw
20:44 <clarkb> but all seems well from what I can see here
20:47 <clarkb> looking at ansible logs that may be because we changed the docker-compose.yaml config (even in a noop way) which caused it to repull images?
20:47 <clarkb> but ya we went server by server and the cluster seems happy so I think we're ok
20:50 <clarkb> and remote puppet else failed with Execution of '/usr/bin/git fetch origin' returned 128: fatal: unable to access 'https://opendev.org/openstack/project-config/': GnuTLS recv error (-54): Error in the pull function. which is suspiciously like our image build errors
20:50 <clarkb> I wonder if the gitea upgrade may be contributing to this...
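The GnuTLS recv error on `git fetch` looks transient. The diskimage-builder change earlier in this log ("Retry git clone/fetch on timeout") retries on similar failures, but its implementation is not shown here. What follows is an independent Python sketch of retrying a git command with exponential backoff; the attempt count and delays are arbitrary assumptions.

Retries only paper over the symptom. They would not address a server-side cause such as the gitea upgrade clarkb suspects.

```python
import subprocess
import time


def run_with_retries(cmd, attempts=3, base_delay=2.0,
                     runner=subprocess.run, sleep=time.sleep):
    """Run cmd, retrying on a non-zero exit with exponential backoff.

    Returns the last CompletedProcess; callers decide how to handle a
    final failure. runner and sleep are injectable for testing.
    """
    result = None
    for attempt in range(attempts):
        result = runner(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result
        if attempt < attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return result
```

Typical use would be `run_with_retries(["git", "fetch", "origin"])`, run inside the repository.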
20:58 <opendevreview> Clark Boylan proposed opendev/system-config master: Add apache-ua-filter file path matches where used  https://review.opendev.org/c/opendev/system-config/+/953993
20:59 <clarkb> fungi: corvus ^ you reviewed the UA filter change. I noticed ^ when checking on the progress of the gating. Basically I don't think we'll auto deploy those updates to the zuul and mailman services currently
21:16 <clarkb> hrm that change didn't actually run the jobs I expected it to
21:16 <clarkb> I'll make a noop change to the ua filters in it so that it actually drives updates to all those services when it lands
21:17 <opendevreview> Clark Boylan proposed opendev/system-config master: Add apache-ua-filter file path matches where used  https://review.opendev.org/c/opendev/system-config/+/953993
21:17 <clarkb> fungi: ^ sorry I noticed that after you reviewed it
21:17 <fungi> np
21:19 <clarkb> also I really like that git-review (gerrit really) tells you what the label removals are when you push
21:19 <clarkb> makes it easy to catch things like this
21:21 <clarkb> corvus: I think I've learned something about the zk average latency values. It's the latency to clients not within the cluster. zk03 is currently at 0 because it has no connections (it was restarted last in the rolling restart just a bit ago so all connections went to zk02 or zk01)
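Per clarkb's observation, an average latency of 0 on zk03 only reflects that it has no client connections. So the connection count is worth reading alongside the latency. Below is a minimal parser for the output of ZooKeeper's `mntr` four-letter-word command, which returns tab-separated key/value lines including `zk_avg_latency` and `zk_num_alive_connections`.

It assumes the text has already been fetched, for example with `echo mntr | nc <host> 2181` on a server whose `4lw.commands.whitelist` allows `mntr`. This log does not say whether opendev's graphs are fed from `mntr`.

```python
def parse_mntr(text):
    """Parse ZooKeeper `mntr` output into a dict; numeric values are cast."""
    stats = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition("\t")
        for cast in (int, float):
            try:
                stats[key] = cast(value)
                break
            except ValueError:
                continue
        else:
            # Not numeric (e.g. zk_version, zk_server_state): keep the string.
            stats[key] = value
    return stats


def latency_is_meaningful(stats):
    """An average latency with zero client connections says little."""
    return stats.get("zk_num_alive_connections", 0) > 0
```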
21:29 <opendevreview> Merged opendev/system-config master: Add more UA filters  https://review.opendev.org/c/opendev/system-config/+/953904
21:57 <corvus> clarkb: yeah, i was afraid we'd need to delete the dashboards.... hopefully we have an admin user that can do that.
21:57 <corvus> i'll try to get to it in a bit
22:09 <clarkb> corvus: I seem to recall there is a token we can use like etherpad used to do
22:10 <clarkb> corvus: ya looks like there are the bits needed for admin api access in /etc/grafana/secrets
22:13 <clarkb> corvus: if you get a moment for https://review.opendev.org/c/opendev/system-config/+/953993 that should actually apply the new UA filters to the mailman server (or we'll wait for periodic this evening)
22:13 <clarkb> I can still access gitea so it seems to work in general
22:14 <corvus> +3
22:30 <corvus> i deleted the zuul-launcher dashboards
22:31 <corvus> i logged in as the admin user through the web ui normally, which worked, but deleting the dashboards failed with a 403 origin not allowed
22:31 <corvus> so i set up an ssh port forward for 3000:localhost:3000 and repeated it that way, and it worked
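The same deletion can be scripted against Grafana's dashboard HTTP API instead of the web UI, e.g. through the `ssh -L 3000:localhost:3000` forward corvus describes. The sketch assumes the `DELETE /api/dashboards/uid/<uid>` endpoint and a bearer token, so check both against the Grafana version in use. The UID and token are placeholders. This log does not say how the credentials in /etc/grafana/secrets are formatted.

```python
import urllib.request
from urllib.parse import quote


def build_dashboard_delete(base_url, uid, token):
    """Build (but do not send) a request deleting a dashboard by UID."""
    safe_uid = quote(uid, safe="")
    url = f"{base_url.rstrip('/')}/api/dashboards/uid/{safe_uid}"
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={"Authorization": f"Bearer {token}"},
    )


def send(request, timeout=10):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Through the port forward, that would be something like `send(build_dashboard_delete("http://localhost:3000", "<dashboard-uid>", "<admin-token>"))`.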
22:32 <corvus> i have re-enqueued the deploy buildset
22:33 <clarkb> oh right we set up a limitation for admin access. I can't remember why but I seem to recall doing that
22:40 <clarkb> corvus: it failed again. i think the hint as to why is earlier in the log: DEBUG:grafana_dashboards.cache:Using cache: /root/.cache/grafyaml/cache.dbm
22:40 <clarkb> corvus: I think it may be complaining that the cache.dbm database has the duplicate and not grafana itself?
22:43 <clarkb> hrm i don't see a /root/.cache/grafyaml/
22:45 <clarkb> oh that is from the container perspective
22:45 <clarkb> maybe it lives elsewhere
22:47 <clarkb> hrm we don't seem to bind mount anything to that path. So maybe it is building that db and finding stale content within grafana somewhere?
22:54 <clarkb> I'm not sure I understand why this is happening
22:54 <corvus> looks like /opt/project-config/grafana has yaml and json files
22:55 <corvus> that dir gets bind-mounted into a temporary container to run grafyaml
22:55 <corvus> i don't understand why it has both
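The duplicate-dashboard failure traces back to a stale `.yaml` and a new `.json` definition of the same dashboard both sitting in the synced directory. grafyaml's error names the duplicated dashboard title. A cheaper, hypothetical pre-check is to flag file stems that appear with more than one extension.

This is only a heuristic. Two differently named files could still define the same title, and only parsing the dashboards would catch that.

```python
from collections import defaultdict
from pathlib import Path

DASHBOARD_SUFFIXES = {".yaml", ".yml", ".json"}


def conflicting_dashboard_files(directory):
    """Map each file stem to its file names when it exists in several formats."""
    by_stem = defaultdict(list)
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.suffix in DASHBOARD_SUFFIXES:
            by_stem[path.stem].append(path.name)
    return {stem: names for stem, names in by_stem.items() if len(names) > 1}
```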
22:55 <clarkb> hrm I think that comes from project-config syncing?
22:55 <corvus> i think we rsync it?
22:56 <corvus> do we check it out on bridge then rsync it to grafana02?
22:56 <clarkb> that sounds right. Looking on bridge there are only json files for zuul-launcher
22:56 <corvus> do we need to add an extra argument to sync to delete?
22:57 <corvus> https://docs.ansible.com/ansible/latest/collections/ansible/posix/synchronize_module.html#parameter-delete
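This is why the stale files survived. A sync without deletion copies new and changed files from bridge but never removes files that have disappeared upstream. The `delete` parameter of Ansible's `synchronize` module (rsync `--delete`) adds that removal step. The small Python model below illustrates the two behaviours. It handles files only and is not how the sync-project-config role or rsync are actually implemented.

```python
import shutil
from pathlib import Path


def sync_tree(src, dst, delete=False):
    """Copy files from src into dst; with delete=True also prune extra files.

    Returns the sorted relative paths (POSIX style) removed from dst.
    Empty directories left behind are not pruned in this simplified model.
    """
    src, dst = Path(src), Path(dst)
    wanted = set()
    for path in src.rglob("*"):
        if path.is_file():
            rel = path.relative_to(src)
            wanted.add(rel)
            target = dst / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)
    removed = []
    if delete:
        # sorted() materializes the listing before anything is unlinked.
        for path in sorted(dst.rglob("*"), reverse=True):
            rel = path.relative_to(dst)
            if path.is_file() and rel not in wanted:
                path.unlink()
                removed.append(rel.as_posix())
    return sorted(removed)
```

Without `delete`, an old zuul-launcher `.yaml` stays next to its `.json` replacement on the destination. With it, the stale copy is removed. The same removal would apply to anything else in the synced tree, such as the gerrit ACL files discussed next.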
22:57 <clarkb> I'm trying to find where role sync-project-config lives
22:57 <corvus> that's used in a number of places...
22:57 <corvus> https://paste.opendev.org/show/boOAIOkIsHjDJfYvPPUz/
22:57 <corvus> it's in system-config
22:58 <corvus> do we think it's okay to set that for all of them?
22:58 <clarkb> the synchronize does indeed only set a source and a destination
22:59 <clarkb> corvus: I think the main one I would be worried about is gerrit
22:59 <corvus> why's that?
23:00 <clarkb> Because jeepyb is reading from files in there (potentially, maybe we copy out acls and project config?) but acls in particular I could see there being a problem if we only have an acl file because we haven't deleted it
23:00 <corvus> that sounds like this would still correct a bug, no?
23:00 <clarkb> whereas things like eavesdrop are reading gerritbot config and accessbot configs and are less likely to have the "file missing oops" problem
23:01 <corvus> also, we would accidentally correct this situation every time we deploy a new gerrit server, so if we are relying on zombie acl files, the problem would only extend to there, right?
23:01 <clarkb> corvus: well yes, but it may break creation of new projects? Though I think jeepyb will try to create all projects and continue along then report success/failure rather than short circuiting
23:01 <opendevreview> James E. Blair proposed opendev/system-config master: Update sync-project-config to delete  https://review.opendev.org/c/opendev/system-config/+/953999
23:02 <clarkb> all of the other systems grab data out of project-config in ways that I think are less prone to errors like this
23:02 <clarkb> because they look at singular files
23:02 <corvus> yeah... to me it sounds like a bandaid we should rip off... i think it's unlikely to be a problem
23:02 <clarkb> but yes I think booting the new review03 server is an indication that this is also unlikely to be a problem for gerrit since that was semi recent
23:02 <corvus> should be able to check by listing the files on the gerrit server
23:03 <clarkb> and yes fixing it seems ideal. I'm just thinking through where/what the fallout could be
23:03 <corvus> i'm going to manually delete the files on grafana so we don't have to rush that change.
23:03 <clarkb> ack
23:03 <corvus> deleted and re-enqueued again
23:05 <clarkb> I +2'd and left notes summarizing the above discussion
23:05 <clarkb> I think this is a correct change. Just one to think about carefully and monitor when landing
23:07 <clarkb> corvus: thinking out loud, 953999 isn't running the grafana job or gerrit jobs. I think due to the way we set up a fake bridge and all of that the change would be self testing if we did so (at least in some capacity)
23:08 <clarkb> corvus: maybe we should update job trigger file matchers as part of that change and get a bit of test coverage that way?
23:09 *** prometheanfire is now known as Guest21259
23:09 <corvus> want to basically add these? https://paste.opendev.org/show/boOAIOkIsHjDJfYvPPUz/
23:12 <clarkb> corvus: ya though I think run-accessbot and service-eavesdrop are both covered by the eavesdrop job
23:12 <clarkb> so we'd add playbooks/roles/sync-project-config to the gerrit jobs (there are three), zuul, nodepool, grafana, and eavesdrop?
23:12 <corvus> just for the test jobs or the deploy ones as well?
23:13 <clarkb> I feel like adding it for both is probably more complete? that will ensure we sync the deletions when we land the change rather than waiting for hourlies or dailies
23:13 <opendevreview> Merged opendev/system-config master: Add apache-ua-filter file path matches where used  https://review.opendev.org/c/opendev/system-config/+/953993
23:13 <clarkb> might make monitoring the updates easier
23:13 <clarkb> the new graphs are present on the server now
23:14 *** Guest21259 is now known as prometheanfire
23:15 <opendevreview> James E. Blair proposed opendev/system-config master: Update sync-project-config to delete  https://review.opendev.org/c/opendev/system-config/+/953999
23:15 <corvus> would be cool if we did static analysis on the playbooks to generate the file list
23:18 <corvus> the gauges are not quite right; our grafana may be too old
23:18 <clarkb> corvus: two comments about jobs getting the matchers. I think there is a mismatch in each of the updated files
23:19 <clarkb> re gauges and too old grafana I think we can upgrade grafana these days since we caught up semi recently?
23:19 <clarkb> I can probably look into that more closely if you think it would be helpful
23:20 <clarkb> I think lists got its apache config reloaded at 23:17 UTC
23:21 <fungi> load average on the server is reasonable, but looks like it was not under significant load before that either
23:22 <clarkb> previously the biggest cpu hog was mariadb iirc
23:23 <clarkb> mariadb is still up there but it's using less now (it was a full cpu before)
23:23 <clarkb> probably easiest to check in tomorrow looking at requests for the next day and see if they're more reasonable. Also two of the 5 apache processes are still old
23:24 <clarkb> oh those have just disappeared now
23:26 <opendevreview> James E. Blair proposed opendev/system-config master: Upgrade grafana to 12.0.2  https://review.opendev.org/c/opendev/system-config/+/954000
23:28 <opendevreview> James E. Blair proposed opendev/system-config master: Update sync-project-config to delete  https://review.opendev.org/c/opendev/system-config/+/953999
23:28 <corvus> okay i think that's got all the things
23:32 <clarkb> ya that looks better. Then for grafana we just want to confirm we get graphs out of testing as I think the service itself is fairly stateless
