Thursday, 2025-07-10

corvuslauncher behavior with the periodic jobs looks ok02:35
corvuslooks like flex is operating again13:32
hasharfungi: thank you for the release of git-review 2.5.0 :]13:40
fungihashar: you're welcome!13:51
fungicorvus: i got a ticket in my personal rackspace account yesterday about network maintenance for flex sjc3, but it's not until wednesday next week13:54
fungii didn't see any notices for yesterday13:54
fungiit does look like they've started listing flex on https://rackspace.service-now.com/system_status but the history there is all green back through the weekend13:57
fungilooks like they're listing a new (iad3 maybe?) pop too13:58
fungiclarkb: corvus: yeah, i think it's fine to do the at-rest checksum for now, odds are it won't be that much extra time and depending on whether i/o or cpu are the bottleneck we could maybe look at something parallelizable e.g. simd acceleration or something14:05
fungibut first let's see how the simple implementation works out in practice14:05
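The "simple implementation" here is presumably a streaming checksum over the stored file; a minimal sketch of that pattern, with hashlib standing in for whatever zuul actually uses (function name and chunk size are illustrative):

    import hashlib

    def file_checksum(path, chunk_size=64 * 1024):
        # stream the file in chunks so memory use stays flat regardless of
        # size; whether i/o or the hashing cpu cost dominates is what would
        # decide if a parallel/simd approach is worth pursuing
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            while chunk := f.read(chunk_size):
                digest.update(chunk)
        return digest.hexdigest()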
fungialso the change failed in the gate14:05
funginox-py311 and 312 both hitting timeouts in different tests (test_launcher_image_expire and test_launcher_crashed_upload)14:08
corvusyeah, i expect more tests need updating for the new fixtures; it's #2 on my list14:10
fungii'm going to disappear in a bit to run an errand and grab lunch while i'm out, probably mia 14:30-16:00 utc or thereabouts14:12
corvusnodepool has seen a request for 'openEuler-22-03-LTS' which is an image not handled by zuul-launcher.  how should we proceed?14:17
fungiwhat was the requesting project?14:22
fungisomething cfn/.* presumably?14:22
corvus2025-07-10 13:33:08,601 DEBUG zuul.Pipeline.openstack.check: [e: a528ca3243c040c499eb48fae80de896] Build <Build 919aec1210f54070a577c8e40ce12b95 of devstack-platform-openEuler-22.03-ovn-source voting:False> of <QueueItem d40c654da20f4076a285ed78ee23ced7 live for [<Change 0x7f8520131290 openstack/devstack 954606,1>] in check> completed14:24
corvusi'm guessing that one14:24
corvushttps://zuul.opendev.org/t/openstack/job/devstack-platform-openEuler-22.03-ovn-source14:25
corvusyep14:25
fungiaha, okay i'll let the openstack qa folks know, but since it's non-voting i bet it's been broken for a long time14:25
corvusthat has definitions on stable/2024.1 and unmaintained/2023.114:25
corvusit actually has some recent successes14:26
fungiah, so already removed in later branches anyway14:26
corvusyep that's what it looks like14:27
corvusso i think the message should be something like "this is running on stable/2024.1 and unmaintained/2023.1 but it's about to stop working because no one is maintaining the openeuler image in opendev, so if it's important to keep running, talk to #opendev about volunteering to maintain that image or if it's not important, you may want to go ahead and remove the job"14:29
corvusthen my next question is, how long do we leave nodepool running for that?14:30
fungiyeah, i actually asked them in #openstack-qa if there would be objections to me pushing a couple of patches to just remove it from the pipelines14:30
fungii'll do that after i get back from lunch14:30
fungianyway, headed out now14:30
corvuscool. also, in that message we sent out a couple months ago, we told everyone that openeuler was going to be removed anyway14:31
corvusmaybe we can shut down nodepool later today then.  also, in our email, we said that people would start getting node_errors for unsupported images "tentatively july 1"14:32
Clark[m]I'm surprised the devstack job can pass but maybe they updated it to not use our mirror14:32
corvuslooking at the current behavior, i think there's still a tweak we need to do to avoid accepting requests too early: https://review.opendev.org/95461814:39
clarkbcorvus: +2 from me14:50
corvusclarkb: what was the blocker for upgrading grafana? https://review.opendev.org/95400014:58
clarkbcorvus: oh I meant to followup on that. The screenshots have a lot of empty graphs. I wanted to hold a node and see if that is a selenium timing thing or an actual problem with the upgrade14:59
clarkbhttps://e36203e60051d918bd96-b4b1a7d89013756684de846d3b70c9e9.ssl.cf2.rackcdn.com/openstack/946744e8dd314260a4b13c16f57f71b6/bridge99.opendev.org/screenshots/zuul-status.png for example14:59
clarkbI'll work on that now15:00
corvusah.  i'm looking at the graphs, and i think it's a selenium thing15:00
corvuslike, grafana deciding that the other graphs aren't in view, so it doesn't render them15:00
clarkbya I think that is likely as I seem to recall grafana puts an annotation on the graphs that are broken but it should be quick to double check15:01
opendevreviewClark Boylan proposed opendev/system-config master: DNM Force failure in grafana testing to hold a node  https://review.opendev.org/c/opendev/system-config/+/95462415:04
clarkbok a hold is in place for that now so as soon as grafana is running we should be able to load that service locally and check the graphs directly15:07
clarkbcorvus: https://158.69.72.204/d/21a6e53ea4/zuul-status?orgId=1&from=now-6h&to=now&timezone=utc the graphs look good to me. I think you're right and we need to scroll to the bottom of the page to get the screenshots to work properly15:27
clarkbI've +2'd the upgrade change. Maybe see if fungi or frickler want to review it in the next little bit and approve otherwise?15:29
opendevreviewClark Boylan proposed opendev/system-config master: Scroll grafana pages to force all graphs to load  https://review.opendev.org/c/opendev/system-config/+/95462415:36
clarkblet's see if that fixes the screenshots in the job15:36
fungilooks like our root alias did receive network maintenance ticket notifications from rackspace too16:22
fungipresumably similar to the one i got at my personal account, though i haven't confirmed yet16:23
clarkbha I think the scroll script worked but not completely. https://efb4199e6af2b2032769-ac03866ed76044a5727f648521330347.ssl.cf5.rackcdn.com/openstack/17978b012dfd4dedb0b858a89ba9118e/bridge99.opendev.org/screenshots/zuul-status.png16:24
clarkbit's changing the window location, not scrolling past each graph as it goes. So the top and bottom render and the middle does not. I'll have to see about doing a proper 100 pixel at a time scroll or something16:25
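A minimal sketch of the stepwise scroll being described, assuming the screenshots are driven by selenium (function name, step size, and pause are illustrative):

    import time

    def scroll_page_stepwise(driver, step=100, pause=0.1):
        # grafana lazily renders panels as they enter the viewport, so walk
        # the page in small increments rather than jumping to an anchor,
        # which leaves the middle panels unrendered
        height = driver.execute_script("return document.body.scrollHeight")
        for y in range(0, height + step, step):
            driver.execute_script("window.scrollTo(0, arguments[0]);", y)
            time.sleep(pause)  # give each panel a moment to start rendering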
fungimakes sense, but also i wish browser-based applications like grafana had a get var you could set to just render everything immediately16:26
clarkbya that would be nice16:26
fungi?imnotreallyabrowser=true16:26
corvushttps://github.com/grafana/grafana/issues/17257#issuecomment-273607020616:28
corvusapparently we can set that on the dashboard, but that would affect real users too16:28
corvus(maybe we want that?  i dunno)16:29
fungiimplemented as of a few months ago i guess16:29
fungihttps://github.com/grafana/grafana/issues/105656 seems to be a followup16:30
stephenfinclarkb: sorry, I forgot to open HexChat today. The pbr series should be green now. I had (a) included mock in five.py despite it not being a runtime dependency and (b) not included __future__.absolute_import in the _compat modules, causing them to pick up the wrong packaging. Both addressed16:31
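The __future__ issue described above is the classic Python 2 implicit relative import pitfall; a hedged illustration (the module layout here is hypothetical, not pbr's actual file):

    # e.g. pbr/_compat/something.py: without the __future__ import, Python 2
    # resolves "import packaging" relative to the current package first, so a
    # sibling module named "packaging" would shadow the PyPI packaging
    # library ("pick up the wrong packaging")
    from __future__ import absolute_import

    import packaging.version  # now always the top-level installed library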
stephenfinfungi: clarkb: Also, I know you use nox for zuul and other opendev projects now, but https://github.com/tox-dev/tox/pull/3556 just merged. We probably want to migrate to that everywhere once it's released16:32
fungilooks like the site-wide grafana toggle was added as far back as 11.2.016:32
fungistephenfin: very cool!16:33
stephenfinin fact, it's probably something worth bot proposing, assuming the constraints update scripts are flexible enough to do that. I'll see if elodilles or tonyb have any ideas tomorrow16:34
fungithough in zuul jobs with tox-siblings we separately preinstall the dependencies i think so that we can also install some deps from git sources when used in cross-repository testing scenarios16:34
fungibut for local testing it sounds like a real win16:35
fungihttps://opendev.org/zuul/zuul-jobs/src/commit/8110acc/roles/tox/library/tox_install_sibling_packages.py does the heavy lifting there16:37
* stephenfin looks16:37
fungiit's possible it just magically works the way we're invoking things there too16:38
fungibut the takeaway is that it's not necessarily *just* the project being tested that needs to be excluded from constraints16:39
Clark[m]stephenfin: I see you rechecked the compat change. It failed building a wheel for PBR but that was as far as I got in understanding the failure 16:39
Clark[m]If it fails again on recheck I guess we need to dig deeper. But if tests for child changes were happy it probably was just a fluke16:40
stephenfinthat's my assumption, based on the stack above it16:40
opendevreviewMerged opendev/system-config master: Upgrade grafana to 12.0.2  https://review.opendev.org/c/opendev/system-config/+/95400016:43
Clark[m]fungi: stephenfin: I think sibling installs work by first running tox with no tests to install everything according to the tox rules. Then it does a package listing and any that match the siblings list get installed from source. Then it runs tox without the install step to run the tests16:45
stephenfinfungi: I think those things are orthogonal? If I understand that correctly, we let tox create the venv, then overwrite what's installed?16:45
stephenfinjinx16:45
Clark[m]So that should work regardless of constraints in tox. In fact most Openstack tox configs already apply constraints just manually in an overridden install command16:45
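A rough sketch of the sibling-install flow Clark describes (this is just the shape of it, not the zuul-jobs module linked earlier; names are illustrative):

    import subprocess

    def install_siblings(envdir, siblings):
        """siblings maps distribution name -> local git checkout path."""
        pip = envdir + "/bin/pip"
        # see what tox already installed into the venv (constraints applied)
        frozen = subprocess.check_output([pip, "freeze"], text=True)
        installed = {line.split("==")[0].lower()
                     for line in frozen.splitlines() if "==" in line}
        for name, checkout in siblings.items():
            if name.lower() in installed:
                # overwrite just the matching packages from their git sources
                subprocess.check_call([pip, "install", "--no-deps", checkout])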
stephenfinThis constraints feature lets us drop the 'deps' section entirely in most cases, and instead rely on tox pulling and installing package dependencies for us, with '[testenv] extras' to ensure we install test or doc dependencies16:46
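The exact option added by that tox PR isn't quoted in the log, but tox already has a constrain_package_deps setting that gives a flavor of the approach; an illustrative tox.ini only:

    [testenv]
    # no hand-maintained deps list: the package's own dependencies plus its
    # "test" extra are installed, constrained by the file(s) given via -c
    extras = test
    deps = -c{env:TOX_CONSTRAINTS_FILE:https://releases.openstack.org/constraints/upper/master}
    constrain_package_deps = true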
stephenfinEventually, I would like to see us move away from requirements.txt files entirely and put everything in pyproject.toml, but there's a lot more tooling to be reworked before we can do that16:47
fungiyeah, so it should be compatible with that new mechanism as well16:50
fungistephenfin: i don't know of anything in particular stopping us from moving off requirements.txt files, we've already done that for several opendev tools16:50
fungistephenfin: for example https://opendev.org/opendev/bindep/src/branch/master/pyproject.toml#L31-L3616:51
stephenfinbindep doesn't use constraints I assume?16:51
fungiwe also replaced test-requirements.txt and similar lists but that's more controversial for now, see the comments further down about pep 735 support16:52
stephenfinthat new tox feature was the main thing holding up https://review.opendev.org/c/openstack/openstacksdk/+/953484/16:52
fungiyeah, good point, constraints throws a wrench into that16:52
fungior did anyway16:52
stephenfinI was also under the assumption that the check-requirements job was checking requirements.txt files, but maybe it uses packaging16:52
stephenfin(the lib)16:52
fungiwell, opendev also doesn't rely on central requirements coordination like openstack does16:53
stephenfinright16:53
fungiso yes the conventions that openstack/requirements automation relies on may need to get adjusted16:54
stephenfingtema said the same thing about codegenerator, but that also doesn't use constraints or the broader requirements tooling16:54
fungibut purely from a pbr-using project perspective, it's working well16:56
opendevreviewClark Boylan proposed opendev/system-config master: Scroll grafana pages to force all graphs to load  https://review.opendev.org/c/opendev/system-config/+/95462416:58
clarkbsetting behavior: "smooth" might be sufficient for getting it to actually scroll past each graph? If that works then I don't think we need to bother with changing dashboard or server settings16:58
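If scrolling by window position stays flaky, another hedged option is to scroll each panel into view directly; the .react-grid-item selector here is a guess at grafana's panel container class, not confirmed from its source:

    from selenium.webdriver.common.by import By

    def scroll_panels_into_view(driver):
        for panel in driver.find_elements(By.CSS_SELECTOR, ".react-grid-item"):
            driver.execute_script(
                "arguments[0].scrollIntoView({behavior: 'smooth', block: 'center'});",
                panel,
            )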
opendevreviewTristan Cacqueray proposed zuul/zuul-jobs master: Ignore .htaccess files in the zuul-manifest  https://review.opendev.org/c/zuul/zuul-jobs/+/95464417:10
fungianyway, 2025-07-16 18:00-22:00 utc looks like a planned network outage for rackspace flex sjc3 in case we want to do anything in preparation for that17:19
fungithough hard to be sure those are the times since they just list them in "pst" so i'm not sure if they really mean pdt17:19
fungiit's during a time of year when most/all locales observing pacific time use a daylight savings offset17:20
fungireally wish providers would just use utc instead, it's way less ambiguous in situations like this17:21
fungidan_with: do you happen to know if "pst" maintenance times for rackspace are really pacific daylight time this part of the year?17:21
fungicardoe: ^ you might know too17:22
corvushttps://grafana.opendev.org/d/0172e0bb72/zuul-launcher3a-rackspace-flex?orgId=1&from=now-6h&to=now&timezone=utc&var-region=$__all17:22
corvusdo a shift-reload on that... grafana has upgraded and the gauges are correct now!17:23
clarkblooks great17:23
fungivery nice!17:23
fungisuperimposing requested over the quota really tells a far better story17:23
corvusi notice that our ram limit (which is, unfortunately, not easily viewable on that dashboard, unless you do math) is lower in sjc3 than dfw3.  that's why we're hitting a limit of like 35 instances in sjc3, under our instance limit17:24
clarkbhrm I want to say that the memory limit is what determined the original 50 max nodes limit in nodepool17:25
clarkbI wonder if they changed that on the cloud side later?17:25
corvusit's something on the order of 256gb in sjc3 and 512gb in dfw317:26
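Back-of-envelope, assuming the standard 8GB flavor for most labels (the flavor size is an assumption here):

    sjc3: 256 GB / 8 GB per node ≈ 32 concurrent nodes
    dfw3: 512 GB / 8 GB per node ≈ 64 concurrent nodes

which lines up with the ~35 instance ceiling observed in sjc3.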
opendevreviewMerged opendev/git-review master: Clean up all references to branchauthor after removal of usage  https://review.opendev.org/c/opendev/git-review/+/95146717:31
fungii've proposed devstack changes removing openeuler jobs and nodeset from old branches17:34
corvusfungi: thanks!17:34
fungiresponse in their irc channel was generally favorable17:35
corvusi have restarted both zuul launchers and both zuul web servers17:37
corvushttps://zuul.opendev.org/t/openstack/nodeset-requests and https://zuul.opendev.org/t/openstack/nodes look a little different now if you want to take a look17:37
corvushrm, someone is using an ubuntu-jammy-32GB label...17:37
corvusalso the images page is updated https://zuul.opendev.org/t/openstack/image/ubuntu-noble17:44
fungioh, that's all awesome. i hadn't even been aware of the images page until now17:45
clarkbcorvus: maybe openstack helm using the large node?17:52
clarkbI seem to recall they complained in the past when they would get node failures17:52
clarkband were resistant to migrating jobs to more, smaller nodes as a workaround17:53
clarkbI think the grafana scroll hack is working now. Except the zuul-status screenshot wants a login. Other dashboards appear to have rendered all the graphs. I guess I'll recheck that to see if the login thing is consistent17:58
clarkbstephenfin: looks like the pbr-installation-openstack-jammy job is going to fail again17:59
clarkbwe can set up a hold for that too if we think it will help18:01
clarkbI'll clean up my hold for grafana now that the upgrade is done18:02
corvusthere are a couple of ubuntu-jammy-arm64 ready nodes (likely orphaned from dequeued items) that aren't being used. i think the most recent update to delay accepting requests may help with that in the future (and these nodes predate that), so i'm going to disregard it for now.18:18
corvusactually, i will manually set them to "used" so they get deleted and the quota recovered18:18
clarkbcorvus: the node timeout would eventually clear them out after ~8 hours if left alone?18:20
corvusyeah, or they could get picked up by a future request once we burn through the backlog18:20
corvusthis is a really good graph of the pressure that the arm provider is under: https://grafana.opendev.org/d/2c6f499090/zuul-launcher3a-osuosl?orgId=1&from=2025-07-09T18:24:08.083Z&to=2025-07-10T18:19:20.302Z&timezone=utc&var-region=$__all18:21
corvusunfortunately, once my fix for delaying requests takes effect, we won't see graphs like that anymore, because the backlog will be held in the request stage, not the node stage18:22
corvus(we should probably try to make a requests-per-label graph to see that in the future)18:22
corvusbut at least right now, you can see the magnitude of the backlog for arm nodes pretty easily18:23
clarkblooks like a fairly diverse crowd asking for the arm resources too18:27
clarkbkolla, nova, swift, cinder, neutron18:27
clarkbeven system-config18:27
clarkbhttps://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_55c/openstack/55c1f535872345af985a260631fe8e62/bridge99.opendev.org/screenshots/zuul-status.png this screenshot looks a lot better if anyone wants to review https://review.opendev.org/c/opendev/system-config/+/95462418:28
clarkbfungi: looking at https://review.opendev.org/c/zuul/zuul-jobs/+/954644 which stops adding .htaccess files to the zuul manifest (so those files won't show up in the zuul dashboard) I think this is ok but I seem to recall some openstack doc jobs do want to upload .htaccess files. I don't think this affects the upload of the files, just what the zuul dashboard tries to expose18:35
clarkbya looking at our base job the manifest generation and upload steps are distinct so I don't think the openstack docs use cases are affected. That said maybe we should be generating manifests with a complete listing for accuracy. I'll leave a review18:38
fungithanks, commented18:39
clarkbya I think this boils down to "is it better to be accurate in listing things for completeness (eg to avoid confusion about whether or not the file was present and uploaded) or to avoid potential subsequent errors through user interaction?"18:41
corvusyes i don't think we should lose fidelity there18:57
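For reference, the manifest the dashboard consumes is a small json tree, so an .htaccess entry would just be an ordinary file node, roughly (structure abbreviated):

    {
      "tree": [
        {"name": ".htaccess", "mimetype": "text/plain", "encoding": null},
        {"name": "job-output.txt", "mimetype": "text/plain", "encoding": null}
      ]
    }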
fricklerdid I miss something on my trixie change? https://zuul.opendev.org/t/openstack/image/debian-trixie still looks empty. I can check myself tomorrow if needed19:42
fungijust saw a node_failure result on https://zuul.opendev.org/t/openstack/build/4393463ace0a4c42b452bc50174c391519:45
fungii guess we're not doing fedora images, no?19:45
Clark[m]We deleted fedora a long time ago19:47
keekz@fungi i looked up that rackspace change and 2025-07-16 18:00-22:00 utc is correct. i don't know how / where they got pacific time from... my teams have always used UTC. 19:50
fungithanks keekz. it's phrased in pst in the tickets19:51
funginothing you control i'm sure19:51
fungii guess they figure people using the san jose region probably all live in california and would appreciate a personal touch with tz conversions ;)19:51
keekzyeah, i see that. internally the change also has central US time (where headquarters is at). no clue why they're using pacific time ever for anything :)19:52
keekzservice-now also doesn't distinguish any time zone for the maintenance. it just gives a time with no zone 🤷 i think it depends on who created the change because i've always used UTC myself19:53
fungibut thanks for confirming my assumption that they meant pacific daylight time for the offset19:53
fungii guess "our users might also appreciate utc times because they're not necessarily local to the data centers" would be my main feedback19:54
funginext time i close out one of those notification tickets i'll try to remember to stick that in a feedback comment19:55
fungipersonally, i've got (legacy, not flex) servers in their sydney region too, and that's not a tz i can convert in my head and track annual daylight time shifts for19:57
clarkbfrickler: I think some of the other images were missing necessary metadata on the jobs to indicate which images they build? Not sure if that is the case for trixie19:57
corvusthe most recent periodic image buildset failed, so there were no uploads from that (it's all-or-none...we may want to revisit that choice, but that's what it is now)20:00
corvusbut this gate build should have at least caused an upload: https://zuul.opendev.org/t/opendev/buildset/4d5004904115442db010b4f6f58a21df20:01
corvusunfortunately, that was 2 days ago, and i deleted the servers that would have handled that, so we don't have those logs20:01
corvusi'll trigger a manual build of trixie to get more logs20:02
corvusor... maybe not.... because the build trigger returned a 40420:02
opendevreviewJames E. Blair proposed opendev/zuul-providers master: Add debian-trixie image to providers  https://review.opendev.org/c/opendev/zuul-providers/+/95465820:05
corvusfrickler: ^ i think that's what's missing.  you will also need to add labels for it to the providers.20:05
corvusi suspect if we had the logs from the launcher they would have said that there were no uploads so it deleted the artifacts.20:06
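The change adds stanzas along these lines (abbreviated from memory of the zuul-providers layout as a sketch, not quoted from the actual change):

    - image:
        name: debian-trixie
        type: zuul

    - label:
        name: debian-trixie
        image: debian-trixie
        # each provider must also list the label/image pair before it can
        # satisfy requests, which is the second step mentioned above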
opendevreviewMerged opendev/zuul-providers master: Add debian-trixie image to providers  https://review.opendev.org/c/opendev/zuul-providers/+/95465820:17
corvusbtw, a clue to that was visiting this page: https://zuul.opendev.org/t/opendev/provider/rax-dfw-main20:28
corvus(debian-trixie did not appear before, but does now)20:28
opendevreviewClark Boylan proposed opendev/infra-specs master: Update existing specs to match the current reality  https://review.opendev.org/c/opendev/infra-specs/+/95466220:31
clarkbinfra-root ^ I applied some best judgement to the lists there in order to update the state of things20:31
clarkbfigured I should do that before writing a new spec for opendev comms in matrix20:31
clarkbI did leave the storyboard specs as is as I'm not sure if we want to consider them all abandoned or worthy of effort or what20:31
fungilooks like the specs repo has some docs building bitrot too20:39
clarkbI should've expected that. I'll look into fixing that too20:41
opendevreviewClark Boylan proposed opendev/infra-specs master: Update existing specs to match the current reality  https://review.opendev.org/c/opendev/infra-specs/+/95466221:30
opendevreviewClark Boylan proposed opendev/infra-specs master: Make infra specs buildable again  https://review.opendev.org/c/opendev/infra-specs/+/95467021:30
clarkbsomething like that maybe21:30
clarkbgetting it to look not terrible was more difficult than I expected21:33
opendevreviewMerged opendev/system-config master: Update sync-project-config to delete  https://review.opendev.org/c/opendev/system-config/+/95399921:34
clarkbI wonder if we should push a trivial grafana and/or gerrit acl config update that simply renames a file to see that ^ is happy21:35
clarkbthe deploy jobs for that are going to run against grafana and review and zuul fwiw21:35
fungii'll find out if the foundation folks think the openinfra acls need dco enforcement, in which case i can push a quick acl patch for that21:35
clarkbfungi: the key bit is that the acl file also be renamed so that the rsync delete behavior gets exercised21:36
clarkbbut ya should be able to put those two things into one change and not add a bunch of noise to the git log21:36
fungioh21:37
fungisure21:37
clarkbI don't expect it to be an issue, but rather than discover a problem years(?) later like corvus did with the grafyaml files, it seemed good to double check upfront21:38
corvusi figured now is an okay time to merge that... getting later in the week but not too late :)21:40
clarkbyup. Though looking closer the infra-prod-service-review job doesn't exercise it at all. It's the manage-projects job that does. The infra-prod-service-grafana job should've exercised it though21:40
clarkbjust in a noop manner at the moment21:41
clarkboh the gerrit role also syncs project-config but I'm not sure if it strictly needs to. I think it is manage-projects that uses the data21:51
clarkbclarkb@review03:/opt/project-config/gerrit/acls$ find ./ -name '*.config' | wc -l returns 737 which matches the count I have locally21:52
clarkbso ya short of actually creating a file rename / deleted situation I'm not sure what else I can check. This seems to be correct until we introduce that situation21:53
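The behavior under test is essentially an rsync with deletion enabled; in ansible terms something like the following (a sketch, not the actual system-config role; paths are illustrative):

    - name: Sync project-config, removing files deleted upstream
      ansible.posix.synchronize:
        src: /opt/project-config/
        dest: /opt/project-config-copy/   # illustrative destination
        delete: true  # the new part: renames and removals now propagate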
opendevreviewJeremy Stanley proposed openstack/project-config master: DCO enforcement for all OpenInfra Foundation repos  https://review.opendev.org/c/openstack/project-config/+/95467222:14
clarkbfungi: one thing to note is that manage projects runs have been very slow due to overloaded gitea :/22:28
clarkbthings look okish right now though. Or at least less bad than when I checked yesterday22:28
fungiclarkb: one comment on the specs build fixup change, otherwise lgtm22:35
opendevreviewClark Boylan proposed opendev/infra-specs master: Make infra specs buildable again  https://review.opendev.org/c/opendev/infra-specs/+/95467022:37
opendevreviewClark Boylan proposed opendev/infra-specs master: Update existing specs to match the current reality  https://review.opendev.org/c/opendev/infra-specs/+/95466222:37
clarkbfungi: good point ^ that should handle it22:37
fungioh, also in the preview i noticed at the bottom it says "OpenStack Foundation" for the copyright22:38
funginot sure if that's worth adjusting now or save for later22:38
clarkbfungi: yup I didn't want to change that since it's a copyright line. I figured that deserved a commit of its own with justification if we want to change it22:38
fungiagreed22:38
clarkbI don't like "hiding" copyright/attribution changes amongst other updates22:38
fungimakes total sense, yep22:39
