opendevreview | Tony Breeds proposed opendev/system-config master: [DNM] Test insecure-ci-registry.opendev.org on jammy https://review.opendev.org/c/opendev/system-config/+/885421 | 00:05 |
---|---|---|
opendevreview | Ian Wienand proposed opendev/system-config master: system-config: update to Ansible 9 https://review.opendev.org/c/opendev/system-config/+/885422 | 00:30 |
opendevreview | Tony Breeds proposed opendev/system-config master: [DNM] Test insecure-ci-registry.opendev.org on jammy https://review.opendev.org/c/opendev/system-config/+/885421 | 01:22 |
*** amoralej|off is now known as amoralej | 07:47 | |
*** amoralej is now known as amoralej|lunch | 11:02 | |
opendevreview | Merged openstack/project-config master: Cache new cirros images https://review.opendev.org/c/openstack/project-config/+/885005 | 11:39 |
fungi | python 3.7.17 is likely to be the final 3.7.x point release, as it's due to be eol after this month | 12:09 |
*** blarnath is now known as d34dh0r53 | 12:42 | |
*** amoralej|lunch is now known as amoralejk | 12:58 | |
*** amoralejk is now known as amoralej | 12:58 | |
opendevreview | Tony Breeds proposed opendev/system-config master: [DNM] Test insecure-ci-registry.opendev.org on jammy https://review.opendev.org/c/opendev/system-config/+/885421 | 13:29 |
opendevreview | Tony Breeds proposed opendev/system-config master: [DNM] Test insecure-ci-registry.opendev.org on jammy https://review.opendev.org/c/opendev/system-config/+/885421 | 13:36 |
tonyb | That ^^ is failing on bridge99 because there isn't a letsencrypt cert for my fake insecure-ci-registry99. I don't know if I'm off in the weeds or on the right path, but either way advice needed | 13:53 |
fungi | heading out to run errands but should be back at the screen in an hour-ish | 14:17 |
*** ralonsoh__ is now known as ralonsoh | 14:26 | |
frickler | another thing I noticed during my zuul config error cleanup: zuul continues to run check jobs when a patch has been merged. maybe this is ok since usually nobody is expected to interfere with zuul, but it may also be an opportunity to improve things. see e.g. https://review.opendev.org/c/openstack/adjutant/+/885382 | 14:53 |
opendevreview | Tony Breeds proposed opendev/system-config master: [DNM] Test insecure-ci-registry.opendev.org on jammy https://review.opendev.org/c/opendev/system-config/+/885421 | 15:08 |
fungi | frickler: maybe a good feature would be another pipeline flag that works sort of like the supercedes option but is specifically for cancelling buildsets in those pipelines on merge | 15:19 |
fungi | though bypassing testing is an infrequent action, so i wouldn't consider it all that wasteful | 15:20 |
fungi | we don't really use supercedes in the openstack tenant because of the clean-check antipattern we have traditionally employed there, but the more typical configuration is for check pipeline builds to be cancelled automatically if the change enqueues into the gate pipeline | 15:21 |
clarkb | tonyb: that looks about right, will need to check job results if it continues to fail for more info | 15:25 |
Ramereth | frickler: I see an image on your image-list page that claims to be in deleting, but on our end it's active and still has a VM using it. This is the first time I've noticed this as it's usually queued on our end. Shall I remove the VM and the image manually? | 15:32 |
Ramereth | clarkb: fungi ^ | 15:32 |
fungi | Ramereth: what's the server instance uuid? i'll take a look | 15:33 |
Ramereth | 876cca52-d530-47cd-a82c-e0b529323ba9 | 15:33 |
fungi | we've seen it before in boot-from-volume situations since nodepool may rotate out images while servers are still booted from them (but does usually indicate a leaked server in that case) | 15:33 |
fungi | checking | 15:33 |
opendevreview | James E. Blair proposed opendev/system-config master: Replace ze07-ze09 https://review.opendev.org/c/opendev/system-config/+/885170 | 15:33 |
opendevreview | James E. Blair proposed opendev/system-config master: Replace ze01-ze03 https://review.opendev.org/c/opendev/system-config/+/885508 | 15:33 |
opendevreview | James E. Blair proposed opendev/system-config master: Replace ze04-ze06 https://review.opendev.org/c/opendev/system-config/+/885509 | 15:33 |
opendevreview | James E. Blair proposed opendev/system-config master: Replace ze10-ze12 https://review.opendev.org/c/opendev/system-config/+/885510 | 15:33 |
fungi | Ramereth: looks like that's a debian-buster-arm64 "ready" node which nodepool booted at 02:11:15Z yesterday in anticipation of jobs requesting it, but we just haven't run any jobs since then which needed one of those. i don't think it's leaked, just that the nodepool image builders will start trying to delete old images after rotating them out, and that node has been hanging around waiting to | 15:38 |
fungi | be called on longer than usual | 15:38 |
clarkb | it should clean itself up after that node gets used | 15:39 |
Ramereth | ah ok, then I'll just ignore it. I have a nagios check querying your URL notifying me when it sees one in a deleting state | 15:39 |
fungi | in clouds where we don't bfv that isn't usually an issue, but obviously a bfv node can't have its backing store deleted | 15:39 |
clarkb | we might want to set ready nodes for older labels to 0 though | 15:39 |
Ramereth | FWIW I did notice that url was returning 503 yesterday at one point | 15:39 |
fungi | yeah, i'm good with that. we don't need to pre-boot buster nodes | 15:39 |
clarkb | Ramereth: which url? was it https://nb04... something? | 15:40 |
Ramereth | yup | 15:40 |
opendevreview | James E. Blair proposed opendev/zone-opendev.org master: Replace ze07-ze09 https://review.opendev.org/c/opendev/zone-opendev.org/+/885168 | 15:40 |
opendevreview | James E. Blair proposed opendev/zone-opendev.org master: Replace ze01-ze03 https://review.opendev.org/c/opendev/zone-opendev.org/+/885513 | 15:40 |
opendevreview | James E. Blair proposed opendev/zone-opendev.org master: Replace ze04-ze06 https://review.opendev.org/c/opendev/zone-opendev.org/+/885514 | 15:40 |
opendevreview | James E. Blair proposed opendev/zone-opendev.org master: Replace ze10-ze12 https://review.opendev.org/c/opendev/zone-opendev.org/+/885515 | 15:40 |
clarkb | that was likely due to me stopping the backend service and upgrading things and doing a reboot | 15:40 |
Ramereth | gotcha | 15:41 |
corvus | fungi: clarkb you can also set max-ready-age to force old nodes to get gc'd (and still keep ready nodes) | 15:41 |
fungi | also a fine alternative in my opinion | 15:41 |
frickler | max-ready-age=1d would seem to be a good match to our image rotation timing | 15:48 |
frickler | and not overly wasteful, either | 15:48 |
tonyb | clarkb: Thanks. making slow progress, at least it's failing in testinfra now :) | 15:50 |
frickler | while we're at failing things, where are we at with repairing wheel builds? I think ianw had some patches up for that? | 15:52 |
clarkb | frickler: I'm not sure. I thought where we were at was cleaning up unneeded wheels but that builds were working? It has been a bit since i looked at it though | 15:55 |
frickler | clarkb: well the grafana AFS page says last released 1 month ago, I didn't dig deeper yet | 15:57 |
frickler | https://review.opendev.org/c/openstack/project-config/+/879722 is the patch I had in mind, that would at least decouple the failures. since we don't have working releases anyway, I would think it is a low-risk patch we can simply try now | 16:08 |
frickler | also c9s seems to be the culprit once again with openafs failing https://zuul.opendev.org/t/openstack/build/cb8ce090d03d4b039501ea6c3ea87beb | 16:12 |
corvus | i'm going to restart zuul-web | 16:12 |
frickler | corvus: anything happening in particular? (just being curious) | 16:13 |
corvus | frickler: oh just want to get the new errors page up | 16:16 |
corvus | https://zuul.opendev.org/t/openstack/config-errors | 16:17 |
clarkb | frickler: on that wheels change I left a small but important suggestion. Otherwise ya I think we can land that | 16:18 |
tonyb | clarkb: I think based on https://6badbd21c5540c7fe6af-e13746d46d33f29609826c7d7a815da2.ssl.cf1.rackcdn.com/885421/5/check/system-config-run-docker-registry/1b4d7b6/insecure-ci-registry99.opendev.org/docker/registry-docker_registry_1.txt that the new node isn't getting the correct group vars | 16:23 |
tonyb | clarkb: which I'd expect to be: https://6badbd21c5540c7fe6af-e13746d46d33f29609826c7d7a815da2.ssl.cf1.rackcdn.com/885421/5/check/system-config-run-docker-registry/1b4d7b6/bridge99.opendev.org/etc/ansible/hosts/group_vars/registry.yaml | 16:23 |
tonyb | Am I missing something that maps the new hostname into the correct group/role? | 16:24 |
clarkb | tonyb: inventory/service/groups.yaml defines the groups and the new *99 host should match the registry group I think | 16:26 |
clarkb | tonyb: I did leave a comment about a small thing I noticed (won't be the cause of the issue but may help debug it?) | 16:29 |
clarkb | ERROR! The requested handler 'letsencrypt updated insecure-ci-registry99-main' was not found in either the main handlers list nor in the listening handlers list is curious because it seems like that handler is right there in the handlers file... | 16:32 |
clarkb | you can copy that string and ^F it and it matches | 16:32 |
tonyb | I think you're looking at patchset 4, 5 should fix that | 16:33 |
clarkb | oh yup I was looking at a stale job run | 16:34 |
clarkb | ok we hardcode in the clouds.yaml that the cloud is rax so the errors trying to auth there are expected | 16:39 |
clarkb | but that doesn't explain why it isn't listening on port 5000 | 16:39 |
tonyb | I was assuming that the errors were fatal so the registry errors out | 16:40 |
clarkb | If that is the case I'm not sure how this test ever succeeded. Maybe push up a noop change separately to see what the current state looks like? | 16:41 |
tonyb | If I modify zuul.d/system-config-run.yaml can I safely add /var/registry/conf to the logs? | 16:42 |
tonyb | Yup I'll do that | 16:42 |
clarkb | ya the test env shouldn't have any real world credential access | 16:43 |
clarkb | they aren't even zuul secrets it is completely separated so fetching that in the test env should be fine | 16:43 |
tonyb | okay cool | 16:45 |
opendevreview | Tony Breeds proposed opendev/system-config master: [dnm] checking testing for the existing registry https://review.opendev.org/c/opendev/system-config/+/885524 | 16:46 |
opendevreview | Tony Breeds proposed opendev/system-config master: [DNM] Test insecure-ci-registry.opendev.org on jammy https://review.opendev.org/c/opendev/system-config/+/885421 | 16:49 |
corvus | clarkb: fungi do you think we're ready to roll the executors? did the new ze01 and ze02 get normalized wrt afs? | 16:56 |
clarkb | I haven't checked yet sorry. | 16:58 |
clarkb | corvus: I think ze02 was never modified. Only ze01 was and I want to say fungi did that? | 16:58 |
opendevreview | Dr. Jens Harbott proposed openstack/project-config master: wheel builds : move to individual releases https://review.opendev.org/c/openstack/project-config/+/879722 | 16:59 |
frickler | clarkb: ^^ just updated with your comment | 17:00 |
clarkb | +2 thanks | 17:02 |
fungi | corvus: clarkb: yes, new jammy ze01 was downgraded back to distro openafs packages after i cleaned up the ppa | 17:13 |
fungi | should be ready to roll as long as we're settled on https://review.opendev.org/885419 i think? | 17:14 |
fungi | that's the last remaining bit and only tangentially related, probably not a blocker | 17:14 |
frickler | corvus: https://zuul.opendev.org/t/openstack/config-errors looks much nicer already. the "name: Unknown" is still WIP I guess? | 17:15 |
fungi | also the project detail view will remain broken until we can roll out https://review.opendev.org/885155 | 17:17 |
frickler | the one complain I would have is that the "blue bell" link now goes directly to that page and no longer shows the total error count, so more difficult to track my progress | 17:19 |
frickler | +t | 17:19 |
corvus | frickler: yes, we can increase the fidelity there over time; there are already some other errors, just none present in opendev now. can easily add a counter at the top of the page for total results. | 17:19 |
frickler | that'd be nice, thx | 17:20 |
frickler | just doing some wishful thinking, if we could somehow integrate the openstack governance project/team <-> repo mapping into the zuul config and filter things with that, that might also be helpful on multiple occasions, like not only errors, but also status and builds | 17:38 |
*** amoralej is now known as amoralej|off | 18:10 | |
fungi | longer term the idea was that openstack would have its own zuul tenant, and then the tenant views would be exactly that | 18:15 |
frickler | fungi: I don't understand your reply, openstack does have its own tenant mostly? I don't think one tenant per project like nova/neutron etc. would be feasible? | 18:48 |
frickler | maybe filtering by queue would work, at least for most non-integrated projects | 18:49 |
frickler | hmm, the queues on https://zuul.opendev.org/t/openstack/status are shown only for the gate pipeline, not for check or others, is that intentional? | 18:50 |
frickler | afaict rate limiting per queue still affects the check pipeline, too, so it would be useful to have that information there? | 18:51 |
frickler | corvus: ^^? | 18:51 |
fungi | frickler: oh, sorry i thought you meant filtering to just openstack repos so you could ignore everything non-openstack that's currently sharing the tenant | 18:58 |
fungi | yes custom query views for config errors could be interesting | 18:58 |
frickler | fungi: no, my idea was to have a URL I could give to e.g. the ironic team saying: these are the errors in your projects, please check and fix them | 19:00 |
fungi | it may be that queue filtering for status only returns the ones for dependent pipelines, so independent pipelines like check are omitted since they form independent per-change queues | 19:00 |
fungi | though i don't immediately see why those couldn't still be filtered on queue name | 19:01 |
frickler | and then I thought that it might as well be useful for the ironic team to have a link that filters just their zuul builds or just their in-progress things on the status page | 19:01 |
frickler | (choosing ironic as guinea pig because they have a rather high number of repos) | 19:01 |
fungi | you can give them a link to filter to just their builds or in-progress changes today, but it would have to be generated with a separate tool that constructs the query parameters based on what's in the governance file | 19:02 |
fungi | for zuul to have some integrated functionality to do that on the fly, we'd probably need to define a standard grouping mechanism of some kind (named queues are already used for other things cross-team, therefore not good to overload for this purpose) | 19:03 |
frickler | ah, I guess I could write such a tool and place the link onto some personal page as a first iteration, that's a nice idea | 19:05 |
fungi | something like the project groups concept in storyboard could make sense there. basically annotate a project with a list of zero or more groups to which it belongs, and then extend the query api to support filtering by one or more group names | 19:05 |
frickler | or more general tags, but yes, similar idea | 19:05 |
fungi | yeah, you could call them tags, groups, or frobnules, doesn't really matter | 19:06 |
corvus | yeah, i think third-party tool to construct those queries sounds good; both the builds and config-errors page intentionally use the same query params for that | 19:08 |
fungi | further development could include known group names in the drop-down filter selectors for some dashboard views, and possibly a groups view where you can get a list of groups within the tenant similar to the current top-level tenants view that takes you directly to group-filtered views of things | 19:08 |
fungi | probably something that merits a zuul spec | 19:08 |
corvus | i'm hesitant to add an organizational grouping system to zuul | 19:08 |
corvus | that's called tenants | 19:09 |
fungi | understandable | 19:09 |
fungi | starting with a build-your-own-dashboard-query tool external would be good to see how much use people find the idea anyway | 19:10 |
fungi | similar to the gerrit dashboard query builder tools that have been floating around | 19:10 |
corvus | ++ | 19:11 |
jrosser | `parentproject:openstack/openstack-ansible` pulls dozens of repos into our gerrit dashboard - perhaps that achieves something similar | 19:20 |
fungi | i haven't looked, but probably that's telling gerrit to return all projects whose gerrit config (acl) inherits from the same project. in the case of official openstack repos, they all inherit from the openstack/meta-config project | 19:26 |
fungi | that way the openstack project is able to set global gerrit policy in one place, simplifying the per-project configs | 19:26 |
frickler | that seems to be a special thing only used for openstack-ansible, never heard of that before, too. https://gerrit-review.googlesource.com/Documentation/cmd-create-project.html | 19:29 |
frickler | https://paste.opendev.org/show/b3m54jouBhaA9ypX8RLC/ | 19:30 |
fungi | openstack/openstack-ansible-roles does an inheritFrom = openstack/openstack-ansible | 19:34 |
jrosser | anyway - creating dashboard queries got completely out of hand, needing continuous adjustment to keep newly created other repos with similar names out | 19:35 |
jrosser | and that's pretty much sorted it completely and made the query quite compact | 19:36 |
fungi | fwiw, that's the only inheritFrom in openstack namespace acls other than openstack/meta-config being inherited by everything | 19:37 |
frickler | maybe that could be used to simplify the setup for some other teams, too | 19:40 |
frickler | but I don't think that zuul could use that information | 19:40 |
fungi | agreed, that's very gerrit-specific | 19:40 |
corvus | wildcard support in searches might be useful here too | 19:57 |
fungi | yes, or regular expressions (though those have greater chance of security risks) | 19:58 |
clarkb | gitea 1.20 just got a release candidate | 22:17 |
clarkb | I don't know what is in it yet | 22:17 |
opendevreview | Ian Wienand proposed opendev/system-config master: [dnm] trigger and openafs build to get error logs https://review.opendev.org/c/opendev/system-config/+/885557 | 22:23 |
ianw | https://8120cec50dbf22b0ed97-a15027182aab035aa882f99410b51a23.ssl.cf2.rackcdn.com/885557/1/check/system-config-zuul-role-integration-centos-9-stream/f19c651/dkms-make-logs/make.log | 23:23 |
ianw | that sure is a lot of errors | 23:23 |
ianw | i can't find any obvious references to those errors in current git or gerrit | 23:32 |
ianw | there seem to be 4 options | 23:39 |
ianw | 1) ignore it and hope openafs releases something that fixes it eventually | 23:39 |
ianw | 2) work w/ openafs to fix it, backport required patches in the meantime and figure out how to deploy them to the rpms we use | 23:40 |
ianw | 3) perhaps like Fedora, consider 9-stream too much of a moving target to keep wheels up to date for | 23:41 |
ianw | 4) rework the publishing to go through the executor, so that the wheel build environments don't need openafs. or indeed do something completely different like containerised builds etc etc | 23:42 |
ianw | despite writing it, I'm not 100% convinced by the ignore-failure approach proposed by https://review.opendev.org/c/openstack/project-config/+/879722, which essentially implements 1) here. | 23:44 |
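
A minimal sketch of the external query-builder tool fungi and corvus discussed (around 19:02 to 19:10), assuming that the Zuul builds and config-errors pages accept one repeated `project` query parameter per repository. corvus noted that both pages use the same query params, but the parameter name here is an assumption. The `GOVERNANCE` dict is a hypothetical excerpt shaped roughly like a team, deliverables, repos mapping; a real tool would load the governance file instead. The helper names `team_repos` and `team_url` are also hypothetical.

```python
from urllib.parse import urlencode

# Tenant URL taken from the log; the page names below are the ones mentioned there.
ZUUL_TENANT_URL = "https://zuul.opendev.org/t/openstack"

# Hypothetical excerpt: team -> deliverables -> repos (assumed shape).
GOVERNANCE = {
    "ironic": {
        "deliverables": {
            "ironic": {"repos": ["openstack/ironic"]},
            "python-ironicclient": {"repos": ["openstack/python-ironicclient"]},
        }
    },
}


def team_repos(governance, team):
    """Collect the de-duplicated, sorted repo list for one team."""
    repos = set()
    for deliverable in governance[team]["deliverables"].values():
        repos.update(deliverable.get("repos", []))
    return sorted(repos)


def team_url(governance, team, page):
    """Build a dashboard URL filtered to a team's repos.

    Assumption: the page accepts one `project=` parameter per repo.
    urlencode() percent-encodes '/' as %2F by default.
    """
    params = [("project", repo) for repo in team_repos(governance, team)]
    return f"{ZUUL_TENANT_URL}/{page}?{urlencode(params)}"


if __name__ == "__main__":
    print(team_url(GOVERNANCE, "ironic", "config-errors"))
    print(team_url(GOVERNANCE, "ironic", "builds"))
```

Keeping the mapping outside Zuul matches corvus's preference not to add an organizational grouping system to Zuul itself. The generated links could simply be published on a personal page as a first iteration, as frickler suggested.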
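
A small worked check of frickler's `max-ready-age=1d` suggestion (15:48), assuming nodepool's per-label `max-ready-age` is expressed in seconds, so one day is 86400. The buster arm64 node fungi described went ready at 02:11:15Z "yesterday" and was examined around 15:38 the next day. The calendar dates below are hypothetical placeholders that only reproduce those times of day, and `ready_node_expired` is an illustrative helper, not nodepool's code.

```python
from datetime import datetime, timedelta, timezone

# Assumption: max-ready-age is in seconds; "1d" from the discussion -> 86400.
MAX_READY_AGE = int(timedelta(days=1).total_seconds())


def ready_node_expired(ready_since, now, max_ready_age=MAX_READY_AGE):
    """Return True if a node that has been 'ready' since ready_since is older
    than max_ready_age at time now. A value <= 0 is treated as 'no limit'."""
    if max_ready_age <= 0:
        return False
    return (now - ready_since).total_seconds() > max_ready_age


# Hypothetical dates; only the times of day come from the log.
booted = datetime(2023, 1, 1, 2, 11, 15, tzinfo=timezone.utc)
checked = datetime(2023, 1, 2, 15, 38, 0, tzinfo=timezone.utc)

age_seconds = int((checked - booted).total_seconds())
print(age_seconds)                          # 134805 (about 37.4 hours)
print(ready_node_expired(booted, checked))  # True
```

Under that assumption, a one-day limit would have recycled the idle node well before the image builders tried to delete its boot-from-volume image.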