Tuesday, 2026-06-02

-@gerrit:opendev.org- Roja Eswaran proposed: [openstack/diskimage-builder] 987717: Replace debootstrap with mmdebstrap https://review.opendev.org/c/openstack/diskimage-builder/+/98771715:35
-@gerrit:opendev.org- Monty Taylor https://matrix.to/#/@mordred:inaugust.com proposed: [opendev/system-config] 991140: Start mirroring the rust container image https://review.opendev.org/c/opendev/system-config/+/99114015:39
@mordred:waterwanders.comlooks like we're using unauthenticated docker for mirroring. presumably if that ^^ and its parent (add resolute) aren't centrally interesting there's nothing stopping me from using the same job in an inaugust repo or drizzle namespace and doing my own parallel mirror, right? (although those seem reasonably applicable enough to be in the central mirror) Also - I have a rabbitmq plugin in drizzle that it feels like testing against the rabbitmq image would make sense. I'm a little surprised no-one in openstack is deploying rabbit in a container... or at least not leveraging upstream rabbit images and wanting a mirror image16:01
@jim:acmegating.comi think our intention was to have a fairly low bar for mirroring, but not at zero.  both of those seem to be well above that so +2 from me on both.16:07
@clarkb:matrix.orgI think the primary thing was that we wouldn't mirror the super specific image that only one group would need. But generic things like language runtimes and official builds of tools like databases are fine as they can be and are used by many16:10
@mordred:waterwanders.com++ cool! \o/16:17
-@gerrit:opendev.org- Michal Nasiadka proposed: [opendev/system-config] 988310: Add GrepTimeDB long term storage for Prometheus https://review.opendev.org/c/opendev/system-config/+/98831016:20
-@gerrit:opendev.org- Michal Nasiadka proposed: [opendev/system-config] 988310: Add GrepTimeDB long term storage for Prometheus https://review.opendev.org/c/opendev/system-config/+/98831016:21
-@gerrit:opendev.org- Michal Nasiadka proposed: [opendev/system-config] 988310: Add GrepTimeDB long term storage for Prometheus https://review.opendev.org/c/opendev/system-config/+/98831016:21
@mordred:waterwanders.comI feel like I should know this - but if I have a job in opendev/base-jobs and I want to mention a role from zuul/zuul-jobs in its documentation, do we have a construct for that?16:23
@mordred:waterwanders.com(actually, thinking about it, I'm not sure I really need to do that - but now I'm curious)16:24
@jim:acmegating.commordred: someone just added intersphinx support to zuul-sphinx, so it should be possible to use that for that now, but nothing has been done to enable that in any of the opendev repos.  i would happily review patches that did that.16:25
short of that, i think just a normal hyperlink.
@jim:acmegating.com(i think they did add testing of that and examples to the zuul-sphinx repo itself, so i think that could serve as a source of copypasta)16:26
@mordred:waterwanders.comneat. I may nerd-snipe myself on that one. it feels like a thing that _should_ work16:26
@jim:acmegating.comagree; a little awkward for that not to be enabled in zuul-jobs and friends now that it exists16:27
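
A minimal sketch of what enabling intersphinx could look like in the Sphinx `conf.py` of a repo such as opendev/base-jobs. `sphinx.ext.intersphinx` and `intersphinx_mapping` are standard Sphinx. The `zuul-jobs` mapping key and the published docs URL are assumptions for illustration, and the exact cross-reference syntax that zuul-sphinx supports should be taken from the examples in the zuul-sphinx repo rather than from this fragment.

```python
# Hypothetical conf.py fragment; not taken from any opendev repo.
extensions = [
    "zuul_sphinx",             # Zuul job/role documentation directives
    "sphinx.ext.intersphinx",  # standard Sphinx cross-project linking
]

intersphinx_mapping = {
    # name -> (base URL of the published docs, inventory location).
    # None means "fetch objects.inv from the base URL".
    # The key name and URL here are assumptions.
    "zuul-jobs": ("https://zuul-ci.org/docs/zuul-jobs/", None),
}
```

Until something like this is enabled, a normal hyperlink to the role's documentation remains the fallback.
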
@mordred:waterwanders.comClark: mind +A on the parent of the rust one: https://review.opendev.org/c/opendev/system-config/+/990727 16:32
@clarkb:matrix.orgoh yup I missed there was a parent change16:35
@mordred:waterwanders.comyeehaw16:36
-@gerrit:opendev.org- Zuul merged on behalf of Monty Taylor https://matrix.to/#/@mordred:inaugust.com:16:43
- [opendev/system-config] 990727: Start mirroring ubuntu resolute container images https://review.opendev.org/c/opendev/system-config/+/990727
- [opendev/system-config] 991140: Start mirroring the rust container image https://review.opendev.org/c/opendev/system-config/+/991140
-@gerrit:opendev.org- Zuul merged on behalf of ayyappa: [openstack/project-config] 990651: Add repo app-ejbca for starlingx https://review.opendev.org/c/openstack/project-config/+/99065116:44
@clarkb:matrix.orgmnasiadka: any idea why https://review.opendev.org/c/opendev/system-config/+/988310/ says it depends on a change with invalid configuration?16:52
@clarkb:matrix.orgthe two parent changes have +1s from Zuul. Maybe we need to recheck them to see what is wrong?16:52
@jim:acmegating.comClark: those comments lack a vote; i suspect they are from different tenants16:54
@clarkb:matrix.orgoh!!!16:55
@mnasiadka:matrix.orgClark: I'd be happy to understand that too :)16:55
@clarkb:matrix.orgsystem-config is probably in opendev and openstack tenants or something16:55
@jim:acmegating.comi'm not sure about that, just something to check16:55
@clarkb:matrix.orghrm it isn't enqueued into openstack's check queue like I would expect it to though16:56
@jim:acmegating.comnope i'm wrong16:56
@jim:acmegating.comhttps://zuul.opendev.org/t/openstack/buildset/c63e0ee66722450e8f17db75cbfcd4af16:56
@jim:acmegating.comthat really is from the openstack tenant16:56
@jim:acmegating.comClark: i'd proceed with your recheck testing then16:57
@clarkb:matrix.orghttps://review.opendev.org/c/opendev/system-config/+/991140 just merged and is also in system-config so the repo itself isn't completely broken16:57
@clarkb:matrix.orgcorvus: rechecking the parents you mean?16:57
@jim:acmegating.comya16:57
@clarkb:matrix.orgI wonder if this is fallout from the mixed nodeset change. It modified system-config run stuff16:58
@clarkb:matrix.organd changed some of the yaml replacement. I bet that is the issue. mnasiadka I think we have to rebase the stack and update the system-config-run-prometheus job to use the new yaml string replacement anchor name16:58
-@gerrit:opendev.org- Michal Nasiadka proposed on behalf of Mohammed Naser: [opendev/system-config] 980840: Add Prometheus monitoring service https://review.opendev.org/c/opendev/system-config/+/98084016:59
@clarkb:matrix.orggit can rebase it without file level conflict, but we have a semantic level conflict due to the change in the anchor names for the bridge node in the nodeset16:59
-@gerrit:opendev.org- Michal Nasiadka proposed on behalf of Mohammed Naser: [opendev/system-config] 980994: Deploy node_exporter across all managed hosts https://review.opendev.org/c/opendev/system-config/+/98099416:59
-@gerrit:opendev.org- Michal Nasiadka proposed: [opendev/system-config] 988310: Add GrepTimeDB long term storage for Prometheus https://review.opendev.org/c/opendev/system-config/+/98831016:59
-@gerrit:opendev.org- Michal Nasiadka proposed on behalf of Mohammed Naser: [opendev/system-config] 980840: Add Prometheus monitoring service https://review.opendev.org/c/opendev/system-config/+/98084017:00
-@gerrit:opendev.org- Michal Nasiadka proposed on behalf of Mohammed Naser: [opendev/system-config] 980994: Deploy node_exporter across all managed hosts https://review.opendev.org/c/opendev/system-config/+/98099417:00
-@gerrit:opendev.org- Michal Nasiadka proposed: [opendev/system-config] 988310: Add GrepTimeDB long term storage for Prometheus https://review.opendev.org/c/opendev/system-config/+/98831017:00
@mnasiadka:matrix.orgThat should probably do it.17:00
@fungicide:matrix.orgaha, sorry for the churn there!17:03
@mnasiadka:matrix.orgNo problems, world around is moving on constantly :)17:08
@mnasiadka:matrix.orgHow are we with Resolute arm64 nodes? I would need that for Kolla support - happy to help there if I can17:08
@clarkb:matrix.orgmnasiadka: I don't think anyone has started on that. One of the big concerns there is mirror space. The ports mirror is larger than the x86 mirror I think and we're a bit tight on space after adding x8617:09
@clarkb:matrix.orgI think the next steps for that are to propose the jobs for arm64 resolute nodes then start figuring out a plan for mirroring the package content. Maybe we can clean up other mirror content to make more room or maybe we need more afs disk storage? etc17:10
@fungicide:matrix.orghttps://grafana.opendev.org/d/9871b26303/afs for a high-level view of the usage situation17:12
@fungicide:matrix.org1.17tb in mirror.ubuntu (with resolute), 1.03tb in mirror.ubuntu-ports (without resolute)17:13
@fungicide:matrix.orgwe're about 600gb away from running out of space in /vicepa on afs01.dfw17:14
-@gerrit:opendev.org- Clark Boylan proposed: [opendev/system-config] 991155: Update grafana to 13.0.2 https://review.opendev.org/c/opendev/system-config/+/99115517:17
-@gerrit:opendev.org- Michal Nasiadka proposed: [opendev/zuul-providers] 991156: Add Resolute arm64 nodes and images https://review.opendev.org/c/opendev/zuul-providers/+/99115617:17
-@gerrit:opendev.org- Michal Nasiadka proposed: [opendev/zuul-providers] 991156: Add Resolute arm64 nodes and images https://review.opendev.org/c/opendev/zuul-providers/+/99115617:19
@mnasiadka:matrix.orgfungi: I'd reckon operating system packages mirrors are not getting smaller, but also the arm64 provider is not the fastest one - so maybe not mirroring ports is not that big of a problem right now17:25
@fungicide:matrix.orgagreed, jobs would just need some override to accommodate that17:26
@mnasiadka:matrix.orgClark: I would assume we want to disable https://docs.greptime.com/reference/telemetry/ ?17:48
@clarkb:matrix.orgYes please17:50
-@gerrit:opendev.org- Michal Nasiadka proposed: [opendev/system-config] 988310: Add GrepTimeDB long term storage for Prometheus https://review.opendev.org/c/opendev/system-config/+/98831017:55
@mnasiadka:matrix.orgDone17:59
-@gerrit:opendev.org- Zuul merged on behalf of Dmitriy Rabotyagov: [openstack/project-config] 989144: Change ACLs for Venus to retired https://review.opendev.org/c/openstack/project-config/+/98914418:34
@fungicide:matrix.orgjust to confirm Clark's suspicion from the meeting, it does look like i need to manually add an account on the new backup server for wiki19:14
@fungicide:matrix.org`Remote: Permission denied (publickey).`19:14
@fungicide:matrix.orgi see we have a `borg-wiki-update-test` user on the old server, i can manually create an equivalent and set the same `~/.ssh/authorized_keys` entry unless there's a more proper way to go about it19:16
@clarkb:matrix.orgfungi: I would have to cross check against ansible, but that sounds correct from memory19:19
@clarkb:matrix.organd then the cron entries are offset ~evenly through the day from each other19:20
@fungicide:matrix.orgi was just going to edit the existing 02 backup to use 03 instead, since we have another backup to the other region anyway19:25
@clarkb:matrix.orgack that works for me19:28
@clarkb:matrix.orgonce etherpad is done I'm going to upgrade my local network gear so I will be temporarily disconnected from the Internet at that point. But I'll wait for etherpad things to complete first19:34
@fungicide:matrix.orgoh good reminder, i rolled back and pinned firmware on 2 of my 3 meshed waps because they kept dropping and reconnecting which made my home lan unusable. time to see if things have improved19:36
-@gerrit:opendev.org- Zuul merged on behalf of Clark Boylan: [opendev/system-config] 990531: Update etherpad from 2.7.3 to 3.2.0 https://review.opendev.org/c/opendev/system-config/+/99053120:05
@clarkb:matrix.orgthat should deploy in a few minutes after the hourly runs20:07
@fungicide:matrix.orgyeah, any moment20:09
@fungicide:matrix.orgthough infra-prod-service-zuul is running slower than usual20:10
@fungicide:matrix.orglooks like pulling zuul container images is taking a while on at least some servers20:13
@fungicide:matrix.orgi wonder if this is going to end up with timeouts and hung docker pulls on one or more servers again20:14
@clarkb:matrix.orgI guess we're going to find out soon enough20:14
@fungicide:matrix.orgze03 and ze08 have both been pulling images for the past 10 minutes20:16
@fungicide:matrix.orgthough currently no executors have still-running `docker-compose pull` processes from prior hourly ansible runs20:17
@clarkb:matrix.orgIt probably has to do with when zuul merges changes and pushes new images more than anything else20:18
@fungicide:matrix.orgyes, but 9 of the 11 executors wrapped things up within a matter of seconds20:18
@clarkb:matrix.orgSo 989249 has just landed and published after the last hourly run20:18
@fungicide:matrix.orgze05 was the last one to complete and took 3m50s to pull all the images20:19
@fungicide:matrix.orger, 2m50s i mean20:19
@fungicide:matrix.orgso ze03 and ze08 are taking 5x (so far) as long as ze05 which was the slowest one that has finished20:20
@jim:acmegating.comin the past, i think we determined nothing was happening; like the tcp conns got dropped with no fin.  i'm guessing docker doesn't enable keepalives?20:21
@fungicide:matrix.orgthat was my best guess based on limited data gathered from strace20:22
@fungicide:matrix.organd then the processes hung around indefinitely, until the next reboot20:22
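
For illustration only, this is how a client opts in to TCP keepalives on a socket, which is the mechanism that would let the kernel notice a connection dropped without a FIN. It says nothing about what docker actually does. The function name and the timing values are hypothetical, and the `TCP_KEEP*` options are only set where the platform exposes them.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive probes for an existing socket.

    idle, interval and count are illustrative values, not recommendations.
    """
    # Ask the kernel to send keepalive probes on an otherwise idle connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Fine-grained tuning is platform specific, so only set what exists.
    for name, value in (
        ("TCP_KEEPIDLE", idle),       # seconds of idleness before probing
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", count),       # failed probes before the peer is declared dead
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock
```

Without probes like these, a read on a silently dropped connection can block indefinitely, which matches the hung pulls described above.
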
@jim:acmegating.com(periodic bad-idea reminder: if we get tired of this, we could run our own registry; zuul-registry is horizontally scalable with a swift backend)20:22
@jim:acmegating.commaybe we can put the pulls in a timeout/retry ansible block thingy20:23
@fungicide:matrix.orgit doesn't have a substantial impact on us, just sometimes slows things down when it holds the deploy semaphore until the job hits its timeout20:23
@jim:acmegating.comwell, it does completely break zuul deployment every few weeks20:23
@fungicide:matrix.orgwhich, honestly, is the only reason i even noticed it happening last time20:23
@clarkb:matrix.orgDo we know what the fail/timeout mode is?20:24
@clarkb:matrix.orgDo we have to intervene or just be patient etc. Not sure if we know20:24
@fungicide:matrix.orgi think the job ends at the 30min mark20:24
@clarkb:matrix.orgAh the job itself times out then20:24
@jim:acmegating.comClark: if i understand the question right: docker and docker-compose never time out, they just hang forever20:24
@fungicide:matrix.orgwell, the playbook, but yes20:24
@jim:acmegating.comso if we want to improve it, we'd need to run "timeout X docker-compose pull" and put that in an ansible retry block20:25
@fungicide:matrix.orgright, subsequent `docker-compose pull` calls in later hourly jobs are unaffected and complete successfully, but the old processes persist until the server is rebooted20:25
@fungicide:matrix.orgthe other odd thing is, i've only seen this affect the executors, not any other zuul servers20:26
@jim:acmegating.comsounds like a 5-10m timeout/retry would be an improvement20:26
@clarkb:matrix.org++20:26
@clarkb:matrix.orgfungi: the executor image is much larger than the others20:26
@fungicide:matrix.orgit looks like 3 minutes is a good upper-bound on a slower run that completes successfully20:26
@clarkb:matrix.orgLike all the other images we use are smaller than it20:26
@fungicide:matrix.orgso 5 is probably good20:26
@fungicide:matrix.orggot it, so maybe image size plays a role, greater chance of cosmic rays bombarding the nic or something20:27
@fungicide:matrix.orgbut yeah, if `docker-compose pull` is still running after 5 minutes, kill it and try again, and if it doesn't work 3 times in a row then fail the task/job20:29
@fungicide:matrix.orgworst case it will probably still fail half as fast as waiting for the configured job/phase playbook timeout20:30
@fungicide:matrix.orger, half as long20:30
@clarkb:matrix.orgYup encoding that retry logic should be straightforward with ansible 20:30
@clarkb:matrix.orgMay need to wrap it with the timeout command? Not sure if ansible task timeouts do the right thing with retries20:30
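
The retry policy discussed above (kill the pull after 5 minutes, try again, fail after 3 attempts) can be sketched in Python as follows. This is not the Ansible implementation; the function name `run_with_retries` and its parameters are hypothetical. Note that `subprocess.run` only kills the direct child on timeout, so anything the child spawned would need separate handling.

```python
import subprocess
import sys


def run_with_retries(cmd, timeout_s=300, attempts=3):
    """Run cmd, killing it after timeout_s seconds, up to `attempts` times.

    Returns the 1-based attempt number that succeeded, or raises
    RuntimeError once every attempt has timed out or exited non-zero.
    """
    for attempt in range(1, attempts + 1):
        try:
            # On timeout, subprocess.run kills the child and waits for it
            # before raising TimeoutExpired.
            subprocess.run(cmd, check=True, timeout=timeout_s)
            return attempt
        except (subprocess.TimeoutExpired, subprocess.CalledProcessError) as exc:
            print(f"attempt {attempt}/{attempts} failed: {exc}", file=sys.stderr)
    raise RuntimeError(f"{cmd!r} failed after {attempts} attempts")
```

A call such as `run_with_retries(["docker-compose", "pull"])` would then give up after at most 15 minutes of waiting, rather than holding on until the 30 minute playbook timeout.
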
@clarkb:matrix.orgLooks like it did time out at ~30 minutes. Etherpad deployment is proceeding20:35
@fungicide:matrix.orgfwiw, the pulls are still running on ze03 and ze08 even though the job has now timed out, so seems the same as the last time i looked into it20:36
@fungicide:matrix.orgmy browser just disconnected from and reconnected to a pad i was in the middle of editing20:39
@clarkb:matrix.org`Up 17 seconds (healthy)` so it should be back up now. Time to check it works20:39
@fungicide:matrix.orgretained my session too20:39
@clarkb:matrix.orghttps://etherpad.opendev.org/p/gerrit-upgrade-3.13 loads for me20:39
@fungicide:matrix.orgall lgtm so far20:39
@clarkb:matrix.orgI'll keep an eye on memory consumption. It took about 15 minutes to OOM on 2.7.3 with session cleanup enabled20:39
@clarkb:matrix.orgif it makes it to say 30 minutes that is when I'll go ahead and upgrade my network gear20:40
@fungicide:matrix.orgi'm going to peel some garlic for ผัดซีอิ๊ว (phat si-io)20:42
-@gerrit:opendev.org- Clark Boylan proposed: [opendev/system-config] 991221: Retry zuul executor container image pulls https://review.opendev.org/c/opendev/system-config/+/99122120:49
@clarkb:matrix.orgsomething like ^ that for the discussed timeout and retries idea20:49
@mordred:waterwanders.comI know we have all the control in the world over every aspect of gerrit - so just noting that the text: "Outdated Votes: Code-Review+1 (copy condition: "changekind:TRIVIAL_REBASE OR is:MIN")"  mostly causes the words "TRIVIAL_REBASE" to stand out, which keeps making me confused when I know something _wasn't_ a trivial rebase :)20:57
@clarkb:matrix.orgwe've made it to 25 minutes and the gerrit 3.13 etherpad still loads for me21:04
@clarkb:matrix.organd now 30 minutes. Time for network stuff. Hopefully i'm back in 5-10 minutes :)21:10
@mordred:waterwanders.comnow I'm hungry21:10
@jim:acmegating.comi'm sure he's making enough for everybody21:12
@mordred:waterwanders.comoh good21:12
@fungicide:matrix.orgalways21:19
@clarkb:matrix.orgI think I'm back and my network is functional?21:23
@fungicide:matrix.orgyour bits are reaching me21:28
@mordred:waterwanders.comand also me21:28
@mordred:waterwanders.comI see bits from various people, in fact21:28
@clarkb:matrix.orgexcellent one less thing to worry about now21:28
@clarkb:matrix.orgI think https://review.opendev.org/c/opendev/system-config/+/988993 and https://review.opendev.org/c/opendev/system-config/+/991221 are the two next things on my todo list (they both update zuul executor config management). Then tomorrow morning fungi and I are meeting with the i18n team to try and help them weblate more21:30
@fungicide:matrix.organd anyone else who wants to pitch in, i don't think there's a limit21:36
@fungicide:matrix.orghttps://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/2ZLT3VQL377LRB64SQLKTP33I5XEK7BF/ i18n SIG Virtual Sprint (June 3rd, 14:00 UTC)21:39
