Thursday, 2025-12-04

clarkbmnasiadka: I left a comment on the zuul-jobs change. I don't know that the change is wrong, but I want to make sure we've got some sort of idea of how we're getting to the end state we want (which this change may be one of potentially many steps towards)00:09
*** liuxie is now known as liushy02:34
mnasiadkaclarkb: commented back05:17
mnasiadkafrickler, tonyb: are you willing to do a second review on https://review.opendev.org/c/opendev/zuul-providers/+/967962 ? I think that should fix the iptables.service not running on rocky hosts05:20
priteauGood morning. We are seeing lots of timeouts in Kayobe CI since yesterday. Are there issues with mirrors?09:09
priteauhttps://zuul.opendev.org/t/openstack/build/44e5376c969d4ca08f1823515c317218/log/primary/ansible/overcloud-deploy#27957-2796409:10
priteauhttps://zuul.opendev.org/t/openstack/build/025eed087a274169b59967b0e74dbed6/log/primary/ansible/seed-deploy-pre-upgrade#1876309:10
priteauhttps://zuul.opendev.org/t/openstack/build/fb560ac624fe4356acb3db9b000fbc6f/log/primary/ansible/overcloud-deploy#37810-3781709:10
priteauI keep seeing rax.opendev.org in all these failures, but it is not always the same region09:11
priteauActually one has just failed with OVH: https://zuul.opendev.org/t/openstack/build/815abbe97dfd4730b0a9c24d6a622c14/log/primary/ansible/overcloud-deploy#27488-2749509:35
priteauSo maybe it isn't an infrastructure issue09:36
mnasiadkapriteau: we don’t see that in Kolla - because we stopped using the local Apache/mod_proxy servers (which you refer to as mirrors) due to issues that we have seen after upgrading to docker-ce 2909:45
priteaumnasiadka: was it the same issue? (timeouts)10:10
mnasiadkapriteau: I think so, or some image layer fetch problems - I don’t remember in detail10:12
priteaumnasiadka: thanks, I will make the same change in kayobe10:44
priteauLooking better with the fix13:43
fungipriteau: mnasiadka: looks like that's a proxy to quay.io? sounds like the mirror servers are not getting responses from there fast enough14:29
fungii wonder if there's something more global going on, i'm getting "no server is available to handle your request" errors from github this morning too14:30
mnasiadkafungi: it might also be docker-ce 29.0 or a new containerd version that is more picky or does things differently - but last time we had these problems, disabling usage of the proxy in the docker config made things work - so Kolla-Ansible has been running like that for a couple of weeks14:34
fungimakes sense. also i think our proxy caches are not well suited for large blobs like images from container registries. there's just not enough local storage to keep enough of the content hot14:35
fungiall the proxies combined share around 60gb (we have htcacheclean set to delete anything more than that so the volume doesn't fill up)14:38
mnasiadkaAh, right - that makes sense14:41
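fungi's point about the roughly 60gb cache budget and large container blobs can be illustrated with a toy model. The sketch below is only an assumption-laden simulation: it uses a simple least-recently-used eviction policy, which is not necessarily how htcacheclean picks what to delete, and the class name, function name, object sizes, and counts are all hypothetical. It shows that when the working set of large objects exceeds the budget, repeated passes over the same objects get no cache hits, while a working set that fits is served from cache after the first pass.

```python
from collections import OrderedDict


class SizeBoundedCache:
    """Toy cache with a total size budget and LRU eviction (an assumption)."""

    def __init__(self, limit_mb):
        self.limit_mb = limit_mb
        self.items = OrderedDict()  # key -> size in MB, oldest first
        self.used_mb = 0
        self.hits = 0
        self.misses = 0

    def fetch(self, key, size_mb):
        """Record one request; return True on a cache hit."""
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if size_mb > self.limit_mb:
            return False  # object can never fit, do not store it
        self.items[key] = size_mb
        self.used_mb += size_mb
        # Evict least recently used entries until we are back under budget.
        while self.used_mb > self.limit_mb:
            _, evicted_mb = self.items.popitem(last=False)
            self.used_mb -= evicted_mb
        return False


def hit_rate(object_size_mb, distinct_objects, rounds=3, limit_mb=60_000):
    """Request the same set of objects in order for several rounds."""
    cache = SizeBoundedCache(limit_mb)
    for _ in range(rounds):
        for key in range(distinct_objects):
            cache.fetch(key, object_size_mb)
    return cache.hits / (cache.hits + cache.misses)


if __name__ == "__main__":
    # 100 objects of 100 MB (10 GB total) fit inside the budget.
    print("small working set:", hit_rate(100, 100))
    # 100 objects of 1000 MB (100 GB total) do not fit.
    print("large working set:", hit_rate(1000, 100))
```

Real traffic is not a strict cycle over the same objects, so actual hit rates would fall somewhere in between; the model only shows why a small budget struggles to keep large image layers hot.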
mnasiadkafungi: need another +2 on https://review.opendev.org/c/opendev/zuul-providers/+/967962 - willing to have a look?14:41
fungimnasiadka: looks fine but what's the reason for doing it on all systemd systems except gentoo? was putting it outside that conditional check problematic?14:44
fungialternatively, since you say it only seems to be happening on rocky, why not just in the conditional check for rhel derivatives?14:45
mnasiadkafungi: we basically get Rocky 10 with firewalld - and it seems iptables.service doesn’t get started at all (even though it’s enabled)14:45
mnasiadkaSure, I can update the change to do it only on RHEL derivatives14:45
fungiright, i understood that much from the commit message, mostly just wondering what the rationale is for the particular conditional you nested it in14:45
fungii would either have expected it to be applied to all systemd systems, or to a narrower set covering the platform where you observed the problem14:46
mnasiadkaThat felt like the most reasonable place without introducing another conditional14:46
fungibut the spot you picked was basically "all systemd systems except gentoo"14:46
mnasiadkaYeah, now I see that :-)14:47
mnasiadkaL54 sounds like all RHEL derivatives sort of14:47
fungiright14:47
mnasiadkaSo, should we have it for RHEL derivatives or for all? I can also make it only for Rocky if it makes any sense14:48
fungii would say move it up between lines 42 and 43 and just apply it to all systemd systems, if we think it could have the potential to happen on other platforms that start installing firewalld in the future14:48
fungibut i could also be convinced that having it more tightly scoped would be safer14:49
opendevreviewMichal Nasiadka proposed opendev/zuul-providers master: nodepool-base: Mask firewalld unit  https://review.opendev.org/c/opendev/zuul-providers/+/96796214:49
mnasiadkaI don’t think firewalld is even installed on any other distro or version, and since we rely on iptables.service - it makes sense it doesn’t get in the way14:50
mnasiadka(And systemctl doesn’t do any checks - just links firewalld.service to /dev/null)14:50
fungilet's see if clarkb agrees when he wakes up14:51
fungithen hopefully we can get it merged14:51
mnasiadkaGreat, thanks14:58
clarkbI think firewalld is installable on debuntu (but ufw is default) then is default on rhel derivatives. Masking it globally seems fine as long as systemd doesn't complain when it isn't present (and apparently it doesn't)15:49
clarkbI've approved it15:49
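For readers unfamiliar with masking, mnasiadka's description above (the unit name is simply linked to /dev/null, with no check that the unit is installed) can be sketched as follows. This is a minimal emulation under stated assumptions, not the change that was merged: the function names are hypothetical, and it works on a temporary directory standing in for the systemd unit directory rather than touching a real system.

```python
import os
import tempfile


def mask_unit(unit_dir, unit_name):
    """Emulate masking: make unit_name a symlink to the null device.

    No check is made that the unit is actually installed, mirroring the
    behaviour described in the discussion.
    """
    link_path = os.path.join(unit_dir, unit_name)
    if is_masked(unit_dir, unit_name):
        return link_path  # already masked, nothing to do
    os.symlink(os.devnull, link_path)
    return link_path


def is_masked(unit_dir, unit_name):
    """Return True if unit_name is a symlink pointing at the null device."""
    link_path = os.path.join(unit_dir, unit_name)
    return os.path.islink(link_path) and os.readlink(link_path) == os.devnull


if __name__ == "__main__":
    # A temporary directory stands in for the real unit directory.
    with tempfile.TemporaryDirectory() as unit_dir:
        mask_unit(unit_dir, "firewalld.service")
        print("firewalld.service masked:", is_masked(unit_dir, "firewalld.service"))
        print("iptables.service masked:", is_masked(unit_dir, "iptables.service"))
```

Because the link is created whether or not firewalld exists on the image, applying it to all systemd platforms is harmless where the package is absent, which matches the reasoning given for approving the change.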
fungiheading out briefly to grab lunch, bbiab16:13
clarkbI will be headed to that doctors appointment shortly as well16:25
opendevreviewClark Boylan proposed opendev/system-config master: Update developer.openstack.org vhost AllowOverride Rules  https://review.opendev.org/c/opendev/system-config/+/96983818:51
opendevreviewClark Boylan proposed openstack/project-config master: Update Jeepyb's Gerrit builds to Gerrit 3.11  https://review.opendev.org/c/openstack/project-config/+/96984619:43
opendevreviewClark Boylan proposed opendev/system-config master: Update infra-prod review and manage-projects deps for new Gerrit  https://review.opendev.org/c/opendev/system-config/+/96984719:49
clarkbthose two changes are being staged for after the gerrit upgrade and shouldn't be landed just yet. I've put them on the etherpad notes so that we don't forget though19:50
opendevreviewMerged opendev/system-config master: Update developer.openstack.org vhost AllowOverride Rules  https://review.opendev.org/c/opendev/system-config/+/96983820:03
clarkbfungi: ^ that applied to apache's sites enabled content but the server was not restarted (it may have been reloaded, I'm not sure yet). Do we know if a restart is necessary to pick that up?20:31
fungiusually a reload should be sufficient for any vhost configuration20:32
clarkbok I'll see if bridge logs indicate that a reload was performed20:32
clarkbRUNNING HANDLER [static : Reload apache2] changed: [static02.opendev.org] "changed": true "ActiveExitTimestamp": "Wed 2025-11-19 18:55:26 UTC"20:36
clarkbI believe it did reload so if we think that is sufficient we should be all set20:37
clarkband redirects continue to work for me20:37
clarkblooking at the apache docs ExtendedStatus (in the same doc file as AllowOverride) indicates it cannot be updated with a graceful restart. But AllowOverride doesn't say anything about it so I suspect that it is fine with a reload20:41
fungiargh, 967962 got killed by a single arm64 image timeout21:19
fungino, wait, two of them21:19
fungiduring the same task we were seeing be really slow previously, where i suspected an implicit background fsync blocking the next write21:21
fungior at least one of them did. the other timed out closer to the end of the job but that same fstab copy took ~97 minutes to complete21:22
fungiis there any more interest in merging https://review.opendev.org/968386 so we can start collecting dstat data?21:23
fungiwould have been great to know if the node was steadily writing to disk during that 97 minutes of silence21:24
clarkbfungi: I've approved it now21:35
clarkbit == the dstat change21:35
fungicool, thanks!21:35
