Monday, 2026-02-09

-@gerrit:opendev.org- Zuul merged on behalf of Henrik Wahlqvist: [openstack/project-config] 975154: Add back volvocars repos in volvocars tenant https://review.opendev.org/c/openstack/project-config/+/975154  14:42
@fungicide:matrix.org: attempting to request apache server status on lists.o.o is timing out so i suspect all worker slots are in use yet again. i'm going to restart apache to clear out any that might be hung and haven't timed out yet  15:28
@fungicide:matrix.org: looks like gitea14 may be overloaded; my ssh may even time out trying to log in  15:46
@clarkb:matrix.org: fungi: do we know if any opendev stuff is affected by the setuptools release yet? Some naive greps on my side don't show any pkg_resources hits, so I think as long as the pbr updates we made late last year are happy then we may be ok  15:49
@clarkb:matrix.org: as for web crawler traffic load maybe we should land your docs.opendev.org update and see how that fares?  15:49
@fungicide:matrix.org: no clue, i'm still trying to catch up on morning moderation tasks and outages, haven't even gotten to the e-mail thread about setuptools on openstack-discuss yet  15:50
@clarkb:matrix.org: ack. I suspect if we do have any problems it is in our dependencies and not a direct issue  15:50
@fungicide:matrix.org: when my ssh into gitea14 finally connected, load averages reported on it were lower than on any other backends, fwiw  15:50
@clarkb:matrix.org: fungi: I think a behavior we see with those backends is one will get overloaded by a particularly aggressive client ip, then it gets removed from the pool and slowly returns to normal as it processes out the running requests. While that happens a new server in the pool gets hit since the old one was removed, and we repeat that in a cycle  15:51
@fungicide:matrix.org: i can't see what resource issues could have slowed down my ability to ssh into gitea14 though, cpu and memory look okay, maybe disk i/o is just super, super slow?  15:54
@clarkb:matrix.org: maybe? I think a lot of that git disk content manages to get cached, but if we're processing enough requests to load memory up with "real" data those disk caches get evicted and maybe we become very sensitive to disk io?  15:55
@clarkb:matrix.org: also possible that this isn't a crawler problem and instead some sort of VM hosting / VM problem?  15:55
@clarkb:matrix.org: fungi: for example I think live migrations might exhibit slow network behaviors  15:56
@fungicide:matrix.org: yeah, i'm wondering if there's something pathological with network to/from the server maybe  15:56
@clarkb:matrix.org: I think `openstack client server event list` or similar will give you a history of migrations  15:56
@clarkb:matrix.org: * I think `openstack server event list` or similar will give you a history of migrations  15:56
@fungicide:matrix.org: oh, ipv6 ping is 100% packet loss for me to all the gitea servers, maybe something is happening with v6 routes to vexxhost, though that doesn't explain the problem with gitea14  15:57
@clarkb:matrix.org: fungi: well your ssh connection may have taken time to time out on ipv6 before falling back to ipv4?  15:58
@fungicide:matrix.org: 0% packet loss for ipv4 to gitea14 at least  15:58
@clarkb:matrix.org: that may explain what you saw connecting  15:58
@fungicide:matrix.org: i was able to ssh into the other 5 backends just fine though  15:58
@clarkb:matrix.org: ah  15:59
@fungicide:matrix.org: huh, though `ssh -4` does connect instantly  15:59
@fungicide:matrix.org: ssh over ipv6 is working for me to gitea09, just not icmp echo over ipv6  16:00
@fungicide:matrix.org: so looks like ipv6 ping is broken to all the gitea servers, but ssh over v6 is working to all except gitea14. i guess that's the difference  16:01
@fungicide:matrix.org: and explains the long delay i see logging into that one specifically  16:01
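The slow-login behavior described above is consistent with address-family fallback: a client resolves both AAAA and A records and tries them in order, so a blackholed IPv6 path has to time out before IPv4 is attempted. A minimal sketch of that candidate ordering (the function name is illustrative; real resolution order depends on the local resolver):

```python
import socket

def connect_candidates(host, port):
    # getaddrinfo() returns candidates in the resolver's preference order;
    # OpenSSH walks this list, so a blackholed AAAA entry has to time out
    # before the A record is tried, which shows up as a very slow login.
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return [(socket.AddressFamily(fam).name, addr[0]) for fam, *_rest, addr in infos]

# A numeric address skips DNS entirely, similar to how `ssh -4` sidesteps
# the problem by restricting the address family to AF_INET.
print(connect_candidates("127.0.0.1", 22))
```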
@clarkb:matrix.org: that may also explain the failures that were seen. We can drop the ipv6 A record for opendev.org if we think that is a good idea  16:02
@fungicide:matrix.org: but haproxy is only load balancing to v4 addresses of the backends, so this presumably isn't affecting actual users  16:02
@clarkb:matrix.org: * that may also explain the failures that were seen. We can drop the ipv6 AAAA record for opendev.org if we think that is a good idea  16:02
@clarkb:matrix.org: fungi: right, it would just be the ipv6 connectivity to the haproxy server  16:02
@fungicide:matrix.org: i'm able to ssh into gitea-lb03 over ipv6 just fine though  16:04
@fungicide:matrix.org: so i guess it's not that either  16:04
@clarkb:matrix.org: I'm trying to get some local updates done so that I can reboot, then I'll load keys and can get a second set of eyes on it to see if there is anything obviously wrong. Another thing to check: there is a way to ask haproxy for the active backends that would identify which, if any, had been removed automatically  16:16
@clarkb:matrix.org: `echo "show stat" | sudo socat /var/lib/haproxy/run/stats stdio` says all six backends are up  16:23
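For the record, `show stat` on the haproxy admin socket emits CSV, so checking which backends have been marked down can be scripted. A rough sketch (`pxname`, `svname`, and `status` are columns from haproxy's stats CSV format; the sample data here is invented):

```python
import csv
import io

# haproxy's "show stat" output is CSV whose header line starts with "# ".
SAMPLE = """# pxname,svname,status,scur
balance_git_https,gitea09.opendev.org,UP,12
balance_git_https,gitea14.opendev.org,UP,3
balance_git_https,BACKEND,UP,15
"""

def backend_status(stats_csv):
    # Strip the "# " comment prefix so DictReader sees the real header,
    # then map each server row (skipping the aggregate BACKEND row) to
    # its health status.
    rows = csv.DictReader(io.StringIO(stats_csv.lstrip("# ")))
    return {r["svname"]: r["status"] for r in rows if r["svname"] != "BACKEND"}

print(backend_status(SAMPLE))
```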
@clarkb:matrix.org: system load on all six backends looks fine too  16:23
@clarkb:matrix.org: I don't think I'll bother with a socks proxy to check them all directly. I suspect that whatever issue there was is either no longer an issue or is related to the ipv6 connectivity that you ran into  16:24
@fungicide:matrix.org: yeah, i never saw the gitea problem, stephenfin reported it in #openstack-infra irc  16:25
@fungicide:matrix.org: i've just been trying to track down a potential cause  16:25
@fungicide:matrix.org: (system-config-run-base) `TASK [pip3 : Install latest pip and virtualenv] An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ModuleNotFoundError: No module named 'pkg_resources'` https://zuul.opendev.org/t/openstack/build/4fa10b98b2b844e98786e78b65483bd1  16:45
@fungicide:matrix.org: just noticed that failure  16:45
@clarkb:matrix.org: looks like we're failing to set up the ansible virtualenv there  16:50
@clarkb:matrix.org: because pip or virtualenv themselves depend on pkg_resources?  16:50
@clarkb:matrix.org: `item=virtualenv`  16:50
@clarkb:matrix.org: isn't it ironic that even virtualenv doesn't work  16:51
@clarkb:matrix.org: oh, maybe it is the ansible module checking if the packages are installed via pkg_resources?  16:51
@clarkb:matrix.org: ok I think I get it. We install ansible to a virtualenv. Then later we run ansible out of that virtualenv, and ansible within that virtualenv expects setuptools to have pkg_resources available  16:55
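That matches the traceback: the pip module's installed-package checks went through pkg_resources, which setuptools 82 removed. A hedged sketch of a version lookup that works on either side of the removal (the helper name is mine, not Ansible's):

```python
def installed_version(dist_name):
    """Return the installed version of dist_name, or None if absent."""
    try:
        # pkg_resources was removed from setuptools in release 82, so this
        # import is exactly what started raising ModuleNotFoundError.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(dist_name).version
        except pkg_resources.DistributionNotFound:
            return None
    except ModuleNotFoundError:
        # importlib.metadata is the stdlib replacement (Python 3.8+).
        from importlib.metadata import PackageNotFoundError, version
        try:
            return version(dist_name)
        except PackageNotFoundError:
            return None

print(installed_version("definitely-not-a-real-distribution") is None)
```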
@fungicide:matrix.org: yeah, it looks like it's ansible or the pip module from ansible's stdlib doing it  16:55
@clarkb:matrix.org: I can push a change up that pins setuptools in the ansible venv install there  16:56
@clarkb:matrix.org: I half wonder if bridge updated its venv (can't remember if we do that or not)  16:56
@fungicide:matrix.org: it's likely this was already fixed in ansible a while ago and we're just running an older version  16:57
@clarkb:matrix.org: possible. There is a comment in the install-ansible role indicating that we upgrade the venv once a day  16:58
@clarkb:matrix.org: so it's possible that we are already in a chicken and egg situation here and need to manually fix bridge  16:59
@clarkb:matrix.org: yes, we have setuptools 82 installed in that venv on bridge  17:01
@clarkb:matrix.org: we should check the hourly jobs  17:01
-@gerrit:opendev.org- Clark Boylan proposed: [opendev/system-config] 976149: Pin setuptools in our bridge ansible virtualenv https://review.opendev.org/c/opendev/system-config/+/976149  17:04
@clarkb:matrix.org: I added the setuptools pin/install using the same conditions that cause normal installation to update once a day, so that may automatically update. However, if the problem is in pip itself then we may need to do surgery on bridge to build a new virtualenv by hand (or maybe just move the old virtualenv aside and let it rebuild in the hourly jobs?)  17:06
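The pin itself would amount to something like the following line in the venv's requirements input; the exact form in 976149 may differ, and the `<82` bound is only inferred from setuptools 82 being the release that dropped pkg_resources:

```
# assumed pin: keep pkg_resources importable until nothing imports it
setuptools<82
```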
@clarkb:matrix.org: infra-prod-service-bridge is failing but the other hourly jobs seem to be ok  17:08
@clarkb:matrix.org: I think that is because we're doing containers for so many services now that we aren't doing a lot of pip. So ya, I think maybe we land 976149 or something like it once that passes testing, then see if hourly updates can correct it. If not we can probably move the virtualenv aside and have it rebuilt from scratch on the next hourly run  17:09
@clarkb:matrix.org: that hit a merge failure  17:10
@clarkb:matrix.org: I think github may be having problems; I see a zuul image build hit a 500 error trying to get skopeo. Maybe our ansible dev job which pulls ansible hit something similar. I'll check merger logs  17:11
@clarkb:matrix.org: zm05 handled refstate job 74143b6bdf5c47188ef07cad0ce6a9a9 which failed due to `stderr: 'fatal: unable to access 'https://github.com/pytest-dev/pytest-testinfra/': Operation too slow. Less than 1000 bytes/sec transferred the last 30 seconds'`  17:16
@clarkb:matrix.org: https://www.githubstatus.com/ and they are having a sad  17:17
@clarkb:matrix.org: If there was ever a signal from the universe that I should go do some of that paperwork that I've been putting off, this may be it  17:19
@clarkb:matrix.org: the latest update says git operations are normal again. I'll give it a few more minutes then recheck that change  17:28
@fungicide:matrix.org: https://discuss.python.org/t/106079 is covering the setuptools v82 pkg_resources removal, and also mentions the github outage  17:30
@clarkb:matrix.org: I rechecked and jobs seem to be queued up now  17:35
@clarkb:matrix.org: re upgrading ansible, iirc the problem is that the next version of ansible wants a newer version of python than is on bridge, so we'd need to upgrade bridge or run ansible out of a container or something  17:35
@clarkb:matrix.org: I think as long as this pin works we'll get by reasonably well. And if a pkg_resources package shows up on pypi we can start installing that instead  17:36
@fungicide:matrix.org: agreed  17:43
@clarkb:matrix.org: fungi: that change is in the gate now so I think that did fix it in CI. I don't think it will merge before the next hourly run, but that is ok: it should deploy itself, and if that fails then we can move the venv aside and see if the next hourly run fixes it?  17:58
@fungicide:matrix.org: yep, sounds right  17:59
-@gerrit:opendev.org- Zuul merged on behalf of Clark Boylan: [opendev/system-config] 976149: Pin setuptools in our bridge ansible virtualenv https://review.opendev.org/c/opendev/system-config/+/976149  18:14
@clarkb:matrix.org: that is running the bootstrap bridge job right now. I'll check pip list in that venv when it is done  18:16
@clarkb:matrix.org: doesn't look like that updated the venv. So we need to wait for service-bridge to run first I guess. But I suspect that is in the chicken and egg state  18:18
@fungicide:matrix.org: so 40 minutes, i guess  18:19
@fungicide:matrix.org: or do we run that on deploy?  18:19
@clarkb:matrix.org: we did not run that job as part of the deployment  18:20
@clarkb:matrix.org: hrm, but the bootstrap bridge playbook seems to be what runs install-ansible  18:21
@clarkb:matrix.org: I need to find logs  18:21
@clarkb:matrix.org: oh! the reason is the condition that causes us to update the venv once a day  18:22
@clarkb:matrix.org: fungi: I think I can force it to run on the next hourly run by updating the requirements.txt file on bridge for that  18:22
@clarkb:matrix.org: then we can reenqueue the deployment buildset  18:22
@clarkb:matrix.org: /usr/ansible-venv/requirements.txt has a date in a comment in it. I'll append clarkb to the comment and that should cause the file to change on the next run. Which will run because I reenqueue the buildset  18:24
@fungicide:matrix.org: ah okay  18:25
@clarkb:matrix.org: It is reenqueued  18:26
@clarkb:matrix.org: setuptools 81 is installed now. Now we wait for the next hourly runs to confirm that things are happy again  18:29
@clarkb:matrix.org: The reason that deployment worked is that the ansible env for the zuul executor is what updated the python venv for ansible on bridge. it looks like Ansible 9 and newer fix that via packaging.requirements (though it isn't clear yet if packaging.requirements is installed by default)  18:36
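A sketch of what the packaging-based replacement for pkg_resources version checks looks like, assuming the third-party `packaging` distribution is importable (the requirement string is illustrative):

```python
from packaging.requirements import Requirement
from packaging.version import Version

# Requirement parsing replaces pkg_resources.Requirement.parse()
req = Requirement("setuptools<82")

# SpecifierSet membership replaces pkg_resources' version comparisons
print(Version("81.0") in req.specifier, Version("82.0") in req.specifier)
# -> True False
```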
@clarkb:matrix.org: all that to say I don't think the zuul upgrade Friday/Saturday will break us if the ansible install there includes packaging, which I'll check against the container images shortly  18:37
@clarkb:matrix.org: but I also realize that we could update bridge ansible to ansible 9 or 10, but not 11, with the python versions we have  18:37
@clarkb:matrix.org: so maybe that is a good next step for us, as that will also fix this issue I think  18:37
@clarkb:matrix.org: packaging 26.0 is installed into the zuul ansible 11 venv and the zuul ansible 9 venv in the executor container. So ya, I think we'll be ok on the zuul side even when setuptools updates there  18:42
@clarkb:matrix.org: Looking at https://zuul.opendev.org/t/openstack/buildset/b1e214783230427b924007ab3630868b only the service-bridge job failed (if we ignore image mirroring jobs), so I think once we get this sorted system-config should be happy  18:51
@clarkb:matrix.org: still possible a tool like git review or bindep or something is mad but I haven't seen evidence of that yet  18:51
@clarkb:matrix.org: the hourly service-bridge has succeeded so I think that is fixed for now  19:05
@fungicide:matrix.org: yeah, so far that's the only issue i've seen in our stuff  19:05
@fungicide:matrix.org: so well-prepared, i suppose  19:06
@fungicide:matrix.org: github seems to still be struggling; i just did a `git remote update` on a repository with an origin there, and it hung for several minutes then came back with a http/502 error  19:10
@fungicide:matrix.org: then ran again and it succeeded ~instantly, so seems to be hit-or-miss  19:11
@clarkb:matrix.org: exciting  19:23
@clarkb:matrix.org: I'd like to keep updating container images to trixie (like say gitea) but with the github errors it is probably best to wait a bit  19:24
@clarkb:matrix.org: for tomorrow's meeting agenda I'll add pkg_resources to the agenda. Drop gerrit upgrades. Any other edits to make?  19:25
@clarkb:matrix.org: I've updated the meeting agenda as noted above. I also removed the zuul-registry item as the fixes last week appear to be working (I just rechecked the lodgeit image test change and it was able to find the image build last week without an issue)  23:09
@jim:acmegating.com: huzzah!  23:14

Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!