Monday, 2026-05-18

-@gerrit:opendev.org- Sei Sano proposed: [opendev/irc-meetings] 988910: Update Masakari IRC meeting ... https://review.opendev.org/c/opendev/irc-meetings/+/98891006:46
@mnasiadka:matrix.orgThere is some slowness in git clone over https from opendev.org - as in `error: RPC failed; curl 28 Operation too slow. Less than 1000 bytes/sec transferred the last 30 seconds`09:46
-@gerrit:opendev.org- Zuul merged on behalf of Elod Illes: [openstack/project-config] 988430: [release-tools] Fix dist_name fetch for upper bump https://review.opendev.org/c/openstack/project-config/+/98843010:38
-@gerrit:opendev.org- Stephen Finucane proposed:12:52
- [opendev/git-review] 987712: Revert "Clean up all references to branchauthor after removal of usage" https://review.opendev.org/c/opendev/git-review/+/987712
- [opendev/git-review] 987713: Add gitreview.autotopic git config flag https://review.opendev.org/c/opendev/git-review/+/987713
- [opendev/git-review] 988966: Use configparser utils to parse types https://review.opendev.org/c/opendev/git-review/+/988966
-@gerrit:opendev.org- Elod Illes proposed: [openstack/project-config] 988968: Fix publish-openstack-releasenotes-python3 job https://review.opendev.org/c/openstack/project-config/+/98896812:58
-@gerrit:opendev.org- Stephen Finucane proposed:13:02
- [opendev/git-review] 988966: Use configparser utils to parse types https://review.opendev.org/c/opendev/git-review/+/988966
- [opendev/git-review] 987712: Revert "Clean up all references to branchauthor after removal of usage" https://review.opendev.org/c/opendev/git-review/+/987712
- [opendev/git-review] 987713: Add gitreview.autotopic git config flag https://review.opendev.org/c/opendev/git-review/+/987713
@fungicide:matrix.orgmnasiadka: i'm seeing slowness too, i wonder if it's because an openstack deployment project just made a release recently, or whether the crawlers have figured out how to bypass anubis finally13:24
@fungicide:matrix.orgi was able to clone openstack/nova at 5.35 MiB/s but there was quite a delay before the service responded, suggesting apache worker slots may be in short supply... looking now13:25
@fungicide:matrix.orgnope, not that, apache worker slots are almost entirely unused on all backends, load averages are sub-1.0 too13:29
@fungicide:matrix.organd i'm not seeing any packet loss to the lb over ipv4 or ipv613:30
@fungicide:matrix.orglooks like my connections are being balanced to gitea11 currently13:32
@fungicide:matrix.orgdigging in the apache logs on gitea11 i'm not seeing anything that stands out, nor in the haproxy logs on gitea-lb0313:42
@fungicide:matrix.orgprobably unrelated, letsencrypt renewals have been broken for a few days now, i'll see if i can track down the cause13:46
@fungicide:matrix.orglikely connected to the ansible upgrade on bridge, given the timing13:46
@fungicide:matrix.org`-rw-r--r-- 1 root root 431076 May 13 03:04 /var/log/ansible/letsencrypt.yaml.log`14:05
@fungicide:matrix.orgso yeah, the playbook hasn't run for the past 5 days14:05
@fungicide:matrix.orgthat last run was earlier on the same day that we upgraded ansible on bridge14:07
@fungicide:matrix.orgthat also coincides with our daily infra-prod-base job failing: https://zuul.opendev.org/t/openstack/builds?job_name=infra-prod-base&project=opendev/system-config14:10
@fungicide:matrix.orgaha, it's trying to update translate-dev01.openstack.org which has too-old python14:11
@fungicide:matrix.orgalso translate01.openstack.org and storyboard01.opendev.org14:12
@fungicide:matrix.orgi'll add them to the disable list for now while we discuss further14:12
@clarkb:matrix.orgfungi: I made a note about translate and storyboard on saturday. I think we need to force ansible to use python2 on those hosts via our inventory. The autodetection finds python3 which is too old for ansible 914:44
@clarkb:matrix.orgmnasiadka: fungi: if the backends are happy then the issue could be that the frontend is getting overwhelmed14:44
@clarkb:matrix.orghttps://grafana.opendev.org/d/1f6dfd6769/opendev-load-balancer?orgId=1&from=now-6h&to=now&timezone=utc looks ok though14:44
@clarkb:matrix.organd it seems to be responsive for me at the moment. So either it's backend specific or the window of time where this was happening has passed?14:47
-@gerrit:opendev.org- Clark Boylan proposed: [opendev/system-config] 988992: Force some hosts to use python2 for compat with Ansible 9 https://review.opendev.org/c/opendev/system-config/+/98899215:04
@fungicide:matrix.orgmaybe it's something on my end, but if i `git clone https://opendev.org/openstack/nova` i see `Cloning into 'nova'...` immediately followed by nothing for 43 seconds before `remote: Enumerating objects:` appears15:10
@fungicide:matrix.orgthen it proceeds to actually download at a reasonable pace15:10
@clarkb:matrix.orgI tested with system-config maybe it is repo specific?15:11
@clarkb:matrix.orgnova does have a ton of refs and maybe it's spinning on the backend preparing the pack files etc to supply to the client15:11
@fungicide:matrix.orgdoesn't matter if i add `-4` or `-6` either, so seems to be the same for either address family15:11
@fungicide:matrix.orgif i clone opendev/bindep instead, the pause there is more like 1s15:12
@fungicide:matrix.orgso yeah, seems to perhaps be related to repository size15:12
@fungicide:matrix.orgmaybe it's taking gitea a long time to read in the files?15:13
@clarkb:matrix.orgya I think maybe we should try to profile a specific git clone of say nova against a specific backend and see where that may be slow15:15
-@gerrit:opendev.org- Clark Boylan proposed: [opendev/system-config] 988993: Set kernel.yama.ptrace_scope to 2 on executors https://review.opendev.org/c/opendev/system-config/+/98899315:17
@fungicide:matrix.orgfwiw, doesn't look like the backend i'm hitting has any memory pressure either, though the kernel does report 90% of ram is consumed for buff/cache15:19
@clarkb:matrix.orgI think we have a repacking cronjob on the giteas like we do for gerrit and that is done for performance reasons. Maybe we should check that is working?15:20
@clarkb:matrix.orgbasically check the cron isn't failing and then also inspect if it looks like we've got thousands of loose refs that we don't expect15:20
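The loose-ref inspection suggested above could be done with a small script run against a repository directory on a gitea backend. This is a minimal sketch, assuming the repository uses git's default files ref backend (loose refs stored as files under `refs/`, packed refs stored as lines in `packed-refs`). The function name `count_refs` is hypothetical and is not part of any OpenDev tooling.

```python
import os


def count_refs(git_dir):
    """Return (loose, packed) ref counts for a bare git repository directory.

    Assumes the files ref backend: each loose ref is one file under refs/,
    and each packed ref is one "<sha> <refname>" line in packed-refs.
    """
    loose = 0
    for _root, _dirs, files in os.walk(os.path.join(git_dir, "refs")):
        loose += len(files)

    packed = 0
    packed_path = os.path.join(git_dir, "packed-refs")
    if os.path.exists(packed_path):
        with open(packed_path) as fh:
            for line in fh:
                # Skip the "# pack-refs with:" header and "^<sha>" peeled-tag lines.
                if line.strip() and not line.startswith(("#", "^")):
                    packed += 1
    return loose, packed
```

A loose count in the thousands alongside a small packed count would hint that the repack cron is not doing its job. The on-disk path of the repositories on the gitea servers is not given in this log, so it is left to the caller.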
@clarkb:matrix.orgbut also if you initiate a git clone and it's taking that long you may also be able to directly observe what is happening and strace it or something15:21
@clarkb:matrix.orgmight take a couple of attempts but should be doable?15:21
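Before reaching for strace, the pause described above (the gap between `Cloning into 'nova'...` and `remote: Enumerating objects:`) can be measured from the client side by timestamping each chunk of output as it arrives. The following is a rough sketch, not anything used in the channel; the helper name `timestamp_output` is hypothetical.

```python
import subprocess
import time


def timestamp_output(cmd):
    """Run cmd and return [(seconds_since_start, text_chunk), ...].

    stderr is merged into stdout because git writes its progress to stderr.
    read1() returns as soon as any bytes are available, so each entry
    records when that piece of output first showed up.
    """
    start = time.monotonic()
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    events = []
    while True:
        chunk = proc.stdout.read1(4096)
        if not chunk:  # EOF: the command closed its output
            break
        events.append((time.monotonic() - start, chunk.decode(errors="replace")))
    proc.wait()
    return events
```

For example, `timestamp_output(["git", "clone", "--progress", "https://opendev.org/openstack/nova", "/tmp/nova-timing"])` should show a large jump between the first two timestamps if the backend is slow to start responding. `--progress` is needed because git suppresses progress output when stderr is not a terminal; the target directory is an arbitrary placeholder.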
@fungicide:matrix.orgtrying to work out the ssh permission error in system-config-run-base-arm64 for my mixed architecture change, i see we don't collect `/var/log/auth.log`, should we start? or would it make more sense for me to just hold a node?15:43
@clarkb:matrix.orgIt might make sense to hold the nodes15:59
@fungicide:matrix.orgi'll just do that, then16:04
@clarkb:matrix.orgfungi: https://review.opendev.org/c/opendev/system-config/+/988992 passed testing but I don't think it actually exercises the code path16:34
@clarkb:matrix.orgif we proceed with ^ we'll want to remove the hosts from the emergency file so that the deployment actually affects those servers16:34
@fungicide:matrix.orgyep16:39
@fungicide:matrix.orgi can do that shortly, thanks!16:40
@mnasiadka:matrix.org988992 looks legit, although I wasn't aware Ansible is so funny, that it supports 2.7 but not some older 3.x ;-)17:03
@mnasiadka:matrix.org(but Kolla-Ansible is way forward in Ansible versions)17:03
@clarkb:matrix.orgone thing I learned after my networking issues on Friday is that tumbleweed has moved/is moving from apparmor to selinux. This partially explains the lack of testing that broke things. My laptop which is a newer install is all selinux and no apparmor17:06
@clarkb:matrix.orgso I've been digging around in documentation and trying to decide if I convert my desktop. I probably should at this point but maybe that is another Friday activity rather than a monday activity17:07
@clarkb:matrix.orgLooks like the process is to install all the selinux tools and tell the kernel to boot with selinux rather than apparmor enabled. First boot sets a flag to auto relabel everything and you boot into permissive mode first. If that all looks ok then you switch to enforcing and reboot again. Straightforward but considering fundamental items like networking can break I don't want to do that on Monday17:09
@clarkb:matrix.orgI'm holding new gerrit upgrade test nodes to retest things with the latest images. Looking at a calendar June 5 is looking like it might be good for the 3.13 upgrade?17:21
@clarkb:matrix.orgI'll bring that up in tomorrow's meeting but would be curious if June 5 works for others17:22
@fungicide:matrix.orgokay, i've taken the three older servers i added this morning back out of the disable list, and have approved 988992 now17:26
@fungicide:matrix.orgClark: okay, bit of a head-scratcher on the multi-arch job failure...17:41
@fungicide:matrix.orgi can see in the auth log on noble that the root login from bridge99 is being refused, and indeed `/root/.ssh/authorized_keys` has the wrong ipv6 address for bridge99 (though the correct ipv4 address) and the connection is being attempted over ipv617:42
@fungicide:matrix.orgthe zuul inventory doesn't have any record of the ipv6 address that's in the local config, for that matter17:43
@fungicide:matrix.orgbut also the inventory shows a null v6 address for bridge99 as `public_ipv6: ''` even though it was booted in rax-dfw17:44
@fungicide:matrix.orgseparately, i wonder if this would also be broken in cases where bridge99 ended up in a provider with no ipv6 at all while the inventory listed raw v6 addresses in `ansible_host` for the arm64 nodes in the inventory17:46
@fungicide:matrix.orgsince it seems like it wants to ssh from bridge99 to `root@$ansible_host`17:47
@fungicide:matrix.orgthe ip addresses of the held nodes are listed in https://zuul.opendev.org/t/openstack/build/604e9936e7c14b7cad823ed72d7ef30d/log/zuul-info/inventory.yaml if you want to take a look17:48
@fungicide:matrix.orgaha, the mystery ipv6 address in `/root/.ssh/authorized_keys` actually belongs to production bridge0117:50
@fungicide:matrix.orgso probably it got filled from a dns lookup fallback due to the empty `public_ipv6` for bridge99>17:50
@clarkb:matrix.orgThough I thought that is why we use bridge99 instead of the actual name to better decouple that stuff17:51
@fungicide:matrix.orgbut maybe we test for whether the address exists?17:52
@clarkb:matrix.orgBut ya maybe we just aren't handling the no ipv6 case since the prod bridge has ipv6 but how did this work when bridge was arm which has no ipv6?17:52
@fungicide:matrix.orgthe arm nodes do have ipv6 addresses17:52
@fungicide:matrix.organd the amd node *should* in theory because it's in rax classic, but for some reason it's empty17:52
@clarkb:matrix.orgOh I think arm nodes having ipv6 may be new17:53
@fungicide:matrix.orgyeah, possible17:53
@fungicide:matrix.orgregardless, and to clarify, the amd64 bridge99 *does* in fact have a working global ipv6 address and is using it to connect from rax-dfw to the arm64 nodes in osuosl-regionone, but the zuul/ansible *inventory* doesn't include it17:55
@clarkb:matrix.orghttps://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/base/server/tasks/main.yaml#L32-L3917:56
@clarkb:matrix.orgthis is where we set up that authorized key rule17:56
@fungicide:matrix.orgif it connected over ipv4 the tests would probably work, since the correct ipv4 address for bridge99 is allowed17:56
@clarkb:matrix.orgfungi: there are two levels of ansible here. The one that zuul runs to execute the job and the nested level that runs our ansible playbooks and I think the same one testinfra tests use17:57
@clarkb:matrix.orgit's the nested inventory that is the problem right?17:57
@clarkb:matrix.orghttps://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/base/server/defaults/main.yaml#L2 and this explains why we get the prod ip addr when nothing else is defined (it's the default for that var)17:58
@fungicide:matrix.orgprobably, i'm looking to see if i can find where we save a copy of the nested ansible's inventory17:58
@clarkb:matrix.orghttps://opendev.org/opendev/system-config/src/branch/master/playbooks/zuul/run-base.yaml#L105 this is where we try to set it in the test env's nested ansible config I think17:59
@fungicide:matrix.orghere it is: https://zuul.opendev.org/t/openstack/build/604e9936e7c14b7cad823ed72d7ef30d/log/bridge99.opendev.org/etc/ansible/hosts/group_vars/all.yaml17:59
@fungicide:matrix.organd yeah, it contains `bastion_ipv4` but no mention of `bastion_ipv6`18:00
@clarkb:matrix.orghttps://opendev.org/opendev/system-config/src/branch/master/playbooks/zuul/templates/group_vars/all.yaml.j2#L1-L9 which should write out this file to override the prod defaults18:01
@fungicide:matrix.orgso i suspect `public_ipv6` being an empty string in the zuul inventory bridge99 entry is probably resulting in an undefined `bastion_ipv6` in the nested vars18:01
@clarkb:matrix.orgok so https://opendev.org/opendev/system-config/src/branch/master/playbooks/zuul/run-base.yaml#L105 is not writing out the ipv6 address we expect18:01
@clarkb:matrix.orgfungi: yes I think that is correct18:01
@clarkb:matrix.orgfungi: and then since we do actually have ipv6 on those hosts they use ipv6 and we fail. The expectation is that if we don't have an ipv6 addr listed in facts that we don't have working ipv6 at all and it will be fine18:02
@clarkb:matrix.orgso is this an openstacksdk or zuul launcher level bug?18:02
@clarkb:matrix.orgthese addresses come from the openstack APIs not ansible fact gathering I think18:03
@clarkb:matrix.orgwhat is weird is the base job is working on the non mixed setup I think18:04
@clarkb:matrix.orgbut let me see if I can find one that also had the bridge in rax18:04
@fungicide:matrix.orgwell, for the non-mixed job zuul will try to put all the nodes in the same provider right? so in that case none of them will have v6 addresses in the zuul inventory in theory, and the nested ansible will refer to them all by their v4 addresses instead?18:06
@clarkb:matrix.orgfungi: same issue here: https://zuul.opendev.org/t/openstack/build/ea9eddfb5ee6403384d5a445e48c81c8/log/zuul-info/inventory.yaml but the job passes18:06
@clarkb:matrix.organd yes I think that explains it. When everything is one provider there is no mixing of ipv6 and ipv4 and we just sidestep the issue18:07
@clarkb:matrix.orgI suspect this problem is openstacksdk returning the wrong values for rax classic now18:07
@fungicide:matrix.orgbut separately, what will happen if an amd64 bridge99 ends up in openmetal-iad3 which has no ipv6 and then tries to ssh to arm64 nodes in osuosl-regionone? will it refer to them by their v6 addresses and error out with no route to host/invalid address family/whatever?18:09
@clarkb:matrix.orgit depends on how we write out the inventory using the write-inventory role here: https://opendev.org/opendev/system-config/src/branch/master/playbooks/zuul/run-base.yaml#L7318:11
@clarkb:matrix.orgfungi: https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/write-inventory/library/write_inventory.py writes that out which is a little annoying because it isn't an easy template file to just read18:13
@clarkb:matrix.orgfungi: I think it is taking the current inventory (from the zuul ansible) and writing it out to a file that the nested ansible will use18:14
@clarkb:matrix.orgwhich preserves the ansible_host values from the zuul ansible inventory18:15
@clarkb:matrix.organd it looks like we prefer ipv6 if present so ya I think this will break in both directions18:15
@clarkb:matrix.orgso there are two bugs. The first is empty ipv6 addrs on hosts that do have ipv6 possibly as a problem with the openstack apis or openstacksdk or the zuul launcher. Then second we need to ensure that we only use ipv6 if all hosts can use ipv6 and vice versa with ipv4 when writing the nested inventory18:16
@clarkb:matrix.orgfixing the second thing will paper over the first thing18:16
@fungicide:matrix.orgokay, so 1. openstacksdk seems to not be finding rackspace classic ipv6 addresses recently, 2. our test `authorized_keys` entry shouldn't fall back to production addresses when test addresses are empty, 3. we need to figure out how we make the jobs work when bridge is in a provider with no ipv6 while other nodes have v6 addresses18:17
@clarkb:matrix.orgfungi: I think 2 may be ok since in theory we should be ignoring those network families if we don't have them (but we're not because we do have them we're just mistaken about it)18:18
@clarkb:matrix.orgfor 3. I suspect that we can add a new task after this task: https://opendev.org/opendev/system-config/src/branch/master/playbooks/zuul/run-base.yaml#L73 that checks the inventory's ansible_host value for each host and if they aren't all ipv6 sets them to ipv4 or something like that18:19
@clarkb:matrix.orgbasically do the current write-inventory step to translate things directly. Then do a cleanup pass18:19
@clarkb:matrix.orgalternatively: zuul-launcher intends to support mixed provider nodesets. Maybe the actual bug here should be fixed in zuul-launcher itself18:21
@clarkb:matrix.orgthough that is a bit of a stretch since zuul's ansible can talk to all of them as is18:21
@clarkb:matrix.orgbut maybe this is a "make things easier for zuul users" item18:21
@fungicide:matrix.orgas in have zuul-launcher omit v6 addresses from the inventory if not all nodes have them?18:21
@clarkb:matrix.orgyup18:22
@fungicide:matrix.org(and same for v4 addresses i suppose, to support the v6-only case)18:22
@clarkb:matrix.orgor at least don't set ansible_host to ipv6 (maybe still record it in the inventory but don't make it the default connection type)18:22
@fungicide:matrix.orgoh, that sounds a little more reasonable18:23
@fungicide:matrix.organd i suppose there's also a corner case that would need to be covered: some nodes have only ipv4 addresses while others have only ipv618:23
@fungicide:matrix.orgnot that it would ever happen for us, but there could be some zuul deployments like that18:23
@clarkb:matrix.orgyes, which I don't think zuul can solve in inventory. That would have to come with mixed nodeset partitioning18:24
@clarkb:matrix.orglike don't mix this group with that group graph coloring18:24
@fungicide:matrix.orgi suppose the logic could work like this: if not all nodes have ipv6 addresses, then the nodes which do have both ipv6 and ipv4 addresses should use their ipv4 address for `ansible_host`18:25
@clarkb:matrix.orgyes or more generically: if all hosts have an ipvX address and not an ipvY address then ansible_host should be set to use ipvX for all hosts18:26
@fungicide:matrix.orgthen the all-dual-stack and all-v6-only cases would still work the way they do now, as would the some-v6-only while others v4-only case18:26
@fungicide:matrix.orgwell, my point was that the current v6 preference means that v6-only hosts will still get their v6 address as host anyway, so that doesn't need to change18:27
@fungicide:matrix.orgthere wouldn't be an actual need to "fall back" from missing v4 to present v618:28
@clarkb:matrix.orggot it18:28
@fungicide:matrix.orgso don't even need to test for that18:28
@clarkb:matrix.orgso something like: if you only have one ipvX or ipvY address use that. If all hosts share either available ipvX or ipvY but not all hosts have both then prefer the shared version. Finally prefer ipv6 if all else fails?18:29
@clarkb:matrix.organyway I think we can encode something like that in our run-base.yaml playbook as a followup to the initial inventory copy over18:29
@clarkb:matrix.orgit might look ugly in ansible but should be doable18:29
@fungicide:matrix.org"if at least one node has no v6, every node which has v4 should prefer it"18:29
@clarkb:matrix.org++18:30
@fungicide:matrix.orgthat should be the simplest logical encoding18:30
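The rule quoted above ("if at least one node has no v6, every node which has v4 should prefer it") can be written down in a few lines. This is only an illustrative sketch in Python, not the Ansible tasks that ended up in change 989022. The `public_ipv6` key matches the inventory field mentioned earlier in the discussion, `public_ipv4` is assumed by analogy, and the function name `choose_ansible_hosts` is hypothetical.

```python
def choose_ansible_hosts(nodes):
    """Map each node name to the address the nested ansible should connect to.

    nodes: {name: {"public_ipv4": str, "public_ipv6": str}} where an empty
    string or missing key means the inventory has no address of that family.
    """
    # Step 1: is there at least one node with no v6 address recorded?
    any_missing_v6 = any(not n.get("public_ipv6") for n in nodes.values())

    chosen = {}
    for name, n in nodes.items():
        v4 = n.get("public_ipv4")
        v6 = n.get("public_ipv6")
        if any_missing_v6 and v4:
            # Step 2: downgrade every node that has a v4 address to v4.
            chosen[name] = v4
        else:
            # Otherwise keep the existing preference: v6 first, then v4.
            chosen[name] = v6 or v4
    return chosen
```

With this encoding the all-dual-stack and all-v6-only cases behave as they do today, and a v6-only node keeps its v6 address even when other nodes are downgraded, so no separate "fall back from missing v4" test is needed.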
@clarkb:matrix.organd then if zuul wants to make it easier for people by encoding these rules a level higher we can drop what we do in run-base.yaml18:31
@clarkb:matrix.orgbut step 0 seems like updating run-base.yaml to do this18:31
@fungicide:matrix.orgis there a convenient tool for adjusting the values in that `gate-hosts.yaml` file, or do we need to make an external script/module for that?18:34
@clarkb:matrix.orgfungi: I think ansible can load the yaml, then you can change values, then you can write it back again18:35
@clarkb:matrix.orgone problem may be having that pollute the local inventory18:35
@clarkb:matrix.orgI don't know if we can "namespace" things18:35
@clarkb:matrix.orgya looks like you can just load vars and set a new variable that way18:36
@clarkb:matrix.org* ya looks like you can just include vars and set a new variable that way18:36
@fungicide:matrix.orgick, https://zuul.opendev.org/t/openstack/build/604e9936e7c14b7cad823ed72d7ef30d/log/bridge99.opendev.org/gate-hosts.yaml is all one very long line of yaml18:37
@clarkb:matrix.orghttps://opendev.org/zuul/zuul-jobs/src/branch/master/roles/update-json-file/tasks/main.yaml this is how the docker roles update docker related json files18:38
@clarkb:matrix.orgfungi: that shouldn't matter too much if we use an approach like ^18:38
@clarkb:matrix.orgbasically load the file and loop over the entries and check if ipv6 is set for all of them. If not then in a new loop pass update ansible_host to the public_v4 value for the current item18:39
@clarkb:matrix.orgfinally write the file back out again if we made changes18:39
@fungicide:matrix.orgyeah, if we use something that groks yaml18:39
@clarkb:matrix.orgyes https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/update-json-file/tasks/main.yaml#L15 should work if you change from_json to from_yaml iirc18:40
@clarkb:matrix.orghttps://docs.ansible.com/projects/ansible/latest/collections/ansible/builtin/from_yaml_filter.html18:40
-@gerrit:opendev.org- Zuul merged on behalf of Clark Boylan: [opendev/system-config] 988992: Force some hosts to use python2 for compat with Ansible 9 https://review.opendev.org/c/opendev/system-config/+/98899218:51
@clarkb:matrix.orginfra-prod-base succeeded in deploy for ^19:14
@clarkb:matrix.orgit is issuing certs now so deployment isn't done yet19:14
-@gerrit:opendev.org- Clark Boylan proposed: [opendev/system-config] 989022: Handle mixed provider nodests and ipv6 availability https://review.opendev.org/c/opendev/system-config/+/98902219:21
@clarkb:matrix.orgfungi: ^ that is totally untested, but I think captures the general shape of what we're looking for19:21
@clarkb:matrix.orgfungi: I think having both static02 and static03 trying to renew certs may be creating problems. static03 failed with `security.openstack.org:Verify error:{"type":"urn:ietf:params:acme:error:serverInternal","detail":"Unable to validate JWS","status": 500}` but static02 succeeded. I'm reading the interval error as being maybe we tried to issue a cert for that name too many times in short succession?19:24
@clarkb:matrix.orgthough we've done this with 6 giteas for a long time19:24
@clarkb:matrix.orgso I don't know maybe that is wrong19:24
@clarkb:matrix.orgoh it's internal not interval so ya I'm probably reading that wrong19:24
@fungicide:matrix.orgah thanks! i was still trying to figure out the syntax for making ansible find hosts that were missing v6 addresses, i.e. step #1 of the process19:25
@fungicide:matrix.orgyeah, i'll push up a change now to drop static02 and static04 from our production inventory, then work on cleanup once that lands19:26
@clarkb:matrix.orgI think all of the certs that succeeded do get new values on disk but we maybe didn't reload the vhosts19:27
@clarkb:matrix.orgso in theory this will either self heal this evening during the next daily runs, which will auto reload apache configs to pick up the new certs, or it will continue to fail and we can intervene further19:27
@clarkb:matrix.organd now lunch19:28
-@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org proposed: [opendev/system-config] 989023: Drop static02 and static04 from inventory https://review.opendev.org/c/opendev/system-config/+/98902319:33
-@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org proposed: [opendev/system-config] 989023: Drop static02 and static04 from inventory https://review.opendev.org/c/opendev/system-config/+/98902319:34
@fungicide:matrix.orgforgot to reset the head initially19:34
-@gerrit:opendev.org- Clark Boylan proposed: [opendev/system-config] 989022: Handle mixed provider nodests and ipv6 availability https://review.opendev.org/c/opendev/system-config/+/98902219:37
@clarkb:matrix.orgthere was a small bug that testing caught immediately. I'm hoping that the next failure is a big bug :)19:38
@clarkb:matrix.orgok really eating lunch now19:38
-@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org proposed: [opendev/zone-opendev.org] 989024: Remove records for static02 and static04 https://review.opendev.org/c/opendev/zone-opendev.org/+/98902419:39
-@gerrit:opendev.org- Clark Boylan proposed: [opendev/system-config] 989022: Handle mixed provider nodests and ipv6 availability https://review.opendev.org/c/opendev/system-config/+/98902219:49
-@gerrit:opendev.org- Stephen Finucane proposed:19:51
- [opendev/git-review] 988966: Use configparser utils to parse types https://review.opendev.org/c/opendev/git-review/+/988966
- [opendev/git-review] 987712: Revert "Clean up all references to branchauthor after removal of usage" https://review.opendev.org/c/opendev/git-review/+/987712
- [opendev/git-review] 987713: Add gitreview.autotopic git config flag https://review.opendev.org/c/opendev/git-review/+/987713
-@gerrit:opendev.org- Clark Boylan proposed: [opendev/system-config] 989022: Handle mixed provider nodests and ipv6 availability https://review.opendev.org/c/opendev/system-config/+/98902220:05
-@gerrit:opendev.org- Clark Boylan proposed: [opendev/system-config] 989022: Handle mixed provider nodests and ipv6 availability https://review.opendev.org/c/opendev/system-config/+/98902220:22
@clarkb:matrix.orgfungi: I have approved the static02 and static04 inventory removal change20:24
@fungicide:matrix.orgi guess if we land 989023 to remove the unused static servers from our inventory, that will re-exercise the let's encrypt playbook in deploy and hopefully get through all the depending jobs20:24
@clarkb:matrix.orgyup20:24
@fungicide:matrix.orggreat timing20:24
@clarkb:matrix.orgin 989022 I'm struggling with getting all the type conversions and attribute lookups to line up. But hopefully each update is a small bit of progress20:26
@jim:acmegating.comClark: what arm nodes do we have?20:27
@fungicide:matrix.orgwe test on arm in osuosl because we run a mirror server there20:28
@fungicide:matrix.orgso the system-config-run-mirror job20:28
@clarkb:matrix.orgyes I think that is the only arm node left in production. Everything else is a test node20:29
@fungicide:matrix.orger, system-config-run-mirror-arm64 technically20:29
@fungicide:matrix.organd also system-config-run-base-arm64 of course20:29
@jim:acmegating.comgot it (and that doesn't require arm, it's just the only thing available in that region)20:30
@fungicide:matrix.orgright. if it were a mixed-architecture region we could run an amd64 mirror serving the arm64 nodes located there20:30
@fungicide:matrix.orgbut since it's the only architecture in that region we want to make sure we can continue to deploy and manage a mirror server in it20:31
@jim:acmegating.comdo we have any ipv6 only clouds?20:31
@fungicide:matrix.orgalso makes for an interesting test-case for zuul-launcher's new cross-provider nodeset capability20:32
@fungicide:matrix.orgwe do not currently have any ipv6-only clouds, no20:32
@fungicide:matrix.orgthough earlier we discussed the potential challenges with booting an ipv4-only bridge99 and some other node in an ipv6-only provider20:32
@fungicide:matrix.orgor vice versa20:33
@jim:acmegating.comthen i think the thing to do kind of depends on what we want to test; if we want the test to exercise real-world v4->v6 bridge->server connectivity, then we'll need something like clark's change, so that we make the most of whatever the clouds give us.  i have doubts whether we care about this very much though, since it's luck of the draw whether we get v6 capable server nodes.  if we don't care about that so much then:20:34
@fungicide:matrix.orgftr, this initially came up because https://review.opendev.org/c/opendev/system-config/+/988698 is attempting to work around lack of arm wheels for some python packages ansible needs, which resulted in the ansible v9 upgrade breaking the arm jobs20:34
@jim:acmegating.coma zuul-ish way of solving this would be to turn off v6 on all our clouds.  possibly scoped to just some system-config specific labels, or potentially globally.20:34
@jim:acmegating.com(by "turn off v6" i mean "tell zuul-launcher not to use v6 addresses")20:35
@fungicide:matrix.orgcorvus: yeah, we also talked about maybe making zuul-launcher smart enough to set the v4 address as preferred on any dual-stack node when at least one node in the nodeset is v4-only20:35
@clarkb:matrix.orgthere is also the issue of empty ipv6 addresses20:36
@jim:acmegating.comthat's something worth considering -- but it also means inconsistent behavior from a given provider20:36
@fungicide:matrix.orgright, the missing address is a separate (likely openstacksdk) bug which seems to just be affecting rackspace classic20:36
@jim:acmegating.comlike, you get v6 unless you get something from another provider in which case you get v4.  convenient for this case, but is it universally sensible?20:37
@fungicide:matrix.orgi suppose it depends on how likely most people's multi-node jobs are to want all the nodes to be able to communicate with each other via their ansible host identifiers20:38
@fungicide:matrix.org(specifically in heterogenous multi-provider-aware deployments with nodesets needing resources from more than one provider)20:39
@jim:acmegating.comi'm arguing there's an extra clause that belongs in that statement: "and they also do not want to disable ipv6 on the labels involved"20:39
@fungicide:matrix.orgso seems like the potential target audience for such an optimization is vanishingly small and maybe "just opendev"20:39
@jim:acmegating.comyes, i would bet a nickel that all versions of this qualifier (yours and mine) match only opendev.  :)20:40
@fungicide:matrix.organyway, that's why we started with a workaround in our own playbooks20:41
@jim:acmegating.comincidentally, that's probably the thing that would convince me it's okay to have zuul-launcher downgrade the protocol: there's probably no other use case for this configuration than opendev :)20:42
@jim:acmegating.com(it's a coin flip which behavior is "better", and there's only one user who cares)20:43
-@gerrit:opendev.org- Clark Boylan proposed: [opendev/system-config] 989022: Handle mixed provider nodests and ipv6 availability https://review.opendev.org/c/opendev/system-config/+/98902220:44
@clarkb:matrix.orgfwiw I think that generally being able to talk between nodeset nodes is a good goal unless intentionally broken. And in general I think jobs would figure that out themselves based on the network info. The oddity here is our nested ansible and reuse of the inventory20:45
@clarkb:matrix.organd ya it is probably unlikely to be useful to many others (because nested ansible seems less common and few have the heterogenous envs that we do)20:46
@jim:acmegating.comnote that if we do want to have zuul-launcher do this, it will require a small bit of refactoring since currently the drivers themselves select the interface ip, not the launcher.  so we'd need to move that logic to the launcher.  in the case of the openstack driver, we use the value returned by openstacksdk, so we would end up no longer using that (that's probably fine)20:46
@clarkb:matrix.orgI'm happy to start with our own little workaround assuming I can ever get the jinja filters to work before changing anything in zuul20:47
@jim:acmegating.comack.  maybe the second time we hit this issue we look at the zl change.  :)20:47
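[Editor's sketch] The "downgrade the protocol" behavior discussed above could look roughly like the following. This is only an illustration under stated assumptions: the `Node` class and `select_interface_ips` function are hypothetical names, not part of Zuul, zuul-launcher, or openstacksdk, and the addresses are documentation-range examples. The idea is to use IPv6 only when every node in the nodeset has an IPv6 address, and otherwise fall back to IPv4 for all of them so the nodes can still reach each other.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical stand-in for a nodeset node; not a Zuul data structure.
    name: str
    public_v4: Optional[str] = None
    public_v6: Optional[str] = None


def select_interface_ips(nodes):
    """Return {node name: address}, using IPv6 only if every node has it."""
    # Treat None and '' alike, since empty public_v6 values show up in
    # the inventories mentioned in this discussion.
    all_have_v6 = all(node.public_v6 for node in nodes)
    selected = {}
    for node in nodes:
        if all_have_v6:
            selected[node.name] = node.public_v6
        else:
            if not node.public_v4:
                raise ValueError(
                    f"{node.name} has no IPv4 address to fall back to"
                )
            selected[node.name] = node.public_v4
    return selected


if __name__ == "__main__":
    mixed = [
        Node("bridge", "203.0.113.10", "2001:db8::10"),
        Node("arm64-node", "198.51.100.7", ""),
    ]
    print(select_interface_ips(mixed))
```

Whether falling back for the whole nodeset is "better" than letting each node keep its preferred protocol is exactly the coin flip described above; the sketch only shows one side of it.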
@clarkb:matrix.orgre the `public_v6: ''` values I'm assuming we can try and reproduce that using sdk directly to list a node and see what value we get back for the address20:50
@clarkb:matrix.orghttps://zuul.opendev.org/t/openstack/build/ead900ebf29e4e3290db0531f0557849/log/zuul-info/inventory.yaml#26 actually this is affecting ovh too20:51
@clarkb:matrix.orgso not a rax classic specific problem. Which maybe means you can reproduce this more easily20:51
-@gerrit:opendev.org- Zuul merged on behalf of Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org: [opendev/system-config] 989023: Drop static02 and static04 from inventory https://review.opendev.org/c/opendev/system-config/+/98902320:52
-@gerrit:opendev.org- Clark Boylan proposed: [opendev/system-config] 989022: Handle mixed provider nodests and ipv6 availability https://review.opendev.org/c/opendev/system-config/+/98902220:56
@clarkb:matrix.orgok that seems like it is getting closer. I managed to address the prior issue with dict2items. And then I had a small syntax bug. fixing that will probably just uncover the next small issue. But I have to do a school run in a few minutes so feel free to continue to iterate on that while I'm doing that. Otherwise I'll pick it up when I get back.20:57
@clarkb:matrix.orgAlso I'll be making meeting agenda edits when I get back. Let me know if there are any edits you'd like to see20:57
-@gerrit:opendev.org- Clark Boylan proposed: [opendev/system-config] 989022: Handle mixed provider nodests and ipv6 availability https://review.opendev.org/c/opendev/system-config/+/98902222:00
@clarkb:matrix.orgfungi: remote puppet else failed as did lists3 on the deploy for static02 and static04 removal so that is an improvement22:09
@clarkb:matrix.orgI haven't looked into those failures but the remote puppet else failure may be related to the python version change or ansible 9 update I suppose22:10
@fungicide:matrix.orglet's encrypt job i was just looking at that22:10
@fungicide:matrix.orginfra-prod-letsencrypt succeeded, but infra-prod-service-lists3 and infra-prod-remote-puppet-else failed22:10
-@gerrit:opendev.org- Clark Boylan proposed: [opendev/system-config] 989022: Handle mixed provider nodests and ipv6 availability https://review.opendev.org/c/opendev/system-config/+/98902222:11
@fungicide:matrix.orgERROR: for anubis  Head "https://ghcr.io/v2/techarohq/anubis/manifests/v1.25.0": Get "https://ghcr.io/token?scope=repository%3Atecharohq%2Fanubis%3Apull&service=ghcr.io": context deadline exceeded (Client.Timeout exceeded while awaiting headers)22:11
@fungicide:matrix.orgso i think that's probably just a one-off22:11
@clarkb:matrix.orgya that looks like lists trying to figure out if there are updates to the v1.25.0 anubis image. I agree we can ignore that one22:12
@clarkb:matrix.orgunless it becomes persistent then figure out mirroring that image or something. but for now it's enough to ignore I think22:12
@fungicide:matrix.orgdigging in `/var/log/ansible/remote_puppet_else.yaml.log` i'm not immediately seeing the problem, no tasks were failed or unreachable22:13
@fungicide:matrix.orgaha!22:13
@fungicide:matrix.org`ERROR! [DEPRECATED]: ansible.builtin.include has been removed. Use include_tasks or import_tasks instead. This feature was removed from ansible-core in a release after 2023-05-16. Please update your playbooks.`22:14
@fungicide:matrix.org`The error appears to be in '/etc/ansible/roles/puppet/tasks/main.yaml': line 131, column 3, but may be elsewhere in the file depending on the exact syntax problem.`22:14
@fungicide:matrix.orgno other obvious error messages in the log22:14
@clarkb:matrix.orgof course finding the source of that role might be fun22:15
@clarkb:matrix.orggiven the path I'm assuming that is a separate repo22:15
@clarkb:matrix.orghttps://opendev.org/opendev/ansible-role-puppet likely22:16
@clarkb:matrix.orgyup https://opendev.org/opendev/ansible-role-puppet/src/branch/master/tasks/main.yaml#L133 I think this matches22:17
@clarkb:matrix.orgnote the line number is for the task definition starting so it's off by a couple but it matches22:18
@clarkb:matrix.orgfungi: so I think that include: should just be include_tasks: and it will be happy again22:19
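[Editor's sketch] To look for other leftover bare `include:` keys of the kind that triggered the error above, a small line-based scan can help. This is a rough heuristic, not a YAML parser: it can misfire on text inside multi-line strings, and the function names `find_bare_includes` and `scan_role` are made up for this example. It assumes the role keeps its task files directly under a `tasks/` directory.

```python
import re
from pathlib import Path

# Matches a task key written as "include:" or "- include:" (optionally
# with the ansible.builtin. prefix), but not "include_tasks:",
# "include_role:" or "include_vars:".
BARE_INCLUDE = re.compile(r"^\s*(?:-\s+)?(?:ansible\.builtin\.)?include\s*:")


def find_bare_includes(text):
    """Return 1-based line numbers where a bare `include:` key appears."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if BARE_INCLUDE.match(line)
    ]


def scan_role(role_dir):
    """Yield (path, line number) for each bare include under role_dir/tasks."""
    for path in sorted(Path(role_dir, "tasks").glob("*.y*ml")):
        for number in find_bare_includes(path.read_text()):
            yield path, number


if __name__ == "__main__":
    # The role path is taken from the error message quoted in the log.
    for path, number in scan_role("/etc/ansible/roles/puppet"):
        print(f"{path}:{number}: replace include with include_tasks or import_tasks")
```

The reported line is the line of the `include:` key itself, so it can differ by a line or two from the task-start line that Ansible prints.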
-@gerrit:opendev.org- Clark Boylan proposed: [opendev/system-config] 989022: Handle mixed provider nodests and ipv6 availability https://review.opendev.org/c/opendev/system-config/+/98902222:26
-@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org proposed: [opendev/ansible-role-puppet] 989028: Use include_tasks instead of include https://review.opendev.org/c/opendev/ansible-role-puppet/+/98902822:29
@fungicide:matrix.orgClark: like that ^ ?22:29
@fungicide:matrix.organyway, i'm stepping away for the night, but can pick this back up in the morning if nobody beats me to it22:30
@clarkb:matrix.orgfungi: ya that looks right. I won't merge anything with what day is left for me. However, my change for the ipv6 handling in inventory is looking like it may be working now. I'll rebase your x86 bridge change on top of it if that is the case22:38
-@gerrit:opendev.org- Clark Boylan proposed wip on behalf of Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org: [opendev/system-config] 988698: Use mixed arches for system-config-run-base-arm64 https://review.opendev.org/c/opendev/system-config/+/98869822:40
@clarkb:matrix.orga few test jobs have reported success so I've done the rebase there ^22:41
@clarkb:matrix.orgok I just did a first pass on the meeting agenda. I know it's late so I don't expect any input at this point. But just in case there are thoughts I'll wait another 10-20 minutes before sending the email to make it official22:49
@clarkb:matrix.orgfungi: hrm maybe testinfra isn't using the inventory like I thought it was. It's still trying to connect to ipv6 like before even after rebased onto my change23:27
