*** rlandy|bbl is now known as rlandy|out | 00:09 | |
opendevreview | Ian Wienand proposed opendev/system-config master: [wip] testing with bridge99.opendev.org https://review.opendev.org/c/opendev/system-config/+/862845 | 01:16 |
---|---|---|
opendevreview | Ian Wienand proposed opendev/base-jobs master: infra-prod: Move project-config reset into base-jobs https://review.opendev.org/c/opendev/base-jobs/+/862853 | 03:56 |
opendevreview | Ian Wienand proposed opendev/system-config master: Remove old bridge testing https://review.opendev.org/c/opendev/system-config/+/862766 | 03:59 |
opendevreview | Ian Wienand proposed opendev/system-config master: Refernce bastion through prod_bastion group https://review.opendev.org/c/opendev/system-config/+/862845 | 03:59 |
opendevreview | Ian Wienand proposed opendev/system-config master: Revert "Update to tip of master in periodic jobs" https://review.opendev.org/c/opendev/system-config/+/862854 | 03:59 |
opendevreview | Ian Wienand proposed opendev/base-jobs master: infra-prod: Move project-config reset into base-jobs https://review.opendev.org/c/opendev/base-jobs/+/862853 | 07:00 |
*** jpena|off is now known as jpena | 07:22 | |
*** benj_0 is now known as benj_ | 08:07 | |
*** ShadowJonathan_ is now known as ShadowJonathan | 08:07 | |
*** andrewbonney_ is now known as andrewbonney | 08:07 | |
*** walshh__ is now known as walshh_ | 08:07 | |
*** open10k8s_ is now known as open10k8s | 08:07 | |
*** aprice_ is now known as aprice | 08:07 | |
*** erbarr_ is now known as erbarr | 08:07 | |
*** TheJulia_ is now known as TheJulia | 08:07 | |
*** gouthamr_ is now known as gouthamr | 08:07 | |
*** eball_ is now known as eball | 08:07 | |
*** snbuback_ is now known as snbuback | 08:07 | |
*** PrinzElvis_ is now known as PrinzElvis | 08:07 | |
*** chateaulav_ is now known as chateaulav | 08:07 | |
*** odyssey4me_ is now known as odyssey4me | 08:13 | |
frickler | infra-root: https://review.opendev.org/c/opendev/system-config/+/862759/1/playbooks/roles/nodepool-base/tasks/main.yaml is broken, the attr must be named "host" instead of "addr" | 09:58 |
frickler | noticed this via mail from failing nodepool job | 09:58 |
opendevreview | Dr. Jens Harbott proposed opendev/system-config master: Fix generated zookeeper config for nodepool https://review.opendev.org/c/opendev/system-config/+/862878 | 10:01 |
frickler | unrelated: wheels were last successfully built 8 days ago | 10:03 |
*** rlandy_ is now known as rlandy | 10:35 | |
*** arxcruz is now known as arxcruz|ruck | 10:36 | |
frickler | looks like afs rpm issue again https://zuul.opendev.org/t/openstack/build/80ad124ac3cf4933a7b9a381ad2f0b9c | 10:36 |
frickler | regarding nodepool, someone might want to check why the failures can be seen in the job log, but the job still passed https://bb6d07e84661d82562d5-daf98166b205b84408724e1df10e75fa.ssl.cf5.rackcdn.com/862759/1/gate/system-config-run-nodepool/4cb5b5b/nl01.opendev.org/docker/nodepool-docker_nodepool-launcher_1.txt | 11:23 |
*** dviroel is now known as dviroel|rover | 11:44 | |
opendevreview | Merged opendev/system-config master: Fix generated zookeeper config for nodepool https://review.opendev.org/c/opendev/system-config/+/862878 | 12:11 |
frickler | actually with that patch in, there is still something wrong in the nodepool.yaml generated in gate, the host addresses are empty https://c60c9b47a943d67d7acd-72f68c73c06acdc7229714e8d93d40d1.ssl.cf1.rackcdn.com/862878/1/gate/system-config-run-nodepool/73b5a84/nl01.opendev.org/docker/nodepool-docker_nodepool-launcher_1.txt | 12:21 |
frickler | will have to watch what gets generated by the next periodic job on the live systems | 12:22 |
fungi | we can revert https://review.opendev.org/862759 and just go back to relying on the old module for now, worst case | 12:29 |
frickler | nodepool servers seem to be happy again since 1h, so the issue seems to happen only in CI | 13:40 |
frickler | revert would have been difficult unless we also revert to using old bridge | 13:41 |
fungi | not really. new bridge was already working without 862759, that was merely a performance improvement | 13:54 |
fungi | the actual problem we hit on the new bridge was related to some unused zk keys which were embedded as raw binary, and were causing encoding problems for ansible | 13:55 |
fungi | that was fixed by a separate change just prior to 862759 | 13:55 |
frickler | ah, good, do you remember where that fix was? can't find anything matching in system-config | 14:10 |
frickler | also this is failing for a couple of days https://zuul.opendev.org/t/openstack/builds?job_name=infra-prod-base&project=opendev/system-config | 14:10 |
frickler | "usermod: user zuul is currently used by process 1370895" on bridge01 itself | 14:12 |
fungi | frickler: possible it was an edit to our private group vars, checking... | 14:14 |
fungi | frickler: yep, that's where it was | 14:15 |
fungi | "Remove unsued keytab entries" (most recent commit in /etc/ansible/hosts on the new bridge) | 14:16 |
frickler | fungi: ah, right, thx | 14:28 |
fungi | anyway, those were what was gumming up the works for newer python, apparently | 14:29 |
*** dasm|off is now known as dasm | 14:58 | |
clarkb | re the usermod error that seems similar to the issue I addressed with removing the ubuntu user as part of launch node | 15:10 |
clarkb | I addressed that by forcing the removal regardless as a subsequent step does a reboot. I doubt we're trying to remove the zuul user there but maybe any modification is tripping over similar? | 15:10 |
opendevreview | Andy Ladjadj proposed zuul/zuul-jobs master: fix(packer): prevent task failure when packer_variables is not defined https://review.opendev.org/c/zuul/zuul-jobs/+/836744 | 15:21 |
opendevreview | Andy Ladjadj proposed zuul/zuul-jobs master: [upload-logs-base] add ipa extension to known mime types https://review.opendev.org/c/zuul/zuul-jobs/+/834045 | 15:21 |
opendevreview | Andy Ladjadj proposed zuul/zuul-jobs master: [upload-logs-base] add android mime-type https://review.opendev.org/c/zuul/zuul-jobs/+/834046 | 15:22 |
clarkb | fungi: frickler: re nodepool config we should double check that the old code produced different results too. Thinking out loud here I wonder if our inventory in the test jobs has enough of the bits we use in production to produce a working config | 15:27 |
opendevreview | Andy Ladjadj proposed zuul/zuul-jobs master: [ensure-python] install python version only if not present https://review.opendev.org/c/zuul/zuul-jobs/+/770656 | 15:36 |
*** dviroel|rover is now known as dviroel|rover|lunch | 15:37 | |
clarkb | infra-root after breakfast I'll get around to deleting gitea-lb01 and jvb02 from their respective clouds | 15:42 |
clarkb | I'll start by shutting down services on the hosts and letting them sit for about an hour just to make sure there isn't any unexpected fallout then turn them off | 15:42 |
clarkb | s/then turn them off/then delete them/ | 15:43 |
*** jpena is now known as jpena|off | 15:48 | |
fungi | sounds great, thanks! | 15:50 |
clarkb | services are now off | 15:51 |
clarkb | expect deletions to occur around 1700 UTC | 15:51 |
*** dviroel|rover|lunch is now known as dviroel|rover | 16:31 | |
clarkb | infra-root the rax dns backup is failing on bridge01, but it is/was also failing on bridge.o.o. Not a regression | 16:37 |
clarkb | infra-root I've just realized that both bridges will attempt to run the zuul restart playbook later today | 16:43 |
fungi | d'oh! | 16:43 |
clarkb | Since bridge.o.o shouldn't be configured automatically anymore I'm going to manually comment out the crontab entry for the playbook on that server allowing bridge01 to be the lone zuul restart commander | 16:43 |
fungi | probably time that we shut down the old bridge (just not delete it) | 16:44 |
fungi | but yeah that's a good intermediate step | 16:44 |
clarkb | crontab is edited | 16:44 |
clarkb | ya we should probably wait for ianw's monday before doing that? | 16:44 |
fungi | agreed | 16:44 |
clarkb | just in case he feels there are still things that need cross checking | 16:44 |
fungi | the new bridge is working great though | 16:44 |
fungi | and now that we've sussed out how to make latest osc work with rackspace volume management, i don't really have anything i need to preserve from the old bridge | 16:45 |
clarkb | ya I've got a bunch of stuff in my homedir but I'm fairly certain none of it is particularly important | 16:47 |
clarkb | running `openstack` for the first time on bridge01 reminds me this is a docker command | 16:51 |
clarkb | it started doing things I didn't expect at first so I ^C'd | 16:51 |
clarkb | personally I'm not sure how I feel about consuming osc that way | 16:52 |
clarkb | it's definitely a surprise to have stuff download in response to a server list | 16:52 |
clarkb | and now I'm wondering why python-openstackclient and openstackclient are both on pypi | 16:54 |
clarkb | python-openstackclient appears to be the up to date one | 16:55 |
clarkb | openstackclient says it is a meta package that installs the same major version of python-openstackclient. It doesn't have new releases so I guess that stopped getting updated | 16:56 |
clarkb | anyway I'm setting up another venv because in the past that has become necessary. I'm just jumping the gun on that. | 16:59 |
clarkb | and I can list nodes and volumes in vexxhost which is what I need to double check my work deleting gitea-lb01 | 17:00 |
clarkb | infra-root it's been an hour since I shut down services on jvb02 and gitea-lb01, any objections to deleting them now? | 17:00 |
clarkb | ok gitea-lb01 is deleted, but that didn't auto delete its bfv volume. Going to delete that too | 17:04 |
clarkb | the volume doesn't seem to remember the last thing it was attached to which makes deletion difficult if you don't do a volume list first (I did this out of fear this may happen) | 17:06 |
clarkb | I can't server list against rax | 17:08 |
clarkb | Version 2 is not supported, use supported version 3 instead. | 17:09 |
clarkb | Invalid client version '2.0'. Major part should be '3' | 17:09 |
clarkb | fungi: do you have this working on bridge01? your comments about the volume stuff make me think that this may be the case | 17:09 |
clarkb | #status log Deleted gitea-lb01 (e65dc9f4-b1d4-4e18-bf26-13af30dc3dd6) and its BFV volume (41553c15-6b12-4137-a318-7caf6a9eb44c) as this server has been replaced with gitea-lb02. | 17:10 |
opendevstatus | clarkb: finished logging | 17:10 |
clarkb | ok the issue is the cinder api. Apparently we don't support API v2 in the latest version | 17:12 |
clarkb | ok downgrading to osc<5.0.0 fixes it (5.0 might work too? I'm not sure). It reused the python-cinderclient wheel I already had cached which implies the issue is in osc not cinderclient | 17:15 |
clarkb | #status log Deleted jvb02.opendev.org (a93ef02b-4e8b-4ace-a2b4-cb7742cdb3e3) as we don't need this extra jitsi meet jvb to meet ptg demands | 17:18 |
opendevstatus | clarkb: finished logging | 17:18 |
clarkb | gtema: ^ fyi re client issues | 17:18 |
opendevreview | Clark Boylan proposed opendev/zone-opendev.org master: Remove gitea-lb01 and jvb02 from DNS https://review.opendev.org/c/opendev/zone-opendev.org/+/862941 | 17:20 |
clarkb | The good news is that the new bridge can do this stuff with some minor tweaks | 17:23 |
fungi | clarkb: latest cli/sdk worked for me by pinning python-cinderclient<8 | 17:37 |
fungi | it was cinderclient 8.0.0 which dropped volume v2 api support | 17:38 |
clarkb | fungi: it works fine with old osc and cinderclient 9.1.0 | 17:41 |
fungi | the ~fungi/foo venv on bridge01 is able to volume list, and was built via just `pip install openstackclient 'python-cinderclient<8'` | 17:41 |
clarkb | it could be that both things are breaking it in different ways but if you change one or the other then it works | 17:41 |
clarkb | oh openstackclient should install the same version that python-openstackclient<5.0.0 installs | 17:42 |
clarkb | I don't think I'm going to debug this further, I just want to call it out as something that downgrading osc alone seems to have fixed so isn't the cinderclient's sole issue | 17:42 |
fungi | that working venv has openstackclient==4.0.0 and python-openstackclient==6.0.0 | 17:42 |
fungi | also openstacksdk==0.102.0 | 17:43 |
clarkb | mine has python-openstackclient==4.0.2 and python-cinderclient==9.1.0 and openstacksdk==0.102.0 | 17:43 |
clarkb | I guess the promise that openstackclient always installs the same major version of python-openstackclient as its major version is wrong | 17:43 |
fungi | so it's either old osc with new cinderclient, or new osc with old cinderclient? | 17:43 |
clarkb | ya I think so | 17:44 |
fungi | clarkb: to provide a different cloud.yaml file path/name with osc you need to export an envvar, right? i remember there was some way to do it but can't find a command-line flag at least | 18:00 |
fungi | i guess i should be looking for the old oscc docs | 18:21 |
clarkb | yes I think it is an env var. Something like OS_CONFIG_FILE | 18:41 |
clarkb | I forget the actual name though | 18:41 |
clarkb | fungi: OS_CLIENT_CONFIG_FILE openstacksdk defines it not osc | 18:48 |
fungi | aha, thanks! | 18:49 |
fungi | i had taken to grepping through the source because i didn't spot it in the docs | 18:49 |
clarkb | ya it might be worth having osc's docs link to sdk's docs on the subject | 18:50 |
*** dviroel|rover is now known as dviroel|rover|afk | 20:22 | |
*** dasm is now known as dasm|off | 20:23 | |
clarkb | part of me wants to start upgrading gitea backends to jammy now, but I think waiting for the openssl vuln to be fixed is a good idea | 21:02 |
*** dviroel|rover|afk is now known as dviroel|rover | 21:34 | |
*** dviroel|rover is now known as dviroel|out | 21:36 | |
fungi | oh, is jammy already openssl 3.x? | 21:45 |
clarkb | I think so | 21:47 |
fungi | looks like it is, yeah | 21:52 |
fungi | 3.0.2-0ubuntu1.6 | 21:52 |
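
Editor's note on the 18:00 to 18:50 exchange about `OS_CLIENT_CONFIG_FILE`: the following is a minimal sketch, not something anyone in the channel ran, of pointing the `openstack` CLI at an alternate clouds file by setting that variable for a child process. The helper name `env_with_clouds_file`, the path `~/alt-clouds.yaml`, and the cloud name `mycloud` are hypothetical and used only for illustration.

```python
import os


def env_with_clouds_file(path, base_env=None):
    """Return a copy of the environment with OS_CLIENT_CONFIG_FILE set.

    Hypothetical helper: openstacksdk (not osc itself) reads this variable
    to locate the clouds configuration file, as noted in the log.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Expand "~" and make the path absolute so the child process does not
    # depend on the caller's working directory.
    env["OS_CLIENT_CONFIG_FILE"] = os.path.abspath(os.path.expanduser(path))
    return env


# Example use (assumes the openstack CLI is installed and that a cloud
# named "mycloud" exists in the alternate file):
#
#   import subprocess
#   subprocess.run(
#       ["openstack", "--os-cloud", "mycloud", "server", "list"],
#       env=env_with_clouds_file("~/alt-clouds.yaml"),
#       check=True,
#   )
```

Building a separate environment dict rather than mutating `os.environ` keeps the override scoped to the one command, which seems safer on a shared host like a bastion.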