-@gerrit:opendev.org- Artem Goncharov proposed: [zuul/zuul] 859939: Implement Gitea driver https://review.opendev.org/c/zuul/zuul/+/859939 | 12:55 | |
-@gerrit:opendev.org- Artem Goncharov proposed: [zuul/zuul] 859939: Implement Gitea driver https://review.opendev.org/c/zuul/zuul/+/859939 | 15:14 | |
-@gerrit:opendev.org- Zuul merged on behalf of Clark Boylan: [zuul/zuul-jobs] 935726: Cap the ansible version used by ansible-lint https://review.opendev.org/c/zuul/zuul-jobs/+/935726 | 17:11 | |
-@gerrit:opendev.org- Zuul merged on behalf of Clark Boylan: [zuul/zuul-jobs] 935722: Support new style mirror_info in use-docker-mirror https://review.opendev.org/c/zuul/zuul-jobs/+/935722 | 17:31 | |
@clarkb:matrix.org | corvus: Nasser over in Gerrit land is reporting that nodepool-launcher is failing to start with a traceback that indicates the nodepool-launcher openstack driver init method is crashing when trying to load openstack config (I think clouds.yaml type config) during driver loading | 18:10 |
---|---|---|
@clarkb:matrix.org | corvus: is this possibly a regression in nodepool-launcher where we now require an openstack config even if you aren't using the driver or have any providers with openstack configured? | 18:10 |
@clarkb:matrix.org | or maybe openstacksdk is crashing now when before it would just be a noop if you have no cloud config | 18:11 |
@clarkb:matrix.org | the code itself doesn't appear to have changed | 18:12 |
@clarkb:matrix.org | ec1bb9eef1ef493f5f8e20e65b44934a3b1f51dc this change in openstacksdk added the check | 18:15 |
@clarkb:matrix.org | and I'm guessing our new release of nodepool is what caused gerrit to pick it up? | 18:15 |
@clarkb:matrix.org | so I think we need to make the openstack driver catch and ignore the exception raised here or similar: https://opendev.org/openstack/openstacksdk/src/branch/master/openstack/config/loader.py#L478-L481 | 18:17 |
@clarkb:matrix.org | I am in a meeting now but can look closer afterwards | 18:17 |
@jim:acmegating.com | Clark: it looks like nodepool-launcher is running; i don't see any errors related to openstack. | 18:34 |
@jim:acmegating.com | i also don't see any changes merged to the ops or config repos recently | 18:34 |
@clarkb:matrix.org | corvus: they just downgraded to :10.0.0 which fixes it by using older openstacksdk | 18:34 |
@clarkb:matrix.org | 11.0.0 and latest have openstacksdk 4.1.0 which creates the error | 18:35 |
@clarkb:matrix.org | I don't know how nasser downgraded it | 18:35 |
@jim:acmegating.com | yeah... i'd normally expect a change for that :/ | 18:35 |
@clarkb:matrix.org | but the issue seems to be we load all drivers for nodepool when nodepool starts regardless of which we end up using and openstack's driver __init__ calls reset() which loads configs which now raises an exception in the code I linked above | 18:36 |
@jim:acmegating.com | Clark: do you have a copy of the traceback? | 18:37 |
@clarkb:matrix.org | unfortunately I'm not logged into discord on my desktop only my phone right now. Let me try and email it to myself or something that should work | 18:38 |
@clarkb:matrix.org | ``` | 18:41 |
Traceback (most recent call last): | ||
File "/usr/local/bin/nodepool-launcher", line 8, in <module> | ||
sys.exit(main()) | ||
^^^^^^ | ||
File "/usr/local/lib/python3.11/site-packages/nodepool/cmd/launcher.py", line 88, in main | ||
return NodePoolLauncherApp.main() | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
[snip for discord msg length] | ||
File "/usr/local/lib/python3.11/site-packages/nodepool/driver/__init__.py", line 91, in get | ||
Drivers.load() | ||
File "/usr/local/lib/python3.11/site-packages/nodepool/driver/__init__.py", line 84, in load | ||
Drivers.drivers[driver] = driver_obj() | ||
^^^^^^^^^^^^ | ||
File "/usr/local/lib/python3.11/site-packages/nodepool/driver/openstack/__init__.py", line 27, in __init__ | ||
self.reset() | ||
File "/usr/local/lib/python3.11/site-packages/nodepool/driver/openstack/__init__.py", line 24, in reset | ||
self.openstack_config = loader.OpenStackConfig() | ||
^^^^^^^^^^^^^^^^^^^^^^^^ | ||
File "/usr/local/lib/python3.11/site-packages/openstack/config/loader.py", line 214, in __init__ | ||
self._validate_config_file(self.config_filename, self.cloud_config) | ||
File "/usr/local/lib/python3.11/site-packages/openstack/config/loader.py", line 479, in _validate_config_file | ||
raise exceptions.ConfigException( | ||
openstack.exceptions.ConfigException: Configuration file {path} is empty or not a valid mapping | ||
``` | ||
@clarkb:matrix.org | the `{path}` thing is due to a missing f'' string f prefix in openstacksdk which they have since fixed but not released I guess | 18:42 |
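(Editor's sketch, not part of the conversation.) The literal `{path}` in the traceback is what Python prints when a format string lacks the `f` prefix. A minimal illustration, assuming an example path value; the message text is copied from the traceback, and this is not the openstacksdk source:

```python
# Example value only; any path would show the same behaviour.
path = "/etc/nodepool/clouds.yaml"

# Without the f prefix the braces are kept literally, as seen in the traceback.
broken = "Configuration file {path} is empty or not a valid mapping"

# With the f prefix the variable is interpolated into the message.
fixed = f"Configuration file {path} is empty or not a valid mapping"

print(broken)
print(fixed)
```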
@jim:acmegating.com | ` [snip for discord msg length]` :( | 18:43 |
@clarkb:matrix.org | ya.... | 18:43 |
@clarkb:matrix.org | Nasser has pushed a change up to pin the launcher to 10.0.0 now | 18:44 |
@clarkb:matrix.org | 443461 on their gerrit server | 18:44 |
@clarkb:matrix.org | let me see if I can get the full traceback now | 18:44 |
@jim:acmegating.com | thanks, that would be useful. there are a number of possible code paths... | 18:48 |
@clarkb:matrix.org | ``` | 18:49 |
Traceback (most recent call last): | ||
File "/usr/local/bin/nodepool-launcher", line 8, in <module> | ||
sys.exit(main()) | ||
^^^^^^ | ||
File "/usr/local/lib/python3.11/site-packages/nodepool/cmd/launcher.py", line 88, in main | ||
return NodePoolLauncherApp.main() | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
File "/usr/local/lib/python3.11/site-packages/nodepool/cmd/__init__.py", line 249, in main | ||
return super(NodepoolDaemonApp, cls).main(argv) | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
File "/usr/local/lib/python3.11/site-packages/nodepool/cmd/__init__.py", line 196, in main | ||
return cls()._main(argv=argv) | ||
^^^^^^^^^^^^^^^^^^^^^^ | ||
File "/usr/local/lib/python3.11/site-packages/nodepool/cmd/__init__.py", line 186, in _main | ||
return self._do_run() | ||
^^^^^^^^^^^^^^ | ||
File "/usr/local/lib/python3.11/site-packages/nodepool/cmd/__init__.py", line 230, in _do_run | ||
return super(NodepoolDaemonApp, self)._do_run() | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
File "/usr/local/lib/python3.11/site-packages/nodepool/cmd/__init__.py", line 192, in _do_run | ||
return self.run() | ||
^^^^^^^^^^ | ||
File "/usr/local/lib/python3.11/site-packages/nodepool/cmd/launcher.py", line 64, in run | ||
config = self.pool.loadConfig() | ||
^^^^^^^^^^^^^^^^^^^^^^ | ||
File "/usr/local/lib/python3.11/site-packages/nodepool/launcher.py", line 1123, in loadConfig | ||
config = nodepool_config.loadConfig(self.configfile) | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
File "/usr/local/lib/python3.11/site-packages/nodepool/config.py", line 463, in loadConfig | ||
newconfig.setProviders(config.get('providers')) | ||
File "/usr/local/lib/python3.11/site-packages/nodepool/config.py", line 255, in setProviders | ||
p = get_provider_config(provider) | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
File "/usr/local/lib/python3.11/site-packages/nodepool/config.py", line 408, in get_provider_config | ||
driver = Drivers.get(provider['driver']) | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
File "/usr/local/lib/python3.11/site-packages/nodepool/driver/__init__.py", line 91, in get | ||
Drivers.load() | ||
File "/usr/local/lib/python3.11/site-packages/nodepool/driver/__init__.py", line 84, in load | ||
Drivers.drivers[driver] = driver_obj() | ||
^^^^^^^^^^^^ | ||
File "/usr/local/lib/python3.11/site-packages/nodepool/driver/openstack/__init__.py", line 27, in __init__ | ||
self.reset() | ||
File "/usr/local/lib/python3.11/site-packages/nodepool/driver/openstack/__init__.py", line 24, in reset | ||
self.openstack_config = loader.OpenStackConfig() | ||
^^^^^^^^^^^^^^^^^^^^^^^^ | ||
File "/usr/local/lib/python3.11/site-packages/openstack/config/loader.py", line 214, in __init__ | ||
self._validate_config_file(self.config_filename, self.cloud_config) | ||
File "/usr/local/lib/python3.11/site-packages/openstack/config/loader.py", line 479, in _validate_config_file | ||
raise exceptions.ConfigException( | ||
openstack.exceptions.ConfigException: Configuration file {path} is empty or not a valid mapping | ||
``` | ||
@clarkb:matrix.org | does that imply they actually do have an openstack driver provider configured? | 18:49 |
@clarkb:matrix.org | I don't see one here at least: https://gerrit.googlesource.com/zuul/ops/+/refs/heads/master/nodepool/nodepool.yaml | 18:50 |
@jim:acmegating.com | no, i think it's more or less as you described, but the usage here is weird | 18:51 |
@jim:acmegating.com | i don't see why that should be a global singleton | 18:51 |
@clarkb:matrix.org | oh I see we loop over all drivers not the drivers in the config | 18:51 |
@clarkb:matrix.org | > <@jim:acmegating.com> i don't see why that should be a global singleton | 18:52 |
Drivers? | ||
@jim:acmegating.com | no openstack_config | 18:53 |
@clarkb:matrix.org | oh I'm not sure. fwiw we do have handling if the driver fails to load, but I guess not if we raise; we have to return a nil driver object instead | 18:54 |
@clarkb:matrix.org | wondering if maybe we should modify _load_class to try except and then additionally log if that fails | 18:55 |
@clarkb:matrix.org | then it can be an annoying but not completely broken error in this case | 18:55 |
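(Editor's sketch, not part of the conversation.) The idea floated here, catching a failure while instantiating one driver and logging it so the others still load, could look roughly like the following. `load_drivers`, `GoodDriver`, and `BrokenDriver` are hypothetical names for illustration; this is not nodepool's actual loading code:

```python
import logging

log = logging.getLogger("driver-loader-sketch")


class GoodDriver:
    """Hypothetical driver whose constructor succeeds."""


class BrokenDriver:
    """Hypothetical driver whose constructor raises during init."""

    def __init__(self):
        raise RuntimeError("config file is empty or not a valid mapping")


def load_drivers(driver_classes):
    """Instantiate each driver; log and skip any that fail to initialise."""
    loaded = {}
    for name, cls in driver_classes.items():
        try:
            loaded[name] = cls()
        except Exception:
            # Annoying but not fatal: record the traceback and carry on.
            log.exception("Unable to load driver %s; skipping it", name)
    return loaded


drivers = load_drivers({"good": GoodDriver, "broken": BrokenDriver})
print(sorted(drivers))
```

The trade-off is that a misconfigured driver only shows up as a log entry at startup and as a missing driver later, so the log message needs to be easy to spot.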
@jim:acmegating.com | heh, i think maybe it's a singleton because it might reload clouds.yaml every time and we might want to avoid that for efficiency, but we call reset every time we load the config | 18:57 |
@jim:acmegating.com | so i think the current behavior is equivalent to just removing that attr from the driver; i'll make a patch for that | 18:57 |
@clarkb:matrix.org | oh wait _load_class is before we actually fail | 18:57 |
@clarkb:matrix.org | ok I think that works too as long as we load the config when needed elsewhere | 18:58 |
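(Editor's sketch, not part of the conversation.) The structural difference being discussed is between loading cloud configuration when a driver object is constructed and loading it only when a provider actually needs it. A rough illustration with hypothetical class and function names; it does not reproduce the proposed nodepool change:

```python
calls = []


def load_cloud_config():
    """Hypothetical stand-in for a config load that may be slow or may raise."""
    calls.append("load")
    return {"clouds": {}}


class EagerDriver:
    """Loads config at construction, even if no provider ever uses this driver."""

    def __init__(self):
        self.cloud_config = load_cloud_config()


class LazyDriver:
    """Keeps no config attribute; loads only when a provider asks for it."""

    def get_provider_config(self):
        return load_cloud_config()


lazy = LazyDriver()
loads_after_lazy_init = len(calls)   # constructing the lazy driver loads nothing

eager = EagerDriver()
loads_after_eager_init = len(calls)  # constructing the eager driver loads once

lazy.get_provider_config()
loads_after_use = len(calls)         # the lazy driver loads on first real use

print(loads_after_lazy_init, loads_after_eager_init, loads_after_use)
```

With the lazy shape, a deployment that configures no provider for a driver never triggers that driver's config load, so a bad or empty config file for an unused driver cannot stop startup.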
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/nodepool] 935820: Remove global singleton openstack_config https://review.opendev.org/c/zuul/nodepool/+/935820 | 18:59 | |
@jim:acmegating.com | Clark: ^ that's what i'm thinking. i have not tested that locally. | 18:59 |
@clarkb:matrix.org | oh we don't actually use the attr at all we're doing that purely for side effects | 18:59 |
@jim:acmegating.com | we do use it; you can see in that change | 18:59 |
@clarkb:matrix.org | corvus: how did this not break our testing with k8s? | 18:59 |
@clarkb:matrix.org | in theory our k8s functional job doesn't have an openstack provider. Did we just not run that job when we merged whatever updated things? However this updated long enough ago that that seems unlikely | 19:00 |
@clarkb:matrix.org | corvus: where do we call reset when we load the provider config? | 19:02 |
@clarkb:matrix.org | your change makes sense to me I'm just not immediately seeing ^ | 19:02 |
@clarkb:matrix.org | oh maybe that is part of the driver loading machinery | 19:02 |
@clarkb:matrix.org | yup loadConfig calls driver.reset for each driver | 19:03 |
@clarkb:matrix.org | the only other question is why our testing didn't catch this. It may be worth fixing that as part of your change to ensure we've got coverage of both sides | 19:03 |
@jim:acmegating.com | Clark: maybe because there's always a valid clouds.yaml in all of our tests? | 19:05 |
@clarkb:matrix.org | ya maybe we're writing clouds.yaml for the k8s job and we should stop? | 19:05 |
@jim:acmegating.com | oh you mean the functional k8s job? | 19:06 |
@clarkb:matrix.org | yes | 19:06 |
@clarkb:matrix.org | since in theory that job doesn't have openstack config in it so would fail like the gerrit install. But as far as I can tell it hasn't? | 19:07 |
@jim:acmegating.com | it's a good question | 19:07 |
@clarkb:matrix.org | unless maybe it did and that was what the errors on mnaser's recent change are | 19:07 |
@jim:acmegating.com | if it does catch it, then it's possible the sdk change snuck in between the last nodepool merge and release | 19:07 |
@clarkb:matrix.org | the mnaser thing was in zuul-jobs so not testing nodepool with k8s and is likely unrelated | 19:08 |
@clarkb:matrix.org | https://review.opendev.org/c/zuul/nodepool/+/934187 should've had a failing k8s job because openstacksdk updated in october and this ran in november. But the job passed so we have a gap in testing | 19:09 |
@clarkb:matrix.org | oh wait we don't run nodepool in that job like we do with the openstack job and run a daemon? we seem to run nox test cases | 19:10 |
@clarkb:matrix.org | maybe that is how as we might manipulate things internally and not load all drivers or something? | 19:10 |
@clarkb:matrix.org | we seem to go through useNodepool in the test framework for the k8s testing and I'm guessing that loads drivers differently maybe? | 19:11 |
@clarkb:matrix.org | that creates a Nodepool instance and we start it which calls run which calls updateConfig which calls loadConfig so I'm confused how the k8s job works. There must be a "valid" cloud config? | 19:15 |
@clarkb:matrix.org | corvus: do you think it is worth holding a node for the k8s job and trying to debug that further or if things pass call it good enough | 19:15 |
@jim:acmegating.com | Clark: my preference would be to leave it alone and back away slowly. | 19:16 |
@jim:acmegating.com | this has a limited shelf-life, and the fix is structural so we shouldn't have this kind of problem again even if sdk changes | 19:17 |
@clarkb:matrix.org | ya | 19:18 |
@clarkb:matrix.org | mostly just want to convince myself that we're not doing something silly in testing but even if we are it's probably fine | 19:18 |
@clarkb:matrix.org | corvus: new theory: reading the sdk code I think they only validate the config if there was a config file on disk. https://opendev.org/openstack/openstacksdk/src/branch/master/openstack/config/loader.py#L211-L214 only calls validate if the load function returns a result which it only does if it finds a file and json or yaml loads it. My guess is the gerrit environment has a config file for google cloud that collides with the openstack files | 19:23 |
@clarkb:matrix.org | corvus: our functional job works because there are no files and loading returns none and skips validation | 19:23 |
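(Editor's sketch, not part of the conversation.) The theory above is that validation only runs when a config file was found and parsed, so a missing file is harmless while an empty file fails. A simplified model of that control flow; `find_and_parse`, `load_clouds`, and `ConfigError` are hypothetical, JSON stands in for the YAML/JSON parsing, and an empty file is treated as parsing to `None` the way a YAML parser would:

```python
import json
import os
import tempfile


class ConfigError(Exception):
    """Hypothetical stand-in for the SDK's config exception."""


def find_and_parse(paths):
    """Return (path, parsed) for the first existing file, else (None, None)."""
    for path in paths:
        if os.path.exists(path):
            with open(path) as handle:
                text = handle.read()
            # Assumption: an empty file parses to None, as it would with YAML.
            parsed = json.loads(text) if text.strip() else None
            return path, parsed
    return None, None


def load_clouds(paths):
    """Validate only when a file was actually found."""
    path, parsed = find_and_parse(paths)
    if path is not None and not isinstance(parsed, dict):
        raise ConfigError(
            f"Configuration file {path} is empty or not a valid mapping")
    return parsed or {}


with tempfile.TemporaryDirectory() as tmp:
    candidate = os.path.join(tmp, "clouds.yaml")

    # No file on disk: nothing to validate, so loading quietly succeeds.
    missing_result = load_clouds([candidate])

    # Empty file on disk: the file is found, parses to None, and fails validation.
    open(candidate, "w").close()
    try:
        load_clouds([candidate])
        empty_file_raised = False
    except ConfigError:
        empty_file_raised = True

print(missing_result, empty_file_raised)
```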
@clarkb:matrix.org | and now we are failing on the too many docker requests problem | 19:27 |
@jim:acmegating.com | Clark: /etc/nodepool/clouds.yaml exists and is empty | 19:36 |
@jim:acmegating.com | Clark: we create it in gerrit's zuul, maybe to silence an old version of the error ... :/ | 19:37 |
@jim:acmegating.com | https://gerrit.googlesource.com/zuul/ops/+/refs/heads/master/playbooks/deploy.yaml#56 | 19:38 |
@jim:acmegating.com | remote: https://gerrit-review.googlesource.com/c/zuul/ops/+/443462 Remove nodepool version pin and clouds.yaml [NEW] | 19:40 |
@jim:acmegating.com | Clark: ^ so that should theoretically be an alternate fix. | 19:41 |
@jim:acmegating.com | (but not in conflict with the first) | 19:41 |
@clarkb:matrix.org | Oh neat that explains it. I think both fixes are good to have | 19:47 |
@jim:acmegating.com | nasser +2d that, i submitted it; it should deploy shortly | 19:52 |
@fungicide:matrix.org | hah, that's a marvellous situation | 19:53 |
@jim:acmegating.com | it did not remove the key from that secret... :/ | 20:01 |
@jim:acmegating.com | https://ci.gerritcodereview.com/t/gerrit/build/50d6b60122bc4050b4e6a92997892c14/console#1/0/7/localhost | 20:01 |
@jim:acmegating.com | i will manually delete the secret and re-run the deploy job | 20:02 |
@jim:acmegating.com | also... um... did they undo the change to allow zuul repos to be self-approved? | 20:07 |
@clarkb:matrix.org | no clue | 20:08 |
@jim:acmegating.com | oh no.. they just require a verified result apparently. nm. :) | 20:08 |
@jim:acmegating.com | 2024-11-20 20:13:20,076 INFO nodepool.NodePool: Nodepool launcher 11.0.1.dev3 2c9d067 starting | 20:15 |
@jim:acmegating.com | looks good now ^ | 20:15 |
@sdodsley:matrix.org | As part of our OpenStack 3rd Party CI system Zuul is sending logs to a VM in AWS, but the report to gerrit shows a link to the logs for the job runs that is not the publicly available AWS location, but a URL in our internal CI. How do I tell Zuul to use the URL for the AWS location? | 20:19 |
@fungicide:matrix.org | sdodsley: any relatively modern zuul should report the buildset link to gerrit, and then your zuul dashboard provides the breakout to individual log urls. what version of zuul are you running? | 20:22 |
@fungicide:matrix.org | also, it's really up to the teams in openstack setting expectations for your testing to decide how your logs should be made available to them, if at all, so they're probably the best place to start with questions | 20:23 |
@sdodsley:matrix.org | the buildset is not accessible to anyone outside of our org, which is why we push the logs to AWS. I need to put that AWS-based URL in the report so anyone can check our run logs | 20:23 |
@fungicide:matrix.org | yeah, i would suggest you start by asking openstack | 20:23 |
@fungicide:matrix.org | figure out what they expect, and get them to guide you in configuring your ci system to meet their expectations | 20:24 |
@sdodsley:matrix.org | ok- i'll try asking in openstack IRC | 20:24 |
@fungicide:matrix.org | zuul itself has a lot of options, and the zuul community can't reasonably guess what openstack is expecting you to provide with your test results | 20:25 |
@clarkb:matrix.org | there is a url pattern you can use still I think | 20:25 |
@clarkb:matrix.org | I'm trying to find that now | 20:25 |
@fungicide:matrix.org | sdodsley: a team-specific irc channel will likely yield better results, for example #openstack-cinder if you're testing a cinder driver | 20:25 |
@clarkb:matrix.org | oh maybe it's semi hardcoded now | 20:26 |
@clarkb:matrix.org | https://opendev.org/zuul/zuul/src/branch/master/zuul/model.py#L6802-L6804 and then you can configure web root per tenant. Not sure if that will do what you want though | 20:27 |
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: | 20:56 | |
- [zuul/zuul] 935729: Batch fake build zookeeper additions https://review.opendev.org/c/zuul/zuul/+/935729 | ||
- [zuul/zuul] 935730: Batch fake build database additions https://review.opendev.org/c/zuul/zuul/+/935730 | ||
- [zuul/zuul] 935830: WIP: Add ZKObjectDict https://review.opendev.org/c/zuul/zuul/+/935830 | ||
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul-jobs] 935837: Fix dockerhub check in use-docker-mirror role https://review.opendev.org/c/zuul/zuul-jobs/+/935837 | 22:14 | |
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul-jobs] 935837: Fix dockerhub check in use-docker-mirror role https://review.opendev.org/c/zuul/zuul-jobs/+/935837 | 23:06 | |
-@gerrit:opendev.org- Zuul merged on behalf of Clark Boylan: [zuul/zuul-jobs] 935837: Fix dockerhub check in use-docker-mirror role https://review.opendev.org/c/zuul/zuul-jobs/+/935837 | 23:25 |