Wednesday, 2024-11-20

-@gerrit:opendev.org- Artem Goncharov proposed: [zuul/zuul] 859939: Implement Gitea driver https://review.opendev.org/c/zuul/zuul/+/859939 12:55
-@gerrit:opendev.org- Artem Goncharov proposed: [zuul/zuul] 859939: Implement Gitea driver https://review.opendev.org/c/zuul/zuul/+/859939 15:14
-@gerrit:opendev.org- Zuul merged on behalf of Clark Boylan: [zuul/zuul-jobs] 935726: Cap the ansible version used by ansible-lint https://review.opendev.org/c/zuul/zuul-jobs/+/935726 17:11
-@gerrit:opendev.org- Zuul merged on behalf of Clark Boylan: [zuul/zuul-jobs] 935722: Support new style mirror_info in use-docker-mirror https://review.opendev.org/c/zuul/zuul-jobs/+/935722 17:31
@clarkb:matrix.orgcorvus: Nasser over in Gerrit land is reporting that nodepool-launcher is failing to start with a traceback that indicates the nodepool-launcher openstack driver init method is crashing when trying to load openstack config (I think clouds.yaml type config) during driver loading18:10
@clarkb:matrix.orgcorvus: is this possibly a regression in nodepool-launcher where we now require an openstack config even if you aren't using the driver or have any providers with openstack configured?18:10
@clarkb:matrix.orgor maybe openstacksdk is crashing now when before it would just be a noop if you have no cloud config18:11
@clarkb:matrix.orgthe code itself doesn't appear to have changed18:12
@clarkb:matrix.orgec1bb9eef1ef493f5f8e20e65b44934a3b1f51dc this change in openstacksdk added the check18:15
@clarkb:matrix.organd I'm guessing our new release of nodepool is what caused gerrit to pick it up?18:15
@clarkb:matrix.orgso I think we need to make the openstack driver catch and ignore the exception raised here or similar: https://opendev.org/openstack/openstacksdk/src/branch/master/openstack/config/loader.py#L478-L481 18:17
@clarkb:matrix.orgI am in a meeting now but can look closer afterwards18:17
@jim:acmegating.comClark: it looks like nodepool-launcher is running; i don't see any errors related to openstack.18:34
@jim:acmegating.comi also don't see any changes merged to the ops or config repos recently18:34
@clarkb:matrix.orgcorvus: they just downgraded to :10.0.0 which fixes it by using older openstacksdk18:34
@clarkb:matrix.org11.0.0 and latest have openstacksdk 4.1.0 which creates the error18:35
@clarkb:matrix.orgI don't know how nasser downgraded it18:35
@jim:acmegating.comyeah... i'd normally expect a change for that :/18:35
@clarkb:matrix.orgbut the issue seems to be we load all drivers for nodepool when nodepool starts regardless of which we end up using and openstack's driver __init__ calls reset() which loads configs which now raises an exception in the code I linked above18:36
@jim:acmegating.comClark: do you have a copy of the traceback?18:37
@clarkb:matrix.orgunfortunately I'm not logged into discord on my desktop only my phone right now. Let me try and email it to myself or something that should work18:38
@clarkb:matrix.org```18:41
Traceback (most recent call last):
File "/usr/local/bin/nodepool-launcher", line 8, in <module>
sys.exit(main())
^^^^^^
File "/usr/local/lib/python3.11/site-packages/nodepool/cmd/launcher.py", line 88, in main
return NodePoolLauncherApp.main()
^^^^^^^^^^^^^^^^^^^^^^^^^^
[snip for discord msg length]
File "/usr/local/lib/python3.11/site-packages/nodepool/driver/__init__.py", line 91, in get
Drivers.load()
File "/usr/local/lib/python3.11/site-packages/nodepool/driver/__init__.py", line 84, in load
Drivers.drivers[driver] = driver_obj()
^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/nodepool/driver/openstack/__init__.py", line 27, in __init__
self.reset()
File "/usr/local/lib/python3.11/site-packages/nodepool/driver/openstack/__init__.py", line 24, in reset
self.openstack_config = loader.OpenStackConfig()
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/openstack/config/loader.py", line 214, in __init__
self._validate_config_file(self.config_filename, self.cloud_config)
File "/usr/local/lib/python3.11/site-packages/openstack/config/loader.py", line 479, in _validate_config_file
raise exceptions.ConfigException(
openstack.exceptions.ConfigException: Configuration file {path} is empty or not a valid mapping
```
@clarkb:matrix.orgthe `{path}` thing is due to a missing f'' string f prefix in openstacksdk which they have since fixed but not released I guess18:42
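A minimal sketch of the missing f prefix described above: without it, Python leaves the braces as literal text, which is why the traceback prints `{path}` instead of a filename. The path value here is illustrative only and is not taken from the traceback.

```python
path = "/etc/nodepool/clouds.yaml"  # illustrative value, not from the traceback

# No f prefix: the braces are ordinary characters, as in the error message above.
plain = "Configuration file {path} is empty or not a valid mapping"

# With the f prefix the variable is interpolated into the message.
formatted = f"Configuration file {path} is empty or not a valid mapping"

print(plain)
print(formatted)
```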
@jim:acmegating.com`  [snip for discord msg length]`  :(18:43
@clarkb:matrix.orgya....18:43
@clarkb:matrix.orgNasser has pushed a change up to pin the launcher to 10.0.0 now18:44
@clarkb:matrix.org443461 on their gerrit server18:44
@clarkb:matrix.orglet me see if I can get the full traceback now18:44
@jim:acmegating.comthanks, that would be useful.  there are a number of possible code paths...18:48
@clarkb:matrix.org```18:49
Traceback (most recent call last):
File "/usr/local/bin/nodepool-launcher", line 8, in <module>
sys.exit(main())
^^^^^^
File "/usr/local/lib/python3.11/site-packages/nodepool/cmd/launcher.py", line 88, in main
return NodePoolLauncherApp.main()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/nodepool/cmd/__init__.py", line 249, in main
return super(NodepoolDaemonApp, cls).main(argv)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/nodepool/cmd/__init__.py", line 196, in main
return cls()._main(argv=argv)
^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/nodepool/cmd/__init__.py", line 186, in _main
return self._do_run()
^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/nodepool/cmd/__init__.py", line 230, in _do_run
return super(NodepoolDaemonApp, self)._do_run()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/nodepool/cmd/__init__.py", line 192, in _do_run
return self.run()
^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/nodepool/cmd/launcher.py", line 64, in run
config = self.pool.loadConfig()
^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/nodepool/launcher.py", line 1123, in loadConfig
config = nodepool_config.loadConfig(self.configfile)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/nodepool/config.py", line 463, in loadConfig
newconfig.setProviders(config.get('providers'))
File "/usr/local/lib/python3.11/site-packages/nodepool/config.py", line 255, in setProviders
p = get_provider_config(provider)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/nodepool/config.py", line 408, in get_provider_config
driver = Drivers.get(provider['driver'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/nodepool/driver/__init__.py", line 91, in get
Drivers.load()
File "/usr/local/lib/python3.11/site-packages/nodepool/driver/__init__.py", line 84, in load
Drivers.drivers[driver] = driver_obj()
^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/nodepool/driver/openstack/__init__.py", line 27, in __init__
self.reset()
File "/usr/local/lib/python3.11/site-packages/nodepool/driver/openstack/__init__.py", line 24, in reset
self.openstack_config = loader.OpenStackConfig()
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/openstack/config/loader.py", line 214, in __init__
self._validate_config_file(self.config_filename, self.cloud_config)
File "/usr/local/lib/python3.11/site-packages/openstack/config/loader.py", line 479, in _validate_config_file
raise exceptions.ConfigException(
openstack.exceptions.ConfigException: Configuration file {path} is empty or not a valid mapping
```
@clarkb:matrix.orgdoes that imply they actually do have an openstack driver provider configured?18:49
@clarkb:matrix.orgI don't see one here at least: https://gerrit.googlesource.com/zuul/ops/+/refs/heads/master/nodepool/nodepool.yaml 18:50
@jim:acmegating.comno, i think it's more or less as you described, but the usage here is weird18:51
@jim:acmegating.comi don't see why that should be a global singleton18:51
@clarkb:matrix.orgoh I see we loop over all drivers not the drivers in the config18:51
@clarkb:matrix.org> <@jim:acmegating.com> i don't see why that should be a global singleton18:52
Drivers?
@jim:acmegating.comno openstack_config18:53
@clarkb:matrix.orgoh I'm not sure. fwiw we do have handling if the driver fails to load but I guess not if we raise we have to return a nil driver object instead18:54
@clarkb:matrix.orgwondering if maybe we should modify _load_class to try except and then additionally log if that fails18:55
@clarkb:matrix.orgthen it can be an annoying but not completely broken error in this case18:55
@jim:acmegating.comheh, i think maybe it's a singleton because it might reload clouds.yaml every time and we might want to avoid that for efficiency, but we call reset every time we load the config18:57
@jim:acmegating.comso i think the current behavior is equivalent to just removing that attr from the driver; i'll make a patch for that18:57
@clarkb:matrix.orgoh wait _load_class is before we actually fail18:57
@clarkb:matrix.orgok I think that works too as long as we load the config when needed elsewhere18:58
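A rough sketch of the structural difference being discussed, not the contents of the actual nodepool change: a driver that loads cloud config in `__init__` breaks startup for everyone when all drivers are instantiated, while a driver that loads it only when a provider asks for it does not. Every class and function name below is hypothetical, and `ConfigError` stands in for `openstack.exceptions.ConfigException`.

```python
class ConfigError(Exception):
    """Stand-in for openstack.exceptions.ConfigException."""


def load_cloud_config():
    """Hypothetical loader that always fails, like a bad clouds.yaml."""
    raise ConfigError("Configuration file is empty or not a valid mapping")


class EagerDriver:
    """Loads cloud config as a side effect of construction."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.openstack_config = load_cloud_config()


class LazyDriver:
    """Keeps no config attribute; loads config only when a provider needs it."""

    def reset(self):
        pass

    def get_provider_config(self, provider):
        return {"name": provider["name"], "cloud_config": load_cloud_config()}


def load_all(driver_classes):
    # Mirrors the idea of instantiating every known driver at startup,
    # whether or not the config file references it.
    return {cls.__name__: cls() for cls in driver_classes}


try:
    load_all([EagerDriver])
    eager_ok = True
except ConfigError:
    eager_ok = False

lazy_drivers = load_all([LazyDriver])
print(eager_ok, sorted(lazy_drivers))
```

With the lazy variant the error would still surface, but only for a deployment that actually configures a provider for that driver.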
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/nodepool] 935820: Remove global singleton openstack_config https://review.opendev.org/c/zuul/nodepool/+/935820 18:59
@jim:acmegating.comClark: ^ that's what i'm thinking.  i have not tested that locally.18:59
@clarkb:matrix.orgoh we don't actually use the attr at all we're doing that purely for side effects18:59
@jim:acmegating.comwe do use it; you can see in that change18:59
@clarkb:matrix.orgcorvus: how did this not break our testing with k8s?18:59
@clarkb:matrix.orgin theory our k8s functional job doesn't have an openstack provider. Did we just not run that job when we merged whatever updated things? However this updated long enough ago that seems unlikely19:00
@clarkb:matrix.orgcorvus: where do we call reset when we load the provider config?19:02
@clarkb:matrix.orgyour change makes sense to me I'm just not immediately seeing ^19:02
@clarkb:matrix.orgoh maybe that is part of the driver loading machinery19:02
@clarkb:matrix.orgyup loadConfig calls driver.reset for each driver19:03
@clarkb:matrix.orgthe only other question is why our testing didn't catch this. It may be worth fixing that as part of your change to ensure we've got coverage of both sides19:03
@jim:acmegating.comClark: maybe because there's always a valid clouds.yaml in all of our tests?19:05
@clarkb:matrix.orgya maybe we're writing clouds.yaml for the k8s job and we should stop?19:05
@jim:acmegating.comoh you mean the functional k8s job?19:06
@clarkb:matrix.orgyes19:06
@clarkb:matrix.orgsince in theory that job doesn't have openstack config in it so would fail like the gerrit install. But as far as I can tell it hasn't?19:07
@jim:acmegating.comit's a good question19:07
@clarkb:matrix.orgunless maybe it did and that was what the errors on mnaser's recent change are19:07
@jim:acmegating.comif it does catch it, then it's possible the sdk change snuck in between the last nodepool merge and release19:07
@clarkb:matrix.orgthe mnaser thing was in zuul-jobs so not testing nodepool with k8s and is likely unrelated19:08
@clarkb:matrix.orghttps://review.opendev.org/c/zuul/nodepool/+/934187 should've had a failing k8s job because openstacksdk updated in october and this ran in november. But the job passed so we have a gap in testing19:09
@clarkb:matrix.orgoh wait we don't run nodepool in that job like we do with the openstack job and run a daemon? we seem to run nox test cases19:10
@clarkb:matrix.orgmaybe that is how as we might manipulate things internally and not load all drivers or something?19:10
@clarkb:matrix.orgwe seem to go through useNodepool in the test framework for the k8s testing and I'm guessing that loads drivers differently maybe?19:11
@clarkb:matrix.orgthat creates a Nodepool instance and we start it which calls run which calls updateConfig which calls loadConfig so I'm confused how the k8s job works. There must be a "valid" cloud config?19:15
@clarkb:matrix.orgcorvus: do you think it is worth holding a node for the k8s job and trying to debug that further or if things pass call it good enough19:15
@jim:acmegating.comClark: my preference would be to leave it alone and back away slowly.19:16
@jim:acmegating.comthis has a limited shelf-life, and the fix is structural so we shouldn't have this kind of problem again even if sdk changes19:17
@clarkb:matrix.orgya19:18
@clarkb:matrix.orgmostly just want to convince myself that we're not doing something silly in testing but even if we are its probably fine19:18
@clarkb:matrix.orgcorvus: new theory: reading the sdk code I think they only validate the config if there was a config file on disk. https://opendev.org/openstack/openstacksdk/src/branch/master/openstack/config/loader.py#L211-L214 only calls validate if the load function returns a result which it only does if it finds a file and json or yaml loads it. My guess is the gerrit environment has a config file for google cloud that collides with the openstack files19:23
@clarkb:matrix.orgcorvus: our functional job works because there are no files and loading returns none and skips validation19:23
@clarkb:matrix.organd now we are failing on the too many docker requests problem19:27
@jim:acmegating.comClark: /etc/nodepool/clouds.yaml exists and is empty19:36
@jim:acmegating.comClark: we create it in gerrit's zuul, maybe to silence an old version of the error ... :/19:37
@jim:acmegating.comhttps://gerrit.googlesource.com/zuul/ops/+/refs/heads/master/playbooks/deploy.yaml#56 19:38
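A small sketch of why an empty clouds.yaml can fail while a missing one passes, under the theory above that validation only runs when a file was found and parsed. This is not the openstacksdk code: the function names are hypothetical, `ConfigError` stands in for `openstack.exceptions.ConfigException`, and JSON replaces YAML so the example needs only the standard library.

```python
import json
import os
import tempfile


class ConfigError(Exception):
    """Stand-in for openstack.exceptions.ConfigException."""


def load_file(path):
    """Return (path, data) when the file exists, else (None, None)."""
    if not os.path.exists(path):
        return None, None
    with open(path) as f:
        text = f.read()
    # An empty file parses to None, mirroring yaml.safe_load("").
    data = json.loads(text) if text.strip() else None
    return path, data


def build_config(path):
    found, data = load_file(path)
    if found is None:
        return {}  # nothing on disk: validation is skipped entirely
    if not isinstance(data, dict):
        raise ConfigError(
            f"Configuration file {found} is empty or not a valid mapping")
    return data


with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "missing.yaml")
    empty = os.path.join(tmp, "clouds.yaml")
    open(empty, "w").close()  # create a zero-byte file

    missing_result = build_config(missing)
    try:
        build_config(empty)
        empty_error = None
    except ConfigError as exc:
        empty_error = str(exc)

print(missing_result, empty_error is not None)
```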
@jim:acmegating.comremote:   https://gerrit-review.googlesource.com/c/zuul/ops/+/443462 Remove nodepool version pin and clouds.yaml [NEW]        19:40
@jim:acmegating.comClark: ^ so that should theoretically be an alternate fix.19:41
@jim:acmegating.com(but not in conflict with the first)19:41
@clarkb:matrix.orgOh neat that explains it. I think both fixes are good to have19:47
@jim:acmegating.comnasser +2d that, i submitted it; it should deploy shortly19:52
@fungicide:matrix.orghah, that's a marvellous situation19:53
@jim:acmegating.comit did not remove the key from that secret... :/20:01
@jim:acmegating.comhttps://ci.gerritcodereview.com/t/gerrit/build/50d6b60122bc4050b4e6a92997892c14/console#1/0/7/localhost 20:01
@jim:acmegating.comi will manually delete the secret and re-run the deploy job20:02
@jim:acmegating.comalso... um... did they undo the change to allow zuul repos to be self-approved?20:07
@clarkb:matrix.orgno clue20:08
@jim:acmegating.comoh no.. they just require a verified result apparently.  nm.  :)20:08
@jim:acmegating.com2024-11-20 20:13:20,076 INFO nodepool.NodePool: Nodepool launcher 11.0.1.dev3 2c9d067 starting20:15
@jim:acmegating.comlooks good now ^20:15
@sdodsley:matrix.orgAs part of our OpenStack 3rd Party CI system Zuul is sending logs to a VM in AWS, but the report to gerrit shows a link to the logs for the job runs that is not the publicly available AWS located, but a URL in our internal CI. How do I tell Zuul to use the URL for the AWS location?20:19
@fungicide:matrix.orgsdodsley: any relatively modern zuul should report the buildset link to gerrit, and then your zuul dashboard provides the breakout to individual log urls. what version of zuul are you running?20:22
@fungicide:matrix.orgalso, it's really up to the teams in openstack setting expectations for your testing to decide how your logs should be made available to them, if at all, so they're probably the best place to start with questions20:23
@sdodsley:matrix.orgthe buildset is not accessible to anyone outside of our org, which is why we push the logs to AWS. I need to put that AWS-based URL in the report so anyone can check our run logs20:23
@fungicide:matrix.orgyeah, i would suggest you start by asking openstack20:23
@fungicide:matrix.orgfigure out what they expect, and get them to guide you in configuring your ci system to meet their expectations20:24
@sdodsley:matrix.orgok- i'll try asking in openstack IRC20:24
@fungicide:matrix.orgzuul itself has a lot of options, and the zuul community can't reasonably guess what openstack is expecting you to provide with your test results20:25
@clarkb:matrix.orgthere is a url pattern you can use still I think20:25
@clarkb:matrix.orgI'm trying to find that now20:25
@fungicide:matrix.orgsdodsley: a team-specific irc channel will likely yield better results, for example #openstack-cinder if you're testing a cinder driver20:25
@clarkb:matrix.orgoh maybe its semi hardcoded now20:26
@clarkb:matrix.orghttps://opendev.org/zuul/zuul/src/branch/master/zuul/model.py#L6802-L6804 and then you can configure web root per tenant. Not sure if that will do what you want though20:27
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed:20:56
- [zuul/zuul] 935729: Batch fake build zookeeper additions https://review.opendev.org/c/zuul/zuul/+/935729
- [zuul/zuul] 935730: Batch fake build database additions https://review.opendev.org/c/zuul/zuul/+/935730
- [zuul/zuul] 935830: WIP: Add ZKObjectDict https://review.opendev.org/c/zuul/zuul/+/935830
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul-jobs] 935837: Fix dockerhub check in use-docker-mirror role https://review.opendev.org/c/zuul/zuul-jobs/+/935837 22:14
-@gerrit:opendev.org- Clark Boylan proposed: [zuul/zuul-jobs] 935837: Fix dockerhub check in use-docker-mirror role https://review.opendev.org/c/zuul/zuul-jobs/+/935837 23:06
-@gerrit:opendev.org- Zuul merged on behalf of Clark Boylan: [zuul/zuul-jobs] 935837: Fix dockerhub check in use-docker-mirror role https://review.opendev.org/c/zuul/zuul-jobs/+/935837 23:25
