| Nick | Message | Time |
|---|---|---|
| -@gerrit:opendev.org- | Simon Westphahl proposed: [zuul/zuul] 963933: Fix image upload validation race conditions https://review.opendev.org/c/zuul/zuul/+/963933 | 05:44 |
| -@gerrit:opendev.org- | Simon Westphahl proposed: [zuul/zuul-jobs] 963828: Allow upload-image-s3 role to export S3 URLS https://review.opendev.org/c/zuul/zuul-jobs/+/963828 | 08:31 |
| @y2kenny:matrix.org | Hi, I just upgraded to Zuul 13.0.0 and some configs that used to work now produce error: "Unable to freeze job graph: 'NoneType' object has no attribute 'project_canonical_name'"... is this a known issue? | 15:04 |
| @y2kenny:matrix.org | I thought it might have to do with my usage of zuul.project.canonical_name but that doesn't appear to be the case. I did a code search on hound for 'project_canonical_name' and got some hits, but I am not familiar with Zuul's code base. | 15:08 |
| @fungicide:matrix.org | Kenny Ho: what version were you running before the upgrade? | 15:09 |
| @y2kenny:matrix.org | 12.1.0 | 15:09 |
| @fungicide:matrix.org | okay, so just the previous version then | 15:11 |
| @clarkb:matrix.org | That error reads to me like Zuul is looking for a project_canonical_name in an object that is None | 15:11 |
| @fungicide:matrix.org | yeah, like in zookeeper? | 15:12 |
| @clarkb:matrix.org | without a traceback it's hard to say. I would expect the scheduler logs to have tracebacks around that info but I'm not certain of that | 15:12 |
| @y2kenny:matrix.org | I did a delete-state before upgrading... maybe that's a bad thing to do? | 15:12 |
| @y2kenny:matrix.org | ok... I found the trace... give me a sec | 15:15 |
| @clarkb:matrix.org | it's possible that zuul was working with a partially loaded config due to an error before. And now that it has to load most state from scratch it is unable to do so post delete-state | 15:15 |
| @clarkb:matrix.org | basically I don't think delete-state is the underlying cause of whatever the issue is, but it may have surfaced it | 15:15 |
| @y2kenny:matrix.org | ERROR zuul.Pipeline.osg.check: Traceback (most recent call last): | 15:17 |
| | ERROR zuul.Pipeline.osg.check: File "/usr/local/lib/python3.11/site-packages/zuul/manager/__init__.py", line 1828, in prepareItem | |
| | ERROR zuul.Pipeline.osg.check: item.freezeJobGraph(self.getLayout(item), | |
| | ERROR zuul.Pipeline.osg.check: File "/usr/local/lib/python3.11/site-packages/zuul/model.py", line 6741, in freezeJobGraph | |
| | ERROR zuul.Pipeline.osg.check: results = layout.createJobGraph(context, self, skip_file_matcher, | |
| | ERROR zuul.Pipeline.osg.check: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| | ERROR zuul.Pipeline.osg.check: File "/usr/local/lib/python3.11/site-packages/zuul/model.py", line 10355, in createJobGraph | |
| | ERROR zuul.Pipeline.osg.check: self.extendJobGraph( | |
| | ERROR zuul.Pipeline.osg.check: File "/usr/local/lib/python3.11/site-packages/zuul/model.py", line 10276, in extendJobGraph | |
| | ERROR zuul.Pipeline.osg.check: updates_job_config = item.updatesJobConfig( | |
| | ERROR zuul.Pipeline.osg.check: ^^^^^^^^^^^^^^^^^^^^^^ | |
| | ERROR zuul.Pipeline.osg.check: File "/usr/local/lib/python3.11/site-packages/zuul/model.py", line 7682, in updatesJobConfig | |
| | ERROR zuul.Pipeline.osg.check: if pb.source_context.project_canonical_name == project_cn: | |
| | ERROR zuul.Pipeline.osg.check: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | |
| | ERROR zuul.Pipeline.osg.check: AttributeError: 'NoneType' object has no attribute 'project_canonical_name' | |
| @clarkb:matrix.org | ok so what that code is doing is iterating over all playbooks in a job and looking up the source context and project canonical name for where the playbook originates | 15:19 |
| @clarkb:matrix.org | and apparently there is missing info for at least one of the playbooks | 15:19 |
| @clarkb:matrix.org | and I think the reason is that the source_context is None. Can you check your zuul config error list to see if there are projects that are just broken config wise? | 15:20 |
| @clarkb:matrix.org | it wouldn't surprise me if project A is unable to load configs so it has no source_context then when you run job for project B it tries to load info from project A and this is the result | 15:21 |
| @clarkb:matrix.org | the web ui has a blue bell button in the top right to give you the project config error list which is where I would start | 15:21 |
| @jangutter:matrix.org | Kenny Ho: If it doesn't have a bell, then you can check it with something like https://your-zuul-site.net/t/your-tenant/config-errors | 15:21 |
| @jangutter:matrix.org | (example: https://zuul.opendev.org/t/zuul/config-errors ) | 15:22 |
| @y2kenny:matrix.org | I see the bell and there are errors but I thought those had been there for a while without causing many issues. There are a few categories: | 15:24 |
| @y2kenny:matrix.org | Project Not Found: The projects "zuul/nodepool", "zuul/zuul" were not found... (these are local mirrors of upstream projects that also have zuul configs.) | 15:24 |
| @clarkb:matrix.org | Kenny Ho: yes, zuul tries to be resilient to errors and will use older configs that were valid if the current config is not valid. I think (but am not 100% certain) that when you ran delete-state you cleared out that fallback information and now zuul is trying to load the config from zero and it needs things to be valid to do so | 15:24 |
| @clarkb:matrix.org | generally you shouldn't need to run delete-state when upgrading fwiw | 15:25 |
| @y2kenny:matrix.org | ok | 15:25 |
| @y2kenny:matrix.org | another error type: | 15:25 |
| | UnknownCmd('git') failed due to: exit code(128) cmdline: git ls-remote --heads --tags https://src.fedoraproject.org/rpms/rocm stderr: 'fatal: unable to access 'https://src.fedoraproject.org/rpms/rocm/': The requested URL returned error: 503' | |
| @jangutter:matrix.org | Is it possible some of these were once available? | 15:26 |
| @clarkb:matrix.org | Kenny Ho: fedoraproject.org is running anubis now and their rules may be blocking valid git requests? | 15:26 |
| @clarkb:matrix.org | that is something you'd need to ask them about | 15:26 |
| @y2kenny:matrix.org | that may be the case... but I am hoping that wouldn't break zuul somehow | 15:26 |
| @y2kenny:matrix.org | last type of error: | 15:27 |
| | Will not fetch project branches as read-only is set | |
| @clarkb:matrix.org | by default anubis will only filter requests that present with a user agent indicating a browser | 15:27 |
| @clarkb:matrix.org | git uses a user agent that does not indicate it is a browser but they may be changing the defaults or applying additional rules or something | 15:27 |
| @clarkb:matrix.org | you can also try running that git command manually and see what happens | 15:27 |
| @fungicide:matrix.org | https://src.fedoraproject.org/rpms/rocm/ says it's a meta package, maybe it was originally an actual package | 15:28 |
| @y2kenny:matrix.org | but would any of these errors cause issues with loading zuul-config? | 15:29 |
| @fungicide:matrix.org | though it does still indicate it's a git repo | 15:29 |
| @y2kenny:matrix.org | none of these git repos has zuul configs in them | 15:30 |
| @y2kenny:matrix.org | well, except for the mirrored zuul related repos | 15:30 |
| @fungicide:matrix.org | fedora uses zuul, but i don't know whether they put zob configs in the package repos | 15:31 |
| @fungicide:matrix.org | er, job configs | 15:31 |
| @y2kenny:matrix.org | would a tenant-reconfigure help refresh the configuration? | 15:32 |
| @sarrafis:matrix.org | * Hi, I wanted to check if I am in the right place: I followed this link: https://docs.openstack.org/contributors/common/irc.html that said I can use IRC chat with Matrix through Zuul. Are there any other steps I should do now? I have a macbook M2 in case that's of relevance. | 15:34 |
| @fungicide:matrix.org | Ishita Sarraf: you're in the matrix room for the zuul project, not an irc channel | 15:35 |
| @jangutter:matrix.org | Kenny Ho: do you have a previous review that passed on that project? If you can figure out the jobs that should be in the buildset maybe you can work your way back from there. | 15:35 |
| @fungicide:matrix.org | isaacvicente: that article doesn't say anything about using irc chat with matrix through zuul, it's an article by the zuul community about communicating on irc through matrix | 15:36 |
| @fungicide:matrix.org | er, Ishita Sarraf: ^ | 15:36 |
| @y2kenny:matrix.org | I was trying that but I cut things down to even noop and I am still getting the same error | 15:36 |
| @fungicide:matrix.org | sorry isaacvicente, tab-completing names failed me | 15:36 |
| @jangutter:matrix.org | Does your config project add some jobs to that project? | 15:38 |
| @y2kenny:matrix.org | I have a separate trusted project to add jobs for that project yes. | 15:38 |
| @y2kenny:matrix.org | that project itself does not have any zuul configs in it | 15:38 |
| @jangutter:matrix.org | You could try clearing up the config errors to reduce the noise, and see if you can find a project where you can run the base and/or noop job first. Then gradually adding jobs till you find the ones that break it. | 15:43 |
| @y2kenny:matrix.org | um... ok. I did a tenant-reconfigure and the 503 errors related to fedoraproject.org are now gone but the problem persists | 15:44 |
| @jangutter:matrix.org | If a "good config" got cached, it's an unexploded bomb that will hit you at the most inopportune time. | 15:48 |
| @clarkb:matrix.org | you could also try to tackle it in the other direction. Make a list of all the playbooks the broken job is trying to run. Then lookup their locations and see if there are config errors for the origin project | 15:49 |
| @y2kenny:matrix.org | the problem is that I can't tell which job is broken right now | 15:53 |
| @y2kenny:matrix.org | that particular project triggers a lot of jobs | 15:53 |
| @y2kenny:matrix.org | the weird thing is how it still has the issue when I trim it down to noop... but maybe I didn't do it properly. | 16:07 |
| @y2kenny:matrix.org | ok... so I did a couple of tests: | 16:28 |
| | shut down whole zuul cluster, delete state, restart: problem persists. | |
| | delete state, downgrade whole cluster back to 12.1.0: problem went away. | |
| @fungicide:matrix.org | that's probably a really slow process to hook up to a git bisect | 16:30 |
| @fungicide:matrix.org | 143 non-merge commits between those versions | 16:31 |
| @fungicide:matrix.org | so 7-8 steps | 16:31 |
| @y2kenny:matrix.org | in case this may help identify the issue, the triggering project is the linux kernel (so no zuul config in it), with the trigger stored in a separate trusted project (zuul-project) and the job stored in an untrusted project (zuul-job) | 16:34 |
| @jim:acmegating.com | * Kenny Ho: FYI: `Will not fetch project branches as read-only is set` really means there was a problem getting the branches of that repo on the scheduler. the actual error would only be in the scheduler logs; the web server doesn't have access to it so it outputs that error instead. it may be similar or the same as the errors you cited regarding the rocm repo. | 16:40 |
| @fungicide:matrix.org | the code block where that exception is being raised came in with https://review.opendev.org/941171 "Ignore file matches on changes to include-vars and playbooks" and is new in 13.0.0 | 16:44 |
| @fungicide:matrix.org | so a simple (but perhaps not that helpful) answer to why the exception isn't raised in 12.1.0 is that the code didn't exist in that version | 16:49 |
| @bennetefx:matrix.org | Hi, just a quick reminder on this change https://review.opendev.org/c/zuul/zuul/+/962904 — it has a +1 from the Zuul pipeline tests and is ready for review when someone has a moment. Thanks! | 17:11 |
| -@gerrit:opendev.org- | James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 964194: Add label aliases for static nodes https://review.opendev.org/c/zuul/zuul/+/964194 | 20:10 |
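
The failure mode clarkb describes at 15:19 to 15:21 (a loop over a job's playbooks that dereferences a `source_context` which turns out to be `None`) can be illustrated with a small standalone sketch. This is not Zuul's code: `SourceContext`, `Playbook`, and both helper functions are hypothetical stand-ins, and whether skipping such playbooks is the right behaviour for Zuul itself is not something the log settles.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SourceContext:
    # Hypothetical stand-in: records which project a playbook was defined in.
    project_canonical_name: str


@dataclass
class Playbook:
    # Hypothetical stand-in for a playbook attached to a job.
    path: str
    # None models a playbook whose origin could not be resolved.
    source_context: Optional[SourceContext] = None


def playbooks_from_project(playbooks: List[Playbook], project_cn: str) -> List[str]:
    """Return paths of playbooks defined in project_cn, skipping unresolved ones."""
    matches = []
    for pb in playbooks:
        if pb.source_context is None:
            # Without this guard, the attribute access below raises
            # AttributeError: 'NoneType' object has no attribute
            # 'project_canonical_name', as in the traceback.
            continue
        if pb.source_context.project_canonical_name == project_cn:
            matches.append(pb.path)
    return matches


def playbooks_missing_context(playbooks: List[Playbook]) -> List[str]:
    """List the playbooks that would trigger the error, to narrow down the job."""
    return [pb.path for pb in playbooks if pb.source_context is None]
```

The second helper mirrors the suggestion made at 15:49: list the playbooks a job runs and check which of them have no known origin, rather than guessing which job is broken.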
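
The "7-8 steps" estimate fungicide gives at 16:31 for bisecting 143 commits follows from halving the candidate range at each step, so the count is roughly log2 of the number of commits. A minimal sketch of that arithmetic (the function name is made up for illustration, and the exact count `git bisect` reports can differ slightly):

```python
import math


def bisect_step_range(n_commits: int) -> tuple:
    """Return (lower, upper) bounds on bisection steps for n_commits candidates."""
    if n_commits < 1:
        raise ValueError("need at least one commit")
    exact = math.log2(n_commits)
    # Each step roughly halves the remaining candidates.
    return math.floor(exact), math.ceil(exact)
```

Calling `bisect_step_range(143)` gives the 7 to 8 range quoted in the chat.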