*** ysandeep|out is now known as ysandeep | 04:30 | |
*** ykarel|away is now known as ykarel | 05:02 | |
*** iurygregory_ is now known as iurygregory | 06:42 | |
*** rpittau|afk is now known as rpittau | 07:22 | |
*** jpena|off is now known as jpena | 07:34 | |
opendevreview | Jiri Podivin proposed openstack/openstack-zuul-jobs master: DNM https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/804962 | 08:16 |
*** mgoddard- is now known as mgoddard | 08:20 | |
*** ysandeep is now known as ysandeep|lunch | 08:26 | |
gryf | hi. I'm just wondering. I have a job with two nodes (controller and compute1), defined like this: https://opendev.org/openstack/kuryr-kubernetes/src/branch/master/.zuul.d/multinode.yaml#L15-L63 and I'd like to pass something which is generated while stacking devstack. Is it possible? | 08:35 |
gryf | I'm looking at https://zuul-ci.org/docs/zuul/reference/jobs.html#return-values but cannot figure out how to pass something from the controller to compute1. | 08:36 |
gryf | especially, how would data.foo = bar be exposed to zuul by the controller? I'd like to have it passed to compute1. | 08:37 |
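(A minimal sketch of one way to do what gryf asks, not from the log: within a single multinode job, plays in the same playbook run can target both nodes, so a value registered on the controller is readable from compute1 via hostvars. The file paths below are made-up placeholders; the host names follow the job's nodeset. Note that the zuul_return return values linked above pass data between jobs, not between the nodes of one job.)

    - hosts: controller
      tasks:
        # Capture a value produced while stacking devstack
        # (the path is a hypothetical placeholder).
        - name: Read the generated value on the controller
          command: cat /opt/stack/generated-value
          register: generated_value

    - hosts: compute1
      tasks:
        # Registered variables are host-scoped, but remain visible
        # from later plays in the same run via hostvars.
        - name: Use the controller's value on compute1
          copy:
            content: "{{ hostvars['controller'].generated_value.stdout }}"
            dest: /tmp/value-from-controller  # hypothetical destination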
opendevreview | chandan kumar proposed openstack/openstack-zuul-jobs master: DNM https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/804962 | 08:39 |
opendevreview | chandan kumar proposed openstack/openstack-zuul-jobs master: DNM https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/804962 | 08:40 |
opendevreview | Sorin Sbârnea proposed openstack/openstack-zuul-jobs master: tox: help py36 jobs use utf8 encoding https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/804890 | 08:58 |
opendevreview | chzhang8 proposed openstack/project-config master: bring tricircle under x namespaces https://review.opendev.org/c/openstack/project-config/+/804969 | 09:00 |
opendevreview | chzhang8 proposed openstack/project-config master: bring tricircle under x namespaces https://review.opendev.org/c/openstack/project-config/+/804970 | 09:04 |
*** ykarel is now known as ykarel|lunch | 09:04 | |
opendevreview | chandan kumar proposed openstack/openstack-zuul-jobs master: DNM https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/804962 | 09:06 |
opendevreview | chzhang8 proposed openstack/project-config master: bring tricircle under x namespaces https://review.opendev.org/c/openstack/project-config/+/804972 | 09:20 |
*** ysandeep|lunch is now known as ysandeep | 09:47 | |
opendevreview | chzhang8 proposed openstack/project-config master: bring tricircle under x namespaces https://review.opendev.org/c/openstack/project-config/+/804977 | 09:51 |
*** odyssey4me is now known as Guest4722 | 10:37 | |
*** ykarel|lunch is now known as ykarel | 10:58 | |
*** jpena is now known as jpena|lunch | 11:34 | |
*** rlandy is now known as rlandyrover | 11:36 | |
*** rlandyrover is now known as rlandy|rover | 11:36 | |
*** sshnaidm|pto is now known as sshnaidm | 12:25 | |
*** jpena|lunch is now known as jpena | 12:32 | |
slaweq | frickler: hi | 12:39 |
slaweq | frickler: can You maybe help me once again with a zuul-related issue (I think it's zuul related) | 12:39 |
slaweq | some time ago we had a problem in tobiko where zuul was reporting "Unable to freeze job graph: Job devstack-tobiko is abstract and may not be directly run" | 12:39 |
slaweq | we fixed that with patch https://review.opendev.org/c/x/devstack-plugin-tobiko/+/804356 | 12:40 |
slaweq | but now it happened again for us in patch https://review.opendev.org/c/x/tobiko/+/804881 | 12:40 |
slaweq | and looking at https://zuul.openstack.org/job/devstack-tobiko we see there are 2 devstack-tobiko jobs defined there | 12:40 |
slaweq | and one of them is abstract | 12:40 |
slaweq | but in fact we don't have 2 definitions of that job, only one, which definitely isn't abstract | 12:41 |
slaweq | frickler: can You take a look and help me understand that problem? Thx in advance | 12:41 |
frickler | slaweq: sorry, no time right now, maybe some other infra-root can take a look in a bit | 13:00 |
slaweq | frickler: sure, np | 13:00 |
slaweq | infra-root folks, can someone help me with ^^? Thx in advance | 13:01 |
fungi | slaweq: so this started sometime after 14:55 utc yesterday when the previous patchset was pushed? | 13:08 |
fungi | and by 07:16 utc today when the next revision was pushed | 13:09 |
fungi | i guess it affects all open changes for x/tobiko? | 13:09 |
slaweq | fungi: it started happening again today, at least we noticed it then | 13:10 |
slaweq | fungi: and it is now affecting all tobiko patches | 13:11 |
fungi | slaweq: i do wonder if there's any relationship with the remaining config errors for x/devstack-plugin-tobiko: https://zuul.opendev.org/t/openstack/config-errors | 13:11 |
slaweq | fungi: it's related in the sense that it reports issues in jobs defined in zuul.d/jobs.yaml | 13:13 |
slaweq | and that file doesn't exist anymore | 13:13 |
slaweq | the abstract "version" of the devstack-tobiko job was also defined in that same file | 13:13 |
slaweq | but now it's in a different file | 13:13 |
fungi | i see, and that was adjusted by https://review.opendev.org/801436 which merged on july 21 | 13:17 |
slaweq | fungi: yes | 13:18 |
fungi | yeah, something weird is going on there, zuul should no longer be complaining about a file which doesn't exist | 13:20 |
slaweq | fungi: we just tried reverting the patch which renamed the file https://review.opendev.org/c/x/devstack-plugin-tobiko/+/805018/ and with that the jobs seem to run https://review.opendev.org/c/x/tobiko/+/805019/ | 13:23 |
slaweq | so it seems that zuul still has the old files somewhere - maybe it's some cache? | 13:23 |
fungi | slaweq: yes, that's what i'm inquiring about. there was some recent work on configuration caches, and we restarted zuul between those known working and broken runs | 13:25 |
fungi | yeah, so the old file before it was renamed did define that job as abstract, so if zuul is still caching that old file then it would explain both the problem you noticed and the stale config errors | 13:43 |
zbr | fungi: clarkb re py36-pip issue, i updated https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/804890 with the correct fix and even managed to get a helping change merged into tox itself: https://github.com/tox-dev/tox/pull/2162 | 14:08 |
zbr | for the moment on tripleo we are implementing `LC_ALL={env:LC_ALL:en_US.UTF-8}` in tox.ini as a workaround. At least that one respects other values if they're already present on the nodes. | 14:10 |
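(For illustration, a sketch of the workaround zbr describes, placed in a project's tox.ini; tox's {env:KEY:default} substitution keeps any LC_ALL already set on the node and only falls back to en_US.UTF-8 when the variable is unset.)

    [testenv]
    setenv =
        # keep the node's LC_ALL if present, otherwise default to en_US.UTF-8
        LC_ALL = {env:LC_ALL:en_US.UTF-8}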
fungi | to clarkb's point yesterday though, if we work around it in the job definition instead of in the project's tox.ini file, then people may be confused when running tox locally breaks in ways that running it under zuul does not | 14:18 |
*** ykarel is now known as ykarel|away | 14:42 | |
clarkb | fwiw I think updating tox.ini is the correct fix not a workaround (as long as pip doesn't fix it themselves) | 15:14 |
*** ysandeep is now known as ysandeep|away | 15:38 | |
*** jpena is now known as jpena|off | 15:42 | |
*** rpittau is now known as rpittau|afk | 16:06 | |
zbr | the tox patch to add LC_ALL to the default passenv was just released. this should help a little | 16:30 |
zbr | i doubt pip will fix it, as it appears to be deep in python, filesystem-related apparently. ansible contains some test files with unicode filenames, and that is what is breaking it. | 16:31 |
zbr | most projects do not have files with unicode names inside their wheels | 16:32 |
fungi | yeah, having tox itself do that seems like a better solution, since if users run into it with local testing the solution is simply to upgrade tox | 16:33 |
zbr | imho, we still need the patch for the openstack-tox-py36 job if we do not want to force all repo owners to add the hack themselves. | 16:59 |
clarkb | zbr: right we are saying repos should be forced to do that so there isn't confusion when people run tox locally | 17:00 |
fungi | or tell people to use newer tox which infers a locale when none is set | 17:08 |
zbr | if i remember correctly microsoft already added that locale on their github images, as i did not encounter the same bug there, even though i run lots of py36 jobs installing ansible | 17:20 |
fungi | some distros do set a default system-wide locale in their images | 17:21 |
fungi | also some connection methods may override those envvars | 17:21 |
fungi | for example, openssh tries to substitute your client-side locale vars in place of whatever might be set server-side | 17:22 |
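(To illustrate the openssh behavior fungi mentions; the exact defaults vary by distribution, but Debian and Ubuntu ship roughly this in their stock configuration.)

    # client side, e.g. /etc/ssh/ssh_config: send the local locale vars along
    Host *
        SendEnv LANG LC_*

    # server side, e.g. /etc/ssh/sshd_config: sshd only applies the client's
    # locale vars if they are explicitly accepted
    AcceptEnv LANG LC_*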
fungi | slaweq: so the good news is i think corvus has identified the cause, it does indeed seem to be a stale cache problem of sorts (old files not getting correctly removed from the cache, then being read in from the cache on the next startup) | 17:34 |
fungi | also, it looks like 804356, which you thought fixed a problem in your config, did not; there was no actual problem in your current config, you were seeing a ghost of the old config, but merging that change caused zuul to use a correct view of that project's configuration until its next scheduler restart | 17:35 |
fungi | which explains why the problem seems to have spontaneously returned | 17:36 |
fungi | whoami-rajat: circling back around to your stale config issue from yesterday, we suspect it was different; it looks like zuul did not see, or had trouble processing, the merge event for your config change, which resulted in its cache continuing with the previous content | 17:38 |
fungi | the error we saw in the debug log may be related to network instability between the zuul scheduler and the gerrit server (they're in different cloud providers now and communicate with each other across the open internet) | 17:39 |
fungi | clarkb has proposed https://review.opendev.org/804929 to hopefully make that communication a bit more robust | 17:39 |
whoami-rajat | fungi, ack, thanks for the update, will recheck after this change merges | 17:40 |
fungi | whoami-rajat: yeah, your issue may have been solved by yesterday's scheduler restart, since it picked up a change to perform a reconfigure on startup looking for stale configuration states (though that clearly doesn't catch the deleted files left behind in the cache) | 17:42 |
fungi | also corvus has proposed https://review.opendev.org/804304 which would give our administrators the ability to clear the entire state cache | 17:43 |
slaweq | fungi: corvus: thx a lot for help with that issue | 17:45 |
fungi | if all goes well we may restart on a fixed version later today or tomorrow now that the cause is better understood | 17:48 |
clarkb | zbr: fwiw we run lots of py36 jobs on bionic too and don't hit the problem | 17:50 |
clarkb | zbr: zuul in particular comes to mind and it does install ansible | 17:50 |
*** timburke_ is now known as timburke | 21:00 | |