*** pojadhav- is now known as pojadhav | 04:55 | |
*** marios is now known as marios|ruck | 04:57 | |
*** soniya29 is now known as soniya29|ruck | 05:35 | |
*** soniya29|ruck is now known as soniya29|rover | 05:37 | |
*** ysandeep|out is now known as ysandeep | 06:23 | |
frickler | infra-root: zuul seems to be in a bad state, no check jobs running, trying to find out more details | 07:00 |
frickler | https://zuul.opendev.org/t/openstack/builds shows only periodic jobs for some hours. I do see events submitted from gerrit, but they seem to disappear somewhere | 07:09 |
ysandeep | marios|ruck, soniya29|rover ^^ fyi.. | 07:12 |
soniya29|rover | ysandeep, ack | 07:12 |
frickler | this looks like it could be related https://paste.opendev.org/show/b2jHWxtJG1ljGC3ePUsh/ | 07:15 |
marios|ruck | thanks ysandeep | 07:16 |
frickler | #status notice zuul isn't executing check jobs at the moment, investigation is ongoing, please be patient | 07:17 |
opendevstatus | frickler: sending notice | 07:17 |
-opendevstatus- NOTICE: zuul isn't executing check jobs at the moment, investigation is ongoing, please be patient | 07:17 | |
opendevreview | Arnaud Morin proposed openstack/project-config master: [OVH/GRA1] Disable nodepool temporarily https://review.opendev.org/c/openstack/project-config/+/835415 | 07:26 |
swest | frickler: re. the exception: this shouldn't block anything when it happens during tenant trigger processing. the exception is the result of a data race when reading the state while it is currently being updated | 07:38
*** jpena|off is now known as jpena | 07:39 | |
*** soniya29|rover is now known as soniya29|rover|lunch | 07:41 | |
*** ysandeep is now known as ysandeep|lunch | 07:49 | |
frickler | swest: right, it just matched the time when I thought the issue might have started, but looking further, this seems to happen pretty often, so likely unrelated | 08:14
frickler | too bad we have so many errors occurring regularly | 08:15
frickler | gerrit is even worse, the log is a continuous stream of error messages | 08:15
frickler | swest: do you think I should try restarting a scheduler? or better wait for further investigation? | 08:20 |
frickler | btw I tested that gate jobs are not being triggered either, so only periodic jobs are working. not sure if manually enqueueing something might be interesting? | 08:22
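For reference, manually enqueueing a change would normally go through the zuul-client CLI; a rough sketch, assuming a configured zuul-client and with the project and change left as placeholders:

```shell
# illustrative only: push a gerrit change back into the check pipeline
zuul-client enqueue --tenant openstack --pipeline check \
    --project <project> --change <number>,<patchset>
```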
swest | frickler: I think you could try restarting a scheduler. As periodic jobs are starting it seems to be more of a problem related to gerrit events and not a general issue. | 08:23 |
swest | do you see any exceptions related to Gerrit event processing? | 08:25 |
frickler | restarting zuul01 scheduler now | 08:31 |
frickler | swest: no exceptions, just something like this and then nothing matching that change afterwards 2022-03-28 06:37:04,258 DEBUG zuul.zk.event_queues.ConnectionEventQueue: Submitting connection event to queue /zuul/events/connection/gerrit/events/queue: | 08:33 |
frickler | restart has helped. check 50, gate 22 | 08:34 |
frickler | seems zuul is now processing all the previous events, like from https://review.opendev.org/835413 | 08:35 |
swest | frickler: sounds like the gerrit event pre-processing was stuck somehow | 08:35
frickler | ah, I probably should've collected stacktraces | 08:36 |
frickler | trying that for zuul02, maybe it still has the broken state | 08:36
swest | only one of the schedulers is elected for processing the Gerrit events. but the stacktrace might still be helpful | 08:38 |
*** ysandeep|lunch is now known as ysandeep | 08:45 | |
Ian | IDENTIFY | 08:45 |
frickler | likely too late, seems zuul02 started working as soon as zuul01 did and was elected then | 08:45
frickler | Ian: nope | 08:45 |
Ian | sorry. | 08:46 |
frickler | Ian: np, just kidding ;) | 08:47 |
frickler | this has a gap between 01:12:49,653 and 08:25:42,586: grep Forwarding /var/log/zuul/debug.log|grep -v " timer "|grep -v " pull_request " | 08:47 |
Ian | always knew IRC existed. never messed with it | 08:47
frickler | swest: let's hope that if it is a regression in recent zuul code, it will be triggered again soonish | 08:48 |
opendevreview | Merged openstack/project-config master: [OVH/GRA1] Disable nodepool temporarily https://review.opendev.org/c/openstack/project-config/+/835415 | 09:04 |
*** soniya29|rover|lunch is now known as soniya29|rover | 09:18 | |
opendevreview | Arnaud Morin proposed openstack/project-config master: Revert "[OVH/GRA1] Disable nodepool temporarily" https://review.opendev.org/c/openstack/project-config/+/835422 | 09:23 |
opendevreview | Chris MacNaughton proposed openstack/project-config master: Add Ganesha based Ceph NFS Charm https://review.opendev.org/c/openstack/project-config/+/835430 | 10:17 |
dpawlik | Ian, fungi, clarkb: after merging that change https://review.opendev.org/c/openstack/ci-log-processing/+/833011 logsender will take more time to parse the log files, so can we resize logscraper01.openstack.org once again to have 4 vcpus? Right now it has 2. | 10:32
dpawlik | logsender seems to be better than logstash, so I will ask Reed to remove that stack related to logstash. | 10:32
dpawlik | not you reed :P | 10:32 |
opendevreview | Andy Ladjadj proposed zuul/zuul-jobs master: [upload-logs-base] add public url attribute https://review.opendev.org/c/zuul/zuul-jobs/+/834043 | 11:34 |
opendevreview | Andy Ladjadj proposed zuul/zuul-jobs master: [upload-logs-base] add public url attribute https://review.opendev.org/c/zuul/zuul-jobs/+/834043 | 11:35 |
opendevreview | Andy Ladjadj proposed zuul/zuul-jobs master: [upload-logs-base] add public url attribute https://review.opendev.org/c/zuul/zuul-jobs/+/834043 | 11:36 |
*** dviroel|pto is now known as dviroel | 11:52 | |
fungi | dpawlik: i can try to find time in a little while, seems like there's a bunch to catch up on first this morning | 11:58 |
*** pojadhav is now known as pojadhav|afk | 11:59 | |
dpawlik | fungi: ok | 12:02 |
ironfoot | Hi there, I'm somehow struggling to configure my environment to send a patch to gerrit. | 12:17 |
ironfoot | I've done this in the past, and I can't figure out what's wrong. The main issue is that my configured ssh keys don't seem to work | 12:18 |
fungi | ironfoot: you're following the setup guide in https://docs.opendev.org/opendev/infra-manual or some other instructions? | 12:18 |
frickler | ironfoot: are you using fedora35 or some other distro with recent openssh? | 12:18 |
ironfoot | didn't start reading that from scratch, as I managed to send some patches in the past | 12:18
ironfoot | frickler: oh no, is that an issue? | 12:19 |
ironfoot | (yes, fedora 35) | 12:19 |
fungi | ahh, yep | 12:19 |
frickler | ironfoot: yes, gerrit has some issue with rsa keys and new openssh. workaround is to use other key type like elliptic curve | 12:19 |
frickler | or some special ssh options that I can remember | 12:20 |
frickler | can't | 12:20 |
ironfoot | right, will create a new key :) | 12:20 |
fungi | for some reason, new versions of openssh have elected to fall back to the ssh-rsa protocol when an rsa key is used, but also blocked use of ssh-rsa at the same time. there's finally support for negotiating rsa key exchanges with stronger hash algorithms in gerrit's development branch, so hopefully this will cease to be a problem soon | 12:22
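For anyone else hitting this, a minimal sketch of the two workarounds mentioned above; the key path and host are just the usual defaults:

```shell
# option 1: generate an elliptic-curve key and add it to your gerrit account
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519

# option 2: re-allow ssh-rsa signatures for the gerrit host only
cat >> ~/.ssh/config <<'EOF'
Host review.opendev.org
    PubkeyAcceptedAlgorithms +ssh-rsa
EOF
```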
fungi | (once we upgrade to gerrit 3.6.x i think?) | 12:22 |
ironfoot | can confirm, that fixed the issue | 12:23 |
ironfoot | or worked around it | 12:23 |
ironfoot | thanks frickler and fungi :D | 12:24 |
fungi | yw | 12:24 |
*** ysandeep is now known as ysandeep|afk | 12:25 | |
*** dviroel is now known as dviroel|brb | 12:43 | |
tobias-urdin | any known issues cloning from gerrit? i can access gerrit but cloning just hangs with no progress for the last 5-10 minutes now at least https://paste.opendev.org/show/b4wkca8DepfpnRnMHMJS/ | 12:53
frickler | tobias-urdin: I don't know of any particular current issue, but the general reply likely is that you shouldn't do this for performance reasons. clone from opendev.org, which is our scalable gitea farm, instead. you can then add gerrit as a second remote if needed after that | 13:00
fungi | also the gitea servers have copies of all the gerrit change refs, git notes, and so on. there's nothing you can pull from gerrit via git which isn't also in the gitea servers | 13:04 |
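A quick sketch of the workflow frickler and fungi describe, using a hypothetical project and username:

```shell
# clone from the scalable gitea farm rather than gerrit
git clone https://opendev.org/openstack/nova
cd nova
# then add gerrit as a second remote if needed
# (git-review's `git review -s` normally sets this up for you)
git remote add gerrit ssh://<username>@review.opendev.org:29418/openstack/nova
```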
*** ysandeep|afk is now known as ysandeep | 13:14 | |
tobias-urdin | ack, it went away for a coffee and 15 minutes later it was cloned, sorry for the noise :) | 13:21
tobias-urdin | I went away :p | 13:21 |
*** dviroel|brb is now known as dviroel | 13:27 | |
*** pojadhav|afk is now known as pojadhav | 14:02 | |
fungi | dpawlik: okay, i'm going to try to resize it from v3-starter-2 which has 4gb ram and 2 vcpus to v2-highcpu-4 which has the same 4gb ram but 4 vcpus | 14:17 |
fungi | in progress now | 14:18 |
fungi | once it's up and you have a chance to double-check it looks okay, let me know and i'll confirm the resize through the api | 14:18 |
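For the record, the resize roughly corresponds to something like the following; the flavor and server names are taken from the discussion, and the exact confirm syntax may vary with the openstackclient version:

```shell
# resize the instance to the higher-cpu flavor
openstack server resize --flavor v2-highcpu-4 logscraper01.openstack.org
# after checking the server looks okay, confirm the resize
openstack server resize confirm logscraper01.openstack.org
# (older clients use: openstack server resize --confirm <server>)
```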
fungi | looks like it's booted | 14:21 |
dpawlik | fungi: yep, it works. Thank you | 14:27 |
fungi | cool, i've marked it confirmed through the api now as well | 14:28 |
*** artom__ is now known as artom | 15:03 | |
*** marios|ruck is now known as marios|ruck|call | 15:23 | |
clarkb | fungi: yup gerrit 3.6 should fix it | 15:24 |
clarkb | frickler: swest: to clarify, a single scheduler handles a single pipeline at a time. But each pass through can be handled by the other scheduler. They elect each other but not necessarily for the long term, if that makes sense | 15:27
swest | clarkb: I was talking about the Gerrit event preprocessing | 15:34 |
clarkb | ah yup | 15:35 |
clarkb | sorry, I saw the check queue wasn't processing so I thought that was the focus. Didn't realize that event processing beyond that was also impacted | 15:35
jrosser | does anyone know how the version of setuptools is decided here (it's an openstack-tox-docs job) https://zuul.opendev.org/t/openstack/build/5027f55aed1444ec96ac10ce7df6ecb8/log/job-output.txt#645 | 15:49
clarkb | jrosser: they are bundled by virtualenv | 15:50 |
jrosser | ah ok - so to get a later setuptools would require a later release of virtualenv | 15:51 |
clarkb | yes, or you need to manually update setuptools afterwards | 15:51
fungi | right. the virtualenv 20.14.0 release on friday updated setuptools from 60.10.0 to 61.1.0 | 16:01 |
fungi | according to its changelog | 16:01 |
*** marios|ruck|call is now known as marios|ruck | 16:03 | |
jrosser | fungi: i've just done some experiments here and it looks like 61.1.0 is good enough to fox what failed in that job | 16:04 |
jrosser | so hopefully this turns into a no-op | 16:04 |
jrosser | fix not fox :) | 16:05 |
fungi | you'll outfox the bug | 16:06 |
jrosser | oh wait - that job already installs virtualenv-20.14.0-py2.py3-none-any.whl | 16:06 |
fungi | what's the error you're encountering? | 16:06 |
fungi | is it the multiple top-level packages error? | 16:06
jrosser | yes it is | 16:06 |
jrosser | https://github.com/pypa/setuptools/issues/3197 | 16:07 |
fungi | okay, then that's the same problem everyone else ran into with setuptools 61 | 16:07 |
fungi | some projects (e.g. tripleo) merged workarounds on friday when it first hit | 16:07 |
jrosser | indeed - i was looking to avoid having to do that in ~50 repos if there's already a fix in setuptools | 16:08
fungi | only repos with multiple top-level packages in them run into this, as the error indicates, so thankfully it's a fairly small number of projects impacted | 16:08 |
clarkb | what version of setuptools has the fix? | 16:08
fungi | but i guess that's a design pattern adopted across a large swath of openstack-ansible? | 16:08 |
jrosser | alternatively, we have some other mistake in our config that triggers this and we should instead fix that | 16:08 |
jrosser | yes, it's pretty much copy/paste everywhere | 16:09 |
clarkb | fungi: the issue is when you have playbooks/ and roles/ and so on they all have python stuff in them and setuptools has a sad. system-config has the same problem | 16:09 |
fungi | unless i misread the setuptools discussion, this is working as designed | 16:09 |
fungi | anything using pbr is not impacted, right? | 16:09 |
clarkb | fungi: no | 16:10 |
jrosser | clarkb: i just tried a venv here with 61.0.0 and it broke like my zuul job, 61.1.0 seemed to work OK | 16:10 |
clarkb | system-config uses pbr and is affected | 16:10 |
fungi | i thought pbr assembles the manifest.in rather than relying on setuptools' discovery | 16:10 |
clarkb | fungi: it does, but setuptools still runs the discovery and fails stuff | 16:10 |
fungi | ahh, so still have to tell setuptools not to try to discover things | 16:10 |
clarkb | fungi: this is why the tripleo workaround works | 16:10 |
clarkb | they told setuptools there are no package modules, so it doesn't discover things. Then pbr did its discovery and populated the package content | 16:11
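A hedged sketch of what that kind of workaround tends to look like (an illustration, not necessarily the exact tripleo change): an empty module declaration stops setuptools >= 61 from attempting flat-layout auto-discovery, while pbr still does the real package discovery.

```shell
# add an empty py_modules declaration to setup.cfg; if the file already has an
# [options] section, put the line there instead of appending a new section
cat >> setup.cfg <<'EOF'

[options]
py_modules =
EOF
```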
fungi | yeah, looks like system-config does indeed rely on pbr | 16:11 |
clarkb | if 61.1.0 fixes things then we are good I think | 16:11 |
jrosser | once there is a release of virtualenv? | 16:13 |
*** dviroel is now known as dviroel|lunch | 16:13 | |
fungi | or forcing a setuptools upgrade in the env | 16:13 |
fungi | "#3202: Changed behaviour of auto-discovery to not explicitly expand package_dir for flat-layouts and to not use relative paths starting with ./." | 16:13 |
fungi | i guess that's the one? | 16:13 |
clarkb | jrosser: oh yup it's still 61.0.0 in virtualenv, got it. Need an update there too | 16:13
jrosser | is there a way to upgrade setuptools on a job which i'm a sort of drive-by consumer of, like openstack-tox-docs? | 16:14 |
fungi | ahh, no, it was this: | 16:14 |
fungi | "#3211: Disabled auto-discovery when distribution class has a configuration attribute (e.g. when the setup.py script contains setup(..., configuration=...)). This is done to ensure extension-only packages created with numpy.distutils.misc_util.Configuration are not broken by the safe guard behaviour to avoid accidental multiple top-level packages in a flat-layout." | 16:14 |
clarkb | If tox has a flag to tell virtualenv to update setuptools then we could set that flag in our base tox job for a while maybe | 16:14 |
fungi | it does, yes | 16:14 |
*** marios|ruck is now known as marios|out | 16:15 | |
fungi | well, maybe not a cli option | 16:15 |
fungi | it has an option you can set in tox.ini | 16:15 |
fungi | maybe there's a corresponding envvar, checking | 16:15 |
clarkb | fungi: does pbr set configuration=? | 16:15 |
clarkb | I don't understand why 3211 would fix it | 16:15 |
fungi | clarkb: yeah, i don't understand why 3211 would fix it either, but issue 3197 claims to be "fixed" by it | 16:16 |
clarkb | ah | 16:16 |
fungi | so maybe it was fixed by a combination of 3211 and 3202 | 16:17 |
fungi | https://tox.wiki/en/latest/config.html#conf-requires doesn't mention any envvar you can do that with | 16:20 |
*** ysandeep is now known as ysandeep|out | 16:20 | |
jrosser | oh i see why our other jobs don't fail with this, we don't usually use the bundled setuptools | 16:22 |
jrosser | currently the only thing i can find to do in an OSA job is put setuptools==60.9.3 in the tox.ini `deps =`, but that's super-fragile as it'll blow up as soon as the setuptools in u-c moves | 16:44
*** jpena is now known as jpena|off | 16:52 | |
*** dviroel|lunch is now known as dviroel | 16:55 | |
fungi | jrosser: setting requires setuptools>=61.1 in tox.ini won't solve it? | 16:59 |
fungi | is the problem that older 61.0.0 is getting called into and choking before tox gets a chance to update the virtualenv? | 17:00 |
jrosser | oh maybe you are right there | 17:00 |
fungi | [tox] requires = setuptools>=61.1 | 17:02 |
fungi | is what i'm talking about, to be clear | 17:02 |
fungi | (the config option i linked above has an example too) | 17:02 |
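Spelling fungi's suggestion out as a concrete edit (a sketch assuming GNU sed and a tox.ini whose [tox] section doesn't already set requires):

```shell
# insert the provisioning requirement right after the [tox] section header
sed -i '/^\[tox\]/a requires = setuptools>=61.1' tox.ini
```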
jrosser | fungi: in a simple environment i think that would work, however https://paste.opendev.org/show/bF5ZroGXqFcoh1lWJnZn/ | 17:07 |
jrosser | it's a bit catch-22, as `pip install -chttps://releases.openstack.org/constraints/upper/master 'setuptools>=61.1'` is never going to work | 17:08 |
clarkb | setuptools shouldn't be in constraints | 17:09 |
jrosser | unfortunately it is | 17:09 |
clarkb | (I've said this over and over again and I guess I don't get listened to) | 17:09
jrosser | in other OSA repos we download a copy of u-c and sed out all the nonsense like this | 17:09 |
fungi | yeah, i think the reason people stuffed it into upper-constraints.txt is because that's what devstack uses to install setuptools | 17:10 |
jrosser | having said that, the proposal bot removes setuptools here https://review.opendev.org/c/openstack/requirements/+/835329 | 17:11 |
fungi | oh good | 17:21 |
jrosser | i don't understand why it does that though :) it might come back! | 17:21 |
frickler | it does that because pip freeze without --all skips setuptools and others. afaict prometheanfire keeps re-adding it manually | 17:25
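For context on why that happens: pip freeze filters out the packaging tools themselves unless --all is passed, for example:

```shell
# default freeze: pip, setuptools and wheel are omitted from the output
pip freeze | grep -i setuptools || echo "not listed"
# with --all the tooling packages are included as well
pip freeze --all | grep -i setuptools
```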
fungi | ahh, that's too bad | 17:25 |
fungi | since the result is that we're now pinning to one which is breaking some projects and preventing them from updating to a newer setuptools | 17:25 |
clarkb | yes this is why you aren't supposed to pin setuptools that way | 17:26 |
clarkb | it's a chicken-and-egg problem that needs to be solved outside anyway, so you should just avoid introducing errors | 17:26
clarkb | basically if you have functional problems, constraints won't help you. All constraints can do is over-constrain you, making things worse when it comes to setuptools | 17:27
jrosser | it would be very nice if it was removed from u-c and didn't come back | 17:30
jrosser | if it's true that it's there to determine the version for devstack, that's not a great reason imho | 17:30
fungi | yes, devstack could still do that in its own repository | 17:31 |
*** lajoskatona_ is now known as lajoskatona | 18:27 | |
prometheanfire | frickler: yep :| | 19:27 |
clarkb | I've updated https://github.com/go-gitea/gitea/issues/19118 with more verbose client logs as requested though I'm not sure they are helpful | 20:18 |
fungi | i saw, thanks! | 20:20 |
clarkb | I spent far too much time using a variety of GIT_TRACE_* flags trying to find those with enough verbosity to likely be helpful and remove those that add just noise | 20:21 |
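For anyone reproducing this, the client-side tracing clarkb describes generally looks something like the following; the flags are standard git environment variables and the repo URL is just an example:

```shell
# capture protocol-level and curl-level traces of a clone against the gitea farm
GIT_TRACE=1 GIT_TRACE_PACKET=1 GIT_CURL_VERBOSE=1 \
    git clone https://opendev.org/openstack/nova 2> git-trace.log
```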
clarkb | Unfortunately the trace on hirsute looks completely different, so it's hard to compare them directly :/ | 20:21
corvus | i'm looking into the potential zuul gerrit event processing bug; i suspect we are stuck right now, but still collecting events. i'd like to leave it as is while i research. but if necessary, we can restart a scheduler to unstick it temporarily. | 20:23 |
fungi | thanks for the heads up | 20:23 |
*** dviroel is now known as dviroel|out | 20:30 | |
opendevreview | melanie witt proposed openstack/project-config master: grafana ceph: add nova stable/(xena|yoga) branches https://review.opendev.org/c/openstack/project-config/+/835514 | 20:31 |
clarkb | I'm going to start putting the meeting agenda together. I think I've been a bit distracted and I'm not sure if things are missing. Anything I should add? | 21:11
corvus | clarkb: fungi i would like to monkey-patch https://review.opendev.org/835518 into production; objections? | 21:12 |
corvus | (i have run the zuul test suite on that locally with no errors) | 21:12 |
clarkb | corvus: are we hitting this now due to the updates for circular dependency stuff? | 21:14 |
clarkb | but ya that patch looks small enough that it should be ok to patch in | 21:14 |
corvus | clarkb: yes. it is unclear to me at this point whether we are looping, or if we are just still in an exponential tree. i think the submitted together change can cause an exponential explosion in queries in the best case, and i think looping is theoretically possible in the worst case. i've only confirmed exponential behavior in the tests since they produce well-behaved responses. | 21:19 |
corvus | the assert in that test for 8 queries is something like 2748 without the fix | 21:20 |
clarkb | oh wow | 21:21 |
fungi | sounds good, thanks for looking into it | 21:21 |
corvus | i think the submitted-together contribution is mostly just adding an extra path; we were likely doing far too many queries already | 21:21
corvus | but exponents being what they are, we notice now :) | 21:21 |
clarkb | ah there is the recursion. _getChange calls _updateChange which calls _updateChangeDependencies again | 21:25
clarkb | and so the issue is we aren't accumulating the complete set of changes we can short-circuit on as quickly as possible | 21:25
clarkb | The branch that generates log.debug("Change %s is in history", change) isn't hit when it could be | 21:25
clarkb | I guess the alternative would be to return the history and update it, but this seems like it should work just fine | 21:26
clarkb | fungi: is the openstack release Wednesday or Thursday? | 21:53 |
rlandy | hello ... there is only one check job in https://zuul.opendev.org/t/openstack/status | 21:56 |
rlandy | and https://review.opendev.org/c/openstack/tripleo-ci/+/835101 has not shown up | 21:56 |
clarkb | rlandy: yes see scrollback. corvus thinks the issue has been identified and we're working on patching a fix in | 21:57 |
rlandy | k - reading back ... | 21:57 |
rlandy | thank you | 21:57 |
corvus | clarkb: remote: https://review.opendev.org/c/zuul/zuul/+/835522 Add more submitted-together tests [NEW] | 21:58 |
corvus | i wanted to confirm a few more things in unit tests before executing the patch | 21:58 |
corvus | i'm happy with that, so i will proceed now | 21:58 |
clarkb | sounds good. I'll work on reviewing that followup testing change now | 21:58 |
clarkb | corvus: I think that second change will fail pep8 due to line lengths | 22:03 |
clarkb | lgtm otherwise | 22:05 |
corvus | clarkb: thx; i'm going to add a reno to the first one (so we can do a 5.2.1), i'll fix it then | 22:06
corvus | #status log monkeypatched https://review.opendev.org/835518 into running zuul schedulers | 22:06 |
opendevstatus | corvus: finished logging | 22:06 |
fungi | clarkb: release activities are starting around 11:30 utc on wednesday according to my notes. at least that's when i set my reminder to be on hand. maybe it's starting at noon utc | 22:09 |
fungi | corvus: i see backlogged changes starting to be enqueued, thanks | 22:09 |
clarkb | I've made note of it on the agenda for our meeting tomorrow. I'll get that sent out in a bit once I'm sure there isn't anything else I want to add to it first | 22:10
clarkb | also I don't know if I'll be up that early, but I can do my best to check wednesday morning once I am up | 22:10
fungi | thanks | 22:10 |
fungi | and yeah, the assumption is that release work will be concluded by 15:00z | 22:11 |
fungi | so around 8am your time | 22:11 |
fungi | i have an appointment i need to get to about an hour after | 22:11 |
clarkb | I should be around so no problem | 22:12
fungi | clarkb: also you got a response to your gitea debug info | 22:12 |
fungi | that was quick | 22:12 |
clarkb | yup I think the shanghai day must be starting | 22:13 |
*** rlandy is now known as rlandy|out | 22:55 |