opendevreview | Merged openstack/project-config master: nodepool: infra-package-needs; cleanup python https://review.opendev.org/c/openstack/project-config/+/872476 | 00:16 |
opendevreview | Merged openstack/project-config master: nodepool: infra-package-needs; remove lvm2 https://review.opendev.org/c/openstack/project-config/+/872477 | 00:16 |
tonyb | I'm still seeing really slow responses from gitea08 | 02:32 |
ianw | load average: 78.44, 85.12, 99.48 | 02:33 |
ianw | it is .. unhappy | 02:34 |
ianw | nothing completely obvious | 02:36 |
ianw | Feb 6 02:31:24 gitea08 docker-gitea[847]: 2023/02/06 02:31:23 ...ules/context/repo.go:469:RepoAssignment() [E] [63e06679-3] GetUserByName: context canceled | 02:36 |
ianw | seems frequent | 02:37 |
ianw | those messages go back as far as we have logs though | 02:40 |
ianw | there's lots of oom kills | 02:44 |
ianw | i've restarted the container anyway | 02:45 |
fungi | oom kills on the gitea servers are usually a sign that some network behind a common nat is repeatedly cloning large repos like openstack/nova | 03:49 |
fungi | we saw that behavior when people had openstack-ansible deployments acting up and all their servers tried to independently clone all of openstack rather than caching a central copy in their deployment | 03:50 |
fungi | apparently the clone operation results in whole copies of the repository being temporarily stored in memory | 03:51 |
fungi | so it doesn't take many to exhaust one of the backends | 03:51 |
*** yadnesh|away is now known as yadnesh | 04:00 | |
*** bhagyashris_ is now known as bhagyashris | 04:28 | |
*** ysandeep is now known as ysandeep|ruck | 05:18 | |
*** ysandeep|ruck is now known as ysandeep|ruck|afk | 06:05 | |
*** ysandeep|ruck|afk is now known as ysandeep|ruck | 06:47 | |
jrosser | fungi ianw I did a bunch of work to make OSA use an identifiable user agent if you believe that is the cause | 06:58 |
ianw | jrosser: ++ i don't have time to look right now but definitely an angle. | 08:42 |
ianw | i wonder if we could somehow work that into some sort of static report like we do for the other services | 08:42 |
ianw | e.g. https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_757/periodic/opendev.org/opendev/system-config/master/docs-openstack-goaccess-report/757a49e/docs.openstack.org_goaccess_report.html | 08:42 |
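A minimal sketch of the kind of user-agent breakdown such a static report could start from, assuming a combined-format access log at a hypothetical path; the reports linked above are generated with goaccess, so the regex and path here are illustrative assumptions only.

```python
# Hypothetical sketch: tally user agents from a combined-format access log.
# The log path and format are assumptions; the real reports use goaccess.
import re
from collections import Counter

# In combined log format the final two quoted fields are referer and user agent.
UA_RE = re.compile(r'"[^"]*" "(?P<agent>[^"]*)"\s*$')

def top_user_agents(log_path, limit=20):
    counts = Counter()
    with open(log_path, errors="replace") as log:
        for line in log:
            match = UA_RE.search(line)
            if match:
                counts[match.group("agent")] += 1
    return counts.most_common(limit)

if __name__ == "__main__":
    for agent, hits in top_user_agents("/var/log/apache2/gitea-access.log"):
        print(f"{hits:8d}  {agent}")
```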
*** jpena|off is now known as jpena | 08:42 | |
opendevreview | Ian Wienand proposed zuul/zuul-jobs master: ensure-skopeo: fixup some typos https://review.opendev.org/c/zuul/zuul-jobs/+/872733 | 08:44 |
*** gibi_pto is now known as gibi | 08:47 | |
opendevreview | Merged zuul/zuul-jobs master: ensure-skopeo: add install from upstream option https://review.opendev.org/c/zuul/zuul-jobs/+/872617 | 08:56 |
opendevreview | Merged zuul/zuul-jobs master: zuul-jobs-test-registry-docker-* : update to jammy nodes https://review.opendev.org/c/zuul/zuul-jobs/+/872365 | 09:56 |
*** ysandeep|ruck is now known as ysandeep|ruck|break | 10:43 | |
opendevreview | Ade Lee proposed zuul/zuul-jobs master: Add ubuntu to enable-fips role https://review.opendev.org/c/zuul/zuul-jobs/+/866881 | 10:55 |
*** ysandeep|ruck|break is now known as ysandeep|ruck | 11:31 | |
*** yadnesh is now known as yadnesh|away | 13:38 | |
*** dasm|off is now known as dasm|rover | 13:52 | |
opendevreview | Scott Little proposed openstack/project-config master: Create a git for the storage of public keys and certificates https://review.opendev.org/c/openstack/project-config/+/872758 | 15:04 |
gthiemonge | FYI I see a lot of issues with ubuntu mirrors in the octavia-grenade job (I don't know why this particular job is so impacted) | 15:16 |
gthiemonge | https://zuul.opendev.org/t/openstack/build/ca0a84af6a704bb6b3b521c846faab19/log/controller/logs/dib-build/amphora-x64-haproxy.qcow2_log.txt#429 | 15:21 |
gthiemonge | I see similar issues in opensearch | 15:21 |
fungi | gthiemonge: any idea why those jobs don't use our package mirrors? | 15:25 |
tweining | at one point I saw in the log that it was trying an ipv6 address, not sure if that has something to do with it or not though. | 15:26 |
fungi | some of our providers have ipv6 access, some do not. if it was trying to reach an ipv6 address from a system which didn't have any v6 routes that could indicate an issue | 15:28 |
fungi | the zuul host info logged with the job should show the routing table the node had when the build started | 15:28 |
fungi | but also some tools "fall back" to trying ipv6 when v4 connections to something time out, and then report misleading errors | 15:29 |
fungi | so the error message ends up implying that a v4-only host tried to reach something over v6, but really the problem is that the v4 connection it correctly attempted first failed for some reason | 15:30 |
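A small diagnostic sketch for separating the two cases fungi describes: connect to each resolved address in its own address family so a v4 failure with a v6 fallback is not misread as an IPv6-only problem. The host name is just an example.

```python
# Hypothetical diagnostic: connect to every resolved address individually so an
# IPv4 failure with an IPv6 fallback is not misreported as an IPv6 problem.
import socket

def check_all_addresses(host, port=80, timeout=5):
    for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        label = "IPv6" if family == socket.AF_INET6 else "IPv4"
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            print(f"{label} {sockaddr[0]}: reachable")
        except OSError as exc:
            print(f"{label} {sockaddr[0]}: failed ({exc})")

check_all_addresses("archive.ubuntu.com")
```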
opendevreview | Scott Little proposed openstack/project-config master: Create a git for the storage of public keys and certificates https://review.opendev.org/c/openstack/project-config/+/872758 | 15:30 |
fungi | anyway, i'm able to manually download one of the packages that build said it couldn't, so the problem is likely either intermittent or location-dependent | 15:33 |
fungi | the addresses i see for it in dns appear to be hosted directly by canonical, so probably no cdn involved at least | 15:34 |
Clark[m] | fungi: it's likely not using our mirrors because it is within the dib image build's chroot. Dib has support for using our package mirrors though iirc. Maybe that's just part of dib's test suite though | 15:36 |
fungi | yeah, that would make sense | 15:40 |
fungi | also i tested downloading over ipv4 as well as ipv6, fwiw, both worked for me | 15:41 |
fungi | though i didn't try all 3 v4 and 3 v6 addresses in the round-robin | 15:41 |
fungi | could be one of the servers they list is having trouble | 15:42 |
gthiemonge | fungi: Clark[m]: thanks, I'll check how we can use our mirrors | 16:03 |
slittle1_ | Review please... https://review.opendev.org/c/openstack/project-config/+/872758 | 16:28 |
fungi | you bet, i was just about to pull it up, i was delayed by some local software updates which have just completed | 16:30 |
clarkb | hey I'm doing local software updates too | 16:30 |
clarkb | monday morning routine | 16:30 |
fungi | indeed, though i was way behind in recompiling all my python interpreters since the recent tags | 16:31 |
fungi | and then rebuilding all my venvs, including the one for my gertty | 16:32 |
clarkb | they should only need rebuilding when you change major versions? | 16:32 |
fungi | well, any time your interpreter's path changes, i think | 16:32 |
fungi | which in my case is the case even for new patch releases because i use separate directories for them | 16:33 |
clarkb | ah | 16:33 |
fungi | yeah, pyvenv.cfg embeds the real path to the interpreter in its "executable" key | 16:35 |
fungi | so mine just updated from /home/fungi/lib/cpython/3.11.0/bin/python3.11 to /home/fungi/lib/cpython/3.11.1/bin/python3.11 | 16:35 |
fungi | useful when you want to, say, easily compare behaviors between 3.11.0 and 3.11.1 since you can have them installed side by side and create different venvs referencing each | 16:37 |
fungi | i simply update a symlink in ~/bin to point to the new version when i want it to be the default build | 16:38 |
fungi | i've got my venv rebuilds scripted anyway, so it's just a matter of starting the script and waiting for that to complete (or error) | 16:41 |
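A minimal sketch of that kind of rebuild script, assuming a ~/bin/python3 symlink to the current default interpreter and hypothetical venv names; not fungi's actual script.

```python
# Hypothetical venv rebuild script: recreate each venv against the interpreter
# currently selected by the ~/bin symlink. Paths and venv names are assumptions.
import subprocess
from pathlib import Path

INTERPRETER = Path.home() / "bin" / "python3"   # symlink to the default build
VENVS = [Path.home() / ".venvs" / name for name in ("gertty", "git-review")]

for venv in VENVS:
    # --clear wipes the old tree so pyvenv.cfg records the new interpreter path
    subprocess.run([str(INTERPRETER), "-m", "venv", "--clear", str(venv)], check=True)
    subprocess.run([str(venv / "bin" / "pip"), "install", "--upgrade", "pip"], check=True)
```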
opendevreview | Jeremy Stanley proposed opendev/system-config master: Feature our cloud donors on opendev.org https://review.opendev.org/c/opendev/system-config/+/869091 | 16:45 |
opendevreview | Merged openstack/project-config master: Create a git for the storage of public keys and certificates https://review.opendev.org/c/openstack/project-config/+/872758 | 17:03 |
*** jpena is now known as jpena|off | 17:36 | |
clarkb | mtreinish: super minor thing I've noticed cleaning up warnings in Zuul. stestr's subunit_runner opens an fd returning a python file object in SubunitTestRunner._list() and ends up returning that back up again to users of the TestRunner so that status results can be recorded. Python complains that this file object is never closed and raises a ResourceWarning | 18:11 |
clarkb | mtreinish: a quick fix wasn't super obvious to me otherwise I'd write a PR because the status object which uses that file object is passed back up and stats things are called against it | 18:12 |
fungi | infra-root: i think our recent changes to jeepyb may have broken manage-projects: https://paste.opendev.org/show/bwSUS8swuboT3Q6OSr6v/ | 18:32 |
fungi | possible chicken-and-egg problem? is it trying to fetch a ref from a project which doesn't exist yet? | 18:33 |
clarkb | fungi: that would be from the change that made the git errors an error rather than log and continue | 18:34 |
clarkb | and ya that hunch sounds correct. | 18:34 |
fungi | yeah, that's the change i was expecting it to be | 18:34 |
clarkb | We should be able to "safely" revert that jeepyb change that raised errors in that situation | 18:34 |
clarkb | (we'll just reintroduce the old behavior, which was problematic but probably less problematic than this) | 18:34 |
clarkb | an alternative would be to treat the fetch of the refs as special. If it fails, it's ok to continue and we'll try to push what we have anyway | 18:34 |
clarkb | that is probably a reasonably correct fix | 18:35 |
clarkb | the issue before was treating pushes as fail acceptable, here it's a fetch | 18:35 |
fungi | though... has that change actually merged yet? | 18:35 |
clarkb | hrm nope https://review.opendev.org/c/opendev/jeepyb/+/869873 | 18:36 |
fungi | right, so this must be something else | 18:36 |
fungi | maybe there was an intermittent connectivity failure | 18:37 |
clarkb | it's all to localhost I think | 18:37 |
clarkb | that would be highly unlikely but possible if the mina sshd ran out of threads maybe | 18:37 |
clarkb | fungi: the other change is the change of the base image | 18:37 |
fungi | gerrit says the project got created | 18:37 |
fungi | so maybe jeepyb raced something trying to access the config ref from it too soon | 18:38 |
fungi | the repo got prepopulated and synced to gitea too | 18:38 |
clarkb | fungi: if you look at fetch_config in manage_projects it has a loop for 20ish seconds waiting for the meta config to be available | 18:39 |
clarkb | perhaps 20 seconds is not long enough? | 18:39 |
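For illustration, a bounded retry of the sort being described (ten passes over roughly 20 seconds); this is a hypothetical sketch, not jeepyb's actual fetch_config.

```python
# Hypothetical illustration of a bounded retry for refs/meta/config; not the
# actual fetch_config code in jeepyb's manage_projects.
import subprocess
import time

def fetch_meta_config(repo_dir, remote_url, attempts=10, delay=2):
    for _ in range(attempts):
        result = subprocess.run(
            ["git", "fetch", remote_url,
             "+refs/meta/config:refs/remotes/gerrit-meta/config"],
            cwd=repo_dir, capture_output=True, text=True)
        if result.returncode == 0:
            return True
        time.sleep(delay)  # 10 attempts * 2s is roughly the 20 second window
    return False
```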
fungi | mmm, the repo actually didn't get prepopulated or synced, it was just created on gitea empty | 19:40 |
fungi | but it exists in gerrit and gitea at least | 18:40 |
clarkb | fungi: if you look in the manage projects log you should see it looping too as it seems to log each pass through that loop | 19:40 |
clarkb | oh but only at debug level? | 18:41 |
clarkb | what if "public-keys" is the problem | 18:42 |
clarkb | and we're tripping over some gerrit user public keys api path | 18:42 |
fungi | if i `git fetch ssh://review.opendev.org:29418/starlingx/public-keys +refs/meta/config:refs/remotes/gerrit-meta/config` i'm told "fatal: couldn't find remote ref refs/meta/config" | 18:43 |
clarkb | fungi: you have to do that as your admin account iirc | 18:43 |
clarkb | possibly in bootstrappers | 18:43 |
fungi | oh, right | 18:43 |
fungi | that worked | 18:43 |
fungi | git fetch ssh://fungi.admin@review.opendev.org:29418/starlingx/public-keys +refs/meta/config:refs/remotes/gerrit/config | 18:43 |
fungi | * [new ref] refs/meta/config -> gerrit/config | 18:44 |
fungi | so it seems to exist now, at least | 18:44 |
fungi | might have just been a race | 18:44 |
clarkb | ya maybe that 20 second time period isn't long enough depending on how busy gerrit is or how busy its disks are? | 18:45 |
johnsom | gthiemonge https://github.com/openstack/octavia-tempest-plugin/blob/master/zuul.d/jobs.yaml#L219 | 18:45 |
fungi | clarkb: should i try manually rerunning manage-projects and see if it succeeds? | 18:45 |
clarkb | fungi: I guess so? maybe with debug enabled so that you can see it loop through things. The only other thought I've got is maybe it has something to do with git in the new image or the git repos in the jeepyb cache on the new image | 18:46 |
clarkb | fungi: but we directly manage the gerrit uid already and that didn't change in the base image swap so that would surprise me I think | 18:46 |
clarkb | and the git versions were basically equivalent | 18:46 |
clarkb | (conversion from our security patched version to debians) | 18:47 |
clarkb | unrelated: Our CI jobs for fungi's gitea change are failing on apparmor for docker 23 now | 18:56 |
fungi | i saw that the build failed, but hadn't found time to see why yet | 18:56 |
fungi | noticed it about the same time as the manage-projects failure | 18:56 |
clarkb | our prod servers already have apparmor installed based on a quick sampling so I think I'll just push a change to add apparmor to our install docker role | 18:57 |
clarkb | fungi: re manage-projects I can't really come up with anything except for git versions/permissions issues due to the base image change, or simply a timeout with our loop not being long enough | 18:58 |
clarkb | fungi: I double checked group membership and project creator appears to have the correct perms | 18:58 |
clarkb | in gerrit I mean. | 18:58 |
opendevreview | Clark Boylan proposed opendev/system-config master: Install apparmor when we install docker-ce from upstream https://review.opendev.org/c/opendev/system-config/+/872801 | 19:00 |
opendevreview | Clark Boylan proposed opendev/system-config master: Feature our cloud donors on opendev.org https://review.opendev.org/c/opendev/system-config/+/869091 | 19:01 |
clarkb | fungi: ^ rebased as thats a good check it fixes the issue | 19:01 |
fungi | sgtm | 19:03 |
clarkb | fungi: another variable that may have impacted refs/meta/config is if it overlapped with backups and that was eating up iops | 19:04 |
fungi | ooh | 19:04 |
clarkb | so ya I'm thinking the best next step is to rerun with debug on against that project specifically and see if its happy now. If so our 20 second retry loop may simply be too short | 19:04 |
clarkb | I guess the jdk changed too and maybe it's slower at doing that bootstrapping? | 19:07 |
clarkb | fungi: I'm going to go back to zuul warning cleanup while I've got it paged in but ping me if I can help further | 19:16 |
fungi | i'm trying to reverse-engineer the manage-projects playbook since just running it directly seems to have failed (probably in the same spot but it doesn't log to a file, just to stdout) | 19:31 |
fungi | what does this tasks_from do? https://opendev.org/opendev/system-config/src/branch/master/playbooks/manage-projects.yaml#L35 | 19:32 |
clarkb | fungi: it runs the tasks from the manage-projects file in the gerrit role | 19:32 |
clarkb | fungi: https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/gerrit/tasks/manage-projects.yaml | 19:33 |
fungi | yep, thanks found it | 19:33 |
fungi | so i guess i can just run manage-projects on the gerrit server | 19:34 |
fungi | which seems to be a docker run wrapper | 19:34 |
clarkb | yes because we run jeepyb on the image with all the various dirs bind mounted in | 19:35 |
clarkb | doing that by hand would be annoying so we have the wrapper | 19:35 |
fungi | running with -v, i don't see any debug log entries | 19:38 |
fungi | 2023-02-06 19:37:40,534: manage_projects - ERROR - Failed to fetch refs/meta/config for project: starlingx/public-keys | 19:39 |
fungi | so whatever it's trying is still not working | 19:39 |
clarkb | that method is the only place we have log.debug() calls. I wonder if we didn't add an ability to actually record those | 19:39 |
fungi | i find it extra interesting that i can fetch that with a gerrit admin account | 19:40 |
fungi | it does seem to take at least 20 seconds before i get any output, which would suggest the retry loop is actually happening at least | 19:43 |
clarkb | fungi: I think it isn't creating the blank repo to fetch the config into | 19:44 |
clarkb | fungi: the jeepyb cache dir is at /opt/lib/jeepyb:/opt/lib/jeepyb so the same dir path in both host and container. /opt/lib/jeepyb/starlingx does not have any entries, but the public-keys dir should be there to fetch the config into | 19:45 |
fungi | /opt/lib/jeepyb/project.cache has an entry for it with project-created and pushed-to-gerrit both true but no acl-sha, which seems to match what we're observing at least | 19:47 |
clarkb | what I'm confused about is jeepyb's make_local_copy should error if it isn't able to git init I think | 19:49 |
clarkb | oh, except we don't raise there, so we could just be running several git commands that all just fail | 19:50 |
fungi | yeah, it looks like run_command would log.debug the output from those | 19:51 |
fungi | but maybe that doesn't go to stdout/stderr on a normal invocation | 19:51 |
fungi | manage-projects has a -l option to specify a log path | 19:51 |
fungi | we map /var/log into the container too but doesn't look like anything is writing a jeepyb or manage-projects log by default | 19:52 |
clarkb | ya I think because we use default logging which is stdout | 19:53 |
clarkb | oh wait we remove the dir in the cache | 19:54 |
clarkb | ok that explains some very confusing behavior | 19:54 |
clarkb | and the timestamps for that dir do show it was updated roughly when you ran it by hand ok thats making a bit more sense now | 19:55 |
fungi | i'm trying it's -l option | 19:57 |
fungi | which doesn't appear to do anything | 19:57 |
clarkb | fungi: I think the flag for debug is -d | 19:58 |
fungi | aha, the -- i was including was to blame | 19:58 |
clarkb | see setup_logging_arguments | 19:58 |
fungi | --help says it's -v | 19:58 |
clarkb | -v is verbose -d is debug | 19:58 |
fungi | oh! yes okay i see it now | 19:59 |
clarkb | and in this case we've set verbose at INFO and higher and debug at DEBUG and higher | 19:59 |
clarkb | however, that will just log the mostly useless message 10 times over 20 seconds since we already know it is taking roughly that long | 19:59 |
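Roughly how that -v/-d split maps to logging levels, as a hedged sketch; the option names follow the discussion, but this is not jeepyb's setup_logging_arguments verbatim.

```python
# Sketch of -v (INFO) vs -d (DEBUG) option handling; not jeepyb's actual code.
import argparse
import logging

def setup_logging(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("-v", "--verbose", action="store_true", help="log INFO and higher")
    parser.add_argument("-d", "--debug", action="store_true", help="log DEBUG and higher")
    parser.add_argument("-l", "--log-file", default=None, help="write the log to this path")
    args = parser.parse_args(argv)
    level = logging.DEBUG if args.debug else logging.INFO if args.verbose else logging.WARNING
    logging.basicConfig(filename=args.log_file, level=level)
    return args
```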
fungi | okay, i have a more useful log file in /var/log now | 20:00 |
fungi | here we go... | 20:00 |
clarkb | fatal: ssh variant 'simple' does not support setting port | 20:01 |
fungi | jeepyb.utils - DEBUG - Command said: fatal: not a git repository: '/opt/lib/jeepyb/starlingx/public-keys/.git' | 20:01 |
clarkb | yup, and if you scroll up a bit, "ssh variant 'simple' does not support setting port" seems to be why ^ that isn't a repository | 20:01 |
fungi | ahh, yeah that's even earlier | 20:01 |
fungi | looks like GIT_SSH_VARIANT=ssh is a workaround or `git config --global ssh.variant ssh` | 20:02 |
clarkb | ya and this is likely a side effect of our image change then I guess | 20:02 |
fungi | maybe different ssh client? | 20:03 |
clarkb | maybe? | 20:03 |
clarkb | fungi: we also set GIT_SSH to a wrapper script in order to set ssh flags for the key path and the username etc | 20:04 |
fungi | or it could be that the git command there has a built-in ssh client implementation now | 20:06 |
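A sketch of the workaround mentioned above, exporting GIT_SSH_VARIANT=ssh alongside the GIT_SSH wrapper so git knows the wrapper accepts a port flag; the helper function and its arguments are hypothetical.

```python
# Sketch of the GIT_SSH_VARIANT=ssh workaround when git is driven through a
# GIT_SSH wrapper script; a hypothetical helper, not jeepyb's run_command.
import os
import subprocess

def git_fetch_via_wrapper(repo_dir, remote, refspec, ssh_wrapper):
    env = dict(os.environ)
    env["GIT_SSH"] = ssh_wrapper       # wrapper setting key path, username, etc.
    env["GIT_SSH_VARIANT"] = "ssh"     # tell git the wrapper supports setting a port
    return subprocess.run(["git", "fetch", remote, refspec],
                          cwd=repo_dir, env=env, check=True)
```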
opendevreview | Clark Boylan proposed opendev/system-config master: Install openssh-client in our Gerrit docker image https://review.opendev.org/c/opendev/system-config/+/872802 | 20:20 |
clarkb | ok I think ^ will address it. Just a missing dep in the base image swap | 20:21 |
clarkb | note this is based on the apparmor change so that it can gate | 20:21 |
clarkb | the apparmor change should be considered carefully however, as I mentioned I think it's a noop for our prod hosts | 20:21 |
fungi | system-config-run-gitea is still underway for 869091,10 as our confirmation on that one | 20:24 |
fungi | hopefully we'll see that green shortly | 20:24 |
clarkb | with that largely sorted out (for now anyway) I'm going to eat lunch | 20:28 |
mtreinish | clarkb: yeah, it's been on my backlog to try and figure out how to handle that. There was an issue opened a while ago about all the resource warnings that get raised: https://github.com/mtreinish/stestr/issues/320 and masayukig fixed some of them but there are definitely still more | 20:49 |
mtreinish | some of them will definitely be tricky to fix, because it's all in weird inherited usage from subunit and unittest (mostly because I have to remind myself how that all works) | 20:50 |
clarkb | mtreinish: I can commiserate with that. See also the jeepyb debugging above :) | 20:53 |
ianw | my docker 23 issue was ultimately that i had an old devicemapper based container and the docker daemon wouldn't start | 21:13 |
ianw | it might have been able to with various flags, but it was easier to just start again | 21:13 |
ianw | i think this was from when linode (my host) was a Xen-based vm. at some point they migrated everything to kvm, but iirc at the time something about being xen made it use devicemapper | 21:14 |
clarkb | some linux archeaolgy | 21:15 |
ianw | we're testing with docker 23 now, but i don't think it will get pulled in anywhere in prod unless we explicitly update | 21:15 |
clarkb | (also I can't type) | 21:15 |
clarkb | ianw: correct because updating docker implies restarting containers and we try to control that | 21:15 |
ianw | i wonder if it's worth just making a list and doing it manually, starting with lower-impact hosts? | 21:16 |
clarkb | fungi: heh the latest donor change made the header and text align properly but now the donor logos are stacked on top of each other. I think I prefer this even if it is more scrolling though | 21:16 |
clarkb | ianw: not a bad idea | 21:16 |
clarkb | fungi: but I'm terrible at css and layout... | 21:17 |
ianw | i can start an etherpad and do that. it's probably not a bad idea to do a reboot anyway on some of these hosts | 21:17 |
clarkb | ++ | 21:17 |
ianw | tracking at https://etherpad.opendev.org/p/docker-23-prod | 21:21 |
clarkb | ianw: in the past what I've tried to do is stop service containers, upgrade docker, optionally reboot, start service containers again. I think the packaging will attempt to restart containers for you but I like doing it myself for most things | 21:23 |
ianw | ++ | 21:23 |
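The per-host order described above, as a hedged sketch; the compose file path and package list are assumptions, not how the upgrades were actually run.

```python
# Hypothetical per-host sequence: stop service containers, upgrade docker,
# optionally reboot, then start the containers again. Paths are assumptions.
import subprocess

def upgrade_docker(compose_file="/etc/service-compose/docker-compose.yaml"):
    subprocess.run(["docker-compose", "-f", compose_file, "down"], check=True)
    subprocess.run(["apt-get", "update"], check=True)
    subprocess.run(["apt-get", "-y", "install",
                    "docker-ce", "docker-ce-cli", "containerd.io"], check=True)
    # a reboot could go here before bringing services back up
    subprocess.run(["docker-compose", "-f", compose_file, "up", "-d"], check=True)
```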
mtreinish | clarkb: tbh, looking at the code in detail now I think I can just drop the fdopen call. I don't think it's really relevant. IIRC, I just ported that from subunit and/or unittest when I rewrote the runner to be based on unittest's run instead of testtools, but the stestr context is more limited and we're almost always just passing stdout as the result stream and won't ever need to open a new descriptor | 21:28 |
mtreinish | in that code | 21:28 |
mtreinish | I'm just going to simplify that logic (famous last words) | 21:28 |
opendevreview | Merged zuul/zuul-jobs master: ansible-lint: fix a bunch of command-instead-of-shell errors https://review.opendev.org/c/zuul/zuul-jobs/+/872490 | 21:36 |
opendevreview | Merged zuul/zuul-jobs master: ansible-lint: add names to blocks/includes, etc. https://review.opendev.org/c/zuul/zuul-jobs/+/872491 | 21:36 |
opendevreview | Merged zuul/zuul-jobs master: ansible-lint: ignore use of mkdir https://review.opendev.org/c/zuul/zuul-jobs/+/872492 | 21:36 |
mtreinish | clarkb: https://github.com/mtreinish/stestr/pull/342 it passed tests locally | 21:39 |
opendevreview | Jeremy Stanley proposed opendev/system-config master: Feature our cloud donors on opendev.org https://review.opendev.org/c/opendev/system-config/+/869091 | 21:48 |
clarkb | mtreinish: thanks! I was mostly motivated by the sqlalchemy 2.0 update and needing to filter out all the noise warnings from the useful warnings. | 21:48 |
fungi | clarkb: ^ looking at the other logos at the top of the page, i think i just incorrectly nested them | 21:48 |
*** dmitriis9 is now known as dmitriis | 21:51 | |
*** Tengu8 is now known as Tengu | 21:51 | |
*** mtreinish_ is now known as mtreinish | 21:51 | |
*** dtantsur_ is now known as dtantsur | 21:51 | |
*** noonedeadpunk_ is now known as noonedeadpunk | 21:51 | |
opendevreview | Merged opendev/system-config master: Install apparmor when we install docker-ce from upstream https://review.opendev.org/c/opendev/system-config/+/872801 | 22:29 |
opendevreview | Merged zuul/zuul-jobs master: ansible-lint: use pipefail https://review.opendev.org/c/zuul/zuul-jobs/+/872493 | 22:35 |
opendevreview | Merged zuul/zuul-jobs master: ansible-lint: ignore latest git pull https://review.opendev.org/c/zuul/zuul-jobs/+/872494 | 22:35 |
opendevreview | Ian Wienand proposed zuul/zuul-jobs master: build-docker-image: further cleanup buildx path https://review.opendev.org/c/zuul/zuul-jobs/+/872806 | 22:58 |
ianw | To ssh://review.opendev.org:29418/opendev/system-config.git | 23:16 |
ianw | ! [remote rejected] HEAD -> refs/for/master%topic=docker-apt-key (n/a (unpacker error)) | 23:16 |
ianw | is this my fault or gerrit's fault?? | 23:17 |
ianw | Caused by: java.io.IOException: Unpack error on project "opendev/system-config": | 23:18 |
ianw | in gerrit logs | 23:18 |
opendevreview | Ian Wienand proposed opendev/system-config master: install-docker: switch from deprecated apt-key https://review.opendev.org/c/opendev/system-config/+/872808 | 23:20 |
opendevreview | Ian Wienand proposed opendev/system-config master: install-docker: remove apt-key cleanup https://review.opendev.org/c/opendev/system-config/+/872809 | 23:20 |
ianw | $ zgrep 'Unpack error, check server log' * | wc -l | 23:32 |
ianw | 28 | 23:32 |
ianw | so it's not unique, but also not that frequent. maybe it was my client dropping packets or something | 23:32 |
JayF | is something upside down? | 23:37 |
JayF | ianw: I'm seeing exactly that | 23:37 |
JayF | fetch-pack: unexpected disconnect while reading sideband packet | 23:37 |
JayF | more like these, it errors in different places depending on when it times out | 23:38 |
JayF | looks like generically slow-remote-server stuff? but I know little about what goes on behind the covers here | 23:38 |
ianw | JayF: what was the operation you were doing? | 23:38 |
JayF | Trying to push a fresh patch. It died in the git remote update gerrit step | 23:39 |
JayF | and I can make that fail outside of `git review` | 23:39 |
ianw | JayF: hrm, i'm not seeing anything lining up in the gerrit logs, can you paste more context where it popped up? | 23:41 |
JayF | let me get a fresh reproduction then I'll paste it | 23:41 |
JayF | ianw: https://gist.github.com/jayofdoom/bbd2e080f66183192d5546f9a4591b9f | 23:42 |
JayF | web UI works as I'd expect, if a bit slow, so I think it's not connectivity | 23:42 |
ianw | ahh, ok, i see in logs now | 23:43 |
ianw | SshChannelNotFoundException: Received SSH_MSG_CHANNEL_WINDOW_ADJUST on unassigned channel 0 (last assigned=null) | 23:43 |
ianw | always great to see a new weird ssh error, it's been too long since the last one :) | 23:44 |
JayF | I'm running 9.1_p1-r3 | 23:44 |
JayF | on gentoo | 23:44 |
ianw | the last one almost turned clarkb into a java developer | 23:44 |
JayF | if it's possible the error is caused by shiny new openssh, it's likely I'm running the shiny new lol | 23:44 |
JayF | although there is a 9.2 in the repo too... | 23:44 |
opendevreview | Merged opendev/system-config master: Install openssh-client in our Gerrit docker image https://review.opendev.org/c/opendev/system-config/+/872802 | 23:46 |
ianw | there's references to this in a few places | 23:47 |
JayF | that's an ominous merge in time with this bug LOL | 23:48 |
ianw | https://bugs.chromium.org/p/gerrit/issues/detail?id=11491; an old wikimedia commit seems to have enabled the workaround -> https://gerrit.wikimedia.org/r/c/operations/puppet/+/755968/ | 23:48 |
JayF | looks like from what I'm seeing, most reports are when networking is slow or high latency | 23:49 |
JayF | makes me wonder if it's possible there's a network issue underlying this failure mode | 23:49 |
ianw | you're not the only user to have this error in the logs | 23:49 |
JayF | I'm talking more generally than just me; mainly based off a feeling (not quantitative data) that the Web UI is exhibiting some slowness too | 23:50 |
ianw | JayF: hrm, did you just upgrade or something? | 23:54 |
JayF | I don't think so; but I run updates on this thing very frequently | 23:55 |
ianw | there's a few users seeing this in a bit of a regular pattern | 23:55 |
ianw | 133 exceptionCaught(ServerSessionImpl[proliantci@ | 23:55 |
ianw | e.g. seems proliantci is experiencing it | 23:55 |
JayF | those are ironic third party CI :( | 23:55 |
ianw | 33 exceptionCaught(ServerSessionImpl[cisco-cinder-ci | 23:55 |
ianw | 19 exceptionCaught(ServerSessionImpl[hp-storage-blr-ci | 23:56 |
JayF | FWIW, looks like I've been running the same openssh client version for a couple weeks minimum | 23:56 |
ianw | that's the bot accounts, but there's user accounts too | 23:56 |
JayF | honestly, and I'm far from an expert in java ops (and even if I was, that info would be dusty) | 23:56 |
JayF | but this is the sort of thing I'd reboot first and ask questions later LOL | 23:56 |
ianw | i suppose it's possible all three of those are on the same distro with the same openssh | 23:57 |
JayF | honestly, I'd be amazed if anyone has modified config on HP third party CI in months | 23:57 |
ianw | JayF: so are you basically blocked from pushing changes with this atm? | 23:58 |
JayF | yes, but my day ends in 2 minutes | 23:58 |
JayF | so i'm happy to just close the laptop and re-run `git review` tomorrow lol | 23:59 |
JayF | but I can hang out and help w/testing if that's useful | 23:59 |
* JayF tries again for good measure | 23:59 |