corvus | re-enqueueing now | 00:02 |
---|---|---|
clarkb | estimatedNodepoolQuotaUsed has a bunch of IndexError: list index out of range | 00:03 |
clarkb | not sure if that is expected or not | 00:03 |
clarkb | looks like that may have been happening earlier today as well | 00:05 |
fungi | 15207 matches in launcher-debug.log.2021-02-18_17 | 00:05 |
fungi | yeah, that pre-dates the restart | 00:05 |
clarkb | the log reports nodes are going ready in that launcher so I'll assume it isn't fatal | 00:06 |
clarkb | oh I think this may be "expected" | 00:08 |
clarkb | I want to say this is the code that skips over stale data in the zk db. Previously our maths were wrong because we'd try to do maths on nodes that weren't fully valid. Something like that | 00:08 |
clarkb | anyway this is it saying I tried but I can't add this to the estimated pool | 00:08 |
clarkb | it appears to be ~4 nodes stuck in deleting | 00:09 |
*** tosky has quit IRC | 00:10 | |
*** zbr3 has joined #opendev | 00:13 | |
*** zbr has quit IRC | 00:15 | |
*** zbr3 is now known as zbr | 00:15 | |
clarkb | ya there are also timeouts for server deletion | 00:15 |
clarkb | and grepping on the uuids shows that at least one of the uuids in the estimated quota problem shows up in the deleting problems | 00:17 |
clarkb | pretty sure we can ignore this as far as nodepool functioning goes | 00:17 |
corvus | #status log restarted zuul on 4f897f8b9ff24797decaab5faa346bd72f110970 and nodepool on c3b68c1498cc87921c33737e8809fdabbf3db5d7 | 00:25 |
openstackstatus | corvus: finished logging | 00:25 |
corvus | i spotted a couple of node failures, but none recently | 00:26 |
corvus | might have been transient due to restart-related turnover or something | 00:26 |
*** zbr6 has joined #opendev | 00:31 | |
*** zbr has quit IRC | 00:33 | |
*** zbr6 is now known as zbr | 00:33 | |
*** DSpider has quit IRC | 00:35 | |
clarkb | I just did a mega update and now name resolution doesn't work | 00:37 |
clarkb | I guess I try a reboot to see if that makes libc happy or whatever is causing this | 00:38 |
clarkb | that was weird, all seems well after rebooting | 00:46 |
*** brinzhang has joined #opendev | 00:47 | |
*** zbr3 has joined #opendev | 00:51 | |
*** zbr has quit IRC | 00:54 | |
*** zbr3 is now known as zbr | 00:54 | |
clarkb | that is really interesting. jsch creates a lock file for known_hosts files | 00:56 |
clarkb | and my latest gatling iteration is failing because I bind mount in the known_hosts from root's homedir and then it can't create the lock file there due to perms | 00:57 |
clarkb | I'll have to look at that on monday | 00:57 |
clarkb | (why not just read the file....) | 00:57 |
*** zbr0 has joined #opendev | 01:00 | |
*** zbr has quit IRC | 01:02 | |
*** zbr0 is now known as zbr | 01:02 | |
*** zbr9 has joined #opendev | 01:25 | |
*** LowKey has quit IRC | 01:25 | |
*** zbr has quit IRC | 01:27 | |
*** zbr9 is now known as zbr | 01:27 | |
*** mlavalle has quit IRC | 01:30 | |
openstackgerrit | Merged opendev/system-config master: Add pull tasks for nodepool/zuul https://review.opendev.org/c/opendev/system-config/+/776720 | 01:31 |
*** zbr7 has joined #opendev | 01:44 | |
*** zbr has quit IRC | 01:46 | |
*** zbr7 is now known as zbr | 01:46 | |
*** zbr5 has joined #opendev | 02:05 | |
*** zbr has quit IRC | 02:06 | |
*** zbr5 is now known as zbr | 02:06 | |
*** LowKey has joined #opendev | 02:20 | |
*** LowKey has quit IRC | 02:25 | |
*** zbr2 has joined #opendev | 02:27 | |
*** zbr has quit IRC | 02:28 | |
*** zbr2 is now known as zbr | 02:28 | |
*** ysandeep|away is now known as ysandeep | 02:42 | |
*** zbr7 has joined #opendev | 02:50 | |
*** zbr has quit IRC | 02:52 | |
*** zbr7 is now known as zbr | 02:52 | |
*** dviroel has quit IRC | 03:06 | |
*** zbr4 has joined #opendev | 03:10 | |
*** zbr has quit IRC | 03:13 | |
*** zbr4 is now known as zbr | 03:13 | |
*** ysandeep is now known as ysandeep|away | 03:19 | |
*** iurygregory has quit IRC | 04:10 | |
*** zbr4 has joined #opendev | 05:04 | |
*** zbr has quit IRC | 05:06 | |
*** zbr4 is now known as zbr | 05:06 | |
*** zbr3 has joined #opendev | 05:54 | |
*** zbr has quit IRC | 05:56 | |
*** zbr3 is now known as zbr | 05:56 | |
*** zbr3 has joined #opendev | 05:58 | |
*** zbr has quit IRC | 06:00 | |
*** zbr3 is now known as zbr | 06:00 | |
*** icey has quit IRC | 06:04 | |
*** icey has joined #opendev | 06:04 | |
*** hemanth_n has joined #opendev | 06:19 | |
*** hemanth_n has quit IRC | 06:23 | |
*** zbr2 has joined #opendev | 06:34 | |
*** zbr has quit IRC | 06:36 | |
*** zbr2 is now known as zbr | 06:36 | |
*** zbr7 has joined #opendev | 06:57 | |
*** zbr has quit IRC | 06:59 | |
*** zbr7 is now known as zbr | 06:59 | |
*** zbr8 has joined #opendev | 07:22 | |
*** zbr has quit IRC | 07:24 | |
*** zbr8 is now known as zbr | 07:24 | |
*** zbr7 has joined #opendev | 07:31 | |
*** zbr has quit IRC | 07:33 | |
*** zbr7 is now known as zbr | 07:33 | |
*** sboyron has joined #opendev | 07:56 | |
*** zbr1 has joined #opendev | 07:57 | |
*** zbr has quit IRC | 07:59 | |
*** zbr1 is now known as zbr | 07:59 | |
*** slaweq has joined #opendev | 08:19 | |
*** zbr5 has joined #opendev | 08:21 | |
*** zbr has quit IRC | 08:22 | |
*** zbr5 is now known as zbr | 08:22 | |
frickler | infra-root: brinzhang reported in #-nova: while I generate new password with HTTP Credentials from gerrit, it report "Error 500 (Server Error): Internal server error Endpoint: /accounts/self/password.http" | 08:25 |
frickler | I can reproduce this. we didn't restart gerrit tonight, did we? /me tries to take a look at gerrit logs now | 08:26 |
*** slaweq has quit IRC | 08:26 | |
*** slaweq has joined #opendev | 08:33 | |
frickler | humm, seems there is a long traceback for this, which in essence says some lock is invalid http://paste.openstack.org/show/l8D4oarIqcETtkpx9692/ | 08:34 |
frickler | I'll leave gerrit in this state for further debugging right now, but I won't object if someone wants to just try a restart | 08:35 |
*** slaweq has quit IRC | 08:43 | |
*** zbr4 has joined #opendev | 08:55 | |
*** zbr has quit IRC | 08:58 | |
*** zbr4 is now known as zbr | 08:58 | |
*** zbr has quit IRC | 09:05 | |
*** zbr has joined #opendev | 09:05 | |
brinzhang | frickler, infra-root: it's true, I tried some times today, but it always reported the same error, pls check this, that we cannot submit the patch to gerrit without the password. | 09:06 |
brinzhang | thanks | 09:06 |
*** zbr7 has joined #opendev | 09:08 | |
*** zbr has quit IRC | 09:10 | |
*** zbr7 is now known as zbr | 09:10 | |
*** DSpider has joined #opendev | 09:25 | |
*** zbr4 has joined #opendev | 09:54 | |
*** zbr has quit IRC | 09:56 | |
*** zbr4 is now known as zbr | 09:56 | |
*** zbr6 has joined #opendev | 10:08 | |
*** zbr has quit IRC | 10:10 | |
*** zbr6 is now known as zbr | 10:10 | |
*** tosky has joined #opendev | 10:13 | |
*** brinzhang has quit IRC | 10:15 | |
*** noonedeadpunk has quit IRC | 11:16 | |
*** noonedeadpunk has joined #opendev | 11:17 | |
*** LowKey has joined #opendev | 11:28 | |
*** slaweq has joined #opendev | 11:48 | |
*** slaweq has quit IRC | 11:57 | |
*** zbr6 has joined #opendev | 11:59 | |
*** biglot00 has joined #opendev | 11:59 | |
*** LowKey has quit IRC | 12:00 | |
*** LowKey has joined #opendev | 12:00 | |
*** zbr has quit IRC | 12:00 | |
*** zbr6 is now known as zbr | 12:00 | |
*** biglot00 has left #opendev | 12:06 | |
*** zbr4 has joined #opendev | 12:07 | |
*** zbr has quit IRC | 12:09 | |
*** zbr4 is now known as zbr | 12:09 | |
*** zbr9 has joined #opendev | 12:28 | |
*** zbr has quit IRC | 12:29 | |
*** zbr9 is now known as zbr | 12:29 | |
*** iurygregory has joined #opendev | 12:51 | |
*** zbr6 has joined #opendev | 13:21 | |
*** zbr has quit IRC | 13:22 | |
*** zbr6 is now known as zbr | 13:22 | |
fungi | i want to say we saw this same symptom briefly, immediately after the upgrade, and opened a gerrit bug about it | 13:45 |
fungi | looking back for details | 13:45 |
*** zbr6 has joined #opendev | 13:46 | |
fungi | ERROR com.google.gerrit.httpd.restapi.RestApiServlet : Error in PUT /accounts/self/password.http: AlreadyClosedException | 13:48 |
*** zbr has quit IRC | 13:48 | |
*** zbr6 is now known as zbr | 13:48 | |
fungi | com.google.gerrit.exceptions.StorageException: Failed to replace account 26458 in index version 11 | 13:48 |
fungi | looking back through the past month of gerrit error logs, we started seeing these on tuesday, first recorded occurrence was 2021-02-16 at 18:16:53 utc | 13:55 |
*** zbr6 has joined #opendev | 13:55 | |
fungi | last gerrit container restart was weeks earlier, 2021-02-02 | 13:56 |
fungi | so this is something which has spontaneously cropped up | 13:56 |
*** zbr has quit IRC | 13:56 | |
*** zbr6 is now known as zbr | 13:56 | |
fungi | it's a chain of several exceptions, which seems to stem from a lucene index file lock? i'm not great at interpreting the tea leaves in these tracebacks | 14:00 |
*** zbr has quit IRC | 14:01 | |
*** zbr has joined #opendev | 14:01 | |
fungi | yeah, the very first occurrence actually started with "WARN com.google.gerrit.server.plugincontext.PluginContext : Failure in class com.google.gerrit.server.index.change.ReindexAfterRefUpdate of plugin gerrit" | 14:11 |
fungi | but some others later on are "com.google.gerrit.httpd.restapi.RestApiServlet : Error in PUT /accounts/self/username: AlreadyClosedException" and similar for other account methods | 14:12 |
fungi | but they're "Caused by: com.google.gerrit.exceptions.StorageException: java.util.concurrent.ExecutionException: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed [...] Caused by: java.util.concurrent.ExecutionException: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed [...] Caused by: org.apache.lucene.store.AlreadyClosedException: FileLock invalidated by an | 14:15 |
fungi | external force: NativeFSLock(path=/var/gerrit/index/accounts_0011/write.lock,impl=sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive invalid]" | 14:15 |
*** zbr3 has joined #opendev | 14:15 | |
*** zbr has quit IRC | 14:17 | |
*** zbr3 is now known as zbr | 14:17 | |
fungi | https://bugs.chromium.org/p/gerrit/issues/detail?id=13726 "Lost file lock for account index" | 14:26 |
fungi | looks like i opened that the monday we completed our upgrade | 14:26 |
*** LowKey has quit IRC | 14:28 | |
fungi | so the good news is that a gerrit restart will probably mitigate the problem again, but we still have no idea what caused it | 14:28 |
fungi | i can leave it in the broken state for a little while still, in case any other infra-root wants to take a closer look or has other ideas as to how we can extract more information from the current state | 14:28 |
fungi | `docker-compose exec gerrit lslocks -u` once again does not show any lock for /var/gerrit/index/accounts_0011/write.lock | 14:31 |
fungi | so definitely seems to be the same bug | 14:32 |
fungi | i've replied to the bug report, not that they seem all that inclined to triage it (still in "new" state after three months) | 14:55 |
fungi | but maybe now that we can say it wasn't a one-time occurrence, it will be more interesting to someone | 14:56 |
*** zbr5 has joined #opendev | 15:04 | |
*** zbr has quit IRC | 15:06 | |
*** zbr5 is now known as zbr | 15:06 | |
*** redrobot has quit IRC | 15:21 | |
*** lpetrut has joined #opendev | 15:24 | |
*** lpetrut has quit IRC | 15:25 | |
*** redrobot has joined #opendev | 15:26 | |
*** tosky has quit IRC | 15:35 | |
*** slaweq has joined #opendev | 16:15 | |
*** slaweq has quit IRC | 16:47 | |
*** zbr2 has joined #opendev | 17:20 | |
*** zbr has quit IRC | 17:22 | |
*** zbr has joined #opendev | 17:24 | |
*** zbr2 has quit IRC | 17:27 | |
*** slaweq has joined #opendev | 17:28 | |
*** slaweq has quit IRC | 17:34 | |
*** slaweq has joined #opendev | 17:51 | |
*** slaweq has quit IRC | 17:57 | |
openstackgerrit | Jeremy Stanley proposed opendev/git-review master: Test/assert Python 3.9 support https://review.opendev.org/c/opendev/git-review/+/772589 | 18:16 |
*** tosky has joined #opendev | 20:21 | |
*** sboyron has quit IRC | 21:43 | |
*** DSpider has quit IRC | 23:40 |