opendevreview | Takashi Kajinami proposed openstack/project-config master: Fix wrong project removed https://review.opendev.org/c/openstack/project-config/+/875325 | 02:23 |
opendevreview | Takashi Kajinami proposed openstack/project-config master: Retire puppet-tacker - Step 1: End project Gating https://review.opendev.org/c/openstack/project-config/+/874539 | 02:24 |
opendevreview | Takashi Kajinami proposed openstack/project-config master: Retire puppet-tacker - Step 5: Remove Project https://review.opendev.org/c/openstack/project-config/+/875291 | 02:24 |
opendevreview | Merged openstack/project-config master: Fix wrong project removed https://review.opendev.org/c/openstack/project-config/+/875325 | 02:50 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-submodule-init role https://review.opendev.org/c/zuul/zuul-jobs/+/871539 | 04:39 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add ensure-git-lfs https://review.opendev.org/c/zuul/zuul-jobs/+/871679 | 04:39 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-lfs-init https://review.opendev.org/c/zuul/zuul-jobs/+/871680 | 04:39 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-submodule-init role https://review.opendev.org/c/zuul/zuul-jobs/+/871539 | 04:42 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add ensure-git-lfs https://review.opendev.org/c/zuul/zuul-jobs/+/871679 | 04:42 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-lfs-init https://review.opendev.org/c/zuul/zuul-jobs/+/871680 | 04:42 |
opendevreview | Merged zuul/zuul-jobs master: Add conditional for UA registration role https://review.opendev.org/c/zuul/zuul-jobs/+/874907 | 04:46 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add ensure-git-lfs https://review.opendev.org/c/zuul/zuul-jobs/+/871679 | 04:49 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-lfs-init https://review.opendev.org/c/zuul/zuul-jobs/+/871680 | 04:49 |
*** yadnesh|away is now known as yadnesh | 04:49 | |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-submodule-init role https://review.opendev.org/c/zuul/zuul-jobs/+/871539 | 04:56 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add ensure-git-lfs https://review.opendev.org/c/zuul/zuul-jobs/+/871679 | 04:56 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-lfs-init https://review.opendev.org/c/zuul/zuul-jobs/+/871680 | 04:56 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add ensure-git-lfs https://review.opendev.org/c/zuul/zuul-jobs/+/871679 | 04:59 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-lfs-init https://review.opendev.org/c/zuul/zuul-jobs/+/871680 | 04:59 |
opendevreview | Merged zuul/zuul-jobs master: Changes to make fips work on ubuntu https://review.opendev.org/c/zuul/zuul-jobs/+/873893 | 05:02 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add ensure-git-lfs https://review.opendev.org/c/zuul/zuul-jobs/+/871679 | 05:08 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-lfs-init https://review.opendev.org/c/zuul/zuul-jobs/+/871680 | 05:08 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-submodule-init role https://review.opendev.org/c/zuul/zuul-jobs/+/871539 | 05:21 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add ensure-git-lfs https://review.opendev.org/c/zuul/zuul-jobs/+/871679 | 05:21 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-lfs-init https://review.opendev.org/c/zuul/zuul-jobs/+/871680 | 05:21 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-submodule-init role https://review.opendev.org/c/zuul/zuul-jobs/+/871539 | 05:42 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add ensure-git-lfs https://review.opendev.org/c/zuul/zuul-jobs/+/871679 | 05:42 |
opendevreview | Michael Kelly proposed zuul/zuul-jobs master: roles: Add git-lfs-init https://review.opendev.org/c/zuul/zuul-jobs/+/871680 | 05:42 |
opendevreview | Takashi Kajinami proposed openstack/project-config master: Retire puppet-tacker - Step 5: Remove Project https://review.opendev.org/c/openstack/project-config/+/875291 | 07:47 |
*** jpena|off is now known as jpena | 08:36 | |
frickler | infra-root: https://mirror.iad3.inmotion.opendev.org/ seems down, can't check myself until later today | 09:03 |
mnasiadka | frickler: just wanted to write about it ;) | 09:20 |
*** thuvh1 is now known as thuvh | 09:22 | |
*** jpena is now known as jpena|off | 10:22 | |
*** jpena|off is now known as jpena | 10:29 | |
*** dviroel_ is now known as dviroel | 11:28 | |
*** bhagyashris is now known as bhagyashris|ruck | 11:51 | |
opendevreview | Jeremy Stanley proposed openstack/project-config master: Temporarily stop booting nodes in inmotion iad3 https://review.opendev.org/c/openstack/project-config/+/875488 | 12:47 |
fungi | frickler: mnasiadka: i'll self-approve that ^ | 12:47 |
mnasiadka | thanks | 12:48 |
tosky | thanks | 12:57 |
opendevreview | Merged openstack/project-config master: Temporarily stop booting nodes in inmotion iad3 https://review.opendev.org/c/openstack/project-config/+/875488 | 13:17 |
opendevreview | Jeremy Stanley proposed openstack/project-config master: Revert "Temporarily stop booting nodes in inmotion iad3" https://review.opendev.org/c/openstack/project-config/+/875495 | 13:43 |
fungi | that's ^ wip while i see if i can spot the issue | 13:43 |
fungi | openstack server show indicates it's been in a SHUTOFF state since 2023-02-26T22:19:58Z | 13:46 |
fungi | OS-EXT-STS:power_state=Shutdown OS-EXT-STS:vm_state=stopped | 13:46 |
fungi | i'll try booting it | 13:47 |
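A rough sketch of the Nova CLI calls involved here; the cloud name and server name are placeholders, not necessarily what the real clouds.yaml entry uses:

```shell
# confirm the reported SHUTOFF/stopped state, then power the instance back on
openstack --os-cloud openstackci-inmotion-iad3 server show mirror01.iad3.inmotion.opendev.org
openstack --os-cloud openstackci-inmotion-iad3 server start mirror01.iad3.inmotion.opendev.org
```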
fungi | seems to be up | 13:49 |
fungi | #status log Booted mirror.iad3.inmotion via Nova API after it was found in power_state=Shutdown since 22:19:58 UTC yesterday | 13:50 |
opendevstatus | fungi: finished logging | 13:50 |
fungi | http://mirror.iad3.inmotion.opendev.org/debian/pool/main/ returns expected content, so afs cache is fine | 13:51 |
fungi | http://mirror.iad3.inmotion.opendev.org/pypi/simple/bindep/ returns expected content, so apache cache is fine | 13:53 |
fungi | infra-root: i've un-wipped 875495 if anyone wants to double-check so we can start booting nodes in inmotion-iad3 again | 13:56 |
fungi | also if someone wants to check logs on the nova controller there (since we have access) that might be good. i need to switch to other tasks for the moment | 13:58 |
clarkb | catching up now. fungi I guess double check the mirror looks good then approve? | 16:30 |
clarkb | I agree afs and proxy content both look good at the urls you posted above. I'll approve it | 16:30 |
clarkb | infra-root I would love feedback on https://review.opendev.org/c/opendev/system-config/+/874340 as I try to sort out bringing gitea09 into the fold of gerrit replication targets | 16:31 |
opendevreview | Clark Boylan proposed opendev/system-config master: Update gitea to 1.18.5 https://review.opendev.org/c/opendev/system-config/+/875533 | 16:38 |
clarkb | we'll want to coordinate ^ with the bring up of gitea09. But I think it can land either before or after we do the replication and it should be fine. Just want to avoid doing replication while also restarting giteas intentionally | 16:39 |
opendevreview | Merged openstack/project-config master: Revert "Temporarily stop booting nodes in inmotion iad3" https://review.opendev.org/c/openstack/project-config/+/875495 | 16:42 |
*** jpena is now known as jpena|off | 17:42 | |
clarkb | infra-root: review02 seems quite busy at the moment | 18:10 |
clarkb | The error log doesn't seem to have any new content and we don't seem to be logging to the /var/log/containers location yet? | 18:17 |
clarkb | I'm having a hard time seeing what is going on | 18:17 |
clarkb | ok that info is in docker logs now for whatever reason. That means ianw's change to put it in /var/log/containers would address this | 18:19 |
clarkb | infra-root any ideas? | 18:24 |
clarkb | I'm tempted to restart gerrit | 18:24 |
clarkb | the logs don't really show anything specific that seems problematic... | 18:24 |
clarkb | but more eyeballs are appreciated | 18:25 |
clarkb | fungi: frickler ^ are you around? | 18:27 |
TheJulia | ... I take it gerrit is down? | 18:33 |
clarkb | "yes" the process is running but spinning its wheels and I've yet to figure out why. I'm working on a java trhead dump next but then i think we may have to restart it | 18:34 |
artom | https://www.youtube.com/watch?v=uRGljemfwUE | 18:35 |
TheJulia | ... sadly upgrade parts are not here yet for KSP2 | 18:36 |
clarkb | infra-root it looks like debian doesn't package the jcmd tool in the jre headless package (need the full jdk for that? so we should update our image I guess). But looks like kill -3 should work? | 18:37 |
clarkb | ok I've managed to get a threaddump stored in a file in my homedir | 18:44 |
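For reference, the two ways to take a JVM thread dump discussed above; the container name and PID are placeholders:

```shell
# jcmd needs the full JDK installed in the image; it prints the dump to stdout
docker exec gerrit jcmd <pid> Thread.print > ~/threaddump.txt

# kill -3 (SIGQUIT) works with only the JRE; the dump goes to the JVM's own
# stdout/stderr, i.e. the container log, rather than anywhere you redirect
docker exec gerrit kill -3 <pid>
```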
clarkb | infra-root ^ I'll plan to restart the server shortly if I don't hear any objections | 18:45 |
clarkb | there are a bunch of threads waiting on a condition(s) | 18:45 |
fungi | clarkb: yep, sorry stepped away for lunch but looking now | 18:45 |
fungi | clarkb: i agree, restart seems like the expedient approach and then if it comes back we know it's some ongoing external cause | 18:46 |
fungi | system load is around 20 | 18:47 |
fungi | memory is really only half used, half cache, early zero paged out | 18:47 |
fungi | s/early/nearly/ | 18:47 |
clarkb | ok I'll restart | 18:48 |
fungi | thanks! | 18:48 |
clarkb | side note the restart will attempt to replicate to gitea09 too | 18:48 |
clarkb | (just be aware of that) | 18:48 |
fungi | noted | 18:48 |
clarkb | ok thats done | 18:52 |
clarkb | I can get the web ui again | 18:52 |
fungi | yeah, seems to be going again | 18:52 |
fungi | ssh api is working for me too | 18:52 |
clarkb | the thread dump is inside 20230227-gerrit-spinning-logs in my homedir. Note there are logs on either end of that as the kill -3 emits the threadump in the normal log output destination | 18:53 |
fungi | are you going to #status log the restart, or should i? | 18:55 |
clarkb | can you? I'm still looking at logs | 18:55 |
fungi | on it! | 18:55 |
fungi | #status log The Gerrit service on review.opendev.org has been restarted to clear an as of yet undiagnosed condition which lead to a prolonged period of unresponsiveness | 18:57 |
opendevstatus | fungi: finished logging | 18:57 |
fungi | er, s/lead/led/ | 18:57 |
fungi | oh well, my grammar errors are my own | 18:58 |
corvus | i'm still seeing timeouts | 18:58 |
fungi | so sounds like the situation is coming back already | 18:58 |
corvus | yeah, gertty logs suggest that's the case | 18:58 |
clarkb | ya the load is still high | 18:58 |
fungi | trying to see if i can get an ssh connection count out of it | 18:59 |
corvus | offline until 18:50, online from 18:51--18:54 then offline again (approx times) | 18:59 |
fungi | hopefully it's not already too far gone | 18:59 |
clarkb | I can't run show queue anymore | 18:59 |
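"Show queue" here is Gerrit's admin ssh command, roughly the following (user is a placeholder):

```shell
ssh -p 29418 user@review.opendev.org gerrit show-queue --wide --by-queue
```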
fungi | i'll try to digest apache logs looking for something hammering the webui, and also see if there's any useful clues in the gerrit ssh log | 19:00 |
clarkb | the error log has some complaints about things in /var/gerrit/data/replication/ref-updates (note we don't seem to bind mount this dir so downing the container will delete these contents) | 19:03 |
clarkb | ideas: We could block access via the firewall and restart again and then see if it is stable without external connectivity. | 19:07 |
clarkb | I'm going to get another thread dump first I Guess | 19:07 |
fungi | there's a couple of netapp ci accounts with a few auth failures a minute over the 18z hour | 19:08 |
fungi | nothing really jumping out at me in the ssh log though | 19:08 |
clarkb | I've got a second thread dump from the current state in my homedir now | 19:11 |
fungi | really not a substantial count of requests through apache from any single client. the most active client made 379 requests during the 18z hour | 19:14 |
fungi | if it's someone hitting the server over https, then it's the nature of the requests not their volume | 19:14 |
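The per-client counts above come from a quick tally over the Apache access log; a minimal version of that kind of pass, with the log path and format assumed:

```shell
# count requests per client address for the busiest sources
awk '{print $1}' /var/log/apache2/gerrit-ssl-access.log \
    | sort | uniq -c | sort -rn | head -20
```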
clarkb | I think I may see it | 19:15 |
clarkb | or a thing anyway. | 19:15 |
fungi | the second most active client is definitely querying a lot of changes | 19:17 |
fungi | from an ipv6 addy in a research network in au | 19:17 |
clarkb | yes that | 19:17 |
clarkb | I think we should temporarily ask them not so kindly via the firewall to leave us alone | 19:18 |
fungi | GET /changes/?q=is:merged&o=CURRENT_REVISION&o=CURRENT_FILES&start=230500 HTTP/1.1" | 19:18 |
fungi | yes | 19:18 |
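The "not so kindly" firewall block is just a drop rule for the offending sources; a sketch, with documentation addresses standing in for the real ones:

```shell
iptables  -I INPUT -s 192.0.2.10   -j DROP   # placeholder IPv4 address
ip6tables -I INPUT -s 2001:db8::10 -j DROP   # placeholder IPv6 address
```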
TheJulia | ... Did this happen ?last year? or was it the year before | 19:33 |
fungi | it happens semi-regularly | 19:33 |
fungi | last one i remember was a student/researcher at a university in canada | 19:34 |
fungi | i wonder if there's a way to rate-limit expensive rest api queries, but that likely gets deep into layer 7 inspection | 19:34 |
clarkb | is it still sad? | 19:37 |
clarkb | arg | 19:37 |
fungi | there's another ip address | 19:37 |
fungi | associated with monash university | 19:37 |
clarkb | hrm looks like maybe its just really slow right now. | 19:37 |
clarkb | though unsure if it will continue to trend worse. Oh if there is another ip then ya maybe we need to be more aggressive in the firewall | 19:38 |
fungi | also in au i think | 19:38 |
fungi | my guess is it's a dual-stack machine and the "new" address is just its ipv4 identity because we blocked its v6 from reaching us | 19:40 |
clarkb | ah | 19:40 |
dansmith | I just tried to comment on a patch and it timed out during the submit.. when i refresh I see a new comment, but without the text | 19:41 |
dansmith | so I dunno if it's "just slow" | 19:41 |
clarkb | dansmith: fungi's theory above seems likely and we'll need to block more ips | 19:41 |
fungi | well, i've blocked the offending ipv4 and ipv4 address i found in our logs and don't see any new sources for the same query pattern as of the past ~8 minutes, but it may take gerrit time to recover | 19:42 |
dansmith | ah, text just showed up | 19:42 |
dansmith | really weird | 19:42 |
fungi | er, ipv4 and ipv6 | 19:42 |
clarkb | there is also likely a lot of demand for the service since its been afk for a bit | 19:43 |
clarkb | I can show queues via ssh and load my personal dashboard on the webui | 19:43 |
fungi | yeah, some python script is making parallel queries for large swaths of merged changes from gerrit, likely a student research project based on the networks | 19:43 |
clarkb | system load is still high but it seems to be responsive at least | 19:43 |
clarkb | side note gitea09 replication seems to be unhappy :( | 19:44 |
clarkb | It seems to be trying to replicate things then saying "NOT_ATTEMPTED" | 19:44 |
fungi | i suppose in situations like that, the sockets might stay open long enough waiting for a response that a connlimit overload rule like we use for concurrent ssh sockets could help mitigate it | 19:44 |
clarkb | oh except https://gitea09.opendev.org:3081/openstack/manila seems to have replicated | 19:45 |
TheJulia | does gerrit tell browsers to close the socket or keep it open? (http keeyalive) | 19:46 |
clarkb | I'm going to try and manually replicate bindep to gitea09 and see what it does for that | 19:46 |
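The manual trigger is the replication plugin's ssh command; the user, project, and URL pattern shown are illustrative:

```shell
ssh -p 29418 user@review.opendev.org replication start \
    --url gitea09 opendev/bindep
```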
clarkb | ya that results in not attempted too. weird, it seems to have worked earlier when I restarted but now is sad? | 19:48 |
clarkb | on the gitea side I can see it accepting gerrit's pubkey for auth in the ssh logs | 19:50 |
clarkb | oh huh now it replicated. Maybe I'm not reading the logs properly | 19:51 |
clarkb | ya I think that is it I'm just not reading it correctly. | 19:51 |
clarkb | I'm going to hold off on doing a full replication of everything (something we should consider doing for all giteas due to the outage) until we're reasonably happy that things are mostly resolved | 19:52 |
clarkb | system load is normal now | 19:52 |
fungi | status notice The Gerrit service on review.opendev.org experienced severe performance degradation between 17:50 and 19:45 due to excessive API query activity; the addresses involved are now blocked but any changes missing job results from that timeframe should be rechecked | 19:53 |
fungi | clarkb: ^ lgty? | 19:53 |
clarkb | fungi: ++ | 19:53 |
fungi | #status notice The Gerrit service on review.opendev.org experienced severe performance degradation between 17:50 and 19:45 due to excessive API query activity; the addresses involved are now blocked but any changes missing job results from that timeframe should be rechecked | 19:53 |
opendevstatus | fungi: sending notice | 19:53 |
-opendevstatus- NOTICE: The Gerrit service on review.opendev.org experienced severe performance degradation between 17:50 and 19:45 due to excessive API query activity; the addresses involved are now blocked but any changes missing job results from that timeframe should be rechecked | 19:54 | |
clarkb | infra-root: things that I don't want to forget 1) determining if we need to make /var/gerrit/data a persistent docker volume 2) fixing the docker logs to go in /var/log/containers (I think ianw had changes for this already) 3) deciding if we want to install the normal jdk headless package instead of the jre headless package. This will add tools like jcmd which can be used for thread | 19:55 |
clarkb | dumps (though kill -3 worked fine) | 19:55 |
opendevstatus | fungi: finished sending notice | 19:56 |
clarkb | I'm going to finish this review I was doing when I noticed gerrit was sad then find lunch as it seems to be staying stable | 19:58 |
fungi | now i'm noticing i should have specified utc for that time range. i'm a bit scattered after a week of travel/meetings | 19:59 |
fungi | TheJulia: as to whether the server is recommending http keepalive/pipelining, i'm not sure (i guess that would be up to the apache daemon that proxies those requests to the java-based httpd embedded in gerrit itself). also i have no idea if scripts/libraries like the one involved in this incident pay attention to that signal | 20:01 |
fungi | those would be things to look into | 20:01 |
TheJulia | fungi: so servers *can* say "close the connection", and realistically should if a rule as such is put into place | 20:02 |
TheJulia | web browsers *really* don't like ti when they think the socket is still open, inside of the timeout window for the kept alive connection, and find out that it is no longer open | 20:02 |
TheJulia | s/ti/it/ | 20:02 |
TheJulia | at a job a long time ago, we... injected "Connection: Close" into response headers a bit :) | 20:03 |
clarkb | in this case I think the remote made a request and gerrit held the connection open while it was responding to that request. That would happen regardless of keepalives | 20:06 |
clarkb | as long as data was flowing (which I think it was based on the network utilization graphs) | 20:06 |
clarkb | I'm not sure I understand what the suggestion is as far as changing the pipelining behavior | 20:06 |
TheJulia | indeed, but if the webserver/browser thought the socket was open and things suddenly go down the right path of failure, the browser would hang slightly until it tried opening a new connection | 20:07 |
fungi | what i'm thinking is that if apache recommended pipelining of requests, then we could be more aggressive about connlimit overload rules in iptables, because a browser should have very few concurrent sockets that way | 20:07 |
TheJulia | ... years ago firefox was 4 sockets, fwiw | 20:07 |
clarkb | http 1.1 is pipelined by default iirc. | 20:09 |
TheJulia | I believe so yes, so if you do want to target long lived sockets, you should explicitly tell the browsers "do not hold the socket open" | 20:10 |
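In Apache terms that means tuning keepalive on the proxy vhost; a sketch with illustrative values, not what the server actually runs:

```apache
# force the server to send Connection: close after each response
KeepAlive Off
# or, less drastic: allow keepalive but bound how long an idle socket can linger
# KeepAlive On
# MaxKeepAliveRequests 100
# KeepAliveTimeout 5
```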
fungi | i mean, limiting clients to 10 concurrent connections would probably have mitigated this, but it's hard to say without recreating the incident and testing different concurrencies | 20:10 |
TheJulia | ++ | 20:10 |
TheJulia | that also... would have prevented it... except for anyone behind a NAT gateway. | 20:11 |
clarkb | ya NAT is the main issue. Most of red hat is behind a NAT for example | 20:11 |
clarkb | and I'm sure thats common with many corp networks (though HP gave us all individual IPs out of their /8 when I was there) | 20:11 |
fungi | right, that's why we have the ssh connlimit threshold at 100 simultaneous states | 20:12 |
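That existing rule is an iptables connlimit match; an equivalent sketch for the https side would look roughly like this, with the threshold purely illustrative:

```shell
# reject new https connections from any single /32 once it holds too many open states
iptables -I INPUT -p tcp --syn --dport 443 \
    -m connlimit --connlimit-above 20 --connlimit-mask 32 \
    -j REJECT --reject-with tcp-reset
```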
clarkb | ok lunch now. Back in a bit to followup on some of those concerns | 20:12 |
TheJulia | enjoy! | 20:13 |
fungi | #status log Restarted the ptgbot service, apparently hung and serving a dead web page at ptg.opendev.org since 2023-02-07 | 20:23 |
opendevstatus | fungi: finished logging | 20:23 |
clarkb | load levels appear to remain low and stable. I'm going to try and kick off a full gitea09 replication now | 20:52 |
opendevreview | Clark Boylan proposed opendev/system-config master: Switch gerrit container from jre to jdk packages https://review.opendev.org/c/opendev/system-config/+/875553 | 20:57 |
clarkb | thats one of my todos in followup | 20:57 |
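The gist of that change in the image build is swapping the Debian package; the exact package names used in the Dockerfile are an assumption:

```dockerfile
# before: runtime only, no diagnostic tools
RUN apt-get update && apt-get install -y default-jre-headless
# after: the headless JDK also ships jcmd, jstack, jmap, etc.
RUN apt-get update && apt-get install -y default-jdk-headless
```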
fungi | i still need to remember to add x/virtualpdu openstack handover plan to the meeting agenda before you send it out | 21:01 |
fungi | will try to get to that in a few | 21:02 |
clarkb | fungi: no rush I probably won't get to the agenda until the end of my day | 21:04 |
clarkb | fungi: re auto reloading configs for replication this replication data storage system seems to write out replication tasks to disk which would in theory allow the tasks to survive reloads. I've asked about that in the gerrit discord room and will work on an update to my autoreload change to bind mount that directory | 21:06 |
clarkb | fungi: is it happening again? | 21:17 |
ianw | i think so | 21:18 |
johnsom | So, I can't get gerrit to load. Was my IP in the blocked list? lol | 21:18 |
clarkb | johnsom: no I think our unfriendly user(s) have found new IPs | 21:19 |
johnsom | Oh, sigh. Cheering for you then | 21:19 |
clarkb | fungi: I think we should go ahead and block their entire ip range | 21:19 |
fungi | checking now | 21:24 |
clarkb | it should be coming back up now (I just restarted it) | 21:36 |
fungi | keeping an eye on the apache logs to see if it jumps outside the range we've blocked so far | 21:37 |
fungi | this stuff seems to keep cropping up every time i break to eat. guess i should avoid having a midnight snack | 21:39 |
clayg | clarkb: thanks for the quick fix 👍 | 21:40 |
clarkb | clayg: its a team effort :) | 21:40 |
fungi | i have sympathy for whatever student this is we're blocking, but we really need them to coordinate with us for bulk data | 21:41 |
opendevreview | Clark Boylan proposed opendev/system-config master: Bind mount Gerrit's review_site/data dir https://review.opendev.org/c/opendev/system-config/+/875570 | 21:41 |
fungi | i do think it's flattering that just about every time this happens it's queries from a university netblock | 21:42 |
clarkb | infra-root ^ if we can get confirmation on the replication plugin persisting things through that fs location then I think this change is a good followup to the autoreload change | 21:42 |
clarkb | but also one that should be considered carefully since it has to do with persisting disk contents | 21:43 |
clarkb | its possible we may only want to do that for the replication plugin and not the other plugins. I don't know yet. Tried to leave notes about that in the commit message too | 21:43 |
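The change itself is roughly one more entry in the gerrit docker-compose volumes list, or a narrower one if only the replication plugin's state should persist; host paths here are assumptions:

```yaml
services:
  gerrit:
    volumes:
      # whole data dir...
      - /home/gerrit2/review_site/data:/var/gerrit/data
      # ...or only the replication plugin's on-disk task store
      # - /home/gerrit2/review_site/data/replication:/var/gerrit/data/replication
```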
fungi | yeah, we previously tried it and rolled back because it lost replication tasks in the queue, so i'm wary | 21:43 |
fungi | but so long as we think that will fix it this time on newer gerrit, i agree it would be an added convenience | 21:44 |
clarkb | yup exactly. I asked about it on gerrit discord so hopefully someone will chime in | 21:44 |
clarkb | if not soon I'll eventually convert that to an email to their list I Guess | 21:45 |
clarkb | aha looks like when you delete a project with the delete-project plugin it archives it to review_site/data/delete-poject | 21:45 |
clarkb | so thats less useful to make long lived but not a problem if we do I think | 21:46 |
fungi | poject typo yours or theirs? | 21:46 |
clarkb | mine | 21:47 |
clarkb | I'm just typing these things | 21:47 |
fungi | ah, okay. wasn't sure whether to be amused | 21:47 |
clarkb | related to deleting projects the gitea09 replication is slow and I'm realizing we've got a ton of dead projects... | 21:47 |
clarkb | oh well I GUESS | 21:47 |
clarkb | That was weird my keyboard got its shift key stuck | 21:48 |
fungi | secret alternate capslock | 21:48 |
fungi | it's proof you're turning into an old man | 21:49 |
clarkb | ha | 21:49 |
clarkb | I never use caps lock and through this process discovered that caps lock doesn't affect symbols or the number row. Makes sense | 21:50 |
fungi | it's really a useless key. not a proper bucky-lock | 21:51 |
fungi | okay, added x/virtualpdu to the agenda. i'm clear on topics | 21:57 |
clarkb | some of these replication tasks for gitea09 are going into a retry mode due to "short read of block" TransportException errors in jgit. When I view the repo in gitea09 there is content there so this may be an eventually consistent thing | 22:03 |
clarkb | Oh this might be due to our 900 second timeout? | 22:04 |
clarkb | I'll have to continue to keep an eye on it I guess | 22:04 |
clarkb | ya I think that is it. So ya should eventually be consistent after a few retries | 22:04 |
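The timeout and retry cadence are per-remote settings in the replication plugin's replication.config; a sketch of the relevant knobs with illustrative values:

```ini
[remote "gitea09"]
    # seconds jgit may spend on a network read/write before giving up
    timeout = 900
    # minutes to wait before rescheduling a push that failed
    replicationRetry = 1
```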
fungi | makes sense. thanks | 22:05 |
clarkb | fungi: NasserG over in Gerrit land believes the on disk storage of replication tasks should persist things through autoreload config updates | 22:36 |
clarkb | so I think those two changes together are probably a good end result to aim for. I do think the second one (that modifies volumes and mounts) deserves extra careful review though | 22:36 |
fungi | sounds good, thanks for digging into it! | 22:56 |
clarkb | I've updated the meeting agenda now with the content I was aware of | 23:04 |
clarkb | anything else to add to it? | 23:04 |
fungi | nothing else i know of | 23:09 |
clarkb | ianw: responded to your question on https://review.opendev.org/c/opendev/system-config/+/874340 tldr is it appears that the followup change may make all these problems go away | 23:14 |
ianw | ok i'm fine to try it. if we notice things out of sync we have a thread to pull on | 23:14 |
clarkb | ianw: we are down to 2100 ish replication tasks for gitea09 now from like 2400 or so? I don't expect this to be done by the time I need to call it a day. I think it isn't impacting anything else as there are 4 dedicated threads for each replication target. I just want you to be aware of that as a thing gerrit is currently doing | 23:50 |
clarkb | I'll probably trigger a full replication tomorrow of all targets once this first pass of gitea09 is done to make sure we didn't lose anything in the restarts | 23:50 |
ianw | ++ sounds good | 23:50 |
clarkb | that should go much quicker since the bulk of the data will already be there | 23:50 |
fungi | still no sign of new research | 23:56 |