*** liuxie is now known as liushy | 07:19 | |
*** iurygregory__ is now known as iurygregory | 11:18 | |
jrosser | is it possible that one gitea backend is returning different content for openstack-ansible than the others? | 12:48 |
fungi | jrosser: yes, see https://meetings.opendev.org/irclogs/%23opendev/%23opendev.2024-07-20.log.html | 12:53 |
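For reference, a quick way to spot-check whether one backend is serving divergent content is to compare `git ls-remote` output from each backend directly, bypassing the load balancer. A minimal sketch, assuming the backends are reachable as giteaNN.opendev.org on HTTPS port 3081 (both assumptions worth verifying against the inventory):

```sh
# Compare the master ref served by each gitea backend; hostnames and the
# port number here are assumptions, not taken from the real inventory.
for host in gitea09 gitea10 gitea11 gitea12 gitea13 gitea14; do
  echo "== ${host} =="
  git ls-remote "https://${host}.opendev.org:3081/openstack/openstack-ansible" \
    refs/heads/master
done
```

Any backend whose master sha differs from the rest is the suspect.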
fungi | i'll go ahead and take gitea12 out of the lb pools and start working through the plan outlined there | 12:53 |
jrosser | ah ok - thanks! i think i got unlucky and keep hitting that one consistently from my work network | 12:54
fungi | working theory is an unplanned host reboot in vexxhost sjc1 resulted in a truncated write | 12:54 |
fungi | jrosser: the load balancer does source persistence, so yes if you hit it you're likely to keep hitting it. sorry about that! | 12:54 |
fungi | jrosser: it's downed in haproxy now. can you check whether everything's fine for your remote now? | 12:56 |
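Downing and later restoring a backend like this is typically done through haproxy's admin socket. A rough sketch, with the socket path and the backend/server names assumed rather than copied from the actual configuration (a `balance source` setting on that backend is what provides the per-client persistence mentioned above):

```sh
# Take the server out of rotation (socket path and names are assumptions)
echo "disable server balance_git_https/gitea12.opendev.org" \
  | socat stdio /var/lib/haproxy/stats

# ...and put it back once the repair is done
echo "enable server balance_git_https/gitea12.opendev.org" \
  | socat stdio /var/lib/haproxy/stats
```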
jrosser | fungi: yes that looks good now - git fetch grabbed some new objects and master looks like i would expect now | 12:59 |
fungi | thanks for confirming | 13:00 |
*** iurygregory__ is now known as iurygregory | 13:01 | |
fungi | after trying a few options, i ended up transplanting a copy of openstack/openstack-ansible.git from gitea13 to replace the broken one on gitea12, triggered gerrit's replication of that repository to the backend which completed successfully, and am now rerunning the git gc cron manually to see if it raises any new errors | 13:16 |
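A sketch of that transplant sequence, with the repository paths and compose layout assumed rather than copied from the real deployment:

```sh
# 1. copy the known-good bare repo from gitea13 over the corrupted copy on
#    gitea12 (run on gitea12; repository path is an assumption)
rsync -a --delete \
  gitea13.opendev.org:/data/git/repositories/openstack/openstack-ansible.git/ \
  /data/git/repositories/openstack/openstack-ansible.git/

# 2. ask gerrit's replication plugin to re-replicate just that project
ssh -p 29418 review.opendev.org replication start openstack/openstack-ansible --wait

# 3. re-run garbage collection on the backend copy to confirm the object store is healthy
git -C /data/git/repositories/openstack/openstack-ansible.git gc
echo "gc exited $?"
```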
fungi | seemingly unrelated, the gc errors on "fatal: bad object refs/changes/43/773243/8" (a non-current patchset for an openstack/kolla-ansible change) | 13:22 |
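For context, the sort of inspection that surfaces a ref like this looks roughly as follows (repository path is an assumption):

```sh
cd /data/git/repositories/openstack/kolla-ansible.git
git show-ref refs/changes/43/773243/8                          # which sha the ref points at
git cat-file -t "$(git show-ref -s refs/changes/43/773243/8)"  # errors if that object is missing or corrupt
git fsck --full                                                # enumerate any other damaged objects
```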
fungi | that ref was created in gerrit 2024-07-15 13:55-45 utc | 13:28 |
fungi | er, i meant 13:55:45 utc | 13:30 |
fungi | our guess for an outage that could have been responsible for the problem commit in openstack/openstack-ansible was between 13:56 and 14:20 utc on 2024-07-15, so assuming a few seconds of delay for the replication from gerrit, that fits (it's certainly suspiciously closely timed at any rate) | 13:31
fungi | i'll perform similar transplantation for that repo in a bit | 13:31 |
fungi | okay, done and running the gc yet again to see if it comes back clean this time | 13:47 |
fungi | all good, no stdout/stderr and exited 0 | 14:31 |
fungi | i've put gitea12 back into the lb pools now | 14:31 |
fungi | #status log Repaired data corruption for two repositories on the gitea12 backend, root cause seems to be from an unexpected hypervisor host outage | 14:32 |
opendevstatus | fungi: finished logging | 14:32 |
clarkb | fungi: thank you for taking care of that | 14:40 |
fungi | np | 14:43 |
clarkb | from my notes `docker-compose run -T --rm gitea-web gitea doctor convert` is the command I ran on the held test node to run the doctor command. I think the rough process is to remove node from the lb, shutdown gitea-web and gitea-ssh, perform a db backup dump, run the doctor command, check exit code/logs/db table descriptions, start gitea and check the service, add back to the lb | 15:13 |
clarkb | we have a newer held node from the upgrade testing that I haven't cleaned up yet. We can do a more complete test run of that process there first | 15:14 |
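Spelled out as shell, the per-backend sequence described above might look roughly like this; the compose directory, service names, database name, and credentials are all assumptions to be confirmed on the held node first:

```sh
# run on a gitea backend after pulling it out of the haproxy pools
cd /etc/gitea-docker-compose                       # wherever the compose file lives (assumption)
docker-compose stop gitea-web gitea-ssh            # stop gitea but leave the database running
docker-compose exec -T mariadb \
  sh -c 'mysqldump -u root -p"$MYSQL_ROOT_PASSWORD" gitea' > /root/gitea-pre-doctor.sql
docker-compose run -T --rm gitea-web gitea doctor convert
echo "doctor exited $?"                            # then check logs and db table descriptions
docker-compose up -d                               # bring gitea-web/gitea-ssh back
# verify the web ui and git clones work, then re-add the node to the lb
```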
clarkb | https://review.opendev.org/c/opendev/system-config/+/924536 is a followup to the gitea upgrade to reflect the secret format in the test node to match reality better | 15:30 |
fungi | plan sounds good to me | 15:31 |
clarkb | ok when I'm a bit more awake and I've caught up on other things I'll take a look at doing that on the test node | 15:32 |
clarkb | https://etherpad.opendev.org/p/hcz4yMxUIsAWGgyoHKeZ put the gitea db doctoring steps in here. Still need to do the test pass on the test node and take additional notes | 16:54 |
opendevreview | Merged opendev/system-config master: Update test gitea's JWT_SECRET https://review.opendev.org/c/opendev/system-config/+/924536 | 16:57 |
opendevreview | Merged opendev/system-config master: Add vmware migration list to lists.openinfra.dev https://review.opendev.org/c/opendev/system-config/+/924432 | 16:57 |
clarkb | ok I'm reasonably happy with what is in the etherpad now. I'm not sure I've got enough notes in there about checking the results so definitely add content to that section if you've got additional ideas | 17:15 |
clarkb | and feel free to connect to the db on the test node and poke around. That is what I was doing | 17:17 |
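As an example of poking around the db, one thing worth checking is whether `doctor convert` left every table on the utf8mb4 collation gitea expects; the container name, database name, and credentials below are assumptions:

```sh
docker-compose exec mariadb mysql -u root -p gitea -e \
  "SELECT TABLE_NAME, TABLE_COLLATION FROM information_schema.TABLES WHERE TABLE_SCHEMA = 'gitea';"
```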
clarkb | added a note about a docker-compose behavior I just discovered too. I don't think it is a problem for us but it was unexpected for me | 17:24 |
clarkb | I'm looking at the meeting agenda to prepare for tomorrow and realize we didn't clean up those x/* projects from the zuul tenant config yet, did we? | 18:08
clarkb | I think we're past the day we said we would do that and we can proceed now cc frickler | 18:08 |
clarkb | infra-root do we want to keep the zuul db performance agenda item on the meeting agenda? I know a bunch of work went on while I was out and I think it's resolved at this point. Maybe there is value in a recap/summary? | 18:14
clarkb | frickler: and not sure if you want to keep the queue window size item? Maybe we can rewrite it as implementing early fail detection for openstack jobs instead? | 18:15 |
frickler | clarkb: I can check if the x/* patch is still current tomorrow | 18:20 |
frickler | clarkb: I would still like to reduce the queue window for openstack gate, at least temporarily, to see what value it ends up settling at. not sure if there is a way to track that in grafana? | 18:22
frickler | I don't see anyone having time or interest to work on early fail detection | 18:22 |
frickler | so that doesn't sound like a viable alternative to me for now | 18:23 |
fungi | i.e. no interest from those maintaining the tempest/grenade jobs in signalling early failure from individual test results? | 18:24 |
fungi | clarkb: the plan in the etherpad looks good to me. i'm not thinking of anything else useful to test post-doctor | 18:26 |
clarkb | I don't see any grafana graphs for the window size, but I do think there is statsd reporting for that value so we could have a dashboard for it (the tricky thing is it is per queue now and we have lots of queues so not sure how best to visualize that) | 18:27 |
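If zuul does export a per-queue window gauge, one quick way to see what data exists before building a dashboard is to query graphite directly; the metric path below is a guess at the naming scheme and would need checking against zuul's statsd documentation:

```sh
# Hypothetical metric path; confirm the actual per-queue window gauge name first
curl -s "https://graphite.opendev.org/render?format=json&from=-6h&target=zuul.tenant.openstack.pipeline.gate.queue.*.window" \
  | python3 -m json.tool | head
```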
clarkb | fungi: thank you for looking | 18:27 |
clarkb | ok I've made edits to the agenda. Anything else we'd like to add/edit/remove? | 20:15 |
fungi | nothing comes to mind | 20:26 |
clarkb | cool I'll send that out at the end of my day to give tonyb a chance to chime in. re the gitea stuff it is looking like I'll start on actually running through that either tomorrow afternoon (after meetings) or wednesday morning. I've got at least one errand this afternoon and would prefer to have enough time to get all of the giteas done together | 20:32 |