Monday, 2024-07-22

*** liuxie is now known as liushy		07:19
*** iurygregory__ is now known as iurygregory		11:18
jrosser	is it possible that one gitea backend is returning different content for openstack-ansible than the others?	12:48
fungi	jrosser: yes, see https://meetings.opendev.org/irclogs/%23opendev/%23opendev.2024-07-20.log.html	12:53
fungi	i'll go ahead and take gitea12 out of the lb pools and start working through the plan outlined there	12:53
jrosser	ah ok - thanks! i think i get unlucky and hit that one consistently from my work network	12:54
fungi	working theory is an unplanned host reboot in vexxhost sjc1 resulted in a truncated write	12:54
fungi	jrosser: the load balancer does source persistence, so yes if you hit it you're likely to keep hitting it. sorry about that!	12:54
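Editorial sketch of why source persistence keeps one client pinned to one backend: if the backend is chosen deterministically from the client address, the same address always lands on the same backend until the pool changes. This is an illustration only, not haproxy's actual algorithm. The pool membership (other than gitea12 and gitea13, which appear in the log), the `pick_backend` helper, and the example address are all hypothetical.

```python
import hashlib

# gitea12 and gitea13 appear in the log; the other two names are hypothetical.
BACKENDS = ["gitea12", "gitea13", "gitea-x", "gitea-y"]


def pick_backend(client_ip, backends):
    """Map a client address to one backend deterministically (illustrative only)."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


# Same client address, same pool: the choice never changes between requests.
first = pick_backend("203.0.113.7", BACKENDS)
second = pick_backend("203.0.113.7", BACKENDS)

# Taking a backend out of the pool means no client can be mapped to it.
pool_without_12 = [b for b in BACKENDS if b != "gitea12"]
after_removal = pick_backend("203.0.113.7", pool_without_12)
```

Under this assumption, a client that happens to map to a broken backend keeps seeing the broken content until that backend is removed from the pool, which matches what jrosser observed.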
fungi	jrosser: it's downed in haproxy now. can you check whether everything's fine for your remote now?	12:56
jrosser	fungi: yes that looks good now - git fetch grabbed some new objects and master looks like i would expect now	12:59
fungi	thanks for confirming	13:00
*** iurygregory__ is now known as iurygregory		13:01
fungi	after trying a few options, i ended up transplanting a copy of openstack/openstack-ansible.git from gitea13 to replace the broken one on gitea12, triggered gerrit's replication of that repository to the backend which completed successfully, and am now rerunning the git gc cron manually to see if it raises any new errors	13:16
fungi	seemingly unrelated, the gc errors on "fatal: bad object refs/changes/43/773243/8" (a non-current patchset for an openstack/kolla-ansible change)	13:22
fungi	that ref was created in gerrit 2024-07-15 13:55-45 utc	13:28
fungi	er, i meant 13:55:45 utc	13:30
fungi	our guess on an outage that could have been responsible for the problem commit in openstack/openstack-ansible was between 13:56-14:20 utc on 2024-07-15 so, assuming a few seconds of delay for the replication from gerrit that fits (it's certainly suspiciously closely-timed at any rate)	13:31
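Editorial sketch of the timing argument above: the ref creation time from the log is compared against the guessed outage window, with an assumed replication delay added. The timestamps come from the log; the `in_window` helper and the specific delay values are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def in_window(event, start, end, delay=timedelta(0)):
    """Return True if event + delay falls inside [start, end]."""
    return start <= event + delay <= end


# Timestamps quoted in the log (UTC).
ref_created = datetime(2024, 7, 15, 13, 55, 45, tzinfo=timezone.utc)
outage_start = datetime(2024, 7, 15, 13, 56, 0, tzinfo=timezone.utc)
outage_end = datetime(2024, 7, 15, 14, 20, 0, tzinfo=timezone.utc)

# With no delay the ref creation is 15 seconds before the guessed window.
fits_without_delay = in_window(ref_created, outage_start, outage_end)

# Assumed replication delay of 30 seconds (hypothetical value).
fits_with_delay = in_window(
    ref_created, outage_start, outage_end, delay=timedelta(seconds=30)
)
```

The window itself is only an estimate, so this shows the timing is plausible rather than proving the cause.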
fungi	i'll perform similar transplantation for that repo in a bit	13:31
fungi	okay, done and running the gc yet again to see if it comes back clean this time	13:47
fungi	all good, no stdout/stderr and exited 0	14:31
fungi	i've put gitea12 back into the lb pools now	14:31
fungi	#status log Repaired data corruption for two repositories on the gitea12 backend, root cause seems to be from an unexpected hypervisor host outage	14:32
opendevstatus	fungi: finished logging	14:32
clarkb	fungi: thank you for taking care of that	14:40
fungi	np	14:43
clarkb	from my notes `docker-compose run -T --rm gitea-web gitea doctor convert` is the command I ran on the held test node to run the doctor command. I think the rough process is to remove the node from the lb, shut down gitea-web and gitea-ssh, perform a db backup dump, run the doctor command, check exit code/logs/db table descriptions, start gitea and check the service, add back to the lb	15:13
clarkb	we have a newer held node from the upgrade testing that I haven't cleaned up yet. We can do a more complete test run of that process there first	15:14
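Editorial sketch of the rough process outlined above, written as an ordered checklist. Only the doctor command is taken from the log; the other steps are left as descriptions because the log gives no exact commands for them. The `STEPS` list and the `render_checklist` helper are hypothetical, and nothing here executes anything against a server.

```python
import shlex

# The only command quoted in the log.
DOCTOR_CMD = "docker-compose run -T --rm gitea-web gitea doctor convert"

# (description, command or None). None means the log gives no exact command.
STEPS = [
    ("remove node from the lb", None),
    ("shut down gitea-web and gitea-ssh", None),
    ("perform a db backup dump", None),
    ("run the doctor command", DOCTOR_CMD),
    ("check exit code/logs/db table descriptions", None),
    ("start gitea and check the service", None),
    ("add back to the lb", None),
]


def render_checklist(steps):
    """Return numbered checklist lines; commands are shown as an argv list."""
    lines = []
    for number, (description, command) in enumerate(steps, start=1):
        line = f"{number}. {description}"
        if command is not None:
            # shlex.split shows the argument list a subprocess call would receive.
            line += f" -> {shlex.split(command)}"
        lines.append(line)
    return lines


checklist = render_checklist(STEPS)

if __name__ == "__main__":
    print("\n".join(checklist))
```

Keeping the steps as data makes it easy to reuse the same ordering on the held test node before touching production backends.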
clarkb	https://review.opendev.org/c/opendev/system-config/+/924536 is a followup to the gitea upgrade to reflect the secret format in the test node to match reality better	15:30
fungi	plan sounds good to me	15:31
clarkb	ok when I'm a bit more awake and I've caught up on other things I'll take a look at doing that on the test node	15:32
clarkb	https://etherpad.opendev.org/p/hcz4yMxUIsAWGgyoHKeZ put the gitea db doctoring steps in here. Still need to do the test pass on the test node and take additional notes	16:54
opendevreview	Merged opendev/system-config master: Update test gitea's JWT_SECRET https://review.opendev.org/c/opendev/system-config/+/924536	16:57
opendevreview	Merged opendev/system-config master: Add vmware migration list to lists.openinfra.dev https://review.opendev.org/c/opendev/system-config/+/924432	16:57
clarkb	ok I'm reasonably happy with what is in the etherpad now. I'm not sure I've got enough notes in there about checking the results so definitely add content to that section if you've got additional ideas	17:15
clarkb	and feel free to connect to the db on the test node and poke around. That is what I was doing	17:17
clarkb	added a note about a docker-compose behavior I just discovered too. I don't think it is a problem for us but it was unexpected for me	17:24
clarkb	I'm looking at the meeting agenda to prepare for tomorrow and realize we didn't clean up those x/* projects from the zuul tenant config yet, did we?	18:08
clarkb	I think we're past the day we said we would do that and we can proceed now cc frickler	18:08
clarkb	infra-root do we want to keep the zuul db performance agenda item on the meeting agenda? I know a bunch of work went on while I was out and I think it's resolved at this point. Maybe there is value in a recap/summary?	18:14
clarkb	frickler: and not sure if you want to keep the queue window size item? Maybe we can rewrite it as implementing early fail detection for openstack jobs instead?	18:15
frickler	clarkb: I can check if the x/* patch is still current tomorrow	18:20
frickler	clarkb: I would still like to reduce the queue window for openstack gate, at least temporarily, to see what value it will end up settling on. not sure if there is a way to track that in grafana?	18:22
frickler	I don't see anyone having time or interest to work on early fail detection	18:22
frickler	so that doesn't sound like a viable alternative to me for now	18:23
fungi	i.e. no interest from those maintaining the tempest/grenade jobs in signalling early failure from individual test results?	18:24
fungi	clarkb: the plan in the etherpad looks good to me. i'm not thinking of anything else useful to test post-doctor	18:26
clarkb	I don't see any grafana graphs for the window size, but I do think there is statsd reporting for that value so we could have a dashboard for it (the tricky thing is it is per queue now and we have lots of queues so not sure how best to visualize that)	18:27
clarkb	fungi: thank you for looking	18:27
clarkb	ok I've made edits to the agenda. Anything else we'd like to add/edit/remove?	20:15
fungi	nothing comes to mind	20:26
clarkb	cool I'll send that out at the end of my day to give tonyb a chance to chime in. re the gitea stuff it is looking like I'll start on actually running through that either tomorrow afternoon (after meetings) or wednesday morning. I've got at least one errand this afternoon and would prefer to have enough time to get all of the giteas done together	20:32
