corvus | didn't the job that failed run in periodic? | 00:00 |
---|---|---|
clarkb | corvus: one was periodic and one was deploy | 00:01 |
clarkb | I think your assumption that the job would set the remote if it needed it is correct though | 00:01 |
corvus | https://zuul.opendev.org/t/openstack/build/a559a6653a274579b41f93efb13eac2a/console#1/1/12/bridge01.opendev.org | 00:01 |
clarkb | I'll get a link in a sec | 00:01 |
clarkb | https://opendev.org/opendev/base-jobs/src/branch/master/playbooks/infra-prod/setup-src.yaml#L27-L53 | 00:02 |
clarkb | so in the case of opendev prepare-workspace-git would remove the remote then we'd add it back again I think | 00:02 |
clarkb | though it's possible that ansible might get angry about changing the remote like that? I don't know | 00:03 |
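The remote juggling being discussed amounts to roughly the following. This is a minimal sketch; the repo path and URL are only illustrative of what prepare-workspace-git and the infra-prod setup-src playbook do.

```shell
# Roughly what prepare-workspace-git does to an existing repo on the target
# host: drop the origin remote so a stale URL from a previous run can't leak in.
cd /home/zuul/src/opendev.org/opendev/system-config   # illustrative path
git remote rm origin 2>/dev/null || true

# Roughly what the opendev setup-src playbook then does: add origin back,
# pointed at the canonical upstream, so later fetches and resets work.
git remote add origin https://opendev.org/opendev/system-config
git fetch origin
```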
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Fix prepare-workspace-git operating on existing repos https://review.opendev.org/c/zuul/zuul-jobs/+/924802 | 00:18 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Make prepare-workspace-git behavior more consistent https://review.opendev.org/c/zuul/zuul-jobs/+/924804 | 00:21 |
opendevreview | Clark Boylan proposed openstack/project-config master: Set xenial min ready to 0 https://review.opendev.org/c/openstack/project-config/+/924806 | 00:28 |
opendevreview | Merged zuul/zuul-jobs master: Fix prepare-workspace-git operating on existing repos https://review.opendev.org/c/zuul/zuul-jobs/+/924802 | 00:32 |
opendevreview | Merged openstack/project-config master: Set xenial min ready to 0 https://review.opendev.org/c/openstack/project-config/+/924806 | 00:55 |
fungi | that does seem to have fixed it | 01:19 |
clarkb | ya things are looking good for now. we should probably keep an eye on it, but I'm happy with how the hourly jobs are going and the job for 924806 | 01:19 |
clarkb | I crammed dinner down. Now to find something frozen and sweet | 01:20 |
opendevreview | yatin proposed zuul/zuul-jobs master: Fix wheel_mirror for Debian https://review.opendev.org/c/zuul/zuul-jobs/+/924815 | 05:01 |
opendevreview | Benjamin Schanzel proposed zuul/zuul-jobs master: Add ensure-dib role https://review.opendev.org/c/zuul/zuul-jobs/+/922910 | 07:26 |
opendevreview | Benjamin Schanzel proposed zuul/zuul-jobs master: wip: Add build-diskimage role https://review.opendev.org/c/zuul/zuul-jobs/+/922911 | 07:26 |
opendevreview | Benjamin Schanzel proposed zuul/zuul-jobs master: Add ensure-dib role https://review.opendev.org/c/zuul/zuul-jobs/+/922910 | 07:48 |
opendevreview | Benjamin Schanzel proposed zuul/zuul-jobs master: wip: Add build-diskimage role https://review.opendev.org/c/zuul/zuul-jobs/+/922911 | 07:48 |
opendevreview | Benjamin Schanzel proposed zuul/zuul-jobs master: Add ensure-dib role https://review.opendev.org/c/zuul/zuul-jobs/+/922910 | 07:50 |
opendevreview | Benjamin Schanzel proposed zuul/zuul-jobs master: wip: Add build-diskimage role https://review.opendev.org/c/zuul/zuul-jobs/+/922911 | 07:50 |
opendevreview | Benjamin Schanzel proposed zuul/zuul-jobs master: wip: Add build-diskimage role https://review.opendev.org/c/zuul/zuul-jobs/+/922911 | 08:08 |
frickler | so the issue that was blocking for [ps]*-config should be resolved, do I understand this correctly? at least the infra-prod hourly runs are looking fine again since 01:00 https://zuul.opendev.org/t/openstack/buildset/d0f51c4265364a49a5e7c1e08cd77ab2 | 08:17 |
opendevreview | Benjamin Schanzel proposed zuul/zuul-jobs master: wip: Add build-diskimage role https://review.opendev.org/c/zuul/zuul-jobs/+/922911 | 08:19 |
opendevreview | Simon Westphahl proposed zuul/zuul-jobs master: wip: Add example role for converting images https://review.opendev.org/c/zuul/zuul-jobs/+/922912 | 08:43 |
opendevreview | Benjamin Schanzel proposed zuul/zuul-jobs master: wip: Add example role for converting images https://review.opendev.org/c/zuul/zuul-jobs/+/922912 | 09:29 |
opendevreview | Jan Gutter proposed zuul/zuul-jobs master: Update sources for cri-o for newer Ubuntu https://review.opendev.org/c/zuul/zuul-jobs/+/924750 | 10:35 |
opendevreview | Jan Gutter proposed zuul/zuul-jobs master: Move minikube out of /tmp https://review.opendev.org/c/zuul/zuul-jobs/+/924738 | 10:35 |
opendevreview | Benjamin Schanzel proposed zuul/zuul-jobs master: Add build-diskimage role https://review.opendev.org/c/zuul/zuul-jobs/+/922911 | 10:36 |
opendevreview | Benjamin Schanzel proposed zuul/zuul-jobs master: Add a role to convert diskimages between formats https://review.opendev.org/c/zuul/zuul-jobs/+/922912 | 10:36 |
opendevreview | Jan Gutter proposed zuul/zuul-jobs master: Update sources for cri-o for newer Ubuntu https://review.opendev.org/c/zuul/zuul-jobs/+/924750 | 10:56 |
opendevreview | Jan Gutter proposed zuul/zuul-jobs master: Move minikube out of /tmp https://review.opendev.org/c/zuul/zuul-jobs/+/924738 | 10:56 |
fungi | frickler: yes, seems it's cleared up with the regression fix in zuul-jobs | 11:51 |
opendevreview | Jake Yip proposed openstack/project-config master: Sync magnum-capi-helm-charts repo to GitHub mirror https://review.opendev.org/c/openstack/project-config/+/924846 | 11:53 |
opendevreview | Merged openstack/project-config master: Sync magnum-capi-helm-charts repo to GitHub mirror https://review.opendev.org/c/openstack/project-config/+/924846 | 12:10 |
opendevreview | Merged zuul/zuul-jobs master: Fix wheel_mirror for Debian https://review.opendev.org/c/zuul/zuul-jobs/+/924815 | 12:16 |
frickler | this is weird, is this a bug in zuul or a misconfiguration on our side or just the way it is? https://review.opendev.org/c/opendev/base-jobs/+/922653 one tenant is V-1, two others V+1 and the slowest one wins. not the reproducible gating that I would have expected. I wonder whether it might also run multiple gate pipelines if it would get approved? once rejected, twice merged? ;) | 12:25 |
fungi | i think we shouldn't have the volvocars tenant running jobs for opendev/base-jobs changes | 12:35 |
fungi | that is probably a configuration mistake copied from the opendev tenant (where it should be voting/gating) | 12:36 |
frickler | yes, but even then there's opendev tenant + openstack tenant with conflicting votes | 13:22 |
fungi | i think when we looked into it before there was no way to prevent zuul from adding a verified -1 on config errors from other tenants, but that's less concerning to me | 13:27 |
clarkb | right, its -1 won't be a -2, so it's "fine" as far as merging goes | 13:59 |
clarkb | did anyone look at the tenant configs to determine if the new volvocars config is wrong? | 13:59 |
fungi | i have not yet | 14:14 |
clarkb | I think I see the issue. I can push a patch once I have ssh keys loaded. But then I need to go find breakfast and do morning things before looking at gitea in a bit | 14:15 |
opendevreview | Clark Boylan proposed openstack/project-config master: Limit what volvocars loads from opendev/base-jobs https://review.opendev.org/c/openstack/project-config/+/924858 | 14:20 |
frickler | the -1 does block checks running on patches with a depends-on, though, see https://review.opendev.org/c/openstack/project-config/+/924790/1 | 15:17 |
opendevreview | Jan Gutter proposed zuul/zuul-jobs master: Move minikube out of /tmp https://review.opendev.org/c/zuul/zuul-jobs/+/924738 | 15:38 |
clarkb | infra-root I'm going to remove gitea14.o.o from the load balancer now then proceed with the process at https://etherpad.opendev.org/p/hcz4yMxUIsAWGgyoHKeZ | 16:29 |
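Taking a backend out of rotation like this is typically done via haproxy's admin socket. A sketch under the assumption of a socat-accessible stats socket; the socket path and backend/server names are guesses, not the real OpenDev values:

```shell
# Inspect current backend state, then take gitea14 out of rotation.
echo "show servers state" | socat stdio /var/haproxy/run/stats
echo "disable server balance_git_https/gitea14.opendev.org" \
  | socat stdio /var/haproxy/run/stats

# After maintenance and replication are done, put it back.
echo "enable server balance_git_https/gitea14.opendev.org" \
  | socat stdio /var/haproxy/run/stats
```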
corvus | frickler: the openstack tenant is the one that is reporting the config error on the dependent change: https://zuul.opendev.org/t/openstack/buildset/c5e7c0c4083d4fcdb1b71a43b539ca83 | 16:34 |
clarkb | the db backup is complete and the filesize looks about what I expected it to be | 16:35 |
clarkb | I'm proceeding with running the doctor command now | 16:35 |
corvus | so it makes sense that the change that depends on it reports that it depends on a change with a config error | 16:35 |
corvus | clarkb: roger, docker doctor. | 16:35 |
clarkb | "Converted successfully, please confirm your database's character set is now utf8mb4" that took about 2.5 minutes or so | 16:38 |
clarkb | there were also slow query warnings I didn't get from the test node | 16:38 |
clarkb | but I think that's fine, it's just letting us know it took some time to do some of these conversions | 16:38 |
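The doctor run being described is roughly the following; a sketch assuming a docker-compose style gitea deployment, with container names and the config path as placeholders:

```shell
# Back up the gitea database first (the step whose filesize was checked above).
docker exec mariadb-container \
  mysqldump -u root -p"$DB_ROOT_PW" gitea > /root/gitea-backup.sql

# Run the converter shipped with gitea 1.22; it rewrites MySQL/MariaDB tables
# to utf8mb4 and prints the "Converted successfully..." message quoted above.
docker exec gitea-web-container \
  gitea doctor convert --config /custom/conf/app.ini
```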
clarkb | checks look good. I'm going to turn gitea services back on, then do full replication against it and if we're happy with the results we can add it back to the lb | 16:41 |
clarkb | https://gitea14.opendev.org:3081/opendev/system-config/ seems to be working | 16:42 |
fungi | clarkb: sounds good, i guess i stepped away at the wrong moment, sorry! | 16:43 |
clarkb | no problem. Seems to have gone well. If you are back maybe check there isn't anything amiss on the service usage side | 16:44 |
fungi | and yeah, testing so far things look good with gitea14 | 16:44 |
clarkb | do we remember which repo had the 500 errors before? | 16:44 |
fungi | the corrupt/truncated objects and refs? | 16:44 |
fungi | that was on gitea12 | 16:45 |
fungi | or something different? | 16:45 |
clarkb | no the git repo that had refs with colliding names with case insensitive lookups | 16:45 |
clarkb | frickler reported it iirc. It will be in irc logs somewhere and maybe in my notes on the gitea issue I filed | 16:45 |
clarkb | we discussed it on february 28 | 16:47 |
fungi | oh, that long ago | 16:47 |
clarkb | https://opendev.org/x/fuel-plugin-onos | 16:48 |
clarkb | it still 500s | 16:48 |
clarkb | oh because I looked at gitea09 heh | 16:48 |
* clarkb tries with the correct backend | 16:48 | |
clarkb | https://gitea14.opendev.org:3081/x/fuel-plugin-onos works now \o/ | 16:48 |
fungi | oh yay! | 16:49 |
clarkb | yes, basically this came up a while ago, and when I went to figure it out there was already work in progress to add this doctor tool to fix it in the 1.22 release | 16:49 |
clarkb | but then the 1.22 release was slow to go out and then slow to fix early release bugs, so it took a while to get to the point where I was comfortable running it. But here we are now and this looks much happier | 16:50 |
clarkb | git clone also works for me. I will add this node back to the load balancer as soon as replication completes | 16:51 |
clarkb | I'll do them in reverse order so gitea13 will be next and so on | 16:53 |
clarkb | replication is about half done | 16:53 |
clarkb | separately there is a git push task in gerrit's show-queue output from about a month ago. I wonder if we should try and kill that task | 16:59 |
fungi | c1278428 Jun-26 09:43 git-upload-pack /openstack/nova (wangjing) | 17:00 |
fungi | that one? | 17:00 |
clarkb | ya | 17:00 |
fungi | i can try, if you like | 17:00 |
clarkb | maybe after gitea is done? I don't know how safe killing tasks is generally for gerrit | 17:01 |
fungi | checking for now to see if it's associated with a hung connection | 17:01 |
clarkb | ok replication is done. I'm going to add gitea14 to the lb and remove gitea13 | 17:02 |
clarkb | any concerns with proceeding with these tasks on the next node? | 17:02 |
fungi | no concerns | 17:03 |
fungi | user wangjing has 3 open ssh connections currently | 17:07 |
clarkb | 13 is getting db updates now | 17:07 |
fungi | ss on the server indicates 3 established 29418/tcp connections for the same ip address (and that ip address is only in use by gerrit connections for that user id) | 17:11 |
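The kind of cross-check being described looks roughly like this; the host name and account in the second command are illustrative:

```shell
# Established TCP connections to Gerrit's SSH port, one line per peer.
ss -tn state established '( sport = :29418 )'

# Gerrit's own view of its SSH sessions, to match session ids to usernames
# and client IPs.
ssh -p 29418 admin@review.opendev.org gerrit show-connections --wide
```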
clarkb | bringing 13 back online now | 17:11 |
clarkb | quick checks show it working. I'll start replication now | 17:13 |
clarkb | fungi: my main concern would be that killing a git push task might make the git repo have a sad | 17:14 |
clarkb | either due to incomplete object writes or more likely an incomplete change creation | 17:14 |
fungi | yeah | 17:16 |
fungi | but so might a service restart that clears that task | 17:17 |
clarkb | though actually wait is git-upload-pack the thing that is run when you fetch? | 17:17 |
clarkb | I always forget because I think git's terminology feels backward | 17:17 |
clarkb | but if so then maybe they just have a long running clone going and it is much safer to kill | 17:18 |
fungi | oh, yeah i bet this is a third-party ci account | 17:20 |
fungi | so it's probably listening to the event stream and fetching refs from gerrit | 17:20 |
fungi | "Invoked by git fetch-pack, learns what objects the other side is missing, and sends them after packing. | 17:22 |
fungi | " | 17:22 |
fungi | bingo | 17:22 |
fungi | so shouldn't be a write operation anyway | 17:22 |
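Right: the server-side helper for a clone or fetch is git-upload-pack, while a push runs git-receive-pack, so a hung upload-pack task is read-only. One way to see which helper a given operation uses (the URL and username are illustrative):

```shell
# GIT_TRACE shows the ssh command git runs; for ls-remote/fetch/clone it is
# git-upload-pack, the read path.
GIT_TRACE=1 git ls-remote \
  ssh://someuser@review.opendev.org:29418/openstack/nova HEAD

# A push (even a dry run) goes through git-receive-pack, the write path.
GIT_TRACE=1 git push --dry-run \
  ssh://someuser@review.opendev.org:29418/openstack/nova HEAD:refs/for/master
```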
clarkb | I guess we can try killing it after giteas are all done. Or reach out to the user and ask if they are having problems? | 17:23 |
clarkb | waiting for replication is the slowest part of this process | 17:24 |
fungi | given we've seen plenty of half-timed-out but not removed ssh connections from third-party ci accounts, and this one has been hung for a month and they don't have an unusually high number of established connections, i'm inclined to just kill the hung task without trying to track them down | 17:32 |
fungi | doesn't look like it's a frequently recurrent issue for them | 17:33 |
clarkb | 13 is back in the lb and 12 is out. proceeding on that node now | 17:34 |
clarkb | 12 is up and replication has been started | 17:42 |
clarkb | 12 is back in the rotation and 11 is on its way out to the conversion pasture | 18:01 |
clarkb | gitea11 is back up and being replicated to now | 18:09 |
* frickler notes that nova isn't done with their CVE stuff yet, but they're also taking it real slow with reviewing and pushing things, so likely we don't have to care much anymore, either | 18:18 | |
frickler | clarkb: did you see my comments on https://review.opendev.org/c/openstack/project-config/+/924858 ? though I must admit I don't quite understand what's happening, maybe corvus can take a more expert look | 18:19 |
clarkb | not yet I've been focused on the gitea stuff mostly | 18:20 |
clarkb | frickler: the code and comment are copied directly from the other tenant configs. I guess I can update all of them | 18:20 |
clarkb | and I'm not trying to fix the openstack tenant in that change | 18:21 |
clarkb | fwiw the openstack tenant is failing for different reasons. It isn't trying to run the jobs I don't think. Instead it is noticing the config isn't valid for the openstack tenant because openstack is still using those resources | 18:22 |
clarkb | I don't think we need to fix all of openstack before merging these cleanups | 18:22 |
frickler | yes, that will be a tedious task if we can only see the issues one-by-one | 18:23 |
frickler | one nova patch in gate now, but all on its own in the integrated queue, so not much to do there | 18:25 |
clarkb | putting gitea11 back into service and moving onto gitea10 | 18:25 |
frickler | but that makes me wonder why neutron-lib does have its own queue. might be a bad cleanup after the zuul change | 18:26 |
clarkb | the most likely reason is it simply isn't in the same queue | 18:28 |
clarkb | gitea10 is replicating now | 18:33 |
clarkb | corvus: maybe you can weigh in on 924858 on whether or not excluding project is preferred to explicitly including nodeset, secret and job | 18:35 |
frickler | hmm, looks like neutron-lib never was in the integrated queue, I'll check with the neutron team whether that's intentional or some oversight | 18:43 |
clarkb | and now I'm putting gitea10 back into service and pulling 09 out. Almost done | 18:50 |
clarkb | gitea09 is back up now | 18:57 |
clarkb | things look ok at first check I'm starting replication | 18:58 |
clarkb | infra-root I believe all six gitea backends have had their DBs converted to utf8mb4 text types and utf8mb4 case sensitive collation methods. Gitea09 is not back in the load balancer yet because I am waiting for replication to complete to it. Now is probably a good time to do any checks you'd like to do before we call it done | 18:59 |
clarkb | in particular maybe you want to check the db state on each of the gitea backends to make sure I didn't miss one or do something silly like that | 19:00 |
clarkb | then also https://giteaXY.opendev.org:3081/x/fuel-plugin-onos should render for every backend at this point | 19:00 |
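A quick way to do that per-backend database check; a sketch where the container name and credentials are assumptions:

```shell
# Confirm the database default charset/collation and list any table that was
# not converted; an empty second result set means nothing was missed.
docker exec mariadb-container mysql -u root -p"$DB_ROOT_PW" gitea -e "
  SELECT @@character_set_database, @@collation_database;
  SELECT table_name, table_collation
    FROM information_schema.tables
   WHERE table_schema = 'gitea'
     AND table_collation NOT LIKE 'utf8mb4%';"
```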
fungi | i've been browsing around, cloning things, et cetera and it's looking good so far | 19:01 |
clarkb | cool. I've also been trying to check as I go and I don't think I've missed any backends and they all seem to be operating as expected. Just good to get extra eyeballs and some more diverse actions against them | 19:02 |
clarkb | once I've got 09 back into the lb I'm going to eat lunch | 19:03 |
clarkb | replication appears done. I'm putting gitea09 back into the lb rotation | 19:13 |
clarkb | #status log Converted database contents to utf8mb4 and case sensitive collations on each gitea backend using the gitea doctor convert tool included in v1.22 | 19:14 |
opendevstatus | clarkb: finished logging | 19:14 |
fungi | thanks clarkb!!! | 19:18 |
corvus | clarkb: done | 19:21 |
opendevreview | Merged zuul/zuul-jobs master: Make prepare-workspace-git behavior more consistent https://review.opendev.org/c/zuul/zuul-jobs/+/924804 | 20:04 |
clarkb | corvus: will the currently queued infra-prod-service-registry job pick that up or do we have to wait for the 21:00 UTC hourly runs? | 20:07 |
clarkb | I think the git state is decided when things enqueue not when the jobs start? | 20:07 |
corvus | yeah we will have to wait | 20:09 |
clarkb | ok /me attempts patience | 20:12 |
clarkb | I think the reason that the vexxhost tenant is different is they are including things like pipelines too? | 20:43 |
clarkb | anyway I'll fix the comment on the entry for everything but vexxhost. I don't think we need to alphasort those entries | 20:43 |
opendevreview | Clark Boylan proposed openstack/project-config master: Limit what volvocars loads from opendev/base-jobs https://review.opendev.org/c/openstack/project-config/+/924858 | 20:45 |
clarkb | the hourly periodic jobs have just enqueued. They should use the new version of prepare-workspace-git as supplied in 924804 | 21:01 |
clarkb | corvus: I see the remote is valid on disk (not null) which implies that the tasks to fetch from "upstream" are working in periodic pipelines despite the initial reset to null remotes | 21:03 |
clarkb | so I think this is working as expected | 21:03 |
clarkb | the nodepool job that is running now is a good check though | 21:03 |
clarkb | I just discovered that my gitea issue about the 500 errors due to case insensitivity was locked and marked resolved automatically 10 days after my last comment :/ I can't even leave a note there indicating this fixed it | 21:12 |
corvus | clarkb: i think that means they want you to open a new issue. ;) | 21:14 |
corvus | clarkb: looks like the nodepool job finished successfully; anything else we should check? | 21:15 |
clarkb | corvus: no I think I'm happy with those results. | 21:16 |
corvus | ++ | 21:17 |
corvus | log looks as i would expect | 21:17 |
opendevreview | Merged openstack/project-config master: Limit what volvocars loads from opendev/base-jobs https://review.opendev.org/c/openstack/project-config/+/924858 | 21:55 |
fungi | #status log Closed persistent SSH API connections from Gerrit account 34377 in order to end a Git fetch task which was hung for the past month | 22:34 |
opendevstatus | fungi: finished logging | 22:34 |
clarkb | fungi: did you just use gerrit's task kill command for that? | 22:35 |
fungi | i didn't find any such command, so i used close-connection on the three session ids | 22:35 |
clarkb | fungi: https://gerrit-review.googlesource.com/Documentation/cmd-kill.html for the future | 22:36 |
clarkb | they are probably pretty similar in implementation though | 22:36 |
fungi | oh! i guess kill is a plugin | 22:37 |
fungi | not a gerrit subcommand | 22:37 |
fungi | which is why gerrit --help wasn't bringing it up | 22:37 |
clarkb | ya I'm not sure why it doesn't have the prefix | 22:37 |
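For reference, the two approaches look roughly like this; the host, account, and session id are illustrative, and the task id is the one from the show-queue output above:

```shell
# List queued tasks with their ids.
ssh -p 29418 admin@review.opendev.org gerrit show-queue --wide

# The core kill command takes a task id and, as noted, no "gerrit" prefix.
ssh -p 29418 admin@review.opendev.org kill c1278428

# The alternative used here: find the account's SSH sessions and close them.
ssh -p 29418 admin@review.opendev.org gerrit show-connections --wide
ssh -p 29418 admin@review.opendev.org gerrit close-connection <session-id>
```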
fungi | well, anyway, closing the connections also terminated the task on its own | 22:38 |
clarkb | ya I suspect that the underlying implementation for killing an ssh connection is basically the same | 22:38 |
fungi | those same session ids had been open for hours at least (but more likely months) | 22:38 |