Wednesday, 2024-07-24

<corvus> didn't the job that failed run in periodic?  00:00
<clarkb> corvus: one was periodic and one was deploy  00:01
<clarkb> I think your assumption that the job would set the remote if it needed it is correct though  00:01
<corvus> https://zuul.opendev.org/t/openstack/build/a559a6653a274579b41f93efb13eac2a/console#1/1/12/bridge01.opendev.org  00:01
<clarkb> I'll get a link in a sec  00:01
<clarkb> https://opendev.org/opendev/base-jobs/src/branch/master/playbooks/infra-prod/setup-src.yaml#L27-L53  00:02
<clarkb> so in the case of opendev, prepare-workspace-git would remove the remote and then we'd add it back again I think  00:02
<clarkb> though it's possible that ansible might get angry about changing the remote like that? I don't know  00:03
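A minimal shell sketch of the remote handling being discussed, assuming a typical opendev-style checkout path; this is illustrative rather than the actual role or playbook code:

    # roughly the sequence described above, not the real implementation
    cd /home/zuul/src/opendev.org/opendev/system-config   # hypothetical repo path
    # prepare-workspace-git clears whatever remote the cached repo had
    git remote remove origin 2>/dev/null || true
    # the opendev setup-src playbook then points origin back at the upstream
    # so later fetch/pull tasks have something to talk to
    git remote add origin https://opendev.org/opendev/system-config
    git fetch origin
    git remote -v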
<opendevreview> James E. Blair proposed zuul/zuul-jobs master: Fix prepare-workspace-git operating on existing repos  https://review.opendev.org/c/zuul/zuul-jobs/+/924802  00:18
<opendevreview> James E. Blair proposed zuul/zuul-jobs master: Make prepare-workspace-git behavior more consistent  https://review.opendev.org/c/zuul/zuul-jobs/+/924804  00:21
<opendevreview> Clark Boylan proposed openstack/project-config master: Set xenial min ready to 0  https://review.opendev.org/c/openstack/project-config/+/924806  00:28
<opendevreview> Merged zuul/zuul-jobs master: Fix prepare-workspace-git operating on existing repos  https://review.opendev.org/c/zuul/zuul-jobs/+/924802  00:32
<opendevreview> Merged openstack/project-config master: Set xenial min ready to 0  https://review.opendev.org/c/openstack/project-config/+/924806  00:55
<fungi> that does seem to have fixed it  01:19
<clarkb> ya, things are looking good for now. we should probably keep an eye on it, but I'm happy with how the hourly jobs are going and the job for 924806  01:19
<clarkb> I crammed dinner down. Now to find something frozen and sweet  01:20
<opendevreview> yatin proposed zuul/zuul-jobs master: Fix wheel_mirror for Debian  https://review.opendev.org/c/zuul/zuul-jobs/+/924815  05:01
<opendevreview> Benjamin Schanzel proposed zuul/zuul-jobs master: Add ensure-dib role  https://review.opendev.org/c/zuul/zuul-jobs/+/922910  07:26
<opendevreview> Benjamin Schanzel proposed zuul/zuul-jobs master: wip: Add build-diskimage role  https://review.opendev.org/c/zuul/zuul-jobs/+/922911  07:26
<opendevreview> Benjamin Schanzel proposed zuul/zuul-jobs master: Add ensure-dib role  https://review.opendev.org/c/zuul/zuul-jobs/+/922910  07:48
<opendevreview> Benjamin Schanzel proposed zuul/zuul-jobs master: wip: Add build-diskimage role  https://review.opendev.org/c/zuul/zuul-jobs/+/922911  07:48
<opendevreview> Benjamin Schanzel proposed zuul/zuul-jobs master: Add ensure-dib role  https://review.opendev.org/c/zuul/zuul-jobs/+/922910  07:50
<opendevreview> Benjamin Schanzel proposed zuul/zuul-jobs master: wip: Add build-diskimage role  https://review.opendev.org/c/zuul/zuul-jobs/+/922911  07:50
<opendevreview> Benjamin Schanzel proposed zuul/zuul-jobs master: wip: Add build-diskimage role  https://review.opendev.org/c/zuul/zuul-jobs/+/922911  08:08
<frickler> so the issue that was blocking for [ps]*-config should be resolved, do I understand this correctly? at least the infra-prod hourly runs are looking fine again since 01:00 https://zuul.opendev.org/t/openstack/buildset/d0f51c4265364a49a5e7c1e08cd77ab2  08:17
<opendevreview> Benjamin Schanzel proposed zuul/zuul-jobs master: wip: Add build-diskimage role  https://review.opendev.org/c/zuul/zuul-jobs/+/922911  08:19
<opendevreview> Simon Westphahl proposed zuul/zuul-jobs master: wip: Add example role for converting images  https://review.opendev.org/c/zuul/zuul-jobs/+/922912  08:43
<opendevreview> Benjamin Schanzel proposed zuul/zuul-jobs master: wip: Add example role for converting images  https://review.opendev.org/c/zuul/zuul-jobs/+/922912  09:29
<opendevreview> Jan Gutter proposed zuul/zuul-jobs master: Update sources for cri-o for newer Ubuntu  https://review.opendev.org/c/zuul/zuul-jobs/+/924750  10:35
<opendevreview> Jan Gutter proposed zuul/zuul-jobs master: Move minikube out of /tmp  https://review.opendev.org/c/zuul/zuul-jobs/+/924738  10:35
<opendevreview> Benjamin Schanzel proposed zuul/zuul-jobs master: Add build-diskimage role  https://review.opendev.org/c/zuul/zuul-jobs/+/922911  10:36
<opendevreview> Benjamin Schanzel proposed zuul/zuul-jobs master: Add a role to convert diskimages between formats  https://review.opendev.org/c/zuul/zuul-jobs/+/922912  10:36
<opendevreview> Jan Gutter proposed zuul/zuul-jobs master: Update sources for cri-o for newer Ubuntu  https://review.opendev.org/c/zuul/zuul-jobs/+/924750  10:56
<opendevreview> Jan Gutter proposed zuul/zuul-jobs master: Move minikube out of /tmp  https://review.opendev.org/c/zuul/zuul-jobs/+/924738  10:56
<fungi> frickler: yes, seems it's cleared up with the regression fix in zuul-jobs  11:51
<opendevreview> Jake Yip proposed openstack/project-config master: Sync magnum-capi-helm-charts repo to GitHub mirror  https://review.opendev.org/c/openstack/project-config/+/924846  11:53
<opendevreview> Merged openstack/project-config master: Sync magnum-capi-helm-charts repo to GitHub mirror  https://review.opendev.org/c/openstack/project-config/+/924846  12:10
<opendevreview> Merged zuul/zuul-jobs master: Fix wheel_mirror for Debian  https://review.opendev.org/c/zuul/zuul-jobs/+/924815  12:16
<frickler> this is weird, is this a bug in zuul or a misconfiguration on our side or just the way it is? https://review.opendev.org/c/opendev/base-jobs/+/922653 one tenant is V-1, two others V+1 and the slowest one wins. not the reproducible gating that I would have expected. I wonder whether it might also run multiple gate pipelines if it would get approved? once rejected, twice merged? ;)  12:25
<fungi> i think we shouldn't have the volvocars tenant running jobs for opendev/base-jobs changes  12:35
<fungi> that is probably a configuration mistake copied from the opendev tenant (where it should be voting/gating)  12:36
<frickler> yes, but even then there's opendev tenant + openstack tenant with conflicting votes  13:22
<fungi> i think when we looked into it before there was no way to prevent zuul from adding a verified -1 on config errors from other tenants, but that's less concerning to me  13:27
<clarkb> right, its -1 won't be a -2 so it's "fine" as far as merging goes  13:59
<clarkb> did anyone look at the tenant configs to determine if the new volvocars config is wrong?  13:59
<fungi> i have not yet  14:14
<clarkb> I think I see the issue. I can push a patch once I have ssh keys loaded. But then I need to go find breakfast and do morning things before looking at gitea in a bit  14:15
<opendevreview> Clark Boylan proposed openstack/project-config master: Limit what volvocars loads from opendev/base-jobs  https://review.opendev.org/c/openstack/project-config/+/924858  14:20
<frickler> the -1 does block checks running on patches with a depends-on, though, see https://review.opendev.org/c/openstack/project-config/+/924790/1  15:17
<opendevreview> Jan Gutter proposed zuul/zuul-jobs master: Move minikube out of /tmp  https://review.opendev.org/c/zuul/zuul-jobs/+/924738  15:38
<clarkb> infra-root I'm going to remove gitea14.o.o from the load balancer now, then proceed with the process at https://etherpad.opendev.org/p/hcz4yMxUIsAWGgyoHKeZ  16:29
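A hedged sketch of how a backend can be pulled out of and returned to an haproxy rotation via the admin socket; the backend name and socket path here are guesses, not necessarily OpenDev's actual configuration:

    # stop sending new connections to the backend server
    echo "disable server balance_git_https/gitea14.opendev.org" \
      | sudo socat stdio /var/haproxy/run/stats
    # ...do the maintenance, then put it back in rotation
    echo "enable server balance_git_https/gitea14.opendev.org" \
      | sudo socat stdio /var/haproxy/run/stats
    # confirm the current state of the servers in that backend
    echo "show servers state balance_git_https" | sudo socat stdio /var/haproxy/run/stats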
<corvus> frickler: the openstack tenant is the one that is reporting the config error on the dependent change: https://zuul.opendev.org/t/openstack/buildset/c5e7c0c4083d4fcdb1b71a43b539ca83  16:34
<clarkb> the db backup is complete and the filesize looks about what I expected it to be  16:35
<clarkb> I'm proceeding with running the doctor command now  16:35
<corvus> so it makes sense that the change that depends on it reports that it depends on a change with a config error  16:35
<corvus> clarkb: roger, docker doctor.  16:35
<clarkb> "Converted successfully, please confirm your database's character set is now utf8mb4" that took about 2.5 minutes or so  16:38
<clarkb> there were also slow query warnings I didn't get from the test node  16:38
<clarkb> but I think that's fine, it's just letting us know it took some time to do some of these conversions  16:38
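A hedged sketch of the conversion step and a follow-up check; the container name and database details are illustrative rather than OpenDev's exact deployment:

    # run gitea's built-in converter (part of the doctor tool in the 1.22 series)
    docker exec -it gitea gitea doctor convert
    # then confirm the tables really moved to utf8mb4 / a case sensitive collation
    mysql -u gitea -p -e \
      "SELECT table_name, table_collation FROM information_schema.tables WHERE table_schema = 'gitea';"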
<clarkb> checks look good. I'm going to turn gitea services back on, then do full replication against it, and if we're happy with the results we can add it back to the lb  16:41
<clarkb> https://gitea14.opendev.org:3081/opendev/system-config/ seems to be working  16:42
<fungi> clarkb: sounds good, i guess i stepped away at the wrong moment, sorry!  16:43
<clarkb> no problem. Seems to have gone well. If you are back maybe check there isn't anything amiss on the service usage side  16:44
<fungi> and yeah, testing so far things look good with gitea14  16:44
<clarkb> do we remember which repo had the 500 errors before?  16:44
<fungi> the corrupt/truncated objects and refs?  16:44
<fungi> that was on gitea12  16:45
<fungi> or something different?  16:45
<clarkb> no, the git repo that had refs with colliding names with case insensitive lookups  16:45
<clarkb> frickler reported it iirc. It will be in irc logs somewhere and maybe in my notes on the gitea issue I filed  16:45
<clarkb> we discussed it on february 28  16:47
<fungi> oh, that long ago  16:47
<clarkb> https://opendev.org/x/fuel-plugin-onos  16:48
<clarkb> it still 500s  16:48
<clarkb> oh, because I looked at gitea09 heh  16:48
* clarkb tries with the correct backend  16:48
<clarkb> https://gitea14.opendev.org:3081/x/fuel-plugin-onos works now \o/  16:48
<fungi> oh yay!  16:49
<clarkb> yes, basically this came up a while ago, then when I went to figure it out there was already work in progress to add this doctor tool to fix it in the 1.22 release  16:49
<clarkb> but then 1.22 was slow to go out and then slow to fix early release bugs, so it took a while to get to the point where I was comfortable running it. But here we are now and this looks much happier  16:50
<clarkb> git clone also works for me. I will add this node back to the load balancer as soon as replication completes  16:51
<clarkb> I'll do them in reverse order so gitea13 will be next and so on  16:53
<clarkb> replication is about half done  16:53
<clarkb> separately, there is a git push task in gerrit's show-queue output from about a month ago. I wonder if we should try and kill that task  16:59
<fungi> c1278428              Jun-26 09:43      git-upload-pack /openstack/nova (wangjing)  17:00
<fungi> that one?  17:00
<clarkb> ya  17:00
<fungi> i can try, if you like  17:00
<clarkb> maybe after gitea is done? I don't know how safe killing tasks is generally for gerrit  17:01
<fungi> checking for now to see if it's associated with a hung connection  17:01
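A hedged sketch of the read-only Gerrit admin commands used for this kind of investigation; the admin account name is illustrative:

    # list queued/running tasks (where the c1278428 entry above came from)
    ssh -p 29418 admin@review.opendev.org gerrit show-queue --wide
    # list open SSH sessions and the accounts that own them
    ssh -p 29418 admin@review.opendev.org gerrit show-connections --wide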
<clarkb> ok replication is done. I'm going to add gitea14 to the lb and remove gitea13  17:02
<clarkb> any concerns with proceeding with these tasks on the next node?  17:02
<fungi> no concerns  17:03
<fungi> user wangjing has 3 open ssh connections currently  17:07
<clarkb> 13 is getting db updates now  17:07
<fungi> ss on the server indicates 3 established 29418/tcp connections for the same ip address (and that ip address is only in use by gerrit connections for that user id)  17:11
<clarkb> bringing 13 back online now  17:11
<clarkb> quick checks show it working. I'll start replication now  17:13
<clarkb> fungi: my main concern would be that killing a git push task might make the git repo have a sad  17:14
<clarkb> either due to incomplete object writes or more likely an incomplete change creation  17:14
<fungi> yeah  17:16
<fungi> but so might a service restart that clears that task  17:17
<clarkb> though actually wait, is git-upload-pack the thing that is run when you fetch?  17:17
<clarkb> I always forget because I think git's terminology feels backward  17:17
<clarkb> but if so then maybe they just have a long running clone going and it is much safer to kill  17:18
<fungi> oh, yeah i bet this is a third-party ci account  17:20
<fungi> so it's probably listening to the event stream and fetching refs from gerrit  17:20
<fungi> "Invoked by git fetch-pack, learns what objects the other side is missing, and sends them after packing.  17:22
<fungi> "  17:22
<fungi> bingo  17:22
<fungi> so shouldn't be a write operation anyway  17:22
<clarkb> I guess we can try killing it after giteas are all done. Or reach out to the user and ask if they are having problems?  17:23
<clarkb> waiting for replication is the slowest part of this process  17:24
<fungi> given we've seen plenty of half-timed-out but not removed ssh connections from third-party ci accounts, and this one has been hung for a month and they don't have an unusually high number of established connections, i'm inclined to just kill the hung task without trying to track them down  17:32
<fungi> doesn't look like it's a frequently recurrent issue for them  17:33
<clarkb> 13 is back in the lb and 12 is out. proceeding on that node now  17:34
<clarkb> 12 is up and replication has been started  17:42
<clarkb> 12 is back in the rotation and 11 is on its way out to the conversion pasture  18:01
<clarkb> gitea11 is back up and being replicated to now  18:09
* frickler notes that nova isn't done with their CVE stuff yet, but they're also taking it real slow with reviewing and pushing things, so likely we don't have to care much anymore, either  18:18
<frickler> clarkb: did you see my comments on https://review.opendev.org/c/openstack/project-config/+/924858 ? though I must admit I don't quite understand what's happening, maybe corvus can take a more expert look  18:19
<clarkb> not yet, I've been focused on the gitea stuff mostly  18:20
<clarkb> frickler: the code and comment are copied directly from the other tenant configs. I guess I can update all of them  18:20
<clarkb> and I'm not trying to fix the openstack tenant in that change  18:21
<clarkb> fwiw the openstack tenant is failing for different reasons. It isn't trying to run the jobs I don't think. Instead it is noticing the config isn't valid for the openstack tenant because openstack is still using those resources  18:22
<clarkb> I don't think we need to fix all of openstack before merging these cleanups  18:22
<frickler> yes, that will be a tedious task if we can only see the issues one-by-one  18:23
<frickler> one nova patch in gate now, but all on its own in the integrated queue, so not much to do there  18:25
<clarkb> putting gitea11 back into service and moving on to gitea10  18:25
<frickler> but that makes me wonder why neutron-lib does have its own queue. might be a bad cleanup after the zuul change  18:26
<clarkb> the most likely reason is it simply isn't in the same queue  18:28
<clarkb> gitea10 is replicating now  18:33
<clarkb> corvus: maybe you can weigh in on 924858 on whether or not excluding project is preferred to explicitly including nodeset, secret and job  18:35
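For context, the two shapes being weighed here look roughly like this for a project entry in a Zuul tenant config; this is a hedged sketch, not the actual entries in opendev's main.yaml:

    # Option A: only load specific config object types from the repo
    - opendev/base-jobs:
        include:
          - job
          - secret
          - nodeset

    # Option B: load everything except project stanzas, so the tenant never
    # tries to attach that repo's per-project pipeline configuration
    - opendev/base-jobs:
        exclude:
          - project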
<frickler> hmm, looks like neutron-lib never was in the integrated queue, I'll check with the neutron team whether that's intentional or some oversight  18:43
<clarkb> and now I'm putting gitea10 back into service and pulling 09 out. Almost done  18:50
<clarkb> gitea09 is back up now  18:57
<clarkb> things look ok at first check. I'm starting replication  18:58
<clarkb> infra-root I believe all six gitea backends have had their DBs converted to utf8mb4 text types and utf8mb4 case sensitive collation methods. Gitea09 is not back in the load balancer yet because I am waiting for replication to complete to it. Now is probably a good time to do any checks you'd like to do before we call it done  18:59
<clarkb> in particular maybe you want to check the db state on each of the gitea backends to make sure I didn't miss one or do something silly like that  19:00
<clarkb> then also https://giteaXY.opendev.org:3081/x/fuel-plugin-onos should render for every backend at this point  19:00
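A quick way to spot check that URL on each backend, as a hedged sketch; the backend numbers and the 3081 port are taken from the discussion above:

    for n in 09 10 11 12 13 14; do
      code=$(curl -s -o /dev/null -w '%{http_code}' \
        "https://gitea${n}.opendev.org:3081/x/fuel-plugin-onos")
      echo "gitea${n}: ${code}"
    done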
<fungi> i've been browsing around, cloning things, et cetera and it's looking good so far  19:01
<clarkb> cool. I've also been trying to check as I go and I don't think I've missed any backends, and they all seem to be operating as expected. Just good to get extra eyeballs and some more diverse actions against them  19:02
<clarkb> once I've got 09 back into the lb I'm going to eat lunch  19:03
<clarkb> replication appears done. I'm putting gitea09 back into the lb rotation  19:13
<clarkb> #status log Converted database contents to utf8mb4 and case sensitive collations on each gitea backend using the gitea doctor convert tool included in v1.22  19:14
<opendevstatus> clarkb: finished logging  19:14
<fungi> thanks clarkb!!!  19:18
<corvus> clarkb: done  19:21
<opendevreview> Merged zuul/zuul-jobs master: Make prepare-workspace-git behavior more consistent  https://review.opendev.org/c/zuul/zuul-jobs/+/924804  20:04
<clarkb> corvus: will the currently queued infra-prod-service-registry job pick that up or do we have to wait for the 21:00 UTC hourly runs?  20:07
<clarkb> I think the git state is decided when things enqueue not when the jobs start?  20:07
<corvus> yeah we should have to wait  20:09
<clarkb> ok /me attempts patience  20:12
<clarkb> I think the reason that the vexxhost tenant is different is they are including things like pipelines too?  20:43
<clarkb> anyway I'll fix the comment on the entry for everything but vexxhost. I don't think we need to alphasort those entries  20:43
<opendevreview> Clark Boylan proposed openstack/project-config master: Limit what volvocars loads from opendev/base-jobs  https://review.opendev.org/c/openstack/project-config/+/924858  20:45
<clarkb> the hourly periodic jobs have just enqueued. They should use the new version of prepare-workspace-git as supplied in 924804  21:01
<clarkb> corvus: I see the remote is valid on disk (not null) which implies that the tasks to fetch from "upstream" are working in periodic pipelines despite the initial reset to null remotes  21:03
<clarkb> so I think this is working as expected  21:03
<clarkb> the nodepool job that is running now is a good check though  21:03
<clarkb> I just discovered that my gitea issue about the 500 errors due to case insensitivity was locked and marked resolved automatically 10 days after my last comment :/ I can't even leave a note there indicating this fixed it  21:12
<corvus> clarkb: i think that means they want you to open a new issue.  ;)  21:14
<corvus> clarkb: looks like the nodepool job finished successfully; anything else we should check?  21:15
<clarkb> corvus: no I think I'm happy with those results.  21:16
<corvus> ++  21:17
<corvus> log looks as i would expect  21:17
<opendevreview> Merged openstack/project-config master: Limit what volvocars loads from opendev/base-jobs  https://review.opendev.org/c/openstack/project-config/+/924858  21:55
<fungi> #status log Closed persistent SSH API connections from Gerrit account 34377 in order to end a Git fetch task which was hung for the past month  22:34
<opendevstatus> fungi: finished logging  22:34
<clarkb> fungi: did you just use gerrit's task kill command for that?  22:35
<fungi> i didn't find any such command, so i used close-connection on the three session ids  22:35
<clarkb> fungi: https://gerrit-review.googlesource.com/Documentation/cmd-kill.html for the future  22:36
<clarkb> they are probably pretty similar in implementation though  22:36
<fungi> oh! i guess kill is a plugin  22:37
<fungi> not a gerrit subcommand  22:37
<fungi> which is why gerrit --help wasn't bringing it up  22:37
<clarkb> ya, I'm not sure why it doesn't have the prefix  22:37
<fungi> well, anyway, closing the connections also terminated the task on its own  22:38
<clarkb> ya, I suspect that the underlying implementation for killing an ssh connection is basically the same  22:38
<fungi> those same session ids had been open for hours at least (but more likely months)  22:38
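A hedged sketch of the two approaches discussed above; the admin account name is illustrative, and the session id placeholder would come from gerrit show-connections output:

    # what fungi did: close the hung SSH sessions, which also ends their tasks
    ssh -p 29418 admin@review.opendev.org gerrit close-connection <SESSION_ID>
    # the alternative from cmd-kill.html: kill the task directly (note the
    # command has no "gerrit" prefix)
    ssh -p 29418 admin@review.opendev.org kill c1278428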
