corvus | didn't the job that failed run in periodic? | 00:00 |
---|---|---|
clarkb | corvus: one was periodic and one was deploy | 00:01 |
clarkb | I think your assumption that the job would set the remote if it needed it is correct though | 00:01 |
corvus | https://zuul.opendev.org/t/openstack/build/a559a6653a274579b41f93efb13eac2a/console#1/1/12/bridge01.opendev.org | 00:01 |
clarkb | I'll get a link in a sec | 00:01 |
clarkb | https://opendev.org/opendev/base-jobs/src/branch/master/playbooks/infra-prod/setup-src.yaml#L27-L53 | 00:02 |
clarkb | so in the case of opendev prepare-workspace-git would remove the remote then we'd add it back again I think | 00:02 |
clarkb | though it's possible that ansible might get angry about changing the remote like that? I don't know | 00:03 |
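The remote juggling being discussed amounts to roughly the following. This is a minimal sketch; the repo path and URL are only illustrative of what prepare-workspace-git and the infra-prod setup-src playbook do.

```shell
# Roughly what prepare-workspace-git does to an existing repo on the target
# host: drop the origin remote so a stale URL from a previous run can't leak in.
cd /home/zuul/src/opendev.org/opendev/system-config   # illustrative path
git remote rm origin 2>/dev/null || true

# Roughly what the opendev setup-src playbook then does: add origin back,
# pointed at the canonical upstream, so later fetches and resets work.
git remote add origin https://opendev.org/opendev/system-config
git fetch origin
```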
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Fix prepare-workspace-git operating on existing repos https://review.opendev.org/c/zuul/zuul-jobs/+/924802 | 00:18 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Make prepare-workspace-git behavior more consistent https://review.opendev.org/c/zuul/zuul-jobs/+/924804 | 00:21 |
opendevreview | Clark Boylan proposed openstack/project-config master: Set xenial min ready to 0 https://review.opendev.org/c/openstack/project-config/+/924806 | 00:28 |
opendevreview | Merged zuul/zuul-jobs master: Fix prepare-workspace-git operating on existing repos https://review.opendev.org/c/zuul/zuul-jobs/+/924802 | 00:32 |
opendevreview | Merged openstack/project-config master: Set xenial min ready to 0 https://review.opendev.org/c/openstack/project-config/+/924806 | 00:55 |
fungi | that does seem to have fixed it | 01:19 |
clarkb | ya things are looking good for now. we should probably keep an eye on it, but I'm happy with how the hourly jobs are going and the job for 924806 | 01:19 |
clarkb | I crammed dinner down. Now to find something frozen and sweet | 01:20 |
opendevreview | yatin proposed zuul/zuul-jobs master: Fix wheel_mirror for Debian https://review.opendev.org/c/zuul/zuul-jobs/+/924815 | 05:01 |
opendevreview | Benjamin Schanzel proposed zuul/zuul-jobs master: Add ensure-dib role https://review.opendev.org/c/zuul/zuul-jobs/+/922910 | 07:26 |
opendevreview | Benjamin Schanzel proposed zuul/zuul-jobs master: wip: Add build-diskimage role https://review.opendev.org/c/zuul/zuul-jobs/+/922911 | 07:26 |
opendevreview | Benjamin Schanzel proposed zuul/zuul-jobs master: Add ensure-dib role https://review.opendev.org/c/zuul/zuul-jobs/+/922910 | 07:48 |
opendevreview | Benjamin Schanzel proposed zuul/zuul-jobs master: wip: Add build-diskimage role https://review.opendev.org/c/zuul/zuul-jobs/+/922911 | 07:48 |
opendevreview | Benjamin Schanzel proposed zuul/zuul-jobs master: Add ensure-dib role https://review.opendev.org/c/zuul/zuul-jobs/+/922910 | 07:50 |
opendevreview | Benjamin Schanzel proposed zuul/zuul-jobs master: wip: Add build-diskimage role https://review.opendev.org/c/zuul/zuul-jobs/+/922911 | 07:50 |
opendevreview | Benjamin Schanzel proposed zuul/zuul-jobs master: wip: Add build-diskimage role https://review.opendev.org/c/zuul/zuul-jobs/+/922911 | 08:08 |
frickler | so the issue that was blocking for [ps]*-config should be resolved, do I understand this correctly? at least the infra-prod hourly runs are looking fine again since 01:00 https://zuul.opendev.org/t/openstack/buildset/d0f51c4265364a49a5e7c1e08cd77ab2 | 08:17 |
opendevreview | Benjamin Schanzel proposed zuul/zuul-jobs master: wip: Add build-diskimage role https://review.opendev.org/c/zuul/zuul-jobs/+/922911 | 08:19 |
opendevreview | Simon Westphahl proposed zuul/zuul-jobs master: wip: Add example role for converting images https://review.opendev.org/c/zuul/zuul-jobs/+/922912 | 08:43 |
opendevreview | Benjamin Schanzel proposed zuul/zuul-jobs master: wip: Add example role for converting images https://review.opendev.org/c/zuul/zuul-jobs/+/922912 | 09:29 |
opendevreview | Jan Gutter proposed zuul/zuul-jobs master: Update sources for cri-o for newer Ubuntu https://review.opendev.org/c/zuul/zuul-jobs/+/924750 | 10:35 |
opendevreview | Jan Gutter proposed zuul/zuul-jobs master: Move minikube out of /tmp https://review.opendev.org/c/zuul/zuul-jobs/+/924738 | 10:35 |
opendevreview | Benjamin Schanzel proposed zuul/zuul-jobs master: Add build-diskimage role https://review.opendev.org/c/zuul/zuul-jobs/+/922911 | 10:36 |
opendevreview | Benjamin Schanzel proposed zuul/zuul-jobs master: Add a role to convert diskimages between formats https://review.opendev.org/c/zuul/zuul-jobs/+/922912 | 10:36 |
opendevreview | Jan Gutter proposed zuul/zuul-jobs master: Update sources for cri-o for newer Ubuntu https://review.opendev.org/c/zuul/zuul-jobs/+/924750 | 10:56 |
opendevreview | Jan Gutter proposed zuul/zuul-jobs master: Move minikube out of /tmp https://review.opendev.org/c/zuul/zuul-jobs/+/924738 | 10:56 |
fungi | frickler: yes, seems it's cleared up with the regression fix in zuul-jobs | 11:51 |
opendevreview | Jake Yip proposed openstack/project-config master: Sync magnum-capi-helm-charts repo to GitHub mirror https://review.opendev.org/c/openstack/project-config/+/924846 | 11:53 |
opendevreview | Merged openstack/project-config master: Sync magnum-capi-helm-charts repo to GitHub mirror https://review.opendev.org/c/openstack/project-config/+/924846 | 12:10 |
opendevreview | Merged zuul/zuul-jobs master: Fix wheel_mirror for Debian https://review.opendev.org/c/zuul/zuul-jobs/+/924815 | 12:16 |
frickler | this is weird, is this a bug in zuul or a misconfiguration on our side or just the way it is? https://review.opendev.org/c/opendev/base-jobs/+/922653 one tenant is V-1, two others V+1 and the slowest one wins. not the reproducible gating that I would have expected. I wonder whether it might also run multiple gate pipelines if it would get approved? once rejected, twice merged? ;) | 12:25 |
fungi | i think we shouldn't have the volvocars tenant running jobs for opendev/base-jobs changes | 12:35 |
fungi | that is probably a configuration mistake copied from the opendev tenant (where it should be voting/gating) | 12:36 |
frickler | yes, but even then there's opendev tenant + openstack tenant with conflicting votes | 13:22 |
fungi | i think when we looked into it before there was no way to prevent zuul from adding a verified -1 on config errors from other tenants, but that's less concerning to me | 13:27 |
clarkb | right, its -1 won't be a -2, so it's "fine" as far as merging goes | 13:59 |
clarkb | did anyone look at the tenant configs to determine if the new volvocars config is wrong? | 13:59 |
fungi | i have not yet | 14:14 |
clarkb | I think I see the issue. I can push a patch once I have ssh keys loaded. But then I need to go find breakfast and do morning things before looking at gitea in a bit | 14:15 |
opendevreview | Clark Boylan proposed openstack/project-config master: Limit what volvocars loads from opendev/base-jobs https://review.opendev.org/c/openstack/project-config/+/924858 | 14:20 |
frickler | the -1 does block checks running on patches with a depends-on, though, see https://review.opendev.org/c/openstack/project-config/+/924790/1 | 15:17 |
opendevreview | Jan Gutter proposed zuul/zuul-jobs master: Move minikube out of /tmp https://review.opendev.org/c/zuul/zuul-jobs/+/924738 | 15:38 |
clarkb | infra-root I'm going to remove gitea14.o.o from the load balancer now then proceed with the process at https://etherpad.opendev.org/p/hcz4yMxUIsAWGgyoHKeZ | 16:29 |
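Taking a backend out of rotation like this is typically done via haproxy's admin socket. A sketch under the assumption of a socat-accessible stats socket; the socket path and backend/server names are guesses, not the real OpenDev values:

```shell
# Inspect current backend state, then take gitea14 out of rotation.
echo "show servers state" | socat stdio /var/haproxy/run/stats
echo "disable server balance_git_https/gitea14.opendev.org" \
  | socat stdio /var/haproxy/run/stats

# After maintenance and replication are done, put it back.
echo "enable server balance_git_https/gitea14.opendev.org" \
  | socat stdio /var/haproxy/run/stats
```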
corvus | frickler: the openstack tenant is the one that is reporting the config error on the dependent change: https://zuul.opendev.org/t/openstack/buildset/c5e7c0c4083d4fcdb1b71a43b539ca83 | 16:34 |
clarkb | the db backup is complete and the filesize looks about what I expected it to be | 16:35 |
clarkb | I'm proceeding with running the doctor command now | 16:35 |
corvus | so it makes sense that the change that depends on it reports that it depends on a change with a config error | 16:35 |
corvus | clarkb: roger, docker doctor. | 16:35 |
clarkb | "Converted successfully, please confirm your database's character set is now utf8mb4" that took about 2.5 minutes or so | 16:38 |
clarkb | there were also slow query warnings I didn't get from the test node | 16:38 |
clarkb | but I think that's fine, it's just letting us know it took some time to do some of these conversions | 16:38 |
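The doctor run being described is roughly the following; a sketch assuming a docker-compose style gitea deployment, with container names and the config path as placeholders:

```shell
# Back up the gitea database first (the step whose filesize was checked above).
docker exec mariadb-container \
  mysqldump -u root -p"$DB_ROOT_PW" gitea > /root/gitea-backup.sql

# Run the converter shipped with gitea 1.22; it rewrites MySQL/MariaDB tables
# to utf8mb4 and prints the "Converted successfully..." message quoted above.
docker exec gitea-web-container \
  gitea doctor convert --config /custom/conf/app.ini
```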
clarkb | checks look good. I'm going to turn gitea services back on, then do full replication against it and if we're happy with the results we can add it back to the lb | 16:41 |
clarkb | https://gitea14.opendev.org:3081/opendev/system-config/ seems to be working | 16:42 |
fungi | clarkb: sounds good, i guess i stepped away at the wrong moment, sorry! | 16:43 |
clarkb | no problem. Seems to have gone well. If you are back maybe check there isn't anything amiss on the service usage side | 16:44 |
fungi | and yeah, testing so far things look good with gitea14 | 16:44 |
clarkb | do we remember which repo had the 500 errors before? | 16:44 |
fungi | the corrupt/truncated objects and refs? | 16:44 |
fungi | that was on gitea12 | 16:45 |
fungi | or something different? | 16:45 |
clarkb | no the git repo that had refs with colliding names with case insensitive lookups | 16:45 |
clarkb | frickler reported it iirc. It will be in irc logs somewhere and maybe in my notes on the gitea issue I filed | 16:45 |
clarkb | we discussed it on february 28 | 16:47 |
fungi | oh, that long ago | 16:47 |
clarkb | https://opendev.org/x/fuel-plugin-onos | 16:48 |
clarkb | it still 500s | 16:48 |
clarkb | oh because I looked at gitea09 heh | 16:48 |
* clarkb tries with the correct backend | 16:48 | |
clarkb | https://gitea14.opendev.org:3081/x/fuel-plugin-onos works now \o/ | 16:48 |
fungi | oh yay! | 16:49 |
clarkb | yes, basically this came up a while ago, and when I went to figure it out there was already work in progress to add this doctor tool to fix it in the 1.22 release | 16:49 |
clarkb | but then the 1.22 release was slow to go out and then slow to fix early release bugs, so it took a while to get to the point where I was comfortable running it. But here we are now and this looks much happier | 16:50 |
clarkb | git clone also works for me. I will add this node back to the load balancer as soon as replication completes | 16:51 |
clarkb | I'll do them in reverse order so gitea13 will be next and so on | 16:53 |
clarkb | replication is about half done | 16:53 |
clarkb | separately there is a git push task in gerrit's show-queue output from about a month ago. I wonder if we should try and kill that task | 16:59 |
fungi | c1278428 Jun-26 09:43 git-upload-pack /openstack/nova (wangjing) | 17:00 |
fungi | that one? | 17:00 |
clarkb | ya | 17:00 |
fungi | i can try, if you like | 17:00 |
clarkb | maybe after gitea is done? I don't know how safe killing tasks is generally for gerrit | 17:01 |
fungi | checking for now to see if it's associated with a hung connection | 17:01 |
clarkb | ok replication is done. I'm going to add gitea14 to the lb and remove gitea13 | 17:02 |
clarkb | any concerns with proceeding with these tasks on the next node? | 17:02 |
fungi | no concerns | 17:03 |
fungi | user wangjing has 3 open ssh connections currently | 17:07 |
clarkb | 13 is getting db updates now | 17:07 |
fungi | ss on the server indicates 3 established 29418/tcp connections for the same ip address (and that ip address is only in use by gerrit connections for that user id) | 17:11 |
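The kind of cross-check being described looks roughly like this; the host name and account in the second command are illustrative:

```shell
# Established TCP connections to Gerrit's SSH port, one line per peer.
ss -tn state established '( sport = :29418 )'

# Gerrit's own view of its SSH sessions, to match session ids to usernames
# and client IPs.
ssh -p 29418 admin@review.opendev.org gerrit show-connections --wide
```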
clarkb | bringing 13 back online now | 17:11 |
clarkb | quick checks show it working. I'll start replication now | 17:13 |
clarkb | fungi: my main concern would be that killing a git push task might make the git repo have a sad | 17:14 |
clarkb | either due to incomplete object writes or more likely an incomplete change creation | 17:14 |
fungi | yeah | 17:16 |
fungi | but so might a service restart that clears that task | 17:17 |
clarkb | though actually wait is git-upload-pack the thing that is run when you fetch? | 17:17 |
clarkb | I always forget because I think git's terminology feels backward | 17:17 |
clarkb | but if so then maybe they just have a long running clone going and it is much safer to kill | 17:18 |
fungi | oh, yeah i bet this is a third-party ci account | 17:20 |
fungi | so it's probably listening to the event stream and fetching refs from gerrit | 17:20 |
fungi | "Invoked by git fetch-pack, learns what objects the other side is missing, and sends them after packing. | 17:22 |
fungi | " | 17:22 |
fungi | bingo | 17:22 |
fungi | so shouldn't be a write operation anyway | 17:22 |
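Right: the server-side helper for a clone or fetch is git-upload-pack, while a push runs git-receive-pack, so a hung upload-pack task is read-only. One way to see which helper a given operation uses (the URL and username are illustrative):

```shell
# GIT_TRACE shows the ssh command git runs; for ls-remote/fetch/clone it is
# git-upload-pack, the read path.
GIT_TRACE=1 git ls-remote \
  ssh://someuser@review.opendev.org:29418/openstack/nova HEAD

# A push (even a dry run) goes through git-receive-pack, the write path.
GIT_TRACE=1 git push --dry-run \
  ssh://someuser@review.opendev.org:29418/openstack/nova HEAD:refs/for/master
```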
clarkb | I guess we can try killing it after giteas are all done. Or reach out to the user and ask if they are having problems? | 17:23 |
clarkb | waiting for replication is the slowest part of this process | 17:24 |
fungi | given we've seen plenty of half-timed-out but not removed ssh connections from third-party ci accounts, and this one has been hung for a month and they don't have an unusually high number of established connections, i'm inclined to just kill the hung task without trying to track them down | 17:32 |
fungi | doesn't look like it's a frequently recurrent issue for them | 17:33 |
clarkb | 13 is back in the lb and 12 is out. proceeding on that node now | 17:34 |
clarkb | 12 is up and replication has been started | 17:42 |
clarkb | 12 is back in the rotation and 11 is on its way out to the conversion pasture | 18:01 |
clarkb | gitea11 is back up and being replicated to now | 18:09 |
* frickler notes that nova isn't done with their CVE stuff yet, but they're also taking it real slow with reviewing and pushing things, so likely we don't have to care much anymore, either | 18:18 | |
frickler | clarkb: did you see my comments on https://review.opendev.org/c/openstack/project-config/+/924858 ? though I must admit I don't quite understand what's happening, maybe corvus can take a more expert look | 18:19 |
clarkb | not yet I've been focused on the gitea stuff mostly | 18:20 |
clarkb | frickler: the code and comment are copied directly from the other tenant configs. I guess I can update all of them | 18:20 |
clarkb | and I'm not trying to fix the openstack tenant in that change | 18:21 |
clarkb | fwiw the openstack tenant is failing for different reasons. It isn't trying to run the jobs I don't think. Instead it is noticing the config isn't valid for the openstack tenant because openstack is still using those resources | 18:22 |
clarkb | I don't think we need to fix all of openstack before merging these cleanups | 18:22 |
frickler | yes, that will be a tedious task if we can only see the issues one-by-one | 18:23 |
frickler | one nova patch in gate now, but all on its own in the integrated queue, so not much to do there | 18:25 |
clarkb | putting gitea11 back into service and moving onto gitea10 | 18:25 |
frickler | but that makes me wonder why neutron-lib does have its own queue. might be a bad cleanup after the zuul change | 18:26 |
clarkb | the most likely reason is it simply isn't in the same queue | 18:28 |
clarkb | gitea10 is replicating now | 18:33 |
clarkb | corvus: maybe you can weigh in on 924858 on whether or not excluding project is preferred to explicitly including nodeset, secret and job | 18:35 |
frickler | hmm, looks like neutron-lib never was in the integrated queue, I'll check with the neutron team whether that's intentional or some oversight | 18:43 |
clarkb | and now I'm putting gitea10 back into service and pulling 09 out. Almost done | 18:50 |
clarkb | gitea09 is back up now | 18:57 |
clarkb | things look ok at first check I'm starting replication | 18:58 |
clarkb | infra-root I believe all six gitea backends have had their DBs converted to utf8mb4 text types and utf8mb4 case sensitive collation methods. Gitea09 is not back in the load balancer yet because I am waiting for replication to complete to it. Now is probably a good time to do any checks you'd like to do before we call it done | 18:59 |
clarkb | in particular maybe you want to check the db state on each of the gitea backends to make sure I didn't miss one or do something silly like that | 19:00 |
clarkb | then also https://giteaXY.opendev.org:3081/x/fuel-plugin-onos should render for every backend at this point | 19:00 |
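A quick way to do that per-backend database check; a sketch where the container name and credentials are assumptions:

```shell
# Confirm the database default charset/collation and list any table that was
# not converted; an empty second result set means nothing was missed.
docker exec mariadb-container mysql -u root -p"$DB_ROOT_PW" gitea -e "
  SELECT @@character_set_database, @@collation_database;
  SELECT table_name, table_collation
    FROM information_schema.tables
   WHERE table_schema = 'gitea'
     AND table_collation NOT LIKE 'utf8mb4%';"
```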
fungi | i've been browsing around, cloning things, et cetera and it's looking good so far | 19:01 |
clarkb | cool. I've also been trying to check as I go and I don't think I've missed any backends and they all seem to be operating as expected. Just good to get extra eyeballs and some more diverse actions against them | 19:02 |
clarkb | once I've got 09 back into the lb I'm going to eat lunch | 19:03 |
clarkb | replication appears done. I'm putting gitea09 back into the lb rotation | 19:13 |
clarkb | #status log Converted database contents to utf8mb4 and case sensitive collations on each gitea backend using the gitea doctor convert tool included in v1.22 | 19:14 |
opendevstatus | clarkb: finished logging | 19:14 |
fungi | thanks clarkb!!! | 19:18 |
corvus | clarkb: done | 19:21 |
opendevreview | Merged zuul/zuul-jobs master: Make prepare-workspace-git behavior more consistent https://review.opendev.org/c/zuul/zuul-jobs/+/924804 | 20:04 |
clarkb | corvus: will the currently queued infra-prod-service-registry job pick that up or do we have to wait for the 21:00 UTC hourly runs? | 20:07 |
clarkb | I think the git state is decided when things enqueue not when the jobs start? | 20:07 |
corvus | yeah we will have to wait | 20:09 |
clarkb | ok /me attempts patience | 20:12 |
clarkb | I think the reason that the vexxhost tenant is different is they are including things like pipelines too? | 20:43 |
clarkb | anyway I'll fix the comment on the entry for everything but vexxhost. I don't think we need to alphasort those entries | 20:43 |
opendevreview | Clark Boylan proposed openstack/project-config master: Limit what volvocars loads from opendev/base-jobs https://review.opendev.org/c/openstack/project-config/+/924858 | 20:45 |
clarkb | the hourly periodic jobs have just enqueued. They should use the new version of prepare-workspace-git as supplied in 924804 | 21:01 |
clarkb | corvus: I see the remote is valid on disk (not null) which implies that the tasks to fetch from "upstream" are working in periodic pipelines despite the initial reset to null remotes | 21:03 |
clarkb | so I think this is working as expected | 21:03 |
clarkb | the nodepool job that is running now is a good check though | 21:03 |
clarkb | I just discovered that my gitea issue about the 500 errors due to case insensitivity was locked and marked resolved automatically 10 days after my last comment :/ I can't even leave a note there indicating this fixed it | 21:12 |
corvus | clarkb: i think that means they want you to open a new issue. ;) | 21:14 |
corvus | clarkb: looks like the nodepool job finished successfully; anything else we should check? | 21:15 |
clarkb | corvus: no I think I'm happy with those results. | 21:16 |
corvus | ++ | 21:17 |
corvus | log looks as i would expect | 21:17 |
opendevreview | Merged openstack/project-config master: Limit what volvocars loads from opendev/base-jobs https://review.opendev.org/c/openstack/project-config/+/924858 | 21:55 |
fungi | #status log Closed persistent SSH API connections from Gerrit account 34377 in order to end a Git fetch task which was hung for the past month | 22:34 |
opendevstatus | fungi: finished logging | 22:34 |
clarkb | fungi: did you just use gerrit's task kill command for that? | 22:35 |
fungi | i didn't find any such command, so i used close-connection on the three session ids | 22:35 |
clarkb | fungi: https://gerrit-review.googlesource.com/Documentation/cmd-kill.html for the future | 22:36 |
clarkb | they are probably pretty similar in implementation though | 22:36 |
fungi | oh! i guess kill is a plugin | 22:37 |
fungi | not a gerrit subcommand | 22:37 |
fungi | which is why gerrit --help wasn't bringing it up | 22:37 |
clarkb | ya I'm not sure why it doesn't have the prefix | 22:37 |
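For reference, the two approaches look roughly like this; the host, account, and session id are illustrative, and the task id is the one from the show-queue output above:

```shell
# List queued tasks with their ids.
ssh -p 29418 admin@review.opendev.org gerrit show-queue --wide

# The core kill command takes a task id and, as noted, no "gerrit" prefix.
ssh -p 29418 admin@review.opendev.org kill c1278428

# The alternative used here: find the account's SSH sessions and close them.
ssh -p 29418 admin@review.opendev.org gerrit show-connections --wide
ssh -p 29418 admin@review.opendev.org gerrit close-connection <session-id>
```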
fungi | well, anyway, closing the connections also terminated the task on its own | 22:38 |
clarkb | ya I suspect that the underlying implementation for killing an ssh connection is basically the same | 22:38 |
fungi | those same session ids had been open for hours at least (but more likely months) | 22:38 |