| -@gerrit:opendev.org- Julia Kreger proposed: [openstack/diskimage-builder] 990417: Add rdma-roce element for RoCE tenant OS images https://review.opendev.org/c/openstack/diskimage-builder/+/990417 | 03:46 | |
| -@gerrit:opendev.org- Sylvain Bauza proposed: [opendev/system-config] 990439: Add statusbot in #openstack-agentic-workflows https://review.opendev.org/c/opendev/system-config/+/990439 | 09:27 | |
| -@gerrit:opendev.org- Zuul merged on behalf of Sei Sano: [opendev/irc-meetings] 988910: Update Masakari IRC meeting ... https://review.opendev.org/c/opendev/irc-meetings/+/988910 | 10:26 | |
| -@gerrit:opendev.org- Zuul merged on behalf of Sylvain Bauza: [opendev/system-config] 990439: Add statusbot in #openstack-agentic-workflows https://review.opendev.org/c/opendev/system-config/+/990439 | 13:49 | |
| @clarkb:matrix.org | I don't see any obvious backup complaints in infra-root mail for the last day | 14:49 |
|---|---|---|
| @clarkb:matrix.org | that is a good sign. But we should still check the servers and their backups before calling it good | 14:49 |
| @mnasiadka:matrix.org | Clark: If I can do anything in my timezone tomorrow to check that - let me know | 15:25 |
| @clarkb:matrix.org | mnasiadka: ack, I think the first step is just to check whether anything got written to the backup server (df/du/ls etc.) | 15:26 |
| @mnasiadka:matrix.org | btw, I think all the Prometheus stack patches should be good for a re-review | 15:26 |
| @clarkb:matrix.org | mnasiadka: https://docs.opendev.org/opendev/system-config/latest/sysadmin.html#restore-from-backup would be the next step (it describes how you can restore from backup, which we don't need to do; we just need to go far enough to see that we could restore from a backup) | 15:27 |
| @mnasiadka:matrix.org | Makes sense, I'll have a look in depth tomorrow :) | 15:27 |
| @clarkb:matrix.org | the borg mount fuse thing is really convenient (just don't forget to unmount when done) | 15:27 |
| @clarkb:matrix.org | but it allows you to browse around the backup as if it were a normal filesystem mount, which I personally prefer over tarball listings | 15:28 |
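A minimal sketch of the mount, browse, unmount cycle, assuming the standard `borg mount REPO[::ARCHIVE] MOUNTPOINT` and `borg umount MOUNTPOINT` commands. The repository path and mountpoint are hypothetical placeholders, and the command runner is injectable only so the flow can be exercised without borg installed. Wrapping the mount in a context manager means the unmount happens even if browsing fails.

```python
import contextlib
import subprocess
from typing import Callable, Iterator, Sequence

# A runner executes one command; injectable so the flow can be tested without borg.
Runner = Callable[[Sequence[str]], None]


def run_checked(cmd: Sequence[str]) -> None:
    """Run a command and raise CalledProcessError if it fails."""
    subprocess.run(list(cmd), check=True)


@contextlib.contextmanager
def borg_mounted(repo: str, mountpoint: str, run: Runner = run_checked) -> Iterator[str]:
    """FUSE-mount a borg repository (or repo::archive) and always unmount it."""
    run(["borg", "mount", repo, mountpoint])
    try:
        yield mountpoint
    finally:
        # Runs even if browsing raises, so the mount is never left behind.
        run(["borg", "umount", mountpoint])
```

Usage would be `with borg_mounted("/path/to/borg/repo", "/mnt/borg-browse") as root:` followed by ordinary filesystem browsing such as `os.listdir(root)`.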
| @clarkb:matrix.org | I've put rereviewing the prometheus change stack on my todo list for today | 15:28 |
| @clarkb:matrix.org | 8% of backup03's backup volume is now in use so that is another good sign. I also see a number of backup dirs in /opt/backups. That set is not the same set as backup02 because we have retired old backups in place on 02 and the top level dirs stick around. So we may need to do a bit of an accounting comparison too cc mnasiadka | 15:32 |
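The two checks described here (volume usage, then comparing the top-level dirs under /opt/backups on each server) could be sketched as below. `shutil.disk_usage` gives a rough equivalent of the df percentage; df computes Use% slightly differently around reserved blocks, so the numbers may not match exactly. Gathering the listing from each host is left out, and the function names are illustrative only.

```python
import os
import shutil
from typing import Set, Tuple


def percent_used(path: str) -> float:
    """Share of the filesystem holding `path` that is in use (rough df equivalent)."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def backup_dirs(root: str) -> Set[str]:
    """Top-level directory names under a backup root, like `ls /opt/backups`."""
    return {entry.name for entry in os.scandir(root) if entry.is_dir()}


def compare_backup_sets(old: Set[str], new: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (dirs only on the old server, dirs only on the new server)."""
    return old - new, new - old
```

Directories found only on backup02 are expected to include retired backups whose top-level dirs stuck around, so a non-empty first set is not by itself a problem; the second set is the one worth a closer look.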
| @clarkb:matrix.org | but I'm still not seeing anything concerning so this is encouraging | 15:32 |
| -@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org proposed: [zuul/zuul-jobs] 990392: Add a nox-py314 job https://review.opendev.org/c/zuul/zuul-jobs/+/990392 | 15:42 | |
| @clarkb:matrix.org | mnasiadka: ok, I left some thoughts, but I think the greptimedb change looks good (just some comments around default var values, which are not super critical, and some thinking ahead to provisioning a swift container to be used via the s3 apis) | 15:47 |
| @mnasiadka:matrix.org | Sure, that makes sense - I haven't really analysed how secrets are managed in system-config | 15:48 |
| @clarkb:matrix.org | mnasiadka: we try to keep as much stuff public as possible directly in the system-config repo under inventory/ in the various host vars and group vars files | 15:49 |
| @clarkb:matrix.org | then for things that can't be public there is a shadow set of private vars on bridge that we manage there | 15:49 |
| @fungicide:matrix.org | with dummy values in the public repository for test purposes | 15:56 |
| @fungicide:matrix.org | so that our tests are still using the same playbooks and only the bare minimum variable contents differ | 15:57 |
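As a simplified model of that layout (public values, including dummy test values, in the repository; real values in a private shadow set on bridge), the effective variables are the public ones with any private values layered on top. Ansible's actual variable precedence has many more levels; this collapses it to a single dict update, and any variable names used with it are hypothetical.

```python
from typing import Any, Dict, Mapping, Optional


def effective_vars(public: Mapping[str, Any],
                   private: Optional[Mapping[str, Any]] = None) -> Dict[str, Any]:
    """Layer private overrides on top of public values (simplified model)."""
    merged = dict(public)
    if private:
        # Real secrets on bridge replace the dummy values used in testing.
        merged.update(private)
    return merged
```

In tests only the public mapping exists, so the same playbooks see the dummy values; in production `effective_vars({"example_password": "dummy"}, {"example_password": "real"})` yields the private value.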
| @clarkb:matrix.org | yup this particular change sidesteps that by using a file store for greptimedb in test | 16:01 |
| @clarkb:matrix.org | which I think is reasonable so that we don't have to run a swift with s3 middleware just to test this | 16:02 |
| @fungicide:matrix.org | Clark: with the git-review gerrit 3.13 testing, i realized we also test with different jdks, should we stick with openjdk-21 or try running on 25? | 16:05 |
| @clarkb:matrix.org | fungi: I think java 21 remains the required runtime for modern gerrit | 16:06 |
| @clarkb:matrix.org | I don't recall seeing any updates about java 25 (or any other version) | 16:06 |
| @fungicide:matrix.org | okay, i'll just stick with that then | 16:06 |
| -@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org proposed: [opendev/git-review] 990377: Integration testing Gerrit 3.13 and Python 3.14 https://review.opendev.org/c/opendev/git-review/+/990377 | 16:08 | |
| @clarkb:matrix.org | I checked /var/log/borg-backup-backup03.ca-ymq-1.vexxhost.opendev.org.log on etherpad02 and it reports successful backups. | 16:26 |
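A small helper for that kind of log check, assuming the backup wrapper runs borg with `--show-rc`, so each run ends with a line such as `terminating with success status, rc 0`. Whether OpenDev's backup script actually logs that line is an assumption; if it doesn't, the pattern would need to match whatever the script writes instead.

```python
import re
from typing import Iterable, Optional, Tuple

# Borg prints this summary line at exit when run with --show-rc.
RC_LINE = re.compile(r"terminating with (\w+) status, rc (\d+)")


def last_borg_status(lines: Iterable[str]) -> Optional[Tuple[str, int]]:
    """Return (status, rc) from the most recent borg exit line, or None."""
    last = None
    for line in lines:
        match = RC_LINE.search(line)
        if match:
            last = (match.group(1), int(match.group(2)))
    return last
```

For example, `with open("/var/log/borg-backup-backup03.ca-ymq-1.vexxhost.opendev.org.log") as log: print(last_borg_status(log))` would report the outcome of the latest run under that assumption.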
| @clarkb:matrix.org | I'm running out of things to check short of actually browsing a backup. mnasiadka ^ feel free to double check these items that i checked as well | 16:27 |
| @clarkb:matrix.org | fungi: when you get a chance maybe you can review https://review.opendev.org/c/opendev/system-config/+/990190 as that should fix the changes that will allow mixed nodesets for arm testing | 16:37 |
| @fungicide:matrix.org | sure, just a sec | 16:37 |
| @mordred:waterwanders.com | if anyone has a sec, https://review.opendev.org/c/openstack/project-config/+/989417 would be really handy. I'd like to use drizzle as the source of examples for the ai+zuul talk I'm working on. I'll get to tell the "openstack dev process was birthed in the drizzle project" story, which will be lovely and recursive | 16:53 |
| @mordred:waterwanders.com | also - because I know you're all dying to know this - I've got it modernized and building/passing tests on resolute. so no further need for precise containers :) | 16:54 |
| @fungicide:matrix.org | mordred: the commit message there needs wider distribution, i hope this is the start of an article ;) | 16:57 |
| @mordred:waterwanders.com | fungi: I'll be sure to subscribe you to my newsletter. I'm going to take that as consent. | 16:59 |
| @fungicide:matrix.org | hmm, your ideas are intriguing to me and i wish to subscribe to your newsletter but i think we have to go to the retreat anyway | 17:01 |
| @fungicide:matrix.org | mordred: just to confirm, you only want to import code for the drizzle repo, the other 5 will start out ~empty? | 17:03 |
| @mordred:waterwanders.com | fungi: yeah. pre-existing state of the others is a mess not worth cleaning up directly | 17:04 |
| @fungicide:matrix.org | cool, just making sure | 17:04 |
| @mordred:waterwanders.com | turns out - bzr -> git translation got worse. thankfully stewart had already done the main one a few years ago. | 17:05 |
| @clarkb:matrix.org | how big of a repo is that? Does it have all of the mysql history in it too? | 17:06 |
| @clarkb:matrix.org | (mostly wondering if that will suddenly be our biggest repo) | 17:06 |
| @fungicide:matrix.org | i'm not surprised that sort of thing bitrots, since most projects that were going to move from $othervcs to git did so years ago | 17:06 |
| @mordred:waterwanders.com | 22100 commits ... I don't know if that would be all of the mysql history ... but _maybe_ it has it? | 17:07 |
| -@gerrit:opendev.org- Zuul merged on behalf of Clark Boylan: [opendev/system-config] 990190: Fix order of operations during adns bind installation https://review.opendev.org/c/opendev/system-config/+/990190 | 17:08 | |
| @fungicide:matrix.org | i cloned it and then checked out all branches and tags, total local size is 105M according to `du -sh` | 17:09 |
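A rough Python stand-in for that measurement, summing apparent file sizes without following symlinks. `du -sh` reports allocated disk blocks rather than apparent sizes, so expect the two figures to differ somewhat for the same clone.

```python
import os


def tree_size(root: str) -> int:
    """Sum apparent file sizes (bytes) under root, without following symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # lstat so a symlink counts as the link itself, not its target.
            total += os.lstat(os.path.join(dirpath, name)).st_size
    return total
```

`tree_size(path) / 2**20` gives MiB, which is the easiest unit to hold up against the 105M figure.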
| @mordred:waterwanders.com | oh - you'll find this fun (I'd forgotten we'd done this) - predating jjb or anything like that, our hudson jobs didn't actually put the logic into the jobs themselves. we had a drizzle-automation repo with a bunch of python programs that implemented the job content, and the hudson jobs were mostly just calling them. So - the various integration test and stress test job logic persists and should be re-recreatable as zuul jobs without too much effort \o/ | 17:09 |
| -@gerrit:opendev.org- Clark Boylan proposed: [opendev/system-config] 990531: Update etherpad from 2.7.3 to 3.2.0 https://review.opendev.org/c/opendev/system-config/+/990531 | 17:09 | |
| @clarkb:matrix.org | fungi: ah ok so not terribly huge. I wonder if there was a pruning at fork time | 17:10 |
| @clarkb:matrix.org | also ^ that is a first attempt at a gerrit upgrade | 17:10 |
| @clarkb:matrix.org | er etherpad upgrade | 17:10 |
| @fungicide:matrix.org | by comparison, `/home/gerrit2/review_site/git/openstack/nova.git` on review03 is currently 1.3G | 17:10 |
| @mordred:waterwanders.com | hats off to us for not historically vendoring in a bunch of crap | 17:11 |
| @mordred:waterwanders.com | Clark: I love the npm -> yarn -> pnpm saga | 17:11 |
| @fungicide:matrix.org | things were much smaller back in ye olden times, needed to fit it on as few floppies as possible | 17:11 |
| @clarkb:matrix.org | mordred: ya.... I decided not to dig through their git logs on that one :) | 17:12 |
| @mordred:waterwanders.com | Clark: left a comment - did you mean to add a commented out section? | 17:12 |
| @clarkb:matrix.org | mordred: I did actually. I tried to clarify in a response on gerrit | 17:15 |
| @mordred:waterwanders.com | Clark: AH - yeah, ok, that makes sense. | 17:20 |
| -@gerrit:opendev.org- Roja Eswaran proposed: [openstack/diskimage-builder] 987717: Replace debootstrap with mmdebstrap https://review.opendev.org/c/openstack/diskimage-builder/+/987717 | 17:23 | |
| @jim:acmegating.com | drizzle has 2x+2 should i send it or wait for more reviews? | 17:43 |
| @fungicide:matrix.org | it seemed like Clark might be looking at it, but if he's not planning to then should be able to just approve | 17:43 |
| @jim:acmegating.com | i'll wait for Clark to say "give more time" or "whatever" :) | 17:45 |
| @clarkb:matrix.org | Oh sorry I think you can proceed | 17:50 |
| @clarkb:matrix.org | I was mostly just wondering if the repo was going to be a new test case for Gerrit in terms of size but seems not | 17:51 |
| -@gerrit:opendev.org- Roja Eswaran proposed: [openstack/diskimage-builder] 987717: Replace debootstrap with mmdebstrap https://review.opendev.org/c/openstack/diskimage-builder/+/987717 | 17:51 | |
| @jim:acmegating.com | yeah, good idea to check. didn't want to approve if you were spelunking. | 17:54 |
| -@gerrit:opendev.org- Roja Eswaran proposed: [openstack/diskimage-builder] 987717: Replace debootstrap with mmdebstrap https://review.opendev.org/c/openstack/diskimage-builder/+/987717 | 18:04 | |
| -@gerrit:opendev.org- Zuul merged on behalf of Monty Taylor https://matrix.to/#/@mordred:inaugust.com: [openstack/project-config] 989417: Import drizzle revival and add a bindep-rs support project https://review.opendev.org/c/openstack/project-config/+/989417 | 18:06 | |
| @fungicide:matrix.org | infra-prod-manage-projects failed in deploy for that, i'm looking into the cause | 18:21 |
| @clarkb:matrix.org | thanks. I'm following up with 989022 now that the adns task reordering is done | 18:22 |
| @clarkb:matrix.org | but I'm also going to pop out for lunch in the nearish future | 18:25 |
| @fungicide:matrix.org | this is how manage-projects failed... cryptic: https://paste.opendev.org/show/bBQcHfBvHV1qkbF4YxNw/ | 18:26 |
| @clarkb:matrix.org | fungi: check the gerrit error log or replication log? | 18:27 |
| @clarkb:matrix.org | fungi: if I had to guess replication failed because the repo wasn't properly created on one or more of the giteas maybe? | 18:27 |
| @clarkb:matrix.org | but I think what happened there is jeepyb tried to push the repo content into gerrit which triggers replication. Then replication fails for $reason and the push is rejected | 18:28 |
| @fungicide:matrix.org | yeah, `... Internal server error (user openstack-project-creator account 6199) during replication start drizzle/dbd-drizzle ...` | 18:29 |
| @clarkb:matrix.org | head appears to be main in both gitea10 and gerrit so I think we got that aligned properly (and this isn't the first repo to have a non master default head) | 18:30 |
| @fungicide:matrix.org | sadly the backtrace doesn't help me | 18:31 |
| @fungicide:matrix.org | `org.kohsuke.args4j.IllegalAnnotationError: No OptionHandler is registered to handle interface java.util.Set` | 18:31 |
| @fungicide:matrix.org | that seems like a secondary exception | 18:32 |
| @clarkb:matrix.org | looking in manage_projects there is a gerrit.replicate() call. It looks like from that traceback that it is complaining about arguments to the replicate command | 18:33 |
| @clarkb:matrix.org | is it possible that gerrit.replicate is what failed and not the push? | 18:33 |
| @fungicide:matrix.org | the backtrace does have references to the commandline parser | 18:34 |
| @clarkb:matrix.org | so I see content in gitea | 18:34 |
| @clarkb:matrix.org | which implies that we pushed something and something replicated. My hunch is that the command to explicitly replicate the project failed because gerritlib is passing some invalid args now or something | 18:35 |
| @clarkb:matrix.org | but then general replication catches things up? | 18:35 |
| @fungicide:matrix.org | the most recent project creation run seems to have been for https://review.opendev.org/c/openstack/project-config/+/987373 which happened on 2026-05-07 | 18:35 |
| @clarkb:matrix.org | we upgraded from gerrit 3.12.6 to 3.12.7 on May 15 | 18:36 |
| @clarkb:matrix.org | so maybe that point release changed the replication plugins arg parsing? | 18:37 |
| @fungicide:matrix.org | yeah, you wouldn't expect a patch level upgrade to regress something like that | 18:37 |
| @clarkb:matrix.org | https://gerrit.googlesource.com/plugins/replication/+log/refs/heads/stable-3.13 this does show there is a difference between replication 3.12.6 and 3.12.7 so this is my hunch | 18:37 |
| @clarkb:matrix.org | https://gerrit.googlesource.com/plugins/replication/+/24b0f4b53a6cd48295a3aa78f57aa4d59531b639 is the likely issue | 18:38 |
| @clarkb:matrix.org | though that is a hunch based on it adding/changing arguments | 18:39 |
| @clarkb:matrix.org | and it added a Set which shows up in the traceback | 18:40 |
| @jim:acmegating.com | we've seen significant patch level changes before | 18:40 |
| @jim:acmegating.com | also i am not about to cast the first stone there :) | 18:40 |
| @fungicide:matrix.org | so odds are it's coming from gerritlib doing a `replication start drizzle/dbd-drizzle` | 18:42 |
| @clarkb:matrix.org | so anyway I think manage_projects was successful up to the point where it failed to trigger the replication in gerrit. I think we should check if there are any steps that happen after replication that need to be done or maybe this is good enough for now? and then separately figure out what, if anything, we can do to fix the replication problem | 18:42 |
| @clarkb:matrix.org | fungi: yes | 18:42 |
| @fungicide:matrix.org | yeah, i think the replication plugin is broken... | 18:44 |
| @clarkb:matrix.org | the dependency injection really makes these tracebacks completely useless | 18:44 |
| @fungicide:matrix.org | `$ ssh fungi.admin@review -p 29418 'replication start opendev/bindep'` | 18:44 |
| @fungicide:matrix.org | `fatal: internal server error` | 18:44 |
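To reproduce this outside manage-projects, the same request can be built as an ssh argv. This is a sketch, not gerritlib's code: the helper names are hypothetical, and it assumes only the `replication start <project>` command shown above and Gerrit's SSH port 29418. With the broken plugin we'd expect a non-zero exit and `fatal: internal server error` on stderr.

```python
import shlex
import subprocess
from typing import List


def replication_start_argv(user: str, host: str, project: str,
                           port: int = 29418) -> List[str]:
    """ssh argv asking Gerrit's replication plugin to replicate one project."""
    remote_cmd = "replication start " + shlex.quote(project)
    return ["ssh", "-p", str(port), f"{user}@{host}", remote_cmd]


def trigger_replication(user: str, host: str, project: str) -> subprocess.CompletedProcess:
    """Run the command without raising; inspect returncode and stderr afterwards."""
    return subprocess.run(replication_start_argv(user, host, project),
                          capture_output=True, text=True, check=False)
```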
| @clarkb:matrix.org | I'll start a thread on Gerrit's Discord | 18:45 |
| @fungicide:matrix.org | i'm tracing the rest of the manage-projects script and playbook | 18:45 |
| @clarkb:matrix.org | thanks | 18:45 |
| @jim:acmegating.com | mordred: has not lost the magic touch ;) | 18:46 |
| @clarkb:matrix.org | https://gerrit-review.googlesource.com/c/plugins/replication/+/579801 I think they fixed it but haven't made a release | 18:47 |
| @fungicide:matrix.org | okay, so the good news is that the `gerrit.replicate(project)` call is essentially the last thing the manage-projects script will do, aside from error reporting and some cleanup | 18:50 |
| @jim:acmegating.com | do we only trigger replication on import? and then if we have a gitea server oops we manually trigger it? | 18:50 |
| @jim:acmegating.com | (wondering if we can just live with the bug for a bit) | 18:51 |
| @fungicide:matrix.org | also, running manage-projects is the last task in the manage-projects playbook | 18:51 |
| @fungicide:matrix.org | so in theory all's well and we just need to get the replication plugin working | 18:52 |
| @clarkb:matrix.org | that fix has been backported to 3.11 but not 3.12 and 3.13. I have requested that get done. Then we can build new gerrit images that build replication from stable-3.12 and stable-3.13 instead of v3.12.7 and v3.13.6 | 18:53 |
| @clarkb:matrix.org | alternatively I can update our images to manually patch that fix into our builds | 18:54 |
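On the image side, the change amounts to choosing a branch instead of a release tag when fetching the plugin source. A sketch of that naming choice, assuming the `vX.Y.Z` tag and `stable-X.Y` branch conventions mentioned above; the function is hypothetical and not taken from system-config.

```python
def replication_plugin_ref(gerrit_version: str, use_stable_branch: bool) -> str:
    """Pick the replication plugin ref to build for a given Gerrit release."""
    major, minor, _patch = gerrit_version.split(".")
    if use_stable_branch:
        # Picks up merged fixes that have not been tagged in a release yet.
        return f"stable-{major}.{minor}"
    return f"v{gerrit_version}"
```

The trade-off is that a stable branch moves under you between image builds, while a tag is fixed; building from the branch only makes sense once the backported fix has landed there.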
| @clarkb:matrix.org | I need to head out for lunch now. But hopefully upstream will have some thoughts by the time I get back and we can decide on a reasonable approach from there. Then maybe do a gerrit update tomorrow to fix it | 18:55 |
| @fungicide:matrix.org | i don't think it's urgent, mainly it's broken the command-line parser for the plugin, seems like, but actual configured replication is still working | 18:55 |
| @clarkb:matrix.org | yes, but we do sometimes need to trigger replication and now don't have a way to do that | 18:55 |
| @fungicide:matrix.org | so yes it would be nice to get fixed soon in case we need to force replication for anything, but it's not left us with a serious problem yet | 18:55 |
| @clarkb:matrix.org | I also asked if there was another working method for triggering replication re ^ | 18:56 |
| @clarkb:matrix.org | luca is going to push the merge ups to stable 3.12 and stable 3.13 now. In theory once those land we can rebuild images without local patching | 18:57 |
| @jim:acmegating.com | that sounds like a plan. i think we can coast for a day or so. | 18:58 |
| @jim:acmegating.com | so maybe if it hasn't happened by next week it's worth putting some local effort into it | 18:58 |
| @jim:acmegating.com | (and i'm assuming if we have a single gitea server oops, we can probably take it out of service for a bit) | 18:59 |
| @clarkb:matrix.org | cool I'm off to lunch now. Back in a bit | 18:59 |
| @fungicide:matrix.org | yeah, we may still have some cleanup needed, looking at the present state of gerrit the drizzle-core group seems to have not gotten created, which i think means the acls were never pushed | 18:59 |
| @fungicide:matrix.org | but probably the next step is to restart gerrit onto a fixed replication plugin and then manually rerun manage-projects and see if it's self-correcting | 19:00 |
| @fungicide:matrix.org | and if not, then step through the missing bits by hand | 19:01 |
| @clarkb:matrix.org | If replication happens last any idea why the acls failed? | 19:01 |
| @fungicide:matrix.org | not sure yet, still trying to figure that out | 19:02 |
| @clarkb:matrix.org | I think there is logging for when it believes it needs to push acls and for when it pushes them | 19:02 |
| @fungicide:matrix.org | yeah, https://opendev.org/opendev/jeepyb/src/branch/master/jeepyb/cmd/manage_projects.py#L169-L189 does a bit of logging around it | 19:04 |
| @fungicide:matrix.org | the `main()` function logs when it doesn't need to process acls, but not when it does | 19:08 |
| @fungicide:matrix.org | though the `run_command()` function also logs all the commands that are executed | 19:10 |
| @clarkb:matrix.org | https://opendev.org/opendev/jeepyb/src/branch/master/jeepyb/cmd/manage_projects.py#L575 happens after the replicate on line 554 | 19:10 |
| @clarkb:matrix.org | So I think that explains it | 19:10 |
| @fungicide:matrix.org | oh! i was looking at the replicate on line 591 but i guess that's just for the old github mirroring | 19:12 |
| @fungicide:matrix.org | so yes, the only thing that we should have run in the script after the replication on line 555 (if it had succeeded) would be the acl config push | 19:12 |
| @fungicide:matrix.org | so acls did not get set on these repos yet, and new groups did not get implicitly created either | 19:13 |
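The failure mode here is plain sequential execution: an exception in one step skips everything after it. A minimal model, with hypothetical step names standing in for the push, replicate, and ACL-push stages discussed above (this is not jeepyb's code):

```python
from typing import Callable, List, Sequence, Tuple

Step = Tuple[str, Callable[[], None]]


def run_steps(steps: Sequence[Step], completed: List[str]) -> None:
    """Run steps in order, recording the name of each one that finishes."""
    for name, action in steps:
        action()  # an exception here aborts every later step
        completed.append(name)
```

Running push, a failing replicate, then push_acls leaves only push recorded, which matches what was observed: content pushed, while ACLs and groups are missing.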
| @fungicide:matrix.org | anyway, hopefully m-p will dtrt when we rerun it with the replication plugin fixed, otherwise i'll do the config pushes by hand | 19:14 |
| @clarkb:matrix.org | I think we need to check that since it may have cached that things are done and skip it. So we may have to unset that flag in the cache. But also is it a problem to resync from the source git repos (it may actually skip that since the repos exist) | 19:15 |
| @fungicide:matrix.org | yes, i'm unsure if it will have decided that's actually done even though it isn't, but at this point rerunning with the fixed plugin will take less time than revisiting the script in depth | 19:17 |
| -@gerrit:opendev.org- Roja Eswaran proposed: [openstack/diskimage-builder] 987717: Replace debootstrap with mmdebstrap https://review.opendev.org/c/openstack/diskimage-builder/+/987717 | 19:44 | |
| @clarkb:matrix.org | https://gerrit-review.googlesource.com/c/plugins/replication/+/588301 and https://gerrit-review.googlesource.com/c/plugins/replication/+/588302 are the changes we need for 3.12 and 3.13 respectively | 20:23 |
| @clarkb:matrix.org | I'll go ahead and push a change up to swap the replication plugin over to the stable branch now. But landing that won't be helpful until those changes merge | 20:24 |
| -@gerrit:opendev.org- Clark Boylan proposed: [opendev/system-config] 990570: Build gerrit replication plugin using stable branches https://review.opendev.org/c/opendev/system-config/+/990570 | 20:28 | |
| @clarkb:matrix.org | there that change has staged things for us | 20:28 |
| @mordred:waterwanders.com | what's this? I broke things? \o/ | 20:35 |
| @mordred:waterwanders.com | (reading) | 20:35 |
| @clarkb:matrix.org | you discovered that gerrit broke things in a totally unexpected way :) | 20:37 |
| @mordred:waterwanders.com | yay, I'm helpful! :) | 20:40 |
| @mordred:waterwanders.com | also - wow, what a fun backscroll | 20:40 |
| -@gerrit:opendev.org- Pierre Riteau proposed: [opendev/irc-meetings] 990572: Move Blazar meeting one hour earlier (DST) https://review.opendev.org/c/opendev/irc-meetings/+/990572 | 20:53 | |
| @clarkb:matrix.org | corvus: fungi: there is a `gerrit-replicate` config that we can set somewhere based on my read of the jeepyb code to disable replication | 20:54 |
| @clarkb:matrix.org | that may be another workaround if this goes on for longer than a short period of time. Then we can still create new projects without the explicit replicate step | 20:54 |
| @mordred:waterwanders.com | looking at it - I think the repo would have been pushed to gerrit, which will set the pushed-to-gerrit cache key, which will skip the gerrit replicate call next time. And then acls got skipped because replicate failed, but the acls shouldn't be listed for success in the cache, so next m-p _should_ re-run acls. Best I can tell re-running m-p should fix up the acls and be happy, but replication will definitely need a manual re-kick | 20:55 |
| @mordred:waterwanders.com | `GERRIT_REPLICATE = registry.get_defaults('gerrit-replicate', True)` | 20:56 |
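mordred's reading of the rerun behaviour can be modelled as below. `pushed-to-gerrit` is the cache key he describes, and `gerrit_replicate` mirrors the `gerrit-replicate` default of True; `acls-done` is a hypothetical stand-in for however jeepyb actually tracks ACL state, and the whole function is a model rather than jeepyb's logic.

```python
from typing import List, Mapping


def plan_next_run(cache: Mapping[str, bool], gerrit_replicate: bool = True) -> List[str]:
    """Which steps a rerun would attempt for one project (hypothetical model)."""
    steps: List[str] = []
    if not cache.get("pushed-to-gerrit"):
        steps.append("push")
        # Mirrors registry.get_defaults('gerrit-replicate', True): on unless disabled.
        if gerrit_replicate:
            steps.append("replicate")
    if not cache.get("acls-done"):  # hypothetical key: ACL success was never recorded
        steps.append("acls")
    return steps
```

Under this model a project that was pushed but never got ACLs reruns only the ACL step, so the manual replication kick still has to happen separately.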
| @clarkb:matrix.org | Got it. I think replication actually mostly happened automatically too. The manual trigger is belts and suspenders | 20:58 |
| @mordred:waterwanders.com | oh yeah - I agree with you ... except weirdly for the main branch | 21:00 |
| @mordred:waterwanders.com | all the tags are there: https://opendev.org/drizzle/drizzle/src/tag/7.2.24/ - but https://opendev.org/drizzle/drizzle/src/branch/main/ shows a README.md | 21:01 |
| @clarkb:matrix.org | Any idea where that originated? Like is that an old commit on the main branch? Or did Gerrit auto create that ? | 21:02 |
| @fungicide:matrix.org | i think gitea creates that by default for empty repos | 21:04 |
| @fungicide:matrix.org | then we push over it | 21:04 |
| @clarkb:matrix.org | Aha | 21:06 |
| @clarkb:matrix.org | Ya I think you are right | 21:06 |
| @mordred:waterwanders.com | and maybe background gerrit isn't pushing over that | 21:09 |
| @clarkb:matrix.org | if you clone from gerrit you get `warning: remote HEAD refers to nonexistent ref, unable to checkout` so I think the push to gerrit also failed | 21:09 |
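To check for this without cloning, `git ls-remote --heads <url>` lists each advertised branch as `<sha><TAB>refs/heads/<name>`. A sketch that parses that output and reports whether a branch is missing; the helper names are hypothetical, and the URL would be something like `ssh://<user>@review.opendev.org:29418/drizzle/drizzle`.

```python
import subprocess
from typing import Dict


def parse_ls_remote(output: str) -> Dict[str, str]:
    """Turn `git ls-remote` output into {refname: sha}."""
    refs: Dict[str, str] = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        sha, name = line.split("\t", 1)
        refs[name] = sha
    return refs


def remote_heads(url: str) -> Dict[str, str]:
    """Ask the remote which branches it advertises (no clone needed)."""
    result = subprocess.run(["git", "ls-remote", "--heads", url],
                            capture_output=True, text=True, check=True)
    return parse_ls_remote(result.stdout)


def branch_missing(refs: Dict[str, str], branch: str = "main") -> bool:
    return f"refs/heads/{branch}" not in refs
```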
| @mordred:waterwanders.com | WEIRD | 21:09 |
| @mordred:waterwanders.com | I'm helping to uncover all sorts of bugs today | 21:09 |
| @clarkb:matrix.org | which maybe was hidden in all the logs for the other failures? | 21:09 |
| @clarkb:matrix.org | and that is why main didn't get replicated | 21:09 |
| @clarkb:matrix.org | fungi: ^ not sure if you noticed anything around that | 21:10 |
| -@gerrit:opendev.org- Clark Boylan proposed: [opendev/system-config] 840972: DNM force etherpad failure to hold node https://review.opendev.org/c/opendev/system-config/+/840972 | 21:11 | |
| @clarkb:matrix.org | the etherpad change passed tests and screenshots look good so the next step is holding a node and checking the held node | 21:12 |
| -@gerrit:opendev.org- Monty Taylor https://matrix.to/#/@mordred:inaugust.com proposed: [opendev/system-config] 990575: Disable replication in manage-projects https://review.opendev.org/c/opendev/system-config/+/990575 | 21:17 | |
| @mordred:waterwanders.com | just in case you want to do that | 21:17 |
| @mordred:waterwanders.com | Clark: so, if gerrit doesn't have the main ref but does have the 7.2.24 ref, then it does have pretty much all of the refs themselves. ... oh weird, found what I think is a bug | 21:34 |
| @mordred:waterwanders.com | Clark: https://opendev.org/opendev/jeepyb/src/branch/master/jeepyb/utils.py#L146 | 21:34 |
| @mordred:waterwanders.com | nm. I was reading it wrong | 21:36 |
| @mordred:waterwanders.com | I'm doubly confused about why it has the tags but not the main branch tip: https://opendev.org/opendev/jeepyb/src/branch/master/jeepyb/cmd/manage_projects.py#L368-L369 ... it does: | 21:44 |
| ``` | ||
| u.git_command(repo_path, push_string % remote_url, env=ssh_env) | ||
| u.git_command(repo_path, "push --tags %s" % remote_url, env=ssh_env) | ||
| ``` | ||
Where push_string should be: `push %s +refs/copy/heads/*:refs/heads/*`
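The `+refs/copy/heads/*:refs/heads/*` refspec renames every ref under `refs/copy/heads/` to the same name under `refs/heads/`, with the leading `+` allowing forced (non-fast-forward) updates. A minimal sketch of that mapping for single-`*` refspecs, using a hypothetical helper name; tags are not matched by it and travel through the separate `push --tags` call.

```python
from typing import Dict, Iterable


def apply_glob_refspec(refspec: str, local_refs: Iterable[str]) -> Dict[str, str]:
    """Map local ref names to remote ones for a one-'*' refspec such as
    '+refs/copy/heads/*:refs/heads/*'. A leading '+' only means "force"."""
    spec = refspec[1:] if refspec.startswith("+") else refspec
    src, dst = spec.split(":", 1)
    src_pre, src_post = src.split("*", 1)
    dst_pre, dst_post = dst.split("*", 1)
    mapping: Dict[str, str] = {}
    for ref in local_refs:
        if (ref.startswith(src_pre) and ref.endswith(src_post)
                and len(ref) > len(src_pre) + len(src_post)):
            # The part matched by '*' is substituted into the destination.
            middle = ref[len(src_pre):len(ref) - len(src_post)]
            mapping[ref] = dst_pre + middle + dst_post
    return mapping
```

Given `refs/copy/heads/main` it yields `refs/heads/main`; given an empty list of local refs it yields nothing to push.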
| @clarkb:matrix.org | ya I am trying to find the local repo that it pushes from but not finding it yet | 21:46 |
| @clarkb:matrix.org | maybe i hsould look at the manage projects log instead | 21:46 |
| @clarkb:matrix.org | mordred: https://paste.opendev.org/show/bEdKVOKKuw6L6BRNCJw9/ this is what it did from start to finish for drizzle/drizzle | 21:49 |
| @clarkb:matrix.org | /opt/lib/jeepyb/drizzle/ is empty so maybe it cleaned up after itself | 21:49 |
| @mordred:waterwanders.com | wow. that shows no issues pushing the heads | 21:49 |
| @clarkb:matrix.org | ya I wonder now if the problem is the acls | 21:50 |
| @mordred:waterwanders.com | (and I double-checked, nothing seems to have changed with that syntax, like a git upgrade or anything) | 21:50 |
| @clarkb:matrix.org | like maybe our default acls don't handle main but do handle master or something so main isn't visible as a ref? | 21:50 |
| @mordred:waterwanders.com | oh. weird. | 21:50 |
| @clarkb:matrix.org | this isn't the first repo using main though and I don't recall them having any issues | 21:50 |
| @clarkb:matrix.org | and we replicate before setting up acls so I'm really confused | 21:50 |
| @clarkb:matrix.org | but that is my next best guess as to what happened there | 21:51 |
| @clarkb:matrix.org | like maybe the repo is fine but we can't see it | 21:52 |
| @clarkb:matrix.org | review_site/git/drizzle/drizzle.git/refs/heads and review_site/git/drizzle/drizzle.git/branches are both empty though so maybe that is wishful thinking | 21:53 |
| @clarkb:matrix.org | I would expect there to be something in there that explains what main is | 21:53 |
| @mordred:waterwanders.com | All-Projects just does `[access "refs/*"]` and `[access "refs/heads/*"]` | 21:53 |
| @mordred:waterwanders.com | no mention of main or master anywhere | 21:54 |
| @clarkb:matrix.org | nothing for main in packed-refs either | 21:55 |
| @clarkb:matrix.org | so I think the issue is in the repo not acls again | 21:55 |
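The manual inspection of the bare repository (HEAD, then loose refs under refs/, then packed-refs) can be scripted roughly as follows. It assumes the classic files ref backend rather than reftable, and packed-refs' format of `<sha> <refname>` lines plus a `#` header and `^<sha>` peeled-tag lines; the function names are hypothetical. A `None` sha is the server-side counterpart of the `remote HEAD refers to nonexistent ref` clone warning.

```python
import os
from typing import Dict, Optional, Tuple


def packed_refs(git_dir: str) -> Dict[str, str]:
    """Parse <git_dir>/packed-refs into {refname: sha}, skipping metadata lines."""
    refs: Dict[str, str] = {}
    path = os.path.join(git_dir, "packed-refs")
    if not os.path.exists(path):
        return refs
    with open(path) as f:
        for line in f:
            line = line.rstrip("\n")
            if not line or line.startswith("#") or line.startswith("^"):
                continue  # header comment or peeled target of the previous tag
            sha, name = line.split(" ", 1)
            refs[name] = sha
    return refs


def resolve_head(git_dir: str) -> Tuple[str, Optional[str]]:
    """Return (HEAD target, sha or None if the target ref does not exist)."""
    with open(os.path.join(git_dir, "HEAD")) as f:
        head = f.read().strip()
    if not head.startswith("ref: "):
        return "HEAD", head  # detached HEAD stores a sha directly
    target = head[len("ref: "):]
    loose = os.path.join(git_dir, target)
    if os.path.isfile(loose):
        with open(loose) as f:
            return target, f.read().strip()
    return target, packed_refs(git_dir).get(target)
```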
| @mordred:waterwanders.com | Clark: could the fsck --full have deleted the heads between the fetch and the push? | 21:57 |
| @mordred:waterwanders.com | if it did, then there would be no heads to push and the push line would silently succeed | 21:58 |
| @clarkb:matrix.org | I was just testing that locally and my local git client did not do that | 21:58 |
| @clarkb:matrix.org | but maybe the git client on review found something it didn't like and ya pruned more than we expected? | 21:58 |
| @clarkb:matrix.org | mordred: dbd-drizzle was the first repo to go and failed first. I'm wondering if there is some side effect on the jeepyb side due to the replication error that is preventing things from being handled properly there? | 22:00 |
| @mordred:waterwanders.com | yeah. I just did the same test | 22:00 |
| @clarkb:matrix.org | but you can see it takes an appreciable amount of time to push both the refs and the tags | 22:01 |
| @clarkb:matrix.org | as if its doing real work | 22:01 |
| @clarkb:matrix.org | just as a quick sanity check I checked that gerrit has disk space and it has plenty | 22:01 |
| @mordred:waterwanders.com | yeah. that sure does look like a real push | 22:01 |
| @clarkb:matrix.org | (in case we had failed writes due to running out of disk space or inodes, but both look fine) | 22:01 |
| @mordred:waterwanders.com | in my local recreation I get this: | 22:02 |
```
$ git for-each-ref refs/copy/
269bf1ede95c17736f45d7fa7cdc2172a677892a commit	refs/copy/heads/drizzle-7.0
3873683eeb4cf650f9cdceece80eca10ca888f23 commit	refs/copy/heads/drizzle-7.1
b4de3d25e752315ef7fd020efb3f14ef62bc151d commit	refs/copy/heads/drizzle-7.2
51e535c15ba38fb5c09bd101a374bba14d7c3487 commit	refs/copy/heads/main
```
| @mordred:waterwanders.com | so it's seeing all of them in refs/copy | 22:03 |
| @clarkb:matrix.org | jeepyb does clean up its local copy fwiw | 22:03 |
| @mordred:waterwanders.com | (which we'd expect) | 22:03 |
| @clarkb:matrix.org | so the repo not existing on review is expected (but makes it more difficult to debug) | 22:03 |
| @mordred:waterwanders.com | yeah - I feel like it does that on failure so that it'll try again next time, right? | 22:03 |
| @clarkb:matrix.org | ya that could be though I think it checks if the repo is in gerrit and skips some steps if so? This might be a case of manual fixup and then monitor to see if it happens again | 22:06 |
| @mordred:waterwanders.com | yeah. I mean - a manual push is straightforward enough - but this is _weird_ and it would be nice to understand why it broke | 22:08 |
| @mordred:waterwanders.com | like - from what I can see, it shouldn't be broken | 22:08 |
| @clarkb:matrix.org | agreed. But I'm running out of clues in the system to debug from. Really that manage projects paste log has been the most useful but unfortunately doesn't explain anything | 22:08 |
| @clarkb:matrix.org | push_to_gerrit logs exceptions if the pushes fail too. So the log indicates it thinks that it succeeded | 22:09 |
| @mordred:waterwanders.com | Clark: want an even weirder thing? there are a few refs past the latest tag. If I fetch those refs by sha directly from gerrit - they are there: | 22:12 |
| ``` | ||
| $ git fetch ssh://mordred@review.opendev.org:29418/drizzle/drizzle 51e535c15ba38fb5c09bd101a374bba14d7c3487 | ||
| From ssh://review.opendev.org:29418/drizzle/drizzle | ||
| * branch 51e535c15ba38fb5c09bd101a374bba14d7c3487 -> FETCH_HEAD | ||
| ``` | ||
| @mordred:waterwanders.com | so ALL of the refs got pushed, which is super weird if there were no heads. my brain wants to go back to your thought of something with acls | 22:13 |
| @mordred:waterwanders.com | like - for some reason do we not have view rights on refs/heads/ here? | 22:13 |
| @clarkb:matrix.org | mordred: I think you should still be parented to all projects. But also when I ls review_site/git/drizzle/drizzle.git/refs there is nothing there | 22:15 |
| @clarkb:matrix.org | and there are only tags in packed-refs | 22:16 |
| @clarkb:matrix.org | fwiw I have looked at the gerrit error_log and sshd_log around the time of the push. In the sshd_log I can see the login and push are recorded without obvious errors. And error_log has nothing drizzle in there that I see | 22:16 |
| @clarkb:matrix.org | it isn't until the replication start stuff happens that error_log records anything amiss | 22:18 |
| @clarkb:matrix.org | maybe something happens afterwards that creates the problem? | 22:18 |
| @clarkb:matrix.org | did we put conflicting entries in the projects.yaml so we pushed over main after the initial setup? I don't think so because main has no ref/sha so it just never got created I think | 22:19 |
| @clarkb:matrix.org | I don't see any such conflicts in projects.yaml | 22:25 |
| @mordred:waterwanders.com | Clark: there isn't anything useful in gerrit's error log is there? like maybe it didn't like something on the backend but didn't return an error? (I'm stretching here) | 22:26 |
| @clarkb:matrix.org | not that I can see. The first errors show up when we try to replicate and hit the understood issue we worked through earlier | 22:26 |
| @mordred:waterwanders.com | I want to just blame bzr | 22:26 |
| @clarkb:matrix.org | and the sshd_log shows that it received the pushed code | 22:26 |
| @clarkb:matrix.org | oooohhhhh | 22:26 |
| @clarkb:matrix.org | honestly that makes me suspect the fsck again | 22:27 |
| @mordred:waterwanders.com | like - we haven't imported a bzr converted git repo in a _very_ long time | 22:27 |
| @clarkb:matrix.org | like maybe that particular version of git can't handle the conversion somehow and fsck removes refs? | 22:27 |
| @mordred:waterwanders.com | granted, this has flowed through github, but who knows what's going on down in the bowels of that | 22:27 |
| @clarkb:matrix.org | I think at this point it is likely that we'll be trying manual pushes and validating along the way and see if anything can be debugged from that? | 22:28 |
| @clarkb:matrix.org | but we should fix replication before we do that | 22:28 |
| @mordred:waterwanders.com | yeah. I agree. I don't think we're going to figure out what went wrong with this particular one at this point | 22:28 |
| @mordred:waterwanders.com | maybe something will fail during a manual operation next time and it'll give us new debug | 22:29 |
| @mordred:waterwanders.com | or maybe it'll work for no reason | 22:29 |
| @clarkb:matrix.org | mordred: is the only missing ref main? all of the tags showed up? | 22:29 |
| @mordred:waterwanders.com | and we'll all go "wow, bzr, shrug" | 22:29 |
| @mordred:waterwanders.com | all the tags are there. none of the 4 branches are | 22:29 |
| @mordred:waterwanders.com | (don't really care about the other branches, but as a data point none of them are there) | 22:29 |
| @clarkb:matrix.org | oh I see in my local test copy drizzle-7.0 drizzle-7.1 and drizzle-7.2 should be there in addition to main | 22:30 |
| @clarkb:matrix.org | but ya none are. So tags are fine. Branches are not | 22:30 |
| @clarkb:matrix.org | for followup corrective processes maybe we try pushing one branch at a time and check as we go in order to try and find where things go wrong | 22:30 |
| @mordred:waterwanders.com | and all the refs are there - even refs not associated with a tag | 22:31 |
| @clarkb:matrix.org | ya so it pushed all the content but then didn't tie the heads into refs/heads properly | 22:31 |
| @mordred:waterwanders.com | yup | 22:31 |
| @clarkb:matrix.org | which explains why the pushes took time | 22:31 |
| @clarkb:matrix.org | infra-root https://gerrit-review.googlesource.com/c/plugins/replication/+/588301 and https://gerrit-review.googlesource.com/c/plugins/replication/+/588302 have merged so updating gerrit via https://review.opendev.org/c/opendev/system-config/+/990570 should do what we want now. Maybe we land that first thing tomorrow then restart gerrit after and then continue debugging the drizzle thing? | 22:36 |
| @mordred:waterwanders.com | Clark: so ... refs/meta/config is completely empty for drizzle/drizzle - I know we hadn't added our acls, but doesn't | 22:36 |
| @mordred:waterwanders.com | Clark: so there's also no empty config there yet | 22:37 |
| @clarkb:matrix.org | mordred: hrm so maybe this is all tied together. Like maybe pushing the acls causes some unexpected side effect for refs in general? | 22:37 |
| @clarkb:matrix.org | I dunno it's weird. | 22:38 |
| @mordred:waterwanders.com | well - our acl pushing code fetches first | 22:38 |
| @mordred:waterwanders.com | and it has a poll expecting gerrit to have created an empty one | 22:38 |
| @mordred:waterwanders.com | https://opendev.org/opendev/jeepyb/src/branch/master/jeepyb/cmd/manage_projects.py#L108 | 22:38 |
| @mordred:waterwanders.com | _then_ we copy our acls on top of the project.config file | 22:39 |
| @clarkb:matrix.org | oohhh and we even poll for it | 22:39 |
| @clarkb:matrix.org | we likely would've failed there since if the ref doesn't exist then our poll would time out | 22:40 |
| @clarkb:matrix.org | mordred: though actually you may not have access to refs/meta/config anymore | 22:40 |
| @clarkb:matrix.org | let me check on disk | 22:40 |
| @mordred:waterwanders.com | oh, yeah. nod, good point | 22:41 |
| @mordred:waterwanders.com | I do for All-Projects | 22:41 |
| @clarkb:matrix.org | `review_site/git/drizzle/drizzle.git/refs/meta/config` does exist and points at some sha | 22:41 |
| @mordred:waterwanders.com | but you're right - not for other things - it's locked down in All-Projects | 22:41 |
| @mordred:waterwanders.com | ok. red herring | 22:41 |
| @clarkb:matrix.org | so I think this *is* an acl thing | 22:42 |
| @clarkb:matrix.org | but not one that explains why the other thing exploded | 22:42 |
| @clarkb:matrix.org | so ya I think tomorrow we update gerrit to fix replication. Then we see where we end up from there with likely some manual intervention for those four branches in drizzle/drizzle | 22:43 |
| @clarkb:matrix.org | (mostly thinking tomorrow at this point because it will take an hour to gate the image updates and I'm already likely the only one paying attention at this hour and we would still need to restart gerrit to pick up the changes) | 22:44 |
| @mordred:waterwanders.com | yeah. and ... maybe we'll learn something | 22:44 |
| @mordred:waterwanders.com | or - maybe it'll all just magic itself | 22:44 |
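
One way to make the 22:29 observation (all tags present, none of the four branches) repeatable is to diff the branches held under `refs/copy/heads/` in a local recreation against the branches the remote advertises under `refs/heads/`. This is a minimal shell sketch, assuming a clone that carries the `refs/copy/` refs shown in the 22:02 paste and read access to the remote. `missing_branches` is a hypothetical helper, not part of jeepyb.

```shell
# Hypothetical helper (not part of jeepyb): print branch names that exist under
# refs/copy/heads/ in the current repo but are missing from refs/heads/ on the
# given remote.
missing_branches() {
    local remote="$1" expected actual
    expected=$(mktemp) && actual=$(mktemp) || return 1
    # %(refname:strip=3) turns refs/copy/heads/drizzle-7.0 into drizzle-7.0
    git for-each-ref --format='%(refname:strip=3)' refs/copy/heads/ \
        | LC_ALL=C sort > "$expected"
    # ls-remote prints "<sha><TAB>refs/heads/<name>"; keep only <name>
    git ls-remote --heads "$remote" \
        | awk '{ sub(/^refs\/heads\//, "", $2); print $2 }' \
        | LC_ALL=C sort > "$actual"
    # comm -23 keeps lines that appear only in the expected list
    LC_ALL=C comm -23 "$expected" "$actual"
    rm -f "$expected" "$actual"
}
```

Run it inside the local recreation, e.g. `missing_branches ssh://mordred@review.opendev.org:29418/drizzle/drizzle`. Given the state described above, it would be expected to list all four branch names.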
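
For the 22:15 on-disk check (nothing under `refs/`, only tags in `packed-refs`), note that `git for-each-ref` reads loose refs and `packed-refs` entries alike, so it can summarize a bare repository in one step. This is a sketch, assuming shell access on the Gerrit host and the `review_site/git/drizzle/drizzle.git` path mentioned above. `ref_summary` is a hypothetical name.

```shell
# Hypothetical helper: count branches and tags in a bare repository.
# for-each-ref covers both loose refs under refs/ and entries in packed-refs.
ref_summary() {
    local gitdir="$1" heads tags
    heads=$(git --git-dir="$gitdir" for-each-ref --format='%(refname)' refs/heads/ | wc -l | tr -d ' ')
    tags=$(git --git-dir="$gitdir" for-each-ref --format='%(refname)' refs/tags/ | wc -l | tr -d ' ')
    echo "heads=$heads tags=$tags"
}
```

For example, `ref_summary review_site/git/drizzle/drizzle.git`; in the state clarkb described, one would expect `heads=0` alongside a non-zero tag count.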
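
clarkb's 22:30 suggestion, pushing one branch at a time and checking as we go, could be scripted roughly as follows. The script pushes a single `refs/copy/heads/<branch>` to `refs/heads/<branch>`, then asks the remote which sha it advertises before moving on, so the loop stops at the first branch that fails to land. This sketch assumes a local copy with the `refs/copy/heads/` layout shown above and an account allowed to push directly to `refs/heads/*` on the Gerrit project. `push_and_verify` is hypothetical.

```shell
# Hypothetical helper: push each named branch from refs/copy/heads/ to
# refs/heads/ on the remote, one at a time, and stop at the first branch the
# remote does not advertise with the expected sha afterwards.
push_and_verify() {
    local remote="$1" branch local_sha remote_sha
    shift
    for branch in "$@"; do
        local_sha=$(git rev-parse --verify --quiet "refs/copy/heads/$branch") || {
            echo "no local ref refs/copy/heads/$branch" >&2
            return 1
        }
        git push --quiet "$remote" "refs/copy/heads/$branch:refs/heads/$branch" || return 1
        # Check what the remote actually advertises, not just the push exit status.
        remote_sha=$(git ls-remote "$remote" "refs/heads/$branch" | awk '{ print $1 }')
        if [ "$remote_sha" != "$local_sha" ]; then
            echo "mismatch on $branch: local=$local_sha remote=${remote_sha:-missing}" >&2
            return 1
        fi
        echo "ok $branch $remote_sha"
    done
}
```

Usage might look like `push_and_verify ssh://mordred@review.opendev.org:29418/drizzle/drizzle main drizzle-7.0 drizzle-7.1 drizzle-7.2`. Checking with `ls-remote` after every push, rather than trusting the push result alone, targets the situation discussed at 22:09, where the push was logged as successful but the heads never appeared.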
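
The 22:38 discussion notes that jeepyb's `manage_projects.py` fetches `refs/meta/config` and polls for Gerrit to have created it. The snippet below is not jeepyb's code, which is Python and linked above. It is a hypothetical shell equivalent of a poll with a timeout, relying on `git ls-remote --exit-code` exiting non-zero when no matching ref is advertised. Because Gerrit only advertises refs the caller can read, a poll like this can also time out because of ACLs rather than a missing ref, which is the 22:40 caveat.

```shell
# Hypothetical helper (not jeepyb's implementation): wait until the remote
# advertises the given ref, giving up after a timeout in seconds.
wait_for_ref() {
    local remote="$1" ref="$2" timeout="${3:-60}" interval="${4:-2}"
    local deadline=$(( $(date +%s) + timeout ))
    while :; do
        # --exit-code makes ls-remote exit with status 2 when no ref matches
        if git ls-remote --exit-code "$remote" "$ref" >/dev/null 2>&1; then
            return 0
        fi
        if [ "$(date +%s)" -ge "$deadline" ]; then
            echo "timed out after ${timeout}s waiting for $ref on $remote" >&2
            return 1
        fi
        sleep "$interval"
    done
}
```

For example, `wait_for_ref ssh://mordred@review.opendev.org:29418/drizzle/drizzle refs/meta/config 60 2`.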
Generated by irclog2html.py 4.1.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!