Thursday, 2026-05-28

-@gerrit:opendev.org- Julia Kreger proposed: [openstack/diskimage-builder] 990417: Add rdma-roce element for RoCE tenant OS images https://review.opendev.org/c/openstack/diskimage-builder/+/990417 03:46
-@gerrit:opendev.org- Sylvain Bauza proposed: [opendev/system-config] 990439: Add statusbot in #openstack-agentic-workflows https://review.opendev.org/c/opendev/system-config/+/990439 09:27
-@gerrit:opendev.org- Zuul merged on behalf of Sei Sano: [opendev/irc-meetings] 988910: Update Masakari IRC meeting ... https://review.opendev.org/c/opendev/irc-meetings/+/988910 10:26
-@gerrit:opendev.org- Zuul merged on behalf of Sylvain Bauza: [opendev/system-config] 990439: Add statusbot in #openstack-agentic-workflows https://review.opendev.org/c/opendev/system-config/+/990439 13:49
@clarkb:matrix.orgI don't see any obvious backup complaints in infra-root mail for the last day14:49
@clarkb:matrix.orgthat is a good sign. But should still check servers and their backups before calling it good14:49
@mnasiadka:matrix.orgClark: If I can do anything in my timezone tomorrow to check that - let me know15:25
@clarkb:matrix.orgmnasiadka: ack I think the first step is just to check if anything got written to the backup server df/du/ls etc15:26
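A minimal sketch of the df step, in Python to match jeepyb and the other tooling discussed here. `volume_usage_percent` is a hypothetical helper, not part of system-config, and it assumes it runs on the backup server itself. It relies on `shutil.disk_usage`, so its number can differ slightly from `df`, which leaves root-reserved blocks out of the total.

```python
import shutil


def volume_usage_percent(path: str) -> float:
    """Percentage of the filesystem containing `path` that is in use.

    Roughly what `df` shows; df excludes root-reserved blocks from its
    denominator, so its figure can be slightly higher.
    """
    usage = shutil.disk_usage(path)  # named tuple of bytes: total, used, free
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    # Placeholder path; on a backup server point this at a path on the
    # backup volume (for example /opt/backups, where the backup dirs live).
    print(f"{volume_usage_percent('/'):.1f}% used")
```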
@mnasiadka:matrix.orgbtw, I think all the Prometheus stack patches should be good for a re-review15:26
@clarkb:matrix.orgmnasiadka: https://docs.opendev.org/opendev/system-config/latest/sysadmin.html#restore-from-backup would be the next step (that describes how you can restore from backup; we don't need to actually do a restore, just go far enough to see that we could)15:27
@mnasiadka:matrix.orgMakes sense, I'll have a look in depth tomorrow :)15:27
@clarkb:matrix.orgthe borg mount fuse thing is really convenient (just don't forget to unmount when done)15:27
@clarkb:matrix.orgbut it allows you to browse around the backup as if it were a normal filesystem mount, which I personally prefer over tarball listings15:28
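A sketch of the mount-then-always-unmount pattern as a Python context manager. It assumes borg 1.x command syntax (`borg mount REPO[::ARCHIVE] MOUNTPOINT` and `borg umount MOUNTPOINT`); `borg_mounted`, the repo path and the mountpoint are illustrative placeholders, and the restore docs linked above remain the authoritative procedure.

```python
import contextlib
import subprocess
from typing import Callable, Iterator, List, Optional

Runner = Callable[[List[str]], object]


def _run(argv: List[str]) -> None:
    subprocess.run(argv, check=True)


@contextlib.contextmanager
def borg_mounted(repo: str, mountpoint: str, archive: Optional[str] = None,
                 run: Runner = _run) -> Iterator[str]:
    """FUSE-mount a borg 1.x repo (or one archive) and always unmount afterwards."""
    target = f"{repo}::{archive}" if archive else repo  # borg 1.x REPO::ARCHIVE syntax
    run(["borg", "mount", target, mountpoint])
    try:
        yield mountpoint  # browse with ls/find/cat as if it were a normal mount
    finally:
        # The step that is easy to forget: unmount even if browsing raised.
        run(["borg", "umount", mountpoint])
```

Usage would be `with borg_mounted("/path/to/repo", "/mnt/borg") as mnt:` followed by ordinary `os.listdir(mnt)` calls; the `finally` clause is what guarantees the unmount. Archive names for the `archive` argument can be listed with `borg list REPO`.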
@clarkb:matrix.orgI've put rereviewing the prometheus change stack on my todo list for today15:28
@clarkb:matrix.org8% of backup03's backup volume is now in use so that is another good sign. I also see a number of backup dirs in /opt/backups. That set is not the same set as backup02 because we have retired old backups in place on 02 and the top level dirs stick around. So we may need to do a bit of an accounting comparison too cc mnasiadka 15:32
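The accounting comparison could start from directory names alone. A hedged sketch: `compare_backup_dirs` is hypothetical and assumes you have already collected the top-level names under /opt/backups on each server (e.g. with `ls`) and know which ones on backup02 belong to retired backups.

```python
from typing import Iterable, Set, Tuple


def compare_backup_dirs(old: Iterable[str], new: Iterable[str],
                        retired: Iterable[str] = ()) -> Tuple[Set[str], Set[str]]:
    """Compare top-level backup directory names on two backup servers.

    Returns (missing_on_new, only_on_new). Names in `retired` are dropped
    from the old side, since retired backups leave their top-level
    directories behind.
    """
    old_active = set(old) - set(retired)
    new_set = set(new)
    return old_active - new_set, new_set - old_active
```

Anything in `missing_on_new` is a host that should be backing up to the new server but apparently isn't yet; `only_on_new` covers hosts added since the old server was populated.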
@clarkb:matrix.orgbut I'm still not seeing anything concerning so this is encouraging15:32
-@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org proposed: [zuul/zuul-jobs] 990392: Add a nox-py314 job https://review.opendev.org/c/zuul/zuul-jobs/+/990392 15:42
@clarkb:matrix.orgmnasiadka: ok left some thoughts but I think the greptimedb change looks good (just some comments around default var values, which are not super critical, and some thinking ahead to provisioning a swift container to be used via the s3 api)15:47
@mnasiadka:matrix.orgSure, that makes sense - I haven't really analysed how secrets are managed in system-config15:48
@clarkb:matrix.orgmnasiadka: we try to keep as much stuff public as possible directly in the system-config repo under inventory/ in the various host vars and group vars files15:49
@clarkb:matrix.orgthen for things that can't be public there is a shadow set of private vars on bridge that we manage there15:49
@fungicide:matrix.orgwith dummy values in the public repository for test purposes15:56
@fungicide:matrix.orgso that our tests are still using the same playbooks and only the bare minimum variable contents differ15:57
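A simplified model of that layout, assuming a plain "later dict wins" merge (real Ansible variable precedence has many more levels). The variable names and values are hypothetical; the point is that production and tests share the public values and differ only in the secrets overlay.

```python
from typing import Any, Dict

# Hypothetical variables, for illustration only.
PUBLIC_VARS: Dict[str, Any] = {"example_service_port": 4000}  # public repo
PRIVATE_VARS: Dict[str, Any] = {"example_service_password": "<real secret, bastion only>"}
TEST_VARS: Dict[str, Any] = {"example_service_password": "dummy-test-password"}


def effective_vars(public: Dict[str, Any], overlay: Dict[str, Any]) -> Dict[str, Any]:
    """Later dict wins: a big simplification of Ansible's variable precedence."""
    return {**public, **overlay}
```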
@clarkb:matrix.orgyup this particular change sidesteps that by using a file store for greptimedb in test16:01
@clarkb:matrix.orgwhich I think is reasonable so that we don't have to run a swift with s3 middleware just to test this16:02
@fungicide:matrix.orgClark: with the git-review gerrit 3.13 testing, i realized we also test with different jdks, should we stick with openjdk-21 or try running on 25?16:05
@clarkb:matrix.orgfungi: I think java 21 remains the required runtime for modern gerrit16:06
@clarkb:matrix.orgI don't recall seeing any updates about java 25 (or any other version)16:06
@fungicide:matrix.orgokay, i'll just stick with that then16:06
-@gerrit:opendev.org- Jeremy Stanley https://matrix.to/#/@fungicide:matrix.org proposed: [opendev/git-review] 990377: Integration testing Gerrit 3.13 and Python 3.14 https://review.opendev.org/c/opendev/git-review/+/990377 16:08
@clarkb:matrix.orgI checked /var/log/borg-backup-backup03.ca-ymq-1.vexxhost.opendev.org.log on etherpad02 and it reports successful backups.16:26
@clarkb:matrix.orgI'm running out of things to check short of actually browsing a backup. mnasiadka ^ feel free to double check these items that i checked as well16:27
@clarkb:matrix.orgfungi: when you get a chance maybe you can review https://review.opendev.org/c/opendev/system-config/+/990190 as that should fix the changes that will allow mixed nodesets for arm testing16:37
@fungicide:matrix.orgsure, just a sec16:37
@mordred:waterwanders.comif anyone has a sec, https://review.opendev.org/c/openstack/project-config/+/989417 would be really handy. I'd like to use drizzle as the source of examples for the ai+zuul talk I'm working on. I'll get to tell the "openstack dev process was birthed in the drizzle project" story, which will be lovely and recursive16:53
@mordred:waterwanders.comalso - because I know you're all dying to know this - I've got it modernized and building/passing tests on resolute. so no further need for precise containers :)16:54
@fungicide:matrix.orgmordred: the commit message there needs wider distribution, i hope this is the start of an article ;)16:57
@mordred:waterwanders.comfungi: I'll be sure to subscribe you to my newsletter. I'm going to take that as consent.16:59
@fungicide:matrix.orghmm, your ideas are intriguing to me and i wish to subscribe to your newsletter but i think we have to go to the retreat anyway17:01
@fungicide:matrix.orgmordred: just to confirm, you only want to import code for the drizzle repo, the other 5 will start out ~empty?17:03
@mordred:waterwanders.comfungi: yeah. pre-existing state of the others is a mess not worth cleaning up directly17:04
@fungicide:matrix.orgcool, just making sure17:04
@mordred:waterwanders.comturns out - bzr -> git translation got worse. thankfully stewart had already done the main one a few years ago.17:05
@clarkb:matrix.orghow big of a repo is that? Does it have all of the mysql history in it too?17:06
@clarkb:matrix.org(mostly wondering if that will suddenly be our biggest repo)17:06
@fungicide:matrix.orgi'm not surprised that sort of thing bitrots, since most projects that were going to move from $othervcs to git did so years ago17:06
@mordred:waterwanders.com22100 commits ... I don't know if that would be all of the mysql history ... but _maybe_ it has it?17:07
-@gerrit:opendev.org- Zuul merged on behalf of Clark Boylan: [opendev/system-config] 990190: Fix order of operations during adns bind installation https://review.opendev.org/c/opendev/system-config/+/990190 17:08
@fungicide:matrix.orgi cloned it and then checked out all branches and tags, total local size is 105M according to `du -sh`17:09
@mordred:waterwanders.comoh - you'll find this fun (I'd forgotten we'd done this) - predating jjb or anything like that, our hudson jobs didn't actually put the logic into the jobs themselves. we had a drizzle-automation repo with a bunch of python programs that implemented the job content, and the hudson jobs were mostly just calling them. So - the various integration test and stress test job logic persists and should be re-creatable as zuul jobs without too much effort \o/17:09
-@gerrit:opendev.org- Clark Boylan proposed: [opendev/system-config] 990531: Update etherpad from 2.7.3 to 3.2.0 https://review.opendev.org/c/opendev/system-config/+/990531 17:09
@clarkb:matrix.orgfungi: ah ok so not terribly huge. I wonder if there was a pruning at fork time17:10
@clarkb:matrix.orgalso ^ that is a first attempt at a gerrit upgrade17:10
@clarkb:matrix.orger etherpad upgrade17:10
@fungicide:matrix.orgby comparison, `/home/gerrit2/review_site/git/openstack/nova.git` on review03 is currently 1.3G17:10
@mordred:waterwanders.comhats off to us for not historically vendoring in a bunch of crap17:11
@mordred:waterwanders.comClark: I love the npm -> yarn -> pnpm saga17:11
@fungicide:matrix.orgthings were much smaller back in ye olden times, needed to fit it on as few floppies as possible17:11
@clarkb:matrix.orgmordred: ya.... I decided not to dig through their git logs on that one :)17:12
@mordred:waterwanders.comClark: left a comment - did you mean to add a commented out section?17:12
@clarkb:matrix.orgmordred: I did actually. I tried to clarify in a response on gerrit17:15
@mordred:waterwanders.comClark: AH - yeah, ok, that makes sense.17:20
-@gerrit:opendev.org- Roja Eswaran proposed: [openstack/diskimage-builder] 987717: Replace debootstrap with mmdebstrap https://review.opendev.org/c/openstack/diskimage-builder/+/987717 17:23
@jim:acmegating.comdrizzle has 2x+2 should i send it or wait for more reviews?17:43
@fungicide:matrix.orgit seemed like Clark might be looking at it, but if he's not planning to then should be able to just approve17:43
@jim:acmegating.comi'll wait for Clark to say "give more time" or "whatever" :)17:45
@clarkb:matrix.orgOh sorry I think you can proceed 17:50
@clarkb:matrix.orgI was mostly just wondering if the repo was going to be a new test case for Gerrit in terms of size but seems not17:51
-@gerrit:opendev.org- Roja Eswaran proposed: [openstack/diskimage-builder] 987717: Replace debootstrap with mmdebstrap https://review.opendev.org/c/openstack/diskimage-builder/+/987717 17:51
@jim:acmegating.comyeah, good idea to check.  didn't want to approve if you were spelunking.17:54
-@gerrit:opendev.org- Roja Eswaran proposed: [openstack/diskimage-builder] 987717: Replace debootstrap with mmdebstrap https://review.opendev.org/c/openstack/diskimage-builder/+/987717 18:04
-@gerrit:opendev.org- Zuul merged on behalf of Monty Taylor https://matrix.to/#/@mordred:inaugust.com: [openstack/project-config] 989417: Import drizzle revival and add a bindep-rs support project https://review.opendev.org/c/openstack/project-config/+/989417 18:06
@fungicide:matrix.orginfra-prod-manage-projects failed in deploy for that, i'm looking into the cause18:21
@clarkb:matrix.orgthanks. I'm following up with 989022 now that the adns task reordering is done18:22
@clarkb:matrix.orgbut I'm also going to pop out for lunch in the nearish future18:25
@fungicide:matrix.orgthis is how manage-projects failed... cryptic: https://paste.opendev.org/show/bBQcHfBvHV1qkbF4YxNw/ 18:26
@clarkb:matrix.orgfungi: check the gerrit error log or replication log?18:27
@clarkb:matrix.orgfungi: if I had to guess replication failed because the repo wasn't properly created on one or more of the giteas maybe?18:27
@clarkb:matrix.orgbut I think what happened there is jeepyb tried to push the repo content into gerrit which triggers replication. Then replication fails for $reason and the push is rejected18:28
@fungicide:matrix.orgyeah, `... Internal server error (user openstack-project-creator account 6199) during replication start drizzle/dbd-drizzle ...`18:29
@clarkb:matrix.orghead appears to be main in both gitea10 and gerrit so I think we got that aligned properly (and this isn't the first repo to have a non-master default head)18:30
@fungicide:matrix.orgsadly the backtrace doesn't help me18:31
@fungicide:matrix.org`org.kohsuke.args4j.IllegalAnnotationError: No OptionHandler is registered to handle interface java.util.Set`18:31
@fungicide:matrix.orgthat seems like a secondary exception18:32
@clarkb:matrix.orglooking in manage_projects there is a gerrit.replicate() call. It looks like from that traceback that it is complaining about arguments to the replicate command18:33
@clarkb:matrix.orgis it possible that gerrit.replicate is what failed and not the push?18:33
@fungicide:matrix.orgthe backtrace does have references to the commandline parser18:34
@clarkb:matrix.orgso I see content in gitea18:34
@clarkb:matrix.orgwhich implies that we pushed something and something replicated. My hunch is that the command to explicitly replicate the project failed because gerritlib is passing some invalid args now or something18:35
@clarkb:matrix.orgbut then general replication catches things up?18:35
@fungicide:matrix.orgthe most recent project creation run seems to have been for https://review.opendev.org/c/openstack/project-config/+/987373 which happened on 2026-05-07 18:35
@clarkb:matrix.orgwe upgraded from gerrit 3.12.6 to 3.12.7 on May 15 18:36
@clarkb:matrix.orgso maybe that point release changed the replication plugins arg parsing?18:37
@fungicide:matrix.orgyeah, you wouldn't expect a patch level upgrade to regress something like that18:37
@clarkb:matrix.orghttps://gerrit.googlesource.com/plugins/replication/+log/refs/heads/stable-3.13 this does show there is a difference between replication 3.12.6 and 3.12.7 so this is my hunch18:37
@clarkb:matrix.orghttps://gerrit.googlesource.com/plugins/replication/+/24b0f4b53a6cd48295a3aa78f57aa4d59531b639 is the likely issue18:38
@clarkb:matrix.orgthough that is a hunch based on it adding/changing arguments18:39
@clarkb:matrix.organd it added a Set which shows up in the traceback18:40
@jim:acmegating.comwe've seen significant patch level changes before18:40
@jim:acmegating.comalso i am not about to cast the first stone there :)18:40
@fungicide:matrix.orgso odds are it's coming from gerritlib doing a `replication start drizzle/dbd-drizzle`18:42
@clarkb:matrix.orgso anyway I think manage_projects was successful up to the point where it failed to trigger the replication in gerrit. I think we should check if there are any steps that happen after replication that need to be done or maybe this is good enough for now? and then separately figure out what if anything we can do to fix the replication problem18:42
@clarkb:matrix.orgfungi: yes18:42
@fungicide:matrix.orgyeah, i think the replication plugin is broken...18:44
@clarkb:matrix.orgthe dependency injection really makes these tracebacks completely useless18:44
@fungicide:matrix.org`$ ssh fungi.admin@review -p 29418 'replication start opendev/bindep'`18:44
@fungicide:matrix.org`fatal: internal server error`18:44
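fungi's reproduction can be wrapped for repeat checks once the plugin is fixed. This sketch only builds the argv and classifies the result; `replication_start_argv` and `replication_failed` are hypothetical helpers, and treating either a nonzero exit or a `fatal:` line on stderr as failure is an assumption rather than documented Gerrit behavior.

```python
import shlex
from typing import List


def replication_start_argv(user: str, project: str,
                           host: str = "review.opendev.org", port: int = 29418) -> List[str]:
    """argv for running the replication plugin's `replication start <project>` over Gerrit's SSH API."""
    return ["ssh", "-p", str(port), f"{user}@{host}",
            f"replication start {shlex.quote(project)}"]


def replication_failed(returncode: int, stderr: str) -> bool:
    """Assumed failure signal: nonzero exit, or any stderr line starting with 'fatal:'."""
    return returncode != 0 or any(
        line.startswith("fatal:") for line in stderr.splitlines())
```

Run the argv with `subprocess.run(argv, capture_output=True, text=True)` and pass the result's `returncode` and `stderr` to `replication_failed`.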
@clarkb:matrix.orgI'll start a thread on Gerrit's Discord18:45
@fungicide:matrix.orgi'm tracing the rest of the manage-projects script and playbook18:45
@clarkb:matrix.orgthanks18:45
@jim:acmegating.commordred: has not lost the magic touch ;)18:46
@clarkb:matrix.orghttps://gerrit-review.googlesource.com/c/plugins/replication/+/579801 I think they fixed it but haven't made a release18:47
@fungicide:matrix.orgokay, so the good news is that the `gerrit.replicate(project)` call is essentially the last thing the manage-projects script will do, aside from error reporting and some cleanup18:50
@jim:acmegating.comdo we only trigger replication on import?  and then if we have a gitea server oops we manually trigger it?18:50
@jim:acmegating.com(wondering if we can just live with the bug for a bit)18:51
@fungicide:matrix.orgalso, running manage-projects is the last task in the manage-projects playbook18:51
@fungicide:matrix.orgso in theory all's well and we just need to get the replication plugin working18:52
@clarkb:matrix.orgthat fix has been backported to 3.11 but not 3.12 and 3.13. I have requested that get done. Then we can build new gerrit images that build replication from stable-3.12 and stable-3.13 instead of v3.12.7 and v3.13.6 18:53
@clarkb:matrix.orgalternatively I can update our images to manually patch that fix into our builds18:54
@clarkb:matrix.orgI need to head out for lunch now. But hopefully upstream will have some thoughts by the time I get back and we can decide on a reasonable approach from there. Then maybe do a gerrit update tomorrow to fix it18:55
@fungicide:matrix.orgi don't think it's urgent, mainly it's broken the command-line parser for the plugin, seems like, but actual configured replication is still working18:55
@clarkb:matrix.orgyes, but we do sometimes need to trigger replication and now don't have a way to do that18:55
@fungicide:matrix.orgso yes it would be nice to get fixed soon in case we need to force replication for anything, but it's not left us with a serious problem yet18:55
@clarkb:matrix.orgI also asked if there was another working method for triggering replication re ^18:56
@clarkb:matrix.orgluca is going to push the merge ups to stable 3.12 and stable 3.13 now. In theory once those land we can rebuild images without local patching18:57
@jim:acmegating.comthat sounds like a plan.   i think we can coast for a day or so.18:58
@jim:acmegating.comso maybe if it hasn't happened by next week it's worth putting some local effort into it18:58
@jim:acmegating.com(and i'm assuming if we have a single gitea server oops, we can probably take it out of service for a bit)18:59
@clarkb:matrix.orgcool I'm off to lunch now. Back in a bit18:59
@fungicide:matrix.orgyeah, we may still have some cleanup needed, looking at the present state of gerrit the drizzle-core group seems to have not gotten created, which i think means the acls were never pushed18:59
@fungicide:matrix.orgbut probably the next step is to restart gerrit onto a fixed replication plugin and then manually rerun manage-projects and see if it's self-correcting19:00
@fungicide:matrix.organd if not, then step through the missing bits by hand19:01
@clarkb:matrix.orgIf replication happens last any idea why the acls failed?19:01
@fungicide:matrix.orgnot sure yet, still trying to figure that out19:02
@clarkb:matrix.orgI think there is logging for when it believes it needs to push acls and for when it pushes them19:02
@fungicide:matrix.orgyeah, https://opendev.org/opendev/jeepyb/src/branch/master/jeepyb/cmd/manage_projects.py#L169-L189 does a bit of logging around it19:04
@fungicide:matrix.orgthe `main()` function logs when it doesn't need to process acls, but not when it does19:08
@fungicide:matrix.orgthough the `run_command()` function also logs all the commands that are executed19:10
@clarkb:matrix.orghttps://opendev.org/opendev/jeepyb/src/branch/master/jeepyb/cmd/manage_projects.py#L575 happens after the replicate on line 554 19:10
@clarkb:matrix.orgSo I think that explains it19:10
@fungicide:matrix.orgoh! i was looking at the replicate on line 591 but i guess that's just for the old github mirroring19:12
@fungicide:matrix.orgso yes, the only thing that we should have run in the script after the replication on line 555 (if it had succeeded) would be the acl config push19:12
@fungicide:matrix.orgso acls did not get set on these repos yet, and new groups did not get implicitly created either19:13
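The failure mode here is plain sequential control flow: once the replicate step raises, nothing after it runs. A minimal sketch with hypothetical step names (not jeepyb's actual functions); in the real script the exception surfaces as the failed Ansible task rather than being returned.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[], None]]


def run_steps(steps: List[Step]) -> Tuple[List[str], Optional[str]]:
    """Run steps in order, stopping at the first exception.

    Returns (completed step names, name of the failed step or None).
    """
    completed: List[str] = []
    for name, fn in steps:
        try:
            fn()
        except Exception:
            return completed, name  # every step after this one is skipped
        completed.append(name)
    return completed, None


def failing_replicate() -> None:
    raise RuntimeError("fatal: internal server error")


# Hypothetical step names mirroring the order discussed above.
steps: List[Step] = [
    ("push_to_gerrit", lambda: None),
    ("replicate", failing_replicate),
    ("push_acls", lambda: None),
]
```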
@fungicide:matrix.organyway, hopefully m-p will dtrt when we rerun it with the replication plugin fixed, otherwise i'll do the config pushes by hand19:14
@clarkb:matrix.orgI think we need to check that since it may have cached that things are done and skip it. So we may have to unset that flag in the cache. But also is it a problem to resync from the source git repos (it may actually skip that since the repos exist)19:15
@fungicide:matrix.orgyes, i'm unsure if it will have decided that's actually done even though it isn't, but at this point rerunning with the fixed plugin will take less time than revisiting the script in depth19:17
-@gerrit:opendev.org- Roja Eswaran proposed: [openstack/diskimage-builder] 987717: Replace debootstrap with mmdebstrap https://review.opendev.org/c/openstack/diskimage-builder/+/987717 19:44
@clarkb:matrix.orghttps://gerrit-review.googlesource.com/c/plugins/replication/+/588301 and https://gerrit-review.googlesource.com/c/plugins/replication/+/588302 are the changes we need for 3.12 and 3.13 respectively20:23
@clarkb:matrix.orgI'll go ahead and push a change up to swap the replication plugin over to the stable branch now. But landing that won't be helpful until those changes merge20:24
-@gerrit:opendev.org- Clark Boylan proposed: [opendev/system-config] 990570: Build gerrit replication plugin using stable branches https://review.opendev.org/c/opendev/system-config/+/990570 20:28
@clarkb:matrix.orgthere that change has staged things for us20:28
@mordred:waterwanders.comwhat's this? I broke things? \o/20:35
@mordred:waterwanders.com(reading)20:35
@clarkb:matrix.orgyou discovered that gerrit broke things in a totally unexpected way :)20:37
@mordred:waterwanders.comyay, I'm helpful! :) 20:40
@mordred:waterwanders.comalso - wow, what a fun backscroll20:40
-@gerrit:opendev.org- Pierre Riteau proposed: [opendev/irc-meetings] 990572: Move Blazar meeting one hour earlier (DST) https://review.opendev.org/c/opendev/irc-meetings/+/990572 20:53
@clarkb:matrix.orgcorvus: fungi: there is a `gerrit-replicate` config that we can set somewhere based on my read of the jeepyb code to disable replication20:54
@clarkb:matrix.orgthat may be another workaround if this goes longer than a short period of time. Then we can still create new projects without the explicit replicate step20:54
@mordred:waterwanders.comlooking at it - I think the repo would have been pushed to gerrit, which will set the pushed-to-gerrit cache key, which will skip the gerrit replicate call next time. And then acls got skipped because replicate failed, but the acls shouldn't be listed for success in the cache, so next m-p _should_ re-run acls.  Best I can tell re-running m-p should fix up the acls and be happy, but replication will definitely need a manual re-kick20:55
@mordred:waterwanders.com`GERRIT_REPLICATE = registry.get_defaults('gerrit-replicate', True)`20:56
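mordred's reading can be modeled as a per-project cache plus the `gerrit-replicate` flag. This is a hypothetical sketch, not jeepyb's code: the `pushed-to-gerrit` key name comes from the discussion, while `acls-pushed` and the function signature are invented for illustration. Because the push is recorded before replicating, a rerun skips both push and replicate and only retries the ACLs, which is why replication would still need a manual kick.

```python
from typing import Callable, Dict


def sync_project(cache: Dict[str, bool], gerrit_replicate: bool,
                 push: Callable[[], None], replicate: Callable[[], None],
                 push_acls: Callable[[], None]) -> None:
    """Hypothetical model of one manage-projects pass for a single project."""
    if not cache.get("pushed-to-gerrit"):
        push()
        cache["pushed-to-gerrit"] = True  # recorded before replication is attempted
        if gerrit_replicate:
            replicate()  # an exception here skips the ACL step below
    if not cache.get("acls-pushed"):
        push_acls()
        cache["acls-pushed"] = True
```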
@clarkb:matrix.orgGot it. I think replication actually mostly happened automatically too. The manual trigger is belts and suspenders 20:58
@mordred:waterwanders.comoh yeah - I agree with you ... except weirdly for the main branch21:00
@mordred:waterwanders.comall the tags are there: https://opendev.org/drizzle/drizzle/src/tag/7.2.24/ - but https://opendev.org/drizzle/drizzle/src/branch/main/ shows a README.md21:01
@clarkb:matrix.orgAny idea where that originated? Like is that an old commit on the main branch? Or did Gerrit auto create that ?21:02
@fungicide:matrix.orgi think gitea creates that by default for empty repos21:04
@fungicide:matrix.orgthen we push over it21:04
@clarkb:matrix.orgAha21:06
@clarkb:matrix.orgYa I think you are right21:06
@mordred:waterwanders.comand maybe background gerrit isn't pushing over that21:09
@clarkb:matrix.orgif you clone from gerrit you get `warning: remote HEAD refers to nonexistent ref, unable to checkout` so I think the push to gerrit also failed21:09
@mordred:waterwanders.comWEIRD21:09
@mordred:waterwanders.comI'm helping to uncover all sorts of bugs today21:09
@clarkb:matrix.orgwhich maybe was hidden in all the logs for the other failures?21:09
@clarkb:matrix.organd that is why main didn't get replicated21:09
@clarkb:matrix.orgfungi: ^ not sure if you noticed anything around that21:10
-@gerrit:opendev.org- Clark Boylan proposed: [opendev/system-config] 840972: DNM force etherpad failure to hold node https://review.opendev.org/c/opendev/system-config/+/840972 21:11
@clarkb:matrix.orgthe etherpad change passed tests and screenshots look good so the next step is holding a node and checking the held node21:12
-@gerrit:opendev.org- Monty Taylor https://matrix.to/#/@mordred:inaugust.com proposed: [opendev/system-config] 990575: Disable replication in manage-projects https://review.opendev.org/c/opendev/system-config/+/990575 21:17
@mordred:waterwanders.comjust in case you want to do that21:17
@mordred:waterwanders.comClark: so, if gerrit doesn't have the main ref but does have the 7.2.24 ref, then it does have pretty much all of the refs themselves. ... oh weird, found what I think is a bug21:34
@mordred:waterwanders.comClark: https://opendev.org/opendev/jeepyb/src/branch/master/jeepyb/utils.py#L146 21:34
@mordred:waterwanders.comnm. I was reading it wrong21:36
@mordred:waterwanders.comI'm doubly confused about why it has the tags but not the main branch tip: https://opendev.org/opendev/jeepyb/src/branch/master/jeepyb/cmd/manage_projects.py#L368-L369 ... it does: 21:44
```
u.git_command(repo_path, push_string % remote_url, env=ssh_env)
u.git_command(repo_path, "push --tags %s" % remote_url, env=ssh_env)
```
Where push_string should be: `push %s +refs/copy/heads/*:refs/heads/*`
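How that refspec maps refs can be checked in isolation. A simplified sketch of git's single-`*` glob matching (real git also validates ref names and supports more refspec forms); `map_ref` and `pushed_refs` are hypothetical helpers. It also shows why tags need the separate `push --tags`: they do not match `refs/copy/heads/*`, and an empty refs/copy/heads/ namespace leaves the glob with nothing to push.

```python
from typing import Dict, Iterable, Optional

PUSH_STRING = "push %s +refs/copy/heads/*:refs/heads/*"


def map_ref(refspec: str, ref: str) -> Optional[str]:
    """Destination ref for `ref` under a one-glob refspec like 'src/*:dst/*', else None."""
    src, dst = refspec.lstrip("+").split(":", 1)  # '+' only permits non-fast-forward updates
    prefix, suffix = src.split("*", 1)
    if len(ref) < len(prefix) + len(suffix):
        return None
    if not (ref.startswith(prefix) and ref.endswith(suffix)):
        return None
    matched = ref[len(prefix):len(ref) - len(suffix)]
    return dst.replace("*", matched, 1)


def pushed_refs(refspec: str, local_refs: Iterable[str]) -> Dict[str, str]:
    """Map each matching local ref to its destination; non-matching refs are not pushed."""
    out: Dict[str, str] = {}
    for ref in local_refs:
        dest = map_ref(refspec, ref)
        if dest is not None:
            out[ref] = dest
    return out
```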
@clarkb:matrix.orgya I am trying to find the local repo that it pushes from but not finding it yet21:46
@clarkb:matrix.orgmaybe I should look at the manage projects log instead21:46
@clarkb:matrix.orgmordred: https://paste.opendev.org/show/bEdKVOKKuw6L6BRNCJw9/ this is what it did from start to finish for drizzle/drizzle21:49
@clarkb:matrix.org/opt/lib/jeepyb/drizzle/ is empty so maybe it cleaned up after itself21:49
@mordred:waterwanders.comwow. that shows no issues pushing the heads21:49
@clarkb:matrix.orgya I wonder now if the problem is the acls21:50
@mordred:waterwanders.com(and I double-checked, nothing seems to have changed with that syntax, like a git upgrade or anything)21:50
@clarkb:matrix.orglike maybe our default acls don't handle main but do handle master or something so main isn't visible as a ref?21:50
@mordred:waterwanders.comoh. weird.21:50
@clarkb:matrix.orgthis isn't the first repo using main though and I don't recall them having any issues21:50
@clarkb:matrix.organd we replicate before setting up acls so I'm really confused21:50
@clarkb:matrix.orgbut that is my next best guess as to what happened there21:51
@clarkb:matrix.orglike maybe the repo is fine but we can't see it21:52
@clarkb:matrix.orgreview_site/git/drizzle/drizzle.git/refs/heads and review_site/git/drizzle/drizzle.git/branches are both empty though so maybe that is wishful thinking21:53
@clarkb:matrix.orgI would expect there to be something in there that explains what main is21:53
@mordred:waterwanders.comAll-Projects just does `[access "refs/*"]` and `[access "refs/heads/*"]`21:53
@mordred:waterwanders.comno mention of main or master anywhere21:54
@clarkb:matrix.orgnothing for main in packed-refs either21:55
@clarkb:matrix.orgso I think the issue is in the repo not acls again21:55
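What clarkb is checking can be scripted against the bare repo on the Gerrit server. A minimal sketch that parses a `packed-refs` file (a header line starting with `#`, `<sha> <refname>` entries, and `^<sha>` peeled-tag lines) and splits branches from tags; the helpers are hypothetical, and a full check would also list loose refs under refs/heads/, which this does not do.

```python
from typing import Dict, List, Tuple


def parse_packed_refs(text: str) -> Dict[str, str]:
    """Map refname -> sha from the contents of a packed-refs file."""
    refs: Dict[str, str] = {}
    for line in text.splitlines():
        if not line or line.startswith("#") or line.startswith("^"):
            continue  # header, or the peeled commit of the preceding annotated tag
        sha, name = line.split(" ", 1)
        refs[name] = sha
    return refs


def branches_and_tags(refs: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Split refnames into sorted branch and tag lists."""
    heads = sorted(r for r in refs if r.startswith("refs/heads/"))
    tags = sorted(r for r in refs if r.startswith("refs/tags/"))
    return heads, tags
```

Feeding it the contents of review_site/git/drizzle/drizzle.git/packed-refs would, per the checks above, yield tags but an empty branch list.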
@mordred:waterwanders.comClark: could the fsck --full have deleted the heads between the fetch and the push?21:57
@mordred:waterwanders.comif it did, then there would be no heads to push and the push line would silently succeed21:58
@clarkb:matrix.orgI was just testing that locally and my local git client did not do that21:58
@clarkb:matrix.orgbut maybe the git client on review found something it didn't like and ya pruned more than we expected?21:58
@clarkb:matrix.orgmordred: dbd-drizzle was the first repo to go and failed first. I'm wondering if there is some side effect on the jeepyb side due to the replication error that is preventing things from being handled properly there?22:00
@mordred:waterwanders.comyeah. I just did the same test22:00
@clarkb:matrix.orgbut you can see it takes an appreciable amount of time to push both the refs and the tags22:01
@clarkb:matrix.orgas if its doing real work22:01
@clarkb:matrix.orgjust as a quick sanity check I checked that gerrit has disk space and it has plenty22:01
@mordred:waterwanders.comyeah. that sure does look like a real push22:01
@clarkb:matrix.org(in case we had failed writes due to running out of disk space or inodes, but both look fine)22:01
@mordred:waterwanders.comin my local recreation I get this:22:02
```
git for-each-ref refs/copy/
269bf1ede95c17736f45d7fa7cdc2172a677892a commit	refs/copy/heads/drizzle-7.0
3873683eeb4cf650f9cdceece80eca10ca888f23 commit	refs/copy/heads/drizzle-7.1
b4de3d25e752315ef7fd020efb3f14ef62bc151d commit	refs/copy/heads/drizzle-7.2
51e535c15ba38fb5c09bd101a374bba14d7c3487 commit	refs/copy/heads/main
```
@mordred:waterwanders.comso it's seeing all of them in refs/copy 22:03
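Comparing that local listing with what the server ended up with is a set difference. A sketch assuming the default `git for-each-ref` format (`<sha> <type>`, a tab, then the refname) and the `refs/copy/heads/* -> refs/heads/*` mapping from jeepyb's push; the helper names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def parse_for_each_ref(text: str) -> Dict[str, Tuple[str, str]]:
    """Parse default `git for-each-ref` output: '<sha> <type>\\t<refname>' per line."""
    refs: Dict[str, Tuple[str, str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        meta, refname = line.split("\t", 1)
        sha, objtype = meta.split(" ", 1)
        refs[refname] = (sha, objtype)
    return refs


def missing_heads(copy_refs: Iterable[str], server_heads: Set[str]) -> Set[str]:
    """Heads that refs/copy/heads/* should have produced but the server lacks."""
    prefix = "refs/copy/heads/"
    expected = {"refs/heads/" + r[len(prefix):] for r in copy_refs if r.startswith(prefix)}
    return expected - server_heads
```

With the four refs/copy heads listed above and no branches on the Gerrit side, every expected head comes back as missing.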
@clarkb:matrix.orgjeepyb does clean up its local copy fwiw22:03
@mordred:waterwanders.com(which we'd expect)22:03
@clarkb:matrix.orgso the repo not existing on review is expected (but makes it more difficult to debug)22:03
@mordred:waterwanders.comyeah - I feel like it does that on failure so that it'll try again next time, right?22:03
@clarkb:matrix.orgya that could be though I think it checks if the repo is in gerrit and skips some steps if so? This might be a case of manual fixup and then monitor to see if it happens again22:06
@mordred:waterwanders.comyeah. I mean - a manual push is straightforward enough - but this is _weird_ and it would be nice to understand why it broke22:08
@mordred:waterwanders.comlike - from what I can see, it shouldn't be broken22:08
@clarkb:matrix.orgagreed. But I'm running out of clues in the system to debug from. Really that manage projects paste log has been the most useful but unfortunately doesn't explain anything22:08
@clarkb:matrix.orgpush_to_gerrit logs exceptions if the pushes fail too. So the log indicates it thinks that it succeeded22:09
@mordred:waterwanders.comClark: want an even weirder thing? there are a few refs past the latest tag. If I fetch those refs by sha directly from gerrit - they are there:22:12
```
$ git fetch ssh://mordred@review.opendev.org:29418/drizzle/drizzle 51e535c15ba38fb5c09bd101a374bba14d7c3487
From ssh://review.opendev.org:29418/drizzle/drizzle
* branch 51e535c15ba38fb5c09bd101a374bba14d7c3487 -> FETCH_HEAD
```
@mordred:waterwanders.comso ALL of the refs got pushed, which is super weird if there were no heads. my brain wants to go back to your thought of something with acls22:13
@mordred:waterwanders.comlike - for some reason do we not have view rights on refs/heads/ here?22:13
@clarkb:matrix.orgmordred: I think you should still be parented to all projects. But also when I ls review_site/git/drizzle/drizzle.git/refs there is nothing there22:15
@clarkb:matrix.organd there are only tags in packed-refs22:16
@clarkb:matrix.orgfwiw I have looked at the gerrit error_log and sshd_log around the time of the push. In the sshd_log I can see the login and push are recorded without obvious errors. And error_log has nothing drizzle in there that I see22:16
@clarkb:matrix.orgit isn't until the replication start stuff happens that error_log records anything amiss22:18
@clarkb:matrix.orgmaybe something happens afterwards that creates the problem?22:18
@clarkb:matrix.orgdid we put conflicting entries in the projects.yaml so we pushed over main after the initial setup? I don't think so because main has no ref/sha so it just never got created I think22:19
@clarkb:matrix.orgI don't see any such conflicts in projects.yaml22:25
@mordred:waterwanders.comClark: there isn't anything useful in gerrit's error log is there? like maybe it didn't like something on the backend but didn't return an error? (I'm stretching here)22:26
@clarkb:matrix.orgnot that I can see. The first errors show up when we try to replicate and hit the understood issue we worked through earlier22:26
@mordred:waterwanders.comI want to just blame bzr22:26
@clarkb:matrix.organd the sshd_log shows that it received the pushed code22:26
@clarkb:matrix.orgoooohhhhh22:26
@clarkb:matrix.orghonestly that makes me suspect the fsck again22:27
@mordred:waterwanders.comlike - we haven't imported a bzr converted git repo in a _very_ long time22:27
@clarkb:matrix.orglike maybe that particular version of git can't handle the conversion somehow and fsck removes refs?22:27
@mordred:waterwanders.comgranted, this has flowed through github, but who knows what's going on down in the bowels of that22:27
@clarkb:matrix.orgI think at this point it is likely that we'll be trying manual pushes and validating along the way and see if anything can be debugged from that?22:28
@clarkb:matrix.orgbut we should fix replication before we do that22:28
@mordred:waterwanders.comyeah. I agree. I don't think we're going to figure out what went wrong with this particular one at this point22:28
@mordred:waterwanders.commaybe something will fail during a manual operation next time and it'll give us new debug22:29
@mordred:waterwanders.comor maybe it'll work for no reason22:29
@clarkb:matrix.orgmordred: is the only missing ref main? all of the tags showed up?22:29
@mordred:waterwanders.comand we'll all go "wow, bzr, shrug"22:29
@mordred:waterwanders.comall the tags are there. none of the 4 branches are22:29
@mordred:waterwanders.com(don't really care about the other branches, but as a data point none of them are there)22:29
@clarkb:matrix.orgoh I see, in my local test copy drizzle-7.0, drizzle-7.1, and drizzle-7.2 should be there in addition to main22:30
@clarkb:matrix.orgbut ya, none are. So tags are fine. Branches are not22:30
@clarkb:matrix.orgfor follow-up corrective work, maybe we try pushing one branch at a time and checking as we go, to find where things go wrong22:30
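The one-branch-at-a-time approach could look roughly like the sketch below. It is a hypothetical Python helper, not existing OpenDev tooling: it pushes each branch with `git push <remote> <branch>:refs/heads/<branch>`, then checks with `git ls-remote --heads` that the server actually created the head, and stops at the first branch that goes missing. The remote URL and branch list are assumptions you would fill in; the push and verify steps are injectable so the loop can be exercised without a server.

```python
import subprocess


def git(*args):
    """Run a git command and return its stdout (raises on a non-zero exit)."""
    return subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    ).stdout


def push_branch(remote, branch):
    # Push the local branch to the same name under refs/heads on the remote.
    git("push", remote, f"{branch}:refs/heads/{branch}")


def remote_has_head(remote, branch):
    # ls-remote prints "<sha>\trefs/heads/<branch>" for each matching ref.
    out = git("ls-remote", "--heads", remote, branch)
    return any(line.endswith(f"\trefs/heads/{branch}") for line in out.splitlines())


def push_and_verify(branches, remote, push=push_branch, verify=remote_has_head):
    """Push branches one at a time and return the first one that fails to appear.

    Returns None if every branch shows up under refs/heads on the remote.
    """
    for branch in branches:
        push(remote, branch)
        if not verify(remote, branch):
            return branch
    return None
```

Stopping at the first missing head keeps the state on the server small and makes it easier to line up a failure with the matching entries in `sshd_log` and `error_log`.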
@mordred:waterwanders.comand all the refs are there - even refs not associated with a tag22:31
@clarkb:matrix.orgya so it pushed all the content but then didn't tie the heads into refs/heads properly22:31
@mordred:waterwanders.comyup22:31
@clarkb:matrix.orgwhich explains why the pushes took time22:31
@clarkb:matrix.orginfra-root https://gerrit-review.googlesource.com/c/plugins/replication/+/588301 and https://gerrit-review.googlesource.com/c/plugins/replication/+/588302 have merged so updating gerrit via https://review.opendev.org/c/opendev/system-config/+/990570 should do what we want now. Maybe we land that first thing tomorrow then restart gerrit after and then continue debugging the drizzle thing?22:36
@mordred:waterwanders.comClark: so ... refs/meta/config is completely empty for drizzle/drizzle - I know we hadn't added our acls, but doesn't 22:36
@mordred:waterwanders.comClark: so there's also no empty config there yet22:37
@clarkb:matrix.orgmordred: hrm so maybe this is all tied together. Like maybe pushing the acls causes some unexpected side effect for refs in general?22:37
@clarkb:matrix.orgI dunno, it's weird.22:38
@mordred:waterwanders.comwell - our acl pushing code fetches first22:38
@mordred:waterwanders.comand it has a poll expecting gerrit to have created an empty one22:38
@mordred:waterwanders.comhttps://opendev.org/opendev/jeepyb/src/branch/master/jeepyb/cmd/manage_projects.py#L10822:38
@mordred:waterwanders.com_then_ we copy our acls on top of the project.config file 22:39
@clarkb:matrix.orgoohhh and we even poll for it22:39
@clarkb:matrix.orgwe likely would've failed there, since if the ref doesn't exist our poll would time out22:40
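The failure mode discussed here, a poll that waits for Gerrit to create `refs/meta/config` and times out if the ref never appears, follows a generic wait-until-true pattern. This Python sketch illustrates only that pattern. It is not the jeepyb code linked above, and `wait_for`, `PollTimeout`, and the default timeout and interval are hypothetical.

```python
import time


class PollTimeout(Exception):
    """Raised when the condition never became true within the timeout."""


def wait_for(condition, timeout=30.0, interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Call condition() until it returns a truthy value or `timeout` seconds pass.

    Returns the truthy value. Raises PollTimeout if the deadline passes first.
    """
    deadline = clock() + timeout
    while True:
        result = condition()
        if result:
            return result
        if clock() >= deadline:
            raise PollTimeout(f"condition not met after {timeout}s")
        sleep(interval)
```

In the drizzle case, `condition` would be some check that `refs/meta/config` exists on the server. If Gerrit never created that ref, the check keeps failing and the poll ends in a timeout rather than a silent success.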
@clarkb:matrix.orgmordred: though actually you may not have access to refs/meta/config anymore22:40
@clarkb:matrix.orglet me check on disk22:40
@mordred:waterwanders.comoh, yeah. nod, good point22:41
@mordred:waterwanders.comI do for All-Projects22:41
@clarkb:matrix.org`review_site/git/drizzle/drizzle.git/refs/meta/config` does exist and points at some sha22:41
@mordred:waterwanders.combut you're right - not for other things - it's locked down in All-Projects22:41
@mordred:waterwanders.comok. red herring22:41
@clarkb:matrix.orgso I think this *is* an acl thing22:42
@clarkb:matrix.orgbut not one that explains why the other thing exploded22:42
@clarkb:matrix.orgso ya I think tomorrow we update gerrit to fix replication. Then we see where we end up from there with likely some manual intervention for those four branches in drizzle/drizzle22:43
@clarkb:matrix.org(mostly thinking tomorrow at this point because it will take an hour to gate the image updates, I'm likely the only one still paying attention at this hour, and we would still need to restart gerrit to pick up the changes)22:44
@mordred:waterwanders.comyeah. and ... maybe we'll learn something22:44
@mordred:waterwanders.comor - maybe it'll all just magic itself22:44

Generated by irclog2html.py 4.1.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!