| zigo | Hi. I want to rewrite the code that's doing this: | 08:10 |
|---|---|---|
| zigo | http://osbpo.debian.net/deb-status/ | 08:10 |
| zigo | Is there a better thing to do than parsing the HTML file from let's say https://releases.openstack.org/flamingo/index.html ? Is there a json output somewhere of all artifacts from a release for example ? | 08:10 |
| frickler | zigo: the source data for that page should be all yaml in https://opendev.org/openstack/releases/src/branch/master/deliverables/flamingo , maybe some of the release tooling https://opendev.org/openstack/releases/src/branch/master/openstack_releases/cmds can help you deal with it? | 09:45 |
| zigo | frickler: Thanks ! :) | 10:20 |
| zigo | frickler: Are you suggesting that my tool should just fetch the git and parse the yaml files? | 10:21 |
| opendevreview | yatin proposed zuul/zuul-jobs master: [WIP] Make fips setup compatible to 10-stream https://review.opendev.org/c/zuul/zuul-jobs/+/961208 | 10:30 |
| frickler | zigo: up to you, but that would at least sound simpler to me than trying to parse the html that is generated from those files | 10:33 |
| zigo | Sounds good, thanks. | 10:34 |
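A minimal sketch of the approach frickler suggests, assuming a local clone of openstack/releases and that the deliverable files follow the published schema (a `releases` list whose entries carry a `version` and a `projects` list); field names should be verified against the actual YAML before relying on this.

```python
# Sketch: list all release artifacts for a series from the deliverable YAML
# files in a local clone of openstack/releases (clone path is an assumption).
import glob

import yaml  # PyYAML

series = "flamingo"
for path in sorted(glob.glob(f"releases/deliverables/{series}/*.yaml")):
    with open(path) as f:
        deliverable = yaml.safe_load(f)
    for release in deliverable.get("releases", []):
        for project in release.get("projects", []):
            print(project["repo"], release["version"], project.get("hash", ""))
```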
| *** | gthiemon1e is now known as gthiemonge | 11:11 |
| vsevolod_ | Hello. I have a job, tox-py37, which started to fail with retry/retry_limit sometime between 9 Aug and 10 Sep. | 11:23 |
| vsevolod_ | https://zuul.opendev.org/t/openstack/builds?job_name=tox-py37&project=jjb/jenkins-job-builder | 11:23 |
| vsevolod_ | No logs are shown for it | 11:23 |
| vsevolod_ | How can I check what is wrong? Has something changed in the infrastructure in this period? | 11:23 |
| fungi | the mirror.ubuntu-ports volume move took just shy of 30 hours, i've got mirror.ubuntu moving now and that should hopefully be the last of them, so i ought to be able to upgrade afs02.dfw to noble tomorrow or wednesday finally | 12:43 |
| fungi | zigo: the openstack/releases repo also has tooling for parsing and filtering that data, so it may be just a few lines of code to make it output the exact information you're looking for | 12:48 |
| fungi | do you want *all* the artifacts for the release, or only the most recent version of each of them in the release? | 12:49 |
| zigo | fungi: I just wrote, with the help of AI, a tool that does what I needed ! :) | 12:50 |
| fungi | neat | 12:50 |
| zigo | It's the 2nd time I've written something with AI; it's very fast for quick and dirty code. :P | 12:50 |
| zigo | In this case, I don't really care about code quality, so that's ok. | 12:51 |
| fungi | well, if it ends up being a frequent task, we could probably add some feature to the existing openstack release tools, though that's more a conversation for the #openstack-release channel | 12:52 |
| fungi | vsevolod_: the ansible default in the zuul tenant where that's running increased to a version that won't run on ubuntu 18.04 lts (bionic), so the job either needs to be moved to a newer platform (i think ubuntu-focal also has python 3.7 packages?) or pinned back to `ansible-version: 9` | 13:04 |
| fungi | https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/thread/EUTEAOIUQZ6YJUQWD75QO34H5TDGBLKP/ | 13:04 |
| fungi | the lack of logs is a quirk of how ansible dropped support for older platforms, its execution never gets far enough to be able to generate a log, and since it's insta-failing in the pre-run phase zuul retries it thinking it might be an intermittent setup problem or broken test node, hence the retry_limit results | 13:08 |
| opendevreview | Jeremy Stanley proposed opendev/bindep master: Switch from license classifier to SPDX expression https://review.opendev.org/c/opendev/bindep/+/945416 | 13:21 |
| fungi | vsevolod_: digging deeper, ubuntu-focal doesn't have python3.7 packages (though does have 3.8), so setting `ansible-version: 9` on the tox-py37 job or dropping that job are your only options for now i think, and zuul will probably cease supporting ansible<11 soon so even that won't work for long | 13:51 |
| clarkb | yes, we intend on removing the bionic test platform as part of that process too | 14:45 |
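A hedged sketch of the pin fungi describes above, in Zuul job syntax; the parent job, nodeset label, and variables are assumptions and depend on how jenkins-job-builder actually defines tox-py37.

```yaml
# Sketch: keep tox-py37 on bionic (the last label with python3.7 packages)
# and pin the Ansible version Zuul runs the job with.
- job:
    name: tox-py37
    parent: tox
    nodeset: ubuntu-bionic
    ansible-version: 9
    vars:
      tox_envlist: py37
```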
| clarkb | my internet connection is experiencing packet loss somewhere beyond my personal home network (looks like between my isp and their next hop in california maybe). | 14:48 |
| clarkb | even my mobile provider seems to be struggling this morning so maybe it is more widespread | 15:09 |
| clarkb | and now things seem improved. Maybe I just needed to wait for someone to roll into work and reboot a router | 15:19 |
| clarkb | if things stay reliable then I'd like to consider rolling out gitea to the latest bugfix release: https://review.opendev.org/c/opendev/system-config/+/960675 | 15:19 |
| clarkb | fungi: other than the afs moves being slow (expected) anything else I should catch up on this morning? | 15:22 |
| fungi | i don't think so | 15:30 |
| fungi | and yeah, gitea upgrades next would be good | 15:30 |
| fungi | had some lengthy power outages this morning (high winds today), but seems stable enough now | 15:30 |
| fungi | i'm still busy getting noncritical stuff powered back up and clocks reset but about done | 15:31 |
| clarkb | my internets seem consistently happy at this point | 16:17 |
| stephenfin | fungi: I brought this up before (and clarkb explained why it was happening) but when you reply to a review with gertty, it doesn't use the (new'ish) comment system so there's no REPLY / QUOTE / ACK buttons. Probably one to fix at some point, assuming you ever contribute to that | 16:17 |
| clarkb | fungi: I think I'm happy for you to approve the gitea change (if you're happy with it) whenever you think your local network is up and happy | 16:17 |
| * stephenfin | has never used gertty | 16:17 |
| stephenfin | jfyi | 16:17 |
| stephenfin | (spotted on https://review.opendev.org/c/openstack/requirements/+/960843) | 16:18 |
| clarkb | stephenfin: you can manually reply/quote using email style quotes with > fwiw | 16:18 |
| stephenfin | that's what I've done | 16:18 |
| clarkb | but yes, some of fungi's comments also don't show up properly in the web ui comment thread views if you're looking for what has changed since then, so as a result it's not just the reporting side of things | 16:18 |
| clarkb | I think corvus does have a patch for this but it isn't reliable so hasn't been merged yet | 16:19 |
| fungi | yeah, i'm using that patch, but i think it only does that for inline comments at the moment | 16:20 |
| fungi | not for patchset level comments | 16:20 |
| fungi | sorry about that | 16:20 |
| opendevreview | Ivan Anfimov proposed openstack/project-config master: Add translation-jobs to masakari-dashboard https://review.opendev.org/c/openstack/project-config/+/961261 | 16:22 |
| opendevreview | Ivan Anfimov proposed openstack/project-config master: Add translation-jobs to masakari-dashboard https://review.opendev.org/c/openstack/project-config/+/961261 | 16:23 |
| clarkb | there is a fake file that the new style system leaves comments against for patchset level comments | 16:24 |
| clarkb | I wonder if it would be difficult to trick gertty into leaving comments on that "file" | 16:24 |
| fungi | i'm not positive it does file-level comments in the new style yet either, may just be line-level so far | 16:25 |
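For reference, a sketch of how a client could target that fake file through the Gerrit REST API: patchset-level comments live under the magic `/PATCHSET_LEVEL` path in the review input's comments map (the change number and credentials below are placeholders).

```sh
# Sketch: post a patchset-level reply by commenting on the magic
# /PATCHSET_LEVEL path via the set-review endpoint (placeholders for
# credentials and change number).
curl -s --user myuser:my-http-password \
  -X POST -H 'Content-Type: application/json' \
  -d '{"comments": {"/PATCHSET_LEVEL": [{"message": "> quoted text\n\nreply"}]}}' \
  https://review.opendev.org/a/changes/<change-number>/revisions/current/review
```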
| vsevolod_ | fungi: Back to tox-py37. I have tried all combinations of `nodeset: ubuntu-focal` and `ansible-version: 9` | 16:28 |
| vsevolod_ | Results are the same: `No package matching 'python3.7-dev' is available` | 16:28 |
| vsevolod_ | https://zuul.opendev.org/t/openstack/build/c3f4189b34d9431cb89f2d3e042aa859 | 16:28 |
| clarkb | vsevolod_: correct focal does not have bionic | 16:29 |
| clarkb | er sorry | 16:29 |
| clarkb | focal does not have python 3.7 but bionic does | 16:29 |
| clarkb | but that old version of python on bionic only works if you use ansible version 9 | 16:29 |
| clarkb | vsevolod_: is there a reason to keep python3.7 testing alive at this point or can it be removed? | 16:30 |
| zero__ | Not sure if there is anyone using it. | 16:33 |
| zero__ | It is the Jenkins Job Builder project. | 16:33 |
| clarkb | I have been removing python3.6 and 3.7 testing from a number of projects since the move, since it doesn't make sense in a lot of locations at this point due to age | 16:33 |
| clarkb | I think it is fair to tell people they need to run jjb with a newer version of python. Worst case they can run it in a container if they are on an older platform | 16:34 |
| zero__ | I see. I will remove it then. Thank you. | 16:34 |
| fungi | or run an older version of jjb that was tested with 3.7 | 16:38 |
| clarkb | fungi: I guess the other consideration for https://review.opendev.org/c/opendev/system-config/+/960675 is that we are deep into the openstack release process | 16:40 |
| clarkb | but if you're good with it so am I | 16:41 |
| fungi | it's actually a bit of a lull, most stuff is frozen, projects are figuring out what might make it into an rc2 | 16:41 |
| fungi | zuul isn't idle, but it's not particularly busy either | 16:42 |
| fungi | and this is a minor bugfix release for gitea | 16:44 |
| fungi | i don't see it as all that different from our weekly zuul upgrades, which we haven't paused either | 16:46 |
| fungi | between improvements to our testing and redundancy over the years, what "staying slushy" during releases means has changed for us, in my opinion | 16:47 |
| clarkb | wfm | 16:48 |
| clarkb | do you want to approve it or should I? | 16:48 |
| fungi | i can | 16:49 |
| clarkb | fungi: oh I meant to ask you if doing the vos releases after each rw move did end up addressing the disk space issues | 17:14 |
| clarkb | and then I guess we don't need to stop the vos release ssh tooling because it uses afs01 and afs01 is all done, it's just afs02 left | 17:14 |
| fungi | i think there may be somewhat of a delayed reaction for freeing space after a vos release, because the reason the mirror.ubuntu move had to be restarted is that it initially complained about insufficient space | 17:17 |
| fungi | but it was the only one this time | 17:17 |
| clarkb | could be that the backend cleanups run periodically and not instantly I guess | 17:17 |
| fungi | yeah, seems like something along those lines | 17:18 |
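For context, a rough sketch of the OpenAFS sequence being discussed, using the standard vos tooling; the server, partition, and volume arguments are placeholders rather than the exact commands that were run.

```sh
# Sketch: move a read/write volume to another fileserver, then release it so
# the read-only clones are refreshed (space on the old server may be freed
# lazily by the fileserver after the release completes).
vos move -id mirror.ubuntu -fromserver afs02.dfw.openstack.org -frompartition a \
         -toserver afs01.dfw.openstack.org -topartition a -localauth -verbose
vos release -id mirror.ubuntu -localauth -verbose
vos examine -id mirror.ubuntu -localauth
```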
| clarkb | then if you want to weigh in on timing for https://review.opendev.org/c/opendev/system-config/+/957555 and https://review.opendev.org/c/opendev/system-config/+/958597 I think that would be helpful. I want to land the change to update the versions first, then the quay.io source change second, and then restart gerrit on the result after both are done | 17:21 |
| clarkb | the reason for that is I don't want to build what is effectively a new 3.10.8 with the wrong plugin versions if we do the quay move first | 17:22 |
| clarkb | better to get the gerrit details right first, then the docker stuff. | 17:22 |
| clarkb | But with the openstack release I'm not sure what we're thinking about as far as timing goes | 17:22 |
| fungi | yeah, the bug fixes there look like stuff that probably doesn't affect any of our projects, except maybe the mobile ui one which is cosmetic | 17:28 |
| fungi | we're still 2 weeks away from openstack release week, shouldn't be too hard to find a quiet moment for a gerrit restart or two, and this update also doesn't seem like it has any real risk of serious regressions | 17:30 |
| fungi | (openstack 2025.2/flamingo release is two weeks from wednesday) | 17:30 |
| clarkb | I think we only need one restart, we just need to land those changes in order in quick succession. Or I can squash them together if we're worried | 17:31 |
| fungi | nah, just approving them close together and then restarting after they're done is fine by me | 17:32 |
| fungi | maybe even later today if nothing goes sideways with the gitea upgrades | 17:32 |
| fungi | ooh, i totally missed that https://review.opendev.org/c/zuul/zuul-jobs/+/958605 existed and merged, thanks clarkb! | 17:41 |
| fungi | i just started trying to write that change myself and noticed it was not defaulting to what i remembered | 17:42 |
| clarkb | fungi: yup I announced it two weeks prior to merging that and then merged it on the announced date | 17:42 |
| fungi | awesome. i can focus on the cleanup in that case | 17:42 |
| fungi | once afs server upgrades are done | 17:42 |
| clarkb | so bullseye backports and stretch can be cleared out of that volume | 17:42 |
| clarkb | I'm still curious to know if we can run the cleanup steps in a noop fashion safely 99% of the time. I'd be on board with doing that if so | 17:43 |
| fungi | yeah, and openeuler can also go completely, and soon we can drop bionic from ubuntu and ubuntu-ports | 17:43 |
| clarkb | I think bionic may already be out of ubuntu-ports fwiw but yes | 17:43 |
| fungi | oh, right you are, i did clean that one up in order to free space back at the start of the upgrades | 17:44 |
| opendevreview | Merged opendev/system-config master: Update gitea to 1.24.6 https://review.opendev.org/c/opendev/system-config/+/960675 | 18:36 |
| clarkb | https://gitea09.opendev.org:3081/opendev/system-config/ is updated | 18:40 |
| clarkb | git clone works and browsing the web ui seems happy | 18:41 |
| fungi | yeah, working for me | 18:42 |
| clarkb | all six backends look updated to me | 18:54 |
| clarkb | and the job succeeded https://zuul.opendev.org/t/openstack/buildset/ff1a7330123441adbd67fc3d4d836620 | 18:54 |
| clarkb | I'm going to mark this done on my todo list and start looking for lunch | 18:54 |
| fungi | cool, when you get back we can decide whether to do the gerrit changes and restart today | 19:04 |
| clarkb | fungi: I do plan to be around for the rest of the day if we want to try and land those changes today. I guess worst case we can do the restart tomorrow if we get images updated today? | 19:34 |
| clarkb | I always feel bad with things like that later in the day as i know it is 3 hours later for you relative to me | 19:34 |
| clarkb | we could also aim for tomorrow or wednesday if that fits into schedules better | 19:34 |
| fungi | tuesdays are tough for me with all the meetings, would prefer to just knock it out while we can | 19:35 |
| fungi | i don't have anywhere to be tonight | 19:36 |
| clarkb | ya I have meetings all day tomorrow too | 19:36 |
| fungi | though i'll probably take off thursday after my lunchtime meeting since it's christine's birthday | 19:36 |
| clarkb | ok I think we want to approve https://review.opendev.org/c/opendev/system-config/+/957555 then https://review.opendev.org/c/opendev/system-config/+/958597 | 19:36 |
| fungi | should i wait for the first one to merge before approving the second, since there's no change relationship between them? | 19:37 |
| clarkb | they aren't stacked and don't have a depends-on, but I can keep an eye on them in the gate so that if the first one fails I can -W the second one to preserve the order. We can figure out strict ordering updates later (I don't want to do that now because then we'll have to check at least one of them again) | 19:37 |
| fungi | otherwise the first might fail out of the gate | 19:37 |
| fungi | oh, that also works | 19:37 |
| clarkb | ya but if it does it should reset the second one giving plenty of time to -W it | 19:37 |
| fungi | okay, i'm approving them both in sequence now, it'll take them a while in the gate anyway | 19:38 |
| clarkb | sounds good thanks | 19:38 |
| clarkb | it's been a while since we did a gerrit restart. I think the process is: stop the containers, move the replication queue stuff aside, delete the h2 cache backing files, start gerrit | 19:41 |
| clarkb | then once up and looking happy trigger full reindexing for changes | 19:41 |
| fungi | that matches my memory | 19:41 |
| clarkb | oh and we need to pull the new image first | 19:44 |
| fungi | right | 19:50 |
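A rough sketch of that sequence as shell commands; the compose directory, replication queue path, and cache file names are assumptions pieced together from this log rather than the exact production commands.

```sh
# Sketch of the restart sequence described above (paths are assumptions).
cd /etc/gerrit-compose
docker-compose pull                      # fetch the new 3.10.8 image first
docker-compose down                      # may hit the container stop timeout
# set the persisted replication queue aside so stale tasks aren't replayed
mv /home/gerrit2/review_site/data/replication/ref-updates/waiting \
   /home/gerrit2/review_site/data/replication/ref-updates/waiting.old
# drop the two oversized h2 diff caches (and any lock files alongside them)
rm -f /home/gerrit2/review_site/cache/gerrit_file_diff.* \
      /home/gerrit2/review_site/cache/git_file_diff.*
docker-compose up -d
```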
| clarkb | oh I meant to check gitea replication earlier. https://opendev.org/openstack/swift/commit/1a97c54766e9254ba4f1bf2e64e5a2d53e102ec6 was replicated from https://review.opendev.org/c/openstack/swift/+/961285 so I think it is working | 20:13 |
| clarkb | I did an initial edit to the meeting agenda, but will need to update things once the gerrit stuff is done | 20:26 |
| fungi | looking at the system-config-run-review-3.10 results from the second change, it did use a 3.10.8 image | 20:35 |
| opendevreview | Merged zuul/zuul-jobs master: Make buildx builder image configurable https://review.opendev.org/c/zuul/zuul-jobs/+/960840 | 20:36 |
| fungi | so the dependent pipeline is working correctly there for image builds | 20:36 |
| clarkb | fungi: it should've rebuilt it entirely too fwiw | 20:37 |
| fungi | right | 20:37 |
| fungi | i've got a root screen session going on review | 20:39 |
| fungi | i let #openstack-release know too | 20:43 |
| fungi | status notice The Gerrit service on review.opendev.org will be offline briefly for a quick patch update, but will return within a few minutes | 20:43 |
| fungi | can send that ^ when the time comes | 20:44 |
| opendevreview | Merged opendev/system-config master: Update Gerrit images to 3.10.8 and 3.11.5 https://review.opendev.org/c/opendev/system-config/+/957555 | 20:44 |
| clarkb | fungi: `ls -lh /home/gerrit2/review_site/cache/` on review shows one of these is not like the others | 20:51 |
| clarkb | actually two of them. Those are the two caches and their lock files we want to delete when gerrit is shut down | 20:51 |
| clarkb | gerrit_file_diff.h2.db and git_file_diff.h2.db | 20:51 |
| clarkb | I'm worried it's going to hit the shutdown timeout due to that... | 20:52 |
| clarkb | but not sure what we can do at this point other than find out. removing the h2 resizing timeout increase did make things go faster last time but not instant so I wonder if it is based on how big those files are | 20:52 |
| clarkb | `quay.io/opendevorg/gerrit 3.10 2e7da5290a2d` appears to be the image we're currently running on | 20:54 |
| opendevreview | Merged opendev/system-config master: Build gerrit image with python base from quay.io https://review.opendev.org/c/opendev/system-config/+/958597 | 20:59 |
| clarkb | that is "deploying" now | 21:01 |
| fungi | yeah | 21:01 |
| clarkb | the deployment reports success. I guess we can pull now if we're ready to do this | 21:05 |
| fungi | starting, i have the command queued in screen already | 21:05 |
| fungi | and done | 21:06 |
| fungi | quay.io/opendevorg/gerrit 3.10 16e3e6710a12 55 minutes ago 694MB | 21:06 |
| clarkb | I did a docker inspect on the new image and it seems to match the latest version on quay | 21:06 |
| fungi | shall i send the notice? | 21:07 |
| clarkb | 58cc86858076f8f5992aa4128a2b15078f557f51de75fe442fa93f033a564c94 from https://quay.io/repository/opendevorg/gerrit/manifest/sha256:58cc86858076f8f5992aa4128a2b15078f557f51de75fe442fa93f033a564c94 | 21:07 |
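A sketch of that kind of check, assuming the standard docker inspect fields; the digest is the one quoted from the quay.io manifest page above.

```sh
# Compare the locally pulled image's digest with the manifest shown on quay.io.
docker image inspect --format '{{index .RepoDigests 0}}' quay.io/opendevorg/gerrit:3.10
# expected output (digest from the quay.io manifest linked above):
# quay.io/opendevorg/gerrit@sha256:58cc86858076f8f5992aa4128a2b15078f557f51de75fe442fa93f033a564c94
```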
| clarkb | yes I think we can send the notice then work out a command to do the restart while that is happening? | 21:07 |
| fungi | #status notice The Gerrit service on review.opendev.org will be offline briefly for a quick patch update, but will return within a few minutes | 21:07 |
| opendevstatus | fungi: sending notice | 21:07 |
| -opendevstatus- | NOTICE: The Gerrit service on review.opendev.org will be offline briefly for a quick patch update, but will return within a few minutes | 21:07 |
| clarkb | looks like the problematic cache files haven't changed names | 21:08 |
| fungi | i've queued up an edited version of what we ran last time, but you noted some different files | 21:08 |
| fungi | okay, so it's the same names as last time | 21:08 |
| clarkb | fungi: ya sorry, it's the same files; I just meant those two are so outrageously different in size from the other h2 cache files that they stand out | 21:08 |
| clarkb | not that they were different from last time | 21:08 |
| fungi | oh, right you are, they are rather huge | 21:09 |
| clarkb | absurdly so imo | 21:09 |
| clarkb | I'm really surprised no one else in the gerrit community is complaining | 21:09 |
| opendevstatus | fungi: finished sending notice | 21:10 |
| fungi | okay, does the command look right? should i wait for jobs in opendev-prod-hourly to finish? | 21:11 |
| clarkb | I think the command looks right. The two caches match the absurdly large ones | 21:11 |
| clarkb | and hourlies are just about to finish so may as well wait another few seconds? | 21:11 |
| clarkb | in the time it took me to type that they completed | 21:12 |
| fungi | heh | 21:12 |
| clarkb | I guess we are as ready as we can be | 21:12 |
| fungi | the mistral changes in the gate pipeline that say they have 2 minutes remaining? | 21:12 |
| clarkb | yes then they will run post jobs potentially too | 21:12 |
| clarkb | so could wait for those too I suppose | 21:12 |
| fungi | 961162,1 and 961163,1 | 21:13 |
| fungi | odds are they would try to merge while gerrit was down | 21:13 |
| fungi | one of them just did, the other has one job uploading logs now | 21:15 |
| fungi | they're just publishing branch tarballs and mirroring to github now, zuul should be able to do that with gerrit down yeah? | 21:16 |
| clarkb | as long as the merges have completed | 21:16 |
| fungi | they have, i guess time to run the restart then? | 21:17 |
| clarkb | which they won't have started for the second one yet | 21:17 |
| clarkb | since the queue is supercedent | 21:17 |
| fungi | oh, merges in the zuul merger sense | 21:17 |
| clarkb | yes | 21:17 |
| fungi | okay, the second one has started running jobs | 21:17 |
| fungi | anything else or initiate the restart now? | 21:18 |
| clarkb | I think zuul would've completed all operations against gerrit for that ref by now (in order to enqueue the jobs it has to do that work) but it may do the work again on the executor when an executor is assigned? I can't remember how that works | 21:18 |
| clarkb | looks like it started anyway so I think we're good | 21:19 |
| fungi | it's underway | 21:19 |
| clarkb | the two large h2 files shrank ever so slightly, which is in line with our removal of the longer timeout there | 21:20 |
| clarkb | strace then cross checking against fds in /proc show it doing stuff with /var/gerrit/cache/git_file_diff.h2.db so ya I'm worried this is going to time out | 21:22 |
| clarkb | so frustrating | 21:22 |
| fungi | maybe we need to go back to having weekly gerrit restarts | 21:22 |
| clarkb | fungi: if we hit the timeout it should exit non-zero so you'll need to edit your command to follow up and do the other steps once gerrit actually stops | 21:24 |
| fungi | right, i figured | 21:24 |
| fungi | Error response from daemon: given PID did not die within timeout | 21:24 |
| clarkb | fungi: hold on | 21:24 |
| clarkb | fungi: you should actually rerun the entire command; mariadb was not stopped | 21:25 |
| fungi | rerunning | 21:25 |
| fungi | starting back up now | 21:25 |
| clarkb | [2025-09-15T21:25:43.989Z] [main] INFO com.google.gerrit.pgm.Daemon : Gerrit Code Review 3.10.8-14-g12f33e503d-dirty ready | 21:26 |
| clarkb | still waiting for diffs to load but the ui is "up" | 21:27 |
| fungi | yeah, it's rendering for me | 21:27 |
| corvus | lol i pushed a big stack that very second :) | 21:27 |
| fungi | d'oh! | 21:27 |
| clarkb | yup I see diffs now | 21:27 |
| corvus | they seem fine! :) | 21:27 |
| fungi | the matrix version of our statusbot can't come soon enough | 21:27 |
| clarkb | I'm really glad that the gerrit folks will be at the summit. I'm really hoping that sitting down with them and explaining that not being able to gracefully stop their software is a major flaw will help | 21:28 |
| clarkb | particularly since 1) this is just for caches not even data we care about and 2) gerrit is full of data we do care about | 21:28 |
| clarkb | fungi: there are a few of the h2 pruner tasks running (which don't actually reduce the backing file size...); once those are done we can probably think about triggering a reindex? | 21:29 |
| fungi | yeah | 21:30 |
| clarkb | corvus: and you might want to check your changes to make sure you didn't end up generating new changes for them. I think this is the specific race where pushes that arrive while gerrit is stopping should update existing changes, but if things haven't been updated in the index yet then boom | 21:30 |
| clarkb | I spot checked one and it looks ok so didn't hit the race | 21:31 |
| fungi | `gerrit index start changes --force` looks like what i've done before | 21:31 |
| clarkb | that sounds right. the --force is needed because we're already on the latest version of the index, so by default it won't reindex | 21:32 |
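For the record, a sketch of how that is typically invoked over Gerrit's ssh admin interface (user is a placeholder, port is the usual default):

```sh
# Force a full online reindex of the changes index even though the index
# schema version is already current, then watch progress in the task queue.
ssh -p 29418 admin@review.opendev.org gerrit index start changes --force
ssh -p 29418 admin@review.opendev.org gerrit show-queue --wide
```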
| clarkb | fungi: queues look empty now | 21:32 |
| fungi | not any more! ;) | 21:32 |
| clarkb | https://opendev.org/zuul/zuul/commit/a7baeca65d92dad883b59ab05480c1b0c84bbfe5 seems to have replicated from https://review.opendev.org/c/zuul/zuul/+/960695 which is also the change I spot checked for misalignment on changeid lookups | 21:33 |
| clarkb | so I think replication is working | 21:33 |
| clarkb | fungi: confirmed the queue is now full of work to do :) | 21:33 |
| fungi | i'll let #openstack-release know we're basically done | 21:34 |
| clarkb | sounds good. I already detached from the screen and will let you decide when it can be shut down | 21:34 |
| corvus | clarkb: yeah, lgtm so far. i think i pushed right after the process started | 21:34 |
| fungi | i terminated the screen session on review now | 21:35 |
| clarkb | corvus: ack | 21:36 |
| clarkb | fungi: I do notice a few replication errors like those that we move the waiting queue out of the way to avoid. At first I worried that we didn't move things, but on closer inspection I think someone is simply doing edits via the web ui which generates those errors | 21:37 |
| fungi | oh good | 21:37 |
| clarkb | fungi: so it is just coincidence that it happened around when we restarted; using the web ui to edit things is one of the cases where the replication system fails and why we do the mv in the first place | 21:37 |
| fungi | i mean, not good that using the webui causes errors, but whatev | 21:37 |
| clarkb | https://review.opendev.org/c/openstack/kolla-ansible/+/961313 is one of the changes in question which did get an edit published. Interestingly that seems to affect replication of the change ref itself (it didn't replicate) | 21:38 |
| clarkb | I don't think I ever tracked down if that was the case or not | 21:38 |
| clarkb | but now we know? it shouldn't impact actual refs just the magical gerrit ones so probably not the end of the world. If we want we can request kolla-ansible get force replicated and see if that ref shows up | 21:39 |
| clarkb | https://review.opendev.org/c/openstack/freezer-web-ui/+/961311 is the other and the comment thread there seems to concur | 21:41 |
| clarkb | actually something as trivial as leaving a comment on the change may get it to replicate too. I think that is why 961311 is replicated (it was marked ready for review at the end) | 21:43 |
| clarkb | fungi: 961311 raises an error in the error_log if you open it because it can't find ps3 in the index. I think that is because it's waiting for reindexing to complete before it can reindex that update | 21:44 |
| clarkb | so many fun and exciting behaviors to sort through here. I really don't like that gerrit restarts were once simple easy and reliable and now are full of gotchas and confusing behaviors | 21:44 |
| clarkb | reindexing is just about halfway done | 21:50 |
| clarkb | once reindexing is done I'll check that 961311 seems happier when loaded, then I'll work on getting the meeting agenda updated and sent | 21:52 |
| clarkb | corvus: I've just tried to delete autoholds 0000000230 and 0000000231 in the openstack tenant in order to recycle my autoholds due to the new images and the commands don't complain but nothing seems deleted | 22:01 |
| clarkb | I went ahead and put in two new autoholds and rechecked the change | 22:02 |
| clarkb | reindexing is complete. It complained about 3 changes which I believe is consistent with preexisting failures | 22:05 |
| clarkb | and as expected (hoped really) opening https://review.opendev.org/c/openstack/freezer-web-ui/+/961311 no longer emits a traceback in error_log now that reindexing is complete | 22:06 |
| fungi | oh good | 22:06 |
| clarkb | ok my meeting agenda edits are in. Let me know if there are any other items to change/add/remove and I'll get that done before sending the email out. Otherwise expect the email to go out around 23:00 UTC | 22:09 |
| clarkb | fungi: zuul +1'd https://review.opendev.org/c/openstack/kolla-ansible/+/961313 but it isn't replicated. Do you think we should try a force replication for kolla-ansible to see if that fixes it? | 22:10 |
| clarkb | I suspect that this problem exists for any changes that are created via the gerrit web ui so not a new regression, but we may wish to do this in order to understand the issue better | 22:11 |
| clarkb | in particular that change was created via cherry pick from one branch to another then edited via the web ui. It is the web ui events in particular that I think break replication due to the events being filtered out at least partially | 22:12 |
| clarkb | hrm https://review.opendev.org/c/openstack/kolla-ansible/+/961312 was an edit done in a similar way to a different branch in the same repo and it looks ok replication wise | 22:14 |
| clarkb | so ya not sure why that particular one didn't replicate now | 22:14 |
| clarkb | the replication log shows status=NOT_ATTEMPTED for all replication related to 961313, which also seems to be the case for 961312. Maybe something else updated that caused a side effect of replicating 961312, but that same side-effect source didn't occur for 961313? | 22:17 |
| clarkb | just to rule out a lucky backend load balancing situation for myself I checked all six backends, and the one that is present is present on all six backends and the one that is missing is missing on all six backends, which is good I guess | 22:21 |
| clarkb | here is a theory: [2025-09-15 21:32:51,583] is the timestamp for the first entry for the missing item. That coincides roughly with the start of reindexing. What if that is a race condition and it couldn't replicate due to index state? | 22:22 |
| corvus | clarkb: i think that's due to a one-time zk cleanup for leaked lock nodes, i can fix | 22:22 |
| clarkb | corvus: thanks | 22:22 |
| clarkb | re the replication situation, as far as I can tell replication generally works and I suspect that the problem is specific to this change due to a confluence of events (how it was created and when it was created). I'm not sure it is worth debugging much further but if we want I think we can trigger a full replication against openstack/kolla-ansible | 22:23 |
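If that full replication ends up being wanted, the replication plugin's ssh command would look something like this sketch (user is a placeholder):

```sh
# Re-replicate a single project to all configured remotes and wait for the
# replication plugin to report completion.
ssh -p 29418 admin@review.opendev.org replication start openstack/kolla-ansible --wait
```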
| corvus | done. i also deleted the old nodepool node records. | 22:29 |
| clarkb | end of an era | 22:30 |
| corvus | i'm going to delete some old autoholds from tonyb which aren't effective anymore | 22:31 |
| corvus | okay, i think that's all the surgery needed; what remains looks modern. if we see any more issues with holds in the openstack tenant, lmk, those might be real new bugs. | 22:33 |
| clarkb | will do thanks again! | 22:33 |