opendevreview | Felix Edel proposed zuul/zuul-jobs master: mirror-workspace-git-repos: Retry on failure in git update task https://review.opendev.org/c/zuul/zuul-jobs/+/902907 | 08:07 |
---|---|---|
opendevreview | Michal Nasiadka proposed openstack/diskimage-builder master: rocky-container: Add installation of Minimal Install group https://review.opendev.org/c/openstack/diskimage-builder/+/899372 | 08:37 |
opendevreview | Michal Nasiadka proposed openstack/diskimage-builder master: rocky-container: Add installation of Minimal Install group https://review.opendev.org/c/openstack/diskimage-builder/+/899372 | 10:21 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: mirror-workspace-git-repos: Retry on failure in git update task https://review.opendev.org/c/zuul/zuul-jobs/+/902907 | 14:38 |
opendevreview | Merged zuul/zuul-jobs master: mirror-workspace-git-repos: Retry on failure in git update task https://review.opendev.org/c/zuul/zuul-jobs/+/902907 | 15:04 |
clarkb | the gitea09 backups to the one backup server are still failing... | 22:58 |
clarkb | I'm going to try a manual run | 22:58 |
clarkb | it fails when run manually so now we know the periodic jobs aren't at fault. The row it complained about changed between the last automated run and my manual run | 23:06 |
clarkb | after realizing I needed to set -o pipefail for accurate test results: running `bash /etc/borg-streams/mysql | gzip -9 > clarkb_test_db_backup.sql.gz` locally on the server, without piping to borg to stream off-server, succeeds. Which I expected, because the other backup host is backing up just fine | 23:24 |
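A minimal illustration of why `set -o pipefail` matters for judging these test runs (a standalone sketch, not the actual /etc/borg-streams/mysql script):

```bash
#!/bin/bash
# Without pipefail, a pipeline's exit status is that of the last command,
# so a failing dump hidden behind gzip still reports success.
false | gzip -9 > /dev/null
echo "without pipefail: $?"   # 0 -- the failure of `false` is masked

set -o pipefail
false | gzip -9 > /dev/null
echo "with pipefail: $?"      # 1 -- the pipeline now reflects the failure
```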
clarkb | this leads me to think that the problem has to do with the network connection between gitea09 and the vexxhost backup server causing backpressure in the stream such that mysqldump hits a network error | 23:25 |
opendevreview | Ghanshyam proposed openstack/project-config master: Remove retired js-openstack-lib from infra https://review.opendev.org/c/openstack/project-config/+/798529 | 23:27 |
clarkb | I tried adding --max-allowed-packet=256M since the internets say one reason these sorts of errors can occur is having the packet size too small for a row | 23:41 |
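A hedged sketch of where that flag would go; the real dump invocation inside /etc/borg-streams/mysql isn't shown in this log, so the container name and other options here are assumptions:

```bash
# Hypothetical dump command with the larger client-side packet limit applied.
# The actual container name and option set on gitea09 may differ.
docker exec mariadb mysqldump \
  --all-databases --single-transaction \
  --max-allowed-packet=256M
```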
clarkb | however, I didn't really expect that to help because the other backup works, and if the packet size were the issue I would expect this to be a universal problem | 23:42 |
clarkb | I undid that manual change and the server is back to the way it started. I think I'm going to need to sleep on this one. It feels like the sort of bug where bashing my head against it isn't going to help, since it has to do with buffer/networking/mariadb stuff | 23:44 |
clarkb | one thing I think we could do as a workaround is to have the backup write to a tmpfile on disk, cat the file to stream it out, then rm the file | 23:44 |
clarkb | then we'd replace the mysqldump that streams directly into borg-backup with a mysqldump to disk, then cat/zcat into borgbackup | 23:45 |
clarkb | ianw: ^ fyi struggles with the streaming backups | 23:45 |
clarkb | not sure if you have seen similar before and may have pointers | 23:45 |
clarkb | one thing that just occurred to me: This could be a regression in mariadb or mariadbdump/mysqldump since one of the things that does change over time is our mariadb container image | 23:47 |
clarkb | I've put everything back the way it was before. I suspect this will continue to fail until we do something, or if this is a mariadb regression they fix it and it magically goes away. | 23:51 |
clarkb | the more I think about it the more I like the idea of using a staging file locally. We should be able to do something like TMPFILE=$(mktemp tmp.XXXXXXXXXX.sql.gz) && <our current docker exec command> | gzip -9 > $TMPFILE && zcat $TMPFILE ; rm $TMPFILE | 23:53 |
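A hedged, spelled-out version of that staging-file idea; `docker exec mariadb mysqldump --all-databases` stands in for whatever the current stream script actually runs, so treat those names as assumptions:

```bash
#!/bin/bash
set -o pipefail

# Stage the dump on local disk so mysqldump never stalls on the network
# path to the backup server, then replay the file into the borg stream.
TMPFILE=$(mktemp tmp.XXXXXXXXXX.sql.gz)
docker exec mariadb mysqldump --all-databases | gzip -9 > "$TMPFILE" \
  && zcat "$TMPFILE"
rm -f "$TMPFILE"
```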
clarkb | that said debugging help to better understand is probably the first order of business | 23:54 |