Nick | Message | Time |
---|---|---|
clarkb | other thoughts: what if borg is failing too | 00:02 |
clarkb | and that breaks the pipe and then mysql can no longer write | 00:02 |
clarkb | the way logging is set up in the borg stream setup, it logs the first error in the pipeline, which implies the problem is in the first command to fail, but I suppose it could be the second command (borg-backup) that is erroring | 00:03 |
clarkb | I'm guessing that would be a good thing to try and get better logs for first | 00:03 |
fungi | clarkb: i suspect there's some delay between when the dump is initiated and when the data begins to flow. if there's some middlebox terminating "idle" ssh sessions then using ssh keepalives might help? | 13:10 |
opendevreview | Birger J. Nordølum proposed openstack/diskimage-builder master: feat: add almalinux-container element https://review.opendev.org/c/openstack/diskimage-builder/+/883855 | 13:57 |
opendevreview | Merged openstack/project-config master: Remove retired js-openstack-lib from infra https://review.opendev.org/c/openstack/project-config/+/798529 | 13:57 |
fungi | pbr.build made it into the charts on https://discuss.python.org/t/40629 | 14:07 |
Clark[m] | fungi: the error occurs after a minute or two. It shouldn't be long enough for typical keepalives to help. It's a really weird one. After sleeping on it I think I want to try the dump to file then cat it out process manually to see if that has different behavior. And also update the script to check errors for both sides of the pipe | 15:27 |
fungi | i remember years ago the default "idle" session timeout on both juniper netscreen and f5 bigip was 120 seconds unless configured otherwise | 15:29 |
fungi | so i usually set my ssh servers and clients to a 60-second keepalive ping | 15:30 |
Clark[m] | Huh that seems really low. I guess we can try that | 15:30 |
fungi | it does indeed seem low, when by comparison openbsd was something like 2 weeks | 15:30 |
Clark[m] | That would go into .ssh/config? | 15:30 |
fungi | yes | 15:31 |
fungi | ServerAliveInterval is what you want in .ssh/config, according to the manpage | 15:33 |
fungi | "Sets a timeout interval in seconds after which if no data has been received from the server, ssh(1) will send a message through the encrypted channel to request a response from the server." | 15:34 |
fungi | it defaults to 0, which disables that behavior | 15:35 |
fungi | there's also TCPKeepAlive but i've seen middleboxes ignore those | 15:36 |
JayF | fungi: is the `openstackci` pypi account human-accessible at all? | 21:16 |
JayF | fungi: asking in context of https://github.com/eventlet/eventlet/issues/824#issuecomment-1848673640 -- I'm thinking given that's a github repo, and we don't know the shape of the CI | 21:16 |
JayF | might be wise to have openstackci + a human (me, for now?) get granted access | 21:16 |
JayF | but I figured I'd ask about how frequently humans have to use that account already before worrying about complicating the message | 21:17 |
fungi | JayF: yes, opendev sysadmins have access to shared login credentials and otp for that account | 21:43 |
Clark[m] | I think uploads use tokens now? So in theory we can make a token for that repo and give that to the maintainers but I'm not sure if they can be scoped like that | 22:27 |
fungi | they can, there are account-scoped and project-scoped upload tokens | 23:41 |
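
The idea raised in the log of checking errors on both sides of the pipe can be sketched roughly as follows. This is only a minimal illustration, assuming the streaming backup is a two-stage `producer | consumer` pipeline; the function names `run_pipeline` and `failed_stages` are hypothetical, and the actual dump and borg command lines are not shown in this log, so they are left as parameters.

```python
import subprocess


def run_pipeline(producer_cmd, consumer_cmd):
    """Run `producer | consumer` and return (producer_rc, consumer_rc).

    Both argument lists are supplied by the caller; in the scenario
    discussed above they would be the database dump command and the
    borg command (hypothetical here, not taken from the real script).
    """
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(consumer_cmd, stdin=producer.stdout)
    # Close the parent's copy of the pipe so that the producer sees a
    # broken pipe if the consumer exits before reading everything.
    producer.stdout.close()
    consumer_rc = consumer.wait()
    producer_rc = producer.wait()
    return producer_rc, consumer_rc


def failed_stages(producer_rc, consumer_rc):
    """Describe every stage that did not exit cleanly, not just the first.

    On POSIX a negative return code means the process was killed by
    that signal number.
    """
    failures = []
    if producer_rc != 0:
        failures.append(f"producer exited with {producer_rc}")
    if consumer_rc != 0:
        failures.append(f"consumer exited with {consumer_rc}")
    return failures
```

Reporting both exit statuses makes it possible to tell whether the reading side failed and broke the pipe, or whether the writing side was the first to fail, which is the ambiguity described earlier in the conversation.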