09:03:50 Please put your topics into the Agenda
09:03:54 #topic Roll Call
09:03:58 o/
09:04:04 mnasiadka / dalees if you are around
09:04:16 o/
09:04:17 I'm here
09:04:23 o/
09:04:31 I'm around, sort of.
09:06:28 cool let's get on with it :)
09:07:01 #topic QueuePool limit bug
09:07:06 #link https://bugs.launchpad.net/magnum/+bug/2067345
09:07:20 andrewbonney: I believe this is from you
09:07:36 Yeah, I just wanted to raise it again as we've seen the same as others since upgrading to C
09:07:54 It's pretty major as we have to restart Magnum services frequently to keep things working
09:08:01 did the patches fix things?
09:08:59 I haven't applied them personally yet as patching oslo.db is a little involved, but given other services are using oslo.db without issue I was a little surprised that might be required
09:11:28 yeah I am not sure where the bug is, as I haven't encountered it in prod (we are still at B).
09:12:25 I was planning to get to C then I can debug, but unfortunately I had to chase down a few bugs in other places affecting our deployment of Magnum, so C upgrade got delayed
09:12:50 how about mnasiadka or dalees?
09:14:06 Likewise, not running C yet; CAPI driver has got most of my attention for now and Magnum version isn't the limitation anymore.
09:16:36 if I was to guess, it may have been something introduced by us trying to bring sqlalchemy up to date
09:17:18 I did have a look at the code around those changes but nothing jumped out unfortunately
09:17:26 andrewbonney: are you able to help us test by rolling back those commits?
09:17:35 #link https://review.opendev.org/c/openstack/magnum/+/910722
09:17:46 #link https://review.opendev.org/c/openstack/magnum/+/910512
09:18:08 we are going to work on upgrades to C - so sooner or later this year we'll probably stumble on the same issue
09:18:42 andrewbonney: which driver are you using?
09:19:52 We're running the vexxhost CAPI integration
09:21:25 I'm happy to try rolling stuff back, but that will also involve pinning oslo.db back to ensure compatibility with the autocommit changes
09:25:03 will reverting just the autocommit change https://review.opendev.org/c/openstack/magnum/+/910722 fail?
09:25:39 If we stick with oslo.db 15 from upper-constraints I believe so yes
09:26:44 what are the magnum / oslo.db versions you are running now?
09:27:14 Magnum 18.0.1, oslo.db 15.0.0
09:28:03 sqlalchemy?
09:28:30 1.4.51
09:29:58 andrewbonney: what is the pattern you see with db connections, how quickly do they rise with approx how many clusters? similar to https://bugs.launchpad.net/magnum/+bug/2067345/comments/12 ?
09:31:29 I can go away and collect some data. Looking at our logs it takes maybe 3 days from service restart to start seeing errors, but this is with 1-3 clusters present at any one time
09:31:38 We're running all this in a staging environment at present
09:34:36 I'll try an upgrade in development env soon, and see if I can reproduce the issues.
09:35:01 what andrewbonney is describing is an environment where we do man create/delete of a small number of clusters
09:35:13 rather than having a large number of clusters that are long lived
09:35:22 *many create/delete
09:35:49 andrewbonney: another thing you can try is try this patch https://review.opendev.org/c/openstack/magnum/+/926626
09:36:09 Will do, ta
09:36:50 this switches over the code from the legacy facade to the new one introduced in 2024.1, possibly fixing the issue
09:37:42 no sorry, not introduced in 2024.1, introduced many years ago
09:40:38 I think rolling forward to https://review.opendev.org/c/openstack/magnum/+/926626 is prob the best choice
09:44:46 I'll give that a go and feed back in the issue after I've got some data on connections
09:45:24 thanks!
09:49:08 anything else?
09:54:05 ok thanks everyone for coming