09:03:03 #startmeeting magnum
09:03:03 Meeting started Wed Aug 28 09:03:03 2024 UTC and is due to finish in 60 minutes. The chair is jakeyip. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:03:03 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:03:03 The meeting name has been set to 'magnum'
09:03:45 #link https://etherpad.opendev.org/p/magnum-weekly-meeting
09:03:50 Please put your topics into the Agenda
09:03:54 #topic Roll Call
09:03:58 o/
09:04:04 mnasiadka / dalees if you are around
09:04:16 o/
09:04:17 I'm here
09:04:23 o/
09:04:31 I'm around, sort of.
09:06:28 cool let's get on with it :)
09:07:01 #topic QueuePool limit bug
09:07:06 #link https://bugs.launchpad.net/magnum/+bug/2067345
09:07:20 andrewbonney: I believe this is from you
09:07:36 Yeah, I just wanted to raise it again as we've seen the same as others since upgrading to C
09:07:54 It's pretty major as we have to restart Magnum services frequently to keep things working
09:08:01 did the patches fix things?
09:08:59 I haven't applied them personally yet as patching oslo.db is a little involved, but given other services are using oslo.db without issue I was a little surprised that might be required
09:11:28 yeah I am not sure where the bug is, as I haven't encountered it in prod (we are still at B).
09:12:25 I was planning to get to C, then I can debug, but unfortunately I had to chase down a few bugs in other places affecting our deployment of Magnum, so the C upgrade got delayed
09:12:50 how about mnasiadka or dalees?
09:14:06 Likewise, not running C yet; the CAPI driver has got most of my attention for now and the Magnum version isn't the limitation anymore.
09:16:36 if I was to guess, it may have been something introduced by us trying to bring sqlalchemy up to date
09:17:18 I did have a look at the code around those changes but nothing jumped out unfortunately
09:17:26 andrewbonney: are you able to help us test by rolling back those commits?
09:17:35 #link https://review.opendev.org/c/openstack/magnum/+/910722
09:17:46 #link https://review.opendev.org/c/openstack/magnum/+/910512
09:18:08 we are going to work on upgrades to C - so sooner or later this year we'll probably stumble on the same issue
09:18:42 andrewbonney: which driver are you using?
09:19:52 We're running the vexxhost CAPI integration
09:21:25 I'm happy to try rolling stuff back, but that will also involve pinning oslo.db back to ensure compatibility with the autocommit changes
09:25:03 will reverting just the autocommit change https://review.opendev.org/c/openstack/magnum/+/910722 fail?
09:25:39 If we stick with oslo.db 15 from upper-constraints I believe so, yes
09:26:44 what are the magnum / oslo.db versions you are running now?
09:27:14 Magnum 18.0.1, oslo.db 15.0.0
09:28:03 sqlalchemy?
09:28:30 1.4.51
09:29:58 andrewbonney: what is the pattern you see with db connections, how quickly do they rise with approx how many clusters? similar to https://bugs.launchpad.net/magnum/+bug/2067345/comments/12 ?
09:31:29 I can go away and collect some data. Looking at our logs it takes maybe 3 days from service restart to start seeing errors, but this is with 1-3 clusters present at any one time
09:31:38 We're running all this in a staging environment at present
09:34:36 I'll try an upgrade in a development env soon, and see if I can reproduce the issues.
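
A note on the failure mode under discussion: when sessions are not returned to the pool, SQLAlchemy eventually raises a QueuePool timeout in the Magnum API and conductor logs, roughly of the form:

    sqlalchemy.exc.TimeoutError: QueuePool limit of size 5 overflow 50 reached,
    connection timed out, timeout 30.00

While the leak is being tracked down, the standard oslo.db pool options in magnum.conf can be raised to stretch the time between service restarts; this is a workaround, not a fix. The snippet below is a sketch with illustrative values (the connection URL is a placeholder), and the option names come from oslo.db rather than Magnum itself:

    [database]
    connection = mysql+pymysql://magnum:PASSWORD@db.example.org/magnum
    # SQLAlchemy QueuePool tuning; values here are examples, not recommendations
    max_pool_size = 5
    max_overflow = 50
    pool_timeout = 30
    # recycle connections before the server-side wait_timeout closes them
    connection_recycle_time = 3600
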
09:35:01 what andrewbonney is describing is an environment where we do many create/delete of a small number of clusters
09:35:13 rather than having a large number of clusters that are long-lived
09:35:49 andrewbonney: another thing you can try is this patch https://review.opendev.org/c/openstack/magnum/+/926626
09:36:09 Will do, ta
09:36:50 this switches over the code from the legacy facade to the new one introduced in 2024.1, possibly fixing the issue
09:37:42 no sorry, not introduced in 2024.1, introduced many years ago
09:40:38 I think rolling forward to https://review.opendev.org/c/openstack/magnum/+/926626 is probably the best choice
09:44:46 I'll give that a go and feed back in the issue after I've got some data on connections
09:45:24 thanks!
09:49:08 anything else?
09:54:05 ok thanks everyone for coming
09:54:07 #endmeeting
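
For readers following the enginefacade discussion above: the linked review https://review.opendev.org/c/openstack/magnum/+/926626 moves the code from oslo.db's legacy EngineFacade to the modern enginefacade context managers, which scope each session to a reader or writer block and return the connection to the pool when the block exits. The sketch below shows only the general shape of that pattern; the Cluster model and function names are illustrative and are not Magnum's actual code.

    # Rough sketch of the modern oslo.db enginefacade pattern (illustrative names).
    from oslo_db.sqlalchemy import enginefacade, models
    from sqlalchemy import Column, Integer, String
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()

    class Cluster(Base, models.ModelBase):
        """Toy stand-in for a real model, for illustration only."""
        __tablename__ = 'cluster'
        id = Column(Integer, primary_key=True)
        status = Column(String(20))

    # One transaction context per service; by default it picks up the
    # [database] options from the service's oslo.config on first use.
    _context_manager = enginefacade.transaction_context()

    def get_cluster(context, cluster_id):
        # Reader block: the session is released back to the pool when the
        # block exits, even if the query raises.
        with _context_manager.reader.using(context) as session:
            return session.query(Cluster).filter_by(id=cluster_id).one()

    def update_cluster(context, cluster_id, values):
        # Writer block: commits on success, rolls back on exception, then
        # releases the connection.
        with _context_manager.writer.using(context) as session:
            cluster = session.query(Cluster).filter_by(id=cluster_id).one()
            cluster.update(values)
            return cluster

The point relevant to the bug is that the session's lifetime is bounded by the with block, whereas the legacy get_session() style left closing the session to the caller, which is an easy way to leak connections.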