09:03:03 <jakeyip> #startmeeting magnum
09:03:03 <opendevmeet> Meeting started Wed Aug 28 09:03:03 2024 UTC and is due to finish in 60 minutes.  The chair is jakeyip. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:03:03 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:03:03 <opendevmeet> The meeting name has been set to 'magnum'
09:03:45 <jakeyip> #link https://etherpad.opendev.org/p/magnum-weekly-meeting
09:03:50 <jakeyip> Please put your topics into the Agenda
09:03:54 <jakeyip> #topic Roll Call
09:03:58 <jakeyip> o/
09:04:04 <jakeyip> mnasiadka / dalees if you are around
09:04:16 <mnasiadka> o/
09:04:17 <mnasiadka> I'm here
09:04:23 <dalees> o/
09:04:31 <dalees> I'm around, sort of.
09:06:28 <jakeyip> cool let's get on with it :)
09:07:01 <jakeyip> #topic QueuePool limit bug
09:07:06 <jakeyip> #link https://bugs.launchpad.net/magnum/+bug/2067345
09:07:20 <jakeyip> andrewbonney: I believe this is from you
09:07:36 <andrewbonney> Yeah, I just wanted to raise it again as we've seen the same issue as others since upgrading to C
09:07:54 <andrewbonney> It's pretty major as we have to restart Magnum services frequently to keep things working
09:08:01 <jakeyip> did the patches fix things?
09:08:59 <andrewbonney> I haven't applied them personally yet as patching oslo.db is a little involved, but given other services are using oslo.db without issue I was a little surprised that might be required
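One common way a "QueuePool limit" error shows up is when database connections are checked out and never returned, so a fixed-size pool slowly drains until new requests time out and only a restart (a fresh pool) clears it. The toy model below illustrates that failure mode only. It is not Magnum, oslo.db or SQLAlchemy code: `ToyBoundedPool`, `handle_request` and `requests_until_failure` are hypothetical names, and the discussion does not establish that this bug is actually a leak.

```python
import queue


class PoolExhausted(Exception):
    """Raised when no connection becomes free within the timeout (hypothetical)."""


class ToyBoundedPool:
    """Hypothetical stand-in for a fixed-size connection pool:
    at most `size` connections can be checked out at once."""

    def __init__(self, size: int, timeout: float = 0.01) -> None:
        self._free: "queue.Queue[int]" = queue.Queue()
        for conn_id in range(size):
            self._free.put(conn_id)
        self._timeout = timeout

    def checkout(self) -> int:
        try:
            return self._free.get(timeout=self._timeout)
        except queue.Empty:
            raise PoolExhausted("pool limit reached, connection timed out") from None

    def checkin(self, conn_id: int) -> None:
        self._free.put(conn_id)


def handle_request(pool: ToyBoundedPool, leak: bool) -> None:
    """Simulate one DB operation; if `leak` is True the connection is never returned."""
    conn = pool.checkout()
    # ... run queries with `conn` ...
    if not leak:
        pool.checkin(conn)


def requests_until_failure(size: int, leak_every: int, max_requests: int = 10_000) -> int:
    """Count successful requests before exhaustion when every `leak_every`-th request leaks."""
    pool = ToyBoundedPool(size)
    for n in range(1, max_requests + 1):
        try:
            handle_request(pool, leak=(n % leak_every == 0))
        except PoolExhausted:
            return n - 1
    return max_requests
```

With these toy numbers, `requests_until_failure(size=5, leak_every=10)` returns 50: the pool survives 50 operations and then every request fails until a new pool is created. That is consistent with errors appearing only after days of create/delete churn, but it is an illustration, not a diagnosis.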
09:11:28 <jakeyip> yeah I am not sure where the bug is, as I haven't encountered it in prod (we are still at B).
09:12:25 <jakeyip> I was planning to get to C so I could debug, but unfortunately I had to chase down a few bugs elsewhere affecting our deployment of Magnum, so the C upgrade got delayed
09:12:50 <jakeyip> how about mnasiadka or dalees ?
09:14:06 <dalees> Likewise, not running C yet; CAPI driver has got most of my attention for now and Magnum version isn't the limitation anymore.
09:16:36 <jakeyip> if I were to guess, it may have been something introduced by us trying to bring sqlalchemy up to date
09:17:18 <andrewbonney> I did have a look at the code around those changes but nothing jumped out unfortunately
09:17:26 <jakeyip> andrewbonney: are you able to help us test by rolling back those commits?
09:17:35 <jakeyip> #link https://review.opendev.org/c/openstack/magnum/+/910722
09:17:46 <jakeyip> #link https://review.opendev.org/c/openstack/magnum/+/910512
09:18:08 <mnasiadka> we are going to work on upgrades to C - so at some point this year we'll probably stumble on the same issue
09:18:42 <jakeyip> andrewbonney: which driver are you using?
09:19:52 <andrewbonney> We're running the vexxhost CAPI integration
09:21:25 <andrewbonney> I'm happy to try rolling stuff back, but that will also involve pinning oslo.db back to ensure compatibility with the autocommit changes
09:25:03 <jakeyip> will reverting just the autocommit change https://review.opendev.org/c/openstack/magnum/+/910722 fail?
09:25:39 <andrewbonney> If we stick with oslo.db 15 from upper-constraints I believe so yes
09:26:44 <jakeyip> what are the magnum / oslo.db versions you are running now?
09:27:14 <andrewbonney> Magnum 18.0.1, oslo.db 15.0.0
09:28:03 <jakeyip> sqlalchemy?
09:28:30 <andrewbonney> 1.4.51
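When reporting this bug, the same three versions (Magnum, oslo.db, SQLAlchemy) can be collected from the Python environment the Magnum services run in. A minimal sketch using only the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical, and the distribution names assume the usual PyPI names for these projects.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("magnum", "oslo.db", "SQLAlchemy")) -> dict:
    """Return {distribution name: installed version, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Not installed in this Python environment (e.g. wrong venv).
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Run it with the interpreter from the Magnum virtualenv or container; otherwise it reports whatever happens to be installed in the system Python.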
09:29:58 <dalees> andrewbonney: what is the pattern you see with db connections, how quickly do they rise with approx how many clusters? similar to https://bugs.launchpad.net/magnum/+bug/2067345/comments/12 ?
09:31:29 <andrewbonney> I can go away and collect some data. Looking at our logs it takes maybe 3 days from service restart to start seeing errors, but this is with 1-3 clusters present at any one time
09:31:38 <andrewbonney> We're running all this in a staging environment at present
09:34:36 <dalees> I'll try an upgrade in development env soon, and see if I can reproduce the issues.
09:35:01 <jrosser> what andrewbonney is describing is an environment where we do many create/delete of a small number of clusters
09:35:13 <jrosser> rather than having a large number of clusters that is long lived
09:35:49 <jakeyip> andrewbonney: another thing you can try is this patch https://review.opendev.org/c/openstack/magnum/+/926626
09:36:09 <andrewbonney> Will do, ta
09:36:50 <jakeyip> this switches over the code from the legacy facade to the new one introduced in 2024.1, possibly fixing the issue
09:37:42 <jakeyip> no sorry, not introduced in 2024.1, introduced many years ago
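In its documented usage, oslo.db's newer enginefacade ties a session's lifetime to a scope (a decorated function, or a block such as `with enginefacade.reader.using(context) as session:`), so the connection goes back to the pool when the scope exits, including on error. The sketch below models only that scoping idea with a plain context manager over a hypothetical pool. It is not oslo.db code: `ToyBoundedPool`, `scoped_connection` and `failing_operation` are illustrative names, and whether patch 926626 fixes this bug was still to be tested at the time of the meeting.

```python
import contextlib
import queue


class ToyBoundedPool:
    """Hypothetical fixed-size pool (not oslo.db or SQLAlchemy code)."""

    def __init__(self, size: int) -> None:
        self._free: "queue.Queue[int]" = queue.Queue()
        for conn_id in range(size):
            self._free.put(conn_id)

    def checkout(self) -> int:
        # Non-blocking: raises queue.Empty if every connection is checked out.
        return self._free.get_nowait()

    def checkin(self, conn_id: int) -> None:
        self._free.put(conn_id)

    def available(self) -> int:
        return self._free.qsize()


@contextlib.contextmanager
def scoped_connection(pool: ToyBoundedPool):
    """Hypothetical helper: the connection is returned when the block exits,
    whether it exits normally or via an exception."""
    conn = pool.checkout()
    try:
        yield conn
    finally:
        pool.checkin(conn)


def failing_operation(pool: ToyBoundedPool) -> None:
    with scoped_connection(pool):
        raise RuntimeError("simulated DB error mid-transaction")


if __name__ == "__main__":
    pool = ToyBoundedPool(size=2)
    for _ in range(100):
        try:
            failing_operation(pool)
        except RuntimeError:
            pass
    print(pool.available())  # both connections are still free
```

Because the `finally` clause always runs, 100 failing operations against a pool of 2 still leave both connections available; code that checks out a connection by hand and returns it only on the success path would not have that guarantee.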
09:40:38 <jakeyip> I think rolling forward to https://review.opendev.org/c/openstack/magnum/+/926626 is prob the best choice
09:44:46 <andrewbonney> I'll give that a go and feed back in the issue after I've got some data on connections
09:45:24 <jakeyip> thanks!
09:49:08 <jakeyip> anything else?
09:54:05 <jakeyip> ok thanks everyone for coming
09:54:07 <jakeyip> #endmeeting