15:00:59 <noonedeadpunk> #startmeeting openstack_ansible_meeting
15:00:59 <opendevmeet> Meeting started Tue Nov 16 15:00:59 2021 UTC and is due to finish in 60 minutes. The chair is noonedeadpunk. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:59 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:59 <opendevmeet> The meeting name has been set to 'openstack_ansible_meeting'
15:01:05 <noonedeadpunk> #topic office hours
15:01:23 <noonedeadpunk> ah, I eventually skipped rollcall :(
15:01:27 <noonedeadpunk> \o/
15:01:43 <damiandabrowski[m]> hey!
15:04:38 <opendevreview> Dmitriy Rabotyagov proposed openstack/openstack-ansible-rabbitmq_server master: Update rabbitmq version https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/817380
15:05:59 <noonedeadpunk> So, I pushed some patches to update the rabbitmq and galera versions before release
15:06:28 <noonedeadpunk> Frustrating thing is that for rabbitmq (erlang to be specific) the bullseye repo is still missing
15:06:43 <noonedeadpunk> and galera 10.6 fails while running mariadb-upgrade
15:06:50 <noonedeadpunk> well, actually, it times out
15:06:57 <noonedeadpunk> I didn't have time to dig into this though
15:08:42 <noonedeadpunk> I guess once we figure this out we can do a milestone release for testing
15:09:28 <noonedeadpunk> while still taking time before roles branching
15:09:39 <noonedeadpunk> also we have less than a month for the final release, just in case
15:10:46 <noonedeadpunk> like updating maria pooling vars across roles if we get them agreed in time
15:16:29 <damiandabrowski[m]> regarding sqlalchemy's connection pooling, i've checked how many active connections we have on our production environments and i think we should stick to oslo.db's defaults (max_pool_size: 5, max_overflow: 50, pool_timeout: 30). I'll try to run some rally tests to make sure it won't degrade performance
15:20:00 <mgariepy> hey.
15:20:17 <noonedeadpunk> \o/
15:21:10 <noonedeadpunk> I wonder if we should set max_pool_size: 10, but maybe we don't even need that amount
15:21:53 <noonedeadpunk> 5 feels somehow extreme
15:22:27 <mgariepy> if you want i can take a couple hours to dig into the issue this afternoon.
15:24:01 <noonedeadpunk> I won't refuse help - the more eyes we have on this the more balanced result we get I guess
15:24:14 <spatel> hey
15:24:39 <noonedeadpunk> But I'm absolutely sure we should adjust these results and I'd wait for the patches to land before release tbh
15:26:41 <damiandabrowski[m]> noonedeadpunk: regarding max_pool_size, none of the environments i checked needed more than 2 over the 12 hours (and with max_overflow set to 50, that is far more than enough). But i'll share some info after rally tests
15:27:32 <noonedeadpunk> well, in case of 1 controller failure I guess this number would increase?
15:27:35 <damiandabrowski[m]> spatel: there is a script i used to check how many active SQL connections I have for each openstack service. You may find it useful
15:27:35 <damiandabrowski[m]> https://paste.openstack.org/show/811029/
15:27:52 <noonedeadpunk> I think you meant mgariepy hehe
15:28:09 <spatel> :)
15:28:10 <damiandabrowski[m]> ouh my bad, sorry!
15:28:40 <noonedeadpunk> andrewbonney: you had your head in this topic as well. wdyt?
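
For context on the pooling values discussed above, here is a minimal sketch of how one might compare a deployed service's settings against the oslo.db defaults damiandabrowski[m] quotes. The nova.conf path is only an example; in OpenStack-Ansible these values are normally templated through role variables rather than read or edited by hand.

    # Print whatever pooling options are currently set in a service's
    # [database] section (nova.conf used as an example path):
    awk '/^\[database\]/{f=1;next} /^\[/{f=0} f && /max_pool_size|max_overflow|pool_timeout/' /etc/nova/nova.conf

    # The oslo.db defaults under discussion, for comparison:
    #   max_pool_size = 5
    #   max_overflow  = 50
    #   pool_timeout  = 30   (seconds)
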
15:30:23 <mgariepy> nice damiandabrowski[m], thanks, i'll take a look
15:33:33 <spatel> damiandabrowski[m] i am trying to run it but it's throwing some error :)
15:33:51 <spatel> maybe my box doesn't have some tools
15:34:06 <spatel> sort: cannot read: ./data/keystone: No such file or directory
15:35:09 <damiandabrowski[m]> well, i found an issue with this script yesterday. Looks like it has to be executed from your $PWD. maybe that's the case?
15:36:29 <damiandabrowski[m]> line 13 should create ./data and line 16 should create service directories inside it
15:37:02 <damiandabrowski[m]> ouh and another thing: You need to run this script with the 'collect' argument to collect some data, then You will be able to use the 'summary' argument ;)
15:39:08 <andrewbonney> It's a while since I've looked at this, but it wouldn't surprise me if the existing defaults could be reduced. I just wouldn't have confidence doing so without checking larger deployments
15:39:53 <noonedeadpunk> would be interesting to see spatel's result actually
15:40:35 <spatel> noonedeadpunk are you talking about the SQL script?
15:40:39 <noonedeadpunk> yeah
15:40:52 <spatel> fyi, i am trying to test it on wallaby, not the latest branch
15:41:52 <spatel> damiandabrowski[m] - https://paste.opendev.org/show/811030/
15:42:06 <spatel> that script may need some love..
15:42:28 <damiandabrowski[m]> tbh i ran it with bash so it may have some issues with sh :/
15:44:03 <spatel> trying with bash..
15:45:19 <spatel> https://paste.opendev.org/show/811031/
15:45:32 <spatel> all zero.. that is impossible
15:45:43 <noonedeadpunk> it's a sandbox?
15:45:59 <spatel> lab with 5 compute nodes
15:46:07 <noonedeadpunk> and no activity?
15:46:11 <spatel> 1 controller
15:46:17 <spatel> let me run it on a busy cloud
15:46:27 <noonedeadpunk> because what this script calculates, I believe, is the amount of active sql requests at a time
15:47:55 <mgariepy> yep, if threads are sleeping it won't count them
15:47:57 <spatel> assuming 'collect' continues collecting data in the background, right? ./mysql-active-conn.sh collect
15:48:06 <damiandabrowski[m]> i think it's very likely to be 0, but You can check the result of `mysql -sse "SELECT user,count(*) FROM information_schema.processlist WHERE command != 'Sleep' GROUP BY user;"`
15:48:19 <damiandabrowski[m]> yeah, i was running 'collect' in the background for 12 hours
15:48:29 <spatel> zero.. result
15:48:52 <noonedeadpunk> hm, that's weird on a busy cloud indeed
15:49:00 <spatel> https://paste.opendev.org/show/811032/
15:49:35 <noonedeadpunk> that is possible actually :)
15:50:22 <mgariepy> but even a sleeping connection is still a connection
15:51:05 <damiandabrowski[m]> but if i understand it correctly, our main point of implementing pooling is to reduce the number of sleeping connections as we don't need them ;)
15:51:23 <damiandabrowski[m]> i mean, with max_pool_size=5, we will have 5 sleeping/active connections per worker per service
15:52:05 <noonedeadpunk> yeah, the idea is that our current default setup is weird in terms of pooling
15:52:24 <noonedeadpunk> because of the huge amount of sleeping connections
15:52:49 <spatel> fyi, i ran this script on the busiest cloud in my datacenter and the result is all zero. i am running collect for just 1 min..
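
The actual script lives in the paste links above; as a rough sketch only, the collect/summary flow described here could look something like the following. The ./data/<user> layout, the 5-second sampling interval, and the per-user maximum in 'summary' are assumptions, not the real script.

    #!/usr/bin/env bash
    # Sketch of the collect/summary helper discussed above (not the
    # script from the paste). 'collect' samples non-sleeping connections
    # per DB user into ./data/<user>; 'summary' prints the peak per user.
    # The real script apparently also records 0 for service users that
    # have no active connections at sample time.
    set -u
    QUERY="SELECT user,count(*) FROM information_schema.processlist WHERE command != 'Sleep' GROUP BY user;"

    case "${1:-}" in
      collect)
        mkdir -p ./data
        while true; do
          mysql -sse "$QUERY" | while read -r user count; do
            echo "$count" >> "./data/$user"
          done
          sleep 5   # sampling interval is an assumption
        done
        ;;
      summary)
        for f in ./data/*; do
          printf '%s: max %s\n' "$(basename "$f")" "$(sort -n "$f" | tail -n1)"
        done
        ;;
      *)
        echo "usage: $0 {collect|summary}" >&2
        exit 1
        ;;
    esac
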
15:53:34 <opendevreview> James Denton proposed openstack/openstack-ansible-os_ironic master: Add [nova] section to ironic.conf https://review.opendev.org/c/openstack/openstack-ansible-os_ironic/+/818115
15:53:38 <spatel> maybe it's because i am running on the stein release :)
15:53:54 <noonedeadpunk> nah, I don't think it matters
15:54:00 <damiandabrowski[m]> and `cat ./data/nova` has only zeros? i was testing it on victoria, but i think it doesn't matter
15:54:03 <noonedeadpunk> it's pure mysql
15:54:16 <spatel> k
15:55:00 <spatel> damiandabrowski[m] yes, all zero for nova and even the other services
15:55:25 <spatel> only root and the system user have 1 and 5 connections
15:55:56 <damiandabrowski[m]> so IMO openstack queries are processed super fast, so the script doesn't catch them
15:56:11 <damiandabrowski[m]> i mean, it just parses `mysql -sse "SELECT user,count(*) FROM information_schema.processlist WHERE command != 'Sleep' GROUP BY user;"` and saves the output to service files in ./data/*
15:57:01 <damiandabrowski[m]> in my case, the output was 0 in 99% of cases
15:59:33 <spatel> noonedeadpunk this is my mytop command output - https://paste.opendev.org/show/811033/
15:59:55 <spatel> the majority are sleeping connections
16:00:09 <noonedeadpunk> yeah and we aim to reduce their number)
16:00:32 <spatel> does sleep consume resources ?
16:01:05 <noonedeadpunk> well, they do. it's not like a _huge_ problem, but unpleasant
16:01:25 <noonedeadpunk> eventually you have tons of hanging tcp connections, which also put some load on the tcp stack
16:02:29 <spatel> how about setting interactive_timeout = 300
16:02:44 <spatel> i believe the default is 8hrs
16:03:00 <spatel> or wait_timeout ?
16:03:35 <spatel> interactive_timeout | 28800
16:03:43 <spatel> wait_timeout | 28800
16:04:33 <damiandabrowski[m]> sleeping connections are the most problematic when galera nodes are going down.
16:04:33 <damiandabrowski[m]> Galera will keep them until timeout, that's how galera can easily reach max_connections
16:05:14 <damiandabrowski[m]> yeah, actually my main point was to implement connection pooling and lower wait_timeout - but I'm open to Your ideas ;)
16:05:33 <spatel> i had the same issue.. that is why i kept max_connections at 5000 and more... my upgrade failed last time because of the high max_connections limit
16:08:20 <mgariepy> damiandabrowski[m], which service consumes the most mysql connections?
16:10:26 <damiandabrowski[m]> in my case: nova
16:10:48 <spatel> yes, nova is always on top
16:11:58 <noonedeadpunk> #endmeeting
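
Two ad-hoc checks related to the timeout discussion near the end, as a sketch only, assuming direct mysql client access on a galera node (28800 seconds is the default quoted above):

    # Show the current idle-connection timeouts (28800 s = 8 h):
    mysql -e "SHOW VARIABLES WHERE Variable_name IN ('wait_timeout','interactive_timeout');"

    # Count sleeping vs. active connections per user, to see which
    # service (nova in the cases above) holds the most idle connections:
    mysql -sse "SELECT user, IF(command = 'Sleep', 'sleeping', 'active') AS state, COUNT(*)
                FROM information_schema.processlist
                GROUP BY user, state;"
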