16:00:26 <johnsom> #startmeeting Octavia
16:00:27 <openstack> Meeting started Wed Jun 10 16:00:26 2020 UTC and is due to finish in 60 minutes.  The chair is johnsom. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:28 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:30 <openstack> The meeting name has been set to 'octavia'
16:00:52 <johnsom> Hi folks
16:00:59 <gthiemonge> hi
16:01:06 <cgoncalves> hi
16:01:56 <johnsom> #topic Announcements
16:02:03 <johnsom> Seems like a small group this week
16:02:22 <johnsom> FYI, we kept notes from the PTG sessions on the etherpad:
16:02:29 <johnsom> #link https://etherpad.opendev.org/p/octavia-virtual-V-ptg
16:02:38 <johnsom> In case anyone missed the fun and excitement!
16:03:43 <johnsom> Also a quick note: we have seen a few reports that octavia-dashboard does not work on Train deployments. Turns out this was an OpenStack Ansible bug that installed the master branch of octavia-dashboard. Fixed here:
16:03:55 <johnsom> #link https://review.opendev.org/734881
16:04:18 <johnsom> Any other announcements this week?
16:05:39 <johnsom> FYI, there are some mailing list discussions about releases:
16:05:41 <johnsom> #link http://lists.openstack.org/pipermail/openstack-discuss/2020-June/015342.html
16:06:13 <johnsom> #topic Brief progress reports / bugs needing review
16:06:26 <johnsom> Well, there was this PTG thing last week. grin
16:06:57 <johnsom> Other than that I have been doing reviews, working to get the gates functional again so that we can land some stuff and do stable branch releases.
16:07:13 <johnsom> Also working on the failover patch and backport to the v2 driver.
16:07:47 <johnsom> That about sums up my week. Anyone else?
16:09:33 <johnsom> #topic HAProxy memory usage and slow reload process cleanup
16:10:11 <gthiemonge> that topic sums up my week ^
16:10:23 <johnsom> gthiemonge Would you introduce this issue?
16:10:38 <gthiemonge> yes
16:11:12 <gthiemonge> so I've found an issue when using active-standby and session persistence on CentOS
16:11:37 <gthiemonge> when the loadbalancer is updated (adding members, etc..), haproxy is reloaded
16:12:25 <gthiemonge> when it is reloading, it creates a new thread, does a lot of allocation, then destroys the previous thread (the worker)
16:12:56 <gthiemonge> in the case of active-standby and session-persistence, it takes 2 min to destroy the previous thread
16:13:10 <gthiemonge> (instead of 1 or 2 seconds)
16:13:37 <gthiemonge> so it means that we have 2 haproxy instances that both consume ~150MB at the same time
16:14:10 <gthiemonge> it should not be a big deal... unless we update the config during this period -> it creates a new worker that consumes 150MB
16:14:29 <gthiemonge> so after a few config updates, we hit a memory issue and haproxy crashes
16:15:15 <gthiemonge> I have more detail in a downstream bug: #link https://bugzilla.redhat.com/show_bug.cgi?id=1845406#c2
16:15:16 <openstack> bugzilla.redhat.com bug 1845406 in openstack-octavia "octavia_tempest_plugin.tests.api.v2.test_pool.PoolAPITest.test_pool_delete fails in ACTIVE_STANDBY jobs" [High,Assigned] - Assigned to gthiemon
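[editor's note] For context, a generic illustration of an HAProxy hitless ("soft") reload, which is why two full-size processes coexist during the window described above; this is not necessarily the exact command the amphora agent runs, and the paths are assumptions:
    # generic soft-reload invocation; config and pid file paths are assumed
    haproxy -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid \
            -sf $(cat /run/haproxy.pid)
    # -sf tells the old process(es) to stop accepting new connections and
    # exit once existing sessions finish; with session persistence that
    # drain can take minutes, so old and new workers overlap in memory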
16:15:43 <johnsom> At which point systemd restarts haproxy and things resume until the next update chain, correct?
16:16:06 <gthiemonge> yes, correct
16:16:22 <gthiemonge> I have a paste with logs from ubuntu: http://paste.openstack.org/show/794586/
16:16:35 <johnsom> Which is good, but still not ideal as there is downtime during that systemd restart window.
16:16:48 <gthiemonge> and that one: http://paste.openstack.org/show/794590/ that shows the restart of the service
16:17:09 <gthiemonge> johnsom: I think we can tune the systemd timeout for reload/restart
16:17:41 <gthiemonge> currently, it restarts after 1min30
16:19:01 <johnsom> Yeah, that could be problematic as well though. Sometimes it's good to give a little breathing room between restart attempts.
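[editor's note] A minimal sketch of the kind of systemd drop-in tuning being discussed; the unit name, path, and values are assumptions for illustration, not a recommendation:
    # /etc/systemd/system/haproxy-<lb_id>.service.d/override.conf (hypothetical unit name)
    [Service]
    TimeoutStopSec=90      # how long systemd waits before force-killing a stuck stop/reload
    Restart=on-failure
    RestartSec=5           # breathing room between restart attempts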
16:19:25 <johnsom> I think it would be best to not run into the problem in the first place.
16:19:52 <gthiemonge> sure
16:20:35 <johnsom> So the obvious option is to bump up the RAM allocated to the amphora. It would only use more on the hypervisor when it is needed.  Though the optics on that may not be good. People see 1GB or 2GB and think that is all "reserved" RAM.
16:21:39 <johnsom> We can drop the default max connections from "unlimited" to something more reasonable, thus saving RAM allocation.
16:21:51 <johnsom> This would mean a change to the "default" behavior though.
16:22:27 <johnsom> Overall, I think that is a good idea anyway but rough given how long it's been set like this.
16:22:46 <johnsom> We could add a swap partition. lol
16:23:36 <gthiemonge> can we change the default max connections to a lower value? and add a config option for people who want to override it (to 1M)?
16:23:40 <johnsom> We could stop doing hitless reloads and just stop/start for configuration changes. (no, don't do this)
16:24:55 <johnsom> We could move to HAProxy 2.2 and use the new configuration API that doesn't need to reload..... (though it's not released yet)
16:26:55 <johnsom> I really lean towards dropping the default maxconn to something more reasonable, like 30,000 or so.
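[editor's note] For reference, the knob under discussion is the maxconn directive that ends up in the rendered haproxy.cfg; a sketch with the proposed default (the actual amphora template layout and sections may differ):
    global
        maxconn 30000    # previously rendered as 1000000 when the API connection_limit is -1 ("unlimited")
    # HAProxy sizes per-connection buffers up front, which is why resident
    # memory scales with maxconn even when the connections never arrive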
16:27:12 <gthiemonge> +1
16:27:16 <johnsom> I wonder what the RAM usage delta would be.
16:28:44 <gthiemonge> johnsom: I think RAM usage is linear with the maxconn value, I'll check that
16:30:32 <johnsom> Looks like 139464 -> 6664 RSS
16:30:51 <johnsom> At least on my Ubuntu amp in devstack
16:31:30 <gthiemonge> looks good
16:32:16 <johnsom> So, yeah, saves a lot. Plus that RAM was basically wasted since a single CPU isn't going to handle 1,000,000 concurrent connections.
16:32:59 <johnsom> Could use a bit more and go for 50,000
16:33:21 <johnsom> Again, this is all tunable via the listener settings by end users.
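[editor's note] As mentioned, users can already override the limit per listener via the API; for example with the OpenStack client (listener and load balancer names are placeholders):
    openstack loadbalancer listener set --connection-limit 50000 my-listener
    # or at creation time:
    openstack loadbalancer listener create --name my-listener \
        --protocol HTTP --protocol-port 80 --connection-limit 50000 my-lb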
16:33:33 <johnsom> Anyone else have input?
16:33:55 <cgoncalves> +1
16:34:04 <gthiemonge> 50001?
16:34:10 <johnsom> lol
16:34:13 <cgoncalves> -1 +W
16:34:25 <johnsom> And we are back to 1,000,000! grin
16:35:20 <johnsom> So, the next part of this question is how to implement it. Currently we have "-1" as "unlimited" which translates to 1,000,000 in the configuration file because HAProxy doesn't really have an "unlimited" setting.
16:36:31 <johnsom> I would like to expose to users that it is set for 30,000 instead of pretending with "-1", but we should keep -1 as an option for other drivers.
16:37:06 <gthiemonge> good question
16:38:05 <johnsom> How do we feel about setting it to the new configuration setting, defaulting to 50,000, if they are using the amphora driver and select "-1"?
16:38:42 <johnsom> At least that way it would be truthful and give the user more information and control.
16:38:53 <johnsom> I just hate "magically" changing settings on users.
16:39:37 <gthiemonge> that sounds good, and people can change that value to get back to the previous behavior
16:40:07 <johnsom> Yeah. I can see with HAProxy 2.x they may want a higher value when using multi-CPU amphora.
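[editor's note] A rough Python sketch of the mapping being proposed for the amphora driver; the option name, default value, and function are assumptions for illustration only, not the actual implementation:
    # hypothetical operator option, e.g. [haproxy_amphora] default_connection_limit = 50000
    def effective_connection_limit(listener_connection_limit, conf_default=50000):
        """Translate the API connection_limit into what gets rendered into haproxy.cfg.

        -1 ("unlimited") falls back to the operator-configurable default
        instead of silently becoming 1,000,000.
        """
        if listener_connection_limit is None or listener_connection_limit == -1:
            return conf_default
        return listener_connection_limit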
16:41:24 <johnsom> Any other comments/thoughts on this?
16:42:29 <johnsom> We could update the API reference to say "The maximum number of connections permitted for this listener. Default value is -1 which represents infinite connections or a default value set in the configuration of a driver." Something like that I guess.
16:43:47 <johnsom> Well, this sounds like the best path forward. gthiemonge are you going to propose a patch?
16:44:14 <gthiemonge> johnsom: yes!
16:45:11 <johnsom> Hopefully we can get wider feedback on the patch proposal.
16:45:19 <johnsom> Cool, thanks for raising this!
16:45:59 <johnsom> #topic Open Discussion
16:46:01 <gthiemonge> np
16:46:11 <johnsom> Any other topics this week?
16:46:49 <johnsom> upstream HAProxy has released a 2.1 version with the "-x" issue fixed.
16:47:05 <johnsom> I'm not sure when the 1.8 version will land, but it's planned.
16:48:09 <johnsom> #link https://github.com/haproxy/haproxy/issues/644
16:48:38 <gthiemonge> you can use https://review.opendev.org/#/c/698086/ to test it ;-)
16:49:23 <johnsom> Lol, I should have! I just compiled one to test it.
16:50:55 <johnsom> Ok, if there are no other topics this week we can call it for today. Thanks!
16:51:08 <johnsom> #endmeeting