Tuesday, 2022-07-05

*** ysandeep|out is now known as ysandeep|ruck04:02
*** ysandeep|ruck is now known as ysandeep|ruck|afk04:38
*** ysandeep|ruck|afk is now known as ysandeep|ruck05:08
*** ysandeep|ruck is now known as ysandeep|ruck|afk06:18
opendevreviewVishal Manchanda proposed openstack/octavia-dashboard master: Migrate to AngularJS v1.8.2  https://review.opendev.org/c/openstack/octavia-dashboard/+/84617606:35
*** ysandeep|ruck|afk is now known as ysandeep|ruck07:11
*** ysandeep|ruck is now known as ysandeep|lunch08:15
*** ysandeep|lunch is now known as ysandeep|ruck10:16
*** ysandeep|ruck is now known as ysandeep|brb12:52
*** ysandeep|brb is now known as ysandeep|ruck13:01
opendevreviewTom Weininger proposed openstack/octavia master: Proposal for Amphora vertical scaling  https://review.opendev.org/c/openstack/octavia/+/84810514:00
opendevreviewTom Weininger proposed openstack/octavia master: Add element for TuneD and Tuna  https://review.opendev.org/c/openstack/octavia/+/84863714:00
*** ysandeep|ruck is now known as ysandeep|out14:39
opendevreviewTom Weininger proposed openstack/octavia master: Add element for TuneD and Tuna  https://review.opendev.org/c/openstack/octavia/+/84863715:14
spateljohnsom Hi20:04
johnsomspatel Hi20:05
spatelI have haproxy related question, i think you can help me out here :)  - https://paste.opendev.org/show/bhM1FR1Y0fzOyaSPexiN/20:05
spatelI have 5 backend servers and single source IP 20:06
spatelwhy am i running out of ports even though i have enough local port range?20:06
johnsomAre you benchmarking or normal traffic?20:06
spatelI am benchmarking haproxy 20:07
spatelwe have built in load tester for my customer application 20:07
spatelI am generating 50k connections to haproxy, but as soon as it hits 35k-ish the problems start and haproxy.log fills up with errors20:08
johnsomYeah, ok, so the kernel will put the ports in TIME_WAIT for a period after use. When benchmarking you are making a lot of short connections (unless you use HTTP keepalive), which can use up the ports and they will be stuck in "TIME_WAIT".20:08
johnsomYou can use lsof to see those20:09
spatelI didn't notice any errors in dmesg (kernel logs related to local_port_range)20:09
spatellsof -u haproxy -n | grep WAIT20:10
spatelempty output20:11
johnsomDon't use -u, just lsof -n20:11
johnsomTechnically they aren't owned by haproxy at that point.20:11
johnsomHAProxy has let them go20:11
spatelmy load-tester has a keepalive which sends ping packets to keep the connection alive.. 20:11
johnsomIs it tcp keepalive or HTTP keepalive, they are different20:12
spatellsof -n | grep WAIT20:12
spatelempty output 20:12
johnsomAnd the benchmark is running?20:12
spatellsof -n | grep EST  | wc -l20:13
spatelI have 15k ESTABLISHED connections20:13
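
As an alternative to grepping lsof output, the per-state socket counts discussed above can be read from the kernel's socket table, which also lists sockets that no process holds open any more. This is a minimal sketch, assuming a Linux host that exposes /proc/net/tcp; the function name `count_tcp_states` is hypothetical and not part of any tool mentioned in this log.

```python
from collections import Counter
from pathlib import Path

# TCP state codes as they appear (in hex) in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_text):
    """Count sockets per TCP state from the text of /proc/net/tcp."""
    counts = Counter()
    for line in proc_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or truncated rows
        code = fields[3].upper()
        counts[TCP_STATES.get(code, code)] += 1
    return counts


if __name__ == "__main__":
    # IPv4 and IPv6 sockets live in separate tables.
    for name in ("/proc/net/tcp", "/proc/net/tcp6"):
        path = Path(name)
        if path.exists():
            print(name, dict(count_tcp_states(path.read_text())))
```

Running it while the load test is active gives the ESTABLISHED and TIME_WAIT totals in one pass, regardless of which user or process originally opened the sockets.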
spatelas an experiment i added an alias IP and configured haproxy to use that ip, and then i can hit 50k connections without issue 20:14
johnsomYeah, it supports multiple source IPs20:15
spatelI am just curious: i have 5 backend servers, so it should not require more source IPs 20:16
johnsomCan you try "echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse"?20:16
spatelok and then run loadtest?20:16
johnsomYeah, it's all about the source ports. From your pastebin, you have 55,000 so, should be ok.20:16
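
The source-port arithmetic behind this exchange can be sketched as follows. A TCP connection is identified by the tuple (source IP, source port, destination IP, destination port), so whether five backends multiply the usable ports depends on whether the same source port can be reused toward different backends. The function name and the `port_reusable_per_backend` flag are hypothetical; 55,000 is the port count johnsom read from the pastebin. Which of the two cases applies to a given HAProxy setup is an assumption to verify, not something this log establishes.

```python
def connection_ceiling(ports, source_ips, backends, port_reusable_per_backend):
    """Rough upper bound on concurrent outbound connections.

    ports: usable source ports per source IP
    source_ips: number of local IPs used for outbound connections
    backends: number of distinct backend (IP, port) destinations
    port_reusable_per_backend: True if one source port may be used once per
        backend (ports picked at connect time); False if each source port is
        claimed exclusively per source IP (ports picked at bind time).
    """
    if port_reusable_per_backend:
        return ports * source_ips * backends
    return ports * source_ips


# One source IP, five backends, 55,000 ports (figure quoted in the chat).
print(connection_ceiling(55_000, 1, 5, True))   # 275000
print(connection_ceiling(55_000, 1, 5, False))  # 55000
# Adding a second source IP doubles the bound in the exclusive case.
print(connection_ceiling(55_000, 2, 5, False))  # 110000
```

In the exclusive case the number of backends does not help at all, which would be consistent with an extra source IP raising the ceiling; the sketch does not by itself explain errors starting near 35k.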
spatelloadtest started 20:17
johnsomDefault is 0, so you can set it back if you wish20:17
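
Before and after changing the setting, it can help to record the current values so they can be restored. The following is a minimal sketch, assuming a Linux host; it only reads `tcp_tw_reuse` and `ip_local_port_range` from /proc/sys/net/ipv4 and changes nothing. The helper names `read_sysctl` and `parse_port_range` are hypothetical.

```python
from pathlib import Path

SYSCTL_ROOT = Path("/proc/sys/net/ipv4")


def parse_port_range(text):
    """Parse the contents of ip_local_port_range, e.g. '32768\t60999'."""
    low, high = (int(part) for part in text.split())
    return low, high


def read_sysctl(name, root=SYSCTL_ROOT):
    """Return the stripped value of a sysctl file, or None if it is absent."""
    path = root / name
    return path.read_text().strip() if path.exists() else None


if __name__ == "__main__":
    print("tcp_tw_reuse:", read_sysctl("tcp_tw_reuse"))
    port_range = read_sysctl("ip_local_port_range")
    if port_range is not None:
        low, high = parse_port_range(port_range)
        # The range is inclusive at both ends.
        print("ip_local_port_range:", low, high, "->", high - low + 1, "ports")
```

Saving this output before the test makes it straightforward to write the original `tcp_tw_reuse` value back afterwards.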
spatelyep! lets see20:17
johnsomThe other thing to note, the health checks will also take a port20:17
johnsomBut, that should be small20:18
spatelYes that is correct20:18
spatelmy end goal is to run 1million connection :D20:18
johnsomThat is a lot, so you will need to do some tuning20:18
spatelbut currently testing with 50k and then try 100k --- 200k -- 500k (baby steps)20:19
johnsomAlso, do you need 1 million connections or requests. Different metrics. Connections is heavily weighted to issues with the TCP setup20:20
spatelYes we are planning to play with more options for 1mil20:20
johnsomIf you haven't seen it20:20
spateli am seeing error in logs no free ports :(20:20
spatel  /proc/sys/net/ipv4/tcp_tw_reuse didn't help20:21
johnsomHmmm. The other thing that will be an issue for you is the TLS offload and the logging.20:21
spatelI will deal with TLS later but currently not able to run 50k :)20:22
spatellets try to pass 50k first 20:22
johnsomTLS is going to need some hardware help, so make sure AESNI is available, but with that connection level, you might need to go all the way to QAT offload.20:22
spatelplan is to deploy multiple HA proxy in DNS rr method 20:22
spatelin worst case scenario 20:23
johnsomYeah, no need for session persistence, it works fine20:23
spateli want to check capacity of single HA proxy first20:23
spatelbased on that number i can go with N number of haproxy behind DNS 20:23
johnsomMy (old haproxy version) benchmark is around 35k rps per core20:24
spateli have 10 core VM 20:24
spatel4 core assigned to haproxy 20:24
johnsomOk, check one more thing, if you restart haproxy, in the log file are there comments about the maximum number of open files? Any error/warnings?20:26
spatelmy open file limit is 10million 20:28
spateli did all kind of tuning... :?920:28
spatellast 2 days doing googling to make it work 20:28
johnsomSo, you say you have 15k in established, what are those for? Is that during the benchmark?20:30
johnsomOr just normal situation20:30
spatel15k was current value when i run that command 20:31
johnsomThe benchmark doesn't have some kind of idle connection pool thing that is actually causing  a problem does it?20:31
johnsomSome protocols will hold open a channel "in case" there is another request, which the benchmark is likely not using since it's trying to maximize connections20:32
spatelwhen i start load it can ramp up from 0 to 50k in 5 min interval and keep all connection active with running ping based keepalive (its all magic of my application loadtester)20:32
spatelsame loadtest tools working for F5 load-balancer 20:32
spatelon F5 we hit 1 million without issue.. i am trying to mimic on Haproxy 20:33
johnsomCheck your F5 config for "forced close" or similar20:33
johnsomI don't remember what it's called20:33
johnsomFor TCP traffic, it's related to FIN and RST packets20:34
johnsomWhat protocol are you using over TCP?20:35
johnsomTry adding "option nolinger" to your defaults section in haproxy.cfg20:37
spatelhmm 20:38
johnsomOh, actually, just set it in the backend section20:38
spatellet me try that 20:38
spateli can set on both place :)20:38
johnsomI have a suspicion this is an issue with the protocol you are running on top of TCP.20:39
spatellet me try some option and see20:39
johnsomYou might try an alternative benchmark tool, like ethr or something and see if the results are different. 20:41
johnsomOk, play around with it. If the CPU isn't maxed out for HAProxy, it's something other than haproxy that is limiting it.20:43
johnsomI am on vacation this week, so not around a lot. But let me know how it goes.20:43
johnsomAlso, the folks in https://haproxy.slack.com/ might also have other ideas for you20:44
spatelMy big question is: if haproxy is running out of connections, then the kernel should log that in dmesg (but in my case it's not)20:44
johnsomNot really, that is an error that the kernel will return to the requesting application20:45
johnsomFrom the error it sure seemed like reuse was the answer, this is common, but since that didn't work, I am wondering if it's a benchmark tool issue and related to the FIN/RST settings like nolinger20:46
spatelI am going to try option nolinger (just waiting for my current loadtest to finish otherwise it will be mess to clean up all lingering stuff)20:47
