Monday, 2019-11-18

*** dsneddon_ has quit IRC00:04
*** ivve has quit IRC00:33
openstackgerritzhanghao proposed openstack/neutron master: Make network support read and write separation  https://review.opendev.org/67716601:24
*** ociuhandu has joined #openstack-neutron01:31
*** chenhaw has joined #openstack-neutron01:35
*** ociuhandu has quit IRC01:41
*** goldyfruit_ has quit IRC01:53
*** dsneddon_ has joined #openstack-neutron02:00
*** macz has joined #openstack-neutron02:05
*** dsneddon_ has quit IRC02:05
*** zhanglong has joined #openstack-neutron02:24
*** ociuhandu has joined #openstack-neutron02:26
*** ociuhandu has quit IRC02:30
*** ramishra has joined #openstack-neutron02:39
*** macz has quit IRC02:51
*** ociuhandu has joined #openstack-neutron03:00
*** ociuhandu has quit IRC03:07
*** dsneddon_ has joined #openstack-neutron04:01
*** dsneddon_ has quit IRC04:05
*** baojg has quit IRC04:22
*** ociuhandu has joined #openstack-neutron04:27
*** ociuhandu has quit IRC04:29
*** ociuhandu has joined #openstack-neutron04:30
*** ociuhandu has quit IRC04:36
*** ociuhandu has joined #openstack-neutron04:53
*** ociuhandu has quit IRC04:58
*** ociuhandu has joined #openstack-neutron04:58
*** ociuhandu has quit IRC05:03
*** gcheresh_ has joined #openstack-neutron05:12
*** gcheresh_ has quit IRC05:20
*** ratailor has joined #openstack-neutron05:23
*** ileixe has quit IRC05:29
*** gcheresh_ has joined #openstack-neutron05:30
*** ociuhandu has joined #openstack-neutron05:30
*** ileixe has joined #openstack-neutron05:32
*** ociuhandu has quit IRC05:35
*** ircuser-1 has quit IRC05:35
*** slaweq has joined #openstack-neutron05:47
*** gcheresh_ has quit IRC05:53
*** slaweq has quit IRC05:57
*** abaindur has joined #openstack-neutron05:58
*** dsneddon_ has joined #openstack-neutron06:02
*** abaindur has quit IRC06:05
*** dsneddon_ has quit IRC06:06
*** Luzi has joined #openstack-neutron06:11
*** awalende has joined #openstack-neutron06:16
*** awalende has quit IRC06:20
*** numans_ has joined #openstack-neutron06:44
*** ksambor has joined #openstack-neutron06:51
*** ileixe has quit IRC06:51
*** ileixe has joined #openstack-neutron06:52
*** abaindur has joined #openstack-neutron07:01
*** ociuhandu has joined #openstack-neutron07:02
*** rcernin has quit IRC07:04
*** ociuhandu has quit IRC07:06
*** abaindur has quit IRC07:07
*** ltomasbo has joined #openstack-neutron07:13
openstackgerritOleg Bondarev proposed openstack/neutron master: L3 agent graceful shutdown  https://review.opendev.org/69332307:19
*** rpittau|afk is now known as rpittau07:28
*** maciejjozefczyk has joined #openstack-neutron07:36
*** slaweq has joined #openstack-neutron07:38
*** igordc has joined #openstack-neutron07:58
*** tkajinam has quit IRC08:02
*** igordc has quit IRC08:02
*** dsneddon_ has joined #openstack-neutron08:03
*** dsneddon_ has quit IRC08:08
*** luksky has joined #openstack-neutron08:08
*** gcheresh_ has joined #openstack-neutron08:09
*** lajoskatona has joined #openstack-neutron08:11
*** tesseract has joined #openstack-neutron08:15
*** jlibosva has joined #openstack-neutron08:16
*** ociuhandu has joined #openstack-neutron08:22
*** ociuhandu has quit IRC08:29
*** ratailor_ has joined #openstack-neutron08:40
*** ratailor has quit IRC08:42
*** lucasagomes has joined #openstack-neutron08:45
*** ivve has joined #openstack-neutron08:46
*** jpena|off is now known as jpena08:49
*** ralonsoh has joined #openstack-neutron08:49
*** ociuhandu has joined #openstack-neutron09:04
openstackgerritMerged openstack/networking-ovn master: [vagrants] Move to Ubuntu 18.04 by default  https://review.opendev.org/69279009:06
*** nanzha has joined #openstack-neutron09:10
*** yankcrime has left #openstack-neutron09:12
*** awalende has joined #openstack-neutron09:16
*** awalende has quit IRC09:18
openstackgerritDaniel Bengtsson proposed openstack/neutron master: Stop configuring install_command in tox.  https://review.opendev.org/69456809:21
*** awalende has joined #openstack-neutron09:26
*** awalende has quit IRC09:30
openstackgerritSlawek Kaplonski proposed openstack/networking-bagpipe stable/train: bagpipe-bgp: cleanly ignore RTC route of unsupported type  https://review.opendev.org/69039509:36
*** awalende has joined #openstack-neutron09:36
openstackgerritSlawek Kaplonski proposed openstack/networking-bagpipe stable/stein: bagpipe-bgp: cleanly ignore RTC route of unsupported type  https://review.opendev.org/69039609:37
*** awalende has quit IRC09:37
*** ileixe has quit IRC09:40
*** ratailor__ has joined #openstack-neutron09:41
*** ileixe has joined #openstack-neutron09:41
*** ratailor_ has quit IRC09:43
openstackgerritMerged openstack/neutron-fwaas stable/stein: Fix list_entries for netlink_lib when running on py3  https://review.opendev.org/69383409:45
*** bobmel has joined #openstack-neutron09:45
*** ociuhandu has quit IRC09:57
*** ociuhandu has joined #openstack-neutron09:58
*** ociuhandu has quit IRC10:03
*** dsneddon_ has joined #openstack-neutron10:04
*** lennyb has quit IRC10:06
*** zhanglong has quit IRC10:06
*** dsneddon_ has quit IRC10:08
*** ileixe has quit IRC10:10
*** pcaruana has joined #openstack-neutron10:10
*** nanzha has quit IRC10:11
*** ileixe has joined #openstack-neutron10:11
*** nanzha has joined #openstack-neutron10:12
openstackgerritLucas Alvares Gomes proposed openstack/networking-ovn master: Add support for virtual port type  https://review.opendev.org/67622310:23
openstackgerritMerged openstack/neutron-fwaas stable/rocky: Fix list_entries for netlink_lib when running on py3  https://review.opendev.org/69383510:32
*** CeeMac has joined #openstack-neutron10:37
*** davidsha has joined #openstack-neutron10:42
openstackgerritDaniel Alvarez proposed openstack/networking-ovn stable/train: [metadata-agent] Fix issue with TLS/SSL connections  https://review.opendev.org/69474210:42
openstackgerritAditya Reddy Nagaram proposed openstack/neutron master: [WIP] Support for stateless security groups  https://review.opendev.org/57276710:44
*** rcernin has joined #openstack-neutron10:52
*** chenhaw has quit IRC10:59
*** ociuhandu has joined #openstack-neutron11:02
openstackgerritMerged openstack/neutron stable/stein: Add extra unit test for get_cmdline_from_pid function  https://review.opendev.org/69431611:03
*** luksky has quit IRC11:06
*** awalende has joined #openstack-neutron11:14
*** ramishra has quit IRC11:14
*** awalende has quit IRC11:14
*** awalende has joined #openstack-neutron11:20
*** awalende has quit IRC11:21
*** luksky has joined #openstack-neutron11:22
openstackgerritAditya Reddy Nagaram proposed openstack/neutron master: [WIP] Support for stateless security groups  https://review.opendev.org/57276711:29
*** bobmel has quit IRC11:52
openstackgerritMerged openstack/neutron-lib stable/train: install neutron_lib international messages  https://review.opendev.org/68961912:02
*** zhanglong has joined #openstack-neutron12:04
*** dsneddon_ has joined #openstack-neutron12:05
*** dsneddon_ has quit IRC12:09
*** ociuhandu has quit IRC12:13
*** jpena is now known as jpena|lunch12:26
*** ramishra has joined #openstack-neutron12:26
*** rcernin has quit IRC12:31
openstackgerritAdrian Chiris proposed openstack/neutron master: Add upgrade check for NIC Switch agent  https://review.opendev.org/69475712:32
*** lpetrut has joined #openstack-neutron12:42
*** ociuhandu has joined #openstack-neutron12:43
*** luksky has quit IRC12:47
*** ratailor__ has quit IRC12:51
*** dsneddon_ has joined #openstack-neutron12:53
*** zhanglong has quit IRC13:02
*** ociuhandu has quit IRC13:04
*** zhanglong has joined #openstack-neutron13:07
*** lennyb has joined #openstack-neutron13:09
*** zhanglong has quit IRC13:13
*** sapd1 has quit IRC13:16
*** lseki has joined #openstack-neutron13:17
openstackgerritLajos Katona proposed openstack/neutron master: HA race condition test for DHCP scheduling  https://review.opendev.org/68398713:18
*** dtantsur is now known as dtantsur|bbl13:21
*** nanzha has quit IRC13:24
*** nanzha has joined #openstack-neutron13:25
*** jpena|lunch is now known as jpena13:25
*** zhanglong has joined #openstack-neutron13:29
*** damiandabrowski2 has joined #openstack-neutron13:33
*** nweinber has joined #openstack-neutron13:37
*** luksky has joined #openstack-neutron13:39
openstackgerritSlawek Kaplonski proposed openstack/neutron master: Switch neutron-tempest-with-os-ken-master job to zuul v3  https://review.opendev.org/69477013:40
damiandabrowski2Hello, is it possible for neutron to provide full multihomed BGP routing for the cloud (not only advertise routes, but also direct traffic to one of the external/provider BGP peers)? The documentation (neutron-dynamic-routing) shows route advertisement, but that would leave the outgoing traffic statically routed through one of the external networks.13:44
slaweqdamiandabrowski2: afaict we don't have anything like what You're asking for13:49
slaweqYou can only advertise prefixes from nodes using neutron-dynamic-routing13:50
slaweqbut maybe tidwellr will know more about it as he is an expert in neutron-dynamic-routing13:50
openstackgerritSlawek Kaplonski proposed openstack/neutron-lib master: Revert "'interconnection' API extension definition (neutron-interconnection)"  https://review.opendev.org/69446613:53
*** awalende has joined #openstack-neutron13:55
*** zhanglong has quit IRC13:55
damiandabrowski2slaweq: thanks for Your answer! tidwellr I would be very grateful if You could confirm that it's not possible ATM.13:56
*** zhanglong has joined #openstack-neutron13:57
openstackgerritMerged openstack/networking-ovn stable/train: Add missing unittests to OVN provider driver  https://review.opendev.org/69400413:58
*** ramishra has quit IRC13:59
*** slaweq has quit IRC14:02
*** slaweq has joined #openstack-neutron14:04
openstackgerritMerged openstack/networking-ovn stable/stein: Add missing unittests to OVN provider driver  https://review.opendev.org/69400514:07
*** haleyb has joined #openstack-neutron14:12
*** ramishra has joined #openstack-neutron14:13
*** lennyb has quit IRC14:14
openstackgerritLajos Katona proposed openstack/networking-odl master: Change function.func_doc to function.__doc__  https://review.opendev.org/68315214:15
openstackgerritLajos Katona proposed openstack/networking-odl master: Try deinit odl_features in TestOdlFeaturesNoFixture setUpClass  https://review.opendev.org/66890414:17
*** awalende has quit IRC14:26
*** beekneemech is now known as bnemec14:29
*** zhanglong has quit IRC14:34
*** goldyfruit has joined #openstack-neutron14:40
openstackgerritMerged openstack/networking-odl master: Remove the remaining neutron-lbaas related constants  https://review.opendev.org/66816114:44
*** goldyfruit_ has joined #openstack-neutron14:51
*** goldyfruit has quit IRC14:53
*** baha has joined #openstack-neutron14:57
*** Luzi has quit IRC14:59
*** tesseract has quit IRC15:01
*** tesseract has joined #openstack-neutron15:01
*** dtantsur|bbl is now known as dtantsur15:01
tidwellrdamiandabrowski2: neutron-dynamic-routing will only announce the appropriate next-hops for floating IP's, subnets, and when using DVR the fixed IP. At the moment it doesn't steer egress traffic originating from VM's, the BGP announcements will only steer ingress traffic15:05
*** tidwellr has quit IRC15:06
damiandabrowski2ok Thank You!15:09
*** dsneddon_ has quit IRC15:18
*** dsneddon_ has joined #openstack-neutron15:23
*** ociuhandu has joined #openstack-neutron15:26
*** ociuhandu has quit IRC15:28
*** dsneddon_ has quit IRC15:28
fricklerdamiandabrowski2: the bgp speakers in neutron are also not directly attached to the datapath, so what you want would be very difficult to achieve. most likely you rather want to setup a (pair of) router(s) in front of your openstack cloud that does this15:30
*** lajoskatona has quit IRC15:30
*** dsneddon_ has joined #openstack-neutron15:49
*** dsneddon_ has quit IRC15:55
*** dklyle has quit IRC15:57
*** macz has joined #openstack-neutron15:58
*** dklyle has joined #openstack-neutron15:58
*** luksky has quit IRC15:59
*** dsneddon_ has joined #openstack-neutron16:00
zigoI'm getting a huge amount of logs from openvswitch-agent, things like this: http://paste.openstack.org/show/786284/16:04
*** mlavalle has joined #openstack-neutron16:04
zigoThis looks like a real bug in Neutron that's been there for a long time already. :(16:04
*** jmlowe has joined #openstack-neutron16:07
*** gcheresh has joined #openstack-neutron16:08
*** gcheresh_ has quit IRC16:09
*** ociuhandu has joined #openstack-neutron16:14
*** gcheresh has quit IRC16:14
*** ociuhandu has quit IRC16:19
zigoI'm having this issue often, and the only way I know to fix is: 1/ stop neutron-l3 and ovs-agent 2/ iptables -F ; iptables -X 3/ restart the agents.16:21
zigoThis is *very* annoying ...16:21
zigoAny clue on what's going on?16:22
zigoslaweq: mlavalle: ^16:22
zigoI'm also getting this in the l3-agent logs: http://paste.openstack.org/show/786286/16:28
*** ociuhandu has joined #openstack-neutron16:30
*** gcheresh has joined #openstack-neutron16:33
openstackgerritMerged openstack/networking-ovn stable/train: [metadata-agent] Fix issue with TLS/SSL connections  https://review.opendev.org/69474216:34
openstackgerritSlawek Kaplonski proposed openstack/neutron master: Switch neutron-tempest-with-os-ken-master job to zuul v3  https://review.opendev.org/69477016:42
*** gcheresh has quit IRC16:44
fricklerzigo: is that also on rocky or newer?16:52
njohnstonslaweq: So we have the new review-priority field, is it applied to neutron-lib as well?  Have we written down the rules on when to set it so we have a common understanding?16:53
zigofrickler: Rocky.16:54
zigo13.0.4...16:54
zigofrickler: Is this fixed in the point release ?16:54
zigo13.0.5 ?16:54
zigoI saw related commits on the tip of the branch.16:55
fricklernjohnston: neutron-lib doesn't have rp yet16:55
fricklerzigo: I don't know anything about your issue in general, but it seems that py3 related testing in rocky was a bit thin, so I'd not be surprised if this was another py3 issue. I've seen some already, though they were more obvious16:56
zigo:/16:57
*** ralonsoh has quit IRC16:58
*** aedc has joined #openstack-neutron16:59
*** ralonsoh has joined #openstack-neutron17:01
*** luksky has joined #openstack-neutron17:02
*** lucasagomes has quit IRC17:03
*** jmlowe has quit IRC17:04
*** ociuhandu has quit IRC17:17
*** rpittau is now known as rpittau|afk17:18
*** nanzha has quit IRC17:20
*** jlibosva has quit IRC17:23
*** nweinber has quit IRC17:26
*** ociuhandu has joined #openstack-neutron17:28
*** dsneddon_ has quit IRC17:32
*** dsneddon_ has joined #openstack-neutron17:34
openstackgerritMerged openstack/networking-ovn master: Devstack: Install six via pip  https://review.opendev.org/69209617:36
*** dsneddon_ has quit IRC17:39
*** davidsha has quit IRC17:39
*** ircuser-1 has joined #openstack-neutron17:40
*** ociuhandu has quit IRC17:40
*** jpena is now known as jpena|off17:47
*** jlibosva has joined #openstack-neutron17:51
openstackgerritMerged openstack/networking-ovn stable/stein: Support for Router Scheduling on addition/removal of chassis  https://review.opendev.org/69436217:51
*** bobmel has joined #openstack-neutron17:51
*** jlibosva has quit IRC18:01
*** tbachman has joined #openstack-neutron18:02
*** dtantsur is now known as dtantsur|afk18:03
*** dsneddon_ has joined #openstack-neutron18:11
openstackgerritAdrian Chiris proposed openstack/neutron master: Add upgrade check for NIC Switch agent  https://review.opendev.org/69475718:18
*** manjeets has joined #openstack-neutron18:31
*** mvkr has quit IRC18:32
*** tbachman has quit IRC18:36
*** hjensas has quit IRC18:36
*** jlibosva has joined #openstack-neutron18:44
*** igordc has joined #openstack-neutron18:46
*** gouthamr_ is now known as gouthamr18:51
*** tesseract has quit IRC18:52
*** tbachman has joined #openstack-neutron18:52
*** ralonsoh has quit IRC18:56
*** jlibosva has quit IRC19:04
*** hjensas has joined #openstack-neutron19:11
*** aedc has quit IRC19:29
*** abaindur has joined #openstack-neutron19:34
*** dsneddon_ has quit IRC19:35
*** manjeets has quit IRC19:35
*** abaindur has quit IRC19:35
*** abaindur has joined #openstack-neutron19:36
*** manjeets has joined #openstack-neutron19:39
*** jlibosva has joined #openstack-neutron19:40
*** dsneddon_ has joined #openstack-neutron19:40
*** lajoskatona has joined #openstack-neutron19:48
*** dsneddon_ has quit IRC19:48
*** abaindur has quit IRC19:50
*** dsneddon_ has joined #openstack-neutron19:50
*** abaindur has joined #openstack-neutron19:51
*** dsneddon_ has quit IRC19:55
*** gcheresh has joined #openstack-neutron19:55
openstackgerritBrian Haley proposed openstack/neutron master: Add accepted egress direct flow  https://review.opendev.org/66699119:56
openstackgerritTerry Wilson proposed openstack/networking-ovn master: Fix agent extension support after hashring merge  https://review.opendev.org/69484019:58
*** lajoskatona has quit IRC19:59
*** dsneddon_ has joined #openstack-neutron20:06
*** dsneddon_ has quit IRC20:11
*** jmlowe has joined #openstack-neutron20:16
*** bobmel has quit IRC20:17
*** gcheresh has quit IRC20:47
*** dsneddon_ has joined #openstack-neutron20:50
*** dsneddon_ has quit IRC20:58
*** goldyfruit_ has quit IRC21:02
*** dsneddon_ has joined #openstack-neutron21:04
*** dsneddon_ has quit IRC21:08
*** jlibosva has quit IRC21:10
*** awalende has joined #openstack-neutron21:16
*** goldyfruit has joined #openstack-neutron21:17
*** awalende has quit IRC21:21
openstackgerritBrian Haley proposed openstack/networking-ovn master: Correctly initialize HashRingIsEmpty class  https://review.opendev.org/69484721:22
*** gcheresh has joined #openstack-neutron21:22
*** awalende has joined #openstack-neutron21:26
*** awalende has quit IRC21:31
*** rkukura has joined #openstack-neutron21:32
*** awalende has joined #openstack-neutron21:36
*** ociuhandu has joined #openstack-neutron21:38
*** dsneddon_ has joined #openstack-neutron21:40
*** awalende has quit IRC21:40
*** abaindur has quit IRC21:42
*** ociuhandu has quit IRC21:43
*** abaindur has joined #openstack-neutron21:43
*** awalende has joined #openstack-neutron21:46
*** abaindur has quit IRC21:50
*** gcheresh has quit IRC21:52
*** awalende has quit IRC21:56
*** awalende has joined #openstack-neutron21:57
*** awalende has quit IRC22:07
*** awalende has joined #openstack-neutron22:07
*** awalende_ has joined #openstack-neutron22:08
*** awalende has quit IRC22:12
*** abaindur has joined #openstack-neutron22:20
*** abaindur has quit IRC22:24
*** pcaruana has quit IRC22:26
*** maciejjozefczyk has quit IRC22:39
*** maciejjozefczyk has joined #openstack-neutron22:39
*** mvkr has joined #openstack-neutron22:40
*** abaindur has joined #openstack-neutron22:44
*** abaindur has quit IRC22:44
*** abaindur has joined #openstack-neutron22:45
abaindurany idea what might be causing this DBDeadlock when updating agent timestamps?22:45
abaindurERROR oslo_db.api [req-231004a2-d988-47b3-9730-d6b5276fdcf8 - - - - -] DB exceeded retry limit.: DBDeadlock: (_mysql_exceptions.OperationalError) (1205, 'Lock wait timeout exceeded; try restarting transaction') [SQL: u'UPDATE agents SET heartbeat_timestamp=%s WHERE agents.id = %s'] [parameters: (datetime.datetime(2019, 11, 18, 8, 50, 23, 804716), '223c754e-9d7f-4df3-b5a5-9be4eb8692b0')] (Background on this error at: http://sqlalche.me/e/e3q8)22:46
*** awalende_ has quit IRC22:46
*** rcernin has joined #openstack-neutron22:46
abaindurSince upgrading to Rocky, seeing repeatedly, AMQP loses connection (see various errors like missed heartbeats, Socket closed, Broken pipes, etc...) and report state RPCs from agents are timing out. Our rabbit-server and neutron-server are on the same node, all localhost communication22:47
eanderssonabaindur we had this as well22:47
abaindurafter some time, the q-reports-plugin rabbitmq queue grows large22:47
abaindurand we see those DBDeadlock stack traces in logs22:47
eanderssonWe had to scale up neutron massively and after about an hour the load went down and these problems went away22:48
abaindurscale up... what?22:48
abaindurthe state report workers #?22:48
abainduror rpc_workers?22:48
eanderssonrpc workers22:48
eanderssonThe problem we had was that the rpc workers would take up a ton of memory22:48
eanderssonHow long ago was it that you upgraded?22:49
abaindurper atop, not seeing memory that high... although neutron and rabbit are at top of list22:49
eanderssonHow many rpc workers do you have, and how many computes did you upgrade?22:49
abaindurmaybe a month ago or something. but we branched off stable/rocky sometime back in June or July22:50
*** slaweq has quit IRC22:50
eanderssonWe ended up setting our agent down time to 150 for agents22:50
eanderssonbut another key thing we had to do was tweak nova to contact neutron less often22:50
abaindurthis isnt really a scaled setup either - only 22 hypervisors22:51
abaindur345 instances22:51
abaindurI think we have 4 or 6 rpc_workers22:52
eanderssonYou can try to raise heal_instance_info_cache_interval on the compute side22:52
abaindurwe had rpc_state_report_workers = 1, but scaled that up to 4 since it was the agent heartbeats that was getting DBDeadlock22:52
abaindurHow would that help? Just trying to understand problem here - so first, what might cause neutron to miss AMQP heartbeats and get all kinds of Socket closed and broken pipe errors?22:53
eanderssonSo the issue we were having was that computes were hitting neutron too hard, causing deadlocks.22:54
abaindurand then, what would be causing a DBDeadlock? That seems like some kind of bug with synchronization... if we increase timeouts/add workers, worried might just be delaying the problem...?22:54
eanderssonIt's odd as we had the exact same issue and tweaking the workers, agent timeout and the interval on computes fixed it for us22:55
eanderssonand we have 1k+ computes22:55
abaindurwere you hitting some random AMQP connection errors and broken pipes prior to dbdeadlocks?22:55
abaindur AMQP server on 127.0.0.1:5672 is unreachable: <AMQPError: unknown error>. Trying again in 1 seconds.22:55
eanderssonI don't think so.22:55
abaindurWARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: Too many heartbeats missed22:56
eanderssonHaven't seen that.22:56
abaindur WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: error: [Errno 32] Broken pipe22:56
eanderssonWe have heavily tweaked RabbitMQ for our deploys.22:56
abaindur[65c381f9-6766-4d88-815d-e13b74a7c46e] AMQP server 127.0.0.1:5672 closed the connection. Check login credentials: Socket closed: IOError: Socket closed22:56
abaindurAll kinds of various errors like that ^^22:57
eanderssonI only see the above in scenarios where something has gone wrong and everything is trying to reconnect too fast.22:57
abainduryea first few times it happened after a network/DB outage.22:57
abaindurbut most recently, didnt have any control plane outage - all of a sudden AMQP errors, agents start being reported as down22:58
eanderssonSo one of the things we noticed with this issue is that the report queue was growing exponentially.22:58
abaindurTakes about 1 hr for the DBDeadlock errors to pop up22:58
eanderssonq-reports-22:58
abainduryea we see that queue growing22:58
eanderssonor something like that22:58
abaindurrabbitmqctl list_queues | grep -vw "0$"22:59
abaindurTimeout: 60.0 seconds ...22:59
abaindurListing queues for vhost / ...22:59
abaindurname    messages22:59
abaindurq-reports-plugin    923922:59
abaindurYes, thats the only queue that grows22:59
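The `rabbitmqctl list_queues | grep -vw "0$"` check pasted above can also be done programmatically. The following is a minimal sketch, assuming the two-column `name messages` layout shown in the paste; the function name `nonempty_queues` and the zero-length `q-plugin` row in the sample are hypothetical additions for illustration, not output from this deployment.

```python
def nonempty_queues(listing: str) -> dict:
    """Return {queue_name: message_count} for queues holding at least one message.

    Expects text in the layout pasted in the log: banner lines, a
    'name messages' header, then one 'queue-name count' row per queue.
    """
    queues = {}
    for line in listing.splitlines():
        # Split off the last whitespace-separated field, which should be the count.
        parts = line.rsplit(None, 1)
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # banner, header, or blank line
        name, count = parts[0].strip(), int(parts[1])
        if count > 0:
            queues[name] = count
    return queues


# Sample text modeled on the paste above; the q-plugin row is hypothetical.
sample = """Timeout: 60.0 seconds ...
Listing queues for vhost / ...
name    messages
q-reports-plugin    9239
q-plugin    0
"""

print(nonempty_queues(sample))
```

Run periodically, a check like this would show whether `q-reports-plugin` keeps growing between samples or eventually drains.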
abaindurdid you tweak any sql params?23:00
abaindurlike max_pool_size or max_overflow ?23:00
eanderssonWe did at some point, but for Rocky I think we went back to the defaults23:01
abaindurour agent_down_time is actually at 360 sec23:02
*** luksky has quit IRC23:02
abaindurreport_interval is default, i think 30 sec23:02
eanderssonWe have it set to 6023:06
eanderssonBut we also have 1k computes23:06
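As a rough reading of the two settings abaindur quotes, the ratio of `agent_down_time` to `report_interval` indicates how many consecutive state reports have to go unrecorded before an agent is shown as down. The sketch below only does that arithmetic; the helper name is hypothetical, and the 30-second `report_interval` is the value abaindur recalls rather than one confirmed from the deployment's configuration.

```python
def missed_reports_before_down(agent_down_time: float, report_interval: float) -> int:
    """Whole report intervals that fit inside agent_down_time."""
    if report_interval <= 0:
        raise ValueError("report_interval must be positive")
    return int(agent_down_time // report_interval)


# abaindur's figures: agent_down_time = 360 s, report_interval believed to be 30 s.
print(missed_reports_before_down(360, 30))  # 12
```

With these figures, agents showing as down would mean roughly a dozen reports in a row were lost or stuck, which fits a backlog in the report queue better than an occasional dropped message.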
abaindurhow many rpc and state report workers do you have btw?23:07
eandersson5x 20 rpc23:07
eanderssonmaybe 5x 10 state23:07
eanderssonbtw the problem we found with heartbeats was that one of those agent deadlocks locked one process23:08
eanderssonSo with your 9000 queued heartbeats stuck in deadlock23:08
eanderssoneach one of those would lock up one neutron-rpc worker23:09
*** tkajinam has joined #openstack-neutron23:09
abaindurhmm how does it get deadlocked in first place?23:09
eanderssonI think it's retrying too fast23:10
abainduryea in mysql, we saw some stuck sql queries for quite old timestamps that were still trying to be executed23:10
eanderssonLet me see if I can find the code23:10
*** slaweq has joined #openstack-neutron23:11
abaindurcreate_or_update_agent in neutron/db/agents_db.py23:12
abaindurSeen this on some very basic, small setups so i definitely think something is wrong here. seems like it's using some new neutron_lib code to wrap the DB updates since we moved to rocky23:14
*** slaweq has quit IRC23:17
eanderssonYea - we assumed that this was just due to our scale.23:17
eanderssonI cant find my notes, but we found that the agent stuff was hitting a db retry (which retried like 6-10 times and once every 0.5s)23:22
eanderssonWhen multiple were done at the same time it would cause them to race with each other23:23
eanderssonSo each worker would be locked up for the duration of the db retry23:23
eanderssonAnd I am pretty sure that in rocky they introduced that retry.23:23
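To put numbers on the retry behaviour eandersson describes, here is a back-of-the-envelope sketch of how long one worker could stay blocked on a single heartbeat update. It assumes every attempt waits out a full lock wait timeout before failing (the 1205 error pasted earlier), uses 50 seconds for that timeout because it is the MySQL default for `innodb_lock_wait_timeout`, and takes the 6 to 10 attempts and 0.5 s interval from eandersson's recollection. The function names are hypothetical, and the real retry wrapper may back off differently.

```python
def worker_block_seconds(attempts: int, retry_interval: float, lock_wait_timeout: float) -> float:
    """Worst-case time one worker spends on a single update that never succeeds.

    Every attempt is assumed to wait the full lock_wait_timeout, with a
    fixed retry_interval sleep between consecutive attempts.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return attempts * lock_wait_timeout + (attempts - 1) * retry_interval


def worst_case_reports_per_second(workers: int, block_seconds: float) -> float:
    """Throughput if every worker is stuck in the worst-case retry loop."""
    return workers / block_seconds


# Assumed inputs: 6 to 10 attempts, 0.5 s apart, 50 s lock wait (MySQL default).
low = worker_block_seconds(6, 0.5, 50.0)
high = worker_block_seconds(10, 0.5, 50.0)
print(low, high)

# abaindur's 4 state report workers, all stuck in the longer loop.
print(worst_case_reports_per_second(4, high))
```

Under these assumptions a stuck update ties up a worker for several minutes, so a handful of workers would drain far fewer reports than the agents send, consistent with the growing `q-reports-plugin` queue.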
*** mlavalle has quit IRC23:24
*** ivve has quit IRC23:26
*** ociuhandu has joined #openstack-neutron23:31
*** ociuhandu has quit IRC23:36
*** goldyfruit has quit IRC23:36
eanderssonbtw I would send an email to the mailinglist or open a bug abaindur23:37
abainduri am filing one right now :)23:39
*** zhanglong has joined #openstack-neutron23:47
abainduri filed https://bugs.launchpad.net/neutron/+bug/185307123:59
openstackLaunchpad bug 1853071 in neutron "AMQP disconnects, q-reports-plugin queue grows, leading to DBDeadlocks while trying to update agent heartbeats" [Undecided,New]23:59
*** dsneddon_ has quit IRC23:59
