14:00:13 <liuyulong> #startmeeting neutron_l3
14:00:13 <openstack> Meeting started Wed Jul 10 14:00:13 2019 UTC and is due to finish in 60 minutes.  The chair is liuyulong. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:14 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:16 <openstack> The meeting name has been set to 'neutron_l3'
14:00:20 <njohnston> o/
14:00:28 <liuyulong> #chair haleyb
14:00:29 <openstack> Current chairs: haleyb liuyulong
14:01:01 <ralonsoh> hi
14:01:59 <liuyulong88> The nickname 'liuyulong' is already in use!
14:02:12 <liuyulong88> OK
14:02:22 <liuyulong88> #topic Announcements
14:03:30 <liuyulong_> I have no announcement today.
14:03:43 <liuyulong_> If you have, please go ahead.
14:04:46 <liuyulong_> OK, let's move on.
14:04:53 <liuyulong_> #topic Bugs
14:05:16 <liuyulong_> #link http://lists.openstack.org/pipermail/openstack-discuss/2019-July/007455.html
14:05:22 <liuyulong_> Bence Romsics (rubasov) was our bug deputy the week before last, thank you.
14:05:35 <liuyulong_> #link http://lists.openstack.org/pipermail/openstack-discuss/2019-July/007577.html
14:05:41 <liuyulong_> Bernard Cafarelli (bcafarel) was last week's bug deputy, thank you as well.
14:06:45 <liuyulong_> #link https://bugs.launchpad.net/neutron/+bug/1835044
14:06:46 <openstack> Launchpad bug 1835044 in neutron "[Queens] Memory leak in pyroute2 0.4.21" [High,Won't fix] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
14:06:54 <liuyulong_> This is now marked as wont-fix.
14:06:58 <ralonsoh> yes
14:07:09 <liuyulong_> But this popular RPM repo still does not have a new version of pyroute2 for the Queens release.
14:07:11 <ralonsoh> because we can't modify stable requirements
14:07:12 <liuyulong_> #link http://mirror.centos.org/centos/7.6.1810/cloud/x86_64/openstack-queens/
14:07:18 <liuyulong_> It is still python2-pyroute2-0.4.21-1.el7.noarch.rpm
14:07:33 <slaweq> hi
14:07:40 <ralonsoh> each company should fix this
14:07:54 <ralonsoh> we are pushing the changes for our RPM repos
14:08:19 <ralonsoh> but this won't be changed in a stable branch unless this is a security problem
14:08:52 <liuyulong_> So the R and S repos have the new version?
14:09:05 <ralonsoh> in devstack/requirements yes
14:09:08 <ralonsoh> not Q
14:09:17 <ralonsoh> 0.5.2 vs 0.4.21
14:10:05 <liuyulong_> Our environments are all running Queens, so we do need a repo fix. : )
14:10:39 <liuyulong_> ralonsoh, thank you for working on this.
14:10:46 <liuyulong_> #link https://bugs.launchpad.net/neutron/+bug/1834308
14:10:47 <openstack> Launchpad bug 1834308 in neutron "[DVR][DB] too many slow query during agent restart" [Medium,Confirmed] - Assigned to LIU Yulong (dragon889)
14:10:47 <ralonsoh> my pleasure
14:11:02 <liuyulong_> I will submit a fix for the DVR-related DB queries.
14:11:31 <liuyulong_> Our DBA helped me get some slow query logs.
14:13:15 <liuyulong> sorry lost the connection again....
14:13:39 <liuyulong> We noticed 300k+ slow queries (0.5s+) during an ovs-agent restart across 30 nodes.
14:14:08 <liuyulong> Yeah, most of them are related to DVR
14:14:32 <liuyulong> next one may be related to this.
14:14:34 <liuyulong> #link https://bugs.launchpad.net/neutron/+bug/1835663
14:14:34 <openstack> Launchpad bug 1835663 in neutron "Some L3 RPCs are time-consuming especially get_routers" [Medium,Confirmed]
14:14:42 <liuyulong> As you noticed, it is really slow.
14:14:47 <liuyulong> http://logs.openstack.org/11/669111/4/check/neutron-tempest-plugin-dvr-multinode-scenario/dc3af26/controller/logs/screen-q-l3.txt.gz#_Jul_07_04_18_11_791730
14:15:24 <liuyulong> IMO, 37s for a single RPC is not acceptable for a production environment. My ops colleagues will complain. : )
14:16:55 <liuyulong> Slow DB queries on the neutron server side may be one reason.
14:18:32 <haleyb> liuyulong: ack, and that log was from a check job?  that's pretty bad
14:19:29 <liuyulong> For this log, maybe the upstream CI neutron server just hit its bottleneck. It cannot answer too many RPC calls concurrently.
14:19:52 <liuyulong> haleyb, yes, it is bad
14:20:20 <liuyulong> #link https://review.opendev.org/#/c/669111/
14:20:33 <liuyulong> ralonsoh, slaweq, hi, this patch ^^
14:21:01 <ralonsoh> there is an implementation for this function
14:21:03 <liuyulong> The time cost wrapper, I left some comments
14:21:16 <ralonsoh> I'll review it after the meeting
14:21:52 <liuyulong> ralonsoh, yes, it's good to know we already have a similar function.
14:22:46 <liuyulong> Let me quote the comment here:
14:23:03 <liuyulong> but it cannot distinguish each call of the same RPC, so I will still add a wrapper here which calls that function inside. And a log for the function start is needed as well. We need to know precisely when each call starts and ends.
14:23:35 <njohnston> yes exactly I think that if you override the message argument to the oslo.utils time_it function and add the generated uuid then you get the benefit of the function
14:24:22 <ralonsoh> njohnston, agree
14:24:34 <njohnston> but you could do that without a separate decorator
14:25:52 <liuyulong> njohnston, how can we enable the start log without a new decorator?
14:26:33 <liuyulong> We may want to see what happened between start and end.
14:26:58 <liuyulong> time_it just logs the duration.
14:27:33 <njohnston> @osloutils.time_it(message="time-cost: %(seconds).02f seconds to run function '%(func_name)s', uuid=" + uuidutils.generate_uuid())
14:29:12 <njohnston> I see, so you believe there is value in having the "call: start" separate from the "call: ended, time = %d" log messages
14:29:19 <njohnston> I can see your point
14:31:02 <slaweq> I agree with liuyulong that separate log for start and end can be useful
14:31:09 <liuyulong> And I have another concern: 'StopWatch' is used in 'time_it'. It looks pretty complicated, and I don't know if it will cause problems in RPC calls.
14:31:29 <ralonsoh> this is just a context manager
14:33:16 <liuyulong> OK, I will refactor this decorator.
14:33:30 <liuyulong> It will be useful for upstream CI
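A minimal sketch of the kind of wrapper discussed above, not the code from the patch under review. It assumes plain standard-library logging instead of oslo.utils, and the names `timecost` and `get_routers` are hypothetical. The id is generated inside the wrapper, so each call gets its own; a uuid concatenated into a decorator's message argument would be evaluated only once, when the function is decorated, and shared by every call.

```python
import functools
import logging
import time
import uuid

LOG = logging.getLogger(__name__)


def timecost(func):
    """Log the start and the end of every call, tagged with a per-call id."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Generated on each call, so concurrent calls of the same RPC
        # can be told apart in the log.
        call_id = uuid.uuid4().hex
        LOG.debug("time-cost: call %s to '%s' started",
                  call_id, func.__name__)
        start = time.monotonic()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs on both normal return and exception.
            LOG.debug("time-cost: call %s to '%s' ended, %.2f seconds",
                      call_id, func.__name__, time.monotonic() - start)
    return wrapper


@timecost
def get_routers(router_ids):
    # Hypothetical stand-in for an RPC handler.
    return [{"id": router_id} for router_id in router_ids]
```

With the start and end lines sharing one id, everything logged between them for a given call can be found by searching for that id.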
14:35:48 <liuyulong> I have no more bugs today, last week was a bit quiet for L3.
14:36:41 <ralonsoh> I still have one bug in L3
14:36:45 <ralonsoh> #link https://bugs.launchpad.net/neutron/+bug/1732458
14:36:46 <openstack> Launchpad bug 1732458 in neutron "deleted_ports memory leak in dhcp agent" [Medium,In progress] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
14:36:57 <ralonsoh> #link https://review.opendev.org/#/c/521035/
14:37:19 <ralonsoh> (CI is not passing but I'm rechecking)
14:37:41 <liuyulong> A very old one
14:39:09 <liuyulong> Recently we have hit many DHCP exceptions during upgrades and restarts.
14:39:58 <liuyulong> And I'm deciding to remove the DHCP agent in our local environment.
14:40:44 <liuyulong> config_drive or L2-agent self-service DHCP looks friendlier for large-scale clouds.
14:41:35 <slaweq> liuyulong: there is RFE about distributed dhcp agent
14:41:50 <slaweq> let me find it
14:42:03 <liuyulong> Yes, it all comes from our ops' complaints.
14:42:04 <amotoki> liuyulong: what do you mean by 'L2-agent self-service DHCP'?
14:42:22 <liuyulong> slaweq, did you mean the OVN related RFE?
14:42:25 <slaweq> liuyulong: https://bugs.launchpad.net/neutron/+bug/1806390
14:42:26 <openstack> Launchpad bug 1806390 in neutron "[RFE] Distributed DHCP agent " [Wishlist,In progress] - Assigned to Yang Youseok (ileixe)
14:42:37 <slaweq> liuyulong: no, this one isn't related to OVN
14:42:38 <liuyulong> amotoki, it is a local implementation.
14:43:10 <amotoki> liuyulong: okay, is it a kind of distributed one?
14:43:14 <liuyulong> amotoki, since the OVS agent has full knowledge of port IPs and MACs.
14:43:27 <slaweq> amotoki: I remember when I was in OVH we also had something like that - neutron-ovs agent was spawning simple udhcpd service for each port on host - and that worked very well :)
14:44:07 <amotoki> liuyulong: slaweq: thanks. it reminds me of nova-network dhcp stuff per compute node.
14:44:58 <amotoki> the proposed distributed dhcp agent would be similar.
14:45:03 <liuyulong> https://review.opendev.org/#/c/658414/9/specs/train/ml2ovs-ovn-convergence.rst@38
14:45:18 <liuyulong> I left a comment here, but no response for now. : )
14:45:33 <liuyulong> ML2+OVS+DVR and OVN
14:46:18 <haleyb> liuyulong: i will look at your comment...
14:46:43 <liuyulong> OK, next topic
14:46:58 <liuyulong> #topic Routed Networks
14:47:22 <liuyulong> I'm now interested in how this will work for an external network with multiple segments.
14:47:36 <liuyulong> Yes, I mean a public (provider) network for router gateways and floating IPs.
14:48:58 <liuyulong> I also left some comments here: https://review.opendev.org/#/c/657170/
14:49:02 <liuyulong> No response.
14:49:22 <liuyulong> mlavalle, tidwellr, wwriverrat: your turn now.
14:50:52 <liuyulong> No updates?
14:51:01 <liuyulong> Next topic
14:51:15 <liuyulong> #topic On demand agenda
14:51:54 <liuyulong> I have one more thing about OVN and dvr.
14:52:00 <liuyulong> #link https://blueprints.launchpad.net/neutron/+spec/openflow-based-dvr
14:52:08 <liuyulong> Maybe we shoud add a note for this BP, or mark it as something like not-complete or abandoned.
14:52:12 <liuyulong> s/should
14:52:40 <liuyulong> And also abandon the related gerrit patch.
14:55:25 <haleyb> yes, i don't think that will be implemented
14:55:28 <amotoki> +1. it clarifies the current situation and it is useful especially for operators.
14:57:02 <liuyulong> OK, time is up, let's stop here.
14:57:10 <liuyulong> Thank you guys.
14:57:15 <liuyulong> #endmeeting