14:00:14 <liuyulong> #startmeeting neutron_l3
14:00:14 <openstack> Meeting started Wed Jul 17 14:00:14 2019 UTC and is due to finish in 60 minutes. The chair is liuyulong. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:15 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:17 <openstack> The meeting name has been set to 'neutron_l3'
14:00:33 <njohnston_> o/
14:00:34 <haleyb> hi
14:00:40 <liuyulong> hi
14:00:46 <liuyulong> #chair haleyb
14:00:47 <openstack> Current chairs: haleyb liuyulong
14:01:15 <liuyulong> #topic Announcements
14:01:23 <ralonsoh> hi
14:01:25 <liuyulong> I have a question
14:01:37 <liuyulong> Where can we apply for the early-bird discount for the Shanghai summit and PTG?
14:01:49 <liuyulong> I have not received any discount CODE recently. If you have any information, please let us know.
14:02:37 <slaweq> hi
14:02:56 <liuyulong> And we have this PTG plan etherpad now:
14:03:03 <liuyulong> #link https://etherpad.openstack.org/p/Shanghai-Neutron-Planning
14:03:37 <liuyulong> And one more important thing: it seems we all still do not know the final official 'U' release name.
14:03:45 <liuyulong> It is interesting now.
14:04:23 <haleyb> there will be a vote eventually...
14:04:32 <liuyulong> Chinese Pinyin has no pronunciation that starts with 'U'. But I have sent a suggestion to the mailing list; like a stone dropped into the ocean, it got no response.
14:04:39 <liuyulong> #link http://lists.openstack.org/pipermail/openstack-discuss/2019-February/002706.html
14:06:03 <liuyulong> Allow me to quote the contents from that mail:
14:06:11 <liuyulong> """
14:06:27 <liuyulong> And my name is Yulong, then 'Uylong' can be a good example to explain my suggestion, : )
14:07:47 <slaweq> so maybe we should propose 'Uylong' as a name of the most famous Chinese neutron core ;)
14:08:04 <haleyb> +2 :)
14:08:15 <liuyulong> haleyb, When is the vote usually held?
14:09:14 <liuyulong> By OpenStack tradition, it should be a place name. Haha
14:09:45 <haleyb> liuyulong: i don't remember when exactly, but yes, a place or street or ???
14:10:14 <liuyulong> Yes, mountains and rivers
14:10:43 <haleyb> Ussuri
14:11:09 <haleyb> the TC will eventually send an email with a place to add suggestions
14:11:41 <slaweq> yes, usually it was some wiki page or something like that where people were adding proposals for voting IIRC
14:12:06 <liuyulong> Ussuri is more like a Russian word. In Chinese Pinyin, it is "Wusuli".
14:12:07 <haleyb> we had the same problem in Hong Kong and chose Icehouse (street)
14:13:05 <haleyb> not sure we will solve this in the L3 meeting though :)
14:13:17 <liuyulong> Haha
14:13:27 <liuyulong> OK, Any other announcements?
14:13:45 <liuyulong> Sure, let's move on.
14:13:52 <liuyulong> #topic Bugs
14:14:01 <liuyulong> #link http://lists.openstack.org/pipermail/openstack-discuss/2019-July/007763.html
14:14:07 <liuyulong> Slawek Kaplonski (slaweq) was our bug deputy last week, thank you for the collection.
14:14:35 <liuyulong> I will skip all the bugs which are already fixed or whose related patches are getting merged now.
14:14:55 <liuyulong> First one
14:14:57 <liuyulong> #link https://bugs.launchpad.net/neutron/+bug/1836642
14:14:58 <openstack> Launchpad bug 1836642 in neutron "Metadata responses are very slow sometimes" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
14:15:09 <liuyulong> Looks like the nova metadata API was doing nothing during that 16s+.
14:15:14 <liuyulong> Here is an example:
14:15:19 <liuyulong> #link http://logs.openstack.org/09/666409/7/check/tempest-full/08f4c53/controller/logs/screen-n-api-meta.txt.gz#_Jul_11_23_43_01_357100
14:15:39 <liuyulong> #link https://bugs.launchpad.net/neutron/+bug/1821912
14:15:41 <openstack> Launchpad bug 1821912 in neutron "intermittent ssh failures in various scenario tests" [High,In progress] - Assigned to LIU Yulong (dragon889)
14:15:50 <liuyulong> slaweq sent some similar logs here before.
14:16:10 <slaweq> sean-k-mooney was looking into it yesterday with me, and he found that in the case we were analysing most of the time was wasted on http://logs.openstack.org/09/666409/7/check/tempest-full/08f4c53/controller/logs/screen-q-svc.txt.gz#_Jul_11_23_43_04_584216
14:16:35 <slaweq> because when nova is preparing metadata for an instance it asks the neutron server for that instance's security groups :O
14:16:57 <slaweq> and it looks like in this case that call to neutron took most of the time and caused the problem
14:17:24 <slaweq> because of that I sent a DNM patch today to check the time-cost of those resync quota methods
14:17:36 <slaweq> njohnston_: ^^ that's the explanation for Your question in the review there :)
14:17:47 <slaweq> but I'm also looking at other examples
14:17:48 <haleyb> why does nova need that? is it something given in metadata?
14:17:55 <liuyulong> We cannot blame nova now, haha
14:17:56 <slaweq> haleyb: I have no idea
14:18:14 <njohnsto_> yes security groups are part of metadata
14:18:21 <liuyulong> haleyb, I have some clue
14:18:34 <slaweq> so, I was also looking at other examples and I found that it's not always this quota resync which takes a long time
14:18:52 <liuyulong> haleyb, when nova tries to sync network info it will try to get the port and its security group information.
14:19:12 <slaweq> BUT, in every case, at almost the same time as the timed-out request to metadata is sent, there is some API call in neutron which takes more than 10 seconds
14:19:51 <slaweq> so currently it looks to me like some slowdown in neutron or maybe in db? I don't know
14:20:16 <haleyb> just a thought, but we should look at the code making the call, and make sure it's not asking for everything, but supplying a good filter
14:20:17 <liuyulong> DB slow query, maybe something like the bug we talked about last week.
14:20:47 <slaweq> haleyb: sure, but it is working fine in most cases
14:20:47 <njohnsto_> in the long term it would be good to
14:21:02 <slaweq> in the tempest job there are plenty of vms spawned, each of them is asking metadata for public-keys
14:21:17 <slaweq> and sometimes, one of such queries is long (more than 10 seconds)
14:21:49 <slaweq> and it's not last/middle/first test AFAICT - there is no other pattern IMO
14:21:49 <liuyulong> Some related neutron API calls are here: https://github.com/openstack/nova/blob/master/nova/network/neutronv2/api.py
14:22:33 <slaweq> the only common thing is that the VM is doing request GET /2009-04-04/public-keys/ and it takes more than 10 seconds
14:22:41 <slaweq> 10 seconds is set as timeout in cirros script
14:22:45 <slaweq> so this fails
14:22:59 <slaweq> even if later nova sends a proper 200 response
14:23:34 <slaweq> I will try to read one more time all analysis from sean and go through all those calls there
14:23:43 <slaweq> maybe I will find something more
14:25:23 <liuyulong> slaweq, OK, thank you for working on this. Where is your DNM patch?
14:25:37 <haleyb> slaweq: i wonder how many SGs nova is asking for after noticing this...
14:25:44 <haleyb> https://github.com/openstack/nova/blob/master/nova/network/neutronv2/api.py#L760
14:25:45 <slaweq> liuyulong: https://review.opendev.org/#/c/671300/
14:26:09 <haleyb> it is at least passing the tenant_id though
14:26:12 <liuyulong> slaweq, this should be where the metadata API calls the neutron security group list: https://github.com/openstack/nova/blob/master/nova/api/metadata/base.py#L145
14:26:14 <njohnston_> a related bug filed under the tripleo project for similar failures in queens: https://bugs.launchpad.net/tripleo/+bug/1836046
14:26:15 <openstack> Launchpad bug 1836046 in tripleo "tempest.scenario.test_network_basic_ops.TestNetworkBasicOps Failing on queens" [Critical,Triaged]
14:26:22 <slaweq> haleyb: but as I said, in other cases it wasn't exactly the same, and there was another call which took a long time
14:26:49 <slaweq> njohnston_: yep
14:27:01 <slaweq> and we also have a d/s bug in bugzilla for the same
14:27:21 <liuyulong> One more thing: we have added some time-cost tracking logs. They will help us find potential causes of CI failures.
14:27:26 <liuyulong> L3 RPC time-costs:
14:27:30 <liuyulong> #link http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Time-cost%3A%5C%22%20and%20NOT%20message%3A%20%5C%22start%5C%22
14:27:37 <liuyulong> L3 router processing time:
14:27:42 <liuyulong> #link http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Finished%20a%20router%20update%5C%22
14:28:38 <njohnston_> this is great stuff
14:28:55 <liuyulong> #link https://bugs.launchpad.net/neutron/+bug/1836253
14:28:56 <openstack> Launchpad bug 1836253 in neutron "Sometimes InstanceMetada API returns 404 due to invalid InstaceID returned by _get_instance_and_tenant_id()" [Medium,Confirmed] - Assigned to Bence Romsics (bence-romsics)
14:29:11 <liuyulong> And this one also looks related to the former bugs.
14:29:21 <slaweq> I thought that it may be related
14:29:41 <slaweq> but it seems that we don't have this cache configured in neutron-metadata agent in any job
14:29:48 <slaweq> so it's not the case in gate
14:30:00 <liuyulong> More like a race condition.
14:31:33 <liuyulong> Next one
14:31:52 <liuyulong> #link https://bugs.launchpad.net/neutron/+bug/1806032
14:31:52 <openstack> Launchpad bug 1806032 in neutron "neutron doesn't prevent the network update from external to internal when floatingIPs present" [Low,New]
14:32:01 <liuyulong> This will be proceeded with again, from my understanding.
14:32:24 <liuyulong> I don't know why a cloud wants to change the external network type, but it is indeed a neutron bug.
14:33:05 <liuyulong> Next
14:33:06 <liuyulong> #link https://bugs.launchpad.net/neutron/+bug/1835914
14:33:07 <openstack> Launchpad bug 1835914 in neutron "Test test_show_network_segment_range failing" [Medium,Confirmed]
14:33:20 <liuyulong> Let's try to contact Kailun Qin, he is the original author.
14:35:06 <liuyulong> And for this new feature, our test team has reported many bugs. I will file them on Launchpad soon.
14:36:53 <liuyulong> The next two were discussed last week:
14:36:55 <liuyulong> #link https://bugs.launchpad.net/neutron/+bug/1834308
14:36:56 <openstack> Launchpad bug 1834308 in neutron "[DVR][DB] too many slow query during agent restart" [Medium,Confirmed] - Assigned to LIU Yulong (dragon889)
14:36:59 <liuyulong> #link https://bugs.launchpad.net/neutron/+bug/1835663
14:37:00 <openstack> Launchpad bug 1835663 in neutron "Some L3 RPCs are time-consuming especially get_routers" [Medium,Confirmed]
14:37:53 <liuyulong> Yes, I will upload some fixes for the DB slow query.
14:39:18 <liuyulong> No more bugs from me today.
14:39:33 <slaweq> \o/ no more bugs \o/ :D
14:39:37 <haleyb> there was one i had
14:39:46 <haleyb> https://bugs.launchpad.net/neutron/+bug/1835731
14:39:47 <openstack> Launchpad bug 1835731 in neutron "Neutron server error: failed to update port DOWN" [High,In progress] - Assigned to Oleg Bondarev (obondarev)
14:39:47 <slaweq> haleyb: :(
14:39:53 <haleyb> https://review.opendev.org/#/c/669640/
14:40:09 <haleyb> liuyulong had a -1 on the change, didn't know if we needed to discuss
14:40:27 <liuyulong> We have removed that config on the master branch
14:40:54 <liuyulong> And that removal fixes the bug, IMO
14:41:19 <liuyulong> But for the stable branches, it may need another approach.
14:41:49 <ralonsoh> IMO, we can use this patch with a note and then cherry-pick to stable branches
14:42:00 <ralonsoh> the logic seems to be correct in master and stable
14:42:46 <slaweq> ralonsoh++
14:43:08 <slaweq> and later merge liuyulong's patch which removes this option in master
14:43:16 <ralonsoh> correct
14:43:52 <haleyb> ok, seems we have a way forward, just didn't want it to fall through the cracks
14:46:19 <haleyb> that's all from me
14:46:34 <ralonsoh> just a last note: https://review.opendev.org/#/c/521035/
14:46:41 <ralonsoh> reviews are welcome
14:47:41 <liuyulong> OK, Let's move on.
14:47:45 <liuyulong> #topic Routed Networks
14:48:31 <liuyulong> I have got no response to the concern about "external network with multiple segments".
14:48:31 <liuyulong> #link https://review.opendev.org/#/q/topic:bug/1764738
14:48:38 <liuyulong> no response and not much activity on these patches.
14:49:21 <ralonsoh> maybe mlavalle can ping David
14:49:30 <wwriverrat> yes. sorry. I fear I'm a little over my head on how far reaching allowing multiple segments per host touches
14:51:03 <wwriverrat> Would love to have a 1-1 review with someone who has pulled ^ code and knows the intent of where it was going
14:52:01 <liuyulong> We are now facing such an issue: the external network has a large set of IPs, and the broadcast domain is too large.
14:53:03 <wwriverrat> Our original thought: why not allow multiple segments per network *everywhere* (thinking that for most implementations only one would be returned in a list)
14:53:28 <wwriverrat> but this crosses api, rpc and agent boundaries
14:54:43 <wwriverrat> so... I know mlavalle and I were trying to find time to have a review session. Will keep trying
14:55:40 <liuyulong> both mlavalle and tidwellr may help
14:56:12 <liuyulong> #topic On demand agenda
14:56:17 <liuyulong> #link https://blueprints.launchpad.net/neutron/+spec/openflow-based-dvr
14:56:26 <liuyulong> I've added a small NOTE here: we have abandoned it.
14:56:32 <liuyulong> And these patches should be abandoned. I have no right to do that.
14:56:36 <liuyulong> #link https://review.opendev.org/#/q/topic:openflow-based-dvr+status:open
14:58:03 <liuyulong> We are running out of time.
14:58:07 <liuyulong> Let
14:58:17 <liuyulong> us stop here
14:58:21 <liuyulong> #endmeeting