16:00:07 #startmeeting neutron_ci
16:00:08 Meeting started Tue Jul 9 16:00:07 2019 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:10 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:12 The meeting name has been set to 'neutron_ci'
16:00:16 o/
16:00:17 o/
16:01:12 lets wait few more minutes for others
16:02:34 hi
16:02:53 ok, so lets start
16:02:54 Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:03:08 please open now so it will be ready later
16:03:14 #topic Actions from previous meetings
16:03:17 * mlavalle triggered Grafana
16:03:22 mlavalle to continue debugging neutron-tempest-plugin-dvr-multinode-scenario issues
16:03:32 I did continue looking at that
16:03:47 I am still finding the metadata proxy issue
16:04:13 Left detailed comment here: https://bugs.launchpad.net/neutron/+bug/1830763/comments/3
16:04:14 Launchpad bug 1830763 in neutron "Debug neutron-tempest-plugin-dvr-multinode-scenario failures" [High,Confirmed] - Assigned to Miguel Lavalle (minsel)
16:04:52 The summary is that the instance is ready before haproxy-metadata-proxy is created
16:04:57 in one of the nodes
16:05:07 and therefore fails to get its metadata
16:05:29 just FYI, the failure rate for that job has been dropping the past few days and is down to 32%
16:05:30 this analysis was done with http://logs.openstack.org/14/668914/1/check/neutron-tempest-plugin-dvr-multinode-scenario/f2ce738/
16:06:05 this patch was created by ralonsoh late last week
16:06:11 I think
16:06:43 mlavalle: so there is some error on neutron-server side which slows down creation of router on compute node, right?
16:07:01 yes, that's my theory
16:08:09 do You know what is this "8831ed85-9ccf-48a2-92eb-ab39d3d30e89-ubuntu-bionic-rax-ord-0008" which is reported as duplicate in error message?
16:08:26 no, I haven't gotten to that yet
16:08:31 is it agent_id-hostname pair?
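[Editor's note] The race described above — the instance is up before haproxy-metadata-proxy exists on the node, so the metadata request fails — is the kind of ordering problem scenario tests usually guard against with a poll-and-wait helper. A minimal sketch of that generic pattern (illustrative only; names are made up and this is not neutron's or tempest's actual helper):

```python
import time


def wait_for(predicate, timeout=60, interval=2):
    """Poll predicate() until it returns truthy or the timeout expires.

    Returns True on success, False on timeout. Tests would call this
    before exercising a resource that another service provisions
    asynchronously (e.g. the per-router metadata proxy).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False
```
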
16:08:33 or what?
16:08:37 ok
16:09:01 so You will continue this investigation, right?
16:09:06 yes
16:09:10 indeed
16:09:21 I'll dig in this case as deeply as possible
16:09:23 thx mlavalle, great progress on this :)
16:09:29 #action mlavalle to continue debugging neutron-tempest-plugin-dvr-multinode-scenario issues
16:09:42 the good thing is that this patch is very recent, so the logs will be there
16:09:43 ok, next action was
16:09:50 slaweq to send patch to switch functional tests job in fwaas repo to py3
16:10:11 Patches: https://review.opendev.org/668917 and https://review.opendev.org/668918 are sent
16:10:16 also https://review.opendev.org/#/c/669757/ is needed
16:10:23 please take a look if You will have some time
16:10:29 * njohnston will take a look
16:10:34 njohnston: thx
16:10:43 next one:
16:10:45 ralonsoh to continue debugging TestNeutronServer: start function (bug/1833279) with new logs
16:11:29 I didn't see any other error in the CI related to this bug
16:11:56 http://logstash.openstack.org/#/dashboard/file/logstash.json?query=build_name:neutron-functional%20AND%20message:%5C%22Timed%20out%20waiting%20for%20file%5C%22
16:13:42 ok, so maybe we can simply close this bug for now
16:13:49 what do You think?
16:13:57 I also didn't see it during last week
16:13:59 I'll keep an eye on this during the next week
16:14:10 ok, thx ralonsoh
16:14:12 and if I don't see anything, I'll close it
16:14:14 np!
16:14:17 ++
16:14:30 ok, next one
16:14:31 can I say something about fwaas
16:14:32 slaweq to mark test_ha_router_restart_agents_no_packet_lost as unstable again
16:14:35 Done: https://review.opendev.org/668914
16:14:45 mlavalle: sure, go on
16:15:12 I pinged Sridar yesterday. He responded that he is still trying to help
16:15:25 so I will organize a meeting with him
16:15:35 This coming Thursday
16:15:40 I'll invite njohnston
16:15:44 yes please
16:15:47 late o/ sorry the local Q&A session took some time
16:15:59 that's it
16:16:23 mlavalle: can You invite me to this meeting as well?
16:16:30 slaweq: yes
16:16:32 thx
16:17:15 ok, lets move on then
16:17:23 #topic Stadium projects
16:17:31 first Python 3 migration
16:17:37 etherpad: https://etherpad.openstack.org/p/neutron_stadium_python3_status
16:17:59 I moved finished projects to the bottom of the document as bcafarel suggested last week
16:18:18 I'm currently working on the bagpipe tempest jobs
16:18:23 we have only 6 projects on the "not finished" list
16:18:47 but for most of the things we have volunteers already
16:19:09 I think we should ask yamamoto about networking-midonet
16:19:26 or is there anyone else involved in this project who we can ping? mlavalle do You know?
16:19:40 and thx njohnston for taking care of bagpipe :)
16:19:56 let's ask yamamoto
16:20:05 ok, I will send him an email this week
16:20:11 good for You?
16:20:14 yes
16:20:33 #action slaweq to contact yamamoto about networking-midonet py3 status
16:20:59 any other updates on it?
16:22:13 ok, lets move on then
16:22:20 tempest-plugins migration
16:22:26 Etherpad: https://etherpad.openstack.org/p/neutron_stadium_move_to_tempest_plugin_repo
16:22:28 any updates?
16:22:54 I got back to the fwaas first change and made some progress with it
16:23:19 njohnston: yeah, I saw it today and I commented on one of Your patches already
16:23:24 it has 2 +2s but no +W https://review.opendev.org/#/c/643662/
16:23:42 I'll fix up that issue, it's in the second stage change
16:23:59 +W'ed now
16:24:12 thanks!
16:24:21 thanks for working on this
16:24:40 so we will have still not finished vpnaas and neutron-dynamic-routing
16:25:13 and we should be good on this finally so gmann will be happy :)
16:25:48 mlavalle: tidwellr do You need any help with patches for vpnaas and neutron-dynamic-routing? I can help with them if needed
16:26:34 slaweq: if I am slowing down anybody, go ahead. but I would still like to give it a try
16:26:48 mlavalle: no, it's not urgent of course
16:26:52 I just wanted to ask :)
16:27:00 thanks
16:27:34 ok, any other questions/updates about stadium projects?
16:28:20 ok, so lets move on then
16:28:26 #topic Grafana
16:28:31 http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:29:11 I don't see anything urgent or "special" this week
16:29:23 So hey, how's about that neutron-tempest-plugin-dvr-multinode-scenario failure rate? Looking good!
16:29:53 exactly, it's better
16:29:59 agree
16:30:29 I think that changing order of connecting router interfaces and spawning instance minimized possibility of this race with metadata proxy
16:30:34 so it is failing less now
16:30:40 but issue is still there :/
16:31:19 you mean after your patch?
16:31:24 still neutron-tempest-with-uwsgi is failing 100% but there is patch https://review.opendev.org/#/c/668311/ for that
16:31:29 mlavalle: yes
16:32:34 anything else related to grafana?
16:32:37 nope
16:33:23 ok, lets move on then
16:33:25 #topic fullstack/functional
16:33:41 I found 3 different test failures in functional tests recently:
16:33:47 neutron.tests.functional.agent.linux.test_keepalived.KeepalivedManagerTestCase - http://logs.openstack.org/35/521035/8/check/neutron-functional/65a4ecf/testr_results.html.gz
16:34:03 ^^ I think ralonsoh was looking into something similar some time ago, right?
16:34:19 yes, I almost have a patch for this
16:34:41 it's this one which removes locks on netns privileged functions?
16:34:42 actually what I'm doing is replacing the ip_monitor in keepalived
16:34:51 which is the source of some problems
16:35:15 ooooh sorry
16:35:17 but in this case error was "Cannot open network namespace "qrouter-599a3366-e0ea-4a35-9497-4263b8409b00": No such file or directory"
16:35:17 one sec
16:35:28 yes, one sec (I write without reading first)
16:35:42 https://review.opendev.org/#/c/668682/
16:36:20 one line, big discussion in the patch
16:36:52 personally I'm fine with merging Your patch
16:37:09 (me too)
16:37:19 the config removal proposed by liuyulong should be IMO discussed on neutron team meeting or drivers meeting maybe
16:39:00 mlavalle: njohnston: please add this patch to Your list of reviews and say what is Your opinion about it
16:39:13 ok
16:39:17 thx a lot
16:39:22 I think it’s a good topic for the drivers meeting, I don’t think we need to use the time of the entire neutron community to talk about it
16:39:26 will do
16:40:05 njohnston_: I agree
16:40:25 ok, next one was:
16:40:27 neutron.tests.functional.agent.linux.test_linuxbridge_arp_protect.LinuxBridgeARPSpoofTestCase - http://logs.openstack.org/70/650270/19/check/neutron-functional-python27/6c9c017/testr_results.html.gz
16:41:10 which I don't remember seeing in the past
16:41:13 did You see it before?
16:41:46 IMO, this is similar to other privsep errors
16:41:57 the privsep thread pool is limited
16:42:09 and sometimes many ops are executed at the same time
16:42:14 in logs there is almost nothing: http://logs.openstack.org/70/650270/19/check/neutron-functional-python27/6c9c017/controller/logs/dsvm-functional-logs/neutron.tests.functional.agent.linux.test_linuxbridge_arp_protect.LinuxBridgeARPSpoofTestCase.test_arp_protection_update.txt.gz
16:42:33 eventually there will be an operation without time to be executed
16:43:02 ralonsoh: so do You think that maybe limiting number of workers can improve this?
16:43:32 slaweq, how many workers do we have?
16:43:38 one per core/thread?
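[Editor's note] The "one worker per core/thread" figure discussed here is the usual fallback for test runners such as stestr: when no explicit concurrency is requested, worker count defaults to the CPU count, which is why the job above ended up with 8 workers without anything being configured. A minimal sketch of that fallback logic (illustrative; not the actual stestr or privsep code):

```python
import multiprocessing


def effective_concurrency(requested=None):
    """Worker count a runner would use: the explicit value if given and
    positive, otherwise one worker per available CPU/thread."""
    if requested is not None and requested > 0:
        return requested
    return multiprocessing.cpu_count()
```

Capping the workers (the idea floated at 16:43:02) then amounts to passing an explicit value, e.g. `stestr run --concurrency 4`, instead of relying on the CPU-count default.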
16:44:25 ralonsoh: I don't know exactly
16:44:36 slaweq, that should be the limit
16:44:39 ralonsoh: from log in this failed job it looks that it's not specified
16:44:47 and it runs 8 workers IMO
16:44:51 slaweq, no, this is infra config
16:44:55 http://logs.openstack.org/70/650270/19/check/neutron-functional-python27/6c9c017/job-output.txt.gz#_2019-07-08_18_54_37_453034
16:46:07 it simply runs "tox -edsvm-functional-python27" and nothing else
16:46:09 http://logs.openstack.org/70/650270/19/check/neutron-functional-python27/6c9c017/job-output.txt.gz#_2019-07-08_18_53_34_474097
16:46:25 so if we want to limit number of test workers we should specify it there IMO
16:46:26 slaweq, so by default one worker per thread
16:46:52 let me check how to increase, in FT, the privsep pool
16:47:27 ralonsoh: ok, thx a lot
16:47:36 #action ralonsoh to check how to increase, in FT, the privsep pool
16:47:57 last one which I found is
16:47:58 neutron.tests.functional.agent.l3.test_ha_router.L3HATestCase - http://logs.openstack.org/69/668569/3/check/neutron-functional-python27/9d345e8/testr_results.html.gz
16:48:50 and I see error like: http://logs.openstack.org/69/668569/3/check/neutron-functional-python27/9d345e8/controller/logs/dsvm-functional-logs/neutron.tests.functional.agent.l3.test_ha_router.L3HATestCase.test_keepalived_state_change_notification.txt.gz#_2019-07-05_10_15_10_728
16:48:55 in logs from this test
16:49:17 but I'm not sure if that is related or not
16:49:38 I can investigate this one during this week
16:49:54 #action slaweq to check neutron.tests.functional.agent.l3.test_ha_router.L3HATestCase issue
16:50:05 did we merge a check to validate gateway is inside the subnet? Because this looks like it would fail such a check: ['ip', 'netns', 'exec', 'qrouter-eb89b020-c3b0-410f-82ec-639b197dc05b', 'ip', 'route', 'replace', 'to', '8.8.8.0/24', 'via', '19.4.4.4']
16:50:34 oh, sorry, I misread that it was a default gateway, that just looks like an additional route
16:50:50 but it's not gateway
16:50:57 only extra route IIRC: http://logs.openstack.org/69/668569/3/check/neutron-functional-python27/9d345e8/controller/logs/dsvm-functional-logs/neutron.tests.functional.agent.l3.test_ha_router.L3HATestCase.test_keepalived_state_change_notification.txt.gz#_2019-07-05_10_15_10_662
16:51:03 yeah
16:51:28 so I will look deeper into it this week
16:51:50 and I will report a bug or send a patch, whatever will be needed :)
16:52:12 that's all from my side regarding functional (and fullstack) tests
16:52:22 do You have anything else on that topic?
16:53:23 ok, lets move forward then
16:53:25 #topic Tempest/Scenario
16:53:35 first of all, good news
16:53:37 gmann proposed new integrated-gate templates, one for networking is merged: https://review.opendev.org/#/c/668930/
16:53:43 and I proposed to use it in neutron: https://review.opendev.org/#/c/669815/
16:54:17 I will also later try to change our existing neutron-tempest jobs to inherit from these new jobs dedicated for networking
16:54:36 that should improve our gates as we will not run e.g. cinder or glance related tests in our gate
16:54:38 :)
16:54:58 cool
16:55:31 and now lets get back to the reality :P
16:55:36 I opened new bug https://bugs.launchpad.net/neutron/+bug/1835914
16:55:36 Launchpad bug 1835914 in neutron "Test test_show_network_segment_range failing" [Medium,Confirmed]
16:55:45 I found it at least 2 times during last week
16:56:05 and it is strange for me that the network_segment_range object doesn't have project_id sometimes :/
16:56:24 if there is any volunteer who wants to take a look into this, feel free :)
16:56:32 I noticed weirdnesses like that with bulk port - sometimes the requests that get created in testing aren't complete
16:56:42 I might give it a try later this week
16:56:52 njohnston: good to know
16:57:02 i'm always confused by the tenant_id/project_id transition, are we supposed to be using both still? shouldn't one have gone away?
16:57:22 haleyb: but in this case tenant_id wasn't there also
16:57:41 haleyb: either/or
16:58:02 oh, that's strange
16:58:08 both missing that is
16:58:08 haleyb: exactly
16:58:20 I'll try to take a look as well
16:58:42 I set it as medium priority as it doesn't happen often, but it happens from time to time so there is definitely something to check :)
16:58:51 thx njohnston and mlavalle for taking care of it
16:59:05 ok, we are running out of time now
16:59:17 thx for attending
16:59:21 o/
16:59:23 o/
16:59:30 have a great week and see You online :)
16:59:32 o/
16:59:36 #endmeeting
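[Editor's note, appended after the log] The 16:50 exchange asked about validating that a route's gateway/next hop lies inside a subnet. That check is a one-liner with the stdlib ipaddress module; a sketch for illustration (this is not the validation neutron actually performs):

```python
import ipaddress


def nexthop_in_subnet(nexthop, cidr):
    """True if the next-hop address belongs to the given CIDR."""
    return ipaddress.ip_address(nexthop) in ipaddress.ip_network(cidr)
```

For the route discussed above, `nexthop_in_subnet("19.4.4.4", "8.8.8.0/24")` is False: the next hop is outside the destination CIDR, which is normal for an extra route and is exactly why the check would not apply there, as noted at 16:50:34.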