16:00:26 #startmeeting neutron_ci
16:00:27 Meeting started Tue Jul 23 16:00:26 2019 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:28 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:28 hi
16:00:30 The meeting name has been set to 'neutron_ci'
16:00:31 hi
16:00:31 o/
16:02:01 I know that njohnston is quite busy with internal stuff now, haleyb and bcafarel are on PTO
16:02:09 so let's start
16:02:14 first of all:
16:02:18 #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:02:34 and now let's go
16:02:36 #topic Actions from previous meetings
16:02:44 first one:
16:02:46 mlavalle to continue debugging neutron-tempest-plugin-dvr-multinode-scenario issues
16:03:04 I did, although I changed my approach
16:03:10 o/
16:03:58 since the merging of https://review.opendev.org/#/c/667547/, the failure frequency of test_connectivity_through_2_routers has decreased significantly
16:04:33 it still fails sometimes, but many of those failures may be due to the slowness in metadata / nova we discussed yesterday
16:04:56 so I did an analysis of the failures of all the tests over the past 7 days
16:05:05 and came up with a ranking
16:05:25 Please see note #5 in
16:05:28 https://bugs.launchpad.net/neutron/+bug/1830763
16:05:29 Launchpad bug 1830763 in neutron "Debug neutron-tempest-plugin-dvr-multinode-scenario failures" [High,Confirmed] - Assigned to Miguel Lavalle (minsel)
16:06:24 As you can see, the biggest offenders are test_qos_basic_and_update and the router migrations
16:07:06 digging into test_qos_basic_and_update, please read note #6
16:07:28 most of the failures happen after updating the QoS policy / rule
16:08:00 in other words, at that point we already have connectivity and the routers / dvr seem to be working fine
16:08:24 ok, so we have a couple of different issues there
16:08:53 because the issue with failing to get the instance-id from metadata also happens quite often in various tests
16:10:47 * mlavalle waiting for the rest of the comment
16:11:10 mlavalle: that's all from my side
16:11:16 oh
16:11:19 mlavalle, I would like to see why the qos_check is failing
16:11:22 I just wanted to say that we have a few different issues
16:11:28 I'll review the logs
16:11:45 can we then report it as 3 different bugs:
16:12:03 1. the already created bug - keep it for the ssh and metadata issues,
16:12:05 yeah, I wanted someone (really thinking of ralonsoh) to check the failures in QoS
16:12:08 2. qos test issue
16:12:15 3. router migration problems
16:12:22 I don't think this QoS failure is purely due to dvr
16:12:29 it must be the combination
16:12:43 so if ralonsoh takes care of that one
16:12:44 I can take QoS (2)
16:12:59 I will take care of bug 3 as described by slaweq
16:12:59 bug 3 in mono (Ubuntu) "Custom information for each translation team" [Undecided,Fix committed] https://launchpad.net/bugs/3
16:13:09 LOL
16:13:38 ralonsoh: would you file bug 2?
16:13:42 sure
16:13:47 thx guys
16:14:00 I can take a look once again at the issue with metadata
16:14:06 if you do, I'll post there the Kibana search that you need to see all the occurrences
16:14:18 this one is IMO clearly related to dvr, as I didn't see it in any other jobs
16:14:22 ralonsoh: ^^^^
16:15:01 ok
16:15:23 ok, I achieved what I wanted with my report today
16:15:59 thx mlavalle for the update and for working on this
16:16:14 mlavalle: will You also report bug 2 as a new one?
16:16:21 yes
16:16:24 I will
16:16:24 thx a lot
16:16:35 #action mlavalle to report bug with router migrations
16:16:43 and assign it to me
16:16:54 #action ralonsoh to report bug with qos scenario test failures
16:17:14 #action slaweq to take a look at issue with dvr and metadata: https://bugs.launchpad.net/neutron/+bug/1830763
16:17:15 Launchpad bug 1830763 in neutron "Debug neutron-tempest-plugin-dvr-multinode-scenario failures" [High,Confirmed] - Assigned to Miguel Lavalle (minsel)
16:17:42 ok, I think we are good here and can move on
16:17:52 ralonsoh to try a patch to reduce the number of workers in FT
16:18:15 slaweq, and I didn't have time, sorry
16:18:23 * mlavalle will have to drop off at 30 minutes after the hour
16:18:25 my bad, I'll do this tomorrow morning
16:18:44 but I'll need help, because I really don't know where to modify this
16:18:51 ralonsoh: sure, no problem
16:18:52 in the neutron/tox.ini file?
16:19:04 in the zuul FT definition?
16:19:19 ralonsoh: I think that it should be defined in tox.ini
16:19:26 but I'm not 100% sure now
16:19:31 slaweq, that was my initial thought
16:19:39 I'll try it tomorrow
16:20:16 ok, thx
16:20:26 #action ralonsoh to try a patch to reduce the number of workers in FT
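For context on the action above: if the functional-test worker count is capped in tox.ini, the usual mechanism would be stestr's --concurrency option. The snippet below is only a minimal sketch under that assumption; the environment name, timeout value, and worker count are illustrative and not taken from neutron's actual tox.ini (the alternative raised in the discussion is passing the same limit from the Zuul job definition instead).

    # Illustrative sketch only: the environment name and values are assumptions.
    # The point is stestr's --concurrency flag, which caps the number of
    # parallel test workers instead of defaulting to one worker per CPU.
    [testenv:dsvm-functional]
    setenv =
        OS_TEST_TIMEOUT=180
    commands =
        stestr run --concurrency 4 {posargs}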
16:20:36 so, next one
16:20:38 ralonsoh to report a bug and investigate failed test neutron.tests.fullstack.test_qos.TestMinBwQoSOvs.test_bw_limit_qos_port_removed
16:20:52 in my list too, and no time
16:20:54 sorry again
16:21:15 really, I didn't have time last week
16:22:26 ralonsoh: no problem at all :)
16:22:31 it's not very urgent
16:22:35 #action ralonsoh to report a bug and investigate failed test neutron.tests.fullstack.test_qos.TestMinBwQoSOvs.test_bw_limit_qos_port_removed
16:22:52 but please at least report a bug, so that we have it tracked
16:22:59 sure
16:23:00 maybe someone else will take a look
16:23:02 :)
16:23:13 thx
16:23:18 ok, and last one
16:23:20 slaweq to open bug about slow neutron-tempest-with-uwsgi job
16:23:26 I opened bug https://bugs.launchpad.net/neutron/+bug/1837552
16:23:27 Launchpad bug 1837552 in neutron "neutron-tempest-with-uwsgi job finish with timeout very often" [Medium,Confirmed]
16:23:38 and I wrote there my initial findings
16:24:02 basically there is some issue with the neutron API IMO, but I'm not sure what's going on there exactly
16:25:38 slaweq, I saw that most of the time the failing test is the vrrp one
16:25:42 let me find the name
16:25:52 ralonsoh: but in which job?
16:26:07 hmmm, not in tempest....
16:26:09 sorry
16:26:42 sure
16:27:12 in this tempest job, it looks like at some point all tests start failing due to timeouts connecting to the neutron API
16:27:28 in the apache logs I see HTTP 500 for each request related to neutron
16:27:39 but in the neutron logs I didn't see any error
16:27:48 so I'm a bit confused by that
16:27:59 if I have some time, I will try to investigate it
16:28:27 but I didn't assign myself to this bug for now, maybe there will be someone else who will want to look into it
16:29:01 * mlavalle drops off for another meeting o/
16:29:09 see You mlavalle
16:29:20 ok, I think we can move on
16:29:23 next topic
16:29:24 #topic Stadium projects
16:29:32 Python 3 migration
16:29:34 Stadium projects etherpad: https://etherpad.openstack.org/p/neutron_stadium_python3_status
16:29:41 I have only a short update about this one
16:29:55 the last job in the fwaas repo is switched to zuulv3 and python3
16:29:59 so we are good with this one
16:30:19 we still have some work to do in
16:30:36 networking-bagpipe, networking-midonet, networking-odl and python-neutronclient
16:30:41 and that would be all
16:31:03 we have patches or at least volunteers for all of them except midonet
16:31:13 so I think we are quite good with this
16:31:25 next part is:
16:31:27 tempest-plugins migration
16:31:28 slaweq, cool, we still support neutron-client
16:31:33 and I don't have any updates here
16:31:45 ralonsoh: sure, we are still supporting neutronclient
16:31:51 it's deprecated but supported
16:32:04 and amotoki is taking care of switching it to py3
16:33:14 any questions/other updates about stadium projects?
16:33:26 no
16:33:38 ok, let's move on quickly
16:33:40 #topic Grafana
16:33:47 #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:34:45 looking at the integrated tempest jobs, I think we are a bit better now, since we switched some jobs to only run neutron and nova related tests
16:35:26 even last week most of those jobs were in quite good shape
16:35:57 fullstack is also quite good now
16:36:04 functional tests are failing a bit
16:36:24 but I think it's mostly related to the issues which we talked about earlier today
16:37:36 anything else regarding grafana?
16:38:37 ok, let's move on
16:38:39 #topic fullstack/functional
16:38:55 I have only 2 things related to fullstack tests today
16:39:14 1. I recently found one "new" issue and reported a bug: https://bugs.launchpad.net/neutron/+bug/1837380
16:39:15 Launchpad bug 1837380 in neutron "Timeout while getting bridge datapath id crashes ovs agent" [High,In progress] - Assigned to Slawek Kaplonski (slaweq)
16:40:00 basically if some physical bridge is recreated and there is a timeout while getting the datapath id, neutron-ovs-agent will crash completely
16:40:11 ??
16:40:18 that means the bridge is created again?
16:40:28 correct ralonsoh
16:40:32 hmmm
16:40:46 maybe the default datapath_id is not correct
16:40:58 this must be different from the other bridges
16:41:11 and should be "something" not null
16:41:13 ralonsoh: please check logs: http://logs.openstack.org/81/671881/1/check/neutron-fullstack/c6b2e08/controller/logs/dsvm-fullstack-logs/TestLegacyL3Agent.test_north_south_traffic/neutron-openvswitch-agent--2019-07-22--07-48-17-892074_log.txt.gz#_2019-07-22_07_49_28_698
16:41:24 it was a timeout while getting the datapath_id
16:41:28 I see, yes
16:41:44 10 secs to retrieve the datapath
16:41:49 in fullstack/functional jobs we have seen from time to time e.g. ovsdbapp timeouts and things like that
16:42:01 so my assumption is that a similar issue happened here
16:42:12 but this shouldn't cause a crash of the agent IMO
16:42:33 but, as You can see at http://logs.openstack.org/81/671881/1/check/neutron-fullstack/c6b2e08/controller/logs/dsvm-fullstack-logs/TestLegacyL3Agent.test_north_south_traffic/neutron-openvswitch-agent--2019-07-22--07-48-17-892074_log.txt.gz#_2019-07-22_07_49_28_756 it crashes
16:42:36 no, of course
16:42:59 I proposed a patch for that but will probably need to add some UT for it: https://review.opendev.org/672018
16:43:10 actually I was talking more about the second error
16:43:56 which one?
16:44:10 the one related to https://review.opendev.org/#/c/672018
16:44:17 I'll review your patch
16:44:37 yes, this one is related to the crash while getting the datapath_id of the bridge
16:44:41 it's one issue
16:45:53 ok, and the other thing which I have for today is
16:45:55 http://logs.openstack.org/77/670177/4/check/neutron-fullstack/56c8bb0/testr_results.html.gz
16:46:16 but I found this failure only once so far and IMO it may be related to the patch on which it was running
16:46:25 so I would just keep an eye on those tests :)
16:46:56 and that's all from me for today
16:47:18 I think we already talked about all other "hot" issues with tempest and functional tests
16:47:32 anything else You want to talk about today?
16:47:43 no
16:48:08 ok, so let's finish a bit earlier today
16:48:12 thx for attending
16:48:16 o/
16:48:16 bye!
16:48:19 #endmeeting
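As a footnote to the bug 1837380 discussion above, here is a minimal sketch of the kind of guard being proposed: retrying the datapath_id read of a recreated bridge so that a single ovsdb timeout does not kill the whole agent. This is not the actual change in https://review.opendev.org/672018; the helper name, retry parameters, and the exception caught are assumptions made for illustration only.

    import time

    class DatapathIdNotFound(Exception):
        """Raised when a bridge datapath_id cannot be read after retries."""

    def get_datapath_id_with_retry(bridge, retries=3, interval=1):
        """Read bridge.get_datapath_id(), retrying on transient ovsdb timeouts.

        ``bridge`` is assumed to be an OVSBridge-like object exposing
        get_datapath_id() and br_name; the retry and interval defaults are
        illustrative, not taken from the real patch.
        """
        for _ in range(retries):
            try:
                dpid = bridge.get_datapath_id()
                if dpid:
                    return dpid
            except RuntimeError:
                # e.g. a "timeout expired" error from ovsdb while the bridge
                # is being recreated; swallow it and retry instead of letting
                # the exception crash the agent.
                pass
            time.sleep(interval)
        raise DatapathIdNotFound(
            "Could not read datapath_id of bridge %s" % bridge.br_name)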