15:00:03 #startmeeting neutron_ci
15:00:03 Meeting started Wed Apr 1 15:00:03 2020 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:04 hi
15:00:05 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:07 The meeting name has been set to 'neutron_ci'
15:00:39 hi
15:01:01 o/
15:01:27 first of all
15:01:29 Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:01:31 :)
15:01:38 and let's go
15:01:47 and hopefully do this quickly
15:01:48 #topic Actions from previous meetings
15:01:53 slaweq to investigate fullstack SG test broken pipe failures
15:02:02 I still didn't have time for it
15:02:21 and the worst thing is that I saw it also failing on the openvswitch scenario job this week
15:02:32 so I need to prioritize it for next week and debug it
15:02:54 #action slaweq to investigate fullstack SG test broken pipe failures
15:03:02 o/
15:03:05 ^^ reminder for next week
15:03:08 hi njohnston_ :)
15:03:12 ok, next one
15:03:13 maciejjozefczyk to take a look and report LP for failures in neutron.tests.functional.plugins.ml2.drivers.ovn.mech_driver.ovsdb.test_ovn_db_sync.TestOvnNbSyncOverTcp.test_ovn_nb_sync_log
15:04:28 I just pinged maciejjozefczyk, maybe he will join here
15:06:14 o/
15:06:19 ok, he is probably not here
15:06:24 let's assign it to him for next week
15:06:32 #action maciejjozefczyk to take a look and report LP for failures in neutron.tests.functional.plugins.ml2.drivers.ovn.mech_driver.ovsdb.test_ovn_db_sync.TestOvnNbSyncOverTcp.test_ovn_nb_sync_log
15:06:38 next one
15:06:40 ralonsoh to check (again) issue with ns deletion in neutron.tests.functional.agent.linux.test_keepalived.KeepalivedManagerTestCase
15:06:43 hi lajoskatona :)
15:06:56 Hi, sorry for being late
15:06:56 slaweq, yes, this is caused by a ctypes call
15:07:17 during the privsep execution, in an eventlet context
15:07:28 the method returns the GIL and it is never given back
15:07:45 --> the method times out
15:08:16 solution: quite complex if there is no alternative to ctypes calls (C method calls)
15:08:30 so I assume that we can't really fix it in neutron, right?
15:08:49 I'm still thinking about returning to "ip" calls
15:09:08 :)
15:09:12 at least for those functions
15:09:24 ralonsoh: what about Your community goal proposal? :)
15:09:38 I'm writing the document now
15:09:56 I need to fill in the administrative part
15:09:58 community goals are good, working code is better
15:10:11 but it was accepted
15:10:49 ralonsoh: I was just joking that this solution doesn't go together with Your proposal of moving everything to privsep :)
15:11:06 but I'm not talking about removing privsep
15:11:16 but just avoiding pyroute2 on those calls
15:11:43 so can we use privsep with "exec"?
15:11:46 yes
15:11:53 ohh, I didn't know that
15:12:30 ok then :)
15:12:43 so maybe it will be a good way to go, at least for now
15:13:01 sure
15:13:11 one more question - can this bug impact real production, or is it something which may happen only in CI jobs?
15:13:26 it can impact production, yes
15:13:36 ok
15:14:06 so we should definitely find a solution, even if it requires going back to exec("ip")
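To illustrate the "privsep with exec" fallback discussed above - shelling out to the "ip" binary inside a privsep context instead of going through pyroute2/ctypes - here is a minimal sketch. It assumes oslo.privsep and oslo.concurrency; the context setup and function name are illustrative, not Neutron's actual privileged module.

    # Sketch only: run "ip" through a privsep-decorated helper instead of
    # a pyroute2/ctypes call. The privsep context below is illustrative.
    from oslo_concurrency import processutils
    from oslo_privsep import capabilities, priv_context

    default = priv_context.PrivContext(
        __name__,
        cfg_section='privsep',
        pypath=__name__ + '.default',
        capabilities=[capabilities.CAP_NET_ADMIN, capabilities.CAP_SYS_ADMIN],
    )

    @default.entrypoint
    def delete_namespace(name):
        # Shelling out avoids blocking in a C call that holds the GIL under
        # eventlet, while privsep still provides the elevated privileges.
        processutils.execute('ip', 'netns', 'delete', name)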
15:14:40 ralonsoh: do we have an LP opened for that already?
15:14:49 not yet, I'll do it
15:14:54 thx ralonsoh
15:15:10 #action ralonsoh to open LP about issue with neutron.tests.functional.agent.linux.test_keepalived.KeepalivedManagerTestCase
15:15:19 ok, next one
15:15:20 slaweq to check server termination on multicast test
15:15:29 I didn't have time for that one either :/
15:15:31 sorry
15:15:37 I will try to check that
15:15:40 #action slaweq to check server termination on multicast test
15:15:50 and the last one
15:15:52 slaweq to report LP about PGSQL periodic job failures
15:15:55 I opened the bug
15:16:01 and ralonsoh already fixed it - thx a lot
15:16:06 yw!
15:16:15 the postgresql job is running fine now
15:16:23 yay
15:16:41 ok, that's all from me about the actions from the previous week
15:16:52 anything You want to add/ask?
15:18:12 lajoskatona: want to bring up that rally regression issue now?
15:18:32 bcafarel: yeah
15:19:00 actually I just checked the bug opened by rubasov, and checked some logs for numbers :-)
15:19:27 from that, as I see it, trunk create takes a longer time
15:19:29 https://review.opendev.org/#/c/716562/ reverts the code that added the regression; ralonsoh had a suggestion to fix it possibly without reverting
15:19:50 I would like your feedback on this proposal
15:19:51 if You are talking about https://bugs.launchpad.net/neutron/+bug/1870110 I had it for later in the agenda
15:19:53 Launchpad bug 1870110 in neutron "neutron-rally-task fails in rally_openstack.task.scenarios.neutron.trunk.CreateAndListTrunks" [Undecided,In progress] - Assigned to Bence Romsics (bence-romsics)
15:19:53 bug is 1870110
15:19:57 but we can discuss it now :)
15:20:04 #topic Rally
15:20:09 slaweq: yes, that is it
15:20:32 slaweq: oops, sorry for jumping the gun
15:20:39 bcafarel: no problem :)
15:20:52 rubasov proposed a revert of the suspicious patch
15:21:07 so I think we should go with the revert now to unblock the gates, and then maybe we can propose a better solution for the original problem
15:21:18 +1
15:21:22 unless You have any quick solution already in mind
15:21:32 +1, especially before the releases
15:21:33 but it seems that it's hitting us a lot now in rally jobs
15:21:52 I think revert and repropose is the way to go as well
15:22:09 I did not notice the performance effect on my system but I was only dealing with small numbers of ports
15:22:15 in a follow-up patch, we can implement this subport bulk update: https://review.opendev.org/#/c/716562/1/neutron/services/trunk/plugin.py@a455
15:22:36 the point is: if we do this, we won't use the plugin method
15:22:51 and we need to check that the device_id is "" or the trunk id only
15:23:00 sounds good, revert for now and study it without gate pressure on master
15:23:41 ok so watch the revert change https://review.opendev.org/#/c/716562/
15:23:44 looking at https://8d356b01067a0ad3b76e-f043268e56cbcc99f3170ead76b3a9f9.ssl.cf1.rackcdn.com/716562/1/check/neutron-rally-task/ba7611c/results/report.html#/NeutronTrunks.create_and_list_trunks it seems that this revert will really help
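A rough illustration of the follow-up idea ralonsoh mentions above: validate each subport's device_id ("" or the trunk id only) and set it for all subports, rather than paying one plugin call per subport. All names below are assumptions for the sketch, not Neutron's actual trunk plugin code.

    # Sketch only: check that each subport's device_id is "" or the trunk
    # id, then assign the trunk id. Today this costs one get_port() and
    # one update_port() call per subport; the follow-up discussed above
    # would replace the second loop with a single bulk update so trunk
    # creation does not pay one round trip per subport.
    def bind_subports(core_plugin, context, trunk_id, subports):
        port_ids = [subport['port_id'] for subport in subports]
        for port_id in port_ids:
            port = core_plugin.get_port(context, port_id)
            # Only ports that are unowned, or already owned by this trunk,
            # may be used as subports.
            if port['device_id'] not in ('', trunk_id):
                raise ValueError('port %s is already in use' % port_id)
        for port_id in port_ids:
            core_plugin.update_port(
                context, port_id, {'port': {'device_id': trunk_id}})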
15:25:03 ok, I think we are good with this rally issue now
15:25:12 so we can move on to other topics, right?
15:25:17 yep
15:25:33 ok
15:25:35 #topic Stadium projects
15:25:45 any updates about zuulv3?
15:26:06 that is a hard question :-)
15:26:07 networking-odl has a patch in progress https://review.opendev.org/#/c/672925/, thanks lajoskatona
15:26:27 there's some progress but it's like a rabbit hole, every step is deeper....
15:26:27 I saw it today
15:26:30 networking-bagpipe has a patch as well https://review.opendev.org/703949
15:26:35 LOL
15:26:53 yeah, bagpipe is simpler I think
15:26:55 and still nothing visible for networking-midonet
15:27:03 lajoskatona: if You need any help with it I can try to help, but I don't know odl at all
15:27:14 those three are all that is left AFAICT
15:27:23 for odl I have some contacts among the ODL guys, but they are heavily overloaded so it's hard to get useful info from them
15:27:47 ok, fortunately it's not urgent for now
15:28:00 slaweq: thanks, the worst part is reading the java logs ;)
15:28:11 slaweq: do you hear much from yamamoto these days?
15:28:13 and for bagpipe, is that fullstack job the only one which needs to be migrated?
15:28:22 yes
15:28:24 yes
15:28:33 the fact is that the job fails anyway
15:28:39 njohnston: nope, yamamoto is mostly available during the drivers meeting
15:29:16 I have a patch for that as well (https://review.opendev.org/702895 )
15:29:21 lajoskatona: in https://9f709e7d1e4d7533935c-7291d6e818a7c847826cc66aee2194e8.ssl.cf5.rackcdn.com/703949/17/check/networking-bagpipe-dsvm-fullstack/b3b544e/testr_results.html I see that this job is failing now, but probably because of some bugs in the tests
15:29:27 not due to the job definition
15:29:45 was this job green before this migration?
15:30:04 :) I just left a similar comment in the review
15:30:14 slaweq: nope, from tmorin I know that it passed a long time ago
15:30:35 maybe it passed away a long time ago ;)
15:30:52 so my goal is to make it do something and fail with the actual job, and do the fix separately
15:31:11 :-)
15:31:22 yes, so at first glance it seems to me that https://review.opendev.org/#/c/703949/ should be good to go now
15:31:28 right?
15:31:59 that's what I think, yes
15:32:15 ok, I will review it too
15:32:21 and let's go with this one :)
15:32:27 that will be one down, hopefully
15:32:34 I have to check the result, njohnston pointed out that it is not doing anything in the latest runs
15:33:11 in the last run there are even results: https://9f709e7d1e4d7533935c-7291d6e818a7c847826cc66aee2194e8.ssl.cf5.rackcdn.com/703949/17/check/networking-bagpipe-dsvm-fullstack/b3b544e/testr_results.html
15:33:15 so it is doing what it should
15:33:21 9 failed, 4 skipped
15:33:36 exactly the same result as in the old legacy job: https://6d454fee7aca3ec21c01-c455553d95560f1580667e93cc59b7bd.ssl.cf5.rackcdn.com/702895/5/check/legacy-networking-bagpipe-dsvm-fullstack/af47476/job-output.txt
15:33:42 ah, that's good
15:33:45 so "perfect" for me :)
15:33:46 thanks
15:34:32 and I think I have some idea what is missing in https://review.opendev.org/#/c/702895/ now
15:34:36 these stadium projects are like old gardens without a gardener for years....
15:34:46 ^^ too true
15:34:51 yeap
15:35:03 that's why we had a discussion about it in Shanghai
15:35:08 well, a stadium lawn requires quite a few gardeners
15:35:14 and why neutron-fwaas is deprecated now
15:35:21 yeah, that is trying to make these tests work again
15:36:13 ok, let's move on
15:36:45 I think we are done speaking about stadium projects for today
15:37:13 #topic Grafana
15:37:19 http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:37:49 Average number of rechecks in recent weeks:
15:37:51 week 13 of 2020: 1.93
15:37:53 week 14 of 2020: 0.75
15:38:02 so this doesn't look very bad IMO
15:38:32 we had an issue with the designate job recently
15:38:50 example: https://zuul.opendev.org/t/openstack/build/dd2c6ed937934543a42afc2f92459eac
15:39:00 but it should already be fixed in devstack
15:39:06 so we are good there
15:39:19 yes, it looks ok in recent zuul runs
15:39:54 as for other things, we recently merged a patch which added back the non-voting tripleo standalone job
15:39:59 it's on CentOS 8 now
15:40:14 I didn't update BZ yet as I wanted to get these merged first:
15:40:30 https://review.opendev.org/#/c/714917/
15:40:31 and
15:40:54 https://review.opendev.org/#/c/710436/
15:41:03 and then I will do one more update of our dashboard
15:41:32 anything else regarding grafana?
15:42:33 if not, let's move on
15:42:35 #topic fullstack/functional
15:42:56 I found one new (for me at least) issue in functional tests:
15:42:58 neutron.tests.functional.agent.l3.test_keepalived_state_change.TestMonitorDaemon
15:43:03 https://80137ce53930819135d8-42d904af0faa486c8226703976d821a0.ssl.cf2.rackcdn.com/704833/23/check/neutron-functional/17568d5/testr_results.html
15:43:40 I'll check that one
15:43:51 thx ralonsoh
15:45:25 #action ralonsoh to check failure in neutron.tests.functional.agent.l3.test_keepalived_state_change.TestMonitorDaemon
15:45:47 I also found one failure new to me in fullstack tests:
15:45:49 neutron.tests.fullstack.test_l3_agent.TestHAL3Agent.test_router_fip_qos_after_admin_state_down_up
15:45:53 https://81525168d755db537877-a5e4e29d4d6432c5c7202337ef0214bc.ssl.cf1.rackcdn.com/714731/1/gate/neutron-fullstack/8a9753b/testr_results.html
15:46:22 does anyone want to check that one?
15:46:49 I could try if I have time
15:46:55 thx ralonsoh
15:47:14 it's probably not urgent, at least for now, as it happened only once so far
15:47:28 #action ralonsoh to check issue with neutron.tests.fullstack.test_l3_agent.TestHAL3Agent.test_router_fip_qos_after_admin_state_down_up
15:47:49 ok, and that's basically all from me for today
15:48:11 as we already discussed scenario jobs (only the designate issue, which I covered there) and rally
15:48:22 anything else You want to discuss today?
15:48:44 nothing from me
15:50:10 ok, if not, I think I can give You a few minutes back today
15:50:14 thx for attending
15:50:18 and see You all online
15:50:20 o/
15:50:23 #endmeeting