15:00:03 <slaweq> #startmeeting neutron_ci
15:00:03 <openstack> Meeting started Wed Apr 1 15:00:03 2020 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:04 <slaweq> hi
15:00:05 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:07 <openstack> The meeting name has been set to 'neutron_ci'
15:00:39 <ralonsoh> hi
15:01:01 <bcafarel> o/
15:01:27 <slaweq> first of all
15:01:29 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:01:31 <slaweq> :)
15:01:38 <slaweq> and lets go
15:01:47 <slaweq> and do this quick hopefully
15:01:48 <slaweq> #topic Actions from previous meetings
15:01:53 <slaweq> slaweq to investigate fullstack SG test broken pipe failures
15:02:02 <slaweq> I still didn't have time for it
15:02:21 <slaweq> and worst thing is that I saw it also failing on openvswitch scenario this week
15:02:32 <slaweq> so I need to prioritize it for next week and debug
15:02:54 <slaweq> #action slaweq to investigate fullstack SG test broken pipe failures
15:03:02 <njohnston_> o/
15:03:05 <slaweq> ^^ reminder for next week
15:03:08 <slaweq> hi njohnston_ :)
15:03:12 <slaweq> ok, next one
15:03:13 <slaweq> maciejjozefczyk to take a look and report LP for failures in neutron.tests.functional.plugins.ml2.drivers.ovn.mech_driver.ovsdb.test_ovn_db_sync.TestOvnNbSyncOverTcp.test_ovn_nb_sync_log
15:04:28 <slaweq> I just pinged maciejjozefczyk, maybe he will join here
15:06:14 <lajoskatona> o/
15:06:19 <slaweq> ok, he is probably not here
15:06:24 <slaweq> lets assign it to him for next week
15:06:32 <slaweq> #action maciejjozefczyk to take a look and report LP for failures in neutron.tests.functional.plugins.ml2.drivers.ovn.mech_driver.ovsdb.test_ovn_db_sync.TestOvnNbSyncOverTcp.test_ovn_nb_sync_log
15:06:38 <slaweq> next one
15:06:40 <slaweq> ralonsoh to check (again) issue with ns deletion in
neutron.tests.functional.agent.linux.test_keepalived.KeepalivedManagerTestCase
15:06:43 <slaweq> hi lajoskatona :)
15:06:56 <lajoskatona> Hi, sorry for being late
15:06:56 <ralonsoh> slaweq, yes, this is caused by ctypes call
15:07:17 <ralonsoh> during the privsep execution, in an eventlet context
15:07:28 <ralonsoh> the method releases the GIL and it is never returned
15:07:45 <ralonsoh> --> the method times out
15:08:16 <ralonsoh> solution: quite complex if there is no alternative to ctypes calls (C method calls)
15:08:30 <slaweq> so I assume that we can't really fix it in neutron, right?
15:08:49 <ralonsoh> I'm still thinking about returning to "ip" calls
15:09:08 <slaweq> :)
15:09:12 <ralonsoh> at least for those functions
15:09:24 <slaweq> ralonsoh: what about Your community goal proposal? :)
15:09:38 <ralonsoh> I'm writing now the document
15:09:56 <ralonsoh> I need to fill the administrative part
15:09:58 <njohnston> community goals are good, working code is better
15:10:11 <ralonsoh> but it was accepted
15:10:49 <slaweq> ralonsoh: I was more like joking that this solution is not going together with Your proposal of moving to privsep with everything :)
15:11:06 <ralonsoh> but I'm not talking about removing privsep
15:11:16 <ralonsoh> but just avoid pyroute2 on those calls
15:11:43 <slaweq> so can we use privsep with "exec"?
15:11:46 <ralonsoh> yes
15:11:53 <slaweq> ohh, I didn't know that
15:12:30 <slaweq> ok then :)
15:12:43 <slaweq> so maybe it will be a good way to go, at least for now
15:13:01 <ralonsoh> sure
15:13:11 <slaweq> one more question - can this bug impact real production? or is it something which may happen only in ci jobs?
15:13:26 <ralonsoh> it can impact, yes
15:13:36 <slaweq> ok
15:14:06 <slaweq> so we should definitely find a solution, even if it requires getting back to exec("ip")
15:14:40 <slaweq> ralonsoh: do we have LP for that already opened?
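[Editor's note: a minimal, hypothetical sketch of the workaround ralonsoh describes above — shelling out to the external `ip` binary instead of going through pyroute2's ctypes bindings, so no C call can release the GIL inside the eventlet-driven privsep daemon and never get it back. The function and helper names are illustrative, not Neutron's actual code; in Neutron the call would additionally be wrapped by an oslo.privsep entrypoint, which is omitted here.]

```python
import subprocess


def delete_namespace_cmd(name):
    # Build the external "ip netns delete" command line.  Executing a
    # separate binary sidesteps the pyroute2/ctypes path, where a C call
    # can release the GIL and, under eventlet, never return it.
    return ["ip", "netns", "delete", name]


def delete_namespace(name, runner=subprocess.run):
    # In Neutron this would run behind an oslo.privsep entrypoint so the
    # command executes with the required privileges; the "runner" hook is
    # only here so the sketch can be exercised without root.
    return runner(delete_namespace_cmd(name), check=True)
```

The subprocess cost is higher than an in-process netlink call, but the process boundary means a stuck C call can only hang the child, not the daemon's GIL.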
15:14:49 <ralonsoh> not yet, I'll do it
15:14:54 <slaweq> thx ralonsoh
15:15:10 <slaweq> #action ralonsoh to open LP about issue with neutron.tests.functional.agent.linux.test_keepalived.KeepalivedManagerTestCase
15:15:19 <slaweq> ok, next one
15:15:20 <slaweq> slaweq to check server termination on multicast test
15:15:29 <slaweq> I didn't have time for that one too :/
15:15:31 <slaweq> sorry
15:15:37 <slaweq> I will try to check that
15:15:40 <slaweq> #action slaweq to check server termination on multicast test
15:15:50 <slaweq> and the last one
15:15:52 <slaweq> slaweq to report LP about PGSQL periodic job failures
15:15:55 <slaweq> I opened bug
15:16:01 <slaweq> and ralonsoh already fixed it - thx a lot
15:16:06 <ralonsoh> yw!
15:16:15 <slaweq> postgresql job is running fine now
15:16:23 <bcafarel> yay
15:16:41 <slaweq> ok, that's all about actions from previous week from me
15:16:52 <slaweq> anything You want to add/ask?
15:18:12 <bcafarel> lajoskatona: want to bring that rally regression issue now?
15:18:32 <lajoskatona> bcafarel: yeah
15:19:00 <lajoskatona> actually I just checked the bug opened by rubasov, and checked some logs for numbers :-)
15:19:27 <lajoskatona> from that, as I see it, trunk create takes a longer time,
15:19:29 <njohnston> https://review.opendev.org/#/c/716562/ reverts the code that added the regression; ralonsoh had a suggestion to fix it possibly without reverting
15:19:50 <ralonsoh> I would like your feedback on this proposal
15:19:51 <slaweq> if You are talking about https://bugs.launchpad.net/neutron/+bug/1870110 I had it for later in the agenda
15:19:53 <openstack> Launchpad bug 1870110 in neutron "neutron-rally-task fails in rally_openstack.task.scenarios.neutron.trunk.CreateAndListTrunks" [Undecided,In progress] - Assigned to Bence Romsics (bence-romsics)
15:19:53 <njohnston> bug is 1870110
15:19:57 <slaweq> but we can discuss it now :)
15:20:04 <slaweq> #topic Rally
15:20:09 <lajoskatona> slaweq: yes, that is it
15:20:32 <bcafarel> slaweq: oops sorry for jumping the gun
15:20:39 <slaweq> bcafarel: no problem :)
15:20:52 <lajoskatona> rubasov proposed a revert for the patch that is suspicious
15:21:07 <slaweq> so I think we should go with revert now to unblock gates and then we can propose better solution for original problem maybe
15:21:18 <ralonsoh> +1
15:21:22 <slaweq> unless You have any quick solution already in mind
15:21:32 <lajoskatona> +1 especially before releases
15:21:33 <slaweq> but it seems that it's hitting us a lot now in rally jobs
15:21:52 <njohnston> I think revert and repropose is the way to go as well
15:22:09 <njohnston> I did not notice the performance effect on my system but I was only dealing with small numbers of ports
15:22:15 <ralonsoh> in a follow-up patch, we can implement this subport bulk update: https://review.opendev.org/#/c/716562/1/neutron/services/trunk/plugin.py@a455
15:22:36 <ralonsoh> the point is: if we do this, we won't use the plugin method
15:22:51 <ralonsoh> and we need to check that the device_id
is "" or the trunk id only
15:23:00 <bcafarel> sounds good, revert for now and study it without gate pressure on master
15:23:41 <njohnston> ok so watch the revert change https://review.opendev.org/#/c/716562/
15:23:44 <slaweq> looking at https://8d356b01067a0ad3b76e-f043268e56cbcc99f3170ead76b3a9f9.ssl.cf1.rackcdn.com/716562/1/check/neutron-rally-task/ba7611c/results/report.html#/NeutronTrunks.create_and_list_trunks it seems that really this revert will help
15:25:03 <slaweq> ok, I think we are good with this rally issue now
15:25:12 <slaweq> so we can move on to other topics, right?
15:25:17 <njohnston> yep
15:25:33 <slaweq> ok
15:25:35 <slaweq> #topic Stadium projects
15:25:45 <slaweq> any updates about zuulv3 ?
15:26:06 <lajoskatona> that is a hard question :-)
15:26:07 <njohnston> networking-odl has a patch in progress https://review.opendev.org/#/c/672925/, thanks lajoskatona
15:26:27 <lajoskatona> there's some progress but it's like the rabbit hole, every step is deeper....
15:26:27 <slaweq> I saw it today
15:26:30 <njohnston> networking-bagpipe has a patch as well https://review.opendev.org/703949
15:26:35 <slaweq> LOL
15:26:53 <lajoskatona> yeah bagpipe is simpler I think
15:26:55 <njohnston> and still nothing visible for networking-midonet
15:27:03 <slaweq> lajoskatona: if You need any help with it I can try to help, but I don't know odl at all
15:27:14 <njohnston> those three are all that is left AFAICT
15:27:23 <lajoskatona> for odl I have some contacts from ODL guys but they are heavily overloaded so it's hard to get useful info from them
15:27:47 <slaweq> ok, fortunately it's not urgent for now
15:28:00 <lajoskatona> slaweq: thanks, the worst is to read the java logs ;)
15:28:11 <njohnston> slaweq: do you hear much from yamamoto these days?
15:28:13 <slaweq> and for bagpipe, is that fullstack job the only one which needs to be migrated?
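[Editor's note: the subport bulk update ralonsoh outlined in the Rally discussion above — accept a subport only if its device_id is "" or already the trunk id, then claim them all in one pass — could be sketched roughly like this. The port dicts and function name are illustrative assumptions, not Neutron's actual trunk plugin code.]

```python
def bulk_claim_subports(ports, trunk_id):
    # A subport is claimable only if it is unclaimed (device_id == "")
    # or already claimed by this very trunk; anything else belongs to
    # another device and must abort the whole operation.
    claimable = [p for p in ports if p["device_id"] in ("", trunk_id)]
    if len(claimable) != len(ports):
        raise ValueError("some subports are in use by another device")
    # One bulk pass instead of a per-subport plugin update call, which
    # is where the measured trunk-create slowdown came from.
    for port in claimable:
        port["device_id"] = trunk_id
    return claimable
```

In a real implementation the loop would be a single database UPDATE rather than per-dict mutation; the point of the sketch is only the validation rule and the single-pass shape.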
15:28:24 <lajoskatona> yes
15:28:33 <lajoskatona> the fact is that the job fails anyway
15:28:39 <slaweq> njohnston: nope, yamamoto is mostly available during drivers meeting
15:29:16 <lajoskatona> I have a patch for that as well (https://review.opendev.org/702895 )
15:29:21 <slaweq> lajoskatona: in https://9f709e7d1e4d7533935c-7291d6e818a7c847826cc66aee2194e8.ssl.cf5.rackcdn.com/703949/17/check/networking-bagpipe-dsvm-fullstack/b3b544e/testr_results.html I see that this job is failing now but probably because of some bugs in tests
15:29:27 <slaweq> not due to job definition
15:29:45 <slaweq> was this job green before this migration?
15:30:04 <bcafarel> :) I just left a similar comment in review
15:30:14 <lajoskatona> slaweq: nope, from tmorin I know that it passed a long time ago
15:30:35 <slaweq> maybe it passed away long time ago ;)
15:30:52 <lajoskatona> so my goal is to make it do something, and fail with the actual job, and do the fix separately
15:31:11 <lajoskatona> :-)
15:31:22 <slaweq> yes, so it seems for me at first glance that https://review.opendev.org/#/c/703949/ should be good to go now
15:31:28 <slaweq> right?
15:31:59 <bcafarel> that's what I think yes
15:32:15 <slaweq> ok, I will review it too
15:32:21 <slaweq> and lets go with this one :)
15:32:27 <slaweq> will be one down hopefully
15:32:34 <lajoskatona> I have to check the result, njohnston pointed to the fact it is not doing anything in latest runs
15:33:11 <slaweq> in last run there are results even https://9f709e7d1e4d7533935c-7291d6e818a7c847826cc66aee2194e8.ssl.cf5.rackcdn.com/703949/17/check/networking-bagpipe-dsvm-fullstack/b3b544e/testr_results.html
15:33:15 <slaweq> so it is doing what it should
15:33:21 <slaweq> 9 fail, 4 skipped
15:33:36 <slaweq> exactly same result as in old legacy job: https://6d454fee7aca3ec21c01-c455553d95560f1580667e93cc59b7bd.ssl.cf5.rackcdn.com/702895/5/check/legacy-networking-bagpipe-dsvm-fullstack/af47476/job-output.txt
15:33:42 <lajoskatona> ah, that's good
15:33:45 <slaweq> so "perfect" for me :)
15:33:46 <lajoskatona> thanks
15:34:32 <slaweq> and I think I have some idea what is missing in https://review.opendev.org/#/c/702895/ now
15:34:36 <lajoskatona> these stadiums are like old gardens without gardener in years....
15:34:46 <njohnston> ^^ too true
15:34:51 <slaweq> yeap
15:35:03 <slaweq> that's why we had discussion about it in Shanghai
15:35:08 <bcafarel> well a stadium lawn requires quite a few gardeners
15:35:14 <slaweq> and why neutron-fwaas is deprecated now
15:35:21 <lajoskatona> yeah that is trying to make these tests work again
15:36:13 <slaweq> ok, lets move on
15:36:45 <slaweq> I think we are good speaking about stadium projects for today
15:37:13 <slaweq> #topic Grafana
15:37:19 <slaweq> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:37:49 <slaweq> Average number of rechecks in last weeks:
15:37:51 <slaweq> week 13 of 2020: 1.93
15:37:53 <slaweq> week 14 of 2020: 0.75
15:38:02 <slaweq> so this doesn't look very bad IMO
15:38:32 <slaweq> we had issue with designate job recently
15:38:50 <slaweq> example: https://zuul.opendev.org/t/openstack/build/dd2c6ed937934543a42afc2f92459eac
15:39:00 <slaweq> but it should be already fixed in devstack
15:39:06 <slaweq> so we are good there
15:39:19 <bcafarel> yes it looks ok in recent zuul runs
15:39:54 <slaweq> from other things, we recently merged patch which added again non-voting tripleo standalone job
15:39:59 <slaweq> it's on centos 8 now
15:40:14 <slaweq> I didn't update BZ yet as I wanted first to get merged:
15:40:30 <slaweq> https://review.opendev.org/#/c/714917/
15:40:31 <slaweq> and
15:40:54 <slaweq> https://review.opendev.org/#/c/710436/
15:41:03 <slaweq> and then I will do one update of our dashboard again
15:41:32 <slaweq> anything else regarding grafana?
15:42:33 <slaweq> if not, lets move on 15:42:35 <slaweq> #topic fullstack/functional 15:42:56 <slaweq> I found one new (for me at least) issue in functional tests: 15:42:58 <slaweq> neutron.tests.functional.agent.l3.test_keepalived_state_change.TestMonitorDaemon 15:43:03 <slaweq> https://80137ce53930819135d8-42d904af0faa486c8226703976d821a0.ssl.cf2.rackcdn.com/704833/23/check/neutron-functional/17568d5/testr_results.html 15:43:40 <ralonsoh> I'll check that one 15:43:51 <slaweq> thx ralonsoh 15:45:25 <slaweq> #action ralonsoh to check failure in neutron.tests.functional.agent.l3.test_keepalived_state_change.TestMonitorDaemon 15:45:47 <slaweq> I also found one new for me failure in fullstack tests: 15:45:49 <slaweq> neutron.tests.fullstack.test_l3_agent.TestHAL3Agent.test_router_fip_qos_after_admin_state_down_up 15:45:53 <slaweq> https://81525168d755db537877-a5e4e29d4d6432c5c7202337ef0214bc.ssl.cf1.rackcdn.com/714731/1/gate/neutron-fullstack/8a9753b/testr_results.html 15:46:22 <slaweq> anyone wants to check that one? 15:46:49 <ralonsoh> I could try if I have time 15:46:55 <slaweq> thx ralonsoh 15:47:14 <slaweq> it's probably not urgent, at least for now, as it happened only once so far 15:47:28 <slaweq> #action ralonsoh to check issue with neutron.tests.fullstack.test_l3_agent.TestHAL3Agent.test_router_fip_qos_after_admin_state_down_up 15:47:49 <slaweq> ok, and that's basically all from me for today 15:48:11 <slaweq> as we already discussed about scenario jobs (only designate issue which I had there) and rally 15:48:22 <slaweq> anything else You want to discuss today? 15:48:44 <bcafarel> nothing from me 15:50:10 <slaweq> ok, if not, I think I can give You few minutes back today 15:50:14 <slaweq> thx for attending 15:50:18 <slaweq> and see You all online 15:50:20 <slaweq> o/ 15:50:23 <slaweq> #endmeeting