15:00:03 <slaweq> #startmeeting neutron_ci
15:00:03 <openstack> Meeting started Wed Apr  1 15:00:03 2020 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:04 <slaweq> hi
15:00:05 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:07 <openstack> The meeting name has been set to 'neutron_ci'
15:00:39 <ralonsoh> hi
15:01:01 <bcafarel> o/
15:01:27 <slaweq> first of all
15:01:29 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:01:31 <slaweq> :)
15:01:38 <slaweq> and lets go
15:01:47 <slaweq> and hopefully do this quickly
15:01:48 <slaweq> #topic Actions from previous meetings
15:01:53 <slaweq> slaweq to investigate fullstack SG test broken pipe failures
15:02:02 <slaweq> I still didn't have time for it
15:02:21 <slaweq> and the worst thing is that I saw it also failing in the openvswitch scenario job this week
15:02:32 <slaweq> so I need to prioritize it for next week and debug it
15:02:54 <slaweq> #action slaweq to investigate fullstack SG test broken pipe failures
15:03:02 <njohnston_> o/
15:03:05 <slaweq> ^^ reminder for next week
15:03:08 <slaweq> hi njohnston_ :)
15:03:12 <slaweq> ok, next one
15:03:13 <slaweq> maciejjozefczyk to take a look and report LP for failures in neutron.tests.functional.plugins.ml2.drivers.ovn.mech_driver.ovsdb.test_ovn_db_sync.TestOvnNbSyncOverTcp.test_ovn_nb_sync_log
15:04:28 <slaweq> I just pinged maciejjozefczyk, maybe he will join here
15:06:14 <lajoskatona> o/
15:06:19 <slaweq> ok, he is probably not here
15:06:24 <slaweq> lets assign it to him for next week
15:06:32 <slaweq> #action maciejjozefczyk to take a look and report LP for failures in neutron.tests.functional.plugins.ml2.drivers.ovn.mech_driver.ovsdb.test_ovn_db_sync.TestOvnNbSyncOverTcp.test_ovn_nb_sync_log
15:06:38 <slaweq> next one
15:06:40 <slaweq> ralonsoh to check (again) issue with ns deletion in neutron.tests.functional.agent.linux.test_keepalived.KeepalivedManagerTestCase
15:06:43 <slaweq> hi lajoskatona :)
15:06:56 <lajoskatona> Hi, sorry for being late
15:06:56 <ralonsoh> slaweq, yes, this is caused by ctypes call
15:07:17 <ralonsoh> during the privsep execution, in an eventlet context
15:07:28 <ralonsoh> the method releases the GIL and control is never returned
15:07:45 <ralonsoh> --> the method times out
15:08:16 <ralonsoh> solution: quite complex if there is no alternative to ctypes calls (C method calls)
15:08:30 <slaweq> so I assume that we can't really fix it in neutron, right?
15:08:49 <ralonsoh> I'm still thinking about returning to "ip" calls
15:09:08 <slaweq> :)
15:09:12 <ralonsoh> at least for those functions
15:09:24 <slaweq> ralonsoh: what about Your community goal proposal? :)
15:09:38 <ralonsoh> I'm writing now the document
15:09:56 <ralonsoh> I need to fill the administrative part
15:09:58 <njohnston> community goals are good, working code is better
15:10:11 <ralonsoh> but it was accepted
15:10:49 <slaweq> ralonsoh: I was more joking that this solution doesn't go together with Your proposal of moving everything to privsep :)
15:11:06 <ralonsoh> but I'm not talking about removing privsep
15:11:16 <ralonsoh> but just avoid pyroute2 on those calls
15:11:43 <slaweq> so can we use privsep with "exec"?
15:11:46 <ralonsoh> yes
15:11:53 <slaweq> ohh, I didn't know that
15:12:30 <slaweq> ok then :)
15:12:43 <slaweq> so maybe it will be a good way to go, at least for now
15:13:01 <ralonsoh> sure
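
[A minimal sketch of the "privsep with exec" approach discussed above, assuming oslo.privsep and oslo.concurrency are available; the context name, the capabilities and the namespace-deletion helper are illustrative assumptions, not Neutron's actual code.]

    from oslo_concurrency import processutils
    from oslo_privsep import capabilities, priv_context

    # Illustrative privsep context; Neutron defines its own contexts elsewhere.
    default = priv_context.PrivContext(
        __name__,
        cfg_section='privsep',
        pypath=__name__ + '.default',
        capabilities=[capabilities.CAP_NET_ADMIN,
                      capabilities.CAP_SYS_ADMIN],
    )

    @default.entrypoint
    def delete_namespace(name):
        # Run the "ip" binary inside the privileged daemon instead of making
        # a pyroute2 (ctypes) call from the eventlet-based agent process,
        # which is where the GIL hang described above was observed.
        processutils.execute('ip', 'netns', 'delete', name)
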
15:13:11 <slaweq> one more question - can this bug impact real production? or is it something which may happen only in ci jobs?
15:13:26 <ralonsoh> it can impact, yes
15:13:36 <slaweq> ok
15:14:06 <slaweq> so we should definitely find a solution, even if it requires going back to exec("ip")
15:14:40 <slaweq> ralonsoh: do we have LP for that already opened?
15:14:49 <ralonsoh> not yet, I'll do it
15:14:54 <slaweq> thx ralonsoh
15:15:10 <slaweq> #action ralonsoh to open LP about issue with neutron.tests.functional.agent.linux.test_keepalived.KeepalivedManagerTestCase
15:15:19 <slaweq> ok, next one
15:15:20 <slaweq> slaweq to check server termination on multicast test
15:15:29 <slaweq> I didn't have time for that one either :/
15:15:31 <slaweq> sorry
15:15:37 <slaweq> I will try to check that
15:15:40 <slaweq> #action slaweq to check server termination on multicast test
15:15:50 <slaweq> and the last one
15:15:52 <slaweq> slaweq to report LP about PGSQL periodic job failures
15:15:55 <slaweq> I opened a bug
15:16:01 <slaweq> and ralonsoh already fixed it - thx a lot
15:16:06 <ralonsoh> yw!
15:16:15 <slaweq> postgresql job is running fine now
15:16:23 <bcafarel> yay
15:16:41 <slaweq> ok, that's all from me about the actions from the previous week
15:16:52 <slaweq> anything You want to add/ask?
15:18:12 <bcafarel> lajoskatona: want to bring up that rally regression issue now?
15:18:32 <lajoskatona> bcafarel: yeah
15:19:00 <lajoskatona> actually I just checked the bug opened by rubasov, and checked some logs for numbers :-)
15:19:27 <lajoskatona> based on that, as I see it, trunk create takes a longer time
15:19:29 <njohnston> https://review.opendev.org/#/c/716562/ reverts the code that added the regression; ralonsoh had a suggestion to fix it possibly without reverting
15:19:50 <ralonsoh> I would like your feedback on this proposal
15:19:51 <slaweq> if You are talking about https://bugs.launchpad.net/neutron/+bug/1870110 I had it for later in the agenda
15:19:53 <openstack> Launchpad bug 1870110 in neutron "neutron-rally-task fails in rally_openstack.task.scenarios.neutron.trunk.CreateAndListTrunks" [Undecided,In progress] - Assigned to Bence Romsics (bence-romsics)
15:19:53 <njohnston> bug is 1870110
15:19:57 <slaweq> but we can discuss it now :)
15:20:04 <slaweq> #topic Rally
15:20:09 <lajoskatona> slaweq: yes, that is it
15:20:32 <bcafarel> slaweq: oops sorry for jumping the gun
15:20:39 <slaweq> bcafarel: no problem :)
15:20:52 <lajoskatona> rubasov proposed a revert for the patch that is suspicious
15:21:07 <slaweq> so I think we should go with the revert now to unblock the gates, and then maybe we can propose a better solution for the original problem
15:21:18 <ralonsoh> +1
15:21:22 <slaweq> unless You have any quick solution already in mind
15:21:32 <lajoskatona> +1, especially before releases
15:21:33 <slaweq> but it seems that it's hitting us a lot now in rally jobs
15:21:52 <njohnston> I think revert and repropose is the way to go as well
15:22:09 <njohnston> I did not notice the performance effect on my system but I was only dealing with small numbers of ports
15:22:15 <ralonsoh> in a follow up patch, we can implement this subport bulk update: https://review.opendev.org/#/c/716562/1/neutron/services/trunk/plugin.py@a455
15:22:36 <ralonsoh> the point is: if we do this, we won't use the plugin method
15:22:51 <ralonsoh> and we need to check that the device_id is "" or the trunk id only
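
[A rough sketch of the subport bulk-update idea ralonsoh describes above, assuming SQLAlchemy 1.4; the Port model and the session handling are stand-ins for illustration only, not the actual Neutron code the follow-up patch would touch.]

    import sqlalchemy as sa
    from sqlalchemy import orm

    Base = orm.declarative_base()

    class Port(Base):
        # Stand-in model; the real Port model lives in Neutron's DB layer.
        __tablename__ = 'ports'
        id = sa.Column(sa.String(36), primary_key=True)
        device_id = sa.Column(sa.String(255), default='')

    def bulk_set_subport_device_id(session, subport_ids, trunk_id):
        # One UPDATE for all subports instead of one plugin update_port()
        # call per subport; only rows whose device_id is "" or already the
        # trunk id are touched, per the check mentioned above.
        session.execute(
            sa.update(Port)
            .where(Port.id.in_(subport_ids),
                   Port.device_id.in_(['', trunk_id]))
            .values(device_id=trunk_id)
        )
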
15:23:00 <bcafarel> sounds good, revert for now and study it without gate pressure on master
15:23:41 <njohnston> ok so watch the revert change https://review.opendev.org/#/c/716562/
15:23:44 <slaweq> looking at https://8d356b01067a0ad3b76e-f043268e56cbcc99f3170ead76b3a9f9.ssl.cf1.rackcdn.com/716562/1/check/neutron-rally-task/ba7611c/results/report.html#/NeutronTrunks.create_and_list_trunks it seems that this revert will really help
15:25:03 <slaweq> ok, I think we are good with this rally issue now
15:25:12 <slaweq> so we can move on to other topics, right?
15:25:17 <njohnston> yep
15:25:33 <slaweq> ok
15:25:35 <slaweq> #topic Stadium projects
15:25:45 <slaweq> any updates about zuulv3 ?
15:26:06 <lajoskatona> that is a hard question :-)
15:26:07 <njohnston> networking-odl has a patch in progress https://review.opendev.org/#/c/672925/, thanks lajoskatona
15:26:27 <lajoskatona> there's some progress but it's like a rabbit hole, every step goes deeper....
15:26:27 <slaweq> I saw it today
15:26:30 <njohnston> networking-bagpipe has a patch as well https://review.opendev.org/703949
15:26:35 <slaweq> LOL
15:26:53 <lajoskatona> yeah bagpipe is simpler I think
15:26:55 <njohnston> and still nothing visible for networking-midonet
15:27:03 <slaweq> lajoskatona: if You need any help with it I can try to help, but I don't know odl at all
15:27:14 <njohnston> those three are all that is left AFAICT
15:27:23 <lajoskatona> for odl I have some contacts among the ODL guys, but they are heavily overloaded so it's hard to get useful info from them
15:27:47 <slaweq> ok, fortunately it's not urgent for now
15:28:00 <lajoskatona> slaweq: thanks, the worst part is reading Java logs ;)
15:28:11 <njohnston> slaweq: do you hear much from yamamoto these days?
15:28:13 <slaweq> and for bagpipe, is that fullstack job the only one which needs to be migrated?
15:28:22 <lajoskatona> yes
15:28:33 <lajoskatona> the fact is that the job fails anyway
15:28:39 <slaweq> njohnston: nope, yamamoto is mostly available during drivers meeting
15:29:16 <lajoskatona> I have a patch for that as well (https://review.opendev.org/702895 )
15:29:21 <slaweq> lajoskatona: in https://9f709e7d1e4d7533935c-7291d6e818a7c847826cc66aee2194e8.ssl.cf5.rackcdn.com/703949/17/check/networking-bagpipe-dsvm-fullstack/b3b544e/testr_results.html I see that this job is failing now but probably because of some bugs in tests
15:29:27 <slaweq> not due to job definition
15:29:45 <slaweq> was this job green before this migration?
15:30:04 <bcafarel> :) I just left similar comment in review
15:30:14 <lajoskatona> slaweq: nope, from tmorin I know that it passed a long time ago
15:30:35 <slaweq> maybe it passed away a long time ago ;)
15:30:52 <lajoskatona> so my goal is to make it do something and fail with the actual job, and do the fixes separately
15:31:11 <lajoskatona> :-)
15:31:22 <slaweq> yes, so it seems to me at first glance that https://review.opendev.org/#/c/703949/ should be good to go now
15:31:28 <slaweq> right?
15:31:59 <bcafarel> that's what I think yes
15:32:15 <slaweq> ok, I will review it too
15:32:21 <slaweq> and lets go with this one :)
15:32:27 <slaweq> that will hopefully be one down
15:32:34 <lajoskatona> I have to check the result, njohnston pointed out that it was not doing anything in the latest runs
15:33:11 <slaweq> in the last run there are even results: https://9f709e7d1e4d7533935c-7291d6e818a7c847826cc66aee2194e8.ssl.cf5.rackcdn.com/703949/17/check/networking-bagpipe-dsvm-fullstack/b3b544e/testr_results.html
15:33:15 <slaweq> so it is doing what it should
15:33:21 <slaweq> 9 fail, 4 skipped
15:33:36 <slaweq> exactly the same result as in the old legacy job: https://6d454fee7aca3ec21c01-c455553d95560f1580667e93cc59b7bd.ssl.cf5.rackcdn.com/702895/5/check/legacy-networking-bagpipe-dsvm-fullstack/af47476/job-output.txt
15:33:42 <lajoskatona> ah, that's good
15:33:45 <slaweq> so "perfect" for me :)
15:33:46 <lajoskatona> thanks
15:34:32 <slaweq> and I think I have some idea what is missing in https://review.opendev.org/#/c/702895/ now
15:34:36 <lajoskatona> these stadium projects are like old gardens without a gardener for years....
15:34:46 <njohnston> ^^ too true
15:34:51 <slaweq> yeap
15:35:03 <slaweq> that's why we had discussion about it in Shanghai
15:35:08 <bcafarel> well, a stadium lawn requires quite a few gardeners
15:35:14 <slaweq> and why neutron-fwaas is deprecated now
15:35:21 <lajoskatona> yeah that is trying to make these tests work again
15:36:13 <slaweq> ok, lets move on
15:36:45 <slaweq> I think we are good speaking about stadium projects for today
15:37:13 <slaweq> #topic Grafana
15:37:19 <slaweq> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:37:49 <slaweq> Average number of rechecks in last weeks:
15:37:51 <slaweq> week 13 of 2020: 1.93
15:37:53 <slaweq> week 14 of 2020: 0.75
15:38:02 <slaweq> so this doesn't look very bad IMO
15:38:32 <slaweq> we had an issue with the designate job recently
15:38:50 <slaweq> example: https://zuul.opendev.org/t/openstack/build/dd2c6ed937934543a42afc2f92459eac
15:39:00 <slaweq> but it should be already fixed in devstack
15:39:06 <slaweq> so we are good there
15:39:19 <bcafarel> yes it looks ok in recent zuul runs
15:39:54 <slaweq> on other topics, we recently merged a patch which added back the non-voting tripleo standalone job
15:39:59 <slaweq> it's on centos 8 now
15:40:14 <slaweq> I haven't updated BZ yet as I wanted to first get these merged:
15:40:30 <slaweq> https://review.opendev.org/#/c/714917/
15:40:31 <slaweq> and
15:40:54 <slaweq> https://review.opendev.org/#/c/710436/
15:41:03 <slaweq> and then I will do one update of our dashboard again
15:41:32 <slaweq> anything else regarding grafana?
15:42:33 <slaweq> if not, lets move on
15:42:35 <slaweq> #topic fullstack/functional
15:42:56 <slaweq> I found one new (for me at least) issue in functional tests:
15:42:58 <slaweq> neutron.tests.functional.agent.l3.test_keepalived_state_change.TestMonitorDaemon
15:43:03 <slaweq> https://80137ce53930819135d8-42d904af0faa486c8226703976d821a0.ssl.cf2.rackcdn.com/704833/23/check/neutron-functional/17568d5/testr_results.html
15:43:40 <ralonsoh> I'll check that one
15:43:51 <slaweq> thx ralonsoh
15:45:25 <slaweq> #action ralonsoh to check failure in neutron.tests.functional.agent.l3.test_keepalived_state_change.TestMonitorDaemon
15:45:47 <slaweq> I also found one failure, new to me, in fullstack tests:
15:45:49 <slaweq> neutron.tests.fullstack.test_l3_agent.TestHAL3Agent.test_router_fip_qos_after_admin_state_down_up
15:45:53 <slaweq> https://81525168d755db537877-a5e4e29d4d6432c5c7202337ef0214bc.ssl.cf1.rackcdn.com/714731/1/gate/neutron-fullstack/8a9753b/testr_results.html
15:46:22 <slaweq> anyone wants to check that one?
15:46:49 <ralonsoh> I could try if I have time
15:46:55 <slaweq> thx ralonsoh
15:47:14 <slaweq> it's probably not urgent, at least for now, as it happened only once so far
15:47:28 <slaweq> #action ralonsoh to check issue with neutron.tests.fullstack.test_l3_agent.TestHAL3Agent.test_router_fip_qos_after_admin_state_down_up
15:47:49 <slaweq> ok, and that's basically all from me for today
15:48:11 <slaweq> as we already discussed scenario jobs (only the designate issue, which I had there) and rally
15:48:22 <slaweq> anything else You want to discuss today?
15:48:44 <bcafarel> nothing from me
15:50:10 <slaweq> ok, if not, I think I can give You a few minutes back today
15:50:14 <slaweq> thx for attending
15:50:18 <slaweq> and see You all online
15:50:20 <slaweq> o/
15:50:23 <slaweq> #endmeeting