15:01:11 <slaweq> #startmeeting neutron_ci
15:01:12 <openstack> Meeting started Tue Mar 23 15:01:11 2021 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:13 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:16 <openstack> The meeting name has been set to 'neutron_ci'
15:01:18 <slaweq> hi
15:01:19 <ralonsoh> hi
15:01:20 <bcafarel> o/ again
15:01:34 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:01:35 <slaweq> Please open now :)
15:02:41 <slaweq> #topic Actions from previous meetings
15:03:19 <slaweq> ralonsoh to try to check how to limit number of logged lines in FT output
15:03:39 <ralonsoh> one sec..
15:03:59 <ralonsoh> https://review.opendev.org/c/openstack/neutron/+/780926
15:04:11 <ralonsoh> reviews are welcome
15:05:01 <slaweq> sure
15:05:05 <slaweq> but I already +2'd it :/
15:05:19 <ralonsoh> hehehe I know
15:05:38 <slaweq> maybe lajoskatona can check
15:05:55 <bcafarel> :) I know why it looked familiar, I checked it not so long ago
15:06:28 <slaweq> thx ralonsoh for that
15:06:31 <slaweq> next one
15:06:37 <slaweq> jlibosva to check LP https://bugs.launchpad.net/neutron/+bug/1918266
15:06:39 <openstack> Launchpad bug 1918266 in neutron "Functional test test_gateway_chassis_rebalance failing due to "failed to bind logical router"" [High,Confirmed] - Assigned to Jakub Libosvar (libosvar)
15:07:22 <jlibosva> I have spent some time today looking at it, I tried to capture it in the comment on the LP. I still don't get the full picture and we may need to add more debug messages to the test - because the reproduction rate is very, very low
15:07:25 <jlibosva> 1 hit last week
15:07:58 <lajoskatona> Hi
15:08:01 <slaweq> jlibosva: k, thx for checking it
15:08:12 <lajoskatona> sure, I'll check it
15:08:19 <jlibosva> I'll continue working on it and I'll send a patch with the debugs
15:08:28 <slaweq> thx a lot jlibosva
15:08:35 <slaweq> lajoskatona: thank You too :)
15:09:23 <slaweq> ok, next one
15:09:26 <slaweq> ralonsoh to check timeout while spawning metadata proxy in functional tests
15:09:50 <ralonsoh> yeah, I'm on it, one sec
15:10:04 <ralonsoh> https://review.opendev.org/c/openstack/neutron/+/779024
15:10:22 <ralonsoh> in a nutshell, this is kind of a workaround
15:10:46 <ralonsoh> if the initial IP read fails, the router will become backup
15:11:33 <ralonsoh> but at least the daemon does not hang on this process (reading the IP config)
15:11:36 <slaweq> but we still don't know why the original status wasn't read properly, right?
15:11:44 <ralonsoh> yeah...
15:11:47 <slaweq> k
15:11:49 <ralonsoh> no, sorry
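(A rough sketch of the fallback idea described above, with invented names; this is not the code of the linked patch. The point is that if reading the initial IP configuration fails, the daemon reports the safe "backup" state instead of hanging on the read.)

```python
# Hypothetical illustration only -- function name and structure are invented.
def get_initial_router_state(read_ip_config):
    """Read the initial IP configuration; fall back to 'backup' if it fails."""
    try:
        ip_config = read_ip_config()   # may fail or time out in rare cases
    except Exception:
        # Workaround: instead of hanging on the read, report the safe state.
        return "backup"
    return "primary" if ip_config else "backup"
```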
15:13:05 <slaweq> ok, last one
15:13:07 <slaweq> ralonsoh to try to move to os.kill
15:13:25 <ralonsoh> https://review.opendev.org/c/openstack/neutron/+/681671
15:13:34 <ralonsoh> +w now! thanks
15:13:41 <slaweq> thank You!
15:13:57 <slaweq> I hope it will fix that problem with pm.disable()
15:14:02 <ralonsoh> I hope so!
15:14:13 <slaweq> as it's the most common failure in the functional tests AFAICT
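(A minimal sketch of the os.kill approach mentioned above, assuming the change replaces an external kill command with a direct signal sent from Python; the function name is invented and this is not the content of the linked patch.)

```python
import os
import signal

def stop_process(pid, sig=signal.SIGTERM):
    """Send a signal directly with os.kill instead of shelling out to 'kill'."""
    try:
        os.kill(pid, sig)
    except ProcessLookupError:
        # The process already exited; nothing to stop.
        pass
```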
15:15:18 <slaweq> ok, that's all actions from last meeting
15:15:25 <slaweq> thank You for taking care of them
15:15:31 <slaweq> and let's move on to the next topic
15:15:33 <slaweq> #topic Stadium projects
15:15:38 <slaweq> lajoskatona: any updates?
15:16:13 <lajoskatona> not much, I just checked and recently only rocky is problematic, e.g. for odl and bgpvpn
15:17:31 <bcafarel> and we will get a nice round of recent CI status with wallaby branch creation (crossing fingers all pass)
15:17:50 <slaweq> :)
15:17:57 <slaweq> thx for taking care of it
15:17:59 <lajoskatona> +1
15:18:06 <slaweq> if You would need any help, please ping me
15:18:14 <slaweq> next topic
15:18:16 <slaweq> #topic Stable branches
15:18:31 <slaweq> bcafarel: anything going bad with stable branches?
15:18:47 <bcafarel> overall things are good
15:19:17 <bcafarel> functional needs a few more rechecks than usual but maybe some of the in-progress patches will help in stable too :)
15:19:44 <slaweq> bcafarel: do You have examples of failures?
15:19:45 <bcafarel> and tempest-slow-py3 on stein from time to time, I wanted to ask slaweq if the switch to neutron-tempest-slow-py3 is backportable there?
15:19:53 <bcafarel> (I did not check yet, just asking)
15:20:18 <slaweq> I can check if a backport of that would be easily doable
15:20:39 <slaweq> #action slaweq to try to backport neutron-tempest-slow-py3 to stable/stein
15:21:04 <bcafarel> slaweq:  thanks!  I will dig up these functional failures in the meantime (lost my tabs yesterday...)
15:21:21 <slaweq> bcafarel: k
15:21:22 <slaweq> thx
15:21:36 <slaweq> next topic
15:21:38 <slaweq> #topic Grafana
15:21:44 <slaweq> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:22:26 <slaweq> we had issue with neutron-tempest-plugin-scenario-ovn job during the weekend
15:22:33 <slaweq> it was caused by my patch to devstack
15:22:38 <ralonsoh> hehehe
15:22:44 <slaweq> but now it is fixed already in config of our job
15:22:56 <slaweq> proper devstack fix is proposed by ralonsoh also
15:23:15 <ralonsoh> according to amotoki's suggestion
15:23:23 <slaweq> and we have problem with neutron-grenade-dvr-multinode job
15:23:36 <slaweq> which we can discuss in few minutes
15:24:06 <slaweq> apart from that, our periodic fedora job is passing again
15:24:11 <slaweq> thx bcafarel for fix
15:24:32 <bcafarel> nice, so periodic is all green?
15:24:49 <slaweq> it was yesterday, for example
15:24:53 <slaweq> today functional job failed
15:25:03 <slaweq> but it's much better than it was
15:25:55 <slaweq> if there is nothing else regarding grafana, let's go to some specific jobs now
15:26:09 <slaweq> #topic fullstack/functional
15:26:20 <slaweq> I found one qos related issue in functional jobs:
15:26:24 <slaweq> https://23270c5969573311d718-1a2f2b99c35dbfd3f442550661b64ad9.ssl.cf5.rackcdn.com/780916/3/check/neutron-functional-with-uwsgi/6443654/testr_results.html
15:26:30 <slaweq> ralonsoh: did You see it already?
15:26:38 <ralonsoh> no, I'll check it
15:27:17 <slaweq> maybe lajoskatona or rubasov could take a look as You are probably familiar with minimum bandwidth rules
15:27:58 <ralonsoh> actually this test is failing precisely in https://review.opendev.org/c/openstack/neutron/+/770154/1/neutron/tests/functional/agent/common/test_ovs_lib.py
15:27:59 <lajoskatona> slaweq: in functional?
15:28:10 <lajoskatona> we have just fullstack as I remember, I'll check it
15:28:14 <ralonsoh> this is not related to QoS
15:28:57 <slaweq> ralonsoh: isn't it? I saw it's failing in the "qos something test" :)
15:29:11 <ralonsoh> yes but it is failing during the port creation
15:29:21 <lajoskatona> test_update_minimum_bandwidth_queue_no_qos_no_queue?
15:29:24 <ralonsoh> this is an OVS problem/delay/something else
15:29:27 <slaweq> lajoskatona: yes
15:29:42 <slaweq> ralonsoh: ok, so sorry for the mistake :)
15:29:49 <ralonsoh> no, not at all
15:30:45 <slaweq> ok, next one is from fullstack
15:30:51 <slaweq> neutron.tests.fullstack.test_l3_agent.TestHAL3Agent.test_keepalived_multiple_sighups_does_not_forfeit_primary
15:30:52 <slaweq> https://3c7a0b423eb6fce37a1a-4781fed732bcaf2a49f1da3bb2ee8431.ssl.cf5.rackcdn.com/782275/1/gate/neutron-fullstack-with-uwsgi/3b96f9a/testr_results.html
15:31:19 <slaweq> but I think it could be some oom killer again
15:34:44 <slaweq> so I don't think we should bother with that now
15:34:51 <slaweq> let's see if that happens more often
15:35:01 <slaweq> #topic Tempest/Scenario
15:35:11 <slaweq> here I found also one, qos related failure
15:35:18 <slaweq> https://a2a93c3e4994a3d62247-1af30bf7c5ab7139a47557262bacc248.ssl.cf2.rackcdn.com/779310/6/check/neutron-tempest-plugin-scenario-linuxbridge/0550a7c/testr_results.html
15:35:25 <slaweq> this time I think it is qos related ;)
15:36:15 <slaweq> or not
15:36:18 <slaweq> as I see now
15:36:19 <slaweq> 2021-03-22 11:57:53,245 82121 WARNING  [neutron_tempest_plugin.scenario.test_qos] Socket timeout while reading the remote file, bytes read: 430752
15:36:29 <ralonsoh> it is, I thin so
15:36:36 <ralonsoh> think*
15:37:53 <slaweq> ralonsoh: do You think we should open LP for that?
15:38:05 <ralonsoh> I'll check it first
15:38:08 <slaweq> thx
15:38:18 <slaweq> #action ralonsoh to check failed qos scenario test
15:38:35 <slaweq> ok, next topic
15:38:40 <slaweq> #topic Grenade
15:38:58 <slaweq> as I mentioned earlier, we have one serious issue with grenade dvr job
15:39:02 <slaweq> Bug https://launchpad.net/bugs/1920778
15:39:17 <openstack> Launchpad bug 1920778 in neutron ""test_add_remove_fixed_ip" faling in "grenade-dvr-multinode" CI job" [Critical,New] - Assigned to Slawek Kaplonski (slaweq)
15:39:19 <slaweq> so far I proposed https://review.opendev.org/c/openstack/neutron/+/782275/ to temporarily make that job non-voting and non-gating
15:39:26 <slaweq> and I'm investigating that
15:39:47 <slaweq> it seems to me and ralonsoh that it is some issue with metadata, as connectivity is established properly
15:40:13 <slaweq> but we will probably need to ask infra team to put some nodes on hold for us and try to debug there directly
15:40:28 <slaweq> as from logs we don't know exactly what could be wrong there
15:40:50 <slaweq> I was also trying today to find a way in tempest to not clean up resources in case of test failure
15:41:07 <slaweq> but I didn't find any smart way to do that so far
15:41:18 <slaweq> I will ask gmann if that is possible somehow
15:43:21 <slaweq> and that's basically all update about that grenade job so far
15:43:28 <slaweq> #topic rally
15:43:37 <slaweq> I opened new bug today https://bugs.launchpad.net/neutron/+bug/1920923
15:43:38 <openstack> Launchpad bug 1920923 in neutron "Rally test NeutronNetworks.create_and_update_subnets fails" [High,Confirmed]
15:43:48 <slaweq> if there is anyone who wants to check it, that would be great
15:44:20 <ralonsoh> I'll try to debug this one
15:44:27 <slaweq> thx ralonsoh
15:44:56 <slaweq> and that was last thing from me for today
15:45:01 <lajoskatona> I just checked and it seems like everything stopped for a few seconds
15:45:08 <slaweq> do You have anything else You want to discuss today?
15:46:04 <bcafarel> nothing from me
15:46:55 <slaweq> if not, I will give You a few minutes back today
15:46:59 <slaweq> thx for attending the meeting
15:47:02 <slaweq> o/
15:47:05 <ralonsoh> bye
15:47:07 <bcafarel> o/
15:47:08 <slaweq> #endmeeting