15:01:11 <slaweq> #startmeeting neutron_ci
15:01:12 <openstack> Meeting started Tue Mar 23 15:01:11 2021 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:13 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:16 <openstack> The meeting name has been set to 'neutron_ci'
15:01:18 <slaweq> hi
15:01:19 <ralonsoh> hi
15:01:20 <bcafarel> o/ again
15:01:34 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:01:35 <slaweq> Please open now :)
15:02:41 <slaweq> #topic Actions from previous meetings
15:03:19 <slaweq> ralonsoh to try to check how to limit number of logged lines in FT output
15:03:39 <ralonsoh> one sec..
15:03:59 <ralonsoh> https://review.opendev.org/c/openstack/neutron/+/780926
15:04:11 <ralonsoh> reviews are welcome
15:05:01 <slaweq> sure
15:05:05 <slaweq> but I already +2 it :/
15:05:19 <ralonsoh> hehehe I know
15:05:38 <slaweq> maybe lajoskatona can check
15:05:55 <bcafarel> :) I know why it looked familiar, I checked it not so long ago
15:06:28 <slaweq> thx ralonsoh for that
15:06:31 <slaweq> next one
15:06:37 <slaweq> jlibosva to check LP https://bugs.launchpad.net/neutron/+bug/1918266
15:06:39 <openstack> Launchpad bug 1918266 in neutron "Functional test test_gateway_chassis_rebalance failing due to "failed to bind logical router"" [High,Confirmed] - Assigned to Jakub Libosvar (libosvar)
15:07:22 <jlibosva> I have spent some time today looking at it, I tried to capture it in the comment on the LP. I still don't get the full picture and we may need to add more debug messages to the test - because the reproducer ratio is very very low
15:07:25 <jlibosva> 1 hit last week
15:07:58 <lajoskatona> Hi
15:08:01 <slaweq> jlibosva: k, thx for checking it
15:08:12 <lajoskatona> sure, I'll check it
15:08:19 <jlibosva> I'll continue working on it and I'll send a patch with the debugs
15:08:28 <slaweq> thx a lot jlibosva
15:08:35 <slaweq> lajoskatona: thank You too :)
15:09:23 <slaweq> ok, next one
15:09:26 <slaweq> ralonsoh to check timeout while spawning metadata proxy in functional tests
15:09:50 <ralonsoh> yeah, I'm on it, one sec
15:10:04 <ralonsoh> https://review.opendev.org/c/openstack/neutron/+/779024
15:10:22 <ralonsoh> in a nutshell, this is kind of a workaround
15:10:46 <ralonsoh> if the initial IP read fails, the router will become backup
15:11:33 <ralonsoh> but at least the daemon does not hang on this process (reading the IP config)
15:11:36 <slaweq> but we still don't know why the original status wasn't read properly, right?
15:11:44 <ralonsoh> yeah...
15:11:47 <slaweq> k
15:11:49 <ralonsoh> no, sorry
15:13:05 <slaweq> ok, last one
15:13:07 <slaweq> ralonsoh to try to move to os.kill
15:13:25 <ralonsoh> https://review.opendev.org/c/openstack/neutron/+/681671
15:13:34 <ralonsoh> +w now! thanks
15:13:41 <slaweq> thank You!
15:13:57 <slaweq> I hope it will fix that problem with pm.disable()
15:14:02 <ralonsoh> I hope so!
15:14:13 <slaweq> as it's the most common failure in the functional tests AFAICT
15:15:18 <slaweq> ok, that's all the actions from the last meeting
15:15:25 <slaweq> thank You for taking care of them
15:15:31 <slaweq> and let's move on to the next topic
15:15:33 <slaweq> #topic Stadium projects
15:15:38 <slaweq> lajoskatona: any updates?
15:16:13 <lajoskatona> not much, I just checked and only rocky is problematic recently for odl and bgpvpn i.e.
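(Editor's note: the "move to os.kill" item discussed above at 15:13 refers to stopping processes spawned in tests with a direct Python call instead of an external 'kill' command. The sketch below only illustrates that general pattern; the function names and signal choice are assumptions for illustration, not the code from review 681671.)

    import os
    import signal


    def disable(pid, sig=signal.SIGTERM):
        """Stop a spawned helper process with os.kill.

        Sending the signal from Python avoids forking an external 'kill'
        command, which is one more process that can fail or hang in tests.
        """
        if pid is None:
            return
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            # The process is already gone; nothing to do.
            pass


    def is_active(pid):
        """Check liveness with signal 0, which delivers no signal."""
        try:
            os.kill(pid, 0)
        except ProcessLookupError:
            return False
        except PermissionError:
            # The process exists but is owned by another user.
            return True
        return True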
15:17:31 <bcafarel> and we will get a nice round of recent CI status with wallaby branch creation (crossing fingers all pass)
15:17:50 <slaweq> :)
15:17:57 <slaweq> thx for taking care of it
15:17:59 <lajoskatona> +1
15:18:06 <slaweq> if You would need any help, please ping me
15:18:14 <slaweq> next topic
15:18:16 <slaweq> #topic Stable branches
15:18:31 <slaweq> bcafarel: anything going bad with stable branches?
15:18:47 <bcafarel> overall things are good
15:19:17 <bcafarel> functional needs a bit more rechecks than usual but maybe some of the in-progress patches will help in stable too :)
15:19:44 <slaweq> bcafarel: do You have examples of failures?
15:19:45 <bcafarel> and tempest-slow-py3 on stein from time to time, I wanted to ask slaweq if the switch to neutron-tempest-slow-py3 is backportable there?
15:19:53 <bcafarel> (I did not check yet, just asking)
15:20:18 <slaweq> I can check if a backport of that would be easily doable
15:20:39 <slaweq> #action slaweq to try to backport neutron-tempest-slow-py3 to stable/stein
15:21:04 <bcafarel> slaweq: thanks! I will dig up these functional failures in the meantime (lost my tabs yesterday...)
15:21:21 <slaweq> bcafarel: k
15:21:22 <slaweq> thx
15:21:36 <slaweq> next topic
15:21:38 <slaweq> #topic Grafana
15:21:44 <slaweq> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:22:26 <slaweq> we had an issue with the neutron-tempest-plugin-scenario-ovn job during the weekend
15:22:33 <slaweq> it was caused by my patch to devstack
15:22:38 <ralonsoh> hehehe
15:22:44 <slaweq> but now it is fixed already in the config of our job
15:22:56 <slaweq> a proper devstack fix is also proposed by ralonsoh
15:23:15 <ralonsoh> according to amotoki's suggestion
15:23:23 <slaweq> and we have a problem with the neutron-grenade-dvr-multinode job
15:23:36 <slaweq> which we can discuss in a few minutes
15:24:06 <slaweq> except that, our periodic fedora job is passing again
15:24:11 <slaweq> thx bcafarel for the fix
15:24:32 <bcafarel> nice, so periodic is all green?
15:24:49 <slaweq> it was e.g. yesterday
15:24:53 <slaweq> today the functional job failed
15:25:03 <slaweq> but it's much better than it was
15:25:55 <slaweq> if there is nothing else regarding grafana, let's go to some specific jobs now
15:26:09 <slaweq> #topic fullstack/functional
15:26:20 <slaweq> I found one qos related issue in functional jobs:
15:26:24 <slaweq> https://23270c5969573311d718-1a2f2b99c35dbfd3f442550661b64ad9.ssl.cf5.rackcdn.com/780916/3/check/neutron-functional-with-uwsgi/6443654/testr_results.html
15:26:30 <slaweq> ralonsoh: did You see it already?
15:26:38 <ralonsoh> no, I'll check it
15:27:17 <slaweq> maybe lajoskatona or rubasov could take a look as You are probably familiar with minimum bandwidth rules
15:27:58 <ralonsoh> actually this test is failing precisely in https://review.opendev.org/c/openstack/neutron/+/770154/1/neutron/tests/functional/agent/common/test_ovs_lib.py
15:27:59 <lajoskatona> slaweq: in functional?
15:28:10 <lajoskatona> we have just fullstack as I remember, I'll check it
15:28:14 <ralonsoh> this is not related to QoS
15:28:57 <slaweq> ralonsoh: isn't it? I saw it's failing in the "qos something test" :)
15:29:11 <ralonsoh> yes, but it is failing during the port creation
15:29:21 <lajoskatona> test_update_minimum_bandwidth_queue_no_qos_no_queue?
15:29:24 <ralonsoh> this is an OVS problem/delay/something else
15:29:27 <slaweq> lajoskatona: yes
15:29:42 <slaweq> ralonsoh: ok, so sorry for the mistake :)
15:29:49 <ralonsoh> no, not at all
15:30:45 <slaweq> ok, next one is from fullstack
15:30:51 <slaweq> neutron.tests.fullstack.test_l3_agent.TestHAL3Agent.test_keepalived_multiple_sighups_does_not_forfeit_primary
15:30:52 <slaweq> https://3c7a0b423eb6fce37a1a-4781fed732bcaf2a49f1da3bb2ee8431.ssl.cf5.rackcdn.com/782275/1/gate/neutron-fullstack-with-uwsgi/3b96f9a/testr_results.html
15:31:19 <slaweq> but I think it could be some oom killer again
15:34:44 <slaweq> so I don't think we should bother with that now
15:34:51 <slaweq> let's see if that happens more often
15:35:01 <slaweq> #topic Tempest/Scenario
15:35:11 <slaweq> here I also found one qos related failure
15:35:18 <slaweq> https://a2a93c3e4994a3d62247-1af30bf7c5ab7139a47557262bacc248.ssl.cf2.rackcdn.com/779310/6/check/neutron-tempest-plugin-scenario-linuxbridge/0550a7c/testr_results.html
15:35:25 <slaweq> this time I think it is qos related ;)
15:36:15 <slaweq> or not
15:36:18 <slaweq> as I see now
15:36:19 <slaweq> 2021-03-22 11:57:53,245 82121 WARNING [neutron_tempest_plugin.scenario.test_qos] Socket timeout while reading the remote file, bytes read: 430752
15:36:29 <ralonsoh> it is, I thin so
15:36:36 <ralonsoh> think*
15:37:53 <slaweq> ralonsoh: do You think we should open an LP for that?
15:38:05 <ralonsoh> I'll check it first
15:38:08 <slaweq> thx
15:38:18 <slaweq> #action ralonsoh to check failed qos scenario test
15:38:35 <slaweq> ok, next topic
15:38:40 <slaweq> #topic Grenade
15:38:58 <slaweq> as I mentioned earlier, we have one serious issue with the grenade dvr job
15:39:02 <slaweq> Bug https://launchpad.net/bugs/1920778
15:39:17 <openstack> Launchpad bug 1920778 in neutron ""test_add_remove_fixed_ip" faling in "grenade-dvr-multinode" CI job" [Critical,New] - Assigned to Slawek Kaplonski (slaweq)
15:39:19 <slaweq> so far I proposed https://review.opendev.org/c/openstack/neutron/+/782275/ to make that job non-voting and non-gating temporarily
15:39:26 <slaweq> and I'm investigating that
15:39:47 <slaweq> it seems to me and ralonsoh that it is some issue with metadata, as connectivity is established properly
15:40:13 <slaweq> but we will probably need to ask the infra team to put some nodes on hold for us and try to debug there directly
15:40:28 <slaweq> as from the logs we don't know exactly what could be wrong there
15:40:50 <slaweq> I was also trying today to find a way in tempest to not clean up resources in case of a test failure
15:41:07 <slaweq> but I didn't find any smart way to do that so far
15:41:18 <slaweq> I will ask gmann if that is possible somehow
15:43:21 <slaweq> and that's basically the whole update about that grenade job so far
15:43:28 <slaweq> #topic rally
15:43:37 <slaweq> I opened a new bug today https://bugs.launchpad.net/neutron/+bug/1920923
15:43:38 <openstack> Launchpad bug 1920923 in neutron "Rally test NeutronNetworks.create_and_update_subnets fails" [High,Confirmed]
15:43:48 <slaweq> if there is anyone who wants to check it, that would be great
15:44:20 <ralonsoh> I'll try to debug this one
15:44:27 <slaweq> thx ralonsoh
15:44:56 <slaweq> and that was the last thing from me for today
15:45:01 <lajoskatona> I just checked and it seems like everything stopped for a few seconds
15:45:08 <slaweq> do You have anything else You want to discuss today?
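(Editor's note: for the grenade-dvr-multinode investigation above (15:39-15:40), where connectivity works but metadata appears broken, one generic check on a held node is to probe the metadata endpoint from inside the guest. This is only an illustrative sketch: 169.254.169.254 is the standard Nova metadata address; everything else here is an assumed example, not part of the actual debugging plan.)

    import urllib.request

    # Standard Nova metadata endpoint reachable from inside a guest VM;
    # in DVR setups the request goes through the neutron metadata proxy.
    METADATA_URL = 'http://169.254.169.254/latest/meta-data/'

    try:
        # A short timeout separates "proxy not answering" from a slow reply.
        with urllib.request.urlopen(METADATA_URL, timeout=10) as resp:
            print(resp.status, resp.read().decode())
    except Exception as exc:  # e.g. socket.timeout or urllib.error.URLError
        print('metadata request failed:', exc)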
15:46:04 <bcafarel> nothing from me
15:46:55 <slaweq> if not, I will give You a few minutes back today
15:46:59 <slaweq> thx for attending the meeting
15:47:02 <slaweq> o/
15:47:05 <ralonsoh> bye
15:47:07 <bcafarel> o/
15:47:08 <slaweq> #endmeeting