15:01:11 #startmeeting neutron_ci
15:01:12 Meeting started Tue Mar 23 15:01:11 2021 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:13 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:16 The meeting name has been set to 'neutron_ci'
15:01:18 hi
15:01:19 hi
15:01:20 o/ again
15:01:34 Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:01:35 Please open it now :)
15:02:41 #topic Actions from previous meetings
15:03:19 ralonsoh to try to check how to limit number of logged lines in FT output
15:03:39 one sec..
15:03:59 https://review.opendev.org/c/openstack/neutron/+/780926
15:04:11 reviews are welcome
15:05:01 sure
15:05:05 but I already +2'd it :/
15:05:19 hehehe I know
15:05:38 maybe lajoskatona can check
15:05:55 :) I know why it looked familiar, I checked it not so long ago
15:06:28 thx ralonsoh for that
15:06:31 next one
15:06:37 jlibosva to check LP https://bugs.launchpad.net/neutron/+bug/1918266
15:06:39 Launchpad bug 1918266 in neutron "Functional test test_gateway_chassis_rebalance failing due to "failed to bind logical router"" [High,Confirmed] - Assigned to Jakub Libosvar (libosvar)
15:07:22 I have spent some time today looking at it, and I tried to capture it in a comment on the LP. I still don't get the full picture and we may need to add more debug messages to the test - because the reproducer ratio is very, very low
15:07:25 1 hit last week
15:07:58 Hi
15:08:01 jlibosva: k, thx for checking it
15:08:12 sure, I'll check it
15:08:19 I'll continue working on it and I'll send a patch with the debug messages
15:08:28 thx a lot jlibosva
15:08:35 lajoskatona: thank You too :)
15:09:23 ok, next one
15:09:26 ralonsoh to check timeout while spawning metadata proxy in functional tests
15:09:50 yeah, I'm on it, one sec
15:10:04 https://review.opendev.org/c/openstack/neutron/+/779024
15:10:22 in a nutshell, this is kind of a workaround
15:10:46 if the initial IP read fails, the router will become backup
15:11:33 but at least the daemon does not hang on this process (reading the IP config)
15:11:36 but we still don't know why the original status wasn't read properly, right?
15:11:44 yeah...
15:11:47 k
15:11:49 no, sorry
15:13:05 ok, last one
15:13:07 ralonsoh to try to move to os.kill
15:13:25 https://review.opendev.org/c/openstack/neutron/+/681671
15:13:34 +w now! thanks
15:13:41 thank You!
15:13:57 I hope it will fix that problem with pm.disable()
15:14:02 I hope so!
15:14:13 as it's the most common failure in the functional tests AFAICT
15:15:18 ok, that's all the actions from the last meeting
15:15:25 thank You for taking care of them
15:15:31 and let's move on to the next topic
15:15:33 #topic Stadium projects
15:15:38 lajoskatona: any updates?
15:16:13 not much, I just checked and recently only rocky is problematic, e.g. for odl and bgpvpn
15:17:31 and we will get a nice round of recent CI status with the wallaby branch creation (crossing fingers all pass)
15:17:50 :)
15:17:57 thx for taking care of it
15:17:59 +1
15:18:06 if You need any help, please ping me
15:18:14 next topic
15:18:16 #topic Stable branches
15:18:31 bcafarel: anything going badly with the stable branches?
15:18:47 overall things are good
15:19:17 functional needs a few more rechecks than usual, but maybe some of the in-progress patches will help in stable too :)
15:19:44 bcafarel: do You have examples of failures?
15:19:45 and tempest-slow-py3 on stein fails from time to time, I wanted to ask slaweq if the switch to neutron-tempest-slow-py3 is backportable there?
15:19:53 (I did not check yet, just asking)
15:20:18 I can check if a backport of that would be easily doable
15:20:39 #action slaweq to try to backport neutron-tempest-slow-py3 to stable/stein
15:21:04 slaweq: thanks! I will dig up these functional failures in the meantime (lost my tabs yesterday...)
15:21:21 bcafarel: k
15:21:22 thx
15:21:36 next topic
15:21:38 #topic Grafana
15:21:44 http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:22:26 we had an issue with the neutron-tempest-plugin-scenario-ovn job during the weekend
15:22:33 it was caused by my patch to devstack
15:22:38 hehehe
15:22:44 but it is already fixed now in the config of our job
15:22:56 a proper devstack fix has also been proposed by ralonsoh
15:23:15 following amotoki's suggestion
15:23:23 and we have a problem with the neutron-grenade-dvr-multinode job
15:23:36 which we can discuss in a few minutes
15:24:06 other than that, our periodic fedora job is passing again
15:24:11 thx bcafarel for the fix
15:24:32 nice, so periodic is all green?
15:24:49 it was, e.g. yesterday
15:24:53 today the functional job failed
15:25:03 but it's much better than it was
15:25:55 if there is nothing else regarding grafana, let's go to some specific jobs now
15:26:09 #topic fullstack/functional
15:26:20 I found one qos-related issue in the functional jobs:
15:26:24 https://23270c5969573311d718-1a2f2b99c35dbfd3f442550661b64ad9.ssl.cf5.rackcdn.com/780916/3/check/neutron-functional-with-uwsgi/6443654/testr_results.html
15:26:30 ralonsoh: did You see it already?
15:26:38 no, I'll check it
15:27:17 maybe lajoskatona or rubasov could take a look, as You are probably familiar with minimum bandwidth rules
15:27:58 actually this test is failing precisely in https://review.opendev.org/c/openstack/neutron/+/770154/1/neutron/tests/functional/agent/common/test_ovs_lib.py
15:27:59 slaweq: in functional?
15:28:10 we have just fullstack as I remember, I'll check it
15:28:14 this is not related to QoS
15:28:57 ralonsoh: isn't it? I saw it's failing in the "qos something test" :)
15:29:11 yes, but it is failing during the port creation
15:29:21 test_update_minimum_bandwidth_queue_no_qos_no_queue?
15:29:24 this is an OVS problem/delay/something else
15:29:27 lajoskatona: yes
15:29:42 ralonsoh: ok, sorry for the mistake :)
15:29:49 no, not at all
15:30:45 ok, the next one is from fullstack
15:30:51 neutron.tests.fullstack.test_l3_agent.TestHAL3Agent.test_keepalived_multiple_sighups_does_not_forfeit_primary
15:30:52 https://3c7a0b423eb6fce37a1a-4781fed732bcaf2a49f1da3bb2ee8431.ssl.cf5.rackcdn.com/782275/1/gate/neutron-fullstack-with-uwsgi/3b96f9a/testr_results.html
15:31:19 but I think it could be the OOM killer again
15:34:44 so I don't think we should bother with that now
15:34:51 let's see if it happens more often
15:35:01 #topic Tempest/Scenario
15:35:11 here I also found one qos-related failure
15:35:18 https://a2a93c3e4994a3d62247-1af30bf7c5ab7139a47557262bacc248.ssl.cf2.rackcdn.com/779310/6/check/neutron-tempest-plugin-scenario-linuxbridge/0550a7c/testr_results.html
15:35:25 this time I think it is qos-related ;)
15:36:15 or not
15:36:18 as I see now
15:36:19 2021-03-22 11:57:53,245 82121 WARNING [neutron_tempest_plugin.scenario.test_qos] Socket timeout while reading the remote file, bytes read: 430752
15:36:29 it is, I think so
15:37:53 ralonsoh: do You think we should open an LP for that?
15:38:05 I'll check it first
15:38:08 thx
15:38:18 #action ralonsoh to check failed qos scenario test
15:38:35 ok, next topic
15:38:40 #topic Grenade
15:38:58 as I mentioned earlier, we have one serious issue with the grenade dvr job
15:39:02 Bug https://launchpad.net/bugs/1920778
15:39:17 Launchpad bug 1920778 in neutron ""test_add_remove_fixed_ip" faling in "grenade-dvr-multinode" CI job" [Critical,New] - Assigned to Slawek Kaplonski (slaweq)
15:39:19 so far I proposed https://review.opendev.org/c/openstack/neutron/+/782275/ to temporarily make that job non-voting and non-gating
15:39:26 and I'm investigating that
15:39:47 it seems to me and ralonsoh that it is some issue with metadata, as connectivity is established properly
15:40:13 but we will probably need to ask the infra team to put some nodes on hold for us and try to debug directly there
15:40:28 as from the logs we don't know exactly what could be wrong there
15:40:50 I was also trying today to find a way in tempest to not clean up resources in case of a test failure
15:41:07 but I haven't found any smart way to do that so far
15:41:18 I will ask gmann if that is possible somehow
15:43:21 and that's basically the whole update about that grenade job so far
15:43:28 #topic rally
15:43:37 I opened a new bug today: https://bugs.launchpad.net/neutron/+bug/1920923
15:43:38 Launchpad bug 1920923 in neutron "Rally test NeutronNetworks.create_and_update_subnets fails" [High,Confirmed]
15:43:48 if there is anyone who wants to check it, that would be great
15:44:20 I'll try to debug this one
15:44:27 thx ralonsoh
15:44:56 and that was the last thing from me for today
15:45:01 I just checked and it seems like everything stopped for a few seconds
15:45:08 do You have anything else You want to discuss today?
15:46:04 nothing from me
15:46:55 if not, I will give You a few minutes back today
15:46:59 thx for attending the meeting
15:47:02 o/
15:47:05 bye
15:47:07 o/
15:47:08 #endmeeting
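
Editor's note (not part of the meeting log): the os.kill work discussed under "Actions from previous meetings" is in https://review.opendev.org/c/openstack/neutron/+/681671 and is not reproduced here. The sketch below only illustrates the general idea of signalling a daemon directly with os.kill using a PID read from a pid file, instead of going through an external command; the function name, pid-file handling and fallback behaviour are hypothetical and are not Neutron's actual implementation.

    import os
    import signal


    def stop_daemon(pid_file, sig=signal.SIGTERM):
        """Signal a daemon directly instead of shelling out to kill it.

        Illustrative sketch only: a real implementation also has to handle
        stale pid files, insufficient permissions and processes that ignore
        SIGTERM (typically by escalating to SIGKILL after a timeout).
        """
        try:
            with open(pid_file) as f:
                pid = int(f.read().strip())
        except (OSError, ValueError):
            return False  # no pid file or unparsable content: nothing to stop
        try:
            os.kill(pid, sig)  # raises ProcessLookupError if already gone
        except ProcessLookupError:
            return False
        return True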