15:00:16 #startmeeting neutron_ci
15:00:17 Meeting started Wed Apr 8 15:00:16 2020 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:18 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:20 The meeting name has been set to 'neutron_ci'
15:00:26 hi
15:00:27 hi
15:00:28 o/
15:00:51 \o
15:01:08 Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:01:16 #topic Actions from previous meetings
15:01:23 o/
15:01:24 first one:
15:01:26 slaweq to investigate fullstack SG test broken pipe failures
15:01:36 I did some investigation on that one
15:02:06 I have a theory why it could fail like that, but I need to try to reproduce it locally to confirm it
15:02:15 so I will continue this work next week too
15:02:28 #action slaweq to continue investigation of fullstack SG test broken pipe failures
15:02:41 next one
15:02:43 maciejjozefczyk to take a look and report LP for failures in neutron.tests.functional.plugins.ml2.drivers.ovn.mech_driver.ovsdb.test_ovn_db_sync.TestOvnNbSyncOverTcp.test_ovn_nb_sync_log
15:03:02 I *think* I found what the problem is
15:03:06 #link https://review.opendev.org/#/c/717704/
15:03:37 I proposed to change the transaction timeout for updating OVN rows
15:03:55 after bumping up the timeout I cannot see any failure of that test in the link above.
15:04:17 for a while I changed the zuul config to run those jobs 20 times, and the failure didn't show up in any of them.
15:04:40 according to the logs in the LP report I found the timeout errors there, so I think that should help
15:04:42 based on an old comment there, it's not the first time we have changed this timeout for tests
15:05:08 slaweq, yes, but the funny thing is that by *default* in the config we have a much bigger timeout there
15:05:21 I don't know why it was set to 5 seconds for the functional tests before
15:05:51 ahh, ok
15:06:07 https://docs.openstack.org/networking-ovn/latest/configuration/ml2_conf.html
15:06:14 default is 180 seconds
15:06:43 ok, if that solves the problem, let's try that :)
15:06:45 thx maciejjozefczyk
15:06:56 So actually it's not that surprising that we sometimes got a Timeout after 5 seconds :)
15:07:49 ok, let's move on
15:07:52 next one
15:07:54 ralonsoh to open LP about issue with neutron.tests.functional.agent.linux.test_keepalived.KeepalivedManagerTestCase
15:08:05 patch merged
15:08:11 ++
15:08:13 thx ralonsoh
15:08:36 #link https://review.opendev.org/#/c/716944/
15:09:17 I hope that with this one and maciejjozefczyk's patch the functional jobs will finally be in better shape :)
15:09:29 sure!
15:09:32 ++
15:09:51 ok, next one
15:09:53 slaweq to check server termination on multicast test
15:09:59 I again didn't have time for this
15:10:22 and sadly it just hit us again in maciejjozefczyk's patch mentioned a few minutes ago
15:10:24 :/
15:10:37 I will really dedicate some time this week to finally check that one
15:10:40 #action slaweq to check server termination on multicast test
15:10:54 next one
15:10:56 ralonsoh to check failure in neutron.tests.functional.agent.l3.test_keepalived_state_change.TestMonitorDaemon
15:11:27 which one?
15:11:41 oops, I wasn't aware of this one
15:11:44 sorry
15:12:06 no problem
15:12:12 slaweq, perhaps my patch needs a rebase
15:12:13 do you want to check it this week?
15:12:24 maciejjozefczyk: just recheck :)
15:12:25 slaweq, do you have the links?
15:12:28 of the logs
15:12:57 ralonsoh: https://80137ce53930819135d8-42d904af0faa486c8226703976d821a0.ssl.cf2.rackcdn.com/704833/23/check/neutron-functional/17568d5/testr_results.html
15:13:13 slaweq, thanks!
15:13:26 ralonsoh: is this related: https://bugs.launchpad.net/neutron/+bug/1870313 ?
15:13:27 Launchpad bug 1870313 in neutron ""send_ip_addr_adv_notif" can't use eventlet when called from "keepalived_state_change"" [High,Fix released] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
15:13:34 ahh, but it is the same problem
15:13:39 you linked this bug last week in the meeting's etherpad
15:13:43 yes, I was just writing that
15:13:59 we can remove it from the TODO list
15:14:08 ok, so done :)
15:14:10 great, thx
15:14:36 ok, and the last one from the previous week
15:14:38 ralonsoh to check issue with neutron.tests.fullstack.test_l3_agent.TestHAL3Agent.test_router_fip_qos_after_admin_state_down_up
15:14:46 yes, the namespace drama
15:15:17 #link https://review.opendev.org/#/c/717017/
15:15:44 it's merged, please read the commit message for more info
15:15:47 LOL - "namespace drama" sounds like a good name for a topic
15:15:48 or ping me on IRC
15:15:53 hehehehe
15:15:57 ahh, it's that one
15:15:59 ok
15:16:03 heh
15:16:24 ok, that's all the actions from last week
15:16:30 let's move on
15:16:34 #topic Stadium projects
15:16:44 standardize on zuul v3
15:16:45 Etherpad: https://etherpad.openstack.org/p/neutron-train-zuulv3-py27drop
15:16:52 I just marked networking-bagpipe as done
15:17:01 thx lajoskatona for the work on this
15:17:17 so we only have networking-{midonet,odl} left in the list
15:17:20 not bad :)
15:17:26 getting there
15:17:40 \o/
15:17:54 IPv6-only CI
15:17:55 Etherpad: https://etherpad.openstack.org/p/neutron-stadium-ipv6-testing
15:18:02 still no progress on my side with that one
15:19:07 and there is one more thing related to stadium projects for today
15:19:12 midonet UT failures: https://bugs.launchpad.net/networking-midonet/+bug/1871568
15:19:13 Launchpad bug 1871568 in networking-midonet "python3 unit tests jobs are failing on master branch" [Undecided,New]
15:19:24 basically the midonet gate is broken now
15:19:50 I just cloned it locally to check that one, but if someone wants to take it - feel free :)
15:20:23 Do we have any active midonet contributors left?
15:20:38 yamamoto?
15:20:39 I haven't seen yamamoto active recently
15:20:41 njohnston: yamamoto is the only one I'm aware of
15:20:50 but his activity is very limited :/
15:21:17 I will send him an email about that one - maybe he will be able to help with this
15:22:11 #action slaweq to ping yamamoto about midonet gate problems
15:22:24 ok, anything else related to stadium for today?
15:22:28 I think it would be good from a stadium perspective to do that, just to check the health of the midonet driver; I know they have struggled to keep up over the last few cycles with Xenial->Bionic and py27
15:22:59 njohnston: yes, that's true
15:23:19 maybe we will need to discuss that (again) during the next PTG
15:23:23 and Stackalytics shows no activity for yamamoto since January
15:24:19 ok
15:24:29 I will email him and we will see how it goes
15:25:09 +1
15:25:21 +1
15:25:35 let's move on
15:25:36 #topic Stable branches
15:26:06 as recently we have fewer issues to discuss, e.g.
with scenario jobs, I thought it would be good to add a topic about stable branches to this meeting
15:26:24 so we can all catch up on the current CI state for stable branches
15:26:25 +100
15:26:43 perfect
15:27:14 I have not been updating the stable grafana dashboards, has anyone else?
15:27:47 njohnston: nope
15:27:47 ah, looks like they are good, excellent
15:27:55 http://grafana.openstack.org/d/pM54U-Kiz/neutron-failure-rate-previous-stable-release?orgId=1 <- train
15:28:14 http://grafana.openstack.org/d/dCFVU-Kik/neutron-failure-rate-older-stable-release?orgId=1 <- stein
15:28:48 I'm not sure if all jobs are up to date there
15:28:56 yes, at least the branch name changes are there; I did not check if the jobs are up to date
15:30:04 bcafarel: do you want to check that this week? :)
15:30:09 sure!
15:30:23 thx
15:30:37 from my "what recheck keyword did you type?" memory, most are designate, rally, and the occasional non-neutron test failing
15:30:39 #action bcafarel to check and update stable branches grafana dashboards
15:31:23 one serious issue from today is this one with rally: https://bugs.launchpad.net/neutron/+bug/1871596
15:31:24 Launchpad bug 1871596 in neutron "Rally job on stable branches is failing" [Critical,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
15:31:37 but it seems it's already fixed in rally thanks to lucasgomes :)
15:31:51 +1
15:32:21 yep
15:32:48 bcafarel: looks like haleyb updated the stable dashboards when train was released
15:32:49 great, so other than that it's fine for stable branches now, right?
15:33:09 the fix is merged, and from andreykurilin's comments there are not many rally calls from the playbook (the part currently running from master), so we should be good
15:33:40 https://bugs.launchpad.net/bugs/1871327 also has fixes merged in all branches now
15:33:41 Launchpad bug 1871327 in tempest "stable/stein tempest-full job fails with "tempest requires Python '>=3.6' but the running Python is 2.7.17"" [Undecided,New]
15:34:27 ok
15:34:37 so I think we can move on to the next topic
15:34:38 #topic Grafana
15:34:44 #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:34:49 Average number of rechecks in recent weeks:
15:34:51 week 14 of 2020: 3.13
15:34:53 week 15 of 2020: 3.6
15:35:06 those are my metrics - not so good this week :/
15:35:19 IMO, related to the problems mentioned earlier
15:35:25 ralonsoh: yes
15:35:41 that's for sure - I didn't notice any new problems last week
15:36:03 one thing I want to mention is that the Grenade jobs should be better soon, as https://bugs.launchpad.net/bugs/1844929 has a proposed patch: https://review.opendev.org/717662
15:36:04 Launchpad bug 1844929 in OpenStack Compute (nova) "grenade jobs failing due to "Timed out waiting for response from cell" in scheduler" [High,In progress] - Assigned to melanie witt (melwitt)
15:36:26 melwitt did great debugging on it and finally found what the issue was there
15:36:43 ooooh nice! (I forgot grenade in the frequent recheck causes on stable)
15:38:00 I also proposed patch https://review.opendev.org/718392 with updates for the grafana dashboard
15:39:30 and that's all about grafana from me
15:39:33 anything else to add?
15:39:34 (you need to rebase it)
15:40:19 ralonsoh: I will
15:40:51 ok, if not, let's move on
15:41:00 #topic Tempest/Scenario
15:41:12 I have only one new issue here
15:41:16 TripleO-based jobs are failing, e.g.: https://933286ee423f4ed9028e-1eceb8a6fb7f917522f65bda64a8589f.ssl.cf5.rackcdn.com/717754/2/check/neutron-centos-8-tripleo-standalone/a5f2585/job-output.txt
15:41:38 do you maybe know why that could be? It seems that the neutron rpm build fails in those jobs
15:42:08 sounds like https://review.rdoproject.org/r/26305
15:42:15 slaweq, after we merged the ovn migration tools
15:42:32 yes bcafarel
15:42:34 I thought that, but I wanted to confirm it :)
15:42:41 thx
15:42:45 so it should be good soon
15:43:28 ok, so that's all I have for today
15:43:40 anything else you want to talk about regarding CI?
15:44:00 glad things are looking pretty stable going into the U release
15:45:11 njohnston: yes, IMO it is in better shape recently
15:45:17 we don't have many new issues
15:45:25 nothing else from me
15:45:29 and patches are generally merged pretty fast
15:46:12 ok, if there is nothing else, I will give you almost 15 minutes back :)
15:46:26 thx for attending the meeting and for taking care of our CI
15:46:29 o/
15:46:31 #endmeeting
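
For context on the OVN transaction timeout item above: the 180-second default mentioned in the meeting corresponds to networking-ovn's ovsdb_connection_timeout option in the [ovn] group. The sketch below is a minimal, hypothetical illustration of overriding that option from a test setup with oslo.config; it is not the actual change proposed in https://review.opendev.org/#/c/717704/, and the helper name is made up.

```python
# Hypothetical illustration only -- not the code from https://review.opendev.org/#/c/717704/.
# Assumes the networking-ovn option [ovn]/ovsdb_connection_timeout (180-second default per
# the ml2_conf docs linked in the meeting). The option is registered here only so this
# snippet runs standalone; in the real tree the ML2/OVN driver registers it itself.
from oslo_config import cfg

CONF = cfg.CONF

CONF.register_opts(
    [cfg.IntOpt('ovsdb_connection_timeout', default=180,
                help='Timeout in seconds for the OVSDB connection transaction')],
    group='ovn')


def bump_ovn_txn_timeout(timeout=180):
    """Raise the OVN OVSDB transaction timeout for a test run.

    A very short value (the meeting notes the functional tests used 5
    seconds) makes slow-but-healthy transactions look like failures;
    overriding it toward the 180-second default avoids those spurious
    timeouts.
    """
    CONF.set_override('ovsdb_connection_timeout', timeout, group='ovn')


if __name__ == '__main__':
    bump_ovn_txn_timeout()
    print(CONF.ovn.ovsdb_connection_timeout)  # -> 180
```

In a functional test, an override like this would be applied in the test's setUp(), before the OVSDB connection is established, so the bumped value actually takes effect.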