15:00:16 <slaweq> #startmeeting neutron_ci
15:00:17 <openstack> Meeting started Wed Apr 8 15:00:16 2020 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:18 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:20 <openstack> The meeting name has been set to 'neutron_ci'
15:00:26 <ralonsoh> hi
15:00:27 <slaweq> hi
15:00:28 <njohnston> o/
15:00:51 <maciejjozefczyk> \o
15:01:08 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:01:16 <slaweq> #topic Actions from previous meetings
15:01:23 <bcafarel> o/
15:01:24 <slaweq> first one:
15:01:26 <slaweq> slaweq to investigate fullstack SG test broken pipe failures
15:01:36 <slaweq> I did some investigation on that one
15:02:06 <slaweq> I have some theory why it could fail like that but I need to try to reproduce it locally to confirm that
15:02:15 <slaweq> so I will continue this work next week too
15:02:28 <slaweq> #action slaweq to continue investigation of fullstack SG test broken pipe failures
15:02:41 <slaweq> next one
15:02:43 <slaweq> maciejjozefczyk to take a look and report LP for failures in neutron.tests.functional.plugins.ml2.drivers.ovn.mech_driver.ovsdb.test_ovn_db_sync.TestOvnNbSyncOverTcp.test_ovn_nb_sync_log
15:03:02 <maciejjozefczyk> I *think* that I found what the problem is
15:03:06 <maciejjozefczyk> #link https://review.opendev.org/#/c/717704/
15:03:37 <maciejjozefczyk> I proposed to change the transaction timeout for updating OVN rows
15:03:55 <maciejjozefczyk> with the timeout bumped up I cannot see any failure of that test in the link above
15:04:17 <maciejjozefczyk> for a while I changed the zuul config to run those jobs 20 times and in all cases the failure wasn't there
15:04:40 <maciejjozefczyk> according to the logs in the LP report I found timeout errors there, so I think that should help
15:04:42 <slaweq> based on an old comment there, it's not the first time we are changing this timeout for tests
15:05:08 <maciejjozefczyk> slaweq, yes, but the funny thing is that by *default* in the config we have a much bigger timeout there
15:05:21 <maciejjozefczyk> I don't know why it was set to 5 seconds for functional tests before
15:05:51 <slaweq> ahh, ok
15:06:07 <maciejjozefczyk> https://docs.openstack.org/networking-ovn/latest/configuration/ml2_conf.html
15:06:14 <maciejjozefczyk> default is 180 seconds
15:06:43 <slaweq> ok, if that solves the problem, let's try that :)
15:06:45 <slaweq> thx maciejjozefczyk
15:06:56 <maciejjozefczyk> So actually it wasn't that bad that sometimes we got a timeout after 5 seconds :)
15:07:49 <slaweq> ok, let's move on
15:07:52 <slaweq> next one
15:07:54 <slaweq> ralonsoh to open LP about issue with neutron.tests.functional.agent.linux.test_keepalived.KeepalivedManagerTestCase
15:08:05 <ralonsoh> patch merged
15:08:11 <slaweq> ++
15:08:13 <slaweq> thx ralonsoh
15:08:36 <ralonsoh> #link https://review.opendev.org/#/c/716944/
15:09:17 <slaweq> I hope that with this one and maciejjozefczyk's patch the functional jobs will finally be in better shape :)
15:09:29 <ralonsoh> sure!
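[Editor's note: for context on the timeout change discussed above, here is a minimal, hypothetical sketch - not the actual content of review 717704 - of how a functional test could raise the OVN OVSDB transaction timeout from the 5 seconds previously used to the documented default of 180 seconds. The option name ovsdb_connection_timeout and the [ovn] group come from the ml2_conf reference linked above; the test class is illustrative and assumes those options are already registered by the functional test framework.]

    # Hypothetical sketch only - not the code from review 717704.
    # Raises the OVN OVSDB transaction timeout to the documented default
    # (180 seconds) so slow CI nodes do not hit "Timeout after 5 seconds".
    from oslo_config import cfg

    from neutron.tests import base


    class OvnNbSyncTimeoutExampleTestCase(base.BaseTestCase):
        def setUp(self):
            super().setUp()
            # Assumes the [ovn] option group is already registered by the
            # functional test setup; otherwise set_override() would raise
            # NoSuchOptError.
            cfg.CONF.set_override('ovsdb_connection_timeout', 180,
                                  group='ovn')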
15:09:32 <maciejjozefczyk> ++
15:09:51 <slaweq> ok, next one
15:09:53 <slaweq> slaweq to check server termination on multicast test
15:09:59 <slaweq> I again didn't have time for this
15:10:22 <slaweq> and sadly it just hit us again in maciejjozefczyk's patch mentioned a few minutes ago
15:10:24 <slaweq> :/
15:10:37 <slaweq> I will really dedicate some time this week to finally check that one
15:10:40 <slaweq> #action slaweq to check server termination on multicast test
15:10:54 <slaweq> next one
15:10:56 <slaweq> ralonsoh to check failure in neutron.tests.functional.agent.l3.test_keepalived_state_change.TestMonitorDaemon
15:11:27 <ralonsoh> which one?
15:11:41 <ralonsoh> ups, I wasn't aware of this one
15:11:44 <ralonsoh> sorry
15:12:06 <slaweq> no problem
15:12:12 <maciejjozefczyk> slaweq, perhaps my patch needs a rebase
15:12:13 <slaweq> do You want to check it this week?
15:12:24 <slaweq> maciejjozefczyk: just recheck :)
15:12:25 <ralonsoh> slaweq, do you have the links?
15:12:28 <ralonsoh> of the logs
15:12:57 <slaweq> ralonsoh: https://80137ce53930819135d8-42d904af0faa486c8226703976d821a0.ssl.cf2.rackcdn.com/704833/23/check/neutron-functional/17568d5/testr_results.html
15:13:13 <ralonsoh> slaweq, thanks!
15:13:26 <slaweq> ralonsoh: is this related to https://bugs.launchpad.net/neutron/+bug/1870313 ?
15:13:27 <openstack> Launchpad bug 1870313 in neutron ""send_ip_addr_adv_notif" can't use eventlet when called from "keepalived_state_change"" [High,Fix released] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
15:13:34 <ralonsoh> ahh, it is the same problem
15:13:39 <slaweq> You linked this bug last week in the meeting's etherpad
15:13:43 <ralonsoh> yes, I was writing this
15:13:59 <ralonsoh> we can remove it from the TODO list
15:14:08 <slaweq> ok, so done :)
15:14:10 <slaweq> great, thx
15:14:36 <slaweq> ok, and the last one from the previous week
15:14:38 <slaweq> ralonsoh to check issue with neutron.tests.fullstack.test_l3_agent.TestHAL3Agent.test_router_fip_qos_after_admin_state_down_up
15:14:46 <ralonsoh> yes, the namespace drama
15:15:17 <ralonsoh> #link https://review.opendev.org/#/c/717017/
15:15:44 <ralonsoh> it's merged, please review the commit message for more info
15:15:47 <slaweq> LOL - "namespace drama" sounds like a good name for a topic
15:15:48 <ralonsoh> or ping me on IRC
15:15:53 <ralonsoh> hehehehe
15:15:57 <slaweq> ahh, it's that one
15:15:59 <slaweq> ok
15:16:03 <maciejjozefczyk> heh
15:16:24 <slaweq> ok, that's all the actions from last week
15:16:30 <slaweq> let's move on
15:16:34 <slaweq> #topic Stadium projects
15:16:44 <slaweq> standardize on zuul v3
15:16:45 <slaweq> Etherpad: https://etherpad.openstack.org/p/neutron-train-zuulv3-py27drop
15:16:52 <slaweq> I just marked networking-bagpipe as done
15:17:01 <slaweq> thx lajoskatona for the work on this
15:17:17 <slaweq> so we only have networking-{midonet,odl} left in the list
15:17:20 <slaweq> not bad :)
15:17:26 <bcafarel> getting there
15:17:40 <njohnston> \o/
15:17:54 <slaweq> IPv6-only CI
15:17:55 <slaweq> Etherpad https://etherpad.openstack.org/p/neutron-stadium-ipv6-testing
15:18:02 <slaweq> still no progress on my side with that one
15:19:07 <slaweq> and there is one more thing related to stadium projects for today
15:19:12 <slaweq> midonet UT failures: https://bugs.launchpad.net/networking-midonet/+bug/1871568
15:19:13 <openstack> Launchpad bug 1871568 in networking-midonet "python3 unit tests jobs are failing on master branch" [Undecided,New]
15:19:24 <slaweq> basically the midonet gate is broken now
15:19:50 <slaweq> I just cloned it locally to check that one, but if someone wants to take it - feel free :)
15:20:23 <njohnston> Do we have any active midonet contributors left?
15:20:38 <ralonsoh> yamamoto?
15:20:39 <njohnston> I haven't seen yamamoto active recently
15:20:41 <slaweq> njohnston: yamamoto is the only one I'm aware of
15:20:50 <slaweq> but his activity is very limited :/
15:21:17 <slaweq> I will send him an email about that one - maybe he will be able to help with this
15:22:11 <slaweq> #action slaweq to ping yamamoto about midonet gate problems
15:22:24 <slaweq> ok, anything else related to stadium for today?
15:22:28 <njohnston> I think it would be good from a stadium perspective to do that just to check the health of the midonet driver, I know they have struggled to keep up the last few cycles with Xenial->Bionic and py27
15:22:59 <slaweq> njohnston: yes, that's true
15:23:19 <slaweq> maybe we will need to discuss that (again) during the next PTG
15:23:23 <njohnston> and Stackalytics shows no activity for yamamoto since January
15:24:19 <slaweq> ok
15:24:29 <slaweq> I will email him and we will see how it goes
15:25:09 <njohnston> +1
15:25:21 <ralonsoh> +1
15:25:35 <slaweq> let's move on
15:25:36 <slaweq> #topic Stable branches
15:26:06 <slaweq> as we recently have fewer issues to discuss e.g. with scenario jobs, I thought it would be good to add a topic about stable branches to this meeting
15:26:24 <slaweq> so we can all catch up on the current CI state for stable branches
15:26:25 <bcafarel> +100
15:26:43 <ralonsoh> perfect
15:27:14 <njohnston> I have not been updating the stable grafana dashboards, has anyone else?
15:27:47 <slaweq> njohnston: nope
15:27:47 <njohnston> ah, looks like they are good, excellent
15:27:55 <njohnston> http://grafana.openstack.org/d/pM54U-Kiz/neutron-failure-rate-previous-stable-release?orgId=1 <- train
15:28:14 <njohnston> http://grafana.openstack.org/d/dCFVU-Kik/neutron-failure-rate-older-stable-release?orgId=1 <- stein
15:28:48 <slaweq> I'm not sure if all jobs are up to date there
15:28:56 <bcafarel> yes, at least the branch name changes are there, I did not check if the jobs are up to date
15:30:04 <slaweq> bcafarel: do You want to check that this week? :)
15:30:09 <bcafarel> sure!
15:30:23 <slaweq> thx
15:30:37 <bcafarel> from my "what recheck keyword did you type?" memory, most are designate, rally and the occasional non-neutron test failing
15:30:39 <slaweq> #action bcafarel to check and update stable branches grafana dashboards
15:31:23 <slaweq> one serious issue from today is this one with rally: https://bugs.launchpad.net/neutron/+bug/1871596
15:31:24 <openstack> Launchpad bug 1871596 in neutron "Rally job on stable branches is failing" [Critical,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
15:31:37 <slaweq> but it seems it's already fixed in rally thanks to lucasgomes :)
15:31:51 <ralonsoh> +1
15:32:21 <bcafarel> yep
15:32:48 <njohnston> bcafarel: looks like haleyb updated the stable dashboards when train was released
15:32:49 <slaweq> great, so other than that it's fine for stable branches now, right?
15:33:09 <bcafarel> fix is merged and from andreykurilin's comments there are not many rally calls in the playbook (the part currently running from master), so we should be good
15:33:40 <bcafarel> https://bugs.launchpad.net/bugs/1871327 also has fixes merged in all branches now
15:33:41 <openstack> Launchpad bug 1871327 in tempest "stable/stein tempest-full job fails with "tempest requires Python '>=3.6' but the running Python is 2.7.17"" [Undecided,New]
15:34:27 <slaweq> ok
15:34:37 <slaweq> so I think we can move on to the next topic
15:34:38 <slaweq> #topic Grafana
15:34:44 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:34:49 <slaweq> Average number of rechecks in last weeks:
15:34:51 <slaweq> week 14 of 2020: 3.13
15:34:53 <slaweq> week 15 of 2020: 3.6
15:35:06 <slaweq> those are my metrics - not so good this week :/
15:35:19 <ralonsoh> IMO, related to the mentioned problems
15:35:25 <slaweq> ralonsoh: yes
15:35:41 <slaweq> that's for sure - I didn't notice any new problems last week
15:36:03 <slaweq> one thing which I want to mention is that Grenade jobs should be better soon, as https://bugs.launchpad.net/bugs/1844929 has a proposed patch https://review.opendev.org/717662
15:36:04 <openstack> Launchpad bug 1844929 in OpenStack Compute (nova) "grenade jobs failing due to "Timed out waiting for response from cell" in scheduler" [High,In progress] - Assigned to melanie witt (melwitt)
15:36:26 <slaweq> melwitt did great debugging on it and finally found what the issue was there
15:36:43 <bcafarel> ooooh nice! (I forgot grenade in the frequent recheck causes on stable)
15:38:00 <slaweq> I also proposed patch https://review.opendev.org/718392 with updates for the grafana dashboard
15:39:30 <slaweq> and that's all about grafana from me
15:39:33 <slaweq> anything else to add?
15:39:34 <ralonsoh> (you need to rebase it)
15:40:19 <slaweq> ralonsoh: I will
15:40:51 <slaweq> ok, if not, let's move on
15:41:00 <slaweq> #topic Tempest/Scenario
15:41:12 <slaweq> I have only one new issue here
15:41:16 <slaweq> TripleO-based jobs are failing, like e.g.: https://933286ee423f4ed9028e-1eceb8a6fb7f917522f65bda64a8589f.ssl.cf5.rackcdn.com/717754/2/check/neutron-centos-8-tripleo-standalone/a5f2585/job-output.txt
15:41:38 <slaweq> do You maybe know why that is? It seems that the neutron rpm build fails in those jobs
15:42:08 <bcafarel> sounds like https://review.rdoproject.org/r/26305
15:42:15 <maciejjozefczyk> slaweq, it started after we merged the ovn migration tools
15:42:32 <maciejjozefczyk> yes bcafarel
15:42:34 <slaweq> I thought that but I wanted to confirm it :)
15:42:41 <slaweq> thx
15:42:45 <slaweq> so it should be good soon
15:43:28 <slaweq> ok, so that's all I have for today
15:43:40 <slaweq> anything else You want to talk about regarding CI?
15:44:00 <njohnston> glad things are looking pretty stable going into the U release
15:45:11 <slaweq> njohnston: yes, IMO it is in better shape recently
15:45:17 <slaweq> we don't have many new issues
15:45:25 <njohnston> nothing else from me
15:45:29 <slaweq> and patches are generally merged (usually) pretty fast
15:46:12 <slaweq> ok, if there is nothing else, I will give You almost 15 minutes back :)
15:46:26 <slaweq> thx for attending the meeting and for taking care of our CI
15:46:29 <slaweq> o/
15:46:31 <slaweq> #endmeeting
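[Editor's note: the "average number of rechecks" figures quoted under the Grafana topic come from slaweq's own tracking. The sketch below is only a rough, hypothetical approximation of such a metric, not his actual script: it counts comments containing "recheck" on recently merged neutron patches via the Gerrit REST API. The query string, the one-week window, and the crude text match are assumptions, and pagination is ignored for brevity.]

    # Hypothetical approximation of an "average rechecks per merged patch"
    # metric - not the actual script behind the numbers quoted above.
    import json
    import urllib.parse
    import urllib.request

    GERRIT = "https://review.opendev.org"
    QUERY = "project:openstack/neutron status:merged -age:1week"

    url = f"{GERRIT}/changes/?q={urllib.parse.quote(QUERY)}&o=MESSAGES"
    with urllib.request.urlopen(url) as resp:
        # Gerrit prefixes its JSON responses with ")]}'" to prevent XSSI.
        body = resp.read().decode("utf-8").lstrip(")]}'\n")
    changes = json.loads(body)

    # Crude match: count review comments mentioning "recheck" per change.
    rechecks = [
        sum("recheck" in m.get("message", "").lower()
            for m in change.get("messages", []))
        for change in changes
    ]
    if changes:
        print(f"average rechecks per merged patch: "
              f"{sum(rechecks) / len(changes):.2f}")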