15:01:21 <slaweq> #startmeeting neutron_ci
15:01:22 <openstack> Meeting started Tue Apr 27 15:01:21 2021 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:23 <slaweq> hi
15:01:24 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:26 <openstack> The meeting name has been set to 'neutron_ci'
15:01:28 <ralonsoh> hi
15:01:32 <lajoskatona> Hi
15:01:38 <bcafarel> o/
15:01:51 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:02:39 <slaweq> and now we can start
15:02:56 <slaweq> #topic Actions from previous meetings
15:03:06 <slaweq> first one
15:03:08 <slaweq> slaweq to update wallaby's scenario jobs in neutron-tempest-plugin
15:03:21 <slaweq> I did, all patches are merged but I don't have links now
15:03:32 <slaweq> next one
15:03:33 <slaweq> bcafarel to report stable/rocky ci failures on LP
15:05:13 <bcafarel> https://bugs.launchpad.net/neutron/+bug/1924315 and our fearless PTL close to fixing it (when CI is happy)
15:05:13 <openstack> Launchpad bug 1924315 in neutron "[stable/rocky] neutron-tempest-plugin-scenario-openvswitch-iptables_hybrid-rocky job fails" [Critical,In progress] - Assigned to Slawek Kaplonski (slaweq)
15:05:40 <slaweq> "fearless PTL" :D
15:05:46 <bcafarel> although I was looking into https://bugs.launchpad.net/neutron/+bug/1925451 - grenade seems to fail about 50% of the time with that DistutilsError
15:05:46 <openstack> Launchpad bug 1925451 in neutron "[stable/rocky] grenade job is broken" [Critical,New]
15:05:47 <slaweq> you made my day now
15:06:07 <bcafarel> :)
15:06:23 <slaweq> patch https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/786657 should fix that original issue
15:07:05 <slaweq> regarding the grenade one, did You check if it is e.g. failing only on some of the cloud providers?
15:08:13 <bcafarel> no, good point I will check that
15:08:19 <slaweq> thx
15:08:33 <slaweq> so let's continue this discussion later/tomorrow
15:09:03 <slaweq> we need to fix it finally and unblock rocky's gate
15:09:12 <bcafarel> +1
15:09:25 <slaweq> ralonsoh: please check https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/786657 :)
15:09:31 <slaweq> this is also needed for rocky gate
15:09:32 <ralonsoh> done
15:09:40 <slaweq> thx
15:10:02 <slaweq> ok, next one
15:10:04 <slaweq> ralonsoh to mark test_keepalived_spawns_conflicting_pid_vrrp_subprocess functional test as unstable
15:10:27 <ralonsoh> no progress last week, but related to the kill signal
15:10:34 <ralonsoh> no progress, sorry
15:10:48 <slaweq> I will set it for You for this week, ok?
15:10:51 <ralonsoh> sure
15:10:55 <slaweq> #action ralonsoh to mark test_keepalived_spawns_conflicting_pid_vrrp_subprocess functional test as unstable
15:10:56 <slaweq> thx
15:11:04 <slaweq> next one
15:11:05 <slaweq> slaweq to report LP with metadata issue in scenario jobs
15:11:11 <slaweq> Bug reported: https://bugs.launchpad.net/neutron/+bug/1923633
15:11:11 <openstack> Launchpad bug 1923633 in neutron "Neutron-tempest-plugin scenario jobs failing due to metadata issues" [Critical,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
15:11:25 <slaweq> this is currently IMO the bug hurting us most in CI
15:11:42 <slaweq> we investigated that with ralonsoh last week
15:11:46 <ralonsoh> https://review.opendev.org/c/openstack/neutron/+/787777
15:11:51 <slaweq> and we think we know what is going on there
15:11:55 <ralonsoh> doesn't help too much
15:12:08 <slaweq> still same issues?
15:12:19 <ralonsoh> yes but less frequent
15:12:28 <slaweq> :/
15:12:37 <slaweq> so maybe we don't know exactly what the problem is there
15:12:44 <ralonsoh> the socket receiving the messages is not responsive
15:13:03 <slaweq> for very long time, or forever?
15:13:09 <ralonsoh> long time
15:13:18 <ralonsoh> another option could be not to mock socket module
15:13:23 <ralonsoh> in the L3 agent
15:13:24 <slaweq> I wonder what has changed recently there
15:13:30 <ralonsoh> I'll try it in this patch
15:13:41 <slaweq> as it wasn't that bad few weeks back
15:13:45 <ralonsoh> s/mock/monkey_patch
15:14:26 <slaweq> ok, let's try that
15:14:31 <ralonsoh> ok
15:14:42 <slaweq> I also made patch https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/787324 which should mitigate the issue at least a bit
15:15:04 <slaweq> it's merged now, so hopefully jobs will be a bit more stable now
15:15:34 <slaweq> and also, if You see the "router wasn't become active on any L3 agent" error in a job, then it means for sure that You hit the same bug
15:15:49 <slaweq> so it will be easier to identify that specific issue now
15:15:53 <ralonsoh> right
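[editor's note] The check described above can be sketched as a simple poll-until-active helper that fails with a greppable message — a hypothetical illustration, not the actual neutron-tempest-plugin code:

```python
import time

def wait_until_router_active(get_status, timeout=60, interval=5):
    """Poll get_status() until it reports ACTIVE, else raise with a
    message that is easy to spot in job logs (names are illustrative)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_status() == "ACTIVE":
            return
        time.sleep(interval)
    raise RuntimeError("router didn't become active on any L3 agent")

# Usage with a stub status function that becomes ACTIVE on the second poll:
statuses = iter(["DOWN", "ACTIVE"])
wait_until_router_active(lambda: next(statuses), timeout=5, interval=0)
print("router active")
```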
15:16:44 <slaweq> ok, next topic
15:16:46 <slaweq> #topic Stadium projects
15:16:51 <slaweq> any updates?
15:17:04 <lajoskatona> no specific thing
15:17:26 <lajoskatona> CI is working (at least where I see new patches :-)
15:17:39 <bcafarel> that section will probably heat up with OVN switch
15:17:40 <slaweq> ok, that's good news :)
15:17:42 <slaweq> thx
15:17:50 <slaweq> bcafarel: true
15:18:02 <slaweq> maybe we should start changing jobs definitions where it's needed?
15:18:23 <slaweq> any volunteer to do that?
15:18:27 <lajoskatona> yeah perhaps, to make it happen in parallel
15:18:44 <lajoskatona> I can check
15:18:49 <slaweq> thx lajoskatona
15:19:14 <slaweq> #action lajoskatona to check stadium jobs and what needs to be switched to OVS explicitly
15:19:26 <slaweq> ok, next topic
15:19:28 <slaweq> #topic Stable branches
15:19:34 <slaweq> anything to discuss?
15:19:38 <slaweq> except rocky
15:20:34 <bcafarel> there is still https://bugs.launchpad.net/neutron/+bug/1923412 for stein, I hope to finally take a look this week
15:20:34 <openstack> Launchpad bug 1923412 in neutron "[stable/stein] Tempest fails with unrecognized arguments: --exclude-regex" [Critical,Triaged]
15:21:00 <slaweq> bcafarel: ouch, I missed that one
15:21:08 <slaweq> it's the same issue like for rocky
15:21:14 <slaweq> or very similar
15:21:17 <tosky> bcafarel: oh, there is a devstack change which may solve that (but you can still fix it by refactoring the jobs)
15:21:34 <tosky> namely https://review.opendev.org/c/openstack/tempest/+/787455
15:21:40 <tosky> tempest, not devstack
15:22:01 <bcafarel> tosky: oh nice! I will test it as depends-on on one of our stein pending backports
15:22:20 <slaweq> nice, thx tosky
15:23:01 <tosky> or you can do what I did for cinder-tempest-plugin
15:23:14 <tosky> https://review.opendev.org/c/openstack/cinder-tempest-plugin/+/786755
15:23:41 <tosky> but that requires branch-specific job variants and maybe a bit of refactoring (or it may be easy, depending on your job structure)
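[editor's note] The incompatibility behind bug 1923412 is essentially a renamed CLI flag: older tempest releases only understand `--black-regex`, while newer ones take `--exclude-regex`. A branch-specific job variant could pick the flag by tempest version — a hedged sketch; the version threshold below is an assumption for illustration, not the real cut-over release:

```python
def exclude_flag(tempest_version):
    """Return the test-exclusion flag understood by a given tempest version.
    The (27, 0) threshold is an assumption for illustration only."""
    return "--exclude-regex" if tempest_version >= (27, 0) else "--black-regex"

# Older stable branches pin an older tempest, so they get the legacy flag:
print(exclude_flag((26, 1)))  # --black-regex
print(exclude_flag((27, 0)))  # --exclude-regex
```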
15:23:56 <lajoskatona> Have you read the TC pad (https://etherpad.opendev.org/p/tc-xena-ptg ~l360) about EOLing old branches (ocata....) ?
15:24:02 <slaweq> yes, I did something similar for our rocky jobs already
15:24:27 <tosky> ocata hopefully will be finally EOLed
15:24:40 <tosky> and pike, I guess it depends on more projects abandoning it (we did it in cinder)
15:24:56 <tosky> (so if you think about abandoning pike, please do it :)
15:25:02 <lajoskatona> ok, so perhaps the avalanche will start :-)
15:25:07 <bcafarel> :) I don't recall recent backport requests on pike
15:25:38 <slaweq> me neither
15:25:42 <slaweq> only queens and newer
15:25:54 <slaweq> but still, even queens and rocky are starting to be a pain
15:26:12 <bcafarel> no open pike backport, last merge in July 2020
15:26:25 <tosky> yeah, in the not-so-far future (in cinder, again) we are thinking about abandoning those too
15:26:34 <slaweq> ++
15:26:45 <slaweq> we can think about it also
15:26:50 <slaweq> or just do it
15:26:55 <slaweq> I will take a look
15:26:59 <tosky> it seems one of those things where, if no one starts, it's never going to happen
15:27:59 <lajoskatona> yeah but we silently skip those branches anyway
15:28:37 <tosky> so better give that message to the community in an official way: this is gone
15:28:42 <slaweq> true, it's just not officially marked as EOL
15:28:54 <slaweq> I will check how to do it in next week
15:29:11 <slaweq> thx for bringing that topic up
15:30:16 <bcafarel> +1
15:30:25 <slaweq> ok, let's move on
15:30:27 <slaweq> #topic Grafana
15:31:58 <slaweq> looking at the dashboard, the only big problem I see is the one with the neutron-tempest-plugin jobs
15:32:25 <slaweq> and that is mostly caused by the bug with L3 HA which we already discussed earlier
15:34:03 <slaweq> do You see anything else You want to discuss?
15:34:08 <slaweq> or can we move on?
15:35:34 <bcafarel> let's go to next topic yes
15:35:40 <slaweq> ok, let's go
15:35:45 <slaweq> #topic fullstack/functional
15:35:55 <slaweq> Here there is just one quick thing
15:36:05 <slaweq> please review new test https://review.opendev.org/c/openstack/neutron/+/783748 :)
15:36:10 <ralonsoh> sure
15:36:20 <slaweq> thx
15:36:29 <slaweq> I don't have any new issues from those jobs for today
15:36:36 <slaweq> #topic Tempest/Scenario
15:36:44 <slaweq> here there are a couple of new issues
15:36:55 <slaweq> first one, there is bug reported by Liu:
15:37:00 <slaweq> https://bugs.launchpad.net/neutron/+bug/1926109
15:37:00 <openstack> Launchpad bug 1926109 in neutron "SSH timeout (wait timeout) due to potential paramiko issue" [Critical,New]
15:37:32 <slaweq> but tbh, I'm not sure that isn't the same L3 HA issue we discussed already
15:37:41 <ralonsoh> is this one related to the ha router?
15:37:42 <slaweq> the problem is that in that case no console log was logged
15:37:45 <ralonsoh> yes, same concern
15:38:06 <slaweq> I think we should first add logging of the vm's console log
15:38:15 <slaweq> and then we will see if that's not a duplicate
15:38:23 <slaweq> any volunteer to do that?
15:38:26 <ralonsoh> exactly, to check the metadata update
15:38:34 <ralonsoh> I can (at the end of the week)
15:38:35 <slaweq> or, even better
15:38:51 <lajoskatona> I can add it, will see if I can do it before ralonsoh
15:38:56 <slaweq> we should be able to know it without console log now, when https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/787324 is merged
15:39:10 <slaweq> but console log could be useful always
15:39:17 <slaweq> so thx lajoskatona and ralonsoh for taking care of it
15:39:25 <ralonsoh> yeah, console output will help
15:39:37 <slaweq> #action ralonsoh or lajoskatona will add logging of the console log, related to the https://bugs.launchpad.net/neutron/+bug/1926109
15:39:37 <openstack> Launchpad bug 1926109 in neutron "SSH timeout (wait timeout) due to potential paramiko issue" [Critical,New]
15:39:38 <slaweq> :)
15:39:45 <slaweq> I assigned it to both of You :P
15:40:37 <slaweq> I also found one issue with multicast test in the ovn job:
15:40:38 <slaweq> https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_b66/712474/7/check/neutron-tempest-plugin-scenario-ovn/b661cd4/testr_results.html
15:40:58 <slaweq> but I need to check if that is something what happens more often and report LP with it
15:41:26 <slaweq> #action slaweq to check frequency of the multicast issue in the ovn job and report a LP bug for that
15:42:28 <slaweq> and last one topic for today
15:42:34 <slaweq> #topic Periodic
15:42:47 <slaweq> I just noticed that nftables jobs are failing every day
15:42:52 <slaweq> like e.g. https://619cfb3845a212f70f8d-f88cc2e228aea8b2c74f92ce7ecb609d.ssl.cf2.rackcdn.com/periodic/opendev.org/openstack/neutron/master/neutron-tempest-plugin-scenario-linuxbridge-nftables/1d9785e/job-output.txt
15:43:02 <slaweq> and it's like that for both of them
15:43:16 <slaweq> they are failing on "[nftables : Restore saved IPv4 iptables rules, stored by iptables-persistent]"
15:43:29 <ralonsoh> yeah... ok, I'll check it
15:43:32 <slaweq> thx
15:43:39 <slaweq> ralonsoh: to check periodic nftables jobs
15:43:44 <slaweq> #action ralonsoh: to check periodic nftables jobs
15:44:10 <slaweq> and that's basically all that I have for today
15:44:24 <slaweq> do You have anything else to discuss now?
15:44:43 <slaweq> or if not, I'm closing the meeting and calling it a day finally :)
15:45:53 <bcafarel> in that case, nothing to add for me :)
15:45:53 <slaweq> ok, so thx for attending the meeting
15:45:56 <ralonsoh> bye!
15:46:02 <lajoskatona> o/
15:46:02 <bcafarel> o/
15:46:03 <slaweq> have a nice day, and see You online
15:46:06 <slaweq> #endmeeting