15:01:21 <slaweq> #startmeeting neutron_ci
15:01:22 <openstack> Meeting started Tue Apr 27 15:01:21 2021 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:23 <slaweq> hi
15:01:24 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:26 <openstack> The meeting name has been set to 'neutron_ci'
15:01:28 <ralonsoh> hi
15:01:32 <lajoskatona> Hi
15:01:38 <bcafarel> o/
15:01:51 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:02:39 <slaweq> and now we can start
15:02:56 <slaweq> #topic Actions from previous meetings
15:03:06 <slaweq> first one
15:03:08 <slaweq> slaweq to update wallaby's scenario jobs in neutron-tempest-plugin
15:03:21 <slaweq> I did, all patches are merged but I don't have links now
15:03:32 <slaweq> next one
15:03:33 <slaweq> bcafarel to report stable/rocky ci failures on LP
15:05:13 <bcafarel> https://bugs.launchpad.net/neutron/+bug/1924315 and our fearless PTL is close to fixing it (when CI is happy)
15:05:13 <openstack> Launchpad bug 1924315 in neutron "[stable/rocky] neutron-tempest-plugin-scenario-openvswitch-iptables_hybrid-rocky job fails" [Critical,In progress] - Assigned to Slawek Kaplonski (slaweq)
15:05:40 <slaweq> "fearless PTL" :D
15:05:46 <bcafarel> although I was looking into https://bugs.launchpad.net/neutron/+bug/1925451 - grenade seems to fail about 50% of the time on that DistutilsError
15:05:46 <openstack> Launchpad bug 1925451 in neutron "[stable/rocky] grenade job is broken" [Critical,New]
15:05:47 <slaweq> you made my day now
15:06:07 <bcafarel> :)
15:06:23 <slaweq> patch https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/786657 should fix that original issue
15:07:05 <slaweq> regarding the grenade one, did You check if that is e.g. failing only on some of the cloud providers?
15:08:13 <bcafarel> no, good point, I will check that
15:08:19 <slaweq> thx
15:08:33 <slaweq> so let's continue this discussion later/tomorrow
15:09:03 <slaweq> we need to fix it finally and unblock rocky's gate
15:09:12 <bcafarel> +1
15:09:25 <slaweq> ralonsoh: please check https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/786657 :)
15:09:31 <slaweq> this is also needed for the rocky gate
15:09:32 <ralonsoh> done
15:09:40 <slaweq> thx
15:10:02 <slaweq> ok, next one
15:10:04 <slaweq> ralonsoh to mark test_keepalived_spawns_conflicting_pid_vrrp_subprocess functional test as unstable
15:10:27 <ralonsoh> no progress last week, but related to the kill signal
15:10:34 <ralonsoh> no progress, sorry
15:10:48 <slaweq> I will set it for You for this week, ok?
15:10:51 <ralonsoh> sure
15:10:55 <slaweq> #action ralonsoh to mark test_keepalived_spawns_conflicting_pid_vrrp_subprocess functional test as unstable
15:10:56 <slaweq> thx
15:11:04 <slaweq> next one
15:11:05 <slaweq> slaweq to report LP with metadata issue in scenario jobs
15:11:11 <slaweq> Bug reported: https://bugs.launchpad.net/neutron/+bug/1923633
15:11:11 <openstack> Launchpad bug 1923633 in neutron "Neutron-tempest-plugin scenario jobs failing due to metadata issues" [Critical,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
15:11:25 <slaweq> this is currently, IMO, the bug hurting us most in CI
15:11:42 <slaweq> we investigated that with ralonsoh last week
15:11:46 <ralonsoh> https://review.opendev.org/c/openstack/neutron/+/787777
15:11:51 <slaweq> and we think we know what is going on there
15:11:55 <ralonsoh> doesn't help too much
15:12:08 <slaweq> still the same issues?
15:12:19 <ralonsoh> yes, but less frequent
15:12:28 <slaweq> :/
15:12:37 <slaweq> so maybe we don't know exactly what the problem is there
15:12:44 <ralonsoh> the socket receiving the messages is not responsive
15:13:03 <slaweq> for a very long time, or forever?
15:13:09 <ralonsoh> a long time
15:13:18 <ralonsoh> another option could be not to mock the socket module
15:13:23 <ralonsoh> in the L3 agent
15:13:24 <slaweq> I wonder what has changed recently there
15:13:30 <ralonsoh> I'll try it in this patch
15:13:41 <slaweq> as it wasn't that bad a few weeks back
15:13:45 <ralonsoh> s/mock/monkey_patch
15:14:26 <slaweq> ok, let's try that
15:14:31 <ralonsoh> ok
15:14:42 <slaweq> I also made patch https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/787324 which should at least mitigate the issue a bit
15:15:04 <slaweq> it's merged now, so hopefully jobs will be a bit more stable
15:15:34 <slaweq> and also, if You see in a job the error "router wasn't become active on any L3 agent", then it means for sure that You hit the same bug
15:15:49 <slaweq> so it will be easier to identify that specific issue now
15:15:53 <ralonsoh> right
15:16:44 <slaweq> ok, next topic
15:16:46 <slaweq> #topic Stadium projects
15:16:51 <slaweq> any updates?
15:17:04 <lajoskatona> no specific thing
15:17:26 <lajoskatona> CI is working (at least where I see new patches :-)
15:17:39 <bcafarel> that section will probably heat up with the OVN switch
15:17:40 <slaweq> ok, that's good news :)
15:17:42 <slaweq> thx
15:17:50 <slaweq> bcafarel: true
15:18:02 <slaweq> maybe we should start changing job definitions where it's needed?
15:18:23 <slaweq> any volunteer to do that?
15:18:27 <lajoskatona> yeah perhaps, to make it happen in parallel
15:18:44 <lajoskatona> I can check
15:18:49 <slaweq> thx lajoskatona
15:19:14 <slaweq> #action lajoskatona to check stadium jobs and what needs to be switched to OVS explicitly
15:19:26 <slaweq> ok, next topic
15:19:28 <slaweq> #topic Stable branches
15:19:34 <slaweq> anything to discuss?
15:19:38 <slaweq> except rocky
15:20:34 <bcafarel> there is still https://bugs.launchpad.net/neutron/+bug/1923412 for stein, I hope to finally take a look this week
15:20:34 <openstack> Launchpad bug 1923412 in neutron "[stable/stein] Tempest fails with unrecognized arguments: --exclude-regex" [Critical,Triaged]
15:21:00 <slaweq> bcafarel: ouch, I missed that one
15:21:08 <slaweq> it's the same issue as for rocky
15:21:14 <slaweq> or very similar
15:21:17 <tosky> bcafarel: oh, there is a devstack change which may solve that (but you can still fix it by refactoring the jobs)
15:21:34 <tosky> namely https://review.opendev.org/c/openstack/tempest/+/787455
15:21:40 <tosky> tempest, not devstack
15:22:01 <bcafarel> tosky: oh nice! I will test it as a depends-on on one of our pending stein backports
15:22:20 <slaweq> nice, thx tosky
15:23:01 <tosky> or you can do what I did for cinder-tempest-plugin
15:23:14 <tosky> https://review.opendev.org/c/openstack/cinder-tempest-plugin/+/786755
15:23:41 <tosky> but that requires branch-specific job variants and maybe a bit of refactoring (or it may be easy, depending on your job structure)
15:23:56 <lajoskatona> Have you read the TC pad (https://etherpad.opendev.org/p/tc-xena-ptg ~l360) about EOLing old branches (ocata....)?
15:24:02 <slaweq> yes, I did something similar for our rocky jobs already
15:24:27 <tosky> ocata will hopefully be finally EOLed
15:24:40 <tosky> and pike, I guess it depends on more projects abandoning it (we did it in cinder)
15:24:56 <tosky> (so if you think about abandoning pike, please do it :)
15:25:02 <lajoskatona> ok, so perhaps the avalanche will start :-)
15:25:07 <bcafarel> :) I don't recall recent backport requests on pike
15:25:38 <slaweq> me neither
15:25:42 <slaweq> only queens and newer
15:25:54 <slaweq> but still, even queens and rocky are starting to be a pain
15:26:12 <bcafarel> no open pike backports, last merge in July 2020
15:26:25 <tosky> yeah, in the not too distant future (in cinder, again) we are thinking about abandoning those too
15:26:34 <slaweq> ++
15:26:45 <slaweq> we can think about it also
15:26:50 <slaweq> or just do it
15:26:55 <slaweq> I will take a look
15:26:59 <tosky> it seems one of those things where, if no one starts, it's never going to happen
15:27:59 <lajoskatona> yeah, but we silently skip those branches anyway
15:28:37 <tosky> so better give that message to the community in an official way: this is gone
15:28:42 <slaweq> true, it's just not officially marked as EOL
15:28:54 <slaweq> I will check how to do it next week
15:29:11 <slaweq> thx for bringing that topic up
15:30:16 <bcafarel> +1
15:30:25 <slaweq> ok, let's move on
15:30:27 <slaweq> #topic Grafana
15:31:58 <slaweq> looking at the dashboard, the only big problem I see is the one with neutron-tempest-plugin jobs
15:32:25 <slaweq> and that is mostly caused by the bug with L3 HA which we already discussed earlier
15:34:03 <slaweq> do You see anything else You want to discuss?
15:34:08 <slaweq> or can we move on?
15:35:34 <bcafarel> let's go to the next topic, yes
15:35:40 <slaweq> ok, let's go
15:35:45 <slaweq> #topic fullstack/functional
15:35:55 <slaweq> here there is just one quick thing
15:36:05 <slaweq> please review the new test https://review.opendev.org/c/openstack/neutron/+/783748 :)
15:36:10 <ralonsoh> sure
15:36:20 <slaweq> thx
15:36:29 <slaweq> I don't have any new issues from those jobs for today
15:36:36 <slaweq> #topic Tempest/Scenario
15:36:44 <slaweq> here there are a couple of new issues
15:36:55 <slaweq> first one, there is a bug reported by Liu:
15:37:00 <slaweq> https://bugs.launchpad.net/neutron/+bug/1926109
15:37:00 <openstack> Launchpad bug 1926109 in neutron "SSH timeout (wait timeout) due to potential paramiko issue" [Critical,New]
15:37:32 <slaweq> but tbh, I'm not sure if that isn't the same L3 HA issue we discussed already
15:37:41 <ralonsoh> is this one related to the HA router?
15:37:42 <slaweq> the problem is that in that case there is no console log logged
15:37:45 <ralonsoh> yes, same concern
15:38:06 <slaweq> I think we should first add logging of the vm's console log
15:38:15 <slaweq> and then we will see if that's not a duplicate
15:38:23 <slaweq> any volunteer to do that?
15:38:26 <ralonsoh> exactly, to check the metadata update
15:38:34 <ralonsoh> I can (at the end of the week)
15:38:35 <slaweq> or, even better
15:38:51 <lajoskatona> I can add it, will see if I can do it before ralonsoh
15:38:56 <slaweq> we should be able to know it without the console log now, when https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/787324 is merged
15:39:10 <slaweq> but the console log could always be useful
15:39:17 <slaweq> so thx lajoskatona and ralonsoh for taking care of it
15:39:25 <ralonsoh> yeah, console output will help
15:39:37 <slaweq> #action ralonsoh or lajoskatona will add logging of the console log, related to https://bugs.launchpad.net/neutron/+bug/1926109
15:39:37 <openstack> Launchpad bug 1926109 in neutron "SSH timeout (wait timeout) due to potential paramiko issue" [Critical,New]
15:39:38 <slaweq> :)
15:39:45 <slaweq> I assigned it to both of You :P
15:40:37 <slaweq> I also found one issue with the multicast test in the ovn job:
15:40:38 <slaweq> https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_b66/712474/7/check/neutron-tempest-plugin-scenario-ovn/b661cd4/testr_results.html
15:40:58 <slaweq> but I need to check if that is something that happens more often and report an LP bug for it
15:41:26 <slaweq> #action slaweq to check frequency of the multicast issue in the ovn job and report an LP bug for that
15:42:28 <slaweq> and the last topic for today
15:42:34 <slaweq> #topic Periodic
15:42:47 <slaweq> I just noticed that the nftables jobs are failing every day
15:42:52 <slaweq> like e.g. https://619cfb3845a212f70f8d-f88cc2e228aea8b2c74f92ce7ecb609d.ssl.cf2.rackcdn.com/periodic/opendev.org/openstack/neutron/master/neutron-tempest-plugin-scenario-linuxbridge-nftables/1d9785e/job-output.txt
15:43:02 <slaweq> and it's like that for both of them
15:43:16 <slaweq> they are failing on "[nftables : Restore saved IPv4 iptables rules, stored by iptables-persistent]"
15:43:29 <ralonsoh> yeah... ok, I'll check it
15:43:32 <slaweq> thx
15:43:39 <slaweq> ralonsoh: to check periodic nftables jobs
15:43:44 <slaweq> #action ralonsoh to check periodic nftables jobs
15:44:10 <slaweq> and that's basically all I have for today
15:44:24 <slaweq> do You have anything else to discuss now?
15:44:43 <slaweq> or if not, I'm closing the meeting and calling it a day finally :)
15:45:53 <bcafarel> in that case, nothing to add from me :)
15:45:53 <slaweq> ok, so thx for attending the meeting
15:45:56 <ralonsoh> bye!
15:46:02 <lajoskatona> o/
15:46:02 <bcafarel> o/
15:46:03 <slaweq> have a nice day, and see You online
15:46:06 <slaweq> #endmeeting