15:00:16 <slaweq> #startmeeting neutron_ci
15:00:17 <openstack> Meeting started Tue Nov 24 15:00:16 2020 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:18 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:20 <openstack> The meeting name has been set to 'neutron_ci'
15:00:47 * bcafarel finishes his coffee just in time
15:00:51 <gmann> slaweq: lajoskatona how can I help you?
15:00:51 <slaweq> welcome (again)
15:00:56 <slaweq> gmann: hi
15:01:16 <gmann> please ping me the link, I will review the tempest one.
15:01:27 <slaweq> gmann: we were just talking about patch
15:01:29 <slaweq> https://review.opendev.org/c/openstack/tempest/+/743695
15:01:42 <slaweq> if You will have some time to review, that would be great :)
15:01:47 <gmann> slaweq: ack, will check today
15:01:48 <lajoskatona> Hi
15:01:52 <slaweq> thx a lot gmann
15:01:59 <gmann> np!
15:02:05 <lajoskatona> gmann: Hi, I sent it
15:02:15 <lajoskatona> slaweq was quicker
15:02:41 <gmann> lajoskatona: sure, i will review it today
15:02:49 <lajoskatona> gmann: thanks
15:03:06 <slaweq> ok, let's go with our ci meeting now :)
15:03:09 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:03:17 <slaweq> and agenda is on etherpad https://etherpad.opendev.org/p/neutron-ci-meetings
15:03:30 <slaweq> #topic Actions from previous meetings
15:03:36 <slaweq> first one was:
15:03:39 <slaweq> slaweq to report bug regarding errors 500 in ovn functional tests
15:03:49 <slaweq> it was already reported: https://bugs.launchpad.net/neutron/+bug/1903008
15:03:52 <openstack> Launchpad bug 1903008 in neutron "Create network failed during functional test" [High,Confirmed]
15:04:26 <slaweq> and we are actually waiting for ralonsoh's patch with engine facade migration first
15:04:33 <slaweq> so this is "on hold" for now
15:04:44 <slaweq> and next one was:
15:04:46 <slaweq> ralonsoh will decrease number of test workers in scenario jobs
15:04:52 <ralonsoh> merged
15:04:59 <slaweq> fast :)
15:05:41 <ralonsoh> https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/763051
15:05:48 <ralonsoh> sorry, I didn't find it
15:06:05 <slaweq> I hope it will make those jobs more stable
15:06:57 <bcafarel> crossing fingers
15:07:15 <slaweq> thx ralonsoh :)
15:07:18 <slaweq> ok, let's move on
15:07:21 <slaweq> #topic Stadium projects
15:07:33 <slaweq> anything regarding stadium to discuss today?
15:08:09 <bcafarel> small stable/stadium update, https://bugs.launchpad.net/neutron/+bug/1903689/comments/5
15:08:11 <openstack> Launchpad bug 1903689 in neutron "[stable/ussuri] Functional job fails - AttributeError: module 'neutron_lib.constants' has no attribute 'DEVICE_OWNER_DISTRIBUTED'" [Medium,In progress] - Assigned to Bernard Cafarelli (bcafarel)
15:08:11 <lajoskatona> nothing special, perhaps this one: https://review.opendev.org/c/openstack/networking-odl/+/763210
15:09:05 <bcafarel> basically, adding neutron to upper-constraints needs to be done manually when creating a new stable branch (maybe to add to a list of steps for that?)
15:09:15 <bcafarel> I will send patches for train to victoria (forgot to do it yesterday) to catch up
15:10:19 <slaweq> bcafarel: can You also check https://docs.openstack.org/neutron/latest/contributor/policies/release-checklist.html if it's up to date?
15:10:27 <slaweq> and maybe update it with this info if needed
15:10:51 <bcafarel> ooh nice, I wondered if we had something like that
15:11:19 <bcafarel> slaweq: will do, and check other stuff I think of (adding branch tempest template, remove *master* jobs, etc)
15:11:33 <slaweq> bcafarel++ thx a lot
15:12:00 <slaweq> #action bcafarel to fix stable branches upper-constraints in stadium projects
15:12:11 <slaweq> #action bcafarel to check and update doc https://docs.openstack.org/neutron/latest/contributor/policies/release-checklist.html
15:12:29 <slaweq> ^^ just to not forget (that bcafarel volunteered for that :P)
15:12:40 <bcafarel> :)
15:13:18 <slaweq> lajoskatona: and regarding Your patch, I already +2'd it
15:13:28 <slaweq> so You need e.g. ralonsoh to check that
15:13:37 <lajoskatona> slaweq: thanks, just some advertisement for more attention :-)
15:14:11 <ralonsoh> np
15:15:09 <slaweq> ok, next topic
15:15:11 <slaweq> #topic Stable branches
15:15:15 <slaweq> Victoria dashboard: http://grafana.openstack.org/d/pM54U-Kiz/neutron-failure-rate-previous-stable-release?orgId=1
15:15:17 <slaweq> Ussuri dashboard: http://grafana.openstack.org/d/dCFVU-Kik/neutron-failure-rate-older-stable-release?orgId=1
15:16:11 <bcafarel> my unread backlog for stable is not too bad, so I'd say stable branches are good
15:16:19 <bcafarel> (well, except still pending https://bugs.launchpad.net/neutron/+bug/1903531 )
15:16:22 <openstack> Launchpad bug 1903531 in neutron "Update of neutron-server breaks compatibility to previous neutron-agent version" [Critical,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
15:16:56 <slaweq> sorry, correct links:
15:16:58 <slaweq> Victoria dashboard: https://grafana.opendev.org/d/HUCHup2Gz/neutron-failure-rate-previous-stable-release?orgId=1
15:17:00 <slaweq> Ussuri dashboard: https://grafana.opendev.org/d/smqHXphMk/neutron-failure-rate-older-stable-release?orgId=1
15:18:30 <slaweq> thx bcafarel - that is a good topic to discuss
15:18:40 <slaweq> and I forgot about it at the previous meeting
15:18:54 <slaweq> I don't really know what to do with it now :/
15:20:12 <slaweq> the problem is that IIUC the fix should go to the agent's side
15:20:24 <slaweq> and if the agent is already updated, there will be no issue at all
15:20:33 <slaweq> is my understanding correct?
15:21:33 <bcafarel> checking that original commit again
15:22:50 <bcafarel> slaweq: so having a fix in the agent to handle both types, and a note that agents should be updated first for this bug?
15:23:25 <slaweq> the problem is that the officially supported update path is that the server should always be updated first
15:23:34 <slaweq> as it should handle compatibility with older agents
15:23:37 <lajoskatona> what I can't see is what happens if the revert gets merged
15:23:38 <slaweq> not vice versa
15:23:50 <lajoskatona> how that affects these deployments
15:24:08 <slaweq> lajoskatona: when we revert that change, someone who already updated to e.g. 15.3 will have the same issue again
15:24:14 <slaweq> but in the opposite direction
15:24:29 <ralonsoh> exactly, they will experience the same problem
15:24:31 <slaweq> because his server will send (ip, mac) tuple
15:24:32 <lajoskatona> ok, so the fix would be better
15:24:40 <slaweq> but how to fix it?
15:24:41 <ralonsoh> because they have already rebooted the agents
15:25:09 <ralonsoh> send a patch handling both possible RPC responses
15:25:18 <ralonsoh> (IP) or (IP, MAC)
15:25:25 <slaweq> ralonsoh: but that patch needs to be on the agent's side, right
15:25:27 <slaweq> ?
15:25:44 <ralonsoh> in both, if I'm not wrong
15:25:50 <ralonsoh> this is something sent by the server
15:28:10 <slaweq> ralonsoh: yes, it is sent by the server
15:28:22 <slaweq> but how do You want the server to send 2 things?
15:29:18 <ralonsoh> no, if the server is updated, it should send (IP,MAC)
15:29:56 <slaweq> yes
15:30:07 <ralonsoh> but, TBH, to those deployments not updated
15:30:09 <slaweq> so the agent should be changed so that it can handle both cases
15:30:18 <ralonsoh> if they follow the update procedures
15:30:23 <ralonsoh> first the server, then the agents
15:30:40 <ralonsoh> if we don't revert the original patch, then when the server is updated
15:30:47 <ralonsoh> the RPC will send (IP,MAC)
15:30:52 <ralonsoh> and the agents won't understand this
15:31:13 <slaweq> yes, that's the problem
15:31:51 <ralonsoh> so maybe we should just revert the patch in stable releases
15:32:24 <slaweq> but if we revert it in stable branches, then for deployments which already updated to the latest Train (or Stein) the issue will be the same
15:32:36 <slaweq> the updated server will again send just IP
15:32:52 <slaweq> and the agent will expect (IP, MAC) as it will not have the reverted change yet
15:33:22 <bcafarel> I guess it will be a limited number of deployments - if we cannot have a fix soon it may be the "not so bad" option
15:33:51 <bcafarel> fix it for people that have not updated yet, at the cost of additional trouble for those (hopefully few) who did
15:34:20 <slaweq> I think that will be better but I'm not 100% sure about that
15:34:29 <slaweq> ok, I will try to play with it a bit more
15:34:49 <slaweq> and let's discuss that on the drivers meeting on Friday and decide there what to do with it
15:35:42 <slaweq> are You ok with this plan?
15:35:47 <ralonsoh> perfect
15:35:49 <bcafarel> sounds good
15:35:51 <slaweq> ok
15:36:15 <slaweq> we may also say that e.g. 15.3 is "broken" and maybe remove it from pypi if possible
15:36:27 <slaweq> so no new people will update to that version
15:36:34 <ralonsoh> that's also an option
15:36:39 <slaweq> I will ask the release team about that
15:36:47 <bcafarel> +1 that would be good in the meantime
15:36:52 <slaweq> #action slaweq to explore options to fix https://bugs.launchpad.net/neutron/+bug/1903531
15:36:53 <openstack> Launchpad bug 1903531 in neutron "Update of neutron-server breaks compatibility to previous neutron-agent version" [Critical,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
15:37:11 <slaweq> ok, let's move on now
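The agent-side approach ralonsoh suggests above (one patch handling both possible RPC responses) could look roughly like the sketch below. This is an illustrative outline only, not the actual neutron code: the function names and the list-of-entries payload shape are assumptions.

```python
# Hypothetical sketch: an agent-side helper that accepts both the old RPC
# payload (a bare IP string) and the new one (an (IP, MAC) pair), so the
# agent keeps working whether or not the server has been updated.

def normalize_entry(entry):
    """Return an (ip, mac) tuple regardless of which format the server sent.

    Older servers send just the IP string; updated servers send (IP, MAC).
    MAC is None when the server did not provide one.
    """
    if isinstance(entry, (list, tuple)):
        ip, mac = entry          # new-style payload from an updated server
        return ip, mac
    return entry, None           # old-style payload: IP only


def handle_rpc_response(entries):
    """Normalize every entry so the rest of the agent sees a single format."""
    return [normalize_entry(e) for e in entries]


if __name__ == "__main__":
    # Mixed payload, as could happen mid-upgrade:
    print(handle_rpc_response(["10.0.0.5", ("10.0.0.6", "fa:16:3e:aa:bb:cc")]))
    # [('10.0.0.5', None), ('10.0.0.6', 'fa:16:3e:aa:bb:cc')]
```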
15:37:13 <slaweq> #topic Grafana
15:37:17 <slaweq> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:37:44 <slaweq> overall I think that it looks not so bad this week
15:38:21 <ralonsoh> it's getting better, yes
15:39:10 <slaweq> looking e.g. at https://grafana.opendev.org/d/PfjNuthGz/neutron-failure-rate?viewPanel=20&orgId=1
15:39:20 <slaweq> all except the ovn job look pretty good this week
15:39:44 <slaweq> and there are much fewer ssh authentication failures recently IMO
15:40:13 <bcafarel> that's nice
15:40:41 <slaweq> regarding specific jobs
15:40:49 <slaweq> #topic Tempest/Scenario
15:41:07 <slaweq> I was looking at various failures from last week today
15:41:15 <slaweq> and I didn't find many new issues
15:41:28 <slaweq> I just found 2 examples of SSH failure in ovn jobs:
15:41:32 <slaweq> https://7513009f5bff8f76e461-f83d06667d580e000031601b82c71a43.ssl.cf2.rackcdn.com/763246/1/gate/neutron-ovn-tempest-ovs-release/acf920f/testr_results.html
15:41:36 <slaweq> https://8c281a85ffc729001c78-68bc071a5cbea1ed39a41226592204b6.ssl.cf1.rackcdn.com/763777/1/check/neutron-ovn-tempest-ovs-release-ipv6-only/3e09df5/testr_results.html
15:41:45 <slaweq> I didn't report it on LP yet
15:41:47 <slaweq> but I will
15:43:21 <slaweq> but really https://7513009f5bff8f76e461-f83d06667d580e000031601b82c71a43.ssl.cf2.rackcdn.com/763246/1/gate/neutron-ovn-tempest-ovs-release/acf920f/testr_results.html is probably the known issue with some race in paramiko
15:43:31 <slaweq> but there wasn't console output there
15:43:50 <ralonsoh> did you push the patch to tempest?
15:44:07 <slaweq> tempest patch is merged
15:44:09 <ralonsoh> that one waiting for the VM output to mitigate the paramiko problem
15:44:12 <ralonsoh> ooook
15:44:30 <slaweq> https://review.opendev.org/c/openstack/tempest/+/761964
15:44:57 <slaweq> but in that case it was waiting for more than 10 minutes, checking console output
15:45:01 <slaweq> and it failed later :/
15:47:21 <slaweq> I don't have any other examples of failures in tempest jobs for this week
15:47:28 <slaweq> let's move on
15:47:34 <slaweq> #topic Rally jobs
15:47:58 <slaweq> I found a few cases today with a failure like: https://zuul.opendev.org/t/openstack/build/be642647ac1e4f5993a65e5f3f91a7a5 in the rally job
15:48:06 <slaweq> do You know maybe if that is a known issue?
15:48:45 <ralonsoh> no
15:48:50 <slaweq> :/
15:49:09 <slaweq> I will report that against rally as it doesn't seem to be an issue in neutron really
15:50:17 <slaweq> #action slaweq to report bug against rally
15:51:15 <slaweq> and that's all I had for today
15:51:29 <slaweq> do You want to talk about anything else regarding CI today?
15:51:38 <haleyb> i had one question, kind-of related to CI
15:52:11 <haleyb> I've been randomly working on fixing issues using IPv6 addresses for tunnel endpoints
15:52:22 <haleyb> and i sent out a WIP at https://review.opendev.org/c/openstack/neutron/+/760668
15:52:42 <haleyb> i was wondering if something like that should just be in one of the existing CI jobs
15:53:00 <haleyb> truly making things ipv6-only
15:53:52 <slaweq> haleyb: isn't it like that in the tempest-ipv6-only job?
15:53:55 <slaweq> https://zuul.opendev.org/t/openstack/build/7d267a79a9ef4a6ab3413619a09bf0aa
15:54:08 <haleyb> i don't think it does the tunnel, does it?
15:55:02 <haleyb> i just added that TUNNEL_IP_VERSION to devstack, it actually hasn't merged yet
15:55:23 <slaweq> maybe You can then change that tempest-ipv6-only job
15:55:33 <slaweq> as it is intended to be ipv6-only :)
15:56:59 <haleyb> slaweq: yes, i thought about that too, just didn't want to break everyone that inherited that
15:57:19 <haleyb> but maybe no one will notice with the new gerrit :)
15:57:39 <slaweq> haleyb: if You don't want to break anything for other projects You can propose a new job like neutron-tempest-ipv6-only
15:57:48 <ralonsoh> but slaweq is right, according to the playbooks, "ipv6-only-deployments-verification" should "Verify the IPv6-only deployments"
15:57:58 <slaweq> which will inherit from tempest-ipv6-only and will also set this one var
15:58:01 <ralonsoh> and this is executed in tempest ipv6
15:58:13 <slaweq> then we can run only this new job in our queue
15:58:37 <slaweq> haleyb: can You sync with tempest folks on what would be better for them?
15:59:22 <haleyb> slaweq: sure, i can propose something there and ask them for comments
15:59:28 <slaweq> haleyb++ thx
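The new job slaweq suggests above could be sketched roughly as below, assuming haleyb's not-yet-merged devstack change exposes TUNNEL_IP_VERSION as a localrc setting; the job name follows slaweq's suggestion, while the variable placement and value are assumptions.

```yaml
# Hypothetical Zuul job definition, sketching the idea discussed above:
# a neutron-specific variant of tempest-ipv6-only that also forces IPv6
# tunnel endpoints, so other projects inheriting the upstream job stay
# untouched and only neutron's queue runs the new job.
- job:
    name: neutron-tempest-ipv6-only
    parent: tempest-ipv6-only
    vars:
      devstack_localrc:
        # Depends on the not-yet-merged devstack change adding
        # TUNNEL_IP_VERSION support mentioned by haleyb.
        TUNNEL_IP_VERSION: 6
```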
15:59:41 <slaweq> ok, we are running out of time now
15:59:45 <slaweq> thx for attending the meeting
15:59:49 <slaweq> and see You online
15:59:50 <ralonsoh> bye
15:59:50 <bcafarel> o/
15:59:52 <slaweq> #endmeeting