15:00:16 #startmeeting neutron_ci
15:00:17 Meeting started Tue Nov 24 15:00:16 2020 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:18 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:20 The meeting name has been set to 'neutron_ci'
15:00:47 * bcafarel finishes his coffee just in time
15:00:51 slaweq: lajoskatona how can I help you?
15:00:51 welcome (again)
15:00:56 gmann: hi
15:01:16 please ping me the link, I will review the tempest one.
15:01:27 gmann: we were just talking about patch
15:01:29 https://review.opendev.org/c/openstack/tempest/+/743695
15:01:42 if You will have some time to review, that would be great :)
15:01:47 slaweq: ack, will check today
15:01:48 Hi
15:01:52 thx a lot gmann
15:01:59 np!
15:02:05 gmann: Hi, I sent it
15:02:15 slaweq was quicker
15:02:41 lajoskatona: sure, i will review it today
15:02:49 gmann: thanks
15:03:06 ok, lets go with our ci meeting now :)
15:03:09 Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:03:17 and agenda is on etherpad https://etherpad.opendev.org/p/neutron-ci-meetings
15:03:30 #topic Actions from previous meetings
15:03:36 first one was:
15:03:39 slaweq to report bug regarding errors 500 in ovn functional tests
15:03:49 it was already reported: https://bugs.launchpad.net/neutron/+bug/1903008
15:03:52 Launchpad bug 1903008 in neutron "Create network failed during functional test" [High,Confirmed]
15:04:26 and we are actually waiting for ralonsoh's patch with engine facade migration first
15:04:33 so this is "on hold" for now
15:04:44 and next one was:
15:04:46 ralonsoh will decrease number of test workers in scenario jobs
15:04:52 merged
15:04:59 fast :)
15:05:41 https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/763051
15:05:48 sorry, I didn't find it
15:06:05 I hope it will make those jobs more stable
15:06:57 crossing fingers
15:07:15 thx ralonsoh :)
15:07:18 ok, lets move on
15:07:21 #topic Stadium projects
15:07:33 anything regarding stadium to discuss today?
15:08:09 small stable/stadium update, https://bugs.launchpad.net/neutron/+bug/1903689/comments/5
15:08:11 Launchpad bug 1903689 in neutron "[stable/ussuri] Functional job fails - AttributeError: module 'neutron_lib.constants' has no attribute 'DEVICE_OWNER_DISTRIBUTED'" [Medium,In progress] - Assigned to Bernard Cafarelli (bcafarel)
15:08:11 nothing special, perhaps this one: https://review.opendev.org/c/openstack/networking-odl/+/763210
15:09:05 basically, adding neutron to upper-constraints needs to be done manually when creating a new stable branch (maybe add it to a list of steps for that?)
15:09:15 I will send patches for train to victoria (forgot to do it yesterday) to catch up
15:10:19 bcafarel: can You also check https://docs.openstack.org/neutron/latest/contributor/policies/release-checklist.html if it is up to date?
15:10:27 and maybe update it with this info if needed
15:10:51 ooh nice, I wondered if we had something like that
15:11:19 slaweq: will do, and check other stuff I think of (adding branch tempest template, remove *master* jobs, etc)
15:11:33 bcafarel++ thx a lot
15:12:00 #action bcafarel to fix stable branches upper-constraints in stadium projects
15:12:11 #action bcafarel to check and update doc https://docs.openstack.org/neutron/latest/contributor/policies/release-checklist.html
15:12:29 ^^ just to not forget (that bcafarel volunteered for that :P)
15:12:40 :)
15:13:18 lajoskatona: and regarding Your patch, I already +2 it
15:13:28 so You need e.g. ralonsoh to check that
15:13:37 slaweq: thanks, just some advertisement for more attention :-)
15:14:11 np
15:15:09 ok, next topic
15:15:11 #topic Stable branches
15:15:15 Victoria dashboard: http://grafana.openstack.org/d/pM54U-Kiz/neutron-failure-rate-previous-stable-release?orgId=1
15:15:17 Ussuri dashboard: http://grafana.openstack.org/d/dCFVU-Kik/neutron-failure-rate-older-stable-release?orgId=1
15:16:11 my unread backlog for stable is not too bad, so I'd say stable branches are good
15:16:19 (well except still pending https://bugs.launchpad.net/neutron/+bug/1903531 )
15:16:22 Launchpad bug 1903531 in neutron "Update of neutron-server breaks compatibility to previous neutron-agent version" [Critical,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
15:16:56 sorry, correct links:
15:16:58 Victoria dashboard: https://grafana.opendev.org/d/HUCHup2Gz/neutron-failure-rate-previous-stable-release?orgId=1
15:17:00 Ussuri dashboard: https://grafana.opendev.org/d/smqHXphMk/neutron-failure-rate-older-stable-release?orgId=1
15:18:30 thx bcafarel - that is a good topic to discuss
15:18:40 and I forgot about it at the previous meeting
15:18:54 I don't really know what to do with it now :/
15:20:12 the problem is that IIUC the fix should go to the agent's side
15:20:24 and if the agent is already updated, there will be no issue at all
15:20:33 is my understanding correct?
15:21:33 checking that original commit again
15:22:50 slaweq: so having a fix in the agent to handle both types, and a note that agents should be updated first for this bug?
15:23:25 problem is that the officially supported update path is that the server should always be updated first
15:23:34 as it should handle compatibility with older agents
15:23:37 what I can't see is what happens if the revert is merged
15:23:38 not vice versa
15:23:50 how that affects these deployments
15:24:08 lajoskatona: when we revert that change, someone who already updated to e.g. 15.3 will have the same issue again
15:24:14 but in the opposite direction
15:24:29 exactly, they will experience the same problem
15:24:31 because their server will send the (ip, mac) tuple
15:24:32 ok, so the fix would be better
15:24:40 but how to fix it?
15:24:41 because they have already rebooted the agents
15:25:09 send a patch handling both possible RPC responses
15:25:18 (IP) or (IP, MAC)
15:25:25 ralonsoh: but that patch needs to be on the Agent's side, right
15:25:27 ?
15:25:44 in both, if I'm not wrong
15:25:50 this is something sent by the server
15:28:10 ralonsoh: yes, it is sent by the server
15:28:22 but how do You want to send 2 things from the server?
15:29:18 no, if the server is updated, it should send (IP,MAC)
15:29:56 yes
15:30:07 but, TBH, to those deployments not updated
15:30:09 so the agent should be changed so that it would be able to handle both cases
15:30:18 if they follow the update procedures
15:30:23 first the server, then the agents
15:30:40 if we don't revert the original patch, then when the server is updated
15:30:47 the RPC will send (IP,MAC)
15:30:52 and the agents won't understand this
15:31:13 yes, that's the problem
15:31:51 so maybe we should just revert the patch in stable releases
15:32:24 but if we revert it in stable branches, then for deployments which already updated to the latest Train (or Stein) the issue will be the same
15:32:36 the updated server will again send just IP
15:32:52 and the agent will expect (IP, MAC) as it will not have the revert applied yet
15:33:22 I guess it will be a limited number of deployments - if we cannot have a fix soon it may be the "not so bad" option
15:33:51 fix it for people that have not updated yet, with the cost of additional trouble for those (hopefully few) who did
15:34:20 I think that will be better but I'm not 100% sure about that
15:34:29 ok, I will try to play with it a bit more
15:34:49 and lets discuss that on the drivers meeting on Friday and decide there what to do with it
15:35:42 are You ok with this plan?
15:35:47 perfect
15:35:49 sounds good
15:35:51 ok
15:36:15 we may also say that e.g. 15.3 is "broken" and maybe remove it from pypi if possible
15:36:27 so no new people will update to that version
15:36:34 that's also an option
15:36:39 I will ask the release team for that
15:36:47 +1 that would be good in the meantime
15:36:52 #action slaweq to explore options to fix https://bugs.launchpad.net/neutron/+bug/1903531
15:36:53 Launchpad bug 1903531 in neutron "Update of neutron-server breaks compatibility to previous neutron-agent version" [Critical,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
15:37:11 ok, lets move on now
15:37:13 #topic Grafana
15:37:17 http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:37:44 overall I think it looks not so bad this week
15:38:21 it's getting better, yes
15:39:10 looking e.g. at https://grafana.opendev.org/d/PfjNuthGz/neutron-failure-rate?viewPanel=20&orgId=1
15:39:20 all except the ovn job look pretty good this week
15:39:44 and there are far fewer ssh authentication failures recently IMO
15:40:13 that's nice
15:40:41 regarding specific jobs
15:40:49 #topic Tempest/Scenario
15:41:07 I was looking at various failures from last week today
15:41:15 and I didn't find many new issues
15:41:28 I just found 2 examples of SSH failure in ovn jobs:
15:41:32 https://7513009f5bff8f76e461-f83d06667d580e000031601b82c71a43.ssl.cf2.rackcdn.com/763246/1/gate/neutron-ovn-tempest-ovs-release/acf920f/testr_results.html
15:41:36 https://8c281a85ffc729001c78-68bc071a5cbea1ed39a41226592204b6.ssl.cf1.rackcdn.com/763777/1/check/neutron-ovn-tempest-ovs-release-ipv6-only/3e09df5/testr_results.html
15:41:45 I didn't report it on LP yet
15:41:47 but I will
15:43:21 but really https://7513009f5bff8f76e461-f83d06667d580e000031601b82c71a43.ssl.cf2.rackcdn.com/763246/1/gate/neutron-ovn-tempest-ovs-release/acf920f/testr_results.html is probably the known issue with some race in paramiko
15:43:31 but there wasn't console output there
15:43:50 did you push the patch to tempest?
15:44:07 tempest patch is merged
15:44:09 that one is waiting for the VM output to mitigate the paramiko problem
15:44:12 ooook
15:44:30 https://review.opendev.org/c/openstack/tempest/+/761964
15:44:57 but in that case it was waiting for more than 10 minutes, checking console output
15:45:01 and it failed later :/
15:47:21 I don't have any other examples of failures in tempest jobs for this week
15:47:28 lets move on
15:47:34 #topic Rally jobs
15:47:58 I found a few cases today with a failure like https://zuul.opendev.org/t/openstack/build/be642647ac1e4f5993a65e5f3f91a7a5 in the rally job
15:48:06 do You know maybe if that is a known issue?
15:48:45 no
15:48:50 :/
15:49:09 I will report that against rally as it doesn't seem to be an issue in neutron really
15:50:17 #action slaweq to report bug against rally
15:51:15 and that's all I had for today
15:51:29 do You want to talk about anything else regarding CI today?
15:51:38 i had one question, kind-of related to CI
15:52:11 I've been randomly working on fixing issues using IPv6 addresses for tunnel endpoints
15:52:22 and i sent out a WIP at https://review.opendev.org/c/openstack/neutron/+/760668
15:52:42 i was wondering if something like that should just be in one of the existing CI jobs
15:53:00 truly making things ipv6-only
15:53:52 haleyb: isn't it like that in the tempest-ipv6-only job?
15:53:55 https://zuul.opendev.org/t/openstack/build/7d267a79a9ef4a6ab3413619a09bf0aa
15:54:08 i don't think it does the tunnel, does it?
15:55:02 i just added that TUNNEL_IP_VERSION to devstack, it actually hasn't merged yet
15:55:23 maybe You can then change that tempest-ipv6-only job
15:55:33 as it is intended to be ipv6-only :)
15:56:59 slaweq: yes, i thought about that too, just didn't want to break everyone that inherited that
15:57:19 but maybe no one will notice with the new gerrit :)
15:57:39 haleyb: if You don't want to break anything for other projects You can propose a new job like neutron-tempest-ipv6-only
15:57:48 but slaweq is right, according to the playbooks, "ipv6-only-deployments-verification" should "Verify the IPv6-only deployments"
15:57:58 which will inherit from tempest-ipv6-only and will also set this one var
15:58:01 and this is executed in tempest ipv6
15:58:13 then we can run only this new job in our queue
15:58:37 haleyb: can You sync with tempest folks on what would be better for them?
15:59:22 slaweq: sure, i can propose something there and ask them for comments
15:59:28 haleyb++ thx
15:59:41 ok, we are running out of time now
15:59:45 thx for attending the meeting
15:59:49 and see You online
15:59:50 bye
15:59:50 o/
15:59:52 #endmeeting
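
For reference, a minimal sketch (in Python) of the agent-side handling discussed above for bug 1903531: accepting both the old bare-IP payload and the new (IP, MAC) payload from the server during a mixed-version upgrade. The function name and payload shapes are hypothetical illustrations, not the actual Neutron RPC code.

    # Hypothetical sketch only: accept an RPC entry that may be either a bare
    # IP string (server without the change) or an (ip, mac) pair (server with
    # the change), so one agent version tolerates both during the upgrade.
    def normalize_address_entry(entry):
        """Return an (ip, mac) tuple; mac is None when only an IP was sent."""
        if isinstance(entry, str):
            # Old-style payload: just the IP address.
            return entry, None
        if isinstance(entry, (list, tuple)) and len(entry) == 2:
            # New-style payload: (ip, mac).
            return entry[0], entry[1]
        raise ValueError("unexpected RPC address entry: %r" % (entry,))

    # Example with both payload shapes:
    for item in ["10.0.0.5", ("10.0.0.6", "fa:16:3e:aa:bb:cc")]:
        print(normalize_address_entry(item))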