15:00:11 #startmeeting neutron_ci
15:00:12 Meeting started Wed May 27 15:00:11 2020 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:13 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:15 The meeting name has been set to 'neutron_ci'
15:00:20 hi
15:00:24 hey
15:00:41 o/
15:01:07 o/
15:01:18 o/
15:01:23 Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:01:26 Please open now :)
15:01:38 and I think that we can start
15:01:44 #topic Actions from previous meetings
15:01:58 first one
15:02:00 ralonsoh to continue checking ovn jobs timeouts
15:02:33 sorry
15:02:35 I'm late
15:02:47 ralonsoh: no problem :)
15:02:48 I spent some time on this again
15:02:59 with ovsdbapp and python-ovn developers
15:03:09 and I still don't find a "breach" in the code
15:03:24 this is something permanent on my plate
15:03:32 can we put this task on hold?
15:03:37 but it still happens in the ci, right?
15:03:44 sometimes
15:03:49 but not so often
15:03:52 ok
15:04:01 and this is something that happens in OVS too
15:04:31 so it's not always related to ovn jobs? do the ml2/ovs jobs have the same issue?
15:04:59 yes
15:05:07 ok
15:05:12 it's a problem with eventlet, ovsdbapp and python-ovs
15:05:28 ouch, probably will be hard to find :/
15:05:36 pfffff
15:05:41 :)
15:07:16 ok, I know You will continue this investigation so I think we can move on
15:07:24 next one
15:07:26 slaweq to check failure in test_ha_router_failover: https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_6d0/726168/2/check/neutron-functional/6d0b174/testr_results.html
15:07:40 I have it in my todo list but I didn't have time to get to this one yet
15:07:44 I will try this week
15:07:52 #action slaweq to check failure in test_ha_router_failover: https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_6d0/726168/2/check/neutron-functional/6d0b174/testr_results.html
15:08:54 next one
15:08:56 slaweq to add additional logging for fullstack's firewall tests
15:09:19 and here it is the same: I have it in my todo list but I didn't have time yet to get to this
15:09:41 I reopened the bug related to this test and marked it as unstable again
15:09:57 so it will not make our life harder :)
15:10:23 #action slaweq to add additional logging for fullstack's firewall tests
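(For context on "marked it as unstable": in neutron this is usually done with the unstable_test decorator, which turns a failure of a known-flaky test into a skip referencing the open bug, so the test stops blocking the gate while it is investigated. A minimal sketch follows, assuming the helper is importable from neutron.tests.base; the class name comes from the bug title discussed later in this meeting, and the method name is illustrative only.)

    # Minimal sketch, assuming neutron's unstable_test helper lives in
    # neutron.tests.base: a failure in the decorated test is reported as a
    # skip that points at the referenced bug instead of failing the job.
    from neutron.tests import base
    from neutron.tests.fullstack import base as fullstack_base


    class TestSecurityGroupsSameNetwork(fullstack_base.BaseFullStackTestCase):

        @base.unstable_test("bug 1742401")
        def test_securitygroup(self):
            # Actual test steps elided; only the decorator usage matters here.
            ...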
15:10:25 next one
15:10:30 there is so much that goes on in that one test, would it make sense to break it up?
15:11:09 njohnston: I was thinking about that, and I even started something https://review.opendev.org/#/c/716773/
15:11:27 but it will require some more work
15:11:48 cool! As always, you are ahead of the curve. :-)
15:11:51 and I agree that this would be good to break it into a few smaller tests
15:11:59 njohnston: thx :)
15:13:18 njohnston: so I will continue this effort in the next week(s) but as a low priority task
15:13:24 makes sense
15:14:12 ok
15:14:14 thx
15:14:17 so let's move on
15:14:31 slaweq to reopen bug related to failing fullstack firewall tests
15:14:44 as I said I reopened bug https://bugs.launchpad.net/neutron/+bug/1742401
15:14:44 Launchpad bug 1742401 in neutron "Fullstack tests neutron.tests.fullstack.test_securitygroup.TestSecurityGroupsSameNetwork fails often" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
15:14:50 and marked the test as unstable again
15:16:05 ok, and the last one
15:16:07 ralonsoh to check Address already allocated in subnet issue in tempest job
15:16:34 I'm on it, sorry
15:16:42 didn't spend too much time on this one
15:17:35 ralonsoh: I can imagine as I know what You were doing for most of the week :)
15:17:55 * slaweq also doesn't like conjunctions :P
15:18:00 :
15:18:01 :P
15:18:32 with that I think we can move on to the next topic
15:18:34 #topic Stadium projects
15:18:59 I don't have anything related to the stadium for today
15:19:07 but maybe You have something to discuss here?
15:19:19 nope
15:19:20 nothing special, at least from me
15:20:00 ok, so let's move on to the next topic
15:20:02 #topic Stable branches
15:20:08 Train dashboard: http://grafana.openstack.org/d/pM54U-Kiz/neutron-failure-rate-previous-stable-release?orgId=1
15:20:10 Stein dashboard: http://grafana.openstack.org/d/dCFVU-Kik/neutron-failure-rate-older-stable-release?orgId=1
15:20:18 Ussuri and Train now :)
15:20:31 nice!
15:20:55 bcafarel: are You sure?
15:21:03 I see in the description train and stein still
15:21:13 maybe You forgot to change the description?
15:21:19 argh, checking
15:21:26 the changeset merged for sure
15:22:51 nah, names were updated too, I guess 14h ago is too recent?
15:23:05 ( https://review.opendev.org/#/c/729291/ )
15:23:44 ok, let's wait some more time
15:23:54 hopefully it will change
15:24:03 and thx bcafarel for taking care of it
15:24:13 something to check on next meeting :)
15:24:32 and apart from that, from memory I don't think I have seen many stable failures
15:24:52 (also new stable releases for trein/stain)
15:25:00 *train/stein
15:25:02 yes, me too - most of my patches were merged pretty fast recently
15:26:25 if there is nothing else regarding stable branches, I think we can move on to the next topic
15:29:16 #topic Grafana
15:29:22 http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:32:36 I don't know why, but our gate queue has looked very good for the last few days - no failures at all :)
15:32:43 or almost at all
15:32:59 but it's worse in the check queue
15:33:21 e.g. neutron-ovn-tempest-slow is failing 100% of the time
15:36:01 hmm, I may know what's going on there
15:36:15 maciejjozefczyk: I was hoping that You would know :P
15:36:30 I can take a look, this failing test looks familiar to me :P (test_port_security_macspoofing_port)
15:36:48 hmm yeah the name does ring a bell
15:36:52 yes, that's the test which is failing every time since around last Friday
15:36:55 I think we blacklisted that test in other jobs
15:37:08 there is a fix in core-ovn for this but I think it's not yet in the OVN version we use in the gates
15:37:09 maciejjozefczyk: please check that if You have some time
15:37:18 sure, that's gonna be quick :) slaweq
15:37:30 #action maciejjozefczyk to check failing test_port_security_macspoofing_port test
15:37:40 I left the Vivaldi tab open for tomorrow morning to not forget :)
15:37:56 other than that things look quite good IMO
15:38:10 anything else You want to discuss regarding grafana?
15:39:53 ok, so next topic
15:39:55 #topic fullstack/functional
15:40:08 I have only one thing regarding the functional job
15:40:14 What to do with https://review.opendev.org/#/c/729588/ ? IMO it's good to go
15:40:35 this uwsgi job is even more stable recently than the "normal" functional tests job
15:40:35 I think so
15:40:50 +1
15:40:53 +1
15:40:53 +1
15:41:26 ok, so please review and approve it :)
15:41:41 thx lajoskatona bcafarel and maciejjozefczyk for reviewing it already :)
15:42:04 :) +1
15:42:13 fullstack tests seem to be better since I marked this one test as unstable
15:42:31 slaweq: no problem
15:42:54 any questions/comments or can we move on?
15:43:40 go ahead
15:43:56 nothing from me
15:44:01 ok
15:44:04 #topic Tempest/Scenario
15:44:16 we already talked about the failing ovn-slow job
15:44:44 the only other issue which I have today is yet another IpAddressAlreadyAllocated error
15:44:55 this time in a grenade job: https://7ba1272f105c99db4826-d9c28f5658476db1e4ca5968d196888d.ssl.cf2.rackcdn.com/729591/1/check/neutron-grenade-dvr-multinode/54a824e/controller/logs/grenade.sh_log.txt
15:48:15 I'm not sure why it's like that but for me it smells like an issue in the test
15:49:29 ok
15:49:31 I found:
15:49:33 May 20 14:02:57.487624 ubuntu-bionic-rax-ord-0016690112 neutron-server[5514]: INFO neutron.wsgi [None req-e0e86a16-03d9-4d2c-86fd-db9c1c4133b9 tempest-RoutersTest-1419393395 tempest-RoutersTest-1419393395] 10.209.98.10 "PUT /v2.0/routers/68d7b71d-47e2-4b98-9f87-d972e2f3889d/add_router_interface HTTP/1.1" status: 409 len: 369 time: 38.4982579
15:50:08 sorry
15:50:12 first was:
15:50:14 May 20 14:03:00.501061 ubuntu-bionic-rax-ord-0016690112 neutron-server[5514]: INFO neutron.wsgi [None req-a76c0e0a-7bc9-4c57-94e9-fe92c402f768 tempest-RoutersTest-1419393395 tempest-RoutersTest-1419393395] 10.209.98.10 "PUT /v2.0/routers/68d7b71d-47e2-4b98-9f87-d972e2f3889d/add_router_interface HTTP/1.1" status: 200 len: 503 time: 102.8894622
15:50:31 this took a long time, so the client ended up with a timeout and retried the request
15:50:35 slaweq, I found the error for the IpAddressAlreadyAllocated
15:50:40 slaweq, https://bugs.launchpad.net/neutron/+bug/1880976
15:50:40 Launchpad bug 1880976 in neutron "[tempest] Error in "test_reuse_ip_address_with_other_fip_on_other_router" with duplicated floating IP" [Undecided,New]
15:50:44 reported 5 mins ago
15:51:04 but in the meantime it was allocated already, so the second request got a 409
15:51:15 so at least in this grenade job it's nothing really new
15:51:17 in a nutshell: between the FIP deletion and the creation again with the same IP address, another test requested a FIP
15:51:32 and the server gave the same IP
15:51:38 just a coincidence
15:51:42 ugh
15:51:52 I'll try to reduce the time between the deletion and the new creation
15:52:00 reusing the IP address
15:52:13 nice catch ralonsoh
15:52:22 good luck!
15:52:25 it's just unfortunate that with our ipam randomization work it got the same IP
15:52:35 yeah...
15:52:44 lower probability but still not 0%
15:52:53 bad roll of the dice
15:53:28 ralonsoh: so is it a conflict of FIP or fixed IP?
15:53:33 FIP
15:53:34 how does this test work?
15:53:39 simple
15:53:40 do we need to delete it?
15:53:45 2 vms with FIP
15:53:50 1 vm deleted and the FIP
15:54:09 then a VM is created and the FIP is created again, "reusing" the IP address
15:54:16 maybe we could just create the FIP, attach it to the first vm, detach it from the vm, attach it to the second vm?
15:54:33 that way the FIP would be "reserved" in the tenant for the whole test
15:54:37 am I right?
15:54:48 but we really need to delete it
15:54:55 one sec
15:55:00 it's in the test case description
15:55:24 https://github.com/openstack/neutron-tempest-plugin/blob/7b374486a54456d3c67fd2961c5894fb64ba48ab/neutron_tempest_plugin/scenario/test_floatingip.py#L518-L536
15:55:32 step 6
15:56:20 (btw, we SHOULD always document the test cases like this)
15:56:28 +1 for documenting
15:56:36 I agree
15:56:42 so clear, with those steps
15:57:12 and according to the test, I still don't see any reason why the FIP has to be deleted? if we detached it from the VM it would be just a DB record
15:57:33 you are right
15:57:35 so You don't need to "remember" its IP address but just reuse it later for VM3
15:57:42 we really don't need to delete the DB record
15:57:52 and this should be more stable IMHO
15:58:02 okidoki, I'll propose a patch
15:58:07 ralonsoh++ thx
15:58:15 nice
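(A minimal sketch of the idea agreed on above: keep the FIP and only move it between ports, instead of deleting it and re-creating it with the same address, which is what races with other tests' FIP allocations. The helper name below is made up for illustration, and the client is assumed to expose the usual update_floatingip() call from the tempest network clients; ralonsoh's actual patch may look different.)

    # Hedged sketch: while a floating IP exists (even when detached from any
    # port), its address stays allocated to this tenant, so no concurrent
    # test can be handed the same IP between "release" and "reuse".
    def move_fip_to_other_port(client, fip, new_port_id):
        """Reuse an existing FIP for another port without deleting it."""
        # Detach from the first VM's port instead of deleting the FIP.
        client.update_floatingip(fip['id'], port_id=None)
        # Re-attach the very same FIP to the second VM's port; the address in
        # fip['floating_ip_address'] is unchanged and can be reused directly
        # for connectivity checks.
        client.update_floatingip(fip['id'], port_id=new_port_id)
        return fip['floating_ip_address']

(The design point from the discussion is simply that detaching keeps the address reserved, so the test no longer needs to "remember" and re-create it.)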
15:58:25 and with that we are almost out of time
15:58:35 at the end one quick note about periodic jobs
15:58:42 all seems ok this week
15:58:45 just a question: isn't it possible to run these tests serially?
15:58:49 too much time?
15:58:53 our job with ovsdbapp from master is fine now
15:59:12 lajoskatona: yes, it would take too much time
15:59:19 ok
15:59:29 in the tempest-slow job tests are run serially but there are only a few of them
15:59:47 thx for attending the meeting, see You all tomorrow :)
15:59:51 o/
15:59:54 o/
15:59:55 #endmeeting