15:00:42 #startmeeting neutron_ci
15:00:43 Meeting started Tue Jan 19 15:00:42 2021 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:44 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:46 The meeting name has been set to 'neutron_ci'
15:00:46 hi
15:00:50 o/
15:00:53 hi again
15:01:21 let's wait 2-3 more minutes for others to join :)
15:01:28 hi
15:02:20 o/ again
15:03:10 ok, let's start
15:03:13 Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:03:19 please open and we can move on :)
15:03:27 #topic Actions from previous meetings
15:03:35 ralonsoh will check fullstack test_min_bw_qos_port_removed issues
15:03:54 (I can't find the patches)
15:04:04 https://review.opendev.org/c/openstack/neutron/+/770799
15:04:17 delete the port to force the OVS agent to delete the QoS
15:04:51 this patch is merged already
15:04:52 sorry!!! https://review.opendev.org/c/openstack/neutron/+/770458
15:04:55 ^^ this patch
15:05:10 ok, this one is merged too
15:05:12 :)
15:05:14 thx ralonsoh
15:05:22 next one:
15:05:25 otherwiseguy to check fedora ovn periodic job issue
15:06:09 Hi
15:06:24 I think otherwiseguy is not here today
15:06:31 * otherwiseguy is here
15:06:34 :)
15:06:35 on phone, sorry
15:06:38 sorry :)
15:09:02 I think otherwiseguy updated the LP related to that issue and he's on it
15:09:10 so we probably can move on
15:09:20 #topic #topic Stadium projects
15:09:25 #undo
15:09:25 Removing item from minutes: #topic #topic Stadium projects
15:09:27 ##topic Stadium projects
15:09:29 #topic Stadium projects
15:09:42 lajoskatona: any updates about stadium projects ci?
15:09:56 except that the midonet gate seems to be totally broken now :/
15:10:47 not much
15:11:06 a quick check of the failed midonet jobs: it seems to be the usual new pip resolver issues, it "just" needs some requirements fixes
15:11:16 l-c jobs are causing troubles like elsewhere, I proposed a patch for bgpvpn to make them non-voting
15:11:35 ok, sorry about that. insurance agent called back. you have my attention now.
15:11:37 bcafarel: yeah, "just"
15:11:41 If I have time I'll check odl, as I remember that one is in the same sad situation
15:11:55 ok, thx lajoskatona and bcafarel
15:12:00 I will open an LP for the midonet issue
15:12:11 and will ping the midonet guys to take a look at it
15:12:36 otherwiseguy: nothing really important, we talked about Your action item to check the failing fedora job
15:12:53 I know You updated the LP so it's fine for now
15:13:06 slaweq: ah, yeah I think that one was going to be fixed by one of lucas's patches
15:13:17 I hope so :)
15:14:06 ok, let's move on
15:14:08 #topic Stable branches
15:14:15 Victoria dashboard: https://grafana.opendev.org/d/HUCHup2Gz/neutron-failure-rate-previous-stable-release?orgId=1
15:14:17 Ussuri dashboard: https://grafana.opendev.org/d/smqHXphMk/neutron-failure-rate-older-stable-release?orgId=1
15:14:40 a hybrid stadium/stable topic https://review.opendev.org/q/topic:%2522oslo_lc_drop%2522+status:open has a few patches pending stable reviews to show the way out of l-c
15:14:47 I think that the stable branches are surprisingly stable recently :)
15:14:55 yeah
15:15:04 and yes we are finally back on the normal merge routine in stable :)
15:16:32 so I guess we don't have anything special to discuss today regarding stable branches, right?
15:17:25 ok, let's move on
15:17:27 nothing special, just hoping to get the opened reviews back to a short list soon
15:17:59 #topic Grafana
15:18:02 #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:18:10 * slaweq will be back in 5 minutes
15:20:20 ok, I'm back
15:20:22 sorry
15:20:40 so Grafana looks pretty ok IMO recently
15:21:01 I don't see any major issues
15:22:19 recently our CI is in pretty good shape IMO, many patches are merged even without rechecks :)
15:22:29 do You have anything regarding grafana to discuss today?
15:22:34 nope
15:22:42 no, and not complaining!
15:22:43 or can we go to the next topics?
15:23:21 ok, so let's move on
15:23:27 #topic Tempest/Scenario
15:23:34 here I have a couple of topics to discuss today
15:23:48 We have many non-voting jobs, I proposed patch https://review.opendev.org/c/openstack/neutron/+/770832 to move those which are broken to the experimental queue for now
15:24:14 I did that because infra was overloaded recently and neutron is using a lot of resources for every patch
15:24:26 so maybe we can limit usage of those resources a bit
15:24:38 first by not running, on every patch, jobs which are failing all the time
15:24:40 wdyt?
15:24:46 we have 3 +2
15:25:01 maybe haleyb can check it again
15:25:01 I didn't notice that :)
15:25:06 good way forward
15:25:18 ?
15:25:25 haleyb, https://review.opendev.org/c/openstack/neutron/+/770832
15:25:31 ack, i can look
15:25:34 thanks
15:25:36 thx
15:25:44 thx lajoskatona for fixing the dvr job
15:25:53 I didn't touch this one for now
15:25:55 lajoskatona, thanks!
15:25:58 oh, the one with the strange job definition :)
15:26:03 let's see if it will finally be more stable
15:26:07 haleyb: yes
15:26:13 actually it was Oleg who found that 2 of my tests are failing on that job
15:26:33 thx obondarev too :)
15:26:56 another idea to limit used resources is to maybe move 3rd party jobs to the periodic queue
15:27:31 we will not run those jobs on every patch but we should still be able to check that we don't break other projects
15:27:55 IMO it may be a pretty good compromise between infra resources and coverage
15:27:57 wdyt?
15:28:21 what resources?
15:28:48 ok, I thought you were talking about an env resource
15:28:49 sorry
15:28:54 I just +W'd that, let me know if you change your mind now
15:29:01 ralonsoh: I'm talking about jobs like ironic-tempest-ipa-wholedisk-bios-agent_ipmitool-tinyipa (non-voting) or the openstacksdk job
15:29:18 which are non-voting jobs in our check queue
15:29:31 yeah, that could be a candidate to be moved
15:29:32 so each of them is using some infra resources
15:29:53 IMO running them once per day would be pretty ok and we would have fewer jobs run in the check queue
15:30:12 (yeah, I was thinking about a VM resource or daemon or something, not zuul resources)
15:30:21 :)
15:30:27 sorry for not being clear
15:30:57 so if You are ok with it, I will propose a patch today
15:31:01 cool
15:31:47 +1
15:31:55 thx
15:32:11 #action slaweq to move 3rd-party jobs to periodic queue
15:32:20 and one more thing about our CI jobs
15:32:25 I proposed https://review.opendev.org/c/openstack/neutron/+/770630
15:32:37 to change some tempest jobs to be "neutron-tempest" jobs
15:32:51 and to not run e.g. cinder services there as we don't really need them
15:33:07 so our jobs may be a bit faster and hopefully more stable
15:33:12 agree
15:33:21 please check that patch and tell me what You think about it
15:34:31 and that's all regarding scenario jobs for today from me
15:34:40 do You have anything else or can we move on?
15:36:08 so, let's move on
15:36:10 #topic fullstack/functional
15:36:30 the functional tests job is still the most unstable one now :/
15:36:35 I found a few issues there
15:36:37 like e.g.:
15:36:46 https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_e3b/760967/9/check/neutron-functional-with-uwsgi/e3b913c/testr_results.html
15:36:52 which is a db migration failure
15:37:08 ralonsoh: I think You proposed some patch to improve those db migration tests
15:37:15 do You think it can solve that problem?
15:37:22 one sec
15:37:50 https://review.opendev.org/c/openstack/neutron/+/770935
15:37:56 that will help
15:38:22 great, this patch is merged already
15:38:34 so hopefully we will not see such failures again :)
15:38:36 thx ralonsoh
15:38:41 yw
15:39:17 next one is a failure in neutron.tests.functional.agent.l3.test_dvr_router.TestDvrRouter.test_dvr_router_lifecycle_ha_with_snat_with_fips
15:39:29 https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_44b/763231/13/check/neutron-functional-with-uwsgi/44b98e8/testr_results.html
15:39:41 that's killing me
15:39:46 that's why I added the debug logs
15:39:54 we kill -9 the process
15:40:03 and remove (with root) the PID file
15:40:21 and the PID file is still present... I don't understand this
15:41:04 strange
15:42:21 ok, I'll check if, because this command is being executed as root, the parent process is still present
15:42:29 maybe this process has the file open
15:42:32 maybe...
15:42:39 can be
15:42:45 I'll check it
15:42:55 thx
15:43:14 #action ralonsoh to check problem with (not) deleted pid files in functional tests
15:43:32 and in the same job, there was also another test that failed
15:43:47 neutron.tests.functional.sanity.test_sanity.SanityTestCaseRoot.test_keepalived_ipv6_support
15:43:55 did You see such failures already?
15:44:06 ? never
15:44:19 logs?
15:45:04 https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_44b/763231/13/check/neutron-functional-with-uwsgi/44b98e8/controller/logs/dsvm-functional-logs/neutron.tests.functional.sanity.test_sanity.SanityTestCaseRoot.test_keepalived_ipv6_support.txt
15:45:16 in the logs from this file I see errors related to the ovn hash_ring
15:45:28 but I'm not sure if that is the root cause of the failure or maybe a red herring
15:45:49 that problem was solved, I think
15:45:57 when?
15:46:04 let me find it
15:46:04 that failure is from 17.01
15:46:58 thx
15:47:20 https://review.opendev.org/c/openstack/neutron/+/765874
15:47:41 that was supposed to solve this issue
15:48:03 so it didn't :/
15:48:14 but maybe it's just a problem with the functional test
15:48:20 could be
15:48:21 maybe we should mock something there?
15:48:37 let me open a LP and then I'll check it
15:48:44 why do we need the periodic task to be run in the functional test?
15:49:09 you are right
15:49:12 we don't
15:50:13 do You want me to check it? or will You take a look?
15:50:22 I'll check it
15:50:27 thx
15:50:48 #action ralonsoh to check failing periodic task in functional test
15:51:09 ok, another one
15:51:16 related to a non-existing interface this time:
15:51:18 https://b50e3eeb8d199503f863-bb8dadf314ca143f13ef83e8dbc65d1a.ssl.cf5.rackcdn.com/764401/14/check/neutron-functional-with-uwsgi/b4e3ebe/testr_results.html
15:52:11 that's weird
15:53:13 seems like maybe we should wait a bit longer for the interface to be created before we start setting attributes for it?
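(A short aside on the PID-file discussion above, before the conversation continues: if the removal done through the root helper turns out not to be immediately visible to the test process, one possible mitigation is to poll until the file is actually gone rather than checking right after the kill. The sketch below is standard-library only; the function name, arguments and timeout are made up for illustration, and in Neutron's functional tests a helper such as neutron.common.utils.wait_until_true would be the natural way to express the polling loop.)

    import os
    import signal
    import time


    def kill_and_wait_for_pid_file(pid, pid_file, timeout=5):
        """Kill a process and poll until its PID file is really gone."""
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the process already exited
        # In the real test the file is removed via a root helper; a plain
        # unlink stands in for that step here.
        try:
            os.remove(pid_file)
        except FileNotFoundError:
            pass
        # Do not assume the removal is visible right away - poll for it.
        deadline = time.monotonic() + timeout
        while os.path.exists(pid_file):
            if time.monotonic() > deadline:
                raise RuntimeError('PID file %s still present after %ss'
                                   % (pid_file, timeout))
            time.sleep(0.2)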
15:53:48 hmmm, the creating command should return only when the interface is created and accessible
15:55:13 I will check the logs from that test and that method
15:55:19 maybe I will find something
15:56:03 #action slaweq to check missing interface in namespace in functional tests
15:56:37 and the last one related to the functional tests
15:56:39 one more, related to functional/fullstack (but already resolved)
15:56:42 #link https://review.opendev.org/c/openstack/neutron/+/771436
15:56:48 this time ovn
15:57:10 (sorry)
15:57:10 thx ralonsoh, I will review it today
15:57:13 np :)
15:57:24 ok, last one from me
15:57:26 neutron.tests.functional.services.ovn_l3.test_plugin.TestRouter.test_gateway_chassis_rebalance
15:57:29 https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_1f1/764433/7/gate/neutron-functional-with-uwsgi/1f14b81/testr_results.html
15:57:36 does it ring a bell for You?
15:57:50 let me check
15:58:43 I think the waiting event should be declared before the command
15:59:16 so the port creation event should be added to the watch list before the _create_router call
15:59:25 ralonsoh: will You propose a patch for that?
15:59:32 I'll check it now
15:59:44 thx
15:59:53 #action ralonsoh to check failing neutron.tests.functional.services.ovn_l3.test_plugin.TestRouter.test_gateway_chassis_rebalance test
16:00:02 so, that's all for today
16:00:04 last action, right on time :)
16:00:10 thx for attending the meeting
16:00:11 bye!
16:00:13 o/
16:00:14 o/
16:00:16 #endmeeting
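(On the event ordering mentioned for test_gateway_chassis_rebalance above: the idea is to register the row-watch with the IDL before issuing the command that creates the row, so the notification cannot be missed, and only then block on it. The sketch below is a rough illustration under the assumption that the test's IDL exposes ovsdbapp's notify_handler, as the OVN backends do; the table, condition and helper names are placeholders, not the real test's code.)

    import threading

    from ovsdbapp import event as ovsdb_event


    class RowCreatedEvent(ovsdb_event.RowEvent):
        """Set a flag when a matching row shows up in the given table."""

        def __init__(self, table, conditions, timeout=10):
            super().__init__((self.ROW_CREATE,), table, conditions)
            self.flag = threading.Event()
            self.timeout = timeout

        def run(self, event, row, old):
            self.flag.set()

        def wait(self):
            return self.flag.wait(self.timeout)


    def create_router_and_wait_for_lrp(idl, create_router):
        # 1) register the watch *before* the action that creates the row
        lrp_event = RowCreatedEvent('Logical_Router_Port',
                                    (('name', '=', 'lrp-placeholder'),))
        idl.notify_handler.watch_event(lrp_event)
        # 2) only now trigger the command (_create_router in the real test)
        create_router()
        # 3) a timeout here means the row never appeared, not that the watch
        #    was registered too late and the notification was missed
        if not lrp_event.wait():
            raise AssertionError('Logical_Router_Port was not created')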