15:00:42 <slaweq> #startmeeting neutron_ci
15:00:43 <openstack> Meeting started Tue Jan 19 15:00:42 2021 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:44 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:46 <openstack> The meeting name has been set to 'neutron_ci'
15:00:46 <slaweq> hi
15:00:50 <jlibosva> o/
15:00:53 <ralonsoh> hi again
15:01:21 <slaweq> lets wait 2-3 more minutes for others to join :)
15:01:28 <obondarev> hi
15:02:20 <bcafarel> o/ again
15:03:10 <slaweq> ok, lets start
15:03:13 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:03:19 <slaweq> please open and we can move on :)
15:03:27 <slaweq> #topic Actions from previous meetings
15:03:35 <slaweq> ralonsoh will check fullstack test_min_bw_qos_port_removed issues
15:03:54 <ralonsoh> (I can't find the patches)
15:04:04 <ralonsoh> https://review.opendev.org/c/openstack/neutron/+/770799
15:04:17 <ralonsoh> delete the port to force the OVS agent to delete the QoS
15:04:51 <slaweq> this patch is merged already
15:04:52 <ralonsoh> sorry!!! https://review.opendev.org/c/openstack/neutron/+/770458
15:04:55 <ralonsoh> ^^ this patch
15:05:10 <slaweq> ok, this one is merged too
15:05:12 <slaweq> :)
15:05:14 <slaweq> thx ralonsoh
15:05:22 <slaweq> next one:
15:05:25 <slaweq> otherwiseguy to check fedora ovn periodic job issue
15:06:09 <lajoskatona> Hi
15:06:24 <slaweq> I think otherwiseguy is not here today
15:06:31 * otherwiseguy is here
15:06:34 <slaweq> :)
15:06:35 <otherwiseguy> on phone, sorry
15:06:38 <slaweq> sorry :)
15:09:02 <slaweq> I think otherwiseguy updated the LP bug related to that issue and he's on it
15:09:10 <slaweq> so we probably can move on
15:09:20 <slaweq> #topic #topic Stadium projects
15:09:25 <slaweq> #undo
15:09:25 <openstack> Removing item from minutes: #topic #topic Stadium projects
15:09:27 <slaweq> ##topic Stadium projects
15:09:29 <slaweq> #topic Stadium projects
15:09:42 <slaweq> lajoskatona: any updates about stadium projects ci?
15:09:56 <slaweq> except that midonet gate seems to be totally broken now :/
15:10:47 <lajoskatona> not much
15:11:06 <bcafarel> quick check at failed jobs for midonet, it seems the usual new pip resolver issues, "just" needs some requirements fixes
15:11:16 <lajoskatona> l-c jobs causing troubles like elsewhere, I proposed a patch for bgpvpn to make it non-voting
15:11:35 <otherwiseguy> ok, sorry about that. insurance agent called back. you have my attention now.
15:11:37 <slaweq> bcafarel: yeah, "just"
15:11:41 <lajoskatona> If I have time I will check odl, as I remember that one is in the same sad situation
15:11:55 <slaweq> ok, thx lajoskatona and bcafarel
15:12:00 <slaweq> I will open an LP bug for the midonet issue
15:12:11 <slaweq> and will ping midonet guys to take a look at it
15:12:36 <slaweq> otherwiseguy: nothing really important, we talked about Your action item to check the failing fedora job
15:12:53 <slaweq> I know You updated the LP bug so it's fine for now
15:13:06 <otherwiseguy> slaweq: ah, yeah I think that one was going to be fixed by one of lucas's patches
15:13:17 <slaweq> I hope so :)
15:14:06 <slaweq> ok, lets move on
15:14:08 <slaweq> #topic Stable branches
15:14:15 <slaweq> Victoria dashboard: https://grafana.opendev.org/d/HUCHup2Gz/neutron-failure-rate-previous-stable-release?orgId=1
15:14:17 <slaweq> Ussuri dashboard: https://grafana.opendev.org/d/smqHXphMk/neutron-failure-rate-older-stable-release?orgId=1
15:14:40 <bcafarel> a hybrid stadium/stable one: https://review.opendev.org/q/topic:%22oslo_lc_drop%22+status:open has a few patches pending stable reviews to show the way out of l-c
15:14:47 <slaweq> I think that stable branches are recently surprisingly stable :)
15:14:55 <ralonsoh> yeah
15:15:04 <bcafarel> and yes we are finally back on a normal merge routine in stable :)
15:16:32 <slaweq> so I guess we don't have anything special to discuss today regarding stable branches, right?
15:17:25 <slaweq> ok, lets move on
15:17:27 <bcafarel> nothing special, just hoping to get the open reviews back to a short list soon
15:17:59 <slaweq> #topic Grafana
15:18:02 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:18:10 * slaweq will be back in 5 minutes
15:20:20 <slaweq> ok, I'm back
15:20:22 <slaweq> sorry
15:20:40 <slaweq> so Grafana looks pretty ok IMO recently
15:21:01 <slaweq> I don't see any major issues
15:22:19 <slaweq> recently our CI is in pretty good shape IMO, many patches are merged even without rechecks :)
15:22:29 <slaweq> do You have anything regarding grafana to discuss today?
15:22:34 <ralonsoh> nope
15:22:42 <bcafarel> no, and not complaining!
15:22:43 <slaweq> or we can go to the next topics?
15:23:21 <slaweq> ok, so lets move on
15:23:27 <slaweq> #topic Tempest/Scenario
15:23:34 <slaweq> here I have a couple of topics to discuss today
15:23:48 <slaweq> We have many non-voting jobs, I proposed patch https://review.opendev.org/c/openstack/neutron/+/770832 to move those which are broken to the experimental queue for now
15:24:14 <slaweq> I did that because infra was overloaded recently and neutron is using a lot of resources for every patch
15:24:26 <slaweq> so maybe we can limit usage of those resources a bit
15:24:38 <slaweq> first by not running jobs which are failing all the time on every patch
15:24:40 <slaweq> wdyt?
15:24:46 <ralonsoh> we have 3 +2
15:25:01 <ralonsoh> maybe haleyb can check it again
15:25:01 <slaweq> I didn't notice that :)
15:25:06 <lajoskatona> good way forward
15:25:18 <haleyb> ?
15:25:25 <ralonsoh> haleyb, https://review.opendev.org/c/openstack/neutron/+/770832
15:25:31 <haleyb> ack, i can look
15:25:34 <ralonsoh> thanks
15:25:36 <slaweq> thx
15:25:44 <slaweq> thx lajoskatona for fixing the dvr job
15:25:53 <slaweq> I didn't touch this one for now
15:25:55 <ralonsoh> lajoskatona, thanks!
15:25:58 <haleyb> oh, the one with the strange job definition :)
15:26:03 <slaweq> lets see if it will finally be more stable
15:26:07 <slaweq> haleyb: yes
15:26:13 <lajoskatona> actually it was Oleg who found that 2 of my tests are failing on that job
15:26:33 <slaweq> thx obondarev too :)
15:26:56 <slaweq> another idea to limit used resources is to maybe move 3rd party jobs to the periodic queue
15:27:31 <slaweq> we will not run those jobs on every patch but still we should be able to check if we don't break other projects
15:27:55 <slaweq> IMO it may be a pretty good compromise between infra resources and coverage
15:27:57 <slaweq> wdyt?
15:28:21 <ralonsoh> what resources?
15:28:48 <ralonsoh> ok, I thought you were talking about an env resource
15:28:49 <ralonsoh> sorry
15:28:54 <haleyb> I just +W'd that, let me know if you change your mind now
15:29:01 <slaweq> ralonsoh: I'm talking about jobs like ironic-tempest-ipa-wholedisk-bios-agent_ipmitool-tinyipa (non-voting) or the openstacksdk job
15:29:18 <slaweq> which are non-voting jobs in our check queue
15:29:31 <ralonsoh> yeah, that could be a candidate to be moved
15:29:32 <slaweq> so each of them is using some infra resources
15:29:53 <slaweq> IMO running them once per day would be pretty ok and we would have fewer jobs run in the check queue
15:30:12 <ralonsoh> (yeah, I was thinking about a VM resource or daemon or something, not zuul resources)
15:30:21 <slaweq> :)
15:30:27 <slaweq> sorry for not being clear
15:30:57 <slaweq> so if You are ok with it, I will propose a patch today
15:31:01 <ralonsoh> cool
15:31:47 <lajoskatona> +1
15:31:55 <slaweq> thx
15:32:11 <slaweq> #action slaweq to move 3rd-party jobs to periodic queue
15:32:20 <slaweq> and one more thing about our CI jobs
15:32:25 <slaweq> I proposed https://review.opendev.org/c/openstack/neutron/+/770630
15:32:37 <slaweq> to change some tempest jobs to be "neutron-tempest" jobs
15:32:51 <slaweq> and to not run e.g. cinder services there as we don't really need them
15:33:07 <slaweq> so our jobs may be a bit faster and hopefully more stable
15:33:12 <ralonsoh> agree
15:33:21 <slaweq> please check that patch and tell me what You think about it
15:34:31 <slaweq> and that's all regarding scenario jobs for today from me
15:34:40 <slaweq> do You have anything else or can we move on?
15:36:08 <slaweq> so, let's move on
15:36:10 <slaweq> #topic fullstack/functional
15:36:30 <slaweq> the functional tests job is still the most unstable one now :/
15:36:35 <slaweq> I found a few issues there
15:36:37 <slaweq> like e.g.:
15:36:46 <slaweq> https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_e3b/760967/9/check/neutron-functional-with-uwsgi/e3b913c/testr_results.html
15:36:52 <slaweq> which is a db migration failure
15:37:08 <slaweq> ralonsoh: I think You proposed some patch to improve those db migration tests
15:37:15 <slaweq> do You think it can solve that problem?
15:37:22 <ralonsoh> one sec
15:37:50 <ralonsoh> https://review.opendev.org/c/openstack/neutron/+/770935
15:37:56 <ralonsoh> that will help
15:38:22 <slaweq> great, this patch is merged already
15:38:34 <slaweq> so hopefully we will not see such failures again :)
15:38:36 <slaweq> thx ralonsoh
15:38:41 <ralonsoh> yw
15:39:17 <slaweq> next one is a failure in neutron.tests.functional.agent.l3.test_dvr_router.TestDvrRouter.test_dvr_router_lifecycle_ha_with_snat_with_fips
15:39:29 <slaweq> https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_44b/763231/13/check/neutron-functional-with-uwsgi/44b98e8/testr_results.html
15:39:41 <ralonsoh> that's killing me
15:39:46 <ralonsoh> that's why I added the debug logs
15:39:54 <ralonsoh> we kill -9 the process
15:40:03 <ralonsoh> and remove (with root) the PID file
15:40:21 <ralonsoh> and the PID file is still present... I don't understand this
15:41:04 <slaweq> strange
15:42:21 <ralonsoh> ok, I'll check if, because this command is being executed as root, the parent process is still present
15:42:29 <ralonsoh> maybe this process has the file open
15:42:32 <ralonsoh> maybe...
15:42:39 <slaweq> can be
15:42:45 <ralonsoh> I'll check it
15:42:55 <slaweq> thx
15:43:14 <slaweq> #action ralonsoh to check problem with (not) deleted pid files in functional tests
15:43:32 <slaweq> and in the same job, there was also another test failed
15:43:47 <slaweq> neutron.tests.functional.sanity.test_sanity.SanityTestCaseRoot.test_keepalived_ipv6_support
15:43:55 <slaweq> did You see such failures already?
15:44:06 <ralonsoh> ? never
15:44:19 <ralonsoh> logs?
15:45:04 <slaweq> https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_44b/763231/13/check/neutron-functional-with-uwsgi/44b98e8/controller/logs/dsvm-functional-logs/neutron.tests.functional.sanity.test_sanity.SanityTestCaseRoot.test_keepalived_ipv6_support.txt
15:45:16 <slaweq> in the logs from this file I see errors related to the ovn hash_ring
15:45:28 <slaweq> but I'm not sure if that is the root cause of the failure or maybe a red herring
15:45:49 <ralonsoh> that problem was solved, I think
15:45:57 <slaweq> when?
15:46:04 <ralonsoh> let me find it
15:46:04 <slaweq> that failure is from 17.01
15:46:58 <slaweq> thx
15:47:20 <ralonsoh> https://review.opendev.org/c/openstack/neutron/+/765874
15:47:41 <ralonsoh> that was supposed to solve this issue
15:48:03 <slaweq> so it didn't :/
15:48:14 <slaweq> but maybe it's just a problem with the functional test
15:48:20 <ralonsoh> could be
15:48:21 <slaweq> maybe we should mock something there?
15:48:37 <ralonsoh> let me open an LP bug and then I'll check it
15:48:44 <slaweq> why do we need the periodic task to be run in the functional test?
15:49:09 <ralonsoh> you are right
15:49:12 <ralonsoh> we don't
15:50:13 <slaweq> do You want me to check it? or will You take a look?
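[editor's note] The fix agreed on above, keeping the periodic task from running inside functional tests, is essentially a standard mock.patch setup. A minimal self-contained sketch of that idea follows; the class and method names are illustrative stand-ins, not the actual neutron/OVN symbols:

```python
import unittest
from unittest import mock


class MaintenanceWorker:
    """Stand-in for a worker that fires a periodic task (hypothetical
    name; in the real failure the task touched the OVN hash_ring)."""

    def periodic_task(self):
        raise RuntimeError("periodic task ran inside a functional test")

    def do_work(self):
        self.periodic_task()
        return "work done"


class TestWithoutPeriodicTask(unittest.TestCase):
    def setUp(self):
        super().setUp()
        # Patch the periodic task away for the lifetime of the test,
        # so it cannot interfere with what the test actually verifies.
        patcher = mock.patch.object(MaintenanceWorker, "periodic_task")
        self.mock_task = patcher.start()
        self.addCleanup(patcher.stop)

    def test_do_work_without_periodic_task(self):
        # do_work() succeeds because the task was replaced by a mock.
        self.assertEqual(MaintenanceWorker().do_work(), "work done")
        self.mock_task.assert_called_once_with()
```

In a real test case the patch target would be the actual maintenance task, started in setUp() and undone via addCleanup() exactly as here, so every test in the class runs with the task disabled.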
15:50:22 <ralonsoh> I'll check it
15:50:27 <slaweq> thx
15:50:48 <slaweq> #action ralonsoh to check failing periodic task in functional test
15:51:09 <slaweq> ok, another one
15:51:16 <slaweq> related to a non-existing interface this time:
15:51:18 <slaweq> https://b50e3eeb8d199503f863-bb8dadf314ca143f13ef83e8dbc65d1a.ssl.cf5.rackcdn.com/764401/14/check/neutron-functional-with-uwsgi/b4e3ebe/testr_results.html
15:52:11 <ralonsoh> that's weird
15:53:13 <slaweq> seems like maybe we should wait a bit longer for the interface to be created before we start setting attributes for it?
15:53:48 <ralonsoh> hmmm the creating command should return only when the interface is created and accessible
15:55:13 <slaweq> I will check logs from that test and that method
15:55:19 <slaweq> maybe I will find something
15:56:03 <slaweq> #action slaweq to check missing interface in namespace in functional tests
15:56:37 <slaweq> and the last one related to the functional tests
15:56:39 <ralonsoh> one more, related to functional/fullstack (but already resolved)
15:56:42 <ralonsoh> #link https://review.opendev.org/c/openstack/neutron/+/771436
15:56:48 <slaweq> this time ovn
15:57:10 <ralonsoh> (sorry)
15:57:10 <slaweq> thx ralonsoh, I will review it today
15:57:13 <slaweq> np :)
15:57:24 <slaweq> ok, last one from me
15:57:26 <slaweq> neutron.tests.functional.services.ovn_l3.test_plugin.TestRouter.test_gateway_chassis_rebalance
15:57:29 <slaweq> https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_1f1/764433/7/gate/neutron-functional-with-uwsgi/1f14b81/testr_results.html
15:57:36 <slaweq> rings a bell for You?
15:57:50 <ralonsoh> let me check
15:58:43 <ralonsoh> I think the waiting event should be declared before the command
15:59:16 <ralonsoh> so the port creating event should be added to the watch list before the _create_router call
15:59:25 <slaweq> ralonsoh: will You propose a patch for that?
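[editor's note] The race ralonsoh describes above is a generic register-before-trigger problem: if the test declares its waiting event only after issuing the command, a fast backend can emit the notification before anyone is watching, and the wait times out. A minimal sketch of that ordering, with a toy WatchList standing in for the real OVSDB event machinery (all names here are illustrative):

```python
import threading


class WatchList:
    """Toy event watch list; not neutron's actual implementation."""

    def __init__(self):
        self._events = {}

    def register(self, name):
        # Declare interest in a named event and get something to wait on.
        ev = threading.Event()
        self._events[name] = ev
        return ev

    def notify(self, name):
        # A notification for a name nobody registered is simply lost --
        # which is exactly the race hit when the event is declared only
        # after the command has already run.
        ev = self._events.get(name)
        if ev is not None:
            ev.set()


def create_router(watch):
    # Stand-in for the _create_router call: the backend reacts at once
    # and emits the port-create notification immediately.
    watch.notify("port-create")


watch = WatchList()
event = watch.register("port-create")  # declare the event first...
create_router(watch)                   # ...then issue the command
assert event.wait(timeout=1)           # the notification is not missed
```

Swapping the last two statements (command first, register second) loses the notification and the wait times out, which is the flavor of failure seen in test_gateway_chassis_rebalance.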
15:59:32 <ralonsoh> I'll check it now
15:59:44 <slaweq> thx
15:59:53 <slaweq> #action ralonsoh to check failing neutron.tests.functional.services.ovn_l3.test_plugin.TestRouter.test_gateway_chassis_rebalance test
16:00:02 <slaweq> so, that's all for today
16:00:04 <bcafarel> last action, right on time :)
16:00:10 <slaweq> thx for attending the meeting
16:00:11 <ralonsoh> bye!
16:00:13 <slaweq> o/
16:00:14 <bcafarel> o/
16:00:16 <slaweq> #endmeeting