15:00:42 <slaweq> #startmeeting neutron_ci
15:00:43 <openstack> Meeting started Tue Jan 19 15:00:42 2021 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:44 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:46 <openstack> The meeting name has been set to 'neutron_ci'
15:00:46 <slaweq> hi
15:00:50 <jlibosva> o/
15:00:53 <ralonsoh> hi again
15:01:21 <slaweq> lets wait 2-3 more minutes for others to join :)
15:01:28 <obondarev> hi
15:02:20 <bcafarel> o/ again
15:03:10 <slaweq> ok, lets start
15:03:13 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:03:19 <slaweq> please open and we can move on :)
15:03:27 <slaweq> #topic Actions from previous meetings
15:03:35 <slaweq> ralonsoh will check fullstack test_min_bw_qos_port_removed issues
15:03:54 <ralonsoh> (I can't find the patches)
15:04:04 <ralonsoh> https://review.opendev.org/c/openstack/neutron/+/770799
15:04:17 <ralonsoh> delete the port to force the OVS agent to delete the QoS
15:04:51 <slaweq> this patch is merged already
15:04:52 <ralonsoh> sorry!!! https://review.opendev.org/c/openstack/neutron/+/770458
15:04:55 <ralonsoh> ^^ this patch
15:05:10 <slaweq> ok, this one is merged too
15:05:12 <slaweq> :)
15:05:14 <slaweq> thx ralonsoh
15:05:22 <slaweq> next one:
15:05:25 <slaweq> otherwiseguy to check fedora ovn periodic job issue
15:06:09 <lajoskatona> Hi
15:06:24 <slaweq> I think otherwiseguy is not here today
15:06:31 * otherwiseguy is here
15:06:34 <slaweq> :)
15:06:35 <otherwiseguy> on phone, sorry
15:06:38 <slaweq> sorry :)
15:09:02 <slaweq> I think otherwiseguy updated LP related to that issue and he's on it
15:09:10 <slaweq> so we probably can move on
15:09:20 <slaweq> #topic #topic Stadium projects
15:09:25 <slaweq> #undo
15:09:25 <openstack> Removing item from minutes: #topic #topic Stadium projects
15:09:27 <slaweq> ##topic Stadium projects
15:09:29 <slaweq> #topic Stadium projects
15:09:42 <slaweq> lajoskatona: any updates about stadium projects ci?
15:09:56 <slaweq> except that midonet gate seems to be totally broken now :/
15:10:47 <lajoskatona> not much
15:11:06 <bcafarel> a quick check at the failed midonet jobs suggests the usual new pip resolver issues, "just" needs some requirements fixes
15:11:16 <lajoskatona> l-c jobs are causing troubles like elsewhere, I proposed a patch for bgpvpn to make it non-voting
15:11:35 <otherwiseguy> ok, sorry about that. insurance agent called back. you have my attention now.
15:11:37 <slaweq> bcafarel: yeah, "just"
15:11:41 <lajoskatona> If I have time I'll check odl, as I remember that one is in the same sad situation
15:11:55 <slaweq> ok, thx lajoskatona and bcafarel
15:12:00 <slaweq> I will open LP for midonet issue
15:12:11 <slaweq> and will ping midonet guys to take a look at it
15:12:36 <slaweq> otherwiseguy: nothing really important, we talked about Your action item to check the failing fedora job
15:12:53 <slaweq> I know You updated LP so it's fine for now
15:13:06 <otherwiseguy> slaweq: ah, yeah I think that one was going to be fixed by one of lucas's patches
15:13:17 <slaweq> I hope so :)
15:14:06 <slaweq> ok, lets move on
15:14:08 <slaweq> #topic Stable branches
15:14:15 <slaweq> Victoria dashboard: https://grafana.opendev.org/d/HUCHup2Gz/neutron-failure-rate-previous-stable-release?orgId=1
15:14:17 <slaweq> Ussuri dashboard: https://grafana.opendev.org/d/smqHXphMk/neutron-failure-rate-older-stable-release?orgId=1
15:14:40 <bcafarel> a hybrid stadium/stable https://review.opendev.org/q/topic:%2522oslo_lc_drop%2522+status:open has a few patches pending stable reviews to show the way out to l-c
15:14:47 <slaweq> I think that stable branches are recently surprisingly stable :)
15:14:55 <ralonsoh> yeah
15:15:04 <bcafarel> and yes we are finally back on normal merge routine in stable :)
15:16:32 <slaweq> so I guess we don't have anything special to discuss today regarding stable branches, right?
15:17:25 <slaweq> ok, lets move on
15:17:27 <bcafarel> nothing special, just hoping to get the open reviews back to a short list soon
15:17:59 <slaweq> #topic Grafana
15:18:02 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:18:10 * slaweq will be back in 5 minutes
15:20:20 <slaweq> ok, I'm back
15:20:22 <slaweq> sorry
15:20:40 <slaweq> so Grafana looks pretty ok IMO recently
15:21:01 <slaweq> I don't see any major issues
15:22:19 <slaweq> recently our CI is in pretty good shape IMO, many patches are merged even without rechecks :)
15:22:29 <slaweq> do You have anything regarding grafana to discuss today?
15:22:34 <ralonsoh> nope
15:22:42 <bcafarel> no, and not complaining!
15:22:43 <slaweq> or we can go to the next topics?
15:23:21 <slaweq> ok, so lets move on
15:23:27 <slaweq> #topic Tempest/Scenario
15:23:34 <slaweq> here I have a couple of topics to discuss today
15:23:48 <slaweq> We have many non-voting jobs, I proposed patch https://review.opendev.org/c/openstack/neutron/+/770832 to move those which are broken to the experimental queue for now
15:24:14 <slaweq> I did that because infra was overloaded recently and neutron is using a lot of resources for every patch
15:24:26 <slaweq> so maybe we can limit usage of those resources a bit
15:24:38 <slaweq> first by not running on every patch those jobs which are failing all the time
15:24:40 <slaweq> wdyt?
15:24:46 <ralonsoh> we have 3 +2
15:25:01 <ralonsoh> maybe haleyb can check it again
15:25:01 <slaweq> I didn't notice that :)
15:25:06 <lajoskatona> good way forward
15:25:18 <haleyb> ?
15:25:25 <ralonsoh> haleyb, https://review.opendev.org/c/openstack/neutron/+/770832
15:25:31 <haleyb> ack, i can look
15:25:34 <ralonsoh> thanks
15:25:36 <slaweq> thx
15:25:44 <slaweq> thx lajoskatona for fixing dvr job
15:25:53 <slaweq> I didn't touch this one for now
15:25:55 <ralonsoh> lajoskatona, thanks!
15:25:58 <haleyb> oh, the one with the strange job definition :)
15:26:03 <slaweq> lets see if it will finally be more stable
15:26:07 <slaweq> haleyb: yes
15:26:13 <lajoskatona> actually it was Oleg who found that 2 of my tests are failing on that job
15:26:33 <slaweq> thx obondarev too :)
15:26:56 <slaweq> another idea to limit used resources is to maybe move 3rd party jobs to periodic queue
15:27:31 <slaweq> we will not run those jobs on every patch but we should still be able to check that we don't break other projects
15:27:55 <slaweq> IMO it may be pretty good compromise between infra resources and coverage
15:27:57 <slaweq> wdyt?
15:28:21 <ralonsoh> what resources?
15:28:48 <ralonsoh> ok, I thought you were talking about an env resource
15:28:49 <ralonsoh> sorry
15:28:54 <haleyb> I just +W'd that, let me know if you change your mind now
15:29:01 <slaweq> ralonsoh: I'm talking about jobs like ironic-tempest-ipa-wholedisk-bios-agent_ipmitool-tinyipa (non-voting) or the openstacksdk job
15:29:18 <slaweq> which are non-voting jobs in our check queue
15:29:31 <ralonsoh> yeah, that could be a candidate to be moved
15:29:32 <slaweq> so each of them is using some infra resources
15:29:53 <slaweq> IMO running them once per day would be pretty ok and we would have fewer jobs run in the check queue
15:30:12 <ralonsoh> (yeah, I was thinking about a VM resource or daemon or something, not zuul resources)
15:30:21 <slaweq> :)
15:30:27 <slaweq> sorry for not being clear
15:30:57 <slaweq> so if You are ok with it, I will propose patch today
15:31:01 <ralonsoh> cool
15:31:47 <lajoskatona> +1
15:31:55 <slaweq> thx
15:32:11 <slaweq> #action slaweq to move 3rd-party jobs to periodic queue
15:32:20 <slaweq> and one more thing about our CI jobs
15:32:25 <slaweq> I proposed https://review.opendev.org/c/openstack/neutron/+/770630
15:32:37 <slaweq> to change some tempest jobs to be "neutron-tempest" jobs
15:32:51 <slaweq> and to not run e.g. cinder services there as we don't really need them
15:33:07 <slaweq> so our jobs may be a bit faster and more stable hopefully
15:33:12 <ralonsoh> agree
15:33:21 <slaweq> please check that patch and tell me what You think about it
15:34:31 <slaweq> and that's all regarding scenario jobs for today from me
15:34:40 <slaweq> do You have anything else or can we move on?
15:36:08 <slaweq> so, let's move on
15:36:10 <slaweq> #topic fullstack/functional
15:36:30 <slaweq> the functional tests job is still the most unstable one now :/
15:36:35 <slaweq> I found few issues there
15:36:37 <slaweq> like e.g.:
15:36:46 <slaweq> https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_e3b/760967/9/check/neutron-functional-with-uwsgi/e3b913c/testr_results.html
15:36:52 <slaweq> which is db migration failure
15:37:08 <slaweq> ralonsoh: I think You proposed some patch to improve those db migration tests
15:37:15 <slaweq> do You think it can solve that problem?
15:37:22 <ralonsoh> one sec
15:37:50 <ralonsoh> https://review.opendev.org/c/openstack/neutron/+/770935
15:37:56 <ralonsoh> that will help
15:38:22 <slaweq> great, this patch is merged already
15:38:34 <slaweq> so hopefully we will not see such failures again :)
15:38:36 <slaweq> thx ralonsoh
15:38:41 <ralonsoh> yw
15:39:17 <slaweq> next one is a failure in neutron.tests.functional.agent.l3.test_dvr_router.TestDvrRouter.test_dvr_router_lifecycle_ha_with_snat_with_fips
15:39:29 <slaweq> https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_44b/763231/13/check/neutron-functional-with-uwsgi/44b98e8/testr_results.html
15:39:41 <ralonsoh> that's killing me
15:39:46 <ralonsoh> that's why I added the debug logs
15:39:54 <ralonsoh> we kill -9 the process
15:40:03 <ralonsoh> and remove (with root) the PID file
15:40:21 <ralonsoh> and the PID file is still present... I don't understand this
15:41:04 <slaweq> strange
15:42:21 <ralonsoh> ok, I'll check if, because this command is being executed as root, the parent process is still present
15:42:29 <ralonsoh> maybe this process has the file open
15:42:32 <ralonsoh> maybe...
15:42:39 <slaweq> can be
15:42:45 <ralonsoh> I'll check it
15:42:55 <slaweq> thx
15:43:14 <slaweq> #action ralonsoh to check problem with (not) deleted pid files in functional tests
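A minimal sketch of the kind of check being discussed, assuming a hypothetical pid and pid_file path (placeholders, not names from the neutron tree): wait until the killed process and its pid file are both gone, within a timeout.

    import os
    import time


    def wait_for_process_and_pid_file_gone(pid, pid_file, timeout=10):
        """Illustrative helper only; pid and pid_file are placeholders."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            try:
                os.kill(pid, 0)        # signal 0 only checks that the pid exists
                process_alive = True
            except ProcessLookupError:
                process_alive = False
            except PermissionError:
                # pid exists but belongs to another user (e.g. a root-owned child)
                process_alive = True
            if not process_alive and not os.path.exists(pid_file):
                return True
            time.sleep(0.5)
        return False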
15:43:32 <slaweq> and in the same job, there was also another test that failed
15:43:47 <slaweq> neutron.tests.functional.sanity.test_sanity.SanityTestCaseRoot.test_keepalived_ipv6_support
15:43:55 <slaweq> did You see such failures already?
15:44:06 <ralonsoh> ? never
15:44:19 <ralonsoh> logs?
15:45:04 <slaweq> https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_44b/763231/13/check/neutron-functional-with-uwsgi/44b98e8/controller/logs/dsvm-functional-logs/neutron.tests.functional.sanity.test_sanity.SanityTestCaseRoot.test_keepalived_ipv6_support.txt
15:45:16 <slaweq> in the logs from this file I see errors related to ovn hash_ring
15:45:28 <slaweq> but I'm not sure if that is the root cause of the failure or maybe a red herring
15:45:49 <ralonsoh> that problem was solved, I think
15:45:57 <slaweq> when?
15:46:04 <ralonsoh> let me find it
15:46:04 <slaweq> that failure is from 17.01
15:46:58 <slaweq> thx
15:47:20 <ralonsoh> https://review.opendev.org/c/openstack/neutron/+/765874
15:47:41 <ralonsoh> that was supposed to solve this issue
15:48:03 <slaweq> so it didn't :/
15:48:14 <slaweq> but maybe it's just problem with functional test
15:48:20 <ralonsoh> could be
15:48:21 <slaweq> maybe we should mock something there?
15:48:37 <ralonsoh> let me open a LP and then I'll check it
15:48:44 <slaweq> why do we need the periodic task to be run in the functional test?
15:49:09 <ralonsoh> you are right
15:49:12 <ralonsoh> we don't
15:50:13 <slaweq> do You want me to check it? or will You take a look?
15:50:22 <ralonsoh> I'll check it
15:50:27 <slaweq> thx
15:50:48 <slaweq> #action ralonsoh to check failing periodic task in functional test
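A rough sketch of how such a periodic task could be kept out of a functional test, assuming a hypothetical patch target (the exact maintenance class path would need to be confirmed against the neutron tree):

    from unittest import mock

    import testtools


    class PeriodicTaskDisabledTestCase(testtools.TestCase):
        """Illustrative only; not the real functional test base class."""

        def setUp(self):
            super().setUp()
            # Keep the (assumed) OVN maintenance periodic worker from running
            # in the background while the functional test executes.
            patcher = mock.patch(
                'neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.'
                'maintenance.DBInconsistenciesPeriodics')
            patcher.start()
            self.addCleanup(patcher.stop)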
15:51:09 <slaweq> ok, another one
15:51:16 <slaweq> related to a non-existing interface this time:
15:51:18 <slaweq> https://b50e3eeb8d199503f863-bb8dadf314ca143f13ef83e8dbc65d1a.ssl.cf5.rackcdn.com/764401/14/check/neutron-functional-with-uwsgi/b4e3ebe/testr_results.html
15:52:11 <ralonsoh> that's weird
15:53:13 <slaweq> seems like maybe we should wait a bit longer for the interface to be created before we start setting attributes on it?
15:53:48 <ralonsoh> hmmm the creating command should return only when the interface is created and accessible
15:55:13 <slaweq> I will check logs from that test and that method
15:55:19 <slaweq> maybe I will find something
15:56:03 <slaweq> #action slaweq to check missing interface in namespace in functional tests
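A minimal sketch of the kind of guard that could help here, using neutron's ip_lib and wait_until_true helpers (the device and namespace arguments are placeholders and the timeout is arbitrary):

    from neutron.agent.linux import ip_lib
    from neutron.common import utils as common_utils


    def set_device_up_when_ready(device_name, namespace, timeout=10):
        """Wait for the device to appear in the namespace before touching it."""
        device = ip_lib.IPDevice(device_name, namespace=namespace)
        # wait_until_true raises a timeout exception if the device never shows up
        common_utils.wait_until_true(lambda: device.exists(), timeout=timeout)
        device.link.set_up()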
15:56:37 <slaweq> and the last one related to the functional tests
15:56:39 <ralonsoh> one more, related to functional/fullstack (but already resolved)
15:56:42 <ralonsoh> #link https://review.opendev.org/c/openstack/neutron/+/771436
15:56:48 <slaweq> this time ovn
15:57:10 <ralonsoh> (sorry)
15:57:10 <slaweq> thx ralonsoh, I will review it today
15:57:13 <slaweq> np :)
15:57:24 <slaweq> ok, last one from me
15:57:26 <slaweq> neutron.tests.functional.services.ovn_l3.test_plugin.TestRouter.test_gateway_chassis_rebalance
15:57:29 <slaweq> https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_1f1/764433/7/gate/neutron-functional-with-uwsgi/1f14b81/testr_results.html
15:57:36 <slaweq> rings a bell for You?
15:57:50 <ralonsoh> let me check
15:58:43 <ralonsoh> I think the waiting event should be declared before the command
15:59:16 <ralonsoh> so the port creating event should be added to the watch list before the _create_router call
15:59:25 <slaweq> ralonsoh: will You propose a patch for that?
15:59:32 <ralonsoh> I'll check it now
15:59:44 <slaweq> thx
15:59:53 <slaweq> #action ralonsoh to check failing neutron.tests.functional.services.ovn_l3.test_plugin.TestRouter.test_gateway_chassis_rebalance test
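A rough sketch of the ordering being suggested: register the wait event before triggering the operation, so the notification cannot fire while nobody is listening (the event class and handler attribute names are illustrative, not the exact test code):

    def test_gateway_chassis_rebalance(self):
        # Illustrative ordering only; WaitForPortCreateEvent and notify_handler
        # are assumptions here, not the real names from the neutron tests.
        port_event = WaitForPortCreateEvent(port_name)
        self.sb_api.idl.notify_handler.watch_event(port_event)  # watch first
        router = self._create_router('router1', gw_info=gw_info)
        # only now wait; the watcher was already in place when the router
        # (and its gateway port) got created, so the event cannot be missed
        self.assertTrue(port_event.wait())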
16:00:02 <slaweq> so, that's all for today
16:00:04 <bcafarel> last action, right on time :)
16:00:10 <slaweq> thx for attending the meeting
16:00:11 <ralonsoh> bye!
16:00:13 <slaweq> o/
16:00:14 <bcafarel> o/
16:00:16 <slaweq> #endmeeting