16:00:15 <slaweq> #startmeeting neutron_ci
16:00:16 <openstack> Meeting started Tue Sep 10 16:00:15 2019 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:18 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:20 <slaweq> welcome again
16:00:20 <openstack> The meeting name has been set to 'neutron_ci'
16:00:24 <ralonsoh> hi
16:01:15 <slaweq> mlavalle will not be here today
16:01:31 <slaweq> but let's wait a few more minutes for njohnston, bcafarel and others
16:01:39 * slaweq will be back in 2 minutes
16:01:52 <bcafarel> o/ sorry did not see the time
16:02:35 <clarkb> really quickly I wanted to point out that some of octavia's CI problems were due to an Ubuntu kernel bug in OVS that was causing the kernel to panic. It's possible neutron and others are seeing that too (if the job is retried)
16:02:55 <clarkb> that bug has been fixed; we need to update our mirrors and rebuild images (complicated by a fileserver outage the other day that we are still trying to recover from)
16:03:16 <slaweq> ok, I'm back
16:03:22 <johnsom> #link https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1842447
16:03:23 <openstack> Launchpad bug 1842447 in linux (Ubuntu) "Kernel Panic with linux-image-4.15.0-60-generic when specifying nameserver in docker-compose" [Undecided,Confirmed]
16:03:25 <johnsom> fyi
16:03:47 <slaweq> clarkb: johnsom thx for heads up
16:03:49 <johnsom> I was going to ask after my meeting as we are not seeing new images today
16:03:55 <slaweq> but I haven't seen it in neutron jobs (yet)
16:04:34 <slaweq> but we will keep an eye on it for sure :)
16:04:47 <slaweq> ok, let's get going with the meeting agenda
16:04:49 <slaweq> #topic Actions from previous meetings
16:04:56 <slaweq> first one
16:04:57 <slaweq> mlavalle to continue investigating router migrations issue
16:05:06 <slaweq> he told me that he is still investigating
16:05:15 <slaweq> so I will just assign it to him for next week too
16:05:19 <slaweq> #action mlavalle to continue investigating router migrations issue
16:05:26 <slaweq> next one
16:05:28 <slaweq> slaweq to check reasons of failures of neutron-tempest-plugin-scenario-openvswitch job
16:05:42 <slaweq> I didn't have time but it looks much better now so I hope we will be good with this job :)
16:05:56 <slaweq> next one
16:05:58 <slaweq> ralonsoh to report bug and investigate failing test_get_devices_info_veth_different_namespaces functional test
16:06:06 <njohnston> o/ sorry I am late
16:06:18 <ralonsoh> slaweq, that's solved in your patch
16:06:25 <slaweq> ahh, right :)
16:06:27 <slaweq> ok
16:06:32 <slaweq> thx ralonsoh
16:06:36 <ralonsoh> np!
16:06:42 <slaweq> ok, next one
16:06:43 <slaweq> slaweq to check reason of failure neutron.tests.functional.agent.test_firewall.FirewallTestCase.test_rule_ordering_correct
16:06:50 <slaweq> It was the issue which should be fixed with https://review.opendev.org/#/c/679428/
16:06:55 <slaweq> so nothing more to check there
16:07:03 <slaweq> and the last one
16:07:05 <slaweq> slaweq to add mariadb periodic job
16:07:13 <slaweq> I proposed https://review.opendev.org/681202
16:07:24 <slaweq> but it also requires https://review.opendev.org/#/c/681200/1 and https://review.opendev.org/#/c/681201/
16:07:55 <ralonsoh> +1 to this
16:08:06 <slaweq> ralonsoh: let's first check if that will work as expected :)
16:08:17 <slaweq> I will continue this work during next week
16:08:34 <slaweq> ok, let's move on to the next topic then
16:08:35 <slaweq> #topic Stadium projects
16:08:41 <slaweq> Python 3 migration
16:08:43 <slaweq> Stadium projects etherpad: https://etherpad.openstack.org/p/neutron_stadium_python3_status
16:08:51 <slaweq> I think we already talked about it on neutron meeting
16:08:59 <njohnston> +1
16:09:06 <slaweq> njohnston: anything You want to add about it?
16:09:36 <njohnston> slaweq: Just that I have not had a chance to check in with yamamoto about midonet
16:10:20 <njohnston> that's all
16:10:21 <slaweq> ok, yes, midonet is the last "almost not touched" one, right?
16:10:26 <njohnston> yes
16:10:42 <njohnston> I rarely see yamamoto online it seems
16:10:47 <slaweq> we still have some time but IMO we should finish this work before the end of this year
16:10:57 <slaweq> as Python 2 is EOL on 1.1.2020
16:11:34 <slaweq> ok, and second stadium projects topic
16:11:36 <slaweq> tempest-plugins migration
16:11:38 <slaweq> Etherpad: https://etherpad.openstack.org/p/neutron_stadium_move_to_tempest_plugin_repo
16:11:52 <slaweq> I know that tidwellr is still making some progress with neutron-dynamic-routing
16:11:59 <slaweq> tidwellr: do You need help on this?
16:12:06 <njohnston> same state as last week I believe; vpnaas is -W from mlavalle
16:12:51 <njohnston> tidwellr did mention on his review that he was seeing worrisome errors from neutron https://review.opendev.org/#/c/652099
16:13:59 <slaweq> yes, I saw 1 error on this patch today
16:14:18 <slaweq> ok, tidwellr, if You need any help, please ping me :)
16:14:41 <slaweq> anything else You want to discuss regarding stadium projects CI today?
16:15:04 <tidwellr> slaweq: I'm looking into it, but I could use some help
16:15:29 <tidwellr> I thought it would be simple, but these issues aren't seeming so simple anymore :)
16:15:32 <slaweq> tidwellr: ok, I will try to look into it this week too
16:15:47 <njohnston> nothing else from me
16:16:01 <slaweq> and an interesting question is why it's not failing in the old legacy jobs
16:17:37 <tidwellr> slaweq: I'm seeing intermittent failures where BGP peering simply doesn't start, but then on a recheck everything begins peering and the test in question passes
16:18:07 <tidwellr> but we don't see this behavior with the legacy jobs in neutron-dynamic-routing
16:18:09 <slaweq> BGP peering is something run in a docker container, right?
16:18:23 <tidwellr> the BGP agent peers with a docker container
16:18:54 <slaweq> ok, I will take a look at the logs
16:18:58 <tidwellr> the BGP agent will always be the one to initiate the peering
16:19:02 <slaweq> maybe I will find something
16:19:10 <tidwellr> good luck :)
16:19:24 <slaweq> thx tidwellr :)
16:19:29 <slaweq> and thx for working on this
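For illustration, one way a scenario test could tolerate the slow peering tidwellr describes is to poll the session state before asserting on routes. A minimal, hypothetical sketch — wait_for_bgp_peer_up and the get_peer_state callable are invented names, not part of the neutron-dynamic-routing tempest plugin:

    import time

    def wait_for_bgp_peer_up(get_peer_state, timeout=180, interval=5):
        """Poll until the BGP session is ESTABLISHED or the timeout expires.

        get_peer_state is a caller-supplied callable (hypothetical) that
        returns the peer's current BGP FSM state, e.g. read from the
        docker-hosted peer or from the agent under test.
        """
        deadline = time.time() + timeout
        while time.time() < deadline:
            if get_peer_state() == 'ESTABLISHED':
                return
            time.sleep(interval)
        raise AssertionError(
            'BGP peering did not establish within %s seconds' % timeout)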
16:19:33 <slaweq> ok, let's move on
16:19:37 <slaweq> next topic
16:19:39 <slaweq> #topic Grafana
16:19:45 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:19:57 <slaweq> (sorry that I forgot to send it at the beginning)
16:20:56 <slaweq> currently we have 2 main issues in CI
16:21:02 <slaweq> 1. problems with rally
16:21:32 <slaweq> which is failing 100% of the time due to a jsonschema version mismatch
16:22:10 <slaweq> 2. neutron-tempest-iptables_hybrid-fedora which isn't even starting (RETRY LIMIT) 100% of the time
16:22:31 <slaweq> because we were running it on F28, which is now EOL and there are no repositories for it anymore
16:22:43 <slaweq> for both cases there are patches to fix them
16:22:58 <slaweq> for the rally issue we are waiting for a new rally release
16:23:25 <ralonsoh> one question, if possible (about Fedora issue)
16:23:32 <slaweq> ralonsoh: sure
16:23:54 <ralonsoh> why, instead of using F29 we don't try F30?
16:24:01 <njohnston> slaweq: for the Fedora issue, is this DNM patch the fix or is there a different one?
16:24:03 <njohnston> https://review.opendev.org/#/c/681213/
16:24:25 <ralonsoh> yes, why don't we switch to F30?
16:24:32 <ralonsoh> instead of F29
16:24:54 <ralonsoh> maybe Brian can help us (is not here now)
16:24:54 <slaweq> my DNM patch was sent only to test whether haleyb's change to devstack would fix this job
16:25:04 <ralonsoh> haleyb, yes he is!
16:25:07 <slaweq> but it's not a fix for the issue
16:25:13 <njohnston> ah ok.
16:25:18 <slaweq> the fix was proposed by haleyb to the devstack repo
16:25:24 <slaweq> https://review.opendev.org/#/c/662529/5
16:25:37 <clarkb> re f30 I don't know that it is quite ready yet. We are adding it in nowish iirc
16:25:44 <slaweq> according to ianw's comment there, he had some issues with F30
16:25:45 <clarkb> (but once it is ready you should feel free to test on it)
16:25:48 <haleyb> and doug is looking at the barbican failure today
16:25:51 <slaweq> that's why we are changing to F29 now
16:25:57 <ralonsoh> ooook, thanks!!
16:26:04 <redrobot> haleyb, o/
16:26:08 <haleyb> i didn't see any f30 support, so stopped at f29
16:26:16 <ralonsoh> perfect
16:27:22 <slaweq> ralonsoh: njohnston is that clear for You now?
16:27:33 <ralonsoh> it is, for sure
16:27:34 <njohnston> slaweq: yes thanks
16:28:00 <slaweq> great :)
16:28:10 <slaweq> haleyb: thx for fixing this
16:28:34 <haleyb> np, wish everything had merged sooner; it started as a periodic ovn job failure
16:28:49 <slaweq> but also, to unblock our gates ASAP, I sent patch https://review.opendev.org/#/c/681186/ today
16:29:09 <slaweq> later we will be able to revert it once the proper fixes land in the external repos
16:29:31 <slaweq> ok
16:29:41 <slaweq> other than those 2 issues, we are quite fine
16:29:56 <slaweq> even functional/fullstack jobs are in quite good shape this week
16:31:42 <slaweq> and that's all from me regarding grafana
16:31:53 <slaweq> do You have anything else about grafana today?
16:33:07 <njohnston> nope
16:33:09 <slaweq> ok, let's move on
16:33:11 <slaweq> #topic fullstack/functional
16:33:30 <slaweq> today I found one new (for me) issue in functional tests
16:33:37 <slaweq> and it happened at least twice
16:33:47 <slaweq> it was on different tests
16:33:54 <slaweq> but with the same error in the test's log
16:34:00 <slaweq> https://0668011f33af6364883c-c555fae2d8c498523cc4b2c363541725.ssl.cf1.rackcdn.com/679852/11/gate/neutron-functional/6b7c424/controller/logs/dsvm-functional-logs/neutron.tests.functional.agent.linux.test_linuxbridge_arp_protect.LinuxBridgeARPSpoofTestCase.test_arp_protection_port_security_disabled.txt.gz
16:34:02 <slaweq> or
16:34:06 <slaweq> https://148a66b404dde523de26-17406e3478c64e603d8ff3ea0aac16c8.ssl.cf5.rackcdn.com/680393/1/check/neutron-functional-python27/59e721e/controller/logs/dsvm-functional-logs/neutron.tests.functional.agent.linux.test_l3_tc_lib.TcLibTestCase.test_clear_all_filters.txt.gz
16:34:14 <slaweq> have You seen something like that before?
16:34:31 <njohnston> no, that is a new one for me... very strange...
16:34:51 <ralonsoh> this can be (I guess) because you are deleting the QoS records
16:35:03 <slaweq> tbh this ovsdb error may be a red herring as it could happen during cleanup
16:35:12 <ralonsoh> and the rules are not there anymore
16:35:25 <slaweq> but that's the only thing that was common in those 2 failed tests for me
16:35:34 <ralonsoh> yes, this could happen
16:35:41 <ralonsoh> I can check this tomorrow
16:35:49 <slaweq> thx ralonsoh
16:35:51 <ralonsoh> do you have a bug ref?
16:35:55 <slaweq> ralonsoh: no
16:36:02 <ralonsoh> ok, I'll do it
16:36:07 <slaweq> ok, thx a lot
16:36:22 <slaweq> #action ralonsoh to report bug and check issue with ovsdb errors in functional tests
16:36:49 <slaweq> and I also found (happened once) an issue with killing an external process during cleanup
16:36:52 <slaweq> I reported it here: https://bugs.launchpad.net/neutron/+bug/1843418 - not very urgent
16:36:54 <openstack> Launchpad bug 1843418 in neutron "Functional tests shouldn't fail if kill command will have "no such process" during cleanup" [Medium,In progress] - Assigned to Slawek Kaplonski (slaweq)
16:37:19 <slaweq> I know that bcafarel and ralonsoh already reviewed the proposed patch
16:37:30 <slaweq> I haven't had time to check those reviews yet
16:37:50 <ralonsoh> slaweq, I propose to implement an os.kill() method with and without privsep
16:37:56 <ralonsoh> if root is True/False
16:38:21 <ralonsoh> there are several places in Neutron where a shell to execute "kill" is spawned
16:38:46 <slaweq> ralonsoh: that is a good idea
16:38:54 <slaweq> I will do it this way
16:39:00 <ralonsoh> perfect!
16:39:22 <bcafarel> that will be nice :)
16:39:37 <njohnston> very good +1
16:39:38 <slaweq> thx a lot for this idea and reviewing patch
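A minimal sketch of ralonsoh's suggestion, assuming an oslo.privsep context like the ones neutron keeps in neutron.privileged — the context and helper names here (privileged, _kill_as_root, kill_process) are made up for illustration:

    import os
    import signal

    from oslo_privsep import capabilities
    from oslo_privsep import priv_context

    # Hypothetical standalone context; a real patch would reuse
    # neutron's existing privsep context instead of defining a new one.
    privileged = priv_context.PrivContext(
        __name__,
        cfg_section='privsep',
        pypath=__name__ + '.privileged',
        capabilities=[capabilities.CAP_KILL],
    )

    @privileged.entrypoint
    def _kill_as_root(pid, sig):
        # Runs inside the privsep daemon, so it can signal root-owned processes.
        os.kill(pid, sig)

    def kill_process(pid, sig=signal.SIGKILL, run_as_root=False):
        """Send a signal without spawning a 'kill' shell.

        Tolerates "no such process" so cleanup does not fail when the
        process already exited (the bug 1843418 case).
        """
        try:
            if run_as_root:
                _kill_as_root(pid, sig)
            else:
                os.kill(pid, sig)
        except ProcessLookupError:
            pass  # process already gone; nothing to clean up

With run_as_root=True the signal is sent from the privsep daemon, so no sudo/rootwrap shell-out is needed, which is the point of the proposal.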
16:40:20 <slaweq> that's all regarding functional/fullstack jobs from me
16:40:26 <slaweq> anything else You want to add?
16:41:47 <slaweq> ok, if not, that was all from me for today
16:42:03 <slaweq> as I don't have anything new regarding scenario jobs
16:42:14 <slaweq> do You have anything else You want to talk about today?
16:42:26 <slaweq> if not, I think I can give You back about 15 minutes :)
16:43:15 <njohnston> o/
16:43:23 <bcafarel> yay
16:43:24 <bcafarel> o/
16:43:34 <slaweq> ok, let's finish then
16:43:35 <ralonsoh> bye
16:43:37 <slaweq> thx for attending
16:43:41 <slaweq> o/
16:43:45 <slaweq> #endmeeting