16:01:41 <slaweq> #startmeeting neutron_ci
16:01:42 <openstack> Meeting started Tue Mar 12 16:01:41 2019 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:01:43 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:01:43 <slaweq> hi
16:01:45 <openstack> The meeting name has been set to 'neutron_ci'
16:02:06 <njohnston> o/
16:02:24 <slaweq> hi njohnston
16:02:39 <njohnston> hello!
16:02:49 <slaweq> lets wait a bit more for others like haleyb, hongbin, bcafarel
16:02:58 <slaweq> I know that mlavalle will not be here today
16:03:05 <bcafarel> bot is back in order?
16:03:10 <bcafarel> (also o/ )
16:03:23 <slaweq> and also I need to finish this meeting in about 45 minutes
16:03:32 <slaweq> bcafarel: seems so
16:03:56 <slaweq> ok, lets go then
16:03:59 <slaweq> first of all
16:04:01 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:04:06 <slaweq> please open it now :)
16:04:17 <bcafarel> :)
16:04:20 <slaweq> it will be ready for later :)
16:04:44 <slaweq> #topic Actions from previous meetings
16:04:55 <slaweq> lets go then
16:04:58 <slaweq> first action
16:05:00 <slaweq> * njohnston Debug fullstack DSCP issue
16:05:30 <slaweq> I know that njohnston didn't look into it
16:05:33 <slaweq> because I did :)
16:05:40 <slaweq> and I found an issue
16:05:44 <slaweq> fix is here: https://review.openstack.org/#/c/642186/
16:06:18 <njohnston> slaweq++
16:06:21 <slaweq> basically it looks like sometimes tcpdump was starting too slowly, we sent one ICMP packet which wasn't captured by tcpdump and the test failed
16:06:31 <slaweq> so I proposed to always send at least 10 packets
16:06:43 <ralonsoh> good catch!
16:06:50 <slaweq> then some should always be captured by tcpdump
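A minimal sketch of the idea behind the fix, assuming a hypothetical fullstack helper for pinging from a namespace (the real helper names live in the neutron fullstack framework, not here):

    PACKETS_TO_SEND = 10  # a burst instead of a single probe

    def send_icmp_burst(namespace, dst_ip, count=PACKETS_TO_SEND):
        # Send several echo requests so a tcpdump process that starts a
        # little late still captures at least some of them.
        for _ in range(count):
            namespace.ping(dst_ip, count=1)  # hypothetical helper

    # The DSCP assertion then only needs *some* captured packets, not all of them.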
16:07:06 <slaweq> thx ralonsoh :)
16:07:09 <slaweq> ok, next one
16:07:11 <slaweq> mlavalle to take a look at fullstack dhcp rescheduling issue https://bugs.launchpad.net/neutron/+bug/1799555
16:07:12 <openstack> Launchpad bug 1799555 in neutron "Fullstack test neutron.tests.fullstack.test_dhcp_agent.TestDhcpAgentHA.test_reschedule_network_on_new_agent timeout" [High,Confirmed]
16:07:36 <slaweq> I don't think that mlavalle was looking into this one
16:08:16 <ralonsoh> slaweq, I can try it
16:08:22 <slaweq> thx ralonsoh :)
16:08:33 <slaweq> #action ralonsoh to take a look at fullstack dhcp rescheduling issue https://bugs.launchpad.net/neutron/+bug/1799555
16:08:34 <openstack> Launchpad bug 1799555 in neutron "Fullstack test neutron.tests.fullstack.test_dhcp_agent.TestDhcpAgentHA.test_reschedule_network_on_new_agent timeout" [High,Confirmed]
16:08:52 <slaweq> and the last one from last week was:
16:08:54 <slaweq> slaweq to talk with tmorin about networking-bagpipe
16:09:03 <slaweq> and I totally forgot about this
16:09:05 <slaweq> sorry for that
16:09:13 <slaweq> I will add it to myself for this week
16:09:19 <slaweq> #action slaweq to talk with tmorin about networking-bagpipe
16:10:35 <slaweq> anything else You want to add/ask?
16:10:44 <njohnston> nope
16:11:17 <slaweq> ok, lets move on then
16:11:24 <slaweq> #topic Python 3
16:11:35 <slaweq> njohnston: any updates about stadium projects?
16:11:47 <slaweq> etherpad is here: https://etherpad.openstack.org/p/neutron_stadium_python3_status
16:12:17 <njohnston2> Nothing yet, but I will note that as of now we have satisfied the basic requirements for the community goal
16:12:49 <njohnston2> The community goal doesn't specify all the kinds of testing we're handling at this point; it establishes a baseline
16:13:14 <slaweq> great then
16:13:17 <njohnston2> We've met that baseline across the stadium AFAICT
16:13:30 <slaweq> that is very good news njohnston2 :)
16:13:46 <njohnston2> I'll continue and I think we should keep this topic, but this is more forward-looking, getting us ready for the python2 removal in the U cycle
16:13:56 <slaweq> but I will keep this topic on the agenda for now, to track our (slow) progress on that in the next cycle too
16:14:02 <slaweq> :)
16:14:07 <njohnston2> perfect
16:14:13 <slaweq> thx njohnston2
16:14:28 <slaweq> ok, lets move on then
16:14:30 <slaweq> next topic
16:14:32 <slaweq> #topic Ubuntu Bionic in CI jobs
16:16:09 <slaweq> We are almost good with it
16:16:19 <slaweq> we have one issue for now with fullstack tests in neutron
16:16:37 <slaweq> because on Bionic nodes we shouldn't compile the ovs kernel module anymore, it's not necessary
16:17:16 <slaweq> I will try to push a patch for that tomorrow morning, unless there is someone else who wants to take care of it today
16:17:50 <slaweq> I tried today with https://review.openstack.org/#/c/642461/ and it looks good
16:18:08 <slaweq> but it should be done in a way which will work on both xenial and bionic
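A rough sketch of what a release-aware check could look like (purely illustrative, not the actual patch); it only assumes the standard lsb_release tool is available on the node:

    import subprocess

    def ubuntu_codename():
        # 'lsb_release -cs' prints the codename, e.g. 'xenial' or 'bionic'
        return subprocess.check_output(["lsb_release", "-cs"]).decode().strip()

    def should_compile_ovs_kmod():
        # Bionic ships a recent enough in-tree openvswitch module, so
        # building it from source is only needed on older releases.
        return ubuntu_codename() in ("trusty", "xenial")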
16:18:40 <njohnston> I’ll take a look
16:18:42 <slaweq> for networking-dynamic-routing we should be good to go with patch https://review.openstack.org/#/c/642433/2
16:18:46 <slaweq> njohnston: thx :)
16:19:06 <slaweq> if some of You have +2 power in networking-dynamic-routing, please check this patch :)
16:19:17 <slaweq> it isn't complicated :)
16:19:38 <slaweq> and works fine for Bionic too: https://review.openstack.org/#/c/639675/2
16:19:45 <njohnston2> +2
16:19:55 <slaweq> thx njohnston2 :)
16:20:28 <slaweq> and the third problem (but not very urgent) is with networking-bagpipe: https://review.openstack.org/#/c/642456/1 fullstack job is running but there are still some failures: http://logs.openstack.org/87/639987/3/check/legacy-networking-bagpipe-dsvm-fullstack/8d5af6c/job-output.txt.gz
16:20:54 <slaweq> here we should for sure have the same patch as for neutron to not compile the ovs kernel module on Bionic
16:21:02 <slaweq> but even with this, tests are failing
16:21:22 <njohnston2> "ImportError: cannot import name async_process" perhaps a dependency changed names?
16:21:27 <slaweq> I'm not familiar with bagpipe so I'm not sure if that is somehow related to the switch to Bionic
16:21:31 <slaweq> njohnston2: maybe
16:21:43 <slaweq> but the problem isn't very urgent as it is a non-voting job there :)
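If the failure really is a renamed or relocated dependency, a guarded import is a common way to support both layouts; the fallback path below is only a placeholder, the real new location would need to be confirmed in the dependency itself:

    try:
        from neutron.agent.linux import async_process
    except ImportError:
        # Hypothetical fallback if the module moved; not the actual fix.
        from neutron.agent.common import async_process  # noqa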
16:22:26 <slaweq> and the last one with some problems is networking-midonet, but yamamoto is aware of those issues so I hope they will fix them
16:23:23 <slaweq> ok, any questions/something to add regarding bionic?
16:23:51 <njohnston> looks good, great work
16:24:01 <slaweq> thx njohnston :)
16:24:12 <slaweq> ok, so moving to next topic
16:24:16 <slaweq> #topic tempest-plugins migration
16:24:26 <slaweq> njohnston_: how many of You are in the meeting? :D
16:24:47 <slaweq> Etherpad: https://etherpad.openstack.org/p/neutron_stadium_move_to_tempest_plugin_repo
16:24:54 <slaweq> any updates here?
16:24:59 <slaweq> I don't have anything
16:25:03 <njohnston> Hey, I've been saying that cloning myself was the only way to get ahead of these trello cards...
16:25:13 <njohnston> nothing from me
16:25:15 <slaweq> njohnston: LOL
16:25:20 <slaweq> great idea
16:25:33 <slaweq> but I don't know if my wife will handle it when I clone myself :P
16:25:41 * haleyb wanders in late
16:25:46 <njohnston> you're not allowed to clone yourself, because an army of Hulks would be unstoppable
16:25:46 <haleyb> any bugs for me? :)
16:25:56 <slaweq> LOL
16:25:58 <njohnston> haleyb: all of them
16:26:10 <slaweq> haleyb: hi, sure, if You want some :P
16:26:18 <slaweq> ok, lets move on
16:26:20 <slaweq> #topic Grafana
16:26:31 <slaweq> reminder link: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:27:07 <haleyb> that reminds me to update the ovn dashboard
16:27:50 <njohnston> is it weird to anyone else how, for the check queue, the number of runs is on a steady downward slope?
16:28:39 <njohnston> for example "Number of Unit Tests jobs runs (Check queue)" has a high around 78 on 3/6, steadily down to 45 on 3/11
16:29:05 <slaweq> njohnston: yes, it looks strange but lets see how it goes in the next days maybe
16:29:34 <slaweq> IMHO last week we had some "spike" there
16:30:24 <slaweq> njohnston: if You look at the data from the last 30 days, it doesn't look odd
16:30:31 <njohnston> oh good
16:30:46 <njohnston> I set it to the 30 day view, that should render sometime on Thursday
16:31:49 <njohnston> actually that was very fast.  That really helps show what is normal jitter and what isn't.
16:32:02 <slaweq> other than that it generally looks much better than last week
16:32:19 <slaweq> I see that unit test failure rates are rising, and that isn't good IMO
16:32:30 <bcafarel> a few fixes got in yeah
16:32:47 <slaweq> and in the last couple of days I saw issues like http://logs.openstack.org/79/633979/26/check/openstack-tox-py37/e7878ff/testr_results.html.gz in various unit tests
16:32:54 <slaweq> did You see it too?
16:33:32 <bcafarel> test_port_ip_update_revises? I don't think so
16:33:38 <haleyb> i saw this update_revises somewhere
16:33:54 <slaweq> looking at logstash: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%20%5C%22line%20170%2C%20in%20test_port_ip_update_revises%5C%22
16:34:06 <slaweq> it happened at least a couple of times in the last week
16:34:47 <njohnston> I managed to miss that one
16:34:51 <slaweq> anyone wants to look at this?
16:35:11 <ralonsoh> me
16:35:17 <slaweq> ralonsoh: thx a lot
16:35:30 <slaweq> please report a bug for it so we can track it there
16:35:39 <ralonsoh> sure
16:35:52 <slaweq> #action ralonsoh to take a look at update_revises unit test failures
16:36:15 <slaweq> ok, lets move on then
16:36:34 <slaweq> or You want to talk about something else related to grafana?
16:36:45 <njohnston> While we have grafana up... What do people think about marking the neutron-tempest-iptables_hybrid-fedora job voting?  It looks very stable as far as error rate, and will be helpful as the next version of RHEL comes down the pike.
16:37:35 <slaweq> njohnston: that is a very good idea IMO
16:38:11 <njohnston> OK, I'll push a change for it
16:38:25 <slaweq> it's under 25% in the last 30 days at least
16:38:36 <slaweq> there wasn't any real problem with it before IIRC
16:38:55 <njohnston> it only deviates from neutron-tempest-iptables_hybrid by a percent or two
16:39:01 <slaweq> so I think it's a very good idea to make it voting (and maybe gating in a few weeks too)
16:39:04 <haleyb> is the fedora one on that page?
16:39:07 <njohnston> which seems to be within the margin of error
16:39:25 <njohnston> haleyb: Yes, look in " Integrated Tempest Failure Rates (Check queue)"
16:39:32 <njohnston> which has a lot of lines but that is one of them
16:39:38 <slaweq> http://grafana.openstack.org/d/Hj5IHcSmz/neutron-failure-rate?panelId=16&fullscreen&orgId=1&from=now-30d&to=now
16:39:49 <haleyb> ah, a search didn't show it initially
16:40:08 <njohnston> latest datapoint: neutron-tempest-iptables_hybrid 8% fails, neutron-tempest-iptables_hybrid-fedora 6% fails
16:40:53 <slaweq> yes, there was one spike up to 100% about 2 weeks ago but it was the same for the neutron-tempest-iptables_hybrid job and was related to an os-vif issue IIRC
16:41:16 <slaweq> other than that it's good and I think it can be voting
16:41:26 <slaweq> haleyb: any thoughts?
16:41:45 <haleyb> +1 for voting
16:41:57 <slaweq> ok, njohnston please propose patch for that
16:42:01 <njohnston> slaweq: https://review.openstack.org/642818
16:42:07 <slaweq> that was fast
16:42:11 <bcafarel> :)
16:42:12 <haleyb> as long as we remember we're at the end of stein :)
16:42:48 <slaweq> yes, we can maybe wait a couple of weeks with this patch until stein is released and then merge it
16:43:10 <slaweq> but IMO it will be good to have the patch there and ask others what they think about it :)
16:43:16 <njohnston2> I can -W until the branch
16:43:30 <slaweq> +1
16:43:34 <bcafarel> sounds good
16:43:59 <slaweq> ok, lets move on as I will need to go soon
16:44:04 <slaweq> next topic
16:44:14 <slaweq> #topic functional/fullstack
16:44:32 <slaweq> we merged some fixes for recent failures and we are in better shape now
16:44:38 <slaweq> I wanted to share with You one bug
16:44:40 <slaweq> https://bugs.launchpad.net/neutron/+bug/1818614
16:44:41 <openstack> Launchpad bug 1818614 in neutron "Various L3HA functional tests fails often" [Critical,In progress] - Assigned to Slawek Kaplonski (slaweq)
16:44:47 <slaweq> I was investigating it during the last week
16:44:54 <slaweq> and I found 2 different issues there
16:45:22 <slaweq> one with neutron-keepalived-state-change, for which the patch is https://review.openstack.org/#/c/642295/
16:45:30 <slaweq> but I will need to check it once again
16:45:36 <slaweq> so it's still -W
16:45:52 <slaweq> but I also opened second bug https://bugs.launchpad.net/neutron/+bug/1819160
16:45:54 <openstack> Launchpad bug 1819160 in neutron "Functional tests for dvr ha routers are broken" [High,In progress] - Assigned to Slawek Kaplonski (slaweq)
16:46:07 <slaweq> and I would like to ask haleyb and other L3 experts to take a look at it
16:46:31 <slaweq> basically it looks to me like in the test_dvr_router module in functional tests we are not testing dvr ha routers
16:46:43 <slaweq> or differently
16:46:54 <slaweq> we should test one dvr ha router spawned on 2 agents
16:47:09 <slaweq> but we are testing 2 independent dvr routers spawned on 2 different agents
16:47:27 <slaweq> and both of them can be set to master at the same time
16:47:48 <slaweq> IMO that is wrong and should be fixed but please check it if You have time :)
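A small self-contained sketch of the invariant such a test should enforce (names are illustrative): one DVR+HA router hosted on two L3 agents must have exactly one instance in the master state at any time:

    def assert_single_master(ha_states):
        """ha_states: mapping of agent hostname -> keepalived state string."""
        masters = [host for host, state in ha_states.items() if state == 'master']
        assert len(masters) == 1, "expected exactly one master, got %s" % masters

    # Expected outcome for a single HA router scheduled to two agents:
    assert_single_master({'agent-1': 'master', 'agent-2': 'backup'})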
16:48:02 <haleyb> will look
16:48:08 <slaweq> thx
16:48:49 <slaweq> ok, anything else You want to discuss today, because I need to leave now :)
16:49:02 <njohnston> that's it for me
16:49:21 <slaweq> ok, thx for attending and sorry for a bit shorter meeting today :)
16:49:22 <bcafarel> nothing worth keeping you around!
16:49:25 <slaweq> #endmeeting