16:00:40 <ihrachys> #startmeeting neutron_ci
16:00:42 <openstack> Meeting started Tue Dec 12 16:00:40 2017 UTC and is due to finish in 60 minutes.  The chair is ihrachys. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:43 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:46 <openstack> The meeting name has been set to 'neutron_ci'
16:00:50 <jlibosva> o/
16:00:51 <mlavalle> o/
16:00:56 <ihrachys> o/
16:01:20 <haleyb> hi
16:01:28 <ihrachys> #topic Actions from prev meeting
16:01:37 <ihrachys> "ihrachys to make sure legacy tempest jobs are gone in gate queue"
16:01:44 <ihrachys> done: https://review.openstack.org/525754
16:01:51 <ihrachys> "ihrachys to update grafana for new non-legacy job names"
16:02:02 <ihrachys> done: https://review.openstack.org/525759
16:02:15 <ihrachys> "jlibosva to post wishlist bugs for fullstack improvements (reuse db dump; reuse env per class)"
16:02:21 <ihrachys> jlibosva, ?
16:02:24 * jlibosva sux
16:02:32 <ihrachys> gotcha
16:02:35 <ihrachys> #action jlibosva to post wishlist bugs for fullstack improvements (reuse db dump; reuse env per class)
16:02:37 <jlibosva> can you flip it pls as
16:02:39 <jlibosva> yeah that :)
16:02:41 <jlibosva> thanks
16:03:09 <ihrachys> these are all AIs we had
16:03:25 <ihrachys> I will flip the order of topics a bit this time
16:03:28 <ihrachys> #topic Tempest plugin
16:04:11 <ihrachys> we made quite some progress there
16:04:45 <ihrachys> for one, tempest in-tree tests are gone: https://review.openstack.org/#/c/506672/
16:05:30 <ihrachys> we track remaining bits in https://etherpad.openstack.org/p/neutron-tempest-plugin-job-move
16:06:08 <ihrachys> I believe there are two main bits now left
16:06:28 <ihrachys> 1. completely remove legacy jobs from infra repos (this requires first moving them into our stable branches); and
16:06:51 <ihrachys> 2. remove all remaining neutron/tests/tempest/ contents (this requires all consuming projects to switch to new repo)
16:06:59 <ihrachys> let's start from point 1
16:07:06 <ihrachys> I believe mlavalle was looking into it
16:07:28 <mlavalle> I didn't make as much progress as I wanted on this
16:07:55 <ihrachys> this is the patch correct? https://review.openstack.org/#/c/525345/
16:07:58 <mlavalle> haleyb pointed to a tripleo job that I didn't include in my migration patch
16:08:19 <mlavalle> ihrachys: yes, that's the patch
16:08:38 <ihrachys> what's the issue with tripleo job?
16:08:51 <mlavalle> I asked the infra team and EmilienM about that tripleo job
16:09:02 <mlavalle> we don't need to migrate it
16:09:12 <ihrachys> right. because it's probably shared across projects
16:09:23 <mlavalle> but EmilienM would like us to run in our check queue
16:09:30 <mlavalle> non voting
16:09:48 <mlavalle> I am looking now into adding it
16:10:13 <ihrachys> but should we keep it in our tree? isn't it shared?
16:10:25 <ihrachys> I believe if it's shared it then should live in a common place
16:10:27 <mlavalle> no, we are not going to keep it in our tree
16:10:35 <EmilienM> sorry, I'm in a meeting already
16:10:40 <EmilienM> I'm happy to discuss offline later
16:10:49 <mlavalle> EmilienM: no need to answer
16:10:57 <mlavalle> disregard us
16:11:11 <ihrachys> mlavalle, ok anyway, this is something that you will tackle one way or another
16:11:12 <EmilienM> mlavalle: I can't disregard you :-)
16:11:18 <haleyb> mlavalle: so it's an integrated gate job?
16:11:29 <ihrachys> mlavalle, another issue I see there is that new non-legacy jobs are now voting for some that are non-voting for legacy
16:11:35 <ihrachys> mlavalle, like fullstack or dvr-ha
16:11:47 <ihrachys> mlavalle, we should make sure their voting state is same
16:11:52 <mlavalle> ihrachys: I'll fix that
16:12:17 <ihrachys> ok
16:13:02 <ihrachys> once this is done, we should be able to clean up infra repos from all those jobs
16:13:15 <ihrachys> (well, after we backport the patch to stable)
16:13:54 <ihrachys> we have something for -api here: https://review.openstack.org/516724 and here: https://review.openstack.org/516744
16:14:07 <ihrachys> but it will need refinement to clean up all those old jobs, not just -api
16:14:26 <ihrachys> the next bullet point to cover is cleaning up in-tree remaining tempest code
16:14:50 <ihrachys> https://etherpad.openstack.org/p/neutron-tempest-plugin-job-move line 28+ captured patches for projects that consume the code
16:14:59 <ihrachys> that need to land before we can safely pull off the rug
16:15:36 <ihrachys> I would suggest we focus on just stadium participants
16:15:46 <ihrachys> which would be vpnaas, midonet and dynamic-routing
16:15:56 <ihrachys> all those patches are in bad shape / Jenkins-1
16:16:03 <ihrachys> I believe chandankumar planned to respin them
16:16:53 <ihrachys> in one of those patches, suggestion was to add a devstack plugin to new plugin repo that could be used in subprojects to install tempest tests
16:17:01 <ihrachys> here is the new devstack plugin: https://review.openstack.org/#/c/526044/
16:17:36 <ihrachys> is zuul feeling sick today?.. I see a lot of red across the board.
16:18:40 <ihrachys> anyway...
16:18:56 <haleyb> ihrachys: saw a note from infra on a timeout issue they're chasing
16:19:01 <ihrachys> we will need chandankumar to refresh those patches in light of new devstack plugin
16:20:20 <ihrachys> speaking of tempest plugin, there is also a suggestion to add designate-specific job here:
16:20:24 <ihrachys> https://review.openstack.org/520233
16:20:39 <ihrachys> as a separate flavour
16:20:48 <ihrachys> (so we would have dvr, linuxbridge, and designate)
16:21:08 <ihrachys> and this is just to be able to run a single scenario https://review.openstack.org/#/c/520237/
16:21:29 <ihrachys> so I wonder, what do people think about it?
16:21:52 <ihrachys> is it ok to add another one, or we should try to push it into an existing job?
16:22:19 <ihrachys> if we push it, it would mean that we would have a designate enabled job (maybe voting in the future) in neutron gate
16:22:26 <ihrachys> since dvr/ha and linuxbridge jobs are shared with neutron repo
16:22:34 <mlavalle> I like it
16:23:14 <mlavalle> I will review the patches later today
16:23:17 <ihrachys> like adding a new job?
16:23:22 <mlavalle> yes
16:23:55 <ihrachys> it will execute same tests + one dns scenario. is it the plan?
16:24:08 <ihrachys> or should we constrain it to the scenario?
16:24:21 <mlavalle> I think the scenario is enough
16:25:06 <ihrachys> ok, then line 80 in https://review.openstack.org/#/c/520233/8..9/.zuul.yaml should be refined
16:25:12 <ihrachys> to include just dns scenario
16:26:35 <ihrachys> ok I think it's all we have for tempest plugin
16:26:44 <ihrachys> #topic Grafana
16:26:45 <ihrachys> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:27:18 <ihrachys> now that scenarios are back at the board, we have them at 100%. nothing new. :)
16:27:34 <ihrachys> fullstack is also sideways in 60-80% interval
16:28:10 <ihrachys> nothing to talk about here, so moving on
16:28:12 <ihrachys> #topic Scenarios
16:29:37 <ihrachys> linuxbridge was affected by that iptables bug... looking where the fix is
16:30:16 <ihrachys> ok I think the bug was https://bugs.launchpad.net/neutron/+bug/1719711
16:30:17 <openstack> Launchpad bug 1719711 in neutron "iptables failed to apply when binding a port with AGENT.debug_iptables_rules enabled" [High,Fix released] - Assigned to Brian Haley (brian-haley)
16:30:24 <ihrachys> and looks like we merged the fix here: https://review.openstack.org/#/c/523319/
16:30:29 <ihrachys> 3 days ago
16:31:02 <ihrachys> it's definitely a lot more clean now: http://logs.openstack.org/19/523319/9/check/neutron-tempest-plugin-scenario-linuxbridge/3779380/logs/testr_results.html.gz
16:31:06 <ihrachys> still, some failures
16:33:05 <ihrachys> both failures seem to suggest that security group rules not working?
16:33:12 <ihrachys> because I see ping succeeds
16:33:22 <ihrachys> so the expectation is probably that it shouldn't
16:33:28 <ihrachys> I don't think we have a bug report for this
16:33:54 <haleyb> and one is a test for protocol numbers?  i would hope i didn't break something, or is this linuxbridge specific?
16:34:33 <jlibosva> the scenario with ovs agent fails too
16:34:35 <haleyb> https://bugs.launchpad.net/neutron/+bug/1736674 ?
16:34:36 <openstack> Launchpad bug 1736674 in OpenStack Security Advisory "sg rules are sometimes not applied" [Undecided,New]
16:34:36 <jlibosva> so maybe? :)
16:35:06 <jlibosva> or maybe I'm looking at wrong place
16:35:07 <haleyb> related to recent qos change it seems?
16:35:41 <ihrachys> hm. it's weird how this bug was handled security wise
16:37:03 <ihrachys> jlibosva, I think it's the culprit
16:37:08 <haleyb> https://review.openstack.org/#/c/449710/4/neutron/agent/l2/extensions/qos.py removed a try/except, perhaps there's something there?
16:37:09 <ihrachys> thanks for the link
16:38:28 <haleyb> guess we should bump the severity of that bug
16:38:49 <ihrachys> I set to high
16:38:52 <ihrachys> and confirmed
16:40:14 <ihrachys> frickler, any updates about https://bugs.launchpad.net/neutron/+bug/1736674 ?
16:40:15 <openstack> Launchpad bug 1736674 in neutron "sg rules are sometimes not applied" [High,Confirmed]
16:40:34 <ihrachys> oh wait, it's not you who reported it
16:40:48 <ihrachys> sorry
16:41:13 * mlavalle also sometimes mixes them up
16:42:07 <ihrachys> mlavalle, oh actually it's same person :)
16:42:16 <frickler> yes, it's me :)
16:42:28 <mlavalle> that explains it :-)
16:42:34 <ihrachys> Dr. confused me
16:42:47 <ihrachys> frickler, so doc, will the patient live?
16:42:55 <frickler> I found a patch that when reverted seems to fix the issue
16:42:55 <ihrachys> any ideas what makes him sick? :)
16:43:23 <ihrachys> frickler, is the issue 100% reproducible?
16:43:27 <ihrachys> you mean https://review.openstack.org/#/c/449710/ right?
16:43:44 <frickler> yes, it seems to be 100% on my test node
16:44:19 <frickler> it also reproduced on my increased logging patch on the first attempt https://review.openstack.org/#/c/525934/
16:45:12 <frickler> but I don't have time to work on a proper fix currently, so would be great if someone could take over from here
16:45:41 <mlavalle> frickler: what is the patch that, when reverted, fixes this?
16:45:50 <ihrachys> https://review.openstack.org/#/c/449710/4/neutron/agent/l2/extensions/qos.py
16:45:53 <ihrachys> mlavalle, ^
16:45:58 <mlavalle> ok
16:46:20 <ihrachys> we need a volunteer to drive it
16:46:34 <ihrachys> I admit I don't have cycles since I fight other CI failures in downstream
16:46:37 <haleyb> i didn't see any tracebacks in the q-agt log
16:46:47 <ihrachys> yeah q-agt and q-l3 are clean
16:46:55 <frickler> it doesn't traceback
16:47:16 <frickler> without my logging patch, it silently installs iptables rules and deletes them immediately again
16:47:18 <ihrachys> could it be self.qos_driver.delete doing something nasty?
16:47:33 <frickler> see the logs in the bug
16:48:31 <frickler> and with qos disabled, the bug disappears, too
16:49:18 <ihrachys> right. qos rules, are any of them implemented with iptables?
16:49:42 <mlavalle> I don't think so
16:49:51 <ihrachys> dscp is actually
16:49:54 <ihrachys> for linuxbridge at least
16:51:07 <ihrachys> https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/linuxbridge/agent/extension_drivers/qos_driver.py#L125-L137
16:51:15 <haleyb> i can look once i finish the linuxbridge ARP critical issue, unless someone beats me to it...
16:51:20 <ihrachys> this is probably called now that we don't raise an exception
16:52:01 <ihrachys> and this: https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/linuxbridge/agent/extension_drivers/qos_driver.py#L148-L152
16:52:17 <ihrachys> but jlibosva said it's in ovs too
16:52:25 <jlibosva> nono, disregard
16:52:28 <jlibosva> I clicked wrong link
16:52:34 <jlibosva> I later checked and didn't find it in ovs
16:52:45 <jlibosva> I looked again in LB job
16:52:59 <ihrachys> ok
16:53:07 <ihrachys> so let's assume for now it's linuxbridge
16:53:11 <mlavalle> I have a one on one with slawek every Friday
16:53:21 <ihrachys> I think those iptables lines for dscp are worth focusing on
16:53:26 <mlavalle> I can discuss with him this week
16:53:35 <ihrachys> mlavalle, yeah please pull him in. he should be aware of the code.
16:53:42 <ihrachys> thanks!
16:53:52 <mlavalle> so in the meantime, I'll assign the bug to me
16:53:59 <mlavalle> haleyb: is that ok?
16:54:31 <haleyb> mlavalle: sure.  the iptables line might need some tweaking, seems it's installed in hex
16:54:39 <haleyb> -A INPUT -j DSCP --set-dscp 0x0c
16:54:47 <mlavalle> ok
16:54:48 <haleyb> might be unrelated of course
16:54:56 <mlavalle> will consider it
16:55:01 <ihrachys> ok cool
16:55:02 <ihrachys> thanks
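
Note on the hex value quoted above: the rule "-A INPUT -j DSCP --set-dscp 0x0c" carries the DSCP mark in hexadecimal, and 0x0c is decimal 12. The following is a minimal sketch of that conversion, not code from neutron; the helper names dscp_to_hex and hex_to_dscp are hypothetical and exist only for illustration. Whether the hex form is related to the bug was left open in the meeting.

```python
# Illustrative only: these helpers are hypothetical and are not neutron code.
# DSCP is a 6-bit field, so valid marks are 0..63.

def dscp_to_hex(mark: int) -> str:
    """Render a decimal DSCP mark as a two-digit hex string with 0x prefix."""
    if not 0 <= mark <= 63:
        raise ValueError("DSCP mark must be in the range 0..63")
    return format(mark, "#04x")


def hex_to_dscp(text: str) -> int:
    """Parse a hex DSCP string such as the one in the quoted rule."""
    mark = int(text, 16)
    if not 0 <= mark <= 63:
        raise ValueError("DSCP mark must be in the range 0..63")
    return mark


if __name__ == "__main__":
    print(hex_to_dscp("0x0c"))
    print(dscp_to_hex(12))
```

A check that compares rules textually would need to normalize both sides to one representation before comparing; that is a general observation, not a diagnosis of the bug discussed here.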
16:55:25 <ihrachys> as for dvr flavor...
16:55:34 * haleyb ducks
16:56:20 <ihrachys> http://logs.openstack.org/32/527032/1/check/neutron-tempest-plugin-dvr-multinode-scenario/8167f10/logs/testr_results.html.gz
16:56:30 <ihrachys> ok, so just east-west
16:56:54 <ihrachys> I remember there was a bug for that...
16:57:05 <ihrachys> https://bugs.launchpad.net/neutron/+bug/1717302 ?
16:57:06 <openstack> Launchpad bug 1717302 in neutron "Tempest floatingip scenario tests failing on DVR Multinode setup with HA" [High,Confirmed]
16:57:39 <ihrachys> is it the bug you mentioned haleyb ?
16:57:55 <mlavalle> I think it is
16:57:56 <ihrachys> probably not since it's ovs..
16:58:26 <ihrachys> but anyway. last thing I see there is Swami bailing out
16:58:27 <haleyb> ihrachys: yes, and swami has no time and asked for help
16:58:30 <ihrachys> has anyone picked it up?
16:58:30 <jlibosva> Perhaps we could do the same with east-west tests what we did with fullstack? mark them as unstable to see where it will get us?
16:58:43 <ihrachys> jlibosva, good idea
16:58:48 <ihrachys> I think if it's the last bit, it's worth it
16:58:58 <ihrachys> then we can enable voting and have first victory
16:59:07 <jlibosva> it's actually haleyb's idea I think :)
16:59:26 <ihrachys> haleyb, please assign the bug to yourself
16:59:37 <ihrachys> haleyb, and please post a patch disabling the tests. works for you?
16:59:52 <haleyb> ihrachys: yes, we can do that as we chase the issue
16:59:58 <ihrachys> #action haleyb to post patch disabling east-west tests
17:00:00 <ihrachys> cool
17:00:04 <ihrachys> we are sadly out of time
17:00:09 <ihrachys> next time we'll focus on fullstack
17:00:14 <ihrachys> thanks folks!
17:00:17 <ihrachys> #endmeeting
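
Note on "mark them as unstable": the idea raised for the east-west tests is to turn a failure of a known-flaky test into a skip, so the job can go green while the bug is chased. The sketch below shows that pattern with the standard unittest module only. The decorator here is an illustrative stand-in written for this note, not the helper used in the neutron tree, and the test class and simulated failure are hypothetical.

```python
import functools
import unittest


def unstable_test(reason):
    """Illustrative stand-in: report a failing test as skipped, with a reason."""
    def decorator(test_method):
        @functools.wraps(test_method)
        def wrapper(self, *args, **kwargs):
            try:
                return test_method(self, *args, **kwargs)
            except unittest.SkipTest:
                # Leave genuine skips untouched.
                raise
            except Exception as exc:
                self.skipTest(
                    "%s marked unstable (%s); failure was: %s"
                    % (self.id(), reason, exc))
        return wrapper
    return decorator


class EastWestSketch(unittest.TestCase):
    """Hypothetical test case standing in for an east-west scenario test."""

    @unstable_test("bug 1717302")
    def test_east_west(self):
        self.fail("simulated connectivity failure")


def run_sketch():
    """Run the hypothetical test case and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(EastWestSketch)
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    outcome = run_sketch()
    print(len(outcome.skipped), len(outcome.failures), len(outcome.errors))
```

The trade-off is that a skipped test no longer blocks the gate, so the skip reason should name the tracking bug to keep the failure visible.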