16:00:40 <ihrachys> #startmeeting neutron_ci
16:00:42 <openstack> Meeting started Tue Dec 12 16:00:40 2017 UTC and is due to finish in 60 minutes. The chair is ihrachys. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:43 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:46 <openstack> The meeting name has been set to 'neutron_ci'
16:00:50 <jlibosva> o/
16:00:51 <mlavalle> o/
16:00:56 <ihrachys> o/
16:01:20 <haleyb> hi
16:01:28 <ihrachys> #topic Actions from prev meeting
16:01:37 <ihrachys> "ihrachys to make sure legacy tempest jobs are gone in gate queue"
16:01:44 <ihrachys> done: https://review.openstack.org/525754
16:01:51 <ihrachys> "ihrachys to update grafana for new non-legacy job names"
16:02:02 <ihrachys> done: https://review.openstack.org/525759
16:02:15 <ihrachys> "jlibosva to post wishlist bugs for fullstack improvements (reuse db dump; reuse env per class)"
16:02:21 <ihrachys> jlibosva, ?
16:02:24 * jlibosva sux
16:02:32 <ihrachys> gotcha
16:02:35 <ihrachys> #action jlibosva to post wishlist bugs for fullstack improvements (reuse db dump; reuse env per class)
16:02:37 <jlibosva> can you flip it pls as
16:02:39 <jlibosva> yeah that :)
16:02:41 <jlibosva> thanks
16:03:09 <ihrachys> these are all AIs we had
16:03:25 <ihrachys> I will flip the order of topics a bit this time
16:03:28 <ihrachys> #topic Tempest plugin
16:04:11 <ihrachys> we made quite some progress there
16:04:45 <ihrachys> for one, tempest in-tree tests are gone: https://review.openstack.org/#/c/506672/
16:05:30 <ihrachys> we track remaining bits in https://etherpad.openstack.org/p/neutron-tempest-plugin-job-move
16:06:08 <ihrachys> I believe there are two main bits now left
16:06:28 <ihrachys> 1. completely remove legacy jobs from infra repos (this requires to first move them into our stable branches); and
16:06:51 <ihrachys> 2. remove all remaining neutron/tests/tempest/ contents (this requires all consuming projects to switch to new repo)
16:06:59 <ihrachys> let's start from point 1
16:07:06 <ihrachys> I believe mlavalle was looking into it
16:07:28 <mlavalle> I didn't make as much progress as I wanted on this
16:07:55 <ihrachys> this is the patch correct? https://review.openstack.org/#/c/525345/
16:07:58 <mlavalle> haleyb pointed to a tripleo job that I didn't include in my migration patch
16:08:19 <mlavalle> ihrachys: yes, that's the patch
16:08:38 <ihrachys> what's the issue with tripleo job?
16:08:51 <mlavalle> I asked the infra team and EmilienM about that tripleo job
16:09:02 <mlavalle> we don't need to migrate it
16:09:12 <ihrachys> right. because it's probably shared across projects
16:09:23 <mlavalle> but EmilienM would like us to run in our check queue
16:09:30 <mlavalle> non voting
16:09:48 <mlavalle> I am looking now into adding it
16:10:13 <ihrachys> but should we keep it in our tree? isn't it shared?
16:10:25 <ihrachys> I believe if it's shared it then should live in a common place
16:10:27 <mlavalle> no, we are not going to keep it in our tree
16:10:35 <EmilienM> sorry, I'm in a meeting already
16:10:40 <EmilienM> I'm happy to discuss offline later
16:10:49 <mlavalle> EmilienM: no need to answer
16:10:57 <mlavalle> disregard us
16:11:11 <ihrachys> mlavalle, ok anyway, this is something that you will tackle one way or another
16:11:12 <EmilienM> mlavalle: I can't disregard you :-)
16:11:18 <haleyb> mlavalle: so it's an integrated gate job?
16:11:29 <ihrachys> mlavalle, another issue I see there is that new non-legacy jobs are now voting for some that are non-voting for legacy
16:11:35 <ihrachys> mlavalle, like fullstack or dvr-ha
16:11:47 <ihrachys> mlavalle, we should make sure their voting state is same
16:11:52 <mlavalle> ihrachys: I'll fithat
16:11:56 <mlavalle> fix that
16:12:17 <ihrachys> ok
16:13:02 <ihrachys> once this is done, we should be able to clean up infra repos from all those jobs
16:13:15 <ihrachys> (well, after we backport the patch to stable)
16:13:54 <ihrachys> we have something for -api here: https://review.openstack.org/516724 and here: https://review.openstack.org/516744
16:14:07 <ihrachys> but it will need refinement to clean up all those old jobs, not just -api
16:14:26 <ihrachys> the next bullet point to cover is cleaning up in-tree remaining tempest code
16:14:50 <ihrachys> https://etherpad.openstack.org/p/neutron-tempest-plugin-job-move line 28+ captured patches for projects that consume the code
16:14:59 <ihrachys> that need to land before we can safely pull off the rug
16:15:36 <ihrachys> I would suggest we focus on just stadium participants
16:15:46 <ihrachys> which would be vpnaas, midonet and dynamic-routing
16:15:56 <ihrachys> all those patches are in bad shape / Jenkins-1
16:16:03 <ihrachys> I believe chandankumar planned to respin them
16:16:53 <ihrachys> in one of those patches, suggestion was to add a devstack plugin to new plugin repo that could be used in subprojects to install tempest tests
16:17:01 <ihrachys> here is the new devstack plugin: https://review.openstack.org/#/c/526044/
16:17:36 <ihrachys> is zuul feeling sick today?.. I see a lot of red across the board.
16:18:40 <ihrachys> anyway...
16:18:56 <haleyb> ihrachys: saw a note from infra on a timeout issue they're chasing
16:19:01 <ihrachys> we will need chandankumar to refresh those patches in light of new devstack plugin
16:20:20 <ihrachys> speaking of tempest plugin, there is also a suggestion to add designate-specific job here:
16:20:24 <ihrachys> https://review.openstack.org/520233
16:20:39 <ihrachys> as a separate flavour
16:20:48 <ihrachys> (so we would have dvr, linuxbridge, and designate)
16:21:08 <ihrachys> and this is just to be able to run a single scenario https://review.openstack.org/#/c/520237/
16:21:29 <ihrachys> so I wonder, what do people think about it?
16:21:52 <ihrachys> is it ok to add another one, or we should try to push it into an existing job?
16:22:19 <ihrachys> if we push it, it would mean that we would have a designate enabled job (maybe voting in the future) in neutron gate
16:22:26 <ihrachys> since dvr/ha and linuxbridge jobs are shared with neutron repo
16:22:34 <mlavalle> I like it
16:23:14 <mlavalle> I will review the patches later today
16:23:17 <ihrachys> like adding a new job?
16:23:22 <mlavalle> yes
16:23:55 <ihrachys> it will execute same tests + one dns scenario. is it the plan?
16:24:08 <ihrachys> or should we constrain it to the scenario?
16:24:21 <mlavalle> I think the scenario is enough
16:25:06 <ihrachys> ok, then line 80 in https://review.openstack.org/#/c/520233/8..9/.zuul.yaml should be refined
16:25:12 <ihrachys> to include just dns scenario
16:26:35 <ihrachys> ok I think it's all we have for tempest plugin
16:26:44 <ihrachys> #topic Grafana
16:26:45 <ihrachys> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:27:18 <ihrachys> now that scenarios are back at the board, we have them at 100%. nothing new. :)
16:27:34 <ihrachys> fullstack is also sideways in 60-80% interval
16:28:10 <ihrachys> nothing to talk about here, so moving on
16:28:12 <ihrachys> #topic Scenarios
16:29:37 <ihrachys> linuxbridge was affected by that iptables bug... looking where the fix is
16:30:16 <ihrachys> ok I think the bug was https://bugs.launchpad.net/neutron/+bug/1719711
16:30:17 <openstack> Launchpad bug 1719711 in neutron "iptables failed to apply when binding a port with AGENT.debug_iptables_rules enabled" [High,Fix released] - Assigned to Brian Haley (brian-haley)
16:30:24 <ihrachys> and looks like we merged the fix here: https://review.openstack.org/#/c/523319/
16:30:29 <ihrachys> 3 days ago
16:31:02 <ihrachys> it's definitely a lot more clean now: http://logs.openstack.org/19/523319/9/check/neutron-tempest-plugin-scenario-linuxbridge/3779380/logs/testr_results.html.gz
16:31:06 <ihrachys> still, some failures
16:33:05 <ihrachys> both failures seem to suggest that security group rules not working?
16:33:12 <ihrachys> because I see ping succeeds
16:33:22 <ihrachys> so the expectation is probably that it shouldn't
16:33:28 <ihrachys> I don't think we have a bug report for this
16:33:54 <haleyb> and one is a test for protocol numbers? i would hope i didn't break something, or is this linuxbridge specific?
16:34:33 <jlibosva> the scenario with ovs agent fails too
16:34:35 <haleyb> https://bugs.launchpad.net/neutron/+bug/1736674 ?
16:34:36 <openstack> Launchpad bug 1736674 in OpenStack Security Advisory "sg rules are sometimes not applied" [Undecided,New]
16:34:36 <jlibosva> so maybe? :)
16:35:06 <jlibosva> or maybe I'm looking at wrong place
16:35:07 <haleyb> related to recent qos change it seems?
16:35:41 <ihrachys> hm. it's weird how this bug was handled security wise
16:37:03 <ihrachys> jlibosva, I think it's the culprit
16:37:08 <haleyb> https://review.openstack.org/#/c/449710/4/neutron/agent/l2/extensions/qos.py removed a try/except, perhaps there's something there?
16:37:09 <ihrachys> thanks for the link
16:38:28 <haleyb> guess we should bump the severity of that bug
16:38:49 <ihrachys> I set to high
16:38:52 <ihrachys> and confirmed
16:40:14 <ihrachys> frickler, any updates about https://bugs.launchpad.net/neutron/+bug/1736674 ?
16:40:15 <openstack> Launchpad bug 1736674 in neutron "sg rules are sometimes not applied" [High,Confirmed]
16:40:34 <ihrachys> oh wait, it's not you who reported it
16:40:48 <ihrachys> sorry
16:41:13 * mlavalle also sometimes mixes them up
16:42:07 <ihrachys> mlavalle, oh actually it's same person :)
16:42:16 <frickler> yes, it's me :)
16:42:28 <mlavalle> that explains it :-)
16:42:34 <ihrachys> Dr. confused me
16:42:47 <ihrachys> frickler, so doc, will the patient live?
16:42:55 <frickler> I found a patch that when reverted seems to fix the issue
16:42:55 <ihrachys> any ideas what makes him sick? :)
16:43:23 <ihrachys> frickler, is the issue 100% reproducible?
16:43:27 <ihrachys> you mean https://review.openstack.org/#/c/449710/ right?
16:43:44 <frickler> yes, it seems to be 100% on my test node
16:44:19 <frickler> it also reproduced on my increased logging patch on the first attempt https://review.openstack.org/#/c/525934/
16:45:12 <frickler> but I don't have time to work on a proper fix currently, so would be great if someone could take over from here
16:45:41 <mlavalle> frickler: what is the patch that, when reverted, fixes this?
16:45:50 <ihrachys> https://review.openstack.org/#/c/449710/4/neutron/agent/l2/extensions/qos.py
16:45:53 <ihrachys> mlavalle, ^
16:45:58 <mlavalle> ok
16:46:20 <ihrachys> we need a volunteer to drive it
16:46:34 <ihrachys> I admit I don't have cycles since I fight other CI failures in downstream
16:46:37 <haleyb> i didn't see any tracebacks in the q-agt log
16:46:47 <ihrachys> yeah q-agt and q-l3 are clean
16:46:55 <frickler> it doesn't traceback
16:47:16 <frickler> without my logging patch, it silently installs iptables rules and deletes them immediately again
16:47:18 <ihrachys> could it be self.qos_driver.delete doing something nasty?
16:47:33 <frickler> see the logs in the bug
16:48:31 <frickler> and with qos disabled, the bug disappears, too
16:49:18 <ihrachys> right. qos rules, are any of them implemented with iptables?
16:49:42 <mlavalle> I don't think so
16:49:51 <ihrachys> dscp is actually
16:49:54 <ihrachys> for linuxbridge at least
16:51:07 <ihrachys> https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/linuxbridge/agent/extension_drivers/qos_driver.py#L125-L137
16:51:15 <haleyb> i can look once i finish the linuxbridge ARP critical issue, unless someone beats me to it...
16:51:20 <ihrachys> this is probably called now that we don't raise an exception
16:52:01 <ihrachys> and this: https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/linuxbridge/agent/extension_drivers/qos_driver.py#L148-L152
16:52:17 <ihrachys> but jlibosva told it's in ovs too
16:52:25 <jlibosva> nono, disregard
16:52:28 <jlibosva> I clicked wrong link
16:52:34 <jlibosva> I later checked and didn't find it in ov
16:52:35 <jlibosva> ovs
16:52:45 <jlibosva> I looked again in LB job
16:52:59 <ihrachys> ok
16:53:07 <ihrachys> so let's assume for now it's linuxbridge
16:53:11 <mlavalle> I have a one on one with slawek every Friday
16:53:21 <ihrachys> I think those iptables lines for dscp are worth focusing on
16:53:26 <mlavalle> I can discuss with him this week
16:53:35 <ihrachys> mlavalle, yeah please pull him in. he should be aware of the code.
16:53:42 <ihrachys> thanks!
16:53:52 <mlavalle> so in the mean time, I'll assign the bug to me
16:53:59 <mlavalle> haleyb: is that ok?
16:54:31 <haleyb> mlavalle: sure. the iptables line might need some tweaking, seems it's installed in hex
16:54:39 <haleyb> -A INPUT -j DSCP --set-dscp 0x0c
16:54:47 <mlavalle> ok
16:54:48 <haleyb> might be unrelated of course
16:54:56 <mlavalle> will consider it
16:55:01 <ihrachys> ok cool
16:55:02 <ihrachys> thanks
16:55:25 <ihrachys> as for dvr flavor...
16:55:34 * haleyb ducks
16:56:20 <ihrachys> http://logs.openstack.org/32/527032/1/check/neutron-tempest-plugin-dvr-multinode-scenario/8167f10/logs/testr_results.html.gz
16:56:30 <ihrachys> ok, so just east-west
16:56:54 <ihrachys> I remember there was a bug for that...
16:57:05 <ihrachys> https://bugs.launchpad.net/neutron/+bug/1717302 ?
16:57:06 <openstack> Launchpad bug 1717302 in neutron "Tempest floatingip scenario tests failing on DVR Multinode setup with HA" [High,Confirmed]
16:57:39 <ihrachys> is it the bug you mentioned haleyb ?
16:57:55 <mlavalle> I think it is
16:57:56 <ihrachys> probably not since it's ovs..
16:58:26 <ihrachys> but anyway. last thing I see there is Swami bailing out
16:58:27 <haleyb> ihrachys: yes, and swami has no time and asked for help
16:58:30 <ihrachys> has anyone picked it up?
16:58:30 <jlibosva> Perhaps we could do the same with east-west tests what we did with fullstack? mark them as unstable to see where it will get us?
16:58:43 <ihrachys> jlibosva, good idea
16:58:48 <ihrachys> I think if it's the last bit, it's worth it
16:58:58 <ihrachys> then we can enable voting and have first victor
16:59:03 <ihrachys> victory
16:59:07 <jlibosva> it's actually haleyb's idea I think :)
16:59:26 <ihrachys> haleyb, please assign the bug to yourself
16:59:37 <ihrachys> haleyb, and please post a patch disabling the tests. works for you?
16:59:52 <haleyb> ihrachys: yes, we can do that as we chase the issue
16:59:58 <ihrachys> #action haleyb to post patch disabling east-west tests
17:00:00 <ihrachys> cool
17:00:04 <ihrachys> we are sadly out of time
17:00:09 <ihrachys> next time we'll focus on fullstack
17:00:14 <ihrachys> thanks folks!
17:00:17 <ihrachys> #endmeeting
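
Editor's note on "mark them as unstable": the log does not show how this is done, so the following is only a minimal sketch of the general idea, assuming a decorator that converts a test failure into a skip so that a known-flaky test stops turning the job red while the bug is chased. The names `unstable_test`, `EastWestSketch` and `run_sketch` are hypothetical and exist only for illustration; the helper actually used in neutron may differ.

```python
import functools
import unittest


def unstable_test(reason):
    """Hypothetical helper: report any failure of the wrapped test as a skip."""
    def decorator(test_method):
        @functools.wraps(test_method)
        def wrapper(self, *args, **kwargs):
            try:
                return test_method(self, *args, **kwargs)
            except Exception as exc:
                # AssertionError is an Exception, so failed assertions land here.
                self.skipTest(
                    "marked unstable (%s); failure was: %s" % (reason, exc))
        return wrapper
    return decorator


class EastWestSketch(unittest.TestCase):
    """Stand-in for a flaky scenario test; not a real neutron test."""

    @unstable_test("bug 1717302")
    def test_east_west(self):
        self.fail("simulated connectivity failure")


def run_sketch():
    """Run the sketch test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(EastWestSketch)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

With a wrapper like this the failure is still visible in the skip message, which keeps the data around for whoever picks up the bug, while the job itself can be made voting.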