16:00:40 #startmeeting neutron_ci
16:00:42 Meeting started Tue Dec 12 16:00:40 2017 UTC and is due to finish in 60 minutes. The chair is ihrachys. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:43 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:46 The meeting name has been set to 'neutron_ci'
16:00:50 o/
16:00:51 o/
16:00:56 o/
16:01:20 hi
16:01:28 #topic Actions from prev meeting
16:01:37 "ihrachys to make sure legacy tempest jobs are gone in gate queue"
16:01:44 done: https://review.openstack.org/525754
16:01:51 "ihrachys to update grafana for new non-legacy job names"
16:02:02 done: https://review.openstack.org/525759
16:02:15 "jlibosva to post wishlist bugs for fullstack improvements (reuse db dump; reuse env per class)"
16:02:21 jlibosva, ?
16:02:24 * jlibosva sux
16:02:32 gotcha
16:02:35 #action jlibosva to post wishlist bugs for fullstack improvements (reuse db dump; reuse env per class)
16:02:37 can you flip it pls as
16:02:39 yeah that :)
16:02:41 thanks
16:03:09 these are all the AIs we had
16:03:25 I will flip the order of topics a bit this time
16:03:28 #topic Tempest plugin
16:04:11 we made quite some progress there
16:04:45 for one, tempest in-tree tests are gone: https://review.openstack.org/#/c/506672/
16:05:30 we track the remaining bits in https://etherpad.openstack.org/p/neutron-tempest-plugin-job-move
16:06:08 I believe there are two main bits left now
16:06:28 1. completely remove legacy jobs from infra repos (this requires first moving them into our stable branches); and
16:06:51 2. remove all remaining neutron/tests/tempest/ contents (this requires all consuming projects to switch to the new repo)
16:06:59 let's start from point 1
16:07:06 I believe mlavalle was looking into it
16:07:28 I didn't make as much progress as I wanted on this
16:07:55 this is the patch, correct? https://review.openstack.org/#/c/525345/
16:07:58 haleyb pointed to a tripleo job that I didn't include in my migration patch
16:08:19 ihrachys: yes, that's the patch
16:08:38 what's the issue with the tripleo job?
16:08:51 I asked the infra team and EmilienM about that tripleo job
16:09:02 we don't need to migrate it
16:09:12 right. because it's probably shared across projects
16:09:23 but EmilienM would like us to run it in our check queue
16:09:30 non voting
16:09:48 I am looking into adding it now
16:10:13 but should we keep it in our tree? isn't it shared?
16:10:25 I believe if it's shared, it should then live in a common place
16:10:27 no, we are not going to keep it in our tree
16:10:35 sorry, I'm in a meeting already
16:10:40 I'm happy to discuss offline later
16:10:49 EmilienM: no need to answer
16:10:57 disregard us
16:11:11 mlavalle, ok anyway, this is something that you will tackle one way or another
16:11:12 mlavalle: I can't disregard you :-)
16:11:18 mlavalle: so it's an integrated gate job?
16:11:29 mlavalle, another issue I see there is that some new non-legacy jobs are now voting where the legacy ones were non-voting
16:11:35 mlavalle, like fullstack or dvr-ha
16:11:47 mlavalle, we should make sure their voting state is the same
16:11:52 ihrachys: I'll fix that
16:12:17 ok
16:13:02 once this is done, we should be able to clean up the infra repos from all those jobs
16:13:15 (well, after we backport the patch to stable)
16:13:54 we have something for -api here: https://review.openstack.org/516724 and here: https://review.openstack.org/516744
16:14:07 but it will need refinement to clean up all those old jobs, not just -api
16:14:26 the next bullet point to cover is cleaning up the remaining in-tree tempest code
16:14:50 https://etherpad.openstack.org/p/neutron-tempest-plugin-job-move line 28+ captured patches for projects that consume the code
16:14:59 those need to land before we can safely pull the rug out
16:15:36 I would suggest we focus on just the stadium participants
16:15:46 which would be vpnaas, midonet and dynamic-routing
16:15:56 all those patches are in bad shape / Jenkins -1
16:16:03 I believe chandankumar planned to respin them
16:16:53 in one of those patches, the suggestion was to add a devstack plugin to the new plugin repo that could be used in subprojects to install the tempest tests
16:17:01 here is the new devstack plugin: https://review.openstack.org/#/c/526044/
16:17:36 is zuul feeling sick today? I see a lot of red across the board.
16:18:40 anyway...
16:18:56 ihrachys: saw a note from infra on a timeout issue they're chasing
16:19:01 we will need chandankumar to refresh those patches in light of the new devstack plugin
16:20:20 speaking of the tempest plugin, there is also a suggestion to add a designate-specific job here:
16:20:24 https://review.openstack.org/520233
16:20:39 as a separate flavour
16:20:48 (so we would have dvr, linuxbridge, and designate)
16:21:08 and this is just to be able to run a single scenario: https://review.openstack.org/#/c/520237/
16:21:29 so I wonder, what do people think about it?
16:21:52 is it ok to add another one, or should we try to push it into an existing job?
16:22:19 if we push it, it would mean that we would have a designate-enabled job (maybe voting in the future) in the neutron gate
16:22:26 since the dvr/ha and linuxbridge jobs are shared with the neutron repo
16:22:34 I like it
16:23:14 I will review the patches later today
16:23:17 like adding a new job?
16:23:22 yes
16:23:55 it will execute the same tests + one dns scenario. is that the plan?
16:24:08 or should we constrain it to the scenario?
16:24:21 I think the scenario is enough
16:25:06 ok, then line 80 in https://review.openstack.org/#/c/520233/8..9/.zuul.yaml should be refined
16:25:12 to include just the dns scenario
16:26:35 ok I think that's all we have for the tempest plugin
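[Editor's note: for context on the designate job discussed above, the single dns scenario from review 520237 boils down to booting a port with a dns_name on a dns_domain-enabled network and checking that a matching record appears in designate. Below is a rough sketch of that shape only; the class, helper names and data layout are hypothetical illustrations, not the actual code from the review.]

```python
# Hypothetical sketch: helper names (_create_server_with_fip,
# _list_recordsets) and record shapes are illustrative stand-ins,
# not the real neutron-tempest-plugin code from review 520237.
import testtools


class DNSIntegrationSketch(testtools.TestCase):

    def test_fip_gets_dns_record(self):
        # Boot a server on a network with dns_domain set, request a
        # dns_name for the port, and attach a floating IP.
        server, fip = self._create_server_with_fip(dns_name="vm1")
        # With the dns extension pointing at designate, neutron should
        # publish an A record for the floating IP in the matching zone.
        records = self._list_recordsets(zone="example.org.")
        self.assertIn(
            ("vm1.example.org.", fip["floating_ip_address"]),
            [(r["name"], r["records"][0]) for r in records])

    # Stubs standing in for deployment-specific tempest plumbing.
    def _create_server_with_fip(self, dns_name):
        raise NotImplementedError("compute/network client plumbing")

    def _list_recordsets(self, zone):
        raise NotImplementedError("designate client call")
```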
16:26:44 #topic Grafana
16:26:45 http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:27:18 now that scenarios are back on the board, we have them at 100%. nothing new. :)
16:27:34 fullstack is also moving sideways in the 60-80% interval
16:28:10 nothing to talk about here, so moving on
16:28:12 #topic Scenarios
16:29:37 linuxbridge was affected by that iptables bug... looking where the fix is
16:30:16 ok I think the bug was https://bugs.launchpad.net/neutron/+bug/1719711
16:30:17 Launchpad bug 1719711 in neutron "iptables failed to apply when binding a port with AGENT.debug_iptables_rules enabled" [High,Fix released] - Assigned to Brian Haley (brian-haley)
16:30:24 and it looks like we merged the fix here: https://review.openstack.org/#/c/523319/
16:30:29 3 days ago
16:31:02 it's definitely a lot cleaner now: http://logs.openstack.org/19/523319/9/check/neutron-tempest-plugin-scenario-linuxbridge/3779380/logs/testr_results.html.gz
16:31:06 still, some failures
16:33:05 both failures seem to suggest that security group rules are not working?
16:33:12 because I see ping succeeds
16:33:22 so the expectation is probably that it shouldn't
16:33:28 I don't think we have a bug report for this
16:33:54 and one is a test for protocol numbers? I would hope I didn't break something, or is this linuxbridge specific?
16:34:33 the scenario with the ovs agent fails too
16:34:35 https://bugs.launchpad.net/neutron/+bug/1736674 ?
16:34:36 Launchpad bug 1736674 in OpenStack Security Advisory "sg rules are sometimes not applied" [Undecided,New]
16:34:36 so maybe? :)
16:35:06 or maybe I'm looking at the wrong place
16:35:07 related to a recent qos change, it seems?
16:35:41 hm. it's weird how this bug was handled security-wise
16:37:03 jlibosva, I think it's the culprit
16:37:08 https://review.openstack.org/#/c/449710/4/neutron/agent/l2/extensions/qos.py removed a try/except, perhaps there's something there?
16:37:09 thanks for the link
16:38:28 guess we should bump the severity of that bug
16:38:49 I set it to high
16:38:52 and confirmed
16:40:14 frickler, any updates about https://bugs.launchpad.net/neutron/+bug/1736674 ?
16:40:15 Launchpad bug 1736674 in neutron "sg rules are sometimes not applied" [High,Confirmed]
16:40:34 oh wait, it's not you who reported it
16:40:48 sorry
16:41:13 * mlavalle also sometimes mixes them up
16:42:07 mlavalle, oh actually it's the same person :)
16:42:16 yes, it's me :)
16:42:28 that explains it :-)
16:42:34 the Dr. confused me
16:42:47 frickler, so doc, will the patient live?
16:42:55 I found a patch that, when reverted, seems to fix the issue
16:42:55 any ideas what makes him sick? :)
16:43:23 frickler, is the issue 100% reproducible?
16:43:27 you mean https://review.openstack.org/#/c/449710/ right?
16:43:44 yes, it seems to be 100% on my test node
16:44:19 it also reproduced on my increased-logging patch on the first attempt: https://review.openstack.org/#/c/525934/
16:45:12 but I don't have time to work on a proper fix currently, so it would be great if someone could take over from here
16:45:41 frickler: what is the patch that, when reverted, fixes this?
16:45:50 https://review.openstack.org/#/c/449710/4/neutron/agent/l2/extensions/qos.py
16:45:53 mlavalle, ^
16:45:58 ok
16:46:20 we need a volunteer to drive it
16:46:34 I admit I don't have cycles since I'm fighting other CI failures downstream
16:46:37 I didn't see any tracebacks in the q-agt log
16:46:47 yeah, q-agt and q-l3 are clean
16:46:55 it doesn't traceback
16:47:16 without my logging patch, it silently installs iptables rules and deletes them immediately again
16:47:18 could it be self.qos_driver.delete doing something nasty?
16:47:33 see the logs in the bug
16:48:31 and with qos disabled, the bug disappears, too
16:49:18 right. qos rules, are any of them implemented with iptables?
16:49:42 I don't think so
16:49:51 dscp is actually
16:49:54 for linuxbridge at least
16:51:07 https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/linuxbridge/agent/extension_drivers/qos_driver.py#L125-L137
16:51:15 I can look once I finish the linuxbridge ARP critical issue, unless someone beats me to it...
16:51:20 this is probably called now that we don't raise an exception
16:52:01 and this: https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/linuxbridge/agent/extension_drivers/qos_driver.py#L148-L152
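[Editor's note: to make the lines linked above easier to follow, here is a condensed sketch of how DSCP marking on the linuxbridge agent works conceptually. The class and the iptables manager interface (ipv4['mangle'].add_rule/remove_rule plus apply()) follow neutron's usual IptablesManager style, but this is an illustration, not the literal qos_driver.py code.]

```python
# Simplified illustration (not the literal qos_driver.py code): DSCP
# marking on linuxbridge is implemented as iptables mangle rules, so the
# qos extension's add/delete paths touch the same iptables machinery the
# security group firewall relies on.


class DscpMarkingSketch(object):
    def __init__(self, iptables_manager):
        # Assumed: an IptablesManager-style object whose apply() rewrites
        # the chains it manages.
        self.iptables = iptables_manager

    @staticmethod
    def _dscp_rule(device, dscp_mark):
        # iptables takes the mark as an integer and prints it back in
        # hex; e.g. 12 shows up as "--set-dscp 0x0c" in iptables-save.
        return ("-m physdev --physdev-in %s -j DSCP --set-dscp %d"
                % (device, dscp_mark))

    def create_dscp_marking(self, device, dscp_mark):
        self.iptables.ipv4["mangle"].add_rule(
            "POSTROUTING", self._dscp_rule(device, dscp_mark))
        self.iptables.apply()

    def delete_dscp_marking(self, device, dscp_mark):
        # The symptom discussed above is rules being installed and then
        # deleted right away; the suspect is this delete path now running
        # after the try/except was removed in review 449710.
        self.iptables.ipv4["mangle"].remove_rule(
            "POSTROUTING", self._dscp_rule(device, dscp_mark))
        self.iptables.apply()
```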
16:52:17 but jlibosva said it's in ovs too
16:52:25 no no, disregard
16:52:28 I clicked the wrong link
16:52:34 I later checked and didn't find it in ovs
16:52:45 I looked again in the LB job
16:52:59 ok
16:53:07 so let's assume for now it's linuxbridge
16:53:11 I have a one-on-one with slawek every Friday
16:53:21 I think those iptables lines for dscp are worth focusing on
16:53:26 I can discuss it with him this week
16:53:35 mlavalle, yeah please pull him in. he should be aware of the code.
16:53:42 thanks!
16:53:52 so in the meantime, I'll assign the bug to myself
16:53:59 haleyb: is that ok?
16:54:31 mlavalle: sure. the iptables line might need some tweaking, seems it's installed in hex
16:54:39 -A INPUT -j DSCP --set-dscp 0x0c
16:54:47 ok
16:54:48 might be unrelated of course
16:54:56 will consider it
16:55:01 ok cool
16:55:02 thanks
16:55:25 as for the dvr flavor...
16:55:34 * haleyb ducks
16:56:20 http://logs.openstack.org/32/527032/1/check/neutron-tempest-plugin-dvr-multinode-scenario/8167f10/logs/testr_results.html.gz
16:56:30 ok, so just east-west
16:56:54 I remember there was a bug for that...
16:57:05 https://bugs.launchpad.net/neutron/+bug/1717302 ?
16:57:06 Launchpad bug 1717302 in neutron "Tempest floatingip scenario tests failing on DVR Multinode setup with HA" [High,Confirmed]
16:57:39 is it the bug you mentioned, haleyb?
16:57:55 I think it is
16:57:56 probably not, since it's ovs..
16:58:26 but anyway. the last thing I see there is Swami bailing out
16:58:27 ihrachys: yes, and swami has no time and asked for help
16:58:30 has anyone picked it up?
16:58:30 Perhaps we could do the same with the east-west tests as we did with fullstack? mark them as unstable to see where it gets us?
16:58:43 jlibosva, good idea
16:58:48 I think if it's the last bit, it's worth it
16:58:58 then we can enable voting and have our first victory
16:59:07 it's actually haleyb's idea I think :)
16:59:26 haleyb, please assign the bug to yourself
16:59:37 haleyb, and please post a patch disabling the tests. works for you?
16:59:52 ihrachys: yes, we can do that as we chase the issue
16:59:58 #action haleyb to post patch disabling east-west tests
17:00:00 cool
17:00:04 we are sadly out of time
17:00:09 next time we'll focus on fullstack
17:00:14 thanks folks!
17:00:17 #endmeeting
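[Post-meeting note: the "mark as unstable" approach agreed on for the east-west tests typically takes the form of a decorator along these lines. This is a generic sketch of the pattern, not necessarily the exact helper used in neutron: the test still runs and stays visible in results, but a failure becomes a skip so it cannot block the gate while the bug is chased.]

```python
# Generic sketch of the "unstable test" pattern (assumed shape, not
# necessarily the exact neutron helper).
import functools


def unstable_test(reason):
    """Turn a test failure into a skip, keeping the test visible."""
    def decorator(f):
        @functools.wraps(f)
        def wrapper(self, *args, **kwargs):
            try:
                return f(self, *args, **kwargs)
            except Exception as e:
                # self.skipTest is the standard unittest/testtools API
                # for raising a skip from inside a running test.
                self.skipTest(
                    "Marked unstable (%s), failed with: %s" % (reason, e))
        return wrapper
    return decorator

# Usage on a test method of a unittest/testtools TestCase
# (hypothetical test name):
#
#     @unstable_test("bug 1717302")
#     def test_east_west_traffic(self):
#         ...
```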