16:00:55 <slaweq> #startmeeting neutron_ci
16:00:56 <openstack> Meeting started Tue May 29 16:00:55 2018 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:57 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:58 <slaweq> hello
16:00:59 <openstack> The meeting name has been set to 'neutron_ci'
16:01:10 <mlavalle> o/
16:01:52 <slaweq> hi mlavalle
16:02:00 <slaweq> haleyb: are You around maybe?
16:02:23 <haleyb> slaweq: yes, sorry, just getting off other call
16:02:33 <slaweq> haleyb: no problem, hi :)
16:02:40 <slaweq> ok, so let's start
16:02:46 <slaweq> #topic Actions from previous meetings
16:03:00 <slaweq> first one is:
16:03:00 <slaweq> mlavalle will check why trunk tests are failing in dvr multinode scenario
16:03:17 <mlavalle> I took a look at it
16:03:47 <mlavalle> The first one is neutron_tempest_plugin.scenario.test_trunk.TrunkTest.test_subport_connectivity
16:04:30 <mlavalle> it seems that with this patch we introduced a bug: https://review.openstack.org/#/c/567875
16:05:08 <mlavalle> We are getting a bash syntax error: http://logs.openstack.org/05/570405/3/check/neutron-tempest-plugin-dvr-multinode-scenario/bd70e59/job-output.txt.gz#_2018-05-25_19_15_56_380909
16:05:30 <mlavalle> I have examined a couple of instances of the error and they are the same
16:06:02 <slaweq> good catch mlavalle
16:06:04 <mlavalle> it seems to me the " is not needed in dhclient $IFACE.10"
16:06:20 <mlavalle> so I will propose a patch to fix it
16:06:47 <haleyb> yes, good catch mlavalle :)
16:07:04 <mlavalle> I am still investigating what is the problem with neutron_tempest_plugin.scenario.test_trunk.TrunkTest.test_trunk_subport_lifecycle
16:07:20 <mlavalle> in that case we get an ssh timeout
16:07:29 <mlavalle> I will continue working on it
16:07:45 <slaweq> ok, great that we have some progress on it, thx mlavalle
16:07:58 <mlavalle> :-)
16:08:19 <slaweq> next one was:
16:08:19 <slaweq> slaweq will continue debugging slow rally tests issue
16:08:33 <slaweq> again I didn't have time to investigate it
16:08:44 <mlavalle> we had the Summit
16:08:57 <mlavalle> so understandable
16:09:00 <slaweq> but as it doesn't happen often I "set low priority" to it in my queue :)
16:09:25 <slaweq> next one:
16:09:26 <slaweq> slaweq to debug failing security groups fullstack test: https://bugs.launchpad.net/neutron/+bug/1767829
16:09:27 <openstack> Launchpad bug 1767829 in neutron "Fullstack test_securitygroup.TestSecurityGroupsSameNetwork fails often after SG rule delete" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:09:34 <slaweq> I started digging into it more today
16:09:49 <slaweq> but it also hasn't happened often in the gate recently
16:10:30 <mlavalle> those are the tough ones
16:10:34 <slaweq> I can reproduce it locally together with patch https://review.openstack.org/#/c/470912/ - the issue looks the same, so I hope that maybe this patch is somehow triggering it more often
16:10:56 <slaweq> if not, maybe I will at least find what is wrong with https://review.openstack.org/#/c/470912/ and help with this one :)
16:11:26 <slaweq> #action slaweq continue debugging fullstack security groups issue: https://bugs.launchpad.net/neutron/+bug/1767829
16:11:27 <openstack> Launchpad bug 1767829 in neutron "Fullstack test_securitygroup.TestSecurityGroupsSameNetwork fails often after SG rule delete" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:11:28 <haleyb> slaweq: interesting. i tried to reproduce that manually and couldn't, we'll need to look at conntrack entries before/after
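A minimal sketch of the before/after conntrack comparison mentioned above, assuming a fullstack-style "host" network namespace and conntrack-tools installed; the namespace name is a placeholder, since the fullstack tests generate their own:

    #!/bin/bash
    # Placeholder namespace name; the fullstack tests create randomly named namespaces.
    NS=test-host-ns

    # Dump conntrack entries inside the "host" namespace before deleting the SG rule.
    sudo ip netns exec "$NS" conntrack -L > /tmp/conntrack-before.txt

    # ... delete the security group rule here, then dump again and compare ...
    sudo ip netns exec "$NS" conntrack -L > /tmp/conntrack-after.txt

    diff -u /tmp/conntrack-before.txt /tmp/conntrack-after.txt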
16:11:58 <slaweq> haleyb: I just had a failed test with Your patch locally 10 minutes before the meeting :)
16:12:08 <slaweq> I plan to continue work on it tomorrow morning
16:12:16 <haleyb> ack
16:12:37 <slaweq> I also added some additional logs in the conntrack module to check what happens there - maybe I will find something tomorrow
16:12:52 <slaweq> ok, next one is:
16:12:54 <slaweq> slaweq to check why some scenario tests don't log instance console log
16:13:01 <slaweq> I added logging of the console log in case the remote_connectivity check fails: https://review.openstack.org/#/c/568808/
16:13:24 <slaweq> it's merged so we should have the console log captured in case of such issues now
16:13:55 <mlavalle> yeah I saw that one. Thanks!
16:14:15 <slaweq> and the last one was:
16:14:16 <slaweq> slaweq to switch neutron-tempest-plugin-designate-scenario to be voting
16:14:22 <slaweq> Done: https://review.openstack.org/#/c/568681/
16:14:34 <slaweq> grafana dashboard is also updated
16:14:46 <slaweq> so we have one more voting job now :)
16:14:53 <mlavalle> yeap
16:15:29 <slaweq> ok, moving on to next topic
16:15:30 <slaweq> #topic Grafana
16:15:37 <slaweq> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:16:05 * slaweq has a storm here so the internet connection might drop unexpectedly
16:17:02 * mlavalle only sees blue skies and 95F (~35C)
16:17:09 <slaweq> mlavalle: LOL
16:17:25 <slaweq> Yesterday and this morning there was an issue with the cmd2 python module. It should now be fixed by https://review.openstack.org/#/c/570822/
16:17:52 <slaweq> because of this issue most jobs were failing recently
16:17:56 <mlavalle> cool, thanks for letting us know
16:18:03 <slaweq> but it should be good soon (I hope)
16:18:25 <mlavalle> it was a good thing that it was slow due to the holiday in the USA
16:18:40 <slaweq> yes, exactly
16:18:49 <slaweq> there weren't many patches in the queue yesterday
16:19:03 <slaweq> second thing, just FYI: last week there was also a problem with openvswitch kernel module compilation which impacted the fullstack and ovsfw scenario jobs - already fixed by https://review.openstack.org/#/c/570085/
16:20:34 <slaweq> do You have anything to add/ask about grafana?
16:20:49 <mlavalle> no, I think we need to wait for things to normalize
16:20:49 <slaweq> or should we now go on to discuss the different jobs?
16:20:57 <slaweq> mlavalle: I agree
16:21:14 <slaweq> so going to the next topic then
16:21:15 <slaweq> #topic Fullstack
16:21:39 <slaweq> when I was preparing today's meeting I found two different problems with fullstack
16:21:56 <slaweq> one is the issue with (probably) conntrack entries which we already discussed
16:22:19 <slaweq> and a second one which happens more often: https://bugs.launchpad.net/neutron/+bug/1774006
16:22:20 <openstack> Launchpad bug 1774006 in neutron "Fullstack security group test fails on _test_using_multiple_security_groups" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:22:37 <slaweq> Failure related (probably) to the new test for multiple security groups, e.g.:
16:22:47 <slaweq> http://logs.openstack.org/37/558037/6/check/neutron-fullstack/85bb570/logs/testr_results.html.gz
16:22:58 <slaweq> I want to take a look at it soon as well
16:23:15 <slaweq> as I have a host prepared for debugging fullstack tests now :)
16:23:42 <slaweq> #action slaweq will check fullstack multiple sg test failure: https://bugs.launchpad.net/neutron/+bug/1774006
16:23:43 <openstack> Launchpad bug 1774006 in neutron "Fullstack security group test fails on _test_using_multiple_security_groups" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:24:14 <slaweq> or I can give one of You the pleasure of debugging it if You want :)
16:24:23 <slaweq> sorry that I didn't ask earlier ;)
16:25:49 <haleyb> sorry, is that related to the other SG failure? or maybe it is the failure :(
16:26:15 <haleyb> different one...
16:26:19 <slaweq> haleyb: it is a different failure than the one we discussed before
16:26:38 <slaweq> that's why I also opened a new bug for it
16:26:39 <haleyb> 26 minutes ago :)
16:26:43 <slaweq> yep
16:26:59 <slaweq> did You already forget about it? :D
16:27:12 <haleyb> is it the weekend yet?
16:27:23 <slaweq> LOL
16:27:46 <haleyb> that failure looks interesting, as if it's a nested network namespace or something
16:28:19 <slaweq> yes, but it is not failing 100% of the time so it's probably some kind of race
16:28:49 <slaweq> but in fact there aren't any nested namespaces IIRC
16:29:16 <slaweq> there are namespaces which "simulate" vms and namespaces which "simulate" hosts with the lb agent
16:29:27 <slaweq> in this namespace the lb agent applies its SG rules
16:29:46 <slaweq> such "hosts" are connected to each other with an ovs bridge
16:30:13 <slaweq> so in this case LB agents are even better isolated than ovs agents are
16:30:53 <haleyb> the ping is generating it though right? that is odd
16:32:18 <slaweq> it isn't right
16:32:22 <slaweq> 100% packets lost
16:33:57 <slaweq> where do You see this correct ping exactly?
16:34:37 <haleyb> i didn't see a correct ping, just assumed it was the command and that was stderr
16:35:05 * haleyb is looking at logs and sees nothing yet besides some fdb RTNETLINK failures
16:36:18 <haleyb> one warning in api log, http://logs.openstack.org/37/558037/6/check/neutron-fullstack/85bb570/logs/dsvm-fullstack-logs/TestSecurityGroupsSameNetwork.test_securitygroup_linuxbridge-iptables_/neutron-server--2018-05-17--22-16-48-838393.txt.gz#_2018-05-17_22_16_53_090
16:36:22 <haleyb> looks unrelated though
16:36:44 <slaweq> yep, probably not related
16:38:09 <slaweq> nothing very clear in the logs at first glance IMO
16:38:25 <slaweq> I will try to look deeper into this one during the week
16:40:12 <slaweq> ok, I think we can move to the next topic then
16:40:13 <slaweq> #topic Scenarios
16:40:47 <slaweq> the first thing here I wanted to discuss is from jlibosva
16:41:07 <slaweq> he proposed to make the openvswitch fw driver the default in devstack: https://review.openstack.org/#/c/568297/
16:41:19 <slaweq> and I wanted to ask what You think about it
16:41:33 <mlavalle> as I said yesterday, I am fine with it
16:42:19 <mlavalle> the ovsfw test seems to be stable in the check and gate queues, right?
16:42:25 <slaweq> from the grafana dashboard the ovsfw firewall job looks stable - it's not failing more than other jobs
16:42:46 <mlavalle> I say let's give it a try
16:43:01 <slaweq> ok, so please vote on this patch mlavalle :)
16:43:14 <slaweq> and also if so, I have a question about our existing jobs then
16:43:23 <mlavalle> done
16:43:46 <slaweq> if we change the default driver to openvswitch, I think we should change this ovsfw-scenario job into an "iptables-scenario" job
16:43:56 <slaweq> as ovsfw will be covered by the default one, right?
16:44:09 <mlavalle> yeah, that's a good point
16:44:32 <mlavalle> we need to keep the non-default one alive
16:44:38 <slaweq> yep
16:44:58 <slaweq> so is there anyone who wants to do this change or should I assign it to myself?
16:45:19 <mlavalle> if you want to do it, go ahead
16:45:23 <slaweq> ok
16:45:50 <slaweq> #action slaweq to propose scenario job with iptables fw driver
16:46:20 <slaweq> ok, speaking about scenario jobs
16:46:46 <slaweq> we still have some of them with quite a high failure rate, non-voting at least :)
16:47:02 <slaweq> I checked the reasons for some of those failures
16:47:03 <slaweq> so:
16:47:11 <slaweq> neutron-tempest-multinode-full (non-voting):
16:47:27 <slaweq> failures in this one don't look related to neutron:
16:47:28 <slaweq> * Block migration failure: http://logs.openstack.org/85/570085/5/check/neutron-tempest-multinode-full/09bea41/logs/testr_results.html.gz
16:47:33 <slaweq> * Rebuild server: http://logs.openstack.org/32/564132/2/check/neutron-tempest-multinode-full/a5262a2/logs/testr_results.html.gz
16:47:47 <slaweq> I didn't find other failures in the last few days
16:48:07 <slaweq> job neutron-tempest-dvr-ha-multinode-full, example of failure:
16:48:08 <slaweq> * SSH to instance not available: http://logs.openstack.org/87/564887/6/check/neutron-tempest-dvr-ha-multinode-full/6296f64/logs/testr_results.html.gz
16:48:19 <slaweq> this one might be related to neutron
16:48:28 <mlavalle> I think it is
16:50:22 <slaweq> I found such errors in the l3 agent logs: http://logs.openstack.org/87/564887/6/check/neutron-tempest-dvr-ha-multinode-full/6296f64/logs/subnode-2/screen-q-l3.txt.gz?level=ERROR
16:50:30 <slaweq> is it known to You?
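Referring back to the firewall driver discussion above, a small sketch of how to check which driver a devstack node ended up with; the config path and option values here are assumptions based on the standard ml2 layout, not taken from the meeting:

    # Show the [securitygroup] section of the agent config (assumed standard devstack path).
    grep -A 3 '^\[securitygroup\]' /etc/neutron/plugins/ml2/ml2_conf.ini
    # Expect one of:
    #   firewall_driver = openvswitch       (ovsfw, the proposed new devstack default)
    #   firewall_driver = iptables_hybrid   (the driver the planned "iptables-scenario" job would keep covering)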
16:51:31 <mlavalle> no, not familiar with that
16:51:56 <slaweq> and in neutron server: http://logs.openstack.org/87/564887/6/check/neutron-tempest-dvr-ha-multinode-full/6296f64/logs/screen-q-svc.txt.gz?level=ERROR
16:52:12 <slaweq> but I don't even know if this is related to this failed test exactly
16:52:21 <haleyb> me neither, but i can guess it's the call to update the route and it not existing
16:55:02 <slaweq> can You maybe talk about it during the L3 meeting then?
16:55:10 <mlavalle> yes
16:55:18 <slaweq> maybe it will be familiar to someone from that team :)
16:55:18 <mlavalle> we will take a look
16:55:26 <slaweq> thx mlavalle
16:55:40 <mlavalle> slaweq: is there a bug filed for it?
16:55:52 <slaweq> no, I don't know about any
16:56:05 <mlavalle> ok, I'll file a bug
16:56:19 <slaweq> thx
16:56:39 <slaweq> ok, going quickly to the next topic as we are almost out of time
16:56:45 <slaweq> #topic Rally
16:57:07 <slaweq> I noticed today that a few days ago the rally job name was changed in: https://review.openstack.org/#/c/558037/
16:57:19 <slaweq> so we don't have stats from rally in grafana since then
16:57:42 <slaweq> today I sent a patch to fix that: https://review.openstack.org/#/c/570949/
16:57:50 <slaweq> so it should be good when it is merged
16:58:21 <slaweq> ok, moving quickly to the last topic :)
16:58:25 <slaweq> #topic Open discussion
16:58:42 <slaweq> I was asked today if we are planning to create a tag in the neutron_tempest_plugin repo
16:58:46 <slaweq> and if yes, then when
16:58:51 <slaweq> mlavalle: do You know?
16:59:05 <mlavalle> we can do it whenever it is needed
16:59:19 <mlavalle> how about next week with Rocky-2
16:59:25 <mlavalle> ?
16:59:28 <slaweq> would be great IMO
16:59:34 <slaweq> thx
16:59:43 <mlavalle> ok, will do it next week
16:59:47 <slaweq> thx mlavalle
16:59:48 <mlavalle> towards the end
16:59:53 <slaweq> and one last thing to mention
17:00:00 <slaweq> Following the session in Vancouver I started switching neutron projects to stestr as the test runner, the patches have a common topic. Please review them:
17:00:00 <slaweq> https://review.openstack.org/#/q/status:open+branch:master+topic:switch-to-stestr
17:00:07 <slaweq> ok, we are out of time
17:00:10 <slaweq> thx
17:00:12 <haleyb> mlavalle: sounds good. i can't remember if i was supposed to write up something in our docs to say our cadence for tags was every release
17:00:13 <mlavalle> Thanks!
17:00:13 <slaweq> #endmeeting