#openstack-meeting log

16:02:30 <ihrachys> #startmeeting neutron_ci
16:02:31 <openstack> Meeting started Tue Feb  6 16:02:30 2018 UTC and is due to finish in 60 minutes.  The chair is ihrachys. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:02:32 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:02:35 <openstack> The meeting name has been set to 'neutron_ci'
16:02:38 <mlavalle> o/
16:02:43 <haleyb> o/
16:02:49 <slaweq> hi
16:03:16 <ihrachys> thanks jlibosva for taking over the chair the prev meeting
16:03:23 <ihrachys> afaiu Jakub won't join us today
16:03:23 <mlavalle> ++
16:03:32 <ihrachys> #topic Actions from prev meeting
16:03:37 <ihrachys> "jlibosva to request a new release for ovsdbapp"
16:03:39 <mlavalle> yes, he dropped off about 15 minutes ago
16:04:02 <ihrachys> there is https://review.openstack.org/541056 and https://review.openstack.org/541112 in pipeline that should roll in new library into gates
16:04:03 <patchbot> patch 541056 - requirements - update constraint for ovsdbapp to new release 0.9.1
16:04:04 <patchbot> patch 541112 - requirements (stable/pike) - update constraint for ovsdbapp to new release 0.4.2
16:05:40 <ihrachys> "mlavalle and haleyb to follow up on how we can move forward floating ip failures in dvr scenario"
16:06:25 <mlavalle> haleyb is setting up an environment to replicate the issue.
16:06:50 <mlavalle> since we marked the test as unstable
16:06:52 <haleyb> ihrachys: my only update is i have an environment but haven't replicated yet
16:09:57 <ihrachys> ok. I guess there is not much sense to roll the AI over and over going forward. I trust you to beat it to death. :)
16:10:16 <ihrachys> next was "jlibosva to report bug about scenario failure for test_snat_external_ip"
16:11:36 <ihrachys> not sure it happened, can't find anything in gate-failure tagged bugs
16:13:19 <ihrachys> I sent him an email about it, we'll see if he followed up on that
16:13:29 <mlavalle> ++
16:13:36 <ihrachys> ok that's all we had from prev meeting
16:13:44 <ihrachys> #topic Grafana
16:13:48 <ihrachys> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:14:12 <ihrachys> before we walk through regular offenders, let's peek at periodics
16:14:39 <ihrachys> mlavalle, afaiu https://review.openstack.org/#/c/539006 would move the jobs to neutron tree, then we would change names for jobs / clean up legacy jobs from infra repos?
16:14:40 <patchbot> patch 539006 - neutron - Move periodic jobs to Neutron repo
16:14:54 <mlavalle> correct
16:15:16 <ihrachys> there seems to be some syntax error or smth.
16:15:28 <ihrachys> ok that's covered then
16:15:29 <mlavalle> I introduced a syntax error yesterday. I'll fix it today
16:16:07 <mlavalle> in the first revision we had a functional periodic job as well
16:16:20 <mlavalle> but it seems it is the same as the normal functional
16:16:32 <mlavalle> so I removed it yesterday
16:16:56 <mlavalle> and when we add the other two to the periodic queue, we will also add the normal functional
16:19:04 <ihrachys> ok. -pg- (postgres) job of those periodic jobs shows some level of instability
16:19:20 <ihrachys> but it's not clear if it's totally broken, since some days are fine
16:20:27 <ihrachys> I checked latest logs and it seems something is busted
16:20:27 <ihrachys> http://logs.openstack.org/periodic/git.openstack.org/openstack/neutron/master/legacy-periodic-tempest-dsvm-neutron-pg-full/be7f73e/logs/devstacklog.txt.gz#_2018-02-06_06_24_58_783
16:20:33 <ihrachys> "Rolling upgrades are currently supported only for MySQL and Sqlite"
16:20:41 <ihrachys> and that's from glance section of devstack
16:21:29 <ihrachys> I will report a bug for that and assign it to glance
16:21:41 <ihrachys> #action ihrachys report bug for -pg- failure to glance
16:22:07 <ihrachys> fullstack is unstable, that's expected
16:22:21 <ihrachys> functional is a tad better and I expect it to become a lot better when new ovsdbapp rolls in
16:22:23 <slaweq> but it's better than last week at least
16:22:25 <slaweq> :)
16:22:33 <ihrachys> (actually, it may also help fullstack)
16:22:44 <ihrachys> slaweq, absolutely!
16:23:25 <ihrachys> one thing of concern esp. close to release is that the ovsfw job, though of course non-voting, is now at 100% failure rate
16:23:40 <ihrachys> anyone aware of the reason for breakage / bug to look at?
16:24:39 <slaweq> e.g. here: http://logs.openstack.org/54/537654/4/check/neutron-tempest-ovsfw/5c90b2b/logs/testr_results.html.gz IMO some error with SSH to vm
16:25:08 <slaweq> I can report a bug and try to check it this week if You want
16:25:16 <slaweq> *in this week
16:25:35 <ihrachys> and based on what guests log, metadata is not accessible
16:26:54 <slaweq> http://logs.openstack.org/54/537654/4/check/neutron-tempest-ovsfw/5c90b2b/logs/screen-q-agt.txt.gz?level=ERROR
16:26:59 <ihrachys> slaweq, well if you have cycles. jlibosva may be of help here.
16:27:07 <ihrachys> yeah was going to post same link to agent log
16:27:12 <ihrachys> it doesn't look healthy
16:27:18 <slaweq> there are some errors related to ovsfw but I don't know if that is the reason
16:27:24 <ihrachys> ofport: -1 for VIF: bdfda7ad-c656-46a3-bd83-a17878346c35 is not a positive integer
16:28:14 <ihrachys> slaweq, it's probably worth marking the bug as blocker for release. mlavalle what do you think?
16:28:44 <ihrachys> this sea of red is not something we should release :)
16:28:51 <mlavalle> yeah, I think it is a good idea
16:29:02 <slaweq> ihrachys: I have some cycles so I will check it ASAP
16:29:15 <ihrachys> slaweq, great. make sure the bug is tagged for queens-rc1
16:29:23 <slaweq> sure
16:29:38 <ihrachys> mlavalle, btw when is stable/queens cutoff / rc1?
16:29:53 <mlavalle> rc1 is this coming Friday
16:30:02 <ihrachys> I wonder if our release liaison started to walk through pre-release check list.
16:30:18 <ihrachys> probably not a q for this venue though, but smth to follow up with armax if not already, mlavalle.
16:30:33 <mlavalle> yeap
16:30:42 <mlavalle> but he's been active
16:30:48 <ihrachys> ok so ovsfw is covered too, great
16:31:11 <ihrachys> #action slaweq to report bug for ovsfw job failure / sea of red in ovs agent logs
16:32:25 <ihrachys> as for scenarios, linuxbridge is quite bad, but dvr one actually seems to mostly match trend spikes and dips of other tempest jobs, just a bit exaggerated
16:33:13 <ihrachys> so there are probably some more issues somewhere there. we'll have a look at the logs right now.
16:34:20 <ihrachys> actually, dvr is not that much better, it's just some spike in linuxbridge lately that has not reflected on dvr, but the shape for older period is same.
16:34:45 <ihrachys> #topic Scenarios
16:35:37 <ihrachys> linuxbridge: http://logs.openstack.org/54/537654/4/check/neutron-tempest-plugin-scenario-linuxbridge/3272a19/logs/testr_results.html.gz
16:35:45 <ihrachys> ssh to fip failed
16:37:04 <ihrachys> there is some warning in q-agt logged over and over: http://logs.openstack.org/54/537654/4/check/neutron-tempest-plugin-scenario-linuxbridge/3272a19/logs/screen-q-agt.txt.gz?level=WARNING
16:37:18 <ihrachys> haleyb, any idea where that one comes from and whether it's harmful?
16:37:53 <ihrachys> based on table name, neutron-linuxbri-qos-o15fb2c, it's probably smth qos related
16:37:53 <haleyb> ihrachys: i don't know where it's coming from but don't think it's harmful
16:40:03 <haleyb> it could be something as simple as removing the rule with a different string than it was added
16:40:35 <slaweq> it's strange because AFAIR qos is adding rules to MANGLE table only (but I might be wrong)
16:40:48 <ihrachys> right. but then wouldn't we e.g. potentially leave an old rule behind?
16:41:11 <slaweq> actually it might be MANGLE table and POSTROUTING chain there
16:43:07 <ihrachys> here is another run: http://logs.openstack.org/68/540868/1/check/neutron-tempest-plugin-scenario-linuxbridge/01c4559/logs/testr_results.html.gz
16:43:25 <ihrachys> also failing on ssh to fip, but different tests
16:43:40 <ihrachys> looks like some random issues that are not specific to scenario, just ssh to fip
16:43:44 <haleyb> slaweq: maybe because the chain removal triggered the rule removal?
16:44:10 <slaweq> haleyb: maybe, I really don't know now
16:44:37 <haleyb> it's the postrouting rule in the mangle table
16:44:52 <ihrachys> but a successful run has same messages: http://logs.openstack.org/57/523257/33/check/neutron-tempest-plugin-scenario-linuxbridge/35864c5/logs/screen-q-agt.txt.gz?level=WARNING so it's probably not the root cause
16:45:38 <haleyb> actually mangle table rule in the postrouting chain, that's a mouthful
16:45:54 <ihrachys> we will probably need to track through things like - whether FIP was reused; whether gARP was sent; whether ARP table was updated..
16:46:23 <ihrachys> I will bite that one
16:46:39 <ihrachys> #action ihrachys to look at linuxbridge scenario random failures when sshing to FIP
16:48:20 <ihrachys> also, dvr scenarios fail from time to time
16:48:21 <ihrachys> http://logs.openstack.org/57/523257/33/check/neutron-tempest-plugin-dvr-multinode-scenario/b907462/logs/testr_results.html.gz
16:48:28 <ihrachys> similar symptoms actually
16:49:05 <slaweq> I think that it's the issue which haleyb and mlavalle were talking about at the beginning (in "actions from previous meeting")
16:49:23 <ihrachys> oh fip one?
16:49:30 <slaweq> but I might be wrong
16:49:37 <ihrachys> I also see this: http://logs.openstack.org/57/523257/33/check/neutron-tempest-plugin-dvr-multinode-scenario/b907462/logs/screen-q-agt.txt.gz?level=WARNING#_Feb_03_18_50_36_283366 that seems to resemble ovsfw- job failure
16:50:35 <ihrachys> but not sure, could as well be unrelated
16:50:57 <ihrachys> the issue haleyb was trying to reproduce is for specific test cases that are disabled, no?
16:51:18 <mlavalle> yes
16:52:50 <ihrachys> ok I am not sure what to do with this failure. maybe allow it to slip since haleyb is already busy with another issue for the job related to FIPs.
16:54:39 <haleyb> was that just a warning or failure?
16:54:56 <ihrachys> well it fails test cases with ssh connection issues when used with FIP
16:54:57 <haleyb> oh, ERROR
16:55:03 <ihrachys> http://logs.openstack.org/57/523257/33/check/neutron-tempest-plugin-dvr-multinode-scenario/b907462/logs/testr_results.html.gz
16:55:14 <ihrachys> it broke lots of tests
16:55:18 <haleyb> ah, i saw WARNING in the url
16:55:44 <ihrachys> well yeah WARNINGs are just things we look at in hope they reveal the cause. but it's a legit failure.
16:56:18 <ihrachys> anyhow. I guess we will wait for more progress on that other FIP issue.
16:56:24 <ihrachys> #topic Fullstack
16:56:29 <ihrachys> not much time but let's peek
16:57:08 <slaweq> there is still one issue with security group tests: https://bugs.launchpad.net/neutron/+bug/1744402
16:57:09 <openstack> Launchpad bug 1744402 in neutron "fullstack security groups test fails because ncat process don't starts" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:57:18 <ihrachys> http://logs.openstack.org/14/531414/1/check/neutron-fullstack/55e82ca/logs/testr_results.html.gz
16:57:48 <ihrachys> slaweq, gotcha. fix here: https://review.openstack.org/#/c/541242/
16:57:48 <patchbot> patch 541242 - neutron - [Fullstack] Mark security group test as unstable
16:57:57 <ihrachys> oh it's just mark as unstable
16:58:13 <slaweq> ihrachys: it's not a fix but related to this issue :)
16:58:27 <ihrachys> ok
16:58:28 <slaweq> I didn't have time yet to debug it
16:59:09 <ihrachys> btw, before we wrap up
16:59:25 <ihrachys> I noticed a colleague from ironic posted this revert for ovsfw patch: https://review.openstack.org/#/c/541297/1
16:59:25 <patchbot> patch 541297 - neutron - DNM Test Revert "ovsfw: Don't create rules if upda...
16:59:33 <ihrachys> maybe they are onto something
16:59:51 <ihrachys> it would make sense to check with them what they try to do
17:00:27 <ihrachys> ok time is out
17:00:35 <ihrachys> thanks folks
17:00:37 <ihrachys> #endmeeting