16:00:45 <ihrachys> #startmeeting neutron_ci
16:00:45 <openstack> Meeting started Tue Jul 18 16:00:45 2017 UTC and is due to finish in 60 minutes.  The chair is ihrachys. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:46 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:46 <jlibosva> o/
16:00:48 <openstack> The meeting name has been set to 'neutron_ci'
16:00:57 <ihrachys> we'll try quick
16:00:58 <ihrachys> #topic Actions from prev week
16:01:10 <ihrachys> first was "jlibosva to reach out to Victor Stinner about eventlet/py3 issue with functional tests"
16:01:23 <jlibosva> I did
16:01:35 <jlibosva> he was able to get some data from my environment
16:01:42 <jlibosva> and produced a simple reproducer
16:01:48 <jlibosva> http://paste.alacon.org/44101
16:01:58 <ihrachys> aha
16:02:04 <ihrachys> since it doesn't import from openstack..
16:02:06 <ihrachys> eventlet bug?
16:02:24 <jlibosva> there seems to be a difference in python3 signal handling, that breaks eventlet loop
16:02:27 <jlibosva> yes, eventlet bug
16:02:46 <jlibosva> but the fix might be complicated as they will need to redesign signal handling in eventlet
16:02:48 <jlibosva> or
16:02:54 <jlibosva> victor came up with following fix
16:03:00 <jlibosva> http://paste.alacon.org/44102
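[The pastes above are transient, so as a rough illustration only: the general pattern under discussion (a Python-level signal handler registered alongside an eventlet green timer, which on Python 3 can leave the hub's loop waiting) might look something like the sketch below. This is not the actual reproducer from the paste, just an assumed shape of it.]

    import signal
    import eventlet

    def on_timer():
        print('green timer fired')

    def on_signal(signum, frame):
        print('got signal %s' % signum)

    # a Python signal handler armed next to an eventlet timer; the reported
    # issue is that py3 signal handling can break the eventlet hub's loop
    signal.signal(signal.SIGALRM, on_signal)
    signal.alarm(1)

    eventlet.spawn_after(2, on_timer)  # green timer scheduled in the hub
    eventlet.sleep(5)                  # on an affected setup the timer may not fire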
16:03:45 <jlibosva> we also got help from sileht, who suggested making a workaround in oslo.service
16:03:56 <jlibosva> next step is to report a bug (if not done already)
16:04:02 <jlibosva> but it seems like it got traction
16:04:11 <jlibosva> maybe the self_pipe will be a way to fix this
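[For reference, the "self_pipe" mentioned here is the classic self-pipe trick: the signal handler only writes a byte into a pipe, and the event loop notices it through ordinary fd polling instead of relying on an interrupted syscall. A minimal stdlib-only sketch, not tied to eventlet or oslo.service:]

    import os
    import select
    import signal

    r, w = os.pipe()
    os.set_blocking(w, False)

    def handler(signum, frame):
        os.write(w, b'\0')  # just wake up whatever polls the read end

    signal.signal(signal.SIGALRM, handler)
    signal.alarm(1)  # have the kernel deliver SIGALRM in one second

    # the main loop polls the pipe like any other fd
    ready, _, _ = select.select([r], [], [], 5.0)
    if r in ready:
        os.read(r, 1)
        print('loop woken by the signal via the pipe')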
16:04:15 <ihrachys> I guess it will be both oslo and eventlet bugs to report?
16:04:25 <jlibosva> technically oslo doesn't do anything wrong
16:04:49 <ihrachys> yeah but for a workaround it may make sense no?
16:05:07 <ihrachys> bumping a minimal for eventlet is always a pain
16:05:27 <jlibosva> hmm
16:05:45 <jlibosva> then maybe oslo workaround would be easier, we'll see how the fix will go
16:06:21 <ihrachys> ok, so next step from your side is report bugs
16:06:40 <ihrachys> #action jlibosva to report bugs for eventlet and maybe oslo.service for eventlet signal/timer issue with py3
16:06:49 <ihrachys> great dig it seems
16:07:11 <ihrachys> next item was "jlibosva to post patch splitting OVN from OvsVenvFixture"
16:07:43 <jlibosva> ah, didn't do. maybe I could just push what I currently have :)
16:07:58 <ihrachys> it may be good to see the direction
16:08:08 <ihrachys> I will repeat the item
16:08:19 <ihrachys> #action jlibosva to post patch splitting OVN from OvsVenvFixture
16:08:29 <ihrachys> next was "jlibosva to post patch splitting OVN from OvsVenvFixture"
16:08:33 <ihrachys> I didn't do :-x
16:08:43 <ihrachys> oops
16:08:46 <jlibosva> yeah, I didn't do that either
16:08:53 <ihrachys> I meant "ihrachys to look at why fullstack switch to rootwrap/d-g broke some test cases"
16:09:03 <ihrachys> #action ihrachys to look at why fullstack switch to rootwrap/d-g broke some test cases
16:09:20 <ihrachys> that kinda gets pushed down in my list because it's non-voting :p
16:09:29 <ihrachys> next was "haleyb to split grafana check dashboard into grenade and tempest charts"
16:09:54 <ihrachys> I believe it's https://review.openstack.org/#/c/483119/
16:10:22 <ihrachys> next was "haleyb to continue looking at dvr-ha job failure rate and reasons"
16:10:38 <ihrachys> I see there is this patch: https://review.openstack.org/#/c/483600/ (WIP)
16:11:06 <ihrachys> also, Brian sent an email to openstack-dev@
16:11:40 <ihrachys> #link http://lists.openstack.org/pipermail/openstack-dev/2017-July/119743.html
16:11:43 <ihrachys> no replies so far
16:12:18 <ihrachys> let's chime in there, and also maybe ping some key folks
16:12:53 <ihrachys> #action haleyb to collect feedback of key contributors on multinode-by-default patch
16:13:07 <ihrachys> #action everyone to chime in on haleyb's thread on multinode switch
16:13:24 <ihrachys> next was "haleyb to clean up old trusty charts from grafana"
16:14:22 <ihrachys> I believe it magically happened by virtue of general trusty cleanup
16:14:38 <ihrachys> so no patches from Brian, but it is done nevertheless
16:14:55 <ihrachys> next was "haleyb to spin up a ML discussion on replacing single node grenade job with multinode in integrated gate"
16:15:04 <ihrachys> ok, THAT one was for the email thread I mentioned above
16:15:10 <ihrachys> but those topics are interrelated
16:15:17 <ihrachys> and it seems Brian is not here
16:15:32 <ihrachys> so we will follow up with the thread and see where it leads us
16:15:40 <ihrachys> next was "haleyb to continue looking at places to reduce the number of jobs"
16:15:57 <ihrachys> kinda an open-ended action, probably was not worth existence in the first place :)
16:16:10 <ihrachys> again, will see where more specific actions lead us
16:16:16 <ihrachys> next was "ihrachys to complete triage of latest functional test failures that result in 30% failure rate"
16:16:34 <ihrachys> I did some triaging for all failures since last meeting for functional gate (not check queue)
16:16:41 <ihrachys> this is the result:
16:16:51 <ihrachys> #link https://etherpad.openstack.org/p/neutron-functional-gate-failures-july Functional Gate failures
16:17:11 <ihrachys> it's basically timeouts, and tester threads running firewall test cases dying
16:17:18 <ihrachys> which may actually be the same
16:17:37 <ihrachys> when a tester thread is dying, we just see 'Killed' in the console log
16:17:44 <ihrachys> nothing in the per-test case log
16:17:46 <ihrachys> or in syslog
16:18:01 <ihrachys> it's suspicious that it's almost always firewall test cases
16:18:13 * ihrachys wonders if it's smth wrong with the test class
16:18:25 <jlibosva> one thing that comes to mind is that the search for the pid to kill (like nc) malfunctions
16:18:38 <ihrachys> or is it just so huge that the chance of triggering it there is high?
16:18:41 <jlibosva> and since nc dies, it picks a wrong pid
16:18:58 <ihrachys> jlibosva, could it be some other thread kills the current thread somehow?
16:19:03 <ihrachys> or maybe it kills itself? :)
16:19:18 <jlibosva> I'll try to add some debug messages to patch and send it upstream to recheck, recheck, recheck
16:19:28 <ihrachys> ok
16:19:56 <ihrachys> #action jlibosva to send a debug patch for random test runner murders, and recheck, recheck, recheck
16:20:56 <ihrachys> maybe the code searching for children is misbehaving and ends up killing itself?
16:21:07 <ihrachys> anyway, we will follow up in gerrit
16:21:16 <jlibosva> yeah, that's what I meant
16:21:16 <ihrachys> thanks for taking the next step on this one!
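[To illustrate the suspected failure mode (purely hypothetical code, not Neutron's actual helpers): if the cleanup path looks a process up by command line after the target may already have exited, it can act on a stale or wrong pid, so a guard against signalling the runner's own pid is the kind of thing the debug patch could check for. Assumes psutil is available; function names are illustrative.]

    import os
    import signal
    import psutil  # assumed available; Neutron's real helpers use different code

    def pids_by_cmdline(pattern):
        """Return pids whose command line contains `pattern`."""
        pids = []
        for proc in psutil.process_iter():
            try:
                if pattern in ' '.join(proc.cmdline()):
                    pids.append(proc.pid)
            except (psutil.NoSuchProcess, psutil.AccessDenied):
                continue  # raced with the process exiting, or not ours to inspect
        return pids

    def kill_matching(pattern):
        for pid in pids_by_cmdline(pattern):
            if pid in (os.getpid(), os.getppid()):
                continue  # never signal the test runner itself
            try:
                os.kill(pid, signal.SIGKILL)
            except (ProcessLookupError, PermissionError):
                pass  # already gone or not killable; the pid may have been stale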
16:21:28 <ihrachys> next was "ihrachys to remove pg job from periodics grafana board"
16:21:46 <ihrachys> I sent this: https://review.openstack.org/#/c/482676/
16:21:56 <ihrachys> and Armando chimed in there with an action ;)
16:21:59 <ihrachys> so I abandoned
16:22:06 <ihrachys> seems like there is some interest
16:22:29 <ihrachys> and next time the job fails, we may ask him
16:23:13 <jlibosva> pg-liaison
16:23:57 <ihrachys> it *seems* that the bugs reported are now closed
16:24:00 <ihrachys> checking the dash
16:24:28 <ihrachys> yeah it's green
16:24:33 <ihrachys> so we have closure here
16:24:40 <ihrachys> and those were all items we had
16:24:43 <ihrachys> now...
16:24:50 <ihrachys> let's review grafana
16:24:54 <ihrachys> #topic Grafana
16:25:01 <ihrachys> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:25:04 <jlibosva> ah, no action for me re fullstack process isolation?
16:25:30 <ihrachys> jlibosva, there was the ovs fixture no?
16:25:42 <ihrachys> you were saying you will push what you have
16:25:52 <jlibosva> that was just about splitting the class into ovs and ovn
16:25:57 <ihrachys> ah ok
16:26:09 <ihrachys> let's discuss that after grafana
16:26:10 <jlibosva> I did some successful research on multiple ovsdb-server processes on a single node
16:26:12 <jlibosva> sure
16:26:47 <ihrachys> so looking at the board, we have a decent state actually
16:26:55 <ihrachys> except functional that was discussed
16:27:04 <ihrachys> and fullstack that is just knowingly broken
16:27:20 <ihrachys> and scenarios that don't get much traction for connectivity issues
16:27:25 <ihrachys> but those are long known
16:27:30 <ihrachys> no new breakages
16:28:07 <ihrachys> I see there is some -ovsfw- job in tempest check queue chart that is at 40% failure rate
16:28:11 <ihrachys> I haven't seen it before
16:28:13 <ihrachys> something new?
16:28:19 <jlibosva> no, it's been there for a while
16:28:28 <ihrachys> ok.
16:28:38 <ihrachys> the result is not great don't you think
16:28:40 <jlibosva> I checked the failures a couple of months back and they don't seem to be related to the firewall
16:28:50 <jlibosva> it's mostly other tempest issues
16:29:03 <jlibosva> if you look at the curve, it tracks the other tempest jobs like dvr
16:29:06 <ihrachys> the fact that it stands out is suspicious
16:29:33 <ihrachys> curve yeah. but it's twice the rate of e.g. dvr+ha
16:30:28 <ihrachys> anyway
16:30:32 <ihrachys> I don't think it's high prio
16:30:53 <ihrachys> #topic Fullstack isolation
16:30:59 <ihrachys> jlibosva, you had smth to update here
16:31:07 <jlibosva> hmm, maybe we'll need some job that uses the same env to compare
16:31:15 <jlibosva> to just have a diff between ovs vs. iptables
16:31:33 <jlibosva> I don't remember what it runs, I think it's an all-in-one but not sure if it's dvr or not
16:31:50 <jlibosva> yep, so, I'm happy to announce that I was able to have multiple ovsdb-servers
16:32:12 <jlibosva> each running its own vswitchd to communicate with kernel datapath. ovsdb-servers must be in namespaces
16:32:15 <ihrachys> in fullstack env? or just in some poc env?
16:32:23 <jlibosva> so I had two namespaces and root namespace
16:32:30 <jlibosva> just a poc to see what's possible
16:32:34 <jlibosva> I didn't write any code
16:33:08 <jlibosva> I was able to have traffic go from one namespace to the other namespace, basically running traffic through interfaces from three different ovsdb-servers
16:33:26 <jlibosva> also, we can have a namespace in a namespace, nest it
16:33:35 <ihrachys> inception
16:33:45 <jlibosva> so what I plan to do is to create a namespace per fullstack host
16:34:04 <jlibosva> and run all agents there, using a single ovsdb-server running in namespace
16:34:27 <jlibosva> which means fake fullstack machines will also be spawned in this namespace, that's why the inception
16:34:36 <ihrachys> hm. and so e.g. dhcp or l3 namespaces will be 2nd order depth?
16:34:53 <jlibosva> no, dhcp and l3 will run in a host namespace, not in its own like today
16:35:06 <jlibosva> oh, right
16:35:13 <ihrachys> yeah but I mean, then they create namespaces
16:35:14 <jlibosva> yeah, qrouter and qdhcp will be 2nd
16:35:16 <ihrachys> ok
16:35:18 <ihrachys> we are on the same page
16:35:29 <ihrachys> I wonder if it will reveal any kernel issues :p
16:35:32 <jlibosva> which means, we won't need the hacks for unique namespaces
16:35:47 <ihrachys> jlibosva, oh right, it would be great to get rid of that
16:36:00 <jlibosva> and I used ovs 2.6
16:36:12 <ihrachys> is it the gate version?
16:36:14 <jlibosva> there were attempts to do similar things in the past on older ovs, without success
16:36:25 <jlibosva> I think we have 2.5.2
16:36:37 <jlibosva> iirc
16:37:07 <ihrachys> OVS_BRANCH=v2.6.1
16:37:10 <ihrachys> in fullstack
16:37:20 <jlibosva> but we don't compile do we?
16:37:24 <jlibosva> oh, we compile kernel
16:37:33 <ihrachys> http://logs.openstack.org/20/483020/3/check/gate-neutron-dsvm-fullstack-ubuntu-xenial/a95166c/console.html#_2017-07-18_14_02_16_215224 ?
16:37:40 <jlibosva> because of vxlan local tunneling
16:37:53 <jlibosva> but we use userspace from deb
16:38:10 <jlibosva> yeah, anyways, those were my findings and I'm excited about implementing it :)
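[As a very rough sketch of what the proof of concept described above could look like (assumptions: root privileges, OVS 2.6 binaries on PATH; all paths and names below are illustrative, not what the eventual patch will use), one ovsdb-server/ovs-vswitchd pair per fake fullstack "host", confined to its own namespace:]

    import subprocess

    def run(cmd):
        print('+ ' + ' '.join(cmd))
        subprocess.check_call(cmd)

    def start_fake_host(name):
        ns = 'fullstack-%s' % name
        db = '/tmp/%s-conf.db' % ns
        sock = '/tmp/%s-db.sock' % ns
        run(['ip', 'netns', 'add', ns])
        run(['ovsdb-tool', 'create', db])  # uses the default vswitch schema
        # a dedicated ovsdb-server and vswitchd for this fake host, both
        # running inside the host's namespace
        run(['ip', 'netns', 'exec', ns, 'ovsdb-server', db,
             '--remote=punix:%s' % sock,
             '--pidfile=/tmp/%s-ovsdb.pid' % ns, '--detach'])
        run(['ip', 'netns', 'exec', ns, 'ovs-vswitchd', 'unix:%s' % sock,
             '--pidfile=/tmp/%s-vswitchd.pid' % ns, '--detach'])
        return sock

    # agents for this fake host would then point their ovsdb_connection at
    # unix:<sock> instead of the system-wide database
    start_fake_host('host1')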
16:38:24 <ihrachys> yeah that sounds like a great project
16:39:15 <ihrachys> there are no interesting new gate bugs, so I will skip
16:39:18 <ihrachys> #topic Open discussion
16:39:22 <ihrachys> I don't have anything
16:39:29 <jlibosva> me neither
16:39:45 <ihrachys> ok then we close the meeting. thanks jlibosva for being active, I would feel lonely otherwise lol.
16:39:48 <ihrachys> #endmeeting