16:00:42 <ihrachys> #startmeeting neutron_ci
16:00:43 <openstack> Meeting started Tue Apr 4 16:00:42 2017 UTC and is due to finish in 60 minutes. The chair is ihrachys. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:44 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:46 <openstack> The meeting name has been set to 'neutron_ci'
16:00:51 <haleyb> hi
16:01:06 <ihrachys> hi everyone, thanks for joining
16:01:19 <ihrachys> we will start with our tradition - reviewing action items from prev meeting
16:01:26 <ihrachys> #topic Action items from prev meeting
16:01:33 <ihrachys> huh, "ihrachys fix e-r bot not reporting in irc channel"
16:01:44 * ihrachys turns red
16:01:52 <ihrachys> no, it's not tackled
16:02:04 <ihrachys> I wonder if history shows I can't tackle it in due time :)
16:02:55 <ihrachys> I will repeat it for the next time, but you know...
16:02:56 <ihrachys> #action ihrachys fix e-r bot not reporting in irc channel
16:03:11 <ihrachys> if someone wants to help tracking it, you are welcome
16:03:15 <ihrachys> next was "mlavalle to fix the grafana board to include gate-tempest-dsvm-neutron-dvr-multinode-full-ubuntu-xenial-nv"
16:03:35 <mlavalle> I submitted this patchset: https://review.openstack.org/#/c/452294/
16:03:35 <ihrachys> seems like it merged: https://review.openstack.org/#/c/452294/
16:03:38 <ihrachys> mlavalle, good job
16:03:43 <mlavalle> and it got merged last night
16:03:54 <mlavalle> :-)
16:04:04 <mlavalle> Thanks for the reviews :-)
16:04:06 <ihrachys> now, let's have a look at how the dashboard looks now
16:04:26 <ihrachys> should be here: http://grafana.openstack.org/dashboard/db/neutron-failure-rate?panelId=8&fullscreen
16:04:26 <reedip> o\ /o
16:05:41 <ihrachys> 35% failure rate it seems
16:06:05 <mlavalle> yeap about that
16:06:28 <ihrachys> that job is non-voting
16:07:36 <ihrachys> mlavalle, haleyb: is there any plan in the l3 team to make it the gate job that could replace non-dvr/non-multinode flavours?
16:07:53 <mlavalle> not that I'm aware of
16:08:01 <ihrachys> I remember there was a plan to make ha+dvr the gate setup, but at this point it seems dim
16:08:05 <mlavalle> but haleyb may have a plan
16:08:18 <mlavalle> if he is not on-line....
16:08:28 <haleyb> ihrachys: yes, the plan was to get the ha+dvr change merged
16:08:32 <mlavalle> I will bring this up during the next L3 team meeting on Thursday
16:08:47 <ihrachys> haleyb, I think the devstack-gate piece landed, no?
16:09:05 <ihrachys> this one: https://review.openstack.org/#/c/383827/
16:09:32 <ihrachys> there is a test patch from anil here: https://review.openstack.org/#/c/383833/ but I don't know what the state is there
16:09:40 <ihrachys> haleyb, are you in touch with anil?
16:10:14 <haleyb> ihrachys: no, but we will add this to the list of items at the L3 meeting
16:10:20 <ihrachys> ok cool, thanks
16:10:37 <ihrachys> ok next one was on me
16:10:39 <ihrachys> "ihrachys to report bugs for fullstack race in ovs agent when calling to enable_connection_uri"
16:10:44 <ihrachys> there is mixed news here
16:10:55 <ihrachys> I haven't reported bugs just yet but...
16:11:33 <ihrachys> I was working on some OSP Ocata test failures, and while reading through logs, I spotted that we see the same duplicate messages in logs when setting manager
16:11:53 <ihrachys> so it's not only a fullstack issue
16:12:03 <ihrachys> the env I see it in is an actual multinode deployment
16:12:16 <ihrachys> for reference, this is the error I'm talking about: http://logs.openstack.org/98/446598/1/check/gate-neutron-dsvm-fullstack-ubuntu-xenial/2e0f93e/logs/dsvm-fullstack-logs/TestOvsConnectivitySameNetworkOnOvsBridgeControllerStop.test_controller_timeout_does_not_break_connectivity_sigkill_GRE-and-l2pop,openflow-native_ovsdb-cli_/neutron-openvswitch-agent--2017-03-16--16-06-05-730632.txt.gz?level=TRACE
16:12:57 <ihrachys> apparently the code that sets managers for the native ovsdb driver is racy when executed by two agents
16:13:18 <ihrachys> which can of course happen because we deploy multiple agents on a single node
16:13:34 <ihrachys> and each of them uses its own copy of the ovsdb connection
16:13:59 <ihrachys> so, just a heads up; and it's still on me to report the bug
16:14:10 <ihrachys> #action ihrachys to report bugs for fullstack race in ovs agent when calling to enable_connection_uri
16:14:37 <ihrachys> #action haleyb or mlavalle to report back on ha+dvr plan after l3 meeting
16:15:16 <ihrachys> there was also a long standing action item on jlibosva to prepare py3 transition plan for Pike
16:15:27 <ihrachys> I doubt it's ready though since jlibosva was offline for a while
16:15:38 <ihrachys> jlibosva, but that's your chance to surprise everyone
16:15:38 <jlibosva> indeed
16:15:44 <jlibosva> no surprise
16:15:47 <jlibosva> :(
16:15:57 <ihrachys> that's ok, I would be really surprised
16:16:08 <ihrachys> #action jlibosva to prepare py3 transition plan for Pike
16:16:20 <ihrachys> we can walk through it next week
16:16:43 <ihrachys> #topic Patches in review
16:16:59 <ihrachys> manjeets's patch to add gate-failure bugs to the neutron review board seems stuck: https://review.openstack.org/#/c/439114/
16:17:13 <ihrachys> clarkb, I know you +2d it. who could be the 2nd person to review it?
16:17:37 <manjeets> ihrachys, I posted it on infra channel but did not get attention
16:17:50 <manjeets> maybe need to find out
16:19:10 <ihrachys> ok I guess we will need to chase them somehow
16:19:18 <ihrachys> #action ihrachys to chase infra to review https://review.openstack.org/#/c/439114/
16:19:19 <manjeets> jeremy stanley
16:19:39 <manjeets> don't know his irc handle
16:19:55 <ihrachys> I am also aware of this set of backports to fix scenario jobs in lbaas: https://review.openstack.org/#/q/I5d41652a85cfb91646bb48d38eedbe97741a97c2,n,z (mitaka seems broken but I probably won't have time till EOL to fix it)
16:20:00 <ihrachys> manjeets, I think it's fungi
16:20:40 <ihrachys> I also revised a bit how we disable dvr tests for dvrskip scenario jobs here: https://review.openstack.org/#/c/453212/
16:21:36 <fungi> yep, that's me
16:21:43 <ihrachys> dasanind has a patch fixing a sporadic tempest failure caused by a missing project_id on the first API call: https://review.openstack.org/#/c/447781/ Now that it has a functional test, it should probably be ready, though I have yet to look at the test.
16:21:50 <clarkb> ihrachys: probably fungi or pabelanger
16:21:55 <ihrachys> fungi, wonder if you could push https://review.openstack.org/#/c/439114/
16:22:42 <fungi> looking into it now
16:22:42 <dasanind> ihrachys: I am getting a tempest test failure for https://review.openstack.org/#/c/447781/
16:22:50 <dasanind> http://logs.openstack.org/81/447781/6/check/gate-tempest-dsvm-neutron-linuxbridge-ubuntu-xenial/c1962e8/logs/testr_results.html.gz
16:23:55 <ihrachys> dasanind, is it stable? or just a single failure? because the test is for the Cinder API, and doesn't look neutron related from what I can see from 30 secs of log inspection
16:24:03 <fungi> manjeets: ihrachys: clarkb: i've approved 439114 now
16:24:12 <ihrachys> fungi, thanks a lot!
16:24:12 <manjeets> thanks fungi
16:24:19 <dasanind> ihrachys: it's just a single failure
16:24:19 <fungi> any time
16:24:52 <ihrachys> dasanind, ok, then it's probably something else. you are of course advised to report a bug if there is no bug that tracks the failure for cinder just yet.
16:25:27 <dasanind> ihrachys: will do
16:26:02 <ihrachys> another failure that we track is fullstack being broken by kevinbenton's change in provisioning blocks, where we now require some ml2 driver to deliver the dhcp block for a dhcp enabled port to transition to ACTIVE
16:26:08 <ihrachys> more info here: http://lists.openstack.org/pipermail/openstack-dev/2017-March/114796.html
16:26:20 <ihrachys> jlibosva has a patch for fullstack here: https://review.openstack.org/451704
16:26:44 <ihrachys> it may need some additional investigation into why we seem to need to set agent_down_time=10 to make it pass
16:27:12 <jlibosva> I think it might not be needed but I don't understand why the tests fail on my env if I don't set the agent down time
16:27:24 <jlibosva> I'll wait for CI results and will try to remove that
16:27:35 <ihrachys> jlibosva, wonder what happens if we post a patch on top cleaning those up
16:28:16 <jlibosva> ihrachys: I went through patches that added dhcp tests and the agent_down_time is just to lower waiting when agent goes down in a failover test
16:28:27 <jlibosva> for HA
16:29:31 <ihrachys> jlibosva, but we don't fail over in those tests that you touched, do we?
16:31:46 <ihrachys> ok we will probably track it offline
16:31:58 <ihrachys> speaking of other patches
16:32:10 <jlibosva> ihrachys: nope
16:32:48 <ihrachys> my attention was brought to https://review.openstack.org/#/c/421155/ that fixes dvr tests for multinode setups (I expect it to affect our new ha+dvr job)
16:33:04 <ihrachys> I marked the bug as gate-failure for that matter
16:33:08 <ihrachys> to ease tracking it
16:33:33 <ihrachys> there is some back and forth there in comments about where to fix it first - tempest or neutron (seems like the test is duplicated)
16:33:47 <ihrachys> anyhoo, I am glad to see it got attention from some :)
16:35:32 <ihrachys> there seems to be a proposal to add a job using ryu master against neutron: https://review.openstack.org/#/c/445262/
16:36:00 <ihrachys> not sure why it's added in gate and not e.g. periodic
16:37:56 <ihrachys> ok I left a comment there
16:38:12 <ihrachys> there is also that long standing patch from jlibosva that documents how rechecks should be approached in gate: https://review.openstack.org/#/c/426829/
16:38:23 <ihrachys> mlavalle, I wonder if your WIP is still needed there
16:38:43 <mlavalle> I can remove it
16:38:51 <mlavalle> I don't think it is useful anymore
16:39:14 <ihrachys> yeah the patch seems to take a lot of time to get in
16:39:31 <mlavalle> I'll just merge it now
16:39:36 <ihrachys> I wonder if it's ok for me to just push it, or if I'd better seek +W from e.g. kevinbenton
16:39:52 <ihrachys> since it's a policy thing
16:40:07 <mlavalle> yeah, I think it would be good to get Kevinbenton's blessing
16:40:34 <mlavalle> I'll just remove the -1
16:40:44 <mlavalle> Done
16:41:56 <ihrachys> thanks
16:42:39 <ihrachys> I am not aware of any other patches. have I missed anything?
16:43:35 <ihrachys> otherwiseguy, how close are we to pulling the trigger on the ovsdbapp switch? https://review.openstack.org/#/c/438087/
16:46:51 <ihrachys> ok I guess otherwiseguy is offline
16:47:08 <otherwiseguy> oh hi
16:47:23 <ihrachys> o/
16:47:31 <ihrachys> how's ovsdbapp doing?
16:49:41 <ihrachys> ok otherwiseguy says he has some connectivity issues
16:49:58 <ihrachys> #topic Bugs
16:50:33 <ihrachys> there seems to be nothing actionable in the list that we haven't discussed already
16:50:36 <ihrachys> #link https://bugs.launchpad.net/neutron/+bugs?field.tag=gate-failure
16:51:04 <ihrachys> I am still not clear why we track e.g. vpnaas bugs, which is not even a stadium subproject, under the neutron component
16:51:44 <ihrachys> let's discuss something else
16:51:50 <ihrachys> #topic Open discussion
16:52:08 <ihrachys> infra seems to be switching the whole gate that uses ubuntu xenial to UCA: http://lists.openstack.org/pipermail/openstack-dev/2017-April/114912.html
16:52:28 <ihrachys> that's ubuntu cloud archive, a repo that contains new versions of libvirt, openvswitch and such
16:52:28 <clarkb> well I am posing the question :)
16:52:42 <clarkb> I think newer ovs (2.6.1 compared to 2.5.0) helps neutron?
16:53:12 <ihrachys> clarkb, I think fullstack may make use of 2.6.1 so that we can stop compiling kernel modules
16:53:17 <jlibosva> clarkb: would the image also get the newer kernel?
16:53:28 <clarkb> jlibosva: no, UCA doesn't have newer kernels in it
16:53:30 <ihrachys> right, question is, are images built with UCA on?
16:53:45 <ihrachys> or is it enabled after the fact?
16:53:54 <jlibosva> ah, would be useful for fullstack
16:53:54 <clarkb> and they wouldn't be built with UCA on (most likely not at least, that specific detail isn't completely 100% settled)
16:54:01 <jlibosva> but it may help the disabled functional tests
16:54:04 <clarkb> ihrachys: even if it was enabled during image builds we wouldn't get newer kernels
16:54:16 <ihrachys> clarkb, is it possible to get an image with a newer kernel too? otherwise it seems we will still compile.
16:54:37 <ihrachys> for functional, the only benefit is we will be able to reenable two tests
16:55:03 <clarkb> ihrachys: we could possibly do hardware enablement but unlike UCA I think ubuntu/canonical says not to use hardware enablement on servers
16:55:30 <jlibosva> we could also precompile the kernel module and fetch it from reliable storage instead of compiling the same thing all the time
16:55:36 <ihrachys> clarkb, sorry, what's hardware enablement?
16:56:01 <clarkb> ihrachys: it's a separate thing that ubuntu does, where they publish newer kernels for LTS so that your new shiny laptop with silly new peripheral design will work
16:56:06 <ihrachys> jlibosva, the module should match the kernel; if the kernel is updated by ubuntu, we are screwed
16:56:12 <ihrachys> jlibosva, it may not load
16:56:23 <jlibosva> ihrachys: but we won't get a newer kernel
16:56:40 <ihrachys> jlibosva, not new enough; but they can still update it for CVE or whatnot
16:56:41 <clarkb> I'm sort of confused why a new kernel is necessary
16:56:54 <jlibosva> there is a bug in the kernel datapath for local vxlan traffic
16:57:06 <ihrachys> clarkb, openvswitch kernel pieces contain a fix that is needed for some fullstack tunneling feature
16:57:19 <clarkb> i see, has that been filed against ubuntu?
16:57:23 <ihrachys> jlibosva should really document that somewhere
16:57:35 <jlibosva> there was a bug, let me search
16:57:42 <ihrachys> #action jlibosva document current openvswitch requirements for fullstack/functional in TESTING.rst
16:58:03 <clarkb> I think if we want to talk about newer kernels, the avenue for that would be the hardware enablement kernels and that would be separate from any use of UCA
16:58:08 <ihrachys> clarkb, one consideration when switching should also be revising https://review.openstack.org/#/c/402940/4/reference/project-testing-interface.rst so as not to give the wrong message to consumers
16:58:22 <ihrachys> the way the document is worded now suggests that you can safely deploy from LTS bits
16:58:37 <ihrachys> clarkb, ack
16:58:41 <jlibosva> clarkb: this one https://bugs.launchpad.net/kernel/+bug/1627095
16:58:41 <openstack> Launchpad bug 1627095 in linux "Request to backport fix for local VxLAN" [Undecided,New]
16:58:41 <clarkb> ihrachys: well that's what openstack has stated it will support
16:58:47 <clarkb> ihrachys: so if it's not the case we should work to fix that
16:59:30 <ihrachys> clarkb, question is, should we work on it retroactively once bugs are revealed, or maintain a job that proves it still works, even if not too stable?
17:00:04 <clarkb> ihrachys: I think working with the distros to keep a functioning usable "openstack" is likely ideal. I don't know how practical that is in reality though
17:00:23 <clarkb> ihrachys: our users are deploying on these distros, if they don't work then our users will be sad (like me!)
17:00:29 <jlibosva> we're out of time
17:00:39 <ihrachys> right. anyhow, it would make sense to update the docs based on the UCA decision.
17:00:49 <ihrachys> ok time indeed
17:00:51 <ihrachys> thanks folks
17:00:53 <ihrachys> #endmeeting