16:00:42 #startmeeting neutron_ci
16:00:43 Meeting started Tue Apr 4 16:00:42 2017 UTC and is due to finish in 60 minutes. The chair is ihrachys. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:44 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:46 The meeting name has been set to 'neutron_ci'
16:00:51 hi
16:01:06 hi everyone, thanks for joining
16:01:19 we will start with our tradition - reviewing action items from the prev meeting
16:01:26 #topic Action items from prev meeting
16:01:33 huh, "ihrachys fix e-r bot not reporting in irc channel"
16:01:44 * ihrachys turns red
16:01:52 no, it's not tackled
16:02:04 I wonder if history shows I can't tackle it in due time :)
16:02:55 I will repeat it for the next time, but you know...
16:02:56 #action ihrachys fix e-r bot not reporting in irc channel
16:03:11 if someone wants to help tracking it, you are welcome
16:03:15 next was "mlavalle to fix the grafana board to include gate-tempest-dsvm-neutron-dvr-multinode-full-ubuntu-xenial-nv"
16:03:35 I submitted this patchset: https://review.openstack.org/#/c/452294/
16:03:35 seems like it merged: https://review.openstack.org/#/c/452294/
16:03:38 mlavalle, good job
16:03:43 and it got merged last night
16:03:54 :-)
16:04:04 Thanks for the reviews :-)
16:04:06 now, let's have a look at how the dashboard looks now
16:04:26 should be here: http://grafana.openstack.org/dashboard/db/neutron-failure-rate?panelId=8&fullscreen
16:04:26 o\ /o
16:05:41 35% failure rate it seems
16:06:05 yeap, about that
16:06:28 that job is non-voting
16:07:36 mlavalle, haleyb: is there any plan in the l3 team to make it the gate job that could replace the non-dvr/non-multinode flavours?
16:07:53 not that I'm aware of
16:08:01 I remember there was a plan to make ha+dvr the gate setup, but at this point it seems dim
16:08:05 but haleyb may have a plan
16:08:18 if he is not on-line....
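[Editor's note: the 35% figure quoted above is a failure-rate panel reading. As a rough, illustrative sketch (this is not the actual graphite query behind the grafana board, just the arithmetic it reports):]

```python
# Hedged sketch: a failure-rate panel is essentially failed runs as a
# percentage of all completed runs for a job over some time window.
# The function name and inputs are illustrative, not taken from grafana.
def failure_rate(failures: int, successes: int) -> float:
    """Percentage of failed runs among completed runs; 0.0 if no runs."""
    total = failures + successes
    return 100.0 * failures / total if total else 0.0

# Roughly the figure quoted for the dvr-multinode job: 35 failures out of
# 100 completed runs.
print(round(failure_rate(35, 65)))
```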
16:08:28 ihrachys: yes, the plan was to get the ha+dvr change merged
16:08:32 I will bring this up during the next L3 team meeting on Thursday
16:08:47 haleyb, I think the devstack-gate piece landed, no?
16:09:05 this one: https://review.openstack.org/#/c/383827/
16:09:32 there is a test patch from anil here: https://review.openstack.org/#/c/383833/ but I don't know what the state is there
16:09:40 haleyb, are you in touch with anil?
16:10:14 ihrachys: no, but we will add this to the list of items at the L3 meeting
16:10:20 ok cool, thanks
16:10:37 ok next one was on me
16:10:39 "ihrachys to report bugs for fullstack race in ovs agent when calling to enable_connection_uri"
16:10:44 there is mixed news here
16:10:55 I haven't reported bugs just yet but...
16:11:33 I was working on some OSP Ocata test failures, and while reading through logs, I spotted that we see the same duplicate messages in logs when setting the manager
16:11:53 so it's not only a fullstack issue
16:12:03 the env I see it in is an actual multinode deployment
16:12:16 for reference, this is the error I am talking about: http://logs.openstack.org/98/446598/1/check/gate-neutron-dsvm-fullstack-ubuntu-xenial/2e0f93e/logs/dsvm-fullstack-logs/TestOvsConnectivitySameNetworkOnOvsBridgeControllerStop.test_controller_timeout_does_not_break_connectivity_sigkill_GRE-and-l2pop,openflow-native_ovsdb-cli_/neutron-openvswitch-agent--2017-03-16--16-06-05-730632.txt.gz?level=TRACE
16:12:57 apparently the code that sets managers for the native ovsdb driver is racy when executed by two agents
16:13:18 which can of course happen because we deploy multiple agents on a single node
16:13:34 and each of them uses its own copy of the ovsdb connection
16:13:59 so, just a heads up; and it's still on me to report the bug
16:14:10 #action ihrachys to report bugs for fullstack race in ovs agent when calling to enable_connection_uri
16:14:37 #action haleyb or mlavalle to report back on ha+dvr plan after l3 meeting
16:15:16 there was also a long-standing action item on jlibosva to prepare a py3 transition plan for Pike
16:15:27 I doubt it's ready though since jlibosva was offline for a while
16:15:38 jlibosva, but that's your chance to surprise everyone
16:15:38 indeed
16:15:44 no surprise
16:15:47 :(
16:15:57 that's ok, I would be really surprised
16:16:08 #action jlibosva to prepare py3 transition plan for Pike
16:16:20 we can walk through it next week
16:16:43 #topic Patches in review
16:16:59 manjeets's patch to add gate-failure bugs to the neutron review board seems stuck: https://review.openstack.org/#/c/439114/
16:17:13 clarkb, I know you +2d it. who could be the 2nd person to review it?
16:17:37 ihrachys, I posted it on the infra channel but did not get attention
16:17:50 maybe we need to find out
16:19:10 ok I guess we will need to chase them somehow
16:19:18 #action ihrachys to chase infra to review https://review.openstack.org/#/c/439114/
16:19:19 jeremy stanley
16:19:39 don't know his irc handle
16:19:55 I am also aware of this set of backports to fix scenario jobs in lbaas: https://review.openstack.org/#/q/I5d41652a85cfb91646bb48d38eedbe97741a97c2,n,z (mitaka seems broken but I probably won't have time till EOL to fix it)
16:20:00 manjeets, I think it's fungi
16:20:40 I also revised a bit how we disable dvr tests for dvrskip scenario jobs here: https://review.openstack.org/#/c/453212/
16:21:36 yep, that's me
16:21:43 dasanind has a patch fixing a sporadic tempest failure caused by a missing project_id on the first API call: https://review.openstack.org/#/c/447781/ Now that it has a functional test, it should probably be ready, though I still need to look at the test.
16:21:50 ihrachys: probably fungi or pabelanger
16:21:55 fungi, wonder if you could push https://review.openstack.org/#/c/439114/
16:22:42 looking into it now
16:22:42 ihrachys: I am getting a tempest test failure for https://review.openstack.org/#/c/447781/
16:22:50 http://logs.openstack.org/81/447781/6/check/gate-tempest-dsvm-neutron-linuxbridge-ubuntu-xenial/c1962e8/logs/testr_results.html.gz
16:23:55 dasanind, is it stable? or just a single failure? because the test is for the Cinder API, and doesn't look neutron related from what I can see from 30 secs of log inspection
16:24:03 manjeets: ihrachys: clarkb: i've approved 439114 now
16:24:12 fungi, thanks a lot!
16:24:12 thanks fungi
16:24:19 ihrachys: it's just a single failure
16:24:19 any time
16:24:52 dasanind, ok, then it's probably something else. you are of course advised to report a bug if there is no bug that tracks the failure for cinder just yet.
16:25:27 ihrachys: will do
16:26:02 another failure that we track is fullstack being broken by kevinbenton's change in provisioning blocks, where we now require some ml2 driver to deliver the dhcp block for a dhcp-enabled port to transition to ACTIVE
16:26:08 more info here: http://lists.openstack.org/pipermail/openstack-dev/2017-March/114796.html
16:26:20 jlibosva has a patch for fullstack here: https://review.openstack.org/451704
16:26:44 it may need some additional investigation on why we seem to need to set agent_down_time=10 to make it pass
16:27:12 I think it might not be needed but I don't understand why the tests fail in my env if I don't set the agent down time
16:27:24 I'll wait for CI results and will try to remove that
16:27:35 jlibosva, wonder what happens if we post a patch on top cleaning those up
16:28:16 ihrachys: I went through the patches that added dhcp tests and the agent_down_time is just to lower the wait when an agent goes down in a failover test
16:28:27 for HA
16:29:31 jlibosva, but we don't failover in those tests that you touched, do we?
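[Editor's note: the agent_down_time=10 discussed above trades heartbeat slack for faster dead-agent detection. A minimal sketch of the server-side liveness check; the function is illustrative, not actual neutron code, though the defaults mentioned (agent_down_time=75, report_interval=30) are neutron's documented defaults:]

```python
# Hedged sketch: the neutron server considers an agent dead once its last
# heartbeat is older than agent_down_time. Fullstack HA tests lower it so
# a killed agent is noticed in seconds rather than over a minute.
AGENT_DOWN_TIME = 10   # fullstack override; neutron's default is 75
REPORT_INTERVAL = 3    # heartbeat period; should stay well under agent_down_time

def is_agent_down(heartbeat_age_seconds: float,
                  agent_down_time: float = AGENT_DOWN_TIME) -> bool:
    """True if the agent's last heartbeat is older than the down window."""
    return heartbeat_age_seconds > agent_down_time

# With the 10s window, an agent that missed ~4 heartbeats is declared down;
# with the default 75s window the same test would wait much longer.
print(is_agent_down(5), is_agent_down(12))
```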
16:31:46 ok we will probably track it offline
16:31:58 speaking of other patches
16:32:10 ihrachys: nope
16:32:48 my attention was brought to https://review.openstack.org/#/c/421155/ that fixes dvr tests for multinode setups (I expect it to affect our new ha+dvr job)
16:33:04 I marked the bug as gate-failure for that matter
16:33:08 to ease tracking it
16:33:33 there is some back and forth there in comments about where to fix it first - tempest or neutron (seems like the test is duplicated)
16:33:47 anyhoo, I am glad to see it got attention from some :)
16:35:32 there seems to be a proposal to add a job using ryu master against neutron: https://review.openstack.org/#/c/445262/
16:36:00 not sure why it's added in gate and not e.g. periodic
16:37:56 ok I left a comment there
16:38:12 there is also that long standing patch from jlibosva that documents how rechecks should be approached in gate: https://review.openstack.org/#/c/426829/
16:38:23 mlavalle, I wonder if your WIP is still needed there
16:38:43 I can remove it
16:38:51 I don't think it is useful anymore
16:39:14 yeah the patch seems to take a lot of time to get in
16:39:31 I'll just merge it now
16:39:36 I wonder if it's ok for me to just push it, or I better seek +W from e.g. kevinbenton
16:39:52 since it's policy thing
16:40:07 yeah, I think it would be good to get kevinbenton's blessing
16:40:34 I'll just remove the -1
16:40:44 Done
16:41:56 thanks
16:42:39 I am not aware of any other patches. have I missed anything?
16:43:35 otherwiseguy, how close are we to pulling the trigger on the ovsdbapp switch? https://review.openstack.org/#/c/438087/
16:46:51 ok I guess otherwiseguy is offline
16:47:08 oh hi
16:47:23 o/
16:47:31 how's ovsdbapp doing?
16:49:41 ok otherwiseguy says he has some connectivity issues
16:49:58 #topic Bugs
16:50:33 there seems to be nothing actionable in the list that we haven't discussed already
16:50:36 #link https://bugs.launchpad.net/neutron/+bugs?field.tag=gate-failure
16:51:04 I am still not clear why we track e.g. vpnaas bugs under the neutron component when it's not even a stadium subproject
16:51:44 let's discuss something else
16:51:50 #topic Open discussion
16:52:08 infra seems to be switching the whole gate that uses ubuntu xenial to UCA: http://lists.openstack.org/pipermail/openstack-dev/2017-April/114912.html
16:52:28 that's Ubuntu Cloud Archive, a repo that contains new versions of libvirt, openvswitch and such
16:52:28 well I am posing the question :)
16:52:42 I think newer ovs (2.6.1 compared to 2.5.0) helps neutron?
16:53:12 clarkb, I think fullstack may make use of 2.6.1 so that we can stop compiling kernel modules
16:53:17 clarkb: would the image also get the newer kernel?
16:53:28 jlibosva: no, UCA doesn't have newer kernels in it
16:53:30 right, question is, are images built with UCA on?
16:53:45 or is it enabled after the fact
16:53:54 ah, would be useful for fullstack
16:53:54 and they wouldn't be built with UCA on (most likely not, at least; that specific detail isn't completely 100% settled)
16:54:01 but it may help the disabled functional tests
16:54:04 ihrachys: even if it was enabled during image builds we wouldn't get newer kernels
16:54:16 clarkb, is it possible to get an image with a newer kernel too? otherwise we will still compile, it seems.
16:54:37 for functional, the only benefit is we will be able to re-enable two tests
16:55:03 ihrachys: we could possibly do hardware enablement but unlike UCA I think ubuntu/canonical says not to use hardware enablement on servers
16:55:30 we could also precompile the kernel module and fetch it from reliable storage instead of compiling the same thing all the time
16:55:36 clarkb, sorry, what's hardware enablement?
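[Editor's note: the precompiled-module idea floated just above hinges on the module matching the running kernel, which is the caveat raised in the discussion that follows. A minimal, illustrative sketch of that constraint (not neutron or kmod code; kernel version strings are hypothetical examples):]

```python
# Hedged sketch: a prebuilt openvswitch.ko records the kernel it was built
# against (its "vermagic"); the kernel refuses to load it if that doesn't
# match the running kernel. So a cached prebuilt module breaks as soon as
# Ubuntu ships a kernel update (e.g. for a CVE).
def module_loadable(module_vermagic: str, running_kernel: str) -> bool:
    """True if the module's build kernel matches `uname -r` output."""
    # vermagic also carries build flags (SMP, mod_unload, ...); the kernel
    # release is its first token.
    return module_vermagic.split()[0] == running_kernel

# Built against 4.4.0-66-generic and still running it: loads fine.
print(module_loadable("4.4.0-66-generic SMP mod_unload modversions",
                      "4.4.0-66-generic"))
# Kernel updated to -70 for a CVE: the cached module no longer loads.
print(module_loadable("4.4.0-66-generic SMP mod_unload modversions",
                      "4.4.0-70-generic"))
```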
16:56:01 ihrachys: it's a separate thing that ubuntu does, where they publish newer kernels for LTS so that your shiny new laptop with a silly new peripheral design will work
16:56:06 jlibosva, the module should match the kernel; if the kernel is updated by ubuntu, we are screwed
16:56:12 jlibosva, it may not load
16:56:23 ihrachys: but we won't get a newer kernel
16:56:40 jlibosva, not new enough; but they can still update it for a CVE or whatnot
16:56:41 I'm sort of confused why a new kernel is necessary
16:56:54 there is a bug in the kernel datapath for local vxlan traffic
16:57:06 clarkb, the openvswitch kernel pieces contain a fix that is needed for some fullstack tunneling feature
16:57:19 i see, has that been filed against ubuntu?
16:57:23 jlibosva should really document that somewhere
16:57:35 there was a bug, let me search
16:57:42 #action jlibosva document current openvswitch requirements for fullstack/functional in TESTING.rst
16:58:03 I think if we want to talk about newer kernels the avenue for that would be the hardware enablement kernels and that would be separate from any use of UCA
16:58:08 clarkb, one consideration when switching should also be revising https://review.openstack.org/#/c/402940/4/reference/project-testing-interface.rst not to give the wrong message to consumers
16:58:22 the way the document is worded now suggests that you can safely deploy from LTS bits
16:58:37 clarkb, ack
16:58:41 clarkb: this one https://bugs.launchpad.net/kernel/+bug/1627095
16:58:41 Launchpad bug 1627095 in linux "Request to backport fix for local VxLAN" [Undecided,New]
16:58:41 ihrachys: well that's what openstack has stated it will support
16:58:47 ihrachys: so if that's not the case we should work to fix that
16:59:30 clarkb, question is, should we work on it retroactively once bugs are revealed, or maintain a job that proves it still works, even if not too stable?
17:00:04 ihrachys: I think working with the distros to keep a functioning usable "openstack" is likely ideal. I don't know how practical that is in reality though
17:00:23 ihrachys: our users are deploying on these distros, if they don't work then our users will be sad (like me!)
17:00:29 we're out of time
17:00:39 right. anyhow, would make sense to update the docs based on the UCA decision.
17:00:49 ok time indeed
17:00:51 thanks folks
17:00:53 #endmeeting