16:00:19 <slaweq> #startmeeting neutron_ci
16:00:20 <openstack> Meeting started Tue Mar  5 16:00:19 2019 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:21 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:24 <openstack> The meeting name has been set to 'neutron_ci'
16:00:38 <mlavalle> o/
16:00:43 <slaweq> hi
16:01:00 <slaweq> let's wait a few minutes for others to join
16:01:12 <mlavalle> while we wait, did you ping rubasov about the patches yet to be reviewed?
16:01:16 <njohnston> o/
16:01:29 <ralonsoh> hi
16:02:20 <slaweq> mlavalle: yes I talked with rubasov about it
16:02:33 <slaweq> I will give You the log of what we talked about later, ok?
16:02:48 <slaweq> ok, lets start
16:02:52 <bcafarel> o/
16:02:54 <haleyb> hi
16:03:01 <slaweq> #topic Actions from previous meetings
16:03:08 <slaweq> first one was
16:03:10 <slaweq> njohnston to create etherpad with python3 status of stadium projects
16:03:22 <njohnston> #link https://etherpad.openstack.org/p/neutron_stadium_python3_status
16:03:47 <njohnston> I identified the jobs that look like they are still running py2
16:04:12 <njohnston> by going into the job log for every job and checking the full library paths reported in tracebacks, which seemed like the most reliable method
16:04:43 <njohnston> at this point all automatically generated changes are merged and py3 unit test jobs are present and passing
16:05:08 <slaweq> thx njohnston, great job
16:05:34 <njohnston> I started on the list with bagpipe trying a zuul v3 migration as well.
16:05:49 <njohnston> so with your approval I will send this to the ML
16:05:58 <njohnston> and work on a few of the changes myself
16:06:09 <slaweq> ok, that sounds good for me
16:06:37 <mlavalle> yes, great!
16:06:48 <slaweq> and IMO we should focus only on voting jobs, non-voting jobs should be taken care of by each project's team
16:06:52 <slaweq> what do You think?
16:07:00 <bcafarel> the list does not look too bad, nice (of course, no guarantee what will happen when trying to move some of them to py3)
16:07:08 <njohnston> I agree, I just made a note of them for the sake of completeness
16:07:41 <njohnston> so that if a project wants to mark a job voting then they know they should also make it py3
16:07:44 <mlavalle> yes, only voting jobs
16:07:48 <slaweq> yes, that's good we have it on the list but lets take care only about voting jobs from this list :)
16:07:56 <njohnston> +1
16:09:03 <slaweq> ok, lets move on then
16:09:08 <slaweq> next one was
16:09:12 <slaweq> slaweq to check bug https://bugs.launchpad.net/neutron/+bug/1816489
16:09:14 <openstack> Launchpad bug 1816489 in neutron "Functional test neutron.tests.functional.agent.l3.test_ha_router.LinuxBridgeL3HATestCase. test_ha_router_lifecycle failing" [High,Fix released] - Assigned to Slawek Kaplonski (slaweq)
16:09:26 <slaweq> I did, and fix is merged: https://review.openstack.org/640400
16:09:51 <slaweq> next one was:
16:09:53 <slaweq> slaweq to create bionic test patches for stadium projects
16:10:20 <slaweq> test patches are done for neutron and stadium projects: https://review.openstack.org/#/q/topic:legacy-job-bionic+status:open+owner:%22Slawek+Kaplonski+%253Cskaplons%2540redhat.com%253E%22
16:11:10 <slaweq> I found some bugs, so I reported them on Launchpad
16:11:27 <slaweq> I also sent a summary to the ML: http://lists.openstack.org/pipermail/openstack-discuss/2019-February/003129.html
16:11:49 <slaweq> if someone wants to help, You can take a look at those bugs and propose fixes :)
16:12:07 <slaweq> for networking-ovn lucasgomes already proposed fix, so that is fine
16:12:31 <slaweq> questions, comments?
16:12:39 <mlavalle> good job, thanks!
16:12:52 <njohnston> looks great!  good work.
16:12:56 <slaweq> thx
16:13:00 <slaweq> ok, lets move on
16:13:08 <slaweq> slaweq to prepare etherpad and plan of moving tempest plugins from stadium projects to neutron-tempest-plugin repo
16:13:31 <slaweq> so I created an etherpad https://etherpad.openstack.org/p/neutron_stadium_move_to_tempest_plugin_repo with a "plan of work to do"
16:13:37 <slaweq> please check if that makes sense for You
16:14:13 <slaweq> generally it's only 5 projects which have some tempest tests to move
16:14:31 <mlavalle> that's a manageable number
16:14:35 <slaweq> I didn't list 3rd party projects like vmware-nsx here
16:14:38 <njohnston> very nice
16:14:43 <slaweq> it's only related to stadium projects
16:15:01 <slaweq> is that fine? or should we take care of 3rd party project too?
16:15:02 <mlavalle> I think tidwellr can help us with dynamic routing
16:15:05 <njohnston> just FYI for the meeting minutes; the link for the bionic migration email was incorrect.  The correct one is: http://lists.openstack.org/pipermail/openstack-discuss/2019-March/003479.html
16:15:31 <slaweq> njohnston: thx
16:15:33 <mlavalle> I can ping tidwellr and ask him to help
16:15:43 <bcafarel> I'd say 3rd party by definition means outside of common repo
16:15:56 <slaweq> bcafarel: yes, I also think that
16:16:20 <slaweq> but according to what QA team wants they also should be moved to separate repositories
16:16:20 <mlavalle> that leaves 4 repos
16:16:21 <njohnston> I think 3rd party projects would be outside the scope of this upstream committee, but might be in scope for distro vendors like Red Hat or SuSE to work with as part of the distro offerings outside the scope of the official stadium support
16:16:40 <njohnston> if you know what I mean
16:16:56 <slaweq> njohnston: yes, I know
16:17:06 <slaweq> njohnston: that we can discuss downstream too :)
16:17:33 <slaweq> ok, so getting back to the list
16:17:51 <slaweq> mlavalle will ask tidwellr to help with dynamic-routing, that's good
16:17:56 <slaweq> any other volunteers?
16:18:04 <mlavalle> before going on....
16:18:10 <slaweq> if You want to help, please add Your name to the etherpad
16:18:11 <mlavalle> I got confused
16:18:19 <slaweq> mlavalle: why?
16:18:28 <mlavalle> to make sure
16:18:47 <mlavalle> we are moving these plugins to the neutron-tempest-plugin repo, right?
16:18:54 <slaweq> yes
16:18:58 <mlavalle> ok
16:19:08 <slaweq> we agreed on that at the last meeting I think
16:19:25 <slaweq> it was already done for midonet some time ago, so we will do it for others too
16:19:43 <bcafarel> it was a request from QA in Denver PTG right?
16:19:49 <slaweq> bcafarel: right
16:19:55 <mlavalle> I know, but I got confused by something that was said above... nevermind
16:20:07 <slaweq> mlavalle: sure, I understand :)
16:20:24 <bcafarel> I can probably give a hand for sfc, I still remember a few things there :)
16:20:32 <mlavalle> for bgpvpn.... have we asked tmorin?
16:20:41 <njohnston> I put my name down for fwaas
16:20:57 <mlavalle> I took sfc
16:21:08 <slaweq> ok, I also put in this etherpad some "action plan" how to perform such migration but please read it and update if You think it should be done differently
16:21:13 <mlavalle> unless bcafarel wants it
16:21:35 <mlavalle> then I'd take vpnaas
16:21:37 <bcafarel> mlavalle: no problem, I can do vpnaas
16:21:54 <mlavalle> bcafarel: let's switch. you know sfc well
16:22:01 <mlavalle> in case there are issues
16:22:02 <slaweq> I can ping tmorin about bgpvpn tomorrow
16:22:14 <bcafarel> ^ sounds good, before "volunteering" him
16:22:22 <bcafarel> mlavalle: ack :)
16:22:54 <slaweq> basically there shouldn't be big problems with it, it's "just" rehoming tests to other repo :)
16:23:09 <slaweq> but we also need to define new jobs for those tests
16:24:16 <slaweq> ok, can we move forward then?
16:24:17 <mlavalle> so each volunteer is responsible for the jobs of his repo?
16:24:37 <njohnston> +1
16:24:45 <slaweq> +1
16:24:55 <bcafarel> all good
16:25:56 <slaweq> ok, that was easy :)
16:26:01 <slaweq> thx guys :)
16:26:08 <slaweq> lets go to next topics
16:26:30 <slaweq> as we already talked about the transition to bionic and python 3, let's go to
16:26:33 <slaweq> #topic Grafana
16:26:40 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:27:52 <bcafarel> sometimes I think we should open this link in some tab when starting the meeting (to give it time to load)
16:28:05 <slaweq> bcafarel: that's good idea
16:28:18 <slaweq> I will try to remember to put it as first thing in the meeting next time :)
16:29:05 <slaweq> so basically we have "only" 2 problems now :/
16:29:18 <slaweq> neutron-fullstack and functional tests are in very bad shape
16:29:22 <mlavalle> fullstack/ functional?
16:29:48 <slaweq> yes, most of tempest/scenario jobs are fine
16:30:24 <slaweq> one thing to mention: I added py37 UT job to dashboard recently
16:30:54 <slaweq> and moved lower-constraints job to the same graph too as it's also UT job in fact
16:31:32 <slaweq> any questions/comments?
16:32:04 <bcafarel> makes sense (the "new" UT graph)
16:32:26 <slaweq> thx bcafarel :)
16:32:33 <slaweq> let's then talk about functional/fullstack issues
16:32:39 <slaweq> #topic fullstack/functional
16:32:48 <slaweq> so, first functional tests
16:33:01 <slaweq> I recently noticed and reported 3 different bugs:
16:33:25 <slaweq> https://bugs.launchpad.net/neutron/+bug/1818334 - this one isn't a very big issue as it happened "only" a few times
16:33:26 <openstack> Launchpad bug 1818334 in neutron "Functional test test_concurrent_create_port_forwarding_update_port is failing" [Medium,Confirmed]
16:33:53 <slaweq> I wanted to ask liuyulong_zzz whether he can take a look, as he was working on this test IIRC
16:34:11 <mlavalle> we can bring it up tomorrow in the l3 meeting
16:34:16 <mlavalle> I'll do it
16:34:33 <slaweq> mlavalle: thx
16:34:49 <mlavalle> so, we want to ask if he is still fixing it?
16:34:49 <slaweq> now, next bugs are much more urgent:
16:34:57 <slaweq> https://bugs.launchpad.net/neutron/+bug/1818613
16:34:58 <openstack> Launchpad bug 1818613 in neutron "Functional/fullstack qos related tests fails often" [Critical,Confirmed]
16:35:50 <slaweq> this one is I think somehow related to ralonsoh patch https://review.openstack.org/#/c/406841/
16:36:22 <ralonsoh> slaweq, how?
16:37:16 <slaweq> ralonsoh: http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%5C%22line%2052,%20in%20_minimum_bandwidth_initialize%5C%22
16:37:28 <slaweq> it looks like it started failing like that when we merged Your patch
16:37:55 <slaweq> but it's only from my quick look, so please don't take it as 100% sure thing :)
16:38:06 <ralonsoh> slaweq, I'll take a look at this now
16:38:11 <slaweq> ralonsoh: thx
16:38:26 <slaweq> in bug report You have link to example failure and to logstash query
16:38:27 <mlavalle> thanks ralonsoh
16:38:39 <mlavalle> as always, very eager to jump in
16:38:58 <slaweq> I know that dougwig also wanted to look at them so please maybe sync with him
16:39:15 <slaweq> ok, and second of those bugs is related to L3 HA: https://bugs.launchpad.net/neutron/+bug/1818614
16:39:16 <openstack> Launchpad bug 1818614 in neutron "Various L3HA functional tests fails often" [Critical,Confirmed]
16:39:32 <slaweq> in this case also many random tests are failing
16:39:54 <slaweq> and the only common thing between them is that they all fail while waiting for the router to transition to master
16:40:17 <slaweq> so from that I would say that first thing to check is keepalived and all things related to it
16:41:50 <slaweq> mlavalle: maybe You can also raise it on tomorrow's L3 meeting?
16:42:09 <mlavalle> you read my mind.... I'll add a tag to it
16:42:14 <slaweq> mlavalle: thx
16:43:05 <slaweq> I saw in the neutron channel that dougwig is looking at this now, I will sync with him later in case he needs any help with that
16:43:26 <slaweq> and basically those are the most urgent functional test issues, which have been failing a lot recently
16:43:30 <dougwig> i was adding elastic queries, i'm not sure how much time i have for these particular bugs.
16:43:43 <slaweq> dougwig: hi, ok
16:43:53 <mlavalle> all help is welcome dougwig
16:43:55 <dougwig> i'll try to give it some cycles today, but i'm not sure.
16:44:05 <slaweq> dougwig: so if You find something, please write it in the bug report
16:44:08 <dougwig> ok.
16:44:14 <slaweq> I will continue working on it tomorrow morning
16:44:21 <slaweq> thx dougwig :)
16:44:44 <mlavalle> dougwig, slaweq: let's just update the bug with whatever progress we make
16:44:50 <slaweq> mlavalle++
16:44:53 <mlavalle> so others can follow from that point
16:45:22 <slaweq> ok, and now fullstack tests
16:45:28 <slaweq> - https://bugs.launchpad.net/neutron/+bug/1818335
16:45:29 <openstack> Launchpad bug 1818335 in neutron "Fullstack test test_dscp_marking_packets fails" [Medium,Confirmed]
16:45:44 * mlavalle also added a l3 dvr backlog tag to the port forwarding bug
16:45:44 <slaweq> I have seen issues with this test quite a few times
16:46:01 <slaweq> it always fails because no marked packet was received
16:46:21 <mlavalle> is the priority right?
16:46:23 <slaweq> I don't know if that is an issue in the L2 agent or maybe in tcpdump, which is checking those packets
16:46:37 <slaweq> mlavalle: what is the priority?
16:46:44 <mlavalle> of the bug, medium?
16:47:00 <slaweq> yes, I marked it like that few days ago
16:47:06 <mlavalle> ok
16:47:09 <slaweq> but now I think we can change to high
16:47:20 <mlavalle> ahhh, that was my question
16:47:22 <slaweq> as it has happened more often in the last few days :)
16:47:29 <slaweq> changed
16:48:00 <slaweq> I guess that there will be no volunteers for that one so I will probably assign it to myself
16:48:15 <slaweq> but first I will focus on this bug with functional tests
16:48:58 <slaweq> from other bugs, I also found that we recently hit an old bug a couple of times: https://bugs.launchpad.net/neutron/+bug/1799555
16:48:59 <openstack> Launchpad bug 1799555 in neutron "Fullstack test neutron.tests.fullstack.test_dhcp_agent.TestDhcpAgentHA.test_reschedule_network_on_new_agent timeout" [Medium,Confirmed]
16:49:41 <mlavalle> slaweq: if you need help with the dscp one, I can try to help
16:49:51 <slaweq> mlavalle: that would be great
16:49:59 <mlavalle> ok, I'll take it
16:50:07 <njohnston> I can try to help with the dscp one as well, I have some expertise in DSCP :-)
16:50:11 <slaweq> actually looking at this last one now http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%5C%22line%20168,%20in%20test_reschedule_network_on_new_agent%5C%22
16:50:17 <slaweq> it happens also quite often
16:50:30 <mlavalle> njohnston: in that case, I'll let you take a stab at it
16:50:33 <slaweq> so I will also change it to high
16:50:46 * njohnston adds it to my task list
16:50:58 <slaweq> mlavalle: so maybe You can take a look at https://bugs.launchpad.net/neutron/+bug/1799555 ?
16:51:00 <openstack> Launchpad bug 1799555 in neutron "Fullstack test neutron.tests.fullstack.test_dhcp_agent.TestDhcpAgentHA.test_reschedule_network_on_new_agent timeout" [High,Confirmed]
16:51:00 <slaweq> :)
16:51:27 <mlavalle> ok
16:51:34 <slaweq> I see at least 7 hits in last 7 days
16:51:37 <slaweq> thx mlavalle
16:51:52 <slaweq> ok, so to sum up
16:51:57 <dougwig> slaweq: i filed two new ci unstable bugs this morning, and added elastic rechecks for them.
16:52:10 <slaweq> njohnston will take a look at fullstack dscp issue,
16:52:24 <njohnston> #action njohnston Debug fullstack DSCP issue
16:52:25 <slaweq> mlavalle: will take a look at fullstack dhcp rescheduling issue
16:52:27 <slaweq> right?
16:52:31 <njohnston> +1
16:52:34 <mlavalle> +1
16:52:36 <slaweq> thx
16:52:54 <slaweq> #action mlavalle to take a look at fullstack dhcp rescheduling issue https://bugs.launchpad.net/neutron/+bug/1799555
16:52:55 <openstack> Launchpad bug 1799555 in neutron "Fullstack test neutron.tests.fullstack.test_dhcp_agent.TestDhcpAgentHA.test_reschedule_network_on_new_agent timeout" [High,Confirmed]
16:53:03 <slaweq> dougwig: do You have links to bugs?
16:53:16 <dougwig> one sec
16:53:37 <dougwig> https://bugs.launchpad.net/neutron/+bug/1818696
16:53:37 <openstack> Launchpad bug 1818696 in neutron "frequent ci failures trying to delete qos port" [Undecided,New]
16:53:45 <dougwig> https://bugs.launchpad.net/neutron/+bug/1818697
16:53:46 <openstack> Launchpad bug 1818697 in neutron "neutron fullstack frequently times out waiting on qos ports" [Undecided,New]
16:53:55 <dougwig> if either is a dup, i can update the elastic queries.
16:54:40 <slaweq> it may be that second one is dup of https://bugs.launchpad.net/neutron/+bug/1818613
16:54:40 <openstack> Launchpad bug 1818613 in neutron "Functional/fullstack qos related tests fails often" [Critical,Confirmed]
16:54:52 <slaweq> but one is related to functional tests and second to fullstack
16:55:12 <slaweq> so I would say, lets keep open both - maybe they will be fixed with same patch
16:55:24 <slaweq> do You agree?
16:55:40 <dougwig> yes.  one is not finding a port, the other is a timeout.  they may be related, but i'm not sure yet.
16:55:57 <slaweq> dougwig: ok, thx :)
16:56:03 <mlavalle> +1
16:56:47 <slaweq> ok
16:57:12 <slaweq> ok, lets move on quickly
16:57:18 <slaweq> #topic Tempest/Scenario
16:57:31 <slaweq> mlavalle: any updates on https://bugs.launchpad.net/neutron/+bug/1795870 ?
16:57:32 <openstack> Launchpad bug 1795870 in neutron "Trunk scenario test test_trunk_subport_lifecycle fails from time to time" [High,In progress] - Assigned to Miguel Lavalle (minsel)
16:57:41 <mlavalle> the patches are ok
16:57:51 <mlavalle> I am trying to find the best way to test
16:57:59 <slaweq> ready for review, right?
16:58:20 <mlavalle> my testing attempt this past Sunday wasn't too good
16:59:04 <mlavalle> so I need to iterate once more
16:59:16 <slaweq> ok, if You will need any help, ping me :)
16:59:23 <mlavalle> ok
16:59:39 <mlavalle> I'll probably ping you and dougwig in channel
16:59:45 <slaweq> I think we are running out of time now
16:59:49 <slaweq> thx for attending
16:59:53 <bcafarel> o/
16:59:53 <slaweq> #endmeeting