16:00:03 <slaweq> #startmeeting neutron_ci
16:00:05 <openstack> Meeting started Tue Dec  3 16:00:03 2019 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:06 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:09 <openstack> The meeting name has been set to 'neutron_ci'
16:00:10 <slaweq> welcome (again) :)
16:00:19 <bcafarel> o/ (again too :) )
16:01:38 <ralonsoh> hi
16:01:55 <slaweq> njohnston liuyulong CI meeting, do You have time to attend?
16:02:30 <slaweq> ok, let's start
16:02:31 <slaweq> #topic Actions from previous meetings
16:02:38 <slaweq> njohnston to check failing NetworkMigrationFromHA in multinode dvr job
16:03:44 <slaweq> ok, I think njohnston is not here now
16:03:47 <slaweq> so let's move on
16:03:56 <slaweq> slaweq to continue investigating issue https://bugs.launchpad.net/neutron/+bug/1850557
16:03:56 <openstack> Launchpad bug 1850557 in neutron "DHCP connectivity after migration/resize not working" [High,Fix released] - Assigned to Slawek Kaplonski (slaweq)
16:04:04 <slaweq> The patch is already merged: https://review.opendev.org/696794
16:04:19 <slaweq> I hope we will be good now with those migration/shelve tests on multinode jobs
16:05:00 <slaweq> and next one:
16:05:01 <bcafarel> yep I have seen this one in stable backports queue
16:05:02 <slaweq> slaweq to move job definitions to zuul.d directory
16:05:38 <slaweq> bcafarel: yes, I proposed a backport as this issue is also valid for stable branches and the backport was easy
16:06:01 <slaweq> regarding the zuul jobs definitions, the patch is proposed: https://review.opendev.org/#/c/696286/
16:06:10 <slaweq> please review it if You have some time
16:06:37 <slaweq> any questions/comments to actions from last week?
16:07:32 <bcafarel> just big +1 on that zuul jobs definitions split :)
16:07:38 <ralonsoh> for sure
16:07:59 <slaweq> thx
16:08:19 <slaweq> I hope it will be easier to look for jobs' definitions now
16:08:30 <slaweq> ok, let's move on
16:08:35 <slaweq> #action njohnston to check failing NetworkMigrationFromHA in multinode dvr job
16:08:42 <slaweq> ^^ just a reminder for next week
16:08:51 <slaweq> and I think we can move on to the next topic now
16:08:59 <slaweq> #topic Actions from previous meetings
16:09:02 <slaweq> #undo
16:09:03 <openstack> Removing item from minutes: #topic Actions from previous meetings
16:09:09 <slaweq> #topic Stadium projects
16:09:21 <slaweq> tempest-plugins migration
16:09:23 <slaweq> Etherpad: https://etherpad.openstack.org/p/neutron_stadium_move_to_tempest_plugin_repo
16:09:52 <slaweq> we finally merged step 2 for neutron-dynamic-routing
16:09:56 <slaweq> thx njohnston for that
16:10:03 <slaweq> so the last project on this list is vpnaas
16:10:16 <slaweq> and mlavalle told me yesterday that patches for that are ready for review
16:10:28 <slaweq> I wanted to review them today but didn't have time
16:10:43 <slaweq> Patches are here:
16:10:45 <slaweq> Step 1: https://review.openstack.org/#/c/649373
16:10:47 <slaweq> Step 2: https://review.opendev.org/#/c/695834
16:10:58 <slaweq> so please review them if You have some time :)
16:11:16 <bcafarel> same here, I started to review step 1 but got sidetracked
16:11:17 <slaweq> and then we will finally be done with this whole migration
16:11:35 <bcafarel> one question I had there, vpnaas should only be migrated from ussuri?
16:11:46 <bcafarel> or from train to be "in sync" with others
16:12:30 <slaweq> no, I think that ussuri is enough
16:12:44 <bcafarel> ok, off to -1 then :)
16:13:27 <slaweq> :)
16:14:21 <slaweq> thx bcafarel for taking a look into that
16:14:38 <slaweq> and next topic related to stadium projects is:
16:14:40 <slaweq> Neutron Train - Drop py27 and standardize on zuul v3
16:14:42 <slaweq> Etherpad: https://etherpad.openstack.org/p/neutron-train-zuulv3-py27drop
16:14:50 <slaweq> but this was already discussed today on team meeting
16:15:00 <slaweq> so I don't think we need to talk about it here too
16:15:25 <slaweq> do You have anything else related to stadium projects for today?
16:16:23 <slaweq> ok, I guess that this means "no" :)
16:16:27 <slaweq> so let's move on
16:16:29 <ralonsoh> no
16:16:33 <slaweq> #topic Grafana
16:16:35 <slaweq> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:18:04 <slaweq> We need to clean it up a bit and drop some recently removed jobs from it. But I want to do that after grenade-py3 is removed and after we add the ovn jobs to neutron CI too.
16:18:19 <slaweq> other than that I don't see anything really wrong in grafana
16:19:31 <slaweq> do You have anything related to grafana?
16:19:38 <ralonsoh> no
16:19:53 <bcafarel> me neither
16:20:19 <slaweq> so let's talk about some specific issues now
16:20:21 <slaweq> #topic fullstack/functional
16:20:39 <slaweq> regarding functional tests I found new (IMO) bug:
16:20:40 <njohnston> o/ sorry I am late
16:20:42 <slaweq> https://bugs.launchpad.net/neutron/+bug/1854462
16:20:42 <openstack> Launchpad bug 1854462 in neutron "[Functional tests] Timeout exception in list_namespace_pids" [High,Confirmed]
16:20:59 <slaweq> I know that we had something similar in the past but I was sure that ralonsoh fixed it already
16:21:09 <ralonsoh> slaweq, I need to check that
16:21:10 <slaweq> njohnston: o/ no problem :)
16:21:34 <bcafarel> that sounds really familiar indeed
16:21:35 <slaweq> thx ralonsoh - I saw it at least a couple of times this last week so I marked it as High for now
16:21:35 <ralonsoh> yes, the fix was merged, wasn't it?
16:22:11 <ralonsoh> I mean: we implemented, in Neutron, this part of the pyroute2 code
16:22:12 <slaweq> ralonsoh: I don't remember the fix exactly so I can't find it now
16:22:25 <ralonsoh> and we implemented a retry catch in the testcase class
16:22:30 <slaweq> but I'm pretty sure we merged Your fix for this
16:22:35 <ralonsoh> for timeouts
16:22:48 <ralonsoh> slaweq, I'll put this on my TODO list
16:22:54 <slaweq> ralonsoh: thx
16:23:11 <slaweq> #action ralonsoh to check functional tests timeouts https://bugs.launchpad.net/neutron/+bug/1854462
16:23:12 <openstack> Launchpad bug 1854462 in neutron "[Functional tests] Timeout exception in list_namespace_pids" [High,Confirmed]
16:23:23 <slaweq> ralonsoh: just a reminder to check it next week :)
16:23:26 <ralonsoh> sure
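(For context, the "retry catch in the testcase class" ralonsoh mentions would look roughly like the minimal Python sketch below; the exception class, delay values and the list_namespace_pids stand-in are hypothetical illustrations, not the actual Neutron functional-test code.)

    import functools
    import time

    class TimeoutException(Exception):
        """Hypothetical stand-in for the timeout raised by the privileged call."""

    def retry_on_timeout(retries=3, delay=1):
        """Re-run a flaky call a few times before letting the timeout bubble up."""
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                for attempt in range(1, retries + 1):
                    try:
                        return func(*args, **kwargs)
                    except TimeoutException:
                        if attempt == retries:
                            raise
                        time.sleep(delay)
            return wrapper
        return decorator

    @retry_on_timeout(retries=3, delay=1)
    def list_namespace_pids(namespace):
        # Placeholder for the real pyroute2-based privileged call that
        # occasionally times out in the functional job.
        return []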
16:23:52 <slaweq> for fullstack tests I noticed one failed test https://0050cb9fd8118437e3e0-3c2a18acb5109e625907972e3aa6a592.ssl.cf5.rackcdn.com/696600/1/check/neutron-fullstack/4966bce/testr_results.html.gz
16:24:09 <slaweq> but when I checked the logs from it, it seems that there was a problem with rabbitmq during this test
16:24:20 <slaweq> all agents were dead in neutron db
16:24:35 <slaweq> so maybe it was some host slowdown or something like that
16:24:54 <slaweq> I will simply check whether it happens again or not
16:24:57 <ralonsoh> did you open a bug for this one?
16:25:05 <ralonsoh> not necessary
16:25:08 <slaweq> ralonsoh: no
16:25:10 <ralonsoh> ok
16:25:21 <slaweq> I found it today and wanted to take a look for a few days first
16:25:29 <slaweq> to check if it happens again
16:25:57 <slaweq> and that's all related to functional/fullstack tests from my side
16:26:04 <slaweq> anything else You want to add/ask?
16:27:03 <slaweq> ok, if not, let's move on
16:27:07 <njohnston> go ahead
16:27:12 <slaweq> #topic Tempest/Scenario
16:27:37 <slaweq> here, after merging my fix for resize/shelve failure I think we are quite good now
16:27:45 <slaweq> but we have problem with grenade jobs
16:27:58 <slaweq> those jobs are failing quite often recently
16:28:32 <slaweq> so first of all, as we talked some time ago, I proposed to remove grenade-py3 from our gate: https://review.opendev.org/#/c/695172/
16:28:43 <slaweq> please review this patch if You have few minutes
16:29:03 <slaweq> fewer grenade jobs, smaller chance to hit their failures :)
16:29:11 <ralonsoh> hahahaha
16:29:16 <slaweq> :)
16:29:28 <njohnston> +100
16:29:37 <slaweq> and then the second part is worse, as in the multinode grenade jobs we are hitting some issue quite often
16:29:46 <slaweq> examples of such failures are e.g.:
16:29:47 <slaweq> https://storage.gra1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_ad0/696592/3/check/neutron-grenade-multinode/ad0df97/logs/grenade.sh.txt.gz
16:29:49 <slaweq> https://6e84b50c364d7e277563-65c8cd20428a10135cd2762abf51d9a7.ssl.cf2.rackcdn.com/697035/1/check/grenade-py3/78b1764/logs/grenade.sh.txt.gz
16:29:51 <slaweq> https://819efd42b5c79a55763b-90a63ad77a0414e858bcf634436e4dc8.ssl.cf5.rackcdn.com/697035/1/check/neutron-grenade-multinode/4d68281/logs/testr_results.html.gz
16:29:53 <slaweq> https://storage.bhs1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_725/696103/8/check/neutron-grenade-dvr-multinode/7255438/logs/grenade.sh.txt.gz
16:30:01 <slaweq> in each of those cases it seems to me like some error on nova's side
16:30:12 <slaweq> but I didn't dig a lot into logs so I'm not 100% sure
16:31:20 <slaweq> so I will try to dig into those failures a bit more and try to find a solution for it or at least report a nova bug :)
16:31:34 <slaweq> #action slaweq to check reason of grenade jobs failures
16:32:05 <ralonsoh> thanks!
16:32:34 <slaweq> from other things related to scenario jobs I have 2 patches ready for review:
16:32:55 <slaweq> https://review.opendev.org/#/c/694049/ - this one switches queens jobs to run on tagged version of tempest plugin
16:33:19 <slaweq> and removes those jobs from check and gate queue
16:33:36 <slaweq> and second one:
16:33:38 <slaweq> https://review.opendev.org/#/c/695013/
16:33:53 <slaweq> this one switches to use py3 on all nodes in multinode jobs
16:34:08 <slaweq> so please review those patches if You have some time :)
16:34:12 <ralonsoh> +2 to both
16:34:25 <slaweq> ralonsoh: thx
16:34:25 <bcafarel> +1 to both ;)
16:34:32 <slaweq> bcafarel: thx :)
16:34:53 <slaweq> and that's all what I have for today
16:35:00 <njohnston> +2+W x 2
16:35:06 <slaweq> thx njohnston :)
16:35:10 <slaweq> that was fast
16:35:15 <slaweq> team++
16:35:19 <njohnston> I have one bug to talk about
16:35:25 <slaweq> njohnston: go on
16:35:32 <njohnston> "py36 unit test cases fails" https://bugs.launchpad.net/neutron/+bug/1854051
16:35:32 <openstack> Launchpad bug 1854051 in neutron "py36 unit test cases fails" [Critical,New]
16:35:37 <njohnston> from last week as bug deputy
16:36:01 <njohnston> I have not seen that in the gate or personally but I wanted to see if anyone had any experience with this sort of thing
16:36:29 <slaweq> not me, I didn't notice that bug in the gate
16:36:53 <ralonsoh> maybe we can block this specific "typing" version
16:37:00 <slaweq> and based on last comment from liuyulong it seems that he is using rpm to install deps
16:37:08 <njohnston> yeah
16:37:21 <slaweq> so maybe we are installing some other version of typing from pypi
16:37:26 <slaweq> and that's why we are fine?
16:38:25 <njohnston> I think that is possible
16:38:56 <njohnston> anyhow, just wanted to raise it here and see if anyone had seen it.  thanks!
16:39:26 <slaweq> and also typing package is not in neutron requirements
16:39:46 <slaweq> so I'm not sure how this may cause problem in neutron
16:40:28 <ralonsoh> this is part of the standard library
16:41:05 <slaweq> ok, but then it shouldn't cause any problems for us, right?
16:41:13 <ralonsoh> right
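(A quick way to verify that on a given environment, as a hypothetical throwaway check rather than part of any patch discussed here, is to ask the interpreter where "typing" is resolved from:)

    import sysconfig
    import typing

    # Compare the module's location with the interpreter's stdlib directory;
    # if it lives elsewhere (e.g. site-packages), a separately installed
    # "typing" package is shadowing the standard library one.
    stdlib_dir = sysconfig.get_paths()["stdlib"]
    print(typing.__file__)
    print("resolved from stdlib:", typing.__file__.startswith(stdlib_dir))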
16:41:33 <slaweq> tbh I would close this bug for now as incomplete, as it doesn't happen in the gate
16:42:35 <njohnston> done!
16:42:43 <slaweq> njohnston: thx :)
16:42:55 <slaweq> ok, anything else You want to discuss today?
16:43:05 <slaweq> if not, I will give You 15 minutes back :)
16:43:16 <bcafarel> I like option 2
16:43:21 <ralonsoh> hahaha
16:43:29 <slaweq> ok
16:43:32 <njohnston> +2
16:43:33 <slaweq> so thx for attending
16:43:35 <ralonsoh> bye!
16:43:37 <slaweq> and see You tomorrow :)
16:43:39 <slaweq> o/
16:43:39 <njohnston> o/
16:43:41 <slaweq> #endmeeting