16:00:03 <slaweq> #startmeeting neutron_ci
16:00:05 <openstack> Meeting started Tue Dec 3 16:00:03 2019 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:06 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:09 <openstack> The meeting name has been set to 'neutron_ci'
16:00:10 <slaweq> welcome (again) :)
16:00:19 <bcafarel> o/ (again too :) )
16:01:38 <ralonsoh> hi
16:01:55 <slaweq> njohnston liuyulong CI meeting, do You have time to attend?
16:02:30 <slaweq> ok, let's start
16:02:31 <slaweq> #topic Actions from previous meetings
16:02:38 <slaweq> njohnston to check failing NetworkMigrationFromHA in multinode dvr job
16:03:44 <slaweq> ok, I think njohnston is not here now
16:03:47 <slaweq> so let's move on
16:03:56 <slaweq> slaweq to continue investigating issue https://bugs.launchpad.net/neutron/+bug/1850557
16:03:56 <openstack> Launchpad bug 1850557 in neutron "DHCP connectivity after migration/resize not working" [High,Fix released] - Assigned to Slawek Kaplonski (slaweq)
16:04:04 <slaweq> The patch is already merged: https://review.opendev.org/696794
16:04:19 <slaweq> I hope we will be good now with those migration/shelve tests on multinode jobs
16:05:00 <slaweq> and next one:
16:05:01 <bcafarel> yep, I have seen this one in the stable backports queue
16:05:02 <slaweq> slaweq to move job definitions to zuul.d directory
16:05:38 <slaweq> bcafarel: yes, I proposed the backport as this issue is also valid for stable branches and the backport was easy
16:06:01 <slaweq> regarding the zuul job definitions, the patch is proposed: https://review.opendev.org/#/c/696286/
16:06:10 <slaweq> please review it if You have some time
16:06:37 <slaweq> any questions/comments on the actions from last week?
16:07:32 <bcafarel> just a big +1 on that zuul job definitions split :)
16:07:38 <ralonsoh> for sure
16:07:59 <slaweq> thx
16:08:19 <slaweq> I hope it will be easier to look for jobs' definitions now
16:08:30 <slaweq> ok, let's move on
16:08:35 <slaweq> #action njohnston to check failing NetworkMigrationFromHA in multinode dvr job
16:08:42 <slaweq> ^^ just a reminder for next week
16:08:51 <slaweq> and I think we can move on to the next topic now
16:08:59 <slaweq> #topic Actions from previous meetings
16:09:02 <slaweq> #undo
16:09:03 <openstack> Removing item from minutes: #topic Actions from previous meetings
16:09:09 <slaweq> #topic Stadium projects
16:09:21 <slaweq> tempest-plugins migration
16:09:23 <slaweq> Etherpad: https://etherpad.openstack.org/p/neutron_stadium_move_to_tempest_plugin_repo
16:09:52 <slaweq> we finally merged step 2 for neutron-dynamic-routing
16:09:56 <slaweq> thx njohnston for that
16:10:03 <slaweq> so the last project on this list is vpnaas
16:10:16 <slaweq> and mlavalle told me yesterday that the patches for that are ready for review
16:10:28 <slaweq> I wanted to review them today but didn't have time
16:10:43 <slaweq> Patches are here:
16:10:45 <slaweq> Step 1: https://review.openstack.org/#/c/649373
16:10:47 <slaweq> Step 2: https://review.opendev.org/#/c/695834
16:10:58 <slaweq> so please review them if You will have some time :)
16:11:16 <bcafarel> same, I started to review step 1 but got sidetracked
16:11:17 <slaweq> and then we will finally be good with this whole migration
16:11:35 <bcafarel> one question I had there, vpnaas should only be migrated from ussuri?
16:11:46 <bcafarel> or from train, to be "in sync" with the others
16:12:30 <slaweq> no, I think that ussuri is enough
16:12:44 <bcafarel> ok, off to -1 then :)
16:13:27 <slaweq> :)
16:14:21 <slaweq> thx bcafarel for taking a look into that
16:14:38 <slaweq> and the next topic related to stadium projects is:
16:14:40 <slaweq> Neutron Train - Drop py27 and standardize on zuul v3
16:14:42 <slaweq> Etherpad: https://etherpad.openstack.org/p/neutron-train-zuulv3-py27drop
16:14:50 <slaweq> but this was already discussed today on the team meeting
16:15:00 <slaweq> so I don't think we need to talk about it here too
16:15:25 <slaweq> do You have anything else related to stadium projects for today?
16:16:23 <slaweq> ok, I guess that this means "no" :)
16:16:27 <slaweq> so let's move on
16:16:29 <ralonsoh> no
16:16:33 <slaweq> #topic Grafana
16:16:35 <slaweq> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:18:04 <slaweq> We need to clean it up a bit from some recently removed jobs. But I want to do that after grenade-py3 is removed and after we add the ovn jobs to neutron CI too.
16:18:19 <slaweq> other than that I don't see anything really wrong in grafana
16:19:31 <slaweq> do You have anything related to grafana?
16:19:38 <ralonsoh> no
16:19:53 <bcafarel> me neither
16:20:19 <slaweq> so let's talk about some specific issues now
16:20:21 <slaweq> #topic fullstack/functional
16:20:39 <slaweq> regarding functional tests, I found a new (IMO) bug:
16:20:40 <njohnston> o/ sorry I am late
16:20:42 <slaweq> https://bugs.launchpad.net/neutron/+bug/1854462
16:20:42 <openstack> Launchpad bug 1854462 in neutron "[Functional tests] Timeout exception in list_namespace_pids" [High,Confirmed]
16:20:59 <slaweq> I know that we had something similar in the past but I was sure that ralonsoh fixed it already
16:21:09 <ralonsoh> slaweq, I need to check that
16:21:10 <slaweq> njohnston: o/ no problem :)
16:21:34 <bcafarel> that sounds really familiar indeed
16:21:35 <slaweq> thx ralonsoh - I saw it at least a couple of times this last week so I marked it as High for now
16:21:35 <ralonsoh> yes, the fix was merged, wasn't it?
16:22:11 <ralonsoh> I mean: we implemented, in Neutron, this part of the pyroute2 code
16:22:12 <slaweq> ralonsoh: I don't remember the fix exactly so I can't find it now
16:22:25 <ralonsoh> and we implemented a retry catch in the test case class
16:22:30 <slaweq> but I'm pretty sure we merged Your fix for this
16:22:35 <ralonsoh> for timeouts
16:22:48 <ralonsoh> slaweq, I'll put this on my TODO list
16:22:54 <slaweq> ralonsoh: thx
16:23:11 <slaweq> #action ralonsoh to check functional tests timeouts https://bugs.launchpad.net/neutron/+bug/1854462
16:23:12 <openstack> Launchpad bug 1854462 in neutron "[Functional tests] Timeout exception in list_namespace_pids" [High,Confirmed]
16:23:23 <slaweq> ralonsoh: just to remember to check it next week :)
16:23:26 <ralonsoh> sure
16:23:52 <slaweq> for fullstack tests I noticed one failed test: https://0050cb9fd8118437e3e0-3c2a18acb5109e625907972e3aa6a592.ssl.cf5.rackcdn.com/696600/1/check/neutron-fullstack/4966bce/testr_results.html.gz
16:24:09 <slaweq> but as I checked the logs from it, it seems that there was a problem with rabbitmq during this test
16:24:20 <slaweq> all agents were dead in the neutron db
16:24:35 <slaweq> so maybe it was some host slowdown or something like that
16:24:54 <slaweq> I will simply check whether it happens again or not
16:24:57 <ralonsoh> did you open a bug for this one?
16:25:05 <ralonsoh> not necessary
16:25:08 <slaweq> ralonsoh: no
16:25:10 <ralonsoh> ok
16:25:21 <slaweq> I found it today and wanted to take a look for a few days first
16:25:29 <slaweq> to check whether it happens again
16:25:57 <slaweq> and that's all related to functional/fullstack tests from my side
16:26:04 <slaweq> anything else You want to add/ask?
16:27:03 <slaweq> ok, if not, let's move on
16:27:07 <njohnston> go ahead
16:27:12 <slaweq> #topic Tempest/Scenario
16:27:37 <slaweq> here, after merging my fix for the resize/shelve failure, I think we are quite good now
16:27:45 <slaweq> but we have a problem with the grenade jobs
16:27:58 <slaweq> those jobs have been failing quite often recently
16:28:32 <slaweq> so first of all, as we discussed some time ago, I proposed to remove grenade-py3 from our gate: https://review.opendev.org/#/c/695172/
16:28:43 <slaweq> please review this patch if You have a few minutes
16:29:03 <slaweq> fewer grenade jobs, smaller chance to hit their failures :)
16:29:11 <ralonsoh> hahahaha
16:29:16 <slaweq> :)
16:29:28 <njohnston> +100
16:29:37 <slaweq> and then the second part is worse, as in the multinode grenade jobs we are hitting some issue quite often
16:29:46 <slaweq> examples of such failures are e.g.:
16:29:47 <slaweq> https://storage.gra1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_ad0/696592/3/check/neutron-grenade-multinode/ad0df97/logs/grenade.sh.txt.gz
16:29:49 <slaweq> https://6e84b50c364d7e277563-65c8cd20428a10135cd2762abf51d9a7.ssl.cf2.rackcdn.com/697035/1/check/grenade-py3/78b1764/logs/grenade.sh.txt.gz
16:29:51 <slaweq> https://819efd42b5c79a55763b-90a63ad77a0414e858bcf634436e4dc8.ssl.cf5.rackcdn.com/697035/1/check/neutron-grenade-multinode/4d68281/logs/testr_results.html.gz
16:29:53 <slaweq> https://storage.bhs1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_725/696103/8/check/neutron-grenade-dvr-multinode/7255438/logs/grenade.sh.txt.gz
16:30:01 <slaweq> in each of those cases it looks to me like some error on nova's side
16:30:12 <slaweq> but I didn't dig a lot into the logs so I'm not 100% sure
16:31:20 <slaweq> so I will try to dig into those failures a bit more and try to find a solution, or at least report a nova bug :)
16:31:34 <slaweq> #action slaweq to check reason of grenade jobs failures
16:32:05 <ralonsoh> thanks!
16:32:34 <slaweq> from other things related to scenario jobs, I have 2 patches ready for review:
16:32:55 <slaweq> https://review.opendev.org/#/c/694049/ - this one switches queens jobs to run on a tagged version of the tempest plugin
16:33:19 <slaweq> and removes those jobs from the check and gate queues
16:33:36 <slaweq> and the second one:
16:33:38 <slaweq> https://review.opendev.org/#/c/695013/
16:33:53 <slaweq> this one switches to using py3 on all nodes in multinode jobs
16:34:08 <slaweq> so please review those patches if You will have some time :)
16:34:12 <ralonsoh> +2 to both
16:34:25 <slaweq> ralonsoh: thx
16:34:25 <bcafarel> +1 to both ;)
16:34:32 <slaweq> bcafarel: thx :)
16:34:53 <slaweq> and that's all I have for today
16:35:00 <njohnston> +2+W x 2
16:35:06 <slaweq> thx njohnston :)
16:35:10 <slaweq> that was fast
16:35:15 <slaweq> team++
16:35:19 <njohnston> I have one bug to talk about
16:35:25 <slaweq> njohnston: go on
16:35:32 <njohnston> "py36 unit test cases fails" https://bugs.launchpad.net/neutron/+bug/1854051
16:35:32 <openstack> Launchpad bug 1854051 in neutron "py36 unit test cases fails" [Critical,New]
16:35:37 <njohnston> from last week as bug deputy
16:36:01 <njohnston> I have not seen that in the gate or personally, but I wanted to see if anyone had any experience with this sort of thing
16:36:29 <slaweq> not me, I didn't notice that bug in the gate
16:36:53 <ralonsoh> maybe we can block this specific "typing" version
16:37:00 <slaweq> and based on the last comment from liuyulong, it seems that he is using rpm to install deps
16:37:08 <njohnston> yeah
16:37:21 <slaweq> so maybe we are installing some other version of typing from pypi
16:37:26 <slaweq> and that's why we are fine?
16:38:25 <njohnston> I think that is possible
16:38:56 <njohnston> anyhow, just wanted to raise it here and see if anyone had seen it. thanks!
16:39:26 <slaweq> and also the typing package is not in neutron requirements
16:39:46 <slaweq> so I'm not sure how this may cause problems in neutron
16:40:28 <ralonsoh> this is part of the standard library
16:41:05 <slaweq> ok, but then it shouldn't cause any problems for us, right?
16:41:13 <ralonsoh> right
16:41:33 <slaweq> tbh I would close this bug for now as incomplete, as it doesn't happen in the gate
16:42:35 <njohnston> done!
16:42:43 <slaweq> njohnston: thx :)
16:42:55 <slaweq> ok, anything else You want to discuss today?
16:43:05 <slaweq> if not, I will give You 15 minutes back :)
16:43:16 <bcafarel> I like option 2
16:43:21 <ralonsoh> hahaha
16:43:29 <slaweq> ok
16:43:32 <njohnston> +2
16:43:33 <slaweq> so thx for attending
16:43:35 <ralonsoh> bye!
16:43:37 <slaweq> and see You tomorrow :)
16:43:39 <slaweq> o/
16:43:39 <njohnston> o/
16:43:41 <slaweq> #endmeeting
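A minimal sketch of the kind of check behind the "typing" discussion above (bug 1854051): whether a py36 environment imports the standard-library typing module or a separately installed copy (e.g. from rpm or pypi) that shadows it. The paths in the comments are illustrative only, not taken from the bug report.

```python
# Print where the imported "typing" module actually lives, to distinguish
# the Python 3 standard-library copy from a separately installed package.
import typing

print(typing.__file__)
# e.g. /usr/lib/python3.6/typing.py         -> standard library (expected)
# e.g. .../site-packages/typing.py          -> separately installed package
#                                              shadowing the stdlib module
```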