16:00:18 #startmeeting neutron_ci
16:00:19 Meeting started Tue Nov 27 16:00:18 2018 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:20 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:22 hi
16:00:23 The meeting name has been set to 'neutron_ci'
16:00:38 o/
16:01:58 let's wait a few minutes for the others
16:02:06 o/
16:02:12 I pinged them on the openstack-neutron channel
16:02:17 o/
16:03:48 sorry, I thought it was 1 hour from now
16:04:01 not used to winter time yet
16:04:07 :)
16:04:09 :)
16:04:14 ok, so let's start
16:04:21 #topic Actions from previous meetings
16:04:32 mlavalle to continue tracking not reachable FIP in trunk tests
16:04:38 yes
16:04:56 that entails merging https://review.openstack.org/#/c/618750
16:05:04 this was added only to not forget about it IIRC, as we first want to get my patch merged
16:05:09 (I just approved it)
16:05:09 right mlavalle :)
16:05:13 thx
16:05:24 and then looking at the effects for some days
16:05:29 so I will add this action for next week too to remember it, ok?
16:05:33 yes
16:05:37 #action mlavalle to continue tracking not reachable FIP in trunk tests
16:05:39 thx
16:05:44 that was quick :)
16:05:52 next one
16:05:54 slaweq to check which experimental jobs can be removed
16:05:55 I was actually going to ping you...
16:06:10 why?
16:06:36 do you have a pointer to a traceback of the failure that the patch^^ is supposed to fix
16:06:40 ?
16:06:52 sure
16:06:59 there are a lot of such issues recently
16:07:25 I want to see if I find it in the trunk test failure
16:07:34 e.g. http://logs.openstack.org/23/619923/2/check/neutron-tempest-dvr-ha-multinode-full/e356b9a/logs/subnode-2/screen-q-l3.txt.gz?level=ERROR
16:07:51 and it happens in different tests, not only in trunk
16:07:59 exactly, that is what I was looking for
16:08:00 it's a general problem with FIP connectivity
16:08:16 thanks
16:08:20 yw
16:08:44 ok, so going back to not needed experimental jobs
16:08:56 slaweq: sorry, i need to help a repair man here, just assign me some tasks :)
16:08:56 I did a patch to remove some of them: https://review.openstack.org/619719
16:09:10 haleyb: sure - that we can definitely do
16:09:12 we are all repair men here
16:09:16 haleyb: ^^^^
16:09:19 #action haleyb takes all this week :D
16:09:27 lol
16:10:18 mlavalle: please take a look at this patch https://review.openstack.org/619719 - it already has a +2 from haleyb
16:10:38 slaweq: added to the pile
16:10:42 mlavalle: thx
16:10:50 ok, moving on
16:10:58 next one was: slaweq to start migrating neutron CI jobs to zuul v3 syntax
16:11:10 I opened a bug for that https://bugs.launchpad.net/neutron/+bug/1804844
16:11:10 Launchpad bug 1804844 in neutron "CI jobs definitions should be migrated to Zuul v3 syntax" [Low,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:11:22 And I pushed a first patch for functional tests but it's WIP now: https://review.openstack.org/#/c/619742/
16:11:45 thanks for working on that
16:11:54 so if someone wants to work on the migration for some job, please feel free to do it and push a patch related to this bug
16:12:19 it is in fact a lot of patches to do but I thought that one bug to track them all will be enough
16:12:44 good idea
16:13:47 ok, next one
16:13:48 njohnston to switch neutron to use integrated-gate-py35 with grenade-py3 job instead of our neutron-grenade job
16:14:45 njohnston: any update on this one?
16:14:59 So the grenade-py3 job is already in the check and gate queues. I am watching it for a few runs
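For context on the Zuul v3 migration tracked in bug 1804844 above: a v3-native job is defined in the project's own .zuul.yaml rather than in project-config. The sketch below shows the general shape of such a definition; the parent, playbook paths, and timeout are illustrative assumptions, not the contents of patch 619742.

```yaml
# Minimal sketch of a Zuul v3-native job definition (hypothetical values).
- job:
    name: neutron-functional
    parent: devstack-minimal          # assumed parent, not from the patch
    description: Run neutron functional tests.
    timeout: 7800
    required-projects:
      - openstack/neutron
    run: playbooks/functional/run.yaml        # hypothetical playbook paths
    post-run: playbooks/functional/post.yaml
    irrelevant-files:
      - ^.*\.rst$
      - ^doc/.*$
```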
16:15:35 Just for due diligence; then I'll push up a change to disable neutron-grenade.
16:16:27 where it is in gate alreade?
16:16:30 *already
16:16:37 I don't see it
16:16:42 it is inherited from one of the templates we include
16:17:17 but if you look at any neutron job in zuul.openstack.org you'll see grenade-py3
16:18:30 ahh, ok
16:18:33 I see it now
16:18:55 so we only need to remove the neutron-grenade job now and we will be done with this, right?
16:19:00 yep!
16:19:07 good
16:19:12 will You do it this week?
16:19:40 I should have the change up within the hour
16:19:45 #action njohnston to remove neutron-grenade job from neutron's CI queues
16:19:47 thx njohnston
16:19:51 just waiting for the job I am watching to finish
16:20:00 ok
16:20:07 so lets move on to the next onw
16:20:09 *one
16:20:12 slaweq to check bug 1798475
16:20:13 bug 1798475 in neutron "Fullstack test test_ha_router_restart_agents_no_packet_lost failing" [High,Confirmed] https://launchpad.net/bugs/1798475
16:20:23 I sent a patch to store all journal logs in fullstack results: https://review.openstack.org/#/c/619935/
16:20:34 I hope this will help to debug this issue as we will be able to see what keepalived is doing then.
16:20:45 I'll review it today
16:21:27 in the future, when jobs are migrated to the zuulv3 format, I think this can be added as a role and added to all jobs, as it can be helpful with some keepalived or dnsmasq logs
16:21:28 it's a great idea regardless
16:21:45 yeap
16:21:45 but for now I want it only in the fullstack job as a first step
16:22:41 #action slaweq to continue debugging bug 1798475 when journal logs are available in fullstack tests
16:22:43 bug 1798475 in neutron "Fullstack test test_ha_router_restart_agents_no_packet_lost failing" [High,Confirmed] https://launchpad.net/bugs/1798475
16:22:54 ok, lets move on
16:23:02 slaweq to check why db_migration functional tests don't have logs
16:23:09 patch https://review.openstack.org/619266
16:23:20 it's merged already
16:23:49 so now we should have logs from all functional tests in the job results
16:24:04 next one was:
16:24:07 njohnston to remove neutron-fullstack-python36 from grafana dashboard
16:24:46 One side note on the removal of the neutron-grenade job: that job is actually in the check and gate queues for the grenade project, so I'll push a change in grenade to remove those first, and use a Depends-On to make sure that goes through before the neutron change
16:25:21 Regarding neutron-fullstack-python36, I remember adding it, but when I went to project-config I could find no reference to it. So that is a no-op.
16:26:08 ahh, that's good
16:26:10 so it's done :)
16:26:15 thx njohnston for checking it
16:26:32 ok, so that was all the actions for today
16:26:39 fwiw
16:27:07 anything else to add or can we move on?
16:27:23 nothing from me
16:27:39 ok, so next topic then
16:27:40 #topic Python 3
16:27:51 njohnston: bcafarel any updates from You?
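On the journal-log patch discussed above: slaweq's suggestion to later turn it into a reusable Zuul v3 role would boil down to a small Ansible task list invoked from a post-run playbook. A minimal sketch, assuming a hypothetical logs-directory convention rather than whatever patch 619935 actually does:

```yaml
# Hypothetical post-run playbook: copies the systemd journal into the
# job results so keepalived/dnsmasq output can be inspected afterwards.
- hosts: all
  tasks:
    - name: Save the systemd journal into the job logs
      become: yes
      shell: journalctl --no-pager > {{ ansible_user_dir }}/logs/journal.txt
```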
16:28:29 from next week not much I think
16:28:31 *previous
16:28:46 slaweq: except someone digging into functional tests for py3
16:29:15 ok, about these functional tests, it is a real problem
16:29:31 nothing from me because of PTO
16:29:55 I pushed a DNM patch today to test those tests with less output: https://review.openstack.org/#/c/620271/
16:30:01 and indeed it was better
16:30:05 but not perfect
16:30:38 I also talked with mtreinish about it and he told me that it's a known issue with stestr and too much output from tests
16:30:50 :/
16:30:56 :-[
16:31:20 so based on his comments I think that the only workaround for this is to somehow make our tests produce less on stdout/stderr
16:31:50 also in my DNM patch I had 3 tests failing: http://logs.openstack.org/71/620271/2/check/neutron-functional/a7fd8ea/logs/testr_results.html.gz
16:32:03 it looks to me like it's related to the issue with SIGHUP
16:32:20 so I'm not sure if we shouldn't skip/mark those tests as unstable for now
16:33:35 I will try this DNM patch once again but with those 3 tests marked as unstable to check how it goes then
16:33:42 and we will see then
16:34:24 if anyone has some idea how to fix/work around this problem, that would be great
16:34:44 the patch to switch functional tests to py3 is here: https://review.openstack.org/#/c/577383/
16:34:46 sounds good, we do have https://bugs.launchpad.net/neutron/+bug/1780139 open for the SIGHUP issue
16:34:47 Launchpad bug 1780139 in neutron "Sending SIGHUP to neutron-server process causes it to hang" [Undecided,Triaged] - Assigned to Bernard Cafarelli (bcafarel)
16:36:32 so that's all from me about py3
16:36:51 njohnston: do You know how many other jobs we still should switch to py3?
16:37:34 slaweq: maybe worth going through https://bugs.launchpad.net/cinder/+bug/1728640 and seeing if we can grab some ideas, like this "Make test logging setup fixture disable future setup"
16:37:35 Launchpad bug 1728640 in Cinder "py35 unit test subunit.parser failures" [Critical,Fix released] - Assigned to Sean McGinnis (sean-mcginnis)
16:38:11 yes, that is a very similar issue to what we have with functional tests now :)
16:38:26 I will check that this week
16:38:39 I believe the multinode grenade jobs still need to be switched, at a minimum; grenade-py3 does not relieve us of those sadly
16:38:45 I'll have to check the etherpad
16:38:46 #action slaweq to continue fixing functional-py3 tests
16:39:02 ok, thx njohnston
16:39:06 #action njohnston to research py3 conversion for neutron grenade multinode jobs
16:39:15 I will also check neutron-tempest-plugin jobs then
16:39:31 #action slaweq to convert neutron-tempest-plugin jobs to py3
16:40:17 ok, can we go on to the next topic then?
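On the py3 conversion action items just recorded: devstack-based Zuul v3 jobs can typically be switched to Python 3 by setting devstack's USE_PYTHON3 flag in the job variables. A minimal sketch with hypothetical job names, not taken from any of the patches mentioned:

```yaml
# Hypothetical py3 variant of a devstack-based job; only the devstack
# localrc flag changes, everything else is inherited from the parent.
- job:
    name: neutron-tempest-plugin-scenario-py3   # hypothetical name
    parent: neutron-tempest-plugin-scenario     # hypothetical parent
    vars:
      devstack_localrc:
        USE_PYTHON3: true
```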
16:40:24 I think so
16:40:26 go ahead
16:40:30 #topic Grafana
16:40:37 #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:41:55 the gate queue wasn't busy last week as there were not too many people with +2 power available :)
16:42:20 yeap
16:42:38 we have neutron-tempest-dvr-ha-multinode-full and neutron-tempest-plugin-dvr-multinode-scenario failing at 100% again
16:43:10 but from what I was checking it's very often this issue with the snat namespace, which should be fixed by https://review.openstack.org/#/c/618750/
16:43:19 so we should be better next week I hope
16:43:59 among other things, I spotted again a couple of issues with cinder backups, like:
16:44:01 http://logs.openstack.org/64/617364/19/check/tempest-slow/18519dc/testr_results.html.gz
16:44:03 yeah, let's track the effect of that
16:44:03 http://logs.openstack.org/87/609587/11/check/tempest-multinode-full/2a5c5a1/testr_results.html.gz
16:44:21 I will report this as a cinder bug today
16:44:56 slaweq: and I know I have an email from you with cinder failures
16:45:07 I will talk to Jay and Sean this week
16:45:38 among other things, we still have from time to time failures in functional tests (db-migrations timeout) and fullstack tests (this issue with keepalived mostly) and I'm trying to find out what is going on with both of them
16:45:49 thx mlavalle :)
16:46:07 one more thing related to grafana
16:46:09 we should add 2 new jobs to grafana:
16:46:11 networking-ovn-tempest-dsvm-ovs-release
16:46:13 tempest-slow
16:46:17 any volunteer for that? :)
16:46:40 sure
16:46:45 thx njohnston :)
16:47:00 #action njohnston add tempest-slow and networking-ovn-tempest-dsvm-ovs-release to grafana
16:47:23 ok, lets move on then
16:47:29 #topic Tempest/Scenario
16:47:52 today I found out that we have the job neutron-tempest-dvr in our queue
16:48:00 and it looks like it is a single node dvr job
16:48:13 is it intentional? do we want to keep it like that?
16:48:33 it looks the same as the neutron-tempest-dvr-ha-multinode-full job in fact
16:48:47 ISTR some discussion about this a long time ago, like in the newton timeframe
16:48:48 the only difference is that this multinode job is non-voting
16:49:05 I think the goal was for the multinode job to end up being the voting one
16:49:21 yes, I think I have the same recollection
16:49:32 we can discuss in the L3 meeting
16:49:36 +1
16:49:42 njohnston: it is not possible to have the multinode job voting now ;)
16:50:00 ok, mlavalle please then add this to the L3 meeting agenda if You can
16:50:05 is the multinode job stable enough?
16:50:06 yes
16:50:15 hongbin: not even close
16:50:32 #action mlavalle to discuss neutron-tempest-dvr job in L3 meeting
16:50:43 hongbin: it depends what You mean by stable
16:51:00 it's very stable now as it is at 100% failures all the time :P
16:51:26 slaweq: if it doesn't block the merging too much after turning voting, then it is fine
16:52:07 hongbin: it will block everything currently, but I agree that we should focus on stabilizing it
16:52:21 and we have been working on it for some time
16:52:36 ack
16:53:03 ok, let's move on then
16:53:05 #topic Periodic
16:53:23 I just want to mention that we still have neutron-tempest-postgres-full failing all the time
16:53:29 but it's a nova issue
16:53:35 bug reported: https://bugs.launchpad.net/nova/+bug/1804271
16:53:38 Launchpad bug 1804271 in OpenStack Compute (nova) "nova-api is broken in postgresql jobs" [High,In progress] - Assigned to Matt Riedemann (mriedem)
16:53:41 fix in progress: https://review.openstack.org/#/c/619061/
16:53:53 so we should be good when this is merged
16:54:05 slaweq: here is a tip,
16:54:17 show up in the nova channel and ask that another core look at that already +2ed fix for the postgres job
16:54:33 i would, but i've already spent some review request karma today
16:54:41 mriedem: ok, I will :)
16:54:44 thx
16:55:31 last topic then
16:55:33 #topic Open discussion
16:55:46 anyone wants to discuss anything?
16:56:00 i have one
16:56:08 go on hongbin
16:56:27 i don't like the long list of extensions in the zuul job, so i proposed a patch: https://review.openstack.org/#/c/619642/
16:56:46 i want to know if this is what you guys prefer to do?
16:57:02 or it is not a good idea
16:57:25 yes, IMO it is easier to read in a diff
16:57:30 it certainly fits the screen better
16:58:04 Would it be possible to use reusable snippets like we do with *tempest-irrelevant-files now?
16:58:14 yes, it will possibly fix the frequent merge conflicts between patches
16:58:38 hongbin: njohnston: great ideas
16:58:59 njohnston: i am not sure, because the list of extensions looks different between jobs
16:59:08 hongbin: not all jobs
16:59:23 You can define a snippet "per branch" and reuse it if necessary
16:59:33 at least for the master branch it should be fine
16:59:52 yes, we can possibly consolidate the stable branch list
16:59:58 i will look into that
17:00:16 ok, we have to finish now
17:00:20 thx for attending
17:00:23 #endmeeting
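A closing note on the reusable-snippets idea from the open discussion: Zuul configuration is plain YAML, so a list can be defined once under a YAML anchor and referenced from other jobs in the same file, which is the mechanism behind the *tempest-irrelevant-files pattern njohnston mentioned. A sketch with illustrative job and variable names, not taken from patch 619642:

```yaml
# Define the extension list once via a YAML anchor...
- job:
    name: neutron-tempest-plugin-api          # illustrative names
    vars:
      network_api_extensions: &api_extensions
        - address-scope
        - agent
        - allowed-address-pairs

# ...and reuse it in other jobs in the same file via the alias.
- job:
    name: neutron-tempest-plugin-scenario
    vars:
      network_api_extensions: *api_extensions
```

One caveat: anchors only resolve within a single YAML file, so the "per branch" snippets mentioned above would each need their own definition in the branch's config.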