16:00:18 <slaweq> #startmeeting neutron_ci
16:00:19 <openstack> Meeting started Tue Nov 27 16:00:18 2018 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:20 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:22 <slaweq> hi
16:00:23 <openstack> The meeting name has been set to 'neutron_ci'
16:00:38 <bcafarel> o/
16:01:58 <slaweq> let's wait few minutes for the others
16:02:06 <hongbin> o/
16:02:12 <slaweq> I pinged them on openstack-neutron channel
16:02:17 <njohnston> o/
16:03:48 <mlavalle> sorry, I thought it was 1 hour from now
16:04:01 <mlavalle> not used to winter time yet
16:04:07 <slaweq> :)
16:04:09 <bcafarel> :)
16:04:14 <slaweq> ok, so lets start
16:04:21 <slaweq> #topic Actions from previous meetings
16:04:32 <slaweq> mlavalle to continue tracking not reachable FIP in trunk tests
16:04:38 <mlavalle> yes
16:04:56 <mlavalle> that entails merging https://review.openstack.org/#/c/618750
16:05:04 <slaweq> this was added only to not forget about it IIRC, as we first want to get my patch merged
16:05:09 <mlavalle> (I just approved it)
16:05:09 <slaweq> right mlavalle :)
16:05:13 <slaweq> thx
16:05:24 <mlavalle> and then looking at the effects for some days
16:05:29 <slaweq> so I will add this action for next week too to remember it, ok?
16:05:33 <mlavalle> yes
16:05:37 <slaweq> #action mlavalle to continue tracking not reachable FIP in trunk tests
16:05:39 <slaweq> thx
16:05:44 <slaweq> that was quick :)
16:05:52 <slaweq> next one
16:05:54 <slaweq> slaweq to check which experimental jobs can be removed
16:05:55 <mlavalle> I was actually going to ping you....
16:06:10 <slaweq> why?
16:06:36 <mlavalle> do you have a pointer to a traceback of the failure that the patch^^ is supposed to fix
16:06:40 <mlavalle> ?
16:06:52 <slaweq> sure
16:06:59 <slaweq> there is a lot of such issues recently
16:07:25 <mlavalle> I want to see if I find it in the trunk test failure
16:07:34 <slaweq> e.g. http://logs.openstack.org/23/619923/2/check/neutron-tempest-dvr-ha-multinode-full/e356b9a/logs/subnode-2/screen-q-l3.txt.gz?level=ERROR
16:07:51 <slaweq> and it happens in different tests, not only in trunk
16:07:59 <mlavalle> exactly, that is what I was looking for
16:08:00 <slaweq> it's general problem with FIP connectivity
16:08:16 <mlavalle> thanks
16:08:20 <slaweq> yw
16:08:44 <slaweq> ok, so going back to not needed experimental jobs
16:08:56 <haleyb> slaweq: sorry, i need to help a repair man here, just assign me some tasks :)
16:08:56 <slaweq> I did patch to remove some of them: https://review.openstack.org/619719
16:09:10 <slaweq> haleyb: sure - that we can do definitely
16:09:12 <mlavalle> we are all repair men here
16:09:16 <mlavalle> haleyb: ^^^^
16:09:19 <slaweq> #action haleyb takes all this week :D
16:09:27 <njohnston> lol
16:10:18 <slaweq> mlavalle: please take a look at this patch https://review.openstack.org/619719 - it has already +2 from haleyb
16:10:38 <mlavalle> slaweq: added to the pile
16:10:42 <slaweq> mlavalle: thx
16:10:50 <slaweq> ok, moving on
16:10:58 <slaweq> next one was: slaweq to start migrating neutron CI jobs to zuul v3 syntax
16:11:10 <slaweq> I opened bug for that https://bugs.launchpad.net/neutron/+bug/1804844
16:11:10 <openstack> Launchpad bug 1804844 in neutron "CI jobs definitions should be migrated to Zuul v3 syntax" [Low,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:11:22 <slaweq> And I pushed first patch for functional tests but it's WIP now: https://review.openstack.org/#/c/619742/
16:11:45 <njohnston> thanks for working on that
16:11:54 <slaweq> so if someone wants to work on migration for some job, please feel free to do it and push patch related to this bug
16:12:19 <slaweq> it is in fact a lot of patches to do but I thought that one bug to track them all will be enough
16:12:44 <mlavalle> good idea
16:13:47 <slaweq> ok, next one
16:13:48 <slaweq> njohnston to switch neutron to use integrated-gate-py35 with grenade-py3 job instead of our neutron-grenade job
16:14:45 <slaweq> njohnston: any update on this one?
16:14:59 <njohnston> So the grenade-py3 job is already in check and gate queue. I am watching it for a few runs
16:15:35 <njohnston> Just for due diligence, then I'll push up a change to disable neutron-grenade.
16:16:27 <slaweq> where it is in gate alreade?
16:16:30 <slaweq> *already
16:16:37 <slaweq> I don't see it
16:16:42 <njohnston> it is inherited from one of the templates we include
16:17:17 <njohnston> but if you look at any neutron job in zuul.openstack.org you'll see grenade-py3
16:18:30 <slaweq> ahh, ok
16:18:33 <slaweq> I see it now
16:18:55 <slaweq> so we only need to remove neutron-grenade job now and we will be done with this, right?
16:19:00 <njohnston> yep!
16:19:07 <slaweq> good
16:19:12 <slaweq> will You do it this week?
16:19:40 <njohnston> I should have the change up within the hour
16:19:45 <slaweq> #action njohnston to remove neutron-grenade job from neutron's CI queues
16:19:47 <slaweq> thx njohnston
16:19:51 <njohnston> just waiting for the job I am watching to finish
16:20:00 <slaweq> ok
16:20:07 <slaweq> so lets move on to the next onw
16:20:09 <slaweq> *one
16:20:12 <slaweq> slaweq to check bug 1798475
16:20:13 <openstack> bug 1798475 in neutron "Fullstack test test_ha_router_restart_agents_no_packet_lost failing" [High,Confirmed] https://launchpad.net/bugs/1798475
16:20:23 <slaweq> I sent patch to store all journal logs in fullstack results: https://review.openstack.org/#/c/619935/
16:20:34 <slaweq> I hope this will help to debug this issue as we will be able to see what is keepalived doing then.
16:20:45 <mlavalle> I'll review it today
16:21:27 <slaweq> in the future when jobs will be migrated to zuulv3 format I think this can be added as role and added to all jobs as it can be helpful with some keepalived or dnsmasq logs
16:21:28 <njohnston> it's a great idea regardless
16:21:45 <mlavalle> yeap
16:21:45 <slaweq> but for now I want it only in fullstack job as first step
16:22:41 <slaweq> #action slaweq to continue debugging bug 1798475 when journal log will be available in fullstack tests
16:22:43 <openstack> bug 1798475 in neutron "Fullstack test test_ha_router_restart_agents_no_packet_lost failing" [High,Confirmed] https://launchpad.net/bugs/1798475
16:22:54 <slaweq> ok, lets move on
16:23:02 <slaweq> slaweq to check why db_migration functional tests don't have logs
16:23:09 <slaweq> patch https://review.openstack.org/619266
16:23:20 <slaweq> it's merged already
16:23:49 <slaweq> so now we should have logs from all functional tests in job results
16:24:04 <slaweq> next one was:
16:24:07 <slaweq> njohnston to remove neutron-fullstack-python36 from grafana dashboard
16:24:46 <njohnston> One side note on the removal of the neutron-grenade job; that job is actually in the check and gate queue for the grenade project so I'll push a change in grenade to remove those first, and use a Depends-On to make sure that goes through before the neutron change
16:25:21 <njohnston> Regarding neutron-fullstack-python36, I remember adding it, but when I went to project-config I could find no reference to it. So that is a no-op.
16:26:08 <slaweq> ahh, that's good
16:26:10 <slaweq> so it's done :)
16:26:15 <slaweq> thx njohnston for checking it
16:26:32 <slaweq> ok, so that was all actions for today
16:26:39 <mlavalle> fwiw
16:27:07 <slaweq> anything else to add or can we move on?
16:27:23 <mlavalle> nothing from me
16:27:39 <slaweq> ok, so next topic then
16:27:40 <slaweq> #topic Python 3
16:27:51 <slaweq> njohnston: bcafarel any updates from You?
16:28:29 <bcafarel> from next week not much I think
16:28:31 <bcafarel> *previous
16:28:46 <bcafarel> slaweq: except someone digging into functional tests for py3
16:29:15 <slaweq> ok, about this functional tests it is real problem
16:29:31 <njohnston> nothing from me because of PTO
16:29:55 <slaweq> I pushed today some DNM patch to test those tests with less output: https://review.openstack.org/#/c/620271/
16:30:01 <slaweq> and indeed it was better
16:30:05 <slaweq> but not perfect
16:30:38 <slaweq> I also talked with mtreinish about it and he told me that it's known issue with stestr and too much output from tests
16:30:50 <bcafarel> :/
16:30:56 <njohnston> :-[
16:31:20 <slaweq> so based on his comments I think that only workaround for this is to make somehow that our tests will produce less on stdout/stderr
16:31:50 <slaweq> also in my DNM patch I had 3 tests failing: http://logs.openstack.org/71/620271/2/check/neutron-functional/a7fd8ea/logs/testr_results.html.gz
16:32:03 <slaweq> it looks for me that it's related to issue with SIGHUP
16:32:20 <slaweq> so I'm not sure if we shouldn't skip/mark as unstable those tests for now
16:33:35 <slaweq> I will try once again this DNM patch but with those 3 tests marked as unstable to check how it will be then
16:33:42 <slaweq> and we will see then
16:34:24 <slaweq> if anyone has some idea how to fix/workaround this problem, that would be great
16:34:44 <slaweq> patch to switch functional tests to py3 is here: https://review.openstack.org/#/c/577383/
16:34:46 <bcafarel> sounds good, we do have https://bugs.launchpad.net/neutron/+bug/1780139 open for the SIGHUP issue
16:34:47 <openstack> Launchpad bug 1780139 in neutron "Sending SIGHUP to neutron-server process causes it to hang" [Undecided,Triaged] - Assigned to Bernard Cafarelli (bcafarel)
16:36:32 <slaweq> so thats all from me about py3
16:36:51 <slaweq> njohnston: do You know how many other jobs we still should switch to py3?
16:37:34 <bcafarel> slaweq: maybe worth going through https://bugs.launchpad.net/cinder/+bug/1728640 and see if we can grab some ideas, like this "Make test logging setup fixture disable future setup"
16:37:35 <openstack> Launchpad bug 1728640 in Cinder "py35 unit test subunit.parser failures" [Critical,Fix released] - Assigned to Sean McGinnis (sean-mcginnis)
16:38:11 <slaweq> yes, that is very similar issue to what we have with functional tests now :)
16:38:26 <slaweq> I will check that this week
16:38:39 <njohnston> I believe the multinode grenade jobs still need to be switched, at a minimum; grenade-py3 does not relieve us of those sadly
16:38:45 <njohnston> I'll have to check the etherpad
16:38:46 <slaweq> #action slaweq to continue fixing functional-py3 tests
16:39:02 <slaweq> ok, thx njohnston
16:39:06 <njohnston> #action njohnston to research py3 conversion for neutron grenade multinode jobs
16:39:15 <slaweq> I will also check neutron-tempest-plugin jobs then
16:39:31 <slaweq> #action slaweq to convert neutron-tempest-plugin jobs to py3
16:40:17 <slaweq> ok, can we go on to the next topic then?
16:40:24 <mlavalle> I think so
16:40:26 <njohnston> go ahead
16:40:30 <slaweq> #topic Grafana
16:40:37 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:41:55 <slaweq> gate queue wasn't busy last week as there was not too many people with +2 power available :)
16:42:20 <mlavalle> yeap
16:42:38 <slaweq> We have neutron-tempest-dvr-ha-multinode-full and neutron-tempest-plugin-dvr-multinode-scenario failing on 100% again
16:43:10 <slaweq> but from what I was checking it's very often this issue with snat namespace, which should be fixed by https://review.openstack.org/#/c/618750/
16:43:19 <slaweq> so we should be better next week I hope
16:43:59 <slaweq> From other things, I spotted again couple of issues with cinder backups, like:
16:44:01 <slaweq> http://logs.openstack.org/64/617364/19/check/tempest-slow/18519dc/testr_results.html.gz
16:44:03 <mlavalle> yeah, let's track the effect of that
16:44:03 <slaweq> http://logs.openstack.org/87/609587/11/check/tempest-multinode-full/2a5c5a1/testr_results.html.gz
16:44:21 <slaweq> I will report this as a cinder bug today
16:44:56 <mlavalle> slaweq: and I know I have an email from you with cinder failures
16:45:07 <mlavalle> I will talk to Jay and Sean this week
16:45:38 <slaweq> from other things, we still have from time to time failures in functional tests (db-migrations timeout) and fullstack tests (this issue with keepalived mostly) and I'm trying to find out what is going on with both of them
16:45:49 <slaweq> thx mlavalle :)
16:46:07 <slaweq> one more thing related to grafana
16:46:09 <slaweq> We should add to grafana 2 new jobs:
16:46:11 <slaweq> networking-ovn-tempest-dsvm-ovs-release
16:46:13 <slaweq> tempest-slow
16:46:17 <slaweq> any volunteer for that? :)
16:46:40 <njohnston> sure
16:46:45 <slaweq> thx njohnston :)
16:47:00 <njohnston> #action njohnston add tempest-slow and networking-ovn-tempest-dsvm-ovs-release to grafana
16:47:23 <slaweq> ok, lets move on then
16:47:29 <slaweq> #topic Tempest/Scenario
16:47:52 <slaweq> I today found out that we have job neutron-tempest-dvr in our queue
16:48:00 <slaweq> and it looks that it is single node dvr job
16:48:13 <slaweq> is it intentional? do we want to keep it like that?
16:48:33 <slaweq> It looks the same as neutron-tempest-dvr-ha-multinode-full job in fact
16:48:47 <njohnston> ISTR some discussion about this a long time ago, like in the newton timeframe
16:48:48 <slaweq> only difference is that this multinode job is non-voting
16:49:05 <njohnston> I think the goal was for the multinode job to end up being the voting one
16:49:21 <mlavalle> yes, I think I have the same recollection
16:49:32 <mlavalle> we can discuss in the L3 meeting
16:49:36 <njohnston> +1
16:49:42 <slaweq> njohnston: that is not possible to have multinode job voting now ;)
16:50:00 <slaweq> ok, mlavalle please then add this to L3 meeting agenda if You can
16:50:05 <hongbin> does the multinode job stable enough?
16:50:06 <mlavalle> yes
16:50:15 <mlavalle> hongbin: not even close
16:50:32 <slaweq> #action mlavalle to discuss about neutron-tempest-dvr job in L3 meeting
16:50:43 <slaweq> hongbin: it depends what You mean by stable
16:51:00 <slaweq> it's very stable now as it is on 100% of failures all the time :P
16:51:26 <hongbin> slaweq: if it doesn't block the merging too much after turning into voting, then it is fine
16:52:07 <slaweq> hongbin: it will block everything currently but I agree that we should focus on stabilize it
16:52:21 <slaweq> and we are working on it since some time
16:52:36 <hongbin> ack
16:53:03 <slaweq> ok, lets move on then
16:53:05 <slaweq> #topic Periodic
16:53:23 <slaweq> I just want to mention that we still have neutron-tempest-postgres-full failing all the time
16:53:29 <slaweq> but it's nova issue
16:53:35 <slaweq> bug reported: https://bugs.launchpad.net/nova/+bug/1804271
16:53:38 <openstack> Launchpad bug 1804271 in OpenStack Compute (nova) "nova-api is broken in postgresql jobs" [High,In progress] - Assigned to Matt Riedemann (mriedem)
16:53:41 <slaweq> Fix in progress: https://review.openstack.org/#/c/619061/
16:53:53 <slaweq> so we should be good when this will be merged
16:54:05 <mriedem> slaweq: here is a tip,
16:54:17 <mriedem> show up in the nova channel and ask that another core look at that already +2ed fix for the postgres job
16:54:33 <mriedem> i would, but i've already spent some review request karma today
16:54:41 <slaweq> mriedem: ok, I will :)
16:54:44 <slaweq> thx
16:55:31 <slaweq> last topic then
16:55:33 <slaweq> #topic Open discussion
16:55:46 <slaweq> anyone wants to discuss about anything?
16:56:00 <hongbin> i have one
16:56:08 <slaweq> go on hongbin
16:56:27 <hongbin> i don't like the long list of extensions in zuul job, so i propose a patch: https://review.openstack.org/#/c/619642/
16:56:46 <hongbin> i want to know if this is what you guys prefer to do?
16:57:02 <hongbin> or it is not a good idea
16:57:25 <slaweq> yes, IMO it is easier to read in diff
16:57:30 <bcafarel> it certainly better fits the screen
16:58:04 <njohnston> Would it be possible to use reusable snippets like we do with *tempest-irrelevant-files now?
16:58:14 <hongbin> yes, it possibly will fix the frequent merge conflict between patches
16:58:38 <slaweq> hongbin: njohnston: great ideas
16:58:59 <hongbin> njohnston: i am not sure, because the list of extensions look different between jobs
16:59:08 <slaweq> hongbin: not all jobs
16:59:23 <slaweq> You can define snippet "per branch" and reuse them if necessary
16:59:33 <slaweq> at least for master branch it should be fine
16:59:52 <hongbin> yes, we can possibly consolidate the stable branch list
16:59:58 <hongbin> i will look into that
17:00:16 <slaweq> ok, we have to finish now
17:00:20 <slaweq> thx for attending
17:00:23 <slaweq> #endmeeting
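
For context on the last exchange, here is a minimal sketch of the "reusable snippets" idea njohnston raises, modeled on the way *tempest-irrelevant-files is already shared through a YAML anchor in neutron's .zuul.yaml. The job names, variable names, and extension list below are illustrative only; they are not taken from hongbin's patch (https://review.openstack.org/#/c/619642/), and the join into a comma-separated string assumes the job variables are templated by Ansible/Jinja2, as devstack-based Zuul jobs do.

    # Define the extension list once, under a YAML anchor, in one job...
    - job:
        name: example-neutron-tempest-job
        parent: devstack-tempest
        vars:
          network_api_extensions_common: &api_extensions
            - address-scope
            - agent
            - allowed-address-pairs
            - router
            - trunk
          devstack_localrc:
            # ...and build the comma-separated string devstack expects from it.
            NETWORK_API_EXTENSIONS: "{{ network_api_extensions_common | join(',') }}"

    # Other jobs defined in the same file can reuse the list by alias, which keeps
    # diffs short and avoids merge conflicts on one very long line.
    - job:
        name: example-neutron-tempest-job-dvr
        parent: example-neutron-tempest-job
        vars:
          network_api_extensions_common: *api_extensions

YAML anchors only resolve within a single .zuul.yaml file, so the list has to be defined once per branch, which matches slaweq's note above about defining the snippet "per branch".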