16:01:48 #startmeeting neutron_ci
16:01:49 Meeting started Tue Nov 20 16:01:48 2018 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:01:50 hello
16:01:50 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:01:54 The meeting name has been set to 'neutron_ci'
16:01:54 o/
16:01:54 o/
16:02:37 let's start the last of today's meetings then :)
16:02:38 #topic Actions from previous meetings
16:02:48 mlavalle/slaweq to continue debugging issue with not reachable FIP in scenario jobs
16:02:56 I continued doing that
16:03:04 since I came back from Berlin
16:03:11 and before leaving as well
16:03:32 I didn't have time to look at this one, but I found some issue related to https://bugs.launchpad.net/neutron/+bug/1717302
16:03:33 Launchpad bug 1717302 in neutron "Tempest floatingip scenario tests failing on DVR Multinode setup with HA" [High,In progress] - Assigned to Slawek Kaplonski (slaweq)
16:03:44 and maybe it can help with this one too
16:03:54 I've been comparing a "good run" with a "bad run"
16:03:59 I sent some patch already: https://review.openstack.org/#/c/618750/
16:04:34 this patch of mine can help with dvr jobs only
16:05:05 I saw issues like that in quite many test runs, e.g. in http://logs.openstack.org/24/618024/5/check/neutron-tempest-dvr-ha-multinode-full/d27f183/logs/
16:05:42 is this also going to address https://bugs.launchpad.net/neutron/+bug/1795870?
16:05:43 Launchpad bug 1795870 in neutron "Trunk scenario test test_trunk_subport_lifecycle fails from time to time" [High,Confirmed] - Assigned to Miguel Lavalle (minsel)
16:06:50 if You had something like this in the l3-agent logs: http://logs.openstack.org/24/618024/5/check/neutron-tempest-dvr-ha-multinode-full/d27f183/logs/subnode-2/screen-q-l3.txt.gz?level=ERROR then it should help
16:07:01 if not, then it's probably a different issue
16:07:34 well
16:07:50 I was thinking of a different approach
16:08:14 we recently merged https://review.openstack.org/#/c/609924/
16:09:02 this fixes a situation where the fip is associated to a port before the port is bound
16:09:22 as a consequence, the fip is created in the snat node, right?
16:09:32 yep
16:10:02 now, when the port is bound, the patch fixes the migration of the fip / port to the corresponding node
16:10:10 a compute presumably
16:10:16 right?
16:10:24 yep
16:10:49 now, let's look at the code of test_trunk_subport_lifecycle:
16:11:14 https://github.com/openstack/neutron-tempest-plugin/blob/master/neutron_tempest_plugin/scenario/test_trunk.py#L67
16:11:34 it creates the port and associates it to a fip
16:11:53 so in a dvr env, that fip is going to the snat node
16:12:07 yes
16:12:14 and then the server is created in L76
16:12:23 so the migration starts
16:12:54 it is possible that the bug https://bugs.launchpad.net/neutron/+bug/1795870
16:12:55 Launchpad bug 1795870 in neutron "Trunk scenario test test_trunk_subport_lifecycle fails from time to time" [High,Confirmed] - Assigned to Miguel Lavalle (minsel)
16:13:24 is a consequence of the fip not being ready on the compute, because it is migrating
16:13:46 it is a race that is not fixed by https://review.openstack.org/#/c/609924/
16:13:56 right?
16:14:36 it can be something like that
16:14:49 maybe this FIP is configured in both the snat and fip- namespaces then?
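[Editor's note: to make the race discussed above easier to follow, here is a minimal Python sketch of the ordering mlavalle describes, plus the reordering he proposes just below. The helper names are hypothetical stand-ins for the tempest scenario plumbing, not the actual code in test_trunk.py; only the order of operations matters.]

    # Current ordering in the scenario test (as described above):
    def trunk_lifecycle_current_order(self):
        port = self.create_port()                  # port is created but not yet bound
        fip = self.create_and_associate_fip(port)  # DVR: FIP is first wired on the snat node
        server = self.create_server(port)          # port gets bound to a compute, which
                                                   # triggers the FIP migration snat -> compute
        self.check_connectivity(fip)               # may run while that migration is still
                                                   # in flight -> suspected cause of bug 1795870

    # Proposed ordering (create and associate the FIP after the server):
    def trunk_lifecycle_proposed_order(self):
        port = self.create_port()
        server = self.create_server(port)          # port is bound first
        fip = self.create_and_associate_fip(port)  # FIP goes straight to the compute node
        self.check_connectivity(fip)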
16:14:57 now the trunk test bug happens 95% of the time with DVR
16:15:04 as in our gates all nodes are dvr_snat nodes
16:15:36 so I intend to change the test script
16:15:52 to create and associate the fip after the server
16:15:58 and see what happens
16:16:02 makes sense?
16:16:28 yes, totally :)
16:16:39 so I propose the following:
16:16:56 1) let's merge your patch first and see the effect on the trunk bug
16:17:20 2) Then let's change the trunk test script and see the effect
16:17:39 this way we learn which fix is having an effect or not
16:17:51 ok, I will address comments on my patch today (or tomorrow morning)
16:18:11 and the other thing I propose is to track the trunk test failure independently of the bug you are fixing
16:18:31 because in the case of the trunk stuff, it might be something else
16:18:43 one more question: if it's like You described, does it still happen so often, or maybe it's now fixed by https://review.openstack.org/#/c/609924/11 ?
16:18:47 I mean the problem might be in the trunk code for example
16:19:19 it's still happening after merging that patch
16:19:34 I saw it yesterday
16:19:44 sure, I just raised this patch here as I saw issues like that in many different tests (trunk also) - the only common thing was a not reachable FIP
16:20:17 ok, so let's do it as You described and we will see how it goes
16:20:22 I am just trying to be disciplined and merge fixes in an orderly way, to learn what fixes what
16:20:37 makes sense?
16:20:48 #action mlavalle to continue tracking not reachable FIP in trunk tests
16:20:55 sure, makes sense to me :)
16:21:16 I think we can go on to the next one then, right?
16:21:30 yeap
16:21:38 thanks for listening :-)
16:21:57 ok, so the next one is:
16:21:59 njohnston rename existing neutron-functional job to neutron-functional-python27 and switch neutron-functional to be py3
16:22:26 njohnston: any updates?
16:22:29 bcafarel already had that underway before we had talked about it: https://review.openstack.org/#/c/577383/
16:23:03 It is still showing the same issue with subunit.parser that I have been stumped by: http://logs.openstack.org/83/577383/12/check/neutron-functional/e02bd4f/job-output.txt.gz#_2018-11-13_14_10_04_212536
16:23:24 bcafarel: are You around? Do You need help with this one?
16:23:45 but if I or anyone else has a eureka moment and figures out what in the functional test harness is not py3 ready, then the change is ready to go
16:24:01 o/
16:24:20 bcafarel and I have talked about it, when we last chatted he also had not had luck finding the py3 string handling incompatibility that is causing the error
16:24:20 yeah basically what njohnston said
16:24:49 ok, I will try to take a look at it this week if I have a few minutes
16:24:55 I will try to catch this issue again, but will not mind at all if someone solves it in the meantime :)
16:25:27 ok, let's move forward then
16:25:30 njohnston make py3 etherpad
16:25:41 I guess it's https://etherpad.openstack.org/p/neutron_ci_python3, right?
16:25:42 As mentioned in the neutron team meeting, that is up
16:25:44 yep
16:25:49 #link https://etherpad.openstack.org/p/neutron_ci_python3
16:25:49 thx njohnston :)
16:26:49 I will go through it and will start doing some patches converting to py3, as I already have some experience with it
16:27:06 I need to flesh out the experimental jobs a bit
16:27:14 but I figure it will be a while before we get to those anyway
16:28:29 I think that we should revisit which experimental jobs we still need :)
16:29:08 yes, many of the legacy ones can probably be removed, like legacy-tempest-dsvm-neutron-dvr-multinode-full
16:29:20 yes, I will take a look at them too
16:29:35 #action slaweq to check which experimental jobs can be removed
16:29:38 since that is already covered by neutron-tempest-dvr-ha-multinode-full in the check and gate queues
16:29:53 #action slaweq to start migrating neutron CI jobs to zuul v3 syntax
16:30:33 Much appreciated slaweq, my attempts at zuul v3 conversions have taught me humility
16:30:44 LOL
16:30:51 :)
16:31:14 I spent some time on converting neutron-tempest-plugin jobs to it so I understand You :)
16:31:33 ok, let's move to the next one then
16:31:35 njohnston check if grenade is ready for py3
16:31:52 Looks like Grenade has py3 support and has a zuul job defined for py3 by mriedem: https://github.com/openstack-dev/grenade/commit/7bae489f38f8f0c82c8eb284d1841ef68d8e9a43
16:32:53 \o/
16:32:59 so we should just switch to use this one instead of what we are using now, right?
16:33:13 yes
16:33:15 still need https://review.openstack.org/#/c/617662/
16:33:24 but maybe unrelated to what you care about
16:35:09 No, I think that is excellent
16:35:16 I think we can probably make use of that template
16:35:48 So we can either base our jobs off of that zuul template or just use it outright
16:36:49 I'm now comparing our definition of the neutron-grenade job with the grenade-py3 job
16:37:08 I see only one difference: in grenade-py3 there is no openstack/neutron in required-projects
16:37:29 will we have to add it if we use this job, or is it not necessary?
16:37:30 i believe that comes from legacy-dsvm-base
16:37:42 http://git.openstack.org/cgit/openstack-infra/openstack-zuul-jobs/tree/zuul.d/jobs.yaml#n915
16:37:56 but i'm never really sure how that all works
16:38:09 ahh, so we don't need to define it in our template
16:38:23 i don't think it's needed,
16:38:27 grenade-py3 is stein only,
16:38:35 and devstack has defaulted to neutron since i think newton
16:38:37 thx mriedem
16:39:01 so IMO we could switch to use this template in neutron's .zuul.yaml file
16:39:09 njohnston: will You do it then?
16:39:54 definitely
16:39:58 thx
16:40:21 correction: neutron is the default since ocata, but it doesn't matter for this anyway https://github.com/openstack-infra/devstack-gate/blob/master/devstack-vm-gate-wrap.sh#L201
16:40:26 #action njohnston to switch neutron to use integrated-gate-py35 with grenade-py3 job instead of our neutron-grenade job
16:40:45 thx mriedem, that sounds very good for us :)
16:41:15 ok, so let's move on to the next one then
16:41:17 slaweq to check Fullstack tests fails because process is not killed properly (bug 1798472)
16:41:18 bug 1798472 in neutron "Fullstack tests fails because process is not killed properly" [High,In progress] https://launchpad.net/bugs/1798472 - Assigned to Slawek Kaplonski (slaweq)
16:41:32 I was checking that one
16:42:01 and what I found is that in some cases the openvswitch agent or sometimes neutron-server is not responding to SIGTERM at all
16:42:18 and then tests are failing in the cleanup phase, as the process is not stopped properly and a timeout is raised
16:42:33 oh interesting
16:42:35 didn't we approve a patch for that yesterday?
16:42:38 I did a patch https://review.openstack.org/#/c/618024/ which should at least work around it in tests
16:42:42 mlavalle: yes :)
16:42:57 I just wanted to do an introduction for everyone ;)
16:43:13 cool
16:43:31 for now it failed with some unrelated errors so I rechecked it
16:43:44 it should help with such issues in fullstack tests
16:43:57 ok, next one
16:43:59 mlavalle to check bug 1798475
16:44:00 bug 1798475 in neutron "Fullstack test test_ha_router_restart_agents_no_packet_lost failing" [High,Confirmed] https://launchpad.net/bugs/1798475
16:44:17 no time to check this one
16:44:50 do You plan to check that this week maybe?
16:45:05 no, I don't think I'll have time this week
16:45:15 Thursday and Friday are holidays
16:45:32 ok, I will assign it to myself but I will check it only if I have some time for it
16:45:45 #action slaweq to check bug 1798475
16:45:47 bug 1798475 in neutron "Fullstack test test_ha_router_restart_agents_no_packet_lost failing" [High,Confirmed] https://launchpad.net/bugs/1798475
16:46:01 and the last one was:
16:46:03 Great, thanks!
16:46:03 slaweq to check why db_migration functional tests don't have logs
16:46:08 yw mlavalle :)
16:46:24 for this last one I didn't have time
16:46:33 so I will assign it to myself for next week too
16:46:42 #action slaweq to check why db_migration functional tests don't have logs
16:46:57 ok, those were all the actions from the previous week
16:47:06 do You want to add anything?
16:47:37 not from me
16:47:55 ok, let's move on then
16:47:58 #topic Python 3
16:48:11 I think we discussed most things already :)
16:48:16 indeed :-)
16:48:28 I just wanted to mention that fullstack tests are now running on py3 only:
16:48:30 #topic Python 3
16:48:36 #undo
16:48:37 Removing item from minutes: #topic Python 3
16:48:39 https://review.openstack.org/#/c/604749/
16:49:10 so we don't have neutron-fullstack-python36 anymore
16:49:13 :)
16:49:36 thx bcafarel and njohnston for that one
16:49:40 Did someone already submit a change to remove that from the grafana dashboard, or shall I do it?
16:50:03 no, I think there is no such patch yet
16:50:12 it would be good if You could do it :)
16:50:25 thx for remembering about that
16:50:41 #action njohnston to remove neutron-fullstack-python36 from grafana dashboard
16:50:45 Will do, and I'll prep a second one to match up with the change in functional tests, with a proper depends-on
16:50:55 thx njohnston
16:51:07 ok, so speaking about grafana
16:51:09 #topic Grafana
16:51:15 #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:53:24 there were quite big failure rates on the charts during the weekend, but there were not many jobs running then, so I don't think we should focus on those
16:53:53 other than that, I think most things are similar to how it was two weeks ago
16:54:21 one thing I want to raise is the failing neutron-tempest-postgres-full periodic job
16:54:25 we should check that one
16:54:52 ok
16:56:23 looks like some nova issue, e.g.: http://logs.openstack.org/periodic/git.openstack.org/openstack/neutron/master/neutron-tempest-postgres-full/a52bcf9/job-output.txt.gz#_2018-11-18_06_59_45_183264
16:56:31 and nova logs: http://logs.openstack.org/periodic/git.openstack.org/openstack/neutron/master/neutron-tempest-postgres-full/a52bcf9/logs/screen-n-api.txt.gz?level=ERROR
16:56:43 mriedem: does it ring a bell for You ^^ ?
16:57:43 yes,
16:57:46 but should be fixed
16:58:15 oh nvm this is something else
16:58:31 the last such issue is from today, http://logs.openstack.org/periodic/git.openstack.org/openstack/neutron/master/neutron-tempest-postgres-full/1de7427/logs/screen-n-api.txt.gz?level=ERROR
16:58:42 https://github.com/openstack/nova/commit/77881659251bdff52163ba1572e13a105eadaf7f
16:59:22 ok so the pg jobs are broken
16:59:25 has anyone reported a bug?
16:59:43 not me, I just noticed that in the periodic job
16:59:58 I will report a bug after this meeting
17:00:01 thanks
17:00:05 (which is just over) :)
17:00:11 ok, thx for attending
17:00:16 #endmeeting
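[Editor's note: for the fullstack cleanup problem discussed earlier (bug 1798472, processes occasionally ignoring SIGTERM), a common workaround pattern is to escalate to SIGKILL after a timeout. The sketch below is a generic illustration of that pattern using only the standard library; it is not the content of https://review.openstack.org/#/c/618024/, and the function name is hypothetical.]

    import signal
    import subprocess

    def stop_test_process(proc: subprocess.Popen, timeout: int = 10) -> None:
        """Ask a test-managed process to stop, escalating to SIGKILL if needed."""
        proc.send_signal(signal.SIGTERM)
        try:
            # Give the process a chance to shut down cleanly.
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            # The process ignored SIGTERM (as seen with the ovs agent and
            # neutron-server in some fullstack runs); kill it so the test
            # cleanup does not hit its own timeout.
            proc.kill()
            proc.wait()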