16:00:16 #startmeeting neutron_ci
16:00:17 Meeting started Tue Jan 21 16:00:16 2020 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:18 hi
16:00:19 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:21 The meeting name has been set to 'neutron_ci'
16:01:28 o/
16:01:43 hi
16:01:57 hi
16:02:26 I think we have our usual attendees so lets start
16:02:36 #topic Actions from previous meetings
16:02:47 first one:
16:02:48 slaweq to remove networking-midonet and tripleo based jobs from Neutron check queue
16:02:52 I did patches:
16:02:59 Midonet: https://review.opendev.org/#/c/703282/1
16:03:00 TripleO: https://review.opendev.org/#/c/703283/
16:03:27 I sent separate patches for each job because it will be easier to revert if we want to bring those jobs back
16:03:34 +2 to both
16:03:40 thx ralonsoh
16:04:54 and the next one:
16:04:56 bcafarel to send cherry-pick of https://review.opendev.org/#/c/680001/ to stable/stein to fix functional tests failure
16:05:17 sent and merged! https://review.opendev.org/#/c/702603/
16:05:23 thx bcafarel
16:05:28 For midonet, I am not sure it is still broken on py3... look at this midonet change, to drop py27, the py3 jobs all work well https://review.opendev.org/#/c/701210/
16:05:56 njohnston: yes, but there isn't any scenario job there
16:05:57 njohnston, good catch
16:05:59 only UT
16:06:34 problem is that midonet's CI is using centos and it's not really working on python 3 IIRC
16:07:11 OK, I'll look into it and probably +2+W
16:07:22 thx njohnston
16:08:21 ok, I think we can move on
16:08:25 may need to wait a little, recent failures fail with the "pip._internal.distributions.source import SourceDistribution\nImportError: cannot import name SourceDistribution\n" error that is now getting cleared up
16:08:58 njohnston: is that failure from the midonet job or from where?
16:09:40 I saw that in networking-midonet-tempest-aio-ml2-centos-7 results, yes
16:10:00 ahh, ok
16:10:23 For those who have not seen, it was discussed on openstack-discuss
16:10:26 #link http://lists.openstack.org/pipermail/openstack-discuss/2020-January/012117.html
16:10:31 if You have some time to play with it to fix it, that would be great, if not - I would still be for removing it for now until it is fixed
16:11:13 slaweq: Yeah, I'll take a quick look, if it gets outside the timebox then I'll approve the change and we can move on :-)
16:11:39 njohnston: thx a lot
16:11:54 ok, next topic then?
16:12:17 +1
16:13:50 #topic Stadium projects
16:14:05 njohnston: any updates about dropping py2?
16:14:23 You weren't on yesterday's team meeting so maybe You have something new for today :)
16:14:48 so midonet merged its change
16:15:13 which I think means the only one left for dropping py27 is neutron-fwaas
16:15:24 yeah open reviews list is quite short now
16:15:27 #link https://review.opendev.org/#/c/688278/
16:16:19 if neutron-fwaas is going to be retired then I believe there's no sense in doing the work of fixing it
16:16:31 so in short: mission complete!
16:16:38 cool!
16:16:38 njohnston: I don't think it's going to be retired this cycle
16:16:40 \o/
16:16:42 maybe in the next one
16:17:00 so IMHO we should merge this patch, but it shouldn't be too much work IMO
16:17:23 neutron-fwaas change was already in good shape last time I checked, should be able to merge it soon
16:17:25 ok, I'll start pushing it
16:17:49 thx njohnston
16:18:05 yes, amotoki found some small issues in this patch, except that IMO it's good to go
16:18:29 I'll push the fixes after the meeting
16:18:36 and the tempest job failure wasn't related to dropping py2 for sure
16:18:39 thx njohnston
16:19:07 ok, I think that's all about stadium projects
16:19:11 so we can move on
16:19:51 #topic Grafana
16:19:59 #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:21:40 looking pretty good
16:22:11 but during the last week, the number of rechecks has increased
16:22:17 the gate is almost "closed"
16:22:31 FT is failing a lot because of ovsdb timeouts
16:23:13 yes, grafana looks pretty good but I also have the impression that it's not working very well for some of the jobs
16:23:59 yup, same feeling here
16:24:09 also my script showed me that last week we had 1 recheck on average to merge a patch
16:24:35 but I think it's because we didn't merge many patches and some of them were e.g. docs only, so fewer jobs are run on them
16:25:38 and, for example, the gate is now dropping all jobs
16:25:38 ok, so lets maybe talk about some specific jobs and issues from last week
16:25:45 ralonsoh: why?
16:25:49 any specific reason?
16:25:55 or just random failures?
16:26:03 still reviewing it
16:26:11 ok, thx
16:26:12 the recent pip issue?
16:26:38 ahh, yes I saw some email about it today
16:27:52 seems that the problem is fixed and released
16:27:57 pip 20.0.1
16:28:08 yes, this should be better now
16:29:15 ok, lets move on to the specific jobs now
16:29:23 #topic fullstack/functional
16:29:34 I have a couple of issues for today
16:29:46 first one is, already mentioned by ralonsoh, ovsdbapp timeouts, like e.g.:
16:29:54 https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_559/703299/1/gate/neutron-functional/559c0ed/testr_results.html
16:29:55 (fullstack) https://1a798a15650a97d81a82-17406e3478c64e603d8ff3ea0aac16c8.ssl.cf1.rackcdn.com/703366/1/check/neutron-fullstack/59e1877/testr_results.html
16:29:57 https://0bad6b662fac8347dc41-be430d2f919a8698d2e96141ed3ac146.ssl.cf5.rackcdn.com/687922/15/gate/neutron-functional/b6769d6/testr_results.html
16:30:11 I think there was even an old bug reported for that
16:30:31 #link https://bugs.launchpad.net/neutron/+bug/1802640
16:30:31 Launchpad bug 1802640 in neutron "TimeoutException: Commands [
yes, this one
16:30:53 thx ralonsoh
16:30:56 I would like to submit a patch to increase the ovsdbapp log
16:31:24 and then, maybe, talk to Terry Wilson to do something there, in the command commit context
16:31:24 if that may help to debug this issue, I'm all for this :)
16:31:46 +1 if that gives a better image of what it is actually spending time on
16:32:20 I did not check the specific failure, but functional has been grumpier recently on stein/train too - may be a similar issue
16:32:35 (where rechecks are ping pong between functional and grenade timeouts)
16:33:47 ok, so ralonsoh will increase the log level for ovsdbapp in those jobs and we will see then
16:34:04 +1
16:34:08 #action ralonsoh to increase log level for ovsdbapp in fullstack/functional jobs
16:34:17 thx ralonsoh for taking care of it
16:34:32 yw
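(Illustration of the action item above, assuming standard Python logging and ovsdbapp's package-name logger; the actual patch may instead wire this through neutron's oslo.log/test configuration, so treat this only as a sketch.)

    import logging

    # Sketch: raise the 'ovsdbapp' logger to DEBUG so slow or stuck OVSDB
    # commands become visible in the functional/fullstack test logs.
    logging.basicConfig(level=logging.INFO)
    # Child loggers (for example the ovs_idl backend modules) inherit this
    # level, so their DEBUG records reach the configured handler while the
    # rest of the tree stays at INFO.
    logging.getLogger('ovsdbapp').setLevel(logging.DEBUG)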
16:35:15 ok, the next one I saw this week was an issue with EVENT OVSNeutronAgentOSKenApp->ofctl_service GetDatapathRequest
16:35:34 like e.g. https://f26da45659020db1220c-76fc92e5e7c4e5a091c792a95503ad1d.ssl.cf5.rackcdn.com/701565/5/check/neutron-functional/4db8d3f/controller/logs/dsvm-functional-logs/neutron.tests.functional.agent.l2.extensions.test_ovs_agent_qos_extension.TestOVSAgentQosExtension.test_policy_rule_delete_egress_.txt
16:35:56 and I remember that I already saw such an issue a few times in the past
16:37:00 and it failed on setup_physical_bridges because it couldn't get dp_id
16:37:51 is there anyone who wants to check that?
16:38:03 I can take a look at it
16:38:08 is there a bug reported?
16:38:36 ralonsoh: nope, but I will open one for it today
16:38:42 perfect
16:38:51 #action slaweq to open bug for issue with get_dp_id in os_ken
16:39:01 and I will send it to You
16:39:35 ok, lets move on to the fullstack job then
16:39:42 first issue there:
16:39:43 https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_315/601336/42/check/neutron-fullstack/3156265/testr_results.html
16:39:49 I saw it a few weeks ago also
16:40:09 and according to http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22AssertionError%3A%20QoS%20register%20not%20found%20with%20queue-num%5C%22 it is happening from time to time recently
16:40:41 yes, the min-qos feature is very "delicate"
16:40:54 if you have only one agent, everything is ok
16:41:10 but when we are testing, we are introducing several qos/queue configurations
16:41:20 the agent extension is not ready for this
16:41:27 but we fake this in the tests
16:41:42 but of course, we can have these kinds of situations...
16:42:09 but, since the last patch, it is much more stable
16:42:21 and, btw, this is still pending
16:42:31 https://review.opendev.org/#/c/687922/
16:42:39 (re-re-re-rechecking)
16:42:52 ahh, ok
16:43:03 so this patch should help with this issue, right?
16:43:11 I hope so
16:43:14 ok
16:43:26 (patch you are "my last hope")
16:43:33 LOL
16:43:44 ok, next issue with fullstack is
16:43:46 Failure on cleanup phase: https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_564/702250/6/gate/neutron-fullstack/564244f/testr_results.html
16:43:48 Do we really need cleanup in fullstack tests?
16:44:13 I started thinking today that maybe we don't need a cleanup phase in fullstack tests
16:44:27 for each test we are spawning a new neutron-server, agents and everything else
16:44:36 and it's always only for this one test
16:44:53 so what's the point of deleting routers, ports, networks, etc. at the end?
16:45:00 IMO it's just a waste of time
16:45:07 or am I missing something here?
16:45:11 to make sure the delete functionality doesn't have bugs, yes?
16:45:25 exactly
16:45:47 njohnston: sure, but we are testing delete functionality in scenario and api tests also, no?
16:46:21 fullstack is more for kind of whitebox testing where You can check if things are actually configured on the host properly
16:48:05 OK, you could make that argument. We could spin a change to remove those cleanups and make sure nothing bag happens.
16:48:23 :)
16:48:31 *bad
16:48:32 SMH
16:48:47 so I will send such a patch to also see how much time we can save on that job if we remove cleanups from it
16:49:07 if it's not too much, maybe we can drop this patch and stay with it as it is now
16:49:08 my only concern are the L1 elements in the host (taps, devices, bridges, etc)
16:49:09 ok for You?
16:49:26 ralonsoh: sure, I'm not talking about things like that
16:49:30 we can try it
16:49:33 ok
16:49:39 I just want to skip cleaning neutron resources
16:50:00 if you run fullstack in a loop and do not start seeing OOM errors it is probably good ;)
16:50:47 bcafarel: good test indeed :)
16:51:11 #action slaweq to try to skip cleaning up neutron resources in fullstack job
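(Illustration of the idea behind that action item, with hypothetical names rather than actual neutron fullstack code: if API-level cleanups are registered with addCleanup, skipping them can be a matter of making the registration conditional while leaving host-level teardown of processes, bridges and namespaces untouched.)

    import unittest

    # Hypothetical sketch: only register API-level cleanups on demand; the
    # per-test environment is torn down as a whole anyway, so deleting each
    # router/network/port via the API is redundant.
    class FullstackResourceSketch(unittest.TestCase):
        cleanup_api_resources = False  # hypothetical switch

        def create_network(self, client, name):
            net = client.create_network(name=name)  # 'client' is hypothetical
            if self.cleanup_api_resources:
                self.addCleanup(client.delete_network, net['id'])
            return net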
16:51:37 ok, lets move on quickly to the next topic as we are running out of time
16:51:39 #topic Tempest/Scenario
16:51:44 just make sure that we also accommodate developers running fullstack on their personal laptops
16:52:06 njohnston: sure, I will check if this will be fine in such a case
16:52:57 ok, speaking about tempest jobs I saw again this week an issue with paramiko:
16:53:10 https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_ca7/653883/11/check/neutron-tempest-plugin-scenario-linuxbridge/ca7d140/testr_results.html
16:53:18 https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_820/611605/22/gate/neutron-tempest-plugin-scenario-linuxbridge/82023b2/testr_results.html
16:53:45 #link https://review.opendev.org/#/c/702903/
16:53:49 basically there is an error like "paramiko.ssh_exception.SSHException: No existing session"
16:53:53 that's why I pushed this
16:54:22 ahhh no, this is another error
16:54:33 but that was solved last week
16:54:47 I don't think it was solved last week
16:55:01 https://review.opendev.org/#/c/701018/
16:55:13 there was another, similar issue but with an error like "NoneType object has no attribute session" or something like that
16:55:26 pfffff ok then
16:55:43 there is a bug reported for this new issue, I need to find it
16:56:10 paramiko.ssh_exception.SSHException: No existing session
16:56:17 sorry
16:56:19 https://bugs.launchpad.net/neutron/+bug/1858642
16:56:19 Launchpad bug 1858642 in neutron "paramiko.ssh_exception.NoValidConnectionsError error cause dvr scenario jobs failing" [High,Confirmed]
16:56:37 if anyone has some time to look into it, that would be great
16:56:47 and it's not only in dvr jobs
16:57:34 and that's all from me for today
16:57:35 I can take a look, but this problem seems to be in zuul.test.base
16:57:53 zuul.test.base? what is it?
16:58:15 nothing, wrong search
16:58:20 ahh, ok
16:58:53 ok, lets move on quickly
16:58:58 2 more things
16:59:12 1. thx ralonsoh for fixing the mariadb periodic job - it's working fine now
16:59:23 thanks ralonsoh!
16:59:25 2. I would like to ask You about changing the time of this meeting
16:59:51 what do You say about moving it to wednesday at 2pm utc, just after the L3 meeting?
17:00:11 that works fine for me
17:00:12 as today at the same time there is a tc call in redhat every 3 weeks and I would like to attend it sometimes
17:00:23 you mean 3pm UTC
17:00:33 sorry, yes
17:00:34 3pm
17:00:38 2pm is the L3 meeting
17:00:41 ok for me
17:00:44 thx
17:00:53 I will propose a patch and add all of You as reviewers then
17:01:02 ok, that's all for today
17:01:05 and mail maybe also for interested folks
17:01:06 thx for attending
17:01:08 bye
17:01:12 #endmeeting