15:00:41 #startmeeting neutron_ci
15:00:43 Meeting started Tue Dec 1 15:00:41 2020 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:44 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:46 The meeting name has been set to 'neutron_ci'
15:00:47 welcome again :)
15:00:50 not even time for a coffee break :(
15:00:56 hi again
15:00:57 o/
15:01:42 o/
15:02:09 ok, let's start, as we have a couple of things to discuss here as well :)
15:02:15 Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:02:37 #topic Actions from previous meetings
15:02:42 bcafarel to fix stable branches upper-constraints in stadium projects
15:03:34 done for victoria https://review.opendev.org/c/openstack/requirements/+/764022
15:03:48 ussuri is close https://review.opendev.org/c/openstack/requirements/+/764021
15:03:50 * mlavalle has a doctor appointment. will skip this meeting o/
15:04:10 in the end this also required dropping neutron from the blacklist
15:04:11 take care mlavalle :)
15:04:19 o/ mlavalle
15:04:39 with requirements folks still hoping neutron-lib would be complete one day and remove the need for these steps
15:04:51 but well, we know this will not be the case soon™
15:05:13 anyway, at least this will be noted in my next action item
15:05:18 what do You mean by "neutron-lib will be complete"?
15:05:31 so all projects will import only neutron-lib, and not neutron?
15:05:33 * mlavalle is only going for an eye exam. needs new eye glasses. that's all :-)
15:05:44 slaweq: indeed
15:06:01 bcafarel: that can be hard, especially as we haven't worked on that too much recently :/
15:07:06 yes :/ so I think we will stay with the "need to update requirements after a release" step
15:07:33 bcafarel: and to fix that in ussuri we need https://review.opendev.org/c/openstack/requirements/+/764021 right?
15:08:04 slaweq: yes that's the one (764022 is the merged one for victoria)
15:08:21 ok, so it's almost there
15:09:33 ok, let's move to the next one
15:09:35 bcafarel to check and update doc https://docs.openstack.org/neutron/latest/contributor/policies/release-checklist.html
15:10:04 barely started, we can keep that for next week
15:10:18 ok
15:10:24 #action bcafarel to check and update doc https://docs.openstack.org/neutron/latest/contributor/policies/release-checklist.html
15:10:38 so, next one
15:10:42 slaweq to explore options to fix https://bugs.launchpad.net/neutron/+bug/1903531
15:10:44 Launchpad bug 1903531 in neutron "Update of neutron-server breaks compatibility to previous neutron-agent version" [Critical,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
15:10:52 we already discussed that at the previous meeting
15:10:56 just a bit :)
15:10:58 so no need to repeat it here
15:11:05 next one
15:11:07 slaweq to report bug against rally
15:11:16 I checked that and it's really not a rally bug
15:11:23 but some red herring
15:11:34 the real bug was simply that some subnet creation failed
15:11:42 so I didn't report anything against rally
15:12:02 and that's all the actions from last week
15:12:15 next topic
15:12:17 #topic Stadium projects
15:12:28 any updates about stadium projects ci?
15:12:31 lajoskatona?
15:12:41 nothing as far as I have seen
15:12:56 things are going on without much problem
15:13:24 lajoskatona: that's good to hear
15:13:35 #topic Stable branches
15:13:46 Victoria dashboard: https://grafana.opendev.org/d/HUCHup2Gz/neutron-failure-rate-previous-stable-release?orgId=1
15:13:49 Ussuri dashboard: https://grafana.opendev.org/d/smqHXphMk/neutron-failure-rate-older-stable-release?orgId=1
15:14:01 bcafarel: any updates/issues regarding ci of stable branches?
15:14:14 not that I am aware of at least :)
15:15:02 ok
15:15:05 so let's move on
15:15:07 #topic Grafana
15:15:33 in the master branch I don't think that things are going well
15:15:45 we have plenty of issues and failure rates are pretty high for some jobs
15:15:55 especially functional/fullstack recently
15:17:16 if we see a recurrent error in the CI (on those jobs), report it and mention it in IRC
15:17:30 just to let everybody know that you are on it
15:17:31 ralonsoh: yes, I have a couple of examples
15:17:35 perfect
15:17:37 I found them today
15:17:43 (test_walk_versions, for example)
15:17:45 but I haven't had time yet to report LPs
15:18:05 ok, regarding grafana I don't really have more to say
15:18:39 I know that some graphs are a bit out of date recently, but I want to propose one update for that once all the patches which change some jobs are merged
15:18:48 I think there are still one or two in gerrit
15:19:02 other than that, I think we can talk about some specific jobs now
15:19:06 are You ok with that?
15:20:01 yes
15:20:09 #topic fullstack/functional
15:20:16 ok
15:20:34 the first one is bug https://bugs.launchpad.net/neutron/+bug/1889781 which is still hitting us from time to time
15:20:35 Launchpad bug 1889781 in neutron "Functional tests are timing out" [High,Confirmed]
15:20:43 and I think it's happening even more often recently
15:21:30 I may try to limit the number of logs sent to stdout during those tests
15:21:47 but if there is anyone else who wants to do that, that would be great :)
15:21:58 please then simply assign this bug to You
15:22:03 and work on it
15:22:18 is that related to the size of the logs?
15:22:29 ralonsoh: most likely yes
15:22:33 ok
15:22:41 we saw a similar issue in the past in UT IIRC
15:22:53 but I think this is because of some failing tests
15:22:54 basically it is some bug in stestr or something like that
15:22:58 ralonsoh: no
15:23:00 like neutron.tests.functional.agent.linux.test_tc_lib.TcFiltersTestCase.test_add_tc_filter_vxlan [540.005735s] ... FAILED
15:23:11 spending too much time
15:23:14 if You look at the logs, there is always a huge gap when nothing happens
15:23:43 because all workers are blocked in other tests
15:23:55 see for example:
15:23:57 2020-11-30 10:03:00.937710 | controller | {1} neutron.tests.functional.agent.ovn.metadata.test_metadata_agent.TestMetadataAgent.test_agent_resync_on_non_existing_bridge [1.997655s] ... ok
15:23:59 2020-11-30 10:43:39.465033 | RUN END RESULT_TIMED_OUT: [untrusted : opendev.org/openstack/neutron/playbooks/run_functional_job.yaml@master]
15:24:04 I know
15:24:11 those are 2 consecutive lines from the log
15:24:24 so there is nothing for about 40 minutes there
15:24:27 but IMO this is because the other workers are blocked checking something
15:24:37 and that was exactly the symptom of the issue with too much output and stestr
15:26:07 ralonsoh: maybe the root cause now is different than it was with that stestr issue
15:26:10 idk really
15:26:17 there was a new release of stestr recently, not sure though what it fixes
15:26:21 but at first glance it looks similar to what we had in the past
15:28:23 anyway, if someone has some time, You can take a look at that bug :)
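
The idea slaweq mentions above, limiting how much log output each functional test sends to stdout so that stestr workers are not left buffering huge streams, could be prototyped roughly as below. This is a minimal sketch using the generic fixtures/testtools libraries; the class name is made up and this is not the actual Neutron functional test base class:

    import logging

    import fixtures
    import testtools


    class QuietFunctionalTestCase(testtools.TestCase):
        """Hypothetical base class that keeps per-test log output small."""

        def setUp(self):
            super().setUp()
            # Capture everything the code under test logs, but keep only
            # WARNING and above, so a long-running test cannot flood the
            # subunit stream that stestr has to buffer for each worker.
            self.log_fixture = self.useFixture(
                fixtures.FakeLogger(level=logging.WARNING))

If only a handful of noisy tests are the problem, the same fixture could instead be applied in just those test classes rather than in a shared base class.
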
15:28:43 let's move on
15:28:45 next one
15:28:59 I noticed failures with TestSimpleMonitorInterface a few times this week
15:29:03 like e.g.:
15:29:08 https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_93d/764365/1/gate/neutron-functional-with-uwsgi/93df51c/testr_results.html
15:29:16 I need to report an LP for that
15:29:45 ralonsoh: isn't that related to some of Your changes maybe? It looks like something You could work on :)
15:30:03 sure, I'll check it
15:30:09 and I'll report an LP
15:31:12 ahh I think you are talking about a fullstack patch
15:31:12 ralonsoh: in the log I see something like:
15:31:15 2020-11-30 10:40:38.271 61912 DEBUG neutron.agent.linux.utils [req-2aa4c2b1-90e9-4f8d-a708-61d18ad4f3ec - - - - -] Running command: ['sudo', '/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/bin/neutron-rootwrap', '/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/etc/neutron/rootwrap.conf', 'ovsdb-client', 'monitor', 'Interface',
15:31:17 'name,ofport,external_ids', '--format=json'] create_process /home/zuul/src/opendev.org/openstack/neutron/neutron/agent/linux/utils.py:88
15:31:19 2020-11-30 10:40:38.321 61912 DEBUG neutron.agent.common.async_process [-] Output received from [ovsdb-client monitor Interface name,ofport,external_ids --format=json]: None _read_stdout /home/zuul/src/opendev.org/openstack/neutron/neutron/agent/common/async_process.py:264
15:31:21 2020-11-30 10:40:38.322 61912 DEBUG neutron.agent.common.async_process [-] Halting async process [ovsdb-client monitor Interface name,ofport,external_ids --format=json] in response to an error. stdout: [[]] - stderr: [[]] _handle_process_error /home/zuul/src/opendev.org/openstack/neutron/neutron/agent/common/async_process.py:222
15:31:48 slaweq, in OVS there are two monitors
15:31:56 one for the ports and another one for the bridges
15:32:01 I migrated the bridges one
15:32:11 but I never finished the complex one, for ports
15:32:40 https://review.opendev.org/c/openstack/neutron/+/735201
15:32:41 so this seems to me like the monitor of Interfaces
15:32:50 yes
15:32:55 name,ofport,external_ids
15:32:55 (not ports, interfaces)
15:33:37 ok, do You want to investigate it? Or do You want me to check that?
15:33:47 I'll report and investigate it
15:33:52 thx
15:34:13 #action ralonsoh to report and check issue with TestSimpleMonitorInterface in functional tests
15:34:28 in the meeting agenda https://etherpad.opendev.org/p/neutron-ci-meetings there are more examples of the same failure
15:34:48 let's move now to fullstack tests
15:34:54 which are also not very stable recently
15:35:12 most often I saw an issue with mysqld being killed by the oom killer
15:35:17 bug reported https://launchpad.net/bugs/1906366
15:35:19 Launchpad bug 1906366 in neutron "oom killer kills mysqld process on the node running fullstack tests" [Critical,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
15:35:38 I proposed a patch to limit the resources used there
15:35:48 but I saw that ralonsoh had some comments there
15:35:56 I haven't had time yet to address them
15:36:12 we test concurrency, so we should not reduce the number of API workers to 1
15:36:15 just this
15:36:33 ralonsoh: but is it only this one test which You actually mentioned?
15:36:38 or are there others also?
15:36:47 I only found this one
15:36:55 because if that's the only test which needs 2 workers, I can set 2 workers only for that test
15:37:05 perfect
15:37:05 and use a default of "1" for all other tests
15:37:58 yes, I think this is the only one
15:38:07 ok, great
15:38:12 so I will update my patch
15:38:36 that is a good example, thanks for mentioning it
15:38:48 and also, as I see in the results now, lowering the number of test runner workers from 4 to 3 results in about 18 minutes more for the whole job
15:38:54 so it should be acceptable
15:39:44 slaweq: if you are overloaded I can take care of this api_worker change, that dhcp test comes from us.
15:40:18 lajoskatona: thx, if You could update my patch that would be great
15:41:00 slaweq: sure
15:41:21 lajoskatona: thx a lot
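
The compromise agreed above, defaulting the fullstack environment to a single API worker and letting only the concurrency-sensitive test ask for two, could look roughly like the following. This is a hypothetical illustration of the idea, not the actual fullstack framework code or slaweq's patch; the class and attribute names are invented:

    class BaseFullStackTestCase(object):
        """Hypothetical fullstack base class with a per-test worker knob."""

        # New default: a single API worker, to keep memory usage on the test
        # node low enough that mysqld is not targeted by the OOM killer.
        api_workers = 1

        def get_api_workers(self):
            # A config fixture would read this when rendering neutron.conf
            # for the simulated hosts in the fullstack environment.
            return self.api_workers


    class TestConcurrentAPIRequests(BaseFullStackTestCase):
        """Hypothetical concurrency test that still needs two workers."""

        api_workers = 2

Putting the override on the one test class that really exercises concurrency keeps that requirement visible in the test itself, while every other fullstack test runs with the lighter default.
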
15:41:40 ok, let's move on to the scenario/tempest jobs
15:41:47 #topic Tempest/Scenario
15:41:57 first of all neutron-tempest-plugin-api
15:42:11 I noticed quite often that there is one test failing
15:42:16 test_dhcp_port_status_active
15:42:20 e.g.:
15:42:24 https://1973ad26b23f3d5a6239-a05b796fccac2efb122cdf71ce7f0104.ssl.cf5.rackcdn.com/763828/4/check/neutron-tempest-plugin-api/bda79c4/testr_results.html
15:42:28 or
15:42:29 https://38bbf4ec3cadfd43de08-7d0e556db3075d25d1b91bbdcc8a4562.ssl.cf2.rackcdn.com/764108/6/check/neutron-tempest-plugin-api/cc5cbc6/testr_results.html
15:42:34 I need to report that one too
15:44:38 from what I saw in the neutron-ovs-agent logs, it seems that the issue is an rpc loop iteration which takes a long time, and due to that the port does not become ACTIVE within 60 seconds
15:45:00 so one workaround for that could be to bump the timeout in that test
15:45:09 is this because the VM is not spawned?
15:45:29 but I thought that maybe ralonsoh's patch which moves sleep(0) to the end of the rpc loop iteration may help with that
15:45:40 and the second patch which lowers the number of workers in the tests
15:45:45 agree
15:45:47 is it also for the neutron-tempest-plugin-api job?
15:46:06 ralonsoh: there is not really a vm spawned in that test. It is just checking the dhcp port
15:46:20 but that port also needs to be provisioned by the L2 entity to become ACTIVE
15:48:05 it takes more than one minute to set the device UP
15:48:19 ralonsoh: yes
15:48:33 that's insane...
15:48:44 and You can see in the neutron-ovs-agent's logs that the rpc loop iteration takes about 80-90 seconds at that specific time
15:49:02 yeah
15:49:29 so I thought that patch https://review.opendev.org/c/openstack/neutron/+/755313 maybe will help with that issue
15:49:50 if that is merged and we still see the same issues, I will investigate it more
15:49:57 perfect
15:50:27 #action slaweq to check if test_dhcp_port_status_active is still failing after https://review.opendev.org/c/openstack/neutron/+/755313 is merged
15:50:40 btw. lajoskatona if You can take a look at ^^ that would be great :)
15:51:11 slaweq: I'll check it
15:51:14 lajoskatona: thx
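
For context, the workaround slaweq mentions (bumping the timeout in test_dhcp_port_status_active) amounts to raising the deadline of a wait loop like the one sketched below. The helper name and the client interface are assumptions for illustration, not the neutron-tempest-plugin code itself:

    import time


    def wait_for_port_active(client, port_id, timeout=60, interval=2):
        """Poll a port until it is ACTIVE or the timeout expires.

        With an rpc loop iteration taking 80-90 seconds on a loaded node,
        the default 60 second timeout is too tight, so the deadline is
        exposed as a parameter that the test could raise.
        """
        deadline = time.time() + timeout
        while time.time() < deadline:
            port = client.show_port(port_id)['port']
            if port['status'] == 'ACTIVE':
                return port
            time.sleep(interval)
        raise AssertionError('Port %s did not become ACTIVE within %s seconds'
                             % (port_id, timeout))
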
15:51:17 ok, let's move on
15:51:29 the next issue which I found was in neutron-ovn-tempest-ovs-release-ipv6-only
15:51:40 I saw ssh failures in that job a few times
15:51:45 https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_cac/764356/1/check/neutron-ovn-tempest-ovs-release-ipv6-only/cacd054/testr_results.html
15:51:47 https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_08c/752795/24/check/neutron-ovn-tempest-ovs-release-ipv6-only/08c6400/testr_results.html
15:53:13 in both cases it seems that even metadata wasn't reachable from the vm
15:53:30 do You know of any issues which could cause that and are already reported/in progress?
15:53:48 yes but for OVS-DPDK
15:54:00 (I think this is not related)
15:54:20 https://review.opendev.org/c/openstack/neutron/+/763745
15:57:03 ok, I will report that issue on LP and ask someone from the OVN squad to take a look at it
15:57:29 #action slaweq to report LP about SSH failures in the neutron-ovn-tempest-ovs-release-ipv6-only
15:57:46 and with that I think that's all for today
15:57:49 give me 10 secs, please. Fullstack related
15:57:50 from me
15:57:52 sure
15:57:56 liuyulong, https://review.opendev.org/c/openstack/neutron/+/738446
15:58:05 please, take a look at the replies
15:58:16 and anyone else is welcome to review it
15:58:19 thanks a lot
15:58:23 (that's all)
15:59:35 ok
15:59:40 thx for attending the meeting
15:59:43 bye!
15:59:46 see You online
15:59:48 o/
15:59:48 Bye!
15:59:50 #endmeeting