15:00:13 <slaweq> #startmeeting neutron_ci
15:00:14 <openstack> Meeting started Wed Feb 5 15:00:13 2020 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:15 <slaweq> hi
15:00:16 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:18 <openstack> The meeting name has been set to 'neutron_ci'
15:00:19 <njohnston> o/
15:01:13 <slaweq> ralonsoh: bcafarel haleyb: CI meeting, are You around?
15:01:20 <ralonsoh> hi
15:01:24 <bcafarel> o/
15:01:34 <ralonsoh> I was waiting in the wrong channel
15:01:35 <bcafarel> slaweq: thanks for the ping, I was looking for the correct window :)
15:01:40 <haleyb> slaweq: i'm in another meeting too, have one eye here :)
15:01:46 <slaweq> :)
15:01:49 <slaweq> ok, let's start
15:01:51 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:02:03 <slaweq> please open it and we can move on
15:02:05 <slaweq> #topic Actions from previous meetings
15:02:10 <slaweq> slaweq to talk with gmann about vpnaas jobs on rocky
15:02:18 <slaweq> tbh I forgot about it :/
15:02:47 <gmann> slaweq: i sent a summary of what amotoki and we discussed on the ML
15:02:49 <slaweq> maybe gmann is around now so we can ask him how to fix this issue with vpnaas rocky tempest jobs
15:03:02 <slaweq> gmann: ok, so I need to find this email then :)
15:03:42 <gmann> #link http://lists.openstack.org/pipermail/openstack-discuss/2020-January/012241.html
15:03:45 <gmann> 4th point
15:04:01 <gmann> slaweq: ping me once you are done, then we can discuss further
15:07:00 <slaweq> gmann: so basically we should backport to rocky https://review.opendev.org/#/c/695834/
15:07:10 <slaweq> or at least "partially" backport it
15:07:12 <slaweq> ?
15:07:41 <gmann> slaweq: yeah, that will be good for long term maintenance and for how the py2 EOL things are happening
15:08:27 <slaweq> ok, and this will not be a problem if we have those tests in-tree but actually run them from the neutron-tempest-plugin repo?
15:08:56 <gmann> i am fixing stable branches with stable u-c to use in the tempest tox run, which solves the issue, but backporting that is what i will suggest in case of another issue
15:09:03 <gmann> true
15:09:16 <slaweq> ok, thx for the explanation
15:09:18 <gmann> fixing current stable branches summary - http://lists.openstack.org/pipermail/openstack-discuss/2020-February/012371.html
15:09:36 <slaweq> #action slaweq to backport https://review.opendev.org/#/c/695834/ to stable branches in neutron-vpnaas
15:09:44 <gmann> FYI, all stable branches till rocky are broken now
15:10:14 <bcafarel> sigh
15:10:23 <slaweq> :/
15:11:19 <slaweq> ok, let's move on
15:11:24 <slaweq> next action was
15:11:26 <slaweq> slaweq to update grafana dashboard with missing jobs
15:11:30 <slaweq> and I also forgot about it :/
15:11:34 <slaweq> #action slaweq to update grafana dashboard with missing jobs
15:11:41 <slaweq> I will do it this week
15:12:14 <slaweq> any questions/comments on this topic?
15:12:44 <njohnston> nope
15:13:02 <ralonsoh> no
15:13:07 <slaweq> so we can move on to the next topic
15:13:08 <slaweq> #topic Stadium projects
15:13:21 <slaweq> migration to zuulv3
15:13:23 <slaweq> https://etherpad.openstack.org/p/neutron-train-zuulv3-py27drop
15:13:40 <slaweq> I was checking this etherpad a few days ago, and I even sent some small patches related to it
15:13:47 <slaweq> (but it still needs some work)
15:14:41 <slaweq> and generally we are pretty good there
15:14:57 <slaweq> most of the legacy jobs already have patches in review
15:14:57 <njohnston> +1
15:15:18 <slaweq> huge thx to bcafarel for sending many related patches :)
15:15:38 <bcafarel> np, some of them are still not working properly
15:16:02 <bcafarel> slaweq: as you know neutron-functional well, if you have some time please take a look at https://review.opendev.org/#/c/703601/
15:16:13 <bcafarel> I can't seem to convince it to install/find neutron :(
15:16:24 <bcafarel> (nothing urgent of course)
15:16:39 <slaweq> bcafarel: ok, I will take a look
15:17:39 <bcafarel> thanks :)
15:17:54 <slaweq> np
15:18:00 <slaweq> anything else related to the stadium projects' ci?
15:18:25 <njohnston> nope
15:18:49 <ralonsoh> no
15:19:27 <slaweq> ok, let's move on then
15:19:55 <slaweq> #topic Grafana
15:19:57 <slaweq> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:21:21 <slaweq> first of all, our gate jobs have not been running for a few days
15:21:44 <slaweq> and it's mostly due to the broken neutron-ovn-tempest-ovs-release job
15:22:07 <slaweq> the second issue is that we have some gap in data during last weekend and the beginning of this week
15:22:13 <slaweq> but that's probably an infra issue
15:23:21 <njohnston> Is neutron-ovn-tempest-ovs-release one of the missing jobs that needs to be updated in grafana? I can't find it.
15:23:30 <slaweq> njohnston: yes
15:23:33 <slaweq> sorry for that :/
15:23:33 <njohnston> ok
15:23:44 <njohnston> np :-)
15:24:50 <slaweq> also it seems we had some problem yesterday, as many jobs have got high numbers there
15:25:05 <slaweq> but I don't know about any specific issue from yesterday
15:26:55 <slaweq> but this spike can also be due to the low number of running jobs (or data stored) the day before
15:28:04 <slaweq> other than that I don't see anything really wrong
15:29:11 <slaweq> ok, let's move on then
15:29:13 <slaweq> #topic fullstack/functional
15:29:28 <slaweq> I have a couple of issues in fullstack tests for today
15:29:43 <slaweq> Error when connecting to the placement service (same as last week too):
15:29:50 <slaweq> https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_f66/703143/3/check/neutron-fullstack/f667c93/controller/logs/dsvm-fullstack-logs/TestPlacementBandwidthReport.test_configurations_are_synced_towards_placement_NIC-Switch-agent_/neutron-server--2020-02-04--12-09-12-759753_log.txt
15:30:03 <slaweq> maybe lajoskatona or rubasov could take a look at it
15:30:57 <slaweq> I pinged rubasov to join this meeting
15:31:05 <slaweq> maybe he will join soon
15:31:48 <rubasov> hi
15:31:54 <slaweq> hi rubasov
15:32:19 <slaweq> recently we spotted a few times an issue like in https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_f66/703143/3/check/neutron-fullstack/f667c93/controller/logs/dsvm-fullstack-logs/TestPlacementBandwidthReport.test_configurations_are_synced_towards_placement_NIC-Switch-agent_/neutron-server--2020-02-04--12-09-12-759753_log.txt
15:32:38 <slaweq> neutron-server can't connect to the (fake) placement service
15:32:51 <slaweq> did You maybe see something like that before?
15:33:07 <slaweq> or do You know why it could happen?
15:33:28 <rubasov> did not see this before
15:35:04 <slaweq> rubasov: can You try to take a look into that?
15:35:12 <rubasov> I don't really have ideas right now
15:35:17 <slaweq> not today of course :) but if You will have some time
15:35:41 <rubasov> sure, we'll look into it with lajoskatona
15:35:55 <rubasov> he wrote that fake placement service originally IIRC
15:36:05 <rubasov> how frequent is this?
15:36:39 <slaweq> it's not very frequent, I saw it once per week or something like that
15:37:07 <slaweq> rubasov: I will open an LP to track it
15:37:09 <rubasov> okay, I'll put it on my todo list
15:37:29 <slaweq> and I will send it to You and Lajos - maybe You will have some time to take a look
15:38:08 <rubasov> that's even better, thank you
15:38:15 <slaweq> #action slaweq to open LP related to fullstack placement issue
15:38:19 <slaweq> thx rubasov
15:38:32 <slaweq> ok, another issue
15:38:35 <slaweq> in https://c355270b22583c2d2af0-42801c5a43c64ea303a559bec7f7cdd7.ssl.cf5.rackcdn.com/705903/2/check/neutron-fullstack/79cf197/controller/logs/dsvm-fullstack-logs/TestBwLimitQoSOvs.test_bw_limit_qos_policy_rule_lifecycle_egress_/neutron-server--2020-02-05--11-00-19-067345_log.txt
15:38:37 <rubasov> thanks
15:38:52 <slaweq> it seems to me like neutron-server just hung and that caused the test timeout
15:39:38 <slaweq> actually not, it was also a connection issue: https://c355270b22583c2d2af0-42801c5a43c64ea303a559bec7f7cdd7.ssl.cf5.rackcdn.com/705903/2/check/neutron-fullstack/79cf197/controller/logs/dsvm-fullstack-logs/TestBwLimitQoSOvs.test_bw_limit_qos_policy_rule_lifecycle_egress_.txt
15:39:42 <slaweq> this time to neutron-server
15:40:37 <njohnston> well, if the neutron server crashed hard then both the log would end precipitously like that and ECONNREFUSED is what clients would see
15:41:05 <njohnston> if it was a hang then the client would have timeouts
15:41:55 <slaweq> njohnston: but at the end of the logs You can see
15:41:57 <slaweq> 2020-02-05 11:01:19.086 22341 DEBUG neutron.agent.linux.utils [-] Running command: ['kill', '-15', '3180'] create_process /home/zuul/src/opendev.org/openstack/neutron/neutron/agent/linux/utils.py:87
15:41:59 <slaweq> 2020-02-05 11:01:19.280 22341 DEBUG neutron.tests.fullstack.resources.process [-] Process stopped: neutron-server stop /home/zuul/src/opendev.org/openstack/neutron/neutron/tests/fullstack/resources/process.py:85
15:42:08 <slaweq> so it seems that neutron-server was properly stopped at the end
15:42:21 <slaweq> if it had crashed earlier, wouldn't this be an error?
15:42:51 <ralonsoh> I think so...
15:43:03 <njohnston> which log is that in? I don't see it in https://c355270b22583c2d2af0-42801c5a43c64ea303a559bec7f7cdd7.ssl.cf5.rackcdn.com/705903/2/check/neutron-fullstack/79cf197/controller/logs/dsvm-fullstack-logs/TestBwLimitQoSOvs.test_bw_limit_qos_policy_rule_lifecycle_egress_/neutron-server--2020-02-05--11-00-19-067345_log.txt
15:43:40 <slaweq> njohnston: it's in https://c355270b22583c2d2af0-42801c5a43c64ea303a559bec7f7cdd7.ssl.cf5.rackcdn.com/705903/2/check/neutron-fullstack/79cf197/controller/logs/dsvm-fullstack-logs/TestBwLimitQoSOvs.test_bw_limit_qos_policy_rule_lifecycle_egress_.txt
15:43:50 <slaweq> this is the "test log"
15:44:30 <njohnston> ok
15:46:00 <ralonsoh> slaweq, do you have a bug for this one?
15:46:05 <ralonsoh> I can review it later
15:46:14 <slaweq> ralonsoh: nope, I saw it only once so far, so I didn't open a bug for it
15:46:17 <slaweq> but I can
15:46:24 <ralonsoh> (at least this is not a QoS error)
15:47:00 <slaweq> I will ping You when I open an LP for that
15:47:09 <ralonsoh> thanks for the "present"
15:47:22 <slaweq> #action slaweq to open LP related to "hung" neutron-server
15:47:27 <slaweq> ralonsoh: yw :D
15:47:49 <slaweq> and that's all I have for today for functional/fullstack jobs
15:48:00 <slaweq> anything else You have maybe?
15:48:31 <ralonsoh> slaweq, https://review.opendev.org/#/c/705760/ is almost merged
15:48:51 <ralonsoh> I'll abandon https://review.opendev.org/#/c/705903/
15:49:27 <slaweq> ralonsoh: great
15:49:28 <bcafarel> so, recheck time once 705760 is in?
15:49:43 <slaweq> that should hopefully unblock our gate
15:50:12 <njohnston> excellent
15:51:11 <slaweq> ok, so we move to scenario/tempest tests now
15:51:24 <slaweq> #topic Tempest/Scenario
15:51:57 <slaweq> we already mentioned the broken neutron-ovn-tempest-ovs-release job, which should be fixed with 705760
15:53:38 <slaweq> from other issues I have one with test_show_network_segment_range https://9f9aee74b45263b2a9d8-795792c1f104e79962e44448ab55e3f1.ssl.cf1.rackcdn.com/681466/2/check/neutron-tempest-plugin-api/b340c8e/testr_results.html
15:53:49 <slaweq> and I think I saw something similar a couple of times already
15:54:05 <slaweq> I'm not sure if it was always the same test, but a similar issue for sure
15:54:09 <ralonsoh> again?
15:54:16 <ralonsoh> it is the same one
15:54:28 <bcafarel> KeyError on project_id??
15:54:38 <ralonsoh> I really don't understand why this specific key is not present
15:54:48 <ralonsoh> and this is not a trivial one, but project_id
15:55:31 <slaweq> yes, I also don't understand it
15:55:56 <slaweq> wait, actually this one was from 21.01
15:56:01 <slaweq> so maybe it's an old issue
15:56:37 <slaweq> sorry for the noise then :)
15:57:12 <ralonsoh> no, but this test error is recurrent
15:57:40 <slaweq> yes, and actually I don't understand it exactly
15:58:46 <slaweq> if You look at the code: https://github.com/openstack/neutron-tempest-plugin/blob/master/neutron_tempest_plugin/api/admin/test_network_segment_range.py#L201
15:58:59 <slaweq> it failed after checking "id", "name" and other attributes
15:59:04 <slaweq> so it's not like the dict is empty
15:59:13 <ralonsoh> one question
15:59:14 <slaweq> there is "only" project_id missing from it
15:59:21 <ralonsoh> this test is using neutron-client
15:59:28 <ralonsoh> not os-client
15:59:32 <ralonsoh> is that correct?
15:59:59 <slaweq> idk
16:00:10 <slaweq> but IMO those tests are using tempest clients, no?
16:01:02 <ralonsoh> Ok, I'll check it
16:01:09 <slaweq> thx ralonsoh
16:01:18 <slaweq> #action ralonsoh to check missing project_id issue
16:01:20 <ralonsoh> we ran out of time
16:01:26 <slaweq> ok, we are out of time today
16:01:31 <slaweq> thx for attending
16:01:33 <slaweq> o/
16:01:34 <ralonsoh> bye
16:01:35 <njohnston> \o
16:01:35 <slaweq> #endmeeting
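For reference on the project_id discussion above: a minimal sketch, assuming the failing test compares the observed network segment range from the API against expected attribute values key by key. The dict contents and the check_segment_range helper below are hypothetical illustrations, not the actual neutron-tempest-plugin code; the sketch only shows why a response that contains "id", "name" and the other attributes but omits "project_id" fails only once that key is reached, matching the KeyError seen in the job results.

```python
# Hypothetical key-by-key attribute check; not the real
# neutron_tempest_plugin/api/admin/test_network_segment_range.py code.

expected_range = {  # values are made up for illustration
    'id': 'seg-range-uuid',
    'name': 'test-range',
    'minimum': 100,
    'maximum': 200,
    'project_id': 'test-project-uuid',
}


def check_segment_range(observed, expected):
    """Compare an API response dict against expected attribute values."""
    for key, value in expected.items():
        # observed[key] raises KeyError if the API response omits the field;
        # earlier keys ('id', 'name', ...) pass, so the failure points at
        # exactly the missing attribute.
        assert observed[key] == value, "unexpected value for %s" % key


# Failure mode discussed in the meeting: every attribute is present
# except project_id.
observed_range = {k: v for k, v in expected_range.items() if k != 'project_id'}

try:
    check_segment_range(observed_range, expected_range)
except KeyError as exc:
    print("KeyError raised for missing field:", exc)  # -> 'project_id'
```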