16:00:29 <slaweq> #startmeeting neutron_ci
16:00:30 <openstack> Meeting started Tue Oct 15 16:00:29 2019 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:31 <slaweq> hi
16:00:32 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:33 <ralonsoh> hi
16:00:34 <openstack> The meeting name has been set to 'neutron_ci'
16:00:55 * slaweq is on last meeting today \o/
16:00:56 <njohnston> o/
16:02:15 <slaweq> ok, I think we can start as bcafarel will not be available today
16:02:23 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:02:27 <slaweq> please open it now
16:02:36 <slaweq> #topic Actions from previous meetings
16:02:44 <slaweq> njohnston Update the neutron-tempest-plugin dashboard in grafana
16:03:20 <njohnston> so the change has been submitted
16:03:37 <njohnston> #link https://review.opendev.org/687686
16:04:09 <njohnston> waiting for another +2
16:04:24 <ralonsoh> (you have my +1)
16:04:37 <slaweq> thx njohnston
16:04:38 <njohnston> It is dependent on this change by slaweq https://review.opendev.org/#/c/685214/
16:05:00 <slaweq> ahh, yes, I wanted to ask today for review of it
16:05:03 <ralonsoh> ^ this one, IMO, is ready to be merged
16:05:04 <slaweq> njohnston: can You?
16:05:06 <slaweq> :)
16:05:16 <njohnston> slaweq: your wish is my command. +W
16:05:35 <slaweq> LOL
16:05:37 <slaweq> thx
16:05:50 <slaweq> ok, so lets move on then
16:05:55 <slaweq> ralonsoh to check root cause of ssh issue in https://storage.gra1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_9c2/664646/6/check/tempest-ipv6-only/9c2f68f/testr_results.html.gz
16:06:15 <ralonsoh> slaweq, sorry but I didn't find the error
16:06:46 <ralonsoh> I still don't know why the IP is not given correctly to the VM
16:06:54 <ralonsoh> and I don't see any DHCP message
16:07:13 <ralonsoh> it's like this VM has no connectivity to the DHCP
16:07:28 <slaweq> but it's a single node job, right?
16:07:33 <ralonsoh> yes
16:07:51 <slaweq> so both ports (vm and dhcp) should be plugged into br-int
16:07:56 <ralonsoh> yes
16:08:30 <ralonsoh> I can give another try
16:08:33 <slaweq> there can be only two reasons IMO - either the port wasn't wired properly or the firewall rules were not configured properly in it
16:08:39 <slaweq> nono ralonsoh
16:08:41 <ralonsoh> but I spent one whole morning on this
16:08:51 <slaweq> if there is nothing in logs then You can't do anything :)
16:08:58 <slaweq> don't waste Your time on it
16:09:50 <clarkb> well usually that is an indication we need to improve our logging
16:10:07 <slaweq> clarkb: yes, I was just thinking the same
16:10:09 <clarkb> but ya update logging to cover likely cases then move on and wait for it to hit again (can also update elastic-recheck to track it for you)
16:10:18 <ralonsoh> clarkb, and that's usually the next action
16:11:23 <slaweq> ralonsoh: but do You think we can add some additional debug logs to our code to know something more about it?
16:11:46 <ralonsoh> slaweq, I'll check it and if possible, I'll submit a patch
16:11:57 <slaweq> tbh we should be able to see if ports were configured by ovs agent
16:12:22 <slaweq> and in fact IMO it was configured, otherwise neutron would not report to nova that the port is UP
16:12:32 <slaweq> and nova would not unpause the vm, right?
16:12:47 <ralonsoh> slaweq, yes, and we see it in the ovs logs
16:13:08 <slaweq> so IMO more likely this was some issue with openflow rules in br-int
16:13:36 <ralonsoh> slaweq, this is something we maybe should log better in debug/testing mode
16:14:01 <slaweq> ralonsoh: is there any way to maybe log all OF rules from bridges (br-int) every e.g. 1 second?
16:14:22 <ralonsoh> slaweq, uffff be careful
16:14:23 <slaweq> something similar to e.g. dstat: https://storage.gra1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_9c2/664646/6/check/tempest-ipv6-only/9c2f68f/controller/logs/screen-dstat.txt.gz
16:14:35 <slaweq> ralonsoh: LOL, I know, but that would help us a lot
16:14:35 <ralonsoh> this could be too much for the logs
16:14:45 <ralonsoh> but yes, we can at least track the changes
16:15:03 <slaweq> ralonsoh: right, timestamp and what flow rule was added/removed
16:15:26 <ralonsoh> exactly, but I need to check if ovs-ofctl allows this kind of monitoring
16:15:29 <slaweq> do You think this would be possible?
16:15:38 <ralonsoh> I need to check it first
16:15:46 <ralonsoh> like ovs-vsctl monitor
16:15:56 <slaweq> ralonsoh: exactly
16:16:00 <slaweq> will You do this?
16:16:04 <ralonsoh> (sure)
16:16:08 <slaweq> thx a lot
16:16:39 <slaweq> #action ralonsoh to check if there is any possibility to do something like ovsdb-monitor for openflows
16:17:05 <slaweq> thx ralonsoh for taking care of it
16:17:09 <slaweq> lets move on
16:17:18 <slaweq> next was
16:17:19 <slaweq> slaweq to prepare etherpad to track dropping py27 jobs from ci
16:17:26 <slaweq> Etherpad is there https://etherpad.openstack.org/p/neutron_drop_python_2_status but it's not ready yet
16:17:39 <slaweq> I will prepare a list of jobs to remove in each repo
16:18:10 <slaweq> I know we already started it for some repos like networking-ovn and neutron
16:18:24 <slaweq> this should be much faster than the switch to python 3 :)
16:19:08 <njohnston> just one note on this
16:19:15 <slaweq> njohnston: sure
16:19:27 <njohnston> there are some projects out there that are being more aggressive in deprecating python 2
16:19:39 <njohnston> for example, by starting the process of removing six
16:20:13 <njohnston> that level of effort should be discussed and planned out, especially for projects that get imported by other projects like neutron/neutron-lib
16:20:38 <njohnston> so while it may seem tempting, I just wanted to note for posterity's sake that such moves should be held off at this time
16:20:39 <slaweq> njohnston: personally I don't think we should do it now
16:20:57 <slaweq> 1. because of backports to stable branches which still support py2
16:21:12 <njohnston> the particular project I saw with the six removal was a widely used oslo project, and so everything I said obviously goes double for oslo
16:21:52 <slaweq> is there any "common goal" to remove such pieces of code during ussuri also?
16:22:02 <bnemec> There's discussion underway: https://etherpad.openstack.org/p/drop-python2-support
16:22:07 <bnemec> Oh, you already saw that.
16:22:09 <slaweq> or does dropping py2 support just mean that we will not test it in u/s ci?
16:22:41 <bnemec> No wait, the etherpad above is neutron-specific. So that tc one is still relevant here.
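[Editor's note on the six removal njohnston mentions above: a hedged illustration only, not taken from any actual patch, of the kind of py2/py3 compatibility shim such an effort strips out; the exact idioms removed vary per project.]

    # Before (py2/py3 compatible, depends on the six library):
    #   import six
    #   from six.moves.urllib import parse
    #   text = six.text_type(value)
    #   for key, val in six.iteritems(mapping): ...

    # After (py3 only, six dropped):
    from urllib import parse

    mapping = {'a': 1}
    value = b'neutron'

    text = value.decode('utf-8') if isinstance(value, bytes) else str(value)
    for key, val in mapping.items():
        print(key, val, parse.quote(text))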
16:22:58 <slaweq> bnemec: yes, thx for the link
16:23:05 <slaweq> I will read it later
16:23:19 <njohnston> thanks bnemec, that link was my next thing to say :-)
16:24:03 <njohnston> so to sum up: the stadium is complicated, deprecating py2 is complicated, so let's do our usual neutron thing and overcommunicate about it all :-)
16:24:16 <bnemec> lol
16:25:16 <slaweq> njohnston: what do You mean by "overcommunicate"? :)
16:28:46 <slaweq> hmm, I think we lost njohnston now
16:28:56 <slaweq> maybe we can continue with other topics then
16:28:58 <slaweq> #topic Stadium projects
16:29:07 <njohnston> I am here. Sorry, I thought you were saying that tongue-in-cheek
16:29:41 <slaweq> ahh, no
16:29:53 <slaweq> sorry, I really wanted to know what You meant exactly
16:30:07 <njohnston> I just meant that we'll talk about it in all our meetings and coordinate about it verbosely
16:30:16 <slaweq> isn't preparing some etherpad and syncing weekly about it enough?
16:30:20 <slaweq> ahh, ok
16:30:33 <slaweq> thx, so that's what I want to do with this :)
16:31:33 * njohnston is done
16:31:36 <slaweq> regarding stadium projects, I think we can remove python3 migration from this agenda as we did for the neutron meeting
16:31:41 <njohnston> agreed
16:31:54 <slaweq> so we still have the tempest-plugins migration
16:32:01 <slaweq> and there is no news about it
16:32:26 <slaweq> anything else You want to discuss regarding stadium projects' ci?
16:33:06 <slaweq> ahh, I have one more small thing
16:33:10 <slaweq> please review https://review.opendev.org/#/c/685213/ - it's needed for Train and for stadium projects
16:34:07 <slaweq> if there is nothing else related to stadium, lets move on to the next topic
16:34:09 <slaweq> #topic Grafana
16:34:19 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:35:24 <slaweq> it's a bit strange that we have so little data for the gate queue
16:35:54 <slaweq> did we really merge only a few patches in neutron last week?
16:36:01 <ralonsoh> not many merged last week
16:36:05 <ralonsoh> merges
16:36:50 <slaweq> hmm, it really could be like that
16:37:48 <slaweq> in the check queue our biggest problems IMO are still the functional and fullstack jobs
16:38:21 <njohnston> yep
16:38:30 <slaweq> I have a couple of examples of new failures there
16:39:05 <slaweq> on the positive side, I think our (voting) scenario jobs have been working quite well recently
16:39:43 <njohnston> agreed
16:40:28 <slaweq> ok, so lets talk about those functional/fullstack failures
16:40:38 <slaweq> #topic fullstack/functional
16:40:45 <slaweq> first fullstack
16:41:05 <slaweq> ralonsoh: I saw 2 new failures of our old friend: neutron.tests.fullstack.test_dhcp_agent.TestDhcpAgentHA.test_reschedule_network_on_new_agent
16:41:11 <slaweq> https://2f600b8d6843c7d64afe-bbbb707b755a08f42bfac9929d4d55b4.ssl.cf2.rackcdn.com/688439/3/check/neutron-fullstack/119e702/testr_results.html.gz
16:41:18 <slaweq> and
16:41:20 <slaweq> https://e5965e413edbbc117465-bfd96e490b07511790a1eb1aa4beb29d.ssl.cf2.rackcdn.com/665467/37/check/neutron-fullstack/4437eb3/testr_results.html.gz
16:41:40 <slaweq> IIRC You added some extra logs some time ago to debug this when it happens again, right?
16:42:10 <ralonsoh> slaweq, yes. I'll add this to my pile for tomorrow
16:42:23 <slaweq> ralonsoh: thx, I can help with this if You want
16:43:34 <slaweq> #action ralonsoh to investigate failed fullstack tests for dhcp agent rescheduling
16:43:51 <slaweq> ok, next one
16:43:52 <slaweq> neutron.tests.fullstack.test_qos.TestDscpMarkingQoSOvs
16:43:57 <slaweq> https://openstack.fortnebula.com:13808/v1/AUTH_e8fd161dc34c421a979a9e6421f823e9/zuul_opendev_logs_7c2/684457/1/check/neutron-fullstack/7c2c5d8/testr_results.html.gz
16:44:09 <slaweq> It seems like there was only one icmp packet sent and the test failed: https://openstack.fortnebula.com:13808/v1/AUTH_e8fd161dc34c421a979a9e6421f823e9/zuul_opendev_logs_7c2/684457/1/check/neutron-fullstack/7c2c5d8/controller/logs/dsvm-fullstack-logs/TestDscpMarkingQoSOvs.test_dscp_marking_packets.txt.gz
16:44:21 <slaweq> so as it is "my" test, I will debug this
16:44:24 <slaweq> ok for You?
16:44:29 <ralonsoh> ok
16:44:32 <njohnston> +1
16:44:35 <slaweq> :)
16:44:48 <slaweq> #action slaweq to investigate failed neutron.tests.fullstack.test_qos.TestDscpMarkingQoSOvs
16:45:01 <slaweq> now functional tests
16:45:04 <slaweq> neutron.tests.functional.agent.l2.extensions.test_ovs_agent_qos_extension.TestOVSAgentQosExtension.test_port_creation_with_dscp_marking(egress)
16:45:09 <slaweq> https://storage.bhs1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_692/688439/3/check/neutron-functional/69288a9/testr_results.html.gz
16:45:21 <slaweq> this one is interesting IMO
16:45:26 <slaweq> in test logs: https://storage.bhs1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_692/688439/3/check/neutron-functional/69288a9/controller/logs/dsvm-functional-logs/neutron.tests.functional.agent.l2.extensions.test_ovs_agent_qos_extension.TestOVSAgentQosExtension.test_port_creation_with_dscp_marking_egress_.txt.gz
16:45:38 <slaweq> there are a lot of "EVENT OVSNeutronAgentOSKenApp->ofctl_service GetDatapathRequest send_event" log lines
16:45:59 <slaweq> at first glance it looks to me like something was looping there infinitely
16:46:09 <njohnston> that's quite unusual
16:46:38 <slaweq> njohnston: yep
16:47:25 <slaweq> anyone want to investigate that?
16:47:34 <ralonsoh> sorry, not now
16:47:34 <slaweq> if not, I can assign it to myself
16:47:41 <ralonsoh> I have enough backlog
16:47:45 <slaweq> ralonsoh: sure, I know You are overloaded :)
16:47:57 <slaweq> ok, I will try to check that
16:48:26 <slaweq> #action slaweq to check strange "EVENT OVSNeutronAgentOSKenApp->ofctl_service GetDatapathRequest send_event" log lines in neutron.tests.functional.agent.l2.extensions.test_ovs_agent_qos_extension.TestOVSAgentQosExtension.test_port_creation_with_dscp_marking
16:48:42 <slaweq> ok, next one
16:48:51 <slaweq> this one might be interesting for njohnston :)
16:48:59 <slaweq> failed db migration tests again
16:49:01 <slaweq> https://storage.bhs1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_692/688439/3/check/neutron-functional/69288a9/controller/logs/dsvm-functional-logs/neutron.tests.functional.agent.l2.extensions.test_ovs_agent_qos_extension.TestOVSAgentQosExtension.test_port_creation_with_dscp_marking_egress_.txt.gz
16:49:10 <slaweq> but this time it's not due to a timeout or a slow node
16:49:32 <slaweq> sorry, wrong link
16:50:01 <slaweq> argh, I don't have the correct link for this one now
16:50:03 <slaweq> sorry
16:50:17 <njohnston> slaweq: np
16:50:21 <slaweq> if I find it somewhere, I will ping You
16:50:28 <njohnston> slaweq: sure thing
16:50:46 <njohnston> slaweq: also, as a lead on the ofctl error from the last bug, check out https://bugzilla.redhat.com/show_bug.cgi?id=1382372
16:50:46 <openstack> bugzilla.redhat.com bug 1382372 in openstack-neutron "Selinux is blocking ovs-vswitchd during functional tests" [Urgent,Closed: currentrelease] - Assigned to twilson
16:51:02 <njohnston> it's been seen before, as an selinux issue
16:51:44 <slaweq> njohnston: but u/s ci is running on ubuntu and I don't think selinux is enabled there
16:52:54 <njohnston> darn bugzilla, bamboozled again
16:53:17 <slaweq> lol
16:53:58 <slaweq> ok, lets move on to the last one:
16:54:00 <slaweq> neutron.tests.functional.agent.test_firewall.FirewallTestCase
16:54:02 <slaweq> https://08b11cea7395c153ac8e-9514c1b1570a8e9931b2a7d3207ef22f.ssl.cf2.rackcdn.com/684457/1/check/neutron-functional/06e9d0c/testr_results.html.gz
16:55:07 <ralonsoh> the flows were not applied?? maybe
16:55:23 <ralonsoh> (we need a flow monitor)
16:55:36 <slaweq> ralonsoh: yes
16:55:48 <slaweq> otherwise it may be hard to say what happened there
16:56:13 <slaweq> but in this case it's not only "br-int" to monitor
16:56:20 <ralonsoh> I know
16:56:21 <slaweq> each test has got its own bridge
16:56:36 <ralonsoh> every test case should be able to deploy its own monitor
16:56:42 <ralonsoh> in any bridge
16:56:53 <ralonsoh> I'll put extra effort on this
16:57:02 <slaweq> maybe we can do some simple decorator to run tests and in case of failure dump bridges from the bridge used in the test somehow?
16:57:19 <ralonsoh> this could be useful too
16:57:21 <njohnston> that would be very useful
16:57:23 <slaweq> *dump flows
16:57:40 <slaweq> anyone want to do it maybe?
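[Editor's note: a minimal sketch of the "dump flows on failure" decorator slaweq proposes above. The decorator name and the bridge_name attribute are hypothetical, it uses stdlib logging rather than whatever the real tests use, and it assumes ovs-ofctl is available on the test node; the eventual patch may look quite different.]

    import functools
    import logging
    import subprocess

    LOG = logging.getLogger(__name__)


    def dump_flows_on_failure(test_method):
        """If the wrapped functional test fails, log the OpenFlow rules of the
        bridge it used before re-raising the failure."""
        @functools.wraps(test_method)
        def wrapper(self, *args, **kwargs):
            try:
                return test_method(self, *args, **kwargs)
            except Exception:
                # 'self.bridge_name' is a hypothetical attribute; a real test
                # would need to expose whichever bridge(s) it created.
                flows = subprocess.check_output(
                    ['ovs-ofctl', 'dump-flows', self.bridge_name],
                    universal_newlines=True)
                LOG.error('Test failed, flows on %s:\n%s',
                          self.bridge_name, flows)
                raise
        return wrapper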
16:57:47 <slaweq> if not, I can
16:57:49 <slaweq> :)
16:57:56 <ralonsoh> I can check it
16:57:57 <slaweq> at least add it to my backlog
16:58:01 <slaweq> ok, thx ralonsoh
16:58:04 <ralonsoh> along with the flow monitor
16:58:27 <slaweq> #action ralonsoh to try to log flows at the end of failed functional test
16:58:41 <slaweq> ralonsoh: ^^ it's just a reminder, it's not urgent for sure :)
16:58:52 <njohnston> I'll also see if I can clear enough time to take a look
16:58:59 <slaweq> thx njohnston
16:59:03 <ralonsoh> thanks!
16:59:20 <slaweq> ok, I think we have to finish now
16:59:25 <slaweq> one last thing
16:59:31 <slaweq> please add https://review.opendev.org/#/c/685705/ to Your review pile
16:59:42 <slaweq> thx for attending
16:59:51 <slaweq> and have a great day/evening :)
16:59:53 <slaweq> bye
16:59:57 <ralonsoh> bye!
16:59:57 <slaweq> #endmeeting
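[Editor's note on the flow-monitoring #action recorded at 16:16:39: a rough sketch of one possible approach, assuming "ovs-ofctl monitor <bridge> watch:" is available on the test nodes and streams flow add/delete/modify events, much like ovsdb-client monitor does for the OVSDB. Whether this exists in the deployed OVS version and is not too verbose for the CI logs is exactly what ralonsoh agreed to check; the helper name below is illustrative only.]

    import subprocess
    import sys
    import time


    def monitor_flow_changes(bridge='br-int'):
        """Stream OpenFlow table changes for a bridge, prefixing every event
        with a timestamp so it can be correlated with the test logs."""
        proc = subprocess.Popen(
            ['ovs-ofctl', 'monitor', bridge, 'watch:'],
            stdout=subprocess.PIPE, universal_newlines=True)
        for line in proc.stdout:
            sys.stdout.write('%s %s' % (time.strftime('%H:%M:%S'), line))


    if __name__ == '__main__':
        monitor_flow_changes()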