16:00:10 <slaweq> #startmeeting neutron_ci
16:00:10 <openstack> Meeting started Tue Oct 8 16:00:10 2019 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:11 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:14 <openstack> The meeting name has been set to 'neutron_ci'
16:00:14 <slaweq> welcome back :)
16:00:36 <ralonsoh> hi
16:00:53 <njohnston> o/
16:01:00 <ralonsoh> (3 meetings in a row)
16:01:07 <slaweq> ralonsoh: yes
16:01:15 <slaweq> that's a lot
16:01:24 <slaweq> but fortunately this one is the last one
16:01:34 <slaweq> so let's start and do this quickly :)
16:01:35 <bcafarel> o/
16:01:41 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:01:54 <slaweq> please open ^^ now
16:02:01 <slaweq> #topic Actions from previous meetings
16:02:11 <slaweq> there is only one action from last week
16:02:16 <slaweq> slaweq to fix networking-bagpipe python3 scenario job and update etherpad
16:02:32 <slaweq> I did these patches, as I already mentioned on the neutron meeting
16:03:01 <slaweq> now patch https://review.opendev.org/#/c/686990/ is in the gate already - thx for reviews
16:03:18 <slaweq> and with that https://review.opendev.org/#/c/685702/ can also be merged
16:03:18 <njohnston> good catch there
16:03:28 <slaweq> njohnston: it wasn't me but CI :)
16:03:44 <bcafarel> still a good catch on the fix :)
16:03:54 <slaweq> thx
16:04:07 <slaweq> ok, let's move on
16:04:09 <slaweq> #topic Stadium projects
16:04:23 <slaweq> regarding the python 3 migration, we already talked about it 2 hours ago
16:04:57 <slaweq> but I think that now, as we are in the Ussuri cycle, we should prepare a new etherpad to track dropping of py2 jobs from the various repos
16:05:02 <slaweq> what do You think about it?
16:05:37 <ralonsoh> I'm ok with this
16:05:47 <ralonsoh> just to track it
16:06:05 <slaweq> yes, exactly
16:06:09 <ralonsoh> (btw, this should be fast)
16:06:12 <njohnston> sounds good
16:06:17 <slaweq> I will prepare it for next week
16:06:36 <slaweq> #action slaweq to prepare etherpad to track dropping py27 jobs from ci
16:06:57 <slaweq> regarding the tempest-plugins migration
16:07:15 <slaweq> we recently finished step 1 for neutron-dynamic-routing
16:07:26 <slaweq> thx tidwellr
16:07:52 <slaweq> so we are only missing the removal of tempest tests from the neutron-dynamic-routing repo
16:08:00 <bcafarel> the step 2 patch is usually easier
16:08:01 <slaweq> and we are still missing neutron-vpnaas
16:08:06 <slaweq> bcafarel: indeed
16:08:11 <lajoskatona> o/
16:09:21 <slaweq> anything else regarding stadium projects?
16:09:46 <bcafarel> maybe check with mlavalle if he still plans to work on it? the last update on the patch was some time ago now
16:09:55 <bcafarel> (regarding vpnaas)
16:10:05 <slaweq> bcafarel: IIRC he told me recently that he will work on this
16:10:23 <bcafarel> ok, so nothing to do for us then :)
16:10:25 <slaweq> but I will ping him if there is no progress in the next few weeks
16:11:31 <slaweq> ok, let's move on then
16:11:36 <slaweq> #topic Grafana
16:11:54 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:12:19 <njohnston> zero unit test failures since Saturday!
16:12:37 <njohnston> wow, we write some awesome code. :-)
16:12:44 <ralonsoh> for sure!!!
16:13:54 <slaweq> njohnston: it looks to me more like some issue with the data for the graphs
16:14:28 <njohnston> slaweq: pessimist. ;-) but how can we tell how large any discrepancy might be?
16:14:59 <slaweq> njohnston: tbh I don't know
16:15:12 <slaweq> maybe it's just that we didn't have any jobs running in the gate queue during the weekend?
16:15:26 <njohnston> slaweq: I was looking at the check queue unit test graph
16:15:27 <slaweq> because the check queue graphs look more "natural"
16:16:11 <slaweq> njohnston: ok, so for the check queue it seems that it indeed could be that all tests were passing
16:16:23 <slaweq> as there are some values in the number-of-runs graph
16:18:09 <slaweq> let's keep an eye on those graphs over the next days, and if the values are still 0 we will probably need to investigate that :)
16:19:21 <njohnston> sounds good
16:19:46 <slaweq> other than that, failure rates look quite good
16:20:00 <slaweq> I have one additional thing to add regarding grafana
16:20:17 <slaweq> we have a dashboard for neutron-tempest-plugin: http://grafana.openstack.org/d/zDcINcIik/neutron-tempest-plugin-failure-rate?orgId=1
16:20:27 <slaweq> but it's unmaintained for now
16:20:38 <njohnston> oh, neat
16:20:38 <slaweq> and as we are adding more and more jobs to this project
16:20:47 <slaweq> I think we should update this dashboard
16:21:09 <slaweq> even if we don't check it often, it may be useful to look at from time to time to see whether we have any issues there
16:21:31 <slaweq> currently I think that the fwaas scenario job is probably failing quite often
16:21:39 <slaweq> but I don't have it on the dashboard to check that for sure
16:21:52 <slaweq> any volunteer to update this dashboard?
16:22:12 <njohnston> I'll do it
16:22:22 <slaweq> njohnston: thx a lot
16:22:28 <njohnston> #action njohnston Update the neutron-tempest-plugin dashboard in grafana
16:23:10 <slaweq> that's all about grafana from my side
16:23:20 <slaweq> do You have anything else to add/ask?
16:23:29 <ralonsoh> no
16:23:39 <njohnston> no
16:23:55 <slaweq> ok, so let's move on
16:24:11 <slaweq> I was looking today for examples of failed jobs from last week
16:24:21 <slaweq> and I didn't find much to investigate
16:24:25 <slaweq> (which is good :))
16:24:35 <slaweq> I only found one such issue
16:24:43 <slaweq> #topic Tempest/Scenario
16:24:50 <slaweq> SSH failure: https://storage.gra1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_9c2/664646/6/check/tempest-ipv6-only/9c2f68f/testr_results.html.gz
16:25:07 <ralonsoh> with ipv6?
16:25:08 <slaweq> let's check now if there are any obvious reasons for the failure
16:25:20 <ralonsoh> no, ipv4
16:25:51 <slaweq> the job is ipv6-only, but IIUC this means that all services are using IPv6 to communicate
16:25:58 <slaweq> while VMs can still use IPv4
16:29:38 <slaweq> from the console-log:
16:29:40 <slaweq> udhcpc (v1.23.2) started
16:29:42 <slaweq> Sending discover...
16:29:44 <slaweq> Sending discover...
16:29:46 <slaweq> Sending discover...
16:29:48 <slaweq> Usage: /sbin/cirros-dhcpc <up|down>
16:29:50 <slaweq> No lease, failing
16:30:01 <slaweq> it seems that the fixed IP wasn't configured on the instance at all
16:30:12 <slaweq> so it could be a problem in the dhcp agent or the ovs agent
16:30:44 <ralonsoh> ovs?
16:32:08 <njohnston> I wonder if the DHCP requests ever got to their destination
16:32:38 <slaweq> njohnston: we can check that by downloading devstack.journal.xz.gz and inspecting it locally with the journalctl tool
16:38:03 <ralonsoh> slaweq, sorry, but where can we see the dnsmasq logs in devstack.journal?
16:38:24 <slaweq> ralonsoh: if You do as described in http://storage.gra1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_9c2/664646/6/check/tempest-ipv6-only/9c2f68f/controller/logs/devstack.journal.README.txt.gz
16:38:47 <slaweq> You can then check this journal log file and look for e.g. dnsmasq entries there
16:39:02 <ralonsoh> no no, I mean an example of a dnsmasq log entry
16:40:15 <slaweq> ralonsoh: I see in this file e.g. something like
16:40:16 <slaweq> Oct 07 22:23:25 ubuntu-bionic-rax-ord-0012196912 dnsmasq-dhcp[29182]: DHCPDISCOVER(tap4319b184-00) fa:16:3e:8a:0d:b8
16:40:18 <slaweq> Oct 07 22:23:25 ubuntu-bionic-rax-ord-0012196912 dnsmasq-dhcp[29182]: DHCPOFFER(tap4319b184-00) 10.1.0.10 fa:16:3e:8a:0d:b8
16:40:20 <slaweq> Oct 07 22:23:25 ubuntu-bionic-rax-ord-0012196912 dnsmasq-dhcp[29182]: DHCPREQUEST(tap4319b184-00) 10.1.0.10 fa:16:3e:8a:0d:b8
16:40:22 <slaweq> Oct 07 22:23:25 ubuntu-bionic-rax-ord-0012196912 dnsmasq-dhcp[29182]: DHCPACK(tap4319b184-00) 10.1.0.10 fa:16:3e:8a:0d:b8 host-10-1-0-10
16:40:26 <slaweq> (it's for another port for sure, but from this job)
16:42:29 <slaweq> but I don't see anything there related to the resources from this failed test
16:42:44 <slaweq> any volunteer to check that carefully?
16:42:54 <ralonsoh> slaweq, I can take it
16:43:05 <slaweq> ralonsoh: thx
16:43:29 <slaweq> #action ralonsoh to check root cause of ssh issue in https://storage.gra1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_9c2/664646/6/check/tempest-ipv6-only/9c2f68f/testr_results.html.gz
16:43:47 <slaweq> ok, that was all from my side for today
16:43:55 <slaweq> anything else You want to talk about today?
16:44:08 <njohnston> nothing from me
16:44:43 <ralonsoh> nothing
16:46:12 <slaweq> ok, thx a lot for attending
16:46:24 <slaweq> I will give You almost 15 minutes back :)
16:46:33 <slaweq> see You online and at the meeting next week
16:46:39 <slaweq> #endmeeting
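
A rough sketch of the devstack.journal inspection slaweq describes above (16:32-16:40), for anyone picking up the action item. Assumptions: devstack.journal.xz.gz has been downloaded from the job's controller/logs/ directory (next to the README linked at 16:38:24), journalctl is available locally, and the MAC address used for filtering is only the example port slaweq quoted, not the failing test's port; the linked README stays authoritative for the exact unpacking steps.

    # 1) Unpack the downloaded journal; depending on how it was saved,
    #    one or both decompression steps may be needed.
    gunzip devstack.journal.xz.gz
    xz -d devstack.journal.xz

    # 2) If the file turns out to be a journal export rather than a native
    #    journal, convert it first as the README describes; otherwise
    #    journalctl can read it directly.
    journalctl --file=devstack.journal | less

    # 3) Filter for the dnsmasq DHCP conversation of the suspect port,
    #    e.g. by the MAC or tap device name of the port the failing test created
    #    (the MAC below is just slaweq's example from 16:40:16).
    journalctl --file=devstack.journal --no-pager | grep dnsmasq-dhcp | grep 'fa:16:3e:8a:0d:b8'

If no DHCPDISCOVER ever shows up for the failing port, the request most likely never reached dnsmasq, which is the question njohnston raised at 16:32:08.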