16:00:10 <slaweq> #startmeeting neutron_ci
16:00:10 <openstack> Meeting started Tue Oct 8 16:00:10 2019 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:11 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:14 <openstack> The meeting name has been set to 'neutron_ci'
16:00:14 <slaweq> welcome back :)
16:00:36 <ralonsoh> hi
16:00:53 <njohnston> o/
16:01:00 <ralonsoh> (3 meetings in a row)
16:01:07 <slaweq> ralonsoh: yes
16:01:15 <slaweq> that's a lot
16:01:24 <slaweq> but fortunately this one is the last one
16:01:34 <slaweq> so let's start and do this quickly :)
16:01:35 <bcafarel> o/
16:01:41 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:01:54 <slaweq> please open ^^ now
16:02:01 <slaweq> #topic Actions from previous meetings
16:02:11 <slaweq> there is only one action from last week
16:02:16 <slaweq> slaweq to fix networking-bagpipe python3 scenario job and update etherpad
16:02:32 <slaweq> I did these patches, as I already mentioned on the neutron meeting
16:03:01 <slaweq> now patch https://review.opendev.org/#/c/686990/ is in the gate already - thx for reviews
16:03:18 <slaweq> and with that https://review.opendev.org/#/c/685702/ can also be merged
16:03:18 <njohnston> good catch there
16:03:28 <slaweq> njohnston: it wasn't me but CI :)
16:03:44 <bcafarel> still a good catch on the fix :)
16:03:54 <slaweq> thx
16:04:07 <slaweq> ok, let's move on
16:04:09 <slaweq> #topic Stadium projects
16:04:23 <slaweq> regarding the python 3 migration, we already talked about it 2 hours ago
16:04:57 <slaweq> but I think that now, as we are in the Ussuri cycle, we should prepare a new etherpad to track dropping of py2 jobs from the various repos
16:05:02 <slaweq> what do You think about it?
16:05:37 <ralonsoh> I'm ok with this
16:05:47 <ralonsoh> just to track it
16:06:05 <slaweq> yes, exactly
16:06:09 <ralonsoh> (btw, this should be fast)
16:06:12 <njohnston> sounds good
16:06:17 <slaweq> I will prepare it for next week
16:06:36 <slaweq> #action slaweq to prepare etherpad to track dropping py27 jobs from ci
16:06:57 <slaweq> regarding the tempest-plugins migration
16:07:15 <slaweq> we recently finished step 1 for neutron-dynamic-routing
16:07:26 <slaweq> thx tidwellr
16:07:52 <slaweq> so we are only missing the removal of tempest tests from the neutron-dynamic-routing repo
16:08:00 <bcafarel> the step 2 patch is usually easier
16:08:01 <slaweq> and we are still missing neutron-vpnaas
16:08:06 <slaweq> bcafarel: indeed
16:08:11 <lajoskatona> o/
16:09:21 <slaweq> anything else regarding stadium projects?
16:09:46 <bcafarel> maybe check with mlavalle if he still plans to work on it? the last update on the patch was some time ago now
16:09:55 <bcafarel> (regarding vpnaas)
16:10:05 <slaweq> bcafarel: IIRC he told me recently that he will work on this
16:10:23 <bcafarel> ok, so nothing to do for us then :)
16:10:25 <slaweq> but I will ping him if there is no progress in the next few weeks
16:11:31 <slaweq> ok, let's move on then
16:11:36 <slaweq> #topic Grafana
16:11:54 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:12:19 <njohnston> zero unit test failures since Saturday!
16:12:37 <njohnston> wow, we write some awesome code. :-)
16:12:44 <ralonsoh> for sure!!!
16:13:54 <slaweq> njohnston: it looks to me more like some issue with the data for the graphs
16:14:28 <njohnston> slaweq: pessimist. ;-) but how can we tell how large any discrepancy might be?
16:14:59 <slaweq> njohnston: tbh I don't know
16:15:12 <slaweq> maybe it's just that we didn't have any jobs running in the gate queue during the weekend?
16:15:26 <njohnston> slaweq: I was looking at the check queue unit test graph
16:15:27 <slaweq> because the check queue graphs look more "natural"
16:16:11 <slaweq> njohnston: ok, so for the check queue it seems that it indeed could be that all tests were passing
16:16:23 <slaweq> as there are some values in the number-of-runs graph
16:18:09 <slaweq> let's keep an eye on those graphs over the next days, and if the values are still 0 we will probably need to investigate that :)
16:19:21 <njohnston> sounds good
16:19:46 <slaweq> other than that, failure rates look quite good
16:20:00 <slaweq> I have one additional thing to add regarding grafana
16:20:17 <slaweq> we have a dashboard for neutron-tempest-plugin: http://grafana.openstack.org/d/zDcINcIik/neutron-tempest-plugin-failure-rate?orgId=1
16:20:27 <slaweq> but it's unmaintained for now
16:20:38 <njohnston> oh, neat
16:20:38 <slaweq> and as we are adding more and more jobs to this project
16:20:47 <slaweq> I think we should update this dashboard
16:21:09 <slaweq> even if we don't check it often, it may be useful to look at from time to time to see whether we have any issues there
16:21:31 <slaweq> currently I think that the fwaas scenario job is probably failing quite often
16:21:39 <slaweq> but I don't have it on the dashboard to check that for sure
16:21:52 <slaweq> any volunteer to update this dashboard?
16:22:12 <njohnston> I'll do it
16:22:22 <slaweq> njohnston: thx a lot
16:22:28 <njohnston> #action njohnston Update the neutron-tempest-plugin dashboard in grafana
16:23:10 <slaweq> that's all about grafana from my side
16:23:20 <slaweq> do You have anything else to add/ask?
16:23:29 <ralonsoh> no
16:23:39 <njohnston> no
16:23:55 <slaweq> ok, so let's move on
16:24:11 <slaweq> I was looking today for examples of failed jobs from last week
16:24:21 <slaweq> and I didn't find much to investigate
16:24:25 <slaweq> (which is good :))
16:24:35 <slaweq> I only found one such issue
16:24:43 <slaweq> #topic Tempest/Scenario
16:24:50 <slaweq> SSH failure: https://storage.gra1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_9c2/664646/6/check/tempest-ipv6-only/9c2f68f/testr_results.html.gz
16:25:07 <ralonsoh> with ipv6?
16:25:08 <slaweq> let's check now if there are any obvious reasons for the failure
16:25:20 <ralonsoh> no, ipv4
16:25:51 <slaweq> the job is ipv6-only, but IIUC this means that all services are using IPv6 to communicate
16:25:58 <slaweq> while VMs can still use IPv4
16:29:38 <slaweq> from the console-log:
16:29:40 <slaweq> udhcpc (v1.23.2) started
16:29:42 <slaweq> Sending discover...
16:29:44 <slaweq> Sending discover...
16:29:46 <slaweq> Sending discover...
16:29:48 <slaweq> Usage: /sbin/cirros-dhcpc <up|down>
16:29:50 <slaweq> No lease, failing
16:30:01 <slaweq> it seems that the fixed IP wasn't configured on the instance at all
16:30:12 <slaweq> so it could be a problem in the dhcp agent or the ovs agent
16:30:44 <ralonsoh> ovs?
16:32:08 <njohnston> I wonder if the DHCP requests ever got to their destination
16:32:38 <slaweq> njohnston: we can check that by downloading devstack.journal.xz.gz and inspecting it locally with the journalctl tool
16:38:03 <ralonsoh> slaweq, sorry, but where can we see the dnsmasq logs in devstack.journal?
16:38:24 <slaweq> ralonsoh: if You do as described in http://storage.gra1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_9c2/664646/6/check/tempest-ipv6-only/9c2f68f/controller/logs/devstack.journal.README.txt.gz
16:38:47 <slaweq> You can then check this journal log file and look for e.g. dnsmasq entries there
16:39:02 <ralonsoh> no no, I mean an example of a dnsmasq log entry
16:40:15 <slaweq> ralonsoh: I see in this file e.g. something like
16:40:16 <slaweq> Oct 07 22:23:25 ubuntu-bionic-rax-ord-0012196912 dnsmasq-dhcp[29182]: DHCPDISCOVER(tap4319b184-00) fa:16:3e:8a:0d:b8
16:40:18 <slaweq> Oct 07 22:23:25 ubuntu-bionic-rax-ord-0012196912 dnsmasq-dhcp[29182]: DHCPOFFER(tap4319b184-00) 10.1.0.10 fa:16:3e:8a:0d:b8
16:40:20 <slaweq> Oct 07 22:23:25 ubuntu-bionic-rax-ord-0012196912 dnsmasq-dhcp[29182]: DHCPREQUEST(tap4319b184-00) 10.1.0.10 fa:16:3e:8a:0d:b8
16:40:22 <slaweq> Oct 07 22:23:25 ubuntu-bionic-rax-ord-0012196912 dnsmasq-dhcp[29182]: DHCPACK(tap4319b184-00) 10.1.0.10 fa:16:3e:8a:0d:b8 host-10-1-0-10
16:40:26 <slaweq> (it's for another port for sure, but from this job)
16:42:29 <slaweq> but I don't see anything there related to the resources from this failed test
16:42:44 <slaweq> any volunteer to check that carefully?
16:42:54 <ralonsoh> slaweq, I can take it
16:43:05 <slaweq> ralonsoh: thx
16:43:29 <slaweq> #action ralonsoh to check root cause of ssh issue in https://storage.gra1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_9c2/664646/6/check/tempest-ipv6-only/9c2f68f/testr_results.html.gz
16:43:47 <slaweq> ok, that was all from my side for today
16:43:55 <slaweq> anything else You want to talk about today?
16:44:08 <njohnston> nothing from me
16:44:43 <ralonsoh> nothing
16:46:12 <slaweq> ok, thx a lot for attending
16:46:24 <slaweq> I will give You almost 15 minutes back :)
16:46:33 <slaweq> see You online and at the meeting next week
16:46:39 <slaweq> #endmeeting
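
A rough sketch of the devstack.journal inspection slaweq describes above (16:32-16:40), for anyone picking up the action item. Assumptions: devstack.journal.xz.gz has been downloaded from the job's controller/logs/ directory (next to the README linked at 16:38:24), journalctl is available locally, and the MAC address used for filtering is only the example port slaweq quoted, not the failing test's port; the linked README stays authoritative for the exact unpacking steps.

    # 1) Unpack the downloaded journal; depending on how it was saved,
    #    one or both decompression steps may be needed.
    gunzip devstack.journal.xz.gz
    xz -d devstack.journal.xz

    # 2) If the file turns out to be a journal export rather than a native
    #    journal, convert it first as the README describes; otherwise
    #    journalctl can read it directly.
    journalctl --file=devstack.journal | less

    # 3) Filter for the dnsmasq DHCP conversation of the suspect port,
    #    e.g. by the MAC or tap device name of the port the failing test created
    #    (the MAC below is just slaweq's example from 16:40:16).
    journalctl --file=devstack.journal --no-pager | grep dnsmasq-dhcp | grep 'fa:16:3e:8a:0d:b8'

If no DHCPDISCOVER ever shows up for the failing port, the request most likely never reached dnsmasq, which is the question njohnston raised at 16:32:08.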