15:00:58 <slaweq> #startmeeting neutron_ci
15:00:58 <opendevmeet> Meeting started Tue Aug 10 15:00:58 2021 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:58 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:58 <opendevmeet> The meeting name has been set to 'neutron_ci'
15:02:39 <obondarev> hi
15:02:47 <slaweq> hi
15:02:59 <slaweq> let's wait a few more minutes for lajoskatona
15:03:08 <slaweq> I know ralonsoh_ will be late for the meeting today
15:03:12 <lajoskatona> Hi
15:03:13 <slaweq> and bcafarel is on pto
15:03:15 <slaweq> hi lajoskatona
15:03:21 <slaweq> so I think we can start now
15:03:31 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:03:39 <slaweq> #topic Actions from previous meetings
15:03:44 <slaweq> slaweq to check networking-bgpvpn stable branches failures
15:03:51 <slaweq> I checked those failures
15:04:12 <slaweq> and the problem is that horizon is already EOL in queens and pike
15:04:36 <lajoskatona> as I remember your patch using the tag for horizon was not enough?
15:04:36 <slaweq> I tried to use the last eol tag from horizon in the networking-bgpvpn ci jobs
15:04:43 <slaweq> but there were some other issues as well
15:04:50 <slaweq> so I proposed http://lists.openstack.org/pipermail/openstack-discuss/2021-August/024014.html
15:06:29 <lajoskatona> We have to wait for answers if somebody still needs the branch?
15:07:39 <slaweq> lajoskatona: yes, that's the official procedure
15:07:46 <slaweq> so I wanted to wait until the end of this week
15:07:49 <lajoskatona> ok
15:09:08 <slaweq> ok, next one
15:09:23 <slaweq> bcafarel to check vpnaas failures in stable branches
15:09:32 <slaweq> He started it, and progress is tracked in https://etherpad.opendev.org/p/neutron-periodic-stable-202107
15:10:02 <slaweq> and those were all actions from last week
15:10:06 <slaweq> next topic
15:10:11 <slaweq> #topic Stadium projects
15:10:15 <slaweq> lajoskatona: any updates?
15:10:51 <lajoskatona> yeah I collected a few patches which wait for review
15:11:02 <lajoskatona> https://paste.opendev.org/show/807983/
15:11:30 <lajoskatona> I ran through them, mostly simple ones on master or some on stable branches
15:11:56 <slaweq> thx
15:12:05 <lajoskatona> I tried to get them into shape (rebase, whatever....), so please check them if you have some spare time
15:12:05 <slaweq> I will go through them tomorrow morning
15:12:15 <slaweq> for sure
15:16:09 <slaweq> I think we can move on then, right?
15:16:31 <slaweq> I will skip stable branches today as we don't have bcafarel here
15:16:59 <lajoskatona> ack
15:18:06 <slaweq> #topic Grafana
15:18:39 <slaweq> more or less it looks fine to me
15:19:10 <slaweq> do You see anything critical there?
15:20:45 <slaweq> let's move on
15:20:52 <slaweq> #topic fullstack/functional
15:21:18 <slaweq> in most cases I saw failures related to the ovn bug which obondarev already mentioned earlier today
15:21:31 <slaweq> and for which jlibosva proposed some extra logs already
15:21:39 <jlibosva> o/
15:21:47 <lajoskatona> https://bugs.launchpad.net/neutron/+bug/1938766
15:22:03 <slaweq> yes, that one
15:22:39 <jlibosva> the extra logging patch is here: https://review.opendev.org/c/openstack/neutron/+/803936
15:22:42 <slaweq> jlibosva: maybe You can try to recheck it a couple of times before we merge it?
15:22:47 <jlibosva> slaweq: yeah, I can
15:23:02 <ralonsoh> hi
15:23:06 <slaweq> or do You think it would be useful to have those logs there for the future?
15:23:12 <slaweq> hi ralonsoh :)
15:23:34 <jlibosva> I don't think it will harm to have it, it's a log message that happens once for each test
15:23:52 <slaweq> good for me
15:24:07 <slaweq> so let's merge it and we will for sure get some failure soon :)
15:26:03 <slaweq> from other issues in the functional job, I found one failure of neutron.tests.functional.agent.l3.test_ha_router.L3HATestCase.test_ipv6_router_advts_and_fwd_after_router_state_change_backup
15:26:10 <slaweq> https://a1fab4006c6a1daf82f2-bd8cbc347d913753596edf9ef5797d55.ssl.cf1.rackcdn.com/786478/17/check/neutron-functional-with-uwsgi/7250dcf/testr_results.html
15:26:27 <slaweq> and TBH I think I saw similar issues in the past already
15:26:31 <slaweq> so I will report an LP bug for that
15:26:41 <slaweq> and I will try to investigate it if I have some time
15:27:02 <slaweq> #action slaweq to report failure in test_ipv6_router_advts_and_fwd_after_router_state_change_backup functional test
15:27:15 <slaweq> or maybe do You already know that issue? :)
15:27:34 <ralonsoh> I don't, sorry
15:27:54 <obondarev> me neither
15:28:13 <slaweq> ok, I will report it and we will see :)
15:28:18 <slaweq> next topic :)
15:28:22 <slaweq> #topic Tempest/Scenario
15:28:30 <slaweq> here I just have one new issue
15:28:36 <slaweq> https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_757/803462/2/check/neutron-tempest-plugin-scenario-openvswitch/7575466/controller/logs/screen-n-api.txt
15:28:50 <slaweq> it seems that the mysql server was killed by the oom-killer
15:29:05 <ralonsoh> same as last week in fullstack
15:29:09 <slaweq> so something similar to what we see in fullstack jobs recently
15:29:14 <ralonsoh> yeah
15:29:32 <slaweq> in fullstack I thought it's because we are spawning many neutron processes at once
15:29:33 <ralonsoh> (we need bigger VMs...)
15:29:47 <slaweq> each test has its own neutron-server, agents etc
15:30:19 <lajoskatona> for tempest even decreasing concurrency will not help, or am I wrong?
15:30:30 <slaweq> but now, as I saw a similar issue in the scenario job, my question is: should we have bigger VMs or do we maybe have issues with memory consumption by our processes?
15:30:42 <slaweq> lajoskatona: yes, it won't help for sure for tempest
15:30:53 <slaweq> for now I saw it only once
15:31:09 <ralonsoh> we are always on the limit
15:31:11 <slaweq> but I wanted to raise it here so all of You will be aware of it
15:31:37 <ralonsoh> I think we have increased the memory consumption when I pushed the patches for new privsep contexts
15:31:45 <ralonsoh> that creates new daemons per context
15:31:56 <ralonsoh> that's something needed to segregate the permissions
15:32:06 <ralonsoh> but the memory consumption is the drawback
15:32:30 <slaweq> maybe we should raise this topic on the ML for a wider audience?
15:32:52 <slaweq> maybe we should try to use slightly bigger vms now?
15:33:03 <ralonsoh> that could help
15:33:11 <ralonsoh> the vms have 8GB, right?
15:33:15 <slaweq> I think so
15:33:27 <ralonsoh> 10-12 could be perfect for us
15:33:41 <ralonsoh> I'll write the mail today
15:33:48 <slaweq> I know it would help - but I don't want to raise it and get an answer like "you should optimize Your software and not always request more memory" :)
15:33:51 <ralonsoh> pointing to this conversation
15:33:56 <slaweq> You know what I mean I hope
15:33:58 <opendevreview> Mamatisa Nurmatov proposed openstack/neutron master: use payloads for FLOATING_IP https://review.opendev.org/c/openstack/neutron/+/801874
15:34:08 <slaweq> that's why I raised it here also
15:34:34 <slaweq> so maybe we should first prepare some analysis or explanation of why we think the vms should be bigger :)
15:34:40 <lajoskatona> If it is related to privsep it can be a common problem as more and more projects change to it
15:34:49 <slaweq> yes, that's true
15:35:10 <slaweq> and that is IMO a good reason why we would want to do such a change
15:35:22 <slaweq> thx ralonsoh for volunteering to write the email about it
15:35:27 <ralonsoh> yw
15:35:43 <slaweq> #action ralonsoh to send email about memory in CI vms
15:36:22 <slaweq> ok, last topic for today
15:36:28 <slaweq> #topic Periodic
15:36:46 <slaweq> here I have only one thing
15:36:55 <slaweq> our neutron-ovn-tempest-ovs-master-fedora is broken again
15:37:02 <ralonsoh> pffff
15:37:13 <slaweq> and it seems that nodes were updated to fedora 34 which isn't supported by devstack yet
15:37:19 <slaweq> https://zuul.openstack.org/build/d974a9b6e1854d21a30e0c541ff56cc4
15:37:31 <slaweq> I can check how to fix that
15:37:56 <slaweq> unless anyone else wants to work on it :)
15:38:03 <ralonsoh> I can
15:38:14 <slaweq> if You have time :)
15:38:15 <ralonsoh> I'll create a VM with f34
15:38:27 <slaweq> thx ralonsoh
15:38:28 <ralonsoh> well, I'll try to stack with a f34 vm
15:38:46 <slaweq> #action ralonsoh to check neutron-ovn-tempest-ovs-master-fedora job failures
15:38:48 <ralonsoh> is there an LP bug?
15:38:51 <ralonsoh> just asking
15:39:01 <slaweq> ralonsoh: no, there's no bug reported for that
15:39:08 <ralonsoh> perfect, I'll do it
15:39:12 <slaweq> ++
15:39:14 <slaweq> thx
15:39:36 <slaweq> that was my last topic for today
15:39:45 <slaweq> do You have anything else You want to discuss?
15:39:52 <slaweq> or if not, we can finish earlier today
15:39:54 <obondarev> for the dvr-ha check job - it's been pretty stable since last week's DVR fix - may we consider making it voting?
15:40:06 <obondarev> just to prevent further DVR-HA regressions
15:40:13 <ralonsoh> +1
15:40:41 <slaweq> https://grafana.opendev.org/d/BmiopeEMz/neutron-failure-rate?viewPanel=18&orgId=1
15:40:48 <slaweq> indeed it seems like it is more stable now
15:40:58 <slaweq> we can try that
15:41:06 <slaweq> worst case, we can always revert it :)
15:41:15 <slaweq> obondarev: will You propose that?
15:41:30 <obondarev> sounds good :) yeah I will
15:41:40 <slaweq> thx a lot
15:41:45 <obondarev> sure
15:41:52 <slaweq> #action obondarev to promote dvr-ha job to be voting
15:43:15 <slaweq> ok, so if that's all for today, let's finish it earlier
15:43:21 <slaweq> thx for attending the meeting
15:43:25 <slaweq> #endmeeting