15:00:16 <slaweq> #startmeeting neutron_ci
15:00:16 <opendevmeet> Meeting started Tue Oct 26 15:00:16 2021 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:16 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:16 <opendevmeet> The meeting name has been set to 'neutron_ci'
15:00:38 <slaweq> welcome to the CI meeting :)
15:01:36 <obondarev> hi
15:01:54 <bcafarel> o/
15:02:02 <slaweq> ping ralonsoh :)
15:02:07 <ralonsoh> sorry
15:02:24 <slaweq> ok, I think we can start
15:02:30 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:02:31 <slaweq> Please open now :)
15:02:56 <slaweq> #topic Actions from previous meetings
15:03:12 <slaweq> slaweq to check bug https://bugs.launchpad.net/neutron/+bug/1946187
15:03:22 <slaweq> I checked it
15:03:30 <slaweq> and it's in fact the same issue as in https://bugs.launchpad.net/neutron/+bug/1944201: ovs-agent is down, thus HA ports are DOWN and routers don't become primary
15:04:24 <slaweq> and about https://bugs.launchpad.net/neutron/+bug/1944201 we will talk later :)
15:04:27 <slaweq> so next one
15:04:33 <slaweq> bcafarel to check n-t-p issue in Stein branch
15:04:59 * bcafarel catches on
15:05:10 <bcafarel> it is mostly good, let me find patch
15:05:30 <bcafarel> https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/813840 fixes it (by pinning version)
15:05:56 <bcafarel> I have another patch in progress to update the excluded tests list https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/815458 as some tests are unstable in that pinned version
15:06:59 <slaweq> bcafarel: ok, please let me know when this is ready to go, I will review it
15:07:08 <mnaser_> hi folks
15:07:13 <slaweq> mnaser_: hi
15:07:19 <mnaser_> is there any good way of profiling neutron-server api calls :)
15:07:39 <mnaser_> i've got a stein deployment (yes i know :() with around ~1500 ports
15:07:42 <bcafarel> slaweq: sure! I'm running a few rechecks to see if we should filter out a few extra ones, I will ping when it's ready for review
15:07:43 <slaweq> mnaser_: we are now in the middle of the ci meeting :)
15:07:47 <mnaser_> and occasionally see api calls that take nearly 2-3s
15:07:53 <mnaser_> oh im sorry, i didnt see the topic update, /me hides
15:08:03 <mnaser_> (i saw end meeting earlier but didn't see the new start)
15:08:47 <slaweq> mnaser_: np
15:08:58 <slaweq> mnaser_: in the dev env You can use https://docs.openstack.org/neutron/latest/contributor/internals/code_profiling.html
15:09:04 <slaweq> would that be ok for You?
15:09:19 <slaweq> ok, let's get back to the ci :)
15:09:33 <slaweq> I guess we can skip stadium projects today as lajoskatona is off
15:09:43 <slaweq> so let's move on directly to the next one
15:09:47 <slaweq> #topic Stable branches
15:09:58 <slaweq> bcafarel: any updates, apart from that Stein fix?
15:10:18 <bcafarel> I have https://bugs.launchpad.net/neutron/+bug/1948804 for train
15:10:43 <bcafarel> neutron-tempest-plugin jobs fail as they do not find guestmount, seems related to the switch to minimal image
15:11:06 <slaweq> oops
15:11:18 <slaweq> so we should probably use the normal image for train jobs
15:11:26 <bcafarel> for these branches I guess just switching back to the older image should be a good enough fix
15:11:33 <slaweq> yeah
15:11:52 <bcafarel> ok I wanted to confirm before sending patch :) I will get one out
15:11:57 <slaweq> thx a lot
15:12:23 <slaweq> #action bcafarel to switch to use regular ubuntu image in the train jobs, related to https://bugs.launchpad.net/neutron/+bug/1948804
15:12:41 <bcafarel> full support branches are good, we should just make sure to clear out the backports queue before ussuri gets its last release
15:12:57 <slaweq> ahh, ussuri is going to EM soon, right?
15:13:35 <bcafarel> yes http://lists.openstack.org/pipermail/openstack-discuss/2021-October/025276.html https://etherpad.opendev.org/p/neutron-stable-ussuri-em
15:13:59 <bcafarel> planned date is 2021-11-12
15:14:30 <slaweq> thx, I will check that etherpad this week
15:15:59 <bcafarel> and that's all for stable
15:16:08 <slaweq> thx bcafarel
15:16:12 <slaweq> so let's move on
15:16:14 <slaweq> #topic Grafana
15:16:21 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:16:59 <slaweq> in general I don't see any major issues on the graphs
15:17:19 <slaweq> even fullstack/functional jobs are pretty ok this week
15:18:26 <slaweq> do You have anything regarding our dashboard or can we move on?
15:19:05 <bcafarel> looks good to me (which is nice to say)
15:19:25 <slaweq> ok, so let's talk about some specific jobs
15:19:30 <slaweq> #topic fullstack/functional
15:19:43 <slaweq> here I found 2 failures which looked pretty similar:
15:19:47 <slaweq> https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_eff/815280/2/check/neutron-functional-with-uwsgi/eff69a3/testr_results.html
15:19:48 <slaweq> https://02ffbd184b4d2054d841-93a4b6f56916009ea1b2d500657cc17f.ssl.cf1.rackcdn.com/802037/4/check/neutron-functional-with-uwsgi/9be975c/testr_results.html
15:20:15 <slaweq> does anyone want to investigate that?
15:20:42 <ralonsoh> I'll do it
15:20:49 <ralonsoh> is there a LP bug?
15:20:49 <slaweq> ralonsoh++ thx
15:20:59 <slaweq> no, I didn't open a bug yet
15:21:01 <ralonsoh> I'll create one
15:21:03 <slaweq> thx
15:21:22 <slaweq> #action ralonsoh to open LP bug and check functional failure with missing snat namespace
15:21:33 <slaweq> #topic Tempest/Scenario
15:21:49 <slaweq> and that's the main topic for today :)
15:22:03 <slaweq> because we still have https://bugs.launchpad.net/neutron/+bug/1944201 which is killing our CI
15:22:26 <slaweq> generally about 95% of ovs scenario jobs are failing due to that issue
15:22:35 <slaweq> ralonsoh has some idea how to improve that
15:22:38 <ralonsoh> https://review.opendev.org/c/openstack/neutron/+/815459
15:23:43 <ralonsoh> and CI is passing
15:23:46 <slaweq> thx ralonsoh
15:23:49 <slaweq> I +2'd it
15:24:00 <slaweq> obondarev: and I also replied to Your comment there
15:24:08 <ralonsoh> I did too
15:24:10 <slaweq> neutron-ovs-agent is not using greenthreads AFAIR
15:24:13 <ralonsoh> it is
15:24:14 <slaweq> so it should be ok IMO
15:24:19 <ralonsoh> we monkey patch all agents
15:24:24 <slaweq> ahh
15:24:33 <ralonsoh> but I replied to the comment
15:24:45 <ralonsoh> there is one single thread executing the ovs agent code
15:25:01 <ralonsoh> other threads are attending rpc events, ovsdb updates or OF events
15:25:33 <slaweq> ahh, right, things like fdb_add/remove are processed by separate greenthread workers
15:25:41 <ralonsoh> exactly
15:25:43 <slaweq> or port/network_update notifications
15:25:45 <slaweq> ok
15:26:36 <slaweq> ralonsoh: I rechecked it to see the results once again :)
15:26:40 <slaweq> I hope You don't mind
15:26:42 <ralonsoh> perfect
15:27:18 <slaweq> I hope that this will help
15:27:26 <slaweq> for now let's move on
15:27:37 <obondarev> ralonsoh: I'll check your reply, thanks
15:27:52 <slaweq> as a follow-up from the PTG discussions, I proposed an improvement to our CI today: https://review.opendev.org/c/openstack/neutron/+/815465
15:28:40 <obondarev> ralonsoh: aren't rpc threads using os_ken?
15:29:03 <ralonsoh> obondarev, no, only main thread. But we can talk later
15:29:10 <obondarev> sure, thanks
15:29:17 <ralonsoh> slaweq, just one comment
15:29:18 <ralonsoh> "neutron-ovn-tempest-slow" is multinode but "neutron-ovn-tempest-ipv6" is single node
15:29:31 <slaweq> ralonsoh: yes, I just noticed that
15:29:33 <ralonsoh> but I think the patch is valid
15:29:41 <slaweq> we can make neutron-ovn-tempest-ipv6 a multinode job
15:29:47 <slaweq> and we will still have fewer jobs :)
15:29:47 <ralonsoh> perfect
15:29:51 <ralonsoh> exactly
15:30:03 <ralonsoh> +1 to this idea
15:30:13 <slaweq> ok, I will update my patch and ping You to review it again
15:30:30 <slaweq> obondarev: bcafarel please also review it and tell me what You think about it
15:30:45 <obondarev> sure
15:30:46 <bcafarel> slaweq: sure, already in my to-review tabs :)
15:30:51 <slaweq> thx
15:31:23 <slaweq> it should remove 4 jobs from our check queue and 2 jobs from gate
15:31:31 <slaweq> so IMHO good improvement :)
15:31:40 <ralonsoh> of course!
15:32:04 <slaweq> ok, moving on
15:32:25 <slaweq> while checking today's CI results I noticed that in some cases the size of the neutron logs is insane
15:32:42 <slaweq> e.g. the linuxbridge agent log in https://94d5d118ec3db75721c2-a00e37315b6784119b950c4b112ef30c.ssl.cf2.rackcdn.com/807687/4/gate/neutron-tempest-plugin-scenario-linuxbridge/b23d411/controller/logs/screen-q-agt.txt
15:32:48 <slaweq> which is about 200 MB
15:33:02 <ralonsoh> the iptables debug mode
15:33:18 <ralonsoh> that adds a huge amount of extra output to the log file
15:33:47 <slaweq> ahh, right
15:33:56 <slaweq> we enabled it some time ago in other agents too IIRC
15:34:03 <slaweq> ok
15:34:38 <slaweq> that's all I had for today regarding scenario jobs
15:34:44 <slaweq> #topic On Demand
15:34:51 <slaweq> anything else You want to discuss today?
15:34:57 <ralonsoh> I'm ok
15:35:51 <slaweq> ok, 2 quick things from me
15:36:13 <slaweq> as a follow-up after the PTG, please expect a doodle asking about the time slot for this meeting
15:36:19 <slaweq> I will try to send it this week
15:36:26 <slaweq> I will send email to the ML
15:36:32 <slaweq> and second thing:
15:36:41 <slaweq> also as a result of the CI discussion
15:36:46 <slaweq> *ptg discussion
15:36:59 <slaweq> next week we can do the meeting on video, wdyt?
15:37:18 <obondarev> +
15:37:22 <slaweq> and keep IRC open the whole time to record action items, etc.
15:37:34 <ralonsoh> +1 to video calls
15:37:49 <slaweq> #action slaweq to prepare doodle with question about new time slot for CI meetings
15:38:11 <ralonsoh> will be nice to have at least one video call per week
15:38:17 <slaweq> #action slaweq to prepare ci meeting on tue 2.11 as video call
15:38:55 <slaweq> and that's all from me for today
15:39:15 <slaweq> if You don't have anything else, I think we can finish earlier today
15:39:29 <bcafarel> sounds good for both (well for all 3 topics with "finish earlier")
15:39:35 <slaweq> LOL
15:39:45 <slaweq> thx for attending the meeting today
15:39:47 <ralonsoh> bye!
15:39:50 <slaweq> and see You all online
15:39:53 <bcafarel> o/
15:39:55 <slaweq> #endmeeting