15:00:16 <slaweq> #startmeeting neutron_ci
15:00:16 <opendevmeet> Meeting started Tue Oct 26 15:00:16 2021 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:16 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:16 <opendevmeet> The meeting name has been set to 'neutron_ci'
15:00:38 <slaweq> welcome on the CI meeting :)
15:01:36 <obondarev> hi
15:01:54 <bcafarel> o/
15:02:02 <slaweq> ping ralonsoh :)
15:02:07 <ralonsoh> sorry
15:02:24 <slaweq> ok, I think we can start
15:02:30 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:02:31 <slaweq> Please open now :)
15:02:56 <slaweq> #topic Actions from previous meetings
15:03:12 <slaweq> slaweq to check bug https://bugs.launchpad.net/neutron/+bug/1946187
15:03:22 <slaweq> I checked it
15:03:30 <slaweq> and It's in fact the same issue like in https://bugs.launchpad.net/neutron/+bug/1944201 - ovs-agent is down thus HA ports are DOWN and routers aren't becoming to be primary
15:04:24 <slaweq> and about https://bugs.launchpad.net/neutron/+bug/1944201 we will talk later :)
15:04:27 <slaweq> so next one
15:04:33 <slaweq> bcafarel to check n-t-p issue in Stein branch
15:04:59 * bcafarel catches on
15:05:10 <bcafarel> it is mostly good, let me find patch
15:05:30 <bcafarel> https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/813840 fixes it (by pinning version)
15:05:56 <bcafarel> I have another patch in progress to update the excluded tests list https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/815458 as some tests are unstable in that pinned version
15:06:59 <slaweq> bcafarel: ok, please let me know when this will be ready to go, I will review it
15:07:08 <mnaser_> hi folks
15:07:13 <slaweq> mnaser_: hi
15:07:19 <mnaser_> is there any good way of profiling neutron-server api calls :)
15:07:39 <mnaser_> i've got a stein deployment (yes i know :() with around ~1500 ports
15:07:42 <bcafarel> slaweq: sure! running a few rechecks to see if we should filter out a few extra ones, I will ping when ready for review)
15:07:43 <slaweq> mnaser_: we are now in the middle of the ci meeting :)
15:07:47 <mnaser_> and occasionally see api calls that take nearly 2-3s
15:07:53 <mnaser_> oh im sorry, i didnt see the topic update, /me hides
15:08:03 <mnaser_> (i saw end meeting earlier but didntsee the new start)
15:08:47 <slaweq> mnaser_: np
15:08:58 <slaweq> mnaser_: in the dev env You can use https://docs.openstack.org/neutron/latest/contributor/internals/code_profiling.html
15:09:04 <slaweq> would that be ok for You?
15:09:19 <slaweq> ok, let's get back to the ci :)
15:09:33 <slaweq> I guess we can skip stadium projects today as lajoskatona is off
15:09:43 <slaweq> so lets move on directly to the next one
15:09:47 <slaweq> #topic Stable branches
15:09:58 <slaweq> bcafarel: any updates, except that Stein fix
15:10:18 <bcafarel> I have https://bugs.launchpad.net/neutron/+bug/1948804 for train
15:10:43 <bcafarel> neutron-tempest-plugin jobs fail as they do not find guestmount, seems related to the switch to minimal image
15:11:06 <slaweq> ups
15:11:18 <slaweq> so we should use normal image for train jobs probably
15:11:26 <bcafarel> with these branches I guess just switching back to older image should be good enough fix
15:11:33 <slaweq> yeah
15:11:52 <bcafarel> ok I wanted to confirm before sending patch :) I will get one out
15:11:57 <slaweq> thx a lot
15:12:23 <slaweq> #action bcafarel to switch to use regular ubuntu image in the train jobs, related to https://bugs.launchpad.net/neutron/+bug/1948804
15:12:41 <bcafarel> full support branches are good, we should just watch out to clear out the backports queue before ussuri gets its last release
15:12:57 <slaweq> ahh, ussuri is going to EM soon, right?
15:13:35 <bcafarel> yes http://lists.openstack.org/pipermail/openstack-discuss/2021-October/025276.html https://etherpad.opendev.org/p/neutron-stable-ussuri-em
15:13:59 <bcafarel> planned date is 2021-11-12
15:14:30 <slaweq> thx, I will check that etherpad this week
15:15:59 <bcafarel> and that's all for stable
15:16:08 <slaweq> thx bcafarel
15:16:12 <slaweq> so let's move on
15:16:14 <slaweq> #topic Grafana
15:16:21 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:16:59 <slaweq> in general I don't see any major issues on the graphs
15:17:19 <slaweq> even fullstack/functional jobs are pretty ok this week
15:18:26 <slaweq> do You have anything regarding our dashboard or can we move on?
15:19:05 <bcafarel> looks good to me (which is nice to say)
15:19:25 <slaweq> ok, so lets talk about some specific jobs
15:19:30 <slaweq> #topic fullstack/functional
15:19:43 <slaweq> here I found 2 failures which looked pretty similar:
15:19:47 <slaweq> https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_eff/815280/2/check/neutron-functional-with-uwsgi/eff69a3/testr_results.html
15:19:48 <slaweq> https://02ffbd184b4d2054d841-93a4b6f56916009ea1b2d500657cc17f.ssl.cf1.rackcdn.com/802037/4/check/neutron-functional-with-uwsgi/9be975c/testr_results.html
15:20:15 <slaweq> anyone wants to investigate that?
15:20:42 <ralonsoh> I'll do it
15:20:49 <ralonsoh> is there a LP bug?
15:20:49 <slaweq> ralonsoh++ thx
15:20:59 <slaweq> no, I didn't opened bug yet
15:21:01 <ralonsoh> I'll create one
15:21:03 <slaweq> thx
15:21:22 <slaweq> #action ralonsoh to open LP bug and check functional failure with missing snat namespace
15:21:33 <slaweq> #topic Tempest/Scenario
15:21:49 <slaweq> and that's main topic for today :)
15:22:03 <slaweq> because we still have https://bugs.launchpad.net/neutron/+bug/1944201 which is killing our CI
15:22:26 <slaweq> generally about 95% of ovs scenario jobs are failing due to that issue
15:22:35 <slaweq> ralonsoh: has some idea how to improve that
15:22:38 <ralonsoh> https://review.opendev.org/c/openstack/neutron/+/815459
15:23:43 <ralonsoh> and CI is passing
15:23:46 <slaweq> thx ralonsoh
15:23:49 <slaweq> I +2 it
15:24:00 <slaweq> obondarev: and I also replied to Your comment there
15:24:08 <ralonsoh> I did too
15:24:10 <slaweq> neutron-ovs-agent is not using greenthreads AFAIR
15:24:13 <ralonsoh> it is
15:24:14 <slaweq> so it should be ok IMO
15:24:19 <ralonsoh> we monkey patch all agents
15:24:24 <slaweq> ahh
15:24:33 <ralonsoh> but I replied to the comment
15:24:45 <ralonsoh> there is one single thread executing the ovs agent code
15:25:01 <ralonsoh> other threads are attending rpc events, ovsdb updates or OF events
15:25:33 <slaweq> ahh, right, things like fdb_add/remove are processed by separate greenthread workers
15:25:41 <ralonsoh> exactly
15:25:43 <slaweq> or port/network_update notifications
15:25:45 <slaweq> ok
15:26:36 <slaweq> ralonsoh: I rechecked it to see once again results :)
15:26:40 <slaweq> I hope You don't mind
15:26:42 <ralonsoh> perfect
15:27:18 <slaweq> I hope that this will help
15:27:26 <slaweq> for now lets move on
15:27:37 <obondarev> ralonsoh: I'll check your reply, thanks
15:27:52 <slaweq> as follow up from the ptg discussions I proposed today improvement of our CI: https://review.opendev.org/c/openstack/neutron/+/815465
15:28:40 <obondarev> ralonsoh: aren't rpc threads using os_ken?
15:29:03 <ralonsoh> obondarev, no, only main thread. But we can talk later
15:29:10 <obondarev> sure, thanks
15:29:17 <ralonsoh> slaweq, just one comment
15:29:18 <ralonsoh> "neutron-ovn-tempest-slow" is multinode but "neutron-ovn-tempest-ipv6" is single node
15:29:31 <slaweq> ralonsoh: yes, I just noticed that
15:29:33 <ralonsoh> but I think the patch is valid
15:29:41 <slaweq> we can make neutron-ovn-tempest-ipv6 to be multinode job
15:29:47 <slaweq> and still we will have less jobs :)
15:29:47 <ralonsoh> perfect
15:29:51 <ralonsoh> exactly
15:30:03 <ralonsoh> +1 to this idea
15:30:13 <slaweq> ok, I will update my patch and ping You to review it again
15:30:30 <slaweq> obondarev: bcafarel please also review it and tell me what do You think about it
15:30:45 <obondarev> sure
15:30:46 <bcafarel> slaweq: sure, already in my to-review tabs :)
15:30:51 <slaweq> thx
15:31:23 <slaweq> it should remove 4 jobs from our check queue and 2 jobs from gate
15:31:31 <slaweq> so IMHO good improvement :)
15:31:40 <ralonsoh> of course!
15:32:04 <slaweq> ok, moving on
15:32:25 <slaweq> during checking today ci results I noticed that in some cases size of neutron logs is insane
15:32:42 <slaweq> like e.g. linuxbridge log in https://94d5d118ec3db75721c2-a00e37315b6784119b950c4b112ef30c.ssl.cf2.rackcdn.com/807687/4/gate/neutron-tempest-plugin-scenario-linuxbridge/b23d411/controller/logs/screen-q-agt.txt
15:32:48 <slaweq> which is about 200 MB
15:33:02 <ralonsoh> the iptables debug mode
15:33:18 <ralonsoh> that will add a huge extra to the log file
15:33:47 <slaweq> ahh, right
15:33:56 <slaweq> we enabled it some time ago in other agents too IIRC
15:34:03 <slaweq> ok
15:34:38 <slaweq> that's all what I had for today regarding scenario jobs
15:34:44 <slaweq> #topic On Demand
15:34:51 <slaweq> anything else You want to discuss today?
15:34:57 <ralonsoh> I'm ok
15:35:51 <slaweq> ok, 2 quick things from me
15:36:13 <slaweq> as follow up after ptg, please expect some doodle with question about time slot for that meeting
15:36:19 <slaweq> I will try to send it this week
15:36:26 <slaweq> I will send email to the ML
15:36:32 <slaweq> and second thing:
15:36:41 <slaweq> also as a result of the CI discussion
15:36:46 <slaweq> *ptg discussion
15:36:59 <slaweq> next week we can do meeting on video - wdyt?
15:37:18 <obondarev> +
15:37:22 <slaweq> and keep irc opened for all the time to have action items, etc.
15:37:34 <ralonsoh> +1 to video calls
15:37:49 <slaweq> #action slaweq to prepare doodle with question about new time slot for CI meetings
15:38:11 <ralonsoh> will be nice to have at least one video call per week
15:38:17 <slaweq> #action slaweq to prepare ci meeting on tue 2.11 as video call
15:38:55 <slaweq> and that's all from me for today
15:39:15 <slaweq> if You don't have anything else, I think we can finish earlier today
15:39:29 <bcafarel> sounds good for both (well for all 3 topics with "finish earlier")
15:39:35 <slaweq> LOL
15:39:45 <slaweq> thx for attending meeting today
15:39:47 <ralonsoh> bye!
15:39:50 <slaweq> and see You all online
15:39:53 <bcafarel> o/
15:39:55 <slaweq> #endmeeting