15:00:37 <slaweq> #startmeeting neutron_ci
15:00:38 <openstack> Meeting started Tue Mar  9 15:00:37 2021 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:39 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:41 <openstack> The meeting name has been set to 'neutron_ci'
15:01:12 <lajoskatona> Hi
15:01:52 <ralonsoh> hi
15:01:56 <bcafarel> hey again
15:02:43 <slaweq> ok, let's start
15:02:50 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:03:15 <slaweq> #topic Actions from previous meetings
15:03:21 <slaweq> slaweq to check failing qos migration tests in train neutron-tempest-dvr-ha-multinode-full job
15:03:26 <slaweq> Bug reported for nova for now https://bugs.launchpad.net/nova/+bug/1917610
15:03:27 <openstack> Launchpad bug 1917610 in neutron "Migration and resize tests from tempest.scenario.test_minbw_allocation_placement.MinBwAllocationPlacementTest failing in neutron-tempest-dvr-ha-multinode-full" [Critical,Fix released]
15:03:28 <slaweq> Fixed in tempest https://review.opendev.org/c/openstack/tempest/+/778451
15:03:31 <slaweq> thx gibi for help with it :)
15:03:44 <slaweq> next one
15:03:50 <slaweq> ralonsoh to try to check how to limit number of logged lines in FT output
15:04:13 <ralonsoh> still checking this one, no progress yet
15:04:15 <ralonsoh> sorry
15:04:20 <slaweq> sure, np
15:04:32 <slaweq> can I assign it to You for next week?
15:05:13 <ralonsoh> sure
15:05:18 <slaweq> #action ralonsoh to try to check how to limit number of logged lines in FT output
15:05:20 <slaweq> thx
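(For context on the action item above: one possible direction, purely as a sketch, is to cap how many records the test run's log handler will emit. The filter name and the 10000-line limit below are invented for illustration and are not what was eventually proposed.)

    import logging

    class LineCapFilter(logging.Filter):
        """Drop log records once a fixed number have been emitted.

        Hypothetical helper, only to illustrate one way of capping
        functional-test log output; the name and limit are made up here.
        """

        def __init__(self, max_lines=10000):
            super().__init__()
            self._remaining = max_lines

        def filter(self, record):
            if self._remaining <= 0:
                return False
            self._remaining -= 1
            return True

    # Attach the filter to the handler so it caps everything that handler emits.
    handler = logging.StreamHandler()
    handler.addFilter(LineCapFilter(max_lines=10000))
    logging.getLogger().addHandler(handler)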
15:05:27 <slaweq> next one
15:05:29 <slaweq> ralonsoh to report bug with ip operations timeout in FT
15:05:36 <ralonsoh> one sec...
15:05:54 <ralonsoh> one patch: https://review.opendev.org/c/openstack/neutron/+/778735
15:06:01 <ralonsoh> LP: https://launchpad.net/bugs/1917487
15:06:02 <openstack> Launchpad bug 1917487 in neutron "[FT] "IpNetnsCommand.add" command fails frequently " [Critical,New] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
15:06:32 <ralonsoh> still working on 2) timeouts during the sysctl command execution
15:07:18 <slaweq> thx for that
15:07:31 <slaweq> I hope that with https://review.opendev.org/c/openstack/neutron/+/778735 functional tests will be a bit more stable
15:07:41 <bcafarel> that would be nice
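(As a rough illustration of the kind of guard the bug above is about: wrapping the netns creation in a timeout plus retry. This is a generic sketch only; the real neutron code goes through privsep/pyroute2 rather than shelling out, and the function name, attempt count and timeout are arbitrary.)

    import subprocess
    import time

    def add_netns_with_retry(name, attempts=3, timeout=10):
        """Create a network namespace, retrying if the command hangs or fails.

        Illustrative sketch only (needs root to actually succeed).
        """
        for attempt in range(1, attempts + 1):
            try:
                subprocess.run(["ip", "netns", "add", name],
                               check=True, timeout=timeout)
                return
            except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
                if attempt == attempts:
                    raise
                time.sleep(1)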
15:08:53 <slaweq> next one
15:08:54 <slaweq> bcafarel to check failing fedora based periodic job
15:09:17 <bcafarel> so we had in fact a LP for that https://bugs.launchpad.net/neutron/+bug/1911128
15:09:18 <openstack> Launchpad bug 1911128 in neutron "Neutron with ovn driver failed to start on Fedora" [Critical,In progress] - Assigned to Bernard Cafarelli (bcafarel)
15:09:47 <bcafarel> I think the main issue is that the ovs daemons do not run as root in Fedora, and so cannot read the TLS certs (owned by stack)
15:10:09 <bcafarel> I am testing this in https://review.opendev.org/c/openstack/neutron/+/779494 (could have had results if I had modified the correct job on first try...)
15:10:35 <bcafarel> if it passes, it sounds like a good fix, we can have fedora+tls support added later, what do you think?
15:10:52 <bcafarel> oh actually it passed zuul
15:11:00 <slaweq> that would be ok as workaround at least IMO
15:11:05 <slaweq> yes, it's green now
15:11:40 <bcafarel> yes, for proper support I am not sure how it would go in devstack, as doing "chmod 777" on the certs is not really a nice fix :)
15:12:05 <slaweq> yes, but I think that it's perfectly valid to test it without ssl in that job
15:12:21 <slaweq> we don't really want to test ovs on fedora in that job
15:12:23 <slaweq> but neutron
15:12:27 <slaweq> :)
15:12:29 <bcafarel> +1
15:12:45 <slaweq> if ralonsoh and lajoskatona are ok with that, I'm ok too
15:12:49 <ralonsoh> +1
15:12:58 <lajoskatona> +1
15:13:03 <bcafarel> ok I will remove the "wip" flag and then periodic can go back to green
15:13:29 <lajoskatona> 1 less periodic failure mail then?
15:13:34 <slaweq> ++
15:13:36 <bcafarel> cross fingers :)
15:13:38 <slaweq> thx a lot
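(A rough way to confirm the root cause described above, i.e. that the non-root ovs/ovn daemon user cannot read certs owned by the stack user. This only looks at mode bits, ignores ACLs and secondary groups, and the path and username in the example call are hypothetical, not what devstack actually uses.)

    import os
    import pwd
    import stat

    def can_read(path, username):
        """Rough check whether a user could read a file, based on mode bits only."""
        st = os.stat(path)
        user = pwd.getpwnam(username)
        if st.st_uid == user.pw_uid:
            return bool(st.st_mode & stat.S_IRUSR)
        if st.st_gid == user.pw_gid:  # primary group only
            return bool(st.st_mode & stat.S_IRGRP)
        return bool(st.st_mode & stat.S_IROTH)

    # e.g. can_read("/opt/stack/data/CA/int-ca/private/cert.key", "openvswitch")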
15:13:56 <slaweq> lajoskatona: do You get emails about periodic jobs results?
15:15:07 <lajoskatona> yes, but recently too many
15:15:15 <slaweq> how to configure that?
15:15:21 <slaweq> I don't get such emails :/
15:15:28 <lajoskatona> I only check them if there are a few networking related ones
15:15:36 <lajoskatona> I will check it for you
15:15:42 <slaweq> thx
15:16:14 <slaweq> ok, lets move on
15:16:21 <slaweq> #topic Stadium projects
15:16:30 <slaweq> anything related to the stadium projects' CI?
15:17:00 <lajoskatona> I think this is where you can subscribe: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-stable-maint
15:17:45 <lajoskatona> For stadiums: some patches are moving to the stable branches, nothing serious
15:17:55 <slaweq> thx lajoskatona
15:20:06 <slaweq> ok, thx
15:20:13 <slaweq> #topic Stable branches
15:20:27 <slaweq> bcafarel: except recent pip issue, anything else worth mentioning?
15:21:02 <bcafarel> rest is mostly OK, I saw more grenade failures/timeouts than usual recently but not too bad (yet)
15:21:11 <slaweq> k
15:21:16 <bcafarel> and thanks slaweq for all the CI improvement backports, they should help stable branches too!
15:21:33 <slaweq> yes, I made some but only up to train
15:21:40 <slaweq> in older branches we have many legacy jobs
15:21:50 <slaweq> and it would be too much to backport those things
15:22:55 <bcafarel> sounds good, and for older EM branches we can limit the jobs if they get problematic
15:23:12 <slaweq> yeah
15:23:13 <bcafarel> and though stein needs some rechecks from time to time, rocky and queens are quite stable these days
15:24:33 <slaweq> let's move on
15:24:35 <slaweq> #topic Grafana
15:24:40 <slaweq> grafana.openstack.org/dashboard/db/neutron-failure-rate
15:25:15 <slaweq> overall I think that things are pretty ok now
15:25:25 <slaweq> still functional/fullstack jobs are failing most
15:25:37 <slaweq> but they also went down a bit since last week
15:25:57 <slaweq> maybe it's due to mocking out the ovn maintenance task there
15:26:22 <slaweq> do You have anything regarding grafana dashboards for today?
15:28:08 <slaweq> ok, so lets talk about functional jobs then
15:28:10 <slaweq> #topic fullstack/functional
15:28:17 <slaweq> I have few things there
15:28:33 <slaweq> first one is interesting (for me) issue
15:28:50 <slaweq> I proposed a patch some time ago to limit the number of test workers in the functional job
15:28:59 <slaweq> https://review.opendev.org/c/openstack/neutron/+/778151
15:29:09 <slaweq> and now I see that this job is failing
15:29:23 <slaweq> and many tests failed due to the "too many open files" error
15:29:29 <slaweq> https://0bf054d7c7210f57ced8-38841c8dd9732a175234859ce574a8ea.ssl.cf5.rackcdn.com/778151/3/check/neutron-functional-with-uwsgi/6757358/testr_results.html
15:29:38 <slaweq> I have no idea why it is like that
15:29:45 <ralonsoh> should be the opposite...
15:29:48 <slaweq> do You maybe have any clues?
15:29:51 <slaweq> ralonsoh: exactly :)
15:30:09 <slaweq> but it's repeatable
15:30:17 <slaweq> I rechecked a few times and had the same problem
15:30:37 <lajoskatona> but only with zuul?
15:30:45 <lajoskatona> I have never seen it locally
15:31:02 <slaweq> I didn't try to run all functional tests locally
15:32:48 <ralonsoh> I need to review that, I have no idea why this is happening
15:32:55 <slaweq> so, any help with that is more than welcome :)
15:33:02 <slaweq> thx ralonsoh
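(One speculative explanation for the counterintuitive failure above: with fewer stestr workers, each worker process runs more tests, so any leaked file descriptors accumulate in fewer processes and hit the per-process limit sooner. A quick way to see where a worker process stands, as a sketch, Linux-only because of /proc:)

    import os
    import resource

    # Per-process view of the file-descriptor budget of the current process.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    open_fds = len(os.listdir("/proc/self/fd"))
    print(f"open fds: {open_fds}, soft limit: {soft}, hard limit: {hard}")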
15:34:13 <slaweq> I also reported 2 new bugs
15:34:15 <slaweq> https://bugs.launchpad.net/neutron/+bug/1917487
15:34:16 <openstack> Launchpad bug 1917487 in neutron "[FT] "IpNetnsCommand.add" command fails frequently " [Critical,New] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
15:34:18 <slaweq> sorry
15:34:24 <slaweq> this one was reported by ralonsoh :)
15:34:35 <slaweq> I just found a new occurrence of that issue this week
15:34:37 <slaweq> :)
15:34:50 <slaweq> but I opened new bug https://bugs.launchpad.net/neutron/+bug/1918266
15:34:50 <openstack> Launchpad bug 1918266 in neutron "Functional test test_gateway_chassis_rebalance failing due to "failed to bind logical router"" [High,Confirmed]
15:35:03 <slaweq> any volunteer to check that?
15:35:20 <slaweq> if not, I will ask jlibosva or otherwiseguy if they have some cycles to look
15:35:28 <jlibosva> o/
15:35:33 <ralonsoh> sorry, not this week, I have 6 bugs assigned to me today
15:35:35 * jlibosva looks
15:35:47 <slaweq> ralonsoh: no need to sorry, I know You are busy :)
15:36:57 <jlibosva> slaweq: I can have a look, tho I see we still don't collect OVN logs :-/
15:37:07 <slaweq> we don't?
15:37:23 <slaweq> I thought we merged Your patch already
15:37:33 <jlibosva> yeah, we did but the logs are not there
15:37:43 <jlibosva> ah, sorry
15:37:46 <jlibosva> the patch is not yet merged
15:38:01 <jlibosva> wait :)
15:38:12 <slaweq> jlibosva: ok, so let's merge that patch first and then if the problem happens again, I will ping You :)
15:38:17 <slaweq> fine for You?
15:38:36 <jlibosva> yes, I'll check if perhaps the patch fixed some jobs only or if functional was included too
15:39:18 <slaweq> jlibosva: are we talking about https://review.opendev.org/c/openstack/neutron/+/771658 ?
15:39:24 <slaweq> if so, it's just for tempest jobs
15:40:24 <jlibosva> slaweq: that's right
15:40:36 <jlibosva> maybe I'm looking at wrong place
15:40:47 <slaweq> jlibosva: can You do the same for functional job?
15:40:58 <jlibosva> slaweq: yes
15:41:08 <slaweq> You can make it "related to" the LP mentioned above
15:41:24 <slaweq> #action jlibosva to fix collecting ovn logs in functional jobs
15:41:27 <slaweq> thx jlibosva
15:41:47 <slaweq> ok, that's generally all I have for today
15:41:58 <slaweq> do You have anything else related to our CI to discuss?
15:43:08 <bcafarel> nothing else from me
15:43:12 <slaweq> if not, I think we can finish meeting earlier today
15:43:13 <ralonsoh> nope
15:43:19 <slaweq> thx for attending the meeting
15:43:24 <ralonsoh> bye
15:43:26 <slaweq> and have a great week o/
15:43:30 <slaweq> #endmeeting