15:00:20 <slaweq> #startmeeting neutron_ci
15:00:20 <opendevmeet> Meeting started Tue Jan 10 15:00:20 2023 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:20 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:20 <opendevmeet> The meeting name has been set to 'neutron_ci'
15:00:21 <slaweq> o/
15:00:25 <mlavalle> o/
15:00:32 <ralonsoh> amorin, I'll check it after this meeting
15:00:33 <ykarel> o/
15:00:34 <ralonsoh> hi
15:00:51 * amorin will be quiet during the meeting, thanks ralonsoh
15:01:02 <lajoskatona> o/
15:01:25 <slaweq> I think we can start as we have quorum
15:01:32 <slaweq> Grafana dashboard: https://grafana.opendev.org/d/f913631585/neutron-failure-rate?orgId=1
15:01:32 <slaweq> Please open now :)
15:01:37 <slaweq> #topic Actions from previous meetings
15:01:49 <slaweq> lajoskatona to check dvr lifecycle functional tests failures
15:02:17 <slaweq> I saw the same (probably) issue this week too
15:02:55 <lajoskatona> yes, I still can't reproduce it locally, so today I tried to add some extra logs, but it seems I touched some shaky part, as due to the extra logs 3 unit tests are failing :-)
15:03:19 <lajoskatona> https://review.opendev.org/c/openstack/neutron/+/869666
15:03:54 <lajoskatona> so I have to check why my logs break those tests
15:04:35 <lajoskatona> that's all for this issue from me
15:04:52 <slaweq> I just commented there
15:04:58 <lajoskatona> thanks
15:05:01 <slaweq> I think You missed "raise" after logging of error
15:05:10 <slaweq> as You are now silently catching all exceptions there
15:05:40 <lajoskatona> I checked it, the first ps reraised it but the result was the same, but I will check it again
15:05:50 <slaweq> ok
15:05:52 <slaweq> next one
15:05:57 <slaweq> lajoskatona to check networking-odl periodic failures
15:06:44 <lajoskatona> no time for it, but I saw the red results, so it's on my list
15:07:33 <slaweq> #action lajoskatona to check networking-odl periodic failures
15:07:38 <slaweq> ok, let's keep it for next week
15:07:43 <ykarel> at least one error I saw was related to tox4 in odl
15:08:21 <lajoskatona> ykarel: thanks, I will check, perhaps it is just to add the magic words to tox.ini as for other projects
15:08:42 <ykarel> yeap
15:10:39 <slaweq> ++
15:10:45 <slaweq> thx lajoskatona
15:10:49 <slaweq> and ykarel
15:11:20 <slaweq> ok, next one
15:11:25 <slaweq> slaweq to check logs in https://55905461b56e292f56bb-d32e9684574055628f247373c3e6dda1.ssl.cf1.rackcdn.com/868379/2/gate/neutron-functional-with-uwsgi/60b4ea3/testr_results.html
15:11:38 <slaweq> I was checking it and I proposed patch https://review.opendev.org/c/openstack/neutron/+/869205
15:12:04 <slaweq> I was trying to reproduce the issue in https://review.opendev.org/c/openstack/neutron/+/869225/ but so far I couldn't
15:12:15 <slaweq> I ran 20 neutron-functional jobs 4 times and all of them passed
15:13:23 <slaweq> please review this patch when You have some time
15:13:30 <slaweq> and thx haleyb for review
15:13:49 <slaweq> next one
15:13:51 <slaweq> slaweq to talk with hrw about cirros kernel panic
15:14:05 <slaweq> I talked with hrw last week and he told me to try new cirros
15:14:10 <haleyb> slaweq: np, I'll take a look if there are any updates
15:14:23 <slaweq> and if that is still happening, increase memory in the flavor to 192 or 256 MB
15:14:50 <ykarel> slaweq, so they have seen similar kernel panic with older cirros versions?
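(For context on the missing "raise" slaweq points out above: a minimal, generic Python sketch of logging an exception and then re-raising it so debug logging doesn't silently swallow failures. This is only an illustration of the pattern, not the code from the actual patch; the function and handler names are made up.)

```python
import logging

LOG = logging.getLogger(__name__)


def process_event(handler, event):
    """Call a handler, log any failure, and keep it visible to callers."""
    try:
        handler(event)
    except Exception:
        # Log for debugging purposes...
        LOG.exception("Error while processing event %s", event)
        # ...but re-raise the original exception with its traceback intact;
        # without this bare raise, the error is silently swallowed here.
        raise
```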
15:15:11 <ykarel> I mean, was it a known issue and the newer version fixes it?
15:15:15 <slaweq> so I proposed https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/869152 and https://review.opendev.org/c/openstack/neutron/+/869154
15:15:34 <ykarel> just curious what the fix was
15:15:38 <slaweq> ykarel he didn't confirm it for sure but his advice was "try newer cirros first"
15:16:20 <ykarel> uhh, I think we have seen that issue very rarely, so just by trying new cirros we can't confirm anything until it reproduces
15:16:33 <ykarel> personally /me saw it only once
15:16:39 <ralonsoh> didn't we have problems with nested virt and this cirros version?
15:16:53 <slaweq> I think I saw something similar once this week too
15:17:01 <slaweq> https://28f5318084af7eb69294-d7da90a475a01486cfcea9707ed18dfb.ssl.cf2.rackcdn.com/864000/5/check/tempest-integrated-networking/10b07aa/testr_results.html
15:17:25 <ykarel> slaweq, ^ is different
15:17:35 <ykarel> and the workaround for that was to use the uec image
15:18:03 <slaweq> ok, this one happened in tempest-integrated-networking which isn't using the uec image
15:18:16 <slaweq> that would explain why it happened this week :)
15:18:27 <lajoskatona> this cirros 0.6.1 is not the one frickler is working on to replace ubuntu minimal, am I right?
15:18:48 <slaweq> lajoskatona I'm not sure which one he was working on
15:18:55 <lajoskatona> ok
15:19:15 <slaweq> ok, last one
15:19:16 <slaweq> yatin to check status of ubuntu minimal as advanced image in ovn tempest plugin job
15:19:17 <ykarel> ralonsoh, yes there were issues with cirros 0.6.1 and nested virt
15:19:32 <ykarel> I pushed an update to not use host-passthrough and it worked
15:19:53 <ykarel> but there were still some failures in stadium project jobs, which /me has not checked
15:20:07 <ykarel> for ubuntu minimal I pushed https://review.opendev.org/q/topic:ubuntu-minimal-as-adv-image
15:20:45 <ykarel> noticed taas too was using the regular image, so updated those too to use minimal, and as per the tests it's working there too
15:21:02 <slaweq> I just approved the taas one
15:21:10 <ykarel> thx
15:21:28 <lajoskatona> thanks for it, good catch
15:21:40 <slaweq> thx ykarel
15:21:48 <slaweq> ok, I think we can move on to the next topic now
15:22:17 <frickler> you were thinking about https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/854910 I guess
15:22:33 <slaweq> #topic Stable branches
15:22:36 <slaweq> bcafarel is not here, but do You have anything related to stable branches ci?
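(The flavor-memory bump slaweq describes for the cirros kernel panics lives in the two reviews linked above; purely as an illustration of that idea, here is a hedged OpenStack SDK sketch of creating a guest flavor with more RAM. The cloud name, flavor name and sizes are placeholders, not what the patches actually use.)

```python
import openstack

# Connect using a clouds.yaml entry; 'devstack' is a placeholder cloud name.
conn = openstack.connect(cloud='devstack')

# Hypothetical flavor giving the cirros guest a bit more memory
# (192 or 256 MB instead of the usual tiny flavor).
flavor = conn.compute.create_flavor(
    name='m1.nano-256',  # illustrative name only
    ram=256,             # MB
    vcpus=1,
    disk=1,              # GB
)
print(flavor.id)
```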
15:22:38 <frickler> since that's auto-abandoned, I guess I'll leave it at that
15:23:44 <slaweq> frickler we can restore it if You want to continue
15:23:48 <lajoskatona> frickler: yes that was it, the abandon I guess was just the usual cleanup process, we can check if it's worth continuing
15:25:17 <slaweq> ok, if there's nothing regarding stable, let's move on
15:25:19 <slaweq> #topic Stadium projects
15:25:35 <slaweq> lajoskatona any updates, except odl which we already discussed earlier?
15:25:48 <lajoskatona> I hope all of them are now safe from the tox4 issue except odl
15:26:00 <lajoskatona> that's it from me
15:26:47 <slaweq> lajoskatona I saw many other projects red this week in the periodic weekly jobs
15:27:06 <slaweq> I didn't check the results so I'm not sure if that was still tox4 issues or something else
15:27:13 <lajoskatona> yes those were tox4 issues
15:27:19 <slaweq> ahh, ok
15:27:29 <slaweq> so next week it should be much more green I hope
15:28:01 <lajoskatona> I hope :-)
15:28:11 <mlavalle> LOL
15:28:16 <slaweq> :)
15:28:19 <slaweq> next topic then
15:28:20 <mlavalle> let's hope for greener pastures
15:28:21 <slaweq> #topic Grafana
15:28:26 <slaweq> https://grafana.opendev.org/d/f913631585/neutron-failure-rate
15:29:53 <slaweq> TBH there is not much data from the last few days there and I don't know exactly why
15:30:34 <mlavalle> people returning from holidays until this week maybe
15:30:59 <slaweq> there was some spike of failures during the weekend, but now it seems that things are getting back to normal
15:31:41 <slaweq> I think we can move on
15:31:48 <slaweq> #topic Rechecks
15:32:03 <slaweq> regarding the number of rechecks, we are back to much better numbers
15:32:14 <slaweq> | 2023-1 | 0.29 |
15:32:14 <slaweq> | 2023-2 | 0.0 |
15:32:29 <slaweq> and I hope it will stay like that for a longer time
15:32:53 <slaweq> regarding bare rechecks, it also seems good:
15:33:13 <slaweq> 3 out of 17 were bare, which is about 18%
15:33:37 <slaweq> thx for checking ci issues before rechecking
15:33:50 <slaweq> any questions/comments regarding rechecks?
15:34:05 <lajoskatona> nope
15:34:22 <mlavalle> thanks for keeping track of it
15:35:23 <slaweq> ok, so let's move on
15:35:24 <slaweq> next topic
15:35:25 <slaweq> #topic fullstack/functional
15:35:26 <slaweq> for functional I have 2 issues this week
15:35:27 <slaweq> neutron.tests.functional.agent.l3.test_keepalived_state_change.TestMonitorDaemon.test_read_queue_change_state
15:35:30 <slaweq> https://c50fdb7f046159692f4d-3059cf1890ea1358c70d952067d56657.ssl.cf2.rackcdn.com/869388/1/check/neutron-functional-with-uwsgi/1e50279/testr_results.html
15:36:00 <slaweq> anyone want to check it?
15:36:29 <mlavalle> o/
15:37:15 <slaweq> thx mlavalle
15:37:29 <slaweq> #action mlavalle to check failed neutron.tests.functional.agent.l3.test_keepalived_state_change.TestMonitorDaemon.test_read_queue_change_state
15:38:01 <mlavalle> is that the only instance so far?
15:38:03 <slaweq> the second one is (probably) the same issue as lajoskatona is already checking:
15:38:08 <slaweq> https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_b1f/869163/3/check/neutron-functional-with-uwsgi/b1f4063/testr_results.html
15:38:16 <slaweq> yes mlavalle, I found it only once
15:38:22 <mlavalle> ack
15:38:28 <ralonsoh> mlavalle, slaweq it is a timing issue
15:38:34 <ralonsoh> the text is in the file
15:38:42 <ralonsoh> but populated just before the wait timeout
15:38:44 <lajoskatona> yes, at least it is from the DVR ones, but the traceback is not exactly the same
15:38:46 <ralonsoh> (it is in the logs)
15:39:08 <slaweq> ralonsoh so maybe we should increase the timeout slightly then?
15:39:15 <ralonsoh> yes, a couple of secs
15:39:28 <mlavalle> ok, I'll try that
15:39:32 <slaweq> mlavalle so it should be an easy patch for You :)
15:39:44 <mlavalle> hope so :-)
15:39:52 <slaweq> :)
15:40:47 <slaweq> now fullstack
15:40:58 <slaweq> neutron.tests.fullstack.test_agent_bandwidth_report.TestPlacementBandwidthReport.test_configurations_are_synced_towards_placement(Open vSwitch agent)
15:41:00 <slaweq> https://7a456b090239dc19e21e-66179e131883a8ab832a0afb9e9b5999.ssl.cf5.rackcdn.com/869388/1/check/neutron-fullstack-with-uwsgi/d4f2039/testr_results.html
15:41:31 <lajoskatona> I'll check this one
15:41:39 <lajoskatona> only one occurrence?
15:41:40 <slaweq> anyone want to check if this is maybe some timing issue or something worth reporting and investigating?
15:41:46 <slaweq> thx lajoskatona
15:41:54 <slaweq> yes, it also happened only once so far
15:42:00 <lajoskatona> ok
15:42:16 <slaweq> #action lajoskatona to check fullstack failure neutron.tests.fullstack.test_agent_bandwidth_report.TestPlacementBandwidthReport.test_configurations_are_synced_towards_placement(Open vSwitch agent)
15:42:18 <slaweq> and the second one
15:42:30 <slaweq> IP or gateway not configured properly (again): https://de87c0b256f64d4fa9ad-627fb04945741dffd55f8af38c253b04.ssl.cf2.rackcdn.com/869613/1/check/neutron-fullstack-with-uwsgi/bc5e466/testr_results.html
15:43:04 <slaweq> this one I will check
15:43:18 <slaweq> as it's something I was hoping was fixed a few weeks ago already
15:43:28 <slaweq> #action slaweq to check IP or gateway not configured properly (again): https://de87c0b256f64d4fa9ad-627fb04945741dffd55f8af38c253b04.ssl.cf2.rackcdn.com/869613/1/check/neutron-fullstack-with-uwsgi/bc5e466/testr_results.html
15:43:48 <slaweq> and with that I reached the end of my list for today :)
15:43:57 <slaweq> #topic On Demand
15:44:10 <slaweq> anything else You want to discuss today?
15:44:16 <slaweq> related to the CI of course ;)
15:44:36 <ralonsoh> all good
15:44:50 <mlavalle> not from me
15:45:38 <slaweq> ok, so I think we can end the meeting now
15:45:42 <slaweq> thx for attending
15:45:47 <mlavalle> o/
15:45:48 <slaweq> have a great week and see You online
15:45:48 <ralonsoh> see you
15:45:50 <slaweq> o/
15:45:53 <lajoskatona> o/
15:45:53 <ykarel> o/
15:45:54 <slaweq> #endmeeting
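(On the keepalived_state_change timing issue ralonsoh describes above, where the expected text lands in the file right around the end of the wait: a generic, self-contained sketch of that wait-for-text pattern with a slightly longer timeout. This is not the real test code; neutron's functional tests use their own wait helpers, and the path, message and timeout values below are made up.)

```python
import time


def wait_for_text_in_file(path, expected, timeout=20.0, sleep=0.5):
    """Poll ``path`` until ``expected`` appears or ``timeout`` expires.

    Returns True on success, False on timeout; bumping ``timeout`` by a
    couple of seconds is the kind of change discussed for the flaky test.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with open(path) as f:
                if expected in f.read():
                    return True
        except FileNotFoundError:
            pass  # the monitored daemon may not have created the file yet
        time.sleep(sleep)
    return False


# Hypothetical usage:
# wait_for_text_in_file('/tmp/keepalived-state', 'primary', timeout=22.0)
```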