15:00:20 <slaweq> #startmeeting neutron_ci
15:00:20 <opendevmeet> Meeting started Tue Jan 10 15:00:20 2023 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:20 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:20 <opendevmeet> The meeting name has been set to 'neutron_ci'
15:00:21 <slaweq> o/
15:00:25 <mlavalle> o/
15:00:32 <ralonsoh> amorin, I'll check it after this meeting
15:00:33 <ykarel> o/
15:00:34 <ralonsoh> hi
15:00:51 * amorin will be quiet during the meeting, thanks ralonsoh
15:01:02 <lajoskatona> o/
15:01:25 <slaweq> I think we can start as we have quorum
15:01:32 <slaweq> Grafana dashboard: https://grafana.opendev.org/d/f913631585/neutron-failure-rate?orgId=1
15:01:32 <slaweq> Please open now :)
15:01:37 <slaweq> #topic Actions from previous meetings
15:01:49 <slaweq> lajoskatona to check dvr lifecycle functional tests failures
15:02:17 <slaweq> I saw the same (probably) issue this week too
15:02:55 <lajoskatona> yes, I still can't reproduce it locally, so today I tried to add some extra logs, but it seems I touched some shaky part, as due to the extra logs 3 unit tests are failing :-)
15:03:19 <lajoskatona> https://review.opendev.org/c/openstack/neutron/+/869666
15:03:54 <lajoskatona> so I have to check why my logs break those tests
15:04:35 <lajoskatona> that's all for this issue from me
15:04:52 <slaweq> I just commented there
15:04:58 <lajoskatona> thanks
15:05:01 <slaweq> I think You missed "raise" after logging of error
15:05:10 <slaweq> as You are now silently catching all exceptions there
15:05:40 <lajoskatona> I checked it, the first ps reraised it but the result was the same, but I will check it again
15:05:50 <slaweq> ok
15:05:52 <slaweq> next one
15:05:57 <slaweq> lajoskatona to check networking-odl periodic failures
15:06:44 <lajoskatona> no time for it, but I saw the red results, so it's on my list
15:07:33 <slaweq> #action lajoskatona to check networking-odl periodic failures
15:07:38 <slaweq> ok, let's keep it for next week
15:07:43 <ykarel> at least one error I saw was related to tox4 in odl
15:08:21 <lajoskatona> ykarel: thanks, I will check, perhaps it is just to add the magic words to tox.ini as for other projects
15:08:42 <ykarel> yeap
15:10:39 <slaweq> ++
15:10:45 <slaweq> thx lajoskatona
15:10:49 <slaweq> and ykarel
15:11:20 <slaweq> ok, next one
15:11:25 <slaweq> slaweq to check logs in https://55905461b56e292f56bb-d32e9684574055628f247373c3e6dda1.ssl.cf1.rackcdn.com/868379/2/gate/neutron-functional-with-uwsgi/60b4ea3/testr_results.html
15:11:38 <slaweq> I was checking it and I proposed patch https://review.opendev.org/c/openstack/neutron/+/869205
15:12:04 <slaweq> I was trying to reproduce the issue in https://review.opendev.org/c/openstack/neutron/+/869225/ but so far I couldn't
15:12:15 <slaweq> I ran 20 neutron-functional jobs 4 times and all of them passed
15:13:23 <slaweq> please review this patch when You have some time
15:13:30 <slaweq> and thx haleyb for review
15:13:49 <slaweq> next one
15:13:51 <slaweq> slaweq to talk with hrw about cirros kernel panic
15:14:05 <slaweq> I talked with hrw last week and he told me to try new cirros
15:14:10 <haleyb> slaweq: np, I'll take a look if there are any updates
15:14:23 <slaweq> and if that is still happening, increase memory in the flavor to 192 or 256 MB
15:14:50 <ykarel> slaweq, so they have seen similar kernel panic with older cirros versions?
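(For context on the missing "raise" slaweq points out above: a minimal, generic Python sketch of logging an exception and then re-raising it so debug logging doesn't silently swallow failures. This is only an illustration of the pattern, not the code from the actual patch; the function and handler names are made up.)

```python
import logging

LOG = logging.getLogger(__name__)


def process_event(handler, event):
    """Call a handler, log any failure, and keep it visible to callers."""
    try:
        handler(event)
    except Exception:
        # Log for debugging purposes...
        LOG.exception("Error while processing event %s", event)
        # ...but re-raise the original exception with its traceback intact;
        # without this bare raise, the error is silently swallowed here.
        raise
```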
15:15:11 <ykarel> I mean, was it a known issue and the newer version fixes it?
15:15:15 <slaweq> so I proposed https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/869152 and https://review.opendev.org/c/openstack/neutron/+/869154
15:15:34 <ykarel> just curious what the fix was
15:15:38 <slaweq> ykarel he didn't confirm it for sure but his advice was "try newer cirros first"
15:16:20 <ykarel> uhh, I think we have seen that issue very rarely, so just by trying new cirros we can't confirm anything until it reproduces
15:16:33 <ykarel> personally /me saw it only once
15:16:39 <ralonsoh> didn't we have problems with nested virt and this cirros version?
15:16:53 <slaweq> I think I saw something similar once this week too
15:17:01 <slaweq> https://28f5318084af7eb69294-d7da90a475a01486cfcea9707ed18dfb.ssl.cf2.rackcdn.com/864000/5/check/tempest-integrated-networking/10b07aa/testr_results.html
15:17:25 <ykarel> slaweq, ^ is different
15:17:35 <ykarel> and the workaround for that was to use the uec image
15:18:03 <slaweq> ok, this one happened in tempest-integrated-networking which isn't using the uec image
15:18:16 <slaweq> that would explain why it happened this week :)
15:18:27 <lajoskatona> this cirros 0.6.1 is not the one frickler is working on to replace ubuntu minimal, am I right?
15:18:48 <slaweq> lajoskatona I'm not sure which one he was working on
15:18:55 <lajoskatona> ok
15:19:15 <slaweq> ok, last one
15:19:16 <slaweq> yatin to check status of ubuntu minimal as advanced image in ovn tempest plugin job
15:19:17 <ykarel> ralonsoh, yes there were issues with cirros 0.6.1 and nested virt
15:19:32 <ykarel> I pushed an update to not use host-passthrough and it worked
15:19:53 <ykarel> but there were still some failures in stadium project jobs, which /me has not checked
15:20:07 <ykarel> for ubuntu minimal I pushed https://review.opendev.org/q/topic:ubuntu-minimal-as-adv-image
15:20:45 <ykarel> noticed taas too was using the regular image, so updated those too to use minimal, and as per the tests it's working there too
15:21:02 <slaweq> I just approved the taas one
15:21:10 <ykarel> thx
15:21:28 <lajoskatona> thanks for it, good catch
15:21:40 <slaweq> thx ykarel
15:21:48 <slaweq> ok, I think we can move on to the next topic now
15:22:17 <frickler> you were thinking about https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/854910 I guess
15:22:33 <slaweq> #topic Stable branches
15:22:36 <slaweq> bcafarel is not here, but do You have anything related to stable branches ci?
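(The flavor-memory bump slaweq describes for the cirros kernel panics lives in the two reviews linked above; purely as an illustration of that idea, here is a hedged OpenStack SDK sketch of creating a guest flavor with more RAM. The cloud name, flavor name and sizes are placeholders, not what the patches actually use.)

```python
import openstack

# Connect using a clouds.yaml entry; 'devstack' is a placeholder cloud name.
conn = openstack.connect(cloud='devstack')

# Hypothetical flavor giving the cirros guest a bit more memory
# (192 or 256 MB instead of the usual tiny flavor).
flavor = conn.compute.create_flavor(
    name='m1.nano-256',  # illustrative name only
    ram=256,             # MB
    vcpus=1,
    disk=1,              # GB
)
print(flavor.id)
```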
15:22:38 <frickler> since that's auto-abandoned, I guess I'll leave it at that
15:23:44 <slaweq> frickler we can restore it if You want to continue
15:23:48 <lajoskatona> frickler: yes that was it, the abandon I guess was just the usual cleanup process, we can check if it's worth continuing
15:25:17 <slaweq> ok, if there's nothing regarding stable, let's move on
15:25:19 <slaweq> #topic Stadium projects
15:25:35 <slaweq> lajoskatona any updates, except odl which we already discussed earlier?
15:25:48 <lajoskatona> I hope all of them are now safe from the tox4 issue except odl
15:26:00 <lajoskatona> that's it from me
15:26:47 <slaweq> lajoskatona I saw many other projects red this week in the periodic weekly jobs
15:27:06 <slaweq> I didn't check the results so I'm not sure if that was still tox4 issues or something else
15:27:13 <lajoskatona> yes those were tox4 issues
15:27:19 <slaweq> ahh, ok
15:27:29 <slaweq> so next week it should be much more green I hope
15:28:01 <lajoskatona> I hope :-)
15:28:11 <mlavalle> LOL
15:28:16 <slaweq> :)
15:28:19 <slaweq> next topic then
15:28:20 <mlavalle> let's hope for greener pastures
15:28:21 <slaweq> #topic Grafana
15:28:26 <slaweq> https://grafana.opendev.org/d/f913631585/neutron-failure-rate
15:29:53 <slaweq> TBH there is not much data from the last few days there and I don't know exactly why
15:30:34 <mlavalle> people returning from holidays until this week maybe
15:30:59 <slaweq> there was some spike of failures during the weekend, but now it seems that things are getting back to normal
15:31:41 <slaweq> I think we can move on
15:31:48 <slaweq> #topic Rechecks
15:32:03 <slaweq> regarding the number of rechecks, we are back to much better numbers
15:32:14 <slaweq> | 2023-1 | 0.29 |
15:32:14 <slaweq> | 2023-2 | 0.0 |
15:32:29 <slaweq> and I hope it will stay like that for a longer time
15:32:53 <slaweq> regarding bare rechecks, it also seems good:
15:33:13 <slaweq> 3 out of 17 were bare, which is about 18%
15:33:37 <slaweq> thx for checking ci issues before rechecking
15:33:50 <slaweq> any questions/comments regarding rechecks?
15:34:05 <lajoskatona> nope
15:34:22 <mlavalle> thanks for keeping track of it
15:35:23 <slaweq> ok, so let's move on
15:35:24 <slaweq> next topic
15:35:25 <slaweq> #topic fullstack/functional
15:35:26 <slaweq> for functional I have 2 issues this week
15:35:27 <slaweq> neutron.tests.functional.agent.l3.test_keepalived_state_change.TestMonitorDaemon.test_read_queue_change_state
15:35:30 <slaweq> https://c50fdb7f046159692f4d-3059cf1890ea1358c70d952067d56657.ssl.cf2.rackcdn.com/869388/1/check/neutron-functional-with-uwsgi/1e50279/testr_results.html
15:36:00 <slaweq> anyone want to check it?
15:36:29 <mlavalle> o/
15:37:15 <slaweq> thx mlavalle
15:37:29 <slaweq> #action mlavalle to check failed neutron.tests.functional.agent.l3.test_keepalived_state_change.TestMonitorDaemon.test_read_queue_change_state
15:38:01 <mlavalle> is that the only instance so far?
15:38:03 <slaweq> the second one is (probably) the same issue as lajoskatona is already checking:
15:38:08 <slaweq> https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_b1f/869163/3/check/neutron-functional-with-uwsgi/b1f4063/testr_results.html
15:38:16 <slaweq> yes mlavalle, I found it only once
15:38:22 <mlavalle> ack
15:38:28 <ralonsoh> mlavalle, slaweq it is a timing issue
15:38:34 <ralonsoh> the text is in the file
15:38:42 <ralonsoh> but populated just before the wait timeout
15:38:44 <lajoskatona> yes, at least it is from the DVR ones, but the traceback is not exactly the same
15:38:46 <ralonsoh> (it is in the logs)
15:39:08 <slaweq> ralonsoh so maybe we should increase the timeout slightly then?
15:39:15 <ralonsoh> yes, a couple of secs
15:39:28 <mlavalle> ok, I'll try that
15:39:32 <slaweq> mlavalle so it should be an easy patch for You :)
15:39:44 <mlavalle> hope so :-)
15:39:52 <slaweq> :)
15:40:47 <slaweq> now fullstack
15:40:58 <slaweq> neutron.tests.fullstack.test_agent_bandwidth_report.TestPlacementBandwidthReport.test_configurations_are_synced_towards_placement(Open vSwitch agent)
15:41:00 <slaweq> https://7a456b090239dc19e21e-66179e131883a8ab832a0afb9e9b5999.ssl.cf5.rackcdn.com/869388/1/check/neutron-fullstack-with-uwsgi/d4f2039/testr_results.html
15:41:31 <lajoskatona> I'll check this one
15:41:39 <lajoskatona> only one occurrence?
15:41:40 <slaweq> anyone want to check if this is maybe some timing issue or something worth reporting and investigating?
15:41:46 <slaweq> thx lajoskatona
15:41:54 <slaweq> yes, it also happened only once so far
15:42:00 <lajoskatona> ok
15:42:16 <slaweq> #action lajoskatona to check fullstack failure neutron.tests.fullstack.test_agent_bandwidth_report.TestPlacementBandwidthReport.test_configurations_are_synced_towards_placement(Open vSwitch agent)
15:42:18 <slaweq> and the second one
15:42:30 <slaweq> IP or gateway not configured properly (again): https://de87c0b256f64d4fa9ad-627fb04945741dffd55f8af38c253b04.ssl.cf2.rackcdn.com/869613/1/check/neutron-fullstack-with-uwsgi/bc5e466/testr_results.html
15:43:04 <slaweq> this one I will check
15:43:18 <slaweq> as it's something I was hoping was fixed a few weeks ago already
15:43:28 <slaweq> #action slaweq to check IP or gateway not configured properly (again): https://de87c0b256f64d4fa9ad-627fb04945741dffd55f8af38c253b04.ssl.cf2.rackcdn.com/869613/1/check/neutron-fullstack-with-uwsgi/bc5e466/testr_results.html
15:43:48 <slaweq> and with that I reached the end of my list for today :)
15:43:57 <slaweq> #topic On Demand
15:44:10 <slaweq> anything else You want to discuss today?
15:44:16 <slaweq> related to the CI of course ;)
15:44:36 <ralonsoh> all good
15:44:50 <mlavalle> not from me
15:45:38 <slaweq> ok, so I think we can end the meeting now
15:45:42 <slaweq> thx for attending
15:45:47 <mlavalle> o/
15:45:48 <slaweq> have a great week and see You online
15:45:48 <ralonsoh> see you
15:45:50 <slaweq> o/
15:45:53 <lajoskatona> o/
15:45:53 <ykarel> o/
15:45:54 <slaweq> #endmeeting
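(On the keepalived_state_change timing issue ralonsoh describes above, where the expected text lands in the file right around the end of the wait: a generic, self-contained sketch of that wait-for-text pattern with a slightly longer timeout. This is not the real test code; neutron's functional tests use their own wait helpers, and the path, message and timeout values below are made up.)

```python
import time


def wait_for_text_in_file(path, expected, timeout=20.0, sleep=0.5):
    """Poll ``path`` until ``expected`` appears or ``timeout`` expires.

    Returns True on success, False on timeout; bumping ``timeout`` by a
    couple of seconds is the kind of change discussed for the flaky test.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with open(path) as f:
                if expected in f.read():
                    return True
        except FileNotFoundError:
            pass  # the monitored daemon may not have created the file yet
        time.sleep(sleep)
    return False


# Hypothetical usage:
# wait_for_text_in_file('/tmp/keepalived-state', 'primary', timeout=22.0)
```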