15:00:33 #startmeeting neutron_ci
15:00:33 Meeting started Tue Mar 2 15:00:33 2021 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:34 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:36 The meeting name has been set to 'neutron_ci'
15:00:39 hi
15:00:57 hey again
15:02:08 ping ralonsoh: lajoskatona
15:02:22 ci meeting, are You going to attend?
15:02:27 hi
15:02:58 Hi
15:03:02 I think we can start
15:03:08 Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:03:15 #topic Actions from previous meetings
15:03:26 we have only one
15:03:31 slaweq to check failing periodic task in functional test
15:03:39 I reported bug https://bugs.launchpad.net/neutron/+bug/1916761
15:03:40 Launchpad bug 1916761 in neutron "[dvr] bound port permanent arp entries never deleted" [High,In progress] - Assigned to Edward Hope-Morley (hopem)
15:03:47 and proposed patch https://review.opendev.org/c/openstack/neutron/+/778080
15:04:08 it seems that it works, at least there are no errors related to the maintenance task in the logs
15:04:27 the job-output.txt file is about 10x smaller with that change
15:05:00 no, sorry
15:05:08 it's not that much smaller
15:05:15 but it is significantly smaller
15:05:45 please review that patch if You have some time
15:06:06 and let's move on
15:06:08 #topic Stadium projects
15:06:17 anything regarding stadium projects' ci?
15:06:26 not much from me
15:06:39 still struggling with the old branch fixes
15:06:52 :)
15:06:56 I am at the point of asking around in the infra or QA channel
15:07:17 I ran into stupid "no pip2.7 available" issues and similar
15:07:45 but it's on older (before train??) branches so
15:08:39 that's it as far as I remember
15:09:07 I think that all branches before Train are already EM
15:09:17 yep
15:09:19 so we can probably mark them as EOL for stadium projects
15:09:22 no?
15:10:01 yes, we can check all of them and decide based on that
15:10:27 I mean based on the active backports or similar
15:10:37 lajoskatona: so if there is no interest in the community to maintain them, and there are big issues, I would say - don't spend too much time on it :)
15:11:00 agree
15:11:50 thx, please keep me updated if You want to EOL some branches in some projects
15:12:01 especially if they have been broken for some time and no one complained
15:12:44 ok, I will check where we are with those branches
15:12:47 and I think we can easily announce them as Unmaintained (then if people show up to fix them, they can go back to EM)
15:13:01 ++
15:13:10 +1
15:13:16 thx lajoskatona for taking care of it
15:13:27 let's move on
15:13:29 #topic Stable branches
15:13:34 Victoria dashboard: https://grafana.opendev.org/d/HUCHup2Gz/neutron-failure-rate-previous-stable-release?orgId=1
15:13:36 Ussuri dashboard: https://grafana.opendev.org/d/smqHXphMk/neutron-failure-rate-older-stable-release?orgId=1
15:13:46 bcafarel: any updates about the stable branches?
15:15:05 train is failing, the "neutron-tempest-dvr-ha-multinode-full" job, qos migration tests
15:15:26 eg: https://0c345762207dc13e339e-d1e090fdf1a39e65d2b0ba37cbdce0a4.ssl.cf2.rackcdn.com/777781/1/check/neutron-tempest-dvr-ha-multinode-full/463e963/testr_results.html
15:15:30 in all patches
15:15:48 ralonsoh: but this job is non-voting, right?
15:16:05 it is, yes
15:16:11 just a heads-up
15:16:19 but it really seems like a nova issue
15:16:27 No valid host found for cold migrate
15:16:34 and
15:16:36 No valid host found for resize
15:16:58 is it only in train?
15:17:31 sorry, Murphy's law, the mailman was ringing just before the ping
15:17:43 did it work before? I remember this job being mostly unstable
15:19:44 TBH I haven't checked it for a pretty long time
15:19:55 I can investigate and reach out to nova ppl if needed
15:20:17 #action slaweq to check failing qos migration tests in train neutron-tempest-dvr-ha-multinode-full job
15:21:46 anything else related to stable branches?
15:22:08 hopefully https://review.opendev.org/c/openstack/neutron/+/777389 will be done soon for stein :)
15:22:39 the rest looked OK, I have a few to review in my backlog, but CI looked good overall
15:22:59 thx
15:23:03 so we can move on
15:23:06 next topic
15:23:08 #topic Grafana
15:23:12 http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:23:44 I see just one, but pretty serious, problem there
15:23:53 fullstack/functional jobs are still failing way too much
15:24:05 apart from that it seems that we are good
15:25:11 do You see any other issues there?
15:25:17 we need to commit ourselves to reporting in LP any error we find in those jobs
15:25:27 just to track it
15:26:01 yes, I agree
15:26:06 I spent some time today on them
15:26:30 and I came up with a few small patches https://review.opendev.org/q/topic:%22improve-neutron-ci%22+(status:merged)
15:26:58 sorry
15:26:59 https://review.opendev.org/q/topic:%2522improve-neutron-ci%2522+status:open
15:27:03 that is the correct link
15:27:09 please take a look at them
15:28:01 I think that the most frequent failures are due to the oom-killer killing the mysql server
15:28:22 right
15:28:24 so I proposed to lower the number of test workers in both jobs
15:28:31 in fullstack I already did that some time ago
15:28:45 let's reduce FT to 4 and fullstack to 3
15:28:51 but I forgot about the dsvm-fullstack-gate tox env, which is what is really used in the gate
15:29:05 ralonsoh: that's exactly what my patches proposed :)
15:29:13 FT to 4 and fullstack to 3
15:29:17 yeah hehehe
15:29:19 :)
15:29:33 so that should be covered :)
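As a rough illustration of the worker-count change discussed above: lowering the load comes down to passing a smaller --concurrency value to stestr in the affected tox environments. This is only a sketch, assuming the envs invoke stestr directly; the dsvm-functional env name and the surrounding options are assumptions, not copied from the actual patches.

```ini
# Hypothetical tox.ini fragment, showing only the --concurrency knob;
# the real neutron environments contain more settings than shown here.
[testenv:dsvm-functional]
commands =
    stestr run --concurrency 4 {posargs}

[testenv:dsvm-fullstack-gate]
commands =
    stestr run --concurrency 3 {posargs}
```

Fewer concurrent workers means fewer database connections and less memory pressure at any one time, which is why it helps against the oom-killer, at the cost of a somewhat longer run.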
15:29:52 another issue is timed-out jobs
15:30:05 and I think that those are mostly due to stestr and "too much output"
15:30:19 like we already had in the past in UT IIRC
15:30:25 so I proposed https://review.opendev.org/c/openstack/neutron/+/778196
15:30:37 and also https://review.opendev.org/c/openstack/neutron/+/778080 should help with that
15:30:51 but in the FT job there are still a lot of things logged
15:31:09 if You check https://23965cc52ad55df824a3-476d86922c45abb704c82e069ca48dea.ssl.cf1.rackcdn.com/778080/2/check/neutron-functional-with-uwsgi/875857e/job-output.txt
15:31:15 there are a lot of errors like:
15:31:26 oslo_db.exception.CantStartEngineError: No sql_connection parameter is established
15:31:33 and a lot of lines like:
15:31:39 Running command: ...
15:31:57 I was trying to somehow get rid of them but I really don't know how
15:32:11 if You have any ideas, help is more than welcome :)
15:33:11 does anyone want to check that?
15:33:42 sure
15:34:21 thx ralonsoh
15:35:15 #action ralonsoh to try to check how to limit the number of logged lines in FT output
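One possible direction for that action item (purely a sketch, not what was actually implemented) is to raise the level of the noisy logger around each functional test, assuming the repeated CantStartEngineError tracebacks are emitted through the regular Python 'oslo_db' logger:

```python
# Sketch only: try to silence repeated oslo_db.exception.CantStartEngineError
# tracebacks in functional test output. Assumes the noise goes through the
# standard 'oslo_db' Python logger hierarchy, which has not been verified.
import logging

import fixtures


class QuietOsloDbFixture(fixtures.Fixture):
    """Temporarily raise the 'oslo_db' logger level for one test."""

    def _setUp(self):
        logger = logging.getLogger('oslo_db')
        old_level = logger.level
        logger.setLevel(logging.CRITICAL)
        # Restore the original level when the fixture is cleaned up.
        self.addCleanup(logger.setLevel, old_level)
```

A functional test base class could opt in with self.useFixture(QuietOsloDbFixture()) in setUp(); whether that actually shrinks job-output.txt depends on how stestr captures the workers' log streams, so it would need verification.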
15:35:26 * slaweq will be back in 2 minutes
15:36:02 during this waiting time: I think this is because of "DBInconsistenciesPeriodics" in FTs
15:36:06 but I need to check it
15:36:57 * slaweq is back
15:37:17 ralonsoh: but in patch https://review.opendev.org/c/openstack/neutron/+/778080 I mocked this maintenance worker thread
15:37:28 and those lines are still logged there
15:37:37 maybe I missed something there, idk
15:38:00 right, you are stopping the thread there
15:39:36 there is one more issue which I found a couple of times in FT recently
15:39:42 timeouts while doing some ip operations, like in
15:39:47 https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_018/772460/13/check/neutron-functional-with-uwsgi/0181e4f/testr_results.html
15:39:49 https://3d423a08ba57e3349bef-667e59a55d2239af414b0984e42f005a.ssl.cf5.rackcdn.com/771621/7/check/neutron-functional-with-uwsgi/c8d3396/testr_results.html
15:39:51 https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_ac4/768129/37/check/neutron-functional-with-uwsgi/ac480c2/testr_results.html
15:39:56 did You see them already?
15:40:03 maybe You know what can cause such problems?
15:40:17 I did, but I still can't find the root cause
15:41:12 I always saw it in the test_linuxbridge_arp_protect test module
15:41:19 did You see it in other modules too?
15:41:39 I can't remember
15:41:40 maybe we could mark those tests as unstable for now and that would give us some breathing room
15:42:17 I'll record all the occurrences I find and I'll report an LP bug
15:42:54 ralonsoh: thx
15:43:17 #action ralonsoh to report bug with ip operations timeout in FT
15:43:39 so that's basically all I had for today regarding those jobs
15:43:59 long story short, let's merge the patches which we have now and hopefully it will be a bit better
15:44:13 and then let's focus on debugging the issues which we already mentioned here
15:44:25 last topic for today
15:44:27 #topic Periodic
15:44:34 Job results: http://zuul.openstack.org/buildsets?project=openstack%2Fneutron&pipeline=periodic&branch=master
15:44:43 overall the periodic jobs seem good
15:44:51 but the fedora-based job is red all the time
15:44:58 is there any volunteer to check it?
15:45:26 not this week, sorry
15:45:30 again? sigh
15:45:35 no need to be sorry ralonsoh :)
15:45:43 I can try to take a look
15:45:44 bcafarel: I think it is still failing, not failing again :)
15:45:49 bcafarel: thx a lot
15:46:07 #action bcafarel to check failing fedora based periodic job
15:46:37 it seems not so long ago that we had to push a few fixes for it :)
15:47:16 yes, but then it started failing again
15:47:22 and we never fixed it, I think :/
15:48:04 and that's basically all I had for today
15:48:18 do You have anything else You want to discuss today regarding our ci?
15:48:31 no
15:48:36 nothing from me
15:48:44 if not, I will give You a few minutes back today
15:48:50 thx for attending the meeting
15:48:55 and have a great week
15:48:56 o/
15:48:57 bye!
15:48:59 #endmeeting