15:00:17 <slaweq> #startmeeting neutron_ci
15:00:17 <opendevmeet> Meeting started Tue Jul 6 15:00:17 2021 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:17 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:17 <opendevmeet> The meeting name has been set to 'neutron_ci'
15:00:27 <ralonsoh> hi
15:00:28 <bcafarel> hi again
15:00:30 <slaweq> hi (again)!
15:00:37 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:00:39 <slaweq> Please open now :)
15:01:37 <slaweq> ok, let's start
15:01:40 <slaweq> #topic Actions from previous meetings
15:01:46 <slaweq> amotoki to clean failing jobs in networking-odl rocky and older
15:01:50 <lajoskatona> Hi
15:01:52 <amotoki> hi
15:02:03 <obondarev> hi
15:02:27 <amotoki> zuul configuration errors happen in networking-odl, neutron-fwaas and -midonet stable/rocky or older.
15:03:09 <amotoki> networking-odl rocky and queens have been done.
15:03:25 <amotoki> others are under review except networking-odl ocata.
15:03:52 <amotoki> networking-odl ocata needs to be EOL'ed as neutron ocata is already EOL. lajoskatona prepared the release patch.
15:03:59 <lajoskatona> Just for reference the original mail: http://lists.openstack.org/pipermail/openstack-discuss/2021-June/023321.html
15:04:21 <lajoskatona> and the patch: https://review.opendev.org/c/openstack/releases/+/799473
15:04:53 <slaweq> thx amotoki and lajoskatona for taking care of all of that
15:05:08 <lajoskatona> no problem
15:05:22 <amotoki> https://review.opendev.org/c/openstack/releases/+/799472 will also clean up newton or older unofficial branches currently under the neutron governance.
15:05:23 <lajoskatona> actually there's another release patch for older branches from elod: https://review.opendev.org/c/openstack/releases/+/799472
15:05:30 <amotoki> that's all from me
15:05:38 <lajoskatona> yeah, exactly as amotoki wrote it :-)
15:05:44 <slaweq> thx
15:05:48 <slaweq> ok, so next one
15:05:50 <slaweq> lajoskatona to start EOL process for networking-odl rocky and older
15:06:01 <slaweq> I assume it's already covered as well :)
15:06:12 <lajoskatona> yeah
15:06:30 <slaweq> thx
15:06:32 <slaweq> so next one
15:06:33 <lajoskatona> I haven't sent a mail, as I assume the original mail from You, slaweq, covered odl as well
15:06:35 <slaweq> ralonsoh to check if there is a better way to check dns nameservers in cirros
15:06:50 <ralonsoh> still testing how to do it in cirros
15:06:53 <ralonsoh> in a different way
15:06:56 <slaweq> ok
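Since the open action item is about finding a better way to verify the DNS nameservers seen inside a cirros guest, here is a minimal sketch of one possible approach: poll /etc/resolv.conf over SSH instead of reading it once, because cirros configures it asynchronously via udhcpc. The helper name and the ssh_client interface (anything exposing exec_command(), e.g. tempest's RemoteClient) are assumptions for illustration only, not what ralonsoh is actually testing.

    import time

    def wait_for_nameservers(ssh_client, expected_nameservers, timeout=60, interval=5):
        """Poll /etc/resolv.conf in the guest until the expected DNS servers appear.

        A single read right after boot can race with the DHCP client in cirros,
        so retry for a while before failing the test.
        """
        found = []
        deadline = time.time() + timeout
        while time.time() < deadline:
            resolv_conf = ssh_client.exec_command('cat /etc/resolv.conf')
            found = [fields[1] for fields in
                     (line.split() for line in resolv_conf.splitlines())
                     if len(fields) > 1 and fields[0] == 'nameserver']
            if set(expected_nameservers).issubset(found):
                return found
            time.sleep(interval)
        raise AssertionError('nameservers %s not found in guest, got %s'
                             % (expected_nameservers, found))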
15:07:10 <slaweq> I will add it for You for next time so we don't forget
15:07:12 <slaweq> ok?
15:07:13 <ralonsoh> sure
15:07:19 <slaweq> #action ralonsoh to check if there is a better way to check dns nameservers in cirros
15:07:21 <slaweq> thx ralonsoh
15:07:29 <slaweq> and those are all the actions from last week
15:07:35 <slaweq> #topic Stadium projects
15:08:04 <slaweq> lajoskatona: any other updates/issues with stadium
15:08:10 <slaweq> *stadium's ci
15:08:32 <lajoskatona> The most interesting is to eol old branches, which we covered
15:08:47 <bcafarel> cleanups++
15:09:20 <lajoskatona> and the gerrit server restart (http://lists.openstack.org/pipermail/openstack-discuss/2021-July/023434.html ) which you mentioned in the previous meeting, so after that taas will be under the openstack/ namespace
15:09:52 <slaweq> great
15:09:55 <slaweq> thx
15:09:59 <slaweq> I think we can move on
15:10:01 <amotoki> while fixing the networking-l2gw location in zuul, I noticed a lot of failures in stadium old branches. I moved them to the experimental queue, for the record.
15:10:11 <amotoki> is it the right action?
15:10:41 <slaweq> amotoki: if we can't/don't have time to fix such failing jobs, then yes
15:10:52 <amotoki> I tried to fix them if they had simple, straightforward fixes, but otherwise I did so.
15:10:55 <slaweq> I think it's a good approach to move such jobs to the experimental queue
15:11:00 <lajoskatona> I am fine with it, those fixes are really time consuming
15:11:12 <slaweq> thx a lot
15:11:17 <amotoki> if you have memory of the devstack-gate and pip/pip3 issues, it would be nice to check it, for example in neutron-fwaas.
15:11:37 <bcafarel> +1, if it can be easily fixed it is nice, but for EM branches we mostly keep the lights on
15:11:55 <slaweq> I will try to check but I don't promise anything
15:12:39 <amotoki> most failing jobs still use the legacy way and it takes time :(
15:13:06 <slaweq> yeah, that is not something I'm really familiar with
15:13:19 <bcafarel> :)
15:14:09 <amotoki> anyway let's fix it separately :) I think we can move on
15:14:37 <slaweq> thx
15:14:39 <slaweq> so next topic
15:14:42 <slaweq> #topic Stable branches
15:14:53 <slaweq> bcafarel: anything regarding stable branches ci in neutron?
15:15:02 <slaweq> IMO it works pretty ok recently
15:15:09 <slaweq> but maybe I missed some issues there
15:15:14 <bcafarel> indeed, backports are getting in quite nicely
15:15:24 <bcafarel> one that went back to rocky required a few rechecks
15:15:45 <bcafarel> but nothing specific, just an array of separate tests (including compute and storage) failing
15:16:02 <bcafarel> and that branch is low on backports so no need to dig further atm
15:16:29 <bcafarel> for the full support branches almost all patches are in at the moment, which is nice for the incoming stable releases :)
15:17:00 <slaweq> ++
15:17:03 <slaweq> great news
15:17:10 <slaweq> #topic Grafana
15:17:17 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:18:08 <slaweq> in grafana failure rates seem to be pretty ok for all of the jobs recently
15:19:50 <slaweq> I think we can move on, unless You see any issue with grafana and want to talk about it now
15:20:18 <ralonsoh> nothing from me
15:20:29 <bcafarel> that looks good to me
15:20:39 <slaweq> ok, let's move on
15:20:41 <slaweq> #topic fullstack/functional
15:20:41 <bcafarel> also this is with updated job names right?
15:21:04 <slaweq> bcafarel: I think it is
15:21:31 <slaweq> regarding those jobs I have only one small update about the fullstack issue https://bugs.launchpad.net/neutron/+bug/1933234
15:21:49 <slaweq> I just confirmed today with extra logs why this test is failing
15:22:29 <slaweq> basically the router is still being processed and its ports aren't added to the RouterInfo.internal_ports cache
15:22:45 <slaweq> when the network_update rpc message comes
15:23:02 <slaweq> due to that, it can't find the port attached to that router from the network
15:23:11 <slaweq> and the router update isn't scheduled
15:23:22 <ralonsoh> good catch
15:23:28 <slaweq> so this is in fact a real bug, not a test issue
15:23:35 <ralonsoh> indeed a race condition
15:23:51 <ralonsoh> how are we updating the network before adding the port to the internal cache?
15:23:51 <slaweq> and I don't know yet how to fix it
15:24:28 <slaweq> ralonsoh: so the router is processed and goes to https://github.com/openstack/neutron/blob/3764969b82c6e7b8c74172a1ec4d230ce4ddedcc/neutron/agent/l3/router_info.py#L636
15:24:38 <slaweq> there it should be added to the internal ports cache
15:24:42 <ralonsoh> right
15:24:54 <slaweq> but in the meantime, network_update is called https://github.com/openstack/neutron/blob/3764969b82c6e7b8c74172a1ec4d230ce4ddedcc/neutron/agent/l3/agent.py#L602
15:25:13 <ralonsoh> ok, I see
15:25:16 <slaweq> and if that port is not in the cache yet, it fails to schedule the router update
15:25:31 <ralonsoh> and nothing to say "stop, we are updating this..."
15:26:06 <slaweq> yes, I think I will add some kind of flag (lock) in https://github.com/openstack/neutron/blob/3764969b82c6e7b8c74172a1ec4d230ce4ddedcc/neutron/agent/l3/router_info.py#L1256
15:26:26 <slaweq> and then we can check in network_update if it's processing that router info or not
15:26:36 <slaweq> if yes, we can wait a bit with looking for ports
15:26:45 <slaweq> but I didn't implement anything yet
15:26:53 <slaweq> I have it on my todo list for tomorrow :)
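To make the race and the proposed flag more concrete, below is a minimal, self-contained sketch of the idea slaweq describes: mark the RouterInfo as being processed while its internal ports are handled, and have network_update wait briefly for that flag to clear before inspecting internal_ports. The class bodies, the _resync_router() helper and the use of threading.Event are illustrative stand-ins only (the real agent would need an eventlet-friendly primitive and the actual port-processing logic); this is not the actual fix.

    import threading

    class RouterInfo(object):
        def __init__(self, router_id):
            self.router_id = router_id
            self.internal_ports = []                    # cache filled while the router is processed
            self.ports_processed = threading.Event()    # hypothetical "processing finished" flag
            self.ports_processed.set()                  # idle until process() starts

        def process(self, new_ports):
            self.ports_processed.clear()                # mark the router as being processed
            try:
                # stand-in for the internal-port handling around router_info.py#L636
                self.internal_ports.extend(new_ports)
            finally:
                self.ports_processed.set()

    class L3AgentSketch(object):
        def __init__(self):
            self.router_info = {}                       # router_id -> RouterInfo

        def network_update(self, network_id):
            # counterpart of the lookup around agent.py#L602: find routers with
            # a port on this network and schedule an update for them
            for ri in self.router_info.values():
                # give an in-flight process() a chance to fill internal_ports
                # before deciding there is nothing to update for this router
                ri.ports_processed.wait(timeout=5)
                if any(p['network_id'] == network_id for p in ri.internal_ports):
                    self._resync_router(ri)

        def _resync_router(self, ri):
            print('scheduling update for router %s' % ri.router_id)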
15:27:28 <lajoskatona> regarding https://bugs.launchpad.net/neutron/+bug/1930401 and the privsep timeout
15:27:49 <lajoskatona> https://review.opendev.org/c/openstack/neutron/+/794994 is failing with old timeouts :-(
15:28:17 <lajoskatona> I started to check, but have no (quick) idea what should be there
15:28:25 <ralonsoh> but this is because what was failing is not the privsep command
15:28:35 <ralonsoh> but the daemon spawn process, right?
15:28:44 <ralonsoh> the daemon was not spawning
15:28:44 <obondarev> ++
15:29:26 <lajoskatona> ralonsoh: yeah
15:29:31 <ralonsoh> the timeout will prevent a deadlock during a command execution
15:29:46 <ralonsoh> but not prevent or mitigate the problem we have here
15:31:45 <slaweq> so we still need to have 2 dhcp agents in those tests, which helped us a lot to work around that issue
15:32:13 <ralonsoh> yes, for now, until we know what is preventing the daemon from starting
15:32:23 <slaweq> k
15:32:45 <lajoskatona> ok
15:33:43 <slaweq> #topic Tempest/Scenario
15:33:59 <slaweq> here I just wanted to ask You to review https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/799648
15:34:13 <slaweq> it's a small patch but it is causing issues in the tripleo based jobs which run ovn
15:34:18 <slaweq> thx in advance
15:34:48 <slaweq> regarding issues in those jobs, I didn't find anything new worth discussing today
15:35:34 <slaweq> and that's all from me for today regarding CI jobs
15:35:44 <slaweq> do You have anything else You want to discuss?
15:35:49 <ralonsoh> I'm fine
15:36:14 <bcafarel> same here
15:36:39 <slaweq> so one last thing - do You want to cancel next week's meeting, or does anyone want to chair it?
15:37:03 <ralonsoh> (I'm fine with having a "free" week)
15:37:08 <lajoskatona> +1
15:37:22 <ralonsoh> if there is nothing catastrophic, of course
15:37:30 <bcafarel> worst case we know where to ping people :)
15:37:38 <ralonsoh> hehehe yes
15:37:52 <slaweq> ok, good
15:37:59 <slaweq> so I will cancel next week's meeting
15:38:09 <slaweq> and I think we are done for today
15:38:24 <slaweq> thx a lot for attending and for keeping our ci up and running :)
15:38:31 <slaweq> it seems really good recently
15:38:41 <slaweq> have a great week and see You online
15:38:44 <slaweq> o/
15:38:46 <ralonsoh> bye
15:38:47 <slaweq> #endmeeting