15:00:17 <slaweq> #startmeeting neutron_ci
15:00:17 <opendevmeet> Meeting started Tue Jul  6 15:00:17 2021 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:17 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:17 <opendevmeet> The meeting name has been set to 'neutron_ci'
15:00:27 <ralonsoh> hi
15:00:28 <bcafarel> hi again
15:00:30 <slaweq> hi (again)!
15:00:37 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:00:39 <slaweq> Please open now :)
15:01:37 <slaweq> ok, let's start
15:01:40 <slaweq> #topic Actions from previous meetings
15:01:46 <slaweq> amotoki to clean failing jobs in networking-odl rocky and older
15:01:50 <lajoskatona> Hi
15:01:52 <amotoki> hi
15:02:03 <obondarev> hi
15:02:27 <amotoki> zuul configuration errors happen in networking-odl, neutron-fwaas and networking-midonet in stable/rocky or older branches.
15:03:09 <amotoki> networking-odl rocky and queens have been done.
15:03:25 <amotoki> others are under reviews except networking-odl ocata.
15:03:52 <amotoki> networking-odl ocata needs to be EOL'ed as neutron ocata is already EOL. lajoskatona prepared the release patch.
15:03:59 <lajoskatona> Just for reference, the original mail: http://lists.openstack.org/pipermail/openstack-discuss/2021-June/023321.html
15:04:21 <lajoskatona> and the patch: https://review.opendev.org/c/openstack/releases/+/799473
15:04:53 <slaweq> thx amotoki and lajoskatona for taking care of all of that
15:05:08 <lajoskatona> no problem
15:05:22 <amotoki> https://review.opendev.org/c/openstack/releases/+/799472 will also clean up newton or older unofficial branches currently under the neutron governance.
15:05:23 <lajoskatona> actually there's another release patch for older branches from elod: https://review.opendev.org/c/openstack/releases/+/799472
15:05:30 <amotoki> that's all from me
15:05:38 <lajoskatona> yeah, exactly as amotoki wrote it :-)
15:05:44 <slaweq> thx
15:05:48 <slaweq> ok, so next one
15:05:50 <slaweq> lajoskatona to start EOL process for networking-odl rocky and older
15:06:01 <slaweq> I assume it's already covered as well :)
15:06:12 <lajoskatona> yeah
15:06:30 <slaweq> thx
15:06:32 <slaweq> so next one
15:06:33 <lajoskatona> I haven't sent a mail, as I assume the original mail from You, slaweq, covered odl as well
15:06:35 <slaweq> ralonsoh to check if there is a better way to check dns nameservers in cirros
15:06:50 <ralonsoh> still testing how to do it in cirros
15:06:53 <ralonsoh> in a different way
15:06:56 <slaweq> ok
15:07:10 <slaweq> I will add it as an action for You for next time so we don't forget
15:07:12 <slaweq> ok?
15:07:13 <ralonsoh> sure
15:07:19 <slaweq> #action ralonsoh to check if there is a better way to check dns nameservers in cirros
15:07:21 <slaweq> thx ralonsoh
15:07:29 <slaweq> and those are all the actions from last week
15:07:35 <slaweq> #topic Stadium projects
15:08:04 <slaweq> lajoskatona: any other updates/issues with stadium?
15:08:10 <slaweq> *stadium's ci
15:08:32 <lajoskatona> The most interesting one is the EOL of the old branches, which we already covered
15:08:47 <bcafarel> cleanups++
15:09:20 <lajoskatona> and the gerrit server restart (http://lists.openstack.org/pipermail/openstack-discuss/2021-July/023434.html ) which you mentioned in the previous meeting, so after that taas will be under the openstack/ namespace
15:09:52 <slaweq> great
15:09:55 <slaweq> thx
15:09:59 <slaweq> I think we can move on
15:10:01 <amotoki> while fixing the networking-l2gw location in zuul, I noticed a lot of failures in old stadium branches. I moved them to the experimental queue, for the record.
15:10:11 <amotoki> is it the right action?
15:10:41 <slaweq> amotoki: if we can't/don't have time to fix such failing jobs, then yes
15:10:52 <amotoki> I tried to fix them where there were simple, straightforward fixes, but otherwise I moved them to experimental.
15:10:55 <slaweq> I think it's a good approach to move such jobs to the experimental queue
15:11:00 <lajoskatona> I am fine with it, those fixes are really time consuming
15:11:12 <slaweq> thx a lot
15:11:17 <amotoki> if you remember the devstack-gate and pip/pip3 issues, it would be nice to check them, for example in neutron-fwaas.
15:11:37 <bcafarel> +1, if it can be fixed easily that's nice, but for EM branches we mostly just keep the lights on
15:11:55 <slaweq> I will try to check but I don't promise anything
15:12:39 <amotoki> most of the failing jobs still use the legacy job format and fixing them takes time :(
15:13:06 <slaweq> yeah, that is not something I'm really familiar with
15:13:19 <bcafarel> :)
15:14:09 <amotoki> anyway let's fix it separately :) I think we can move on
15:14:37 <slaweq> thx
15:14:39 <slaweq> so next topic
15:14:42 <slaweq> #topic Stable branches
15:14:53 <slaweq> bcafarel: anything regarding stable branches ci in neutron?
15:15:02 <slaweq> IMO it works pretty ok recently
15:15:09 <slaweq> but maybe I missed some issues there
15:15:14 <bcafarel> indeed, backports are getting in quite nicely
15:15:24 <bcafarel> one that went back to rocky required a few rechecks
15:15:45 <bcafarel> but nothing specific, just an array of separate tests (including compute and storage) failing
15:16:02 <bcafarel> and that branch is low on backports so no need to dig further atm
15:16:29 <bcafarel> on the fully supported branches almost all patches are in at the moment, which is nice for the upcoming stable releases :)
15:17:00 <slaweq> ++
15:17:03 <slaweq> great news
15:17:10 <slaweq> #topic Grafana
15:17:17 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:18:08 <slaweq> in grafana the failure rates seem to be pretty ok for all of the jobs recently
15:19:50 <slaweq> I think we can move on, unless You see any issue with grafana and want to talk about it now
15:20:18 <ralonsoh> nothing from me
15:20:29 <bcafarel> that looks good to me
15:20:39 <slaweq> ok, let's move on
15:20:41 <slaweq> #topic fullstack/functional
15:20:41 <bcafarel> also this is with updated job names right?
15:21:04 <slaweq> bcafarel: I think it is
15:21:31 <slaweq> regarding those jobs I have only one small update about fullstack issue https://bugs.launchpad.net/neutron/+bug/1933234
15:21:49 <slaweq> I just confirmed today with extra logs why this test is failing
15:22:29 <slaweq> basically the router is still being processed and its ports aren't yet added to the RouterInfo.internal_ports cache
15:22:45 <slaweq> when the network_update rpc message comes
15:23:02 <slaweq> because of that, it can't find the port attached to that router from the updated network
15:23:11 <slaweq> and the router update isn't scheduled
15:23:22 <ralonsoh> good catch
15:23:28 <slaweq> so this is in fact real bug, not test issue
15:23:35 <ralonsoh> indeed a race condition
15:23:51 <ralonsoh> how are we updating the network before adding the port to the internal cache?
15:23:51 <slaweq> and I don't know yet how to fix it
15:24:28 <slaweq> ralonsoh: so router is processed and goes to https://github.com/openstack/neutron/blob/3764969b82c6e7b8c74172a1ec4d230ce4ddedcc/neutron/agent/l3/router_info.py#L636
15:24:38 <slaweq> there it should be added to the internal ports cache
15:24:42 <ralonsoh> right
15:24:54 <slaweq> but in the meantime, network_update is called https://github.com/openstack/neutron/blob/3764969b82c6e7b8c74172a1ec4d230ce4ddedcc/neutron/agent/l3/agent.py#L602
15:25:13 <ralonsoh> ok, I see
15:25:16 <slaweq> and if that port is not in the cache yet, it fails to schedule the router update
15:25:31 <ralonsoh> and nothing to say "stop, we are updating this..."
15:26:06 <slaweq> yes, I think I will add some kind of flag (lock) in https://github.com/openstack/neutron/blob/3764969b82c6e7b8c74172a1ec4d230ce4ddedcc/neutron/agent/l3/router_info.py#L1256
15:26:26 <slaweq> and then in network_update we can check whether that router info is being processed or not
15:26:36 <slaweq> if it is, we can wait a bit before looking for the ports
15:26:45 <slaweq> but I didn't implement anything yet
15:26:53 <slaweq> I have it on my todo list for tomorrow :)
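For reference, a minimal sketch of the flag/lock idea slaweq describes above, with hypothetical class and function names (this is not the actual neutron code, nor necessarily the eventual fix): the router marks its internal_ports cache as "being rebuilt" while it is processed, and the network_update handler waits for that to finish before inspecting the cache.

# Sketch only: hypothetical stand-ins for RouterInfo and the agent's
# network_update(), using a simple threading.Event as the flag.
import threading


class RouterInfoSketch:
    def __init__(self, router_id, port_fetcher):
        self.router_id = router_id
        self.internal_ports = []
        self._port_fetcher = port_fetcher
        # Cleared while a router update is rebuilding internal_ports,
        # set again once the cache is consistent.
        self._cache_ready = threading.Event()
        self._cache_ready.set()

    def process(self):
        # Rough equivalent of _process_internal_ports(): repopulate the cache.
        self._cache_ready.clear()
        try:
            self.internal_ports = list(self._port_fetcher())
        finally:
            self._cache_ready.set()

    def wait_until_ready(self, timeout=10):
        # True if no router update is in flight, or it finished in time.
        return self._cache_ready.wait(timeout)


def network_update_sketch(routers, network_id, schedule_router_update):
    # Rough equivalent of the agent's network_update(): only trust the
    # internal_ports cache once any in-flight router update has finished.
    for ri in routers:
        if not ri.wait_until_ready():
            # Still being processed after the timeout: schedule an update
            # anyway rather than silently skipping the router.
            schedule_router_update(ri.router_id)
            continue
        if any(p['network_id'] == network_id for p in ri.internal_ports):
            schedule_router_update(ri.router_id)

The real fix may end up looking quite different (for example reusing an existing per-router lock or the agent's update queue); the sketch only illustrates the "wait a bit before looking for ports" idea from the discussion.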
15:27:28 <lajoskatona> regarding https://bugs.launchpad.net/neutron/+bug/1930401 and privsep timeout
15:27:49 <lajoskatona> https://review.opendev.org/c/openstack/neutron/+/794994 is still failing with the old timeouts :-(
15:28:17 <lajoskatona> I started to check, but have no (quick) idea what should be there
15:28:25 <ralonsoh> but this is because what was failing is not the privsep command
15:28:35 <ralonsoh> but the daemon spawn process, right?
15:28:44 <ralonsoh> the daemon was not spawning
15:28:44 <obondarev> ++
15:29:26 <lajoskatona> ralonsoh: yeah
15:29:31 <ralonsoh> the timeout will prevent a deadlock during a command execution
15:29:46 <ralonsoh> but it won't prevent or mitigate the problem we have here
15:31:45 <slaweq> so we still need to have 2 dhcp agents in those tests, which helped us a lot to work around that issue
15:32:13 <ralonsoh> yes, for now, until we know what is preventing the daemon from starting
15:32:23 <slaweq> k
15:32:45 <lajoskatona> ok
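For context, a minimal sketch of the distinction ralonsoh makes above, assuming oslo.privsep's entrypoint_with_timeout decorator and a hypothetical context: the per-call timeout only bounds a privileged call once the daemon is running, so it cannot help when the daemon itself fails to spawn.

# Sketch only: a standalone privsep context, not neutron's real privileged setup.
from oslo_privsep import capabilities, priv_context

ctx = priv_context.PrivContext(
    __name__,
    cfg_section='privsep_sketch',
    pypath=__name__ + '.ctx',
    capabilities=[capabilities.CAP_NET_ADMIN],
)


# Assumption: entrypoint_with_timeout is available in the installed oslo.privsep.
@ctx.entrypoint_with_timeout(10)
def read_net_dev():
    # Runs inside the privsep daemon; the 10 second budget covers only this
    # call, not the daemon fork/exec that has to happen first.
    with open('/proc/net/dev') as f:
        return f.read()

If the daemon never spawns, the failure surfaces from the daemon start-up path before any per-call timeout applies, which is why the two-DHCP-agent workaround mentioned above is still needed.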
15:33:43 <slaweq> #topic Tempest/Scenario
15:33:59 <slaweq> here I just wanted to ask You to review https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/799648
15:34:13 <slaweq> it's a small patch but the problem it addresses is causing issues in the tripleo based jobs which run ovn
15:34:18 <slaweq> thx in advance
15:34:48 <slaweq> regarding issues in those jobs, I didn't find anything new worth discussing today
15:35:34 <slaweq> and that's all from me for today regarding CI jobs
15:35:44 <slaweq> do You have anything else You want to discuss?
15:35:49 <ralonsoh> I'm fine
15:36:14 <bcafarel> same here
15:36:39 <slaweq> so one last thing - do You want to cancel next week's meeting, or does anyone want to chair it?
15:37:03 <ralonsoh> (I'm fine with having a "free" week)
15:37:08 <lajoskatona> +1
15:37:22 <ralonsoh> if there is nothing catastrophic, of course
15:37:30 <bcafarel> worst case we know where to ping people :)
15:37:38 <ralonsoh> hehehe yes
15:37:52 <slaweq> ok, good
15:37:59 <slaweq> so I will cancel next week's meeting
15:38:09 <slaweq> and I think we are done for today
15:38:24 <slaweq> thx a lot for attending and for keeping our ci up and running :)
15:38:31 <slaweq> it seems really good recently
15:38:41 <slaweq> have a great week and see You online
15:38:44 <slaweq> o/
15:38:46 <ralonsoh> bye
15:38:47 <slaweq> #endmeeting