*** dkehn_ is now known as dkehn | 01:06 | |
opendevreview | ZhouHeng proposed openstack/neutron-lib master: [ovn]Floating IP adds distributed attributes https://review.opendev.org/c/openstack/neutron-lib/+/855053 | 02:34 |
opendevreview | liujinxin proposed openstack/neutron master: For DvrEdgeRouter, snat namespace should not be created in initialize. https://review.opendev.org/c/openstack/neutron/+/855995 | 02:40 |
opendevreview | yangjianfeng proposed openstack/neutron-tempest-plugin master: Create extra external network with address scope for `ndp proxy` tests https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/855997 | 02:53 |
opendevreview | ZhouHeng proposed openstack/neutron-tempest-plugin master: skip some port_forwarding test https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/840584 | 02:53 |
opendevreview | yangjianfeng proposed openstack/neutron master: Forbid enable ndp proxy when external netwrok has no IPv6 address scope https://review.opendev.org/c/openstack/neutron/+/855850 | 03:17 |
opendevreview | yangjianfeng proposed openstack/neutron master: Forbid enable ndp proxy when external netwrok has no IPv6 address scope https://review.opendev.org/c/openstack/neutron/+/855850 | 03:36 |
opendevreview | Lajos Katona proposed openstack/networking-sfc master: Adopt to latest VlanManager changes https://review.opendev.org/c/openstack/networking-sfc/+/855887 | 08:30 |
opendevreview | Lajos Katona proposed openstack/neutron-fwaas master: Adopt to latest VlanManager changes https://review.opendev.org/c/openstack/neutron-fwaas/+/855891 | 09:16 |
opendevreview | Szymon Wróblewski proposed openstack/neutron master: Fix test_nova_send_events_* tests https://review.opendev.org/c/openstack/neutron/+/856034 | 09:41 |
opendevreview | yangjianfeng proposed openstack/neutron master: Forbid enable ndp proxy when external netwrok has no IPv6 address scope https://review.opendev.org/c/openstack/neutron/+/855850 | 10:14 |
opendevreview | yangjianfeng proposed openstack/neutron master: Forbid enable ndp proxy when external netwrok has no IPv6 address scope https://review.opendev.org/c/openstack/neutron/+/855850 | 10:52 |
opendevreview | Slawek Kaplonski proposed openstack/neutron master: Add new role "prepare_functional_tests_logs" https://review.opendev.org/c/openstack/neutron/+/855868 | 10:57 |
opendevreview | Slawek Kaplonski proposed openstack/neutron master: DNM Just run small subset of the functional jobs to test new role https://review.opendev.org/c/openstack/neutron/+/856039 | 10:57 |
opendevreview | Slawek Kaplonski proposed openstack/neutron master: Add new role "prepare_functional_tests_logs" https://review.opendev.org/c/openstack/neutron/+/855868 | 12:04 |
opendevreview | Slawek Kaplonski proposed openstack/neutron master: Add new role "prepare_functional_tests_logs" https://review.opendev.org/c/openstack/neutron/+/855868 | 13:07 |
*** kleini- is now known as kleini | 13:17 | |
opendevreview | Lajos Katona proposed openstack/networking-bagpipe stable/ussuri: [stable-only] Cap virtualenv for py37 https://review.opendev.org/c/openstack/networking-bagpipe/+/855883 | 13:23 |
*** dasm is now known as Guest2115 | 13:31 | |
*** Guest2115 is now known as dasm | 14:02 | |
slaweq | #startmeeting neutron_ci | 15:00 |
opendevmeet | Meeting started Tue Sep 6 15:00:12 2022 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot. | 15:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 15:00 |
opendevmeet | The meeting name has been set to 'neutron_ci' | 15:00 |
slaweq | hi | 15:00 |
mlavalle | o/ | 15:00 |
ykarel | o/ | 15:00 |
slaweq | ralonsoh_ is on PTO, bcafarel too | 15:02 |
slaweq | I don't think lajoskatona will be able to join today | 15:03 |
slaweq | so I guess we can start | 15:03 |
mlavalle | probably not | 15:03 |
slaweq | Grafana dashboard: https://grafana.opendev.org/d/f913631585/neutron-failure-rate?orgId=1 | 15:03 |
lajoskatona | Hi, I can, but only on IRC | 15:03 |
slaweq | lets go with first topic | 15:03 |
slaweq | lajoskatona: hi, yeah, we have it on irc today | 15:03 |
slaweq | #topic Actions from previous meetings | 15:03 |
lajoskatona | and I am on mobile data, so it's possible that I will disappear from time to time.... | 15:03 |
slaweq | slaweq to fix functional/fullstack failures on centos 9 stream: https://bugs.launchpad.net/neutron/+bug/1976323 | 15:03 |
slaweq | lajoskatona: sure, thx for the heads up | 15:03 |
slaweq | regarding that action item, I didn't make any progress really | 15:04 |
slaweq | so I will add it for myself for next week too | 15:05 |
slaweq | #action slaweq to fix functional/fullstack failures on centos 9 stream: https://bugs.launchpad.net/neutron/+bug/1976323 | 15:05 |
slaweq | next one | 15:05 |
slaweq | slaweq to check POST_FAILURE reasons | 15:05 |
slaweq | I checked it with the infra team and it seems that it is timing out while uploading logs to swift | 15:05 |
slaweq | and we have a lot of small log files in the "dsvm-functional-logs" directory, and uploading all those files to Swift may be slow | 15:06 |
lajoskatona | ok so it is not that our tests are taking longer again | 15:06 |
slaweq | so I prepared patch https://review.opendev.org/c/openstack/neutron/+/855868 | 15:06 |
slaweq | lajoskatona: nope | 15:06 |
slaweq | with that patch we will upload a .tar.gz archive of those logs to swift, which should be faster (I hope) | 15:06 |
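A minimal sketch of the compression idea slaweq describes here: the actual change is a Zuul/Ansible role ("prepare_functional_tests_logs"), so this standalone Python snippet only illustrates the core step of bundling the many small files under the functional-test log directory into a single `.tar.gz` before upload; the paths are assumptions for illustration, not the job's real locations.

```python
# Illustrative sketch only: the real change is an Ansible role in the job
# definition, but the core idea -- bundling many small log files into one
# .tar.gz before uploading to Swift -- looks roughly like this.
import pathlib
import tarfile

# Assumed path for illustration; the real job uses its own log location.
logs_dir = pathlib.Path("/opt/stack/logs/dsvm-functional-logs")
archive_path = logs_dir.parent / (logs_dir.name + ".tar.gz")

with tarfile.open(archive_path, "w:gz") as archive:
    # Adding the directory recursively keeps the per-test layout while
    # turning thousands of small uploads into a single object.
    archive.add(logs_dir, arcname=logs_dir.name)

print(f"wrote {archive_path} ({archive_path.stat().st_size} bytes)")
```

The job-side change follows the same idea: one compressed archive uploads much faster than thousands of tiny objects.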
slaweq | I also did an additional patch https://review.opendev.org/c/openstack/neutron/+/855867/ which stops storing journal.log in the job's logs | 15:07 |
slaweq | it's not needed as devstack is already doing that, storing it in the devstack.journal.gz file | 15:07 |
slaweq | so it can save some disk space and a few seconds during the job execution :) | 15:08 |
slaweq | please review both of those patches when You have a minute or two | 15:08 |
ykarel | ack | 15:08 |
slaweq | next one | 15:09 |
slaweq | ykarel to check interface not found issues in the periodic functional jobs | 15:09 |
ykarel | yes i checked all the three failures linked | 15:09 |
ykarel | All the failures share a common symptom where the interface gets deleted/added quickly, and in two of those failures neutron fails with "device missing in namespace" during that window | 15:09 |
ykarel | in those two: deleted at 02:45:35.681, readded at 02:45:35.778, failing at 02:45:35.705 | 15:09 |
ykarel | and deleted at 02:55:12.157, readded at 02:55:13.608, failing at 02:55:13.527 | 15:10 |
ykarel | One failure shares the same observations as noted by slawek in https://bugs.launchpad.net/neutron/+bug/1961740/comments/17 | 15:10 |
ykarel | from opensearch I see some more occurrences in non-periodic jobs too, in master and stable/yoga | 15:10 |
ykarel | https://opensearch.logs.openstack.org/_dashboards/app/discover/?security_tenant=global#/?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-30d,to:now))&_a=(columns:!(_source),filters:!(),index:'94869730-aea8-11ec-9e6a-83741af3fdcd',interval:auto,query:(language:kuery,query:'message:%22not%20found%20in%20namespace%20snat%22'),sort:!()) | 15:11 |
slaweq | ykarel: yes, I also saw that "readd" of the interfaces some time ago when I was investigating that | 15:11 |
slaweq | but I have no idea why it happens like that | 15:11 |
ykarel | slaweq, yes, I didn't get to the root cause of that either | 15:12 |
slaweq | some time ago I even did a patch which I hoped would work around it | 15:12 |
slaweq | let me find it | 15:12 |
ykarel | the retry one? | 15:12 |
slaweq | yes | 15:12 |
ykarel | yeap, that's not helping, at least it's not avoiding this issue completely | 15:13 |
slaweq | I know :/ | 15:13 |
ykarel | as in two of them I see the device added to the namespace without a retry | 15:13 |
slaweq | and that's strange, as the interface is added/removed/added in a short period of time | 15:13 |
ykarel | but then removed | 15:13 |
ykarel | yes | 15:14 |
ykarel | also noticed there was CPU load > 10 around the failure, but I see similar load in successful jobs | 15:15 |
ykarel | also RAM was not fully utilized during failures | 15:15 |
ykarel | also observed that many failures were seen in the test patch https://review.opendev.org/c/openstack/neutron/+/854191/ as per opensearch | 15:16 |
ykarel | but that's just to trigger jobs, a lot of jobs | 15:17 |
ykarel | I recall some time back it was discussed not to use rootwrap in functional tests; do you think that's related here? | 15:18 |
slaweq | maybe I have a theory | 15:18 |
slaweq | it is failing with an error like "Interface not found in namespace snat..." or something like that | 15:19 |
ykarel | yes | 15:19 |
slaweq | so maybe, as the device is re-added, it's not in the snat-XXX namespace but in the global namespace | 15:20 |
slaweq | and that's why it cannot find it | 15:20 |
slaweq | look at https://github.com/openstack/neutron/blob/master/neutron/agent/linux/ip_lib.py#L463 | 15:20 |
slaweq | it's where it is failing | 15:20 |
slaweq | and here "self._parent.namespace" is namespace in which interface is looked for? | 15:20 |
slaweq | and "net_ns_fd=namespace" is attribute to set for the interface | 15:21 |
slaweq | so it is expected to be in snat-XXX namespace but it's not there | 15:21 |
slaweq | as it was deleted/added again | 15:21 |
slaweq | does it make sense? | 15:21 |
ykarel | didn't get why it's in the global namespace | 15:21 |
slaweq | when You are adding a new port it's always in the global namespace first | 15:22 |
slaweq | right? | 15:22 |
ykarel | yes i think so | 15:22 |
ykarel | and to add it to the namespace it needs some explicit calls | 15:23 |
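For context on the point ykarel and slaweq are making: a newly created interface always appears in the root (global) namespace and only ends up inside a namespace such as snat-XXX after an explicit move. A minimal sketch with the standard iproute2 CLI, run via Python (needs root; the `qg-demo`/`snat-demo` names are made up for illustration):

```python
# Minimal sketch (assumes root and the iproute2 CLI): a new veth pair is
# created in the root namespace and only appears inside "snat-demo" after
# an explicit "ip link set ... netns" call -- the step the agent retries.
import subprocess


def run(*cmd: str) -> None:
    subprocess.run(cmd, check=True)


run("ip", "netns", "add", "snat-demo")
run("ip", "link", "add", "qg-demo", "type", "veth", "peer", "name", "qg-peer")

# At this point "qg-demo" is visible in the root namespace only.
run("ip", "link", "show", "qg-demo")

# Explicitly move it; only now does it show up inside the namespace.
run("ip", "link", "set", "qg-demo", "netns", "snat-demo")
run("ip", "netns", "exec", "snat-demo", "ip", "link", "show", "qg-demo")

# Cleanup: deleting one end of the veth removes both, then drop the netns.
run("ip", "link", "delete", "qg-peer")
run("ip", "netns", "delete", "snat-demo")
```

That explicit `netns` move is the step the agent performs (and retries), and when the interface is deleted and re-added in between, it is back in the root namespace waiting to be moved again.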
slaweq | ok, I know why | 15:25 |
slaweq | it's a bug in my retry | 15:25 |
ykarel | ahh | 15:25 |
slaweq | when add_device_to_namespace is called for the first time | 15:25 |
slaweq | it sets the parent namespace to the target namespace in https://github.com/openstack/neutron/blob/master/neutron/agent/linux/ip_lib.py#L464 | 15:26 |
slaweq | and when it's deleted and added again, it's in the global namespace | 15:26 |
slaweq | but _parent.namespace is already set | 15:26 |
slaweq | so that's why it's failing, as it's looking for it in the wrong namespace | 15:26 |
slaweq | :) | 15:26 |
slaweq | in this except block https://github.com/openstack/neutron/blob/master/neutron/agent/linux/interface.py#L360 | 15:27 |
slaweq | we should do something like: | 15:27 |
slaweq | device._parent.namespace = None before retrying | 15:27 |
slaweq | and that should make it work fine IMO | 15:27 |
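A rough, standalone sketch of the fix slaweq is proposing, using stand-in classes rather than neutron's real `ip_lib`/`interface` code: the one essential line is resetting `device._parent.namespace = None` before retrying, so the retry looks for the re-added device in the root namespace instead of the (now empty) snat namespace. Everything else below is a toy model built only for this illustration.

```python
# Standalone sketch of the proposed fix, with stand-in classes instead of
# neutron's real ip_lib/interface code.

class DeviceNotFound(Exception):
    pass


# Toy registry of interface locations; None represents the root (global)
# namespace, which is where a freshly (re)created interface always starts.
interface_location = {"qg-demo": None}


class _Parent:
    def __init__(self) -> None:
        self.namespace = None  # namespace the device object will search in


class FakeDevice:
    """Stand-in for ip_lib.IPDevice, just enough for this illustration."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._parent = _Parent()

    def move_to(self, namespace: str) -> None:
        # Mirrors the quoted ip_lib logic: the device is looked up in
        # self._parent.namespace before being moved to the target one.
        if interface_location[self.name] != self._parent.namespace:
            raise DeviceNotFound(
                f"{self.name} not found in namespace {self._parent.namespace}")
        interface_location[self.name] = namespace
        self._parent.namespace = namespace


def add_device_to_namespace_with_retry(device: FakeDevice,
                                       namespace: str) -> None:
    for _ in range(3):
        try:
            device.move_to(namespace)
            return
        except DeviceNotFound:
            # The proposed fix: forget the cached namespace so the next
            # attempt searches the root namespace, where the re-added
            # device actually is, instead of the empty snat namespace.
            device._parent.namespace = None
    raise DeviceNotFound(f"{device.name} never made it into {namespace}")


dev = FakeDevice("qg-demo")
dev.move_to("snat-demo")                 # first plug succeeds
interface_location["qg-demo"] = None     # simulate the quick delete/re-add
add_device_to_namespace_with_retry(dev, "snat-demo")  # retry now succeeds
print(interface_location)                # {'qg-demo': 'snat-demo'}
```

Without the reset in the except block, every retry keeps searching the cached snat namespace, which matches the failures ykarel saw even with retries in place.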
ykarel | so iiuc this will fix the case where it's failing even after multiple retries, right? | 15:28 |
slaweq | yes | 15:28 |
ykarel | not the other two cases | 15:28 |
ykarel | okk | 15:28 |
slaweq | I will propose patch for that | 15:28 |
ykarel | k Thanks | 15:29 |
slaweq | I think it may fix all cases where the interface is "re-added" | 15:30 |
slaweq | as currently the retry mechanism is broken | 15:30 |
slaweq | #action slaweq to fix add_device_to_namespace retry mechanism | 15:32 |
ykarel | if add_interface_to_namespace is called every time a port is added to the ovs-bridge, then yes, it should fix it | 15:32 |
ykarel | I still have to check the complete flow | 15:32 |
slaweq | k | 15:32 |
slaweq | I will propose patch to fix that issue which we found | 15:32 |
slaweq | but if You find anything else, please propose fixes too :) | 15:33 |
slaweq | ok, lets move on | 15:33 |
slaweq | mlavalle to check failing quota test in openstack-tox-py39-with-oslo-master periodic job | 15:33 |
mlavalle | It is failing sometimes | 15:33 |
mlavalle | I filed this bug: https://bugs.launchpad.net/neutron/+bug/1988604 | 15:34 |
mlavalle | and proposed this fix: https://review.opendev.org/c/openstack/neutron/+/855703 | 15:34 |
lajoskatona | quick question: can it be related to the sqlalchemy 2.0 vs oslo.db release thread? | 15:34 |
mlavalle | it might | 15:35 |
lajoskatona | ok, thanks, it is interesting to have an opinion on the debate | 15:35 |
lajoskatona | this morning I said let's wait with it, but if we are on the safe side to the best of our understanding, let's have a release | 15:36 |
slaweq | lajoskatona: oslo.db version which has this "issue" is 12.1.0, right? | 15:36 |
lajoskatona | yes I think | 15:36 |
slaweq | ok | 15:36 |
slaweq | in "normal" unit test jobs we are still using 12.0.0 | 15:36 |
lajoskatona | It is not really an issue, more that some projects have not adapted yet, and we have this flapping job | 15:36 |
slaweq | so that's why those jobs are working fine | 15:36 |
lajoskatona | yes, this is how I understand it | 15:37 |
slaweq | mlavalle: I just ran the experimental jobs on Your patch | 15:37 |
slaweq | I think we can run it a few times to check if that oslo-master job will be stable with it | 15:37 |
lajoskatona | but if mlavalle's patch fixes the job, I would say let's get this oslo.db release out | 15:37 |
slaweq | ++ | 15:37 |
lajoskatona | slaweq: good idea | 15:37 |
lajoskatona | I forgot that we have experimental jobs for this | 15:37 |
slaweq | mlavalle: and also, I would really like ralonsoh_ to look at Your patch too :) | 15:38 |
mlavalle | slaweq: we actually discussed it before he went on vacation | 15:38 |
lajoskatona | +1 | 15:38 |
mlavalle | it is in this channel's log a week ago | 15:38 |
slaweq | mlavalle: ahh, ok | 15:38 |
slaweq | so if he was fine with it, I'm good too :) | 15:39 |
slaweq | I trust You ;) | 15:39 |
mlavalle | yes, he was | 15:39 |
lajoskatona | ok, then let's see the experimental job results and go back to the thread | 15:39 |
slaweq | ++ | 15:39 |
slaweq | thx mlavalle | 15:39 |
slaweq | next topic then | 15:39 |
slaweq | #topic Stable branches | 15:39 |
slaweq | anything new regarding stable branches? | 15:40 |
lajoskatona | I just checked (https://review.opendev.org/c/openstack/requirements/+/855973 ) and cinder seems to be failing, but I can't check the logs on mobile data :P | 15:40 |
lajoskatona | elodilles proposed a series for capping virtualenv: https://review.opendev.org/q/topic:cap-virtualenv-py37 | 15:40 |
lajoskatona | it affects some networking projects also; I started to check (bagpipe perhaps); if you have time please keep an eye on these | 15:41 |
lajoskatona | it is for ussuri only as I see | 15:41 |
slaweq | thx lajoskatona | 15:42 |
slaweq | I will take a look | 15:42 |
lajoskatona | thanks | 15:42 |
slaweq | ok, next topic | 15:42 |
slaweq | #topic Stadium projects | 15:42 |
* slaweq will be back in 2 minutes | 15:43 | |
lajoskatona | One topic: with the segments patches we let in a change in vlanmanager that breaks some stadium projects | 15:43 |
lajoskatona | I added the patches to the etherpad | 15:43 |
lajoskatona | https://review.opendev.org/c/openstack/networking-bagpipe/+/855886 | 15:43 |
lajoskatona | https://review.opendev.org/c/openstack/networking-sfc/+/855887 | 15:43 |
lajoskatona | https://review.opendev.org/c/openstack/neutron-fwaas/+/855891 | 15:43 |
lajoskatona | it was too late when I switched to FF mode and stopped the merging of this feature, sorry for that :-( | 15:44 |
* slaweq is back | 15:45 | |
slaweq | no worries | 15:45 |
slaweq | good that we found it before final release of Zed | 15:45 |
slaweq | so we still have time to fix those | 15:45 |
lajoskatona | And I have to drop (low battery, and I have to fetch my sons from their English lesson) | 15:46 |
lajoskatona | yeah good that we have periodic jobs :-) | 15:46 |
slaweq | lajoskatona: thx, see You | 15:46 |
lajoskatona | o/ | 15:46 |
slaweq | ok, lets move on to the next topic | 15:46 |
slaweq | #topic Grafana | 15:46 |
slaweq | dashboards look pretty good IMO | 15:47 |
mlavalle | yeap | 15:47 |
slaweq | I don't see anything very bad there | 15:47 |
slaweq | do You see anything worth discussing there? | 15:47 |
slaweq | if not, I think we can quickly move on | 15:48 |
slaweq | #topic Rechecks | 15:48 |
slaweq | rechecks stats are in the meeting agenda etherpad https://etherpad.opendev.org/p/neutron-ci-meetings#L52 | 15:49 |
slaweq | basically it looks good still | 15:49 |
slaweq | last week we had 0.17 rechecks on average to get a patch merged | 15:49 |
slaweq | this week it's 1.5, but it's just the beginning of the week | 15:49 |
slaweq | so hopefully it will be better | 15:49 |
slaweq | regarding bare rechecks it's also much better this week | 15:50 |
slaweq | +---------+---------------+--------------+-------------------+... (full message at https://matrix.org/_matrix/media/r0/download/matrix.org/WAYKKcqlXmMdgZjqrbwNZgul) | 15:50 |
slaweq | thx a lot to all of You who are checking failures before rechecking :) | 15:50 |
mlavalle | +1 | 15:50 |
slaweq | anything else You want to add/ask regarding rechecks? | 15:51 |
mlavalle | nope | 15:51 |
slaweq | ok, so next topic | 15:52 |
slaweq | #topic fullstack/functional | 15:52 |
slaweq | here I found one "new" error | 15:52 |
slaweq | https://zuul.openstack.org/build/ad0801f20bc143cebf5692440b331df4 | 15:52 |
slaweq | metadata proxy didn't start | 15:52 |
slaweq | but I didn't have time to look into it more deeply | 15:53 |
slaweq | anyone wants to check it? | 15:54 |
mlavalle | I'll look | 15:54 |
slaweq | from log https://6338bbe59b3242bd04ef-84c9f5cd8c2b87d7cd3ff61e3f0a2559.ssl.cf2.rackcdn.com/periodic/opendev.org/openstack/neutron/master/neutron-functional-with-uwsgi-fips/ad0801f/controller/logs/dsvm-functional-logs/neutron.tests.functional.agent.test_dhcp_agent.DHCPAgentOVSTestCase.test_metadata_proxy_respawned.txt it seems that it was respawned | 15:54 |
mlavalle | we don't know if it's an issue yet, right? | 15:54 |
slaweq | so maybe that's some issue in the test itself | 15:54 |
slaweq | mlavalle: nope | 15:54 |
slaweq | thx for volunteering | 15:54 |
slaweq | #action mlavalle to check metadata proxy not respawned error | 15:55 |
slaweq | mlavalle: but please don't treat it as high priority (for now) as it happened only once | 15:55 |
mlavalle | yeap, that's why I asked | 15:55 |
slaweq | ++ | 15:55 |
slaweq | any other issues/questions related to the functional or fullstack jobs? | 15:56 |
slaweq | or can we move on? | 15:56 |
slaweq | ok, lets move on | 15:56 |
slaweq | #topic Tempest/Scenario | 15:56 |
slaweq | here I just wanted to share with You one failure | 15:57 |
slaweq | https://3525f1c73d59ef5d5b98-485374e596f765d9f96c9ac94e680c34.ssl.cf2.rackcdn.com/840421/34/check/neutron-tempest-plugin-ovn/b503178/testr_results.html | 15:57 |
slaweq | it seems like some segfault in the guest ubuntu image | 15:57 |
slaweq | I saw it only once and it's not a neutron-related issue | 15:57 |
slaweq | but I just wanted to make You aware of things like that | 15:57 |
slaweq | and that's all | 15:57 |
slaweq | regarding periodic jobs, it looks good this week | 15:58 |
slaweq | it was even all green for 3 or 4 days, so that's great | 15:58 |
slaweq | that's all from me for today | 15:58 |
slaweq | any last minute topics for the CI meeting for today? | 15:58 |
ykarel | none from me | 15:58 |
mlavalle | none from me either | 15:59 |
slaweq | ok, if not, then thx for attending the meeting and have a great week :) | 15:59 |
slaweq | #endmeeting | 15:59 |
opendevmeet | Meeting ended Tue Sep 6 15:59:13 2022 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 15:59 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/neutron_ci/2022/neutron_ci.2022-09-06-15.00.html | 15:59 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/neutron_ci/2022/neutron_ci.2022-09-06-15.00.txt | 15:59 |
opendevmeet | Log: https://meetings.opendev.org/meetings/neutron_ci/2022/neutron_ci.2022-09-06-15.00.log.html | 15:59 |
mlavalle | o/ | 15:59 |
opendevreview | Merged openstack/neutron-tempest-plugin master: skip some port_forwarding test https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/840584 | 16:30 |
opendevreview | Merged openstack/neutron master: [api]adds port_forwarding id when list floatingip https://review.opendev.org/c/openstack/neutron/+/840565 | 17:14 |
opendevreview | Merged openstack/neutron stable/train: Bump revision number of objects when description is changed https://review.opendev.org/c/openstack/neutron/+/854990 | 17:14 |
*** dasm is now known as dasm|off | 22:56 |