*** jbadiapa has quit IRC | 00:21 | |
*** jbadiapa has joined #openstack-meeting-3 | 00:34 | |
*** yasufum has joined #openstack-meeting-3 | 00:45 | |
*** manub has quit IRC | 02:53 | |
*** manub has joined #openstack-meeting-3 | 04:31 | |
*** ricolin_ has joined #openstack-meeting-3 | 04:32 | |
*** manub has quit IRC | 04:33 | |
*** ricolin has quit IRC | 04:36 | |
*** ricolin_ is now known as ricolin | 04:36 | |
*** ralonsoh has joined #openstack-meeting-3 | 05:41 | |
*** slaweq has joined #openstack-meeting-3 | 06:17 | |
*** jbadiapa has quit IRC | 07:02 | |
*** jbadiapa has joined #openstack-meeting-3 | 07:02 | |
*** tosky has joined #openstack-meeting-3 | 07:48 | |
*** tosky has quit IRC | 07:48 | |
*** tosky has joined #openstack-meeting-3 | 08:08 | |
*** stephenfin has quit IRC | 08:49 | |
*** stephenfin has joined #openstack-meeting-3 | 09:15 | |
*** obondarev has joined #openstack-meeting-3 | 11:32 | |
*** jbadiapa has quit IRC | 12:07 | |
*** jbadiapa has joined #openstack-meeting-3 | 12:09 | |
*** belmoreira has joined #openstack-meeting-3 | 13:51 | |
*** jlibosva has joined #openstack-meeting-3 | 13:56 | |
*** lajoskatona has joined #openstack-meeting-3 | 14:00 | |
slaweq | #startmeeting networking | 14:01 |
opendevmeet | Meeting started Tue Jun 1 14:01:50 2021 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot. | 14:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 14:01 |
opendevmeet | The meeting name has been set to 'networking' | 14:01 |
rubasov | hi | 14:02 |
lajoskatona | Hi | 14:02 |
jlibosva | hi | 14:02 |
ralonsoh | hi | 14:02 |
amotoki | o/ | 14:02 |
obondarev | hi | 14:03 |
bcafarel | o/ | 14:03 |
slaweq | ok, let's start | 14:04 |
*** manub has joined #openstack-meeting-3 | 14:04 | |
slaweq | welcome to our new irc "home" :) | 14:04 |
slaweq | #topic announcements | 14:05 |
slaweq | Xena cycle calendar https://releases.openstack.org/xena/schedule.html | 14:05 |
njohnston | o/ | 14:06 |
slaweq | we have about 5 more weeks before Xena-2 milestone | 14:06 |
slaweq | and I want to ask everyone to try to review some of the open specs, so we can get them merged before milestone 2 and move on with the implementation later | 14:06 |
slaweq | next one | 14:07 |
slaweq | Move to OFTC is done: http://lists.openstack.org/pipermail/openstack-discuss/2021-May/022724.html | 14:07 |
slaweq | You probably all know that already as You are here | 14:07 |
bcafarel | :) | 14:07 |
slaweq | but just for the sake of announcements | 14:07 |
slaweq | regarding that migration I have a couple of related things | 14:07 |
slaweq | if You can, please stay a few more days on the old channel, and redirect people to the new one if necessary, | 14:07 |
slaweq | but please remember not to explicitly mention the name of the new server on the freenode channel | 14:08 |
slaweq | as that may cause our channel there to be taken over | 14:08 |
slaweq | neutron docs are updated: https://review.opendev.org/c/openstack/neutron/+/793846 - if there are any other places to change, please ping me or send a patch to update that | 14:08 |
*** mlavalle has joined #openstack-meeting-3 | 14:09 | |
slaweq | and last thing related to that migration | 14:09 |
mlavalle | o/ | 14:09 |
slaweq | hi mlavalle - You were fast connecting here :) | 14:09 |
mlavalle | LOL, took me a few minutes to figure it out | 14:10 |
slaweq | :) | 14:10 |
slaweq | tc asked teams to consider moving from the meeting rooms to the project channels for meetings | 14:10 |
slaweq | so I wanted to ask You what do You think about it? | 14:10 |
slaweq | do You think we should still have all meetings in the #openstack-meeting-X channels? | 14:10 |
slaweq | or maybe we can move to the #openstack-neutron with all meetings? | 14:11 |
amotoki | the only downside I see is that the openstack bot could post messages about reviews. Otherwise sounds good. | 14:11 |
slaweq | amotoki: do You know if that is an issue for other teams maybe? | 14:12 |
amotoki | slaweq: unfortunately no | 14:12 |
slaweq | I know that some teams already have their meetings in the team channels | 14:12 |
obondarev | some ongoing discussions in team channel would have to be paused during meeting time | 14:13 |
lajoskatona | Perhaps that would make it easier to collect people for the meetings | 14:13 |
slaweq | obondarev: yes, but tbh, do we have a lot of them? :) | 14:13 |
bcafarel | overall I think #openstack-neutron is generally "quiet" enough now to have meetings in it | 14:13 |
bcafarel | but no strong opinion, I will just update my tabs :) (or follow the PTL pings) | 14:14 |
obondarev | just trying to figure out downsides, I don't think it should stop us :) | 14:14 |
amotoki | obondarev's point and mine are potential downsides. generally speaking, the #-neutron channel does not have much traffic, so I am fine with having meetings in the #-neutron channel. | 14:14 |
rubasov | I'm also okay with both options | 14:14 |
obondarev | +1 | 14:15 |
slaweq | ok, I will try to get more info from the tc about why they want teams to migrate | 14:15 |
njohnston | The octavia team has met in their channel for a long time now without issues | 14:15 |
slaweq | njohnston: thx, that's good to know | 14:16 |
slaweq | do You maybe know why the tc wants teams to do such a migration? is there any issue with using the -meeting channels? | 14:16 |
* slaweq is just curious | 14:16 | |
njohnston | I'm not sure, I did not attend that discussion. | 14:17 |
slaweq | ok | 14:17 |
slaweq | so I will try to get that info, and as there are no strong opinions against it, I will update our meetings accordingly | 14:18 |
slaweq | so please expect some email with info about it during this week :) | 14:19 |
slaweq | ok, last reminder | 14:19 |
slaweq | Nomination of Y name proposals is still open http://lists.openstack.org/pipermail/openstack-discuss/2021-May/022383.html | 14:19 |
slaweq | that's all announcements/reminders from me | 14:19 |
slaweq | do You have anything else You want to share now? | 14:20 |
amotoki | I have one thing I would like to share around the OVN switch and horizon. I don't want to dig into it in detail here though. | 14:20 |
amotoki | horizon sees consistent failures in the integration tests after switching the network backend to OVN. The failures happen in router related tests. | 14:20 |
amotoki | We haven't understood what happens in detail, but if you notice some difference between the OVS/L3 and OVN cases please let vishal (PTL) or me know. | 14:20 |
amotoki | that's all I would like to share. | 14:20 |
slaweq | amotoki: there are for sure differences as in OVN/L3 there is no "distributed" attribute supported - routers are distributed by default | 14:21 |
slaweq | and the other difference I know of is that ovn/l3 doesn't have an l3 agent scheduler | 14:21 |
amotoki | mistake in the third comment : it should be "When we configure the job to use OVS backend, the failure (at least on the router related failures) has gone." | 14:22 |
slaweq | so all "agent" related api calls will probably fail | 14:22 |
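A minimal openstacksdk sketch of the difference described above (assuming a `devstack` clouds.yaml entry; the cloud name is a placeholder): under ML2/OVS the L3 agents show up in the agent list, while under OVN there are none, so anything built around L3 agent scheduling has nothing to operate on.

```python
import openstack

conn = openstack.connect(cloud='devstack')  # placeholder clouds.yaml entry

# 'L3 agent' is the agent_type string reported by neutron-l3-agent.
l3_agents = [a for a in conn.network.agents() if a.agent_type == 'L3 agent']
if not l3_agents:
    # Typical for an OVN backend: routers are distributed by default,
    # there is no L3 agent scheduler, and agent-scheduling API calls fail.
    print('no L3 agents found - likely an OVN deployment')
```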
amotoki | slaweq: yeah, I know. I think horizon needs to investigate in more detail what's happening. | 14:22 |
slaweq | amotoki: do You have a link to the failed job? I can take a look after the meeting | 14:22 |
slaweq | amotoki: we also had similar issue with functional job in osc | 14:23 |
amotoki | I think https://zuul.openstack.org/builds?job_name=horizon-integration-tests&project=openstack%2Fhorizon&branch=master&pipeline=periodic is the easiest way. | 14:23 |
slaweq | I sent patch https://review.opendev.org/c/openstack/python-openstackclient/+/793142 to address issues in that job | 14:23 |
amotoki | I saw it too. | 14:23 |
slaweq | and there are also some fixes on neutron side needed (see depends-on in the linked patch) | 14:23 |
amotoki | anyway I will continue to follow up on the failure details :) | 14:24 |
slaweq | ok | 14:25 |
slaweq | #topic Blueprints | 14:25 |
slaweq | Xena-2 BPs: https://bugs.launchpad.net/neutron/+milestone/xena-2 | 14:26 |
slaweq | do You have any updates about any of them? | 14:26 |
slaweq | I have only one short update about https://blueprints.launchpad.net/neutron/+spec/secure-rbac-roles | 14:27 |
slaweq | almost all UT patches are now merged | 14:27 |
slaweq | only one left | 14:27 |
bcafarel | nice work | 14:29 |
slaweq | now I need to check how to switch some devstack-based job to use those new defaults | 14:30 |
slaweq | and then to check what will be broken | 14:30 |
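The switch mostly comes down to the two oslo.policy options below (real option names); how a devstack-based job wires them in is left open here, so treat this as a sketch of the intended neutron.conf end state rather than the job definition.

```ini
[oslo_policy]
# Use the new secure-RBAC default policies instead of the deprecated ones.
enforce_new_defaults = True
# Additionally enforce token scope (system vs. project) checks.
enforce_scope = True
```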
slaweq | but tbh I'm not sure if we should mark the BP as completed now or keep it open? | 14:30 |
slaweq | all new default rules are there already | 14:30 |
slaweq | so IMO we can simply treat any issues which we find as bugs and report them on LP | 14:31 |
slaweq | wdyt? | 14:31 |
obondarev | makes sense | 14:31 |
slaweq | IMO it would be easier to track them | 14:31 |
bcafarel | well, the feature is implemented, the code is in there etc., so marking the BP as completed makes sense | 14:31 |
ralonsoh | yeah, we can open new LP for new issues | 14:32 |
amotoki | makes sense to me too. | 14:32 |
slaweq | k, thx | 14:32 |
slaweq | and that's all update from my side regarding BPs | 14:32 |
slaweq | if there are no other updates, I think we can move on | 14:33 |
slaweq | to the next topic | 14:33 |
slaweq | #topic Bugs | 14:33 |
slaweq | I was bug deputy last week. Report http://lists.openstack.org/pipermail/openstack-discuss/2021-May/022776.html | 14:33 |
slaweq | there are a couple of bugs which I wanted to highlight now | 14:33 |
slaweq | first of all, l3-ha issue: | 14:33 |
slaweq | https://bugs.launchpad.net/neutron/+bug/1930096 - this is related to L3ha and seems like a pretty serious issue which needs to be checked. | 14:33 |
opendevmeet | Launchpad bug 1930096 in neutron "Missing static routes after neutron-l3-agent restart" [High,Confirmed] | 14:33 |
slaweq | according to my test, it seems that this may be related to the change which sets interfaces DOWN on the standby node | 14:34 |
slaweq | and brings them UP when the router becomes active on the node | 14:34 |
slaweq | I saw in the logs locally that keepalived was complaining that there is no route to host, and the extra routes were not configured in the qrouter namespace | 14:35 |
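A rough way to eyeball that symptom on an L3-HA pair, using standard iproute2 commands (`<router-id>` is a placeholder):

```sh
# Interface state in the router namespace: UP on the active node, and,
# since the change mentioned above, DOWN on the standby node.
ip netns exec qrouter-<router-id> ip -o link

# The configured extra/static routes should be listed here on the active
# node; per the bug report they are missing after the l3-agent restarts.
ip netns exec qrouter-<router-id> ip route
```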
slaweq | if anyone has cycles, please try to check that | 14:35 |
slaweq | I may try but not this week for sure | 14:36 |
ralonsoh | I can check it | 14:36 |
slaweq | ralonsoh: thx | 14:36 |
obondarev | ralonsoh has infinite cycles it seems :) | 14:36 |
slaweq | obondarev: yes, that's true | 14:37 |
slaweq | :) | 14:37 |
mlavalle | There are several ralonsohs in parallel universes all working together | 14:37 |
slaweq | mlavalle: that can be the only reasonable explanation ;) | 14:38 |
ralonsoh | hehehe | 14:38 |
amotoki | :) | 14:38 |
slaweq | ok, next one | 14:38 |
slaweq | https://bugs.launchpad.net/neutron/+bug/1929523 - this one is gate blocker | 14:38 |
opendevmeet | Launchpad bug 1929523 in neutron "Test tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_subnet_details is failing from time to time" [High,Confirmed] | 14:38 |
slaweq | and as our gate isn't in good condition now, we need to take a look at that | 14:39 |
ralonsoh | there was a related patch from Candido | 14:39 |
slaweq | I may try to check that tomorrow or on Friday | 14:39 |
ralonsoh | I'll send you the link | 14:39 |
slaweq | ralonsoh: but I'm not sure if it's the same issue really | 14:39 |
slaweq | please send me link and I will take a look | 14:39 |
ralonsoh | it is, the DNS entries | 14:39 |
slaweq | would be great :) | 14:40 |
ralonsoh | https://review.opendev.org/c/openstack/tempest/+/779756 | 14:40 |
ralonsoh | of course, maybe we need something more | 14:40 |
slaweq | thx ralonsoh | 14:42 |
slaweq | I will take a look at it | 14:42 |
slaweq | and the last bug from me: | 14:42 |
slaweq | https://bugs.launchpad.net/neutron/+bug/1929821 - that seems to be a low-hanging-fruit bug, so maybe there is someone who wants to take it :) | 14:42 |
opendevmeet | Launchpad bug 1929821 in neutron "[dvr] misleading fip rule priority not found error message" [Low,New] | 14:42 |
slaweq | and those are all the bugs I had for today | 14:43 |
slaweq | any other issues You want to discuss? | 14:44 |
mlavalle | not me | 14:44 |
slaweq | bug deputy this week is hongbin | 14:45 |
slaweq | he is aware of it and should be ok | 14:45 |
slaweq | next week will be haleyb's turn | 14:45 |
haleyb | hi | 14:45 |
slaweq | hi haleyb :) | 14:46 |
mlavalle | hi haleyb! | 14:46 |
slaweq | are You ok being bug deputy next week? | 14:46 |
haleyb | yes, that's fine | 14:46 |
slaweq | great, thx | 14:46 |
slaweq | so, let's move on | 14:46 |
slaweq | #topic CLI/SDK | 14:47 |
slaweq | OSC patch https://review.opendev.org/c/openstack/python-openstackclient/+/768210 is merged now | 14:47 |
slaweq | so with the next osc release we should have the possibility to send custom parameters to neutron | 14:47 |
bcafarel | \o/ | 14:47 |
slaweq | thanks all for reviews of that patch :) | 14:47 |
amotoki | really nice | 14:47 |
*** jlibosva has quit IRC | 14:47 | |
slaweq | I proposed also neutronclient patch https://review.opendev.org/c/openstack/python-neutronclient/+/793366 | 14:48 |
slaweq | please check that when You have some time | 14:48 |
slaweq | and I have 1 more question - is there anything else we should do to finish that effort? | 14:48 |
amotoki | generally no, but I'd like to check the --clear behavior consistency in detail. In some APIs, None needs to be sent to clear the list (instead of []). | 14:49 |
slaweq | AFAIR that OSC thing was the last missing piece but maybe I missed something | 14:49 |
amotoki | I am not sure it should be handled via CLI or API itself. | 14:49 |
slaweq | amotoki: You mean to check if it works ok for all our resources already? | 14:50 |
amotoki | slaweq: yeah, I would like to check some APIs which use None to clear the list, but I don't think it is a blocking issue for the final call. | 14:51 |
slaweq | amotoki: great, if You find any issues, please report a bug for OSC and we can fix them | 14:51 |
amotoki | slaweq: sure | 14:51 |
slaweq | amotoki++ thx | 14:51 |
slaweq | so we are finally approaching the EOL of the neutronclient CLI :) | 14:51 |
slaweq | please be aware | 14:52 |
slaweq | and that are all topics from me for today | 14:52 |
obondarev | raise your hand if you still use neutronclient o/ :D | 14:52 |
slaweq | obondarev: o/ | 14:53 |
slaweq | but it's time to move to the OSC now :) | 14:53 |
obondarev | yeah, sad but true | 14:53 |
amotoki | I'm still using the neutron CLI in a mitaka env I need to take care of :p | 14:53 |
slaweq | amotoki: WOW, that's an old one | 14:53 |
slaweq | I hope You don't need to do backports from master to that mitaka branch :P | 14:53 |
amotoki | hehe. it's not so surprising in a telco env, | 14:54 |
slaweq | :) | 14:54 |
bcafarel | true | 14:54 |
slaweq | ok, thx for attending the meeting | 14:54 |
slaweq | and have a great week | 14:54 |
mlavalle | o/ | 14:54 |
ralonsoh | bye | 14:54 |
slaweq | o/ | 14:54 |
amotoki | o/ | 14:54 |
rubasov | o/ | 14:54 |
bcafarel | o/ | 14:54 |
slaweq | #endmeeting | 14:54 |
opendevmeet | Meeting ended Tue Jun 1 14:54:55 2021 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 14:54 |
opendevmeet | Minutes: http://eavesdrop.openstack.org/meetings/networking/2021/networking.2021-06-01-14.01.html | 14:54 |
opendevmeet | Minutes (text): http://eavesdrop.openstack.org/meetings/networking/2021/networking.2021-06-01-14.01.txt | 14:55 |
opendevmeet | Log: http://eavesdrop.openstack.org/meetings/networking/2021/networking.2021-06-01-14.01.log.html | 14:55 |
lajoskatona | o/ | 14:55 |
manub | o/ | 14:55 |
*** manub has quit IRC | 14:55 | |
slaweq | #startmeeting neutron_ci | 15:01 |
opendevmeet | Meeting started Tue Jun 1 15:01:37 2021 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot. | 15:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 15:01 |
opendevmeet | The meeting name has been set to 'neutron_ci' | 15:01 |
ralonsoh | hi | 15:02 |
lajoskatona | Hi | 15:03 |
obondarev | hi | 15:03 |
slaweq | bcafarel: ping | 15:03 |
slaweq | ci meeting | 15:03 |
bcafarel | o/ sorry | 15:03 |
slaweq | np :) | 15:03 |
bcafarel | I got used to the usual 15-20 min break between these meetings :p | 15:03 |
slaweq | ok, let's start | 15:03 |
slaweq | Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate | 15:04 |
slaweq | Please open now :) | 15:04 |
slaweq | #topic Actions from previous meetings | 15:04 |
slaweq | obondarev to check neutron-tempest-dvr-ha-multinode-full and switch it to ML2/OVS | 15:04 |
obondarev | https://review.opendev.org/c/openstack/neutron/+/793104 | 15:05 |
obondarev | ready | 15:05 |
*** sean-k-mooney has joined #openstack-meeting-3 | 15:05 | |
slaweq | obondarev: thx | 15:05 |
slaweq | today I found out that there is also neutron-tempest-ipv6 which is now running on ovn | 15:06 |
slaweq | and now the question is - do we want to switch it back to ovs or keep it with default backend? | 15:06 |
ralonsoh | it is not failing | 15:06 |
lajoskatona | +1 | 15:07 |
slaweq | I would say - keep it with default backend (ovn now) but maybe You have other opinions about it | 15:07 |
ralonsoh | so, IMO, keep it in OVN | 15:07 |
lajoskatona | agree | 15:07 |
obondarev | will it mean that the reference implementation will not be covered in the gates? | 15:07 |
obondarev | with tempest ipv6 tests | 15:07 |
slaweq | obondarev: what do You mean by reference implementation? ML2/OVS? | 15:07 |
bcafarel | it may become close to a duplicate of neutron-ovn-tempest-ovs-release-ipv6-only too? | 15:07 |
obondarev | if yes - that would be a problem | 15:07 |
obondarev | slaweq, yes ML2-OVS | 15:08 |
slaweq | bcafarel: good point, I missed that we have such job already | 15:08 |
ralonsoh | if I'm not wrong, ovs release uses master | 15:08 |
ralonsoh | right? | 15:08 |
ralonsoh | in any case, it won't affect and could be a duplicate | 15:08 |
slaweq | so according to that and to what obondarev said, maybe we should switch neutron-tempest-ipv6 back to ml2/ovs | 15:09 |
obondarev | so if the reference ML2/OVS is not protected by CI then it's prone to regressions, which is not good | 15:09 |
obondarev | for ipv6 again | 15:09 |
slaweq | obondarev: that's good point, we don't want regression in ML2/OVS for sure | 15:10 |
ralonsoh | btw, why don't we rename neutron-ovn-tempest-ovs-release-ipv6-only to be OVS? | 15:10 |
ralonsoh | and keep neutron-tempest-ipv6 with the default backend | 15:10 |
slaweq | ralonsoh: that's my other question - what should we do as the next step with our jobs | 15:10 |
slaweq | should we switch "*-ovn" jobs to be "-ovs" and keep "default" jobs as ovn ones now? | 15:11 |
slaweq | to reflect the devstack change in our ci jobs too? | 15:11 |
slaweq | or should we for now just keep everything as it was, so "regular" jobs running ovs and "-ovn" jobs running ovn | 15:11 |
slaweq | wdyt? | 15:12 |
ralonsoh | IMO, rename those with a different backend | 15:12 |
ralonsoh | in this case, -ovs | 15:12 |
obondarev | that makes sense | 15:13 |
bcafarel | +1 at least for a while, it will be clearer | 15:13 |
slaweq | ok, so we need to change many of our jobs now :) | 15:13 |
lajoskatona | yeah, make the names help identify what the backend is | 15:14 |
bcafarel | before there were only select jobs with linuxbridge (and some new ones with ovn) so naming was clear, but with the default switch, it can get confusing (even for us :) ) | 15:14 |
slaweq | but I agree that this is better long term, especially as some of our jobs inherit e.g. from tempest jobs, and those tempest jobs with default settings also run in e.g. tempest's or nova's gate | 15:14 |
obondarev | we can set them to use OVS explicitly as a first step | 15:14 |
obondarev | and go on with renaming as a second step | 15:14 |
slaweq | obondarev: yes, I agree | 15:14 |
slaweq | let's merge the patches which we have now to enforce ovs where it was before, to have a working ci | 15:15 |
slaweq | and then let's switch the jobs completely | 15:15 |
slaweq | as that will require more work for sure | 15:15 |
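As a sketch of that first step (the exact job definitions may differ; `devstack_localrc` and `Q_AGENT` are the usual devstack-on-zuul knobs), pinning a job to ML2/OVS instead of relying on the new devstack default looks roughly like this:

```yaml
- job:
    name: neutron-tempest-ipv6
    # parent and other settings unchanged
    vars:
      devstack_localrc:
        Q_AGENT: openvswitch  # keep covering the ML2/OVS reference backend
```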
bcafarel | sounds good to me! | 15:16 |
slaweq | ok, sounds like a plan :) | 15:16 |
slaweq | I will try to prepare a plan for switching the jobs for next week | 15:17 |
slaweq | #action slaweq to prepare a plan for the ovn <-> ovs job switch in neutron CI | 15:17 |
slaweq | ok | 15:19 |
slaweq | next one | 15:19 |
slaweq | ralonsoh to talk with ccamposr about issue https://bugs.launchpad.net/neutron/+bug/1929523 | 15:19 |
opendevmeet | Launchpad bug 1929523 in neutron "Test tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_subnet_details is failing from time to time" [High,Confirmed] | 15:19 |
slaweq | ralonsoh: it's related to the patch https://review.opendev.org/c/openstack/tempest/+/779756 right? | 15:19 |
ralonsoh | yes | 15:20 |
slaweq | ralonsoh: I'm still not convinced that this will solve that problem from https://bugs.launchpad.net/neutron/+bug/1929523 | 15:20 |
opendevmeet | Launchpad bug 1929523 in neutron "Test tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_subnet_details is failing from time to time" [High,Confirmed] | 15:20 |
slaweq | as the issue is a bit different now | 15:21 |
slaweq | it's not that we have additional server in the list | 15:21 |
ralonsoh | in this case we don't have any DNS register | 15:21 |
slaweq | but we got empty list | 15:21 |
slaweq | and e.g. failure https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_567/785895/1/gate/neutron-tempest-slow-py3/567fc7f/testr_results.html happened on 20.05, more than a week after patch https://review.opendev.org/c/openstack/tempest/+/779756 was merged | 15:22 |
ralonsoh | are we using cirros or ubuntu? | 15:22 |
ralonsoh | if we use the advanced image, maybe we should use resolvectl | 15:22 |
slaweq | in that failed test, Ubuntu | 15:22 |
ralonsoh | instead of reading /etc/resolv.conf | 15:23 |
ralonsoh | I'll propose a patch in tempest to use resolvectl, if present in the VM | 15:23 |
slaweq | k | 15:23 |
ralonsoh | that should be more accurate | 15:23 |
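Why resolvectl is more accurate on an Ubuntu guest (standard systemd-resolved behaviour): /etc/resolv.conf usually points only at the local stub resolver, so a test parsing it never sees the DNS servers actually pushed via DHCP.

```sh
cat /etc/resolv.conf   # often just "nameserver 127.0.0.53" (the local stub)
resolvectl dns         # lists the real per-link upstream DNS servers
```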
slaweq | maybe indeed that will help | 15:23 |
slaweq | thx ralonsoh | 15:23 |
slaweq | #action ralonsoh to propose tempest patch to use resolvectl to address https://bugs.launchpad.net/neutron/+bug/1929523 | 15:24 |
opendevmeet | Launchpad bug 1929523 in neutron "Test tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_subnet_details is failing from time to time" [High,Confirmed] | 15:24 |
slaweq | ok, I think we can move on | 15:24 |
slaweq | #topic Stadium projects | 15:24 |
slaweq | lajoskatona: any updates? | 15:24 |
lajoskatona | nothing | 15:25 |
*** jpward has joined #openstack-meeting-3 | 15:25 | |
lajoskatona | I think one backend change patch is open, let me check | 15:25 |
lajoskatona | https://review.opendev.org/c/openstack/networking-bagpipe/+/791126 | 15:25 |
lajoskatona | yeah, one more thing: boden's old patches for using payload are active again, but I hope there will be no problem with them | 15:26 |
slaweq | yes, I saw some of them already | 15:27 |
lajoskatona | I have seen the payload patch in x/vmware-nsx abandoned, and I have no right to restore it | 15:27 |
lajoskatona | not sure if we have to warn them somehow | 15:27 |
slaweq | good idea, I will try to reach out to boden somehow | 15:28 |
slaweq | maybe he will redirect me to someone who is now working on it | 15:28 |
slaweq | #action slaweq to reach out to boden about payload patches and x/vmware-nsx | 15:28 |
slaweq | thx lajoskatona | 15:28 |
slaweq | if that is all, lets move on | 15:31 |
slaweq | #topic Stable branches | 15:31 |
slaweq | bcafarel: anything new here? | 15:31 |
bcafarel | mostly good all around :) one question I had there (coming from https://review.opendev.org/c/openstack/neutron/+/793417/ failing backport) | 15:31 |
bcafarel | other branches do not have that irrelevant-files issue as in newer branches these jobs run in periodic | 15:32 |
bcafarel | but for victoria/ussuri I think it is better to fix the job dep instead of backporting the move to periodic | 15:33 |
slaweq | I think we are still missing many patches like https://review.opendev.org/q/topic:%22improve-neutron-ci-stable%252Fussuri%22+(status:open%20OR%20status:merged) | 15:33 |
slaweq | in stable branches | 15:33 |
slaweq | and that's only example for ussuri | 15:34 |
slaweq | but similar patches are opened for other branches too | 15:34 |
bcafarel | yes, getting these ones in will probably help ussuri in general too | 15:34 |
bcafarel | I have https://review.opendev.org/c/openstack/neutron/+/793799 and https://review.opendev.org/c/openstack/neutron/+/793801 mostly for that provider job issue | 15:35 |
bcafarel | ralonsoh: looks like that whole chain is just waiting on https://review.opendev.org/c/openstack/neutron/+/778708 if you can check it | 15:36 |
ralonsoh | I'll do | 15:36 |
ralonsoh | ah I know this patch, perfect | 15:37 |
bcafarel | yes hopefully it should not take too much off of your infinite cycles :) | 15:37 |
slaweq | LOL | 15:38 |
slaweq | thx ralonsoh | 15:38 |
slaweq | ok, lets move on | 15:39 |
slaweq | #topic Grafana | 15:39 |
slaweq | we have our gate broken now due to the ovs->ovn migration and some other issues | 15:39 |
slaweq | so we can only focus on the check queue graphs today | 15:39 |
slaweq | and the biggest issues which I see for now are with neutron-ovn-tempest-slow job | 15:40 |
slaweq | which is failing very often | 15:40 |
slaweq | and ralonsoh already proposed to make it non-voting temporarily | 15:40 |
ralonsoh | yes | 15:40 |
slaweq | I reported LP for that https://bugs.launchpad.net/neutron/+bug/1930402 | 15:41 |
opendevmeet | Launchpad bug 1930402 in neutron "SSH timeouts happens very often in the ovn based CI jobs" [Critical,Confirmed] | 15:41 |
slaweq | and I know that jlibosva and lucasgomes are looking into it | 15:41 |
slaweq | do You have anything else regarding grafana? | 15:42 |
bcafarel | I see openstack-tox-py36-with-neutron-lib-master started 100% failing in periodic a few days ago | 15:42 |
ralonsoh | link? | 15:43 |
slaweq | bcafarel: yes, I had it for the last topic of the meeting :) | 15:43 |
slaweq | but as You started, we can discuss it now | 15:43 |
slaweq | https://bugs.launchpad.net/neutron/+bug/1930397 | 15:43 |
opendevmeet | Launchpad bug 1930397 in neutron "neutron-lib from master branch is breaking our UT job" [Critical,Confirmed] | 15:43 |
slaweq | there is a bug reported | 15:43 |
slaweq | and example https://zuul.openstack.org/build/9e852a424a52479695223ac2a7723e1a | 15:43 |
bcafarel | ah thanks I was looking for some job link | 15:43 |
ralonsoh | maybe this is because of the change in the n-lib session | 15:44 |
ralonsoh | I'll check it | 15:44 |
ralonsoh | good to have this n-lib master job | 15:44 |
slaweq | ralonsoh: yes, I suspect that | 15:44 |
slaweq | so we should avoid releasing a new neutron-lib before we fix that issue | 15:44 |
slaweq | otherwise we will probably break our gate (again) :) | 15:45 |
ralonsoh | right | 15:45 |
ralonsoh | pffff | 15:45 |
ralonsoh | no, not again | 15:45 |
bcafarel | one broken gate at a time | 15:45 |
slaweq | LOL | 15:45 |
obondarev | :) | 15:45 |
bcafarel | maybe related to recent "Allow lazy load in model_query" neutron-lib commit? | 15:45 |
ralonsoh | no, not this | 15:45 |
obondarev | I checked it but seems unrelated | 15:45 |
ralonsoh | this is not used yet | 15:45 |
obondarev | yes | 15:46 |
bcafarel | ok :) | 15:46 |
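For reference, this kind of cross-project periodic job is typically wired up as below (a sketch; the real definition may differ): listing neutron-lib under required-projects makes zuul's tox-siblings machinery install it from master instead of from PyPI, which is exactly why the job catches breaking neutron-lib changes before a release does.

```yaml
- job:
    name: openstack-tox-py36-with-neutron-lib-master
    parent: openstack-tox-py36
    required-projects:
      # Installed from source (master) into the tox env by tox-siblings.
      - openstack/neutron-lib
```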
slaweq | so, ralonsoh You will check it, right? | 15:47 |
ralonsoh | yes | 15:47 |
slaweq | thx a lot | 15:47 |
slaweq | #action ralonsoh to check failing neutron-lib-from-master periodic job | 15:47 |
slaweq | ok, let's move on then | 15:47 |
slaweq | #topic fullstack/functional | 15:47 |
slaweq | regarding the functional job, I didn't find any new issues for today | 15:47 |
slaweq | but for fullstack there is a new one: | 15:48 |
slaweq | https://bugs.launchpad.net/neutron/+bug/1930401 | 15:48 |
opendevmeet | Launchpad bug 1930401 in neutron "Fullstack l3 agent tests failing due to timeout waiting until port is active" [Critical,Confirmed] | 15:48 |
slaweq | seems like it happens pretty often in various L3 related tests | 15:48 |
slaweq | I can investigate it more in the next days | 15:48 |
slaweq | unless someone else wants to take it :) | 15:48 |
ralonsoh | maybe next week | 15:49 |
lajoskatona | I can check | 15:49 |
slaweq | lajoskatona: thx a lot | 15:50 |
slaweq | #action lajoskatona to check fullstack failures https://bugs.launchpad.net/neutron/+bug/1930401 | 15:50 |
opendevmeet | Launchpad bug 1930401 in neutron "Fullstack l3 agent tests failing due to timeout waiting until port is active" [Critical,Confirmed] | 15:50 |
slaweq | lajoskatona: and also, there is another fullstack issue: https://bugs.launchpad.net/neutron/+bug/1928764 | 15:50 |
opendevmeet | Launchpad bug 1928764 in neutron "Fullstack test TestUninterruptedConnectivityOnL2AgentRestart failing often with LB agent" [Critical,Confirmed] - Assigned to Lajos Katona (lajos-katona) | 15:50 |
slaweq | which is hitting us pretty often | 15:51 |
slaweq | I know You were working on it some time ago | 15:51 |
slaweq | do You have any patch which should fix it? | 15:51 |
lajoskatona | Yes we discussed it with Oleg in review | 15:51 |
slaweq | or should we maybe mark those failing tests as unstable for now? | 15:51 |
lajoskatona | https://review.opendev.org/c/openstack/neutron/+/792507 | 15:51 |
lajoskatona | but obondarev is right, ping should not fail during the restart of the agent | 15:52 |
slaweq | actually yes - that is even the main goal of this test AFAIR | 15:52 |
slaweq | to ensure that ping works during the restart all the time | 15:53 |
lajoskatona | yeah marking them unstable can be a way forward to decrease the pressure on CI | 15:53 |
slaweq | lajoskatona: will You propose it? | 15:53 |
lajoskatona | Yes | 15:53 |
slaweq | thank You | 15:53 |
slaweq | #action lajoskatona to mark failing TestUninterruptedConnectivityOnL2AgentRestart fullstack tests as unstable temporarily | 15:54 |
slaweq | lajoskatona: if You don't have too much time to work on https://bugs.launchpad.net/neutron/+bug/1930401 this week, maybe You can also mark those tests as unstable for now | 15:54 |
opendevmeet | Launchpad bug 1930401 in neutron "Fullstack l3 agent tests failing due to timeout waiting until port is active" [Critical,Confirmed] | 15:54 |
obondarev | another bug related to the PTG discussion on linuxbridge future | 15:55 |
lajoskatona | slaweq: I will check | 15:55 |
slaweq | IMHO we need to make our CI a bit better as now it's a nightmare | 15:55 |
slaweq | obondarev: yes, that's true | 15:55 |
*** elodilles has joined #openstack-meeting-3 | 15:55 | |
slaweq | probably we will get back to that discussion in some time :) | 15:55 |
lajoskatona | we should ask NASA to help maintaining it :P | 15:56 |
slaweq | lajoskatona: yeah :) | 15:56 |
slaweq | good idea | 15:56 |
slaweq | can I assign it as an action item to You? :P | 15:56 |
* slaweq is just kidding | 15:56 | |
lajoskatona | :-) | 15:57 |
slaweq | ok, that was all what I had for today | 15:57 |
slaweq | if You don't have any last minute topics, I will give You a few minutes back | 15:58 |
obondarev | o/ | 15:58 |
bcafarel | nothing from me | 15:58 |
slaweq | ok, thx for attending the meeting today | 15:58 |
slaweq | #endmeeting | 15:58 |
lajoskatona | o/ | 15:58 |
opendevmeet | Meeting ended Tue Jun 1 15:58:46 2021 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 15:58 |
slaweq | o/ | 15:58 |
opendevmeet | Minutes: http://eavesdrop.openstack.org/meetings/neutron_ci/2021/neutron_ci.2021-06-01-15.01.html | 15:58 |
opendevmeet | Minutes (text): http://eavesdrop.openstack.org/meetings/neutron_ci/2021/neutron_ci.2021-06-01-15.01.txt | 15:58 |
opendevmeet | Log: http://eavesdrop.openstack.org/meetings/neutron_ci/2021/neutron_ci.2021-06-01-15.01.log.html | 15:58 |
ralonsoh | bye | 15:58 |
bcafarel | o/ | 15:59 |
gibi | #startmeeting nova | 16:00 |
opendevmeet | Meeting started Tue Jun 1 16:00:02 2021 UTC and is due to finish in 60 minutes. The chair is gibi. Information about MeetBot at http://wiki.debian.org/MeetBot. | 16:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 16:00 |
opendevmeet | The meeting name has been set to 'nova' | 16:00 |
gibi | o/ | 16:00 |
bauzas | \o | 16:00 |
elodilles | o/ | 16:00 |
gibi | hotseating with neutron :) | 16:00 |
bauzas | yup, I guess the chair is hot | 16:00 |
gibi | or hotswapping | 16:00 |
gmann | o/ | 16:00 |
slaweq | :) | 16:00 |
slaweq | hi | 16:00 |
bauzas | maybe we should open the windows ? | 16:00 |
gibi | slaweq: o/ | 16:00 |
gibi | bauzas: :D | 16:00 |
slaweq | bauzas: :D | 16:01 |
stephenfin | o/ | 16:01 |
* bauzas misses the physical meetings :cry: | 16:01 | |
* gibi joins in | 16:01 | |
elodilles | :) | 16:01 |
sean-k-mooney | o/ | 16:01 |
bauzas | you could see me sweating | 16:01 |
gibi | this will be a bittersweet meeting | 16:01 |
gibi | lets get rolling | 16:02 |
gibi | #topic Bugs (stuck/critical) | 16:02 |
gibi | no critical bugs | 16:02 |
gibi | #link 9 new untriaged bugs (-0 since the last meeting): #link https://bugs.launchpad.net/nova/+bugs?search=Search&field.status=New | 16:02 |
gibi | I like this stable number under 10 | 16:02 |
gibi | :D | 16:02 |
gibi | any specific bug we need to discuss? | 16:02 |
gibi | good | 16:04 |
gibi | #topic Gate status | 16:04 |
gibi | Placement periodic job status #link https://zuul.openstack.org/builds?project=openstack%2Fplacement&pipeline=periodic-weekly | 16:04 |
*** lajoskatona has left #openstack-meeting-3 | 16:04 | |
gibi | super green | 16:04 |
gibi | also nova master gate seems to be OK | 16:04 |
gibi | we merged patches today | 16:04 |
bauzas | \o/ | 16:04 |
bauzas | thanks lyarwood I guess | 16:04 |
gibi | thanks everybody who keep this up :) | 16:05 |
gibi | any gate issue we need to talk about? | 16:05 |
sean-k-mooney | i'm still investigating the one that might be related to os-vif | 16:06 |
sean-k-mooney | but nothing else from me | 16:06 |
gibi | sean-k-mooney: thanks | 16:06 |
gibi | if nothing else for the gate then | 16:07 |
gibi | #topic Release Planning | 16:07 |
gibi | We had Milestone 1 last Thursday | 16:07 |
gibi | M2 is 5 weeks from now | 16:07 |
gibi | at M2 we will hit spec freeze | 16:07 |
gibi | hurry up with specs :) | 16:08 |
gibi | anything else about the release? | 16:08 |
gibi | #topic Stable Branches | 16:09 |
gibi | copying elodilles' notes | 16:09 |
gibi | newer stable branch gates needs investigation why those fail | 16:09 |
gibi | wallaby..ussuri seems to be failing, mainly due to nova-grenade-multinode (?) | 16:09 |
gibi | train..queens seems to be OK | 16:09 |
gibi | pike gate fix is on the way, should be OK whenever it lands ( https://review.opendev.org/c/openstack/devstack/+/792268 ) | 16:09 |
gibi | EOM | 16:09 |
gibi | elodilles: on the nova-grenade-multinode failure, is it the ceph issue you pushed a DNM patch for? | 16:10 |
*** obondarev has quit IRC | 16:10 | |
elodilles | yes, that's it i think | 16:10 |
gibi | in short we see a too-new ceph version (pacific) installed on stable | 16:11 |
gibi | anything else about stable? | 16:12 |
elodilles | yes, melwitt's comment pointed that out ( https://review.opendev.org/c/openstack/nova/+/785059/2#message-c31738db1240ddaa629a3aaa4e901c5a62206e85 ) | 16:12 |
elodilles | nothing else from me :X | 16:12 |
gibi | #topic Sub/related team Highlights | 16:13 |
gibi | Libvirt (bauzas) | 16:13 |
bauzas | well, nothing worth mentioning | 16:13 |
gibi | thanks | 16:13 |
gibi | #topic Open discussion | 16:13 |
gibi | I have couple of topics | 16:13 |
gibi | (gibi) Follow up on the IRC move | 16:14 |
gibi | so welcome on OFTC | 16:14 |
gmann | +1 | 16:14 |
gibi | so far the move seems to be going well | 16:14 |
gibi | I grepped our docs and the nova related wiki pages and fixed them up | 16:14 |
artom | Then again, if someone's trying to reach us over on Freenode, we'll never know, will we? | 16:14 |
gmann | yeah thanks for that | 16:14 |
gibi | artom: I will stay on freenode | 16:14 |
artom | Unless someone's stayed behind to redirect folks? | 16:14 |
* sean-k-mooney was not really aware we mentioned irc in the docs before this | 16:14 | |
artom | Aha! | 16:15 |
bauzas | we still have fewer people in the OFTC chan vs. the freenode one, though | 16:15 |
gmann | yeah I will also stay on Freenode for redirect | 16:15 |
bauzas | gibi: me too, I'll keep the ZNC network open for a while | 16:15 |
gibi | so if any discussion is starting on freenode I will redirect people | 16:15 |
gmann | We are going to discuss a topic change on freenode, or something like that, in the TC meeting on Thursday | 16:15 |
artom | I believe you're not allowed to mention OFTC by name? | 16:15 |
artom | There are apparently bots that hijack channels if you do that? | 16:15 |
gibi | artom: I will use private messages if needed | 16:15 |
bauzas | we currently have 102 attendees on freenode -nova compared to OFTC one | 16:15 |
gmann | artom: we can do that, as we have OFTC ready and working now | 16:15 |
bauzas | freenode : 102, OFTC : 83 | 16:16 |
sean-k-mooney | artom: that was libera and it was based on the topic i think | 16:16 |
bauzas | so I guess not everyone has moved yet | 16:16 |
gmann | we can give this email ref for the details http://lists.openstack.org/pipermail/openstack-discuss/2021-May/022780.html | 16:16 |
gibi | gmann: good point | 16:16 |
bauzas | artom: indeed, you can't spell the word | 16:16 |
sean-k-mooney | bauzas: well some of those are likely bots | 16:16 |
bauzas | artom: or the channels could be hijacked by some ops | 16:16 |
bauzas | sean-k-mooney: haven't really dug into the details | 16:17 |
bauzas | but the numbers were appealing | 16:17 |
bauzas | but that's only a 48h change | 16:17 |
sean-k-mooney | yep most people are now here | 16:17 |
bauzas | we'll see next weeks | 16:17 |
gibi | OK, any other feedback or question around the IRC move? | 16:17 |
gibi | if not then a related topic... | 16:18 |
gibi | (gibi) Do we want to move our meeting from #openstack-meetin-3 to #openstack-nova ? | 16:18 |
gibi | this was a question originally from the TC | 16:18 |
gmann | I think it makes sense and it is also working for many projects like QA and TC afaik | 16:18 |
artom | -1 from me. I've popped in to other channel, not aware that they were having a meeting, and "polluted" their meeting | 16:18 |
gmann | artom: which one? | 16:19 |
sean-k-mooney | artom: i raised that in the tc call | 16:19 |
artom | gmann, #openstack-qa, actually, I think :) | 16:19 |
gmann | I have seen very little interruption in QA or TC over the past year | 16:19 |
sean-k-mooney | but that is really just a habbit thing | 16:19 |
artom | Folks were very polite and everything | 16:19 |
gmann | artom: it might happen very rarely. | 16:19 |
sean-k-mooney | waiting for the topic to load and seeing if the channel is active | 16:19 |
gibi | I can handle interruption politely I guess | 16:19 |
artom | But I felt guilty for interrupting | 16:19 |
gmann | and if anyone comes in between we can tell them it's meeting time | 16:19 |
artom | But... why? | 16:20 |
artom | What's wrong with a dedicated meeting channel? | 16:20 |
sean-k-mooney | artom: i have had the same feeling yes | 16:20 |
sean-k-mooney | artom: nothing really, just more infrastructure for scheduling the meetings | 16:20 |
bauzas | I am absolutely +0 to this | 16:20 |
sean-k-mooney | e.g. "booking the room" | 16:20 |
gmann | it is hard to know where a meeting is going on with all the openstack-meeting-* channels. | 16:20 |
sean-k-mooney | and setting up the logging etc. | 16:20 |
bauzas | but sometimes it's nice to have sideways discussions happening on -nova while we continue ranting here | 16:20 |
artom | sean-k-mooney, it's already all set up :) | 16:21 |
gmann | there will be no difference in logging etc | 16:21 |
sean-k-mooney | artom: oh i know | 16:21 |
gibi | bauzas: +1 on the side discussions | 16:21 |
sean-k-mooney | i do like having the side conversation option | 16:21 |
bauzas | so I guess my only concern would be the ability to have dual conversations happening at the same time without polluting the meeting | 16:21 |
artom | gmann, is it though? Maybe for someone completely new to the community | 16:21 |
sean-k-mooney | that said i try not to do that when i can | 16:21 |
gmann | artom: it's hard for me too when I need to attend many meetings :) | 16:22 |
bauzas | but I guess #openstack-dev could do the job | 16:22 |
artom | gmann, ah, I see your point. | 16:22 |
bauzas | my other concern could be some random folks pinging us straight during the meeting | 16:22 |
artom | Well, #openstack-<project-name>-meetings then? | 16:22 |
sean-k-mooney | i guess it depends, i usually wait for gibi to ping us :) | 16:22 |
bauzas | but that's not a big deal | 16:22 |
bauzas | (as I usually diverge) | 16:22 |
artom | I still want to keep the dual channel meeting/normal IRC option | 16:22 |
gmann | artom: that will be too many channels | 16:23 |
artom | *shrug* w/e :) | 16:23 |
bauzas | #openstack-dev can fit the purpose of side discussions | 16:23 |
sean-k-mooney | dansmith: you are pretty quiet on the topic | 16:23 |
* artom joins bauzas in the +0 camp | 16:23 | |
dansmith | sean-k-mooney: still have a conflict in this slot, will have to read later | 16:23 |
gmann | dansmith: is in another call i think | 16:23 |
bauzas | I just express open thoughts and I'm okay with workarounds if needed | 16:24 |
bauzas | hence the +0 | 16:24 |
bauzas | nothing critical to me to hold | 16:24 |
sean-k-mooney | dansmith: ah, just asking if you preferred #openstack-nova vs #openstack-meeting-3 | 16:24 |
gibi | OK, lets table this for next week then. So far I don't see too many people wanting to move | 16:24 |
sean-k-mooney | dansmith: no worries | 16:24 |
gmann | I will say let's try and if it does not work we can come back here | 16:24 |
gmann | +1 on keeping it open for discussion | 16:24 |
gibi | we will come back to this next week | 16:25 |
gibi | next | 16:25 |
gibi | (gibi) Monthly extra meeting slot for the Asia + EU. Doodle #link https://doodle.com/poll/svrnmrtn6nnknzqp . It seems Wednesday 8:00 or Thursday 8:00 is winning. | 16:25 |
gibi | 8:00 UTC I mean | 16:25 |
dansmith | sean-k-mooney: very much -nova | 16:25 |
sean-k-mooney | gibi: does that work for you to chair the meeting at that time | 16:27 |
bauzas | dansmith: I just expressed some concern about the ability to have side discussions, they could happen "elsewhere" tho | 16:27 |
gibi | If no objection then I will schedule that to Thursday 8:00 UTC and I will do that on #openstack-nova (so we can try the feeling) | 16:27 |
gibi | sean-k-mooney: yes | 16:27 |
gibi | sean-k-mooney: I can chair | 16:27 |
sean-k-mooney | cool | 16:27 |
bauzas | this works for me too | 16:27 |
bauzas | 10am isn't exactly early in the morning | 16:27 |
* sean-k-mooney wakes up at 10:30 most mornings | 16:28 | |
gibi | there was a lot of participation in the doodle | 16:28 |
sean-k-mooney | yes | 16:28 |
gibi | so I hope for a similar crowd at the meeting too | 16:28 |
bauzas | indeed | 16:28 |
sean-k-mooney | from a number of names we don't see often in this meeting | 16:28 |
bauzas | yup, even from cyborg team | 16:28 |
gibi | I will schedule the next meeting for this Thursday | 16:29 |
gibi | so we can have a rule of every first Thursday of a month | 16:29 |
bauzas | (I mentioned this because they expressed their preference for an Asia-friendly timeslot) | 16:29 |
bauzas | gibi: this works for me, and this would mean the first meeting being held in two days | 16:30 |
gibi | yepp | 16:30 |
gibi | no more topic on the agenda for today. Is there anything else you would like to discuss today | 16:30 |
gibi | ? | 16:30 |
bauzas | do we need some kind of formal agenda for thursday meeting ? | 16:31 |
bauzas | or would we stick with a free open hour | 16:31 |
bauzas | ? | 16:32 |
gibi | I will ask the people about it on Thursday | 16:32 |
gibi | I can do both | 16:32 |
gibi | or just summarizing anything from Tuesday | 16:32 |
artom | Oh, can I ask a specless blueprint vs spec question? | 16:32 |
gibi | artom: sure, go ahead | 16:32 |
artom | So, we talked about https://review.opendev.org/c/openstack/nova-specs/+/791287 a few meetings ago | 16:33 |
artom | WIP: Rabbit exchange name: normalize case | 16:33 |
artom | sean-k-mooney came up with a better idea that solves the same problem, except without the messy upgrade impact: | 16:33 |
artom | Just refuse to start nova-compute if we detect the hostname has changed | 16:33 |
artom | So I want to abandon https://review.opendev.org/c/openstack/nova-specs/+/791287 and replace it with sean-k-mooney's approach | 16:34 |
artom | Which I believe doesn't need a spec, maybe not even a blueprint | 16:34 |
sean-k-mooney | you mean treat it as a bugfix if it's not a blueprint | 16:34 |
gibi | artom: the hostname reported by libvirt changed compared to the hostname stored in the DB? | 16:35 |
artom | gibi, yes | 16:35 |
artom | sean-k-mooney, basically | 16:35 |
bauzas | tbc, we don't mention the service name | 16:36 |
gibi | artom: could there be deployments out there that are working today but will stop working after your change? | 16:36 |
bauzas | but the hypervisor hostname which is reported by libvirt | 16:36 |
bauzas | because you have a tuple | 16:36 |
artom | gibi, we could wrap it in a config option, sorta like stephenfin did for NUMA live migration | 16:36 |
bauzas | (host, hypervisor_hostname) | 16:36 |
sean-k-mooney | gibi: tl;dr in the virt driver we would look up the compute service record using CONF.host and in the libvirt driver check that 1) the list of compute nodes associated with the compute service record is of length 1 and 2) that its hypervisor_hostname is the same as the one we currently have | 16:36 |
gibi | sean-k-mooney: thanks | 16:37 |
bauzas | eg. with ironic, you have a single nova-compute service (thus, a single hostname) but multiple nodes, each of them being an ironic node UUID | 16:37 |
bauzas | sean-k-mooney: what puzzles me is that I thought service RPC names were absolutely unrelated to hypervisor names | 16:37 |
artom | bauzas, it would be for drivers that have a 1:1 host:node relationship | 16:37 |
artom | bauzas, but it's a good point, we'd have to make it driver-agnostic as much as possible | 16:37 |
bauzas | and CONF.host is the RPC name, hence the service name | 16:37 |
bauzas | artom: that's my concern, I guess | 16:38 |
bauzas | some ops wanna define some RPC name that's not exactly what the driver reports and we said for a while "you'll be fine" | 16:38 |
artom | bauzas, valid concern, though I think it'd be pointless to talk about it in a spec, without the code to look at | 16:38 |
bauzas | artom: tbh, I'm not even sure that drivers other than libvirt use the hypervisor name as the service name | 16:39 |
bauzas | so we need to be... cautious, I'd say | 16:39 |
gibi | I thought we use CONF.host as service name | 16:39 |
sean-k-mooney | bauzas: we can't do that without breaking people | 16:39 |
bauzas | gibi: right | 16:39 |
sean-k-mooney | gibi: we do | 16:39 |
gibi | OK | 16:40 |
gibi | so the service name is hypervisor agnostic | 16:40 |
bauzas | gibi: but we use what the virt driver reports for the hypervisor_hostname field of the ComputeNode record | 16:40 |
gibi | the node name is hypervisor specific | 16:40 |
bauzas | gibi: this is correct again | 16:40 |
sean-k-mooney | and in theory we use conf.host for the name we put in instance.host | 16:40 |
gibi | and RPC name is also the service name so that is also hypervisor agnostic | 16:41 |
bauzas | yup | 16:41 |
gibi | so if we need to fix the RPC name then we could do it hypervisor-agnostically | 16:41 |
gibi | (I guess) | 16:41 |
bauzas | but here, artom proposes to rely on the discrepancy to make it hardstoppable | 16:41 |
sean-k-mooney | i think there are two issues: 1) changing conf.host and 2) the hypervisor_hostname changing | 16:41 |
gibi | what if we detect the discrepancy via comparing db host with nova-compute conf.host? | 16:41 |
sean-k-mooney | ideally we would like both to not change, and detect/block both | 16:42 |
bauzas | well, if you change the service name, then it will create a new service record | 16:42 |
gibi | hm, we cannot look up our old DB record if conf.host is changed :/ | 16:42 |
sean-k-mooney | gibi: we would have to do that backwards | 16:42 |
sean-k-mooney | look up the compute node RP by hypervisor_hostname | 16:42 |
bauzas | the old service will be seen as dead | 16:42 |
sean-k-mooney | then check the compute service record | 16:42 |
gibi | sean-k-mooney: so we detect if conf.host changes but hypervisor_hostname remains the same | 16:43 |
*** mlavalle has quit IRC | 16:43 | |
sean-k-mooney | gibi: yep we can check if either of the values changes | 16:43 |
sean-k-mooney | but not if both of the values change | 16:43 |
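A minimal, deliberately framework-free sketch of the two checks just described. `lookup_node_by_name` and `lookup_nodes_by_service_host` are hypothetical DB helpers standing in for whatever nova objects calls a real patch would use, and `conf_host` stands in for nova's CONF.host; this illustrates the logic, not nova's actual API.

```python
def assert_compute_identity_unchanged(conf_host, driver_node_name,
                                      lookup_node_by_name,
                                      lookup_nodes_by_service_host):
    """Refuse to start nova-compute if the host looks renamed."""
    # Check 1: the driver-reported node already exists but belongs to a
    # different service record -> CONF.host was changed.
    node = lookup_node_by_name(driver_node_name)
    if node is not None and node.host != conf_host:
        raise RuntimeError(
            'compute node %s belongs to service %s, not %s; CONF.host '
            'seems to have changed' % (driver_node_name, node.host, conf_host))

    # Check 2: the service record has exactly one node, but its name no
    # longer matches what the driver reports -> hypervisor_hostname changed.
    nodes = lookup_nodes_by_service_host(conf_host)
    if len(nodes) == 1 and nodes[0].hypervisor_hostname != driver_node_name:
        raise RuntimeError(
            'hypervisor hostname changed from %s to %s; refusing to start'
            % (nodes[0].hypervisor_hostname, driver_node_name))

    # As noted above: if both values change at once neither check fires,
    # and the old service record is simply orphaned.
```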
bauzas | what problem are we trying to solve ? | 16:43 |
bauzas | the fact that messages go lost ? | 16:44 |
gibi | sean-k-mooney: OK thanks, now I see it | 16:44 |
sean-k-mooney | unless we just write this in a file on disk that we read | 16:44 |
artom | bauzas, people renaming their compute hosts and exploding stuff | 16:44 |
artom | Either on purpose, or accidentally | 16:44 |
sean-k-mooney | bauzas: the fact that it is possible to start the compute service when either conf.host or hypervisor_hostname has changed and get into an inconsistent state | 16:45 |
sean-k-mooney | bauzas: one of those being that we can have instances on the same host with different values of instance.host | 16:45 |
gibi | I guess if both changed then we get a new working compute and the old will be orphaned | 16:45 |
sean-k-mooney | gibi: yep | 16:45 |
gibi | OK | 16:45 |
stephenfin | assuming we can solve this, I don't see this as any different to gibi's change to disallow N and N-M (M > 1) in the same deployment | 16:45 |
bauzas | sean-k-mooney: can we just consider to NOT accept to rename the service hostname if instances are existing on it ? | 16:46 |
stephenfin | in terms of being a hard break, but only for people that are already in a broken state | 16:46 |
sean-k-mooney | bauzas: that might also be an option yes | 16:46 |
stephenfin | ditto for the NUMA live migration thing, as artom alluded to above | 16:46 |
sean-k-mooney | bauzas: i think we have options and it might be good to POC some of them | 16:46 |
bauzas | again, I'm a bit conservative here | 16:47 |
gibi | I think I'm convinced that this can be done for the libvirt driver. As for how to do it for the other drivers, that remains to be seen | 16:47 |
bauzas | I get your point but I think we need to be extra cautious, especially with some fancy scenarios involving ironic | 16:47 |
bauzas | gibi: that's a service issue, we need to stay driver agnostic | 16:48 |
sean-k-mooney | bauzas: ack, although what i was originally suggesting was driver specific | 16:48 |
artom | sean-k-mooney, right, true | 16:48 |
bauzas | sean-k-mooney: hence my point about hypervisor_hostname | 16:48 |
sean-k-mooney | yes, it's not that i did not consider ironic | 16:48 |
artom | It would be entirely within the libvirt driver in init_host() or something, though I suspect we'd have to add new method arguments | 16:48 |
bauzas | I thought you were mentioning it as this is the only field being virt specific | 16:48 |
sean-k-mooney | i intentionally was declaring it out of scope | 16:48 |
sean-k-mooney | and just fixing the issue for libvirt | 16:48 |
sean-k-mooney | bauzas: i dont think this can happen with ironic for what its worth | 16:49 |
bauzas | artom: init_host() is very late in the boot process | 16:49 |
artom | bauzas, well pre_init_host() then :P | 16:49 |
bauzas | you already have a service record | 16:49 |
bauzas | artom: pre init_host, you are driver agnostic | 16:49 |
gibi | bauzas: we need a libvirt connection though | 16:49 |
bauzas | gibi: I know | 16:49 |
gibi | bauzas: so I'm not sure we can do it earlier | 16:49 |
bauzas | gibi: my point | 16:49 |
sean-k-mooney | bauzas: we need to do it after we retrieve the compute service record but before we create a new one | 16:49 |
bauzas | iirc, we create the service record *before* we initialize the libvirt connection | 16:50 |
bauzas | this wouldn't be idempotent | 16:50 |
sean-k-mooney | perhaps we should bring this to #openstack-nova | 16:50 |
bauzas | I honestly feel this is tricky enough for drafting it somewhere... unfortunately like in a spec | 16:51 |
artom | FWIW we call init_host() before we update the service ref... | 16:51 |
bauzas | especially if we need to be virt-agnostic | 16:51 |
artom | And before we create the RPC server | 16:51 |
bauzas | artom: ack, then I was wrong | 16:51 |
artom | 'ts what I'm saying, we need the code :) | 16:52 |
artom | In a spec it's all high up and abstract | 16:52 |
bauzas | I thought we did it in pre_hook or something | 16:52 |
sean-k-mooney | no, we can talk about this in a spec if we need to | 16:52 |
bauzas | poc, then | 16:52 |
sean-k-mooney | specs don't have to be high level | 16:52 |
bauzas | poc, poc, poc | 16:52 |
sean-k-mooney | ok | 16:52 |
gibi | OK, lets put up some patches | 16:53 |
gibi | discuss it there | 16:53 |
artom | 🐔 poc poc poc it is then 🐔🐔 | 16:53 |
gibi | and see if this can fly | 16:53 |
artom | Chickens can't fly | 16:53 |
gibi | I'm OK to keep this without a spec so far | 16:53 |
sean-k-mooney | if we have a few minutes i have one other similar topic | 16:53 |
sean-k-mooney | https://bugzilla.redhat.com/show_bug.cgi?id=1700390 | 16:53 |
opendevmeet | bugzilla.redhat.com bug 1700390 in openstack-nova "KVM-RT guest with 10 vCPUs hangs on reboot" [High,Closed: notabug] - Assigned to nova-maint | 16:53 |
gibi | sean-k-mooney: sure | 16:53 |
sean-k-mooney | we have ^ downstream | 16:54 |
sean-k-mooney | basically when using realtime you should always use hw:emulator_threads_policy=something | 16:54 |
sean-k-mooney | but we don't disallow it because, while it's a bad idea not to, it can work | 16:54 |
sean-k-mooney | i'm debating between filing a wishlist bug vs a specless blueprint for a small change in our default logic | 16:55 |
gibi | feels like a bug to me that we can fix in nova. even if it works sometimes we can disallow it | 16:55 |
sean-k-mooney | if i remember correctly we still require at least 1 core to not be realtime | 16:55 |
sean-k-mooney | so i was thinking we could limit the emulator thread to that core | 16:55 |
gibi | sounds like a plan | 16:56 |
gibi | if we don't have such a limit then I think we should add that as well | 16:56 |
gibi | well | 16:56 |
gibi | it does not make much sense not to have an o&m cpu | 16:56 |
sean-k-mooney | we used too | 16:56 |
sean-k-mooney | i dont think we removed it | 16:56 |
sean-k-mooney | would people be ok with this as a bugfix | 16:56 |
sean-k-mooney | i was concerned it might be slightly featureish | 16:57 |
gibi | yes. It removes the possibility of a known bad setup | 16:57 |
sean-k-mooney | ok, i'll file it and transcribe the relevant bits from the downstream bug | 16:58 |
sean-k-mooney | the workaround is to just use hw:emulator_threads_policy=share|isolate | 16:58 |
sean-k-mooney | but it would be nice to not have the buggy config by default | 16:58 |
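Spelled out, the workaround uses real flavor extra specs (`rt-flavor` is a placeholder flavor name): `share` runs the emulator threads on the host's shared CPU set, while `isolate` gives them a dedicated host core; `hw:cpu_realtime_mask=^0` keeps vCPU 0 out of the realtime set so there is somewhere safe for housekeeping work.

```sh
openstack flavor set rt-flavor \
    --property hw:cpu_realtime=yes \
    --property 'hw:cpu_realtime_mask=^0' \
    --property hw:emulator_threads_policy=share   # or: isolate
```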
gibi | I agree | 16:58 |
gibi | and others seems to be silent :) | 16:59 |
gibi | so it is sold | 16:59 |
gibi | any last words before we hit the top of the hour? | 16:59 |
gibi | then thanks for joining today | 17:00 |
gibi | o/ | 17:00 |
gibi | #endmeeting | 17:00 |
opendevmeet | Meeting ended Tue Jun 1 17:00:12 2021 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 17:00 |
opendevmeet | Minutes: http://eavesdrop.openstack.org/meetings/nova/2021/nova.2021-06-01-16.00.html | 17:00 |
opendevmeet | Minutes (text): http://eavesdrop.openstack.org/meetings/nova/2021/nova.2021-06-01-16.00.txt | 17:00 |
opendevmeet | Log: http://eavesdrop.openstack.org/meetings/nova/2021/nova.2021-06-01-16.00.log.html | 17:00 |
*** sean-k-mooney has left #openstack-meeting-3 | 17:00 | |
elodilles | o/ | 17:00 |
*** ralonsoh has quit IRC | 17:20 | |
*** ralonsoh has joined #openstack-meeting-3 | 17:20 | |
*** ralonsoh has quit IRC | 17:21 | |
*** belmoreira has quit IRC | 18:35 | |
*** tosky has quit IRC | 23:11 |