14:00:10 <slaweq> #startmeeting neutron_drivers
14:00:11 <openstack> Meeting started Fri Feb 14 14:00:10 2020 UTC and is due to finish in 60 minutes. The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:12 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:14 <openstack> The meeting name has been set to 'neutron_drivers'
14:00:15 <njohnston> o/
14:00:15 <slaweq> hi
14:00:17 <ralonsoh> hi
14:00:18 <yamamoto> hi
14:00:22 <TheJulia> o/
14:00:34 <stephen-ma> hi
14:01:09 <slaweq> let's wait a few more minutes for amotoki, haleyb and mlavalle
14:01:16 <haleyb> hi
14:01:44 <slaweq> hi haleyb
14:01:56 <slaweq> so we already have quorum and I think we can start
14:02:13 <slaweq> as TheJulia is here, let's start not as usual :)
14:02:19 <slaweq> #topic On Demand Agenda
14:02:32 <slaweq> TheJulia has added a topic to https://wiki.openstack.org/wiki/Meetings/NeutronDrivers
14:03:44 <mlavalle> o/
14:03:47 <njohnston> hello TheJulia
14:04:01 <amotoki> hi! sorry for being late
14:04:08 <slaweq> hi mlavalle and amotoki
14:04:33 <TheJulia> So, bottom line: we're wondering if the mac address update can be made non-admin or covered by a specific policy, because ironic is making the service more multitenanty and usable for non-admins, but we pass credentials through for port actions and are trying to avoid pulling a second admin session as the ironic service user just to update the mac address
14:04:35 <slaweq> just FYI, I started today with the On Demand Agenda as TheJulia added a topic to it and I didn't want to hold her on the meeting for the whole hour :)
14:04:56 <TheJulia> slaweq: much appreciated.... for I have hours of meetings ahead of me :)
14:05:06 <njohnston> I wonder if this could be achieved with a policy.json modification defining a role tied to a specific service credential for Ironic
14:05:14 <slaweq> njohnston: I think it can: https://github.com/openstack/neutron/blob/master/neutron/conf/policies/port.py#L192
14:05:27 <slaweq> it's defined there, and IIUC that is what Ironic needs
14:06:19 <slaweq> and it seems to me that it can be done by an admin or advsvc user
14:06:50 <njohnston> slaweq: yes, that is exactly what I was thinking about
14:07:16 <amotoki> are we discussing mac address update by all non-admin users or by users with specific roles?
14:08:00 <TheJulia> non-admin users of baremetal, which has me thinking we're going to have to do the thing we don't want to do, which is pull a separate client/session to directly update the port mac as a separate action
14:08:53 <njohnston> the thing is, Neutron has no way of distinguishing between the Ironic use case and the other use cases where non-admin access to this would be a bad idea
14:09:36 <amotoki> we prepared the advsvc role for such a purpose. If it works it would be great.
14:10:51 <TheJulia> Yeah, I suspect we could just have ironic learn how to do it separately, which would prevent potential security issues. I guess we'll need to look at that. Anyway thanks everyone!
14:11:32 <amotoki> one thing to note is that updating the mac address should be limited to a private network.
14:11:41 <amotoki> I mean a "self-service network".
14:11:57 <njohnston> Are there some baremetal scenarios where it would not be a good idea to allow mac address updating?
14:12:18 <amotoki> it should not be allowed in a shared network.
14:12:38 <amotoki> in other words, the operation should be limited to the network owner IMHO.
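
[Editor's note — hedged illustration, not part of the meeting log] The policy rule slaweq links to above restricts mac_address updates to admin or advsvc callers. Below is a minimal, self-contained oslo.policy sketch of such a rule; the role names and check strings are simplified for illustration, and the authoritative Neutron defaults are in the linked neutron/conf/policies/port.py.

    from oslo_config import cfg
    from oslo_policy import policy

    enforcer = policy.Enforcer(cfg.CONF)
    enforcer.register_defaults([
        # Simplified stand-ins for Neutron's admin/advsvc base rules.
        policy.RuleDefault('context_is_admin', 'role:admin'),
        policy.RuleDefault('context_is_advsvc', 'role:advsvc'),
        policy.DocumentedRuleDefault(
            name='update_port:mac_address',
            check_str='rule:context_is_admin or rule:context_is_advsvc',
            description='Update mac_address attribute of a port',
            operations=[{'path': '/ports/{id}', 'method': 'PUT'}],
        ),
    ])

    # A caller with the advsvc role passes; a plain project member does not.
    advsvc_creds = {'roles': ['advsvc'], 'project_id': 'p1'}
    member_creds = {'roles': ['member'], 'project_id': 'p1'}
    target = {'project_id': 'p1'}
    print(enforcer.enforce('update_port:mac_address', target, advsvc_creds))  # True
    print(enforcer.enforce('update_port:mac_address', target, member_creds))  # False

An operator can also override such a default in policy.json/policy.yaml, which is the approach njohnston mentions next.
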
14:12:57 <TheJulia> njohnston: none, we must be able to update the mac address for pxe booting and addressing of physical ports.
14:14:55 <TheJulia> in that case the ironic service account needs to perform the port update action for just the mac address
14:15:28 <TheJulia> since we know it and manage it
14:16:12 <njohnston> I am wondering if we could permit port update for all in policy.json and then later in the logic require specific privileges unless it's a baremetal port.
14:17:12 <njohnston> But I don't know if there are baremetal-but-not-Ironic scenarios that could bite us with that
14:17:24 <slaweq> njohnston: I'm not sure if we should do such hard-coded rules for some specific kinds of resources
14:17:26 <TheJulia> that is only informed via the ?binding profile? which I think is later on, also since users can request vifs on user-created networks and ironic will request it be attached to that network
14:18:03 <njohnston> Having a separate admin session has the virtue of simplicity; other methods for doing it get complex quickly, it seems to me
14:18:19 <TheJulia> agreed
14:18:39 <TheJulia> Thanks, I'll let the contributor working on the multitenancy feature set know!
14:19:31 <slaweq> ok, so I think we are good with Your topic TheJulia, right? You will try to use the advsvc role for this action.
14:20:39 <TheJulia> yup, thanks
14:21:02 <slaweq> thx TheJulia :)
14:21:15 <slaweq> so now we can move to our regular topic
14:21:17 <slaweq> #topic RFEs
14:21:31 <slaweq> we have 2 RFEs for today
14:21:34 <slaweq> first one:
14:21:36 <slaweq> https://bugs.launchpad.net/neutron/+bug/1860521
14:21:38 <openstack> Launchpad bug 1860521 in neutron "L2 pop notifications are not reliable" [Undecided,New]
14:24:21 <slaweq> I remember from when I was working at OVH that we had similar problems with the L2pop mechanism and we added something like a periodic sync of tunnel config on the host
14:25:49 <njohnston> I am not sure what the impact to the message bus would be to change from fanout/cast to RPC calls
14:27:24 <slaweq> njohnston: to the message bus not much, but for neutron-rpc workers which will send such messages and wait for replies, the impact will be at least "noticeable" IMO
14:27:59 <njohnston> slaweq: Yeah, I was worried more about the RPC workers
14:28:21 <slaweq> njohnston: ahh, ok :)
14:28:25 <mlavalle> it's a matter of whether the mesh of tunnels works vs the cost
14:28:36 <njohnston> Does OVN use l2pop? IIRC it doesn't but I haven't looked in that part of the code in a while.
14:29:06 <slaweq> njohnston: nope
14:29:07 <amotoki> I am not sure right now which is better: to switch it to RPC calls or to sync info periodically.
14:29:25 <amotoki> if the number of nodes to be informed is small, it makes sense to switch it to RPC calls.
14:30:22 <mlavalle> but I am sure the problem is more acute in large deployments
14:31:09 <njohnston> There are costs and benefits each way - if the RPC call idea adds overhead then it could make the situation in large deployments worse. With the periodic sync you have a period of time where things might not work correctly, before the next sync.
14:31:27 <mlavalle> yeap
14:31:33 <mlavalle> it's always a trade-off
14:31:59 <njohnston> I personally favor the periodic sync as being in keeping with our "eventually consistent" way of doing things, but I have a bias towards large-deployment thinking.
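
[Editor's note — hedged illustration, not part of the meeting log] slaweq and njohnston both refer to a periodic sync of tunnel/FDB config above. The sketch below only illustrates that idea under assumptions: the RPC method get_fdb_entries() and the agent hooks (get_local_fdb_entries(), add_fdb_entry(), remove_fdb_entry()) are hypothetical, not Neutron APIs.

    import time


    def periodic_fdb_sync(plugin_rpc, agent, interval=60):
        """Periodically fetch the authoritative FDB state from the server
        and reconcile the agent's local tunnels/flows against it."""
        while True:
            desired = plugin_rpc.get_fdb_entries(agent.host)  # hypothetical RPC
            current = agent.get_local_fdb_entries()           # hypothetical hook
            for entry in desired - current:
                agent.add_fdb_entry(entry)     # a missed cast gets repaired here
            for entry in current - desired:
                agent.remove_fdb_entry(entry)  # stale entries get cleaned up
            time.sleep(interval)

The trade-off njohnston describes is visible in the loop: nothing is repaired until the next tick, but no per-message tracking is needed on the server side.
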
14:32:22 <slaweq> actually, looking at the code it will not be possible to always switch to call() from cast()
14:32:23 <mlavalle> but if the mesh of tunnels gets to a point where it doesn't work, then it's time to consider the trade-offs
14:32:28 <slaweq> https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/l2pop/rpc.py
14:32:45 <slaweq> in some cases it uses fanout=True and then call() can't be used
14:33:15 <ralonsoh> slaweq, why?
14:33:46 <ralonsoh> actually one problem we have with the MQ is that some calls are blocking
14:34:59 <slaweq> ralonsoh: https://docs.openstack.org/oslo.messaging/latest/reference/rpcclient.html#oslo_messaging.RPCClient.call
14:35:16 <slaweq> call() waits for a return value so you can't send it to many hosts and wait for many replies
14:35:28 <slaweq> that's at least how I understand it
14:35:33 <ralonsoh> I know, but why do we need to use call instead of cast?
14:35:44 <ralonsoh> if the MQ is down, the server will stop working
14:35:56 <slaweq> ralonsoh: this was "Option 2" in the RFE
14:36:01 <ralonsoh> (maybe this is off topic, sorry)
14:36:06 <amotoki> yeah, switching cast() into call() is not straightforward. in case of fanout=True, we need to convert it into multiple call()s and also need to check the status of each individual call().
14:36:18 <amotoki> using call() allows us to check the result
14:36:39 <amotoki> but perhaps it will bring another scaling issue in this case.
14:36:51 <stephen-ma> slaweq: when can an RFE be brought up for discussion?
14:37:02 <slaweq> amotoki: yes, but IMO that's not a good idea as L2pop was designed to address some scale problems and such a change would make it totally not scalable
14:37:20 <amotoki> slaweq: exactly
14:37:31 <amotoki> I just tried to explain what would happen.
14:37:39 <slaweq> stephen-ma: are You asking about https://bugs.launchpad.net/neutron/+bug/1861529 ? If yes, I keep it for the end of the meeting :)
14:37:40 <openstack> Launchpad bug 1861529 in neutron "[RFE] A port's network should be changable" [Wishlist,New]
14:38:03 <stephen-ma> yes, that's the RFE
14:38:21 <slaweq> amotoki: ralonsoh: so IMO, to address the issue described by Oleg, we should only consider "option 1 - periodic sync mechanism"
14:38:45 <amotoki> slaweq: agree. I personally prefer the periodic sync.
14:39:19 <ralonsoh> (I don't, that's why we have the MQ)
14:39:28 <ralonsoh> but the problem is why the MQ is not reliable
14:39:56 <ralonsoh> anyway, if making this update periodic can solve this problem, I'm ok
14:40:02 <slaweq> ralonsoh: the problem here is that with the cast() method neutron-server doesn't know if the agent configured everything fine
14:41:17 <ralonsoh> I will comment in the bug
14:41:34 <ralonsoh> but that should not be a server problem
14:41:49 <ralonsoh> if the agent is down, the server should keep working
14:41:52 <amotoki> MQ is durable in some cases, but from my operator experience it is not easy to ensure MQ msgs are reliable, so it is nice if neutron (as an MQ user) provides some mechanism for reliability.
14:42:08 <ralonsoh> if the agent received the config and everything went fine, ok
14:42:20 <ralonsoh> if not, the agent should communicate to the server, informing about the error
14:42:44 <njohnston> The other alternative, just to play devil's advocate, is to build the reliability higher up in the stack
14:42:45 <mlavalle> which is a synch of sorts, right?
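
[Editor's note — hedged illustration, not part of the meeting log] A minimal sketch of the cast()-vs-call() distinction slaweq links above, assuming a reachable message bus. The method name add_fdb_entries mirrors the l2pop RPC module linked earlier; the transport URL, topic name, and FDB payload are purely illustrative.

    from oslo_config import cfg
    import oslo_messaging

    transport = oslo_messaging.get_rpc_transport(
        cfg.CONF, url='rabbit://guest:guest@localhost:5672/')  # example URL
    target = oslo_messaging.Target(topic='l2population')        # illustrative topic
    client = oslo_messaging.RPCClient(transport, target)

    fdb_entries = {'net-1': {'ports': {'10.0.0.1': []}}}        # illustrative payload

    # cast() with fanout=True: every agent listening on the topic gets the
    # message, nothing is returned, and the server never learns whether any
    # agent actually applied it (the failure mode described in the RFE).
    client.prepare(fanout=True).cast({}, 'add_fdb_entries',
                                     fdb_entries=fdb_entries)

    # call(): blocks until one server replies (or raises MessagingTimeout),
    # so the result can be checked -- but it cannot be fanned out, which is
    # why a blanket cast()->call() switch would mean one blocking call per
    # agent and the scaling concern amotoki raises.
    result = client.prepare(server='compute-1', timeout=30).call(
        {}, 'add_fdb_entries', fdb_entries=fdb_entries)
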
14:43:10 <ralonsoh> njohnston, the reliability is on the services: agent, server, etc
14:43:11 <njohnston> ralonsoh: If the MQ never sent the message to the agent then the agent has no idea it has something to complain about
14:43:26 <ralonsoh> the agent should be smart enough to send a warning message to the server
14:43:35 <ralonsoh> njohnston, I know
14:43:51 <ralonsoh> and if the MQ does not work, then we will have a stopped Neutron server
14:43:54 <slaweq> ralonsoh: exactly as njohnston said, in case of cast() you will not have e.g. a message timeout error on the server side
14:44:07 <njohnston> ralonsoh: Instead of depending on the call() method, the agent sends a message saying "I processed this update", and neutron server counts these acks.
14:44:07 <ralonsoh> (something very common in some bugs)
14:44:50 <njohnston> Similarly to how you design a reliable service on top of UDP, you don't have the transmission mechanism ensure reliability, you build it into the application layer
14:44:54 <ralonsoh> exactly: this should be like a UDP call, and the client should inform.....
14:45:01 <ralonsoh> exactly!
14:45:05 <ralonsoh> I was writing the same
14:45:30 <mlavalle> and all of this is a synch mechanism, isn't it?
14:45:30 <ralonsoh> goal: do NOT block the server
14:45:33 <njohnston> that requires neutron to track what agent(s) should respond to this kind of request and keep an account of responses received
14:45:43 <ralonsoh> yes
14:46:05 <ralonsoh> in an async way
14:46:32 <njohnston> mlavalle: the value here is that the approach is not periodic or timer-based
14:46:52 <mlavalle> sure
14:47:56 <mlavalle> whenever two async entities need to cooperate (server and agent for example) you need a way to find synchronization points, periodic or otherwise
14:48:23 <mlavalle> nature of distributed systems
14:48:41 <mlavalle> I'd say the idea has merit and we should explore it further with a spce
14:48:56 <mlavalle> *spec
14:49:23 <ralonsoh> agree
14:49:43 <njohnston> My main question: the work of syncing to the database for FDB updates and then keeping an account of responses received, is it worth the effort? Compared to the simpler periodic sync mechanism.
14:50:10 <ralonsoh> IMO, you don't need to track the responses
14:50:18 <slaweq> ok, so to sum up what we discussed so far: we should continue the discussion on the sync (periodic or not) mechanism in the spec, and we don't want to switch from cast() to call(), right?
14:50:22 <ralonsoh> you'll have error responses or nothing
14:50:28 <njohnston> If you don't track the responses then you can't reissue the update when a response is not received
14:50:47 <ralonsoh> 1) if the agent is not working, the server will notice that, periodic checks
14:51:17 <ralonsoh> 2) if the agent message didn't work, the agent will reply with an error
14:51:32 <ralonsoh> 3) if the MQ is unreliable.... well, this IS a problem
14:51:40 <ralonsoh> but not Neutron's problem
14:52:00 <njohnston> But addressing 3 is the point of the RFE, it is Neutron's problem
14:52:08 <njohnston> If AMQP drops the message then neither server nor client will know there was an error
14:52:33 <njohnston> drops a casted message, to be specific
14:52:38 <slaweq> I agree with njohnston here - we should do as much as we can to address such a case on our side
14:52:42 <mlavalle> it is Neutron's problem in the sense that it has to at least cope with it
14:52:57 <ralonsoh> ok
14:55:25 <slaweq> so, I will sum up this discussion in the RFE and ask for a spec to continue the discussion there, right?
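
[Editor's note — hedged illustration, not part of the meeting log] A standalone sketch of njohnston's "count the acks" idea above, under assumptions: this is not a proposed Neutron implementation, and all class/method names are hypothetical. It only shows the bookkeeping the server would need in order to re-issue FDB updates that agents never acknowledged.

    import time
    import uuid


    class FdbUpdateTracker:
        """Remember which agents should confirm each fanout update and
        surface anything not acknowledged in time as a resend candidate."""

        def __init__(self, resend_after=60):
            self.resend_after = resend_after
            # update_id -> (sent_at, fdb_entries, agents still expected to ack)
            self.pending = {}

        def sent(self, fdb_entries, agent_hosts):
            """Record a fanout update that was just cast to agent_hosts."""
            update_id = str(uuid.uuid4())
            self.pending[update_id] = (time.time(), fdb_entries, set(agent_hosts))
            return update_id

        def ack(self, update_id, agent_host):
            """Called when an agent reports 'I processed this update'."""
            entry = self.pending.get(update_id)
            if entry:
                entry[2].discard(agent_host)
                if not entry[2]:
                    del self.pending[update_id]

        def overdue(self):
            """Updates some agents never acknowledged; candidates for resend."""
            now = time.time()
            return [(uid, fdb, agents)
                    for uid, (sent_at, fdb, agents) in self.pending.items()
                    if now - sent_at > self.resend_after and agents]

As mlavalle notes, this is still a synchronization mechanism; it just reacts to missing acks instead of running on a fixed timer.
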
14:55:32 <mlavalle> slaweq: I would point out in the RFE that in light of today's discussion, we lean towards some sort of synch problem
14:55:49 <mlavalle> and that we would like to explore it in a spec
14:55:55 <slaweq> mlavalle: sure
14:56:06 <njohnston> +1
14:56:12 <amotoki> +1
14:56:16 <ralonsoh> +1
14:56:42 <mlavalle> I meant synch mechanism
14:56:59 <slaweq> ok, thx for the discussion about this RFE - it was a good one today :)
14:57:22 <slaweq> as we are almost at the top of the hour, I don't want to start the discussion about the next RFE
14:58:03 <slaweq> but I want to ask all of You to check https://bugs.launchpad.net/neutron/+bug/1861529
14:58:04 <openstack> Launchpad bug 1861529 in neutron "[RFE] A port's network should be changable" [Wishlist,New]
14:58:12 <ralonsoh> ok
14:58:31 <slaweq> and that's all for today from me
14:58:36 <slaweq> thx for attending
14:58:41 <slaweq> and have a great weekend
14:58:42 <slaweq> o/
14:58:45 <amotoki> o/
14:58:45 <mlavalle> o/
14:58:45 <ralonsoh> bye
14:58:46 <slaweq> #endmeeting