20:00:31 <asadoughi> #startmeeting blueprint ovs-firewall-driver
20:00:32 <openstack> Meeting started Mon Dec 16 20:00:31 2013 UTC and is due to finish in 60 minutes. The chair is asadoughi. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:33 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:00:35 <openstack> The meeting name has been set to 'blueprint_ovs_firewall_driver'
20:01:03 <asadoughi> everyone around for the meeting?
20:01:35 <asadoughi> say hi if you're here for the blueprint ovs-firewall-driver meeting
20:01:42 <JunPark_> hi asadoughi, this is Jun.
20:01:59 <asadoughi> hi JunPark_
20:02:35 <JunPark_> I believe Mike Wilson is here as well.
20:03:31 <kanthi> Hi Amir, Kanthi here
20:03:41 <asadoughi> hi kanthi
20:03:45 <geekinutah> hi all
20:03:47 <geekinutah> <- Mike
20:03:56 <asadoughi> hi geekinutah
20:04:30 <asadoughi> #link https://wiki.openstack.org/wiki/Meetings/Neutron_blueprint_ovs-firewall-driver
20:04:43 <asadoughi> that's the agenda link for today
20:04:59 <asadoughi> #link blueprint https://blueprints.launchpad.net/neutron/+spec/ovs-firewall-driver
20:05:21 <asadoughi> #topic Purpose of blueprint
20:05:41 <asadoughi> I wanted to restate the purpose of the blueprint so everyone's on the same page as far as the direction.
20:05:46 <asadoughi> To support the security groups extension in the OVS neutron agent through OVS flows using the existing OVS library with feature parity to the existing iptables-based implementations. In Icehouse, the existing openvswitch plugin is being deprecated, so the blueprint is compatible with the ML2 plugin with the openvswitch mechanism driver.
20:06:20 <asadoughi> any questions or comments on the purpose statement?
20:06:52 <geekinutah> seems solid
20:08:02 <asadoughi> ok. on to the next topic...
20:08:17 <asadoughi> #topic Design decisions
20:08:54 <asadoughi> ok, so i wanted to talk about ovs and state. this took up ~30 minutes of the ml2 meeting so i'll try to be more concise here
20:09:05 <asadoughi> #link https://etherpad.openstack.org/p/ovs-firewall-driver-stateless-2
20:09:27 <asadoughi> that's a link to the etherpad that my discussion will center on
20:10:06 <asadoughi> so, in ovs today, there are two best-practice options for implementing security groups:
20:10:18 <asadoughi> 1. reflexive learning actions (available in OVS today)
20:10:40 <asadoughi> 2. stateless ACLs with tcp_flags= (in OVS master, but not in a tagged version of OVS AFAIK)
20:11:07 <asadoughi> 3. stateless ACLs (self explanatory)
20:11:47 <asadoughi> so, the design decision i wanted to bring to the community was how do we want to implement security groups using 2 (almost 3) options
20:12:01 <asadoughi> i say almost 3 because #2 is not available in a versioned OVS
20:12:28 <asadoughi> my idea was to implement 3 followed by 2 once it is available in a versioned OVS
20:13:07 <geekinutah> is there anything that we can't implement for security groups with #3?
20:13:44 <asadoughi> geekinutah: great question. so the security groups API implementation is 'stateful' and now i will go through the examples in the etherpad
20:14:12 <asadoughi> so given a server and a client, an instance and remote ip, there are 4 possible flows
20:14:21 <asadoughi> i have enumerated the 4 flows in the etherpad
20:14:46 <asadoughi> the 4 flows represent 2 connections
20:15:04 <asadoughi> 1. nw_src=$instance_ip, tp_src=random, nw_dst=$remote_ip, tp_dst=22
20:15:11 <asadoughi> 2. nw_src=$remote_ip, tp_src=random, nw_dst=$instance_ip, tp_dst=22
20:15:17 <asadoughi> 3. nw_src=$instance_ip, tp_src=22, nw_dst=$remote_ip, tp_dst=random
20:15:24 <asadoughi> 4. nw_src=$remote_ip, tp_src=22, nw_dst=$instance_ip, tp_dst=random
20:16:26 <asadoughi> 1 and 4 represent two halves of the same connection. 2 and 3 represent two halves of the same (but different from 1 and 4) connection as well.
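A minimal sketch of the "4 flows / 2 connections" point, assuming flow 3 has tp_dst=random (its return-traffic counterpart is flow 2). The Flow tuple and the helper names are hypothetical and only illustrate how each flow pairs with its mirror image; they are not part of the blueprint or the etherpad.

```python
from collections import namedtuple

# Hypothetical model of the four flows listed above (not Neutron code).
Flow = namedtuple("Flow", "nw_src tp_src nw_dst tp_dst")

INSTANCE, REMOTE, RANDOM = "$instance_ip", "$remote_ip", "random"

FLOWS = {
    1: Flow(INSTANCE, RANDOM, REMOTE, 22),   # instance opens ssh to remote
    2: Flow(REMOTE, RANDOM, INSTANCE, 22),   # remote opens ssh to instance
    3: Flow(INSTANCE, 22, REMOTE, RANDOM),   # instance replies to flow 2
    4: Flow(REMOTE, 22, INSTANCE, RANDOM),   # remote replies to flow 1
}


def reverse(flow):
    """Return the flow seen in the opposite direction of the same connection."""
    return Flow(flow.nw_dst, flow.tp_dst, flow.nw_src, flow.tp_src)


def pair_connections(flows):
    """Group flow numbers into (request, reply) pairs of one connection."""
    pairs = []
    for a in sorted(flows):
        for b in sorted(flows):
            if a < b and reverse(flows[a]) == flows[b]:
                pairs.append((a, b))
    return pairs


CONNECTIONS = pair_connections(FLOWS)
```

Under these assumptions, CONNECTIONS pairs flow 1 with flow 4 and flow 2 with flow 3, which is the grouping described in the meeting.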
20:17:17 <asadoughi> any comments or questions about the 4 flows / 2 connections?
20:17:22 <JunPark_> ok, they make sense.
20:18:49 <asadoughi> ok, so now with these 4 flows in your collective heads, let's move to the neutron security groups api
20:18:55 <JunPark_> basically, an ssh session to an instance and another ssh session to outside, right?
20:19:02 <asadoughi> JunPark_: correct
20:19:33 <asadoughi> in the etherpad i have the api shown as documented by client help output
20:19:56 <asadoughi> you can see that the api lacks a source port match (tp_src)
20:20:11 <asadoughi> geekinutah: answering your question now hopefully...:
20:20:44 <geekinutah> yup, makes sense
20:20:46 <asadoughi> so, with the existing security groups api it is not possible to implement stateless ACL (#3) without adding a source port match
20:21:17 <asadoughi> so, i have proposed two reviews adding a source port match to the api and client
20:21:29 <asadoughi> #link neutron change https://review.openstack.org/#/c/62129/
20:21:39 <asadoughi> #link neutronclient change https://review.openstack.org/#/c/62130/
20:22:13 <asadoughi> ok, so are we on the same page for implementation choice #3 and what has to change in the api for it to happen?
20:22:22 <JunPark_> a quick question.
20:22:33 <asadoughi> JunPark_: shoot
20:23:06 <JunPark_> as an example of 1 & 4 that deals with outgoing ssh session...
20:24:00 <JunPark_> port 22 can be easily guessed in 4, I believe.
20:24:35 <asadoughi> can you elaborate on "guessed"? are you referring to reflexive learning actions (implementation choice #1)?
20:25:26 <JunPark_> sorry for the confusion. I don't know about "reflexive learning actions." Maybe I don't clearly understand the issue here. But let me try to explain it again.
20:26:15 <asadoughi> so, reflexive learning actions would mean OVS could learn the source port from the egress flow
20:26:48 <JunPark_> when there is an outgoing ssh session that needs to be allowed, those two flows can be built via one single api that only says "allow outgoing port 22."
20:27:03 <JunPark_> the two flows mean 1 & 4 in this example.
20:27:34 <JunPark_> I'm trying to understand why we need to have "source port match" in api...
20:28:23 <asadoughi> oh, i see. good point. so, the specific security group rule that makes the source port match necessary is: in default security groups, you allow all egress; you wouldn't want to automatically allow all ingress in that case.
20:29:30 <asadoughi> are you familiar with default security groups? default security groups are defined as: (1) allow all egress (2) allow all ingress from other instances in the default security group.
20:30:16 <JunPark_> nope...I just got to know such details here. ^^
20:31:28 <JunPark_> anyway, I think I got you now.
20:31:34 <asadoughi> ok. so with choice #3, you will need to add a third rule to say allow all ingress from source port 22. and with choice #2, you will be able to say allow all ingress with tcp_flags=ack (no third security group rule necessary)
20:32:12 <asadoughi> so, my preferred implementation path is choice #3 stateless ACL followed by #2 stateless ACL with tcp_flags.
20:32:32 <JunPark_> that sounds ok.
20:32:39 <asadoughi> i prefer not to implement #1 reflexive learning actions because they are not as performant: as another engineer told me, "cuts into how many things a megaflow can wildcard, the less that can be wildcarded, the more ovs will have to hit userspace for flows"
20:33:00 <hemanthravi> asadoughi: sorry joined late, are these choices in the etherpad?
20:33:23 <asadoughi> hemanthravi: sorry. i do not have them there, but i'll add now.
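A rough sketch of how the return-traffic match could differ between choice #3 and choice #2 discussed above. The function name, the example IP and the exact tcp_flags notation are assumptions for illustration (the notation accepted depends on the OVS version); the code only builds match strings and does not call any OVS tool.

```python
def ingress_return_match(instance_ip, choice, service_port=22):
    """Build a hypothetical OVS match for replies to an outgoing TCP session.

    choice 3: plain stateless ACL, needs an explicit source port match.
    choice 2: stateless ACL with tcp_flags, matches any packet carrying ACK.
    """
    fields = ["tcp", "nw_dst=%s" % instance_ip]
    if choice == 3:
        # Requires a source-port field in the security group rule.
        fields.append("tp_src=%d" % service_port)
    elif choice == 2:
        # Assumed notation; no extra security group rule needed.
        fields.append("tcp_flags=+ack")
    else:
        raise ValueError("only choices 2 and 3 are stateless ACLs")
    return ",".join(fields)


MATCH_CHOICE_3 = ingress_return_match("10.0.0.5", 3)
MATCH_CHOICE_2 = ingress_return_match("10.0.0.5", 2)
```

The choice #3 string depends on a service port being known, which is why the API would need a source port match; the choice #2 string does not.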
20:34:33 <asadoughi> hemanthravi: added
20:35:14 <asadoughi> ok, it sounds like i need to summarize what we discussed here and take it to the ML given lower attendance than i was hoping for
20:35:35 <asadoughi> #action asadoughi to take implementation choice / design discussion to openstack-dev mailing list
20:36:22 <asadoughi> does anyone have any input to the implementation choices other than what was discussed so far?
20:37:40 <geekinutah> so far so good I think
20:37:58 <geekinutah> I will need to learn more about #2, but I agree with you on #1
20:38:07 <JunPark_> another quick question.
20:38:12 <asadoughi> JunPark_: shoot
20:38:17 <geekinutah> and it does seem like a raw stateless implementation is not out of place
20:38:58 <JunPark_> so this firewall or security group related DB is already implemented in icehouse for persistent data?
20:39:26 <JunPark_> sorry for my ignorance about the current state of icehouse regarding this topic.
20:39:34 <asadoughi> JunPark_: security groups has been in neutron for a while, yes. with a database and rpc api
20:39:44 <JunPark_> cool. thanks!
20:40:19 <asadoughi> JunPark_: https://github.com/openstack/neutron/blob/master/neutron/db/securitygroups_db.py https://github.com/openstack/neutron/blob/master/neutron/db/securitygroups_rpc_base.py
20:40:50 <asadoughi> ok, i am going to move on to the next topic
20:41:08 <asadoughi> #topic ovs_neutron_agent nuances/issues/tasks
20:43:13 <asadoughi> 1. so, in neutron, the ovs_neutron_agent (the agent that applies flows) provisions VLANs for vifs after the firewall is invoked, which is troublesome since the security group flows have to use the vlan that the agent applied. i'll have a patch later this week to move the vlan allocation before the firewall is invoked
20:43:58 <asadoughi> any questions/comments about #1?
20:45:14 <asadoughi> 2. ovs_neutron_agent removes all flows on all bridges at initialization, which might be a terrible thing depending on how reliable the agent is for tenant traffic -- i'm not sure exactly what i want to do about that
20:45:46 <asadoughi> any questions/comments about #2? ..any solutions? :)
20:45:47 <geekinutah> asadoughi: it is disruptive
20:46:01 <geekinutah> JunPark_ actually wrote a "soft" restart for the agent
20:46:10 <JunPark_> "at initialization" means "agent restart"?
20:46:17 <asadoughi> JunPark_: correct
20:46:46 <geekinutah> however, this is not a problem directly related to the firewall I think
20:46:59 <geekinutah> as long as flows come up with the default deny or accept depending on policy
20:47:03 <asadoughi> geekinutah: also true, but a concern nonetheless
20:47:06 <geekinutah> but yeah, it needs fixing :-)
20:47:19 <asadoughi> 3. ovs_neutron_agent removes all flows on the vif's port at initialization
20:47:20 <yamahata> How many flows do you expect? Is remove all flows and reinstall all flows unacceptable?
20:47:57 <asadoughi> yamahata: removing and reinstalling is not acceptable if it's killing all of the tenants' traffic, depending on agent uptime reliability
20:49:09 <asadoughi> regarding #3, it deletes all flows based on in_port. you cannot delete flows based on cookie until OVS 1.5.0+, which might be troublesome for some users b/c XenServer 6.2 is on 1.4.6 and Ubuntu P*/Q* is on 1.4.6 as well
20:49:33 <yamahata> I see. then what is wanted is to retrieve the existing flows and fix them up according to the configuration.
20:49:39 <JunPark_> In our env where, e.g., a host runs 100 VMs, when we restart neutron agent, if agent wipes out all flows, it takes 5 to 10 minutes to complete deployment of all flows, which is terrible for us.
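One way to picture the "retrieve the existing flows and fix them up" idea is a plain set difference between installed and desired flows. This is only an illustrative sketch under that assumption, not JunPark_'s "soft restart" patch; the function name and the flow strings are hypothetical.

```python
def plan_flow_changes(existing, desired):
    """Return (to_add, to_delete, unchanged) for a restart that avoids
    wiping flows that are already correct.

    Both arguments are iterables of normalized flow strings.
    """
    existing, desired = set(existing), set(desired)
    to_add = sorted(desired - existing)      # missing flows to install
    to_delete = sorted(existing - desired)   # stale flows to remove
    unchanged = sorted(existing & desired)   # left alone, traffic keeps flowing
    return to_add, to_delete, unchanged


# Hypothetical flows on a bridge before and after an agent restart.
INSTALLED = ["in_port=1,actions=normal", "in_port=2,actions=drop"]
WANTED = ["in_port=1,actions=normal", "in_port=3,actions=normal"]

TO_ADD, TO_DELETE, UNCHANGED = plan_flow_changes(INSTALLED, WANTED)
```

With this approach only the differing flows are touched, so the cost of a restart scales with the number of changes rather than with the number of VMs on the host.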
20:49:59 <asadoughi> JunPark_: yes, that sounds terrible
20:50:17 <JunPark_> that's why I implemented our own "soft restart"
20:50:35 <JunPark_> i'm not sure the patch that I shared with you before includes that patch of "soft restart" though.
20:51:00 <asadoughi> ok. so #2 sounds like maybe another blueprint to file
20:51:08 <asadoughi> JunPark_: interested in filing that?
20:51:22 <JunPark_> sure.
20:51:32 <kanthi> Amir, one question which may be a bit out of context at this point of discussion: how do you plan to handle filtering of traffic between VMs?
20:51:41 <asadoughi> #action JunPark_ to file blueprint around soft restart
20:52:49 <kanthi> Current security rules use iptables, where all these rules are applied in the forward chain, and these will not be hit if both VMs are on the same compute host, in which case they just use L2Network for communication
20:53:04 <asadoughi> kanthi: i don't have that in my prototype at the moment, but i have solutions to base my future work off of (JunPark_'s as well as my co-workers')
20:53:16 <JunPark_> kanthi: do you mean "vm-to-vm traffic"?
20:53:22 <kanthi> yes
20:53:44 <asadoughi> #topic impromptu vm-to-vm traffic
20:53:57 <kanthi> since we would be doing filtering based on ovs ports, these rules might be applied for vm-vm traffic on same host as well
20:54:07 <JunPark_> my patch includes vm-to-vm traffic flows implementation.
20:54:45 <JunPark_> assuming that by default it's allowed because we assume that the flat provider network is a public vlan.
20:54:52 <asadoughi> kanthi: so something TBD .. i'd like the underlying issues to be discussed and resolved first before we get to that level
20:55:14 <JunPark_> well, i'm not sure which behavior would be the legit default behavior though.
20:55:26 <geekinutah> asadoughi: agreed, let's get the basics in place and then deal with that
20:55:55 <kanthi> ok, then please proceed according to the agenda
20:55:56 <asadoughi> #topic ovs_agent: versioning
20:56:23 <asadoughi> so does anyone have an issue with requiring ovs 1.5.0+ in the agent? because of the cookie delete issue
20:57:02 <geekinutah> asadoughi: do you know what the latest LTS has built in?
20:57:13 <asadoughi> geekinutah: not handy. can look it up
20:57:20 <sc68cal> 12.04 packages like 0.4.0
20:57:30 <sc68cal> the cloud archive has like 1.10 I think?
20:57:38 <asadoughi> #link https://launchpad.net/ubuntu/+source/openvswitch
20:57:59 <asadoughi> so anything R and above would be compliant
20:58:13 <geekinutah> one consideration is asking distributions to maintain a kmod, not sure if that's a big deal
20:58:24 <asadoughi> we're running short on time so i'll skip the prototype discussion
20:58:25 <geekinutah> other than that requiring 1.5.0+ seems reasonable
20:58:48 <asadoughi> #topic open ended
20:59:15 <asadoughi> anything before we end?
21:00:03 <asadoughi> #endmeeting
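A small sketch of what a "requires OVS 1.5.0+" check could look like, assuming the agent has a version string available. The function names are hypothetical and not existing agent code; the point is that versions are compared as integer tuples, so that 1.10 sorts above 1.5.

```python
import re

# Minimum version discussed in the meeting for deleting flows by cookie.
MIN_COOKIE_DELETE_VERSION = (1, 5, 0)


def parse_ovs_version(text):
    """Extract (major, minor, patch) from a version string such as '1.4.6'."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def supports_cookie_delete(version_text):
    """True if the given OVS version meets the assumed 1.5.0 minimum."""
    return parse_ovs_version(version_text) >= MIN_COOKIE_DELETE_VERSION
```

A plain string comparison would rank "1.10" below "1.5", which is why the sketch parses the components into integers first.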