#openstack-meeting log

20:00:31 <asadoughi> #startmeeting blueprint ovs-firewall-driver
20:00:32 <openstack> Meeting started Mon Dec 16 20:00:31 2013 UTC and is due to finish in 60 minutes.  The chair is asadoughi. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:33 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:00:35 <openstack> The meeting name has been set to 'blueprint_ovs_firewall_driver'
20:01:03 <asadoughi> everyone around for the meeting?
20:01:35 <asadoughi> say hi if you're here for the blueprint ovs-firewall-driver meeting
20:01:42 <JunPark_> hi asadoughi, this is Jun.
20:01:59 <asadoughi> hi JunPark_
20:02:35 <JunPark_> I believe Mike Wilson is here as well.
20:03:31 <kanthi> Hi Amir, Kanthi here
20:03:41 <asadoughi> hi kanthi
20:03:45 <geekinutah> hi all
20:03:47 <geekinutah> <- Mike
20:03:56 <asadoughi> hi geekinutah
20:04:30 <asadoughi> #link https://wiki.openstack.org/wiki/Meetings/Neutron_blueprint_ovs-firewall-driver
20:04:43 <asadoughi> that's the agenda link for today
20:04:59 <asadoughi> #link blueprint https://blueprints.launchpad.net/neutron/+spec/ovs-firewall-driver
20:05:21 <asadoughi> #topic Purpose of blueprint
20:05:41 <asadoughi> I wanted to restate the purpose of the blueprint so everyone's on the same page as far as the direction.
20:05:46 <asadoughi> To support the security groups extension in the OVS neutron agent through OVS flows using the existing OVS library with feature parity to the existing iptables-based implementations. In Icehouse, the existing openvswitch plugin is being deprecated, so the blueprint is compatible with the ML2 plugin with the openvswitch mechanism driver.
20:06:20 <asadoughi> any questions or comments on the purpose statement?
20:06:52 <geekinutah> seems solid
20:08:02 <asadoughi> ok. on to the next topic...
20:08:17 <asadoughi> #topic Design decisions
20:08:54 <asadoughi> ok, so i wanted to talk about ovs and state. so this took up ~30 minutes of the ml2 meeting so i'll try to be more concise here
20:09:05 <asadoughi> #link https://etherpad.openstack.org/p/ovs-firewall-driver-stateless-2
20:09:27 <asadoughi> that's a link to the etherpad that my discussion will be centered on
20:10:06 <asadoughi> so, in ovs today, there are two best-practice options for implementing security groups:
20:10:18 <asadoughi> 1. reflexive learning actions (available in OVS today)
20:10:40 <asadoughi> 2. stateless ACLs with tcp_flags= (in OVS master, but not in a tagged version of OVS AFAIK)
20:11:07 <asadoughi> 3. stateless ACLs (self explanatory)
20:11:47 <asadoughi> so, the design decision i wanted to bring to the community was how do we want to implement security groups using 2 (almost 3 options)
20:12:01 <asadoughi> i say almost 3 because #2 is not available in a versioned OVS
20:12:28 <asadoughi> my idea was to implement 3 followed by 2 once it is available in a versioned OVS
20:13:07 <geekinutah> is there anything that we can't implement for security groups with #3?
20:13:44 <asadoughi> geekinutah: great question. so the security groups API implementation is 'stateful' and now i will go through the examples in the etherpad
20:14:12 <asadoughi> so given a server and a client, an instance and remote ip, there are 4 possible flows
20:14:21 <asadoughi> i have enumerated the 4 flows in the etherpad
20:14:46 <asadoughi> the 4 flows represent 2 connections
20:15:04 <asadoughi> 1. nw_src=$instance_ip, tp_src=random, nw_dst=$remote_ip, tp_dst=22
20:15:11 <asadoughi> 2. nw_src=$remote_ip, tp_src=random, nw_dst=$instance_ip, tp_dst=22
20:15:17 <asadoughi> 3. nw_src=$instance_ip, tp_src=22, nw_dst=$remote_ip, tp_dst=random
20:15:24 <asadoughi> 4. nw_src=$remote_ip, tp_src=22, nw_dst=$instance_ip, tp_dst=random
20:16:26 <asadoughi> 1. and 4. represent two halves of the same connection. 2. and 3. represent two halves of the same (but different from 1 and 4) connection as well.
20:17:17 <asadoughi> any comments or questions about the 4 flows / 2 connections?
20:17:22 <JunPark_> ok, they make sense.
20:18:49 <asadoughi> ok, so now with these 4 flows in your collective heads, let's move to the neutron security groups api
20:18:55 <JunPark_> basically, an ssh session to an instance and another ssh session to outside, right?
20:19:02 <asadoughi> JunPark_: correct
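The pairing of the 4 flows into 2 connections can be sketched in Python. This is a minimal illustration, not code from Neutron or the etherpad: the `Flow` tuple, the helper names, and the addresses and ephemeral port numbers are all placeholders chosen for the example.

```python
from collections import namedtuple

# Field names mirror the OVS match fields used in the discussion.
Flow = namedtuple("Flow", "nw_src tp_src nw_dst tp_dst")

# Placeholder values (assumptions for illustration only).
INSTANCE, REMOTE = "10.0.0.5", "203.0.113.9"
EPHEMERAL_OUT, EPHEMERAL_IN = 40001, 40002  # stand-ins for "random" ports

flows = {
    1: Flow(INSTANCE, EPHEMERAL_OUT, REMOTE, 22),  # instance -> remote sshd
    2: Flow(REMOTE, EPHEMERAL_IN, INSTANCE, 22),   # remote -> instance sshd
    3: Flow(INSTANCE, 22, REMOTE, EPHEMERAL_IN),   # reply half of flow 2
    4: Flow(REMOTE, 22, INSTANCE, EPHEMERAL_OUT),  # reply half of flow 1
}


def connection_key(flow):
    """Return a direction-independent key shared by both halves of a connection."""
    return frozenset([(flow.nw_src, flow.tp_src), (flow.nw_dst, flow.tp_dst)])


def group_by_connection(numbered_flows):
    """Group flow numbers by the connection they belong to."""
    groups = {}
    for number, flow in numbered_flows.items():
        groups.setdefault(connection_key(flow), []).append(number)
    return sorted(sorted(group) for group in groups.values())


print(group_by_connection(flows))
```

Swapping source and destination leaves the key unchanged, which is why flows 1 and 4 land in one group and flows 2 and 3 in the other.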
20:19:33 <asadoughi> in the etherpad i have the api shown as documented by client help output
20:19:56 <asadoughi> you can see that the api lacks a source port match (tp_src)
20:20:11 <asadoughi> geekinutah: answering your question now hopefully...:
20:20:44 <geekinutah> yup, makes sense
20:20:46 <asadoughi> so, with the existing security groups api it is not possible to implement stateless ACL (#3) without adding a source port match
20:21:17 <asadoughi> so, i have proposed two reviews adding a source port match to the api and client
20:21:29 <asadoughi> #link neutron change https://review.openstack.org/#/c/62129/
20:21:39 <asadoughi> #link neutronclient change https://review.openstack.org/#/c/62130/
20:22:13 <asadoughi> ok, so are we on the same page for implementation choice #3 and what has to change in the api for it to happen?
20:22:22 <JunPark_> a quick question.
20:22:33 <asadoughi> JunPark_: shoot
20:23:06 <JunPark_> as an example of 1 & 4 that deals with outgoing ssh session...
20:24:00 <JunPark_> port 22 can be easily guessed in 4, I believe.
20:24:35 <asadoughi> can you elaborate on "guessed"? are you referring to reflexive learning actions (implementation choice #1)?
20:25:26 <JunPark_> sorry for the confusion. I don't know about "reflexive learning actions." Maybe I don't clearly understand the issue here. But let me try to explain it again.
20:26:15 <asadoughi> so, reflexive learning actions would mean OVS could learn the source port from the egress flow
20:26:48 <JunPark_> when there is an outgoing ssh session that needs to be allowed, those two flows can be built via one single api that only says "allow outgoing port 22."
20:27:03 <JunPark_> the two flows mean 1 & 4 in this example.
20:27:34 <JunPark_> I'm trying to understand why we need to have "source port match" in api...
20:28:23 <asadoughi> oh, i see. good point. so, the specific security group rule that makes the source port match necessary is: in default security groups, you allow all egress; you wouldn't want to automatically allow all ingress in that case.
20:29:30 <asadoughi> are you familiar with default security groups? default security groups are defined as: (1) allow all egress (2) allow all ingress from other instances in the default security group.
20:30:16 <JunPark_> nope...I just got to know such details here. ^^
20:31:28 <JunPark_> anyway, I think I got you now.
20:31:34 <asadoughi> ok. so with choice #3, you will need to add a third rule to say allow all ingress from source port 22. and with choice #2, you will be able to say allow all ingress with tcp_flags=ack (no third security group rule necessary)
20:32:12 <asadoughi> so, my preferred implementation path is choice #3 stateless ACL followed by #2 stateless ACL with tcp_flags.
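The difference between choice #3 and choice #2 can be shown with a toy stateless filter. This is a rough sketch under stated assumptions, not OVS or Neutron code: the rule and packet dictionaries, and the `direction` and `tcp_ack` keys, are hypothetical stand-ins for flow matches.

```python
def matches(rule, packet):
    """A rule matches when every field it names equals the packet's field."""
    return all(packet.get(field) == value for field, value in rule.items())


def allowed(rules, packet):
    """Stateless check: each packet is judged alone, with no connection memory."""
    return any(matches(rule, packet) for rule in rules)


# Default-style group reduced to the part under discussion: allow all egress.
default_group = [{"direction": "egress"}]

# Reply half of an outgoing ssh session (flow 4), and a fresh inbound
# connection attempt that merely uses source port 22.
reply = {"direction": "ingress", "tp_src": 22, "tp_dst": 40001, "tcp_ack": True}
new_attempt = {"direction": "ingress", "tp_src": 22, "tp_dst": 40001, "tcp_ack": False}

# Choice #3: a third rule matching on source port.
choice3 = default_group + [{"direction": "ingress", "tp_src": 22}]
# Choice #2: match on the ACK flag instead of a source port.
choice2 = default_group + [{"direction": "ingress", "tcp_ack": True}]

print(allowed(default_group, reply))  # reply is dropped without an extra rule
print(allowed(choice3, reply), allowed(choice3, new_attempt))
print(allowed(choice2, reply), allowed(choice2, new_attempt))
```

In this model the source port rule admits any ingress packet that claims source port 22, while the ACK-based rule at least rejects packets that open a new connection.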
20:32:32 <JunPark_> that sounds ok.
20:32:39 <asadoughi> i prefer not to implement #1 reflexive learning actions because they are not as performant: as i talked to another engineer, "cuts into how many things a megaflow can wildcard, the less that can be wildcarded, the more ovs will have to hit userspace for flows"
20:33:00 <hemanthravi> asadoughi: sorry joined late, are these choices in the etherpad?
20:33:23 <asadoughi> hemanthravi: sorry. i do not have them there, but i'll add now.
20:34:33 <asadoughi> hemanthravi: added
20:35:14 <asadoughi> ok, it sounds like i need to summarize what we discussed here and take it to the ML given lower attendance than i was hoping for
20:35:35 <asadoughi> #action asadoughi to take implementation choice / design discussion to openstack-dev mailing list
20:36:22 <asadoughi> does anyone have any input to the implementation choices other than what was discussed so far?
20:37:40 <geekinutah> so far so good I think
20:37:58 <geekinutah> I will need to learn more about #2, but I agree with you on #1
20:38:07 <JunPark_> another quick question.
20:38:12 <asadoughi> JunPark_: shoot
20:38:17 <geekinutah> and it does seem like a raw stateless implementation is not out of place
20:38:58 <JunPark_> so this firewall or security group related DB is already implemented in icehouse for persistent data?
20:39:26 <JunPark_> sorry for ignorance about the current state of icehouse regarding this topic.
20:39:34 <asadoughi> JunPark_: security groups has been in neutron for a while, yes. with a database and rpc api
20:39:44 <JunPark_> cool. thanks!
20:40:19 <asadoughi> JunPark_: https://github.com/openstack/neutron/blob/master/neutron/db/securitygroups_db.py  https://github.com/openstack/neutron/blob/master/neutron/db/securitygroups_rpc_base.py
20:40:50 <asadoughi> ok, i am going to move on to the next topic
20:41:08 <asadoughi> #topic ovs_neutron_agent nuances/issues/tasks
20:43:13 <asadoughi> 1. so, in neutron, the ovs_neutron_agent (the agent that applies flows) provisions VLANs for vifs after the firewall is invoked, which is troublesome since the security groups flows have to apply the vlan that the agent applied. so later this week i'll have a patch to rearrange the vlan allocation to happen before the firewall is invoked
20:43:58 <asadoughi> any questions/ comments about #1?
20:45:14 <asadoughi> 2. ovs_neutron_agent removes all flows on all bridges at initialization, which might be a terrible thing depending on how reliable the agent is for tenant traffic -- i'm not sure exactly what i want to do about that
20:45:46 <asadoughi> any questions/comments about #2? ..any solutions? :)
20:45:47 <geekinutah> asadoughi: it is disruptive
20:46:01 <geekinutah> JunPark_ actually wrote a "soft" restart for the agent
20:46:10 <JunPark_> "at initialization" means "agent restart"?
20:46:17 <asadoughi> JunPark_: correct
20:46:46 <geekinutah> however, this is not a problem directly related to the firewall I think
20:46:59 <geekinutah> as long as flows come up with the default deny or accept depending on policy
20:47:03 <asadoughi> geekinutah: also true, but a concern nonetheless
20:47:06 <geekinutah> but yeah, it needs fixing :-)
20:47:19 <asadoughi> 3. ovs_neutron_agent removes all flows on the vifs port at initialization
20:47:20 <yamahata> How many flows do you expect? Is remove all flows and reinstall all flows  unacceptable?
20:47:57 <asadoughi> yamahata: removing and reinstalling is not acceptable if it's killing all of the tenants' traffic, depending on agent uptime reliability
20:49:09 <asadoughi> regarding #3, it deletes all flows based on in_port, you cannot delete flows based on cookie until OVS 1.5.0+ which might be troublesome for some users b/c XenServer 6.2 is on 1.4.6 and Ubuntu P*/Q* is on 1.4.6 as well
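The version dependency for cookie-based deletion can be sketched as a helper that picks a deletion match from the OVS version. This is an illustrative sketch only: the function names are hypothetical, the agent's real code path is not shown here, and the `cookie=value/mask` form with a mask of `-1` is assumed from the ovs-ofctl flow syntax and should be checked against the installed version.

```python
COOKIE_DELETE_MIN = (1, 5, 0)  # first version said to support delete-by-cookie


def parse_version(text):
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def del_flows_args(bridge, ovs_version, in_port, cookie):
    """Build an ovs-ofctl del-flows argument list suited to the OVS version."""
    if parse_version(ovs_version) >= COOKIE_DELETE_MIN:
        # Delete only the flows tagged with this cookie.
        return ["ovs-ofctl", "del-flows", bridge, "cookie=0x%x/-1" % cookie]
    # Older OVS: fall back to deleting everything matching the vif's in_port.
    return ["ovs-ofctl", "del-flows", bridge, "in_port=%d" % in_port]


print(del_flows_args("br-int", "1.4.6", 7, 0x2A))
print(del_flows_args("br-int", "1.10.0", 7, 0x2A))
```

Comparing tuples rather than strings matters here: as strings, "1.10.0" sorts before "1.5.0", which would wrongly send a newer OVS down the in_port fallback.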
20:49:33 <yamahata> I see. then what is wanted is to retrieve the existing flow and fix them up according to the configuration.
20:49:39 <JunPark_> In our env where, e.g., a host runs 100 VMs, when we restart neutron agent, if agent wipes out all flows, it takes 5 to 10 minutes to complete all flows deployment, which is terrible to us.
20:49:59 <asadoughi> JunPark_: yes, that sounds terrible
20:50:17 <JunPark_> that's why I implemented our own "soft restart"
20:50:35 <JunPark_> i'm not sure the patch that I shared with you before includes that patch of "soft restart" though.
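The retrieve-and-fix-up idea raised above can be sketched as a set difference between the flows already on the bridge and the flows the configuration calls for. This is not JunPark_'s patch, whose contents are not shown in this log; the function name and the flow strings are hypothetical.

```python
def reconcile(existing_flows, desired_flows):
    """Return (to_add, to_delete) so untouched flows keep forwarding traffic."""
    existing, desired = set(existing_flows), set(desired_flows)
    to_add = sorted(desired - existing)
    to_delete = sorted(existing - desired)
    return to_add, to_delete


# Hypothetical flow descriptions, normalized to comparable strings.
on_bridge = ["in_port=1,actions=normal", "in_port=2,actions=normal", "in_port=9,actions=drop"]
wanted = ["in_port=1,actions=normal", "in_port=2,actions=normal", "in_port=3,actions=normal"]

to_add, to_delete = reconcile(on_bridge, wanted)
print(to_add)
print(to_delete)
```

Only the differing flows are touched on restart, so the cost scales with the amount of drift rather than with the total number of flows on the host. A real implementation would also have to normalize dumped flows (counters, field ordering) before comparing them.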
20:51:00 <asadoughi> ok. so #2 sounds like maybe another blueprint to file
20:51:08 <asadoughi> JunPark_: interested in filing that?
20:51:22 <JunPark_> sure.
20:51:32 <kanthi> Amir, one question which may be a bit out of context at this point of the discussion: how do you plan to handle between-VM traffic filtering?
20:51:41 <asadoughi> #action JunPark_ to file blueprint around soft restart
20:52:49 <kanthi> Current security rules use iptables, where all these rules are applied in forward chain and these will not be hit if both VMs are on same compute host in which case they just use L2Network for communication
20:53:04 <asadoughi> kanthi: i don't have that in my prototype at the moment, but i have solutions to base my future work off of ( JunPark_ as well as my co-workers')
20:53:16 <JunPark_> kanthi: do you mean "vm-to-vm traffic"?
20:53:22 <kanthi> yes
20:53:44 <asadoughi> #topic impromptu vm-to-vm traffic
20:53:57 <kanthi> since we would be doing filtering based on ovs ports, these rules might be applied for vm-vm traffic on same host as well
20:54:07 <JunPark_> my patch includes vm-to-vm traffic flows implementation.
20:54:45 <JunPark_> assuming that by default it's allowed because we assume that the flat provider network is a public vlan.
20:54:52 <asadoughi> kanthi: so something TBD .. i'd like the underlying issues to be discussed and resolved first before we get to that level
20:55:14 <JunPark_> well, i'm not sure which behavior would be the legit default behavior though.
20:55:26 <geekinutah> asadoughi: agreed, let's get the basics in place and then deal with that
20:55:55 <kanthi> ok, then please proceed according to the agenda
20:55:56 <asadoughi> #topic ovs_agent: versioning
20:56:23 <asadoughi> so does anyone have issue with requiring ovs 1.5.0+ in the agent? because of the cookie delete issue
20:57:02 <geekinutah> asadoughi: do you know what the latest LTS has built in?
20:57:13 <asadoughi> geekinutah: not handy. can look it up
20:57:20 <sc68cal> 12.04 packages like 1.4.0
20:57:30 <sc68cal> the cloud archive has like 1.10 I think?
20:57:38 <asadoughi> #link https://launchpad.net/ubuntu/+source/openvswitch
20:57:59 <asadoughi> so anything R and above would be compliant
20:58:13 <geekinutah> one consideration is asking distributions to maintain a kmod, not sure if that's a big deal
20:58:24 <asadoughi> we're running short on time so i'll skip the prototype discussion
20:58:25 <geekinutah> other than that requiring 1.5.0+ seems reasonable
20:58:48 <asadoughi> #topic open ended
20:59:15 <asadoughi> anything before we end?
21:00:03 <asadoughi> #endmeeting