13:00:28 #startmeeting PCI Passthrough
13:00:29 Meeting started Wed Feb 5 13:00:28 2014 UTC and is due to finish in 60 minutes. The chair is baoli. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:00:30 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:00:32 The meeting name has been set to 'pci_passthrough'
13:00:55 hi
13:01:31 Hi
13:02:16 short update with current status?
13:02:42 irenab, shall we wait for others?
13:03:04 sure
13:05:48 hi
13:06:14 rkukura: hi
13:06:36 snow day here - had to shovel a path to take the trash out
13:06:54 Hi rkukura
13:07:14 rkukura: we are more than a month without rain...
13:08:01 we'll have 8-12 inches of snow today
13:08:01 baoli: shall we wait for Sandhya?
13:08:05 expecting 10-12" of snow today - looks like about 8" so far
13:08:27 baoli: Where are you located?
13:08:28 irenab, yes. a couple more minutes
13:08:45 rkukura, Westford/Boston
13:09:09 rkukura, is your office in Westford?
13:09:12 baoli: I'm in Sudbury, office is in Westford
13:09:47 guys, while we wait for Sandhya, how do we make progress with vnic_type?
13:09:56 rkukura, we are neighbors
13:10:27 At least the couple days a week I work from the office
13:10:43 yep
13:13:44 irenab, regarding vnic_type, I'd like to see that normal users can choose to go with sriov or virtio.
13:13:49 rkukura: did you have any chance to discuss it with other core team members? Shall I send an email to the mailing list?
13:15:09 baoli: I think I'll be able to push the code for it as a draft either later today or tomorrow
13:15:10 irenab: I think you should send an email to openstack-dev.
13:15:27 irenab: That's probably best - makes it concrete.
13:15:35 irenab, that sounds great
13:15:40 rkukura: Ok, thanks
13:16:33 rkukura: It will be hard to present without having the nova API defined...
13:17:14 Not sure if you guys are aware of the change proposed for ipv6. Two new keywords for an ipv6 subnet: ipv6_ra_mode and ipv6_address_mode. You'd expect a normal user to fully understand ipv6 before using it.
13:17:58 intuitively, a normal user should just say "I want ipv6" for my network
13:18:10 baoli: I'm no IPv6 expert, but don't those affect what happens inside the VM?
13:18:22 baoli: does it have permissions as a regular user in policy.py?
13:18:41 rkukura, it just provides ipv6 connectivity
13:19:10 irenab, no restrictions imposed in policy.py
13:19:14 on subnet
13:19:18 With IPv4, the VM needs to know whether DHCP is being used. Aren't these similar?
13:19:48 baoli: only admin or network owner can create a subnet
13:19:59 rkukura, in addition to that, a user needs to say dhcp stateless or dhcp stateful, or slaac, etc.
13:20:33 baoli: can you please send the link to the review?
13:21:19 baoli: Someone should be looking out for the usability of this for normal tenants!
13:21:41 baoli: I am not sure it is supposed to be managed by a regular user; should be admin or network owner
13:21:43 irenab, you mean the ipv6 change?
13:21:52 baoli: yes
13:22:19 baoli: yes
13:22:28 irenab, if it's ok for the network owner, then it's ok for a network owner to say I need an sriov port
13:22:44 If this precedent helps with the case that exposing vnic_type to normal users is the way to go, fine.
13:23:10 Don't forget that tenants can use networks owned by someone else, as long as the shared attribute is set.
13:23:21 so normal_user = network_owner for the vnic_type case?
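(Context note on the ipv6 subnet change mentioned at 13:17: the two proposed attributes, ipv6_ra_mode and ipv6_address_mode, would surface on the CLI roughly as sketched below. This is illustrative only, assuming the option names used by the proposal; the network name and prefix are placeholders, and the expected mode values were slaac, dhcpv6-stateful, and dhcpv6-stateless.)

    neutron subnet-create ipv6-net 2001:db8::/64 --ip-version 6 \
        --ipv6-ra-mode slaac --ipv6-address-mode slaac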
13:23:37 There are also proposals being discussed for hierarchies of tenants
13:24:00 I need to state it in the neutron policy.py
13:24:04 I do not think we should restrict SR-IOV to the case where the tenant is the owner of the network.
13:24:30 it's either admin or owner
13:24:41 rkukura: would it be ok?
13:24:43 Networks (and their subnets) are often shared, especially provider networks.
13:25:05 I don't think admin or owner is correct for attaching to a network
13:25:42 rkukura, in the current proposal, sriov ports are shared among tenants!
13:26:50 Why not just let normal users request SR-IOV via vnic_type?
13:27:06 according to policy.py, "create_port:mac_learning_enabled": "rule:admin_or_network_owner"
13:27:28 I think we should have the same for vnic_type
13:28:35 This whole API is getting way too complicated!
13:29:36 mac_learning_enabled is a nicira-specific extension, it looks like.
13:29:55 irenab, I think that --binding:vnic_type should at least have the same restriction as the port-create command.
13:30:14 So I think it should be admin_or_network_owner
13:30:26 I have no idea what the use case is for mac_learning_enabled, so maybe it is something normal users would never use unless they own the network
13:31:39 baoli: Wouldn't using admin_or_network_owner prevent normal tenants from requesting SR-IOV on a port attached to a shared network? Maybe I'm wrong and this policy rule takes --shared into account?
13:31:55 rkukura: I don't see any other example that fits... MAC and IP indeed seem like something only admin_or_network_owner should manage
13:32:25 rkukura: there is also the rule "shared", not sure how it works
13:32:50 rkukura: so we can mix
13:33:27 Interesting - I do see now that specifying mac_address or fixed_ips is admin_or_network_owner. I'm getting convinced that that rule must take sharing into account.
13:33:58 rkukura, good point. I think that I need to study policy.py a bit more
13:34:24 anyway, it seems the current discussion shows there is a good chance vnic_type may have different policy rules than other items that may land in binding:profile
13:34:45 I apologize if my misinformation on this has been leading the team astray!
13:35:11 irenab: Agreed.
13:35:16 rkukura: I think your questions are well placed and push us to provide good answers
13:36:07 so it seems there is no reason to block the vnic_type bp, what do you think?
13:38:09 irenab: I'm fine with going forward with it.
13:38:11 rkukura, a shared network can be used by any tenant, is that right?
13:41:09 baoli: there was a question sadasu sent to the mailing list on neutron SRIOV ports and MD. After vnic_type I wanted to start with SRIOVPortMDBase. Do you know if sadasu started some work on this?
13:41:33 baoli: That is my understanding. The mailing list discussion regarding hierarchies of tenants/projects may eventually make that more useful.
13:42:03 irenab, not sure if she would do something.
13:42:21 baoli: ok..
13:43:04 so for vnic_type, we'd go with binding:vnic_type, and set the rule as admin_or_network_owner? is that agreed?
13:43:26 baoli: I agree
13:44:21 cool
13:44:32 let's move on to SRIOVPortMDBase
13:44:38 rkukura: do you agree?
13:45:11 irenab: I agree as long as admin_or_network_owner really does work for shared networks
13:46:12 rkukura: I'll do some tests on an existing attribute to verify
13:46:30 rkukura, that's the catch we have to experiment with. But I think it should be the same for someone creating a port on a shared network, regardless of the vnic_type.
I may be wrong, though
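(Context note: a minimal sketch of what the neutron policy.json entries under discussion could look like, modeled on the mac_learning_enabled rule quoted at 13:27. The attribute path and the optional "or rule:shared" mix are assumptions for illustration, not an agreed outcome; whether the shared rule is needed for shared networks is exactly what irenab offered to verify.)

    "create_port:binding:vnic_type": "rule:admin_or_network_owner or rule:shared",
    "update_port:binding:vnic_type": "rule:admin_or_network_owner or rule:shared"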
13:46:42 irenab: Great. Very interested in what you find out.
13:47:01 the rule can also be something like "rule:admin_or_owner or rule:shared"
13:47:29 irenab, let us know once you find out
13:47:35 baoli: sure
13:47:57 baoli: can you please give a short update for the nova side?
13:48:32 irenab, sure
13:48:58 basically, yunhong would like to go with a simpler version for Icehouse
13:49:35 The major enhancements would be: a) add attribute support in the whitelist, b) support multiple aliases
13:49:55 and c) support stats based on aliases
13:50:15 he is asking for approval
13:50:48 baoli: this will work without the need to create a VM flavor with a PCI alias, right?
13:51:11 If that can be done, then we should be ok. I also asked for the API to support correlation between the allocated device and the requested networks.
13:51:26 irenab, no vm flavor for network.
13:51:27 baoli: and for your bp, is it on the way?
13:52:03 irenab, I put John's name as approver. I need to send an email as well
13:52:15 Is requesting the physical network still implicit in requesting the PCI device? I saw some mention in baoli's wiki of nova asking neutron for the physical_network, but don't think that is possible.
13:53:16 baoli: thanks
13:54:00 baoli: so we need to clearly define the nova-neutron API for setting PCI details and returning VIF details/VIF_TYPE
13:54:02 rkukura, what is not possible? Can you clarify it?
13:55:08 rkukura: yes, the wiki says that, why not possible?
13:55:24 I thought phy_net can be part of the vif_details
13:55:33 baoli: With ML2, a virtual network can be made up of multiple segments, which may have different values for provider:physical_network. I don't think either nova or neutron could know which segment's physical_network to use.
13:56:21 sadasu: Once port binding occurs, a segment has been picked, and the MD can put that segment's physical_network into binding:vif_details.
13:56:48 rkukura, do you have an example of how that's used and provisioned?
13:57:07 rkukura, I mean multi-segments
13:57:11 It seemed to me that the physical_network was needed in nova before the VM is scheduled. Port binding can't be done until after the VM is scheduled.
13:57:51 rkukura: yes. didn't know of this sequencing problem
13:57:51 rkukura, I assumed that a neutron net is associated with a physical net
13:58:15 rkukura, and a port is created from a neutron net
13:58:58 rkukura: it should be possible to support a single-segment network, right?
13:59:16 time is up. can we switch to a different channel?
13:59:28 Single-segment networks are certainly possible, and most common right now, but the ML2 model allows multi-segment networks.
13:59:29 I can, for ~10 mins more
14:00:17 Ok, rkukura, could you please send me an email on that? I'd like to know more details
14:00:31 me too
14:00:51 What happened to the plan for the admin to create flavors or host aggregates with SR-IOV connectivity to specific physical networks?
14:01:33 rkukura: Host aggregates can be done today
14:02:12 We should #endmeeting
14:02:43 if no one is using it, we can continue a bit more
14:03:27 OK, but if someone else is scheduled and ready they should speak up
14:04:01 checked, it's open
14:04:09 rkukura: with multi-segment + provider network, should the network be created the same way as with a regular plugin?
14:05:44 I think that we only care about the first segment, which is the one that the compute node immediately connects with.
14:06:05 irenab: I'm not sure what you mean.
The ML2 plugin supports both the providernet and multiprovidernet extensions. Either API extension can be used to create single-segment provider networks. Only the multiprovidernet extension can be used to create multi-segment provider networks. Creating normal tenant networks doesn't use either extension.
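(Context note: for the multi-segment case rkukura describes, the multiprovidernet extension takes a list of segment maps on the network. A rough sketch of such a create-network request body follows; the network name, physnet names and VLAN IDs are placeholders, and the exact CLI syntax was left for rkukura to track down later in the meeting.)

    POST /v2.0/networks
    {"network": {"name": "multi-seg-net",
                 "segments": [
                   {"provider:network_type": "vlan",
                    "provider:physical_network": "physnet1",
                    "provider:segmentation_id": 101},
                   {"provider:network_type": "vlan",
                    "provider:physical_network": "physnet2",
                    "provider:segmentation_id": 202}
                 ]}}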
14:06:49 baoli: Port binding can pick any of the segments. First is not special. In fact there is a bug right now where the order of the segments isn't even deterministic.
14:07:31 rkukura, is the multi-segment support documented anywhere?
14:08:53 so now I don't understand how vnic_type helps us ...
14:08:58 baoli: I'm not sure, but multiprovidernet should be covered in the API guide.
14:09:19 Here's what I thought was supposed to happen:
14:09:24 baoli: I think provider nets are documented in the neutron admin guide
14:09:40 but not with ML2, if it differs
14:10:00 rkukura, please go ahead and describe it. I'll take a look at it offline as well.
14:10:07 1) The admin creates a flavor or host aggregate with SR-IOV connectivity to a specific physical network
14:11:03 2) Probably the admin, or someone else, creates a virtual VLAN network on that physical network, and gives the tenant access (as owner or shared)
14:11:19 actually I guess 2) needs to be the admin
14:12:05 rkukura, in 2), a neutron net is associated with a physical net, right?
14:12:16 3) The tenant creates a neutron port on that network specifying --binding:vnic_type
14:13:05 4) The tenant boots a VM specifying the flavor or host aggregate with SR-IOV connectivity to that same network
14:14:04 In 4) the user specifies --nic with the port ID from 3
14:14:33 rkukura: I don't see a problem here. The neutron net is associated with the physical network and nova has admin access, right?
14:14:43 5) Nova takes care of reserving the PCI slot and PF when scheduling the VM, and stores these details in the binding:profile attribute of the port created in 3
14:14:59 rkukura, also in 1), we need to tag the pci devices with a specific physical network on each compute node.
14:15:41 6) Either in 5 or after, nova sets binding:host_id to specify the chosen host, which triggers ML2 port binding
14:17:02 rkukura, if by port binding you mean binding a port to a host, then we don't have a problem here. The association of a port with a physical net is determined when the port is created
14:17:55 7) ML2 port binding tries the registered mechanism drivers. Ones that don't support the binding:vnic_type refuse to bind. The SR-IOV mechanism drivers do try to bind, and look for a segment for which the PCI device has connectivity to the segment's physical network
14:18:50 that information is then used for scheduling a host. We don't want to do a trail-and-error approach in selecting a host to 'bind'
14:19:03 sorry, trial-and-error
14:19:07 baoli: Port binding is what determines which network segment is being used, what the binding:vif_type is, and (soon) what is in binding:vif_details.
14:19:36 but in 2), you said that a vlan net (a neutron net) is associated with a physical net, right?
14:19:52 rkukura: it's the first time in the flow you've mentioned a network segment. when was it created?
14:20:37 baoli: My understanding was that nova's scheduler would schedule on a host from the aggregate or flavor with the needed connectivity, so as long as the user gets this right, it's not trial and error.
14:20:53 rkukura, no.
that's not the plan for now
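(Context note: a sketch of what steps 2-4 of the flow rkukura lays out above might look like on the CLI, assuming a single-segment VLAN provider network and the binding:vnic_type attribute as proposed in the vnic_type bp. Names, IDs and the flavor are placeholders.)

    # 2) admin creates a VLAN provider network on physnet1 and shares it
    neutron net-create sriov-net --shared --provider:network_type vlan \
        --provider:physical_network physnet1 --provider:segmentation_id 101
    # 3) tenant creates a port on that network, requesting SR-IOV
    neutron port-create sriov-net --binding:vnic_type direct
    # 4) tenant boots the VM with that port, using a flavor/aggregate with SR-IOV connectivity
    nova boot --flavor sriov-flavor --image my-image --nic port-id=<port-uuid> vm1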
14:21:21 irenab: The network segment(s) is/are created in 2
14:22:02 rkukura, regardless of sriov, can you provide a workflow for multi-segment?
14:22:07 rkukura: 2) says the network is created
14:22:28 baoli: What is the current plan for making nova schedule on a host with an available SR-IOV VF with connectivity to the needed physical network, then?
14:22:46 rkukura, it's described in my wiki
14:23:05 I have to go; please do #endmeeting at the end to make the logs available.
14:23:05 1) you tag each pci device with the attached physical net
14:23:37 2) the compute node reports pci stats as "host:net-group:phynet1:count" to the controller/scheduler
14:23:55 baoli: Right now multi-segment networks can only be created by passing in a list of maps describing each segment to create-network or update-network, using the multiprovider extension.
14:24:40 rkukura, can you show how it's done in terms of neutron config and workflow (cli commands, etc)?
14:25:16 3) create a neutron net that is associated with a physical net
14:25:35 4) create a neutron port on this net with --vnic-type=direct
14:25:52 5) nova boot --nic port-id=
14:26:13 baoli: I can track down the exact syntax. They aren't too commonly used right now, but the capability is there. The bridging between the segments is not managed by neutron and must be set up administratively right now.
14:26:43 rkukura, are all the segments bridged together?
14:26:57 administratively?
14:27:28 baoli: Yes, that's what makes them the same virtual L2 network, right?
14:27:54 rkukura, so on each compute node, the starting segment may be different?
14:28:34 Longer term plans are for some of this to be automated - maybe creating a vlan segment shared within a specific rack, but a vxlan segment connecting the various top-of-rack switches
14:29:00 baoli: Not sure what you mean by "starting segment"?
14:29:45 rkukura, I think that we need to do some study on that. Thanks for bringing that up.
14:29:51 baoli: In your steps above, what forces nova to schedule the VM on a node with an available VF for an SR-IOV device with connectivity to the needed physical network?
14:30:27 we have a pci filter scheduler that has to be loaded into the nova scheduler
14:31:00 that pci filter scheduler works off the stats: host:net-group:phynet:count
14:31:02 speaking of nova... which bps are still active on the nova side for icehouse?
14:31:08 do we need to seek core reviewers?
14:31:17 #endmeeting
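(Context note: a sketch of the nova-side pieces baoli outlines in steps 1-2 and in the closing remarks about the pci filter scheduler: a whitelist entry tagging a device with its attached physical network, and a PCI-aware scheduler filter loaded into the nova scheduler. The extra whitelist attribute and the exact filter list were still under review for Icehouse at this point, so treat the syntax as illustrative; vendor/product IDs are placeholders.)

    # nova.conf on each compute node: tag the SR-IOV devices with their physical net
    pci_passthrough_whitelist = {"vendor_id": "8086", "product_id": "10ca", "physical_network": "physnet1"}

    # nova.conf on the scheduler node: load the PCI filter alongside the usual filters
    scheduler_available_filters = nova.scheduler.filters.all_filters
    scheduler_default_filters = RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,PciPassthroughFilter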