13:04:42 <baoli> #startmeeting PCI passthrough
13:04:43 <openstack> Meeting started Wed Jan 8 13:04:42 2014 UTC and is due to finish in 60 minutes. The chair is baoli. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:04:44 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:04:46 <openstack> The meeting name has been set to 'pci_passthrough'
13:05:01 <johnthetubaguy> hi, do we have a rough agenda for today?
13:05:13 <baoli> Hi John
13:05:27 <baoli> I posted this on the wiki yesterday: https://wiki.openstack.org/wiki/Meetings/Passthrough
13:06:11 <irenab__> baoli: how do you suggest we proceed?
13:06:18 <baoli> Yesterday we were discussing predefined PCI groups. Yongli doesn't seem to like the idea.
13:06:37 <baoli> Let's continue from where we left off yesterday.
13:06:41 <johnthetubaguy> baoli: got it, thanks
13:07:27 <ttx> baoli: will the meeting change back to Tuesdays once agreement is reached? I haven't updated the meeting calendar given it's very temporary...
13:07:46 <baoli> ttx, yes.
13:07:58 <ttx> ok, let's skip the calendar update then :)
13:08:03 <baoli> we will be doing daily meetings this week,
13:08:19 <baoli> except for Friday/Saturday
13:08:33 <johnthetubaguy> baoli: I can't promise to make all of those, but let's see how it goes
13:09:04 <johnthetubaguy> Have we agreed the list of use cases we want to support yet?
13:09:29 <johnthetubaguy> like a short-term list (for Icehouse) and longer-term aims too?
13:09:58 <baoli> John, we didn't go through those cases yet; we got stuck on the first part
13:10:17 <johnthetubaguy> I thought use cases would be the first part, which bit are we stuck on?
13:10:18 <baoli> But I guess we should go through them first?
13:10:34 <irenab__> baoli: there is a list of use cases you put on the wiki
13:11:03 <baoli> Shall we start with use cases today, then?
13:11:03 <irenab__> I just miss one more case there: mixed VIFs, both SRIOV and virtual NICs for the same VM
13:11:19 <baoli> #topic use cases
13:11:22 <heyongli> I think John means the use cases in the nova blueprint.
13:11:33 <baoli> irenab, yes, I should put that in
13:11:53 <johnthetubaguy> heyongli: I think we probably want both, but let's start with this wiki first
13:12:02 <heyongli> sure
13:12:29 <baoli> #topic SRIOV-based cloud
13:12:37 <baoli> Any thoughts on this?
13:12:51 <johnthetubaguy> Can we start with GPU passthrough?
13:13:00 <johnthetubaguy> just to keep things simple
13:13:04 <baoli> Ok
13:13:27 <johnthetubaguy> how do we want that to look?
13:13:39 <johnthetubaguy> nova boot --flavor bigGPU
13:14:04 <johnthetubaguy> nova boot --flavor smallGPU_4GBRAM_2_vCPUs
13:14:06 <baoli> John, our discussion so far is based on PCI groups
13:14:31 <heyongli> a group is almost identical to a pci-flavor
13:14:39 <johnthetubaguy> I think there will be more agreement if we work from what the user wants, then look at how to deliver that
13:15:01 <johnthetubaguy> i.e. agree on the problem we are solving, then look at how to implement it
13:15:08 <johnthetubaguy> then apply that to networking
13:15:26 <irenab__> johnthetubaguy: I think we mostly talked about PCI for networking, and this is quite different from the GPU case
13:15:31 <baoli> #agreed
13:16:02 <johnthetubaguy> I agree it's different, but we need the object model to work for both, right?
13:16:12 <baoli> Well, we have been working on this for a while, and we certainly think about it from the user's point of view,
13:16:34 <baoli> also taking into account the existing APIs we have in nova/neutron
13:16:59 <irenab__> johnthetubaguy: not sure it will be the same from the request point of view
13:17:26 <irenab__> I have a strong objection to expressing the request for SRIOV NICs in the flavor,
13:17:46 <baoli> John, in any case, a PCI group/PCI flavor can be used in the nova server flavor
13:17:49 <irenab__> which is fine for the device passthrough case
13:17:58 <johnthetubaguy> yes, and I think I agree, but I would just like to see both SRIOV and GPU side by side
13:18:30 <johnthetubaguy> if we agree how to set up GPU, for example, it should be very similar for SRIOV; agreed, the user bit is probably different
13:18:42 <irenab__> I think the GPU case should be mostly as today, with an extra_spec for the PCI device
13:19:09 <irenab__> the proposal is to change the terminology from pci_alias to pci_group
13:20:02 <heyongli> irena: no, I think the alias is different; we can drop it, but it is not the same thing as a group
13:20:06 <johnthetubaguy> OK, can we recap what we have in the code today, if only for my benefit?
13:20:22 <johnthetubaguy> then agree how GPU looks in the new(er) world?
13:20:52 <baoli> Yongli, can you go ahead and describe that for John?
13:20:55 <heyongli> what we have now is: alias: defines how you choose the device
13:21:11 <heyongli> server flavor: uses the alias to request your device
13:21:37 <heyongli> whitelist: selects devices from a host, picking which can be assigned to VMs
13:22:00 <baoli> Just want to add that the extra_specs/whitelist are based on the PCI device's vendor_id and product_id
13:22:22 <johnthetubaguy> and how do the whitelist and alias relate again?
13:22:41 <baoli> by vendor_id and product_id
13:22:55 <heyongli> the alias chooses devices from the available pool
13:23:10 <johnthetubaguy> how does the device id come into things? only via the whitelist?
13:23:13 <heyongli> the whitelist chooses devices from all of those on a specific host
13:24:13 <johnthetubaguy> Ok, so if I want a flavor that says pick either GPUv3 or GPUv4, can I do that?
13:24:36 <heyongli> the alias supports this
13:24:57 <heyongli> define an alias that says GPUv3 or GPUv4
13:25:05 <johnthetubaguy> OK, so an alias is a list of possible vendor_ids and product_ids?
13:25:12 <heyongli> yeah
13:25:16 <johnthetubaguy> does it include device ids?
13:25:29 <heyongli> what id do you mean?
13:25:43 <johnthetubaguy> the PCI device id, where does that come into the model?
13:26:11 <heyongli> no, the alias does not include the device id (the DB primary key)
13:26:50 <johnthetubaguy> so where does the device id come from? it gets selected out of the whitelist on the device when attaching it to the VM?
13:26:51 <baoli> John, by id, do you mean the PCI slot?
13:26:59 <johnthetubaguy> possibly
13:27:21 <heyongli> that information is stored in the pci device model
13:27:47 <johnthetubaguy> I think I mean the address, sorry
13:28:03 <heyongli> the alias should not include the address
13:28:05 <baoli> domain:bus:slot:func
13:28:16 <johnthetubaguy> right, that's the thing
13:28:26 <heyongli> the whitelist does not either, but I already added it in the current patches I released
13:28:28 <johnthetubaguy> is that in the whitelist?
13:28:40 <johnthetubaguy> ah, OK
13:28:57 <heyongli> and it supports * and [1-5]
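[Editor's note: for readers following along, the existing mechanism heyongli recaps above is driven by nova configuration plus flavor extra specs. A minimal sketch of that era's syntax, with the values purely illustrative:

    # on each compute node: which devices may be assigned to guests
    pci_passthrough_whitelist = {"vendor_id": "8086", "product_id": "10fb"}
    # on the controller: a name that flavors can use to request matching devices
    pci_alias = {"vendor_id": "8086", "product_id": "10fb", "name": "bigGPU"}
    # flavor extra spec asking for one device that matches the alias
    nova flavor-key m1.gpu set "pci_passthrough:alias"="bigGPU:1"
]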
13:29:01 <johnthetubaguy> so the big step between GPU and SRIOV is grouping different addresses?
13:29:33 <baoli> yes, they belong to different groups
13:29:52 <johnthetubaguy> so that should go in the alias now? for SRIOV?
13:30:05 <heyongli> the alias doesn't need the address
13:30:16 <irenab__> Are we talking about SRIOV for networking or in general?
13:30:18 <heyongli> adding the group to the alias is sufficient
13:30:37 <johnthetubaguy> OK, so we are adding an extra thing called a group?
13:30:43 <heyongli> this is also in the patches I released
13:30:46 <johnthetubaguy> that deals with grouping addresses?
13:30:46 <heyongli> yeah
13:30:59 <johnthetubaguy> why is this not just part of the alias? that is just a grouping, right?
13:31:12 <heyongli> yeah, just in the group
13:31:16 <heyongli> the alias is global
13:31:23 <johnthetubaguy> (sorry, lots of dumb questions, but I just don't think I get where you are coming from now)
13:31:30 <johnthetubaguy> so the group is going to be local to each server?
13:31:39 <heyongli> you should not say "I want the device whose BDF is a:b:c", that is meaningless
13:31:59 <baoli> A PCI group is global
13:32:02 <heyongli> kind of local, like the pci vendor
13:32:20 <heyongli> if we keep the alias as it is, this is local
13:32:29 <heyongli> if we kill the alias, this is going to be global
13:32:35 <baoli> Yongli, the alias is defined on the controller node
13:32:41 <heyongli> yeah
13:32:48 <johnthetubaguy> hmm, but it's a local thing that gets referenced in a global concept (flavor)
13:32:55 <johnthetubaguy> I think this is where it gets very confusing
13:33:11 <heyongli> kind of confusing; there might be a better solution
13:33:34 <johnthetubaguy> So, from my outsider view, this seems:
13:33:46 <heyongli> but the group is very like the vendor id
13:34:15 <johnthetubaguy> (a) roughly complete, but (b) a bit confusing, and (c) re-inventing groupings we already have in other bits of nova
13:34:17 <heyongli> we can say the vendor id is global, because it's allocated by the PCI world
13:34:40 <johnthetubaguy> I think we can agree on this though...
13:35:08 <johnthetubaguy> a PCI device has: vendor_id, product_id, address
13:35:27 <johnthetubaguy> and we want to group them
13:35:44 <baoli> well, vendor_id is a hardware-specific thing
13:35:50 <johnthetubaguy> types of GPU (don't care about address), types of VIF (do care about specific groups of addresses)
13:36:25 <johnthetubaguy> by default we should not expose any of these devices, unless we configure nova to allow such a device on a particular host to be exposed
13:36:26 <heyongli> VIFs should not care about the address, I think; they just need partitioning by address, am I right?
13:36:47 <johnthetubaguy> well, they are grouped by an address range, right?
13:36:59 <heyongli> yeah, I think so
13:37:14 <irenab__> john: it may be the PF
13:37:24 <irenab__> the parent of all Virtual Functions
13:37:44 <johnthetubaguy> ah, OK, so we have virtual functions from a specific address too?
13:37:55 <johnthetubaguy> or is the function just part of the address?
13:38:18 <baoli> John, in SRIOV, we have PF and VF
13:38:40 <baoli> PF: physical function, VF: virtual function. The function is part of the address
13:38:56 <johnthetubaguy> that's cool, just checking we are still grouping by address
13:38:58 <irenab__> a Virtual Function is a PCI device of an SRIOV NIC that has a parent Physical Function representing the SRIOV NIC itself
13:39:21 <johnthetubaguy> that's all cool, just trying to work out what we are grouping
13:39:54 <baoli> A PCI group is a collection of PCI devices that share the same functions or belong to the same subsystem in a cloud.
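[Editor's note: the pci_group concept baoli defines above is a proposal under discussion, not merged code. The intent, as described here and in the previous meetings, is that the admin names a group once and assigns devices to it on each host by address (the whitelist address matching with "*" and "[1-5]" that heyongli mentions is in his in-review patches), and flavors or --nic requests then reference the group name globally. A purely hypothetical whitelist-style entry, just to illustrate the shape of the idea:

    # hypothetical syntax: all functions of the NIC at bus 0a on this host belong to group "sriov_fabric1"
    pci_passthrough_whitelist = {"address": "0000:0a:00.*", "pci_group": "sriov_fabric1"}
]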
13:40:27 <irenab__> actually, what we need for the basic networking case is grouping by network connectivity
13:40:57 <baoli> Irenab, that's what I mean by subsystem
13:41:02 <johnthetubaguy> OK, so we need some way to link the address to the neutron network-uuid?
13:41:42 <baoli> in the case of SRIOV, new --nic options will achieve that
13:42:15 <irenab__> john: yes, but we need to make sure that the VM is scheduled on an appropriate host
13:42:38 <johnthetubaguy> well, I am not sure it always can; the user doesn't know which host that request will land on, right? it just hints at some mappings
13:42:51 <heyongli> +1
13:42:54 <johnthetubaguy> anyways, I think we are moving forward here
13:43:24 <baoli> a PCI group is a logical abstraction
13:43:53 <irenab__> john: that's the idea. Based on the VM boot request, it should be scheduled on a host that is capable of providing SRIOV NICs and connecting to the correct physical network
13:43:59 <baoli> it doesn't care where it lands, as long as it's using a device in a particular PCI group
13:44:25 <heyongli> agree
13:44:45 <johnthetubaguy> right, so what is the user requesting here?
13:45:05 <johnthetubaguy> the neutron network, and the type of connection?
13:45:27 <johnthetubaguy> so passthrough, or virtual, and also which type of passthrough, 1Gb or 10Gb, etc?
13:45:29 <baoli> a neutron network with a NIC that is in a particular PCI group
13:45:39 <irenab__> on the wiki: nova boot --flavor m1.large --image <image_id> --nic net-id=<net-id>,vnic-type=macvtap,pci-group=<group-name> <vm-name>
13:45:42 <johnthetubaguy> I am trying to ignore our terms here, and think of the user
13:46:02 <baoli> John, 1Gb or 10Gb is a QoS thing
13:46:34 <baoli> It's not related to what we are discussing here. But conceptually, you can have a PCI group with 1Gb NICs
13:46:36 <johnthetubaguy> depends, it could be different cards, right?
13:46:54 <irenab__> john: on --nic there is what we think is needed
13:47:28 <heyongli> to deal with the 1G/10G thing, adding the pci device_id to the alias is a good solution,
13:47:37 <heyongli> I think
13:48:05 <baoli> Again, you can use PCI groups to group NICs that are on different kinds of cards
13:48:16 <heyongli> that also works
13:48:33 <johnthetubaguy> OK
13:48:47 <johnthetubaguy> I have written up what I think we said here:
13:48:48 <johnthetubaguy> https://wiki.openstack.org/wiki/Meetings/Passthrough#Definitions
13:48:59 <johnthetubaguy> Do we all agree with those statements?
13:49:40 <johnthetubaguy> sorry, I missed a bit, please refresh
13:49:48 <johnthetubaguy> extra bullet on SRIOV
13:50:23 <heyongli> I posted my +1
13:50:29 <irenab__> john: I think the last SRIOV bullet is not accurate.
13:50:54 <johnthetubaguy> irenab__: yeah, I don't like it, what is a better statement?
13:51:15 <irenab__> It's not specific to a neutron network, it's specific to a provider_network that many neutron networks can be defined for
13:51:39 <baoli> John, can we go through the original post and see if they make sense?
13:51:43 <johnthetubaguy> OK, so it could be specific to a group of neutron networks?
13:51:52 <irenab__> john: yes
13:52:06 <johnthetubaguy> irenab__: awesome, got you, thanks
13:53:00 <johnthetubaguy> irenab__: can you check my update please, is that better?
13:53:32 <johnthetubaguy> baoli: we can do that next, I just wanted to agree on some basics of what we have, and what we need
13:54:04 <baoli> ok
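[Editor's note: irenab__'s point above is that an SRIOV PCI group maps to a provider (physical) network rather than to a single neutron network; many neutron networks can be defined on top of that provider network. For context, a VLAN network on a provider network was typically created like this at the time, with the names illustrative:

    neutron net-create sriov-net1 --provider:network_type vlan --provider:physical_network physnet1 --provider:segmentation_id 100
]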
13:54:26 <irenab__> john: it's OK. Not sure what you mean by specific configuration
13:54:45 <johnthetubaguy> I was meaning neutron might specify settings like a VLAN id
13:55:19 <irenab__> john: correct
13:55:30 <johnthetubaguy> cool, thanks, let me add an e.g.
13:55:55 <johnthetubaguy> so I guess in basic cases we pass the full NIC through
13:56:08 <johnthetubaguy> and it's straight onto the provider network
13:56:11 <irenab__> each device can be configured differently, but the common part is that it has the same network connectivity (to the same fabric)
13:56:22 <johnthetubaguy> but if we have virtual devices, we can do some fun stuff
13:56:25 <johnthetubaguy> right
13:57:10 <irenab__> john: with full NIC passthrough, I think there is nothing for neutron to do
13:57:38 <johnthetubaguy> irenab__: yeah, it probably gives the guest IP addresses, and things, but yes, there is little connection info I guess
13:58:30 <irenab__> In full passthrough it can only be configured from inside the VM
13:58:39 <baoli> the server flavor can still be used for generic PCI passthrough
13:58:50 <irenab__> at least for the cases I need, we are talking only about SRIOV VFs
13:58:51 <johnthetubaguy> I don't get why that is; neutron DHCP can still be set up, if it's given the MAC address?
13:59:28 <irenab__> john: agree. I mean that you need the VM to actually do something to get the config, like send a DHCP request
13:59:30 <heyongli> they might mean passthrough of a regular PCI device
13:59:51 <johnthetubaguy> irenab__: ah, yep, sorry, that's true
14:00:09 <johnthetubaguy> cool, so I think we can agree the GPU passthrough case then...
14:00:30 <johnthetubaguy> the user requests a flavor; its extra specs *imply* which possible PCI devices can be connected
14:00:33 <heyongli> still based on the alias, right?
14:00:51 <irenab__> john: would you be available tomorrow for this meeting to dig into SRIOV net details?
14:00:55 <johnthetubaguy> I am leaving that out for now... we can add that later
14:01:06 <johnthetubaguy> what time is tomorrow?
14:01:15 <baoli> same time
14:01:23 <johnthetubaguy> 13:00 UTC?
14:01:27 <johnthetubaguy> that should be OK
14:01:31 <baoli> Yes
14:01:40 <irenab__> great. thanks
14:01:45 <baoli> Do we want to end this meeting now?
14:01:57 <johnthetubaguy> we might have to soon
14:02:02 <johnthetubaguy> I can do another 10 mins
14:02:14 <irenab__> I can too
14:02:20 <baoli> cool
14:02:23 <heyongli> fine
14:03:07 <irenab__> Do we want to start on the SRIOV NIC case?
14:03:32 <johnthetubaguy> well, I'm just thinking about a statement like
14:03:33 <johnthetubaguy> the user requests a flavor; its extra specs *imply* which possible PCI devices can be connected
14:03:42 <johnthetubaguy> as in, that's the GPU case
14:03:54 <johnthetubaguy> what do we say for the SRIOV case?
14:04:13 <irenab__> I think a flavor extra spec is not a good solution for the networking case
14:04:37 <baoli> a VM needs NICs from one or more PCI groups
14:04:51 <johnthetubaguy> the user requests neutron NICs, on specific neutron networks, but connected in a specific way (i.e. high-speed SRIOV vs virtual)
14:05:01 <johnthetubaguy> does that make sense?
14:05:15 <irenab__> and a VM can be attached to different virtual networks
14:05:39 <irenab__> and an interface can be attached/detached later on
14:05:51 <baoli> I should say: a VM needs NICs on some networks from some PCI groups
14:05:55 <johnthetubaguy> some of the NICs may be virtual, some may be passthrough, and some might be a different type of passthrough
14:06:03 <baoli> yes
14:06:08 <irenab__> john: correct
14:06:15 <johnthetubaguy> I am trying to exclude any of the admin terms in the user description
14:06:26 <johnthetubaguy> so we have a clear vision we can agree on, that's all
14:06:49 <baoli> #agreed
14:06:56 <irenab__> john: vision, yes; implementation details, no
14:07:00 <johnthetubaguy> OK, I updated the wiki page
14:07:08 <johnthetubaguy> https://wiki.openstack.org/wiki/Meetings/Passthrough#The_user_view_of_requesting_things
14:07:14 <johnthetubaguy> do we agree on that?
14:07:47 <irenab__> john: yes
14:08:14 <johnthetubaguy> sorry to take up the whole meeting on this, but really happy to get a set of aims we all agree on now
14:08:14 <baoli> #agreed
14:08:19 <johnthetubaguy> sweet
14:08:30 <heyongli> +1
14:08:35 <johnthetubaguy> so I think the question now is how we get the admin to set this up and configure it,
14:08:40 <johnthetubaguy> and what we call everything
14:09:01 <irenab__> agree
14:09:06 <johnthetubaguy> that sounds like something for tomorrow, but maybe spend 5 mins discussing one point...
14:09:06 <baoli> #agreed
14:09:15 <irenab__> ok
14:09:20 <johnthetubaguy> at the summit we raised an issue with the current config
14:09:44 <irenab__> john: can you recap?
14:10:10 <johnthetubaguy> basically we are trying to keep more of the config API-driven, to stop the need for reloading nova.conf, etc., and for general ease of configuration
14:10:18 <johnthetubaguy> now clearly not everything should be an API
14:11:01 <johnthetubaguy> also, in other sessions, we have pushed back on ideas that introduce new groups that are already covered by existing generic groupings (i.e. use host aggregates, don't just add a new grouping)
14:11:06 <baoli> John, we have discussed configuration versus API for the past couple of meetings. Would you be able to look at the logs? I can send you the logs
14:11:31 <johnthetubaguy> yeah, if you can mail me the logs that would be awesome, or are they on the usual web address?
14:11:43 <johnthetubaguy> did we have nova-core review any outcomes of that yet?
14:11:47 <irenab__> john: we are trying to define auto-discovery of PCI devices in order to minimize the items needed in the config
14:11:55 <johnthetubaguy> right, that sounds good
14:12:11 <johnthetubaguy> I should read up on those logs
14:12:16 <baoli> A couple of them are in the daily logs, but not in the meeting logs
14:12:29 <johnthetubaguy> ah...
14:12:49 <baoli> I need to find a way to link them back here. I'll try to do that
14:12:59 <irenab__> baoli: I think it's better if you send them, since there was a meeting name change and one meeting that wasn't started...
14:13:12 <johnthetubaguy> cool, we should probably end this meeting, then add those pointers to the wiki page?
14:13:18 <baoli> I'll send them again.
14:13:28 <baoli> Sure
14:13:32 <johnthetubaguy> cool, could we just add them to that meeting wiki page?
14:13:37 <johnthetubaguy> cool
14:13:49 <irenab__> thanks, I think the meeting was productive. see you tomorrow
14:13:55 <baoli> I'll do both. See you guys tomorrow
14:13:55 <heyongli> thanks, baoli
14:14:00 <johnthetubaguy> so to be upfront, I think we can do the whole grouping with host aggregates and an API to list all PCI devices
14:14:09 <johnthetubaguy> but yep, let's chat tomorrow!
14:14:21 <baoli> #endmeeting
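[Editor's note: the host-aggregate alternative johnthetubaguy mentions would reuse nova's existing grouping of hosts rather than adding a new PCI-specific grouping. A rough sketch of the admin-side aggregate commands of that era; the aggregate name and metadata key are illustrative, and how PCI devices would actually be tied to an aggregate was not settled in this meeting:

    nova aggregate-create gpu-hosts
    nova aggregate-add-host gpu-hosts compute-01
    nova aggregate-set-metadata gpu-hosts pci_devices=GPUv3
]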