14:11:00 #startmeeting openstack-cyborg
14:11:00 Meeting started Wed Aug 1 14:11:00 2018 UTC and is due to finish in 60 minutes. The chair is shaohe_feng. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:11:02 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:11:04 The meeting name has been set to 'openstack_cyborg'
14:11:22 #topic Roll Call
14:11:32 #info shaohe_feng
14:11:37 #info Li_Liu
14:11:42 Sundar, morning
14:11:43 #info Helloway
14:12:25 Hi Shaohe
14:12:31 #info Sundar
14:13:30 Li_Liu, what's the agenda of today's meeting?
14:13:34 #topic sub team status
14:14:06 I will go first for the documentation team
14:14:15 good.
14:14:29 we are missing so many docs
14:14:54 We had a kick-off meeting earlier today to discuss what needs to be done for Rocky
14:15:25 #link https://etherpad.openstack.org/p/rocky-cyborg-doc
14:15:51 the meeting minutes can be found here
14:16:39 in case anyone wants to add anything to the documentation list, feel free to edit the etherpad
14:16:45 that's it from me
14:17:10 Sundar, are you here?
14:17:22 shaohe_feng, you wanna do a quick summary on the driver sub team meeting?
14:17:38 Ok.
14:17:48 #info xinran__
14:17:51 Hi all
14:18:31 xinran__, evening
14:18:40 dolpher, evening
14:18:42 Hi xinran
14:18:53 hi
14:19:00 we should refactor the drivers structure.
14:19:23 Sundar and Coco can help with this.
14:19:33 ok
14:19:34 Shaohe: can you expand on this? ^
14:19:46 Sundar, yes.
14:20:25 Sundar, we had better unify the report info from different drivers
14:20:57 Do you mean the structures returned by GPU/FPGA drivers etc.?
14:21:23 ...
14:21:30 what was that
14:21:36 what is sscout15?
14:22:03 I think he's just spamming...
14:22:30 ads?
14:23:25 Sundar, for FPGA for example, we should add function_uuid to the extra attribute
14:23:25 my god... can we ban them?
14:23:41 There must be some way to configure IRC servers for spam protection?
14:24:11 Yes, someone in charge can go to #openstack-infra and request that +r be enabled for this channel.
14:24:27 That means only registered nicknames will be allowed to join.
14:24:36 ah, ok
14:24:50 Note that that's a double-edged sword, so use with caution.
14:25:06 I wouldn't think you expect a lot of casual users in this channel.
14:25:12 so it's probably appropriate here.
14:25:13 shaohe_feng, are you gonna extend the DB fields? or just use the attribute_list?
14:25:29 efried, sure, will look into it
14:25:41 Li_Liu: See http://lists.openstack.org/pipermail/openstack-dev/2018-August/132692.html
14:25:44 Li_Liu, no, we can leverage your attribute tables in the DB.
14:26:02 Sundar, Li_Liu, I added the function_uuid to the attribute.
14:26:06 Shaohe: we already have function ID: #link https://git.openstack.org/cgit/openstack/cyborg/tree/doc/specs/rocky/cyborg-agent-driver-api.rst?h=refs/changes/49/561849/7#n99
14:26:21 efried, thanks a lot
14:26:30 Thanks, efried
14:26:31 shaohe_feng, that's great :)
14:27:00 Shaohe: are you referring to DB fields?
14:27:43 So when a user requests an accelerator with the function_uuid, Cyborg filters on the attributes of the deployable.
14:28:08 Sundar, I did it in my POC.
14:29:06 Sundar, but currently there is no attribute in the data the driver returns
14:29:18 so we need to improve it.
14:30:50 Li_Liu's attribute DB is important for matching the accelerators we expect.
14:31:45 The spec calls for function/region IDs etc. but it has not been implemented yet in the upstream code. You are saying that you have added function ID to the POC. Great.
14:32:10 Sundar, yes. You can have a look.
14:34:02 Sundar, if we use your VAN.
14:34:37 Have you considered how to report traits/resource class info?
14:35:53 and several interfaces, such as "def traits_report()" and "def resource_class_update"
14:35:59 Overall, we need to take a look at the whole flow starting with discovery/enumeration. Hope you all are following the conversation between Eric and me in openstack-dev, IRC, etc.
14:36:12 Shaohe, the driver/agent API spec talks about it
14:36:29 Great.
14:36:43 The driver returns a structure (which should become an OVO), and the agent can build RPs and traits out of that
14:37:24 what fields are in the OVO?
14:37:26 what is OVO
14:37:37 I thought about having the driver return traits directly and have the agent validate that, but that would probably make it less flexible if we need to add new traits or modify existing ones
14:37:52 OVO == oslo versioned objects
14:38:34 Sundar, your spec defines the different trait types well.
14:38:44 In simple terms, an OVO is a Python dictionary with a well-defined schema, a version and a namespace
14:38:54 I follow it in the POC
14:39:07 but it does not define how to report them.
14:39:39 #lonk https://git.openstack.org/cgit/openstack/cyborg/tree/doc/specs/rocky/cyborg-agent-driver-api.rst?h=refs/changes/49/561849/7#n125
14:40:01 Shaohe: Line 125 talks about function IDs and how the agent handles them
14:40:42 Line 99 says what the field should look like
14:40:51 yes, I did return traits in this format.
14:41:46 Ah, typo: #link https://git.openstack.org/cgit/openstack/cyborg/tree/doc/specs/rocky/cyborg-agent-driver-api.rst?h=refs/changes/49/561849/7#n125
14:42:08 But I find it is not good for the agent to report the traits.
14:42:32 For example, every device may have different provider layers.
14:43:03 Yes, the spec does not talk about how the agent reports the info to Cyborg conductor or placement, because it is a driver/agent API spec
14:44:14 "different provider layers" -- can you expand on that?
14:46:29 I guess it refers to different layers of PF/VFs?
14:48:08 when we list the traits, we find there is a ``CUSTOM___FUNCTION_``
14:48:50 Hi all. Sorry, I'm late...
14:49:13 such as CUSTOM_FPGA_INTEL_FUNCTION_1234, CUSTOM_FPGA_INTEL_FUNCTION_5678
14:49:41 Sundar, how do I know which trait is the one I expect?
14:50:08 I use a temporary solution.
14:50:33 I add a uuid/name map in a YAML config.
14:51:08 such as 1234:crypto, 5678:ipsec
14:51:26 1234:crypto_v1.2
14:51:56 it supports dynamic updates.
14:52:27 I also report CUSTOM_FPGA_INTEL_FUNCTION_CRYPTO_V1_2 to placement.
14:52:38 shaohe_feng, you mean when the agent tries to update the traits, it has to know which ones to update?
14:52:46 "how do I know which trait is the one I expect" -- If the driver reports a function attribute (it is an optional field), the agent should build a function trait. Same for region type etc.
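The trait naming discussed above (an ID-based trait such as CUSTOM_FPGA_INTEL_FUNCTION_1234, plus a name-based one derived from the uuid/name map) can be sketched roughly as follows. This is only an illustration, not code from the Cyborg tree or the POC: `normalize`, `function_traits` and `NAME_MAP` are hypothetical names, and the map is shown as a plain dict rather than being loaded from YAML. The one external fact assumed is that placement only accepts custom traits matching `CUSTOM_[A-Z0-9_]+` with at most 255 characters.

```python
import re

# Placement accepts custom traits of the form CUSTOM_[A-Z0-9_]+ (max 255 chars).
_TRAIT_RE = re.compile(r"^CUSTOM_[A-Z0-9_]+$")


def normalize(token):
    """Upper-case a token and collapse anything outside A-Z/0-9 into '_'."""
    return re.sub(r"[^A-Z0-9]+", "_", token.upper()).strip("_")


def function_traits(vendor, function_id, name_map=None):
    """Return the trait names for one FPGA function (hypothetical helper).

    Always yields the ID-based trait; adds a name-based trait when the
    operator-maintained map has an entry for the function ID.
    """
    prefix = "CUSTOM_FPGA_" + normalize(vendor) + "_FUNCTION_"
    traits = [prefix + normalize(function_id)]
    name = (name_map or {}).get(function_id)
    if name:
        traits.append(prefix + normalize(name))
    for trait in traits:
        if len(trait) > 255 or not _TRAIT_RE.match(trait):
            raise ValueError("invalid placement trait: %r" % trait)
    return traits


# Stand-in for the uuid/name map kept in the YAML config mentioned above.
NAME_MAP = {"1234": "crypto_v1.2", "5678": "ipsec"}

if __name__ == "__main__":
    print(function_traits("intel", "1234", NAME_MAP))
```

With this shape, a function that has no map entry still gets its ID-based trait, so the name-based trait stays an optional convenience on top of the UUID.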
14:54:36 so a user can list all traits, e.g. with this command:
14:54:37 curl -g -i http://localhost/placement/traits \
14:54:37 -H "X-Auth-Token: $(openstack token issue -f value -c id)" \
14:54:37 -H "Content-Type: application/json" -H "Accept: application/json" \
14:54:37 -H "OpenStack-API-version: placement 1.26" |grep -oh '\w*CUSTOM*\w*'
14:54:50 If you are asking how would we know whether a function trait has already been reported to Placement, that is a bit of a design/implementation detail. If you think the spec should address such aspects too, sure, we can add that
14:55:12 He can get both CUSTOM_FPGA_INTEL_FUNCTION_CRYPTO_V1_2 and CUSTOM_FPGA_INTEL_FUNCTION_1234
14:55:26 then they can use any one of them.
14:56:04 Sundar, I maintain this map in a config file.
14:56:17 not sure whether we can maintain it in the DB.
14:56:25 In Dublin, for Nova, we said the user ought to be allowed to add and remove traits on providers as long as the traits aren't "owned" by Nova itself. If you wish to follow the same policy for Cyborg-owned providers, the above becomes relevant.
14:56:52 Li_Liu, Sundar, what's your suggestion?
14:57:15 efried, can we specify Cyborg-owned providers?
14:57:36 that would be whatever ones cyborg is in charge of creating.
14:57:42 UUID/name maps etc. need further discussion and should be codified in specs. Who adds it and maintains it? I intentionally stayed away from such things, even if they are good to have, till the basics stabilize
14:58:27 Li_Liu, every Cyborg driver will maintain a sub provider by itself.
14:58:33 I should note that it will be important for platform-specific drivers to have the ability to specify traits, presumably somewhere in the discovery/enumeration phase.
14:58:35 It is good to have names instead of UUIDs - but that raises lots of issues about who defines it etc
14:59:23 efried: Apart from the traits already defined in Cyborg specs, are you saying additional ad hoc traits would be needed?
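The ownership policy mentioned above (users may add and remove traits on a provider as long as those traits are not owned by the service) could look roughly like this when the agent refreshes a Cyborg-owned provider. This is a sketch under a loud assumption: nothing in the discussion says how "owned" traits are identified, so the `CUSTOM_FPGA_` prefix test and the names `OWNED_PREFIX` and `reconcile_traits` are purely hypothetical.

```python
# Assumption: traits under this prefix are "owned" by Cyborg; everything
# else on the provider was added by a user/operator and must be preserved.
OWNED_PREFIX = "CUSTOM_FPGA_"


def reconcile_traits(current, reported, owned_prefix=OWNED_PREFIX):
    """Return the sorted trait list to write back to a resource provider.

    current  -- traits presently on the provider (as read from placement)
    reported -- traits derived from what the driver just discovered
    """
    # Traits that are not ours are kept exactly as found.
    foreign = {t for t in current if not t.startswith(owned_prefix)}
    # Refuse to let a driver report traits outside the owned namespace.
    stray = {t for t in reported if not t.startswith(owned_prefix)}
    if stray:
        raise ValueError("driver reported traits outside %s: %s"
                         % (owned_prefix, sorted(stray)))
    # Owned traits are replaced wholesale by the fresh report.
    return sorted(foreign | set(reported))


if __name__ == "__main__":
    on_provider = ["CUSTOM_FPGA_INTEL_FUNCTION_1234", "CUSTOM_RACK_7"]
    discovered = ["CUSTOM_FPGA_INTEL_FUNCTION_5678"]
    print(reconcile_traits(on_provider, discovered))
```

The design choice here is that owned traits are fully recomputed from the latest report, while anything outside the owned namespace passes through untouched; a real implementation would also need to handle concurrent updates to the provider.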
14:59:51 my suggestion is these can be stored in the attribute table. It basically contains name, value, and uuid. Then map them to placement's traits
15:01:37 Sundar: Yes. Have a look at
15:01:37 #link Spec for nova-powervm device passthrough https://review.openstack.org/#/c/579359/10/doc/source/specs/rocky/device-passthrough.rst
15:01:37 specifically around L220 where we enumerate traits we will generate based on device info; and L163 where we define how the user will be able to specify "ad hoc" traits via config file.
15:02:12 This is how we're doing things while we wait for cyborg to become a thing :) And we'll need the same capabilities when that time comes.
15:03:07 Sundar, yes, so do you think there is a good way to maintain the UUID/name map?
15:04:49 撒泼,
15:04:55 spam?
15:04:59 yes
15:05:50 efried: We wrote Cyborg specs on traits etc. and got them approved/merged after review. It would have been good to surface these requirements during the review period.
15:06:21 True story.
15:06:36 I should have been keeping better pace with those.
15:06:37 * efried hangs head in shame
15:07:01 My only excuse is having a lot on my plate.
15:07:20 Don't mean to pick on you -- we probably have a systemic issue here. My concern is that, if we have an ad hoc list of traits defined by each driver, it can become unmanageable. It may be better to have a standardized list
15:07:48 efried, will you help to work on the nova side for nova/cyborg interaction?
15:08:27 These traits can be vendor specific. When vendors add their drivers, they should comply with the standards
15:08:29 Sundar: If "standardized" includes platform-specific traits specified by the platform-specific code in the driver, we can probably live with that.
15:08:56 I agree with Li_Liu’s suggestion, and I have another question: do other devices like GPUs need this kind of mapping?
15:09:09 For example, the traits for DRC name & index are only going to exist on Power.
15:09:16 shaohe_feng: Most likely, yes.
15:09:32 efried, good, Thanks.
15:09:53 ...and by the way, those traits are going to be dynamic.
15:10:14 i.e. there's no way to predefine them
15:10:28 ^ Sundar,
15:10:45 efried: Basic question: why does Nova/placement need to know about DRC name/index etc.? Would you want to schedule instances to use specific GPU devices, as opposed to asking for *a* GPU of a specific type?
15:11:15 Sundar: Yup, we at least need the *ability* to do that. Pets, not cattle.
15:11:33 efried, what does hoc mean?
15:11:53 "ad hoc" means dynamic, non-predefined.
15:12:10 on the fly?
15:12:15 Yes, exactly.
15:12:19 got it.
15:12:30 I was going to use that exact phrase, but I thought throwing another idiom at you might not have worked :)
15:13:10 Sundar: Also, the user-specified traits will be important when we're talking about devices with "a wire out the back". E.g. SR-IOV PF connected to a specific network, *somebody* needs to be able to specify a trait indicating the network.
15:13:27 ...so that the user can e.g. request a VF on a particular network.
15:13:47 ^ use case for truly ad hoc traits even in a "cattle" environment.
15:16:25 efried and all: A process question: how can we ensure that we have got all requirements and that the next round of spec updates shall be adequate to start implementation for Stein? We had a bunch of plans for Rocky and we want to ensure they are delivered in Stein. Open to ideas!
15:16:54 Sundar: If you solve that one, you get a Nobel.
15:18:04 rofl
15:18:06 But the neat thing about our process is that it doesn't have to be perfect and fully-formed the first time around.
15:18:29 You put out *something* and then improve on it incrementally.
15:18:47 Surely you didn't think you were going to be done after one release.
15:18:49 workable -> usable
15:20:31 :) Shall we define a date before Stein's last spec approval date, by which time we can review and approve the needed Cyborg specs, and get started on implementation after that?
15:21:23 If any stakeholder wants an update after that, we can address that in the T release
15:22:14 Oct 22-26 is Stein Milestone 1.
15:22:24 That is 3 months away
15:22:51 I suspect we can get the specs done in 1 month if we put our collective minds to it. Comments?
15:23:56 Just to level set, a statement like ^ is true in isolation, but you can't guarantee that all the stakeholders will have the time to dedicate to it to make that timeline feasible.
15:24:09 That said, I will do what I can to carve out the time.
15:24:52 Thanks, Eric.
15:25:35 To elaborate, if we wait until Oct end, we get very little time for implementation given that Dec and 1st half Jan are usually in holiday mode
15:26:12 Anyways, I'll take an AR to read the spec you mentioned, and update the specs on traits etc.
15:26:30 So did we have a solution about how to map function uuid to name, or other things like that?
15:27:16 On UUID/name maps, whether that should come from the bitstream developer or the operator is an open question. The UUID itself comes from the bitstream developer
15:28:01 If the driver is going to define traits, that is another wrinkle because the driver comes from the device vendor, which is yet another role
15:28:50 So, we need to have some input on how people use this in practice -- feedback needed from operators, bitstream developers etc.
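Since the source of a function name is left open above (bitstream developer or operator), one way to keep the UUID as the stable key while still allowing either source is a lookup with fallbacks. The sketch below is only an illustration: the precedence order (operator map first, then a name carried in bitstream metadata, then the bare UUID) is an assumption and was not agreed in the meeting, and `resolve_function_label` as well as the `"name"` metadata key are hypothetical.

```python
def resolve_function_label(function_uuid, operator_map=None,
                           bitstream_metadata=None):
    """Pick a label for a function and say where it came from.

    Returns a (label, source) tuple, where source is one of
    "operator", "bitstream" or "uuid". The precedence is an assumption.
    """
    operator_map = operator_map or {}
    bitstream_metadata = bitstream_metadata or {}

    # 1. An operator-maintained UUID/name map wins if it has an entry.
    if function_uuid in operator_map:
        return operator_map[function_uuid], "operator"

    # 2. Otherwise use an optional name supplied with the bitstream.
    name = bitstream_metadata.get("name")
    if name:
        return name, "bitstream"

    # 3. Fall back to the UUID, which the driver can always report.
    return function_uuid, "uuid"


if __name__ == "__main__":
    print(resolve_function_label("1234", {"1234": "crypto_v1.2"}))
    print(resolve_function_label("5678", {}, {"name": "ipsec"}))
    print(resolve_function_label("9999"))
```

Returning the source alongside the label makes it possible to tell operator-chosen names apart from developer-supplied ones, which matters if two bitstreams claim the same name.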
15:29:39 I think this needs to be solved -- we should go to names, but any premature move before people understand and use Cyborg may require fixing later -- and that can be messy with upgrade issues etc
15:32:21 Can we stick to UUIDs for now, and take up the name translation in subsequent discussion? That is the lowest common denominator anyway.
15:33:40 sorry, uuids/names for what specifically?
15:33:41 Is it possible to translate the uuid to a name in cyborg according to the data structure returned by the driver?
15:33:48 sure, that requires fewer modifications in the future I think
15:35:07 xinran__, those uuids are reported by drivers, Cyborg needs to track them
15:35:11 Xinran: as I explained, we have to ask who should define function names. The driver comes from a device vendor (like Intel/Xilinx) -- they may not know anything about a specific function developed by a third party
15:35:14 Sundar, am i right?
15:36:08 Li_Liu: the UUIDs are reported by drivers, based on what they discover from the device itself.
15:36:20 However, where does the name come from?
15:36:38 Yes that’s the problem
15:36:54 I believe the name should also be reported by the driver
15:37:31 the driver should know the name, and report it to Cyborg
15:37:47 There are two options: have the bitstream developer define an optional name (the current metadata spec already provides for it), or have the cloud operator define a UUID/name map like Shaohe mentioned
15:37:52 How does the driver know the name?
15:37:56 Both have their pros and cons.
15:39:27 The driver does not know any function names in the general case. If the driver itself is involved in programming a bitstream and the bitstream metadata has a name, well and good. But, the device may be pre-programmed or the bitstream may not have that metadata
15:39:44 driver gets the name from either: 1. device information read from hardware. 2.
When loading a bitstream, the driver can decode the name from the image file
15:39:52 sundar: the operator doesn't define the name; they should get the name from the vendor and put it into the mapping file
15:40:28 so the agent or driver can read it and then update the trait
15:40:44 Li_Liu: Device info from hardware does not include function names today, for any major vendor AFAIK
15:41:08 Dolpher: which vendor? Bitstream developer or device vendor?
15:41:21 Device vendors would not know about function names at all
15:41:27 bitstream vendor
15:41:27 Sundar, I know, this is totally up to vendors
15:42:24 Bitstream developers can define an optional name like I said -- it is already there in the spec today. However, that may not be unique! Two different bitstreams can both say they are doing gzip
15:42:45 is that a bad thing?
15:43:05 Li_Liu: I think option 2 is better. When it is a pre-programmed card, can we get the name?
15:44:27 xinran__ if it's a pre-programmed card, it's the driver's job to figure out what to report
15:45:35 I mean, using a pre-programmed card is almost equivalent to using an ASIC chip
15:45:45 Li_Liu, can the driver collect the info 'name'?
15:46:27 imagine the case when an ASIC card is used
15:46:54 it has to have a dedicated driver to discover+report
15:47:22 lol. there you go :)
15:48:54 So you mean each pre-programmed card needs a dedicated driver?
15:50:19 Probably the practical way out is to have the operator define the map. But that means every time a new bitstream gets uploaded or the operator updates a pre-programmed card, the map has to be updated. This can probably be automated. We should probably let operators get the hang of Cyborg before imposing new workflows on them.
15:51:03 if you really wanna do that. it's the driver's job to figure out the initial state of the card (which includes the name as one piece of the initial information)
15:52:49 I need to drop out in 5 min.
15:53:30 I will update the specs.
Please LMK what else you think is missing or needs to be added to the specs.
15:53:56 Sundar: Please add me as a reviewer to any specs I'm not already on that you think I need to see.
15:54:04 I have to drop too.
15:54:15 Let's wrap up
15:54:17 efried: Absolutely
15:54:19 Li_Liu: Can we make it a standing agenda item to ask for spec reviews, so we can close this quickly?
15:54:32 ok
15:55:47 #todo add spec review as a standing agenda item for future irc meetings
15:55:57 #endmeeting