14:01:10 #startmeeting openstack-cyborg-driver
14:01:11 Meeting started Mon Aug 13 14:01:10 2018 UTC and is due to finish in 60 minutes. The chair is shaohe_feng. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:13 #topic Roll Call
14:01:13 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:15 The meeting name has been set to 'openstack_cyborg_driver'
14:01:40 #info shaohe_feng
14:02:16 #info ed leafe
14:05:13 #info Sundar
14:05:54 Sundar, edleafe morning
14:06:10 good UGT morning to you!
14:06:36 :)
14:06:43 Good day, Shaohe
14:07:50 edleafe, Sundar: now only the three of us are online.
14:08:28 #info Li_Liu
14:08:40 do we still have the meeting today?
14:08:41 shaohe: I had sent an email yesterday to openstack-dev. Shall we talk about that?
14:09:01 Sundar: OK, go ahead.
14:09:16 Li_Liu: morning.
14:09:46 good evening, xinran
14:09:48 shaohe_feng: good evening :P
14:09:58 Hi all
14:10:33 Sundar: You can talk about your email
14:10:40 It was apparently decided in some meeting that, to record the discovered devices, Cyborg agent will call Cyborg REST API. Also, to allocate and deallocate accelerators. If so, that will make the API public, which has many disadvantages.
14:11:45 Among other things, it means it is not internal to Cyborg any more. Any user can call it. So, we should authenticate. Even if we open it only to operators, it is still error-prone. We can just keep it Cyborg-internal, right?
14:12:43 When will the Cyborg agent call the RESTful API?
14:13:38 Cyborg agent should not call Cyborg REST API.
14:13:53 Agent should stay with rpc
14:13:59 xinran: REST API is meant for things that can be accessed by external users, operators or other services.
14:14:44 Sundar: can you give us an example of when the agent calls the RESTful API?
14:15:31 shaohe, Li_Liu: Agreed. I am referring to #link https://etherpad.openstack.org/p/cyborg-rocky-development (Line 44)
14:15:38 Hi all. Test connection.
14:16:03 IMHO agent should not call restful api
14:16:32 Sundar, that is exposed by cyborg-api, not meant to be used by Agent
14:17:46 Li_Liu: Was that agreed only for GET, not for PUT/POST?
14:18:19 wangzhh: good evening.
14:19:10 shaohe_feng: good evening. :)
14:19:15 Sundar, PUT/POST are open to admin I think, in case they want to tune some of the deployables
14:20:18 Li_Liu: the issue there is that we will have to maintain backwards compat for REST APIs. That is going to constrain our development and/or cause upgrade issues.
14:21:13 wangzhh, do you have comments on this?
14:22:00 Sundar, what do you mean by "we will have to maintain backwards compat for REST APIs. That is going to constrain our development and/or cause upgrade issues."
14:23:11 Can u give me an example which current APIs can not handle?
14:23:32 wangzhh: In #link https://etherpad.openstack.org/p/cyborg-rocky-development (Line 44), if we allow PUT/POST in addition to GET, we may have such considerations
14:25:15 We have PATCH now. If we want to update a deployable, we can use it.
14:25:46 sure we should allow PUT/POST, that depends on
14:26:09 wangzhh: Can you elaborate?
14:26:35 shaohe_feng: Do we need to create deployables via the API?
14:27:52 That depends on.
14:28:07 wangzhh, maybe we should allow that for now
14:28:23 at present, agent creates deployable.
14:28:30 to let admins add stuff in manually
14:29:02 Li_Liu, shaohe: can you explain why that is necessary? And how we will ensure backwards compat and avoid upgrade issues?
14:29:04 Sundar: https://github.com/openstack/cyborg/blob/719f3dee01b6f0b0f4f2ce9b45ffd4978eeac287/cyborg/api/controllers/v1/deployables.py#L217
14:30:03 Sundar: we should allow users to update some attributes of a deployable
14:30:30 I agree that users need to update something.
14:31:20 By users, I hope you mean operators. End users (tenants) should not be touching this
14:31:22 But can a user create a new deployable? In which case? Could u give some example?
14:31:55 The users here should mean admin/operators
14:32:08 ^ IMHO, Li_Liu, the intention of your attribute design is to allow users to update them, right?
14:32:20 yes, indeed
14:32:30 Agree.
14:32:50 If operators need a manual way, can't we provide a script of some kind? Before opening up REST APIs, we need to understand their implications.
14:33:06 I agree, right now the Create API for deployables might not be very useful
14:34:15 Sundar, I assume the script you mean is cyborg-pythonclient?
14:34:16 Sundar: yes, at present it means admin/operators, unless there are sufficient reasons that end users should update them
14:35:19 Sundar, I think update is necessary. For example, I need to change my deployable's name. Some attr which we can't collect from driver.
14:35:44 Li_Liu: isn't pythonclient a wrapper around the REST API?
14:36:19 Yes
14:36:23 wangzhh: Sure, we can provide some script that not guarantees backwards compat till it all matures
14:36:40 *does not guarantee
14:37:32 Perhaps @edleafe can comment but, my understand is that REST APIs need to be honored in future releases
14:37:42 *understanding
14:38:26 So your suggestion is to take out the deployable creation REST API?
14:38:46 Sundar: it has nothing to do with REST - any published API should not be removed or modified
14:39:43 edleafe: What APIs do we publish to users apart from REST? RPC APIs are internal and can evolve across releases, right?
14:39:46 A public API is a contract with your users
14:40:26 I'm speaking more generally
14:40:57 Whether your API is REST, RPC, SOAP, or anything else: public APIs need to be maintained
14:42:16 If, say, Cyborg offers an RPC API to Nova or os-acc, we can evolve that in future releases based on mutual understanding, right? IOW, that doesn't count as 'public' - exposed to users
14:43:04 I'm not following. If another service is relying on an API, then it's public
14:43:25 Sundar, if we use a script, it's hard to integrate with other projects or software.
14:43:50 For example, what if we want to update an attr from Horizon?
14:44:47 Command line and script is not friendly. :) IMHO.
14:44:58 If other services interact with Cyborg, there may be RESTful APIs
14:45:13 Agree with xinran.
14:45:53 wangzhh: Before we make something public and incur all the costs for all future releases, let us keep it minimal. If it is not absolutely essential to do something with public APIs, we should look at alternatives first
14:46:40 edleafe: Have nova/os-vif/neutron RPC APIs never changed? I thought those have evolved too
14:47:53 Sundar: I don't really know - I haven't worked with Neutron or os-vif
14:47:58 Sundar, I am thinking about it in the other direction tho. I think there is no harm in opening a couple of REST APIs and maintaining them, as we can't confine what the users can/want to do in their scenarios.
14:48:48 By offering something more, we can open up more opportunities for them.
14:48:54 Li_Liu: you should always think very carefully before publishing an API
14:49:08 Li_Liu: That means keeping your deployables structure backwards compatible? Before we know all the use cases?
14:49:16 Li_Liu: https://blog.leafe.com/api-longevity/
14:50:19 Yes, APIs should be designed carefully; as edleafe said, it is a contract :)
14:51:02 For sure, I am not saying we should do it without a thought :)
14:51:38 Nova has API versions to handle this issue.
14:51:57 shaohe: Not just the API. It will constrain how you can evolve the data structure that we call Deployable. How do we add new fields, drop old fields, or change the meaning/type/format of any field?
14:54:44 Hi
14:55:06 Sundar: yes, we need an API version for them.
14:55:09 Sundar: you're describing alpha development, where things can be added/removed/changed. Is that how you would describe Cyborg's current state?
14:55:10 CoCo, Good evening. :)
14:55:16 Coco_gao: evening.
14:55:47 wangzhh, shaohe, IIUC, having API versions does not mean that older versions can be dropped. They are still public
14:56:17 #evening or morning, everyone.
14:57:40 edleafe: Not quite, I am just advocating minimal public APIs till Cyborg has got some adoption and use cases are known and well-supported
14:57:40 Sundar: exactly. The point of API versioning is that the default never changes, but users can opt in to a newer version
14:57:48 Sundar: yes. They are still public. End users' apps may still use older versions and depend on them instead of the latest.
14:57:51 Yep. LTS. But it can be dropped after a deprecation declaration.
14:58:06 As edleafe said, Cyborg is still at its very early dev stage
14:58:47 When you introduce API v2 with many changes, you still have to maintain v1 at least for some time. That means handling upgrades, conversions without breaking old functionality.
14:59:15 I would think that given Cyborg's early development status, any APIs you create are labeled as alpha, with a note that they may be changed in the future
14:59:35 Sundar: some would contend that you have to maintain v1 forever.
14:59:41 (not me, though!)
15:00:13 edleafe: If we can mark REST APIs as alpha, that will be great
15:00:38 I think that's what we should be doing for now
15:00:48 Sundar, all that means we should do it with thought, instead of not doing it.
15:01:31 Sundar: http://specs.openstack.org/openstack/api-wg/guidelines/api_interoperability.html#new-or-experimental-services-and-versioning
15:02:46 Thanks, edleafe
15:02:49 Folks, shall we agree to mark Deployables API as alpha (subject to change without backwards compat) for now?
15:03:26 I vote yes
15:04:27 Agree with Sundar. :) :) :) :)
15:04:46 agree
15:05:28 agree
15:05:31 I agree, I think maybe alpha is the version in which Cyborg finishes its interaction with Nova. Before that, Cyborg is not usable by users.
15:06:06 Great! Thanks, guys. :)
15:06:17 Sundar: you should take more time/thought on how to define the API. Post your ideas and let us discuss together.
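[Editorial sketch, not part of the meeting log] The versioning point made at 14:57:40 ("the default never changes, but users can opt in to a newer version") can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration: the version numbers, `negotiate_version`, `show_deployable`, and the idea that an `attributes` field arrives in a later version are hypothetical and are not taken from Cyborg's code.

```python
# Hypothetical version bounds; not Cyborg's real values.
MIN_VERSION = (1, 0)  # the default, which never changes
MAX_VERSION = (1, 2)  # newest behavior; clients must opt in


def parse_version(text):
    """Turn a string such as "1.1" into a comparable tuple (1, 1)."""
    major, minor = text.split(".")
    return int(major), int(minor)


def negotiate_version(requested=None):
    """Pick the API version used to serve one request."""
    if requested is None:
        # No opt-in: old clients keep getting the original behavior.
        return MIN_VERSION
    if requested == "latest":
        return MAX_VERSION
    version = parse_version(requested)
    if not MIN_VERSION <= version <= MAX_VERSION:
        raise ValueError("unsupported API version: %s" % requested)
    return version


def show_deployable(deployable, requested=None):
    """Render a deployable according to the negotiated version."""
    version = negotiate_version(requested)
    body = {"uuid": deployable["uuid"], "name": deployable["name"]}
    # Assumed example: a field added in 1.1 is hidden from older clients.
    if version >= (1, 1):
        body["attributes"] = deployable.get("attributes", {})
    return body
```

Under these assumptions, a client that sends no version keeps seeing the original response shape, which is why adding versions does not by itself allow old ones to be dropped.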
15:07:24 Sundar: have you considered the API through which Nova applies for accelerators from Cyborg?
15:08:45 shaohe: Yes, that should be part of the os-acc spec. We are still in the stage of getting agreement on the overall workflows and interactions. But I agree we should document that in detail
15:09:12 Sundar: and also the parameters that API supports.
15:10:09 Sure. Please see https://review.openstack.org/#/c/577438/6/doc/source/specs/rocky/approved/compute-node.rst Line 525 for example
15:11:31 Sundar: OK, for example, as a user I want to apply for an accelerator: accelerator type is FPGA instead of GPU, vendor is intel, model is A10, and its function is Crypto.
15:12:01 Sundar: maybe more parameters will be added later.
15:13:16 shaohe_feng, use the attribute in deployables, if the fields are not supported
15:13:24 Sundar: Nova will extract these parameters from the traits/RC you define in the flavor
15:13:34 Operators will define flavors based on the attributes you mentioned, and users will pick one of the flavors. The accelerator requirement is conveyed through extra specs, and that is part of these function signatures
15:13:57 Li_Liu: Yes, I already use attribute :)
15:14:01 right now, when creating Deployables, any parameters that are not native to deployable will be added to the attribute_list
15:14:58 yes. I did use attribute in this way.
15:15:42 Sundar: also, can we support applying for accelerators in batches in that API?
15:16:12 shaohe: if you mean that a single VM may want more than one accelerator, sure.
15:16:30 Sundar: for example, I want 2 accelerators, both with accelerator type FPGA, vendor intel, model A10, and function Crypto.
15:16:58 Have we made an agreement on users' input, parameters or sections we may support by now?
15:17:02 Currently, the API gets a list of device RPs selected by Nova, and extra specs. But that is not ideal. Still trying to think of a better way
15:17:05 Sundar: or I want 2 different accelerators, one is GPU another is FPGA?
15:17:53 shaohe: All that should be possible. The currently proposed APIs can technically handle them, but we can probably improve upon them
15:18:27 Coco_gao: Sundar is working on them.
15:18:28 Sundar, when will we see your API definition?
15:19:00 Sundar: one API call to apply for 2 accelerators instead of two API calls?
15:19:17 shaohe: I am updating the os-acc spec. It should be out by this week, if not in next couple of days.
15:19:54 Sundar: nice.
15:19:57 shaohe: The current API looks something like: prepareVANs(device_rp[], extra_specs, instance_info)
15:20:02 Great job.
15:20:27 ok
15:20:32 The extra specs will contain the details of the user request, like: CUSTOM_ACCELERATOR_FPGA=2 and the traits
15:20:52 The device_rp[] will contain the device resource providers selected by Nova for each of those accelerators
15:21:08 Sundar: anyway, the API should match the accelerators the user expects
15:21:18 device_rp is a list, and extra_specs will be applicable to all the RPs in the list?
15:21:54 The problem is, how do we correlate each device_rp to a specific accelerator in extra specs. If we can associate each user request with its own device RP, that will simplify our lives
15:23:51 can we make extra_specs a list as well?
15:23:56 Li_Liu: yes. The extra specs may contain groups based on granular request syntax #link https://specs.openstack.org/openstack/nova-specs/specs/rocky/approved/granular-resource-requests.html
15:24:13 The extra specs is a pre-defined thing coming from Nova.
15:24:35 ah, ok
15:24:36 Not sure if we can modify it. Still looking into that
15:25:31 OpenStack Release Bot proposed openstack/cyborg master: Update reno for stable/rocky https://review.openstack.org/591423
15:26:15 I have already used resource groups to support batch accelerators :)
15:26:59 Sundar: reminder, have you notified Rodrigo that you and Coco_gao are refactoring the data returned by the driver?
15:27:25 Oh yes.
15:27:34 Good
15:27:46 We are currently looking at making the current driver patch work with Zuul
15:27:52 Coco_gao: how is the refactor going?
15:28:43 yeah, I am working on it right now. Still need to test the code.
15:29:10 Coco_gao: we probably need to sync up. Are you using OVOs?
15:29:33 Coco_gao: good. Guess other developers are depending on your change.
15:29:56 Sundar: have you resolved the OPAE package install in Zuul?
15:29:59 Coco: Good job. Plz don't forget to test with my GPU driver.
15:30:34 Our datacenter is going to move to another location, so the test work will have to be done several days later.
15:31:31 shaohe: Rodrigo and I discussed the pkg install, and he is making changes
15:31:31 Sundar, you mean VersionedObject?
15:31:39 Coco_gao: yes
15:32:17 We should update the spec before submitting the patch?
15:32:17 yes, I am using VersionedObject
15:32:59 Sundar: good.
15:33:05 I am still backlogged on os-acc. I'll update the driver/agent spec. Coco_gao, I can send you an early version for review
15:33:17 Sundar: you should sync up with Coco_gao well.
15:33:51 Yes
15:33:51 Li_Liu: do we have a deadline for this refactor?
15:34:08 I don't know
15:34:29 maybe we can discuss the deadline.
15:35:41 shaohe_feng, if you wanna catch the Rocky release, then Aug 20 - Aug 24
15:35:53 but if not, there's no hard deadline on this
15:36:00 Sundar, any detailed questions or information, plz send me an email (gaojh4@lenovo.com)
15:36:13 yes, your work is very important, because other drivers are based on it.
15:36:21 Coco_gao: sure
15:36:52 well, yes, we need to save time for reviews as well (if you wanna catch the R release)
15:37:50 Li_Liu: I need to change the agent code, based on your new pf/vf model.
15:38:05 shaohe_feng, I know, maybe this week I will give a patch on the object first.
15:38:06 R release is past code freeze, right?
15:38:15 ok, let me know if you need any help/discussion
15:38:59 we can have a ZOOM meeting anytime for discussion until Rocky releases
15:39:08 OK.
15:40:21 then shaohe can update the agent code, others can update the driver code.
15:41:22 Sundar, Li_Liu, will we not touch the API code any more until the R release?
15:41:40 even if Sundar has defined the new API, right?
15:42:01 I think so
15:42:09 no time for that
15:42:48 Agreed
15:43:04 Hi, shaohe, I think maybe zhenghao can help update agent code, since we need to test gpu driver together.
15:43:37 Yep. shaohe_feng: Can I take it?
15:44:08 If the agent code isn't finished, we can't test our code separately.
15:45:05 that's great
15:45:18 3ks, wangzhh
15:45:55 You're welcome :)
15:46:43 Everyone, if I have any questions I will contact you directly and thank you for your support and patience~
15:47:24 Thank you Coco :)
15:47:38 Thanks a lot everyone for the hard work
15:47:57 cool, Coco.
15:48:15 No problem. ;)
15:48:25 Time for bed if we don't have other questions.
15:48:52 yes, enough sleep keep girl beauty.
15:49:08 ヾ(・ω・`。) Good night.
15:49:10 Anything else anyone wants to discuss?
15:49:10 Good night or good day, everybody
15:49:20 OK, let's end the meeting.
15:49:37 Have a good nite guys
15:49:51 ヾ( ̄▽ ̄)Bye~Bye~
15:49:52 Thank you Li_Liu
15:49:59 #endmeeting
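
[Editorial sketch, not part of the meeting log] The extra specs discussion between 15:20:32 and 15:24:36 mentions that a request may arrive as numbered groups in Nova's granular request syntax (`resourcesN:<class>=<amount>`, `traitN:<trait>=required`). A minimal Python sketch of splitting flavor extra specs into those groups could look like the following. The helper `group_extra_specs`, the trait name, and the GPU resource class are hypothetical; only `CUSTOM_ACCELERATOR_FPGA` appears in the log.

```python
import re
from collections import defaultdict

# Matches "resources:RC", "resources1:RC", "trait:T" and "trait2:T".
_KEY_RE = re.compile(r"^(resources|trait)(\d*):(.+)$")


def group_extra_specs(extra_specs):
    """Split flavor extra specs into request groups keyed by suffix.

    The unnumbered group is stored under the empty string "".
    """
    groups = defaultdict(lambda: {"resources": {}, "traits": []})
    for key, value in extra_specs.items():
        match = _KEY_RE.match(key)
        if match is None:
            continue  # unrelated extra spec, for example "hw:cpu_policy"
        kind, suffix, name = match.groups()
        group = groups[suffix]
        if kind == "resources":
            group["resources"][name] = int(value)
        elif value == "required":
            group["traits"].append(name)
    return dict(groups)


# Example flavor: one FPGA with a (hypothetical) crypto trait, plus one GPU.
example_specs = {
    "resources1:CUSTOM_ACCELERATOR_FPGA": "1",
    "trait1:CUSTOM_FPGA_EXAMPLE_CRYPTO": "required",
    "resources2:CUSTOM_ACCELERATOR_GPU": "1",
    "hw:cpu_policy": "dedicated",
}
example_groups = group_extra_specs(example_specs)
```

Grouping alone does not solve the correlation problem raised at 15:21:54. Pairing each group with its own device RP would still need Nova to report which provider satisfied which group, and the log does not confirm that such a mapping is available.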