14:00:05 #startmeeting openstack-cyborg
14:00:06 Meeting started Wed Jul 25 14:00:05 2018 UTC and is due to finish in 60 minutes. The chair is zhipeng. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:08 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:10 The meeting name has been set to 'openstack_cyborg'
14:00:31 #topic Roll Call
14:00:37 #info Howard
14:00:53 #info wangzhh
14:01:42 Hi there~^^
14:01:47 hi brianx_ and seungwook :)
14:01:54 brianx_
14:02:07 #info shaohe_feng
14:02:09 #info Yumeng
14:02:18 #info Helloway
14:02:41 Hi zhipeng, shaohe and yumeng
14:02:54 Hi brianx
14:02:57 hi seungwook, welcome
14:03:27 welcome seungwook
14:03:34 :)
14:03:47 Thanks all~ :)
14:04:04 Hi brianx__
14:04:15 #info Li_Liu
14:04:34 Hi Li Liu
14:04:56 Hey Seungwook! welcome to the party
14:05:29 Hi xinran
14:06:06 Hey Seungwook! Welcome!
14:06:23 #topic Rocky Development Update
14:06:26 Thanks wangzhh
14:06:36 so the deadline we are facing this week
14:06:43 is the client lib deadline
14:06:54 Ho vrv
14:07:05 shaohe_feng how's the client side prepared? Could we make it on time?
14:08:33 zhipeng, which other features should we add for the client?
14:08:38 zhipeng, allocation?
14:08:59 could we do basic CRUD?
14:09:00 zhipeng, unallocation?
14:09:04 Does FPGA program count?
14:09:26 the program API should be ready this week
14:09:34 zhipeng, we can only list accelerators
14:10:00 but for calling the client part I am not sure
14:10:18 i don't think we could have programming in the client on time
14:10:29 shaohe_feng, once coco and zhenghao's patches are in
14:10:34 ok, at least the API part
14:10:42 could we quickly add allocation and unallocation?
14:10:45 hi seungwook welcome
14:10:53 in the simplest form: add to the db, remove from the db
14:11:09 zhipeng, if we want to let nova call cyborg, we do not need CRUD for accelerators. we need CRUD for deployables.
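As noted above, the client can currently only list accelerators. A minimal sketch of what such a list call might look like follows; the endpoint path, response shape, and class name are all assumptions for illustration, not the real python-cyborgclient API (the port matches the curl example later in the log).

```python
# Hypothetical sketch of a tiny client wrapper for "list accelerators".
# The /accelerators path and response shape are assumptions, not the
# actual python-cyborgclient interface.

class MiniCyborgClient:
    def __init__(self, endpoint, http_get):
        # http_get is injected so the sketch stays transport-agnostic
        # (e.g. a requests.Session-based callable, or a stub in tests).
        self.endpoint = endpoint.rstrip("/")
        self._get = http_get

    def list_accelerators(self):
        # Assumed response shape: {"accelerators": [...]}
        resp = self._get(self.endpoint + "/accelerators")
        return resp.get("accelerators", [])
```

Injecting the transport keeps the sketch testable without a running Cyborg API.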
14:11:52 according to the discussion here: https://etherpad.openstack.org/p/cyborg-rocky-development
14:11:52 Thanks xinran
14:12:43 we can use the current PATCH deployable for allocation and unallocation
14:12:56 but I do not think this is a good idea.
14:13:19 also, do we need batch allocation?
14:13:59 for example, if a user wants 3 deployables, can he call allocation once?
14:13:59 which patch are you talking about?
14:14:15 https://review.openstack.org/#/c/584641/ Maybe this one.
14:14:20 API part
14:14:39 why is it not a good idea?
14:14:58 yes.
14:16:09 IMHO, there had better be an explicit API for these two actions
14:16:33 I also used PATCH for allocation and unallocation in my POC.
14:16:52 just so I did not touch more cyborg code
14:17:25 I think there should be a unified API which does allocation and unallocation
14:17:41 xinran__ what do you mean by unified?
14:17:50 Instead of implementing it in the deployable API
14:17:52 the PATCH method is for a resource, not for collections. So it can not support batch allocation
14:17:53 #info Coco
14:18:03 1 API for both alloc and unalloc?
14:18:16 I mean another independent API :)
14:18:27 2 independent APIs.
14:18:28 #info xinran__
14:18:57 shaohe_feng, i think at least there is no harm in having the PATCH method, even for the long term
14:19:37 #link https://github.com/shaohef/cyborg/commit/13cb904809376584f3cf78b7d7f120817d83c2ad
14:19:52 this is a POC from about half a year ago
14:20:07 the PATCH method can stay, but PATCH can do many things, not only allocate or unallocate.
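The PATCH-based approach discussed above amounts to a JSON-Patch "replace" on a deployable's instance_uuid field: setting it marks allocation, nulling it marks unallocation. A minimal sketch of those semantics (the helper is illustrative, not actual Cyborg code; only the field name and op shape come from the discussion):

```python
# Illustrative sketch: modeling allocation/unallocation of one deployable
# as an RFC 6902 "replace" op on /instance_uuid, per the discussion above.
# apply_patch is a toy helper, not real Cyborg API code.

def apply_patch(deployable, ops):
    """Apply a minimal subset of JSON-Patch 'replace' ops to a dict."""
    for op in ops:
        if op["op"] != "replace":
            raise ValueError("only 'replace' is sketched here")
        field = op["path"].lstrip("/")
        deployable[field] = op["value"]
    return deployable

def allocate(deployable, instance_uuid):
    # Allocation: point the deployable at the consuming instance.
    return apply_patch(deployable, [
        {"op": "replace", "path": "/instance_uuid", "value": instance_uuid},
    ])

def unallocate(deployable):
    # Unallocation: clear the instance reference ("value": null in JSON).
    return apply_patch(deployable, [
        {"op": "replace", "path": "/instance_uuid", "value": None},
    ])
```

Because PATCH addresses a single resource URL, this naturally cannot express batch allocation, which is exactly the limitation raised in the meeting.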
14:20:15 I used PATCH for unallocation
14:20:15 I agree with Li_Liu's assessment, should be good for the moment
14:20:35 curl -g -i -X PATCH http://localhost:6666/deployables/$UUID -H "Content-Type: application/json" \
14:20:35 -H "Accept: application/json" -H "X-Auth-Token: $(openstack token issue -f value -c id)" -d '
14:20:35 [
14:20:35 { "op": "replace", "path": "/instance_uuid", "value": null }
14:20:36 ]'
14:21:06 I agree we should not touch too much cyborg code at present.
14:22:31 a tiny change is OK for me, as in https://review.openstack.org/#/c/584641/
14:22:51 okey then let's wrap up zhenghao and coco's patches, and shaohe_feng could update the cyborgclient
14:23:08 I will need to cut the final release for the client on the 27th
14:23:09 So, does everyone agree that we should have two other APIs for allocate and unallocate?
14:23:20 but we need dolpher/sunder or Li_Liu's comments on it.
14:23:24 and...
14:23:55 we can not get the expected deployables.
14:24:13 since we need more attributes.
14:24:28 shaohe_feng, we can extend the parameters if dolpher/sunder or Li_Liu have other comments.
14:24:49 such as function ID/name and more
14:25:23 shaohe_feng so you want to add more fields in the table?
14:25:36 The API is stable. Just extend the filter for querying.
14:26:17 Li_Liu, not the db. It's for the API.
14:26:45 For example, I want a crypto FPGA (pre-programmed)
14:26:45 adding fields should not affect the API though
14:27:17 Yes. so please give the comments ASAP
14:27:37 we just need the experts' comments
14:27:51 Li_liu plz take a look at https://review.openstack.org/#/c/584641/
14:28:20 and also coco's patch
14:28:22 https://review.openstack.org/584296
14:28:37 Li_Liu if you are ok with it plz go ahead, W+1
14:28:44 to land it
14:28:50 As we have discussed, it is easy to improve these patches as your comments arrive. :)
14:29:46 I was looking at it
14:30:01 will provide comments in a bit
14:30:13 Thanks.
14:30:27 For a sample scenario
14:30:50 There is a QAT and an FPGA.
14:31:26 the user wants a crypto function.
14:32:11 a pre-programmed crypto FPGA
14:32:19 how does he do that?
14:33:03 needs host, devices_type, function_type.
14:33:15 also the numbers?
14:33:29 one way is to use the add_attribute() API in Deployable
14:33:39 maybe he wants 2 pre-programmed crypto FPGAs
14:33:57 Li_Liu, I did it as you said.
14:34:02 yes, I think number is necessary.
14:34:17 what kind of numbers?
14:34:23 performance KPI?
14:34:32 like 30 Gbps?
14:34:51 Coco, agree. You can also give comments on those API patches.
14:34:54 the number of deployables allocated at once.
14:35:13 GET will return a list of accs. So we need a number in allocate?
14:35:58 at least, there are some places in the code where we deal with the number requirement of deployables.
14:36:07 Batch allocate?
14:36:08 if we have a dedicated allocation API, then the number should be included
14:36:50 I thought for now we are just using the deployable PATCH, which is single allocation
14:37:18 OK, then if the user asks for 2, we should use the API twice?
14:37:28 then
14:37:42 For example, you and I each GET 2 accs.
14:37:48 that could be a bit tricky
14:38:07 Then we need to allocate them, and then there is a risk
14:39:12 will the user actually need to simultaneously deploy two kinds of accs?
14:39:23 or more, all at once
14:39:39 So for my POC, dolpher proposed one API to get and allocate.
14:40:14 zhipeng, dolpher prefers at once.
14:40:23 i was thinking about batch actions
14:40:43 it makes sense for VMs and containers, since they are mostly homogeneous
14:40:56 if it fails, rollback is pretty easy
14:41:07 I agree with the multi-alloc use case
14:41:09 sure. thanks, you got my point.
14:41:27 but if we are gonna have batch operations on different accelerators
14:41:54 if they are the same type it will be easy, but for different functionalities it will be tricky
14:42:07 It is hard for different accelerators.
14:42:17 I have not thought of a good solution for it.
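The batch-allocation idea discussed above (a count per function type, with easy rollback when a partial batch fails) can be sketched as an expand-then-loop with rollback on first failure. Every name here (the spec shape, allocate_one, unallocate_one) is an illustrative assumption, not Cyborg's actual API:

```python
# Illustrative sketch of batch allocation with rollback, per the meeting
# discussion. A spec maps function type to count, e.g. {"aes": 1, "rsa": 2}
# for 3 accelerators. allocate_one/unallocate_one are hypothetical hooks.

def expand_spec(spec):
    """Expand {"aes": 1, "rsa": 2} into one request per accelerator wanted."""
    wanted = []
    for function_type in sorted(spec):
        count = spec[function_type]
        if count < 1:
            raise ValueError("count must be >= 1 for %s" % function_type)
        wanted.extend([function_type] * count)
    return wanted

def batch_allocate(spec, allocate_one, unallocate_one):
    """Allocate everything in spec, or nothing (roll back on failure)."""
    done = []
    try:
        for function_type in expand_spec(spec):
            done.append(allocate_one(function_type))
    except Exception:
        # Undo in reverse order, then surface the original error.
        for acc in reversed(done):
            unallocate_one(acc)
        raise
    return done
```

Rollback-in-reverse is the part that is easy for homogeneous requests but, as noted in the discussion, gets tricky once the batch mixes different accelerator functionalities.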
14:42:22 Cyborg returns a list of accelerators and nova will do a for loop to allocate one by one; this is the current solution, but I think we should pass num to the cyborg allocate API in the future
14:42:34 Li_Liu, what's your suggestion on this one?
14:42:49 when calling the dedicated alloc API, just provide the list of acc types you need
14:42:52 yes, what xinran__ said is okey
14:43:22 for instance, alloc: [aes x 1 + rsa x 2]
14:43:37 that gives you 3 different accs
14:47:57 i think we should follow xinran__'s suggestion
14:48:05 let's stick to the current solution for Rocky
14:48:15 and discuss batch at the Denver PTG
14:48:38 what's the deadline?
14:50:16 client is the 27th ...
14:51:42 agree, xinran_'s suggestion should solve the problem for us for now
14:51:56 2 days left?
14:52:09 Like os-acc before. It is urgent. :)
14:52:15 yes lol
14:52:20 zhipeng, what else are we trying to squeeze in by the 27th?
14:52:20 every release is like this
14:52:33 just the client lib final cut
14:52:51 meaning the basic API will not be changed as well after that, to make sure the client works
14:53:08 i see
14:53:29 shaohe_feng your new patch
14:53:30 https://review.openstack.org/#/c/585146/
14:53:37 is about the placement support, right?
14:53:49 do we need to remove some old implementation?
14:54:03 I remember Li Liu coded the report functionality before
14:54:39 https://github.com/openstack/cyborg/blob/master/cyborg/services/report.py
14:54:43 I did this part
14:55:15 shaohe_feng, is it sufficient for you?
14:55:17 are there any conflicts?
14:55:39 the client lib depends on the cyborg API; we don't have an allocation API right now. How do we handle that?
14:55:43 I think shaohe added the provider_tree and also made sure the interaction happens at the agent level
14:56:03 xinran__ will zhenghao's patch handle that?
14:58:15 Li_Liu, zhipeng I'm back.
14:58:16 sorry
14:58:24 I guess shaohe_feng re-implemented my part in his new patch..
14:58:43 It supports sub-providers.
14:58:57 actually, I did not re-implement it.
14:59:05 I leveraged it from nova
14:59:07 shaohe_feng, I have implemented the SchedulerReportClient in https://github.com/openstack/cyborg/blob/master/cyborg/services/report.py
14:59:34 Li_Liu, yes, I know your implementation.
14:59:48 so what's the difference between your SchedulerReportClient and my SchedulerReportClient? just curious
14:59:58 no sub-provider.
15:00:09 it supports nested providers.
15:00:24 I think this code should be in a lib
15:00:31 i see, but should we put them together?
15:00:32 so nova and cyborg can share it.
15:00:50 just use your code
15:01:02 yes. Maybe we need to remove the old one.
15:01:03 ignore my implementation for now then
15:01:11 ok
15:01:19 okey
15:01:30 And the best solution: move them to a common lib shared by nova, cyborg and other projects
15:01:42 zhipeng: yeah but we need to modify that in the future
15:01:55 xinran__ yes for sure
15:02:03 zhipeng, you can talk with the nova guys; do they have a plan to do it?
15:02:03 xinran__, IMHO, we can use PATCH for allocation as in my patch in rocky. Or we can design a new API interface and develop in parallel.
15:02:19 shaohe_feng I think it is a reasonable request
15:02:33 since the provider tree has also been implemented for nova-compute
15:02:39 Merged openstack/cyborg master: Add "interface_type" field in deployable DB https://review.openstack.org/584296
15:02:57 wangzhh: and deallocate also by the PATCH method?
15:02:59 hey look who's here
15:03:08 a lot of reimplementations across different projects just for this part...
15:03:13 zhipeng, yes. No need for us to maintain the same code as other projects. :)
15:03:31 Yes.
15:03:39 xinran__
15:03:44 Li_Liu, yes. too many. but we should avoid that.
15:03:55 agree
15:04:00 okey, next on drivers
15:04:05 And that is what cyborg should focus on
15:04:18 that is not
15:04:20 sorry.
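The nested-provider point above (a sub-provider hanging under a compute-node root, each with its own inventory) can be pictured with a toy tree structure. This is purely illustrative; it is not nova's or Cyborg's SchedulerReportClient, and all names are assumptions:

```python
# Toy model of a nested resource-provider tree of the kind discussed above:
# a device provider nested under a compute-node root, carrying an inventory
# of accelerator resources. Not the real placement/provider_tree code.

class Provider:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.inventory = {}  # resource class -> total
        if parent is not None:
            parent.children.append(self)

    def root(self):
        # Walk up to the root provider (the compute node).
        node = self
        while node.parent is not None:
            node = node.parent
        return node

# Example tree: compute node -> FPGA card with two regions.
node = Provider("compute-1")
card = Provider("fpga-card-0", parent=node)
card.inventory["CUSTOM_FPGA_REGION"] = 2
```

The "flat" report.py approach registers only root providers; nested support means inventory can live on child providers like the FPGA card here, which is the difference being discussed.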
15:04:35 we now have gpu and OPAE-based fpga drivers in flight
15:05:02 need brianx__ to start the Xilinx driver ASAP lol
15:05:10 Coco: did you modify the driver discover() method to get the interface type?
15:05:11 let's start with a simple version
15:05:47 I think yes
15:06:14 I will check again.
15:07:29 I use "pci" as the "interface_type" value at the moment, since it's an fpga driver.
15:08:12 I think discover should have a common data structure.
15:08:22 I have listed the cyborg tasks in the etherpad, including this one
15:08:25 I agree with wangzhh
15:08:26 No one has taken it at present.
15:09:02 is the 27th also the deadline?
15:09:02 Yes. maybe an object
15:09:17 then it would not need schema verification
15:09:53 anyway, the goal is to unify the discover data
15:10:22 I can take it, but can't make sure it can be done by the 27th.
15:10:41 does that affect the API behaviour?
15:10:41 when discover() reports data, it should be in a Cyborg-understandable format
15:11:01 Coco, great, thanks.
15:11:09 zhipeng, no. It just affects the agent.
15:11:28 To report data.
15:11:29 no effect on the API, I don't think
15:11:32 okey then there is no hurry
15:11:34 Coco, have you looked at sundar's VAN?
15:11:37 yup
15:11:59 Coco, can we leverage it?
15:12:07 yes, no hurry.
15:12:13 what's the link?
15:12:28 the spec, let me show you.
15:13:15 ok, thks.
15:13:35 #link https://review.openstack.org/#/c/577438/
15:13:46 it is used for os-acc
15:14:21 you can have a check: can the drivers use it?
15:16:02 ok, i will check tomorrow.
15:16:08 Thanks.
15:16:42 Yumeng__ are you still around?
15:16:54 the doc work should start now
15:17:52 hmm
15:18:16 I will focus on that after wrapping up the program API
15:18:33 will ask Yumeng and others for help
15:18:59 yes, we will need to work together to comb through the implementations :)
15:19:23 yup
15:20:29 guess she is asleep. :)
15:20:47 OK, we already have a doc wechat group.
15:21:57 more sleep keeps a girl beautiful. :)
15:22:10 ....
15:22:25 I need to go sleep too.
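The "unify the discover data" task above suggests an object (so no schema verification is needed) that every driver's discover() returns before the agent reports it. A minimal sketch of such an object follows; only interface_type="pci" comes from the discussion, and every other field name is a hypothetical placeholder:

```python
# Sketch of a common data structure a driver discover() could return,
# per the "maybe an object" idea above. Field names other than
# interface_type (e.g. "pci" from the FPGA driver) are assumptions.

from dataclasses import dataclass, field

@dataclass
class DiscoveredDevice:
    vendor: str
    model: str
    interface_type: str = "pci"  # e.g. "pci" for the OPAE FPGA driver
    attributes: dict = field(default_factory=dict)  # free-form extras

def validate(device):
    """Agent-side sanity check before reporting the device upward."""
    if not device.vendor or not device.model:
        raise ValueError("vendor and model are required")
    return device
```

Using a typed object rather than a bare dict is what makes separate schema verification unnecessary: the structure is enforced at construction time, which matches the point made in the meeting.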
15:22:37 okey
15:22:38 have a good sleep guys :)
15:22:40 I think it will take some time for me to catch up on the dialog.
15:22:52 let's try to nail the client lib down this week lol
15:23:17 #endmeeting