14:04:15 #startmeeting openstack-cyborg
14:04:16 Meeting started Wed Jun 27 14:04:15 2018 UTC and is due to finish in 60 minutes. The chair is Li_Liu. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:04:17 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:04:19 The meeting name has been set to 'openstack_cyborg'
14:04:46 #topic Roll Call
14:04:57 #info Li_Liu
14:04:58 #shaohe_feng
14:05:05 #info shaohe_feng
14:05:22 o/
14:05:28 #info sum12
14:06:55 #info Sundar
14:07:04 Let's get started
14:07:24 Howard is not here today, I will run the meeting for him today
14:07:52 #topic os-acc discussion
14:08:25 Hi Sundar, do you want to lead this topic?
14:08:54 Sure, Li Liu
14:09:27 The os-acc spec was updated as indicated in #link http://lists.openstack.org/pipermail/openstack-dev/2018-June/131751.html
14:09:40 The reasons for the update are listed there as well
14:10:08 * efried owes y'all a review on that
14:10:12 The updated spec is in #link https://review.openstack.org/#/c/577438/
14:10:50 efried: One comment I got was to include a sequence diagram
14:10:56 yup
14:11:17 When I follow https://review.openstack.org/#/c/572583/11/specs/rocky/approved/reshape-provider-tree.rst, I get an error in RST syntax
14:11:31 The seqdiag directive is not known to standard RST editors
14:11:53 What do you do in addition to updating requirements.txt etc.?
14:11:55 Sundar: Right, you have to set up dependencies and other stuff in your env
14:12:05 hold on, let me find the patch where we added that stuff into nova-specs...
14:12:18 Yes, requirements.txt and doc/source/conf.py need to be updated
14:12:36 right, okay - so you did those things and it's still busted?
14:13:16 Sundar, quick question, do we still need a dedicated os-acc spec on top of this one you posted?
14:13:17 The standard RST editors like http://rst.ninjs.org/# wouldn't know about them. So, how do you check before a submit?
14:13:44 If you're building locally, you'll need to make sure the dep is installed into your venv. You can do that by adding -r to your tox command, or you can enter the venv and install it manually via pip
14:14:25 OK, how do you run the tox command?
14:14:45 Sundar: How have you been running it thus far?
14:15:00 tox -e docs ?
14:15:28 So, tox -e -r docs?
14:15:31 So you would say tox -re docs instead. That will rebuild the venv from scratch. That's assuming your problem is the missing dependency.
14:15:48 Got it. Thanks, Eric.
14:15:55 No, -e is how you specify which testenv, so tox -r -e docs or tox -re docs
14:15:56 Li Liu: I don't see why. We should be able to add any additional detail here
14:16:02 ++ ^
14:16:18 Sure, Eric
14:16:21 agree the goal should be to have all the necessary design detail here.
14:16:35 sure, that is what I figured
14:16:52 Sundar: This is assuming the problem was the missing dependency. If you're still having problems, let me know and I'll see if I can help out.
14:16:53 Li Liu: We presumably need to add a bit more detail on how we can do a plugin for x86+FPGAs. Is that what you are looking for?
14:17:27 Thanks, Eric. I may take you up on that if things don't work out by today :)
14:17:43 we're still doing this in openstack/cyborg, huh?
14:17:48 yup
14:17:49 any plans to move to cyborg-specs?
14:18:03 would make things easier to find :)
14:18:39 I was told we have latitude to stick to the current location for this cycle.
14:18:57 that sounds good, let's get the current ones merged before we start moving
14:19:25 right
14:20:08 OK, next steps are to add a sequence diagram. Also, I would propose a plugin for x86 + FPGAs, and a separate one for x86 + GPUs. The main reason to stick to x86 is that that's all I know about :)
14:20:42 We can probably generalize to ARM if SR-IOV works the same way there. Power is an area where we need Eric's input
14:20:59 how much difference if it was arm or even power?
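On the docs-build question above: a minimal sketch of the doc/source/conf.py change being discussed, assuming the sphinxcontrib-seqdiag package is what provides the seqdiag directive and that it is also listed in whichever requirements file the docs testenv installs from. The surrounding contents of the Cyborg conf.py are not shown in this log, so the list below is illustrative only.

```python
# Hypothetical fragment of doc/source/conf.py. The real file already defines
# `extensions`; in practice only the new entry would be appended to that list.
extensions = [
    # ... existing extensions stay as they are ...
    'sphinxcontrib.seqdiag',  # enables the ".. seqdiag::" directive
]

# Option provided by sphinxcontrib-seqdiag: render diagrams as SVG in HTML.
seqdiag_html_image_format = 'SVG'
```

After a change like this, the venv would be rebuilt with `tox -r -e docs` (or `tox -re docs`), as noted in the discussion, so that the new dependency is actually installed.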
14:21:00 I don't think you should be proposing specific plugins via this spec.
14:21:06 just the framework.
14:21:24 The plugins themselves will ultimately be the responsibility of the vendors.
14:21:29 I would think.
14:21:58 For x86+FPGAs, we could have a community-supported plugin which calls into Cyborg/agent/driver for vendor-specific stuff
14:22:15 shall we provide a reference from community tho?
14:22:28 Li_Liu: yes ^^
14:23:06 Li_Liu: "how much difference if it was arm or even power?" Power seems to be quite different IIUC. Eric can expand on that
14:23:07 entry points for driver.
14:24:32 Yeah, e.g. if the plugin has a "plug" operation, on libvirt this entails, what, editing the domain XML and doing stuff to special files under /dev?
14:25:01 In POWER, it entails making a REST call to the NovaLink API asking for an I/O slot to be attached to a logical partition (VM) and the platform does the magic.
14:25:07 Totally, totally different model.
14:25:25 and definitely not something the cyborg team should try to become domain experts on.
14:26:31 efried: Agreed. The plugin model should be able to handle such differences. The return value of the plug() itself is left open for that reason.
14:26:55 sounds good
14:27:01 I would think the return value would be an Acc* object
14:27:13 That is why it may make sense to have an x86/ARM/libvirt section that specifies what plug() does
14:27:33 If it needs to contain platform-specific data, that goes in the *Profile bit, or something.
14:28:01 The return value can be a VAN object with enough data that is hypervisor-specific
14:28:20 Sundar: sorry, yeah, I guess we're calling it VAN now.
14:28:41 The port profiles in Neutron may not be applicable here IMHO
14:28:52 But again, I don't think your spec should go into any detail as to what each platform's plugin actually does. You could mention it for the sake of example/understanding, but it should not be the role of this spec to lay out the details.
14:29:32 OK. If we agree that we will do a community-supported plugin for x86 + FPGAs, I can write a separate spec for that. Is that fine?
14:30:41 yes, that sounds like the right plan.
14:31:14 Sundar: update on the seqdiag: right now it looks like the cyborg repo isn't yet using the "new" process for doc builds.
14:31:33 So you'll want to put the requirement into test-requirements.txt instead of creating a new doc/requirements.txt
14:31:35 Li_Liu, shaohe, all: do we all agree on this plan?
14:31:39 or you can switch over to the new build process
14:32:02 Ah, I put it in cyborg/requirements.txt
14:32:35 That seems closest?
14:32:53 Sundar, i am fine
14:33:02 Right now your docs env is calling out test-requirements.txt
14:33:09 I don't actually see anything calling out requirements.txt, which is weird.
14:35:07 Sundar: Anyway, I'm noodling around with it locally, will let you know if I get something working.
14:35:23 let's move on
14:35:45 efried: Sorry, where do you see the reference to test-requirements.txt?
14:36:48 I mean, I see the file but what in doc is referring to it?
14:37:02 Cyborg had a meetup in Beijing LC3 yesterday, shaohe_feng do you have anything to share?
14:38:50 zhipengh[m], introduce some scenario about cyborg
14:39:42 #topic LC3 meetup summary
14:43:52 ok, I will gather more information from the folks who attended the meetup and share with the rest of the team
14:44:09 #topc AoB
14:44:14 #topic AoB
14:44:32 Do you know iflytek? Sundar, Li_Liu
14:44:59 I know them
14:45:18 #link http://www.iflytek.com/en/index.html
14:45:18 This one? https://en.wikipedia.org/wiki/IFlytek
14:46:01 It is a famous AI company in China.
14:46:08 Sundar, yes.
14:47:04 AI will be a cyborg scenario.
14:47:12 they need accelerators
14:47:46 then we discussed the current status of cyborg development.
14:47:47 GPUs?
14:48:06 Sundar, GPUs and FPGAs.
14:48:08 FPGAs, also. :)
14:48:13 are they planning to put people on Cyborg?
14:48:41 Not sure.
14:49:49 Sundar, they develop logic on FPGA.
14:50:17 we have a gap on cyborg API design.
14:50:31 Interesting. So, they have their own bitstreams. How are they using it -- like FPGA aaS?
14:52:13 Not sure how they use it.
14:52:34 Sundar, will check with them about it by mail, and CC you.
14:52:42 for we need to interact with nova.
14:53:13 shaohe: Ok
14:53:42 wangzhh, introduce the whole flow for the interaction.
14:53:55 they did a demo for it during summit.
14:54:18 OK. We had a discussion offline this afternoon.
14:54:34 Let me introduce it.
14:54:49 wangzhh, can you give more details on it?
14:55:31 It is about the flow of creating, deleting and rebooting an instance with accelerators.
14:55:41 Create:
14:55:54 1. nova-api (Create an instance with acc)
14:56:00 2. nova-conductor => nova-scheduler => placement-api (Get a node list and claim the accelerator unit)
14:56:05 3. nova-conductor => nova-compute
14:56:11 Extra work we should do in nova:
14:56:19 a) nova-compute => cyborgclient => cyborg-api (Update acc, instance_uuid, assignable, etc. Program here if it is a request for an FPGA)
14:56:27 b) nova-compute passes parameters to os-acc (accelerator_address, acc_type, etc.) and gets an xml segment
14:57:36 Why do they do "nova-compute => cyborgclient => cyborg-api" for programming? We can do it entirely in compute node
14:58:14 We need to update the db, and that requires an RPC to the conductor
14:58:34 But, otherwise, it can all be in os-acc/Cyborg-agent/driver
14:58:36 It is for 2 reasons.
14:58:37 I'll just mention that there was a nova core (it might have been Dan) who said nova-compute should never have to call the cyborg API; everything should be done via os-acc.
14:58:41 I could be misquoting too :(
15:00:09 wangzhh: Did they modify Cyborg?
15:00:41 1. In OpenStack, it is a common design to interact by api. 2. We have something like update quota in api.
15:02:01 wangzhh: If Nova compute has to call the REST APIs of Neutron, Cinder, Cyborg, ... it can get out of hand. The os-acc/os-vif/os-brick is supposed to solve that problem
15:03:03 efried, if so, could we implement it by os-acc => cyborgclient?
15:03:22 wangzhh: Yes, I would think so.
15:03:42 efried: How about Power? Would you have os-acc call Cyborg for that too?
15:04:09 Sundar: I would think any calls from os-acc to cyborg API would happen *outside* of the plugin, in the common os-acc code.
15:04:33 Sundar, IMHO os-vif/os-brick didn't call neutron or cinder.
15:05:20 I probably need to punt on this issue, just wanted to bring it up so y'all didn't get miles down the road and end up surprised if someone started waving a red flag.
15:05:51 In the currently proposed flow, os-acc --> plugin, and the plugin can do whatever it wants. For FPGAs, I think it would call Cyborg agent, which in turn can call the Cyborg conductor etc.
15:06:26 For Power plugin, there could be a common os-acc function which invokes cyborg client to update the API
15:06:38 I think API is the entry of cyborg.
15:07:12 If we call agent directly, some processes would be complex.
15:08:07 For vendor-specific actions, we need to delegate to the drivers, which would be through the agent.
15:08:12 We can reuse client if so.
15:08:52 Instead of initializing an RPC client and doing extra work.
15:09:21 should os-acc be calling the Cyborg API from the top instead of calling the Cyborg driver?
15:09:30 Sure, we can have the Cyborg agent use the cyborgclient too. Then we need to configure the REST API URL etc. but that gives a single entry point to update allocation/release
15:09:51 It is cyborg-api => cyborg-agent
15:10:35 i mean shall it be like: os-acc --> Cyborg api --> Cyborg agent ?
15:10:49 Yes, uncle li
15:11:22 Sundar, client is for restful api
15:11:34 Li_Liu: There is no need for such circuitous paths, and that probably won't work for Power. Eric doesn't want Cyborg to be in the middle for Power
15:11:36 But agent is an RPC server.
15:12:31 Folks, we need 2 things: vendor-specific actions (through Cyborg drivers for FPGAs etc.), and update the db (which can be through REST API)
15:12:46 why does Power not work in this case, efried?
15:12:47 I want code provided by cyborg to do everything that's not platform specific; and I want everything that's platform specific done by plugin code. That's all.
15:13:02 That needs to apply to *all* platforms, including libvirt.
15:13:17 There's a tendency to think of libvirt as being "common". It's not.
15:14:00 (It's common in the sense of "pervasive" - not common in the sense of "all code paths go through it")
15:14:07 I understand, I don't think going through os-acc --> Cyborg api --> Cyborg agent could be a problem for you
15:14:20 To me, that means: os-acc calls the plugin for all vendor/arch-specific actions. The os-acc or the plugin must call into Cyborg REST API to update the db.
15:14:29 ^ this
15:14:59 And I'm tending toward s/or the plugin//
15:15:16 but will have to see how the actual flows shake out.
15:15:21 efried: That is how the spec is written today
15:15:33 * efried sheepish
15:15:35 Sundar, that I agree, but calling Cyborg Agent/Driver directly from os-acc might be a problem
15:15:36 still need to review
15:16:30 Sundar,
15:16:43 Li Liu: os-acc --> plugin. The plugin possibly calls Cyborg agent, but always updates Cyborg db through REST API
15:17:10 It's not just updating the state of acc in db when attaching acc.
15:17:31 Something should be done by API
15:18:12 Why "by API"? All actions can be done by Cyborg agent/drivers or the plugin, except for updating db
15:18:18 E.g. programming
15:18:43 Yes, but as a management service.
15:18:48 even the programming function is now exposed by REST api
15:18:50 Does programming require platform-specific code?
15:19:00 I would kind of expect it does.
15:19:00 efried: Yes ^^
15:19:10 We must maintain other meta,
15:19:30 efried, in cyborg itself, not platform-specific code, in the driver plugin, yes
15:19:36 Quota usage, project or user info.
15:20:05 I would use the Cyborg driver on the node to do the programming. Using the compute node --> controller --> compute node loop relies on connectivity being intact throughout the operation, and has more failure modes
15:20:06 It is in oslo_context or handled by API
15:20:31 So what I would expect is that os-acc would have a 'program' method; that method would do anything generic (like maybe downloading the image? not sure) and then call into the plugin's 'program' method to do the platform-specific bits.
15:21:01 but I wouldn't expect the plugin to call the cyborg API for any part of that.
15:21:35 well, os-acc is not meant for attach/detach right now
15:21:48 programming is done in vendor device drivers
15:22:25 as programming process does not involve nova at all
15:22:38 "os-acc is not meant for attach/detach right now" ?
15:22:59 sorry
15:23:03 efried, Yes. If nova wouldn't call client, we should have program
15:23:16 typo. is meant for attach/detach right now... :(
15:23:46 The os-acc is an abstraction for Nova Compute to Cyborg, for all devices, not just FPGAs
15:24:10 It includes attach/detach and any associated actions like device-specific configuration and programming
15:24:32 The latter part is delegated to plugins, which in turn may delegate them to Cyborg drivers
15:25:03 So, os-acc does not have a 'program' primitive natively
15:25:36 That is left to any FPGA plugin. Agree with Li_Liu that it would be done via Cyborg drivers down the chain
15:26:51 One expectation seems to be that all programming has to go through REST API? If so, as I explained, that relies on connectivity and has more failure modes
15:27:07 Only bitstream fetching needs connectivity
15:28:26 Anyways, please review the spec :)
15:29:20 ok, I think we opened up more questions in os-acc spec, we can discuss them in the code review
15:29:55 Let's wrap up and end the meeting for now and continue the discussion offline
15:29:59 #endmeeting
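The split discussed in the os-acc topic (common os-acc code does everything that is not platform specific, including the Cyborg db update over REST; the plugin does everything that is platform specific) could be sketched roughly as below. This is only an illustration of that idea: every class, method and field name here is hypothetical, the client is a recording stand-in and not the real cyborgclient, and the actual spec may settle on a different shape.

```python
import abc


class AccPlugin(abc.ABC):
    """Hypothetical per-platform plugin interface (libvirt, PowerVM, ...)."""

    @abc.abstractmethod
    def plug(self, instance_uuid, van):
        """Do the platform-specific attach and return the updated VAN."""


class FakeLibvirtPlugin(AccPlugin):
    """Illustrative plugin; a real one might edit the domain XML instead."""

    def plug(self, instance_uuid, van):
        updated = dict(van)  # do not mutate the caller's object
        updated['platform_data'] = {'xml': '<hostdev/>'}
        return updated


class RecordingClient:
    """Stand-in for a Cyborg REST client; it only records requested updates."""

    def __init__(self):
        self.updates = []

    def update_accelerator(self, acc_uuid, **fields):
        self.updates.append((acc_uuid, fields))


def plug(plugin, client, instance_uuid, van):
    """Common (platform-independent) os-acc entry point, as assumed here."""
    # Platform-specific work is delegated entirely to the plugin.
    result = plugin.plug(instance_uuid, van)
    # The db update goes through the API from common code, not from the plugin.
    client.update_accelerator(van['uuid'], instance_uuid=instance_uuid)
    return result


client = RecordingClient()
van = plug(FakeLibvirtPlugin(), client, 'inst-1', {'uuid': 'acc-1'})
```

Keeping the client call in the common function follows the "s/or the plugin//" preference voiced in the meeting; moving it into the plugin would be the other option that was left open.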