14:04:15 <Li_Liu> #startmeeting openstack-cyborg
14:04:16 <openstack> Meeting started Wed Jun 27 14:04:15 2018 UTC and is due to finish in 60 minutes. The chair is Li_Liu. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:04:17 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:04:19 <openstack> The meeting name has been set to 'openstack_cyborg'
14:04:46 <Li_Liu> #topic Roll Call
14:04:57 <Li_Liu> #info Li_Liu
14:04:58 <shaohe_feng> #shaohe_feng
14:05:05 <shaohe_feng> #info shaohe_feng
14:05:22 <efried> ō/
14:05:28 <sum12> #info sum12
14:06:55 <Sundar> #info Sundar
14:07:04 <Li_Liu> Let's get started
14:07:24 <Li_Liu> Howard is not here today, so I will run the meeting for him today
14:07:52 <Li_Liu> #topic os-acc discussion
14:08:25 <Li_Liu> Hi Sundar, do you want to lead this topic?
14:08:54 <Sundar> Sure, Li Liu
14:09:27 <Sundar> The os-acc spec was updated as indicated in #link http://lists.openstack.org/pipermail/openstack-dev/2018-June/131751.html
14:09:40 <Sundar> The reasons for the update are listed there as well
14:10:08 * efried owes y'all a review on that
14:10:12 <Sundar> The updated spec is in #link https://review.openstack.org/#/c/577438/
14:10:50 <Sundar> efried: One comment I got was to include a sequence diagram
14:10:56 <efried> yup
14:11:17 <Sundar> When I follow https://review.openstack.org/#/c/572583/11/specs/rocky/approved/reshape-provider-tree.rst, I get an error in RST syntax
14:11:31 <Sundar> The seqdiag directive is not known to standard RST editors
14:11:53 <Sundar> What do you do in addition to updating requirements.txt etc.?
14:11:55 <efried> Sundar: Right, you have to set up dependencies and other stuff in your env
14:12:05 <efried> hold on, let me find the patch where we added that stuff into nova-specs...
14:12:18 <Sundar> Yes, requirements.txt and doc/source/conf.py need to be updated
14:12:36 <efried> right, okay - so you did those things and it's still busted?
14:13:16 <Li_Liu> Sundar, quick question, do we still need a dedicated os-acc spec on top of this one you posted?
14:13:17 <Sundar> The standard RST editors like http://rst.ninjs.org/# wouldn't know about them. So, how do you check before a submit?
14:13:44 <efried> If you're building locally, you'll need to make sure the dep is installed into your venv. You can do that by adding -r to your tox command, or you can enter the venv and install it manually via pip
14:14:25 <Sundar> OK, how do you run the tox command?
14:14:45 <efried> Sundar: How have you been running it thus far?
14:15:00 <efried> tox -e docs ?
14:15:28 <Sundar> So, tox -e -r docs?
14:15:31 <efried> So you would say tox -re docs instead. That will rebuild the venv from scratch. That's assuming your problem is the missing dependency.
14:15:48 <Sundar> Got it. Thanks, Eric.
14:15:55 <efried> No, -e is how you specify which testenv, so tox -r -e docs or tox -re docs
14:15:56 <Sundar> Li Liu: I don't see why. We should be able to add any additional detail here
14:16:02 <efried> ++ ^
14:16:18 <Sundar> Sure, Eric
14:16:21 <efried> agree the goal should be to have all the necessary design detail here.
14:16:35 <Li_Liu> sure, that is what I figured
14:16:52 <efried> Sundar: This is assuming the problem was the missing dependency. If you're still having problems, let me know and I'll see if I can help out.
14:16:53 <Sundar> Li Liu: We presumably need to add a bit more detail on how we can do a plugin for x86+FPGAs. Is that what you are looking for?
14:17:27 <Sundar> Thanks, Eric. I may take you up on that if things don't work out by today :)
14:17:43 <efried> we're still doing this in openstack/cyborg, huh?
14:17:48 <Li_Liu> yup
14:17:49 <efried> any plans to move to cyborg-specs?
14:18:03 <efried> would make things easier to find :)
14:18:39 <Sundar> I was told we have latitude to stick to the current location for this cycle.
14:18:57 <Li_Liu> that sounds good, let's get the current ones merged before we start moving
14:19:25 <Li_Liu> right
14:20:08 <Sundar> OK, next steps are to add a sequence diagram. Also, I would propose a plugin for x86 + FPGAs, and a separate one for x86 + GPUs. The main reason to stick to x86 is that that's all I know about :)
14:20:42 <Sundar> We can probably generalize to ARM if SR-IOV works the same way there. Power is an area where we need Eric's input
14:20:59 <Li_Liu> how much difference if it was arm or even power?
14:21:00 <efried> I don't think you should be proposing specific plugins via this spec.
14:21:06 <efried> just the framework.
14:21:24 <efried> The plugins themselves will ultimately be the responsibility of the vendors.
14:21:29 <efried> I would think.
14:21:58 <Sundar> For x86+FPGAs, we could have a community-supported plugin which calls into Cyborg/agent/driver for vendor-specific stuff
14:22:15 <Li_Liu> shall we provide a reference from community tho?
14:22:28 <Sundar> Li_Liu: yes ^^
14:23:06 <Sundar> Li_Liu: "how much difference if it was arm or even power?" Power seems to be quite different IIUC. Eric can expand on that
14:23:07 <shaohe_feng> entry points for driver.
14:24:32 <efried> Yeah, e.g. if the plugin has a "plug" operation, on libvirt this entails, what, editing the domain XML and doing stuff to special files under /dev?
14:25:01 <efried> In POWER, it entails making a REST call to the NovaLink API asking for an I/O slot to be attached to a logical partition (VM) and the platform does the magic.
14:25:07 <efried> Totally, totally different model.
14:25:25 <efried> and definitely not something the cyborg team should try to become domain experts on.
14:26:31 <Sundar> efried: Agreed. The plugin model should be able to handle such differences. The return value of the plug() itself is left open for that reason.
14:26:55 <Li_Liu> sounds good
14:27:01 <efried> I would think the return value would be an Acc* object
14:27:13 <Sundar> That is why it may make sense to have a x86/ARM/libvirt section that specifies what plug() does
14:27:33 <efried> If it needs to contain platform-specific data, that goes in the *Profile bit, or something.
14:28:01 <Sundar> The return value can be a VAN object with enough data that is hypervisor-specific
14:28:20 <efried> Sundar: sorry, yeah, I guess we're calling it VAN now.
14:28:41 <Sundar> The port profiles in Neutron may not be applicable here IMHO
14:28:52 <efried> But again, I don't think your spec should go into any detail as to what each platform's plugin actually does. You could mention it for the sake of example/understanding, but it should not be the role of this spec to lay out the details.
14:29:32 <Sundar> OK. If we agree that we will do a community-supported plugin for x86 + FPGAs, I can write a separate spec for that. Is that fine?
14:30:41 <efried> yes, that sounds like the right plan.
14:31:14 <efried> Sundar: update on the seqdiag: right now it looks like the cyborg repo isn't yet using the "new" process for doc builds.
14:31:33 <efried> So you'll want to put the requirement into test-requirements.txt instead of creating a new doc/requirements.txt
14:31:35 <Sundar> Li_Liu, shaohe, all: do we all agree on this plan?
14:31:39 <efried> or you can switch over to the new build process
14:32:02 <Sundar> Ah, I put it in cyborg/requirements.txt
14:32:35 <Sundar> That seems closest?
14:32:53 <Li_Liu> Sundar, i am fine
14:33:02 <efried> Right now your docs env is calling out test-requirements.txt
14:33:09 <efried> I don't actually see anything calling out requirements.txt, which is weird.
14:35:07 <efried> Sundar: Anyway, I'm noodling around with it locally, will let you know if I get something working.
14:35:23 <Li_Liu> let's move on
14:35:45 <Sundar> efried: Sorry, where do you see the reference to test-requirements.txt?
14:36:48 <Sundar> I mean, I see the file but what in doc is referring to it?
14:37:02 <Li_Liu> Cyborg had a meetup in Beijing LC3 yesterday, shaohe_feng do you have anything to share?
14:38:50 <shaohe_feng> zhipengh[m], introduce some scenario about cyborg
14:39:42 <Li_Liu> #topic LC3 meetup summary
14:43:52 <Li_Liu> ok, I will gather more information from the folks who attended the meetup and share with the rest of the team
14:44:09 <Li_Liu> #topc AoB
14:44:14 <Li_Liu> #topic AoB
14:44:32 <shaohe_feng> Do you know iflytek? Sundar, Li_Liu
14:44:59 <Li_Liu> I know them
14:45:18 <shaohe_feng> #link http://www.iflytek.com/en/index.html
14:45:18 <Sundar> This one? https://en.wikipedia.org/wiki/IFlytek
14:46:01 <shaohe_feng> It is a famous AI company in China.
14:46:08 <shaohe_feng> Sundar, yes.
14:47:04 <shaohe_feng> AI will be a cyborg scenario.
14:47:12 <shaohe_feng> they need accelerators
14:47:46 <shaohe_feng> then we discussed the current status of cyborg development.
14:47:47 <Sundar> GPUs?
14:48:06 <shaohe_feng> Sundar, GPUs and FPGAs.
14:48:08 <wangzhh> FPGAs, also. :)
14:48:13 <Li_Liu> are they planning to put people on Cyborg?
14:48:41 <shaohe_feng> Not sure.
14:49:49 <shaohe_feng> Sundar, they develop logic on FPGA.
14:50:17 <shaohe_feng> we have a gap on cyborg API design.
14:50:31 <Sundar> Interesting. So, they have their own bitstreams. How are they using it -- like FPGA aaS?
14:52:13 <shaohe_feng> Not sure how they use it.
14:52:34 <shaohe_feng> Sundar, I will check with them about it by mail and CC you.
14:52:42 <shaohe_feng> because we need to interact with nova.
14:53:13 <Sundar> shaohe: Ok
14:53:42 <shaohe_feng> wangzhh, introduce the whole flow for the interaction.
14:53:55 <shaohe_feng> they did a demo of it during the summit.
14:54:18 <wangzhh> OK. We had a discussion offline this afternoon.
14:54:34 <wangzhh> Let me introduce it.
14:54:49 <shaohe_feng> wangzhh, can you give more details on it?
14:55:31 <wangzhh> It is about the flow of creating, deleting and rebooting an instance with accelerators.
14:55:41 <wangzhh> Create:
14:55:54 <wangzhh> 1. nova-api (Create an instance with acc)
14:56:00 <wangzhh> 2. nova-conductor => nova-scheduler => placement-api (Get a node list and claim the accelerator unit)
14:56:05 <wangzhh> 3. nova-conductor => nova-compute
14:56:11 <wangzhh> Extra work we should do in nova:
14:56:19 <wangzhh> a) nova-compute => cyborgclient => cyborg-api (Update acc, instance_uuid, assignable, etc. Program here if it is a request for an FPGA)
14:56:27 <wangzhh> b) nova-compute passes parameters to os-acc (accelerator_address, acc_type, etc.) and gets an XML segment
14:57:36 <Sundar> Why do they do "nova-compute => cyborgclient => cyborg-api" for programming? We can do it entirely in compute node
14:58:14 <Sundar> We need to update the db, and that requires an RPC to the conductor
14:58:34 <Sundar> But, otherwise, it can all be in os-acc/Cyborg-agent/driver
14:58:36 <wangzhh> It is for 2 reasons.
14:58:37 <efried> I'll just mention that there was a nova core (it might have been Dan) who said nova-compute should never have to call the cyborg API; everything should be done via os-acc.
14:58:41 <efried> I could be misquoting too :(
15:00:09 <Sundar> wangzhh: Did they modify Cyborg?
15:00:41 <wangzhh> 1. In OpenStack, it is a common design to interact via the API. 2. We have things like quota updates in the API.
15:02:01 <Sundar> wangzhh: If Nova compute has to call the REST APIs of Neutron, Cinder, Cyborg, ... it can get out of hand. The os-acc/os-vif/os-brick is supposed to solve that problem
15:03:03 <wangzhh> efried, if so, could we implement it by os-acc =》 cyborgclient?
15:03:22 <efried> wangzhh: Yes, I would think so.
15:03:42 <Sundar> efried: How about Power? Would you have os-acc call Cyborg for that too?
15:04:09 <efried> Sundar: I would think any calls from os-acc to cyborg API would happen *outside* of the plugin, in the common os-acc code.
15:04:33 <wangzhh> Sundar, IMHO os-vif/os-brick didn't call neutron or cinder.
15:05:20 <efried> I probably need to punt on this issue, just wanted to bring it up so y'all didn't get miles down the road and end up surprised if someone started waving a red flag.
15:05:51 <Sundar> In the currently proposed flow, os-acc --> plugin, and the plugin can do whatever it wants. For FPGAs, I think it would call Cyborg agent, which in turn can call the Cyborg conductor etc.
15:06:26 <Sundar> For Power plugin, there could be a common os-acc function which invokes cyborg client to update the API
15:06:38 <wangzhh> I think the API is the entry point of cyborg.
15:07:12 <wangzhh> If we call the agent directly, some processes would be complex.
15:08:07 <Sundar> For vendor-specific actions, we need to delegate to the drivers, which would be through the agent.
15:08:12 <wangzhh> We can reuse the client if so.
15:08:52 <wangzhh> Instead of initializing an RPC client and doing extra work.
15:09:21 <Li_Liu> should os-acc be calling the Cyborg API from the top instead of calling the Cyborg driver?
15:09:30 <Sundar> Sure, we can have the Cyborg agent use the cyborgclient too. Then we need to configure the REST API URL etc. but that gives a single entry point to update allocation/release
15:09:51 <wangzhh> It is cyborg-api =》 cyborg-agent
15:10:35 <Li_Liu> I mean, shall it be like: os-acc --> Cyborg api --> Cyborg agent ?
15:10:49 <wangzhh> Yes, uncle li
15:11:22 <wangzhh> Sundar, the client is for the RESTful API
15:11:34 <Sundar> Li_Liu: There is no need for such circuitous paths, and that probably won't work for Power. Eric doesn't want Cyborg to be in the middle for Power
15:11:36 <wangzhh> But the agent is an RPC server.
15:12:31 <Sundar> Folks, we need 2 things: vendor-specific actions (through Cyborg drivers for FPGAs etc.), and update the db (which can be through REST API)
15:12:46 <Li_Liu> why does Power not work in this case, efried?
15:12:47 <efried> I want code provided by cyborg to do everything that's not platform specific; and I want everything that's platform specific done by plugin code. That's all.
15:13:02 <efried> That needs to apply to *all* platforms, including libvirt.
15:13:17 <efried> There's a tendency to think of libvirt as being "common". It's not.
15:14:00 <efried> (It's common in the sense of "pervasive" - not common in the sense of "all code paths go through it")
15:14:07 <Li_Liu> I understand, I don't think going through os-acc --> Cyborg api --> Cyborg agent could be a problem for you
15:14:20 <Sundar> To me, that means: os-acc calls the plugin for all vendor/arch-specific actions. The os-acc or the plugin must call into Cyborg REST API to update the db.
15:14:29 <efried> ^ this
15:14:59 <efried> And I'm tending toward s/or the plugin//
15:15:16 <efried> but will have to see how the actual flows shake out.
15:15:21 <Sundar> efried: That is how the spec is written today
15:15:33 * efried sheepish
15:15:35 <Li_Liu> Sundar, that I agree with, but calling Cyborg Agent/Driver directly from os-acc might be a problem
15:15:36 <efried> still need to review
15:16:30 <wangzhh> Sundar,
15:16:43 <Sundar> Li Liu: os-acc --> plugin. The plugin possibly calls Cyborg agent, but always updates Cyborg db through REST API
15:17:10 <wangzhh> It's not just updating the state of acc in db when attaching acc.
15:17:31 <wangzhh> Something should be done by API
15:18:12 <Sundar> Why "by API"? All actions can be done by Cyborg agent/drivers or the plugin, except for updating db
15:18:18 <Sundar> E.g. programming
15:18:43 <wangzhh> Yes, but as a management service.
15:18:48 <Li_Liu> even the programming function is now exposed by REST api
15:18:50 <efried> Does programming require platform-specific code?
15:19:00 <efried> I would kind of expect it does.
15:19:00 <Sundar> efried: Yes ^^
15:19:10 <wangzhh> We must maintain other meta,
15:19:30 <Li_Liu> efried, in cyborg itself, not platform-specific code, in the driver plugin, yes
15:19:36 <wangzhh> Quota usage, project or user info.
15:20:05 <Sundar> I would use the Cyborg driver on the node to do the programming. Using the compute node --> controller --> compute node loop relies on connectivity being intact throughout the operation, and has more failure modes
15:20:06 <wangzhh> It is in oslo_context or handled by API
15:20:31 <efried> So what I would expect is that os-acc would have a 'program' method; that method would do anything generic (like maybe downloading the image? not sure) and then call into the plugin's 'program' method to do the platform-specific bits.
15:21:01 <efried> but I wouldn't expect the plugin to call the cyborg API for any part of that.
15:21:35 <Li_Liu> well, os-acc is not meant for attach/detach right now
15:21:48 <Li_Liu> programming is done in vendor device drivers
15:22:25 <Li_Liu> as programming process does not involve nova at all
15:22:38 <efried> "os-acc is not meant for attach/detach right now" ?
15:22:59 <Li_Liu> sorry
15:23:03 <wangzhh> efried, Yes. If nova wouldn't call client, we should have program
15:23:16 <Li_Liu> typo. is meant for attach/detach right now... :(
15:23:46 <Sundar> The os-acc is an abstraction for Nova Compute to Cyborg, for all devices, not just FPGAs
15:24:10 <Sundar> It includes attach/detach and any associated actions like device-specific configuration and programming
15:24:32 <Sundar> The latter part is delegated to plugins, which in turn may delegate them to Cyborg drivers
15:25:03 <Sundar> So, os-acc does not have a 'program' primitive natively
15:25:36 <Sundar> That is left to any FPGA plugin. Agree with Li_Liu that it would be done via Cyborg drivers down the chain
15:26:51 <Sundar> One expectation seems to be that all programming has to go through REST API? If so, as I explained, that relies on connectivity and has more failure modes
15:27:07 <Sundar> Only bitstream fetching needs connectivity
15:28:26 <Sundar> Anyways, please review the spec :)
15:29:20 <Li_Liu> ok, I think we opened up more questions in os-acc spec, we can discuss them in the code review
15:29:55 <Li_Liu> Let's wrap up and end the meeting for now and continue the discussion offline
15:29:59 <Li_Liu> #endmeeting
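
Editor's note: the split that Sundar and efried converged on at 15:14 (os-acc calls the plugin for all vendor/arch-specific actions, and the Cyborg db is updated through the REST API, ideally from common os-acc code) could be sketched roughly as below. This is only an illustration under stated assumptions. The names `AccPlugin`, `VAN`, `OsAcc`, `LibvirtFpgaPlugin`, `FakeCyborgClient` and `update_accelerator` are hypothetical, and nothing here is taken from the os-acc spec or from the real cyborgclient library.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class VAN:
    """Hypothetical plug() result; platform-specific data lives in `details`."""
    accelerator_id: str
    instance_uuid: str
    details: dict = field(default_factory=dict)


class AccPlugin(ABC):
    """Hypothetical plugin interface: everything platform-specific goes here."""

    @abstractmethod
    def plug(self, accelerator_id: str, instance_uuid: str) -> VAN:
        ...


class LibvirtFpgaPlugin(AccPlugin):
    """Illustrative libvirt-style plugin that returns an XML fragment."""

    def plug(self, accelerator_id: str, instance_uuid: str) -> VAN:
        # A real plugin might call the Cyborg agent/driver here (assumption).
        xml = f"<hostdev id='{accelerator_id}'/>"
        return VAN(accelerator_id, instance_uuid, {"xml": xml})


class FakeCyborgClient:
    """Stand-in for a REST client; it only records the requested updates."""

    def __init__(self) -> None:
        self.updates = []

    def update_accelerator(self, accelerator_id: str, **fields) -> None:
        self.updates.append((accelerator_id, fields))


class OsAcc:
    """Common, platform-independent code path."""

    def __init__(self, plugin: AccPlugin, cyborg_client: FakeCyborgClient) -> None:
        self.plugin = plugin
        self.cyborg_client = cyborg_client

    def attach(self, accelerator_id: str, instance_uuid: str) -> VAN:
        # Platform-specific work is delegated to the plugin.
        van = self.plugin.plug(accelerator_id, instance_uuid)
        # The db update goes through the API, outside of the plugin.
        self.cyborg_client.update_accelerator(
            accelerator_id, instance_uuid=instance_uuid, assignable=False
        )
        return van
```

In this sketch a Power plugin would implement the same `plug()` differently (for example with a NovaLink call, as efried described) while `OsAcc.attach()` stays unchanged.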