14:04:15 <Li_Liu> #startmeeting openstack-cyborg
14:04:16 <openstack> Meeting started Wed Jun 27 14:04:15 2018 UTC and is due to finish in 60 minutes.  The chair is Li_Liu. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:04:17 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:04:19 <openstack> The meeting name has been set to 'openstack_cyborg'
14:04:46 <Li_Liu> #topic Roll Call
14:04:57 <Li_Liu> #info Li_Liu
14:04:58 <shaohe_feng> #shaohe_feng
14:05:05 <shaohe_feng> #info shaohe_feng
14:05:22 <efried> o/
14:05:28 <sum12> #info sum12
14:06:55 <Sundar> #info Sundar
14:07:04 <Li_Liu> Let's get started
14:07:24 <Li_Liu> Howard is not here today, I will run the meeting for him today
14:07:52 <Li_Liu> #topic os-acc discussion
14:08:25 <Li_Liu> Hi Sundar, do you want to lead this topic?
14:08:54 <Sundar> Sure, Li Liu
14:09:27 <Sundar> The os-acc spec was updated as indicated in #link http://lists.openstack.org/pipermail/openstack-dev/2018-June/131751.html
14:09:40 <Sundar> The reasons for the update are listed there as well
14:10:08 * efried owes y'all a review on that
14:10:12 <Sundar> The updated spec is in #link https://review.openstack.org/#/c/577438/
14:10:50 <Sundar> efried: One comment I got was to include a sequence diagram
14:10:56 <efried> yup
14:11:17 <Sundar> When I follow https://review.openstack.org/#/c/572583/11/specs/rocky/approved/reshape-provider-tree.rst, I get an error in RST syntax
14:11:31 <Sundar> The seqdiag directive is not known to standard RST editors
14:11:53 <Sundar> What do you do in addition to updating requirements.txt etc.?
14:11:55 <efried> Sundar: Right, you have to set up dependencies and other stuff in your env
14:12:05 <efried> hold on, let me find the patch where we added that stuff into nova-specs...
14:12:18 <Sundar> Yes, requirements.txt and doc/source/conf.py need to be updated
14:12:36 <efried> right, okay - so you did those things and it's still busted?
14:13:16 <Li_Liu> Sundar, quick question, do we still need a dedicated os-acc spec on top of this one you posted?
14:13:17 <Sundar> The standard RST editors like http://rst.ninjs.org/# wouldn't know about them. So, how do you check before a submit?
14:13:44 <efried> If you're building locally, you'll need to make sure the dep is installed into your venv.  You can do that by adding -r to your tox command, or you can enter the venv and install it manually via pip
14:14:25 <Sundar> OK, how do you run the tox command?
14:14:45 <efried> Sundar: How have you been running it thus far?
14:15:00 <efried> tox -e docs ?
14:15:28 <Sundar> So, tox -e -r docs?
14:15:31 <efried> So you would say  tox -re docs  instead.  That will rebuild the venv from scratch.  That's assuming your problem is the missing dependency.
14:15:48 <Sundar> Got it. Thanks, Eric.
14:15:55 <efried> No, -e is how you specify which testenv, so   tox -r -e docs   or   tox -re docs
14:15:56 <Sundar> Li Liu: I don't see why. We should be able to add any additional detail here
14:16:02 <efried> ++ ^
14:16:18 <Sundar> Sure, Eric
14:16:21 <efried> agree the goal should be to have all the necessary design detail here.
14:16:35 <Li_Liu> sure, that is what I figured
14:16:52 <efried> Sundar: This is assuming the problem was the missing dependency.  If you're still having problems, let me know and I'll see if I can help out.
14:16:53 <Sundar> Li Liu: We presumably need to add a bit more detail on how we can do a plugin for x86+FPGAs. Is that what you are looking for?
14:17:27 <Sundar> Thanks, Eric. I may take you up on that if things don't work out by today :)
14:17:43 <efried> we're still doing this in openstack/cyborg, huh?
14:17:48 <Li_Liu> yup
14:17:49 <efried> any plans to move to cyborg-specs?
14:18:03 <efried> would make things easier to find :)
14:18:39 <Sundar> I was told we have latitude to stick to the current location for this cycle.
14:18:57 <Li_Liu> that sounds good, let's get the current ones merged before we start moving
14:19:25 <Li_Liu> right
14:20:08 <Sundar> OK, next steps are to add a sequence diagram. Also, I would propose a plugin for x86 + FPGAs, and a separate one for x86 + GPUs. The main reason to stick to x86 is that that's all I know about :)
14:20:42 <Sundar> We can probably generalize to ARM if SR-IOV works the same way there. Power is an area where we need Eric's input
14:20:59 <Li_Liu> how much difference if it was arm or even power?
14:21:00 <efried> I don't think you should be proposing specific plugins via this spec.
14:21:06 <efried> just the framework.
14:21:24 <efried> The plugins themselves will ultimately be the responsibility of the vendors.
14:21:29 <efried> I would think.
14:21:58 <Sundar> For x86+FPGAs, we could have a community-supported plugin which calls into Cyborg/agent/driver for vendor-specific stuff
14:22:15 <Li_Liu> shall we provide a reference from community tho?
14:22:28 <Sundar> Li_Liu: yes ^^
14:23:06 <Sundar> Li_Liu: "how much difference if it was arm or even power?" Power seems to be quite different IIUC. Eric can expand on that
14:23:07 <shaohe_feng> entry points for driver.
14:24:32 <efried> Yeah, e.g. if the plugin has a "plug" operation, on libvirt this entails, what, editing the domain XML and doing stuff to special files under /dev?
14:25:01 <efried> In POWER, it entails making a REST call to the NovaLink API asking for an I/O slot to be attached to a logical partition (VM) and the platform does the magic.
14:25:07 <efried> Totally, totally different model.
14:25:25 <efried> and definitely not something the cyborg team should try to become domain experts on.
14:26:31 <Sundar> efried: Agreed. The plugin model should be able to handle such differences. The return value of the plug() itself is left open for that reason.
14:26:55 <Li_Liu> sounds good
14:27:01 <efried> I would think the return value would be an Acc* object
14:27:13 <Sundar> That is why it may make sense to have a x86/ARM/libvirt section that specifies what plug(), does
14:27:33 <efried> If it needs to contain platform-specific data, that goes in the *Profile bit, or something.
14:28:01 <Sundar> The return value can be a VAN object with enough data that is hypervisor-specific
14:28:20 <efried> Sundar: sorry, yeah, I guess we're calling it VAN now.
14:28:41 <Sundar> The port profiles in Neutron may not be applicable here IMHO
14:28:52 <efried> But again, I don't think your spec should go into any detail as to what each platform's plugin actually does.  You could mention it for the sake of example/understanding, but it should not be the role of this spec to lay out the details.
14:29:32 <Sundar> OK. If we agree that we will do a community-supported plugin for x86 + FPGAs, I can write a separate spec for that. Is that fine?
14:30:41 <efried> yes, that sounds like the right plan.
14:31:14 <efried> Sundar: update on the seqdiag: right now it looks like the cyborg repo isn't yet using the "new" process for doc builds.
14:31:33 <efried> So you'll want to put the requirement into test-requirements.txt instead of creating a new doc/requirements.txt
14:31:35 <Sundar> Li_Liu, shaohe, all: do we all agree on this plan?
14:31:39 <efried> or you can switch over to the new build process
14:32:02 <Sundar> Ah, I put it in cyborg/requirements.txt
14:32:35 <Sundar> That seems closest?
14:32:53 <Li_Liu> Sundar, i am fine
14:33:02 <efried> Right now your docs env is calling out test-requirements.txt
14:33:09 <efried> I don't actually see anything calling out requirements.txt, which is weird.
14:35:07 <efried> Sundar: Anyway, I'm noodling around with it locally, will let you know if I get something working.
14:35:23 <Li_Liu> let's move on
14:35:45 <Sundar> efried: Sorry, where do you see the reference to test-requirements.txt?
14:36:48 <Sundar> I mean, I see the file, but what in doc is referring to it?
14:37:02 <Li_Liu> Cyborg had a meetup in Beijing LC3 yesterday, shaohe_feng do you have anything to share?
14:38:50 <shaohe_feng> zhipengh[m] introduced some scenarios for cyborg
14:39:42 <Li_Liu> #topic LC3 meetup summary
14:43:52 <Li_Liu> ok, I will gather more information from the folks who attended the meetup and share it with the rest of the team
14:44:14 <Li_Liu> #topic AoB
14:44:32 <shaohe_feng> Do you know iflytek? Sundar, Li_Liu
14:44:59 <Li_Liu> I know them
14:45:18 <shaohe_feng> #link http://www.iflytek.com/en/index.html
14:45:18 <Sundar> This one? https://en.wikipedia.org/wiki/IFlytek
14:46:01 <shaohe_feng> It is a famous AI company in China.
14:46:08 <shaohe_feng> Sundar, yes.
14:47:04 <shaohe_feng> AI will be a cyborg scenario.
14:47:12 <shaohe_feng> they need accelerator
14:47:46 <shaohe_feng> then we discuss the current status of cyborg development.
14:47:47 <Sundar> GPUs?
14:48:06 <shaohe_feng> Sundar, Gpus and FPGA.
14:48:08 <wangzhh> FPGAs, also. :)
14:48:13 <Li_Liu> are they planning to put people on Cyborg?
14:48:41 <shaohe_feng> Not sure.
14:49:49 <shaohe_feng> Sundar, they develop logic on FPGA.
14:50:17 <shaohe_feng> we have a gap on cyborg API design.
14:50:31 <Sundar> Interesting. So, they have their own bitstreams. How are they using it -- like FPGA aaS?
14:52:13 <shaohe_feng> Not sure how they use it.
14:52:34 <shaohe_feng> Sundar, will check with them about it by mail and CC you.
14:52:42 <shaohe_feng> because we need to interact with nova.
14:53:13 <Sundar> shaohe: Ok
14:53:42 <shaohe_feng> wangzhh, introduce the whole flow for the interaction.
14:53:55 <shaohe_feng> they did a demo of it during the summit.
14:54:18 <wangzhh> OK. We had a discussion offline this afternoon.
14:54:34 <wangzhh> Let me introduce it.
14:54:49 <shaohe_feng> wangzhh, can you give more details on it?
14:55:31 <wangzhh> It is about the flow of creating, deleting and rebooting an instance with accelerators.
14:55:41 <wangzhh> Create:
14:55:54 <wangzhh> 1. nova-api (Create an instance with acc)
14:56:00 <wangzhh> 2. nova-conductor => nova-scheduler => placement-api (Get a node list and claim the accelerator unit)
14:56:05 <wangzhh> 3. nova-conductor => nova-compute
14:56:11 <wangzhh> Extra work we should do in nova:
14:56:19 <wangzhh> a) nova-compute => cyborgclient => cyborg-api (Update acc, instance_uuid, assignable, etc. Program here if it is an FPGA request)
14:56:27 <wangzhh> b) nova-compute passes parameters to os-acc (accelerator_address, acc_type, etc.) and gets an XML segment
14:57:36 <Sundar> Why do they do "nova-compute => cyborgclient => cyborg-api" for programming? We can do it entirely in compute node
14:58:14 <Sundar> We need to update the db, and that requires an RPC to the conductor
14:58:34 <Sundar> But, otherwise, it can all be in os-acc/Cyborg-agent/driver
14:58:36 <wangzhh> It is for 2 reasons.
14:58:37 <efried> I'll just mention that there was a nova core (it might have been Dan) who said nova-compute should never have to call the cyborg API; everything should be done via os-acc.
14:58:41 <efried> I could be misquoting too :(
15:00:09 <Sundar> wangzhh: Did they modify Cyborg?
15:00:41 <wangzhh> 1. In OpenStack, it is a common design to interact via the API. 2. We have things like quota updates in the API.
15:02:01 <Sundar> wangzhh: If Nova compute has to call the REST APIs of Neutron, Cinder, Cyborg, ... it can get out of hand. The os-acc/os-vif/os-brick is supposed to solve that problem
15:03:03 <wangzhh> efried, if so, could we implement it by os-acc => cyborgclient?
15:03:22 <efried> wangzhh: Yes, I would think so.
15:03:42 <Sundar> efried: How about Power? Would you have os-acc call Cyborg for that too?
15:04:09 <efried> Sundar: I would think any calls from os-acc to cyborg API would happen *outside* of the plugin, in the common os-acc code.
15:04:33 <wangzhh> Sundar, IMHO os-vif/os-brick didn't call neutron or cinder.
15:05:20 <efried> I probably need to punt on this issue, just wanted to bring it up so y'all didn't get miles down the road and end up surprised if someone started waving a red flag.
15:05:51 <Sundar> In the currently proposed flow, os-acc --> plugin, and the plugin can do whatever it wants. For FPGAs, I think it would call Cyborg agent, which in turn can call the Cyborg conductor etc.
15:06:26 <Sundar> For Power plugin, there could be a common os-acc function which invokes cyborg client to update the API
15:06:38 <wangzhh> I think API is the entry of  cyborg.
15:07:12 <wangzhh> If we call the agent directly, some processes would be complex.
15:08:07 <Sundar> For vendor-specific actions, we need to delegate to the drivers, which would be through the agent.
15:08:12 <wangzhh> We can reuse client if so.
15:08:52 <wangzhh> Instead of initializing an RPC client and doing extra work.
15:09:21 <Li_Liu> should os-acc be calling the Cyborg API from the top instead of calling the Cyborg driver?
15:09:30 <Sundar> Sure, we can have the Cyborg agent use the cyborgclient too. Then we need to configure the REST API URL etc. but that gives a single entry point to update allocation/release
15:09:51 <wangzhh> It is cyborg-api => cyborg-agent
15:10:35 <Li_Liu> I mean, shall it be like: os-acc --> Cyborg api --> Cyborg agent ?
15:10:49 <wangzhh> Yes, uncle li
15:11:22 <wangzhh> Sundar, client is for restful api
15:11:34 <Sundar> Li_Liu: There is no need for such circuitous paths, and that probably won't work for Power. Eric doesn't want Cyborg to be in the middle for Power
15:11:36 <wangzhh> But agent is a rpc server.
15:12:31 <Sundar> Folks, we need 2 things: vendor-specific actions (through Cyborg drivers for FPGAs etc.), and update the db (which can be through REST API)
15:12:46 <Li_Liu> why Power does not work in this case, efried?
15:12:47 <efried> I want code provided by cyborg to do everything that's not platform specific; and I want everything that's platform specific done by plugin code.  That's all.
15:13:02 <efried> That needs to apply to *all* platforms, including libvirt.
15:13:17 <efried> There's a tendency to think of libvirt as being "common".  It's not.
15:14:00 <efried> (It's common in the sense of "pervasive" - not common in the sense of "all code paths go through it")
15:14:07 <Li_Liu> I understand, I don't think going through  os-acc --> Cyborg api --> Cyborg agent could be a problem for you
15:14:20 <Sundar> To me, that means: os-acc calls the plugin for all vendor/arch-specific actions. The os-acc or the plugin must call into Cyborg REST API to update the db.
15:14:29 <efried> ^ this
15:14:59 <efried> And I'm tending toward s/or the plugin//
15:15:16 <efried> but will have to see how the actual flows shake out.
15:15:21 <Sundar> efried: That is how the spec is written today
15:15:33 * efried sheepish
15:15:35 <Li_Liu> Sundar, I agree with that, but calling the Cyborg Agent/Driver directly from os-acc might be a problem
15:15:36 <efried> still need to review
15:16:30 <wangzhh> Sundar,
15:16:43 <Sundar> Li Liu: os-acc --> plugin. The plugin possibly calls Cyborg agent, but always updates Cyborg db through REST API
15:17:10 <wangzhh> It's not just updating the state of the acc in the db when attaching an acc.
15:17:31 <wangzhh> Something should be done by API
15:18:12 <Sundar> Why "by API"? All actions can be done by Cyborg agent/drivers or the plugin, except for updating db
15:18:18 <Sundar> E.g. programming
15:18:43 <wangzhh> Yes, but as a management service.
15:18:48 <Li_Liu> even the programming function is now exposed by REST api
15:18:50 <efried> Does programming require platform-specific code?
15:19:00 <efried> I would kind of expect it does.
15:19:00 <Sundar> efried: Yes ^^
15:19:10 <wangzhh> We must maintain other meta,
15:19:30 <Li_Liu> efried, in cyborg itself, not platform-specific code, in the driver plugin, yes
15:19:36 <wangzhh> Quota usage, project or user info.
15:20:05 <Sundar> I would use the Cyborg driver on the node to do the programming. Using the compute node --> controller --> compute node loop relies on connectivity being intact throughout the operation, and has more failure modes
15:20:06 <wangzhh> It is in oslo_context or handled by the API
15:20:31 <efried> So what I would expect is that os-acc would have a 'program' method; that method would do anything generic (like maybe downloading the image? not sure) and then call into the plugin's 'program' method to do the platform-specific bits.
15:21:01 <efried> but I wouldn't expect the plugin to call the cyborg API for any part of that.
15:21:35 <Li_Liu> well, os-acc is not meant for attach/detach right now
15:21:48 <Li_Liu> programming is done in vendor device drivers
15:22:25 <Li_Liu> as programming process does not involve nova at all
15:22:38 <efried> "os-acc is not meant for attach/detach right now" ?
15:22:59 <Li_Liu> sorry
15:23:03 <wangzhh> efried, Yes. If nova wouldn't call the client, we should have 'program'
15:23:16 <Li_Liu> typo. is meant for attach/detach right now... :(
15:23:46 <Sundar> The os-acc is an abstraction for Nova Compute to Cyborg, for all devices, not just FPGAs
15:24:10 <Sundar> It includes attach/detach and any associated actions like device-specific configuration and programming
15:24:32 <Sundar> The latter part is delegated to plugins, which in turn may delegate them to Cyborg drivers
15:25:03 <Sundar> So, os-acc does not have a 'program' primitive natively
15:25:36 <Sundar> That is left to any FPGA plugin. Agree with Li_Liu that it would be done via Cyborg drivers down the chain
15:26:51 <Sundar> One expectation seems to be that all programming has to go through REST API? If so, as I explained, that relies on connectivity and has more failure modes
15:27:07 <Sundar> Only bitstream fetching needs connectivity
15:28:26 <Sundar> Anyways, please review the spec :)
15:29:20 <Li_Liu> ok, I think we opened up more questions in the os-acc spec, we can discuss them in the code review
15:29:55 <Li_Liu> Let's wrap up and end the meeting for now and continue the discussion offline
15:29:59 <Li_Liu> #endmeeting