14:00:42 <zhipeng> #startmeeting openstack-cyborg
14:00:44 <openstack> Meeting started Wed Jun  6 14:00:42 2018 UTC and is due to finish in 60 minutes.  The chair is zhipeng. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:45 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:47 <openstack> The meeting name has been set to 'openstack_cyborg'
14:00:55 <zhipeng> #topic Roll Call
14:00:59 <zhipeng> #info Howard
14:03:11 <Sundar> #info Sundar
14:05:54 <sum12> #info sum12
14:08:23 <NokMikeR> #info Mike
14:11:31 <zhipeng> okey let's start
14:11:36 <zhipeng> we have only one topic today :)
14:11:45 <zhipeng> #topic rocky spec finalization
14:12:59 <zhipeng> let's start from quota spec
14:13:07 <zhipeng> which xinran__ has been working on
14:13:07 <Sundar> For https://review.openstack.org/#/c/554717/, we probably need at least one +1 from Nova
14:13:40 <zhipeng> Sundar got it :)
14:13:43 <zhipeng> we'll get to that
14:13:53 <xinran__> #info xinran__
14:14:07 <Sundar> Sure, zhipeng. NP
14:14:22 <zhipeng> #link https://review.openstack.org/560285
14:14:55 <xinran__> Hi, for the quota spec, do you think we should implement this on the API layer?
14:15:06 <Sundar> For quotas, we have had a discussion with Nova folks: http://lists.openstack.org/pipermail/openstack-dev/2018-May/130563.html
14:15:22 <Sundar> It doesn't look like there is any foolproof way to enforce quotas today
14:15:35 <Sundar> So, you guys can make a call
14:15:41 <zhipeng> yes this is what I understand as well
14:15:47 <xinran__> I mean when there is a new api request, we should check/update quota
14:16:43 <xinran__> It also depends on how cyborg interacts with nova.
14:17:10 <zhipeng> xinran__ what is your current proposal ?
14:17:13 <xinran__> If nova calls the agent directly, we should change it
14:17:24 <xinran__> On api layer
14:17:37 <zhipeng> i'm inclined to that option
14:17:53 <zhipeng> i don't see a good scenario for nova to call cyborg-agent directly
14:18:23 <xinran__> Yes
14:18:39 <shaohe_feng> why does nova need to call the agent?
14:19:07 <xinran__> the current nova/cyborg interaction calls the agent; I am not sure about this
14:19:25 <xinran__> What do you guys think?
14:19:31 <zhipeng> no atm we have conductor interact with placement
14:19:33 <zhipeng> that is all
14:19:44 <zhipeng> there will be/should be the api layer interaction
14:19:48 <xinran__> I mean the current spec :)
14:19:49 <zhipeng> but we are not there yet
14:19:58 <shaohe_feng> oh, if nova calls cyborg-agent, there is a problem.
14:20:22 <zhipeng> I would suggest to have it on the api layer
14:20:28 <Sundar> Xinran, can you clarify which spec says Nova compute does or should call into Cyborg agent?
14:21:00 <xinran__> Let me find it
14:21:26 <zhipeng> okey for the spec itself in the current shape
14:21:30 <zhipeng> everyone happy with it ?
14:22:53 <Sundar> zhipeng, the present scheme is not ideal, but I have no objections, since I know it is a priority for others. :)
14:23:08 <Sundar> We can improve upon it over time
14:23:17 <zhipeng> Sundar agree :)
14:23:22 <shaohe_feng> IMO, cyborg can do lazy quota.
14:23:32 <zhipeng> so then let's mark the spec ok to go
14:23:41 <Sundar> Agreed
14:23:42 <xinran__> https://review.openstack.org/#/c/566798/5/doc/specs/rocky/compute-node.rst
14:23:51 <zhipeng> #action quota spec https://review.openstack.org/#/c/560285/ ready
14:24:22 <xinran__> What do you mean by lazy quota, shaohe_feng?
14:25:04 <shaohe_feng> which means, even if the scheduler passes, cyborg will still refuse the accelerator request if the quota check fails.
14:25:06 <Sundar> Xinran, compute-node (os-acc) spec does not say that Nova compute will call directly into Cyborg agent. It says that Nova compute should call into os-acc, which should call into Cyborg agent.
14:25:37 <shaohe_feng> os-acc can call cyborg agent?
14:26:13 <zhipeng> guys this is another problem
14:26:18 <zhipeng> let's move on :)
14:26:25 <zhipeng> we have limited time for a lot of specs
14:26:36 <shaohe_feng> Does that mean nova will call cyborg agent without the api?
14:26:55 <zhipeng> moving on
14:27:10 <zhipeng> Li Liu's two patches on metadata and programming
14:27:18 <shaohe_feng> xinran__, lazy quota, just performance issue.
14:27:38 <xinran__> shaohe_feng:  ok got it
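
A minimal sketch of the quota idea discussed above: the check runs at the API layer when a request arrives, so a request can still be refused after scheduling has passed. Every name here (QuotaTracker, handle_accelerator_request, the status codes) is hypothetical and is not taken from the quota spec or from Cyborg's code.

```python
from dataclasses import dataclass, field


class QuotaExceeded(Exception):
    """Raised when a project asks for more accelerators than its limit."""


@dataclass
class QuotaTracker:
    # Hypothetical per-project limits and usage; not Cyborg's real data model.
    limits: dict = field(default_factory=dict)
    usage: dict = field(default_factory=dict)

    def reserve(self, project_id, count=1):
        """Check the limit and record the usage in one step."""
        used = self.usage.get(project_id, 0)
        limit = self.limits.get(project_id, 0)
        if used + count > limit:
            raise QuotaExceeded(f"{project_id}: {used} + {count} > {limit}")
        self.usage[project_id] = used + count

    def release(self, project_id, count=1):
        """Give back usage when an accelerator is freed."""
        used = self.usage.get(project_id, 0)
        self.usage[project_id] = max(0, used - count)


def handle_accelerator_request(tracker, project_id, count=1):
    """Hypothetical API-layer handler: quota is checked here, after scheduling."""
    try:
        tracker.reserve(project_id, count)
    except QuotaExceeded:
        # The scheduler may already have picked a host; the request is still refused.
        return {"status": 403, "project": project_id}
    return {"status": 202, "project": project_id}
```

Keeping the check in one API-layer function means any caller that bypasses the API also bypasses the quota, which is the concern raised about calling the agent directly.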
14:27:51 <zhipeng> #link https://review.openstack.org/558265
14:28:03 <zhipeng> any further comment on the metadata spec ?
14:28:24 <xinran__> Sundar:  by “directly” I mean not passing through the API layer
14:28:46 <Sundar> I thought Li Liu added a function name. Looking for it
14:29:05 <zhipeng> also Melissa
14:29:21 <zhipeng> @Guest24200
14:29:31 <zhipeng> has the Xilinx team also gone over the spec ?
14:32:21 <zhipeng> Sundar has Li Liu addressed your comment ?
14:32:28 <NokMikeR> what bootstraps the driver instance in the first place? e.g. if a driver is requested that requires an additional tool, like a driver daemon from the FPGA vendor, to already be in place.
14:32:42 <Sundar> During last spec day, Li Liu and I agreed to add a function name to the bitstream spec as an optional property. #link http://eavesdrop.openstack.org/irclogs/%23openstack-cyborg/%23openstack-cyborg.2018-05-09.log.html#t2018-05-09T19:00:22
14:33:37 <Sundar> This is also documented in the scheduling spec #link https://review.openstack.org/#/c/554717/
14:34:06 <Sundar> Can we get that addressed?
14:34:26 <zhipeng> sure
14:34:53 <zhipeng> #action metadata spec https://review.openstack.org/558265 to add a function name
14:35:16 <zhipeng> other than that there should be no problems right ?
14:36:20 <Sundar> Zhipeng, yes. Thanks
14:36:36 <zhipeng> sounds great :)
14:37:06 <zhipeng> #link https://review.openstack.org/#/c/559395/
14:37:10 <zhipeng> programming
14:37:36 <zhipeng> i think other than Zuul, we should be happy about this now
14:37:40 <zhipeng> :P
14:38:33 <Sundar> I don't see any issue with the basics. We don't have an end-to-end flow that uses this. I guess this is meant to be a standalone API?
14:39:43 <zhipeng> yep
14:40:32 <Sundar> OK. We can tweak this later as needed, when we need to define end-to-end flows. With that understanding, we can approve it as it stands. Sounds good?
14:41:06 <zhipeng> shaohe_feng and others ?
14:41:21 <zhipeng> at least from the review I see Li Liu had addressed all the comments
14:42:12 <shaohe_feng> zhipeng, OK, it looks good to me
14:43:33 <zhipeng> cool
14:43:46 <zhipeng> #action programming spec https://review.openstack.org/#/c/559395/ ready to go
14:44:40 <zhipeng> okey let's go to Sundar's four specs
14:44:53 <zhipeng> Sundar which one do you think is ready to go ?
14:45:03 <Sundar> All of them :)
14:45:17 <zhipeng> haha great
14:45:20 <Sundar> I will still request Nova folks to do a +1 on scheduling spec.
14:45:37 <zhipeng> yes that one has gone through a lot
14:45:48 <Sundar> For os-acc also, we should probably get Nova ok, right?
14:46:42 <shaohe_feng> os-acc will call cyborg-agent?
14:46:55 <shaohe_feng> and nova will call os-acc?
14:46:55 <zhipeng> https://review.openstack.org/#/c/566798/
14:46:59 <zhipeng> this one right ?
14:47:23 <Sundar> zhipeng, yes
14:47:25 <zhipeng> shaohe_feng I think nova-compute calls os-acc to do the attach/detach
14:47:58 <zhipeng> but I think the original goal of os-acc is to serve as a library
14:48:06 <Sundar> shaohe: Yes, as zhipeng says.
14:48:31 <zhipeng> Sundar does nova-compute also call os-brick or os-vif ?
14:48:32 <Sundar> zhipeng, yes. It is still a library that Cyborg provides, like os-vif for Neutron
14:49:20 <Sundar> zhipeng, yes, nova compute calls into os-vif -- plug(), unplug() API
14:49:25 <Sundar> I'll try to get a link
14:49:37 <shaohe_feng> zhipeng, so that means nova will call agent by attach/detach directly?
14:50:05 <shaohe_feng> is there no race for cyborg API and os-acc?
14:50:29 <zhipeng> i think for the nova scenario, which means accelerator attachment for the VM
14:50:52 <zhipeng> the attach should be issued by nova-compute
14:50:58 <zhipeng> however for the baremetal usecase
14:51:07 <zhipeng> it should go through cyborg-api
14:51:08 <shaohe_feng> Sundar, does the os-vif plug API call the neutron agent?
14:51:50 <zhipeng> i think we should target os-acc for VM usecase for Rocky
14:52:12 <zhipeng> which means os-acc calls the cyborg-agent directly to call upon the driver
14:52:26 <Sundar> That is my understanding. We can double check. But, please note that the spec doesn't make implementation commitments yet. We can implement it the same way as os-vif
14:52:28 <zhipeng> (and the driver invokes the bus protocols)
14:52:54 <Sundar> zhipeng: Agreed :)
14:52:56 <shaohe_feng> edleafe, can os-vif call the neutron agent directly by RPC, bypassing the API?
14:53:00 <Sundar> shaohe: I think you may be referring to scenarios like the one where Cyborg API is called to program a region, which is already in use?
14:53:42 <zhipeng> i think the attach/detach does not make assumption on the operation
14:53:47 <wangzhh> Hi, I'm confused. Could anyone explain. When nova-compute call cyborg by os-acc,  It will call by http(API) or just by rpc(message queen)?
14:53:50 <zhipeng> no matter a region needs to be programmed or not
14:54:02 <zhipeng> if attach, it presumes the device is ready
14:54:33 <wangzhh> Or both of them?
14:54:35 <zhipeng> wangzhh the current thinking is via rpc to call cyborg-agent in order to get to the driver
14:54:47 <zhipeng> no api involved in this scenario
14:55:21 <wangzhh> OK. Thx.
14:55:29 <Sundar> Yes, agreed with zhipeng
14:55:48 <Sundar> Here are the interfaces exposed by os-vif: https://github.com/openstack/os-vif/blob/master/os_vif/__init__.py
14:56:04 <zhipeng> it means that this is a VM related operation and we assume Nova has the necessary privileges
14:56:04 <Sundar> I will try to locate how nova compute calls it
14:56:30 <zhipeng> we could focus on the details later, whether the lib itself could be called
14:56:45 <zhipeng> or something implementing the lib interfaces should be called
14:57:35 <zhipeng> okey so everyone good on the os-acc spec ?
14:57:35 <Sundar> Nova compute calls the initialize API of os-vif directly here: https://github.com/openstack/nova/blob/master/nova/cmd/compute.py#L49
14:59:05 <shaohe_feng> what does os_vif.initialize do?
14:59:27 <shaohe_feng> will it set up RPC client?
14:59:39 <Sundar> It sets up os_vif for further calls, like plug and unplug to attach/detach network ports to instances
15:00:06 <shaohe_feng> we want to know the plug  detail
15:00:17 <shaohe_feng> how does it call neutron.
15:00:37 <shaohe_feng> by API or RPC?
15:00:51 <Sundar> shaohe: It seems to be a direct call AFAICS. Here's the implementation: https://github.com/openstack/os-vif/blob/master/os_vif/__init__.py#L24
15:01:51 <Sundar> shaohe: Could I ask whether it is important to settle this now before approving the spec?
15:02:18 <zhipeng> we could discuss this more in detail for implementation
15:02:27 <Sundar> The spec focuses on 2 things: behavior of accelerators during start/stop/etc. and the os-acc interfaces
15:02:48 <shaohe_feng> Sundar, it should block the spec.
15:02:55 <Sundar> The implementation is up to us. There will be more comments on the code patch that implements this :)
15:03:10 <shaohe_feng> for if it calls the agent, we should be careful.
15:04:13 <zhipeng> Sundar regarding nrp
15:04:21 <Sundar> We should be careful in the implementation. But the spec does not say RPC, API or whatever
15:04:26 <zhipeng> I think from the mailing list discussion
15:04:31 <zhipeng> we should still go for it
15:04:38 <zhipeng> per your sched spec
15:05:02 <Sundar> Shaohe, if we don't close on os-acc spec now, what is your proposed plan for Rocky?
15:06:04 <zhipeng> Sundar shaohe_feng let's move on from the os-acc specifics
15:06:19 <shaohe_feng> OK
15:06:20 <zhipeng> actually let me put down a comment
15:06:51 <zhipeng> #action os-acc spec https://review.openstack.org/566798 is ready to go after Li Liu removes the -1
15:07:06 <zhipeng> so Li Liu still has to check :)
15:07:12 <Sundar> Thanks, zhipeng :)
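
A rough sketch of the library shape discussed above, assuming os-acc follows the initialize/plug/unplug pattern that was cited for os-vif. The names AcceleratorBackend, InMemoryBackend, initialize, attach and detach are hypothetical; the spec deliberately leaves open whether the real backend would reach the Cyborg agent by RPC, API or a direct call.

```python
from abc import ABC, abstractmethod

_BACKEND = None  # set once by initialize(), then used by attach()/detach()


class AcceleratorBackend(ABC):
    """Hypothetical plug point; a real one might talk to the Cyborg agent."""

    @abstractmethod
    def attach(self, accelerator_id, instance_id):
        ...

    @abstractmethod
    def detach(self, accelerator_id, instance_id):
        ...


class InMemoryBackend(AcceleratorBackend):
    """Stand-in backend that only records attachments, for illustration."""

    def __init__(self):
        self.attachments = {}

    def attach(self, accelerator_id, instance_id):
        if accelerator_id in self.attachments:
            raise RuntimeError(f"{accelerator_id} is already attached")
        self.attachments[accelerator_id] = instance_id

    def detach(self, accelerator_id, instance_id):
        if self.attachments.get(accelerator_id) != instance_id:
            raise RuntimeError(f"{accelerator_id} is not attached to {instance_id}")
        del self.attachments[accelerator_id]


def initialize(backend):
    """Called once by the consumer (for example at compute service start)."""
    global _BACKEND
    _BACKEND = backend


def _require_backend():
    if _BACKEND is None:
        raise RuntimeError("call initialize() before attach()/detach()")
    return _BACKEND


def attach(accelerator_id, instance_id):
    """Library entry point; assumes the device is already ready for use."""
    _require_backend().attach(accelerator_id, instance_id)


def detach(accelerator_id, instance_id):
    _require_backend().detach(accelerator_id, instance_id)
```

Because the caller only sees attach and detach, the transport question raised in the discussion can be settled inside the backend without changing the library interface.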
15:07:21 <zhipeng> now back on the sched spec
15:07:31 <zhipeng> nrp
15:07:38 <zhipeng> I think we should still go for it
15:07:39 <Sundar> zhipeng, re. nRP, it may still take more time
15:07:50 <Sundar> We may not deliver anything if we keep waiting for it
15:08:13 <Sundar> Can we start by applying the traits on compute node RP and moving later whenever nRP is ready?
15:08:23 <zhipeng> given the impression from the Nova team
15:08:52 <zhipeng> nrp should be a worthy goal for Rocky, if we wait that's gonna be another cycle
15:09:12 <zhipeng> plus we have Alex here :P
15:09:45 <Sundar> What will be delivered in Rocky then?
15:10:17 <zhipeng> all the nrp based traits and rcs we discussed
15:10:28 <zhipeng> and placement will be ready i suspect
15:11:46 <shaohe_feng> Sundar, we had a discussion about nRP in Monday's meeting. can you summarize it?
15:14:20 <zhipeng> Sundar are you still around ?
15:14:22 <Sundar> shaohe: after Monday's meeting, I started a thread with Nova. Please see some of the responses, like: http://lists.openstack.org/pipermail/openstack-dev/2018-June/131157.html
15:14:35 <zhipeng> the agent-driver api spec
15:14:52 <zhipeng> is this urgent for rocky as well ?
15:14:58 <Sundar> The virt-drivers need to be updated for nRP, and there are still some concerns around in-place upgrades with nRP
15:15:40 <Sundar> zhipeng, Just trying to understand :) -- if nRP is not ready in Rocky, what will Cyborg deliver in Rocky?
15:16:37 <zhipeng> Sundar we could make a hack work
15:16:49 <shaohe_feng> zhipeng, we should make a decision on how we report the resource to placement.
15:17:14 <zhipeng> shaohe_feng specifically ?
15:18:36 <zhipeng> Sundar I see the feedback in the email thread, and the general feedback is that most of the stuff could be done
15:19:46 <Sundar> zhipeng: Say the nRP functionality is ready by mid-July. Would we have enough time to get it done after that? We have lots of people waiting to use Cyborg. Cyborg has got popular. ;)
15:20:02 <shaohe_feng> zhipeng, Must it be ready for resource report in R release?
15:20:08 <zhipeng> Sundar we have more than capable devs :)
15:20:42 <zhipeng> shaohe_feng basic functionality should be ready I presume
15:20:46 <zhipeng> nothing too fancy
15:21:04 <shaohe_feng> Sundar, yes, cyborg is becoming popular.
15:21:21 <Sundar> zhipeng: Definitely. :) But it may help to have a backup plan, right? Without that, we cannot get basic VM placement to work, AFAICS.
15:22:22 <zhipeng> yes we will have time for a backup plan, this could be planned together with Li Liu and Zhuli
15:23:10 <zhipeng> we shoot for NRP as priority, if Nova could not deliver it, then we could go backup
15:23:27 <zhipeng> but I don't want to drop NRP to a secondary concern at first
15:25:02 <Sundar> So, for development till then, we could invoke placement in some ad hoc way to populate inventory and traits ?
15:25:49 <zhipeng> i think so
15:27:07 <Sundar> OK, zhipeng. Your call. :) I had updated the spec to reflect compute node RP as a backup. I can further clarify that nRP is the preferred way. Would that be enough?
15:27:37 <zhipeng> that'd be great :)
15:28:04 <zhipeng> btw all the specs we deemed ready today will be merged no later than the end of the week
15:28:31 <Sundar> Sure, thanks. :) Could we say that the spec is ready modulo that clarification?
15:29:06 <zhipeng> yes
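
A small sketch of the "nRP preferred, compute node RP as backup" decision described above. It only builds the data that would be reported and makes no placement calls; the function name, the provider naming scheme and the trait strings are hypothetical and are not taken from the scheduling spec.

```python
def build_trait_report(compute_node_rp, device_traits, nested_rp_supported):
    """Map resource provider names to the sorted traits they should carry.

    device_traits is {device_name: [trait, ...]}. With nested providers each
    device gets its own child provider; otherwise all traits are merged onto
    the compute node provider as the fallback.
    """
    if nested_rp_supported:
        # Preferred: one child provider per device, named after its parent.
        return {
            f"{compute_node_rp}_{device}": sorted(set(traits))
            for device, traits in device_traits.items()
        }
    # Backup: per-device detail is lost, traits land on the compute node RP.
    merged = set()
    for traits in device_traits.values():
        merged.update(traits)
    return {compute_node_rp: sorted(merged)}
```

Keeping the choice behind a single flag would let the backup path be used during development and switched once nested providers are available.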
15:29:16 <shaohe_feng> Sundar, another question about nRP
15:29:17 <zhipeng> that's actually more related to the implementation
15:29:30 <shaohe_feng> how do we call placement?
15:29:49 <zhipeng> folks I got to drop, plz continue discussion, I will come back and terminate the meeting :)
15:29:55 <shaohe_feng> Is the placement client ready?
15:30:13 <shaohe_feng> Sundar, I did not find it.
15:30:22 <Sundar> shaohe, I also need to drop for another call. Can we pursue in this IRC channel later?
15:30:31 <shaohe_feng> OK.
15:30:43 <shaohe_feng> then we can terminate the meeting
15:30:50 <shaohe_feng> ^ zhipeng
15:34:35 <zhipeng> okey then :)
15:34:42 <zhipeng> thx everyone for the discussion
15:34:45 <zhipeng> #endmeeting