14:00:42 <zhipeng> #startmeeting openstack-cyborg
14:00:44 <openstack> Meeting started Wed Jun 6 14:00:42 2018 UTC and is due to finish in 60 minutes. The chair is zhipeng. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:45 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:47 <openstack> The meeting name has been set to 'openstack_cyborg'
14:00:55 <zhipeng> #topic Roll Call
14:00:59 <zhipeng> #info Howard
14:03:11 <Sundar> #info Sundar
14:05:54 <sum12> #info sum12
14:08:23 <NokMikeR> #info Mike
14:11:31 <zhipeng> okey let's start
14:11:36 <zhipeng> we have only one topic today :)
14:11:45 <zhipeng> #topic rocky spec finalization
14:12:59 <zhipeng> let's start from quota spec
14:13:07 <zhipeng> which xinran__ has been working on
14:13:07 <Sundar> For https://review.openstack.org/#/c/554717/, we probably need at least one +1 from Nova
14:13:40 <zhipeng> Sundar got it :)
14:13:43 <zhipeng> we get to that
14:13:53 <xinran__> #info xinran__
14:14:07 <Sundar> Sure, zhipeng. NP
14:14:22 <zhipeng> #link https://review.openstack.org/560285
14:14:55 <xinran__> Hi for quota spec, do you think implement this on api layer?
14:15:06 <Sundar> For quotas, we have had a discussion with Nova folks: http://lists.openstack.org/pipermail/openstack-dev/2018-May/130563.html
14:15:22 <Sundar> It doesn't look like there is any foolproof way to enforce quotas today
14:15:35 <Sundar> So, you guys can make a call
14:15:41 <zhipeng> yes this is what I understand as well
14:15:47 <xinran__> I mean when there is a new api request, we should check/update quota
14:16:43 <xinran__> It also depend on how cyborg interact with nova.
14:17:10 <zhipeng> xinran__ what is your current proposal ?
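A minimal sketch of the option xinran__ describes, checking and updating quota when a new API request arrives, is shown below. QuotaTracker, QuotaExceeded and handle_allocate_request are hypothetical names, not Cyborg code; usage lives in an in-memory dict, and the lock only serializes requests inside one process. As the Nova thread linked above points out, there is no foolproof cross-service enforcement, and this sketch does not provide one.

```python
import threading
from collections import defaultdict


class QuotaExceeded(Exception):
    """Raised when a request would push a project over its limit."""


class QuotaTracker:
    """Hypothetical in-memory quota store; a real service would persist usage."""

    def __init__(self, limits):
        # limits: {resource_name: max_count}, applied to every project
        self._limits = dict(limits)
        self._usage = defaultdict(lambda: defaultdict(int))
        self._lock = threading.Lock()

    def reserve(self, project_id, resource, count):
        # Check and update under one lock so two requests in this process
        # cannot both pass the check; this does not help across processes.
        with self._lock:
            used = self._usage[project_id][resource]
            limit = self._limits.get(resource, 0)
            if used + count > limit:
                raise QuotaExceeded(
                    f"{project_id}: {resource} {used}+{count} > {limit}")
            self._usage[project_id][resource] = used + count

    def release(self, project_id, resource, count):
        with self._lock:
            used = self._usage[project_id][resource]
            self._usage[project_id][resource] = max(0, used - count)

    def usage(self, project_id, resource):
        return self._usage[project_id][resource]


def handle_allocate_request(quotas, project_id, count, dispatch):
    """Hypothetical API handler: reserve quota first, then hand off the work."""
    quotas.reserve(project_id, "accelerator", count)
    try:
        # dispatch stands in for the call to the conductor (hypothetical).
        return dispatch(count)
    except Exception:
        # Give the reservation back if the downstream step fails.
        quotas.release(project_id, "accelerator", count)
        raise
```

With a limit of 2 accelerators, a project can allocate 2; a third request raises QuotaExceeded before any downstream work happens, and a failed dispatch leaves usage unchanged.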
14:17:13 <xinran__> If nova call agent directly, we should change it
14:17:24 <xinran__> On api layer
14:17:37 <zhipeng> i'm inclined to that option
14:17:53 <zhipeng> i don't see a good scenario for nova to call cyborg-agent directly
14:18:23 <xinran__> Yes
14:18:39 <shaohe_feng> why nova need to call agent?
14:19:07 <xinran__> the current nova/cyborg interaction is calling agent I am not sure about this
14:19:25 <xinran__> What do you guys think?
14:19:31 <zhipeng> no atm we have conductor interact with placement
14:19:33 <zhipeng> that is all
14:19:44 <zhipeng> there will be/should be the api layer interaction
14:19:48 <xinran__> I mean the current spec :)
14:19:49 <zhipeng> but we are not there yet
14:19:58 <shaohe_feng> oh, if nova call cyborg-agent, there is a problem.
14:20:22 <zhipeng> I would suggest to have it on the api layer
14:20:28 <Sundar> Xinran, can you clarify which spec says Nova compute does or should call into Cyborg agent?
14:21:00 <xinran__> Let me find it
14:21:26 <zhipeng> okey for the spec itself in the current shape
14:21:30 <zhipeng> everyone happy with it ?
14:22:53 <Sundar> zhipeng, the present scheme is not ideal, but I have no objections, since I know it is a priority for others. :)
14:23:08 <Sundar> We can improve upon it over time
14:23:17 <zhipeng> Sundar agree :)
14:23:22 <shaohe_feng> IMO, cyborg can do lazy quota.
14:23:32 <zhipeng> so then let's mark the spec ok to go
14:23:41 <Sundar> Agreed
14:23:42 <xinran__> https://review.openstack.org/#/c/566798/5/doc/specs/rocky/compute-node.rst
14:23:51 <zhipeng> #action quota spec https://review.openstack.org/#/c/560285/ ready
14:24:22 <xinran__> What do you mean lazy quota shaohe_feng
14:25:04 <shaohe_feng> which means, even scheduler pass, cyborg will still refuse accelerator request for quota failed.
14:25:06 <Sundar> Xinran, compute-node (os-acc) spec does not say that Nova compute will call directly into Cyborg agent. It says that Nova compute should call into os-acc, which should call into Cyborg agent.
14:25:37 <shaohe_feng> os-acc can call cyborg agent?
14:26:13 <zhipeng> guys this is another problem
14:26:18 <zhipeng> let's move on :)
14:26:25 <zhipeng> we have limited time for a lot of specs
14:26:36 <shaohe_feng> Does that mean nova will call cyborg agent without api?
14:26:55 <zhipeng> moving on
14:27:10 <zhipeng> Li Liu's two patches on metadata and programming
14:27:18 <shaohe_feng> xinran__, lazy quota, just performance issue.
14:27:38 <xinran__> shaohe_feng: ok got it
14:27:51 <zhipeng> #link https://review.openstack.org/558265
14:28:03 <zhipeng> any further comment on the metadata spec ?
14:28:24 <xinran__> Sundar: the “directly” I mean is not pass api layer
14:28:46 <Sundar> I thought Li Liu added a function name. Looking for it
14:29:05 <zhipeng> also Melissa
14:29:21 <zhipeng> @Guest24200
14:29:31 <zhipeng> has Xilinx team also gone over the spec ?
14:32:21 <zhipeng> Sundar has Li Liu addressed your comment ?
14:32:28 <NokMikeR> what bootstraps the driver instance in the first place? e.g if a driver is requested that may require an additional tool to already be present like a driver daemon from the fpga vendor to be already in place.
14:32:42 <Sundar> During last spec day, Li Liu and I agreed to add a function name to the bitstream spec as an optional property. #link http://eavesdrop.openstack.org/irclogs/%23openstack-cyborg/%23openstack-cyborg.2018-05-09.log.html#t2018-05-09T19:00:22
14:33:37 <Sundar> This is also documented in the scheldung spec #link https://review.openstack.org/#/c/554717/
14:33:43 <Sundar> *scheduling
14:34:06 <Sundar> Can we get that addressed?
14:34:26 <zhipeng> sure
14:34:53 <zhipeng> #action metadata spec https://review.openstack.org/558265 to add a function name
14:35:16 <zhipeng> other than that there should be no problems right ?
14:36:20 <Sundar> Zhipeng, yes. Thanks
14:36:36 <zhipeng> sounds great :)
14:37:06 <zhipeng> #link https://review.openstack.org/#/c/559395/
14:37:10 <zhipeng> programming
14:37:36 <zhipeng> i think other than Zuul, we should be happy about this now
14:37:40 <zhipeng> :P
14:38:33 <Sundar> I don't see any issue with the basics. We don't have an end-to-end flow that uses this. I guess this is meant to be a standalone API?
14:39:43 <zhipeng> yep
14:40:32 <Sundar> OK. We can tweak this later as needed, when we need to define end-to-end flows. With that understanding, we can approve it as it stands. Sounds good?
14:41:06 <zhipeng> shaohe_feng and others ?
14:41:21 <zhipeng> at least from the review I see Li Liu had addressed all the comments
14:42:12 <shaohe_feng> zhipeng, OK, it looks good for me
14:43:33 <zhipeng> cool
14:43:46 <zhipeng> #action programming spec https://review.openstack.org/#/c/559395/ ready to go
14:44:40 <zhipeng> okey let's go to Sundar's four specs
14:44:53 <zhipeng> Sundar which one do you think is ready to go ?
14:45:03 <Sundar> All of them :)
14:45:17 <zhipeng> haha great
14:45:20 <Sundar> I will still request Nova folks to do a +1 on scheduling spec.
14:45:37 <zhipeng> yes that one has gone through a lot
14:45:48 <Sundar> For os-acc also, we should probably get Nova ok, right?
14:46:42 <shaohe_feng> os-acc will call cyborg-agent?
14:46:55 <shaohe_feng> and nova will call os-acc?
14:46:55 <zhipeng> https://review.openstack.org/#/c/566798/
14:46:59 <zhipeng> this one right ?
14:47:23 <Sundar> zhipeng, yes
14:47:25 <zhipeng> shaohe_feng I think nova-compute calls os-acc to do the attach/detach
14:47:58 <zhipeng> but I think the original goal of os-acc is to serve as a library
14:48:06 <Sundar> shaohe: Yes, as zhipeng says.
14:48:31 <zhipeng> Sundar does nova-compute also call os-brick or os-vif ?
14:48:32 <Sundar> zhipeng, yes. It is still a library that Cyborg provides, like os-vif for Neutron
14:49:20 <Sundar> zhipeng, yes, nova compute calls into os-vif -- plug(), unplug() API
14:49:25 <Sundar> I'll try to get a link
14:49:37 <shaohe_feng> zhipeng, so that means nova will call agent by attach/detach directly?
14:50:05 <shaohe_feng> is there no race for cyborg API and os-acc?
14:50:29 <zhipeng> i think for the nova scenario, which means accelerator attachment for the VM
14:50:52 <zhipeng> the attach should be issued by nova-compute
14:50:58 <zhipeng> however for the baremetal usecase
14:51:07 <zhipeng> it should go through cyborg-api
14:51:08 <shaohe_feng> Sundar, os-vif plug API call neutron agent?
14:51:50 <zhipeng> i think we should target os-acc for VM usecase for Rocky
14:52:12 <zhipeng> which means os-acc calls the cyborg-agent directly to call upon the driver
14:52:26 <Sundar> That is my understanding. We can double check. But, please note that the spec doesn't make implementation commitments yet. We can implement it the same way as os-vif
14:52:28 <zhipeng> (and the driver invoke the bus protocols)
14:52:54 <Sundar> zhipeng: Agreed :)
14:52:56 <shaohe_feng> edleafe, os-vif can call neutron agent directly by RPC by-pass API?
14:53:00 <Sundar> shaohe: I think you may be referring to scenarios like the one where Cyborg API is called to program a region, which is already in use?
14:53:42 <zhipeng> i think the attach/detach does not make assumption on the operation
14:53:47 <wangzhh> Hi, I'm confused. Could anyone explain. When nova-compute call cyborg by os-acc, It will call by http(API) or just by rpc(message queue)?
14:53:50 <zhipeng> no matter a region needs to be programmed or not
14:54:02 <zhipeng> if attach, it presumes the device is ready
14:54:33 <wangzhh> Or both of them?
14:54:35 <zhipeng> wangzhh the current thinking is via rpc to call cyborg-agent in order to get to the driver
14:54:47 <zhipeng> no api involved in this scenario
14:55:21 <wangzhh> OK. Thx.
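The layering described above (nova-compute calls the os-acc library for attach/detach, and os-acc reaches cyborg-agent over RPC instead of going through cyborg-api) can be sketched as follows, loosely modeled on the plug()/unplug() style of os-vif. Everything here is hypothetical: the module-level initialize/attach/detach functions, the RPC method names attach_device and detach_device, the AgentClient protocol and the in-process FakeAgent. A real client would sit on a message bus, and the spec itself makes no implementation commitment.

```python
from typing import Any, Dict, Optional, Protocol


class AgentClient(Protocol):
    """Anything with an RPC-style call(); e.g. a wrapper over a message bus."""

    def call(self, ctxt: Dict[str, Any], method: str, **kwargs: Any) -> Any: ...


_client: Optional[AgentClient] = None


def initialize(client: AgentClient) -> None:
    """Set up the library once per process before attach/detach are used."""
    global _client
    _client = client


def _require_client() -> AgentClient:
    if _client is None:
        raise RuntimeError("os-acc sketch: call initialize() first")
    return _client


def attach(ctxt: Dict[str, Any], instance_uuid: str, device_id: str) -> Any:
    """Ask the local agent over RPC (not the REST API) to attach a device."""
    # As discussed, attach presumes the device is already programmed/ready.
    return _require_client().call(ctxt, "attach_device",
                                  instance_uuid=instance_uuid,
                                  device_id=device_id)


def detach(ctxt: Dict[str, Any], instance_uuid: str, device_id: str) -> Any:
    """Ask the local agent over RPC to detach a device from an instance."""
    return _require_client().call(ctxt, "detach_device",
                                  instance_uuid=instance_uuid,
                                  device_id=device_id)


class FakeAgent:
    """In-process stand-in for cyborg-agent, used only to exercise the sketch."""

    def __init__(self) -> None:
        self.attached: Dict[str, str] = {}

    def call(self, ctxt: Dict[str, Any], method: str, **kwargs: Any) -> Any:
        if method == "attach_device":
            self.attached[kwargs["device_id"]] = kwargs["instance_uuid"]
            return "attached"
        if method == "detach_device":
            self.attached.pop(kwargs["device_id"], None)
            return "detached"
        raise ValueError(f"unknown method {method}")
```

Keeping the transport behind a small call() interface leaves the API-versus-RPC question raised by shaohe_feng and wangzhh open to the implementation, which matches the point that the spec does not fix it.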
14:55:29 <Sundar> Yes, agreed with zhipeng
14:55:48 <Sundar> Here are the interfaces exposed by os-vif: https://github.com/openstack/os-vif/blob/master/os_vif/__init__.py
14:56:04 <zhipeng> it means that this is a VM related operation and we assume the Nova got the necessary privilege
14:56:04 <Sundar> I will try to locate how nova compute calls it
14:56:30 <zhipeng> we could focus on the details later, whether the lib itself could be called
14:56:45 <zhipeng> or something implement the lib interfaces should be called
14:57:35 <zhipeng> okey so everyone good on the os-acc spec ?
14:57:35 <Sundar> Nova compute calls the initialize API of os-vif directly here: https://github.com/openstack/nova/blob/master/nova/cmd/compute.py#L49
14:59:05 <shaohe_feng> what does os_vif.initialize do?
14:59:27 <shaohe_feng> will it set up RPC client?
14:59:39 <Sundar> It sets up os_vif for further calls, like plug and unplug to attach/detach network ports to instances
15:00:06 <shaohe_feng> we want to know the plug detail
15:00:17 <shaohe_feng> how does it call neutron.
15:00:37 <shaohe_feng> by API or RPC?
15:00:51 <Sundar> shaohe: It seems to be a direct call AFAICS. Here's the implementation: https://github.com/openstack/os-vif/blob/master/os_vif/__init__.py#L24
15:01:51 <Sundar> shaohe: Could I ask whether it is important to settle this now before approving the spec?
15:02:18 <zhipeng> we could discuss this more in detail for implementation
15:02:27 <Sundar> The spec focuses on 2 things: behavior of accelerators during start/stop/etc. and the os-acc interfaces
15:02:48 <shaohe_feng> Sundar, it should block the spec.
15:02:55 <Sundar> The implementation is up to us. There will be more comments on the code patch that implements this :)
15:03:10 <shaohe_feng> for if it can agent, we should be careful.
15:04:11 <shaohe_feng> s/for if it can agent/for if it call agent
15:04:13 <zhipeng> Sundar regarding nrp
15:04:21 <Sundar> We should be careful in the implementation. But the spec does not say RPC, API or whatever
15:04:26 <zhipeng> I think from the mailing list discussion
15:04:31 <zhipeng> we should still go for it
15:04:38 <zhipeng> per your sched spec
15:05:02 <Sundar> Shaohe, if we don't close on os-acc spec now, what is your proposed plan for Rocky?
15:06:04 <zhipeng> Sundar shaohe_feng let's move on from the os-acc specifics
15:06:19 <shaohe_feng> OK
15:06:20 <zhipeng> actually let me put down a comment
15:06:51 <zhipeng> #action os-acc spec https://review.openstack.org/566798 after Li Liu remove -1 is ready to go
15:07:06 <zhipeng> so Li Liu still has to check :)
15:07:12 <Sundar> Thanks, zhipeng :)
15:07:21 <zhipeng> now back on the sched spec
15:07:31 <zhipeng> nrp
15:07:38 <zhipeng> I think we should still go for it
15:07:39 <Sundar> zhipeng, re. nRP, it may still take more time
15:07:50 <Sundar> We may not deliver anything if we keep waiting for it
15:08:13 <Sundar> Can we start by applying the traots on compute node RP and moving later when ever nRP is ready?
15:08:18 <Sundar> *traits
15:08:23 <zhipeng> given the impression from the Nova team
15:08:52 <zhipeng> nrp should be a worthy goal for Rocky, if we wait that's gonna be another cycle
15:09:12 <zhipeng> plus we have Alex here :P
15:09:45 <Sundar> What will be delivered in Rocky then?
15:10:17 <zhipeng> all the nrp based traits and rcs we discussed
15:10:28 <zhipeng> and placement will be ready i suspect
15:11:46 <shaohe_feng> Sundar, we had a discussion on nRP in Monday's meeting. can you summarize it?
15:14:20 <zhipeng> Sundar are you still around ?
15:14:22 <Sundar> shaohe: after Monday's meeting, I started a thread with Nova. Please see some of the responses, like: http://lists.openstack.org/pipermail/openstack-dev/2018-June/131157.html
15:14:35 <zhipeng> the agent-driver api spec
15:14:52 <zhipeng> is this urgent for rocky as well ?
15:14:58 <Sundar> The virt-drivers need to be updated for nRP, and there are still some concerns around in-place upgrades with nRP
15:15:40 <Sundar> zhipenf, Just trying to understand :) -- if nRP is not ready in Rocky, what will Cyborg deliver in Rocky?
15:15:45 <Sundar> *zhipeng
15:16:37 <zhipeng> Sundar we could make a hack work
15:16:49 <shaohe_feng> zhipeng, we should make a decision on how we report the resource to placement.
15:17:14 <zhipeng> shaohe_feng specifically ?
15:18:36 <zhipeng> Sundar I see the feedback in the email thread, and the general feedback is that most of the stuff could be done
15:19:46 <Sundar> zhipeng: Say the nRP functionality is ready by mid-July. Would we have enough time to get it done after that? We have lots of people waiting to use Cyborg. Cyborg has got popular. ;)
15:20:02 <shaohe_feng> zhipeng, Must it be ready for resource report in R release?
15:20:08 <zhipeng> Sundar we have more than capable devs :)
15:20:42 <zhipeng> shaohe_feng basic functionality should be ready I presume
15:20:46 <zhipeng> nothing too fancy
15:21:04 <shaohe_feng> Sundar, yes, cyborg is becoming popular.
15:21:21 <Sundar> zhipeng: Definitely. :) But it may help to have a backup plan, right? Without that, we cannot get basic VM placement to work, AFAICS.
15:22:22 <zhipeng> yes we will have time for a backup plan, this could be planned together with Li Liu and Zhuli
15:23:10 <zhipeng> we shoot for NRP as priority, if Nova could not deliver it, then we could go backup
15:23:27 <zhipeng> but I don't want to drop NRP to a secondary concern at first
15:25:02 <Sundar> So, for development till then, we could invoke placement in some ad hoc way to populate inventory and traits ?
15:25:49 <zhipeng> i think so
15:27:07 <Sundar> OK, zhipeng. Your call. :) I had updated the spec to reflect compute node RP as a backup. I can further clarify that nRP is the preferred way. Would that be enough?
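The two layouts being weighed here (nRP as the preferred way, accelerator inventory and traits on the compute node RP as the backup) can be sketched as lists of Placement REST calls. The resource class and trait names, the device dicts and both helper functions are hypothetical; the endpoints and body fields follow the Placement API reference, where parent_provider_uuid requires microversion 1.14 or later. The generation values are predicted on the assumption that each successful update bumps the generation by one; a real client should take them from each response instead.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple

Request = Tuple[str, str, Dict[str, Any]]  # (HTTP method, path, JSON body)


def nested_rp_requests(compute_rp_uuid: str,
                       devices: List[Dict[str, Any]]) -> List[Request]:
    """Preferred layout: one child RP per device under the compute node RP."""
    reqs: List[Request] = []
    for dev in devices:
        child = dev["rp_uuid"]
        # parent_provider_uuid needs placement microversion 1.14 or later.
        reqs.append(("POST", "/resource_providers", {
            "name": dev["name"], "uuid": child,
            "parent_provider_uuid": compute_rp_uuid}))
        # A new provider starts at generation 0; assume each update bumps it.
        reqs.append(("PUT", f"/resource_providers/{child}/inventories", {
            "resource_provider_generation": 0,
            "inventories": {dev["resource_class"]: {"total": 1}}}))
        reqs.append(("PUT", f"/resource_providers/{child}/traits", {
            "resource_provider_generation": 1,
            "traits": sorted(dev["traits"])}))
    return reqs


def compute_rp_requests(compute_rp_uuid: str, generation: int,
                        existing_traits: List[str],
                        devices: List[Dict[str, Any]]) -> List[Request]:
    """Backup layout: accelerator inventory and traits on the compute node RP."""
    reqs: List[Request] = []
    totals = Counter(dev["resource_class"] for dev in devices)
    for rc, total in sorted(totals.items()):
        # Per-class PUT leaves Nova's VCPU/MEMORY_MB/DISK_GB inventory alone.
        reqs.append(("PUT",
                     f"/resource_providers/{compute_rp_uuid}/inventories/{rc}",
                     {"resource_provider_generation": generation,
                      "total": total}))
        generation += 1  # assumed bump per successful update
    # PUT .../traits replaces the provider's whole trait set, so merge the
    # traits Nova already reported with every device's traits.
    traits = set(existing_traits)
    for dev in devices:
        traits.update(dev["traits"])
    reqs.append(("PUT", f"/resource_providers/{compute_rp_uuid}/traits",
                 {"resource_provider_generation": generation,
                  "traits": sorted(traits)}))
    return reqs
```

The sketch makes one cost of the backup visible: device traits are merged onto a single provider and inventory is pooled, so Placement cannot tell which device carries which trait, and Cyborg would have to pick the matching device itself. That is one reason nRP is the preferred layout.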
15:27:37 <zhipeng> that'd be great :)
15:28:04 <zhipeng> btw all the specs we deemed ready today will be merged no later than the end of the week
15:28:31 <Sundar> Sure, thanks. :) Could we say that the spec is ready modulo that clarification?
15:29:06 <zhipeng> yes
15:29:16 <shaohe_feng> Sundar, another question about nRP
15:29:17 <zhipeng> that's actually more related to the implementation
15:29:30 <shaohe_feng> how we call placement?
15:29:49 <zhipeng> folks I got to drop, plz continue discussion, I will come back and terminate the meeting :)
15:29:55 <shaohe_feng> The placement client is ready?
15:30:13 <shaohe_feng> Sundar, I did not find it.
15:30:22 <Sundar> shaohe, I also need to drop for another call. Can we pursue in this IRC channel later?
15:30:31 <shaohe_feng> OK.
15:30:43 <shaohe_feng> then we can terminate the meeting
15:30:50 <shaohe_feng> ^ zhipeng
15:34:35 <zhipeng> okey then :)
15:34:42 <zhipeng> thx everyone for the discussion
15:34:45 <zhipeng> #endmeeting