03:00:17 <Sundar> #startmeeting openstack-cyborg
03:00:18 <chenke> Hello Sundar.
03:00:19 <openstack> Meeting started Thu Feb 27 03:00:17 2020 UTC and is due to finish in 60 minutes.  The chair is Sundar. Information about MeetBot at http://wiki.debian.org/MeetBot.
03:00:20 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
03:00:22 <openstack> The meeting name has been set to 'openstack_cyborg'
03:00:24 <chenke> #info chenke
03:00:35 <Sundar> #info Sundar
03:00:38 <Yumeng> #info Yumeng
03:00:43 <brinzhang> #info brinzhang
03:00:45 <xinranwang> #info xinranwang
03:01:32 <Sundar> Do we have any major topics for today? I'd like to provide a status update on Nova integration after the other discussion.
03:02:07 <xinranwang> I have one topic about microversion
03:02:49 <Sundar> Ok, xinranwang. Anything else to discuss before we get into that?
03:02:50 <xinranwang> I have now implemented an api_version decorator to check the microversion.
03:03:34 <Yumeng> I only have gpu_driver_improve patches that need your review: https://review.opendev.org/#/q/status:open+project:openstack/cyborg+branch:master+topic:gpu-driver-improve. Nothing else.
03:03:35 <xinranwang> please see https://review.opendev.org/#/c/696860/
03:04:17 <brinzhang> I want to talk about the functional tests. I found there is a lot of work to do. Does anyone have suggestions, or a simple project we can use as a reference?
03:05:14 <Sundar> xinranwang: do you have a question or something to discuss about that patch?
03:06:27 <xinranwang> Yes
03:06:34 <xinranwang> I have now implemented an api_version decorator to check the microversion.
03:06:38 <brinzhang> Sundar, xinranwang, I think the main point is https://review.opendev.org/#/c/696860/3/cyborg/api/controllers/base.py@85
03:08:13 <xinranwang> Do you think we should have a schema check in this patch? IMO, the schema check builds on microversion support, so we can do it later.
03:08:36 <xinranwang> Thanks brinzhang for pasting the link.
03:09:42 <shaohe_feng> it can be another patch.
03:09:53 <Sundar> "This decorator MUST appear first (the outermost
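For readers following the api_version discussion, here is a minimal sketch of what a microversion-checking decorator does. It is not the code in https://review.opendev.org/#/c/696860/; the names api_version, VersionNotSupported and MAX_VERSION, and passing the requested version as the handler's first argument (instead of reading it from the request header), are assumptions for illustration. Versions are compared as integer tuples so that 2.10 sorts after 2.9.

```python
import functools

# Hypothetical newest microversion the service supports.
MAX_VERSION = (2, 1)


class VersionNotSupported(Exception):
    """Raised when a request's microversion is outside a handler's range."""


def parse_version(text):
    """Turn '2.1' into (2, 1) so versions compare numerically, not as strings."""
    major, minor = text.split(".")
    return int(major), int(minor)


def api_version(min_ver, max_ver=None):
    """Run the wrapped handler only if the requested microversion lies in
    [min_ver, max_ver]; max_ver defaults to MAX_VERSION."""
    lo = parse_version(min_ver)
    hi = parse_version(max_ver) if max_ver else MAX_VERSION

    def decorator(func):
        @functools.wraps(func)
        def wrapper(requested, *args, **kwargs):
            # Reject before any inner decorator (e.g. schema validation) runs;
            # this is one reason such a decorator is kept outermost.
            if not lo <= parse_version(requested) <= hi:
                raise VersionNotSupported(requested)
            return func(requested, *args, **kwargs)
        return wrapper
    return decorator


@api_version("2.0")
def get_all(requested):
    # Hypothetical handler body.
    return ["arq-1", "arq-2"]
```

With this sketch, get_all("2.1") runs the handler, while get_all("2.2") raises VersionNotSupported before the handler body is reached.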
03:10:14 <shaohe_feng> the schema check should not be only for microversions;
03:10:24 <shaohe_feng> it should apply to all APIs.
03:11:52 <brinzhang> I don't think we should rewrite the same API interface every time we add a new microversion; that would cause a lot of code redundancy.
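One way to address brinzhang's concern is to keep a single handler and attach a schema per microversion range, instead of writing a new API function for each microversion. The sketch below uses a tiny stdlib-only field checker as a stand-in for a real JSON Schema validator; validate_by_version, check, create and the name/description fields are hypothetical and not taken from Cyborg.

```python
import functools


class ValidationError(Exception):
    """Raised when a request body does not match its schema."""


def parse_version(text):
    major, minor = text.split(".")
    return int(major), int(minor)


def check(body, schema):
    """Stand-in for JSON Schema: schema maps field -> (type, required)."""
    for field, (ftype, required) in schema.items():
        if field not in body:
            if required:
                raise ValidationError(f"missing field {field!r}")
        elif not isinstance(body[field], ftype):
            raise ValidationError(f"{field!r} must be {ftype.__name__}")
    unknown = sorted(set(body) - set(schema))
    if unknown:
        raise ValidationError(f"unexpected fields {unknown}")


def validate_by_version(schemas):
    """schemas maps a minimum microversion to the schema valid from there on.
    The newest schema whose minimum is <= the requested version is applied."""
    ordered = sorted(((parse_version(v), s) for v, s in schemas.items()),
                     key=lambda item: item[0])

    def decorator(func):
        @functools.wraps(func)
        def wrapper(requested, body):
            req = parse_version(requested)
            applicable = [s for v, s in ordered if v <= req]
            if not applicable:
                raise ValidationError(f"no schema for {requested}")
            check(body, applicable[-1])
            return func(requested, body)
        return wrapper
    return decorator


# One handler serves both microversions; only the schema differs.
@validate_by_version({
    "2.0": {"name": (str, True)},
    "2.1": {"name": (str, True), "description": (str, False)},
})
def create(requested, body):
    return {"name": body["name"], "version": requested}
```

In this sketch, adding a field in a later microversion means adding one entry to the schema dict rather than duplicating the handler.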
03:14:04 <xinranwang> brinzhang: Yes, I understand your concern. For now we have only one microversion (v2.1 is my PoC code and will be merged into v2.0), so there is only one API function.
03:14:32 <brinzhang> xinranwang: Yeah, understand
03:15:06 <s_shogo> #info s_shogo
03:15:20 <xinranwang> What I suggest is to support microversions first, and then we can add the schema check in another patch.
03:17:21 <shaohe_feng> I'd like to talk about one thing: one patch for one issue, except for nit fixes.
03:18:13 <shaohe_feng> https://review.opendev.org/#/c/693784/1/devstack/lib/cyborg@159
03:18:26 <shaohe_feng> ^ this is an example
03:18:32 <Li_Liu> #info Li_Liu
03:19:22 <Yumeng> Agree with xinranwang. We can have a bp for the API schema check later or in the next release.
03:19:23 <shaohe_feng> we already knew there were some issues in the devstack config
03:20:12 <shaohe_feng> about half a year ago.
03:20:36 <Sundar> xinranwang: I think we all agree with Brin's concern, and that we all agree there should be a schema check. You just want it in another patch, right?
03:20:39 <shaohe_feng> but why can't we continue on it?
03:21:01 <shaohe_feng> We always want one patch to fix many issues.
03:21:10 <shaohe_feng> Fix them one by one;
03:21:19 <shaohe_feng> this will be a faster way.
03:21:36 <shaohe_feng> and you can see, https://review.opendev.org/#/c/709749/1/devstack/lib/cyborg
03:21:54 <xinranwang> Sundar:  yes, exactly. As Yumeng  said, we should have another bp and patch to implement it.
03:22:20 <shaohe_feng> It is also a devstack fix. We had configured it correctly before.
03:22:37 <Sundar> Ok, I agree, xinranwang, Yumeng
03:22:41 <shaohe_feng> Mixing many issues together
03:22:54 <shaohe_feng> is really a bad idea.
03:23:30 <xinranwang> It seems brinzhang dropped off; I will talk to him later. Thanks guys.
03:23:40 <Sundar> s_shogo: Not sure what you are trying to say. The link you gave is for an old patch set. Are you saying we should enforce one patch for one issue?
03:24:57 <s_shogo> Sundar: IMHO, was that mention intended for shaohe_feng?
03:25:27 <shaohe_feng> yes, one by one.
03:25:28 <Sundar> Yes, sorry
03:25:48 <s_shogo> Thanks
03:26:14 <shaohe_feng> Micro steps, and maybe faster.
03:26:32 <Sundar> shaohe_feng: Just change the title of your patch, and maybe make 'multinode' the topic. We could have a patch series for multinode, including Sean's patches
03:27:02 <Sundar> You already have the right topic
03:27:03 <shaohe_feng> The microversion patch can also fix just one issue,
03:27:37 <shaohe_feng> and more patches fix the other issues.
03:27:43 <shaohe_feng> This is the right way.
03:28:34 <shaohe_feng> Otherwise it will be the same as when we fixed the "multinode" issues in devstack: last half year, and no progress.
03:28:56 <shaohe_feng> s/last/take
03:33:30 <shaohe_feng> ping
03:34:04 <brinzhang_> xinranwang, Sundar, agree to do the microversion schema in the next release, though it would be better done in this release; later we (xinranwang and I) can talk about how to do it easily.
03:34:06 <chenke> Agree, a patch should not change too much. We can make a series of patches to speed up review.
03:35:28 <Sundar> chenke, shaohe_feng: Sure, organize it as a series of patches. A reviewer may want to see a placeholder patch for the next step before agreeing to the previous step?
03:35:38 <Sundar> shaohe_feng: ^
03:35:52 <chenke> Yes. Agree
03:36:37 <chenke> We should carefully consider reviewers' suggestions and give a reasonable response.
03:37:17 <Sundar> I think we are all on the same page. shaohe_feng may feel that his patch has not merged in a long time. Let's review it quickly and help it get merged.
03:37:23 <Sundar> Ok, anything else?
03:38:23 <shaohe_feng> Add a NOTE in the patch, or summarize the task list in an etherpad to follow, if we want to address more issues.
03:38:42 <shaohe_feng> s/follow/track
03:39:21 <Sundar> Yumeng: you seem to be adding mdev support to Cyborg. That is interesting
03:39:52 <Sundar> That will require changes to the Nova patches too
03:40:11 <Sundar> I am only allowing 'PCI' in the current patches.
03:40:13 <Yumeng> yes, I found that the controlpath_id_type of an NVIDIA GPU should be "MDEV"
03:41:01 <Yumeng> Sundar: I was thinking just now: attach_handle type should also be "MDEV". do you agree?
03:41:06 <shaohe_feng> Intel and NVIDIA vGPUs are now MDEV devices.
03:41:10 <Sundar> Yumeng: What do you mean that you found that? If you are doing PCI passthrough of the physical function, it would be 'PCI'. A mediated device is a different way of attaching a device, different from PCI passthrough.
03:41:11 <shaohe_feng> Only AMD's are SR-IOV.
03:42:03 <Sundar> shaohe_feng and all: We had already discussed that Cyborg intends to support only physical PF passthrough for GPUs for now.
03:42:12 <Yumeng> Thanks shaohe_feng for pointing this.
03:42:50 <shaohe_feng> IMHO, MDEV support can coexist with Nova's for a long time, like PCI devices.
03:42:53 <Sundar> What prevents a Nvidia GPU's PF from being passed through to a VM, instead of mdev?
03:43:59 <shaohe_feng> we should not prevent it.
03:46:04 <Sundar> Yumeng: Re. attach_handle type should also be "MDEV"  -- yes, in fact control path ID cannot be 'mdev', only the attach handle can be. Because mdev refers to how the device is attached to a VM.
03:46:31 <Sundar> The cpid is PCI even for a device that supports mdev.
03:46:49 <Sundar> Because that is the management interface, which needs to be a PCI interface
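The distinction Sundar draws can be expressed as two small types: the control path ID (the management interface) is always PCI, while the attach handle (how the device reaches the VM) may be PCI or MDEV. This is a hypothetical model for illustration only, not Cyborg's DB schema or object definitions; the class names, the PCI address and the UUIDs are made up.

```python
from dataclasses import dataclass

CPID_TYPES = {"PCI"}                   # management interface is always PCI
ATTACH_HANDLE_TYPES = {"PCI", "MDEV"}  # PF/VF passthrough, or a mediated device


@dataclass(frozen=True)
class ControlPathID:
    cpid_type: str
    info: str  # e.g. a PCI BDF

    def __post_init__(self):
        if self.cpid_type not in CPID_TYPES:
            raise ValueError(f"unsupported control path type {self.cpid_type!r}")


@dataclass(frozen=True)
class AttachHandle:
    handle_type: str
    info: str  # BDF for PCI, UUID for MDEV

    def __post_init__(self):
        if self.handle_type not in ATTACH_HANDLE_TYPES:
            raise ValueError(f"unsupported attach handle type {self.handle_type!r}")


# A GPU that supports mdev: managed via its PF, attached via mediated devices.
gpu_cpid = ControlPathID("PCI", "0000:81:00.0")
vgpu_handles = [
    AttachHandle("MDEV", "4b20d080-1b54-4048-85b3-a6a62d165c01"),
    AttachHandle("MDEV", "8d4c3c6e-2f1a-4b0e-9c2d-1a2b3c4d5e6f"),
]
```

Under this model, reporting "MDEV" as a control path type fails validation, while the same GPU can still expose MDEV attach handles.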
03:46:54 <Yumeng> Sundar: Oops. Hmm. I was testing vGPU, and found that controlpath_id_type "MDEV" cannot be reported to the DB schema.
03:47:40 <Yumeng> so I was thinking that was a bug of DB.
03:48:28 <Sundar> Yumeng and all: do we intend to support vGPUs? In previous discussions, we said that Cyborg is for offload, i.e. GPGPU-type use cases, not media or graphics. So, the most common usage is to ssign an entire GPU (maybe multiple GPUs) to the same VM.
03:48:36 <Yumeng> Seems that was intentional, right?
03:48:36 <Sundar> *assign
03:50:18 <brinzhang_> Sundar, I would like Cyborg to support vGPU; it's intended.
03:50:49 <shaohe_feng> But I know some public clouds support vGPU.
03:51:06 <shaohe_feng> Why don't we support it?
03:51:18 <Sundar> brinzhang_: Ho do we distinguish Cyborg from Nova, if both can do all use cases for GPUs?
03:51:24 <Sundar> *How
03:52:42 <Sundar> Anyway, I am fine with vGPU support if you all want it.
03:53:06 <shaohe_feng> And Red Hat's first proposal for a generic SmartNIC solution in the kernel also puts the device on the mdev bus.
03:53:57 <brinzhang_> Sundar, I think Cyborg can provide the vGPU architecture to Nova, so Nova can choose the vGPU to use
03:55:14 <chenke> I think supporting vgpu is not a bad thing for cyborg. So, I agree.
03:55:50 <Sundar> brinzhang_: I don't know what that means. Nova already has vGPU support without Cyborg. Cyborg can report VFs as attach handles (with SR-IOV) or mdevs as attach handles (with Nvidia GPUs). Both would work AFAICS. It is just that Nova and Cyborg overlap in functionality.
03:56:18 <Sundar> As I said, I am fine with vGPU support in Cyborg. However, I'd rather close the current Nova patch series as is. I have added only 'PCI' as a supported type in https://review.opendev.org/#/c/631245/56/nova/virt/libvirt/driver.py@5698
03:56:45 <Sundar> One of you could add 'mdev' to that.
03:57:08 <shaohe_feng> In k8s we have CNI for networking, CSI for storage, and DPI for devices/accelerators: let the right component do the right job.
03:57:36 <shaohe_feng> They said they want to add mdev in Cyborg,
03:58:25 <shaohe_feng> but Cyborg's progress is slow, so they can't wait for it and add it to Nova.
03:59:21 <Yumeng> Sundar: I think the benefit is for operators (users). They can manage FPGAs and vGPUs using just one component, Cyborg, configuring both in cyborg.conf instead of vGPU in nova.conf and FPGA in cyborg.conf.
03:59:29 <chenke> I think the current nova's patch is high priority.
04:00:38 <Sundar> Yumeng and all: Sure. No objections to vGPUs or mediated devices in Cyborg. But it is going to take some effort probably to get that change into Nova
04:00:51 <shaohe_feng> Yes, the cyborg goal is similar to DPI in k8s, manage different accelerators.
04:01:48 <shaohe_feng> The Nova change for Cyborg mdev can come later. Let's focus on PCI.
04:01:50 <Sundar> Yumeng: My main comment would be to move 'mdev' type to attach handles from CPIDs. Also, it needs some testing. Do you plan to support 3rd party CI for GPUs?
04:01:59 <xinranwang> Can vGPU be attached like normal PCI device passthrough, if so, I think there is no conflict.
04:02:20 <Yumeng> xinranwang: it cannot.
04:03:22 <Yumeng> xinranwang: nova already support this. we can reuse those code.
04:03:46 <xinranwang> So a vGPU is attached in a different way compared with a GPU PF?
04:04:08 <Yumeng> yes, different way.
04:04:09 <shaohe_feng> but cyborg can support it. The cloud provider can change it in nova by themselves downstream, if they really want to use cyborg.
04:04:33 <shaohe_feng> yes, an MDEV is identified by a UUID on the mdev bus.
04:04:43 <shaohe_feng> A PCI device is identified by its BDF on the PCI bus.
04:04:50 <Yumeng> yes, exactly. An MDEV is identified by a UUID on the mdev bus.
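To make the "UUID on the mdev bus vs. BDF on the PCI bus" point concrete, the sketch below turns an attach handle into a libvirt hostdev element: a PCI handle is addressed by domain/bus/slot/function, an mdev handle by its UUID. The element and attribute names follow libvirt's domain XML format; the function name hostdev_xml and its (type, info) inputs are hypothetical, and this is not the code in the Nova patch.

```python
import re
import uuid
import xml.etree.ElementTree as ET

# domain:bus:device.function, e.g. 0000:81:00.0
BDF_RE = re.compile(r"^([0-9a-f]{4}):([0-9a-f]{2}):([0-9a-f]{2})\.([0-7])$")


def hostdev_xml(handle_type, info):
    """Build a libvirt <hostdev> element for a PCI BDF or an mdev UUID."""
    hostdev = ET.Element("hostdev", mode="subsystem")
    source = ET.SubElement(hostdev, "source")
    if handle_type == "PCI":
        m = BDF_RE.match(info.lower())
        if not m:
            raise ValueError(f"bad PCI address {info!r}")
        domain, bus, slot, function = m.groups()
        hostdev.set("type", "pci")
        hostdev.set("managed", "yes")
        ET.SubElement(source, "address", domain="0x" + domain, bus="0x" + bus,
                      slot="0x" + slot, function="0x" + function)
    elif handle_type == "MDEV":
        hostdev.set("type", "mdev")
        hostdev.set("model", "vfio-pci")
        # uuid.UUID raises ValueError if info is not a valid UUID.
        ET.SubElement(source, "address", uuid=str(uuid.UUID(info)))
    else:
        raise ValueError(f"unsupported attach handle type {handle_type!r}")
    return ET.tostring(hostdev, encoding="unicode")
```

The only thing that changes between the two branches is how the device is addressed and attached, which is why the attach handle, not the control path ID, carries the MDEV type.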
04:05:14 <shaohe_feng> Let Cyborg support mdev first.
04:05:43 <shaohe_feng> Leave the change in Nova to the cloud providers.
04:05:57 <shaohe_feng> They have the ability to change it.
04:06:20 <shaohe_feng> at present.
04:07:51 <shaohe_feng> they will contribute to nova, if they think it is necessary
04:08:13 <Sundar> Yumeng: do you plan to support 3rd party CI for GPUs?
04:08:17 <Yumeng> ok. thanks for the suggestion. shaohe_feng.  I think we can talk later about the vGPU.  Sundar: pls go back to nova-interaction topic.
04:08:30 <Yumeng> Sundar: not yet for now.
04:08:45 <Sundar> ok
04:09:21 <Sundar> Nova interaction: currently held up on the requirement to support rescheduling. I have a single-node devstack setup, so cannot test rescheduling.
04:10:55 <Sundar> This does not mean multi-node for Cyborg -- the functional tests that I wrote mock the Cyborg APIs, so I only need it for the Nova side
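Because the Nova-side functional tests mock the Cyborg APIs, one idea for exercising rescheduling is a fake client whose binding fails on the first host and succeeds on the next. The sketch below is a toy under that assumption: build_on_hosts, BindError, bind_arqs and unbind_arqs are hypothetical names, not Nova's or Cyborg's real code or client methods; only unittest.mock is a real library here.

```python
from unittest import mock


class BindError(Exception):
    """Hypothetical error raised when ARQ binding fails on a host."""


def build_on_hosts(client, instance, hosts):
    """Toy reschedule loop: try each candidate host until ARQ binding works.
    On failure, unbind before moving on so no ARQ stays bound to a bad host."""
    for host in hosts:
        try:
            client.bind_arqs(instance, host)
            return host
        except BindError:
            client.unbind_arqs(instance)
    raise RuntimeError("no valid host")


# Fake Cyborg client: the first bind raises, the second returns None.
client = mock.Mock()
client.bind_arqs.side_effect = [BindError("host1"), None]

chosen = build_on_hosts(client, "vm-1", ["host1", "host2"])
```

A test can then assert that binding was attempted on both hosts and that the failed binding was cleaned up, all on a single node.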
04:11:52 <Sundar> If anybody has any ideas to handle this, please LMK
04:13:19 <Sundar> That's all I have. Anything else to discuss?
04:14:11 <Yumeng> ok. not from me.
04:15:05 <brinzhang> On the functional tests: I think I should do some research first, then we can talk. Nothing else from me.
04:16:09 <Sundar> brinzhang: yes, we need functional tests. You have already started that. What do we need to discuss?
04:17:22 <brinzhang> Sundar, I think there is a lot of work I need to consider; once I have enough, I will bring it up again in the IRC meeting.
04:18:10 <brinzhang> Some cleanup patches; can you +A?
04:18:27 <shaohe_feng> Nothing from me. If you need some help, please ping me on Skype.
04:18:31 <brinzhang> https://review.opendev.org/#/c/707332/
04:18:49 <brinzhang> https://review.opendev.org/#/c/709974/
04:19:11 <Sundar> brinzhang: Got it. Sure
04:19:31 <Sundar> Good. Thanks a lot, everybody. Have a good day!
04:19:33 <brinzhang> These are not necessary in the cyborg or cyborg-specs projects, so I want to remove them and keep things clean.
04:19:34 <Sundar> #endmeeting