03:01:17 <Li_Liu> #startmeeting openstack-cyborg
03:01:18 <openstack> Meeting started Wed Mar 13 03:01:17 2019 UTC and is due to finish in 60 minutes. The chair is Li_Liu. Information about MeetBot at http://wiki.debian.org/MeetBot.
03:01:19 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
03:01:21 <openstack> The meeting name has been set to 'openstack_cyborg'
03:01:24 <Li_Liu> Let's get started
03:01:30 <Li_Liu> #topic Roll Call
03:01:36 <Li_Liu> #info Li_Liu
03:01:48 <Coco_gao> #info Coco_gao
03:01:54 <xinranwang> #info xinranwang
03:02:18 <Li_Liu> are sundar and zhenghao here yet?
03:02:40 <Li_Liu> #topic Code Freeze Status Update
03:03:17 <Li_Liu> https://review.openstack.org/#/q/status:open%20project:openstack/cyborg
03:03:21 <Coco_gao> I will update my patch according to the comments these two days.
03:03:38 <Li_Liu> Coco_gao, thanks
03:04:08 <Li_Liu> Why is https://review.openstack.org/#/c/574075/ not merged yet?
03:04:18 <Li_Liu> strange ><||
03:04:52 <Coco_gao> That depends on my patch
03:05:03 <Coco_gao> because my patch is not merged.
03:05:13 <zhipeng> Zuul not started
03:05:21 <Li_Liu> I see
03:06:33 <Li_Liu> By the hard deadline of code freeze, please add UT to the features you own
03:06:53 <wangzhh> hi all
03:06:58 <Li_Liu> Hi wangzhh
03:07:07 <wangzhh> Sorry for being late.
03:08:26 <Coco_gao> I will add my UTs.
03:08:30 <Sundar> #info Sundar
03:08:38 <Li_Liu> Hi Sundar
03:08:43 <Sundar> Sorry for the delay
03:08:47 <Sundar> Hi Li_Liu
03:08:48 <Coco_gao> Hi Sundar
03:08:58 <Sundar> Hi Coco_gao and all
03:09:44 <Li_Liu> I think we are in good shape so far
03:10:30 <zhipeng> Any luck to have xilinx driver lol ?
03:10:40 <Li_Liu> no updates tho
03:11:10 <Li_Liu> I can follow up with Chuck later
03:11:42 <Li_Liu> but prob not gonna make it to the deadline
03:12:43 <Li_Liu> zhipeng, we can still refine our docs after the deadline right?
03:13:23 <zhipeng> Need to do it before the RC
03:14:16 <Sundar> What is needed for docs?
03:14:38 <Li_Liu> RC1 is Mar 18 - Mar 22
03:15:54 <Li_Liu> https://docs.openstack.org/cyborg/latest/#developer-documentation
03:16:14 <Li_Liu> yumeng already added quite some stuff there
03:16:25 <Sundar> Li_Liu: We have feedback that we should improve our API docs. I can document the current v1 API, if nobody else volunteers.
03:16:31 <Li_Liu> we need to keep improving it
03:16:41 <Li_Liu> Sundar, sure thanks a lot
03:17:20 <Li_Liu> I will take some time to work on the python-client
03:17:41 <Li_Liu> at least make it align with our docs and APIs
03:18:30 <Coco_gao> Hi Sundar, still one thing about driver ovo. Is deployable name unique? why is that?
03:18:35 <Coco_gao> Thanks a lot.
03:19:08 <Sundar> Coco_gao: I think so, because it will be used as resource provider name in Placement, and that must be unique AFAIK
03:19:43 <Li_Liu> How are we going to guarantee the uniqueness of the deployable's name?
03:20:05 <Li_Liu> are we doing the check when the resource is reported?
03:20:25 <Coco_gao> we can't if we set the name field in the drivers.
03:20:33 <Sundar> Coco_gao: I don't see explicit documentation that it must be unique. I will check and get back.
03:21:31 <Sundar> Coco_gao and all: why can't Cyborg agent construct the name from other fields like vendor, type, etc., and add a unique id?
03:21:55 <Sundar> E.g. 'INTEL_FPGA_PAC_CARD_ID1'
03:22:08 <Li_Liu> is this ID1 a uuid?
03:22:26 <Sundar> Li_Liu: I was thinking a simple integer
03:22:38 <Sundar> Oh, wait
03:22:50 <Sundar> There is a convention for naming nested RPs
03:22:59 <Sundar> It is based on compute node name
03:23:07 <Sundar> I will check and send email.
03:23:22 <xinranwang> now the deployable name is the filename in /sys/class/fpga, it's unique
03:24:01 <Sundar> xinranwang: It is unique within a compute node
03:24:10 <Sundar> The same name can repeat across nodes
03:24:16 <Coco_gao> Sundar, I agree we'd better do that in agent.
03:25:01 <Sundar> We are not reporting anything to Placement yet, right?
03:25:04 <Coco_gao> xinranwang, that's the problem across nodes, the name may be the same, right?
03:25:07 <Li_Liu> how about when we report the deployable to placement API, we concatenate name+uuid
03:25:53 <Sundar> Li_Liu: good idea. I'll get back with the name convention for nested RPs
03:26:16 <wangzhh> xinranwang, what if different nodes have the same device? Is it unique?
03:26:33 <xinranwang> if we support NRP, we can identify which host the deployable is located on, should it be ok to have the same deployable name on different compute nodes?
03:27:13 <Coco_gao> xinranwang, that will be ok, i think.
03:27:33 <shaohe_feng_> the fpga device name is generated by the kernel.
03:27:37 <wangzhh> xinranwang, Not really, Now it is globally unique.
03:27:43 <shaohe_feng_> the name is unique
03:28:33 <shaohe_feng_> it does not matter if different nodes have the same device
03:28:40 <Coco_gao> the reason why we need to keep it unique from the aspect of driver ovo is that we need to identify the deployable. But driver ovo is compared within the same node, so the name only needs to be unique in one host.
03:29:06 <shaohe_feng_> Coco_gao: yes.
03:29:33 <wangzhh> Coco, so we should change db, it is globally unique now.
03:29:33 <shaohe_feng_> for device@host is unique
03:29:39 <xinranwang> so i think it's ok to have the same deployable name on different compute nodes, on the placement side. But the name should be unique on the same compute node.
03:30:17 <shaohe_feng_> the name is not used to identify a device
03:30:26 <Coco_gao> wangzhh, Sundar and all, maybe we need to change the db constraints on the deployable table, name field.
03:30:46 <Coco_gao> do you agree if I modify that?
03:31:03 <Li_Liu> what constraint?
03:31:07 <shaohe_feng_> just a Prompt for human
03:31:21 <Coco_gao> the name field is unique in deployable table.
03:31:33 <Li_Liu> ah, ok
03:31:42 <Li_Liu> go ahead
03:31:50 <Li_Liu> no problem on my side
03:31:55 <shaohe_feng_> I agree
03:31:58 <Coco_gao> OK, thank you are for the advice.
03:32:05 <Coco_gao> all
03:32:07 <wangzhh> Of course. But how to handle device like gpu, <device_name>_<address>?
03:32:24 <Sundar> Coco_gao: I think it is ok to make it unique because: there is some proposed convention to name nested RPs like '<hostname>_<numaNode>_<x>' and x must be unique within a node anyway for us.
03:32:25 <shaohe_feng_> just keep id/uuid unique. it is machine readable.
03:33:13 <shaohe_feng_> unique in a node is ok.
03:33:18 <wangzhh> shaohe, when driver reports a device, it does not have a uuid.
03:33:31 <shaohe_feng_> not need global
03:33:49 <shaohe_feng_> wangzhh: agent gen one for it. :)
03:34:05 <shaohe_feng_> bus is also unique.
03:34:17 <shaohe_feng_> bus is also machine readable.
03:34:30 <Coco_gao> Sundar, the problem is how to generate x to make sure the same card is using the same x when reporting.
03:34:56 <wangzhh> shaohe, agent will generate the uuid every time?
03:35:08 <shaohe_feng_> wangzhh: no. just once.
03:35:16 <shaohe_feng_> wangzhh: it needs to check the bus.
03:35:35 <shaohe_feng_> wangzhh: on a node, bus is used for machine read .
03:35:51 <shaohe_feng_> on a cluster, uuid is used for machine read
03:36:06 <Sundar> There may not be a PCI bdf in all hypervisors.
03:36:09 <wangzhh> shaohe, I suppose you mean to generate it the first time.
03:36:23 <shaohe_feng_> Coco_gao: the x can be generated by the bus.
03:36:33 <shaohe_feng_> Coco_gao: let me show you an example
03:36:37 <shaohe_feng_> wangzhh: yes.
03:36:44 <Coco_gao> thanks shaohe
03:36:51 <Li_Liu> if there's no bdf, can we use uuid?
03:37:35 <shaohe_feng_> Li_Liu: there's another identification without bdf
03:37:36 <shaohe_feng_> for
03:37:38 <wangzhh> But agent doesn't know which time it is.
03:37:51 <wangzhh> shaohe_feng_
03:38:13 <shaohe_feng_> wangzhh: it needs to check. if the bus is not in the db, then it is the first time.
03:38:14 <Li_Liu> shaohe_feng_, sure that also works
03:38:38 <shaohe_feng_> seems mdev has a uuid.
03:38:48 <shaohe_feng_> and usb has its own bus.
03:38:53 <Sundar> The driver should report a unique id within the node for each device. It could be PCI bdf for libvirt or whatever is unique for PowerVM and others
03:38:54 <wangzhh> If so, agent should query db first. do something like diff?
03:39:08 <Sundar> Then that could be the x factor
03:39:37 <Sundar> wangzhh: No, agent should not query db. For 2 reasons: scaling, upgrades can change db schema
03:39:46 <shaohe_feng_> wangzhh: yes. wen agent start. ti should sync with db firstly
03:39:47 <wangzhh> +1
03:39:50 <shaohe_feng_> when
03:40:31 <wangzhh> shaohe_feng_ agent doesn't query db now.
03:40:33 <shaohe_feng_> Sundar: no, it should sync when it starts. and can keep the info in cache.
03:40:57 <Sundar> Agent should not keep state. Even if it reads db at startup, it cannot assume that it will remain in sync, because operator can update config
03:41:25 <Sundar> No cache, please. We will hit all kinds of issues with stale caches, aging, etc.
03:41:28 <Li_Liu> shaohe_feng_, is the cache only containing the information related to the node?
03:41:39 <shaohe_feng_> yes.
03:41:41 <wangzhh> Agree with sundar at this part. :)
03:41:49 <shaohe_feng_> its own node info.
03:42:12 <shaohe_feng_> let me show you what I do.
03:42:20 <Li_Liu> Sundar, I think it should be ok if it only holds its own information in cache
03:42:53 <wangzhh> shaohe_feng_ haha talk is cheap, show me your code. :)
03:42:54 <Sundar> Li_Liu: The operator may want to disable or enable specific devices, or do other config.
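A minimal sketch of the "generate the uuid once" idea from the exchange above, assuming the driver reports an id that is unique within the node (a PCI BDF in this example). The function name `assign_uuids` and the `known` mapping are hypothetical and are not Cyborg code. Where `known` would come from (db, conductor, or a local cache) is exactly what is being debated here, so the sketch takes it as a plain argument.

```python
import uuid


def assign_uuids(reported_ids, known):
    """Map each node-local device id to a uuid, reusing any uuid already known.

    reported_ids: ids reported by the driver (assumed unique within the node).
    known: dict of node-local id -> uuid from an earlier report (hypothetical source).
    """
    result = {}
    for dev_id in reported_ids:
        # A device seen before keeps its uuid; a new device gets a fresh one.
        result[dev_id] = known.get(dev_id) or str(uuid.uuid4())
    return result


# Illustrative ids only: one device already known, one seen for the first time.
first = assign_uuids(["0000:5e:00.0"], known={})
second = assign_uuids(["0000:5e:00.0", "0000:af:00.0"], known=first)
```

In this sketch the uuid for "0000:5e:00.0" is the same in `first` and `second`, which is the "just once" behavior. If enumeration changes the bus address after a hardware change, the device would look new, which is the concern raised later in the meeting.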
03:43:21 <shaohe_feng_> wangzhh: yes, I do show you code
03:43:28 <Coco_gao> before diff, the agent should get the old driver ovo, is that from db or cache?
03:43:30 <wangzhh> Cool.
03:43:34 <shaohe_feng_> wangzhh: I have implemented it.
03:43:43 <Sundar> Li_Liu: Then we have to propagate such changes to each agent, ensure that it has received it, etc. The agent doesn't need any state for discovery -- just add a unique field that driver reports.
03:43:53 <shaohe_feng_> I report the placement by: device_name@host this is unique
03:44:07 <shaohe_feng_> and I just put the device_name in cyborg db
03:44:27 <shaohe_feng_> it works well, no conflict,
03:44:28 <Coco_gao> I agree with shaohe.
03:44:51 <shaohe_feng_> for placement use device_name@host for index.
03:45:02 <Sundar> Coco_gao: Again, there are some conventions proposed for nested RP names. I am still trying to find the spec/doc where I saw that.
03:45:06 <shaohe_feng_> but cyborg does not use device_name for index.
03:46:54 <shaohe_feng_> Sundar: that's 2 things, but if you want to keep it same. it is OK.
03:47:50 <Sundar> shaohe_feng: There's no point in making them different. The only reason why we have a deployable name is to report to placement
03:47:51 <shaohe_feng_> the big problem is not this.
03:48:20 <Li_Liu> Sundar, please help to find out the conventions. shaohe_feng_, could you share your code with us?
03:48:37 <shaohe_feng_> the big problem is enumeration.
03:48:41 <xinranwang> maybe keep deployable name unique on same compute, and add hostname like "@host" when report to placement.
03:48:45 <Li_Liu> It seems we need some further discussion on this issue, we can discuss it in tomorrow's zoom sync
03:48:55 <xinranwang> i believe that's what shaohe_feng_ did.
03:49:37 <shaohe_feng_> Li_Liu: if restart, and some change on the host. the enumeration may change the bus of a same device.
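A rough sketch of the `device_name@host` idea, assuming the device name is unique only within a compute node and the host name is appended when reporting to Placement. `placement_name` is a hypothetical helper, not Cyborg code, and the device and host names are made up for illustration. The nested RP naming convention Sundar is looking up may call for a different format.

```python
def placement_name(device_name, host):
    """Build a cluster-wide unique name from a node-local device name."""
    return f"{device_name}@{host}"


# Hypothetical inventory: the same kernel-generated name repeats across nodes.
inventory = {
    "compute-1": ["intel-fpga-dev.0", "intel-fpga-dev.1"],
    "compute-2": ["intel-fpga-dev.0"],
}

names = [
    placement_name(dev, host)
    for host, devs in inventory.items()
    for dev in devs
]

# The combined names stay unique even though one device name appears twice.
assert len(names) == len(set(names))
```

The design choice here is that the Cyborg side stores only the node-local name, while the host suffix is added at report time, which matches what xinranwang summarizes above.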
03:50:05 <Li_Liu> shaohe_feng_, yea, I know
03:50:15 <shaohe_feng_> I mean cloud provider may resize the hardware on the host
03:50:24 <shaohe_feng_> so that's what we really need to care about
03:51:08 <shaohe_feng_> after all, the machine needs the bus to identify a device, not the name.
03:51:56 <shaohe_feng_> Li_Liu: yes. maybe we care about the same thing.
03:52:32 <Li_Liu> shaohe_feng_, driver should do the mapping from bus to device name/id I believe
03:53:01 <shaohe_feng_> Li_Liu: yes, that's what we need to improve.
03:54:40 <Li_Liu> ok
03:55:12 <Sundar> @all: Please look at https://git.openstack.org/cgit/openstack/nova-specs/tree/specs/stein/approved/numa-topology-with-rps.rst?h=refs/changes/24/552924/14#n163
03:55:28 <Coco_gao> shaohe_feng, that will be ok if the name changes, the conductor will delete the old device with name1 and add the new device to db with name2. But actually, the db is exactly the same as the real situation.
03:55:43 <Li_Liu> Since it's pretty late for me here. Let's close it up and discuss more detail in tomorrow's sync up
03:56:05 <shaohe_feng_> Li_Liu: OK.
03:56:33 <Coco_gao> but that name change for one device is not supposed to be frequent.
03:56:50 <shaohe_feng_> Coco_gao: yes. not frequently.
03:57:17 <shaohe_feng_> seldom resize the baremetal
03:59:41 <Li_Liu> Alright, let's call the meeting for today. Have a good night/day wherever you are
03:59:45 <Li_Liu> #endmeeting
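Coco_gao's point about a device whose name changes can be sketched as a set difference between the names previously stored for one host and the names in the latest report. `diff_devices` is a hypothetical helper for illustration only; the real conductor logic is not shown in this log.

```python
def diff_devices(old_names, new_names):
    """Return (to_add, to_delete) for one host, comparing devices by name."""
    old, new = set(old_names), set(new_names)
    to_add = sorted(new - old)     # reported now, not stored yet
    to_delete = sorted(old - new)  # stored before, no longer reported
    return to_add, to_delete


# A device that was "name1" is re-enumerated as "name2"; "gpu0" is unchanged.
to_add, to_delete = diff_devices(["name1", "gpu0"], ["name2", "gpu0"])
```

Under this sketch a renamed device is treated as one deletion plus one addition, so the stored state ends up matching what the host actually reports.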