15:04:20 <shaohe_feng_> #startmeeting openstack-cyborg-driver
15:04:21 <openstack> Meeting started Mon Mar 18 15:04:20 2019 UTC and is due to finish in 60 minutes. The chair is shaohe_feng_. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:04:23 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:04:25 <openstack> The meeting name has been set to 'openstack_cyborg_driver'
15:04:39 <shaohe_feng_> let waits for a minutes.
15:05:39 <shaohe_feng_> #info shaohe_feng_
15:05:40 <wangzhh> Fine.
15:06:03 <xinranwang> Hi all
15:06:07 <wangzhh> Hi xinran.
15:06:11 <xinranwang> Sorry for late
15:06:16 <xinranwang> #info xinranwang
15:06:19 <shaohe_feng_> evening xinranwang
15:06:24 <wangzhh> #info wangzhh
15:06:35 <xinranwang> hi shaohe_feng_ wangzhh
15:06:53 <shaohe_feng_> we have not hold this meeting for a long time.
15:07:24 <shaohe_feng_> #link https://wiki.openstack.org/wiki/Meetings/CyborgDriverTeamMeeting#Agenda_for_next_meeting_:_Mar_18th.2C_2019
15:07:28 <shaohe_feng_> here is the agent.
15:07:50 <shaohe_feng_> s/agent/agenda
15:07:57 <Li_Liu> Hi Gyus
15:08:04 <wangzhh> Hi, uncle Li.
15:08:16 <shaohe_feng_> Li_Liu: morning uncle Li.
15:08:19 <Li_Liu> Hi, xiaohei~~
15:08:28 <Li_Liu> hi shaohe
15:08:38 <Li_Liu> you guys wanna do a zoom meeting instead?
15:08:38 <shaohe_feng_> I want to introduce some some hardware accelerators.
15:09:10 <shaohe_feng_> 1. the current know type of accelerator card
15:09:45 <shaohe_feng_> as we all know cyborg will support mdev and pci card.
15:10:11 <shaohe_feng_> but now I find there are 2 other kinds of hardware card we can support.
15:10:35 <shaohe_feng_> one is ip over PCIE, another is USB.
15:10:47 <Li_Liu> i see
15:11:06 <shaohe_feng_> wangzhh: do you know these two kind cards?
15:11:08 <Li_Liu> can they fit into our current design?
15:11:29 <shaohe_feng_> not sure, so we need more discuss with them.
15:11:56 <wangzhh> I don't know much about ip over pcie, what does that mean?
15:12:29 <Li_Liu> I think it's a remote case
15:12:55 <Li_Liu> PCI over ethernet?
15:13:14 <shaohe_feng_> Li_Liu: yes.
15:13:43 <shaohe_feng_> #link https://www.intel.com/content/dam/www/public/us/en/documents/product-briefs/vca-2-visual-compute-accelerator-product-brief.pdf
15:13:53 <shaohe_feng_> Li_Liu: No, IP over pci.
15:14:05 <Li_Liu> from Operation System point of view, it's still a pci device right?
15:14:45 <shaohe_feng_> Li_Liu: is it s pci devices, but you communicate with it by it.
15:14:52 <shaohe_feng_> Li_Liu: it is a local pci card.
15:15:16 <shaohe_feng_> such as the vca2 card, see link above,
15:15:34 <Li_Liu> you mean other hosts can communicate with it over ethernet?
15:16:29 <wangzhh> So, actually, it is a pci device?
15:16:31 <shaohe_feng_> oh, the local host communicate the local card, over PCIE.
15:16:49 <wangzhh> from os view.
15:17:50 <shaohe_feng_> wangzhh: from the os view, you can see it a new kind device with new driver maybe.
15:18:18 <shaohe_feng_> there's another card I have attend meeting last week, seem this is a common way for some card.
15:18:58 <shaohe_feng_> we can dig more about this kind of card.
15:19:52 <shaohe_feng_> for usb card, the movidius AI card is this kind.
15:20:10 <Li_Liu> I think we just need to make sure 2 things: 1. can os-acc attach it like all the other devices, 2. can the resource fit into our current data model
15:20:53 <shaohe_feng_> yes.
15:20:55 <Li_Liu> as long as these two requirements can meet, we should be good
15:21:26 <wangzhh> It make sense.
15:21:48 <shaohe_feng_> I think the usb devices can satisfy these two requirements.
15:22:01 <Li_Liu> shaohe_feng_ have they finalize the resource structure yet?
15:23:10 <shaohe_feng_> Li_Liu: usb, yes.
15:23:31 <shaohe_feng_> just remind these 2 kind devices.
15:23:31 <wangzhh> How about another one?
15:24:20 <shaohe_feng_> wangzhh: I'm not looking into looking into it well.
15:24:35 <shaohe_feng_> OK, let's go ahead.
15:24:48 <wangzhh> OK.
15:24:52 <shaohe_feng_> Re-enumeration of hardware card
15:25:17 <shaohe_feng_> most of us know the issue of Re-enumeration.
15:25:52 <Li_Liu> the issue we discussed last week?
15:26:16 <shaohe_feng_> no, but this is a common issue.
15:26:43 <shaohe_feng_> the bus of a hardware card maybe change after we resize a hardware and reboot.
15:27:12 <shaohe_feng_> seem this is a big problem for accelerator manage in cyborg.
15:28:01 <shaohe_feng_> I have discuss it with Yongli, the main PCI devices contributor in nova.
15:28:29 <shaohe_feng_> he say, nova does not allow resize hardware
15:29:35 <Li_Liu> you mean add/remove device after reboot?
15:29:54 <shaohe_feng_> yes
15:30:04 <shaohe_feng_> unless evict all VMs from this node.
15:30:30 <shaohe_feng_> Li_Liu: wangzhh: xinranwang: what's do you think about?
15:31:00 <shaohe_feng_> Or do you have a good ideas for hardware resize?
15:31:33 <wangzhh> Wuu, IMO, it's better to change status to error or offline in cyborg.
15:31:58 <wangzhh> And let operator sync it manaully.
15:32:17 <Li_Liu> Let's say before restart we have 3 cards, after restart we now have 4 cards
15:32:44 <Li_Liu> I think driver can find out which one is the new one right?
15:32:47 <wangzhh> We can supply a tool or api for operator to update it.
15:32:50 <xinranwang> If we plug in a new card on server, and reboot. But the hw resource assigned to an instance does not change.
15:33:17 <xinranwang> will the bdf change? if so, that should be an issue.
15:33:20 <wangzhh> Li_liu, as xinran said.
15:33:57 <Li_Liu> the bdf might change, but we don't need to guarantee give user the card with the same bdf
15:34:12 <shaohe_feng_> the bdf maybe change, bus-port of a usb devices also maybe change.
15:34:18 <Li_Liu> just give user the card with the same type
15:35:27 <wangzhh> The most tricky thing is how to handle the resource which had been assinged.
15:36:18 <xinranwang> if user has done some work on old hw, that will be a loss.
15:37:12 <Li_Liu> in that case, operator has to notify the user to backup first
15:37:35 <Li_Liu> size operator should know when the resizing is happening
15:37:44 <Li_Liu> since*
15:38:50 <Li_Liu> In 99% of the scenarios tho, I don't think it matters anyway
15:38:57 <wangzhh> What about power failure?
15:39:28 <xinranwang> how will nova record the hw resource from cyborg, there should be a field of nova instance to record this. is this attach_handle_uuid?
15:39:28 <Li_Liu> if power failure happen, the device should not be resized right?
15:40:56 <shaohe_feng_> if you hotplug in a hardware before failure happen, the things is also bad.
15:42:33 <wangzhh> Li, If we just reboot the server, the bus wont't change?
15:42:43 <wangzhh> *won't
15:43:03 <Li_Liu> lol... as I said.. if operator wants to do this... he/she needs to notify users...
15:43:14 <shaohe_feng_> if you do not resize hardware, the bus wont't change.
15:43:16 <shaohe_feng_> Li_Liu: yes.
15:43:31 <xinranwang> wangzhh: no, it will not change
15:43:36 <wangzhh> shaohe_feng_, Got it.
15:43:36 <Li_Liu> wangzhh, I think simple reboot should not change the bdf
15:43:48 <Li_Liu> bios just scan the pci tree
15:44:04 <Li_Liu> if nothing new is inserted, it should not change
15:44:07 <shaohe_feng_> live migrate the VM to another host.
15:44:50 <xinranwang> that's more complex...
15:45:21 <wangzhh> scheduler filter should deal with this part. shaohe_feng_
15:46:19 <shaohe_feng_> the data center can scale their hardwares. For example the want to support more AI card in their exist hosts.
15:47:27 <shaohe_feng_> OK, let keep this issue in mind, maybe we can find a good way to solve it
15:47:31 <shaohe_feng_> go ahead.
15:47:52 <shaohe_feng_> multi-level resources support
15:48:18 <shaohe_feng_> now I want to support a new multi-level card.
15:48:29 <shaohe_feng_> similar to pfga card.
15:49:15 <shaohe_feng_> for example. There is a one region in a card but 4 functions in a region.
15:49:16 <Li_Liu> sure, to support new cards. as long as it can meet the requirements I mentioned earlier
15:49:38 <Li_Liu> 4 different functions?
15:49:42 <shaohe_feng_> there's 3 requirements:
15:50:29 <shaohe_feng_> Li_Liu: in my new card, they are same function, but for fpga, it may different functions. fpga is more complex.
15:51:03 <shaohe_feng_> 1. we should know the topology of this devices.
15:52:05 <shaohe_feng_> 2. user can apply any level of the resources, for example, he want to apply a region or just one function.
15:52:37 <shaohe_feng_> 3. avoid fragmentization
15:53:35 <shaohe_feng_> Li_Liu: now the cyborg satisfy the the former 2 requirements, right?
15:54:11 <Li_Liu> shaohe_feng_ it should
15:54:18 <shaohe_feng_> Ok, greate.
15:54:26 <shaohe_feng_> what's about 3.
15:54:27 <Li_Liu> cyborg was designed to have these in mind
15:54:36 <shaohe_feng_> good.
15:54:49 <Li_Liu> the 3rd one is related to scheduling algorithm
15:55:10 <Li_Liu> we might need to work with nova weigher for that
15:55:22 <shaohe_feng_> Li_Liu: that's need cyborg help.
15:55:31 <Li_Liu> that's for sure
15:55:47 <shaohe_feng_> let me elaborate it
15:56:01 <shaohe_feng_> 3 regions
15:56:03 <Li_Liu> cyborg can provide a weigher like mechanism and work with nova
15:56:20 <shaohe_feng_> one region with 4 function.
15:56:43 <shaohe_feng_> User 1 apply one function from region 1
15:58:14 <shaohe_feng_> user 2 want another 2 more functions. I expect cyborg allocate them from region 1 instead of region 2/3.
15:58:48 <shaohe_feng_> user 3 want another one more functions, it is also from region 1.
15:59:49 <shaohe_feng_> the allocation should not scatter among region 1,2 and 3
16:00:28 <shaohe_feng_> they should centralize 1 region.
16:00:29 <Li_Liu> that should be easy to do. a weigher would do the job'
16:01:24 <shaohe_feng_> so user 4 can apply the rest 2 whole regions.
16:02:08 <shaohe_feng_> Li_Liu: OK, is there a weigher mechanism for it now?
16:02:17 <Li_Liu> not yet
16:02:27 <Li_Liu> we can plan this
16:02:42 <shaohe_feng_> OK, good.
16:02:44 <Li_Liu> coz I think numa scheduling also needs this feature
16:02:56 <shaohe_feng_> this is useful.
16:03:15 <Li_Liu> for sure
16:03:40 <Li_Liu> I will add this to T release plannig
16:03:59 <shaohe_feng_> there's a common scenario for this feature.
16:04:10 <shaohe_feng_> Li_Liu: good, thanks.
16:04:26 <Li_Liu> npnp
16:04:57 <shaohe_feng_> AoB?
16:05:02 <Li_Liu> I need to pick up my lunch now, you guys can go ahead. don't stay too late.. :P
16:05:08 <shaohe_feng_> Li_Liu: wangzhh: xinranwang ?
16:05:16 <Li_Liu> I am all good\
16:05:26 <shaohe_feng_> good.
16:05:32 <shaohe_feng_> glad to talk with you.
16:05:42 <wangzhh> Me, too.
16:06:10 <shaohe_feng_> let's end the meeting.
16:06:11 <xinranwang> i am fine with that. NUMA should also need the similar mechanism
16:06:33 <shaohe_feng_> thanks all.
16:06:55 <shaohe_feng_> #endmeeting
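
Note on the fragmentation example discussed at 15:56-16:01 (3 regions, 4 functions per region): the allocation behavior shaohe_feng_ describes can be illustrated with a best-fit choice, which prefers the region with the fewest free functions that still satisfies the request. The sketch below is only an illustration under that assumption. `Region` and `allocate` are hypothetical names, not part of Cyborg or Nova, and as noted in the meeting no such weigher exists yet.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical model of one region on a card (not a Cyborg class)."""
    name: str
    total: int = 4   # functions per region, as in the meeting example
    used: int = 0

    @property
    def free(self) -> int:
        return self.total - self.used


def allocate(regions, count):
    """Best-fit: take `count` functions from the fullest region that still fits.

    Returns the chosen region's name, or None if no single region can
    satisfy the request.
    """
    candidates = [r for r in regions if r.free >= count]
    if not candidates:
        return None
    # Fewest free functions first; break ties by name so results are stable.
    best = min(candidates, key=lambda r: (r.free, r.name))
    best.used += count
    return best.name


# Replay of the scenario from the meeting.
regions = [Region("region1"), Region("region2"), Region("region3")]
print(allocate(regions, 1))  # user 1: one function
print(allocate(regions, 2))  # user 2: two more functions
print(allocate(regions, 1))  # user 3: one more function
print(allocate(regions, 4))  # user 4: a whole region
print(allocate(regions, 4))  # user 4: a second whole region
```

With this policy the first three requests all land in region1, which leaves region2 and region3 intact for user 4. A real weigher would also have to rank hosts, not only regions within one card, which this sketch does not attempt.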