| *** crushil has joined #openstack-cyborg | 01:40 | |
| *** crushil has quit IRC | 02:26 | |
| *** crushil has joined #openstack-cyborg | 02:31 | |
| *** sekelso has quit IRC | 03:13 | |
| *** sekelso has joined #openstack-cyborg | 03:22 | |
| *** sekelso has quit IRC | 03:30 | |
| *** sekelso has joined #openstack-cyborg | 03:38 | |
| *** crushil has quit IRC | 03:38 | |
| *** crushil has joined #openstack-cyborg | 03:40 | |
| *** crushil has quit IRC | 03:40 | |
| *** sekelso has quit IRC | 04:16 | |
| *** joseppc has quit IRC | 07:16 | |
| *** jkilpatr has joined #openstack-cyborg | 11:10 | |
| *** joseppc has joined #openstack-cyborg | 11:49 | |
| *** mikeH has joined #openstack-cyborg | 12:06 | |
| *** sekelso has joined #openstack-cyborg | 13:20 | |
| *** skelso has joined #openstack-cyborg | 13:22 | |
| *** sekelso has quit IRC | 13:25 | |
| *** NokMikeR has joined #openstack-cyborg | 13:47 | |
| *** zhipeng_ has joined #openstack-cyborg | 13:47 | |
| *** crushil has joined #openstack-cyborg | 13:49 | |
| *** skelso has quit IRC | 13:52 | |
| *** skelso has joined #openstack-cyborg | 14:03 | |
| NokMikeR | any meeting today? | 14:07 |
|---|---|---|
| zhipeng_ | yes I just sent out the email to the openstack-dev | 14:09 |
| zhipeng_ | weekly meeting as usual | 14:09 |
| NokMikeR | ok thanks | 14:09 |
| *** zhipeng_ has quit IRC | 14:18 | |
| *** zhipeng_ has joined #openstack-cyborg | 14:18 | |
| *** joseppc has quit IRC | 14:52 | |
| crushil | \o | 14:59 |
| jkilpatr | o/ | 14:59 |
| zhipeng_ | hey | 15:00 |
| zhipeng_ | let's staaaart the longest irc meeting ever | 15:00 |
| jkilpatr | can't be as bad as last week | 15:00 |
| zhipeng_ | #startmeeting openstack-cyborg | 15:00 |
| openstack | Meeting started Wed Jun 7 15:00:56 2017 UTC and is due to finish in 60 minutes. The chair is zhipeng_. Information about MeetBot at http://wiki.debian.org/MeetBot. | 15:00 |
| openstack | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 15:00 |
| openstack | The meeting name has been set to 'openstack_cyborg' | 15:01 |
| zhipeng_ | hahaha | 15:01 |
| zhipeng_ | let's hope so | 15:01 |
| zhipeng_ | okey so quick update from my side | 15:01 |
| zhipeng_ | on the api/db patch | 15:01 |
| zhipeng_ | #topic BP discussion | 15:01 |
| zhipeng_ | #link https://review.openstack.org/#/c/445814/ | 15:01 |
| zhipeng_ | so ChrisD reviewed with the comments that there is an ongoing discussion on the traits | 15:02 |
zhipeng_ | we might consider aligning our design with it | 15:02 |
| zhipeng_ | originally, the placement resource provider was meant for just compute node | 15:02 |
| jkilpatr | I was looking over that, care to summarize? | 15:02 |
| zhipeng_ | sure I'm putting my thoughts together now | 15:03 |
zhipeng_ | so now the placement team sees the pitfall of that | 15:03 |
| zhipeng_ | since for example for shared storage (external arrays I would suppose) | 15:03 |
| zhipeng_ | if you only count the storage side of things on the compute node | 15:04 |
| zhipeng_ | your resource provider will never correctly reflect the required traits | 15:04 |
| jkilpatr | so this is an issue with accelerators that may be shared between many computes? | 15:04 |
zhipeng_ | the resource provider should reflect the shared storage arrays, rather than only local disks | 15:04 |
zhipeng_ | no, I think this is an issue for accelerators as a whole | 15:05 |
| jkilpatr | how so? | 15:05 |
zhipeng_ | since if the resource provider only identifies with the compute node | 15:05 |
zhipeng_ | we could wind up with the same problem as we have now, since accelerator characteristics are bundled with the compute characteristics | 15:05 |
| zhipeng_ | well we could have our own resource class for sure, but that does not solve the problem | 15:06 |
| zhipeng_ | nova scheduler asks the placement api to provide all the necessary resources | 15:06 |
zhipeng_ | and for Cyborg, one of the important goals is that accelerators are treated as first-class citizens | 15:07 |
zhipeng_ | meaning that we should have individual resource providers for accelerators | 15:07 |
| zhipeng_ | from the email link Chris provided, there is an etherpad documenting the "Plan B" | 15:08 |
| jkilpatr | ok so the issue is that if we have a 'gpu' resource provider it's dependent on computes in a way that resource providers aren't supposed to be. | 15:09 |
zhipeng_ | which I liked very much, it works on extending the current nested resource provider definition to a more relaxed, multiple resource providers one | 15:09 |
| zhipeng_ | yes exactly | 15:09 |
zhipeng_ | the scheduling decision would still largely depend on the regular compute features, since we are just part of the traits | 15:09 |
| crushil | interesting | 15:09 |
| zhipeng_ | so back to the "Plan B", the current nested resource provider model is designed primarily for stuff like NUMA nodes | 15:10 |
| zhipeng_ | where you got this parent-child relationship | 15:10 |
| crushil | So, how does that change our implementation? | 15:10 |
zhipeng_ | the Plan B extends the scope to be more general, meaning for Cyborg use cases | 15:10 |
zhipeng_ | we could have a separate resource provider for each and every accelerator | 15:11 |
zhipeng_ | (if they are deemed important for the workload) | 15:11 |
| zhipeng_ | crushil the change is that | 15:11 |
| zhipeng_ | our DB design has to align with the proposed nested resource provider/trait design | 15:11 |
| zhipeng_ | at least DB schemas | 15:12 |
zhipeng_ | so that when the cyborg agent populates our inventory into the placement api | 15:12 |
| zhipeng_ | it could understand it correctly | 15:12 |
| crushil | Ok, what about the other specs? | 15:13 |
| zhipeng_ | not concerned that much :) | 15:14 |
| crushil | gotcha | 15:14 |
| zhipeng_ | So I'm thinking we might need two DB schemas | 15:14 |
| zhipeng_ | the current one in the spec patch, could be used for the discovery phase | 15:15 |
zhipeng_ | that is when the user starts the cyborg service and then the agent/driver does the discovery/pre-config | 15:15 |
| zhipeng_ | collect what we have, on the host | 15:15 |
| zhipeng_ | the second set of schema needs to be aligned with nested resource provider | 15:16 |
| zhipeng_ | to interact with placement api and eventually nova-scheduler | 15:16 |
| zhipeng_ | for the VM to select the correct accelerator resource | 15:16 |
| jkilpatr | so we need to maintain two parallel db's for each purpose or do you mean we want to change the format in a future release? | 15:17 |
| zhipeng_ | what I'm thinking is that we don't have exhaustive knowledge on the hardware now | 15:18 |
zhipeng_ | therefore we keep a separate DB schema, the host side one should be more extendable or more abstract | 15:18 |
| zhipeng_ | But on another thought | 15:19 |
| zhipeng_ | it might be just too complex ..... | 15:19 |
| zhipeng_ | what do you guys think | 15:19 |
| jkilpatr | I think we should try and keep one db as much as possible, I don't want to try and maintain parallel sets of data | 15:19 |
| zhipeng_ | that makes sense | 15:19 |
| crushil | I agree, having multiple DBs is just clunky | 15:20 |
zhipeng_ | in that case we will just use the resource provider schema, I will follow up with Chris to see which one I should use | 15:21 |
| zhipeng_ | the current one or the proposed one | 15:21 |
| jkilpatr | sounds good. | 15:22 |
| jkilpatr | Anything else on that subject? | 15:22 |
| zhipeng_ | nope | 15:22 |
| zhipeng_ | anything else from you guys on the open spec ? | 15:23 |
| crushil | nope | 15:23 |
| zhipeng_ | great | 15:23 |
| zhipeng_ | #topic initial code development | 15:23 |
| zhipeng_ | so, any roadblocks | 15:24 |
| jkilpatr | been trying to understand oslo rpc and message passing and start structuring the conductor/agent | 15:24 |
| zhipeng_ | sounds like a great start :) | 15:24 |
| crushil | I have created stubs and I will push them up by the end of the week | 15:25 |
| zhipeng_ | great ! | 15:25 |
| jkilpatr | crushil, sounds good. | 15:25 |
| zhipeng_ | let's do small pieces like Justin suggested | 15:26 |
| crushil | I will fill them out rebased on top of the API and agent patches | 15:26 |
| jkilpatr | so a lot of what we will be doing involves rpc between different components, so people with integrating parts need to talk to each other about interfaces | 15:26 |
| jkilpatr | I don't think we should be too worried about a stable internal interface | 15:26 |
| zhipeng_ | yes I agree | 15:26 |
| zhipeng_ | oslo.messaging could provide everything we need | 15:27 |
jkilpatr | well sometimes we need rpc, for example the driver should be called by the agent over rpc, I'm thinking (we could invoke directly but I'm not sure if I want to do that) | 15:27 |
| zhipeng_ | i think it should be done over rpc | 15:29 |
| zhipeng_ | unless, we gave driver restful apis ? | 15:29 |
| jkilpatr | I don't think that's the right application here. Our internal code needs to be more tightly integrated than restfulness allows. | 15:30 |
| zhipeng_ | yep | 15:30 |
| *** rushil has joined #openstack-cyborg | 15:30 | |
| zhipeng_ | so rpc should be fine here | 15:31 |
| zhipeng_ | i think at the moment, it is agent talking to the generic driver | 15:31 |
| zhipeng_ | later on, we should design something like the neutron ml2 driver interface | 15:31 |
| zhipeng_ | that every driver, vendor or not, implements the interface which rpc calls will go through | 15:32 |
| zhipeng_ | in a rather standard way | 15:32 |
| rushil | Ok. So, are we going to follow the neutron model vs the nova/cinder model? | 15:33 |
zhipeng_ | i think more like the neutron model | 15:33 |
| zhipeng_ | for out-of-tree drivers | 15:33 |
| rushil | But isn't that too complicated | 15:34 |
| zhipeng_ | cinder and nova are mostly in-tree maintained drivers | 15:34 |
| zhipeng_ | it won't be too complicated for us i think | 15:34 |
| zhipeng_ | neutron is complicated because they have to define the type drivers and mechanism drivers | 15:34 |
| rushil | Well, cinder has out of tree drivers based on whether you have CI or not | 15:34 |
zhipeng_ | I think in-tree drivers also require CI | 15:35 |
| zhipeng_ | otherwise the cinder team removes your driver | 15:35 |
| rushil | No, they just make it unsupported i.e. move it out of tree | 15:35 |
zhipeng_ | for us, as long as the devices communicate over PCIe, the driver interface won't be too complicated | 15:35 |
| zhipeng_ | but if we need to support extra protocols, that is where things will get wild | 15:36 |
| zhipeng_ | rushil ah okey | 15:36 |
| rushil | Ok. I just want to make sure we don't make things more complicated than they should be | 15:36 |
| zhipeng_ | yes that is always our goal | 15:36 |
| jkilpatr | I can agree on a standard rpc interface but that's less complicated than I think you are making it out to be. | 15:36 |
| zhipeng_ | we even wanted to skip the conductor :P | 15:36 |
| jkilpatr | and I nearly got away with it too! | 15:37 |
| zhipeng_ | jkilpatr haha | 15:37 |
| rushil | Lol | 15:37 |
| zhipeng_ | rushil the cyborg ml2 driver would be modeled from your generic driver implementation :P | 15:39 |
| rushil | I wouldn't call it ml2 driver though | 15:40 |
| zhipeng_ | of course we will have another name for it | 15:40 |
| zhipeng_ | aluminum drivers :P | 15:41 |
| zhipeng_ | for cyborg robots | 15:41 |
| rushil | Hehe | 15:41 |
| jkilpatr | Anyways I'll try have a stub up this week (conductor) and then agent next week. | 15:42 |
| jkilpatr | depends on how other tasks go for me. | 15:42 |
| rushil | jkilpatr: Cool | 15:43 |
| zhipeng_ | sounds great, i got another colleague working on cyborg this week, so api code will be developed in parallel | 15:43 |
| rushil | Awesome | 15:44 |
zhipeng_ | hopefully once we settle the spec, the initial code will come out | 15:44 |
| zhipeng_ | and we could iterate over | 15:44 |
| zhipeng_ | #topic AoB | 15:44 |
| zhipeng_ | any other topics | 15:44 |
| rushil | Btw our group at Lenovo sent out initial emails to vendors to get their drivers aligned with cyborg | 15:44 |
| zhipeng_ | wow | 15:45 |
| zhipeng_ | that is awesome | 15:45 |
| rushil | I'll keep you guys posted on that | 15:45 |
| zhipeng_ | could you disclose the vendor names for now ? | 15:45 |
| zhipeng_ | or should we wait until later | 15:45 |
| rushil | The usual suspects | 15:46 |
| zhipeng_ | e.g ? | 15:46 |
| rushil | Nvidia, AMD | 15:46 |
| rushil | And smaller ones like Micron | 15:47 |
| zhipeng_ | cool ! | 15:47 |
| rushil | I'll let y'all know when they are committed to contributing code | 15:47 |
| zhipeng_ | great :) | 15:47 |
| zhipeng_ | okey if there are no other topics, we go to the usual long slumber ~~ | 15:50 |
| zhipeng_ | will try to remember to close the meeting an hour later | 15:50 |
| crushil | Cool, thanks zhipeng_ | 15:51 |
| *** NokMikeR has quit IRC | 15:51 | |
| *** rushil has quit IRC | 15:53 | |
| *** jkilpatr has left #openstack-cyborg | 15:56 | |
| *** skelso has quit IRC | 16:12 | |
| *** skelso has joined #openstack-cyborg | 16:13 | |
| *** joseppc has joined #openstack-cyborg | 16:13 | |
| *** skelso has quit IRC | 16:30 | |
| *** skelso has joined #openstack-cyborg | 16:30 | |
| *** jkilpatr has joined #openstack-cyborg | 16:40 | |
| *** zhipeng_ has quit IRC | 16:59 | |
| *** zhipeng_ has joined #openstack-cyborg | 17:00 | |
| zhipeng_ | #endmeeting | 17:00 |
| openstack | Meeting ended Wed Jun 7 17:00:56 2017 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 17:00 |
| openstack | Minutes: http://eavesdrop.openstack.org/meetings/openstack_cyborg/2017/openstack_cyborg.2017-06-07-15.00.html | 17:00 |
| openstack | Minutes (text): http://eavesdrop.openstack.org/meetings/openstack_cyborg/2017/openstack_cyborg.2017-06-07-15.00.txt | 17:01 |
| openstack | Log: http://eavesdrop.openstack.org/meetings/openstack_cyborg/2017/openstack_cyborg.2017-06-07-15.00.log.html | 17:01 |
| *** zhipeng_ has quit IRC | 17:01 | |
| *** skelso has quit IRC | 19:02 | |
| *** skelso has joined #openstack-cyborg | 19:02 | |
| *** skelso has quit IRC | 19:11 | |
| -openstackstatus- NOTICE: The Gerrit service on review.openstack.org is being restarted now to clear some excessive connection counts while we debug the intermittent request failures reported over the past few minutes | 20:05 | |
| *** skelso has joined #openstack-cyborg | 20:13 | |
| *** skelso has quit IRC | 20:24 | |
| *** skelso has joined #openstack-cyborg | 20:25 | |
| *** mikeH has quit IRC | 21:15 | |
| *** crushil has quit IRC | 21:31 | |
| *** crushil has joined #openstack-cyborg | 21:31 | |
| *** crushil has quit IRC | 21:36 | |
| *** jkilpatr has quit IRC | 22:34 | |
| *** skelso has quit IRC | 22:40 | |
| *** openstack has joined #openstack-cyborg | 23:13 | |
| *** mikeH has joined #openstack-cyborg | 23:37 | |
| *** skelso has quit IRC | 23:37 | |
| *** mikeH has quit IRC | 23:55 | |
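
Note on the BP discussion (15:01 to 15:22): the idea of giving each accelerator its own resource provider nested under the compute node can be pictured with a small tree. This is only a sketch under assumptions. The `ResourceProvider` class, the `find_providers` helper, and the `CUSTOM_*` resource class and trait names are hypothetical illustrations, not the placement API or the Cyborg DB schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Set


@dataclass
class ResourceProvider:
    """Hypothetical stand-in for a resource provider record (not the real schema)."""
    name: str
    inventory: Dict[str, int] = field(default_factory=dict)  # resource class -> units
    traits: Set[str] = field(default_factory=set)
    children: List["ResourceProvider"] = field(default_factory=list)

    def add_child(self, child: "ResourceProvider") -> "ResourceProvider":
        """Nest a provider (for example an accelerator) under this one."""
        self.children.append(child)
        return child

    def walk(self) -> Iterator["ResourceProvider"]:
        """Yield this provider, then every provider nested below it."""
        yield self
        for child in self.children:
            yield from child.walk()


def find_providers(root, resource_class, required_traits):
    """Names of providers that offer resource_class and carry all required traits."""
    required = set(required_traits)
    return [
        rp.name
        for rp in root.walk()
        if rp.inventory.get(resource_class, 0) > 0 and required <= rp.traits
    ]


# One compute node with two accelerators, each modelled as its own provider.
compute = ResourceProvider("compute-1", {"VCPU": 16, "MEMORY_MB": 65536})
compute.add_child(
    ResourceProvider("compute-1-gpu-0", {"CUSTOM_ACCELERATOR_GPU": 1}, {"CUSTOM_PCIE"})
)
compute.add_child(
    ResourceProvider("compute-1-fpga-0", {"CUSTOM_ACCELERATOR_FPGA": 1}, {"CUSTOM_PCIE"})
)

print(find_providers(compute, "CUSTOM_ACCELERATOR_GPU", ["CUSTOM_PCIE"]))
```

Because the accelerator inventory and traits live on the child providers, a query for a GPU matches the GPU provider itself rather than the compute node as a whole.
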
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!
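
Note on the driver discussion (15:26 to 15:36): a standard interface that every driver implements, with the agent routing calls to it, might look roughly like the sketch below. Everything here is an assumption made for illustration: the names `AcceleratorDriver`, `GenericPCIeDriver`, `Agent`, and the `discover`/`attach` methods do not come from the meeting or from Cyborg code, and the RPC transport is replaced by a direct in-process call so the example runs with the standard library alone.

```python
import abc
from typing import Dict, List


class AcceleratorDriver(abc.ABC):
    """Hypothetical interface every driver, vendor or generic, would implement."""

    @abc.abstractmethod
    def discover(self) -> List[Dict[str, str]]:
        """Return one dict per accelerator found on the host."""

    @abc.abstractmethod
    def attach(self, device_id: str, instance_id: str) -> bool:
        """Attach a device to an instance; return True on success."""


class GenericPCIeDriver(AcceleratorDriver):
    """Toy driver backed by a fixed device list instead of real hardware."""

    def __init__(self, devices):
        self._devices = {d["id"]: dict(d) for d in devices}
        self._attached = {}  # device id -> instance id

    def discover(self):
        return list(self._devices.values())

    def attach(self, device_id, instance_id):
        # Refuse unknown devices and devices that are already in use.
        if device_id not in self._devices or device_id in self._attached:
            return False
        self._attached[device_id] = instance_id
        return True


class Agent:
    """Stand-in for the agent: routes named calls to a registered driver."""

    def __init__(self):
        self._drivers = {}

    def register(self, name, driver):
        if not isinstance(driver, AcceleratorDriver):
            raise TypeError("driver must implement AcceleratorDriver")
        self._drivers[name] = driver

    def call(self, driver_name, method, **kwargs):
        # Assumption: in a real deployment this dispatch would sit behind an
        # RPC endpoint; here it is a plain method call.
        return getattr(self._drivers[driver_name], method)(**kwargs)


agent = Agent()
agent.register("generic", GenericPCIeDriver([{"id": "0000:3b:00.0", "type": "gpu"}]))
print(agent.call("generic", "discover"))
print(agent.call("generic", "attach", device_id="0000:3b:00.0", instance_id="vm-1"))
```

Since the agent only depends on the abstract interface, an out-of-tree vendor driver could be registered the same way as the generic one.
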