*** crushil has joined #openstack-cyborg | 01:40 | |
*** crushil has quit IRC | 02:26 | |
*** crushil has joined #openstack-cyborg | 02:31 | |
*** sekelso has quit IRC | 03:13 | |
*** sekelso has joined #openstack-cyborg | 03:22 | |
*** sekelso has quit IRC | 03:30 | |
*** sekelso has joined #openstack-cyborg | 03:38 | |
*** crushil has quit IRC | 03:38 | |
*** crushil has joined #openstack-cyborg | 03:40 | |
*** crushil has quit IRC | 03:40 | |
*** sekelso has quit IRC | 04:16 | |
*** joseppc has quit IRC | 07:16 | |
*** jkilpatr has joined #openstack-cyborg | 11:10 | |
*** joseppc has joined #openstack-cyborg | 11:49 | |
*** mikeH has joined #openstack-cyborg | 12:06 | |
*** sekelso has joined #openstack-cyborg | 13:20 | |
*** skelso has joined #openstack-cyborg | 13:22 | |
*** sekelso has quit IRC | 13:25 | |
*** NokMikeR has joined #openstack-cyborg | 13:47 | |
*** zhipeng_ has joined #openstack-cyborg | 13:47 | |
*** crushil has joined #openstack-cyborg | 13:49 | |
*** skelso has quit IRC | 13:52 | |
*** skelso has joined #openstack-cyborg | 14:03 | |
NokMikeR | any meeting today? | 14:07 |
zhipeng_ | yes I just sent out the email to the openstack-dev | 14:09 |
zhipeng_ | weekly meeting as usual | 14:09 |
NokMikeR | ok thanks | 14:09 |
*** zhipeng_ has quit IRC | 14:18 | |
*** zhipeng_ has joined #openstack-cyborg | 14:18 | |
*** joseppc has quit IRC | 14:52 | |
crushil | \o | 14:59 |
jkilpatr | o/ | 14:59 |
zhipeng_ | hey | 15:00 |
zhipeng_ | let's staaaart the longest irc meeting ever | 15:00 |
jkilpatr | can't be as bad as last week | 15:00 |
zhipeng_ | #startmeeting openstack-cyborg | 15:00 |
openstack | Meeting started Wed Jun 7 15:00:56 2017 UTC and is due to finish in 60 minutes. The chair is zhipeng_. Information about MeetBot at http://wiki.debian.org/MeetBot. | 15:00 |
openstack | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 15:00 |
openstack | The meeting name has been set to 'openstack_cyborg' | 15:01 |
zhipeng_ | hahaha | 15:01 |
zhipeng_ | let's hope so | 15:01 |
zhipeng_ | okay, so a quick update from my side | 15:01 |
zhipeng_ | on the api/db patch | 15:01 |
zhipeng_ | #topic BP discussion | 15:01 |
zhipeng_ | #link https://review.openstack.org/#/c/445814/ | 15:01 |
zhipeng_ | so ChrisD reviewed it, commenting that there is an ongoing discussion on traits | 15:02 |
zhipeng_ | we might consider aligning our design with it | 15:02 |
zhipeng_ | originally, the placement resource provider was meant for compute nodes only | 15:02 |
jkilpatr | I was looking over that, care to summarize? | 15:02 |
zhipeng_ | sure I'm putting my thoughts together now | 15:03 |
zhipeng_ | so now the placement team sees the pitfall in that | 15:03 |
zhipeng_ | since, for example, with shared storage (external arrays, I would suppose) | 15:03 |
zhipeng_ | if you only count the storage side of things on the compute node | 15:04 |
zhipeng_ | your resource provider will never correctly reflect the required traits | 15:04 |
jkilpatr | so this is an issue with accelerators that may be shared between many computes? | 15:04 |
zhipeng_ | the resource provider should reflect the shared storage arrays, rather than only local disks | 15:04 |
zhipeng_ | no, I think this is an issue for accelerators as whole | 15:05 |
jkilpatr | how so? | 15:05 |
zhipeng_ | since if the resource provider is only identified with the compute node | 15:05 |
zhipeng_ | we could wind up with the same problem we have now, since accelerator characteristics are bundled with the compute characteristics | 15:05 |
zhipeng_ | well we could have our own resource class for sure, but that does not solve the problem | 15:06 |
zhipeng_ | nova scheduler asks the placement api to provide all the necessary resources | 15:06 |
zhipeng_ | and for Cyborg, one of the important goals is that accelerators are treated as first-class citizens | 15:07 |
zhipeng_ | meaning that we should have individual resource providers for accelerators | 15:07 |
zhipeng_ | from the email link Chris provided, there is an etherpad documenting the "Plan B" | 15:08 |
jkilpatr | ok so the issue is that if we have a 'gpu' resource provider it's dependent on computes in a way that resource providers aren't supposed to be. | 15:09 |
zhipeng_ | which I liked very much: it works on extending the current nested resource provider definition to a more relaxed one with multiple resource providers | 15:09 |
zhipeng_ | yes exactly | 15:09 |
zhipeng_ | the scheduling decision would still largely depend on the regular compute features, since we are just part of the traits | 15:09 |
crushil | interesting | 15:09 |
zhipeng_ | so back to the "Plan B", the current nested resource provider model is designed primarily for stuff like NUMA nodes | 15:10 |
zhipeng_ | where you have this parent-child relationship | 15:10 |
crushil | So, how does that change our implementation? | 15:10 |
zhipeng_ | the Plan B extends the scope to be more general, covering Cyborg use cases | 15:10 |
zhipeng_ | we could have multiple resource providers, one for each and every accelerator | 15:11 |
zhipeng_ | (if they are deemed important for the workload) | 15:11 |
zhipeng_ | crushil the change is that | 15:11 |
zhipeng_ | our DB design has to align with the proposed nested resource provider/trait design | 15:11 |
zhipeng_ | at least DB schemas | 15:12 |
zhipeng_ | so that when the cyborg agent populates our inventory to the placement api | 15:12 |
zhipeng_ | it could understand it correctly | 15:12 |
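To make the nested resource provider idea above concrete, here is a minimal sketch in Python, assuming a simplified in-memory model rather than the actual Placement API: the compute node is the root provider and each accelerator is its own child provider with its own inventory and traits, so a request for an FPGA with a given trait matches the FPGA provider rather than the host as a whole. The class, the function, and all resource-class and trait names are hypothetical; only the `CUSTOM_` prefix follows Placement's convention for custom resource classes and traits.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ResourceProvider:
    """Hypothetical, simplified model of a Placement-style resource provider."""
    name: str
    inventory: Dict[str, int] = field(default_factory=dict)  # resource class -> total
    traits: Set[str] = field(default_factory=set)
    children: List["ResourceProvider"] = field(default_factory=list)

    def add_child(self, child: "ResourceProvider") -> "ResourceProvider":
        self.children.append(child)
        return child

    def walk(self):
        """Yield this provider and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


def find_providers(root: ResourceProvider, resource_class: str,
                   amount: int, required_traits: Set[str]) -> List[str]:
    """Return names of providers in the tree that can satisfy the request."""
    return [
        rp.name for rp in root.walk()
        if rp.inventory.get(resource_class, 0) >= amount
        and required_traits <= rp.traits
    ]


# The compute node is the root; each accelerator is its own child provider,
# so its inventory and traits are not bundled with the host's.
compute = ResourceProvider("compute-1", inventory={"VCPU": 32, "MEMORY_MB": 131072})
compute.add_child(ResourceProvider(
    "compute-1_gpu0", inventory={"CUSTOM_GPU": 1},
    traits={"CUSTOM_GPU_VENDOR_A"}))
compute.add_child(ResourceProvider(
    "compute-1_fpga0", inventory={"CUSTOM_FPGA": 1},
    traits={"CUSTOM_FPGA_VENDOR_B", "CUSTOM_FPGA_IMAGE_CRYPTO"}))

print(find_providers(compute, "CUSTOM_FPGA", 1, {"CUSTOM_FPGA_IMAGE_CRYPTO"}))
# ['compute-1_fpga0']
```

Because each accelerator carries its own traits, the host's VCPU and memory inventory never has to describe accelerator properties.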
crushil | Ok, what about the other specs? | 15:13 |
zhipeng_ | not concerned that much :) | 15:14 |
crushil | gotcha | 15:14 |
zhipeng_ | So I'm thinking we might need two DB schemas | 15:14 |
zhipeng_ | the current one in the spec patch, could be used for the discovery phase | 15:15 |
zhipeng_ | that is, when the user starts the cyborg service and the agent/driver does the discovery/pre-config | 15:15 |
zhipeng_ | collecting what we have on the host | 15:15 |
zhipeng_ | the second set of schema needs to be aligned with nested resource provider | 15:16 |
zhipeng_ | to interact with placement api and eventually nova-scheduler | 15:16 |
zhipeng_ | for the VM to select the correct accelerator resource | 15:16 |
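The discovery phase described above could look roughly like the following sketch, assuming the agent enumerates PCI devices from `lspci -nn`-style text (class code in brackets after the class name, vendor:device ID pair after the description). The class-code mapping, the `discover` function, the record fields, and the sample lines are all hypothetical illustrations, not Cyborg code.

```python
import re
from typing import Dict, List

# lspci -nn style line:
# "<bus address> <class name> [<class code>]: <description> [<vendor>:<device>] ..."
LSPCI_LINE = re.compile(
    r"^(?P<address>\S+) (?P<class_name>.+?) \[(?P<class_code>[0-9a-f]{4})\]: "
    r"(?P<description>.*) \[(?P<vendor_id>[0-9a-f]{4}):(?P<device_id>[0-9a-f]{4})\]"
)

# Hypothetical mapping from PCI class code to a Cyborg-side device type.
ACCELERATOR_CLASSES = {
    "0300": "GPU",          # VGA compatible controller
    "0302": "GPU",          # 3D controller
    "1200": "ACCELERATOR",  # processing accelerators
}


def discover(lspci_output: str) -> List[Dict[str, str]]:
    """Return one record per PCI device whose class looks like an accelerator."""
    found = []
    for line in lspci_output.splitlines():
        match = LSPCI_LINE.match(line.strip())
        if not match:
            continue
        device_type = ACCELERATOR_CLASSES.get(match.group("class_code"))
        if device_type is None:
            continue  # skip NICs, storage controllers, bridges, ...
        record = match.groupdict()
        record["type"] = device_type
        found.append(record)
    return found


# Made-up sample input; on a real host this would come from running `lspci -nn`.
SAMPLE = """\
00:1f.2 SATA controller [0106]: Example Corp SATA AHCI [1111:0001]
03:00.0 3D controller [0302]: Example Corp GPU Model X [2222:0002] (rev a1)
04:00.0 Processing accelerators [1200]: Example Corp FPGA Card [3333:0003]
"""

for dev in discover(SAMPLE):
    print(dev["address"], dev["type"], dev["vendor_id"], dev["device_id"])
```

Records like these are what the agent would then translate into resource providers and inventory.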
jkilpatr | so we need to maintain two parallel db's for each purpose or do you mean we want to change the format in a future release? | 15:17 |
zhipeng_ | what I'm thinking is that we don't have exhaustive knowledge on the hardware now | 15:18 |
zhipeng_ | therefore we keep a separate DB schema; the host-side one should be more extensible or more abstract | 15:18 |
zhipeng_ | But on another thought | 15:19 |
zhipeng_ | it might be just too complex ..... | 15:19 |
zhipeng_ | what do you guys think | 15:19 |
jkilpatr | I think we should try and keep one db as much as possible, I don't want to try and maintain parallel sets of data | 15:19 |
zhipeng_ | that makes sense | 15:19 |
crushil | I agree, having multiple DBs is just clunky | 15:20 |
zhipeng_ | in that case we will just use the resource provider schema; I will follow up with Chris to see which one I should use | 15:21 |
zhipeng_ | the current one or the proposed one | 15:21 |
jkilpatr | sounds good. | 15:22 |
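Following the agreement to keep a single database aligned with the resource provider model, one possible shape is sketched below with SQLite, assuming three hypothetical tables (providers with an optional parent, per-class inventories, and traits). These are not Placement's or Cyborg's actual tables; the point is only that the rows written at discovery time can be queried directly for scheduling, with no parallel copy of the data.

```python
import sqlite3

# Hypothetical single schema: discovered accelerators are stored directly as
# resource providers, so the same rows serve discovery and scheduling.
SCHEMA = """
CREATE TABLE resource_providers (
    id INTEGER PRIMARY KEY,
    uuid TEXT UNIQUE NOT NULL,
    name TEXT NOT NULL,
    parent_id INTEGER REFERENCES resource_providers(id)
);
CREATE TABLE inventories (
    resource_provider_id INTEGER NOT NULL REFERENCES resource_providers(id),
    resource_class TEXT NOT NULL,
    total INTEGER NOT NULL,
    PRIMARY KEY (resource_provider_id, resource_class)
);
CREATE TABLE provider_traits (
    resource_provider_id INTEGER NOT NULL REFERENCES resource_providers(id),
    trait TEXT NOT NULL,
    PRIMARY KEY (resource_provider_id, trait)
);
"""


def add_provider(conn, uuid, name, parent_id=None, inventory=None, traits=()):
    """Insert a provider plus its inventory and traits; return its row id."""
    cur = conn.execute(
        "INSERT INTO resource_providers (uuid, name, parent_id) VALUES (?, ?, ?)",
        (uuid, name, parent_id))
    rp_id = cur.lastrowid
    for resource_class, total in (inventory or {}).items():
        conn.execute("INSERT INTO inventories VALUES (?, ?, ?)",
                     (rp_id, resource_class, total))
    for trait in traits:
        conn.execute("INSERT INTO provider_traits VALUES (?, ?)", (rp_id, trait))
    return rp_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
host = add_provider(conn, "uuid-host", "compute-1", inventory={"VCPU": 32})
add_provider(conn, "uuid-fpga0", "compute-1_fpga0", parent_id=host,
             inventory={"CUSTOM_FPGA": 1}, traits=["CUSTOM_FPGA_IMAGE_CRYPTO"])

# Which providers under compute-1 offer an FPGA with the crypto image trait?
rows = conn.execute("""
    SELECT rp.name FROM resource_providers rp
    JOIN inventories inv ON inv.resource_provider_id = rp.id
    JOIN provider_traits t ON t.resource_provider_id = rp.id
    WHERE rp.parent_id = ? AND inv.resource_class = ? AND t.trait = ?
""", (host, "CUSTOM_FPGA", "CUSTOM_FPGA_IMAGE_CRYPTO")).fetchall()
print(rows)  # [('compute-1_fpga0',)]
```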
jkilpatr | Anything else on that subject? | 15:22 |
zhipeng_ | nope | 15:22 |
zhipeng_ | anything else from you guys on the open spec ? | 15:23 |
crushil | nope | 15:23 |
zhipeng_ | great | 15:23 |
zhipeng_ | #topic initial code development | 15:23 |
zhipeng_ | so, any roadblocks | 15:24 |
jkilpatr | been trying to understand oslo rpc and message passing and start structuring the conductor/agent | 15:24 |
zhipeng_ | sounds like a great start :) | 15:24 |
crushil | I have created stubs and I will push them up by the end of the week | 15:25 |
zhipeng_ | great ! | 15:25 |
jkilpatr | crushil, sounds good. | 15:25 |
zhipeng_ | let's do small pieces like Justin suggested | 15:26 |
crushil | I will fill them out, rebased on top of the API and agent patches | 15:26 |
jkilpatr | so a lot of what we will be doing involves rpc between different components, so people with integrating parts need to talk to each other about interfaces | 15:26 |
jkilpatr | I don't think we should be too worried about a stable internal interface | 15:26 |
zhipeng_ | yes I agree | 15:26 |
zhipeng_ | oslo.messaging could provide everything we need | 15:27 |
jkilpatr | well, sometimes we need rpc; for example, I'm thinking the driver should be called by the agent over rpc (we could invoke it directly, but I'm not sure I want to do that) | 15:27 |
zhipeng_ | i think it should be done over rpc | 15:29 |
zhipeng_ | unless we give the driver RESTful APIs? | 15:29 |
jkilpatr | I don't think that's the right application here. Our internal code needs to be more tightly integrated than restfulness allows. | 15:30 |
zhipeng_ | yep | 15:30 |
*** rushil has joined #openstack-cyborg | 15:30 | |
zhipeng_ | so rpc should be fine here | 15:31 |
zhipeng_ | i think at the moment, it is agent talking to the generic driver | 15:31 |
zhipeng_ | later on, we should design something like the neutron ml2 driver interface | 15:31 |
zhipeng_ | that every driver, vendor or not, implements the interface which rpc calls will go through | 15:32 |
zhipeng_ | in a rather standard way | 15:32 |
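A rough Python sketch of the standard driver interface being discussed, assuming a hypothetical `AcceleratorDriver` abstract base class that every driver, generic or vendor, implements, and an `Agent` that routes calls by method name. This loosely mirrors how an oslo.messaging RPC server dispatches a call to the same-named method on an endpoint object with the request context as the first argument, but the dispatch here is in-process so the example runs without a message broker; all class and method names are illustrative.

```python
import abc
from typing import Any, Dict, List


class AcceleratorDriver(abc.ABC):
    """Hypothetical interface every driver (vendor or generic) would implement."""

    @abc.abstractmethod
    def discover(self, ctxt: Dict[str, Any]) -> List[Dict[str, Any]]:
        """Return the accelerators this driver manages on the host."""

    @abc.abstractmethod
    def program(self, ctxt: Dict[str, Any], device: str, image: str) -> bool:
        """Load an image onto a device; return True on success."""


class GenericDriver(AcceleratorDriver):
    """Toy driver that tracks which image is loaded on each device."""

    def __init__(self):
        self._devices = {"fpga0": None}  # device name -> loaded image

    def discover(self, ctxt):
        return [{"device": d, "image": img} for d, img in self._devices.items()]

    def program(self, ctxt, device, image):
        if device not in self._devices:
            return False
        self._devices[device] = image
        return True


class Agent:
    """Routes calls by method name to registered drivers. The in-process
    dispatch stands in for a real RPC transport."""

    def __init__(self, drivers: Dict[str, AcceleratorDriver]):
        self._drivers = drivers

    def call(self, ctxt, driver_name, method, **kwargs):
        driver = self._drivers[driver_name]
        # Only methods declared on the interface may be invoked.
        if method not in AcceleratorDriver.__abstractmethods__:
            raise AttributeError(f"{method!r} is not part of the driver interface")
        return getattr(driver, method)(ctxt, **kwargs)


agent = Agent({"generic": GenericDriver()})
ctxt = {"request_id": "req-1"}
print(agent.call(ctxt, "generic", "program", device="fpga0", image="crypto.bit"))
print(agent.call(ctxt, "generic", "discover"))
```

Restricting dispatch to the interface's abstract methods keeps the callable surface identical across drivers, which is roughly the property the ML2 comparison is after.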
rushil | Ok. So, are we going to follow the neutron model vs the nova/cinder model? | 15:33 |
zhipeng_ | i think more like the neutron model | 15:33 |
zhipeng_ | for out-of-tree drivers | 15:33 |
rushil | But isn't that too complicated | 15:34 |
zhipeng_ | cinder and nova are mostly in-tree maintained drivers | 15:34 |
zhipeng_ | it won't be too complicated for us i think | 15:34 |
zhipeng_ | neutron is complicated because they have to define the type drivers and mechanism drivers | 15:34 |
rushil | Well, cinder has out of tree drivers based on whether you have CI or not | 15:34 |
zhipeng_ | I think in-tree drivers also require CI | 15:35 |
zhipeng_ | otherwise the cinder team removes your driver | 15:35 |
rushil | No, they just make it unsupported i.e. move it out of tree | 15:35 |
zhipeng_ | for us, as long as it is PCIe communicated devices, the driver interface won't be too complicated | 15:35 |
zhipeng_ | but if we need to support extra protocols, that is where things will get wild | 15:36 |
zhipeng_ | rushil ah okay | 15:36 |
rushil | Ok. I just want to make sure we don't make things more complicated than they should be | 15:36 |
zhipeng_ | yes that is always our goal | 15:36 |
jkilpatr | I can agree on a standard rpc interface but that's less complicated than I think you are making it out to be. | 15:36 |
zhipeng_ | we even wanted to skip the conductor :P | 15:36 |
jkilpatr | and I nearly got away with it too! | 15:37 |
zhipeng_ | jkilpatr haha | 15:37 |
rushil | Lol | 15:37 |
zhipeng_ | rushil the cyborg ml2 driver would be modeled from your generic driver implementation :P | 15:39 |
rushil | I wouldn't call it ml2 driver though | 15:40 |
zhipeng_ | of course we will have another name for it | 15:40 |
zhipeng_ | aluminum drivers :P | 15:41 |
zhipeng_ | for cyborg robots | 15:41 |
rushil | Hehe | 15:41 |
jkilpatr | Anyways, I'll try to have a stub up this week (conductor) and then the agent next week. | 15:42 |
jkilpatr | depends on how other tasks go for me. | 15:42 |
rushil | jkilpatr: Cool | 15:43 |
zhipeng_ | sounds great, i got another colleague working on cyborg this week, so api code will be developed in parallel | 15:43 |
rushil | Awesome | 15:44 |
zhipeng_ | hopefully once we settle the spec, the initial code will come out | 15:44 |
zhipeng_ | and we can iterate over it | 15:44 |
zhipeng_ | #topic AoB | 15:44 |
zhipeng_ | any other topics | 15:44 |
rushil | Btw our group at Lenovo sent out initial emails to vendors to get their drivers aligned with cyborg | 15:44 |
zhipeng_ | wow | 15:45 |
zhipeng_ | that is awesome | 15:45 |
rushil | I'll keep you guys posted on that | 15:45 |
zhipeng_ | could you disclose the vendor names for now ? | 15:45 |
zhipeng_ | or should we wait until later | 15:45 |
rushil | The usual suspects | 15:46 |
zhipeng_ | e.g ? | 15:46 |
rushil | Nvidia, AMD | 15:46 |
rushil | And smaller ones like Micron | 15:47 |
zhipeng_ | cool ! | 15:47 |
rushil | I'll let y'all know when they are committed to contributing code | 15:47 |
zhipeng_ | great :) | 15:47 |
zhipeng_ | okay, if there are no other topics, we go to the usual long slumber ~~ | 15:50 |
zhipeng_ | will try to remember to close the meeting an hour later | 15:50 |
crushil | Cool, thanks zhipeng_ | 15:51 |
*** NokMikeR has quit IRC | 15:51 | |
*** rushil has quit IRC | 15:53 | |
*** jkilpatr has left #openstack-cyborg | 15:56 | |
*** skelso has quit IRC | 16:12 | |
*** skelso has joined #openstack-cyborg | 16:13 | |
*** joseppc has joined #openstack-cyborg | 16:13 | |
*** skelso has quit IRC | 16:30 | |
*** skelso has joined #openstack-cyborg | 16:30 | |
*** jkilpatr has joined #openstack-cyborg | 16:40 | |
*** zhipeng_ has quit IRC | 16:59 | |
*** zhipeng_ has joined #openstack-cyborg | 17:00 | |
zhipeng_ | #endmeeting | 17:00 |
openstack | Meeting ended Wed Jun 7 17:00:56 2017 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 17:00 |
openstack | Minutes: http://eavesdrop.openstack.org/meetings/openstack_cyborg/2017/openstack_cyborg.2017-06-07-15.00.html | 17:00 |
openstack | Minutes (text): http://eavesdrop.openstack.org/meetings/openstack_cyborg/2017/openstack_cyborg.2017-06-07-15.00.txt | 17:01 |
openstack | Log: http://eavesdrop.openstack.org/meetings/openstack_cyborg/2017/openstack_cyborg.2017-06-07-15.00.log.html | 17:01 |
*** zhipeng_ has quit IRC | 17:01 | |
*** skelso has quit IRC | 19:02 | |
*** skelso has joined #openstack-cyborg | 19:02 | |
*** skelso has quit IRC | 19:11 | |
-openstackstatus- NOTICE: The Gerrit service on review.openstack.org is being restarted now to clear some excessive connection counts while we debug the intermittent request failures reported over the past few minutes | 20:05 | |
*** skelso has joined #openstack-cyborg | 20:13 | |
*** skelso has quit IRC | 20:24 | |
*** skelso has joined #openstack-cyborg | 20:25 | |
*** mikeH has quit IRC | 21:15 | |
*** crushil has quit IRC | 21:31 | |
*** crushil has joined #openstack-cyborg | 21:31 | |
*** crushil has quit IRC | 21:36 | |
*** jkilpatr has quit IRC | 22:34 | |
*** skelso has quit IRC | 22:40 | |
*** openstack has joined #openstack-cyborg | 23:13 | |
*** mikeH has joined #openstack-cyborg | 23:37 | |
*** skelso has quit IRC | 23:37 | |
*** mikeH has quit IRC | 23:55 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!