14:00:09 <edleafe> #startmeeting nova_scheduler
14:00:09 <openstack> Meeting started Mon Jun 20 14:00:09 2016 UTC and is due to finish in 60 minutes. The chair is edleafe. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:11 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:13 <openstack> The meeting name has been set to 'nova_scheduler'
14:00:15 <edleafe> Anyone here today?
14:00:17 <takashin> o/
14:00:19 <mlavalle> o/
14:00:22 <_gryf> o/
14:00:28 <rlrossit> o/
14:00:34 <Yingxin> o/
14:01:09 <sudipto> o/
14:01:27 <edleafe> Let's wait another minute for the latecomers...
14:01:38 <edleafe> In the meantime, happy solstice!
14:01:49 * rlrossit feels the sunburn
14:02:43 * johnthetubaguy lurks in a multi-tasking way
14:03:28 * bauzas mentions he's out of coffee
14:03:29 <edleafe> Well, I guess we should get started
14:03:43 <edleafe> #topic Specs and Reviews
14:03:54 <alaski> o/
14:03:59 <edleafe> There is only one on the agenda: rlrossit, take it away!
14:04:05 <rlrossit> alright!
14:04:10 <edleafe> #link https://review.openstack.org/#/c/330145/
14:04:24 <rlrossit> so I have a crazy idea and am looking for some feedback
14:04:41 <rlrossit> I want to try and PoC a new scheduler that uses a different HostStateManager
14:05:00 <jaypipes> o/
14:05:00 <rlrossit> the new manager will use Redis (or some other shared memory) to maintain the state of all of the hosts
14:05:11 <woodster_> o/
14:05:15 <rlrossit> that way we don't need to be going back to the DB or caching, it's always up-to-date on all of the nodes
14:05:32 <rlrossit> I just whipped up that spec last week, so it's still really rough around the edges
14:05:56 <rlrossit> but initially, I'm just looking for an answer to the questions: Am I crazy for trying this? Has anyone done this before?
14:05:57 <jaypipes> rlrossit: until the scheduler actually owns that data, it will never be fully accurate.
14:05:58 <bauzas> it's a very old story :)
14:06:10 <rlrossit> jaypipes: good point
14:06:27 <alaski> rlrossit: done it with redis, not sure. there was talk of trying cassandra for that
14:06:29 <rlrossit> currently my plan is to have the compute nodes update their state to the shared memory
14:06:35 <doffm> rlrossit: Google have done this before. http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41684.pdf
14:07:03 <rlrossit> I saw that when looking at johnthetubaguy's backlog scheduler spec
14:07:08 <bauzas> doffm: rlrossit: jaypipes: https://etherpad.openstack.org/p/liberty-nova-scalable-scheduler
14:07:24 <jaypipes> rlrossit: and then, once the scheduler *does* own that data, putting into Redis essentially makes it throwaway data and you will lose any transactional contracts that an RDBMS provides.
14:07:28 <alaski> rlrossit: my expectation for a proposal like that would be to have numbers showing that it's worthwhile, and a clear explanation of the failure modes and how that compares to now
14:07:45 <rlrossit> alaski: that's the plan of the quick-and-dirty PoC
14:08:10 <rlrossit> I just didn't want to be met with pitchforks and torches when I came with numbers later :)
14:08:20 * edleafe is back after a wifi drop
14:08:43 <bauzas> well, my original thinking is that we can still have the computes owning the resources but having the schedulers sharing a global state
14:08:51 <alaski> rlrossit: I'm not sure you can avoid that completely :)
14:08:52 <bauzas> that's not mutually exclusive
14:09:12 <rlrossit> bauzas: that's my current plan
14:09:16 <edleafe> So is the idea that this will enable multiple schedulers better, since they will share a common view of the state of the compute nodes?
14:09:21 <rlrossit> or, rather, my current plan for the current scheduler
14:09:36 <bauzas> rlrossit: see the etherpad I mentioned above, it was kind of an idea I had in the past
14:09:39 <rlrossit> yeah it'll allow horizontal scaling without getting contention on the hosts
14:09:42 <alaski> rlrossit: honestly the interesting thing to me would be what apis do you need between computes and the scheduler so that what you're working on and what jaypipes is working on could both coexist
14:10:05 <bauzas> the real problem to me is to find some way to address that kind of shared state without pulling some huge dep
14:10:31 <rlrossit> bauzas: yeah...
14:10:44 <edleafe> Yeah, the current scheduler design doesn't do shared state very well
14:10:49 <doffm> rlrossit: There will still be contention. Shared state doesn't mean IMMEDIATE update on all the schedulers. It's EVENTUALLY consistent, not actually consistent.
14:10:50 <rlrossit> that's the part where I think the main disagreement will come
14:10:54 <bauzas> tbh, I'm still totally on page with the rp-providers specs
14:11:15 <bauzas> those specs are for managing our heterogenous ways of counting resources
14:11:28 <edleafe> doffm: I think that is understood. The problem now with horizontal scaling is increasing the number of schedulers increases the raciness
14:11:33 <rlrossit> doffm: I want to see how many schedulers it takes to get contention with redis, I bet it'll be a lot
14:11:40 <doffm> I hope so.
14:12:00 <bauzas> the concept of "owning" that resource is just one side blueprint that was discussed to be left up for discussion post-newton given the progress we make on Newton
14:12:05 <rlrossit> yeah, the goal is to get to more than 2 schedulers :)
14:12:13 <edleafe> rlrossit: so this will reduce the lag between a host being updated, and all of the schedulers knowing about it?
14:12:43 <rlrossit> edleafe: yep
14:12:55 <rlrossit> edleafe: and it's more of an instant cache between the multiple schedulers
14:12:57 <rlrossit> "instant"
14:13:02 <doffm> :)
14:13:11 * rlrossit steps lightly around words
14:13:13 <bauzas> well, it's a global state cache
14:13:16 <bauzas> rather
14:13:19 <rlrossit> yeah
14:13:34 <edleafe> host managers are local state caches
14:13:49 <bauzas> right, but they aren't shared
14:14:04 <bauzas> which is pretty expensive
14:14:05 <rlrossit> which is why you start getting contention with multiple caching schedulers
14:14:15 <edleafe> bauzas: if they were, they'd be global :)
14:14:21 <bauzas> not really
14:14:40 <rlrossit> this way if someone schedules to a host, all schedulers see that and have their host state updated, so if a host gets filled up, they won't schedule to it anymore
14:15:00 <rlrossit> instead of failing and having to update their cache, and then retrying
14:15:22 <bauzas> the main driver to me is to keep it as light as possible and keep the scheduler(s) optimistic
14:15:23 <edleafe> rlrossit: what would be your time frame for getting PoC numbers?
14:15:39 <bauzas> that's where I think we should still have the computes owning the resources
14:15:46 <rlrossit> edleafe: the goal is before the midcycle so doffm can present my numbers if they are good
14:15:51 <doffm> Or bad.
14:15:56 <bauzas> ie. the scheduler could fail fast, or give a wrong answer
14:16:01 <edleafe> rlrossit: cool
14:16:12 <rlrossit> so, if there's no one that wants to kill me yet, I'll get started on the PoC
14:16:27 <bauzas> the only concern I have with the idea is to use Redis as it
14:16:30 <edleafe> So I guess the question is: what sort of numbers would be persuasive enough to get everyone to look at this more seriously?
14:16:50 <bauzas> that's where I think we should be more subtle
14:17:12 <edleafe> bauzas: that's an implementation detail, no?
14:17:30 <rlrossit> bauzas: yeah, if this works out, I have long-term thoughts on how the deps will work
14:17:34 <edleafe> I think the idea is to demonstrate if the general approach helps.
14:17:42 <rlrossit> but it's not worth looking into that if this doesn't even do anything for us
14:17:48 <bauzas> edleafe: I dunno
14:18:20 * edleafe notes that my first commits to nova was removing redis from NASA's original design
14:18:31 <bauzas> edleafe: ie. my point is that we shouldn't rely on some magic given by a backend for providing us update consensus and cache invalidation
14:19:03 <edleafe> "magic" is kind of harsh at this point
14:19:11 <edleafe> It's a cache
14:19:23 <edleafe> Shared by multiple schedulers
14:19:35 <edleafe> Is there any other cache that would not be as magic?
14:19:50 <Yingxin> rlrossit: I also had a design for the "shared-state scheduler"
14:20:03 <doffm> bauzas: Why would we write our own? 'Magic' is good. 'Magic' means tested implementation that we worry less about.
14:20:09 <Yingxin> rlrossit: but I think we take the different approach
14:20:29 <edleafe> Yingxin: do you want to give a quick summary of the differences?
14:21:03 <doffm> I think part of the plan for testing is also to look at Yingxin's changes.
14:21:25 <rlrossit> doffm: indeed it is
14:21:31 <Yingxin> edleafe: there are no global cache in my design, they just get quick synchronized from incremental updates.
14:22:25 <Yingxin> I'll publish the new test result of my prototype to the ML :)
14:22:43 <edleafe> OK, great
14:22:51 <bauzas> doffm: "magic" means "strong dependency for us" that would make us sensitive to updates :)
14:23:17 <edleafe> Yingxin: would you review rlrossit's spec and give your feedback, as you've also worked on this issue?
14:23:21 <doffm> I understand, There is a tradeoff. :)
14:24:10 <Yingxin> edleafe: sure
14:24:16 <edleafe> OK, let's continue this on the spec. And I hope that rlrossit and doffm keep us updated between now and the midcycle
14:24:35 <edleafe> Any other specs or reviews that we need to discuss here?
14:24:37 <rlrossit> I will do my best to take copious notes
14:25:41 <edleafe> OK, let's move on
14:25:52 <edleafe> #topic Midcycle
14:26:10 * edleafe can never decide if there's a hyphen in midcycle or not
14:26:28 <edleafe> So a quick show of hands: who's going, and who isn't?
14:26:59 * bauzas waves hand
14:27:02 * _gryf will be there. probably.
14:27:10 <takashin> o/
14:27:24 * edleafe will be there
14:27:34 <doffm> Will also be there.
14:27:43 <diga> edleafe: I will attend it for sure but I will attend it remotely
14:28:00 <alaski> I will be there
14:28:09 <alaski> as will jaypipes
14:28:27 <edleafe> diga: do you know if there will be a remote system in place?
14:29:08 <diga> yes, if possible we can setup webex
14:29:24 <edleafe> What I'd like to do is make sure we've identified the issues that need discussion ahead of time
14:29:25 <diga> last time I attended magnum via webex
14:30:12 <bauzas> well, nova midcycles are pretty hard to remotely attend, tbh
14:30:22 <bauzas> first, the audience is larger in the room
14:30:22 <edleafe> So rather than start yet another etherpad, how about we keep editing the meeting agenda for now, and we can be sure to discuss these in the meetings before the midcycle
14:30:38 <bauzas> second, the agenda is pretty free up to the last minute
14:30:51 <bauzas> and third, the flow of the conversation is pretty high
14:30:55 <edleafe> bauzas would know (last year's Rochester meetup)
14:31:25 <bauzas> so, in general, we could maybe try to setup some kind of connectivity, but that's mostly just an audio without asking folks to participate
14:31:47 <bauzas> because that would slow down the convos
14:32:03 <edleafe> diga: were you able to participate in the Magnum discussions? Or simply listen?
14:32:18 <bauzas> either way, it's something I guess mriedem hasn't planned yet
14:32:26 <diga> edleafe: participated in the magnum discussion
14:33:15 <edleafe> diga: well, I echo bauzas's concern, but I guess it's worth a try
14:33:37 <diga> edleafe: Thank you :)
14:34:17 <edleafe> So I guess what I'd like to see in the next week is for people on this subteam to start adding their ideas for the midcycle to the agenda page
14:34:25 <edleafe> #link https://wiki.openstack.org/wiki/Meetings/NovaScheduler
14:35:07 <edleafe> Since we only have limited time at the midcycle, we should discuss these topics and identify the most important for F2F discussion
14:35:35 <edleafe> Sound good to everyone?
14:37:20 * edleafe notes that silence == agreement
14:37:37 <doffm> agreed
14:37:50 <diga> I am fine with this edleafe
14:38:03 <edleafe> #topic Opens
14:38:25 <edleafe> So, before I send you all back to being productive, anyone have any other topics to discuss?
14:38:38 <sudipto> edleafe, anything related to the scheduler that i could help with?
14:38:54 <diga> edleafe: any date you are planning for mid-cycle ?
14:39:15 <edleafe> diga: https://wiki.openstack.org/wiki/Sprints/NovaNewtonSprint#Hotels
14:39:19 <edleafe> oops
14:39:22 <edleafe> diga: https://wiki.openstack.org/wiki/Sprints/NovaNewtonSprint
14:39:32 <edleafe> July 19-21
14:39:38 <diga> edleafe: thanks
14:40:01 <edleafe> sudipto: have you seen the Resource Providers work?
14:40:16 <sudipto> edleafe, yeah.
14:40:32 <sudipto> edleafe, read the spec, saw a few reviews. Nothing in particular yet though.
14:40:48 <edleafe> sudipto: then reviewing the series that now starts with https://review.openstack.org/#/c/328276
14:40:53 <edleafe> would be a good start
14:40:57 <sudipto> edleafe, alright.
14:41:08 <sudipto> edleafe, will do.
14:41:12 <sudipto> edleafe, thanks!
14:41:38 <edleafe> Anything else to discuss?
14:42:25 <edleafe> OK, thanks everyone! Now go back to work!!
14:42:27 <edleafe> #endmeeting
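
The shared-state idea rlrossit describes at 14:14:40 (every scheduler sees a placement as soon as it is made, so a full host stops being chosen) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the PoC from the spec: `SharedHostStore` is a hypothetical in-process stand-in for Redis or another shared store, `Scheduler` is not nova's scheduler class, and free RAM is the only resource tracked.

```python
import threading


class SharedHostStore:
    """Hypothetical in-memory stand-in for a shared store such as Redis."""

    def __init__(self, free_ram_mb):
        self._free = dict(free_ram_mb)   # host name -> free RAM in MB
        self._lock = threading.Lock()    # makes check-and-decrement atomic

    def snapshot(self):
        """Return a copy of the current view of all hosts."""
        with self._lock:
            return dict(self._free)

    def claim(self, host, ram_mb):
        """Consume RAM on a host; return False if it no longer fits."""
        with self._lock:
            if self._free[host] < ram_mb:
                return False
            self._free[host] -= ram_mb
            return True


class Scheduler:
    """Illustrative scheduler that keeps no private host-state cache."""

    def __init__(self, store):
        self.store = store

    def schedule(self, ram_mb):
        """Try hosts from most to least free RAM; return the host or None."""
        view = self.store.snapshot()
        for host in sorted(view, key=view.get, reverse=True):
            # The claim can still fail if another scheduler got there first,
            # in which case the next candidate is tried.
            if self.store.claim(host, ram_mb):
                return host
        return None


if __name__ == "__main__":
    store = SharedHostStore({"a": 2048, "b": 1024})
    s1, s2 = Scheduler(store), Scheduler(store)
    print(s1.schedule(1024), s2.schedule(1024), s1.schedule(1024))
    print(s2.schedule(1024))  # nothing fits any more
```

Because both schedulers claim against the same store, neither places an instance on a host the other has already filled. As doffm notes at 14:10:49, a real shared store would still leave a window between reading the view and claiming, which is why the claim itself has to be the atomic step.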