14:00:09 <edleafe> #startmeeting nova_scheduler
14:00:09 <openstack> Meeting started Mon Jun 20 14:00:09 2016 UTC and is due to finish in 60 minutes.  The chair is edleafe. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:11 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:13 <openstack> The meeting name has been set to 'nova_scheduler'
14:00:15 <edleafe> Anyone here today?
14:00:17 <takashin> o/
14:00:19 <mlavalle> o/
14:00:22 <_gryf> o/
14:00:28 <rlrossit> o/
14:00:34 <Yingxin> o/
14:01:09 <sudipto> o/
14:01:27 <edleafe> Let's wait another minute for the latecomers...
14:01:38 <edleafe> In the meantime, happy solstice!
14:01:49 * rlrossit feels the sunburn
14:02:43 * johnthetubaguy lurks in a multi-tasking way
14:03:28 * bauzas mentions he's out of coffee
14:03:29 <edleafe> Well, I guess we should get started
14:03:43 <edleafe> #topic Specs and Reviews
14:03:54 <alaski> o/
14:03:59 <edleafe> There is only one on the agenda: rlrossit, take it away!
14:04:05 <rlrossit> alright!
14:04:10 <edleafe> #link https://review.openstack.org/#/c/330145/
14:04:24 <rlrossit> so I have a crazy idea and am looking for some feedback
14:04:41 <rlrossit> I want to try and PoC a new scheduler that uses a different HostStateManager
14:05:00 <jaypipes> o/
14:05:00 <rlrossit> the new manager will use Redis (or some other shared memory) to maintain the state of all of the hosts
14:05:11 <woodster_> o/
14:05:15 <rlrossit> that way we don't need to keep going back to the DB or caching; it's always up-to-date on all of the nodes
14:05:32 <rlrossit> I just whipped up that spec last week, so it's still really rough around the edges
14:05:56 <rlrossit> but initially, I'm just looking for an answer to the questions: Am I crazy for trying this? Has anyone done this before?
14:05:57 <jaypipes> rlrossit: until the scheduler actually owns that data, it will never be fully accurate.
14:05:58 <bauzas> it's a very old story :)
14:06:10 <rlrossit> jaypipes: good point
14:06:27 <alaski> rlrossit: whether anyone has done it with redis, I'm not sure. there was talk of trying cassandra for that
14:06:29 <rlrossit> currently my plan is to have the compute nodes update their state to the shared memory
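For context, a minimal sketch of what a Redis-backed shared host state store along these lines might look like, in Python. The class name, key layout, and resource fields are illustrative assumptions, not taken from the spec: compute nodes would write their current view, and any scheduler would read the same data.

    # Illustrative sketch only: compute nodes push their resource view into
    # Redis; every scheduler reads the same shared copy of host state.
    import json
    import redis

    class SharedHostState(object):
        def __init__(self, url="redis://localhost:6379/0"):
            self._redis = redis.Redis.from_url(url)

        def report(self, host, free_ram_mb, free_disk_gb, free_vcpus):
            # Called by a compute node whenever its resource usage changes.
            self._redis.hset("host_states", host, json.dumps({
                "free_ram_mb": free_ram_mb,
                "free_disk_gb": free_disk_gb,
                "free_vcpus": free_vcpus,
            }))

        def get_all(self):
            # Called by a scheduler to get the current view of every host.
            return {h.decode(): json.loads(v)
                    for h, v in self._redis.hgetall("host_states").items()}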
14:06:35 <doffm> rlrossit: Google have done this before. http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41684.pdf
14:07:03 <rlrossit> I saw that when looking at johnthetubaguy's backlog scheduler spec
14:07:08 <bauzas> doffm: rlrossit: jaypipes: https://etherpad.openstack.org/p/liberty-nova-scalable-scheduler
14:07:24 <jaypipes> rlrossit: and then, once the scheduler *does* own that data, putting it into Redis essentially makes it throwaway data and you will lose any transactional contracts that an RDBMS provides.
14:07:28 <alaski> rlrossit: my expectation for a proposal like that would be to have numbers showing that it's worthwhile, and a clear explanation of the failure modes and how that compares to now
14:07:45 <rlrossit> alaski: that's the plan of the quick-and-dirty PoC
14:08:10 <rlrossit> I just didn't want to be met with pitchforks and torches when I came with numbers later :)
14:08:20 * edleafe is back after a wifi drop
14:08:43 <bauzas> well, my original thinking is that we can still have the computes owning the resources but have the schedulers share a global state
14:08:51 <alaski> rlrossit: I'm not sure you can avoid that completely :)
14:08:52 <bauzas> that's not mutually exclusive
14:09:12 <rlrossit> bauzas: that's my current plan
14:09:16 <edleafe> So is the idea that this will better support multiple schedulers, since they will share a common view of the state of the compute nodes?
14:09:21 <rlrossit> or, rather, my current plan for the current scheduler
14:09:36 <bauzas> rlrossit: see the etherpad I mentioned above, it was kind of an idea I had in the past
14:09:39 <rlrossit> yeah it'll allow horizontal scaling without getting contention on the hosts
14:09:42 <alaski> rlrossit: honestly the interesting thing to me would be what apis do you need between computes and the scheduler so that what you're working on and what jaypipes is working on could both coexist
14:10:05 <bauzas> the real problem to me is to find some way to address that kind of shared state without pulling in some huge dep
14:10:31 <rlrossit> bauzas: yeah...
14:10:44 <edleafe> Yeah, the current scheduler design doesn't do shared state very well
14:10:49 <doffm> rlrossit: There will still be contention. Shared state doesn't mean an IMMEDIATE update on all the schedulers. It's EVENTUALLY consistent, not actually consistent.
14:10:50 <rlrossit> that's the part where I think the main disagreement will come
14:10:54 <bauzas> tbh, I'm still totally on the same page with the resource-providers specs
14:11:15 <bauzas> those specs are for managing our heterogeneous ways of counting resources
14:11:28 <edleafe> doffm: I think that is understood. The problem now with horizontal scaling is increasing the number of schedulers increases the raciness
14:11:33 <rlrossit> doffm: I want to see how many schedulers it takes to get contention with redis, I bet it'll be a lot
14:11:40 <doffm> I hope so.
14:12:00 <bauzas> the concept of "owning" that resource is just a side blueprint that was discussed as possibly being left for post-Newton, depending on the progress we make in Newton
14:12:05 <rlrossit> yeah, the goal is to get to more than 2 schedulers :)
14:12:13 <edleafe> rlrossit: so this will reduce the lag between a host being updated, and all of the schedulers knowing about it?
14:12:43 <rlrossit> edleafe: yep
14:12:55 <rlrossit> edleafe: and it's more of an instant cache between the multiple schedulers
14:12:57 <rlrossit> "instant"
14:13:02 <doffm> :)
14:13:11 * rlrossit steps lightly around words
14:13:13 <bauzas> well, it's a global state cache
14:13:16 <bauzas> rather
14:13:19 <rlrossit> yeah
14:13:34 <edleafe> host managers are local state caches
14:13:49 <bauzas> right, but they aren't shared
14:14:04 <bauzas> which is pretty expensive
14:14:05 <rlrossit> which is why you start getting contention with multiple caching schedulers
14:14:15 <edleafe> bauzas: if they were, they'd be global :)
14:14:21 <bauzas> not really
14:14:40 <rlrossit> this way if someone schedules to a host, all schedulers see that and have their host state updated, so if a host gets filled up, they won't schedule to it anymore
14:15:00 <rlrossit> instead of failing and having to update their cache, and then retrying
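A hedged illustration of that point: with an atomic operation in the shared store, one scheduler's claim is immediately visible to every other scheduler, so a filled-up host drops out of everyone's candidate list without the fail-and-retry cycle. The key names and rollback logic below are assumptions for illustration, not part of the proposal.

    # Illustrative sketch only: atomically consume resources from the shared
    # view so concurrent schedulers cannot both see the old free values.
    import redis

    r = redis.Redis()

    def consume_from_host(host, ram_mb, vcpus):
        # HINCRBY is atomic in Redis, so two schedulers claiming the same
        # host cannot both read the stale free capacity.
        new_ram = r.hincrby("free_ram_mb", host, -ram_mb)
        new_vcpus = r.hincrby("free_vcpus", host, -vcpus)
        if new_ram < 0 or new_vcpus < 0:
            # The host filled up between filtering and claiming: undo the
            # decrement and let the scheduler pick another candidate.
            r.hincrby("free_ram_mb", host, ram_mb)
            r.hincrby("free_vcpus", host, vcpus)
            return False
        return True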
14:15:22 <bauzas> the main driver to me is to keep it as light as possible and keep the scheduler(s) optimistic
14:15:23 <edleafe> rlrossit: what would be your time frame for getting PoC numbers?
14:15:39 <bauzas> that's where I think we should still have the computes owning the resources
14:15:46 <rlrossit> edleafe: the goal is before the midcycle so doffm can present my numbers if they are good
14:15:51 <doffm> Or bad.
14:15:56 <bauzas> ie. the scheduler could fail fast, or give a wrong answer
14:16:01 <edleafe> rlrossit: cool
14:16:12 <rlrossit> so, if there's no one that wants to kill me yet, I'll get started on the PoC
14:16:27 <bauzas> the only concern I have with the idea is using Redis as-is
14:16:30 <edleafe> So I guess the question is: what sort of numbers would be persuasive enough to get everyone to look at this more seriously?
14:16:50 <bauzas> that's where I think we should be more subtle
14:17:12 <edleafe> bauzas: that's an implementation detail, no?
14:17:30 <rlrossit> bauzas: yeah, if this works out, I have long-term thoughts on how the deps will work
14:17:34 <edleafe> I think the idea is to demonstrate if the general approach helps.
14:17:42 <rlrossit> but it's not worth looking into that if this doesn't even do anything for us
14:17:48 <bauzas> edleafe: I dunno
14:18:20 * edleafe notes that my first commit to nova was removing redis from NASA's original design
14:18:31 <bauzas> edleafe: ie. my point is that we shouldn't rely on some magic given by a backend to provide us with update consensus and cache invalidation
14:19:03 <edleafe> "magic" is kind of harsh at this point
14:19:11 <edleafe> It's a cache
14:19:23 <edleafe> Shared by multiple schedulers
14:19:35 <edleafe> Is there any other cache that would not be as magic?
14:19:50 <Yingxin> rlrossit: I also had a design for the "shared-state scheduler"
14:20:03 <doffm> bauzas: Why would we write our own? 'Magic' is good. 'Magic' means a tested implementation that we worry less about.
14:20:09 <Yingxin> rlrossit: but I think we take a different approach
14:20:29 <edleafe> Yingxin: do you want to give a quick summary of the differences?
14:21:03 <doffm> I think part of the plan for testing is also to look at Yingxin's changes.
14:21:25 <rlrossit> doffm: indeed it is
14:21:31 <Yingxin> edleafe: there is no global cache in my design; the schedulers just get quickly synchronized via incremental updates.
14:22:25 <Yingxin> I'll publish the new test results of my prototype to the ML :)
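Yingxin's prototype itself isn't shown in the log, but the contrast with a global cache can be sketched roughly: each scheduler keeps its own local view and applies incremental deltas sent by compute nodes. The message fields and sequence-number handling here are illustrative assumptions only.

    # Illustrative sketch only: a scheduler-local view kept in sync by
    # applying incremental updates from compute nodes, with no shared cache.
    class LocalHostView(object):
        def __init__(self):
            self.hosts = {}     # host name -> dict of free resources
            self.last_seq = {}  # host name -> last applied sequence number

        def apply_update(self, host, seq, deltas):
            # Drop stale or redelivered updates.
            if seq <= self.last_seq.get(host, -1):
                return
            state = self.hosts.setdefault(host, {})
            for resource, delta in deltas.items():
                state[resource] = state.get(resource, 0) + delta
            self.last_seq[host] = seq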
14:22:43 <edleafe> OK, great
14:22:51 <bauzas> doffm: "magic" means "strong dependency for us" that would make us sensitive to updates :)
14:23:17 <edleafe> Yingxin: would you review rlrossit's spec and give your feedback, as you've also worked on this issue?
14:23:21 <doffm> I understand, there is a tradeoff. :)
14:24:10 <Yingxin> edleafe: sure
14:24:16 <edleafe> OK, let's continue this on the spec. And I hope that rlrossit and doffm keep us updated between now and the midcycle
14:24:35 <edleafe> Any other specs or reviews that we need to discuss here?
14:24:37 <rlrossit> I will do my best to take copious notes
14:25:41 <edleafe> OK, let's move on
14:25:52 <edleafe> #topic Midcycle
14:26:10 * edleafe can never decide if there's a hyphen in midcycle or not
14:26:28 <edleafe> So a quick show of hands: who's going, and who isn't?
14:26:59 * bauzas waves hand
14:27:02 * _gryf will be there. probably.
14:27:10 <takashin> o/
14:27:24 * edleafe will be there
14:27:34 <doffm> Will also be there.
14:27:43 <diga> edleafe: I will attend it for sure, but remotely
14:28:00 <alaski> I will be there
14:28:09 <alaski> as will jaypipes
14:28:27 <edleafe> diga: do you know if there will be a remote system in place?
14:29:08 <diga> yes, if possible we can set up webex
14:29:24 <edleafe> What I'd like to do is make sure we've identified the issues that need discussion ahead of time
14:29:25 <diga> last time I attended magnum via webex
14:30:12 <bauzas> well, nova midcycles are pretty hard to remotely attend, tbh
14:30:22 <bauzas> first, the audience is larger in the room
14:30:22 <edleafe> So rather than start yet another etherpad, how about we keep editing the meeting agenda for now, and we can be sure to discuss these in the meetings before the midcycle
14:30:38 <bauzas> second, the agenda is pretty free up to the last minute
14:30:51 <bauzas> and third, the flow of the conversation is pretty high
14:30:55 <edleafe> bauzas would know (last year's Rochester meetup)
14:31:25 <bauzas> so, in general, we could maybe try to set up some kind of connectivity, but that would mostly just be audio, without asking remote folks to participate
14:31:47 <bauzas> because that would slow down the convos
14:32:03 <edleafe> diga: were you able to participate in the Magnum discussions? Or simply listen?
14:32:18 <bauzas> either way, it's something I guess mriedem hasn't planned yet
14:32:26 <diga> edleafe: participated in the magnum discussion
14:33:15 <edleafe> diga: well, I echo bauzas's concern, but I guess it's worth a try
14:33:37 <diga> edleafe: Thank you :)
14:34:17 <edleafe> So I guess what I'd like to see in the next week is for people on this subteam to start adding their ideas for the midcycle to the agenda page
14:34:25 <edleafe> #link https://wiki.openstack.org/wiki/Meetings/NovaScheduler
14:35:07 <edleafe> Since we only have limited time at the midcycle, we should discuss these topics and identify the most important for F2F discussion
14:35:35 <edleafe> Sound good to everyone?
14:37:20 * edleafe notes that silence == agreement
14:37:37 <doffm> agreed
14:37:50 <diga> I am fine with this edleafe
14:38:03 <edleafe> #topic Opens
14:38:25 <edleafe> So, before I send you all back to being productive, anyone have any other topics to discuss?
14:38:38 <sudipto> edleafe, anything related to the scheduler that i could help with?
14:38:54 <diga> edleafe: any date you are planning for mid-cycle ?
14:39:15 <edleafe> diga: https://wiki.openstack.org/wiki/Sprints/NovaNewtonSprint#Hotels
14:39:19 <edleafe> oops
14:39:22 <edleafe> diga: https://wiki.openstack.org/wiki/Sprints/NovaNewtonSprint
14:39:32 <edleafe> July 19-21
14:39:38 <diga> edleafe: thanks
14:40:01 <edleafe> sudipto: have you seen the Resource Providers work?
14:40:16 <sudipto> edleafe, yeah.
14:40:32 <sudipto> edleafe, read the spec, saw a few reviews. Nothing in particular yet though.
14:40:48 <edleafe> sudipto: then reviewing the series that now starts with https://review.openstack.org/#/c/328276
14:40:53 <edleafe> would be a good start
14:40:57 <sudipto> edleafe, alright.
14:41:08 <sudipto> edleafe, will do.
14:41:12 <sudipto> edleafe, thanks!
14:41:38 <edleafe> Anything else to discuss?
14:42:25 <edleafe> OK, thanks everyone! Now go back to work!!
14:42:27 <edleafe> #endmeeting