11:00:26 <oneswig> #startmeeting scientific-sig
11:00:27 <openstack> Meeting started Wed Jan 31 11:00:26 2018 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
11:00:28 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
11:00:31 <openstack> The meeting name has been set to 'scientific_sig'
11:00:44 <oneswig> Hello there!
11:00:49 <armstrong> Hello
11:00:56 <oneswig> #link Agenda for today https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_January_31st_2018
11:00:56 <ttsiouts> Hello!
11:00:59 <strigazi> hello
11:01:05 <belmoreira> hi
11:01:09 <oneswig> Greetings all, thanks for coming.
11:01:37 <vabada> hi
11:01:37 <priteau> Hi everyone
11:01:46 <armstrong> Hi
11:01:50 <oneswig> johnthetubaguy tells me he'll be along shortly
11:02:01 <b1airo> evening
11:02:07 <oneswig> Hey Blair
11:02:11 <oneswig> #chair b1airo
11:02:12 <openstack> Current chairs: b1airo oneswig
11:02:27 <oneswig> #topic housekeeping stuff
11:02:31 <daveholland> hi all
11:02:38 <b1airo> apologies if i'm a little slow - only 10pm here but i'm particularly shattered this evening for some reason
11:02:51 <b1airo> hi daveholland
11:02:57 <oneswig> There's about a week to go until the summit CFP closes, HPC/GPU/AI topics please!
11:03:05 <oneswig> no worries b1airo
11:03:20 * johnthetubaguy hides at the back of the room
11:03:31 <armstrong> @oneswig: meetings now are only on Wednesdays?
11:03:34 <oneswig> Quite an interesting subject for a track and I'm hoping for some good use cases
11:03:37 <b1airo> you're an expert now johnthetubaguy!
11:03:54 <oneswig> armstrong: no, Tuesday 2100UTC on alternate weeks
11:04:08 <johnthetubaguy> b1airo: heh
11:04:09 * ildikov lurks behind johnthetubaguy :)
11:04:26 <oneswig> Hi johnthetubaguy, we'll get round to the nova bits shortly
11:04:36 <oneswig> Quiet at the back there please!
11:04:41 <b1airo> anyone know how the submissions are looking?
11:05:00 <oneswig> No, don't think that comes out until the deadline closes.
11:05:21 <oneswig> Was thinking of writing around to a few orgs that might be interested to make sure they are aware
11:05:36 <b1airo> i heard from a few people they liked the change up of the tracks, but don't think any of my colleagues have actually submitted anything
11:05:36 <ildikov> oneswig: +1
11:05:51 <b1airo> yeah good idea
11:06:08 <oneswig> b1airo: you've got to put something meaty in on GPUs, haven't you?
11:06:22 <b1airo> actually i don't think it has gone around on the OzStackers list yet...
11:06:55 <oneswig> Take care of matters b1airo
11:07:36 <oneswig> In general can we try to circulate awareness of the HPC/GPU/AI track and the 8th Feb deadline
11:07:45 <oneswig> that'd be great
11:07:50 <b1airo> is there a thumbs up ascii symbol?
11:08:10 <oneswig> This IRC, so quaint, like vintage motoring :-)
11:08:31 <oneswig> Let's move on
11:08:39 <oneswig> #topic Dublin PTG
11:09:00 <oneswig> Short notice on this - if you're going and don't have a ticket yet, buy it right now.
11:09:18 <ildikov> price increases on Thursday
11:09:25 <ildikov> so hurry up! :)
11:09:31 <b1airo> are you scalping oneswig?
11:09:56 <oneswig> It jumpstarted me into action... our away team's all sorted (finally).
11:10:03 <oneswig> b1airo: ha, to you, sir, special price
11:10:44 <priteau> It says "57 remaining" right now
11:10:55 <priteau> Is there also a quantity limit on the $100 tickets?
11:10:56 <oneswig> We've covered content already, but are open to more items
11:11:04 <oneswig> priteau: was 80 yesterday, I believe
11:12:13 <johnthetubaguy> thats probably the overall limit I think
11:12:34 <johnthetubaguy> size is small to keep the costs down I believe
11:12:55 <oneswig> I think so, don't think more tickets will be appearing - unless anyone knows otherwise
11:13:12 <oneswig> OK, one more event to cover
11:13:18 <oneswig> #topic HPCAC Lugano
11:13:41 <oneswig> As I understand it, b1airo gave a barnstorming performance at the last one in Perth, right Blair?
11:13:55 <oneswig> #link conference page https://www.cscs.ch/publications/press-releases/swiss-hpc-advisory-council-conference-2018-hpcxxl-user-group/
11:14:18 <oneswig> OpenStack's now front and centre in this conference.
11:14:49 <b1airo> i really enjoyed it...
11:14:50 <oneswig> I went last year and thought it was good. zioproto came along and gave a good talk
11:15:15 <oneswig> And Lugano, what's not to like about that
11:15:39 <b1airo> true dat
11:16:15 <oneswig> 9th-12th April if you're interested, perhaps I'll see you there (haven't completely decided yet)
11:16:35 <b1airo> i was expecting an invite from Cindy for that, they probably decided you were better oneswig!
11:16:47 <belmoreira> oneswig great to know
11:17:04 <oneswig> b1airo: I had a mail from Hussein about it
11:17:13 <oneswig> Any other events coming up people would like to circulate?
11:18:23 <oneswig> #topic preemptible instances in OpenStack
11:18:35 <oneswig> today's main event!
11:19:14 <oneswig> belmoreira: how's your reaper project developing? can you describe it?
11:19:31 <belmoreira> I will give the floor to ttsiouts
11:19:48 <ttsiouts> belmoreira: thanks!
11:20:14 <ttsiouts> Currently, operators use project quotas to ensure fair sharing of their infrastructure.
11:20:25 <ttsiouts> The problem with this is that quotas pose as hard limits.
11:20:41 <ttsiouts> This leads to actually dedicating resources to workloads even if the resources are not used all the time.
11:20:56 <ttsiouts> Which in turn leads to lower cloud utilization
11:21:11 <ttsiouts> So there are idle and unused resources while, at the same time, users ask for more resources.
11:21:37 <ttsiouts> The concept of Preemptible Instances tries to address this problem.
11:21:51 <ttsiouts> This type of server can be spawned on top of the project's quota, making use of the idling resources.
11:21:53 * johnthetubaguy whoops in agreement
11:21:59 <oneswig> ttsiouts: how often does the CERN cloud run out of hosts?
11:22:59 <belmoreira> we manage quotas and expectations so that it doesn't happen
11:23:26 <ttsiouts> Here at CERN we have compute-intensive workloads such as batch processing.
11:23:42 <ttsiouts> For those, currently we use projects with dedicated quota.
11:24:12 <ttsiouts> so we are committing big chunks of CERN's cloud infra to these workloads even when the resources are not actually used, which leads to lower cloud utilization.
11:24:23 <b1airo> "dedicated" means that <committed quota> == <available resource> ?
11:24:45 <johnthetubaguy> I like to think of this as slicing your cloud, like pizza, using quota. you don't want the unused bits going to waste
11:24:46 <strigazi> b1airo it depends on the user
11:25:02 <belmoreira> b1airo for compute processing yes
11:25:08 <ttsiouts> johnthetubaguy: yes
11:25:27 <b1airo> got it - i understand this might not be true for general purpose instances
11:25:43 <belmoreira> b1airo correct
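To make the quota point above concrete, here is a minimal sketch of the admission logic being described - illustrative only, not CERN's implementation; all names and numbers are made up:

    # Minimal sketch, not CERN's implementation: a regular instance is rejected
    # once its project quota is exhausted, while a preemptible instance is
    # admitted as long as the cloud still has physical capacity left.
    def admit(requested_vcpus, project_used, project_quota,
              cloud_used, cloud_capacity, preemptible=False):
        if not preemptible:
            # Quota is a hard limit for normal instances, even if hosts sit idle.
            return project_used + requested_vcpus <= project_quota
        # Preemptible instances run "on top of" quota, soaking up idle capacity.
        return cloud_used + requested_vcpus <= cloud_capacity

The point of the preemptible class is the second branch: the request is bounded by what the cloud physically has free, not by what the project paid for.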
11:25:45 <b1airo> i assume you control this by segmenting the offerings across AZs or something?
11:26:12 <johnthetubaguy> so some enterprises hit issues when departments co-fund a cloud, but want to keep the bits they paid for
11:26:13 <belmoreira> compute is segmented through cells
11:26:34 <b1airo> johnthetubaguy: yeah we have that problem too
11:27:04 <b1airo> please continue ttsiouts
11:27:08 <belmoreira> johnthetubaguy yeah, that is also our main issue
11:27:21 <ttsiouts> So currently we are prototyping a service to orchestrate the preemptible instances.
11:27:38 <ttsiouts> The "reaper" service!
11:27:59 <ttsiouts> As proposed by the Nova team at the latest PTG, the service is external to Nova.
11:28:24 <ttsiouts> The purpose of this is to help us identify what changes are needed in the Nova code base, if any, in order to make preemptible instances work.
11:28:42 <ttsiouts> There are various things that have to be addressed here.
11:29:00 <ttsiouts> There is the need to tag the preemptible servers with a preemptible property at creation time.
11:29:13 <ttsiouts> And the property should be immutable.
11:29:29 <armstrong> @ttsiouts: does CERN use an OpenStack cloud?
11:30:33 <belmoreira> armstrong yes
11:30:35 <ttsiouts> armstrong: yes! OpenStack is a big thing for CERN.
11:30:37 <johnthetubaguy> The way I hope we talk about this at the PTG to Nova folks is: so we tried everything external to Nova, these few things suck when we do that, so we need to work out how to address those things
11:31:12 <ttsiouts> johnthetubaguy: +1
11:31:21 <oneswig> ttsiouts: johnthetubaguy: can we go over what works and what doesn't?
11:31:40 <johnthetubaguy> that preemptible property is a good point
11:31:50 <oneswig> I believe it only partially protects against running out of capacity, right?
11:32:39 <johnthetubaguy> its more about maximizing utilization, I believe
11:33:31 <johnthetubaguy> i.e. only stop being able to build servers when you really have no more room
11:34:11 <b1airo> what happens when an on-demand request comes in for capacity that is currently filled by preemptible instances?
11:34:14 <daveholland> how does the scheduler connect with the reaper? does the reaper get triggered when the scheduler thinks it can't find a valid host? (do point me at docs if this is already covered somewhere)
11:34:23 <oneswig> johnthetubaguy: but with an external service, is there an issue that preemptible instances might not be harvested in time when availability requires it?
11:34:44 <johnthetubaguy> yes, we have the scheduler no valid host triggering the reaper
11:34:51 <johnthetubaguy> ttsiouts: ?
11:35:00 <johnthetubaguy> did I just make that up :)
11:35:14 <ttsiouts> The service is triggered by the scheduler when placement returns no valid hosts for the requested resources during a boot request.
11:35:25 * johnthetubaguy is about to move location, may get disconnected
11:35:45 <daveholland> so the expectation is that the reaper reaps fast enough to be within the overall scheduler timeout?
11:36:11 <oneswig> ttsiouts: From your tests, do higher-level services like Heat cover for the occasional No Valid Hosts, and hide it from the user?
11:36:16 <ttsiouts> daveholland: Hopefully yes
11:36:49 <ttsiouts> oneswig: Actually we try to avoid the NoValidHosts exception
11:37:20 <ttsiouts> We trigger the reaper as soon as placement returns no allocation candidates
11:37:26 <belmoreira> oneswig this should be hidden from the user
11:37:57 <oneswig> ah, so the failure never makes it back to the API request?
11:38:06 <ttsiouts> oneswig: yes
11:38:47 <oneswig> Are you able to signal preemptible instances before they get terminated, or is there simply no time for a grace period?
11:38:47 <ttsiouts> If the reaper fails, then we have the NoValidHosts exception raised
11:39:00 <johnthetubaguy> yeah, this all happens after the create server API has returned
11:39:20 <johnthetubaguy> but the server will go to error if you can't find room
11:39:21 <oneswig> johnthetubaguy: ah, of course, thanks
11:39:35 <priteau> ttsiouts: And that's done without any modification to the nova scheduler code?
11:39:48 <johnthetubaguy> these are the changes we want to make to nova
11:39:49 <b1airo> so the timeout (or not) and the ability to signal the p-instance should all be configurable i guess (eventually anyway)
11:40:04 <johnthetubaguy> I think this is in the conductor rather than the scheduler, at least ideally, but thats a technicality
11:40:15 <ttsiouts> priteau: the only modification we've done is the triggering of the reaper
11:40:33 <ttsiouts> priteau: with the same request that is placed to placement
11:40:56 <priteau> I would love to see the code if you have it public somewhere
11:41:12 <johnthetubaguy> me too, would love to review where we are
11:41:41 <ttsiouts> priteau, johnthetubaguy: this is at a pretty early development stage
11:41:55 <johnthetubaguy> thats all good, just a github branch is fine
11:41:55 <belmoreira> ttsiouts isn't our repo public?
11:41:56 <oneswig> ttsiouts: are there complexities introduced if (eg) the reaper or the scheduler that invokes it were to die at this point?
11:42:36 <ttsiouts> belmoreira: there are several things to be done..
11:43:06 <belmoreira> yeah, I know. But I thought we made it public some time ago
11:43:39 <ttsiouts> I guess we can make it public..
11:43:43 <oneswig> ttsiouts: belmoreira: are there assumptions about cern's use case or is it already generalised
11:44:38 <oneswig> It seems to have huge potential for many use cases.
11:44:39 <johnthetubaguy> where I can see this getting tricky is deciding who gets deleted by the reaper
11:45:02 <belmoreira> oneswig initially we would like to run p-instances in dedicated projects
11:45:24 <ttsiouts> oneswig: For this prototype the flavors used for spawning this kind of server are tagged as preemptible.
11:45:36 <johnthetubaguy> belmoreira: good call out, that's a simplification we can work through later
11:45:40 <b1airo> belmoreira: we have the same desire - would meet 80% of our need
11:45:58 <daveholland> I like the flavour/flavor approach, that's easy for users to understand
11:46:05 <johnthetubaguy> b1airo: interesting
11:46:22 <belmoreira> johnthetubaguy just to mention that our use case is not that generic (at least initially)
11:46:26 <b1airo> we even started spec'ing out a similar project, albeit without the integration and just having a pool of resources dedicated for temporary instances
11:46:29 <johnthetubaguy> daveholland: +1 (although its a shame you create so many flavors!)
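Since the flavor-tagging approach comes up several times above, here is a rough sketch of what it could look like with python-novaclient. The extra-spec key name ('preemptible'), the flavor sizing, and the credentials are illustrative assumptions, not details of the CERN prototype:

    # Sketch only: the extra-spec key "preemptible" is an illustrative
    # assumption, not necessarily what the CERN prototype uses.
    from keystoneauth1 import loading, session
    from novaclient import client

    loader = loading.get_plugin_loader('password')
    auth = loader.load_from_options(auth_url='https://keystone.example.org/v3',
                                    username='admin', password='secret',
                                    project_name='admin',
                                    user_domain_id='default',
                                    project_domain_id='default')
    nova = client.Client('2.1', session=session.Session(auth=auth))

    # A small, generic flavor whose instances the reaper is allowed to delete.
    flavor = nova.flavors.create(name='m1.small.preemptible',
                                 ram=2048, vcpus=1, disk=20)
    # Marker the reaper could look for; the discussion above notes that,
    # whatever form it takes, it needs to be immutable on the instance.
    flavor.set_keys({'preemptible': 'true'})

This matches belmoreira's point below that a single small/generic flavor may be enough, rather than doubling the whole flavor list.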
11:46:43 <johnthetubaguy> belmoreira: +1 although its more general than it first seems
11:46:43 <priteau> johnthetubaguy: It would be nice to have a configurable reaper decision engine, like the Nova scheduler weighers
11:47:35 <johnthetubaguy> priteau: I am tempted to go simpler than that, but +1 on the customizable part of that
11:47:35 <ttsiouts> priteau: Actually we are experimenting with two strategies for selecting servers.
11:47:49 <oneswig> Has this got the attention of the public cloud folks yet?
11:47:56 <johnthetubaguy> ttsiouts: I am curious where you settled with that
11:48:07 <ttsiouts> The first one maps servers to hosts.
11:48:15 <ttsiouts> Then it selects a host that could provide the requested resources.
11:48:18 <johnthetubaguy> oneswig: good point, worth reaching out to that SIG (someone must do reserved instances and live-migrates)
11:48:20 <belmoreira> daveholland for our use case we don't see a need for many flavors. We think a small/generic flavor will be enough
11:48:25 <martial_> oneswig: the PTG is a good option to do just that
11:48:30 <ttsiouts> and as a last step finds a combination of preemptible servers that will be killed to free up the resources
11:48:43 <ttsiouts> The second strategy tries to eliminate the idling resources.
11:48:48 <oneswig> Hi martial_ - morning
11:48:53 <oneswig> #chair martial_
11:48:54 <openstack> Current chairs: b1airo martial_ oneswig
11:48:55 <ttsiouts> So it selects the minimum of the existing preemptible resources that, in combination with the already idle resources of the selected host, provide the requested resources.
11:48:58 <johnthetubaguy> martial_: often don't get many SIG folks at the PTG, at least it feels that way most times
11:49:09 <martial_> (thanks Stig, been listening)
11:49:43 <martial_> johnthetubaguy: Tobias (part of the Public Cloud WG) reached out, so I know he is going to be there
11:49:50 <oneswig> johnthetubaguy: true, I don't know if they've booked time at the PTG
11:49:58 <johnthetubaguy> martial_: ah, excellent
11:50:00 <oneswig> ah good
11:50:12 <daveholland> we have ~20 flavours (some of those are to distinguish root-on-hypervisor from root-on-ceph) so a few more won't hurt. I'd be wary of doubling that for pre-emptible vs not-pre-empt though
11:50:58 * johnthetubaguy hopes we get to flavor decomposition eventually!
11:51:11 <belmoreira> johnthetubaguy ahaha :)
11:51:12 <oneswig> In an Ironic use case (if indeed there is one), processes like node cleaning could make this a seriously lengthy build time
11:51:53 <b1airo> did you ever use EC2 back in the day oneswig - sometimes took them over 10 mins just to build a Xen VM!
11:52:08 <oneswig> b1airo: alas no...
11:52:31 <b1airo> everyone is getting exceptionally spoiled by on-demand computing. kids these days...
11:52:40 <johnthetubaguy> heh
11:52:56 <oneswig> ttsiouts: There's loads of interest here. How can we get involved?
11:53:14 <oneswig> Will you announce when the git repo is ready?
11:53:17 <johnthetubaguy> oneswig: its a good point, you may have to wait!
11:53:48 <johnthetubaguy> I like the idea of dropping a mail to the ML for the public and scientific SIGs, linking to the prototype code
11:53:57 <oneswig> me too
11:54:12 <b1airo> this all started with someone sticking web-services in front of batch-queued clusters (i.e., The Grid), feels like we're about to come full circle ;-)
11:54:44 <johnthetubaguy> b1airo: +1 bite our tail or something
11:54:49 <oneswig> There's a good deal of that!
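Going back to the two selection strategies ttsiouts outlined above, here is a simplified sketch of the second one: combine a host's idle capacity with a set of preemptible servers to reap until the request fits. The CERN reaper code was not public at this point, so this is illustrative only; the data layout is invented and the smallest-first pass is just a greedy approximation of "reclaim the minimum":

    # Illustrative sketch, not the CERN reaper: pick a host where already-idle
    # capacity plus a small set of reaped preemptible servers covers the request.
    def pick_victims(hosts, requested_vcpus):
        """hosts: iterable of dicts, e.g.
           {'name': 'compute-01', 'idle_vcpus': 2,
            'preemptible': [('uuid-a', 4), ('uuid-b', 2)]}"""
        best = None  # (host, victims, vcpus_freed)
        for host in hosts:
            shortfall = requested_vcpus - host['idle_vcpus']
            if shortfall <= 0:
                return host, []  # fits on idle capacity alone, nothing to reap
            victims, freed = [], 0
            # Greedy pass, smallest servers first; the real strategy aims to
            # reclaim as little preemptible capacity as possible.
            for server_id, vcpus in sorted(host['preemptible'], key=lambda s: s[1]):
                victims.append(server_id)
                freed += vcpus
                if freed >= shortfall:
                    break
            if freed >= shortfall and (best is None or freed < best[2]):
                best = (host, victims, freed)
        if best is None:
            return None, []  # no host can satisfy the request, even after reaping
        return best[0], best[1]

A configurable decision engine of the kind priteau suggests would essentially let operators swap this function for their own policy (idle-first, billing- or bidding-aware, and so on).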
11:54:59 <belmoreira> oneswig never thought about an ironic use case... interesting. But that seems more tricky
11:55:47 <oneswig> belmoreira: signalling the instance on reaping might be harder, there's an NMI that can be injected but who knows what effect it would have
11:55:57 <b1airo> ttsiouts: i like the idea of first preempting instances that are idling, i think that fits the science-cloud use-case, public-cloud would want other things i guess (e.g. billing/bidding tie-ins)
11:56:40 <oneswig> b1airo: you're right, a good design would provide a nice separation of policy and mechanism
11:56:52 <johnthetubaguy> most of the extra things seem to boil down to priority of different spot instances
11:57:01 <oneswig> We are nearly out of time. Any more on this?
11:57:04 <johnthetubaguy> not saying that makes it easy!
11:57:26 <ttsiouts> I can add the link to it in the meeting agenda after the meeting
11:57:36 <b1airo> johnthetubaguy: got it - you'll have it ready for Rocky ;-)
11:57:42 <johnthetubaguy> ttsiouts: belmoreira: are we good sending that ML message?
11:57:49 <oneswig> ok - thanks ttsiouts
11:58:02 <johnthetubaguy> ah, cool
11:58:06 <martial_> cool, info to share :)
11:58:11 <ttsiouts> I will clean up some things first
11:58:17 <ttsiouts> :)
11:58:24 <johnthetubaguy> all good :)
11:58:31 <oneswig> #topic AOB
11:58:38 <oneswig> Any other business to raise?
11:58:44 <oneswig> In the final minute!?
11:59:00 <b1airo> anyone know how to force the tcp window size default on Windoze servers ?
11:59:17 * johnthetubaguy points at the registry, runs away
11:59:39 <b1airo> i have some long fat pipe problems to solve between a Windows-based instrument PC and a Linux HPC system in different states...
12:00:01 <oneswig> b1airo: you may be on your own there :-)
12:00:21 <b1airo> i guessed as much - for some reason the Windows end doesn't auto-scale the tcp window
12:00:27 <oneswig> OK, looks like time's up
12:00:38 <oneswig> Thank you everyone, very interesting discussion
12:00:42 <oneswig> #endmeeting
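One follow-up on b1airo's closing question about the Windows TCP window: on Windows Server 2008 and later the receive window is governed by auto-tuning rather than a fixed value, so a first check on the instrument PC would be something like the standard netsh commands below - whether they cure this particular long-fat-pipe problem is untested here:

    netsh interface tcp show global
    netsh interface tcp set global autotuninglevel=normal

On older Windows versions the fixed window and window scaling live in the registry that johnthetubaguy pointed at, under HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters (TcpWindowSize and Tcp1323Opts).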