21:00:23 <oneswig> #startmeeting scientific-wg
21:00:24 <openstack> Meeting started Tue Sep 5 21:00:23 2017 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:26 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:28 <openstack> The meeting name has been set to 'scientific_wg'
21:00:28 <zioproto> hello
21:00:31 <hogepodge> hi
21:00:39 <oneswig> hello and good evening etc.
21:00:43 <rbudden> hello
21:00:58 <martial> Hello all
21:00:58 <priteau> Hello!
21:01:02 <oneswig> #link agenda for today is https://wiki.openstack.org/wiki/Scientific_working_group#IRC_Meeting_September_5th_2017
21:01:20 <oneswig> #chair martial
21:01:21 <openstack> Current chairs: martial oneswig
21:02:01 <oneswig> Quite a few topics to cover today, let's get rolling
21:02:23 <oneswig> #topic opportunistic capacity on OpenStack
21:02:58 <oneswig> Blair was particularly interested in this - should we defer until he's joined?
21:03:12 <oneswig> Let's cover item 2 first
21:03:23 <oneswig> #topic private cloud capacity meter
21:03:54 <oneswig> OK so this item was triggered by the discussion on metrics for instance availability a few weeks back
21:04:38 <oneswig> As an example of how new API capabilities can be used, John Garbutt put together a demo
21:05:01 <oneswig> for measuring cloud available capacity using (some of) the new Nova placement APIs
21:05:18 <oneswig> #link os-capacity tool https://github.com/johngarbutt/os-capacity
21:05:35 <oneswig> Works particularly well for bare metal clouds!
21:05:47 <zioproto> cool, I just went through the README. I guess you need to run it with Admin credentials, right?
21:06:06 <oneswig> yes, it's an admin tool, unless your users are especially empowered
21:06:39 <priteau> oneswig: Does it require the placement API?
21:06:47 <b1airo> morning and sorry for tardiness - bit of a morning meltdown happening with #1 here
21:07:08 <oneswig> Yes, but not the new features introduced by pike - we use it on ocata - but one day we will improve it to use the new features introduced in pike
21:07:19 <oneswig> Hi b1airo
21:07:23 <oneswig> #chair b1airo
21:07:24 <openstack> Current chairs: b1airo martial oneswig
21:07:33 <oneswig> just on os-capacity
21:07:36 <martial> welcome blair
21:08:03 <oneswig> What it helps with is the disconnect between (eg) SLURM queues and cloud about how to handle being full.
21:08:22 <oneswig> cloud says 'no', slurm says 'join the queue'
21:08:37 <oneswig> At least now we have an idea of how much resource we can ask for
21:09:20 <oneswig> OK - just wanted to offer that up - share and enjoy :-)
21:09:38 <oneswig> Back to the agenda
21:09:47 <oneswig> #topic opportunistic cloud capacity
21:09:58 <oneswig> b1airo: take it away
21:10:44 <b1airo> wanted to take a survey of what people/deployers are doing to address this use-case today
21:11:18 <b1airo> by opportunistic capacity i mean something slightly different to the usual "on-demand" associated with cloud-computing
21:12:18 <b1airo> my experience of "on-demand" in the private/community cloud space is that it really means on-demand until the cloud is full, then again for a little while after each upgrade, but generally it becomes hard to launch e.g. larger flavours actually on-demand
21:13:47 <b1airo> i'd like to carve out some compute capacity for groups who have burst / speculative use-cases and are happy to be able to launch e.g. one or two 16 core instances for 24 hours with some basic fairness mechanism arbitrating
21:14:15 <b1airo> the simplest idea today seems to be:
21:14:27 <b1airo> 1) create a separate AZ for it
21:14:54 <b1airo> 2) create a new project for each existing project that wants access (to control quota)
21:15:09 <b1airo> 3) give that new project access to use the AZ
21:15:59 <b1airo> 4) run watcher and killer scripts that randomly kill stuff older than X hours
21:16:45 <oneswig> b1airo: would you have it so that there was some kind of kill-to-fill LRU execution when an instance is requested and the AZ is full?
21:17:09 <zioproto> b1airo: watcher and killer scripts, are you considering OpenStack Mistral?
21:17:41 <b1airo> that'd certainly be ideal oneswig, but implementing that is a nightmare i reckon, would be easier to have external scripts just ensure there is always Y capacity free
21:18:03 <b1airo> and if no instances older than limit then the zone is full
21:18:24 <zioproto> b1airo: we have a similar use case where we have to make sure instances run by students are killed every night at midnight. The use case is different, but we have the same concept of killing resources after they have been running for a while
21:19:16 <oneswig> zioproto: do you use mistral for that, as you suggest?
21:19:34 <zioproto> oneswig: no we don't
21:20:03 <b1airo> does sound similar zioproto, i guess that is for a lab setup?
21:20:44 <b1airo> re. mistral, maybe... is it a good fit?
21:20:47 <zioproto> yes, we are using ansible based stuff to delete the instances
21:21:07 <zioproto> just because we needed to set this up quickly, and we did not have time to learn another tool just for this task
21:22:08 <b1airo> anyway, i think this general use case is very common and something that OpenStack really needs to address
21:22:37 <priteau> b1airo: In our team we call "on-availability" the opposite of "on-demand". One of our projects is combining OpenStack for on-demand and Torque for on-availability, where the compute nodes are moved from one to the other depending on usage. We are hoping to publish results at a conference in 2018. This is quite different from your solution and not relying only on OpenStack though.
21:23:07 <b1airo> e.g. in the Nectar cloud we have an allocations system with 1,3,6,12 month project lengths and an expiry process. but even with 7 zones of 3-4k cores we still run into this problem
21:23:30 <oneswig> priteau: does that mean that torque queues up OpenStack API requests that couldn't be satisfied?
21:23:41 <StefanPaetowJisc> evening folks. Pardon the tardiness
21:23:44 <b1airo> priteau, "on-availability" - i like it! have not heard that term in this context before
21:23:46 <oneswig> Hi StefanPaetowJisc
21:23:53 <rbudden> priteau: that’s similar to the limited use cases we’ve had for large scale VMs. We’ve traditionally had the nodes placed in a Slurm reservation, then turned into Nova Computes on demand via bash/ansible/etc.
21:23:59 <zioproto> b1airo: I am reading as we speak the python code my colleague wrote. The project is decorated with an attribute. We wrote custom python code that reads the attribute and decides whether to kill or shut down the instances in the project.
21:24:18 <b1airo> o/ StefanPaetowJisc
21:24:18 <priteau> oneswig: no, they're two separate queues with possibly different groups of users. But behind the scenes they're sharing the same cluster.
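[Editor's sketch, not part of the meeting: the "killer" step of b1airo's four-point plan (randomly kill instances older than X hours so that Y capacity stays free) might look roughly like the following. The Instance record, the pick_victims name and all of its parameters are hypothetical; listing and deleting real instances through the OpenStack APIs is deliberately left out, and capacity is counted in vCPUs only as a simplifying assumption.]

```python
import random
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Instance:
    """Hypothetical record for one instance in the opportunistic AZ."""
    name: str
    vcpus: int
    launched_at: datetime


def pick_victims(instances, free_vcpus, target_free_vcpus, max_age, now, rng=random):
    """Choose instances to delete until target_free_vcpus would be free.

    Only instances older than max_age are eligible, and they are taken in
    random order. If the eligible ones do not free enough capacity, the
    zone simply stays full, as described in the discussion.
    """
    eligible = [i for i in instances if now - i.launched_at > max_age]
    rng.shuffle(eligible)
    victims = []
    for inst in eligible:
        if free_vcpus >= target_free_vcpus:
            break
        victims.append(inst)
        free_vcpus += inst.vcpus
    return victims


# Example: keep one 16-core slot free, with a 24 hour age limit.
now = datetime(2017, 9, 5, 21, 0)
running = [
    Instance("a", 16, now - timedelta(hours=30)),
    Instance("b", 16, now - timedelta(hours=2)),
    Instance("c", 16, now - timedelta(hours=48)),
]
to_kill = pick_victims(running, free_vcpus=0, target_free_vcpus=16,
                       max_age=timedelta(hours=24), now=now)
```

[A cron job or similar could run this periodically and pass the result to whatever deletion mechanism the site already uses, e.g. the ansible-based approach zioproto mentions. It does not implement the kill-to-fill LRU behaviour oneswig asks about.]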
21:24:31 <rbudden> which reminds me, i still owe b1airo an email about this ;)
21:24:49 <b1airo> oh hey rbudden o/
21:25:21 <oneswig> John Garbutt asked me to prompt people interested in preemptible instances (which essentially is the user-centric effect of this concept)
21:25:25 <priteau> b1airo: Not directly related to the above: the Blazar team will be meeting for the Denver PTG next week and will discuss the idea of the "reaper" service that was proposed in Boston
21:25:30 <oneswig> If they could review and comment on https://review.openstack.org/#/c/438640
21:25:48 <oneswig> If they haven't done so already. This will inform discussion at the PTG next week.
21:26:01 <martial> priteau +1
21:26:15 <oneswig> So please take a look if you want a spot instance capability on your cloud
21:26:22 <zioproto> #link WIP: Backlog spec on preemptible servers https://review.openstack.org/#/c/438640
21:26:37 <oneswig> zioproto: the very same :-)
21:26:49 <zioproto> oneswig: yes I just formatted it for the MeetBot
21:26:57 <oneswig> thanks zioproto
21:27:23 <oneswig> priteau: how's the gui for blazar?
21:27:52 <priteau> oneswig: It's upstream!
21:28:01 <priteau> https://git.openstack.org/cgit/openstack/blazar-dashboard/
21:28:05 <oneswig> nice work
21:28:11 <b1airo> nice
21:28:38 <oneswig> We may want this, sooner rather than later, our ska system is getting very busy
21:29:00 <oneswig> I'll be in touch priteau...
21:29:05 <priteau> Sounds good
21:31:09 <oneswig> OK, anything more to add on opportunistic usage?
21:31:38 <b1airo> i'm interested to know if people think it is ok to have a different system/api to meet this use-case
21:32:25 <martial> b1airo: I think that is how some people do it, so I would vote yes
21:32:34 <b1airo> or whether it should be through Nova API and therefore require some API changes to instance creation, i.e., a richer NoValidHost
21:33:08 <martial> I do like the blazar solution
21:33:25 <b1airo> martial, the implication if that is the case is we as a community should make an effort to ease and demonstrate that integration for newcomers
21:34:16 <b1airo> there are really not that many combinations to worry about, e.g., Nova+SLURM and Nova+PBS would probably cover ~80%
21:34:24 <priteau> rbudden: Do you have a writeup of your Slurm/Nova solution somewhere?
21:34:55 <rbudden> priteau: I owe an email about this to b1airo, I can include you on it if you’d like ;)
21:35:03 <priteau> Yes please!
21:35:14 <martial> rbudden: can you add me as well?
21:35:18 <rbudden> Everything is just getting back to normal after some vacation and our Bridges upgrades
21:35:21 <rbudden> Martial: sure thing
21:35:28 <martial> thx
21:35:38 <b1airo> thanks rbudden!
21:35:39 <rbudden> I’ll warn you it’s nothing super fancy
21:35:46 <oneswig> b1airo: On the all-openstack side, I think there are liabilities with queuing to get an instance that nova may be wary of, eg, what if I was delayed in creating an instance and then found the resources upon which I depended were gone
21:35:48 <zioproto> I don't know if it is related but I did some testing in running 1000 VMs in a single 'openstack server create' command
21:35:59 <rbudden> largely utilizing Availability Zones and metadata tagging of hypervisors and Nova flavors
21:36:02 <zioproto> the idea is to be able to run that large a number of VMs but for a short time
21:36:04 <rockyg> rbudden, you should also consider giving it to the folks who publish superuser blog
21:36:34 <oneswig> Hi rockyg!
21:36:40 <rbudden> rockyg: sounds interesting, i can check into that unless someone has a direct contact I can use?
21:36:45 <rockyg> Hey! been lurking
21:37:02 <oneswig> rbudden: there's a whole chapter on this... in the book... hint...
21:37:04 <rockyg> Nicole ???
21:37:08 <zioproto> where is a good start to read about PBS+nova, given that I never used PBS?
21:37:27 <rbudden> oneswig: thanks! i have a copy in front of me on the bookshelf, i’ll check it out!
21:37:43 <rbudden> zioproto: I just did a similar test using Nova/Ironic during our upgrade
21:38:01 <rbudden> I believe only on the order of 500 nodes in a single instantiation
21:38:04 <oneswig> rbudden: might be a case study for the second edition?
21:38:16 <rbudden> yes, i’ll have notes on this for the book update!
21:38:41 <rbudden> moved to local boot across all nodes and was able to test and verify the Nova scheduler bug fix for this that’s mentioned in the first edition of the book
21:38:42 <rockyg> Nicole Martinelli, rbudden
21:38:50 <rbudden> rockyg: thx
21:39:21 <zioproto> rbudden: #link https://cloudblog.switch.ch/2017/08/28/starting-1000-instances-on-switchengines/
21:40:38 <rbudden> cool, i’ll check out the link
21:40:39 <martial> zioproto: very nice indeed
21:41:45 <oneswig> zioproto: you should get your blog onto planet.openstack.org, if it isn't already?
21:42:04 <rockyg> ++ to that.
21:42:12 <zioproto> #action zioproto check if his blog is already on planet openstack
21:43:03 <oneswig> OK, move on?
21:43:22 <oneswig> #topic book update
21:43:37 <oneswig> The second edition is taking shape nicely.
21:43:53 <oneswig> Thank you to everyone who has contributed their time and input so far.
21:44:07 <oneswig> We have some case studies to fill still
21:44:28 <oneswig> 1) Bare metal infrastructure management case study please, to accompany Bridges and Chameleon
21:44:56 <oneswig> 2) Federation examples to be proposed for the new section, led by Enol
21:45:18 <hogepodge> I'm here to remind everyone of the deadline.
21:45:37 <rbudden> oneswig: as mentioned in my email to you this morning, i’ll be working on the Bridges update this week.
21:45:49 <oneswig> Thanks rbudden, appreciated
21:46:04 <rbudden> I was delaying in hopes to have more time to play with some Neutron integration, but other tasks have unfortunately had me preoccupied
21:46:22 <priteau> I have done most of the update of the Chameleon case study this morning, will still provide a few more changes later this week
21:47:11 <rbudden> I’m attempting a skip level upgrade from Liberty -> Ocata on our second cluster… I doubt I’ll have it complete before the deadline but I’ll keep everyone apprised
21:48:00 <oneswig> Excellent. What of the federators in this time zone?
21:48:52 <oneswig> (... obviously busy debugging SAML issues...)
21:50:02 <hogepodge> Does the team feel like it's on track to deliver the update in a few weeks?
21:50:11 <StefanPaetowJisc> Sorry, debugging non-SAML stuff here... feverishly trying to get GSSAPI (Moonshot) done for an HPC-SIG meeting next week :-)
21:50:30 <StefanPaetowJisc> Still haven't looked at the book spec :-(
21:50:32 <oneswig> hogepodge: I think so, many people have been responsive
21:50:53 <oneswig> Good luck StefanPaetowJisc, keep us updated!
21:50:54 <hogepodge> Excellent. Is there anything I need to take back to the Foundation team?
21:51:31 <hogepodge> We're hoping that the book will have an exciting color image, btw. :-D
21:51:58 <martial> hogepodge: still the plan, we have to discuss a cut off date for review but we are on track
21:52:10 <oneswig> hogepodge: nothing comes to mind for the foundation team right now, thanks
21:52:15 <martial> color :)
21:52:34 <oneswig> hogepodge: how will you decide on a cover?
21:53:15 <oneswig> BTW - one issue - does anyone have Adobe Illustrator? We can read the .ai files (they are actually PDFs) but not edit them.
21:53:19 <hogepodge> oneswig: the previous book used a research image from one of our community members. If you get an image to us, we can get it to our design team to build out the cover
21:54:39 <oneswig> Interesting idea... Can the WG members run a poll do you think? I'm sure you and the team would pick a good one.
21:54:43 <StefanPaetowJisc> Hmmmm
21:54:49 <StefanPaetowJisc> I have AI somewhere...
21:55:05 <StefanPaetowJisc> I have AI CS4.
21:55:08 <StefanPaetowJisc> If that helps
21:55:21 <b1airo> pretty sure i can get Adobe suite if required
21:55:32 <martial> same as b1airo
21:55:32 <oneswig> StefanPaetowJisc: it could well. Can I bear that in mind? Same to you b1airo
21:56:00 <b1airo> hogepodge, i was wondering about that - we might be able to get something from Monash
21:56:17 <oneswig> I sense a poll ...
21:56:23 <oneswig> OK, 1 final topic to squeeze in - can we do it?
21:56:40 <b1airo> i will talk to my colleague who is very good with this sort of stuff and spends hours making slide decks :-)
21:56:44 <StefanPaetowJisc> Ok, oneswig.
21:56:46 <oneswig> #topic SWG -> SSIG?
21:57:02 <oneswig> b1airo: what's up? Do we automatically become a SIG?
21:57:45 <zioproto> I will try to let you know about this soon
21:57:56 <zioproto> should be a topic in the UC
21:58:00 <martial> that is a conversation that was explained to us at the UC forum session in Boston
21:58:06 <zioproto> we skipped a meeting because of bank holiday in the US
21:58:28 <martial> but it seemed to Blair and I at the time that it seems so
21:58:42 <rockyg> So, you get to say yea/nay
21:58:46 <martial> zioproto, you will keep us updated it seems :)
21:58:47 <oneswig> Are there material changes to be aware of?
21:58:53 <rockyg> I don't know what happens if you don't pick.
21:59:10 <rockyg> Trying to get more devs involved.
21:59:51 <zioproto> as far as I understood the biggest change is that there will be a big mailing list with all SIGs
22:00:01 <StefanPaetowJisc> EWWW
22:00:02 <zioproto> and you have to write with your SIG in []
22:00:11 <zioproto> similar to openstack-dev mailing list
22:00:19 <b1airo> sorry i walked away to chase someone to pack there schoolbag o_0
22:00:25 <b1airo> *their
22:00:30 <b1airo> (back to school for me!)
22:00:32 <oneswig> It doesn't sound all that different
22:00:43 <priteau> Isn't that what we already do?
22:00:43 <zioproto> oneswig: I would not worry too much
22:00:49 <oneswig> b1airo: it's been that week here, too
22:00:58 <rockyg> Yeah. Hope is one ml will get more response and cross pollination
22:01:01 <b1airo> zioproto, that is my understanding too
22:01:02 <priteau> Let's skip the SIG and go straight to STIG ;-)
22:01:10 <zioproto> #link http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-sigs
22:01:25 <oneswig> is the major difference the expectation of less work and more interest?
22:01:28 <b1airo> Special Technical Interest Group!
22:01:33 <oneswig> priteau: too good....
22:01:35 <priteau> b1airo: exactly!
22:01:55 <b1airo> oneswig, i think that is one of the subtler expectations yeah
22:02:16 <b1airo> WGs were probably thought of originally as more autonomous and goal focused
22:02:24 <oneswig> Ah, we are over time.
22:02:35 <zioproto> good night !
22:02:48 <oneswig> But to conclude, this is not an issue of concern it seems
22:02:53 <oneswig> business as usual?
22:03:02 <oneswig> zioproto: thanks for staying up!
22:03:05 <b1airo> whereas SIGs are a way to get cliques together, and i think the UC would then like to introduce a few more guidelines to get useful and standardised outputs from those groups
22:03:22 <rbudden> gotta jet, goodbye everyone!
22:03:30 <b1airo> bye all!
22:03:30 <zioproto> b1airo: ok ! I take this input for the UC :)
22:03:38 <StefanPaetowJisc> bye rbudden
22:03:43 <oneswig> thanks everyone
22:03:50 <priteau> bye everyone
22:03:52 <zioproto> guys it is really late here, I have to leave to sleep, ciao :)
22:04:00 <oneswig> #endmeeting