21:01:04 <oneswig> #startmeeting scientific-wg
21:01:05 <openstack> Meeting started Tue May 16 21:01:04 2017 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:01:06 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:01:08 <openstack> The meeting name has been set to 'scientific_wg'
21:01:10 <oneswig> aloha!
21:01:17 <martial> Hi Stig
21:01:28 <oneswig> #chair martial
21:01:29 <openstack> Current chairs: martial oneswig
21:01:30 <martial> so no use of the dedicated channel?
21:01:31 <oneswig> Hi Martial
21:01:37 <martial> because there are two it seems :)
21:01:50 <oneswig> Not as yet ...
21:01:56 <martial> #science-wg has people in it and #scientific-wg has a bot :)
21:02:19 <oneswig> Ah. Well I welcome feedback on https://review.openstack.org/#/c/459884/
21:02:44 <oneswig> #link Agenda for today https://wiki.openstack.org/wiki/Scientific_working_group#IRC_Meeting_May_16th_2017
21:03:17 <oneswig> martial: you going to start a science-wg meeting as well? :-)
21:03:43 <martial> nope, not my intention, pointing people over here if anything
21:03:48 <oneswig> Do we have Blair today?
21:04:33 <oneswig> martial: np, it'll shake out. I think the review means the channel gets eavesdropped
21:05:14 <oneswig> OK, shall we start? Hope you had a good summit - sounds like I missed out on plenty
21:05:30 <oneswig> #topic Boston summit roundup
21:05:42 <jmlowe> doh!
21:05:52 <b1airo> Hi all! I only got back yesterday morning so still not sure which way is up
21:05:58 <oneswig> Hi jmlowe, all ready for the LUG?
21:06:01 <oneswig> #chair b1airo
21:06:02 <openstack> Current chairs: b1airo martial oneswig
21:06:04 <oneswig> Hi b1airo
21:06:12 <jmlowe> well the bot one is probably the real one and I'm not in it
21:06:28 <martial> hey Mike
21:06:37 <oneswig> ah, but how is real defined in IRC?
21:06:42 <jmlowe> oneswig: just signed up yesterday, I'll work registration for a bit to earn my free code
21:06:52 <jmlowe> Hey martial
21:07:22 <oneswig> jmlowe: I've heard the Cambridge team are gearing up for it.
21:07:46 <oneswig> So how did it go at the summit?
21:07:54 <jmlowe> oh, great, I'm trying to wind up for a big "use openstack for everything" pitch here, it will help
21:07:59 <b1airo> You mean they're not planning to ditch Lustre in favour of CephFS :-)
21:08:05 <martial> #link https://etherpad.openstack.org/p/Scientific-WG-boston
21:08:19 <oneswig> Ah thanks martial
21:08:19 <martial> so a lot of the conversation from the SWG happened in the Etherpad
21:09:03 <martial> Blair was kind enough to share his GPU work and some conversations he had with Nvidia
21:09:19 <b1airo> Yeah, good turnouts for our sessions and some great lightning talks, only negative was no one volunteering to lead anything
21:09:28 <jmlowe> I'd love to do that some day, safety over speed and all
21:09:54 <martial> We talked about Identity Federation, more on that through the Open Research Cloud (ORC) Declaration (ORCD?)
21:09:58 <oneswig> b1airo: ah, too bad.
21:10:21 <oneswig> Did jmlowe just say he'd love to volunteer to lead?
21:10:28 <jmlowe> science-wg events were well attended I thought
21:10:33 <martial> stig: your work was discussed too (Too bad you could not be there)
21:10:37 <b1airo> And for next summit I'd suggest we simply to do sessions: one double session BoF and one lightning talks
21:10:37 <martial> #link http://www.stackhpc.com/monasca-log-api.html
21:10:50 <jmlowe> wait what?
(was actually looking over the etherpad to volunteer for something)
21:10:55 <b1airo> *simply do two
21:10:57 <martial> b1airo: we might do two Lightning talks too
21:11:05 <oneswig> martial: cool!
21:11:18 <martial> mike: you are still chair for the next HPC
21:11:48 <martial> right? If you are, maybe we can use the extra Lightning Talk for the SWG to add some of your proposed talks?
21:12:13 <martial> (through the HPC track I mean)
21:12:23 <jmlowe> Which conference?
21:12:33 <martial> OpenStack Summit Australia
21:12:47 <martial> For the telemetry effort, I also mentioned our work here at NIST
21:12:52 <oneswig> dmoni?
21:12:54 <oneswig> How is it?
21:13:19 <martial> I met with my team today and we are going to try to release dmoni / ansible scripts / heat templates and VM config files mid-June
21:13:22 <martial> github likely
21:13:43 <oneswig> martial: cool, keep us informed.
21:13:47 <martial> then ask people to test it
21:13:55 <jmlowe> Oh, um, I didn't know I had signed up to chair the fall summit HPC track, happy to review but showing up in person might be tricky
21:14:03 <martial> mike: bummer
21:14:06 <oneswig> martial: How did Cyborg go?
21:14:30 <martial> stig: Cyborg went well, we had a person from the team do a lightning talk
21:14:58 <b1airo> The special hardware forum session went reasonably well even if it got sidetracked in Nova scheduling for a while
21:15:00 <martial> stig: and the full session presented the aim of the project and how to get attached to it
21:15:17 <martial> b1airo: true, that was a good discussion as well
21:15:28 <oneswig> b1airo: was it clear if/how it is distinct from the newly-evolving scheduler traits?
21:15:32 <martial> #link https://etherpad.openstack.org/p/BOS-forum-special-hardware
21:16:04 <rbudden> hello
21:16:06 <martial> Lightning talks
21:16:08 <martial> #link https://etherpad.openstack.org/p/Scientific-WG-Boston-Lightning
21:16:15 <oneswig> Hi rbudden
21:16:16 <martial> Hi Robert :)
21:16:21 <rbudden> hi guys
21:16:33 <rbudden> got distracted on our ironic cluster, so apologies for being late
21:16:37 <oneswig> Was there a prize from Arkady for the lightning talks?
21:16:43 <oneswig> rbudden: that Bridges thing?
21:16:47 <rbudden> yep ;)
21:16:47 <oneswig> I've heard of it
21:17:24 <trandles> o/ sorry I'm late
21:17:27 <b1airo> oneswig: I haven't yet been back and watched Jay's placement API talks, but I guess the main thing is that Cyborg aims to lifecycle-manage accelerators, and may provide scheduling info to Nova via placement as needed
21:17:36 <martial> stig: Google Home I think
21:17:44 <oneswig> As an aside, had a weird problem today - all new CentOS images built today are not starting their network, don't know why and it's bugging me...
21:17:53 <oneswig> Hi trandles
21:17:56 <b1airo> Jay was in the special hardware session and didn't poopoo anything in particular
21:18:21 <jmlowe> I might volunteer to take on the Scientific Datasets activity for this cycle
21:18:22 <b1airo> Actually had most of Nova core in there
21:18:32 <martial> Mike: thank you
21:18:33 <b1airo> jmlowe: w00t!
21:18:43 <b1airo> Back in 5...
21:19:09 <oneswig> b1airo: most of Nova core, no pressure then
21:19:10 <martial> stig: yes Scientific Datasets was the next item on the list ... Mike just solved this question :)
21:19:30 <oneswig> jmlowe: would be great, how is this tackled at IU?
21:19:35 <martial> stig: then we had an interesting "OpenStack bugbears" discussion
21:19:37 <jmlowe> A few weeks ago we grabbed some bad CentOS cloud images, they were yanked but not before they caused us problems
21:20:14 <oneswig> jmlowe: bad in what way?
21:20:41 <martial> and then there was Greg and the interview. Talked to the gentleman for a bit on Thursday but he mentioned he would be around today ... is he here?
21:21:11 <martial> blair and I were also in many of the forum meetings where organization of the WG was discussed
21:21:21 <oneswig> No sign as yet but we have the questions, should reserve at least 20 mins for that
21:21:23 <martial> nothing too critical there yet
21:21:26 <jmlowe> oneswig: not sure, just remember Jeremy talking at the summit about finding some terminally broken cloud images in their repo a couple of weeks back
21:22:04 <martial> it was a well-attended meeting with over 30 people in the room (and names in the Etherpad)
21:22:17 <oneswig> jmlowe: hmmm... I'll clear caches and try again. Would hate for this to be the root cause...
21:22:49 <martial> among the todos ... ##Todo: extend book chapter on federation (keystone / OpenID)
21:23:00 <oneswig> nice work martial - I see quite a few familiar folk in the etherpad, am doubly sorry to miss now!
21:23:39 <martial> stig: hopefully Australia (might be the one missing that one, reached out to the Federation about travel support ... waiting to hear back)
21:23:43 <oneswig> martial: indeed, there's a pre-draft section there that needs much content
21:23:47 <jmlowe> oneswig: scientific data sets, we have more datasets showing up than we have room for, try to offload to wrangler's 10PB of lustre and re-export over nfs with some per-tenant provider vlans, the rest we encourage to put on volumes and export over nfs to their other instances
21:24:51 <oneswig> jmlowe: will need to follow up about this. I've got you in my sights :-)
21:25:06 <martial> related to ORC (I like that acronym of course :) )
21:25:07 <jmlowe> OpenID federation with globus auth in horizon is on my todo list, probably just in time for our annual review in July
21:25:09 <oneswig> We should also cover the cloud congress... move on?
21:25:23 <oneswig> #topic ORCD / cloud congress
21:25:32 <oneswig> take it away martial
21:26:01 <martial> topics of conversation were Federation / Promoting Teaching & Learning / Improve, Share, and Standardize Operational processes / Making federated cloud usage simple to adopt
21:26:24 <martial> Assist with Reproducibility / Standards and Open Source / Reduce friction from Policy / Cost / Funding Models
21:27:01 <martial> Security / Governance / Support / Federation
21:27:09 <martial> a very busy couple of days
21:27:30 <martial> forgot Resource Sharing
21:27:59 <martial> the next steps are as follows:
21:28:04 <oneswig> how many people managed to attend and was it a good mix?
21:28:10 <jmlowe> the commercial cloud vendors were certainly present
21:28:12 <martial> - Leave the Google Folder open for some time for additional input, then we will compile the declaration. The Google Folder docs will “close” for edits in 2 weeks.
21:28:35 <martial> stig: yes, mike is very correct, and a few people from the research side
21:28:42 <martial> https://docs.google.com/document/d/1AmB59CaWBTklH9NIb_6vkif51eXLpapPegf_7ZyulBo/edit
21:28:59 <martial> (not sharing the link as a #link, to be safe)
21:29:16 <martial> if you want to add to it/view the discussions, follow the link
21:29:27 <martial> - Next main meeting in Sydney in November, around the OpenStack Summit.
21:29:44 <martial> - creation of Working Groups
21:30:24 <martial> that's pretty much it on the ORC'd
21:30:52 <oneswig> Thanks martial for the update
21:31:03 <martial> stig: feel free to review the link I just shared
21:31:11 <oneswig> am looking now
21:31:15 <martial> the conversation is just starting
21:31:49 <oneswig> I think it's a victory if there's any cross-fertilisation here
21:31:51 <martial> same problem as the BoF ... moderator asking a lot of things akin to "does this work for everybody"
21:32:07 <martial> and nobody saying no
21:32:34 <martial> so we will see how this evolves
21:32:36 <oneswig> Before anything is decided, everything is possible
21:33:09 <oneswig> Good to hear that the effort will continue.
21:33:31 <oneswig> Was there much discussion on funding? I saw it on the agenda
21:33:41 <martial> yes and no
21:33:51 <martial> there were people identified as funding agencies present
21:34:00 <martial> but no real talk about funding sources
21:34:13 <martial> my colleague Robert Bohn was on the "funding agency" panel
21:34:15 <trandles> when discussing funding and governance, "effort" should be capitalised...it's going to take a lot of Effort to tackle those issues
21:34:50 <martial> but he was here to talk Federation (and the effort run by his team on this matter)
21:35:22 <martial> Tim: you are very right, it was very ... chaotic
21:35:40 <martial> (now was it chaotic good or chaotic evil ...)
21:35:49 <b1airo> Another potential new focus area is cloud workload traces - KateK is looking for a student to work on it in Chameleon over the US summer
21:36:05 <trandles> I think chaotic good actually
21:36:36 <oneswig> b1airo: got a link to a role description? Might know some people
21:36:45 <martial> blair: we ought to publicize this for her
21:36:47 <trandles> b1airo: we have a workload effort ongoing that might benefit from discussion with a wider audience
21:36:53 <martial> (like you just did)
21:36:57 <martial> is Pierre around?
21:37:09 <oneswig> seems not.
21:37:26 <b1airo> #link http://www.nimbusproject.org/news/#421
21:37:33 <martial> :)
21:37:57 <oneswig> Thanks b1airo. OK, we ought to look over Gene's questions
21:38:13 <oneswig> or we'll be dashing madly at the end (as usual)..
21:38:26 <b1airo> Yes good point
21:38:43 <oneswig> How about I put the question as topic and you guys chip in with some soundbites?
21:38:53 <martial> sounds good to me
21:39:13 <oneswig> #topic Why, as a student or researcher in a university, should I care about the Scientific Working Group?
21:39:40 <oneswig> That's an interesting one, given none of us are actually students and not really researchers either.
21:39:48 <martial> (everybody feel free to contribute your take on it)
21:40:16 <oneswig> Mostly I'd say the SWG resonates with the architects and admins of research computing services.
21:40:45 <b1airo> Yes agreed, those people are sometimes also (or were) researchers
21:40:59 <oneswig> I've heard of the term "ResOps" before - people dedicated to outreach into research faculties to bring scientists onto the cloud platform most effectively.
21:40:59 <jmlowe> It's a relatively rare opportunity to connect with those architects and admins
21:41:26 <b1airo> But possible focus areas like workload traces and dataset sharing are much more concretely relevant to researchers
21:41:48 <oneswig> It's about bringing the benefits of cloud to their workflows?
21:41:49 <jmlowe> We have one open job, just posted last week, to hire another; Jeremy Fischer from IU is our "ResOps" person and we need another
21:42:22 <oneswig> #topic Why do researchers choose OpenStack as their IaaS platform?
21:42:39 <martial> or maybe: Researchers and students often encounter needs for High Performance Computing or Distributed Computing, or simply for Infrastructure as a Service components. The SWG helps aggregate knowledge from users and operators who have tried to set up and use such models, and can help guide the research model towards functional solutions
21:42:41 <b1airo> There is also interest amongst us in scientific application sharing/packaging for cloud
21:42:45 <martial> (oops too late on the last one)
21:43:42 <martial> The traditional HPC model is limited in what it can achieve; novel solutions based on Mesos, Kubernetes, OpenStack allow the deployment of specialized solutions on commercial off-the-shelf as well as specialized hardware
21:43:48 <b1airo> Lots of reasons for that - flexibility in architecture, security, data locality
21:44:07 <oneswig> Research computing services see the advantages of converging a zoo of clusters into a single managed resource. Academia, as much as anywhere, suffers from beige-box "shadow IT"
21:44:14 <trandles> Because it's free (as in money, and open source) with a large, very active community. I don't feel like I'll suddenly be left with an abandoned platform when choosing OpenStack.
21:44:17 <martial> Spartan talk ...
21:44:33 <rbudden> trandles: +1
21:44:39 <martial> #link https://www.openstack.org/videos/barcelona-2016/spartan-a-hpc-cloud-hybrid-delivering-performance-and-flexibility
21:44:49 <rbudden> cost and community are two major factors
21:44:54 <oneswig> OpenStack is free if your time costs you nothing!
21:45:04 <b1airo> Lol
21:45:07 <rbudden> lol
21:45:12 <jmlowe> It is the de facto standard: from the campus, to the regional like Minnesota Supercomputing Institute, to the national like Jetstream and Bridges, and even international like SKA and Nectar (international depends on where you are standing) - you have a uniform API for programmable cyber-infrastructure (tm)
21:45:40 <martial> tm included I see
21:45:49 <oneswig> b1airo: surprised you're letting the guys from across town get away without some comment on the local derby...
21:45:57 <jmlowe> I could fill the rest of the meeting with discussion of that term
21:46:26 <oneswig> jmlowe: you ever applied for funding for something? :-)
21:46:47 <trandles> as long as I get everything done that the program demands, my time is free when working on "free" software :P
21:46:48 <oneswig> #topic What are the key differences between scientific OpenStack Clouds and other general OpenStack Clouds?
21:46:50 <b1airo> oneswig: old news, they do what we do 12 months later :-)
21:47:06 <jmlowe> one PI coined cyber-infrastructure, another added programmable
21:47:46 <oneswig> OK, this is where the bulk of the WG's value add comes in.
21:48:05 <b1airo> Integration with other research infrastructure is probably the big difference, e.g., major HPC, data archives, instruments
21:48:33 <jmlowe> The mix of memory, interconnect, networks local and upstream, experienced HPC staff, access to large parallel filesystems
21:49:07 <b1airo> Scientific deployments are also often quite open, e.g., outside the institutional firewall
21:49:10 <trandles> Different workload characteristics (that we're struggling to characterize effectively)
21:49:13 <oneswig> For us, there are problems that run on our cloud that are affected by Amdahl's law. Cloud workloads typically scale out in a way that scientific applications don't (or can't). Tight coupling between instances is the principal expression of this difference in application.
21:49:15 <jmlowe> if you are running a big pile of webservers you aren't going to have the same rule of thumb for processors to memory
21:49:22 <rbudden> jmlowe: +1 unique hardware definitely sets things apart
21:49:44 <martial> the SWG is about the use cases of integrating novel HPC models within a research cloud, including the use of specialized hardware (from GPUs to NUMA links) as well as specialized methodologies or distributed algorithms (MPI, ...)
21:49:47 <oneswig> What jmlowe said is pretty much what I menat
21:49:52 <oneswig> ... meant...
21:50:08 <martial> (10-minute mark)
21:50:15 <oneswig> #topic What kinds of workloads do researchers run on their OpenStack Clouds?
21:50:35 <martial> Machine learning training models
21:50:41 <martial> Data Science evaluations
21:50:47 <b1airo> Easy: all of the workloads, and then some
21:50:56 <jmlowe> oneswig: you should flog your Lugano talk, the video is posted, very compelling case for doing all the above in research with openstack
21:51:13 <martial> Natural Language Processing, Machine Translation, Video Surveillance, ...
21:51:16 <trandles> data science frameworks that don't play well with HPC workload managers (Dask, Spark, etc.)
21:51:47 <martial> Tim: did not say HPC in this particular case
21:51:51 <martial> simply OpenStack
21:52:05 <jmlowe> I've got a guy from UTSA running NAMD doing MPI over our 10GigE vxlan tenant networks
21:52:10 <oneswig> We've worked on a couple of generic research computing resources which take all-comers. But we've also seen some very specialised applications such as medical informatics, or radio astronomy. Much of it is categorised as "the long tail of HPC", i.e. the stuff that doesn't fit well into conventional HPC infrastructure
21:52:27 <martial> but I agree with earlier comments, think of a topic ... OpenStack can likely do it
21:52:36 <oneswig> #topic How can researchers speed up their work with OpenStack?
21:52:39 <martial> (and make coffee and pancakes :) ... )
21:52:52 <jmlowe> lots and lots of educational allocations on our clouds
21:52:54 <oneswig> Is this about the fabled metric of "time to paper"?
21:53:05 <trandles> haha
21:54:02 <jmlowe> One great way to speed things up is with orchestration and the higher-level openstack projects
21:54:05 <oneswig> It's about the situations where the development cycles spend as much time between keyboard and chair as they do between compute, network and storage. If researchers can get up and running (and stay up and running) faster with OpenStack, it's a win.
21:54:11 <trandles> Researchers can speed up their work by using a runtime environment they control at a scale they might not be able to afford or support.
21:54:18 <jmlowe> so crawl with nova boot, walk with heat, run with sahara
21:54:50 <oneswig> jmlowe: what's next after that?
21:54:52 <martial> heat templates, ansible [why is the name failing me now], VM configurations => experiment => multi-tenant + ro data access + SDN => segregated private experiment run
21:55:24 <oneswig> #topic What kinds of challenges do researchers face when using OpenStack clouds in their organization?
21:55:25 <martial> => repeatability
21:55:37 <jmlowe> I had a guy who spent a couple of days trying to run some generic k8s heat template from a tutorial somewhere, had him just use magnum and he was off and running on his k8s cluster in 10 min, enter at the level of customization you need and forget the rest
21:56:08 <b1airo> Biggest challenge we see is that researchers are not sysadmins
21:56:55 <trandles> b1airo: +1 That's how we justify our entire existence. We focus on the computing infrastructure so they can focus on being scientists.
21:57:13 <martial> I like that
21:57:14 <jmlowe> one of the major problems I have with reproducibility is the idea that you keep everything the same; reproducibility is not me going into your lab and using your graduated cylinders etc, it is me doing it with my equipment and getting roughly the same results
21:57:17 <rbudden> b1airo: +1
21:57:35 <flanders_> Scientific-wg tagline?!
21:57:41 <flanders_> ;)
21:57:44 <b1airo> Yeah and then the corollary challenge for us is how much effort to spend on the infrastructure versus helping with the science
21:58:11 <jmlowe> b1airo: +1
21:58:16 <martial> flanders_ +1 :)
21:58:20 <rbudden> yep, ‘user services’ vs ‘facilities’
21:58:57 <oneswig> There's a difference in mindset. Research computing has this level of order that doesn't apply in cloud. HPC users assume they can book a number of physical nodes and network switches. There's time sharing and strict queuing. In comparison, cloud users get resources like they're crowding round an ice cream shop!
21:59:15 <oneswig> #topic What features are missing in OpenStack to provide better infrastructure for scientific research?
21:59:32 <oneswig> Next meeting perhaps...? This could take a little while
21:59:44 <jmlowe> spot instances?
21:59:45 <trandles> lol but it's our chance to be selfish
21:59:53 <oneswig> preemptible instances and resource reservation have been a long-sought-after goal
21:59:53 <b1airo> Yeah maybe we should carry those two over to next meeting...
22:00:03 <jmlowe> yeah
22:00:31 <oneswig> Alas, we are out of time.
22:00:32 <martial> (not sure there is a meeting after, so if we need to overrun, others can tell us :) )
22:00:51 <oneswig> pipe up if you're waiting or we'll nail this last question...
22:01:38 <martial> (and we can move to the #scientific-wg if needed)
22:01:45 <martial> seems we can go on
22:01:53 <martial> answers anybody?
22:02:28 <oneswig> When looking at HPC workloads on OpenStack, exposing physical resource into the virtual world has been key for hypervisor efficiency gains. The next level may be placement within the physical network. How can we deliver the benefits of cloud but pare it down to something so close to the metal?
22:03:24 <trandles> that question would have been a lot easier a couple years ago but now I feel like a lot of gaps are being filled
22:03:54 <oneswig> In essence a lot of the WG members are "physicalising" the virtual resources, and somehow the OpenStack-managed infrastructure is still flexible enough to be a game changer.
22:04:20 <oneswig> ... final comments ?
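For reference, a minimal sketch of the Magnum route described above in the "crawl with nova boot, walk with heat" discussion: the template name, image, keypair, flavors and external network below are illustrative assumptions, not anyone's actual configuration.

    # Define a Kubernetes cluster template once (image, keypair, flavors and
    # external network are hypothetical placeholders for what a cloud provides)
    openstack coe cluster template create k8s-demo-template \
        --coe kubernetes \
        --image fedora-atomic-latest \
        --keypair demo-key \
        --external-network public \
        --flavor m1.medium \
        --master-flavor m1.medium

    # A researcher then gets a working cluster from a single call, rather than
    # debugging a generic Heat template found in a tutorial
    openstack coe cluster create demo-cluster \
        --cluster-template k8s-demo-template \
        --master-count 1 \
        --node-count 2

The same "enter at the level of customization you need" idea applies to Heat and Sahara: start from the highest-level project that fits and drop down only where necessary.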
22:05:03 <oneswig> OK, let's wrap up - thanks everyone
22:05:12 <oneswig> #endmeeting
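A rough sketch of the per-tenant provider VLAN pattern jmlowe described earlier for dataset sharing (re-exporting a large filesystem over NFS into selected projects), using standard Neutron provider networks and RBAC; the physical network name, VLAN ID, subnet range and project ID are illustrative assumptions only.

    # Admin-side: a provider VLAN network carrying the NFS re-export
    # (physnet1, VLAN 200 and 10.20.0.0/24 are made-up values)
    openstack network create dataset-nfs-vlan200 \
        --provider-network-type vlan \
        --provider-physical-network physnet1 \
        --provider-segment 200 \
        --no-share
    openstack subnet create dataset-nfs-subnet \
        --network dataset-nfs-vlan200 \
        --subnet-range 10.20.0.0/24 \
        --gateway none

    # Expose the network only to the project that should mount the datasets
    openstack network rbac create \
        --type network \
        --action access_as_shared \
        --target-project <project-id> \
        dataset-nfs-vlan200

Instances attached to that network can then mount the NFS export directly, so bulk dataset traffic stays on the storage VLAN rather than crossing tenant routers.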