21:01:04 <oneswig> #startmeeting scientific-wg
21:01:05 <openstack> Meeting started Tue May 16 21:01:04 2017 UTC and is due to finish in 60 minutes.  The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:01:06 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:01:08 <openstack> The meeting name has been set to 'scientific_wg'
21:01:10 <oneswig> aloha!
21:01:17 <martial> Hi Stig
21:01:28 <oneswig> #chair martial
21:01:29 <openstack> Current chairs: martial oneswig
21:01:30 <martial> so no use of the dedicated channel?
21:01:31 <oneswig> Hi Martial
21:01:37 <martial> because there are two it seems :)
21:01:50 <oneswig> Not as yet ...
21:01:56 <martial> #science-wg has people in it and #scientific-wg has a bot :)
21:02:19 <oneswig> Ah.  Well I welcome feedback on https://review.openstack.org/#/c/459884/
21:02:44 <oneswig> #link Agenda for today https://wiki.openstack.org/wiki/Scientific_working_group#IRC_Meeting_May_16th_2017
21:03:17 <oneswig> martial: you going to start a science-wg meeting as well? :-)
21:03:43 <martial> nope, not my intention, pointing people over here if anything
21:03:48 <oneswig> Do we have Blair today?
21:04:33 <oneswig> martial: np, it'll shake out.  I think the review means the channel gets eavesdropped
21:05:14 <oneswig> OK, shall we start?  Hope you had a good summit - sounds like I missed out on plenty
21:05:30 <oneswig> #topic Boston summit roundup
21:05:42 <jmlowe> doh!
21:05:52 <b1airo> Hi all! I only got back yesterday morning so still not sure which way is up
21:05:58 <oneswig> Hi jmlowe, all ready for the LUG?
21:06:01 <oneswig> #chair b1airo
21:06:02 <openstack> Current chairs: b1airo martial oneswig
21:06:04 <oneswig> Hi b1airo
21:06:12 <jmlowe> well the bot one is probably the real one and I'm not in it
21:06:28 <martial> hey Mike
21:06:37 <oneswig> ah, but how is real defined in IRC?
21:06:42 <jmlowe> oneswig: just signed up yesterday, I'll work registration for a bit to earn my free code
21:06:52 <jmlowe> Hey martial
21:07:22 <oneswig> jmlowe: I've heard the Cambridge team are gearing up for it.
21:07:46 <oneswig> So how did it go at the summit?
21:07:54 <jmlowe> oh, great, I'm trying to wind up for a big "use OpenStack for everything" pitch here, it will help
21:07:59 <b1airo> You mean they're not planning to ditch Lustre in favour of CephFS :-)
21:08:05 <martial> #link https://etherpad.openstack.org/p/Scientific-WG-boston
21:08:19 <oneswig> Ah thanks martial
21:08:19 <martial> so a lot of the conversation from the SWG happened in the Etherpad
21:09:03 <martial> Blair was kind enough to share his GPU work and some conversations he had with Nvidia
21:09:19 <b1airo> Yeah, good turnouts for our sessions and some great lightning talks, only negative was no one volunteering to lead anything
21:09:28 <jmlowe> I'd love to do that some day, safety over speed and all
21:09:54 <martial> We talked about Identity Federation, more on that through the Open Research Cloud (ORC) Declaration (ORCD?)
21:09:58 <oneswig> b1airo: ah, too bad.
21:10:21 <oneswig> Did jmlowe just say he'd love to volunteer to lead?
21:10:28 <jmlowe> science-wg events were well attended I thought
21:10:33 <martial> stig: your work was discussed too (Too bad you could not be there)
21:10:37 <b1airo> And for next summit I'd suggest we simply do two sessions: one double-session BoF and one lightning talks session
21:10:37 <martial> #link http://www.stackhpc.com/monasca-log-api.html
21:10:50 <jmlowe> wait what? (was actually looking over the etherpad to volunteer for something)
21:10:57 <martial> b1airo: we might do two lightning talks too
21:11:05 <oneswig> martial: cool!
21:11:18 <martial> mike: you are still chair for the next HPC
21:11:48 <martial> right? If you are, maybe we can use the extra lightning talk slot for the SWG to add some of your proposed talks?
21:12:13 <martial> (through the HPC track I mean)
21:12:23 <jmlowe> Which conference?
21:12:33 <martial> OpenStack Summit Australia
21:12:47 <martial> For the telemetry effort, I also mentioned our work here at NIST
21:12:52 <oneswig> dmoni?
21:12:54 <oneswig> How is it?
21:13:19 <martial> I met with my team today and we are going to try to release dmoni / ansible scripts / heat templates and VM config files mid june
21:13:22 <martial> github likely
21:13:43 <oneswig> martial: cool, keep us informed.
21:13:47 <martial> then ask people to test it
21:13:55 <jmlowe> Oh, um, I didn't know I had signed up to chair the fall summit hpc track, happy to review but showing up in person might be tricky
21:14:03 <martial> mike: bummer
21:14:06 <oneswig> martial: How did Cyborg go?
21:14:30 <martial> stig: Cyborg went well, we had a person from the team do a lightning talk
21:14:58 <b1airo> The special hardware forum session went reasonably well even if it got sidetracked in Nova scheduling for a while
21:15:00 <martial> stig: and the full session presented the aim of the project and how to get attached to it
21:15:17 <martial> b1airo: true, that was a good discussion as well
21:15:28 <oneswig> b1airo: was it clear if/how it is distinct from the newly-evolving scheduler traits?
21:15:32 <martial> #link https://etherpad.openstack.org/p/BOS-forum-special-hardware
21:16:04 <rbudden> hello
21:16:06 <martial> Lightning talks
21:16:08 <martial> #link https://etherpad.openstack.org/p/Scientific-WG-Boston-Lightning
21:16:15 <oneswig> Hi rbudden
21:16:16 <martial> Hi Robert :)
21:16:21 <rbudden> hi guys
21:16:33 <rbudden> got distracted on our ironic cluster, so apologies for being late
21:16:37 <oneswig> Was there a prize from Arkady for the lightning talks?
21:16:43 <oneswig> rbudden: that Bridges thing?
21:16:47 <rbudden> yep ;)
21:16:47 <oneswig> I've heard of it
21:17:24 <trandles> o/  sorry I'm late
21:17:27 <b1airo> oneswig: I haven't yet been back and watched Jay's placement API talks, but I guess the main thing is that Cyborg aims to lifecycle manage accelerators, and may provide scheduling info to Nova via placement as needed
21:17:36 <martial> stig: Google Home I think
21:17:44 <oneswig> As an aside, had a weird problem today - all new CentOS images built today are not starting their network, don't know why and it's bugging me...
21:17:53 <oneswig> Hi trandles
21:17:56 <b1airo> Jay was in the special hardware session and didn't pooh-pooh anything in particular
21:18:21 <jmlowe> I might volunteer to take on the Scientific Datasets activity for this cycle
21:18:22 <b1airo> Actually had most of Nova core in there
21:18:32 <martial> Mike: thank you
21:18:33 <b1airo> jmlowe: w00t!
21:18:43 <b1airo> Back in 5...
21:19:09 <oneswig> b1airo: most of Nova core, no pressure then
21:19:10 <martial> stig: yes Scientific Dataset was the next item on the list ... Mike just solved this question :)
21:19:30 <oneswig> jmlowe: would be great, how is this tackled at IU?
21:19:35 <martial> stig: then we had an interesting "OpenStack bugbears"
21:19:37 <jmlowe> A few weeks ago we grabbed some bad CentOS cloud images, they were yanked but not before they caused us problems
21:20:14 <oneswig> jmlowe: bad in what way?
21:20:41 <martial> and then there was Greg and the interview. Talked to the gentleman for a bit on Thursday but he mentioned he would be around today ... is he here?
21:21:11 <martial> blair and I were also in many of the forum meetings where organization of the WG was discussed
21:21:21 <oneswig> No sign as yet but we have the questions, should reserve at least 20 mins for that
21:21:23 <martial> nothing too critical there yet
21:21:26 <jmlowe> oneswig: not sure, just remember Jeremy talking at the summit about finding some terminally broken cloud images in their repo a couple of weeks back
21:22:04 <martial> it was a well-attended meeting with over 30 people in the room (and names in the Etherpad)
21:22:17 <oneswig> jmlowe: hmmm... I'll clear caches and try again.  Would hate for this to be the root cause...
21:22:49 <martial> among the todos ... ##Todo: extend book chapter on federation (keystone / OpenID)
21:23:00 <oneswig> nice work martial - I see quite a few familiar folk in the etherpad, am doubly sorry to miss now!
21:23:39 <martial> stig: hopefully Australia (might be the one missing that one; reached out to the Federation about travel support ... waiting to hear back)
21:23:43 <oneswig> martial: indeed, there's a pre-draft section there that needs much content
21:23:47 <jmlowe> oneswig: scientific data sets: we have more datasets showing up than we have room for; we try to offload to Wrangler's 10 PB of Lustre and re-export over NFS with some per-tenant provider VLANs; the rest we encourage users to put on volumes and export over NFS to their other instances
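A minimal sketch of the per-tenant provider VLAN pattern jmlowe describes, using the OpenStack CLI; the network name, physical network label, VLAN ID and subnet range below are placeholder assumptions:

    # Dedicated provider VLAN carrying the NFS re-export into one tenant's instances
    openstack network create lustre-nfs-tenant42 \
      --provider-network-type vlan \
      --provider-physical-network datacentre \
      --provider-segment 1042
    openstack subnet create lustre-nfs-tenant42-subnet \
      --network lustre-nfs-tenant42 \
      --subnet-range 10.42.0.0/24 \
      --no-dhcp

In practice the admin would also scope or RBAC-share such a network to the tenant in question.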
21:24:51 <oneswig> jmlowe: will need to follow up about this. I've got you in my sights :-)
21:25:06 <martial> related to ORC (I like that acronym of course :) )
21:25:07 <jmlowe> OpenID federation with globus auth in horizon is on my todo list, probably just in time for our annual review in July
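For reference, the Keystone side of an OpenID Connect federation like the one jmlowe mentions can be wired up with the standard federation CLI; this is a hedged sketch that assumes an Apache mod_auth_openidc front end is already configured for Keystone, and the provider name, remote ID and mapping file are placeholders:

    # Register the external identity provider (e.g. Globus Auth) with Keystone
    openstack identity provider create globus \
      --remote-id https://auth.globus.org
    # Map federated users/groups onto local projects and roles via a rules file
    openstack mapping create globus_mapping --rules globus_mapping_rules.json
    # Tie the provider and the mapping together under the "openid" protocol
    openstack federation protocol create openid \
      --identity-provider globus \
      --mapping globus_mapping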
21:25:09 <oneswig> We should also cover the cloud congress... move on?
21:25:23 <oneswig> #topic ORCD / cloud congress
21:25:32 <oneswig> take it away martial
21:26:01 <martial> topics of conversation were Federation / Promoting Teaching & Learning  / Improve, Share, and Standardize Operational processes / Making federated cloud usage simple to adopt
21:26:24 <martial> Assist with Reproducibility / Standards and Open Source  / Reduce friction from Policy / Cost / Funding Models
21:27:01 <martial> Security / Governance /  Support / Federation
21:27:09 <martial> a very busy couple days
21:27:30 <martial> forgot Resource Sharing
21:27:59 <martial> the next steps are as follow:
21:28:04 <oneswig> how many people managed to attend and was it a good mix?
21:28:10 <jmlowe> the commercial cloud vendors were certainly present
21:28:12 <martial> - Leave the Google Folder open for some time for additional input and then we will compile the declaration.  The Google Folder docs will “close” off for edits in 2 weeks.
21:28:35 <martial> stig: yes, mike is right, and there were also a few people from the research side
21:28:42 <martial> https://docs.google.com/document/d/1AmB59CaWBTklH9NIb_6vkif51eXLpapPegf_7ZyulBo/edit
21:28:59 <martial> (not sharing the link with a #link tag, to be safe)
21:29:16 <martial> if you want to add to it/view the discussions, follow the link
21:29:27 <martial> - Next main meeting in Sydney November around OpenStack Summit.
21:29:44 <martial> - creation of Working Groups
21:30:24 <martial> that's pretty much it on the ORCD
21:30:52 <oneswig> Thanks martial for the update
21:31:03 <martial> stig: feel free to review the link I just shared
21:31:11 <oneswig> am looking now
21:31:15 <martial> the conversation is just starting
21:31:49 <oneswig> I think it's a victory if there's any cross-fertilisation here
21:31:51 <martial> same problem as the BoF ... moderator asking a lot of things akin to "does this work for everybody"
21:32:07 <martial> and nobody saying no
21:32:34 <martial> so we will see how this evolves
21:32:36 <oneswig> Before anything is decided, everything is possible
21:33:09 <oneswig> Good to hear that the effort will continue.
21:33:31 <oneswig> Was there much discussion on funding?  I saw it on the agenda
21:33:41 <martial> yes and no
21:33:51 <martial> there were people identified as funding agency representatives present
21:34:00 <martial> but no real talk about funding sources
21:34:13 <martial> my colleague Robert Bohn was on the "funding agency" panel
21:34:15 <trandles> when discussing funding and governance, "effort" should be capitalised...it's going to take a lot of Effort to tackle those issues
21:34:50 <martial> but he was here to talk Federation (and the effort run by his team on this matter)
21:35:22 <martial> Tim: you are very right, it was very ... chaotic
21:35:40 <martial> (now was it chaotic good or chaotic evil ...)
21:35:49 <b1airo> Another potential new focus area is cloud workload traces - KateK is looking for a student to work on it in Chameleon over the US summer
21:36:05 <trandles> I think chaotic good actually
21:36:36 <oneswig> b1airo: got a link to a role description?  Might know some people
21:36:45 <martial> blair: we ought to publicize this for her
21:36:47 <trandles> b1airo: we have a workload effort ongoing that might benefit from discussion with a wider audience
21:36:53 <martial> (like you just did)
21:36:57 <martial> is Pierre around?
21:37:09 <oneswig> seems not.
21:37:26 <b1airo> #link http://www.nimbusproject.org/news/#421
21:37:33 <martial> :)
21:37:57 <oneswig> Thanks b1airo.  OK, we ought to look over Gene's questions
21:38:13 <oneswig> or we'll be dashing madly at the end (as usual)..
21:38:26 <b1airo> Yes good point
21:38:43 <oneswig> How about I put the question as topic and you guys chip in with some soundbites?
21:38:53 <martial> sounds good to me
21:39:13 <oneswig> #topic Why, as a student or researcher at a university, should I care about the Scientific Working Group?
21:39:40 <oneswig> That's an interesting one, given none of us are actually students and not really researchers either.
21:39:48 <martial> (everybody feel free to contribute your take on it)
21:40:16 <oneswig> Mostly I'd say the SWG resonates with the architects and admins of research computing services.
21:40:45 <b1airo> Yes agreed, those people are sometimes also (or were) researchers
21:40:59 <oneswig> I've heard of the term "ResOps" before - people dedicated to outreach into research faculties to bring scientists onto the cloud platform most effectively.
21:40:59 <jmlowe> It's a relatively rare opportunity to connect with those architects and admins
21:41:26 <b1airo> But possible focus areas like workload traces and dataset sharing are much more concretely relevant to researchers
21:41:48 <oneswig> It's about bringing the benefits of cloud to their workflows?
21:41:49 <jmlowe> We have one open job, just posted last week, to hire another; Jeremy Fischer from IU is our "ResOps" person and we need another
21:42:22 <oneswig> #topic Why do researchers choose OpenStack as their IaaS platform?
21:42:39 <martial> or maybe: Researchers and students often encounter needs for High Performance Computing or Distributed Computing, or simply for Infrastructure as a Service components. The SWG helps aggregate knowledge from users and operators who have tried to set up and use such models, and can help guide the research model toward functional solutions
21:42:41 <b1airo> There is also interest amongst us in scientific application sharing/packaging for cloud
21:42:45 <martial> (oops too late on the last one)
21:43:42 <martial> The traditional HPC model is limited in what it can achieve; novel solutions based on Mesos, Kubernetes, and OpenStack allow the deployment of specialized solutions on commercial off-the-shelf as well as specialized hardware
21:43:48 <b1airo> Lots of reasons for that - flexibility in architecture, security, data locality
21:44:07 <oneswig> Research computing services see the advantages of converging a zoo of clusters into a single managed resource.  Academia, as much as anywhere, suffers from beige-box "shadow IT"
21:44:14 <trandles> Because it's free (as-in money and open source) with a large, very active community.  I don't feel like I'll suddenly be left with an abandoned platform when choosing OpenStack.
21:44:17 <martial> Spartan talk ...
21:44:33 <rbudden> trandles: +1
21:44:39 <martial> #link https://www.openstack.org/videos/barcelona-2016/spartan-a-hpc-cloud-hybrid-delivering-performance-and-flexibility
21:44:49 <rbudden> cost and community are two major factors
21:44:54 <oneswig> OpenStack is free if your time costs you nothing!
21:45:04 <b1airo> Lol
21:45:07 <rbudden> lol
21:45:12 <jmlowe> It is the de facto standard, from the campus, to the regional like the Minnesota Supercomputing Institute, to the national like Jetstream and Bridges, and even international like SKA and Nectar (international depends on where you are standing) - you have a uniform API for programmable cyber-infrastructure (tm)
21:45:40 <martial> tm included I see
21:45:49 <oneswig> b1airo: surprised you're letting the guys from across town get away without some comment on the local derby...
21:45:57 <jmlowe> I could fill the rest of the meeting with discussion of that term
21:46:26 <oneswig> jmlowe: you ever applied for funding for something? :-)
21:46:47 <trandles> as long as I get everything done that the program demands, my time is free when working on "free" software :P
21:46:48 <oneswig> #topic What are the key differences between scientific OpenStack Clouds and other general OpenStack Clouds?
21:46:50 <b1airo> oneswig: old news, they do what we do 12 months later :-)
21:47:06 <jmlowe> one PI coined cyber-infrastructure, another added programmable
21:47:46 <oneswig> OK, this is where the bulk of the WG's value add comes in.
21:48:05 <b1airo> Integration with other research infrastructure is probably the big difference, e.g., major HPC, data archives, instruments
21:48:33 <jmlowe> The mix of memory, interconnect, networks local and upstream, experienced HPC staff, access to large parallel filesystems
21:49:07 <b1airo> Scientific deployments are also often quite open, e.g., outside the institutional firewall
21:49:10 <trandles> Different workload characteristics (that we're struggling to characterize effectively)
21:49:13 <oneswig> For us, there's problems that run on our cloud that are affected by Amdahl's law.  Cloud workloads typically scale out in a way that scientific applications don't (or can't).  Tight coupling between instances is the principal expression of this difference in application.
21:49:15 <jmlowe> if you are running a big pile of webservers you aren't going to have the same rule of thumb for processor-to-memory ratio
21:49:22 <rbudden> jmlowe: +1 unique hardware definitely sets things apart
21:49:44 <martial> the SWG is about use cases for integrating novel HPC models within a research cloud, including the use of specialized hardware (from GPUs to NUMA links) as well as specialized methodologies or distributed algorithms (MPI, ...)
21:49:47 <oneswig> What jmlowe said is pretty much what I meant
21:50:08 <martial> (10 minutes mark)
21:50:15 <oneswig> #topic What kinds of workloads do researchers run on their OpenStack Clouds?
21:50:35 <martial> Machine learning training models
21:50:41 <martial> Data Science evaluations
21:50:47 <b1airo> Easy: all of the workloads, and then some
21:50:56 <jmlowe> oneswig: you should flog your Lugano talk, the video is posted, very compelling case for doing all the above in research with openstack
21:51:13 <martial> Natural Language Processing, Machine Translation, Video Surveillance, ...
21:51:16 <trandles> data science frameworks that don't play well with HPC workload managers (DASK, Spark, etc.)
21:51:47 <martial> Tim: did not say HPC in this particular case
21:51:51 <martial> simply OpenStack
21:52:05 <jmlowe> I've got a guy from UTSA running NAMD doing MPI over our 10GigE VXLAN tenant networks
21:52:10 <oneswig> We've worked on a couple of generic research computing resources which take all-comers.  But we've also seen some very specialised applications such as  medical informatics, or radio astronomy.  Much of it is categorised as "the long tail of HPC", ie the stuff that doesn't fit well into conventional HPC infrastructure
21:52:27 <martial> but I agree with earlier comments, think of a topic ... OpenStack can likely do it
21:52:36 <oneswig> #topic How can researchers speed up their work with OpenStack?
21:52:39 <martial> (and make coffee and pancakes :) ... )
21:52:52 <jmlowe> lots and lots of educational allocations on our clouds
21:52:54 <oneswig> Is this about the fabled metric of "time to paper"?
21:53:05 <trandles> haha
21:54:02 <jmlowe> One great way to speed things up is with orchestration and the higher level openstack projects
21:54:05 <oneswig> It's about the situations where the development cycles spend as much time between keyboard and chair as they do between compute, network and storage.  If researchers can get up and running (and stay up and running) faster with OpenStack, it's a win.
21:54:11 <trandles> Researchers can speed up their work by using a runtime environment they control at a scale they might not be able to afford or support.
21:54:18 <jmlowe> so crawl with nova boot, walk with heat, run with sahara
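As a rough illustration of that crawl/walk progression (a sketch only; the image, flavor, network and key names are placeholder assumptions):

    # "Crawl": boot a single instance by hand
    openstack server create --image CentOS-7-x86_64 --flavor m1.medium \
      --network private --key-name mykey crawl-node

    # "Walk": describe the same thing declaratively and let Heat manage it
    cat > minimal-stack.yaml <<'EOF'
    heat_template_version: 2016-10-14
    resources:
      server:
        type: OS::Nova::Server
        properties:
          image: CentOS-7-x86_64
          flavor: m1.medium
          key_name: mykey
          networks:
            - network: private
    EOF
    openstack stack create -t minimal-stack.yaml walk-stack

Sahara would then provide the data-processing cluster layer ("run") on top of the same resources.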
21:54:50 <oneswig> jmlowe: what's next after that?
21:54:52 <martial> heat templates, ansible [why is the name failing me now], VM configurations => experiment => multi-tenant + read-only data access + SDN => segregated private experiment run
21:55:24 <oneswig> #topic What kinds of challenges do researchers face when using OpenStack clouds in their organization?
21:55:25 <martial> => repeatability
21:55:37 <jmlowe> I had a guy who spent a couple of days trying to run some generic k8s heat template from a tutorial somewhere; had him just use Magnum and he was off and running on his k8s cluster in 10 min - enter at the level of customization you need and forget the rest
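For context, the Magnum path jmlowe mentions boils down to a couple of CLI calls once a cluster template exists (a sketch; the template and cluster names and node counts are placeholder assumptions):

    # Assumes an existing Kubernetes cluster template named "k8s-template"
    openstack coe cluster create my-k8s \
      --cluster-template k8s-template \
      --master-count 1 \
      --node-count 3
    # Fetch the kubeconfig once the cluster is healthy, then use kubectl as usual
    openstack coe cluster config my-k8s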
21:56:08 <b1airo> Biggest challenge we see is that researchers are not sysadmins
21:56:55 <trandles> b1airo: +1  That's how we justify our entire existence.  We focus on the computing infrastructure so they can focus on being scientists.
21:57:13 <martial> I like that
21:57:14 <jmlowe> one of the major problems I have with reproducibility is the idea that you keep everything the same; reproducibility is not me going into your lab and using your graduated cylinders etc, it is me doing it with my equipment and getting roughly the same results
21:57:17 <rbudden> b1airo: +1
21:57:35 <flanders_> Scientific-wg tagline?!
21:57:41 <flanders_> ;)
21:57:44 <b1airo> Yeah and then the corollary challenge for us is how much effort to spend on the infrastructure versus helping with the science
21:58:11 <jmlowe> b1airo: +1
21:58:16 <martial> flanders_ +1 :)
21:58:20 <rbudden> yep, ‘user services’ vs ‘facilities’
21:58:57 <oneswig> There's a difference in mindset.  Research computing has this level of order that doesn't apply in cloud.  HPC users assume they can book a number of physical nodes and network switches.  There's time sharing and strict queuing.  In comparison, cloud users get resource like they're crowding round an ice cream shop!
21:59:15 <oneswig> #topic What features are missing in OpenStack to provide better infrastructure for scientific research?
21:59:32 <oneswig> Next meeting perhaps...?  This could take a little while
21:59:44 <jmlowe> spot instances?
21:59:45 <trandles> lol but it's our chance to be selfish
21:59:53 <oneswig> preemptible instances and resource reservation have been long-sought-after goals
21:59:53 <b1airo> Yeah maybe we should carry those two over to next meeting...
22:00:03 <jmlowe> yeah
22:00:31 <oneswig> Alas, we are out of time.
22:00:32 <martial> (not sure there is a meeting after, so if we need to overrun, others can tell us :) )
22:00:51 <oneswig> pipe up if you're waiting or we'll nail this last question...
22:01:38 <martial> (and we can move to the #scientific-wg if needed)
22:01:45 <martial> seems we can go on
22:01:53 <martial> answers anybody?
22:02:28 <oneswig> When looking at HPC workloads on OpenStack, exposing physical resource into the virtual world has been key for hypervisor efficiency gains.  The next level may be placement within the physical network.  How can we deliver the benefits of cloud but pare it down to something so close to the metal?
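One common way of exposing physical resource to guests, as oneswig describes, is through Nova flavor extra specs; a hedged sketch (the flavor name and sizes are placeholder assumptions):

    # Dedicated pCPUs, 1G hugepages and a single guest NUMA node for tightly-coupled workloads
    openstack flavor create hpc.pinned --vcpus 16 --ram 65536 --disk 40
    openstack flavor set hpc.pinned \
      --property hw:cpu_policy=dedicated \
      --property hw:mem_page_size=1GB \
      --property hw:numa_nodes=1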
22:03:24 <trandles> that question would have been a lot easier a couple years ago but now I feel like a lot of gaps are being filled
22:03:54 <oneswig> In essence a lot of the WG members are "physicalising" the virtual resources, and somehow the OpenStack managed infrastructure is still flexible enough to be a game changer.
22:04:20 <oneswig> ... final comments ?
22:05:03 <oneswig> OK, lets wrap up - thanks everyone
22:05:12 <oneswig> #endmeeting