21:00:41 <oneswig> #startmeeting scientific-sig
21:00:42 <openstack> Meeting started Tue Nov 27 21:00:41 2018 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:43 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:45 <openstack> The meeting name has been set to 'scientific_sig'
21:00:47 <oneswig> you snooze you lose!
21:01:03 <oneswig> #link Agenda for today https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_November_27th_2018
21:01:32 <b1airo> howdy
21:01:32 <oneswig> Greetings Americans, I trust Thanksgiving was good
21:01:42 <oneswig> hey b1airo, g'day
21:01:43 <trandles> omnomnom
21:01:46 <oneswig> #chair b1airo
21:01:47 <openstack> Current chairs: b1airo oneswig
21:01:57 * b1airo has the power!
21:02:21 <janders> g'day everyone
21:02:30 <oneswig> I'm on a train, wifi's likely to get interesting as we move
21:02:32 <b1airo> hi janders
21:02:33 <oneswig> g'day janders
21:02:46 <oneswig> #topic SC roundup
21:02:53 <janders> hey Blair - how was SC?
21:02:59 <oneswig> How indeed?
21:03:18 <b1airo> good. that enough...? :-P
21:03:37 <oneswig> How did Wojtek break his leg, can you share that?
21:03:40 <b1airo> the container sessions outnumbered the cloud sessions this year
21:04:24 <janders> sounds like Berlin :)
21:04:29 <b1airo> i can share a video of him in emergency. but actually the incident was pretty lame, he must just be getting old
21:04:29 <janders> Is SC changing name anytime soon?
21:04:35 <oneswig> Anything on Charliecloud trandles?
21:05:32 <b1airo> one particular thing on the container front that i found interesting was an understanding of Docker's strategy in the HPC space...
21:05:35 <trandles> Nothing interesting on Charliecloud. Just more progress going into supporting what seems like an increasing number of MPI implementations and interconnects
21:06:02 <trandles> b1airo: you have an understanding of docker's HPC strategy? I don't and I've been talking to them for the past year. :(
21:06:27 <trandles> although, I get different strategies from different people within docker
21:06:56 <oneswig> Docker's strategy right now is to marvel at that bright light at the end of the tunnel, it seems
21:07:12 <oneswig> ... and wonder what the chuffing noise might be...
21:07:39 <trandles> oneswig: that's my impression as well
21:07:41 <janders> I hope the strategy is not "we don't care and keep going our own way, the HPC folks will figure it out"
21:07:43 <b1airo> basically, because everyone in research and big HPC wants stuff for free, Docker Inc can't commit engineering to related feature development. So Christian's strategy is to go after ML workloads in enterprise, supporting GPU, then MPI, and presto - HPC is supported.
21:08:15 <b1airo> that's the simplistic version
21:08:21 <janders> it's a bit like RHAT&mlnx, HPC, SRIOV and Telco
21:08:31 <trandles> last I heard from Christian is "we'll replace [slurm|pbs|torque|etc] with docker swarm, voila!"
21:08:58 <oneswig> MPI support that I've seen (kubeflow's mpi-operator) isn't going to scale - but that doesn't mean it will always be like that
21:09:51 <b1airo> there is also some uid/gid mapping stuff they have in the works for FS namespaces, i guess that will be a "within this container, squash all I/O to this volume to these IDs"
21:10:06 <trandles> Well, LANL's position is that money, power, and cooling aren't going to scale much further so we might as well tackle trying to get better than 1% efficiency from our platforms
21:10:06 <oneswig> That has bitten us
21:10:33 <oneswig> 1%? Sounds like my household finances.
21:10:37 <b1airo> trandles: lol, that'll never catch on
21:10:58 <trandles> we're seeing something around 3-6% efficiency with our new ARM systems
21:11:00 <janders> have you guys heard anything interesting from the other side ( [slurm|pbs|torque|etc] )?
21:11:15 <b1airo> didn't you hear Trump, he doesn't believe climate change will have any economic impacts, and besides, it was cold last weekend
21:11:27 <janders> are they looking at running their own components containerised - and consuming k8s resources?
21:11:39 <trandles> I had lunch == no one is hungry any more!
21:12:08 <janders> b1airo: damn right! It's bloody cold here despite we're almost in December.. burn more coal, quickly!
21:12:09 <b1airo> trandles: more seriously though, what is your definition of efficiency for these purposes?
21:12:09 <trandles> janders: I haven't heard much of anything from the resource manager side RE: containers
21:12:18 <oneswig> janders: that sounds a little far fetched - apart from for univa grid engine, who have done something like that
21:12:48 <janders> yeah I hear those vendors trying to orchestrate docker themselves - which I find retarded
21:13:18 <b1airo> there was one suggestion during the Q&A in one of the BoFs that Slurm should support one of the OCI interfaces (whichever one is responsible for launching containers)
21:13:29 <b1airo> there was one interesting suggestion during the Q&A in one of the BoFs - that Slurm should support one of the OCI interfaces (whichever one is responsible for launching containers)
21:13:30 <janders> have slurm run containers for HPC and k8s run containers for non-hpc within a single site sounds anything but efficient
21:13:31 <trandles> b1airo: application performance as percent of theoretical peak IIRC...looking for that now...
21:13:56 <oneswig> Were there new faces for OpenStack at SC?
21:14:52 <b1airo> oneswig: yes, both in the audience and on the panel, but mostly that was because half of them were container peeps
21:14:54 <trandles> bigger problem is no one knows how to launch MPI applications at scale but the existing resource managers (kinda) and the MPI implementations (for now)
21:15:22 <oneswig> your caveats, most beguiling...
21:15:31 <oneswig> care to elaborate?
21:15:42 <trandles> in the OpenMPI space, it sounds like before long no one will be able to launch anything...orte is dead, PMIx is almost unsupported, and mpirun is deprecated
21:15:50 <oneswig> I hadn't realised that pmix is itself written like an hpc application
21:16:42 <trandles> b1airo: did you or anyone you know attend the OpenMPI BoF? I wasn't there long enough. Rumor had it lots of folks were showing up to make a stink about the lack of future for things like mpirun
21:17:01 <oneswig> sounds awkward
21:17:31 <b1airo> unfortunately not trandles , sounds like i missed out though. it's possible one of the Monash folks did (will ask)...
21:17:32 <trandles> at LANL, we use slurm's support of PMI2 to use srun to wire up the job
21:17:52 <trandles> not sure what IBM is doing RE: jsrun and lrun
21:18:00 <trandles> but they have Spectrum-MPI anyway
21:18:08 <oneswig> trandles: that's what I tend to use for openhpc and slurm
21:18:17 <oneswig> (pmi2)
21:18:50 <oneswig> We should also round up the OpenStack summit activity
21:19:06 <oneswig> Ready to flip over?
21:19:20 <trandles> one last thing
21:19:32 <oneswig> go for it, columbo
21:19:52 <trandles> without details, I get the impression that some vendors are looking to implement container-launching plugins for resource managers
21:20:10 <b1airo> oneswig: i'm afraid my memory of the session needs jogging slightly - it was on the afternoon following my night in the hospital with Wojtek, so i was a little spacey. there was not a lot of OpenStack specific conversation though, more general higher-level cloud workload issues. a couple of the people asking questions had some very... strange... problems too (of the "i can't believe you ended up here" nature)
21:20:32 <trandles> and it's not like "have slurm run docker" it's more like "slurm job_launch plugin that makes the right syscalls to set up the namespaces"
21:21:21 <janders> trandles: this is good to hear! :)
21:21:23 <oneswig> trandles: will all those people who bemoaned the docker daemon's root privilege now turn on slurmd, I wonder?
21:21:45 <b1airo> good question oneswig
21:21:47 <trandles> slurmd already does things on behalf of the user
21:21:57 <trandles> ie. the user doesn't need sudo to use slurm ;)
21:22:11 <oneswig> indeed, and gets little of the flak that docker gets
21:22:43 <trandles> maybe if docker didn't require sudo (or setuid) and direct user control it would be different
21:22:47 <b1airo> is that just because some of Slurm's guts are setuid ?
21:23:10 <oneswig> anyway, trandles, you've got a pretty good implementation of container launch, do you think it will come in?
21:23:36 <trandles> maybe...I certainly hope so
21:23:38 <oneswig> trandles: fair point on the direct user control
21:24:11 <oneswig> OK, move on?
21:24:14 <trandles> anyway, that's all I got...Berlin?
21:24:17 <oneswig> #topic Berlin roundup
21:24:32 <oneswig> janders: we had a pretty vibrant meeting this time round, wouldn't you say?
21:24:44 <oneswig> more than half the crowd were there for the first time
21:25:09 <oneswig> The room was full, I'd guess 70 upwards
21:25:20 <janders> oneswig: I think we can definitely say that!
21:25:23 <oneswig> perhaps even 100
21:25:40 <janders> it almost felt like we need a bigger room
21:25:52 <oneswig> what was interesting was that the majority of attendees had bioinformatics use cases
21:26:01 <janders> if we weren't that far from all the other presentations we would likely get few more people :)
21:26:21 <oneswig> some HEP, some generic university workloads, some AI/ML
21:26:51 <janders> true, however looking back at the last five years of OpenStacking I think nearly all of my "interesting" users were from the bioinformatics domain
21:27:03 <oneswig> There's continuing interest in how to manage sensitive data, particularly from them.
21:27:03 <b1airo> it's slightly depressing that so many people need a whole different infrastructure approach just to make bioinformaticians happy
21:27:28 <oneswig> I don't think it's different, just different processes
21:28:03 <b1airo> i could be projecting some current frustration with supporting the Genomics community down here
21:28:04 <janders> b1airo: other than the storage backend, what differences are you thinking?
21:28:06 <oneswig> There's a lot of common ground around questions like how to implement a safe haven
21:28:48 <oneswig> Somebody today made the point it's just like data in banking (or similar), really
21:29:00 <martial__> (sorry computer issues)
21:29:12 <oneswig> hey martial__
21:29:20 <oneswig> you're growing underscores again!
21:29:25 <oneswig> #chair martial__
21:29:26 <openstack> Current chairs: b1airo martial__ oneswig
21:29:42 <martial__> I know, fun times :)
21:29:53 <oneswig> We had a visit today to our office from a team from Monash - Komathy and Jerico
21:30:10 <oneswig> Had a really useful discussion with them on where they are and what they need
21:30:58 <oneswig> They are keen to make contact with fellow travellers from the US. They've been speaking with half a dozen or so around Europe and the UK on their tour
21:31:22 <oneswig> Ah, I should update the topic
21:31:40 <oneswig> #topic fellow travellers for controlled-access data
21:32:19 <oneswig> trandles: I imagine sensitive data handling is totally different in your domain
21:32:29 <trandles> oneswig: I think those folks need to make contact with Khalil and his federation efforts
21:32:48 <trandles> hrm, for us it's a pain in the arse
21:33:21 <oneswig> Does ORCA cover that kind of use case?
21:33:26 <oneswig> martial__: ?
21:33:46 <martial__> (sorry was looking into logs of crash)
21:34:02 <oneswig> you're not in the car, I hope?
21:34:16 <martial__> (no, computer reboot)
21:34:24 <oneswig> just checking :-)
21:34:54 <martial__> so Khalil and the ORCA people do not cover the conversation about data sensitivity
21:35:26 <martial__> it covers the case of access of information and it can be either an RBAC or ACL solution
21:35:26 <oneswig> ah ok
21:35:33 <trandles> I'm not sure how they can work on federation without considering data sensitivity
21:36:08 <martial__> not sure if those overlap with your definition of sensitivity?
21:36:20 <martial__> (use case of)
21:36:43 <oneswig> I guess the first use case is to enable sharing, and the follow-up use case becomes how to control it
21:37:08 <b1airo> i went to the federation panel at SC, but to be perfectly honest it still seems very academic
21:37:30 <martial__> the idea that a user has access to a subset of data if controlled by who this user is per its access rights
21:38:01 <martial__> it's closer to ACLs type
21:38:19 <oneswig> b1airo: from an OpenStack perspective there are many gaps for federated users, agreed
21:38:46 <martial__> OpenStack has federated users (keystone to keystone)
21:39:08 <oneswig> It does indeed, but (for example) they cannot create heat stacks or use application credentials
21:39:17 <oneswig> a federated user is a bit ghostly
21:39:20 <martial__> no, indeed
21:40:08 <janders> oneswig: I wasn't aware of that. What are the reasons? Bugs in heat? Else?
21:40:13 <martial__> the concept of the work done for ORCA is the use of the IEEE P2302 model to drive an implementation (proof of concept)
21:40:27 <martial__> that allows clouds to interconnect
21:40:33 <oneswig> It relates to trusts in Keystone janders
21:41:06 <oneswig> beyond that I am not sure but people on our team have been through the issues.
21:41:13 <martial__> there is an "agent" to communicate rights, roles and privileges for the users from a remote cloud to the local cloud
21:41:34 <martial__> (the "cloud broker" in a way)
21:41:34 <janders> right... one would think that a user holds a token or he/she doesn't - so every service works or no service works
21:41:51 <oneswig> martial__: ORCA is working on a poc implementation?
21:41:54 <janders> but I guess that was pretty naive and the reality is much less binary
21:41:56 <janders> :)
21:42:27 <martial__> so IEEE P2302 is getting a NIST SP500 in the draft in the coming month
21:42:39 <janders> noted! thank you, knowing this will help me stay out of trouble down the track
21:42:45 <oneswig> janders: I think it relates to actions performed in the user's name when they've gone home for the night, my understanding of it
21:43:36 <martial__> ORCA will benefit from it and there was a big effort of integrating sites while at SC18 in the framework to help drive this POC
21:44:09 <oneswig> good to hear there is some forward progress for them
21:44:23 <oneswig> anyway, I feel we should do what we can to gather interested parties around best practice for sensitive data.
21:44:34 <martial__> yes, Stig, I agree
21:44:55 <oneswig> Seems we are light on the ground today but if we can tap a few regulars, perhaps
21:45:05 <martial__> although ORCA is "for academic use" so less issue of data sensitivity (we are not talking Secret, TS, ... are we?)
21:45:31 <oneswig> The other area of interest from the SIG session - a new one for our discussions to date - best practice for AI/ML
21:46:15 <trandles> martial__: it doesn't have to venture into classifications, it can be simple export control
21:46:19 <oneswig> I wasn't sure what demands that entails
21:46:50 <oneswig> but then I met a guy from sweden wanting to put 8x100G networking into each GPU node...
21:47:28 <b1airo> sorry, sidetracked in another meeting on Zoom...
21:47:46 <oneswig> np, hope you're not sharing your desktop :-)
21:48:05 <b1airo> related to the federated roles, privileges etc discussion above. sounds like what SciTokens might be designed to help with
21:48:26 <b1airo> oneswig: me too!
21:49:05 <oneswig> the AI use case is tangential to our cohort but becoming increasingly used at a platform level by users I'm aware of
21:49:31 <oneswig> we may not realise our infra is already doing this kind of work...
21:50:42 <oneswig> OK, I forgot the other matter arising (from Pawsey)
21:50:56 <oneswig> Anyone using Manila to manage Lustre? Or interested in doing so?
21:51:19 <oneswig> #topic manila and lustre
21:51:22 <martial__> yes, we have been looking into SciTokens indeed
21:51:42 <oneswig> martial__: is that you with your ORCA hat on?
21:53:44 <martial__> nah, the P2302. We have a ton of conversation on the different models out there ... it needs to "interconnect" after all :)
21:54:03 <oneswig> There was some interest in orchestrating dynamic shares on Lustre filesystems using Manila. I'm aware of a few people interested in pursuing it.
21:54:22 <oneswig> This was raised last week on the EMEA session
21:55:14 <janders> just from my perspective, doing it via NFS is less interesting
21:56:05 <oneswig> I agree that the native transport makes most sense, given the effort of getting Lustre to that point.
21:56:08 <janders> I hope to look at BeeGFS as nova/glance/cinder/manila/swift backend instead (sorry I haven't managed to get back to you on this oneswig - chasing up some other things this week)
21:56:41 <oneswig> janders: np, we've all got stuff to do :-)
21:56:44 <janders> native Lustre orchestration could also be of strong interest to us, depending on the outcomes of a tender
21:57:13 <janders> we have BeeGFS for sure, Lustre might or might not come into play
21:57:22 <oneswig> janders: may the best filesystem win :-)
21:57:33 <trandles> gotta run to another meeting, toodle pip
21:57:40 <oneswig> cheers trandles
21:57:44 <janders> GPFS would be another area of interest, though I feel that BeeGFS might be better going forward
21:58:32 <janders> oneswig: exactly! :)
21:58:37 <oneswig> I'm sure you're not the only one weighing these things up
21:58:47 <janders> indeed
21:58:47 <oneswig> OK, we are nearly out of time
21:58:56 <oneswig> final thoughts?
21:59:20 <janders> I'm good. Thank you all!
21:59:26 <oneswig> I'll mail openstack-discuss to try to gather more interest on the data
21:59:45 <oneswig> OK y'all, thanks and goodnight
21:59:52 <martial__> cool, thanks :)
21:59:55 <oneswig> #endmeeting