21:01:12 #startmeeting Scientific-sig
21:01:15 Meeting started Tue Feb 5 21:01:12 2019 UTC and is due to finish in 60 minutes. The chair is martial. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:01:16 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:01:18 The meeting name has been set to 'scientific_sig'
21:01:30 good day everyone
21:01:45 welcome to the next edition of the Scientific SIG meeting
21:01:57 hi martial
21:02:03 #chair b1airo
21:02:04 Current chairs: b1airo martial
21:02:08 Gi Blair
21:02:15 Hi Blair (typo)
21:02:35 i was about to type startmeeting when my laptop suddenly powered off, so if i go quiet i may be having computer issues...
21:03:11 A couple of announcements first:
21:03:19 (rgr b1airo)
21:03:41 PEARC19 CFP is out
21:03:59 #link https://www.pearc19.pearc.org/
21:04:03 i was thinking about going to PEARC this year, have heard good things...
21:04:31 it is turning into a good mini SuperComputing in truth
21:05:30 and ISC High Performance 19 is going to have a workshop specific to containers in HPC
21:05:40 Christian Kniep from Docker is the chair
21:06:04 interesting...
21:06:27 https://www.irccloud.com/pastebin/kDWaSrVb/
21:06:32 In conjunction with ISC HIGH PERFORMANCE 2019
21:07:13 I am on the technical committee, so I will post as we get the CFP finalized
21:07:19 but FYI
21:07:52 if you are coming to PEARC I will add you to this version of the HPC Infrastructure panel
21:08:04 (need to finalize it)
21:08:24 g'day all. Sorry - hardware issues!
21:08:42 hi janders
21:08:47 welcome janders
21:09:21 adding to that list ... the "Open Infrastructure Summit" is coming soon :)
21:09:38 and we are looking for lightning talk presenters for the SIG BoF at the Summit ... as usual :)
21:10:38 so if you are doing something using Open Infrastructure (not just OpenStack) related to HPC or scientific workflows and are willing to tell us about it (and are coming to the summit) ...
21:10:56 the call is out :)
21:11:10 we should probably create an etherpad as usual...
21:11:16 doing so now
21:12:07 #link https://etherpad.openstack.org/p/scientific-sig-denver19-bof
21:12:07 regarding the Summit - do you guys know what the approximate timeline for presentation selection is?
21:12:34 let me check the latest emails from the organising committee, janders...
21:12:45 thank you b1airo
21:13:29 Past this little tidbit, the usual conversation here :)
21:13:43 oh and I know Blair is keen to collect HPC container war stories
21:13:50 current plan appears to be that "The Foundation staff will send the speaker notifications the week of February 19"
21:14:06 I could probably do a storage-related talk - I've got BeeGFS, GPFS and long-range storage projects going on; at least one should be presentation-worthy by the Denver timeframe
21:14:17 do it~
21:15:42 at this stage I'd say I'm 75% likely to be in Denver; whether my presentation gets in or not will determine how easy it will be to convince my bosses to fund the trip
21:16:33 i'm not sure there's a fingers crossed ascii emoji, is there?
21:16:45 🤞
21:17:09 19 Feb notifications - does this mean the voting is on?
21:17:23 it is
21:17:28 (vote for me :) )
21:18:23 i think voting may have actually closed already
21:19:04 oops - looks like it has
21:19:44 really?
21:20:24 this feels like it was an unusually short window..
21:20:53 submissions only closed like a fortnight ago right?
21:21:07 yeah i'm always surprised at how quickly this seems to roll around relative to the last Summit
21:21:56 it does feel like the track finalisation could be pushed at least a couple of weeks closer to the actual Summit, but i'm no events manager (probably a good thing!)
21:22:24 janders: looks like there was a lot of community interest in your talk ;-)
21:22:48 thank you b1airo! :) good to hear
21:23:28 one of my RH contacts works with the organisers a fair bit, I will ask around about the motivation behind accelerating things
21:23:49 I think the Summit is reinventing itself a fair bit, so it's quite likely they will be changing their ways across the board
21:28:22 this is kind of off topic but janders you might have a pointer i can pass on - Mike Lowe was asking in a separate chat about tracking GPU utilisation against Slurm jobs, i.e., how to tell how much GPU a GPU job has actually used... i'm no Slurm guru but it doesn't look obvious
21:29:06 looks like it might require a job epilogue that uses nvidia-smi's accounting functionality
21:29:13 I don't have an answer off the top of my head but I will ask my Slurm guys and get back to you
21:29:41 I have a couple of colleagues at NIST (where I am right now) that are interested in doing a lightning talk on "the use of OpenStack for hardware-specific ML evaluations"
21:30:06 throw it in the Etherpad martial
21:30:17 we use Bright and I believe they introduced some functionality to assist with GPU utilisation monitoring. Previously we had a homebrew solution based on nvidia-smi - that I know
21:30:22 thanks janders !
21:30:37 no worries
21:30:38 having them post it right now
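[Editor's sketch, for reference on the nvidia-smi accounting idea raised above: a minimal example of what a Slurm job epilog along those lines could look like. This is an assumption-laden sketch rather than anyone's actual setup: epilogs are often shell scripts rather than Python, the log path and field list here are placeholders, and it presumes per-process accounting has already been enabled on the GPUs.]

```python
#!/usr/bin/env python3
"""Sketch of a Slurm epilog that records per-job GPU usage via nvidia-smi's
process accounting. Assumes accounting mode was enabled beforehand on each
GPU node (e.g. nvidia-smi --accounting-mode=1 at boot). Paths are placeholders."""

import csv
import os
import subprocess

LOG_PATH = "/var/log/slurm/gpu_accounting.csv"  # hypothetical log location


def main():
    job_id = os.environ.get("SLURM_JOB_ID", "unknown")

    # Ask the driver for the processes it has accounted on this node.
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-accounted-apps=gpu_uuid,pid,gpu_utilization,"
            "mem_utilization,max_memory_usage,time",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return  # no GPUs present or accounting not enabled; nothing to record

    # Append one row per accounted process, tagged with the Slurm job id.
    with open(LOG_PATH, "a", newline="") as f:
        writer = csv.writer(f)
        for line in result.stdout.strip().splitlines():
            row = [field.strip() for field in line.split(",")]
            writer.writerow([job_id] + row)


if __name__ == "__main__":
    main()
```

[One caveat to note: the accounting buffer is per GPU, not per job, so processes from earlier jobs can still appear in the output unless the buffer is cleared between jobs or the PIDs are cross-checked against the job's own processes.]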
21:31:41 stepping out for a few minutes
21:33:27 i think we are more-or-less done for today. a couple of interesting things in the pipeline for upcoming meetings, but not much to discuss this week without a bigger crew
21:33:45 sounds good!
21:34:08 I put a generic "RDMA storage for OpenStack" talk proposal into the Etherpad
21:34:26 it will crystallise into something more concrete closer to the Summit
21:34:32 👍
21:35:18 We're playing around with GPFS, I will have a look at BeeGFS as well, and we've done some interesting long-range work that I should probably brief you on at some point - maybe at the Summit :)
21:35:26 i might be able to throw something higher level together too, will see how things develop here...
21:36:09 great!
21:37:22 very interested in GPFS multi-tenancy. at this point i can't see a low-risk way of supporting sensitive-data research in a typical large shared/multi-user HPC environment; better to create dynamic clusters per data-set/project/group as needed
21:38:16 I was thinking along the lines of what Stig and his team did for BeeGFS
21:38:36 they allow access to the RDMA filesystem by sharing the private network that storage is connected to with "trusted" tenants
21:38:51 and the deal is - if you have native RDMA storage access, you don't get root
21:38:59 (or at least this is my understanding)
21:39:36 However, having said that, BeeGFS-OND could offer some interesting ways of tackling this for Slurm running on bare-metal OpenStack
21:40:03 and OND also means a brand new approach to resiliency
21:40:56 indeed - real scratch
21:41:00 (while we're quite deep into the resiliency discussion for a single 1PB scratch and setting the performance:capacity:uptime balance, the Cambridge guys just don't have this problem, running on-demand...) :)
21:41:38 I'm not so much worried about losing data on scratch, more concerned that with no redundancy, one NVMe dies and a good chunk (if not all) of HPC grinds to a halt..
21:41:42 that's expensive downtime
21:42:06 good point
21:42:11 now if each job has its own storage and a couple of per-job ephemeral filesystems die, who cares
21:42:13 resubmit the jobs
21:42:14 sorry!
21:42:31 if it's 15x faster than the current storage the users will happily accept this
21:42:56 but - having said this - if you have a significant existing investment in GPFS that's less applicable
21:43:20 we're starting fresh in this particular field so we can do any filesystem or a combination of a couple of different ones
21:43:58 having the ability to provision high-performance storage through OpenStack really helps with trying different things
21:44:33 yes, that's something i'm missing somewhat at the moment
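[Editor's sketch, to illustrate the per-job BeeGFS-OND scratch idea discussed above: one way it is commonly wired into Slurm is a prolog that starts a private BeeOND instance across the job's nodes and an epilog that tears it down. This is a sketch under assumptions, not a tested recipe: the beeond option names (-n nodefile, -d storage path, -c client mountpoint) should be checked against your BeeOND version's documentation, the paths are placeholders, and it assumes the nodelist variable is available in the prolog context being used.]

```python
#!/usr/bin/env python3
"""Sketch of a per-job BeeOND (BeeGFS On Demand) prolog for Slurm.
Assumes this runs once per job (e.g. from the lead node's prolog or
PrologSlurmctld) and that SLURM_JOB_NODELIST is exported in that context.
All paths and the beeond option names should be verified locally."""

import os
import subprocess

SCRATCH_BASE = "/nvme/beeond"   # placeholder: local NVMe path on each node
MOUNT_POINT = "/mnt/beeond"     # placeholder: where the job sees its scratch


def main():
    job_id = os.environ["SLURM_JOB_ID"]
    nodefile = f"/tmp/beeond-nodes.{job_id}"

    # Expand the compressed Slurm nodelist into one hostname per line.
    hosts = subprocess.run(
        ["scontrol", "show", "hostnames", os.environ["SLURM_JOB_NODELIST"]],
        capture_output=True, text=True, check=True,
    ).stdout
    with open(nodefile, "w") as f:
        f.write(hosts)

    # Spin up a private BeeGFS instance spanning the job's nodes, backed by
    # each node's local NVMe and mounted at a job-visible path.
    subprocess.run(
        ["beeond", "start", "-n", nodefile, "-d", SCRATCH_BASE, "-c", MOUNT_POINT],
        check=True,
    )
    # A matching epilog would call `beeond stop` with the same nodefile,
    # plus whatever cleanup options the installed BeeOND version provides.


if __name__ == "__main__":
    main()
```

[The design point this reflects is the one made in the discussion: the per-job filesystem is ephemeral, so losing an NVMe device only costs a resubmitted job rather than taking shared scratch offline.]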
21:46:16 ok - I don't have anything more to add - shall we wrap up?
21:48:57 (back)
21:49:01 yes, thanks for the chat janders. catch you next time!
21:49:06 sounds like we are ready to wrap
21:49:11 seems so
21:49:28 and entry added to the etherpad :)
21:49:29 i have a cake to bake (public holiday, day off here)
21:49:50 Happy Waitangi Day! :)
21:50:05 thanks!
21:50:38 #endmeeting