21:00:21 #startmeeting scientific-sig
21:00:22 Meeting started Tue Jul 24 21:00:21 2018 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:23 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:26 The meeting name has been set to 'scientific_sig'
21:00:34 I even spelled it right
21:00:41 g'day all! :)
21:00:46 #link Today's agenda https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_July_24th_2018
21:00:52 Hey janders!
21:01:00 o/
21:01:09 Tim - you made it - bravo
21:01:14 How is PEARC18?
21:01:33 it's good...I'm only here today :P
21:01:42 you were in the UK last week, right?
21:01:43 Mike and Martial should make it too
21:01:47 How was that?
21:01:47 yeah
21:01:56 busy and tiring
21:01:58 long days
21:02:08 It's ruddy hot over here right now, you'd have fitted right in.
21:02:09 thanks for the voting reminder! :)
21:02:32 Would have been just like that time you had no AC :-)
21:02:44 janders: ah, right, let's get onto that.
21:02:52 #topic Voting closes Thursday
21:02:55 I used to think we over-air conditioned in the US...no more
21:03:41 trandles: I'm melting over here...
21:04:06 We visited CERN last week. Big surprise there was that the offices (in Geneva) have no AC.
21:04:25 I'm impressed with the beers from Tiny Rebel btw. Wish we could get them in the states.
21:04:49 Tiny Rebel? I'll look out for it. Whereabouts was this?
21:05:20 whoa... it's been close to 40C last time I visited. Would be painful to work in offices without AC if that were to last for days..
21:05:35 sorry, wifi here is really dodgy
21:05:51 trandles_: where's the conference?
21:06:00 Pittsburgh
21:06:04 I am seriously waiting for IRC to add 5 ___ to my nick at some point :)
21:06:11 hey Stig
21:06:13 Hey martial__, welcome
21:06:16 expect jmlowe, martial__, me to have connection problems
21:06:25 #chair martial__
21:06:26 Current chairs: martial__ oneswig
21:06:29 Not me, different hotel
21:06:32 well Mr Randles, long time no see :)
21:06:37 lol
21:06:42 Hi jmlowe
21:06:47 Hey Stig
21:06:57 How was the panel - is this filmed?
21:06:59 Mike: too easy
21:07:04 not filmed
21:07:09 I think it went well
21:07:16 oneswig: Tiny Rebel is just north of Cardiff I think
21:07:21 tough crowd?
21:07:35 trandles_: not too far at all then. I'll keep an eye out for it. Thanks
21:07:40 #link https://etherpad.openstack.org/p/pearc18-panel
21:07:52 here are the questions we went through
21:07:59 (well most of them anyhow)
21:08:00 ah, it's in Newport
21:08:05 stig: I see you've got some _really_ cool presos submitted
21:08:14 yep reused the Etherpad method ... worked well in truth
21:08:17 I know those people :-)
21:08:40 janders: thanks! Have you got a link to yours?
21:09:05 https://www.openstack.org/summit/berlin-2018/vote-for-speakers#/22223
21:09:18 https://www.openstack.org/summit/berlin-2018/vote-for-speakers#/22219
21:09:32 https://www.openstack.org/summit/berlin-2018/vote-for-speakers#/22164
21:09:33 I'm guessing most of the NSF folks are doing SC instead of Berlin so this is of passing interest but voting for sessions closes Thursday
21:10:11 3x really good "I would love to see" topics from me
21:10:43 but no favoritism ... send your proposal in :)
21:10:46 Erez and Moshe up with you again, eh? Like the 3 Musketeers :-)
21:10:58 yes :)
21:11:29 though for the nova-compute on Ironic we team up with RHAT for a change
21:12:05 janders: It sounds like this trick is in use at CERN also - talk to Arne Wiebalck about it
21:12:29 speaking of.. sorry I was real busy and never followed up with John on the compute/ironic bit. Are you back in the office oneswig?
21:12:43 janders: did you get it doing everything you wanted and what are the limitations? Don't keep us hanging until November!
21:12:51 Right now I'm in Cambridge, but close enough
21:13:14 Here's the talks from our team
21:13:19 :) I will try to restart that email thread this week or early next
21:13:21 Doug: https://www.openstack.org/summit/berlin-2018/vote-for-speakers/#/22078
21:13:21 Stig: https://www.openstack.org/summit/berlin-2018/vote-for-speakers/#/22446
21:13:21 Stig: https://www.openstack.org/summit/berlin-2018/vote-for-speakers/#/22233
21:13:23 Mark: https://www.openstack.org/summit/berlin-2018/vote-for-speakers/#/22454
21:13:25 Mark: https://www.openstack.org/summit/berlin-2018/vote-for-speakers/#/22579
21:13:27 John: https://www.openstack.org/summit/berlin-2018/vote-for-speakers/#/22438
21:14:40 There's also some interesting developments on various fronts around preemptible instances
21:15:03 priteau pointed out a talk from the Blazar core on delivering these within that context.
21:15:27 janders: please do, I'm sure john would be interested.
21:15:40 oneswig: that's some really excellent stuff. It's all great, monasca, preemptible instances and container networking are supercool
21:15:52 +3'ed all
21:16:00 next? :)
21:16:01 ... in each case, when they work! :-)
21:16:08 Good plan, thanks martial__
21:16:15 #topic Ansible+OpenHPC
21:16:26 A topic of longstanding interest over here.
21:16:38 yes! :)
21:17:01 You guys in Pittsburgh, right? How about calling in at Santa Clara on your way home...
21:17:17 #link OpenHPC event https://lists.openhpc.community/g/main/message/4
21:17:50 BTW that link's not an error. This was apparently the 4th message this year on the OpenHPC mailing list
21:19:06 there was a presentation on OpenHPC yesterday I think
21:19:07 There's some playbooks online already: https://github.com/Linaro/ansible-playbook-for-ohpc
21:19:22 Apparently following the Warewulf-style deploy process.
21:20:05 What I'm hoping is that once the infrastructure is deployed, there's a natural point where OpenStack-provisioned infrastructure could dovetail into the flow
21:20:16 Tim, Mike: did Evan not speak about this in our panel?
21:20:41 um...yeah kinda
21:21:24 They are using OpenHPC at Minnesota?
21:21:25 in passing
21:21:43 I think he went to the talk yesterday
21:22:05 was looking for it in the agenda, not sure it is there
21:22:13 A talk on OpenHPC?
21:22:17 yes
21:22:21 tbh I'm still not sure what OpenHPC provides of value other than a collection of things you might want on your cluster
21:23:35 They even have this: http://build.openhpc.community/OpenHPC:/1.3:/Update6:/Factory/CentOS_7/x86_64/charliecloud-ohpc-0.9.0-4.1.ohpc.1.3.6.x86_64.rpm - what more could you want?
21:23:57 lol, touché
21:24:16 but you can get that via "git clone https://github.com/hpc/charliecloud"
21:24:28 I'll +3 this one too :)
21:24:30 trandles: we like it because it's a lot easier to automate the deploy and configure
21:24:52 someone's gone to the trouble of building and packaging. Maybe even testing
21:25:49 I'm leaning more and more to the side of containerizing the world, stop provisioning clusters, use Spack to manage build and runtime complexities
21:25:56 oneswig: trandles: I don't have much hands-on with OHPC but my hope is to simplify things - using Bright Cluster Manager is cool, but creates a network of dependencies from hell and at times creates almost as many problems as it solves..
21:28:00 trandles: we've got some nodes running Fedora Atomic. There's basically no other way with those nodes. We are facing containerising BeeGFS OSS etc to create something hyperconverged
21:28:53 feels like a sane way to manage a cluster in my opinion
21:29:21 I'd prefer to be all-in on one approach or the other.
21:29:32 but I don't disagree with you.
21:30:16 looking at the tools available today, especially the cloud stuff for both hardware management and runtime portability, it seems like we can do a better job managing our HPC clusters
21:30:42 or maybe I'm just jetlagged and tired
21:31:10 trandles: on a related note, there was some recent interest here in running OpenMPI jobs in Kubernetes. There's some prior work on this that passed our way. Ever seen this work?
21:31:35 we haven't looked at k8s for much more than deploying services
21:31:47 we have looked a lot at OpenMPI + Linux namespaces
21:32:00 I'm hoping to understand the capabilities - and limitations
21:32:11 OpenMPI has a large amount of internal voodoo
21:32:17 (especially if my talk goes through...)
21:32:26 trandles: it's not small!
21:32:42 oneswig: does your talk touch on RDMA in containers?
21:32:58 it plays games in the name of efficiency and those games break with certain namespace constraints
21:33:14 but we've run 1000+ node, 10000+ rank OpenMPI jobs using Charliecloud
21:33:19 janders: We need that. We need to understand what's hackery-bodgery vs what's designed for this purpose
21:33:43 RDMA in containers is no different than RDMA on bare metal isn't it?
21:33:53 oneswig: +1
21:34:06 trandles: How does Charliecloud work with (eg) orted? Does that run on the host, outside the container?
21:34:29 depends on how everything is built
21:34:35 trandles: last time I checked (which was a few months back) there were some issues with running multiple RDMA-enabled containers on one bare-metal node
21:34:46 trandles: I'm not sure how it works with the /dev objects that get opened for RDMA access
21:34:49 the easiest is building your resource manager with PMI support and no orted is required
21:35:14 janders: probably depends on the container runtime
21:35:16 oneswig: exactly... the /dev challenge..
21:35:28 Charliecloud bind mounts /dev into your namespace
21:35:48 docker does the wrong thing for HPC (IMO) by creating its own /dev entries
21:35:51 trandles: thanks. Good to know. I'm wondering how K8S + OpenMPI achieves this
21:35:56 (ie, without slurm)
21:36:13 trandles: is that the private devices thing?
21:36:29 we can launch using mpirun but you start to have OpenMPI version mismatch issues inside vs. outside
21:36:56 trandles: which runtime is better than docker for RDMA in containers?
21:37:13 our position is, for the HPC use case, you want as little abstraction/isolation as possible
21:37:38 Shifter and Charliecloud both have full RDMA support
21:37:55 I assume Singularity does too, but I haven't looked
21:37:55 trandles: great, thank you
21:38:21 at one point Singularity was playing games to pass things between the container and the bare metal, but Shifter and Charliecloud do not
21:38:41 trandles: Can we pin you down to a date to talk this over in great depth?
21:38:59 oneswig: would like that, yes
21:39:08 +1!!!
21:39:32 August 7 works
21:39:49 (no travel in August!!! :) )
21:40:33 Excellent - although I'll be unable to join you that day (holidays)
21:40:50 21st works too
21:41:02 Perfect for me :-)
21:41:07 or I could do a separate WebEx for StackHPC ;)
21:41:57 trandles: we really need to get this installed in our OpenHPC environment before we can ask useful questions
21:42:38 Aug 21, will publicize it
21:42:56 On the Ansible+OpenHPC side, I'm hoping to gather a few interested sites together
21:43:00 maybe we can do a video meetup
21:43:15 janders: sounds like you're in?
21:43:19 sounds great!
21:43:27 I can organise a Google Meet if that is of interest
21:43:44 martial__: could be good. Then Tim could end by singing the Charliecloud Song
21:43:48 I can host a goto meeting too
21:44:05 101 users enough you think?
21:44:34 (hey isn't that your plan with Mike in a few minutes? :) )
21:45:34 martial__: trandles: jmlowe: if you see Evan Bollig later, can you gauge his interest in Ansible+OpenHPC?
21:45:45 Will do
21:46:20 Thanks!
21:46:42 OK, I'll get that going in the next few days. Good stuff.
21:47:12 Let's move on,
21:47:20 #topic Ceph days Berlin
21:47:42 Hooray, if you were worried that a 3-day OpenStack summit was too short, I have the solution for you
21:48:14 excellent. I will aim to rock up early for this event :)
21:48:16 \o/
21:48:19 #link Ceph day Berlin, Monday 12 November https://ceph.com/cephdays/ceph-day-berlin/
21:48:41 janders: What better antidote to jet lag?
21:48:42 Side note: I'm starting to really like the cephfs manila ganesha nfs I have going now.
21:48:45 ...and hopefully good RDMA support will arrive even before me... :)
21:49:29 jmlowe: ganesha is the piece we've not used - our clients are given slightly more trust and access CephFS direct
21:49:41 jmlowe: very interesting! can you tell us a bit more about it? (use cases, performance, security)
21:49:47 I'm noting your positive experience...
21:50:05 sounds like another great topic for a presentation here? :)
21:50:28 janders: +1 from me
21:50:46 janders: so you are offering to do it, very well I will remember this ;)
21:51:05 security is IP address in exports, ebtables prevents eavesdropping, random uuid for export path as a kind of shared secret
21:51:53 martial: unfortunately I don't have much to say on this topic for the time being
21:51:58 performance-wise in relatively limited testing I can max out my 10GigE
21:52:28 jmlowe: that's better than I'd expected, nice job.
21:53:06 Ganesha is running locally on the hypervisor, right? What's the route between Ganesha and the client VM - is it that VIRTIO_SOCK thing?
21:53:19 I was pleasantly surprised, metadata on NVMe-backed pool
21:53:37 jmlowe: maxing out 10GE - is that over multiple clients, or can a single client reach that?
21:53:43 264 spinning rust bluestore OSDs with mimic
21:53:48 single client
21:53:55 impressive!
21:54:19 jmlowe: I've been cursing mimic recently due to some weirdness with ceph-volume failing to deploy new OSDs
21:54:36 +1 to oneswig's VIRTIO_SOCK question
21:54:53 !!!! I'm waiting on 13.2.1 so I can get a failed disk back in
21:54:55 jmlowe: Error: "!!!" is not a valid command.
21:55:39 silly old openstack...
21:55:54 OK, we should wrap up...
21:55:58 #topic AOB
21:56:05 So what else is new?
21:56:35 UFM6 is out, seems to work for SDN/IB
21:56:55 I need to track down some dependency issues but it's most likely not UFM's fault
21:56:58 I talked on Ceph RDMA with the team at CERN, and also MEERKAT in South Africa. There's plenty of interest in this, in the right places
21:57:10 excellent to hear!
21:57:15 janders: I haven't forgotten about IB + SDN, I've just been too busy with travel
21:57:19 janders: what changes did you need to apply for new UFM?
21:57:33 oneswig: none. Drop-in replacement.
21:57:38 phew
21:57:49 but more detailed testing to follow
21:58:02 sysbench 16 threads random write written, MiB/s: 13560.71, read, MiB/s: 14416.96
21:58:02 That chain of services is long enough without being cranky along with it
21:58:03 I think 6 has some resiliency enhancements
21:58:28 though I don't think the uppercase/lowercase thing made it into UFM6 GA
21:59:14 had some issues with uppercase GUIDs in the past, but that's probably not worth discussing here
21:59:15 janders: thanks for the heads up on that, seems like we (luckily) dodged a howler on our setup
21:59:32 jmlowe: > 14 GB/s?
21:59:33 trandles: no worries, feel free to reach out any time
21:59:50 there's some caching in there
22:00:01 oneswig: our blade chassis only speak uppercase for MACs/GUIDs hence we hit this at full speed. It hurt :)
22:00:02 too short of a test to fill them
22:00:17 Ah, thanks
22:00:23 'sysbench --test=fileio --file-fsync-freq=0 --file-test-mode=rndrd --num-threads=16 run'
22:00:27 Sorry y'all, we are at time
22:00:43 Thank you all! Don't forget to cast your votes
22:00:45 PEARC18 folks, have a good conference
22:00:51 I will put in mine as soon as I get into the office
22:00:52 thanks Stig
22:00:56 #endmeeting
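
Post-meeting note: for anyone wanting to repeat jmlowe's quick CephFS/Ganesha NFS measurement, below is a minimal sketch of the full sysbench fileio cycle around the run command quoted at 22:00:23. The prepare/cleanup steps and the --file-total-size value are illustrative assumptions, not from the log; as jmlowe notes, page-cache effects will inflate the numbers unless the working set is made larger than host memory and the test runs long enough.

    # cd into the NFS-mounted share, then lay down the test files
    # (size is an assumption - pick something larger than client RAM)
    sysbench --test=fileio --file-total-size=64G prepare
    # random-read pass, matching the options quoted in the log
    sysbench --test=fileio --file-total-size=64G --file-test-mode=rndrd \
        --file-fsync-freq=0 --num-threads=16 run
    # remove the test files afterwards
    sysbench --test=fileio --file-total-size=64G cleanup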