21:00:21 <oneswig> #startmeeting scientific-sig
21:00:22 <openstack> Meeting started Tue Jul 24 21:00:21 2018 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:23 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:26 <openstack> The meeting name has been set to 'scientific_sig'
21:00:34 <oneswig> I even spelled it right
21:00:41 <janders> g'day all! :)
21:00:46 <oneswig> #link Today's agenda https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_July_24th_2018
21:00:52 <oneswig> Hey janders!
21:01:00 <trandles> o/
21:01:09 <oneswig> Tim - you made it - bravo
21:01:14 <oneswig> How is PEARC18?
21:01:33 <trandles> it's good...I'm only here today :P
21:01:42 <oneswig> you were in the UK last week, right?
21:01:43 <trandles> Mike and Martial should make it too
21:01:47 <oneswig> How was that?
21:01:47 <trandles> yeah
21:01:56 <trandles> busy and tiring
21:01:58 <trandles> long days
21:02:08 <oneswig> It's ruddy hot over here right now, you'd have fitted right in.
21:02:09 <janders> thanks for the voting reminder! :)
21:02:32 <oneswig> Would have been just like that time you had no AC :-)
21:02:44 <oneswig> janders: ah, right, lets get onto that.
21:02:52 <oneswig> #topic Voting closes thursday
21:02:55 <trandles> I used to think we over-air conditioned in the US...no more
21:03:41 <oneswig> trandles: I'm melting over here...
21:04:06 <oneswig> We visited CERN last week. Big surprise there was that the offices (in Geneva) have no AC.
21:04:25 <trandles> I'm impressed with the beers from Tiny Rebel btw. Wish we could get them in the states.
21:04:49 <oneswig> Tiny rebel? I'll look out for it. Whereabouts was this?
21:05:20 <janders> whoa... it's been close to 40C last time I visited. Would be painful to work in offices without AC if that were to last for days..
21:05:35 <trandles_> sorry, wifi here is really dodgy
21:05:51 <oneswig> trandles_: where's the conference?
21:06:00 <trandles_> Pittsburgh
21:06:04 <martial__> I am seriously waiting for IRC to add 5 ___ to my nick at some point :)
21:06:11 <martial__> hey Stig
21:06:13 <oneswig> Hey martial__, welcome
21:06:16 <trandles_> expect jmlowe, martial__, me to have connection problems
21:06:25 <oneswig> #chair martial__
21:06:26 <openstack> Current chairs: martial__ oneswig
21:06:29 <jmlowe> Not me, different hotel
21:06:32 <martial__> well Mr Randles, long time no see :)
21:06:37 <trandles_> lol
21:06:42 <oneswig> Hi jmlowe
21:06:47 <jmlowe> Hey Stig
21:06:57 <oneswig> How was the panel - is this filmed?
21:06:59 <martial__> Mike: too easy
21:07:04 <martial__> not filmed
21:07:09 <martial__> I think it went well
21:07:16 <trandles_> oneswig: Tiny Rebel is just north of Cardiff I think
21:07:21 <oneswig> tough crowd?
21:07:35 <oneswig> trandles_: not too far at all then. I'll keep an eye out for it. Thanks
21:07:40 <martial__> #link https://etherpad.openstack.org/p/pearc18-panel
21:07:52 <martial__> here are the questions we went through
21:07:59 <martial__> (well most of them anyhow)
21:08:00 <trandles_> ah, it's in Newport
21:08:05 <janders> stig: I see you've got some _really_ cool presos submitted
21:08:14 <martial__> yep reused the Etherpad method ... worked well in truth
21:08:17 <oneswig> I know those people :-)
21:08:40 <oneswig> janders: thanks! Have you got a link to yours?
21:09:05 <janders> https://www.openstack.org/summit/berlin-2018/vote-for-speakers#/22223
21:09:18 <janders> https://www.openstack.org/summit/berlin-2018/vote-for-speakers#/22219
21:09:32 <janders> https://www.openstack.org/summit/berlin-2018/vote-for-speakers#/22164
21:09:33 <oneswig> I'm guessing most of the NSF folks are doing SC instead of Berlin so this is of passing interest but voting for sessions closes Thursday
21:10:11 <martial__> 3x really good "I would love to see topics" from me
21:10:43 <martial__> but no favoritism ... send your proposal in :)
21:10:46 <oneswig> Erez and Moshe up with you again, eh? Like the 3 Musketeers :-)
21:10:58 <janders> yes :)O
21:11:29 <janders> though for the nova-compute on Ironic we team up with RHAT for a change
21:12:05 <oneswig> janders: It sounds like this trick is in use at CERN also - talk to Arne Wiebalck about it
21:12:29 <janders> speaking of.. sorry I was real busy and never followed up with John on the compute/ironic bit. Are you back in office oneswig?
21:12:43 <oneswig> janders: did you get it doing everything you wanted and what are the limitations? Don't keep us hanging until november!
21:12:51 <oneswig> Right now I'm in Cambridge, but close enough
21:13:14 <oneswig> Here's the talks from our team
21:13:19 <janders> :) I will try to restart that email thread this week or early next
21:13:21 <oneswig> Doug: https://www.openstack.org/summit/berlin-2018/vote-for-speakers/#/22078
21:13:21 <oneswig> Stig: https://www.openstack.org/summit/berlin-2018/vote-for-speakers/#/22446
21:13:21 <oneswig> Stig: https://www.openstack.org/summit/berlin-2018/vote-for-speakers/#/22233
21:13:23 <oneswig> Mark: https://www.openstack.org/summit/berlin-2018/vote-for-speakers/#/22454
21:13:25 <oneswig> Mark: https://www.openstack.org/summit/berlin-2018/vote-for-speakers/#/22579
21:13:27 <oneswig> John: https://www.openstack.org/summit/berlin-2018/vote-for-speakers/#/22438
21:14:40 <oneswig> There's also some interesting developments on various fronts around preemptible instances
21:15:03 <oneswig> priteau pointed out a talk from the Blazar core on delivering these within that context.
21:15:27 <oneswig> janders: please do, I'm sure john would be interested.
21:15:40 <janders> oneswig: that's some really excellent stuff. It's all great, monasca, preemptible instances and container networking are supercool
21:15:52 <martial__> +3'ed all
21:16:00 <martial__> next? :)
21:16:01 <oneswig> ... in each case, when they work! :-)
21:16:08 <oneswig> Good plan, thanks martial__
21:16:15 <oneswig> #topic Ansible+OpenHPC
21:16:26 <oneswig> A topic of longstanding interest over here.
21:16:38 <janders> yes! :)
21:17:01 <oneswig> You guys in Pittsburgh, right? How about calling in at Santa Clara on your way home...
21:17:17 <oneswig> #link OpenHPC event https://lists.openhpc.community/g/main/message/4
21:17:50 <oneswig> BTW that link's not an error. This was apparently the 4th message this year on the OpenHPC mailing list
21:19:06 <martial__> there was a presentation on OpenHPC yesterday I think
21:19:07 <oneswig> There's some playbooks online already: https://github.com/Linaro/ansible-playbook-for-ohpc
21:19:22 <oneswig> Apparently following the Warewulf-style deploy process.
21:20:05 <oneswig> What I'm hoping is that once the infrastructure is deployed, there's a natural point where OpenStack-provisioned infrastructure could dovetail into the flow
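[A rough sketch of driving the Linaro playbooks linked above, for anyone wanting to try the Warewulf-style flow. The inventory file and the site.yml entry point are illustrative assumptions, not taken from the repository; its README defines the real layout.]

    # Fetch the playbooks referenced above and run them against an inventory
    # describing the OpenHPC master node and compute nodes (names are illustrative).
    git clone https://github.com/Linaro/ansible-playbook-for-ohpc
    cd ansible-playbook-for-ohpc
    ansible-playbook -i hosts site.yml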
21:20:16 <martial__> Tim, Mike: did Evan not speak about this in our panel?
21:20:41 <trandles> um...yeah kinda
21:21:24 <oneswig> They are using OpenHPC at Minnesota?
21:21:25 <jmlowe> in passing
21:21:43 <jmlowe> I think he went to the talk yesterday
21:22:05 <martial__> was looking for it in the agenda, not sure it is there
21:22:13 <oneswig> A talk on OpenHPC?
21:22:17 <jmlowe> yes
21:22:21 <trandles> tbh I'm still not sure what OpenHPC provides of value other than a collection of things you might want on your cluster
21:23:35 <oneswig> They even have this: http://build.openhpc.community/OpenHPC:/1.3:/Update6:/Factory/CentOS_7/x86_64/charliecloud-ohpc-0.9.0-4.1.ohpc.1.3.6.x86_64.rpm - what more could you want?
21:23:57 <trandles> lol, touché
21:24:16 <trandles> but you can get that via "git clone https://github.com/hpc/charliecloud"
21:24:28 <martial__> I'll +3 this one too :)
21:24:30 <oneswig> trandles: we like it because it's a lot easier to automate the deploy and configure
21:24:52 <oneswig> someone's gone to the trouble of building and packaging. Maybe even testing
21:25:49 <trandles> I'm leaning more and more to the side of containerized the world, stop provisioning clusters, use Spack to manage build and runtime complexities
21:25:56 <janders> oneswig: trandles: I don't have much hands-on with OHPC but my hope is to simplify things - using Bright Cluster Manager is cool, but creates a network of dependencies from hell and at times creates almost as many problems as it solves..
21:25:58 <trandles> *containerize
21:28:00 <oneswig> trandles: we've got some nodes running Fedora Atomic. There's basically no other way with those nodes. We are facing containerising BeeGFS OSS etc to create something hyperconverged
21:28:53 <trandles> feels like a sane way to manage a cluster in my opinion
21:29:21 <oneswig> I'd prefer to be all-in on one approach or the other.
21:29:32 <oneswig> but I don't disagree with you.
21:30:16 <trandles> looking at the tools available today, especially the cloud stuff for both hardware management and runtime portability, it seems like we can do a better job managing our HPC clusters
21:30:42 <trandles> or maybe I'm just jetlagged and tired
21:31:10 <oneswig> trandles: on a related note, there was some recent interest here in running OpenMPI jobs in Kubernetes. There's some prior work on this that passed our way. Ever seen this work?
21:31:35 <trandles> we haven't looked at k8s for much more than deploying services
21:31:47 <trandles> we have looked at lot at OpenMPI + linux namespaces
21:32:00 <oneswig> I'm hoping to understand the capabilities - and limitations
21:32:11 <trandles> OpenMPI has a large amount of internal voodoo
21:32:17 <oneswig> (especially if my talk goes through...)
21:32:26 <oneswig> trandles: it's not small!
21:32:42 <janders> oneswig: does your talk touch on RDMA in containers?
21:32:58 <trandles> it plays games in the name of efficiency and those games break with certain namespace constraints
21:33:14 <trandles> but we've run 1000+ node, 10000+ rank OpenMPI jobs using Charliecloud
21:33:19 <oneswig> janders: We need that. We need to understand what's hackery-bodgery vs what's designed for this purpose
21:33:43 <trandles> RDMA in containers is no different than RDMA on bare metal isn't it?
21:33:53 <janders> oneswig: +1
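[For readers unfamiliar with Charliecloud, a minimal sketch of the build-and-run cycle as it looked around the time of this discussion (the 0.9-era commands ch-build / ch-docker2tar / ch-tar2dir / ch-run; later releases reorganise some of these). The image name, build context and command are illustrative.]

    # Build an image from a Dockerfile, flatten it to a tarball, unpack it to
    # local scratch, then run a command inside the unpacked root.
    ch-build -t hello ./hello
    ch-docker2tar hello /var/tmp
    ch-tar2dir /var/tmp/hello.tar.gz /var/tmp
    ch-run /var/tmp/hello -- echo "hello from inside the container"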
21:34:06 <oneswig> trandles: How does Charliecloud work with (eg) orted? Does that run on the host, outside the container?
21:34:29 <trandles> depends on how everything is built
21:34:35 <janders> trandles: last time I checked (which was few months back) there were some issues with running multiple RDMA enabled containers on one bare-metal node
21:34:46 <oneswig> trandles: I'm not sure how it works with the /dev objects that get opened for RDMA access
21:34:49 <trandles> the easiest is building your resource manager with PMI support and no orted is required
21:35:14 <trandles> janders: probably depends on the container runtime
21:35:16 <janders> oneswig: exactly... the /dev challenge..
21:35:28 <trandles> Charliecloud bind mounts /dev into your namespace
21:35:48 <trandles> docker does the wrong thing for HPC (IMO) by creating it's own /dev entries
21:35:51 <trandles> *its
21:35:51 <oneswig> trandles: thanks. Good to know. I'm wondering how K8S + OpenMPI achieves this
21:35:56 <oneswig> (ie, without slurm)
21:36:13 <oneswig> trandles: is that the private devices thing?
21:36:29 <trandles> we can launch using mpirun but you start to have OpenMPI version mismatch issues inside vs. outside
21:36:56 <janders> trandles: which runtime is better than docker for RDMA in containers?
21:37:13 <trandles> our position is, for the HPC use case, you want as little abstraction/isolation as possible
21:37:38 <trandles> Shifter and Charliecloud both have full RDMA support
21:37:55 <trandles> I assume Singularity does to, but I haven't looked
21:37:55 <janders> trandles: great, thank you
21:38:21 <trandles> at one point Singularity was playing games to pass things between the container and the bare metal, but Shifter and Charliecloud do not
21:38:41 <oneswig> trandles: Can we pin you down to a date to talk this over in great depth?
21:38:59 <trandles> oneswig: would like that, yes
21:39:08 <janders> +1!!!
21:39:32 <trandles> August 7 works
21:39:49 <trandles> (no travel in August!!! :) )
21:40:33 <oneswig> Excellent - although I'll be unable to join you that day (holidays)
21:40:50 <trandles> 21st works too
21:41:02 <oneswig> Perfect for me :-)
21:41:07 <trandles> or I could do a separate WebEx for StackHPC ;)
21:41:57 <oneswig> trandles: we really need to get this installed in our OpenHPC environment before we can ask useful questions
21:42:38 <martial__> Aug 21, will publicize it
21:42:56 <oneswig> On the Ansible+OpenHPC side, I'm hoping to gather a few interested sites together
21:43:00 <martial__> maybe we can do a video meetup
21:43:15 <oneswig> janders: sounds like you're in?
21:43:19 <janders> sounds great!
21:43:27 <janders> I can organise a Google Meet if that is of interest
21:43:44 <oneswig> martial__: could be good. Then Tim could end by singing the Charliecloud Song
21:43:48 <martial__> I can host a goto meeting too
21:44:05 <martial__> 101 users enough you think?
21:44:15 <trandles> <damn, need to write a song>
21:44:34 <martial__> (hey isn't that your plan with Mike in a few minutes? :) )
21:45:34 <oneswig> martial__: trandles: jmlowe: if you see Evan Bollig later, can you gauge his interest in Ansible+OpenHPC?
21:45:45 <jmlowe> Will do
21:46:20 <oneswig> Thanks!
21:46:42 <oneswig> OK, I'll get that going in the next few days. Good stuff.
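[Picking up trandles' point above that a PMI-enabled resource manager removes the need for orted, plus the /dev discussion: a hedged sketch of the two launch styles. Node counts, image paths and device names are illustrative, and whether --mpi=pmi2 (or pmix) applies depends on how Slurm and the containerised Open MPI were built.]

    # Slurm + Charliecloud: PMI wires up the ranks, and ch-run bind-mounts /dev,
    # so the RDMA device nodes inside the container are simply the host's.
    srun -N 4 -n 96 --mpi=pmi2 ch-run /var/tmp/mpi_app -- /app/bin/mpi_app

    # Docker-style equivalent: the Verbs devices have to be passed in explicitly,
    # along with an unlimited memlock for registered memory.
    docker run --rm -it --ulimit memlock=-1 \
        --device=/dev/infiniband/uverbs0 --device=/dev/infiniband/rdma_cm \
        centos:7 bash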
21:47:12 <oneswig> Lets move on,
21:47:20 <oneswig> #topic Ceph days Berlin
21:47:42 <oneswig> Hooray, if you were worried that a 3-day OpenStack summit was too short, I have the solution for you
21:48:14 <janders> excellent. I will aim to rock up early for this event :)
21:48:16 <trandles> \o/
21:48:19 <oneswig> #link Ceph day Berlin, Monday 12 November https://ceph.com/cephdays/ceph-day-berlin/
21:48:41 <oneswig> janders: What better antidote to jet lag?
21:48:42 <jmlowe> Side note: I'm starting to really like the cephfs manila ganesha nfs I have going now.
21:48:45 <janders> ...and hopefully good RDMA support will arrive even before me... :)
21:49:29 <oneswig> jmlowe: ganesha is the piece we've not used - our clients are given slightly more trust and access CephFS direct
21:49:41 <janders> jmlowe: very interesting! can you tell us bit more about it? (use cases, performance, security)
21:49:47 <oneswig> I'm noting your positive experience...
21:50:05 <janders> sounds like another great topic for a presentation here? :)
21:50:28 <oneswig> janders: +1 from me
21:50:46 <martial__> janders: so you are offering to do it, very well I will remember this ;)
21:51:05 <jmlowe> security is ip address in exports, ebtables prevents eves dropping, random uuid for export path as a kind of shared secret
21:51:53 <janders> martial: unfortunately I don't have much to say on this topic for the time being
21:51:58 <jmlowe> performance wise in relatively limited testing I can max out my 10GigE
21:52:28 <oneswig> jmlowe: that's better than I'd expected, nice job.
21:53:06 <oneswig> Ganesha is running locally on the hypervisor, right? What's the route between Ganesha and the client VM - is it that VIRTIO_SOCK thing?
21:53:19 <jmlowe> I was pleasantly surprised, metadata on nvme backed pool
21:53:37 <janders> jmlowe: maxing out 10GE - is that over multiple clients, or can a single client reach that?
21:53:43 <jmlowe> 264 spinning rust bluestore osd's with mimic
21:53:48 <jmlowe> single client
21:53:55 <janders> impressive!
21:54:19 <oneswig> jmlowe: I've been cursing mimic recently due to some weirdness with ceph-volume failing to deploy new OSDs
21:54:36 <janders> +1 to oneswig's VIRTIO_SOCK question
21:54:53 <jmlowe> !!!! I'm waiting on 13.2.1 so I can get a failed disk back in
21:54:55 <openstack> jmlowe: Error: "!!!" is not a valid command.
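[For context on jmlowe's CephFS-via-NFS shares above: a rough sketch of the tenant-facing workflow, assuming a Manila backend already configured with the CephFS driver in NFS (Ganesha) mode. The share type, share name, size and client IP are illustrative.]

    # Create a 1 GiB NFS share backed by CephFS (the share type name is site-specific).
    manila create --share-type cephfsnfstype --name scratch nfs 1
    # Grant a client by IP address - the "ip address in exports" model described above.
    manila access-allow scratch ip 10.0.0.15
    # Show the export location, including the randomised path used as a kind of shared secret.
    manila share-export-location-list scratch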
21:55:39 <oneswig> silly old openstack...
21:55:54 <oneswig> OK, we should wrap up...
21:55:58 <oneswig> #topic AOB
21:56:05 <oneswig> So what else is new?
21:56:35 <janders> UFM6 is out, seems to work for SDN/IB
21:56:55 <janders> I need to track down some dependency issues but it's most likely not UFM's fault
21:56:58 <oneswig> I talked on Ceph RDMA with the team at CERN, and also MEERKAT in South Africa. There's plenty of interest in this, in the right places
21:57:10 <janders> excellent to hear!
21:57:15 <trandles> janders: I haven't forgotten about IB + SDN, I've just been too busy with travel
21:57:19 <oneswig> janders: what changes did you need to apply for new UFM?
21:57:33 <janders> oneswig: none. Drop-in replacement.
21:57:38 <oneswig> phew
21:57:49 <janders> but more detailed testing to follow
21:58:02 <jmlowe> sysbench 16 threads random write written, MiB/s: 13560.71, read, MiB/s: 14416.96
21:58:02 <oneswig> That chain of services is long enough without being cranky along with it
21:58:03 <janders> I think 6 has some resiliency enhancements
21:58:28 <janders> though I don't think the uppercase/lowercase thing made it into UFM6 GA
21:59:14 <janders> had some issues with uppercase GUIDs in the past, but that's probably not worth discussing here
21:59:15 <oneswig> janders: thanks for the heads up on that, seems like we (luckily) dodged a howler on our setup
21:59:32 <oneswig> jmlowe: > 14 GB/s?
21:59:33 <janders> trandles: no worries, feel free to reach out any time
21:59:50 <jmlowe> there's some caching in there
22:00:01 <janders> oneswig: our blade chassis only speak uppercase for MACs/GUIDs hence we hit this at full speed. It hurt :)
22:00:02 <jmlowe> too short of a test to fill them
22:00:17 <oneswig> Ah, thanks
22:00:23 <jmlowe> 'sysbench --test=fileio --file-fsync-freq=0 --file-test-mode=rndrd --num-threads=16 run'
22:00:27 <oneswig> Sorry y'all, we are at time
22:00:43 <janders> Thank you all! Don't forget to cast your votes
22:00:45 <oneswig> PEARC18 folks, have a good conference
22:00:51 <janders> I will put in mine as soon as I get into the office
22:00:52 <martial__> thanks Stig
22:00:56 <oneswig> #endmeeting
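[A note on reproducing the sysbench figures quoted above: the fileio test needs a prepare step to lay down its test files before the quoted run line, and a cleanup afterwards. The total file size here is illustrative; it should exceed RAM/page cache if the caching effect jmlowe mentions is to be avoided.]

    # Lay down the test files, run the random-read test, then clean up.
    sysbench --test=fileio --file-total-size=128G prepare
    sysbench --test=fileio --file-total-size=128G --file-fsync-freq=0 --file-test-mode=rndrd --num-threads=16 run
    sysbench --test=fileio --file-total-size=128G cleanup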