21:01:01 #startmeeting scientific_wg
21:01:02 Meeting started Tue Jun 13 21:01:01 2017 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:01:03 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:01:05 The meeting name has been set to 'scientific_wg'
21:01:10 Greetings all
21:01:12 hello everyone!
21:01:21 Hey, Bob made it!
21:01:21 Hi Stig, everyone
21:01:29 Hi, Stig, everyone
21:01:31 #chair martial
21:01:31 Current chairs: martial oneswig
21:01:33 Martial, Stig
21:01:33 Hello
21:01:38 yes, might have to bow out early for daycare pickup though ;)
21:01:39 Hi Xiaoyi-Lu-OSU_, thanks for coming
21:01:42 Hi hi everyone
21:01:47 sure you are welcome
21:01:52 Hi, Stig
21:01:56 #link Today's agenda https://wiki.openstack.org/wiki/Scientific_working_group#IRC_Meeting_June_13th_2017
21:02:13 OK let's get the show on the road
21:02:17 Blair here yet?
21:02:36 OK, let's get started
21:02:47 #topic RDMA-enabled Big Data
21:03:01 Hello DK_ Xiaoyi-Lu-OSU_ thanks both for coming
21:03:13 Hello everybody
21:03:23 #link presentation for today is http://www.hpcadvisorycouncil.com/events/2017/swiss-workshop/pdf/Tuesday11April/DKPanda_BigDataMeetsHPC_Tue04112017.pdf
21:03:38 We can talk today about OSU's work
21:03:52 on optimising HPDA platforms with RDMA
21:03:55 Evening.
21:04:03 Hi verdurin, welcome
21:04:46 Xiaoyi-Lu-OSU_: DK_: can you talk about how long you've been working on this project and give an overview of what you've done?
21:05:08 We have been working on this project for the last four years.
21:05:43 The broad idea is to exploit HPC technologies (including RDMA) to accelerate Hadoop, Spark and Memcached.
21:06:16 Recently, we have also been exploring virtualization support for these stacks with SR-IOV and OpenStack
21:06:16 Hadoop and Spark - presumably big lumps of Java - how do you do that?
21:06:42 For Hadoop, we have designs for different components
21:06:52 say, RPC, MapReduce, and HDFS
21:06:55 I'm looking at the box called "OSU design" in the new network stack (slide 11)
21:07:10 They are designed with Java + JNI + native C libraries
21:07:52 Are there well established precedents for using JNI to do RDMA into a JVM?
21:07:57 For Spark, we currently also bring our RDMA design into the shuffle manager
21:08:49 hi all (bit late sorry, early morning dns issues)
21:08:57 Hi b1airo, good morning
21:09:01 #chair b1airo
21:09:02 Different groups are exploring different solutions. We chose JNI to have better control over the low-level verbs-based designs.
21:09:03 Current chairs: b1airo martial oneswig
21:09:58 Xiaoyi-Lu-OSU_: How big were the changes for the Hadoop and Spark components - is this a major change or is it well layered?
21:10:25 First, I think it is well layered.
21:10:53 For example, we implement our RDMA designs as plugins for these components.
21:11:11 We don't want to change too many lines of code inside the original codebase.
21:11:30 That's why we are able to support the Apache distribution of Hadoop, as well as CDH and HDP.
21:11:36 So it's maintainable? Sounds promising.
21:12:11 Yes, it is maintainable. And we don't change any existing APIs for these components
21:12:39 Something I missed was the acronym HHH for Hadoop - what is that? (slide 14)
21:12:40 We also keep the existing architecture intact.
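
As background to the Java + JNI + native C layering described above, here is a minimal sketch of the general pattern: a Java class declares native methods and loads a C library that would implement the verbs-level RDMA calls. The class and library names (RdmaChannel, libnativerdma) are purely hypothetical illustrations, not the actual HiBD plugin classes.

    // Minimal JNI sketch (hypothetical names, not the actual HiBD classes).
    // The Java side declares native methods; a C library loaded at class
    // initialisation would implement them on top of ibverbs queue pairs.
    public final class RdmaChannel {

        static {
            // Loads libnativerdma.so (or .dll/.dylib) from java.library.path.
            System.loadLibrary("nativerdma");
        }

        // Set up a connection/queue pair to a remote peer (implemented in C).
        public native long connect(String host, int port);

        // Post a send of the given buffer region over the RDMA connection.
        public native int send(long connHandle, byte[] buf, int off, int len);

        // Release the native resources backing this connection.
        public native void close(long connHandle);
    }

A higher layer (for example an HDFS or shuffle-manager plugin) would call a wrapper like this instead of the default socket path, which is broadly how an RDMA plugin can be slotted in without changing the public Hadoop or Spark APIs.
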
21:13:16 HHH means A Hybrid Approach to Accelerate HDFS on HPC Clusters with Heterogeneous Storage Architecture
21:13:17 oneswig: are there slides somewhere i missed?
21:13:31 Hybrid, HDFS, and Heterogeneous are three key words
21:13:32 these ones b1airo: http://www.hpcadvisorycouncil.com/events/2017/swiss-workshop/pdf/Tuesday11April/DKPanda_BigDataMeetsHPC_Tue04112017.pdf
21:13:52 Hybrid between what and what?
21:14:58 Hybrid means different I/O paths among hard disks, SSD, RAM disk, and parallel filesystems.
21:15:12 More details can be found in this paper: Triple-H: A Hybrid Approach to Accelerate HDFS on HPC Clusters with Heterogeneous Storage Architecture
21:15:20 which was presented at CCGrid 2015
21:15:26 Is there a URL?
21:15:50 Here it is: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7152476
21:16:05 from the IEEE Digital Library
21:16:48 Thanks Xiaoyi-Lu-OSU_
21:16:56 No problem.
21:16:57 #link http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7152476
21:18:20 the benchmark results (slide 26 onwards) look compelling. How does Ethernet (10G or 50G) fit on these graphs - have you tested?
21:18:56 we have some initial evaluations with 10GigE, but we have not done anything for 50G
21:19:14 RoCE 10GE would be interesting if you've done that
21:19:52 We don't have a large testbed with RoCE 10GE
21:20:05 but anybody can do this study with our libraries since we support RoCE also
21:20:21 users can just download them from this link: http://hibd.cse.ohio-state.edu/
21:20:25 as indicated in slide 12
21:21:32 #link http://hibd.cse.ohio-state.edu/
21:21:36 Is the project open source?
21:21:52 not yet.
21:22:10 The OHB benchmarks are open-sourced already
21:22:26 Do you have plans for the rest?
21:22:36 yes, in future
21:22:47 great.
21:22:52 large scale RoCE workloads may not work well anyway, unless you were using MOFED 4.0, depending on application RDMA QP requirements
21:23:10 b1airo: did you get that bond issue fixed that was blocking you?
21:24:01 Xiaoyi-Lu-OSU_: These results all look great. What are the cases where it doesn't perform well? :-)
21:24:36 oneswig: it was supposed to be fixed in 4.1 but we haven't checked it again yet
21:24:39 for some workloads, if they are not communication intensive, then you may not be able to see obvious benefits
21:26:03 Why did you base your benchmarks on IPoIB? Was that because of what you had available?
21:26:41 No, for IPoIB, we can evaluate our enhanced design against the default design on the same InfiniBand hardware
21:27:01 we believe this is a fair way to compare them.
21:27:11 makes sense
21:27:40 it would be very interesting, and probably give the work a greater audience and applicability, to see more results over an Ethernet fabric
21:27:56 Do you know these people: https://gaia.ac.uk/gaia-uk/ioa-cambridge/dpci - website says 108 nodes but that's a previous generation. IIRC they have 200+ nodes running IPoIB Hadoop
21:28:03 Do you mean RoCE?
21:28:29 Yes, RoCE versus regular TCP over the same high-speed Ethernet
21:28:31 Xiaoyi-Lu-OSU_: I'd say so
21:29:20 Yes, we can actually support that type of comparison as well with our packages
21:29:35 like I said earlier, we support native RoCE or IB
21:29:43 it can be configured with our libraries
21:30:03 So - OpenStack - how do they integrate so far? You build on an OFED image, so that's easy enough.
21:30:06 Xiaoyi-Lu-OSU_: do you have any special relationship with Mellanox, e.g., a Centre of Excellence?
21:30:09 we are able to run three modes: default TCP/IP, native IB, native RoCE
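
For the point above that the transport can be configured in the HiBD libraries, here is a minimal sketch of what selecting one of the three modes might look like through the standard Hadoop Configuration API. The property names hadoop.ib.enabled and hadoop.roce.enabled are recalled from the HiBD userguide and should be treated as assumptions; the userguide at http://hibd.cse.ohio-state.edu/userguide/ is authoritative, and in practice such settings would normally be placed in core-site.xml across the cluster rather than set per job.

    import org.apache.hadoop.conf.Configuration;

    // Sketch of switching the RDMA-Hadoop transport mode programmatically.
    // Property names are assumptions recalled from the HiBD userguide;
    // verify them against http://hibd.cse.ohio-state.edu/userguide/.
    public class InterconnectModeExample {
        public static void main(String[] args) {
            Configuration conf = new Configuration();

            // Default TCP/IP mode: leave both flags unset.
            // Native InfiniBand verbs:
            conf.setBoolean("hadoop.ib.enabled", true);
            // Native RoCE instead (disable the line above and enable this):
            // conf.setBoolean("hadoop.roce.enabled", true);

            System.out.println("hadoop.ib.enabled = "
                    + conf.getBoolean("hadoop.ib.enabled", false));
        }
    }
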
21:30:50 you can go to this page: http://hibd.cse.ohio-state.edu/userguide/
21:30:59 Are the last two via MVAPICH2?
21:30:59 they might be able to get you access to a reasonable Ethernet test-bed
21:31:02 to get all the configuration information from our userguides for various components
21:31:21 yes, we have worked closely with Mellanox folks
21:31:43 MVAPICH2 is a separate project
21:31:52 that's great!
21:32:01 we would love to get access
21:32:51 Xiaoyi-Lu-OSU_: OK thanks re: MVAPICH2. How have you integrated with OpenStack to date?
21:33:39 So, for OpenStack integration, we are using Heat to develop a standard deployment template
21:33:54 Is this the appliance for Chameleon?
21:34:03 this template will help users to set up all required dependencies as well as install and configure our libraries automatically
21:34:08 yes
21:34:15 it is available on Chameleon already
21:34:48 here is the information about our appliance
21:34:49 https://www.chameleoncloud.org/appliances/17/docs/
21:35:23 The appliance looks like it deploys a hypervisor first - is that correct?
21:35:39 yes
21:35:55 we will first allocate some bare-metal nodes and then deploy the KVM instances on top of them
21:36:10 then, a layer of RDMA-Hadoop will be set up on these VMs
21:36:28 Are the benchmarks you publish from SR-IOV VMs? Or bare metal?
21:36:59 except slide 48
21:37:07 others are taken from bare-metal nodes
21:37:38 the numbers on slide 48 are taken from SR-IOV VMs
21:37:57 We are looking at integrating HiBD into Sahara - on bare metal
21:38:04 Any advice?
21:38:45 We are still exploring Sahara. At this point, we don't know what kind of issues will be there
21:38:49 What my colleague Mark has found so far is that HiBD uses pretty new versions of everything, and Sahara's pinned on some pretty old versions.
21:38:58 if you find any problems, please feel free to contact us
21:39:04 and we will be happy to help
21:39:13 Thanks Xiaoyi-Lu-OSU_, will do!
21:39:21 Xiaoyi-Lu-OSU_: would the Heat template be easily editable to remove the bare-metal component, assuming we already had RDMA-capable KVM guests via Nova?
21:39:47 We keep upgrading our designs to newer versions of the codebase
21:39:51 I think generally it would be a great thing to have this project easily integrated into HPDA-on-demand
21:40:10 oneswig: we have Sahara in the Nectar test cloud at the moment, it's behind Trove for prod deployment currently
21:40:11 yes, that will be doable with Heat
21:40:11 oneswig: great idea
21:40:58 Xiaoyi-Lu-OSU_: Interesting to see how you've worked a software deployment inside a VM that is itself a software deployment
21:41:27 OK
21:41:29 :-)
21:42:10 I'm sure there's wide interest in this, for anyone with RDMA-enabled kit and an interest in data-intensive analytics
21:42:23 I agree
21:42:43 Xiaoyi-Lu-OSU_: do you already have contact with people from the Sahara project?
21:43:29 One person from the Sahara project talked to me earlier when I presented this work at the OpenStack Summit in Boston
21:43:49 they were also very interested in our designs.
21:44:02 If we can, let's find a way to get an OSU recipe into their work.
21:44:07 that's my main concern with Sahara... as I understand it Mirantis built it but then cut back the dev resources, and I'm not sure how much other community there is around it yet
21:44:35 b1airo: don't know either. It's very useful for us.
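
As background to the Triple-H / heterogeneous-storage discussion earlier in the meeting: stock HDFS exposes a much simpler form of tiering through storage policies. The sketch below uses the standard Hadoop DistributedFileSystem API, not OSU's Triple-H design, and assumes an HDFS cluster with SSD-tagged volumes; the NameNode URI and directory path are placeholders.

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    // Sketch: steer a directory's block placement onto faster media using
    // built-in HDFS storage policies (ONE_SSD, ALL_SSD, LAZY_PERSIST, ...).
    public class StoragePolicyExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Placeholder NameNode endpoint - substitute your own.
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
            DistributedFileSystem dfs = (DistributedFileSystem) fs;

            Path hotDir = new Path("/benchmarks/shuffle-input");
            dfs.mkdirs(hotDir);

            // Keep one replica on SSD, the remaining replicas on disk.
            dfs.setStoragePolicy(hotDir, "ONE_SSD");
            System.out.println("Storage policy set on " + hotDir);

            dfs.close();
        }
    }
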
21:45:39 Xiaoyi-Lu-OSU_: memcached - presumably much simpler. I wonder if this might be usable on the OpenStack control plane - have you ever tested it in a Keystone use case?
21:46:00 i agree it is useful, but i'm still on the fence about whether we need a specific service, i mean you could do something pretty similar with Murano packages
21:46:03 No, we didn't do that
21:46:16 or even just an external orchestration tool - Juju, Ansible, etc
21:46:46 b1airo: We do a fair bit with Heat wrapped up in Ansible. What these services get you is a dashboard panel.
21:46:57 Which helps for user friendliness
21:47:10 we have not explored these
21:48:31 I suppose it doesn't do much for you if Hadoop/Spark is part of a wider application platform - like http://opencb.org
21:48:47 oneswig: yes UI is important, but Murano could give you that too
21:49:10 b1airo: Isn't Murano in a similar boat to Sahara?
21:49:32 have you looked at the Sahara interface? there are a LOT of widgets, confusing even for someone with a vague idea of what they are doing
21:49:52 oneswig: dev/community wise? yes i suppose so
21:50:24 i suppose i'm thinking that not all of these projects will last and wondering what things are best to invest in
21:50:25 If OpenCB can run with default Hadoop/Spark, they should be able to run on our packages directly
21:50:44 b1airo: No I haven't - slightly worried that the interface might not be the panacea we hope for...
21:51:10 We are overrunning on time... any final questions WG?
21:51:13 the other option for UI is something like Ambari, but then that's a service atop your cloud, not integrated into it
21:52:05 b1airo: perhaps that's OK, in that the application platform is not locked into OpenStack...
21:52:22 heresy I know :-)
21:52:42 OK, we should cover other items.
21:52:51 OK
21:52:53 :-)
21:52:54 Thank you Xiaoyi-Lu-OSU_ and DK_ - very helpful
21:53:02 Thank you!!
21:53:04 Thanks everyone!
21:53:06 really good to have you come by!
21:53:12 yes thanks a lot!
21:53:19 really good coverage, truly appreciated
21:53:29 Martial do you have a roundup on ORC?
21:53:36 yes I do
21:53:39 #topic ORC roundup
21:53:46 ORC continues its discussion of the different topics identified during the original effort
21:53:52 #link https://drive.google.com/drive/folders/0B4Y7flFgUgf9dElkaFkwbUhKblU
21:53:58 Conversation is ongoing toward a federation effort for more than just OpenStack
21:54:04 There is a request to see what and who from the SWG can bring research user stories / information / effort to the initiative
21:54:12 The next meeting is going to be at the end of August in Amsterdam (dates to be finalized)
21:54:19 Stig, are you able/willing to go?
21:54:39 (physical meeting)
21:54:47 martial: sounds possible.
21:54:55 there is a weekly telecon on Mondays at 11am EST
21:54:59 I love a good kletskoppen
21:55:24 oneswig: will mention that to Kazil then, he will reach out to you once the official meeting is set
21:55:36 and that is it for the ORC roundup :)
21:55:45 OK, thanks martial
21:55:54 (yes I prepared my text :) )
21:56:10 Anyone going to ISC next week? I believe Xiaoyi-Lu-OSU_ and DK will be there...
21:56:42 Unfortunately I will not, but one of my colleagues is heading over
21:56:50 we are there
21:56:56 Xiaoyi-Lu-OSU_: great
21:56:56 hey oneswig, I'll be over at ISC!
21:57:09 i was going to but then realised how close it was to all the travel i've just had - need to stick around home for a while!
21:57:19 Hello powerd! You guys should meet up with Xiaoyi-Lu-OSU_?
21:57:58 we are happy to meet with you guys there
21:57:59 powerd: John's going to be there, perhaps he already mentioned that.
21:58:06 yeah that would be great - i sat in a workshop at SC about HiBD and have had it on my 'to test' list for too long
21:58:15 quick question - anyone got experience with transparent hugepages integration with HPC workloads?
21:58:21 yup, will be syncing with John too
21:58:41 b1airo: not that I'm aware of, we've been busy on bare metal these last few months
21:59:06 I had one final question
21:59:11 e.g., how to transparently make allocations use madvise etc., preferably with scheduler switches
21:59:13 #link Charliecloud article https://insidehpc.com/2017/06/charliecloud-simplifies-big-data-supercomputing-lanl/
21:59:26 trandles: Are you holding punched cards in the photo??
21:59:36 haha
21:59:38 lol
21:59:40 You guys are old school :-)
21:59:41 yeah, the caption wasn't correct
22:00:05 I'm holding equivalent pages representing the code bases for Docker, Shifter, Singularity
22:00:09 really floppy disks?
22:00:18 Reid is holding all of Charliecloud's source code
22:00:35 should the caption have read: "here's some guys we ran into outside the printer room"
22:00:42 yes basically
22:00:45 OK - on that happy note, time to take it away for another week...
22:00:56 Thanks all
22:00:59 #endmeeting