11:00:47 <oneswig> #startmeeting scientific-sig
11:00:49 <openstack> Meeting started Wed Jun 19 11:00:47 2019 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
11:00:50 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
11:00:52 <openstack> The meeting name has been set to 'scientific_sig'
11:01:00 <oneswig> Hello
11:01:13 <verdurin> Hello.
11:01:16 <oneswig> #link Agenda for today https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_June_19th_2019
11:01:19 <janders> g'day all!
11:01:37 <oneswig> hi verdurin janders, glad you could make it
11:02:00 <oneswig> janders: have you seen martial at ISC?
11:02:15 <oneswig> #topic ISC roundup
11:02:29 <janders> oneswig: I haven't attended this year, unfortunately
11:02:41 <oneswig> martial: guten tag :-)
11:02:42 <martial> Hello
11:02:48 <oneswig> #chair martial
11:02:49 <openstack> Current chairs: martial oneswig
11:02:55 <oneswig> janders: ah, for some reason I thought you were going
11:02:59 <oneswig> my mistake
11:03:06 <janders> a few of my colleagues were there though
11:03:12 <janders> we scored a decent IO500 score :)
11:03:25 <oneswig> as did we :-)
11:03:36 <martial> janders: coming to Shanghai btw?
11:04:01 <oneswig> janders: bracewell?
11:04:09 <martial> Actually talked to the organizers by chance last night
11:04:15 <martial> Mentioned the BoF
11:04:27 <martial> Was told it is committee-driven
11:04:47 <janders> that's right! :)
11:04:52 <oneswig> The BoF from ISC?
11:04:57 <martial> Yep
11:05:33 <janders> still thinking whether I should aim for Shanghai or not. The political climate right now is so-so. Good destination though!
11:06:22 <martial> The Chinese teams here are interesting, they have a demo of Sugon too
11:06:36 <martial> Really nice
11:07:00 <oneswig> martial: what are they demonstrating?
11:07:25 <martial> Mostly their different data centers
11:07:58 <martial> And a sales team explains how you can use it in your setup
11:08:10 <janders> looking at IO500 - you have the fastest Lustre, while we have the fastest BeeGFS :)
11:08:28 <oneswig> Calling that evens, janders?
11:08:37 <martial> Very interesting hardware
11:08:55 <martial> BeeGFS is very popular here
11:09:17 <janders> the 10-node challenge was close to even :)
11:09:55 <janders> on the main list, a different story though. Very well done!
11:09:56 <oneswig> I think a lot of the tuning at Cambridge was done on striping for large server configurations.
11:10:31 <oneswig> Thanks janders, although I don't think we were involved in the performance tuning at all, just the dynamic orchestration
11:10:36 <janders> Nearly double the score of second place. That's something! :)
11:11:10 <oneswig> It was about half the score of the (then) top place in the last list, which goes to show what tuning can do for Lustre!
11:11:27 <janders> your good advice helped us out with making some early architectural decisions - so thank you! :)
11:11:30 <janders> indeed
11:11:47 <janders> pretty amazing IOPS numbers too - I didn't know Lustre could do that
11:12:13 <janders> aggregate bandwidth - hell yeah, but IOPS? That's something quite new.
11:12:26 <martial> Am going to walk around and read ;)
11:13:14 <oneswig> martial: what's the news on HPC cloud at ISC?
11:13:20 <janders> storage aside, what were the ISC highlights?
11:13:23 <janders> exactly :)
11:14:41 <martial> Trend is toward Arm (very present), AMD's new processors and BeeGFS
11:14:53 <martial> Nvidia is not present, which is surprising
11:15:35 <oneswig> not at all?
11:16:15 <martial> A couple of Nvidia people for talks but that is it
11:16:38 <oneswig> That's surprising to me
11:16:52 <martial> Same here
11:16:53 <janders> esp given the mlnx acquisition..
11:17:12 <martial> Mellanox is here
11:17:33 <martial> But no Nvidia "themselves"
11:19:43 <martial> Oracle, IBM, HP, Microsoft, ...
11:20:08 <oneswig> How much public cloud presence is there?
11:20:19 <martial> Google Cloud
11:20:37 <martial> AWS
11:20:44 <martial> Microsoft
11:20:55 <oneswig> have Google come up with an HPC offering yet? It's well overdue
11:21:15 <martial> Actually they had a booth talk about this yesterday
11:22:09 <martial> Am looking for the name
11:22:31 <HPCJohn> JohnH here. Interested in what Google are doing
11:22:32 <oneswig> While you're digging that up, let's shift over to HPCW
11:22:38 <martial> It provisions HPC workloads
11:22:48 <oneswig> Hi HPCJohn :-)
11:23:21 <oneswig> martial: was there anything like the BoF we proposed?
11:23:46 <martial> Nothing comparable
11:24:39 <oneswig> huh.
11:24:41 <janders> Google HPC offering = k8s DCaaS? :)
11:25:18 <martial> #link https://cloud.google.com/solutions/hpc/
11:25:33 <martial> In the videos
11:25:51 <oneswig> janders: would love to see your BeeGFS tuning make its way into https://galaxy.ansible.com/stackhpc/beegfs
11:26:06 <martial> Basically deploys Slurm and other tools and you can use it
11:26:28 <oneswig> martial: it's bobbins unless they've done something to improve the networking
11:26:37 <martial> As a regular cloud with HPC tools pre-installed
11:27:26 <janders> oneswig: nice! :) we're far behind with ansible-isation of BeeGFS but hoping to catch up :)
11:28:02 <martial> Seemed more of a use case at the demo I saw, but interesting
11:28:22 <martial> As for HPCW, it is happening tomorrow
11:28:44 <martial> Quite an interesting program
11:28:54 <oneswig> martial: ah, so my round-up agenda item was premature!
11:28:58 <martial> With 5-minute slots for most presenters
11:29:24 <janders> yeah Thursdays at ISC can be quite interesting :)
11:29:25 <martial> It's okay, I should have access to all the slide decks
11:29:32 <martial> And should be able to share
11:29:33 <oneswig> martial: how many 5-minute presentations?
11:30:04 <martial> #link http://qnib.org/isc/
11:30:14 <martial> That's the program so far
11:30:18 <janders> sounds like a lightning-talk-storm over Frankfurt.. :)
11:30:53 <martial> Maybe I could add a slide on Kayobe
11:31:19 <oneswig> That would be cool martial, let me know if you need resources for that
11:31:30 <oneswig> So what's Christian up to - apart from hosting HPCW?
11:31:39 <HPCJohn> I know Christian. Good guy
11:32:06 <HPCJohn> If it is relevant, the latest version of Singularity came out yesterday. Lots of OCI features.
11:32:24 <HPCJohn> Also you can check a Singularity container directly out of a Docker harbour, as I understand
11:33:10 <HPCJohn> The direct quote, so I get it right: "an oras:// URI allowing you to push and pull SIF files to supported OCI registries!"
11:33:33 <oneswig> I was talking to someone about the latest on containers, MPI, Kubernetes, etc.
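[Editor's note: a minimal sketch of the oras:// workflow HPCJohn describes above, assuming a recent Singularity (3.2 or later) is installed and the target registry accepts ORAS artifacts. The registry host, namespace, tag and file names are illustrative placeholders, not details from the meeting.

    import subprocess

    # Hypothetical registry reference - point this at a real OCI registry
    # that supports ORAS before running.
    ref = "oras://registry.example.com/myproject/lolcow:latest"

    # Push a locally built SIF file to the registry...
    subprocess.run(["singularity", "push", "lolcow.sif", ref], check=True)

    # ...and pull it back down as a new file.
    subprocess.run(["singularity", "pull", "lolcow-copy.sif", ref], check=True)
]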
11:34:13 <martial> Christian is starting at Amazon in July, I think
11:34:13 <oneswig> #link Latest on MPI configuration with Singularity and K8S https://pmix.org/wp-content/uploads/2019/04/PMIxSUG2019.pdf
11:35:03 <martial> Stig, I will follow up on Slack
11:35:21 <martial> A one-pager for Kayobe would be cool indeed
11:35:32 <oneswig> martial: I'll see what I can find for you
11:36:09 <oneswig> OK, so we can't cover HPCW just yet but I was definitely at the Sanger Centre OpenStack day yesterday...
11:36:21 <oneswig> #topic Sanger OpenStack day round-up
11:36:40 <oneswig> Had a good session there, some interesting talks
11:36:56 <verdurin> Missed it again... Eheu.
11:37:13 <oneswig> Gerry Reilly from Health Data Research UK gave a good high-level presentation on secure computing environments
11:37:51 <oneswig> Had very similar thoughts in the abstract to the kind of projects going on across the research computing / OpenStack space
11:38:28 <oneswig> There were some interesting updates from the Sanger team themselves on their giant Ceph deployment, and plans for more science and more data on OpenStack
11:38:56 <oneswig> I mentioned you verdurin - in reference to your description of "sidegrades" a couple of years ago when you had your Crick hat on
11:39:01 <verdurin> Was at the HDRUK meeting last week when those environments were being discussed.
11:39:27 <verdurin> Gosh.
11:39:45 <oneswig> verdurin: interesting. Is there any scope for alignment on these matters?
11:39:48 <martial> Stig, sent you a PM on Slack
11:40:23 <oneswig> Jani from Basel presented on something similar in Switzerland - SPHN - but with federation involved
11:41:17 <verdurin> There is, though my impression is that there's still a diversity of viewpoints.
11:42:20 <oneswig> Bruno from the Crick presented on eMedlab - as an early pioneer it has plenty of hindsight to offer
11:43:07 <oneswig> verdurin: it would be interesting to hear how those discussions develop.
11:43:41 <HPCJohn> Me too please - hindsight is a wonderful thing.
11:43:55 <HPCJohn> I have followed eMedlab from its early days
11:45:26 <oneswig> I hadn't realised there was 512G RAM in each hypervisor - that's quite something, particularly in a machine that's >3 years old
11:46:56 <oneswig> johnthetubaguy presented on upstream developments for research computing use cases and I presented on some recent Rocky upgrade experiences
11:47:57 <oneswig> It was good to hear an update on the scale of the JASMIN system and their new Mirantis OpenStack deployment too.
11:49:26 <oneswig> #topic AOB
11:49:32 <oneswig> So what else is new?
11:49:40 <HPCJohn> JASMIN is interesting. Scientists not using public cloud, but gathering research grants into a central facility
11:50:08 <oneswig> HPCJohn: It does appear to be a success story.
11:50:21 <HPCJohn> I can see that model being applied more widely - I guess in climate research there is no patient-identifiable data!
11:51:08 <oneswig> That niche field intersecting climate science and genomics - seasonal affective disorder, perhaps?
11:52:21 <janders> from my side - a little follow-up from Denver
11:52:22 <janders> https://bugs.launchpad.net/neutron/+bug/1829449
11:52:23 <openstack> Launchpad bug 1829449 in neutron "Implement consistency check and self-healing for SDN-managed fabrics" [Wishlist,New]
11:52:28 <janders> some progress on the SDN front
11:52:56 <janders> tags: added: rfe-approved
11:53:03 <janders> thanks to those involved! :)
11:53:16 <oneswig> The glacier of development grinds ever onwards
11:54:02 <oneswig> Good work janders, keep it up!
11:54:08 <oneswig> We had an interesting problem using Docker and the overlay storage driver
11:54:38 <oneswig> It turns out that you can exceed 64K hard links to a single inode!
11:55:34 <HPCJohn> Some filesystems hide small files inside the inode
11:55:47 <HPCJohn> You are turning this on its head - hiding a filesystem in the inode
11:55:49 <oneswig> In yum, there is a file called checksum_type in the metadata of each package which is hard-linked to one source, i.e. the hard link count is a function of the number of packages installed
11:56:20 <oneswig> The layers of Docker overlays, and multiple containers using the same base layers, will compound this.
11:56:33 <oneswig> Hence we hit 65536 links - the limit for ext4
11:56:40 <oneswig> Solution: XFS allows more links :-)
11:57:20 <janders> I will have that with triple-nested VMs, thanks :)
11:57:34 <oneswig> The unexpected consequences of creating many virtual systems from one physical one.
11:58:06 <oneswig> We are nearly at the hour, any more to add?
11:58:22 <janders> some time back I was trying to convince mlnx to implemented SRIOV in nested-virt, but they didn't share my enthusiasm :)
11:58:30 <oneswig> janders: we may start work soon on HA support for Mellanox-NEO-UFM SDN
11:58:39 <janders> s/implemented/implement
11:58:54 <janders> cool! I'm heading in that direction, too
11:59:08 <oneswig> see you at the first hurdle then :-)
11:59:16 <janders> just hitting some roadblocks lower down the stack with new hardware
11:59:19 <janders> that's right! :)
11:59:23 <janders> no rest for the wicked
11:59:32 <martial> Cool, talk soon team
11:59:40 <oneswig> And on that note, back to it.
11:59:42 <janders> thank you all
11:59:45 <oneswig> Thanks all
11:59:48 <oneswig> #endmeeting
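[Editor's note: a small illustration of the hard-link growth oneswig describes above. Python's os.link() and os.stat() show the per-inode link count rising as more names are linked to one file; on ext4 that count has a hard cap of roughly 64K (the limit hit by the yum metadata under Docker overlays), while XFS permits far more. The directory and file names are illustrative only.

    import os
    import tempfile

    # Create one target file and a handful of hard links to it, mimicking
    # yum's per-package checksum_type files all linking back to one source.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "checksum_type")
        open(target, "w").close()

        for i in range(5):
            os.link(target, os.path.join(tmp, f"pkg{i}_checksum_type"))

        # st_nlink counts the original name plus each hard link (6 here).
        # On ext4 this number cannot exceed the filesystem's hard-link limit,
        # whereas XFS allows a much larger count.
        print(os.stat(target).st_nlink)
]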