21:02:30 #startmeeting Scientific-SIG
21:02:31 Meeting started Tue Sep 3 21:02:30 2019 UTC and is due to finish in 60 minutes. The chair is martial. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:02:32 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:02:34 The meeting name has been set to 'scientific_sig'
21:03:06 cute informal meeting today as well :)
21:03:18 I'm making up the numbers by joining twice
21:03:24 trying to get my mac to cooperate
21:04:18 janders & janders42 : good man !
21:04:50 I am nowadays using irccloud, simpler way to go online
21:05:05 good idea - I should look at this, too
21:05:17 what are you up to these days martial?
21:05:45 trying to train some models for ML for a YOLO test
21:05:54 I'm fighting our GPFS hardware, finally winning
21:06:18 (looking up initial benchmark of what the kit is capable of)
21:06:38 Run status group 0 (all jobs):
   READ: bw=36.1GiB/s (38.7GB/s), 4618MiB/s-4745MiB/s (4842MB/s-4976MB/s), io=800GiB (859GB), run=21579-22174msec
Run status group 1 (all jobs):
  WRITE: bw=24.0GiB/s (26.8GB/s), 3194MiB/s-3224MiB/s (3349MB/s-3381MB/s), io=800GiB (859GB), run=31757-32057msec
[root@s206 ~]# cat bw.fio
# Do some important numbers on SSD drives, to gauge what kind of
# performance you might get out of them.
#
# Sequential read and write speeds are tested, these are expected to be
# high. Random reads should also be fast, random writes are where crap
# drives are usually separated from the good drives.
#
# This uses a queue depth of 4. New SATA SSD's will support up to 32
# in flight commands, so it may also be interesting to increase the queue
# depth and compare. Note that most real-life usage will not see that
# large of a queue depth, so 4 is more representative of normal use.
#
[global]
bs=10M
ioengine=libaio
iodepth=32
size=100g
direct=1
runtime=60
#directory=/mnt
filename=/dev/md/stripe
numjobs=8

[seq-read]
rw=read
stonewall

[seq-write]
rw=write
stonewall
[root@s206 ~]#
21:06:55 39GB/s read, 27GB/s write per node
21:06:58 there will be 6
21:07:01 with HDR200
21:07:04 so should be good
21:07:38 this is evolution of our BeeGFS design - more balanced, non-blocking in terms of PCIe bandwidth
21:07:41 that is quite good
21:08:08 unfortunately the interconnect doesn't fully work yet, but I'll be at the DC later this morning hopefully getting it to work
21:08:57 have you ever looked at k8s/GPFS integration?
21:09:27 this is meant to be HPC OpenStack storage backend and HPC storage backend but I think there's some interesting work happening in k8s-gpfs space
21:10:11 I confess we have not had the need for now
21:10:43 what are your reading recommendations on this topic?
21:11:09 I think all I had was some internal email from IBM folks
21:11:41 I hope to learn more about this in the coming months - and will report back
21:12:17 that would indeed be useful
21:13:09 maybe a small white paper on the subject?
21:14:04 #chair b1airo
21:14:05 Current chairs: b1airo martial
21:14:08 if it will be possible to remote into Shanghai SIG meetup, I can give a lightning talk about GPFS as OpenStack and/or k8s backend
21:14:11 hey Blair!
21:14:11 o/
21:14:21 issues connecting today sorry
21:14:45 another "interesting" thing I've come across lately is how Mellanox splitter cables work on some switches
21:14:46 how goes it janders42
21:14:49 not sure if Stig will have that capability at the Scientific SIG meet in Shanghai but I hope so
21:15:16 i'd be keen to see that lightning talk janders42
21:15:19 yeah that would be great! :) - unlikely I will be able to attend in person at this stage
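(Aside on the k8s/GPFS integration mentioned above: IBM publishes a Spectrum Scale CSI driver that lets Kubernetes dynamically provision volumes out of an existing GPFS filesystem. A minimal sketch of a GPFS-backed StorageClass is below; the filesystem name "gpfs0" and the cluster ID are placeholders, and parameter names can vary between driver releases, so treat this as illustrative rather than a tested recipe.)

# Sketch only: GPFS-backed dynamic provisioning via the Spectrum Scale CSI driver.
# "gpfs0" and the clusterId value are placeholders, not values from the meeting.
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gpfs-scratch
provisioner: spectrumscale.csi.ibm.com
parameters:
  volBackendFs: "gpfs0"            # GPFS filesystem the volumes are carved out of
  clusterId: "0000000000000000000" # Spectrum Scale cluster ID (placeholder)
reclaimPolicy: Delete
EOF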
21:15:47 not sure if Blair will be there. I will not
21:15:54 (Shanghai)
21:16:13 I plugged a 50GE to 4x10gE cable into my mlnx-eth switch yesterday and enabled the splitter function on the port
21:16:35 and the switch went "port eth1/29 reconfigured. Port eth1/30 reconfigured"
21:16:48 erm... I did NOT want to touch port 30 - it's an uplink for an IPMI switch...
21:17:00 boom
21:17:03 another DC trip
21:17:10 nice "feature"
21:17:27 with some switches it is possible to use splitters and still use all of the ports
21:17:45 with others - the above happens - connecting a splitter automagically kills off the port immediately below
21:17:57 oh well lesson learned
21:18:06 hopefully will undo this later this morning
21:18:52 or maybe you will lose your uplink ...
21:19:25 (that's why we love technology :) )
21:19:43 no, i won't make it to Shanghai. already have a couple of trips before the end of the year and another to the States early Jan
21:20:15 joining us in Denver ?
21:20:21 Denver = SC?
21:20:36 janders42: iirc that functionality is documented regarding the breakout ports
21:21:04 yeah... @janders
21:21:07 RTFM
21:21:07 :D
21:21:21 just quite counter-intuitive
21:21:46 SC yes
21:21:50 reconfiguring the port requires a full switchd restart and associated low-level changes, which interrupts all the way down to L2 i guess
21:22:11 SC yes
21:22:20 just looked at the calendar and SC19 and Kubecon clash
21:22:27 I'm hoping to go to Kubecon
21:22:41 otherwise would be happy to revisit Denver
21:22:59 how long of a trip is it for you Blair to get to LAX? 14 hrs? Bit closer than from here I suppose..
21:23:54 yeah bit closer, just under 12 i think
21:23:58 nice!
21:24:17 usually try to fly via San Fran though
21:24:29 yeah LAX can be hectic
21:24:35 I quite like Dallas
21:24:37 very smooth
21:24:54 good stopover while heading to more central / eastern parts of the States
21:25:25 not optimal for San Diego though - that'd be a LAX/SFO connection
21:25:52 (like I said a very AOB meeting today ;) )
21:26:02 since Blair is here I will re-post the GPFS node benchmarks
21:26:06 :-)
21:26:12 yes please
21:26:13 Run status group 0 (all jobs):
   READ: bw=36.1GiB/s (38.7GB/s), 4618MiB/s-4745MiB/s (4842MB/s-4976MB/s), io=800GiB (859GB), run=21579-22174msec
Run status group 1 (all jobs):
  WRITE: bw=24.0GiB/s (26.8GB/s), 3194MiB/s-3224MiB/s (3349MB/s-3381MB/s), io=800GiB (859GB), run=31757-32057msec
[root@s206 ~]# cat bw.fio
# Do some important numbers on SSD drives, to gauge what kind of
# performance you might get out of them.
#
# Sequential read and write speeds are tested, these are expected to be
# high. Random reads should also be fast, random writes are where crap
# drives are usually separated from the good drives.
#
# This uses a queue depth of 4. New SATA SSD's will support up to 32
# in flight commands, so it may also be interesting to increase the queue
# depth and compare. Note that most real-life usage will not see that
# large of a queue depth, so 4 is more representative of normal use.
#
[global]
bs=10M
ioengine=libaio
iodepth=32
size=100g
direct=1
runtime=60
#directory=/mnt
filename=/dev/md/stripe
numjobs=8

[seq-read]
rw=read
stonewall

[seq-write]
rw=write
stonewall
[root@s206 ~]#
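(Aside: the bw.fio job above only exercises sequential I/O, while its own comments note that random writes are what separate good drives from bad ones. A couple of extra job sections in the same file would cover that case; the 4k block size and reusing the global iodepth are illustrative choices, not values discussed in the meeting.)

# Sketch only: random-I/O additions to bw.fio; run the whole file with: fio bw.fio
[rand-read]
bs=4k          # small blocks for random reads, overrides the global bs=10M
rw=randread
stonewall      # wait for the previous jobs to finish before starting

[rand-write]
bs=4k
rw=randwrite
stonewall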
21:26:26 this is from a single node with 12 NVMes - no GPFS yet
21:26:38 but we did manage to get a 12 NVMe non-blocking PCIe topology going
21:26:47 39GB/s read 27GB/s write
21:27:07 we'll have six of those puppies on HDR200 so should be good
21:27:34 but having said that I need to head off to the DC soon to bring this dropped out IPMI switch back on the network - otherwise I can't build the last node...
21:28:15 would be interesting to see how those numbers change with different workload characteristics
21:28:26 bad news is I haven't found any way whatsoever to build these through Ironic
21:28:48 14 drives per box booting thru UEFI from drives number 8 and 9 is too much of an ask
21:28:53 but those look like big numbers for relatively low depth
21:28:55 and these drives need to be SWRAID, too
21:29:54 I think there's a fair bit of room for tweaking, this was just to prove that the topology is right
21:30:20 it would be very interesting to see how the numbers stack up against our IO500 #4 BeeGFS cluster
21:30:27 what's the GPFS plan with these? declustered RAID?
21:30:28 in the ten node challenge
21:30:34 GPFS-EC
21:31:04 though we will run 4 nodes of EC and 2 nodes of non-EC just to understand the impact (or lack of) of EC on throughput/latency
21:31:19 for prod it will be all EC
21:32:15 the idea behind this system is a mash up of Ceph style arch, RDMA transport and NVMes connected in a non-blocking fashion
21:32:34 hoping to get the best of all these worlds
21:32:46 so far the third bit looks like a "tick"
21:32:55 these run much smoother than our BeeGFS nodes in the early days
21:33:40 these do have some blocking which is causing all sorts of special effects if not handled carefully
21:33:49 the new GPFSes have none
21:34:25 which gets a little funny cause people hear this and ask me - so what's the difference between BeeGFS and GPFS, why do we need both?
21:34:41 I used to say BeeGFS is a Ferrari and GPFS is a beefed up Ford Ranger
21:34:47 ha!
21:34:53 but it really comes down to a Ferrari and a Porsche Cayenne now
21:35:21 i guess a lot of it comes down to the question of what happens when either a) the network goes tits up, and/or b) the power suddenly goes out from a node/rack/room
21:35:33 all good questions
21:35:41 and with BeeGFS I have a catch-all answer
21:35:43 it's scratch
21:35:45 it's not backed up
21:35:48 if it goes it goes
21:35:48 i.e., do you still have a working cluster tomorrow :-), and is there any data still on it
21:36:00 GPFS on the other hand... ;)
21:36:07 get out of jail free card ;-)
21:36:38 if one was to go through a car crash, I recommend being in the Cayenne not the Ferrari
21:36:41 let's put it this way
21:37:14 but yeah it's an interesting little project
21:37:34 if it wasn't all the drama with HDR VPI firmware slipping and splitter cables giving us shit it would be up and running by now
21:37:37 hopefully another month
21:37:51 and on that note I better start getting ready to head out to the DC to fix that IPMI switch..
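(Aside: the splitter behaviour that took out the IPMI uplink is driven from the switch CLI. From memory, on MLNX-OS it is roughly the command below; the exact syntax, split type and which neighbouring port gets unmapped depend on the switch model and firmware, so treat this as a hedged sketch rather than a verified procedure.)

# Rough MLNX-OS sketch: split port 1/29 into four lanes.
# On some switch models this also unmaps the adjacent port (eth1/30 here),
# which is how the IPMI uplink disappeared.
switch (config) # interface ethernet 1/29 module-type qsfp-split-4 force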
21:38:10 we got mlnx 1GE switches with Cumulus on them for IPMI
21:38:14 I can't remember why
21:38:29 they are funky, but it is soo distracting switching between mlnx-os and cumulus
21:38:39 completely different philosophy and management style
21:38:53 probably wouldn't get those again - just some braindead cisco-like ones for IPMI
21:39:17 MLNX-100GE and HDR ones are great to work with though
21:39:39 (except the automatic shutdown of the other port when enabling splitters :) )
21:39:53 if you go Cumulus all the way then you can obviously get 1G switches too, but then bye bye IB
21:40:08 i'm slightly distracted in another Zoom meeting now sorry
21:40:14 no worries
21:40:19 wrapping up here, too
21:40:45 I hope I haven't bored you to death with my GPFS endeavours
21:41:13 no, sounds fun!
21:41:42 if you happen to have a cumulus-cisco cheat sheet I'd love that
21:41:55 I wish I could just paste config snippets to Google Translate
21:41:58 :D
21:44:48 with that, should we move to AOB? :)
21:45:26 If not, I will propose we adjourn today's meeting
21:47:15 see you all next week
21:47:21 #endmeeting
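(Postscript, as a tiny start on the cumulus-cisco cheat sheet asked for above: the same change - putting a port into VLAN 10 as an access port - expressed in Cumulus NCLU and in Cisco IOS style. The interface names and the VLAN ID are placeholders, and NCLU syntax differs slightly between Cumulus releases.)

# Cumulus Linux (NCLU): add swp3 to the bridge as an access port in VLAN 10
net add interface swp3 bridge access 10
net pending
net commit

! Cisco IOS equivalent (interface name and VLAN are placeholders)
interface GigabitEthernet0/3
 switchport mode access
 switchport access vlan 10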