21:02:30 <martial> #startmeeting Scientific-SIG
21:02:31 <openstack> Meeting started Tue Sep 3 21:02:30 2019 UTC and is due to finish in 60 minutes. The chair is martial. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:02:32 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:02:34 <openstack> The meeting name has been set to 'scientific_sig'
21:03:06 <martial> quite informal meeting today as well :)
21:03:18 <janders42> I'm making up the numbers by joining twice
21:03:24 <janders42> trying to get my mac to cooperate
21:04:18 <martial> janders & janders42: good man!
21:04:50 <martial> I am nowadays using irccloud, a simpler way to go online
21:05:05 <janders42> good idea - I should look at this, too
21:05:17 <janders42> what are you up to these days martial?
21:05:45 <martial> trying to train some ML models for a YOLO test
21:05:54 <janders42> I'm fighting our GPFS hardware, finally winning
21:06:18 <janders42> (looking up initial benchmark of what the kit is capable of)
21:06:38 <janders42> Run status group 0 (all jobs):
21:06:38 <janders42>    READ: bw=36.1GiB/s (38.7GB/s), 4618MiB/s-4745MiB/s (4842MB/s-4976MB/s), io=800GiB (859GB), run=21579-22174msec
21:06:38 <janders42> Run status group 1 (all jobs):
21:06:38 <janders42>   WRITE: bw=24.0GiB/s (26.8GB/s), 3194MiB/s-3224MiB/s (3349MB/s-3381MB/s), io=800GiB (859GB), run=31757-32057msec
21:06:38 <janders42> [root@s206 ~]# cat bw.fio
21:06:38 <janders42> # Do some important numbers on SSD drives, to gauge what kind of
21:06:38 <janders42> # performance you might get out of them.
21:06:38 <janders42> #
21:06:38 <janders42> # Sequential read and write speeds are tested, these are expected to be
21:06:38 <janders42> # high. Random reads should also be fast, random writes are where crap
21:06:38 <janders42> # drives are usually separated from the good drives.
21:06:38 <janders42> #
21:06:39 <janders42> # This uses a queue depth of 4. New SATA SSD's will support up to 32
21:06:39 <janders42> # in flight commands, so it may also be interesting to increase the queue
21:06:39 <janders42> # depth and compare. Note that most real-life usage will not see that
21:06:39 <janders42> # large of a queue depth, so 4 is more representative of normal use.
21:06:39 <janders42> #
21:06:39 <janders42> [global]
21:06:39 <janders42> bs=10M
21:06:39 <janders42> ioengine=libaio
21:06:39 <janders42> iodepth=32
21:06:39 <janders42> size=100g
21:06:39 <janders42> direct=1
21:06:39 <janders42> runtime=60
21:06:39 <janders42> #directory=/mnt
21:06:39 <janders42> filename=/dev/md/stripe
21:06:39 <janders42> numjobs=8
21:06:39 <janders42> [seq-read]
21:06:39 <janders42> rw=read
21:06:39 <janders42> stonewall
21:06:39 <janders42> [seq-write]
21:06:39 <janders42> rw=write
21:06:39 <janders42> stonewall
21:06:39 <janders42> [root@s206 ~]#
21:06:55 <janders42> 39GB/s read, 27GB/s write per node
21:06:58 <janders42> there will be 6
21:07:01 <janders42> with HDR200
21:07:04 <janders42> so should be good
21:07:38 <janders42> this is an evolution of our BeeGFS design - more balanced, non-blocking in terms of PCIe bandwidth
21:07:41 <martial> that is quite good
21:08:08 <janders42> unfortunately the interconnect doesn't fully work yet, but I'll be at the DC later this morning hopefully getting it to work
21:08:57 <janders42> have you ever looked at k8s/GPFS integration?
21:09:27 <janders42> this is meant to be an HPC OpenStack storage backend and HPC storage backend, but I think there's some interesting work happening in the k8s-GPFS space
21:10:11 <martial> I confess we have not had the need for now
21:10:43 <martial> what are your reading recommendations on this topic?
21:11:09 <janders42> I think all I had was some internal email from IBM folks
21:11:41 <janders42> I hope to learn more about this in the coming months - and will report back
21:12:17 <martial> that would indeed be useful
21:13:09 <martial> maybe a small white paper on the subject?
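A note on the benchmark janders42 pasted above: bw.fio is a plain fio job file, so the run can be reproduced by pointing fio at it (assuming fio is installed and the job file is in the current directory). With numjobs=8 and size=100g per job, each phase moves 8 x 100GiB = 800GiB, which matches the io=800GiB figures in the output, and the stonewall keywords make the read and write groups run one after the other rather than concurrently.

    # reproduce the sequential read/write sweep from the pasted job file
    # WARNING: filename=/dev/md/stripe targets the raw md device, so the write phase destroys its contents
    fio bw.fio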
21:14:04 <martial> #chair b1airo
21:14:05 <openstack> Current chairs: b1airo martial
21:14:08 <janders42> if it will be possible to remote into the Shanghai SIG meetup, I can give a lightning talk about GPFS as OpenStack and/or k8s backend
21:14:11 <janders42> hey Blair!
21:14:11 <b1airo> o/
21:14:21 <b1airo> issues connecting today sorry
21:14:45 <janders42> another "interesting" thing I've come across lately is how Mellanox splitter cables work on some switches
21:14:46 <b1airo> how goes it janders42
21:14:49 <martial> not sure if Stig will have that capability at the Scientific SIG meet in Shanghai but I hope so
21:15:16 <b1airo> i'd be keen to see that lightning talk janders42
21:15:19 <janders42> yeah that would be great! :) - unlikely I will be able to attend in person at this stage
21:15:47 <martial> not sure if Blair will be there. I will not
21:15:54 <martial> (Shanghai)
21:16:13 <janders42> I plugged a 50GE to 4x10GE splitter cable into my mlnx-eth switch yesterday and enabled the splitter function on the port
21:16:35 <janders42> and the switch went "port eth1/29 reconfigured. Port eth1/30 reconfigured"
21:16:48 <janders42> erm... I did NOT want to touch port 30 - it's an uplink for an IPMI switch...
21:17:00 <janders42> boom
21:17:03 <janders42> another DC trip
21:17:10 <janders42> nice "feature"
21:17:27 <janders42> with some switches it is possible to use splitters and still use all of the ports
21:17:45 <janders42> with others - the above happens - connecting a splitter automagically kills off the port immediately below
21:17:57 <janders42> oh well, lesson learned
21:18:06 <janders42> hopefully will undo this later this morning
21:18:52 <martial> or maybe you will lose your uplink ... <disconnect>
21:19:25 <martial> (that's why we love technology :) )
21:19:43 <b1airo> no, i won't make it to Shanghai. already have a couple of trips before the end of the year and another to the States early Jan
21:20:15 <martial> joining us in Denver?
21:20:21 <janders42> Denver = SC?
21:20:36 <b1airo> janders42: iirc that functionality is documented regarding the breakout ports
21:21:04 <janders42> yeah... @janders
21:21:07 <janders42> RTFM
21:21:07 <janders42> :D
21:21:21 <janders42> just quite counter-intuitive
21:21:46 <martial> SC yes
21:21:50 <b1airo> reconfiguring the port requires a full switchd restart and associated low-level changes, which interrupts all the way down to L2 i guess
21:22:11 <b1airo> SC yes
21:22:20 <janders42> just looked at the calendar and SC19 and KubeCon clash
21:22:27 <janders42> I'm hoping to go to KubeCon
21:22:41 <janders42> otherwise would be happy to revisit Denver
21:22:59 <janders42> how long of a trip is it for you Blair to get to LAX? 14 hrs? Bit closer than from here I suppose..
21:23:54 <b1airo> yeah bit closer, just under 12 i think
21:23:58 <janders42> nice!
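For context on the splitter "feature" discussed above: on the Mellanox Ethernet switches janders42 describes, breakout is enabled per physical port from the CLI, and on several models splitting a port also takes its neighbour out of service - which is what dropped the eth1/30 IPMI uplink. A rough sketch of the MLNX-OS-style sequence follows; the exact syntax is from memory and may differ between models and firmware releases, and the port numbers are simply the ones mentioned above:

    # enter configuration mode
    enable
    configure terminal
    # split port 1/29 into four lanes; on this class of switch the
    # neighbouring port (1/30) is unmapped as a side effect
    interface ethernet 1/29 module-type qsfp-split-4 force
    # the breakout ports then show up as ethernet 1/29/1 .. 1/29/4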
21:24:17 <b1airo> usually try to fly via San Fran though
21:24:29 <janders42> yeah LAX can be hectic
21:24:35 <janders42> I quite like Dallas
21:24:37 <janders42> very smooth
21:24:54 <janders42> good stopover while heading to more central / eastern parts of the States
21:25:25 <janders42> not optimal for San Diego though - that'd be a LAX/SFO connection
21:25:52 <martial> (like I said, a very AOB meeting today ;) )
21:26:02 <janders42> since Blair is here I will re-post the GPFS node benchmarks
21:26:06 <b1airo> :-)
21:26:12 <b1airo> yes please
21:26:13 <janders42> Run status group 0 (all jobs):
21:26:13 <janders42>    READ: bw=36.1GiB/s (38.7GB/s), 4618MiB/s-4745MiB/s (4842MB/s-4976MB/s), io=800GiB (859GB), run=21579-22174msec
21:26:13 <janders42> Run status group 1 (all jobs):
21:26:13 <janders42>   WRITE: bw=24.0GiB/s (26.8GB/s), 3194MiB/s-3224MiB/s (3349MB/s-3381MB/s), io=800GiB (859GB), run=31757-32057msec
21:26:13 <janders42> [root@s206 ~]# cat bw.fio
21:26:13 <janders42> # Do some important numbers on SSD drives, to gauge what kind of
21:26:13 <janders42> # performance you might get out of them.
21:26:13 <janders42> #
21:26:13 <janders42> # Sequential read and write speeds are tested, these are expected to be
21:26:13 <janders42> # high. Random reads should also be fast, random writes are where crap
21:26:13 <janders42> # drives are usually separated from the good drives.
21:26:13 <janders42> #
21:26:14 <janders42> # This uses a queue depth of 4. New SATA SSD's will support up to 32
21:26:14 <janders42> # in flight commands, so it may also be interesting to increase the queue
21:26:14 <janders42> # depth and compare. Note that most real-life usage will not see that
21:26:14 <janders42> # large of a queue depth, so 4 is more representative of normal use.
21:26:14 <janders42> #
21:26:14 <janders42> [global]
21:26:14 <janders42> bs=10M
21:26:14 <janders42> ioengine=libaio
21:26:14 <janders42> iodepth=32
21:26:14 <janders42> size=100g
21:26:14 <janders42> direct=1
21:26:14 <janders42> runtime=60
21:26:14 <janders42> #directory=/mnt
21:26:14 <janders42> filename=/dev/md/stripe
21:26:14 <janders42> numjobs=8
21:26:14 <janders42> [seq-read]
21:26:14 <janders42> rw=read
21:26:14 <janders42> stonewall
21:26:14 <janders42> [seq-write]
21:26:14 <janders42> rw=write
21:26:14 <janders42> stonewall
21:26:14 <janders42> [root@s206 ~]#
21:26:26 <janders42> this is from a single node with 12 NVMes - no GPFS yet
21:26:38 <janders42> but we did manage to get a 12-NVMe non-blocking PCIe topology going
21:26:47 <janders42> 39GB/s read, 27GB/s write
21:27:07 <janders42> we'll have six of those puppies on HDR200 so should be good
21:27:34 <janders42> but having said that, I need to head off to the DC soon to bring this dropped-out IPMI switch back on the network - otherwise I can't build the last node...
21:28:15 <b1airo> would be interesting to see how those numbers change with different workload characteristics
21:28:26 <janders42> bad news is I haven't found any way whatsoever to build these through Ironic
21:28:48 <janders42> 14 drives per box booting thru UEFI from drives number 8 and 9 is too much of an ask
21:28:53 <b1airo> but those look like big numbers for relatively low depth
21:28:55 <janders42> and these drives need to be SWRAID, too
21:29:54 <janders42> I think there's a fair bit of room for tweaking, this was just to prove that the topology is right
21:30:20 <janders42> it would be very interesting to see how the numbers stack up against our IO500 #4 BeeGFS cluster
21:30:27 <b1airo> what's the GPFS plan with these? declustered raid?
21:30:28 <janders42> in the ten node challenge
21:30:34 <janders42> GPFS-EC
21:31:04 <janders42> though we will run 4 nodes of EC and 2 nodes of non-EC just to understand the impact (or lack thereof) of EC on throughput/latency
21:31:19 <janders42> for prod it will be all EC
21:32:15 <janders42> the idea behind this system is a mash-up of Ceph-style arch, RDMA transport and NVMes connected in a non-blocking fashion
21:32:34 <janders42> hoping to get the best of all these worlds
21:32:46 <janders42> so far the third bit looks like a "tick"
21:32:55 <janders42> these run much smoother than our BeeGFS nodes in the early days
21:33:40 <janders42> those do have some blocking which is causing all sorts of special effects if not handled carefully
21:33:49 <janders42> the new GPFSes have none
21:34:25 <janders42> which gets a little funny because people hear this and ask me - so what's the difference between BeeGFS and GPFS, why do we need both?
21:34:41 <janders42> I used to say BeeGFS is a Ferrari and GPFS is a beefed-up Ford Ranger
21:34:47 <b1airo> ha!
21:34:53 <janders42> but really it comes down to a Ferrari and a Porsche Cayenne now
21:35:21 <b1airo> i guess a lot of it comes down to the question of what happens when either a) the network goes tits up, and/or b) the power suddenly goes out from a node/rack/room
21:35:33 <janders42> all good questions
21:35:41 <janders42> and with BeeGFS I have a catch-all answer
21:35:43 <janders42> it's scratch
21:35:45 <janders42> it's not backed up
21:35:48 <janders42> if it goes, it goes
21:35:48 <b1airo> i.e., do you still have a working cluster tomorrow :-), and is there any data still on it
21:36:00 <janders42> GPFS on the other hand... ;)
21:36:07 <b1airo> get out of jail free card ;-)
21:36:38 <janders42> if one was to go through a car crash, I'd recommend being in the Cayenne, not the Ferrari
21:36:41 <janders42> let's put it this way
21:37:14 <janders42> but yeah it's an interesting little project
21:37:34 <janders42> if it wasn't for all the drama with the HDR VPI firmware slipping and the splitter cables giving us shit, it would be up and running by now
21:37:37 <janders42> hopefully another month
21:37:51 <janders42> and on that note I better start getting ready to head out to the DC to fix that IPMI switch..
21:38:10 <janders42> we got mlnx 1GE switches with Cumulus on them for IPMI
21:38:14 <janders42> I can't remember why
21:38:29 <janders42> they are funky, but it is so distracting switching between MLNX-OS and Cumulus
21:38:39 <janders42> completely different philosophy and management style
21:38:53 <janders42> probably wouldn't get those again - just some braindead Cisco-like ones for IPMI
21:39:17 <janders42> the MLNX 100GE and HDR ones are great to work with though
21:39:39 <janders42> (except the automatic shutdown of the other port when enabling splitters :) )
21:39:53 <b1airo> if you go Cumulus all the way then you can obviously get 1G switches too, but then bye bye IB
21:40:08 <b1airo> i'm slightly distracted in another Zoom meeting now sorry
21:40:14 <janders42> no worries
21:40:19 <janders42> wrapping up here, too
21:40:45 <janders42> I hope I haven't bored you to death with my GPFS endeavours
21:41:13 <b1airo> no, sounds fun!
21:41:42 <janders42> if you happen to have a cumulus-cisco cheat sheet I'd love that
21:41:55 <janders42> I wish I could just paste config snippets into Google Translate
21:41:58 <janders42> :D
21:44:48 <martial> with that, should we move to AOB? :)
21:45:26 <martial> If not, I will propose we adjourn today's meeting
21:47:15 <martial> see you all next week
21:47:21 <martial> #endmeeting
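On the cumulus-cisco cheat sheet janders42 asked about toward the end of the meeting: a rough mapping of everyday operations between a Cisco-style CLI (which MLNX-OS broadly follows) and Cumulus Linux's NCLU front end, compiled from memory rather than from any vendor document, so treat it as a starting point only; the interface names (<port>, swp1) and VLAN 100 are placeholders.

    # show the full configuration
    show running-config                          <->  net show configuration commands
    # list interfaces and link state
    show interfaces status                       <->  net show interface
    # put a port into an access VLAN
    configure terminal
    interface <port>
    switchport access vlan 100                   <->  net add interface swp1 bridge access 100
    # apply / keep the change
    write memory (change already live, this      <->  net commit (staged changes apply here;
    just persists it across reboot)                   net abort discards them)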