11:00:14 <oneswig> #startmeeting scientific-sig
11:00:25 <oneswig> ahoy
11:00:43 <oneswig> #link Agenda for today https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_August_28th_2019
11:01:14 <oneswig> back to school...
11:01:37 <janders> too cool for school!
11:01:50 <janders> g'day all
11:02:06 <oneswig> hey janders, how's things?
11:02:33 <janders> not too bad... just got the boot media for our GPFS cluster
11:02:42 <janders> was overlooked in the bill of materials somehow
11:02:47 <janders> build starting next week
11:02:57 <verdurin> Afternoon
11:03:00 <oneswig> sounds good - how big?
11:03:12 <oneswig> hi verdurin, trust you've had a good summer
11:03:23 <janders> small but should be punching above its weight
11:03:25 <janders> six nodes
11:03:29 <janders> 12x8TB NVMe each
11:03:46 <oneswig> 2x 100G IB as well?
11:03:47 <verdurin> oneswig: I like to think it's not quite over yet, but we'll see...
11:03:55 <janders> 50GE+HDR
11:03:58 <oneswig> verdurin: indeed
11:04:16 <oneswig> janders: nice!  does the HDR work?
11:04:28 <janders> I'm quietly hoping to reach the 100GB/s mark
11:04:41 <janders> well.. it doesn't.. yet
11:04:46 <janders> but hopefully soon
11:05:40 <janders> if we wanted pure IB we could probably make it work today, but we're waiting for the VPI stuff to work
11:05:46 <janders> it's still a few weeks from GA
11:06:17 <oneswig> dual port hdr?
11:06:23 <janders> correct
11:06:40 <oneswig> phewee.
11:06:49 <janders> this is actually a nice intro to the HPC-ceph discussion
11:07:01 <oneswig> We only have one item for the agenda - let's move on to it...
11:07:06 <janders> the idea behind this cluster is ceph-like self healing with the goodness of RDMA transport
11:07:15 <oneswig> #topic Scientific Ceph SIG
11:07:41 <janders> so - speaking of Ceph & HPC - how's RDMA support in Ceph going?
11:07:45 <oneswig> #link Ceph meeting later today http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-August/036705.html
11:07:59 <oneswig> janders: my experience of it wasn't compelling
11:08:10 <oneswig> but it was ~9 months ago
11:08:20 <janders> what were the issues?
11:08:27 <verdurin> Yes, saw someone ask you about that on the mailing list oneswig
11:08:37 <oneswig> And I'd love to spend some quality time with it in the absence of distraction...
11:09:03 <oneswig> the issues were various.  It worked OK for RoCE but didn't work reliably in other fabrics
11:09:10 <verdurin> I'm trying to squeeze an update on this from Mellanox
11:09:15 <oneswig> at the time I had EDR IB and OPA to compare with
11:09:27 <oneswig> OPA I think was a first
11:09:31 <janders> wow
11:09:43 <janders> this might be the first time I hear something works with RoCE but not IB
11:09:51 <janders> usually it's the other way round isn't it? :)
11:10:25 <oneswig> On the RoCE side, performance was inconclusive.  I had a 25GE network and the performance saturated that.  The devices I had were SATA SSDs and networking was not the bottleneck.
11:10:39 <oneswig> So it was hard to show substantial benefit on the hardware available.
11:10:58 <oneswig> It actually underperformed TCP for large messages
11:10:58 <janders> riiight
11:12:06 <oneswig> The IB, I forget what the issue was but I think there were stability problems.  Lost transactions in the msgr protocol, that kind of thing.
11:12:24 <janders> yeah sounds a bit rough around the edges every time I look at it
11:12:37 <janders> it would be awesome to have it relatively stable
11:12:50 <oneswig> One neat thing was that Intel had contributed some work for iWARP that appeared to have nothing to do with iWARP per se, but did introduce rdmacm, which was needed for all these weird and wonderful fabric variants
11:13:09 <oneswig> janders: +1 on that, would love to see it work (and be worth the trouble)
11:13:28 <janders> Intel... I have some harsh words to say there in the AOB
11:13:49 <janders> keyword: VROC
11:13:51 <janders> later!
11:14:01 <oneswig> ooh, won't be long to wait, we only have a short agenda for today.
11:14:31 <oneswig> verdurin: how is Ceph performing for your workloads?
11:15:01 <verdurin> No complaints so far, oneswig: though I know there is much heavier work to come
11:15:24 <oneswig> #link Ceph day for Research and Scientific Computing at CERN https://indico.cern.ch/event/765214/
11:15:43 <oneswig> verdurin: sounds intriguing - new projects coming on board?
11:16:12 <verdurin> Yes, and we're also looking at Ceph now for a different storage requirement.
11:16:41 <oneswig> Will you be at the CERN event?
11:16:51 <verdurin> I will.
11:17:03 <oneswig> super, see you there :-)
11:17:05 <verdurin> Curious how it relates to the other Ceph Day that's scheduled for London in October.
11:17:23 <janders> I envy you guys... Geneva - sure, 2hrs flight if even that...
11:17:50 <verdurin> Sadly my CERN User Pass has expired, so I have to visit as a civilian, like everyone else.
11:18:00 <oneswig> janders: we don't get to go to LCA, in return
11:18:19 <janders> if I were to pick one I don't think it'd be LCA
11:18:29 <oneswig> verdurin: I'm sure they'll remember you...
11:18:30 <janders> will be a while till we get the Summit again
11:18:57 <verdurin> This is the other Ceph Day I mentioned: https://ceph.com/cephdays/ceph-day-london-2019/
11:19:27 <janders> you guys heading to Shanghai?
11:19:38 <verdurin> Obviously you have to filter out the CloudStack parts.
11:19:44 <oneswig> I'm planning to be there janders
11:19:53 <janders> nice!
11:19:58 <janders> not sure what I'll do
11:20:17 <verdurin> No plans to go so far.
11:20:21 <janders> I wouldn't expect much support given rather tense AUS-CN relations these days
11:20:29 <oneswig> verdurin: last time I went to Ceph London it was in a venue somewhere in Docklands.  Good to see they've relocated it.
11:20:34 <janders> would be fun though!
11:20:55 <janders> do you guys know if it's possible to remote into the PTG?
11:20:58 <oneswig> janders: you're not the first to entangle openstack with geopolitics.
11:21:05 <verdurin> oneswig: Canada Water, yes.
11:21:39 <oneswig> verdurin: ah I remember seeing you there.  Wasn't it around the time of your move out west?
11:22:28 <verdurin> oneswig: A few months afterwards.
11:22:46 <HPCJohn> Canada Water Printworks?  5 minute walk from where I live. But I agree - not a great conference venue.
11:22:55 <verdurin> oneswig: lots of good user talks at the CERN day. Do you know whether upstream people will be there?
11:23:14 <oneswig> verdurin: don't know I'm afraid
11:23:26 <oneswig> Hi HPCJohn, good to see you
11:23:48 <oneswig> It was a small venue that doubled as a daycare centre.  Don't remember the name...
11:24:00 <oneswig> If I have the chance I'll join the Scientific Ceph meeting later
11:24:18 <oneswig> Might be interesting to run the RDMA flag up the pole there.
11:24:30 <janders> +1
11:24:33 <verdurin> Yes, I'm hoping to do the same.
11:24:40 <janders> what time is it? is it back to back with our meeting, or much later?
11:24:57 <oneswig> I think any serious effort on making it zing is deferred until the scylladb-derived messenger is introduced (forget the name)
11:25:05 <HPCJohn> May I ask what time that is in British Summer Time?   I think 16:30, as that is European time also
11:25:46 <oneswig> Time was 10:30am US Eastern/4:30pm EU Central time and I think that means 3:30pm BST
11:26:02 <oneswig> In Canberra, it's the middle of the night :-(
11:26:18 <verdurin> HPCJohn: Yes, in my calendar as 15:30
11:26:24 <verdurin> BST
11:26:37 <janders> yeah... makes me realise how lucky I am with the timing of this meetup!
11:26:42 <janders> I should move to Perth
11:27:05 <verdurin> janders: they're going to be discussing the timings, so it's quite possible they'll alternate like this one
11:27:16 <janders> that would be very cool
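
The time conversion discussed above can be checked with a short sketch. It assumes the UTC offsets in effect on 28 August 2019 (EDT -4, BST +1, CEST +2, AWST +8, AEST +10) as fixed values rather than reading them from a time zone database; the dictionary labels and the helper name `fixed_zone` are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed UTC offsets (hours) for 28 August 2019; not looked up from tzdata.
OFFSETS = {
    "US Eastern (EDT)": -4,
    "UK (BST)": 1,
    "EU Central (CEST)": 2,
    "Perth (AWST)": 8,
    "Canberra (AEST)": 10,
}


def fixed_zone(hours):
    """Return a tzinfo with a constant offset from UTC (hypothetical helper)."""
    return timezone(timedelta(hours=hours))


# Start time quoted in the meeting: 10:30am US Eastern on the meeting day.
start = datetime(2019, 8, 28, 10, 30, tzinfo=fixed_zone(OFFSETS["US Eastern (EDT)"]))

# Convert the same instant into each location's local wall-clock time.
local_times = {
    name: start.astimezone(fixed_zone(hours)).strftime("%Y-%m-%d %H:%M")
    for name, hours in OFFSETS.items()
}

for name, when in local_times.items():
    print(f"{name:18} {when}")
```

Under these assumed offsets the start lands at 15:30 in the UK and 16:30 in Central Europe, matching the figures quoted by oneswig and verdurin, and just after midnight in Canberra.
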
11:27:32 <oneswig> OK shall we move on?
11:27:35 <oneswig> #topic AOB
11:27:40 <oneswig> janders: ?
11:27:47 <janders> ok... bit of Intel bashing as promised
11:27:53 <janders> have you guys played with VROC much?
11:28:07 <oneswig> To my shame I admit to never having heard of it
11:28:17 <janders> CPU-assisted hybrid RAID
11:28:33 <oneswig> Ah, I think I've heard of that.
11:28:34 <janders> we originally wanted to run our BeeGFS nodes in RAID6
11:28:52 <janders> but as RAID6 sucks on NVMes, we ended up with smaller RAID0s
11:29:12 <janders> but because of these limitations we thought we'll try VROC on our Cyber kit
11:29:17 <janders> so we did
11:29:19 <janders> catch?
11:29:23 <janders> Supported on RHEL7
11:29:29 <janders> ...unless you use Ironic
11:29:35 <oneswig> aha.
11:29:40 <janders> Ironic is too cut down to understand it
11:29:50 <janders> s/ironic/IPA image
11:30:00 <oneswig> That's not surprising
11:30:10 <janders> so we have dual NVMes but so far can only run off a single one, no RAID protection
11:30:21 <oneswig> Can you boot from sw-assisted raid?
11:30:25 <janders> it's work in progress, we're thinking of retro-fitting RAID after the fact in an automated way
11:30:27 <HPCJohn> Looking forward to CEPH later on. Lunch calls!
11:30:39 <janders> oneswig: yes, we can
11:30:52 <janders> just can't mark the VROC array as root_hint
11:31:02 <janders> bit of a bummer really
11:31:20 <janders> we're working on the retrofit script in the meantime
11:31:24 <oneswig> Sounds like a DIB element for your IPA image might help with building in the support
11:31:25 <janders> we've got an RFE/BZ with RHAT
11:31:48 <janders> but all in all I have a sneaking suspicion we'll just leverage newly added SW-RAID support
11:32:02 <janders> as with RAID0/1 VROC doesn't bring much benefit AFAIK
11:32:04 <oneswig> With root hints don't you get a broad range of options, even device size?
11:32:09 <janders> RAID5/6 - different story
11:32:18 <janders> trouble is - IPA just can't see the array
11:32:25 <janders> full RHEL can but not IPA
11:32:39 <janders> we had RHAT support look at it, we tweaked IPA initrd but no luck
11:32:54 <janders> sounds like unnecessary complexity if you ask me
11:33:01 <oneswig> It's a fiddly environment to debug, for sure.
11:33:10 <janders> so I'd say if you're looking for VROC for RAID0/1 scenarios, probably steer away from it
11:33:17 <janders> for sure!
11:33:32 <verdurin> janders: not "Cost-efftice and simple" then?
11:33:35 <janders> for RAID5/6, especially if there's an alternative root drive, sure
11:33:36 <verdurin> effective
11:33:42 <janders> not at all
11:33:44 <janders> not at all
11:33:58 <janders> it did cost us way too much time to troubleshoot this already
11:34:04 <janders> so I thought I will share
11:34:12 <janders> good idea but so far pretty troublesome
11:34:38 <verdurin> Thanks for suffering on all our behalf
11:35:03 <oneswig> I look forward to the sequel...
11:35:11 <janders> hopefully SW-RAID support will fix this once and for all
11:35:20 <janders> VROC->disable; done!
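
For readers unfamiliar with the two Ironic mechanisms mentioned above (root device hints and software RAID), the following is a minimal sketch of what the corresponding JSON documents might look like. It is not the configuration used by anyone in this meeting. The field names follow the Ironic documentation as best I recall it and should be checked against the release in use; the size threshold is a made-up illustrative value.

```python
import json

# Root device hint, stored under the node property "root_device".
# Selects the deployment disk by an attribute the IPA ramdisk can detect.
# "size" is in GiB; ">= 400" is an illustrative threshold, not from the log.
root_device_hint = {"size": ">= 400"}

# Software RAID request, set as the node's target RAID configuration.
# A single RAID-1 logical disk using all available space (assumed layout).
target_raid_config = {
    "logical_disks": [
        {"size_gb": "MAX", "raid_level": "1", "controller": "software"}
    ]
}

# Serialise both documents so they could be passed to the Bare Metal API or CLI.
root_device_json = json.dumps(root_device_hint)
raid_config_json = json.dumps(target_raid_config, indent=2)

print("root_device:", root_device_json)
print("target_raid_config:")
print(raid_config_json)
```

A hint only helps if the ramdisk can see the device in the first place, which is the problem janders describes with the VROC array; the software RAID route avoids that dependency by having Ironic assemble the array itself.
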
11:35:43 <janders> on a good news front I played around with NVMe based iSCSI cinter
11:35:49 <janders> just for a small POC
11:35:54 <janders> s/cinter/cinder
11:36:04 <janders> 3/2GB per sec read/write
11:36:10 <janders> 70k IOPS on a single client
11:36:14 <janders> 200k IOPS across four
11:36:39 <janders> not a big fan of iSCSI so was quite impressed
11:36:40 <oneswig> have you tried it with iSER?  That works well
11:36:57 <janders> it's just a placeholder for the GPFS when it's ready - so I did not
11:37:10 <janders> but I was expecting maybe 10% of that
11:37:27 <janders> nice surprise
11:37:33 <janders> ( unlike VROC ;)
11:38:24 <verdurin> janders: do keep us informed about your GPFS work
11:38:31 <janders> will do!
11:38:48 <oneswig> Thanks janders!
11:39:08 <janders> happy to give a lightning talk about it in Shanghai if it's ready (and if I am there)
11:39:17 <janders> (or remote in if that's an option)
11:39:24 <oneswig> I'll pencil you in...
11:39:50 <janders> it's EC-GPFS so if it all works as designed it will be a very interesting little system
11:40:06 <janders> bit of ceph-like design
11:40:11 <janders> 12x drives a node and self healing
11:40:23 <janders> PLUS all the GPFS features and stable RDMA transport
11:40:52 <oneswig> janders: what's the condition of the OpenStack drivers for it?
11:41:10 <janders> not ideal, not too bad
11:41:15 <oneswig> I'm guessing this is bare metal so infrastructure storage requirement is limited
11:41:26 <janders> they claim they discontinued support for it a few months back
11:41:37 <janders> but we tested it also a few months back and it worked reasonably well
11:41:58 <oneswig> good to hear.
11:42:11 <janders> we'll probably use it across the board (glance & cinder for VMs, native for BM)
11:42:57 <oneswig> Did IBM add Manila?
11:43:08 <janders> I don't think so but not 100% sure
11:43:37 <janders> the promise is that they cut down development effort for OpenStack to ramp up development effort for k8s
11:43:54 <janders> so when we get to that part, there might be more interesting options
11:44:02 <janders> not sure where that's at right now though
11:44:23 <verdurin> janders: no restricted-data concerns for this system?
11:44:32 <janders> long story
11:44:35 <janders> short answer - not yet
11:44:46 <verdurin> It's always a long story...
11:44:53 <janders> long answer - we're still working out some of the cybersecurity research workflows
11:44:56 <janders> actually...
11:45:05 <janders> given we're in AOB and we've got 15 mins to go
11:45:28 <janders> do you guys have any experience with malware research related workflows on OpenStack
11:45:36 <janders> as in:
11:45:46 <janders> 1) building up the environment in an airgapped system
11:46:05 <janders> 2) getting the malware into the airgapped perimeter
11:46:06 <janders> 3) storing it
11:46:10 <janders> 4) injecting it
11:46:15 <janders> that sort of stuff
11:46:33 <janders> we're getting more and more interest in that and it feels a bit like we're trailblazing... but are we?
11:47:14 <verdurin> janders: we have an air-gapped system, though not for malware research.
11:47:48 <janders> how do you balance security with giving users reasonable flexibility to build up their environment?
11:47:59 <oneswig> It's a very interesting use case.  Hope your Ironic node cleaning steps are good.
11:48:31 <janders> right now we're in between the easy way (just importing CentOS/RHEL point releases) and the more flexible way (regularly updating repo mirrors inside the airgap, including EPEL)
11:48:54 <janders> and this is even before we think about the malware bits
11:49:23 <janders> right now (and this is only a few days old concept) I'm thinking:
11:49:41 <janders> 1) build the environment in a less restricted environment (repo access, ssh in via floating IP)
11:49:55 <verdurin> janders: we're more like the latter, though in fact our users have very little ability to customise beyond requesting changes for us to implement
11:50:00 <janders> 2) move the environment to a restricted area (isolated network, no floats, VNC access only)
11:50:33 <janders> 3) inject malware by attaching volumes (with encrypted malware inside)
11:51:05 <janders> yeah the longer we thing about it the more convinced we are we'll end up playing a significant role in setting up user environments... :(
11:51:32 <janders> s/thing/think
11:51:59 <janders> it really feels like something that either hasn't been fully solved - or has been solved by people who tend not to talk about their work..
11:52:27 <verdurin> janders: we did take an approach similar to your 1) and 2) steps, in order to make it more bearable/efficient at the preparation stage
11:52:54 <janders> so do you run 1) in non-airgap, take snapshots and import into airgapped for 2)?
11:53:22 <verdurin> exactly, yes
11:53:31 <janders> ok! might be a way forward, too
11:53:45 <janders> what worries me is maintaining relatively up-to-date CentOS/RHEL/Ubuntu/...
11:53:55 <janders> if we go with what we described, that's a non-issue
11:54:25 <verdurin> Yes.
11:54:26 <janders> fun times!
11:54:40 <verdurin> You're always up to something interesting...
11:54:51 <oneswig> indeed.  Great use case
11:54:59 <janders> my life these days is figuring this out, while getting HDR/VROC to work - and writing paperwork to keep funding going! :)
11:55:07 <janders> argh
11:55:09 <janders> frustrating at times
11:55:15 <janders> hopefully we're past the worst
11:55:35 <janders> I will see where we end up with all this - and if I am still allowed to talk about it at that stage I'm happy to share the story with you guys
11:56:48 <oneswig> thanks janders, would be a great topic for discussion.
11:56:55 <janders> thanks guys!
11:56:59 <oneswig> OK, shall we wrap up?
11:57:03 <janders> I think so
11:57:15 <janders> thanks for a great chat
11:57:28 <janders> please do wave the RDMA flag at the Ceph meeting!
11:57:38 <oneswig> I'll try...
11:57:42 <oneswig> #endmeeting