11:00:08 <oneswig> #startmeeting scientific-sig
11:00:09 <openstack> Meeting started Wed Jan 13 11:00:08 2021 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
11:00:10 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
11:00:12 <openstack> The meeting name has been set to 'scientific_sig'
11:00:47 <oneswig> Apologies, no set agenda for today (been busy!)
11:04:48 <oneswig> Wrangling with diskimage-builder in the other terminal
11:06:43 <dh3> o/ my other meeting was cancelled so I thought I'd stick my head in
11:07:09 <oneswig> Hello dh3, nice to see you.
11:07:48 <oneswig> I don't think there's anything on my agenda for this week
11:07:55 <oneswig> Any news to report from you?
11:08:31 <dh3> things are ticking over in our "new normal"
11:08:59 <oneswig> Good to hear.
11:09:25 <oneswig> The very phrase "ticking over" is evoking all kinds of envy :-)
11:09:51 <oneswig> (been a busy return to work)
11:10:30 <dh3> There is plenty going on but not much firefighting which is a good place to be :)
11:11:18 <oneswig> Happy to hear it, for many reasons
11:11:45 <PeteC61> It is only Jan
11:12:01 <oneswig> Hello PeteC61, nice to see you
11:12:22 <oneswig> Jan indeed. 2021 is just getting started.
11:13:34 <oneswig> Next week we have our team "design summit" to plan our R&D activities
11:14:01 <oneswig> An unusually high number of exciting things to work on...
11:14:50 <dh3> are you thinking more infra/deployment tools, or user-facing features?
11:15:54 <oneswig> Well it covers everything but infrastructure is going to be "the big enchilada" this time round
11:18:40 <oneswig> Often the priorities we set are akin to new year's resolutions in their enduring effect, but we've adopted the 10% time approach to R&D to change that
11:19:22 <oneswig> Do you have that approach - setting aside some time in the week to focus on R&D and improvement?
11:19:40 <b1airo> o/ hello and happy new year. i just remembered I'd opened irccloud but then not actually logged in :-)
11:19:42 <dh3> That's an interesting one, I'd expect to see some overlap between "I want to work on this" and "I have to work on this to keep my customers happy"
11:20:13 <dh3> No official 10% time here but recently we gained an official "no meetings" afternoon each week which is heading in that direction, I think.
11:20:30 <b1airo> sounds blissful!
11:20:47 <PeteC61> That is my plan. In the interim, if something is of potential interest and is proposed, we can potentially make space.
11:21:32 <oneswig> Hi b1airo, happy new year
11:21:36 <oneswig> #chair b1airo
11:21:37 <openstack> Current chairs: b1airo oneswig
11:21:43 <PeteC61> always looking to hear how others are progressing though. This can help direct us to what else is out there that is working for others.
11:22:38 <b1airo> interesting culture conversation i've stumbled into by the sounds of it
11:24:57 <b1airo> i've got some tech/openstack investigations underway if anyone wants to talk shop...
11:25:11 <oneswig> In some ways it's the challenge of getting beyond chapter 1 in "the phoenix project"
11:25:18 <oneswig> b1airo: go ahead!
11:27:38 <b1airo> first, let's start with multi-tenant options for Spectrum Scale. we're designing an OpenStack cloud/hpc hybrid thing at the moment (as you know oneswig), and i'm trying to ensure i've got an up-to-date picture of parallel filesystem options that could fit in and service multiple tenants from shared storage infrastructure
11:29:26 <b1airo> I have a pretty good idea of what's possible with Lustre, but I'm still trying to figure out if Spectrum Scale even has sufficient isolation controls
11:32:41 <b1airo> CephFS is obviously a possibility, but I'm a bit wary about it from several perspectives - maturity, supportability, and write/iops performance for scratch storage - we'll probably do other things with it at more modest scale and learn from there
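
On the CephFS option just mentioned: the usual multi-tenant pattern there is one subtree per tenant with a path-restricted cephx key (essentially what Manila's CephFS native driver automates, adding per-share quotas on top). A minimal sketch, assuming a filesystem named "cephfs", a /tenants/<name> directory layout, and purely illustrative tenant names - none of which come from the meeting itself:

    import subprocess

    # Illustrative tenant names and subtree layout. Assumes the CephFS
    # filesystem is called "cephfs" and the /tenants/<name> directories
    # have already been created via an admin mount.
    TENANTS = ["tenant-a", "tenant-b"]

    for tenant in TENANTS:
        # "ceph fs authorize" mints a cephx key whose MDS/OSD caps are
        # confined to a single subtree - the basic isolation primitive
        # for sharing one CephFS filesystem between tenants.
        out = subprocess.run(
            ["ceph", "fs", "authorize", "cephfs", f"client.{tenant}",
             f"/tenants/{tenant}", "rw"],
            check=True, capture_output=True, text=True,
        )
        print(out.stdout)  # the per-tenant keyring to hand to that tenant

This doesn't address the maturity or write/IOPS concerns raised above, but it does illustrate that the isolation controls exist.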
11:35:02 <oneswig> Have you evaluated Weka b1airo? They have a multi-tenancy story as well.
11:35:49 <b1airo> my next one is wondering whether anyone has current experience architecting (and running!) Ceph all-flash pools. thinking about choice of data devices and WAL/DB devices etc, also CPUs. wanting to make sure i've learned from someone else's mistakes :-)
11:36:41 <oneswig> On that subject, this looks like interesting kit https://h20195.www2.hpe.com/v2/GetPDF.aspx/a50000084enw.pdf
11:37:58 <dh3> I know Spectrum Scale/GPFS has Cinder drivers etc but not clear if it can present the shared filesystem as a volume (assuming you want to access the same data from both OpenStack and HPC)
11:39:55 <dh3> Our Ceph is predominantly HDD (NVME for journals) so the only advice I have is to make sure you use devices with high endurance (DWPD)
11:40:45 <b1airo> and finally (for now), we've got a partner who will be deploying a cluster (and a bunch of related side machines/services) into this environment and they need pretty tight integration with other on-prem infrastructure like the AD domain controllers, so they want their own private networking space routed between their on-prem networks (multiple sites) and our cloud env. there are plenty of ways of skinning that cat outside of OpenStack, but I'm wondering if there is already some relevant routing functionality in Neutron that could make the whole thing more software defined and repeatable for other future tenants
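
On the Neutron routing question: one existing piece of functionality worth evaluating is the neutron-dynamic-routing project, which adds a BGP speaker that can advertise tenant subnet prefixes to external peers such as on-prem routers. Whether it covers this partner use case depends on the address-scope and provider-network design, but a minimal sketch of the workflow looks roughly like the following - names, AS numbers and peer addresses are illustrative, and it assumes the neutron-dynamic-routing extension and its openstackclient plugin are installed and admin credentials are sourced:

    import subprocess

    def osc(*args):
        """Run an openstack CLI command (assumes admin credentials are sourced)."""
        subprocess.run(["openstack", *args], check=True)

    # Create a BGP speaker and associate it with the external/provider
    # network that carries the partner's routed traffic.
    osc("bgp", "speaker", "create", "--local-as", "64512",
        "--ip-version", "4", "partner-bgp")
    osc("bgp", "speaker", "add", "network", "partner-bgp", "provider-ext")

    # Peer with the partner's on-prem router so tenant subnet prefixes
    # (and their next hops) are advertised towards their sites.
    osc("bgp", "peer", "create", "--peer-ip", "192.0.2.1",
        "--remote-as", "64513", "partner-onprem")
    osc("bgp", "speaker", "add", "peer", "partner-bgp", "partner-onprem")

Note that the prefixes advertised are derived from subnets allocated out of subnet pools sharing an address scope with the external network, and a neutron-bgp-dragent has to be running to host the speaker - so this is a route-advertisement building block rather than a complete site-to-site solution.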
11:44:03 <b1airo> yep oneswig, that could be a reasonable choice of box. I believe the DL325 is the platform on which HPE offer their Spectrum Scale (ECE based) product/solution too
11:46:31 <b1airo> though as we will have other general purpose "bulk storage" HDD based pools i was leaning towards scattering NVMe around the cluster. but perhaps it would make more sense to try and get at least 4 dedicated nodes, even if they start out only 1/4 or 1/2 populated from a storage and memory perspective
11:49:43 <b1airo> dh3: on endurance, that's an interesting point... are you referring to the data devices or WAL/DB? i suspect i've previously been too conservative on this front. some of the all-flash ceph product marketing i've read recently is using QLC based drives with what seems to be very read-focused performance
11:50:30 <b1airo> (data drives that is, often optane for WAL/DB though)
11:52:00 <oneswig> What's the write endurance quoted for Optane? It's certainly a lot quicker and more consistent for writes
11:53:15 <dh3> I was imagining for the WAL, the DB too assuming your data churn is high enough to justify it. I wonder if the speed differential between Optane and NVME would make it worth it.
11:53:31 <b1airo> dh3: there is also a Manila driver for Scale/GPFS, however it requires the filesystem to be exported via NFS from GPFS protocol nodes, so it's not a real parallel filesystem option - we do require something high performance
11:54:39 <b1airo> believe it also requires a shared flat provider network over which all tenants mount their shares, so that immediately limits the practical usage of it
11:55:18 <dh3> oneswig we had Intel in to talk about Optane and they didn't put a number on it, just to say "lots" or "plenty" (shades of Rolls-Royce and "adequate horsepower")
11:59:19 <b1airo> optane is insanely high IOPS both R & W, and very high endurance, but yes i do agree that it could be overly expensive depending on how much you need for DB duties (which in itself seems to be something that changes wherever you look - 4% of data capacity in one place, 6% elsewhere, some GB-per-TB number elsewhere, or just 30GB or 300GB per OSD if you dig into the mailing lists and bugs that detail how things work (currently) with RocksDB)
12:00:21 <oneswig> b1airo: the integration with multi-tenant networking could present a significant challenge that HPC-class storage simply doesn't face on its home turf
12:00:38 <oneswig> ah, we are out of time
12:00:46 <dh3> b1airo I'm interested to know what you end up with, we haven't (yet) found the performance motivation to have anything all-flash
12:01:34 <oneswig> We did this - but it's out of date now: https://www.stackhpc.com/ceph-on-the-brain-a-year-with-the-human-brain-project.html
12:01:59 <oneswig> I think this image is still relevant for NVME though - https://www.stackhpc.com/images/julia-nvme-sustained-writes.png
12:03:43 <oneswig> Better close the meeting - thanks all
12:03:46 <oneswig> #endmeeting
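
A quick back-of-envelope follow-up on the WAL/DB sizing rules of thumb quoted above: the figures mentioned (4% or 6% of data capacity, versus a fixed ~30 GB or ~300 GB of RocksDB space per OSD) give quite different answers for a dense flash node, and the endurance (DWPD) question follows directly from the expected daily write volume. A rough sketch with an entirely illustrative node configuration - drive counts, capacities and write volumes are assumptions, not figures from the meeting:

    # Compare the BlueStore WAL/DB sizing rules of thumb mentioned above,
    # then estimate the implied endurance requirement. All hardware numbers
    # are illustrative, not a recommendation.

    DATA_DRIVES = 10           # OSDs per node (one OSD per data drive assumed)
    DATA_DRIVE_TB = 15.36      # capacity of each data drive, TB
    DAILY_WRITE_TB = 20.0      # assumed writes landing on this node per day

    data_tb = DATA_DRIVES * DATA_DRIVE_TB

    # Rule-of-thumb DB sizing, per node:
    db_4pct = 0.04 * data_tb                   # "4% of data capacity"
    db_6pct = 0.06 * data_tb                   # "6% elsewhere"
    db_fixed_30 = DATA_DRIVES * 30 / 1000      # ~30 GB per OSD (RocksDB level fit)
    db_fixed_300 = DATA_DRIVES * 300 / 1000    # ~300 GB per OSD

    print(f"4% rule:        {db_4pct:.1f} TB of WAL/DB per node")
    print(f"6% rule:        {db_6pct:.1f} TB of WAL/DB per node")
    print(f"30 GB per OSD:  {db_fixed_30:.2f} TB per node")
    print(f"300 GB per OSD: {db_fixed_300:.2f} TB per node")

    # Endurance: a device rated at N DWPD over a 5-year warranty can absorb
    # roughly N * capacity * 365 * 5 of writes. Here we naively assume the
    # whole daily write volume passes through one shared WAL/DB device;
    # metadata amplification and compaction make the real figure worse.
    DB_DEVICE_TB = 1.6                         # illustrative WAL/DB device size
    required_dwpd = DAILY_WRITE_TB / DB_DEVICE_TB
    print(f"Implied endurance requirement: ~{required_dwpd:.1f} DWPD")

Under these assumptions the rules of thumb span roughly 0.3 TB to 9 TB of WAL/DB capacity per node, which is why the choice between high-endurance NVMe and Optane for that role is largely a question of how much of it you actually need.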