21:00:26 <oneswig> #startmeeting scientific-sig
21:00:26 <openstack> Meeting started Tue Jun 12 21:00:26 2018 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:27 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:30 <openstack> The meeting name has been set to 'scientific_sig'
21:00:45 <oneswig> Think I got the spelling right this week...
21:00:59 <oneswig> #link Agenda for today https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_June_12th_2018
21:02:33 <janders> g'day everyone
21:02:41 <oneswig> Hello - Jacob?
21:02:53 <janders> yes, that's me :)
21:03:03 <janders> how are you Stig?
21:03:14 <oneswig> Good, but busy...
21:03:36 <oneswig> Rolling out a deploy of bare metal + IB, as it happens...
21:03:49 <oneswig> (although we are not onto the interesting bits yet)
21:03:50 <janders> sounds familiar :) that'd do it to you
21:04:07 <oneswig> How's your work going?
21:04:24 <janders> I keep hearing new MOFED with eth_ipoib will be out soon (even if only a beta)
21:04:57 <janders> it's good. I've been focusing mostly on the networking lately, running into un-interesting ethernet problems... (solved)
21:04:58 <oneswig> I'm quite pleased we decided not to depend on it.
21:05:23 <janders> I read through your solution - clever! :)
21:05:27 <oneswig> All ethernet problems are interesting, in the right audience :-)
21:05:39 <janders> the more interesting part is multirail
21:05:42 <oneswig> thanks janders - borne of necessity
21:05:57 <janders> I got it working, though there are some "interesting" issues
21:06:06 <janders> have you ever tried multirail IB with Ironic?
21:06:23 <oneswig> No. We only have one NIC and we use one port as IB and the other as Ethernet
21:06:32 <janders> I need it for 1) storage 2) AI (DGX) systems
21:06:37 <oneswig> What happens?
21:06:50 <janders> the issue I'm seeing is - if I have multiple IB ports defined, instance creation fails
21:06:54 <janders> "neutron timeout"
21:07:01 <janders> (though no errors in neutron)
21:07:05 <oneswig> dug any deeper?
21:07:27 <janders> however if I have one ironic port on instance creation and add the other three after - that works and I can attach ports OK
21:07:44 <b1airo> Morning. A bit hectic here, need 10 mins to get kids organised...
21:07:45 <janders> I suspect something between neutron server and NEO. I need to chat to my mlnx friends..
21:08:01 <janders> good morning Blair
21:08:02 <oneswig> I was wondering if you were getting the wrong "MAC" for DHCP on the wrong physical rail
21:08:04 <oneswig> Hey b1airo
21:08:08 <oneswig> #chair b1airo
21:08:10 <openstack> Current chairs: b1airo oneswig
21:08:11 <martial_> Hi Stig, Blair
21:08:19 <oneswig> aha, hi martial_
21:08:23 <oneswig> #chair martial_
21:08:25 <openstack> Current chairs: b1airo martial_ oneswig
21:08:40 <oneswig> How many DockerCons have you been to this year martial_? :-)
21:08:55 <martial_> two this year :)
21:09:09 <martial_> Federal one + the real US one ... not the EU one
21:09:12 <oneswig> Can't get enough of the stuff, eh...
21:09:38 <martial_> something like that :)
21:10:19 <oneswig> How's the level of interest in GPU/AI/ML (and by some extension, HPC)?
21:11:33 <martial_> will tell you when it really starts (ie tomorrow)
21:11:40 <oneswig> Ah, the calm before the storm.
21:11:51 <oneswig> Has DataMachines got a booth?
21:11:51 <martial_> today is more about seeing who is out there, what ...
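(For reference, a rough sketch of the port-attach workaround janders describes above: boot the bare-metal instance with a single Ironic/Neutron port, then attach the remaining rail ports once it is active. It assumes openstacksdk; the cloud, network and server names are invented, and the exact SDK calls should be checked against the release in use.)

    # Sketch of the workaround discussed above: the instance is built with one
    # port, and the extra IB rail ports are created and attached afterwards.
    # Cloud, network and server names are hypothetical.
    import openstack

    conn = openstack.connect(cloud="ironic-cloud")        # hypothetical clouds.yaml entry
    server = conn.compute.find_server("dgx-01")           # already ACTIVE, built with one port

    for net_name in ["ib-rail-1", "ib-rail-2", "ib-rail-3"]:  # hypothetical rail networks
        network = conn.network.find_network(net_name)
        port = conn.network.create_port(network_id=network.id)
        # attach the pre-created Neutron port to the running bare-metal instance
        conn.compute.create_server_interface(server, port_id=port.id)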
21:11:56 <martial_> OpenStack has a booth
21:12:01 <martial_> DMC does not
21:12:15 <oneswig> You're manning the OpenStack booth?
21:12:33 <martial_> The scientific effort will be in a SIG tomorrow. I will attend
21:12:45 <martial_> I am not, Chris Hoge and others are
21:13:50 <oneswig> Something I don't know, what's the overlap between Kata and Docker? Lots? None?
21:14:04 <janders> this DockerCon SIG meeting sounds interesting - maybe it's something we could add to next week's agenda for this meeting? :)
21:14:27 <martial_> some, kata is a container solution native to OpenStack, Docker is still not
21:14:33 <oneswig> I'm guessing the images are different, given the Kata ones are booted rather than started
21:15:07 <janders> this reminds me of something from the Vancouver Summit that might be worth mentioning
21:15:12 <martial_> I will tell you after I talk to the people of the SIG (Christine and Christian are going to be there)
21:15:12 <b1airo> Yes, but you can make a Kata container from a Docker image I believe
21:15:35 <janders> some of you attended DK Panda's talk on the last day
21:15:46 <janders> he was talking about optimising overheads in containers
21:15:47 <martial_> yes, same as CharlieCloud or Singularity (or docker save to tar)
21:15:57 <oneswig> martial_: conversely, thanks to Kolla, OpenStack runs natively well on Docker (once you disable all that networking...)
21:16:20 <janders> I had to leave early to attend a meeting so didn't get to ask this question but where did he see these overheads?
21:16:23 <janders> networking I suppose?
21:17:00 <oneswig> janders: networking is painful owing to the overlays and bridging on the host side.
21:17:18 <janders> indeed
21:17:53 <janders> but I wonder if he's hit container overheads in other areas
21:17:56 <oneswig> I'm interested in the scope for VF pass-through, SRIOV style. In a way that doesn't circumvent the overlays - wonder if that is possible
21:18:38 <oneswig> janders: filesystems of many overlays can often be cited - each adding a layer of indirection
21:18:58 <janders> re SRIOV/vf - to my best knowledge nothing like this exists today with k8s and kuryr, however it would be of interest to us as well
21:20:00 <janders> speaking of filesystems - do you have experience with passing through an RDMA-native system mounted on the "container host" to containers?
21:20:04 <martial_> I think the Cloud Native foundation will be there as well, I want to talk to them as well
21:20:46 <oneswig> janders: We've used GlusterFS like that I believe and I think it JFWed
21:21:16 <oneswig> martial_: keep an eye out for Michael Jennings - I saw a really great presentation from him in April on Charliecloud
21:22:36 <oneswig> janders: it ought to work because the container's seeing a filesystem, the VFS implementation shouldn't be affected by the encapsulation (in theory)
21:23:27 <janders> oneswig: thanks! :) I will likely try a similar approach soon
21:24:04 <oneswig> martial_: be very interested to know if you hear more about https://rootlesscontaine.rs or similar efforts - unprivileged container runtimes
21:24:40 <b1airo> Yeh, I think someone here at Monash tried it on a DGX and it works
21:24:59 <oneswig> janders: for extra points, orchestrate your GlusterFS volume using Manila
21:25:01 <martial_> interesting indeed
21:25:11 <martial_> do not know if they are here but I can try to check
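(For reference, a minimal sketch of the host-mount-plus-bind-mount pattern janders and oneswig discuss above, using the Docker SDK for Python. The filesystem mount point, image and target path are invented; the container only sees an ordinary VFS mount, so the RDMA transport underneath stays entirely on the host side.)

    # Sketch: a parallel filesystem (GlusterFS/BeeGFS/GPFS) is assumed to be
    # mounted on the container host at /mnt/beegfs (hypothetical path), and is
    # bind-mounted into the container rather than mounted inside it.
    import docker

    client = docker.from_env()
    output = client.containers.run(
        "centos:7",                 # hypothetical image
        "df -hT /data",
        volumes={"/mnt/beegfs": {"bind": "/data", "mode": "rw"}},
        remove=True,
    )
    print(output.decode())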
21:25:39 <trandles> Hey there. Sorry, been lurking and working... I can ask Michael Jennings if he's going to be at <insert name here> if someone fills in that blank for me.
21:25:50 <oneswig> Hi Tim! Smoked you out :-)
21:25:55 <janders> oneswig: we'll likely use BeeGFS or perhaps GPFS
21:25:56 <trandles> indeed
21:25:59 <oneswig> DockerCon
21:26:19 <martial_> Hi Tim, yes DockerCon :)
21:26:27 <janders> b1airo: do you happen to run your DGXes with OpenStack?
21:26:28 <oneswig> janders: I have an ansible module under my wing for BeeGFS but it's a little alpha right now.
21:26:46 <oneswig> How are you provisioning the filesystem and mounting the client?
21:27:18 <b1airo> janders: not today, but we are planning to put Ironic under everything eventually
21:27:34 <trandles> @mej: Nope, no DockerCon for me. Would love to go sometime, though.
21:27:53 <oneswig> brb
21:28:29 <janders> oneswig: with our BeeGFS, we've finished specing out the hardware and the kit is "in flight" so we haven't done much hands-on work yet, but it is coming soon
21:29:03 <oneswig> janders: might have these roles on Ansible Galaxy by then, will let you know if so...
21:29:33 <janders> b1airo: thanks! DGX1 is one of my motivations behind implementing multi-rail IB with Ironic.
21:30:02 <janders> oneswig: great! we would be more than happy to work with you on this one
21:30:19 <oneswig> Guess we ought to look at the agenda...
21:30:24 <janders> good call
21:30:43 <oneswig> b1airo: martial_: can you guys drive a sec - family bed-time matters to attend to...
21:31:15 <b1airo> Copy oneswig
21:31:24 <martial_> sure thing
21:32:02 <martial_> joined a little after the hour ... need to quickly check the agenda link
21:32:05 <b1airo> #link https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_June_12th_2018
21:32:50 <b1airo> #topic sensitive data processing on OpenStack
21:33:10 <martial_> mmmh extra agenda item?
21:33:14 <martial_> how sensitive?
21:33:23 <b1airo> Firstly, we're still collecting input on this and trying to form up what efforts here should look like
21:34:16 <martial_> so my experience with this is related to private cloud type work
21:34:17 <oneswig> back now. Good topic :-)
21:34:20 <b1airo> I think we can at least extend the book by a chapter or two
21:34:33 <oneswig> agreed, if we can
21:34:39 <martial_> not a shared cloud
21:34:49 <martial_> for SC18?
21:34:54 <martial_> (re book)
21:34:59 <oneswig> martial_: private in the most private sense, ie one tenant only?
21:35:15 <b1airo> Maybe, not sure whether we'll have it ready
21:35:49 <b1airo> Ha, well tenant-per-hypervisor isolation is one control we've considered using
21:36:06 <b1airo> But remember, "defence in depth"!
21:36:17 <oneswig> I'm interested in hearing all the hoops people jump through and what each gives them
21:36:50 <martial_> more with known tenants
21:37:01 <martial_> otherwise we have to create separate hardware instances
21:37:12 <oneswig> How would you implement per-tenant hypervisor isolation - is there a filter for doing that or would it be about creating host aggregates with restricted visibility (if that's possible)
21:37:21 <martial_> good news, Ansible makes it easier obviously
21:37:25 <b1airo> #link https://etherpad.openstack.org/p/Scientific-SIG-Controlled-Data-Research
21:37:44 <janders> martial: even easier with ansible + ironic
21:37:53 <b1airo> oneswig: yeah there is an isolation filter
21:38:27 <b1airo> But other ways with aggregates too, depends on the overall cloud design I guess
21:38:51 <oneswig> Might be worth a line item under the technologies section b1airo
21:39:07 <oneswig> janders: what are you doing for cleaning - how thorough?
21:39:24 <b1airo> Yeah I still haven't dumped much in the etherpad yet, definitely have more to add
21:39:39 <oneswig> b1airo: the perfect accompaniment to breakfast :-)
21:39:51 <janders> I think AggregateMultiTenancyIsolation is one of the keywords
21:40:02 <b1airo> Ha, I'm already smearing honey on my phone so why not!
21:40:06 <janders> I personally haven't used it though, just remember it exists
21:41:05 <oneswig> Thanks janders, good to know
21:41:16 <janders> oneswig: not cleaning yet, though will be looking at this shortly. Are you thinking storage, firmware or both?
21:41:25 <b1airo> Ok, shall we move on briefly..?
21:41:46 <oneswig> janders: the first spanner is local disks, if you have more than one.
21:42:05 <oneswig> We can follow up offline, we've got some handy tools up on github for when you tackle this
21:42:16 <janders> oneswig: ok! thanks
21:42:22 <b1airo> oneswig: do self-encrypting drives help there?
21:42:54 <oneswig> yes - insta-scramble - without that one of our bare metal flavors takes 29 hours to clean. Not much use...
21:43:16 <janders> ouch!
21:43:30 <b1airo> Lol, yeah I think our users would assume Nova was out to lunch if that happened
21:43:50 <oneswig> Thankfully we only have one of those nodes, but it's too long. What we really want is erasure of LVM signatures and partition tables.
21:44:11 <oneswig> That's enough to prevent it breaking the next deployment.
21:44:25 <janders> I'm hoping to use some of the smarts in the SSDs to help
21:44:32 <oneswig> anyway, let's move on...
21:44:43 <martial_> fair enough
21:44:44 <b1airo> #topic workload management
21:45:29 <oneswig> We have a summer intern to work on polishing a slurm-as-a-service packaging
21:46:00 <b1airo> Sounds interesting oneswig, what needs polishing?
21:46:18 <b1airo> And how does that relate to what's in openhpc?
21:46:24 <oneswig> Right now, we've got a bag of tools, it really needs some better integration without loss of modularity
21:47:19 <oneswig> There's a pair of Ansible modules on Galaxy for creating heat stacks or magnum clusters and spitting out a structured ansible inventory for further config.
21:47:50 <oneswig> It works pretty well for people who know its innards. Should be easier to use though.
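(For reference, a sketch of the per-tenant host aggregate approach behind the AggregateMultiTenancyIsolation filter janders mentions above. It assumes the filter has been added to enabled_filters in nova.conf's [filter_scheduler] section; the project, aggregate and hostname are invented, and the openstacksdk helper names are from memory and worth double-checking - the `openstack aggregate set --property filter_tenant_id=<project-id> <aggregate>` CLI achieves the same thing.)

    # Sketch: pin a host aggregate to a single tenant so its hypervisors only
    # accept that tenant's instances. Names/IDs are hypothetical.
    import openstack

    conn = openstack.connect(cloud="my-cloud")              # hypothetical clouds.yaml entry
    project = conn.identity.find_project("sensitive-data")  # the tenant to isolate

    aggregate = conn.compute.create_aggregate(name="sensitive-data-hosts")
    # AggregateMultiTenancyIsolation matches on the filter_tenant_id metadata key
    conn.compute.set_aggregate_metadata(aggregate, {"filter_tenant_id": project.id})
    conn.compute.add_host_to_aggregate(aggregate, "compute-13")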
21:48:02 <b1airo> Ok, sounds interesting
21:48:49 <oneswig> #link heat clusters https://galaxy.ansible.com/stackhpc/cluster-infra/
21:48:52 <b1airo> Is that largely about setting up the inventory then or is the idea to go all the way to compete running SLURM
21:49:04 <b1airo> *complete
21:49:11 <oneswig> #link magnum k8s/swarm clusters: https://galaxy.ansible.com/stackhpc/os-container-infra/
21:49:52 <oneswig> b1airo: we follow a policy of only doing the infra in heat because it's like working in a small dark cupboard when you're debugging something
21:50:06 <oneswig> Ansible takes over pretty much as soon as the infra's up.
21:50:21 <oneswig> Using OpenHPC, it's pretty easy to put Slurm on top
21:50:55 <b1airo> Right
21:51:07 <oneswig> That ansible's here: https://github.com/stackhpc/p3-appliances - somewhat ragged round the edges
21:51:33 <b1airo> So SLURM seems to be all the rage but has anyone had any luck getting it to behave nicely in a dynamic environment?
21:51:55 <b1airo> oneswig: ragged around the edges, I can relate!
21:51:56 <oneswig> To make a slurm config file from an ansible inventory: https://github.com/stackhpc/p3-appliances/blob/master/ansible/roles/openhpc_runtime/templates/slurm.conf.j2
21:52:30 <oneswig> It turns out Ansible's facts have just enough data to populate what Slurm requires - phew
21:52:47 <janders> oneswig: nice work!
21:52:53 <trandles> b1airo: by "dynamic environment" do you mean adding/removing compute nodes?
21:52:59 <trandles> on the fly
21:53:06 <oneswig> The dynamic one's interesting. A lot of people follow the maximum-sized cluster, and work with most of it missing.
21:53:10 <b1airo> trandles: yep
21:53:29 <trandles> last I looked closely there was functionality around "cloud burst" type stuff
21:53:51 <oneswig> The thorny bit - would love to see a working demo of this - is scaling down
21:54:00 <trandles> the max cluster stuff oneswig references is another way, you just look like you have a load of down nodes when slurmd isn't running on cloud nodes
21:54:01 <b1airo> Is it really cloud burst or actually thinly disguised power management...
21:54:32 <trandles> b1airo: I can't answer that, never gave it a serious look
21:54:44 <b1airo> Can you rename nodes without bringing slurmd down?
21:55:23 <trandles> there are annoying things to deal with...such as telling slurmctld and the slurmd's that they shouldn't care if the config files differ
21:55:25 <oneswig> b1airo: I don't think so. Or perhaps I'm thinking of Ceph OSDs
21:55:42 <b1airo> Another point is, I'm not sure how suitable SLURM really is for handling high throughput workloads that would be bread and butter for a dynamic cluster
21:55:51 <oneswig> trandles: you can do that? revolutionary. How?
21:56:16 <janders> it will be interesting to see when cluster management software vendors will add "native" OpenStack support - as in using OpenStack APIs to create all the infrastructure from scratch
21:56:23 <b1airo> From what I've seen at a distance it is not very good at handling huge numbers of jobs in queue
21:56:25 <trandles> oneswig: there's a config param in later versions, not sure if it only appeared in 17.X or earlier, to say "don't hash the configs"
21:56:33 <janders> I was having a chat about this with Bright recently, they seem to be working on it
21:56:40 <trandles> we
21:56:54 <oneswig> b1airo: huge as in? Cambridge Slurm, apparently the queue was normally 4 days long
21:56:55 <trandles> 've gone slurm-only at LANL because moab couldn't handle large queues efficiently
21:57:23 <trandles> slurm can limit queue sizes, both max and by UID
21:57:29 <trandles> so one user can't hose up the works
21:57:37 <oneswig> janders: Bright's doing some interesting stuff around mixing bare metal clusters and openstack dynamically
21:57:45 <b1airo> Huge as in millions of jobs
21:58:01 <oneswig> that'll do it.
21:58:08 <trandles> I know we handle tens of thousands, not sure about millions
21:58:35 <oneswig> b1airo: what's the alternative? HTCondor?
21:58:36 <trandles> does any HPC WLM handle millions without a 2-hour scheduling cycle?
21:59:18 <trandles> <aside> oneswig: Charliecloud v0.2.5 just released with new functionality to better handle MPI jobs
21:59:23 <b1airo> Yeah we had problems with a user trying to submit tens of thousands recently and I was really surprised because I used to run experiments with millions of jobs using Nimrod (against PostgreSQL) years back
21:59:28 <janders> oneswig: indeed - however till now it was mostly running OpenStack on a cluster fully managed by Bright. I think now they are starting to realise it might make sense to do it the other way round. Have you seen any of this functionality yet?
21:59:36 <oneswig> trandles: nice work! What about OpenHPC packaging?
22:00:10 <trandles> oneswig: OpenHPC in the next release. One of the Intel devs was just working on an RPM spec file for Charliecloud in the upcoming release.
22:00:15 <b1airo> Yikes, time's up!
22:00:22 <oneswig> janders: no but you're absolutely right, it was an issue that their OpenStack was basically proprietary
22:00:27 <oneswig> Oops...
22:00:30 <b1airo> #topic final comments...
22:00:58 <oneswig> Nothing major from here
22:01:18 <oneswig> Having fun with new Ceph on old hardware
22:01:33 <b1airo> Seems like we could at least so more knowledge and pain sharing in this space anyway
22:01:42 <b1airo> *do
22:01:56 <oneswig> Always seemed to be a good thing
22:02:00 <b1airo> Did it go a lot faster oneswig?
22:02:13 <b1airo> We're just about to try the mimic upgrade...
22:02:19 <oneswig> Not sure yet - trying to figure out multipathing runes for it currently...
22:02:24 <janders> we didn't get to talk much about the focus areas for this cycle... next week?
22:02:30 <oneswig> ooh, mimic's just out isn't it?
22:02:47 <b1airo> Multipath, urgh. So Lustre hardware then
22:02:50 <oneswig> janders: sounds good, come prepared...
22:02:59 <oneswig> b1airo: got it in one
22:03:08 <b1airo> Tentacles loaded!
22:03:21 <b1airo> Ok, thanks all!!
22:03:25 <oneswig> I'm tempted by mimic now
22:03:31 <oneswig> Thanks everyone
22:03:34 <b1airo> #endmeeting
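(For reference, an illustration of oneswig's point above that Ansible facts carry just enough data to populate slurm.conf. This is a standalone sketch, not the linked stackhpc slurm.conf.j2 template, and the example fact values are invented.)

    # Sketch: render one slurm.conf NodeName entry from gathered Ansible facts.
    def slurm_node_line(facts):
        """Map standard Ansible setup-module facts onto a Slurm node definition."""
        return "NodeName={} Sockets={} CoresPerSocket={} RealMemory={} State=UNKNOWN".format(
            facts["ansible_hostname"],
            facts["ansible_processor_count"],
            facts["ansible_processor_cores"],
            facts["ansible_memtotal_mb"],
        )

    # Hypothetical fact values for a single compute node
    example_facts = {
        "ansible_hostname": "openhpc-compute-0",
        "ansible_processor_count": 2,
        "ansible_processor_cores": 16,
        "ansible_memtotal_mb": 128000,
    }
    print(slurm_node_line(example_facts))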