21:00:23 #startmeeting scientific-sig
21:00:24 Meeting started Tue Nov 28 21:00:23 2017 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:25 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:27 The meeting name has been set to 'scientific_sig'
21:00:34 Hellooo
21:00:52 o/
21:00:59 #link agenda for today https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_November_28th_2017
21:01:02 I am Leong from Intel
21:01:06 Hi leong, thanks for coming
21:01:36 oneswig: will try my best to join more often :-)
21:01:43 Hello, this is Piyanai from the MOC
21:01:46 That'd be great...
21:01:55 Rajul will also be joining in a couple of minutes
21:01:58 Hi piyanai, thanks for joining
21:02:16 Hey, I am Erming from the Compute Canada Cloud team
21:02:30 Hi epei, welcome
21:02:34 Morning
21:02:39 oneswig: thanks
21:02:40 g'day b1airo
21:02:45 #chair b1airo
21:02:46 Current chairs: b1airo oneswig
21:03:07 Hello. This is Sunil from Intel.
21:03:18 Hi isunil, thanks for coming
21:03:31 Shall we get started?
21:03:48 b1airo: we are expecting martial today, right?
21:03:57 Hi! This is Rajul from the MOC
21:04:07 Hello rajulk, thanks for coming
21:04:09 Evening.
21:04:11 I think so, should be his TZ
21:04:25 Hi verdurin
21:04:37 ok, let's get the show on the road
21:04:50 #topic ostack-hpc / hpc-cloud-toolkit
21:05:10 leong: isunil: would you like to describe your project first?
21:05:16 ok
21:05:18 sure
21:05:29 #link HPC Cloud Toolkit: https://github.com/hpc-cloud-toolkit/ostack-hpc
21:05:53 this is the HPC cloud toolkit that the Intel HPC team has been working on for the past few months
21:05:59 we are now open-sourcing the solution
21:06:03 This is a recipe to create HPC in a cloud, leveraging OpenHPC
21:06:11 OpenHPC components
21:06:23 Everything from OpenHPC?
21:06:26 thanks Leong
21:06:27 yup.. basically integrating OpenHPC with OpenStack (Ironic specifically)
21:06:45 combining both sets of open-source tooling for the HPC community
21:07:06 how does openstack integrate with beowulf/warewulf - or just replace the infra management components?
21:07:17 warewulf is not used here.
21:07:25 it is replaced with ironic
21:07:27 oneswig: for now, it is mainly the infra management
21:07:31 sounds good.
21:07:44 Can you describe the process you're using?
21:07:51 using ironic to provision the "hpc-head" and "hpc-compute" nodes
21:08:04 there are generally 3 phases
21:08:21 phase 1: using diskimage-builder to build the HPC-related images
21:08:36 phase 2: using ironic to deploy the physical nodes for the HPC head/compute
21:08:57 phase 3: provision/configure OpenHPC on the nodes from phase 2
21:09:18 That's ansible-driven configuration in phase 3?
21:09:21 this simplifies the deployment of an HPC cluster in the cloud
21:09:36 no ansible so far
21:09:42 right now, in phase 3, it is more of a cloud-init script
21:09:46 hi folks... sorry, my calendar didn't update with the time change :(
21:09:54 How tightly is the implementation coupled with Ironic?
21:10:00 ansible playbooks can be considered for future releases (depending on community needs)
21:10:09 Any global storage associated with the cluster?
21:10:36 this is the initial POC/prototype, there is no "global storage" associated with the cluster recipe now
21:11:00 but we are looking into Lustre
21:11:10 we are envisioning having Lustre as a service.
21:11:18 What special HPC things is OpenHPC helping us with?
21:11:44 leong: We use CephFS and have developed early Manila integration for that.
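
The three-phase flow leong outlines above (build an image with diskimage-builder, deploy it onto bare metal through Ironic, then configure OpenHPC via cloud-init) can be sketched with openstacksdk. This is a minimal illustration under stated assumptions, not the toolkit's own code: the cloud name, image, flavor, network and user-data file names are all placeholders.

    # Sketch: deploy an OpenHPC head node on an Ironic-managed baremetal node.
    # Assumes a diskimage-builder image has already been built and uploaded, e.g.
    #   disk-image-create centos7 ... -o centos7-openhpc   (elements omitted)
    import openstack

    conn = openstack.connect(cloud='mycloud')          # clouds.yaml entry (placeholder)

    with open('ohpc-head-init.yaml') as f:             # phase-3 cloud-init script (placeholder)
        user_data = f.read()

    server = conn.create_server(
        name='hpc-head-01',
        image='centos7-openhpc',    # image from phase 1 (name assumed)
        flavor='baremetal',         # flavor mapped to the Ironic nodes (assumed)
        network='hpc-provision',    # provisioning network (assumed)
        userdata=user_data,         # phase 3: OpenHPC configuration on first boot
        wait=True,
    )
    print(server.id, server.status)

The same call with a different image/user-data pair would cover the "hpc-compute" nodes; a Heat template or Ansible playbook could wrap this for repeatability, as the discussion below suggests.
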
21:11:58 piyanai: we use Ironic to provision the baremetal hosts; depending on your perspective, I would say it is loosely coupled
21:12:10 isunil: would be interesting to see ansible playbooks for lustre-as-a-service.
21:12:34 oneswig: is CephFS going to be widely adopted by the HPC community?
21:13:03 leong: too early to say, but it has a lot of interest and traction
21:13:04 oneswig: we are wondering if Manila can be integrated with Lustre?
21:13:06 Great, so if I have another "ironic"-like software, I should be able to do a plug-and-play replacement of Ironic easily...
21:13:07 does "lustre-as-a-service" mean building OSSes via nova and provisioning to your provider network, or having an existing Lustre setup that becomes accessible (driven by Manila)?
21:13:12 OpenHPC comes with tested, integrated libraries and components. A typical HPC environment installs the HPC libraries on the head node and shares the libraries and paths with the compute nodes via NFS
21:13:40 leong: CephFS is not particularly performant in a parallel use case, we are using it a bit like NFS - home directories basically.
21:13:43 piyanai: in theory it is.. however, the recipe now is integrated in an OpenStack environment
21:14:35 @oneswig, aren't home directories the worst place for something like CephFS?
21:14:35 leong: lustre integration with Manila - could be possible but requires kernel patches for lnet doesn't it?
21:14:40 isunil: can you describe the testing a little more - that sounds like a good motivation for using it
21:14:56 from our perspective, we are wondering if this toolkit can be "integrated" into the OpenStack community (for example, by creating an HPC project)
21:14:57 Chris_MonashUni: not yet. It turns out scratch space is worse
21:15:05 for us... parallel file opens not good
21:15:19 leong: Is the system designed to use Ironic to deploy head and compute nodes once by request of an admin, or do you allow HPC users to deploy their own environments regularly?
21:15:23 oneswig: might be, not sure yet, we haven't deep-dived into that path
21:16:12 Chris_MonashUni: but jewel's CephFS - not good for anything that does fsync (eg, vim) - you've definitely got a point
21:16:34 b1airo: OpenHPC maintains integrated test suites for every library and application they have
21:17:01 after provisioning HPC, we run the OpenHPC integration test suites to verify all functionality
21:17:06 leong: isunil: you've mentioned lustre integration, what other plans do you have?
21:17:09 priteau: the toolkit was tested based on a request from an admin; it can also allow HPC users to deploy their own
21:17:13 leong: OpenStack community integration is an interesting idea for sure
21:17:54 leong: just saw your comment up there - aha, elaborate on that?
21:17:59 oneswig: OmniPath :-)
21:18:19 priteau: if I understand correctly then OpenHPC is just a tenant of the cloud, so users of the HPC are necessarily aware of the cloud infrastructure underneath
21:18:23 Intel Inside, eh? :-)
21:18:38 oneswig: you mean the HPC users' deployment?
21:18:56 b1airo: you mean "not necessarily aware"?
21:19:10 the community part - leong how would that be done?
21:19:13 Yes!
21:19:14 oneswig: OpenHPC is community maintained, under the Linux Foundation
21:19:37 isunil: will the ostack-hpc project become part of that?
21:19:44 oneswig: the community part.. ok... that's the thing I want to discuss in this Scientific SIG
21:20:06 leong: you've got ~2 minutes :-)
21:20:10 oneswig: we were hoping for that..
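
oneswig mentions above using CephFS through early Manila integration for NFS-like home directories. A minimal sketch of that pattern with python-manilaclient is below; the share type name, size and cephx user are assumptions for illustration, not details of the SIG deployment.

    # Sketch: create a CephFS-backed Manila share for home directories and
    # grant a cephx user read/write access. Names and sizes are placeholders.
    import openstack
    from manilaclient import client as manila_client

    conn = openstack.connect(cloud='mycloud')              # credentials from clouds.yaml
    manila = manila_client.Client('2', session=conn.session)

    share = manila.shares.create('CEPHFS', 100,            # protocol, size in GB
                                 name='home',
                                 share_type='cephfs')      # share type name assumed

    # Clients mount using the cephx key Manila returns for this access rule
    manila.shares.allow(share, 'cephx', 'hpcuser', 'rw')
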
21:20:34 i am not sure if it is feasible to create an "HPC project", something like "DBaaS Trove"
21:21:07 so eventually this could provide an "HPC catalog" service based on OpenHPC and OpenStack
21:21:31 Can't see why not
21:21:34 leong: Can we combine steps 2 and 3 - make a comprehensive image that contains what you want to do in #3?
21:21:42 leong: ironic, heat and ansible galaxy would be a great toolbag for this
21:21:58 oneswig: yes
21:22:18 epei: we kind of want to split it so that it is more flexible
21:22:42 epei: so step 3 potentially can be done with either heat, cloud-init, ansible, etc
21:22:54 leong: I see. thanks
21:23:01 leong: epei: generally there are parameters to configure here too.
21:23:05 what do you guys think about the idea of an "HPC project"?
21:23:09 leong: On the other hand, the head and compute nodes are different.
21:23:18 eventually this can add on others like Lustre
21:23:27 leong: just curious.. what will this "HPC catalog" offer... templates to integrate various openstack services for HPC?
21:23:34 leong: I think it is a good idea but I am not sure if something like Senlin is the way to do it.
21:23:35 Any statistics that can be shared?
21:23:54 deployment time for example?
21:24:13 Or is it still too early for that?
21:24:14 leong: isunil: have you had any interest from OpenHPC for this work?
21:24:25 oneswig: wasn't too sure about Senlin
21:24:32 leong: I'd be careful about labeling it as HPC, more like batch scheduler. to achieve the H, you need magic hardware integration
21:25:14 leong: so an HPC project might not get the mindshare if it was created in that form. Where to get the best focus on it?
21:25:20 Chris_MonashUni: you are right, the HPC project will specifically focus on hardware integration and HPC software enablement
21:25:32 Chris_MonashUni: very true, totally agree
21:25:39 oneswig: I noticed some interest from the OpenHPC community. In the past there was one request to OpenHPC to support OpenStack.. they are looking for a recipe to create images and provision...
21:25:56 Chris_MonashUni: +1
21:26:59 this is still an early-stage discussion, i am hoping to get more feedback/opinions from the community
21:27:08 We've created something similar using https://github.com/stackhpc/stackhpc-image-elements/tree/master/elements/openhpc for elements, https://galaxy.ansible.com/stackhpc/cluster-infra/ for heat and https://github.com/SKA-ScienceDataProcessor/p3-appliances for stage 3 - there has to be some common ground
21:27:10 One nice integration down the line might be application credentials to allow the cluster to access the OpenStack user's object storage
21:27:45 Chris_MonashUni: ever used barbican for storing cephx keys, monasca credentials etc?
21:27:51 Chris_MonashUni: +1
21:27:59 Seems to work well for project-level secrets
21:28:03 no, I haven't used barbican, but keep meaning to look at it ;-)
21:28:16 it does sound appropriate
21:28:38 Chris_MonashUni: I have an ansible plugin for getting secrets from barbican and setting them as facts
21:28:51 that sounds fun, care to share?
21:29:16 Chris_MonashUni: try this: https://github.com/SKA-ScienceDataProcessor/p3-appliances/tree/master/ansible/roles/alaska_secrets/library
21:30:06 I welcome your PRs to it :-)
21:30:36 noice. I might move our team to putting all their stuff in barbican with that ;-)
21:30:55 Vault is good but this could be more dynamic
21:30:56 any more questions for the HPC Cloud Toolkit?
21:31:22 leong: could you make something happen with scientific sig support?
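
Following the exchange above about an Ansible plugin that reads secrets from Barbican and sets them as facts, here is a minimal standalone sketch of the lookup itself using python-barbicanclient. The cloud name and secret name are placeholders, and the Ansible module/plugin wrapping is left out.

    # Sketch: fetch a project-level secret (e.g. a cephx key) from Barbican.
    import openstack
    from barbicanclient import client as barbican_client

    conn = openstack.connect(cloud='mycloud')              # clouds.yaml entry (placeholder)
    barbican = barbican_client.Client(session=conn.session)

    # Look the secret up by name and read its payload; in an Ansible module
    # this value would be returned to the caller and registered as a fact.
    matches = barbican.secrets.list(name='cephx-hpc-key')  # secret name assumed
    if matches:
        print(matches[0].payload)
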
21:31:30 @leong do you spin up a slurmdbd and mysql backend to store accounting info or just throw it away?
21:31:44 oneswig: what do you mean by "scientific sig support"?
21:31:58 Chris_MonashUni: trade you a playbook for your slurm backend!
21:32:07 haha, OK
21:33:20 We should move on - MOC team, are you ready?
21:33:33 As ready as can be :)
21:33:37 #link Slurm at Massachusetts Open Cloud
21:33:42 Chris_MonashUni: in this POC, it doesn't persist
21:33:45 Take it away piyanai
21:33:52 Rajul
21:34:01 would you like to start?
21:34:12 Will jump in as needed
21:34:13 piyanai: sure
21:34:58 the main idea of this PoC (as yet) is to use underutilized cloud resources for scientific computing
21:35:48 so this scales an already existing Slurm cluster over virtual machines as and when resources are available
21:37:14 scaling is driven by OpenStack Watcher. we added a strategy to look at resource utilization from openstack monasca and then scale the cluster if utilization is low
21:37:41 We use OpenStack Watcher to monitor the cloud resources, and contact/interact with Slurm to provision instances if/when there are tasks in the Slurm queue.
21:38:20 Could Watcher easily have non-Monasca data source(s)? Monasca still looks extremely heavy
21:38:55 As far as I know, yes; it's still Monasca that gets the most use cases and support in Watcher
21:39:11 In the MOC, we had a lot of scale issues with Ceilometer, btw.
21:39:19 We are considering moving to Monasca for monitoring
21:39:47 Monitoring, or metrics?
21:40:11 Both?
21:40:43 piyanai: we use Monasca - and use the multi-tenant features for providing performance telemetry to Slurm users.
21:40:56 Great! we should talk.
21:40:59 Currently investigating a local deploy of prometheus for the same.
21:41:11 piyanai: that'd be great, would be good to share notes.
21:41:27 oneswig: how's the overhead with monasca?
21:42:23 In terms of delay or metrics sampling rate? Not too sure on either. We do hate the jvm processes though, they can be real hogs
21:42:45 We are currently kolla-ising it - or trying to - which might confine them better
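
The MOC approach described above (a Watcher strategy that reads utilisation from Monasca and grows an existing Slurm cluster with VMs when the cloud is idle and jobs are queued) can be summarised by the decision sketch below. The threshold, the metric helper and all resource names are illustrative placeholders under assumed conventions, not the MOC strategy itself.

    # Sketch of the scale-out decision: low cloud utilisation + pending Slurm
    # jobs => boot another worker VM. Purely illustrative; the real logic lives
    # in a Watcher strategy fed by Monasca.
    import subprocess
    import openstack

    IDLE_THRESHOLD = 30.0   # percent CPU utilisation; tuning knob (assumed)

    def average_cpu_utilisation():
        # Placeholder: in the MOC strategy this comes from Monasca statistics.
        return 12.5

    def pending_slurm_jobs():
        # Count jobs sitting in the pending state
        out = subprocess.check_output(['squeue', '--noheader', '--states=PD'])
        return len(out.splitlines())

    def scale_out(conn):
        # Image/flavor/network names are assumptions for illustration only
        conn.create_server(name='slurm-worker-extra', image='centos7-slurm',
                           flavor='m1.large', network='slurm-net', wait=True)

    if __name__ == '__main__':
        conn = openstack.connect(cloud='mycloud')
        if average_cpu_utilisation() < IDLE_THRESHOLD and pending_slurm_jobs() > 0:
            scale_out(conn)
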