21:00:44 #startmeeting scientific-wg
21:00:46 Meeting started Tue Sep 19 21:00:44 2017 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:47 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:49 The meeting name has been set to 'scientific_wg'
21:00:57 #chair martial
21:00:58 Current chairs: martial oneswig
21:01:14 #link Agenda for today https://wiki.openstack.org/wiki/Scientific_working_group#IRC_Meeting_September_19th_2017
21:01:26 hi martial
21:01:30 Hi Stig
21:01:42 Hello martial rajulk, good evening
21:01:48 good afternoon, etc.
21:01:56 hi oneswig
21:02:02 good evening
21:02:06 I see Mr Kumar is already with us, wonderful :)
21:02:07 Hello
21:02:22 Hey priteau, how are you?
21:02:24 :)
21:02:34 hi
21:02:42 Hello all
21:02:58 #chair b1airo
21:02:59 Current chairs: b1airo martial oneswig
21:03:04 morning Blair
21:03:11 Morning
21:03:27 OK, let's roll?
21:03:43 #topic Scientific Hackathon
21:03:59 Yep, sounds good. I'm just feeding the animals here so will keep one eye on this.
21:04:07 Flanders mentioned the scientific hackathon is looking for mentors
21:04:17 #link http://hackathon.openstack.org.au/mentors/
21:04:33 What else would you do in a dump like Sydney while the jetlag wears off :-)
21:05:05 hello
21:05:21 Not that we are a mercenary bunch, but I did see mention of summit tickets in return...
21:05:26 Hi rbudden
21:06:03 The hackathon starts on the Friday afternoon before the summit, if anyone is around and interested in taking part
21:06:23 b1airo oneswig: I apologize, I need to sign off for a few minutes, will be back as soon as I can
21:06:29 #item OpenStack London
21:06:32 martial: np
21:06:41 Is next Tuesday!
21:06:55 We have a WG session for those attending.
21:07:07 Probably not a big win in this time zone...
21:07:57 Great to hear oneswig
21:07:59 I wanted to mention, though, that we are looking for lightning talks for the BoF, and in customary fashion there is now a small prize for the best talk
21:08:08 Hopefully drum up some new members
21:08:40 I think so too. Many of the presentations on the main schedule are scientifically oriented.
21:08:56 Am hoping to meet some new faces.
21:09:03 (and some old ones, of course!)
21:09:12 oneswig: Do you know if talks will be recorded?
21:09:27 They were last year. They usually are.
21:10:14 OK, that's all to add there I think.
21:10:39 #topic Opportunistic utilisation for HTC/HPC
21:11:00 We have Rajul Kumar with us today from Northeastern University in Boston, MA
21:11:26 #link Rajul's presentation made in Boston https://www.openstack.org/videos/boston-2017/hpchtc-and-cloud-making-them-work-together-efficiently
21:11:43 hello everyone
21:11:49 #link Slides for the presentation are here: https://drive.google.com/file/d/0B6_MvTMovwvFcVppSy1xVmFVbDQ/view?usp=sharing
21:11:57 Hi rajulk, thanks for coming along
21:12:25 oneswig: thanks for inviting me
21:12:48 Thank martial, but he's missing the fun :-)
21:12:59 Can you describe the context for your work?
21:13:06 Is this something used in MOC?
21:13:11 sure
21:13:35 well, we are still working on it and it is not ready for use yet
21:13:56 but hopefully it will soon be in shape
21:14:51 So the idea is to use the idle/underutilized resources in an OpenStack cloud to offer an HTC service
21:15:25 This will be a Slurm cluster comprising virtual machines on OpenStack
21:15:55 Is it a separate partition in Slurm, or a separate instance of Slurm altogether?
21:16:09 these VMs will be added/removed from the cluster based on resource availability, driven by the resource utilization of the OpenStack cluster
21:16:54 It's a separate instance of Slurm - one that provides an HTC service on OpenStack
21:17:05 How is the OpenStack resource utilisation monitored?
21:17:43 we will be using Watcher, which will pull metrics from Monasca
21:18:15 and Watcher will drive the VM provisioning
21:18:19 rajulk: you don't have that piece in place yet, or is it currently being developed?
21:19:16 we have developed an initial Watcher strategy that does the job
21:19:22 but it is yet to be tested
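For illustration, the utilisation-driven grow/shrink decision that a Watcher strategy could make from Monasca metrics might look like the minimal sketch below. This is not the MOC strategy: the threshold values are invented, and cluster_utilisation() is only a placeholder for whatever Monasca query (for example, average CPU and memory utilisation across the cluster over a time window) the real strategy performs.

```python
# Illustrative sketch only, not the actual Watcher strategy discussed above.
# Thresholds and the metric query are assumptions.

GROW_THRESHOLD = 0.60    # assumed: below this, spare capacity can host HTC VMs
SHRINK_THRESHOLD = 0.85  # assumed: above this, HTC VMs get suspended


def cluster_utilisation():
    """Placeholder for the Monasca-backed query used by the Watcher strategy
    (e.g. average CPU/memory utilisation across the cluster over a window)."""
    raise NotImplementedError("fetch metrics from Monasca here")


def decide(utilisation):
    """Map cluster-wide utilisation to an action on the elastic Slurm cluster."""
    if utilisation > SHRINK_THRESHOLD:
        return "shrink"   # suspend a Slurm job/node, then its backing VM
    if utilisation < GROW_THRESHOLD:
        return "grow"     # provision another VM and add it to the Slurm cluster
    return "hold"
```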
21:20:47 Are the Slurm instances pre-emptible?
21:21:02 yes
21:21:10 so the way it works is
21:21:32 rajulk: so do you rely on Nova scheduling or force
21:21:50 whenever the utilization goes beyond a certain threshold, the job and node are suspended in Slurm and then the VM on OpenStack
21:22:08 we use Nova scheduling
21:22:32 What do you do about re-creating the HPC cluster environment in OpenStack? What about filesystems etc?
21:23:46 we have used Salt as the config management tool that drives the provisioning and configuration of a new node for Slurm
21:23:54 the controller remains static
21:25:02 We'll be using an NFS file system mounted on top of Swift storage
21:26:20 sounds interesting.
21:26:41 It's a filesystem internal to OpenStack then?
21:27:06 yes. initially that's the plan
21:27:18 however we may extend it as required down the line
21:27:26 Have you looked at any Keystone integration or would this service be separate from OpenStack from the user's perspective?
21:28:28 So at MOC we have developed a UI that deals with these scientific tools
21:28:35 that is integrated with Keystone
21:28:57 and whenever a user comes on to use Slurm he has to use it via that UI
21:29:29 as of now, there will be no direct interaction between Slurm and a user
21:30:22 Ah ok, so you have a science gateway of some kind
21:30:33 exactly
21:31:12 If you ever get to the point of allowing users to log in, I've previously used this to have them authenticated against Keystone in PAM: https://github.com/stackhpc/pam-keystone
21:31:20 it's still in the development phase though
21:31:25 (back, sorry)
21:31:41 Ok, so presumably the OpenStack Slurm is just one computational resource behind that gateway, which would also include traditional HPC systems?
21:31:45 welcome back martial :-)
21:31:47 (will check the logs for the presentation)
21:32:25 rajulk: what gateway software do you use there?
21:32:29 As of now yes, Slurm is running behind the gateway for the user
21:33:32 rajulk: when you say utilization goes beyond a threshold, is that looking at per-node utilization (CPU, mem, etc.) or number of instances across the whole cluster?
21:33:38 there is a different team working on that, but as far as I know it's built on top of Troposphere and Atmosphere from CyVerse
21:35:04 priteau: it's the utilization across the cluster on CPU and memory for a certain period of time
21:35:29 And how do you decide which VM to kill?
21:36:16 Interesting possibilities to look for idle VMs in Slurm there...
21:36:42 priteau: well, there's no logic behind that yet. We just pick the first one in the list, and we don't kill the VM but just suspend it
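The suspend path described above (pause the job and drain the node in Slurm, then suspend the backing VM through Nova) can be sketched roughly as follows. This is an illustration under assumptions, not the MOC implementation: the node-to-server mapping, the job lookup, and running it as a standalone function rather than from a Watcher action plan are all assumed; the squeue/scontrol commands and openstacksdk calls themselves are standard.

```python
# Rough sketch of the suspend path: pause jobs and drain the node in Slurm
# first, then suspend the backing VM via Nova. Node/server names and the
# job lookup are placeholders; the real system drives this from Watcher.
import subprocess

import openstack  # openstacksdk


def suspend_htc_node(node_name, server_name):
    conn = openstack.connect()  # reads clouds.yaml / OS_* environment

    # Find running jobs on the node and suspend them (placeholder query).
    jobs = subprocess.check_output(
        ["squeue", "--noheader", "--states=RUNNING",
         "--format=%A", "--nodelist", node_name],
        text=True).split()
    if jobs:
        subprocess.check_call(["scontrol", "suspend", ",".join(jobs)])

    # Drain the node so Slurm schedules nothing else onto it.
    subprocess.check_call(
        ["scontrol", "update", f"NodeName={node_name}",
         "State=DRAIN", "Reason=reclaimed-by-openstack"])

    # Finally suspend the VM itself; resuming later reverses these steps.
    server = conn.compute.find_server(server_name)
    if server is not None:
        conn.compute.suspend_server(server)
```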
21:37:05 rajulk: your presentation mentions multi-node jobs are future work, what's stopping that?
21:39:24 oneswig: theoretically nothing, we haven't tried that yet. The only concern initially was whether we should stop all the nodes in the cluster or just a portion of it
21:39:42 and, if a portion, how to figure that out and work around it
21:40:11 so we began with the simpler case of single-node jobs
21:40:31 makes sense
21:42:20 OK - did we have any final questions for Rajul?
21:42:45 Just a thanks for joining!
21:42:58 seconded - thanks again Rajul
21:43:08 I joined so I can't see his presentation
21:43:24 any link to that please?
21:43:34 armstrong: https://drive.google.com/file/d/0B6_MvTMovwvFcVppSy1xVmFVbDQ/view?usp=sharing
21:43:37 I'm still wanting a generic OpenStack-level preemptible instance, but we could use this approach for a lot of stuff too
21:43:49 Ok thanks
21:43:58 no worries
21:44:10 #topic Scientific OpenStack book
21:44:11 Thanks for the discussion, really helpful for me as well.
21:44:30 b1airo: totally agree
21:44:36 thank you Mr Kumar
21:44:46 Final week for rabid hacking on the text, everyone!
21:44:59 martial: thanks a lot :)
21:45:13 oneswig: well, we are adding a chapter about "Research in Production" :)
21:45:26 Today I received a great study on Lustre on OpenStack from the Sanger Institute team
21:45:30 What's our drop-dead date again, oneswig?
21:45:40 rajulk: thank you for giving us the opportunity to learn from your expertise
21:45:56 b1airo: 22nd I believe
21:46:02 Kathy wants final copy by the end of next week, edits close at the end of this week
21:46:03 oneswig: excellent, was hoping for that!
21:46:05 oneswig: started reviewing ... I saw you spot me :)
21:46:58 It's coming together - apologies to anyone who has commented and I've not responded for a couple of days. Quite a bit going on at once.
21:47:41 martial: not sure that was me... I've been largely elsewhere today.
21:48:01 might have been me
21:48:04 I'm working on an SKA infrastructure study to replace the Cray one, should have that in review tomorrow
21:48:07 i was making lots of edits before this meeting ;)
21:48:32 i sent oneswig an email for review, but any feedback is welcome
21:48:33 rbudden: looking for the prize for the most up-to-date entry, huh? :-)
21:48:36 oneswig: interesting, that person had your face icon ... you are being identity theft-ed on book reviews of your own chapter ... sneaky
21:48:49 Sept 29 is the hard deadline
21:49:04 Absolutely no edits after that. If it's not ready we won't have time to produce the book.
21:49:09 Thanks hogepodge
21:49:28 oh nice, i have a few more days ;)
21:49:49 We can propose cover images for this edition.
21:49:50 (Kathy has entrusted me to remind everyone of that as often as possible... I don't want to let Kathy down) :-D
21:50:14 I need URLs of images that are Creative Commons licensed
21:50:42 If we get options, I'll set up a vote
21:51:14 So - impressive images please everyone from your cloudy science workloads!
21:51:22 i know someone mentioned last year that they liked the Bridges OPA layout image
21:51:32 it’s already in the previous book though, so maybe something fresh
21:51:40 but figured it’s up for grabs if needed
21:52:17 rbudden: I used a screenshot from the VRML widget. Any way of getting something in better resolution?
21:52:29 yes
21:52:39 I can ask and get the original
21:52:56 sounds good.
21:53:12 OK - any more on the book work?
21:53:26 #topic SC17
21:53:36 hogepodge: was this your item?
21:53:57 Yes, we need point people to receive book and sticker shipments
21:54:12 Denise got stickers approved, so I just need to know who to mail them to for the conference.
21:54:13 i can volunteer for that
21:54:16 IU or PSC booths might be good places to deliver to - rbudden?
21:54:21 ah, great
21:54:25 yes, i’ll be at SC Sun-Fri
21:54:47 b1airo: what's the latest on the BoF?
21:54:48 they can likely be shipped straight to the PSC booth or split between us and IU
21:54:51 Great! Just send an email to me and I'll connect you with Denise to set it up.
21:54:59 hogepodge: will do, thx
21:55:26 oneswig: BoF is on
21:55:42 Did the merge happen or is it just us?
21:55:42 #action rbudden to get in touch with hogepodge about SC book and sticker shipments
21:55:45 chris@openstack.org
21:55:48 Final participant list and format TBA
21:56:18 I believe the merge is happening, but I still haven't heard any more from Meredith on the other side
21:57:54 Any more on SC?
21:58:01 Not from me
21:58:05 do we have a list of presenters for booth talks?
21:58:09 or still TBA?
21:58:13 errr TBD
21:58:21 The other thing I wanted to raise is feedback (if any) on becoming a SIG?
21:58:59 rbudden: I think Mike Lowe was tracking that.
21:59:08 Plus! Forum topics for Sydney - we have not thrown anything into the brainstorming pile yet I think
21:59:09 #topic AOB - SIG etc.
21:59:33 oneswig: thanks, i’ll sync up with Mike
21:59:49 SIG still seems fine to me. The mailing list split (ideally involving less cross-posting) is the only question for me.
22:00:07 b1airo: our usual BoF / Lightning Talks
22:00:13 We upgraded to Pike today, it worked!
22:00:35 martial: those are already in, I mean the main Forum
22:00:59 b1airo: might be a good thread for openstack-sigs[scientific]?
22:01:00 (I am still confused about this Forum thing then)
22:01:34 oneswig: might be, especially since we are out of time
22:01:35 oneswig: Anything to look out for before moving to Pike? I keep cherry-picking bug fixes from it into our Ocata deployment
22:03:05 priteau: Mark and johnthetubaguy in our team did a fair bit of prep. We use Kolla. There were issues with docker_py -> docker. Also, we had problems with RabbitMQ hosts mangling /etc/hosts with repeated entries. Finally, a racy problem with updating haproxy, which was fixed by a cherry-pick from master
22:03:21 And I forgot to merge a PR I'd left dangling, doh.
22:03:39 Kolla worked really well for us though.
22:03:51 Ah, we are way over time.
22:03:57 oneswig: I would be very interested in thoughts/war stories on Kolla
22:04:02 for another time then!
22:04:06 Better wrap up.
22:04:14 rbudden: will try to write something up!
22:04:17 cheers all
22:04:20 #endmeeting