21:01:54 <b1airo> #startmeeting scientific-wg
21:01:55 <openstack> Meeting started Tue Jun 28 21:01:54 2016 UTC and is due to finish in 60 minutes. The chair is b1airo. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:01:56 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:01:58 <openstack> The meeting name has been set to 'scientific_wg'
21:02:11 <b1airo> #chair oneswig
21:02:11 <openstack> Current chairs: b1airo oneswig
21:02:24 <edleafe> \o
21:02:26 <oneswig> #topic roll-call
21:02:29 <oneswig> hi everyone!
21:02:56 <trandles> hello
21:02:59 <julian1> Hi oneswig!
21:03:11 <julian1> \o
21:03:15 <b1airo> #topic Agenda
21:03:23 <oneswig> #link https://wiki.openstack.org/wiki/Scientific_working_group#IRC_Meeting_June_28th_2016
21:03:35 <b1airo> -----HPC / Research speaker track at Barcelona Summit
21:03:35 <b1airo> Spread the word!
21:03:35 <b1airo> Review of Activity Areas and opportunities for progress:
21:03:35 <b1airo> Bare metal
21:03:35 <b1airo> Parallel filesystems
21:03:36 <b1airo> Accounting and scheduling
21:03:37 <b1airo> User Stories
21:03:41 <b1airo> Other business
21:03:43 <b1airo> -----
21:04:22 <oneswig> Nice, thanks b1airo
21:04:50 <oneswig> Lets get started
21:05:10 <oneswig> #topic HPC/Research speaker track at Barcelona
21:05:26 <b1airo> tell your friends!
21:05:38 <oneswig> After the track at Austin we get a thumbs up to run again
21:06:27 <oneswig> I am interested to know what people thought was missing from the content in Austin?
21:07:11 <trandles> yay! I enjoyed the HPC/Research track as it brought together a lot of us with common interest. I think that ability to focus the community was missing in past summits.
21:07:29 <oneswig> I wish we'd had a talk covering Lustre/GPFS for one
21:07:34 <oneswig> Thanks trandles, agreed
21:07:44 <oneswig> Got another talk in you Tim?
21:08:01 <trandles> I think so. Working on titles and abstracts now.
21:08:20 <oneswig> Great! Deadline was 13 July IIRC
21:08:23 <trandles> getting approval for foreign travel is the difficult part
21:09:47 <b1airo> did we ask whether you're attending SC trandles ?
21:09:52 <oneswig> in last week's discussion (EMEA time zone) an email was being drafted that people might be able to circulate in other forums
21:10:06 <trandles> I'll +1 the lack of a lustre/GPFS talk. There have to be user stories around provisioning infrastructure in a scientific context that we're missing.
21:10:09 <blakec> Tutorial and instructional content seemed to be missing from HPC track in Austin
21:10:12 <oneswig> Hopefully, we can help people with a template for spreading the word
21:10:30 <trandles> blairo: I don't have plans for SC this year
21:10:47 <blakec> i.e. optimizing nova for HPC workloads
21:10:59 <oneswig> blakec: step-by-step this is how I did it kind of stuff? +1
21:11:55 <blakec> Correct, even very entry level content... As Summit grows I suspect those talks have a wider audience
21:12:10 <trandles> pulling blakec's tutorial thread, a lessons learned from someone like Pittsburgh where they're deploying HPC using OpenStack would be very nice
21:12:52 <trandles> during the Austin WG planning session, the ironic breakout basically turned into just that, Robert fielding questions about lessons learned with Bridges
21:13:26 <oneswig> #link https://etherpad.openstack.org/p/hpc-research-circulation-email Lets put together some points that can be shared to raise the HPC/Research speaker track profile
21:13:40 <b1airo> agreed re. lessons learned
21:13:48 <oneswig> If you're on other lists or groups, please consider mailing around to spread the wrod
21:13:52 <oneswig> word...
21:14:40 <b1airo> i'm interested to know if people are actually optimising openstack (e.g. nova) or just the hypervisor and then making use of lesser known hypervisor features that can be exposed through nova
21:15:17 <oneswig> Our first order efforts at optimisation are all around SRIOV and RDMA for Cinder
21:15:38 <oneswig> But we're just getting going really
21:16:53 <b1airo> what sort of optimisation are you looking at with sriov oneswig (other than using it) ?
21:17:54 <oneswig> Just using it basically... We keep a VF in the hypervisor for Cinder and pass through the rest
21:19:51 <oneswig> I would be really interested in a discussion at the summit that brings together some of the recent conversations on the EMEA side wrt combining Nova resource reservations (Blazar) with preemptible instances. Seems far out but really interesting as a long-term goal
21:20:14 <dfflanders> +1
21:20:21 <b1airo> right, i suspect other common hypervisor tunings such as cpu/mem pinning and numa topology probably make a reasonable bit of difference there too, but we haven't done any work to quantify that yet (been focused on cpu and memory performance mainly)
21:20:56 <b1airo> +1 to e.g. blazar and opie
21:21:38 <oneswig> How could we help that happen?
21:22:26 <b1airo> at this stage i imagine it'd be more a matter of gathering supporters
21:22:33 <oneswig> Agreed I suspect any talk that isn't purely hot air in this area may be two summits away...
21:22:51 <b1airo> not likely to be anyone using it in the wild except a few folks working on the dev
21:22:55 <oneswig> (Plus there is the question of feasibility)
21:23:36 <kyaz001> hi guys, Khalil just joining
21:24:02 <oneswig> Hi Khalil, we were discussing the HPC speaker track for Barcelona
21:24:02 <b1airo> but having a lightning talk from some devs about how it works and fits in would be cool
21:24:48 <oneswig> b1airo: good idea, a session on future tech and wish-lists perhaps
21:24:52 <kyaz001> apologies for being tardy... do we have a list of topics?
21:25:08 <b1airo> kyaz001, https://wiki.openstack.org/wiki/Scientific_working_group#IRC_Meeting_June_28th_2016
21:26:09 <oneswig> Shall we move on to activity areas?
21:26:27 <b1airo> sure
21:26:39 <oneswig> #topic Bare metal
21:27:25 <oneswig> I'm interested in following the developments re: serial consoles in Ironic but have not got involved yet.
21:27:32 <oneswig> It's on my radar. Anyone using it?
21:27:55 <trandles> I'm about to make an attempt at using it
21:28:07 <b1airo> i'm just hoping to get some resourcing for us to start playing with ironic later in the year
21:28:13 <oneswig> Through Ironic or side-band?
21:28:43 <trandles> I have serial consoles working side-band but want to swap it for ironic eventually
21:29:16 <oneswig> Same here, would be a great help to have it all under one umbrella
21:30:31 <oneswig> What other areas of activity are WG members interested in wrt bare metal?
21:31:33 <dfflanders> would be good to have Chameleons opinion on this re baremetal as a service to researchers.
21:31:49 <oneswig> dfflanders: good point
21:31:50 <trandles> I'm interested in bare metal scalability issues but don't yet have a tested large enough to push the boundaries
21:32:07 <trandles> *testbed
21:32:44 <oneswig> We have a problem I'd like to fix at the Ironic level: our NIC firmware seems to get upgraded awfully often. I'm wondering how we might use the Ironic IPA ramdisk to do this for us to keep the deployment nice and clean
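For reference, the approach oneswig floats above is usually done by shipping a custom hardware manager inside the IPA ramdisk and exposing the firmware flash as an Ironic cleaning step, so it runs between deployments rather than inside the deployed image. The sketch below is illustrative only: the class name, step name, priority and flashing command are placeholders, not an agreed WG approach or anyone's actual tooling.

```python
# Illustrative sketch of a custom ironic-python-agent hardware manager
# exposing a NIC firmware update as a cleaning step. Names, priority
# and the flashing command are placeholders.
import subprocess

from ironic_python_agent import hardware


class NICFirmwareHardwareManager(hardware.HardwareManager):
    HARDWARE_MANAGER_NAME = 'NICFirmwareHardwareManager'
    HARDWARE_MANAGER_VERSION = '1.0'

    def evaluate_hardware_support(self):
        # Claim to handle this hardware better than the generic manager.
        return hardware.HardwareSupport.SERVICE_PROVIDER

    def get_clean_steps(self, node, ports):
        # Advertise the firmware update so Ironic runs it during cleaning,
        # i.e. between one tenant's teardown and the next deployment.
        return [{
            'step': 'update_nic_firmware',
            'priority': 90,
            'interface': 'deploy',
            'reboot_requested': True,
            'abortable': False,
        }]

    def update_nic_firmware(self, node, ports):
        # Placeholder: call whatever vendor flashing utility is baked
        # into the ramdisk image.
        subprocess.run(['/usr/local/bin/flash-nic-firmware'], check=True)
```

The manager would also need to be registered under the ironic_python_agent.hardware_managers entry point when the ramdisk image is built, so IPA can discover it.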
21:32:47 <dfflanders> https://www.youtube.com/watch?v=ycJeWH8FjL0
21:32:55 <trandles> we will have some large clusters retiring in the next ~12-18 months though and I hope to get some time with the hardware before it goes out the door
21:33:01 <trandles> ~2000 nodes
21:33:36 <oneswig> That's an awful lot for Ironic to pick up in one go!
21:33:45 <trandles> indeed
21:34:27 <trandles> but it's a chance to identify problem areas
21:35:05 <oneswig> I recall Robert saying there were deployment races between Nova and Ironic that limited him to deploying small numbers of nodes at a time - and he's got ~800 nodes IIRC
21:35:52 <oneswig> trandles: how long will you have?
21:36:03 <trandles> it varies a lot
21:36:27 <trandles> if there's no immediate demand for the floor space (and power, and cooling...) I could have several months
21:37:19 <b1airo> oneswig, have you reviewed https://blueprints.launchpad.net/nova/+spec/host-state-level-locking ?
21:37:29 <oneswig> trandles: I assume this is on-site and somewhat restricted but I'm sure I'd be interested to hear
21:37:55 <trandles> I'll keep it on the radar when decommissioning talk gets started
21:39:34 <oneswig> b1airo: not seen that but it sounds quite fundamental. My understanding of python's global interpreter lock and concurrency model fall short of this but I'm surprised that threads in python can preempt one another at all
21:42:32 <oneswig> Thinking of actions, we've shared some interests here and found some in common.
21:43:18 <oneswig> #action trandles b1airo oneswig we should keep in touch if we get underway with ironic serial consoles
21:43:21 <kyaz001> can you summarize the area of interest?
21:43:55 <oneswig> kyaz001: Right now we are looking at the bare metal area of activity, looking at incoming developments
21:44:50 <oneswig> Many new capabilities in this area, many of which are interesting to many people on the WG
21:45:48 <oneswig> We ought to crack on, time's passing
21:46:04 <oneswig> #topic parallel filesystems
21:46:21 <oneswig> Alas I've not seen much go by in this area but I had one thought
21:46:48 <oneswig> Is anyone in the WG interested in putting up a talk on Lustre or GPFS for the speaker track?
21:47:42 <b1airo> we could probably do one
21:48:10 <oneswig> I think one of the principal guys at Cambridge might be able to share a combined Lustre / GPFS talk
21:48:18 <b1airo> though i'd make that dependent on getting one of my colleagues to work on the lustre bits
21:48:38 <b1airo> that's an idea
21:49:10 <oneswig> The timezones would be a killer for planning but I think Cambridge could cover the Lustre side - and we'd get a benchmark bake-off :-)
21:49:16 <b1airo> two quick tours of a HPFS integration and then maybe an open Q&A
21:49:43 <oneswig> I'll check and report back.
21:50:17 <blakec> We (ORNL) could contribute to the Lustre side as well.
21:50:22 <b1airo> has some promise and judging from the wg sessions in Austin would be very relevant
21:51:04 <oneswig> Great, let's note that
21:51:35 <trandles> likewise someone at LANL is looking at deploying GPFS using openstack
21:51:49 <oneswig> #action b1airo blakec oneswig trandles to consider options for Lustre/GPFS talk proposal
21:53:03 <b1airo> blakec, your experience is with integrating Lustre and Ironic based compute, or have you done hypervisor based compute too?
21:54:41 <oneswig> Time for one more topic?
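A note on the Nova/Ironic deployment races mentioned earlier in this topic: pending fixes such as the host-state-level-locking blueprint, a common workaround is simply to throttle how many bare metal instances are requested at once. The sketch below is a minimal illustration of that batching pattern using openstacksdk; the cloud name, flavor, image, network and batch size are all made up for the example and do not describe anyone's actual deployment tooling.

```python
# Illustrative only: request bare metal instances in small batches and
# wait for each batch to settle before submitting the next, so the
# scheduler never sees a flood of simultaneous claims.
import openstack

conn = openstack.connect(cloud='baremetal-cloud')  # placeholder cloud name

BATCH_SIZE = 20
node_names = ['bm-node-%03d' % i for i in range(200)]

for start in range(0, len(node_names), BATCH_SIZE):
    batch = node_names[start:start + BATCH_SIZE]
    servers = [
        conn.compute.create_server(
            name=name,
            flavor_id='baremetal-flavor-id',          # placeholder
            image_id='centos-baremetal-image-id',     # placeholder
            networks=[{'uuid': 'provisioning-net-id'}],  # placeholder
        )
        for name in batch
    ]
    # Block until every server in this batch reaches ACTIVE (or raise on
    # failure) before moving on to the next batch.
    for server in servers:
        conn.compute.wait_for_server(server, status='ACTIVE', wait=3600)
```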
21:55:16 <oneswig> #topic accounting and scheduling
21:55:22 <blakec> With hypervisor - we have multiple Lustre networks (TCP for VMs, and IB for bare metal). No sriov
21:56:01 <b1airo> blakec, but all talking to the same filesystem/s i take it?
21:56:21 <blakec> yes, that's correct
21:56:49 <b1airo> sounds interesting
21:57:22 <oneswig> My colleague in Cambridge has responded with interest re: HPC filesystem talk proposal, lets follow up on that
21:58:03 <oneswig> We've already covered much of the recent discussion on scheduling, was there anything from WG members in this area?
21:58:11 <b1airo> absolutely - good idea oneswig
21:58:28 <oneswig> b1airo: thanks
21:58:40 <oneswig> Time's closing in
21:58:44 <oneswig> #topic AOB
21:58:57 <oneswig> any last-minute items?
21:59:12 <b1airo> coffee...?
21:59:18 <trandles> yes please
21:59:31 <oneswig> Sounds ideal!
21:59:49 <julian1> \o
22:00:11 <oneswig> Hi julian1
22:00:14 <b1airo> stayed up way late gathering mellanox neo debug info :-/
22:00:26 <oneswig> you too? :-)
22:00:28 <julian1> Hey oneswig!
22:01:27 <oneswig> Time to wrap up / brew up - thanks all
22:01:38 <oneswig> #endmeeting