09:01:16 <oneswig> #startmeeting scientific_wg
09:01:16 <openstack> Meeting started Wed Jul 20 09:01:16 2016 UTC and is due to finish in 60 minutes.  The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:01:17 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:01:19 <openstack> The meeting name has been set to 'scientific_wg'
09:01:41 <oneswig> Hello
09:01:56 <apdibbo> Hi
09:01:56 <oneswig> We have an agenda (such as it is)
09:02:01 <dariov> Hello!
09:02:06 <oneswig> #link https://wiki.openstack.org/wiki/Scientific_working_group#IRC_Meeting_July_20th_2016
09:02:32 <oneswig> Hi dariov apdibbo
09:03:06 <oneswig> Topics today from the agenda:
09:03:13 <priteau> Hello
09:03:18 <oneswig> 1) Update on user stories activities
09:03:35 <oneswig> 2) anything from other activity areas (esp parallel filesystems this week)
09:03:47 <oneswig> 3) hypervisor tuning guide solicitation
09:04:02 <oneswig> 4) speaker track and wg activities for barcelona
09:04:28 <oneswig> BTW I've heard from b1airo he's going to join shortly
09:04:35 <oneswig> Hi priteau
09:05:21 <oneswig> #topic Update on user stories activities
09:06:27 <oneswig> I've been working away on this white paper for OpenStack and HPC, not much to report on that right now other than progress is being made
09:06:46 <oneswig> I forget what was discussed on this two weeks ago...
09:07:29 <oneswig> The next phase is solicitation of subject matter experts to provide input, review and comment
09:07:52 <oneswig> To recap there are five principal subjects
09:08:03 <oneswig> 1) HPC and virtualisation
09:08:13 <oneswig> 2) HPC parallel filesystems
09:08:20 <oneswig> 3) HPC workload managers
09:08:29 <oneswig> 4) HPC infrastructure management
09:08:42 <oneswig> 5) HPC network fabrics
09:09:00 <oneswig> virtualisation is actually half about bare metal
09:09:14 <oneswig> Anyone interested in contributing to any of these areas?
09:09:47 <oneswig> I have people in mind for many items but not all
09:11:08 <oneswig> OK, don't all shout at once :-)
09:11:13 <dariov> lol
09:11:33 <oneswig> no problem, I think we have plenty to go on as it is
09:12:26 <oneswig> More generally, is there activity with any of you that might convert well into a user story?
09:14:05 <dariov> 3 and 4 do interest us
09:14:09 <oneswig> At Cambridge our bioinformatics deployment is in early trials but its HA is less reliable than a non-HA deployment right now...
09:14:24 <dariov> we’re trying to move some of the hpc on-prem to the cloud
09:14:37 <dariov> (possibly openstack, possibly AWS, but that’s another story)
09:14:45 <apdibbo> we are in early trials with openstack for HTC
09:14:49 <dariov> but no direct knowledge so far
09:14:54 <oneswig> dariov: interesting to know, and why constrain to one at this point
09:15:01 <dariov> or trials whatsoever
09:15:34 <dariov> on ostack we’re limited by our own resources now
09:15:57 <dariov> and eventually people landing on OpenStack for the first time tend to develop cloud-friendly apps from the beginning
09:16:11 <dariov> so no real need for HPC solutions there
09:16:30 <dariov> as they usually deploy their own orchestration layer
09:17:00 <oneswig> dariov: I've seen use of ManageIQ for abstracting away private OpenStack vs AWS
09:17:44 <dariov> oneswig, I think we tried it back in October to “rule all the clouds”, we also have VMware on site
09:17:52 <priteau> oneswig: One of our partners on Chameleon, Ohio State University, is using Chameleon to work on high performance networking in virtualized environments: http://mvapich.cse.ohio-state.edu/overview/#mv2v
09:18:01 <oneswig> dariov: if you document your journey I'd be interested to read it
09:18:22 <dariov> oneswig, where are you based, exactly?
09:18:52 <oneswig> priteau: That's pretty cool, I hadn't made the connection, but I know the work at OSU, it's good stuff
09:19:03 <oneswig> dariov: I'm based in Bristol but working in Cambridge
09:19:09 <dariov> ah ah!
09:19:28 <dariov> Hello from the Genome Campus in Hinxton, Cambridgeshire, UK, then :-)
09:19:35 <priteau> oneswig: I can get you in touch if you want
09:19:39 <oneswig> Small world :-)
09:20:30 <oneswig> priteau: I would appreciate that.  I've previously mailed with DK Panda but only once or twice. A connected introduction for this work would be a great help
09:20:46 <dariov> oneswig, yes. We can also have a cup of coffee together one day, it would be cool to see how other people are making this journey :-)
09:21:13 <oneswig> #action Cambridge-centric OpenStack meetup needed!
09:21:26 <oneswig> I'm in town next week midweek.  You around?
09:21:34 <oneswig> Let's take this offline :-)
09:21:58 <oneswig> Anything else to add on user stories?
09:23:01 <oneswig> #topic Bare metal
09:23:36 <oneswig> I don't actually have anything here but Pierre I recall you were more active than me w.r.t Ironic serial consoles.  Any news there?
09:24:54 <priteau> We are working on backporting the patch to our installation (we're still on Liberty) and will soon test it
09:25:39 <priteau> I believe there was some progress upstream, as the multiple implementations seem to have merged into one, and I saw at least one patch being merged in master
09:25:40 <oneswig> that's great.  Is this a patch to Nova as well?
09:26:11 <priteau> this one: https://github.com/openstack/ironic/commit/cb2da13d15bf72755880e7a8e6881e5180e2e29f
09:26:34 <b1airo> evening
09:26:49 <oneswig> Hi b1airo
09:26:54 <oneswig> #chair b1airo
09:26:55 <openstack> Current chairs: b1airo oneswig
09:27:15 <oneswig> Just on bare metal, hpfs next
09:27:27 <priteau> as far as I know it's all in Ironic, no change to nova
09:28:12 <priteau> Another interesting thing that happened recently with bare-metal: support for network multitenancy is being integrated in Ironic
09:28:32 <b1airo> yeah we talked a bit about that last week actually
09:28:32 <priteau> e.g. https://github.com/openstack/ironic/commit/090ba640d9187ec6bee157ce8f1cf12ce6a868ca
09:28:44 <b1airo> or at least touched on it
09:29:01 <b1airo> i was asking about requirements for new OOB network we will be doing soon
09:29:06 <priteau> I haven't seen proper documentation yet so I don't know how it works and what kind of network support it requires
09:29:33 <b1airo> i was surprised to find out that Ironic-Neutron integration seems relatively new
09:29:53 <oneswig> It seems to be sufficiently far upstream that there's limited support and limited docs but hopefully it'll grow!
09:30:33 <oneswig> b1airo: I think one major obstacle is that Ironic requires physical network knowledge, something that OpenStack has been in wilful denial over
09:31:20 <oneswig> If you're moving activities normally done in a hypervisor onto a network port, you need to know which port (and possibly the paths to it)
09:31:47 <oneswig> Enter the Neo :-)
09:32:06 <priteau> for now the only doc I see is for devstack where the bare-metal servers are actually VMs
09:32:56 <oneswig> priteau: I'd previously seen an Arista driver that was doing bare metal multi-tenancy and required LLDP to make the connection with a physical switch+port
09:33:37 <oneswig> But that work is now very much out of date I expect
09:35:18 <oneswig> Anyway, it's great to see this documentation merged, thanks priteau for sharing that
09:35:23 <priteau> it looks like in this new implementation they extended the API to add the physical port info, I suppose managed by the admin
09:35:31 <priteau> e.g. https://github.com/openstack/ironic/commit/76726c6a3feda8419724d09c777e7bb578d82ec0#diff-2526370ba29c4ac923f13218952a32c3R167
09:35:53 <oneswig> ooh, interesting.
09:35:55 <priteau> I will keep an eye on what they are doing and will share what I learn
09:36:04 <oneswig> Maintaining the physical mapping - an exercise for the reader?
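For context on the physical port info mentioned above: as I understand it, the Ironic multitenant networking work carries this in a port's local_link_connection field (the identifier of the physical switch and the switch port the node's NIC is cabled to), which the admin supplies. A minimal sketch of registering it by hand, assuming a recent python-ironicclient and API microversion; the node UUID, MAC addresses, and switch names are placeholders:

    # Sketch: record which physical switch port an Ironic node's NIC is plugged
    # into, so the Neutron ML2 driver can bind tenant networks to that port.
    openstack baremetal port create 52:54:00:12:34:56 \
        --node 0e7f1234-aaaa-bbbb-cccc-1234567890ab \
        --local-link-connection switch_id=00:1e:08:aa:bb:cc \
        --local-link-connection port_id=Ethernet1/10 \
        --local-link-connection switch_info=leaf-sw-1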
09:36:32 <oneswig> Any more for bare metal?
09:37:10 <priteau> not from me
09:37:59 <oneswig> #topic Parallel Filesystems
09:38:15 <b1airo> is just the three of us at the moment?
09:38:29 <oneswig> I think 5-6?
09:39:10 <b1airo> ah ok, a few lurkers then :-)
09:39:14 <priteau> apdibbo and dariov were talking earlier
09:39:27 <apdibbo> yeah, i'm still here
09:39:45 <dariov> I’m here too!
09:39:53 <apdibbo> just don't have much to contribute on bare metal, interesting developments though :)
09:40:01 <dariov> same here
09:40:23 <oneswig> There was quite a bit of interest in Blair's mail on the lists about discussing parallel filesystems for cloud HPC
09:41:21 <oneswig> Seems like an idea with momentum behind it
09:41:24 <b1airo> yeah was nice to have such quick feedback
09:41:47 <b1airo> pity i didn't think of it earlier, such is the life of a procrastinator
09:42:08 <oneswig> Any parallel filesystem users or interested potential users here today?
09:43:48 <oneswig> b1airo: for our benefit perhaps we should talk over the WG panel session
09:44:13 <b1airo> only news here is that we got LNET with o2ib over RoCE via PCI pass-through virtual function NICs in guests going last week, for our new deployment
09:44:28 <oneswig> Nice!
09:44:31 <b1airo> say that sentence ten times fast
09:44:46 <oneswig> I bet you had it pre-typed out :-)
09:45:23 <b1airo> got it bound to a hot key ;-)
09:45:31 <oneswig> What issues did you hit on the way?
09:46:07 <b1airo> this deployment is full of ConnectX-4 and ConnectX-4 Lx
09:46:33 <oneswig> b1airo: Is that considered an issue?
09:46:56 <b1airo> the firmware in MOFED 3.2-1.0.1 (i think that's right) seems to have an issue. we've had a handful of nodes just "lose" their NICs
09:47:18 <b1airo> i.e. all traffic stops passing on them and can't do anything to fix it, even resetting the pci device
09:47:29 <b1airo> on reboot the card is completely absent
09:47:40 <oneswig> Ah, we moved to 3.3 - there's a security advisory against 3.2.  Plus we had loads of issues getting link up
09:47:45 <b1airo> i.e., not visible in lspci
09:48:06 <oneswig> b1airo: interestingly your issues are completely disjoint from ours, yet we both appear to have many
09:48:07 <b1airo> we have upgraded a few to 3.3.xxx and that seems to have sorted it
09:48:18 <b1airo> orly!?
09:48:28 <b1airo> where is that advisory published?
09:48:39 <b1airo> i have a support case open about it and they haven't mentioned that
09:49:04 <oneswig> OFED 3.3 release notes is where I saw it - was wondering why it's no longer on the website
09:49:05 <dariov> guys, I need to run
09:49:13 <oneswig> later dariov, thanks
09:49:16 <dariov> bye!
09:49:32 <b1airo> anyway, that issue was just a distraction, main issue was needing to get client Lustre modules built by Intel for the particular kernel + MOFED combination in the guest compute nodes
09:49:46 <b1airo> bye!
09:50:25 <b1airo> after that just simple LNET config
09:50:40 <oneswig> Is your process blogged anywhere, perchance...?
09:51:42 <b1airo> the Intel Manager for Lustre (IML) doesn't understand RoCE though, so if you're configuring things server-side you need to manually edit one of the network config files to make it use o2ib, else it assumes tcp (because ethernet)
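A minimal sketch of the client-side LNET piece being described, assuming the RoCE-capable NIC shows up as an ordinary Ethernet interface (the interface name ens1f0 is a placeholder):

    # /etc/modprobe.d/lustre.conf on the client
    # Point LNET at the o2ib LND over the RoCE interface instead of the tcp default.
    options lnet networks="o2ib0(ens1f0)"

    # Bring LNET up and confirm the node's NID uses o2ib rather than tcp.
    modprobe lnet
    lctl network up
    lctl list_nids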
09:52:38 <b1airo> no, but that's probably a good idea - it is documented internally reasonably well now so should be easy enough
09:52:55 <oneswig> Share and enjoy, b1airo!
09:53:40 <oneswig> OK, we should move on
09:53:57 <b1airo> remaining issue is that we reconfigured the filesystem into two separate filesystems (projects and scratch), and now we seem to be peaking at ~80MB/s for single client/thread writes... which is a little low!
09:54:27 <oneswig> b1airo: for that effort, I would say so.  Is there a gigabit ethernet somewhere in the data path???
09:55:11 <b1airo> it's a mix of 25, 50 (compute/client side) and 100 (server)
09:55:38 <b1airo> same issue with bare-metal clients so it isn't the passthrough or anything like that
09:55:54 <b1airo> initial acceptance tests were just fine
09:56:21 <b1airo> haven't looked into it yet myself, distracted with other issues as per recent os-ops posts
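A rough way to reproduce the single-client, single-thread number quoted above, assuming a Lustre client mounted at /lustre/scratch (the path is a placeholder):

    # Streaming single-threaded write, bypassing the client page cache.
    dd if=/dev/zero of=/lustre/scratch/ddtest bs=1M count=4096 oflag=direct

    # Check how the test file was striped; a single-OST stripe caps
    # single-stream bandwidth at roughly one OST's worth of throughput.
    lfs getstripe /lustre/scratch/ddtest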
09:56:32 <oneswig> SMARTD errors on your disks perhaps?
09:56:51 <oneswig> Right, let's move on
09:56:54 <oneswig> #topic Any Other Business
09:57:07 <oneswig> What news?
09:57:08 <b1airo> they're all Dell MDs so unlikely
09:57:26 <b1airo> Trump got the republican nomination?
09:57:44 <oneswig> You raise that with 3 MINUTES to discuss?
09:58:12 <b1airo> haha
09:58:35 <priteau> oneswig: Is this the Hypervisor Tuning Guide? https://wiki.openstack.org/wiki/Documentation/HypervisorTuningGuide
09:58:42 <b1airo> yep
09:58:52 <priteau> I had never seen it before
09:59:09 <b1airo> yeah it's not well promoted or linked anywhere
09:59:34 <b1airo> i'm hoping we can figure out something sensible to do with it
09:59:51 <oneswig> Are you officially its custodian b1airo?
09:59:51 <b1airo> maybe look at converting to doc format next cycle
10:00:22 <b1airo> something like that, but all the work so far has been done by Joe Topjian
10:00:57 <b1airo> i think the initial idea was to turn it into a standalone doc
10:01:15 <priteau> b1airo: maybe merge it into http://docs.openstack.org/ops-guide/arch_compute_nodes.html
10:01:35 <b1airo> but seems like it's hard to find info on other hypervisors that would bulk out the content enough to justify that
10:01:44 <b1airo> priteau, good suggestion
10:01:52 <b1airo> i was also thinking probably the ops guide
10:02:38 <oneswig> It's more operational than architectural I'd guess
10:03:27 <b1airo> yeah, though some of the things covered are probably things you want to be thinking about before you buy your hardware
10:03:52 <oneswig> I found CERN's study that Hyper-V outperforms KVM relevant - a pity there isn't more diversity in the guide as-is if KVM's not the fastest thing out there
10:04:09 <priteau> or in the Operations part of the Ops Guide, it could be a new page about troubleshooting performance problems
10:05:19 <oneswig> ah, we are over time
10:05:26 <oneswig> Any final comments?
10:05:36 <b1airo> NUMA matters!
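On the NUMA point, a minimal sketch of the sort of flavor-side tuning the Hypervisor Tuning Guide discussion is referring to, assuming an existing flavor named hpc.large (the name is a placeholder):

    # Pin guest vCPUs to dedicated host cores and confine the guest to one NUMA node.
    openstack flavor set hpc.large \
        --property hw:cpu_policy=dedicated \
        --property hw:numa_nodes=1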
10:05:52 <oneswig> One from me - appears we have 8 sessions allocated for Barcelona summit
10:06:03 <oneswig> hooray!
10:06:24 <oneswig> Got to wrap up now, thanks everyone
10:06:28 <b1airo> yeah time for me to look at the proposals
10:06:35 <b1airo> bfn!
10:06:42 <apdibbo> bye
10:06:45 <oneswig> Until next time
10:06:53 <oneswig> #endmeeting