09:01:16 #startmeeting scientific_wg
09:01:16 Meeting started Wed Jul 20 09:01:16 2016 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:01:17 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:01:19 The meeting name has been set to 'scientific_wg'
09:01:41 Hello
09:01:56 Hi
09:01:56 We have an agenda (such as it is)
09:02:01 Hello!
09:02:06 #link https://wiki.openstack.org/wiki/Scientific_working_group#IRC_Meeting_July_20th_2016
09:02:32 Hi dariov apdibbo
09:03:06 Topics today from the agenda:
09:03:13 Hello
09:03:18 1) Update on user stories activities
09:03:35 2) Anything from other activity areas (especially parallel filesystems this week)
09:03:47 3) Hypervisor tuning guide solicitation
09:04:02 4) Speaker track and WG activities for Barcelona
09:04:28 BTW I've heard from b1airo he's going to join shortly
09:04:35 Hi priteau
09:05:21 #topic Update on user stories activities
09:06:27 I've been working away on this white paper for OpenStack and HPC; not much to report on that right now other than progress is being made
09:06:46 I forget what was discussed on this two weeks ago...
09:07:29 The next phase is solicitation of subject matter experts to provide input, review and comment
09:07:52 To recap, there are five principal subjects:
09:08:03 1) HPC and virtualisation
09:08:13 2) HPC parallel filesystems
09:08:20 3) HPC workload managers
09:08:29 4) HPC infrastructure management
09:08:42 5) HPC network fabrics
09:09:00 Virtualisation is actually half about bare metal
09:09:14 Anyone interested in contributing on these areas?
09:09:47 I have people in mind for many items but not all
09:11:08 OK, don't all shout at once :-)
09:11:13 lol
09:11:33 No problem, I think we have plenty to go on as it is
09:12:26 More generally, is there activity with any of you that might convert well into a user story?
09:14:05 3 and 4 do interest us
09:14:09 At Cambridge our bioinformatics deployment is in early trials, but its HA is less reliable than a non-HA deployment right now...
09:14:24 We're trying to move some of the HPC on-prem to the cloud
09:14:37 (possibly OpenStack, possibly AWS, but that's another story)
09:14:45 We are in early trials with OpenStack for HTC
09:14:49 but no direct knowledge so far
09:14:54 dariov: interesting to know, and why constrain to one at this point
09:15:01 or trials whatsoever
09:15:34 On OpenStack we're limited by our own resources now
09:15:57 and generally people landing on OpenStack for the first time tend to develop cloud-friendly apps from the beginning
09:16:11 so no real need for HPC solutions there
09:16:30 as they usually deploy their own orchestration layer
09:17:00 dariov: I've seen use of ManageIQ for abstracting away private OpenStack vs AWS
09:17:44 oneswig, I think we tried it back in October to "rule all the clouds"; we also have VMware on site
09:17:52 oneswig: One of our partners on Chameleon, Ohio State University, is using Chameleon to work on high performance networking in virtualized environments: http://mvapich.cse.ohio-state.edu/overview/#mv2v
09:18:01 dariov: if you document your journey I'd be interested to read it
09:18:22 oneswig, where are you based, exactly?
09:18:52 priteau: That's pretty cool, I hadn't made the connection, but I know the work at OSU, it's good stuff
09:19:03 dariov: I'm based in Bristol but working in Cambridge
09:19:09 ah ah!
09:19:28 Hello from the Genome Campus in Hinxton, Cambridgeshire, UK, then :-)
09:19:35 oneswig: I can get you in touch if you want
09:19:39 Small world :-)
09:20:30 priteau: I would appreciate that. I've previously mailed with DK Panda but only once or twice. A connected introduction for this work would be a great help
09:20:46 oneswig, yes. We can also have a cup of coffee together one day, it would be cool to see how other people are making this journey :-)
09:21:13 #action Cambridge-centric OpenStack meetup needed!
09:21:26 I'm in town next week midweek. You around?
09:21:34 Let's take this offline :-)
09:21:58 Anything else to add on user stories?
09:23:01 #topic Bare metal
09:23:36 I don't actually have anything here, but Pierre, I recall you were more active than me w.r.t. Ironic serial consoles. Any news there?
09:24:54 We are working on backporting the patch to our installation (we're still on Liberty) and will soon test it
09:25:39 I believe there was some progress upstream, as the multiple implementations seem to have merged into one, and I saw at least one patch being merged in master
09:25:40 That's great. Is this a patch to Nova as well?
09:26:11 This one: https://github.com/openstack/ironic/commit/cb2da13d15bf72755880e7a8e6881e5180e2e29f
09:26:34 Evening
09:26:49 Hi b1airo
09:26:54 #chair b1airo
09:26:55 Current chairs: b1airo oneswig
09:27:15 Just on bare metal, hpfs next
09:27:27 As far as I know it's all in Ironic, no change to Nova
09:28:12 Another interesting thing that happened recently with bare metal: support for network multi-tenancy is being integrated in Ironic
09:28:32 Yeah, we talked a bit about that last week actually
09:28:32 e.g. https://github.com/openstack/ironic/commit/090ba640d9187ec6bee157ce8f1cf12ce6a868ca
09:28:44 or at least touched on it
09:29:01 I was asking about requirements for the new OOB network we will be doing soon
09:29:06 I haven't seen proper documentation yet, so I don't know how it works and what kind of network support it requires
09:29:33 I was surprised to find out that Ironic-Neutron integration seems relatively new
09:29:53 It seems to be sufficiently far upstream that there's limited support and limited docs, but hopefully it'll grow!
09:30:33 b1airo: I think one major obstacle is that Ironic requires physical network knowledge, something that OpenStack has been in wilful denial over
09:31:20 If you're moving activities normally done in a hypervisor onto a network port, you need to know which port (and possibly the paths to it)
09:31:47 Enter the Neo :-)
09:32:06 For now the only doc I see is for devstack, where the bare-metal servers are actually VMs
09:32:56 priteau: I'd previously seen an Arista driver that was doing bare metal multi-tenancy and required LLDP to make the connection with a physical switch+port
09:33:37 But that work is now very much out of date, I expect
09:35:18 Anyway, it's great to see this documentation merged, thanks priteau for sharing that
09:35:23 It looks like in this new implementation they extended the API to add the physical port info, I suppose managed by the admin
09:35:31 e.g. https://github.com/openstack/ironic/commit/76726c6a3feda8419724d09c777e7bb578d82ec0#diff-2526370ba29c4ac923f13218952a32c3R167
09:35:53 Ooh, interesting.
09:35:55 I will keep an eye on what they are doing and will share what I learn
09:36:04 Maintaining the physical mapping - an exercise for the reader?
09:36:32 Any more for bare metal?
09:37:10 Not from me
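An aside for anyone wanting to try the multi-tenant networking work linked above: the API extension discussed carries the physical switch connection on each Ironic port, and supplying it is left to the admin. The sketch below shows roughly what registering that mapping against the Ironic REST API might look like; the endpoint, token, node UUID, MAC, switch identifiers and the microversion are all placeholders or assumptions to be checked against your Ironic release, not values from any deployment mentioned here.

```python
import json
import requests

# Placeholder values -- substitute your own Ironic endpoint, Keystone token,
# node UUID, and the switch/port the node's NIC is actually cabled to.
IRONIC_URL = "http://ironic.example.com:6385"
TOKEN = "<keystone-token>"
NODE_UUID = "<node-uuid>"

port = {
    "node_uuid": NODE_UUID,
    "address": "52:54:00:12:34:56",        # MAC of the bare-metal NIC
    "local_link_connection": {             # physical topology, admin-maintained
        "switch_id": "00:1c:73:aa:bb:cc",  # switch chassis ID (MAC)
        "port_id": "Ethernet1/10",         # switch port the NIC plugs into
        "switch_info": "tor-switch-01",    # free-form label
    },
}

resp = requests.post(
    IRONIC_URL + "/v1/ports",
    headers={
        "X-Auth-Token": TOKEN,
        "Content-Type": "application/json",
        # local_link_connection needs a recent-enough API microversion
        "X-OpenStack-Ironic-API-Version": "1.19",
    },
    data=json.dumps(port),
)
resp.raise_for_status()
print("Created port", resp.json()["uuid"])
```

As the discussion notes, keeping that switch/port mapping accurate remains an exercise for the operator (or for something like LLDP-based discovery feeding it).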
09:37:59 #topic Parallel Filesystems
09:38:15 Is it just the three of us at the moment?
09:38:29 I think 5-6?
09:39:10 Ah OK, a few lurkers then :-)
09:39:14 apdibbo and dariov were talking earlier
09:39:27 Yeah, I'm still here
09:39:45 I'm here too!
09:39:53 Just don't have much to contribute on bare metal, interesting developments though :)
09:40:01 Same here
09:40:23 There was quite a bit of interest in Blair's mail on the lists about discussing parallel filesystems for cloud HPC
09:41:21 Seems like an idea with momentum behind it
09:41:24 Yeah, was nice to have such quick feedback
09:41:47 Pity I didn't think of it earlier, such is the life of a procrastinator
09:42:08 Any parallel filesystem users or interested potential users here today?
09:43:48 b1airo: for our benefit perhaps we should talk over the WG panel session
09:44:13 Only news here is that we got LNET with o2ib over RoCE via PCI passthrough virtual function NICs in guests going last week, for our new deployment
09:44:28 Nice!
09:44:31 Say that sentence ten times fast
09:44:46 I bet you had it pre-typed out :-)
09:45:23 Got it bound to a hot key ;-)
09:45:31 What issues did you hit on the way?
09:46:07 This deployment is full of ConnectX-4 and ConnectX-4 Lx
09:46:33 b1airo: is that considered an issue?
09:46:56 The firmware in MOFED 3.2-1.0.1 (I think that's right) seems to have an issue; we've had a handful of nodes just "lose" their NICs
09:47:18 i.e. all traffic stops passing on them and we can't do anything to fix it, even resetting the PCI device
09:47:29 On reboot the card is completely absent
09:47:40 Ah, we moved to 3.3 - there's a security advisory against 3.2. Plus we had loads of issues getting link up
09:47:45 i.e., not visible in lspci
09:48:06 b1airo: interestingly your issues are completely disjoint from ours, yet we both appear to have many
09:48:07 We have upgraded a few to 3.3.xxx and that seems to have sorted it
09:48:18 orly!?
09:48:28 Where is that advisory published?
09:48:39 I have a support case open about it and they haven't mentioned that
09:49:04 The OFED 3.3 release notes are where I saw it - was wondering why it's no longer on the website
09:49:05 Guys, I need to run
09:49:13 Later dariov, thanks
09:49:16 Bye!
09:49:32 Anyway, that issue was just a distraction; the main issue was needing to get client Lustre modules built by Intel for the particular kernel + MOFED combination in the guest compute nodes
09:49:46 Bye!
09:50:25 After that, just simple LNET config
09:50:40 Is your process blogged anywhere, perchance...?
09:51:42 The Intel Manager for Lustre (IML) doesn't understand RoCE though, so if you're configuring things server-side you need to manually edit one of the network config files to make it use o2ib, else it assumes tcp (because Ethernet)
09:52:38 No, but that's probably a good idea - it is documented internally reasonably well now so should be easy enough
09:52:55 Share and enjoy, b1airo!
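On the o2ib-versus-tcp point above: a quick sanity check on any client is to look at the NIDs LNET reports and make sure nothing has silently fallen back to tcp. A rough sketch, assuming the Lustre client tools (lctl) are installed and the script runs with enough privilege to call them:

```python
import subprocess

def lnet_nids():
    """Return the LNET NIDs reported by this node, e.g. '10.0.0.5@o2ib'."""
    out = subprocess.check_output(["lctl", "list_nids"], universal_newlines=True)
    return [line.strip() for line in out.splitlines() if line.strip()]

if __name__ == "__main__":
    nids = lnet_nids()
    print("NIDs:", ", ".join(nids) or "none")
    # A default configuration tends to fall back to tcp on Ethernet-like
    # interfaces, which silently routes Lustre traffic off the RDMA path.
    tcp_nids = [n for n in nids if n.split("@", 1)[-1].startswith("tcp")]
    if tcp_nids:
        raise SystemExit("LNET is using tcp NIDs: %s" % ", ".join(tcp_nids))
    print("No tcp NIDs found -- o2ib (RoCE/IB) path in use.")
```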
09:53:40 OK, we should move on
09:53:57 Remaining issue is that we reconfigured the filesystem into two separate filesystems (projects and scratch), and now we seem to be peaking at ~80MB/s for single client/thread writes... which is a little low!
09:54:27 b1airo: for that effort, I would say so. Is there a gigabit Ethernet somewhere in the data path???
09:55:11 It's a mix of 25, 50 (compute/client side) and 100 (server)
09:55:38 Same issue with bare-metal clients, so it isn't the passthrough or anything like that
09:55:54 Initial acceptance tests were just fine
09:56:21 Haven't looked into it yet myself, distracted with other issues as per recent os-ops posts
09:56:32 SMART errors on your disks perhaps?
09:56:51 Right, let's move on
09:56:54 #topic Any Other Business
09:57:07 What news?
09:57:08 They're all Dell MDs so unlikely
09:57:26 Trump got the Republican nomination?
09:57:44 You raise that with 3 MINUTES to discuss?
09:58:12 haha
09:58:35 oneswig: Is this the Hypervisor Tuning Guide? https://wiki.openstack.org/wiki/Documentation/HypervisorTuningGuide
09:58:42 Yep
09:58:52 I had never seen it before
09:59:09 Yeah, it's not well promoted or linked anywhere
09:59:34 I'm hoping we can figure out something sensible to do with it
09:59:51 Are you officially its custodian, b1airo?
09:59:51 Maybe look at converting it to doc format next cycle
10:00:22 Something like that, but all the work so far has been done by Joe Topjian
10:00:57 I think the initial idea was to turn it into a standalone doc
10:01:15 b1airo: maybe merge it into http://docs.openstack.org/ops-guide/arch_compute_nodes.html
10:01:35 but it seems like it's hard to find info on other hypervisors that would bulk out the content enough to justify that
10:01:44 priteau, good suggestion
10:01:52 I was also thinking probably the ops guide
10:02:38 It's more operational than architectural, I'd guess
10:03:27 Yeah, though some of the things covered are probably things you want to be thinking about before you buy your hardware
10:03:52 I found CERN's study that Hyper-V outperforms KVM relevant - a pity there isn't more diversity in the guide as-is if KVM's not the fastest thing out there
10:04:09 Or in the Operations part of the Ops Guide, it could be a new page about troubleshooting performance problems
10:05:19 Ah, we are over time
10:05:26 Any final comments?
10:05:36 NUMA matters!
10:05:52 One from me - it appears we have 8 sessions allocated for the Barcelona summit
10:06:03 hooray!
10:06:24 Got to wrap up now, thanks everyone
10:06:28 Yeah, time for me to look at the proposals
10:06:35 bfn!
10:06:42 bye
10:06:45 Until next time
10:06:53 #endmeeting
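A footnote on the "NUMA matters!" comment and the hypervisor tuning guide discussion above: on KVM, NUMA placement and CPU pinning are typically requested through Nova flavor extra specs. The sketch below, using python-novaclient, is illustrative only - the flavor name and sizes, credentials and endpoint are placeholders, and the extra spec values should be checked against the tuning guide and your Nova release.

```python
from novaclient import client as nova_client

# Illustrative credentials -- replace with your own cloud's values.
nova = nova_client.Client("2", "admin", "secret", "admin",
                          auth_url="http://keystone.example.com:5000/v2.0")

# A flavor intended to fit inside a single NUMA node of the host,
# with dedicated (pinned) vCPUs and huge pages backing guest RAM.
flavor = nova.flavors.create("hpc.pinned.16", ram=65536, vcpus=16, disk=40)
flavor.set_keys({
    "hw:cpu_policy": "dedicated",   # pin vCPUs to host pCPUs
    "hw:numa_nodes": "1",           # expose a single guest NUMA node
    "hw:mem_page_size": "large",    # back guest memory with huge pages
})
```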