09:00:28 <oneswig> #startmeeting scientific_wg
09:00:29 <openstack> Meeting started Wed Oct 12 09:00:28 2016 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:00:31 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:00:33 <openstack> The meeting name has been set to 'scientific_wg'
09:00:44 <oneswig> Hello and good morning
09:01:18 <oneswig> Blair sends his apologies, he's at an eResearch conference
09:01:42 <oneswig> #link agenda today is https://wiki.openstack.org/wiki/Scientific_working_group#IRC_Meeting_October_12th_2016
09:02:28 <oneswig> Do we have Phil K from STFC here today?
09:02:57 <dariov> hello everybody!
09:03:00 <b1airo> Hi! Coming to you live from eResearch Australasia
09:03:06 <oneswig> Morning Daria
09:03:21 <oneswig> Hi b1airo, mind you don't spill the soup :-)
09:03:52 <stevejims> Hello all, Steve from StackHPC here
09:04:02 <oneswig> Hi stevejims
09:04:05 <oneswig> :-)
09:04:38 <oneswig> While b1airo's here (and Phil might not be) can we reshuffle the items
09:04:48 <oneswig> b1airo: are you able to talk on your follow-up findings?
09:05:03 <b1airo> Sure
09:05:06 <oneswig> #topic Blair's research on virtualisation
09:05:16 <b1airo> Though don't have much to say yet...
09:05:39 <oneswig> I have a related item when you're done
09:05:43 <b1airo> As I reported last week we have been doing some bare metal versus KVM benchmarks
09:07:00 <b1airo> Getting equivalent performance with Linpack
09:07:33 <b1airo> I'm on my phone so if someone else can reference last week's log that'd be great...
09:08:30 <b1airo> But basically on a two-socket E5-2680v3 box we get ~780 Gflops with Intel-optimised Linpack
09:08:56 <oneswig> #link discussion was here http://eavesdrop.openstack.org/meetings/scientific_wg/2016/scientific_wg.2016-10-04-21.01.log.html#l-24
09:09:18 <b1airo> With CPU and NUMA pinning and topology we get the same in KVM
09:09:40 <oneswig> And I recall you needed a very recent kernel and supporting tools, right?
09:10:30 <oneswig> Did you manage to write it up on the Monash blog? Sure loads of people would like to see that
09:10:35 <b1airo> We are on a 4.4 kernel
09:10:38 <b1airo> On Trusty but using Xenial HWE
09:11:01 <b1airo> And QEMU 2.5 from the Mitaka cloud archive
09:11:24 <verdurin> Morning all.
09:11:52 <oneswig> Hi verdurin
09:12:18 <b1airo> Not yet no, need to get to the whitepaper first :-( - just a bit overloaded with DC migration and network activities at the moment
09:12:59 <oneswig> No problem b1airo. On kernel versions, we've found a possible path through on our PV VXLAN performance problems
09:13:04 <b1airo> Interesting issue we hit with the topology exposure is that OpenMPI hwloc gets confused in the guest
09:13:23 <oneswig> confused how?
09:13:59 <b1airo> It was reporting overlapping CPU sets
09:14:26 <b1airo> This was with the CPU topology emulated to match the host, so NUMA node 0 with CPUs 0,2,4,... and so on
09:15:16 <b1airo> Changed it to be consecutive and hwloc stops whinging. Odd. But seems like not the greatest code
09:15:36 <oneswig> And does that result in something like anti-affinity?
09:15:58 <b1airo> Also started writing a little script to test NUMA memory latency
09:16:27 <b1airo> oneswig: not if you then adjust the CPU pinning accordingly
09:16:52 <oneswig> b1airo: will be following your footsteps RSN...
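The topology mismatch b1airo describes is straightforward to sanity-check from inside a guest. The following is a minimal illustrative sketch (not anything referenced in the meeting) that reads the Linux sysfs NUMA description and warns about any CPU claimed by more than one node, the kind of overlapping CPU set hwloc was reporting. It assumes a Linux guest that exposes /sys/devices/system/node.

```python
#!/usr/bin/env python
"""Sanity-check the NUMA topology visible to this (guest) OS via sysfs.

Illustrative sketch only: prints each node's CPU list and warns if any
CPU appears in more than one node, the kind of overlapping CPU set that
confused hwloc in the guest. Assumes Linux with /sys/devices/system/node.
"""
import glob
import os


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0,2,4-7' into a set of ints."""
    cpus = set()
    for part in text.strip().split(','):
        if '-' in part:
            lo, hi = part.split('-')
            cpus.update(range(int(lo), int(hi) + 1))
        elif part:
            cpus.add(int(part))
    return cpus


def main():
    seen = {}
    for node_dir in sorted(glob.glob('/sys/devices/system/node/node[0-9]*')):
        node = os.path.basename(node_dir)
        with open(os.path.join(node_dir, 'cpulist')) as f:
            cpus = parse_cpulist(f.read())
        print('%s: cpus %s' % (node, sorted(cpus)))
        # Flag any CPU claimed by more than one node.
        for other, other_cpus in seen.items():
            overlap = cpus & other_cpus
            if overlap:
                print('  WARNING: %s overlaps %s on CPUs %s'
                      % (node, other, sorted(overlap)))
        seen[node] = cpus


if __name__ == '__main__':
    main()
```

Running it on the host and again in the guest makes it easy to compare the topology each one sees.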
09:18:10 <b1airo> I started on the memory latency testing because when adjusting the topology I did try Linpack with anti-affinity just for shits and giggles, but interestingly it made no difference!
09:18:36 <b1airo> So then figured it is not a good indication of memory performance
09:19:07 <oneswig> b1airo: perhaps the config is outsmarting your attempts to make it dumber?
09:19:47 <b1airo> Initial results of that testing seem to indicate the guest is ~20% slower in random-access memory latency
09:20:35 <b1airo> Mind you, I'm using 40 lines of Python for this so...
09:22:01 <b1airo> That's it for now. I will write this up while we're in Barcelona
09:22:03 <oneswig> Neat to see that achieved in Python, didn't think it had it in it
09:22:21 <oneswig> Thanks b1airo, keep it coming
09:22:51 <oneswig> A little related follow-up on the issues we had with VXLAN bandwidth
09:23:17 <oneswig> We were getting a feeble amount of TCP bandwidth - ~1.6 Gbit/s on a 50G NIC
09:23:23 <b1airo> Just create a big list, generate another big (but smaller) list of random offsets and time accessing them. Run using numactl
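b1airo's description above is enough to reconstruct the shape of such a probe. What follows is a minimal sketch under those assumptions, not b1airo's actual 40-line script: build a big list, pre-generate a smaller list of random offsets, and time the accesses, with CPU and memory placement pinned externally via numactl. Absolute numbers from Python are rough; the useful signal is the relative difference between bare metal and guest, or between local and remote NUMA nodes.

```python
#!/usr/bin/env python
"""Rough NUMA memory-latency probe, a sketch along the lines described
above (sizes are illustrative, not b1airo's actual script).

Pin CPU and memory placement externally, e.g.:
    numactl --cpunodebind=0 --membind=0 python mem_latency.py   # local node
    numactl --cpunodebind=0 --membind=1 python mem_latency.py   # remote node
"""
import random
import time

ARRAY_SIZE = 16 * 1024 * 1024   # elements in the big list
N_ACCESSES = 4 * 1024 * 1024    # random reads to time


def main():
    data = list(range(ARRAY_SIZE))  # the big list to wander around in
    # Pre-generate the (smaller) list of random offsets so the timed
    # loop measures memory access, not the random number generator.
    offsets = [random.randrange(ARRAY_SIZE) for _ in range(N_ACCESSES)]

    start = time.time()
    total = 0
    for i in offsets:
        total += data[i]  # random-access read
    elapsed = time.time() - start

    print('%d accesses in %.2fs -> ~%.0f ns/access (checksum %d)'
          % (N_ACCESSES, elapsed, elapsed * 1e9 / N_ACCESSES, total))


if __name__ == '__main__':
    main()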
09:23:38 <oneswig> With some BIOS tuning, this went up to ~3.2 Gbit/s
09:23:58 <oneswig> Thanks to some helpful guidance from mlnx
09:24:02 <b1airo> Oh yeah?
09:24:15 <oneswig> I'll write it up when you write yours up :-)
09:24:31 <b1airo> Lol
09:24:37 <oneswig> But now we've got something more tangible which might make a bigger difference
09:24:52 <b1airo> Don't make me choke on my steak
09:25:00 <oneswig> offload of encapsulated frames is in some way borked up to kernel 4.6
09:25:33 <oneswig> So I'm currently working my way up to a latest 4.x kernel to see if I can get the step change I'm looking for
09:25:43 <b1airo> Can you run mainline?
09:26:14 <oneswig> Trying - this isn't production. Currently up to latest 3.x, seems good so far, making my way to 4.x today is the plan
09:26:59 <oneswig> I'm hoping to find some useful data along the way
09:27:23 <oneswig> OK, any more on tuning and tweaking?
09:27:41 <oneswig> Let's move on
09:27:50 <oneswig> #topic RCUK workshop at the Crick in London
09:27:51 <b1airo> I'm done - chewing now
09:28:05 <oneswig> bon appetit b1airo, thanks for joining
09:28:17 <oneswig> Phil K are you here?
09:28:39 <oneswig> #link RCUK cloud workshop https://www.eventbrite.com/e/rcuk-cloud-working-group-workshop-tickets-27722389413
09:29:15 <oneswig> There's a workshop coming up discussing OpenStack use cases for research computing
09:29:29 <oneswig> Was hoping Phil could join us to talk it over
09:30:00 <oneswig> I'm planning on going - hopefully see you there
09:30:27 <oneswig> OK let's move on
09:30:39 <verdurin> I'll be there too
09:30:50 <oneswig> verdurin: long way to go? :-)
09:31:02 <oneswig> great, see you there
09:31:19 <oneswig> #topic Barcelona
09:31:33 <oneswig> 2 weeks to go now
09:31:48 <oneswig> We have our WG events on the schedule
09:32:02 <oneswig> I put some details here
09:32:08 <oneswig> #link https://wiki.openstack.org/wiki/Scientific_working_group#Barcelona_Summit.2C_October_2016
09:32:51 <oneswig> #link etherpad agenda for the WG meeting https://etherpad.openstack.org/p/scientific-wg-barcelona-agenda
09:33:37 <oneswig> Please add any items for discussion here
09:34:24 <oneswig> For the BoF (2:15pm Wednesday), we are now looking for lightning talks
09:35:09 <oneswig> I might talk about our "burst buffer" storage
09:35:51 <oneswig> As per the list mail, mail me and b1airo to put your talk on the schedule, we should have room for about 8
09:37:45 <oneswig> Finally, for the evening social, I'm going to mail all ticketholders once I've scraped together the addresses - we have a subsidy but need to charge for the meal
09:38:35 <oneswig> I'll mail later today with the details of how we will do that and give people the choice
09:39:09 <oneswig> Any more items for Barcelona?
09:41:21 <oneswig> #link There's also a scientific theme to the opening keynote this year https://www.openstack.org/summit/barcelona-2016/summit-schedule/events/17299/major-scientific-discoveries-on-openstack
09:42:10 <oneswig> #topic SC16
09:42:27 <oneswig> One more item for SC
09:43:05 <oneswig> #link b1airo put together an etherpad collecting OpenStack-themed activities at SC https://etherpad.openstack.org/p/scientific-wg-supercomputing16
09:43:59 <oneswig> #topic Any other business
09:44:24 <oneswig> OK that's all I had for today - anyone have other news or things to discuss?
09:44:30 <dariov> sorry - urgent email to reply to. oneswig, we're also planning to come down to the RCUK meeting
09:44:50 <oneswig> dariov: great, see you there
09:45:01 <dariov> (I guess we're giving a talk on how we see porting pipelines to the cloud, or similar things)
09:46:00 <oneswig> I'd be interested to know if you're doing that including cloud APIs or by recreating the environment transparently to the application
09:46:31 <dariov> 2nd for the time being, way easier
09:47:03 <dariov> (and requires less coding, users usually start to run as soon as we start talking about rewriting a few lines of code)
09:47:14 <oneswig> dariov: I may be doing something very similar in the coming months, I'll be paying attention
09:47:31 <oneswig> verdurin: are you speaking at the event?
09:48:34 <oneswig> OK, let's conclude for the week... any final items?
09:49:02 <oneswig> #endmeeting