09:00:28 <oneswig> #startmeeting scientific_wg
09:00:29 <openstack> Meeting started Wed Oct 12 09:00:28 2016 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:00:31 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:00:33 <openstack> The meeting name has been set to 'scientific_wg'
09:00:44 <oneswig> Hello and good morning
09:01:18 <oneswig> Blair sends his apologies, he's at an eResearch conference
09:01:42 <oneswig> #link agenda today is https://wiki.openstack.org/wiki/Scientific_working_group#IRC_Meeting_October_12th_2016
09:02:28 <oneswig> Do we have Phil K from STFC here today?
09:02:57 <dariov> hello everybody!
09:03:00 <b1airo> Hi! Coming to you live from eResearch Australasia
09:03:06 <oneswig> Morning Daria
09:03:21 <oneswig> Hi b1airo, mind you don't spill the soup :-)
09:03:52 <stevejims> Hello all, Steve from StackHPC here
09:04:02 <oneswig> Hi stevejims
09:04:05 <oneswig> :-)
09:04:38 <oneswig> While b1airo's here (and Phil might not be) can we reshuffle the items
09:04:48 <oneswig> b1airo: are you able to talk on your follow-up findings?
09:05:03 <b1airo> Sure
09:05:06 <oneswig> #topic Blair's research on virtualisation
09:05:16 <b1airo> Though don't have much to say yet...
09:05:39 <oneswig> I have a related item when you're done
09:05:43 <b1airo> As I reported last week we have been doing some bare metal versus KVM benchmarks
09:07:00 <b1airo> Getting equivalent performance with Linpack
09:07:33 <b1airo> I'm on my phone so if someone else can reference last week's log that'd be great...
09:08:30 <b1airo> But basically on a two-socket E5-2680v3 box we get ~780 Gflops with Intel-optimised Linpack
09:08:56 <oneswig> #link discussion was here http://eavesdrop.openstack.org/meetings/scientific_wg/2016/scientific_wg.2016-10-04-21.01.log.html#l-24
09:09:18 <b1airo> With CPU and NUMA pinning and topology we get the same in KVM
09:09:40 <oneswig> And I recall you needed a very recent kernel and supporting tools, right?
09:10:30 <oneswig> Did you manage to write it up on the Monash blog? Sure loads of people would like to see that
09:10:35 <b1airo> We are on a 4.4 kernel
09:10:38 <b1airo> On Trusty but using Xenial HWE
09:11:01 <b1airo> And QEMU 2.5 from the Mitaka cloud archive
09:11:24 <verdurin> Morning all.
09:11:52 <oneswig> Hi verdurin
09:12:18 <b1airo> Not yet no, need to get to the whitepaper first :-( - just a bit overloaded with DC migration and network activities at the moment
09:12:59 <oneswig> No problem b1airo. On kernel versions, we've found a possible path through on our PV VXLAN performance problems
09:13:04 <b1airo> Interesting issue we hit with the topology exposure is that OpenMPI hwloc gets confused in the guest
09:13:23 <oneswig> confused how?
09:13:59 <b1airo> It was reporting overlapping CPU sets
09:14:26 <b1airo> This was with the CPU topology emulated to match the host, so NUMA node 0 with CPUs 0,2,4,... and so on
09:15:16 <b1airo> Changed it to be consecutive and hwloc stops whinging. Odd. But seems like not the greatest code
09:15:36 <oneswig> And does that result in something like anti-affinity?
09:15:58 <b1airo> Also started writing a little script to test NUMA memory latency
09:16:27 <b1airo> oneswig: not if you then adjust the CPU pinning accordingly
09:16:52 <oneswig> b1airo: will be following your footsteps RSN...
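The topology mismatch b1airo describes is straightforward to sanity-check from inside a guest. The following is a minimal illustrative sketch (not anything referenced in the meeting) that reads the Linux sysfs NUMA description and warns about any CPU claimed by more than one node, the kind of overlapping CPU set hwloc was reporting. It assumes a Linux guest that exposes /sys/devices/system/node.

```python
#!/usr/bin/env python
"""Sanity-check the NUMA topology visible to this (guest) OS via sysfs.

Illustrative sketch only: prints each node's CPU list and warns if any
CPU appears in more than one node, the kind of overlapping CPU set that
confused hwloc in the guest. Assumes Linux with /sys/devices/system/node.
"""
import glob
import os


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0,2,4-7' into a set of ints."""
    cpus = set()
    for part in text.strip().split(','):
        if '-' in part:
            lo, hi = part.split('-')
            cpus.update(range(int(lo), int(hi) + 1))
        elif part:
            cpus.add(int(part))
    return cpus


def main():
    seen = {}
    for node_dir in sorted(glob.glob('/sys/devices/system/node/node[0-9]*')):
        node = os.path.basename(node_dir)
        with open(os.path.join(node_dir, 'cpulist')) as f:
            cpus = parse_cpulist(f.read())
        print('%s: cpus %s' % (node, sorted(cpus)))
        # Flag any CPU claimed by more than one node.
        for other, other_cpus in seen.items():
            overlap = cpus & other_cpus
            if overlap:
                print('  WARNING: %s overlaps %s on CPUs %s'
                      % (node, other, sorted(overlap)))
        seen[node] = cpus


if __name__ == '__main__':
    main()
```

Running it on the host and again in the guest makes it easy to compare the topology each one sees.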
09:18:10 <b1airo> I started on the memory latency testing because when adjusting the topology I did try Linpack with anti-affinity just for shits and giggles, but interestingly it made no difference!
09:18:36 <b1airo> So then figured it is not a good indication of memory performance
09:19:07 <oneswig> b1airo: perhaps the config is outsmarting your attempts to make it dumber?
09:19:47 <b1airo> Initial results of that testing seem to indicate the guest is ~20% slower in random-access memory latency
09:20:35 <b1airo> Mind you, I'm using 40 lines of Python for this so...
09:22:01 <b1airo> That's it for now. I will write this up while we're in Barcelona
09:22:03 <oneswig> Neat to see that achieved in Python, didn't think it had it in it
09:22:21 <oneswig> Thanks b1airo, keep it coming
09:22:51 <oneswig> A little related follow-up on the issues we had with VXLAN bandwidth
09:23:17 <oneswig> We were getting a feeble amount of TCP bandwidth - ~1.6 Gbit/s on a 50G NIC
09:23:23 <b1airo> Just create a big list, generate another big (but smaller) list of random offsets and time accessing them. Run using numactl
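b1airo's description above is enough to reconstruct the shape of such a probe. What follows is a minimal sketch under those assumptions, not b1airo's actual 40-line script: build a big list, pre-generate a smaller list of random offsets, and time the accesses, with CPU and memory placement pinned externally via numactl. Absolute numbers from Python are rough; the useful signal is the relative difference between bare metal and guest, or between local and remote NUMA nodes.

```python
#!/usr/bin/env python
"""Rough NUMA memory-latency probe, a sketch along the lines described
above (sizes are illustrative, not b1airo's actual script).

Pin CPU and memory placement externally, e.g.:
    numactl --cpunodebind=0 --membind=0 python mem_latency.py   # local node
    numactl --cpunodebind=0 --membind=1 python mem_latency.py   # remote node
"""
import random
import time

ARRAY_SIZE = 16 * 1024 * 1024   # elements in the big list
N_ACCESSES = 4 * 1024 * 1024    # random reads to time


def main():
    data = list(range(ARRAY_SIZE))  # the big list to wander around in
    # Pre-generate the (smaller) list of random offsets so the timed
    # loop measures memory access, not the random number generator.
    offsets = [random.randrange(ARRAY_SIZE) for _ in range(N_ACCESSES)]

    start = time.time()
    total = 0
    for i in offsets:
        total += data[i]  # random-access read
    elapsed = time.time() - start

    print('%d accesses in %.2fs -> ~%.0f ns/access (checksum %d)'
          % (N_ACCESSES, elapsed, elapsed * 1e9 / N_ACCESSES, total))


if __name__ == '__main__':
    main()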
09:23:38 <oneswig> With some BIOS tuning, this went up to ~3.2 Gbit/s
09:23:58 <oneswig> Thanks to some helpful guidance from mlnx
09:24:02 <b1airo> Oh yeah?
09:24:15 <oneswig> I'll write it up when you write yours up :-)
09:24:31 <b1airo> Lol
09:24:37 <oneswig> But now we've got something more tangible which might make a bigger difference
09:24:52 <b1airo> Don't make me choke on my steak
09:25:00 <oneswig> offload of encapsulated frames is in some way borked up to kernel 4.6
09:25:33 <oneswig> So I'm currently working my way up to a latest 4.x kernel to see if I can get the step change I'm looking for
09:25:43 <b1airo> Can you run mainline?
09:26:14 <oneswig> Trying - this isn't production. Currently up to latest 3.x, seems good so far, making my way to 4.x today is the plan
09:26:59 <oneswig> I'm hoping to find some useful data along the way
09:27:23 <oneswig> OK, any more on tuning and tweaking?
09:27:41 <oneswig> Let's move on
09:27:50 <oneswig> #topic RCUK workshop at the Crick in London
09:27:51 <b1airo> I'm done - chewing now
09:28:05 <oneswig> bon appetit b1airo, thanks for joining
09:28:17 <oneswig> Phil K are you here?
09:28:39 <oneswig> #link RCUK cloud workshop https://www.eventbrite.com/e/rcuk-cloud-working-group-workshop-tickets-27722389413
09:29:15 <oneswig> There's a workshop coming up discussing OpenStack use cases for research computing
09:29:29 <oneswig> Was hoping Phil could join us to talk it over
09:30:00 <oneswig> I'm planning on going - hopefully see you there
09:30:27 <oneswig> OK let's move on
09:30:39 <verdurin> I'll be there too
09:30:50 <oneswig> verdurin: long way to go? :-)
09:31:02 <oneswig> great, see you there
09:31:19 <oneswig> #topic Barcelona
09:31:33 <oneswig> 2 weeks to go now
09:31:48 <oneswig> We have our WG events on the schedule
09:32:02 <oneswig> I put some details here
09:32:08 <oneswig> #link https://wiki.openstack.org/wiki/Scientific_working_group#Barcelona_Summit.2C_October_2016
09:32:51 <oneswig> #link etherpad agenda for the WG meeting https://etherpad.openstack.org/p/scientific-wg-barcelona-agenda
09:33:37 <oneswig> Please add any items for discussion here
09:34:24 <oneswig> For the BoF (2:15pm Wednesday), we are now looking for lightning talks
09:35:09 <oneswig> I might talk about our "burst buffer" storage
09:35:51 <oneswig> As per the list mail, mail me and b1airo to put your talk on the schedule, we should have room for about 8
09:37:45 <oneswig> Finally, for the evening social, I'm going to mail all ticketholders once I've scraped together the addresses - we have a subsidy but need to charge for the meal
09:38:35 <oneswig> I'll mail later today with the details of how we will do that and give people the choice
09:39:09 <oneswig> Any more items for Barcelona?
09:41:21 <oneswig> #link There's also a scientific theme to the opening keynote this year https://www.openstack.org/summit/barcelona-2016/summit-schedule/events/17299/major-scientific-discoveries-on-openstack
09:42:10 <oneswig> #topic SC16
09:42:27 <oneswig> One more item for SC
09:43:05 <oneswig> #link b1airo put together an etherpad collecting OpenStack-themed activities at SC https://etherpad.openstack.org/p/scientific-wg-supercomputing16
09:43:59 <oneswig> #topic Any other business
09:44:24 <oneswig> OK that's all I had for today - anyone have other news or things to discuss?
09:44:30 <dariov> sorry - urgent email to reply to. oneswig, we're also planning to come down to the RCUK meeting
09:44:50 <oneswig> dariov: great, see you there
09:45:01 <dariov> (I guess we're giving a talk on how we see porting pipelines to the cloud, or similar things)
09:46:00 <oneswig> I'd be interested to know if you're doing that including cloud APIs or by recreating the environment transparently to the application
09:46:31 <dariov> 2nd for the time being, way easier
09:47:03 <dariov> (and requires less coding, users usually start to run as soon as we start talking about rewriting a few lines of code)
09:47:14 <oneswig> dariov: I may be doing something very similar in the coming months, I'll be paying attention
09:47:31 <oneswig> verdurin: are you speaking at the event?
09:48:34 <oneswig> OK, let's conclude for the week... any final items?
09:49:02 <oneswig> #endmeeting