09:00:47 <oneswig> #startmeeting scientific_wg
09:00:48 <openstack> Meeting started Wed Aug 17 09:00:47 2016 UTC and is due to finish in 60 minutes.  The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:00:49 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:00:51 <openstack> The meeting name has been set to 'scientific_wg'
09:01:17 <oneswig> #link Agenda from wiki https://wiki.openstack.org/wiki/Scientific_working_group#IRC_Meeting_August_17th_2016
09:01:31 <oneswig> Good morning all
09:01:44 <dariov> good morning!
09:01:50 <priteau> Good morning
09:01:58 <jakeyip> hi
09:02:09 <oneswig> Hi there!
09:02:34 <blairo> evening
09:02:40 <oneswig> blairo: where's the '1'?
09:02:47 <oneswig> #chair blairo
09:02:48 <openstack> Current chairs: blairo oneswig
09:02:49 <blairo> wondered whether you'd notice
09:03:01 <oneswig> I miss nothing (except family birthdays)
09:03:03 <blairo> coming to you live from the office tonight
09:03:25 <oneswig> The Presidential suite?
09:03:28 <dariov> lol
09:03:29 <blairo> (where i have my client configured properly for my irc bouncer)
09:04:05 <oneswig> Aha, well let's get the show on the road - I don't think there's a massive amount this week, it's holiday season over here
09:04:14 <oneswig> #topic Accounting and Scheduling
09:04:33 <oneswig> Did you see the mail from Danielle Mundle?
09:04:41 <oneswig> #link http://lists.openstack.org/pipermail/user-committee/2016-August/001186.html
09:04:45 <blairo> re. quota study?
09:04:55 <oneswig> that's the one.  Seemed relevant to some in this WG
09:05:18 <dariov> yep, saw that
09:05:20 <blairo> yes i volunteered (if they can accommodate me being away the next two weeks)
09:05:47 <dariov> I think a couple of guys here might be interested, but they’re on holiday for the next two weeks ;-)
09:05:50 <dariov> bad timing
09:05:56 <blairo> in the last UX thing i did (horizon mock-up review) i made some comments around quotas and the nectar use-cases
09:06:12 <oneswig> Quite an involved approach to have a video interview, but it sounds like it could be a useful way to capture feedback
09:06:22 <blairo> would be good to formalise that
09:06:37 <blairo> it's particularly relevant to large distributed clouds running cells i think
09:07:21 <blairo> yeah i guess they are after qualitative rather than quantitative data
09:08:04 <oneswig> I recall Tim Bell's gripe was the combination of a user quota and a group quota
09:08:15 <oneswig> I hope that's well covered in their survey
09:08:25 <blairo> reminds me i have not been back to review, summarise, share the results of the research/science cloud survey we knocked up for austin
09:08:49 <oneswig> perfect holiday activity?
09:09:18 <blairo> could be (if my wife is not looking ;-)
09:09:48 <oneswig> You take concerns over data confidentiality very seriously, that's impressive :-)
09:10:06 <oneswig> OK well I think that's the UX message covered
09:10:13 <priteau> In Chameleon the main pain point about quotas is that they get out of sync regularly and have to be corrected manually in the database. I have seen it reported on launchpad, anyone here seen it as well?
09:10:20 <oneswig> Was there more on accounting and scheduling to cover?
09:10:22 <blairo> i think cern in particular want a nested quota structure that works for a department - project/team - individual layered approach
09:10:47 <dariov> same here blairo
09:11:00 <dariov> we actually discussed with Tim on that back in march I guess
09:11:13 <dariov> we would *love* to have the same thing at the EBI
09:11:19 <blairo> priteau: yes we have that problem in many large deployments i think
09:11:20 <dariov> it would solve many problems
09:11:34 <oneswig> Good points people - stand up and be heard
09:11:39 <blairo> in nectar our cron box runs a regular quota sync task
09:11:58 <dariov> like the “Give the PI a big quota, and then let him decide how to split it among his people” kind of thing
09:12:28 <blairo> dariov: ok, well i'll make sure the UX folks are aware of that requirement and the people interested in it
09:12:29 <dariov> this would save some pain for our cloud guys
09:12:38 <dariov> blairo, thanks
09:12:58 <blairo> tim has blogged pretty extensively on this already so i think we can just say: that please
09:13:10 <oneswig> #link Sign up here for UX discussions http://doodle.com/poll/7tid473za2hpi6e7
09:13:47 <oneswig> Any more to cover on this?
09:14:18 <blairo> maybe one thing
09:15:11 <blairo> priteau: i can point you to what we're doing in nectar for quota sync (i'm not intimately familiar with the code but sounds like it may be the same problem)
09:15:32 <blairo> that was all
09:15:49 <oneswig> #topic User stories
09:15:53 <priteau> blairo: I'd like to take a look if you've got the scripts in a public repo. I have found another one on GitHub as well
09:16:59 <oneswig> OK I'm still working away on the OpenStack/HPC paper but I've realised I need some more data points, perhaps you can help
09:17:16 <blairo> priteau: details here: http://lists.openstack.org/pipermail/openstack-operators/2015-March/006596.html
09:17:23 <priteau> thanks blairo
09:18:22 <oneswig> Thanks blairo looks handy
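For anyone else hitting the quota drift priteau describes, a minimal detection sketch, assuming python-novaclient's v2 interface; it only reports mismatches between what nova's quota records claim and what is actually running. The repair step (which the Nectar thread linked above covers) writes to the database and is deliberately left out. USERNAME, PASSWORD, PROJECT, AUTH_URL and PROJECT_IDS are placeholders, not values from this meeting.

    from novaclient import client as nova_client

    nova = nova_client.Client('2', USERNAME, PASSWORD, PROJECT, AUTH_URL)

    for project_id in PROJECT_IDS:
        # What nova believes is in use (backed by its quota usage records)
        absolute = nova.limits.get(tenant_id=project_id).absolute
        recorded = {limit.name: limit.value for limit in absolute}

        # What is actually running in that project
        servers = nova.servers.list(
            search_opts={'all_tenants': 1, 'tenant_id': project_id})

        if recorded.get('totalInstancesUsed') != len(servers):
            print('%s: quota usage records %s instances, found %s running'
                  % (project_id, recorded.get('totalInstancesUsed'),
                     len(servers)))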
09:18:43 <oneswig> You might have seen on the operators list, I'm looking for an IB user, ideally in production
09:18:52 <oneswig> Thought there were loads, turns out not so much
09:19:10 <oneswig> Does anyone know one?
09:19:52 <blairo> you might be first oneswig o_0
09:20:14 <oneswig> I don't have IB (although I might be getting some second-hand kit to experiment with)
09:20:41 <dariov> so, just because I’m noob, what’s an IB?
09:20:46 <oneswig> Jon Mills said they used to, but found IPoIB was slower than 10GE
09:20:50 <oneswig> IB = infiniband, sorry
09:20:58 <blairo> ah yes, crossing my link-layers - cambridge has SN2700 ethernet fabric too
09:20:59 <dariov> ah-ah
09:21:03 <dariov> thnx
09:21:40 <verdurin> Morning. I will have it shortly, as I've already told oneswig.
09:21:54 <oneswig> Thanks verdurin, and good morning
09:22:36 <blairo> oneswig: are you wanting folks that are doing IB all the way to guest, i.e., sriov, or just using IB as their DC interconnect ?
09:22:50 <oneswig> Well I'm not sure, whatever turns up
09:22:54 <blairo> (i assumed the former, but realised i could be wrong)
09:23:00 <blairo> ok
09:23:14 <oneswig> I'm trying to think of what an HPC user might want to do that OpenStack can't
09:23:45 <oneswig> I think I'm trying to prove an apple is an orange
09:24:17 <blairo> so long as there are no lemons involved i don't see a problem
09:24:39 <oneswig> Right!  Bowled you a slow one there :-)
09:25:00 <blairo> like australia against sri lanka
09:25:24 <oneswig> I think I missed that one
09:25:28 <oneswig> underarm?
09:25:52 <blairo> clean sweep of latest test series to them
09:26:20 <oneswig> There's a strange randomness to cricket outcomes I don't fully understand
09:26:23 <oneswig> OK, I'll keep looking.  Anything else to cover on user stories?
09:26:55 <blairo> i'm finally reading your draft now oneswig
09:27:10 <oneswig> blairo: great thanks, appreciate that
09:28:10 <oneswig> #topic Bare metal
09:28:57 <oneswig> I don't think I have anything on bare metal this week.  What's new in the Ironic world?
09:29:42 <oneswig> verdurin: will your new cluster be bare metal with IB or virtualised?
09:31:15 <oneswig> Think he's gone...?
09:32:07 <oneswig> OK, let's move on
09:32:10 <priteau> oneswig: All the multi-tenant network support appears to have been committed, but we have yet to evaluate it
09:32:13 <blairo> probably to get coffee
09:32:39 <oneswig> priteau: great, thanks for the update on that, good to know
09:33:09 <oneswig> I'll be interested to see how they are mapping physical machines to physical network ports - do you know?
09:33:31 <priteau> I think you have to define the port number when registering a node
09:33:54 <oneswig> Ah OK, defer the mapping problem...
09:34:18 <oneswig> This might be something the Ironic Inspector could learn, I wonder
09:34:21 <oneswig> cogs whirring
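For reference, a sketch of the node-to-switch-port mapping priteau mentions, assuming python-ironicclient and the local_link_connection field from Ironic's multi-tenant networking work; the MACs, switch port, node UUID and credentials below are made-up placeholders.

    from ironicclient import client as ironic_client

    ironic = ironic_client.get_client(1,
                                      os_auth_url=AUTH_URL,
                                      os_username=USERNAME,
                                      os_password=PASSWORD,
                                      os_tenant_name=PROJECT)

    # Register the node's NIC along with the physical switch port it is
    # cabled to, so the networking service can flip that port between
    # tenant networks.
    ironic.port.create(
        node_uuid=NODE_UUID,
        address='52:54:00:12:34:56',            # NIC MAC on the node
        local_link_connection={
            'switch_id': '00:1e:67:aa:bb:cc',   # switch chassis identifier
            'port_id': 'Ethernet1/12',          # physical switch port
            'switch_info': 'tor-switch-1',
        })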
09:34:38 <oneswig> priteau: BTW did you get any further with your patches for Blazar?
09:35:05 <priteau> oneswig: not yet unfortunately
09:36:05 <oneswig> ok, just wondering.  A big set of patches can be like being carried away by a helium balloon - the longer you leave it, the more it'll hurt
09:36:56 <oneswig> OK, move on?
09:37:09 <oneswig> #topic Parallel filesystems
09:38:09 <oneswig> I believe the team at Cambridge have got Lustre into their VMs but I don't know how it's performing.  Progress for them though.
09:38:29 <verdurin> oneswig: virtualised, at least initially
09:38:42 <verdurin> We plan to investigate GPFS into our VMs
09:39:06 <blairo> good to hear
09:39:07 <oneswig> verdurin: via IB I assume?
09:39:50 <verdurin> oneswig: preferably. Ethernet is possible, too.
09:39:56 <oneswig> verdurin: how close are you to getting this system up and running?
09:40:17 <blairo> ours on M3 is going fine, still tuning the filesystem for performance at a more basic level, no performance issues caused by sriov at this stage
09:40:41 <verdurin> oneswig: not very - super busy with other stuff but people are starting to ask about it more, so I'll have to find the time
09:40:54 <oneswig> blairo: That's Lustre right?
09:41:01 <blairo> yep
09:41:06 <blairo> using o2ib LNET
09:41:56 <oneswig> dariov: what do you use at EBI?
09:42:12 <dariov> oneswig, NetApp I think
09:42:27 <dariov> but the guys are moving loads of stuff lately
09:42:32 <dariov> so I might need to check with them
09:42:46 <oneswig> Be interesting to hear what they migrate to
09:42:52 <oneswig> If that's the plan
09:42:59 <dariov> I don’t think so
09:43:25 <dariov> we’ve got “some” new kit coming in the near future
09:44:02 <dariov> when it’s here they’ll start migrating everything to Mitaka
09:44:03 <blairo> oneswig: one thing to be aware of that i discovered recently is that device passthrough, e.g. for sriov NICs/HCAs, means that transparent huge pages cannot be allocated by the host
09:44:46 <oneswig> With consequences for memory-intensive workloads?
09:44:52 <blairo> (because IOMMU requires guest memory to be pinned)
09:45:02 <dariov> but the storage backend will be the same
09:45:42 <oneswig> If guest memory must be pinned what effect does that have on things like KSM, overcommitment and ballooning?
09:45:43 <blairo> yes will probably be very sucky for memory intensive workloads if you have very large guests (as bigger memory footprint means more top-level TLB misses)
09:46:10 <blairo> e.g. 200+GB of 4kB pages
09:46:32 <blairo> but! you just have to know to use static huge pages instead
09:46:57 <oneswig> Can you note this in the doc?  Pearls of wisdom from Down Under
09:46:57 <blairo> i don't have any numbers yet that quantify this, but i hope to produce something at least anecdotal
09:47:07 <blairo> copy
09:47:11 <oneswig> Thanks!
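A sketch of the static huge pages workaround blairo describes, assuming python-novaclient and the standard hw:mem_page_size flavor extra spec; the flavor name, credentials and hugepage counts are placeholders, and the pages still have to be reserved on the hypervisors.

    from novaclient import client as nova_client

    nova = nova_client.Client('2', USERNAME, PASSWORD, PROJECT, AUTH_URL)

    # Back guests of this flavor with 1GB huge pages, so SR-IOV passthrough
    # (which pins guest memory and rules out transparent huge pages) does
    # not leave large guests running on 4kB pages.
    flavor = nova.flavors.find(name='hpc.sriov.large')
    flavor.set_keys({'hw:mem_page_size': '1GB'})

    # The hypervisors need the pages preallocated in advance, e.g. via a
    # kernel boot parameter along the lines of:
    #   default_hugepagesz=1G hugepagesz=1G hugepages=200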
09:47:31 <oneswig> OK, AOB?
09:47:33 <blairo> and yes, that would impact KSM etc, but you already want those disabled if you care about HPC-like performance
09:47:43 <oneswig> right
09:47:56 <oneswig> but perhaps this means you must have them disabled?
09:48:26 <oneswig> #topic AOB
09:48:39 <oneswig> I had an interesting problem this week
09:48:49 <oneswig> Mellanox NICs, anyone got those? ;-)
09:48:59 <blairo> not sure about ballooning - does that even work reliably in libvirt+kvm? but for KSM and overcommit it just means they won't work
09:49:11 <oneswig> We've had this kernel barf whenever a TCP connection is initiated via VXLAN
09:49:15 <oneswig> Does anyone see that?
09:49:32 <blairo> though if you are overcommitting by force, like with nova scheduler, it probably means you'll end up DOS-ing your compute nodes
09:49:48 <blairo> i.e. they'll OOM
09:50:33 <blairo> haven't tried any VXLAN traffic on them yet, but we are quite close to doing that with midonet
09:51:10 <oneswig> Midonet?  Interesting.  We have VXLAN+OVS and VLAN+SRIOV networks
09:51:13 <blairo> sounds worrisome
09:51:32 <oneswig> Only the VXLAN ones have this issue with kernel backtraces
09:51:37 <blairo> latest firmware+driver i take it?
09:52:02 <oneswig> I think so.
09:52:14 <blairo> oh, do you mean the Pro models specifically that support VXLAN and GRE offload?
09:52:22 <oneswig> Working on it now with support - I'll report back
09:52:36 <oneswig> Don't think so, we have ConnectX-4 Lx
09:52:54 <oneswig> Don't recall any reference to pro on these ones
09:52:56 <blairo> yeah all newer cards just support that
09:53:10 <blairo> but earlier CX-3 cards did not
09:53:29 <blairo> mellanox will tell you they are all pro now ;-)
09:53:33 <oneswig> I think we must be doing something wrong: VXLAN bandwidth is ~1.6Gbits/s at best - on a 50G link
09:53:49 <blairo> fineprint: except for the bugs
09:54:01 <oneswig> That's for iperf.  If there's a hardware offload, it's not engaging
09:54:12 <blairo> yeah definitely
09:54:31 <oneswig> Question is, what counts for decent bandwidth in a VXLAN world?  I don't have much hope for it.
09:55:18 <oneswig> Through SR-IOV, we sustained 10.5GBytes/s (bi-directional)
09:55:20 <oneswig> brb
09:56:08 <blairo> well i believe early testing in nectar land with iperf over midonet (which is just vxlan once established) managed 4-5Gbps melbourne to queensland (no h/w offload)
09:56:30 <oneswig> midonet is based on OVS too, right?
09:56:39 <blairo> yeah OVS dataplane
09:56:59 <oneswig> standard MTU?  It appears to be massively interrupt-dominant in the hypervisor
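One quick way to check whether the offload oneswig suspects isn't engaging is actually available and enabled on the NIC: a sketch reading ethtool's feature flags, assuming the tx-udp_tnl-segmentation feature name and a placeholder interface name. If it reports off or [fixed], VXLAN segmentation is done in software and numbers like the ~1.6 Gbit/s above are less surprising; a larger MTU on the underlay also helps.

    import subprocess

    IFACE = 'ens2f0'  # placeholder hypervisor interface carrying VXLAN

    # List the NIC's tunnel-related offload flags; VXLAN TSO shows up as
    # tx-udp_tnl-segmentation on recent kernels.
    features = subprocess.check_output(['ethtool', '-k', IFACE]).decode()
    for line in features.splitlines():
        if 'udp_tnl' in line:
            print(line.strip())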
10:16:26 <oneswig> #endmeeting