11:01:20 <martial__> #startmeeting Scientific-SIG
11:01:21 <openstack> Meeting started Wed Jul 18 11:01:20 2018 UTC and is due to finish in 60 minutes. The chair is martial__. Information about MeetBot at http://wiki.debian.org/MeetBot.
11:01:22 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
11:01:24 <openstack> The meeting name has been set to 'scientific_sig'
11:01:54 <martial__> Good day, welcome to a short version of the weekly Scientific SIG meeting
11:02:05 <daveholland> o/
11:02:14 <janders> gday Martial!
11:02:26 <martial__> hello daveholland and janders
11:02:57 <martial__> Both Stig and Blair are otherwise occupied and our agenda is light
11:03:21 <janders> Berlin CFP closed - bit of a relief
11:03:49 <daveholland> John G persuaded me to put a proposal in, first time I've submitted one
11:03:50 <priteau> Hi everyone
11:04:14 <martial__> Hi priteau
11:04:24 <martial__> daveholland: cool :)
11:04:32 <janders> daveholland: great, congratulations! :) what is your preso about?
11:06:21 <daveholland> it's intended to be a high-level description of our first 12-18 months running a production OpenStack service (content to be finalized/argued about, if the proposal is accepted)
11:07:08 <martial__> looking forward to hearing about it
11:07:11 <janders> very cool!
11:07:13 <janders> me too
11:07:40 <martial__> like I mentioned at the top of the hour, today is light
11:07:46 <martial__> the CFP is closed
11:08:22 <janders> martial: have you put any proposals in?
11:08:29 <martial__> just to mention we have a Super Computing 18 Birds of a Feather proposal in the works to discuss OpenStack, Containers and Kubernetes
11:08:41 <janders> very cool!
11:09:01 <martial__> janders: with SC18 at the exact same time as Berlin, sadly I did not, I will be in Dallas
11:09:14 <janders> yeah it's a tough one, isn't it...
11:09:27 <martial__> sadly it is
11:09:38 <janders> I'll almost certainly choose Berlin but I will have a hard time bringing more people with me
11:10:34 <martial__> a lot of our usual HPC operators are at SC18. I have confirmed this with them
11:10:47 <martial__> some are on our BoF as well
11:11:00 <martial__> it is likely Blair will be as well
11:11:20 <martial__> Stig will be in Berlin to run the Scientific SIG
11:11:34 <janders> ACK! That's good to know
11:11:50 <martial__> and that is the entire content of the agenda that I had
11:12:05 <daveholland> hehe. I have a quick NUMA question for AOB
11:12:15 <martial__> like I said, a short meeting, so
11:12:18 <martial__> #topic AOB
11:12:24 <martial__> daveholland: go
11:12:44 <verdurin> Hello
11:12:48 <daveholland> so, we're using NUMA-aware instances for one particular user/project, with extra_specs like hw:cpu_policy='dedicated', hw:cpu_thread_policy='isolate', hw:numa_nodes='2'
11:13:08 <daveholland> it's successful in that they see a performance benefit. Now we're being asked to enable this more widely.
11:13:30 <daveholland> What are people's experiences with mixing NUMA-aware/non-NUMA-aware instances on the same hypervisor?
11:13:39 <daveholland> (should we go "NUMA only"?)
11:13:59 <janders> what's your motivation behind using NUMA-aware instances?
11:14:12 <janders> consistent, predictable performance?
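For context, the extra_specs daveholland lists above are set as flavor properties through the openstack CLI. A minimal sketch, with a hypothetical flavor name and sizing rather than the values used on their cloud:

    openstack flavor create --vcpus 8 --ram 65536 --disk 40 numa.pinned
    openstack flavor set numa.pinned \
        --property hw:cpu_policy=dedicated \
        --property hw:cpu_thread_policy=isolate \
        --property hw:numa_nodes=2

With cpu_policy=dedicated each guest vCPU is pinned to its own host CPU, and thread_policy=isolate additionally places guest vCPUs on cores whose hyperthread siblings are left unused.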
11:14:12 <daveholland> "make it go faster"
11:14:29 <janders> I haven't tried this myself, but my gut feel is mixing it will yield random results
11:14:30 <daveholland> this is a CPU/memory access heavy workload
11:14:53 <janders> I did something similar before the NUMA-aware days, I had flavors with and without CPU overcommit
11:15:02 <janders> these would be tied to different host aggregates
11:15:06 <janders> worked well
11:15:22 <janders> if I were to implement NUMA I'd probably start with the same
11:15:31 <daveholland> I should clarify, this is in an isolated aggregate, we are looking at 26 or 52 vCPU instances (on a 56-CPU host... 28 physical cores with HT enabled)
11:15:48 <verdurin> daveholland: mixing the two sounds like a Brave and Exciting move to me
11:16:01 <daveholland> verdurin: that was my initial reaction too
11:16:31 <daveholland> we are considering enabling it for the biggest flavor only (so it would be the only instance on the hypervisor and couldn't trip over anything else)
11:16:46 <janders> do you overcommit CPUs on NUMA-aware instances?
11:17:09 <martial__> (sorry, have to check on kids)
11:17:11 <janders> from your above comment I understand you don't - correct?
11:17:15 <daveholland> janders: not yet. We think it would be a Bad Idea because the CPU pinning + noisy neighbours would make life worse than expected
11:17:50 <janders> ok.. I think I'm getting it more
11:18:17 <janders> so - the concern is that you'll have enough cores to run both instances w/o overlapping cores
11:18:38 <janders> however having a mix of NUMA and non-NUMA can somehow cause scheduling of both VMs onto the same cores?
11:18:44 <janders> do I get this right?
11:18:47 <daveholland> janders: yes, I think that sums it up
11:18:59 <janders> (I'm using a case with two VMs to make it easier for me to follow)
11:19:47 <janders> if we take NUMA out of the picture for a second - if we had two non-NUMA instances on the hypervisor w/o CPU overcommit
11:20:05 <janders> is there any chance they would hit the same cores? My guess - no.
11:20:24 <janders> I wonder how KVM handles that
11:20:33 <priteau> From http://specs.openstack.org/openstack/nova-specs/specs/mitaka/implemented/virt-driver-cpu-thread-pinning.html, I read: If the host does have an SMT architecture (i.e. one or more cores have "thread siblings") then each vCPU will be placed on a different physical core and no vCPUs from other guests will be placed on the same core.
11:20:37 <janders> and then the question is - how NUMA awareness can affect that
11:20:43 <daveholland> for a non-overcommit flavor/host/aggregate - I think you are correct (we don't cpu-pin for those flavours as there's no perceived need)
11:20:58 <priteau> It's not clear if that works only when all instances use hw:cpu_thread_policy=isolate
11:21:25 <daveholland> priteau: thanks, I hadn't seen that spec
11:21:36 <daveholland> (we are on Pike)
11:22:04 <janders> what OS and OpenStack "distro"?
11:22:06 <priteau> daveholland: It's the first hit I got on Google, but I see pretty much the same text is used in the official docs: https://docs.openstack.org/nova/pike/admin/flavors.html
11:22:10 <daveholland> RHOSP12
11:22:20 <janders> have you looked at the real-time KVM doco?
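The aggregate-based split janders describes is commonly implemented with aggregate metadata plus matching flavor extra_specs, assuming the AggregateInstanceExtraSpecsFilter is enabled in the nova scheduler. A rough sketch with hypothetical aggregate, host and flavor names:

    # Hypervisors reserved for pinned/NUMA-aware guests
    openstack aggregate create --property pinned=true agg-pinned
    openstack aggregate add host agg-pinned compute-numa-01

    # Everything else stays in the shared, overcommitted pool
    openstack aggregate create --property pinned=false agg-shared
    openstack aggregate add host agg-shared compute-gen-01

    # Steer each flavor onto the matching pool
    openstack flavor set numa.pinned --property aggregate_instance_extra_specs:pinned=true
    openstack flavor set m1.large --property aggregate_instance_extra_specs:pinned=false

This keeps dedicated-CPU and overcommitted guests on disjoint sets of hypervisors, which is one way to sidestep the mixing question entirely.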
11:23:14 <janders> I'm not sure if they go down to that level of detail, but I remember the RHATs using these things for their NFV implementations
11:23:27 <daveholland> no (I thought that was more for SRIOV or NFV, we haven't touched those)
11:23:45 <janders> I'll have a quick look out of curiosity
11:24:04 <janders> I noticed that in my time at RHAT - and now it's on my TODO list for a bit later in the project
11:25:24 <daveholland> heh, Google tells me there are past summit presentations on RT KVM, I will check them out
11:26:28 <martial__> daveholland: keep us updated on what you find in a follow-up meeting?
11:26:50 <janders> I was hoping that RHAT would have a dedicated section on RT KVM in the OSP doco, but unfortunately not
11:26:55 <janders> maybe worth a support ticket?
11:27:04 <daveholland> Certainly will. I think we understand most of the machinery (how to pin, what to pin, what thread policy etc) - and have had success with a single instance per hypervisor in a separate aggregate - our uncertainty is mixing this configuration with the vanilla flavors.
11:27:16 <daveholland> +1 to support ticket, thanks
11:27:30 <janders> idea:
11:27:40 <janders> say you have 32 pCPUs
11:27:55 <janders> spin up 16x 2-vCPU instances, half NUMA-aware, half not
11:28:03 <janders> look at the XMLs to see what cores they landed on
11:28:36 <priteau> daveholland: Quick look at the code that enforces the isolate policy (nova/virt/hardware.py), I *think* that your non-NUMA instances may be forbidden to execute on the core where a NUMA instance is pinned
11:28:53 <janders> using fewer, larger VMs may yield more sensible results, but the chances of running into a scheduling clash are lower
11:29:17 <janders> priteau: good stuff!
11:29:52 <janders> does a NUMA-aware XML differ much from a vanilla one?
11:30:22 <daveholland> you get to see the CPU mapping AIUI
11:30:27 <daveholland> thanks for the ideas and pointers.
11:30:50 <janders> thanks for an interesting question! :)
11:30:57 <priteau> daveholland: You could enable debug logs and look at what Nova (I assume nova-compute) prints out
11:31:14 <priteau> It should say something like "Selected cores for pinning: …"
11:32:25 <janders> are you using cpu_mode=host-passthrough ?
11:32:25 <priteau> http://git.openstack.org/cgit/openstack/nova/tree/nova/virt/hardware.py?h=stable/pike#n890
11:32:29 <daveholland> OK, I think our best bet is to do some experiments too
11:33:26 <daveholland> we have cpu_mode=host-model (all the hypervisors are identical....... currently)
11:34:02 <janders> regarding experiments - perhaps it's worth running Linpack with just the NUMA-aware instance running
11:34:09 <janders> and then add a non-NUMA-aware one
11:34:16 <janders> see if there's much fluctuation
11:35:12 <janders> I only tried running Linpack in VMs with CPU passthrough though
11:35:52 <janders> I'd think that if there's more of an overhead in other CPU modes it will be consistent, but it's not like I've tested that..
11:36:02 <daveholland> the future is hazy, but if we agreed not to want migration then host-passthrough is worth a look, yes
11:36:40 <janders> I remember needing passthrough for nested virt, too
11:36:48 <janders> (OpenStack-on-OpenStack development)
11:37:15 <janders> but spot on, all these optimisations can come back to bite later..
11:43:46 <daveholland> plenty to think about, thanks all
11:44:59 <martial__> thanks daveholland
11:45:20 <martial__> please follow up in the channel with an update on this
11:45:25 <martial__> any other AOB?
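One way to run the experiment janders sketches: on the hypervisor, dump the libvirt XML for the NUMA-aware and vanilla guests and compare their pinning, then cross-check against the nova-compute debug log priteau mentions. A sketch, assuming shell access to the compute node; the log path varies by deployment (RHOSP12 may keep it under /var/log/containers/):

    # Which physical CPUs is each running guest pinned to?
    for dom in $(virsh list --name); do
        echo "== $dom"
        virsh dumpxml "$dom" | grep -E '<vcpupin|<emulatorpin|<vcpu '
    done

    # Live view of the vCPU-to-pCPU mapping for one guest
    virsh vcpupin <domain>

    # With debug logging enabled, nova-compute records its pinning decisions
    grep -i 'pinning' /var/log/nova/nova-compute.log

A NUMA-aware guest should show explicit <vcpupin> entries under <cputune>; a vanilla guest typically will not, which is why the overlap question is hard to answer from the flavor definitions alone.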
11:45:53 <martial__> Otherwise let me call this meeting to an end (I must go, unfortunately)
11:46:00 <martial__> thanks everybody
11:46:03 <martial__> #endmeeting