21:01:06 <b1airo> #startmeeting scientific-wg
21:01:07 <openstack> Meeting started Tue Nov 29 21:01:06 2016 UTC and is due to finish in 60 minutes.  The chair is b1airo. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:01:08 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:01:10 <openstack> The meeting name has been set to 'scientific_wg'
21:01:14 <oneswig> Greetings all
21:01:15 <b1airo> #chair oneswig
21:01:15 <openstack> Current chairs: b1airo oneswig
21:01:20 <trandles> hello
21:01:21 <jmlowe> hello
21:01:30 <b1airo> gday
21:01:45 <oneswig> hi all
21:02:21 <b1airo> how's things?
21:02:33 <oneswig> #link Agenda items https://wiki.openstack.org/wiki/Scientific_working_group#IRC_Meeting_November_29th_2016
21:02:38 <b1airo> good thanksgiving break jmlowe, trandles ?
21:02:43 <jmlowe> getting back to normal
21:02:47 <trandles> yessir, thank you
21:02:47 <oneswig> All well here, had the research councils UK cloud workshop today
21:03:00 <priteau> Good *
21:03:01 <trandles> is everyone done with the mad travel schedule for the rest of 2016?
21:03:04 <b1airo> ah cool - did you chair / present something?
21:03:19 <jmlowe> I was home for a day and took the family to my inlaws in TX
21:03:23 <oneswig> I presented, kind of chaired, but that aspect was fairly minimal
21:03:33 <trandles> well attended oneswig ?
21:03:35 <b1airo> trandles, yes - though i already have 5 o/s trips in my calendar for 2017
21:03:54 <trandles> ouch b1airo
21:03:54 <oneswig> trandles: 140-odd I believe - not a sell-out but a good crowd
21:04:06 <oneswig> b1airo: coming to the UK?
21:04:34 <b1airo> oneswig, hopefully! but that isn't one of the firm ones yet
21:04:53 <martial> (sorry I am late, meeting running over)
21:04:53 <oneswig> What do we need to do to win the Bethwaited account, I wonder...
21:05:02 <oneswig> Hi Martial
21:05:06 <b1airo> #chair martial
21:05:07 <openstack> Current chairs: b1airo martial oneswig
21:05:11 <b1airo> hi martial
21:05:18 <martial> Hi Stig, Blair
21:05:26 <oneswig> Time to tee off?
21:05:45 <b1airo> yep, want to tell us about your vxlan results?
21:05:52 <jmlowe> I'll poke Bob and see if he's available
21:05:58 <oneswig> OK, lets do that...
21:05:58 <rocky_g> o/
21:06:12 <oneswig> #topic TCP bandwidth over VXLAN and OVS
21:06:14 <b1airo> hi rocky_g
21:06:26 <oneswig> Hi rocky_g
21:06:57 <oneswig> So, this has been a bit of a saga ever since we realised how much our shiny new deployment with its 50G/100G network sucked in this respect
21:07:20 <rocky_g> Hey!  This meeting time works great for me...I just stay on after the TC
21:07:27 <jmlowe> oneswig: I take it you get near line rate when doing bare metal?
21:07:36 <oneswig> We were seeing instance to instance bandwidth around 1.2gbits/s measured using iperf
21:07:49 <oneswig> jmlowe: bare metal, 1 stream TCP, 46 gbits/s
21:07:57 <jmlowe> close enough
21:08:04 <oneswig> I was satisfied :-)
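(For context, a minimal sketch of how a single-stream TCP measurement like the above can be driven and parsed - assuming iperf3 with JSON output and a server already listening on the peer; host name and duration are illustrative, not the exact invocation used here:)

    import json
    import subprocess

    # Run a single-stream TCP test against a peer already running `iperf3 -s`
    # (peer hostname and duration are illustrative assumptions).
    out = subprocess.run(
        ["iperf3", "-c", "peer-host", "-t", "30", "-J"],
        capture_output=True, text=True, check=True,
    ).stdout

    result = json.loads(out)
    gbits = result["end"]["sum_received"]["bits_per_second"] / 1e9
    print("TCP bandwidth: %.1f Gbit/s" % gbits)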
21:08:07 <rbudden> hello
21:08:11 <oneswig> Hi rbudden
21:08:15 <rbudden> sorry, the time change got me!
21:08:16 <jmlowe> are your sysctls posted somewhere?
21:08:26 <b1airo> oneswig, 1 stream?! e5-2680v3 ?
21:08:34 <b1airo> what window size?
21:08:35 <oneswig> Not sysctls for that, didn't do any...
21:08:42 <jmlowe> wow
21:08:53 <b1airo> jmlowe, +1
21:09:00 <oneswig> I know, I was quite pleased.
21:09:21 <b1airo> think there might be an element of VW tuning in play?? ;-)
21:09:27 <rocky_g> Yeah, bare metal looks good, but the vm stuff really *sucks*
21:09:35 <oneswig> Anyway, we got VM-to-VM bandwidth up after much tuning
21:09:55 <jmlowe> my sysctl's https://raw.githubusercontent.com/jetstream-cloud/Jetstream-Salt-States/master/states/10gige.sls
21:10:07 <jmlowe> use them everywhere
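(For readers without access to that salt state, these are the sort of TCP buffer sysctls such a 10/40GbE tuning profile typically carries - the values below are illustrative assumptions, not copied from jmlowe's file:)

    import subprocess

    # Typical high-bandwidth TCP buffer tuning (values are illustrative,
    # not taken from the linked salt state).
    SYSCTLS = {
        "net.core.rmem_max": "268435456",
        "net.core.wmem_max": "268435456",
        "net.ipv4.tcp_rmem": "4096 87380 134217728",
        "net.ipv4.tcp_wmem": "4096 65536 134217728",
        "net.core.netdev_max_backlog": "250000",
    }

    for key, value in SYSCTLS.items():
        subprocess.run(["sysctl", "-w", "%s=%s" % (key, value)], check=True)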
21:10:31 <oneswig> First move was to turn off all those power-saving states and some things Mellanox support recommended disabling
21:10:40 <oneswig> That got us to around 2-3 gbits/s
21:10:43 <oneswig> oh.
21:11:16 <oneswig> Tried a new kernel after that - 4.7.10 - apparently there is better handling of offloading of encapsulated traffic further upstream
21:11:23 <oneswig> Got us to 11gbits/s
21:11:29 <priteau> nice
21:11:58 <oneswig> Then we turned off hyperthreading - all those wimpy cores are fine for production but no good for hero numbers
21:12:08 <oneswig> Then we were up around 18gbits/s
21:12:25 <oneswig> Then I did VCPU-to-PCPU core pinning
21:12:30 <oneswig> that hit 21 gbits/s
21:12:31 <b1airo> oneswig, so where do Mellanox's shiny graphs (e.g. advertising CX-3 Pro 2+ years ago) come from...?
21:13:01 <b1airo> ahhh! i assumed you were pinning to begin with
21:13:14 <oneswig> That's CX-3 - and apparently the driver had a lot of capabilities back-ported to it which don't translate to mlx4 - that backporting process must be redone
21:13:40 <oneswig> b1airo: no - actually this was a proper nuisance
21:13:46 <b1airo> meanwhile CX-6 has been announced...
21:14:10 <oneswig> Certainly the next bit - NUMA passthrough - I had to build a new QEMU-KVM - what you get with CentOS is nobbled
21:14:35 <trandles> the amount of fixes/improvements going into upstream kernels for cx-4 and cx-5 is staggering
21:14:35 <jmlowe> even the qemu 2.3 from the ev repo?
21:14:36 <b1airo> ha yeah, 1990s
21:14:41 <oneswig> b1airo: announced for when?
21:15:07 <oneswig> jmlowe: Does that work for CentOS?  Not sure if it does?
21:15:10 <b1airo> oneswig, same time as HDR presumably
21:15:30 <oneswig> b1airo: wait a minute, I thought that was announced at SC ... :-)
21:16:26 <oneswig> Anyhow, I built and packaged QEMU-KVM 2.4.1, set isolcpus to give the host OS + OVS 4 cores, and pinned the VM to the NUMA node with the NIC attached to it.
21:16:36 <oneswig> That got me to 24.3 gbits/s
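(A rough sketch of how that pinning and NUMA placement is usually expressed in Nova, via flavor extra specs set with the openstack CLI - the flavor name and values are assumptions, not the exact configuration described above:)

    import subprocess

    # Pin guest vCPUs to dedicated host cores and confine the guest to a
    # single NUMA node (the one the NIC hangs off). Flavor name is illustrative.
    EXTRA_SPECS = {
        "hw:cpu_policy": "dedicated",   # vCPU -> pCPU pinning
        "hw:numa_nodes": "1",           # keep the guest on one NUMA node
    }

    for key, value in EXTRA_SPECS.items():
        subprocess.run(
            ["openstack", "flavor", "set", "highperf.net",
             "--property", "%s=%s" % (key, value)],
            check=True,
        )

    # On the hypervisor side, isolcpus=... on the kernel command line keeps a
    # few cores back for the host OS and OVS, as described above.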
21:16:39 <b1airo> indeed - don't know whether there are even any eng. samples yet though
21:16:44 <jmlowe> this works just fine for us, has 2.3 in it, now if they could just get a non-antique libvirt http://mirror.centos.org/centos/7/virt/x86_64/kvm-common/
21:17:17 <oneswig> jmlowe: ooh, that's good - thanks, makes my life easier
21:17:32 <martial> jmlowe: there was a few posts about this on the ML recently if my memory is correct
21:17:50 <b1airo> jmlowe, you're talking about the rhev repos for centos?
21:17:50 <jmlowe> I should read the ml more carefully
21:17:59 <jmlowe> yeah, that's the one
21:18:27 <b1airo> oneswig, something of a saga then
21:18:34 <oneswig> I had to find an EFI boot ROM RPM from somewhere as well - the package I rebuilt from fedora had a load of broken links in it
21:18:45 <b1airo> and still only 50% of your bare-metal performance
21:18:46 <priteau> Looks like they may even have QEMU 2.6 soon: https://cbs.centos.org/koji/buildinfo?buildID=13884
21:18:53 <oneswig> That's where I am now - but the joy of it is that a rising tide lifts all boats.
21:18:56 <b1airo> what are mellanox doing to make this right?
21:19:21 <oneswig> From doing this tinkering, I got SR-IOV bandwidth raised from 33gbits/s->42gbits/s and bare metal to 46gbits/s
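(Pulling the numbers above together - a quick back-of-envelope on how far each step got relative to the 46 Gbit/s bare-metal figure; the 2.5 is just a midpoint of the "around 2-3" quoted above:)

    # Single-stream TCP iperf figures quoted in this discussion, in Gbit/s.
    BARE_METAL = 46.0
    steps = [
        ("untuned VXLAN/OVS",            1.2),
        ("power saving off + MLNX tips", 2.5),
        ("kernel 4.7.10 offloads",       11.0),
        ("hyperthreading off",           18.0),
        ("vCPU->pCPU pinning",           21.0),
        ("QEMU 2.4.1 + isolcpus + NUMA", 24.3),
        ("SR-IOV (after same tuning)",   42.0),
    ]
    for name, gbits in steps:
        print("%-32s %5.1f Gbit/s (%3.0f%% of bare metal)"
              % (name, gbits, 100 * gbits / BARE_METAL))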
21:19:42 <rocky_g> And have you considered writing this up as a superuser blog?
21:20:05 <b1airo> ha!
21:20:09 <oneswig> rocky_g: that would be fab.  I am writing it up next time I get on a train - tomorrow - will share
21:20:19 <rocky_g> Fantastic!
21:20:32 <jmlowe> any thoughts as to how much of a difference if any there would be with linuxbridge vs ovs?
21:20:47 <oneswig> jmlowe: Ah, what a question
21:21:07 <jmlowe> being probably the only linuxbridge guy here, I had to ask
21:21:16 <oneswig> I think there would be a positive uplift from ditching OVS, but tripleO has no means to deploy without it
21:22:15 <oneswig> There was more on this: the kernel capabilities gained in 4.x are being backported by RH to the 7.3 kernel
21:22:28 <jmlowe> I haven't been following triple o, the latest install guides have dropped ovs?  wondering if triple o would follow
21:22:38 <oneswig> Which makes the whole process much more attainable.
21:23:15 <oneswig> jmlowe: I'd need to check, I may be out of date but I hadn't seen that
21:23:18 <b1airo> so oneswig, had mellanox never looked at vxlan performance with CX-4 ?
21:23:58 <oneswig> b1airo: who knows?  I'm sure they are busy people :-)
21:24:16 <oneswig> I can't fault their efforts to get a solution once the issue was clear
21:25:02 <b1airo> yes, nor can we - just think their testing leaves something to be desired
21:25:06 <leong> o/
21:25:10 <oneswig> #action oneswig to write up a report and share the details on reproducing
21:25:13 <oneswig> Hi leong
21:25:47 <oneswig> OK next topic?
21:26:15 <oneswig> #topic telemetry and monitoring - research computing use cases
21:26:44 <oneswig> OK this is one of our activity areas for this cycle, I wanted to get some WG thoughts down
21:27:08 <jmlowe> I've got two use cases
21:27:14 <martial> so there was a good conversation on the ML recently
21:27:18 <oneswig> I think there are specific use cases we like that others don't need
21:27:40 <oneswig> #link etherpad for brainstorming
21:27:43 <oneswig> #link https://etherpad.openstack.org/p/scientific-wg-telemetry-and-monitoring
21:28:08 <oneswig> martial: got a link to the mailing list thread?
21:28:18 <martial> they mentioned Collectd
21:28:23 <martial> let me find it out
21:28:35 <oneswig> martial: thanks
21:29:08 <leong> for the telemetry and monitoring? are we looking at "enhancing" existing related openstack projects to support the features/use-cases of scientific computing?
21:30:15 <oneswig> leong: interesting question.  Lots of problems with existing projects
21:30:21 <jmlowe> I've added my 3 current active use cases to the etherpad
21:30:42 <oneswig> jmlowe: what do you mean by the first?
21:31:42 <martial> FYI, we are developing a solution in house for telemetry aggregation
21:32:04 <oneswig> Great one b1airo
21:32:15 <leong> ok.. step back a bit.. are we aiming to create a user story in PWG and then perform gap analysis, then decide the best solution from there?
21:32:41 <martial> I have to find it in the archive, but it was an email: [Openstack-operators] VM monitoring suggestions
21:32:52 <jmlowe> for every hour there is an 'instance exists' event generated in the ceilometer (now a new project whose name I can't remember) database, with a start and end time up to the next hour, plus the cpu count, project and user - that can relatively easily be adapted to look like an hpc job and fed into existing hpc job reporting systems
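(A sketch of that adaptation - turning an hourly 'instance exists' event into an HPC-accounting-style job record; the payload field names here are assumptions about the event schema, not the exact ceilometer format:)

    from datetime import datetime

    def exists_event_to_job(event):
        """Map an hourly 'instance exists' event onto an HPC-job-like record."""
        start = datetime.fromisoformat(event["audit_period_beginning"])
        end = datetime.fromisoformat(event["audit_period_ending"])
        return {
            "job_id": event["instance_id"],
            "user": event["user_id"],
            "account": event["project_id"],
            "cores": event["vcpus"],
            "start": start,
            "end": end,
            "wallclock_s": (end - start).total_seconds(),
        }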
21:32:53 <oneswig> leong: that's a likely path I'd guess.  We already have a federation user story in review
21:33:05 <leong> great oneswig!
21:33:34 <rocky_g> gnocchi
21:34:06 <martial> cloudkitty
21:34:16 <leong> having initial discussion on etherpad is a good start. Moving that discussion towards PWG user story will be able to keep track of the historical viewpoint/discussion on gerrit
21:34:42 <martial> our solution relies on a python library using psutil and we are also using ganglia
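(That in-house library isn't described further here; as a minimal sketch, the psutil side of such an in-guest collector might look something like this - the output format and 60-second interval are assumptions:)

    import socket
    import time

    import psutil

    def sample():
        """One telemetry sample from the local host/VM."""
        net = psutil.net_io_counters()
        return {
            "host": socket.gethostname(),
            "timestamp": time.time(),
            "cpu_percent": psutil.cpu_percent(interval=1),
            "mem_percent": psutil.virtual_memory().percent,
            "net_bytes_sent": net.bytes_sent,
            "net_bytes_recv": net.bytes_recv,
        }

    while True:
        print(sample())   # in practice: ship to ganglia / an aggregator
        time.sleep(60)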
21:34:47 <oneswig> leong: this is how the other story first took shape
21:34:57 <oneswig> martial: got any documentation online for it?
21:35:23 <leong> there are new projects evolving, including gnocchi and aodh, which might be able to meet the needs.. however, without a formal description of the problem/use-case, it is hard to comment
21:35:44 <martial> oneswig: giving a link to my team member to the etherpad to describe it
21:36:24 <jmlowe> I'm looking at doing something with some civil engineers who have a stream of time series data coming from traffic cameras, they need to have continuous ingestion and aggregation, I'm going to try them with gnocchi and get them off of their 5TB ms sql db
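(For the ingestion side of that, a rough sketch against the Gnocchi v1 REST API - the endpoint, token, archive policy and measure values are assumptions, not jmlowe's actual setup:)

    import datetime
    import requests

    GNOCCHI = "http://gnocchi.example.org:8041"   # assumed endpoint
    HEADERS = {"X-Auth-Token": "..."}             # keystone token

    # Create a metric under an archive policy suited to continuous aggregation
    # ("high" is one of the stock policies; pick to taste).
    metric = requests.post(
        "%s/v1/metric" % GNOCCHI,
        json={"archive_policy_name": "high"},
        headers=HEADERS,
    ).json()

    # Push a batch of measures (timestamp/value pairs) for aggregation.
    measures = [
        {"timestamp": datetime.datetime.utcnow().isoformat(), "value": 17},
    ]
    requests.post(
        "%s/v1/metric/%s/measures" % (GNOCCHI, metric["id"]),
        json=measures, headers=HEADERS,
    )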
21:36:25 <leong> it is also worthwhile to mention/document what existing solutions are adopted by existing scientific users in the User Story.
21:36:33 <oneswig> #link previous user story on federation - please review https://review.openstack.org/#/c/400738/
21:36:54 <martial> jmlowe: I need to get you in touch with a colleague of mine
21:37:22 <martial> jmlowe: and our Data Science team that does work on traffic camera data :)
21:37:23 <b1airo> jmlowe, o_0
21:37:30 <oneswig> leong: so one relevant shortcoming is I don't believe there is any way to transiently enable high-resolution telemetry - just as an example
21:37:48 <jmlowe> I suspect there are a lot of civil engineers grabbing data from their state's DOT and we could probably make a thing that several of them could use
21:38:39 <b1airo> jmlowe, i think you will be happy with gnocchi - we have it deployed in nectar and it seems to work well
21:39:07 <oneswig> jmlowe: 5TB of SQL...
21:39:13 <jmlowe> I've been using gnocchi since the 1.x series
21:39:32 <jmlowe> each major release has been an order of magnitude improvement
21:39:43 <oneswig> We were using Influx as backend but are stuck now, what does Gnocchi use nowadays?
21:40:05 <b1airo> oneswig, stuck why? because they ripped influx out?
21:40:09 <jmlowe> file, and ceph, and one other
21:40:26 <jmlowe> if you use ceph make sure you are on the 3.x series
21:40:34 <rocky_g> fluent maybe?
21:40:40 <priteau> http://gnocchi.xyz/architecture.html#how-to-choose-back-ends
21:41:10 <b1airo> oneswig, we (well really sorrison) put influx support back and redesigned the driver
21:42:08 <oneswig> Others have said the Ceph backend kind of consumes the ceph - needs its own dedicated cluster - I wonder what rate of time-series metrics is attainable
21:42:12 <b1airo> not sure where the reviews are at - but pretty sure it is all going upstream
21:42:32 <jmlowe> I have very nice things to say about Gordon Chung, one of the gnocchi devs, he's been immensely helpful
21:42:40 <oneswig> b1airo: that's good, but I don't think we'd pay Influx for the clustered backend
21:43:12 <b1airo> don't need to
21:43:38 <b1airo> i could get sorrison to join next week and give us the low-down
21:43:50 <jmlowe> the ceph backend in the 1.x series relied too heavily on xattrs which didn't scale, the 2.x series created too many new objects which led to lock contention and slow ops warnings, the 3.x series has been problem free
21:43:55 <oneswig> b1airo: which timezone works best for sorrison?
21:44:18 <b1airo> probably next week - he rows in the mornings
21:44:59 <oneswig> b1airo: sounds good to me
21:45:50 <b1airo> oneswig, this might(?) be the tree he's been working on: https://github.com/NeCTAR-RC/gnocchi/tree/nectar/2.2
21:47:20 <oneswig> What do people use for monitoring the health of OpenStack itself?  We have used Monasca, the agent gathers useful data out of the box
21:48:05 <b1airo> oneswig, how did you find the setup ?
21:48:15 <jmlowe> we've been using zabbix, because it was there, I'd love to use something better
21:48:21 <martial> found the ML thread: http://lists.openstack.org/pipermail/openstack-operators/2016-November/012129.html
21:48:41 <leong> oneswig: You mean the OpenStack control-plane?
21:48:49 <leong> nagios is one
21:48:57 <oneswig> My team mate took quite a few unhappy days on it - Monasca seems to have no concept of a lightweight solution :-)
21:48:59 <rbudden> we use naemon at PSC for most of our monitoring
21:49:14 <priteau> oneswig: we use Nagios with plugins from https://github.com/cirrax/openstack-nagios-plugins
21:49:36 <oneswig> rbudden: got a link to that?
21:50:02 <rbudden> oneswig: http://www.naemon.org
21:50:03 <oneswig> leong: yes - for example we get historical data of per-service CPU & RAM consumption
21:50:07 <rbudden> i believe it’s a fork from Nagios
21:50:10 <rocky_g> has anyone looked at the Vitrage project?  I know it's supposed to be root cause analysis, but what does that project use to capture and store their info?
21:50:44 <oneswig> Wasn't Zabbix also an evolution of Nagios?  I sense a bake-off coming
21:50:47 <b1airo> the attractive thing about monasca is that it understands openstack - nagios is easy for monitoring process and service state, but what about all the stuff flying around on the MQ
21:51:03 <trandles> we use LDMS/OVIS   https://ovis.ca.sandia.gov/mediawiki/index.php/Main_Page
21:51:05 <oneswig> rocky_g: saw the keynote - was blown away - not seen anything since
21:51:20 <jmlowe> how heavy is heavy when it comes to monasca
21:51:54 <b1airo> we use nagios and ganglia at the host level, elk for api data
21:52:09 <oneswig> jmlowe: It uses a lot of Apache stack - Kafka, Storm, etc.
21:52:36 <oneswig> Glued together with data-moving services of its own
21:52:38 <rocky_g> This is all great info to capture on the etherpad....
21:52:53 <oneswig> rocky_g: I'm going to print the whole thing out and put it on my wall :-)
21:52:58 <jmlowe> must keep from making an ouroboros
21:53:13 <martial> rocky_g: doing what I can to make that happen :)
21:53:38 <oneswig> jmlowe: ouroboros - I have learned something today
21:54:34 <oneswig> Final issues to lob in: who monitors OpenStack event notifications, and how?
21:54:57 <b1airo> as in the queues oneswig ?
21:54:59 <trandles> oneswig: I've played with it a tiny bit using splunk
21:55:00 <oneswig> One aspect of Monasca I quite like is that it hoovers up everything and anything
21:55:29 <oneswig> b1airo: json blobs that get thrown out whenever nova/cinder/etc does anything useful
21:55:35 <oneswig> those things
21:55:48 <oneswig> oslo.notification?
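(For anyone wanting to tap those notifications directly, a minimal oslo.messaging listener sketch - the transport URL is an assumption for a typical RabbitMQ deployment, and 'notifications' is the usual default topic:)

    import oslo_messaging
    from oslo_config import cfg

    # Connect to the message bus the services emit notifications on.
    transport = oslo_messaging.get_notification_transport(
        cfg.CONF, url="rabbit://guest:guest@rabbit-host:5672/")
    targets = [oslo_messaging.Target(topic="notifications")]

    class Endpoint(object):
        def info(self, ctxt, publisher_id, event_type, payload, metadata):
            # e.g. compute.instance.create.end, volume.attach.end, ...
            print(event_type, publisher_id, payload)

    listener = oslo_messaging.get_notification_listener(
        transport, targets, [Endpoint()])
    listener.start()
    listener.wait()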
21:56:29 <trandles> oh, I don't touch the json, I was feeding the normal logs to splunk and building searches
21:56:52 <oneswig> OK, does anyone have infrastructure that triggers alerts based on log messages? - trandles looks like you're up to this
21:56:58 <b1airo> yeah i was just wondering if you meant notification.error in particular, or everything
21:56:58 <rocky_g> There are config options to capture those or not.
21:57:12 <trandles> oneswig: exactly
21:57:30 <b1airo> we have nagios alerts on certain queues
21:57:33 <oneswig> b1airo: everything - want to reconstruct into timelines (os-profiler?)
21:57:39 <b1airo> just based on ready message count
21:58:16 <b1airo> we're almost out of time
21:58:18 <trandles> our operations folks use zenoss to actually trigger alerts but I've never talked to them about feeding openstack data into it
21:58:59 <b1airo> priteau, we didn't talk about workload traces
21:59:06 <b1airo> next time?
21:59:17 <rocky_g> Next time...
21:59:19 <oneswig> priteau: sounds good to me, fits in well
21:59:20 <priteau> b1airo: Let's put it on the agenda for next time
21:59:23 <b1airo> does next week's TZ work for you?
21:59:28 <b1airo> or week after?
21:59:35 <priteau> I join both meetings :-)
21:59:52 <b1airo> i thought you did - just a little hungover this morning...
22:00:07 <b1airo> (xmas parties have started already)
22:00:15 <b1airo> thanks all!
22:00:21 <b1airo> #endmeeting