09:00:35 <oneswig> #startmeeting scientific-wg
09:00:36 <openstack> Meeting started Wed Apr 26 09:00:35 2017 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:00:38 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:00:40 <openstack> The meeting name has been set to 'scientific_wg'
09:00:50 <oneswig> Hello and good morning
09:01:06 <oneswig> #link Agenda for today https://wiki.openstack.org/wiki/Scientific_working_group#IRC_Meeting_April_26th_2017
09:01:52 <priteau> Good morning oneswig
09:01:58 <verdurin> Morning.
09:02:06 <b1airo> Allo
09:02:10 <oneswig> Hi priteau verdurin b1airo
09:02:14 <oneswig> #chair b1airo
09:02:15 <openstack> Current chairs: b1airo oneswig
09:02:22 <oneswig> what's new?
09:02:37 <oneswig> Our bare metal kolla machine is awesome!
09:03:06 <oneswig> very happy with that. I was looking at chameleon appliances yesterday...
09:03:52 <priteau> I have heard good things about Kolla but never tried it
09:04:12 <oneswig> Think I've fixed a bug in our deployments where biosdevname would occasionally not name a device, which made automated deployments impossible.
09:04:27 <dariov> hello!
09:04:45 <oneswig> Kolla itself seems to work but we've got that "new deployment smell" about it - don't know the dodgy bits yet
09:04:49 <oneswig> Hi dariov
09:05:23 <oneswig> OK, should we get started with the agenda?
09:05:43 <oneswig> #topic Boston summit sessions
09:06:26 <oneswig> #link planning's in here https://etherpad.openstack.org/p/Scientific-WG-boston
09:06:50 <oneswig> I put a mail out this morning calling for Lightning talk submissions
09:07:14 <oneswig> Do I have any volunteers? :-)
09:08:05 <oneswig> ... ok ... next time ...
09:08:52 <oneswig> I've been asking around again for a prize for the best talk.
09:09:21 <oneswig> Nothing confirmed yet but last summit it was arranged on the day
09:10:06 <zioproto> hello
09:10:21 <oneswig> The next session was the committee meeting - have you seen the agenda on the same etherpad?
09:10:23 <oneswig> Hi zioproto
09:10:50 <oneswig> zioproto: have you seen that the video of your talk at HPCAC is posted?
09:11:10 <zioproto> oneswig: I have no idea
09:11:41 <oneswig> #link HPCAC videos http://insidehpc.com/video-gallery-switzerland-hpc-conference-2017/
09:11:55 <b1airo> Sorry it's a bad time here
09:12:10 <oneswig> no problem b1airo
09:12:20 <b1airo> Wondering if there would be any objections to moving this an hour later...?
09:13:03 <b1airo> Anything exciting at the insidehpc conference oneswig ?
09:13:13 <oneswig> It would be useful to collect together any activity that has gone on recently relating to the activity areas of the WG
09:13:16 <zioproto> oneswig: looks like I am on air https://www.youtube.com/watch?v=Z7I2WI5Ay1w
09:13:51 <oneswig> zioproto: I'll be looking at your layer-2 work again later! How is it working out?
09:14:36 <zioproto> oneswig: we are upgrading our production cluster to Newton this Saturday, so we can test the feature on the production hardware :)
09:15:00 <oneswig> Oh cool - good luck
09:15:03 <zioproto> oneswig: now on the staging cluster we can test functionality of the parts but not the performance
09:15:35 <oneswig> zioproto: I'd be very interested to hear your experiences when you start to extend these layer 2 networks at range and at scale
09:15:45 <oneswig> keep us posted!
09:16:19 <zioproto> oneswig: I will !
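A hypothetical pre-flight check in the spirit of the biosdevname issue oneswig mentions above. This is not his actual fix; the naming pattern and sysfs layout are assumptions, intended only to show how an automated deployment could fail early when a device keeps its bare kernel name.

```python
# Sketch: flag any physical NIC that kept a bare kernel name (eth*) instead
# of a biosdevname-style name (em*, p<slot>p<port>), so automation can stop
# before wiring up the wrong interface. Pattern and paths are assumptions.
import os
import re

BIOSDEVNAME_PATTERN = re.compile(r"^(em\d+|p\d+p\d+)")

def unnamed_interfaces(sys_net="/sys/class/net"):
    physical = [
        dev for dev in os.listdir(sys_net)
        # physical NICs have a 'device' symlink; lo, bonds and bridges do not
        if os.path.exists(os.path.join(sys_net, dev, "device"))
    ]
    return [dev for dev in physical if not BIOSDEVNAME_PATTERN.match(dev)]

if __name__ == "__main__":
    leftovers = unnamed_interfaces()
    if leftovers:
        raise SystemExit(f"biosdevname did not rename: {leftovers}")
```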
09:16:25 <oneswig> Which makes me think that hybrid cloud for research computing use cases could make a good activity area
09:17:14 <zioproto> are we following an agenda? I lost the beginning of the meeting, I don't know if there is an agenda link
09:17:51 <oneswig> Agenda is https://wiki.openstack.org/wiki/Scientific_working_group#IRC_Meeting_April_26th_2017
09:18:07 <oneswig> Looking at etherpad at https://etherpad.openstack.org/p/Scientific-WG-boston
09:18:35 <b1airo> oneswig: did you get to a happy outcome with your metrics work?
09:19:16 <oneswig> b1airo: It's ongoing. I can link to some of the things we've contributed, but a sheaf of gerrit reviews isn't going to make for an exciting meeting
09:19:35 <oneswig> I'll look for some key points we've put articles out on.
09:19:45 <oneswig> b1airo: Is the RACmon blog active?
09:20:37 <b1airo> It's out of date, there is a big backlog of posts in the pipeline - no time, not enough hands :-(
09:22:42 <oneswig> Ah, too bad. Made my day when somebody mentioned they found one of our old blog posts and it helped them.
09:23:29 <oneswig> b1airo: do you have anything on GPUs, or your upcoming presentation?
09:24:12 <oneswig> zioproto: how is the work going at SWITCH on data sets?
09:24:33 <zioproto> b1airo: we identified two open issues
09:24:42 <zioproto> the first issue is the permissions on the objects
09:24:58 <zioproto> using the radosgw with S3 interface
09:25:12 <zioproto> is not scalable with big datasets, because the r/w permissions are handled per object
09:25:25 <zioproto> there is no inheritance of permissions from the bucket
09:25:42 <zioproto> so if you have many objects, you have to touch all of them to grant an additional user a read-only permission
09:25:50 <b1airo> Yes, presenting at NVIDIA GTC on GPU accelerated OpenStack clouds - will be announced overview of how to build one and how OpenStack supports it, then a quick show and tell of our system and some benchmarks
09:26:03 <oneswig> zioproto: that's a nuisance indeed!
09:26:14 <b1airo> s/announced/an/
09:26:38 <oneswig> b1airo: you mentioned peer-to-peer - is gpu direct working virtualised?
09:26:52 <b1airo> In other news, passthrough works with NVLink
09:27:10 <b1airo> And can do P2P over it
09:27:11 <zioproto> b1airo: the second issue is the support for the AWS4 signature by the radosgw. It's been about two weeks since I last followed the story on that bug
09:27:20 <oneswig> ooh.
09:28:04 <b1airo> zioproto: is that permissions behaviour a bug?
09:28:13 <zioproto> b1airo: no public updates AFAIK http://tracker.ceph.com/issues/19056
09:28:29 <zioproto> b1airo: permission behaviour is a design problem of S3
09:29:04 <zioproto> b1airo: the bug is about the AWS4 signature and keystone integration. This is just a software bug. But it is bad because recent S3 clients like Hadoop need the AWS4 signature to be working in the backend
09:29:07 <oneswig> zioproto: bit late for a feature request then
09:29:22 <b1airo> I did open a Red Hat case on the AWS4 thing but the answers so far do not make much sense to me. Basically they were asserting it isn't supported with Keystone but of course no detail as to why
09:29:34 <zioproto> oneswig: I guess so. But OpenStack Swift for example is able to inherit the permissions from the Swift container to the objects
09:29:37 <oneswig> zioproto: do you maintain a blog of what you're doing at SWITCH?
09:30:18 <zioproto> oneswig: http://cloudblog.switch.ch/
09:30:26 <b1airo> I need to get involved in bedtime wrangling...
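A minimal sketch of the per-object permission problem zioproto describes, assuming a radosgw S3 endpoint driven with boto3. The endpoint, credentials, bucket name and grantee ID are placeholders; the point is that with no inheritance from the bucket, granting one extra read-only user means rewriting every object's ACL.

```python
# Sketch of why per-object ACLs do not scale for big datasets: one ACL
# round trip per object just to add a single read-only grantee.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://radosgw.example.org",  # placeholder endpoint
    aws_access_key_id="ACCESS",                  # placeholder credentials
    aws_secret_access_key="SECRET",
)

bucket = "big-dataset"                 # placeholder bucket
grantee_id = "readonly-user-canonical-id"  # placeholder canonical user ID

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket):
    for obj in page.get("Contents", []):
        acl = s3.get_object_acl(Bucket=bucket, Key=obj["Key"])
        acl["Grants"].append({
            "Grantee": {"Type": "CanonicalUser", "ID": grantee_id},
            "Permission": "READ",
        })
        s3.put_object_acl(
            Bucket=bucket,
            Key=obj["Key"],
            AccessControlPolicy={"Grants": acl["Grants"], "Owner": acl["Owner"]},
        )
# Cost is O(number of objects), which is exactly what makes this painful
# for datasets with millions of objects.
```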
09:30:38 <oneswig> zioproto: great, thanks
09:31:52 <oneswig> zioproto: do you think you'd be able to summarise your project on datasets in a blog post?
09:32:10 <oneswig> I recall Sofiane had a detailed page on it
09:33:57 <b1airo> oneswig: do you know if powerd had anything new to report?
09:33:57 <oneswig> #link Did you see there is an eventbrite page for the WG evening social at Boston https://www.eventbrite.com/e/openstack-scientific-working-group-boston-social-tickets-33928219217
09:34:07 <zioproto> oneswig: yes, I will probably have to do it! I put this in my TODO list for the month of May
09:34:09 <oneswig> b1airo: not that I'm aware of, sorry
09:34:44 <oneswig> BTW I heard already more than half the tickets have gone for the social and it's only been public a couple of days
09:35:48 <oneswig> OK, move on from the summit?
09:36:58 <oneswig> #topic Cloud congress
09:37:12 <oneswig> If you can attend and you haven't registered yet...
09:37:24 <oneswig> #link here's the link https://www.eventbrite.com/e/boston-open-research-cloud-workshop-tickets-31893256589
09:37:44 <oneswig> Is there any more to add on that?
09:38:23 <oneswig> #topic WG IRC channel
09:38:28 <oneswig> Quick one this
09:38:48 <oneswig> Mike Lowe and some others suggested an IRC channel for WG discussion
09:39:12 <oneswig> given we seem to work best as an information sharing forum
09:39:43 <oneswig> I've gone through the process of creating #scientific-wg, it's pending review
09:40:15 <oneswig> We should have a channel within a few days.
09:40:43 <oneswig> Nothing more to add on that...
09:41:40 <b1airo> Cool thanks for doing that oneswig
09:41:43 <dariov> nice one, oneswig
09:41:51 <priteau> Great news!
09:42:16 <oneswig> sorry, lost control of my keyboard for a sec, took a while to find my way back :-)
09:42:33 <oneswig> #topic AOB
09:42:47 <oneswig> Any more news?
09:43:32 <oneswig> The Cambridge team have been looking at heat+ansible for lustre client integration via SR-IOV interfaces. b1airo - what do you do for this?
09:44:40 <b1airo> The config management specifically?
09:44:56 <oneswig> b1airo: yes - how do you do it?
09:45:33 <oneswig> got anything you can share with your pommy mates? :-)
09:46:16 <b1airo> Our HPC team uses ansible. Currently our Lustre provider networks are not actually managed by Neutron, so we use a little hack and set the L3 interface config in the guest based on the config provided by Neutron on a different interface
09:46:54 <oneswig> interesting...
09:47:16 <b1airo> Today I found an issue with Ethernet NIC tuning that I'm about to open a case for
09:47:27 <oneswig> Mellanox NIC?
09:47:48 <oneswig> We've been moving to OFED 4 on our servers
09:48:32 <b1airo> When we bump up the rx/tx ring buffers, from their 1024 default to 8192 as Mellanox recommended for RoCE, each interface changed accounts for almost 3GB reduction in MemFree on the host
09:49:12 <oneswig> 8192 * 9216 < 3GB...
09:49:36 <oneswig> Is it a Rx ring per VF?
09:49:38 <b1airo> So on a dual port card we lose almost 6GB, which is enough to mean we can't launch a 120GB guest on a 128GB host - OOM!!
09:49:50 <b1airo> ring per PF
09:50:15 <b1airo> I don't know where that memory is going though
09:50:37 <oneswig> Tried /proc/slabinfo before and after?
09:51:06 <b1airo> Nothing else in meminfo accounts for it, and whilst slab goes show an increase it's only on the order of <100M
09:51:21 <b1airo> s/goes/does/
09:52:17 <oneswig> Is that a regression after driver upgrade?
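A back-of-the-envelope check of oneswig's "8192 * 9216 < 3GB" point. The 9216-byte buffer per descriptor (a jumbo-frame buffer) and the assumption of one rx and one tx ring per port are not figures from the log; they are only there to show that the raw descriptor buffers account for a tiny fraction of the observed MemFree drop, so the rest must be allocated elsewhere in the driver.

```python
# Rough accounting for the ring buffer memory discussed above.
# Assumptions (not from the log): one 9216-byte jumbo-frame buffer per
# descriptor, and equal-sized rx and tx rings on each port.

ring_entries = 8192        # recommended setting mentioned in the log
buffer_bytes = 9216        # assumed buffer per descriptor (jumbo MTU)
rings_per_port = 2         # rx + tx

per_port = ring_entries * buffer_bytes * rings_per_port
print(f"raw buffer memory per port: {per_port / 2**20:.0f} MiB")  # ~144 MiB

observed_loss = 3 * 2**30  # ~3 GB MemFree drop reported per interface
print(f"unaccounted for: {(observed_loss - per_port) / 2**30:.2f} GiB")
```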
09:52:37 <b1airo> Even only setting it on one interface doesn't really work, host is under too much memory pressure and kswapds start getting busy
09:53:30 <b1airo> We tried a MOFED upgrade recently too, but reverted back to 3.4 when we found they'd stupidly disabled VF enablement on bond slave PFs
09:54:07 <oneswig> You run VFs on a bond? I had no idea that was even possible.
09:55:02 <b1airo> In our setup we have an active-backup bond with linux-bridge above it for some of our provider networks, preferred active on p2, then we put the high performance VFs dedicated on p1
09:55:28 <oneswig> Ah OK, was assuming LACP
09:55:37 <oneswig> makes more sense...
09:55:43 <b1airo> Yeah, apparently Mellanox were too
09:56:08 <b1airo> They are fixing it but I guess we will have to wait for the next point release
09:56:29 <oneswig> There's no underestimating antipodean craftiness
09:56:43 <b1airo> Also, (Intel) Lustre does not yet support MOFED4
09:57:25 <b1airo> (not that Intel Lustre is a separate thing anymore, but I imagine it will take a while for that to change)
09:57:37 <oneswig> b1airo: interesting. Any idea when?
09:57:49 <oneswig> on the ofed4 support
09:58:32 <b1airo> Soon I think, we are talking to Intel about it and I think they have given Gin a build to try
09:59:07 <oneswig> OK thanks b1airo
09:59:11 <oneswig> Out of time - any more?
09:59:27 <b1airo> Thanks all!
09:59:36 <oneswig> OK thanks everyone
09:59:37 <b1airo> Looking forward to Boston!
09:59:46 <zioproto> this VF Virtual Function is a Mellanox only thing ?
09:59:54 <oneswig> Good luck at GTC b1airo
10:00:03 <zioproto> enjoy Boston ! :)
10:00:10 <oneswig> zioproto: It's part of SR-IOV
10:00:26 <oneswig> Not specific although it seems the bond issue is
10:00:34 <oneswig> #endmeeting
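As a footnote to zioproto's closing question: Virtual Functions are a generic SR-IOV/PCI feature, not something Mellanox-specific, and the kernel exposes them through sysfs for any capable NIC. A small sketch (assuming a Linux host with SR-IOV capable devices; paths are the standard sysfs layout) that lists how many VFs are enabled per physical function:

```python
# List SR-IOV capability per NIC from sysfs; works for any vendor's
# SR-IOV capable adapter, not just Mellanox.
import glob
import os

for path in sorted(glob.glob("/sys/class/net/*/device/sriov_totalvfs")):
    dev = path.split("/")[4]  # interface name, e.g. p1p1
    with open(path) as f:
        total = f.read().strip()
    with open(os.path.join(os.path.dirname(path), "sriov_numvfs")) as f:
        enabled = f.read().strip()
    print(f"{dev}: {enabled}/{total} VFs enabled")
```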