21:00:28 <oneswig> #startmeeting scientific-wg
21:00:45 <oneswig> #link agenda for today https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_October_17th_2017
21:01:17 <oneswig> err... hello
21:04:00 <oneswig> b1airo: morning.
21:04:06 <oneswig> #chair b1airo
21:04:07 <openstack> Current chairs: b1airo oneswig
21:04:17 <b1airo> Hi oneswig
21:04:32 <oneswig> Quiet one so far :-)
21:04:49 <oneswig> Was hoping to hear about your recent work and the RoCE tuning you've been doing.
21:05:00 <b1airo> Yeah and I am in Brisbane for a conference, so not all here I'm afraid
21:05:26 <oneswig> Not to mention NVLink in KVM - saw your talk intro - woah there.
21:05:44 <b1airo> We hit some me MOFED bugs, always fun
21:06:06 <b1airo> *new MOFED
21:06:24 <b1airo> (stupid phone keyboard)
21:06:38 <oneswig> Was this related to your krazy SR-IOV over a bond thingy?
21:06:59 <b1airo> Well the nvlink stuff is interesting, it pretty much just works
21:08:03 <b1airo> No, we found MPI bombed out on some newer nodes (Broadwell) when getting to 28 procs
21:08:35 <oneswig> b1airo: I think I'll have to come along.  So if I pass through 3 GPUs, they can do NVLink for GPU-direct between them in the virtualised context?
21:09:11 <b1airo> Was due to a memory/resource exhaustion issue in the driver with FCA enabled (which is default with MXM)
21:09:40 <oneswig> b1airo: 28 procs is basically 1 node, right?
21:09:47 <oneswig> plus or minus a few
21:09:57 <b1airo> That's right oneswig - or even 8 GPUs if you happen to have a DGX-1
21:10:09 <b1airo> Nice devstack platform that
21:10:14 <b1airo> ;-)
21:10:19 <bollig> interesting.
21:10:34 <oneswig> Hey bollig
21:10:40 <bollig> hey all
21:10:58 <b1airo> oneswig: yep one e5-2680v4 node
21:11:01 <b1airo> Hi bollig
21:11:08 <oneswig> b1airo: only the one DGX-1?  You poor guys ...
21:11:18 <b1airo> Lol
21:11:28 <b1airo> It's tough down here
21:11:41 <oneswig> box is probably full of spiders by now anyway...
21:12:05 <b1airo> Anyway we confirmed the fix for the MPI issue yesterday (had to replace a NIC first)
21:12:14 <oneswig> b1airo: what were the symptoms of the RoCE issue?
21:13:01 <b1airo> Have to tune VF and PF PCI LOG BAR settings in the firmware
21:14:07 <martial> good evening
21:14:11 <oneswig> How was it bombing out?  I need to test some new NICs with RoCE at 80 procs
21:14:14 <martial> (or morning ...)
21:14:20 <oneswig> Hi martial
21:14:24 <oneswig> #chair martial
21:14:24 <openstack> Current chairs: b1airo martial oneswig
21:14:29 <b1airo> Hi martial
21:14:40 <martial> Hi Stig, Blair, sorry for being late
21:14:57 <b1airo> It'll just crash on startup, let me find the error message...
21:14:58 <oneswig> no worries, we were just getting going really, it's a slow one
21:15:18 <oneswig> no rush b1airo, think my nodes are Haswell
21:16:07 <b1airo> If you hit it you can workaround it by setting:
21:16:13 <b1airo> export OMPI_MCA_coll_hcoll_enable=0
21:16:55 <oneswig> thanks b1airo, noted
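A minimal sketch of applying the workaround b1airo gives above. Open MPI reads MCA parameters from `OMPI_MCA_<name>` environment variables, and the same parameter can be passed with `--mca` on the `mpirun` command line. The benchmark path `./alltoall_test` and the rank count of 28 are hypothetical placeholders, not taken from the meeting.

```shell
# Sketch only: disable the hcoll collective component, as suggested in the log.
export OMPI_MCA_coll_hcoll_enable=0

# Hypothetical benchmark binary and rank count, for illustration only.
BENCH="${BENCH:-./alltoall_test}"
NPROCS="${NPROCS:-28}"

if command -v mpirun >/dev/null 2>&1 && [ -x "$BENCH" ]; then
    # Command-line form of the same MCA setting.
    mpirun --mca coll_hcoll_enable 0 -np "$NPROCS" "$BENCH"
else
    echo "mpirun or $BENCH not found; hcoll disabled via environment only"
fi
```

Exporting the variable affects every later `mpirun` in that shell, while the `--mca` form applies to a single run.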
21:17:23 <oneswig> We should go through some agenda items but first up...
21:17:33 <oneswig> #topic Sydney Hackathon
21:17:40 <oneswig> I think you're all aware of that
21:17:51 <oneswig> #link Hackathon http://hackathon.openstack.org.au
21:18:10 <oneswig> martial: are you judging or was the travel time not convenient?
21:18:28 <martial> I am mentoring and very likely judging
21:18:35 <martial> I will be there on time
21:19:01 <oneswig> I think I may see you there then... I'll now be travelling to Sydney on the Saturday
21:20:32 <martial> good :)
21:20:36 <martial> see you there
21:20:46 <oneswig> #topic Sydney SIG activities
21:21:44 <oneswig> The events are in place: a meeting and bof
21:21:56 <oneswig> Both IIRC on Monday, back-to-back
21:21:58 <b1airo> Last week we started an etherpad for the lightning talk session, but I didn't get around to emailing it yet
21:22:11 <oneswig> b1airo: oh great, my next question :-)
21:22:13 <b1airo> Will do that right now...
21:24:14 <b1airo> It's at etherpad.openstack.org/p/sydney-scientific-sig-lightning-talks
21:24:24 <oneswig> ah, too quick, you just pipped me
21:25:28 <oneswig> Ah, I saw that Saverio's already put his name down - I saw him talk about that in April and was really piqued by it.
21:28:35 <oneswig> Shall we do a social in Sydney?
21:28:42 <b1airo> Email sent. I am still spamming both the SIGs and ops lists (I doubt the list membership has settled yet)
21:29:52 <oneswig> Thanks b1airo
21:30:13 <oneswig> #topic ORC Update - Sydney
21:30:13 <b1airo> Honestly, after the issues we've had organising it previously I was thinking it probably was not worthwhile
21:30:32 <oneswig> b1airo: informally, non-organised, perhaps...
21:30:52 <oneswig> like the good ol' days :-)
21:30:57 <b1airo> Yeah, I think maybe a non-sponsored one would be easier
21:31:32 <oneswig> seems fair.  I think I was lucky dealing with the sponsors in Barcelona, it worked well.
21:31:33 <martial> plus it makes it easier to choose on the spot (nearly :) )
21:31:53 <oneswig> martial: can you update us on ORC?
21:34:36 <martial> yes of course
21:34:54 <martial> so so far, the ORC planning is moving forward
21:35:28 <martial> I am going to try to have a meeting Thursday 8pm EDT (Friday 11am Sydney time) to discuss local participation and presentations
21:35:39 <martial> I will email the OpenStack ML about it
21:35:49 <martial> we are not Wed/Thu full day
21:36:04 <martial> will make it easier for people to attend both
21:36:29 <martial> there are still questions of who can attend (ticket questions). Ms Vancsa started looking into that (thank you)
21:37:05 <martial> (we are "now" ... Wed/Thu)
21:37:15 <oneswig> Ah, thanks for clarifying
21:37:39 <oneswig> I can at least attend on the Thursday.  I'm talking on the Wednesday, alas
21:37:44 <martial> Wilfred is still looking into a place for Thursday
21:38:03 <martial> so pretty much a lot of work in progress
21:38:33 <martial> still to do, basically an open call for participation:
21:38:35 <martial> We need to identify panelists for the sessions -- particularly getting presenters from regional country NRENs and research clouds.
21:38:52 <martial> technical working group and industry participation
21:39:27 <martial> and discussion of working group presentations and chairs
21:39:33 <martial> but it is moving along smoothly
21:39:54 <martial> any point in particular to discuss?
21:40:07 <martial> (ie questions)
21:41:20 <martial> I have been given access to the website, so I will post information and draft agenda
21:41:44 <martial> I see David is around? any update for us by any chance?
21:42:02 <oneswig> Was there anything further on the idea of knitting it into the forum sessions more?
21:43:27 <oneswig> ok - move on?
21:43:39 <martial> oneswig
21:43:50 <oneswig> #topic Supercomputing
21:44:23 <oneswig> We need a volunteer who lives in Denver, or someone with a shipping address there!
21:44:43 <martial> yes I am now co-moderating the P2302 meeting, so we will have a forum session on Tuesday
21:44:48 <b1airo> For books?
21:44:54 <oneswig> The Foundation's printing the OpenStack/HPC book and needs a shipping address
21:45:00 <oneswig> with an inexact delivery date, I believe
21:45:30 <oneswig> So a booth on the floor is good for a day or two beforehand, but they might be more vague than that.
21:45:56 <b1airo> I have some vendor contacts in Denver. Might be a little bit of an odd ask, so maybe only if there is no better idea...
21:46:57 <b1airo> Also, the OpenStack SC BOF is confirmed, has been on the programme a week or two now
21:47:03 <oneswig> Kathy says: "I can’t control when they’ll arrive, unfortunately. Can we ship the desired Supercomputing quantity to someone close to the event (drivable is ideal) so they can hand carry the boxes and distribute them at the show? Or I can ship to a hotel, hopefully they’ll hold the boxes if they arrive very early. Please send the contact name, address, phone, and if a hotel, the arrival date."
21:47:44 <martial> the go community might have some people there, I will try to contact some people I know there
21:48:03 <armstrong> Is it possible for us to have the  OpenStack/HPC book?
21:48:06 <martial> (gophercon is usually in Denver)
21:48:12 <armstrong> i mean copy of the book
21:48:18 <b1airo> It will be updated to reflect a merge with 3 other speakers who had a submission put in for them by some vendors/analysts and will now become part of ours (theirs was an OpenStack HPC storage integration focus, so works well)
21:48:42 <oneswig> Somewhat hilariously, the Work of flanders_ appears in her next paragraph: "I can also send another smaller batch to one other person. Maybe this person can distribute to the contributors. This batch could also go to someone at the Sydney Summit. Maybe to Blaire? "
21:49:18 <oneswig> b1airo: I can take one for the team - John's going to attend in my place so you can take me off the panel to make room
21:49:19 <b1airo> Gargh, damn that man!
21:49:57 <b1airo> oneswig: I think we'll be ok for room, but are you not coming at all now?
21:50:37 <oneswig> b1airo: unfortunately no.  We looked at it long and hard, it's a huge shame to miss out.
21:50:57 <martial> stig: missing SC?
21:51:24 <oneswig> martial: alas, yes - we'll need to make the most of Sydney :-)
21:51:39 <b1airo> Long time to be away from home
21:52:05 <oneswig> ok, it seems nobody present lives near Denver and is going to SC.
21:52:35 <flanders_> Doh! What I do?
21:52:50 <oneswig> flanders_: Ask Blairé
21:53:13 <b1airo> At least it isn't Blairè
21:53:21 <oneswig> touché
21:53:24 <b1airo> (would sound like a cat throwing up)
21:53:42 <flanders_> I think this is my disinformation, Kathy might be calling you Blairé he he ;p
21:54:11 <oneswig> flanders_ work is done here
21:54:15 <flanders_> I'm so proud!
21:54:27 <flanders_> Love U Blair ;D
21:54:55 <b1airo> Maybe I'll start a new LinkedIn and see if I get profiled differently with the é
21:55:09 <oneswig> b1airo: is it a stretch too far to call up your friendly vendor in Denver?
21:55:13 <martial> maybe you can do the evil twin moustache
21:55:56 <b1airo> oneswig: I'll give it a shot
21:56:24 <flanders_> Btw +1 if we can get ORC sessions for the Wed listed as part of the forum schedule.  martial if so email speakersupport@OpenStack.org to suggest with Erin, Ildiko and myself Cc'd
21:56:42 <oneswig> Thanks b1airo, I'll copy you in to Kathy's mail to work out details
21:56:43 <martial> good idea flanders_
21:56:58 <flanders_> 👍
21:57:03 <oneswig> You mean, flandres?
21:57:41 <martial> are we Frenchicizing everybody's name?
21:57:54 <martial> I win, mine is already correct then :)
21:58:00 <flanders_> +1 for Flandres! Ye oldy spelling!
21:58:19 <b1airo> Lol
21:58:26 <oneswig> Excellent
21:58:35 <oneswig> We should wrap up
21:58:36 <martial> flanders_: re ORC, there will be a Friday 11am Sydney time call, can you join?
21:58:38 <oneswig> #topic AOB
21:58:50 <flanders_> Gotta run, can't wait to see y'all in Syd, Flandres out yo.
21:58:59 <oneswig> Thanks Flandres :-)
21:59:07 <oneswig> Any more to add?
21:59:36 <martial> I am good
21:59:50 <oneswig> armstrong: to your earlier question - will you be at Supercomputing?  The books will be given out there.
21:59:58 <b1airo> oneswig: I didn't find the exact error message
22:00:08 <b1airo> But it is an out of memory
22:00:22 <armstrong> OH no but I will be at Syd
22:00:23 <b1airo> Or perhaps "memory allocation error"
22:00:45 <b1airo> If you run an all2all test you will hit it
22:00:57 <oneswig> I have OpenMPI over TCP latencies that are insanely high.  Too high.  Like 16ms.  This is PV-OVS-VxLAN - it's meant to be slow but that's unbelievable
22:01:33 <oneswig> Trying to find something to compare against RoCE and ASAP2 - but can't fit these things on the same graph!
22:02:04 <oneswig> OK, time to close - thanks all
22:02:09 <oneswig> #endmeeting