21:00:28 #startmeeting scientific-wg
21:00:29 Meeting started Tue Oct 17 21:00:28 2017 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:30 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:33 The meeting name has been set to 'scientific_wg'
21:00:45 #link agenda for today https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_October_17th_2017
21:01:17 err... hello
21:04:00 b1airo: morning.
21:04:06 #chair b1airo
21:04:07 Current chairs: b1airo oneswig
21:04:17 Hi oneswig
21:04:32 Quiet one so far :-)
21:04:49 Was hoping to hear about your recent work and the RoCE tuning you've been doing.
21:05:00 Yeah and I am in Brisbane for a conference, so not all here I'm afraid
21:05:26 Not to mention NVLink in KVM - saw your talk intro - woah there.
21:05:44 We hit some me MOFED bugs, always fun
21:06:06 *new MOFED
21:06:24 (stupid phone keyboard)
21:06:38 Was this related to your krazy SR-IOV over a bond thingy?
21:06:59 Well the nvlink stuff is interesting, it pretty much just works
21:08:03 No, we found MPI bombed out on some newer nodes (Broadwell) when getting to 28 procs
21:08:35 b1airo: I think I'll have to come along. So if I pass through 3 GPUs, they can do NVLink for GPU-direct between them in the virtualised context?
21:09:11 Was due to a memory/resource exhaustion issue in the driver with FCA enabled (which is default with MXM)
21:09:40 b1airo: 28 procs is basically 1 node, right?
21:09:47 plus or minus a few
21:09:57 That's right oneswig - or even 8 GPUs if you happen to have a DGX-1
21:10:09 Nice devstack platform that
21:10:14 ;-)
21:10:19 interesting.
21:10:34 Hey bollig
21:10:40 hey all
21:10:58 oneswig: yep one e5-2680v4 node
21:11:01 Hi bollig
21:11:08 b1airo: only the one DGX-1? You poor guys ...
21:11:18 Lol
21:11:28 It's tough down here
21:11:41 box is probably full of spiders by now anyway...
21:12:05 Anyway we confirmed the fix for the MPI issue yesterday (had to replace a NIC first)
21:12:14 b1airo: what were the symptoms of the RoCE issue?
21:13:01 Have to tune VF and PF PCI LOG BAR settings in the firmware
21:14:07 good evening
21:14:11 How was it bombing out? I need to test some new NICs with RoCE at 80 procs
21:14:14 (or morning ...)
21:14:20 Hi martial
21:14:24 #chair martial
21:14:24 Current chairs: b1airo martial oneswig
21:14:29 Hi martial
21:14:40 Hi Stig, Blair, sorry for being late
21:14:57 It'll just crash on startup, let me find the error message...
21:14:58 no worries, we were just getting going really, it's a slow one
21:15:18 no rush b1airo, think my nodes are Haswell
21:16:07 If you hit it you can work around it by setting:
21:16:13 export OMPI_MCA_coll_hcoll_enable=0
21:16:55 thanks b1airo, noted
21:17:23 We should go through some agenda items but first up...
21:17:33 #topic Sydney Hackathon
21:17:40 I think you're all aware of that
21:17:51 #link Hackathon http://hackathon.openstack.org.au
21:18:10 martial: are you judging or was the travel time not convenient?
21:18:28 I am mentoring and very likely judging
21:18:35 I will be there on time
21:19:01 I think I may see you there then... I'll now be travelling to Sydney on the Saturday
21:20:32 good :)
21:20:36 see you there
21:20:46 #topic Sydney SIG activities
21:21:44 The events are in place: a meeting and bof
21:21:56 Both IIRC on Monday, back-to-back
21:21:58 Last week we started an etherpad for the lightning talk session, but I didn't get around to emailing it yet
21:22:11 b1airo: oh great, my next question :-)
21:22:13 Will do that right now...
21:24:14 It's at etherpad.openstack.org/p/sydney-scientific-sig-lightning-talks
21:24:24 ah, too quick, you just pipped me
21:25:28 Ah, I saw that Saverio's already put his name down - I saw him talk about that in April and was really piqued by it.
21:28:35 Shall we do a social in Sydney?
21:28:42 Email sent.
I am still spamming both the SIGs and ops lists (I doubt the list membership has settled yet)
21:29:52 Thanks b1airo
21:30:13 #topic ORC Update - Sydney
21:30:13 Honestly, after the issues we've had organising it previously I was thinking it probably was not worthwhile
21:30:32 b1airo: informally, non-organised, perhaps...
21:30:52 like the good ol' days :-)
21:30:57 Yeah, I think maybe a non-sponsored one would be easier
21:31:32 seems fair. I think I was lucky dealing with the sponsors in Barcelona, it worked well.
21:31:33 plus it makes it easier to choose on the spot (nearly :) )
21:31:53 martial: can you update us on ORC?
21:34:36 yes of course
21:34:54 so, so far, the ORC planning is moving forward
21:35:28 I am going to try to have a meeting Thursday 8pm EST (Friday 11am Sydney time) to discuss local participation and presentations
21:35:39 I will email the OpenStack ML about it
21:35:49 we are not Wed/Thu full day
21:36:04 will make it easier for people to attend both
21:36:29 there are still questions of who can attend (ticket questions). Ms Vancsa started looking into that (thank you)
21:37:05 (we are "now" ... Wed/Thu)
21:37:15 Ah, thanks for clarifying
21:37:39 I can at least attend on the Thursday. I'm talking on the Wednesday, alas
21:37:44 Wilfred is still looking into a place for Thursday
21:38:03 so pretty much a lot of work in progress
21:38:33 still to do a basically open call for participation:
21:38:35 We need to identify panelists for the sessions -- particularly getting presenters from regional country NRENs and research clouds.
21:38:52 technical working group and industry participation
21:39:27 and discussion of working group presentations and chairs
21:39:33 but it is moving along smoothly
21:39:54 any point in particular to discuss?
21:40:07 (ie questions)
21:41:20 I have been given access to the website, so I will post information and draft agenda
21:41:44 I see David is around? any update for us by any chance?
21:42:02 Was there any further on the idea of knitting it into the forum sessions more?
21:43:27 ok - move on?
21:43:39 oneswig
21:43:50 #topic Supercomputing
21:44:23 We need a volunteer who lives in Denver, or someone with a shipping address there!
21:44:43 yes I am now co-moderating the P2302 meeting, so we will have a forum session on Tuesday
21:44:48 For books?
21:44:54 The Foundation's printing the OpenStack/HPC book and needs a shipping address
21:45:00 with an inexact delivery date, I believe
21:45:30 So a booth on the floor is good for a day or two beforehand, but they might be more vague than that.
21:45:56 I have some vendor contacts in Denver. Might be a little bit of an odd ask, so maybe only if there is no better idea...
21:46:57 Also, the OpenStack SC BOF is confirmed, has been on the programme a week or two now
21:47:03 Kathy says: "I can’t control when they’ll arrive, unfortunately. Can we ship the desired Supercomputing quantity to someone close to the event (drivable is ideal) so they can hand carry the boxes and distribute them at the show? Or I can ship to a hotel, hopefully they’ll hold the boxes if they arrive very early. Please send the contact name, address, phone, and if a hotel, the arrival date."
21:47:44 the go community might have some people there, I will try to contact some people I know there
21:48:03 Is it possible for us to have the OpenStack/HPC book?
21:48:06 (gophercon is usually in Denver)
21:48:12 i mean copy of the book
21:48:18 It will be updated to reflect a merge with 3 other speakers who had a submission put in for them by some vendors/analysts and will now become part of ours (theirs was an OpenStack HPC storage integration focus, so works well)
21:48:42 Somewhat hilariously, the Work of flanders_ appears in her next paragraph: "I can also send another smaller batch to one other person. Maybe this person can distribute to the contributors. This batch could also go to someone at the Sydney Summit.
Maybe to Blaire?"
21:49:18 b1airo: I can take one for the team - John's going to attend in my place so you can take me off the panel to make room
21:49:19 Gargh, damn that man!
21:49:57 oneswig: I think we'll be ok for room, but are you not coming at all now?
21:50:37 b1airo: unfortunately no. We looked at it long and hard, it's a huge shame to miss out.
21:50:57 stig: missing SC?
21:51:24 martial: alas, yes - we'll need to make the most of Sydney :-)
21:51:39 Long time to be away from home
21:52:05 ok, it seems nobody present lives near Denver and is going to SC.
21:52:35 Doh! What I do?
21:52:50 flanders_: Ask Blairé
21:53:13 At least it isn't Blairè
21:53:21 touché
21:53:24 (would sound like a cat throwing up)
21:53:42 I think this is my disinformation, Kathy might be calling you Blairé he he ;p
21:54:11 flanders_ work is done here
21:54:15 I'm so proud!
21:54:27 Love U Blair ;D
21:54:55 Maybe I'll start a new LinkedIn and see if I get profiled differently with the é
21:55:09 b1airo: is it a stretch too far to call up your friendly vendor in Denver?
21:55:13 maybe you can do the evil twin moustache
21:55:56 oneswig: I'll give it a shot
21:56:24 Btw +1 if we can get ORC sessions for the Wed listed as part of the forum schedule. martial if so email speakersupport@OpenStack.org to suggest with Erin, Ildiko and myself Cc'd
21:56:42 Thanks b1airo, I'll copy you in to Kathy's mail to work out details
21:56:43 good idea flanders_
21:56:58 👍
21:57:03 You mean, flandres?
21:57:41 are we Frenchicizing everybody's name?
21:57:54 I win, mine is already correct then :)
21:58:00 +1 for Flandres! Ye oldy spelling!
21:58:19 Lol
21:58:26 Excellent
21:58:35 We should wrap up
21:58:36 flanders_: re ORC, there will be a Friday 11am Sydney time call, can you join?
21:58:38 #topic AOB
21:58:50 Gotta run, can't wait to see y'all in Syd, Flandres out yo.
21:58:59 Thanks Flandres :-)
21:59:07 Any more to add?
21:59:36 I am good
21:59:50 armstrong: to your earlier question - will you be at Supercomputing? The books will be given out there.
21:59:58 oneswig: I didn't find the exact error message
22:00:08 But it is an out of memory
22:00:22 OH no but I will be at Syd
22:00:23 Or perhaps "memory allocation error"
22:00:45 If you run an all2all test you will hit it
22:00:57 I have OpenMPI over TCP latencies that are insanely high. Too high. Like 16ms. This is PV-OVS-VxLAN - it's meant to be slow but that's unbelievable
22:01:33 Trying to find something to compare against RoCE and ASAP2 - but can't fit these things on the same graph!
22:02:04 OK, time to close - thanks all
22:02:09 #endmeeting