21:02:11 #startmeeting Scientific-SIG
21:02:12 Meeting started Tue Feb 19 21:02:11 2019 UTC and is due to finish in 60 minutes. The chair is b1airo. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:02:12 good morning, good evening All
21:02:13 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:02:16 The meeting name has been set to 'scientific_sig'
21:02:27 #chair martial
21:02:28 Current chairs: b1airo martial
21:02:36 hi janders
21:03:10 I just had a really interesting chat about baremetal cloud & networking
21:03:23 would you be interested in spending some time on this?
21:03:26 oh yeah! who with?
21:03:39 RHAT, Mellanox and Bright Computing
21:03:53 do tell
21:04:11 have you guys played much with pxe booting baremetals from neutron tenant networks (as opposed to the nominated provisioning network)?
21:05:07 no, not at all, but it's clearly one of the key issues in multi-tenant bare-metal...
21:05:17 with Bright, we're trying to make Bright Cluster Manager a cloud-native app
21:05:42 it might seem out of the blue, but I see a ton of benefits in this approach and few disadvantages (other than the initial work to make it work)
21:05:44 right, that's a very sensible approach for them
21:06:11 now - they're all about pxe booting (as you probably know - you're a Bright user, right?)
21:06:23 presumably they'd make a new agent or something that sat inside each tenant
21:06:55 so you need to spin up frontend and backend networks. You put BCM-cloud on both. Then you pxeboot the compute nodes off BCM's backend
21:07:04 NeSI has Bright (BCM and BOS) so I'm picking some stuff up, but I'm not hands-on in the ops stuff here
21:07:40 same as me - big Bright user, but we have Bright gurus in the team, so I know what it does but log in maybe once a month
21:08:12 on Ethernet it's all easy and just works. networking-ansible is plumbing VLANs; there's no NIC-side config
21:08:27 on IB, though, the provisioning_network pkey needs to be pre-set in the HCA firmware
21:09:12 after that's done it's rock-solid, but one negative side effect is you can't pxeboot off tenant networks. We tried some clever hacks with custom-compiled iPXE, but no luck. The silicon will drop any packets that do not match the pre-set pkey
21:09:22 looks like a dead end
21:09:28 so - back to the drawing board
21:09:45 the kit I've got does have Ethernet ports; they are just shut off because the switching has no SDN
21:10:01 it does have ssh though, hence... networking-ansible!
21:10:09 hmm, until MLNX creates new firmware with some pre-shared trusted key or some such
21:10:34 I was told it can be made to work in parallel with Mellanox SDN; both can be configured at the same time, just for different physnets
21:10:51 I think this can be made to work (pending networking-mellanox testing on these switches)
21:11:04 which kit do you have?
21:11:12 M1000es from Dell
21:11:29 (I have many different platforms, but that's the one I got in large numbers for OpenStack to start with)
21:11:55 here I wanted to ask what you think about all this and what your experience is with networking-ansible
21:12:14 I'll VPN into work to get the exact switch details (RHAT are asking too) - bear with me if I drop out
21:13:33 ok.. still here?
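(For reference on the networking-ansible approach mentioned above: the ML2 mechanism driver pushes VLAN configuration to switches over SSH, with one stanza per managed switch in ml2_conf.ini. The fragment below is a minimal sketch only - the switch name, address, credentials and ansible_network_os value are placeholders, not anyone's actual environment.)

    [ml2]
    mechanism_drivers = openvswitch,ansible

    # One stanza per managed switch. For baremetal ports, the name after
    # "ansible:" is typically matched against switch_info in the Ironic
    # port's local_link_connection.
    [ansible:leaf-switch-01]
    ansible_network_os = dellos9      # placeholder; use your switch OS
    ansible_host = 192.0.2.10         # placeholder management address
    ansible_user = admin
    ansible_ssh_pass = change-me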
21:14:20 haven't used networking-ansible myself yet, though it was on the radar when looking at trying to insert an Ironic undercloud into things at Monash, where our networks are all 100G Mellanox Spectrum running Cumulus
21:16:23 Force10 MXL Blade
21:16:27 that's my eth platform
21:17:55 on that - what are your thoughts on NEO + Mellanox Ethernet vs networking-ansible + Mellanox Ethernet?
21:18:32 I bet there won't be feature parity yet, but conceptually it's quite interesting
21:19:01 very different approach
21:19:10 I suppose the same question applies to Juniper SDNs etc
21:19:46 we first looked at NEO over 2 years ago; it wasn't very polished or nice from a UI perspective yet, so it was difficult to actually realise the potential value-add. I had been promising the MLNX folks we'd try again for at least 6 months before I left Monash - no doubt way back in the pile for them now
21:21:07 we spent quite a lot of effort moving to Cumulus + Ansible for basic network management, and I was/am very keen to see that progress to full automation of the network as a service
21:21:49 We will get there
21:22:53 yeah, this tenant-pxeserver on baremetal/IB challenge might actually lead to interesting infrastructure changes
21:23:10 on my side
21:23:27 running SDN and networking-ansible side-by-side adds more complexity, but I think more capability going forward
21:24:07 aiming for bare-metal multi-tenant networking with Mellanox RoCE-capable gear - it seemed like we would need to try Cumulus + Neutron native (networking-ansible would be a possible alternative there) and also NEO + Neutron. The interesting thing there is that the value of Cumulus suddenly becomes questionable if you're managing the BAU network operations via Neutron, given e.g. NEO can do all the day-0 switch provisioning stuff
21:24:28 OpenStack internal traffic (other than storage) couldn't care less about IB - maybe it's better if these are kept separate so there is even more IB bandwidth for the workloads (and less jitter)
21:25:04 with Cumulus, do they have the same SDN-controller-centric architecture as NEO?
21:25:36 Do you configure a Cumulus URL in the Neutron SDN section, and the flow is Neutron > Cumulus > switching gear?
21:26:24 yeah, our basic cloud HPC networking architecture used dual-port eth with active-backup bonding for host stuff (openstack, management, etc) and SR-IOV into tenant machines on top of the default backup interface
21:27:15 https://docs.cumulusnetworks.com/display/CL35/OpenStack+Neutron+ML2+and+Cumulus+Linux
21:27:23 I see Cumulus has an HTTP service listening
21:27:34 However, in Neutron, individual switches are configured
21:27:36 no central controller that I'm aware of for Cumulus; the driver has to talk to all the devices
21:27:58 ok.. so what's the HTTP service with the REST API for? Do switches talk to it?
21:28:12 but yes, all RESTful APIs etc, no ssh over parallel python ;-)
21:30:22 or is the REST API running on each switch?
21:30:25 the rest api is on the switches themselves
21:30:27 ya
21:30:30 ok!
21:30:37 right...
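(On the Cumulus ML2 point above - a REST API on each switch, no central controller - the Neutron-side configuration is roughly the shape sketched below. The section and option names are recalled from the networking-cumulus driver and may differ between releases, the addresses are placeholders, and the on-switch HTTP service referred to in the log is, as far as I recall, Cumulus Linux's restserver; treat all of this as an assumption rather than a verified setup.)

    [ml2]
    mechanism_drivers = openvswitch,cumulus

    [ml2_cumulus]
    # The driver calls the REST API running on each switch directly,
    # so every managed switch is listed here (no central controller).
    switches = 192.0.2.21,192.0.2.22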
21:30:53 nice - no two sources of truth (Neutron DB and then SDN DB)
21:31:06 however, troubleshooting will be fun (log aggregation will help to a degree)
21:31:08 that's true
21:31:29 with centralised SDN my experience is they work 99% of the time - but when they stop, life sucks
21:31:35 exactly, one of the things we were trying to wrangle with - monitoring, visibility, etc etc
21:32:01 manual syncing of the DBs would be a nightmare and a high-risk operation
21:32:40 ok, I'm at a conference and the talk I'm in (Barbara Chapman talking about OpenMP for Exascale) just finished
21:33:08 so I'm going to have to scoot pretty soon
21:33:16 what are the other topics to cover?
21:33:21 better touch on the ISC19 BoF in case anyone is lurking
21:33:22 what conference? :)
21:33:23 ...
21:33:39 eResearchNZ :-)
21:34:40 do you guys know if we're still on track for speaker notifications for Denver?
21:34:53 it would be great to book travel by the end of this week if possible
21:35:00 had a plenary before that on the Integrated Data Infrastructure at Stats NZ - an enviable resource coming from the Aus context; clearly digital health in Australia didn't talk to them before launching My Health Record...
21:35:18 haha :)
21:35:32 janders: I heard they are running a day late
21:35:56 ah yes, the track-chairing process got extended till Sunday just gone, as there was a slight booboo that locked everyone out of the tool
21:36:14 right.. so no notifications this week?
21:36:42 I heard the 20th instead of the 19th
21:36:51 but we've done our job now, so I guess foundation staff will be doing the final checks and balances before locking in
21:37:00 ah ok
21:37:06 I will keep this in mind while making plans
21:37:10 thanks guys - this is very helpful
21:37:13 sounds possible, martial - let me check my email
21:37:40 janders: just quietly, you can confidently make plans
21:39:22 :) :) :) thank you
21:41:12 b1airo: not sure if this is relevant to you, but Qantas double status credits are on
21:41:52 or are you with AirNZ these days?
21:42:45 yeah, kiwi-air all the way ;-)
21:43:04 Nice. Good airline and nice aircraft.
21:43:16 ok, I'd better run. martial, shall we finish up now or do you want to kick on and close things out?
21:43:34 I don't have anything else.
21:43:46 I can close
21:43:49 agreed janders, they can be a bit too progressive/cringe-worthy with their safety videos at times though
21:43:52 although not sure if we have much to add
21:44:00 #topic ISC BoF
21:44:10 ok, I'm off to the coffee cart. o/
21:44:19 safe travels mate!
21:44:48 like @b1airo mentioned, we are proposing an OpenStack HPC panel for ISC 19 (Frankfurt, June)
21:44:57 I haven't managed to get anything out of my colleagues who might be going to ISC, so unfortunately can't contribute
21:45:28 I put in a talk to vHPC, so I might be there for ISC
21:46:07 gt1437: cool :)
21:46:13 nice!
21:47:03 and just realised the one after Denver is Shanghai, that's cool
21:47:36 oh wow
21:48:48 thanks, I didn't know (Shanghai was in my top 3 guesses though)
21:49:15 yeah, thought it was going to be Beijing; anyway, still good
21:49:43 any goss about the exact dates?
21:50:10 (the website states just "Nov 2019")
21:50:21 no idea
21:51:50 no dates yet
21:54:55 I'm off to meetings.. ciao
21:55:02 have a good day
21:55:29 have a good day all
21:55:33 anything else?
21:55:49 otherwise just a reminder that we will have the Lightning talk at the summit :)
21:56:45 I think we're good
21:56:55 thank you Martial
21:57:10 #endmeeting