21:00:18 <oneswig> #startmeeting scientific-sig
21:00:19 <openstack> Meeting started Tue Aug 6 21:00:18 2019 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:21 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:23 <openstack> The meeting name has been set to 'scientific_sig'
21:00:26 <oneswig> away we go
21:00:43 <b1airo> Morning
21:00:46 <janders> g'day guys
21:00:49 <oneswig> #link Agenda for today https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_August_6th_2019
21:01:07 <oneswig> Morning b1airo janders how's Wednesday going?
21:01:25 <janders> oneswig starting slowly :) how are you?
21:01:27 <b1airo> I'm just back from holiday and organising a late start with the kids etc, so will be a bit fleeting
21:01:30 <priteau> Hello everyone
21:01:42 <oneswig> Hi priteau!
21:02:12 <oneswig> janders: I'm well, thanks. Been a fairly intense week working on some customer projects, but a good one.
21:02:21 <janders> that's good to hear
21:03:20 <oneswig> I was anticipating jmlowe and trandles coming along today
21:03:44 <oneswig> janders: what's the latest on supercloud?
21:04:21 <janders> not much of an update - we're at that time in the year where all the projects are getting reshuffled - so more paperwork than anything else really
21:04:47 <janders> some interesting challenges with dual-port ConnectX-6s - happy to touch on this in AOB if there is time
21:05:12 <oneswig> was talking with somebody recently with an interest in a Manila driver for BeeGFS
21:05:49 <oneswig> janders: always happy to have some grunty networking issues to round the hour out.
21:06:07 <janders> :) sounds good
21:06:12 <b1airo> Yeah Manila BeeGFS sounds useful
21:06:17 <janders> +1!!!
21:06:23 <oneswig> #topic OpenStack user survey
21:06:37 <oneswig> #link yer tiz https://www.openstack.org/user-survey/survey-2019/landing
21:07:01 <oneswig> Get it filled in and make sure your scientific clouds are stood up and counted.
21:07:16 <oneswig> That's all about that.
21:07:32 <oneswig> #topic Monitoring for Chargeback and Accounting
21:07:53 <tbarron> just need someone to write and maintain a Manila driver for BeeGFS and we (the Manila maintainers) commit to helping it integrate/merge/etc.
21:08:20 * tbarron apologizes for jumping in before the current topic, is done
21:08:21 <oneswig> Hi tbarron - ears burning :-)
21:08:36 <oneswig> Thanks for joining in.
21:08:58 <tbarron> oneswig: :)
21:09:29 <trandles> o/ I'm late but oh well
21:09:58 <oneswig> I fear the maintaining bit is often overlooked. But I think there's a good opportunity here to do something good.
21:10:15 <oneswig> hi trandles, you made it!
21:11:35 <oneswig> tbarron: we'll see if we can get an interested group together
21:12:47 <oneswig> #action oneswig to canvass for interest in a Manila driver for BeeGFS
21:13:11 <oneswig> OK, let's return to the topic.
21:13:57 <oneswig> We actually discussed this last week when priteau did some interesting work investigating CloudKitty drawing data from Monasca instead of the Ceilometer family.
21:14:11 <priteau> oneswig: Telemetry family :)
21:14:36 <oneswig> I stand corrected
21:14:50 <oneswig> #link CloudKitty and Monasca (episode 1) https://www.stackhpc.com/cloudkitty-and-monasca-1.html
21:15:27 <oneswig> This article sets the scene
21:15:45 <oneswig> priteau: I think you've been busy since and assume you've no further developments to report?
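(A minimal, illustrative Python sketch of the chargeback step discussed above: take usage samples for a metric, of the kind CloudKitty collects from Monasca or another backend, and apply a flat per-unit rate. The sample format, metric name and rate are placeholders for illustration, not CloudKitty's actual data model.)

# Illustrative only: a flat-rate rating pass over usage samples, the kind of
# step CloudKitty applies to metrics collected from Monasca/Gnocchi/Prometheus.
# The sample format, metric name and rate below are placeholders.
from dataclasses import dataclass

@dataclass
class Sample:
    metric: str          # e.g. "vcpus"
    qty: float           # amount consumed during the collection period
    period_hours: float  # length of the collection period in hours

FLAT_RATES = {"vcpus": 0.02}  # currency units per vCPU-hour (placeholder rate)

def rate(samples):
    """Return the cost of each sample under a simple flat-rate policy."""
    return [s.qty * s.period_hours * FLAT_RATES.get(s.metric, 0.0) for s in samples]

if __name__ == "__main__":
    usage = [Sample("vcpus", 8, 1.0), Sample("vcpus", 4, 1.0)]
    print(sum(rate(usage)))  # 0.24 for 12 vCPU-hours at the placeholder rate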
21:16:19 <priteau> I am afraid I've been otherwise engaged
21:16:50 <priteau> But hopefully the summer is not yet over for a sequel blog post
21:17:30 <oneswig> I hope not - you can't leave us hanging off this cliff-edge!
21:17:49 <oneswig> b1airo: were you using CloudKitty at Monash?
21:19:02 <b1airo> No, looked at it a few times, but never attempted putting it all together
21:20:05 <oneswig> Ah ok, I remember you mentioning it.
21:21:13 <oneswig> We have an interest in generating billing data but an aversion to pulling in additional telemetry to do it.
21:21:22 <priteau> CloudKitty itself isn't very complex to configure, it's more that getting the right data collected can be tricky
21:24:04 <oneswig> I'll be interested to see how it works with data from the OpenStack exporter, or Nova instance data
21:26:02 <priteau> Nova is actually the easiest service to charge for because there are various ways to collect usage metrics. Other services, like charging for image or volume storage, will be more challenging.
21:28:28 <oneswig> A pity jmlowe's not around to tell us how they use XDMoD (I assume) for this
21:31:30 <oneswig> OK, time for janders' ConnectX-6 issue?
21:31:38 <oneswig> #topic AOB
21:32:08 <janders> ok!
21:32:14 <oneswig> janders: what's been going on?
21:32:24 <janders> do any of you have any experience with dual-port CX-6s?
21:32:54 <oneswig> only CX-5, alas
21:33:13 <clarkb> Thought I would point out http://lists.openstack.org/pipermail/openstack-discuss/2019-August/008304.html as some of you may be able to answer those questions
21:33:34 <janders> I think CX-5s are a bit easier to work with :)
21:33:50 <janders> I wanted to use one CX-6 port as 50GbE and the other as HDR200
21:34:03 <oneswig> Hi clarkb - saw that earlier tonight, thanks for raising it.
21:34:14 <janders> it seems that with the current firmware it's hard to get the eth/ib ports to work concurrently
21:34:20 <janders> it's a bit of "one or the other"
21:34:32 <oneswig> What happened to VPI?
21:35:17 <janders> another angle: do you guys use splitter cables (eg 100GbE > 2x 50GbE)?
21:35:32 <b1airo> Yes
21:35:41 <oneswig> janders: yes, on a CX-4 system, had a good deal of trouble initially
21:35:53 <janders> interesting.. hitting the same
21:36:01 <janders> what sorts of issues did you have?
21:36:47 <janders> so far I've seen that support for splitters sometimes varies across firmware versions (say 1.1 supports it, then you upgrade to 1.2 and stuff stops working, and support say this version doesn't support splitters)
21:37:21 <janders> also the support guys seem very confused about splitters in general (once I was told these are meant to connect switches, not nodes, which seems insane)
21:38:16 <oneswig> We had a 100G-4x25G splitter with 25G SFP+, but the NICs had QSFP sockets. Mellanox do a passive slipper that would take the SFP cable in the QSFP socket - needed some firmware tweaks to get it going.
21:38:46 <janders> right!
21:39:07 <janders> did they make these tweaks mainstream in the end, or did these remain tweaks?
21:39:54 <oneswig> I think they were mainstream but I'll check the fw version now
21:40:03 <janders> IIRC our "splitters" are QSFP on both sides
21:41:07 <oneswig> The CX-4 NICs are running 12.20.1010
21:41:21 <oneswig> Probably ~1.5 years old
21:41:32 <janders> ok!
21:41:56 <janders> so - I think between dual-port CX-6s and the splitter cables we might have a bit more "fun" before everything works
21:42:18 <oneswig> What's preventing running the NIC dual-protocol? Do you think it's the splitter?
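(As an aside on the dual-protocol question just raised: on earlier VPI generations the per-port protocol is normally selected with mlxconfig's LINK_TYPE_P1/LINK_TYPE_P2 parameters, 1 = InfiniBand, 2 = Ethernet. The sketch below wraps that operation in Python; the MST device path is a placeholder, and whether current dual-port ConnectX-6 firmware honours a mixed ETH/IB setting is exactly the open question in the surrounding discussion.)

# Sketch only: set port 1 to Ethernet and port 2 to InfiniBand on a Mellanox
# VPI adapter using mlxconfig (part of the MFT tools). The device path is a
# placeholder, and this is the CX-4/CX-5 era recipe; CX-6 behaviour with mixed
# modes is unconfirmed in the log above.
import subprocess

DEVICE = "/dev/mst/mt4123_pciconf0"  # placeholder MST device path

def set_link_types(device=DEVICE, eth_port=1, ib_port=2):
    # LINK_TYPE_Px values per mlxconfig: 1 = InfiniBand, 2 = Ethernet
    settings = [f"LINK_TYPE_P{eth_port}=2", f"LINK_TYPE_P{ib_port}=1"]
    subprocess.run(["mlxconfig", "-y", "-d", device, "set", *settings], check=True)
    # The new link types only take effect after a firmware reset or reboot.

if __name__ == "__main__":
    # Show the current configuration, then apply the mixed ETH/IB setting.
    subprocess.run(["mlxconfig", "-d", DEVICE, "query"], check=True)
    set_link_types()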
21:42:44 <janders> I was told there are missing bits in the CX-6 firmware, but that came through a reseller, not Mellanox directly
21:42:50 <janders> so not 100% sure if this is accurate
21:43:05 <oneswig> That would be a surprising piece to be missing.
21:43:13 <janders> also, given we're using the dual-PCI-slot form factor, I have a sneaking suspicion that VPI might be trickier than it used to be
21:43:20 <oneswig> What do you get with CX-6, apart from 200G?
21:43:22 <janders> thanks, PCIe3.0
21:43:27 <janders> nothing
21:43:50 <janders> I'm actually considering asking to have some cards replaced with CX-5s if this can't be resolved promptly
21:43:59 <janders> HDR200 would be handy for my GPFS cluster though
21:44:26 <janders> it's NVMe based (similar design to our BeeGFS) so could definitely use that bandwidth
21:44:45 <janders> but... 2x CX-5 could do that, too :)
21:45:09 <oneswig> You'd get to put one on each socket as well, and perhaps exploit some NUMA locality
21:45:22 <janders> indeed
21:45:53 <janders> I'm anticipating an update from the reseller & Mellanox this week, I will report back if I learn anything interesting
21:46:02 <janders> it's very useful to know you guys had issues with splitters, too
21:46:24 <janders> I will be less keen to put them in any larger systems now :(
21:46:42 <oneswig> We did, but I think the QSA28 slipper was part of our problem
21:46:59 <janders> it's a shame because these could be used to build some really funky IO-centric topologies
21:47:26 <janders> I haven't given up on them entirely just yet but it's definitely an amber light in terms of resilient systems design
21:47:36 <oneswig> I wouldn't give up on them just yet, it would rule out many options
21:48:14 <janders> one thing seems certain: taking up a new HCA generation + splitters at the same time is painful
21:49:26 <janders> that's pretty much all I have on this topic for now. Thank you for sharing thoughts, I will keep you posted! :)
21:50:03 <oneswig> Thanks janders, feel your pain but slightly envy your kit at the same time :-)
21:50:18 <oneswig> OK, shall we wrap up? Any more to add today?
21:50:39 <b1airo> I think being an early adopter of anything new from MLNX has a certain degree of pain and perplexedness involved
21:50:53 <janders> +1 :)
21:51:23 <oneswig> how about being an early adopter of OPA, in fairness?
21:51:46 <janders> LOL!
21:52:01 <janders> I suppose you can be among the first and the last ones, all at the same time
21:53:03 <b1airo> Hehe
21:53:48 <b1airo> You guys see ARDC has released a call for proposals for new Research Cloud infra?
21:54:18 <oneswig> What's going on there b1airo?
21:54:22 <b1airo> Only ~$5.5m worth though it seems
21:55:36 <b1airo> Probably just aiming to replace existing capacity. Was $20m 6-7 years ago, so $5m probably buys at least the same number of vcores now
21:56:02 <oneswig> ARDC is the new Nectar?
21:57:10 <b1airo> Yeah, the Nectar-ANDS-RDS conglomerate
21:58:13 <b1airo> Also, quickly on the topic of public vs private/hybrid
21:58:50 <b1airo> Was thinking of reaching out to some server vendors to see if they have some good base modelling on costs, anyone know of some?
21:59:49 <oneswig> I think Cambridge Uni had some models for cost per core hour that might be relevant but I'm not sure how general (or public) they are
21:59:53 <oneswig> I'll check
22:00:11 <oneswig> OK, time to close
22:00:13 <b1airo> Sounds like I'll need to do some direct legwork
22:00:26 <oneswig> Thanks all
22:00:29 <oneswig> #endmeeting
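(On the cost-modelling question b1airo raised at the end, a toy cost-per-vCPU-hour calculation of the sort the Cambridge cost-per-core-hour models presumably encode. Every figure below is a placeholder, not a number from ARDC, Nectar, any vendor or institution.)

# Toy cost-per-vCPU-hour model along the lines discussed above; all inputs are
# placeholders for illustration only.
def cost_per_vcpu_hour(capex, opex_per_year, vcpus, lifetime_years=5, utilisation=0.7):
    """Amortise capital cost over the hardware lifetime, add running costs,
    and divide by the vCPU-hours actually delivered at a given utilisation."""
    hours = lifetime_years * 365 * 24
    total_cost = capex + opex_per_year * lifetime_years
    delivered_vcpu_hours = vcpus * hours * utilisation
    return total_cost / delivered_vcpu_hours

if __name__ == "__main__":
    # Example: $5.5M of servers delivering ~20k vCPUs, $500k/year running costs
    print(f"{cost_per_vcpu_hour(5_500_000, 500_000, 20_000):.4f} per vCPU-hour")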