21:00:04 <oneswig> #startmeeting scientific-sig
21:00:05 <openstack> Meeting started Tue Jul 21 21:00:04 2020 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:06 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:08 <openstack> The meeting name has been set to 'scientific_sig'
21:00:28 <oneswig> Greetings
21:01:02 <jmlowe> Hello
21:01:04 <oneswig> I don't think there is a formal agenda set for today, any items to suggest?
21:01:11 <oneswig> Hi jmlowe, good afternoon.
21:01:16 <rbudden> hello
21:01:30 <jmlowe> Bob!
21:01:31 <oneswig> Over here we've been watching 'Stranger Things' - lots of Indiana-themed action.
21:01:36 <oneswig> Hi rbudden
21:01:49 <jmlowe> Due to tax credits, filmed in Georgia
21:02:05 <martial> ? :)
21:02:06 <jmlowe> Doesn't have quite the same look
21:02:10 <trandles> Hi all
21:02:17 <oneswig> hi trandles martial
21:02:20 <trandles> I have an agenda item oneswig
21:02:22 <oneswig> #chair martial
21:02:23 <openstack> Current chairs: martial oneswig
21:02:27 <jmlowe> They did get lots of other things just like I remember them
21:02:28 <oneswig> trandles: go for it
21:02:32 <trandles> It's a quick one, and amusing
21:03:07 <trandles> If anyone remembers the Slurm plugins I wrote and spoke about in Austin and Barcelona, you'll know that I submitted paperwork to the lab to get them open sourced.
21:03:14 <trandles> I haven't heard anything since October 2017
21:03:21 <oneswig> I remember!
21:03:23 <trandles> Even when I asked for status updates, I got nothing back.
21:03:32 <jmlowe> You accidentally ripped a hole with some classified high energy experiments and allowed the Gorgon in?
21:03:33 <trandles> Well, today someone contacted me.
21:03:46 <trandles> It looks like I'll be able to release that code after all
21:03:58 <jmlowe> That's less likely than releasing the Gorgon
21:04:08 <oneswig> Man, that's awesome. I'm going to downgrade to 2017 OpenStack specially
21:04:13 <trandles> I haven't looked at it since it seemed that it would never be approved to see the light of day.
21:04:36 <trandles> So I'll get it working with Slurm 20 and the current OpenStack API and get it released ASAP
21:04:54 <oneswig> ... won't that require another approval? :-)
21:05:09 <trandles> nope
21:05:17 <jmlowe> I would imagine it's covered by the open source license
21:05:33 <oneswig> trandles: can you remind us what the plugins did, and how?
21:05:37 <trandles> I wrote the request in a very open-ended fashion as "Open source contributions to Slurm"
21:05:47 <jmlowe> must now release every change
21:06:26 <oneswig> back in a sec, kids are up
21:07:08 <trandles> Two sets of functionality: 1) Slurm grabs images from Glance and deposits them on the nodes in a job allocation, and 2) dynamic VLAN manipulation via Neutron, controlled by Slurm
21:07:53 <trandles> I'm not sure how interesting #1 is any more. I'm willing to get it working if anyone cares. #2 is definitely still interesting, at least to us at LANL.
21:08:50 <oneswig> With #2, was that for bare metal networking?
21:09:08 <trandles> Yeah. Nothing installed on the compute hosts.
21:09:16 <oneswig> Done how?
21:09:39 <trandles> The prototype I wrote made calls to the Neutron API; Neutron was using the Arista ML2 plugin
21:10:07 <oneswig> Very interesting.
21:10:28 <oneswig> Did you have issues with scaling the number of ports? And reliability?
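
The mechanism trandles outlines above (direct Neutron API calls, with the Arista ML2 driver doing the switch programming) distils to only a few calls per job. Below is a minimal sketch using the openstacksdk client; the cloud connection name, physnet label, CIDR and port-binding details are placeholders and assumptions, not code from the original plugin.

    import openstack

    def create_job_vlan(conn, job_id, nodes):
        """Create a job-scoped VLAN network and bind one port per node to it.

        The ML2 mechanism driver (Arista in the prototype described above)
        turns these Neutron objects into VLAN configuration on the switches.
        """
        net = conn.network.create_network(
            name=f"slurm-job-{job_id}",
            provider_network_type="vlan",
            provider_physical_network="physnet1",  # placeholder physnet label
        )
        conn.network.create_subnet(
            network_id=net.id, ip_version=4, cidr="10.20.30.0/24",  # placeholder CIDR
        )
        for node in nodes:
            # binding:vnic_type=baremetal marks this as a physical switch port
            # rather than a VM tap device. A real deployment would also pass a
            # binding profile with local_link_information (switch ID + port) so
            # the driver knows which switch port to move onto the VLAN.
            conn.network.create_port(
                network_id=net.id,
                name=f"{node}-job-{job_id}",
                binding_vnic_type="baremetal",
            )
        return net

    def delete_job_vlan(conn, job_id):
        """Tear the job network down again at job end."""
        for net in conn.network.networks(name=f"slurm-job-{job_id}"):
            for port in conn.network.ports(network_id=net.id):
                conn.network.delete_port(port)
            conn.network.delete_network(net)

Note that each create_port here is a separate Neutron call, which is exactly the bulk-update concern oneswig raises below: a large allocation means many round trips unless the mechanism driver batches its switch updates.
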
21:10:30 <trandles> So it was configuring VLANs on the switch, adding/removing/creating/deleting depending on the job's request.
21:10:55 <trandles> I never did scale past my testbed size of 48 nodes.
21:11:28 <trandles> Reliability using Arista's CloudVision was fine
21:11:57 <oneswig> Was it doing the port updates in a single batched operation?
21:12:29 <oneswig> I've heard of similar activities using networking-generic-switch and ansible-networking
21:13:04 <trandles> I'll have to look at the code again, but as I recall it was one update to the switches per VLAN operation.
21:13:07 <jmlowe> Isn't there a Neutron VXLAN VTEP thingy
21:13:25 <oneswig> One persistent problem appears to be making changes to large numbers of ports; if they can't be bulk-processed, it can lead to timeouts being exceeded.
21:14:04 <oneswig> jmlowe: if the hardware supports it, yes, but I am not sure which mechanism drivers support that for bare metal.
21:14:19 <trandles> The two use cases were isolating the nodes in a job, for instance if you were going to start up an insecure listening service (like Dask at the time), and making network-attached resources available ONLY to specific nodes in an allocation.
21:15:47 <trandles> So for the latter, imagine you have an RDBMS like Oracle that you don't want an entire cluster to be able to see, or a filesystem that should only be mounted by specific nodes in an allocation.
21:16:20 <martial> question: some people here are having a lot of issues with Octavia, is there some advice that
21:16:26 <martial> I can pass along?
21:16:31 <trandles> I had thought about going further and doing things like manipulating flows at the switch hardware level
21:16:43 <trandles> But never got that far with it
21:17:23 <oneswig> trandles: so how long would it take to program the switch with new VLAN config?
21:17:43 <trandles> On a 2-switch testbed, it was fast
21:18:04 <oneswig> And how did you interface between slurmctld and Neutron?
21:18:05 <trandles> I want to say it didn't delay job start by more than a few seconds
21:18:22 <johnsom> martial: What issues are you having with Octavia?
21:18:31 <trandles> The OpenStack API calls were all made by a Slurm job start plugin
21:18:46 <trandles> Slurmctld that is, not slurmd
21:19:15 <trandles> It was a lot like some fabric management stuff that Slurm had long ago for the BlueGene systems from IBM, IIRC
21:19:21 <trandles> Although that's been removed from Slurm, I think
21:20:05 <trandles> I've since wondered if using PMIx's fabric setup hooks would be a better generic solution for job-specific SDN
21:20:47 <trandles> https://pmix.github.io/standard/fabric-manager-roles-and-expectations
21:22:04 <oneswig> For Ethernet and bare metal Slurm, it would be a nice additional method of job isolation.
21:23:31 <trandles> I agree. I'm interested to know if Slingshot, since it's based on Ethernet, will have similar isolation mechanisms.
21:23:49 <trandles> I just haven't had time to get near Slingshot yet. Perhaps some of you have.
21:24:19 <oneswig> I wouldn't know. They shipped 4 Shasta cabinets to Edinburgh Parallel Computing Centre a few weeks ago.
21:24:43 <trandles> Kill them with fire and throw them in a loch :P
21:25:05 <oneswig> Deployment running a bit behind schedule, it seems.
21:25:10 <trandles> (My opinion, I speak not for LANL/NNSA/DOE)
21:25:43 <oneswig> I'm sure you're just jealous of such a large body of fresh water :-)
21:25:54 <oneswig> trandles: Are you aware of the OpenDev session on bare metal networking tomorrow? My colleague Mark is chairing.
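
On the Slurm side, trandles made the OpenStack calls from a slurmctld job-start plugin. As a rough, hypothetical stand-in for prototyping, a PrologSlurmctld script can do the same wiring without compiled code; the script below assumes the create_job_vlan sketch from earlier in this log, a placeholder clouds.yaml entry, and the standard SLURM_JOB_ID / SLURM_JOB_NODELIST environment variables that slurmctld passes to its prolog.

    #!/usr/bin/env python3
    # Hypothetical PrologSlurmctld script: runs on the slurmctld host when a
    # job starts, before its nodes begin execution.
    import os
    import subprocess

    import openstack

    # Assumed helper: the create_job_vlan() sketch shown earlier in this log.
    from job_vlan import create_job_vlan

    def expand_nodelist(nodelist):
        # slurmctld exports the nodelist in compressed form (e.g. "nid[0001-0004]");
        # "scontrol show hostnames" expands it to one hostname per line.
        out = subprocess.run(
            ["scontrol", "show", "hostnames", nodelist],
            check=True, capture_output=True, text=True,
        )
        return out.stdout.split()

    def main():
        job_id = os.environ["SLURM_JOB_ID"]
        nodes = expand_nodelist(os.environ["SLURM_JOB_NODELIST"])
        conn = openstack.connect(cloud="cluster")  # placeholder clouds.yaml entry
        create_job_vlan(conn, job_id, nodes)

    if __name__ == "__main__":
        main()

A matching EpilogSlurmctld would call the teardown; keeping both fast matters, since trandles recalls the switch programming adding no more than a few seconds to job start.
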
21:26:06 <oneswig> You should join in!
21:26:23 <trandles> Unfortunately I've been too swamped to attend any of this week's sessions. :(
21:26:28 <trandles> What time tomorrow?
21:27:00 <oneswig> 13:15 UTC
21:27:10 <oneswig> #link baremetal networking etherpad https://etherpad.opendev.org/p/OpenDev_HardwareAutomation_Day3
21:27:20 <martial> johnsom: truthfully not 100% sure, I heard issues about port collisions
21:27:31 <trandles> 7:15 here, I'll be available and will attend
21:28:10 <oneswig> Excellent. It's always useful to get many points of view.
21:28:11 <johnsom> martial: Hm, ok, well let them know we are in the #openstack-lbaas channel and happy to help.
21:28:55 <martial> johnsom: will do
21:28:58 <trandles> In other relevant news, I've been working quite a lot with Ironic and will be doing an internal demo for some systems staff. It's a candidate for our next bare metal cluster provisioning system.
21:29:23 <martial> cool, re tomorrow, will see if my meeting is done and will try to attend :)
21:30:18 <oneswig> trandles: that would be very cool. Is that with the deploy-to-ramdisk stuff?
21:31:00 <oneswig> trandles: did you sign up for OpenDev this week? https://www.eventbrite.com/e/opendev-hardware-automation-registration-104569991660
21:31:09 <trandles> Some of it is. I'm pitching it to replace all provisioning, not just the cluster but also lots of diskful infrastructure services
21:32:39 <oneswig> We've been having fun recently with Ironic's new deploy-to-software-RAID capability
21:33:17 <trandles> Registered. I thought I had, but that was for the Containers in Production event
21:33:18 <oneswig> #link Software RAID in Ironic blog https://www.stackhpc.com/software-raid-in-ironic.html
21:34:08 <oneswig> Did you also see the baremetal whitepaper came out this week?
21:34:11 <trandles> I saw your section of the new Ironic whitepaper
21:34:22 <oneswig> Ah, great (it's the same stuff)
21:36:11 <oneswig> #link Baremetal Ironic whitepaper https://www.openstack.org/bare-metal/how-ironic-delivers-abstraction-and-automation-using-open-source-infrastructure
21:37:24 <oneswig> I have to drop off, alas - all this typing is waking my kids up (either that or the after-effects of Stranger Things)
21:37:54 <oneswig> martial: can you take over as chair?
21:38:22 <oneswig> Interesting discussion, hopefully catch you at OpenDev tomorrow
21:39:10 <martial> sure thing
21:39:35 <martial> I would bet Stranger Things has something to do with it :)
21:41:03 <martial> checking the agenda to see if anything else is to be covered today
21:42:50 <martial> okay, not much there :)
21:43:09 <martial> #topic AOB
21:43:26 <martial> opening the floor for additional conversations as needed
21:45:55 <martial> well, offering to save people 15 minutes then :)
21:47:46 <trandles> Sorry, had a phone call
21:47:59 <trandles> I'll be at OpenDev tomorrow, later everyone
21:48:36 <martial> bye then, bye all :)
21:48:46 <martial> #endmeeting
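
A closing footnote on the Ironic thread: for the standalone, cluster-provisioning use trandles is evaluating, enrolling and deploying a node reduces to a handful of openstacksdk calls. The sketch below is illustrative only; the cloud entry, driver, BMC credentials, MAC address and image URL are all placeholders, and the software-RAID and deploy-to-ramdisk options mentioned above are covered in the linked StackHPC post and the Ironic docs rather than shown here.

    import openstack

    # Placeholder clouds.yaml entry pointing at a standalone (Bifrost-style) Ironic.
    conn = openstack.connect(cloud="bifrost")

    # Enroll a node; driver and BMC details are placeholders.
    node = conn.baremetal.create_node(
        name="infra01",
        driver="ipmi",
        driver_info={
            "ipmi_address": "10.0.0.11",
            "ipmi_username": "admin",
            "ipmi_password": "secret",
        },
        resource_class="infrastructure",
    )
    # Register the node's NIC so networking can be bound during deployment.
    conn.baremetal.create_port(node_id=node.id, address="aa:bb:cc:dd:ee:01")

    # manage -> provide takes the node through cleaning to "available",
    # then "active" deploys the image written into instance_info.
    conn.baremetal.set_node_provision_state(node, "manage", wait=True)
    conn.baremetal.set_node_provision_state(node, "provide", wait=True)
    node = conn.baremetal.update_node(
        node,
        instance_info={
            "image_source": "http://repo.example.com/images/infra.qcow2",
            "image_checksum": "d41d8cd98f00b204e9800998ecf8427e",  # placeholder md5
        },
    )
    conn.baremetal.set_node_provision_state(node, "active", wait=True)
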