21:00:04 <oneswig> #startmeeting scientific-sig
21:00:05 <openstack> Meeting started Tue Jul 21 21:00:04 2020 UTC and is due to finish in 60 minutes.  The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:06 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:08 <openstack> The meeting name has been set to 'scientific_sig'
21:00:28 <oneswig> Greetings
21:01:02 <jmlowe> Hello
21:01:04 <oneswig> I don't think there is a formal agenda set for today, any items to suggest?
21:01:11 <oneswig> Hi jmlowe, good afternoon.
21:01:16 <rbudden> hello
21:01:30 <jmlowe> Bob!
21:01:31 <oneswig> Over here we've been watching 'stranger things' - lots of Indiana-themed action.
21:01:36 <oneswig> Hi rbudden
21:01:49 <jmlowe> Due to tax credits, filmed in Georgia
21:02:05 <martial> ? :)
21:02:06 <jmlowe> Doesn't have quite the same look
21:02:10 <trandles> Hi all
21:02:17 <oneswig> hi trandles martial
21:02:20 <trandles> I have an agenda item oneswig
21:02:22 <oneswig> #chair martial
21:02:23 <openstack> Current chairs: martial oneswig
21:02:27 <jmlowe> They did get lots of other things just like I remember them
21:02:28 <oneswig> trandles: go for it
21:02:32 <trandles> It's a quick one, and amusing
21:03:07 <trandles> If anyone remembers the Slurm plugins I wrote and spoke about in Austin and Barcelona, you'll know that I submitted paperwork to the lab to get them open sourced.
21:03:14 <trandles> I haven't heard anything since October 2017
21:03:21 <oneswig> I remember!
21:03:23 <trandles> Even when I asked for status updates, I got nothing back.
21:03:32 <jmlowe> You accidentally ripped a hole with some classified high energy experiments and allowed the Gorgon in?
21:03:33 <trandles> Well, today someone contacted me.
21:03:46 <trandles> It looks like I'll be able to release that code after all
21:03:58 <jmlowe> That's less likely than releasing the Gorgon
21:04:08 <oneswig> Man, that's awesome.  I'm going to downgrade to 2017 OpenStack specially
21:04:13 <trandles> I haven't looked at it since it seemed that it would never be approved to see the light of day.
21:04:36 <trandles> So I'll get it working with Slurm 20 and the current OpenStack API and get it released ASAP
21:04:54 <oneswig> ... won't that require another approval? :-)
21:05:09 <trandles> nope
21:05:17 <jmlowe> I would imagine it's covered by the open source license
21:05:33 <oneswig> trandles: can you remind us what the plugins did, and how?
21:05:37 <trandles> I wrote the request in a very open ended fashion as "Open source contributions to Slurm"
21:05:47 <jmlowe> must now release every change
21:06:26 <oneswig> back in a sec, kids are up
21:07:08 <trandles> Two sets of functionality: 1) slurm grabs images from glance and deposits them on the nodes in a job allocation and 2) dynamic VLAN manipulation via Neutron controlled by slurm
21:07:53 <trandles> I'm not sure how interesting #1 is any more. I'm willing to get it working if anyone cares. #2 is definitely still interesting at least to us at LANL.
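As an aside, a minimal sketch of what functionality #1 amounts to, pulling an image out of Glance onto a node ahead of a job, might look like the following. It assumes openstacksdk with a clouds.yaml entry named 'mycloud'; the image name and output path are hypothetical, and this is an illustration, not the unreleased plugin code.

    import openstack

    conn = openstack.connect(cloud='mycloud')              # assumed clouds.yaml entry
    image = conn.image.find_image('compute-node-rootfs')   # hypothetical image name

    # Stream the image from Glance to local disk in chunks so it is never
    # held entirely in memory.
    response = conn.image.download_image(image, stream=True)
    with open('/var/tmp/compute-node-rootfs.img', 'wb') as f:
        for chunk in response.iter_content(chunk_size=1024 * 1024):
            f.write(chunk)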
21:08:50 <oneswig> With #2 was that for bare metal networking?
21:09:08 <trandles> Yeah. Nothing installed on the compute hosts.
21:09:16 <oneswig> Done how?
21:09:39 <trandles> The prototype I wrote made calls to the neutron API, neutron was using the Arista ML2 plugin
21:10:07 <oneswig> Very interesting.
21:10:28 <oneswig> Did you have issues with scaling the number of ports?  And reliability?
21:10:30 <trandles> So it was configuring VLANs on the switch, adding/removing/creating/deleting depending on the job's request.
21:10:55 <trandles> I never did scale past my testbed size of 48 nodes.
21:11:28 <trandles> Reliability using Arista's CloudVision was fine
21:11:57 <oneswig> Was it doing the port updates in a single batched operation?
21:12:29 <oneswig> I've heard of similar activities using networking-generic-switch and ansible-networking
21:13:04 <trandles> I'll have to look at the code again, but as I recall it was one update to the switches per VLAN operation.
21:13:07 <jmlowe> Isn't there a neutron vxlan vtep thingy
21:13:25 <oneswig> One persistent problem appears to be making changes to large numbers of ports; if they can't be bulk-processed, it can lead to timeouts being exceeded.
21:14:04 <oneswig> jmlowe: if the hardware supports it, yes but I am not sure which mechanism drivers support that for bare metal.
21:14:19 <trandles> The two use cases were isolating the nodes in a job, for instance if you were going to start up an insecure listening service (like DASK at the time), and making network-attached resources available ONLY to specific nodes in an allocation.
21:15:47 <trandles> So for the latter, imagine you have an RDBMS like Oracle that you don't want an entire cluster to be able to see, or a filesystem that should only be mounted by specific nodes in an allocation.
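To make #2 concrete, here is a rough illustration (not the LANL plugin, which is still unreleased) of how a controller-side hook could drive Neutron: create a per-job VLAN provider network and bind one bare metal port per node, so that an ML2 mechanism driver such as Arista's programs the switch. It assumes openstacksdk and a driver that honours local_link_information; every name, physnet, VLAN ID and switch port below is made up.

    import openstack

    conn = openstack.connect(cloud='mycloud')

    job_id = 12345
    # One VLAN provider network per job allocation.
    net = conn.network.create_network(
        name=f'slurm-job-{job_id}',
        provider_network_type='vlan',
        provider_physical_network='physnet1',
        provider_segmentation_id=3000 + (job_id % 1000),
    )

    # One port per node in the allocation; local_link_information tells the
    # mechanism driver which physical switch port to move onto the job's VLAN.
    nodes = {'cn001': ('tor-a', 'Ethernet1'), 'cn002': ('tor-a', 'Ethernet2')}
    for node, (switch, switch_port) in nodes.items():
        conn.network.create_port(
            network_id=net.id,
            name=f'slurm-job-{job_id}-{node}',
            binding_vnic_type='baremetal',
            binding_profile={'local_link_information': [
                {'switch_info': switch, 'port_id': switch_port},
            ]},
        )

    # A job epilog would delete the ports and the network again, returning
    # the switch ports to their default configuration.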
21:16:20 <martial> question: some people here are having a lot of issues with Octavia, is there some advice that
21:16:26 <martial> I can pass along?
21:16:31 <trandles> I had thought about going further and doing things like manipulating flows at the switch hardware level
21:16:43 <trandles> But never got that far with it
21:17:23 <oneswig> trandles: so how long would it take to program the switch with new vlan config?
21:17:43 <trandles> On a 2-switch testbed, it was fast
21:18:04 <oneswig> And how did you interface between slurmctld and Neutron?
21:18:05 <trandles> I want to say it didn't delay job start by more than a few seconds
21:18:22 <johnsom> martial What issues are you having with Octavia?
21:18:31 <trandles> The openstack API calls were all made by a Slurm job start plugin
21:18:46 <trandles> Slurmctld that is, not slurmd
21:19:15 <trandles> It was a lot like some fabric management stuff that slurm had long ago for the bluegene systems from IBM IIRC
21:19:21 <trandles> Although that's been removed from Slurm I think
21:20:05 <trandles> I've since wondered if using PMIx's fabric setup hooks would be a better generic solution for job-specific SDN
21:20:47 <trandles> https://pmix.github.io/standard/fabric-manager-roles-and-expectations
21:22:04 <oneswig> For Ethernet and bare metal Slurm, it would be a nice additional method of job isolation.
21:23:31 <trandles> I agree. I'm interested to know if slingshot, since it's based on ethernet, will have similar isolation mechanisms.
21:23:49 <trandles> I just haven't had time to get near slingshot yet. Perhaps some of you have.
21:24:19 <oneswig> I wouldn't know.  They shipped 4 Shasta cabinets to Edinburgh Parallel Computing Centre a few weeks ago.
21:24:43 <trandles> Kill them with fire and throw them in a loch :P
21:25:05 <oneswig> Deployment running a bit behind schedule it seems.
21:25:10 <trandles> (My opinion, I speak not for LANL/NNSA/DOE)
21:25:43 <oneswig> I'm sure you're just jealous of such a large body of fresh water :-)
21:25:54 <oneswig> trandles: Are you aware of the OpenDev session on bare metal networking tomorrow?  My colleague Mark is chairing.
21:26:06 <oneswig> You should join in!
21:26:23 <trandles> Unfortunately I've been too swamped to attend any of this week's sessions. :(
21:26:28 <trandles> What time tomorrow?
21:27:00 <oneswig> 13:15 UTC
21:27:10 <oneswig> #link baremetal networking etherpad https://etherpad.opendev.org/p/OpenDev_HardwareAutomation_Day3
21:27:20 <martial> johnsom: truthfully not 100% sure, I heard issues about port collisions
21:27:31 <trandles> 7:15 here, I'll be available and will attend
21:28:10 <oneswig> Excellent.  It's always useful to get many points of view.
21:28:11 <johnsom> martial Hm, ok, well let them know we are in the #openstack-lbaas channel and happy to help.
21:28:55 <martial> johnsom: will do
21:28:58 <trandles> In other relevant news, I've been working quite a lot with Ironic and will be doing an internal demo for some systems staff. It's a candidate for our next bare metal cluster provisioning system.
21:29:23 <martial> cool, re tomorrow, will see if my meeting is done and will try to attend :)
21:30:18 <oneswig> trandles: that would be very cool.  Is that with the deploy-to-ramdisk stuff?
21:31:00 <oneswig> trandles: did you sign up for opendev this week?  https://www.eventbrite.com/e/opendev-hardware-automation-registration-104569991660?aff=ebemnsuserinsight&afu=226342862332&utm_term=eattnewsrecs&recommended_events_quantity=6&utm_campaign=newsletter_editorial&utm_content=new_york-ny.r2020_28&utm_source=eventbrite&utm_medium=email&rank=0&ref=ebemnsuserinsight
21:31:09 <trandles> Some of it is. I'm pitching it to replace all provisioning, not just the clusters but also lots of diskful infrastructure services
21:32:39 <oneswig> We've been having fun recently with Ironic's new deploy to software RAID capability
21:33:17 <trandles> Registered now. I thought I had already, but that was for the containers in production event
21:33:18 <oneswig> #link Software raid in Ironic blog https://www.stackhpc.com/software-raid-in-ironic.html
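For context, the software RAID deploy works by giving the node a target_raid_config that ironic-python-agent builds with mdadm during cleaning. A hedged example follows, with illustrative sizes and the CLI steps sketched only roughly in comments; see the blog post for the real procedure.

    import json

    target_raid_config = {
        "logical_disks": [
            # Mirrored root device across two disks.
            {"size_gb": 100, "raid_level": "1", "controller": "software"},
            # Remaining space as a second, striped array.
            {"size_gb": "MAX", "raid_level": "0", "controller": "software"},
        ]
    }

    with open('raid.json', 'w') as f:
        json.dump(target_raid_config, f)

    # The config is then attached to the node and built during cleaning,
    # roughly:
    #   openstack baremetal node set <node> --target-raid-config raid.json
    #   openstack baremetal node clean <node> --clean-steps \
    #     '[{"interface": "raid", "step": "delete_configuration"},
    #       {"interface": "raid", "step": "create_configuration"}]'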
21:34:08 <oneswig> Did you also see the baremetal whitepaper came out this week?
21:34:11 <trandles> I saw your section of the new ironic whitepaper
21:34:22 <oneswig> Ah, great (it's the same stuff)
21:36:11 <oneswig> #link Baremetal Ironic whitepaper https://www.openstack.org/bare-metal/how-ironic-delivers-abstraction-and-automation-using-open-source-infrastructure
21:37:24 <oneswig> I have to drop off, alas - all this typing is waking my kids up (either that or the after effects of Stranger Things)
21:37:54 <oneswig> martial: can you take over as chair?
21:38:22 <oneswig> Interesting discussion, hopefully catch you at OpenDev tomorrow
21:39:10 <martial> sure thing
21:39:35 <martial> I would bet Stranger Things has something to do with it :)
21:41:03 <martial> checking the agenda to see if anything else is to be covered today
21:42:50 <martial> okay not much there :)
21:43:09 <martial> #topic AOB
21:43:26 <martial> opening the floor for additional conversations as needed
21:45:55 <martial> well offering to save people 15 minutes then :)
21:47:46 <trandles> Sorry, had a phone call
21:47:59 <trandles> I'll be at OpenDev tomorrow, later everyone
21:48:36 <martial> bye then, bye all :)
21:48:40 <martial> #end-meeting
21:48:46 <martial> #endmeeting