21:00:17 <oneswig> #startmeeting scientific-sig
21:00:18 <openstack> Meeting started Tue Apr 16 21:00:17 2019 UTC and is due to finish in 60 minutes.  The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:19 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:21 <openstack> The meeting name has been set to 'scientific_sig'
21:00:30 <oneswig> Greetings SIG
21:00:36 <rbudden> hello
21:00:44 <oneswig> #link Agenda for today https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_April_16th_2019
21:00:48 <oneswig> Hey rbudden, hody
21:00:57 <oneswig> (I mean, howdy)
21:01:00 <rbudden> heh
21:01:30 <oneswig> Martial sends apologies as he has family commitments this week
21:01:36 <elox> Hello from Sweden, Erik Lönroth http://eriklonroth.com here.
21:01:38 <rbudden> unfortunately i have to bounce early to pick up my son, but i’ll catch up on the logs afterwards
21:01:52 <oneswig> elox: Hi, welcome :-)
21:02:07 <elox> thanx!
21:02:15 <oneswig> Thanks for coming along
21:02:41 <elox> Hope to learn and contribute.
21:02:53 <oneswig> Great!
21:03:19 <oneswig> Seems like it might be a short meeting this week though - with the run-up to Easter a few folks are out
21:03:50 <oneswig> OK, let's get going...
21:03:58 <oneswig> #topic Ironic and external DCIM
21:04:18 <oneswig> The context here is there was some discussion last week with the CERN team (Arne in particular)
21:04:27 <martial> (On phone as discussed)
21:04:51 <oneswig> They've got a big server enrollment imminent and are looking again at how they might streamline that
21:04:57 <oneswig> hey martial, you made it :-)
21:05:06 <oneswig> #chair martial
21:05:07 <openstack> Current chairs: martial oneswig
21:05:28 <oneswig> #link previous week's discussion http://eavesdrop.openstack.org/meetings/scientific_sig/2019/scientific_sig.2019-04-10-10.59.log.html#l-17
21:06:43 <oneswig> If I can summarise correctly, the conclusion was that they'd like a means to store inspection data in an external infrastructure management DB (e.g. Netbox)
21:07:03 <oneswig> And to update periodically if possible to keep the information current
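As a rough illustration of the kind of sync being discussed (not anything CERN has committed to), the sketch below pulls stored introspection data from the ironic-inspector REST API and patches a few fields onto the matching Netbox device. The endpoint URLs, tokens, node list and choice of Netbox fields are all assumptions for the example; a periodic run (cron, or triggered from a cleaning step) would be what keeps the DCIM current.

```python
#!/usr/bin/env python3
"""Hypothetical sketch: copy Ironic Inspector introspection data into Netbox.

Endpoints, tokens, node names and the Netbox fields written are all
placeholders -- adapt to your own deployment.
"""
import os
import requests

INSPECTOR_URL = os.environ["INSPECTOR_URL"]   # e.g. http://inspector:5050
NETBOX_URL = os.environ["NETBOX_URL"]         # e.g. https://netbox.example.org
OS_TOKEN = os.environ["OS_TOKEN"]             # Keystone token for inspector
NETBOX_TOKEN = os.environ["NETBOX_TOKEN"]     # Netbox API token


def get_introspection_data(node):
    """Fetch the stored introspection data for one node from ironic-inspector."""
    resp = requests.get(
        f"{INSPECTOR_URL}/v1/introspection/{node}/data",
        headers={"X-Auth-Token": OS_TOKEN})
    resp.raise_for_status()
    return resp.json()


def update_netbox(node, data):
    """Patch a few illustrative fields onto the Netbox device of the same name."""
    headers = {"Authorization": f"Token {NETBOX_TOKEN}"}
    # Assumes the Netbox device name matches the Ironic node name.
    device = requests.get(
        f"{NETBOX_URL}/api/dcim/devices/", params={"name": node},
        headers=headers).json()["results"][0]
    payload = {
        "serial": data.get("inventory", {}).get("system_vendor", {}).get("serial_number", ""),
        # 'comments' is only a stand-in here for custom fields an operator
        # would define for memory/disk inventory.
        "comments": f"{data.get('memory_mb')} MB RAM, "
                    f"{len(data.get('inventory', {}).get('disks', []))} disks",
    }
    requests.patch(f"{NETBOX_URL}/api/dcim/devices/{device['id']}/",
                   json=payload, headers=headers).raise_for_status()


if __name__ == "__main__":
    for node in ["node-0001"]:   # placeholder node list
        update_netbox(node, get_introspection_data(node))
```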
21:07:53 <elox> Is there anyone from their team here?
21:08:12 <oneswig> Arne's looking at candidate technologies and wants to consider things that people are actually using
21:08:21 <b1airo> o/
21:08:23 <oneswig> elox: Not as far as I know...
21:08:36 <oneswig> Hi b1airo :-)
21:08:40 <oneswig> #chair b1airo
21:08:40 <openstack> Current chairs: b1airo martial oneswig
21:08:53 <TheJulia> So following up on dtantsur's comment, we've also discussed downstream building a reporting agent to provide greater detail, but it would only really be viable in self-hosted environments, and there are items that would need to be worked out to make something like that viable.
21:09:19 <oneswig> Hi TheJulia, thanks for joining
21:09:33 <oneswig> What do you mean by self-hosted in this context?  Sorry.
21:10:10 <TheJulia> an operator that is willing to trust and deploy an agent within their workload
21:10:40 <oneswig> Ah, thanks.
21:11:18 <TheJulia> The end idea, possibly being keeping ironic's inspection data fairly up to date.
21:11:25 <oneswig> That's not ideal; in many cases people wouldn't want that.
21:12:05 <TheJulia> Out of band... while it seems perfect, we're also talking about embedded systems that are easy to put into weird states that cause them to lock up for a little while, which is also not ideal
21:12:27 <oneswig> TheJulia: I think there was some discussion around the cleaning phase being an option for periodic re-inspection?
21:12:52 <TheJulia> oneswig: possibly, the case we have is we want to somehow capture a drive swap
21:12:57 <TheJulia> or the addition of a disk
21:13:26 <oneswig> I see.  Hence wanting to track while an instance is live?
21:13:39 <TheJulia> For which the usefulness is kind of pointless. That being said, having a check-in with inspector during cleaning just seems like some minor-ish logic in inspector
21:13:46 <TheJulia> oneswig: exactly.
21:14:55 <oneswig> The sense I got was that within Ironic many of the pieces for a solution (that met many needs) were already within reach.
21:15:45 <oneswig> Although it sounds like your problem case is still beyond that.
21:16:11 <oneswig> Arne was looking for a shortlist of systems people use for DCIM.
21:16:36 <oneswig> So far I recall the list included Xcat, Netbox, Infoblox, possibly OpenDCIM
21:16:55 <oneswig> rbudden: do you know what you're using at GSFC?
21:17:07 <rbudden> we use xcat
21:17:31 <oneswig> Arne's list was actually longer... i-doit, CMDBuild, SnipeIT, SysAid, Spiceworks, GLPI, NetBox, xcat
21:18:27 <oneswig> thanks rbudden, I'll make sure that's fed back.  How about you elox?
21:18:31 <trandles> Hi folks. Sorry, was logged in and STILL late to the meeting.
21:18:50 <oneswig> Hi trandles!
21:19:38 <oneswig> trandles: Just talking solutions for DCIM
21:19:46 <rbudden> ironically at GSFC xcat is used for baremetal provisioning and some lightweight VMs that aren’t openstack controlled, but there is interest in entertaining Ironic for future HPC clusters
21:20:04 <trandles> DCIM is a hot topic
21:20:11 <elox> We are using MaaS
21:20:22 <oneswig> trandles: the circles in which you move... :-)
21:20:39 <oneswig> thanks elox, good to know
21:21:13 <trandles> I'd like to hear a lot about others' experiences. We don't have a comprehensive DCIM right now but we might be pushed in that direction. Would be beneficial to have a head start when it does come to the fore.
21:21:58 <oneswig> trandles: from the data gathered so far, it's a pretty diverse set of tools in use.
21:22:11 <elox> trandles: We are really keen on sharing our experiences.
21:22:15 <jmlowe> Hey, sorry I'm late
21:22:24 <oneswig> Hi jmlowe, no problem :-)
21:22:55 <oneswig> Currently surveying DCIMs in use - care to contribute a data point for IU?
21:23:12 <jmlowe> We have one, I ignore it
21:23:40 <TheJulia> I'm kind of curious if there is a reason why
21:23:44 <oneswig> fair enough :-) probably not a good recommendation
21:24:09 <TheJulia> Which sounds awful, but it gives everyone a data point in terms of whether there is a process or use for DCIM that might not be ideal
21:24:38 <jmlowe> looks like opendcim
21:24:54 <trandles> Our current issues with DCIM probably aren't unique. We have several siloed systems that came from vendors and only really work well with their equipment, where "equipment" is largely power and cooling. They call them "DCIM" but I don't think they know what that means in a larger sense.
21:24:59 <oneswig> jmlowe: Is it bad, or just not relevant to your workflows?
21:25:21 <jmlowe> not relevant, key info goes in when systems come and go, then it's ignored
21:25:40 <TheJulia> So its use is more as asset tracking
21:25:44 <rbudden> jmlowe: that was largely how things were at PSC as well
21:26:02 <jmlowe> typical lifespan of systems is 5-7 years
21:26:03 <rbudden> TheJulia: exactly
21:26:25 <oneswig> I've seen Netbox play a useful role in between those events
21:27:11 <rbudden> I know the problem with lifecycle at PSC was it was originally populated when new systems came in, then fell out of updates for one reason or another (I forget what DCIM was being used)
21:27:20 <oneswig> TheJulia: do you know what the current state of play is with anomaly detection on inspection data?  I've used cardiff from python-hw for this before, but it was hard to set up
21:27:22 <rbudden> so then it wasn’t reliably useful for anything :P
21:28:07 <TheJulia> oneswig: like failed hardware detection?
21:28:11 <oneswig> rbudden: jmlowe: if the DCIM was updated with Ironic inspection data, perhaps it would have more utility
21:28:21 <oneswig> TheJulia: exactly - missing DIMMs, disks, etc.
21:28:22 <rbudden> yes
21:28:48 <TheJulia> oneswig: for data inside introspection, likely the same, as long as "extra-hardware" is set as a collector for introspection.
21:29:27 <oneswig> Thanks TheJulia, good to know.
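For anyone following along, enabling that is roughly a two-part change (a sketch based on the inspector documentation; exact option names and defaults should be checked against your release): the IPA inspection ramdisk must run the extra-hardware collector (which needs the python "hardware" package baked into the image), and ironic-inspector needs the matching processing hook so the collector output gets stored.

```ini
# /etc/ironic-inspector/inspector.conf (illustrative values)
[processing]
# add the extra_hardware hook so the collector output is stored
processing_hooks = $default_processing_hooks,extra_hardware
store_data = swift

# And on the inspection ramdisk's kernel command line:
#   ipa-inspection-collectors=default,extra-hardware
```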
21:29:37 <trandles> oneswig: we use NHC ( https://github.com/mej/nhc ) for that but it doesn't integrate with a larger DCIM. We have custom splunk dashboards that communicate that level of brokenness.
21:30:19 <trandles> ie. missing DIMMs, interconnect links at the wrong speed, CPUs stuck in a bad cstate, etc.
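For reference, the kinds of checks trandles describes map onto one-line rules in NHC's config file; a sketch with placeholder counts, sizes and rates (not LANL's actual configuration):

```
# /etc/nhc/nhc.conf -- illustrative checks; thresholds are placeholders
# Expected CPU topology: 2 sockets, 16 cores, 32 threads
* || check_hw_cpuinfo 2 16 32
# Physical memory within 5% of the expected 192GB (catches a missing DIMM)
* || check_hw_physmem 192gb 192gb 5%
# InfiniBand port up at the expected rate in Gb/s
* || check_hw_ib 100
# Scratch filesystem mounted read-write
* || check_fs_mount_rw -f /scratch
```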
21:30:20 <oneswig> brokenness as a metric - how cool is that
21:30:43 <oneswig> good link trandles, thanks for that
21:31:05 <rbudden> sorry, have to run, this is all very interesting though, i’ll ask around about anything else in use at GSFC
21:31:18 <oneswig> Thanks rbudden - see you
21:31:20 <rbudden> bye all
21:31:43 <oneswig> elox: What are your experiences with MaaS?
21:31:44 <trandles> it will detect things like wrong firmware levels on mobos, nics, hcas, hdds
21:33:39 <elox> oneswig: So far, it has provided us with a means to work with our hardware as if it were a cloud, and adding in the Juju framework (https://jujucharms.com/) lets us provision a diversity of modern software previously out of our reach.
21:34:49 <oneswig> elox: you're provisioning bare metal software deployments with it?  Neat.
21:34:53 <elox> ... We will likely deploy OpenStack on top of MaaS, which is a legit use case for us. We already leverage AWS, vSphere and plan to use GCE, Azure and perhaps even Oracle.
21:35:23 <b1airo> ok, other meeting done!
21:35:37 <b1airo> busy morning here o_0
21:35:58 <oneswig> elox: A really diverse set of IaaS there.  I think you've got pretty much the full set? :-)
21:36:17 <oneswig> hey b1airo, good to have you
21:36:56 <elox> oneswig: We deploy the full software stack on top of MaaS: SLURM, OFED, applications, middleware, etc. We add things in as needed to meet a diverse set of use cases: AI, big data, etc.
21:37:53 <oneswig> Sounds good.
21:38:34 <oneswig> OK - we should move on with the agenda I guess.  Any more to add before we do?
21:38:44 <elox> oneswig: Yes, it's diverse. We need to meet different requirements from projects limited in time. For example, in month #1 we might need the HPC cluster(s) to be tailored towards deep learning; in month #2, data analytics or a mix. We then rebuild the cluster explicitly for those workloads via Juju.
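A hypothetical sketch of that rebuild-per-workload pattern using stock Juju commands against a MaaS cloud; the cloud, model and bundle names are placeholders rather than elox's actual setup:

```
# One-time: bootstrap a Juju controller onto the MaaS-managed metal
juju bootstrap my-maas hpc-controller

# Month 1: a SLURM cluster tailored for deep learning
juju add-model dl-cluster
juju deploy ./slurm-deeplearning-bundle.yaml

# Month 2: tear it down and redeploy the same metal for data analytics
juju destroy-model dl-cluster
juju add-model analytics-cluster
juju deploy ./slurm-analytics-bundle.yaml
```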
21:38:51 <b1airo> catching up on DCIM discussion... Google Spreadsheet
21:38:57 * b1airo ducks
21:39:26 <oneswig> b1airo: Very true, certainly we've deployed from them
21:40:19 <trandles> I would be remiss if I didn't link this here:  https://github.com/hpc/kraken/
21:40:22 <trandles> it's a LANL thing
21:40:53 <oneswig> Don't see Juju very often but I've heard when it works it's pretty smooth.  Conversely when it doesn't work...
21:41:07 <trandles> not yet in production, very much in development, but looking for more interested use cases
21:42:01 <oneswig> trandles: sounds like a beast alright.  Keep us updated :-)
21:42:20 <elox> oneswig: I think we are pretty much first out here, but it's living up to our expectations and then some.
21:42:24 <b1airo> just got to the "or will when fully implemented" parentheses :-)
21:43:57 <elox> oneswig: We have developed a SLURM bundle that deploys SLURM to "any" cloud, and we are in the process of tailoring it to be as close to a reference implementation as we can get for HPC with SLURM.
21:44:10 <trandles> b1airo: haha, indeed...the local dev has taken some heat for that
21:44:30 <oneswig> elox: good to hear it.  Is that open source?
21:44:47 <elox> All open source.
21:45:08 <elox> oneswig: https://jujucharms.com/slurm-core/bundle/0
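Assuming that bundle name resolves in the charm store, trying it should be roughly a one-liner (otherwise download the bundle.yaml from the page above and point juju deploy at the local file):

```
juju deploy slurm-core
```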
21:45:17 <oneswig> Thanks elox - share and enjoy :-)
21:45:54 <oneswig> OK, let's move on...
21:45:58 <oneswig> #topic Denver - SIG "help most wanted" forum
21:46:19 <elox> oneswig: I was hoping to. It feels a bit risky being out on this end, so it's really in our self-interest both to get eyes on what we are doing and to help others that might want to give it a spin.
21:47:31 <oneswig> I'm aware of one or two others in the Juju ecosystem but given it's a small world you probably know them already too
21:48:05 <oneswig> Last airing for this one: Rico's looking for input on cross-SIG needs to advocate for
21:48:44 <oneswig> #link Scientific SIG wish-list taking shape here https://etherpad.openstack.org/p/scientific-sig-denver2019-gaps
21:48:57 <oneswig> Please contribute if you have pain points
21:49:37 <oneswig> There is a forum session in Denver where the hope is the SIGs will gather and find common needs
21:50:00 <oneswig> #link Forum session in Denver https://www.openstack.org/summit/denver-2019/summit-schedule/events/23612/help-most-needed-for-sigs-and-wgs
21:50:33 <oneswig> OK, that's enough on that I think
21:50:45 <oneswig> #topic Denver SIG meeting and BoF
21:51:39 <oneswig> I noticed the room we are in (507) is not looking particularly big on the map...
21:52:07 <oneswig> #link map of the convention center https://object-storage-ca-ymq-1.vexxhost.net/swift/v1/6e4619c416ff4bd19e1c087f27a43eea/www-assets-prod/summits/26/locations/428/floors/OSDEN-Map-FINAL-sm2.png
21:52:56 <oneswig> I have a photo from Berlin which I might share with the organisers to see if we can upgrade.
21:53:30 <oneswig> #link lightning talks wanted - sign up here https://etherpad.openstack.org/p/scientific-sig-denver19-bof
21:53:57 <oneswig> I'm hoping Arkady will offer a prize as usual!
21:54:18 <oneswig> Finally - evening beer social
21:54:34 <elox> \o/
21:54:34 <oneswig> Last summit (Berlin) we coincided with the Ironic gang, which was great
21:55:13 <oneswig> I saw this go by on the Ops thread and wondered if we should tag along with them on the Monday night - http://lists.openstack.org/pipermail/openstack-discuss/2019-April/005096.html
21:55:21 <b1airo> good idea re room upgrade oneswig
21:56:14 <trandles> could be a repeat of Barca when we jammed into that tiny room
21:57:43 <b1airo> also a reminder we're still looking for lightning talks: https://etherpad.openstack.org/p/scientific-sig-denver19-bof
21:58:31 <oneswig> Thanks b1airo
21:58:47 <oneswig> OK we are nearly at time...
21:58:52 <oneswig> AOB items?
21:59:05 <b1airo> oneswig: i'm onboard for the operators drinks
21:59:30 <oneswig> Sounds good to me too (jetlag permitting) - you'll be raring to go of course... :-)
22:00:11 <trandles> I'll be arriving Sunday around noon if anyone wants to get an early start ;)
22:00:17 <b1airo> i arrive (after ~24 hours in-transit) about 11pm on Sunday, so i should be sparkling...
22:00:39 <oneswig> I'm due in Sunday arvo some time.  Also fried.
22:00:46 <b1airo> but i'll be around until following Sunday before heading on to Montreal for CUG
22:00:49 <oneswig> OK we are at the hour
22:00:51 <jmlowe> I'm in Sunday as well
22:00:54 <elox> thanx for putting the meeting together.
22:00:56 <jmlowe> and like to drink
22:01:00 <b1airo> lol
22:01:03 <oneswig> On that happy note!
22:01:06 <trandles> b1airo: if you're interested some of us are going to a baseball game Friday and Saturday
22:01:10 <oneswig> Thanks all, until next time
22:01:20 <trandles> o/ see you all in Denver
22:01:23 <oneswig> #endmeeting