21:00:17 #startmeeting scientific-sig
21:00:18 Meeting started Tue Apr 16 21:00:17 2019 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:19 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:21 The meeting name has been set to 'scientific_sig'
21:00:30 Greetings SIG
21:00:36 hello
21:00:44 #link Agenda for today https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_April_16th_2019
21:00:48 Hey rbudden, hody
21:00:57 (I mean, howdy)
21:01:00 heh
21:01:30 Martial sends apologies as he has family commitments this week
21:01:36 Hello from Sweden, Erik Lönroth http://eriklonroth.com here.
21:01:38 unfortunately i have to bounce early to pick up my son, but i'll catch up on the logs afterwards
21:01:52 elox: Hi, welcome :-)
21:02:07 thanx!
21:02:15 Thanks for coming along
21:02:41 Hope to learn and contribute.
21:02:53 Great!
21:03:19 Seems like it might be a short meeting this week though - with the run-up to Easter a few folks are out
21:03:50 OK, let's get going...
21:03:58 #topic Ironic and external DCIM
21:04:18 The context here is there was some discussion last week with the CERN team (Arne in particular)
21:04:27 (On phone as discussed)
21:04:51 They've got a big server enrollment imminent and are looking again at how they might streamline that
21:04:57 hey martial, you made it :-)
21:05:06 #chair martial
21:05:07 Current chairs: martial oneswig
21:05:28 #link previous week's discussion http://eavesdrop.openstack.org/meetings/scientific_sig/2019/scientific_sig.2019-04-10-10.59.log.html#l-17
21:06:43 If I can summarise correctly, the conclusion was that they'd like a means to store inspection data in an external infrastructure management DB (eg, Netbox, etc)
21:07:03 And to update periodically if possible to keep the information current
21:07:53 Is there anyone from their team here?
21:08:12 Arne's looking at candidate technologies and wants to consider things that people are actually using
21:08:21 o/
21:08:23 elox: Not as far as I know...
21:08:36 Hi b1airo :-)
21:08:40 #chair b1airo
21:08:40 Current chairs: b1airo martial oneswig
21:08:53 So following up dtantsur's comment, we've also discussed downstream building a reporting agent to provide greater detail, but it would only really be viable in self-hosted environments and there are items that would need to be worked out to make something like that viable.
21:09:19 Hi TheJulia, thanks for joining
21:09:33 What do you mean by self-hosted in this context? Sorry.
21:10:10 an operator that is willing to trust and deploy an agent within their workload
21:10:40 Ah, thanks.
21:11:18 The end idea possibly being to keep ironic's inspection data fairly up to date.
21:11:25 That's not ideal, in many cases people wouldn't want that.
21:12:05 Out of band... while it seems perfect, we're also talking about embedded systems that are easy to put in weird states that cause them to lock up for a little while, which is also not ideal
21:12:27 TheJulia: I think there was some discussion around the cleaning phase being an option for periodic re-inspection?
21:12:52 oneswig: possibly, the case we have is we want to somehow capture a drive swap
21:12:57 or the addition of a disk
21:13:26 I see. Hence wanting to track while an instance is live?
21:13:39 For which the usefulness is kind of pointless. That being said, a check-in with inspector during cleaning just seems like some minor-ish logic in inspector
21:13:46 oneswig: exactly.
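[Editor's note: as a rough illustration of the glue discussed above - pushing Ironic inspection data into an external DCIM such as NetBox and refreshing it periodically - a minimal sketch might look like the following. This is not an existing Ironic or CERN tool; the cloud name, NetBox URL and token, the node/device name-matching convention and the custom field names are all assumptions, and the openstacksdk and pynetbox calls should be checked against the versions in use.]

# Sketch: periodically copy Ironic introspection data into NetBox so the
# DCIM record stays current. Assumes NetBox devices are named after Ironic
# nodes and that custom fields "memory_mb", "cpus" and "disk_count" exist.
import openstack   # openstacksdk
import pynetbox

conn = openstack.connect(cloud="mycloud")            # cloud name is hypothetical
nb = pynetbox.api("https://netbox.example.com",      # NetBox URL/token are hypothetical
                  token="REDACTED")

for node in conn.baremetal.nodes(details=True):
    # Fetch the stored introspection data for this node (inventory, NICs, disks).
    data = conn.baremetal_introspection.get_introspection_data(node.id)
    inventory = data.get("inventory", {})

    device = nb.dcim.devices.get(name=node.name)
    if device is None:
        continue  # no matching DCIM record; local enrolment policy decides what to do

    # Push a few fields NetBox can hold natively, plus custom fields for the rest.
    device.serial = inventory.get("system_vendor", {}).get("serial_number", "")
    device.custom_fields.update({
        "memory_mb": data.get("memory_mb"),
        "cpus": data.get("cpus"),
        "disk_count": len(inventory.get("disks", [])),
    })
    device.save()

[Run from cron or a periodic job; as discussed above, how fresh the data can be depends on when re-inspection actually happens (enrolment, cleaning, or an in-band agent).]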
21:14:55 The sense I got was that within Ironic many of the pieces for a solution (that met many needs) were already within reach.
21:15:45 Although it sounds like your problem case is still beyond that.
21:16:11 Arne was looking for a shortlist of systems people use for DCIM.
21:16:36 So far I recall the list included Xcat, Netbox, Infoblox, possibly OpenDCIM
21:16:55 rbudden: do you know what you're using at GSFC?
21:17:07 we use xcat
21:17:31 Arne's list was actually longer... i-doit, CMDBuild, SnipeIT, SysAid, Spiceworks, GLPI, NetBox, xcat
21:18:27 thanks rbudden, I'll make sure that's fed back. How about you elox?
21:18:31 Hi folks. Sorry, was logged in and STILL late to the meeting.
21:18:50 Hi trandles!
21:19:38 trandles: Just talking solutions for DCIM
21:19:46 ironically at GSFC xcat is used for baremetal provisioning and some lightweight VMs that aren't openstack controlled, but there is interest in entertaining Ironic for future HPC clusters
21:20:04 DCIM is a hot topic
21:20:11 We are using MaaS
21:20:22 trandles: the circles in which you move... :-)
21:20:39 thanks elox, good to know
21:21:13 I'd like to hear a lot about others' experiences. We don't have a comprehensive DCIM right now but we might be pushed in that direction. Would be beneficial to have a head start when it does come to the fore.
21:21:58 trandles: from the data gathered so far, it's a pretty diverse set of tools in use.
21:22:11 trandles: We are really keen on sharing our experiences.
21:22:15 Hey, sorry I'm late
21:22:24 Hi jmlowe, no problem :-)
21:22:55 Currently surveying DCIMs in use - care to contribute a data point for IU?
21:23:12 We have one, I ignore it
21:23:40 I'm kind of curious if there is a reason why
21:23:44 fair enough :-) probably not a good recommendation
21:24:09 Which sounds awful, but it gives everyone a data point in terms of whether there is a process or use for DCIM that might not be ideal
21:24:38 looks like opendcim
21:24:54 Our current issues with DCIM probably aren't unique. We have several siloed systems that came from vendors and only really work well with their equipment, where "equipment" is largely power and cooling. They call them "DCIM" but I don't think they know what that means in a larger sense.
21:24:59 jmlowe: Is it bad, or just not relevant to your workflows?
21:25:21 not relevant, key info in when systems come and go, then ignore
21:25:40 So its use is more as asset tracking
21:25:44 jmlowe: that was largely how things were at PSC as well
21:26:02 typical lifespan of systems is 5-7 years
21:26:03 TheJulia: exactly
21:26:25 I've seen Netbox play a useful role in between those events
21:27:11 I know the problem with lifecycle at PSC was it was originally populated when new systems came in, then fell out of updates for one reason or another (I forget what DCIM was being used)
21:27:20 TheJulia: do you know what the current state of play is with anomaly detection on inspection data? I've used cardiff from python-hardware for this before, but it was hard to set up
21:27:22 so then it wasn't reliably useful for anything :P
21:28:07 oneswig: like failed hardware detection?
21:28:11 rbudden: jmlowe: if the DCIM was updated with Ironic inspection data, perhaps it would have more utility
21:28:21 TheJulia: exactly - missing DIMMs, disks, etc.
21:28:22 yes
21:28:48 oneswig: for inside introspection data, likely the same as long as "extra-hardware" is set as a collector for introspection.
21:29:27 Thanks TheJulia, good to know.
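[Editor's note: cardiff (from the python-hardware project) automates the kind of anomaly detection discussed above. Where only the standard inspector data is to hand, a toy homogeneity check along the same lines could be sketched as below - the dump directory and the choice of fields are assumptions for illustration, not part of any existing tool.]

# Sketch: compare each node's introspection data against the most common
# values in the batch and flag outliers (missing DIMMs, missing disks, etc).
# Assumes one JSON dump per node, as saved from the inspector API.
import json
from collections import Counter
from pathlib import Path

nodes = {}
for path in Path("introspection-dumps").glob("*.json"):   # hypothetical dump directory
    data = json.loads(path.read_text())
    inventory = data.get("inventory", {})
    nodes[path.stem] = {
        "memory_mb": data.get("memory_mb"),
        "cpus": data.get("cpus"),
        "disks": len(inventory.get("disks", [])),
    }

# For a homogeneous batch, take the modal value of each field as the expected one.
expected = {
    field: Counter(n[field] for n in nodes.values()).most_common(1)[0][0]
    for field in ("memory_mb", "cpus", "disks")
}

for name, fields in sorted(nodes.items()):
    diffs = {f: (v, expected[f]) for f, v in fields.items() if v != expected[f]}
    if diffs:
        print(f"{name}: deviates from batch norm -> {diffs}")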
21:29:37 oneswig: we use NHC ( https://github.com/mej/nhc ) for that but it doesn't integrate with a larger DCIM. We have custom splunk dashboards that communicate that level of brokenness.
21:30:19 ie. missing DIMMs, interconnect links at the wrong speed, CPUs stuck in a bad cstate, etc.
21:30:20 brokenness as a metric - how cool is that
21:30:43 good link trandles, thanks for that
21:31:05 sorry, have to run, this is all very interesting though, i'll ask around about anything else in use at GSFC
21:31:18 Thanks rbudden - see you
21:31:20 bye all
21:31:43 elox: What are your experiences with MaaS?
21:31:44 it will detect things like wrong firmware levels on mobos, nics, hcas, hdds
21:33:39 oneswig: So far, it has provided us with a means to work with our hardware as if it was a cloud, and adding in the Juju framework (https://jujucharms.com/) lets us provision a diversity of modern software previously out of our reach.
21:34:49 elox: you're provisioning bare metal software deployments with it? Neat.
21:34:53 ... We will likely deploy an OpenStack on top of MaaS, which is a legit use-case for us. We already leverage AWS, vSphere and plan to use GCE, Azure and perhaps even Oracle.
21:35:23 ok, other meeting done!
21:35:37 busy morning here o_0
21:35:58 elox: A really diverse set of IaaS there. I think you've got pretty much the full set? :-)
21:36:17 hey b1airo, good to have you
21:36:56 oneswig: We deploy the full software stack on top of MaaS: SLURM, OFED, applications, middleware etc. We add in things as we need to meet a diverse set of use-cases. AI/BigData etc.
21:37:53 Sounds good.
21:38:34 OK - we should move on with the agenda I guess. Any more to add before we do?
21:38:44 oneswig: Yes, it's diverse. We need to meet different requirements from projects limited in time. For example, in month #1 we might need the HPC cluster(s) to be tailored towards DeepLearning, in month #2 Data Analytics or a mix. We then rebuild the cluster explicitly for those workloads via Juju.
21:38:51 catching up on DCIM discussion... Google Spreadsheet
21:38:57 * b1airo ducks
21:39:26 b1airo: Very true, certainly we've deployed from them
21:40:19 I would be remiss if I didn't link this here: https://github.com/hpc/kraken/
21:40:22 it's a LANL thing
21:40:53 Don't see Juju very often but I've heard when it works it's pretty smooth. Conversely when it doesn't work...
21:41:07 not yet in production, very much in development, but looking for more interested use cases
21:42:01 trandles: sounds like a beast alright. Keep us updated :-)
21:42:20 oneswig: I think we are pretty much first out here but it's living up to our expectations and more.
21:42:24 just got to the "or will when fully implemented" parentheses :-)
21:43:57 oneswig: We have developed a SLURM bundle that deploys SLURM to "any" cloud, and we are in the process of tailoring it to be as close to a reference implementation for HPC with SLURM as we can get.
21:44:10 b1airo: haha, indeed... the local dev has taken some heat for that
21:44:30 elox: good to hear it. Is that open source?
21:44:47 All open source.
21:45:08 oneswig: https://jujucharms.com/slurm-core/bundle/0
21:45:17 Thanks elox - share and enjoy :-)
21:45:54 OK, let's move on...
21:45:58 #topic Denver - SIG "help most wanted" forum
21:46:19 oneswig: I was hoping to. It feels a bit risky being out at this end, so it's really in our self-interest to both get eyes on what we are doing and help others that might want to give it a spin.
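[Editor's note: for anyone wanting to give the SLURM bundle linked above a spin, a minimal sketch of trying it from a machine with a bootstrapped Juju controller (e.g. against MAAS) follows. The exact bundle identifier should be taken from the linked page; "slurm-core" and the model name here are assumptions.]

# Sketch: drive the juju CLI to deploy the SLURM bundle into a fresh model.
import subprocess

def juju(*args):
    """Run a juju CLI command and fail loudly if it errors."""
    subprocess.run(["juju", *args], check=True)

juju("add-model", "slurm-test")     # model name is hypothetical
juju("deploy", "slurm-core")        # bundle name taken from the jujucharms.com link above
juju("status", "--format=yaml")     # watch the units come up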
21:47:31 I'm aware of one or two others in the Juju ecosystem but given it's a small world you probably know them already too
21:48:05 Last airing for this one, Rico's looking for input on cross-SIG needs to advocate
21:48:44 #link Scientific SIG wish-list taking shape here https://etherpad.openstack.org/p/scientific-sig-denver2019-gaps
21:48:57 Please contribute if you have pain points
21:49:37 There is a forum session in Denver where the hope is the SIGs will gather and find common needs
21:50:00 #link Forum session in Denver https://www.openstack.org/summit/denver-2019/summit-schedule/events/23612/help-most-needed-for-sigs-and-wgs
21:50:33 OK, that's enough on that I think
21:50:45 #topic Denver SIG meeting and BoF
21:51:39 I noticed the room we are in (507) is not looking particularly big on the map...
21:52:07 #link map of the convention center https://object-storage-ca-ymq-1.vexxhost.net/swift/v1/6e4619c416ff4bd19e1c087f27a43eea/www-assets-prod/summits/26/locations/428/floors/OSDEN-Map-FINAL-sm2.png
21:52:56 I have a photo from Berlin which I might share with the organisers to see if we can upgrade.
21:53:30 #link lightning talks wanted - sign up here https://etherpad.openstack.org/p/scientific-sig-denver19-bof
21:53:57 I'm hoping Arkady will offer a prize as usual!
21:54:18 Finally - evening beer social
21:54:34 \o/
21:54:34 Last summit (Berlin) we coincided with the Ironic gang, which was great
21:55:13 I saw this go by on the Ops thread and wondered if we should tag along with them on the Monday night - http://lists.openstack.org/pipermail/openstack-discuss/2019-April/005096.html
21:55:21 good idea re room upgrade oneswig
21:56:14 could be a repeat of Barca when we jammed into that tiny room
21:57:43 also a reminder we're still looking for lightning talks: https://etherpad.openstack.org/p/scientific-sig-denver19-bof
21:58:31 Thanks b1airo
21:58:47 OK we are nearly at time...
21:58:52 AOB items?
21:59:05 oneswig: i'm onboard for the operators drinks
21:59:30 Sounds good to me too (jetlag permitting) - you'll be raring of course... :-)
22:00:11 I'll be arriving Sunday around noon if anyone wants to get an early start ;)
22:00:17 i arrive (after ~24 hours in transit) about 11pm on Sunday, so i should be sparkling...
22:00:39 I'm due in Sunday arvo some time. Also fried.
22:00:46 but i'll be around until the following Sunday before heading on to Montreal for CUG
22:00:49 OK we are at the hour
22:00:51 I'm in Sunday as well
22:00:54 thanx for putting the meeting together.
22:00:56 and like to drink
22:01:00 lol
22:01:03 On that happy note!
22:01:06 b1airo: if you're interested some of us are going to a baseball game Friday and Saturday
22:01:10 Thanks all, until next time
22:01:20 o/ see you all in Denver
22:01:23 #endmeeting