21:00:22 #startmeeting scientific-sig
21:00:23 Meeting started Tue Mar 6 21:00:22 2018 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:25 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:27 The meeting name has been set to 'scientific_sig'
21:00:32 ahoy there
21:00:48 #link agenda for today https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_March_6th_2018
21:02:04 #topic SIG roundup from the PTG
21:02:17 hello
21:02:27 Hi Bob
21:02:43 How's Bridges?
21:03:05 Doing good
21:03:21 Keeping us all busy!
21:03:30 We had some discussion around Ironic deploy steps, ramdisk boot and kexec - I think you were interested in this?
21:03:37 indeed
21:04:05 Seems like the deploy steps concept is right for you then :-)
21:04:20 Morning oneswig
21:04:25 It was lucky, we were scheduled at a quiet time when not much was going on
21:04:27 Hey Blair
21:04:31 #chair b1airo
21:04:32 Current chairs: b1airo oneswig
21:04:49 I'm wrangling the kids to school so one eye on this
21:04:52 as a result, we had good attendance by a number of key people
21:05:06 awesome
21:05:11 i’m checking out the etherpad now
21:05:15 b1airo: I got my kids to school hours ago...
21:06:08 Julia Kreger seemed particularly comfortable with the idea of supporting ramdisk boot as a proven technique
21:06:16 Bugger, I must have overslept!
21:07:15 That was at the tail end of a couple of hours of discussion though. We had an in-depth update on preemptible instances from ttsiouts at CERN
21:08:09 extremely comfortable given what I've read where it has been done in various deployments
21:08:14 Hello all
21:08:18 Hi TheJulia!
21:08:21 thanks for joining
21:08:23 Hi martial
21:08:28 #chair martial
21:08:29 Current chairs: b1airo martial oneswig
21:09:11 I was just recapping the discussion (although running backwards)
21:09:44 rbudden: one action we took was to document more clearly our use cases for non-conventional Ironic deployment steps
21:10:54 sounds good
21:10:55 TheJulia: can you remind me the best way use cases could be made available for the Ironic team?
21:11:31 oneswig: a new use case to support a new thing, or an existing use case that we already support?
21:11:48 New thing - eg ramdisk boot
21:12:11 Storyboard / launchpad?
21:12:12 oneswig: Create a bug tagged with [RFE] in the subject on ironic's launchpad
21:12:38 at least, until we migrate to storyboard. I have to find a spare network cable before I can run a test migration to storyboard
21:13:00 OK, launchpad will work for now
21:13:56 If I create a bug and sketch out the need, rbudden can you add details specific to what you'd like for Bridges? I'll circulate to Pierre and Tim as well
21:14:05 sure
21:14:18 I think Trandles has the largest use case for ramdisk boot
21:14:40 we could use that as well for our 12TB nodes on Bridges, but we only have a handful of them
21:14:44 Sounds like he's up to something interesting
21:14:53 I think boot from Cinder Vol would fix us up as well
21:15:12 obviously we like kexec to avoid multiple reboots
21:16:03 rbudden: there was some discussion on multi-attach for large scale cinder volume boot, I think it needs some testing at scale (which we may try in a couple of months)
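
A minimal sketch of the kexec approach mentioned above, assuming kexec-tools is installed on the node: load a replacement kernel and jump straight into it, skipping the firmware POST and memory checks that make large-memory nodes so slow to reboot. The kernel, initramfs and command-line values below are hypothetical placeholders.

    # Sketch: switch kernels with kexec instead of a full firmware reboot.
    import subprocess

    KERNEL = "/boot/vmlinuz-deploy"              # hypothetical kernel image
    INITRD = "/boot/initramfs-deploy.img"        # hypothetical initramfs
    CMDLINE = "root=/dev/sda2 ro console=ttyS0"  # hypothetical kernel arguments

    # Stage the target kernel alongside the running one.
    subprocess.run(
        ["kexec", "--load", KERNEL, f"--initrd={INITRD}", f"--append={CMDLINE}"],
        check=True,
    )

    # Execute it immediately; the node switches kernels without going back
    # through firmware initialisation.
    subprocess.run(["kexec", "--exec"], check=True)
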
21:16:14 rbudden: how long to reboot a node with 12 TB RAM?
21:16:53 i don’t have the exact number off the top of my head, but i recall when we PXE booted it from the show floor at SC it was at least 30 min :(
21:17:16 we ended up pulling blades to debug to cut back the boot time since it was a demo ;)
21:17:25 oneswig: if you do, we would love to know the details behind any testing since some systems have architectural limits.
21:17:41 that was years ago though, so i’m unsure if there have been improvements to disable things like the RAM check, etc. at the iLO level
21:18:37 TheJulia: johnthetubaguy and mgoddard are likely to be leading it - I'm sure they'll keep you updated. The rough scale is deploying to ~600 ironic nodes.
21:18:57 Awesome, thanks!
21:19:29 Should be a lot of fun :-)
21:21:39 There was a good deal of interesting discussion on preemptible instances, including how they might interact with reservations in Blazar. I think that was one of the highlights of the session
21:23:03 That discussion gained some user input from the Scientific SIG and went on to a Nova group discussion on the Friday afternoon.
21:23:42 It was a bit difficult to focus by that time given everyone had just had their flights cancelled but I think the Nova team soldiered on.
21:24:37 One of the nuances was on whether to perform the preempting action (ie, killing an instance) upon the final "NoValidHost" event, or to attempt to do it slightly before then based on (eg) 95% utilisation.
21:25:04 I think the CERN team want the former to get maximum utilisation
21:25:49 The latter might feasibly be a role performed by a process like Watcher.
21:27:06 There was also some discussion on a new strategy for resolving quotas across nested project hierarchies.
21:27:08 Would be nice to have that option integrated given how close they are
21:28:08 b1airo: right - seems like it, although having many concurrent actors could make a complex system chaotic. Perhaps one strategy will win out.
21:29:58 Figuring out what 95% is could be a difficult problem in real deployments
21:29:59 The quota issue may be resolved in the long term by moving quota support into a new Oslo library, tasked with managing resource quotas across a subtree of projects
21:31:05 There were some interesting issues raised on how to count resource consumption when (eg) mixing virtualised and bare metal compute, given the custom resource classes of baremetal.
21:31:56 My natural instinct is that they should be separate quotas
21:32:16 b1airo: does it all come back to the placement service in the end? On your previous comment
21:32:43 I suspect it has to
21:36:06 There was also some interesting discussion in the Ironic sessions on complex deploys - multi-partition, RAID, etc.
21:37:18 We also briefly talked about setting BIOS config during deploy steps. This raises a question of how to undo, in cleaning, all that was done in deployment.
21:38:14 i’m not sure if it currently exists, but a way to plug in certain cleaning steps would be nice
21:38:28 specifically for us it would be for puppet cert cleanup before a redeploy
21:38:50 (catching up on the typed text, why were flights canceled?)
21:39:18 martial: it snowed a freakish amount for Ireland.
21:40:33 https://twitter.com/DublinAirport/status/969368265662267393
21:41:44 I got home after ~36 hours, the airport had only just reopened then. This part of Europe isn't geared to handle weather like that, everything shuts down...
21:43:04 No plows lining the runways like in Chicago
21:43:31 They had plows at the airport... but yeah
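
A rough sketch of the preemptible-instance idea discussed above, namely preempting slightly before the final "NoValidHost" rather than waiting for it. The dictionaries mirror the per-resource-class fields placement reports for a resource provider (total, reserved, allocation_ratio); the function name and the 95% threshold are illustrative only, not an existing Nova or Watcher interface.

    # Sketch: decide whether to start preempting opportunistic instances
    # once any resource class on a host passes a utilisation threshold.

    def should_preempt(inventories, usages, threshold=0.95):
        """Return True if any resource class exceeds the utilisation threshold."""
        for rc, inv in inventories.items():
            capacity = (inv["total"] - inv["reserved"]) * inv["allocation_ratio"]
            if capacity <= 0:
                continue
            if usages.get(rc, 0) / capacity >= threshold:
                return True
        return False

    # Example: 64 cores with 4:1 CPU overcommit and 256 GiB RAM.
    inventories = {
        "VCPU": {"total": 64, "reserved": 0, "allocation_ratio": 4.0},
        "MEMORY_MB": {"total": 262144, "reserved": 4096, "allocation_ratio": 1.0},
    }
    usages = {"VCPU": 250, "MEMORY_MB": 230000}

    print(should_preempt(inventories, usages))  # True: VCPU is at ~98% of capacity

As noted at 21:29:58, the hard part in a real deployment is choosing the threshold, not computing it.
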
21:45:05 oneswig: with regard to undoing settings applied, I'm fairly sure ironic may only need to undo the boot node, but I've not had much time to think about it... nor ability to brain after the two-day trek home.
21:45:47 rbudden: I think you can already create custom clean steps, but perhaps you'd need to roll your sleeves up - https://docs.openstack.org/ironic/pike/admin/cleaning.html
21:45:48 oneswig: a distinct use case where we would need to peel things back that is not RAID would be appreciated, if you're aware of one
21:46:28 oneswig: thanks, i’ll check that out. i haven’t played with cleaning much, but always find simple cleanup steps that would be awesome to just automate
21:46:30 TheJulia: aside from RAID our main use cases are hyperthreading and power profile.
21:46:56 I guess hyperthreading is the one you'd notice immediately
21:47:15 I think those could all be done upon next deployment if we get deploy steps sorted with the BIOS interface work
21:48:01 TheJulia: if it could be done in one hit, that would be good - avoiding another reset...
21:48:24 hyperthreading is a good one, we occasionally get requests for this as well
21:49:30 I suspect it would almost be better to always try to assert desired state upfront. The only thing I can really think of is needing special firmware, but that is... yeah.
21:50:01 TheJulia: careful what you wish for! :-)
21:50:43 I'm sure that would make some operators happy
21:51:13 It would mean a comprehensive picture of default settings, to totally define hardware state upon deployment
21:51:54 I think that was all I had on the PTG - TheJulia, was there anything the scientific SIG would really like from the Ironic sessions?
21:52:08 Virtualisation features would be another common toggle
21:54:03 b1airo: agreed.
21:54:58 BTW have you seen this project from Dell - https://github.com/dsp-jetpack/JetPack
21:55:23 The missing piece from python-dracclient (NIC config) is found here.
21:55:31 oneswig: I'm still typing up everything. We did briefly discuss firmware management but there are many different ways we can approach that.
21:56:00 Thanks TheJulia, I'll follow that (probably indirectly via Mark and John)
21:57:19 We are nearly out of time...
21:57:23 #topic AOB
21:57:38 Queens is imminent!
21:57:59 Mark did a test deploy to shake out some things in Kolla and Bifrost
21:59:00 One other announcement - https://github.com/openstack/kayobe - one step closer
21:59:31 I saw mikal praising Bifrost on Twitter :-)
22:00:18 it's the future of deployment! :-)
22:00:26 On that happy note, final comments?
22:00:42 He'll be on to Kayobe next
22:00:47 Our P2302/ORCA meeting is coming soon (March 20-21)... details at federatedcloud.eventbrite.com
22:00:56 Won't we all b1airo :-)
22:00:57 (final comments shameless plug ;) )
22:01:07 thanks martial
22:01:13 good reminder!
22:01:19 OK, we are out of time
22:01:21 #endmeeting
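
Following up on the custom cleaning discussion above (21:38:28 and 21:45:47): one way to "roll your sleeves up" is a custom ironic-python-agent hardware manager that advertises an extra clean step. The class, step name and cleanup logic below are hypothetical; the structure (evaluate_hardware_support, get_clean_steps and the step dictionary fields) follows the documented hardware manager interface.

    # Sketch: an ironic-python-agent hardware manager exposing one custom
    # clean step, e.g. for the puppet certificate cleanup mentioned above.
    import logging

    from ironic_python_agent import hardware

    LOG = logging.getLogger(__name__)


    class PuppetCertHardwareManager(hardware.HardwareManager):

        HARDWARE_MANAGER_NAME = "PuppetCertHardwareManager"
        HARDWARE_MANAGER_VERSION = "1.0"

        def evaluate_hardware_support(self):
            # Advertise support everywhere so the extra step is always offered.
            return hardware.HardwareSupport.SERVICE_PROVIDER

        def get_clean_steps(self, node, ports):
            return [{
                "step": "remove_stale_puppet_cert",
                "priority": 0,          # 0 = only runs when explicitly requested
                "interface": "deploy",
                "reboot_requested": False,
                "abortable": True,
            }]

        def remove_stale_puppet_cert(self, node, ports):
            # Placeholder: real logic would ask the puppet master (or another
            # cleanup service) to revoke the certificate for this node.
            LOG.info("Would remove puppet certificate for node %s", node["uuid"])

To be picked up, the manager has to be packaged into the deploy ramdisk and registered under the ironic_python_agent.hardware_managers entry point; see the cleaning documentation linked above.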