21:00:15 #startmeeting scientific-sig
21:00:16 Meeting started Tue Oct 29 21:00:15 2019 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:17 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:20 The meeting name has been set to 'scientific_sig'
21:00:23 ahoy!
21:00:29 hello
21:00:33 #link agenda for today https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_October_29th_2019
21:00:38 Hi trandles, afternoon
21:01:21 What's new?
21:01:41 Nothing much, just super busy
21:01:52 You preparing for Shanghai?
21:02:13 I have a visa in my passport, at least
21:02:43 Just looking through the schedule at the moment...
21:03:08 I haven't had a chance to see the schedule. Mostly been focusing on SC
21:03:27 Some useful talks about operations at scale. Definitely on my radar.
21:03:55 trandles: are you presenting material at SC?
21:04:23 I'm a co-author on 3 posters (one for CANOPIE) but not presenting
21:04:39 Well, the CANOPIE submission is a poster and a short paper
21:04:39 g'day all!
21:04:46 Hi janders
21:04:48 hi janders
21:05:15 o/
21:05:22 Hi b1airo!
21:05:27 #chair b1airo
21:05:28 Current chairs: b1airo oneswig
21:05:44 How's everyone enjoying the rugby? :-)
21:05:54 that's it, i'm out!
21:06:22 such a horrible game from the ABs
21:06:34 got to make hay when the sun shines, particularly with the All Blacks...
21:06:41 oneswig gets a +1 for that
21:07:05 indeed, they couldn't get their act together after realising England were there to play and play hard
21:07:59 With a 9am kick-off back home, many people lost a weekend after that...
21:08:03 not sure who to barrack for in the final
21:08:27 can't remember - who's in it b1airo?
21:08:32 :-)
21:08:34 lol
21:08:40 relentless
21:08:51 I'm sure we had an agenda somewhere
21:09:35 All Blacks are just helping prepare the nation for the cricket season...
21:10:01 ah, at least you have summer to look forward to
21:10:22 Anyone going to Shanghai from NeSI?
21:11:19 no, not this time round i'm afraid. there was some interest from NIWA but couldn't get the travel sorted/approved
21:11:53 too bad.
21:12:03 I'm interested in the talk from Pawsey
21:13:01 ... which doesn't appear to be in the schedule any more...
21:14:19 oh, what was it about? their not-Nectar Nectar-cloud ... :-)
21:14:44 Using CephFS as a cluster filesystem for MPI workloads, IIRC.
21:15:07 My personal experience was mostly in the "not yet" camp
21:15:43 But we have some good operational experience of it serving home dirs for non-MPI workloads.
21:16:16 looks like it's been removed indeed :(
21:16:20 oh ok, i didn't know they had been attempting anything like that. I know UniMelb have been, with fairly middling results. i would agree with the "not yet" assessment
21:16:20 I'd love to watch it
21:16:48 in fact, i'm kind of skeptical about it ever being a good option for that
21:17:04 yeah, that's why we went GPFS instead (which was running fine till the IB switch died; it's having a break while the RMA is being sorted)
21:17:46 Maybe they had second thoughts about their experiences?
21:17:47 it hasn't been a good month for storage for us
21:17:51 i can see it being useful for all persistent storage applications in HPC, all the way to "project/programme" space, but for true scratch it's just not a good design fit
21:18:19 It's not anything like as fast. We've used it for hybrid cloud.
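
To make the "not yet" verdict above concrete, here is a minimal sketch of the metadata-heavy access pattern that tends to separate scratch duty from home-directory duty: many processes creating and unlinking small files in one shared directory. The /mnt/cephfs mount point, worker count, and file counts are all assumptions; it can be pointed at any shared filesystem for comparison.

```python
#!/usr/bin/env python3
"""Crude shared-filesystem metadata stress test (illustrative only)."""
import multiprocessing
import os
import time

TARGET = "/mnt/cephfs/stress"   # assumed CephFS mount point
FILES_PER_WORKER = 1000
WORKERS = 8

def worker(rank: int) -> float:
    """Create and remove many small files in the shared directory."""
    start = time.monotonic()
    for i in range(FILES_PER_WORKER):
        path = os.path.join(TARGET, f"w{rank}-{i}.dat")
        with open(path, "wb") as f:
            f.write(b"x" * 4096)
        os.unlink(path)
    return time.monotonic() - start

if __name__ == "__main__":
    os.makedirs(TARGET, exist_ok=True)
    with multiprocessing.Pool(WORKERS) as pool:
        times = pool.map(worker, range(WORKERS))
    total_ops = WORKERS * FILES_PER_WORKER * 2  # create + unlink
    print(f"{total_ops} metadata ops in {max(times):.1f}s "
          f"({total_ops / max(times):.0f} ops/s aggregate)")
```

The aggregate ops/s figure gives a rough basis for comparing mounts; home-directory workloads rarely hammer a single directory this way, which is consistent with the "fine for home dirs, not yet for scratch" experience reported above.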
21:19:37 I was quite disappointed yesterday to hear 1) BeeGFS doesn't have any QoS and 2) it's not on the roadmap
21:19:57 I suppose BeeOND can cover that off in a way, but still
21:20:21 janders: what's happened this month?
21:20:39 do you guys have any experience where BeeOND for job X is running on nodes where job Y is running?
21:20:46 oneswig: where do I start...
21:21:08 we had some metadata lockups on BeeGFS not handled by the built-in HA, which caused a lot of damage
21:21:08 No, sorry, sounds like a proper noisy neighbour issue
21:21:40 and last week an HDR switch died in my OpenStack kit
21:21:42 janders: data loss?
21:22:00 not much of that, but a lot of downtime at the least appropriate time
21:22:07 An HDR switch, but that's practically brand new!
21:22:16 yeah, hence the RMAs come out of Israel
21:22:30 the joys of Mellanox support
21:22:35 no local stock
21:23:02 On a related note, these guys didn't get the news: https://www.openstack.org/summit/shanghai-2019/summit-schedule/events/24298/hpc-and-openstack-via-omni-path-architecture
21:23:03 I designed the system with one IB switch as it's just under 70 nodes (and I would struggle to fit it in the budget otherwise)
21:23:22 "future work"...
21:23:28 haha
21:23:48 70 x 100G or 70 x 200G?
21:23:50 but I'll see if I can get more switching there - it would be nice to experiment with IB-multipath-GPFS
21:24:03 a combination
21:24:17 80%-20% HDR100-HDR200
21:24:27 GPFS and some clients are 200
21:25:16 lol! seriously though, how did that talk get in..?
21:25:28 through side avenues? :)
21:25:40 or maybe keyword sponsored?
21:25:44 maybe it replaced the Pawsey one
21:26:26 I'm sure it has nothing to do with affiliation with a Summit Diamond Sponsor
21:27:07 as much as I don't think this is a relevant direction, it would be interesting to have a look at the mechanics of the integration mechanism
21:27:17 Ironic without PXE seems interesting https://www.openstack.org/summit/shanghai-2019/summit-schedule/events/24274/ironically-reeling-in-large-scale-bare-metal-deployment-without-pxe
21:27:30 'cause I asked Intel exactly that a year and a bit back and they were like "uh... oh... we don't think we can do that."
21:27:33 anyone here on the selection panel? my experience is the panel makes the final decision; i have never seen it changed by foundation staff except for logistical reasons
21:27:40 janders: right, and there's a few OPA systems out there.
21:28:10 that PXE-less talk is cool
21:28:27 Julia mentioned some of the underlying bits at the PTG in Denver
21:28:30 b1airo: think you're right. I saw the panel members at one point.
21:28:32 looks like it's coming together
21:29:38 hasn't the Dell Ironic driver always worked without PXE...?
21:30:07 The iDRAC driver and "always worked" in the same sentence, hmmm.
21:30:19 LOL and +1
21:30:41 I used Ironic with Dell kit a fair bit, but it was always the generic driver
21:30:50 We've spent some time with it again recently, still hit issues with stuck jobs
21:30:54 mostly due to the fact that no one I spoke to at Dell was able to explain how to use it
21:31:11 Arkady stepped in to help some time back, but I have moved to Intel kit since
21:31:35 I think at least the doco should be less bad now
21:32:42 I recall there were some useful-looking Ansible playbooks built upon it in the last couple of years
21:37:24 did it get a bit quiet, or did I drop out?
21:38:12 oneswig: pretty sure i saw those playbooks in a blog of yours...
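
Since the iDRAC driver and the tooling around it came up, here is a hedged sketch of enrolling a Dell node under Ironic's idrac hardware type via openstacksdk. The cloud name, node name, BMC address, and credentials are all placeholders; the drac_* keys are the driver_info fields the idrac driver uses.

```python
#!/usr/bin/env python3
"""Sketch: enroll a Dell node with Ironic's idrac hardware type."""
import openstack

conn = openstack.connect(cloud="mycloud")  # assumed clouds.yaml entry

node = conn.baremetal.create_node(
    name="dell-r640-01",                   # hypothetical node name
    driver="idrac",                        # Dell iDRAC hardware type
    driver_info={
        "drac_address": "10.0.0.10",       # placeholder iDRAC address
        "drac_username": "root",
        "drac_password": "calvin",         # placeholder credential
    },
)

# Move the node from 'enroll' towards 'manageable' so Ironic verifies
# it can actually reach the iDRAC with those credentials.
conn.baremetal.set_node_provision_state(node, "manage")
conn.baremetal.wait_for_nodes_provision_state([node], "manageable")
print(f"{node.name} enrolled and manageable")
```

If credential verification fails the node lands in a failed state rather than "manageable", which is roughly where the "stuck jobs" pain mentioned above tends to begin.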
21:40:41 Not sure about that exactly, there were others we never got into using
21:40:56 Don't recall the names, alas
21:41:18 for BM clouds it does kind of feel like we need a better way of managing server imaging... there must be a way to pre-deploy common OS images ready for boot selection. some sort of boot drive mechanism that lets you deploy images and select the desired image in management mode, then in runtime mode it COW-overlays the selected OS image
21:43:57 It does seem quite clunky, agreed.
21:44:58 I recall there was a switch a few years back from a company called Pluribus Networks that was all about pushing compute and data services out to the ToR
21:45:59 fancy that ... looks like they are still at it.
21:47:14 yeah, i seem to recall trandles dreaming up a ToR/rack-oriented architecture like this too
21:47:32 far out
21:47:52 ah yeah, scalability in provisioning
21:48:14 Our work is still in progress
21:49:01 Has your new team member started, trandles?
21:49:15 b1airo, what you're describing in the "BM clouds" comment is kinda like what NERSC did with CHOS
21:49:27 But CHOS was kinda clunky and used chroot
21:49:49 oneswig, negative... but should be by the end of the year
21:51:10 https://github.com/NERSC/chos
21:52:32 in a chroot, can you still mknod a device file for the raw root device and mount it?
21:53:42 asking for a friend :-)
21:54:01 ah yeah, i think i've seen / heard of that before. i guess to do this properly(tm) would require some sort of integration with UEFI boot etc. to prevent a guest OS instance from fiddling with the pre-deployed images
21:54:40 ... or could the CH be CHarliecloud?
21:55:24 I'm much more in favor of something like kexec + pivot_root for rapid re-provisioning
21:56:04 kexec (if a new kernel is required) and then pivot_root to the new image... Ironic has talked about supporting it
21:56:40 I had a phone call with some people a couple of months ago who are interested
21:56:52 In an Ironic context?
21:57:24 yeah
21:57:51 It does violate the abstraction of the guest OS, but perhaps we just want faster boots, man!
21:58:24 yeah, that sounds fine for a trusted environment / set of tenants, but in a true multi-tenant cloud env you'd need a method to protect (or at least verify) the OS image/install is good
21:58:26 trandles: I still haven't tried the ramdisk deploy driver. Been meaning to for ages.
21:58:48 Either let us replace UEFI with a kernel and ramdisk, or give us ways to "reboot" without going back through the firmware, please
21:59:29 oneswig: I had half a day a month ago to play with it some, but managed to mess something up along the way and haven't gone back to it yet :(
21:59:30 ... or give us a little game to play while the lifecycle controller does its stuff in the background?
21:59:48 ah, we are at time.
21:59:50 Any more?
22:00:17 oh, whoops - i meant to leave for a meeting 10 minutes ago!
22:00:27 cya!
22:00:35 OK, nice talking to you again guys. I'll put some of these talking points on the flip chart for the SIG PTG.
22:00:36 "standby" mode for ironic was on the radar for the current Ironic cycle
22:00:39 See you in Denver!
22:00:45 #endmeeting
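
As a footnote to the kexec + pivot_root thread: a minimal sketch, with assumed paths and placeholder kernel arguments, of the fast-"reboot" half of that idea using kexec-tools: stage a kernel with kexec -l, then jump into it with kexec -e, never going back through the firmware. This illustrates the underlying mechanism only, not the prospective Ironic support discussed above, and it will immediately replace the running kernel if executed.

```python
#!/usr/bin/env python3
"""Sketch: kexec-based fast "reboot" into a pre-staged image.

DESTRUCTIVE if run for real: the second kexec call jumps straight
into the staged kernel. All paths and arguments are assumptions.
"""
import subprocess

KERNEL = "/images/new/vmlinuz"        # assumed pre-deployed kernel
INITRD = "/images/new/initramfs.img"  # assumed pre-deployed initramfs
CMDLINE = "root=/dev/sda2 ro quiet"   # placeholder kernel arguments

# Stage the new kernel alongside the running one (kexec-tools -l).
subprocess.run(
    ["kexec", "-l", KERNEL, f"--initrd={INITRD}", f"--append={CMDLINE}"],
    check=True,
)

# Flush filesystems, then execute the staged kernel, skipping UEFI,
# POST, and the lifecycle controller entirely.
subprocess.run(["sync"], check=True)
subprocess.run(["kexec", "-e"], check=True)
```

The appeal for re-provisioning is exactly what was said in the meeting: on big iron the firmware pass often dwarfs the OS boot, so skipping it is where the minutes come back. The multi-tenant caveat above still applies, since nothing here verifies the staged image.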