21:00:17 #startmeeting scientific-sig
21:00:17 Meeting started Tue Sep 29 21:00:17 2020 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:18 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:20 The meeting name has been set to 'scientific_sig'
21:00:25 #chair martial_
21:00:26 Current chairs: martial_ oneswig
21:00:34 the game's afoot :-)
21:00:41 woohoo :)
21:00:48 Nice HPC agenda
21:00:48 o/
21:01:21 for the summit schedule - looks like some good talks. trandles have you recorded yours yet?
21:01:39 Yup, Julia, Jacob and I recorded yesterday. I think Julia uploaded today.
21:01:58 we used Bluejeans
21:02:01 went smoothly
21:02:28 I haven't come across bluejeans in ages
21:03:19 I think our team were recording via Google Meet.
21:04:06 We messed around with Meet initially but switched due to ease of use. We all presented and controlled slides and found switching easier with Bluejeans.
21:04:34 Looking forward to listening to those :)
21:05:52 trandles: is your talk on Bifrost, or something equivalent to it?
21:07:50 Ironic at LANL
21:08:27 we were going to do a proper demo of provisioning a cluster with Ironic + Ansible, but the format and limited time worked against us
21:11:15 Coincidentally we have a project underway not too far from this. Half of it is covered in the Cambridge CSD3 talk (https://www.openstack.org/summit/2020/summit-schedule/events/24616/lessons-learnt-building-cambridge-universitys-csd3-supercomputer-with-openstack) but they've figured out a neat workflow for hardware onboarding involving barcode scanners and Neutron for BMC ports as well.
21:12:47 I'll check that out
21:13:36 I think that talk will also be on how to apply image updates to a 1000-node cluster in a way that's both cloud-native and not disruptive to the service availability.
21:14:53 Quite a lot of trouble went into making that work.
21:15:25 it's a difficult problem
21:15:47 something along the lines of rolling updates, we've dabbled but never dared to adopt it in widespread use
21:18:14 The obvious solutions to it turn out to be naive in one way or another. I think the key was to have Slurm schedule jobs that would trigger the node getting nuked and paved with the new image. Immutability in a rolling update but scheduler-aware.
21:21:03 We have a Slurm prolog that will reboot the node if a sentinel file is found, that's the easy part. It is indeed difficult unless the scheduler is aware of just what's happening. You want to resume rebooted nodes, but you don't want the scheduler to accidentally launch a single job across two different images. Without scheduler help it's just one big race condition. :(
21:22:30 That makes sense.
21:23:04 Morning!
21:23:17 Welcome Blair :)
21:23:21 #chair b1airo
21:23:21 My pick of our bunch is the coral reef cloud talk, which covers some ideas around enabling reservation to guarantee availability
21:23:22 Current chairs: b1airo martial_ oneswig
21:23:29 Hi b1airo, morning!
21:23:56 I'm mildly distracted for next 10 mins - barista duties...
21:24:21 On the subject of upcoming talks, there is one tomorrow from the folks at the Open Infra Labs in Massachusetts
21:24:32 presentation on Project Caerus: improving coordination between compute and storage systems for big data and AI workloads (https://gitlab.com/open-infrastructure-labs/caerus) by Theodoros Gkountouvas
21:24:56 Wednesdays 9:00 AM to 10:00 AM Boston, MA time (1:00 PM - 2:00 PM GMT). Zoom Call: https://bostonu.zoom.us/j/95680241139?pwd=S01qQjg5MFVCNlR4VFNQVW1MZnBQZz09&from=msft
21:25:03 Tim's favourite :-(
21:25:20 LOL
21:27:46 On a different subject, anyone making regular use of the Ansible network modules? Every time I return to this, it seems to be broken in a new way.
21:31:44 ... ansible network modules seem guaranteed to stop a conversation :-)
21:32:08 just like they stop your network?
21:32:41 I haven't looked at them. I can ask some colleagues doing the bulk of our Ansible work.
21:32:58 One of those times where automation is so much more effort to set up than it would ever have saved...
21:33:04 is this host networking? switch management?
21:33:20 interface control? firewall?
21:33:51 Switches. Configuring a new deployment with details of physical connectivity and portchannels.
21:34:17 Ah, I think our networking team does use ansible for switch management.
21:34:35 AFAIK we're pretty much all Arista.
21:34:47 If you want a contact let me know.
21:35:19 including VLANs?
21:35:27 Those are a lot better but alas this is Dell OS 9, which is cranky to automate.
21:36:05 martial_: yes those bits too. In this case it's only the VLANs for control plane (maybe 5-6), the tenants are on VXLAN for this one.
21:36:19 very nice
21:36:39 it must help that everything is the same vendor too
21:37:19 thanks trandles but unless your contact has some old Aristas he/she was about to ditch, I don't think they can help :-)
21:37:59 Looks like our Arista management via Ansible is still being worked out. Only production use of Ansible and networking is to manage some Cumulus white box cluster ToR switches...
21:38:33 He did say that they just had a demo with Arista/Ansible a few weeks back. Maybe hit up your Arista contacts?
21:40:14 Can't change the facts on the ground, alas, but thanks anyway.
21:42:59 martial_: can we plan a session including Rion in two weeks to talk min.io?
21:43:16 let me ask him right now
21:45:06 possibly AFK
21:45:12 will follow up on our slack
21:45:49 sounds good.
21:45:52 Thanks martial_
21:48:07 #topic AOB
21:48:30 topic implies there was purposeful discussion of business to this point. Apologies for that :-)
21:49:15 I see the SIG has some forum time reserved?
21:49:47 or is that unconfirmed?
21:49:48 Yes I saw a mail about that. Haven't checked when, do you have details?
21:50:19 I don't.
I saw a webpage via link in something to openstack-discuss...if that isn't vague enough... :/
21:51:44 Just had a look but didn't turn anything up on the schedule.
21:53:36 https://www.openstack.org/summit/2020/summit-schedule/global-search?t=forum
21:54:05 gah, that doesn't look like it
21:54:05 doesn't look like it does it trandles
21:54:39 I know I saw that spreadsheet-style page with the different rooms named for releases somewhere
21:54:42 something I said? Everyone appears to be leaving... IRC at play presumably
21:55:08 trandles: I think that was separate, the PTG.
21:55:48 maybe I'm confused like normal
21:56:27 #link PTG registration - 26th October https://www.eventbrite.com/e/project-teams-gathering-october-2020-tickets-116136313841
21:56:35 That looks like it.
21:57:09 I see the calendar at http://ptg.openstack.org/ptg.html
21:57:30 and we have our etherpad already https://etherpad.opendev.org/p/wallaby-ptg-scientific-sig
21:57:35 Yeah loading for me now
21:57:38 but I do not see us on the schedule
21:58:04 We are on Monday
21:58:20 Monday
21:58:36 Thanks ... Monday, Cactus
21:59:01 Beer after... 🥺
21:59:28 likewise :-)
21:59:40 oneswig: can I catch you on Slack in 5?
21:59:52 Sure, let's do that.
22:00:09 final call, time's up
22:00:18 bye all
22:00:20 any more to add?
22:00:26 thanks all
22:00:30 #endmeeting