11:00:19 #startmeeting scientific-sig
11:00:20 Meeting started Wed Mar 10 11:00:19 2021 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
11:00:21 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
11:00:23 The meeting name has been set to 'scientific_sig'
11:00:49 Hi all
11:01:00 #link Agenda for today https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_March_10th_2021
11:02:19 We have a discussion on Jupyter as today's main event
11:02:27 Hi
11:03:03 Hi eliaswimmer, thanks for coming along
11:03:08 I have prepared a few slides about my lessons learned using JupyterHub for lectures
11:03:14 https://docs.google.com/presentation/d/1Il7nCNTaKCla0AqbQJnrmPpp_TUhg_-E8IqgYA6q4gk/edit?usp=sharing
11:03:37 I need to request access, can you make them world-readable?
11:03:56 seconded
11:04:07 Hi b1airo, evening
11:04:11 #chair b1airo
11:04:12 Current chairs: b1airo oneswig
11:04:19 howdy oneswig
11:04:28 * b1airo yawns
11:05:47 oneswig: is it working now?
11:05:58 Yes, thanks, that works
11:06:14 I am in
11:09:59 eliaswimmer: how much additional work does Zero to JupyterHub require to make into a production service?
11:10:46 That really depends on your requirements
11:10:59 but it works really well out of the box
11:11:15 a lot of effort went into it
11:11:45 sometimes it is a bit hard to follow the fast release updates
11:12:46 I spent most of my time creating images and getting user creation right
11:13:14 What were the complexities in docker container image creation?
11:13:19 and of course on the underlying Kubernetes setup
11:13:28 I'm not really here, but Magic Castle includes JupyterHub.
11:13:36 does Z2JH already handle idle notebook cleanup?
11:14:04 https://github.com/ComputeCanada/magic_castle
11:14:05 b1airo: yes, that works quite well
11:15:23 oneswig: mainly the diverse needs of our users, that's where all the features are set up
11:15:53 sorry for being late. we have a group which runs JHub notebooks on k8s on Rancher on OpenStack (on turtles...). Their main difficulty was scheduling pods, not knowing if an arriving user was about to run something big or something small, and overcommitting resources.
11:16:05 what does your front-end proxy setup look like, any load or scaling issues with so many clients?
11:18:07 nothing special in our front end as far as I know, standard k8s ingresses
11:19:07 eliaswimmer: it sounds like your labs scale up and down a lot. What's the highest scale the deployment has reached in terms of users online? Is the Kubernetes auto-scaling working well?
11:19:40 I have to admit, we just throw a lot of hardware at it, so it was never 100% utilized
11:19:59 I apologise for my lateness! Where I work we run Jupyter notebooks in two ways - we have JupyterHub running on Rancher Kubernetes which we manage for our users. We also run a system we have called "Cluster-as-a-Service" which can dynamically deploy Pangeo-based JupyterHub instances on our OpenStack cloud.
11:20:17 oneswig: autoscaling hasn't been needed or tested so far
11:20:38 ah ok
11:21:35 I set limits for each lecture separately, which is easier for lectures where you know your requirements
11:22:52 Does anyone use GPUs already in their setups?
11:22:57 eliaswimmer: how is the storage interface working?
11:23:47 oneswig: right now I use CephFS via the CSI driver directly for homes and shares
11:23:50 We have used GPUs with K8S in a couple of deploys
11:24:56 How is the utilization?
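
[Note: the idle notebook cleanup b1airo asks about is provided in Zero to JupyterHub by the jupyterhub-idle-culler service (enabled through the chart's cull settings). A minimal sketch of the equivalent jupyterhub_config.py, following the pattern in the jupyterhub-idle-culler documentation; the one-hour timeout is an illustrative assumption:]

    import sys

    # Run jupyterhub-idle-culler as a managed JupyterHub service that
    # shuts down single-user servers after a period of inactivity.
    c.JupyterHub.services = [
        {
            "name": "jupyterhub-idle-culler-service",
            "command": [
                sys.executable,
                "-m", "jupyterhub_idle_culler",
                "--timeout=3600",  # illustrative: cull after 1 hour idle
            ],
        }
    ]

    # Grant the service the permissions it needs to inspect activity
    # and delete servers (required on JupyterHub >= 2.0).
    c.JupyterHub.load_roles = [
        {
            "name": "jupyterhub-idle-culler-role",
            "services": ["jupyterhub-idle-culler-service"],
            "scopes": [
                "list:users",
                "read:users:activity",
                "read:servers",
                "delete:servers",
            ],
        }
    ]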
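
[Note: the per-lecture limits and CephFS homes eliaswimmer describes map naturally onto KubeSpawner configuration, which Z2JH exposes through its Helm values. A hedged sketch; the profile names, sizes, storage class, and capacity are hypothetical:]

    # jupyterhub_config.py fragment (KubeSpawner).
    # Offer a per-lecture choice of resource limits on the spawn page.
    c.KubeSpawner.profile_list = [
        {
            "display_name": "Intro lecture (small)",
            "default": True,
            "kubespawner_override": {"cpu_limit": 1, "mem_limit": "2G"},
        },
        {
            "display_name": "Data-science lab (large)",
            "kubespawner_override": {"cpu_limit": 4, "mem_limit": "16G"},
        },
    ]

    # Per-user home directories on a CephFS-backed StorageClass via CSI;
    # the spawner creates a PVC per user if one does not exist yet.
    c.KubeSpawner.storage_pvc_ensure = True
    c.KubeSpawner.storage_class = "csi-cephfs-sc"  # hypothetical name
    c.KubeSpawner.storage_capacity = "10Gi"
    c.KubeSpawner.pvc_name_template = "claim-{username}"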
I wonder how to do AI lectures with 150 students - I would need 150 GPUs for them!
11:25:29 That is why I am looking into an additional KubeFlow setup for a GPU cluster
11:26:30 How many people that are running JupyterHub have Dask enabled? We have found that users like having that functionality.
11:26:33 Could split them under k8s - MIG or vGPU
11:27:25 b1airo: for that I would need Tesla-grade GPUs
11:27:35 Yep, our HPC-integrated JHub has Dask built in
11:28:35 Anyone using the SLURM spawner?
11:28:47 That's how the Magic Castle implementation works.
11:28:54 true eliaswimmer, though if they are already in an OpenStack cluster I'm assuming they're in server machines, in a data centre somewhere, so...
11:29:31 yes, we're using SlurmSpawner
11:30:16 https://github.com/cmd-ntrf/slurmformspawner
11:30:38 b1airo: Do you have extra partitions for JupyterHub with shared nodes?
11:32:24 the system here has Dask (not sure how many people use it). not using SLURM (there is LSF elsewhere for those who want it)
11:34:56 Actually our spawner is kind of a mashup, as the Hub machine is not allowed the Slurm keys directly, so once a user authenticates (2FA via a custom PAM authenticator) we create a Kerberos credential that allows them temporary ssh access to a login node, so our version of batchspawner does things via ssh
11:36:27 eliaswimmer: what do you do to provide user data into Jupyter environments?
11:37:13 re partitions, we were just filling up space on GPU nodes to start with, but now have a dedicated interactive partition for Jupyter and other modest jobs. partly because we had some requests for teaching postgrad labs on the environment
11:38:25 oneswig: that's a good question, upload and download capabilities are quite limited, so for lectures with huge data sets I provided the lecturers with an extra share server.
11:39:16 eliaswimmer oneswig We have found that, especially when using Dask, it makes sense to have any large datasets in an object store.
11:39:54 b1airo: we do the same, as our clusters are getting smaller and smaller in terms of nodes
11:41:02 For our managed notebook service, even though the notebook servers are running in Kubernetes we actually mount home directories and shared filesystems, so the environment they see is much like what they see if they SSH to our traditional batch platform.
11:41:09 mpryor: Oh, that sounds interesting, are you using a plugin for Jupyter to provide a view on the object store?
11:42:03 eliaswimmer The community that we operate in (Earth Sciences) has built tools around a technology called Zarr that makes using the object store more or less transparent.
11:42:33 mpryor: opencube?
11:43:38 eliaswimmer I think we have some people using datacube-like technologies. However, the most common software stack seems to be data on an object store, accessed using Dask, XArray and Zarr. Data catalogs are provided by a tool called Intake.
11:43:58 This is basically the Pangeo stack - https://pangeo.io/
11:45:30 The Pangeo community also maintain a data catalog for CMIP6 - https://pangeo.io/catalog.html
11:46:34 mpryor: thank you! Our geo scientists are very interested in our setups, so that is a good starting point for me
11:47:32 eliaswimmer Pangeo is our standard setup that we provide via our Cluster-as-a-Service system.
11:48:17 Mostly it is oceanographers using it at the moment, but we have had interest from other groups including geo-type stuff.
11:48:17 does anyone use GPFS with Manila and Kubernetes?
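
[Note: on the Slurm spawner discussion above - Magic Castle's slurmformspawner builds on batchspawner. A minimal sketch of plain batchspawner configuration, which submits each single-user server as a Slurm job; the partition and resource values are hypothetical:]

    # jupyterhub_config.py fragment - spawn notebooks as Slurm jobs.
    c.JupyterHub.spawner_class = "batchspawner.SlurmSpawner"

    # Illustrative job parameters; batchspawner templates these into
    # the sbatch script it submits on the user's behalf.
    c.SlurmSpawner.req_partition = "interactive"  # shared-node partition
    c.SlurmSpawner.req_nprocs = "2"
    c.SlurmSpawner.req_memory = "4gb"
    c.SlurmSpawner.req_runtime = "8:00:00"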
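
[Note: the Pangeo-style object-store pattern mpryor describes (Dask + XArray + Zarr) usually amounts to a few lines in the notebook. A hedged sketch; the bucket URL and variable name are hypothetical placeholders:]

    import fsspec
    import xarray as xr

    # Open a Zarr store directly from S3-compatible object storage;
    # nothing is downloaded until a computation is actually triggered.
    store = fsspec.get_mapper("s3://example-bucket/dataset.zarr", anon=True)
    ds = xr.open_zarr(store)  # lazy: chunked arrays backed by Dask

    # Dask evaluates this chunk by chunk, in parallel across workers.
    monthly_mean = ds["temperature"].groupby("time.month").mean().compute()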
11:48:18 eliaswimmer: are you deploying all the k8s and JupyterHub environments for users or do they self-service somehow?
11:49:42 oneswig: right now I do everything on my own, but the plan is to eventually have a self-service platform with a service catalog
11:52:05 Anyone using JupyterHubs for teaching?
11:52:48 eliaswimmer A few of our tenants have used the self-service hubs we offer via CaaS for workshops and teaching.
11:53:43 They are able to onboard all their own users, so we often don't find out about it though.
11:53:44 We are planning to improve grading services (nbgrader and ngshare) a lot over the summer, and we will open source our code when ready
11:54:10 mpryor: how do they manage authentication?
11:56:03 Magic Castle was originally created for teaching at ComputeCanada, so I'm pretty sure they use the Jupyter part for that. It sets up IPA for auth.
11:56:20 eliaswimmer One of the other cluster types we offer as part of our Cluster-as-a-Service is a central identity manager, which all other clusters connect to.
11:56:37 I meant to ask if they wanted to contribute to this meeting, but forgot, and it's a bit early for Canada.
11:57:19 I think it's a significant enough use case that a follow-on is warranted
11:57:48 Computational notebooks for teaching... some interesting pedagogical arguments around that I've come across, feels like a lost battle though
11:59:38 eliaswimmer verdurin Our identity manager is FreeIPA + Keycloak. Keycloak is only there in a read-only capacity to provide OpenID Connect. Our Pangeo (JupyterHub) instances authenticate using the LDAP that you get from FreeIPA.
12:00:16 ah, we are out of time
12:00:20 We could have also used OpenID Connect, but OIDC is not yet fully supported by the JupyterHub OAuthenticator.
12:00:27 final comments please
12:00:27 interesting discussion - thanks all
12:00:39 mpryor: Keycloak is a great tool, I use it for our SLURM-based setup; wrote a little extension for our 2FA auth
12:00:49 seems everyone is doing Jupyter these days
12:01:05 I think there is a lot to share, maybe we can set up an etherpad?
12:01:09 Thanks eliaswimmer and all, useful discussion
12:01:25 related, am keen to talk to anyone using OpenOnDemand
12:01:26 Thanks.
12:01:40 eliaswimmer: some follow-up is definitely needed.
12:01:49 OK, have to close the session. Thanks all
12:01:52 #endmeeting
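
[Note: the FreeIPA LDAP authentication mpryor describes is commonly wired up with the ldapauthenticator package. A minimal sketch, assuming a default FreeIPA directory layout; the hostname and base DN are hypothetical:]

    # jupyterhub_config.py fragment - authenticate against FreeIPA's LDAP.
    c.JupyterHub.authenticator_class = "ldapauthenticator.LDAPAuthenticator"
    c.LDAPAuthenticator.server_address = "ipa.example.org"  # hypothetical host

    # FreeIPA keeps user entries under cn=users,cn=accounts,<base DN>.
    c.LDAPAuthenticator.bind_dn_template = [
        "uid={username},cn=users,cn=accounts,dc=example,dc=org",
    ]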