21:00:39 <martial> #startmeeting Scientific-sig
21:00:40 <openstack> Meeting started Tue Jun 26 21:00:39 2018 UTC and is due to finish in 60 minutes. The chair is martial. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:41 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:43 <openstack> The meeting name has been set to 'scientific_sig'
21:00:54 <martial> Good day everybody
21:01:08 <trandles> hi martial
21:01:26 <m_ebert> Hello Martial
21:01:44 <janders> good evening, good morning :)
21:02:11 <martial> Today we are joined by Marcus Ebert who will share with us "Utilizing Distributed Clouds for Compute and Storage"
21:02:24 <janders> excellent, hello Marcus
21:02:25 <martial> The links to the video and slide deck are as follows
21:02:44 <m_ebert> Hello everyone!
21:02:47 <martial> #link Video https://youtu.be/QAda-ee-9Ko
21:03:02 <martial> #link Slides https://goo.gl/g9EmWg
21:03:17 <martial> now we understand that not everybody has had a chance to review the video
21:03:41 <martial> as such we want to give Mr Ebert a chance to discuss his slide deck and answer questions
21:03:42 <b1airo> o/
21:03:48 <martial> #chair b1airo
21:03:49 <openstack> Current chairs: b1airo martial
21:04:05 <martial> welcome Blair
21:04:19 <b1airo> Thanks martial
21:04:48 <martial> m_ebert: would you like to give us an introduction to the work please?
21:05:00 <m_ebert> Sure
21:05:08 <b1airo> I'm feeding the animals/children breakfast, so sorry if I'm a little slow...
21:05:24 <b1airo> Hi m_ebert
21:05:38 <m_ebert> Hi b1airo
21:07:12 <m_ebert> What I wanted to show in the slides is the system we developed to utilize different clouds, which can be anywhere in the world, by building a system that unifies different clouds and cloud types into a single infrastructure
21:07:58 <m_ebert> It hides the cloud structure from the users, who only see a single "local" batch system to which they submit their jobs
21:08:51 <m_ebert> In addition, since the jobs can now run anywhere and the user has no idea where, we are also working on a data infrastructure that unifies storage space on different endpoints into a single, filesystem-like structure
21:09:31 <m_ebert> an overview of how the compute part works is shown on slide 9
21:10:24 <m_ebert> and slides 19/20 show how we would like the storage part used by the compute to look in the future
21:10:30 <b1airo> m_ebert: presumably this works well for high-throughput workloads. What does the job definition contain, and can the system accommodate jobs added dynamically via API?
21:11:21 <m_ebert> b1airo: do you mean with API submitting directly to condor, or which API do you mean?
21:12:59 <janders> m_ebert: this is excellent work. Regarding storage, is the workflow 1) download 2) process 3) upload results, or is in-place access to data also possible?
21:13:25 <b1airo> Either I suppose. (I haven't seen the slides yet sorry - just woke up)
21:14:55 <m_ebert> janders: the workflow of the 2 experiments is right now download-process-upload, but in-place access is possible. Slide 22, for example, shows how the ROOT analysis framework can open files directly over the network, but any other tool that can stream files should work too when it uses http/webdav
21:16:00 <m_ebert> b1airo: Yes, all jobs that can be submitted to condor will go to VMs. In the job definition, requirements for RAM, CPUs, disk and so on can be defined, and if not then defaults will be used
21:17:18 <janders> m_ebert: great! Slide 22 says mounting the whole data federation is "reasonably fast". Do you know the approximate throughput you're getting? (just an order of magnitude will suffice)
21:17:21 <m_ebert> and if no VMs are available that satisfy these requirements, then cloudscheduler will start a new VM
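A job definition of the kind m_ebert describes is just an HTCondor submit description with explicit resource requests. The sketch below is illustrative only: the executable name, resource values, and the use of the condor_submit command line are assumptions, not details of the cloudscheduler deployment discussed above.

# Minimal sketch of an HTCondor job definition with explicit resource
# requirements (RAM, CPUs, disk). Executable name and values are illustrative.
import subprocess
from pathlib import Path

submit_description = """\
executable      = run_pilot.sh
arguments       = --queue analysis
request_cpus    = 1
request_memory  = 2 GB
request_disk    = 10 GB
output          = pilot.$(Cluster).$(Process).out
error           = pilot.$(Cluster).$(Process).err
log             = pilot.$(Cluster).log
queue 1
"""

Path("pilot.sub").write_text(submit_description)

# Hand the description to the scheduler; if no running VM satisfies the
# requirements, cloudscheduler would boot one that does (as described above).
subprocess.run(["condor_submit", "pilot.sub"], check=True)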
21:17:21 <martial> from your slide deck, this uses HTCondor as the batch server and can talk to OpenStack, OpenNebula, Amazon, Microsoft Azure, and Google Cloud; and I would second the question on storage as I see "DynaFed"?
21:18:31 <m_ebert> janders: getting data from a minio instance running on a VM on the same cloud, we see up to 3Gbps (but we only use files that are some GBs large)
21:20:35 <janders> m_ebert: Thank you. What do you think is the limiting factor in terms of achieving more throughput? storage bandwidth? WAN bandwidth? Applications (eg TCP window tuning)?
21:21:02 <m_ebert> martial: yes, HTCondor is the batch system, and cloudscheduler can use these types of clouds right now and communicate with them. It then uses cloud-init to send the condor config to a VM, which can then communicate with the HTCondor server.
21:23:07 <m_ebert> The limiting factors I see are the storage system itself in the case of minio (data is on a volume mounted on a VM, where minio is just a layer), the cloud's network interface to the outside, and all other ongoing activity on the cloud/hypervisor
21:24:17 <b1airo> Any issues with condor here? I have not used it for a long time but remember it sometimes seemed a bit slow to get started
21:24:20 <m_ebert> using gfalFS on a baremetal system, we get nearly line speed (tested up to 10Gbps interfaces) - but it depends largely on where the endpoints are
21:25:25 <m_ebert> No issues in daily production so far. It only caused problems in the past when we temporarily had a very large number of cores available (>>10,000)
21:26:50 <m_ebert> Well, not cores really, but job slots (we run mostly single-core jobs, so job slots == cores)
21:27:15 <b1airo> Ok, yes that sounds more familiar
21:30:18 <b1airo> What sort of length jobs are you running?
21:32:13 <m_ebert> we run mostly so-called pilot jobs, which are just a wrapper to set up the environment for the real jobs and then pull in the real payload from the experiment's server. pilots can run for up to 2 days (defined in condor), payload jobs usually between a few minutes and up to about 15h
21:32:57 <martial> how much time is spent copying data in those jobs then?
21:34:28 <b1airo> Yeah I was meaning the actual payload jobs. Do you know if condor handles short jobs (tens of seconds down to a few seconds) ok?
21:34:30 <m_ebert> Well, with dynafed it comes mostly from close-by storage so it's just some seconds at the beginning of the job. We run mostly 8-core VMs, so 8 such jobs will run in parallel
21:36:03 <m_ebert> well, in our case, the condor job is only the pilot since that is all it knows about. But we had misconfigured pilots before which terminated within seconds, and hundreds of them ran through before we stopped it - condor was fine with that
21:36:39 <martial> do you have some local vs remote benchmarks to share?
21:36:55 <m_ebert> martial: when we had to go back to the site SE, pulling in data could take up to half an hour or time out, since it allows only a specific number of parallel transfers and sends all others to a waiting queue
21:37:02 <martial> (on similar enough systems) to see how bad the overhead is?
21:37:46 <m_ebert> martial: not right now, but we are working on a benchmarking system. The CHEP conference is in 2 weeks and I should have it ready to present by then. I can send a link around once it's ready
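On the compute side, m_ebert notes above (21:21:02) that cloudscheduler uses cloud-init to push the condor config to each new VM so it can join the HTCondor pool. A minimal sketch of such user-data follows; the central-manager hostname, config path, and the omission of any authentication setup are assumptions for illustration, not the configuration cloudscheduler actually injects.

# Sketch: cloud-init user-data that turns a freshly booted VM into an HTCondor
# execute node pointing at a central manager. Hostname, file path and the
# missing authentication/token setup are illustrative assumptions only.
user_data = """\
#cloud-config
write_files:
  - path: /etc/condor/config.d/99-cloud-worker.conf
    permissions: '0644'
    content: |
      CONDOR_HOST = condor-cm.example.org
      use ROLE : Execute
runcmd:
  - systemctl enable --now condor
"""

# The string would be handed to the cloud at boot time as instance user-data,
# e.g. via the user-data field of the provider's create-server API call.
print(user_data)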
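On the data side, the in-place access m_ebert mentions (slide 22 and the DynaFed discussion) amounts to streaming files over http/webdav instead of staging them to local disk. A rough sketch, assuming a hypothetical DynaFed-style HTTP endpoint and ignoring the authentication a real federation would require:

# Sketch: stream a file over HTTP from a data-federation endpoint instead of
# staging it to local disk first. The URL is hypothetical, and the X.509/token
# authentication a real DynaFed instance would need is omitted for brevity.
import requests

url = "https://dynafed.example.org/data/some-dataset/file.root"

with requests.get(url, stream=True, timeout=60) as resp:
    resp.raise_for_status()
    nbytes = 0
    for chunk in resp.iter_content(chunk_size=1 << 20):  # 1 MiB chunks
        nbytes += len(chunk)  # a real payload job would process the data here
    print(f"streamed {nbytes / 1e9:.2f} GB without keeping a local copy")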
21:38:03 <martial> thanks, we can do a follow-up then
21:38:39 <janders> that would be great
21:38:55 <m_ebert> sure, I'll do that
21:41:18 <martial> any other questions for Mr Ebert?
21:41:47 <martial> seems to me we have reached a natural stopping point in this conversation for the time being
21:41:58 <martial> we invite people to check the video for additional details
21:42:26 <martial> allow me to thank you again for coming to explain this very interesting solution to us
21:42:49 <martial> please follow up with us in a few weeks if you have more to add
21:43:06 <martial> for the time being, please allow me to thank you for accepting to talk to us m_ebert
21:43:19 <m_ebert> Thank you! Also, everyone please feel free to send me questions/comments by email later if any come up while reviewing the slides - also if you have free resources somewhere ;-)
21:43:44 <b1airo> Thanks m_ebert , we'll also share this in next week's meeting which is friendlier to different timezones
21:43:46 <m_ebert> I'll try to join the meetings more often and let you know when the benchmarks are ready
21:43:55 <martial> your email is listed in the slides
21:44:01 <m_ebert> thanks b1airo
21:44:01 <martial> thank you
21:44:21 <martial> and with this, on to our other topic for today
21:44:25 <martial> #topic AOB
21:44:49 <martial> for once I do not have content for AOB :)
21:44:52 <martial> b1airo?
21:44:59 <b1airo> Ha!
21:45:34 <b1airo> Hmmmm, no, not off the top of my head. But I am a little slow off the mark this morning
21:45:40 <janders> the submission deadline for Berlin is approaching.. I wonder what talks you guys would like to hear?
21:47:06 <verdurin> janders: the talk where someone solves all my controlled-data problems?
21:47:32 <janders> verdurin: :)
21:47:38 <b1airo> Yeah that would be pretty wonderful
21:48:00 <b1airo> Even just a talk that tells me what they all are
21:49:26 <b1airo> I'd be interested in hearing about how people do general purpose "managed" cloud, i.e., long-lived instances with patching etc
21:50:16 <martial> that is a tough one indeed
21:50:55 <janders> b1airo: on your RoCE-enabled system, do you install MOFED in the images or are you running the upstream mlnx stack?
21:51:14 <b1airo> Yeah we use MOFED
21:51:52 <janders> do you embed custom repo config in images and manage kernel-dependent packages?
21:51:56 <janders> or do you leave that to the users?
21:51:58 <b1airo> Though it gets installed post launch via Ansible
21:52:22 <janders> do you keep your own MOFED repos?
21:52:36 <janders> (or use mlnx iso/tgz ?)
21:52:57 <b1airo> The HPC crew who are the primary users of the RoCE-enabled stuff do have their own repo for managing updates consistently
21:53:12 <b1airo> They use the ISO I think
21:53:29 <janders> ok!
21:54:07 <janders> I maintain MOFED repos and put excludes in the yum config - that's mostly for the infra, but I'm considering doing something similar for the instances
21:54:23 <b1airo> I think it's mostly ok now. Though I sometimes hear them grumbling about needing to rebuild modules
21:54:38 <janders> sounds like mlnx_add_kernel_support.sh :)
21:54:53 <janders> mostly works, occasionally causes frustration..
21:55:24 <b1airo> Yeah. Dkms on Ubuntu seems to be fine
21:55:26 <janders> it's an interesting lifecycle-related challenge, quite a boutique one though
21:55:59 <janders> what's your motivation for using MOFED as opposed to upstream? better performance? supportability? good practice? all of the above?
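For the "excludes in the yum config" approach janders mentions at 21:54:07, a small sketch follows. Which packages to pin is deployment-specific; the kernel*/kmod-* patterns and the yum.conf path are illustrative assumptions, not janders' actual setup.

# Sketch: pin kernel-dependent packages in yum so updates do not race ahead of
# a matching MOFED rebuild. Patterns and path are illustrative; adjust them to
# whatever packages your MOFED repo replaces or depends on.
import configparser

YUM_CONF = "/etc/yum.conf"
EXCLUDES = "kernel* kmod-*"  # hypothetical pin list

# interpolation=None so stray '%' characters in existing values are left alone
cfg = configparser.ConfigParser(interpolation=None)
cfg.read(YUM_CONF)
if not cfg.has_section("main"):
    cfg.add_section("main")
cfg.set("main", "exclude", EXCLUDES)

with open(YUM_CONF, "w") as f:
    cfg.write(f)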
21:56:00 <b1airo> It's a bugbear across the research cloud
21:56:16 <martial> we have only a few short minutes left at this point
21:57:16 <b1airo> Lots of fairly green users with root access on public machines. Would be nice to give them guardrails and help protect their data
21:57:58 <janders> b1airo: +1! :)
21:59:12 <martial> and on those words, I am about to end our meeting
21:59:27 <martial> thanks everybody for spending this time with us
21:59:29 <b1airo> I see scope for a community project in this, but not something I can start at the moment!
21:59:37 <b1airo> Cheers all
21:59:43 <martial> and to Mr Ebert for presenting
21:59:48 <martial> bye all
21:59:53 <martial> #endmeeting