09:00:23 <oneswig> #startmeeting scientific-wg
09:00:24 <openstack> Meeting started Wed Jun 21 09:00:23 2017 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:00:25 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:00:27 <openstack> The meeting name has been set to 'scientific_wg'
09:00:45 <verdurin> Morning, oneswig, all
09:00:52 <oneswig> Good morning...
09:01:26 <oneswig> Blair mentioned a dinner appointment, is hoping to be along
09:01:36 <oneswig> #link agenda for today https://wiki.openstack.org/wiki/Scientific_working_group#IRC_Meeting_June_21st_2017
09:01:43 <priteau> Hello
09:01:48 <oneswig> Hi priteau
09:02:04 <priteau> Good morning oneswig
09:02:10 <priteau> Hello verdurin
09:02:15 <verdurin> Hi priteau
09:02:32 <b1airo> Hi gang
09:02:35 <oneswig> #topic Optimal meeting time
09:02:37 <oneswig> Hi b1airo
09:02:41 <oneswig> #chair b1airo
09:02:42 <openstack> Current chairs: b1airo oneswig
09:02:48 <oneswig> good evening
09:03:15 <oneswig> I was just about to quote some Shakespeare - "when shall we three meet again", but now we're four :-)
09:03:42 <b1airo> Note that I'm presently walking to a restaurant for a work thing, so will be a bit patchy this meeting I'm afraid
09:04:15 <priteau> Good evening b1airo
09:04:19 <oneswig> On that theme, there was some list discussion on more convenient times to meet for EMEA and APAC.
09:04:19 <verdurin> oneswig: has someone complained about the current time?
09:04:25 <b1airo> Hi priteau
09:04:39 <oneswig> b1airo: proposes 0700 UTC or 1100 UTC
09:04:53 <b1airo> verdurin: mainly me - bedtime clash
09:05:05 <oneswig> 1100 UTC is good for me, 0700 less so...
09:05:56 <priteau> ditto, especially when we're out of DST
09:05:59 <verdurin> 1100 UTC okay for me - 0900 was the problematic one
09:06:35 <oneswig> I think then, we have unanimous approval for 1100 UTC?
09:06:54 <daveholland> (late vote for 1100UTC from me)
09:07:01 <oneswig> #agreed We'll request a move to 1100 UTC (if available)
09:07:09 <b1airo> Great, I'll follow that up later then. Shall we chat about scientific software/apps etc on cloud?
09:07:10 <oneswig> Hi daveholland that's excellent
09:07:25 <daveholland> sorry, previous meeting ran a bit late
09:07:28 <oneswig> #action b1airo to investigate 1100 UTC
09:07:40 <oneswig> #topic Scientific app catalogues on OpenStack
09:07:47 <verdurin> Hi daveholland
09:08:22 <oneswig> OK so this topic came about because there appear to be several groups around the world who want to get started on making app catalogues for their users
09:08:46 <b1airo> So I suggested we have a bit of a chat/survey on this prompted by some work we recently completed in the Nectar cloud to launch a new category of glance images contributed by our community
09:08:51 <verdurin> oneswig: could you define what "app" refers to in this context?
09:08:52 <oneswig> I'm aware of prior work from the NSF gang (Jetstream and Bridges)
09:09:04 <daveholland> are thinking of e.g. an image with Rstudio built/configured?
09:09:07 <daveholland> (we have done that)
09:09:33 <oneswig> verdurin: my definition is, someone using a tool rather than compiling it. Eg, ISV applications, or Spark, or similar.
09:09:47 <oneswig> daveholland: sounds right to me.
09:10:04 <b1airo> I'm interested in looking at this broadly, figuring out everything that this might mean and then looking at recommendations for particular use cases
09:10:48 <verdurin> oneswig: I suppose my point is whether we're covering something like Broad sequencing pipeline, or just the component programs, or both?
09:11:03 <b1airo> A machine image is probably the most base form
09:11:37 <b1airo> verdurin: I would say whatever is useful at the level of the scientist
09:11:38 <oneswig> verdurin: I'm thinking of the unit, a combination of services, allocated together
09:12:43 <daveholland> so we've done various "applications" e.g. an ELK stack; Docker (to make sure the bridge IP range is set to avoid clashing with our internal network range); an openlava cluster; an NFS server. Some of these are aimed at minimising the "sysadmin speed bump" for a developer. Some have CloudForms orchestration on top
09:12:54 <b1airo> The point of distributing "apps" is to make them easy to consume for a particular purpose, so for e.g. genomics a whole pipeline is probably a suitable unit to distribute in some way
09:13:14 <oneswig> The way the compute is typically consumed, I guess, makes it easier for users to consume
09:13:32 <oneswig> daveholland: how have you gone about that?
09:13:37 <priteau> Orchestration templates (Heat or CloudFormation or something else) should be handled as well as disk images
09:15:00 <daveholland> oneswig: we pointed the intern in the right direction and let him go :) Seriously, we're using Packer (so, image definition as code) and building on a base OS image. There's no concept of combining functions (e.g. openlava with docker) which is something I don't know how to address
09:15:04 <oneswig> priteau: on chameleon, do you have a feel for how many people use the appliances as a proportion?
09:15:25 <verdurin> Yes, I think we need the recipes as well, or references to upstream support cf. definitions in elasticluster
09:15:44 <priteau> oneswig: you mean what we call "complex appliances", which are orchestrated templates?
09:16:17 <b1airo> Anyone else using Murano? We have it on Nectar also and have some members contributing new packages quite steadily, but they are mostly typical app stacks rather than science/research specific at the moment
09:16:39 <oneswig> priteau: things like the hadoop appliance
09:17:10 <b1airo> I need to pay attention here now, pls mention me if there's anything to answer
09:17:53 <oneswig> We don't have Murano but are interested generally in it as a solution. One issue IIRC is that it's not supported by Red Hat OSP
09:18:14 <oneswig> Which makes it tricky for quite a lot of deployments out there.
09:18:30 <daveholland> @blairo we're not using Murano mostly because it's not in RHOSP. but it looks potentially interesting
09:18:50 <b1airo> Does RH have a viable alternative?
09:19:03 <priteau> oneswig: I haven't run the numbers recently but IIRC some months ago, most usage was with basic CentOS or Ubuntu images, then images for specialized hardware (GPU, FPGA), then OpenStack deployments
09:19:15 <b1airo> Things like Juju seem like a reasonable agnostic alternative
09:19:19 <oneswig> daveholland: Am I right in thinking that RH will still support OSP if a customer integrates other services - just not the extra non-supported services?
09:19:33 <daveholland> I'm not sure if OpenShift fits that gap (maybe if you push hard)
09:19:54 <priteau> The Hadoop appliance wasn't used much, I assume because it targets a very specific community of people interested by high-performance networking in VMs
09:19:56 <oneswig> b1airo: I think RH would like people to use OpenShift or CloudForms perhaps?
09:20:15 <oneswig> ok thanks priteau
09:20:15 <daveholland> @oneswig yes, we have done various customizations (e.g. ml2 drivers) and still get support, we haven't added other OpenStack services though
09:20:21 <verdurin> daveholland: Openshift leans more towards developers
09:20:54 <verdurin> RH may have Murano as a Tech. Preview soon, if they don't already
09:21:11 <priteau> oneswig: but we are a special testbed with a lot of users building their own systems for research. I expect domain scientists would use pre-packaged appliances much more.
09:21:36 <oneswig> From this discussion, I think if somebody was to deploy Murano and make a good fist of it, we've got enough content for another section of the book on this.
09:21:56 <daveholland> Murano isn't tech preview in RHOSP11 FWIW (we're only just about to deploy RHOSP10 as our next iteration...)
09:22:17 <b1airo> Yeah I think Nectar generally would be interested in contributing and collaborating on that
09:22:25 <verdurin> daveholland: okay, once more my optimism is dashed
09:22:32 <daveholland> sorry :) :(
09:22:48 <oneswig> b1airo: jmlowe and rbudden I'm sure would also be interested.
09:22:57 <verdurin> oneswig: we would be happy to contribute
09:23:05 <b1airo> Yes, this conversation actually started with them
09:23:29 <oneswig> verdurin: daveholland: I expect it can be done within TripleO + RDO (gut instinct)
09:23:38 <oneswig> And would then transfer direct to OSP
09:24:01 <oneswig> I can do some research on that.
09:24:08 <verdurin> oneswig: sure - that's how I would test it anyway
09:24:44 <oneswig> b1airo: Is there anything online about Nectar's work with Murano so far?
09:25:52 <oneswig> verdurin: testing Murano on tripleo - that would be great, let's try to work out how
09:26:10 <daveholland> do I understand right that Murano uses heat? Is that to assemble components in an instance deployed from a bare OS image? (sounds like say using CloudForms to inject cloud-init script to install/configure packages)
09:26:53 <b1airo> oneswig: a little, e.g., https://nectar.org.au/cloud-application-catalogue-released/
09:27:08 <oneswig> daveholland: I believe so, yes. Particularly good for managing a multi-node application topology as a single addressable thing
09:27:36 <priteau> daveholland: yes, Murano can store Heat templates: https://docs.openstack.org/developer/murano/appdev-guide/hot_packages.html
09:27:51 <daveholland> OK, and that does address how to combine components/applications in a single instance (assuming they don't tread on each other) thanks
09:28:37 <oneswig> I wonder how portable Murano apps are from one cloud to another. Probably quite easy to include site-specific assumptions.
09:29:48 <oneswig> daveholland: b1airo's suggestion of juju earlier covers very similar ground - but even less likely to be on RH's roadmap
09:30:20 <zz9pzza> If you used something like packer with openstack and aws to generate images that would help to reduce site specific issues.
09:30:59 <b1airo> Though doesn't need to be on RH's roadmap to use it against a RHOSP cloud
09:31:07 <verdurin> Yes, I'd go that way rather than Juju, about which I'm a bit of a sceptic
09:31:13 <oneswig> zz9pzza: agreed, building for two different targets does help remove assumptions
09:33:13 <oneswig> OK, how can we make progress here? I think there's some investigation and reporting back. Like everyone I can't be certain if I'll get the time at this end but I'd like to try it too.
09:33:25 <daveholland> it's a trade-off though: build images and you have (potentially exponential) image sprawl and the size requirements; but an image is quicker to deploy the app than installing it on a bare OS every time
09:34:02 <zz9pzza> I prefer images, as it reduces launch time and they can have functional tests via a ci
09:34:37 <verdurin> oneswig: happy to coordinate with you separately about Murano testing with TripleO
09:34:39 <b1airo> I'll have a think about a way to survey what people are going/seeing today
09:34:55 <b1airo> s/going/doing/
09:34:57 <oneswig> I guess with images you lose the opportunity to easily parameterise, eg scale of resource allocated
09:35:13 <oneswig> verdurin: great, let's do that.
09:35:21 <zz9pzza> You can put some customisation in cloud init
09:35:26 <b1airo> Can still use cloud-init to customise behaviour oneswig
09:35:31 <b1airo> Jinx!
09:35:35 <zz9pzza> :)
09:35:43 <oneswig> zz9pzza: but not things like the number of worker nodes?
09:35:56 <zz9pzza> But that is at a different layer eg heat
09:36:11 <zz9pzza> or auto scaling from the first machine
09:36:15 <oneswig> true. Heat + images would get you a long way
09:36:57 <oneswig> that is what we do for cluster-as-a-service currently, with ansible as the icing on the cake :-)
09:37:24 <b1airo> The biggest issue we see with Heat templates is that the original dev assumes some feature of their local cloud, e.g., tenant networks or floating IPs
09:37:46 <b1airo> Often those things are not necessary dependencies, but it's hard to untangle them
09:37:53 <priteau> My dream would be one unified way of defining customization at image-build time or at deployment-time, and the infrastructure would decide which is best to use
09:37:56 <oneswig> b1airo: My heat templates certainly have that stuff in them.
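The portability concern raised here (Heat templates that quietly assume tenant networks or floating IPs) can be illustrated with a minimal sketch. It builds a HOT template as a plain Python dict, exposing the site-specific choices as parameters with defaults and making the floating IP optional. The function name `build_template` and every default value (image, flavor and network names) are hypothetical placeholders, not taken from any participant's cloud. The output is printed as JSON, on the assumption that a JSON document is acceptable wherever a YAML template is expected; that has not been checked against a real Heat endpoint.

```python
import json


def build_template(use_floating_ip: bool = False) -> dict:
    """Return a HOT template as a plain dict.

    Site-specific choices are exposed as parameters with defaults rather
    than hard-coded in the resources. All default values are hypothetical.
    """
    template = {
        "heat_template_version": "2016-10-14",
        "description": "Single server with site assumptions as parameters",
        "parameters": {
            "image": {"type": "string", "default": "centos7-rstudio"},
            "flavor": {"type": "string", "default": "m1.medium"},
            "network": {"type": "string", "default": "private"},
        },
        "resources": {
            # A separate port lets the floating IP be attached optionally.
            "server_port": {
                "type": "OS::Neutron::Port",
                "properties": {"network": {"get_param": "network"}},
            },
            "server": {
                "type": "OS::Nova::Server",
                "properties": {
                    "image": {"get_param": "image"},
                    "flavor": {"get_param": "flavor"},
                    "networks": [{"port": {"get_resource": "server_port"}}],
                },
            },
        },
    }
    if use_floating_ip:
        # Only clouds that offer floating IPs get these extra entries.
        template["parameters"]["external_network"] = {
            "type": "string",
            "default": "public",
        }
        template["resources"]["floating_ip"] = {
            "type": "OS::Neutron::FloatingIP",
            "properties": {
                "floating_network": {"get_param": "external_network"},
                "port_id": {"get_resource": "server_port"},
            },
        }
    return template


if __name__ == "__main__":
    print(json.dumps(build_template(use_floating_ip=True), indent=2))
```

Generating the template in code is only one way to keep the optional parts separate; a site on a provider network would call `build_template()` with the default and override `network` at stack creation.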
09:38:02 <priteau> But I suppose that's more for a PaaS than for IaaS
09:38:38 <daveholland> heat input parameters (+ defaults) could paper over that
09:38:49 <b1airo> Yes priteau , reminds me of when I was programming against Azure many years back
09:38:59 <oneswig> priteau: In this discussion I see the application catalogue in that layer, or above it
09:42:13 <oneswig> OK, final actions? I can start a review for a new section of the book. When people's research comes back, we can start to gather findings there. b1airo it would be particularly interesting if there's evidence of Nectar's Murano serving science/HPC apps. GPUs, SRIOV - bonus material.
09:43:21 <b1airo> Shall we have a quick chat on security oneswig ? I have one eye on my phone still...
09:43:21 <oneswig> We should also roll over this topic for the Americas
09:43:21 <oneswig> OK, here it comes
09:43:21 <oneswig> #topic Security of research computing instances
09:43:21 <b1airo> Yes agreed re. rollover
09:43:28 <oneswig> Another week, another kernel patch.
09:43:47 <b1airo> Ha!
09:43:50 <daveholland> :/
09:44:00 <oneswig> I'm not sure cloud helps to the extent that it could with these things - so easy to have user machines going rotten
09:44:10 <oneswig> Any good practice out there?
09:44:29 <b1airo> We are looking at ways to integrate FW into our OpenStack in an opt-in manner (it's otherwise a DMZ)
09:44:40 <daveholland> turn on unattended-upgrades (or similar) in base images? maybe controversial for repeatable/reproducible
09:44:50 <zz9pzza> I note that cloudforms can do security scans by taking a snapshot of a machine and run scripts against that image however we haven't tried using it yet.
09:45:00 <verdurin> Should be possible to reuse some of the work people are doing to vet container images/definition files?
09:45:14 <daveholland> or accept that the soft centre is going to remain soft, and check that security groups keep the outside relatively hard?
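The suggestion to check that security groups keep the outside relatively hard could look something like the sketch below. It is an illustration only: it works on a list of plain dicts whose keys follow the field names of Neutron security group rules (`direction`, `protocol`, `port_range_min`, `port_range_max`, `remote_ip_prefix`, `remote_group_id`), makes no API calls, and the helper name `find_exposed_rules` and the allowed-port policy are assumptions. It also assumes that a rule with neither a remote prefix nor a remote group accepts traffic from any source, and it does not treat ICMP specially.

```python
from typing import Dict, List, Optional, Set

# Prefixes that mean "anyone on the internet".
WORLD_PREFIXES = {"0.0.0.0/0", "::/0"}


def find_exposed_rules(
    rules: List[Dict], allowed_ports: Optional[Set[int]] = None
) -> List[Dict]:
    """Return ingress rules open to the world on ports outside allowed_ports."""
    allowed = allowed_ports or set()
    exposed = []
    for rule in rules:
        if rule.get("direction") != "ingress":
            continue
        prefix = rule.get("remote_ip_prefix")
        # Assumption: no prefix and no remote group means any source.
        open_to_world = prefix in WORLD_PREFIXES or (
            prefix is None and rule.get("remote_group_id") is None
        )
        if not open_to_world:
            continue
        low, high = rule.get("port_range_min"), rule.get("port_range_max")
        if low is None or high is None:
            # No port range: every port of the protocol is reachable.
            exposed.append(rule)
        elif any(port not in allowed for port in range(low, high + 1)):
            exposed.append(rule)
    return exposed


if __name__ == "__main__":
    sample = [
        {"direction": "ingress", "protocol": "tcp", "port_range_min": 22,
         "port_range_max": 22, "remote_ip_prefix": "0.0.0.0/0"},
        {"direction": "ingress", "protocol": "tcp", "port_range_min": 3306,
         "port_range_max": 3306, "remote_ip_prefix": "0.0.0.0/0"},
    ]
    for rule in find_exposed_rules(sample, allowed_ports={22}):
        print("exposed:", rule)
```

A check like this only covers the perimeter; it says nothing about the patch level inside the instance, which is the other half of the discussion.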
09:45:28 <priteau> We run a vulnerability scanner against instances, but that only covers issues that can be detected remotely (mostly SSL security problems)
09:45:41 <b1airo> Currently seems like a routed solution where FW is the default gateway on certain provider nets might be the simplest option. So no direct OpenStack Integra
09:45:57 <oneswig> I've heard of the friendly probing service for that - both against infra and instances
09:46:36 <b1airo> verdurin: any links on that?
09:46:47 <oneswig> b1airo: is that an instance serving as firewall & gateway, or FWaaS?
09:47:05 <verdurin> b1airo: e.g. https://github.com/coreos/clair
09:47:16 <daveholland> also the location of the security responsibility boundary. If the OpenStack admin is purely providing hosting (indeed might be contractually forbidden from looking inside instances) that makes a difference
09:47:25 <b1airo> The image catalog work I mentioned earlier includes a static security scan of the image as one of the criteria to pass before we put the image in the "Contributed" category
09:48:02 <oneswig> b1airo: sounds good. Is that on nectar's github?
09:48:22 <b1airo> oneswig: the FW could be in an instance but we already have physical Palo Altos that we can probably integrate like that
09:49:21 <priteau> There was a related talk in Barcelona, still on my watch list though
09:49:24 <priteau> #link https://www.youtube.com/watch?v=vuL7in9CxHY
09:49:47 <oneswig> Is there an equivalent of something like tripwire/electricfence that can run within a VTN, or is that under the firewall remit?
09:49:57 <oneswig> Thanks priteau, looks interesting
09:50:00 <b1airo> oneswig: yes, it's in our CI somewhere
09:50:18 <b1airo> I can take that as an action to share
09:50:25 <zz9pzza> But that kind of thing will not catch new issues that arise.
09:50:28 <oneswig> b1airo: sure that would be helpful
09:51:20 <verdurin> zz9pzza: yes, that was my thought
09:51:35 <zz9pzza> Which cloudforms could in theory
09:51:48 <b1airo> zz9pzza: true, but we require "Contributed" images to be updated on a rolling basis
09:52:10 <oneswig> zz9pzza: I guess the hope is to catch the misuse of a compromised system. The university network watches for this. I guess it's all layers
09:52:13 <zz9pzza> but if I have a system that has been up for a month...
09:53:01 <daveholland> @b1airo do you expire or time-limit instances made from one of those images? can imagine a long-lived instance being vulnerable to a newly-announced bug
09:53:08 <b1airo> Oh sure. We do vulnerability scanning too
09:53:54 <daveholland> JOOI what is the time limit? we are looking to host "pets" which could be around for months
09:54:00 <oneswig> daveholland: indeed. If the managed infrastructure is updated immediately, what equivalent can be done to shore up user cloud instances
09:56:37 <oneswig> Ah, we are short on time. Any more to add here?
09:57:31 <oneswig> I think the different strategies are all valuable, great suggestions thanks everyone
09:57:40 <daveholland> just a quick comment about user education :) this week I found someone who thought he had to enable ssh inbound to make ssh outbound work
09:57:49 <verdurin> Worth talking to CERN people - when I was active in CMS, there was a lot of discussions about these sorts of security issues, e.g. with long lasting instances
09:58:19 <daveholland> (actually it wasn't ssh in particular but a misunderstanding about port number at each end of a TCP connection)
10:00:08 <priteau> oneswig: unrelated to the current topic, but any news about https://review.openstack.org/#/c/459884/
10:00:12 <b1airo> Thanks all. I think all this warrants some ML threads too...
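The user-education point about port numbers at each end of a TCP connection can be shown with a small loopback demonstration. In this sketch a local listening socket stands in for a remote service such as sshd, and the function name `demo_connection_ports` is made up for the illustration. The client never binds to the service port: the OS picks an ephemeral local port for it, so an outbound connection does not require an inbound rule for the service port on the client side (security groups being stateful, the return traffic of an established connection is allowed).

```python
import socket


def demo_connection_ports():
    """Return (server_port, client_local_port, client_port_seen_by_server)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        # Port 0 asks the OS for any free port; this plays the "sshd" role.
        server.bind(("127.0.0.1", 0))
        server.listen(1)
        server_port = server.getsockname()[1]

        # The client only names the destination port; its own local port
        # is an ephemeral one chosen by the OS.
        with socket.create_connection(("127.0.0.1", server_port)) as client:
            conn, peer = server.accept()
            with conn:
                client_local_port = client.getsockname()[1]
                return server_port, client_local_port, peer[1]


if __name__ == "__main__":
    server_port, client_port, seen = demo_connection_ports()
    print("service listens on port", server_port)
    print("client connected from ephemeral port", client_port)
    print("server sees the client at port", seen)
```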
10:00:13 <oneswig> verdurin: good point, and it's a general issue not just for research computing
10:00:47 <verdurin> Bye all
10:00:51 <oneswig> priteau: no, oddly - I'll make a note to chase
10:00:54 <oneswig> thanks everyone, out of time
10:00:55 <zz9pzza> ttfn
10:01:00 <oneswig> #endmeeting