#openstack-meeting log

09:00:23 <oneswig> #startmeeting scientific-wg
09:00:24 <openstack> Meeting started Wed Jun 21 09:00:23 2017 UTC and is due to finish in 60 minutes.  The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
09:00:25 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
09:00:27 <openstack> The meeting name has been set to 'scientific_wg'
09:00:45 <verdurin> Morning, oneswig, all
09:00:52 <oneswig> Good morning...
09:01:26 <oneswig> Blair mentioned a dinner appointment, is hoping to be along
09:01:36 <oneswig> #link agenda for today https://wiki.openstack.org/wiki/Scientific_working_group#IRC_Meeting_June_21st_2017
09:01:43 <priteau> Hello
09:01:48 <oneswig> Hi priteau
09:02:04 <priteau> Good morning oneswig
09:02:10 <priteau> Hello verdurin
09:02:15 <verdurin> Hi priteau
09:02:32 <b1airo> Hi gang
09:02:35 <oneswig> #topic Optimal meeting time
09:02:37 <oneswig> Hi b1airo
09:02:41 <oneswig> #chair b1airo
09:02:42 <openstack> Current chairs: b1airo oneswig
09:02:48 <oneswig> good evening
09:03:15 <oneswig> I was just about to quote some Shakespeare - "when shall we three meet again", but now we're four :-)
09:03:42 <b1airo> Note that I'm presently walking to a restaurant for a work thing, so will be a bit patchy this meeting I'm afraid
09:04:15 <priteau> Good evening b1airo
09:04:19 <oneswig> On that theme, there was some list discussion on more convenient times to meet for EMEA and APAC.
09:04:19 <verdurin> oneswig: has someone complained about the current time?
09:04:25 <b1airo> Hi priteau
09:04:39 <oneswig> b1airo: proposes 0700 UTC or 1100 UTC
09:04:53 <b1airo> verdurin: mainly me - bedtime clash
09:05:05 <oneswig> 1100 UTC is good for me, 0700 less so...
09:05:56 <priteau> ditto, especially when we're out of DST
09:05:59 <verdurin> 1100 UTC okay for me - 0900 was the problematic one
09:06:35 <oneswig> I think then, we have unanimous approval for 1100 UTC?
09:06:54 <daveholland> (late vote for 1100UTC from me)
09:07:01 <oneswig> #agreed We'll request a move to 1100 UTC (if available)
09:07:09 <b1airo> Great, I'll follow that up later then. Shall we chat about scientific software/apps etc on cloud?
09:07:10 <oneswig> Hi daveholland that's excellent
09:07:25 <daveholland> sorry, previous meeting ran a bit late
09:07:28 <oneswig> #action b1airo to investigate 1100 UTC
09:07:40 <oneswig> #topic Scientific app catalogues on OpenStack
09:07:47 <verdurin> Hi daveholland
09:08:22 <oneswig> OK so this topic came about because there appear to be several groups around the world who want to get started on making app catalogues for their users
09:08:46 <b1airo> So I suggested we have a bit of a chat/survey on this prompted by some work we recently completed in the Nectar cloud to launch a new category of glance images contributed by our community
09:08:51 <verdurin> oneswig: could you define what "app" refers to in this context?
09:08:52 <oneswig> I'm aware of prior work from the NSF gang (Jetstream and Bridges)
09:09:04 <daveholland> are we thinking of e.g. an image with RStudio built/configured?
09:09:07 <daveholland> (we have done that)
09:09:33 <oneswig> verdurin: my definition is, someone using a tool rather than compiling it.  Eg, ISV applications, or Spark, or similar.
09:09:47 <oneswig> daveholland: sounds right to me.
09:10:04 <b1airo> I'm interested in looking at this broadly, figuring out everything that this might mean and then looking at recommendations for particular use cases
09:10:48 <verdurin> oneswig: I suppose my point is whether we're covering something like Broad sequencing pipeline, or just the component programs, or both?
09:11:03 <b1airo> A machine image is probably the most base form
09:11:37 <b1airo> verdurin: I would say whatever is useful at the level of the scientist
09:11:38 <oneswig> verdurin: I'm thinking of the unit, a combination of services, allocated together
09:12:43 <daveholland> so we've done various "applications" e.g. an ELK stack; Docker (to make sure the bridge IP range is set to avoid clashing with our internal network range); an openlava cluster; an NFS server. Some of these are aimed at minimising the "sysadmin speed bump" for a developer. Some have CloudForms orchestration on top
09:12:54 <b1airo> The point of distributing "apps" is to make them easy to consume for a particular purpose, so for e.g. genomics a whole pipeline is probably a suitable unit to distribute in some way
09:13:14 <oneswig> The way the compute is typically consumed, I guess, makes it easier for users to consume
09:13:32 <oneswig> daveholland: how have you gone about that?
09:13:37 <priteau> Orchestration templates (Heat or CloudFormation or something else) should be handled as well as disk images
09:15:00 <daveholland> oneswig: we pointed the intern in the right direction and let him go :) Seriously, we're using Packer (so, image definition as code) and building on a base OS image. There's no concept of combining functions (e.g. openlava with docker) which is something I don't know how to address
09:15:04 <oneswig> priteau: on chameleon, do you have a feel for how many people use the appliances as a proportion?
09:15:25 <verdurin> Yes, I think we need the recipes as well, or references to upstream support cf. definitions in elasticluster
09:15:44 <priteau> oneswig: you mean what we call "complex appliances", which are orchestrated templates?
09:16:17 <b1airo> Anyone else using Murano? We have it on Nectar also and have some members contributing new packages quite steadily, but they are mostly typical app stacks rather than science/research specific at the moment
09:16:39 <oneswig> priteau: things like the hadoop appliance
09:17:10 <b1airo> I need to pay attention here now, pls mention me if there's anything to answer
09:17:53 <oneswig> We don't have Murano but are interested generally in it as a solution.  One issue IIRC is that it's not supported by Red Hat OSP
09:18:14 <oneswig> Which makes it tricky for quite a lot of deployments out there.
09:18:30 <daveholland> @b1airo we're not using Murano mostly because it's not in RHOSP. but it looks potentially interesting
09:18:50 <b1airo> Does RH have a viable alternative?
09:19:03 <priteau> oneswig: I haven't run the numbers recently but IIRC some months ago, most usage was with basic CentOS or Ubuntu images, then images for specialized hardware (GPU, FPGA), then OpenStack deployments
09:19:15 <b1airo> Things like Juju seem like a reasonable agnostic alternative
09:19:19 <oneswig> daveholland: Am I right in thinking that RH will still support OSP if a customer integrates other services - just not the extra non-supported services?
09:19:33 <daveholland> I'm not sure if OpenShift fits that gap (maybe if you push hard)
09:19:54 <priteau> The Hadoop appliance wasn't used much, I assume because it targets a very specific community of people interested in high-performance networking in VMs
09:19:56 <oneswig> b1airo: I think RH would like people to use OpenShift or CloudForms perhaps?
09:20:15 <oneswig> ok thanks priteau
09:20:15 <daveholland> @oneswig yes, we have done various customizations (e.g. ml2 drivers) and still get support, we haven't added other OpenStack services though
09:20:21 <verdurin> daveholland: OpenShift leans more towards developers
09:20:54 <verdurin> RH may have Murano as a Tech. Preview soon, if they don't already
09:21:11 <priteau> oneswig: but we are a special testbed with a lot of users building their own systems for research. I expect domain scientists would use pre-packaged appliances much more.
09:21:36 <oneswig> From this discussion, I think if somebody was to deploy Murano and make a good fist of it, we've got enough content for another section of the book on this.
09:21:56 <daveholland> Murano isn't tech preview in RHOSP11 FWIW (we're only just about to deploy RHOSP10 as our next iteration...)
09:22:17 <b1airo> Yeah I think Nectar generally would be interested in contributing and collaborating on that
09:22:25 <verdurin> daveholland: okay, once more my optimism is dashed
09:22:32 <daveholland> sorry :) :(
09:22:48 <oneswig> b1airo: jmlowe and rbudden I'm sure would also be interested.
09:22:57 <verdurin> oneswig: we would be happy to contribute
09:23:05 <b1airo> Yes, this conversation actually started with them
09:23:29 <oneswig> verdurin: daveholland: I expect it can be done within TripleO + RDO (gut instinct)
09:23:38 <oneswig> And would then transfer direct to OSP
09:24:01 <oneswig> I can do some research on that.
09:24:08 <verdurin> oneswig: sure - that's how I would test it anyway
09:24:44 <oneswig> b1airo: Is there anything online about Nectar's work with Murano so far?
09:25:52 <oneswig> verdurin: testing Murano on TripleO - that would be great, let's try to work out how
09:26:10 <daveholland> do I understand right that Murano uses heat? Is that to assemble components in an instance deployed from a bare OS image? (sounds like say using CloudForms to inject cloud-init script to install/configure packages)
09:26:53 <b1airo> oneswig: a little, e.g., https://nectar.org.au/cloud-application-catalogue-released/
09:27:08 <oneswig> daveholland: I believe so, yes.  Particularly good for managing a multi-node application topology as a single addressable thing
09:27:36 <priteau> daveholland: yes, Murano can store Heat templates: https://docs.openstack.org/developer/murano/appdev-guide/hot_packages.html
09:27:51 <daveholland> OK, and that does address how to combine components/applications in a single instance (assuming they don't tread on each other) thanks
09:28:37 <oneswig> I wonder how portable Murano apps are from one cloud to another.  Probably quite easy to include site-specific assumptions.
09:29:48 <oneswig> daveholland: b1airo's suggestion of juju earlier covers very similar ground - but even less likely to be on RH's roadmap
09:30:20 <zz9pzza> If you used something like packer with openstack and aws to generate images that would help to reduce site specific issues.
09:30:59 <b1airo> Though doesn't need to be on RH's roadmap to use it against a RHOSP cloud
09:31:07 <verdurin> Yes, I'd go that way rather than Juju, about which I'm a bit of a sceptic
09:31:13 <oneswig> zz9pzza: agreed, building for two different targets does help remove assumptions
09:33:13 <oneswig> OK, how can we make progress here?  I think there's some investigation and reporting back.  Like everyone I can't be certain if I'll get the time at this end but I'd like to try it too.
09:33:25 <daveholland> it's a trade-off though: build images and you have (potentially exponential) image sprawl and the size requirements; but an image is quicker to deploy the app than installing it on a bare OS every time
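The "potentially exponential" image sprawl mentioned above can be illustrated with a small count. This is only a sketch: the component names come from examples mentioned earlier in the meeting, and treating each one as an independent, optional add-on to a single base OS image is an assumption made for illustration, not a description of any site's actual build process.

```python
from itertools import combinations

# Component names are borrowed from the meeting discussion; treating each as
# an independent optional add-on to one base OS image is an assumption.
COMPONENTS = ["elk", "docker", "openlava", "nfs"]


def image_variants(components):
    """Return every subset of components; each subset would be one pre-built image."""
    variants = []
    for size in range(len(components) + 1):
        variants.extend(combinations(components, size))
    return variants


variants = image_variants(COMPONENTS)
# Four optional components give 2 ** 4 = 16 images, counting the bare base OS.
print(len(variants))
```

Each extra optional component doubles the number of pre-built images, which is the storage side of the trade-off against faster deployment.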
09:34:02 <zz9pzza> I prefer images, as it reduces launch time and they can have functional tests via a ci
09:34:37 <verdurin> oneswig: happy to coordinate with you separately about Murano testing with TripleO
09:34:39 <b1airo> I'll have a think about a way to survey what people are going/seeing today
09:34:55 <b1airo> s/going/doing/
09:34:57 <oneswig> I guess with images you lose the opportunity to easily parameterise, eg scale of resource allocated
09:35:13 <oneswig> verdurin: great, let's do that.
09:35:21 <zz9pzza> You can put some customisation in cloud init
09:35:26 <b1airo> Can still use cloud-init to customise behaviour oneswig
09:35:31 <b1airo> Jinx!
09:35:35 <zz9pzza> :)
09:35:43 <oneswig> zz9pzza: but not things like the number of worker nodes?
09:35:56 <zz9pzza> But that is at a different layer eg heat
09:36:11 <zz9pzza> or auto scaling from the first machine
09:36:15 <oneswig> true.  Heat + images would get you a long way
09:36:57 <oneswig> that is what we do for cluster-as-a-service currently, with ansible as the icing on the cake :-)
09:37:24 <b1airo> The biggest issue we see with Heat templates is that the original dev assumes some feature of their local cloud, e.g., tenant networks or floating IPs
09:37:46 <b1airo> Often those things are not necessary dependencies, but it's hard to untangle them
09:37:53 <priteau> My dream would be one unified way of defining customization at image-build time or at deployment-time, and the infrastructure would decide which is best to use
09:37:56 <oneswig> b1airo: My heat templates certainly have that stuff in them.
09:38:02 <priteau> But I suppose that's more for a PaaS than for IaaS
09:38:38 <daveholland> heat input parameters (+ defaults) could paper over that
09:38:49 <b1airo> Yes priteau , reminds me of when I was programming against Azure many years back
09:38:59 <oneswig> priteau: In this discussion I see the application catalogue in that layer, or above it
09:42:13 <oneswig> OK, final actions?  I can start a review for a new section of the book.  When people's research comes back, we can start to gather findings there.  b1airo it would be particularly interesting if there's evidence of Nectar's Murano serving science/HPC apps.  GPUs, SRIOV - bonus material.
09:43:21 <b1airo> Shall we have a quick chat on security oneswig ? I have one eye on my phone still...
09:43:21 <oneswig> We should also roll over this topic for the Americas
09:43:21 <oneswig> OK, here it comes
09:43:21 <oneswig> #topic Security of research computing instances
09:43:21 <b1airo> Yes agreed re. rollover
09:43:28 <oneswig> Another week, another kernel patch.
09:43:47 <b1airo> Ha!
09:43:50 <daveholland> :/
09:44:00 <oneswig> I'm not sure cloud helps to the extent that it could with these things - so easy to have user machines going rotten
09:44:10 <oneswig> Any good practice out there?
09:44:29 <b1airo> We are looking at ways to integrate FW into our OpenStack in an opt-in manner (it's otherwise a DMZ)
09:44:40 <daveholland> turn on unattended-upgrades (or similar) in base images? maybe controversial for repeatable/reproducible
09:44:50 <zz9pzza> I note that CloudForms can do security scans by taking a snapshot of a machine and running scripts against that image; however, we haven't tried using it yet.
09:45:00 <verdurin> Should be possible to reuse some of the work people are doing to vet container images/definition files?
09:45:14 <daveholland> or accept that the soft centre is going to remain soft, and check that security groups keep the outside relatively hard?
09:45:28 <priteau> We run a vulnerability scanner against instances, but that only covers issues that can be detected remotely (mostly SSL security problems)
09:45:41 <b1airo> Currently seems like a routed solution where FW is the default gateway on certain provider nets might be the simplest option. So no direct OpenStack integration
09:45:57 <oneswig> I've heard of the friendly probing service for that - both against infra and instances
09:46:36 <b1airo> verdurin: any links on that?
09:46:47 <oneswig> b1airo: is that an instance serving as firewall & gateway, or FWaaS?
09:47:05 <verdurin> b1airo: e.g. https://github.com/coreos/clair
09:47:16 <daveholland> also the location of the security responsibility boundary. If the OpenStack admin is purely providing hosting (indeed might be contractually forbidden from looking inside instances) that makes a difference
09:47:25 <b1airo> The image catalog work I mentioned earlier includes a static security scan of the image as one of the criteria to pass before we put the image in the "Contributed" category
09:48:02 <oneswig> b1airo: sounds good.  Is that on nectar's github?
09:48:22 <b1airo> oneswig: the FW could be in an instance but we already have physical Palo Altos that we can probably integrate like that
09:49:21 <priteau> There was a related talk in Barcelona, still on my watch list though
09:49:24 <priteau> #link https://www.youtube.com/watch?v=vuL7in9CxHY
09:49:47 <oneswig> Is there an equivalent of something like tripwire/electricfence that can run within a VTN, or is that under the firewall remit?
09:49:57 <oneswig> Thanks priteau, looks interesting
09:50:00 <b1airo> oneswig: yes, it's in our CI somewhere
09:50:18 <b1airo> I can take that as an action to share
09:50:25 <zz9pzza> But that kind of thing will not catch new issues that arise.
09:50:28 <oneswig> b1airo: sure that would be helpful
09:51:20 <verdurin> zz9pzza: yes, that was my thought
09:51:35 <zz9pzza> Which cloudforms could in theory
09:51:48 <b1airo> zz9pzza: true, but we require "Contributed" images to be updated on a rolling basis
09:52:10 <oneswig> zz9pzza: I guess the hope is to catch the misuse of a compromised system.  The university network watches for this.  I guess it's all layers
09:52:13 <zz9pzza> but if I have a system that has been up for a month...
09:53:01 <daveholland> @b1airo do you expire or time-limit instances made from one of those images? can imagine a long-lived instance being vulnerable to a newly-announced bug
09:53:08 <b1airo> Oh sure. We do vulnerability scanning too
09:53:54 <daveholland> JOOI what is the time limit? we are looking to host "pets" which could be around for months
09:54:00 <oneswig> daveholland: indeed.  If the managed infrastructure is updated immediately, what equivalent can be done to shore up user cloud instances
09:56:37 <oneswig> Ah, we are short on time.  Any more to add here?
09:57:31 <oneswig> I think the different strategies are all valuable, great suggestions thanks everyone
09:57:40 <daveholland> just a quick comment about user education :) this week I found someone who thought he had to enable ssh inbound to make ssh outbound work
09:57:49 <verdurin> Worth talking to CERN people - when I was active in CMS, there were a lot of discussions about these sorts of security issues, e.g. with long lasting instances
09:58:19 <daveholland> (actually it wasn't ssh in particular but a misunderstanding about port number at each end of a TCP connection)
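The port-number misunderstanding described above can be shown with a loopback connection. This is a minimal sketch using only the Python standard library; the listener on an OS-chosen port stands in for a remote service such as sshd, and the function name `show_ports` is hypothetical. The client's source port is an ephemeral port picked by the operating system, not the service's port, so an outbound connection does not need the service port opened inbound on the client side.

```python
import socket


def show_ports():
    """Open one loopback TCP connection and report the port at each end."""
    # The listener stands in for a remote service (e.g. sshd); port 0 asks
    # the OS to pick a free port, so no privileges are needed.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    server_port = server.getsockname()[1]

    # The client never binds explicitly: the OS assigns an ephemeral source port.
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(("127.0.0.1", server_port))
    conn, peer = server.accept()
    client_port = client.getsockname()[1]
    peer_port = peer[1]  # the client's port as seen from the server side

    conn.close()
    client.close()
    server.close()
    return server_port, client_port, peer_port


server_port, client_port, peer_port = show_ports()
print("destination (service) port:", server_port)
print("source (ephemeral) port:", client_port)
```

Because the reply traffic belongs to the connection the client itself opened, a stateful filter can allow it without any inbound rule for the service port on the client.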
10:00:08 <priteau> oneswig: unrelated to the current topic, but any news about https://review.openstack.org/#/c/459884/
10:00:12 <b1airo> Thanks all. I think all this warrants some ML threads too...
10:00:13 <oneswig> verdurin: good point, and it's a general issue not just for research computing
10:00:47 <verdurin> Bye all
10:00:51 <oneswig> priteau: no, oddly - I'll make a note to chase
10:00:54 <oneswig> thanks everyone, out of time
10:00:55 <zz9pzza> ttfn
10:01:00 <oneswig> #endmeeting