11:00:25 <oneswig> #startmeeting scientific-sig
11:00:25 <openstack> Meeting started Wed Jun 5 11:00:25 2019 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
11:00:26 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
11:00:28 <openstack> The meeting name has been set to 'scientific_sig'
11:00:34 <oneswig> yarr \o/
11:00:34 <janders> hi all
11:00:49 <martial_> Good morning/day/evening :)
11:00:51 <janders> oneswig: would you mind if we start with the SDN discussion? I need to pop out for half an hour soon
11:00:51 <oneswig> hi janders, g'day
11:00:59 <verdurin> Afternoon.
11:01:01 <oneswig> certainly, let's do that.
11:01:03 <oneswig> hi verdurin
11:01:11 <oneswig> #link Agenda for today https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_June_5th_2019
11:01:25 <oneswig> OK, let's start by ignoring the running order...
11:01:27 <dh3> hi all
11:01:31 <mgoddard> o/
11:01:36 <oneswig> #topic Coherent SDN fabrics
11:01:44 <b1airo> 'ello
11:01:45 <oneswig> janders: you raised this at the PTG
11:01:47 <janders> just a quick update from my side: I've had a brief chat with the Mellanox guys. Overall they don't see an issue with this, but I'd like to get more detailed feedback from them
11:01:59 <oneswig> #chair b1airo martial_
11:02:00 <openstack> Current chairs: b1airo martial_ oneswig
11:02:07 <janders> last week's neutron-driver meeting got cancelled, so I haven't spoken to the Neutron guys in detail either
11:02:26 <janders> indeed - this is a follow-up to the SDN consistency issues that we raised in the SIG
11:02:31 <oneswig> janders: ah, that's too bad about the cancelled meeting
11:02:50 <oneswig> hi mgoddard, thanks for coming
11:02:57 <oneswig> did you see what this is about?
11:03:18 <mgoddard> hi oneswig, yes I did, and thought I'd pop along
11:03:28 <janders> hey Mark, thanks for coming
11:03:30 <oneswig> #link Spec in review https://review.opendev.org/#/c/565463/
11:03:33 <mgoddard> I haven't got as far through the neutron spec as I'd like
11:04:30 <oneswig> Me neither. Still a little concerned it's tied to one implementation
11:05:18 <oneswig> We could come back to this in 15 minutes to allow some reading time?
11:05:40 <hberaud> o/
11:05:41 <janders> my understanding is they're modelling it on ODL but trying to make the consistency-check mechanism generic
11:06:20 <janders> what we can also do is: I'll give you guys some time to go through this in more detail, and if you could make comments in the bug, that would be great
11:06:21 <oneswig> Hi hberaud, welcome
11:06:36 <janders> doesn't have to be during the meeting
11:06:43 <hberaud> oneswig: thx
11:06:46 <oneswig> janders: +1, makes sense to get the feedback out there
11:07:00 <mgoddard> one key difference between mlnx and OVS is the async update
11:07:03 <janders> but I think if we could provide some feedback to the Neutron guys before Friday's driver meeting, that would be great
11:07:12 <mgoddard> the Mellanox driver updates NEO in an async thread
11:07:57 <mgoddard> so not only have we committed the update to the Neutron DB, we've actually finished processing the original update request by the time NEO is updated
11:08:48 <janders> right! that matches my experience/observations
11:09:35 <oneswig> mgoddard: in extreme cases could this cause a provisioned instance to be transiently connected to a different tenant network? Is there any means of blocking on completion of an operation?
11:10:08 <b1airo> o_0 !
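
A minimal sketch of the async-update pattern mgoddard describes here: the API call returns once the Neutron DB commit is done, while the fabric controller catches up later from a background thread. All names below are invented for illustration - this is not the actual networking-mlnx code.

```python
# Illustrative only: mimics a driver that commits to the Neutron DB, then
# pushes the change to an external SDN controller (e.g. NEO) asynchronously.
import queue
import threading
import time

class AsyncFabricDriver:
    def __init__(self):
        self._updates = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def update_port_postcommit(self, port_id, network_id):
        # Called after the DB transaction has committed. We only enqueue the
        # fabric update, so the caller returns immediately - Neutron can
        # report the port as ready before the fabric agrees.
        self._updates.put((port_id, network_id))

    def _worker(self):
        while True:
            port_id, network_id = self._updates.get()
            time.sleep(1)  # stand-in for a slow REST call to the controller
            print(f"fabric now carries {port_id} on {network_id}")

driver = AsyncFabricDriver()
driver.update_port_postcommit("port-1", "tenant-net-a")
print("API call already returned; fabric update still in flight")
time.sleep(2)  # give the worker time to finish before the script exits
```
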
11:10:16 <mgoddard> oneswig: potentially
11:10:31 <mgoddard> oneswig: Neutron does have a blocking mechanism, but I don't think mlnx uses it
11:10:38 <janders> which part of this would need to go into neutron-core, and which part would be mlnx-specific changes?
11:11:09 <mgoddard> there is a thing called provisioning blocks which can be used for this
11:11:41 <mgoddard> agents are inherently async, so it's nothing new. You just have to know how to use it
11:11:55 <janders> convincing mlnx to use that definitely sounds like a good idea
11:11:56 <oneswig> Ah OK. Feature not bug
11:12:36 <janders> but how about detecting state divergence and addressing it? what we're talking about doesn't address that, right?
11:12:40 <janders> or would it?
11:12:51 <mgoddard> Probably bug not feature :)
11:13:32 <b1airo> state divergence is always a potential problem if the controller can be changed/updated independently of Neutron
11:13:41 <oneswig> janders: yes, this doesn't relate to the missing port updates
11:14:04 <janders> my idea behind the bug was implementing a mechanism that brings SDN-based approaches on par with OVS - so that state divergence can be detected/addressed
11:14:26 <janders> https://review.opendev.org/#/c/565463/12/specs/stein/sbi-database-consistency.rst@332 looks like it might be able to address that
11:14:53 <janders> however it would be great to have your feedback
11:15:23 <janders> in general I think it would be worthwhile to split the discussion into two parts - the generic one (which the Neutron team needs to address) and the specific ones (which I'm happy to bring up with Mellanox)
11:16:00 <janders> I totally see the value of improvements around provisioning blocks as suggested by mgoddard
11:17:01 <oneswig> Makes sense. The first part may be gated, for example, on the difficulty of associating a revision number with applied config in arbitrary SDN implementations. That's reviewing the spec, I guess
11:17:02 <mgoddard> I think provisioning blocks could be an easy win
11:17:23 <mgoddard> then the spec might be able to prevent subsequent drift
11:17:43 <oneswig> janders: think you can entice your friendly mlnx engineer here for part 2?
11:17:52 <janders> I have to disappear for about half an hour now - can I please ask you to make comments on what you like/dislike/any suggestions around the generic part in the bug report? It would be great to see some activity there; it would help get attention from the devs
11:18:03 <janders> oneswig: yes, I'm happy to! :)
11:18:08 <oneswig> +1
11:18:19 <janders> excellent - thanks guys
11:18:32 <janders> hopefully I'll be back before the end of the meeting
11:18:35 <oneswig> #action janders Involve Mellanox tech team on the issue
11:18:49 <oneswig> #action oneswig mgoddard add comments to the bug report and/or spec
11:19:08 <oneswig> OK, thanks janders and all, next
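
On mgoddard's suggested easy win: add_provisioning_component() and provisioning_complete() are real Neutron APIs from neutron.db.provisioning_blocks, but the driver class, entity name and ack callback below are invented - a sketch of how a mechanism driver might wire them in, not the Mellanox driver's actual code.

```python
# Sketch of a mechanism driver using Neutron's provisioning blocks so a
# port is not reported ACTIVE until the SDN controller has acked the change.
from neutron.db import provisioning_blocks
from neutron_lib.callbacks import resources
from neutron_lib.plugins.ml2 import api

FABRIC_ENTITY = 'FABRIC'  # hypothetical entity name for the fabric backend

class FabricMechanismDriver(api.MechanismDriver):
    def initialize(self):
        pass

    def update_port_precommit(self, context):
        # Register the block inside the DB transaction; the port stays DOWN
        # until every registered entity reports completion.
        provisioning_blocks.add_provisioning_component(
            context._plugin_context, context.current['id'],
            resources.PORT, FABRIC_ENTITY)

    def _on_controller_ack(self, plugin_context, port_id):
        # Called from the async worker once the controller confirms the
        # update; clearing the block lets Neutron mark the port ACTIVE.
        provisioning_blocks.provisioning_complete(
            plugin_context, port_id, resources.PORT, FABRIC_ENTITY)
```
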
11:19:18 <oneswig> #topic CERN OpenStack roundup
11:19:36 <oneswig> that was fun, the CERN team laid on a great day
11:19:41 <oneswig> dh3: great talk
11:19:48 <dh3> you are too kind *blush*
11:19:55 <oneswig> How's the eagle recovering from being swabbed? :-)
11:20:17 <dh3> I hear there are some interesting stories from field expeditions ("Real Science")
11:20:58 <oneswig> I once had a support call that ended "can we get this fixed because there's a plane flying over Antarctica right now that needs the service back online..."
11:21:15 <oneswig> In a previous job
11:21:18 <b1airo> looked like a good time on Slack and Twitter
11:21:58 <oneswig> I enjoyed it. In the afternoon, the talk on OpenStack control plane tracing was especially interesting.
11:22:25 <dh3> Yes, I think osprofiler could be educational as well as useful
11:22:44 <oneswig> #link Ilya's presentation https://indico.cern.ch/event/776411/contributions/3402240/attachments/1851629/3039923/08_-_Distributed_Tracing_in_OpenStack.pptx.pdf
11:23:20 <oneswig> I think we'll be doing plenty of that in due course
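
For anyone wanting to try the tracing Ilya presented: osprofiler is normally enabled via the [profiler] section of each service's config plus --os-profile <key> on the client, but the library can also be driven directly. A minimal sketch, with the key and span names purely illustrative:

```python
# Minimal osprofiler usage: init with an hmac key, then wrap work in spans.
# Without a configured notifier the spans go nowhere, but the API is the same.
from osprofiler import profiler

profiler.init(hmac_key='SECRET_KEY')  # must match the services' hmac_keys

@profiler.trace('resolve-image')
def resolve_image(name):
    # Each decorated call becomes a timed span in the trace tree.
    return 'image-uuid-for-' + name

with profiler.Trace('boot-request', info={'flavor': 'm1.small'}):
    resolve_image('cirros')

# In a real deployment the collected trace is rendered with, e.g.:
#   osprofiler trace show --html <trace-id> --connection-string redis://...
print(profiler.get().get_base_id())  # the trace id to look up later
```
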
11:24:02 <martial_> nice, thanks
11:25:24 <oneswig> Erwan Gallen's talk on face recognition was good too - nice to cover more on the algorithms.
11:25:58 <oneswig> OK, we should move on I think
11:26:12 <oneswig> #topic Secure Computing Environments
11:26:44 <oneswig> One question from the audience at CERN last week related to how to find best practice for controlled-access data processing
11:27:23 <oneswig> This is something we've attempted to grapple with previously but not got very far towards common ground.
11:27:31 <oneswig> Once more unto the breach...
11:28:03 <oneswig> Jani from Basel University and I presented on a project underway there
11:28:23 <oneswig> #link BioMedIT project in Basel https://indico.cern.ch/event/776411/contributions/3345191/attachments/1851634/3039929/05_-_Secure_IT_framework_for_Personalised_Health_data.pdf
11:28:45 <oneswig> This was a lightning talk though, so no time to cover anything in depth
11:28:54 <martial_> #link https://indico.cern.ch/event/776411/contributions/
11:29:04 <martial_> this has all the talks
11:29:25 <oneswig> cool, thanks martial_
11:30:11 <dh3> We don't have a best practice as such - we are still offering all the various compute/storage facilities to data owners/principal investigators and trying to help them choose the right one for their data access and analysis workload
11:30:29 <dh3> buffet table vs a la carte, I guess
11:31:07 <verdurin> On our side, the first meeting to discuss a paper about our work with restricted data is coming up. That will be science as well as infrastructure.
11:32:16 <oneswig> I saw a useful paper recently on classification of security levels across various domains, but I'm struggling to find it, alas
11:32:38 <b1airo> interesting set of slides
11:34:10 <b1airo> looks like a decent architecture, though the thing often missing is a mapping of the various controls to a particular reference framework
11:34:40 <oneswig> Something I'd like to see is authoritative research of the form "to achieve this level of security / standard, you must implement this..."
11:35:24 <dh3> oneswig: do you have something like ISO 27001/2 in mind?
11:36:12 <oneswig> Yes, that kind of thing, but also there are different classification systems in different sectors - government, health data, etc.
11:38:02 <b1airo> yes, agreed oneswig... starting with a basic multi-tenant cloud implementation, what bits of OpenStack or other standard tooling can you sprinkle in to achieve sufficient confidence in the architecture and visibility of drift or abnormalities? Then you need to layer in incident response requirements, and so on and so forth
11:39:42 <oneswig> b1airo: infrastructure configuration is certainly not the only requirement, true.
11:40:15 <verdurin> Yes, in my experience, lots of policies, and possibly even the setting up of new administrative bodies.
11:40:15 <oneswig> I'm drawing a blank on this paper, may follow up to the Slack workspace if I find it.
11:40:51 <b1airo> to achieve compliance with any standard there's necessarily a layer of abstraction down to specific controls and procedures (and sometimes the organisational units & heads that are responsible for them)
11:41:42 <dh3> I've never had to work with that sort of published standard, but experience says that it will reach from infra/admins through to application devs too
11:42:30 <b1airo> in OpenStack today I don't think we even have a reference list of controls, let alone reference architectures
11:42:31 <oneswig> dh3: do you need specific standards to host e.g. UK Biobank data? (Or was that just animals and plants?)
11:43:19 <verdurin> oneswig: it's actually more complicated than that, there are many varieties of UKB data, each with their own special requirements.
11:43:22 <dh3> oneswig: it's not really my area, but I know that for some data (human identifiable) we have had to implement separate networks, virt platforms, storage....
11:44:50 <oneswig> dh3: does Sanger have policy guidelines for this that are public?
11:44:54 <dh3> our UKB vanguard data is in our production OpenStack environment, but in a separate host aggregate (for perf isolation, though it brings security too) and on separate storage/networking (via a provider network)
11:45:06 <dh3> oneswig: not that I have seen! but I can ask
11:45:16 <heikkine> dh3: Jani from Basel here, I agree with the requirement for separate networks, etc.
11:45:26 <oneswig> Might be really useful!
11:45:34 <oneswig> hi heikkine, thanks for coming
11:45:51 <heikkine> I may drop off at any moment but I'm trying to follow what I can
11:46:27 <oneswig> dh3: do you go as far as separate physical storage?
11:47:15 <dh3> oneswig: yes, in some cases; "identifiable human data" is the usual trigger but not always
11:47:40 <heikkine> dh3: what counts as separate physical storage in your case? Separate from OpenStack?
11:48:31 <dh3> heikkine: taking UKB as an example, we have it on a separate Lustre system; it's only accessible from that tenant's provider network.
11:49:02 <oneswig> so separate from OpenStack and separate from other projects too
11:49:03 <janders> would software-defined storage help in this sense?
11:49:26 <heikkine> I see. So you do tenant isolation that way. We are starting with NFS but I have done tests with BeeGFS
11:49:33 <janders> as in - having a pool of JBOFs and then spinning up a "private" storage cluster of a certain scale on demand
11:49:38 <verdurin> We are looking at Lustre to cover this kind of case in the future, given it's more amenable than GPFS.
11:49:59 <oneswig> A substantial discussion in its own right
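
The isolation pattern dh3 describes - a dedicated host aggregate plus a project-specific provider network - might be sketched with openstacksdk roughly as below. Admin credentials are assumed, and the cloud name, hostnames, physnet and VLAN ID are all illustrative, not Sanger's actual configuration.

```python
# Sketch of per-project isolation: pin a sensitive project's instances to
# their own hypervisors and give them a dedicated provider network.
import openstack

conn = openstack.connect(cloud='mycloud')  # admin credentials assumed

# Host aggregate: perf isolation that also helps security, as dh3 notes.
agg = conn.compute.create_aggregate(name='ukb-secure')
for host in ('compute-101', 'compute-102'):  # illustrative hostnames
    conn.compute.add_host_to_aggregate(agg, host)

# Provider network reserved for the project, mapped to separate switching.
conn.network.create_network(
    name='ukb-secure-net',
    provider_network_type='vlan',
    provider_physical_network='physnet-secure',  # illustrative physnet
    provider_segmentation_id=1201,               # illustrative VLAN
    project_id='REPLACE_WITH_PROJECT_ID',
)
```
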
11:50:38 <b1airo> there are some US standards that actually specify isolated physical storage
11:50:39 <dh3> janders: it depends - again, not entirely my area, but I understand that some data access agreements are restrictive about where data can be kept (physically and geographically) - and SDS makes it harder to show an auditor "it's here"
11:51:08 <oneswig> Over the summer it may be good to gather some of these pieces together into a joint study. Does anyone object to being involved?
11:51:09 <b1airo> e.g. think ITAR data cannot be mixed with non-ITAR data on the same physical storage
11:51:34 <martial_> I think this is correct, Blair
11:51:36 <b1airo> lol, love the phrasing of that question oneswig !
11:51:37 <janders> right! whether it works is one thing, whether it ticks the compliance box might be another. Very useful feedback, thank you.
11:51:37 <verdurin> janders: yes, to echo dh3's comments, I'm dealing with a project at the moment in which the form would not cope well with such a setup.
11:52:04 <dh3> SDS also makes proving secure deletion more difficult (you can't even identify which HDDs to take out and shred...)
11:52:22 <oneswig> b1airo: opt-out from being volunteered :-)
11:52:24 <janders> (I suppose node cleaning doesn't cut it :)
11:52:42 <dh3> node cleaning + DBAN maybe :)
11:53:01 <b1airo> janders: actually, if you are using self-encrypting disks then it might...
11:53:04 <verdurin> Don't deprive the workshop people of their longed-for chances to deploy the angle grinder
11:53:28 <oneswig> verdurin: --yes-i-really-mean-it ...
11:53:34 <janders> :D
11:53:40 <b1airo> because we know hardware-level security is so good... *cough*
11:54:15 <oneswig> One more topic to cover today - can we follow up on this via the Slack channel?
11:54:26 <b1airo> sounds good
11:54:29 <oneswig> dh3: are you on there?
11:54:47 <oneswig> I can send an invite - unless you're too secure for Slack, of course :-)
11:55:11 <oneswig> #topic Justification whitepaper for on-premises cloud for research and scientific computing
11:55:16 <dh3> oneswig: no (not yet - do invite me please)
11:55:22 <oneswig> dh3: will do
11:55:58 <oneswig> just a quick mention: some of the US folks are very interested in pooling together on a paper that lays out the private cloud case
11:56:17 <oneswig> This was raised at the SIG session at the Denver PTG
11:56:18 <b1airo> there's a great NASA tech paper I came across today that looks at this from the HPC perspective: https://www.nas.nasa.gov/assets/pdf/papers/NAS_Technical_Report_NAS-2018-01.pdf
11:56:32 <oneswig> Has anyone in this session recently made this case themselves?
11:56:41 <oneswig> thanks b1airo
11:57:17 <oneswig> During the session somebody mentioned the talk by Andrew Jones of NAG at the UKRI Cloud Workshop in January
11:57:25 <oneswig> #link Andrew Jones presentation https://www.linkedin.com/pulse/cloud-hpc-dose-reality-andrew-jones/
11:58:30 <oneswig> The hope is that this material might make a useful resource to draw upon when writing procurement proposals.
11:58:42 <janders> lots of follow-up reading after today's meeting - thanks for some great links, guys
11:59:02 <b1airo> a key thing to establish early on here would be the kinds of applications that are being supported on the private cloud
11:59:09 <dh3> hadn't seen that, thanks (but we are "strategically" going cloud-wards anyway)
11:59:17 <oneswig> b1airo: very true
11:59:20 <b1airo> and a valid answer might be "all of them"
11:59:36 <oneswig> OK, nearly out of time - final comments on this?
11:59:55 <b1airo> raise it again for next week's timezone, I guess
12:00:07 <oneswig> b1airo: +1, will do.
12:00:30 <oneswig> OK all, time to stop - thanks everyone
12:00:31 <janders> thanks guys! again - any comments in the SDN bug will be very welcome by the Neutron team
12:00:37 <martial_> Thanks ;)
12:00:39 <oneswig> janders: will do
12:00:46 <oneswig> #endmeeting