07:00:35 #startmeeting scientific-wg
07:00:36 Meeting started Wed May 25 07:00:35 2016 UTC and is due to finish in 60 minutes. The chair is b1airo. Information about MeetBot at http://wiki.debian.org/MeetBot.
07:00:37 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
07:00:39 The meeting name has been set to 'scientific_wg'
07:00:54 Hi all
07:01:00 Morning/afternoon/evening
07:01:10 Morning.
07:01:12 now how do I give oneswig chair too?
07:01:15 let's try this...
07:01:18 #chair oneswig
07:01:19 Current chairs: b1airo oneswig
07:01:23 yay!
07:01:27 power of IRC is strong!
07:01:34 #topic roll call
07:01:43 #force-lightening
07:01:58 \o/
07:02:30 Adam from the Crick here
07:02:38 Hi Adam
07:02:43 hi Adam, thanks for joining
07:03:13 Tim from CERN
07:03:21 Hi Tim
07:04:07 hi Tim, i hoped this time would suit - glad it seems to
07:04:22 We are down a bit from last week's 20-odd though...
07:04:45 yeah, that was fairly impressive
07:05:26 OK shall we go through the agenda
07:05:39 yep
07:05:41 #topic Action items from last week
07:06:11 I had an item to look at user stories and how they might fit
07:06:33 I see I need to add eMedLab to https://etherpad.openstack.org/p/science-clouds
07:06:39 I looked through the Product WG pages, there's some good infrastructure there
07:06:46 verdurin: yes please
07:06:47 my 3 primary items are in progress, nothing finished there yet but should have some wiki updates to publish over the weekend
07:07:16 yes please verdurin - i have scraped that once already but will circle back
07:07:20 b1airo: great, will be good to see. Our page might need subdividing, it's getting big
07:08:10 b1airo: will do so by the end of the week
07:08:10 yeah i intended to add this stuff as sub-pages - the etherpad de-convolution is taking a bit of head scratching (doesn't help when i try to look at it at night)
07:08:39 I had an item to chase up on the BMC issues we have with the Ironic team. I exchanged some mails but didn't file the bug they suggested yet.
07:08:57 verdurin, did you fill out our little google survey during the summit? (about your deployment/s)
07:09:36 b1airo: I filled in the normal survey, as I have done before - was there an extra one?
07:09:37 b1airo: is there any process for "garbage collection" of old etherpads or do they remain in the historical record in perpetuity?
07:10:20 verdurin, we created one specific to hpc/science workloads with questions not asked in the main user survey
07:10:30 b1airo: where's that #link?
07:10:36 let me find the link for you - would be great to have your data there too if you can spare the five minutes
07:10:50 b1airo: sure, happy to fill in that one too
07:11:20 oneswig, not that i'm aware of
07:11:26 OK so what actions have rolled over?
07:12:43 #action oneswig to capture iDRAC BMC failure mode in a bug on python-dracclient
07:13:04 #action b1airo to continue with transfer of content to wiki from etherpads
07:13:14 any others from last week?
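For reference, python-dracclient is the library named in the iDRAC action above. A minimal sketch of driving it might look like the following; the address and credentials are placeholders, and the exact call exhibiting the failure mode being reported is not recorded in this discussion:

    # Minimal python-dracclient sketch -- host and credentials are
    # hypothetical. The BMC failure mode mentioned above would be
    # filed against WS-Man calls like this one.
    from dracclient import client

    drac = client.DRACClient('192.0.2.10', 'root', 'calvin')

    # Simple liveness probe of the iDRAC over WS-Man
    print(drac.get_power_state())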
07:13:41 #topic Parallel filesystems
07:14:06 There were some useful discussions on parallel filesystems but I wasn't part of them - was anyone here in that session?
07:14:09 verdurin, bit.ly/22XB35R
07:14:09 OpenStack Science & HPC usage survey:
07:14:09 #link bit.ly/22XB35R
07:14:10 oneswig, re. actions we should roll over at least these:
07:14:35 IBM have sent out a survey on OpenStack and GPFS, which I will be filling in at some point.
07:14:55 verdurin: was it public, can you share a #link to it?
07:15:26 oneswig: https://www.surveygizmo.com/s3/2774614/IBMSpectrumScale-OpenStackUsageSurvey
07:15:50 we had a brief email exchange with an IBM developer, sounds like they have a lot of interesting features/possibilities, particularly around exposing things in GPFS via the Swift API
07:15:54 verdurin: thanks
07:16:14 They had a talk on OpenStack at the user group meeting in London last week.
07:17:09 Filesystems are increasingly becoming object stores as well. I heard that Red Hat are defaulting to using Ceph's object store as backing store for Ceilometer time-series data (via Gnocchi)
07:17:47 the only problem with these integrations is that it seems hard to do with stock upstream projects - they require a version of an upstream component (e.g. Swift servers) that the vendor has "tweaked" behind closed doors
07:17:53 we've been testing the gnocchi ceph backend. Seems more scalable than ceilometer but still early days
07:18:18 Yes, there's a lot of room for improvement in the Swift integration with GPFS.
07:18:27 noggin143: any feel for the aggregate sampling rate it can sustain for a given Ceph deployment size?
07:18:32 Though it's a lot better now than it was last year.
07:19:07 verdurin, are you using that particular integration?
07:20:00 oneswig: doing functional tests at the moment, packaging and puppet so it's a bit early
07:20:27 OK thanks, I'd be interested to hear how it works out
07:20:29 b1airo: Not at the moment - just observations of the infrastructure
07:20:59 regarding the break-out HPFS group in the scientific-wg meeting - i was there but didn't have a lot of luck bringing discussion back to concrete deliverables beyond use-case doco
07:21:02 b1airo: that's in eMedLab. For Crick, it's something we might explore.
07:21:47 though we did agree there were several different angles to come at it:
07:22:37 b1airo: was there a volunteer to lead this activity?
07:22:59 1) plumbing existing non-cloud HPFS into a cloud, so largely network focused, e.g., a neutron provider net for hpc-data with the filesystem in some reserved part of the subnet range
07:23:51 oneswig, no, maybe worth calling out on the list?
07:24:08 I'll note tha
07:24:09 2) manila integration
07:24:55 3) fully cloud-ified, i.e., filesystem servers run on the IaaS, presumably mainly for dev/test purposes
07:25:03 #action Seek a volunteer to lead the WG activities around parallel filesystems
07:25:58 at Monash we are doing #1 with Lustre
07:26:04 As part of the wiki organisation it would be great to have a sub-page for each activity
07:26:17 We have an interesting project at Cambridge underway wrt #3
07:26:28 Long way off at present though.
07:27:11 We have RDMA-enabled JBODs in our cloud, hoping to use iSER to provision Lustre in the cloud that isn't dreadful
07:27:44 Interesting.
07:28:12 #1 seems to work pretty well, at least when you have hardware/SR-IOV NICs into instances. but having said that we are seeing a few "odd" dmesg warnings that seem to indicate comms issues between clients and OSTs, yet to really get anywhere with that (causes occasional I/O blips for clients)
07:28:20 verdurin: am I right in thinking that eMedLab uses GPFS for serving Cinder volumes?
07:28:27 oneswig: yes
07:28:41 b1airo: is the provider network you're using L3 or L2?
07:28:44 oneswig, should work! ;-)
07:28:51 L2
07:29:07 Yes, L2
07:29:33 b1airo verdurin: good to know, thanks, that'll help our deployment plans
07:29:43 so can use o2ib LNET with Ethernet vNICs in the guests
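A rough openstacksdk sketch of angle #1 as described above - an L2 provider network for hpc-data with part of the subnet range reserved for the filesystem servers. The cloud name, physical network name, VLAN ID and addressing are all hypothetical:

    import openstack

    # Names, VLAN ID and CIDR below are illustrative only.
    conn = openstack.connect(cloud='mycloud')

    # L2 provider network mapped onto the datacentre HPC storage fabric
    net = conn.network.create_network(
        name='hpc-data',
        provider_network_type='vlan',
        provider_physical_network='physnet-hpc',
        provider_segmentation_id=300,
        is_shared=True,  # visible to all projects
    )

    # Keep the low part of the range out of the allocation pool so the
    # Lustre servers (already sitting on this VLAN) keep their addresses.
    conn.network.create_subnet(
        network_id=net.id,
        name='hpc-data-v4',
        ip_version=4,
        cidr='10.20.0.0/16',
        gateway_ip=None,  # no router: storage traffic stays on the L2 segment
        allocation_pools=[{'start': '10.20.16.1', 'end': '10.20.255.254'}],
    )

Guests attached to this network then share an L2 segment with the filesystem servers, which is what makes running LNET over tenant-visible vNICs possible.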
07:30:31 let's talk about accounting and scheduling then?
07:30:46 Ah yes. No more on HPFS?
07:30:51 did noggin143 volunteer to lead that activity?
07:31:05 #topic Accounting and scheduling
07:31:20 I didn't volunteer…
07:31:33 happy to talk more about that, by all means, just thinking Tim might be more interested in it
07:31:48 :-) no worries, i was guessing
07:31:52 Matt Jarvis was chairing the discussions at the summit but I don't know if he wants to take it on
07:32:16 OK, let's make a note to find a leader for accounting and scheduling
07:32:35 I did review the product WG user story. It seems more reservation than quotas
07:32:39 #action Find a leader for the activities on accounting and scheduling
07:33:04 There were references to group quotas, do these not go far enough?
07:33:06 #link http://specs.openstack.org/openstack/openstack-user-stories/user-stories/draft/capacity_management.html
07:33:36 It reminded me of Blazar
07:33:42 #link https://wiki.openstack.org/wiki/Blazar
07:33:59 have you tried Blazar?
07:34:05 Blazar is resurrected in the Chameleon project, did you hear that
07:34:42 We never tried blazar (it needed to get more momentum) … would be interested to read about Chameleon
07:34:45 I must say project proliferation isn't always helpful...
07:35:37 There is also some work going on with INFN in the EU Indigo Project, called Symphony, to add fair-share and pre-emptible instances
07:35:43 The user story is missing something like "As a cloud service provider, I want to manage the resource usage for a project as a single entity comprising the sum of its users" - is that what they are missing?
07:36:26 I think the user story is completely valid but it is not, from my perspective, the HTC/HPC user story for quotas
07:36:41 hi there, sorry for being late, Dario from EMBL-EBI here
07:36:49 verdurin, i agree - Blazar seems like one of those ideas where lots of people like/want the features, but because it isn't part of Nova they are not sure it's safe to integrate. i also think it's hard to (quickly, from scanning the wiki) determine the impact of deploying it from the user and operator perspective
07:37:29 hi dariov, thanks for coming!
07:38:20 noggin143, Symphony sounds very interesting from the NeCTAR perspective
07:38:28 noggin143: Do you think that is a separate user story or should it be proposed as a consideration for this story?
07:38:29 hadn't heard about that
07:38:41 I went to a talk at the summit about the Chameleon Cloud, if that is the same thing.
07:38:44 #link https://www.openstack.org/videos/video/chameleon-an-experimental-testbed-for-computer-science-as-application-of-cloud-computing-1
07:38:52 They talked about Blazar too.
07:38:59 noggin143: that's the one
07:39:43 Not sure what the product WG's preference would be between increasing the complexity of a user story or defining a new one
07:40:12 I think there are significant additional needs from the research sector which would potentially overwhelm the current user story
07:40:21 i went to the chameleon talk too but had forgotten the mention of preemption
07:40:34 OK we should note some actions, what can we do to move the discussion forward?
07:41:14 I have been in contact with the PI from Chameleon previously, I can find out what the state of Blazar is and report back
07:41:27 #action Tim to follow up with the product WG on research quota user story integration
07:41:35 Thanks Tim
07:41:57 #action oneswig to follow up with Chameleon project re: state of Blazar
07:42:17 Any others?
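On the "project as a single entity comprising the sum of its users" point above, a rough openstacksdk sketch of computing that aggregate view client-side. The cloud name and project ID are placeholders, and the embedded per-server flavor details (vcpus/ram) assume a Nova microversion of 2.47 or later:

    import collections
    import openstack

    # Placeholder cloud name and project ID
    conn = openstack.connect(cloud='mycloud')
    PROJECT_ID = '0123456789abcdef'

    per_user = collections.Counter()
    totals = collections.Counter()

    # Walk all servers, keeping only those in the target project,
    # and sum usage per user and for the project as a whole.
    for server in conn.compute.servers(details=True, all_projects=True):
        if server.project_id != PROJECT_ID:
            continue
        vcpus = server.flavor.vcpus or 0  # None on older microversions
        per_user[server.user_id] += vcpus
        totals['instances'] += 1
        totals['vcpus'] += vcpus
        totals['ram_mb'] += server.flavor.ram or 0

    print('project total:', dict(totals))
    for user_id, vcpus in per_user.most_common():
        print(f'  {user_id}: {vcpus} vCPUs')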
07:42:39 #topic User stories
07:43:13 OK, I did a cursory amount of investigation on this. I think we've covered the user stories from the Product WG already.
07:43:44 They have a process for proposing and reviewing them, based on code review of a repo of RST documents
07:43:53 Very much in the form of spec review
07:44:22 yep, but what of the scope?
07:44:57 It would be interesting to consider what capabilities might be added from a research computing use case
07:45:29 b1airo: do you mean, what do we get out of creating user stories?
07:46:00 no i mean at what level should they be scoped, i.e.
07:46:14 #link user stories filed thus far http://specs.openstack.org/openstack/openstack-user-stories/
07:46:33 i'm a cloud architect who wants to integrate my HPC filesystem into blah datacloud
07:46:43 is that the right level or too specific
07:47:22 Not sure. The documents have a predefined structure, which should guide the level of detail. They aren't more than a few pages
07:47:51 looking at the current proposed list i wonder if that example would be not specific enough
07:48:53 #link random user story about integrating an external capability into a cloud environment http://specs.openstack.org/openstack/openstack-user-stories/user-stories/proposed/high-scale-media-Telco-apps.html
07:49:08 There's quite a bit more detail in that sample-of-one
07:49:43 I think there's room for a variation on https://wiki.openstack.org/wiki/ProductTeam/User_Stories/Onboarding_Legacy_Apps
07:50:07 right, definitely no particular standard scoping there then, it would seem
07:50:08 where you're migrating from bare metal, not virtualized
07:50:31 (I haven't looked at these links before, so forgive me if that's obvious)
07:51:03 The Product WG is unfamiliar to me as well
07:51:32 i'm a bit confused about what http://specs.openstack.org/openstack/openstack-user-stories/ is supposed to be - they seem to really be feature specs rather than particular user stories
07:52:00 or, a mix
07:52:19 I think there's more discussion to be had here but we ought to move on. Can we take an action to think around how/where user stories might help?
07:53:03 #action Group action to consider ways in which user stories might help for research computing use cases
07:53:18 sounds good
07:53:45 Reference architectures: Tim, I found a couple of places where CERN's architecture (among others) is described in fairly high-level detail in ops guides.
07:53:59 my initial feeling was that, in the first instance, it would help to simply document some of the deployment architectures/choices that would be relevant to research/hpc use-cases
07:54:12 Is there a specific place where reference architectures are kept, and kept current?
07:54:30 b1airo: agreed
07:54:32 things that answer the "can i do hpc with openstack?" question
07:54:34 the CERN architecture has some oldish use case descriptions.
07:54:44 it is more HTC than HPC though
07:55:09 noggin143: true - I'm just thinking if others were to be created, where would they reside?
07:55:44 oneswig, wiki for the moment i guess
07:55:48 #link https://www.openstack.org/user-stories/ is one place with a reasonably high profile
07:55:59 but this is not really reference architectures.
07:56:21 Good link. Is this a discussion item for the user committee mailing list?
07:57:17 my feeling is that probably the formal HPC reference architecture should go into the project navigator
07:57:20 #link https://www.openstack.org/software/sample-configs
07:57:31 oh PS, not sure if i mentioned or knew this last week, but i'm now looking after https://wiki.openstack.org/wiki/Documentation/HypervisorTuningGuide - seems relevant for us to steer under this group
07:57:33 I have found considerable effort has gone into documenting configuration items, but as for a reference architecture as an integrated entity - all the ones I have found refer to (e.g.) Grizzly or Havana
07:58:01 oneswig, yes, good thing to raise with the user-committee list
07:58:53 noggin143, yes Flanders wants a few things there by the sound of it
07:58:57 noggin143: I hadn't seen the sample-configs page before, thanks for sharing.
07:59:28 the sample configs are nice in that they are active content and link to the project selection
07:59:52 Yes, hopefully they won't go stale
08:00:08 I wonder why, when I search for these things, I did not find this
08:00:11 noggin143, agreed, all that stuff under /software is a slick addition
08:00:17 guys, I'm quite new to how this kind of project works, but let us know if we can help in some way
08:00:30 any pointer to a good entry point would be helpful, by the way
08:00:39 dariov, so are we ;-)
08:00:44 dariov: are you on the user-committee or openstack-operators lists? Good place to start
08:00:51 yep, there already
08:00:54 we are out of time, thanks everyone
08:01:03 Any final comments?
08:01:04 ok, great, that's a general feeling ;-)
08:01:21 I'm interested in the white paper mentioned in the agenda
08:01:41 We'll have to follow up on that another time
08:01:43 oneswig, yeah the google is a problem, presumably foundation folks are already looking at SEO
08:01:46 Yes.
08:02:11 i did recently add a link from the wiki getting started page to /software
08:02:28 #endmeeting