21:02:48 #startmeeting scientific_wg
21:02:48 hello all
21:02:49 Meeting started Tue Nov 1 21:02:48 2016 UTC and is due to finish in 60 minutes. The chair is b1airo. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:02:50 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:02:52 Good morning / evening everyone
21:02:52 The meeting name has been set to 'scientific_wg'
21:03:03 Good afternoon anyone?
21:03:04 #chair oneswig
21:03:04 Current chairs: b1airo oneswig
21:03:10 o/
21:03:10 #chair martial
21:03:10 Current chairs: b1airo martial oneswig
21:03:40 #link agenda for today https://wiki.openstack.org/wiki/Scientific_working_group#IRC_Meeting_November_1st_2016
21:03:47 oneswig is stig ?
21:03:51 y
21:03:54 indeed
21:03:59 here is saverio
21:04:04 and agenda dump is:
21:04:05 Selection of New Activity Areas for Ocata Cycle
21:04:06 https://etherpad.openstack.org/p/scientific-wg-barcelona-agenda
21:04:06 Brainstorm of 'Forum' session proposals for Boston
21:04:06 WG Picks from the Summit
21:04:06 Developer sessions
21:04:06 Video Archive
21:04:08 Plans for Supercomputing 2016
21:04:10 Evening social?
21:04:14 Superuser blog post?
21:04:49 #topic new activity areas for ocata
21:05:07 Well, we had a good session on this at the summit
21:05:10 if you can check / add your details to the scientific-wg rolodex that'd be great -
21:05:13 #link https://etherpad.openstack.org/p/scientific-wg-rolodex
21:05:40 See the etherpad link - line 39 onwards - https://etherpad.openstack.org/p/scientific-wg-barcelona-agenda
21:05:43 oneswig: it sure was a busy meeting (we need a bigger room :) )
21:06:06 Perhaps we do - but nobody has said we need a better view
21:06:52 I think we should pick four activities to focus on - seemed like about the right number. Any view on that?
21:07:39 should we go by number of people interested ?
21:07:49 re. room / logistics etc, i think next time we should try for a double session for the meeting
21:08:04 Identity federation was a popular choice, clearly
21:08:11 oneswig, key is having people willing to hold up their hand to play lead on each activity imho
21:08:14 b1airo: longer, rather than wider, I like it
21:08:19 makes sense
21:08:27 oneswig Scientific Datasets
21:08:36 +1
21:08:39 oneswig: big data processing and scientific datasets are two main topics, and they could possibly be under the same umbrella
21:08:43 I am interested in both Identity Federation and Scientific Datasets
21:08:43 martial: can you elaborate on what's involved there?
21:09:03 and GPGPU
21:09:11 those were the top three it seems
21:09:24 i think that's the bring the compute to the data idea, oneswig
21:09:53 and especially for large science clouds having interesting datasets easily accessible brings users
21:09:54 so more than a directory of who hosts what data then? Something automated
21:10:02 oneswig: was just listing the top topics from the etherpad
21:10:21 martial: ah ok
21:10:23 yeah i have no idea what practices are out there today - i guess that's where we'd start
21:10:26 http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-public-data-sets.html <-- thoughts martial
21:10:32 oneswig: the subtitle was "is anyone hosting or consuming Scientific Datasets with OpenStack?"
21:10:43 martial: I am
21:10:52 so the idea is that Amazon is doing this
21:10:54 https://aws.amazon.com/public-data-sets/
21:10:58 we want to host datasets
21:11:07 otherwise users will go away to the amazon cloud
21:11:13 +1
21:11:21 I am hosting the Common Crawl dataset (part of it, 30 TB)
21:11:27 yes i'm sure we're all consuming them, but whether our clouds are hosting them in an accessible manner is another question, and then what are the choices/best-practices for hosting/moving/sharing
21:11:36 zioproto: do you know how they are exposed in AWS?
21:11:44 oneswig: yes with S3
21:11:57 zioproto: objects then - interesting
21:11:57 that is why amazon implements the Hadoop connector to S3
21:12:01 this would be a nice topic for engaging with NRENs too i suspect
21:12:13 the page zioproto shared lists the information and datatypes available on S3
21:12:17 b1airo: yes, NRENs are very interested
21:12:42 zioproto: do you envisage a port to swift of the same? I assume more is needed wrt metadata?
21:13:21 I also tested swift
21:13:28 but the Hadoop connector for swift is buggy
21:13:33 john dickinson (PTL of Swift) would be interested in this
21:13:36 can't work with files bigger than 5 GB
21:13:39 (sorry dfflanders)
21:13:45 there is an open bug for that in OpenStack Sahara
21:14:09 so NRENs got started on this Scientific Dataset collaboration #link https://docs.google.com/document/d/10YVe3Ex0tvR6p12t8kxhwgm_f2Ws_mMX7_w_AMm_BwM/edit?usp=sharing
21:14:09 Note that this is where SwiftOnFile comes into play
21:14:16 zioproto: assume you've added your vote to it?
21:14:25 where do I have to add the vote ?
21:14:41 zioproto https://etherpad.openstack.org/p/scientific-wg-barcelona-agenda
21:14:53 zioproto line 112
21:14:54 zioproto: comment, add yourself as a watcher, etc.
21:15:03 sahara bug #link https://bugs.launchpad.net/sahara/+bug/1593663
21:15:04 Launchpad bug 1593663 in Sahara "[hadoop-swift] Cannot access Swift Static Large Objects" [High,Confirmed]
21:15:20 Otherwise, any large data stored in Swift requires manual chunking
21:15:32 Oh, you filed it. Counts as interest I guess...
21:15:48 oneswig: right, I wrote that stuff
21:16:07 * dfflanders thinks this would be a great poster for Forum in Boston to get PTLs of Sahara and Swift to meet this use case.
21:16:33 my irc client is going crazy over here
21:16:40 I was at the Sahara design session
21:16:41 sounds good to me - and a good activity area
21:16:52 there was not a lot of enthusiasm around fixing this bug
21:16:56 hello notmyname
21:17:05 lol popular notmyname
21:17:13 how can I help?
21:17:13 it is filed against sahara but it looks like hadoop is what is actually failing?
21:17:21 should that bug be pushed upstream to hadoop?
21:17:24 welcome John :)
21:17:41 hi John, thanks for watching!
21:17:46 there is a paternity problem about this swiftfs hadoop code, now it is maintained in Sahara... but should it be an Apache Hadoop thing ?
21:18:08 We have some users interested in the Swift+Hadoop use case in Chameleon as well, I know we checked out a few solutions at some point, I will try to share what we've learned
21:18:24 thanks priteau
21:18:36 clarkb: see here https://etherpad.openstack.org/p/sahara-ocata-edp
21:18:44 at the end they talk about this bug
21:19:01 I would argue it depends on how often you want to access the common dataset
21:19:07 priteau: what is Chameleon ?
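For context on the 5 GB per-object limit and the "manual chunking" mentioned above, here is a minimal sketch of uploading a large dataset file to Swift as a segmented (static large) object using python-swiftclient. The container and object names are hypothetical, and credentials are assumed to come from the usual OS_* environment variables, as with the swift CLI.

```python
# Hypothetical sketch: upload a large dataset file to Swift as a static
# large object (SLO), so each uploaded segment stays under the 5 GB limit.
# Assumes python-swiftclient is installed and OS_* auth variables are set;
# the container and object names below are made up for illustration.
from swiftclient.service import SwiftService, SwiftError, SwiftUploadObject

options = {
    "segment_size": 1024 * 1024 * 1024,  # split the source into 1 GB segments
    "use_slo": True,                     # write a static large object manifest
}

with SwiftService(options=options) as swift:
    upload = SwiftUploadObject("crawl-2016.warc.gz",
                               object_name="segments/crawl-2016.warc.gz")
    for result in swift.upload("public-datasets", [upload]):
        if not result["success"]:
            raise SwiftError("upload failed", exc=result.get("error"))

print("Uploaded as an SLO; readers see one logical object behind the manifest.")
```

Readers then see a single logical object, which is exactly the case the hadoop-swift driver reportedly fails on per the Sahara bug above.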
21:19:21 If it's >1, then there is great benefit to just copying the data out of Swift
21:19:25 Does this mean Sahara does not currently get used for objects of this size?
21:19:27 notmyname, we were just discussing public scientific dataset hosting for openstack clouds and related to that hadoop+sahara - apparently there is a problematic bug in sahara with regard to large objects: #link https://bugs.launchpad.net/sahara/+bug/1593663
21:19:28 Launchpad bug 1593663 in Sahara "[hadoop-swift] Cannot access Swift Static Large Objects" [High,Confirmed]
21:19:32 zioproto: oh if sahara maintains it then you are probably fine. I was going off of the "error is emitted by hadoop process" in that bug
21:19:33 zioproto: testbed for cloud computing research built with OpenStack (http://www.chameleoncloud.org)
21:19:37 Chameleon is NSF funded baremetal as a service for CS researchers in the USA
21:19:53 seattleplus: if the dataset is 200 TB you want to consume it directly from object storage without holding another copy
21:20:34 What is the best way to share knowledge on this topic? etherpad or wiki page?
21:20:43 we were just musing that Swift folks would be interested in this general data hosting topic and may be able to help move any related technical issues
21:20:44 zioproto: Well…again, if it takes 10x longer to process, then that isn't always true
21:20:59 seattleplus: agreed !!!
21:21:03 +1 priteau
21:21:04 it's a matter of *how* you download it. Also swiftfs downloads it in a sense
21:21:05 b1airo: yes, absolutely!
21:21:20 seattleplus: object storage is slow if you have to list many objects in the bucket, it is not trivial how to organize the data
21:21:30 priteau, my two pence = etherpad
21:22:18 chairs, do we see an action coming from this, or potentially a lead who can take this forward?
21:22:31 If an activity area is a combination of a goal and overcoming obstacles, we've got a great case here
21:22:34 priteau, dfflanders - agreed, etherpad for now until we have something written up
21:22:42 etherpad is good for me
21:22:45 +1 oneswig
21:22:55 oneswig, +1 this looks like a good topic
21:23:26 there's lots of little areas inside it as well, e.g., object store design for large clouds etc
21:23:27 So that seems like 2 winners discussed so far - this and identity federation. What shall we cover next, before we get on to assigning leads?
21:23:58 at switch we published a demo tutorial on how to consume data with Hadoop from swiftfs #link https://github.com/switch-ch/hadoop-swift-tutorial
21:24:21 * dfflanders looks at etherpad from barcelona
21:24:25 I had one to propose based on discussions outside of the meeting - telemetry and monitoring. Any interest in that?
21:25:03 oneswig: but that stuff is a bit generic and not really focused on 'scientific wg', am I right ?
21:25:10 oneswig, +1
21:25:12 oneswig, sounds like one for martial
21:25:31 I have an interest in connecting high-resolution telemetry data to workload manager data
21:25:42 Which works its way onto our territory
21:25:52 ok I think I misunderstood the topic
21:25:54 But you're right, it's a problem on everyone's minds in general
21:26:22 oneswig if not immediate interest here I would think this would be worth circulating via the scientific-wg rolodex ?
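As a rough illustration of the swiftfs consumption pattern the SWITCH tutorial above covers, the sketch below reads a Swift-hosted dataset from PySpark via the hadoop-openstack connector. It assumes the hadoop-openstack jar is on the Spark classpath; the "datasets" provider name, Keystone endpoint, credentials and container are all hypothetical.

```python
# Rough sketch of consuming a Swift-hosted dataset from Spark via the
# hadoop-openstack "swiftfs" connector, in the spirit of the SWITCH tutorial.
# Assumes the hadoop-openstack jar is on the driver/executor classpath;
# provider name, Keystone URL, credentials and container are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("swift-dataset-demo").getOrCreate()

hconf = spark.sparkContext._jsc.hadoopConfiguration()
hconf.set("fs.swift.impl",
          "org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem")
hconf.set("fs.swift.service.datasets.auth.url",
          "https://keystone.example.org:5000/v2.0/tokens")
hconf.set("fs.swift.service.datasets.tenant", "science")
hconf.set("fs.swift.service.datasets.username", "reader")
hconf.set("fs.swift.service.datasets.password", "secret")
hconf.set("fs.swift.service.datasets.public", "false")

# URL form is swift://<container>.<provider>/<path>
lines = spark.sparkContext.textFile(
    "swift://public-datasets.datasets/crawl/part-00000")
print(lines.take(5))
```

The key design point in the discussion above is that the compute reads directly from object storage rather than staging a second copy, which is where the static-large-object bug bites.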
21:26:54 dfflanders, mailing list perhaps
21:27:14 prefer not to direct email people unsolicited
21:27:14 dfflanders: I'm hesitant to mail a massive distribution too often - keep the powder dry for when you really need it
21:27:38 timecheck = half way done
21:27:57 oneswig b1airo understood and agreed :)
21:28:02 OK - the other candidates to discuss - perhaps park this one in reserve
21:28:03 oneswig: I am very interested in the telemetry and monitoring
21:28:28 oneswig, what's the problem you are trying to solve here?
21:28:29 oneswig: we are developing tools for that, I "quickly" presented those in the BoF talk
21:29:15 b1airo: user-sourced telemetry data is part of it. Understanding our system's performance is the rest
21:29:42 the other thing i wanted to discuss today is creating a HPC/Research Challenges/Gaps list that we could share with the community
21:29:56 +1
21:30:05 b1airo: Is that an activity area?
21:30:17 for ocata?
21:30:28 oneswig, user-sourced... as in they can feed telemetry data into the system in order to make scaling decisions etc?
21:30:51 More like MPI performance profiling I have in mind
21:31:01 oneswig, it could be a standing activity
21:31:27 What activities were there around GPGPUs - anyone want to make a case?
21:32:08 oneswig, so the underlying problem is that of mashing together data from the user environment and the infrastructure into something meaningful?
21:32:46 b1airo: yes - something to view performance telemetry from many contexts
21:32:56 oneswig, i think writing something up about how to make them work and any caveats would be very useful
21:32:57 and correlate
21:33:13 I often get asked questions like "my app is slower on your VM than on my laptop"
21:33:19 that's an activity i'd be able to lead without too much trouble
21:33:20 b1airo: will do
21:33:34 turns out it was a VM with overcommit and running on an old machine. But for the user it might seem as if everything is the same
21:33:35 #action oneswig to elaborate on the case for telemetry
21:33:46 arcimboldo, yes that statement sounds familiar
21:34:27 In their defense, we are not giving them any tool to predict the performance of the VM, not even after instance creation
21:34:33 The lingering question of "why did my app run slow" can get much trickier to answer in a cloud environment
21:34:33 or to check how fast the VM is
21:34:42 regarding performance, i have some anecdotal evidence from M3 of applications that suffer very bad degradation without the appropriate pinning and topology
21:35:01 e.g. almost an order of magnitude!
21:35:11 b1airo: strewth
21:35:46 yeah, will try and get some real numbers for this particular case before SC
21:36:16 GPGPUs anyone?
21:36:38 oneswig, +1
21:36:47 +1 on the GPUs
21:36:49 to me the problem with GPU is that it's a scarce resource and you can't really do overbooking
21:36:56 +1
21:37:00 testing them in production
21:37:10 I don't know what the general feeling is. I feel like GPUs still work best with a batch system
21:37:33 re. that, the nomad project currently doesn't talk about GPGPU, seems more focused on NFV stuff?
21:37:52 b1airo: it is very undefined yet
21:38:03 martial: what was the session like?
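As a concrete illustration of the "pinning and topology" settings behind the degradation anecdote above, here is a hedged sketch of defining a pinned, single-NUMA-node flavor with python-novaclient. The endpoint, credentials, flavor name and sizes are invented, and the compute hosts must separately be configured for dedicated CPUs (e.g. vcpu_pin_set and suitable host aggregates) for these extra specs to have any effect.

```python
# Hedged sketch: a flavor with CPU pinning and a single guest NUMA node,
# the usual mitigation for the order-of-magnitude slowdowns described above.
# Names and sizes are invented; hosts still need to be prepared for
# dedicated pinning (vcpu_pin_set, host aggregates) for this to take effect.
from keystoneauth1 import loading, session
from novaclient import client

loader = loading.get_plugin_loader("password")
auth = loader.load_from_options(
    auth_url="https://keystone.example.org:5000/v3",
    username="admin", password="secret", project_name="admin",
    user_domain_name="Default", project_domain_name="Default")
nova = client.Client("2", session=session.Session(auth=auth))

flavor = nova.flavors.create("m1.hpc-pinned", ram=65536, vcpus=16, disk=40)
flavor.set_keys({
    "hw:cpu_policy": "dedicated",      # pin guest vCPUs to host pCPUs
    "hw:cpu_thread_policy": "prefer",  # keep SMT siblings together where possible
    "hw:numa_nodes": "1",              # expose a single NUMA node to the guest
})
```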
21:38:05 b1airo: was at the Design summit session
21:38:18 b1airo: they discussed the first steps to create the API
21:38:31 arcimboldo, we run a batch system (SLURM) on top of OpenStack, it then handles the job scheduling for GPGPU flavors
21:38:44 details at: https://etherpad.openstack.org/p/nomad-ocata-design-session
21:39:00 we also have some users who just want a CUDA dev environment, in which case an on-demand VM is good
21:39:22 martial, thanks will take a look
21:39:26 Are we including Xeon Phi in this?
21:39:36 ... anyone for phi?
21:39:43 why not? i have no direct experience, but sounds like it works
21:39:51 me neither
21:39:58 * dfflanders would like to learn more
21:40:09 OK, motion carried. One to go.
21:40:10 so phi seems a different beast in its latest incarnation - not a PCI device / cores don't support VMs
21:40:15 at least with older gen phi, not sure whether the computing model of the current ones works...?
21:40:18 we have done some testing for Deep Neural Networks and our depth was too little to trigger the Phi :(
21:40:21 timecheck = 20 min remaining
21:40:40 Anyone to step forward with a final activity area proposal?
21:40:53 * arcimboldo still doesn't understand what nomad is exactly
21:40:58 we now have 4
21:41:09 what's the list now oneswig ?
21:41:33 id federation, big data / sahara / datasets, gpu, monitoring - in that order
21:41:54 time for people to o/ as activity leads?
21:41:55 arcimboldo: an abstraction layer for hardware, see https://wiki.openstack.org/wiki/Nomad
21:42:32 also noting that this agenda item might need to be carried on to the next meeting due to jetlag and tz differences?
21:42:50 martial, a kernel then? :)
21:42:55 martial: it is a hardware salad, GPUs for HPC and network chips for IPsec encryption ...
21:43:01 dfflanders: good point, perhaps ratify a list next week
21:43:36 OK - any more on activity areas?
21:43:58 Do I have any volunteers for leading?
21:44:15 I'm happy to do monitoring (since I'm doing it at work)
21:44:17 happy to take on some gpu work!
21:44:32 oneswig: can you explain exactly what one should do when leading ?
21:45:14 zioproto: good point, here's what I'd guess - track activities, represent the group, help advocate the use case at the forum in boston
21:45:28 zioproto: participate in meetings, discuss the topics on MLs and interact with people
21:45:34 zioproto, herd cats towards some sort of outcome, however minor that might be...? :-)
21:45:42 +1 oneswig zioproto my two pence : at a minimum attend meetings and report on progress + push forward any arising actions.
21:45:47 in the case of big data, make sure the issue you raised is followed through for example
21:46:00 zioproto: given that you are working on it already, sounds like a good fit too
21:46:13 ok, I already do all this stuff but I am afraid to officially take the lead, can't be sure I will be there in Boston
21:46:13 blair, +1 cat herding ftw ;-D
21:46:22 b1airo: at times, herding cats sounds easier though
21:46:22 I'll speak for Scientific Datasets
21:46:43 I will help support you zioproto if you lead
21:46:48 zioproto: I'm unable to attend Boston either, alas - ain't going to stop me :-)
21:46:53 powerd, sorry if i've missed this, but who are you? :-)
21:47:09 zioproto: has taken the king's shilling :-)
21:47:12 zioproto, if it's travel funding we should get your application in for travel support
21:47:13 may I confirm this lead thing at the next IRC meeting ? we have the topic also in the next agenda ?
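Relatedly, a "GPGPU flavor" of the kind scheduled via SLURM above usually amounts, on the Nova side, to a flavor requesting a PCI passthrough alias. The sketch below assumes nova.conf on the hypervisors and controllers already defines a PCI alias (here a made-up "k80") and a matching passthrough whitelist; every name and size is hypothetical.

```python
# Sketch only: a GPU flavor expressed as a PCI passthrough request.
# Assumes nova.conf already defines a PCI alias "k80" and a matching
# pci passthrough whitelist on the GPU hypervisors; names are invented.
from keystoneauth1 import loading, session
from novaclient import client

auth = loading.get_plugin_loader("password").load_from_options(
    auth_url="https://keystone.example.org:5000/v3",
    username="admin", password="secret", project_name="admin",
    user_domain_name="Default", project_domain_name="Default")
nova = client.Client("2", session=session.Session(auth=auth))

flavor = nova.flavors.create("g1.k80x1", ram=32768, vcpus=8, disk=40)
flavor.set_keys({
    "pci_passthrough:alias": "k80:1",  # request one device matching the alias
    "hw:cpu_policy": "dedicated",      # pinning still matters for GPU codes
})
```

Because each passed-through GPU is tied to one instance, there is no overcommit, which is the scarcity point raised in the discussion above.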
21:47:33 zioproto: sure, no problem
21:47:52 oneswig: we will talk about the monitoring, I am working on telemetry too, would love to know more about your model
21:48:16 all - regarding weekly meetings and agendas, oneswig and i sort of realised late last cycle that we should be pretty much repeating the agenda over the two different time slots so that we give everyone a chance to be on the same page
21:48:29 aha - yes i should really have opened with that ;) David Power (working with a number of Unis on HPC, a number of whom are now using OpenStack)
21:48:30 martial: that would be great - I've already made an action to elaborate on it...
21:48:42 +1 blair the WG has gotten to that size/distribution, it is unavoidable
21:48:48 (based in Ireland)
21:48:57 powerd, cool! you are a consultant then?
21:49:24 would you mind adding yourself to https://etherpad.openstack.org/p/scientific-wg-rolodex ?
21:49:42 dfflanders: the travel expenses should not be a problem, I just have to check other activities on my radar :) thanks for asking
21:50:03 powerd: a hearty welcome to the WG!
21:50:16 Sort of, few hats including Boston Ltd / vScaler
21:50:20 zioproto, kewly-o
21:50:23 thanks Stig!
21:50:32 interestingly, telemetry is not on the Etherpad
21:50:38 down to 10 minutes ?
21:50:39 +1 welcome welcome
21:50:41 OK nobody yet has come forward for our most popular - identity federation. Nobody else have this in their in-tray?
21:50:57 this is worth noting re the Boston Cloud Declaration meeting as well.
21:51:02 martial: no - only remembered it afterwards! Will retrofit
21:51:03 at SWITCH we want to implement it asap
21:51:33 oneswig, we can wait to see if anyone else volunteers interest next week or on the mailing list
21:51:36 zioproto, Jisc might support this work?
21:51:45 also worth noting Khalil's work on this already
21:51:49 but we have to carefully integrate it in our production system, it is not an easy migration from the existing running stuff
21:52:01 sorry rather powerd --> jisc might support the identity fed work?
21:52:13 guys when you say the mailing list, which ML exactly ?
21:52:20 b1airo: next week looks good for a leader
21:52:31 +1
21:53:02 zioproto: usually operators gets the right audience
21:53:05 also good to get redundancy with two leaders per activity where possible \o/ FTW
21:53:12 dfflanders: most tools are there now with Mitaka. We need to upgrade the production infrastructure to Mitaka and start the testing of Keystone+Shibboleth.
21:53:13 powerd, great, and welcome! we can collaborate on the GPU stuff
21:53:16 that is the current status
21:53:45 b1airo: thanks - looking forward to it!
21:53:45 ok we need to wrap up
21:53:55 last quick discussion on SC
21:54:00 #topic SC16
21:54:03 with ID federation, what I miss most is the information in one place
21:54:10 Khalil is helping put together a congress for international federation between academic clouds...
21:54:20 oneswig, +1 an overview doc would be a great start!
21:54:25 here is what I have so far, missing anything? Telemetry oneswig martial
21:54:25 Scientific Dataset zioproto
21:54:27 GPGPU powerd
21:54:29 dfflanders: do you have a link ?
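For the Keystone+Shibboleth testing described above, the Keystone side of identity federation reduces to registering an identity provider, an attribute mapping and a protocol. A hedged sketch with python-keystoneclient follows; every ID, group and attribute name is invented, and the Shibboleth/Apache configuration in front of Keystone is assumed to be in place separately.

```python
# Hedged sketch of the Keystone side of a Shibboleth/SAML federation:
# register the IdP, map the remote REMOTE_USER attribute to a local group,
# and attach the saml2 protocol. All IDs, names and URLs are invented.
from keystoneauth1.identity import v3
from keystoneauth1 import session
from keystoneclient.v3 import client

auth = v3.Password(auth_url="https://keystone.example.org:5000/v3",
                   username="admin", password="secret", project_name="admin",
                   user_domain_name="Default", project_domain_name="Default")
keystone = client.Client(session=session.Session(auth=auth))

keystone.federation.identity_providers.create(
    id="campus-idp", enabled=True,
    remote_ids=["https://idp.example.org/idp/shibboleth"])

rules = [{
    "local": [{"user": {"name": "{0}"}},
              {"group": {"id": "FEDERATED_USERS_GROUP_ID"}}],
    "remote": [{"type": "REMOTE_USER"}],
}]
keystone.federation.mappings.create(mapping_id="saml-mapping", rules=rules)

keystone.federation.protocols.create(
    "saml2", identity_provider="campus-idp", mapping="saml-mapping")
```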
21:54:46 ping me email, I'll FWD info
21:54:51 martial: looks good to me
21:55:00 flanders@openstack.org
21:55:06 add b1airo to GPGPU I think
21:55:35 +1 NVIDIA can sponsor the next social evening ;-)
21:55:39 yeah, have some work to do on that this cycle
21:55:48 +1000
21:55:58 zioproto: I think Simon has had contact with Khalil's discussions from previous WG IRC meeting
21:56:01 nobody on federation ?
21:56:02 blair, I've already pinged Mike
21:56:15 martial: not this week - carry over
21:56:26 martial: zioproto also pending confirmation next week
21:56:50 dfflanders, for sydney summit?
21:57:10 okay added to etherpad
21:57:25 dfflanders: there was some discussion on Foundation helping organise WG evening socials - hope it won't get forgotten!
21:57:36 blair, for all of 2017 ;-)
21:57:46 b1airo: what to raise on SC activities?
21:58:13 oneswig, main thing was that the foundation seemed keen to do some promotion via superuser blog or some such
21:58:23 #action foundation via dfflanders to support organising sponsorship for scientific_wg social evening in Boston
21:58:33 good point b1airo
21:58:44 needs doing by next meeting I guess?
21:58:49 there were some comments added to the etherpad #link https://etherpad.openstack.org/p/scientific-wg-supercomputing16
21:58:55 yeah asap i'd say
21:59:21 Who was the superuser contact? Nicole?
21:59:30 Allison
21:59:34 or Nicole
21:59:41 #action b1airo to follow up with Denise re. SC activity promotion
21:59:58 I see denise in the etherpad
22:00:06 i'll send some email and CC you guys
22:00:10 Denise for SC, Allison for SU
22:00:13 grand
22:00:26 ok we're out of time
22:00:33 thanks for a very lively session all!!
22:00:36 what a session!
22:00:38 hasta luego
22:00:40 thanks !
22:00:41 bye
22:00:48 good bye
22:00:51 thanks all
22:00:54 ta!
22:00:59 * dfflanders misses jamon
22:01:03 #endmeeting