06:59:39 #startmeeting scientific-wg
06:59:40 Meeting started Wed Jun 8 06:59:39 2016 UTC and is due to finish in 60 minutes. The chair is blair. Information about MeetBot at http://wiki.debian.org/MeetBot.
06:59:41 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
06:59:44 The meeting name has been set to 'scientific_wg'
06:59:48 #chair oneswig
06:59:49 Current chairs: blair oneswig
07:00:16 Hello
07:00:21 Good morning!
07:00:29 #topic roll-call
07:00:42 Who is here today?
07:00:44 present!
07:00:49 hello
07:01:20 we have an apology from Tim Bell, which is a bit unfortunate because I think he was pretty interested in Blazar
07:01:36 I am Pierre Riteau from the Chameleon project
07:01:50 I was invited by Stig Telfer to join the meeting
07:02:00 priteau: thanks for coming Pierre
07:02:12 +1
07:02:58 ... attendance is light this week ...
07:03:29 Adam from Francis Crick here - morning
07:03:37 Hi Adam
07:03:42 G'day Adam
07:04:01 Good morning Adam
07:04:30 Shall we get going?
07:04:37 so maybe i can quickly tell you about the survey thing i added to the agenda (as i'll need to exit soon) and then we can move onto blazar?
07:04:58 #topic NeCTAR NPS
07:05:02 take it away Blair
07:05:20 just wanted to mention this so it's on the radar really
07:05:51 for those who don't know, the NeCTAR Research Cloud is a Science Cloud open to all government funded researchers in Australia
07:06:34 it has been operating since 2012 and scaled up between 2012-2015 to 8 sites and ~35k cores
07:06:58 we recently did our first major user survey
07:07:31 contacted over 6000 people/emails and got a pretty decent 600+ responses
07:07:47 that's a lot of datapoints
07:08:16 IIRC the OpenStack user survey is roughly the same sample size
07:08:28 it was a Net Promoter Score survey, so it's quite light in terms of participation, basically you just score the thing on a scale of 1-10
07:08:50 anything 6 or less is considered a detractor
07:09:02 i think 7 and 8 is neutral
07:09:09 9 and 10 is promoter
07:09:29 (i could have those slightly out, might only be 7 that is neutral)
07:09:50 anyway, you end up with a score between -100 and +100 for the survey overall
07:10:07 and we got +24, which we're pretty happy with
07:10:38 there's also a "more info" field, and then detractors were followed up for more specific info if willing
07:10:53 That counts as approval, ie 0 is neutral in the final scoring right?
07:11:04 oneswig: yep that's right
07:11:20 that score in general NPS terms is quite good i believe
07:11:33 Great to hear it. Any ideas on why?
07:11:55 though compared to similar infrastructure things it's a bit lower apparently
07:12:23 most of the detractors had usability or stability gripes
07:12:56 At least they are fixable, to some degree
07:13:32 the stability ones could be largely due to early problems with storage in the initial zone throughout 2012, but also probably because this cloud operates across university datacentres and each year there is usually at least one or two zones offline for a weekend (e.g. power works)
07:13:32 blair: will you be re-running this annually, six-monthly...?
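The scoring described in this topic can be sketched in a few lines of Python. This is only an illustration: it assumes the usual Net Promoter convention (a 0 to 10 scale, 9-10 counted as promoters, 7-8 as passives, 6 or less as detractors), the function name is made up for this example, and the sample scores are invented rather than taken from the NeCTAR survey.

```python
def net_promoter_score(scores):
    """Return the NPS for a list of integer scores (assumed 0-10 scale).

    NPS = percentage of promoters (9-10) minus percentage of
    detractors (6 or less); passives (7-8) only count in the total.
    """
    if not scores:
        raise ValueError("at least one response is required")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)


# Invented sample: 2 promoters, 2 passives, 1 detractor out of 5 responses.
sample = [10, 9, 8, 7, 6]
print(net_promoter_score(sample))
```

With every respondent a promoter the result is +100, with every respondent a detractor it is -100, and equal numbers of each give 0, which matches the "0 is neutral" reading in the discussion.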
07:13:45 yes that's the intention now
07:13:56 priteau: any idea how satisfied chameleon users are?
07:14:07 but of course just targeting active users and/or users who have recently become inactive
07:14:48 oneswig: We haven't run a survey across all users yet. We got some good feedback from users who successfully used the testbed to run experiments and published their results
07:15:09 anyhow, wrapping up this item. we have a good pile of data here that may be useful to the UX team, so i'm working to de-identify it and pass it to Piet
07:15:33 That's a great idea Blair, get more out of it that way
07:16:01 priteau: I saw Paul Ruth talk at Austin with Kate, he seemed pretty pleased
07:16:07 yeah, as you can imagine a lot of researchers have trouble when you give them Horizon
07:16:37 #idea is there any existing way to feed data like this as input to UX?
07:17:06 it doesn't seem like it, currently have a thread going asking various foundation staff
07:17:42 It's a really useful thing to do, I'll ask around at Cambridge to see if anything similar happens
07:17:46 ideally it would be passed to the foundation and then they'd pass it to blessed projects, so there are no issues with personal liability
07:18:33 Any more to cover on this agenda item?
07:18:53 no let's make use of priteau while we've got him :-)
07:19:00 Blair I'll note an action and move on
07:19:24 #action blair's going to anonymise NPS data for UX purposes
07:19:39 #topic Blazar and use cases
07:19:56 priteau: can you give an overview of what it does for Chameleon?
07:20:06 Absolutely
07:20:35 So as you may know the Blazar project implements Reservation as a Service for OpenStack
07:21:17 In Chameleon we make use of its ability to reserve physical hosts
07:22:28 Before running any experiments, Chameleon users interact with Blazar (generally through the Horizon dashboard, but it can also be from the CLI) to reserve one or several physical nodes for their experiments
07:22:55 The nodes are exclusively available to them for the duration of their reservation
07:23:05 At a future time, or from now for six days (say)
07:23:34 Reservations can start now or at any time in the future
07:24:09 priteau: forgive my ignorance, but how does Blazar actually implement that? does it manipulate aggregates or something to make nodes unavailable to nova scheduler and then force-host onto them as required?
07:24:09 Our main requirement was to allow running large scale experiments, e.g. using all resources of the testbed
07:24:26 And if the reservation is accepted, it's essentially a contract with the infrastructure?
07:24:43 for this users can take a reservation some time in advance, which would not be possible if they were just relying on launching on-demand instances with Nova
07:24:43 blair: looks like it uses aggregates
07:25:18 blair: yes it manipulates host-aggregates and forces them to be used with a custom Nova scheduler filter
07:25:33 users have to pass a scheduler hint to launch instances inside a specific reservation
07:25:52 (we've made this process easy to do through Horizon)
07:26:36 priteau: once a reservation has become active, can it be changed?
07:26:40 if no scheduler hint is given, Nova will schedule within the set of hosts not managed by Blazar, which in our case is empty, making it mandatory to use Blazar
07:26:53 For example, if users find they need more resources than they had predicted.
07:27:26 verdurin: only in time, i.e. you can shorten or extend the duration
07:27:45 but they cannot be modified in space
07:28:02 priteau: but there's no segregation between reserved hosts and "on-demand" hosts in the same project, right?
07:28:06 this is a feature we would like to have but it is not on our short term roadmap yet
07:28:39 priteau: at the moment, the workaround would be to create a new reservation with immediate start?
07:28:54 verdurin: correct
07:28:56 ok that's cool, so right now it sounds pretty useful with the main drawback being you have to segregate the reserved infrastructure
07:29:07 oneswig: I am not sure I understand your question
07:29:50 priteau: I meant, all the hosts for a project (blazar-managed or not) could still share the same east-west tenant networks?
07:30:24 would sure be nice if it worked in tandem with a preempt-able instance feature
07:30:42 priteau: can you have soft and hard reservations ?
07:31:10 oneswig: right, networking is completely independent of Blazar
07:31:25 priteau: thanks. Who else is using it, do you know?
07:32:04 blair: I believe it is only hard reservations
07:32:52 blair: are you thinking of something like spot market opportunistic availability?
07:33:05 oneswig: exactly
07:33:14 blair: me too :-)
07:33:39 or at least a way to make sure that ephemeral workload can leverage the reserved infrastructure when there is not a reservation active
07:33:44 yes, that's something we've wanted for a long time
07:34:08 priteau: how much effort do you need to put into maintaining Blazar and keeping it from bit-rot?
07:34:28 oneswig: I had a meeting some time ago with previous contributors to Blazar, from Mirantis & Red Hat. There was someone interested in using it for a project related to NFV, I can't find the name right now
07:34:35 as it sounds like for a general purpose science cloud where you want this functionality you'd need to hold compute capacity aside for blazar from the rest of your usual on-demand fleet
07:35:27 it would definitely be of interest for NeCTAR
07:35:58 priteau: does it support AZs?
07:36:03 priteau: Chameleon uses Ironic, is Blazar managing those compute nodes or other (virtualised) nodes?
07:36:17 oneswig: There was a substantial effort in making Blazar more stable at first, as some basic validation of user input was missing or in some conditions it would fail and leave Nova in a bad state
07:36:55 And as we operate the testbed we regularly make improvements
07:37:21 just last week I fixed an eventlet bug triggered by concurrent operations
07:37:48 blair: I don't know if it supports AZs, we don't use them in Chameleon
07:37:57 priteau: how do you handle backfilling of nodes?
07:38:18 oneswig: in Chameleon, Blazar only manages Ironic bare-metal hosts, not KVM hypervisor nodes
07:39:02 aloga: we don't, but our use case is very specialized, so I see how this would be needed for a general purpose compute cloud
07:39:25 blair: regarding preemptible instances, you should have a look at https://review.openstack.org/#/c/104883/
07:39:26 This use case is interesting for the Intel-Rackspace OSIC. The idea that people could (to some degree) do self-service reservations of bare metal of arbitrary sizes
07:39:55 hello folks!
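The host-selection rule described earlier in this topic (a scheduler hint selects the hosts of one reservation, and no hint means only hosts not managed by Blazar are eligible) can be sketched in plain Python. This is a simplified model of the decision only, not Blazar's or Nova's actual filter code: the function name, the `managed_hosts` set and the `reservations` mapping are hypothetical and stand in for the host aggregates Blazar manipulates.

```python
def host_passes(host, managed_hosts, reservations, reservation_hint=None):
    """Decide whether `host` is eligible for a new instance.

    managed_hosts:    set of host names registered with the reservation service
    reservations:     dict mapping reservation id -> set of reserved host names
    reservation_hint: reservation id passed by the user, or None
    """
    if reservation_hint is None:
        # No hint: only hosts outside the managed set are eligible.
        return host not in managed_hosts
    # Hint given: only hosts belonging to that reservation are eligible.
    return host in reservations.get(reservation_hint, set())


# Hypothetical data: three managed hosts, one reservation, one unmanaged host.
managed = {"node1", "node2", "node3"}
active = {"res-1": {"node1", "node2"}}

print(host_passes("node1", managed, active, "res-1"))  # reserved host, with hint
print(host_passes("node1", managed, active))           # reserved host, no hint
print(host_passes("node9", managed, active))           # unmanaged host, no hint
```

If every host is in the managed set, as priteau says is the case for Chameleon, the no-hint branch never passes any host, which is why a reservation becomes mandatory there.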
07:39:55 note that in Blazar we use the physical host reservation feature, but there is also support for reserving virtualized instances (which I am not familiar with)
07:40:01 priteau: I do see this as a problem, if a user reserves one node 1 week in advance, the node won't be used during that week
07:40:41 aloga: thanks for pointing that out, will definitely review
07:40:44 blair: and of course at https://github.com/indigo-dc/opie
07:40:57 aloga: I may have misunderstood what you meant by backfilling. If user Alice reserves a node one week in advance, Bob can still reserve the same node now until next week
07:41:22 Blazar makes sure that reservations don't overlap
07:41:27 priteau: maybe it is my lack of knowledge about blazar (I joined late and I was having a look at the logs)
07:41:52 priteau: if alice reserves the node in today + 1 week and nobody else reserves that node using blazar
07:42:18 the node is not going to be used, as it will be removed from the "normal" nova aggregate
07:42:20 right?
07:42:48 aloga: right, so if no other Blazar reservation is made for that node, then it isn't used, at least using the physical host reservation feature of Blazar
07:42:59 as I said I am not familiar with the other modes of reservation
07:43:50 aloga: opie looks interesting, am i right in thinking it extends the existing nova-scheduler? some of the code looks familiar
07:43:51 priteau: thanks for the information :)
07:44:03 oneswig: note that for making Blazar and Ironic "play nice together" we had to make some customizations. Blazar expects one nova-compute per physical node, while Ironic is designed to run one nova-compute per cluster
07:44:20 priteau: it sounds like the issue here is that nodes that are reservable by blazar must be partitioned from others, is that the case?
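The Alice and Bob example above comes down to an interval-overlap check per node. A minimal Python sketch follows, assuming half-open intervals (a reservation ending at time T does not clash with one starting at T); the function names and dates are illustrative and this is not Blazar's implementation.

```python
from datetime import datetime


def overlaps(start_a, end_a, start_b, end_b):
    """True if half-open intervals [start_a, end_a) and [start_b, end_b) intersect."""
    return start_a < end_b and start_b < end_a


def can_reserve(existing, start, end):
    """Check a requested reservation against existing (start, end) pairs for one node."""
    if start >= end:
        raise ValueError("reservation must end after it starts")
    return not any(overlaps(s, e, start, end) for s, e in existing)


# Alice reserved the node one week ahead (illustrative dates).
alice = (datetime(2016, 6, 15), datetime(2016, 6, 22))

# Bob asks for the same node from now until Alice's reservation starts.
print(can_reserve([alice], datetime(2016, 6, 8), datetime(2016, 6, 15)))

# A request running into Alice's week is refused.
print(can_reserve([alice], datetime(2016, 6, 8), datetime(2016, 6, 16)))
```

Nothing in this check fills the node when nobody reserves the gap, which is the idle-capacity concern aloga raises.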
07:44:29 priteau: I think that preemptible instances would be a good complement for blazar, as it would allow to fill the node until the reservation starts
07:44:35 blair: yes, you are right
07:44:48 aloga: yes indeed it would be a good complement
07:45:25 blair: the scheduling algorithm is described in the spec
07:45:51 blair: but the plumbing is basically the filtering algorithm
07:45:55 oneswig: yes, when you register hosts to be managed by Blazar then they cannot be used without reservation anymore
07:46:02 s/filtering algorithm/filter scheduler/
07:46:47 priteau: do you have future plans for developing Blazar not covered yet?
07:47:05 aloga: which spec, the one you mentioned earlier?
07:47:09 blair: the code needs a brush up, I am currently working on this, but I'll keep you updated
07:47:11 blair: yes
07:47:41 oh ok, i didn't see any reference to "opie" when i had a quick look
07:47:46 blair: basically opie's code is the materialization of the spec
07:47:57 oneswig: first, I have many patches to Blazar that I would like to push upstream: https://github.com/ChameleonCloud/blazar/commits/chameleon
07:48:49 priteau: does Blazar have CI defined for it?
07:49:40 oneswig: for additional development, we would like to 1) improve the resource selection capabilities of Blazar 2) add support for Keystone v3 domains to the Blazar client
07:50:12 oneswig: see one of the latest patches posted: https://review.openstack.org/#/c/325747/
07:50:27 priteau: just looked - should have done before asking :p)
07:50:56 Any more to cover on Blazar?
07:51:28 #topic Reference architectures and user stories
07:51:36 oneswig: there are unit tests and tempest tests
07:51:58 OK, so we have some work underway to document our latest reference architecture at Cambridge Uni
07:52:05 Not much to report on that.
07:52:12 Just rolling on.
07:52:40 I am interested to know, is there anything similar for Chameleon, or for Indigo DC,
07:53:07 something that might help the Foundation's pages on how research/scientific compute can be done on OpenStack?
07:53:30 i'm gonna run folks, might be missing last call for beers! ;-)
07:53:39 Thanks blair, cheers
07:53:42 blair: enjoy
07:53:49 oneswig: yes, there is
07:53:53 thanks priteau and aloga!
07:53:56 thanks blair
07:54:02 cya oneswig
07:54:08 bye blair
07:54:35 oneswig: https://arxiv.org/abs/1603.09536
07:55:20 HEP in this case - high energy physics?
07:55:26 oneswig: yes
07:55:29 oneswig: we don't have this publicly available yet
07:55:44 oneswig: I know I need to submit something about eMedLab
07:56:08 oneswig: is there a particular format that would be needed by the Foundation?
07:56:08 verdurin: that would be great!
07:56:33 oneswig: there's also an EU deliverable, but it is a much larger version
07:57:15 #link www.openstack.org/user-stories - there are case studies linked to from there
07:57:28 oneswig: however, the paper is focused on HEP, but indigo goes far beyond HEP
07:57:50 I think there are other places where the pages are more "live" and link to blogs and external pages that are continually updated
07:58:58 It would be a great help to contribute case studies to link use cases with people using OpenStack in that way
07:59:25 for example, I think we've all learned stuff today we didn't expect
07:59:56 Ah, we are out of time for the week.
08:00:15 Thank you for coming!
08:00:45 I'll read on with interest
08:00:46 oneswig: thank you for inviting me! I hope my input on Blazar was helpful
08:00:50 oneswig: thank you for chairing
08:00:56 certainly was
08:00:59 Very interesting, priteau
08:01:07 #endmeeting