11:00:43 #startmeeting scientific-sig
11:00:44 Meeting started Wed Dec 4 11:00:43 2019 UTC and is due to finish in 60 minutes. The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
11:00:45 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
11:00:47 The meeting name has been set to 'scientific_sig'
11:00:51 greetings
11:00:54 \o
11:00:58 o/
11:01:03 #link Agenda for today https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_December_4th_2019
11:01:15 o/
11:01:20 How are we all :-)
11:02:08 Ok, let's get the ball rolling
11:02:11 mostly noticing winter this morning :)
11:02:13 #topic Unified limits and nested projects
11:02:20 johngarbutt: thanks for coming along
11:02:41 Can you describe the issue here?
11:02:42 it's long overdue, I keep trying to make it, so glad to be here
11:02:53 #link https://review.opendev.org/#/c/602201/
11:03:07 so here is the Nova spec for the Nova part of unified limits
11:03:37 but a bit of back story first
11:03:41 #link https://docs.openstack.org/api-ref/identity/v3/#unified-limits
11:03:58 we had a bit of a winter snap on the first two days of summer here - but back to good now :)
11:03:58 Keystone has now added APIs to support "unified limits"
11:04:24 the aim here is to make limits consistent between all the projects
11:04:37 at least all projects that adopt unified limits, anyway
11:05:01 ... with that in place we can then implement things like hierarchy in a way that we all agree on
11:05:16 a long overdue capability!
11:05:20 a key part of that is the oslo.limit library
11:05:52 so the big news is I have some patches up
11:05:54 https://review.opendev.org/#/c/695527/
11:06:14 there we have a long chain of patches that implements the limit enforcement
11:06:40 the final one being a sketch of how we do the two-level limit enforcer using the proposed limit API
11:07:20 ...
11:07:33 so a patch that ties the keystone work and oslo.limit work into a project is here
11:07:34 https://review.opendev.org/#/c/615180/
11:07:46 this is how I propose Nova could adopt oslo.limit
11:08:04 it builds on the great work melwitt has been doing in recent cycles
11:08:16 so it is "placement centric"
11:08:29 by which I mean the nova unified limits are based on resources in placement
11:08:36 this is great!
11:08:39 VCPUs, PCPUs, etc
11:08:43 Finally the spec has a +2 after all these iterations. Congrats :) Do you think it can be implemented/merged during this cycle?
11:08:58 and importantly, custom baremetal flavors also
11:09:40 belmoreira: that is my aim, but the second +2 is taking its time, I need to try and find a second supporter
11:10:09 does the custom baremetal flavor give me the ability to limit what machines can be taken (e.g. any VM resource but only 48 cores of bare metal)?
11:10:29 noggin143: almost...
11:10:38 johngarbutt: when this lands, how would a quota be set on a custom resource class?
11:10:50 each baremetal flavor has a custom resource class, and you can limit on that
11:10:55 oneswig: yes
11:11:20 so for baremetal you would limit the number of machines of a given type
11:11:32 meaning: project X can use Y "custom resource class"
11:11:37 like CUSTOM_BAREMETAL_R630_SILVER or whatever
11:12:03 yeah, you can use N of any resource requested in the flavor
11:12:18 it means baremetal flavors would no longer use any VCPU
11:12:24 that is now just for VMs
11:12:36 in the first release, this will all be optional
11:12:48 i.e. you have to opt into the new behavior
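
For readers who want to see the enforcement flow in practice, here is a minimal sketch, assuming the oslo.limit interface proposed in the patch chain linked above (a usage callback plus an enforce() call with the deltas a request would consume). The project ID, resource names, and usage counts are illustrative, not Nova's actual implementation.

    # Minimal sketch of limit enforcement via oslo.limit, assuming the
    # interface in the patch chain above. The library reads its Keystone
    # connection details from the [oslo_limit] config section.
    from oslo_limit import exception
    from oslo_limit import limit

    def usage_callback(project_id, resource_names):
        # Nova would count the project's allocations in placement here;
        # hard-coded to zero for illustration.
        return {name: 0 for name in resource_names}

    enforcer = limit.Enforcer(usage_callback)
    try:
        # A VM flavor consuming 4 VCPUs, then a baremetal flavor consuming
        # one node of a custom resource class, each checked against the
        # limits registered in Keystone (names here are hypothetical).
        enforcer.enforce('my-project-id', {'VCPU': 4})
        enforcer.enforce('my-project-id', {'CUSTOM_BAREMETAL_R630_SILVER': 1})
    except exception.ProjectOverLimit:
        pass  # reject the request that would exceed the limit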
11:13:47 it would be great to get people's support on the spec
11:14:14 by support, I mean constructive feedback too, that is always better than blind support :)
11:14:34 #link https://review.opendev.org/#/c/615180/
11:14:44 oops
11:15:01 #link https://review.opendev.org/#/c/602201/
11:15:04 that is the spec
11:15:38 are there any use cases people are worried about that we should go through, to make sure they are covered?
11:15:59 hopefully the baremetal one makes some sense, or at least, I could go through that again if it wasn't that clear?
11:16:34 how about GPU or other specific resources support?
11:16:39 I think it splits into two main use cases - common resource quotas across baremetal and virt resources, and GPUs.
11:16:45 great question
11:16:47 brb
11:17:10 we can use resource classes but that looks very limited
11:17:23 so right now they are not modeled in placement, so it's trickier
11:17:49 there is a plan in Nova to allow you to add CUSTOM_RESOURCE_CLASS_X into your hypervisor resource provider
11:18:15 i.e. CUSTOM_GPU_V100 = 8
11:18:22 then your flavor could request those
11:18:45 basically pizza slicing your machine, using custom resource classes
11:18:53 and what about modeling this with "traits"?
11:19:26 so traits... at the moment there isn't anything, but there is a follow-on idea I had
11:19:35 let's consider the server count limit
11:19:47 today we count all servers
11:19:58 ... but we could extend that to a trait-based count
11:20:11 how many instances do you have that request a given trait
11:20:26 which might be quite powerful, what do you think?
11:21:23 I think that will be more powerful/flexible than the resource_class for this use case
11:21:32 I should be clear, the current aim for this cycle is to do the minimum unified limits work, to open up new options, it just happens the easy way to implement this means stuff like CUSTOM_RESOURCE_X will just work too
11:22:16 belmoreira: cool, I am just glad it doesn't sound crazy, it's certainly a follow-on feature (that may need placement changes)
11:22:36 +1 - let's get it in and then enhance - the specs look like they cover our base use case from Boston - https://techblog.web.cern.ch/techblog/post/nested-quota-models/
11:24:13 this all looks good to me. I don't have the GPU part thought through, but the baremetal part sounds reasonable and useful
11:24:14 I understand that... and it would be huge work to have it. Thanks for working on it. I was just trying to not forget the other use cases (GPUs)
11:24:27 noggin143: hopefully this (once finished) would be the main bit you needed for nested: https://review.opendev.org/#/c/695527/
11:24:45 johngarbutt: yup
11:24:48 belmoreira: ++
11:25:10 where would we need to lobby for the second +2?
11:25:26 belmoreira: part of me hopes GPUs become a resource class at some point (basically)
11:25:51 janders: I would say the nova meeting, but maybe a +1 on the spec would help too
11:26:27 I should set a bit of context on that though...
11:26:57 for those of us who know this bit of Nova inside out, it takes an hour or two to page back in all the context needed to work on these things
11:27:25 it's a "bit tricky", so I know it's asking quite a lot of someone to review it properly
11:27:40 but it's clearly important to lots and lots of people!
11:28:09 any indications from the other projects or are they waiting for Nova?
11:28:21 would bringing this to the attention of the RHAT guys help?
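
To make the pizza-slicing discussion above concrete, here is an illustrative, self-contained Python sketch of how a flavor's resources:* extra spec overrides turn into the per-resource deltas a unified-limits check would count. The flavor definitions are made up, though zeroing out VCPU/MEMORY_MB/DISK_GB is the documented pattern for baremetal flavors.

    # Illustrative only: merge a flavor's "resources:*" extra spec
    # overrides with its standard resources to get the deltas a
    # unified-limits check would see.
    def limit_deltas(flavor_extra_specs, standard_resources):
        deltas = dict(standard_resources)
        for key, value in flavor_extra_specs.items():
            if key.startswith('resources:'):
                deltas[key[len('resources:'):]] = int(value)
        # Drop zeroed resources: they no longer count against limits.
        return {name: amount for name, amount in deltas.items() if amount}

    # A VM flavor slicing one virtual GPU off a CUSTOM_GPU_V100 = 8 inventory:
    gpu_flavor = {'resources:CUSTOM_GPU_V100': '1'}
    print(limit_deltas(gpu_flavor, {'VCPU': 8, 'MEMORY_MB': 16384}))
    # {'VCPU': 8, 'MEMORY_MB': 16384, 'CUSTOM_GPU_V100': 1}

    # A baremetal flavor: zero the VM resources, claim one whole node:
    bm_flavor = {'resources:VCPU': '0', 'resources:MEMORY_MB': '0',
                 'resources:DISK_GB': '0',
                 'resources:CUSTOM_BAREMETAL_R630_SILVER': '1'}
    print(limit_deltas(bm_flavor, {'VCPU': 48, 'MEMORY_MB': 262144,
                                   'DISK_GB': 500}))
    # {'CUSTOM_BAREMETAL_R630_SILVER': 1}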
11:28:45 melwitt is pushing this on her side, asking your contacts could only help I guess.
11:29:07 ack
11:29:15 noggin143: good question, cinder is probably the next people, I haven't seen anyone step up yet
11:29:45 cinder quota is still old nova quota :)
11:29:50 apologies... back now
11:30:03 Still on GPUs... during the last summit I talked with the cyborg PTL. They were looking into the GPU quota case. It's worth checking with them
11:30:19 yeah, no counting approach there I guess, neutron did a re-write, and cinder added some hierarchy
11:31:28 belmoreira: I keep meaning to revisit the cyborg spec myself, there is some placement integration there, but it's not in the flavor in quite the same way, sadly
11:32:47 right - a general solution seems preferable
11:32:48 there is a discussion there about who creates the claim in placement and who enforces the unified limit, and which things are not in placement (images, maybe), but that is probably easier after we have the easy Nova integration complete
11:33:40 PCI passthrough is unlikely to be cyborg's problem in the near term, the initial focus is FPGAs that need programming, etc
11:33:53 but I could be out of date there
11:34:09 sounds familiar to me.
11:34:17 (where Cyborg is)
11:34:33 my feeling was they wanted to handle it.
11:34:44 yeah, same
11:34:44 I just think it will be very sad if we use cyborg only for GPU quotas
11:35:00 belmoreira: ok, sounds like we are in a similar place
11:35:27 I am hoping to add extra resource classes into flavors and hypervisor inventory for that
11:35:58 I don't have a link to that spec, it might not exist yet
11:36:38 thanks all for your time, it was good to report positive progress :)
11:36:52 Thanks John. One final question
11:36:56 sure
11:37:27 This still feels like it is more than one cycle out - any feeling on an ETA?
11:38:20 I am still hoping one cycle after the spec is approved, to be experimentally available
11:40:01 The greatest help the SIG can be right now is feedback on those specs, correct?
11:40:44 yes
11:40:47 +1
11:42:16 OK - thanks for the update johngarbutt, much appreciated
11:43:13 great work johngarbutt, thank you
11:43:26 thanks all
11:44:05 Shall we move on to telemetry and gnocchi?
11:44:39 #topic the post-gnocchi future
11:45:12 I'm not sure if the right folks are here but this topic was squeezed out of last week's meeting
11:45:24 who currently runs gnocchi?
11:47:09 ... perhaps we should move on
11:47:36 #topic AOB
11:47:40 We would like a working telemetry solution at scale though
11:48:07 I had a question. Does anyone have experience using Mellanox VF-LAG for SR-IOV?
11:49:10 there is a recent thread on openstack-discuss related to Gnocchi
11:49:11 http://lists.openstack.org/pipermail/openstack-discuss/2019-November/010922.html
11:49:13 not yet - I'm interested to hear about it, too :)
11:49:41 Hi witek - ears burning :-) thanks for joining
11:49:54 :)
11:50:49 I'd seen that thread, thanks. Was the final decision to revert to mongodb? That didn't sound ideal. Or was it still undecided?
11:51:28 janders: If I get the chance I'll report back. I think b1airo was using it at Monash.
11:51:48 what info are people wanting from the telemetry, mostly usage data? or something more?
11:51:49 That's what the current cores committed to, but they don't want to block other solutions
11:54:00 I keep wanting to see someone process openstack notification streams using kafka, and extract usage info, alongside other gems, but I've not seen anyone try that yet
11:54:34 me too :-) there is a kafka driver for oslo.notifications isn't there?
11:54:56 yeah, although it had all the markings of "please don't use me" when I last looked at it
11:55:08 yes, but not sure if it's maintained
11:55:19 it didn't look very loved
11:55:33 oh dear.
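
As a sketch of the "process the notification stream" idea above, something along these lines should work with oslo.messaging's notification listener API. The kafka:// URL and topic are illustrative, and as the discussion notes, the Kafka driver's maturity deserves checking before relying on it.

    # Sketch: consume OpenStack notifications and extract usage info,
    # using oslo.messaging's notification listener over the Kafka driver.
    import oslo_messaging
    from oslo_config import cfg

    class UsageEndpoint(object):
        """Pull usage-relevant events out of the notification firehose."""

        def info(self, ctxt, publisher_id, event_type, payload, metadata):
            # e.g. compute.instance.create.end carries flavor/tenant info
            if event_type.startswith('compute.instance.'):
                print(publisher_id, event_type, payload.get('tenant_id'))
            return oslo_messaging.NotificationResult.HANDLED

    # Broker URL and topic are assumptions for illustration.
    transport = oslo_messaging.get_notification_transport(
        cfg.CONF, url='kafka://kafka-host:9092/')
    targets = [oslo_messaging.Target(topic='notifications')]
    listener = oslo_messaging.get_notification_listener(
        transport, targets, [UsageEndpoint()], executor='threading')
    listener.start()
    listener.wait()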
11:56:48 witek: how's the events API for Monasca, is that getting active work this cycle?
11:57:38 the API is ready, what we need is some kind of standardized collector
11:58:18 it is planned for this cycle, but no one is assigned yet
11:59:33 you mean like oslo.notifications -> monasca events API?
11:59:38 right
11:59:47 I see, so notification events, published in the usual way (amqp?), would be gathered by this collector and transferred to the monasca API
11:59:48 oops
12:00:20 ah, we are on the hour. Final thoughts?
12:00:40 any volunteers for witek? :-)
12:00:51 (or John for that matter)
12:01:06 OK, thanks all
12:01:09 #endmeeting