#openstack-meeting log

11:00:43 <oneswig> #startmeeting scientific-sig
11:00:44 <openstack> Meeting started Wed Dec  4 11:00:43 2019 UTC and is due to finish in 60 minutes.  The chair is oneswig. Information about MeetBot at http://wiki.debian.org/MeetBot.
11:00:45 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
11:00:47 <openstack> The meeting name has been set to 'scientific_sig'
11:00:51 <oneswig> greetings
11:00:54 <noggin143> \o
11:00:58 <johngarbutt> o/
11:01:03 <oneswig> #link Agenda for today https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_December_4th_2019
11:01:15 <belmoreira> o/
11:01:20 <oneswig> How are we all :-)
11:02:08 <oneswig> Ok let's get the ball rolling
11:02:11 <johngarbutt> mostly noticing winter this morning :)
11:02:13 <oneswig> #topic Unified limits and nested projects
11:02:20 <oneswig> johngarbutt: thanks for coming along
11:02:41 <oneswig> Can you describe the issue here?
11:02:42 <johngarbutt> it's long overdue, I keep trying to make it, so glad to be here
11:02:53 <johngarbutt> #link https://review.opendev.org/#/c/602201/
11:03:07 <johngarbutt> so here is the Nova spec for the Nova part of unified limits
11:03:37 <johngarbutt> but a bit of back story first
11:03:41 <johngarbutt> #link https://docs.openstack.org/api-ref/identity/v3/#unified-limits
11:03:58 <janders> we had a bit of a winter snap on the first two days of summer here - but back to good now :)
11:03:58 <johngarbutt> Keystone have now added APIs to support "unified limits"
11:04:24 <johngarbutt> the aim here is to make limits consistent between all the projects
11:04:37 <johngarbutt> at least all projects that adopt unified limits, anyways
11:05:01 <johngarbutt> ... with that in place we can then implement things like hierarchy in a way that we all agree
11:05:16 <oneswig> a long overdue capability!
11:05:20 <johngarbutt> a key part of that is the oslo.limit library
11:05:52 <johngarbutt> so the big news is I have some patches up
11:05:54 <johngarbutt> https://review.opendev.org/#/c/695527/
11:06:14 <johngarbutt> there we have a long chain of patches that implements the limit enforcement
11:06:40 <johngarbutt> the final one being a sketch of how we do the two level limit enforcer using the proposed limit API
11:07:20 <johngarbutt> ...
11:07:33 <johngarbutt> so a patch that ties the keystone work and oslo.limit work into a project is here
11:07:34 <johngarbutt> https://review.opendev.org/#/c/615180/
11:07:46 <johngarbutt> this is how I propose Nova could adopt oslo.limit
11:08:04 <johngarbutt> it builds on the great work melwitt has been doing in recent cycles
11:08:16 <johngarbutt> so it is "placement centric"
11:08:29 <johngarbutt> by which I mean the nova unified limits are based on resources in placement
11:08:36 <belmoreira> this is great!
11:08:39 <johngarbutt> VCPUs, PCPUs, etc
11:08:43 <belmoreira> Finally the spec has a +2 after all these iterations. Congrats :) Do you think it can be implemented/merged during this cycle?
11:08:58 <johngarbutt> and importantly, custom baremetal flavors also
11:09:40 <johngarbutt> belmoreira: that is my aim, but the second +2 is taking its time, I need to try and find a second supporter
11:10:09 <noggin143> does the custom baremetal flavor give me the ability to limit what machines can be taken (e.g. any VM resource but only 48 cores of bare metal)?
11:10:29 <johngarbutt> noggin143: almost...
11:10:38 <oneswig> johngarbutt: when this lands, how would a quota be set on a custom resource class?
11:10:50 <johngarbutt> each baremetal flavor has a custom resource class, and you can limit on that
11:10:55 <johngarbutt> oneswig: yes
11:11:20 <johngarbutt> so for baremetal you would limit the number of machines of a given type
11:11:32 <belmoreira> meaning: project X can use Y "custom resource class"
11:11:37 <johngarbutt> like CUSTOM_BAREMETAL_R630_SILVER or whatever
11:12:03 <johngarbutt> yeah, you can use N of any resource requested in the flavor
11:12:18 <johngarbutt> it means baremetal flavors would no longer use any VCPU
11:12:24 <johngarbutt> that is now just for VMs
11:12:36 <johngarbutt> in the first release, this will all be optional
11:12:48 <johngarbutt> i.e. you have to opt into the new behavior
11:13:47 <johngarbutt> it would be great to get people's support on the spec
11:14:14 <johngarbutt> by support, I mean constructive feedback too, that is always better than blind support :)
11:14:34 <johngarbutt> #link https://review.opendev.org/#/c/615180/
11:14:44 <johngarbutt> oops
11:15:01 <johngarbutt> #link https://review.opendev.org/#/c/602201/
11:15:04 <johngarbutt> that is the spec
11:15:38 <johngarbutt> are there any use cases people are worried about that we should go through, to make sure they are covered?
11:15:59 <johngarbutt> hopefully the baremetal one makes some sense, or at least, I could go through that again if it wasn't that clear?
11:16:34 <belmoreira> how about GPU or other specific resources support?
11:16:39 <oneswig> I think it splits into two main use cases - common resource quotas across baremetal and virt resources, and GPUs.
11:16:45 <johngarbutt> great question
11:16:47 <oneswig> brb
11:17:10 <belmoreira> we can use resource classes but looks very limited
11:17:23 <johngarbutt> so right now they are not modeled in placement, so it's trickier
11:17:49 <johngarbutt> there is a plan in Nova to allow you to add CUSTOM_RESOURCE_CLASS_X into your hypervisor resource provider
11:18:15 <johngarbutt> i.e. CUSTOM_GPU_V100 = 8
11:18:22 <johngarbutt> then your flavor could request those
11:18:45 <johngarbutt> basically pizza slicing your machine, using custom resource classes
11:18:53 <belmoreira> and what about modeling this with "traits"?
11:19:26 <johngarbutt> so traits... at the moment there isn't anything, but there is a follow on idea I had
11:19:35 <johngarbutt> let's consider the server count limit
11:19:47 <johngarbutt> today we count all servers
11:19:58 <johngarbutt> ... but we could extend that to a trait based count
11:20:11 <johngarbutt> how many instances do you have that request a given trait
11:20:26 <johngarbutt> which might be quite powerful, what do you think?
11:21:23 <belmoreira> I think that will be more powerful/flexible than the resource_class for this usecase
11:21:32 <johngarbutt> I should be clear, the current aim for this cycle is to do the minimum unified limits work, to open up new options, it just happens the easy way to implement this means stuff like CUSTOM_RESOURCE_X will just work too
11:22:16 <johngarbutt> belmoreira: cool, I am just glad it doesn't sound crazy, it's certainly a follow on feature (that may need placement changes)
11:22:36 <noggin143> +1 - let's get it in and then enhance - specs look like they cover our base use case from Boston - https://techblog.web.cern.ch/techblog/post/nested-quota-models/
11:24:13 <janders> this all looks good to me. I don't have the GPU part thought through, but the baremetal part sounds reasonable and useful
11:24:14 <belmoreira> I understand that... and it would be huge work to have it. Thanks for working on it. I was just trying to not forget the other use cases (GPUs)
11:24:27 <johngarbutt> noggin143: hopefully this (once finished) would be the main bit you needed for nested: https://review.opendev.org/#/c/695527/
11:24:45 <noggin143> johngarbutt: yup
11:24:48 <johngarbutt> belmoreira: ++
11:25:10 <janders> where would we need to lobby for the second +2?
11:25:26 <johngarbutt> belmoreira: part of me hopes GPUs become a resource class at some point (basically)
11:25:51 <johngarbutt> janders: I would say the nova meeting, but maybe a +1 on the spec would help too
11:26:27 <johngarbutt> I should set a bit of context on that though...
11:26:57 <johngarbutt> for the folks of us that know this bit of Nova inside out, it takes us an hour or two to page back in all the context needed to work on these things
11:27:25 <johngarbutt> it's a "bit tricky", so I know it's asking quite a lot of someone to review it properly
11:27:40 <johngarbutt> but it is clearly important to lots and lots of people!
11:28:09 <noggin143> any indications from the other projects or are they waiting for Nova ?
11:28:21 <janders> would bringing this to the attention of RHAT guys help?
11:28:45 <johngarbutt> melwitt is pushing this on her side, asking your contacts could only help I guess.
11:29:07 <belmoreira> ack
11:29:15 <johngarbutt> noggin143: good question, cinder is probably the next people, I haven't seen anyone step up yet
11:29:45 <belmoreira> cinder quota is still old nova quota :)
11:29:50 <oneswig> apologies... back now
11:30:03 <belmoreira> Still GPUs... during the last summit I talked with the cyborg PTL. They were looking into the GPU quota case. It's worth checking with them
11:30:19 <johngarbutt> yeah, no counting approach there I guess, neutron did a re-write, and cinder added some hierarchy
11:31:28 <johngarbutt> belmoreira: I keep meaning to revisit the cyborg spec myself, there is some placement integration there, but it's not in the flavor in quite the same way sadly
11:32:47 <oneswig> right - a general solution seems preferable
11:32:48 <johngarbutt> there is a discussion there about who creates the claim in placement and who enforces the unified limit, and which things are not in placement (images, maybe), but that is probably easier after we have the easy Nova integration complete
11:33:40 <johngarbutt> PCI passthrough is unlikely to be cyborg's problem in the near term, initial focus is FPGAs that need programming, etc
11:33:53 <johngarbutt> but I could be out of date there
11:34:09 <oneswig> sounds familiar to me.
11:34:17 <oneswig> (where Cyborg is)
11:34:33 <belmoreira> my feeling was they wanted to handle it.
11:34:44 <johngarbutt> yeah, same
11:34:44 <belmoreira> I just think it will be very sad if we use cyborg only for GPU quotas
11:35:00 <johngarbutt> belmoreira: ok, sounds like we are in a similar place
11:35:27 <johngarbutt> I am hoping on adding extra resource classes into flavors and hypervisor inventory for that
11:35:58 <johngarbutt> I don't have a link to that spec, it might not exist yet
11:36:38 <johngarbutt> thanks all for your time, was good to report positive progress :)
11:36:52 <oneswig> Thanks John.  One final question
11:36:56 <johngarbutt> sure
11:37:27 <oneswig> This still feels like it is more than one cycle out - any feeling on an ETA?
11:38:20 <johngarbutt> I am still hoping one cycle after the spec is approved, to be experimentally available
11:40:01 <oneswig> The greatest help the SIG can be right now is feedback on those specs, correct?
11:40:44 <johngarbutt> yes
11:40:47 <johngarbutt> +1
11:42:16 <oneswig> OK - thanks for the update johngarbutt, much appreciated
11:43:13 <janders> great work johngarbutt, thank you
11:43:26 <johngarbutt> thanks all
11:44:05 <oneswig> Shall we move on to telemetry and gnocchi?
11:44:39 <oneswig> #topic the post-gnocchi future
11:45:12 <oneswig> I'm not sure if the right folks are here but this topic was squeezed out of last week's meeting
11:45:24 <oneswig> who currently runs gnocchi?
11:47:09 <oneswig> ... perhaps we should move on
11:47:36 <oneswig> #topic AOB
11:47:40 <noggin143> We would like a working telemetry solution at scale though
11:48:07 <oneswig> I had a question.  Does anyone have experience using Mellanox VF-LAG for SR-IOV?
11:49:10 <witek> there is a recent thread on openstack-discuss related to Gnocchi
11:49:11 <witek> http://lists.openstack.org/pipermail/openstack-discuss/2019-November/010922.html
11:49:13 <janders> not yet - I'm interested to hear about it, too :)
11:49:41 <oneswig> Hi witek - ears burning :-) thanks for joining
11:49:54 <witek> :)
11:50:49 <oneswig> I'd seen that thread, thanks.  Was the final decision to revert to mongodb?  That didn't sound ideal.  Or was it still undecided?
11:51:28 <oneswig> janders: If I get the chance I'll report back.  I think b1airo was using it in Monash.
11:51:48 <johngarbutt> what info are people wanting from the telemetry, mostly usage data? or something more?
11:51:49 <witek> That's what current cores committed to, but they don't want to block other solutions
11:54:00 <johngarbutt> I keep wanting to see someone process openstack notification streams using kafka, and extract usage info, alongside other gems, but not seen anyone try that yet
11:54:34 <oneswig> me too :-) there is a kafka driver for oslo.notifications isn't there?
11:54:56 <johngarbutt> yeah, although it had all the markings of "please don't use me" when I last looked at it
11:55:08 <witek> yes, but not sure if it's maintained
11:55:19 <johngarbutt> it didn't look very loved
11:55:33 <oneswig> oh dear.
11:56:48 <oneswig> witek: how's the events API for Monasca, is that getting active work this cycle?
11:57:38 <witek> the API is ready, what we need is some kind of standardized collector
11:58:18 <witek> it is planned for this cycle, but no one assigned yet
11:59:33 <johngarbutt> you mean like oslo.notifications -> monasca events API?
11:59:38 <witek> right
11:59:47 <oneswig> I see, so notification events, published in the usual way (amqp?) would be gathered by this collector and transferred to the monasca api
11:59:48 <oneswig> oops
12:00:20 <oneswig> ah, we are on the hour.  Final thoughts?
12:00:40 <oneswig> any volunteers for witek? :-)
12:00:51 <oneswig> (or John for that matter)
12:01:06 <oneswig> OK, thanks all
12:01:09 <oneswig> #endmeeting