*** mrunge_ is now known as mrunge | 08:49 | |
opendevreview | Elod Illes proposed openstack/ceilometermiddleware master: DNM: gate health test https://review.opendev.org/c/openstack/ceilometermiddleware/+/957348 | 09:38 |
---|---|---|
opendevreview | Takashi Kajinami proposed openstack/aetos master: Rename wsgi script https://review.opendev.org/c/openstack/aetos/+/957430 | 13:14 |
jwysogla | That's great... I started working on scoping prometheus queries in Aodh to user's projects similarly to what we do with gnocchi alarms https://review.opendev.org/c/openstack/aodh/+/957279 . I discovered the PromQL parser we have in observabilityclient isn't good enough for this. The parser queries Prometheus to get a list of metric names, which it uses as a help to make the parsing easier. That | 14:15 |
jwysogla | means the metric which is being queried needs to exist in Prometheus. This is fine for users using CLI, they need to wait until Prometheus has data anyway (and when that happens, the parser works). It's not fine for Aodh, which needs to scope the query right when an alarm gets created. So in fresh environments that don't have all the metrics in Prometheus yet like our autoscaling tests, the | 14:15 |
jwysogla | parser fails and Aodh isn't able to scope the query to project properly. | 14:15 |
jwysogla | (this is a new thing, so there is not a bug in the Aodh code at the moment) | 14:16 |
jwysogla | I've been looking at getting rid of the need to query for the metric names in the parser, but I don't see an easy solution. It's either using a new library or reimplementing the parser from Prometheus, which is like half a year of work :D | 14:17 |
jwysogla | And regarding libraries it's either promql-parser https://github.com/messense/py-promql-parser (which says: "This library declares compatible with [prometheus v2.45.0]") | 14:18 |
jwysogla | Or using gopy https://github.com/go-python/gopy and importing the parser from prometheus go code. Which is probably a bad idea with regards to dependencies | 14:19 |
jwysogla | Or using the parse_query endpoint of Prometheus https://prometheus.io/docs/prometheus/latest/querying/api/#parsing-a-promql-expressions-into-a-abstract-syntax-tree-ast . But they state: "This endpoint is experimental and might change in the future... It may also be removed again in case it is no longer needed by the UI." | 14:20 |
tkajinam | that's .... fun :-P | 14:21 |
jwysogla | yeah don't see any "perfect" solution at the moment. | 14:22 |
tkajinam | is this because aodh checks associated metrics when an alarm is created ? > It's not fine for Aodh, which needs to scope the query right when an alarm gets created | 14:24 |
tkajinam | I'm still trying to understand the actual problem here (I thought aodh may just leave the alarm in insufficient data state when no metrics are found but I can be wrong) | 14:25 |
jwysogla | with the change I'm trying to propose yes | 14:25 |
tkajinam | ok | 14:25 |
tkajinam | talking about another exciting thing related to "scoping" , we have to find a way to enable oslo_policy.enforce_scope in aodh. we override the option now but it will be enforced soon. I tried it but failed and learned we probably need to change how request scope is passed (which is not passed entirely atm) | 14:26 |
tkajinam | we probably have to look into aetos as well (because its codebase is based on aodh) | 14:26 |
tkajinam | leaving this here as I think I've never shared this problem with anyone else actually | 14:26 |
jwysogla | I've seen some mentions of this. I was wondering what the status of enforce_scope is in aodh. | 14:27 |
jwysogla | Do you know when it's planned to be enforced? | 14:28 |
tkajinam | gmaan may know it better but I expect that can happen in 2026.1 or probably 2026.2 | 14:28 |
jwysogla | ok | 14:29 |
jwysogla | Regarding the parser. I think I'm gonna manually append all ceilometer metrics to the metric name list retrieved from Prometheus. This way it'll always work for Ceilometer metrics, which should be like 98% of cases. I'll also try to use the "experimental" parse_query endpoint, with which it'd work for all metrics always. So we'd use parse_query endpoint first and as a fallback if that fails, we'd | 14:33 |
jwysogla | use the current way + hardcoded list of ceilometer metrics. | 14:33 |
jwysogla | let me do some experiments :D | 14:33 |
gmaan | jwysogla: tkajinam I am a little less aggressive on forcing new things (more on removing old things). it was planned for this cycle but seeing the projects' progress I have planned it for the start of the next cycle and will also send a note to the ML about which projects will be impacted. | 15:34 |
jwysogla | alright, so we should take a look at it kinda soon then. | 16:07 |
tkajinam | https://github.com/openstack/ceilometer/blob/8763625d82bca2be025ce9334c0927d90d1cdc03/ceilometer/compute/virt/libvirt/inspector.py#L208 | 16:11 |
tkajinam | ok I noticed I was blind | 16:11 |
tkajinam | so it does not generate any stats for instances in shutoff state. phhhh | 16:11 |
opendevreview | Takashi Kajinami proposed openstack/ceilometer master: Revert "Disable power.state meter" https://review.opendev.org/c/openstack/ceilometer/+/957448 | 16:14 |
tkajinam | mrunge, jwysogla I just noticed that the current inspector just skips instances in shutoff state... so the "problem" I mentioned yesterday may not actually happen. I'm reverting the change ^^^ . sorry for the confusion... | 16:14 |
tkajinam | at least resize may not cause problems, though live migration could | 16:15 |
mrunge | tkajinam: ack. thank you for double checking | 16:15 |
tkajinam | I'm approving the vcpu/memory pollsters as well | 16:16 |
opendevreview | Takashi Kajinami proposed openstack/ceilometer master: Revert "Disable power.state meter" https://review.opendev.org/c/openstack/ceilometer/+/957448 | 16:16 |
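
The fallback plan jwysogla describes at 14:33 (try the parse_query endpoint first, otherwise use the current parser with a metric name list extended by hardcoded Ceilometer metrics) could look roughly like the sketch below. Every name in it is hypothetical: `parse_with_fallback`, `known_metric_names`, `ParseError` and the two entries in `ASSUMED_CEILOMETER_METRICS` are illustration only and are not taken from observabilityclient or Aodh. The two parsers are passed in as plain callables, so the sketch makes no claim about how either one is really implemented.

```python
from typing import Any, Callable, Iterable, Set

# Assumed, partial list for illustration only. The real set of Ceilometer
# metric names would have to come from the project itself.
ASSUMED_CEILOMETER_METRICS = frozenset(
    {"ceilometer_cpu", "ceilometer_memory_usage"}
)


class ParseError(Exception):
    """Hypothetical error raised when a query cannot be parsed."""


def known_metric_names(from_prometheus: Iterable[str]) -> Set[str]:
    """Union of the names Prometheus reports and the hardcoded list.

    In a fresh environment Prometheus may report nothing yet, so the
    hardcoded names keep the local parser usable for Ceilometer metrics.
    """
    return set(from_prometheus) | ASSUMED_CEILOMETER_METRICS


def parse_with_fallback(
    query: str,
    remote_parse: Callable[[str], Any],
    local_parse: Callable[[str, Set[str]], Any],
    fetch_metric_names: Callable[[], Iterable[str]],
) -> Any:
    """Try the remote parser first, then fall back to the local one.

    remote_parse stands in for a call to the experimental parse_query
    endpoint; local_parse stands in for the existing name-assisted parser.
    """
    try:
        return remote_parse(query)
    except Exception:
        # Broad on purpose in this sketch: the endpoint may be missing,
        # changed, or unreachable. Real code would narrow this.
        names = known_metric_names(fetch_metric_names())
        return local_parse(query, names)


def toy_local_parse(query: str, names: Set[str]) -> dict:
    """Toy stand-in for the local parser: only checks the metric name."""
    metric = query.split("{", 1)[0].strip()
    if metric not in names:
        raise ParseError(f"unknown metric: {metric}")
    return {"metric": metric}


def failing_remote(query: str) -> dict:
    """Toy stand-in for an unavailable parse_query endpoint."""
    raise ParseError("endpoint unavailable")
```

With these stand-ins, a query on `ceilometer_cpu` still parses when the remote parser fails and Prometheus reports no metric names at all, which is the fresh-environment case from the autoscaling tests. A metric outside the hardcoded list would still fail in that situation.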