Thursday, 2025-08-14

*** mrunge_ is now known as mrunge08:49
opendevreviewElod Illes proposed openstack/ceilometermiddleware master: DNM: gate health test  https://review.opendev.org/c/openstack/ceilometermiddleware/+/95734809:38
opendevreviewTakashi Kajinami proposed openstack/aetos master: Rename wsgi script  https://review.opendev.org/c/openstack/aetos/+/95743013:14
jwysoglaThat's great... I started working on scoping prometheus queries in Aodh to user's projects similarly to what we do with gnocchi alarms https://review.opendev.org/c/openstack/aodh/+/957279 . I discovered the PromQL parser we have in observabilityclient isn't good enough for this. The parser queries Prometheus to get a list of metric names, which it uses as a help to make the parsing easier. That 14:15
jwysoglameans the metric which is being queried needs to exist in Prometheus. This is fine for users using CLI, they need to wait until Prometheus has data anyway (and when that happens, the parser works). It's not fine for Aodh, which needs to scope the query right when an alarm gets created. So in fresh environments that don't have all the metrics in Prometheus yet like our autoscaling tests, the 14:15
jwysoglaparser fails and Aodh isn't able to scope the query to project properly.14:15
jwysogla(this is a new thing, so there is not a bug in the Aodh code at the moment)14:16
jwysoglaI've been looking at getting rid of needing to query for the metrics names in the parser, but I don't see an easy solution. It's either using a new library or reimplementing the parser from Prometheus, which is like a half a year of work :D14:17
jwysoglaAnd regarding libraries it's either promql-parser https://github.com/messense/py-promql-parser (which says: "This library declares compatible with [prometheus v2.45.0]")14:18
jwysoglaOr using gopy https://github.com/go-python/gopy and importing the parser from prometheus go code. Which is probably a bad idea with regards to dependencies14:19
jwysoglaOr using the parse_query endpoint of Prometheus https://prometheus.io/docs/prometheus/latest/querying/api/#parsing-a-promql-expressions-into-a-abstract-syntax-tree-ast . But they state: "This endpoint is experimental and might change in the future... It may also be removed again in case it is no longer needed by the UI."14:20
tkajinamthat's .... fun :-P14:21
jwysoglayeah don't see any "perfect" solution at the moment.14:22
tkajinamis this because aodh checks associated metrics when an alarm is created ? > It's not fine for Aodh, which needs to scope the query right when an alarm gets created14:24
tkajinamI'm still trying to understand the actual problem here (I thought aodh may just leave the alarm in insufficient data state when no metrics are found but I can be wrong)14:25
jwysoglawith the change I'm trying to propose yes14:25
tkajinamok14:25
tkajinamtalking about another exciting thing related to "scoping" , we have to find a way to enable oslo_policy.enforce_scope in aodh. we override the option now but it will be enforced soon. I tried it but failed and learned we probably need to change how request scope is passed (which is not passed entirely atm)14:26
tkajinamwe probably have to look into aetos as well (because its codebase is based on aodh)14:26
tkajinamleaving this here as I think I've never shared this problem with anyone else actually14:26
jwysoglaI've seen some mentions of this. I was wondering what the status of enforce_scope is in aodh.14:27
jwysoglaDo you know when it's planned to be enforced?14:28
tkajinamgmaan may know it better but I expect that can happen in 2026.1 or probably 2026.214:28
jwysoglaok14:29
jwysoglaRegarding the parser. I think I'm gonna manually append all ceilometer metrics to the metric name list retrieved from Prometheus. This way it'll always work for Ceilometer metrics, which should be like 98% of cases. I'll also try to use the "experimental" parse_query endpoint, with which it'd work for all metrics always. So we'd use parse_query endpoint first and as a fallback if that fails, we'd 14:33
jwysoglause the current way + hardcoded list of ceilometer metrics.14:33
jwysoglalet me do some experiments :D14:33
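
The plan jwysogla describes (try the parse_query endpoint first, then fall back to the current parser with Ceilometer metric names appended to the list from Prometheus) could be sketched roughly as below. This is only an illustration under assumptions, not the observabilityclient or Aodh code: `parse_with_fallback`, `ParseError`, the injected callables, and the two metric names in `HARDCODED_CEILOMETER_METRICS` are all hypothetical names made up for this sketch.

```python
from typing import Any, Callable, Iterable, Set

# Hypothetical subset for illustration; the log does not list the real names.
HARDCODED_CEILOMETER_METRICS = frozenset(
    {"ceilometer_cpu", "ceilometer_memory_usage"}
)


class ParseError(Exception):
    """Hypothetical error type raised when a query cannot be parsed."""


def merged_metric_names(from_prometheus: Iterable[str]) -> Set[str]:
    """Union of names reported by Prometheus and the hardcoded list."""
    return set(from_prometheus) | HARDCODED_CEILOMETER_METRICS


def parse_with_fallback(
    query: str,
    remote_parse: Callable[[str], Any],
    local_parse: Callable[[str, Set[str]], Any],
    fetch_metric_names: Callable[[], Iterable[str]],
) -> Any:
    """Try the remote parser first; on any failure use the local one.

    remote_parse stands in for a call to Prometheus' experimental
    parse_query endpoint. local_parse stands in for the existing
    parser, which needs a list of known metric names.
    """
    try:
        return remote_parse(query)
    except Exception:
        # Broad catch keeps the sketch short; real code would likely
        # narrow this to HTTP and parse errors.
        names = merged_metric_names(fetch_metric_names())
        return local_parse(query, names)
```

With this shape, a fresh environment where Prometheus has no Ceilometer data yet still gets the Ceilometer names passed to the local parser, which is the case the autoscaling tests hit.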
gmaanjwysogla: tkajinam I am a little less aggressive on forcing new things (more on removing old things). it was planned for this cycle but seeing the project progress I have planned it for the start of the next cycle and will also send a mail to the ML about which projects will be impacted.15:34
jwysoglaalright, so we should take a look at it kinda soon then.16:07
tkajinamhttps://github.com/openstack/ceilometer/blob/8763625d82bca2be025ce9334c0927d90d1cdc03/ceilometer/compute/virt/libvirt/inspector.py#L20816:11
tkajinamok I noticed I was blind16:11
tkajinamso it does not generate any stats for instances in shutoff state. phhhh16:11
opendevreviewTakashi Kajinami proposed openstack/ceilometer master: Revert "Disable power.state meter"  https://review.opendev.org/c/openstack/ceilometer/+/95744816:14
tkajinammrunge, jwysogla I just noticed that current inspector just skips instance in shutoff state... so the "problem" I mentioned yesterday may not actually happen. I'm reverting the change ^^^ . sorry for the confusion...16:14
tkajinamat least resize may not cause problems, though live migration could16:15
mrungetkajinam: ack. thank you for double checking16:15
tkajinamI'm approving the vcpu/memory pollsters as well16:16
opendevreviewTakashi Kajinami proposed openstack/ceilometer master: Revert "Disable power.state meter"  https://review.opendev.org/c/openstack/ceilometer/+/95744816:16
