15:00:32 <gordc> #startmeeting telemetry
15:00:33 <openstack> Meeting started Thu Feb 25 15:00:32 2016 UTC and is due to finish in 60 minutes. The chair is gordc. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:34 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:36 <openstack> The meeting name has been set to 'telemetry'
15:00:38 <ildikov> o/
15:00:51 <r-mibu> o/
15:01:03 <liusheng> o/
15:01:06 <sileht> o/
15:01:10 <ityaptin> o/
15:01:50 <_nadya_> o/
15:02:09 <nijaba> o/
15:02:44 <gordc> ok let's start, i think some people are on PTO
15:02:57 <gordc> #topic recurring: roadmap items (new/old/blockers) https://wiki.openstack.org/wiki/Telemetry/RoadMap
15:03:12 <gordc> we're basically up for time on features for Mitaka
15:03:14 <idegtiarov> o/
15:03:23 <gordc> the items we were tracking last week seem to be ok
15:03:56 <gordc> i'll run through each in the subtopics
15:04:03 <gordc> but any pressing concerns?
15:04:29 <gordc> m-3 is next week so basically all features now will need to be very very small
15:05:18 <gordc> cool. let's move to the projects
15:05:29 <gordc> #topic aodh topics
15:05:44 <gordc> right now we're tracking composite alarms for Mitaka
15:05:48 <gordc> main patch is in
15:05:57 <gordc> we just need approval on api and client
15:06:01 <liusheng> gordc: thanks
15:06:20 <gordc> https://review.openstack.org/#/c/257722/
15:06:44 <gordc> https://review.openstack.org/#/c/284022/
15:07:02 <gordc> if we can get reviews and get that merged that'd be great
15:07:35 * gordc nudges sileht
15:08:03 <gordc> has anyone looked at switchig ceilometerclient to aodhclient in heat?
15:08:03 <liusheng> it seems the jenkins has been broken :(
15:08:23 <gordc> liusheng: yeah, it's being fixed
15:08:31 <sileht> liusheng, gordc I will check that one last time
15:08:37 <gordc> we have some time next week to merge as well.
15:09:00 <liusheng> gord, sileht, cool, tanks!
15:09:07 <gordc> i'll try taking a look at porting ceilometerclient to aodhclient in heat... i'm guessing it won't make it for Mitaka though
15:09:27 <liamji> liusheng: the problem we found in the our working day is fixed. Now it is the second one :)
15:09:35 <ildikov> gordc: can we get an exception for it?
15:09:47 <gordc> ildikov: in heat?
15:09:57 <ildikov> gordc: yes
15:10:17 <gordc> ildikov: i'll give them a ping and see.
15:10:25 <gordc> i'm hoping it's an easy swap
15:10:45 <ildikov> gordc: me too, this is why I asked
15:11:18 <gordc> #action see if heat will allow aodhclient FFE
15:11:42 <gordc> aside from that, i think Aodh is what it is?
15:11:49 <gordc> r-mibu: did you have success with tempest?
15:12:10 <gordc> i believe pradk mentioned whatever we have now is conflicting with what exists in tempest repo
15:12:14 <r-mibu> you mean running test with plugins?
15:12:23 <gordc> r-mibu: correct
15:12:38 <r-mibu> right, so I'll fix the id of test that may fix the bug
15:12:51 <r-mibu> but didn't check yet
15:13:04 <r-mibu> will do by this week :)
15:13:05 <gordc> r-mibu: do you have time next 2 weeks?
15:13:08 <gordc> ok
15:13:21 <r-mibu> yep, other stuffs done :)
15:13:22 <gordc> i'm going to make tempest stuff FFE since it's only tests.
15:13:31 <gordc> anyone have concerns?
15:13:31 <r-mibu> ok
15:14:01 <r-mibu> docs...
15:14:19 <r-mibu> as you pointed in review
15:14:37 <gordc> r-mibu: ok. let's try to get it working as soon as possible. but i'm ok with cutting m-3 and merging tempest stuff in an rc build unless someone has an issue
15:14:55 <gordc> r-mibu: docs for tempest tests?
15:15:01 <ildikov> I will run a config guide update for Aodh
15:15:04 <r-mibu> docs for aodh
15:15:20 <r-mibu> but, yes, that's not big problem for m-3
15:15:24 <gordc> ildikov: cool cool. thanks for tracking that
15:15:29 <ildikov> I think we will need to look into the Alarming section of the Admin Guide in OS Manuals
15:15:57 <gordc> ildikov: defnitely. we can still make changes to docs after m-3 correct?
15:15:58 <ildikov> I'm not sure how much that part is outdated, so prolly pretty much :)
15:16:19 <ildikov> gordc: sure, we have some time during the stabilization period
15:16:26 <gordc> ildikov: awesome
15:16:50 <gordc> we need to do docs for all of aodh, our dev docs are non-existent too
15:16:59 <ildikov> gordc: of course sooner the better, but still we will have a better picture right after m-3 regarding what made it and what to document
15:17:07 <gordc> ildikov: sounds good
15:17:16 <gordc> anything else for aodh?
15:17:31 <llu-laptop> test :(
15:18:01 <llu-laptop> anyone see me? can't read any messages :(
15:18:32 <liusheng> llu-laptop: I can see you :)
15:18:33 <r-mibu> llu-laptop: i can see your message
15:18:33 <ildikov> llu-laptop: I see your messages
15:18:40 <neelashah> I can see your messages llu-laptop
15:18:57 <gordc> and he's gone.lol
15:19:05 <gordc> let's move on for now
15:19:14 <gordc> #topic ceilometer topics
15:19:27 <gordc> we have two items here to get merged
15:19:56 <gordc> ityaptin's patch for minimising nova-api load: https://review.openstack.org/#/c/284322/
15:20:14 <gordc> i think that needs a docimpact since we added a new optoin
15:20:48 <gordc> and liamji's patch for neutron v2: https://review.openstack.org/#/c/277434/
15:21:24 <gordc> r-mibu: same FFE for tempest in ceilometer
15:21:48 <r-mibu> got it
15:21:57 <neelashah> gordc neutron v2 is failing due to gate issues?
15:22:17 <gordc> we all happy? let's hold off on any other features
15:22:27 <idegtiarov> gordc, what about event transformers FFE?
15:22:29 <gordc> neelashah: yeah, we need to fix a gnocchclient issue first and we should be ok
15:22:41 <neelashah> gordc - ok, thanks
15:22:59 <gordc> idegtiarov: i have some concerns but we can talk about that now
15:23:22 <gordc> #topic event bracketer transformer
15:23:26 <gordc> #link https://review.openstack.org/#/c/266488/
15:23:30 <idegtiarov> great what is your main concern?
15:23:50 <gordc> i don't understand why latency is an event.
15:24:04 <gordc> and why it needs to be calculated inline/stream
15:24:38 <gordc> also, the code seems be very inflexible.
15:24:40 <idegtiarov> it is event that could be published to - - tosamplenotifier:// and will be stored as sample
15:25:45 <gordc> idegtiarov: and it has real-time requirement because?
15:26:50 <idegtiarov> gordc, if you need to get alarm based for example on latency_time event/sample it is
15:26:59 <_nadya_> gordc: quick question about alarms transformer
15:26:59 <_nadya_> gordc: events, sorry
15:26:59 <_nadya_> #link https://review.openstack.org/#/c/266488/5 perhaps we can start with instances only in Mitaka?
15:27:40 <gordc> idegtiarov: but the alarm scenario is handled by the timeout mechanism in aodh no?
15:28:53 <idegtiarov> gordc, as an example we could be interested why instances booting longer then 10 minutes and create alarm for that case
15:29:57 <idegtiarov> the main idea is have tool for event transformation and store it as event/sample
15:30:05 <_nadya_> can we convert only to samples? I agree that "latency" is mostly about sample
15:30:17 <gordc> shouldn't timeouts be done by event alarm?
15:30:42 <idegtiarov> we will can when https://review.openstack.org/#/c/227106/ will be merged
15:31:17 <_nadya_> it looks it's ready to be
15:31:21 <r-mibu> i agree gordc - alarming logics can be put in aodh rather than ceilometer itself
15:32:04 <r-mibu> if creation time of instance can be meter, i'm ok
15:32:39 <gordc> _nadya_: yeah. i think it's definitely a measurement. i'm wondering how many measurements it is though.
15:32:44 <idegtiarov> r-mibu, it is not alarming logic it is logic of event transformation that could be used for statistics of booting instances or alarming based on new samples/event
15:32:55 <gordc> we will really only have one latency measurement per resource (you can only ever create once)
15:33:25 <gordc> is this better as a query feature in events api
15:34:29 <gordc> usually that's how most BI tools work. you calculate the data from specific log records
15:34:45 <idegtiarov> much better as for me because it is rather expensive to index events traits end we already have event_type indexing in mongodb so api requests for new events will be pretty fast
15:35:17 <_nadya_> gordc: it is only one for now, right. But we can have latency for many different resources: instances, volumes and so on. In M* we may start with instances only
15:36:38 <gordc> is the 'resource' == host? because it doesn't matter how many different resources you'll have, it still just one entry for each instance/volume/etc... no?
15:37:39 <gordc> for me, i think this functionality is better as post-storage work, i don't really see the real-time requirement of it. that's my main point
15:38:00 <ildikov> gordc: I tend to agree with you on this point
15:38:46 <r-mibu> adding new logic might affect event processing and having date in workers make difficult in HA/multi-worker
15:38:52 <_nadya_> It looks so great to have alarm: "look, your instances start to boot more then 10 minutes"
15:39:00 <idegtiarov> it is not only about booting time, but for example instances update
15:39:06 <gordc> so just referencing stacktach and how they were planning to implemented alarms, i believe they also do these calculations in post
15:39:59 <gordc> r-mibu: yeah, it definitely complicates stuff having a global cache shared across workers/systems...
15:40:15 <gordc> although notification agent is/should be smart enough to redirect to common queues.
15:40:57 <_nadya_> dunno, we already have "online" mechanism for transformers
15:41:44 <ildikov> _nadya_: the instance booting time issues is more an alert in definition, also I would assume it gets interesting when it happens with all of them not just one
15:41:45 <gordc> _nadya_: so the alarm comment i believe we want to have it handled by Aodh
15:41:54 <gordc> you are defining the rule there already.
15:42:10 <liusheng> IIUC, if we emit measurements on events latency, these measurements sparse in timeline, if we have alarms on these measurements, the state of alarms maybe always "insufficient data"
15:42:33 <idegtiarov> ildikov, not only alerts but could be used for statistics of booting vms
15:42:56 <ildikov> idegtiarov: that can be a post operation/query as well
15:43:04 <gordc> ildikov: +
15:43:10 <idegtiarov> not now
15:43:22 <idegtiarov> even not with ceilo api
15:43:24 <gordc> idegtiarov: but it could be, if someone worked on it ;)
15:43:35 <ildikov> gordc: +1 :)
15:44:04 <r-mibu> idegtiarov: i understand your use case, but it sometimes won't work since we cannot make sure that ceil receive set of start and end message
15:44:08 <_nadya_> gordc: store events in time series storage and have post processing? in Gnocchi?
15:44:22 <gordc> idegtiarov: i had this topic for making events more useful for BI at last summit? i just didn't do anything, so it's kinda my fault (but i won't admit it)
15:45:59 <gordc> r-mibu: right, the potential latency in MQ may cause weird results from real-time pov
15:46:08 <_nadya_> r-mibu: I hear this very often. But actually, doesn't it mean that we cannot provide a reliable billing? We lose notifications about instances, sorry
15:46:18 <idegtiarov> r-mibu when we do not receives end event it sould be alarm otherwise we will have and could used if we need such data
15:46:47 <gordc> _nadya_: i'm not sure if we need gnocchi specifically. gnocchi i think is continuous measurements over time
15:47:09 <gordc> latency seems to be measurements in set time
15:47:48 <r-mibu> i assume for billing purpose operator will check db record as well, otherwise I will boot many instances on that system :)
15:47:49 <_nadya_> gordc: so...what storage will be used for events? sql?
15:48:40 <ildikov> for billing we usually have some freedom I think regarding messages and it usually does not have to be real-time
15:49:03 <_nadya_> r-mibu: :)
15:49:32 <gordc> _nadya_: existing storage: sql/elasticsearch.
15:49:36 <r-mibu> idegtiarov: yep, and that can be done in aodh + events storage
15:49:49 <_nadya_> let's move on, I see community point here
15:49:55 <idegtiarov> actually to do the same operation on not indexing data will be a big issue for big event collection
15:50:01 <gordc> _nadya_: in theory, this should be doable in elasticsearch.
15:50:35 <gordc> _nadya_: i also believe stacktach offers some mechanisms to handle related events (i don't know status of all that though)
15:50:40 <_nadya_> gordc: I don't like it is external, not ceilometer-core. But perhaps I need to think more
15:50:56 <gordc> what's external?
15:51:32 <_nadya_> gordc: that this statistics should be calculated outside ceilometer, in external system
15:51:47 <gordc> idegtiarov: but we index event_type and all the traits...
15:52:37 <idegtiarov> we do not index traits
15:53:15 <gordc> ceilometer gathers data, normalises and transforms. gnocchi does a lot of stuff 'outside ceiloemter' but it's still our project
15:54:45 <gordc> idegtiarov: https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/sqlalchemy/models.py#L294
15:55:05 <gordc> i don't understand, it seems index'd. it's a primary key
15:55:39 <gordc> if not, it should be.
15:55:48 <gordc> sileht: you have anything for gnocchi?
15:56:16 * sileht is reading backlog
15:56:17 <idegtiarov> gordc, I mean in mongodb
15:56:32 <gordc> sileht: no backlog. just asking if we can leave gnocchi topics :)
15:57:00 <gordc> idegtiarov: we probably should? or not use mongodb :P
15:57:09 <idegtiarov> :P
15:57:14 <sileht> gordc, oh I have released gnocchiclient 2.2.0 and I will start working on gnocchi dispatcher for bachting measurements
15:57:30 <gordc> tbh, it seems like i'm not the only person who has issues so maybe we should punt it for Mitaka
15:57:58 <idegtiarov> o no :(
15:58:12 <sileht> idegtiarov, gordc why not having both indexes, the new one and the old one ?
15:58:20 <ildikov> gordc: agreed
15:58:26 <gordc> idegtiarov: let's move this to main chanell post meeting
15:58:35 <gordc> #topic gnocchi topics
15:58:35 <idegtiarov> k
15:59:00 <gordc> sileht: i had a question, do we want to make the gnocchi dispatcher use new batching support
15:59:15 <sileht> gordc, why not ?
15:59:19 <gordc> for mitaka?
15:59:31 <gordc> i just want to know if we should track it
15:59:41 <gordc> someone is going to yell soon
15:59:42 <sileht> I'm a bit lost on where we are on the roadmap
15:59:54 <gordc> sileht: :) i'll ask in main channel
16:00:05 <gordc> thakns everyone
16:00:09 <gordc> #endmeeting
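
Note on the "event bracketer transformer" topic: the post-storage alternative raised in the discussion (computing latency from already stored start/end events rather than inline in the pipeline) might look roughly like the sketch below. This is only an illustration, not code from the patch under review or from Ceilometer. The flat dict layout with `event_type`, `instance_id` and `generated` keys, and the helper names `boot_latencies`, `slow_boots` and `missing_end`, are assumptions made for the example; the 10 minute threshold is the one mentioned in the meeting.

```python
from datetime import datetime, timedelta

# Nova notification event types that bracket an instance boot.
START = "compute.instance.create.start"
END = "compute.instance.create.end"


def boot_latencies(events):
    """Pair start/end events per instance and return {instance_id: seconds}.

    `events` is an iterable of dicts with hypothetical keys:
    'event_type', 'instance_id' and 'generated' (a datetime).
    """
    started = {}
    latencies = {}
    for ev in sorted(events, key=lambda e: e["generated"]):
        iid = ev["instance_id"]
        if ev["event_type"] == START:
            started[iid] = ev["generated"]
        elif ev["event_type"] == END and iid in started:
            delta = ev["generated"] - started.pop(iid)
            latencies[iid] = delta.total_seconds()
    return latencies


def slow_boots(events, threshold=timedelta(minutes=10)):
    """Return the instance ids whose boot took longer than `threshold`."""
    limit = threshold.total_seconds()
    return sorted(i for i, s in boot_latencies(events).items() if s > limit)


def missing_end(events):
    """Return the instance ids that have a start event but no end event."""
    ended = {e["instance_id"] for e in events if e["event_type"] == END}
    return sorted(
        e["instance_id"]
        for e in events
        if e["event_type"] == START and e["instance_id"] not in ended
    )
```

Because the pairing runs over stored events, it needs no state shared across notification workers, which was one of the concerns raised. A start event with no matching end event is reported separately by `missing_end`, since the meeting noted that delivery of both messages cannot be guaranteed.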