04:01:55 <ekcs> #startmeeting congressteammeeting
04:01:56 <openstack> Meeting started Fri Feb 22 04:01:55 2019 UTC and is due to finish in 60 minutes. The chair is ekcs. Information about MeetBot at http://wiki.debian.org/MeetBot.
04:01:57 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
04:01:59 <openstack> The meeting name has been set to 'congressteammeeting'
04:02:31 <ekcs> hi. topics here as usual: https://etherpad.openstack.org/p/congress-meeting-topics
04:02:38 <akhil_jain> ekcs: Hi!
04:02:52 <ekcs> hi akhil_jain ! happy friday hope things are well.
04:03:25 <akhil_jain> Happy friday, yes everything fine. what about you?
04:03:35 <ekcs> I’m doing alright!
04:03:57 <ekcs> ok so first a quick reminder: feature freeze
04:03:58 <ekcs> 7 March 2019
04:04:11 <ekcs> RC1 due 21 March 2019
04:04:38 <ekcs> let’s dive in to the topics then.
04:04:47 <ekcs> #topic Managing alarms
04:04:54 <ekcs> akhil_jain: I assume that’s your topic?
04:05:35 <akhil_jain> Yes right, ekcs as already discussed on mail. According to me, there can be a faulty situation
04:07:41 <ekcs> yup. do you have a sample policy and scenario to help our analysis?
04:07:43 <akhil_jain> My usecase is: when an alarm is raised and it stays in congress with ACTIVE state. so lets consider operator created policy1 and handled the alarm situation. But state of alarm is still ACTIVE. so when he creates policy2 it can again execute actions based on that alarm
04:08:07 <akhil_jain> which may have been handled by policy1
04:09:31 <ekcs> right. ok I pulled up our email.
04:09:41 <akhil_jain> its 1.3 according to your mail. The one you specified to make new table
04:09:43 <ekcs> so the situation is either 3 or 4 in the email
04:09:44 <ekcs> 3. Alarms which had been activated and triggered some action, but the
04:09:45 <ekcs> alarm remains active because the action does not resolve the alarm.
04:09:46 <ekcs> 4. Alarms which had been activated and triggered some action, and the
04:09:47 <ekcs> action is in the process of resolving the alarm, but in the meantime the
04:09:48 <ekcs> alarm remains active.
04:10:09 <akhil_jain> yes right
04:11:22 <ekcs> so.. i want to consider a slightly different scenario. what if policy1 and policy2 have been active the whole time. that seems like the usual case.
04:11:32 <akhil_jain> also we can publish that discussion on openstack-discuss to get input from various developers as well
04:12:00 <ekcs> then when the alarm is activated, it would trigger action from BOTH policy1 and policy2
04:12:28 <akhil_jain> yes right even that can be harmful in various scenarios
04:13:01 <ekcs> ok. so let’s focus on the simple case first that does not have to do with timing and ordering.
04:13:29 <akhil_jain> like 1 policy pausing vms and other evacuating
04:13:30 <ekcs> how would you like the system to behave in the case that alarm activates while both policy1 and policy2 are in place?
04:14:40 <akhil_jain> thats a tough question :D anyways i would like only one policy to be executed on one alarm
04:15:12 <akhil_jain> that will cause issue of priority
04:16:04 <ekcs> I see. so… in my opinion in this case it’s up to the policy writer to write the policy in a way which decides what happens.
04:16:48 <ekcs> for example, if the writer wants both actions to trigger, then she can write execute[action1] :- alarm1; execute[action2] :- alarm2
04:17:50 <ekcs> sorry i meant execute[action1] :- alarm1; execute[action2] :- alarm1
04:18:21 <ekcs> however, if she does not want both to trigger, then she can write something like execute[action1] :- alarm1, severity('low'); execute[action2] :- alarm1, severity('high')
04:20:39 <akhil_jain> i am not sure if policy writer can evaluate whether to take action on which alarm. they will get just list of alarms. lets suppose one alarm is 15 days old and still saying status=active, which was raised because compute node1 was down. But in actual the compute node1 is up now, as operator resolved the issue. but based on alarm policy writer can harm that node or vms created on that
04:22:15 <akhil_jain> saving alarms and computing actions on those is a bit complex and can be wrongly used in real time scenarios.
04:22:46 <ekcs> ok that makes sense.
04:23:00 <akhil_jain> reaching to one solution will involve other communities as well i think. maybe monasca aodh n all
04:23:03 <ekcs> so would 1.3 solve the problem? one table for all active alarms. another table for most recent alarms.
04:23:52 <akhil_jain> yes maybe one table for all alarms and other for those on which policy is executed. i am not sure
04:24:28 <akhil_jain> or one field of policy_executed with alarm. dont know just a thought
04:25:12 <ekcs> yea there are several possibilities.
04:25:43 <ekcs> but i’m still trying to isolate the problem. is it distinguishing between new and old? or distinguishing between action taken and action not taken.
04:25:59 <akhil_jain> also gmann wants to discuss it in PTG if you are planning one
04:26:09 <ekcs> got it.
04:26:12 <ekcs> that’s great.
04:26:39 <akhil_jain> i would say both are the possibilities
04:26:58 <akhil_jain> 1. new n old
04:27:10 <akhil_jain> 2. action taken or not
04:28:36 <akhil_jain> hopefully i will be available too. for the PTG
04:29:03 <ekcs> ok. hopefully we can work through examples to see whether both are needed or just one. we can certainly implement things for both, but if one suffices then it’d be good to know it.
04:30:44 <ekcs> in general, I think it’s a good idea to leave the flexibility in the hands of the policy writer.
04:31:18 <ekcs> perhaps adding congress functions as possible actions would be one solution.
04:32:57 <akhil_jain> i didnt get the last point
04:33:07 <ekcs> something like this:
04:33:09 <ekcs> policy1:
04:33:10 <ekcs> execute[add tuple handled('alarm1') to congress] :- alarm1
04:33:11 <ekcs> execute[action1] :- alarm1, NOT handled('alarm1')
04:33:12 <ekcs> policy2:
04:33:13 <ekcs> execute[add tuple handled('alarm1') to congress] :- alarm1
04:33:14 <ekcs> execute[action2] :- alarm1, NOT handled('alarm1')
04:34:32 <ekcs> that’s a way for the policy writer to say: remember which alarm was already handled, and don’t do something again on the same alarm.
04:35:11 <akhil_jain> right seems good
04:36:58 <akhil_jain> adding another table can help this. also present thing will not be affected
04:37:34 <ekcs> well. unfortunately that still doesn’t solve the problem when both policies are active at the same time. the alarm will trigger both at the same time.
04:38:30 <ekcs> i hope to understand some more why policy writers cannot make sure to add conditions to their rules so that one alarm cannot trigger both actions.
04:38:47 <ekcs> who are the ones expected to write the policy?
04:39:55 <akhil_jain> i am not sure about that. i am not much into deployment side
04:41:05 <ekcs> ok. is there any way to find out who writes the policy in projected use case? I think the policy workflow really changes our solutions.
04:42:03 <ekcs> for example, if it’s one “person” writing all the policy, then it’s not hard to make sure multiple rules don’t trigger on the same alarm.
04:42:29 <ekcs> but if it’s many different “people” writing policy independently, then it’s very hard to make sure of the same thing.
04:43:20 <ekcs> another factor is: how often are the policies changed? is it expected to be changed frequently by the operator? or more just operate the way it’s deployed?
04:44:35 <ekcs> oh and yes it’s ok to share that email on ML.
04:45:08 <akhil_jain> yes, there are serious multiple cases.
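Note on the handled-tuple rules proposed at 04:33: a minimal Python sketch of the bookkeeping they describe, not Congress code. The names `run_rules`, `rules`, and `handled` are hypothetical, and the sketch assumes rules are evaluated one at a time in a fixed order. As pointed out at 04:37, that ordering is not guaranteed when both policies are active at once, so this shows only the "remember and skip" behaviour.

```python
from typing import Callable, List, Set, Tuple

# A rule is (policy name, action); both are illustrative stand-ins.
Rule = Tuple[str, Callable[[str], str]]


def run_rules(alarm_id: str, rules: List[Rule], handled: Set[str]) -> List[str]:
    """Apply rules in order; once an alarm is marked handled, skip the rest."""
    results = []
    for policy_name, action in rules:
        if alarm_id in handled:
            # Corresponds to the "NOT handled('alarm1')" condition.
            continue
        results.append(f"{policy_name}: {action(alarm_id)}")
        # Corresponds to "add tuple handled('alarm1') to congress".
        handled.add(alarm_id)
    return results


handled: Set[str] = set()
rules: List[Rule] = [
    ("policy1", lambda a: f"pause VMs for {a}"),
    ("policy2", lambda a: f"evacuate VMs for {a}"),
]
first = run_rules("alarm1", rules, handled)   # only policy1 acts
second = run_rules("alarm1", rules, handled)  # nothing acts again
```

Under the sequential assumption only the first matching policy acts, and a later pass over the same still-ACTIVE alarm does nothing.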
04:45:30 <ekcs> I think maybe the next step is to start an etherpad or something to start documenting the problem scenarios and also possible solutions.
04:46:19 <ekcs> and the closer we can understand the real policies and policy writing workflow the better we can solve the problems =)
04:46:26 <akhil_jain> yes everytime reaching on one soln. is altered again on thinking next time
04:47:14 <ekcs> haha well this is an important and interesting problem. glad we’re discussing and hopefully solving.
04:47:31 <ekcs> should we move on to see if we can squeeze in the other topics before time’s up?
04:47:47 <akhil_jain> yes right
04:47:54 <ekcs> ok =)
04:47:57 <ekcs> #topic Adding created_at in nova servers
04:48:59 <ekcs> wanna start us off akhil_jain ?
04:49:02 <akhil_jain> oh yes. you already said its good to go. but is it possible to calculate based on current code to evaluate if server is older than one month or so
04:49:29 <akhil_jain> just adding created_at will be enough?
04:50:34 <akhil_jain> i am not sure if operators will work
04:51:22 <ekcs> oh hmm. is the created_at reported by nova api?
04:51:27 <akhil_jain> yes
04:51:40 <akhil_jain> field name is created
04:52:04 <ekcs> why do we need to calculate then?
04:52:15 <ekcs> sorry just clarifying so I understand what the considerations are.
04:52:35 <ekcs> ooooh
04:52:37 <ekcs> I get it now.
04:52:44 <ekcs> we’re going to add the field.
04:52:52 <akhil_jain> policy automatically calculating if servers are old enough n execute action on them
04:53:06 <ekcs> but the policy writer wants to find out whether a server is more than 1 month old. that requires builtins to handle parsing and calculating time.
04:53:18 <akhil_jain> yes
04:53:38 <ekcs> right. I’ll need to check the current builtins to see whether it handles the case.
04:53:45 <ekcs> i’ll get back to you on that.
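Note on the age check discussed under this topic: deciding whether a server is more than a month old means parsing the `created` timestamp and comparing it with the current time. The sketch below is plain Python, not a Congress builtin. The function name `is_older_than` is hypothetical, and it assumes the timestamp is an ISO 8601 string in UTC such as "2019-01-15T08:30:00Z".

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_older_than(created: str, days: int, now: Optional[datetime] = None) -> bool:
    """Return True if `created` lies more than `days` days before `now`."""
    # Replace a trailing "Z" with an explicit offset so that
    # datetime.fromisoformat accepts the string on older Python versions.
    created_at = datetime.fromisoformat(created.replace("Z", "+00:00"))
    if created_at.tzinfo is None:
        # Assumption: timestamps without an offset are in UTC.
        created_at = created_at.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return now - created_at > timedelta(days=days)


# Fixed reference time so the example is reproducible.
reference = datetime(2019, 2, 22, 4, 53, tzinfo=timezone.utc)
old_server = is_older_than("2019-01-01T00:00:00Z", 30, now=reference)
new_server = is_older_than("2019-02-10T00:00:00Z", 30, now=reference)
```

Both steps, parsing and subtracting, would have to exist as builtins for a policy rule to express the same condition.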
04:54:08 <akhil_jain> ok great, thanks
04:54:27 <akhil_jain> last topic then, about tacker test
04:54:37 <ekcs> btw this is one case for why i have high hopes for the postgres version. they have already built all the builtins. and even when they haven’t someone online already built an extension for it.
04:54:58 <akhil_jain> yes i guess that can solve this
04:55:12 <ekcs> much harder for us to keep up with what’s needed and much harder for the users when we use our own language and policy engine.
04:55:20 <ekcs> ok anyway moving on then like you said
04:55:23 <ekcs> #topic tacker test
04:55:32 <ekcs> so a couple things.
04:55:50 <akhil_jain> i saw your patch. i will test using same and add another test for vnf as well
04:55:52 <ekcs> 1. i added zuul config to enable tacker plugin in CI job. but it’s failing right now.
04:56:03 <akhil_jain> yes saw that
04:56:15 <ekcs> i’m not sure why yet.
04:56:50 <ekcs> and yes I added a patch just to test whether the generic approach works for this case. I think it should but haven’t been able to tell because of the devstack plugin failure.
04:56:51 <ekcs> https://review.openstack.org/#/c/638516/1/congress_tempest_plugin/tests/scenario/congress_datasources/test_tacker.py
04:57:14 <akhil_jain> hm thats y i will test it on my env and let you know
04:57:32 <ekcs> ok great!
04:57:37 <akhil_jain> earlier i didnt understand. now got it. thanks
04:57:45 <ekcs> awesome.
04:58:03 <ekcs> one last thing is:
04:58:53 <ekcs> if you have features you want to finish by feature freeze and don’t have time for the tacker tempest test, then we could merge the driver by FF and then merge the tempest test before RC1. not preferred but in a pinch it can be done.
04:59:06 <ekcs> just keep that in mind in prioritizing what you’d like to do =)
04:59:44 <ekcs> that’s all from me.
05:00:10 <akhil_jain> yes any reviews on tacker driver. i can complete that first. plus i am completing tempest test as well
05:00:54 <ekcs> ok. I think the tacker driver is good to go. I was just waiting for tempest before merging. but we can merge that first if tempest is not ready by FF.
05:00:55 <akhil_jain> nothing else from my side.
05:01:12 <ekcs> ok then. well time’s up too.
05:01:15 <akhil_jain> ok sounds good to me
05:01:17 <ekcs> #endmeeting