03:00:27 <samP> #startmeeting masakari
03:00:28 <openstack> Meeting started Tue Jun 5 03:00:27 2018 UTC and is due to finish in 60 minutes. The chair is samP. Information about MeetBot at http://wiki.debian.org/MeetBot.
03:00:29 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
03:00:31 <openstack> The meeting name has been set to 'masakari'
03:00:37 <samP> Hi all for masakari
03:00:37 <sagara> hi
03:00:40 <tpatil> Hi
03:00:42 <Dinesh__Bhor> Hi
03:00:50 <samP> Hi all,
03:01:11 <samP> From today meeting will start at 0300UTC
03:01:25 <samP> Let's start
03:01:38 <samP> #topic High priority items
03:02:04 <samP> Just push a patch to release python-masakariclient
03:02:08 <samP> #link https://review.openstack.org/#/c/572244/
03:02:26 <samP> It has some validation error, I will fix it.
03:02:34 <samP> This will release 5.1.0
03:03:53 <samP> Any other high priority items?
03:04:10 <samP> if not lets move to bug and patches
03:04:26 <samP> #topic Bug/Patches
03:04:45 <samP> Any critical bugs or patches to discuss?
03:05:05 <tpatil> https://bugs.launchpad.net/masakari/+bug/1773132
03:05:06 <openstack> Launchpad bug 1773132 in masakari "masakari-engine runs recovery twice for one notification when disconnection with rabbitmq" [Undecided,Confirmed]
03:06:25 <tpatil> Looking at the code, I have marked this bug as confirmed
03:06:38 <tpatil> I will fix this issue
03:07:07 <samP> This problem could happen.
03:07:27 <tpatil> basically we will need to get the notification from db and compare the previous status with the current one and take decision to skip processing
03:09:51 <samP> tpatil: thanks. Do we have any exceptions for rabbit mq delivery failures?
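[Editor's note] The skip-processing idea tpatil describes at 03:07:27 (read the notification back from the database and skip it if its status has already moved on) might look roughly like the sketch below. This is only an illustration: `Notification`, `InMemoryStore`, `process_once` and the status strings are hypothetical names chosen here, not masakari's actual code or API, and the in-memory store stands in for the real database.

```python
# Hypothetical sketch; none of these names come from masakari itself.
from dataclasses import dataclass

# Status names are assumptions chosen for illustration.
NEW = "new"
RUNNING = "running"
FINISHED = "finished"


@dataclass
class Notification:
    uuid: str
    status: str = NEW


class InMemoryStore:
    """Stand-in for the notification table in the database."""

    def __init__(self):
        self._rows = {}

    def save(self, notification):
        self._rows[notification.uuid] = notification

    def get(self, uuid):
        return self._rows[uuid]


def process_once(store, uuid, recover):
    """Run recover() only if the stored status is still NEW.

    Returns True if recovery ran, False if the message was skipped
    because an earlier delivery already picked it up.
    """
    notification = store.get(uuid)  # re-read the current state from the store
    if notification.status != NEW:
        return False  # status already changed: treat as a redelivery and skip
    notification.status = RUNNING
    store.save(notification)
    recover(notification)
    notification.status = FINISHED
    store.save(notification)
    return True


store = InMemoryStore()
store.save(Notification("n1"))
recovered = []
first = process_once(store, "n1", recovered.append)
second = process_once(store, "n1", recovered.append)  # simulated redelivery
```

Here the second call is skipped because the stored status is no longer "new". Note that the read-then-update in this sketch is not atomic; a real fix would presumably need the status change to be a single conditional update in the database so that two engines cannot both pass the check.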
03:10:34 <tpatil> on masakari-aPI side, yes
03:10:57 <tpatil> but after publishing, masakari-api return success status to the caller
03:11:33 <tpatil> and if masakari-engine doesn't get the message for long time, then this situation could occur
03:11:45 <samP> tpatil: correct
03:11:54 <tpatil> this situation is rare though
03:12:35 <samP> for Host failure, this would not be a critical issue.
03:13:36 <samP> ah..can't say for sure. I take my statement back
03:13:38 <samP> sorry
03:16:02 <samP> tpatil: thanks for fixing this. Let's discuss once we have patches
03:16:12 <tpatil> samP: Sure
03:16:16 <samP> Any other bugs?
03:16:19 <samP> or ptches
03:16:31 <samP> s/ptches/patches
03:17:17 <samP> if not let's move to discussion
03:17:26 <samP> #topic discussion
03:17:28 <tpatil> https://bugs.launchpad.net/masakari/+bug/1773765
03:17:29 <openstack> Launchpad bug 1773765 in masakari "There is a possibility that 'running' notification will remain" [Undecided,New]
03:17:43 <tpatil> The issue is almost same as the previous one
03:18:49 <tpatil> since host evacuation is lengthy process, it could fail in between and the status would remain as running instead of "failed".
03:18:50 <samP> almost same, but not duplicate.
03:19:19 <tpatil> But in production environment,this situation could only occur in case of power failure
03:19:42 <samP> or the network failure
03:20:01 <tpatil> yes
03:20:37 <samP> is it possible to check evacuation status from nova side/
03:20:39 <samP> ?
03:20:53 <tpatil> since we cannot predict how much time its going to take to finish processing host failure notifications, it's difficult to rerun based on running status in the periodic tasks
03:21:24 <samP> tpatil: agree
03:21:57 <samP> Need to check about evacuation, but live migration have cancel feature.
03:23:41 <samP> Second thought, even we have cancel feature, better to wait till it finish or become error
03:25:24 <tpatil> In that case, masakari will need to store the request id and query based on it. But does nova support this feature? need to check
03:26:01 <tpatil> need to check instance actions and figure out the status based on the request if
03:26:02 <samP> well we could listen to nova notifications
03:26:05 <tpatil> s/if/id
03:26:27 <tpatil> yes, that's another option too
03:26:40 <tpatil> but in case of power/network failure, this won't happen, correct
03:26:59 <samP> tpatil: correct
03:27:01 <tpatil> I mean we wouldn't get notifications from nova
03:27:43 <samP> In such failure, we cant say for sure that we can get those notifications
03:28:30 <samP> Best option is to leave this to operator to handle
03:29:11 <tpatil> one thing is for sure, we will need to query nova to find out whether evacuation succeeded or failed and then maybe we can take some decision to process notifications with running status in periodic tasks
03:29:32 <samP> tpatil: agree
03:31:24 <samP> I will update bug report with my findings
03:32:37 <tpatil> we can certainly add some code in the periodic task to notify operators that some notifications which are in running status are taking longer time to process than expected (configurable using new config option)
03:33:45 <samP> + warning log is useful too
03:33:47 <tpatil> it will log warning messages and then operator will need to figure out the issue by themselves
03:34:10 <samP> tpatil: got it
03:35:10 <samP> I think that would be fine for immediate fix
03:35:36 <tpatil> Ok, We will fix this issue
03:36:20 <samP> It is better if we can load this config without restarting masakari-api
03:37:19 <samP> tpatil: thanks
03:38:00 <tpatil> config option will be needed in masakari-engine
03:38:30 <samP> tpatil: sorry, you are correct.
03:40:09 <samP> Any other bugs or patches?
03:40:23 <samP> Please bring them up any time.
03:40:32 <samP> (1) Horizon Plugin
03:40:47 <samP> Need to review...
03:41:13 <tpatil> Niraj will upload a new PS today
03:41:24 <tpatil> then we should start reviewing Add segment panel patch
03:41:40 <samP> I have the check to release deadline for horizon plug-in for Rocky
03:41:48 <samP> tpatil: sure, thans
03:41:56 <samP> s/thans/thanks
03:42:30 <samP> Any way we better merge main patches before Rocky-3
03:42:46 <samP> I will let you know the exact dates for this.
03:43:08 <samP> (2) Ansible support for Masakari
03:43:32 <tpatil> Rocky 3 milesone release date is Jul 23 - Jul 27
03:43:48 <samP> tpatil: yep.
03:44:03 <tpatil> Niraj is working on fixing functional tests
03:44:24 <tpatil> then he will added documentation to install masakari-api and masakari-engine
03:44:32 <tpatil> s/add/added
03:44:38 <samP> tpatil: Thanks
03:44:59 <tpatil> next action is to write masakari-monitor role
03:45:35 <tpatil> fixing functional tests and adding documentation will be done in this week
03:45:43 <samP> tpatil: yep, which might cause some troubles.
03:45:55 <tpatil> and then we will focus on masakari-monitor role
03:45:58 <samP> tpatil: got it. thanks for fixing tests and add docs
03:46:04 <samP> tpatil: got it
03:47:05 <samP> #topic AOB
03:47:41 <samP> I'm working on rpm packaging
03:48:02 <samP> Here is the test packaging which is failing currently..
03:48:09 <samP> #link https://copr.fedorainfracloud.org/coprs/sampntt/masakari/
03:48:36 <samP> I will fix this soon, and propose this to RDO.
03:48:56 <samP> Then we can use dnf or yum to install masakari packages...
03:49:20 <samP> That's all form my side
03:49:31 <samP> any updates?
03:49:43 <tpatil> recovery workflow customization
03:49:56 <tpatil> need to review specs
03:50:03 <samP> tpatil: ah, sorry I miss that
03:50:11 <samP> tpatil: I will review that
03:50:19 <tpatil> Thank you
03:51:02 <samP> Need to renew the agenda on wiki...
03:51:48 <tpatil> I will add my agenda items before the next meeting
03:51:57 <samP> And need to update masakari wiki too..
03:52:13 <samP> tpatil: thanks that would be helpful
03:52:26 <tpatil> Any updates about project mascot?
03:52:35 <samP> tpatil: not yet.
03:52:39 <samP> I will ping them
03:52:44 <tpatil> Ok, thanks
03:55:11 <samP> Any updates?
03:55:25 <samP> or we could finish today's meeting
03:55:35 <tpatil> Nothing from my end for now
03:55:39 <samP> tpatil: thanks
03:56:31 <samP> Please use openstack-dev ML with [masakari] or IRC #openstack-masakari @freenode for further discussion
03:56:40 <samP> Thank you all...
03:56:45 <Dinesh__Bhor> Thank you all
03:56:47 <samP> #endmeeting
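[Editor's note] The immediate fix agreed for bug 1773765 (03:32:37 to 03:35:36) is a periodic check that warns operators when a notification has stayed in running status longer than a configurable threshold. A minimal sketch of that check follows. The function name `find_stale_running`, the constant `RUNNING_WARN_THRESHOLD_SECONDS`, its default value, and the dict shape of a notification are all assumptions made for illustration; the meeting does not name the config option or its default.

```python
import logging
from datetime import datetime, timedelta

LOG = logging.getLogger(__name__)

# Hypothetical option name and default; the meeting only says the
# threshold should be configurable in masakari-engine.
RUNNING_WARN_THRESHOLD_SECONDS = 1800


def find_stale_running(notifications, now,
                       threshold_seconds=RUNNING_WARN_THRESHOLD_SECONDS):
    """Return notifications stuck in 'running' longer than the threshold.

    Each notification is assumed to be a dict with 'uuid', 'status'
    and 'updated_at' (a datetime) keys. A warning is logged for each
    stale entry so that an operator can investigate.
    """
    limit = timedelta(seconds=threshold_seconds)
    stale = [
        n for n in notifications
        if n["status"] == "running" and now - n["updated_at"] > limit
    ]
    for n in stale:
        LOG.warning(
            "Notification %s has been in running status for %s, "
            "longer than the expected %s",
            n["uuid"], now - n["updated_at"], limit,
        )
    return stale


now = datetime(2018, 6, 5, 4, 0, 0)
rows = [
    {"uuid": "a", "status": "running", "updated_at": now - timedelta(hours=1)},
    {"uuid": "b", "status": "running", "updated_at": now - timedelta(minutes=5)},
    {"uuid": "c", "status": "finished", "updated_at": now - timedelta(hours=2)},
]
stale_ids = [n["uuid"] for n in find_stale_running(rows, now)]
```

With the assumed 30-minute threshold, only notification "a" is reported. As discussed in the meeting, this check only logs; it does not rerun or fail the notification, and deciding what to do is left to the operator.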