06:01:06 <yoctozepto> #startmeeting masakari
06:01:07 <opendevmeet> Meeting started Tue Jun 8 06:01:06 2021 UTC and is due to finish in 60 minutes. The chair is yoctozepto. Information about MeetBot at http://wiki.debian.org/MeetBot.
06:01:08 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
06:01:11 <opendevmeet> The meeting name has been set to 'masakari'
06:01:14 <yoctozepto> #topic Roll-call
06:01:16 <yoctozepto> \o/
06:02:48 <suzhengwei_> o
06:02:52 <yoctozepto> hi suzhengwei_
06:03:21 <suzhengwei_> yoctozepto: hi
06:06:50 <yoctozepto> ok, let's start
06:06:54 <yoctozepto> #topic Agenda
06:07:14 <yoctozepto> * Roll-call
06:07:14 <yoctozepto> * Agenda
06:07:14 <yoctozepto> * Announcements
06:07:14 <yoctozepto> * Review action items from the last meeting
06:07:14 <yoctozepto> * CI status
06:07:15 <yoctozepto> * Backports pending reviews
06:07:15 <yoctozepto> * Xena planning -> https://etherpad.opendev.org/p/masakari-xena-ptg
06:07:17 <yoctozepto> * Open discussion
06:08:43 <yoctozepto> #topic Announcements
06:08:51 <yoctozepto> no major announcements
06:09:08 <yoctozepto> but I have fixed the devstack functional tests job in masakari
06:09:29 <yoctozepto> and added it to the periodic pipeline
06:09:39 <yoctozepto> so we can observe its daily results
06:10:39 <yoctozepto> the breakage was due to devstack switching to ovn
06:10:59 <yoctozepto> we had a bunch of overrides but they turned out to be incompatible
06:11:11 <yoctozepto> so I restored more defaults
06:12:13 <yoctozepto> #topic Review action items from the last meeting
06:12:16 <yoctozepto> there were none
06:12:25 <yoctozepto> #topic CI status
06:12:34 <yoctozepto> really green now
06:13:53 <yoctozepto> #topic Backports pending reviews
06:14:17 <yoctozepto> none
06:14:28 <yoctozepto> #topic Xena planning -> https://etherpad.opendev.org/p/masakari-xena-ptg
06:16:01 <jopdorp> Morning
06:16:30 <yoctozepto> hi jopdorp
06:16:45 <yoctozepto> does anyone have anything to report/discuss on Xena progress?
06:16:58 <yoctozepto> I admit to not having much time to focus on these
06:17:34 <jopdorp> Nothing from me for now
06:18:00 <suzhengwei_> I want to talk about one spec. https://review.opendev.org/c/openstack/masakari-specs/+/734017
06:18:50 <yoctozepto> ok
06:20:07 <suzhengwei_> It passed the Zuul check in October 2020. But now it fails. There are no changes.
06:20:28 <yoctozepto> ah, you mean the job failure
06:20:53 <yoctozepto> ah yes, I read the details from it and copied them as a comment
06:21:20 <yoctozepto> yeah, the error is not clear about the reason for unreadability
06:21:33 <yoctozepto> I guess one needs to download the patchset locally and debug
06:22:01 <yoctozepto> give me a moment
06:25:06 <yoctozepto> the error is the same locally
06:29:38 <opendevreview> Radosław Piliszek proposed openstack/masakari-specs master: host monitor by consul https://review.opendev.org/c/openstack/masakari-specs/+/734017
06:30:07 <yoctozepto> fixed it
06:30:48 <yoctozepto> suzhengwei_: ^
06:31:10 <suzhengwei_> oh. thanks
06:31:48 <yoctozepto> you are welcome
06:32:20 <jopdorp> Great
06:33:47 <yoctozepto> #topic Open discussion
06:34:33 <suzhengwei_> last meeting we talked about how to reduce host failover time.
06:34:46 <yoctozepto> yes
06:35:33 <suzhengwei_> We haven't reached an agreement.
06:38:01 <yoctozepto> looking
06:38:05 <yoctozepto> I thought we had
06:38:45 <suzhengwei_> We agree to evacuate instances after nova-compute is down.
06:39:21 <suzhengwei_> We disagree on whether to wait for nova-compute to be disabled.
06:39:38 <yoctozepto> yes - we either (1) wait for it to be down or (2) force it down
06:39:49 <yoctozepto> we disable nova-compute when we know it is down
06:39:55 <yoctozepto> and don't wait any longer
06:40:01 <yoctozepto> that was my thinking
06:40:22 <yoctozepto> the 1/2 could be configurable because we don't know about fencing
06:40:41 <yoctozepto> the default would be 1 to keep the current/safe one
06:43:09 <yoctozepto> the disabling is masakari's extra feature that we ensure for the user
06:43:15 <yoctozepto> we should not wait on that
06:43:23 <yoctozepto> the current behaviour is a bug
06:43:32 <yoctozepto> I hope I am clear now
06:46:37 <suzhengwei_> yep. We can add our thoughts to the etherpad.
06:48:49 <yoctozepto> ok
06:49:33 <suzhengwei_> I hit the bug again in my env today. https://bugs.launchpad.net/masakari-monitors/+bug/1930361.
06:49:35 <opendevmeet> Launchpad bug 1930361 in masakari-monitors "hostmonitor hangs after notifications send failed" [Critical,In progress] - Assigned to suzhengwei (sue.sam)
06:50:55 <suzhengwei_> ku
06:54:55 <yoctozepto> I've done a writeup on https://etherpad.opendev.org/p/masakari-xena-ptg as you asked
06:55:07 <yoctozepto> L261 and below
06:55:18 <yoctozepto> please comment/amend as you see fit
06:55:38 <yoctozepto> regarding bug #1930361
06:55:39 <opendevmeet> bug 1930361 in masakari-monitors "hostmonitor hangs after notifications send failed" [Critical,In progress] https://launchpad.net/bugs/1930361 - Assigned to suzhengwei (sue.sam)
06:55:56 <yoctozepto> I am curious - what is the error being reported?
06:56:11 <yoctozepto> perhaps we should guard against that specific one instead of all possible?
06:56:59 <yoctozepto> that said, I know we have a solution in https://review.opendev.org/c/openstack/masakari-monitors/+/794162
06:57:10 <suzhengwei_> It is easy to reproduce. While keystone or masakari-api is out of service, trigger one host failure.
06:57:10 <yoctozepto> and it just needs unit tests to be adapted
06:58:29 <yoctozepto> ah, it fails on contacting the api
06:58:30 <yoctozepto> ok
06:58:50 <yoctozepto> added to the bug report
06:59:26 <yoctozepto> I hope to get some time to play more with this this week
06:59:41 <yoctozepto> meanwhile, thank you for the meeting
06:59:51 <yoctozepto> I must switch to another one
06:59:54 <yoctozepto> #endmeeting
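
Note on the host failover discussion (06:39-06:43): the ordering yoctozepto describes can be sketched roughly as below. This is only an illustration of the reasoning in the log, not masakari code. The names DownPolicy and plan_failover, and the step strings, are hypothetical and do not correspond to any real masakari or nova API or config option.

```python
from enum import Enum


class DownPolicy(Enum):
    """Hypothetical setting for options (1) and (2) from the discussion."""
    WAIT = "wait"    # (1) wait until nova-compute is reported down (safe default)
    FORCE = "force"  # (2) force it down; only sensible if the host is fenced


def plan_failover(compute_is_down, policy=DownPolicy.WAIT):
    """Return the ordered steps for one pass of a host-failure workflow."""
    steps = []
    if not compute_is_down:
        if policy is DownPolicy.WAIT:
            # Not known to be down yet, so do nothing destructive this pass.
            return ["wait_for_down"]
        steps.append("force_down")
    # The service is now known to be down: disable it and evacuate right
    # away, without waiting for the disabled state to be confirmed.
    steps.append("disable_compute_service")
    steps.append("evacuate_instances")
    return steps


if __name__ == "__main__":
    print(plan_failover(False))
    print(plan_failover(True))
    print(plan_failover(False, DownPolicy.FORCE))
```

The point of the sketch is that disabling the service is an extra step performed once the service is known to be down, and evacuation never blocks on it.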