06:01:06 <yoctozepto> #startmeeting masakari
06:01:14 <yoctozepto> #topic Roll-call
06:01:16 <yoctozepto> \o/
06:02:48 <suzhengwei_> o
06:02:52 <yoctozepto> hi suzhengwei_
06:03:21 <suzhengwei_> yoctozepto: hi
06:06:50 <yoctozepto> ok, let's start
06:06:54 <yoctozepto> #topic Agenda
06:07:14 <yoctozepto> * Roll-call
06:07:14 <yoctozepto> * Agenda
06:07:14 <yoctozepto> * Announcements
06:07:14 <yoctozepto> * Review action items from the last meeting
06:07:14 <yoctozepto> * CI status
06:07:15 <yoctozepto> * Backports pending reviews
06:07:15 <yoctozepto> * Xena planning -> https://etherpad.opendev.org/p/masakari-xena-ptg
06:07:17 <yoctozepto> * Open discussion
06:08:43 <yoctozepto> #topic Announcements
06:08:51 <yoctozepto> no major announcements
06:09:08 <yoctozepto> but I have fixed the devstack functional tests job in masakari
06:09:29 <yoctozepto> and added it to the periodic pipeline
06:09:39 <yoctozepto> so we can observe its daily results
06:10:39 <yoctozepto> the breakage was due to devstack switching to ovn
06:10:59 <yoctozepto> we had a bunch of overrides but they turned out to be incompatible
06:11:11 <yoctozepto> so I restored more defaults
06:12:13 <yoctozepto> #topic Review action items from the last meeting
06:12:16 <yoctozepto> there were none
06:12:25 <yoctozepto> #topic CI status
06:12:34 <yoctozepto> really green now
06:13:53 <yoctozepto> #topic Backports pending reviews
06:14:17 <yoctozepto> none
06:14:28 <yoctozepto> #topic Xena planning -> https://etherpad.opendev.org/p/masakari-xena-ptg
06:16:01 <jopdorp> Morning
06:16:30 <yoctozepto> hi jopdorp
06:16:45 <yoctozepto> does anyone have anything to report/discuss on Xena progress?
06:16:58 <yoctozepto> I admit to not having much time to focus on these
06:17:34 <jopdorp> Nothing from me for now
06:18:00 <suzhengwei_> I want to talk about one spec. https://review.opendev.org/c/openstack/masakari-specs/+/734017
06:18:50 <yoctozepto> ok
06:20:07 <suzhengwei_> It passed the Zuul check in October 2020. But now it fails. There have been no changes.
06:20:28 <yoctozepto> ah, you mean the job failure
06:20:53 <yoctozepto> ah yes, I read the details from it and copied them as a comment
06:21:20 <yoctozepto> yeah, the error is not clear about the reason for unreadability
06:21:33 <yoctozepto> I guess one needs to download the patchset locally and debug
06:22:01 <yoctozepto> give me a moment
06:25:06 <yoctozepto> the error is the same locally
06:29:38 <opendevreview> Radosław Piliszek proposed openstack/masakari-specs master: host monitor by consul  https://review.opendev.org/c/openstack/masakari-specs/+/734017
06:30:07 <yoctozepto> fixed it
06:30:48 <yoctozepto> suzhengwei_: ^
06:31:10 <suzhengwei_> oh. thanks
06:31:48 <yoctozepto> you are welcome
06:32:20 <jopdorp> Great
06:33:47 <yoctozepto> #topic Open discussion
06:34:33 <suzhengwei_> last meeting we talked about how to reduce host failover time.
06:34:46 <yoctozepto> yes
06:35:33 <suzhengwei_> We haven't reached an agreement.
06:38:01 <yoctozepto> looking
06:38:05 <yoctozepto> I thought we had
06:38:45 <suzhengwei_> We agree to evacuate instances after nova-compute is down.
06:39:21 <suzhengwei_> We disagree on whether to wait for nova-compute to be disabled.
06:39:38 <yoctozepto> yes - we either (1) wait for it to be down or (2) force it down
06:39:49 <yoctozepto> we disable nova-compute when we know it is down
06:39:55 <yoctozepto> and don't wait any longer
06:40:01 <yoctozepto> that was my thinking
06:40:22 <yoctozepto> the 1/2 could be configurable because we don't know about fencing
06:40:41 <yoctozepto> the default would be 1 to keep the current/safe one
06:43:09 <yoctozepto> the disabling is masakari's extra feature that we ensure for the user
06:43:15 <yoctozepto> we should not wait on that
06:43:23 <yoctozepto> the current behaviour is a bug
06:43:32 <yoctozepto> I hope I am clear now
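The flow described above can be sketched roughly as follows. This is only an illustration of the proposal, not masakari code: `FakeCompute`, `handle_host_failure`, the `strategy` values `"wait"` and `"force"`, and all callback parameters are hypothetical names made up for this sketch. It assumes option (1) polls until the service is seen as down, option (2) forces it down, and the disable request is then issued without waiting for it to take effect before evacuating.

```python
import time
from dataclasses import dataclass


@dataclass
class FakeCompute:
    """Hypothetical stand-in for a nova-compute service record."""
    host: str
    down: bool = False
    disabled: bool = False


def handle_host_failure(compute, strategy, is_down, force_down, disable,
                        evacuate, max_polls=10, poll_interval=0.0,
                        sleep=time.sleep):
    """Return the list of steps taken for one failed host.

    strategy "wait"  -> option (1): poll until the service is seen as down
    strategy "force" -> option (2): force the service down ourselves
    All callables are injected, so nothing here talks to a real API.
    """
    steps = []
    if strategy == "wait":
        for _ in range(max_polls):
            if is_down(compute):
                break
            sleep(poll_interval)
        else:
            # Never observed as down: do not evacuate (the safe default).
            steps.append("timeout")
            return steps
        steps.append("observed_down")
    elif strategy == "force":
        force_down(compute)
        steps.append("forced_down")
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    # The service is known to be down at this point. Request the disable,
    # but do not block on it: evacuation proceeds straight away.
    disable(compute)
    steps.append("disable_requested")
    evacuate(compute)
    steps.append("evacuated")
    return steps
```

With `strategy="wait"` as the default, the current safe behaviour is kept for deployments where nothing is known about fencing, while `"force"` is an opt-in for operators who can guarantee the host is really gone.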
06:46:37 <suzhengwei_> yep. We can add our thoughts to the etherpad.
06:48:49 <yoctozepto> ok
06:49:33 <suzhengwei_> I hit the bug again in my env today. https://bugs.launchpad.net/masakari-monitors/+bug/1930361.
06:49:35 <opendevmeet> Launchpad bug 1930361 in masakari-monitors "hostmonitor hangs after notifications send failed" [Critical,In progress] - Assigned to suzhengwei (sue.sam)
06:50:55 <suzhengwei_> ku
06:54:55 <yoctozepto> I've done a writeup on https://etherpad.opendev.org/p/masakari-xena-ptg as you asked
06:55:07 <yoctozepto> L261 and below
06:55:18 <yoctozepto> please comment/amend as you see fit
06:55:38 <yoctozepto> regarding bug #1930361
06:55:39 <opendevmeet> bug 1930361 in masakari-monitors "hostmonitor hangs after notifications send failed" [Critical,In progress] https://launchpad.net/bugs/1930361 - Assigned to suzhengwei (sue.sam)
06:55:56 <yoctozepto> I am curious - what is the error being reported?
06:56:11 <yoctozepto> perhaps we should guard against that specific one instead of all possible?
06:56:59 <yoctozepto> that said, I know we have a solution in https://review.opendev.org/c/openstack/masakari-monitors/+/794162
06:57:10 <suzhengwei_> It is easy to reproduce. While keystone or masakari-api is out of service, trigger one host failure.
06:57:10 <yoctozepto> and it just needs unit tests to be adapted
06:58:29 <yoctozepto> ah, it fails on contacting the api
06:58:30 <yoctozepto> ok
06:58:50 <yoctozepto> added to the bug report
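The idea of guarding against that specific failure, rather than every possible error, might look something like the sketch below. It is not the fix proposed in review 794162 and does not use masakari-monitors code: `ApiUnreachable`, `send_with_guard` and the `send` callable are hypothetical names, and the assumption is that a failed API call surfaces as one identifiable exception type. The send is retried a bounded number of times and then abandoned, so the monitoring loop can carry on instead of hanging.

```python
import time


class ApiUnreachable(Exception):
    """Hypothetical error raised when keystone or the API cannot be reached."""


def send_with_guard(send, notification, retries=3, interval=0.0,
                    sleep=time.sleep):
    """Try to send a notification; return True on success, False otherwise.

    Only ApiUnreachable is caught. Any other exception propagates, so
    unrelated bugs are not silently hidden by the guard.
    """
    for attempt in range(1, retries + 1):
        try:
            send(notification)
            return True
        except ApiUnreachable:
            if attempt < retries:
                sleep(interval)
    # Give up after the last attempt so the caller can keep monitoring.
    return False
```

A caller would log the `False` result and continue with its next monitoring cycle; how many retries make sense is a deployment decision that this sketch does not try to answer.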
06:59:26 <yoctozepto> I hope to get some time to play more with this this week
06:59:41 <yoctozepto> meanwhile, thank you for the meeting
06:59:51 <yoctozepto> I must switch to another one
06:59:54 <yoctozepto> #endmeeting