06:01:06 #startmeeting masakari
06:01:07 Meeting started Tue Jun 8 06:01:06 2021 UTC and is due to finish in 60 minutes. The chair is yoctozepto. Information about MeetBot at http://wiki.debian.org/MeetBot.
06:01:08 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
06:01:11 The meeting name has been set to 'masakari'
06:01:14 #topic Roll-call
06:01:16 \o/
06:02:48 o
06:02:52 hi suzhengwei_
06:03:21 yoctozepto: hi
06:06:50 ok, let's start
06:06:54 #topic Agenda
06:07:14 * Roll-call
06:07:14 * Agenda
06:07:14 * Announcements
06:07:14 * Review action items from the last meeting
06:07:14 * CI status
06:07:15 * Backports pending reviews
06:07:15 * Xena planning -> https://etherpad.opendev.org/p/masakari-xena-ptg
06:07:17 * Open discussion
06:08:43 #topic Announcements
06:08:51 no major announcements
06:09:08 but I have fixed the devstack functional tests job in masakari
06:09:29 and added it to the periodic pipeline
06:09:39 so we can observe its daily results
06:10:39 the breakage was due to devstack switching to OVN
06:10:59 we had a bunch of overrides but they turned out to be incompatible
06:11:11 so I restored more defaults
06:12:13 #topic Review action items from the last meeting
06:12:16 there were none
06:12:25 #topic CI status
06:12:34 really green now
06:13:53 #topic Backports pending reviews
06:14:17 none
06:14:28 #topic Xena planning -> https://etherpad.opendev.org/p/masakari-xena-ptg
06:16:01 Morning
06:16:30 hi jopdorp
06:16:45 does anyone have anything to report/discuss on Xena progress?
06:16:58 I admit to not having much time to focus on these
06:17:34 Nothing from me for now
06:18:00 I want to talk about one spec. https://review.opendev.org/c/openstack/masakari-specs/+/734017
06:18:50 ok
06:20:07 It passed the Zuul check in October 2020, but now it fails. There have been no changes.
06:20:28 ah, you mean the job failure
06:20:53 ah yes, I read the details from it and copied them as a comment
06:21:20 yeah, the error is not clear about the reason for unreadability
06:21:33 I guess one needs to download the patchset locally and debug
06:22:01 give me a moment
06:25:06 the error is the same locally
06:29:38 Radosław Piliszek proposed openstack/masakari-specs master: host monitor by consul https://review.opendev.org/c/openstack/masakari-specs/+/734017
06:30:07 fixed it
06:30:48 suzhengwei_: ^
06:31:10 oh. thanks
06:31:48 you are welcome
06:32:20 Great
06:33:47 #topic Open discussion
06:34:33 last meeting we talked about how to reduce host failover time.
06:34:46 yes
06:35:33 We haven't reached an agreement.
06:38:01 looking
06:38:05 I thought we had
06:38:45 We agreed to evacuate instances after nova-compute is down.
06:39:21 We disagree on whether to wait for nova-compute to be disabled.
06:39:38 yes - we either (1) wait for it to be down or (2) force it down
06:39:49 we disable nova-compute when we know it is down
06:39:55 and don't wait any longer
06:40:01 that was my thinking
06:40:22 the choice between 1 and 2 could be configurable because we don't know about fencing
06:40:41 the default would be 1 to keep the current/safe behaviour
06:43:09 the disabling is masakari's extra feature that we ensure for the user
06:43:15 we should not wait on that
06:43:23 the current behaviour is a bug
06:43:32 I hope I am clear now
06:46:37 yep. We can add our thoughts to the etherpad.
06:48:49 ok
06:49:33 I hit the bug again in my env today. https://bugs.launchpad.net/masakari-monitors/+bug/1930361
06:49:35 Launchpad bug 1930361 in masakari-monitors "hostmonitor hangs after notifications send failed" [Critical,In progress] - Assigned to suzhengwei (sue.sam)
06:50:55 ku
06:54:55 I've done a writeup on https://etherpad.opendev.org/p/masakari-xena-ptg as you asked
06:55:07 L261 and below
06:55:18 please comment/amend as you see fit
06:55:38 regarding bug #1930361
06:55:39 bug 1930361 in masakari-monitors "hostmonitor hangs after notifications send failed" [Critical,In progress] https://launchpad.net/bugs/1930361 - Assigned to suzhengwei (sue.sam)
06:55:56 I am curious - what is the error being reported?
06:56:11 perhaps we should guard against that specific one instead of all possible ones?
06:56:59 that said, I know we have a solution in https://review.opendev.org/c/openstack/masakari-monitors/+/794162
06:57:10 It is easy to reproduce: while keystone or masakari-api is out of service, trigger a host failure.
06:57:10 and it just needs unit tests to be adapted
06:58:29 ah, it fails on contacting the API
06:58:30 ok
06:58:50 added to the bug report
06:59:26 I hope to get some time to play more with this this week
06:59:41 meanwhile, thank you for the meeting
06:59:51 I must switch to another one
06:59:54 #endmeeting
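The two technical points from the open discussion, a configurable "wait for nova-compute down" vs. "force it down" step followed by disabling without waiting, and guarding hostmonitor against one specific notification-send failure, can be sketched as below. This is a minimal illustration under stated assumptions, not masakari's code: every name (`WaitPolicy`, `recover_host`, `ApiUnavailable`, `send_with_guard`) and all callback parameters are hypothetical, and the fix under review (https://review.opendev.org/c/openstack/masakari-monitors/+/794162) may work differently.

```python
import enum
import time
from typing import Callable


class WaitPolicy(enum.Enum):
    """Hypothetical config values for the two options discussed in the meeting."""
    WAIT_FOR_DOWN = 1  # (1) wait until nova-compute is reported down (default, safe)
    FORCE_DOWN = 2     # (2) force nova-compute down at once (only safe with fencing)


def recover_host(
    is_compute_down: Callable[[], bool],
    force_down: Callable[[], None],
    disable_compute: Callable[[], None],
    evacuate: Callable[[], None],
    policy: WaitPolicy = WaitPolicy.WAIT_FOR_DOWN,
    poll_interval: float = 1.0,
    timeout: float = 60.0,
) -> bool:
    """Return True once evacuation has been started, False if waiting timed out."""
    if policy is WaitPolicy.FORCE_DOWN:
        force_down()
    else:
        deadline = time.monotonic() + timeout
        while not is_compute_down():
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_interval)
    # nova-compute is known to be down: disable it, but do not wait for the
    # disabled state to be reported before evacuating.
    disable_compute()
    evacuate()
    return True


class ApiUnavailable(Exception):
    """Hypothetical error for 'keystone or masakari-api cannot be reached'."""


def send_with_guard(
    send: Callable[[dict], None],
    notification: dict,
    retries: int = 3,
    retry_interval: float = 5.0,
) -> bool:
    """Try to send a notification a bounded number of times.

    Only ApiUnavailable is caught (the specific failure, not all possible
    ones); any other exception still propagates to the caller.
    """
    for attempt in range(retries):
        try:
            send(notification)
            return True
        except ApiUnavailable:
            if attempt + 1 < retries:
                time.sleep(retry_interval)
    return False
```

With the default `WaitPolicy.WAIT_FOR_DOWN` the sketch keeps the current, safe behaviour; `FORCE_DOWN` trades safety for speed and only makes sense where fencing is guaranteed, which is why the meeting suggested making it configurable.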