07:01:00 #startmeeting masakari
07:01:00 Meeting started Tue Sep 15 07:01:00 2020 UTC and is due to finish in 60 minutes. The chair is yoctozepto. Information about MeetBot at http://wiki.debian.org/MeetBot.
07:01:01 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
07:01:03 The meeting name has been set to 'masakari'
07:01:06 #topic Roll-call
07:01:18 o/
07:02:01 here we go
07:02:02 o/
07:02:06 hi
07:02:07 o/
07:02:17 o/
07:02:55 o/
07:03:39 #topic Agenda
07:03:43 * Roll-call
07:03:43 * Agenda
07:03:43 * Announcements
07:03:43 ** gates have been fixed and are running Focal now
07:03:43 ** python-masakariclient 6.1.1 released for Victoria (branched): https://releases.openstack.org/victoria/index.html#victoria-python-masakariclient
07:03:43 ** we are in Victoria feature freeze now / RC1 next week: https://releases.openstack.org/victoria/schedule.html
07:03:44 * Review action items from the last meeting
07:03:44 * CI status
07:03:45 * Critical Bugs and Patches
07:03:45 * Victoria release planning
07:03:46 * Open discussion
07:03:54 #topic Announcements
07:04:03 #info gates have been fixed and are running Focal now
07:04:30 a part of the Victoria community goals was to migrate the testing to Focal which runs Py38
07:04:38 it has happened now
07:05:12 I had to self-merge a bunch of patches (due to deadlines) to make it happen but we are there
07:05:42 good
07:05:45 all merged patches are visible in gerrit so you can post-review them still and raise any issues; there should really be none though due to the kind of those patches
07:06:09 I remember a patch that removed py37 tests
07:06:10 many thanks to gmann for helping with the migration and driving the goal openstack-wise
07:06:20 jopdorp: yes
07:06:30 do note py37 is *not* a target platform for openstack now
07:06:41 cool
07:07:14 https://governance.openstack.org/tc/reference/runtimes/victoria.html
07:07:23 thanks
07:07:41 the spoken-of migration has happened so py37 is no longer relevant
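[Editorial aside: per the runtimes discussion above, Victoria's tested Python runtimes are 3.6 and 3.8, with 3.7 dropped. A trivial, purely illustrative helper (not part of any OpenStack tooling) to check a local interpreter against that set:]

```python
import sys

# Victoria targets Python 3.6 and 3.8; py37 is no longer a tested
# runtime (see the governance link in the log above).
# Purely illustrative helper, not an OpenStack utility.
VICTORIA_RUNTIMES = {(3, 6), (3, 8)}

def is_tested_runtime(version_info=None):
    """Return True if (major, minor) is a Victoria tested runtime."""
    major, minor = (version_info or sys.version_info)[:2]
    return (major, minor) in VICTORIA_RUNTIMES
```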
07:07:57 it's unlikely it breaks now that we test py36 and py38 but who knows :-)
07:08:05 agreed
07:08:10 #info https://governance.openstack.org/tc/reference/runtimes/victoria.html
07:08:31 (in case you are wondering why I'm doing the # stuff - it's to get these entries into the summary)
07:08:50 I was indeed wondering
07:09:02 #info python-masakariclient 6.1.1 released for Victoria (branched): https://releases.openstack.org/victoria/index.html#victoria-python-masakariclient
07:09:23 python-masakariclient has a stable/victoria branch now
07:09:31 this week all client libraries got their releases
07:09:54 and there should be no feature changes whatsoever in them now that they are officially "stable"
07:10:13 #info we are in Victoria feature freeze now / RC1 next week: https://releases.openstack.org/victoria/schedule.html
07:10:31 that makes me sad kind of
07:10:42 noonedeadpunk: me too, unfortunately
07:10:49 me, too
07:11:03 this one didn't make it https://review.opendev.org/#/c/740777/
07:11:04 don't worry, we'll make wallaby the best masakari release ever :-)
07:11:10 haha
07:11:13 yes
07:11:21 #info https://review.opendev.org/740777
07:11:50 as a matter of fact, due to low gate activity and low complexity of that patch, I suggest we put it in the exceptions and just merge it
07:11:57 then we have at least one feature in victoria
07:12:24 suzhengwei: what do you think?
07:12:29 jopdorp has already +2
07:12:37 agree
07:12:44 I find it uncomfortable to +2 my own proposals
07:12:48 unless they fix the gates
07:13:15 suzhengwei: thanks, please review and hopefully leave a +2 today/tomorrow :-)
07:13:26 nice
07:13:40 #info Review action items from the last meeting
07:13:48 aaand there were none!
07:13:56 lol
07:13:58 oopsie
07:14:01 #undo
07:14:02 Removing item from minutes: #info Review action items from the last meeting
07:14:10 #topic Review action items from the last meeting
07:14:21 #info there were none
07:14:32 #topic CI status
07:14:50 the master CI is green, I did not check others
07:15:03 master+Victoria considering the client has already branched
07:15:38 we could use some better organization in this regard, I'll try to spin up some CI glory page like we have for Kolla
07:15:59 #action yoctozepto to bring some visibility to Masakari CI status
07:16:09 I don't really know where to look for the ci stuff
07:16:33 only the results in gerrit
07:16:42 but I don't know where they get configured
07:16:59 or where the results for the main branches are visible
07:17:14 no problem, I know most of you are new to this so that's why I just assigned this task to myself
07:17:25 in general one can look at https://zuul.opendev.org/t/openstack/builds
07:17:34 with filters obviously
07:17:38 or https://zuul.opendev.org/t/openstack/status
07:17:41 for currently running
07:17:58 is that also where they are configured?
07:18:03 I don't remember if masakari runs relevant periodics at the moment
07:18:26 jopdorp: all CI stuff is configured in the repo nowadays
07:18:34 the CI/CD system that openstack uses is Zuul
07:18:52 in this setup we are considered users of zuul so these docs hold: https://zuul-ci.org/docs/zuul/reference/user.html
07:19:06 it's usually either zuul.yaml or zuul.d with some more yamls inside
07:19:20 it can be a hidden file/dir so .zuul.yaml .zuul.d respectively
07:19:45 zuul is driven using yaml and ansible (which still uses yaml)
07:20:07 for all the other details please just review the files in repo and the docs :-)
07:20:12 thanks, I'll dive into that
07:20:50 #topic Critical Bugs and Patches
07:21:04 #info none so far
07:21:10 hi, I wanted to raise about this review.
https://review.opendev.org/#/c/720623/
07:21:15 but it could be that they have not been triaged
07:21:37 sorry, probably not the good timing yet?
07:22:09 ykado: well, it's a fix to some bug but not necessarily critical I guess? let's postpone for the open discussion
07:22:19 ok
07:22:38 if you know of breaking/fugly bugs then please report/triage them
07:22:51 we encountered something that I'm not entirely sure is a masakari bug
07:23:12 please speak up
07:23:20 but we weren't able yet to get failovers of instances with LUKS encrypted volumes to work
07:23:36 they get a keymanager error
07:23:42 hmm, that does not sound like something masakari could go wrong about
07:23:45 barbican right related
07:23:52 rights
07:23:58 masakari essentially runs evacuations against instances
07:24:09 try plain evacuation and it might be failing
07:24:20 yeah
07:24:27 I *think* I saw someone reporting this issue against cinder+barbican
07:24:27 I think it's more configuration related
07:24:37 could
07:24:51 well then, let's not wander offtopic too much :-)
07:24:53 probably the place would be #openstack-kolla
07:24:55 #
07:25:08 jopdorp: yeah
07:25:11 #topic Victoria release planning
07:25:45 we already know it's frozen (freezing? :-) ) and we can only really squeeze that one patch of mine I mentioned
07:25:51 (plus obviously any bug fixes)
07:26:05 (noonedeadpunk triggered)
07:26:19 next week is RC1
07:26:29 so all the other repos will branch stable/victoria as well
07:26:45 RC1 is R-3
07:26:57 so then it's a matter of 3 weeks to polish eventual issues
07:27:31 #topic Open discussion
07:27:42 ykado: now it's the time
07:27:47 what about that commit
07:27:52 yoctozepto: thanks
07:28:02 https://review.opendev.org/720623
07:28:15 I was wondering how this can progress.
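[Editorial aside on the "try plain evacuation" suggestion earlier in the log: Masakari's instance recovery ultimately relies on Nova's evacuate server action (POST /servers/{server_id}/action). A minimal sketch of building that request body follows; which optional fields the API accepts depends on the compute API microversion, so treat the field names here as assumptions to verify against the Nova API reference:]

```python
# Sketch only: builds the JSON body for Nova's evacuate server action.
# Field availability varies with the compute API microversion
# (e.g. "force" exists only in some microversions), so verify against
# the Nova API reference before relying on this.

def evacuate_body(host=None, force=None):
    action = {}
    if host is not None:
        action["host"] = host    # optional target hypervisor
    if force is not None:
        action["force"] = force  # microversion-dependent field
    return {"evacuate": action}
```

Trying such a plain evacuation directly, as suggested, would show whether the keymanager error comes from the Nova/Barbican side rather than from Masakari.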
07:28:53 the Radosław guy is me so my opinion on how that should progress is in that comment there
07:29:29 the "better design" part is surely about wallaby now
07:29:43 but the rest holds
07:30:25 suzhengwei: could you comment on that?
07:30:49 the part that needs dealing with is "I guess we could still mix the two ideas and check on init while considering the timeout"
07:31:04 the current design is the simplest
07:31:21 so it's actually a mix of your (suzhengwei) and tpatil's ideas
07:32:12 no, I didn't get tpatil's thought.
07:32:38 I see. it is true that there is no real reproducible way, if I understand it correctly.
07:32:38 I could only reproduce this by forcefully powering off all the compute nodes or by stopping the masakari-engine services
07:32:38 however, without this patch there is no way to recover the compute nodes that got resolved, unless you update the database manually
07:33:26 yes, it is a big usability problem for the product.
07:33:49 hmm, maybe what we need is an easier but manual way to achieve that
07:34:13 I'm worried tpatil is right that this could be too aggressive and result in more masakari surprises
07:34:46 i.e. simple but backstabbing :-)
07:35:56 It gives an expiry time for the user to configure, and that looks reasonable.
07:36:38 I agree. sorry, I'm still new to Masakari, but what are the potential issues related to "running" statuses if this gets removed (although the default timeout value is quite long, as suzhengwei mentioned)?
07:36:55 If it can't recover one failed host in a short time, the HA is useless.
07:38:25 suzhengwei: true that
07:38:27 I think to turn it into failure is OK
07:38:58 one thing is some notifications are host-level so for a large host this could take a while
07:40:13 so I leave the expiry time configurable.
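[Editorial aside: the configurable expiry being discussed can be sketched roughly as below. This is a minimal illustration of the idea, not Masakari's actual code; the names and the 24-hour default are assumptions taken from this conversation:]

```python
from datetime import datetime, timedelta, timezone

# Illustrative only: names below are NOT Masakari's real API.
# The idea from the discussion: a notification stuck in "running"
# longer than a configurable expiry window is flipped to "failed",
# so the host can be recovered without manual database edits.
EXPIRED_NOTIFICATION_TIME = timedelta(hours=24)  # configurable; 24h default per the chat

def resolve_stale_status(status, generated_time, now=None,
                         expiry=EXPIRED_NOTIFICATION_TIME):
    """Return the status a notification should have after the expiry check."""
    now = now or datetime.now(timezone.utc)
    if status == "running" and now - generated_time > expiry:
        return "failed"
    return status
```

The trade-off raised by tpatil shows up here directly: too short an expiry marks slow-but-healthy recoveries as failed, too long leaves unrecoverable hosts stuck, hence the configurable default.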
07:40:58 aye, it's set to 24 hours
07:41:01 by default
07:42:26 ok, there is one edge case that this deals with because of RUNNING and generated_time
07:42:46 it could be that the engine picks up a notification to run and sabotages itself
07:43:06 imagine a situation where the engine was down too long
07:43:36 or maybe not
07:43:44 because the rpc call will surely expire by this point
07:44:33 I think it is a controller node issue.
07:44:47 controller HA issue.
07:45:30 yeah, masakari does not do a great job of self-HA
07:45:35 other services also suffer
07:46:32 yeah, but it's no consolation considering masakari is THE HA project :D
07:48:07 controller HA and instance/compute HA are different issues. They have their own solutions.
07:49:33 true that but still sad
07:49:42 we can't avoid the influence of all controller HA problems.
07:50:47 well, we could mitigate more though but it needs some redesign to happen
07:51:03 your approach seems to be dealing with the reported issue
07:51:23 Doing it in a better way is better than doing nothing.
07:53:19 https://review.opendev.org/#/c/732477/
07:54:11 this is a long-term spec; it gives a solution: degrade-retry.
07:56:10 looks promising
07:57:03 ok, I'll re-review suzhengwei's patch; I just need to delve into the masakari code more to be more confident about it
07:57:11 anyone else up to the review task?
07:58:42 oh my, I have completely forgotten - the virtual PTG is coming - do we want a session for masakari? do you have any time preferences? please let me know via mail - I'll spin up a thread on openstack-discuss
07:58:48 I hope you are all subscribed
07:59:14 #action yoctozepto to spin up a Masakari Wallaby vPTG thread on openstack-discuss mailing list
07:59:45 please suzhengwei remember to review the only-feature-patch-that-we-can-get-in
07:59:47 thank you
07:59:52 and thank you all for attending
07:59:58 thank you!
08:00:03 #endmeeting