15:03:53 #startmeeting oslo
15:03:53 Courtesy ping for bnemec, smcginnis, moguimar, johnsom, stephenfin, bcafarel, kgiusti, jungleboyj
15:03:53 #link https://wiki.openstack.org/wiki/Meetings/Oslo#Agenda_for_Next_Meeting
15:03:54 Meeting started Mon Aug 10 15:03:53 2020 UTC and is due to finish in 60 minutes. The chair is bnemec. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:03:55 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:03:58 The meeting name has been set to 'oslo'
15:04:11 o/
15:07:43 #topic Red flags for/from liaisons
15:07:56 Nothing from Octavia
15:08:07 I was out last week so I have no idea what's going on. Hopefully someone else can fill us in. :-)
15:08:59 nothing from Barbican
15:09:08 we didn't have a meeting last week
15:09:39 now Hervé is also on PTO
15:10:14 Might be a quick meeting then.
15:10:15 and we need to come to a decision about kevko_'s patch
15:10:24 Which is okay since I have a ton of emails to get through. :-)
15:10:35 this: https://review.opendev.org/#/c/742193/
15:11:56 I've added it to the agenda.
15:12:22 #topic Releases
15:12:37 I'll try to take care of these this week since Herve is out.
15:13:41 I guess that's all I have on this topic.
15:13:45 #topic Action items from last meeting
15:14:07 "kgiusti to retire devstack-plugin-zmq"
15:14:41 in progress
15:14:51 Cool, thanks.
15:14:59 "hberaud to sync oslo-cookiecutter contributing template with main cookiecutter one"
15:15:08 Pretty sure I voted on this patch.
15:15:40 Yep.
15:15:41 #link https://review.opendev.org/#/c/743939/
15:15:50 It's blocked on ci.
15:16:22 Which is fixed by https://review.opendev.org/#/c/745304
15:16:49 So, all in progress, which is good.
15:17:09 #topic zuulv3 migration
15:17:22 The zmq retirement is related to this.
15:17:40 I thought I saw something about migrating grenade jobs too.
15:17:48 yep
15:18:01 line 213: https://etherpad.opendev.org/p/goal-victoria-native-zuulv3-migration
15:18:04 Yeah - that's one of the "retirement" tasks
15:18:21 https://review.opendev.org/#/q/status:open+project:openstack/project-config+branch:master+topic:retire-devstack-plugin-zmq
15:18:25 - the openstack/devstack-plugin-zmq jobs are covered by repository retirement
15:18:40 Sean McGinnis proposed openstack/oslo-cookiecutter master: sync oslo-cookiecutter contributing template https://review.opendev.org/743939
15:18:42 - oslo.versionedobjects is fixed by https://review.opendev.org/745183
15:19:02 - and clarkb provided patches to port the pbr jobs (https://review.opendev.org/745171, https://review.opendev.org/745189, https://review.opendev.org/745192)
15:19:10 \o/
15:19:47 So basically we have changes in flight to address all of the remaining oslo jobs.
15:20:03 correct
15:20:34 stephenfin: See the above about the pbr jobs. I know you had looked at that too.
15:21:16 Okay, we're on track for this goal.
15:21:20 Thanks for the updates!
15:21:25 And all the patches!
15:21:55 #topic oslo.cache flush patch
15:22:02 #link https://review.opendev.org/#/c/742193/
15:22:19 moguimar: kevko_: You're up!
15:22:39 cc lbragstad since he had thoughts on this too.
15:22:53 I think the patch is pretty much solid
15:23:11 i need to look at it again
15:23:18 but I'm concerned about Keystone expecting the default behavior to be True and us flipping it to False
15:24:16 If we do go ahead with the patch we must have a way for keystone to default that back to true, IMHO.
15:24:40 imo - it seems like they need to scale up their memcached deployment
15:24:50 And since Keystone is one of the main consumers of oslo.cache I'm unclear how much it will help to turn it off only in other places.
15:25:04 because it appears the root of the issue is that a network event causes memcached to spiral into an unrecoverable error
15:26:01 i need to stand up an environment with caching configured to debug the issue where you don't flush, because i'm suspicious that stale authorization data will be returned
15:26:39 (e.g., when memcached is unreachable, the user revokes their token or changes their password, but their tokens are still in memcached)
15:27:04 what if the default value was True instead?
15:27:25 I think it's just one server going down in the pool, then the token getting revoked on a different one, then the original server coming back up that is the problem.
15:27:39 IIUC it can result in a bad cached value for the server that disconnected.
15:27:50 Merged openstack/oslo-cookiecutter master: Add ensure-tox support. https://review.opendev.org/745304
15:27:54 right - you could have inconsistent data across servers
15:28:01 and we don't really handle that in keystone code
15:28:14 * bnemec proposes that we just rm -rf memcache_pool
15:28:52 well - that's essentially what we assume since we flush all memcached data (valid and invalid) when the client reconnects
15:29:22 (we're not sure what happened when you were gone, but rebuild the source of truth)
15:29:56 rebuild from keystone's database, which is the source of truth *
15:32:31 i need to dig into this more, but i haven't had the time
15:32:50 so i don't want to hold things up if it's in a reasonable place (where keystone can opt into the behavior we currently have today)
15:34:13 the patch does two things, turns it into a config option and flips the default behavior
15:39:36 I'm curious what happens in the affected cluster if they just restart all of their services. Doesn't it trigger the same overload?
15:39:46 Maybe on a rolling restart it's spread out enough to not cause a problem?
15:41:07 Moisés Guimarães proposed openstack/oslo.cache master: Bump dogpile.cache's version for Memcached TLS support https://review.opendev.org/745509
15:42:20 Okay, I've left a review that reflects our discussion here. Let me know if I misrepresented anything.
15:43:17 Merged openstack/oslo-cookiecutter master: sync oslo-cookiecutter contributing template https://review.opendev.org/743939
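To make the proposal above concrete, here is a minimal sketch of the oslo.config pattern being discussed: the library exposes the flush-on-reconnect behaviour as a boolean option with a new default of False, and a consumer that depends on the old behaviour (keystone, per the concerns above) pins the default back to True. The option name, group, and helper code below are illustrative assumptions, not taken from review 742193.

```python
# Illustrative sketch only; the option name and group are assumptions,
# not copied from the actual oslo.cache patch.
from oslo_config import cfg

_cache_opts = [
    cfg.BoolOpt('memcache_pool_flush_on_reconnect',
                default=False,  # the proposed new library default
                help='Flush all cached data when a memcached server in '
                     'the pool reconnects after a network event.'),
]

CONF = cfg.CONF
CONF.register_opts(_cache_opts, group='cache')

# A consumer that relies on flush-on-reconnect for correctness could
# restore the previous behaviour without waiting on the library:
cfg.set_defaults(_cache_opts, memcache_pool_flush_on_reconnect=True)
```

Whether flipping the library default is acceptable at all is the open question in the discussion above; the sketch only shows that a consumer can keep its effective behaviour unchanged once an opt-in knob exists.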
15:43:29 #topic enable oslo.messaging heartbeat fix by default?
15:43:44 This came up the week before I left.
15:43:59 seems like a safe bet at this point
15:44:08 Related to the oslo.messaging ping endpoint change.
15:44:34 yeah, that change I'm not so thrilled about.
15:45:11 I was thinking of -2'ing that change, but wanted to discuss it here first.
15:45:24 too bad herve is off having a life :)
15:45:38 Yeah, related only insofar as it came up in the discussion as an issue with checking liveness of services.
15:45:43 I wanted his opinion
15:45:55 bnemec: +1
15:46:11 We can probably wait until next week. This option has been around for quite a while now so it's not critical that we do it immediately.
15:46:28 I'm not aware of anyone reporting issues with it though.
15:46:46 neither am I
15:47:10 but I think we do need to make a final decision on that ping patch
15:47:14 Okay, I'll just leave it on the agenda for next week.
15:47:21 https://review.opendev.org/#/c/735385/
15:47:31 Was there more discussion on that after I logged off?
15:47:46 * bnemec has not been through openstack-discuss yet
15:48:01 Lemme check...
15:48:44 http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016229.html
15:49:17 and the start of the discussion: http://lists.openstack.org/pipermail/openstack-discuss/2020-July/016097.html
15:49:45 I think it's in the same place as when I left. :-/
15:50:05 KK
15:50:23 It feels like a bug in Nova if a compute node can stop responding to messaging traffic and still be seen as "up".
15:50:59 Agreed. Seems like that proposed feature is out of scope for the oslo.messaging project IMHO.
15:51:43 Other than internal state monitoring, o.m. isn't intended to be a healthcheck solution
15:52:03 I feel weird arguing against this when I've been advocating for good-enough healthchecks in the api layer though. :-/
15:52:36 heck I _wanted_ this, but for my own selfish "don't blame me" reasons :)
15:53:14 Having dan's opinion made me rethink that from a more user-driven perspective.
15:54:36 Anyhow, that's where we stand at the moment.
15:55:12 I was wondering if any folks in Oslo felt differently.
15:55:30 Unfortunately we're a bit short on Oslo folks today.
15:56:16 I'm going to reply to the thread and ask if fixing the service status on the Nova side would address the concern here. That seems like a better fix than adding a bunch of extra ping traffic on the rabbit bus (which is already a bottleneck in most deployments).
15:56:38 +1
15:57:04 #action bnemec to reply to rpc ping thread with results of meeting discussion
15:57:12 thanks bnemec
15:58:43 Okay, we're basically at time now so I'm going to skip the wayward review and open discussion.
15:59:07 I think we had some good discussions this week though, so it was a productive meeting.
15:59:29 If there's anything else we need to discuss, feel free to add it to the agenda for next week or bring it up in regular IRC.
15:59:42 Thanks for joining everyone!
15:59:45 not on my end
15:59:58 o/
16:00:00 #endmeeting
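For context on the ping-endpoint debate (review 735385), the rough shape of such a feature, sketched with the public oslo.messaging API, looks like the following. This is not the code under review; the PingEndpoint class, topic, and server names are hypothetical. It illustrates the traffic concern raised in the meeting: every liveness probe becomes an RPC round-trip through the RabbitMQ broker, and a successful reply only proves the service is still consuming from its RPC queue.

```python
# Hypothetical sketch, not the change proposed in review 735385.
from oslo_config import cfg
import oslo_messaging as messaging


class PingEndpoint(object):
    """Trivial RPC endpoint: a reply only proves the service is still
    consuming messages from its RPC queue via the broker."""

    def ping(self, ctxt, arg=None):
        return arg


transport = messaging.get_rpc_transport(cfg.CONF)
target = messaging.Target(topic='some-service', server='node-1')
server = messaging.get_rpc_server(transport, target, [PingEndpoint()],
                                  executor='threading')
server.start()

# A health checker would then probe liveness over the same message bus:
client = messaging.RPCClient(transport, messaging.Target(topic='some-service'))
alive = client.call({}, 'ping', arg='pong') == 'pong'
```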