14:00:01 <mnasiadka> #startmeeting kolla
14:00:01 <opendevmeet> Meeting started Wed Feb  7 14:00:01 2024 UTC and is due to finish in 60 minutes.  The chair is mnasiadka. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:01 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:01 <opendevmeet> The meeting name has been set to 'kolla'
14:00:04 <mnasiadka> #topic rollcall
14:00:04 <mnasiadka> o/
14:00:08 <mmalchuk> \o
14:00:18 <osmanlicilegi> o/
14:00:30 <dougszu> |o
14:00:31 <jovial> \o
14:00:33 <frickler> \o
14:00:35 <mhiner> o/
14:00:46 <kevko> \o/
14:01:14 <bbezak> \o
14:01:14 <mattcrees> o/
14:01:21 <jangutter> o/
14:01:38 <SvenKieske> o/
14:02:04 <mnasiadka> #topic agenda
14:02:04 <mnasiadka> * CI status
14:02:05 <mnasiadka> * Release tasks
14:02:05 <mnasiadka> * Regular stable releases (first meeting in a month)
14:02:05 <mnasiadka> * Current cycle planning
14:02:06 <mnasiadka> * Additional agenda (from whiteboard)
14:02:06 <mnasiadka> * Open discussion
14:02:08 <mnasiadka> #topic CI status
14:02:28 <mnasiadka> So, we broke OVN jobs - fixing that with https://review.opendev.org/c/openstack/kolla-ansible/+/908166
14:02:49 <mnasiadka> Rocky9 Ironic CI is suffering from ironic-api vs ironic-inspector race
14:02:57 <mnasiadka> any other CI issues that I haven't noticed?
14:03:07 <mnasiadka> And centos9 decided to break libvirt
14:03:11 <bbezak> cephadm jobs?
14:03:30 <bbezak> last time I checked those were failing
14:03:47 <mnasiadka> yeah, those fail from time to time, but maybe there's something new in them
14:03:53 <mnasiadka> anyway, CI needs some love
14:03:57 <frickler> yoga is gone to unmaintained, so upgrade jobs in zed will be failing, too. I pushed patches already
14:04:14 <mnasiadka> Ok, let's merge those if they pass
14:04:23 <mnasiadka> #topic Release tasks
14:04:48 <mnasiadka> R-8 week - nothing planned for it in release schedule
14:05:07 <mnasiadka> #topic Regular stable releases
14:05:16 <mnasiadka> Did we get in the stable releases in Jan?
14:05:21 <bbezak> yes
14:05:40 <mnasiadka> fantastic, so then it's time for Feb releases (excluding Yoga of course)
14:05:44 <mnasiadka> Any volunteer?
14:06:07 <bbezak> can do
14:06:16 <bbezak> aka will do
14:06:18 <frickler> updating the docs to drop yoga would also be nice
14:06:34 <mnasiadka> good idea
14:06:45 <mnasiadka> #topic Current cycle planning
14:06:53 <mnasiadka> We bumped Ansible
14:07:00 <mnasiadka> jovial: Would be nice to do the same in Kayobe
14:07:12 <jovial> Sounds like a good idea
14:07:13 <mnasiadka> I started working on the OVN BGP Agent
14:07:38 <mnasiadka> The same for Ubuntu 24.04 LTS - so we don't do it last minute
14:08:11 <SvenKieske> slightly OT but might be good to be aware of, this OVN SNAT bug, if you haven't seen it already: https://bugs.launchpad.net/bugs/2051935
14:08:14 <mnasiadka> SLURP patch would like some reviews I guess - https://review.opendev.org/c/openstack/kolla-ansible/+/905322
14:08:24 <bbezak> I think I'll add secure RBAC for ironic to the release tasks, as it looks we need to push it there this release
14:08:48 <mnasiadka> SvenKieske: that's probably a bit unusual setup, as in two levels of routers
14:08:49 <SvenKieske> +1 I'm happy to review the rbac and service role stuff, want to be done with it :D
14:09:34 <SvenKieske> true, regarding the bug, but still a little bit disturbing.
14:09:38 <mnasiadka> anybody working on anything from the list?
14:09:56 <mnasiadka> list == whiteboard L231
14:10:26 <SvenKieske> I pestered some folks from OSBA regarding mirrors at fosdem
14:10:54 <mnasiadka> Any luck?
14:10:57 <SvenKieske> might be we actually get new mirrors for a more stable CI, but I believe it when I see it (I guess I will need to do more talking still)
14:11:24 <mnasiadka> Ok then, good luck ;)
14:11:28 <SvenKieske> it was promised to me under the influence of some alcohol, so let's see how the promise holds up once everyone is sober ;)
14:12:07 <mnasiadka> Ok, let's move to topics from whiteboard
14:12:11 <mnasiadka> #topic Additional agenda (from whiteboard)
14:12:21 <mnasiadka> (SvenKieske): https://bugs.launchpad.net/kolla-ansible/+bug/2049762 (service token verification in cinder wrong?)
14:12:45 <mnasiadka> SvenKieske: I guess after bbezak's work - we could just stop setting service_token = admin?
14:12:52 <SvenKieske> yeah
14:13:24 <kevko> we should
14:13:31 <SvenKieske> I proposed a singular patch for that, I was curious if our CI would break, it didn't, at least not obvious. I guess it's a matter of taste if we want two patches for that
14:14:19 <mnasiadka> ok, we should get all service roles/tokens/ironic system scope patches as RP+1 and start reviewing them
14:14:20 <SvenKieske> I'm fine either way, we probably need to discuss the service role patch distinctly. I think it can be much simpler than it currently is.
14:14:35 <mnasiadka> bbezak: can you group them in a topic and do RP+1?
14:14:36 <SvenKieske> big +1 from me on getting this stuff over the line :)
14:14:44 <bbezak> service role is ready to review https://review.opendev.org/c/openstack/kolla-ansible/+/815577
14:14:48 <bbezak> the other one I have some more ideas
14:15:15 <bbezak> but yeah, I'll group them in the topic
14:15:21 <mnasiadka> goodie, thanks
14:15:37 <mnasiadka> (bbezak): Service role discussion - https://review.opendev.org/c/openstack/kolla-ansible/+/815577/
14:15:43 <mnasiadka> anything to discuss here?
14:16:29 <bbezak> indeed, there were some questions from frickler and SvenKieske if we need admin role still for service users
14:16:38 <bbezak> and we probably don't for some services
14:16:55 <bbezak> however not all projects implemented service role support
14:17:04 <bbezak> https://etherpad.opendev.org/p/rbac-goal-tracking#L48
14:17:44 <frickler> but if this isn't ready upstream, do we need to adopt it at all already?
14:18:15 <bbezak> ironic needs it, cinder apparently too for this service_token
14:18:33 <SvenKieske> frickler: well some projects (cinder/nova) do regard our current handling of admin role as a security bug, if you read https://bugs.launchpad.net/kolla-ansible/+bug/2049762
14:18:58 <frickler> so then only change the accounts for those projects? also hurray for openstack doing wildly inconsistent stuff once again
14:19:24 <jovial> Seems sane to adopt it for the projects that support it
14:19:25 <mnasiadka> yeah, I think we should implement what works today, and track the rest of the projects
14:19:27 <SvenKieske> I think we should configure stuff with the minimum needed roles possible, obviously. and if we need to maybe split up the existing service_ks_register role for that, fine.
14:19:55 <mnasiadka> bbezak: seems you went in a nice rabbit hole
14:20:05 <bbezak> :)
14:20:10 <SvenKieske> I actually don't think we should let users override this, but if users are already using it, we can't of course deprecate this functionality this fast.
14:20:33 <bbezak> I adopted old change that did the same thing and polished it with new services etc.
14:20:34 <SvenKieske> it is a rabbit hole, for sure. at least I learned some stuff about ironic and rbac in keystone :)
14:20:48 <bbezak> I'm fine with going with service role only for ironic/cinder for now
14:20:57 <bbezak> let's see if it will work
14:21:03 <bbezak> I'll focus on ironic
14:21:15 <mnasiadka> There's Neutron mentioned in the rbac goals
14:21:17 <bbezak> and SvenKieske could for cinder with his patch for service tokens
14:21:47 <mnasiadka> Problem with service tokens in cinder is that we need to backport this all the way to zed (unmaintained/yoga ?)
14:21:59 <mnasiadka> So maybe the question is what is the minimum set cinder needs
14:22:02 <SvenKieske> okay for me, but then I guess I need to adapt it to explicitly use the "service" role. I'm not sure I understand our config merging code in this regard :D
14:22:15 <SvenKieske> mnasiadka: ack
14:22:36 <bbezak> I thought that adding service roles for all services is a solution that would solve our issues in the future
14:22:50 <mnasiadka> bbezak: as long as those services support it :)
14:22:51 <bbezak> but we could do that selectively too
14:22:58 <mnasiadka> and it seems some support it in 2023.1, some 2023.2, etc
14:23:14 <bbezak> adding service role won't hurt :)
14:23:14 <mnasiadka> So seems like fantastic mess
14:23:34 <mnasiadka> yeah well, if it's not needed and supported, then maybe it doesn't make sense
14:23:42 <bbezak> I agree
14:23:56 <mnasiadka> so maybe we should have per service/service group patches
14:24:01 <bbezak> 75% agree :)
14:24:17 <mnasiadka> I know that's more work, but this way we can decide what to backport
14:24:30 <bbezak> but yeah it is a mess. Ironic being in fact only service with system scope is somewhat breaking my mind
14:24:36 <SvenKieske> bbezak: I think in the long run your approach is fine, I don't know if we need to patch each service for that, though. maybe have three widgets for this? $service_role_default=service $service_role_not_migrated_yet=admin $service_role_user_override_beware_here_be_dragons=foobar
14:24:39 <bbezak> but that's different story
14:25:07 <SvenKieske> bbezak: you are not alone in that, I still don't know if I understood all this really (I think I didn't)
14:25:52 <SvenKieske> maybe someone needs to draw a nice flowchart how this works :D
14:26:32 <mnasiadka> Well, I think it might make sense to implement service roles for those projects, that support that today
14:26:33 <jovial> I guess the service user doesn't get any more perms with the service role over admin. So I can see bbezak's point of doing it in a big bang.
14:26:39 <mnasiadka> we have a list on the etherpad
14:27:28 <mnasiadka> I don't think just adding the role fixes anything, still we need some per-service configuration (e.g. cinder.conf) entries, right?
14:27:40 <SvenKieske> I was under the impression bbezak did refresh that etherpad, is the information in there current, or stale?
14:27:57 <bbezak> old etherpad with system scope is stale
14:28:08 <SvenKieske> mnasiadka: at least for some services I think a customization is currently needed (not 100% sure)
14:28:13 <bbezak> mnasiadka: adding service roles is just initial thing yes
14:28:42 <bbezak> ironic for instance needs also that - https://review.opendev.org/c/openstack/kolla-ansible/+/908007
14:28:53 <bbezak> if not then system scope member user
14:29:20 <mnasiadka> yeah, I get that - and on Ironic side they enabled enforcing new defaults
14:29:32 <mnasiadka> meaning enforcing system scope
14:29:53 <mnasiadka> I'm not a fan of adding a role to a user just because 7 years later they might support service roles
14:30:23 <bbezak> ok, I'll add it just for ironic. will check if it will cope with just service role, or it will need admin too
14:31:51 <mnasiadka> I can have a look on Neutron, as in how to switch it to use service role in service-to-service communication
14:31:59 <bbezak> furthermore the initial change for adding service roles to all projects was somewhat agreed within the comments of then PTL
14:32:11 <bbezak> but I agree that things pivoted since then
14:32:23 <bbezak> out of system scope most importantly
14:33:23 <mnasiadka> We assumed every project will implement it in a reasonable time
14:33:31 <mnasiadka> Now it seems it's not that simple
14:33:59 <mnasiadka> scope implementation and service role are different phases
14:34:11 <bbezak> and scope died, so :)
14:34:22 <mnasiadka> well, system scope died
14:34:26 <bbezak> yeah
14:34:32 <bbezak> ok, I'm done with secure rbac for now thx :)
14:34:44 <SvenKieske> xD
14:34:45 <mnasiadka> #link https://governance.openstack.org/tc/goals/selected/consistent-and-secure-rbac.html#change-in-scope-implementation
14:35:28 <mnasiadka> ok, let's move on
14:35:39 <mnasiadka> (mnasiadka): Add unmaintained/* reviews to Gerrit dashboards (another section or stable backports)
14:35:55 <mnasiadka> basically gerrit bot doesn't announce patches to that branch
14:36:03 <mnasiadka> and we don't see them in the review dashboard as well
14:36:23 <mnasiadka> so either we get a new section in the review dashboard called Unmaintained branches
14:36:34 <mnasiadka> or we change the query for stable backports
14:36:39 <mnasiadka> or we basically don't care
14:36:42 <mnasiadka> which one do we choose?
14:37:10 <SvenKieske> I'm confused (again) why do we want notifications on unmaintained branches? wasn't the point that the "unmaintained" team handles those?
14:37:18 * frickler don't core
14:37:35 <SvenKieske> so I would opt for don't care, but maybe someone has a compelling reason? I can't think of any though.
14:37:35 <frickler> ehm ... care even ... but core is also not wrong ;)
14:37:48 <mmalchuk> I'm confused with Kayobe - there are 2 commits in stable/yoga which are not in unmaintained/yoga
14:37:51 <mnasiadka> Maybe another question
14:37:56 <mnasiadka> mmalchuk: this one later
14:38:13 <mnasiadka> Who wants to maintain unmaintained/yoga for Kolla/Kolla-Ansible apart SHPC?
14:38:26 <mmalchuk> me
14:38:45 <mmalchuk> I'm always care for backports
14:38:50 <SvenKieske> I may contribute drive-by backports on a case by case basis, but I wouldn't count that officially :)
14:39:13 <frickler> then likely you should create a kolla-unmaintained-core group and amend the gerrit acls
14:39:27 <frickler> I can find an example patch after the meeting
14:39:43 <SvenKieske> makes sense
14:40:14 <mnasiadka> frickler: what if we would prefer that the kolla-core group is core for unmaintained branches? :)
14:40:51 <frickler> mnasiadka: well I would prefer to not be core for unmaintained
14:41:11 <SvenKieske> shouldn't kolla-core be cleaned up either way? I swear I've seen people in their where their last patch/contribution was somewhere in 2017 or so?
14:41:23 <frickler> that's yet another topic
14:41:24 <SvenKieske> there*
14:41:41 <mnasiadka> frickler: but by default what are the ACLs? kolla-core has rights or some openstack-unmaintained-core?
14:41:50 <frickler> the latter only
14:41:53 <mnasiadka> ok
14:42:08 <mnasiadka> I'll create kolla-unmaintained-core and kayobe-unmaintained-core
14:42:41 <mnasiadka> and add current cores for starters, nobody is forced to review anything - just like in the usual EM branches
14:43:11 <frickler> mnasiadka: cf. I169e52d5fb545c675549ce06fef1ca2f8eb1de86
14:45:16 <mnasiadka> frickler: thanks
14:45:19 <mnasiadka> ok, let's go forward
14:45:28 <mnasiadka> (dougszu): Discuss sending all service and infra logs to journald (don't shoot).
14:45:36 <mnasiadka> dougszu: please elaborate ;-)
14:45:44 <mmalchuk> about 2 commits above unmaintained/yoga in Kayobe?
14:46:05 <mnasiadka> mmalchuk: they will be merged in unmaintained/yoga once requirements repo has unmaintained/yoga
14:46:08 <dougszu> So basically, oslo.log can output additional logging info, that we don't currently get: https://docs.openstack.org/oslo.log/latest/admin/journal.html
14:46:12 <mnasiadka> now nothing is mergable
14:46:45 <dougszu> There are two ways to get the extra info - write out logs in JSON format, which means they are less readable on the box, or send everything to journald
14:47:12 <SvenKieske> dougszu: I'm personally a big +1 on this, as it streamlines the logging infrastructure more. I don't know about the implementation though, but I guess I already commented on the patch.
14:47:20 <kevko> only one +2 and +w for unmaintained branches ! :)
14:47:24 <mmalchuk> mnasiadka stable/yoga would be dropped after the merge?
14:47:31 <mnasiadka> mmalchuk: yup
14:47:37 <mmalchuk> thanks
14:47:46 <mnasiadka> kevko: once they start working we can go back to that
14:47:49 <dougszu> thanks Sven - I could look at alternatives to writing direct to the journal in the patch
14:48:18 <mnasiadka> I have a slightly complicated question - you know that RH-clones do not persist journal across reboots?
14:48:23 <jovial> Is there a link to the patch?
14:48:49 <dougszu> there is no patch yet - some thoughts on the etherpad: https://etherpad.opendev.org/p/KollaWhiteBoard#L63
14:48:53 <jovial> Persisting the journal can be enabled though, right?
14:48:58 <frickler> mnasiadka: well thats configurable, isn't it? just create /var/log/journal
14:49:04 <jovial> IIRC all you need to do is create the default directory
14:49:18 <mnasiadka> frickler: true, but still that's something that needs to be included
14:49:30 <mnasiadka> I assume we're speaking about not logging anymore to /var/log/kolla?
14:49:39 <SvenKieske> well we should probably do a robust config, not just create the directory, that means taking care that it doesn't overflow etc.
14:50:07 <dougszu> Correct - I am proposing to stop logging to /var/log/kolla, and hand over everything to journald
14:50:21 <SvenKieske> we could also for the first part redirect /var/log/journal to /var/log/kolla, it's also configurable which directory to use, probably better for older users
14:50:42 <mnasiadka> SvenKieske: that's not the same format, not really human readable
14:50:42 <SvenKieske> the location is totally orthogonal to the mechanism being used..
14:50:43 <kevko> i like var/log/kolla :)
14:50:53 <mmalchuk> me too
14:50:54 <frickler> it is not only a matter of the directory, also text format vs. journal format
14:50:58 <SvenKieske> sure it's human readable, just use "journalctl" ;)
14:51:12 <mnasiadka> SvenKieske: cat vs journalctl, err... no
14:51:24 <kevko> if i need to choose between journalctl and tail ..i am voting for tail
14:51:28 <SvenKieske> but this is still a third orthogonal problem, can we please stop mixing problem spaces all the time? :)
14:51:50 <dougszu> journalctl is pretty good if you take time to read the man page
14:51:58 <jovial> journalctl has some nice filtering options such as log priority
14:52:18 <SvenKieske> so we have three problems: a) using journald b) which location on the FS to log to c) which binary format to use (utmp is also a binary log file btw)
14:52:23 <mnasiadka> We all understand that, but you know that proposal is like dropping Docker?
14:52:26 <mmalchuk> filtering bad with multiline logs
14:53:14 <dougszu> mmalchuk: there is no multi-line regex with this approach, it should fix some issues with that
14:53:22 <SvenKieske> so which problem do we want to talk about? all at once? because you can of course configure journald to output plain text and the location is configurable, so this really has nothing to do
14:53:22 <mnasiadka> I'm pretty sure there will be a lot of people disliking that approach
14:53:25 <kevko> why not provide the user an option if he wants to do a or b ?
14:53:36 <dougszu> maintenance
14:54:05 <dougszu> You could perhaps configure fluentd to write back out kolla logs from the journal
14:54:14 <mnasiadka> So, we have 6 more minutes.
14:54:24 <jovial> Don't we only care about giving the services access to the journal socket? Where the logs are stored is up to how you configure the host OS.
14:54:35 <mnasiadka> Unless dougszu can formulate the proposal in depth in a separate etherpad - we will need to discuss that at the PTG.
14:55:02 <dougszu> PTG sounds good, thanks all
14:55:09 <mnasiadka> I can't see an option we stop writing text files to /var/log/kolla/$service without a proper research, asking users on the ML and providing long deprecation phase
14:55:18 <kevko> grep -ri error /var/log/kolla :D
14:55:21 <SvenKieske> well the current state of affairs is at least very inconsistent, afaik, correct me if I'm wrong but: a) we have logs shipped with fluentd into opensearch b) we have(?) some local only logs in /var/log/{kolla}, c) we have stuff like docker logs which are not persisted anywhere afaik. d) I honestly don't know what podman does e) we have journald installed by default for stuff like kernel/systemd logs by
14:55:23 <SvenKieske> default afaik anyway..
14:55:29 <mnasiadka> SvenKieske: PTG
14:55:46 <mnasiadka> It's a very big change
14:55:54 <mnasiadka> Unless we are going to support both modes
14:56:08 <SvenKieske> it doesn't have to be. stop making things complicated. like, really!
14:56:14 <dougszu> :D
14:56:21 <jovial> Supporting both looks like it may be possible, right?
14:56:44 <kevko> SvenKieske: yeah and only b is the place where you are sure where all logs are present :D
14:56:45 <SvenKieske> s/syslog-ng/journald/ works, you know. if you do the proper config dance.
14:56:46 <mmalchuk> mission impossible)
14:56:49 <mnasiadka> Well, yes - but one of them won't have test coverage :)
14:56:53 <jovial> Just use log_file and use_journal at the same time?
14:56:57 <SvenKieske> kevko: not true, there aren't all logs in b)
14:57:10 <kevko> SvenKieske: okay + docker logs
14:57:15 <SvenKieske> but I agree it seems to be a PTG topic, or for some larger meeting at least
14:57:52 <kevko> SvenKieske: fluent + opensearch works ...but until we will not drop parsing and regexp and will not use python fluent logger  ...it's 75% working logging system
14:57:55 <SvenKieske> the current logging state is a mess, to be honest. but still we need some careful planning to improve upon it and don't make it worse :)
14:58:10 <halomiva> I want to ask, i started working on refactoring docker worker to not use low level client and instead use client similar with podman, i guess logical steps are to first merge this refactor and then try to put as many functions into container worker. What do you think about it?
14:58:33 <SvenKieske> kevko: last time I looked there are no kernel logs in fluentd? :D it's a mess! ;)
14:58:34 <mnasiadka> halomiva: I think we refactored Kolla to do the same, right?
14:59:09 <SvenKieske> halomiva: sounds sane on the surface at least
14:59:11 <mnasiadka> SvenKieske: nobody stops you to run syslog-ng and forward logs to fluentd
14:59:20 <halomiva> mnasiadka: i think yes
14:59:35 <mnasiadka> halomiva: so fine, I've seen the patch - is it ready for reviews?
14:59:53 <SvenKieske> mnasiadka: right, and you can do the very same thing with journald, so that's not really a big thing, if you just talk about using journald (without all the binary log file blabla)
15:00:13 <mnasiadka> ok, it's 16:00
15:00:13 <halomiva> waiting for tests to finish, i tested it locally on basic deployment and it worked so we will see after tests
15:00:26 <mnasiadka> It's the first meeting in last 2 years when we did use the full hour
15:00:31 <mnasiadka> Thanks for coming guys!
15:00:37 <mnasiadka> #endmeeting