#openstack-kolla log

13:01:23 <mnasiadka> #startmeeting kolla
13:01:23 <opendevmeet> Meeting started Wed May 22 13:01:23 2024 UTC and is due to finish in 60 minutes.  The chair is mnasiadka. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:01:23 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:01:23 <opendevmeet> The meeting name has been set to 'kolla'
13:01:26 <mnasiadka> #topic rollcall
13:01:28 <SvenKieske> another solution would be to move more to podman, at least in the last few months it didn't break quite as much ;)
13:01:32 <SvenKieske> o/
13:01:33 <r-krcek> o/
13:01:40 <mnasiadka> o/
13:01:43 <mhiner> o/
13:01:43 <frickler> \o
13:02:06 <mnasiadka> well, given the podman bug in Ubuntu 22.04 that breaks bifrost - I'm not moving anywhere yet ;)
13:02:19 <mmalchuk> o/
13:02:36 <SvenKieske> need to write our own container-py I guess :D
13:03:29 <mattcrees> o/
13:04:38 <mnasiadka> #topic agenda
13:04:38 <mnasiadka> * CI status
13:04:38 <mnasiadka> * Release tasks
13:04:38 <mnasiadka> * Regular stable releases (first meeting in a month)
13:04:38 <mnasiadka> * Current cycle planning
13:04:39 <mnasiadka> * Additional agenda (from whiteboard)
13:04:39 <mnasiadka> * Open discussion
13:04:41 <mnasiadka> #topic CI status
13:04:47 <mnasiadka> bleeding red
13:04:52 <mnasiadka> requests 2.32 all over the place
13:05:03 <opendevreview> Jake Hutchinson proposed openstack/kayobe master: Add Ironic Python Agent (IPA) NTP parameter configuration  https://review.opendev.org/c/openstack/kayobe/+/895199
13:05:11 <mnasiadka> being fixed in https://review.opendev.org/c/openstack/ansible-collection-kolla/+/920172 and some others
13:08:44 <frickler> so hopefully under control
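
The requests problem above comes down to keeping the installed version below 2.32. A minimal Python sketch of that comparison follows. The helper names `parse_version` and `satisfies_pin` are hypothetical and not part of any Kolla code, and the parser assumes plain dotted numeric versions only.

```python
def parse_version(version, width=3):
    """Turn '2.31.0' into (2, 31, 0), padding short versions with zeros.

    Assumption: only plain dotted integers are handled (no 'rc1', 'post1').
    """
    parts = [int(part) for part in version.split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


def satisfies_pin(installed, upper_bound="2.32"):
    """Return True when `installed` is strictly below `upper_bound`."""
    return parse_version(installed) < parse_version(upper_bound)


if __name__ == "__main__":
    for candidate in ("2.31.0", "2.32.0", "2.32.2"):
        verdict = "ok" if satisfies_pin(candidate) else "blocked by pin"
        print(f"requests {candidate}: {verdict}")
```

In practice the pin would live in a requirements or constraints file; the sketch only shows which versions such a pin admits.
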
13:08:52 <mnasiadka> #topic Release tasks
13:09:03 <mnasiadka> mattcrees: in absence of Will - can you share the status of Kayobe rc1?
13:10:58 <mattcrees> The Ansible bump patch is the main priority: https://review.opendev.org/c/openstack/kayobe/+/910513 but there are some prior patches to resolve the SLURP jobs that need merging first: https://review.opendev.org/c/openstack/kayobe/+/919925 https://review.opendev.org/c/openstack/kayobe/+/913878
13:11:49 <mnasiadka> ok then - and is the failure in the last one the requests version failure?
13:12:50 <mattcrees> No the failure in the last one is because the script needs merging in 2023.1 first
13:13:37 <mnasiadka> ok, you have my +2 on the script
13:13:43 <mnasiadka> can you chase someone so it gets merged today?
13:13:48 <mnasiadka> and you can continue the work?
13:13:57 <mattcrees> Sure thing
13:14:00 <mnasiadka> thanks
13:14:08 <mnasiadka> #topic Current cycle planning
13:14:23 <mnasiadka> I tried to remove RP-1 from all patches that we needed
13:14:51 <mnasiadka> I'll try to go through the feature plans and also existing patches that didn't go into Caracal
13:15:02 <mnasiadka> and will start to RP+1 those that are ready for reviews
13:15:43 <opendevreview> Will Szumski proposed openstack/kayobe master: Bump up Ansible supported versions to 8.x/9.x  https://review.opendev.org/c/openstack/kayobe/+/910513
13:15:47 <mnasiadka> Will also tidy up the whiteboard so we have clear priorities for this cycle - which we could discuss next week
13:16:05 <mnasiadka> But for now priority is getting out of requests 2.32 (or whatever the version is) hell
13:16:25 <mnasiadka> #topic Additional agenda (from whiteboard)
13:17:03 <mnasiadka> (MatusJenca):
13:17:03 <mnasiadka> core reviewers, please look at my patch https://review.opendev.org/c/openstack/kolla-ansible/+/915901
13:17:34 <mnasiadka> I think that's reviewed - kevko's point is valid - when it will be resolved we can go forward
13:18:07 <SvenKieske> yes, discussed this with kevko here, seems like a good simplification
13:18:15 <SvenKieske> I'll ping Matus
13:18:15 <mnasiadka> (mhiner): migration patch https://review.opendev.org/c/openstack/kolla-ansible/+/836941
13:18:15 <mnasiadka> Best-effort removal of old container engine files? Sometimes there are leftover processes running there, which prevent the removal.
13:18:15 <mnasiadka> I am unable to make migration work (in Zuul) while VMs are alive, the migration has to be done with stopped VMs. This is outlined in documentation.
13:18:15 <mnasiadka> Thoughts on this?
13:19:08 <mhiner> also, after I fix up linter issues it's now ready for review
13:19:11 <mnasiadka> From my perspective - we should just support host destroy - and then deploy with podman - and have some docs that it should be a rolling process for the cloud
13:19:18 <mnasiadka> Because I see there's more and more problems
13:19:34 <mnasiadka> Does it make sense?
13:20:21 <mhiner> I don't really understand what you mean
13:20:26 <frickler> same as distro upgrades I'd say
13:21:44 <frickler> mhiner: doing a host redeploy to change the container engine instead of trying to do it in place?
13:21:45 <mnasiadka> mhiner: you're trying to support migration - I'm saying we shouldn't support migration in that form - just ask the user to remove all containers and volumes from a host - and then do a re-deploy with podman
13:22:11 <mnasiadka> probably the destroy function limited to one host should work
13:22:33 <SvenKieske> would a mixed deployment work? not really, would it? (so some hosts still with docker, new ones with podman)
13:22:47 <frickler> why should that not work?
13:23:36 <SvenKieske> well I think it would work in the sense you could manually destroy host $foo, redeploy it, but then you have switched container engine and can't manage any remaining docker hosts until you have finished the migration
13:23:58 <SvenKieske> at least that caveat should be documented I think, it might surprise some users? :)
13:24:29 <mhiner> mnasiadka: but isn't the objective to preserve all the volumes with their data, so it can be used for re-deploy?
13:24:35 <SvenKieske> otherwise this looks easier to do
13:25:15 <mnasiadka> mhiner: It's your objective, but I don't think it's the project objective :)
13:25:41 <mnasiadka> with distribution upgrades you reinstall the OS completely and then run a deploy to get containers in place
13:25:50 <mnasiadka> for controllers you need to have HA - so that data will be replicated
13:26:10 <mnasiadka> computes should not have any data besides nova_compute volume - but that's ephemeral storage
13:26:49 <mhiner> I see, so instead of this patch we should just have some document detailing how to do the migration?
13:27:08 <kevko> \o
13:27:26 <mnasiadka> mhiner: yup, and some testing would be nice of course
13:27:27 <kevko> sorry ..another meeting .. but now i am here ..let me re-read
13:27:44 <SvenKieske> well it certainly would be nice if mhiner's objective would work, would be a really smooth transition for users, but it seems it's rather hard to achieve, so maybe just implement the "easier" solution first?
13:28:20 <mhiner> I mean, in its current form, where you have to stop VMs beforehand, it works
13:28:40 <mhiner> even our CI tests for it passed
13:28:59 <SvenKieske> so you need to stop workload VMs? I didn't follow the last patches I think, why is that necessary?
13:29:06 <SvenKieske> because of the vm volumes?
13:29:46 <mnasiadka> well, if we have something working - just the user needs to evacuate the instances from that host - probably that's fine as well - but needs to be documented
13:30:26 <kevko> podman migration path is the same as for quorum queues ... even if that script lands in kolla ... you will be afraid to just run it :D
13:30:27 <SvenKieske> yes, if that's really working with only the caveat of migrating VMs, just document it and merge/move on, I guess?
13:30:32 <kevko> so you will probably migrate manually ...
13:30:47 <SvenKieske> kevko: :D you might have a point there
13:30:59 <mhiner> SvenKieske: sometimes there are qemu processes left running in /var/lib/docker which prevents its removal
13:31:30 <SvenKieske> mhiner: ah, yes I remember a note about this.
13:31:32 <kevko> first time i saw that there is a migration to quorum queues i was happy ...and wondering how ? blue-green deployment ? or federation ?
13:31:45 <kevko> no, rabbitmq reset and restart all services :D
13:32:32 <SvenKieske> kevko: well "easy" "nuke it from orbit" solution first. more complicated solution..later..maybe :P
13:32:34 <kevko> mhiner: really ? qemu processes left running var/lib/docker ? how ?
13:32:52 <kevko> and why some ?
13:33:04 <SvenKieske> afaik the qemu process is spawned in the host pid namespace outside Docker's control, isn't it?
13:33:10 <mnasiadka> if you boot from local ephemeral storage, then probably it's obvious?
13:33:11 <kevko> it's
13:33:25 <mnasiadka> as in qcow2 files in /var/lib/nova/instances ?
13:33:43 <mhiner> kevko: I am not really sure because it only happens in Zuul - I was not able to reproduce it even when I built our version of Rocky9 with DIB
13:33:47 <SvenKieske> so of course shutting down the container might leave qemu processes around if they can't be closed. lsof -p should tell why
13:34:34 <opendevreview> Michal Nasiadka proposed openstack/kolla-ansible stable/2023.1: CI: Pin requests to <2.32 for docker sdk  https://review.opendev.org/c/openstack/kolla-ansible/+/920131
13:34:50 <mnasiadka> anyway, let's document the limitation of the migration process and be gone with it?
13:35:04 <mnasiadka> any way to check if VMs are running and failing beforehand?
13:35:15 <kevko> mnasiadka: if it is in /var/lib/nova/instances ..it's probably volume (didn't check the code) and it can be migrated ? i mean data can be migrated  ?
13:35:52 <mnasiadka> kevko: it's not a volume, it's an ephemeral root disk - of course you can do live migration to copy that to other host - and that's what we should document + have a precheck before running the migration to podman
13:37:05 <chembervint> Hi to all,
13:37:56 <chembervint> I want to discuss https://review.opendev.org/c/openstack/kolla-ansible/+/918639 - starting with Antelope we can no longer restart the Docker daemon without also restarting the containers. A small fix to the systemd unit file will resolve it
13:38:01 <mnasiadka> Stop
13:38:07 <mnasiadka> We have a meeting now
13:38:19 <mnasiadka> You can raise that when we get to Open discussion
13:38:32 <chembervint> oh, ok, thank you
13:38:34 <SvenKieske> well looking into it qemu writes the instance log to /var/lib/nova and the qemu domain monitor socket is also located in /var/lib/libvirt/
13:39:13 <mnasiadka> ok, the technical discussion is fantastic - but for the sake of time - can we agree to not support migration of a host that has running instances?
13:39:23 <SvenKieske> seems fine to me :) +1
13:39:36 <mnasiadka> just for the simplicity and no angry people throwing things at us on OpenInfra conferences ;-)
13:39:41 <SvenKieske> and also +1 to move the remaining discussion into the review :)
13:39:53 <mnasiadka> mhiner: are you fine with that?
13:40:11 <mhiner> sure
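
The precheck discussed above (refuse to migrate a host that still has running instances) could look roughly like the following Python sketch. The `Instance` class, its fields, and the state strings are hypothetical stand-ins; how the instance list is actually obtained is left out, and the real patch may do this differently.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """Hypothetical, simplified view of a VM on a compute host."""
    name: str
    host: str
    state: str  # illustrative values: "running" or "shutoff"


def running_on_host(instances, host):
    """Return the names of instances still running on `host`."""
    return [i.name for i in instances if i.host == host and i.state == "running"]


def precheck_migration(instances, host):
    """Raise before the migration starts if `host` has running instances."""
    blockers = running_on_host(instances, host)
    if blockers:
        raise RuntimeError(
            f"{host} still has running instances: {', '.join(blockers)}; "
            "evacuate or stop them before migrating the container engine"
        )
```

Failing early like this keeps the documented limitation visible to the operator instead of surfacing later as a failed directory removal.
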
13:40:25 <mnasiadka> ok then
13:40:27 <mnasiadka> next one
13:40:37 <mnasiadka> (r-krcek): please review:
13:40:37 <mnasiadka> on behalf of wu.chunyang: https://review.opendev.org/c/openstack/kolla-ansible/+/797498
13:40:37 <mnasiadka> Thoughts on the last comments? https://review.opendev.org/c/openstack/kolla-ansible/+/599735
13:42:14 <SvenKieske> I'll review the swift role thing later this week (TM), can't really test much as we use ceph-rgw, like most deployments (I guess)
13:42:15 <mnasiadka> ok, so cores - please have a look in both - I think the swift role modernization is really something we should finally try to merge
13:43:04 <SvenKieske> to add to that: I think if swift gets changed it also should receive some more tests, because some stuff in swift was broken for quite some releases and nobody noticed because of the lack of tests
13:43:06 <mnasiadka> for check command - I agree the name is a bit vague, but if we do check-containers - then it clashes a bit with check-containers.yml ;-)
13:43:14 <SvenKieske> I'll make sure to also mention this in the review
13:43:29 <mnasiadka> I'll be happy to have around anybody that uses Swift and cares for it
13:43:36 <mnasiadka> but sadly I think we don't have such a person
13:43:49 <mnasiadka> maybe wuchunyang ;-)
13:43:56 <mmalchuk> it's me)
13:44:03 <mmalchuk> will review
13:44:06 <SvenKieske> well check command: don't know about naming clashes any name works for me.
13:44:43 * mmalchuk about Swift
13:45:02 <mnasiadka> ok, reviews are good
13:45:03 <kevko> mnasiadka: commented ..
13:45:10 <mnasiadka> Let's move on
13:45:11 <kevko> why we want to add check to kolla ?
13:45:21 <r-krcek> thank you :)
13:45:22 <mnasiadka> check the comments in the patch
13:45:22 <kevko> we have monitoring documented
13:45:36 <mnasiadka> I'm fine with anything people need ;-)
13:45:37 <mmalchuk> not only review, will try to check on Xena/Yoga too))
13:45:53 <mnasiadka> well, we could have also some command that fails if any container healthcheck is bad
13:46:03 <kevko> well, i have couple of patches i need ..still waiting :)
13:46:11 <mnasiadka> everybody has those
13:46:25 <mnasiadka> I'm overloaded this week, might have some time next week
13:46:37 <mnasiadka> but then I have some business trip in first week of June
13:46:50 <mnasiadka> so complicated patches needing my attention might need to wait
13:46:51 <r-krcek> mnasiadka: the patch covers both not running and unhealthy containers
13:47:04 <mnasiadka> r-krcek: that's nice
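
A check that fails when any container is either not running or unhealthy, as described above, can be sketched in Python as follows. The dictionary shape (`name`, `status`, `health`) is a hypothetical simplification and not the format used by the patch under review; containers without a healthcheck are assumed to carry no `health` value and are not treated as failing.

```python
def failing_containers(containers):
    """Return names of containers that are not running or report unhealthy."""
    failing = []
    for container in containers:
        not_running = container["status"] != "running"
        unhealthy = container.get("health") == "unhealthy"
        if not_running or unhealthy:
            failing.append(container["name"])
    return failing


def exit_code(containers):
    """Return 1 if any container fails the check, 0 otherwise."""
    return 1 if failing_containers(containers) else 0
```
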
13:47:17 <kevko> it's 10 minutes for review and merge ...i really need only two patches ...only init-core-once ..so i can work on tempest
13:47:50 <mnasiadka> I'm not convinced I want more bash scripts to be able to run tempest ;-)
13:47:51 <kevko> it's ctrl+c and ctrl+v what is already done + 2 additional lines ..come on
13:47:59 <SvenKieske> speaking of healthchecks, this got reworked to only check actually supported HTTP checks: https://review.opendev.org/c/openstack/kolla-ansible/+/918437 I think that's a great improvement for the project
13:48:18 <SvenKieske> kevko: I'm not sure I can follow, which patch are you talking about?
13:48:52 <SvenKieske> I'll add above patch to the whiteboard for next week
13:48:59 <kevko> mnasiadka: yes, i can also set up dynamic users, networks etc generation ...but it's overkill ...
13:49:15 <kevko> SvenKieske: https://review.opendev.org/c/openstack/kolla-ansible/+/914191 <<
13:49:29 <mnasiadka> You don't understand, we've had that discussion at the PTG - we have such a high number of bash scripts - that we wanted to move to Tempest to get rid of them
13:49:37 <mnasiadka> not multiply them or add new ones
13:50:03 <mnasiadka> But if moving to Tempest means we add more bash scripts and more dependencies to testing
13:50:10 <SvenKieske> mhm I'll not be around next week, I'm on PTO from 29.5. until 3.6.
13:50:19 <mnasiadka> I don't know if that's worth the risk of breaking CI for some time and resolving those issues
13:50:27 <mnasiadka> especially since it seems CI has got more stable now
13:50:58 <mnasiadka> I propose that everybody interested takes a look at kevko's patches
13:51:07 <mnasiadka> and we discuss that in one of the future meetings
13:51:37 <SvenKieske> just move all that stuff from bash to python or ansible? I don't see anything there that's not possible to do in a basic ansible playbook, no? am I missing something?
13:51:52 <mnasiadka> Let's stop that discussion now, it's 9 minutes to closure :)
13:51:56 <SvenKieske> ok
13:52:01 <mnasiadka> #topic Open discussion
13:52:06 <kevko> mnasiadka: okay then, so i will propose dropping the tools/init-runonce bash script and rewrite it in ansible, ok ?
13:52:18 <mnasiadka> chembervint: now
13:52:33 <mnasiadka> kevko: I think it would be good if we would have an etherpad on plan for improvements
13:52:42 <mnasiadka> not just spend time writing something, that we later argue on :)
13:53:04 <mnasiadka> and once we agree on the way forward - we can get to writing code that will easily get accepted
13:53:22 <kevko> mnasiadka: https://etherpad.opendev.org/p/KollaWhiteBoard << it is here
13:53:29 <mnasiadka> Which line?
13:53:35 <kevko> 258
13:53:42 <opendevreview> Mark Goddard proposed openstack/kayobe master: Add support for Cumulus NVUE switches  https://review.opendev.org/c/openstack/kayobe/+/914638
13:53:58 <mnasiadka> kevko: Tempest testing instead of maintaining all those bash scripts
13:54:00 <kevko> and it was discussed i think on previous ptg (not the last)
13:54:05 <mnasiadka> do you see the whole sentence?
13:54:13 <opendevreview> Mark Goddard proposed openstack/kayobe master: Add support for Cumulus NVUE switches  https://review.opendev.org/c/openstack/kayobe/+/914638
13:54:37 <mmalchuk> cores please merge two orphans: https://review.opendev.org/q/Id6e4bbe0aab2360c4e7e5f74fff6170bcc71080b
13:54:39 <kevko> mnasiadka: yep, I will process
13:54:44 <chembervint> ok, I'm here. so https://review.opendev.org/c/openstack/kolla-ansible/+/918639 - I propose a fix that lets us configure live-restore on the docker daemon again and keep containers alive during docker restarts, which is important for production environments
13:54:51 <SvenKieske> kevko: thanks for working on it :)
13:55:07 <chembervint> and worked fine until systemd units were introduced into Kolla
13:55:37 <mnasiadka> chembervint: thanks, I'll add review priority and we'll try to timely review that
13:55:48 <chembervint> thank you!
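
For context on the live-restore setting mentioned above: it is a Docker daemon option, commonly set through the `live-restore` key in `daemon.json`. The Python sketch below only shows merging that key into an existing configuration; the function name `enable_live_restore` is hypothetical, and the sketch does not reproduce the systemd unit change proposed in the review.

```python
import json


def enable_live_restore(daemon_json_text):
    """Return daemon.json content with "live-restore": true merged in.

    Existing keys are preserved; empty input is treated as an empty config.
    """
    config = json.loads(daemon_json_text) if daemon_json_text.strip() else {}
    config["live-restore"] = True
    return json.dumps(config, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(enable_live_restore('{"log-driver": "json-file"}'))
```
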
13:56:03 <mnasiadka> ok then
13:56:05 <mnasiadka> anybody else?
13:56:12 <mnasiadka> (we have 4 minutes)
13:56:34 <SvenKieske> well I need to process: https://bugs.launchpad.net/kolla-ansible/+bug/2065168 @chembervint first
13:56:52 <SvenKieske> and I guess all reviewers of https://review.opendev.org/c/openstack/kolla-ansible/+/918639 should read that as well
13:57:24 <SvenKieske> repeating myself from above: the healthcheck rework seems really like a good addition: https://review.opendev.org/c/openstack/kolla-ansible/+/918437
13:57:41 <SvenKieske> RP+1 on that would be nice, the author is very responsive
13:57:50 <mnasiadka> well, I added RP+1
13:58:08 <mnasiadka> The patch is nice, but needs thorough testing
13:58:10 <SvenKieske> especially for a first time contributor :) thanks
13:58:15 <SvenKieske> for sure
13:58:18 <mnasiadka> Especially since we don't allow people to go back to the previous setting easily
13:58:28 <mnasiadka> maybe we should?
13:58:30 <kevko> mnasiadka: sorry ..but i need to again say something ...on line 253 is "Tempest testing instead of maintaining all those bash scripts"   ....so if you allow tempest ...you can drop almost ALL bashes ... but you don't want to approve 10 lines of bash to allow work on tempest which will replace and remove all bashes ?
13:58:30 <SvenKieske> okay, that might be a good addition
13:58:39 <chembervint> it's not my first time ... :) but thank you! :)
13:58:57 <mnasiadka> kevko: I know you can do better and you can do Ansible instead of those bashes - and your patches don't include dropping of those bashes
13:59:09 <SvenKieske> chembervint: I was talking about https://review.opendev.org/c/openstack/kolla-ansible/+/918437 I think that's not you, is it? :)
13:59:25 <kevko> mnasiadka: thanks :D
13:59:27 <chembervint> oh, sorry :)
13:59:28 <mnasiadka> And I would rather see tens of rechecks of the Tempest runs to make sure everything works like we want
13:59:31 <SvenKieske> chembervint: I'll re-review your patch as well
13:59:44 <mnasiadka> Because if it fails - we're going to chase and blame kevko :)
13:59:58 <SvenKieske> chembervint: no problem :)
14:00:10 <kevko> kolla -> kevko -> tempest cores
14:00:20 <kevko> :)
14:00:27 <SvenKieske> should we add rally to the list :D
14:00:43 <SvenKieske> ok, it's 16:00 here.
14:01:27 <mnasiadka> ok
14:01:29 <mnasiadka> thanks for coming!
14:01:32 <mnasiadka> see you next week
14:01:34 <mnasiadka> #endmeeting