13:01:23 #startmeeting kolla
13:01:23 Meeting started Wed May 22 13:01:23 2024 UTC and is due to finish in 60 minutes. The chair is mnasiadka. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:01:23 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:01:23 The meeting name has been set to 'kolla'
13:01:26 #topic rollcall
13:01:28 another solution would be to move more to podman, at least in the last few months it didn't break quite as much ;)
13:01:32 o/
13:01:33 o/
13:01:40 o/
13:01:43 o/
13:01:43 \o
13:02:06 well, given the podman bug in Ubuntu 22.04 that breaks bifrost - I'm not moving anywhere yet ;)
13:02:19 o/
13:02:36 need to write our own container-py I guess :D
13:03:29 o/
13:04:38 #topic agenda
13:04:38 * CI status
13:04:38 * Release tasks
13:04:38 * Regular stable releases (first meeting in a month)
13:04:38 * Current cycle planning
13:04:39 * Additional agenda (from whiteboard)
13:04:39 * Open discussion
13:04:41 #topic CI status
13:04:47 bleeding red
13:04:52 requests 2.32 all over the place
13:05:03 Jake Hutchinson proposed openstack/kayobe master: Add Ironic Python Agent (IPA) NTP parameter configuration https://review.opendev.org/c/openstack/kayobe/+/895199
13:05:11 being fixed in https://review.opendev.org/c/openstack/ansible-collection-kolla/+/920172 and some others
13:08:44 so hopefully under control
13:08:52 #topic Release tasks
13:09:03 mattcrees: in absence of Will - can you share the status of Kayobe rc1?
13:10:58 The Ansible bump patch is the main priority: https://review.opendev.org/c/openstack/kayobe/+/910513 but there are some prior patches to resolve the SLURP jobs that need merging first: https://review.opendev.org/c/openstack/kayobe/+/919925 https://review.opendev.org/c/openstack/kayobe/+/913878
13:11:49 ok then - and the failure in the last one is the requests version fail?
13:12:50 No the failure in the last one is because the script needs merging in 2023.1 first
13:13:37 ok, you have my +2 on the script
13:13:43 can you chase someone so it gets merged today?
13:13:48 and you can continue the work?
13:13:57 Sure thing
13:14:00 thanks
13:14:08 #topic Current cycle planning
13:14:23 I tried to remove rp-1 from all patches that we needed
13:14:51 I'll try to go through some features plan and also existing patches that didn't go into Caracal
13:15:02 and will start RP+1 those that are ready for reviews
13:15:43 Will Szumski proposed openstack/kayobe master: Bump up Ansible supported versions to 8.x/9.x https://review.opendev.org/c/openstack/kayobe/+/910513
13:15:47 Will also tidy up the whiteboard so we have clear priorities for this cycle - which we could discuss next week
13:16:05 But for now priority is getting out of requests 2.23 (or whatever the version is) hell
13:16:25 #topic Additional agenda (from whiteboard)
13:17:03 (MatusJenca):
13:17:03 core reviewers, please look at my patch https://review.opendev.org/c/openstack/kolla-ansible/+/915901
13:17:34 I think that's reviewed - kevko's point is valid - when it will be resolved we can go forward
13:18:07 yes, discussed this with kevko here, seems like a good simplification
13:18:15 I'll ping Matus
13:18:15 (mhiner): migration patch https://review.opendev.org/c/openstack/kolla-ansible/+/836941
13:18:15 Best effort removal of old container engine files? Sometimes there are leftover processes running there, which prevents the removal.
13:18:15 I am unable to make migration work (in Zuul) while VMs are alive, the migration has to be done with stopped VMs. This is outlined in documentation.
13:18:15 Thoughts on this?
13:19:08 also, after I fix up linter issues it's now ready for review
13:19:11 From my perspective - we should just support host destroy - and then deploy with podman - and have some docs that it should be a rolling process for the cloud
13:19:18 Because I see there's more and more problems
13:19:34 Does it make sense?
13:20:21 I don't really understand what you mean
13:20:26 same as distro upgrades I'd say
13:21:44 mhiner: doing a host redeploy to change the container engine instead of trying to do it in place?
13:21:45 mhiner: you're trying to support migration - I'm saying we shouldn't support migration in that form - just ask the user to remove all containers and volumes from a host - and then do a re-deploy with podman
13:22:11 probably the destroy function limited to one host should work
13:22:33 would a mixed deployment work? not really, would it? (so some hosts still with docker, new ones with podman)
13:22:47 why should that not work?
13:23:36 well I think it would work in the sense you could manually destroy host $foo, redeploy it, but then you have switched container engine and can't manage any remaining docker hosts until you have finished the migration
13:23:58 at least that caveat should be documented I think, it might surprise some users? :)
13:24:29 mnasiadka: but isn't the objective to preserve all the volumes with their data, so it can be used for re-deploy?
13:24:35 otherwise looks like this is easier doable
13:25:15 mhiner: It's your objective, but I don't think it's the project objective :)
13:25:41 with distribution upgrades you reinstall the OS completely and then run a deploy to get containers in place
13:25:50 for controllers you need to have HA - so that data will be replicated
13:26:10 computes should not have any data besides nova_compute volume - but that's ephemeral storage
13:26:49 I see, so instead of this patch we should just have some document detailing how to do the migration?
13:27:08 \o
13:27:26 mhiner: yup, and some testing would be nice of course
13:27:27 sorry ..another meeting .. but now i am here ..let me re-read
13:27:44 well it certainly would be nice if mhiner's objective would work, would be a really smooth transition for users, but it seems it's rather hard to achieve, so maybe just implement the "easier" solution first?
13:28:20 I mean, in its current form, where you have to stop VMs beforehand, it works
13:28:40 even our CI tests for it passed
13:28:59 so you need to stop workload VMs? I didn't follow the last patches I think, why is that necessary?
13:29:06 because of the vm volumes?
13:29:46 well, if we have something working - just the user needs to evacuate the instances from that host - probably that's fine as well - but needs to be documented
13:30:26 podman migration path is the same as for quorum queue ...even if that script will land kolla ..you will be afraid to just run it :D
13:30:27 yes, if that's really working with only the caveat of migrating VMs, just document it and merge/move on, I guess?
13:30:32 so you will probably migrate manually ...
13:30:47 kevko: :D you might have a point there
13:30:59 SvenKieske: sometimes there are qemu processes left running in /var/lib/docker which prevents its removal
13:31:30 mhiner: ah, yes I remember a note about this.
13:31:32 first time i saw that there is a migration to quorum queues i was happy ...and wondering how ? blue-green deployment ? or federation ?
13:31:45 no, rabbitmq reset and restart all services :D
13:32:32 kevko: well "easy" "nuke it from orbit" solution first. more complicated solution..later..maybe :P
13:32:34 mhiner: really ? qemu processes left running var/lib/docker ? how ?
13:32:52 and why some ?
13:33:04 afaik the qemu process is spawned in the host pid namespace outside dockers control, isn't it?
13:33:10 if you boot from local ephemeral storage, then probably it's obvious?
13:33:11 it's
13:33:25 as in qcow2 files in /var/lib/nova/instances ?
13:33:43 kevko: I am not really sure because it only happens in Zuul - I was not able to reproduce it even when I built our version of Rocky9 with DIB
13:33:47 so of course shutting down the container might leave qemu processes around if they can't be closed. lsof -p should tell why
13:34:34 Michal Nasiadka proposed openstack/kolla-ansible stable/2023.1: CI: Pin requests to <2.32 for docker sdk https://review.opendev.org/c/openstack/kolla-ansible/+/920131
13:34:50 anyway, let's document the limitation of the migration process and be gone with it?
13:35:04 any way to check if VMs are running and failing beforehand?
13:35:15 mnasiadka: if it is in /var/lib/nova/instances ..it's probably volume (didn't check the code) and it can be migrated ? i mean data can be migrated ?
13:35:52 kevko: it's not a volume, it's an ephemeral root disk - of course you can do live migration to copy that to other host - and that's what we should document + have a precheck before running the migration to podman
13:37:05 Hi to all,
13:37:56 I want to discuss https://review.opendev.org/c/openstack/kolla-ansible/+/918639 - starting Antelope we have no more chance to restart Docker daemon without container. Small fix into systemd unit file will fix it
13:38:01 Stop
13:38:07 We have a meeting now
13:38:19 You can raise that when we get to Open discussion
13:38:32 oh, ok, thank you
13:38:34 well looking into it qemu writes the instance log to /var/lib/nova and the qemu domain monitor socket is also located in /var/lib/libvirt/
13:39:13 ok, the technical discussion is fantastic - but for the sake of time - can we agree to not support migration of a host that has running instances?
13:39:23 seems fine to me :) +1
13:39:36 just for the simplicity and no angry people throwing things at us on OpenInfra conferences ;-)
13:39:41 and also +1 to move the remaining discussion into the review :)
13:39:53 mhiner: are you fine with that?
13:40:11 sure
13:40:25 ok then
13:40:27 next one
13:40:37 (r-krcek): please review:
13:40:37 on behalf of wu.chunyang: https://review.opendev.org/c/openstack/kolla-ansible/+/797498
13:40:37 Thoughts on the last comments? https://review.opendev.org/c/openstack/kolla-ansible/+/599735
13:42:14 I'll review the swift role thing later this week (TM), can't really test much as we use ceph-rgw as most deployments (I guess)
13:42:15 ok, so cores - please have a look in both - I think the swift role modernization is really something we should finally try to merge
13:43:04 to add to that: I think if swift gets changed it also should receive some more tests, because some stuff in swift was broken for quite some releases and nobody noticed because of the lack of tests
13:43:06 for check command - I agree the name is a bit vague, but if we do check-containers - then it clashes a bit with check-containers.yml ;-)
13:43:14 I'll make sure to also mention this in the review
13:43:29 I'll be happy to have around anybody that uses Swift and cares for it
13:43:36 but sadly I think we don't have such a person
13:43:49 maybe wuchunyang ;-)
13:43:56 its me)
13:44:03 will review
13:44:06 well check command: don't know about naming clashes any name works for me.
13:44:43 * mmalchuk about Swift
13:45:02 ok, reviews are good
13:45:03 mnasiadka: commented ..
13:45:10 Let's move on
13:45:11 why we want to add check to kolla ?
13:45:21 thank you :)
13:45:22 check the comments in the patch
13:45:22 we have monitoring documented
13:45:36 I'm fine with anything people need ;-)
13:45:37 not only review, will try to check on Xena/Yoga too))
13:45:53 well, we could have also some command that fails if any container healthcheck is bad
13:46:03 well, i have couple of patches i need ..still waiting :)
13:46:11 everybody has those
13:46:25 I'm overloaded this week, might have some time next week
13:46:37 but then I have some business trip in first week of June
13:46:50 so complicated patches needing my attention might need to wait
13:46:51 mnasiadka: the patch covers both not running and unhealthy containers
13:47:04 r-krcek: that's nice
13:47:17 it's 10 minutes for review and merge ...i really need only two patches ...only init-core-once ..so i can work on tempest
13:47:50 I'm not convinced I want more bash scripts to be able to run tempest ;-)
13:47:51 it's ctrl+c and ctrl+v what is already done + 2 additional lines ..come on
13:47:59 speaking of healthchecks, this got reworked to only check actually supported HTTP checks: https://review.opendev.org/c/openstack/kolla-ansible/+/918437 I think that's a great improvement for the project
13:48:18 kevko: I'm not sure I can follow, which patch are you talking about?
13:48:52 I'll add above patch to the whiteboard for next week
13:48:59 mnasiadka: yes, i can also setup dynamic users, networks etc generation ...but it's overkill ...
13:49:15 SvenKieske: https://review.opendev.org/c/openstack/kolla-ansible/+/914191 <<
13:49:29 You don't understand, we've had that discussion on the PTG - we have such high number of bash scripts - that we wanted to move to Tempest to get rid of them
13:49:37 not multiply them or add new ones
13:50:03 But if moving to Tempest means we add more bash scripts and more dependencies to testing
13:50:10 mhm I'll not be around next week, I'm on PTO from 29.5. until 3.6.
13:50:19 I don't know if that's worth the risk of breaking CI for some time and resolving those issues
13:50:27 especially that it seems we've got more stable now
13:50:58 I propose that everybody interested does get a look on kevko's patches
13:51:07 and we discuss that in one of the future meetings
13:51:37 just move all those stuff from bash to python or ansible? I don't see anything there that's not possible to do in a basic ansible playbook, no? am I missing something?
13:51:52 Let's stop that discussion now, it's 9 minutes to closure :)
13:51:56 ok
13:52:01 #topic Open discussion
13:52:06 mnasiadka: okay then, so i will provide drop for tools/init-runonce bash and rewrite to ansible ok ?
13:52:18 chembervint: now
13:52:33 kevko: I think it would be good if we would have an etherpad on plan for improvements
13:52:42 not just spend time writing something, that we later argue on :)
13:53:04 and once we agree on the way forward - we can get to writing code that will easily get accepted
13:53:22 mnasiadka: https://etherpad.opendev.org/p/KollaWhiteBoard << it is here
13:53:29 Which line?
13:53:35 258
13:53:42 Mark Goddard proposed openstack/kayobe master: Add support for Cumulus NVUE switches https://review.opendev.org/c/openstack/kayobe/+/914638
13:53:58 kevko: Tempest testing instead of maintaining all those bash scripts
13:54:00 and it was discussed i think on previous ptg (not the last)
13:54:05 do you see the whole sentence?
13:54:13 Mark Goddard proposed openstack/kayobe master: Add support for Cumulus NVUE switches https://review.opendev.org/c/openstack/kayobe/+/914638
13:54:37 cores please merge two orphans: https://review.opendev.org/q/Id6e4bbe0aab2360c4e7e5f74fff6170bcc71080b
13:54:39 mnasiadka: yep, I will process
13:54:44 ok, I'm here.
so https://review.opendev.org/c/openstack/kolla-ansible/+/918639 - I propose a fix which lets us configure live-restore for the docker daemon again and keep containers alive during docker restarts, which is important for production environments
13:54:51 kevko: thanks for working on it :)
13:55:07 and worked fine until systemd units were introduced into Kolla
13:55:37 chembervint: thanks, I'll add review priority and we'll try to timely review that
13:55:48 thank you!
13:56:03 ok then
13:56:05 anybody else?
13:56:12 (we have 4 minutes)
13:56:34 well I need to process: https://bugs.launchpad.net/kolla-ansible/+bug/2065168 @chembervint first
13:56:52 and I guess all reviewers of https://review.opendev.org/c/openstack/kolla-ansible/+/918639 should read that as well
13:57:24 repeating myself from above: the healthcheck rework seems really like a good addition: https://review.opendev.org/c/openstack/kolla-ansible/+/918437
13:57:41 RP+1 on that would be nice the author is very responsive
13:57:50 well, I added RP+1
13:58:08 The patch is nice, but needs thorough testing
13:58:10 especially for a first time contributor :) thanks
13:58:15 for sure
13:58:18 Especially that we don't allow people to go back to previous setting easily
13:58:28 maybe we should?
13:58:30 mnasiadka: sorry ..but i need to again say something ...on line 253 is "Tempest testing instead of maintaining all those bash scripts" ....so if you allow tempest ...you can drop almost ALL bashes ... but you don't want to approve 10 lines of bash to allow work on tempest which will replace and remove all bashes ?
13:58:30 okay, that might be a good addition
13:58:39 it's not my first time ... :) but thank you! :)
13:58:57 kevko: I know you can do better and you can do Ansible instead of those bashes - and your patches don't include dropping of those bashes
13:59:09 chembervint: I was talking about https://review.opendev.org/c/openstack/kolla-ansible/+/918437 I think that's not you, is it?
:)
13:59:25 mnasiadka: thanks :D
13:59:27 oh, sorry :)
13:59:28 And I would rather see tens of rechecks of the Tempest runs to make sure everything works like we want
13:59:31 chembervint: I'll re-review your patch as well
13:59:44 Because if it fails - we're going to chase and blame kevko :)
13:59:58 chembervint: no problem :)
14:00:10 kolla -> kevko -> tempest cores
14:00:20 :)
14:00:27 should we add rally to the list :D
14:00:43 ok, it's 16:00 here.
14:01:27 ok
14:01:29 thanks for coming!
14:01:32 see you next week
14:01:34 #endmeeting
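
The precheck mentioned at 13:35:52 and the agreement at 13:39:13 (do not migrate a host that still has running instances) could be sketched roughly as below. This is a minimal illustration, not code from the patch under review: the function names are hypothetical, and running `virsh` through `docker exec` in a container named `nova_libvirt` is an assumption about the deployment. The only external interface relied on is `virsh list --state-running --name`, which prints one running domain name per line.

```python
import subprocess


def parse_running_domains(virsh_output: str) -> list[str]:
    """Extract domain names from `virsh list --state-running --name` output.

    virsh prints one name per line and may add blank lines, which are skipped.
    """
    return [line.strip() for line in virsh_output.splitlines() if line.strip()]


def precheck_migration(virsh_output: str) -> None:
    """Raise if any instance is still running on the host (hypothetical precheck)."""
    domains = parse_running_domains(virsh_output)
    if domains:
        raise RuntimeError(
            "refusing to migrate container engine; running instances: "
            + ", ".join(domains)
        )


def fetch_virsh_output(engine: str = "docker", container: str = "nova_libvirt") -> str:
    """Query libvirt inside its container.

    The engine and container name are assumptions for illustration only.
    """
    result = subprocess.run(
        [engine, "exec", container, "virsh", "list", "--state-running", "--name"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

On a real host this would be used as `precheck_migration(fetch_virsh_output())` before touching the container engine; keeping the parsing separate from the subprocess call makes the check testable without libvirt.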