Tuesday, 2024-09-03

harunHi, I am trying to install ClusterAPI using this documentation: https://docs.openstack.org/openstack-ansible-ops/latest/mcapi.html, but I have a problem.06:40
harunI have a private Docker Registry. I set up containerd_insecure_registries and the other configuration. The problem occurred while initializing the cluster (when running **openstack-ansible osa_ops.mcapi_vexxhost.k8s_install**).06:40
harunProblem output:06:40
harunhttps://paste.openstack.org/show/b1VpMn3Sprro62z1R0BG/06:40
harunHere is my configuration:06:40
harunhttps://paste.openstack.org/show/bSigT3VuqeXpp07QpVq1/06:40
harunHi, I am trying to install ClusterAPI using this documentation: https://docs.openstack.org/openstack-ansible-ops/latest/mcapi.html, but I have a problem. I have a private Docker Registry. I set up containerd_insecure_registries and other configurations. The problem occurred while initializing the cluster (when running openstack-ansible osa_ops.mcapi_vexxhost.k8s_install). Problem output:06:41
harunhttps://paste.openstack.org/show/b1VpMn3Sprro62z1R0BG Here is my config: https://paste.openstack.org/show/bSigT3VuqeXpp07QpVq1/06:41
jrosserharun: good morning - i will check how we are doing this06:57
haruni tried to pull the image using crictl in the k8s lxc container but i got this error: https://paste.openstack.org/show/bEsCUf41YXsiMDY0Ys2S/07:01
jrosserharun: which operating system are you using?07:01
harunubuntu 22.0407:02
jrosserthat is the same as we have07:04
jrosseri guess that the first place that i would look is the journal for containerd07:12
jrosserharun: how are your lxc hosts setup (what kind of storage backend do you use for the lxc containers?)07:13
harunhere is the journal of containerd: https://paste.openstack.org/show/bnfcKxLlDycBhpu8gzjX/07:13
harunwe use ceph07:14
jrosserfor the infra hosts lxc?07:15
grauzikasHello, yesterday we were talking about magnum; after that I enabled letsencrypt and now I have an error in magnum: https://paste.openstack.org/show/bFCy7heEN0NCO8l92SIl/  my config regarding letsencrypt and magnum: https://paste.openstack.org/show/bv19hOUKhHV65Q9nnNmE/07:16
harunwe use ssd disks in the infra hosts07:16
grauzikasmaybe you could suggest where the issue could be? i didn't reinstall the whole cluster, but i re-ran the setup-hosts, setup-infrastructure and setup-openstack playbooks07:17
grauzikasi enabled debug, thinking it might be more informative, but it didn't seem to help a lot07:18
jrosserharun: ok - and then for the lxc hosts there is a choice of dir/lvm/zfs/... for how the lxc storage is set up07:18
jrosserharun: basically i think there is something not happy with the way containerd is interacting with the storage+lxc in your infra nodes07:20
harunthe lxc storage is ext4 in our system07:21
jrosserfor example we have to set this https://github.com/vexxhost/ansible-collection-containers/blob/be7967a4a8ed29fa6d1e4d27baedd69695952cf1/roles/containerd/defaults/main.yml#L69-L7107:22
jrosserbut that is very specific to our deployment because we use zfs07:22
jrosserharun: my best suggestion is that you make a test deployment in a virtual machine, because that is the same way that we test the mcapi code07:23
jrosseryou would then be able to compare what happens there with your actual deployment07:24
jrosserandrewbonney: now you are here - did you ever see https://paste.openstack.org/show/bnfcKxLlDycBhpu8gzjX/, harun is having trouble with cluster-api07:25
andrewbonneyThat's not something I remember07:26
harunso, you are saying that the problem likely occurred because of ext4? how can i do a test deployment in a virtual machine?07:28
noonedeadpunkgrauzikas: so I think what you see with keystone errors is smth related to trusts (usually)07:30
noonedeadpunkI'm seeing quite a lot of such errors in my logs in magnum pretty much always07:30
jrossermagnum is a huge mess https://bugs.launchpad.net/magnum/+bug/206019407:30
jrosseri dont really understand how anyone makes it work properly out of the box07:31
harunCould this error be occurring because the container image cannot be pulled within the lxc container? I pulled the image successfully in a virtual machine using the private repo.07:31
jrosserharun: yes, there is an interaction between containerd and lxc that makes it a little more tricky than just straight on the host07:31
jrosserso the filesystem used by lxc (overlayfs, dir, zfs, lvm, whatever) is an important factor in if it works or not07:32
jrosserthat is why i suggest you build an all-in-one deployment with the k8s containers in a VM, using the exact same config we use for testing07:32
jrosserthen you will easily be able to see any difference between what we test, and your actual environment07:33
jrosserharun: just to double check - you did these things? https://github.com/openstack/openstack-ansible-ops/blob/master/mcapi_vexxhost/playbooks/files/openstack_deploy/group_vars/k8s_all/main.yml07:34
grauzikasi enabled debug in keystone too, but nothing special that could help figure out why this happens: https://paste.openstack.org/show/buVgYEkCbdQDTW7w0QIB/07:35
harunyes, i did these configurations.07:35
jrosserand this is on Caracal release of openstack-ansible?07:36
harunthank you for your answers, i will recheck and then i can try your suggestions07:37
harunyes, caracal07:37
jrossergrauzikas: that you were getting 401 from keystone in the magnum log means that it does connect07:40
harunhere is the config of the k8s lxc container: https://paste.openstack.org/show/bkgVrE0HaQTWvCPibdXA/07:42
harunis there any problem in here?07:42
jrosserharun: i don't see one07:54
jrossernoonedeadpunk: looks like we don't collect the lxc config in CI jobs any more? or am i missing where it is?07:54
jrosserharun: for testing in a VM, you don't need to do the whole deployment07:54
andrewbonneyI'd expect lxc.apparmor.profile=unconfined as well as the raw.lxc variant of it based on the config, but I don't know why they differ07:55
jrosserharun: ^ this is also an interesting thing, you should check the log on the host for apparmor trouble07:56
haruninteresting, the apparmor service is running in the container right now07:57
harunsorry, it seems inactive07:59
noonedeadpunkjrosser: yeah, I don't see that either. I wonder if we just didn't merge that08:11
haruni think that i solved the problem, i added these lines to the container config: https://paste.openstack.org/show/bGAswQHb3ZvhlrNvnwKM, then restarted the container, and the image was pulled successfully.08:11
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Use hosts setup playbooks from openstack-ansible-plugins repo  https://review.opendev.org/c/openstack/openstack-ansible/+/92425908:15
jrosserharun: do you know if all of those were required, or was it just the apparmor one?08:15
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Verify OS for containers installation  https://review.opendev.org/c/openstack/openstack-ansible/+/92597408:15
noonedeadpunkbtw this is smth I did just for it to be backportable ^ 08:16
noonedeadpunkas for master I think we'd need to have some "assert" role or playbook not to repeat things multiple times08:17
noonedeadpunkpreferably in format which could be included into docs :D08:17
jrosseri was also wondering if we wanted some "deb822" role as well08:17
jrosseras that is going to be a bunch of cut/paste08:18
noonedeadpunkI guess depends on amount of places. If it's only openstack_hosts/rabbit/galera - then probably not? As after all migration it will be just 1 task?08:24
jrosseryeah, its just a lot of lines of code08:25
jrosserwith all the many options on the module, but we can always revisit that later08:25
jrosseri suspect that the issue harun is seeing is some lack of idempotence in generating the lxc config08:25
noonedeadpunkthough we're unlikely to touch it later, as we never did for apt_repo08:26
noonedeadpunkyeah, I don't see where we log lxc configs08:30
harunonly this config is enough: "lxc.apparmor.profile = unconfined", i deleted the other ones and tried again, and it worked08:33
noonedeadpunkI can recall there were some patches regarding apparmor profiles for lxc08:40
noonedeadpunkfor noble at least08:40
noonedeadpunkharun: would it work if you use `lxc.apparmor.profile = generated` along with `lxc.apparmor.allow_nesting = 1` ?08:40
noonedeadpunkie - https://opendev.org/openstack/openstack-ansible-lxc_hosts/commit/7b5fc5afab419afc9f17e7286375ad6b08b5d20d08:41
harun`lxc.apparmor.profile = generated` along with `lxc.apparmor.allow_nesting = 1`, i tried, it worked08:43
noonedeadpunkjrosser: do you think we should backport it together with https://review.opendev.org/c/openstack/openstack-ansible/+/924661 ?08:46
jrossernoonedeadpunk: it's possible - though i cannot remember if the default setup changes to `generated` in noble, and that's what causes us to need the change on master09:02
noonedeadpunkI think it was just more constrained apparmor in general that made our profile not be enough... But I was not working on that bit, so I have a vague understanding09:04
noonedeadpunkBut iirc it was questioned why we have our own profile in the first place at all09:05
jrosserit is likely OK to backport it09:05
jrosserthough i still think that we have underlying trouble with adjusting the lxc config09:06
jrosserthere is a bunch of lineinfile stuff that really is fragile and does not always work09:07
grauzikasjrosser: if i make changes inside the lxc container, for example in the file venvs/magnum-29.0.2/lib/python3.10/site-packages/magnum/common/keystone.py, and then rerun openstack-ansible os-magnum-install.yml, will it fetch the source again or use my modified version?09:29
jrossergrauzikas: modifying the code manually in the container is OK for debugging and trying to find a fix for things09:30
jrosserbut you are right that those changes will be lost if you re-run the playbooks, so it is not really what you want to be doing for something you care about09:31
jrosserhere is some documentation for how you can point to your own modified versions of the git repos for a service like magnum https://docs.openstack.org/openstack-ansible/latest/user/source-overrides/index.html09:31
jrosserthis is the correct method to use for applying local patches, or fixes to services that are not yet included in a release09:32
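A minimal sketch of what such an override can look like in /etc/openstack_deploy/user_variables.yml, following the linked documentation; the fork URL and branch name below are placeholders, not real repositories:

    # Point the magnum build at a patched fork instead of the upstream repo.
    magnum_git_repo: https://github.com/example/magnum        # placeholder fork
    magnum_git_install_branch: local-keystone-fix             # placeholder branch

After changing these, re-running the os-magnum-install.yml playbook (and the repo build it depends on) should rebuild the venv from the patched source.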
grauzikasok thank you09:36
ramboHi Team10:03
*** rambo is now known as Guest235710:03
Guest2357I have joined this chat regarding Ussuri to Victoria release upgrade10:03
noonedeadpunko/10:03
noonedeadpunkhey10:04
Guest2357I need more information on the rabbitmq release not present on the external repo10:06
Guest2357Hi Dmitriy10:06
noonedeadpunkjust a sec10:11
noonedeadpunkSo I think we see our gates for unmaintained Victoria broken due to that (but not limited to it)10:12
noonedeadpunkso back in Victoria we were using the repo https://opendev.org/openstack/openstack-ansible-rabbitmq_server/src/branch/unmaintained/victoria/vars/debian.yml#L25-L2610:13
noonedeadpunkand rabbitmq was pinned to 3.8.1410:13
noonedeadpunkand I think that this version is not available there anymore10:13
noonedeadpunkthere are couple of things you can do.10:13
noonedeadpunkfirst - to use just rabbitmq_install_method: distro as I've suggested10:13
noonedeadpunksecond - you can eventually override rabbitmq_package_version and rabbitmq_erlang_version_spec to supported versions which are present in the repos10:14
Guest2357our current version of rabbitmq in Ussuri is 3.8.210:14
Guest2357for the first way , where can we set this parameter rabbitmq_install_method: distro?10:15
noonedeadpunkwell, if you're using ubuntu or debian as OS, you can check what's in native repos with `apt-cache policy rabbitmq-server`10:15
noonedeadpunkall these are for user_variables.yml10:16
noonedeadpunkas that would depend on the OS version in topic10:17
Guest2357apt-cache policy rabbitmq-server10:19
Guest2357rabbitmq-server:10:19
Guest2357  Installed: (none)10:19
Guest2357  Candidate: 3.8.2-0ubuntu1.510:19
Guest2357  Version table:10:19
Guest2357     3.8.2-0ubuntu1.5 50010:19
Guest2357        500 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages10:19
Guest2357        500 http://archive.ubuntu.com/ubuntu focal-security/main amd64 Packages10:20
Guest2357     3.8.2-0ubuntu1 50010:20
Guest2357        500 http://archive.ubuntu.com/ubuntu focal/main amd64 Packages10:20
Guest2357I can see 3.8.2 here also.10:20
noonedeadpunkok, so likely you already fallback to distro-provided rabbitmq10:24
noonedeadpunkit should be fine to set `rabbitmq_install_method: distro` then. Just don't forget to remove it later on, when you get closer to maintained releases :)10:24
Guest2357okay thanks so we will put this line rabbitmq_install_method: distro in the user variables yaml.10:25
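A minimal sketch of that override as it would sit in /etc/openstack_deploy/user_variables.yml (to be removed again when moving to a release where the external repo works):

    # Fall back to the Ubuntu-provided rabbitmq-server instead of the
    # pinned package from the external repository.
    rabbitmq_install_method: distro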
Guest2357also on point of backup , I can see that we have some customer roles in /etc/ansible/roles such as for prometheus.10:27
Guest2357so those will be removed after the upgrade?10:28
noonedeadpunkwell, they will not be touched10:34
Guest2357okay thanks10:35
noonedeadpunkbut my suggestion would be to add custom roles to user-role-requirements to be managed with bootstrap-ansible script10:35
noonedeadpunkto make deploy host more stateless10:35
Guest2357thanks I will note this point.10:35
noonedeadpunkhttps://docs.openstack.org/openstack-ansible/latest/reference/configuration/extending-osa.html#adding-new-or-overriding-roles-in-your-openstack-ansible-installation10:36
noonedeadpunkthen you pretty much may not worry about anything there except presence of openstack_deploy folder. as state will be restored by running bootstrap-ansible.sh solely10:37
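A sketch of what such an entry in /etc/openstack_deploy/user-role-requirements.yml might look like; the role name, URL and version are placeholders, not the deployer's real prometheus role:

    # Custom roles listed here are fetched by bootstrap-ansible.sh alongside
    # the standard OSA roles, so /etc/ansible/roles can be rebuilt at any time.
    - name: prometheus_server
      scm: git
      src: https://github.com/example/ansible-prometheus     # placeholder
      version: master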
Guest2357thanks. on the 2nd point - service sequence - I got that core services which have a serial parameter in the install yaml files will be upgraded sequentially, like control01 first and then control02 + control03 together.10:38
noonedeadpunk++10:39
Guest2357I can see the serial parameter on nova, neutron, glance. for other services like designate and manila there is no serial parameter, so they will be upgraded in parallel on all controllers together, so there will be a provisioning outage of these services during their upgrade10:39
Guest2357loadbalancers and manila file shares which are already provisioned will not be impacted? correct?10:40
noonedeadpunkso ideally no, they won't have outages - only API might have interruption during deployment10:41
noonedeadpunkBUT, I can't recall what release it was, likely something like Xena, where we've changed the way of generating the Octavia CA for LBs, where you need to be cautious about backwards compatibility10:41
noonedeadpunkwe made a script to handle that, and it used to work, but still...10:42
Guest2357okay I see . thanks for confirmation.10:42
noonedeadpunkit was even Yoga10:42
Guest2357yeah we will take care of this , as we plan to sequentially upgrade our ussuri to some latest version.10:43
noonedeadpunkEventually - the Octavia certificates and SSH key for Amphoras are another thing to back up10:43
Guest2357so starting with victoria10:43
Guest2357can you please confirm the path for these octavia certificates and ssh key of amphora ?10:44
noonedeadpunkit's really very important, as if the cert is rotated on the API side - it can't communicate with the Amphora API anymore, so you'd need either to restore the original certs or failover all amphoras for them to get a new one10:44
Guest2357so certificates rotation is expected after the upgrade?10:45
noonedeadpunkIt's like $HOME/openstack-ansible/octavia - terrible place actually for storage....10:45
noonedeadpunkno, it is not expected10:45
noonedeadpunkunless you tell to rotate them10:45
noonedeadpunkbut just mentioning importance of these10:46
noonedeadpunkand ssh keys are also $HOME/.ssh/octavia_key10:46
noonedeadpunkand $HOME is a really poor choice, as it heavily depends on how one uses the deploy host.10:47
noonedeadpunkwe had an issue since we have LDAP on the deploy host, so we had a separate set of octavia certs per user running playbooks10:48
noonedeadpunkand if the certificate is not found under the expected path - then it will be generated10:48
Guest2357looks like I don't have those certificates on deployment host.10:49
noonedeadpunkSo potentially - you might want to move these certs somewhere else10:49
noonedeadpunkor create them from octavia hosts :D10:49
noonedeadpunkthis is the upgrade script for Yoga which defines explicitly path for certificates for octavia: https://opendev.org/openstack/openstack-ansible/blame/branch/unmaintained/yoga/scripts/upgrade-utilities/define-octavia-certificate-vars.yml#L23-L2810:50
noonedeadpunkyou can try to find them inside octavia container - here's the mapping: https://opendev.org/openstack/openstack-ansible-os_octavia/src/branch/unmaintained/victoria/tasks/octavia_certs_distribute.yml#L26-L4310:52
Guest2357are these certificates and keys present on some path of control node or designate container as well?10:53
noonedeadpunkso if you're running lxc containers - it should be there then10:53
noonedeadpunkyou can do like `lxc-attach -n <container_name> cat /etc/octavia/certs/ca_key.pem`10:54
noonedeadpunkAnd I'd suggest placing the certs under the openstack_deploy folder as well and explicitly supplying the path to them, like what we do with our upgrade script10:56
Guest2357yea I can see the certs in above directory10:56
noonedeadpunkso good you've asked :)10:57
Guest2357okay so now we need to copy the complete folder of certs and keep it on openstack-deploy path.11:01
noonedeadpunkand define variables11:01
Guest2357okay we need to add in user-variable.yaml file.11:01
Guest2357what would be the variable name in that?11:02
noonedeadpunka bunch of them - you'd need to match the file names for that, check the script from Yoga: https://opendev.org/openstack/openstack-ansible/blame/branch/unmaintained/yoga/scripts/upgrade-utilities/define-octavia-certificate-vars.yml#L23-L2811:02
noonedeadpunkwill try to make a clearer paste11:04
Guest2357okay got it. so it will be required during our ussuri to victoria upgrade too, otherwise the certificates will be rotated?11:05
noonedeadpunkhttps://paste.openstack.org/show/bzlnPZWOzrgdSGFL1kL9/11:06
noonedeadpunkyes, if no certs are found under the expected path - the role will generate new ones11:06
Guest2357okay great thanks.11:06
noonedeadpunkas you don't have these in place - you need that right away to avoid failing over all amphoras11:06
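The overrides would live in user_variables.yml next to wherever the certificates are copied; the sketch below only shows the general shape, and the exact variable and file names should be taken from the linked Yoga script and the paste below rather than from here:

    # Illustrative shape only - match variable and file names against
    # define-octavia-certificate-vars.yml for your release.
    octavia_cert_dir: "/etc/openstack_deploy/octavia"                   # assumed storage location
    octavia_ca_certificate: "{{ octavia_cert_dir }}/ca_01.pem"          # placeholder file name
    octavia_ca_private_key: "{{ octavia_cert_dir }}/private/cakey.pem"  # placeholder file name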
Guest2357what about the ssh keys? how can we keep them same?11:07
noonedeadpunk> user-variable.yaml - it's actually important that the file matches the pattern `user_*.yml`11:07
Guest2357yeah we have user_variable.yaml , so it was typo earlier11:08
noonedeadpunkdo you have `octavia_ssh_enabled` defined ?11:08
opendevreviewJonathan Rosser proposed openstack/openstack-ansible-plugins master: Add infrastructure playbooks to openstack-ansible-plugins collection  https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/92417111:08
noonedeadpunkas it's False by default11:08
Guest2357no I dont see this parameter in user_variables.yaml file.11:09
noonedeadpunkso you might legitimately not have ssh keys11:09
noonedeadpunkas that block is skipped unless you have it explicitly enabled11:09
Guest2357okay cool thanks for clarity.11:09
Guest2357for the 3rd point, we have project routers defined as non-HA and the virtual routers reside on control nodes without any redundancy.11:10
Guest2357so when the L3 agent on a control node restarts it will take down all routers, and the routers will recover when the L3 agent is back.11:11
Guest2357we have around 11 routers on each control node.11:12
noonedeadpunkok, then it should be pretty much negligible11:12
noonedeadpunkyou might not even notice anything11:12
Guest2357okay, so we can just expect project reachability issues when the L3 service is restarted.11:12
noonedeadpunkyeah11:12
noonedeadpunkso when neutron agents are restarted they try to ensure their state and re-create wiring if needed.11:13
noonedeadpunkSo until the agent fully finishes "self-healing", some routers might misbehave11:13
noonedeadpunkanother way around would be to do the neutron-agents upgrade one-by-one, manually specifying --limit11:14
noonedeadpunkand you can move l3 and dhcp agents between controllers using `openstack network agent remove router --l3 <old_agent_uuid> <router_uuid> && openstack network agent add router --l3 <new_agent_uuid> <router_uuid>`11:16
noonedeadpunkbut for non-ha there will still be downtime for this operation as well11:16
Guest2357yeah we are following this during control node reboots moving all routers on other compute and doing the reboot.11:17
harunInstalling ClusterAPI, I solved the apparmor issue but i got this error: https://paste.openstack.org/show/bWKz9AfH3M6KhoooPcRx/11:23
Guest2357so in this case I need to run each playbook of setup-openstack.yml one by one, and while doing the os-neutron-install.yml one, give a limit of one control node?11:26
noonedeadpunkyeah, kinda... or temporarily comment out os-neutron-install.yml from setup-openstack.yml11:35
Guest2357okay yeah that would be better to keep neutron after that with limits.11:36
jrosserharun: best to look in the journal for kubelet to see what is the issue11:37
Guest2357also for the compute openstack services [nova-compute and neutron-linuxbridge] upgrade, is it part of os-neutron-install.yml and os-nova-install.yml?11:39
Guest2357or some other playbook, as I want to see if there is an option to control the compute upgrade and do some VM migrations in between to prevent downtime.11:40
haruni ran the command with -e kubelet_allow_unsafe_swap=true, 11:41
harunthe journal of the kubelet: https://paste.openstack.org/show/b3B0ALbcqAy1fUQeGLUR/11:41
jrosserso what does `/sbin/swapon -s` say?11:46
jrosseri do not have swap enabled on my controller nodes so have not had this issue11:47
haruni guess "--fail-swap-on=false" should be added to /etc/systemd/system/kubelet.service.d/10-kubeadm.conf11:48
jrosserwell there are two things11:48
harunthe output of "/sbin/swapon -s": https://paste.openstack.org/show/bQBU9FKFhjs2cLu5TutE/ 11:48
jrosserthe variable kubelet_allow_unsafe_swap only controls this https://github.com/vexxhost/ansible-collection-kubernetes/blob/4b502b215ccaffe71dc1aa5c8fdda2e34a4ef37c/roles/kubelet/tasks/main.yml#L7111:49
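Since that variable only skips the role's own check, the conventional alternative is to disable swap on the k8s hosts entirely so kubelet's fail-swap-on check passes; a rough sketch, assuming a k8s_all host group:

    - hosts: k8s_all
      tasks:
        - name: Turn swap off immediately
          ansible.builtin.command: swapoff -a

        - name: Comment out swap entries so it stays off after reboot
          ansible.builtin.replace:
            path: /etc/fstab
            regexp: '^([^#].*\sswap\s.*)$'
            replace: '# \1'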
rambohi11:49
*** rambo is now known as Guest236411:49
Guest2364just got disconnected. discussing ussuri to victoria upgrade11:50
*** Guest2364 is now known as rambo241211:51
jrosserharun: i do not see a way to add extra settings to 10-kubeadm.conf with that code as it stands11:51
jrosseralso - that is a 3rd party collection, not part of openstack-ansible directly11:51
noonedeadpunkrambo2412: neutron-linuxbridge is part of neutron playbook11:52
noonedeadpunkbut also from my experience the downtime for its restart might be less than from online migration...11:52
rambo2412okay I see so if I limit with control01 --> control02 --->control03 it will not upgrade the computes?11:53
rambo2412okay I see yeah so better keep the VM while the upgrade is happening.11:53
noonedeadpunkyep, or you can also do like that `--limit 'neutron_all:!nova_compute'`11:54
noonedeadpunkquite some options are around11:54
rambo2412okay sounds good, we can keep all routers on one and do like above. thanks.11:55
noonedeadpunkbut yeah, as you wanna do agents one-by-one - then makes sense to limit by hosts11:55
harunwhen running without kubelet_allow_unsafe_swap=true, i am getting the error: https://paste.openstack.org/show/b5kORMNd05P5EC2wzpTA/11:56
rambo2412or first we can limit by control and later remove any limit which will do on all computes and skip control nodes?11:57
harunthe error code is 3211:57
noonedeadpunkrambo2412: you can, but that would be more time consuming, as playbooks will run against neutron api and agents as well11:57
noonedeadpunkthough it won't break/change anything, just more execution time11:58
rambo2412okay I see, --limit 'neutron_all:!nova_compute' is negating the nova_computes , so it will skip computes?11:59
opendevreviewMerged openstack/openstack-ansible-os_magnum master: Add test for high-availability mcapi control plane  https://review.opendev.org/c/openstack/openstack-ansible-os_magnum/+/92317412:04
rambo2412thanks for all the support , I will further prepare my plan and MOP of the upgrade. will come back in case of any further queries12:15
opendevreviewMerged openstack/openstack-ansible-os_ceilometer master: Add support for Magnum notifications  https://review.opendev.org/c/openstack/openstack-ansible-os_ceilometer/+/92772412:52
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_ceilometer stable/2024.1: Add support for Magnum notifications  https://review.opendev.org/c/openstack/openstack-ansible-os_ceilometer/+/92781212:55
opendevreviewMerged openstack/openstack-ansible master: Use haproxy_install playbook from openstack-ansible-plugins repo  https://review.opendev.org/c/openstack/openstack-ansible/+/92416813:55
jrossernoonedeadpunk: we want to make sure this merges first before going too far with moving playbooks to the plugins repo https://review.opendev.org/c/openstack/openstack-ansible/+/92597413:58
jrosseri think we might have some merge conflicts to deal with in all of these13:59
noonedeadpunkyeah13:59
noonedeadpunkBut also I was thinking if setup-hosts should be like that: https://review.opendev.org/c/openstack/openstack-ansible/+/924259/4/playbooks/setup-hosts.yml14:00
noonedeadpunkas I'd assume that for consistency we need to have setup-hosts in the collection as well14:00
noonedeadpunkbtw, I've just tested mariadb 11.4.3 and the issue with TLS is still there14:00
jrosserah yes you are right with setup-hosts, let me adjust that14:02
opendevreviewJonathan Rosser proposed openstack/openstack-ansible-plugins master: Add setup-hosts playbook to plugins collection.  https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/92782614:06
noonedeadpunkcan you have a `-` in playbook name in collection?14:06
opendevreviewJonathan Rosser proposed openstack/openstack-ansible-plugins master: Add setup_hosts playbook to plugins collection.  https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/92782614:08
jrossernope :)14:08
noonedeadpunkalso... I think we need to add dummy playbooks?14:09
opendevreviewJonathan Rosser proposed openstack/openstack-ansible master: Use hosts setup playbooks from openstack-ansible-plugins repo  https://review.opendev.org/c/openstack/openstack-ansible/+/92425914:09
jrossergood point14:09
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible master: Bump SHAs and pinned versions  https://review.opendev.org/c/openstack/openstack-ansible/+/92784114:59
noonedeadpunk#startmeeting openstack_ansible_meeting15:00
opendevmeetMeeting started Tue Sep  3 15:00:24 2024 UTC and is due to finish in 60 minutes.  The chair is noonedeadpunk. Information about MeetBot at http://wiki.debian.org/MeetBot.15:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.15:00
opendevmeetThe meeting name has been set to 'openstack_ansible_meeting'15:00
noonedeadpunk#topic rollcall15:00
noonedeadpunko/15:00
jrossero/ hello15:00
noonedeadpunk#topic office hours15:01
noonedeadpunkso we have a couple of things for discussion15:02
opendevreviewJonathan Rosser proposed openstack/openstack-ansible-plugins master: Add setup_hosts playbook to plugins collection.  https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/92782615:02
noonedeadpunkNoble support is almost here from what I see15:02
noonedeadpunk#link https://review.opendev.org/c/openstack/openstack-ansible/+/92434215:02
jrossersort of - i would say yes as far as the integrated repo is concerned15:02
noonedeadpunkbut job is failing multiple times in a row now, but every time in a different way15:02
jrosserprobably no as far as all additional services are concerned15:03
noonedeadpunkyeah, that's true as well15:03
jrosserwe should do some work on CI stability15:03
jrosseri have been trying to keep notes on common failures15:03
NeilHanlonhiya15:03
jrosserlike failing to get u-c, image download errors etc15:03
noonedeadpunkI've spotted a bunch of mirror issues with RDO lately as well15:03
* NeilHanlon hopes for few rocky issues15:04
jrosserbut there is also a rumble of tempest failures, perhaps with more often than not it being keystone15:04
noonedeadpunkwere some :D15:04
jrosserandrewbonney: ^ you were looking at failures too a bit I think?15:04
* NeilHanlon plugs his ears and pretends he didn't hear anything15:04
jrosserand the mcapi job is extremely troublesome, which needs more investigation15:04
noonedeadpunkNeilHanlon: actually we've also discussed Rocky mirrors with the infra folks15:04
jrosserbut on the surface that looks like nothing at all to do with magnum causing the errors15:05
NeilHanlonyeah i remember some message from last month or so... travelling took a lot out of me15:05
noonedeadpunkseems they do have space on afs share now and were fine adding them15:05
NeilHanloni will try and restart that convo15:05
noonedeadpunkyeah, that would make sense, as CentOS testing was pulled off as a whole due to experiencing quite some issues15:05
noonedeadpunkand rocky was discussed as a replacement15:05
NeilHanlonright15:06
noonedeadpunkabout capi jobs - I frankly did not look into these at all15:06
noonedeadpunkas I still barely get the topic15:06
noonedeadpunkthough I'm coming closer and closer to it in my internal backlog15:06
noonedeadpunkAnother thing that you brought to my attention - is changing the way uwsgi is supposed to be served15:07
noonedeadpunkand pulling the wsgi scripts out of the service setup scripts15:07
noonedeadpunkSo this bump will totally fail on these changes15:07
noonedeadpunk#link https://review.opendev.org/c/openstack/openstack-ansible/+/92784115:08
jrosserhopefully we can make some depends-on patches and work through what is broken fairly easily15:08
noonedeadpunkyeah15:09
noonedeadpunkand with that test noble I hope15:09
opendevreviewMerged openstack/openstack-ansible master: Verify OS for containers installation  https://review.opendev.org/c/openstack/openstack-ansible/+/92597415:09
noonedeadpunkwe also need to come up with release highlights15:09
jrosserdo we have anything big left to fix/merge this cycle?15:10
NeilHanloni guess i will also probably start on rocky 10 experimental jobs at some point. i need to check up with RDO folks first15:10
jrosserdeb822 is one thing, but i think thats now understood and is just a question of doing the other places15:10
noonedeadpunklooking through our ptg doc15:10
noonedeadpunk#link https://etherpad.opendev.org/p/osa-dalmatian-ptg15:10
NeilHanlongoodness, it's almost PTG again isnt it..15:11
noonedeadpunkand realizing I've failed to work on the most interesting topic for myself so far15:11
jrosserbut it would be quite good to be able to spend the rest of the cycle getting existing stuff merged and doing tidy-up & CI fixing15:11
noonedeadpunkNeilHanlon: it really is....15:11
jrosserwe have had a couple of times now with a real big rush for release15:11
noonedeadpunkjrosser: yes, exactly. I don't aim to bring anything new15:11
noonedeadpunkreally want to have a coordinated release as a feature freeze15:11
jrosseri would say we are basically there apart from finishing a few things15:12
noonedeadpunkso about topics: deb822, noble, playbooks into collection15:12
jrosseryeah15:12
jrosseri will try to find time soon to revisit the deb822 stuff15:12
NeilHanlonoh i forgot if i mentioned it but i do have a working incus for rocky 915:13
NeilHanlonhttps://copr.fedorainfracloud.org/coprs/neil/incus/15:13
noonedeadpunkoh, that's really nice. 15:13
jrossernoble is potentially a big job, as also i think we have some still broken roles15:13
noonedeadpunkwe should try to look into that for 2025.1 I guess15:13
NeilHanlonagreed15:13
* NeilHanlon reads up on what deb822 is15:13
jrosserand for playbooks->collection - we should decide how far we go this cycle15:14
noonedeadpunkyeah, these are broken ones 15:14
noonedeadpunk#link https://review.opendev.org/q/topic:%22osa/frist_host_refactoring%22+status:open15:14
noonedeadpunkjrosser: I'd go all-in15:14
jrosserlike is -hosts and -infra enough and we treat -openstack as further work?15:14
noonedeadpunkI can get some time to finalize jsut in case15:14
jrosserok - i have kind of lost where we got up to as it has taken so very long to merge the initial stuff15:15
jrosserthere will be some remaining common-tasks / common-playbooks i expect15:15
noonedeadpunkyeah, it took quite long for reviews as well to ensure that all changes to playbooks were moved as well15:15
noonedeadpunkso far a good question is what to do with things like the ceph playbooks15:17
noonedeadpunkbut it looks like you've moved most of the things already anyway :)15:19
noonedeadpunkso it's good15:19
noonedeadpunkAnd there's also - what to do with things like that: https://opendev.org/openstack/openstack-ansible/src/branch/master/playbooks/listening-port-report.yml15:19
noonedeadpunkI assume you're using this?15:20
jrosserthat was very useful in the time of working on bind-to-mgmt15:20
jrosserbut i think actually there is an ansible module to do the same now15:20
jrosserhttps://docs.ansible.com/ansible/2.9/modules/listen_ports_facts_module.html15:21
noonedeadpunkyeah15:21
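A minimal sketch of what a replacement based on that module could look like (host pattern and output handling are assumptions, not the existing playbook):

    - hosts: all
      gather_facts: false
      tasks:
        - name: Collect listening TCP/UDP sockets
          community.general.listen_ports_facts:

        - name: Show what is listening where
          ansible.builtin.debug:
            var: ansible_facts.tcp_listen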
noonedeadpunkok, so overall the list sounds doable - noble, wsgi_scripts and playbooks15:22
jrosseri think so15:22
jrosserthe magnum stuff is ok - but we do risk making a release that includes installing stuff from github.com/jrosser fork which i don't like 15:23
jrossermnaser: ^15:23
noonedeadpunkBtw there was 1 bug report I wanted to check on, but failed so far15:25
noonedeadpunk#link https://bugs.launchpad.net/openstack-ansible/+bug/207855215:25
noonedeadpunkI believe there's a race condition in there, as in case `rabbitmqctl cluster_status` exits with an error code, which triggers the assert failure, then we probably should not attempt to run it to get flags either15:26
noonedeadpunkI didn't look in the code yet, but I guess the expectation of recovery in case of cluster failure is fair15:26
noonedeadpunkI was thinking though if it would make sense to add another flag like `ignore_cluster_state` as we have in mariadb15:27
jrosserandrewbonney: you may have thoughts on this ^15:27
noonedeadpunkbut then it might go too far, and raise a question if mnesia should be preserved with that flag or not15:27
jrossertime going backwards is really bad though :)15:27
andrewbonneyYeah, I'll try and look tomorrow, context switch is too hard right now15:28
noonedeadpunkoh yes, it's not good :D15:28
noonedeadpunkI can get how that happened though...15:28
noonedeadpunkor well15:28
noonedeadpunkI've spotted a couple of times that after a reboot chrony somehow does not start up properly15:29
jrosseropenstack doesnt support 24.04 for D does it?15:29
noonedeadpunkno, they're trying master15:29
noonedeadpunkthere was another report: https://bugs.launchpad.net/openstack-ansible/+bug/207852115:29
jrosserright - so i still think we need to be careful what message we give out15:29
noonedeadpunkyeah, I explained support matrix in the previous one15:30
noonedeadpunkso folk is trying to beta test on master and report back findings15:30
noonedeadpunkjust pretty much missed collection dependency I guess15:30
jrosserindeed - the noble topic is really only just all merged now15:32
noonedeadpunkbut dunno... anyway, the overall issue description looks reasonable enough to double check15:32
noonedeadpunkthere was another one, but I feel like it's a zun issue15:34
noonedeadpunk#link https://bugs.launchpad.net/openstack-ansible/+bug/207848215:34
noonedeadpunkso at worst we can mark it as invalid for osa15:34
jrosserinteresting venv paths in that bug report15:36
noonedeadpunkindeed....15:37
noonedeadpunkah15:37
noonedeadpunkI guess it's just top of the 2024.115:38
noonedeadpunkand pbr detects version tag as `stable/2024.1`15:38
noonedeadpunkthough I would not expect that happening15:38
jrosseri thought you still got the previous tag with -dev<big-number> in that case15:38
noonedeadpunkit used to be that way for sure, yes15:39
jrosserwell, some number15:39
noonedeadpunkbut technically one can override version as well15:40
noonedeadpunkbut that's pretty much it then15:40
noonedeadpunkah, we have another "bug" on master (and 2024.1 I guess)15:43
noonedeadpunkwe have conflicting MPMs for Apache between services15:44
noonedeadpunklike repo and keystone asking for event and horizon and skyline for event15:44
noonedeadpunkor smth like that15:44
jrosseractually this is something we should fix15:44
noonedeadpunkso re-running the playbooks results in failures15:44
noonedeadpunkthings went completely off with repo actually15:45
noonedeadpunkyeah, I was just thinking about best way for that15:45
jrosserthats only in master though currently?15:45
noonedeadpunkwell, in stable you can shoot yourself in the foot as well15:45
noonedeadpunklike - override https://opendev.org/openstack/openstack-ansible-os_keystone/src/branch/master/defaults/main.yml#L23515:46
noonedeadpunkbut then - https://opendev.org/openstack/openstack-ansible-os_skyline/src/branch/master/vars/debian.yml#L31-L3415:46
noonedeadpunkand https://opendev.org/openstack/openstack-ansible-os_horizon/src/branch/master/vars/debian.yml#L61-L6415:47
noonedeadpunkso this all leans towards apache role eventually15:48
jrosseryes agreed15:48
noonedeadpunkbut also I think this should still be backportable at first...15:49
noonedeadpunkah, and also what I found yesterday - is a bug in the neutron handlers for l3 - these 2 things just do not work on modern kernels https://opendev.org/openstack/openstack-ansible-os_neutron/src/branch/master/handlers/main.yml#L33-L7515:51
noonedeadpunkbut also I'm not sure what's meant by `pgrep neutron-ns-meta`15:51
noonedeadpunkI'm not sure though if it's worth including the apache thing in this release... I guess not, but for 2025.115:54
noonedeadpunk#endmeeting16:00
opendevmeetMeeting ended Tue Sep  3 16:00:00 2024 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)16:00
opendevmeetMinutes:        https://meetings.opendev.org/meetings/openstack_ansible_meeting/2024/openstack_ansible_meeting.2024-09-03-15.00.html16:00
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/openstack_ansible_meeting/2024/openstack_ansible_meeting.2024-09-03-15.00.txt16:00
opendevmeetLog:            https://meetings.opendev.org/meetings/openstack_ansible_meeting/2024/openstack_ansible_meeting.2024-09-03-15.00.log.html16:00
noonedeadpunkjrosser: do you have any guess what `pgrep neutron-ns-meta` should be catching at all?16:13
jrosserwell - this returns the pids of any processes of that name?16:16
noonedeadpunkum, and do you have any output?16:17
noonedeadpunkas I'm not sure if it's a valid process name at all16:17
noonedeadpunkalso - it seems that the pattern is limited to 16 characters, just in case16:18
noonedeadpunkand then - `readlink -f` does not provide output deep enough to see the venv_tag...16:18
noonedeadpunktalking about these: https://opendev.org/openstack/openstack-ansible-os_neutron/src/branch/master/handlers/main.yml#L41-L4216:19
noonedeadpunkhttps://paste.openstack.org/show/b3dFyeCXzifdorJmYYMQ/16:19
noonedeadpunkugh.16:20
noonedeadpunkbut I'd really like to at least understand what exact process we're supposed to catch...16:20
noonedeadpunkas my only guess would be `neutron-metadata-agent`16:21
* jrosser looking16:23
jrosseri wonder if it should actually be `pgrep -f ns-metadata-proxy`16:25
jrosserwhat is actually is searching for is part of the path in the haproxy config file16:26
jrosserlike `haproxy -f /var/lib/neutron/ns-metadata-proxy/47fb30ac-5c90-4ed7-9a15-65a1225bb6db.conf`16:26
noonedeadpunkbut then `| grep -qv "{{ neutron_venv_tag }}` would be pointless as well16:28
noonedeadpunkok, so commit message says about `neutron-ns-metadata-proxy`16:29
jrosseri feel like `/proc/$ns_pid/exe` would point at the executable of the thing that owns the namespace16:29
noonedeadpunknah, it's returning `/usr/sbin/haproxy `16:30
jrosseralso https://opendev.org/openstack/neutron/src/branch/master/releasenotes/notes/switching-to-haproxy-for-metadata-proxy-9d8f7549fadf9182.yaml16:30
noonedeadpunkyeah, just found that16:31
jrosserand this cleanup code is 8 years old16:31
noonedeadpunkand reno is 7yo16:31
jrosserso it might be now either totally wrong or redundant16:31
noonedeadpunkok, cool, so that is likely redundant16:31
jrosserwell, unless the same issue exists just in a different way16:32
noonedeadpunkI guess the intention there was to kill proxies running from old venvs on upgrade16:32
noonedeadpunkbut haproxy should not really matter that much16:32
jrosseryes exactly that16:32
noonedeadpunkas it's going from system packages kinda16:33
jrosserit would be simple to look in a sandbox to see if all those haproxy processes get restarted if you restart the relevant neutron service16:34
noonedeadpunkI'm not sure if they are, but my guess is that they should not even16:34
jrosserthe only thing would be if an upgrade to neutron expected to be putting different content in the generated .conf file16:35
noonedeadpunkas what we do in the next handler is kill things except haproxy and keepalived16:35
noonedeadpunkBut then neutron should be handling reload regardless16:35
noonedeadpunkas updated content would come only through notifications I assume16:35
noonedeadpunkor smth like that16:36
jrosseroh no i meant if there was some code change in neutron16:36
jrosserthat meant the conf files should be updated16:36
noonedeadpunkah, base template, yeah16:36
jrosseryeah16:36
noonedeadpunkbut then we need smth like `neutron_l3_cleanup_on_shutdown` I guess16:37
jrosseras usual "its complicated" but for certain we can remove that code as it's been doing nothing for a long time16:38
opendevreviewJonathan Rosser proposed openstack/openstack-ansible-rabbitmq_server master: Manage apt repositores and keys using deb822_repository module  https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/90783316:49
opendevreviewJonathan Rosser proposed openstack/openstack-ansible-openstack_hosts master: Manage apt repositores and keys using deb822_repository module  https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/90743416:51
opendevreviewJonathan Rosser proposed openstack/openstack-ansible-openstack_hosts master: Manage apt repositores and keys using deb822_repository module  https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/90743416:52
noonedeadpunkok, we have some extra work to do to run Neutron with uwsgi22:46
noonedeadpunkhttps://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/SVP3VUCOZGIY63TGD33H6NQ6UBAFDN5V/22:47
noonedeadpunklike - neutron-ovn-maintenance-worker and neutron-periodic-workers22:47
opendevreviewDmitriy Rabotyagov proposed openstack/openstack-ansible-os_neutron master: Disable uWSGI usage by default  https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/92788123:11
noonedeadpunksome extra chunk of work....23:11
