harun | Hello, I am trying to install ClusterAPI using this documentation: https://docs.openstack.org/openstack-ansible-ops/latest/mcapi.html, but I have a problem. | 06:40 |
harun | I have a private Docker Registry. I set up containerd_insecure_registries and the other configuration. The problem occurred while initializing the cluster (when running **openstack-ansible osa_ops.mcapi_vexxhost.k8s_install**). | 06:40 |
harun | Problem output: | 06:40 |
harun | https://paste.openstack.org/show/b1VpMn3Sprro62z1R0BG/ | 06:40 |
harun | Here is my configuration: | 06:40 |
harun | https://paste.openstack.org/show/bSigT3VuqeXpp07QpVq1/ | 06:40 |
harun | Hi, I am trying to install ClusterAPI using this documentation: https://docs.openstack.org/openstack-ansible-ops/latest/mcapi.html, but i have a problem. I have a private Docker Registry. I set up containerd_insecure_registries and other configurations. The problem occurred in initializing the cluster. (when running openstack-ansible osa_ops.mcapi_vexxhost.k8s_install) Problem output: | 06:41 |
harun | https://paste.openstack.org/show/b1VpMn3Sprro62z1R0BG Here is my config: https://paste.openstack.org/show/bSigT3VuqeXpp07QpVq1/ | 06:41 |
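A registry override of this kind typically goes into user_variables.yml; the exact value structure that `containerd_insecure_registries` expects depends on the version of the vexxhost containerd role, so the list-of-endpoints form and the registry hostname below are only assumptions to illustrate the idea:

```yaml
# user_variables.yml - assumed sketch; verify the expected structure against
# the containerd role defaults in your release.
containerd_insecure_registries:
  - "registry.example.internal:5000"   # placeholder for the private registry endpoint
```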
jrosser | harun: good morning - i will check how we are doing this | 06:57 |
harun | i tried to pull the image using crictl in the k8s lxc container but i got this error: https://paste.openstack.org/show/bEsCUf41YXsiMDY0Ys2S/ | 07:01 |
jrosser | harun: which operating system are you using? | 07:01 |
harun | ubuntu 22.04 | 07:02 |
jrosser | that is the same as we have | 07:04 |
jrosser | i guess that the first place that i would look is the journal for containerd | 07:12 |
jrosser | harun: how are your lxc hosts set up (what kind of storage backend do you use for the lxc containers?) | 07:13 |
harun | here is the journal of containerd: https://paste.openstack.org/show/bnfcKxLlDycBhpu8gzjX/ | 07:13 |
harun | we use ceph | 07:14 |
jrosser | for the infra hosts lxc? | 07:15 |
grauzikas | Hello, yesterday we were talking about magnum, and after that i enabled letsencrypt and now i have an error in magnum: https://paste.openstack.org/show/bFCy7heEN0NCO8l92SIl/ My config regarding letsencrypt and magnum: https://paste.openstack.org/show/bv19hOUKhHV65Q9nnNmE/ | 07:16 |
harun | we use ssd disks in the infra hosts | 07:16 |
grauzikas | maybe you could suggest where the issue could be? i didn't reinstall the whole cluster, but i ran the setup-hosts, setup-infrastructure and setup-openstack playbooks | 07:17 |
grauzikas | i enabled debug, thought it might be more informative, but it doesn't seem to have helped a lot | 07:18 |
jrosser | harun: ok - and then for the lxc hosts there are choices of dir/lvm/zfs/.... for how the lxc storage is set up | 07:18 |
jrosser | harun: basically i think there is something not happy with the way containerd is interacting with the storage+lxc in your infra nodes | 07:20 |
harun | the lxc storage is ext4 in our system | 07:21 |
jrosser | for example we have to set this https://github.com/vexxhost/ansible-collection-containers/blob/be7967a4a8ed29fa6d1e4d27baedd69695952cf1/roles/containerd/defaults/main.yml#L69-L71 | 07:22 |
jrosser | but that is very specific to our deployment because we use zfs | 07:22 |
jrosser | harun: my best suggestion is that you make a test deployment in a virtual machine, because that is the same way that we test the mcapi code | 07:23 |
jrosser | you would then be able to compare what happens there with your actual deployment | 07:24 |
jrosser | andrewbonney: now you are here - did you ever see https://paste.openstack.org/show/bnfcKxLlDycBhpu8gzjX/, harun is having trouble with cluster-api | 07:25 |
andrewbonney | That's not something I remember | 07:26 |
harun | so, you are saying that the problem likely occurred because of ext4? how can i do a test deployment in a virtual machine? | 07:28 |
noonedeadpunk | grauzikas: so I think what you see with keystone errors is smth related to trusts (usually) | 07:30 |
noonedeadpunk | I'm seeing quite a lot of such errors in my logs in magnum pretty much always | 07:30 |
jrosser | magnum is a huge mess https://bugs.launchpad.net/magnum/+bug/2060194 | 07:30 |
jrosser | i dont really understand how anyone makes it work properly out of the box | 07:31 |
harun | Could this error be occurring because the container image cannot be pulled within the lxc container? I pulled the image successfully in a virtual machine using the private repo. | 07:31 |
jrosser | harun: yes, there is an interaction between containerd and lxc, that makes it a little more tricky than just straight on the host | 07:31 |
jrosser | so the filesystem used by lxc (overlayfs, dir, zfs, lvm, whatever) is an important factor in if it works or not | 07:32 |
jrosser | that is why i suggest you build an all-in-one deployment with the k8s containers in a VM, using the exact same config we use for testing | 07:32 |
jrosser | then you will be easily able to see any difference between what we test, and your actual environment | 07:33 |
jrosser | harun: just to double check - you did these things? https://github.com/openstack/openstack-ansible-ops/blob/master/mcapi_vexxhost/playbooks/files/openstack_deploy/group_vars/k8s_all/main.yml | 07:34 |
grauzikas | i enabled debug in keystone too, but nothing special that could help to figure out why this happens: https://paste.openstack.org/show/buVgYEkCbdQDTW7w0QIB/ | 07:35 |
harun | yes, i did these configurations. | 07:35 |
jrosser | and this is on Caracal release of openstack-ansible? | 07:36 |
harun | thank you for your answers, i will recheck and then i can try your suggestions | 07:37 |
harun | yes, caracal | 07:37 |
jrosser | grauzikas: that you were getting 401 from keystone in the magnum log means that it does connect | 07:40 |
harun | here is the config of the k8s lxc container: https://paste.openstack.org/show/bkgVrE0HaQTWvCPibdXA/ | 07:42 |
harun | is there any problem in here? | 07:42 |
jrosser | harun: i don't see one | 07:54 |
jrosser | noonedeadpunk: looks like we don't collect the lxc config in CI jobs any more? or am i missing where it is? | 07:54 |
jrosser | harun: for testing in a VM, you don't need to do the whole deployment | 07:54 |
andrewbonney | I'd expect lxc.apparmor.profile=unconfined as well as the raw.lxc variant of it based on the config, but I don't know why they differ | 07:55 |
jrosser | harun: ^ this is also an interesting thing, you should check the log on the host for apparmor trouble | 07:56 |
harun | interesting, the apparmor service is running in the container right now | 07:57 |
harun | sorry, it seems inactive | 07:59 |
noonedeadpunk | jrosser: yeah, I don't see that either. I wonder if we just didn't merge that | 08:11 |
harun | i think that i solved the problem, i added these lines to the container config: https://paste.openstack.org/show/bGAswQHb3ZvhlrNvnwKM, then restarted the container, and the image is pulled successfully. | 08:11 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Use hosts setup playbooks from openstack-ansible-plugins repo https://review.opendev.org/c/openstack/openstack-ansible/+/924259 | 08:15 |
jrosser | harun: do you know if all of those were required, or was it just the apparmor one? | 08:15 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Verify OS for containers installation https://review.opendev.org/c/openstack/openstack-ansible/+/925974 | 08:15 |
noonedeadpunk | btw this is smth I did just for it to be backportable ^ | 08:16 |
noonedeadpunk | as for master I think we'd need to have some "assert" role or playbook not to repeat things multiple times | 08:17 |
noonedeadpunk | preferably in format which could be included into docs :D | 08:17 |
jrosser | i was also wondering if we wanted some "deb822" role as well | 08:17 |
jrosser | as that is going to be a bunch of cut/paste | 08:18 |
noonedeadpunk | I guess depends on amount of places. If it's only openstack_hosts/rabbit/galera - then probably not? As after all migration it will be just 1 task? | 08:24 |
jrosser | yeah, its just a lot of lines of code | 08:25 |
jrosser | with all the many options on the module, but we can always revisit that later | 08:25 |
jrosser | i suspect that the issue harun is seeing is some lack of idempotence in generating the lxc config | 08:25 |
noonedeadpunk | though we're unlikely to touch it later as we never did for apt_repo | 08:26 |
noonedeadpunk | yeah, I don't see where we log lxc configs | 08:30 |
harun | only this config is needed: "lxc.apparmor.profile = unconfined", i deleted the other ones and then tried again, it worked | 08:33 |
noonedeadpunk | I can recall there were some patches regarding apparmor profiles for lxc | 08:40 |
noonedeadpunk | for noble at least | 08:40 |
noonedeadpunk | harun: would it work if you use `lxc.apparmor.profile = generated` along with `lxc.apparmor.allow_nesting = 1` ? | 08:40 |
noonedeadpunk | ie - https://opendev.org/openstack/openstack-ansible-lxc_hosts/commit/7b5fc5afab419afc9f17e7286375ad6b08b5d20d | 08:41 |
harun | `lxc.apparmor.profile = generated` along with `lxc.apparmor.allow_nesting = 1`, i tried, it worked | 08:43 |
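To keep a fix like this across container restarts and rebuilds, one option is to carry it in the deployment configuration instead of editing the container config by hand. The sketch below assumes the `extra_container_config` option of openstack_user_config.yml is honoured by the lxc_container_create role in this release, and the host name and IP are placeholders; if the option is not available, the two lines can simply be appended to the container's LXC config as was done here:

```yaml
# openstack_user_config.yml - assumed sketch; verify that extra_container_config
# is supported by the lxc_container_create role in your release.
shared-infra_hosts:
  infra1:                            # placeholder host name
    ip: 172.29.236.11                # placeholder management IP
    container_vars:
      extra_container_config:
        - "lxc.apparmor.profile=generated"
        - "lxc.apparmor.allow_nesting=1"
```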
noonedeadpunk | jrosser: do you think we should backport it together with https://review.opendev.org/c/openstack/openstack-ansible/+/924661 ? | 08:46 |
jrosser | noonedeadpunk: it's possible - though i cannot remember if the default setup changes to `generated` in noble, and that's what causes us to need the change on master | 09:02 |
noonedeadpunk | I think it was just more constrained apparmor in general that made our profile not enough... But I was not working on that bit, so have a vague understanding | 09:04 |
noonedeadpunk | But iirc it was questioned why we have our profile in the first place at all | 09:05 |
jrosser | it is likely OK to backport it | 09:05 |
jrosser | though i still think that we have underlying trouble with adjusting the lxc config | 09:06 |
jrosser | there is a bunch of lineinfile stuff that really is fragile and does not always work | 09:07 |
grauzikas | jrosser: if i make changes inside the lxc container, for example in the file venvs/magnum-29.0.2/lib/python3.10/site-packages/magnum/common/keystone.py, and then rerun openstack-ansible os-magnum-install.yml, will it fetch the source again or use my modified version? | 09:29 |
jrosser | grauzikas: modifying the code manually in the container is OK for debugging and trying to find a fix for things | 09:30 |
jrosser | but you are right that those changes will be lost if you re-run the playbooks, so it is not really what you want to be doing for something you care about | 09:31 |
jrosser | here is some documentation for how you can point to your own modified versions of the git repos for a service like magnum https://docs.openstack.org/openstack-ansible/latest/user/source-overrides/index.html | 09:31 |
jrosser | this is the correct method to use for applying local patches, or fixes to services that are not yet included in a release | 09:32 |
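As a rough illustration of that mechanism, the override goes into user_variables.yml and points the magnum role at your own fork; the repository URL and branch below are placeholders, and the exact variable names should be checked against the source-overrides documentation linked above:

```yaml
# user_variables.yml - illustrative sketch of a source override for magnum;
# the fork URL and branch name are placeholders.
magnum_git_repo: https://github.com/example/magnum        # your fork carrying the local patch
magnum_git_install_branch: my-keystone-trust-fix          # branch or SHA to install from
```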
grauzikas | ok thank you | 09:36 |
rambo | Hi Team | 10:03 |
*** rambo is now known as Guest2357 | 10:03 | |
Guest2357 | I have joined this chat regarding Ussuri to Victoria release upgrade | 10:03 |
noonedeadpunk | o/ | 10:03 |
noonedeadpunk | hey | 10:04 |
Guest2357 | I need more information on the rabbitmq release not present on the external repo | 10:06 |
Guest2357 | Hi Dmitriy | 10:06 |
noonedeadpunk | just a sec | 10:11 |
noonedeadpunk | So I think we see our gates for unmaintained Victoria broken due to that (but not limited to it) | 10:12 |
noonedeadpunk | so back in Victoria we were using the repo https://opendev.org/openstack/openstack-ansible-rabbitmq_server/src/branch/unmaintained/victoria/vars/debian.yml#L25-L26 | 10:13 |
noonedeadpunk | and rabbitmq was pinned to 3.8.14 | 10:13 |
noonedeadpunk | and I think that this version is not available there anymore | 10:13 |
noonedeadpunk | there are couple of things you can do. | 10:13 |
noonedeadpunk | first - to use just rabbitmq_install_method: distro as I've suggested | 10:13 |
noonedeadpunk | second - you can eventually override rabbitmq_package_version and rabbitmq_erlang_version_spec to supported versions which are present in the repos | 10:14 |
Guest2357 | our current version of rabbitmq in Ussuri is 3.8.2 | 10:14 |
Guest2357 | for the first way , where can we set this parameter rabbitmq_install_method: distro? | 10:15 |
noonedeadpunk | well, if you're using ubuntu or debian as OS, you can check what's in native repos with `apt-cache policy rabbitmq-server` | 10:15 |
noonedeadpunk | all these are for user_variables.yml | 10:16 |
noonedeadpunk | as that would depend on the OS version in question | 10:17 |
Guest2357 | apt-cache policy rabbitmq-server | 10:19 |
Guest2357 | rabbitmq-server: | 10:19 |
Guest2357 | Installed: (none) | 10:19 |
Guest2357 | Candidate: 3.8.2-0ubuntu1.5 | 10:19 |
Guest2357 | Version table: | 10:19 |
Guest2357 | 3.8.2-0ubuntu1.5 500 | 10:19 |
Guest2357 | 500 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages | 10:19 |
Guest2357 | 500 http://archive.ubuntu.com/ubuntu focal-security/main amd64 Packages | 10:20 |
Guest2357 | 3.8.2-0ubuntu1 500 | 10:20 |
Guest2357 | 500 http://archive.ubuntu.com/ubuntu focal/main amd64 Packages | 10:20 |
Guest2357 | I can see 3.8.2 here also. | 10:20 |
noonedeadpunk | ok, so likely you already fall back to distro-provided rabbitmq | 10:24 |
noonedeadpunk | it should be fine to set `rabbitmq_install_method: distro` then. Just don't forget to remove it later on, when you will get closer to maintained releases :) | 10:24 |
Guest2357 | okay thanks so we will put this line rabbitmq_install_method: distro in the user variables yaml. | 10:25 |
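A minimal sketch of the two options discussed, both going into /etc/openstack_deploy/user_variables.yml (only one is needed; the pinned versions in the second option are placeholders and must be matched against what the external repo actually still provides):

```yaml
# user_variables.yml
# Option 1: install rabbitmq/erlang from the distribution repositories.
rabbitmq_install_method: distro

# Option 2 (alternative): keep the external repo but pin versions that still exist there.
# The values below are placeholders, not known-good versions.
# rabbitmq_package_version: "3.8.x-1"
# rabbitmq_erlang_version_spec: "1:23.x-1"
```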
Guest2357 | also on the point of backup, I can see that we have some custom roles in /etc/ansible/roles such as for prometheus. | 10:27 |
Guest2357 | so those will be removed after the upgrade? | 10:28 |
noonedeadpunk | well, they will not be touched | 10:34 |
Guest2357 | okay thanks | 10:35 |
noonedeadpunk | but my suggestion would be to add custom roles to user-role-requirements to be managed with bootstrap-ansible script | 10:35 |
noonedeadpunk | to make deploy host more stateless | 10:35 |
Guest2357 | thanks I will note this point. | 10:35 |
noonedeadpunk | https://docs.openstack.org/openstack-ansible/latest/reference/configuration/extending-osa.html#adding-new-or-overriding-roles-in-your-openstack-ansible-installation | 10:36 |
noonedeadpunk | then you pretty much don't need to worry about anything there except the presence of the openstack_deploy folder, as state will be restored by running bootstrap-ansible.sh alone | 10:37 |
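A hedged example of what such an entry might look like in /etc/openstack_deploy/user-role-requirements.yml, following the same layout as ansible-role-requirements.yml (the role name and repository URL are placeholders):

```yaml
# user-role-requirements.yml - placeholder entry so a custom role is re-cloned
# onto the deploy host by bootstrap-ansible.sh.
- name: prometheus_server                                # placeholder role name
  scm: git
  src: https://github.com/example/ansible-prometheus     # placeholder repository URL
  version: main                                          # branch, tag or SHA
```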
Guest2357 | thanks. on the 2nd point - service sequence - I got that core services which have a serial parameter in the install yaml files will be upgraded sequentially, like control01 first and then control02 + control03 together. | 10:38 |
noonedeadpunk | ++ | 10:39 |
Guest2357 | I can see the serial parameter on nova, neutron, glance. for other services like designate and manila there is no serial parameter, so they will be upgraded in parallel on all controllers together, so there will be a provisioning outage of these services during their upgrade | 10:39 |
Guest2357 | load balancers and manila file shares which are already provisioned will not be impacted, correct? | 10:40 |
noonedeadpunk | so ideally no, they won't have outages - only the API might have an interruption during deployment | 10:41 |
noonedeadpunk | BUT, I can't recall what release it was, likely something like Xena, where we've changed the way of generating the Octavia CA for LBs, where you need to be cautious about backwards compatibility | 10:41 |
noonedeadpunk | we made a script to handle that, and it used to work, but still... | 10:42 |
Guest2357 | okay I see . thanks for confirmation. | 10:42 |
noonedeadpunk | it was even Yoga | 10:42 |
Guest2357 | yeah we will take care of this , as we plan to sequentially upgrade our ussuri to some latest version. | 10:43 |
noonedeadpunk | Eventually - Octavia certificates and the SSH key for Amphoras are another thing to back up | 10:43 |
Guest2357 | so starting with victoria | 10:43 |
Guest2357 | can you please confirm the path for these octavia certificates and ssh key of amphora ? | 10:44 |
noonedeadpunk | it's really very important, as if the cert is rotated on the API side - it can't communicate with the Amphora API anymore, so you'd need either to restore the original certs or failover all amphoras for them to get the new one | 10:44 |
Guest2357 | so certificates rotation is expected after the upgrade? | 10:45 |
noonedeadpunk | It's like $HOME/openstack-ansible/octavia - actually a terrible place for storage.... | 10:45 |
noonedeadpunk | no, it is not expected | 10:45 |
noonedeadpunk | unless you tell to rotate them | 10:45 |
noonedeadpunk | but just mentioning importance of these | 10:46 |
noonedeadpunk | and ssh keys are also $HOME/.ssh/octavia_key | 10:46 |
noonedeadpunk | and $HOME is a really poor choice, as it heavily depends on how one uses the deploy host. | 10:47 |
noonedeadpunk | as we had an issue since we're using LDAP on the deploy host, so we had a separate set of octavia certs per user running playbooks | 10:48 |
noonedeadpunk | and if a certificate is not found under that path - then it will be generated | 10:48 |
Guest2357 | looks like I don't have those certificates on deployment host. | 10:49 |
noonedeadpunk | So potentially - you might want to move these certs somewhere else | 10:49 |
noonedeadpunk | or create them from octavia hosts :D | 10:49 |
noonedeadpunk | this is the upgrade script for Yoga which defines explicitly path for certificates for octavia: https://opendev.org/openstack/openstack-ansible/blame/branch/unmaintained/yoga/scripts/upgrade-utilities/define-octavia-certificate-vars.yml#L23-L28 | 10:50 |
noonedeadpunk | you can try to find them inside octavia container - here's the mapping: https://opendev.org/openstack/openstack-ansible-os_octavia/src/branch/unmaintained/victoria/tasks/octavia_certs_distribute.yml#L26-L43 | 10:52 |
Guest2357 | are these certificates and keys present on some path of control node or designate container as well? | 10:53 |
noonedeadpunk | so if you're running lxc containers - it should be there then | 10:53 |
noonedeadpunk | you can do like `lxc-attach -n <container_name> cat /etc/octavia/certs/ca_key.pem` | 10:54 |
noonedeadpunk | And I'd suggest to place the certs also under the openstack_deploy folder and explicitly supply the path to these, similar to what we do with our upgrade script | 10:56 |
Guest2357 | yea I can see the certs in above directory | 10:56 |
noonedeadpunk | so good you've asked :) | 10:57 |
Guest2357 | okay so now we need to copy the complete folder of certs and keep it on openstack-deploy path. | 11:01 |
noonedeadpunk | and define variables | 11:01 |
Guest2357 | okay we need to add in user-variable.yaml file. | 11:01 |
Guest2357 | what would be the variable name in that? | 11:02 |
noonedeadpunk | a bunch of them - you'd need to match the file names for that, check the script from Yoga: https://opendev.org/openstack/openstack-ansible/blame/branch/unmaintained/yoga/scripts/upgrade-utilities/define-octavia-certificate-vars.yml#L23-L28 | 11:02 |
noonedeadpunk | will try to make more clear paste | 11:04 |
Guest2357 | okay got it. so it will be required during our ussuri to victoria upgrade too, otherwise certificates will be rotated? | 11:05 |
noonedeadpunk | https://paste.openstack.org/show/bzlnPZWOzrgdSGFL1kL9/ | 11:06 |
noonedeadpunk | yes, if no certs found under expected path - role will generate new ones | 11:06 |
Guest2357 | okay great thanks. | 11:06 |
noonedeadpunk | as you don't have these in place - you need that right away to avoid a failover of all amphoras | 11:06 |
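The general shape of this, following the Yoga upgrade script and the paste linked above, is to copy the existing certificates under /etc/openstack_deploy and point the octavia role at them from user_variables.yml; the variable names and file paths below are illustrative placeholders only and must be matched against that script and your actual cert files:

```yaml
# user_variables.yml - illustrative sketch only; take the real variable and
# file names from the define-octavia-certificate-vars.yml script / paste above.
octavia_ca_private_key: /etc/openstack_deploy/octavia_certs/private/cakey.pem     # placeholder
octavia_ca_certificate: /etc/openstack_deploy/octavia_certs/ca_01.pem             # placeholder
octavia_client_cert: /etc/openstack_deploy/octavia_certs/client.cert-and-key.pem  # placeholder
```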
Guest2357 | what about the ssh keys? how can we keep them same? | 11:07 |
noonedeadpunk | > user-variable.yaml - it's actually important that the file matches the pattern `user_*.yml` | 11:07 |
Guest2357 | yeah we have user_variable.yaml , so it was typo earlier | 11:08 |
noonedeadpunk | do you have `octavia_ssh_enabled` defined ? | 11:08 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-plugins master: Add infrastructure playbooks to openstack-ansible-plugins collection https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/924171 | 11:08 |
noonedeadpunk | as it's False by default | 11:08 |
Guest2357 | no I dont see this parameter in user_variables.yaml file. | 11:09 |
noonedeadpunk | so you might legitimately not have ssh keys | 11:09 |
noonedeadpunk | as that block is skipped unless you have it explicitly enabled | 11:09 |
Guest2357 | okay cool thanks for clarity. | 11:09 |
Guest2357 | for the 3rd point, we have project routers defined as non-HA and the virtual routers reside on the control nodes without any redundancy. | 11:10 |
Guest2357 | so when the L3 agent on a control node restarts it will take down all routers, and the routers will recover when the L3 agent is back. | 11:11 |
Guest2357 | we have around 11 routers on each control node. | 11:12 |
noonedeadpunk | ok, then it should be pretty much negligible | 11:12 |
noonedeadpunk | you might not even notice anything | 11:12 |
Guest2357 | okay, so we can just expect reachability issues for projects when the L3 service is restarted. | 11:12 |
noonedeadpunk | yeah | 11:12 |
noonedeadpunk | so when neutron agents are restarted they try to ensure their state and re-create wiring if needed. | 11:13 |
noonedeadpunk | So until the agent fully finishes "self-healing", some of the routers might misbehave | 11:13 |
noonedeadpunk | the other way around would be to do the neutron-agents upgrade one-by-one, manually specifying --limit | 11:14 |
noonedeadpunk | and you can move l3 and dhcp agents between controllers using `openstack network agent remove router --l3 <old_agent_uuid> <router_uuid> && openstack network agent add router --l3 <new_agent_uuid> <router_uuid>` | 11:16 |
noonedeadpunk | but for non-ha there will still be downtime for this operation as well | 11:16 |
Guest2357 | yeah we are following this during control node reboots moving all routers on other compute and doing the reboot. | 11:17 |
harun | Installing ClusterAPI, I solved the apparmor issue but i got this error: https://paste.openstack.org/show/bWKz9AfH3M6KhoooPcRx/ | 11:23 |
Guest2357 | so in this case I need to run each playbook of setup-openstack.yml one by one, and while doing the os-neutron-install.yml one give a limit of a control node? | 11:26 |
noonedeadpunk | yeah, kinda... or temporarily comment out os-neutron-install.yml from setup-openstack.yml | 11:35 |
Guest2357 | okay yeah, it would be better to keep neutron for after that, with limits. | 11:36 |
jrosser | harun: best to look in the journal for kubelet to see what is the issue | 11:37 |
Guest2357 | also, for the compute openstack services [nova-compute and neutron-linuxbridge] upgrade - is it part of os-neutron-install.yml and os-nova-install.yml? | 11:39 |
Guest2357 | or some other playbook, as I want to see if there is an option to control the compute upgrade and do some VM migrations in between to prevent downtime. | 11:40 |
harun | i ran the command with -e kubelet_allow_unsafe_swap=true, | 11:41 |
harun | the journal of the kubelet: https://paste.openstack.org/show/b3B0ALbcqAy1fUQeGLUR/ | 11:41 |
jrosser | so what does `/sbin/swapon -s` say? | 11:46 |
jrosser | i do not have swap enabled on my controller nodes so have not had this issue | 11:47 |
harun | i guess "--fail-swap-on=false" should be added to /etc/systemd/system/kubelet.service.d/10-kubeadm.conf | 11:48 |
jrosser | well there are two things | 11:48 |
harun | the output of "/sbin/swapon -s": https://paste.openstack.org/show/bQBU9FKFhjs2cLu5TutE/ | 11:48 |
jrosser | the variable kubelet_allow_unsafe_swap only controls this https://github.com/vexxhost/ansible-collection-kubernetes/blob/4b502b215ccaffe71dc1aa5c8fdda2e34a4ef37c/roles/kubelet/tasks/main.yml#L71 | 11:49 |
rambo | hi | 11:49 |
*** rambo is now known as Guest2364 | 11:49 | |
Guest2364 | just got disconnected. discussing ussuri to victoria upgrade | 11:50 |
*** Guest2364 is now known as rambo2412 | 11:51 | |
jrosser | harun: i do not see a way to add extra settings to 10-kubeadm.conf with that code as it stands | 11:51 |
jrosser | also - that is a 3rd party collection, not part of openstack-ansible directly | 11:51 |
noonedeadpunk | rambo2412: neutron-linuxbridge is part of neutron playbook | 11:52 |
noonedeadpunk | but also from my experience the downtime from its restart might be less than from online migration... | 11:52 |
rambo2412 | okay I see so if I limit with control01 --> control02 --->control03 it will not upgrade the computes? | 11:53 |
rambo2412 | okay I see yeah so better keep the VM while the upgrade is happening. | 11:53 |
noonedeadpunk | yep, or you can also do like that `--limit 'neutron_all:!nova_compute'` | 11:54 |
noonedeadpunk | quite some options are around | 11:54 |
rambo2412 | okay sounds good, we can keep all routers on one and do like above. thanks. | 11:55 |
noonedeadpunk | but yeah, as you wanna do agents one-by-one - then makes sense to limit by hosts | 11:55 |
harun | when running without kubelet_allow_unsafe_swap=true, i am getting the error: https://paste.openstack.org/show/b5kORMNd05P5EC2wzpTA/ | 11:56 |
rambo2412 | or first we can limit by control nodes and later remove the limit, which will run on all computes and skip the control nodes? | 11:57 |
harun | the error code is 32 | 11:57 |
noonedeadpunk | rambo2412: you can, but that would be more time consuming, as playbooks will run against neutron api and agents as well | 11:57 |
noonedeadpunk | though it won't break/change anything, just more execution time | 11:58 |
rambo2412 | okay I see, --limit 'neutron_all:!nova_compute' is negating nova_compute, so it will skip the computes? | 11:59 |
opendevreview | Merged openstack/openstack-ansible-os_magnum master: Add test for high-availability mcapi control plane https://review.opendev.org/c/openstack/openstack-ansible-os_magnum/+/923174 | 12:04 |
rambo2412 | thanks for all the support , I will further prepare my plan and MOP of the upgrade. will come back in case of any further queries | 12:15 |
opendevreview | Merged openstack/openstack-ansible-os_ceilometer master: Add support for Magnum notifications https://review.opendev.org/c/openstack/openstack-ansible-os_ceilometer/+/927724 | 12:52 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_ceilometer stable/2024.1: Add support for Magnum notifications https://review.opendev.org/c/openstack/openstack-ansible-os_ceilometer/+/927812 | 12:55 |
opendevreview | Merged openstack/openstack-ansible master: Use haproxy_install playbook from openstack-ansible-plugins repo https://review.opendev.org/c/openstack/openstack-ansible/+/924168 | 13:55 |
jrosser | noonedeadpunk: we want to make sure this merges first before going too far with moving playbooks to the plugins repo https://review.opendev.org/c/openstack/openstack-ansible/+/925974 | 13:58 |
jrosser | i think we might have some merge conflicts to deal with in all of these | 13:59 |
noonedeadpunk | yeah | 13:59 |
noonedeadpunk | But also I was thinking if setup-hosts should be like that: https://review.opendev.org/c/openstack/openstack-ansible/+/924259/4/playbooks/setup-hosts.yml | 14:00 |
noonedeadpunk | as I'd assume that for consistency we need to have setup-hosts in the collection as well | 14:00 |
noonedeadpunk | btw, I've just tested mariadb 11.4.3 and the issue with TLS is still there | 14:00 |
jrosser | ah yes you are right with setup-hosts, let me adjust that | 14:02 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-plugins master: Add setup-hosts playbook to plugins collection. https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/927826 | 14:06 |
noonedeadpunk | can you have a `-` in a playbook name in a collection? | 14:06 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-plugins master: Add setup_hosts playbook to plugins collection. https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/927826 | 14:08 |
jrosser | nope :) | 14:08 |
noonedeadpunk | also... I think we need to add dummy playbooks? | 14:09 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible master: Use hosts setup playbooks from openstack-ansible-plugins repo https://review.opendev.org/c/openstack/openstack-ansible/+/924259 | 14:09 |
jrosser | good point | 14:09 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Bump SHAs and pinned versions https://review.opendev.org/c/openstack/openstack-ansible/+/927841 | 14:59 |
noonedeadpunk | #startmeeting openstack_ansible_meeting | 15:00 |
opendevmeet | Meeting started Tue Sep 3 15:00:24 2024 UTC and is due to finish in 60 minutes. The chair is noonedeadpunk. Information about MeetBot at http://wiki.debian.org/MeetBot. | 15:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 15:00 |
opendevmeet | The meeting name has been set to 'openstack_ansible_meeting' | 15:00 |
noonedeadpunk | #topic rollcall | 15:00 |
noonedeadpunk | o/ | 15:00 |
jrosser | o/ hello | 15:00 |
noonedeadpunk | #topic office hours | 15:01 |
noonedeadpunk | so we have a couple of things for discussion | 15:02 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-plugins master: Add setup_hosts playbook to plugins collection. https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/927826 | 15:02 |
noonedeadpunk | Noble support is almost here from what I see | 15:02 |
noonedeadpunk | #link https://review.opendev.org/c/openstack/openstack-ansible/+/924342 | 15:02 |
jrosser | sort of - i would say yes as far as the integrated repo is concerned | 15:02 |
noonedeadpunk | but the job is failing multiple times in a row now, every time in a different way | 15:02 |
jrosser | probably no as far as all additional services are concerned | 15:03 |
noonedeadpunk | yeah, that's true as well | 15:03 |
jrosser | we should do some work on CI stability | 15:03 |
jrosser | i have been trying to keep notes on common failures | 15:03 |
NeilHanlon | hiya | 15:03 |
jrosser | like failing to get u-c, image download errors etc | 15:03 |
noonedeadpunk | I've spotted a bunch of mirror issues with RDO lately as well | 15:03 |
* NeilHanlon hopes for few rocky issues | 15:04 | |
jrosser | but there is also a rumble of tempest failures, perhaps more often than not it being keystone | 15:04 |
noonedeadpunk | were some :D | 15:04 |
jrosser | andrewbonney: ^ you were looking at failures too a bit I think? | 15:04 |
* NeilHanlon plugs his ears and pretends he didn't hear anything | 15:04 | |
jrosser | and the mcapi job is extremely troublesome, which needs more investigation | 15:04 |
noonedeadpunk | NeilHanlon: actually we've also discussed with infra folks Rocky mirrors | 15:04 |
jrosser | but on the surface it looks like the errors have nothing at all to do with magnum | 15:05 |
NeilHanlon | yeah i remember some message from last month or so... travelling took a lot out of me | 15:05 |
noonedeadpunk | seems they do have space on afs share now and were fine adding them | 15:05 |
NeilHanlon | i will try and restart that convo | 15:05 |
noonedeadpunk | yeah, that would make sense, as CentOS testing was pulled off as a whole due to quite some issues it was experiencing | 15:05 |
noonedeadpunk | and rocky was discussed as a replacement | 15:05 |
NeilHanlon | right | 15:06 |
noonedeadpunk | about capi jobs - I frankly did not look into these at all | 15:06 |
noonedeadpunk | as I still barely get the topic | 15:06 |
noonedeadpunk | though I'm coming closer and closer to it in my internal backlog | 15:06 |
noonedeadpunk | Another thing that you brought to my attention is changing the way uwsgi is supposed to be served | 15:07 |
noonedeadpunk | and pulling off wsgi scripts from service setup scripts | 15:07 |
noonedeadpunk | So this bump will totally fail on these changes | 15:07 |
noonedeadpunk | #link https://review.opendev.org/c/openstack/openstack-ansible/+/927841 | 15:08 |
jrosser | hopefully we can make some depends-on patches and work through what is broken fairly easily | 15:08 |
noonedeadpunk | yeah | 15:09 |
noonedeadpunk | and with that test noble I hope | 15:09 |
opendevreview | Merged openstack/openstack-ansible master: Verify OS for containers installation https://review.opendev.org/c/openstack/openstack-ansible/+/925974 | 15:09 |
noonedeadpunk | we also need to come up with release highlights | 15:09 |
jrosser | do we have anything big left to fix/merge this cycle? | 15:10 |
NeilHanlon | i guess i will also probably start on rocky 10 experimental jobs at some point. i need to check up with RDO folks first | 15:10 |
jrosser | deb822 is one thing, but i think thats now understood and is just a question of doing the other places | 15:10 |
noonedeadpunk | looking through our ptg doc | 15:10 |
noonedeadpunk | #link https://etherpad.opendev.org/p/osa-dalmatian-ptg | 15:10 |
NeilHanlon | goodness, it's almost PTG again isnt it.. | 15:11 |
noonedeadpunk | and realizing I failed to work on the most interesting topic for myself so far | 15:11 |
jrosser | but it would be quite good to be able to spend the rest of the cycle getting existing stuff merged and doing tidy-up & CI fixing | 15:11 |
noonedeadpunk | NeilHanlon: it really is.... | 15:11 |
jrosser | we have had a couple of times now with a real big rush for release | 15:11 |
noonedeadpunk | jrosser: yes, exactly. I don't aim to bring anything new | 15:11 |
noonedeadpunk | really want to have a coordinated release as a feature freeze | 15:11 |
jrosser | i would say we are basically there apart from finishing a few things | 15:12 |
noonedeadpunk | so about topics: deb822, noble, playbooks into collection | 15:12 |
jrosser | yeah | 15:12 |
jrosser | i will try to find time soon to revisit the deb822 stuff | 15:12 |
NeilHanlon | oh i forgot if i mentioned it but i do have a working incus for rocky 9 | 15:13 |
NeilHanlon | https://copr.fedorainfracloud.org/coprs/neil/incus/ | 15:13 |
noonedeadpunk | oh, that's really nice. | 15:13 |
jrosser | noble is potentially a big job, as i also think we still have some broken roles | 15:13 |
noonedeadpunk | we should try to look into that for 2025.1 I guess | 15:13 |
NeilHanlon | agreed | 15:13 |
* NeilHanlon reads up on what deb822 is | 15:13 | |
jrosser | and for playbooks->collection - we should decide how far we go this cycle | 15:14 |
noonedeadpunk | yeah, these are broken ones | 15:14 |
noonedeadpunk | #link https://review.opendev.org/q/topic:%22osa/frist_host_refactoring%22+status:open | 15:14 |
noonedeadpunk | jrosser: I'd go all-in | 15:14 |
jrosser | like is -hosts and -infra enough and we treat -openstack as further work? | 15:14 |
noonedeadpunk | I can get some time to finalize just in case | 15:14 |
jrosser | ok - i have kind of lost where we got up to as it has taken so very long to merge the initial stuff | 15:15 |
jrosser | there will be some remaining common-tasks / common-playbooks i expect | 15:15 |
noonedeadpunk | yeah, it took quite long for reviews as well to ensure that all changes to playbooks were moved as well | 15:15 |
noonedeadpunk | so far a good question is what to do with things like the ceph playbooks | 15:17 |
noonedeadpunk | but it looks like you've already moved most of the things anyway :) | 15:19 |
noonedeadpunk | so it's good | 15:19 |
noonedeadpunk | And there's also - what to do with things like that: https://opendev.org/openstack/openstack-ansible/src/branch/master/playbooks/listening-port-report.yml | 15:19 |
noonedeadpunk | I assume you're using this? | 15:20 |
jrosser | that was very useful in the time of working on bind-to-mgmt | 15:20 |
jrosser | but i think actually there is an ansible module to do the same now | 15:20 |
jrosser | https://docs.ansible.com/ansible/2.9/modules/listen_ports_facts_module.html | 15:21 |
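A minimal sketch of what that could look like with the module jrosser links; the module now lives in community.general, and the report format below is just an assumption about how the output might be presented:

```yaml
# listening-port-report.yml - minimal sketch using listen_ports_facts instead
# of the hand-rolled report; the debug output format is illustrative.
- hosts: all
  gather_facts: false
  tasks:
    - name: Gather facts about listening TCP/UDP ports
      community.general.listen_ports_facts:

    - name: Report which address and port each listening process uses
      ansible.builtin.debug:
        msg: "{{ item.name }} listens on {{ item.address }}:{{ item.port }}"
      loop: "{{ ansible_facts.tcp_listen }}"
```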
noonedeadpunk | yeah | 15:21 |
noonedeadpunk | ok, so overall the list sounds doable - noble, wsgi_scripts and playbooks | 15:22 |
jrosser | i think so | 15:22 |
jrosser | the magnum stuff is ok - but we do risk making a release that includes installing stuff from github.com/jrosser fork which i don't like | 15:23 |
jrosser | mnaser: ^ | 15:23 |
noonedeadpunk | Btw there was 1 bug report I wanted to check on, but failed so far | 15:25 |
noonedeadpunk | #link https://bugs.launchpad.net/openstack-ansible/+bug/2078552 | 15:25 |
noonedeadpunk | I believe there's a race condition in there, as in case `rabbitmqctl cluster_status` exits with an error code, which triggers the assert failure, then we probably should not attempt to run it to get flags either | 15:26 |
noonedeadpunk | I didn't look into the code though, but I guess the expectation of recovery in case of cluster failure is fair | 15:26 |
noonedeadpunk | I was thinking though if it would make sense to add another flag like `ignore_cluster_state` as we have in mariadb | 15:27 |
jrosser | andrewbonney: you may have thoughts on this ^ | 15:27 |
noonedeadpunk | but then it might go too far, and raise a question if mnesia should be preserved with that flag or not | 15:27 |
jrosser | time going backwards is really bad though :) | 15:27 |
andrewbonney | Yeah, I'll try and look tomorrow, context switch is too hard right now | 15:28 |
noonedeadpunk | oh yes, it's not good :D | 15:28 |
noonedeadpunk | I can get how that happened though... | 15:28 |
noonedeadpunk | or well | 15:28 |
noonedeadpunk | I've spotted a couple of times that after a reboot chrony somehow does not start up properly | 15:29 |
jrosser | openstack doesnt support 24.04 for D does it? | 15:29 |
noonedeadpunk | no, they're trying master | 15:29 |
noonedeadpunk | there was another report: https://bugs.launchpad.net/openstack-ansible/+bug/2078521 | 15:29 |
jrosser | right - so i still think we need to be careful what message we give out | 15:29 |
noonedeadpunk | yeah, I explained support matrix in the previous one | 15:30 |
noonedeadpunk | so folk is trying to beta test on master and report back findings | 15:30 |
noonedeadpunk | just pretty much missed collection dependency I guess | 15:30 |
jrosser | indeed - the noble topic is really only just all merged now | 15:32 |
noonedeadpunk | but dunno... anyway, overall the issue description looks reasonable enough to double check | 15:32 |
noonedeadpunk | there was another one, but I feel like it's a zun issue | 15:34 |
noonedeadpunk | #link https://bugs.launchpad.net/openstack-ansible/+bug/2078482 | 15:34 |
noonedeadpunk | so at worst we can mark it as invalid for osa | 15:34 |
jrosser | interesting venv paths in that bug report | 15:36 |
noonedeadpunk | indeed.... | 15:37 |
noonedeadpunk | ah | 15:37 |
noonedeadpunk | I guess it's just top of the 2024.1 | 15:38 |
noonedeadpunk | and pbr detects version tag as `stable/2024.1` | 15:38 |
noonedeadpunk | though I would not expect that happening | 15:38 |
jrosser | i thought you still got the previous tag with -dev<big-number> in that case | 15:38 |
noonedeadpunk | it used to be that way for sure, yes | 15:39 |
jrosser | well, some number | 15:39 |
noonedeadpunk | but technically one can override version as well | 15:40 |
noonedeadpunk | but that's pretty much it then | 15:40 |
noonedeadpunk | ah, we have another "bug" on master (and 2024.1 I guess) | 15:43 |
noonedeadpunk | we have conflicting MPMs for Apache between services | 15:44 |
noonedeadpunk | like repo and keystone asking for one MPM and horizon and skyline for another | 15:44 |
noonedeadpunk | or smth like that | 15:44 |
jrosser | actually this is something we should fix | 15:44 |
noonedeadpunk | so re-running playbooks results in failures | 15:44 |
noonedeadpunk | things went completely off with repo actually | 15:45 |
noonedeadpunk | yeah, I was just thinking about best way for that | 15:45 |
jrosser | thats only in master though currently? | 15:45 |
noonedeadpunk | well, in stable you can shoot yourself in the foot as well | 15:45 |
noonedeadpunk | like - override https://opendev.org/openstack/openstack-ansible-os_keystone/src/branch/master/defaults/main.yml#L235 | 15:46 |
noonedeadpunk | but then - https://opendev.org/openstack/openstack-ansible-os_skyline/src/branch/master/vars/debian.yml#L31-L34 | 15:46 |
noonedeadpunk | and https://opendev.org/openstack/openstack-ansible-os_horizon/src/branch/master/vars/debian.yml#L61-L64 | 15:47 |
noonedeadpunk | so this all leans towards apache role eventually | 15:48 |
jrosser | yes agreed | 15:48 |
noonedeadpunk | but also I think this should still be backportable at first... | 15:49 |
noonedeadpunk | ah, and also what I found yesterday - is a bug in neutron handlers for l3 - these 2 things just don't work on modern kernels https://opendev.org/openstack/openstack-ansible-os_neutron/src/branch/master/handlers/main.yml#L33-L75 | 15:51 |
noonedeadpunk | but also I'm not sure what's meant by `pgrep neutron-ns-meta` | 15:51 |
noonedeadpunk | I'm not sure though if it's worth including the apache thing in this release.. I guess not, but rather for 2025.1 | 15:54 |
noonedeadpunk | #endmeeting | 16:00 |
opendevmeet | Meeting ended Tue Sep 3 16:00:00 2024 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 16:00 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/openstack_ansible_meeting/2024/openstack_ansible_meeting.2024-09-03-15.00.html | 16:00 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/openstack_ansible_meeting/2024/openstack_ansible_meeting.2024-09-03-15.00.txt | 16:00 |
opendevmeet | Log: https://meetings.opendev.org/meetings/openstack_ansible_meeting/2024/openstack_ansible_meeting.2024-09-03-15.00.log.html | 16:00 |
noonedeadpunk | jrosser: do you have any guess what `pgrep neutron-ns-meta` should be catching at all? | 16:13 |
jrosser | well - this returns the pids of any processes of that name? | 16:16 |
noonedeadpunk | um, and do you have any output? | 16:17 |
noonedeadpunk | as I'm not sure if it's a valid process name at all | 16:17 |
noonedeadpunk | also - it seems that the pattern is limited to 16 characters, just in case | 16:18 |
noonedeadpunk | and then - `readlink -f` does not provide output detailed enough to see the venv_tag.... | 16:18 |
noonedeadpunk | talking about these: https://opendev.org/openstack/openstack-ansible-os_neutron/src/branch/master/handlers/main.yml#L41-L42 | 16:19 |
noonedeadpunk | https://paste.openstack.org/show/b3dFyeCXzifdorJmYYMQ/ | 16:19 |
noonedeadpunk | ugh. | 16:20 |
noonedeadpunk | but we really need to at least understand what exact process we're supposed to catch... | 16:20 |
noonedeadpunk | as my only guess would be `neutron-metadata-agent` | 16:21 |
* jrosser looking | 16:23 | |
jrosser | i wonder if it should actually be `pgrep -f ns-metadata-proxy` | 16:25 |
jrosser | what it is actually searching for is part of the path to the haproxy config file | 16:26 |
jrosser | like `haproxy -f /var/lib/neutron/ns-metadata-proxy/47fb30ac-5c90-4ed7-9a15-65a1225bb6db.conf` | 16:26 |
noonedeadpunk | but then `| grep -qv "{{ neutron_venv_tag }}` would be pointless as well | 16:28 |
noonedeadpunk | ok, so the commit message talks about `neutron-ns-metadata-proxy` | 16:29 |
jrosser | i feel like `/proc/$ns_pid/exe` would be the executable of the thing that owns the namespace | 16:29 |
noonedeadpunk | nah, it's returning `/usr/sbin/haproxy ` | 16:30 |
jrosser | also https://opendev.org/openstack/neutron/src/branch/master/releasenotes/notes/switching-to-haproxy-for-metadata-proxy-9d8f7549fadf9182.yaml | 16:30 |
noonedeadpunk | yeah, just found that | 16:31 |
jrosser | and this cleanup code is 8 years old | 16:31 |
noonedeadpunk | and reno is 7yo | 16:31 |
jrosser | so it might be now either totally wrong or redundant | 16:31 |
noonedeadpunk | ok, cool, so that is likely redundant | 16:31 |
jrosser | well, unless the same issue exists just in a different way | 16:32 |
noonedeadpunk | I guess intention there was to kill proxies running from old venvs on upgrade | 16:32 |
noonedeadpunk | but haproxy should not really matter that much | 16:32 |
jrosser | yes exactly that | 16:32 |
noonedeadpunk | as it's going from system packages kinda | 16:33 |
jrosser | it would be simple to look in a sandbox to see if all those haproxy processes get restarted if you restart the relevant neutron service | 16:34 |
noonedeadpunk | I'm not sure if they are, but my guess is that they should not even | 16:34 |
jrosser | the only thing would be if an upgrade to neutron expected to be putting different content in the generated .conf file | 16:35 |
noonedeadpunk | as what we do in the next handler is kill things except haproxy and keepalived | 16:35 |
noonedeadpunk | But then neutron should be handling reload regardless | 16:35 |
noonedeadpunk | as updated content would come only through notifications I assume | 16:35 |
noonedeadpunk | or smth like that | 16:36 |
jrosser | oh no i meant if there was some code change in neutron | 16:36 |
jrosser | that meant the conf files should be updated | 16:36 |
noonedeadpunk | ah, base template, yeah | 16:36 |
jrosser | yeah | 16:36 |
noonedeadpunk | but then we need smth like `neutron_l3_cleanup_on_shutdown` I guess | 16:37 |
jrosser | as usual "its complicated" but for certain we can remove that code as it's been doing nothing for a long time | 16:38 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-rabbitmq_server master: Manage apt repositores and keys using deb822_repository module https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/907833 | 16:49 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-openstack_hosts master: Manage apt repositores and keys using deb822_repository module https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/907434 | 16:51 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-openstack_hosts master: Manage apt repositores and keys using deb822_repository module https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/907434 | 16:51 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-openstack_hosts master: Manage apt repositores and keys using deb822_repository module https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/907434 | 16:52 |
noonedeadpunk | ok, we have some extra work to do to run Neutron with uwsgi | 22:46 |
noonedeadpunk | https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/SVP3VUCOZGIY63TGD33H6NQ6UBAFDN5V/ | 22:47 |
noonedeadpunk | like - neutron-ovn-maintenance-worker and neutron-periodic-workers | 22:47 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_neutron master: Disable uWSGI usage by default https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/927881 | 23:11 |
noonedeadpunk | some extra chunk of work.... | 23:11 |