harun | Hello, I am trying to install ClusterAPI using this documentation: https://docs.openstack.org/openstack-ansible-ops/latest/mcapi.html, but I have a problem. | 06:40 |
harun | I have a private Docker Registry. I set up containerd_insecure_registries and the other configuration. The problem occurred while initializing the cluster (when running **openstack-ansible osa_ops.mcapi_vexxhost.k8s_install**). | 06:40 |
harun | Problem output: | 06:40 |
harun | https://paste.openstack.org/show/b1VpMn3Sprro62z1R0BG/ | 06:40 |
harun | Here is my configuration: | 06:40 |
harun | https://paste.openstack.org/show/bSigT3VuqeXpp07QpVq1/ | 06:40 |
harun | Hi, I am trying to install ClusterAPI using this documentation: https://docs.openstack.org/openstack-ansible-ops/latest/mcapi.html, but i have a problem. I have a private Docker Registry. I set up containerd_insecure_registries and other configurations. The problem occurred in initializing the cluster. (when running openstack-ansible osa_ops.mcapi_vexxhost.k8s_install) Problem output: | 06:41 |
harun | https://paste.openstack.org/show/b1VpMn3Sprro62z1R0BG Here is my config: https://paste.openstack.org/show/bSigT3VuqeXpp07QpVq1/ | 06:41 |
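A registry override of this kind typically goes into user_variables.yml; the exact value structure that `containerd_insecure_registries` expects depends on the version of the vexxhost containerd role, so the list-of-endpoints form and the registry hostname below are only assumptions to illustrate the idea:

```yaml
# user_variables.yml - assumed sketch; verify the expected structure against
# the containerd role defaults in your release.
containerd_insecure_registries:
  - "registry.example.internal:5000"   # placeholder for the private registry endpoint
```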
jrosser | harun: good morning - i will check how we are doing this | 06:57 |
harun | i tried to pull the image using crictl in the k8s lxc container but i got this error: https://paste.openstack.org/show/bEsCUf41YXsiMDY0Ys2S/ | 07:01 |
jrosser | harun: which operating system are you using? | 07:01 |
harun | ubuntu 22.04 | 07:02 |
jrosser | that is the same as we have | 07:04 |
jrosser | i guess that the first place that i would look is the journal for containerd | 07:12 |
jrosser | harun: how are your lxc hosts set up (what kind of storage backend do you use for the lxc containers?) | 07:13 |
harun | here is the journal of containerd: https://paste.openstack.org/show/bnfcKxLlDycBhpu8gzjX/ | 07:13 |
harun | we use ceph | 07:14 |
jrosser | for the infra hosts lxc? | 07:15 |
grauzikas | Hello, yesterday we were talking about magnum, and after that i enabled letsencrypt and now i have an error in magnum: https://paste.openstack.org/show/bFCy7heEN0NCO8l92SIl/ My config regarding letsencrypt and magnum: https://paste.openstack.org/show/bv19hOUKhHV65Q9nnNmE/ | 07:16 |
harun | we use ssd disks in the infra hosts | 07:16 |
grauzikas | maybe you could suggest where the issue could be? i didn't reinstall the whole cluster, but i ran the setup-hosts, setup-infrastructure and setup-openstack playbooks | 07:17 |
grauzikas | i enabled debug, thought it might be more informative, but it doesn't seem to have helped a lot | 07:18 |
jrosser | harun: ok - and then for the lxc hosts there are choices of dir/lvm/zfs/.... for how the lxc storage is set up | 07:18 |
jrosser | harun: basically i think there is something not happy with the way containerd is interacting with the storage+lxc in your infra nodes | 07:20 |
harun | the lxc storage is ext4 in our system | 07:21 |
jrosser | for example we have to set this https://github.com/vexxhost/ansible-collection-containers/blob/be7967a4a8ed29fa6d1e4d27baedd69695952cf1/roles/containerd/defaults/main.yml#L69-L71 | 07:22 |
jrosser | but that is very specific to our deployment because we use zfs | 07:22 |
jrosser | harun: my best suggestion is that you make a test deployment in a virtual machine, because that is the same way that we test the mcapi code | 07:23 |
jrosser | you would then be able to compare what happens there with your actual deployment | 07:24 |
jrosser | andrewbonney: now you are here - did you ever see https://paste.openstack.org/show/bnfcKxLlDycBhpu8gzjX/, harun is having trouble with cluster-api | 07:25 |
andrewbonney | That's not something I remember | 07:26 |
harun | so, you are saying that the problem likely occurred because of ext4? how can i do a test deployment in a virtual machine? | 07:28 |
noonedeadpunk | grauzikas: so I think what you see with keystone errors is smth related to trusts (usually) | 07:30 |
noonedeadpunk | I'm seeing quite a lot of such errors in my logs in magnum pretty much always | 07:30 |
jrosser | magnum is a huge mess https://bugs.launchpad.net/magnum/+bug/2060194 | 07:30 |
jrosser | i dont really understand how anyone makes it work properly out of the box | 07:31 |
harun | Could this error be occurring because the container image cannot be pulled within the lxc container? I pulled the image successfully in a virtual machine using the private repo. | 07:31 |
jrosser | harun: yes, there is an interaction between containerd and lxc, that makes it a little more tricky than just straight on the host | 07:31 |
jrosser | so the filesystem used by lxc (overlayfs, dir, zfs, lvm, whatever) is an important factor in if it works or not | 07:32 |
jrosser | that is why i suggest you build an all-in-one deployment with the k8s containers in a VM, using the exact same config we use for testing | 07:32 |
jrosser | then you will be easily able to see any difference between what we test, and your actual environment | 07:33 |
jrosser | harun: just to double check - you did these things? https://github.com/openstack/openstack-ansible-ops/blob/master/mcapi_vexxhost/playbooks/files/openstack_deploy/group_vars/k8s_all/main.yml | 07:34 |
grauzikas | i enabled debug in keystone too, but nothing special that could help to figure out why this happens: https://paste.openstack.org/show/buVgYEkCbdQDTW7w0QIB/ | 07:35 |
harun | yes, i did these configurations. | 07:35 |
jrosser | and this is on Caracal release of openstack-ansible? | 07:36 |
harun | thank you for your answers, i will recheck and then i can try your suggestions | 07:37 |
harun | yes, caracal | 07:37 |
jrosser | grauzikas: that you were getting 401 from keystone in the magnum log means that it does connect | 07:40 |
harun | here is the config of the k8s lxc container: https://paste.openstack.org/show/bkgVrE0HaQTWvCPibdXA/ | 07:42 |
harun | is there any problem in here? | 07:42 |
jrosser | harun: i don't see one | 07:54 |
jrosser | noonedeadpunk: looks like we don't collect the lxc config in CI jobs any more? or am i missing where it is? | 07:54 |
jrosser | harun: for testing in a VM, you don't need to do the whole deployment | 07:54 |
andrewbonney | I'd expect lxc.apparmor.profile=unconfined as well as the raw.lxc variant of it based on the config, but I don't know why they differ | 07:55 |
jrosser | harun: ^ this is also an interesting thing, you should check the log on the host for apparmor trouble | 07:56 |
harun | interesting, the apparmor service is running in the container right now | 07:57 |
harun | sorry, it seems inactive | 07:59 |
noonedeadpunk | jrosser: yeah, I don't see that either. I wonder if we just didn't merge that | 08:11 |
harun | i think that i solved the problem, i added these lines to the container config: https://paste.openstack.org/show/bGAswQHb3ZvhlrNvnwKM, then restarted the container, and the image is pulled successfully. | 08:11 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Use hosts setup playbooks from openstack-ansible-plugins repo https://review.opendev.org/c/openstack/openstack-ansible/+/924259 | 08:15 |
jrosser | harun: do you know if all of those were required, or was it just the apparmor one? | 08:15 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Verify OS for containers installation https://review.opendev.org/c/openstack/openstack-ansible/+/925974 | 08:15 |
noonedeadpunk | btw this is smth I did just for it to be backportable ^ | 08:16 |
noonedeadpunk | as for master I think we'd need to have some "assert" role or playbook not to repeat things multiple times | 08:17 |
noonedeadpunk | preferably in format which could be included into docs :D | 08:17 |
jrosser | i was also wondering if we wanted some "deb822" role as well | 08:17 |
jrosser | as that is going to be a bunch of cut/paste | 08:18 |
noonedeadpunk | I guess depends on amount of places. If it's only openstack_hosts/rabbit/galera - then probably not? As after all migration it will be just 1 task? | 08:24 |
jrosser | yeah, its just a lot of lines of code | 08:25 |
jrosser | with all the many options on the module, but we can always revisit that later | 08:25 |
jrosser | i suspect that the issue harun is seeing is some lack of idempotence in generating the lxc config | 08:25 |
noonedeadpunk | though we're unlikely to touch it later as we never did for apt_repo | 08:26 |
noonedeadpunk | yeah, I don't see where we log lxc configs | 08:30 |
harun | only this config is needed: "lxc.apparmor.profile = unconfined", i deleted the other ones and then tried again, it worked | 08:33 |
noonedeadpunk | I can recall there were some patches regarding apparmor profiles for lxc | 08:40 |
noonedeadpunk | for noble at least | 08:40 |
noonedeadpunk | harun: would it work if you use `lxc.apparmor.profile = generated` along with `lxc.apparmor.allow_nesting = 1` ? | 08:40 |
noonedeadpunk | ie - https://opendev.org/openstack/openstack-ansible-lxc_hosts/commit/7b5fc5afab419afc9f17e7286375ad6b08b5d20d | 08:41 |
harun | `lxc.apparmor.profile = generated` along with `lxc.apparmor.allow_nesting = 1`, i tried, it worked | 08:43 |
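To keep a fix like this across container restarts and rebuilds, one option is to carry it in the deployment configuration instead of editing the container config by hand. The sketch below assumes the `extra_container_config` option of openstack_user_config.yml is honoured by the lxc_container_create role in this release, and the host name and IP are placeholders; if the option is not available, the two lines can simply be appended to the container's LXC config as was done here:

```yaml
# openstack_user_config.yml - assumed sketch; verify that extra_container_config
# is supported by the lxc_container_create role in your release.
shared-infra_hosts:
  infra1:                            # placeholder host name
    ip: 172.29.236.11                # placeholder management IP
    container_vars:
      extra_container_config:
        - "lxc.apparmor.profile=generated"
        - "lxc.apparmor.allow_nesting=1"
```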
noonedeadpunk | jrosser: do you think we should backport it together with https://review.opendev.org/c/openstack/openstack-ansible/+/924661 ? | 08:46 |
jrosser | noonedeadpunk: it's possible - though i cannot remember if the default setup changes to `generated` in noble, and that's what causes us to need the change on master | 09:02 |
noonedeadpunk | I think it was just more constrained apparmor in general that made our profile not enough... But I was not working on that bit, so have a vague understanding | 09:04 |
noonedeadpunk | But iirc it was questioned why we have our profile in the first place at all | 09:05 |
jrosser | it is likely OK to backport it | 09:05 |
jrosser | though i still think that we have underlying trouble with adjusting the lxc config | 09:06 |
jrosser | there is a bunch of lineinfile stuff that really is fragile and does not always work | 09:07 |
grauzikas | jrosser: if i make changes inside the lxc container, for example in the file venvs/magnum-29.0.2/lib/python3.10/site-packages/magnum/common/keystone.py, and then rerun openstack-ansible os-magnum-install.yml, will it fetch the source again or use my modified version? | 09:29 |
jrosser | grauzikas: modifying the code manually in the container is OK for debugging and trying to find a fix for things | 09:30 |
jrosser | but you are right that those changes will be lost if you re-run the playbooks, so it is not really what you want to be doing for something you care about | 09:31 |
jrosser | here is some documentation for how you can point to your own modified versions of the git repos for a service like magnum https://docs.openstack.org/openstack-ansible/latest/user/source-overrides/index.html | 09:31 |
jrosser | this is the correct method to use for applying local patches, or fixes to services that are not yet included in a release | 09:32 |
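As a rough illustration of that mechanism, the override goes into user_variables.yml and points the magnum role at your own fork; the repository URL and branch below are placeholders, and the exact variable names should be checked against the source-overrides documentation linked above:

```yaml
# user_variables.yml - illustrative sketch of a source override for magnum;
# the fork URL and branch name are placeholders.
magnum_git_repo: https://github.com/example/magnum        # your fork carrying the local patch
magnum_git_install_branch: my-keystone-trust-fix          # branch or SHA to install from
```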
grauzikas | ok thank you | 09:36 |
rambo | Hi Team | 10:03 |
*** rambo is now known as Guest2357 | 10:03 | |
Guest2357 | I have joined this chat regarding Ussuri to Victoria release upgrade | 10:03 |
noonedeadpunk | o/ | 10:03 |
noonedeadpunk | hey | 10:04 |
Guest2357 | I need more information on the rabbitmq release not present on the external repo | 10:06 |
Guest2357 | Hi Dmitriy | 10:06 |
noonedeadpunk | just a sec | 10:11 |
noonedeadpunk | So I think we see our gates for unmaintained Victoria broken due to that (but not limited to it) | 10:12 |
noonedeadpunk | so back in Victoria we were using the repo https://opendev.org/openstack/openstack-ansible-rabbitmq_server/src/branch/unmaintained/victoria/vars/debian.yml#L25-L26 | 10:13 |
noonedeadpunk | and rabbitmq was pinned to 3.8.14 | 10:13 |
noonedeadpunk | and I think that this version is not available there anymore | 10:13 |
noonedeadpunk | there are couple of things you can do. | 10:13 |
noonedeadpunk | first - to use just rabbitmq_install_method: distro as I've suggested | 10:13 |
noonedeadpunk | second - you can eventually override rabbitmq_package_version and rabbitmq_erlang_version_spec to supported versions which are present in the repos | 10:14 |
Guest2357 | our current version of rabbitmq in Ussuri is 3.8.2 | 10:14 |
Guest2357 | for the first way , where can we set this parameter rabbitmq_install_method: distro? | 10:15 |
noonedeadpunk | well, if you're using ubuntu or debian as OS, you can check what's in native repos with `apt-cache policy rabbitmq-server` | 10:15 |
noonedeadpunk | all these are for user_variables.yml | 10:16 |
noonedeadpunk | as that would depend on the OS version in question | 10:17 |
Guest2357 | apt-cache policy rabbitmq-server | 10:19 |
Guest2357 | rabbitmq-server: | 10:19 |
Guest2357 | Installed: (none) | 10:19 |
Guest2357 | Candidate: 3.8.2-0ubuntu1.5 | 10:19 |
Guest2357 | Version table: | 10:19 |
Guest2357 | 3.8.2-0ubuntu1.5 500 | 10:19 |
Guest2357 | 500 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages | 10:19 |
Guest2357 | 500 http://archive.ubuntu.com/ubuntu focal-security/main amd64 Packages | 10:20 |
Guest2357 | 3.8.2-0ubuntu1 500 | 10:20 |
Guest2357 | 500 http://archive.ubuntu.com/ubuntu focal/main amd64 Packages | 10:20 |
Guest2357 | I can see 3.8.2 here also. | 10:20 |
noonedeadpunk | ok, so likely you already fall back to distro-provided rabbitmq | 10:24 |
noonedeadpunk | it should be fine to set `rabbitmq_install_method: distro` then. Just don't forget to remove it later on, when you will get closer to maintained releases :) | 10:24 |
Guest2357 | okay thanks so we will put this line rabbitmq_install_method: distro in the user variables yaml. | 10:25 |
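A minimal sketch of the two options discussed, both going into /etc/openstack_deploy/user_variables.yml (only one is needed; the pinned versions in the second option are placeholders and must be matched against what the external repo actually still provides):

```yaml
# user_variables.yml
# Option 1: install rabbitmq/erlang from the distribution repositories.
rabbitmq_install_method: distro

# Option 2 (alternative): keep the external repo but pin versions that still exist there.
# The values below are placeholders, not known-good versions.
# rabbitmq_package_version: "3.8.x-1"
# rabbitmq_erlang_version_spec: "1:23.x-1"
```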
Guest2357 | also on the point of backup, I can see that we have some custom roles in /etc/ansible/roles such as for prometheus. | 10:27 |
Guest2357 | so those will be removed after the upgrade? | 10:28 |
noonedeadpunk | well, they will not be touched | 10:34 |
Guest2357 | okay thanks | 10:35 |
noonedeadpunk | but my suggestion would be to add custom roles to user-role-requirements to be managed with bootstrap-ansible script | 10:35 |
noonedeadpunk | to make deploy host more stateless | 10:35 |
Guest2357 | thanks I will note this point. | 10:35 |
noonedeadpunk | https://docs.openstack.org/openstack-ansible/latest/reference/configuration/extending-osa.html#adding-new-or-overriding-roles-in-your-openstack-ansible-installation | 10:36 |
noonedeadpunk | then you pretty much don't need to worry about anything there except the presence of the openstack_deploy folder, as state will be restored by running bootstrap-ansible.sh alone | 10:37 |
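A hedged example of what such an entry might look like in /etc/openstack_deploy/user-role-requirements.yml, following the same layout as ansible-role-requirements.yml (the role name and repository URL are placeholders):

```yaml
# user-role-requirements.yml - placeholder entry so a custom role is re-cloned
# onto the deploy host by bootstrap-ansible.sh.
- name: prometheus_server                                # placeholder role name
  scm: git
  src: https://github.com/example/ansible-prometheus     # placeholder repository URL
  version: main                                          # branch, tag or SHA
```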
Guest2357 | thanks. on the 2nd point - service sequence - I got that core services which have a serial parameter in the install yaml files will be upgraded sequentially, like control01 first and then control02 + control03 together. | 10:38 |
noonedeadpunk | ++ | 10:39 |
Guest2357 | I can see the serial parameter on nova, neutron, glance. for other services like designate and manila there is no serial parameter, so they will be upgraded in parallel on all controllers together, so there will be a provisioning outage of these services during their upgrade | 10:39 |
Guest2357 | load balancers and manila file shares which are already provisioned will not be impacted, correct? | 10:40 |
noonedeadpunk | so ideally no, they won't have outages - only the API might have an interruption during deployment | 10:41 |
noonedeadpunk | BUT, I can't recall what release it was, likely something like Xena, where we've changed the way of generating the Octavia CA for LBs, where you need to be cautious about backwards compatibility | 10:41 |
noonedeadpunk | we made a script to handle that, and it used to work, but still... | 10:42 |
Guest2357 | okay I see . thanks for confirmation. | 10:42 |
noonedeadpunk | it was even Yoga | 10:42 |
Guest2357 | yeah we will take care of this , as we plan to sequentially upgrade our ussuri to some latest version. | 10:43 |
noonedeadpunk | Eventually - Octavia certificates and the SSH key for Amphoras are another thing to back up | 10:43 |
Guest2357 | so starting with victoria | 10:43 |
Guest2357 | can you please confirm the path for these octavia certificates and ssh key of amphora ? | 10:44 |
noonedeadpunk | it's really very important, as if the cert is rotated on the API side - it can't communicate with the Amphora API anymore, so you'd need either to restore the original certs or failover all amphoras for them to get the new one | 10:44 |
Guest2357 | so certificates rotation is expected after the upgrade? | 10:45 |
noonedeadpunk | It's like $HOME/openstack-ansible/octavia - actually a terrible place for storage.... | 10:45 |
noonedeadpunk | no, it is not expected | 10:45 |
noonedeadpunk | unless you tell to rotate them | 10:45 |
noonedeadpunk | but just mentioning importance of these | 10:46 |
noonedeadpunk | and ssh keys are also $HOME/.ssh/octavia_key | 10:46 |
noonedeadpunk | and $HOME is a really poor choice, as it heavily depends on how one uses the deploy host. | 10:47 |
noonedeadpunk | as we had an issue since we're using LDAP on the deploy host, so we had a separate set of octavia certs per user running playbooks | 10:48 |
noonedeadpunk | and if a certificate is not found under that path - then it will be generated | 10:48 |
Guest2357 | looks like I don't have those certificates on deployment host. | 10:49 |
noonedeadpunk | So potentially - you might want to move these certs somewhere else | 10:49 |
noonedeadpunk | or create them from octavia hosts :D | 10:49 |
noonedeadpunk | this is the upgrade script for Yoga which defines explicitly path for certificates for octavia: https://opendev.org/openstack/openstack-ansible/blame/branch/unmaintained/yoga/scripts/upgrade-utilities/define-octavia-certificate-vars.yml#L23-L28 | 10:50 |
noonedeadpunk | you can try to find them inside octavia container - here's the mapping: https://opendev.org/openstack/openstack-ansible-os_octavia/src/branch/unmaintained/victoria/tasks/octavia_certs_distribute.yml#L26-L43 | 10:52 |
Guest2357 | are these certificates and keys present on some path of control node or designate container as well? | 10:53 |
noonedeadpunk | so if you're running lxc containers - it should be there then | 10:53 |
noonedeadpunk | you can do like `lxc-attach -n <container_name> cat /etc/octavia/certs/ca_key.pem` | 10:54 |
noonedeadpunk | And I'd suggest to place the certs also under the openstack_deploy folder and explicitly supply the path to these, similar to what we do with our upgrade script | 10:56 |
Guest2357 | yea I can see the certs in above directory | 10:56 |
noonedeadpunk | so good you've asked :) | 10:57 |
Guest2357 | okay so now we need to copy the complete folder of certs and keep it on openstack-deploy path. | 11:01 |
noonedeadpunk | and define variables | 11:01 |
Guest2357 | okay we need to add in user-variable.yaml file. | 11:01 |
Guest2357 | what would be the variable name in that? | 11:02 |
noonedeadpunk | a bunch of them - you'd need to match the file names for that, check the script from Yoga: https://opendev.org/openstack/openstack-ansible/blame/branch/unmaintained/yoga/scripts/upgrade-utilities/define-octavia-certificate-vars.yml#L23-L28 | 11:02 |
noonedeadpunk | will try to make more clear paste | 11:04 |
Guest2357 | okay got it. so it will be required during our ussuri to victoria upgrade too, otherwise certificates will be rotated? | 11:05 |
noonedeadpunk | https://paste.openstack.org/show/bzlnPZWOzrgdSGFL1kL9/ | 11:06 |
noonedeadpunk | yes, if no certs found under expected path - role will generate new ones | 11:06 |
Guest2357 | okay great thanks. | 11:06 |
noonedeadpunk | as you don't have these in place - you need that right away to avoid a failover of all amphoras | 11:06 |
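The general shape of this, following the Yoga upgrade script and the paste linked above, is to copy the existing certificates under /etc/openstack_deploy and point the octavia role at them from user_variables.yml; the variable names and file paths below are illustrative placeholders only and must be matched against that script and your actual cert files:

```yaml
# user_variables.yml - illustrative sketch only; take the real variable and
# file names from the define-octavia-certificate-vars.yml script / paste above.
octavia_ca_private_key: /etc/openstack_deploy/octavia_certs/private/cakey.pem     # placeholder
octavia_ca_certificate: /etc/openstack_deploy/octavia_certs/ca_01.pem             # placeholder
octavia_client_cert: /etc/openstack_deploy/octavia_certs/client.cert-and-key.pem  # placeholder
```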
Guest2357 | what about the ssh keys? how can we keep them same? | 11:07 |
noonedeadpunk | > user-variable.yaml - it's actually important that the file matches the pattern `user_*.yml` | 11:07 |
Guest2357 | yeah we have user_variable.yaml , so it was typo earlier | 11:08 |
noonedeadpunk | do you have `octavia_ssh_enabled` defined ? | 11:08 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-plugins master: Add infrastructure playbooks to openstack-ansible-plugins collection https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/924171 | 11:08 |
noonedeadpunk | as it's False by default | 11:08 |
Guest2357 | no I dont see this parameter in user_variables.yaml file. | 11:09 |
noonedeadpunk | so you might legitimately not have ssh keys | 11:09 |
noonedeadpunk | as that block is skipped unless you have it explicitly enabled | 11:09 |
Guest2357 | okay cool thanks for clarity. | 11:09 |
Guest2357 | for the 3rd point, we have project routers defined as non-HA and the virtual routers reside on the control nodes without any redundancy. | 11:10 |
Guest2357 | so when the L3 agent on a control node restarts it will take down all routers, and the routers will recover when the L3 agent is back. | 11:11 |
Guest2357 | we have around 11 routers on each control node. | 11:12 |
noonedeadpunk | ok, then it should be pretty much negligible | 11:12 |
noonedeadpunk | you might not even notice anything | 11:12 |
Guest2357 | okay, so we can just expect reachability issues for projects when the L3 service is restarted. | 11:12 |
noonedeadpunk | yeah | 11:12 |
noonedeadpunk | so when neutron agents are restarted they try to ensure their state and re-create wiring if needed. | 11:13 |
noonedeadpunk | So until the agent fully finishes "self-healing", some of the routers might misbehave | 11:13 |
noonedeadpunk | the other way around would be to do the neutron-agents upgrade one-by-one, manually specifying --limit | 11:14 |
noonedeadpunk | and you can move l3 and dhcp agents between controllers using `openstack network agent remove router --l3 <old_agent_uuid> <router_uuid> && openstack network agent add router --l3 <new_agent_uuid> <router_uuid>` | 11:16 |
noonedeadpunk | but for non-ha there will still be downtime for this operation as well | 11:16 |
Guest2357 | yeah we are following this during control node reboots moving all routers on other compute and doing the reboot. | 11:17 |
harun | Installing ClusterAPI, I solved the apparmor issue but i got this error: https://paste.openstack.org/show/bWKz9AfH3M6KhoooPcRx/ | 11:23 |
Guest2357 | so in this case I need to run each playbook of setup-openstack.yml one by one, and while doing the os-neutron-install.yml one give a limit of a control node? | 11:26 |
noonedeadpunk | yeah, kinda... or temporarily comment out os-neutron-install.yml from setup-openstack.yml | 11:35 |
Guest2357 | okay yeah, it would be better to keep neutron for after that, with limits. | 11:36 |
jrosser | harun: best to look in the journal for kubelet to see what is the issue | 11:37 |
Guest2357 | also, for the compute openstack services [nova-compute and neutron-linuxbridge] upgrade - is it part of os-neutron-install.yml and os-nova-install.yml? | 11:39 |
Guest2357 | or some other playbook, as I want to see if there is an option to control the compute upgrade and do some VM migrations in between to prevent downtime. | 11:40 |
harun | i ran the command with -e kubelet_allow_unsafe_swap=true, | 11:41 |
harun | the journal of the kubelet: https://paste.openstack.org/show/b3B0ALbcqAy1fUQeGLUR/ | 11:41 |
jrosser | so what does `/sbin/swapon -s` say? | 11:46 |
jrosser | i do not have swap enabled on my controller nodes so have not had this issue | 11:47 |
harun | i guess "--fail-swap-on=false" should be added to /etc/systemd/system/kubelet.service.d/10-kubeadm.conf | 11:48 |
jrosser | well there are two things | 11:48 |
harun | the output of "/sbin/swapon -s": https://paste.openstack.org/show/bQBU9FKFhjs2cLu5TutE/ | 11:48 |
jrosser | the variable kubelet_allow_unsafe_swap only controls this https://github.com/vexxhost/ansible-collection-kubernetes/blob/4b502b215ccaffe71dc1aa5c8fdda2e34a4ef37c/roles/kubelet/tasks/main.yml#L71 | 11:49 |
rambo | hi | 11:49 |
*** rambo is now known as Guest2364 | 11:49 | |
Guest2364 | just got disconnected. discussing ussuri to victoria upgrade | 11:50 |
*** Guest2364 is now known as rambo2412 | 11:51 | |
jrosser | harun: i do not see a way to add extra settings to 10-kubeadm.conf with that code as it stands | 11:51 |
jrosser | also - that is a 3rd party collection, not part of openstack-ansible directly | 11:51 |
noonedeadpunk | rambo2412: neutron-linuxbridge is part of neutron playbook | 11:52 |
noonedeadpunk | but also from my experience the downtime from its restart might be less than from online migration... | 11:52 |
rambo2412 | okay I see so if I limit with control01 --> control02 --->control03 it will not upgrade the computes? | 11:53 |
rambo2412 | okay I see yeah so better keep the VM while the upgrade is happening. | 11:53 |
noonedeadpunk | yep, or you can also do like that `--limit 'neutron_all:!nova_compute'` | 11:54 |
noonedeadpunk | quite some options are around | 11:54 |
rambo2412 | okay sounds good, we can keep all routers on one and do like above. thanks. | 11:55 |
noonedeadpunk | but yeah, as you wanna do agents one-by-one - then makes sense to limit by hosts | 11:55 |
harun | when running without kubelet_allow_unsafe_swap=true, i am getting the error: https://paste.openstack.org/show/b5kORMNd05P5EC2wzpTA/ | 11:56 |
rambo2412 | or first we can limit by control nodes and later remove the limit, which will run on all computes and skip the control nodes? | 11:57 |
harun | the error code is 32 | 11:57 |
noonedeadpunk | rambo2412: you can, but that would be more time consuming, as playbooks will run against neutron api and agents as well | 11:57 |
noonedeadpunk | though it won't break/change anything, just more execution time | 11:58 |
rambo2412 | okay I see, --limit 'neutron_all:!nova_compute' is negating nova_compute, so it will skip the computes? | 11:59 |
opendevreview | Merged openstack/openstack-ansible-os_magnum master: Add test for high-availability mcapi control plane https://review.opendev.org/c/openstack/openstack-ansible-os_magnum/+/923174 | 12:04 |
rambo2412 | thanks for all the support , I will further prepare my plan and MOP of the upgrade. will come back in case of any further queries | 12:15 |
opendevreview | Merged openstack/openstack-ansible-os_ceilometer master: Add support for Magnum notifications https://review.opendev.org/c/openstack/openstack-ansible-os_ceilometer/+/927724 | 12:52 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_ceilometer stable/2024.1: Add support for Magnum notifications https://review.opendev.org/c/openstack/openstack-ansible-os_ceilometer/+/927812 | 12:55 |
opendevreview | Merged openstack/openstack-ansible master: Use haproxy_install playbook from openstack-ansible-plugins repo https://review.opendev.org/c/openstack/openstack-ansible/+/924168 | 13:55 |
jrosser | noonedeadpunk: we want to make sure this merges first before going too far with moving playbooks to the plugins repo https://review.opendev.org/c/openstack/openstack-ansible/+/925974 | 13:58 |
jrosser | i think we might have some merge conflicts to deal with in all of these | 13:59 |
noonedeadpunk | yeah | 13:59 |
noonedeadpunk | But also I was thinking if setup-hosts should be like that: https://review.opendev.org/c/openstack/openstack-ansible/+/924259/4/playbooks/setup-hosts.yml | 14:00 |
noonedeadpunk | as I'd assume that for consistency we need to have setup-hosts in the collection as well | 14:00 |
noonedeadpunk | btw, I've just tested mariadb 11.4.3 and the issue with TLS is still there | 14:00 |
jrosser | ah yes you are right with setup-hosts, let me adjust that | 14:02 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-plugins master: Add setup-hosts playbook to plugins collection. https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/927826 | 14:06 |
noonedeadpunk | can you have a `-` in a playbook name in a collection? | 14:06 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-plugins master: Add setup_hosts playbook to plugins collection. https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/927826 | 14:08 |
jrosser | nope :) | 14:08 |
noonedeadpunk | also... I think we need to add dummy playbooks? | 14:09 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible master: Use hosts setup playbooks from openstack-ansible-plugins repo https://review.opendev.org/c/openstack/openstack-ansible/+/924259 | 14:09 |
jrosser | good point | 14:09 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Bump SHAs and pinned versions https://review.opendev.org/c/openstack/openstack-ansible/+/927841 | 14:59 |
noonedeadpunk | #startmeeting openstack_ansible_meeting | 15:00 |
opendevmeet | Meeting started Tue Sep 3 15:00:24 2024 UTC and is due to finish in 60 minutes. The chair is noonedeadpunk. Information about MeetBot at http://wiki.debian.org/MeetBot. | 15:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 15:00 |
opendevmeet | The meeting name has been set to 'openstack_ansible_meeting' | 15:00 |
noonedeadpunk | #topic rollcall | 15:00 |
noonedeadpunk | o/ | 15:00 |
jrosser | o/ hello | 15:00 |
noonedeadpunk | #topic office hours | 15:01 |
noonedeadpunk | so we have a couple of things for discussion | 15:02 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-plugins master: Add setup_hosts playbook to plugins collection. https://review.opendev.org/c/openstack/openstack-ansible-plugins/+/927826 | 15:02 |
noonedeadpunk | Noble support is almost here from what I see | 15:02 |
noonedeadpunk | #link https://review.opendev.org/c/openstack/openstack-ansible/+/924342 | 15:02 |
jrosser | sort of - i would say yes as far as the integrated repo is concerned | 15:02 |
noonedeadpunk | but the job is failing multiple times in a row now, every time in a different way | 15:02 |
jrosser | probably no as far as all additional services are concerned | 15:03 |
noonedeadpunk | yeah, that's true as well | 15:03 |
jrosser | we should do some work on CI stability | 15:03 |
jrosser | i have been trying to keep notes on common failures | 15:03 |
NeilHanlon | hiya | 15:03 |
jrosser | like failing to get u-c, image download errors etc | 15:03 |
noonedeadpunk | I've spotted a bunch of mirror issues with RDO lately as well | 15:03 |
* NeilHanlon hopes for few rocky issues | 15:04 | |
jrosser | but there is also a rumble of tempest failures, perhaps more often than not it being keystone | 15:04 |
noonedeadpunk | were some :D | 15:04 |
jrosser | andrewbonney: ^ you were looking at failures too a bit I think? | 15:04 |
* NeilHanlon plugs his ears and pretends he didn't hear anything | 15:04 | |
jrosser | and the mcapi job is extremely troublesome, which needs more investigation | 15:04 |
noonedeadpunk | NeilHanlon: actually we've also discussed with infra folks Rocky mirrors | 15:04 |
jrosser | but on the surface it looks like the errors have nothing at all to do with magnum | 15:05 |
NeilHanlon | yeah i remember some message from last month or so... travelling took a lot out of me | 15:05 |
noonedeadpunk | seems they do have space on afs share now and were fine adding them | 15:05 |
NeilHanlon | i will try and restart that convo | 15:05 |
noonedeadpunk | yeah, that would make sense, as CentOS testing was pulled off as a whole due to quite some issues it was experiencing | 15:05 |
noonedeadpunk | and rocky was discussed as a replacement | 15:05 |
NeilHanlon | right | 15:06 |
noonedeadpunk | about capi jobs - I frankly did not look into these at all | 15:06 |
noonedeadpunk | as I still barely get the topic | 15:06 |
noonedeadpunk | though I'm coming closer and closer to it in my internal backlog | 15:06 |
noonedeadpunk | Another thing that you brought to my attention is changing the way uwsgi is supposed to be served | 15:07 |
noonedeadpunk | and pulling off wsgi scripts from service setup scripts | 15:07 |
noonedeadpunk | So this bump will totally fail on these changes | 15:07 |
noonedeadpunk | #link https://review.opendev.org/c/openstack/openstack-ansible/+/927841 | 15:08 |
jrosser | hopefully we can make some depends-on patches and work through what is broken fairly easily | 15:08 |
noonedeadpunk | yeah | 15:09 |
noonedeadpunk | and with that test noble I hope | 15:09 |
opendevreview | Merged openstack/openstack-ansible master: Verify OS for containers installation https://review.opendev.org/c/openstack/openstack-ansible/+/925974 | 15:09 |
noonedeadpunk | we also need to come up with release highlights | 15:09 |
jrosser | do we have anything big left to fix/merge this cycle? | 15:10 |
NeilHanlon | i guess i will also probably start on rocky 10 experimental jobs at some point. i need to check up with RDO folks first | 15:10 |
jrosser | deb822 is one thing, but i think thats now understood and is just a question of doing the other places | 15:10 |
noonedeadpunk | looking through our ptg doc | 15:10 |
noonedeadpunk | #link https://etherpad.opendev.org/p/osa-dalmatian-ptg | 15:10 |
NeilHanlon | goodness, it's almost PTG again isnt it.. | 15:11 |
noonedeadpunk | and realizing I failed to work on the most interesting topic for myself so far | 15:11 |
jrosser | but it would be quite good to be able to spend the rest of the cycle getting existing stuff merged and doing tidy-up & CI fixing | 15:11 |
noonedeadpunk | NeilHanlon: it really is.... | 15:11 |
jrosser | we have had a couple of times now with a real big rush for release | 15:11 |
noonedeadpunk | jrosser: yes, exactly. I don't aim to bring anything new | 15:11 |
noonedeadpunk | really want to have a coordinated release as a feature freeze | 15:11 |
jrosser | i would say we are basically there apart from finishing a few things | 15:12 |
noonedeadpunk | so about topics: deb822, noble, playbooks into collection | 15:12 |
jrosser | yeah | 15:12 |
jrosser | i will try to find time soon to revisit the deb822 stuff | 15:12 |
NeilHanlon | oh i forgot if i mentioned it but i do have a working incus for rocky 9 | 15:13 |
NeilHanlon | https://copr.fedorainfracloud.org/coprs/neil/incus/ | 15:13 |
noonedeadpunk | oh, that's really nice. | 15:13 |
jrosser | noble is potentially a big job, as i also think we still have some broken roles | 15:13 |
noonedeadpunk | we should try to look into that for 2025.1 I guess | 15:13 |
NeilHanlon | agreed | 15:13 |
* NeilHanlon reads up on what deb822 is | 15:13 | |
jrosser | and for playbooks->collection - we should decide how far we go this cycle | 15:14 |
noonedeadpunk | yeah, these are broken ones | 15:14 |
noonedeadpunk | #link https://review.opendev.org/q/topic:%22osa/frist_host_refactoring%22+status:open | 15:14 |
noonedeadpunk | jrosser: I'd go all-in | 15:14 |
jrosser | like is -hosts and -infra enough and we treat -openstack as further work? | 15:14 |
noonedeadpunk | I can get some time to finalize just in case | 15:14 |
jrosser | ok - i have kind of lost where we got up to as it has taken so very long to merge the initial stuff | 15:15 |
jrosser | there will be some remaining common-tasks / common-playbooks i expect | 15:15 |
noonedeadpunk | yeah, it took quite long for reviews as well to ensure that all changes to playbooks were moved as well | 15:15 |
noonedeadpunk | so far a good question is what to do with things like the ceph playbooks | 15:17 |
noonedeadpunk | but it looks like you've already moved most of the things anyway :) | 15:19 |
noonedeadpunk | so it's good | 15:19 |
noonedeadpunk | And there's also - what to do with things like that: https://opendev.org/openstack/openstack-ansible/src/branch/master/playbooks/listening-port-report.yml | 15:19 |
noonedeadpunk | I assume you're using this? | 15:20 |
jrosser | that was very useful in the time of working on bind-to-mgmt | 15:20 |
jrosser | but i think actually there is an ansible module to do the same now | 15:20 |
jrosser | https://docs.ansible.com/ansible/2.9/modules/listen_ports_facts_module.html | 15:21 |
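A minimal sketch of what that could look like with the module jrosser links; the module now lives in community.general, and the report format below is just an assumption about how the output might be presented:

```yaml
# listening-port-report.yml - minimal sketch using listen_ports_facts instead
# of the hand-rolled report; the debug output format is illustrative.
- hosts: all
  gather_facts: false
  tasks:
    - name: Gather facts about listening TCP/UDP ports
      community.general.listen_ports_facts:

    - name: Report which address and port each listening process uses
      ansible.builtin.debug:
        msg: "{{ item.name }} listens on {{ item.address }}:{{ item.port }}"
      loop: "{{ ansible_facts.tcp_listen }}"
```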
noonedeadpunk | yeah | 15:21 |
noonedeadpunk | ok, so overall the list sounds doable - noble, wsgi_scripts and playbooks | 15:22 |
jrosser | i think so | 15:22 |
jrosser | the magnum stuff is ok - but we do risk making a release that includes installing stuff from github.com/jrosser fork which i don't like | 15:23 |
jrosser | mnaser: ^ | 15:23 |
noonedeadpunk | Btw there was 1 bug report I wanted to check on, but failed so far | 15:25 |
noonedeadpunk | #link https://bugs.launchpad.net/openstack-ansible/+bug/2078552 | 15:25 |
noonedeadpunk | I believe there's a race condition in there, as in case `rabbitmqctl cluster_status` exits with an error code, which triggers the assert failure, then we probably should not attempt to run it to get flags either | 15:26 |
noonedeadpunk | I didn't look into the code though, but I guess the expectation of recovery in case of cluster failure is fair | 15:26 |
noonedeadpunk | I was thinking though if it would make sense to add another flag like `ignore_cluster_state` as we have in mariadb | 15:27 |
jrosser | andrewbonney: you may have thoughts on this ^ | 15:27 |
noonedeadpunk | but then it might go too far, and raise a question if mnesia should be preserved with that flag or not | 15:27 |
jrosser | time going backwards is really bad though :) | 15:27 |
andrewbonney | Yeah, I'll try and look tomorrow, context switch is too hard right now | 15:28 |
noonedeadpunk | oh yes, it's not good :D | 15:28 |
noonedeadpunk | I can get how that happened though... | 15:28 |
noonedeadpunk | or well | 15:28 |
noonedeadpunk | I've spotted a couple of times that after a reboot chrony somehow does not start up properly | 15:29 |
jrosser | openstack doesnt support 24.04 for D does it? | 15:29 |
noonedeadpunk | no, they're trying master | 15:29 |
noonedeadpunk | there was another report: https://bugs.launchpad.net/openstack-ansible/+bug/2078521 | 15:29 |
jrosser | right - so i still think we need to be careful what message we give out | 15:29 |
noonedeadpunk | yeah, I explained support matrix in the previous one | 15:30 |
noonedeadpunk | so folk is trying to beta test on master and report back findings | 15:30 |
noonedeadpunk | just pretty much missed collection dependency I guess | 15:30 |
jrosser | indeed - the noble topic is really only just all merged now | 15:32 |
noonedeadpunk | but dunno... anyway, overall the issue description looks reasonable enough to double check | 15:32 |
noonedeadpunk | there was another one, but I feel like it's a zun issue | 15:34 |
noonedeadpunk | #link https://bugs.launchpad.net/openstack-ansible/+bug/2078482 | 15:34 |
noonedeadpunk | so at worst we can mark it as invalid for osa | 15:34 |
jrosser | interesting venv paths in that bug report | 15:36 |
noonedeadpunk | indeed.... | 15:37 |
noonedeadpunk | ah | 15:37 |
noonedeadpunk | I guess it's just top of the 2024.1 | 15:38 |
noonedeadpunk | and pbr detects version tag as `stable/2024.1` | 15:38 |
noonedeadpunk | though I would not expect that happening | 15:38 |
jrosser | i thought you still got the previous tag with -dev<big-number> in that case | 15:38 |
noonedeadpunk | it used to be that way for sure, yes | 15:39 |
jrosser | well, some number | 15:39 |
noonedeadpunk | but technically one can override version as well | 15:40 |
noonedeadpunk | but that's pretty much it then | 15:40 |
noonedeadpunk | ah, we have another "bug" on master (and 2024.1 I guess) | 15:43 |
noonedeadpunk | we have conflicting MPMs for Apache between services | 15:44 |
noonedeadpunk | like repo and keystone asking for one MPM and horizon and skyline for another | 15:44 |
noonedeadpunk | or smth like that | 15:44 |
jrosser | actually this is something we should fix | 15:44 |
noonedeadpunk | so re-running playbooks results in failures | 15:44 |
noonedeadpunk | things went completely off with repo actually | 15:45 |
noonedeadpunk | yeah, I was just thinking about best way for that | 15:45 |
jrosser | thats only in master though currently? | 15:45 |
noonedeadpunk | well, in stable you can shoot yourself in the foot as well | 15:45 |
noonedeadpunk | like - override https://opendev.org/openstack/openstack-ansible-os_keystone/src/branch/master/defaults/main.yml#L235 | 15:46 |
noonedeadpunk | but then - https://opendev.org/openstack/openstack-ansible-os_skyline/src/branch/master/vars/debian.yml#L31-L34 | 15:46 |
noonedeadpunk | and https://opendev.org/openstack/openstack-ansible-os_horizon/src/branch/master/vars/debian.yml#L61-L64 | 15:47 |
noonedeadpunk | so this all leans towards apache role eventually | 15:48 |
jrosser | yes agreed | 15:48 |
noonedeadpunk | but also I think this should still be backportable at first... | 15:49 |
noonedeadpunk | ah, and also what I found yesterday - is a bug in neutron handlers for l3 - these 2 things just don't work on modern kernels https://opendev.org/openstack/openstack-ansible-os_neutron/src/branch/master/handlers/main.yml#L33-L75 | 15:51 |
noonedeadpunk | but also I'm not sure what's meant by `pgrep neutron-ns-meta` | 15:51 |
noonedeadpunk | I'm not sure though if it's worth including the apache thing in this release.. I guess not, but rather for 2025.1 | 15:54 |
noonedeadpunk | #endmeeting | 16:00 |
opendevmeet | Meeting ended Tue Sep 3 16:00:00 2024 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 16:00 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/openstack_ansible_meeting/2024/openstack_ansible_meeting.2024-09-03-15.00.html | 16:00 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/openstack_ansible_meeting/2024/openstack_ansible_meeting.2024-09-03-15.00.txt | 16:00 |
opendevmeet | Log: https://meetings.opendev.org/meetings/openstack_ansible_meeting/2024/openstack_ansible_meeting.2024-09-03-15.00.log.html | 16:00 |
noonedeadpunk | jrosser: do you have any guess what `pgrep neutron-ns-meta` should be catching at all? | 16:13 |
jrosser | well - this returns the pids of any processes of that name? | 16:16 |
noonedeadpunk | um, and do you have any output? | 16:17 |
noonedeadpunk | as I'm not sure if it's a valid process name at all | 16:17 |
noonedeadpunk | also - it seems that the pattern is limited to 16 characters, just in case | 16:18 |
noonedeadpunk | and then - `readlink -f` does not provide output detailed enough to see the venv_tag.... | 16:18 |
noonedeadpunk | talking about these: https://opendev.org/openstack/openstack-ansible-os_neutron/src/branch/master/handlers/main.yml#L41-L42 | 16:19 |
noonedeadpunk | https://paste.openstack.org/show/b3dFyeCXzifdorJmYYMQ/ | 16:19 |
noonedeadpunk | ugh. | 16:20 |
noonedeadpunk | but we really need to at least understand what exact process we're supposed to catch... | 16:20 |
noonedeadpunk | as my only guess would be `neutron-metadata-agent` | 16:21 |
* jrosser looking | 16:23 | |
jrosser | i wonder if it should actually be `pgrep -f ns-metadata-proxy` | 16:25 |
jrosser | what it is actually searching for is part of the path to the haproxy config file | 16:26 |
jrosser | like `haproxy -f /var/lib/neutron/ns-metadata-proxy/47fb30ac-5c90-4ed7-9a15-65a1225bb6db.conf` | 16:26 |
noonedeadpunk | but then `| grep -qv "{{ neutron_venv_tag }}` would be pointless as well | 16:28 |
noonedeadpunk | ok, so the commit message talks about `neutron-ns-metadata-proxy` | 16:29 |
jrosser | i feel like `/proc/$ns_pid/exe` would be the executable of the thing that owns the namespace | 16:29 |
noonedeadpunk | nah, it's returning `/usr/sbin/haproxy ` | 16:30 |
jrosser | also https://opendev.org/openstack/neutron/src/branch/master/releasenotes/notes/switching-to-haproxy-for-metadata-proxy-9d8f7549fadf9182.yaml | 16:30 |
noonedeadpunk | yeah, just found that | 16:31 |
jrosser | and this cleanup code is 8 years old | 16:31 |
noonedeadpunk | and reno is 7yo | 16:31 |
jrosser | so it might be now either totally wrong or redundant | 16:31 |
noonedeadpunk | ok, cool, so that is likely redundant | 16:31 |
jrosser | well, unless the same issue exists just in a different way | 16:32 |
noonedeadpunk | I guess intention there was to kill proxies running from old venvs on upgrade | 16:32 |
noonedeadpunk | but haproxy should not really matter that much | 16:32 |
jrosser | yes exactly that | 16:32 |
noonedeadpunk | as it's going from system packages kinda | 16:33 |
jrosser | it would be simple to look in a sandbox to see if all those haproxy processes get restarted if you restart the relevant neutron service | 16:34 |
noonedeadpunk | I'm not sure if they are, but my guess is that they should not even | 16:34 |
jrosser | the only thing would be if an upgrade to neutron expected to be putting different content in the generated .conf file | 16:35 |
noonedeadpunk | as what we do in the next handler is kill things except haproxy and keepalived | 16:35 |
noonedeadpunk | But then neutron should be handling reload regardless | 16:35 |
noonedeadpunk | as updated content would come only through notifications I assume | 16:35 |
noonedeadpunk | or smth like that | 16:36 |
jrosser | oh no i meant if there was some code change in neutron | 16:36 |
jrosser | that meant the conf files should be updated | 16:36 |
noonedeadpunk | ah, base template, yeah | 16:36 |
jrosser | yeah | 16:36 |
noonedeadpunk | but then we need smth like `neutron_l3_cleanup_on_shutdown` I guess | 16:37 |
jrosser | as usual "its complicated" but for certain we can remove that code as it's been doing nothing for a long time | 16:38 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-rabbitmq_server master: Manage apt repositores and keys using deb822_repository module https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/907833 | 16:49 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-openstack_hosts master: Manage apt repositores and keys using deb822_repository module https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/907434 | 16:51 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-openstack_hosts master: Manage apt repositores and keys using deb822_repository module https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/907434 | 16:51 |
opendevreview | Jonathan Rosser proposed openstack/openstack-ansible-openstack_hosts master: Manage apt repositores and keys using deb822_repository module https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/907434 | 16:52 |
noonedeadpunk | ok, we have some extra work to do to run Neutron with uwsgi | 22:46 |
noonedeadpunk | https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/SVP3VUCOZGIY63TGD33H6NQ6UBAFDN5V/ | 22:47 |
noonedeadpunk | like - neutron-ovn-maintenance-worker and neutron-periodic-workers | 22:47 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_neutron master: Disable uWSGI usage by default https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/927881 | 23:11 |
noonedeadpunk | some extra chunk of work.... | 23:11 |