opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Add autocomplete script for playbooks https://review.opendev.org/c/openstack/openstack-ansible/+/932220 | 07:09 |
*** gaudenz__ is now known as gaudenz | 07:12 | |
noonedeadpunk | fwiw, I failed to get CAPI working over the weekend. | 08:14 |
noonedeadpunk | the first weird thing was that the magnum-system namespace wasn't created, so I had to create it manually. | 08:15 |
noonedeadpunk | Then the code somehow fails if you've omitted dns_nameservers in the COE templates | 08:15 |
noonedeadpunk | but lastly - it just freezes in CREATE_IN_PROGRESS, and when I check inside the k8s control cluster - it doesn't show any progress on creation either | 08:16 |
noonedeadpunk | https://paste.openstack.org/show/bKbpA5igK1IDKzPw17V5/ | 08:17 |
noonedeadpunk | and - no openstack resources are being created | 08:17 |
noonedeadpunk | so if someone (looking at jrosser) has some advice on what I could be doing wrong or how to trace it - it would be appreciated | 08:28 |
opendevreview | Merged openstack/openstack-ansible-rabbitmq_server master: Cleanup unneeded upgrade tasks https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/931973 | 09:56 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Freeze roles for 30.0.0.0b1 release https://review.opendev.org/c/openstack/openstack-ansible/+/931611 | 13:32 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Unfreeze roles after milestone release https://review.opendev.org/c/openstack/openstack-ansible/+/931612 | 13:34 |
noonedeadpunk | would be nice to have some more reviews on https://review.opendev.org/q/topic:%22bump_osa%22+status:open | 13:36 |
opendevreview | Merged openstack/openstack-ansible-rabbitmq_server master: Add erlang package defenition to defaults https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/931794 | 14:22 |
jrosser | noonedeadpunk: so i never had to create the magnum-system namespace manually | 15:15 |
noonedeadpunk | huh | 15:15 |
noonedeadpunk | so something went terribly wrong in my case then... | 15:15 |
jrosser | the ci jobs don't do that after all | 15:16 |
noonedeadpunk | yeah, true | 15:16 |
noonedeadpunk | but also I'm not sure they're passing today either | 15:16 |
noonedeadpunk | ok, they are... | 15:17 |
noonedeadpunk | I did destroy and re-spawn the k8s containers multiple times as well... | 15:17 |
noonedeadpunk | wonder if there's some folder that persisted on control plane | 15:18 |
noonedeadpunk | but namespace is not created by vexxhost roles at least | 15:18 |
jrosser | i would guess it originates from something like here https://github.com/vexxhost/magnum-cluster-api/blob/main/magnum_cluster_api/resources.py#L134 | 15:24 |
jrosser | but i guess mnaser would have a quick answer to this :) | 15:24 |
noonedeadpunk | I was thinking about here even: https://github.com/vexxhost/magnum-cluster-api/blob/178b4be4202ce3338d28aab1644b6be4f7040592/magnum_cluster_api/sync.py#L34 | 15:24 |
noonedeadpunk | as there was something related to failed locking | 15:24 |
mnaser | https://github.com/vexxhost/magnum-cluster-api/blob/main/magnum_cluster_api/driver.py#L63 | 15:25 |
mnaser | The namespace gets created when the create_cluster gets called | 15:25 |
mnaser | Did you make sure you had the correct kubeconfig file in $HOME/.kube/config ? | 15:25 |
noonedeadpunk | on magnum side I assume? | 15:26 |
mnaser | yeah, magnum-conductor needs to be able to read that file | 15:26 |
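A quick way to sanity-check that step - a hypothetical helper, not part of magnum itself, that just verifies the file magnum-conductor would read actually exists and is readable:

```python
import os

def kubeconfig_readable(path=None):
    """Return True if the kubeconfig magnum-conductor would use
    (default: $HOME/.kube/config) exists and is readable."""
    if path is None:
        path = os.path.join(os.path.expanduser("~"), ".kube", "config")
    return os.path.isfile(path) and os.access(path, os.R_OK)
```

Running this as the same user that magnum-conductor runs as is what matters, since file permissions are per-user.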
noonedeadpunk | as once I'd created the namespace manually - things at least progressed a bit, to the point of https://paste.openstack.org/show/bKbpA5igK1IDKzPw17V5/ | 15:27 |
jrosser | ah well if you deleted/recreated the control plane k8s but did not do the rest of the playbook to redo the magnum integration, that would be wrong | 15:27 |
noonedeadpunk | and `kube-vn6vf` is what I got from magnum api | 15:27 |
mnaser | okay so it seems to have progressed a bit | 15:27 |
mnaser | id check the logs of the capo-system namespace | 15:27 |
mnaser | kubectl -n capo-system logs deploy/capo-controller-manager | 15:28 |
mnaser | -f will get you a follow, it'll tell you a bit about what it's trying to do | 15:28 |
noonedeadpunk | ok, I think that's the issue indeed, thanks mnaser | 15:28 |
noonedeadpunk | it complains about a failure to verify the cert | 15:29 |
noonedeadpunk | I just wasn't sure which logs to check where :) | 15:29 |
mnaser | PR to the docs would be welcome :P | 15:29 |
noonedeadpunk | (fwiw, cert is a valid let's encrypt there) | 15:29 |
noonedeadpunk | ++ | 15:29 |
noonedeadpunk | I'm trying to get my head around it first before pushing to docs | 15:29 |
noonedeadpunk | mnaser: btw, I've pushed a couple of things, but I'm a bit confused about how/who should launch pipelines there? | 15:30 |
mnaser | noonedeadpunk: like for the repos? you can create a PR to the repo and it's a combination of zuul/old github actions we're slowly getting rid of | 15:31 |
noonedeadpunk | as it seems it's only repo maintainers who run pipelines? | 15:31 |
mnaser | no, zuul should automatically run on the PR, but i don't see any :p | 15:32 |
mnaser | oooh, for https://github.com/vexxhost/ansible-collection-kubernetes/pulls | 15:32 |
mnaser | yeah, that i need to get around moving to zuul | 15:32 |
noonedeadpunk | yeah, ie https://github.com/vexxhost/ansible-collection-kubernetes/pull/136 | 15:32 |
noonedeadpunk | there're some zuul bits though | 15:32 |
mnaser | yeah i need to zuulify that repo like i did for the others, we moved away from GHA | 15:33 |
mnaser | noonedeadpunk: just fyi, in github, when you get a PR from a first-time contributor, a maintainer needs to approve it before the CI runs for the repo | 15:33 |
mnaser | once you have your first landed change, GHA will run automatically in the future; i guess it's to protect against someone being malicious or something | 15:34 |
noonedeadpunk | I somehow thought it depends on repo config, but yeah, not 100% sure | 15:34 |
noonedeadpunk | but anyway, the point was that I'm trying to contribute back when I have something on my hands :) | 15:35 |
mnaser | yeah no i appreciate it | 15:36 |
noonedeadpunk | hm, okay, so why in the hell is let's encrypt an unknown authority for k8s.... | 15:36 |
mnaser | noonedeadpunk: by default, the capo container has no "os", so we pass the CA for it dynamically in the config | 15:37 |
mnaser | i get it from certifi and push it if none is provided | 15:37 |
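That fallback can be sketched roughly like this - illustrative only, as the real logic lives in magnum-cluster-api; `certifi` is assumed to be installed alongside it:

```python
import ssl

def resolve_ca_bundle(provided_ca_path=None):
    """Return the CA bundle contents to push into the CAPO config:
    an operator-provided file wins; otherwise fall back to certifi's
    bundle (or the system default path if certifi is absent)."""
    if provided_ca_path:
        with open(provided_ca_path) as f:
            return f.read()
    try:
        import certifi
        bundle = certifi.where()
    except ImportError:
        bundle = ssl.get_default_verify_paths().cafile
    with open(bundle) as f:
        return f.read()
```

The key consequence, as discussed below: if the operator-provided bundle is an internal CA but the client ends up hitting the public endpoint, verification fails even though the public cert is perfectly valid.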
noonedeadpunk | aha, and I passed it the internal CA while it tries to reach the public endpoint | 15:38 |
mnaser | noonedeadpunk: https://github.com/vexxhost/magnum-cluster-api/blob/178b4be4202ce3338d28aab1644b6be4f7040592/magnum_cluster_api/resources.py#L587-L607 yea | 15:38 |
mnaser | https://github.com/vexxhost/magnum-cluster-api/blob/main/magnum_cluster_api/utils.py#L89-L96 | 15:39 |
noonedeadpunk | though I do have `[capi_client]/endpoint = internal` | 15:41 |
noonedeadpunk | ok, but knowing where logs are really explained a lot, so thanks! | 15:42 |
noonedeadpunk | I guess I should be able to proceed from that | 15:42 |
noonedeadpunk | I kinda wonder if that's what makes it connect to the public URL https://github.com/vexxhost/magnum-cluster-api/commit/178b4be4202ce3338d28aab1644b6be4f7040592 | 15:43 |
noonedeadpunk | which makes total sense | 15:43 |
noonedeadpunk | but then we need to pass just all the system certs, I guess? | 15:44 |
noonedeadpunk | though likely I'm not there yet anyway, as no resources were created in the first place. and the client cluster is spawned on ubuntu, so it can (theoretically) have system certs | 15:45 |
noonedeadpunk | anyway | 15:45 |
jrosser | noonedeadpunk: have been round all this CA stuff a lot here :) | 16:05 |
jrosser | let me know if you want me to check any specific vars blah blah | 16:05 |
jrosser | and afaik, the config in the AIO should deal with this, as there's the pki role on the internal vip | 16:06 |
jrosser | so you have the worst case setup, where it's not a trusted cert, so the config has to be spot-on everywhere | 16:06 |
noonedeadpunk | the problem now is that in the logs it tries to reach keystone over the public interface, while I clearly see the internal uri in the config | 16:06 |
noonedeadpunk | and I've supplied the path to the internal CA in the config... | 16:06 |
jrosser | what does :) there are many things | 16:07 |
jrosser | andrews patch is for the workload cluster | 16:07 |
noonedeadpunk | yeah, the workload is not there yet.... | 16:07 |
jrosser | you might also need to patch magnum | 16:07 |
jrosser | maybe you run into this? https://bugs.launchpad.net/magnum/+bug/2060194 | 16:08 |
noonedeadpunk | oh, well | 16:09 |
noonedeadpunk | it would explain it | 16:09 |
noonedeadpunk | I just thought this would actually be used: https://github.com/vexxhost/magnum-cluster-api/blob/178b4be4202ce3338d28aab1644b6be4f7040592/magnum_cluster_api/resources.py#L580-L583 | 16:10 |
noonedeadpunk | but I'm gonna try and pass just the whole system trust, to be frank | 16:10 |
jrosser | that needs a proper fix, as imho this should all be automagic from parsing the client config options with keystoneauth | 16:10 |
jrosser | but /o\ every service does this differently and it's all messy | 16:10 |
noonedeadpunk | oh yes | 16:10 |
jrosser | andrews patch is just a sticking plaster really | 16:11 |
jrosser | ok so you have an issue in the magnum code about endpoints | 16:11 |
noonedeadpunk | btw there was some reply in ML about db pooling being broken, but I didn't look into the patch in detail still | 16:12 |
noonedeadpunk | yeah, likely. | 16:12 |
jrosser | the thing you linked is about openstack clients that run in the k8s control plane which also need to select the correct endpoint | 16:12 |
jrosser | so it has to be right in two completely separate places | 16:12 |
noonedeadpunk | or well, in that env I don't care _much_ about public/internal, just that they use different certs | 16:12 |
jrosser | well, at least two | 16:12 |
noonedeadpunk | yeah, that's why I looked into the k8s code, as this is the error I see on the k8s control cluster: https://paste.openstack.org/show/bL8Q0BBUbKjgtatJ63NF/ | 16:13 |
jrosser | tbh there is likely stuff we can improve in the docs | 16:15 |
jrosser | what i have pushed is basically all good for the AIO/CI | 16:15 |
jrosser | but having said that we make a bunch of overrides for things like this in actual deployments | 16:15 |
noonedeadpunk | well, I'm playing in a multinode physical sandbox, so I'm not under any real restrictions there _yet_ | 16:17 |
mnaser | noonedeadpunk: there is a secret in the magnum-system ns | 16:17 |
mnaser | it should contain the CA that is in use | 16:17 |
jrosser | i have written all this up | 16:17 |
jrosser | should make a patch for some "how it works" docs | 16:18 |
noonedeadpunk | so if I didn't have magnum-system ns - highly unlikely I got the secret.... | 16:18 |
mnaser | kubectl -n magnum-system get secret | 16:19 |
mnaser | what does that give you? | 16:19 |
noonedeadpunk | no resources found | 16:19 |
noonedeadpunk | ok, let me drop all clusters and namespaces | 16:20 |
noonedeadpunk | and restart magnum-conductor, I assume | 16:20 |
mnaser | yeah i think something went terribly wrong here, | 16:20 |
mnaser | the secret with the CA is missing so it's probably not using any CAs at all | 16:20 |
mnaser | i'd watch the logs of m-cond, send the create, and see what gets logged | 16:21 |
noonedeadpunk | ++ | 16:21 |
noonedeadpunk | it looked like it was failing very early, when it tried to create some lock, with a KeyError | 16:21 |
noonedeadpunk | that the namespace is not found | 16:21 |
jrosser | mnaser: btw did you ever try multiple interfaces on a workload cluster? | 16:22 |
mnaser | jrosser: never got around to it | 16:23 |
mnaser | jrosser: i'm going over all the open PRs -- are you still using https://github.com/vexxhost/ansible-collection-containers/pull/21/files ? | 16:23 |
jrosser | so we do make some progress, but are running into this https://medium.com/@kanrangsan/how-to-specify-internal-ip-for-kubernetes-worker-node-24790b2884fd | 16:23 |
mnaser | so you need to template the ip of the server into extraArgs | 16:24 |
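For reference, the shape of what gets templated - a sketch of building the per-node kubeadm nodeRegistration fragment (the field names follow the kubeadm/CAPI KubeadmConfig schema; the helper itself is hypothetical):

```python
def node_registration_patch(node_ip):
    """Build the kubeadm nodeRegistration fragment that pins the
    kubelet's advertised InternalIP via the --node-ip kubelet flag,
    as described in the linked article."""
    return {
        "nodeRegistration": {
            "kubeletExtraArgs": {
                "node-ip": node_ip,
            },
        },
    }
```

The point is that `node-ip` differs per server, so it has to be rendered into each node's config rather than set once cluster-wide.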
jrosser | so for https://github.com/vexxhost/ansible-collection-containers/pull/21/files yes we are using that | 16:27 |
jrosser | and because there was no review of that yet i did not create a PR for the companion here https://github.com/jrosser/ansible-collection-kubernetes/tree/download-artifacts | 16:27 |
jrosser | but that is pretty much ready to go (subject to getting everything rebased and back up to date......) | 16:28 |
noonedeadpunk | btw, another thing: the driver moves a cluster to failure if there's an even number of members, but the API accepts the request to create the cluster. Was there some discussion around that in magnum? It feels like exactly the kind of thing to verify before accepting the request. | 16:30 |
mnaser | noonedeadpunk: i think this is because it's a few layers below, where we can't validate this; i think adding stuff to the api to decline this would be helpful, but we haven't gotten around to bubbling this stuff up | 16:31 |
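The missing API-side check would amount to something like this (hypothetical - as noted above, magnum does not currently validate this at the API layer):

```python
def validate_master_count(master_count):
    """Reject cluster create requests with an even (or non-positive)
    control-plane member count up front, since etcd quorum requires
    an odd number of members."""
    if master_count < 1 or master_count % 2 == 0:
        raise ValueError(
            "master_count must be a positive odd number, got %d" % master_count
        )
    return master_count
```

Failing fast here would return a 400 to the user instead of letting the cluster drift into a failure state several layers down in the driver.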
noonedeadpunk | so I dropped the namespace and cluster creation fails on it again: https://paste.openstack.org/show/bx3OvvOVIreomHH1tYsG/ | 16:32 |
noonedeadpunk | oh, yeah, I totally get that it's verified very down the line | 16:32 |
mnaser | noonedeadpunk: i think you're seeing an unrelated issue here | 16:33 |
mnaser | `magnum.service.periodic.ClusterUpdateJob.update_status` runs to update status of clusters, and i think what happened was that the magnum-system ns did not exist when that periodic job ran | 16:33 |
noonedeadpunk | oh | 16:34 |
noonedeadpunk | ok, yes, now namespace is there, huh | 16:35 |
noonedeadpunk | so it was just a red herring | 16:35 |
noonedeadpunk | yeah, and at least one machine was created once I passed the system trust | 16:37 |
mnaser | the thing is magnum has no like... coordination system | 16:40 |
mnaser | so N conductors means N requests :p | 16:40 |
mnaser | or M*N requests where M is clusters | 16:40 |
noonedeadpunk | yeah, sounds like adopting tooz would be beneficial for magnum | 16:41 |
mnaser | yep | 16:42 |
noonedeadpunk | oh, btw, you folks are using horizon for magnum, right? Am I blind, or is there no good way to fetch the magnum config together with the cert embedded, like you'd get with the CLI? | 16:44 |
jrosser | i think there was a patch to fix that | 16:44 |
noonedeadpunk | /o\, okay | 16:45 |
noonedeadpunk | it's the same with the heat driver, just to be clear | 16:45 |
noonedeadpunk | I just haven't used Horizon for a while | 16:45 |
jrosser | oh https://review.opendev.org/c/openstack/magnum-ui/+/917913 | 16:45 |
jrosser | doh | 16:46 |
noonedeadpunk | been a while as well... | 16:46 |
jrosser | mnaser: if you are looking at PRs then doing something (anything?!) about Noble deployments would be useful - afaik you need something like https://github.com/vexxhost/ansible-collection-kubernetes/pull/127 | 16:49 |
jrosser | otherwise we will soon release OSA pointing to my fork, which would not be ideal | 16:49 |
mnaser | jrosser: ok sounds good, i'm working my way up the stack, starting from the collections roles | 16:50 |
jrosser | ok cool - thanks | 16:50 |
* noonedeadpunk will check on patch to add support for more modern control cluster versions soonish | 16:51 | |
mnaser | jrosser: after this https://github.com/vexxhost/ansible-collection-containers/pull/21 i will tag/release new version of vexxhost.containers | 16:52 |
jrosser | cool | 16:53 |
jrosser | i will check on my counterpart patches for the kubernetes collection to go with that PR | 16:54 |
jrosser | i re-use the plugin from the containers collection to parse the list of binaries needed by the kubernetes roles | 16:54 |
jrosser | this is likely out of date / conflicting now, as there's been more change in the k8s collection | 16:55 |
mnaser | yeah no problem, i just wanted to know if the containers collection is missing any other pieces so i can release | 17:04 |
opendevreview | Merged openstack/openstack-ansible stable/2023.1: Ensure that the inventory tox job runs on an ubuntu-jammy node https://review.opendev.org/c/openstack/openstack-ansible/+/932080 | 17:11 |
opendevreview | Merged openstack/openstack-ansible stable/2023.1: Bump SHAs for 2023.1 https://review.opendev.org/c/openstack/openstack-ansible/+/931742 | 17:18 |
mnaser | jrosser: stage 1 done :) https://galaxy.ansible.com/ui/repo/published/vexxhost/containers/ | 17:46 |
noonedeadpunk | damn, I've introduced quite a bug into the neutron role recently | 18:01 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_neutron master: Ensure that services that intended to stay disabled are not started https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/932357 | 18:04 |
noonedeadpunk | or well, it was there for a while but became harmful lately | 18:04 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_neutron master: Ensure that services that intended to stay disabled are not started https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/932357 | 18:08 |
noonedeadpunk | I'd say we need to include that in the release ..... | 18:28 |
mnaser | noonedeadpunk: did you use 1.31 for your k8s cluster in your tests? | 20:45 |
mnaser | i am seeing some changes in 1.29 that probably caused you to see your issues | 20:45 |
opendevreview | Merged openstack/openstack-ansible stable/2024.1: Bumps SHAs for 2024.1 https://review.opendev.org/c/openstack/openstack-ansible/+/931740 | 22:32 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!