opendevreview | OpenStack Proposal Bot proposed openstack/magnum-ui master: Imported Translations from Zanata https://review.opendev.org/c/openstack/magnum-ui/+/955260 | 03:44 |
opendevreview | Jasleen proposed openstack/magnum-capi-helm master: Support token based kubeconfig files for the capi management cluster https://review.opendev.org/c/openstack/magnum-capi-helm/+/953956 | 13:06 |
opendevreview | Jasleen proposed openstack/magnum-capi-helm master: Support token based kubeconfig files for the capi management cluster https://review.opendev.org/c/openstack/magnum-capi-helm/+/953956 | 13:16 |
andrewbogott_ | Is there anyone here who knows about capi-helm? And/or is there a proper channel/mailing list/etc to discuss capi-helm or other magnum issues? (I know this channel is for devs but it's all I've got) | 14:50 |
atmark | andrewbogott_: It looks like you are missing cluster api resources in the management cluster. Which version of capi-helm-charts are you using? | 15:01 |
atmark | Does `kubectl api-resources | grep cluster.x-k8s.io` return anything? | 15:03 |
andrewbogott_ | atmark: regarding version, I have 'default_helm_chart_version=0.16.0' in magnum.conf, and it seems to be getting the 0.16.0 chart. | 15:06 |
andrewbogott_ | (trying to find my kubectl context, one moment...) | 15:08 |
atmark | I'm on 0.16.0 too. No issues so far. I encountered `ensure CRDs are installed first, resource mapping` yesterday when I realized I forgot to install cluster api resources | 15:08 |
andrewbogott_ | ok, I'm caught up | 15:10 |
atmark | Did you run `clusterctl init --core cluster-api:v1.9.6 --bootstrap kubeadm:v1.9.6 --control-plane kubeadm:v1.9.6 --infrastructure openstack:v0.11.3` ? | 15:10 |
andrewbogott_ | indeed, kubectl api-resources | grep cluster.x-k8s.io doesn't return anything. | 15:10 |
andrewbogott_ | Looks like I did 'clusterctl init --infrastructure openstack' | 15:11 |
andrewbogott_ | Not sure what guide I was following, trying to dig that back up... | 15:12 |
andrewbogott_ | heh, do the docs at https://docs.openstack.org/magnum-capi-helm/ even have a section about setting up the management cluster? | 15:13 |
andrewbogott_ | I'm running that clusterctl command you pasted now. Interested in if that's from a step-by-step guide I can follow when I rebuild this... | 15:14 |
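A minimal verification sketch for this step, assuming the kubeconfig points at the k3s management cluster; the namespaces are the defaults that clusterctl creates:

```bash
# Cluster API CRDs should now be registered
kubectl api-resources | grep cluster.x-k8s.io

# Controller pods for core CAPI, the kubeadm bootstrap/control-plane providers
# and the OpenStack provider should all be Running
kubectl get pods -n capi-system
kubectl get pods -n capi-kubeadm-bootstrap-system
kubectl get pods -n capi-kubeadm-control-plane-system
kubectl get pods -n capo-system
```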
atmark | Not that I'm aware of. I'm using a portion of this from devstack to set up the management cluster using k3s https://opendev.org/openstack/magnum-capi-helm/src/branch/master/devstack/contrib/new-devstack.sh#L192-L269 | 15:15 |
atmark | portion of this guide* | 15:15 |
andrewbogott_ | I'm using k3s too, so that's good. | 15:16 |
andrewbogott_ | But not devstack so I was ignoring that section | 15:16 |
andrewbogott_ | Anyway... after running your suggested command, coe cluster create produces the same error message as before | 15:16 |
andrewbogott_ | let's see if this paste is legible... | 15:17 |
andrewbogott_ | https://www.irccloud.com/pastebin/GV7RUzql/ | 15:17 |
atmark | Does `kubectl api-resources | grep cluster.x-k8s.io` return anything now? | 15:19 |
andrewbogott_ | oh, good question! | 15:20 |
andrewbogott_ | yes, lots | 15:20 |
atmark | Here's what I have running on my management cluster https://paste.openstack.org/show/bFpuMVDKiJvz2WpZRrkz/ | 15:21 |
andrewbogott_ | Can I safely assume that magnum controller is talking to my management k8s service? Or could this be a simple network issue? | 15:21 |
andrewbogott_ | yeah, my api-resources output looks like yours | 15:22 |
atmark | andrewbogott_: Yes, magnum talks to the management cluster. | 15:23 |
andrewbogott_ | ok, so at least I have an interesting problem :) | 15:24 |
atmark | which version is your magnum-capi-helm? | 15:24 |
andrewbogott_ | ii python3-magnum-capi-helm 1.2.0-3~bpo12+1 all Magnum driver that uses Kubernetes Cluster API via Helm | 15:25 |
andrewbogott_ | Epoxy | 15:25 |
atmark | Have you tried creating a workload cluster without using Magnum? e.g. helm upgrade my-cluster capi/openstack-cluster --install -f ./clouds.yaml -f ./cluster-configuration.yaml | 15:26 |
atmark | https://github.com/azimuth-cloud/capi-helm-charts/tree/main/charts/openstack-cluster#managing-a-workload-cluster | 15:26 |
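For reference, a rough sketch of that standalone test following the linked README; the Helm repo URL and the keys in the values file are assumptions here and should be checked against the chart's documented values:

```bash
# clouds.yaml: an application credential for the target project
# cluster-configuration.yaml: a minimal cluster spec, e.g. (keys illustrative):
#
#   kubernetesVersion: 1.30.0
#   machineImageId: <glance-image-uuid>
#   controlPlane:
#     machineFlavor: <flavor-name>
#   nodeGroups:
#     - name: default-worker
#       machineFlavor: <flavor-name>
#       machineCount: 2
#
helm repo add capi https://azimuth-cloud.github.io/capi-helm-charts   # repo URL assumed; see README above
helm upgrade my-cluster capi/openstack-cluster --install \
  -f ./clouds.yaml -f ./cluster-configuration.yaml
```

If this works outside Magnum, the problem is on the Magnum/driver side; if it fails the same way, the problem is in the management cluster or the OpenStack project.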
andrewbogott_ | I can copy/paste that command if you want but can't claim to understand what's happening :) | 15:27 |
atmark | I think I found your issue | 15:29 |
atmark | Run this for me please `helm list -A` | 15:29 |
andrewbogott_ | https://www.irccloud.com/pastebin/aJH6iweA/ | 15:30 |
atmark | Is your kubeconfig pointing to your management cluster? | 15:32 |
atmark | If so, you are missing CAPI addon manager and janitor | 15:32 |
andrewbogott_ | you mean, as opposed to a different cluster? k8s mgmt cluster is all I've got. | 15:32 |
andrewbogott_ | https://www.irccloud.com/pastebin/3kw5PTfq/ | 15:33 |
atmark | You are missing these charts https://paste.openstack.org/show/bcXGt9bUWZGKjIjULw5J/ | 15:33 |
andrewbogott_ | ok! stay tuned... | 15:34 |
atmark | Look at the installation steps https://opendev.org/openstack/magnum-capi-helm/src/branch/master/devstack/contrib/new-devstack.sh#L192-L269 | 15:34 |
atmark | You can even just copy paste those commands | 15:35 |
atmark | Better to reset your k3s installation first | 15:35 |
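The two missing pieces correspond roughly to the charts below, based on the devstack script linked above; the repo URLs, release names and namespaces here are assumptions, so copy the exact commands from the script itself:

```bash
# CAPI addon manager (installs CNI, CSI and other addons into workload clusters)
helm repo add capi-addons https://azimuth-cloud.github.io/cluster-api-addon-provider   # URL assumed
helm upgrade cluster-api-addon-provider capi-addons/cluster-api-addon-provider \
  --install --namespace capi-addon-system --create-namespace

# Janitor that cleans up leftover OpenStack resources when clusters are deleted
helm repo add capi-janitor https://azimuth-cloud.github.io/cluster-api-janitor-openstack   # URL assumed
helm upgrade cluster-api-janitor capi-janitor/cluster-api-janitor-openstack \
  --install --namespace capi-janitor-system --create-namespace
```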
andrewbogott_ | So... just to clarify (and maybe editorialize) -- there are no docs about how to set this up, only a script to do so on devstack? | 15:41 |
andrewbogott_ | That's OK, I can certainly adapt that script. But I definitely didn't think "Oh I should totally read the devstack instructions since there are 8 critical steps that are only mentioned there." | 15:41 |
andrewbogott_ | oops, gotta update my k8s.conf after the rebuild... | 15:43 |
andrewbogott_ | ok! It's getting further now. Will wait a bit and see where we land... | 15:45 |
andrewbogott_ | Seems to be 'CREATE_IN_PROGRESS' forever. But I will keep waiting :) | 15:55 |
andrewbogott_ | thank you for all the help, btw, atmark -- it looks like I'm getting much further, even if not to the finish line | 15:55 |
andrewbogott_ | yeah, seems stuck | 15:59 |
andrewbogott_ | oh, it made a VM though! | 16:00 |
atmark | Just the script from devstack. I adapted the script too and installed it on a management cluster that's HA. | 16:05 |
atmark | andrewbogott_: Glad it works | 16:05 |
andrewbogott_ | it made the controller node but seems to be hanging rather than making workers. Magnum seems to think it's still working on it but I imagine the actual capi process has long since errored out. | 16:06 |
atmark | Check the logs on capo-controller-manager-xxxx pod in capo-system namespace | 16:10 |
atmark | it might be that your project doesn't have any resources left | 16:12 |
andrewbogott_ | you mean, like, because of resource quotas? I don't think I'm over quota | 16:14 |
andrewbogott_ | Logs are pretty happy although I do see this | 16:15 |
andrewbogott_ | "Bootstrap data secret reference is not yet available" controller="openstackmachine" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="OpenStackMachine" OpenStackMachine="magnum-admin/test-cluster-06-sp3akzg6hxqu-default-worker-4v2pj-rj7sr" namespace="magnum-admin" name="test-cluster-06-sp3akzg6hxqu-default-worker-4v2pj-rj7sr" reconcileID="a2069e7c-0af5-4be1-9691-7fa41fdd9184" | 16:15 |
andrewbogott_ | openStackMachine="test-cluster-06-sp3akzg6hxqu-default-worker-4v2pj-rj7sr" machine="test-cluster-06-sp3akzg6hxqu-default-worker-4v2pj-rj7sr" cluster="test-cluster-06-sp3akzg6hxqu" openStackCluster="test-cluster-06-sp3akzg6hxqu" | 16:15 |
atmark | Yes resource quotas. Here's ways to debug https://github.com/azimuth-cloud/capi-helm-charts/blob/main/charts/openstack-cluster/DEBUGGING.md | 16:16 |
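A few commands that are useful at this stage, assuming the kubeconfig still points at the management cluster; the cluster name and namespace below are taken from the log lines above:

```bash
# Overall view of the cluster's CAPI objects and which conditions are not Ready
clusterctl describe cluster test-cluster-06-sp3akzg6hxqu -n magnum-admin

# Which machines are stuck, and in what phase
kubectl get machines -n magnum-admin
kubectl get openstackmachines -n magnum-admin

# Controller logs: core CAPI and the OpenStack infrastructure provider
kubectl logs -n capi-system deploy/capi-controller-manager
kubectl logs -n capo-system deploy/capo-controller-manager
```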
atmark | `Bootstrap data secret reference is not yet available` do you have barbican installed? | 16:16 |
andrewbogott_ | My guess is that I have it installed but not working. And it's not installed at all in the prod cloud where I hope to deploy this. | 16:17 |
andrewbogott_ | Is it possible to run w/out it? If not I'll go on a side-quest to set up a minimal version. | 16:17 |
andrewbogott_ | I'm going to rip out the endpoint in my test cloud and see what happens :) | 16:19 |
atmark | I thought barbican was required for the magnum-capi-helm driver but it looks like it's not | 16:25 |
atmark | My bad | 16:25 |
andrewbogott_ | unclear, it's certainly acting like it's required :) | 16:26 |
atmark | You can also check the logs for capi-controller-manager-xxx in the capi-system namespace | 16:26 |
atmark | Did you build your own image? | 16:27 |
andrewbogott_ | no, upstream image | 16:28 |
andrewbogott_ | this is weird... controller-manager logs say | 16:29 |
andrewbogott_ | cluster is not reachable: Get \"https://185.15.57.22:6443 | 16:29 |
andrewbogott_ | but that's the IP of a different loadbalancer, unrelated to this cluster | 16:29 |
andrewbogott_ | in a different project | 16:29 |
andrewbogott_ | the lb it created is test-cluster-07-i3wtgzrhm4q6-control-plane-x8nkr, IP 10.0.0.240 | 16:30 |
andrewbogott_ | hmmmm actually maybe I'm wrong, I don't know /what/ that IP is | 16:31 |
atmark | Did it assign floating IP to lb? | 16:31 |
* andrewbogott_ digs deeper | 16:31 |
andrewbogott_ | yes, you're right, it did. | 16:31 |
andrewbogott_ | So it is the right IP for the managed lb. But the lb is in an error state. | 16:31 |
andrewbogott_ | Also, I wish it wouldn't use that subnet for the lb IP. Maybe that's configured in the template somehow? | 16:32 |
atmark | Yes it's configurable in the template | 16:32 |
atmark | --external-network | 16:32 |
andrewbogott_ | ok, starting over with that :) | 16:32 |
andrewbogott_ | while I'm in here I guess I should change --network-driver from flannel (the default, I guess?) to calico? | 16:34 |
atmark | The capi driver doesn't respect --network-driver so it doesn't matter what it's set to. | 16:36 |
andrewbogott_ | ok :) | 16:37 |
atmark | I have set it to Calico but in the cluster I'm actually running Cilium | 16:37 |
andrewbogott_ | lol, will it /only/ create clusters with floating IPs assigned? I can't unset external_network and also can't set it to anything other than a floating range... | 16:43 |
andrewbogott_ | I guess I need to make a 'fake' floating IP range with internal IPs but I'd rather it just skipped it | 16:44 |
atmark | Here are the options in the template that it respects https://opendev.org/openstack/magnum-capi-helm/src/commit/3159a017a61b42a315ac0a2aa9383b317965d0ae/doc/source/configuration/index.rst | 16:44 |
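A hedged example of a template that sticks to options the capi-helm driver consumes; the image, network and flavor names are placeholders for this deployment, and the kube_tag value must match whatever the node image provides:

```bash
openstack coe cluster template create k8s-capi-template \
  --coe kubernetes \
  --image <capi-node-image> \
  --external-network <external-net-name> \
  --master-flavor <control-plane-flavor> \
  --flavor <worker-flavor> \
  --master-lb-enabled \
  --network-driver calico \
  --labels kube_tag=v1.30.0
```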
atmark | andrewbogott_: there's no logic in the driver to disable floating IP so your only choice is to edit the values.yaml in the chart | 16:48 |
atmark | You have to set this line to false https://github.com/azimuth-cloud/capi-helm-charts/blob/main/charts/openstack-cluster/values.yaml#L190 | 16:48 |
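If you go down that road, the override would look roughly like the fragment below; the exact key name at that line of values.yaml should be confirmed against the chart, `associateFloatingIP` under `apiServer` is an assumption here:

```yaml
# openstack-cluster chart values fragment (key name assumed; check the
# values.yaml linked above before editing)
apiServer:
  associateFloatingIP: false
```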
andrewbogott_ | makes sense | 16:49 |
andrewbogott_ | I'll live with it consuming a floating IP for now and see if I can figure out what connectivity wasn't working | 16:49 |
atmark | Do you allow NAT hairpinning? It looks like your workers couldn't reach that public IP from inside. | 16:52 |
atmark | Workers will connect to https://185.15.57.22:6443 endpoint to join the cluster. | 16:54 |
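A quick sketch of how to test that path from one of the worker VMs (or any VM on the same tenant network); any TLS or HTTP response at all, even a 401/403, shows the hairpinned endpoint is reachable:

```bash
# TCP reachability of the API endpoint the workers are told to join
nc -vz 185.15.57.22 6443

# TLS handshake check against the API server
curl -k https://185.15.57.22:6443/healthz
```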
andrewbogott_ | It should work, assuming the firewall rules are set up properly. Does magnum manage that or do I need to set up default security groups somehow? | 16:57 |
andrewbogott_ | Seems weird that they use the public network + the load balancer for what is clearly internal communication though. | 16:58 |
andrewbogott_ | *they | 16:58 |
atmark | By default, Magnum manages the security groups for the workload clusters. Are you using Octavia? | 17:00 |
andrewbogott_ | yes | 17:00 |
atmark | Does the octavia listener have any ACL? | 17:01 |
andrewbogott_ | I don't think so. The pool is in error state though, let me investigate that a bit... | 17:03 |
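Some Octavia commands that help narrow down which component is in ERROR (the load balancer itself, its listener, the pool, or the members); the amphora command generally needs admin credentials:

```bash
openstack loadbalancer list
openstack loadbalancer show <lb-id>
openstack loadbalancer listener list --loadbalancer <lb-id>
openstack loadbalancer pool list --loadbalancer <lb-id>
openstack loadbalancer member list <pool-id>

# Amphora state, if you can see it
openstack loadbalancer amphora list --loadbalancer <lb-id>
```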
andrewbogott_ | I set an internal network rather than leaving it on default and now it's making worker nodes! | 17:40 |
atmark | Nice | 17:50 |
atmark | I set mine to use an existing network as well | 17:51 |
atmark | btw, you can also reach the magnum devs on Kubernetes Slack in the #openstack-magnum channel | 17:55 |
andrewbogott_ | oh good to know! | 17:56 |
andrewbogott_ | Right now everything is looking good but the status still says create_in_progress. And my apparently very-underpowered management cluster is struggling so I'm trying to be patient :) | 17:57 |
atmark | andrewbogott_: that's a bug in the driver https://bugs.launchpad.net/magnum/+bug/2115207. As you can see in the comment section, someone uploaded a patch to fix that bug. I patched mine. | 18:01 |
atmark | There's a patch from upstream waiting to be merged https://review.opendev.org/c/openstack/magnum-capi-helm/+/950806 | 18:02 |
andrewbogott_ | oh, so it will never not say 'create in progress'? | 18:02 |
atmark | The patch is exactly the same. It's one change in the code. | 18:03 |
atmark | The status should transition to CREATE_COMPLETE | 18:05 |
* andrewbogott_ patches | 18:05 |
andrewbogott_ | what do you think, atmark, healthy or unhealthy? | 19:07 |
andrewbogott_ | https://www.irccloud.com/pastebin/DNczdr72/ | 19:07 |
andrewbogott_ | magnum still thinks it's creating but it sounds like that is not reliable. | 19:08 |
atmark | Did you patch the driver? | 19:33 |
atmark | What does this command show you? `openstack coe cluster show -c health_status_reason $name` | 19:36 |
-opendevstatus- NOTICE: The Gerrit service on review.opendev.org will be offline briefly for a configuration and version update, but should return to service momentarily | 20:06 |
andrewbogott_ | I did patch the driver. health status displays as {} | 20:41 |
andrewbogott_ | or, I mean, health_status_reason | 20:41 |
atmark | is the status still stuck in CREATE_IN_PROGRESS? | 21:15 |
atmark | Not sure why openstack-cinder-csi-controllerplugin-7b66d6bb65-6v787 pod keeps crashing | 21:16 |