opendevreview | Marcus Klein proposed openstack/openstack-ansible-ops master: Add Prometheus Mysqld exporter https://review.opendev.org/c/openstack/openstack-ansible-ops/+/903858 | 08:50 |
opendevreview | Marcus Klein proposed openstack/openstack-ansible-ops master: Add Prometheus Mysqld exporter https://review.opendev.org/c/openstack/openstack-ansible-ops/+/903858 | 11:32 |
opendevreview | Marcus Klein proposed openstack/openstack-ansible-ops master: Add Prometheus Mysqld exporter https://review.opendev.org/c/openstack/openstack-ansible-ops/+/903858 | 13:29 |
deflated | Hi all, me again. I managed to mostly fix the ceph repo issue; it's still generating another ceph repo in each container that I have to delete, but once I do that it works. Moving on, I've noticed my external VIP is binding to br-mgmt, which I'm sure isn't right. How do I set it to my external/public network? No matter what I set haproxy_keepalived_external_interface/haproxy_bind_external_lb_vip_interface to, it | 14:00 |
deflated | either won't attach to anything or still attaches to br-mgmt; if I leave it blank it attaches to br-mgmt | 14:00 |
deflated | if this is intended I'll move on, if not, help is appreciated | 14:00 |
jrosser | you can make the external VIP be whatever you need | 14:11 |
deflated | I've tried, and it doesn't seem to honour what I set in user_variables | 14:11 |
deflated | on 28.0.0 btw | 14:12 |
jrosser | can you share what you set? | 14:12 |
deflated | haproxy_keepalived_external_vip_cidr originally; then, when it bound to br-mgmt, I tried setting haproxy_keepalived_external_interface/haproxy_bind_external_lb_vip_interface to the wanted interface | 14:13 |
deflated | for the CIDR I used the desired IP/subnet, of course | 14:14 |
deflated | I can ping and access the interface/network I am trying to attach to | 14:14 |
jrosser | here is some of my config https://paste.opendev.org/show/bRAsO7OBq3daaIRlg3xm/ | 14:14 |
deflated | yeah, those are the same as the ones I set (different values of course) | 14:15 |
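For context, a minimal sketch of the user_variables.yml entries being discussed, assuming placeholder bridge names and addresses rather than the values from the linked paste:

```yaml
# /etc/openstack_deploy/user_variables.yml (sketch, placeholder values)
# VIP address/prefix that keepalived manages on the external side
haproxy_keepalived_external_vip_cidr: "203.0.113.10/24"
# interface keepalived adds the external VIP to
haproxy_keepalived_external_interface: br-vlan
# interface haproxy binds its external frontends to (usually the same one)
haproxy_bind_external_lb_vip_interface: br-vlan
# internal counterparts normally stay on the management bridge
haproxy_keepalived_internal_vip_cidr: "172.29.236.9/22"
haproxy_keepalived_internal_interface: br-mgmt
haproxy_bind_internal_lb_vip_interface: br-mgmt
```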
jrosser | and which playbook are you running | 14:15 |
deflated | hosts/infra.yml with --limit 'haproxy_all' to test my changes | 14:16 |
deflated | pretty sure i only need to run infra but i tried hosts for my own sanity tbh | 14:17 |
jrosser | you can run `openstack-ansible playbooks/haproxy-install.yml` | 14:18 |
deflated | just noticed your interfaces don't have quotes, does that matter? | 14:18 |
deflated | ah ok, will do that from now | 14:18 |
jrosser | they are just yaml strings, so it should be fine | 14:18 |
jrosser | deflated: then the corresponding part of my openstack_user_config.yml is https://paste.opendev.org/show/bPWqqGHBgvWB1JGH8QVe/ | 14:22 |
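Likewise, a hedged sketch of the relevant openstack_user_config.yml pieces; the addresses and host names here are placeholders, not the contents of the linked paste:

```yaml
# /etc/openstack_deploy/openstack_user_config.yml (sketch, placeholder values)
global_overrides:
  internal_lb_vip_address: 172.29.236.9   # internal VIP, lives on br-mgmt
  external_lb_vip_address: 203.0.113.10   # public/external VIP

haproxy_hosts:
  infra1:
    ip: 172.29.236.11
  infra2:
    ip: 172.29.236.12
  infra3:
    ip: 172.29.236.13
```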
jrosser | deflated: do you have more than one infra node? | 14:23 |
deflated | yep, also have those set to match | 14:23 |
jrosser | actually i mean, are you running more than one haproxy instance | 14:23 |
deflated | yeah, have 3, all identical | 14:23 |
jrosser | so you should then be able to check the keepalived config and haproxy config on those nodes | 14:24 |
deflated | i checked /etc/haproxy/haproxy.cfg and it states the right bridge in the text but it's not actually attaching | 14:25 |
jrosser | well it wouldn't | 14:25 |
jrosser | because in an HA deployment, keepalived is responsible for the VIP | 14:25 |
jrosser | deflated: can i just double check that you are not getting mixed up between haproxy_bind_internal_lb_vip_interface and haproxy_keepalived_internal_interface | 14:28 |
deflated | checking keepalived also shows me the correct virtual_ipaddress and bridge | 14:28 |
deflated | no, I don't have lb_vip set in variables | 14:28 |
jrosser | can you please explain more `i checked /etc/haproxy/haproxy.cfg and it states the right bridge in the text but it's not actually attaching` | 14:29 |
spatel | jrosser morning | 15:11 |
jrosser | o/ hello there | 15:11 |
spatel | I am playing with magnum-cluster-api and seeing this error in magnum - https://paste.opendev.org/show/btsoaa2SjauhVIkWq3uA/ | 15:12 |
jrosser | i see you all over the ML and slack and irc :) | 15:12 |
spatel | :D | 15:12 |
spatel | I am desperate to make it work because a customer is looking for an alternative solution | 15:12 |
jrosser | you can only use calico | 15:12 |
spatel | I am frustrated because there aren't enough docs for this stuff.. :( | 15:13 |
spatel | I am using calico in my template | 15:13 |
jrosser | oh no, actually that is a magnum.conf problem | 15:14 |
jrosser | this is all in my patches for OSA | 15:14 |
jrosser | magnum.conf must say that *only* calico is allowed | 15:15 |
spatel | Do you know config option which I can put manually? | 15:16 |
spatel | let me add allowed_network_drivers=calico in magnum.conf | 15:17 |
jrosser | https://review.opendev.org/c/openstack/openstack-ansible/+/893240/31/tests/roles/bootstrap-host/templates/user_variables_k8s.yml.j2 | 15:17 |
jrosser | don't just copy/paste the whole lot, it needs understanding | 15:18 |
spatel | I am using kolla-ansible :( but I can compile the info required for it | 15:19 |
jrosser | imho there should be proper documentation with the deployment tools | 15:19 |
jrosser | otherwise it is a total nightmare | 15:20 |
jrosser | but do you know enough about how openstack-ansible overrides work to be able to translate magnum_magnum_conf_overrides in OSA into the equivalent in kolla? | 15:20 |
spatel | I do have template with calico driver - https://paste.opendev.org/show/b79POHXj4tWB8S1Aubdz/ | 15:20 |
jrosser | yes, but like I say, the magnum-cluster-api driver is validating that magnum.conf allows *only* calico | 15:21 |
jrosser | not in your cluster template | 15:21 |
spatel | ok.. let me add in magnum.conf file | 15:23 |
spatel | is this correct flag - allowed_network_drivers=calico | 15:23 |
jrosser | do you have barbican? | 15:23 |
spatel | no | 15:23 |
jrosser | you can see in my patch that I set kubernetes_allowed_network_drivers and kubernetes_default_network_driver in the [cluster_template] config section | 15:24 |
spatel | ok.. let me try and i will get back to you | 15:25 |
jrosser | and if you do not have barbican then you also need cert_manager_type: x509keypair in [certificates] if it is not already like that | 15:25 |
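Pulling the points above together, a sketch of how those magnum.conf settings could be expressed through the OSA override variable mentioned earlier; kolla-ansible has its own per-service config override mechanism for the same INI sections:

```yaml
# user_variables.yml (sketch of the magnum.conf settings discussed above)
magnum_magnum_conf_overrides:
  cluster_template:
    # the magnum-cluster-api driver validates that only calico is allowed
    kubernetes_allowed_network_drivers: calico
    kubernetes_default_network_driver: calico
  certificates:
    # needed when barbican is not deployed
    cert_manager_type: x509keypair
```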
spatel | I don't have barbican | 15:25 |
spatel | so cert_manager_type: x509keypair in [certificates] goes in magnum.conf? | 15:26 |
jrosser | slow down :) | 15:26 |
jrosser | look at my patch | 15:26 |
spatel | ok.. :) | 15:27 |
spatel | Give me a few min.. stuck in a meeting.. | 15:37 |
spatel | jrosser are you running kind cluster in OSA? | 15:38 |
jrosser | no, i have used the vexxhost.kubernetes ansible collection to deploy the control plane cluster | 15:42 |
spatel | jrosser I am getting this error now - https://paste.opendev.org/show/bv9F7fK4xLGT4dCsTBif/ | 15:47 |
jrosser | you have to debug | 15:47 |
spatel | I have enabled debug but no interesting logs there.. let me show you | 15:48 |
spatel | I am using this to deploy controlplane - https://github.com/vexxhost/magnum-cluster-api/blob/main/hack/stack.sh#L128C1-L140C45 | 15:48 |
jrosser | i am going to guess that this is because your magnum container does not trust the certificate in the k8s endpoint | 15:48 |
spatel | kubectl command works from magnum container | 15:49 |
spatel | jrosser how does magnum know that it has to talk to the CAPI node? | 15:57 |
jrosser | the credentials and CA and endpoint are all in the .kube/config | 15:59 |
jrosser | so if you have deleted/recreated your control plane cluster but not copied the updated .kube/config to your magnum container, you could have difficulty | 16:00 |
jrosser | which would certainly lead to SSL errors as the CA will be different | 16:00 |
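For reference, the endpoint, CA and credentials mentioned here live in the standard kubeconfig layout; a trimmed sketch, with the server address and data fields as placeholders:

```yaml
# ~/.kube/config (trimmed sketch)
apiVersion: v1
kind: Config
clusters:
  - name: management
    cluster:
      server: https://10.0.0.5:6443              # control plane k8s API endpoint
      certificate-authority-data: <base64 CA>    # changes if the cluster is recreated
users:
  - name: admin
    user:
      client-certificate-data: <base64 cert>
      client-key-data: <base64 key>
contexts:
  - name: admin@management
    context:
      cluster: management
      user: admin
current-context: admin@management
```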
spatel | jrosser check this out - https://paste.opendev.org/ | 16:00 |
spatel | I do copy .kube/config when I rebuild my kind cluster | 16:01 |
jrosser | and you restart magnum conductor? (i don't know if this is needed, not sure about the lifecycle of the config) | 16:02 |
jrosser | btw the paste link is incomplete | 16:02 |
spatel | I am always restarting all containers | 16:02 |
jrosser | and have you looked at the log for magnum conductor | 16:03 |
spatel | jrosser https://pastebin.com/gVkvDmVd | 16:04 |
jrosser | i mean specifically for the SSL errors you see in the cluster status | 16:06 |
spatel | jrosser let me verify SSL again | 16:09 |
jrosser | spatel: mgariepy heres how my magnum diagram is so far https://pasteboard.co/XtSEagQfxwgv.png | 16:18 |
spatel | jrosser I got new error this time - https://paste.opendev.org/show/bIcRHJDJJVlQiTr82bfO/ | 16:18 |
spatel | jrosser +++1 for diagram :) | 16:19 |
jrosser | spatel: i have no idea on your error | 16:20 |
spatel | jrosser did you use this code to deploy capi control plane - https://github.com/vexxhost/magnum-cluster-api/blob/main/hack/stack.sh#L128C1-L140C45 | 16:21 |
jrosser | the diagram is "full fat / max complexity" deployment, lots is optional and probably not required | 16:21 |
jrosser | spatel: no i did not | 16:22 |
spatel | can you point me what did you use to deploy capi ? | 16:22 |
jrosser | what version did you install? | 16:22 |
jrosser | spatel: i used this https://review.opendev.org/c/openstack/openstack-ansible/+/893240 | 16:26 |
spatel | jrosser look like progress, I am seeing - CREATE_IN_PROGRESS | 16:39 |
spatel | fingers crossed | 16:39 |
spatel | What is the command to check progress? in heat we can see resources but what is the command in CAPI? | 16:42 |
jrosser | spatel: hah that is a great question indeed | 16:44 |
jrosser | to start with i think you can see some of the progress in magnum conductor | 16:45 |
jrosser | you can try something like `kubectl -n capo-system logs deploy/capo-controller-manager` | 16:46 |
jrosser | spatel: do you have octavia deployed? | 16:48 |
deflated | jrosser so sorry I didn't get back to you earlier, my son had an accident at school. I seem to have found the problem: the bridge I was using was set to manual with no IP, and setting it to static with an IP has caused the VIP to be created as a secondary address. I figured this out by trying another network and then analysing the differences, which was the IP. I've checked and I can't see anything in the docs that | 16:58 |
deflated | states the bridge for the VIP requires an IP | 16:58 |
jrosser | well "it depends" | 16:58 |
jrosser | if it was your external interface for neutron routers / floating IP then there would be no need for an IP on the bridge | 16:59 |
jrosser | and ultimately it pretty much depends how you want it to work | 17:00 |
jrosser | if you were using real internet ipv4 for this then it might be quite reasonable not to want to "waste" a public ipv4 address on each node, as well as the VIP | 17:00 |
jrosser | the thing with openstack-ansible is that almost anything is possible, like a toolkit really | 17:01 |
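As an illustration of the fix described above, a netplan sketch of a VIP bridge with its own static address so that keepalived can add the VIP as a secondary address. This assumes Ubuntu/netplan and a plain Linux bridge; the file name, interface names and addresses are placeholders, and an OVS bridge would be configured differently:

```yaml
# /etc/netplan/01-bridges.yaml (sketch, placeholder values)
network:
  version: 2
  ethernets:
    eno2: {}
  bridges:
    br-vlan:
      interfaces: [eno2]
      addresses:
        - 203.0.113.11/24   # the host's own address; keepalived adds the VIP alongside it
```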
deflated | I'm just happy i figured it out, it's a big learning curve, I can imagine i'm going to run into more caveats when this goes from testing to production | 17:03 |
jrosser | oh sure, I totally understand about the learning curve | 17:03 |
deflated | having my settings confirmed helped me dig deeper so thanks for that | 17:03 |
jrosser | it's a very different thing to a shrink-wrap install where all the decisions are made for you | 17:03 |
jrosser | flip-side of that is, almost anything is possible | 17:04 |
deflated | i've been modding things my whole life, i much prefer to tinker and learn than be handed it on a platter | 17:04 |
jrosser | as an example, my API endpoints / horizon are on a different interface and subnet to the neutron networks | 17:04 |
jrosser | just because I chose it to be that way | 17:04 |
jrosser | fwiw most of the active people here in openstack-ansible IRC are operating clouds, and are contributing to the code | 17:05 |
deflated | currently running infrastructure, then on to openstack; I have run this before and had a ceph key error for gnocchi that I'll post up later if it reoccurs (probably tomorrow, it's almost the end of my work day) | 17:05 |
jrosser | so theres quite a good perspective on what works, and whats necessary | 17:06 |
jrosser | ah ok, i don't run the telemetry stack so don't have any hands on experience with gnocchi | 17:06 |
deflated | i have spent a bit of time learning and following the tracker on opendev, i think i need to make an account to better understand the process and then i think i'll submit an updated network setup for ovs as i may just have it working | 17:08 |
jrosser | cool - be sure to ask networking things of jamesdenton too | 17:09 |
jrosser | fwiw, OVS should 'just work' if you've followed how the all-in-one is setup | 17:09 |
jrosser | and also, new deployments probably should be using OVN | 17:09 |
spatel | jrosser yes I do have octavia | 17:10 |
deflated | i actually found his blog a while back and it helped to understand the transition from lb to ovs, i am using ovn, my bonds and bridges are however ovs | 17:11 |
jrosser | spatel: so you should be able to follow the creation of the loadbalancer, security groups, router, network,..... by cluster_api | 17:11 |
spatel | My cluster stuck in CREATE_IN_PROGRESS | 17:14 |
jrosser | right - you need to find out what it is trying to do | 17:14 |
spatel | nova list - I can see only single vm created - k8s-clusterapi-cluster-magnum-system-kube-5n49h | 17:14 |
jrosser | did you setup an ssh key with your cluster template? | 17:14 |
spatel | I think not.. that is my next step to add ssh key and re-create cluster | 17:15 |
jrosser | yes, definitely do that for debugging | 17:15 |
jrosser | spatel: so another question - can your control plane k8s contact the API endpoint on your created workload cluster | 17:16 |
jrosser | you either need "some networking" that makes that work / a floating IP to be created on the octavia LB / or use the magnum-cluster-api-proxy service | 17:17 |
spatel | vm has public floating IP so my k8s should be able to reach it | 17:18 |
spatel | I meant k8s-clusterapi-cluster-magnum-system-kube-5n49h vm | 17:18 |
jrosser | no, i mean floating IP on the loadbalancer | 17:18 |
spatel | I can't see any octavia instance yet | 17:19 |
jrosser | i think the default is that it's enabled actually | 17:19 |
spatel | I can see only single VM spun up with name of - kube-5n49h-7jkxl-245s5 | 17:20 |
spatel | Assuming this is master node | 17:20 |
deflated | spatel, you can ssh into the vm as soon as it creates the node and run journalctl -f to watch for errors. I'm of course only just entering the convo, but I haven't seen what kube version you are using? certain versions will fail no matter how hard you try | 17:25 |
spatel | deflated these are all pre-built images so the version should work. I have a feeling that my openstack endpoints are not reachable from the kube VMs because the endpoints are not on a public network. | 17:27 |
spatel | I am debugging it and see what is going on | 17:27 |
deflated | ah ok, assumed you were building from a coreos image | 17:27 |
jrosser | a public ip doesn't matter | 17:28 |
jrosser | the magnum vm should nat out through the neutron router to your public endpoint | 17:28 |
jrosser | the floating ip is necessary for the control plane k8s cluster to see the workload cluster api | 17:28 |
jrosser | deflated: this is all new exciting stuff using cluster-api rather than the heat/coreos driver in magnum | 17:30 |
deflated | great, another subject to learn lol, guess more research is in order | 17:31 |
spatel | jrosser my controllers are all running on private IPs, and if the kube VMs are running on public IPs then they can't talk to the openstack endpoints. | 17:49 |
spatel | I am setting up a VM with nginx to expose all the endpoints on a public IP, and then I will update the keystone public catalog entries to point at my nginx public IP | 17:49 |
spatel | I believe the k8s workload VMs need to talk to the openstack endpoints, otherwise it won't work | 17:50 |
jrosser | yes, and a network and router are created for this | 17:56 |
jrosser | spatel: it’s totally not needed to make extra nginx | 17:56 |
jrosser | oh wait? you don’t have public endpoint? | 17:57 |
spatel | no | 18:08 |
spatel | not yet.. I am setting it up now with nginx | 18:08 |
spatel | any idea about this error in novaconsole logs - handler exception: The token '***' is invalid or has expired | 20:23 |