opendevreview | Marcus Klein proposed openstack/openstack-ansible-ops master: Add Prometheus Mysqld exporter https://review.opendev.org/c/openstack/openstack-ansible-ops/+/903858 | 08:50 |
opendevreview | Marcus Klein proposed openstack/openstack-ansible-ops master: Add Prometheus Mysqld exporter https://review.opendev.org/c/openstack/openstack-ansible-ops/+/903858 | 11:32 |
opendevreview | Marcus Klein proposed openstack/openstack-ansible-ops master: Add Prometheus Mysqld exporter https://review.opendev.org/c/openstack/openstack-ansible-ops/+/903858 | 13:29 |
deflated | Hi all, me again. I managed to mostly fix the ceph repo issue; it's still generating another ceph repo in each container that I have to delete, but once I do that it works. Moving on, I've noticed my external VIP is binding to br-mgmt, which I'm sure isn't right. How do I set it to my external/public network? No matter what I set haproxy_keepalived_external_interface/haproxy_bind_external_lb_vip_interface to, it | 14:00 |
deflated | either won't attach to anything or still attaches to br-mgmt; if I leave it blank it attaches to br-mgmt | 14:00 |
deflated | if this is intended I'll move on, if not, help is appreciated | 14:00 |
jrosser | you can make the external VIP be whatever you need | 14:11 |
deflated | I've tried, and it doesn't seem to honour what I set in user_variables | 14:11 |
deflated | on 28.0.0 btw | 14:12 |
jrosser | can you share what you set? | 14:12 |
deflated | haproxy_keepalived_external_vip_cidr originally; then, when it bound to br-mgmt, I tried setting haproxy_keepalived_external_interface/haproxy_bind_external_lb_vip_interface to the wanted interface | 14:13 |
deflated | for the CIDR I used the desired IP/subnet, of course | 14:14 |
deflated | I can ping and access the interface/network I am trying to attach to | 14:14 |
jrosser | here is some of my config https://paste.opendev.org/show/bRAsO7OBq3daaIRlg3xm/ | 14:14 |
deflated | yeah, those are the same as the ones I set (different values of course) | 14:15 |
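For context, a minimal sketch of the user_variables.yml entries being discussed, assuming placeholder bridge names and addresses rather than the values from the linked paste:

```yaml
# /etc/openstack_deploy/user_variables.yml (sketch, placeholder values)
# VIP address/prefix that keepalived manages on the external side
haproxy_keepalived_external_vip_cidr: "203.0.113.10/24"
# interface keepalived adds the external VIP to
haproxy_keepalived_external_interface: br-vlan
# interface haproxy binds its external frontends to (usually the same one)
haproxy_bind_external_lb_vip_interface: br-vlan
# internal counterparts normally stay on the management bridge
haproxy_keepalived_internal_vip_cidr: "172.29.236.9/22"
haproxy_keepalived_internal_interface: br-mgmt
haproxy_bind_internal_lb_vip_interface: br-mgmt
```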
jrosser | and which playbook are you running | 14:15 |
deflated | hosts/infra.yml with --limit 'haproxy_all' to test my changes | 14:16 |
deflated | pretty sure i only need to run infra but i tried hosts for my own sanity tbh | 14:17 |
jrosser | you can run `openstack-ansible playbooks/haproxy-install.yml` | 14:18 |
deflated | just noticed your interfaces don't have quotes, does that matter? | 14:18 |
deflated | ah ok, will do that from now | 14:18 |
jrosser | they are just yaml strings, so it should be fine | 14:18 |
jrosser | deflated: then the corresponding part of my openstack_user_config.yml is https://paste.opendev.org/show/bPWqqGHBgvWB1JGH8QVe/ | 14:22 |
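Likewise, a hedged sketch of the relevant openstack_user_config.yml pieces; the addresses and host names here are placeholders, not the contents of the linked paste:

```yaml
# /etc/openstack_deploy/openstack_user_config.yml (sketch, placeholder values)
global_overrides:
  internal_lb_vip_address: 172.29.236.9   # internal VIP, lives on br-mgmt
  external_lb_vip_address: 203.0.113.10   # public/external VIP

haproxy_hosts:
  infra1:
    ip: 172.29.236.11
  infra2:
    ip: 172.29.236.12
  infra3:
    ip: 172.29.236.13
```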
jrosser | deflated: do you have more than one infra node? | 14:23 |
deflated | yep, also have those set to match | 14:23 |
jrosser | actually i mean, are you running more than one haproxy instance | 14:23 |
deflated | yeah, have 3, all identical | 14:23 |
jrosser | so you should then be able to check the keepalived config and haproxy config on those nodes | 14:24 |
deflated | i checked /etc/haproxy/haproxy.cfg and it states the right bridge in the text but it's not actually attaching | 14:25 |
jrosser | well it wouldn't | 14:25 |
jrosser | because in an HA deployment, keepalived is responsible for the VIP | 14:25 |
jrosser | deflated: can i just double check that you are not getting mixed up between haproxy_bind_internal_lb_vip_interface and haproxy_keepalived_internal_interface | 14:28 |
deflated | checking keepalived also shows me the correct virtual_ipaddress and bridge | 14:28 |
deflated | no, I don't have lb_vip set in variables | 14:28 |
jrosser | can you please explain more `i checked /etc/haproxy/haproxy.cfg and it states the right bridge in the text but it's not actually attaching` | 14:29 |
spatel | jrosser morning | 15:11 |
jrosser | o/ hello there | 15:11 |
spatel | I am playing with magnum-cluster-api and seeing this error in magnum - https://paste.opendev.org/show/btsoaa2SjauhVIkWq3uA/ | 15:12 |
jrosser | i see you all over the ML and slack and irc :) | 15:12 |
spatel | :D | 15:12 |
spatel | I am desperate to make it work because a customer is looking for an alternative solution | 15:12 |
jrosser | you can only use calico | 15:12 |
spatel | I am frustrated because there aren't enough docs for this stuff.. :( | 15:13 |
spatel | I am using calico in my template | 15:13 |
jrosser | oh no, actually that is a magnum.conf problem | 15:14 |
jrosser | this is all in my patches for OSA | 15:14 |
jrosser | magnum.conf must say that *only* calico is allowed | 15:15 |
spatel | Do you know config option which I can put manually? | 15:16 |
spatel | let me add allowed_network_drivers=calico in magnum.conf | 15:17 |
jrosser | https://review.opendev.org/c/openstack/openstack-ansible/+/893240/31/tests/roles/bootstrap-host/templates/user_variables_k8s.yml.j2 | 15:17 |
jrosser | don't just copy/paste the whole lot, it needs understanding | 15:18 |
spatel | I am using kolla-ansible :( but I can compile the info required for it | 15:19 |
jrosser | imho there should be proper documentation with the deployment tools | 15:19 |
jrosser | otherwise it is a total nightmare | 15:20 |
jrosser | but do you know enough about how openstack-ansible overrides work to be able to translate magnum_magnum_conf_overrides in OSA into the equivalent in kolla? | 15:20 |
spatel | I do have template with calico driver - https://paste.opendev.org/show/b79POHXj4tWB8S1Aubdz/ | 15:20 |
jrosser | yes, but like I say, the magnum-cluster-api driver is validating that magnum.conf allows *only* calico | 15:21 |
jrosser | not in your cluster template | 15:21 |
spatel | ok.. let me add in magnum.conf file | 15:23 |
spatel | is this correct flag - allowed_network_drivers=calico | 15:23 |
jrosser | do you have barbican? | 15:23 |
spatel | no | 15:23 |
jrosser | you can see in my patch that I set kubernetes_allowed_network_drivers and kubernetes_default_network_driver in the [cluster_template] config section | 15:24 |
spatel | ok.. let me try and i will get back to you | 15:25 |
jrosser | and if you do not have barbican then you also need cert_manager_type: x509keypair in [certificates] if it is not already like that | 15:25 |
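Pulling the points above together, a sketch of how those magnum.conf settings could be expressed through the OSA override variable mentioned earlier; kolla-ansible has its own per-service config override mechanism for the same INI sections:

```yaml
# user_variables.yml (sketch of the magnum.conf settings discussed above)
magnum_magnum_conf_overrides:
  cluster_template:
    # the magnum-cluster-api driver validates that only calico is allowed
    kubernetes_allowed_network_drivers: calico
    kubernetes_default_network_driver: calico
  certificates:
    # needed when barbican is not deployed
    cert_manager_type: x509keypair
```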
spatel | I don't have barbican | 15:25 |
spatel | so cert_manager_type: x509keypair in [certificates] goes in magnum.conf? | 15:26 |
jrosser | slow down :) | 15:26 |
jrosser | look at my patch | 15:26 |
spatel | ok.. :) | 15:27 |
spatel | Give me a few min.. stuck in a meeting.. | 15:37 |
spatel | jrosser are you running kind cluster in OSA? | 15:38 |
jrosser | no, i have used the vexxhost.kubernetes ansible collection to deploy the control plane cluster | 15:42 |
spatel | jrosser I am getting this error now - https://paste.opendev.org/show/bv9F7fK4xLGT4dCsTBif/ | 15:47 |
jrosser | you have to debug | 15:47 |
spatel | I have enabled debug but no interesting logs there.. let me show you | 15:48 |
spatel | I am using this to deploy controlplane - https://github.com/vexxhost/magnum-cluster-api/blob/main/hack/stack.sh#L128C1-L140C45 | 15:48 |
jrosser | i am going to guess that this is because your magnum container does not trust the certificate in the k8s endpoint | 15:48 |
spatel | kubectl command works from magnum container | 15:49 |
spatel | jrosser how does magnum know that it has to talk to the CAPI node? | 15:57 |
jrosser | the credentials and CA and endpoint are all in the .kube/config | 15:59 |
jrosser | so if you have deleted/recreated your control plane cluster but not copied the updated .kube/config to your magnum container, you could have difficulty | 16:00 |
jrosser | which would certainly lead to SSL errors as the CA will be different | 16:00 |
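For reference, the endpoint, CA and credentials mentioned here live in the standard kubeconfig layout; a trimmed sketch, with the server address and data fields as placeholders:

```yaml
# ~/.kube/config (trimmed sketch)
apiVersion: v1
kind: Config
clusters:
  - name: management
    cluster:
      server: https://10.0.0.5:6443              # control plane k8s API endpoint
      certificate-authority-data: <base64 CA>    # changes if the cluster is recreated
users:
  - name: admin
    user:
      client-certificate-data: <base64 cert>
      client-key-data: <base64 key>
contexts:
  - name: admin@management
    context:
      cluster: management
      user: admin
current-context: admin@management
```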
spatel | jrosser check this out - https://paste.opendev.org/ | 16:00 |
spatel | I do copy .kube/config when I rebuild my kind cluster | 16:01 |
jrosser | and you restart magnum conductor? (i don't know if this is needed, not sure about the lifecycle of the config) | 16:02 |
jrosser | btw the paste link is incomplete | 16:02 |
spatel | I am always restarting all containers | 16:02 |
jrosser | and have you looked at the log for magnum conductor | 16:03 |
spatel | jrosser https://pastebin.com/gVkvDmVd | 16:04 |
jrosser | i mean specifically for the SSL errors you see in the cluster status | 16:06 |
spatel | jrosser let me verify SSL again | 16:09 |
jrosser | spatel: mgariepy heres how my magnum diagram is so far https://pasteboard.co/XtSEagQfxwgv.png | 16:18 |
spatel | jrosser I got new error this time - https://paste.opendev.org/show/bIcRHJDJJVlQiTr82bfO/ | 16:18 |
spatel | jrosser +++1 for diagram :) | 16:19 |
jrosser | spatel: i have no idea on your error | 16:20 |
spatel | jrosser did you use this code to deploy capi control plane - https://github.com/vexxhost/magnum-cluster-api/blob/main/hack/stack.sh#L128C1-L140C45 | 16:21 |
jrosser | the diagram is "full fat / max complexity" deployment, lots is optional and probably not required | 16:21 |
jrosser | spatel: no i did not | 16:22 |
spatel | can you point me what did you use to deploy capi ? | 16:22 |
jrosser | what version did you install? | 16:22 |
jrosser | spatel: i used this https://review.opendev.org/c/openstack/openstack-ansible/+/893240 | 16:26 |
spatel | jrosser look like progress, I am seeing - CREATE_IN_PROGRESS | 16:39 |
spatel | fingers crossed | 16:39 |
spatel | What is the command to check progress? in heat we can see resources but what is the command in CAPI? | 16:42 |
jrosser | spatel: hah that is a great question indeed | 16:44 |
jrosser | to start with i think you can see some of the progress in magnum conductor | 16:45 |
jrosser | you can try something like `kubectl -n capo-system logs deploy/capo-controller-manager` | 16:46 |
jrosser | spatel: do you have octavia deployed? | 16:48 |
deflated | jrosser so sorry I didn't get back to you earlier, my son had an accident at school. I seem to have found the problem: the bridge I was using was set to manual with no IP, and setting it to static with an IP has caused the VIP to be created as a secondary address. I figured this out by trying another network and then analysing the differences, which was the IP. I've checked and I can't see anything in the docs that | 16:58 |
deflated | states the bridge for the VIP requires an IP | 16:58 |
jrosser | well "it depends" | 16:58 |
jrosser | if it was your external interface for neutron routers / floating IP then there would be no need for an IP on the bridge | 16:59 |
jrosser | and ultimately it pretty much depends how you want it to work | 17:00 |
jrosser | if you were using real internet ipv4 for this then it might be quite reasonable not to want to "waste" a public ipv4 address on each node, as well as the VIP | 17:00 |
jrosser | the thing with openstack-ansible is that almost anything is possible, like a toolkit really | 17:01 |
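As an illustration of the fix described above, a netplan sketch of a VIP bridge with its own static address so that keepalived can add the VIP as a secondary address. This assumes Ubuntu/netplan and a plain Linux bridge; the file name, interface names and addresses are placeholders, and an OVS bridge would be configured differently:

```yaml
# /etc/netplan/01-bridges.yaml (sketch, placeholder values)
network:
  version: 2
  ethernets:
    eno2: {}
  bridges:
    br-vlan:
      interfaces: [eno2]
      addresses:
        - 203.0.113.11/24   # the host's own address; keepalived adds the VIP alongside it
```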
deflated | I'm just happy i figured it out, it's a big learning curve, I can imagine i'm going to run into more caveats when this goes from testing to production | 17:03 |
jrosser | oh sure, I totally understand about the learning curve | 17:03 |
deflated | having my settings confirmed helped me dig deeper so thanks for that | 17:03 |
jrosser | it's a very different thing to a shrink-wrap install where all the decisions are made for you | 17:03 |
jrosser | flip-side of that is, almost anything is possible | 17:04 |
deflated | i've been modding things my whole life, i much prefer to tinker and learn than be handed it on a platter | 17:04 |
jrosser | as an example, my API endpoints / horizon are on a different interface and subnet to the neutron networks | 17:04 |
jrosser | just because I chose it to be that way | 17:04 |
jrosser | fwiw most of the active people here in openstack-ansible IRC are operating clouds, and are contributing to the code | 17:05 |
deflated | currently running infrastructure, then on to openstack; I have run this before and had a ceph key error for gnocchi that I'll post up later if it reoccurs (probably tomorrow, it's almost the end of my work day) | 17:05 |
jrosser | so theres quite a good perspective on what works, and whats necessary | 17:06 |
jrosser | ah ok, i don't run the telemetry stack so don't have any hands on experience with gnocchi | 17:06 |
deflated | i have spent a bit of time learning and following the tracker on opendev, i think i need to make an account to better understand the process and then i think i'll submit an updated network setup for ovs as i may just have it working | 17:08 |
jrosser | cool - be sure to ask networking things of jamesdenton too | 17:09 |
jrosser | fwiw, OVS should 'just work' if you've followed how the all-in-one is setup | 17:09 |
jrosser | and also, new deployments probably should be using OVN | 17:09 |
spatel | jrosser yes I do have octavia | 17:10 |
deflated | i actually found his blog a while back and it helped to understand the transition from lb to ovs, i am using ovn, my bonds and bridges are however ovs | 17:11 |
jrosser | spatel: so you should be able to follow the creation of the loadbalancer, security groups, router, network,..... by cluster_api | 17:11 |
spatel | My cluster stuck in CREATE_IN_PROGRESS | 17:14 |
jrosser | right - you need to find out what it is trying to do | 17:14 |
spatel | nova list - I can see only single vm created - k8s-clusterapi-cluster-magnum-system-kube-5n49h | 17:14 |
jrosser | did you setup an ssh key with your cluster template? | 17:14 |
spatel | I think not.. that is my next step to add ssh key and re-create cluster | 17:15 |
jrosser | yes, definitely do that for debugging | 17:15 |
jrosser | spatel: so another question - can your control plane k8s contact the API endpoint on your created workload cluster | 17:16 |
jrosser | you either need "some networking" that makes that work / a floating IP to be created on the octavia LB / or use the magnum-cluster-api-proxy service | 17:17 |
spatel | vm has public floating IP so my k8s should be able to reach it | 17:18 |
spatel | I meant k8s-clusterapi-cluster-magnum-system-kube-5n49h vm | 17:18 |
jrosser | no, i mean floating IP on the loadbalancer | 17:18 |
spatel | I can't see any octavia instance yet | 17:19 |
jrosser | i think the default is that it's enabled actually | 17:19 |
spatel | I can see only single VM spun up with name of - kube-5n49h-7jkxl-245s5 | 17:20 |
spatel | Assuming this is master node | 17:20 |
deflated | spatel, you can ssh into the vm as soon as it creates the node and run journalctl -f to watch for errors. I'm of course only just entering the convo, but I haven't seen what kube version you are using? certain versions will fail no matter how hard you try | 17:25 |
spatel | deflated these are all pre-built images so the version should work. I have a feeling that my openstack endpoints are not reachable from the kube VMs because the endpoints are not on a public network. | 17:27 |
spatel | I am debugging it and see what is going on | 17:27 |
deflated | ah ok, assumed you were building from a coreos image | 17:27 |
jrosser | a public ip doesn't matter | 17:28 |
jrosser | the magnum vm should nat out through the neutron router to your public endpoint | 17:28 |
jrosser | the floating ip is necessary for the control plane k8s cluster to see the workload cluster api | 17:28 |
jrosser | deflated: this is all new exciting stuff using cluster-api rather than the heat/coreos driver in magnum | 17:30 |
deflated | great, another subject to learn lol, guess more research is in order | 17:31 |
spatel | jrosser my controllers are all running on private IPs, and if the kube VMs are running on public IPs then they can't talk to the openstack endpoints. | 17:49 |
spatel | I am setting up a VM with nginx to expose all the endpoints on a public IP, and then I will update the keystone public catalog entries to point at my nginx public IP | 17:49 |
spatel | I believe the k8s workload VMs need to talk to the openstack endpoints, otherwise it won't work | 17:50 |
jrosser | yes, and a network and router are created for this | 17:56 |
jrosser | spatel: it’s totally not needed to make extra nginx | 17:56 |
jrosser | oh wait? you don’t have public endpoint? | 17:57 |
spatel | no | 18:08 |
spatel | not yet.. I am setting it up now with nginx | 18:08 |
spatel | any idea about this error in novaconsole logs - handler exception: The token '***' is invalid or has expired | 20:23 |