Friday, 2023-01-06

prometheanfire	does `openstack network agent list` show XXX for alive ovn controller and ovn-metadata-agent for others?	00:06
jamesdenton	they should all show alive, prometheanfire	01:43
jamesdenton	https://docs.openstack.org/openstack-ansible-os_neutron/latest/app-ovn.html	01:45
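As a rough illustration of jamesdenton's point that every agent should report alive, here is a minimal Python sketch that filters the output of `openstack network agent list -f json`. It assumes the JSON uses the keys `Agent Type`, `Host` and `Alive`, and that `Alive` is either a boolean or the `:-)` marker seen in table output. Both are assumptions, so check them against your client version.

```python
import json


def dead_agents(agent_list_json: str) -> list:
    """Return (agent type, host) pairs for agents not reported alive.

    Assumes the input is the JSON printed by
    `openstack network agent list -f json`; the key names "Agent Type",
    "Host" and "Alive" are an assumption about the client's output.
    """
    dead = []
    for agent in json.loads(agent_list_json):
        alive = agent.get("Alive")
        # Accept a JSON boolean or the ":-)" marker used in table output.
        if alive is True or alive == ":-)":
            continue
        dead.append((agent.get("Agent Type", "?"), agent.get("Host", "?")))
    return dead


# Made-up sample data for illustration only.
sample = json.dumps([
    {"Agent Type": "OVN Controller agent", "Host": "compute1", "Alive": True},
    {"Agent Type": "OVN Metadata agent", "Host": "compute1", "Alive": False},
])
print(dead_agents(sample))  # → [('OVN Metadata agent', 'compute1')]
```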
prometheanfire	jamesdenton: br-int sock was not accessible due to protocol version issues, think it's fixed now	02:38
jamesdenton	ahh, good deal	03:12
jamesdenton	protocol version.. meaning ssl vs non ssl?	03:12
*** akahat|rover is now known as akahat		05:12
prometheanfire	maybe? not sure	05:13
prometheanfire	someone set up the systems with linuxbridge and ovs but I'm redoing it with zed and ovn (instructions unclear for them when I went on leave I guess), so ovs created bridges without ssl I'm guessing and ovn connects with it, maybe	05:15
prometheanfire	deleted the bridge and had stuff recreate since this is greenfield	05:15
*** dviroel|afk is now known as dviroel		11:24
cloudnull	OHAI - happy Friday all	14:37
prometheanfire	cloudnull: ohai	15:11
*** dviroel is now known as dviroel|lunch		16:39
admin1	is there an easy way to enable 2fa/mfa in keystone via osa ?	17:34
darman	Hey	17:41
darman	Is this an active channel? Is anybody here online? (Unfortunately all the openstack channels on Libera.Chat are silent!)	17:43
jrosser	there are people here :)	17:44
jrosser	most activity is working-days / working-hours EU time	17:45
jrosser	admin1: 2fa enablement is not really an OSA thing, you'd use a config override to enable the auth method then the rest is via the keystone API https://docs.openstack.org/keystone/latest/admin/resource-options.html#multi-factor-auth-enabled	17:46
jrosser	it's per user, and as OSA does not deploy end-users then there is not really anywhere to do that	17:47
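Per the keystone resource-options page linked above, MFA is switched on per user through the `multi_factor_auth_enabled` and `multi_factor_auth_rules` options, which are set with a user update (`PATCH /v3/users/{user_id}`). The minimal sketch below only builds that request body. The `["password", "totp"]` rule is an example policy, not a recommendation, and the auth methods it names still have to be enabled in keystone.conf through a config override. The empty-rule check is a local sanity check of this sketch, not keystone's own validation.

```python
import json


def mfa_user_patch(rules: list) -> dict:
    """Build a body for PATCH /v3/users/{user_id} that turns on MFA.

    Each inner list is one combination of auth methods; a user must
    authenticate with every method in at least one combination.
    """
    # Local sanity check only: reject empty rules before sending anything.
    if not rules or any(not rule for rule in rules):
        raise ValueError("each MFA rule needs at least one auth method")
    return {
        "user": {
            "options": {
                "multi_factor_auth_enabled": True,
                "multi_factor_auth_rules": rules,
            }
        }
    }


# Example policy (an assumption): require password AND a TOTP passcode.
body = mfa_user_patch([["password", "totp"]])
print(json.dumps(body, indent=2))
```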
darman	Ah, finally I found you (:	17:49
darman	I have some general question, and also some issues	17:49
*** dviroel|lunch is now known as dviroel		17:50
darman	1. If you were deploying a production environment, would you choose OVN, even though it's not as common as OVS? Personally I prefer OVN, since it's the next step in OpenStack networking development, a redesign of the network backend; but some technical arguments would help me defend the choice to my managers.	17:53
darman	2. This is the error I get: https://pastebin.ubuntu.com/p/prnxqmSCb4/ when setup-everything.yml reaches the keystone service installation.	17:54
darman	Error link*: https://pastebin.ubuntu.com/p/dCsjv6bz9p/	17:56
jrosser	i have no direct experience of OVN myself but we have other people here who are using it for real	17:58
darman	3. Do you know a general active channel for openstack itself (here on IRC or other online platforms)?	17:59
jrosser	regarding your deploy error it is not possible to know what is wrong from that output	18:00
jrosser	that ansible task has no_log: True on it as otherwise it would display the database password in the log output	18:00
jrosser	first thing you need to do is check haproxy that it thinks the database backends are up	18:01
jrosser	you can either use the haproxy log, hatop or the haproxy management web interface for that	18:01
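The up/down state that hatop shows can also be read from the HAProxy stats socket with the `show stat` command, which returns CSV whose header line starts with `# pxname,svname,...`. Here is a minimal sketch. The socket path `/var/run/haproxy.stat` is the one used later in this log and may differ on your hosts. Servers without health checks report `no check` and will also be listed.

```python
import csv
import io
import socket


def read_stats(sock_path: str = "/var/run/haproxy.stat") -> str:
    """Send `show stat` to the HAProxy admin socket and return the raw CSV."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        s.sendall(b"show stat\n")
        chunks = []
        while True:
            data = s.recv(65536)
            if not data:  # HAProxy closes the connection after answering
                break
            chunks.append(data)
    return b"".join(chunks).decode()


def down_servers(stats_csv: str) -> list:
    """Return (proxy, server, status) for every server not reported UP."""
    # The header line looks like "# pxname,svname,...": drop the "# " prefix.
    rows = csv.DictReader(io.StringIO(stats_csv.lstrip("# ")))
    result = []
    for row in rows:
        if row["svname"] in ("FRONTEND", "BACKEND"):
            continue  # skip aggregate rows, keep real servers only
        if not row["status"].startswith("UP"):
            result.append((row["pxname"], row["svname"], row["status"]))
    return result
```

On the haproxy host, `down_servers(read_stats())` lists the backend servers to investigate first.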
darman	Where should I set `false` for the `no_log: true` option? In user_variables.yml?	18:01
darman	"Overview — HATop: Interactive ncurses client for HAProxy"; I didn't know it!	18:03
darman	jrosser: Here are my variables: https://pastebin.ubuntu.com/p/wP2vCzGJwf/ Do you see something strange there for haproxy? Is there anything that has been forgotten in the config file? I would appreciate it if you take a look	18:11
darman	Link*: user_variables.yml: https://pastebin.ubuntu.com/p/CpvC3Ym36Y/	18:13
darman	Yes, there's an issue with haproxy: https://pastebin.ubuntu.com/p/pK2kmMXgFw/	18:24
jrosser	well first i would really advise against install_method: distro unless you have a super clear understanding of why you choose that	18:28
jrosser	then from your error message we see failed: [infra01_keystone_container-51bb0d04 -> infra01_utility_container-e956a5a6(172.17.236.15)	18:29
jrosser	^ an address in 172.17....	18:29
jrosser	but you define internal and external vip in 10.x ranges	18:29
darman	The installation from source was very long, almost 6 hours. I thought installing from distro packages might be faster, but it was no different. I will switch back to source for the next installation.	18:45
jrosser	it should not be 6 hours at all, that suggests some sort of problem	18:46
darman	in my experience: setup-hosts --> 45 minutes	18:46
darman	setup-infra: 1 h	18:47
jrosser	is this on real hardware or some virtualised environment?	18:47
darman	On VMs on Proxmox	18:47
jrosser	oh right, well	18:47
jrosser	i think that the deploy time is pretty sensitive to disk speed	18:48
jrosser	having said that, our CI jobs run a complete deployment on a single node in < 2 hours	18:49
jrosser	and those are virtualised	18:50
jrosser	a bare metal node with an nvme disk might complete in < 1 hour	18:50
jrosser	anyway, it feels like your haproxy problem is networking related	18:51
jrosser	i don't understand what is happening with your addressing	18:51
darman	on a single node in < 2 hours? What about 3 controllers and 2 computes?	19:00
jrosser	the equivalent of 3 controllers in one of our H/A CI jobs takes 20 mins for setup-infrastructure	19:08
spatel	darman I would go with OVN if this is a new cloud, because after a few years converting a production cloud would be a mess.	19:20
spatel	I am deploying all new cloud using OVN	19:20
darman	spatel: +1, the 'converting in the future' is good point	19:22
jrosser	darman: do you find anything yet with your galera trouble?	19:24
spatel	eventually linuxbridge will die if no maintainers are left; new OS versions will stop shipping it.	19:24
darman	jrosser: not yet.	19:26
jrosser	you need to find out why from the perspective of haproxy the backend is down	19:27
jrosser	there is a healthcheck	19:27
jrosser	and there is basic network connectivity to check	19:27
darman	It seems that the examples in the repository (/etc/openstack_deploy) are not suitable for deploying with OVN.	19:27
darman	Is there a place where users have shared their configs? Or is it OK to share them here after removing sensitive data?	19:27
jrosser	hopefully everything is in the documentation	19:28
darman	`an address in 172.17.... but you define internal and external vip in 10.x ranges` I manually changed it to 10.0.0 when posting the error here to make it clearer!	19:28
spatel	I blogged about some OVN stuff - https://satishdotpatel.github.io/openstack-ansible-multinode-ovn/	19:28
jrosser	https://docs.openstack.org/openstack-ansible-os_neutron/latest/app-ovn.html	19:28
jrosser	spatel: you may need to update your blog for the changes in zed/master?	19:29
spatel	related to SSL?	19:29
spatel	but method would be same.. running playbook etc.. correct?	19:29
jrosser	well i don't know :)	19:29
spatel	i don't think we did any major changes in OVN deployment	19:30
spatel	I will definitely deploy Zed multinode and give it a try	19:30
darman	spatel: nice, I'll try your configs in that blog post	19:31
spatel	Try in lab first and let me know if any change required..	19:31
spatel	jrosser we should link some of my blogs as OSA/OVN deployment examples. It's not perfect but it can help someone give it a try :)	19:33
jrosser	well i think it may just lead to confusion	19:33
spatel	I will add more stuff as required	19:34
jrosser	as the AIO now defaults to OVN......	19:34
jrosser	so that is the 'reference' deployment	19:34
darman	spatel: Ah, you're using `/etc/openstack_deploy/env.d/neutron.yml` there, but I don't have it! Let me try it.	19:34
jrosser	spatel: ^ see	19:34
jrosser	now we have total confusion	19:34
jrosser	darman: have you yet used the "all-in-one" deployment?	19:35
darman	No, I wanted an environment as close as possible to production.	19:36
spatel	jrosser you are right, Zed has a built-in environment for OVN so that step can be skipped.	19:36
jrosser	so why follow that blog?	19:37
jrosser	you already have the default neutron env.d from here which is wildly different https://github.com/openstack/openstack-ansible/blob/master/inventory/env.d/neutron.yml	19:37
jrosser	darman: i am pretty unclear what you want to achieve	19:37
jrosser	the all-in-one will get you going automatically in a single VM and is more likely to work than anything else, as it is the exact code that we run in CI	19:38
darman	jrosser: Installation test for a multi-node environment	19:38
admin1	the issue with going with ovn right now is that it does not support all LB functions, so tools like CAPI do not work	19:38
jrosser	then when your multinode is having difficulty you can use the AIO as a reference to see what is different / broken	19:38
spatel	admin1 LB is a totally different service, you can use amphora if you want advanced LB features with OVN. What is CAPI?	19:40
jrosser	darman: if you want help with your deployment error - do you have a specific question?	19:41
darman	I am troubleshooting. If I can't solve it, I will ask here	19:43
darman	jrosser: For the all-in-one, I'm going to follow this doc: https://docs.openstack.org/openstack-ansible/latest/user/aio/quickstart.html, it's ok, right?	19:44
jrosser	well, 'latest' in the URL means that is the documentation for master branch, which is the next release	19:45
jrosser	the current release is here https://docs.openstack.org/openstack-ansible/zed/user/aio/quickstart.html	19:45
darman	thanks	19:46
jrosser	and personally i would check out stable/zed instead of the tag	19:46
admin1	capi is the Kubernetes Cluster API; it's getting popular nowadays as the way to deploy k8s clusters on clouds	19:46
admin1	including os	19:46
admin1	os => openstack	19:46
admin1	i will test a multinode install with ovn and see how far i can go	19:46
admin1	darman, if you have a big server where you can create vms, you can make it as close to prod as possible ..	19:47
darman	I have an HP G8 server running ProxMox, old but still powerful	19:48
admin1	you can create vms, replicate the network and vlans and even router	19:49
admin1	mimic ip address and everything to the exact detail	19:49
admin1	i rented an AMD EPYC from hetzner :)	19:49
admin1	works good	19:49
admin1	put 2 nvmes in raid0	19:49
admin1	so that the build goes faster	19:49
admin1	and use vyos for the router	19:50
admin1	to mimic vlans and DC side of stuff	19:50
spatel	I am running all my openstack labs on single VMware HOST (gen8 with 128GB ram 1TB SSD)	19:50
darman	vyos --> interesting +1	19:53
darman	"the output has been hidden due to the fact that 'no_log: true' was specified for this result"	20:00
darman	How can I override `no_log` to be false	20:00
darman	?	20:00
jrosser	there is no way to override that without editing the code	20:01
jrosser	from the top of my head its something like /etc/ansible/ansible_collections/openstack/osa/roles/db_setup/tasks/main.yml	20:05
jrosser	^ adjust to match reality	20:05
darman	no_log is only used in: `/opt/openstack-ansible/playbooks/healthcheck-infrastructure.yml`	20:12
darman	`/opt/openstack-ansible/playbooks/ceph-rgw-keystone-setup.yml`	20:12
darman	`/opt/openstack-ansible/playbooks/rabbitmq-install.yml`	20:12
darman	by `grep -r no_log /opt/openstack-ansible/`	20:12
jrosser	did you see the path i gave?	20:13
darman	changing it to false in all the above files didn't have any effect! For the keystone installation, it still says: FAILED! => {"censored": "the output has been hidden due to the fact that `no_log: true` was specified for this result", "changed": false}	20:13
darman	Oops, I saw that message now. w8	20:14
darman	It worked, and now it says what the issue is: `unable to connect to database, check login_user and login_password are correct or /root/.my.cnf has the credentials. Exception message: (2013, 'Lost connection to MySQL server during query')`	20:20
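The `grep` above only searched `/opt/openstack-ansible`, while the db_setup role jrosser points to sits under the Ansible collections directory. Here is a small sketch that searches several roots at once. The two roots in the example are simply the paths mentioned in this conversation and may not match your deploy host.

```python
import os


def find_no_log(roots: list) -> list:
    """Return (path, line number, stripped line) for YAML lines containing 'no_log:'."""
    hits = []
    for root in roots:
        # os.walk yields nothing for a missing root, so bad paths are skipped quietly.
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if not name.endswith((".yml", ".yaml")):
                    continue
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        if "no_log:" in line:
                            hits.append((path, lineno, line.strip()))
    return hits


# Roots taken from the conversation; adjust to match reality.
ROOTS = ["/opt/openstack-ansible", "/etc/ansible/ansible_collections"]
```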
jrosser	did you get haproxy to think that the galera back end was up?	20:20
darman	From the haproxy aspect, all containers are down! https://i.imgur.com/SPceco6.png	20:27
darman	^ `hatop -s /var/run/haproxy.stat`	20:27
jamesdenton	spatel Your blog is great, but some of what you outline is no longer necessary with OSA Zed, and there's an extra group or two that need to be defined.	20:28
jrosser	darman: they will all be down until the services are deployed, and as you have a failure on keystone that is the first openstack service, so it is not a surprise that they are down	20:30
jrosser	however, the database service should be up after you have run setup-infrastructure	20:30
jrosser	look at really basic things, is the database service in the db container actually running? does the journal suggest anything is wrong	20:31
jrosser	can you ping the db backend IP from where haproxy is running	20:31
jrosser	what happens if you curl/wget the db backend healthcheck service from haproxy?	20:31
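jrosser's basic checks (is the backend reachable, and does its healthcheck answer) can be scripted from the haproxy host. Here is a minimal sketch using only the standard library. It tries a plain TCP connect to MySQL on port 3306 and an HTTP GET against a healthcheck endpoint. The healthcheck port (9200 below) and the backend address are assumptions, so read the real values from the galera backend in your haproxy config. ICMP is left to the `ping` command, because raw sockets need root.

```python
import http.client
import socket
import urllib.error
import urllib.request
from typing import Optional


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def healthcheck_status(host: str, port: int, timeout: float = 3.0) -> Optional[int]:
    """HTTP status returned by a healthcheck endpoint, or None if unreachable."""
    try:
        with urllib.request.urlopen(f"http://{host}:{port}/", timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # non-2xx answer, e.g. a node reporting itself unhealthy
    except (OSError, http.client.HTTPException):
        return None  # refused, timed out, or not speaking valid HTTP


# Hypothetical backend address and assumed healthcheck port.
DB_BACKEND = "10.0.0.100"
HEALTHCHECK_PORT = 9200
```

Calling `tcp_reachable(DB_BACKEND, 3306)` and `healthcheck_status(DB_BACKEND, HEALTHCHECK_PORT)` separates "no network path" from "path is fine but the node says it is unhealthy".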
admin1	darman, single controller ?	20:50
admin1	i had the same issue a day back .. i had to manually fix the database check to whitelist the ip	20:50
darman	Woooops! not possible to ping containers as I was using the wrong range in the `openstack_user_config.yml` for br-mgmt interface. I'm going to destroy containers, then deploy everything from the step setup-hosts.yml to assign new IPs to the containers.	20:55
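The root cause here was container addresses outside the range configured for br-mgmt. One way to catch that before redeploying is to check each address against the `container` entry of `cidr_networks` in openstack_user_config.yml. Here is a minimal sketch with the standard `ipaddress` module. The CIDR is hypothetical, and 172.17.236.15 is the utility container address from the earlier error.

```python
import ipaddress


def outside_cidr(cidr: str, addresses: list) -> list:
    """Return the addresses that do not belong to the given network."""
    network = ipaddress.ip_network(cidr)
    return [a for a in addresses if ipaddress.ip_address(a) not in network]


# Hypothetical management CIDR; the second address comes from the error above.
mgmt_cidr = "10.0.0.0/22"
container_ips = ["10.0.1.15", "172.17.236.15"]
print(outside_cidr(mgmt_cidr, container_ips))  # → ['172.17.236.15']
```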
jrosser	darman: also make sure you disable any IP/MAC address security stuff if there is any in proxmox	20:56
jrosser	admin1: it is not one controller	20:56
prometheanfire	ping from a vm on node 1 to a vm on node 2 fails with ovn for me, while vm on node 1 to a second vm on node 1 works. I see the icmp packets hit node2's geneve interface though, but nothing beyond that	21:00
prometheanfire	trying to figure out why packets are not being forwarded is 'fun'	21:01
prometheanfire	that I can't run ovn-nbctl (or sbctl) doesn't help, tried passing the right socket and ssl terms	21:02
spatel	prometheanfire do you have OVN in cluster?	21:04
spatel	you can run ovn-nbctl only on leader node.	21:04
prometheanfire	oh, didn't know that part, guess I'll run it on the leader lol	21:04
spatel	if you want to run it from a member node then you need to pass a switch called --not-leader or something...	21:05
spatel	--no-leader-only	21:06
spatel	https://man7.org/linux/man-pages/man8/ovn-nbctl.8.html	21:06
spatel	you can use that switch on non-leader node to get data of OVN	21:06
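Putting the pieces together: with a clustered northbound DB, `ovn-nbctl` prefers the cluster leader by default, and `--no-leader-only` lets it read from any member. With an `ssl:` endpoint you also pass the private key, certificate and CA certificate. Here is a minimal sketch that only builds the argument list. The endpoint and certificate paths are placeholders, so use whatever your deployment configured.

```python
from typing import Optional


def nbctl_argv(db: str, args: list, private_key: Optional[str] = None,
               certificate: Optional[str] = None, ca_cert: Optional[str] = None,
               leader_only: bool = True) -> list:
    """Build an ovn-nbctl command line for a (possibly clustered) NB database."""
    argv = ["ovn-nbctl", f"--db={db}"]
    if not leader_only:
        argv.append("--no-leader-only")  # allow reads from non-leader members
    if db.startswith("ssl:"):
        if not (private_key and certificate and ca_cert):
            raise ValueError("ssl: endpoints need a private key, certificate and CA cert")
        argv += [f"--private-key={private_key}",
                 f"--certificate={certificate}",
                 f"--ca-cert={ca_cert}"]
    return argv + list(args)


# Placeholder endpoint and certificate paths.
cmd = nbctl_argv("ssl:10.0.0.11:6641", ["show"],
                 private_key="/etc/ovn/key.pem",
                 certificate="/etc/ovn/cert.pem",
                 ca_cert="/etc/ovn/ca.pem",
                 leader_only=False)
# subprocess.run(cmd, check=True) would execute it on a host with OVN installed.
```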
prometheanfire	ya, got the command working at least	21:06
spatel	ovn has a nice tool called ovn-trace which can simulate packet flow and tell you where the blockage or drop is	21:08
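`ovn-trace` takes a logical datapath (for a neutron network, its logical switch) and a microflow expression that describes the packet. Here is a minimal sketch that builds such an expression for an ICMP echo request from one port to another. The switch name, port name, MACs and IPs are hypothetical placeholders that you would take from `ovn-nbctl show`.

```python
def icmp_trace_argv(datapath: str, inport: str, src_mac: str, dst_mac: str,
                    src_ip: str, dst_ip: str) -> list:
    """Build an ovn-trace command that simulates an ICMP echo request."""
    microflow = " && ".join([
        f'inport == "{inport}"',  # logical port name must be quoted
        f"eth.src == {src_mac}",
        f"eth.dst == {dst_mac}",
        f"ip4.src == {src_ip}",
        f"ip4.dst == {dst_ip}",
        "ip.ttl == 64",
        "icmp4.type == 8",  # 8 = echo request
    ])
    return ["ovn-trace", datapath, microflow]


# Hypothetical placeholders for illustration.
argv = icmp_trace_argv("neutron-net-uuid", "port-uuid-vm1",
                       "fa:16:3e:00:00:01", "fa:16:3e:00:00:02",
                       "192.168.1.10", "192.168.1.20")
```

Because the result is an argument list, the microflow stays a single argument and needs no extra shell quoting when it is passed to `subprocess.run`.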
spatel	jamesdenton i will update my blog for the latest Zed or add some comments.	21:09
spatel	jrosser is correct: when i deployed openstack on VMware i had to disable MAC spoofing protection and some other security stuff in VMware.	21:12
*** dviroel is now known as dviroel|pto		21:12
prometheanfire	not getting anything useful from ovn-trace, shows that the packet should reach the instance :|	21:50
spatel	you can ping vm running on same compute node but not across the compute nodes correct?	21:53
prometheanfire	yep	21:53
prometheanfire	I see the packet reach the geneve interface on compute-node-2	21:54
spatel	Geneve tunnel is up.. assuming yes	21:54
prometheanfire	but that's the end	21:54
prometheanfire	is there a way to regenerate the openflow table on node-2?	21:54
spatel	security group etc.. blocking it	21:54
prometheanfire	I don't think so, at least the ovn-trace seemed to work	21:55
spatel	what is the output of ovs-vsctl show?	21:55
prometheanfire	for br-int?	21:56
spatel	ovs-vsctl show command output	21:57
prometheanfire	https://pastebin.com/raw/JBFrSy4v	21:58
spatel	looks good so far i can see tunnel and tap interface on br-int bridge	22:00
prometheanfire	yep, I can only think that it's some flow that's not working, harder to troubleshoot than lxb lol	22:01
prometheanfire	is there a good way to rule out port security?	22:04
prometheanfire	ofctl dump-ports shows the vm port receiving packets at the rate of the ping, so ovs seems to be routing it that far	22:06
spatel	This is what i have and everything works for me - https://paste.opendev.org/show/bjDY3HTMJV4fNtzIGSxK/	22:09
spatel	i wonder why we have br-tun	22:10
spatel	in my case i have tunnel directly connected to br-int	22:10
spatel	make sure you configure security-group with allow all..	22:11
prometheanfire	I just disabled security groups entirely on the port to test, no good	22:11
spatel	many times i ended up in that situation, where i assumed the security group was ok but ended up finding the issue there	22:12
spatel	what do you mean by disabling security-group entries?	22:12
prometheanfire	openstack port set --no-security-group --no-port-security	22:12
prometheanfire	something like that	22:12
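For reference, the python-openstackclient flags for this test are `--no-security-group` and `--disable-port-security` on `openstack port set`. They are normally passed together, because neutron refuses to disable port security while security groups are still attached. Here is a minimal sketch that builds both the "disable" command and a matching "restore" command. The port and security group names are hypothetical.

```python
def disable_port_filtering(port: str) -> list:
    """Command that removes all security groups and turns off port security."""
    return ["openstack", "port", "set",
            "--no-security-group", "--disable-port-security", port]


def restore_port_filtering(port: str, security_group: str) -> list:
    """Command that re-enables port security and re-attaches one security group."""
    return ["openstack", "port", "set",
            "--enable-port-security", "--security-group", security_group, port]


# Hypothetical names for illustration.
off = disable_port_filtering("vm1-port")
on = restore_port_filtering("vm1-port", "allow-icmp-ssh")
```

Keeping the restore command next to the disable command makes it harder to leave a test port unfiltered by accident.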
spatel	i don't think that is the issue here.. i am talking about security-group rules	22:13
prometheanfire	ah, with things disabled that's not it	22:13
spatel	openstack security group list	22:13
prometheanfire	I have a secgroup allowing all outbound and icmp+22 inbound	22:13
spatel	just make sure.. its :)	22:14
prometheanfire	also, having just removed the secgroup from the port should remove that variable, ovn-trace says all packets should reach (tested port 123)	22:14
spatel	I have to leave now.. but please keep us posted on progress	22:19
spatel	run the ovs-tcpdump command, which will help you find pain points	22:19
prometheanfire	yep, used that too :D	22:24
prometheanfire	cya	22:24

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!