*** chkumar|rover is now known as chandankumar | 03:31 | |
noonedeadpunk | For those who use prometheus and the libvirt exporter - it might be useful to know that the project has changed owners to a quite controversial one (just my own opinion) - some details are in the kolla patch - https://review.opendev.org/c/openstack/kolla/+/868161 | 10:20
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-openstack_hosts master: Allow to manage extra services, mounts and networks https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/868534 | 10:23 |
*** dviroel_ is now known as dviroel | 11:15 | |
anskiy | question! I have an openstack installation with one region and a Ceph cluster. I'm trying to move Cinder to the control-plane nodes and use it in active-active mode. So, if I understood correctly: there would be only one cinder-volume service, which would be attached to one AZ. Suppose I want to add another AZ (which should represent another DC), should I create another Ceph cluster with a separate cinder-volume service on the exact sa | 13:25
noonedeadpunk | anskiy: it kind of depends on your AZ implementation | 13:26 |
noonedeadpunk | I'm doing an AZ deployment at the moment and was planning to publish some better docs about it (I also gave a talk on how to configure AZs with OSA in October) | 13:26
noonedeadpunk | But long story short - it does depend on your requirements. For AZs you can either share or separate storage. So if your DCs are less than 10km from each other and you're confident in the link between them - you might want to stretch the ceph cluster between AZs | 13:28
noonedeadpunk | But if you want separate ceph clusters - you can do that as well. But then I think you will need to spawn independent cinder-volumes, if you want to keep az1 from going to storage in az2 | 13:28
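A minimal sketch of the separate-cluster-per-AZ approach described above, as it might look in openstack_user_config.yml (host names, IPs, AZ names and pool names are hypothetical placeholders, and cinder_storage_availability_zone/cinder_backends are the usual OSA knobs assumed here):

    storage_hosts:
      az1-infra1:
        ip: 172.29.236.11
        container_vars:
          cinder_storage_availability_zone: az1
          cinder_backends:
            rbd-az1:
              volume_driver: cinder.volume.drivers.rbd.RBDDriver
              volume_backend_name: rbd-az1
              rbd_pool: volumes
              rbd_ceph_conf: /etc/ceph/az1/ceph.conf
              rbd_user: cinder
      az2-infra1:
        ip: 172.29.239.11
        container_vars:
          cinder_storage_availability_zone: az2
          cinder_backends:
            rbd-az2:
              volume_driver: cinder.volume.drivers.rbd.RBDDriver
              volume_backend_name: rbd-az2
              rbd_pool: volumes
              rbd_ceph_conf: /etc/ceph/az2/ceph.conf
              rbd_user: cinder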
anskiy | noonedeadpunk: that's gonna be a one-to-one relation between AZ and DC, but the control-plane nodes would be in one DC only. Is this reasonable (with your note on my trust in the cross-DC network link)? Or do people just span the control plane across DCs for multi-AZ setups? | 13:31
noonedeadpunk | Well, I personally spawn the control plane across DCs, but we're going to have 3 AZs | 13:35
noonedeadpunk | As then you can survive a complete AZ failure even API-wise | 13:35
noonedeadpunk | I did a pair of keepalived instances per AZ, so 3 public instances and 3 private, and then DNS RR | 13:36
noonedeadpunk | Also, haproxy targets only the backends local to the AZ, to reduce cross-AZ traffic | 13:36
noonedeadpunk | and the same can be done with the internal VIP, either through /etc/hosts or DNS - so services in containers will talk to the local haproxy and be pointed towards local backends (i.e. nova-cinder communication) | 13:39
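A minimal sketch of the VIP side of this, assuming the VIPs are published as DNS names (the names below are placeholders) and each AZ resolves them to its local keepalived/haproxy pair via DNS views or /etc/hosts:

    global_overrides:
      # resolved per AZ (DNS RR or /etc/hosts) to the AZ-local haproxy VIP
      external_lb_vip_address: cloud.example.com
      internal_lb_vip_address: internal.cloud.example.com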
noonedeadpunk | the only nasty thing is images in glance | 13:39
noonedeadpunk | as I wasn't able to find a proper way to satisfy everyone without using the swift backend instead of rbd. If you're fine with using interoperable import only - there's a way around it, I guess. | 13:40
noonedeadpunk | (so it depends on how much you can mandate user behaviour) | 13:41
anskiy | noonedeadpunk: thank you for the insights, gonna have to think more about this. | 13:50 |
noonedeadpunk | anskiy: that's actually the talk I was mentioning - it's far from being a good one, but it still might give some insights https://www.youtube.com/watch?v=wvTvfAR_4eM&list=PLuLHMFPfD_--LAMu7bBkCNAXfTy04iLPj There are also presentations from the event available publicly somewhere | 13:56
moha7 | To whoever is not on vacation (: | 14:11
moha7 | OSA was deployed successfully (I mean, without any errors during the deployment process), but now when I run the command `openstack network list`, I get this error: | 14:11
moha7 | HttpException: 503: Server Error for url: http://172.17.246.1:9696/v2.0/networks, 503 Service Unavailable: No server is available to handle this request. | 14:11 |
moha7 | while `telnet 172.17.246.1 9696` from the infra1-utility-container gets connected to the port 9696 | 14:12 |
moha7 | There's the same error here: https://bugzilla.redhat.com/show_bug.cgi?id=2045082#c9 | 14:12
noonedeadpunk | moha7: you should be telneting not to haproxy (which listens on 172.17.246.1, I guess), but to the haproxy backends, or to put it a better way - the mgmt address of the neutron-server container | 14:24
noonedeadpunk | the error you see most likely means that haproxy can't reach neutron-server for some reason, either because of a networking issue or because neutron-server died | 14:25
moha7 | 172.17.246.1 --> internal vip | 14:27 |
moha7 | 172.17.246.174 --> infra1-neutron-server-container-21189fcd | 14:28 |
moha7 | cannot telnet to infra1-neutron-server-container-21189fcd from infra1-utility-container-5cf19aed on port 9696 | 14:29
noonedeadpunk | but is anything listening inside the infra1-neutron-server-container-21189fcd container on that port? | 14:31
moha7 | There's a service there named "neutron.slice" with some errors. I've never seen this name before! Service status: http://ix.io/4jAM | 14:31
noonedeadpunk | so you're trying to use OVN as the networking driver? | 14:33
noonedeadpunk | Or do you not care and are just spawning the default option? | 14:33
moha7 | nothing is listening on 9696 in the neutron lxc container: http://ix.io/4jAN | 14:33
noonedeadpunk | mhm, yeah, I guess it's related to ovn init issue - `ValueError: :6642: bad peer name format` | 14:35 |
jamesdenton | that's a missing northd group | 14:35 |
moha7 | I didn't know OVN is the default; previously I was configuring it assuming it was on linuxbridge and trying to port it to OVS. But this time I deployed it with OVN, as it is the default option. | 14:35
noonedeadpunk | moha7: yes, we switched default to OVN in Zed | 14:36 |
noonedeadpunk | But you can still use lxb if you want to | 14:36 |
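For reference, the driver choice can be pinned explicitly in user_variables.yml; a minimal sketch, assuming the standard OSA neutron_plugin_type values:

    # OVN is the default from Zed onwards; set it explicitly to be unambiguous
    neutron_plugin_type: ml2.ovn
    # or keep linuxbridge:
    # neutron_plugin_type: ml2.lxb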
moha7 | I followed this post: https://satishdotpatel.github.io/openstack-ansible-multinode-ovn/ to configure the user_variables and openstack_user_config files | 14:36 |
noonedeadpunk | yeah, I think the northd group was introduced relatively recently | 14:37
noonedeadpunk | So you'd need to add a network-northd_hosts definition to your openstack_user_config.yml | 14:38
jamesdenton | that blog is likely a little outdated. That is the way ^^^ | 14:38 |
jamesdenton | Something like --> network-northd_hosts: *controller_hosts, if you have an alias setup | 14:39
noonedeadpunk | so we basically made the `env.d/neutron.yml` override the default behaviour | 14:39
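Roughly, the missing group definition in openstack_user_config.yml would look like this (IPs are placeholders; the *controller_hosts form works only if such a YAML anchor is already defined):

    network-northd_hosts:
      infra1:
        ip: 172.29.236.11
      infra2:
        ip: 172.29.236.12
      infra3:
        ip: 172.29.236.13
    # or, with an existing alias:
    # network-northd_hosts: *controller_hosts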
moha7 | Yeah, I have not set network-northd_hosts; does it need an OVN gateway and some network definitions too? | 14:42
moha7 | noonedeadpunk: So, the env.d/neutron.yml setting that is introduced in that blog is wrong? | 14:44
jamesdenton | moha7 those aren't really necessary anymore | 14:44 |
moha7 | Then, network-northd_hosts would be enough, right? | 14:44 |
jamesdenton | you will likely want: network-gateway_hosts: *compute_hosts | 14:44
jamesdenton | So, all computes are OVN controllers. You can decide whether you want computes to be gateway nodes with that ^^ | 14:45
jamesdenton | or, you can make controllers or dedicated network nodes the gateway nodes using the appropriate alias | 14:45 |
jamesdenton | moha7 the blog was correct as of early December. This is a very recent change, and docs are forthcoming | 14:46 |
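Following the suggestion above, the gateway group could be added alongside the northd group (again assuming a *compute_hosts anchor already exists in openstack_user_config.yml):

    # makes every compute an OVN gateway chassis, similar to a DVR-style layout
    network-gateway_hosts: *compute_hosts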
moha7 | jamesdenton: I'm not familiar enough with OVN to decide where I should put the gateway! Based on the picture in the post below, it seems compute hosts are a good option: | 14:48
moha7 | https://blog.russellbryant.net/2016/09/29/ovs-2-6-and-the-first-release-of-ovn/ | 14:48 |
jamesdenton | yes, i agree, the gateway on computes mirrors the OVS DVR arch | 14:49 |
jamesdenton | and i think that was the intention | 14:49 |
moha7 | Do you know of any recent documentation on OVN? I'm searching but can't find anything recent! | 14:49
jamesdenton | hmm, i don't really. sorry | 14:49 |
jamesdenton | https://docs.openstack.org/networking-ovn/latest/admin/refarch/refarch.html | 14:50 |
jamesdenton | that might help? | 14:50 |
moha7 | Thanks; with these changes, should I deploy from scratch, or would just os-neutron-install.yml be enough? | 14:51
moha7 | jamesdenton: Sure, thanks for the link; It seems OVN is an interesting backend with new concepts | 14:52 |
jamesdenton | just os-neutron-install should be enough | 14:52
moha7 | +1 | 14:53 |
*** dviroel is now known as dviroel|lunch | 15:08 | |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-lxc_hosts master: Allow to create OVS bridge for lxcbr0 https://review.opendev.org/c/openstack/openstack-ansible-lxc_hosts/+/868603 | 15:32 |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-lxc_container_create master: Add bridge_type to lxc_container_networks https://review.opendev.org/c/openstack/openstack-ansible-lxc_container_create/+/868604 | 15:40 |
moha7 | now, after adding network-northd_hosts and network-gateway_hosts (here: http://ix.io/4jB4), there's no more of this error: "ValueError: :6642: bad peer name format", but this warning is in the status output for neutron-server and neutron.slice: http://ix.io/4jB3 Is there any other option missing? The command `openstack network list` on the utility container returns "Gateway Timeout (HTTP 504)" after a long wait, and port 9696 | 15:54
moha7 | is not up on any of the neutron containers | 15:54
noonedeadpunk | And can you telnet to 172.17.246.1 3306 from neutron-server? | 15:56 |
moha7 | It connects, but the connection closes really fast! | 15:58
noonedeadpunk | that can also be result of the bug that should be fixed with https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/868415 | 15:58 |
noonedeadpunk | but I think that you should have neutron version installed that is not affected by it yet | 15:58 |
moha7 | "Connection closed by foreign host." | 15:58 |
noonedeadpunk | so sounds more like mariadb thingy | 15:58 |
noonedeadpunk | Ok, and can you run `mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'"` from utility or galera container? | 15:59 |
noonedeadpunk | the results from galera and utility may differ, though | 15:59
jamesdenton | it does seem like haproxy and/or galera are being a problem | 16:00 |
moha7 | from utility, the output of that mysql command is: ERROR 2013 (HY000): Lost connection to server at 'handshake: reading initial communication packet', system error: 1 | 16:00 |
jamesdenton | you might check the status of haproxy, it might be worth re-running haproxy playbook or simply restarting the service | 16:00 |
noonedeadpunk | that sounds like ssl | 16:01
noonedeadpunk | and from galera? | 16:01 |
moha7 | from galera container: ERROR 2002 (HY000): Can't connect to local server through socket '/var/run/mysqld/mysqld.sock' (111) | 16:01 |
noonedeadpunk | huh | 16:01 |
noonedeadpunk | and systemctl status mariadb? | 16:01 |
moha7 | failed: http://ix.io/4jBa | 16:03 |
moha7 | `galera_new_cluster` couldn't start it. | 16:04 |
noonedeadpunk | well, here you go... Do you have some strict firewall rules between the controllers? | 16:04
moha7 | not at all | 16:05
moha7 | Seems I should re-deploy it, right? | 16:05 |
jamesdenton | what is the status of the other 2 galera containers? | 16:05 |
moha7 | w8 | 16:05 |
noonedeadpunk | You can try re-running `openstack-ansible playbooks/galera-server.yml -e galera_ignore_cluster_state=true -e galera_force_bootstrap=true` if they're also down | 16:06 |
noonedeadpunk | it will either fail or succeed | 16:06
moha7 | jamesdenton: "Failed to start MariaDB" on all 3 galera containers. | 16:07 |
jamesdenton | ok, try what noonedeadpunk mentioned | 16:07 |
moha7 | +1 | 16:07 |
noonedeadpunk | I wonder why they all would fail though | 16:07 |
noonedeadpunk | doesn't sound too healthy that they did | 16:08 |
jamesdenton | also, if you can post the output of this from each container, that would be helpful: cat /var/lib/mysql/grastate.dat | 16:09 |
noonedeadpunk | I bet they're all -1 | 16:11
noonedeadpunk | I have the impression that grastate has been kinda broken for a while, as I haven't seen anything except -1 there for years now | 16:11
noonedeadpunk | Or maybe we were only failing in ways that aren't covered | 16:12
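For context on the grastate.dat check, a file from an unclean shutdown typically looks like this (the uuid below is a placeholder): seqno -1 means the node's commit position is unknown, and safe_to_bootstrap 0 means galera_new_cluster will refuse to bootstrap from that node without intervention.

    # GALERA saved state
    version: 2.1
    uuid:    6b2b3b1e-0000-0000-0000-000000000000
    seqno:   -1
    safe_to_bootstrap: 0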
moha7 | jamesdenton: http://ix.io/4jBc | 16:13 |
jamesdenton | interesting, i feel like i've seen this before | 16:14 |
jamesdenton | from within the ct3 container, can you use the mysql client? | 16:16 |
moha7 | I re-ran the galera cluster bootstrap from container3; there are some errors there, but now `mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'"` returns the tables on utility | 16:16
jamesdenton | ok, so on ct2 and ct1, it should just be a matter of "systemctl start mariadb" | 16:16 |
noonedeadpunk | hm | 16:19 |
noonedeadpunk | that's weird | 16:19 |
noonedeadpunk | these errors in log should have been covered with https://opendev.org/openstack/openstack-ansible-galera_server/src/branch/master/defaults/main.yml#L112-L114 | 16:20 |
moha7 | now it's started and running on all galera nodes, but returning `[Warning] Aborted connection 67 to db: 'neutron' user: 'neutron' host: 'ct1-neutron-server-container-21189fcd.openstack.local' (Got an error reading communication packets)` in the `systemctl status mariadb` output | 16:20
jamesdenton | ok - try rerunning neutron playbooks now that the DB is up | 16:21 |
moha7 | still no port 9696 on ct1-neutron-container | 16:21 |
moha7 | Ah, ok | 16:22 |
noonedeadpunk | I'd say that galera is unlikely to be in the desired state tbh | 16:22
jamesdenton | and that could be, too | 16:22
jamesdenton | maybe rerun setup-infra and setup-openstack? | 16:22 |
noonedeadpunk | as `FATAL ERROR: Upgrade failed` is not good tbh | 16:22 |
noonedeadpunk | and all these tmp tables shouldn't be there | 16:22 |
moha7 | I have snapshots. I'll roll back to the step where setup-hosts.yml was done | 16:23
moha7 | and start from setup-infra | 16:23 |
jamesdenton | ok, don't forget to add the groups, then | 16:24 |
moha7 | The deployment server is standalone, not on the nodes | 16:25 |
jamesdenton | gotcha | 16:26 |
*** dviroel|lunch is now known as dviroel | 16:32 | |
*** dviroel is now known as dviroel}out | 19:37 | |
*** dviroel}out is now known as dviroel|out | 19:37 |