*** zbr is now known as Guest164 | 05:03 | |
opendevreview | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_neutron master: Exclude neutron from venv constraints https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/798960 | 05:47 |
*** rpittau|afk is now known as rpittau | 06:59 | |
*** sshnaidm_ is now known as sshnaidm | 08:25 | |
mindthecap | hi! Asking this here as well (copying from the openstack channel): I'm getting an error with a clean install using OSA Victoria. I can't create an instance - volume attachment fails with the cinder error "Invalid input received: Connector doesn't have required information: initiator". The error persists when I try to attach an already-created volume to the instance. | 11:48 |
mindthecap | I'm out of ideas about what/where to check. I'm using iSCSI (LVM). | 11:48 |
jrosser | mindthecap: i'm guessing you've got something like this in your config https://github.com/openstack/openstack-ansible/blob/master/etc/openstack_deploy/openstack_user_config.yml.example#L631-L642 | 12:23 |
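For reference, the linked openstack_user_config.yml example defines an LVM/iSCSI cinder backend under `storage_hosts`; a minimal sketch (the host name and IP below are placeholders, not values from this deployment) looks roughly like:

```yaml
storage_hosts:
  lvm-storage1:            # placeholder host name
    ip: 172.29.236.16      # placeholder management IP
    container_vars:
      cinder_backends:
        limit_container_types: cinder_volume
        lvm:
          volume_backend_name: LVM_iSCSI
          volume_driver: cinder.volume.drivers.lvm.LVMVolumeDriver
          volume_group: cinder-volumes
          iscsi_ip_address: "{{ cinder_storage_address }}"
```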
fridtjof[m] | mindthecap: try (re)starting iscsi related services on the compute node. I don't remember which one did the trick right now | 12:33 |
mindthecap | thanks! for some reason iscsid was disabled and stopped on compute hosts. Started them and it works. | 12:42 |
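On systemd hosts, the state mindthecap describes can be checked and fixed like this (a sketch; the unit is typically iscsid on CentOS, while some Ubuntu releases manage it via open-iscsi):

```shell
# Check whether the iSCSI initiator daemon is running and enabled
systemctl status iscsid

# Start it now and enable it so it survives reboots
systemctl enable --now iscsid
```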
mindthecap | It's weird that the service was stopped and not started even though I have replayed the OSA deployment scripts many times. | 12:43 |
*** rpittau is now known as rpittau|afk | 12:45 | |
jrosser | mindthecap: which OS are you using? | 13:03 |
spatel | any idea about this error when building a new VM? - {"message": "Build of instance 4e65ec9b-1b47-4972-98f9-2430d67eece5 aborted: Failed to allocate the network(s), not rescheduling.", "code": 500, "created": "2021-07-08T13:08:01Z"} | 13:12 |
spatel | I am not seeing any obvious errors in the neutron logs | 13:12 |
spatel | still looking for more evidence | 13:12 |
noonedeadpunk | well, I see issues for neutron OVN jobs for master | 13:13 |
noonedeadpunk | or you're not talking about OVN? | 13:13 |
spatel | no no | 13:15 |
spatel | i have real production issue in my old openstack | 13:15 |
spatel | what is the issue related OVN? | 13:15 |
noonedeadpunk | I was seeing that when the dhcp agent was stuck for some reason in OVS | 13:15 |
spatel | hmm! in CI job? | 13:16 |
spatel | send me link i will try to debug and see | 13:16 |
noonedeadpunk | no, in our prod :) so when the neutron dhcp agent was acting weird, we were not able to create VMs, with the same error and nothing in any logs | 13:16 |
noonedeadpunk | regarding OVN - https://review.opendev.org/c/openstack/openstack-ansible-os_neutron/+/798960 | 13:17 |
spatel | hmm let me check DHCP logs and see | 13:17 |
noonedeadpunk | there's an issue with calico, but I think I will just bump it back to the old version for now... | 13:17 |
spatel | ok | 13:17 |
spatel | noonedeadpunk my /var/log/neutron/neutron-dhcp-agent.log is looking very clean | 13:19 |
spatel | noticed in this neutron log - 2021-07-08 09:19:42.699 26878 ERROR oslo.messaging._drivers.impl_rabbit [-] [6cd375b0-8f3a-4a70-b2fa-91d52197b74b] AMQP server on 172.28.15.248:5671 is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds.: error: [Errno 104] Connection reset by peer | 13:20 |
spatel | restarting and see | 13:20 |
spatel | noonedeadpunk do you think this is something serious - http://paste.openstack.org/show/807274/ | 14:07 |
noonedeadpunk | um, well, it could mean that either something is wrong with rabbit (but you would see that in other services too) or dhcp is just stuck for some reason | 14:12 |
noonedeadpunk | and not replying to messages | 14:12 |
spatel | the rabbitMQ cluster is looking healthy so I'm not sure what the issue is, but I can restart the rabbitMQ cluster | 14:13 |
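A quick way to sanity-check the cluster from a rabbit node is with rabbitmqctl (real subcommands; output format varies by version):

```shell
# Show cluster members and which of them are currently running
rabbitmqctl cluster_status

# Queues with unacked messages piling up can indicate stuck consumers
rabbitmqctl list_queues name messages_unacknowledged
```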
noonedeadpunk | and restart of dhcp agents didn't help? | 14:14 |
spatel | is just restarting the rabbitMQ service enough? | 14:14 |
spatel | no help with dhcp agent restart | 14:14 |
noonedeadpunk | hm... | 14:14 |
jrosser | you could try starting it with debug logging | 14:14 |
spatel | neutron agent? | 14:15 |
spatel | i meant dhcp | 14:15 |
jrosser | yeah, you'd get some idea if it was just sitting doing nothing, or somehow spinning and failing | 14:15 |
spatel | let me try that also... | 14:16 |
noonedeadpunk | also - is it regarding only one network or all networks are failing? | 14:17 |
spatel | I am able to delete VMs but not able to create them; is that also related to the rabbitMQ issue? I know that if rabbitMQ is not working then you can't delete VMs | 14:17 |
noonedeadpunk | what we also did was add another dhcp agent to the network | 14:17 |
noonedeadpunk | as it might actually be an issue with the namespace | 14:17 |
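The workaround noonedeadpunk describes can be done with the openstack CLI (agent and network IDs below are placeholders):

```shell
# See which DHCP agents exist and which ones host the network
openstack network agent list --agent-type dhcp
openstack network agent list --agent-type dhcp --network <network-id>

# Schedule an additional DHCP agent onto the network
openstack network agent add network --dhcp <agent-id> <network-id>
```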
spatel | It happens on any network.. when I create a VM it gets stuck in BUILD and then throws the error - aborted: Failed to allocate the network(s), not rescheduling | 14:18 |
noonedeadpunk | well in our case it sometimes was dependent on the network where the port for the VM resides | 14:19 |
noonedeadpunk | s/sometimes/most times | 14:19 |
spatel | hmm | 14:23 |
spatel | jrosser I have enabled debug in /etc/neutron/dhcp_agent.ini, is that the correct place? | 14:23 |
spatel | after restarting the agent I'm not seeing any useful info except this error - http://paste.openstack.org/show/807274/ | 14:24 |
jrosser | i think it's normally somewhere right at the top of /etc/neutron/neutron.conf | 14:24 |
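The debug flag lives in the [DEFAULT] section of neutron.conf, which the DHCP agent reads in addition to dhcp_agent.ini; a sketch:

```ini
# /etc/neutron/neutron.conf
[DEFAULT]
debug = True
```

The agent (here, neutron-dhcp-agent) has to be restarted for the change to take effect.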
spatel | let me try that | 14:24 |
spatel | does DHCP agent talk to RabbitMQ? | 14:25 |
jrosser | this looks similar https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=1774764 | 14:26 |
spatel | hmm, but it doesn't mention any solution | 14:27 |
jrosser | but kind of unhelpful other than it does reference some patches like https://review.opendev.org/c/openstack/neutron/+/659274 and https://review.opendev.org/c/openstack/neutron/+/694561 | 14:27 |
jrosser | but i'm just totally guessing | 14:28 |
spatel | when it says timeout, does that mean it's failing to talk to the neutron server or to rabbitMQ? | 14:29 |
spatel | jrosser after restarting all nova-* services it looks like I am able to spin up VMs | 14:36 |
spatel | doesn't make any sense | 14:36 |
jrosser | nothing useful in the nova log? | 14:37 |
jrosser | anyway remember to put any debug back to False :) | 14:37 |
spatel | i noticed some errors like failed to talk to neutron | 14:37 |
jrosser | that is probably related | 14:37 |
jrosser | as booting the VM / creating the port are very much coupled | 14:38 |
spatel | I strongly believe my neutron-server is under pressure | 14:38 |
spatel | I have 800 vms running on this cloud.. | 14:39 |
spatel | do more VMs put pressure on neutron or on the compute hosts? | 14:39 |
spatel | on this cloud i have 260 compute hosts and 800 VMs.. | 14:41 |
noonedeadpunk | it would put pressure on rabbit first of all | 15:02 |
noonedeadpunk | so maybe it's timing out for a reason | 15:02 |
*** frickler is now known as frickler_pto | 15:03 | |
spatel | hmm | 15:04 |
spatel | how do you guys scale rabbitMQ? | 15:05 |
spatel | Are you guys using some kind of different HA queue policy in rabbitMQ? like not syncing the ABC queue, etc? | 15:06 |
spatel | I have noticed rabbit does a bad job when you sync all queues | 15:06 |
noonedeadpunk | it really does | 15:06 |
noonedeadpunk | ha queue is really a penalty on performance | 15:07 |
spatel | so what do you suggest? I am running everything with the defaults that come out of OSA | 15:07 |
spatel | never thought of playing with ha queue | 15:07 |
noonedeadpunk | but otherwise you might get issues when restarting rabbit | 15:08 |
noonedeadpunk | I'm not sure though, maybe this can be worked around with maintenance mode, but I never had time to play a lot with it | 15:08 |
noonedeadpunk | also it's available only from rabbit 3.8 or something like that | 15:09 |
noonedeadpunk | eventually, I know that at some scale people even start using dedicated nodes for rabbit to gain some more performance | 15:10 |
noonedeadpunk | and have like 5-7 nodes for rabbit... | 15:10 |
spatel | agreed on dedicated nodes, but still, even if you add more nodes, that adds more load to the HA syncing job | 15:11 |
noonedeadpunk | iirc there were also some new features in rabbit that allowed syncing queues between specific nodes only, but not sure here | 15:11 |
noonedeadpunk | yeah, they don't use HA queues | 15:12 |
noonedeadpunk | not sure how they recover in case of rabbit failure though... | 15:12 |
noonedeadpunk | what we saw previously without HA queues is that services got stuck on rabbit failover for some reason. Maybe it's solved now though | 15:12 |
spatel | when I last talked to someone, they said they were running this HA policy which helped them a lot - http://paste.openstack.org/show/807279/ | 15:13 |
spatel | no HA for notifications* etc, which is useless | 15:14 |
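The pasted policy itself is not recoverable from the log, but the general shape of such a policy (mirror everything except notification queues) can be sketched with rabbitmqctl; the vhost, policy name, and regex below are illustrative assumptions, not the contents of the paste:

```shell
# Mirror all queues except notifications.* to 2 nodes (illustrative values)
rabbitmqctl set_policy -p /neutron --apply-to queues HA \
  '^(?!notifications\.).*' \
  '{"ha-mode":"exactly","ha-params":2,"ha-sync-mode":"automatic"}'
```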
spatel | I will try to play with it in the lab and see how we can make RabbitMQ more responsive... it's painful when you want to scale.. | 15:15 |
spatel | OVN can solve a lot of rabbitMQ issues but it's a beast in itself | 15:15 |
fridtjof[m] | jrosser: re the issue mindthecap was having, I've personally experienced this on Ubuntu 18.04 at least | 15:47 |
jrosser | ah - i was wondering if it was a centos type thing where the service is installed but disabled by default | 15:48 |
*** mgoddard- is now known as mgoddard | 20:50 | |
opendevreview | Ghanshyam proposed openstack/openstack-ansible master: Moving IRC network reference to OFTC https://review.opendev.org/c/openstack/openstack-ansible/+/800127 | 23:25 |
opendevreview | Ghanshyam proposed openstack/ansible-hardening master: Moving IRC network reference to OFTC https://review.opendev.org/c/openstack/ansible-hardening/+/800128 | 23:26 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!