*** prometheanfire has joined #openstack-ansible | 00:20 | |
*** tosky has quit IRC | 00:36 | |
*** cshen has joined #openstack-ansible | 01:45 | |
*** cshen has quit IRC | 01:49 | |
ThiagoCMC | Never stop! OpenStack is fun! =P | 02:22 |
ThiagoCMC | Victoria this week?! lol | 02:22 |
*** dave-mccowan has joined #openstack-ansible | 02:56 | |
*** cshen has joined #openstack-ansible | 03:45 | |
*** cshen has quit IRC | 03:50 | |
*** akahat is now known as akahat|ruck | 04:11 | |
*** evrardjp has quit IRC | 05:33 | |
*** evrardjp has joined #openstack-ansible | 05:33 | |
*** cshen has joined #openstack-ansible | 05:44 | |
*** cshen has quit IRC | 05:48 | |
*** cshen has joined #openstack-ansible | 06:05 | |
*** cshen has quit IRC | 06:09 | |
*** cshen has joined #openstack-ansible | 06:12 | |
*** cshen has quit IRC | 06:17 | |
openstackgerrit | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_zun master: Update zun role to match current requirements https://review.opendev.org/c/openstack/openstack-ansible-os_zun/+/763141 | 06:26 |
openstackgerrit | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_zun master: Update zun role to match current requirements https://review.opendev.org/c/openstack/openstack-ansible-os_zun/+/763141 | 06:31 |
*** kukacz has quit IRC | 06:34 | |
*** pto has joined #openstack-ansible | 06:38 | |
*** pto has quit IRC | 06:38 | |
openstackgerrit | Merged openstack/openstack-ansible-tests stable/ussuri: Bump virtualenv to version prior to 20.2.2 https://review.opendev.org/c/openstack/openstack-ansible-tests/+/766801 | 06:46 |
*** kukacz has joined #openstack-ansible | 06:57 | |
*** pcaruana has joined #openstack-ansible | 07:50 | |
*** pcaruana has quit IRC | 07:51 | |
*** masterpe has quit IRC | 08:16 | |
*** gundalow has quit IRC | 08:16 | |
*** tbarron has quit IRC | 08:16 | |
*** cshen has joined #openstack-ansible | 08:17 | |
*** johanssone has quit IRC | 08:19 | |
*** andrewbonney has joined #openstack-ansible | 08:20 | |
*** gundalow has joined #openstack-ansible | 08:22 | |
*** tbarron has joined #openstack-ansible | 08:22 | |
*** johanssone has joined #openstack-ansible | 08:23 | |
*** rpittau|afk is now known as rpittau | 08:27 | |
*** tosky has joined #openstack-ansible | 08:38 | |
*** akahat|ruck is now known as akahat|lunch | 09:08 | |
noonedeadpunk | mornings | 09:10 |
openstackgerrit | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: [DNM] https://review.opendev.org/c/openstack/openstack-ansible/+/766901 | 09:15 |
openstackgerrit | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_zun master: Update zun role to match current requirements https://review.opendev.org/c/openstack/openstack-ansible-os_zun/+/763141 | 09:17 |
*** macz_ has joined #openstack-ansible | 09:20 | |
openstackgerrit | Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/ussuri: Apply /etc/environment for runtime after adjustment https://review.opendev.org/c/openstack/openstack-ansible/+/766798 | 09:21 |
noonedeadpunk | jrosser: regarding security.txt - you decided to have both for keystone and haproxy? | 09:23 |
*** macz_ has quit IRC | 09:24 | |
openstackgerrit | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_keystone stable/ussuri: Move openstack-ansible-uw_apache centos job to centos-8 https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/765928 | 09:25 |
jrosser | noonedeadpunk: it's keystone apache/nginx that serves the actual file | 09:38 |
jrosser | but intervention is needed on haproxy to intercept https://example.com:443/security.txt to the backend which is normally listening on port 5000 | 09:39 |
* jrosser double checks the patch | 09:41 | |
noonedeadpunk | ah, ok | 09:57 |
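In haproxy terms, the interception jrosser describes might look roughly like this (a sketch only; the frontend/backend names and the .well-known path are assumptions, not the actual patch):

```
frontend openstack-external
    # send security.txt requests on the public VIP to keystone's backend,
    # which normally listens on port 5000
    acl security_txt path /security.txt /.well-known/security.txt
    use_backend keystone_service-back if security_txt
```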
jrosser | git breakage on train bump upgrade job, python_venv_build repo fatal: reference is not a tree: 74d3eeacc72d5d6bb7a915e83440626a8d16a1c0 | 10:07 |
jrosser | that is so weird | 10:08
*** gshippey has joined #openstack-ansible | 10:11 | |
*** akahat|lunch is now known as akahat|ruck | 10:32 | |
*** sshnaidm|off has quit IRC | 10:47 | |
*** SecOpsNinja has joined #openstack-ansible | 11:06 | |
noonedeadpunk | andrewbonney: ok, so seems focal just fails with kuryr from victoria | 11:09 |
noonedeadpunk | seems it's missing some other backport to victoria | 11:09 |
noonedeadpunk | oh, sorry, pinged too early - it's still on the passing tempest step :9 | 11:10
andrewbonney | :) | 11:10 |
noonedeadpunk | it's bionic that passed | 11:10
andrewbonney | I've got an AIO going so I can always debug further | 11:10 |
openstackgerrit | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Remove *_git_project_group variables https://review.opendev.org/c/openstack/openstack-ansible/+/766039 | 11:11 |
noonedeadpunk | I've returned the patch to the state of patchset 10, which was your last one | 11:12
*** sshnaidm has joined #openstack-ansible | 11:17 | |
andrewbonney | noonedeadpunk: timing looks suspicious for the release of https://docs.docker.com/engine/release-notes/#20100. I'm investigating... | 11:49 |
noonedeadpunk | oh, well, it really does... | 11:49 |
noonedeadpunk | do we add the docker repo? as I'm not sure ubuntu would just publish the latest.... | 11:50
andrewbonney | Yeah, the ubuntu ones tend to be a long way behind | 11:50 |
noonedeadpunk | if we add the repo, it's probably worth using the apt_package_pinning role | 11:51
andrewbonney | I'll take a look at that once I can confirm a downgrade fixes the test | 11:52 |
noonedeadpunk | good example I guess in rabbit https://opendev.org/openstack/openstack-ansible-rabbitmq_server/src/branch/master/tasks/install_apt.yml#L16-L30 | 11:55 |
andrewbonney | Thanks. That definitely fixes it so I'll add a pin | 11:58 |
noonedeadpunk | and drop depends on I've added then :) | 11:59 |
andrewbonney | Will do | 11:59 |
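Following the rabbitmq_server pattern linked above, the pin could be expressed through the apt_package_pinning role roughly like this (the docker package name and version string are illustrative assumptions, not the final patch):

```yaml
# pin docker below the 20.10.0 release suspected of breaking the job
apt_pinned_packages:
  - package: "docker-ce"
    version: "5:19.03.*"
```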
*** rfolco has joined #openstack-ansible | 12:03 | |
openstackgerrit | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_octavia master: [doc] Adjut octavia docs https://review.opendev.org/c/openstack/openstack-ansible-os_octavia/+/766833 | 12:10 |
openstackgerrit | Merged openstack/openstack-ansible-os_masakari master: Add taskflow connection details https://review.opendev.org/c/openstack/openstack-ansible-os_masakari/+/766830 | 12:24 |
openstackgerrit | Merged openstack/openstack-ansible-os_octavia master: Delegate info gathering to setup host https://review.opendev.org/c/openstack/openstack-ansible-os_octavia/+/766693 | 12:41 |
openstackgerrit | Merged openstack/openstack-ansible-os_octavia master: Trigger service restart on cert change https://review.opendev.org/c/openstack/openstack-ansible-os_octavia/+/766062 | 12:41 |
openstackgerrit | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_octavia master: [doc] Adjut octavia docs https://review.opendev.org/c/openstack/openstack-ansible-os_octavia/+/766833 | 12:53 |
openstackgerrit | Merged openstack/openstack-ansible-os_keystone master: Remove centos-7 conditional packages https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/765931 | 12:58 |
openstackgerrit | Merged openstack/openstack-ansible-openstack_hosts master: Make CentOS 8 metal voting again https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/766425 | 12:58 |
openstackgerrit | Merged openstack/openstack-ansible master: Bump SHAs for master https://review.opendev.org/c/openstack/openstack-ansible/+/766858 | 13:03 |
openstackgerrit | Andrew Bonney proposed openstack/openstack-ansible-os_zun master: Update zun role to match current requirements https://review.opendev.org/c/openstack/openstack-ansible-os_zun/+/763141 | 13:11 |
admin0 | quick question .. when using sr-iov, is it transparent in horizon ? | 13:14 |
admin0 | i mean can a user create instances as normal and it will get sr-iov ports | 13:14 |
*** mgariepy has joined #openstack-ansible | 13:18 | |
*** redrobot has quit IRC | 13:29 | |
jrosser | admin0: an sriov vm is always two steps - create the port, create the vm attached to the port | 13:30 |
jrosser | so it's not the same as a non-sriov case | 13:31 |
admin0 | i want to be able to give users horizon and not do support in the office .. so looking for something easy for me to explain to users | 13:35
jrosser | well it is what it is, you can write instructions for how to do this with horizon, but if i remember right it is different to a regular vm | 13:37 |
admin0 | and i also read the sr-iov can be used with linuxbridge also .. no need for ovs | 13:38 |
admin0 | if you personally had a choice for a new greenfield with sr-iov plus either lb or ovs, what would you recommend .. ( for an internal cloud with 1000+ users ), trying to keep support and complexity to a min | 13:39
admin0 | and also, if the card supports sr-iov and dpdk, is it not a good idea to use it ? | 13:39
jrosser | they are not the same, so if you want line speed networking to your VM then sriov is one way to do that | 13:39 |
jrosser | but if you want security groups, or vxlan, and all the other stuff, then you want linuxbridge/ovs | 13:40 |
admin0 | can both co-exist | 13:40 |
jrosser | yes | 13:40 |
admin0 | like normally people will get ovs/lb .. but if they want very fast, do the sr-iov stuff | 13:40 |
jrosser | generally the recommendation is to have a dedicated nic for sriov | 13:40 |
admin0 | oh | 13:40 |
admin0 | so 2 diff vlan providers .. | 13:41 |
jrosser | well you don't have to, but you mix up a lot of things | 13:41 |
admin0 | one for sr-iov, one for normal | 13:41 |
admin0 | that is good to know as well | 13:41 |
admin0 | so i will go with regular lb for now for this 10g .. and later, add another 10g and dedicate it to sr-iov .. I do not need ovs at all, right ? | 13:43
admin0 | one more question .. does osa support mixed hypervisors ? like one using lb and another using ovs ? | 13:46
admin0 | this specific use case might see up to 200 (small) instances on a single hypervisor .. so trying to figure out at what point lb will be a bottleneck | 13:47
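For reference, the two-step flow jrosser describes would look something like this with the openstack CLI (the network, flavor, image and port names here are placeholders):

```shell
# step 1: create the SR-IOV port explicitly with vnic-type direct
openstack port create --network provider-vlan --vnic-type direct sriov-port
# step 2: boot the VM attached to that pre-created port
openstack server create --flavor m1.small --image focal --port sriov-port sriov-vm
```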
openstackgerrit | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Remove *_git_project_group variables https://review.opendev.org/c/openstack/openstack-ansible/+/766039 | 13:52 |
openstackgerrit | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Remove *_git_project_group variables https://review.opendev.org/c/openstack/openstack-ansible/+/766039 | 13:55 |
*** dave-mccowan has quit IRC | 13:55 | |
openstackgerrit | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Remove openstack_testing.yml for RC https://review.opendev.org/c/openstack/openstack-ansible/+/766957 | 13:56 |
*** spatel has joined #openstack-ansible | 13:58 | |
*** mgariepy has quit IRC | 14:06 | |
*** mgariepy has joined #openstack-ansible | 14:06 | |
jrosser | spatel: is that a discussion for #rdo or really for here? | 14:12 |
spatel | sorry i didn't realize i was in RDO :( | 14:14
spatel | what is your thought on that? | 14:14 |
noonedeadpunk | I think we will probably try to switch to RDO but as you might guess, there are no guarantees with CentOS these days... | 14:15
spatel | noonedeadpunk: that is why i am worried | 14:15 |
noonedeadpunk | there will be also Cloudlinux forks of CentOS | 14:15 |
spatel | right now i have a choice to make; after 1 year i won't | 14:15
noonedeadpunk | but yeah... | 14:15 |
jrosser | i did a Centos 8 Stream AIO this morning | 14:16 |
noonedeadpunk | eventually even cPanel started development for Ubuntu and promises to release by the end of 2021 | 14:16
noonedeadpunk | I'm pretty sure it just worked :) | 14:16 |
spatel | jrosser: what is your experience | 14:16 |
jrosser | right now i see this Transaction test error:\n file /usr/share/man/man7/systemd.net-naming-scheme.7.gz from install of systemd-239-43.el8.x86_64 conflicts with file from package systemd-networkd-246.6-1.el8.x86_64 | 14:17 |
jrosser | and i just put my head in my hands and sigh | 14:17 |
spatel | jrosser: damn it | 14:17 |
noonedeadpunk | oh, rly ? | 14:17 |
noonedeadpunk | come on.... | 14:17 |
openstackgerrit | Marc Gariépy proposed openstack/openstack-ansible-haproxy_server master: Add haproxy_frontend_only and haproxy_raw feature. https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/766504 | 14:17 |
spatel | I am giving second thought of ubuntu | 14:17 |
noonedeadpunk | great CI they told | 14:17 |
spatel | Debian is good but worried about hardware support | 14:17 |
noonedeadpunk | things won't be broken any more they said | 14:18 |
jrosser | well this is because stream has a newer systemd than the one in EPEL where we get the networkd bit from | 14:18
mgariepy | morning everyone | 14:18 |
jrosser | oh also amusingly in ansible you cannot differentiate between centos 8.x and Centos 8 stream | 14:18 |
spatel | stream will take rolling updates from fedora so definitely they will get updated more frequently | 14:18
jrosser | because version = "8" | 14:18
jrosser | so as far as ansible facts are concerned it's older in a version compare than 8.3 | 14:19
jrosser | which breaks what we just merged for the kernel module renaming | 14:19 |
noonedeadpunk | ┻━┻︵ \(°□°)/ ︵ ┻━┻ | 14:19 |
jrosser | i think we have to grep in /etc/redhat-release and set a local fact | 14:20 |
jrosser | oh wait | 14:22 |
noonedeadpunk | or, we can just say from the next release that centos 8 is not supported and only stream is, which sucks. but leave the regular centos bit for future forks of centos... | 14:22
jrosser | weirdly it's installed systemd-networkd from epel fine | 14:23
jrosser | i wonder if it tries to do it one more time in lxc_hosts and that's blowing up | 14:23
jrosser | i would like to treat it like a totally different distro | 14:23 |
openstackgerrit | Merged openstack/openstack-ansible-os_keystone master: Add security.txt file hosting to keystone https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/766437 | 14:24 |
jrosser | already it's obvious that all our version detection stuff is just wrong | 14:24 |
noonedeadpunk | I'm wondering if it has some difference in ansible_distribution_release or smth... | 14:28 |
jrosser | i could not find anything to drive centos(classic) vs centos(stream) logic | 14:32 |
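The /etc/redhat-release approach jrosser suggests could be sketched like this, since the ansible facts report version "8" for both classic CentOS and Stream (the release string below is hard-coded for illustration; on a real host it would come from `cat /etc/redhat-release`):

```shell
# distinguish CentOS Stream from classic CentOS 8 by the release string
release="CentOS Stream release 8"   # normally: release=$(cat /etc/redhat-release)
case "$release" in
  *Stream*) variant="stream" ;;
  *)        variant="classic" ;;
esac
echo "$variant"
```

In Ansible this would typically end up as a local fact written under /etc/ansible/facts.d so later plays can branch on it.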
noonedeadpunk | also looking through a similar thread on their forum.... | 14:34
noonedeadpunk | and no solution there | 14:34 |
noonedeadpunk | how frustrating | 14:35 |
openstackgerrit | Merged openstack/openstack-ansible stable/ussuri: Apply /etc/environment for runtime after adjustment https://review.opendev.org/c/openstack/openstack-ansible/+/766798 | 14:37 |
spatel | Folks, I have decided to rebuild my openstack using Ubuntu | 14:37
mgariepy | spatel, ;) | 14:37 |
openstackgerrit | Linhui Zhou proposed openstack/openstack-ansible-os_magnum master: Replace deprecated UPPER_CONSTRAINTS_FILE variable https://review.opendev.org/c/openstack/openstack-ansible-os_magnum/+/762057 | 14:37 |
noonedeadpunk | ☜(⌒▽⌒)☞ | 14:37 |
spatel | I talked to my team and they give me thumbs up for ubuntu | 14:37 |
noonedeadpunk | lol | 14:38 |
spatel | No centOS hacks anymore | 14:38 |
noonedeadpunk | so I'm starting to wonder - will there be anybody interested in centos in half a year? | 14:38
spatel | I am thinking about Debian but a little worried | 14:38
noonedeadpunk | Debian is good imo | 14:38 |
spatel | worried about hardware support | 14:38 |
*** mgariepy has quit IRC | 14:39 | |
spatel | Ubuntu is more popular in openstack community (very well known) | 14:39 |
noonedeadpunk | used to worked for me previously | 14:39 |
spatel | I didn't see anyone using Debian in production | 14:39 |
noonedeadpunk | *used to work | 14:39 |
jrosser | \o/ there is no tar for a container rootfs whatsoever https://cloud.centos.org/centos/8-stream/x86_64/images/ | 14:39 |
admin0 | :) | 14:40 |
*** cshen has quit IRC | 14:41 | |
noonedeadpunk | some infra folks do at least (like fungi) and vexxhost used to run it as well | 14:41
spatel | right now they are using ubuntu right? | 14:42 |
noonedeadpunk | I guess still debian | 14:42 |
spatel | Hmm Let me try both and see how it goes. | 14:43 |
noonedeadpunk | jrosser: I'm just speechless | 14:43
spatel | good thing is, moving forward you won't hear anything from me about CentOS :) | 14:43
jrosser | it's kind of run this far though with just a couple of minor edits | 14:44 |
jrosser | but really i don't know what to do about this | 14:44
noonedeadpunk | spatel: we have worse CI coverage for debian though, but it's pretty much similar to ubuntu... So you might see issues, but nothing serious and smth we totally should fix (and maybe add more tests) | 14:44
noonedeadpunk | jrosser: well, we always have lxcontainers and legacy method | 14:44 |
jrosser | this is the lxc prep log http://paste.openstack.org/show/801012/ | 14:45 |
noonedeadpunk | but last time I saw a really huge performance degradation | 14:45
jrosser | perhaps the prep script runs the command to convert centos->centos stream | 14:45 |
jrosser | but we kind of only get one year out of that whichever way :( | 14:45 |
*** mgariepy has joined #openstack-ansible | 14:56 | |
admin0 | chances also are that, like how centos came to be (a downstream distro), people might just fork it and continue to make it a downstream distro | 15:14
admin0 | it will be the same, just in another name | 15:14 |
admin0 | which has happened to many projects in the past when decisions like this have been taken | 15:15
spatel | I just downloaded Ubuntu Server 20.04.1 LTS (first time in my life) | 15:16
admin0 | spatel, \o/ yay | 15:18 |
spatel | I need to setup PXE boot first to fire up my servers | 15:18
*** cshen has joined #openstack-ansible | 15:26 | |
*** macz_ has joined #openstack-ansible | 15:37 | |
*** macz_ has joined #openstack-ansible | 15:38 | |
kleini | ubuntu server is great, never had big issues with it. especially ZFS support in ubuntu solved my problems with filesystems getting too fragmented over time in production systems | 15:45 |
SecOpsNinja | hi everyone. one quick question: is there an easy way to recreate queues in rabbitmq? im getting "nova-scheduler: amqp.exceptions.NotFound: Queue.declare: (404) NOT_FOUND - queue 'scheduler_fanout_*' in vhost '/nova' process is stopped by supervisor" which is the cause of the Connection failed: [Errno 113] EHOSTUNREACH (retrying in 32.0 seconds): OSError: [Errno 113] EHOSTUNREACH in nova-conductor. | 15:45
openstackgerrit | Merged openstack/ansible-role-systemd_service master: Use upper-constraints for all tox environments https://review.opendev.org/c/openstack/ansible-role-systemd_service/+/765831 | 15:55 |
SecOpsNinja | yep i confirm the queues exist but they don't have any messages in them... what could be the problem? the compute node not being able to connect to rabbitmq? | 15:57
spatel | SecOpsNinja: i had same issue and re-building rabbitMQ helped - https://bugs.launchpad.net/nova/+bug/1835637 | 16:03 |
openstack | Launchpad bug 1835637 in OpenStack Compute (nova) "(404) NOT_FOUND - failed to perform operation on queue 'notifications.info' in vhost '/nova' due to timeout" [Undecided,Incomplete] | 16:03 |
openstackgerrit | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_octavia master: [doc] Adjust octavia docs https://review.opendev.org/c/openstack/openstack-ansible-os_octavia/+/766833 | 16:03 |
spatel | RabbitMQ is much easier to re-build than troubleshoot | 16:03
admin0 | SecOpsNinja, you can nuke the 3 rabbitmq containers and re-do it .. it will add the queues and fix itself | 16:05
admin0 | based on your build time, some agents might not retry, so you might have to locate them and manually restart the services | 16:05 |
noonedeadpunk | it's way faster just to rerun `rabbitmq-install.yml -e rabbitmq_upgrade=true` | 16:06
noonedeadpunk | at least I'd start with it in case of suspected issues with rabbit | 16:07
SecOpsNinja | but from what im seeing the queue exists in /nova but it doesn't have any messages. now i don't know if the problem is a connectivity one from the compute node or from nova-api towards the rabbitmq cluster | 16:09
SecOpsNinja | still trying to find a way to see who is connected to which queue and see if i can find the problem | 16:10
spatel | SecOpsNinja: tcpdump will give you idea if anything hitting RabbitMQ or not | 16:17 |
spatel | RabbitMQ is complex sometime internal message routing is broken also cause issue and not visible until you debug components | 16:18 |
SecOpsNinja | im checking /var/log/rabbitmq/*cf.log after rebooting this container and seeing who is connected | 16:18
SecOpsNinja | but yeh atm i can't create vms because they get stuck in scheduling forever | 16:19
spatel | SecOpsNinja: use RabbitMQ GUI management interface which is easy to understand who is connected and where | 16:19 |
openstackgerrit | Merged openstack/openstack-ansible-tests master: Return centos-8 jobs to voting https://review.opendev.org/c/openstack/openstack-ansible-tests/+/765986 | 16:19 |
SecOpsNinja | spatel, what's that GUI? can you give the url? im using the cli rabbitmqctl | 16:20
admin0 | anyone using netplan for declaring ovs setup on ubuntu 20 for osa ? | 16:20 |
admin0 | last i tried was in 18.04, but netplan was new and there was no ovs support on it | 16:20 |
spatel | SecOpsNinja: https://www.rabbitmq.com/management.html | 16:20 |
spatel | The management UI can be accessed using a Web browser at http://{rabbitmq_container_ip}:15672/ | 16:21 |
spatel | you may need to do some kind of SSH port forwarding if container network not accessible from your desktop | 16:21 |
SecOpsNinja | spatel, ok thanks :D | 16:21 |
spatel | SecOpsNinja: you can find UI password from cat /etc/openstack_deploy/user_secrets.yml | grep rabbitmq_monitoring_password | 16:23 |
SecOpsNinja | suposse the username is admin? | 16:23 |
spatel | username monitoring | 16:23 |
SecOpsNinja | ok thanks | 16:23 |
SecOpsNinja | will check it now | 16:24 |
SecOpsNinja | to see if i can understand what is happening | 16:24 |
admin0 | what i do is use firefox and foxyproxy with patterns like *172.29.236.* via socks port say 17221 .. then,via ssh do ssh user@deploy/or-any-server -D 17221 ( which opens a socks tunnel on 17221) | 16:26 |
admin0 | then you can browse/reach any IP that the server you are doing an ssh to reaches | 16:26 |
SecOpsNinja | yep i normally use the ssh tunnel but in this case im on the same management network so it's not a problem. but yes the GUI is a lot easier for seeing the connections :D | 16:27
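The port forwarding spatel mentions can also be done with a plain local forward instead of a SOCKS proxy (the host name and container IP here are placeholders):

```shell
# forward local port 15672 to the rabbitmq container's management UI
ssh -L 15672:172.29.236.XX:15672 user@deployment-host
# then browse http://localhost:15672/ and log in as the "monitoring" user
```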
spatel | I think OSA should expose rabbitmq monitoring to the external network using HAProxy :) let me tag jrosser & noonedeadpunk | 16:27
*** fanfi has joined #openstack-ansible | 16:28 | |
admin0 | should be via a user_variable | 16:28 |
spatel | i am not seeing any security issue in exposing that port because it's a read-only account with a password | 16:30
spatel | I love SSH tunnel stuff but it's hard to teach every person, especially NOC people.. | 16:31
noonedeadpunk | the problem with just the monitoring user is that only very limited metrics can be gathered with it | 16:32
noonedeadpunk | I usually put the admin tag on it to make a full-privilege user to gather all available data... but dunno about security... | 16:32
spatel | noonedeadpunk: we can give more privilege | 16:32 |
noonedeadpunk | rabbit runs on mgmt network which should not be exposed | 16:32 |
spatel | noonedeadpunk: question is can we expose it via HAproxy or not? | 16:32 |
noonedeadpunk | ah | 16:33 |
spatel | I want to just type http://openstack.example.com:<rabbit_port>/ on my browser | 16:33 |
spatel | without any SSH tunnel hacks | 16:33 |
noonedeadpunk | I think you can just do it with haproxy_extra_services | 16:34 |
spatel | i didn't know that | 16:34 |
spatel | can we add that example snippet in RabbitMQ troubleshooting page of OSA documents? | 16:34 |
spatel | i meant at this page - https://docs.openstack.org/openstack-ansible/pike/admin/maintenance-tasks/rabbitmq-maintain.html | 16:35 |
noonedeadpunk | I don't have an example to hand... but it would be pretty much the same as https://opendev.org/openstack/openstack-ansible/src/branch/master/inventory/group_vars/haproxy/haproxy.yml#L64-L74 | 16:35
noonedeadpunk | I think that would be more proper place for this kind of doc https://docs.openstack.org/openstack-ansible-rabbitmq_server/latest/configure-rabbitmq.html | 16:36 |
spatel | I will test that out in lab and if everyone agreed then put example in that link | 16:36 |
noonedeadpunk | but maybe your link is good too... | 16:37 |
noonedeadpunk | as eventually it's really maintenance... | 16:37
spatel | will do there, i want all possible hacks to fix RabbitMQ in single page :) | 16:38 |
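An untested sketch of what noonedeadpunk suggests, modeled on the haproxy.yml example linked above; the exact variable shape and service name are assumptions and may differ between releases:

```yaml
# expose the rabbitmq management UI through haproxy on the external VIP
haproxy_extra_services:
  - service:
      haproxy_service_name: rabbitmq_mgmt
      haproxy_backend_nodes: "{{ groups['rabbitmq_all'] | default([]) }}"
      haproxy_port: 15672
      haproxy_balance_type: http
```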
SecOpsNinja | yep, i think i will do what noonedeadpunk suggested and try running rabbitmq-install.yml -e rabbitmq_upgrade=true and see if it resolves the EHOSTUNREACH... | 16:38
admin0 | another check will be to actually curl/ping and see if it's a network issue and not rabbit | 16:39
SecOpsNinja | in logs and haproxy i dont see any connection drops | 16:40 |
noonedeadpunk | rabbit does not go through haproxy by the way | 16:41
spatel | In my last 3 years of openstack operations i found re-building RabbitMQ fixed all kinds of issues (even with all monitoring showing green and the cluster looking healthy). | 16:41
noonedeadpunk | but I'd rather run this playbook tbh | 16:41 |
noonedeadpunk | it never made things worse at least for me | 16:42 |
spatel | noonedeadpunk: do you reset cluster (clear mnesia directory) before running that playbook? | 16:42 |
SecOpsNinja | noonedeadpunk, i was checking the health checks in haproxy regarding the rabbitmq containers to see if they drop anything, but yep the majority of errors i have are always something in rabbitmq... ok i will rerun openstack-ansible and see if it resolves the problem or not | 16:43
noonedeadpunk | spatel: nope, just run it :) | 16:43 |
spatel | I found if you have a dirty mnesia directory then rabbitMQ starts to fail and the playbook gets stuck | 16:44
spatel | but again its case to case.. | 16:44 |
noonedeadpunk | Hm, maybe... I just never faced that, but I can imagine that happening tbh | 16:44 |
noonedeadpunk | and I never run on centos, so... | 16:44 |
noonedeadpunk | (well actually ran but it was not so many times as for ubuntu) | 16:45 |
spatel | may be depend on what state your cluster die | 16:45 |
noonedeadpunk | well yeah | 16:45 |
noonedeadpunk | I mostly experienced issues after one controller outage was re-joining cluster | 16:45 |
SecOpsNinja | spatel, noonedeadpunk regarding upgrading rabbitmq: are these [req-*] identifiers going to be reset when openstack-ansible finishes, or is there any way i can reset this behaviour? | 16:49
spatel | SecOpsNinja: I don't understand your question (what is req-*?) | 16:51 |
admin0 | SecOpsNinja, those req-s are going to be lost | 16:51 |
admin0 | coz the new db will have no idea of the request | 16:51 |
admin0 | request-id | 16:51 |
SecOpsNinja | nova-conductor[449]: 2020-12-14 16:48:16.438 449 ERROR oslo.messaging._drivers.impl_rabbit [req-4901b480-6728-4b58-994f-8ed141e7898e - - - - -] Connection failed: [Errno 113] EHOSTUNREACH (retrying in 32.0 seconds): OSError: [Errno 113] EHOSTUNREACH still showing after openstack-ansible rabbitmq-install.yml -e rabbitmq_upgrade=true | 16:52 |
openstackgerrit | Jonathan Rosser proposed openstack/openstack-ansible-os_ceilometer master: Remove centos-7 conditional configuration https://review.opendev.org/c/openstack/openstack-ansible-os_ceilometer/+/765956 | 16:52 |
SecOpsNinja | yep the rabbitmq cluster shouldn't know this request id but the clients are still expecting an answer to it | 16:52
SecOpsNinja | that is why i asked how to reset this information from the consumers. i already deleted the server in openstack but nova-conductor is still requesting an answer to that previous request id | 16:53
spatel | not sure if openstack-ansible rabbitmq-install.yml -e rabbitmq_upgrade=true re-builds the cluster from scratch (like deleting everything) | 16:54
admin0 | it will timeout and not complain after a while | 16:54 |
spatel | queues have a TTL and messages will die after the TTL expires | 16:54
SecOpsNinja | because i still have the 2 request ids from almost 5h ago and it's still complaining :D | 16:54
spatel | You can also delete that message manually (need to google or use the UI to delete those requests) | 16:55
spatel | noonedeadpunk: question for you: does openstack-ansible rabbitmq-install.yml -e rabbitmq_upgrade=true destroy the cluster and re-build it like *new* ? | 16:56
SecOpsNinja | ok i will try to find a way to delete that because the queues are empty of messages | 16:56
noonedeadpunk | pretty close to this | 17:04 |
noonedeadpunk | yes | 17:04 |
noonedeadpunk | it drops queues, and rebuilds cluster | 17:04 |
spatel | noonedeadpunk: Does it preserve data during the re-build because it's in HA? | 17:05
noonedeadpunk | except it does not drop already created users, vhosts and some more of the persistent data | 17:05
noonedeadpunk | but it does drop all messages that were there | 17:06 |
spatel | that is why SecOpsNinja's req-* is still in the queue (because it's preserved) | 17:06
spatel | hmm | 17:06 |
noonedeadpunk | (well I'm not 100% sure about that) | 17:06 |
spatel | I believe if it's in HA then it will preserve data in the queue (i would like to try that out) | 17:07
*** jbadiapa has joined #openstack-ansible | 17:08 | |
noonedeadpunk | EHOSTUNREACH ofc sounds more like networking... are you able to telnet to port 5671 on all the rabbitmq containers from the nova-api one? | 17:09
jrosser | looking at what nova-conductor is trying to connect to with strace -p <pid>, then ping / check routes / telnet to whatever it's trying to connect to, is a good plan for these situations | 17:15
jrosser | you'll see the actual IP it's trying like that | 17:15 |
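The checks jrosser describes, roughly (the PID and container IP are placeholders):

```shell
# see which IPs nova-conductor is actually trying to reach
strace -f -e trace=network -p <nova-conductor-pid>
# then verify basic reachability of the rabbitmq port from the same container
ping 172.29.236.XX
telnet 172.29.236.XX 5671
```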
*** mgariepy has quit IRC | 17:19 | |
openstackgerrit | Merged openstack/openstack-ansible-repo_server master: Fix order for removing nginx file. https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/766257 | 17:35 |
SecOpsNinja | noonedeadpunk, spatel and jrosser yep i have a rabbitmq cluster with 3 nodes. and i see that after recreating it the queues in the /nova vhost are still the same | 17:36
SecOpsNinja | i will try to do that with strace and see if i can find it because i dont see any log of dropping connections on the rabbitmq nodes | 17:37
admin0 | anyone doing netplan+ovs -- can share config ? | 17:37 |
SecOpsNinja | the only way to stop the ERROR oslo.messaging._drivers.impl_rabbit in nova-scheduler and nova-conductor was restarting the systemd service. going to strace both pids and make the request to create a new server and see what happens | 17:38
*** johanssone has quit IRC | 17:45 | |
*** johanssone has joined #openstack-ansible | 17:47 | |
*** rpittau is now known as rpittau|afk | 17:56 | |
*** spatel has quit IRC | 17:57 | |
*** maharg101 has quit IRC | 17:58 | |
*** spatel has joined #openstack-ansible | 17:59 | |
SecOpsNinja | jrosser, one question regarding strace: if using it on the parent process of nova-scheduler or nova-conductor i only see something like this: select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=9973}) = 0 (Timeout). how should i use strace? | 18:02
*** carlosm has joined #openstack-ansible | 18:03 | |
carlosm | hi guys | 18:03 |
SecOpsNinja | from what i am able to see in the logs, the moment i try to create a new server using the cli, i see the validation of /v2.1/flavors/ and after that i get uwsgi[72]: Mon Dec 14 17:47:22 2020 - SIGPIPE: writing to a closed pipe/socket/fd (probably the client disconnected) on request /v2.1/servers (ip of the host) !!! | 18:03
SecOpsNinja | and 2 seconds later nova and the scheduler reconnect and start giving EHOSTUNREACH errors... | 18:05
carlosm | My neutron has following erros, someones knows? : Device brq3c0d52cf-11 cannot be used as it has no MAC address | 18:05 |
admin0 | SecOpsNinja, have you tried rebooting this host again :) | 18:12 |
SecOpsNinja | yep, various times, including the nova and scheduler containers | 18:13 |
SecOpsNinja | I'm now trying to strace the nova-api-wsgi pid to see what causes the "A recoverable connection/channel error occurred, trying to reconnect: Server unexpectedly closed connection" | 18:14 |
*** mgariepy has joined #openstack-ansible | 18:20 | |
spatel | SecOpsNinja: just curious, what is your tcpdump saying? it should give you all the information | 18:20 |
SecOpsNinja | ok, I do see some connections from the host in the strace of the uwsgi pid (/etc/uwsgi/nova-api-os-compute.ini) getting ECONNRESET (Connection reset by peer), and I see an error regarding "HTTP exception thrown: Flavor basic-small could not be found" even though it shows the flavor as public | 18:21 |
SecOpsNinja | let me try again | 18:22 |
SecOpsNinja | spatel, trying to reduce the quantity of messages, because tcpdump -i the1 inside the nova-api container gets a lot of info | 18:26 |
spatel | you need to filter for the port and just grab 1 call to trace and watch it start to finish | 18:33 |
spatel | RabbitMQ uses TCP so it will keep the connection in ESTABLISHED mode (so you won't see any SYN/ACK) | 18:34 |
SecOpsNinja | how do I do the trace with just one packet? | 18:34 |
spatel | download pcap and use wireshark | 18:34 |
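A capture written to a pcap for Wireshark could look like this (the interface name is an assumption; RabbitMQ's default ports are 5672, or 5671 with TLS):

```shell
# Inside the nova-api container: keep only RabbitMQ traffic and save it to a file.
tcpdump -i eth1 -nn -s0 -w /tmp/rabbit.pcap 'tcp port 5672 or tcp port 5671'
# Stop with Ctrl-C, copy /tmp/rabbit.pcap to your workstation, open it in Wireshark.
```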
spatel | how many compute nodes you have? | 18:35 |
SecOpsNinja | 3 | 18:36 |
SecOpsNinja | and 3 infra ones where i have 1 node of rabbitmq | 18:36 |
SecOpsNinja | but atm I'm seeing another error that could be the problem (or at least narrow it down) | 18:36 |
SecOpsNinja | http://paste.openstack.org/show/801020/ | 18:38 |
SecOpsNinja | this part is strange: GET /v2.1/flavors/basic-small | 18:38 |
SecOpsNinja | because "openstack flavor show basic-small" works | 18:39 |
SecOpsNinja | "GET /v2.1/flavors/basic-small" status: 404 but "GET /v2.1/flavors?is_public=None" status: 200 ? and yet the flavor does have os-flavor-access:is_public : True | 18:41 |
SecOpsNinja | in meantime i will try to do a pcap and use it with wireshark | 18:41 |
SecOpsNinja | because I'm still learning about lxc: is there any way to copy files from inside the containers to the host? | 18:42 |
SecOpsNinja | forget the last question lol.... | 18:42 |
spatel | SecOpsNinja: did you see this - https://ask.openstack.org/en/question/32360/networking-issues-errno-113-ehostunreach/ | 18:44 |
SecOpsNinja | let me check that | 18:45 |
spatel | copy the file via /var/lib/lxc/<container_name>/rootfs/.... | 18:46 |
spatel | i mostly copy in/out using that path, never did scp from host to container :) | 18:46 |
SecOpsNinja | sorry, I don't understand that question, because from what I understand the compute is not able to connect to any service; the nova service log on the compute doesn't report anything, and the nova api only reports connection drops after specific calls | 18:47 |
SecOpsNinja | atm, if I check, all the services show as up and running | 18:47 |
SecOpsNinja | spatel, thanks for the cp path, I normally did scp | 18:48 |
spatel | think LXC container like folders :) | 18:48 |
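As a concrete sketch of that path-based copy (the container name below is an example; lxc-ls lists the real ones):

```shell
# On the LXC host a container's filesystem is just a directory,
# so plain cp works in both directions without scp.
lxc-ls -f    # find the exact container name
cp /var/lib/lxc/infra1_nova_api_container-1a2b3c4d/rootfs/var/log/nova/nova-api.log /tmp/
```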
SecOpsNinja | I'm a bit lost atm, because the openstack services all show up and running, and only after a specific request do I see nova-conductor/scheduler reconnecting to rabbitmq after a few seconds, but I can't find why it is dropping the connection... | 18:50 |
*** openstackgerrit has quit IRC | 18:50 | |
admin0 | is it recommended to change qcow2 to raw if using ceph for cinder/glance/vms ? | 18:55 |
admin0 | for the image | 18:55 |
SecOpsNinja | I don't know why, but the 404 in the uwsgi of nova-api is causing the connection failure to rabbitmq, as you can see here http://paste.openstack.org/show/801021/ | 18:56 |
*** gyee has joined #openstack-ansible | 18:58 | |
SecOpsNinja | and that 172.30.0.2 is the primary ip of the haproxy, so all the requests that I make with the openstack client from outside go through the haproxy ip and not mine | 18:59 |
jrosser | SecOpsNinja: using internal/public would help as those are the terms in the code | 19:00 |
SecOpsNinja | but the internal and public ones are managed by haproxy | 19:01 |
jrosser | i struggle to follow primary/outside | 19:01 |
jrosser | admin0: yes for glance images in ceph you should convert to raw | 19:01 |
SecOpsNinja | sorry, not haproxy but keepalived, but the public endpoints are using the vips | 19:02 |
SecOpsNinja | the public and private endpoints, as I had various haproxys | 19:03 |
SecOpsNinja | i now only have 1, but I'm still using the vip so I don't have to reconfigure the whole cluster | 19:03 |
SecOpsNinja | i will try to strace all forked pids of uwsgi in the nova container to see if I can catch the connection, but strace is a bit unknown to me atm... | 19:05 |
*** openstackgerrit has joined #openstack-ansible | 19:05 | |
openstackgerrit | Merged openstack/openstack-ansible master: Remove *_git_project_group variables https://review.opendev.org/c/openstack/openstack-ansible/+/766039 | 19:05 |
spatel | admin0: use raw for ceph storage | 19:10 |
openstackgerrit | Jonathan Rosser proposed openstack/openstack-ansible-haproxy_server master: Add haproxy_frontend_only and haproxy_raw feature. https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/766504 | 19:10 |
spatel | most people say it boosts performance (I personally never experienced that, so just going with best practices) | 19:11 |
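The usual conversion step before uploading to Ceph-backed Glance might look like this (file and image names are examples); raw images let RBD make copy-on-write clones instead of flattening a qcow2 on every boot:

```shell
# Convert the cloud image to raw, then upload it to Glance.
qemu-img convert -p -f qcow2 -O raw focal-server-cloudimg-amd64.img focal.raw
openstack image create "ubuntu-20.04" \
  --disk-format raw --container-format bare \
  --file focal.raw
```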
spatel | nova directly talk to rabbitMQ (not via haproxy) | 19:13 |
spatel | SecOpsNinja: ^ | 19:13 |
spatel | haproxy shouldn't come into the picture for troubleshooting rabbitmq communication | 19:14 |
SecOpsNinja | spatel, yep, but the info that I have is SIGPIPE: writing to a closed pipe/socket/fd (probably the client disconnected) on request /v2.1/servers/d8508991-78d5-45e3-a7a2-77ca8c11aba0 (ip 172.30.0.2) !!! and 172.30.0.2 is from the physical host and not the nova-api or haproxy containers, so I suppose that info says my openstack client cli dropped the connection, but that shouldn't cause the nova api to lose its connection to rabbitmq | 19:16 |
SecOpsNinja | and tcpdump doesn't show info regarding what/who dropped the connection | 19:17 |
SecOpsNinja | i suppose that all the rabbitmq consumers are always connected to the various rabbitmq cluster nodes, so there must be something that is causing nova-scheduler and nova-conductor to reconnect | 19:17 |
spatel | Make sure there is no MTU mismatch and no packet loss | 19:18 |
SecOpsNinja | because they are the only ones that reconnect after the failed api call | 19:18 |
SecOpsNinja | and i see the reconnects in various rabbitmq logs | 19:18 |
spatel | MTU mismatch is very complex to troubleshoot because it looks like it works but drops packets | 19:18 |
SecOpsNinja | mtu is only a problem if you are using something like vlans because of the header overhead, but otherwise it shouldn't be a problem in lan communication, no? | 19:19 |
spatel | If host A has MTU 9000 and host B has 1500 then you may see issue. | 19:20 |
spatel | It has nothing to do with VLAN or VxLAN | 19:20 |
SecOpsNinja | and I didn't mess with the MTU, so I believe it's the default 1500 that is configured | 19:20 |
SecOpsNinja | let me confirm that, but I believe they all have the same | 19:20 |
spatel | I had an issue with an LXC container 3 years ago, everything was working but it was dropping packets, and it turned out to be a kernel logging issue | 19:21 |
SecOpsNinja | from what I'm seeing the majority are 1500 and some brq/tap interfaces are using 1450 | 19:22 |
spatel | that is good | 19:23 |
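A quick way to test for a path-MTU problem between two hosts is a don't-fragment ping sized to the expected MTU (the address is a placeholder):

```shell
# 1472 = 1500 - 20 (IP header) - 8 (ICMP header); -M do sets the DF bit.
# If this fails while smaller payloads work, something on the path has a lower MTU.
ping -c 3 -M do -s 1472 172.29.236.11

# Compare the configured MTUs on each host/interface:
ip link show | grep -o 'mtu [0-9]*'
```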
SecOpsNinja | but the strange part is that all these problems started when I installed additional infra nodes.... and tried to enable HA on all of them with keepalived and multiple haproxys | 19:23 |
SecOpsNinja | this has been an interesting adventure :D | 19:24 |
spatel | if this is not in production then why don't you destroy the container and re-build it? | 19:24 |
SecOpsNinja | i will make a new test and see if i can detect disconnects in all rabbitmq cluster nodes | 19:25 |
spatel | re-build nova and rabbit | 19:25 |
SecOpsNinja | because I want to understand what the problem is (sometimes I can't destroy and rebuild it) | 19:25 |
spatel | yes, agreed. let us know whatever you find. | 19:27 |
SecOpsNinja | let me make a few tests before going home to rest :D what I will try is first making a request for the flavor list and then trying to create a new server with it | 19:28 |
SecOpsNinja | and see if the rabbitmq cluster nodes report any reconnect/error for the current consumers | 19:29 |
SecOpsNinja | spatel, jrosser, noonedeadpunk: yep, it has to be something in the nova api container/services that is causing the connection drop http://paste.openstack.org/show/801022/ . If I understand correctly, in an openstack cluster it shouldn't lose the connection to rabbitmq because of a 404 or an HTTP exception being thrown. If it was a network connection issue, the various other consumers would also reconnect, but that didn't happen... only in nova_api_container | 19:46 |
SecOpsNinja | and the flavor was recreated in the same project where the images and server are being created, so there must be some misconfiguration on my part, but I can't find where... | 19:47 |
jrosser | it almost suggests that the mq credentials are mismatched between the nova container and the mq cluster | 19:49 |
jrosser | because it disconnects pretty much straight away | 19:49 |
SecOpsNinja | i don't think I replaced the openstack osa secrets, but let me check the nova api conf files | 19:50 |
SecOpsNinja | not finding the password in the nova conf files | 19:53 |
SecOpsNinja | yep, I'm out of ideas for understanding what is happening here... I can try to force the recreation of all containers except rabbitmq and galera and see if that resolves it, but supposedly openstack-ansible should have done all the configuration... | 19:55 |
*** maharg101 has joined #openstack-ansible | 19:55 | |
*** carlosm has quit IRC | 20:00 | |
*** maharg101 has quit IRC | 20:00 | |
spatel | why are you getting {handshake_timeout,handshake}? | 20:01 |
spatel | I have seen that error when cluster is not healthy | 20:02 |
SecOpsNinja | probably because the http exception is thrown and the request doesn't finish? | 20:02 |
SecOpsNinja | but if I go to the cluster it shows that it has all the nodes and there isn't any split brain | 20:02 |
*** viks____ has quit IRC | 20:03 | |
spatel | why don't you run nova in debug mode | 20:03 |
SecOpsNinja | and I did run the rabbitmq install | 20:03 |
*** hindret has quit IRC | 20:03 | |
*** simondodsley has quit IRC | 20:03 | |
SecOpsNinja | i can. let me try to put that service in debug... I suppose with --debug? | 20:03 |
*** simondodsley has joined #openstack-ansible | 20:04 | |
spatel | nova.conf use debug=True | 20:04 |
*** hindret has joined #openstack-ansible | 20:04 | |
SecOpsNinja | where is the file? I could only find *.ini ones | 20:04 |
spatel | inside nova-api container /etc/nova/ | 20:05 |
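The debug switch lives in the [DEFAULT] section of that file, something like:

```ini
# /etc/nova/nova.conf inside the nova-api container
[DEFAULT]
debug = True
```

followed by a restart of the nova services so it takes effect.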
SecOpsNinja | ok give me a minute to change that and open all rabbitmq logs | 20:06 |
*** cshen has quit IRC | 20:10 | |
SecOpsNinja | lol, after restarting all the nova services (nova-api-os-compute.service nova-api-metadata.service nova-conductor.service nova-novncproxy.service nova-scheduler.service) in the container, the api now doesn't drop the connection in the rabbitmq logs | 20:11 |
SecOpsNinja | but it still gives the 404 error on the flavor | 20:11 |
SecOpsNinja | http://paste.openstack.org/show/801025/ | 20:12 |
*** cshen has joined #openstack-ansible | 20:12 | |
jrosser | SecOpsNinja: where are you running the cli commands from? | 20:13 |
SecOpsNinja | my computer that is using haproxy vip endpoint as the OS_AUTH_URL | 20:13 |
jrosser | can you please try from the utlity container | 20:14 |
SecOpsNinja | yep one second | 20:14 |
SecOpsNinja | hmm, one difference I'm finding in the openrc configuration is that the utility one uses the /v3 part and the one on my machine doesn't | 20:15 |
SecOpsNinja | but let me make the request | 20:16 |
*** andrewbonney has quit IRC | 20:17 | |
SecOpsNinja | yep, same behaviour regarding the 404 and 202 - http://paste.openstack.org/show/801026/ | 20:18 |
SecOpsNinja | but still no dropping connections now in rabbitmq | 20:18 |
SecOpsNinja | and I will now try to force the creation of a vm to see if I get more info | 20:19 |
spatel | are you getting the list of flavors with openstack flavor list? | 20:19 |
SecOpsNinja | yes | 20:20 |
SecOpsNinja | that is the strange part, and all of them are public | 20:20 |
spatel | mostly the openstack flavor show command doesn't interact with RabbitMQ | 20:20 |
spatel | That API call goes directly to the MySQL DB | 20:21 |
SecOpsNinja | yep, the openstack flavor list doesn't | 20:21 |
spatel | not sure why the flavor issue is coming into the picture | 20:21 |
SecOpsNinja | at least I don't see anything in the logs | 20:21 |
SecOpsNinja | let me try to create a dummy vm | 20:21 |
SecOpsNinja | jrosser, spatel http://paste.openstack.org/show/801027/ | 20:26 |
SecOpsNinja | and it starts getting problems with rabbitmq disconnects | 20:27 |
SecOpsNinja | let me try to re-post it with info regarding the rabbitmq logs | 20:28 |
spatel | HTTP exception thrown: Flavor basic-small could not be found. | 20:31 |
SecOpsNinja | http://paste.openstack.org/show/801028/ | 20:32 |
SecOpsNinja | but it exists, at least in the flavor list | 20:33 |
SecOpsNinja | and it shows info regarding the specific flavor | 20:33 |
SecOpsNinja | that is very strange indeed | 20:33 |
SecOpsNinja | should I force the destruction of the whole rabbitmq cluster and, after it has been recreated, force a restart of all the infra node containers? | 20:34 |
spatel | 2 node RabbitMQ ? | 20:34 |
spatel | that is bad | 20:34 |
SecOpsNinja | yep i have 3 nodes in my rabbitmq | 20:34 |
SecOpsNinja | the first one didn't report any disconnect | 20:34 |
spatel | I have strong feeling your rabbit isn't in good health | 20:35 |
SecOpsNinja | or the tail didn't update | 20:35 |
SecOpsNinja | yep, it didn't report any disconnect | 20:35 |
spatel | Just nuke rabbitmq and re-build | 20:35 |
SecOpsNinja | so you recommend destroying all the rabbitmq cluster containers, recreating them and running the rabbitmq install? | 20:36 |
admin0 | i would recommend that also | 20:36 |
spatel | This is what i do to nuke rabbitmq | 20:36 |
SecOpsNinja | and the consumers, should I restart all of them or will they be able to resolve their problems? | 20:36 |
spatel | stop all services | 20:37 |
spatel | kill -9 rabbit | 20:37 |
spatel | un-install rabbit (yum remove rabbitmq-server) | 20:37 |
spatel | rm -rf /var/lib/rabbitmq/mnesia/* | 20:37 |
spatel | Run playbook to deploy rabbitmq | 20:37 |
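Collected into one sketch (spatel's steps were yum-based; the package manager, container names and playbook path depend on your deployment):

```shell
# Inside each rabbitmq container on the infra nodes:
systemctl stop rabbitmq-server || true
pkill -9 -f rabbitmq || true
yum remove -y rabbitmq-server        # or apt-get remove on Ubuntu containers
rm -rf /var/lib/rabbitmq/mnesia/*

# Then from the deployment host, rebuild the cluster:
cd /opt/openstack-ansible/playbooks
openstack-ansible rabbitmq-install.yml
```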
SecOpsNinja | when you say stop all services, that is regarding the infra host services that would be using rabbitmq, right? | 20:38 |
spatel | Inside rabbitmq-container | 20:39 |
spatel | on infra nodes | 20:39 |
SecOpsNinja | oh ok | 20:39 |
SecOpsNinja | thanks everyone for all the info, I will try to do that tomorrow and see if I can get this resolved.... I'm having nightmares with rabbits :D | 20:39 |
spatel | rabbit is the worst part of openstack, and the majority of the time you will see issues with rabbitmq | 20:40 |
spatel | i have nuked rabbitMQ multiple times (because none of the troubleshooting guides helped me) | 20:41 |
SecOpsNinja | i would have thought that adding more nodes would make rabbitmq more stable | 20:41 |
mgariepy | i thought neutron was the worst part ;).. lol | 20:41 |
SecOpsNinja | mgariepy, yep, neutron with some plugins is an interesting part also | 20:42 |
spatel | neutron is CPU hungry (i haven't seen any complications with its config) | 20:42 |
SecOpsNinja | thanks again, I will try to give an update tomorrow :D | 20:42 |
SecOpsNinja | gn to all | 20:42 |
spatel | gn | 20:42 |
spatel | I hate the RabbitMQ clustering part, it's always hard to recover. (whenever i tried to join a node it always did something nasty or hung on me) | 20:43 |
spatel | one day i had a split-brain (that was a nightmare) | 20:44 |
spatel | at least with neutron you don't need to deal with clustering issues. | 20:44 |
*** cshen has quit IRC | 20:46 | |
*** SecOpsNinja has left #openstack-ansible | 20:48 | |
*** cshen has joined #openstack-ansible | 20:51 | |
mgariepy | sure, but neutron tends to be really slow to recover, from what i've seen. | 20:55 |
mgariepy | i agree, a failure when it's your first time and you need to learn on the spot to fix it is not fun. | 20:56 |
spatel | mgariepy: it's easy to horizontally add more resources to neutron to spread the load | 21:11 |
spatel | does anyone have a good Ubuntu pxe boot kickstart file? | 21:40 |
spatel | this option looks good for PXE - append initrd=/images/ubuntu/initrd ip=dhcp syslog=10.70.0.20:514 url=http://10.70.0.20/pxe_repo/ubuntu-20.04.1-live-server-amd64.iso ks=http://10.70.0.20/pxe_ks/ubuntu-20-04-1.ks | 21:41 |
spatel | I found the installation works, but it prompts for questions/answers :( | 21:42 |
spatel | i need auto-install | 21:42 |
jrosser | spatel: before 20.04 there was debian-installer and preseed | 21:50 |
*** cshen has quit IRC | 21:50 | |
jrosser | in 20.04 there is now this https://ubuntu.com/server/docs/install/autoinstall | 21:50 |
jrosser | it is late here but i can maybe share some stuff tomorrow | 21:50 |
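For reference, a 20.04 autoinstall seed is cloud-init user-data carrying an autoinstall: key; a minimal, untested sketch with example values only:

```yaml
#cloud-config
autoinstall:
  version: 1
  identity:
    hostname: compute01
    username: ubuntu
    # paste the output of: mkpasswd --method=SHA-512
    password: "$6$examplehash"
  ssh:
    install-server: true
```

The kernel cmdline then points the installer at it, e.g. autoinstall ds=nocloud-net;s=http://10.70.0.20/autoinstall/ (URL is an example).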
spatel | jrosser: thanks, let me read about that | 21:51 |
jrosser | it is very similar to cloud-init for a vm | 21:51 |
spatel | hmm, i came across some articles talking about cloud-init but i thought that would not be for my setup, so i ignored them | 21:52 |
spatel | Let me understand how 20.04 handles that | 21:52 |
spatel | what OS are you guys running your openstack on? | 21:52 |
spatel | 19.x ? | 21:52 |
jawad_axd | Hi! Can someone please push me along on this one: with a newly added compute I can see it in 'openstack compute service list' but not in 'openstack hypervisor list'. This is the nova-compute log http://paste.openstack.org/show/801032/ . One more thing I noticed is that I cannot reach ceph from the compute node after installation, using "rbd --user cinder ls -p pool-name", after following the openstack-ansible docs for adding a new compute node. Thanks in advance for pointers. | 21:53 |
jawad_axd | I am trying to make this compute host as gpu passthrough, and it has vfio-pci kernel driver enabled on the host. I am not sure if that is causing some problem. | 21:55 |
*** maharg101 has joined #openstack-ansible | 21:56 | |
jawad_axd | I would highly appreciate it if someone could give some hints; I have spent the last few days on it.. | 21:58 |
spatel | jawad_axd: did you check nova-api logs and nova-placement logs? | 21:59 |
*** maharg101 has quit IRC | 22:00 | |
jawad_axd | I can not see any error there.. | 22:01 |
jawad_axd | I got this libvirt related error http://paste.openstack.org/show/800980/ a couple of days ago, but then it didn't appear again. | 22:04 |
spatel | jawad_axd: can you see your compute nodes in "openstack resource provider list" | 22:06 |
spatel | if not then it could be nova-placement service related issue | 22:07 |
jawad_axd | I can not see it with " openstack resource provider list"" | 22:07 |
*** cshen has joined #openstack-ansible | 22:08 | |
spatel | there you go | 22:08 |
spatel | looks like your compute node is not able to register with nova-placement, or maybe nova | 22:09 |
spatel | i would check your compute node's nova.conf file to see if you have a good config and nothing missing | 22:09 |
spatel | also make sure your nova-placement is running on the infra nodes | 22:10 |
jawad_axd | This is nova.conf from compute. http://paste.openstack.org/show/801033/ | 22:15 |
spatel | can you ping or curl your endpoints, and is the node able to talk to all the API services? | 22:16 |
spatel | its hard to say anything just looking at the nova.conf file | 22:17 |
spatel | run in debug mode and see why it's not able to register itself with the controller nodes | 22:17 |
jawad_axd | Ok. Regarding nova.conf I added pcipassthrough filter and [pci] information. I never had this kinda problem before. | 22:19 |
spatel | remove that option and restart nova to see | 22:19 |
spatel | I am using pcipassthrough and i had no issue at all | 22:20 |
jawad_axd | ok | 22:20 |
spatel | just do some quick trial and error to see if anything makes sense | 22:20 |
jawad_axd | nova-compute service restart is taking forever after removing those entries. | 22:27 |
spatel | hmm | 22:27 |
spatel | check logs and see | 22:28 |
jawad_axd | This is nova-compute log http://paste.openstack.org/show/801034/ after service restarted. | 22:31 |
spatel | nothing changed | 22:34 |
*** jbadiapa has quit IRC | 22:34 | |
spatel | no error except "No compute node record found" | 22:34 |
spatel | I would check the nova-placement and nova-api logs again | 22:35 |
spatel | when the compute node restarts it tries to register with nova-placement/api, and that will surely tell you something (run in debug mode to get more data) | 22:36 |
jawad_axd | This is placement log http://paste.openstack.org/show/801036/ | 22:38 |
spatel | what if you run tcpdump on the compute node and restart the nova-compute service in another terminal? | 22:42 |
spatel | tcpdump will tell you what it's trying to do, making calls to the api etc.. | 22:42 |
jawad_axd | Ok. I do it. | 22:43 |
jawad_axd | This is nova-api log http://paste.openstack.org/show/801037/ | 22:43 |
spatel | looking clean, so it looks like your compute node is not making the call (if you have a single infra node, run tcpdump on nova-api too to see if you are getting any packets from the compute) | 22:47 |
jawad_axd | I have HA setup . 3 nova-api nodes. | 22:48 |
jawad_axd | http://paste.openstack.org/show/801038/ | 22:49 |
jawad_axd | tcpdump on compute node while restarting services. | 22:49 |
jawad_axd | I do tcpdump on nova-api | 22:49 |
spatel | :) you need to filter tcpdump for a specific host ip or port (otherwise you will see all the garbage like SSH / ARP etc..) | 22:50 |
jawad_axd | ah ok | 22:50 |
spatel | tcpdump -i any -nn not port ssh -e -xX -s0 (i would try that) | 22:51 |
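To watch the compute node registering specifically, narrowing the capture to the API ports is usually enough (the VIP address is a placeholder; 8774 is nova-api's default port and 8778 placement's):

```shell
# Run on the compute node while restarting nova-compute in another terminal.
tcpdump -i any -nn host 172.29.236.9 and '(port 8774 or port 8778)'
```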
spatel | Good night folks! see you tomorrow! it was a wonderful troubleshooting day today. | 22:55 |
jawad_axd | Goodnight! | 22:56 |
jawad_axd | Thanks for your time. | 22:56 |
*** spatel has quit IRC | 22:58 | |
*** spatel has joined #openstack-ansible | 23:05 | |
*** spatel has quit IRC | 23:09 | |
openstackgerrit | Jonathan Rosser proposed openstack/openstack-ansible-os_octavia master: [doc] Adjust octavia docs https://review.opendev.org/c/openstack/openstack-ansible-os_octavia/+/766833 | 23:28 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!