*** prometheanfire has joined #openstack-ansible | 00:20 | |
*** tosky has quit IRC | 00:36 | |
*** cshen has joined #openstack-ansible | 01:45 | |
*** cshen has quit IRC | 01:49 | |
ThiagoCMC | Never stop! OpenStack is fun! =P | 02:22 |
ThiagoCMC | Victoria this week?! lol | 02:22 |
*** dave-mccowan has joined #openstack-ansible | 02:56 | |
*** cshen has joined #openstack-ansible | 03:45 | |
*** cshen has quit IRC | 03:50 | |
*** akahat is now known as akahat|ruck | 04:11 | |
*** evrardjp has quit IRC | 05:33 | |
*** evrardjp has joined #openstack-ansible | 05:33 | |
*** cshen has joined #openstack-ansible | 05:44 | |
*** cshen has quit IRC | 05:48 | |
*** cshen has joined #openstack-ansible | 06:05 | |
*** cshen has quit IRC | 06:09 | |
*** cshen has joined #openstack-ansible | 06:12 | |
*** cshen has quit IRC | 06:17 | |
openstackgerrit | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_zun master: Update zun role to match current requirements https://review.opendev.org/c/openstack/openstack-ansible-os_zun/+/763141 | 06:26 |
openstackgerrit | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_zun master: Update zun role to match current requirements https://review.opendev.org/c/openstack/openstack-ansible-os_zun/+/763141 | 06:31 |
*** kukacz has quit IRC | 06:34 | |
*** pto has joined #openstack-ansible | 06:38 | |
*** pto has quit IRC | 06:38 | |
openstackgerrit | Merged openstack/openstack-ansible-tests stable/ussuri: Bump virtualenv to version prior to 20.2.2 https://review.opendev.org/c/openstack/openstack-ansible-tests/+/766801 | 06:46 |
*** kukacz has joined #openstack-ansible | 06:57 | |
*** pcaruana has joined #openstack-ansible | 07:50 | |
*** pcaruana has quit IRC | 07:51 | |
*** masterpe has quit IRC | 08:16 | |
*** gundalow has quit IRC | 08:16 | |
*** tbarron has quit IRC | 08:16 | |
*** cshen has joined #openstack-ansible | 08:17 | |
*** johanssone has quit IRC | 08:19 | |
*** andrewbonney has joined #openstack-ansible | 08:20 | |
*** gundalow has joined #openstack-ansible | 08:22 | |
*** tbarron has joined #openstack-ansible | 08:22 | |
*** johanssone has joined #openstack-ansible | 08:23 | |
*** rpittau|afk is now known as rpittau | 08:27 | |
*** tosky has joined #openstack-ansible | 08:38 | |
*** akahat|ruck is now known as akahat|lunch | 09:08 | |
noonedeadpunk | mornings | 09:10 |
openstackgerrit | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: [DNM] https://review.opendev.org/c/openstack/openstack-ansible/+/766901 | 09:15 |
openstackgerrit | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_zun master: Update zun role to match current requirements https://review.opendev.org/c/openstack/openstack-ansible-os_zun/+/763141 | 09:17 |
*** macz_ has joined #openstack-ansible | 09:20 | |
openstackgerrit | Dmitriy Rabotyagov proposed openstack/openstack-ansible stable/ussuri: Apply /etc/environment for runtime after adjustment https://review.opendev.org/c/openstack/openstack-ansible/+/766798 | 09:21 |
noonedeadpunk | jrosser: regarding security.txt - you decided to have both for keystone and haproxy? | 09:23 |
*** macz_ has quit IRC | 09:24 | |
openstackgerrit | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_keystone stable/ussuri: Move openstack-ansible-uw_apache centos job to centos-8 https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/765928 | 09:25 |
jrosser | noonedeadpunk: it's keystone apache/nginx that serves the actual file | 09:38 |
jrosser | but intervention is needed on haproxy to intercept https://example.com:443/security.txt to the backend which is normally listening on port 5000 | 09:39 |
* jrosser double checks the patch | 09:41 | |
noonedeadpunk | ah, ok | 09:57 |
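In haproxy terms, the interception jrosser describes might look roughly like this (a sketch only; the frontend/backend names and the .well-known path are assumptions, not the actual patch):

```
frontend openstack-external
    # send security.txt requests on the public VIP to keystone's backend,
    # which normally listens on port 5000
    acl security_txt path /security.txt /.well-known/security.txt
    use_backend keystone_service-back if security_txt
```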
jrosser | git breakage on train bump upgrade job, python_venv_build repo fatal: reference is not a tree: 74d3eeacc72d5d6bb7a915e83440626a8d16a1c0 | 10:07 |
jrosser | that is so weird | 10:08
*** gshippey has joined #openstack-ansible | 10:11 | |
*** akahat|lunch is now known as akahat|ruck | 10:32 | |
*** sshnaidm|off has quit IRC | 10:47 | |
*** SecOpsNinja has joined #openstack-ansible | 11:06 | |
noonedeadpunk | andrewbonney: ok, so seems focal just fails with kuryr from victoria | 11:09 |
noonedeadpunk | seems it's missing some other backport to victoria | 11:09 |
noonedeadpunk | oh, sorry, pinged too early - it's still on the passing tempest step :9 | 11:10
andrewbonney | :) | 11:10 |
noonedeadpunk | it's bionic that passed | 11:10
andrewbonney | I've got an AIO going so I can always debug further | 11:10 |
openstackgerrit | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Remove *_git_project_group variables https://review.opendev.org/c/openstack/openstack-ansible/+/766039 | 11:11 |
noonedeadpunk | I've returned the patch to the state of patchset 10, which was your last one | 11:12
*** sshnaidm has joined #openstack-ansible | 11:17 | |
andrewbonney | noonedeadpunk: timing looks suspicious for the release of https://docs.docker.com/engine/release-notes/#20100. I'm investigating... | 11:49 |
noonedeadpunk | oh, well, it really does... | 11:49 |
noonedeadpunk | do we add the docker repo? as I'm not sure ubuntu would just publish the latest.... | 11:50
andrewbonney | Yeah, the ubuntu ones tend to be a long way behind | 11:50 |
noonedeadpunk | if we add the repo, it's probably worth using the apt_package_pinning role | 11:51
andrewbonney | I'll take a look at that once I can confirm a downgrade fixes the test | 11:52 |
noonedeadpunk | good example I guess in rabbit https://opendev.org/openstack/openstack-ansible-rabbitmq_server/src/branch/master/tasks/install_apt.yml#L16-L30 | 11:55 |
andrewbonney | Thanks. That definitely fixes it so I'll add a pin | 11:58 |
noonedeadpunk | and drop depends on I've added then :) | 11:59 |
andrewbonney | Will do | 11:59 |
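Following the rabbitmq_server pattern linked above, the pin could be expressed through the apt_package_pinning role roughly like this (the docker package name and version string are illustrative assumptions, not the final patch):

```yaml
# pin docker below the 20.10.0 release suspected of breaking the job
apt_pinned_packages:
  - package: "docker-ce"
    version: "5:19.03.*"
```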
*** rfolco has joined #openstack-ansible | 12:03 | |
openstackgerrit | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_octavia master: [doc] Adjut octavia docs https://review.opendev.org/c/openstack/openstack-ansible-os_octavia/+/766833 | 12:10 |
openstackgerrit | Merged openstack/openstack-ansible-os_masakari master: Add taskflow connection details https://review.opendev.org/c/openstack/openstack-ansible-os_masakari/+/766830 | 12:24 |
openstackgerrit | Merged openstack/openstack-ansible-os_octavia master: Delegate info gathering to setup host https://review.opendev.org/c/openstack/openstack-ansible-os_octavia/+/766693 | 12:41 |
openstackgerrit | Merged openstack/openstack-ansible-os_octavia master: Trigger service restart on cert change https://review.opendev.org/c/openstack/openstack-ansible-os_octavia/+/766062 | 12:41 |
openstackgerrit | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_octavia master: [doc] Adjut octavia docs https://review.opendev.org/c/openstack/openstack-ansible-os_octavia/+/766833 | 12:53 |
openstackgerrit | Merged openstack/openstack-ansible-os_keystone master: Remove centos-7 conditional packages https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/765931 | 12:58 |
openstackgerrit | Merged openstack/openstack-ansible-openstack_hosts master: Make CentOS 8 metal voting again https://review.opendev.org/c/openstack/openstack-ansible-openstack_hosts/+/766425 | 12:58 |
openstackgerrit | Merged openstack/openstack-ansible master: Bump SHAs for master https://review.opendev.org/c/openstack/openstack-ansible/+/766858 | 13:03 |
openstackgerrit | Andrew Bonney proposed openstack/openstack-ansible-os_zun master: Update zun role to match current requirements https://review.opendev.org/c/openstack/openstack-ansible-os_zun/+/763141 | 13:11 |
admin0 | quick question .. when using sr-iov, is it transparent in horizon ? | 13:14 |
admin0 | i mean can a user create instances as normal and it will get sr-iov ports | 13:14 |
*** mgariepy has joined #openstack-ansible | 13:18 | |
*** redrobot has quit IRC | 13:29 | |
jrosser | admin0: an sriov vm is always two steps - create the port, create the vm attached to the port | 13:30 |
jrosser | so it's not the same as a non-sriov case | 13:31 |
admin0 | i want to be able to give users horizon and not do support in the office .. so looking for something easy for me to explain to users | 13:35
jrosser | well it is what it is, you can write instructions for how to do this with horizon, but if i remember right it is different to a regular vm | 13:37 |
admin0 | and i also read the sr-iov can be used with linuxbridge also .. no need for ovs | 13:38 |
admin0 | if you personally had a choice for a new greenfield with sr-iov plus either lb or ovs, what would you recommend .. ( for an internal cloud with 1000+ users ), trying to keep support and complexity to a min | 13:39
admin0 | and also, if the card supports sr-iov and dpdk, is it not a good idea to use it ? | 13:39
jrosser | they are not the same, so if you want line speed networking to your VM then sriov is one way to do that | 13:39 |
jrosser | but if you want security groups, or vxlan, and all the other stuff, then you want linuxbridge/ovs | 13:40 |
admin0 | can both co-exist | 13:40 |
jrosser | yes | 13:40 |
admin0 | like normally people will get ovs/lb .. but if they want very fast, do the sr-iov stuff | 13:40 |
jrosser | generally the recommendation is to have a dedicated nic for sriov | 13:40 |
admin0 | oh | 13:40 |
admin0 | so 2 diff vlan providers .. | 13:41 |
jrosser | well you don't have to, but you mix up a lot of things | 13:41 |
admin0 | one for sr-iov, one for normal | 13:41 |
admin0 | that is good to know as well | 13:41 |
admin0 | so i will go with regular lb for now for this 10g .. and later, add another 10g and dedicate it to sr-iov .. I do not need ovs at all, right ? | 13:43
admin0 | one more question .. does osa support mixed hypervisors ? like one using lb and another using ovs ? | 13:46
admin0 | this specific use case might see up to 200 (small) instances on a single hypervisor .. so trying to figure out at what point lb will be a bottleneck | 13:47
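For reference, the two-step flow jrosser describes would look something like this with the openstack CLI (the network, flavor, image and port names here are placeholders):

```shell
# step 1: create the SR-IOV port explicitly with vnic-type direct
openstack port create --network provider-vlan --vnic-type direct sriov-port
# step 2: boot the VM attached to that pre-created port
openstack server create --flavor m1.small --image focal --port sriov-port sriov-vm
```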
openstackgerrit | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Remove *_git_project_group variables https://review.opendev.org/c/openstack/openstack-ansible/+/766039 | 13:52 |
openstackgerrit | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Remove *_git_project_group variables https://review.opendev.org/c/openstack/openstack-ansible/+/766039 | 13:55 |
*** dave-mccowan has quit IRC | 13:55 | |
openstackgerrit | Dmitriy Rabotyagov proposed openstack/openstack-ansible master: Remove openstack_testing.yml for RC https://review.opendev.org/c/openstack/openstack-ansible/+/766957 | 13:56 |
*** spatel has joined #openstack-ansible | 13:58 | |
*** mgariepy has quit IRC | 14:06 | |
*** mgariepy has joined #openstack-ansible | 14:06 | |
jrosser | spatel: is that a discussion for #rdo or really for here? | 14:12 |
spatel | sorry i didn't realize i was in RDO :( | 14:14
spatel | what is your thought on that? | 14:14 |
noonedeadpunk | I think we will probably try to switch to RDO but as you might guess, there are no guarantees with CentOS these days... | 14:15
spatel | noonedeadpunk: that is why i am worried | 14:15 |
noonedeadpunk | there will be also Cloudlinux forks of CentOS | 14:15 |
spatel | right now i have a choice to make; after 1 year i won't | 14:15
noonedeadpunk | but yeah... | 14:15 |
jrosser | i did a Centos 8 Stream AIO this morning | 14:16 |
noonedeadpunk | eventually even cPanel started development for Ubuntu and promises to release by the end of 2021 | 14:16
noonedeadpunk | I'm pretty sure it just worked :) | 14:16 |
spatel | jrosser: what is your experience | 14:16 |
jrosser | right now i see this Transaction test error:\n file /usr/share/man/man7/systemd.net-naming-scheme.7.gz from install of systemd-239-43.el8.x86_64 conflicts with file from package systemd-networkd-246.6-1.el8.x86_64 | 14:17 |
jrosser | and i just put my head in my hands and sigh | 14:17 |
spatel | jrosser: damn it | 14:17 |
noonedeadpunk | oh, rly ? | 14:17 |
noonedeadpunk | come on.... | 14:17 |
openstackgerrit | Marc Gariépy proposed openstack/openstack-ansible-haproxy_server master: Add haproxy_frontend_only and haproxy_raw feature. https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/766504 | 14:17 |
spatel | I am giving second thought of ubuntu | 14:17 |
noonedeadpunk | great CI they told | 14:17 |
spatel | Debian is good but worried about hardware support | 14:17 |
noonedeadpunk | things won't be broken any more they said | 14:18 |
jrosser | well this is because stream has a newer systemd than the one in EPEL where we get the networkd bit from | 14:18
mgariepy | morning everyone | 14:18 |
jrosser | oh also amusingly in ansible you cannot differentiate between centos 8.x and Centos 8 stream | 14:18 |
spatel | stream will take rolling updates from fedora so definitely they will get updated more frequently | 14:18
jrosser | because version = "8" | 14:18
jrosser | so as far as ansible facts are concerned it's older in a version compare than 8.3 | 14:19
jrosser | which breaks what we just merged for the kernel module renaming | 14:19 |
noonedeadpunk | ┻━┻︵ \(°□°)/ ︵ ┻━┻ | 14:19 |
jrosser | i think we have to grep in /etc/redhat-release and set a local fact | 14:20 |
jrosser | oh wait | 14:22 |
noonedeadpunk | or, we can just say from the next release that centos 8 is not supported and only stream is, which sucks. but leave the regular centos bit for future forks of centos... | 14:22
jrosser | weirdly it's installed systemd-networkd from epel fine | 14:23
jrosser | i wonder if it tries to do it one more time in lxc_hosts and that's blowing up | 14:23
jrosser | i would like to treat it like a totally different distro | 14:23 |
openstackgerrit | Merged openstack/openstack-ansible-os_keystone master: Add security.txt file hosting to keystone https://review.opendev.org/c/openstack/openstack-ansible-os_keystone/+/766437 | 14:24 |
jrosser | already it's obvious that all our version detection stuff is just wrong | 14:24 |
noonedeadpunk | I'm wondering if it has some difference in ansible_distribution_release or smth... | 14:28 |
jrosser | i could not find anything to drive centos(classic) vs centos(stream) logic | 14:32 |
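The /etc/redhat-release approach jrosser suggests could be sketched like this, since the ansible facts report version "8" for both classic CentOS and Stream (the release string below is hard-coded for illustration; on a real host it would come from `cat /etc/redhat-release`):

```shell
# distinguish CentOS Stream from classic CentOS 8 by the release string
release="CentOS Stream release 8"   # normally: release=$(cat /etc/redhat-release)
case "$release" in
  *Stream*) variant="stream" ;;
  *)        variant="classic" ;;
esac
echo "$variant"
```

In Ansible this would typically end up as a local fact written under /etc/ansible/facts.d so later plays can branch on it.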
noonedeadpunk | also looking through a similar thread on their forum.... | 14:34
noonedeadpunk | and no solution there | 14:34 |
noonedeadpunk | how frustrating | 14:35 |
openstackgerrit | Merged openstack/openstack-ansible stable/ussuri: Apply /etc/environment for runtime after adjustment https://review.opendev.org/c/openstack/openstack-ansible/+/766798 | 14:37 |
spatel | Folks, I have decided to rebuild my openstack using Ubuntu | 14:37
mgariepy | spatel, ;) | 14:37 |
openstackgerrit | Linhui Zhou proposed openstack/openstack-ansible-os_magnum master: Replace deprecated UPPER_CONSTRAINTS_FILE variable https://review.opendev.org/c/openstack/openstack-ansible-os_magnum/+/762057 | 14:37 |
noonedeadpunk | ☜(⌒▽⌒)☞ | 14:37 |
spatel | I talked to my team and they give me thumbs up for ubuntu | 14:37 |
noonedeadpunk | lol | 14:38 |
spatel | No centOS hacks anymore | 14:38 |
noonedeadpunk | so I'm starting to wonder - will there be anybody interested in centos in half a year? | 14:38
spatel | I am thinking about Debian but a little worried | 14:38
noonedeadpunk | Debian is good imo | 14:38 |
spatel | worried about hardware support | 14:38 |
*** mgariepy has quit IRC | 14:39 | |
spatel | Ubuntu is more popular in openstack community (very well known) | 14:39 |
noonedeadpunk | used to worked for me previously | 14:39 |
spatel | I didn't see anyone using Debian in production | 14:39 |
noonedeadpunk | *used to work | 14:39 |
jrosser | \o/ there is no tar for a container rootfs whatsoever https://cloud.centos.org/centos/8-stream/x86_64/images/ | 14:39 |
admin0 | :) | 14:40 |
*** cshen has quit IRC | 14:41 | |
noonedeadpunk | some infra folks do at least (like fungi) and vexxhost used to run it as well | 14:41
spatel | right now they are using ubuntu right? | 14:42 |
noonedeadpunk | I guess still debian | 14:42 |
spatel | Hmm Let me try both and see how it goes. | 14:43 |
noonedeadpunk | jrosser: I'm just speechless | 14:43
spatel | good thing is, moving forward you won't hear anything from me about CentOS :) | 14:43
jrosser | it's kind of run this far though with just a couple of minor edits | 14:44 |
jrosser | but really i don't know what to do about this | 14:44
noonedeadpunk | spatel: we have worse CI coverage for debian though, but it's pretty much similar to ubuntu... So you might see issues, but nothing serious and smth we totally should fix (and maybe add more tests) | 14:44
noonedeadpunk | jrosser: well, we always have lxcontainers and legacy method | 14:44 |
jrosser | this is the lxc prep log http://paste.openstack.org/show/801012/ | 14:45 |
noonedeadpunk | but last time I saw a really huge performance degradation | 14:45
jrosser | perhaps the prep script runs the command to convert centos->centos stream | 14:45 |
jrosser | but we kind of only get one year out of that whichever way :( | 14:45 |
*** mgariepy has joined #openstack-ansible | 14:56 | |
admin0 | chances also are that, like how centos came to be (a downstream distro), people might just fork it and continue to make it a downstream distro | 15:14
admin0 | it will be the same, just in another name | 15:14 |
admin0 | which has happened to many projects in the past when decisions like this have been taken | 15:15
spatel | I just downloaded Ubuntu Server 20.04.1 LTS (first time in my life) | 15:16
admin0 | spatel, \o/ yay | 15:18 |
spatel | I need to setup PXE boot first to fire up my servers | 15:18
*** cshen has joined #openstack-ansible | 15:26 | |
*** macz_ has joined #openstack-ansible | 15:37 | |
*** macz_ has joined #openstack-ansible | 15:38 | |
kleini | ubuntu server is great, never had big issues with it. especially ZFS support in ubuntu solved my problems with filesystems getting too fragmented over time in production systems | 15:45 |
SecOpsNinja | hi everyone. one quick question: is there an easy way to recreate queues in rabbitmq? im getting "nova-scheduler: amqp.exceptions.NotFound: Queue.declare: (404) NOT_FOUND - queue 'scheduler_fanout_*' in vhost '/nova' process is stopped by supervisor" which is the cause of the Connection failed: [Errno 113] EHOSTUNREACH (retrying in 32.0 seconds): OSError: [Errno 113] EHOSTUNREACH in nova-conductor. | 15:45
openstackgerrit | Merged openstack/ansible-role-systemd_service master: Use upper-constraints for all tox environments https://review.opendev.org/c/openstack/ansible-role-systemd_service/+/765831 | 15:55 |
SecOpsNinja | yep i confirm the queues exist but they don't have any messages in them... what could be the problem? the compute node not being able to connect to rabbitmq? | 15:57
spatel | SecOpsNinja: i had same issue and re-building rabbitMQ helped - https://bugs.launchpad.net/nova/+bug/1835637 | 16:03 |
openstack | Launchpad bug 1835637 in OpenStack Compute (nova) "(404) NOT_FOUND - failed to perform operation on queue 'notifications.info' in vhost '/nova' due to timeout" [Undecided,Incomplete] | 16:03 |
openstackgerrit | Dmitriy Rabotyagov proposed openstack/openstack-ansible-os_octavia master: [doc] Adjust octavia docs https://review.opendev.org/c/openstack/openstack-ansible-os_octavia/+/766833 | 16:03 |
spatel | RabbitMQ is much easier to re-build than troubleshoot | 16:03
admin0 | SecOpsNinja, you can nuke the 3 rabbitmq containers and re-do it .. it will add the queues and fix itself | 16:05
admin0 | based on your build time, some agents might not retry, so you might have to locate them and manually restart the services | 16:05 |
noonedeadpunk | it's way faster just to rerun `rabbitmq-install.yml -e rabbitmq_upgrade=true` | 16:06
noonedeadpunk | at least I'd start with it in case of suspected issues with rabbit | 16:07
SecOpsNinja | but from what im seeing the queue exists in /nova but it doesn't have any messages. now i don't know if the problem is a connectivity one from the compute node or from nova-api towards the rabbitmq cluster | 16:09
SecOpsNinja | still trying to find a way to see who is connected to which queue and see if i can find the problem | 16:10
spatel | SecOpsNinja: tcpdump will give you idea if anything hitting RabbitMQ or not | 16:17 |
spatel | RabbitMQ is complex sometime internal message routing is broken also cause issue and not visible until you debug components | 16:18 |
SecOpsNinja | im checking /var/log/rabbitmq/*cf.log after rebooting this container and seeing who is connected | 16:18
SecOpsNinja | but yeh atm i can't create vms because they get stuck in scheduling forever | 16:19
spatel | SecOpsNinja: use RabbitMQ GUI management interface which is easy to understand who is connected and where | 16:19 |
openstackgerrit | Merged openstack/openstack-ansible-tests master: Return centos-8 jobs to voting https://review.opendev.org/c/openstack/openstack-ansible-tests/+/765986 | 16:19 |
SecOpsNinja | spatel, what's that GUI? can you give the url? im using the cli rabbitmqctl | 16:20
admin0 | anyone using netplan for declaring ovs setup on ubuntu 20 for osa ? | 16:20 |
admin0 | last i tried was in 18.04, but netplan was new and there was no ovs support on it | 16:20 |
spatel | SecOpsNinja: https://www.rabbitmq.com/management.html | 16:20 |
spatel | The management UI can be accessed using a Web browser at http://{rabbitmq_container_ip}:15672/ | 16:21 |
spatel | you may need to do some kind of SSH port forwarding if container network not accessible from your desktop | 16:21 |
SecOpsNinja | spatel, ok thanks :D | 16:21 |
spatel | SecOpsNinja: you can find UI password from cat /etc/openstack_deploy/user_secrets.yml | grep rabbitmq_monitoring_password | 16:23 |
SecOpsNinja | suposse the username is admin? | 16:23 |
spatel | username monitoring | 16:23 |
SecOpsNinja | ok thanks | 16:23 |
SecOpsNinja | will check it now | 16:24 |
SecOpsNinja | to see if i can understand what is happening | 16:24 |
admin0 | what i do is use firefox and foxyproxy with patterns like *172.29.236.* via socks port say 17221 .. then,via ssh do ssh user@deploy/or-any-server -D 17221 ( which opens a socks tunnel on 17221) | 16:26 |
admin0 | then you can browse/reach any IP that the server you are doing an ssh to reaches | 16:26 |
SecOpsNinja | yep i normally use the ssh tunnel but in this case im on the same management network so it's not a problem. but yes the GUI is a lot easier for seeing the connections :D | 16:27
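The port forwarding spatel mentions can also be done with a plain local forward instead of a SOCKS proxy (the host name and container IP here are placeholders):

```shell
# forward local port 15672 to the rabbitmq container's management UI
ssh -L 15672:172.29.236.XX:15672 user@deployment-host
# then browse http://localhost:15672/ and log in as the "monitoring" user
```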
spatel | I think OSA should expose rabbitmq monitoring to the external network using HAProxy :) let me tag jrosser & noonedeadpunk | 16:27
*** fanfi has joined #openstack-ansible | 16:28 | |
admin0 | should be via a user_variable | 16:28 |
spatel | i am not seeing any security issue in exposing that port because it's a read-only account with a password | 16:30
spatel | I love SSH tunnel stuff but it's hard to teach every person, especially NOC people.. | 16:31
noonedeadpunk | the problem with just the monitoring user is that only very limited metrics can be gathered with it | 16:32
noonedeadpunk | I usually put the admin tag on it to make a full-privilege user to gather all available data... but dunno about security... | 16:32
spatel | noonedeadpunk: we can give more privilege | 16:32 |
noonedeadpunk | rabbit runs on mgmt network which should not be exposed | 16:32 |
spatel | noonedeadpunk: question is can we expose it via HAproxy or not? | 16:32 |
noonedeadpunk | ah | 16:33 |
spatel | I want to just type http://openstack.example.com:<rabbit_port>/ on my browser | 16:33 |
spatel | without any SSH tunnel hacks | 16:33 |
noonedeadpunk | I think you can just do it with haproxy_extra_services | 16:34 |
spatel | i didn't know that | 16:34 |
spatel | can we add that example snippet in RabbitMQ troubleshooting page of OSA documents? | 16:34 |
spatel | i meant at this page - https://docs.openstack.org/openstack-ansible/pike/admin/maintenance-tasks/rabbitmq-maintain.html | 16:35 |
noonedeadpunk | I don't have an example to hand... but it would be pretty much the same as https://opendev.org/openstack/openstack-ansible/src/branch/master/inventory/group_vars/haproxy/haproxy.yml#L64-L74 | 16:35
noonedeadpunk | I think that would be more proper place for this kind of doc https://docs.openstack.org/openstack-ansible-rabbitmq_server/latest/configure-rabbitmq.html | 16:36 |
spatel | I will test that out in lab and if everyone agreed then put example in that link | 16:36 |
noonedeadpunk | but maybe your link is good too... | 16:37 |
noonedeadpunk | as eventually it's really maintenance... | 16:37
spatel | will do there, i want all possible hacks to fix RabbitMQ in single page :) | 16:38 |
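An untested sketch of what noonedeadpunk suggests, modeled on the haproxy.yml example linked above; the exact variable shape and service name are assumptions and may differ between releases:

```yaml
# expose the rabbitmq management UI through haproxy on the external VIP
haproxy_extra_services:
  - service:
      haproxy_service_name: rabbitmq_mgmt
      haproxy_backend_nodes: "{{ groups['rabbitmq_all'] | default([]) }}"
      haproxy_port: 15672
      haproxy_balance_type: http
```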
SecOpsNinja | yep, i think i will do what noonedeadpunk suggested and try running rabbitmq-install.yml -e rabbitmq_upgrade=true and see if it resolves the EHOSTUNREACH... | 16:38
admin0 | another check will be to actually curl/ping and see if it's a network issue and not rabbit | 16:39
SecOpsNinja | in logs and haproxy i dont see any connection drops | 16:40 |
noonedeadpunk | rabbit does not go through haproxy by the way | 16:41
spatel | In my last 3 years of openstack operations i found re-building RabbitMQ fixed all kinds of issues (even with all monitoring showing green and the cluster looking healthy). | 16:41
noonedeadpunk | but I'd rather run this playbook tbh | 16:41 |
noonedeadpunk | it never made things worse at least for me | 16:42 |
spatel | noonedeadpunk: do you reset cluster (clear mnesia directory) before running that playbook? | 16:42 |
SecOpsNinja | noonedeadpunk, i was checking the health checks in haproxy regarding the rabbitmq containers to see if they drop anything, but yep the majority of errors i have are always something in rabbitmq... ok i will rerun openstack-ansible and see if it resolves the problem or not | 16:43
noonedeadpunk | spatel: nope, just run it :) | 16:43 |
spatel | I found if you have a dirty mnesia directory then rabbitMQ starts to fail and the playbook gets stuck | 16:44
spatel | but again its case to case.. | 16:44 |
noonedeadpunk | Hm, maybe... I just never faced that, but I can imagine that happening tbh | 16:44 |
noonedeadpunk | and I never run on centos, so... | 16:44 |
noonedeadpunk | (well actually ran but it was not so many times as for ubuntu) | 16:45 |
spatel | may be depend on what state your cluster die | 16:45 |
noonedeadpunk | well yeah | 16:45 |
noonedeadpunk | I mostly experienced issues after one controller outage was re-joining cluster | 16:45 |
SecOpsNinja | spatel, noonedeadpunk regarding upgrading rabbitmq: are these [req-*] identifiers going to be reset when openstack-ansible finishes, or is there any way i can reset this behaviour? | 16:49
spatel | SecOpsNinja: I don't understand your question (what is req-*?) | 16:51 |
admin0 | SecOpsNinja, those req-s are going to be lost | 16:51 |
admin0 | coz the new db will have no idea of the request | 16:51 |
admin0 | request-id | 16:51 |
SecOpsNinja | nova-conductor[449]: 2020-12-14 16:48:16.438 449 ERROR oslo.messaging._drivers.impl_rabbit [req-4901b480-6728-4b58-994f-8ed141e7898e - - - - -] Connection failed: [Errno 113] EHOSTUNREACH (retrying in 32.0 seconds): OSError: [Errno 113] EHOSTUNREACH still showing after openstack-ansible rabbitmq-install.yml -e rabbitmq_upgrade=true | 16:52 |
openstackgerrit | Jonathan Rosser proposed openstack/openstack-ansible-os_ceilometer master: Remove centos-7 conditional configuration https://review.opendev.org/c/openstack/openstack-ansible-os_ceilometer/+/765956 | 16:52 |
SecOpsNinja | yep the rabbitmq cluster shouldn't know this request id but the clients are still expecting an answer to it | 16:52
SecOpsNinja | that is why i asked how to reset this information from the consumers. i already deleted the server in openstack but nova-conductor is still requesting an answer to that previous request id | 16:53
spatel | not sure if openstack-ansible rabbitmq-install.yml -e rabbitmq_upgrade=true re-builds the cluster from scratch (like deleting everything) | 16:54
admin0 | it will timeout and not complain after a while | 16:54 |
spatel | queues have a TTL and messages will die after the TTL expires | 16:54
SecOpsNinja | because i still have the 2 request ids from almost 5h ago and it's still complaining :D | 16:54
spatel | You can also delete that message manually (need to google or use the UI to delete those requests) | 16:55
spatel | noonedeadpunk: question for you: does openstack-ansible rabbitmq-install.yml -e rabbitmq_upgrade=true destroy the cluster and re-build it like *new* ? | 16:56
SecOpsNinja | ok i will try to find a way to delete that because the queues are empty of messages | 16:56
noonedeadpunk | pretty close to this | 17:04 |
noonedeadpunk | yes | 17:04 |
noonedeadpunk | it drops queues, and rebuilds cluster | 17:04 |
spatel | noonedeadpunk: Does it preserve data during the re-build because it's in HA? | 17:05
noonedeadpunk | except it does not drop already created users, vhosts and some more of the persistent data | 17:05
noonedeadpunk | but it does drop all messages that were there | 17:06 |
spatel | that is why SecOpsNinja's req-* is still in the queue (because it's preserved) | 17:06
spatel | hmm | 17:06 |
noonedeadpunk | (well I'm not 100% sure about that) | 17:06 |
spatel | I believe if it's in HA then it will preserve data in the queue (i would like to try that out) | 17:07
*** jbadiapa has joined #openstack-ansible | 17:08 | |
noonedeadpunk | EHOSTUNREACH ofc sounds more like networking... are you able to telnet to port 5671 on all the rabbitmq containers from the nova-api one? | 17:09
jrosser | looking at what nova-conductor is trying to connect to with strace -p <pid>, then ping / check routes / telnet to whatever it's trying to connect to, is a good plan for these situations | 17:15
jrosser | you'll see the actual IP it's trying like that | 17:15 |
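The checks jrosser describes, roughly (the PID and container IP are placeholders):

```shell
# see which IPs nova-conductor is actually trying to reach
strace -f -e trace=network -p <nova-conductor-pid>
# then verify basic reachability of the rabbitmq port from the same container
ping 172.29.236.XX
telnet 172.29.236.XX 5671
```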
*** mgariepy has quit IRC | 17:19 | |
openstackgerrit | Merged openstack/openstack-ansible-repo_server master: Fix order for removing nginx file. https://review.opendev.org/c/openstack/openstack-ansible-repo_server/+/766257 | 17:35 |
SecOpsNinja | noonedeadpunk, spatel and jrosser yep i have a rabbitmq cluster with 3 nodes. and i see that after recreating it the queues in the /nova vhost are still the same | 17:36
SecOpsNinja | i will try to do that with strace and see if i can find it because i dont see any log of dropping connections on the rabbitmq nodes | 17:37
admin0 | anyone doing netplan+ovs -- can share config ? | 17:37 |
SecOpsNinja | the only way to stop the ERROR oslo.messaging._drivers.impl_rabbit in nova-scheduler and nova-conductor was restarting the systemd service. going to strace both pids and make the request to create a new server and see what happens | 17:38
*** johanssone has quit IRC | 17:45 | |
*** johanssone has joined #openstack-ansible | 17:47 | |
*** rpittau is now known as rpittau|afk | 17:56 | |
*** spatel has quit IRC | 17:57 | |
*** maharg101 has quit IRC | 17:58 | |
*** spatel has joined #openstack-ansible | 17:59 | |
SecOpsNinja | jrosser, one question regarding strace: if using it on the parent process of nova-scheduler or nova-conductor i only see something like this: select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=9973}) = 0 (Timeout). how should i use strace? | 18:02
*** carlosm has joined #openstack-ansible | 18:03 | |
carlosm | hi guys | 18:03 |
SecOpsNinja | from what i am able to see in the logs, the moment i try to create a new server using the cli, i see the validation of /v2.1/flavors/ and after that i get uwsgi[72]: Mon Dec 14 17:47:22 2020 - SIGPIPE: writing to a closed pipe/socket/fd (probably the client disconnected) on request /v2.1/servers (ip of the host) !!! | 18:03
SecOpsNinja | and 2 seconds later nova and the scheduler reconnect and start giving EHOSTUNREACH errors... | 18:05
carlosm | My neutron has following erros, someones knows? : Device brq3c0d52cf-11 cannot be used as it has no MAC address | 18:05 |
admin0 | SecOpsNinja, have you tried rebooting this host again :) | 18:12 |
SecOpsNinja | yep, various times, including the nova and scheduler containers | 18:13 |
SecOpsNinja | I'm now trying to strace the nova-api-wsgi pid to see what causes the "A recoverable connection/channel error occurred, trying to reconnect: Server unexpectedly closed connection" | 18:14 |
*** mgariepy has joined #openstack-ansible | 18:20 | |
spatel | SecOpsNinja: just curious, what is your tcpdump saying? it should give you all the information | 18:20 |
SecOpsNinja | ok, I do see some connections from the host in the strace of the uwsgi pid (/etc/uwsgi/nova-api-os-compute.ini) getting ECONNRESET (Connection reset by peer), and I see an error regarding "HTTP exception thrown: Flavor basic-small could not be found" even though it shows the flavor as public | 18:21 |
SecOpsNinja | let me try again | 18:22 |
SecOpsNinja | spatel, trying to reduce the quantity of messages, because tcpdump -i the1 inside the nova-api container gets a lot of info | 18:26 |
spatel | you need to filter for the port and just grab 1 call to trace and watch it start to finish | 18:33 |
spatel | RabbitMQ uses TCP so it will keep the connection in ESTABLISHED mode (so you won't see any SYN/ACK) | 18:34 |
SecOpsNinja | how do I do the trace with just one packet? | 18:34 |
spatel | download pcap and use wireshark | 18:34 |
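A capture written to a pcap for Wireshark could look like this (the interface name is an assumption; RabbitMQ's default ports are 5672, or 5671 with TLS):

```shell
# Inside the nova-api container: keep only RabbitMQ traffic and save it to a file.
tcpdump -i eth1 -nn -s0 -w /tmp/rabbit.pcap 'tcp port 5672 or tcp port 5671'
# Stop with Ctrl-C, copy /tmp/rabbit.pcap to your workstation, open it in Wireshark.
```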
spatel | how many compute nodes you have? | 18:35 |
SecOpsNinja | 3 | 18:36 |
SecOpsNinja | and 3 infra ones where i have 1 node of rabbitmq | 18:36 |
SecOpsNinja | but atm I'm seeing another error that could be the problem (or at least narrow it down) | 18:36 |
SecOpsNinja | http://paste.openstack.org/show/801020/ | 18:38 |
SecOpsNinja | this part is strange: GET /v2.1/flavors/basic-small | 18:38 |
SecOpsNinja | because "openstack flavor show basic-small" works | 18:39 |
SecOpsNinja | "GET /v2.1/flavors/basic-small" status: 404 but "GET /v2.1/flavors?is_public=None" status: 200 ? and yet the flavor does have os-flavor-access:is_public : True | 18:41 |
SecOpsNinja | in meantime i will try to do a pcap and use it with wireshark | 18:41 |
SecOpsNinja | because I'm still learning about lxc: is there any way to copy files from inside the containers to the host? | 18:42 |
SecOpsNinja | forget the last question lol.... | 18:42 |
spatel | SecOpsNinja: did you see this - https://ask.openstack.org/en/question/32360/networking-issues-errno-113-ehostunreach/ | 18:44 |
SecOpsNinja | let me check that | 18:45 |
spatel | copy the file via /var/lib/lxc/<container_name>/rootfs/.... | 18:46 |
spatel | i mostly copy in/out using that path, never did scp from host to container :) | 18:46 |
SecOpsNinja | sorry, I don't understand that question, because from what I understand the compute is not able to connect to any service; the nova service log on the compute doesn't report anything, and the nova api only reports connection drops after specific calls | 18:47 |
SecOpsNinja | atm, if I check, all the services show as up and running | 18:47 |
SecOpsNinja | spatel, thanks for the cp path, I normally did scp | 18:48 |
spatel | think LXC container like folders :) | 18:48 |
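As a concrete sketch of that path-based copy (the container name below is an example; lxc-ls lists the real ones):

```shell
# On the LXC host a container's filesystem is just a directory,
# so plain cp works in both directions without scp.
lxc-ls -f    # find the exact container name
cp /var/lib/lxc/infra1_nova_api_container-1a2b3c4d/rootfs/var/log/nova/nova-api.log /tmp/
```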
SecOpsNinja | I'm a bit lost atm, because the openstack services all show up and running, and only after a specific request do I see nova-conductor/scheduler reconnecting to rabbitmq after a few seconds, but I can't find why it is dropping the connection... | 18:50 |
*** openstackgerrit has quit IRC | 18:50 | |
admin0 | is it recommended to change qcow2 to raw if using ceph for cinder/glance/vms ? | 18:55 |
admin0 | for the image | 18:55 |
SecOpsNinja | I don't know why, but the 404 in the uwsgi of nova-api is causing the connection failure to rabbitmq, as you can see here http://paste.openstack.org/show/801021/ | 18:56 |
*** gyee has joined #openstack-ansible | 18:58 | |
SecOpsNinja | and that 172.30.0.2 is the primary ip of the haproxy, so all the requests that I make with the openstack client from outside go through the haproxy ip and not mine | 18:59 |
jrosser | SecOpsNinja: using internal/public would help as those are the terms in the code | 19:00 |
SecOpsNinja | but the internal and public ones are managed by haproxy | 19:01 |
jrosser | i struggle to follow primary/outside | 19:01 |
jrosser | admin0: yes for glance images in ceph you should convert to raw | 19:01 |
SecOpsNinja | sorry, not haproxy but keepalived, but the public endpoints are using the vips | 19:02 |
SecOpsNinja | the public and private endpoints, as I had various haproxys | 19:03 |
SecOpsNinja | i now only have 1, but I'm still using the vip so I don't have to reconfigure the whole cluster | 19:03 |
SecOpsNinja | i will try to strace all forked pids of uwsgi in the nova container to see if I can catch the connection, but strace is a bit unknown to me atm... | 19:05 |
*** openstackgerrit has joined #openstack-ansible | 19:05 | |
openstackgerrit | Merged openstack/openstack-ansible master: Remove *_git_project_group variables https://review.opendev.org/c/openstack/openstack-ansible/+/766039 | 19:05 |
spatel | admin0: use raw for ceph storage | 19:10 |
openstackgerrit | Jonathan Rosser proposed openstack/openstack-ansible-haproxy_server master: Add haproxy_frontend_only and haproxy_raw feature. https://review.opendev.org/c/openstack/openstack-ansible-haproxy_server/+/766504 | 19:10 |
spatel | most people say it boosts performance (I personally never experienced that, so just going with best practices) | 19:11 |
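The usual conversion step before uploading to Ceph-backed Glance might look like this (file and image names are examples); raw images let RBD make copy-on-write clones instead of flattening a qcow2 on every boot:

```shell
# Convert the cloud image to raw, then upload it to Glance.
qemu-img convert -p -f qcow2 -O raw focal-server-cloudimg-amd64.img focal.raw
openstack image create "ubuntu-20.04" \
  --disk-format raw --container-format bare \
  --file focal.raw
```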
spatel | nova directly talk to rabbitMQ (not via haproxy) | 19:13 |
spatel | SecOpsNinja: ^ | 19:13 |
spatel | haproxy shouldn't come into the picture for troubleshooting rabbitmq communication | 19:14 |
SecOpsNinja | spatel, yep, but the info that I have is SIGPIPE: writing to a closed pipe/socket/fd (probably the client disconnected) on request /v2.1/servers/d8508991-78d5-45e3-a7a2-77ca8c11aba0 (ip 172.30.0.2) !!! and 172.30.0.2 is from the physical host and not the nova-api or haproxy containers, so I suppose that info says my openstack client cli dropped the connection, but that shouldn't cause the nova api to lose its connection to rabbitmq | 19:16 |
SecOpsNinja | and tcpdump doesn't show info regarding what/who dropped the connection | 19:17 |
SecOpsNinja | i suppose that all the rabbitmq consumers are always connected to the various rabbitmq cluster nodes, so there must be something that is causing nova-scheduler and nova-conductor to reconnect | 19:17 |
spatel | Make sure there is no MTU mismatch and no packet loss | 19:18 |
SecOpsNinja | because they are the only ones that reconnect after the failed api call | 19:18 |
SecOpsNinja | and i see the reconnects in various rabbitmq logs | 19:18 |
spatel | MTU mismatch is very complex to troubleshoot because it looks like it works but drops packets | 19:18 |
SecOpsNinja | mtu is only a problem if you are using something like vlans because of the header overhead, but otherwise it shouldn't be a problem in lan communication, no? | 19:19 |
spatel | If host A has MTU 9000 and host B has 1500 then you may see issue. | 19:20 |
spatel | It has nothing to do with VLAN or VxLAN | 19:20 |
SecOpsNinja | and I didn't mess with the MTU, so I believe it's the default 1500 that is configured | 19:20 |
SecOpsNinja | let me confirm that, but I believe they all have the same | 19:20 |
spatel | I had an issue with an LXC container 3 years ago, everything was working but it was dropping packets, and it turned out to be a kernel logging issue | 19:21 |
SecOpsNinja | from what I'm seeing the majority are 1500 and some brq/tap interfaces are using 1450 | 19:22 |
spatel | that is good | 19:23 |
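A quick way to test for a path-MTU problem between two hosts is a don't-fragment ping sized to the expected MTU (the address is a placeholder):

```shell
# 1472 = 1500 - 20 (IP header) - 8 (ICMP header); -M do sets the DF bit.
# If this fails while smaller payloads work, something on the path has a lower MTU.
ping -c 3 -M do -s 1472 172.29.236.11

# Compare the configured MTUs on each host/interface:
ip link show | grep -o 'mtu [0-9]*'
```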
SecOpsNinja | but the strange part is that all these problems started when I installed additional infra nodes.... and tried to enable HA on all of them with keepalived and multiple haproxys | 19:23 |
SecOpsNinja | this has been an interesting adventure :D | 19:24 |
spatel | if this is not in production then why don't you destroy the container and re-build it? | 19:24 |
SecOpsNinja | i will make a new test and see if i can detect disconnects in all rabbitmq cluster nodes | 19:25 |
spatel | re-build nova and rabbit | 19:25 |
SecOpsNinja | because I want to understand what the problem is (sometimes I can't destroy and rebuild it) | 19:25 |
spatel | yes, agreed. let us know whatever you find. | 19:27 |
SecOpsNinja | let me make a few tests before going home to rest :D what I will try is first making a request for the flavor list and then trying to create a new server with it | 19:28 |
SecOpsNinja | and see if the rabbitmq cluster nodes report any reconnect/error for the current consumers | 19:29 |
SecOpsNinja | spatel, jrosser, noonedeadpunk: yep, it has to be something in the nova api container/services that is causing the connection drop http://paste.openstack.org/show/801022/ . If I understand correctly, in an openstack cluster it shouldn't lose the connection to rabbitmq because of a 404 or an HTTP exception being thrown. If it was a network connection issue, the various other consumers would also reconnect, but that didn't happen... only in nova_api_container | 19:46 |
SecOpsNinja | and the flavor was recreated in the same project where the images and server are being created, so there must be some misconfiguration on my part, but I can't find where... | 19:47 |
jrosser | it almost suggests that the mq credentials are mismatched between the nova container and the mq cluster | 19:49 |
jrosser | because it disconnects pretty much straight away | 19:49 |
SecOpsNinja | i don't think I replaced the openstack osa secrets, but let me check the nova api conf files | 19:50 |
SecOpsNinja | not finding the password in the nova conf files | 19:53 |
SecOpsNinja | yep, I'm out of ideas for understanding what is happening here... I can try to force the recreation of all containers except rabbitmq and galera and see if that resolves it, but supposedly openstack-ansible should have done all the configuration... | 19:55 |
*** maharg101 has joined #openstack-ansible | 19:55 | |
*** carlosm has quit IRC | 20:00 | |
*** maharg101 has quit IRC | 20:00 | |
spatel | why are you getting {handshake_timeout,handshake}? | 20:01 |
spatel | I have seen that error when cluster is not healthy | 20:02 |
SecOpsNinja | probably because the http exception is thrown and the request doesn't finish? | 20:02 |
SecOpsNinja | but if I go to the cluster it shows that it has all the nodes and there isn't any split brain | 20:02 |
*** viks____ has quit IRC | 20:03 | |
spatel | why don't you run nova in debug mode | 20:03 |
SecOpsNinja | and I did run the rabbitmq install | 20:03 |
*** hindret has quit IRC | 20:03 | |
*** simondodsley has quit IRC | 20:03 | |
SecOpsNinja | i can. let me try to put that service in debug... I suppose with --debug? | 20:03 |
*** simondodsley has joined #openstack-ansible | 20:04 | |
spatel | nova.conf use debug=True | 20:04 |
*** hindret has joined #openstack-ansible | 20:04 | |
SecOpsNinja | where is the file? I could only find *.ini ones | 20:04 |
spatel | inside nova-api container /etc/nova/ | 20:05 |
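The debug switch lives in the [DEFAULT] section of that file, something like:

```ini
# /etc/nova/nova.conf inside the nova-api container
[DEFAULT]
debug = True
```

followed by a restart of the nova services so it takes effect.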
SecOpsNinja | ok give me a minute to change that and open all rabbitmq logs | 20:06 |
*** cshen has quit IRC | 20:10 | |
SecOpsNinja | lol, after restarting all the nova services (nova-api-os-compute.service nova-api-metadata.service nova-conductor.service nova-novncproxy.service nova-scheduler.service) in the container, the api now doesn't drop the connection in the rabbitmq logs | 20:11 |
SecOpsNinja | but it still gives the 404 error on the flavor | 20:11 |
SecOpsNinja | http://paste.openstack.org/show/801025/ | 20:12 |
*** cshen has joined #openstack-ansible | 20:12 | |
jrosser | SecOpsNinja: where are you running the cli commands from? | 20:13 |
SecOpsNinja | my computer that is using haproxy vip endpoint as the OS_AUTH_URL | 20:13 |
jrosser | can you please try from the utlity container | 20:14 |
SecOpsNinja | yep one second | 20:14 |
SecOpsNinja | hmm, one difference I'm finding in the openrc configuration is that the utility one uses the /v3 part and the one on my machine doesn't | 20:15 |
SecOpsNinja | but let me make the request | 20:16 |
*** andrewbonney has quit IRC | 20:17 | |
SecOpsNinja | yep, same behaviour regarding the 404 and 202 - http://paste.openstack.org/show/801026/ | 20:18 |
SecOpsNinja | but still no dropping connections now in rabbitmq | 20:18 |
SecOpsNinja | and I will now try to force the creation of a vm to see if I get more info | 20:19 |
spatel | are you getting the list of flavors with openstack flavor list? | 20:19 |
SecOpsNinja | yes | 20:20 |
SecOpsNinja | that is the strange part, and all of them are public | 20:20 |
spatel | mostly the openstack flavor show command doesn't interact with RabbitMQ | 20:20 |
spatel | That API call goes directly to the MySQL DB | 20:21 |
SecOpsNinja | yep, the openstack flavor list doesn't | 20:21 |
spatel | not sure why the flavor issue is coming into the picture | 20:21 |
SecOpsNinja | at least I don't see anything in the logs | 20:21 |
SecOpsNinja | let me try to create a dummy vm | 20:21 |
SecOpsNinja | jrosser, spatel http://paste.openstack.org/show/801027/ | 20:26 |
SecOpsNinja | and it starts getting problems with rabbitmq disconnects | 20:27 |
SecOpsNinja | let me try to re-post it with info regarding the rabbitmq logs | 20:28 |
spatel | HTTP exception thrown: Flavor basic-small could not be found. | 20:31 |
SecOpsNinja | http://paste.openstack.org/show/801028/ | 20:32 |
SecOpsNinja | but it exists, at least in the flavor list | 20:33 |
SecOpsNinja | and it shows info regarding the specific flavor | 20:33 |
SecOpsNinja | that is very strange indeed | 20:33 |
SecOpsNinja | should I force the destruction of the whole rabbitmq cluster and, after it has been recreated, force a restart of all the infra node containers? | 20:34 |
spatel | 2 node RabbitMQ ? | 20:34 |
spatel | that is bad | 20:34 |
SecOpsNinja | yep i have 3 nodes in my rabbitmq | 20:34 |
SecOpsNinja | the first one didn't report any disconnect | 20:34 |
spatel | I have strong feeling your rabbit isn't in good health | 20:35 |
SecOpsNinja | or the tail didn't update | 20:35 |
SecOpsNinja | yep, it didn't report any disconnect | 20:35 |
spatel | Just nuke rabbitmq and re-build | 20:35 |
SecOpsNinja | so you recommend destroying all the rabbitmq cluster containers, recreating them and running the rabbitmq install? | 20:36 |
admin0 | i would recommend that also | 20:36 |
spatel | This is what i do to nuke rabbitmq | 20:36 |
SecOpsNinja | and the consumers, should I restart all of them or will they be able to resolve their problems? | 20:36 |
spatel | stop all services | 20:37 |
spatel | kill -9 rabbit | 20:37 |
spatel | un-install rabbit (yum remove rabbitmq-server) | 20:37 |
spatel | rm -rf /var/lib/rabbitmq/mnesia/* | 20:37 |
spatel | Run playbook to deploy rabbitmq | 20:37 |
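Collected into one sketch (spatel's steps were yum-based; the package manager, container names and playbook path depend on your deployment):

```shell
# Inside each rabbitmq container on the infra nodes:
systemctl stop rabbitmq-server || true
pkill -9 -f rabbitmq || true
yum remove -y rabbitmq-server        # or apt-get remove on Ubuntu containers
rm -rf /var/lib/rabbitmq/mnesia/*

# Then from the deployment host, rebuild the cluster:
cd /opt/openstack-ansible/playbooks
openstack-ansible rabbitmq-install.yml
```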
SecOpsNinja | when you say stop all services, that is regarding the infra host services that would be using rabbitmq, right? | 20:38 |
spatel | Inside rabbitmq-container | 20:39 |
spatel | on infra nodes | 20:39 |
SecOpsNinja | oh ok | 20:39 |
SecOpsNinja | thanks everyone for all the info, I will try to do that tomorrow and see if I can get this resolved.... I'm having nightmares with rabbits :D | 20:39 |
spatel | rabbit is the worst part of openstack, and the majority of the time you will see issues with rabbitmq | 20:40 |
spatel | i have nuked rabbitMQ multiple times (because none of the troubleshooting guides helped me) | 20:41 |
SecOpsNinja | i would have thought that adding more nodes would make rabbitmq more stable | 20:41 |
mgariepy | i thought neutron was the worst part ;).. lol | 20:41 |
SecOpsNinja | mgariepy, yep, neutron with some plugins is an interesting part also | 20:42 |
spatel | neutron is CPU hungry (i haven't seen any complications with its config) | 20:42 |
SecOpsNinja | thanks again, I will try to give an update tomorrow :D | 20:42 |
SecOpsNinja | gn to all | 20:42 |
spatel | gn | 20:42 |
spatel | I hate the RabbitMQ clustering part, it's always hard to recover. (whenever i tried to join a node it always did something nasty or hung on me) | 20:43 |
spatel | one day i had a split-brain (that was a nightmare) | 20:44 |
spatel | at least with neutron you don't need to deal with clustering issues. | 20:44 |
*** cshen has quit IRC | 20:46 | |
*** SecOpsNinja has left #openstack-ansible | 20:48 | |
*** cshen has joined #openstack-ansible | 20:51 | |
mgariepy | sure, but neutron tends to be really slow to recover, from what i've seen. | 20:55 |
mgariepy | i agree, a failure when it's your first time and you need to learn on the spot to fix it is not fun. | 20:56 |
spatel | mgariepy: it's easy to horizontally add more resources to neutron to spread the load | 21:11 |
spatel | does anyone have a good Ubuntu pxe boot kickstart file? | 21:40 |
spatel | this option looks good for PXE - append initrd=/images/ubuntu/initrd ip=dhcp syslog=10.70.0.20:514 url=http://10.70.0.20/pxe_repo/ubuntu-20.04.1-live-server-amd64.iso ks=http://10.70.0.20/pxe_ks/ubuntu-20-04-1.ks | 21:41 |
spatel | I found the installation works, but it prompts for questions/answers :( | 21:42 |
spatel | i need auto-install | 21:42 |
jrosser | spatel: before 20.04 there was debian-installer and preseed | 21:50 |
*** cshen has quit IRC | 21:50 | |
jrosser | in 20.04 there is now this https://ubuntu.com/server/docs/install/autoinstall | 21:50 |
jrosser | it is late here but i can maybe share some stuff tomorrow | 21:50 |
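For reference, a 20.04 autoinstall seed is cloud-init user-data carrying an autoinstall: key; a minimal, untested sketch with example values only:

```yaml
#cloud-config
autoinstall:
  version: 1
  identity:
    hostname: compute01
    username: ubuntu
    # paste the output of: mkpasswd --method=SHA-512
    password: "$6$examplehash"
  ssh:
    install-server: true
```

The kernel cmdline then points the installer at it, e.g. autoinstall ds=nocloud-net;s=http://10.70.0.20/autoinstall/ (URL is an example).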
spatel | jrosser: thanks, let me read about that | 21:51 |
jrosser | it is very similar to cloud-init for a vm | 21:51 |
spatel | hmm, i came across some articles talking about cloud-init but i thought that would not be for my setup, so i ignored them | 21:52 |
spatel | Let me understand how 20.04 handles that | 21:52 |
spatel | what OS are you guys running your openstack on? | 21:52 |
spatel | 19.x ? | 21:52 |
jawad_axd | Hi! Can someone please push me along on this one: with a newly added compute I can see it in 'openstack compute service list' but not in 'openstack hypervisor list'. This is the nova-compute log http://paste.openstack.org/show/801032/ . One more thing I noticed is that I cannot reach ceph from the compute node after installation, using "rbd --user cinder ls -p pool-name", after following the openstack-ansible docs for adding a new compute node. Thanks in advance for pointers. | 21:53 |
jawad_axd | I am trying to make this compute host as gpu passthrough, and it has vfio-pci kernel driver enabled on the host. I am not sure if that is causing some problem. | 21:55 |
*** maharg101 has joined #openstack-ansible | 21:56 | |
jawad_axd | I would highly appreciate it if someone could give some hints; I have spent the last few days on it.. | 21:58 |
spatel | jawad_axd: did you check nova-api logs and nova-placement logs? | 21:59 |
*** maharg101 has quit IRC | 22:00 | |
jawad_axd | I can not see any error there.. | 22:01 |
jawad_axd | I got this libvirt related error http://paste.openstack.org/show/800980/ a couple of days ago, but then it didn't appear again. | 22:04 |
spatel | jawad_axd: can you see your compute nodes in "openstack resource provider list" | 22:06 |
spatel | if not then it could be nova-placement service related issue | 22:07 |
jawad_axd | I can not see it with " openstack resource provider list"" | 22:07 |
*** cshen has joined #openstack-ansible | 22:08 | |
spatel | there you go | 22:08 |
spatel | looks like your compute node is not able to register with nova-placement, or maybe nova | 22:09 |
spatel | i would check your compute node's nova.conf file to see if you have a good config and nothing missing | 22:09 |
spatel | also make sure your nova-placement is running on the infra nodes | 22:10 |
jawad_axd | This is nova.conf from compute. http://paste.openstack.org/show/801033/ | 22:15 |
spatel | can you ping or curl your endpoints, and is the node able to talk to all the API services? | 22:16 |
spatel | its hard to say anything just looking at the nova.conf file | 22:17 |
spatel | run in debug mode and see why it's not able to register itself with the controller nodes | 22:17 |
jawad_axd | Ok. Regarding nova.conf I added pcipassthrough filter and [pci] information. I never had this kinda problem before. | 22:19 |
spatel | remove that option and restart nova to see | 22:19 |
spatel | I am using pcipassthrough and i had no issue at all | 22:20 |
jawad_axd | ok | 22:20 |
spatel | just do some quick trial and error to see if anything makes sense | 22:20 |
jawad_axd | nova-compute service restart is taking forever after removing those entries. | 22:27 |
spatel | hmm | 22:27 |
spatel | check logs and see | 22:28 |
jawad_axd | This is nova-compute log http://paste.openstack.org/show/801034/ after service restarted. | 22:31 |
spatel | nothing changed | 22:34 |
*** jbadiapa has quit IRC | 22:34 | |
spatel | no error except "No compute node record found" | 22:34 |
spatel | I would check the nova-placement and nova-api logs again | 22:35 |
spatel | when the compute node restarts it tries to register with nova-placement/api, and that will surely tell you something (run in debug mode to get more data) | 22:36 |
jawad_axd | This is placement log http://paste.openstack.org/show/801036/ | 22:38 |
spatel | what if you run tcpdump on the compute node and restart the nova-compute service in another terminal? | 22:42 |
spatel | tcpdump will tell you what it's trying to do, making calls to the api etc.. | 22:42 |
jawad_axd | Ok. I do it. | 22:43 |
jawad_axd | This is nova-api log http://paste.openstack.org/show/801037/ | 22:43 |
spatel | looking clean, so it looks like your compute node is not making the call (if you have a single infra node, run tcpdump on nova-api too to see if you are getting any packets from the compute) | 22:47 |
jawad_axd | I have HA setup . 3 nova-api nodes. | 22:48 |
jawad_axd | http://paste.openstack.org/show/801038/ | 22:49 |
jawad_axd | tcpdump on compute node while restarting services. | 22:49 |
jawad_axd | I do tcpdump on nova-api | 22:49 |
spatel | :) you need to filter tcpdump for a specific host ip or port (otherwise you will see all the garbage like SSH / ARP etc..) | 22:50 |
jawad_axd | ah ok | 22:50 |
spatel | tcpdump -i any -nn not port ssh -e -xX -s0 (i would try that) | 22:51 |
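To watch the compute node registering specifically, narrowing the capture to the API ports is usually enough (the VIP address is a placeholder; 8774 is nova-api's default port and 8778 placement's):

```shell
# Run on the compute node while restarting nova-compute in another terminal.
tcpdump -i any -nn host 172.29.236.9 and '(port 8774 or port 8778)'
```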
spatel | Good night folks! see you tomorrow! it was a wonderful troubleshooting day today. | 22:55 |
jawad_axd | Goodnight! | 22:56 |
jawad_axd | Thanks for your time. | 22:56 |
*** spatel has quit IRC | 22:58 | |
*** spatel has joined #openstack-ansible | 23:05 | |
*** spatel has quit IRC | 23:09 | |
openstackgerrit | Jonathan Rosser proposed openstack/openstack-ansible-os_octavia master: [doc] Adjust octavia docs https://review.opendev.org/c/openstack/openstack-ansible-os_octavia/+/766833 | 23:28 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!