prometheanfire | does `openstack network agent list` show XXX for alive ovn controller and ovn-metadata-agent for others? | 00:06 |
---|---|---|
jamesdenton | they should all show alive, prometheanfire | 01:43 |
jamesdenton | https://docs.openstack.org/openstack-ansible-os_neutron/latest/app-ovn.html | 01:45 |
prometheanfire | jamesdenton: br-int sock was not accessible due to protocol version issues, think it's fixed now | 02:38 |
jamesdenton | ahh, good deal | 03:12 |
jamesdenton | protocol version.. meaning ssl vs non ssl? | 03:12 |
*** akahat|rover is now known as akahat | 05:12 | |
prometheanfire | maybe? not sure | 05:13 |
prometheanfire | someone set up the systems with linuxbridge and ovs but I'm redoing it with zed and ovn (instructions unclear for them when I went on leave I guess), so ovs created bridges without ssl I'm guessing and ovn connects with it, maybe | 05:15 |
prometheanfire | deleted the bridge and had stuff recreate since this is greenfield | 05:15 |
*** dviroel|afk is now known as dviroel | 11:24 | |
cloudnull | OHAI - happy Friday all | 14:37 |
prometheanfire | cloudnull: ohai | 15:11 |
*** dviroel is now known as dviroel|lunch | 16:39 | |
admin1 | is there an easy way to enable 2fa/mfa in keystone via osa ? | 17:34 |
darman | Hey | 17:41 |
darman | Is this channel an active channel, anybody here online? (Unfortunately all openstack channels on Libera.Chat are silent!) | 17:43 |
jrosser | there are people here :) | 17:44 |
jrosser | most activity is working-days / working-hours EU time | 17:45 |
jrosser | admin1: 2fa enablement is not really an OSA thing, you'd use a config override to enable the auth method then the rest is via the keystone API https://docs.openstack.org/keystone/latest/admin/resource-options.html#multi-factor-auth-enabled | 17:46 |
jrosser | it's per user, and as OSA does not deploy end-users then there is not really anywhere to do that | 17:47 |
darman | Ah, finally I found you (: | 17:49 |
darman | I have some general question, and also some issues | 17:49 |
*** dviroel|lunch is now known as dviroel | 17:50 | |
darman | 1. If you were to deploy a production environment, would you choose OVN even though it's not as common as OVS? Personally I prefer OVN since it's been the next step in openstack networking development to redesign the network backend; but some technical arguments would help me defend the choice to my managers. | 17:53 |
darman | 2. This the error I get: https://pastebin.ubuntu.com/p/prnxqmSCb4/ when the setup-everything.yml reaches to the keystone service installation. | 17:54 |
darman | this is* | 17:54 |
darman | Error link*: https://pastebin.ubuntu.com/p/dCsjv6bz9p/ | 17:56 |
jrosser | i have no direct experience of OVN myself but we have other people here who are using it for real | 17:58 |
darman | 3. Do you know a general active channel for openstack itself (here on IRC or other online platforms)? | 17:59 |
jrosser | regarding your deploy error it is not possible to know what is wrong from that output | 18:00 |
jrosser | that ansible task has no_log: True on it as otherwise it would display the database password in the log output | 18:00 |
jrosser | first thing you need to do is check haproxy that it thinks the database backends are up | 18:01 |
jrosser | you can either use the haproxy log, hatop or the haproxy management web interface for that | 18:01 |
darman | Where should I set `false` for this option: 'no_log: true'? In the user_variables.yml? | 18:01 |
darman | "Overview — HATop: Interactive ncurses client for HAProxy"; I didn't know it! | 18:03 |
darman | jrosser: Here are my variables: https://pastebin.ubuntu.com/p/wP2vCzGJwf/ Do you see something strange there for haproxy? Is there anything that has been forgotten in the config file? I would appreciate it if you take a look | 18:11 |
darman | Link*: user_variables.yml: https://pastebin.ubuntu.com/p/CpvC3Ym36Y/ | 18:13 |
darman | Yes, there's an issue with haproxy: https://pastebin.ubuntu.com/p/pK2kmMXgFw/ | 18:24 |
jrosser | well first i would really advise against install_method: distro unless you have a super clear understanding of why you choose that | 18:28 |
jrosser | then from your error message we see failed: [infra01_keystone_container-51bb0d04 -> infra01_utility_container-e956a5a6(172.17.236.15) | 18:29 |
jrosser | ^ an address in 172.17.... | 18:29 |
jrosser | but you define internal and external vip in 10.x ranges | 18:29 |
darman | The installation process from the source was very long, almost 6 hours. I thought maybe it would be faster from the distro, which was no different. I will change it to the source in the next installation. | 18:45 |
jrosser | it should not be 6 hours at all, that suggests some sort of problem | 18:46 |
darman | in my experience: setup-hosts --> 45 minutes | 18:46 |
darman | setup-infra: 1 h | 18:47 |
jrosser | is this on real hardware or some virtualised environment? | 18:47 |
darman | On VMS on proxmox | 18:47 |
jrosser | oh right, well | 18:47 |
jrosser | i think that the deploy time is pretty sensitive to disk speed | 18:48 |
jrosser | having said that our CI jobs run a complete deployment on a single node in < 2hours | 18:49 |
jrosser | and those are virtualised | 18:50 |
jrosser | a bare metal node with an nvme disk might complete in < 1 hour | 18:50 |
jrosser | anyway, it feels like your haproxy problem is networking related | 18:51 |
jrosser | i don't understand what is happening with your addressing | 18:51 |
darman | on a single node in < 2hours; What about 3 controllers and 2 computes? | 19:00 |
jrosser | the equivalent of 3 controllers in one of our H/A CI jobs takes 20 mins for setup-infrastructure | 19:08 |
spatel | darman i would go with OVN if this is a new cloud, because converting a production cloud after a few years would be a mess. | 19:20 |
spatel | I am deploying all new cloud using OVN | 19:20 |
darman | spatel: +1, the 'converting in the future' is good point | 19:22 |
jrosser | darman: do you find anything yet with your galera trouble? | 19:24 |
spatel | eventually linuxbridge will die if no maintainer left. new version of OS will stop delivering it. | 19:24 |
darman | jrosser: not yet. | 19:26 |
jrosser | you need to find out why from the perspective of haproxy the backend is down | 19:27 |
jrosser | there is a healthcheck | 19:27 |
jrosser | and there is basic network connectivity to check | 19:27 |
darman | It seems that the examples in the repository (/etc/openstack_deploy) are not suitable for deploying with OVN. | 19:27 |
darman | Is there a place where users have shared their configs? Or if it is possible to share the here by removing sensitive data? | 19:27 |
jrosser | hopefully everything is in the documentation | 19:28 |
darman | `an address in 172.17.... but you define internal and external vip in 10.x ranges` I manually changed it to 10.0.0 when posting the error here to make it clearer! | 19:28 |
spatel | I did blog out some OVN stuff - https://satishdotpatel.github.io/openstack-ansible-multinode-ovn/ | 19:28 |
jrosser | https://docs.openstack.org/openstack-ansible-os_neutron/latest/app-ovn.html | 19:28 |
jrosser | spatel: you may need to update your blog for the changes in zed/master? | 19:29 |
spatel | related SSL? | 19:29 |
spatel | but method would be same.. running playbook etc.. correct? | 19:29 |
jrosser | well i don't know :) | 19:29 |
spatel | i don't think we did any major changes in OVN deployment | 19:30 |
spatel | I will sure deploy zed with multinode and give it a try | 19:30 |
darman | spatel: nice, I'll try your configs in that blog post | 19:31 |
spatel | Try in lab first and let me know if any change required.. | 19:31 |
spatel | jrosser we should add some of my blog links as OSA/OVN deployment examples. It's not perfect but can help someone to give it a try :) | 19:33 |
jrosser | well i think it may just lead to confusion | 19:33 |
spatel | I will add more stuff as required | 19:34 |
jrosser | as the AIO now defaults to OVN...... | 19:34 |
jrosser | so that is the 'reference' deployment | 19:34 |
darman | spatel: Ah, you're using `/etc/openstack_deploy/env.d/neutron.yml` there, but I don't have it! Let me try it. | 19:34 |
jrosser | spatel: ^ see | 19:34 |
jrosser | now we have total confusion | 19:34 |
jrosser | darman: have you yet used the "all-in-one" deployment? | 19:35 |
darman | No, I wanted an environment as close as possible to production. | 19:36 |
spatel | jrosser you are right, Zed has a built-in environment for OVN so that step can be skipped. | 19:36 |
jrosser | so why follow that blog? | 19:37 |
jrosser | you already have the default neutron env.d from here which is wildly different https://github.com/openstack/openstack-ansible/blob/master/inventory/env.d/neutron.yml | 19:37 |
jrosser | darman: i am pretty unclear what you want to achieve | 19:37 |
jrosser | the all-in-one will get you going automatically in a single VM and is more likely to work than anything else, as it is the *exact* code that we run in CI | 19:38 |
darman | jrosser: Installation test for a multi-node environment | 19:38 |
admin1 | issue with going right now with ovn is that it does not support all LB functions, .. so tools like CAPI do not work | 19:38 |
jrosser | then when your multinode is having difficulty you can use the AIO as a reference to see what is different / broken | 19:38 |
spatel | admin1 LB is totally different service, you can use amphora if you want advance LB feature with OVN. What is CAPI? | 19:40 |
jrosser | darman: if you want help with your deployment error - do you have a specific question? | 19:41 |
darman | I am doing a T-shoot. If I can't solve it, I will ask here | 19:43 |
darman | jrosser: For the all-in-one, I'm going to follow this doc: https://docs.openstack.org/openstack-ansible/latest/user/aio/quickstart.html, it's ok, right? | 19:44 |
jrosser | well, 'latest' in the URL means that is the documentation for master branch, which is the next release | 19:45 |
jrosser | the current release is here https://docs.openstack.org/openstack-ansible/zed/user/aio/quickstart.html | 19:45 |
darman | thanks | 19:46 |
jrosser | and personally i would check out stable/zed instead of the tag | 19:46 |
admin1 | capi is Kubernetes Cluster API .. it's getting popular nowadays as the way to deploy k8s clusters on clouds | 19:46 |
admin1 | including os | 19:46 |
admin1 | os => openstack | 19:46 |
admin1 | i will test a multinode install with ovn and see how far i can go | 19:46 |
admin1 | darman, if you have a big server where you can create vms, you can make it as close to prod as possible .. | 19:47 |
darman | I have an HP G8 server running ProxMox, old but still powerful | 19:48 |
admin1 | you can create vms, replicate the network and vlans and even router | 19:49 |
admin1 | mimic ip address and everything to the exact detail | 19:49 |
admin1 | i rented a AMD EPYC from hetzner :) | 19:49 |
admin1 | works good | 19:49 |
admin1 | put 2 nvmes in raid0 | 19:49 |
admin1 | so that the build goes faster | 19:49 |
admin1 | and use vyos for the router | 19:50 |
admin1 | to mimic vlans and DC side of stuff | 19:50 |
spatel | I am running all my openstack labs on single VMware HOST (gen8 with 128GB ram 1TB SSD) | 19:50 |
darman | vyos --> interesting +1 | 19:53 |
darman | "the output has been hidden due to the fact that 'no_log: true' was specified for this result" | 20:00 |
darman | How can I override `no_log` to be false | 20:00 |
darman | ? | 20:00 |
jrosser | there is no way to override that without editing the code | 20:01 |
jrosser | from the top of my head its something like /etc/ansible/ansible_collections/openstack/osa/roles/db_setup/tasks/main.yml | 20:05 |
jrosser | ^ adjust to match reality | 20:05 |
darman | no_log is only used in: `/opt/openstack-ansible/playbooks/healthcheck-infrastructure.yml` | 20:12 |
darman | `/opt/openstack-ansible/playbooks/ceph-rgw-keystone-setup.yml` | 20:12 |
darman | `/opt/openstack-ansible/playbooks/rabbitmq-install.yml` | 20:12 |
darman | by `grep -r no_log /opt/openstack-ansible/` | 20:12 |
jrosser | did you see the path i gave? | 20:13 |
darman | changing it to false in all the above files didn't have any effect! For keystone installation, it still says: FAILED! => {"censored": "the output has been hidden due to the fact that `no_log: true` was specified for this result", "changed": false} | 20:13 |
darman | Oops, I saw that message now. w8 | 20:14 |
darman | Worked, and now It says what the issue is: "`unable to connect to database, check login_user and login_password are correct or /root/.my.cnf has the credentials. Exception message: (2013, 'Lost connection to MySQL server during query')"` | 20:20 |
jrosser | did you get haproxy to think that the galera back end was up? | 20:20 |
darman | From the haproxy aspect, all containers are down! https://i.imgur.com/SPceco6.png | 20:27 |
darman | ^ `hatop -s /var/run/haproxy.stat` | 20:27 |
jamesdenton | spatel Your blog is great, but some of what you outline is no longer necessary with OSA Zed, and there's an extra group or two that need defined. | 20:28 |
jrosser | darman: they will all be down until the services are deployed, and as you have a failure on keystone that is the first openstack service, so it is not a surprise that they are down | 20:30 |
jrosser | however, the database service should be up after you have run setup-infrastructure | 20:30 |
jrosser | look at really basic things, is the database service in the db container actually running? does the journal suggest anything is wrong | 20:31 |
jrosser | can you ping the db backend IP from where haproxy is running | 20:31 |
jrosser | what happens if you curl/wget the db backend healthcheck service from haproxy? | 20:31 |
admin1 | darman, single controller ? | 20:50 |
admin1 | i had the same issue a day back .. i had to manually fix the database check to whitelist the ip | 20:50 |
darman | Woooops! not possible to ping containers as I was using the wrong range in the `openstack_user_config.yml` for br-mgmt interface. I'm going to destroy containers, then deploy everything from the step setup-hosts.yml to assign new IPs to the containers. | 20:55 |
jrosser | darman: also make sure you disable any IP/mc security stuff if there is any in proxmox | 20:56 |
jrosser | *ip/mac address.... | 20:56 |
jrosser | admin1: it is not one controller | 20:56 |
prometheanfire | ping from vm on node 1 to vm on node2 fails with ovn for me, vm on node 1 to second vm on node 2 works. I see the icmp packets hit node2's geneve interface though, but nothing beyond that | 21:00 |
prometheanfire | trying to figure out why packets are not being forwarded is 'fun' | 21:01 |
prometheanfire | that I can't run ovn-nbctl (or sbctl) doesn't help, tried passing the right socket and ssl terms | 21:02 |
spatel | prometheanfire do you have OVN in cluster? | 21:04 |
spatel | you can run ovn-nbctl only on leader node. | 21:04 |
prometheanfire | oh, didn't know that part, guess I'll run it on the leader lol | 21:04 |
spatel | if you want to run from member node then you need to pass some switch call --not-leader or something... | 21:05 |
spatel | --no-leader-only | 21:06 |
spatel | https://man7.org/linux/man-pages/man8/ovn-nbctl.8.html | 21:06 |
spatel | you can use that switch on non-leader node to get data of OVN | 21:06 |
prometheanfire | ya, got the command working at least | 21:06 |
spatel | ovn has nice tool called ovn-trace which can simulate packet flow and tell you where is the blockage or drop | 21:08 |
spatel | jamesdenton i will redefine my blog with latest Zed or make some comments. | 21:09 |
spatel | jrosser is correct because when i deploy openstack on VMware then i disabled mac spoofing and some security shit in VMware. | 21:12 |
*** dviroel is now known as dviroel|pto | 21:12 | |
prometheanfire | not getting anything useful from ovn-trace, shows that the packet should reach the instance :| | 21:50 |
spatel | you can ping vm running on same compute node but not across the compute nodes correct? | 21:53 |
prometheanfire | yep | 21:53 |
prometheanfire | I see the packet reach the geneve interface on compute-node-2 | 21:54 |
spatel | Geneve tunnel is up.. assuming yes | 21:54 |
prometheanfire | but that's the end | 21:54 |
prometheanfire | is there a way to regenerate the openflow table on node-2? | 21:54 |
spatel | security group etc.. blocking it | 21:54 |
prometheanfire | I don't think so, at least the ovn-trace seemed to work | 21:55 |
spatel | what is the output of ovs-vsctl show? | 21:55 |
prometheanfire | for br-int? | 21:56 |
spatel | ovs-vsctl show command output | 21:57 |
prometheanfire | https://pastebin.com/raw/JBFrSy4v | 21:58 |
spatel | looks good so far i can see tunnel and tap interface on br-int bridge | 22:00 |
prometheanfire | yep, I can only think that it's some flow that's not working, harder to troubleshoot than lxb lol | 22:01 |
prometheanfire | is there a good way to rule out port security? | 22:04 |
prometheanfire | ofctl dump-ports shows the vm port receiving packets at the rate of the ping, so ovs seems to be routing it that far | 22:06 |
spatel | This is what i have and everything works for me - https://paste.opendev.org/show/bjDY3HTMJV4fNtzIGSxK/ | 22:09 |
spatel | i wonder why we have br-tun | 22:10 |
spatel | in my case i have tunnel directly connected to br-int | 22:10 |
spatel | make sure you configure security-group with allow all.. | 22:11 |
prometheanfire | I just disabled security groups entirely on the port to test, no good | 22:11 |
spatel | many times i ended up in that situation where i assumed security-group is ok but ended up finding the issue there | 22:12 |
spatel | what do you mean by disable security-group entries? | 22:12 |
prometheanfire | openstack port set --no-security-group --no-port-security | 22:12 |
prometheanfire | something like that | 22:12 |
spatel | i don't think that is the issue here.. i am talking about security-group rules | 22:13 |
prometheanfire | ah, with things disabled that's not it | 22:13 |
spatel | openstack security group list | 22:13 |
prometheanfire | I have a secgroup allowing all outbound and icmp+22 inbound | 22:13 |
spatel | just make sure.. its :) | 22:14 |
prometheanfire | also, having just removed the secgroup from the port should remove that variable, ovn-trace says all packets should reach (tested port 123) | 22:14 |
spatel | I have to leave now.. but please keep us posted on progress | 22:19 |
spatel | run ovs-tcpdump command which will help you to find painpoints | 22:19 |
prometheanfire | yep, used that too :D | 22:24 |
prometheanfire | cya | 22:24 |
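
The basic checks jrosser suggests in the log for the haproxy/galera problem (is the backend address in the expected management range, is there basic connectivity from the haproxy host, what does the healthcheck answer) could be sketched roughly as below. This is a minimal illustration and not part of OpenStack-Ansible: the function names are hypothetical, and the ports in the usage note (3306 for the database, 9200 for the healthcheck) and the management CIDR are assumptions that need to be adjusted to match the actual deployment.

```python
import http.client
import ipaddress
import socket
import urllib.error
import urllib.request


def in_mgmt_range(addr: str, cidr: str) -> bool:
    """Return True if addr falls inside the management network cidr."""
    return ipaddress.ip_address(addr) in ipaddress.ip_network(cidr)


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def healthcheck_status(host: str, port: int, timeout: float = 3.0):
    """Return the HTTP status of the healthcheck, or None if unreachable."""
    url = f"http://{host}:{port}/"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The service answered, but with an error status (for example 503).
        return exc.code
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, no route, or a malformed reply.
        return None


def diagnose(backend: str, mgmt_cidr: str, db_port: int, check_port: int) -> dict:
    """Run the three checks and collect the results (hypothetical helper)."""
    return {
        "in_mgmt_range": in_mgmt_range(backend, mgmt_cidr),
        "db_port_open": tcp_reachable(backend, db_port),
        "healthcheck_http_status": healthcheck_status(backend, check_port),
    }
```

Run from the host where haproxy lives, a call such as `diagnose("172.17.236.15", "10.0.0.0/24", 3306, 9200)` (the address from the error output, with an assumed CIDR and assumed ports) would report `in_mgmt_range` as `False`, which points at the kind of br-mgmt range mismatch darman eventually found in `openstack_user_config.yml`.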